├── README.md ├── _config.yml ├── advanced-networking-speciality.md ├── developer-associate.md ├── devops-engineer-professional-02.md ├── devops-engineer-professional.md ├── solutions-architect-associate.md ├── solutions-architect-professional.md └── sysops-administrator-associate.md /README.md: -------------------------------------------------------------------------------- 1 | # AWS Certification Notes 2 | 3 | * [DevOps Engineer Professional DOP-C02](devops-engineer-professional-02.md) (2023) 4 | * [Advanced Network Speciality](advanced-networking-speciality.md) (2021) 5 | * [Solutions Architect Professional](solutions-architect-professional.md) (2021) 6 | * [DevOps Engineer Professional DOP-C01](devops-engineer-professional.md) (2020) 7 | * [Solutions Architect Associate](solutions-architect-associate.md) (2019) 8 | * [SysOps Administrator Associate](sysops-administrator-associate.md) (2018) 9 | * [Developer Associate](developer-associate.md) (2017) 10 | 11 | --- 12 | 13 | * View as [GitHub Pages](https://jangroth.github.io/aws-certification-notes/) 14 | * TOCs generated with [MarkDownHelper](https://github.com/jangroth/markdownhelper) 15 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | markdown: GFM 2 | theme: jekyll-theme-dinky 3 | title: AWS Cert Notes 4 | description: My AWS cert notes. 
5 | -------------------------------------------------------------------------------- /advanced-networking-speciality.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | --- 4 | * [Advanced Networking - Speciality](#1) 5 | * [Exam Objectives](#2) 6 | * [Content](#2_1) 7 | * [Design and Implement AWS Networks](#3) 8 | * [AWS Global Network Infrastructure](#3_1) 9 | * [Virtual Private Cloud (VPC)](#3_2) 10 | * [Connecting VPCs to other VPCs](#3_3) 11 | * [Extending on-premises networks to VPCs](#3_4) 12 | * [Open](#4) 13 | * [Services](#4_1) 14 | * [Topics](#4_2) 15 | * [Practice/Hands-on](#4_3) 16 | * [Supporting Material](#4_4) 17 | --- 18 | 19 | --- 20 | 21 | # [↖](#top)[↓](#2) Advanced Networking - Speciality 22 | 23 | > 8/2021 - 24 | 25 | --- 26 | 27 | 28 | # [↖](#top)[↑](#1)[↓](#2_1) Exam Objectives 29 | * Design, develop, and deploy cloud-based solutions using AWS. 30 | * Implement core AWS services according to basic architectural best practices. 31 | * Design and maintain network architecture for all AWS services. 32 | * Leverage tools to automate AWS networking tasks. 
33 | 34 | 35 | ## [↖](#top)[↑](#2)[↓](#2_1_1) Content 36 | 37 | * [Domain 1: Design and Implement Hybrid IT Network Architectures at Scale](#2_1_1) 38 | * [Domain 2: Design and Implement AWS Networks](#2_1_2) 39 | * [Domain 3: Automate AWS Tasks](#2_1_3) 40 | * [Domain 4: Configure Network Integration with Application Services](#2_1_4) 41 | * [Domain 5: Design and Implement for Security and Compliance](#2_1_5) 42 | * [Domain 6: Manage, Optimize, and Troubleshoot the Network](#2_1_6) 43 | 44 | 45 | ### [↖](#2_1)[↑](#2_1)[↓](#2_1_2) Domain 1: Design and Implement Hybrid IT Network Architectures at Scale 46 | * 1.1 Implement connectivity for hybrid IT 47 | * 1.2 Given a scenario, derive an appropriate hybrid IT architecture connectivity solution 48 | * 1.3 Explain the process to extend connectivity using AWS Direct Connect 49 | * 1.4 Evaluate design alternatives that leverage AWS Direct Connect 50 | * 1.5 Define routing policies for hybrid IT architectures 51 | 52 | ### [↖](#2_1)[↑](#2_1_1)[↓](#2_1_3) Domain 2: Design and Implement AWS Networks 53 | * 2.1 Apply AWS networking concepts 54 | * 2.2 Given customer requirements, define network architectures on AWS 55 | * 2.3 Propose optimized designs based on the evaluation of an existing implementation 56 | * 2.4 Determine network requirements for a specialized workload 57 | * 2.5 Derive an appropriate architecture based on customer and application requirements 58 | * 2.6 Evaluate and optimize cost allocations given a network design and application data flow 59 | 60 | ### [↖](#2_1)[↑](#2_1_2)[↓](#2_1_4) Domain 3: Automate AWS Tasks 61 | * 3.1 Evaluate automation alternatives within AWS for network deployments 62 | * 3.2 Evaluate tool-based alternatives within AWS for network operations and management 63 | 64 | ### [↖](#2_1)[↑](#2_1_3)[↓](#2_1_5) Domain 4: Configure Network Integration with Application Services 65 | * 4.1 Leverage the capabilities of Route 53 66 | * 4.2 Evaluate DNS solutions in a hybrid IT architecture 67 
| * 4.3 Determine the appropriate configuration of DHCP within AWS 68 | * 4.4 Given a scenario, determine an appropriate load balancing strategy within the AWS ecosystem 69 | * 4.5 Determine a content distribution strategy to optimize for performance 70 | * 4.6 Reconcile AWS service requirements with network requirements 71 | 72 | ### [↖](#2_1)[↑](#2_1_4)[↓](#2_1_6) Domain 5: Design and Implement for Security and Compliance 73 | * 5.1 Evaluate design requirements for alignment with security and compliance objectives 74 | * 5.2 Evaluate monitoring strategies in support of security and compliance objectives 75 | * 5.3 Evaluate AWS security features for managing network traffic 76 | * 5.4 Utilize encryption technologies to secure network communications 77 | 78 | ### [↖](#2_1)[↑](#2_1_5)[↓](#3) Domain 6: Manage, Optimize, and Troubleshoot the Network 79 | * 6.1 Given a scenario, troubleshoot and resolve a network issue 80 | 81 | 82 | # [↖](#top)[↑](#2_1_6)[↓](#3_1) Design and Implement AWS Networks 83 | 84 | 85 | ## [↖](#top)[↑](#3)[↓](#3_1_1) AWS Global Network Infrastructure 86 | 87 | * [Overview](#3_1_1) 88 | 89 | 90 | 91 | ### [↖](#3_1)[↑](#3_1)[↓](#3_2) Overview 92 | AWS has the concept of a **Region**, which is a physical location around the world where AWS clusters data centers. Each group of logical data centers is called an Availability Zone. Each AWS Region consists of multiple, isolated, and physically separate AZs within a geographic area. 93 | 94 | An **Availability Zone (AZ)** is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center. 95 | 96 | A **transit center** provides redundant connectivity between AZs and internet backbones.
97 | 98 | **Edge locations** are AWS data centers ('endpoints') designed to deliver services with the lowest latency possible. Amazon has dozens of these data centers spread across the world. They’re closer to users than Regions or Availability Zones, often in major cities, so responses can be fast and snappy. A subset of services for which latency really matters use edge locations, including: 99 | * *CloudFront*, which uses edge locations to cache copies of the content that it serves, so the content is closer to users and can be delivered to them faster. 100 | * *Lambda@Edge*, is a feature of Amazon CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency. 101 | * *Route 53*, which serves DNS responses from edge locations, so that DNS queries that originate nearby can resolve faster (and, contrary to what you might think, is also Amazon’s premier database). 102 | * *Web Application Firewall* and *AWS Shield*, which filter traffic in edge locations to stop unwanted traffic as soon as possible. 103 | 104 | **AWS Local Zones** place compute, storage, database, and other select AWS services closer to end-users. With AWS Local Zones, you can easily run highly-demanding applications that require single-digit millisecond latencies to your end-users such as media & entertainment content creation, real-time gaming, reservoir simulations, electronic design automation, and machine learning: 105 | * A Local Zone is an extension of an AWS Region that is geographically close to your users. 106 | * You can extend any VPC from the parent AWS Region into Local Zones by creating a new subnet and assigning it to the AWS Local Zone. When you create a subnet in a Local Zone, your VPC is extended to that Local Zone. The subnet in the Local Zone operates the same as other subnets in your VPC. 
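
Extending a VPC into a Local Zone starts with choosing a CIDR block for the new subnet: it has to sit inside the VPC's range and must not overlap any existing subnet. The following is a minimal sketch of that selection using only Python's standard library. The helper name `next_free_subnet` and the CIDR values are hypothetical, and actually creating the subnet (console, CLI or SDK) is deliberately left out.

```python
import ipaddress


def next_free_subnet(vpc_cidr, used_cidrs, new_prefix=24):
    """Return the first /new_prefix block inside vpc_cidr that overlaps no used subnet.

    Hypothetical helper for illustration; returns None if the VPC range is exhausted.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    used = [ipaddress.ip_network(cidr) for cidr in used_cidrs]
    # Walk the candidate blocks in address order and take the first free one.
    for candidate in vpc.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(existing) for existing in used):
            return candidate
    return None


# Assumed example values: a /16 VPC with two subnets already in use.
vpc_cidr = "10.0.0.0/16"
existing_subnets = ["10.0.0.0/24", "10.0.1.0/24"]

local_zone_subnet = next_free_subnet(vpc_cidr, existing_subnets)
print(local_zone_subnet)  # 10.0.2.0/24
```

The resulting block would then be used as the CIDR of the subnet that is assigned to the Local Zone; the subnet behaves like any other subnet of the VPC.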
107 | 108 | 109 | ## [↖](#top)[↑](#3_1_1)[↓](#3_2_1) Virtual Private Cloud (VPC) 110 | 111 | * [Overview](#3_2_1) 112 | * [Default VPC (Amazon specific)](#3_2_1_1) 113 | * [Non-default VPC (regular VPC)](#3_2_1_2) 114 | * [VPC Scenarios](#3_2_1_3) 115 | * [Core Components](#3_2_2) 116 | * [Security Components](#3_2_3) 117 | * [Structure & Packet Flow](#3_2_4) 118 | * [Packet flow through VPC components](#3_2_4_1) 119 | * [Limits](#3_2_5) 120 | 121 | 122 | 123 | ### [↖](#3_2)[↑](#3_2)[↓](#3_2_1_1) Overview 124 | **Amazon Virtual Private Cloud (Amazon VPC)** is a service that lets you launch AWS resources in a logically isolated virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways. You can use both IPv4 and IPv6 for most resources in your virtual private cloud, helping to ensure secure and easy access to resources and applications. 125 | 126 | As one of AWS's foundational services, Amazon VPC makes it easy to customize your VPC's network configuration. You can create a public-facing subnet for your web servers that have access to the internet. It also lets you place your backend systems, such as databases or application servers, in a private-facing subnet with no internet access. Amazon VPC lets you use multiple layers of security, including security groups and network access control lists, to help control access to Amazon EC2 instances in each subnet.
127 | * Provisions a logically isolated section of the AWS cloud 128 | * Spans all AZs in a region 129 | * Allows creating a layered architecture 130 | * Shared or dedicated tenancy (exclusive hardware or not) 131 | * Cannot be changed after VPC creation 132 | * *Security groups* and subnet-level *network ACLs* 133 | * Ability to extend on-premises network to cloud 134 | * Can be extended *after creation* by adding up to 4 secondary CIDR blocks 135 | * On AWS 136 | * Service - FAQs - User Guide 137 | 138 | 139 | #### [↖](#3_2)[↑](#3_2_1)[↓](#3_2_1_2) Default VPC (Amazon specific) 140 | * Gives easy access to a VPC without having to configure it from scratch 141 | * Has different subnets in different AZs and an internet Gateway (HA, spread out to all AZs) 142 | * Each instance launched automatically receives a *public IP* (and a private IP); this is usually not the case for non-default VPCs 143 | * A deleted default VPC cannot be restored, but a new default VPC can be created 144 | * Comes with default NACL that allows all inbound/outbound traffic 145 | 146 | #### [↖](#3_2)[↑](#3_2_1_1)[↓](#3_2_1_3) Non-default VPC (regular VPC) 147 | * Instances only receive private IP addresses by default 148 | * Resources *only* accessible through *Elastic IP*, *VPN* or *Internet Gateways* 149 | 150 | #### [↖](#3_2)[↑](#3_2_1_2)[↓](#3_2_2) VPC Scenarios 151 | * VPC with public subnet only -> single tier apps 152 | * VPC with public and private subnets -> layered apps 153 | * VPC with public, private subnets and hardware connected VPN -> extending apps to on-premises 154 | * VPC with private subnets and hardware connected VPN -> extended VPN 155 | 156 | 157 | ### [↖](#3_2)[↑](#3_2_1_3)[↓](#3_2_3) Core Components 158 | * **CIDR range** 159 | * VPCs are private networks and typically use RFC1918 ranges 160 | * 10.0.0.0/8 (-> `10.255.255.255`) 161 | * 172.16.0.0/12 (-> `172.31.255.255`) 162 | * 192.168.0.0/16 (-> `192.168.255.255`) 163 | * This guarantees that VPC addresses cannot conflict with addresses on the public internet 164 | * **Subnet** 165 | * In exactly one AZ 166 | * If
traffic is routed to an Internet Gateway, the subnet is known as a *public subnet* 167 | * Gets public IP through Internet Gateway 168 | * If a subnet doesn't have a route to the Internet Gateway, it's known as a *private subnet* 169 | * Can get internet access through NAT Gateway 170 | * EC2 instances are launched into subnets 171 | * Sometimes grouped into Subnet Groups, e.g. for caching or DB. Typically across AZs 172 | * **Route Table** 173 | * Contains a set of rules, called *routes* that determine where network traffic is directed to 174 | * Each VPC automatically comes with a *main route table* that can be configured 175 | * Each subnet in a VPC must be associated with a route table; the table controls the routing for the subnet. 176 | * A subnet can only be associated with one route table at a time, but multiple subnets can be associated with the same route table 177 | * Each route in a table specifies a destination CIDR and a target 178 | * Every route table contains a local route for communication within the VPC 179 | * Has a *local route* for communication within the VPC (e.g. `172.31.0.0/16`) 180 | * Can have a *default route* `0.0.0.0/0` to route everything that doesn't have a specific rule 181 | 182 | |Route Table Type|Description| 183 | |-|-| 184 | |Main|The route table that automatically comes with your VPC. 
It controls the routing for all subnets that are not explicitly associated with any other route table.| 185 | |Subnet|A route table that's associated with a subnet.| 186 | |Gateway|A route table that's associated with an internet gateway or virtual private gateway.| 187 | |Local gateway|A route table that's associated with an Outposts local gateway.| 188 | 189 | * **Elastic IP** 190 | * Static IPv4 address mapped to an instance or network interface 191 | * If attached to network interface it's decoupled from the instance's lifecycle 192 | * Routes to private IP address of instance 193 | * Can be remapped in case of failure 194 | * For use in a specific region only 195 | * Can only map to instances in public subnets 196 | * **Gateways** 197 | * *Internet Gateway* 198 | * Horizontally scaled, redundant, and highly available VPC component that allows communication between instances in a VPC and the internet 199 | * Provides a target in VPC route tables for internet-routable traffic 200 | * Performs network address translation (NAT) for instances that have been assigned public IPv4 addresses 201 | * *Egress-Only* Gateway 202 | * Allows outbound communication over IPv6 from instances in your VPC to the Internet 203 | * Prevents the Internet from initiating an IPv6 connection with your instances. 204 | * (IPv6 addresses are globally unique, and are therefore public by default) 205 | * *Virtual Private* Gateway (VGW) 206 | * AWS side of Site-to-site VPN 207 | * Has VPN connection to customer gateway attached 208 | * Serves as VPN concentrator on the Amazon side of the VPN connection 209 | * Only one virtual private gateway can be attached to a VPC at a time 210 | * *Customer Gateway* 211 | * Customer side of Site-to-site VPN 212 | * A physical device or software application on your side of the VPN connection 213 | * **NAT** 214 | * 'One-way valve' that allows access *to* the internet, but not *from*. 
215 | * *NAT Instances* 216 | * Manually configured instance from a NAT AMI 217 | * Need to manually disable *source/destination check* on the instance 218 | * *NAT Gateway* 219 | * AWS-managed service 220 | * HA per AZ, create one gateway per AZ 221 | * **DNS** 222 | * Route53 resolver is provided for VPC (can be disabled) 223 | * Can use DHCP options to supply own DNS configuration 224 | * DNS hostnames are provided (can be disabled) 225 | * Private (internal) hostname: `ip-private-ipv4-address.region.compute.internal` 226 | * Public (external) hostname: `ec2-public-ipv4-address.region.compute.amazonaws.com` 227 | 228 | ### [↖](#3_2)[↑](#3_2_2)[↓](#3_2_4) Security Components 229 | * **Security Groups** 230 | * Acts as a virtual, distributed firewall to control inbound and outbound traffic to instances 231 | * Acts on instance level, not subnet level 232 | * 'Allow rules' for inbound and outbound traffic (*no* explicit deny rules) 233 | * All outbound traffic is allowed by default 234 | * All inbound traffic is denied by default 235 | * Support *allow* rules only 236 | * Cannot block individual IP addresses (use NACL for that) 237 | * *Stateful* - will always allow response to (allowed) outbound traffic 238 | * Can refer to other security groups, e.g.
allow traffic from there 239 | * Can have multiple security groups attached to an instance 240 | * Can have any number of instances within a security group 241 | * **Network ACL** 242 | * Subnet level, acting as firewall 243 | * One subnet can (and must) only ever be associated to one NACL, however, one NACL can be associated to many subnets 244 | * Rules for inbound and outbound traffic 245 | * Rules have numbers and are evaluated from low to high; the first matching rule is applied 246 | * A custom NACL denies everything in and out by default (the default NACL allows everything) 247 | * *Stateless* 248 | * Support *allow* and *deny* rules 249 | * Can block IP addresses (Security groups can't) 250 | * **Cannot** block URLs (forward proxies can) 251 | * **VPC Flow Logs** 252 | * Capture information about the IP traffic going to and from network interfaces in a VPC. 253 | * Contains description of networking packets, but not their payload 254 | * Log data can be published to Amazon CloudWatch Logs and Amazon S3 255 | * Can be created at 3 levels: 256 | * VPC 257 | * Subnet 258 | * Network interface 259 | 260 | 261 | ### [↖](#3_2)[↑](#3_2_3)[↓](#3_2_4_1) Structure & Packet Flow 262 | 263 | #### [↖](#3_2)[↑](#3_2_4)[↓](#3_2_5) Packet flow through VPC components 264 | * VPC (has *CIDR*) 265 | * Gateway (Internet or VPN) 266 | * Router 267 | * Route table (one per subnet, can be shared) 268 | * Network ACL (one per subnet, can be shared) 269 | * Subnets (CIDRs fall within the VPC's CIDR) 270 | * Security Group (scoped to the VPC, applied per instance) 271 | * Instance (needs public IP for internet communication, either ELB or Elastic IP) 272 | 273 | 274 | 275 | ### [↖](#3_2)[↑](#3_2_4_1)[↓](#3_3) Limits 276 | |Resource|Default limit| 277 | |-|-| 278 | |VPCs per region|5| 279 | |Min/max VPC size|`/28`/`/16`| 280 | |Subnets per VPC|200| 281 | |Customer gateways per region|50| 282 | |Internet gateways per region|5| 283 | |Elastic IPs per account per region|5| 284 | |VPN connections per region|50| 285 | |Route tables per region|200| 286 | |Security groups per region|500| 287 | 288 | --- 289 | 290 | 291 | ##
[↖](#top)[↑](#3_2_5)[↓](#3_3_1) Connecting VPCs to other VPCs 292 | 293 | * [Overview](#3_3_1) 294 | * [VPC Peering](#3_3_2) 295 | * [Establishing a VPC peering](#3_3_2_1) 296 | * [Longest prefix match](#3_3_2_2) 297 | * [Unsupported VPC peering configurations](#3_3_2_3) 298 | * [Limits](#3_3_2_4) 299 | * [Transit Gateway](#3_3_3) 300 | * [Overview](#3_3_3_1) 301 | * [Setting up a Transit Gateway](#3_3_3_2) 302 | * [Transit VPC (=Software VPN, not recommended any more)](#3_3_4) 303 | * [AWS PrivateLink](#3_3_5) 304 | 305 | 306 | 307 | ### [↖](#3_3)[↑](#3_3)[↓](#3_3_2) Overview 308 | 309 | ||VPC Peering|Transit Gateway| 310 | |-|-|-| 311 | |VPC-Limit|125 peerings|5,000 attachments| 312 | |Bandwidth limit|N/A (intra-region)|50 Gb/s per VPC attachment| 313 | |Management|Decentralized|Centralized| 314 | |Cost Dimensions|Data transfer|Data transfer & attachment| 315 | 316 | 317 | ### [↖](#3_3)[↑](#3_3_1)[↓](#3_3_2_1) VPC Peering 318 | * Connect VPCs through direct network routing 319 | * Cross-region, cross-account 320 | * Allows instances to communicate with each other as if they were in the same network 321 | * Full private IP connectivity between VPCs 322 | * Connectivity must be established for each pair of VPCs that needs to communicate with one another 323 | * Can reference a security group of a peered VPC (even cross-account) 324 | * Must update route tables in each VPC’s subnets to ensure instances can communicate 325 | * On AWS: 326 | * Documentation - FAQs 327 | 328 | 329 | #### [↖](#3_3)[↑](#3_3_2)[↓](#3_3_2_2) Establishing a VPC peering 330 | * Consumer VPC initiates peering request 331 | * Provider VPC accepts peering request 332 | * Route tables on both sides are updated, to ensure traffic can flow 333 | 334 | 335 | #### [↖](#3_3)[↑](#3_3_2_1)[↓](#3_3_2_3) Longest prefix match 336 | * VPC uses the longest prefix match to select the most specific route 337 | * Another way of saying it is “most specific route” 338 | 339 | 340 | #### [↖](#3_3)[↑](#3_3_2_2)[↓](#3_3_2_4)
Unsupported VPC peering configurations 341 | * *Overlapping CIDR blocks* 342 | * Cannot create a VPC peering connection between VPCs with matching or overlapping IPv4 CIDR blocks 343 | * *Transitive peering* 344 | * You have a VPC peering connection between VPC A and VPC B (pcx-aaaabbbb), and between VPC A and VPC C (pcx-aaaacccc). There is no VPC peering connection between VPC B and VPC C. You cannot route packets directly from VPC B to VPC C through VPC A. 345 | * *Edge to edge routing through a gateway or private connection* 346 | * A VPN connection or an AWS Direct Connect connection to a corporate network 347 | * An internet connection through an internet gateway 348 | * An internet connection in a private subnet through a NAT device 349 | * A gateway VPC endpoint to an AWS service; for example, an endpoint to Amazon S3. 350 | * (IPv6) A ClassicLink connection. You can enable IPv4 communication between a linked EC2-Classic instance and instances in a VPC on the other side of a VPC peering connection. However, IPv6 is not supported in EC2-Classic, so you cannot extend this connection for IPv6 communication. 351 | 352 | 353 | #### [↖](#3_3)[↑](#3_3_2_3)[↓](#3_3_3) Limits 354 | ||soft|hard| 355 | |-|-|-| 356 | |Active VPC peering connections per VPC|50|125| 357 | 358 | 359 | ### [↖](#3_3)[↑](#3_3_2_4)[↓](#3_3_3_1) Transit Gateway 360 | 361 | 362 | #### [↖](#3_3)[↑](#3_3_3)[↓](#3_3_3_2) Overview 363 | AWS Transit Gateway connects VPCs and on-premises networks through a central hub. This simplifies your network and puts an end to complex peering relationships. It acts as a cloud router – each new connection is only made once. 364 | 365 | As you expand globally, inter-Region peering connects AWS Transit Gateways together using the AWS global network. Your data is automatically encrypted, and never travels over the public internet. 
And, because of its central position, AWS Transit Gateway Network Manager has a unique view over your entire network, even connecting to Software-Defined Wide Area Network (SD-WAN) devices. 366 | * Provides transitive peering between thousands of VPCs and on-premises networks in a hub-and-spoke (star) topology 367 | * Private IP connectivity 368 | * VPCs must be in same region as Transit Gateway 369 | * However, you can peer Transit Gateways across regions 370 | * VPCs can be in different accounts 371 | * Transit Gateway Route Tables: Control which VPCs can talk to each other 372 | * Works with Direct Connect Gateway, VPN connections 373 | * Instances in a VPC can access a NAT Gateway, NLB, PrivateLink, and EFS in other VPCs attached to the AWS Transit Gateway. 374 | * Share cross-account using Resource Access Manager 375 | * AWS Resource Access Manager (AWS RAM) lets you share your resources with any AWS account or through AWS Organizations. If you have multiple AWS accounts, you can create resources centrally and use AWS RAM to share those resources with other accounts. 376 | * Supports *IP Multicast* (not supported by any other AWS service) 377 | * On AWS: 378 | * Service - FAQs - User Guide 379 | 380 | 381 | #### [↖](#3_3)[↑](#3_3_3_1)[↓](#3_3_4) Setting up a Transit Gateway 382 | * Connected VPCs route to Transit Gateway 383 | * Transit Gateway Route Table determines which VPCs can talk to each other 384 | 385 | 386 | ### [↖](#3_3)[↑](#3_3_3_2)[↓](#3_3_5) Transit VPC (=Software VPN, not recommended any more) 387 | * Not an AWS offering, newer managed solution is Transit Gateway 388 | * Uses the public internet with a software VPN solution 389 | * Allows for transitive connectivity between VPCs & locations 390 | * More complex routing rules, overlapping CIDR ranges, network-level packet filtering 391 | 392 | 393 | ### [↖](#3_3)[↑](#3_3_4)[↓](#3_4) AWS PrivateLink 394 | ...
395 | 396 | --- 397 | 398 | 399 | ## [↖](#top)[↑](#3_3_5)[↓](#3_4_1) Extending on-premises networks to VPCs 400 | 401 | * [AWS VPN](#3_4_1) 402 | * [AWS Direct Connect](#3_4_2) 403 | 404 | 405 | 406 | ### [↖](#3_4)[↑](#3_4)[↓](#3_4_2) AWS VPN 407 | ... 408 | 409 | ### [↖](#3_4)[↑](#3_4_1)[↓](#4) AWS Direct Connect 410 | ... 411 | 412 | --- 413 | 414 | 415 | # [↖](#top)[↑](#3_4_2)[↓](#4_1) Open 416 | 417 | ## [↖](#top)[↑](#4)[↓](#4_2) Services 418 | * RAM 419 | 420 | ## [↖](#top)[↑](#4_1)[↓](#4_3) Topics 421 | * IPv4 vs IPv6 422 | * Dynamic Routing Protocols (BGP) 423 | 424 | ## [↖](#top)[↑](#4_2)[↓](#4_4) Practice/Hands-on 425 | * VPC Peering 426 | * Transit Gateway 427 | * Transit VPC 428 | * PrivateLink/Endpoint service 429 | 430 | --- 431 | 432 | 433 | ## [↖](#top)[↑](#4_3) Supporting Material 434 | * [Exam Readiness: AWS Certified Advanced Networking - Specialty](https://www.aws.training/Details/Curriculum?id=21330) (free aws training) 435 | * [AWS Networking Fundamentals](https://www.youtube.com/watch?v=hiKPPy584Mg) (youtube) 436 | 437 | -------------------------------------------------------------------------------- /developer-associate.md: -------------------------------------------------------------------------------- 1 | [toc_start]:: 2 | 3 | --- 4 | * [AWS Developer Associate](#1) 5 | * [AWS Fundamentals](#2) 6 | * [Global infrastructure](#2_1) 7 | * [Storage overview](#2_2) 8 | * [Security Concepts](#2_3) 9 | * [Services](#3) 10 | * [IAM](#3_1) 11 | * [Secure Token Service (STS)](#3_2) 12 | * [S3](#3_3) 13 | * [Dynamo DB](#3_4) 14 | * [Elastic Compute Cloud (EC2)](#3_5) 15 | * [Elastic Load Balancer (ELB)](#3_6) 16 | * [SNS](#3_7) 17 | * [SQS](#3_8) 18 | * [Cloudformation](#3_9) 19 | * [Elastic Beanstalk (EB)](#3_10) 20 | * [Simple Workflow Service (SWF)](#3_11) 21 | * [Virtual Private Cloud (VPC)](#3_12) 22 | * [Relational Database Service (RDS)](#3_13) 23 | * [Etc](#4) 24 | --- 25 | [toc_end]:: 26 | 27 | # [↖](#top)[↑](#)[↓](#2) Developer Associate 28 
| > 6/2017 - 8/2017 29 | 30 | 31 | # [↖](#top)[↑](#1)[↓](#2_1) AWS Fundamentals 32 | 33 | 34 | ## [↖](#top)[↑](#2)[↓](#2_2) Global infrastructure 35 | * **Region** - grouping of data centers 36 | * **AZ** - one or more discrete data centers in a region. Redundancy throughout AZs in one region 37 | * **Edge Location** - location to deliver cached data fast -> Use *CloudFront* CDN to cache data 38 | close to where it's being used 39 | 40 | 41 | ## [↖](#top)[↑](#2_1)[↓](#2_2_1) Storage overview 42 | 43 | ### Instance store volumes 44 | * **Temporary block storage** 45 | * Physically attached to the host computer of the instance 46 | * Useful for often-changing data like caches & buffers 47 | * *Data is lost* when EC2 instance stops or terminates (*ephemeral* data) 48 | 49 | ### Elastic Block Store (EBS) 50 | * **Permanent block storage**, independent of the instance 51 | * Attachable to running EC2 instances (same AZ) 52 | * Only accessible by a *single instance* 53 | * Can take snapshots 54 | * Can be encrypted 55 | * Stores redundantly in single AZ 56 | * Different volume options: 57 | * General purpose SSD 58 | * Provisioned IOPS 59 | * Magnetic volumes 60 | 61 | ### Elastic File System (EFS) 62 | * **Scalable file storage** for use with *Amazon EC2 instances* 63 | * Elastic storage capacity, growing and shrinking as files are added or removed 64 | * *Multiple EC2 instances* from *multiple AZs* can access an EFS file system at the same time 65 | * Stores redundantly in multiple AZs 66 | 67 | ### Amazon Glacier 68 | * Low cost, very slow retrieval 69 | * Can be integrated with S3 lifecycle policy 70 | 71 | ### Database Storage 72 | * *DynamoDB* 73 | * *RDS* 74 | * DBs on EC2 instances 75 | * *AWS Redshift* (data warehouse service) 76 | 77 | ### In-memory caching 78 | * *ElastiCache* (Memcached and Redis) 79 | * Software on EC2 instances 80 | 81 | ### Storage gateway 82 | * Integrate existing *on-premises storage* infrastructure and data with the AWS Cloud 83 | 84 | 85 |
## [↖](#top)[↑](#2_2_7)[↓](#3) Security Concepts 86 | * **Shared responsibility** environment 87 | * AWS is responsible for: 88 | * Server / Host level and below 89 | * Physical environment security 90 | * Hardware decommissioning 91 | * Traffic security (Networks, ACLs, SSL, DDOS-protection) 92 | * EC2 hypervisor isolation 93 | * User is responsible for: 94 | * IAM 95 | * MFA 96 | * Password/key-rotation 97 | * Access advisor (shows used permissions) 98 | * Trusted advisor (validates best practices) 99 | * Security groups 100 | * ACL (resource based policy) 101 | * VPC 102 | 103 | 104 | # [↖](#top)[↑](#2_3)[↓](#3_1) Services 105 | 106 | ## [↖](#top)[↑](#3)[↓](#3_1_1) IAM 107 | IAM is a global service that helps to securely control access to AWS resources. 108 | 109 | * **Users** hold credentials 110 | * **Groups** hold users, typically only provides permission to assume a role 111 | * **Roles** hold policies. 112 | * Can have **trust relationships** with trusted entities that can *assume* this role 113 | * **Policies** can be attached to users, groups or roles (preferred) 114 | * An **instance profile** is a container for an IAM role that you can use to pass role information to an 115 | EC2 instance when the instance starts. 
116 | * Users and / or services assume roles 117 | 118 | 119 | ### Policies 120 | * Any actions on resources that are not explicitly allowed are **denied by default** 121 | * Structure 122 | * **E** - `effect` (*allow* / *deny*) 123 | * What the effect will be when the user requests the specific action 124 | * **P** - `principal` (*ARN*) 125 | * The account or user who is allowed access to the actions and resources in the statement 126 | * IAM policies do not have a principal (because they are attached to users, groups or roles) 127 | * **A** - `action` or `notaction` 128 | * Describes the specific action or actions that will be allowed or denied 129 | * **R** - `resource` or `notresource` 130 | * Specifies the object or objects that the statement covers 131 | * **C** - `condition` 132 | * Specifies conditions for when a policy is in effect 133 | * Can use **policy variables** 134 | * `aws:currentTime`, `aws:userid`, ... 135 | 136 | ``` 137 | { 138 | "Version": "2012-10-17", 139 | "Statement": [ 140 | { 141 | "Effect": "Allow", 142 | "Action": "s3:ListAllMyBuckets", 143 | "Resource": "arn:aws:s3:::*" 144 | }, 145 | { 146 | "Effect": "Allow", 147 | "Action": [ 148 | "s3:ListBucket", 149 | "s3:GetBucketLocation" 150 | ], 151 | "Resource": "arn:aws:s3:::productionapp" 152 | }, 153 | { 154 | "Effect": "Allow", 155 | "Action": [ 156 | "s3:GetObject", 157 | "s3:PutObject", 158 | "s3:DeleteObject" 159 | ], 160 | "Resource": "arn:aws:s3:::productionapp/*" 161 | } 162 | ] 163 | } 164 | ``` 165 | 166 | #### IAM Policies 167 | * Managed policies (the new way) 168 | * Can be attached to multiple users, groups and roles 169 | * AWS managed policies 170 | * Updated by AWS when new APIs come out 171 | * Inline policies (the old way) 172 | 173 | 174 | ### Limits 175 | .|.
176 | -|- 177 | Groups per account|100 178 | Instance profiles|100 179 | Roles|500 180 | Server certificates|20 181 | Users|5000 182 | 183 | 184 | ## [↖](#top)[↑](#3_1_2)[↓](#3_2_1) Secure Token Service (STS) 185 | * Grants **temporary access** to authenticated users 186 | * IAM users 187 | * Web-based identity providers (Google, Facebook, ...) 188 | * Organization's existing identity system 189 | * Returns **temporary credentials** that expire after some time: 190 | * Access key 191 | * Session token 192 | 193 | 194 | ### Terms 195 | * **Federation** 196 | * Trust relationship between identity provider and AWS 197 | * **Identity broker** 198 | * Broker in charge of mapping user to the right set of credentials 199 | * **Identity store** 200 | * E.g. Google or Facebook 201 | * **Identities** 202 | * Users 203 | 204 | 205 | ### Scenarios 206 | * Temporary credentials with EC2 207 | * Assign IAM role to instance 208 | * Get temp credentials from *instance metadata* 209 | * Temporary credentials with SDK 210 | * Call `assumeRole`, extract temp credentials 211 | * Options for temporary credentials with API calls 212 | * *Sign request* with temp credentials 213 | * Add AC / SK to request (*header* or *query string*) 214 | 215 | 216 | ## [↖](#top)[↑](#3_2_2)[↓](#3_3_1) S3 217 | 218 | Amazon Simple Storage Service (S3) is object storage with a simple web service interface to store and 219 | retrieve any amount of data from anywhere on the web. It is designed to deliver 99.999999999% (11 nines) durability and 220 | scale past trillions of objects worldwide. 221 | 222 | * **Key**-**value** storage (folder-like structure is only a UI representation) 223 | * **Bucket** size is unlimited. Objects from 0B to 5TB.
224 | * HA and scalable, transparent data partitioning 225 | * Bucket lifecycle events can trigger *SNS*, *SQS* or *AWS Lambda* 226 | * New object created events 227 | * Object removal events 228 | * Reduced Redundancy Storage (RRS) object lost event 229 | * Bucket names have to be globally unique, should comply with DNS naming conventions. 230 | * `http://bucket.s3.amazonaws.com` 231 | * `http://bucket.s3-aws-region.amazonaws.com` 232 | * `http://s3.amazonaws.com/bucket` 233 | * `http://s3-aws-region.amazonaws.com/bucket` 234 | 235 | 236 | ### Performance & Consistency 237 | * Bucket operations **get** - **list** - **put** - **delete** - **head** 238 | * Implemented through *http* operations: `GET` - `PUT` - `DELETE` - `HEAD` 239 | * *Read-after-write consistency* for `PUT` of *new* objects. 240 | * *Eventual consistency* for *overwrite* `PUT` and `DELETE` (stale reads but low latency). 241 | * Can only delete a bucket that is empty. 242 | * *Scales* automatically, up to a certain limit: 243 | * Consistent: 244 | * `>100 PUT/LIST/DELETE/s` 245 | * `>300 GET/s` 246 | * Bursts: 247 | * `>300 PUT/LIST/DELETE/s` 248 | * `>800 GET/s` 249 | * Key names are used to determine which partition to store the object in. 250 | * Make sure keys are spread out (not sequential) 251 | * E.g. by adding a random prefix to the key name 252 | * For `GET` requests put *AWS CloudFront* in front of S3 bucket 253 | * Internal caching 254 | * Reduced latency - objects are physically closer to the consumer.
255 | * **Multipart upload** 256 | * Recommended for objects >=100MB, mandatory for >=5GB 257 | * Supports parallel uploads 258 | * Can pause & resume 259 | * Can upload file while it's being created 260 | * 3 step process: 261 | * Initiate multipart upload 262 | * `POST /ObjectName?uploads HTTP/1.1` 263 | * Upload of all parts 264 | * `PUT /ObjectName?partNumber=PartNumber&uploadId=UploadId HTTP/1.1` 265 | * Complete Multipart upload 266 | * ``` 267 | POST /ObjectName?uploadId=UploadId HTTP/1.1 268 | ... 269 | ``` 270 | 271 | 272 | ### Hosting Static Websites 273 | `.s3-website-.amazonaws.com` 274 | * Bucket name *must* match domain name. Every hosted bucket receives its own URL 275 | * Use *AWS Route 53* to integrate custom domains (also to automatically fail-over from dynamic website) 276 | * Specify `index` & `error` documents 277 | * In *AWS Route 53*: create hosted zone & record set 278 | * Might need to add CORS configuration to bucket (cross origin resource sharing) 279 | 280 | 281 | ### Access Control 282 | * **Effect** – This can be either allow or deny 283 | * **Principal** – Account or user who is allowed access to the actions and resources in the statement 284 | * **Actions** – For each resource, S3 supports a set of operations 285 | * **Resources** – Buckets and objects are the resources 286 | * Authorization works as a *union* of **IAM** & **bucket policies** and **bucket ACLs** 287 | 288 | #### Defaults 289 | * Bucket is *owned* by the AWS account that created it 290 | * Bucket ownership is not transferable 291 | * Bucket owner gets full permission (ACL) 292 | * The person paying the bills always has full control. 293 | * A person uploading an object into a bucket owns it by default.
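
The *union* of policies described under Access Control can be illustrated with a small Python sketch. It is a deliberately simplified toy model, not AWS's actual evaluation logic: it only shows that nothing is allowed by default, that any matching `Allow` grants access, and that an explicit `Deny` overrides every `Allow`. The function name `is_allowed`, the users `alice` and `bob`, and the bucket layout are hypothetical, and the IAM statements carry a `Principal` field here only so that the toy can merge them with bucket policy statements.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, principal, action, resource):
    """Toy evaluation: explicit Deny wins, otherwise any matching Allow grants access."""
    allowed = False  # nothing matched yet, so the request is implicitly denied
    for stmt in statements:
        matches = (
            fnmatchcase(principal, stmt.get("Principal", "*"))
            and any(fnmatchcase(action, a) for a in stmt["Action"])
            and any(fnmatchcase(resource, r) for r in stmt["Resource"])
        )
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit deny overrides every allow
        allowed = True
    return allowed


# Hypothetical IAM policy attached to user "alice" (tagged with her name for the toy model).
iam_statements = [
    {"Effect": "Allow", "Principal": "alice",
     "Action": ["s3:GetObject"], "Resource": ["arn:aws:s3:::mybucket/*"]},
]

# Hypothetical bucket policy: public read on one prefix, nobody may touch another.
bucket_statements = [
    {"Effect": "Allow", "Principal": "*",
     "Action": ["s3:GetObject"], "Resource": ["arn:aws:s3:::mybucket/public/*"]},
    {"Effect": "Deny", "Principal": "*",
     "Action": ["s3:*"], "Resource": ["arn:aws:s3:::mybucket/secret/*"]},
]

all_statements = iam_statements + bucket_statements

print(is_allowed(all_statements, "alice", "s3:GetObject", "arn:aws:s3:::mybucket/docs/a.txt"))
print(is_allowed(all_statements, "bob", "s3:GetObject", "arn:aws:s3:::mybucket/docs/a.txt"))
print(is_allowed(all_statements, "alice", "s3:GetObject", "arn:aws:s3:::mybucket/secret/a.txt"))
```

Real evaluation involves more inputs (ACLs, cross-account rules, conditions), so treat this only as a memory aid for the allow / deny precedence.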
294 | 295 | #### IAM 296 | * IAM policies (in general) specify what actions are allowed or denied on what AWS resources 297 | * Defined as JSON 298 | * Attached to IAM users, groups, or roles (so they cannot grant access to anonymous users) 299 | * Use if you’re more interested in *“What can this user do in AWS?”* 300 | 301 | #### Bucket policies 302 | * Specify what actions are allowed or denied for which principals on the bucket that the policy is 303 | attached to 304 | * Defined as JSON 305 | * Attached *only* to S3 buckets. Can however affect objects in buckets. 306 | * Contain *principal* element (unnecessary for IAM) 307 | * Use if you’re more interested in *“Who can access this S3 bucket?”* 308 | * Easiest way to grant *cross-account permissions* for all `s3:*` permission. (Cannot do this with ACLs.) 309 | 310 | #### ACLs 311 | * Defined as XML. Legacy, not recommended any more. 312 | * Can 313 | * be attached to individual objects (bucket policies only bucket level) 314 | * control access to objects uploaded into a bucket from a *different* account. 315 | * Cannot.. 316 | * have conditions 317 | * explicitly deny actions 318 | * grant permission to bucket sub-resources (eg. lifecycle or static website configurations) 319 | * Other than *object ACL*s there are *bucket ACL*s as well - only for writing access log objects to a 320 | bucket. 321 | 322 | #### How to specify resources in a policy: 323 | .|. 324 | -|- 325 | `arn:partition:service:region:namespace:relative-id`|`arn:aws:s3:::mybucket` 326 | `arn:aws:s3:::*`|All buckets and objects in account 327 | `arn:aws:s3:::mybucket`|`mybucket` 328 | `arn:aws:s3:::mybucket/*`|All objects in `mybucket` 329 | `arn:aws:s3:::mybucket/mykey`|`mykey` in `mybucket` 330 | `arn:aws:s3:::mybucket/developers/${aws:username}/`|folder matching the accessing user's name 331 | 332 | #### Pre-signed URLs 333 | All objects are private by default. Only the object owner has permission to access these objects.
334 | However, the object owner can optionally share objects with others by creating a **pre-signed URL**, 335 | using their own security credentials, to grant time-limited permission to download the objects. 336 | 337 | 338 | ### Logging 339 | * *AWS CloudTrail* logs S3-API calls for bucket-level operations (and much other information) and 340 | stores them in an S3 bucket. Could also send email notifications or trigger *SNS* notifications for 341 | specific events. 342 | * *S3 Server Access Logs* log on object level. 343 | 344 | 345 | ### Versioning 346 | * Works on bucket level (for *all* objects) 347 | * Versioning can either be *unversioned* (default), *enabled* or *suspended* 348 | * **Version ids** are automatically assigned to objects 349 | * Ids cannot be changed. 350 | * As long as versioning is *disabled*, id is set to `null` 351 | * Once enabled, versioning can only be suspended (but not disabled) 352 | * `PUT` creates a new version, `GET` returns the latest version. Specific versions can be requested. 353 | * `DELETE` (without version) marks latest version as deleted and returns a `404` for subsequent `GET`s. 354 | * Older versions (pre-delete) can still be requested. 355 | * Restore old version by deleting the new version or by copying the old version back into the bucket. 356 | * `DELETE` (with a version) permanently deletes that version. 357 | * If versioning is *suspended*, S3 automatically adds a `null` version ID to every subsequent 358 | object stored thereafter 359 | * *Lifecycle Management policies* can automatically handle old versions, e.g. permanently delete or 360 | move to *AWS Glacier*. 361 | * Different versions of the same object can have different permissions.
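
The `PUT` / `GET` / `DELETE` behaviour of a versioning-enabled bucket listed above can be sketched as a small in-memory model. This is a toy illustration in Python, not an S3 client: the class name `VersionedBucket`, the `v1`, `v2`, ... ids and the tuple return values are all assumptions made for the example, and requesting a delete marker by id simply returns `404` here.

```python
import itertools


class VersionedBucket:
    """In-memory toy model of a versioning-enabled bucket (not a real S3 client)."""

    def __init__(self):
        # key -> list of (version_id, body); a body of None marks a delete marker
        self._versions = {}
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Every PUT appends a new version and returns its id."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        """Return (status, body); without a version id the latest version is used."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                return 404, None  # no object, or latest version is a delete marker
            return 200, versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return 200, body
        return 404, None

    def delete(self, key, version_id=None):
        """Without a version id: add a delete marker. With one: remove that version for good."""
        if version_id is None:
            return self.put(key, None)
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id


bucket = VersionedBucket()
first = bucket.put("report.txt", "draft")
second = bucket.put("report.txt", "final")
marker = bucket.delete("report.txt")       # plain DELETE only hides the object
print(bucket.get("report.txt"))            # latest version is the delete marker
print(bucket.get("report.txt", first))     # older versions stay reachable
bucket.delete("report.txt", marker)        # removing the marker restores the object
print(bucket.get("report.txt"))
```

Deleting the marker (or the newest version) by id is what "restores" an older version in this model, which mirrors the restore bullet above.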
362 | 363 | 364 | ### Encryption 365 | 366 | #### Protecting data in transit 367 | * Using an AWS KMS–Managed Customer Master Key (CMK) 368 | * Before *uploading* to S3, the client makes a request to KMS and receives a plain text encryption key and 369 | a cipher blob, which it uploads to S3 as object metadata. Decrypt by sending the cipher blob to KMS, retrieving 370 | the plain text key back, and using it for decryption. 371 | * When *downloading* from S3, the client first downloads the encrypted object from Amazon S3 along 372 | with the cipher blob version of the data encryption key stored as object metadata. The client then 373 | sends the cipher blob to AWS KMS to get the plain text version of the same, so that it can decrypt 374 | the object data. 375 | * Using a Client-Side Master Key 376 | * Client provides a master key, S3 client generates random data 377 | key and encrypts with client's master key. 378 | * *Uploads* material description as part of the object metadata. 379 | * On *download* S3 client uses metadata to determine the right master key to use for decryption. 380 | * Use *SSL encryption* 381 | 382 | #### Protecting data at rest 383 | * Uses *AES-256* (or others) 384 | * Encryption can be enforced via bucket policy. 385 | * Enable server-side encryption by adding specific header to request (`x-amz-server-side-encryption`). 386 | * Server-Side Encryption with *Amazon S3-Managed Keys* (SSE-S3) 387 | * Each object is encrypted with a unique key employing strong multi-factor encryption 388 | * Furthermore it encrypts the key itself with a master key that is rotated regularly 389 | * Server-Side Encryption with *AWS KMS-Managed Keys* (SSE-KMS) 390 | * Similar to SSE-S3, with extra benefits 391 | * Separate permissions for the use of an envelope key 392 | * Has audit trail 393 | * Server-Side Encryption with *Customer-Provided Keys* (SSE-C) 394 | * Key is not stored with AWS (stores salted HMAC value instead) 395 | 396 | 397 | ### Storage classes 398 | .|. 
399 | -|- 400 | S3 Standard|Durability 11x9 401 | |Availability 4x9 402 | S3 IA (infrequent access)|Durability 11x9 403 | |Availability 3x9 404 | S3 RRS (reduced redundancy storage)|Durability 4x9 405 | |Availability 4x9 406 | 407 | 408 | ### Request/response headers 409 | Request|Response 410 | -|- 411 | `x-amz-content-sha256`|`x-amz-delete-marker` 412 | `x-amz-date`|`x-amz-id-2 ` 413 | `x-amz-security-token`|`x-amz-request-id` 414 | |`x-amz-version-id ` 415 | 416 | 417 | ### Error codes 418 | .|. 419 | -|- 420 | 400 Bad Request|`ExpiredToken` 421 | 400 Bad Request|`InvalidToken` 422 | 400 Bad Request|`InvalidArgument` 423 | 400 Bad Request|`InvalidRequest` 424 | 400 Bad Request|`IncompleteBody` 425 | 400 Bad Request|`IncompleteDigest` 426 | 400 Bad Request|`InvalidBucketName` 427 | 403 Forbidden|`AccessDenied` 428 | 403 Forbidden|`InvalidAccessKeyId` 429 | 404 Not Found|`NoSuchBucket` 430 | 404 Not Found|`NoSuchKey` 431 | 409 Conflict|`BucketAlreadyExists` 432 | 409 Conflict|`BucketNotEmpty` 433 | 434 | 435 | ### Limits 436 | .|. 
437 | -|- 438 | Buckets per account|100 439 | Bucket policy max size|20KB 440 | Object size|0B to 5TB 441 | Object size in a single `PUT`|5GB 442 | 443 | 444 | ## [↖](#top)[↑](#3_3_10)[↓](#3_4_1) Dynamo DB 445 | 446 | 447 | ### Overview 448 | * Fully managed **NoSQL** database 449 | * *HA* through different AZs, automatically spreads data and traffic across servers 450 | * 3 geographically distributed regions per table 451 | * Can scale up and down depending on demand (no downtime, no performance degradation) 452 | * Built-in monitoring 453 | * User controlled read/write capacity (recently added: *auto-scaling*) 454 | * Big data: Integrates with *AWS Elastic MapReduce* and *Redshift* 455 | * No joins - create references to other tables manually (`table1#something`) 456 | * Option between **eventual consistency** or **strong consistency** 457 | * Conditional updates and concurrency control (**atomic counters**) 458 | 459 | 460 | ### Core components 461 | * A **table** is a collection of items. 462 | * Can be updated through a single `UpdateTable` command at a time (`ACTIVE` -> `UPDATING`) 463 | * An **item** is a group of one or more attributes that is uniquely identifiable among all of the 464 | other items. (*row* in a traditional db) 465 | * An **attribute** is a fundamental data element, something that does not need to be broken down any 466 | further. Can be nested up to 32 levels. (*column* in a traditional db) 467 | * **Primary keys** are used to uniquely identify each item in a table.
Apart from that DynamoDB is 468 | *schemaless*, which means that neither the attributes nor their data types need to be defined 469 | beforehand 470 | * **Secondary indexes** are used to provide more querying flexibility 471 | * **Control plane** operations create and manage DynamoDB tables 472 | * **Data plane** operations perform CRUD actions on data in a table 473 | * **DynamoDB streams** operations capture data modification events in DynamoDB tables 474 | 475 | 476 | ### Keys and indexes 477 | 478 | #### Partition key (PK) 479 | * **Partition key** is also called **hash attribute** or **primary key** 480 | * Must be unique, used for internal hash function (*unordered*) 481 | * Used to retrieve data 482 | 483 | #### PK & Sort key 484 | * **Composite PK**: *index* composed of hashed PK (*unordered*) and SK (*ordered*) 485 | * **Sort key** is also called **range attribute** or **range key** 486 | * Different items can have the same *PK*, must have different *SK* 487 | 488 | 489 | ### Secondary indexes 490 | * Associated with exactly one table, from which it obtains its data 491 | * Allows to query or scan data by an *alternate key* (other than PK/SK) 492 | * All secondary indexes are automatically maintained by DynamoDB as sparse objects 493 | * Items will only appear in an index if they exist in the base table 494 | * Makes querying very efficient 495 | * Only for `read` operations, `write` is not supported.
496 | * Tables with secondary indexes need to be created sequentially (`LimitExceededException`) 497 | 498 | #### Projected attributes 499 | * Attributes copied from the base table into an *index* 500 | * Makes them queryable 501 | * Different projection types 502 | * *KEYS_ONLY* - Only the index and primary keys are projected into the index 503 | * *INCLUDE* - Only the specified table attributes are projected into the index 504 | * *ALL* - All of the table attributes are projected into the index 505 | 506 | #### Local secondary index 507 | * Uses the *same PK*, but offers different *SK* 508 | * Every partition of a local secondary index is scoped to a base table partition that has the same 509 | partition key value 510 | * Local secondary indexes are extra tables that dynamo keeps in the background 511 | * Cannot be created after the base table has already been created. 512 | * Can choose *eventual consistency* or *strong consistency* at *creation* time 513 | * *Local* as in "co-located on the same partition" 514 | * Can request *not-projected* attributes for query or scan operation 515 | * Consumes read/write throughput from the original table. 516 | 517 | #### Global secondary index 518 | * Uses *different PK* and offers additional *SK* (or none). 519 | * *PK* does not have to be unique (unlike base table) 520 | * Queries on the global index can span all of the data in the base table, across all partitions 521 | * Can be created after the base table has already been created. 
522 | * Only support *eventual consistency* 523 | * Have their own provisioned read/write throughput 524 | * Global secondary keys are distributed transactions across multiple partitions 525 | * Global as in "over many partitions" 526 | * Cannot request not-projected attributes for query or scan operation 527 | 528 | 529 | ### Capacity provisioning 530 | * Unit for operations: 531 | * 1 *strongly consistent* `read` per second (up to 4KB/s) 532 | * 2 *eventually consistent* `read` per second (up to 8KB/s) 533 | * 1 `write` per second (up to 1KB) 534 | * Algorithm 535 | 536 | .|. 537 | -|- 538 | .|*300 strongly consistent reads of 11KB per minute* 539 | Calculate read / writes per second|`300r/60s = 5r/s` 540 | Multiply with payload factor (rounded up)|`5r/s * ceil(11KB/4KB) = 5 * 3 = 15cu` 541 | If eventually consistent, divide by 2 (rounded up)|`15cu / 2 = 7.5 -> 8cu` 542 | 543 | * More throughput -> more reads / writes per second 544 | * Exceeding allocated throughput may result in throttling of the operation. Check return code. 545 | * Failing to distribute data across partitions can result in `ProvisionedThroughputExceededException` 546 | * Local secondary index 547 | * `Read` 548 | * If read only index keys and projected attributes use same calculation 549 | * If more than index keys and projected attributes add extra latency and read capacity cost 550 | * Use read capacity from the index *and* for every item from the table 551 | * `Write` (to items in the base table that are indexed) 552 | * 1 for adding an item 553 | * 2 for changing the value of an item 554 | * 1 for deleting an item 555 | * Global secondary index 556 | * Read 557 | * Only supports eventual consistency, so 8KB/s base unit 558 | * Calculated the same as in tables, except that the size of the index entries is used instead 559 | of the size of the entire item 560 | * Write (to items in the base table that are indexed) 561 | * Putting, Updating, or Deleting items in a table consumes the index' write capacity units 562 | 563 | 564 | ### Query
and scan operation 565 | 566 | #### Query 567 | * Finds items based on PK values 568 | * Can *only* query any table or secondary index that have a composite primary key 569 | * *Has* to use PK, *can* specify SK 570 | * Very efficient, only searches index 571 | * Result is ordered by SK 572 | * Returns all attributes or only specified subset 573 | * Eventually consistent per default, can request consistent read 574 | * Can use *conditional attributes* 575 | 576 | 577 | ### Scan 578 | * Reads every item in table (much worse performance than queries) 579 | * Can *filter* result (slows down performance) 580 | * The larger the data set in the table the slower the scan 581 | * *Eventually consistent* reads by default, can specify *strongly consistent* 582 | * Try to avoid scans 583 | * Use *Page Size* to limit how much data is retrieved at the same time 584 | 585 | 586 | ### Atomic and conditional updates 587 | 588 | #### Atomic Counters 589 | * Increment or decrement the value of an existing attribute without interfering with other writes 590 | * Requests are applied in the order they are received 591 | * *Not idempotent* 592 | 593 | #### Conditional updates 594 | * Only proceed if condition is met 595 | * *Idempotent* 596 | 597 | 598 | ### How to grant temporary access 599 | * *Web Identity Federation* - use existing OpenId provider, eg. Amazon, Google, Facebook 600 | * *Amazon Cognito* does Web Identity Federation, also synchronizes app data 601 | * *IAM* - contains role for users to assume 602 | 603 | 604 | ### API 605 | * Control Plane 606 | 607 | Create and manage tables|. 
608 | -|- 609 | `CreateTable`|Creates a table and specifies the primary index used for data access 610 | `DescribeTable`|Returns information such as primary key schema, throughput settings, index information 611 | `ListTables`|Returns the names of all of your tables in a list 612 | `UpdateTable`|Modifies the settings of a table or its indexes 613 | `DeleteTable`|Removes a table and all of its dependent objects 614 | 615 | * Data Plane 616 | 617 | Creating data|.|conditional? 618 | -|-|- 619 | `PutItem`|Creates a new item, or replaces an old item with a new item|yes 620 | `BatchWriteItem`|Puts or deletes multiple items in one or more tables|no 621 | |Called in a loop it typically checks for unprocessed items and submits a new `BWI` request for those 622 | 623 | Reading data|.|conditional? 624 | -|-|- 625 | `GetItem`|Returns a set of Attributes for an item that matches the PK|no 626 | `BatchGetItem`|Returns the attributes for multiple items from multiple tables using their PKs|no 627 | `Query`|Gets one or more items using the table *PK*, or from a secondary index using the index key|no 628 | `Scan`|Gets all items and attributes by performing a full scan across the table or a secondary index|no 629 | 630 | Updating data|.|conditional? 631 | -|-|- 632 | `UpdateItem`|Modifies one or more attributes in an item|yes 633 | 634 | Deleting data|.|conditional? 635 | -|-|- 636 | `DeleteItem`|Deletes a single item in a table by primary key|yes 637 | `BatchWriteItem`|Puts or deletes multiple items in one or more tables|no 638 | |Called in a loop it typically checks for unprocessed items and submits a new `BWI` request for those 639 | 640 | 641 | ### Limits 642 | .|. 
643 | -|- 644 | Tables per account/region|256 645 | Max read / write per table partition|3000 reads / 1000 writes 646 | Partition key|min 1B, max 2048B 647 | Sort key|min 1B, max 1024B 648 | Local secondary index per table|5 649 | Global secondary index per table|5 650 | Item size|1B to 400KB, including name & value 651 | Simultaneous `CreateTable`, `UpdateTable`, `DeleteTable`|up to 10 652 | Single `BatchGetItem`|Max 100 items, must be <16MB 653 | Single `BatchWriteItem`|Up to 25 *PutItem* or *DeleteItem*, must be <16MB 654 | *Query* and *Scan* result set limit|1MB data per call 655 | 656 | 657 | ## [↖](#top)[↑](#3_4_11)[↓](#3_5_1) Elastic Compute Cloud (EC2) 658 | * Resizable **compute capacity** in the cloud 659 | * Amazon Machine Image (AMI) 660 | * Unit of deployment 661 | * Packaged-up environment that includes all the necessary bits to set up and boot an instance 662 | * Can create AMI from configured *EC2* instance 663 | 664 | 665 | ### Different options 666 | * Payment models 667 | * **On-demand instances** 668 | * Pay for compute capacity by the hour, can be terminated by Amazon 669 | * **Reserved instances** 670 | * Provide a significant discount compared to On-Demand pricing and 671 | provide a capacity reservation when used in a specific Availability Zone 672 | * Can transfer between AZs 673 | * **Spot instances** 674 | * Bid on spare Amazon EC2 computing capacity, not available for all instance types 675 | * **Dedicated hosts** 676 | * A physical server with EC2 instance capacity fully dedicated to your use 677 | * Instance sizes & types 678 | * *Sizes*: nano / micro / small / medium / large 679 | * *Types*: general purpose / compute optimized / memory optimized / gpu / storage optimized 680 | * Pricing by 681 | * Compute time 682 | * Data transfer 683 | * Storage 684 | * Elastic IP address 685 | * Monitoring 686 | * Elastic load balancer 687 | 688 | 689 | ### Instance metadata & userdata 690 | * Data about an instance that can be used to configure
or manage the running instance 691 | * Available from *running instance* under `http://169.254.169.254/latest/meta-data/` 692 | * Contains various data about the current instance (static & dynamic) 693 | * Can specify user-data 694 | * Allows to launch individual instances from same AMI 695 | 696 | 697 | ### API 698 | .|. 699 | -|- 700 | `DescribeImages`|Describe an Amazon Machine Image 701 | `RegisterImage`|Final process of creating an AMI 702 | 703 | 704 | ### Limits: 705 | .|. 706 | -|- 707 | Elastic IP addresses for EC2-Classic|5 708 | 709 | 710 | ## [↖](#top)[↑](#3_5_4)[↓](#3_6_1) Elastic Load Balancer (ELB) 711 | * **Distributes traffic** between instances that belong to the ELB group 712 | * Stops sending requests to unhealthy instances 713 | * Can store SSL certificates (offloads encryption to load balancer level) 714 | * Can configure session stickiness: 715 | * *LB issued cookie* 716 | * Easy to implement, not best balancing 717 | * *Application issued cookie* 718 | * Cookies based on application session, marginally better 719 | * *ElastiCache* 720 | * Better distribution, requires state to be stored in *DB* or in *EC* memory. 721 | * EC memory is the much better option 722 | * Relies on DNS / *Route53* 723 | * Can route traffic into instances running in private subnets 724 | * Needs to be configured with (empty) public subnets though. 725 | 726 | 727 | ### Limits: 728 | .|. 
729 | -|- 730 | Total load balancers per region (ALB & ELB)|20 731 | 732 | 733 | ## [↖](#top)[↑](#3_6_1)[↓](#3_7_1) SNS 734 | * **Publishes** messages to **subscribers** via topic 735 | * **Pub-Sub-Service** for messaging 736 | * Scenarios: 737 | * *Fanout*: Many subscribers process events in parallel and asynchronously 738 | * *Push to SQS*: Services pull from SQS, when they become available 739 | * *Alert*: Notification triggered by event or threshold 740 | * **Mobile Notifications** to mobile devices 741 | * Sends *push notifications* to iOS, Android, Fire OS, Windows and Baidu-based devices 742 | 743 | 744 | ### Components 745 | * **Publisher** (producer) 746 | * Communicates asynchronously with subscribers 747 | * Policies determine which topic(s) publishers can write to 748 | * **Topics** 749 | * Unique name up to 256 characters 750 | * Stored redundantly on multiple servers and datacenters 751 | * **Subscribers** (consumer) 752 | * Subscribes to topic 753 | * Endpoints like mobile app, web server, email, *AWS SQS*, *AWS Lambda* 754 | * Email subscriptions need to be confirmed 755 | * **Messages** 756 | * Json-formatted key-value pairs 757 | * Fixed set + additional attributes if required 758 | * `POST`s to https endpoints with specific headers 759 | * Contains topics- and subscription-ARN 760 | * To identify messages without parsing the body 761 | * Up to 10 for SQS.
762 | * Provider-specific for mobile push notifications 763 | * Messages can be signed and verified 764 | * Message data: 765 | * `Message`, `MessageId`, `Signature`, `SignatureVersion`, `SigningCertURL`, `Subject`, 766 | `Timestamp`, `TopicArn`, `Type`, `UnsubscribeURL` 767 | 768 | 769 | ### Managing access 770 | * Owner creates topic and controls access to it 771 | * Can use own API (Access Control) and / or *IAM*, similar to *S3* 772 | * Access control policies 773 | * Default denies, needs explicit allow 774 | * Can grant access across account (API call: *AddPermission*) 775 | * IAM 776 | * More fine grained or very coarse, can include conditions 777 | * Can grant temporary security credentials 778 | 779 | 780 | ### Mobile push notifications 781 | * Does not push to endpoint, but to PN-service (platform/provider specific) 782 | 1. Request *credentials* from mobile platforms (ADM, APNS, etc...) 783 | 2. Request a *token* from mobile platforms (*registrations id* for some platforms) 784 | 3. Create a *platform application object* 785 | 4. Create a *platform endpoint object* 786 | 5. Publish a *message* to the mobile endpoint 787 | * A single message can contain different data for different platforms 788 | 789 | 790 | ### API 791 | .|. 792 | -|- 793 | `CreateTopic`|Create a new topic. 794 | `DeleteTopic`|Delete a topic and all its subscriptions. 795 | `Publish`| Publish a new message to the topic. 796 | `ListTopics`|List of topics owned by a particular user (AWS ID). 797 | `ListSubscriptions`|List subscriptions owned by a particular user (AWS ID) 798 | `ListSubscriptionsByTopic`|List of subscriptions for a particular topic 799 | `Subscribe`|Register a new subscription on a topic, will generate a confirmation message from Amazon SNS 800 | `ConfirmSubscription`|Respond to a confirmation message, confirming to receive notifications from the topic 801 | `UnSubscribe`|Cancel a previously registered subscription 802 | 803 | 804 | ### Limits: 805 | .|. 
806 | -|- 807 | Topics per account|100,000 808 | 809 | 810 | ## [↖](#top)[↑](#3_7_5)[↓](#3_8_1) SQS 811 | * Scalable **message queue** service 812 | * Allows *loose coupling* and *asynchronous processing* 813 | * **Pull** from *SQS* (*Push* to *SNS*) 814 | * PCI compliant 815 | * Allows for asynchronous processing 816 | * Protection against data loss on application failure 817 | 818 | 819 | ### Core features 820 | * Redundant infrastructure 821 | * Multiple readers / writers at the same time 822 | * Access control via *SQS policies* (similar to *IAM*) 823 | * **Standard queue** 824 | * Guarantees message delivery *at least once* 825 | * *No guarantee on message order* 826 | * No guarantee on not receiving duplicates (app has to deal with it) 827 | * **FIFO queue** 828 | * Guaranteed order 829 | * Exactly-once processing 830 | * *Message groups* - multiple ordered message groups within a single queue 831 | * Name ends in `.fifo` 832 | * **Delay queues** 833 | * Controls when a message becomes available 834 | * Between 0s and 15min, default 0s 835 | * **Visibility timeout** 836 | * Controls when a polled message becomes visible again 837 | * Configurable and extendable for individual messages 838 | * Between 0s and 12h, default 30s 839 | * **Message retention period** 840 | * Amount of time a message will live in the queue if it's not deleted 841 | * Between 1min and 14d, default 4d 842 | * **In flight message** 843 | * Sent to a client but have not yet been deleted or have not yet reached the end of their 844 | visibility window 845 | * **Deadletter queue** 846 | * Queue that other queues can send messages to when these were not successfully 847 | processed. 848 | * **Receive message wait time** 849 | * Value >0 enables *long polling* 850 | * Between 0s and 20s, default 0s (*short polling*) 851 | 852 | 853 | ### Message lifecycle 854 | * Component 1 sends message A to queue 855 | * `SendMessage`/`SendMessageBatch` 856 | * Component 2 retrieves A from queue. 
857 | * A remains in queue while it's being processed, but is not returned to any other components 858 | * Message is now considered to be *in flight*. 859 | * `ReceiveMessage` 860 | * Component 2 deletes A from queue during visibility timeout 861 | * Otherwise it will get processed again 862 | * SQS will never delete messages 863 | * `DeleteMessage`/`DeleteMessageBatch` 864 | 865 | 866 | ### Long polling vs short polling 867 | * **Short polling** returns immediately, could be *false empty* (e.g. message not fully propagated yet) 868 | * **Long polling** won't return unless there's a message in the queue or receive message wait time is 869 | exceeded. Also checks *every server* to avoid false empty responses 870 | 871 | 872 | ### API 873 | .|. 874 | -|- 875 | `SendMessage`/`SendMessageBatch`|Delivers a message to the specified queue (up to 10 per batch, <= 256KB) 876 | `ReceiveMessage`|Retrieves one or more messages (up to 10), `WaitTimeSeconds` for long poll 877 | `ChangeMessageVisibility`/`ChangeMessageVisibilityBatch`|Changes the visibility timeout of a message 878 | `DeleteMessage`/`DeleteMessageBatch`|Deletes the specified message from the specified queue 879 | `SetQueueAttributes`|e.g. `DelaySeconds`, `MessageRetentionPeriod` 880 | `GetQueueUrl`| 881 | `CreateQueue`| 882 | `DeleteQueue`| 883 | `ListQueues`| 884 | 885 | 886 | ### Limits: 887 | .|. 
888 | -|- 889 | Max message size|256KB 890 | Max inflight messages|120,000 891 | 892 | 893 | ## [↖](#top)[↑](#3_8_5)[↓](#3_9_1) Cloudformation 894 | * Allows to create and provision **resources** in a reusable **template** fashion 895 | * A *CloudFormation* template is a `JSON` or `YAML` formatted text file 896 | * Related resources are managed in a single unit called a **stack** 897 | * All the resources in a stack are defined by the stack's *CloudFormation* template 898 | * Two ways to update a stack 899 | * *Direct update* 900 | * Create **change set** 901 | * Summary of proposed changes 902 | * Will **rollback** stack if it fails to create (can be disabled via API / console) 903 | 904 | 905 | ### Anatomy of template 906 | * *AWSTemplateFormatVersion* 907 | * *Description* 908 | * *Metadata* 909 | * Details about the template 910 | * *Parameters* 911 | * Values to pass in right before template creation 912 | * Allows validation per *regular expression* 913 | * *Mappings* 914 | * Maps keys to values (eg different values for different regions) 915 | * *Conditions* 916 | * Check values before deciding what to do 917 | * *Resources* 918 | * Creates resources 919 | * *Outputs* 920 | * Values to be exposed from the console or from API calls. 
921 | * Can be used in a different stack (*cross stack references*) 922 | 923 | 924 | ### Intrinsic Functions 925 | * Used to pass in values that are not available until runtime 926 | * Usable in *resource* properties, *metadata* attributes, and *update policy* attributes (auto-scaling) 927 | * `Fn::GetAtt` 928 | * Returns the value of an attribute from a resource in the template 929 | * `Fn::FindInMap` 930 | * Returns the value corresponding to keys in a two-level map that is declared in the *Mappings* section 931 | * `Fn::Join` 932 | * Appends a set of values into a single value, separated by the specified delimiter 933 | * `Fn::GetAZs` 934 | * Returns an array that lists *Availability Zones* for a specified region 935 | * `Fn::Select` 936 | * Returns a single object from a list of objects by index 937 | * `Fn::ImportValue` 938 | * Returns the value of an *Output* exported by another stack 939 | * `Fn::Split` 940 | * Split a string into a list of string values so that you can select an element from the resulting 941 | string list 942 | * `Fn::Sub` 943 | * Substitutes variables in an input string with values that you specify 944 | * `Ref` 945 | * Returns the value of the specified parameter or resource 946 | 947 | 948 | ### Limits: 949 | .|. 
950 | -|- 951 | Max stacks per region|200 952 | Max templates per region|unlimited 953 | Parameters|60 954 | Mappings|100 955 | Resources|200 956 | Outputs|60 957 | 958 | 959 | ## [↖](#top)[↑](#3_9_3)[↓](#3_10_1) Elastic Beanstalk (EB) 960 | * **Full stack** that provisions *capacity*, sets up *load balancing* and *auto-scaling* and configures 961 | *monitoring* 962 | * No need to create / manage infrastructure 963 | * Not good if full control of resource configuration is needed 964 | * Not everything fits into the EB model 965 | 966 | 967 | ### AWS-Stack 968 | * *EC2* instance 969 | * Instance *Security Group* 970 | * *Elastic Load Balancer* 971 | * *Load Balancer Security Group* 972 | * *Auto Scaling Group* 973 | * Automatically replaces instances if they become unavailable 974 | * *S3 Bucket* 975 | * Source code, logs & other artifacts 976 | * *CloudWatch Alarm* 977 | * 2 alarms that monitor load on instances & Auto Scaling group scaling up / down 978 | * *CloudFormation stack* 979 | * *Domain name* 980 | * `subdomain.region.elasticbeanstalk.com` 981 | 982 | 983 | ### Supports 984 | * Platform-specific application *source bundle* (e.g. Java `war` for Tomcat) 985 | * Go 986 | * Java with Tomcat 987 | * Java SE 988 | * .NET on Windows Server with IIS 989 | * Node.js 990 | * PHP 991 | * Python 992 | * Ruby (Passenger Standalone) 993 | * Ruby (Puma) 994 | * Single Container Docker 995 | * Multicontainer Docker 996 | * Preconfigured Docker (GlassFish) 997 | * Preconfigured Docker (Python 3.x) 998 | * Preconfigured Docker (Go) 999 | 1000 | 1001 | ### Core components 1002 | * **Application** 1003 | * Logical collection of *Elastic Beanstalk* components, including *environments*, *versions*, and 1004 | *environment configurations*. In Elastic Beanstalk an application is conceptually similar to a 1005 | folder.
1006 | * **Application version** 1007 | * An *application version* refers to a specific, labeled iteration of deployable code for a web 1008 | application 1009 | * **Environment** 1010 | * An environment is a version that is deployed onto AWS resources 1011 | * Runs only a single application version at a time 1012 | * Can run the same version or different versions in many environments at the same time 1013 | * **Environment Configuration** 1014 | * Collection of parameters and settings that define how an environment and its associated resources behave 1015 | * Updating a configuration will cause AWS to automatically apply the changes 1016 | * **Configuration template** 1017 | * Starting point for creating unique environment configurations 1018 | 1019 | 1020 | ### Limits: 1021 | .|. 1022 | -|- 1023 | Applications|75 1024 | Application Versions|1000 1025 | Environments|200 1026 | 1027 | 1028 | ## [↖](#top)[↑](#3_10_4)[↓](#3_11_1) Simple Workflow Service (SWF) 1029 | * **Task coordination** and **state management** service 1030 | * Distributed, scales up and down depending on task 1031 | * Works with *on-premises* and *cloud* apps 1032 | * Allows for *synchronous* or *asynchronous* processing 1033 | * Can contain human events 1034 | * Guaranteed order of execution 1035 | * Tasks can live up to one year (`31,536,000 seconds`) 1036 | 1037 | 1038 | ### Core components 1039 | * **Workflow** 1040 | * A workflow is a set of *activities* that carry out some objective, together with logic that 1041 | coordinates the activities.
1042 | * **Domain** 1043 | * Scope of a *workflow* 1044 | * An account can have multiple *domains*, each of which can contain multiple *workflows* 1045 | * *Workflows* in different *domains* cannot interact 1046 | * **Workflow Starter** 1047 | * Any application that can initiate workflow executions 1048 | * **Activity** 1049 | * Things carried out by a *workflow* 1050 | * **Activity Task** 1051 | * Represents one invocation of an *activity* 1052 | * Can run synchronously or asynchronously 1053 | * Gets assigned to worker 1054 | * *Decision task* tells decider that state of workflow has changed 1055 | * **Activity Worker** 1056 | * Is a program that receives *activity tasks*, performs them, and provides results back 1057 | * Might be used by a person 1058 | * Can live on *EC2* or on-premise 1059 | * **Decider** 1060 | * Coordination logic in a *workflow* 1061 | * Schedules *activity tasks*, provides input data to the *activity workers*, processes events that 1062 | arrive while the *workflow* is in progress, and ultimately ends (or closes) the *workflow* when the 1063 | objective has been completed. 1064 | 1065 | 1066 | ### Limits: 1067 | .|. 
1068 | -|- 1069 | Maximum registered domains|100 1070 | 1071 | 1072 | ## [↖](#top)[↑](#3_11_2)[↓](#3_12_1) Virtual Private Cloud (VPC) 1073 | * Provisions a logically isolated section of the AWS cloud 1074 | * Spans over all AZs in a region 1075 | * Allows to create layered architecture 1076 | * Shared or dedicated tenancy (exclusive hardware or not) 1077 | * *Security groups* and subnet *network ACLs* 1078 | * Ability to extend on-premise network to cloud 1079 | 1080 | 1081 | ### Overview 1082 | 1083 | #### Default VPC (Amazon specific) 1084 | * Gives easy access to a VPC without having to configure it from scratch 1085 | * Has different subnets in different AZs and an internet gateway per AZ 1086 | * Each instance launched automatically receives a *public IP* (very different to non-default VPC) 1087 | * Cannot be restored if deleted 1088 | 1089 | #### Non-default VPC (regular VPC) 1090 | * Only has private IP addresses 1091 | * Resources *only* accessible through *Elastic IP*, *VPN* or *internet gateways* 1092 | 1093 | #### VPC Peering 1094 | * Connect VPCs through direct network routing 1095 | * Can occur between different accounts and VPCs, but must be in the same region 1096 | * Allows instances to communicate with each other as if they were in the same network 1097 | 1098 | #### VPC Scenarios 1099 | * VPC with private subnet only -> single tier apps 1100 | * VPC with public and private subnets -> layered apps 1101 | * VPC with public, private subnets and hardware connected VPN -> extending apps to on-premise 1102 | * VPC with private subnets and hardware connected VPN -> extended VPN 1103 | 1104 | 1105 | ### Components 1106 | * **Subnet** 1107 | * In exactly one AZ 1108 | * If traffic is routed to an Internet gateway, the subnet is known as a public subnet 1109 | * If a subnet doesn't have a route to the Internet gateway, it's known as a private subnet 1110 | * EC2 instances are launched into subnets 1111 | * Use ssh-agent forwarding to connect from public to 
private instances 1112 | * Sometimes grouped into Subnet Groups, e.g. for caching or DB. Typically across AZs 1113 | * **Route Table** 1114 | * Contains a set of rules, called routes that determine where network traffic is directed to 1115 | * Each VPC automatically comes with a main route table that can be configured 1116 | * Each subnet in a VPC must be associated with a route table; the table controls the routing 1117 | for the subnet. A subnet can only be associated with one route table at a time, but multiple 1118 | subnets can be associated with the same route table 1119 | * Each route in a table specifies a destination CIDR and a target 1120 | * Every route table contains a local route for communication within the VPC 1121 | * Can have a *default route* 0.0.0.0/0 to route everything that doesn't have a specific rule 1122 | * **Elastic IP** 1123 | * Static IPv4 address mapped to an instance or network interface 1124 | * If attached to network interface it's decoupled from the instance's lifecycle 1125 | * Routes to private IP address of instance 1126 | * Can be remapped in case of failure. 
1127 | * For use in a specific region only 1128 | * Can only map to instances in public subnets 1129 | * **Gateways** 1130 | * *Internet Gateway* 1131 | * Horizontally scaled, redundant, and highly available VPC component that allows communication 1132 | between instances in a VPC and the internet 1133 | * Provides a target in VPC route tables for internet-routable traffic 1134 | * Performs network address translation (NAT) for instances that have been assigned public 1135 | IPv4 addresses 1136 | * *Virtual Private Gateway* 1137 | * Has VPN connection to customer gateway attached 1138 | * Serves as VPN concentrator on the Amazon side of the VPN connection 1139 | * *Customer Gateway* 1140 | * A physical device or software application on your side of the VPN connection 1141 | * **NAT** 1142 | * *NAT Instances* 1143 | * Manually configured instance from a NAT AMI 1144 | * *NAT Gateway* 1145 | * AWS-managed service 1146 | 1147 | #### Structure & packet flow 1148 | * VPC (has *CIDR*) 1149 | * Gateway (Internet or VPN) 1150 | * Routes (one per subnet, can be shared) 1151 | * Network ACL (one per subnet, can be shared) 1152 | * Subnets (CIDRs fall within the VPC's CIDR) 1153 | * Security Group (on VPC level) 1154 | * Instance (needs public IP for internet communication, either ELB or Elastic IP) 1155 | 1156 | 1157 | ### Security 1158 | 1159 | #### Network ACL 1160 | * Subnet level, acting as firewall 1161 | * Rules for inbound and outbound traffic 1162 | * Rules have numbers and are evaluated from low to high 1163 | * *Stateless* - return traffic must be explicitly allowed by a rule 1164 | 1165 | #### Security Groups 1166 | * Acts as a virtual firewall to control inbound and outbound traffic to instances 1167 | * Acts on instance level, not subnet level 1168 | * Rules for inbound and outbound traffic 1169 | * *Stateful* - will always allow response to (allowed) outbound traffic 1170 | * Can refer to another security group, e.g. allow traffic from there 1171 | 1172 | 1173 | ### Limits: 1174 | .|.
1175 | -|- 1176 | VPCs per region|5 1177 | Subnets per VPC|200 1178 | Customer gateways per region|50 1179 | Internet gateways per region|5 1180 | Elastic IPs per account per region|5 1181 | VPN connections per region|50 1182 | Route tables per region|200 1183 | Security groups per region|500 1184 | 1185 | 1186 | ## [↖](#top)[↑](#3_12_4)[↓](#4) Relational Database Service (RDS) 1187 | * Set up, operate, and scale a **relational database** in the cloud 1188 | * Supports 1189 | * Amazon Aurora 1190 | * MySQL 1191 | * MariaDB 1192 | * Oracle 1193 | * SQL Server 1194 | * PostgreSQL 1195 | * Automates common administrative tasks such as performing **backups** and software **patching** 1196 | * *Automated backups* 1197 | * Performs a full daily snapshot 1198 | * Enables point-in-time recovery 1199 | * *DB Snapshots* 1200 | * User-initiated 1201 | * As frequent as desired 1202 | * Supports *encryption at rest* for all database engines 1203 | * **DB instance** 1204 | * Database environment in the cloud with specified *compute* and *storage* resources 1205 | * **Multi-AZ deployments** 1206 | * Provide enhanced availability and durability for DB Instances, making them a natural fit for 1207 | production database workloads 1208 | * **DB subnet group** 1209 | * Collection of subnets that are designated for the RDS DB Instances in a VPC 1210 | * **Maintenance window** 1211 | * Needs to be specified (or defaults to weekly) for maintenance events like scaling and patching 1212 | * **DB Parameter group** 1213 | * Acts as a “container” for engine configuration values that can be applied to one or more DB 1214 | Instances 1215 | 1216 | 1217 | # [↖](#top)[↑](#3_13)[↓](#) Etc 1218 | * *us-east-1* is the default region for all SDKs 1219 | * *Penetration tests* need to be announced 1220 | -------------------------------------------------------------------------------- /devops-engineer-professional-02.md: --------------------------------------------------------------------------------
1 | 2 | # DevOps Engineer Professional (C02) 3 | 4 | ## Comments per Service 5 | 6 | ### CodeStar 7 | 8 | #### CodeCommit 9 | 10 | - CodeCommit requires a CloudWatch Events rule to trigger CodePipeline 11 | - Can trigger Lambda functions out of CodeCommit events 12 | - AWS provides several managed policies: 13 | - `AWSCodeCommitFullAccess`, `AWSCodeCommitPowerUser`, `AWSCodeCommitReadOnly` 14 | - Can use Approval Rule templates to e.g. trigger unit tests via CodeBuild 15 | 16 | #### CodePipeline 17 | 18 | - CodePipeline can execute cross-region actions 19 | - CodePipeline can deploy straight into S3 20 | - CodePipeline can have custom actions that invoke job workers 21 | 22 | #### CodeBuild 23 | 24 | - CodeBuild can be triggered directly from GitHub via webhook 25 | - CodeBuild supports build badges, which provide an embeddable, dynamically generated image (_badge_) that displays the status of the latest build for a project 26 | 27 | #### CodeDeploy 28 | 29 | - In EC2/On-Premises deployment, a CodeDeploy **deployment group** is a set of individual instances targeted for a deployment. A deployment group contains individually tagged instances, Amazon EC2 instances in Amazon EC2 Auto Scaling groups, or both. 30 | - CodeDeploy can terminate the original instances in the deployment group with a waiting period of 1 hour. 31 | - CodeDeploy has a default timeout of 1 hour to wait for scripts to finish 32 | - CodeDeploy failing on `AllowTraffic` can mean that health checks on ELB are misconfigured 33 | - Notifies via CloudWatch Events 34 | - Lambda 35 | - SNS 36 | - Kinesis streams 37 | - SQS 38 | - Built-in targets (CloudWatch Alarms actions) 39 | 40 | #### CodeGuru 41 | 42 | - Amazon CodeGuru **Profiler** helps developers understand the runtime behaviour of their applications, improve performance, and decrease infrastructure costs.
43 | - Amazon CodeGuru **Reviewer** is an automated code review service that identifies critical defects and deviation from coding best practices for Java and Python code. Works on PRs 44 | - Reviewer can protect secrets and suggest code changes to mitigate 45 | 46 | ### IaC 47 | 48 | #### CloudFormation 49 | 50 | - CFN custom resources -> pre-signed URLs 51 | - In a stackset, global resources (like S3) have to be unique 52 | - CloudFormation drift detection requires manual intervention; use AWS Config to automate detection. 53 | 54 | #### Service Catalog 55 | 56 | - By using a launch role via **launch constraint**, you can instead limit the end users’ permissions to the minimum they require for that product 57 | - The **template constraint** limits the options that are available to end-users when they launch a product. It works by narrowing the allowable values for parameters that are defined in the product’s underlying AWS CloudFormation template 58 | - Apply template constraints to ensure that the end users can use products without breaching the compliance requirements of your organization 59 | 60 | #### OpsWorks 61 | 62 | - OpsWorks can create time-based instances for scaling of predictable workload, or load-based using CPU utilisation or load, or memory utilisation 63 | 64 | ### Compute 65 | 66 | #### EC2 67 | 68 | - EC2 memory metrics are not collected by default and need to have CloudWatch agent installed 69 | - EC2 can use built-in **instance recovery** 70 | - An instance is scheduled to be retired when AWS detects irreparable failure of the underlying hardware that hosts the instance. 71 | - When an instance reaches its scheduled retirement date, it is stopped or terminated by AWS. 72 | - AWS also sends an AWS Health event, which you can monitor and manage by using Amazon CloudWatch Events. 
73 | 74 | #### ASG 75 | 76 | - ASG lifecycle states: 77 | - `Pending` (hooks `Pending:Wait`, `Pending:Proceed`) 78 | - `InService` 79 | - `Terminating` (hooks `Terminating:Wait`, `Terminating:Proceed`) 80 | - `Terminated` 81 | - `Pending:Wait` lifecycle hook can allow AMI upgrades before bringing them into service 82 | - `Terminating:Wait` lifecycle hook to collect instance data (e.g. logs) before final termination 83 | - Tags mentioned in the Auto Scaling group are _not_ propagated to EBS volumes 84 | - ASG: A warm pool gives you the ability to decrease latency for your applications that have exceptionally long boot times, for example, because instances need to write massive amounts of data to disk. 85 | - Can keep instances in pool _running_ or _stopped_ 86 | - ASG can notify via SNS on failed instance launch 87 | - Can use Amazon EventBridge or Amazon CloudWatch Events to track the Auto Scaling Events 88 | - Can trigger Lambdas from ASG by filtering on EventBridge events 89 | - CloudFormation + ASGs: 90 | - `AutoScalingReplacingUpdate`: `WillReplace` `true` will wait for a complete replacement of the ASG and its instances before deleting the old ASG 91 | - `AutoScalingRollingUpdate`: replaces existing instance in ASG; valid options: MaxBatchSize, MinInstancesInService, MinSuccessfulInstancesPercent, PauseTime, SuspendProcesses, WaitOnResourceSignals 92 | 93 | #### Storage Gateway 94 | 95 | - Storage Gateway does not automatically refresh the cache if the files were added directly to S3. `RefreshCache` can be used to refresh the cache periodically. 
96 | - **Tape gateway** is backed by Glacier, meant for backups etc 97 | - **File gateway** gets on-premises data into the cloud 98 | - **Volume gateway** is cloud-backed iSCSI block storage volumes 99 | 100 | #### SSM 101 | 102 | - `AWS-AmazonLinuxDefaultPatchBaseline` is a predefined patch baseline, doesn't do custom patches 103 | - `aws:runDocument` plugin runs SSM documents stored in Systems Manager or on a local share 104 | - `aws:downloadContent` plugin downloads an SSM document from a remote location to a local share 105 | - Can use SSM to create AMIs 106 | 107 | #### ELB 108 | 109 | - ALBs can be configured for 'dual stack' mode that allows IPv4 and IPv6 110 | - ALBs can have weightings between target groups 111 | 112 | ### Security 113 | 114 | #### IAM 115 | 116 | - `iam:PassRole` passes a role to a service. E.g. a developer role to CloudFormation 117 | 118 | #### Firewall Manager 119 | 120 | - Firewall Manager can be used to configure and apply WAF ACLs to the ALBs in an AWS account. It can help centrally manage as well as apply them to new accounts added to the Organization in the future. 121 | 122 | #### KMS 123 | 124 | - **KMS grants** are commonly used by AWS services that integrate with AWS KMS to encrypt your data at rest. 125 | - The service creates a grant on behalf of a user in the account, uses its permissions, and retires the grant as soon as its task is complete. 126 | 127 | ### Compliance 128 | 129 | #### GuardDuty 130 | 131 | - Can be used for org-wide compliance 132 | - AWS recommends a separate delegated GuardDuty administrator account 133 | - Can auto-enable GuardDuty for all future Org accounts 134 | - Can configure GuardDuty **Trusted IP** list and **Threat IP** list and work with findings based on those 135 | - GuardDuty needs EventBridge for filtering 136 | 137 | #### Config 138 | 139 | - AWS Config can ensure all EC2 instances are managed by AWS Systems Manager.
140 | - AWS Config can find `ec2-volume-inuse-check`, but cannot detect how long a volume was unused for 141 | - `cloudformation-stack-drift-detection-check` checks if the actual configuration of a CloudFormation stack differs, or has drifted 142 | - `s3-bucket-ssl-requests-only` checks whether S3 buckets have policies that require requests to use SSL 143 | - Can deploy **conformance packs** into org accounts (from a delegated admin account) 144 | - Config itself is per region, use **Config Aggregator** for centralised collection of findings across regions & accounts 145 | - Uses aggregator account 146 | - By default, AWS Config will not automatically remediate the accounts that disabled its CloudTrail. You must manually set this up using a CloudWatch Events rule and a custom Lambda function that calls the StartLogging API to enable CloudTrail back again. Furthermore, the `cloudtrail-enabled` AWS Config managed rule is only available for the periodic trigger type and not Configuration changes. 147 | 148 | #### ControlTower 149 | 150 | - Use EventBridge to get notifications on Control Tower events like `CreateManagedAccount` 151 | - **Customizations for AWS Control Tower (CfCT)** helps you customize your AWS Control Tower landing zone and stay aligned with AWS best practices. Customizations are implemented with AWS CloudFormation templates and service control policies (SCPs). 
152 | - CfCT capability is integrated with AWS Control Tower lifecycle events so that your resource deployments remain synchronized with your landing zone 153 | 154 | #### Org Policies 155 | 156 | - Are inherited down the path `org root` -> `ou` -> `accounts` 157 | 158 | #### Trusted Advisor 159 | 160 | - AWS Trusted Advisor checks identify ways to optimize your AWS infrastructure, improve security and performance, reduce costs, and monitor service quotas 161 | - Cost Optimization, Performance, Security, Fault Tolerance, and Service Limits 162 | - TrustedAdvisor can check for under-utilized EC2 163 | - Trusted Advisor's primary integration point is CloudWatch Events 164 | - With Trusted Advisor’s **Service Limit Dashboard**, you can view, refresh, and export utilization and limit data on a per-limit basis. 165 | - Metrics are published on Amazon CloudWatch in which you can create custom alarms 166 | 167 | #### Health 168 | 169 | - AWS Health is scanning public repos and can send events for compromised keys 170 | - On detection of an exposed IAM access key, AWS Health generates an `AWS_RISK_CREDENTIALS_EXPOSED` CloudWatch Event. 171 | - Also lists AWS Scheduled maintenance events on Health Dashboard 172 | - Can use CloudWatch Events/EventBridge to trigger workflows based on events 173 | - Can monitor AWS Health events using Amazon EventBridge or CloudWatch Events by calling the AWS Health API 174 | 175 | #### CloudTrail 176 | 177 | - Can set up trails for 178 | - **Data events**: These events provide insight into the resource operations performed on or within a resource. These are also known as data plane operations. 179 | - For S3 or Lambda data events 180 | - **Management events**: Management events provide insight into management operations that are performed on resources in your AWS account. These are also known as control plane operations. 
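
The split between data events and management events can be sketched as a small helper that builds a classic CloudTrail event selector. This is a minimal sketch, not a definitive implementation: the function name `build_event_selectors`, the bucket ARN and the trail name are hypothetical, and the structure is assumed to be passed to boto3's `put_event_selectors` call.

```python
from typing import Any


def build_event_selectors(
    bucket_arns: list[str], include_management: bool = True
) -> list[dict[str, Any]]:
    """Build a classic CloudTrail event selector list (hypothetical helper).

    Management (control plane) events are toggled with IncludeManagementEvents;
    S3 object-level data (data plane) events are listed under DataResources.
    """
    data_resources = []
    if bucket_arns:
        data_resources.append(
            {
                "Type": "AWS::S3::Object",
                # A bucket ARN with a trailing slash covers all objects in that bucket.
                "Values": [arn.rstrip("/") + "/" for arn in bucket_arns],
            }
        )
    return [
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": include_management,
            "DataResources": data_resources,
        }
    ]


if __name__ == "__main__":
    selectors = build_event_selectors(["arn:aws:s3:::example-bucket"])
    print(selectors)
    # Assumption: with boto3 installed and credentials configured, this could be applied via
    #   boto3.client("cloudtrail").put_event_selectors(
    #       TrailName="example-trail", EventSelectors=selectors)
```

Keeping the selector as plain data makes it easy to review which buckets produce object-level events before the trail is changed.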
181 | 182 | ### Networking 183 | 184 | #### VPC 185 | 186 | - NAT gateway does not span multiple AZs (instead: one gateway per AZ) 187 | - Can send VPC Flow Logs to CloudWatch Logs 188 | 189 | ### Storage 190 | 191 | #### Aurora 192 | 193 | - Read replicas are always asynchronous 194 | - Amazon Aurora Global Database uses storage-based replication with typical latency of less than 1 second, using dedicated infrastructure that leaves your database fully available to serve application workloads. 195 | - 1 primary region (read/write), up to 5 secondary regions (read) 196 | - In the event of a regional degradation or outage, one of the secondary regions can be promoted to read and write capabilities in less than 1 minute. 197 | - Aurora endpoints 198 | - single built-in cluster endpoint, connects to the primary instance of the cluster 199 | - reader endpoint for read-only connections for your Aurora cluster 200 | - can have custom cluster endpoints (managed by Aurora) that can be READER, WRITER or ANY 201 | 202 | #### RDS 203 | 204 | - RDS creates and saves automated backups of your DB instance or Multi-AZ DB cluster during the backup window of your database. 205 | - default: 30-minute backup window, selected from an 8-hour per-region night-time block 206 | - Amazon RDS uses SNS to provide notification when an Amazon RDS event occurs. 207 | - Can also use CloudWatch Events/EventBridge 208 | - Failover: 209 | - AZ outages => RDS multi-AZ deployment 210 | - Regional outages => RDS read replica 211 | - Multi-region deployments are like multi-AZ deployments, but other regions can be used for reads. Read replicas can be in the same AZ, same region, or cross-region 212 | - Read replicas have best RTO/RPO, but highest cost 213 | 214 | #### DynamoDB 215 | 216 | - In DynamoDB, `ThrottledWriteRequests` can help decide when to increase the maximum write capacity units for the table's Auto Scaling policy.
217 | - `WriteThrottleEvents` are requests to DynamoDB that exceed the provisioned write capacity units for a table or a global secondary index. 218 | - Can use Kinesis Data Streams to capture changes to DynamoDB 219 | - Amazon DynamoDB global tables provide a single-digit millisecond latency and make sure the data is available across regions. 220 | - DynamoDB Global Tables requires 221 | - tables are created in each region already 222 | - DynamoDB Streams is enabled 223 | - Don't have multiple lambdas read from DynamoDB Streams 224 | - Only one process per shard! 225 | - Better to use fan-out pattern 226 | 227 | #### Glue 228 | 229 | - AWS Glue is an efficient way to store object metadata. Combination: S3 - Glue - Athena - QuickSight 230 | 231 | #### S3 232 | 233 | - Can include a pre-calculated checksum as part of your request. Amazon S3 compares the provided checksum to the checksum that it calculates by using your specified algorithm 234 | - Can activate access logs and use Athena for analysis/queries 235 | - S3 cross-region replication is push-based: source bucket gets a replication rule, destination bucket gets a bucket policy, source needs IAM role for S3 service to assume 236 | - Configure a replication rule within the source bucket to activate the replication process. 237 | - Create a bucket policy in the destination bucket that grants the source bucket permission to replicate objects into it. 238 | - In the source AWS account, create an IAM role that Amazon S3 can assume to replicate objects. Enable versioning in both buckets. 239 | - AWS CloudTrail only logs bucket-level actions in your Amazon S3 buckets by default. 
If you want to record all object-level API activity in your S3 bucket, you can set up data events in CloudTrail 240 | 241 | ### Serverless 242 | 243 | #### API Gateway 244 | 245 | - API Gateway does not have specific metrics for individual http error codes like 403, only a generic `4XXError` metric 246 | - Can enable API **caching** in Amazon API Gateway to cache your endpoint's responses 247 | 248 | #### ECS 249 | 250 | - can set ECS tasks as a target of CloudWatch events 251 | - ECS/Fargate logs 252 | - add the required `logConfiguration` parameters to your task definition to turn on the `awslogs` log driver 253 | - ECS/EC2 254 | - container instances have an attached IAM role that contains `logs:CreateLogStream` and `logs:PutLogEvents` 255 | - to turn on the `awslogs` log driver, your Amazon ECS container instances require at least version 1.9.0 of the container agent 256 | 257 | ### Application Auto Scaling 258 | 259 | - Is a web service for automatically scaling scalable resources for individual AWS services beyond Amazon EC2 260 | - Lambda function provisioned concurrency 261 | - DynamoDB tables and global secondary indexes 262 | - Aurora replicas 263 | - Amazon Elastic Container Service (ECS) services 264 | - ... 265 | 266 | - **Target** tracking scaling – Scale a resource based on a target value for a specific CloudWatch metric. 267 | - **Step** scaling – Scale a resource based on a set of scaling adjustments that vary based on the size of the alarm breach. 268 | - **Scheduled** scaling – Scale a resource one time only or on a recurring schedule. 269 | 270 | ### Content Delivery 271 | 272 | #### CloudFront 273 | 274 | - **OriginGroup**: An origin group includes two origins (a primary origin and a second origin to failover to) and a failover criteria that you specify. 275 | 276 | ### Notifications/Events 277 | 278 | #### SNS 279 | 280 | - SNS defines a **delivery policy** for each delivery protocol. 
The delivery policy defines how Amazon SNS retries the delivery of messages when server-side errors occur (when the system that hosts the subscribed endpoint becomes unavailable). 281 | - When the delivery policy is exhausted, Amazon SNS stops retrying the delivery and discards the message 282 | - -> unless a **dead-letter queue** is attached to the subscription. 283 | - For ECS notifications on **essential task** stopped, use EventBridge 284 | - For S3 fanout, use SNS and subscribe consumers to it 285 | 286 | ### Logging/Monitoring/Notification 287 | 288 | #### CloudWatch 289 | 290 | - CloudWatch Logs are always encrypted 291 | - CloudWatch _Metrics_ filters can be used to filter CloudWatch _Logs_ 292 | - Can create CloudWatch **Alarm** for the `StatusCheckFailed_System` metric and select the EC2 action to recover the instance 293 | - **CloudWatch Logs Subscription** for near real-time feed of log events 294 | - "Getting logs out of CloudWatch for further processing" 295 | - from CloudWatch Logs, to _Kinesis_, _ElasticSearch_ or _Lambda_ 296 | - CloudWatch has a predefined dashboard for CodeBuild metrics 297 | - You can call the EC2 `CreateSnapshot` API directly as a target from CloudWatch Events. 298 | 299 | #### KMS 300 | 301 | - KMS publishes metrics to CloudWatch; alarms and alerts can be defined on them 302 | 303 | #### X-Ray 304 | 305 | - Can run X-Ray daemon on AWS Elastic Beanstalk 306 | - X-Ray daemon uses UDP port 2000 307 | 308 | --- 309 | 310 | ## Comments per Topic 311 | 312 | ### Implement CI/CD Pipelines 313 | 314 | - CodeDeploy states + lifecycle hooks 315 | - CodeCommit IAM policies 316 | - CodeCommit needs CloudWatch Events/EventBridge to detect PRs 317 | - (EventBridge is the same service as CloudWatch Events, just with a new interface and more features exposed.)
318 | - GitHub needs a web hook to start a CodePipeline 319 | - CodeDeploy lifecycle hooks (reserved for CodeDeploy in parentheses): 320 | - `ApplicationStop` 321 | - (`DownloadBundle`) 322 | - `BeforeInstall` 323 | - (`Install`) 324 | - `AfterInstall` 325 | - `ApplicationStart` 326 | - `ValidateService` 327 | - `BeforeBlockTraffic` 328 | - (`BlockTraffic`) 329 | - `AfterBlockTraffic` 330 | - `BeforeAllowTraffic` 331 | - (`AllowTraffic`) 332 | - `AfterAllowTraffic` 333 | - Integrate automated testing into CI/CD pipelines 334 | - CloudWatch Logs + EventBridge to automate based on CodeBuild job results 335 | - CodeDeploy + EventBridge to automate based on CodeDeploy job results 336 | - EventBridge for CodePipeline scheduled events 337 | - CodeDeploy can integrate with CloudWatch Alarms to pause deployments 338 | - Build and manage artifacts 339 | - CodeBuild + CodePipeline + CodeDeploy + S3 for artifacts 340 | - S3 versioning + encryption required for CodePipeline 341 | - Implement deployment strategies for instance, container, and serverless environments 342 | - Elastic Beanstalk policies 343 | - All at once - fastest, but causes downtime; all remaining options have zero downtime 344 | - Rolling - still uses batches 345 | - Rolling with additional batch - to maintain full capacity during deploy 346 | - Immutable for when new & old versions must not be mixed and for fast rollback 347 | - Traffic splitting: for canary deploys 348 | - Blue/Green deployments: swap environment URLs; keep RDS in a separate stack; requires DNS change (all previous ones do not) 349 | - Lambda 350 | - canary deployments via alias weights 351 | - use CodeDeploy default deploy options: 352 | - Lambda: `LambdaLinear10PercentEvery10Minutes` (10% of traffic shifted at a time), `LambdaCanary10Percent10Minutes` (one 10% and one 90% deploy) 353 | - EC2: `AllAtOnce`, `OneAtATime`, `HalfAtATime` 354 | - ALB + EC2 + Route53 alias record swaps 355 | - OpsWorks Stack cloning + Route53 alias swaps 356 | - 
OpsWorks lifecycle stages 357 | 358 | ### Config Management and IaC 359 | 360 | - Define cloud infrastructure and reusable components to provision and manage systems throughout their lifecycle 361 | - CloudFormation cross-stack references use exports + Fn::ImportValue 362 | - Inline Lambda functions in CFN 363 | - Custom resource is used to invoke a Lambda function in AWS CloudFormation, the request will include a pre-signed URL. The Lambda function is responsible for returning a response to the pre-signed URL to indicate if the resource creation was successful or not. 364 | - Deploy automation to create, onboard, and secure AWS accounts in a multi-account/multi-region environment 365 | - Design and build automated solutions for complex tasks and large-scale environments 366 | - CloudFormation + ASGs: 367 | - `AutoScalingReplacingUpdate`: `WillReplace:true` will wait for a complete replacement of the ASG and its instances before deleting the old ASG 368 | - `AutoScalingRollingUpdate`: replaces existing instance in ASG; valid options: `MaxBatchSize`, MinInstancesInService, MinSuccessfulInstancesPercent, PauseTime, SuspendProcesses, WaitOnResourceSignals 369 | - OpsWorks can create _time-based_ instances for scaling of predictable workload, or _load-based_ using CPU utilisation or load, or memory utilisation 370 | - Collecting on-prem info: 371 | - Application Discovery Agent (install on each VM) or Agentless Discovery Connector (separate VM) 372 | 373 | ### Resilient Cloud Solutions 374 | 375 | - Implement highly available solutions to meet resilience and business requirements 376 | - RDS: 377 | - AZ outages => RDS multi-AZ deployment 378 | - Regional outages => RDS read replica 379 | - Multi-region deployments are like multi-AZ deployments, but other regions can be used for reads. 
Read replicas can be in the same AZ, same region, or cross-region 380 | - Read replicas have best RTO/RPO, but highest cost 381 | - Frontend traffic switching => Route53 failover 382 | - AutoScaling with a min & max of 1 is actually sensible - it makes the instance auto-redeploy if it dies 383 | - Route53 policies: `simple`, `failover`, `geolocation`, `geoproximity`, `latency`, `multi-value answer`, `weighted` 384 | - Implement solutions that are scalable to meet business requirements 385 | - ASG lifecycle states: 386 | - `Pending` (hooks `Pending:Wait`, `Pending:Proceed`) 387 | - `InService` 388 | - `Terminating` (hooks `Terminating:Wait`, `Terminating:Proceed`) 389 | - `Terminated` 390 | - EC2 autoscaling `Pending:Wait` lifecycle hook can allow AMI upgrades before bringing instances into service 391 | - `Terminating:Wait` lifecycle hook to collect instance data (e.g. logs) before final termination 392 | - EKS: k8s cluster autoscaler or karpenter 393 | - EKS networking: 394 | - VPC CNI plugin 395 | - Load Balancer Controller 396 | - CoreDNS 397 | - kube-proxy 398 | - Calico 399 | - Hybrid environment patching 400 | - Implement automated recovery processes to meet RTO/RPO requirements 401 | 402 | ### Monitoring and Logging 403 | 404 | - Configure the collection, aggregation, and storage of logs and metrics 405 | - AWS Config Aggregator for centralised collection of findings across regions & accounts 406 | - EC2 custom logging requirements => CloudWatch Logs Agent 407 | - ECS Fargate logs => awslogs driver on task definition 408 | - CloudWatch has a predefined dashboard for CodeBuild metrics 409 | - Audit, monitor, and analyze logs and metrics to detect issues 410 | - near real time dashboards => QuickSight 411 | - near real time processing on CloudWatch logs: 412 | - Lambda subscription filter 413 | - Kinesis stream filter 414 | - ElasticSearch (OpenSearch) subscription filter 415 | - CloudTrail has log integrity checking which must be turned on 416 | - Automate 
monitoring and event management of complex environments 417 | - Service limit alerting => Trusted Advisor + CloudWatch Alarms + ServiceLimitUsage metric 418 | 419 | ### Incident and Event Response 420 | 421 | - Manage event sources to process, notify, and take action in response to events 422 | - S3 event notifications for data notifications like file deletion 423 | - RDS event notifications for multi-AZ failover events 424 | - EventBridge + AWS Health for notification about IAM credentials being exposed on GitHub, and for notifications about instance outages, etc. 425 | - CloudTrail _data_ events for object-level activity on S3 426 | - EC2 Auto Scaling groups => EventBridge 427 | - CodePipeline stage => EventBridge 428 | - CodeDeploy => CloudWatch Alarm + `MinimumHealthyHosts` metric can be used for rollbacks 429 | - OpsWorks self-healing => EventBridge 430 | - Implement configuration changes in response to events 431 | - Troubleshoot system and application failures 432 | 433 | ### Security and Compliance 434 | 435 | - Implement techniques for identity and access management at scale 436 | - Limit CodeCommit permissions via IAM policy which matches repo 437 | - S3 bucket policies for requiring TLS 438 | - Apply automation for security controls and data protection 439 | - Lifecycle management + auto-rotation of secrets => Secrets Manager 440 | - Cost-effective => SSM Parameter Store SecureStrings 441 | - Patching => SSM Patch Manager 442 | - Implement security monitoring and auditing solutions 443 | -------------------------------------------------------------------------------- /sysops-administrator-associate.md: -------------------------------------------------------------------------------- 1 | [toc_start]:: 2 | 3 | --- 4 | * [AWS-SysOps-Administrator-Associate](#1) 5 | * [Monitoring And Metrics](#2) 6 | * [Virtualization Types](#2_1) 7 | * [EC2 Instance Types](#2_2) 8 | * [EC2 Monitoring](#2_3) 9 | * [EBS Monitoring](#2_4) 10 | * [EFS Monitoring](#2_5) 11 | * 
[CloudWatch](#2_6) 12 | * [Costs](#3) 13 | * [Consolidated Billing](#3_1) 14 | * [Billing Metrics & Alarms](#3_2) 15 | * [Costs Optimization](#3_3) 16 | * [Cost Explorer](#3_4) 17 | * [High Availability](#4) 18 | * [Scalability & Elasticity Fundamentals](#4_1) 19 | * [Reserved Instances](#4_2) 20 | * [Autoscaling vs Resizing](#4_3) 21 | * [Load Balancers](#4_4) 22 | * [RDS HA](#4_5) 23 | * [HA for IP-based Applications](#4_6) 24 | * [HA/Fault Tolerance for Bastion Hosts](#4_7) 25 | * [Analysis](#5) 26 | * [Optimize the environment to ensure maximum performance](#5_1) 27 | * [Identify Performance Bottlenecks and Implement Remedies](#5_2) 28 | * [Identify Potential Issues on a Given Application Deployment](#5_3) 29 | * [OpsWorks](#6) 30 | * [Overview and components](#6_1) 31 | * [Cloudformation](#6_2) 32 | * [Backups & Recovery](#7) 33 | * [AWS Services with automated backups](#7_1) 34 | * [Disaster Recovery Scenarios](#7_2) 35 | * [Storing log files and backups](#7_3) 36 | * [Security](#8) 37 | * [Implement and Manage Security Policies](#8_1) 38 | * [Ensure Data Integrity and Access Controls when Using the AWS Platform](#8_2) 39 | * [Share responsibility model](#8_3) 40 | * [AWS and IT Audits](#8_4) 41 | * [Networking](#9) 42 | * [Route53 Routing Policies](#9_1) 43 | * [VPC Essentials](#9_2) 44 | * [Limits:](#9_3) 45 | * [Etc](#10) 46 | * [Accessing the OS](#10_1) 47 | * [SQS](#10_2) 48 | * [DynamoDb](#10_3) 49 | --- 50 | [toc_end]:: 51 | 52 | # [↖](#top)[↑](#)[↓](#2) SysOps Administrator Associate 53 | > 5/2018 - 9/2018 54 | 55 | --- 56 | 57 | 58 | # [↖](#top)[↑](#1)[↓](#2_1) Monitoring And Metrics 59 | 60 | 61 | ## [↖](#top)[↑](#2)[↓](#2_2) Virtualization Types 62 | 63 | Linux Amazon Machine Images use one of two types of virtualization: 64 | 65 | AMI|Type|Effect 66 | -|-|- 67 | **PV**|Paravirtual|Historically better performance than HVM, but no longer the case 68 | **HVM**|Hardware virtual machine|More modern, same or better performance than PV 69 | 70 | 71 | ## 
[↖](#top)[↑](#2_1)[↓](#2_3) EC2 Instance Types 72 | 73 | **General Purpose**|Balance of compute, memory and networking 74 | -|- 75 | **M5**
(2017)|* Require HVM AMIs
* Instance store via EBS or NVMe SSD (physically connected to the host server) 76 | **M4**
(2015)|* Allows *enhanced networking*
* EBS-optimized 77 | **M3**
(2012)|* SSD (instance) store 78 | **T3**
(2018)|* 30% better price performance 79 | **T2**
(2014)|* Intended for workloads that do not use the full CPU constantly (e.g. web server)
* Allows *burstable performance*
* Burst credits allow the instance to 'burst' past the baseline performance up to 100%
* 1 credit = 100% load per core per minute
* Credits are earned per hour, expire after 24h
* EBS storage only 80 | 81 | **Compute optimized**|Lowest price for *compute* performance 82 | -|- 83 | **C5**
(2016)| * Intel Skylake
* Use Nitro, Amazon’s lightweight hardware accelerated hypervisor
* Better performance and pricing than C4 84 | **C4**
(2015)| * Intel Haswell
* Optimized for EC2
* Allows *enhanced networking* and *clustering*
* EBS-optimized 85 | **C3**
(2013)| * SSD (instance) store
* Allows *enhanced networking* and *clustering* 86 | 87 | **Memory optimized**|Lowest price for *memory* performance 88 | -|- 89 | **Z1d**
(2018)| * Offer both high compute capacity and a high memory footprint
* Ideal for workloads with high per-core licensing costs 90 | **X1**
(2016)| * One of the lowest prices per GiB of RAM
* SSD storage and EBS-optimized by default
* **X1e** has even more RAM 91 | **R5**
(2018)| * Use Nitro, Amazon’s lightweight hardware accelerated hypervisor 92 | **R4**
(2016)| * Improved networking and EBS performance 93 | **R3**
(2014)| * SSD (instance) store
* High memory capacity
* Allows *enhanced networking* 94 | 95 | **GPU optimized**|. 96 | -|- 97 | **P3**
(2017)| * Faster than P2 98 | **P2**
(2016)| * Intended for general-purpose GPU compute applications 99 | **G3**
(2017)| * Optimized for graphics-intensive applications
* Faster than G2 100 | **G2**
(2013)| * High frequency processors
* High-performance NVIDIA GPUs 101 | 102 | 103 | **Storage optimized**|Very fast SSD-backed instance storage optimized for high random I/O and high IOPS 104 | -|- 105 | **H1**
(2017)| * HDD-based local storage
* Delivers high disk throughput
* Balance of compute and memory 106 | **I3**
(2016)| * (NVMe) SSD-backed instance storage optimized for low latency
* very high random I/O performance 107 | **D2**
(2015)| * Lowest price per disk throughput performance 108 | **I2**
(2013)| * SSD (instance) store
* Allows *enhanced networking*
* Supports *TRIM* (more efficient SSD operations) 109 | 110 | **RDS instance types**|Optimized to fit different relational database use cases 111 | -|- 112 | **db.**|General purpose, memory optimized, burstable performance 113 | 114 | .* 115 | 116 | 117 | ## [↖](#top)[↑](#2_2)[↓](#2_3_1) EC2 Monitoring 118 | 119 | 120 | ### EC2 Status Checks 121 | * AWS performs automated checks on every running EC2 instance 122 | * Performed every minute 123 | * Each returns a pass or a fail status 124 | 125 | **System Status Check** 126 | * Loss of network connectivity 127 | * Loss of system power 128 | * Hardware/software issues on physical host 129 | * Solution 130 | * Stop and start instance 131 | * Terminate and re-launch instance 132 | * Contact AWS 133 | * Can configure for *auto-recovery* 134 | * Instance will be rebooted and retain instance id, (e)ip address, EBS volumes et al 135 | 136 | **Instance Status Check** 137 | * Failed system status check 138 | * Network/startup configuration issues 139 | * Memory/disk problems 140 | * Kernel compatibility issues 141 | * Solution 142 | * Fix problem 143 | * Stop and start instance 144 | * Terminate and re-launch instance, potentially with more memory/network/disk/... 145 | 146 | 147 | ## [↖](#top)[↑](#2_3_1)[↓](#2_4_1) EBS Monitoring 148 | 149 | 150 | ### EBS Status Checks 151 | * Run every 5 minutes 152 | * `insufficient data` if checks are still running 153 | * `ok` if all checks pass 154 | * `warning` typically has to do with performance **degradation** from provisioned IOPS 155 | * `impaired` if a check fails, e.g. the volume is **stalled** or not available 156 | 157 | * If Amazon EBS finds that data on a volume might be inconsistent, it disables I/O to that volume. 
158 | * Changes status to `impaired` 159 | * This behaviour can be disabled 160 | 161 | 162 | ### EBS Performance Essentials 163 | **IOPS** (Input/Output Operations Per Second) is a common performance measurement used to benchmark 164 | computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area 165 | networks (SAN). 166 | * I/O size is capped at 256 KiB for SSD volumes and 1,024 KiB for HDD volumes because SSD volumes handle 167 | small or random I/O much more efficiently than HDD volumes. 168 | * SSDs deliver constant performance for both random and sequential I/O 169 | * HDDs have optimal performance for large and sequential I/O 170 | * HDDs can deliver more throughput but drastically fewer IOPS 171 | 172 | .|`gp2`|`io1`|`st1`|`sc1` 173 | -|-|-|-|- 174 | Volume type|General purpose SSD|Provisioned IOPS SSD|Throughput optimized HDD|Cold HDD 175 | Purpose|Balances price and performance|For mission-critical low-latency or high-throughput workloads|Low cost HDD volume designed for frequently accessed, throughput-intensive workloads |Lowest cost HDD volume designed for less frequently accessed workloads 176 | Volume Size|1 GiB - 16 TiB|4 GiB - 16 TiB|500 GiB - 16 TiB|500 GiB - 16 TiB 177 | Max. IOPS(1)/Volume|10,000|32,000|500|250 178 | Max. Throughput/Volume|160 MiB/s|500 MiB/s|500 MiB/s|250 MiB/s 179 | IOPS|* 3 IOPS per GB (larger volume means more IOPS)
* 100 IOPS <-> 10,000 IOPS
* Can burst to 3,000 IOPS if volume size is < 1TB
* Requires credits, which accrue at 3 IOPS per GB per second
* Max 5.4 million credits (also the initial value), enough for 3,000 IOPS for 30 min
* Running out of credits reverts volume back to baseline performance|* 30 IOPS per GB (larger volume means more IOPS), up to 20,000
* Does not burst, delivers consistent IOPS rate instead|.|. 180 | 181 | > (1) gp2/io1 based on 16 KiB I/O size, st1/sc1 based on 1 MiB I/O size 182 | 183 | * Using *EBS optimized* instances guarantees optimal networking between EBS and EC2 184 | * Pre-warming/initialization 185 | * No longer needed for new EBS volumes 186 | * Storage blocks on volumes restored from snapshots do need to be initialized (read from) 187 | 188 | 189 | ## [↖](#top)[↑](#2_4_2)[↓](#2_5_1) EFS Monitoring 190 | 191 | * Two throughput modes to choose from for your file system 192 | * **Bursting** Throughput - throughput on Amazon EFS scales as your file system grows 193 | * **Provisioned** Throughput - you can instantly provision the throughput of your file system (in MiB/s) independent of the amount of data stored. 194 | 195 | 196 | ### Performance comparison 197 | .|Amazon EFS|Amazon EBS Provisioned IOPS (`io1`) 198 | -|-|- 199 | Per-operation latency|Low, consistent latency.|Lowest, consistent latency. 200 | Throughput scale|10+ GB per second.|Up to 2 GB per second. 201 | 202 | 203 | ### Storage Characteristics Comparison 204 | 205 | .|Amazon EFS|Amazon EBS Provisioned IOPS 206 | -|-|- 207 | Availability and durability|Data is stored redundantly across multiple AZs.|Data is stored redundantly in a single AZ. 208 | Access|Up to thousands of Amazon EC2 instances, from multiple AZs, can connect concurrently to a file system.|A single Amazon EC2 instance in a single AZ can connect to a file system. 209 | Use cases|Big data and analytics, media processing workflows, content management, web serving, and home directories.|Boot volumes, transactional and NoSQL databases, data warehousing, and ETL. 
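
As a worked example for the `gp2` column of the EBS Performance Essentials table earlier in this section, the baseline and burst arithmetic can be sketched in a few lines. This is only an illustration using the figures recorded in these notes (3 IOPS per GiB, 100 to 10,000 IOPS, 3,000 IOPS burst, 5.4 million credits); current AWS limits may differ. The function names are hypothetical, and the refill-adjusted burst duration is an assumption on top of the notes, which quote the simpler "30 min" figure that ignores refill.

```python
from typing import Optional

# Figures taken from the notes above (2018-era gp2 limits); treat them as assumptions.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 10_000
GP2_BURST_IOPS = 3_000
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits, also the initial balance


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS: 3 per GiB, clamped to the 100..10,000 range."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_burst_seconds(size_gib: int) -> Optional[float]:
    """Seconds a full credit bucket sustains 3,000 IOPS.

    Assumes credits keep refilling at the baseline rate while bursting,
    so the bucket drains at (burst - baseline) credits per second.
    Returns None when the baseline already meets or exceeds the burst rate.
    """
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - baseline)


if __name__ == "__main__":
    for size in (20, 100, 500, 1000, 5000):
        print(size, gp2_baseline_iops(size), gp2_burst_seconds(size))
```

Ignoring the refill, a full bucket lasts 5,400,000 / 3,000 = 1,800 seconds, which is the 30 minutes quoted in the table.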
210 | 211 | 212 | ### S3 vs EFS vs EBS Comparison 213 | 214 | Amazon S3|Amazon EBS|Amazon EFS 215 | -|-|- 216 | Can be publicly accessible|Accessible only via the given EC2 Machine|Accessible via several EC2 machines and AWS services 217 | Web interface|File System interface|Web and file system interface 218 | Object Storage|Block Storage|File storage 219 | Scalable|Hardly scalable|Scalable 220 | Slower than EBS and EFS|Faster than S3 and EFS|Faster than S3, slower than EBS 221 | Good for storing backups|Is meant to be EC2 drive|Good for shareable applications and workloads 222 | 223 | 224 | ## [↖](#top)[↑](#2_5_3)[↓](#2_6_1) CloudWatch 225 | Monitoring service that plugs into many other services 226 | 227 | * **Metrics** 228 | * Based on currently used service 229 | * Not everything is available out of the box, e.g. no data on memory usage of EC2 instances 230 | * **Alarms** 231 | * Based on thresholds defined on metrics 232 | * Can be added to dashboard 233 | * Invoke *Lambda*, *SNS*, email, ... 234 | * Takes place once, at a specific point in time 235 | * Disable with `mon-disable-alarm-actions` via CLI 236 | * **Logs** 237 | * Log into *log groups* 238 | * **Events** 239 | * Define actions on things that happened 240 | * Define `cron`-based events 241 | * Events are recorded constantly over time 242 | 243 | 244 | ### Key metrics for EC2 245 | 246 | * EC2 metrics are based on what is exposed to the hypervisor. 247 | * *Basic Monitoring* (default) submits values every 5 minutes, *Detailed Monitoring* every minute 248 | * Can install Cloudwatch agent (new) 249 | * Provides access to more metrics 250 | * Can use Cloudwatch monitoring scripts (old) to provide more metrics 251 | * Perl-scripts provided by AWS, need to manually install on instance 252 | * Use `cron` to automate sending data to CloudWatch 253 | 254 | Metric|Effect 255 | -|- 256 | `CPUUtilization`|The total CPU resources utilized within an instance at a given time. 
257 | `DiskReadOps`,`DiskWriteOps`|The number of read (write) operations performed on all instance store volumes. This metric is applicable for instance store-backed AMI instances. 258 | `DiskReadBytes`,`DiskWriteBytes`|The number of bytes read (written) on all instance store volumes. This metric is applicable for instance store-backed AMI instances. 259 | `NetworkIn`,`NetworkOut`|The number of bytes received (sent) on all network interfaces by the instance 260 | `NetworkPacketsIn`,`NetworkPacketsOut`|The number of packets received (sent) on all network interfaces by the instance 261 | `StatusCheckFailed`,`StatusCheckFailed_Instance`,`StatusCheckFailed_System`|Reports whether the instance has passed both/instance/system status check in the last minute. 262 | 263 | * Can **not** monitor **memory usage**, **available disk space**, **swap usage** 264 | 265 | 266 | ### Key metrics for EBS 267 | Metric|Effect 268 | -|- 269 | `VolumeReadBytes`,`VolumeWriteBytes`|`sum` reports total bytes transferred, `average` also useful 270 | `VolumeReadOps`,`VolumeWriteOps`|total number of IO operations 271 | `VolumeQueueLength`|Number of read/write operation requests waiting to finish 272 | `VolumeTotalReadTime`,`VolumeTotalWriteTime`|Total number of seconds spent by all operations in a given time 273 | `VolumeThroughputPercentage`|Percentage of IOPS that was achieved out of total provisioned IOPS 274 | `VolumeConsumedReadWriteOps`|Total amount of r/w operations consumed within a specific time period 275 | 276 | * Can **not** monitor **disk usage percentage** 277 | 278 | 279 | ### Key metrics for EFS 280 | Metric|Effect 281 | -|- 282 | `BurstCreditBalance`|The number of burst credits that a file system has. 283 | `ClientConnections`|The number of client connections to a file system. 284 | `DataReadIOBytes`,`DataWriteIOBytes`|The number of bytes for each file system read(write) operation. 285 | `MetadataIOBytes`|The number of bytes for each metadata operation. 
286 | `PercentIOLimit`|Shows how close a file system is to reaching the I/O limit of the General Purpose performance mode. 287 | `PermittedThroughput`|The maximum amount of throughput a file system is allowed. 288 | `TotalIOBytes`|The number of bytes for each file system operation, including data read, data write, and metadata operations. 289 | 290 | 291 | ### Key metrics for ELB (classic load balancer) 292 | Metric|Effect 293 | -|- 294 | `Latency`|Time it takes to receive a response. Measure `max` and `average` 295 | `BackendConnectionErrors`|Number of connections to registered instances that were not successfully established. Measure `sum` and look at difference between `min` and `max` 296 | `SurgeQueueLength`|Total number of requests waiting to get routed. Look at `max` and `average` 297 | `SpilloverCount`|Dropped requests because of exceeded surge queue. Look at `sum` 298 | `HTTPCode_ELB_3XX_Count`
`HTTPCode_ELB_4XX_Count`
`HTTPCode_ELB_5XX_Count`|The number of HTTP 3XX/4XX/5XX response codes that originate from the *load balancer*. This count does *not* include any response codes generated by the targets. 299 | `RequestCount`|Number of completed requests 300 | `HealthyHostCount`,`UnhealthyHostCount`|Self-explanatory 301 | 302 | * In case of sudden and very large increases in traffic it's possible to contact AWS and have them 303 | 'pre-warm' the *ELB*. 304 | 305 | > spillover and surge queue give an indication of the ELB being overloaded 306 | 307 | * Typically this means that the backend system cannot process requests as fast as they are coming in 308 | * Ideally load balance into an autoscaling group. 309 | 310 | 311 | ### Key metrics for ALB (application load balancer) 312 | Metric|Effect 313 | -|- 314 | `RequestCount`|Number of completed requests 315 | `HealthyHostCount`,`UnhealthyHostCount`|Self-explanatory 316 | `TargetResponseTime`|The time elapsed after the request leaves the load balancer until a response from the target is received. 317 | `HTTPCode_ELB_3XX_Count`
`HTTPCode_ELB_4XX_Count`
`HTTPCode_ELB_5XX_Count`|The number of HTTP 3XX/4XX/5XX response codes that originate from the *load balancer*. This count does *not* include any response codes generated by the targets. 318 | 319 | 320 | ### Key metrics for NLB (network load balancer) 321 | Metric|Effect 322 | -|- 323 | `ProcessedBytes`|The total number of bytes processed by the load balancer, including TCP/IP headers. 324 | `TCP_Client_Reset_Count`|The total number of reset (RST) packets sent from a client to a target. 325 | `TCP_ELB_Reset_Count`|The total number of reset (RST) packets generated by the load balancer. 326 | `TCP_Target_Reset_Count`|The total number of reset (RST) packets sent from a target to a client. 327 | 328 | 329 | ### Key metrics for ElastiCache 330 | Supports *memcached* and *redis* 331 | 332 | Metric|**memcached**|**redis** 333 | -|-|- 334 | .|Designed for simplicity|Supports a much richer set of features. Can be backed up if in *cluster* mode 335 | `cpu utilization`|* multithreaded
* stay under 90%/#cores
* -> increase # read replicas or use larger cache instance|* single threaded
* stay under 90%
* -> increase size of node or add more nodes 336 | `evictions`|* -> increase size or add nodes to cluster|* -> increase node size 337 | `concurrent connections`|* -> check application logic|* -> check application logic 338 | `swap usage`|* avoid swapping
* -> increase `memcached_connections_overhead`|* avoid swapping
* -> increase node size
* -> increase `memory connection overhead` (will decrease memory available for cache) 339 | 340 | .* 341 | 342 | 343 | ### Key metrics for RDS 344 | Metric|Effect 345 | -|- 346 | `CPUUtilization`|Percentage of CPU utilization 347 | `DatabaseConnections`|Number of connections that we have at a given point in time 348 | `DiskQueueDepth`|Number of read/write requests waiting to access the disk 349 | `FreeableMemory`|Amount of available RAM 350 | `FreeStorageSpace`|Amount of available storage space 351 | `SwapUsage`|Memory contents that have been swapped to disk. An increase usually has to do with running out of available RAM 352 | `ReadIOPS`,`WriteIOPS`|Number of I/O operations completed per second. If we don’t have enough IOPS, performance will slow down 353 | `ReadLatency`,`WriteLatency`|* Average amount of time taken per disk I/O operation
* High latency can be solved with more IOPS 354 | `ReadThroughput`,`WriteThroughput`|`Average` is the number of bytes read from or written to disk per second 355 | 356 | .* 357 | 358 | * Also look at *RDS Events* 359 | 360 | --- 361 | 362 | 363 | # [↖](#top)[↑](#2_6_8)[↓](#3_1) Costs 364 | 365 | 366 | ## [↖](#top)[↑](#3)[↓](#3_2) Consolidated Billing 367 | Set up a **billing account** to pay for multiple **linked accounts** at the same time. 368 | 369 | * Allows for **consolidated billing**. Does *not* give IAM visibility into linked accounts. 370 | * Enables **volume discounts** across linked accounts. 371 | * If one account uses *reserved instances*, other accounts running on similar *on demand* instances 372 | will be billed under the reserved instance price. Similar for *RDS* instances. 373 | * All *credits* earned while linked will be applied to consolidated bill. 374 | 375 | Limits: 376 | * Up to 20 linked accounts 377 | 378 | 379 | ## [↖](#top)[↑](#3_1)[↓](#3_3) Billing Metrics & Alarms 380 | * Only shows metrics of services that have been used. 381 | * Set up *billing alarms* based on billing metrics. 
382 | * *Overall* billing alarm, or *service-specific* alarms 383 | * Can still be account-specific, even with consolidated billing 384 | 385 | 386 | ## [↖](#top)[↑](#3_2)[↓](#3_4) Costs Optimization 387 | * Purchase **EC2 Reserved Instances** 388 | * Commit for 1-3 years and get a discount 389 | * Minimize the number of running instances 390 | * Set up *CloudWatch* alarms to spin down underutilized instances 391 | * Find balance between acceptable downtime & costs to eliminate this downtime 392 | * Remove unused **Load Balancers** 393 | * Look for idle (unattached) **EBS** volumes 394 | * Delete unused volumes 395 | * Take a *snapshot* to keep the data 396 | * Downsize volumes that aren't near full capacity 397 | * Look for over-provisioned **IOPS** 398 | * Look for unassociated **Elastic IP** addresses 399 | * Look for idle **RDS** instances 400 | * Check for 0 connections 401 | 402 | 403 | ## [↖](#top)[↑](#3_3)[↓](#4) Cost Explorer 404 | * Costs per *time frame* per *service*, various grouping and filtering options 405 | * Provides forecasts 406 | * **Pricing API** allows downloading pricing information for specific services 407 | 408 | --- 409 | 410 | 411 | # [↖](#top)[↑](#3_4)[↓](#4_1) High Availability 412 | 413 | 414 | ## [↖](#top)[↑](#4)[↓](#4_2) Scalability & Elasticity Fundamentals 415 | * Pay only for *what* you need *when* you need it 416 | * Define minimum capacity 417 | * Define what needs to stretch out 418 | 419 | .|**Elasticity**|**Scalability** 420 | -|-|- 421 | .|*Scaling up/down on demand*|*Scaling for growth in order to meet long term requirements
typically does not focus on shrinking back* 422 | *DynamoDb*|Can provision more or less throughput|Stores as much data as we like, scales transparently 423 | *EC2*|Use autoscaling|More instances or bigger instance types 424 | *RDS*|./.|Bigger instances, more read replicas 425 | 426 | 427 | ## [↖](#top)[↑](#4_1)[↓](#4_3) Reserved Instances 428 | * *Reserve* instances for a specific period of time 429 | * *Standard* reserved instances (fixed instance type) 430 | * *Convertible* reserved instances (can be exchanged against another convertible instance type) 431 | * *Scheduled* reserved instances (purchased by the hour on a set schedule with a set instance type) 432 | * Up to 50% cheaper than a *fully utilized* on-demand instance (because we commit upfront to a certain usage) 433 | * Guarantees to *not* run into '*insufficient instance capacity*' issues if AWS is unable to provision instances in that AZ 434 | * Can resell reserved capacity on *Reserved Instance Marketplace* 435 | * Available for: 436 | * EC2 437 | * RDS (*reserved instances*) 438 | * DynamoDB (*reserved capacity*) 439 | * ElastiCache (*reserved nodes*) 440 | * CloudFront (*reserved capacity*) 441 | * Elastic MapReduce (*reserved EC2 instances*) 442 | * ECR (*reserved EC2 instances*) 443 | 444 | 445 | ## [↖](#top)[↑](#4_2)[↓](#4_4) Autoscaling vs Resizing 446 | * **Auto Scaling** distributes load across multiple instances 447 | * *Scheduled Scaling* allows scaling out or in on a schedule 448 | * Relatively complex to set up 449 | * Applications need to be designed to benefit from multiple instances 450 | * Components 451 | * *Launch Configuration* 452 | * *Autoscaling Group* 453 | * *Scaling Policy* 454 | * *Cloudwatch Alarms* 455 | 456 | * **Changing instance size** increases/decreases available resources to the running application 457 | * *EBS* backed instances need to be stopped before resizing 458 | * *Instance storage* needs to be migrated across 459 | * Not as flexible as auto scaling. 
Not elastic 460 | * Within an autoscaling group the to-be-resized instance might be treated as unhealthy 461 | 462 | 463 | ## [↖](#top)[↑](#4_3)[↓](#4_4_1) Load Balancers 464 | 465 | .|**ALB**|**NLB**|**ELB** 466 | -|-|-|- 467 | .|Application Load Balancer|Network Load Balancer|Classic Load Balancer 468 | Layer|7 (application layer)|4 (transport layer)|EC2-classic network (deprecated) 469 | Protocol|HTTP, HTTPS|TCP|TCP, SSL, HTTP, HTTPS 470 | Health checks|✔|✔|✔ 471 | Cloudwatch metrics|✔|✔|✔ 472 | Logging|✔|✔|✔ 473 | Zone failover|✔|✔|✔ 474 | Connection draining|✔|✔|✔ 475 | Load balancing to different ports on the same instance|✔|✔|. 476 | WebSockets|✔|✔|. 477 | IP Addresses as targets|✔|✔|. 478 | Load balancing deletion protection|✔|✔|. 479 | Path-based routing|✔|.|. 480 | Host-based routing|✔|.|. 481 | Native http/2|✔|.|. 482 | Configurable idle connection timeout|✔|.|✔ 483 | Cross zone load-balancing|✔|✔|✔ 484 | SSL-offloading|✔|.|✔ 485 | Server-name indication|✔|.|✔ 486 | Sticky-sessions|✔|.|✔ 487 | Backend server encryption|✔|.|✔ 488 | Static IP|.|✔|. 489 | Elastic IP|.|✔|. 490 | Preserve source IP address|.|✔|. 491 | Resource-based IAM permissions|✔|✔|✔ 492 | Tag-based IAM permissions|✔|✔|. 493 | Slow start|✔|.|. 494 | User authentication|✔|.|. 495 | Redirects|✔|.|. 496 | Fixed responses|✔|.|. 
497 | 498 | 499 | ### Elastic Load Balancer ('Classic LB') 500 | 501 | 502 | ### Overview 503 | * *External* load balancer 504 | * Public facing 505 | * Often used to distribute load between web servers 506 | * Provides public DNS host name 507 | * *Internal* load balancer 508 | * Often used to Distribute load between backend servers 509 | * Provides internal DNS host name 510 | * Configure (in AWS console) 511 | * Internal and external load balancer 512 | * Subnets for each AZ that traffic should be routed to 513 | * Can route into private subnets 514 | * Cross-zone load balancing 515 | * Connection draining (maximum time for the load balancer to keep connections alive before reporting the instance as 516 | de-registered) 517 | 518 | 519 | ### Sticky Sessions 520 | * Need to make sure that session is maintained between instances 521 | * Load Balancer generated stickiness (*duration based* session stickiness) 522 | * Application generated stickiness (*application based* session stickiness) 523 | * For HA, use *ElastiCache* to persist and share session state. 
So maintaining 524 | stickiness doesn't matter any more 525 | 526 | 527 | ## [↖](#top)[↑](#4_4_3)[↓](#4_6) RDS HA 528 | * Create *subnets* in different AZs 529 | * Create *subnet group* in RDS dashboard 530 | * Collection of subnets (typically private) in a VPC that is designated for DB instances 531 | * Should have subnets in at least two Availability Zones in a given region 532 | * Configure RDS for **multi-AZ-deployments** and turn replication on 533 | * Keeps a *synchronous* standby replica in a different AZ 534 | * Recommendation is use of Provisioned IOPS 535 | * Automatic failover in case of planned or unplanned outage of the first AZ 536 | * Most likely still has downtime 537 | * Can *force* failover by *rebooting* 538 | * Other benefits 539 | * Patching 540 | * Backups 541 | * *Aurora* can replicate across 3 AZs 542 | * Failover process is automated 543 | * AWS detects an issue and starts the failover process 544 | * DNS records are modified to point to the standby instance 545 | * Application re-establishes existing DB connections 546 | 547 | 548 | ## [↖](#top)[↑](#4_5)[↓](#4_7) HA for IP-based Applications 549 | * If the application requires specific IPs (that are hardcoded somewhere), autoscaling cannot be used 550 | * Use *Elastic IP* and standby instances in different AZs instead 551 | * Cannot use Elastic IP across different regions though 552 | * Scale by increasing instance size (vertical scaling) 553 | 554 | 555 | ## [↖](#top)[↑](#4_6)[↓](#5) HA/Fault Tolerance for Bastion Hosts 556 | * Assign Elastic IP to bastion host in AZ 1 557 | * This IP can also be whitelisted to comply with corporate regulations 558 | * Have another instance on standby in different AZ 559 | * Could be in *ASG* (min/max 1), so that it gets immediately replaced 560 | * Place 2 instances behind ELB and enable *SSH Keep Alive* 561 | * Place 1 instance behind ELB, configure *auto recovery* 562 | 563 | --- 564 | 565 | 566 | # [↖](#top)[↑](#4_7)[↓](#5_1) Analysis 567 | 568 | 569 
| ## [↖](#top)[↑](#5)[↓](#5_1_1) Optimize the environment to ensure maximum performance 570 | 571 | 572 | ### Offloading database workload 573 | * Using **read replicas** 574 | * Read queries are routed to *read replicas*, reducing load on primary db instance 575 | (*source instance*) 576 | * Table indexes can be created on read replicas directly (and not on the master) 577 | * Some use cases (e.g. data analytics) can be performed exclusively against read replicas 578 | * To create read replicas, AWS initially creates a snapshot of the source instance 579 | * Multi-AZ failover instance (if enabled) is used for snapshotting 580 | * After that all updates to the source instance are *asynchronously* copied to the read replica 581 | * Implies data latency, which typically is acceptable. 582 | * `ReplicaLag` can be monitored and *Cloudwatch* alarms can be configured 583 | * *Read replicas* are **not** the same as *multi-AZ failover* instances which 584 | * are *synchronously* updated 585 | * are designed to handle failover 586 | * don't receive any load unless failover actually happens 587 | * Often it is beneficial to have both read replicas and multi-AZ failover instances 588 | * Read replicas themselves can not use the Multi-AZ feature 589 | * A single master can have **up to 5** read replicas 590 | * Can be in different regions 591 | 592 | * Setting up a read replica 593 | * Configure from master instance or other read replica 594 | * Requires 'automated backups' to be enabled on source instance 595 | * Choice of db engine matters, because internal engine features are being used 596 | * Usually pick same database instance type as source instance uses 597 | * AWS provisions a different *endpoint* for the read replica 598 | * Configure use of endpoint on application level 599 | 600 | * Read replicas can be promoted to normal instances 601 | * E.g. 
use read replica to implement bigger changes on db level, after these have been finished 602 | promote to master instance 603 | * Useful for database sharding, could create replicas for each shard 604 | 605 | 606 | ### Looking at EBS volumes 607 | * EBS *pre-warming* 608 | * Used to be required for maximum performance 609 | * Performance is reduced the very first time each block is accessed 610 | * Has been renamed to *initialization* and is no longer required for new EBS volumes 611 | * Still required for volumes that are restored from snapshots 612 | * Storage blocks must be initialized (pulled down from Amazon S3 and written to the volume) 613 | * Use `dd` or `fio` to *read* from every block 614 | * Only required if performance matters, obviously 615 | 616 | 617 | ### Prewarming ELBs 618 | * ELB is designed to increase its resource capacity gradually 619 | * Prevents `http 503` (ELB cannot handle any more requests) 620 | * Can contact AWS to `pre-warm` ELB 621 | * This should not really be required. Maybe if TV ads are running or so. 622 | * Use load testing tools to get a rough estimate of what the current ELB can handle 623 | * Increase load at a rate of no more than 50% every 5 minutes.
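
The ramp-up guideline for load tests (no more than 50% extra load every 5 minutes) can be turned into a rough test plan. The following is only a sketch: the function name `ramp_schedule` and the example start and target rates are made up for illustration and are not part of any AWS tool.

```python
def ramp_schedule(start_rps, target_rps, growth=0.5, step_minutes=5):
    """Return (minute, requests_per_second) steps for a load test.

    Load grows by at most `growth` (0.5 = 50%) per step until the
    target rate is reached. All numbers are hypothetical test inputs.
    """
    if start_rps <= 0 or target_rps < start_rps:
        raise ValueError("need 0 < start_rps <= target_rps")
    steps = [(0, start_rps)]
    minute, rps = 0, start_rps
    while rps < target_rps:
        minute += step_minutes
        # Never grow by more than the allowed factor, never overshoot the target.
        rps = min(rps * (1 + growth), target_rps)
        steps.append((minute, rps))
    return steps


if __name__ == "__main__":
    for minute, rps in ramp_schedule(100, 500):
        print(f"t+{minute:>3} min: {rps:7.1f} req/s")
```

The plan shows how long a test needs to run before the target rate is reached without outpacing the ELB's gradual scaling.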
624 | 625 | 626 | ## [↖](#top)[↑](#5_1_3)[↓](#5_2_1) Identify Performance Bottlenecks and Implement Remedies 627 | 628 | 629 | ### Resizing or changing EBS root volumes 630 | * If EBS is at capacity 631 | * Either upgrade volume size to increase the amount of IOPS available 632 | * Or switch to provisioned IOPS volumes (`io1`) 633 | * Resizing 634 | * Create snapshot of EBS volume first 635 | * Incrementally stored on S3 636 | * Can continue to use EBS volume while the snapshot is taking place 637 | * Create new volume from snapshot 638 | * Stop instance 639 | * Attach new volume 640 | 641 | 642 | ### Setting up certificates for Elastic Load Balancers 643 | * Offloading overhead from the instances behind the ELB 644 | * Create ELB and configure https 645 | * Certificate from 646 | * ACM (AWS managed) 647 | * IAM (for external certificates) 648 | * Upload directly 649 | 650 | 651 | ### Network bottlenecks 652 | * Primary network bottlenecks 653 | * EC2 instances 654 | * Instances in different AZs or regions 655 | * Different instance types get different bandwidth capacities 656 | * No absolute numbers communicated by AWS though 657 | * Not using *enhanced network capabilities* (not supported by some instance types) 658 | * Check for performance issues with `iperf3` (github) 659 | * Measures performance for ip-based networks 660 | * Use VPC Peering to create a reliable connection 661 | * No single point of failure 662 | * Connection to on-prem networks 663 | * Use `Direct Connect` 664 | 665 | 666 | ## [↖](#top)[↑](#5_2_3)[↓](#5_3_1) Identify Potential Issues on a Given Application Deployment 667 | 668 | 669 | ### EBS Root Devices on Terminated Instances - Ensuring Data Durability 670 | * *EBS root volumes* will be deleted on instance termination as per default option 671 | * Could create snapshot before termination to backup data 672 | * Could change default settings 673 | * *Instance store root volumes* are ephemeral, their data is lost on instance termination 674 | 675 | 676
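
Changing the default for an EBS root volume means setting `DeleteOnTermination` to `false` in the instance's block device mapping. A minimal sketch of how that could look, assuming the AWS CLI is used via `aws ec2 modify-instance-attribute`. The helper name, the instance id and the device name `/dev/xvda` are placeholders, and the real root device name depends on the AMI.

```python
import json
import shlex


def keep_root_volume_command(instance_id, device_name="/dev/xvda"):
    """Build an AWS CLI command that keeps the root volume on termination.

    `instance_id` and `device_name` are placeholders; look up the actual
    root device name of the instance before running the command.
    """
    mapping = [{"DeviceName": device_name, "Ebs": {"DeleteOnTermination": False}}]
    return " ".join([
        "aws", "ec2", "modify-instance-attribute",
        "--instance-id", shlex.quote(instance_id),
        # The CLI expects the mapping as a JSON list.
        "--block-device-mappings", shlex.quote(json.dumps(mapping)),
    ])


if __name__ == "__main__":
    print(keep_root_volume_command("i-0123456789abcdef0"))
```

The command is only printed here, not executed, so it can be reviewed before it is applied to a real instance.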
| ### Troubleshooting Auto Scaling Issues 677 | * Attempting to use wrong subnet 678 | * AZ no longer available or supported (outage) 679 | * Security group does not exist 680 | * Associated keypair does not exist 681 | * Auto scaling configuration is not working correctly 682 | * Instance type specification does not exist in that AZ 683 | * Auto scaling is not enabled on that subnet 684 | * Invalid EBS device mapping 685 | * Attempt to attach EBS block device to instance-store AMI 686 | * AMI issues 687 | * Attempt to use *placement groups* with instance types that don't support that 688 | * AWS running out of capacity in that AZ 689 | * If an instance is stopped, e.g. for updating it, autoscaling will consider it unhealthy and 690 | terminate - restart it. Need to suspend autoscaling first. 691 | 692 | --- 693 | 694 | 695 | # [↖](#top)[↑](#5_3_2)[↓](#6_1) OpsWorks 696 | 697 | 698 | ## [↖](#top)[↑](#6)[↓](#6_1_1) Overview and components 699 | * Declarative desired state engine 700 | * Automate, monitor and maintain deployments 701 | * **Cookbooks** define **recipes** 702 | * AWS' implementation of *Chef* 703 | * Original Chef 704 | * AWS-bespoke orchestration components 705 | * Components 706 | * **Stack** 707 | * Set of resources that is managed as a group 708 | * Whole service stack 709 | * **Layer** 710 | * Represent and configure components of a stack 711 | * E.g. 
loadbalancer layer, app layer, db layer 712 | * Share common configuration elements 713 | * **Instance** 714 | * Units of compute within the platform 715 | * Must be associated with at least one layer 716 | * Can run 717 | * 24/7 718 | * Load-based 719 | * Time-based 720 | * **Application** 721 | * Applications that are deployed on one or more instances 722 | * Deployed through source code repo or S3 723 | * Recipes 724 | * Created in Ruby, used to customize different layers 725 | * Run at stack lifecycle events 726 | * `setup` 727 | * *Instance* has finished booting 728 | * `configure` 729 | * *Instance* enters or leaves the `online` state 730 | * *Elastic IP* is associated or disassociated 731 | * *Load balancer* is attached or detached 732 | * Event is executed on *all* instances, not only the impacted one 733 | * `deploy` 734 | * *Deploy command* is run on an instance 735 | * `undeploy` 736 | * *Undeploy command* is run on an instance 737 | * *App* is deleted 738 | * `shutdown` 739 | * When *instance* is shut down, before termination 740 | * Allows cleanup 741 | * Under the hood 742 | * *OpsWorks* **agent** 743 | * Configuration of machines 744 | * *OpsWorks* **automation engine** 745 | * *Create*, *update* & *delete* of various AWS components 746 | * Handles *loadbalancing*, *autoscaling* and *autohealing* 747 | * Supports *lifecycle* events 748 | 749 | 750 | ### Berkshelf 751 | * Addresses an *OpsWorks* shortcoming from old versions - only one repository for recipes 752 | * Was added in *OpsWorks* 11.10 and allows installing cookbooks from many repositories 753 | 754 | TODO: Quickstart OpsWorks 755 | 756 | 757 | ## [↖](#top)[↑](#6_1_1)[↓](#6_2_1) CloudFormation 758 | 759 | 760 | ### Overview 761 | * Allows creating and provisioning **resources** in a reusable **template** fashion 762 | * A *CloudFormation* template is a `JSON` or `YAML` formatted text file 763 | * Related resources are managed in a single unit called a **stack** 764 | * Controls lifecycle of
managed resources 765 | * All the resources in a stack are defined by the stack's *CloudFormation* template 766 | * Stack has `name` & `id` 767 | * Two ways to update a stack 768 | * *Direct update* 769 | * Directly applies changes (if any) 770 | * *Change set* 771 | * Summary of proposed changes, can be applied or rejected 772 | * Will **roll back** stack if it fails to create (can be disabled via API/console) 773 | * A **stack policy** is an *IAM*-style policy statement that governs who can do what 774 | 775 | 776 | ### Templates 777 | * `AWSTemplateFormatVersion` 778 | * `Description` 779 | * `Metadata` 780 | * Details about the template 781 | * `Parameters` 782 | * Values to pass in right before template creation 783 | * Type 784 | * `String`, `Number`, `List<Number>`, `CommaDelimitedList` 785 | * AWS-specific types like `AWS::EC2::KeyPair::KeyName` 786 | * Description 787 | * Default Value 788 | * Allowed Values 789 | * Allowed Pattern 790 | * Validation per *regular expression* 791 | * MinLength/MaxLength 792 | * MinValue/MaxValue 793 | * Problem: 794 | * Usage of parameters *might* make it hard to instantiate stacks without human interaction 795 | * *CloudFormation* is able to auto-generate many resource attributes, e.g. name 796 | * `Mappings` 797 | * Maps keys to values (e.g. different values for different regions) 798 | * `Conditions` 799 | * Check values before deciding what to do 800 | * `Resources` 801 | * Creates resources. Only mandatory section in a template. 802 | * Can have `Condition` element to toggle creation 803 | * `Outputs` 804 | * Values to be exposed from the console or from API calls.
805 | * Can be used in a different stack (*cross stack references*) 806 | * Can be: 807 | * Constructed value 808 | * Parameter reference 809 | * Pseudo parameter 810 | * Output from a function like `Fn::GetAtt` or `Ref` 811 | 812 | 813 | ### Intrinsic Functions 814 | * Used to pass in values that are not available until runtime 815 | * Usable in `resource` properties, `metadata` attributes, and `update policy` attributes (auto-scaling) 816 | * `Ref` 817 | * Returns the *default* value of the specified parameter or resource, usually instance id 818 | * `Fn::GetAtt` 819 | * Returns the value of an attribute from an object, either the default or the specified attribute 820 | * Object is either from the same or a nested template 821 | * `Fn::Join` 822 | * Joins a set of values into a single value separated by the specified delimiter 823 | * `Fn::Sub` 824 | * Substitutes variables in an input string with values that you specify 825 | * `Fn::FindInMap` 826 | * Returns the value corresponding to keys in a two-level map that is declared in the *Mappings* 827 | section 828 | * `Fn::Select` 829 | * Returns a single object from a list of objects by index 830 | * `Fn::Base64` 831 | * Provides encoding, converts from plain text into base64 832 | * `Fn::GetAZs` 833 | * Returns an array that lists *Availability Zones* for a specified region 834 | * If region is omitted, returns AZs from the region the template is applied in 835 | * `Fn::ImportValue` 836 | * Returns the value of an *Output* exported by another stack 837 | * `Fn::Split` 838 | * Splits a string into a list of string values so that you can select an element from the resulting 839 | string list 840 | * `Fn::If` 841 | * Takes a list of arguments (`boolean`, `string1`, `string2`) 842 | * Returns `string1` if `boolean` is `true`, `string2` otherwise 843 | * `Fn::And`, `Fn::Equals`, `Fn::Or`, `Fn::Not` 844 | * Good for `condition` element 845 | 846 | --- 847 | 848 | 849 | # [↖](#top)[↑](#6_2_3)[↓](#7_1) Backups & Recovery
850 | 851 | 852 | ## [↖](#top)[↑](#7)[↓](#7_2) AWS Services with automated backups 853 | * RDS 854 | * Backups 855 | * *Transactional* storage engine recommended as DB engine 856 | * Degrades performance if multi-AZ is not enabled (taken from slave if enabled) 857 | * Deleting an instance deletes all *automated* backups 858 | * Backups are stored internally on S3 859 | * PITR 5 minutes 860 | 861 | * Restoring 862 | * When restoring, only default parameters and security groups are associated with instance 863 | * Can change to different storage engine if closely related and enough space available 864 | 865 | * Elasticache 866 | * Backups 867 | * Available to Redis cluster only 868 | * Taking snapshots can degrade performance, should be performed on read replica 869 | * Backups are stored internally on S3 870 | 871 | * Redshift 872 | * Backups 873 | * Provides free storage equal to the storage capacity of the cluster 874 | * Snapshots can be automated or manual and are incremental 875 | * Backups are stored internally on S3 876 | * Restoring 877 | * Creates a new cluster and imports the data 878 | 879 | * EC2 880 | * Backups 881 | * No built-in automated backup solution 882 | * Snapshots of EBS volumes are incremental, causing performance degradation 883 | * Every snapshot will restore *all* data, even if older snapshots are deleted 884 | * Backups are stored internally on S3 885 | 886 | 887 | ## [↖](#top)[↑](#7_1)[↓](#7_2_1) Disaster Recovery Scenarios 888 | 889 | 890 | ### DR of on-prem infra 891 | * Use AWS as backup solution by storing VMs, snapshots and other data 892 | * 'Pilot light' - have bare minimum infra always ready and scale up as required 893 | * 'Hot standby' (aka 'multi site') - has everything ready to go 894 | 895 | 896 | ### DR of cloud infra 897 | * Duplicate the environment from one region to another 898 | 899 | 900 | ### DR of RDS data 901 | * Protection from multiple AZs being down 902 | * Reduce latency for global audience 903 | * Replica lag will
most likely go up 904 | * Data transfer across regions is getting charged 905 | * May potentially run into bandwidth issues 906 | * Create read replica from existing DB instance, pick different region 907 | * Trigger setup process that will take some time 908 | 909 | 910 | ## [↖](#top)[↑](#7_2_3)[↓](#8) Storing log files and backups 911 | * Implement centralized logging 912 | * From there 913 | * Send to 3rd party tool for analysis 914 | * Backup to S3 915 | * 11x9 durability 916 | * Versioning 917 | * Lifecycle policies 918 | 919 | * Other logging options 920 | * S3 access logs 921 | * Cloudtrail 922 | * Cloudwatch 923 | 924 | --- 925 | 926 | 927 | # [↖](#top)[↑](#7_3)[↓](#8_1) Security 928 | 929 | 930 | ## [↖](#top)[↑](#8)[↓](#8_1_1) Implement and Manage Security Policies 931 | 932 | 933 | ### IAM 934 | IAM is a global service that helps to securely control access to AWS resources. 935 | 936 | * **Users** hold credentials 937 | * **Groups** hold users, typically only provide permission to assume a role 938 | * **Roles** hold policies. 939 | * Can have **trust relationships** with trusted entities that can *assume* this role 940 | * **Policies** can be attached to users, groups or roles (preferred) 941 | * An **instance profile** is a container for an IAM role that you can use to pass role information to an 942 | EC2 instance when the instance starts.
943 | * Users and/or services assume roles 944 | 945 | 946 | #### Policies 947 | * Any actions on resources that are not explicitly allowed are **denied by default** 948 | * Structure 949 | * **E** - `effect` (*allow*/*deny*) 950 | * What the effect will be when the user requests the specific action 951 | * **P** - `principal` (*ARN*) 952 | * The account or user who is allowed access to the actions and resources in the statement 953 | * IAM policies do not have a principal (because they are attached to users, groups or roles) 954 | * **A** - `action` or `notaction` 955 | * Describes the specific action or actions that will be allowed or denied 956 | * **R** - `resource` or `notresource` 957 | * Specifies the object or objects that the statement covers 958 | * **C** - `condition` 959 | * Specifies conditions for when a policy is in effect 960 | * Can use **policy variables** 961 | * `aws:currentTime`, `aws:userid`, ... 962 | 963 | ``` 964 | { 965 | "Version": "2012-10-17", 966 | "Statement": [ 967 | { 968 | "Effect": "Allow", 969 | "Action": "s3:ListAllMyBuckets", 970 | "Resource": "arn:aws:s3:::*" 971 | }, 972 | { 973 | "Effect": "Allow", 974 | "Action": [ 975 | "s3:ListBucket", 976 | "s3:GetBucketLocation" 977 | ], 978 | "Resource": "arn:aws:s3:::productionapp" 979 | }, 980 | { 981 | "Effect": "Allow", 982 | "Action": [ 983 | "s3:GetObject", 984 | "s3:PutObject", 985 | "s3:DeleteObject" 986 | ], 987 | "Resource": "arn:aws:s3:::productionapp/*" 988 | } 989 | ] 990 | } 991 | ``` 992 | 993 | #### IAM Policies 994 | * Managed policies (the new way) 995 | * Can be attached to multiple users, groups and roles 996 | * AWS managed policies 997 | * Updated by AWS when new APIs come out 998 | * Customer managed policies 999 | * Inline policies (the old way) 1000 | 1001 | 1002 | #### IAM roles and EC2 1003 | 1004 | * Create an IAM role. 1005 | * Define which accounts or AWS services can assume the role.
1006 | * EC2 here, could be other services 1007 | * Define which API actions and resources the application can use after assuming the role. 1008 | * Specify the role when you launch your instance, or attach the role to a running or stopped instance. 1009 | * Have the application retrieve a set of temporary credentials and use them. 1010 | 1011 | * Only one role can be assigned to an EC2 instance, and all applications share the same role and permissions 1012 | 1013 | 1014 | ### S3 IAM and bucket policy concepts 1015 | 1016 | 1017 | #### Defaults 1018 | * Bucket is *owned* by the AWS account that created it 1019 | * Bucket ownership is not transferable 1020 | * Bucket owner gets full permission (ACL) 1021 | * The person paying the bills always has full control. 1022 | * A person uploading an object into a bucket owns it by default. 1023 | 1024 | 1025 | #### Bucket policies (resource level) 1026 | * Specify what actions are allowed or denied for which principals on the bucket that the policy 1027 | is attached to 1028 | * Attached *only* to S3 buckets. Can however affect objects in buckets. 1029 | * Contains *principal* element (unnecessary for IAM policies) 1030 | * Use if you’re more interested in *“Who can access this S3 bucket?”* 1031 | * Easiest way to grant *cross-account permissions* for all `s3:*` permissions. (Cannot do this 1032 | with ACLs.)
1033 | * Explicit *deny* in bucket policy overrides an explicit *allow* in IAM policy 1034 | * Defined as JSON 1035 | 1036 | ``` 1037 | { 1038 | "Version":"2012-10-17", 1039 | "Statement": 1040 | [ 1041 | { 1042 | "Sid":"PutObjectAcl", 1043 | "Effect":"Allow", 1044 | "Principal": 1045 | { 1046 | "AWS": 1047 | [ 1048 | "arn:aws:iam::111122223333:user/tom", "arn:aws:iam::444455556666:user/chris" 1049 | ] 1050 | }, 1051 | "Action": 1052 | [ 1053 | "s3:PutObject", 1054 | "s3:PutObjectAcl" 1055 | ], 1056 | "Resource": 1057 | [ 1058 | "arn:aws:s3:::examplebucket/*" 1059 | ] 1060 | } 1061 | ] 1062 | } 1063 | ``` 1064 | 1065 | 1066 | #### ACLs 1067 | * Defined as XML. Legacy, not recommended any more. 1068 | * Can 1069 | * be attached to individual objects (bucket policies only bucket level) 1070 | * control access to objects uploaded into a bucket from a *different* account. 1071 | * Cannot.. 1072 | * have conditions 1073 | * explicitly deny actions 1074 | * grant permission to bucket sub-resources (e.g. lifecycle or static website configurations) 1075 | * Other than *object ACL*s there are *bucket ACL*s as well - only for writing access log objects to a 1076 | bucket. 1077 | ``` 1078 | <?xml version="1.0" encoding="UTF-8"?> 1079 | <AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 1080 | <Owner> 1081 | <ID>*** Owner-Canonical-User-ID ***</ID> 1082 | <DisplayName>owner-display-name</DisplayName> 1083 | </Owner> 1084 | <AccessControlList> 1085 | <Grant> 1086 | <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 1087 | xsi:type="CanonicalUser"> 1088 | <ID>*** Owner-Canonical-User-ID ***</ID> 1089 | <DisplayName>display-name</DisplayName> 1090 | </Grantee> 1091 | <Permission>FULL_CONTROL</Permission> 1092 | </Grant> 1093 | </AccessControlList> 1094 | </AccessControlPolicy> 1095 | ``` 1096 | 1097 | 1098 | #### IAM policies (user level) 1099 | * IAM policies (in general) specify what actions are allowed or denied on what AWS resources 1100 | * Attached to IAM users, groups, or roles (so they cannot grant access to anonymous users) 1101 | * Use if you’re more interested in *“What can this user do in AWS?”* 1102 | 1103 | .|.
1104 | -|- 1105 | `arn:partition:service:region:namespace:relative-id`|`arn:aws:s3:::mybucket` 1106 | `arn:aws:s3:::*`|All buckets and objects in account 1107 | `arn:aws:s3:::mybucket`|`mybucket` 1108 | `arn:aws:s3:::mybucket/*`|All objects in `mybucket` 1109 | `arn:aws:s3:::mybucket/mykey`|`mykey` in `mybucket` 1110 | `arn:aws:s3:::mybucket/developers/${aws:username}/`|folder matching the accessing user's name 1111 | 1112 | 1113 | #### Cloudfront 1114 | * Can use Cloudfront Origin Access Identity to restrict access to S3 objects 1115 | 1116 | 1117 | ## [↖](#top)[↑](#8_1_2_5)[↓](#8_2_1) Ensure Data Integrity and Access Controls when Using the AWS Platform 1118 | 1119 | 1120 | ### MFA 1121 | * *Should* be turned on for all console access 1122 | * *Can* be enabled for API access as well 1123 | * The administrator configures an AWS MFA device for each user who needs to make API requests that 1124 | require MFA authentication. This process is described at Enabling MFA Devices. 1125 | * The administrator creates policies for the users that include a *Condition* element that checks 1126 | whether the user authenticated with an AWS MFA device. 1127 | * The user calls one of the AWS STS API operations that support the MFA parameters `AssumeRole` or 1128 | `GetSessionToken`, depending on the scenario for MFA protection, as explained later. As part of the 1129 | call, the user includes the device identifier for the device that's associated with the user. The 1130 | user also includes the time-based one-time password (TOTP) that the device generates. In either case, 1131 | the user gets back temporary security credentials that the user can then use to make additional 1132 | requests to AWS.
1133 | * This is not supported by all services (supported by *SQS*, *SNS*, *S3*) 1134 | 1135 | * MFA delete can be enabled for root accounts (bucket owners) before permanently deleting an object 1136 | 1137 | ``` 1138 | { 1139 | "Version": "2012-10-17", 1140 | "Statement": [{ 1141 | "Effect": "Allow", 1142 | "Principal": {"AWS": ["ALICE", "BOB"]}, 1143 | "Action": [ "s3:PutObject", "s3:DeleteObject" ], 1144 | "Resource": ["arn:aws:s3:::Alice-Bucket/*"], 1145 | "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}} 1146 | }] 1147 | } 1148 | ``` 1149 | 1150 | 1151 | ### Security Token Service (STS) 1152 | * Allows granting **temporary access** to authenticated users 1153 | * IAM users 1154 | * Web-based identity providers (Google, Facebook, ...) 1155 | * Organization's existing identity system 1156 | * Returns **temporary credentials** that expire after some time: 1157 | * Access key 1158 | * Session token 1159 | 1160 | 1161 | #### Terms 1162 | * **Federation** 1163 | * Trust relationship between identity provider and AWS 1164 | * **Identity broker** 1165 | * Broker in charge of mapping user to the right set of credentials 1166 | * **Identity store** 1167 | * E.g. Google or Facebook 1168 | * **Identities** 1169 | * Users 1170 | 1171 | 1172 | #### Scenarios 1173 | * Temporary credentials with EC2 1174 | * Assign IAM role to instance 1175 | * Get temp credentials from *instance metadata* 1176 | * Temporary credentials with SDK 1177 | * Call `assumeRole`, extract temp credentials 1178 | * Options for temporary credentials with API calls 1179 | * *Sign request* with temp credentials 1180 | * Add AC/SK to request (*header* or *query string*) 1181 | 1182 | 1183 | ## [↖](#top)[↑](#8_2_2_2)[↓](#8_4) Shared responsibility model 1184 | * **Shared responsibility** environment 1185 | * AWS is responsible for: 1186 | * Server/Host level and below 1187 | * Physical environment security 1188 | * Hardware decommissioning 1189 | * Traffic security (Networks, ACLs, SSL,
DDOS-protection) 1190 | * EC2 hypervisor isolation 1191 | * User is responsible for: 1192 | * IAM 1193 | * MFA 1194 | * Password/key-rotation 1195 | * Access advisor (shows used permissions) 1196 | * Trusted advisor (validates best practices) 1197 | * Security groups 1198 | * ACL (resource based policy) 1199 | * VPC 1200 | 1201 | 1202 | ## [↖](#top)[↑](#8_3)[↓](#9) AWS and IT Audits 1203 | * AWS performs self audits of changes to key services to monitor quality, maintain high standards, and 1204 | facilitate continuous improvement of the change management process 1205 | * For audits, AWS provides: 1206 | * *Security of the cloud* 1207 | * Information regarding their global infrastructure 1208 | * From the host operating system and virtualization layer down to the physical security of facilities 1209 | * Annual certifications and reports: (like the Service Organization Control (SOC) reports, ISO 27001 1210 | cert, PCI assessments) 1211 | * For audits, the customer provides: 1212 | * *Security in the cloud* 1213 | * Anything their organization puts on (or connects to) their AWS assets 1214 | Examples: guest operating system, apps on virtual machine instances, objects in S3, database like RDS, 1215 | etc... 1216 | 1217 | --- 1218 | 1219 | 1220 | # [↖](#top)[↑](#8_4)[↓](#9_1) Networking 1221 | 1222 | 1223 | ## [↖](#top)[↑](#9)[↓](#9_1_1) Route53 Routing Policies 1224 | * *Simple* 1225 | * *Weighted* 1226 | * *Latency* 1227 | * *Failover* 1228 | * *Geolocation* 1229 | 1230 | 1231 | ### DNS Failover 1232 | * Can set up *health checks* for endpoints or domains from within *Route53* 1233 | * Route 53 has health checkers in locations around the world. When you create a health check that 1234 | monitors an endpoint, health checkers start to send requests to the endpoint that you specify 1235 | to determine whether the endpoint is healthy. 
1236 | * `evaluate target health` 1237 | * DNS entries are then being associated with health checks and can be configured to failover as 1238 | well (1 primary and n secondary recordsets) 1239 | 1240 | 1241 | ### Weighted 1242 | * Control distribution of traffic with DNS entries 1243 | * This can be based on a certain percentage 1244 | * Set *routing policy* to weighted (instead of failover) 1245 | 1246 | 1247 | ### Latency-based 1248 | * Control distribution of traffic based on latency. 1249 | 1250 | 1251 | ## [↖](#top)[↑](#9_1_3)[↓](#9_2_1) VPC Essentials 1252 | * Provisions a logically isolated section of the AWS cloud 1253 | * Spans over all AZs in a region 1254 | * Allows creating layered architectures 1255 | * Shared or dedicated tenancy (exclusive hardware or not) 1256 | * *Security groups* and subnet *network ACLs* 1257 | * Ability to extend on-premises network to cloud 1258 | 1259 | 1260 | ### Default VPC (Amazon specific) 1261 | * Gives easy access to a VPC without having to configure it from scratch 1262 | * Has a default subnet in each AZ and a single internet gateway attached to the VPC 1263 | * Each instance launched automatically receives a *public IP* (very different to non-default VPC) 1264 | * A deleted default VPC cannot be restored, but a new default VPC can be created 1265 | 1266 | 1267 | ### Non-default VPC (regular VPC) 1268 | * Only has private IP addresses 1269 | * Resources *only* accessible through *Elastic IP*, *VPN* or *internet gateways* 1270 | * Does not have a gateway attached by default 1271 | 1272 | 1273 | ### VPC Peering 1274 | * Connect VPCs through direct network routing 1275 | * Can occur between different accounts and VPCs, including VPCs in different regions (inter-region peering) 1276 | * Allows instances to communicate with each other as if they were in the same network 1277 | * CIDRs must not overlap 1278 | 1279 | 1280 | ### VPC Scenarios 1281 | * VPC with private subnet only -> single tier apps 1282 | * VPC with public and private subnets -> layered apps 1283 | * VPC with public, private subnets and hardware connected
VPN -> extending apps to on-premise 1284 | * VPC with private subnets and hardware connected VPN -> extended VPN 1285 | 1286 | 1287 | ### Components 1288 | * **Subnet** 1289 | * In exactly one AZ 1290 | * If a subnet doesn't have a route to the Internet gateway, it's known as a *private* subnet 1291 | * Instances receive 1292 | * *Private IP* address 1293 | * Internal DNS hostname 1294 | * If traffic is routed to an Internet gateway, the subnet is known as a *public* subnet 1295 | * Instances receive 1296 | * *Public IP* address 1297 | * External DNS hostname 1298 | * EC2 instances are launched into subnets 1299 | * Use ssh-agent forwarding to connect from public to private instances 1300 | * Sometimes grouped into Subnet Groups, e.g. for caching or DB. Typically across AZs 1301 | * **Route Table** 1302 | * Contains a set of rules, called routes that determine where network traffic is directed to 1303 | * Each VPC automatically comes with a main route table that can be configured 1304 | * Each subnet in a VPC must be associated with a route table; the table controls the routing 1305 | for the subnet. A subnet can only be associated with one route table at a time, but multiple 1306 | subnets can be associated with the same route table 1307 | * Each route in a table specifies a destination CIDR and a target 1308 | * Every route table contains a local route for communication within the VPC 1309 | * Can have a *default route* 0.0.0.0/0 to route everything that doesn't have a specific rule 1310 | * **Elastic IP** 1311 | * Static IPv4 address mapped to an *instance* or *network interface* 1312 | * If attached to network interface it's decoupled from the instance's lifecycle 1313 | * Routes to *private IP* address of instance 1314 | * Can be remapped in case of failure. 
1315 | * For use in a specific region only 1316 | * Can only map to instances in public subnets 1317 | * **Gateways** 1318 | * *Internet Gateway* 1319 | * Horizontally scaled, redundant, and highly available VPC component that allows communication 1320 | between instances in a VPC and the internet 1321 | * Provides a target in VPC route tables for internet-routable traffic 1322 | * Performs network address translation (NAT) for instances that have been assigned public 1323 | IPv4 addresses 1324 | * *Virtual Private Gateway* 1325 | * Has VPN connection to customer gateway attached 1326 | * Serves as VPN concentrator on the Amazon side of the VPN connection 1327 | * *Customer Gateway* 1328 | * A physical device or software application on your side of the VPN connection 1329 | * **NAT** 1330 | * *NAT Instances* 1331 | * Manually configured instance from a NAT AMI 1332 | * *NAT Gateway* 1333 | * AWS-managed service 1334 | 1335 | 1336 | ### Security 1337 | 1338 | #### Network ACL 1339 | * Subnet level, acting as firewall 1340 | * Rules for inbound and outbound traffic 1341 | * Rules have numbers and are evaluated from low to high, first matching rule wins, others are *not* evaluated 1342 | * *Stateless* 1343 | 1344 | 1345 | #### Security Groups 1346 | * Acts as a virtual firewall to control inbound and outbound traffic to instances 1347 | * Acts on instance level, not subnet level 1348 | * Rules for inbound and outbound traffic 1349 | * *Stateful* - will always allow response to (allowed) outbound traffic 1350 | * Can refer to other security group, e.g.
allow traffic from there 1351 | 1352 | 1353 | #### Structure & packet flow 1354 | * VPC (has *CIDR*) 1355 | * Gateway (Internet or VPN) 1356 | * Routes (one per subnet, can be shared) 1357 | * Network ACL (one per subnet, can be shared) 1358 | * Subnets (CIDRs match VPC's CIDR) 1359 | * Security Group (on VPC level) 1360 | * Instance (needs public IP for internet communication, either ELB or Elastic IP) 1361 | 1362 | * Flow from internet 1363 | * Internet Gateway 1364 | * VPC Router (routes into desired subnet) 1365 | * Route Table (of that subnet) 1366 | * NACL 1367 | * Security Group 1368 | * Instance 1369 | 1370 | 1371 | #### Connection To On-prem Network/Direct Connect 1372 | * VPC 1373 | * (has attached) Virtual Private Gateway 1374 | * (has attached) VPN Connection 1375 | * (has attached) Customer Gateway 1376 | 1377 | TODO: VPN vs direct connect. Can I use VPN instead of DC? 1378 | 1379 | 1380 | ## [↖](#top)[↑](#9_2_6_4)[↓](#10) Limits: 1381 | .|. 1382 | -|- 1383 | VPCs per region|5 1384 | Subnets per VPC|200 1385 | Customer gateways per region|50 1386 | Virtual private gateways per region|5 1387 | Virtual private gateways per VPC|1 1388 | Internet gateways per region|5 1389 | Elastic IPs per account per region|5 1390 | VPN connections per region|50 1391 | Route tables per region|200 1392 | Security groups per region|500 1393 | 1394 | 1395 | # [↖](#top)[↑](#9_3)[↓](#10_1) Etc 1396 | 1397 | 1398 | ## [↖](#top)[↑](#10)[↓](#10_2) Accessing the OS 1399 | * Services that allow access to the underlying OS 1400 | * EC2 1401 | * ECS 1402 | * EB (Elastic Beanstalk) 1403 | * EMR (Elastic Map Reduce) 1404 | * OpsWorks 1405 | * Services that hide the OS away (managed services) 1406 | * DynamoDB 1407 | * RDS 1408 | 1409 | 1410 | ## [↖](#top)[↑](#10_1)[↓](#10_3) SQS 1411 | * Default message retention period: 4 days (max 14 days) 1412 | * `DelaySeconds` will delay a message appearing in the queue 1413 | * Setting `WaitTimeSeconds` will enable *long polling* (can be more
cost efficient) 1414 | 1415 | 1416 | ## [↖](#top)[↑](#10_2)[↓](#) DynamoDB 1417 | * Prefix partition key with hash to enforce even distribution of IO across many partitions 1418 | --------------------------------------------------------------------------------