├── README.md
├── _config.yml
├── advanced-networking-speciality.md
├── developer-associate.md
├── devops-engineer-professional-02.md
├── devops-engineer-professional.md
├── solutions-architect-associate.md
├── solutions-architect-professional.md
└── sysops-administrator-associate.md
/README.md:
--------------------------------------------------------------------------------
1 | # AWS Certification Notes
2 |
3 | * [DevOps Engineer Professional DOP-C02](devops-engineer-professional-02.md) (2023)
4 | * [Advanced Networking Speciality](advanced-networking-speciality.md) (2021)
5 | * [Solutions Architect Professional](solutions-architect-professional.md) (2021)
6 | * [DevOps Engineer Professional DOP-C01](devops-engineer-professional.md) (2020)
7 | * [Solutions Architect Associate](solutions-architect-associate.md) (2019)
8 | * [SysOps Administrator Associate](sysops-administrator-associate.md) (2018)
9 | * [Developer Associate](developer-associate.md) (2017)
10 |
11 | ---
12 |
13 | * View as [GitHub Pages](https://jangroth.github.io/aws-certification-notes/)
14 | * TOCs generated with [MarkDownHelper](https://github.com/jangroth/markdownhelper)
15 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | markdown: GFM
2 | theme: jekyll-theme-dinky
3 | title: AWS Cert Notes
4 | description: My AWS cert notes.
5 |
--------------------------------------------------------------------------------
/advanced-networking-speciality.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ---
4 | * [Advanced Networking - Speciality](#1)
5 | * [Exam Objectives](#2)
6 | * [Content](#2_1)
7 | * [Design and Implement AWS Networks](#3)
8 | * [AWS Global Network Infrastructure](#3_1)
9 | * [Virtual Private Cloud (VPC)](#3_2)
10 | * [Connecting VPCs to other VPCs](#3_3)
11 | * [Extending on-premises networks to VPCs](#3_4)
12 | * [Open](#4)
13 | * [Services](#4_1)
14 | * [Topics](#4_2)
15 | * [Practice/Hands-on](#4_3)
16 | * [Supporting Material](#4_4)
17 | ---
18 |
19 | ---
20 |
21 | # [↖](#top)[↓](#2) Advanced Networking - Speciality
22 |
23 | > 8/2021 -
24 |
25 | ---
26 |
27 |
28 | # [↖](#top)[↑](#1)[↓](#2_1) Exam Objectives
29 | * Design, develop, and deploy cloud-based solutions using AWS.
30 | * Implement core AWS services according to basic architectural best practices.
31 | * Design and maintain network architecture for all AWS services.
32 | * Leverage tools to automate AWS networking tasks.
33 |
34 |
35 | ## [↖](#top)[↑](#2)[↓](#2_1_1) Content
36 |
37 | * [Domain 1: Design and Implement Hybrid IT Network Architectures at Scale](#2_1_1)
38 | * [Domain 2: Design and Implement AWS Networks](#2_1_2)
39 | * [Domain 3: Automate AWS Tasks](#2_1_3)
40 | * [Domain 4: Configure Network Integration with Application Services](#2_1_4)
41 | * [Domain 5: Design and Implement for Security and Compliance](#2_1_5)
42 | * [Domain 6: Manage, Optimize, and Troubleshoot the Network](#2_1_6)
43 |
44 |
45 | ### [↖](#2_1)[↑](#2_1)[↓](#2_1_2) Domain 1: Design and Implement Hybrid IT Network Architectures at Scale
46 | * 1.1 Implement connectivity for hybrid IT
47 | * 1.2 Given a scenario, derive an appropriate hybrid IT architecture connectivity solution
48 | * 1.3 Explain the process to extend connectivity using AWS Direct Connect
49 | * 1.4 Evaluate design alternatives that leverage AWS Direct Connect
50 | * 1.5 Define routing policies for hybrid IT architectures
51 |
52 | ### [↖](#2_1)[↑](#2_1_1)[↓](#2_1_3) Domain 2: Design and Implement AWS Networks
53 | * 2.1 Apply AWS networking concepts
54 | * 2.2 Given customer requirements, define network architectures on AWS
55 | * 2.3 Propose optimized designs based on the evaluation of an existing implementation
56 | * 2.4 Determine network requirements for a specialized workload
57 | * 2.5 Derive an appropriate architecture based on customer and application requirements
58 | * 2.6 Evaluate and optimize cost allocations given a network design and application data flow
59 |
60 | ### [↖](#2_1)[↑](#2_1_2)[↓](#2_1_4) Domain 3: Automate AWS Tasks
61 | * 3.1 Evaluate automation alternatives within AWS for network deployments
62 | * 3.2 Evaluate tool-based alternatives within AWS for network operations and management
63 |
64 | ### [↖](#2_1)[↑](#2_1_3)[↓](#2_1_5) Domain 4: Configure Network Integration with Application Services
65 | * 4.1 Leverage the capabilities of Route 53
66 | * 4.2 Evaluate DNS solutions in a hybrid IT architecture
67 | * 4.3 Determine the appropriate configuration of DHCP within AWS
68 | * 4.4 Given a scenario, determine an appropriate load balancing strategy within the AWS ecosystem
69 | * 4.5 Determine a content distribution strategy to optimize for performance
70 | * 4.6 Reconcile AWS service requirements with network requirements
71 |
72 | ### [↖](#2_1)[↑](#2_1_4)[↓](#2_1_6) Domain 5: Design and Implement for Security and Compliance
73 | * 5.1 Evaluate design requirements for alignment with security and compliance objectives
74 | * 5.2 Evaluate monitoring strategies in support of security and compliance objectives
75 | * 5.3 Evaluate AWS security features for managing network traffic
76 | * 5.4 Utilize encryption technologies to secure network communications
77 |
78 | ### [↖](#2_1)[↑](#2_1_5)[↓](#3) Domain 6: Manage, Optimize, and Troubleshoot the Network
79 | * 6.1 Given a scenario, troubleshoot and resolve a network issue
80 |
81 |
82 | # [↖](#top)[↑](#2_1_6)[↓](#3_1) Design and Implement AWS Networks
83 |
84 |
85 | ## [↖](#top)[↑](#3)[↓](#3_1_1) AWS Global Network Infrastructure
86 |
87 | * [Overview](#3_1_1)
88 |
89 |
90 |
91 | ### [↖](#3_1)[↑](#3_1)[↓](#3_2) Overview
92 | AWS has the concept of a **Region**, a physical location in the world where AWS clusters data centers. Each group of logical data centers is called an Availability Zone. Each AWS Region consists of multiple, isolated, and physically separate AZs within a geographic area.
93 |
94 | An **Availability Zone (AZ)** is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center.
95 |
96 | A **transit center** provides redundant connectivity between AZs and internet backbones.
97 |
98 | **Edge locations** are AWS data centers ('endpoints') designed to deliver services with the lowest latency possible. Amazon has dozens of these data centers spread across the world. They’re closer to users than Regions or Availability Zones, often in major cities, so responses can be fast and snappy. A subset of services for which latency really matters use edge locations, including:
99 | * *CloudFront*, which uses edge locations to cache copies of the content that it serves, so the content is closer to users and can be delivered to them faster.
100 | * *Lambda@Edge*, a feature of Amazon CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency.
101 | * *Route 53*, which serves DNS responses from edge locations, so that DNS queries that originate nearby can resolve faster (and, contrary to what you might think, is also Amazon’s premier database).
102 | * *Web Application Firewall* and *AWS Shield*, which filter traffic in edge locations to stop unwanted traffic as soon as possible.
103 |
104 | **AWS Local Zones** place compute, storage, database, and other select AWS services closer to end-users. With AWS Local Zones, you can easily run highly-demanding applications that require single-digit millisecond latencies to your end-users such as media & entertainment content creation, real-time gaming, reservoir simulations, electronic design automation, and machine learning:
105 | * A Local Zone is an extension of an AWS Region that is geographically close to your users.
106 | * You can extend any VPC from the parent AWS Region into Local Zones by creating a new subnet and assigning it to the AWS Local Zone. When you create a subnet in a Local Zone, your VPC is extended to that Local Zone. The subnet in the Local Zone operates the same as other subnets in your VPC.
107 |
108 |
109 | ## [↖](#top)[↑](#3_1_1)[↓](#3_2_1) Virtual Private Cloud (VPC)
110 |
111 | * [Overview](#3_2_1)
112 | * [Default VPC (Amazon specific)](#3_2_1_1)
113 | * [Non-default VPC (regular VPC)](#3_2_1_2)
114 | * [VPC Scenarios](#3_2_1_3)
115 | * [Core Components](#3_2_2)
116 | * [Security Components](#3_2_3)
117 | * [Structure & Packet Flow](#3_2_4)
118 | * [Packet flow through VPC components](#3_2_4_1)
119 | * [Limits](#3_2_5)
120 |
121 |
122 |
123 | ### [↖](#3_2)[↑](#3_2)[↓](#3_2_1_1) Overview
124 | **Amazon Virtual Private Cloud (Amazon VPC)** is a service that lets you launch AWS resources in a logically isolated virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways. You can use both IPv4 and IPv6 for most resources in your virtual private cloud, helping to ensure secure and easy access to resources and applications.
125 |
126 | As one of AWS's foundational services, Amazon VPC makes it easy to customize your VPC's network configuration. You can create a public-facing subnet for your web servers that have access to the internet. It also lets you place your backend systems, such as databases or application servers, in a private-facing subnet with no internet access. Amazon VPC lets you use multiple layers of security, including security groups and network access control lists, to help control access to Amazon EC2 instances in each subnet.
127 | * Provisions a logically isolated section of the AWS cloud
128 | * Spans over all AZs in a region
129 | * Allows creating a layered architecture
130 | * Shared or dedicated tenancy (exclusive hardware or not)
131 | * Cannot be changed after VPC creation
132 | * *Security groups* and subnet-level *network ACLs*
133 | * Ability to extend on-premises network to cloud
134 | * Can be extended *after creation* by adding 1 to at most 4 CIDR blocks
135 | * On AWS
136 | * Service - FAQs - User Guide
137 |
138 |
139 | #### [↖](#3_2)[↑](#3_2_1)[↓](#3_2_1_2) Default VPC (Amazon specific)
140 | * Gives easy access to a VPC without having to configure it from scratch
141 | * Has different subnets in different AZs and an internet Gateway (HA, spread out to all AZs)
142 | * Each instance launched automatically receives a *public IP* (and a private IP), this is usually not the case for non-default VPCs
143 | * A deleted default VPC cannot be restored, but a new default VPC can be created (e.g. `aws ec2 create-default-vpc`)
144 | * Comes with default NACL that allows all inbound/outbound traffic
145 |
146 | #### [↖](#3_2)[↑](#3_2_1_1)[↓](#3_2_1_3) Non-default VPC (regular VPC)
147 | * Instances only receive private IP addresses by default
148 | * Resources *only* accessible through *Elastic IP*, *VPN* or *Internet Gateways*
149 |
150 | #### [↖](#3_2)[↑](#3_2_1_2)[↓](#3_2_2) VPC Scenarios
151 | * VPC with private subnet only -> single tier apps
152 | * VPC with public and private subnets -> layered apps
153 | * VPC with public, private subnets and hardware connected VPN -> extending apps to on-premises
154 | * VPC with private subnets and hardware connected VPN -> extended VPN
155 |
156 |
157 | ### [↖](#3_2)[↑](#3_2_1_3)[↓](#3_2_3) Core Components
158 | * **CIDR range**
159 | * VPCs are private networks and use RFC1918 ranges
160 | * 10.0.0.0/8 (-> `10.255.255.255`)
161 | * 172.16.0.0/12 (-> `172.31.255.255`)
162 | * 192.168.0.0/16 (-> `192.168.255.255`)
163 | * These ranges are not routable on the public internet, so VPC addresses cannot conflict with public addresses
164 | * **Subnet**
165 | * In exactly one AZ
166 | * If traffic is routed to an Internet Gateway, the subnet is known as a *public subnet*
167 | * Gets public IP through Internet Gateway
168 | * If a subnet doesn't have a route to the Internet Gateway, it's known as a *private subnet*
169 | * Can get internet access through NAT Gateway
170 | * EC2 instances are launched into subnets
171 | * Sometimes grouped into Subnet Groups, e.g. for caching or DB. Typically across AZs
172 | * **Route Table**
173 | * Contains a set of rules, called *routes* that determine where network traffic is directed to
174 | * Each VPC automatically comes with a *main route table* that can be configured
175 | * Each subnet in a VPC must be associated with a route table; the table controls the routing for the subnet.
176 | * A subnet can only be associated with one route table at a time, but multiple subnets can be associated with the same route table
177 | * Each route in a table specifies a destination CIDR and a target
178 | * Every route table contains a *local route* for communication within the VPC (e.g. `172.31.0.0/16`)
180 | * Can have a *default route* `0.0.0.0/0` to route everything that doesn't have a specific rule
181 |
182 | |Route Table Type|Description|
183 | |-|-|
184 | |Main|The route table that automatically comes with your VPC. It controls the routing for all subnets that are not explicitly associated with any other route table.|
185 | |Subnet|A route table that's associated with a subnet.|
186 | |Gateway|A route table that's associated with an internet gateway or virtual private gateway.|
187 | |Local gateway|A route table that's associated with an Outposts local gateway.|
188 |
189 | * **Elastic IP**
190 | * Static IPv4 address mapped to an instance or network interface
191 | * If attached to network interface it's decoupled from the instance's lifecycle
192 | * Routes to private IP address of instance
193 | * Can be remapped in case of failure
194 | * For use in a specific region only
195 | * Can only map to instances in public subnets
196 | * **Gateways**
197 | * *Internet Gateway*
198 | * Horizontally scaled, redundant, and highly available VPC component that allows communication between instances in a VPC and the internet
199 | * Provides a target in VPC route tables for internet-routable traffic
200 | * Performs network address translation (NAT) for instances that have been assigned public IPv4 addresses
201 | * *Egress-Only* Gateway
202 | * Allows outbound communication over IPv6 from instances in your VPC to the Internet
203 | * Prevents the Internet from initiating an IPv6 connection with your instances.
204 | * (IPv6 addresses are globally unique, and are therefore public by default)
205 | * *Virtual Private* Gateway (VGW)
206 | * AWS side of Site-to-site VPN
207 | * Has VPN connection to customer gateway attached
208 | * Serves as VPN concentrator on the Amazon side of the VPN connection
209 | * Only one virtual private gateway can be attached to a VPC at a time
210 | * *Customer Gateway*
211 | * Customer side of Site-to-site VPN
212 | * A physical device or software application on your side of the VPN connection
213 | * **NAT**
214 | * 'One-way valve' that allows access *to* the internet, but not *from*.
215 | * *NAT Instances*
216 | * Manually configured instance from a NAT AMI
217 | * Need to manually disable *source/destination check* on the instance
218 | * *NAT Gateway*
219 | * AWS-managed service
220 | * HA per AZ, create one gateway per AZ
221 | * **DNS**
222 | * Route53 resolver is provided for VPC (can be disabled)
223 | * Can provide DHCP options to provide own DNS configuration
224 | * DNS hostnames are provided (can be disabled)
225 | * Private (internal) hostname: `ip-private-ipv4-address.region.compute.internal`
226 | * Public (external) hostname: `ec2-public-ipv4-address.region.compute.amazonaws.com`
227 |
228 | ### [↖](#3_2)[↑](#3_2_2)[↓](#3_2_4) Security Components
229 | * **Security Groups**
230 | * Acts as a virtual, distributed firewall to control inbound and outbound traffic to instances
231 | * Acts on instance level, not subnet level
232 | * 'Allow rules' for inbound and outbound traffic (*no* explicit deny rules)
233 | * All outbound traffic is allowed by default
234 | * All inbound traffic is denied by default
236 | * Cannot block individual IP addresses (use NACL for that)
237 | * *Stateful* - will always allow response to (allowed) outbound traffic
238 | * Can refer to other security groups, e.g. allow traffic from there
239 | * Can have multiple security groups attached to an instance
240 | * Can have any number of instances within a security group
241 | * **Network ACL**
242 | * Subnet level, acting as firewall
243 | * One subnet can (and must) only ever be associated to one NACL, however, one NACL can be associated to many subnets
244 | * Rules for inbound and outbound traffic
245 | * Rules have numbers and are evaluated from low to high
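246 | * A newly created custom NACL denies everything in and out until rules are added (the default NACL that comes with the VPC allows all traffic)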
247 | * *Stateless*
248 | * Support *allow* and *deny* rules
249 | * Can block IP addresses (Security groups can't)
250 | * **Cannot** block URLs (forward proxies can)
251 | * **VPC Flow Logs**
252 | * Capture information about the IP traffic going to and from network interfaces in a VPC.
253 | * Contains description of networking packets, but not their payload
254 | * Log data can be published to Amazon CloudWatch Logs and Amazon S3
255 | * Can be created at 3 levels:
256 | * VPC
257 | * Subnet
258 | * Network interface
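
The NACL bullets above (numbered rules, evaluated from low to high, allow and deny) can be illustrated with a small model. This is only a sketch under simplifying assumptions: the `NaclRule` class, the single-port matching and the sample addresses are hypothetical and are not an AWS API. It relies on the documented behavior that the first matching rule decides and that unmatched traffic is denied.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str     # source CIDR the rule applies to
    port: int     # single destination port (simplification)
    allow: bool   # True = allow rule, False = deny rule


def nacl_allows(rules, source_ip, port):
    """Return the verdict of the first matching rule, or deny if none matches."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if addr in ipaddress.ip_network(rule.cidr) and port == rule.port:
            return rule.allow
    return False  # implicit final deny


# Hypothetical inbound rules: block one address, allow HTTPS for everyone else.
inbound = [
    NaclRule(100, "203.0.113.7/32", 443, allow=False),
    NaclRule(200, "0.0.0.0/0", 443, allow=True),
]
```

Because rule 100 is evaluated before rule 200, the single address is blocked even though the broader allow rule would match it. A security group cannot express this, since it supports allow rules only.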
259 |
260 |
261 | ### [↖](#3_2)[↑](#3_2_3)[↓](#3_2_4_1) Structure & Packet Flow
262 |
263 | #### [↖](#3_2)[↑](#3_2_4)[↓](#3_2_5) Packet flow through VPC components
264 | * VPC (has *CIDR*)
265 | * Gateway (Internet or VPN)
266 | * Router
267 | * Route table (one per subnet, can be shared)
268 | * Network ACL (one per subnet, can be shared)
269 | * Subnets (CIDRs match VPC's CIDR)
270 | * Security Group (on VPC level)
271 | * Instance (needs public IP for internet communication, either ELB or Elastic IP)
272 |
273 |
274 |
275 | ### [↖](#3_2)[↑](#3_2_4_1)[↓](#3_3) Limits
276 | |||
277 | |-|-|
278 | |VPCs per region|5|
279 | |Min/max VPC size|`/28`/`/16`|
280 | |Subnets per VPC|200|
281 | |Customer gateways per region|50|
282 | |Internet gateways per region|5|
283 | |Elastic IPs per account per region|5|
284 | |VPN connections per region|50|
285 | |Route tables per region|200|
286 | |Security groups per region|500|
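
The size limits in the table above, together with the RFC 1918 ranges listed under Core Components, can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the helper names `is_valid_vpc_cidr` and `carve_subnets` are made up for illustration, and the check follows these notes (private ranges, `/16` to `/28`) rather than the full set of rules AWS applies.

```python
import ipaddress
from itertools import islice

# Private ranges from RFC 1918, as listed in the notes.
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_valid_vpc_cidr(cidr):
    """True if the block is inside a private range and between /16 and /28."""
    net = ipaddress.ip_network(cidr)
    in_private_range = any(net.subnet_of(block) for block in RFC1918)
    return in_private_range and 16 <= net.prefixlen <= 28


def carve_subnets(vpc_cidr, new_prefix, count):
    """Return the first `count` subnets of the given prefix length."""
    net = ipaddress.ip_network(vpc_cidr)
    return [str(s) for s in islice(net.subnets(new_prefix=new_prefix), count)]
```

For example, `carve_subnets("10.0.0.0/16", 24, 3)` yields three `/24` blocks that could serve as one subnet per AZ.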
287 |
288 | ---
289 |
290 |
291 | ## [↖](#top)[↑](#3_2_5)[↓](#3_3_1) Connecting VPCs to other VPCs
292 |
293 | * [Overview](#3_3_1)
294 | * [VPC Peering](#3_3_2)
295 | * [Establishing a VPC peering](#3_3_2_1)
296 | * [Longest prefix match](#3_3_2_2)
297 | * [Unsupported VPC peering configurations](#3_3_2_3)
298 | * [Limits](#3_3_2_4)
299 | * [Transit Gateway](#3_3_3)
300 | * [Overview](#3_3_3_1)
301 | * [Setting up a Transit Gateway](#3_3_3_2)
302 | * [Transit VPC (=Software VPN, not recommended any more)](#3_3_4)
303 | * [AWS PrivateLink](#3_3_5)
304 |
305 |
306 |
307 | ### [↖](#3_3)[↑](#3_3)[↓](#3_3_2) Overview
308 |
309 | ||VPC Peering|Transit Gateway|
310 | |-|-|-|
311 | |VPC-Limit|125 peerings|5,000 attachments|
312 | |Bandwidth limit|N/A (intra-region)|50Gb/s per VPC attachment|
313 | |Management|Decentralized|Centralized|
314 | |Cost Dimensions|Data transfer|Data transfer & attachment|
315 |
316 |
317 | ### [↖](#3_3)[↑](#3_3_1)[↓](#3_3_2_1) VPC Peering
318 | * Connect VPCs through direct network routing
319 | * Cross-region, cross-account
320 | * Allows instances to communicate with each other as if they were in the same network
321 | * Full private IP connectivity between VPCs
322 | * Connectivity must be established for each pair of VPCs that needs to communicate
323 | * Can reference a security group of a peered VPC (even cross-account)
324 | * Must update route tables in each VPC’s subnets to ensure instances can communicate
325 | * On AWS:
326 | * Documentation - FAQs
327 |
328 |
329 | #### [↖](#3_3)[↑](#3_3_2)[↓](#3_3_2_2) Establishing a VPC peering
330 | * Consumer VPC initiates peering request
331 | * Provider VPC accepts peering request
332 | * Route tables on both sides are updated, to ensure traffic can flow
333 |
334 |
335 | #### [↖](#3_3)[↑](#3_3_2_1)[↓](#3_3_2_3) Longest prefix match
336 | * VPC uses the longest prefix match to select the most specific route
337 | * Other way of saying it is “most specific route”
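
Longest prefix match can be sketched in a few lines. The route table below is hypothetical (the targets are placeholder IDs and the dictionary layout is not an AWS data structure); the point is only that, among all routes whose destination CIDR contains the address, the one with the longest prefix wins.

```python
import ipaddress


def select_route(route_table, destination_ip):
    """Return the target of the most specific route that matches the address."""
    addr = ipaddress.ip_address(destination_ip)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        return None  # no route matches
    _, target = max(candidates, key=lambda item: item[0].prefixlen)
    return target


# Hypothetical routes: destination CIDR -> target.
routes = {
    "172.31.0.0/16": "local",
    "10.1.0.0/16": "pcx-aaaabbbb",
    "10.1.5.0/24": "pcx-aaaacccc",
    "0.0.0.0/0": "igw-example",
}
```

With these routes, traffic to `10.1.5.9` uses the `/24` route even though the `/16` route and the default route also match.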
338 |
339 |
340 | #### [↖](#3_3)[↑](#3_3_2_2)[↓](#3_3_2_4) Unsupported VPC peering configurations
341 | * *Overlapping CIDR blocks*
342 | * Cannot create a VPC peering connection between VPCs with matching or overlapping IPv4 CIDR blocks
343 | * *Transitive peering*
344 | * You have a VPC peering connection between VPC A and VPC B (pcx-aaaabbbb), and between VPC A and VPC C (pcx-aaaacccc). There is no VPC peering connection between VPC B and VPC C. You cannot route packets directly from VPC B to VPC C through VPC A.
345 | * *Edge to edge routing through a gateway or private connection*
346 | * A VPN connection or an AWS Direct Connect connection to a corporate network
347 | * An internet connection through an internet gateway
348 | * An internet connection in a private subnet through a NAT device
349 | * A gateway VPC endpoint to an AWS service; for example, an endpoint to Amazon S3.
350 | * (IPv6) A ClassicLink connection. You can enable IPv4 communication between a linked EC2-Classic instance and instances in a VPC on the other side of a VPC peering connection. However, IPv6 is not supported in EC2-Classic, so you cannot extend this connection for IPv6 communication.
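
The overlapping-CIDR restriction above is easy to test before requesting a peering connection. A minimal sketch, assuming each VPC is described simply by a list of its IPv4 CIDR blocks (the function name `can_peer` is made up for illustration):

```python
import ipaddress


def can_peer(cidrs_a, cidrs_b):
    """True if no CIDR block of VPC A matches or overlaps a block of VPC B."""
    nets_a = [ipaddress.ip_network(c) for c in cidrs_a]
    nets_b = [ipaddress.ip_network(c) for c in cidrs_b]
    return not any(a.overlaps(b) for a in nets_a for b in nets_b)
```

Every block is compared, not only the primary one, because a VPC can carry additional CIDR blocks after creation.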
351 |
352 |
353 | #### [↖](#3_3)[↑](#3_3_2_3)[↓](#3_3_3) Limits
354 | ||soft|hard|
355 | |-|-|-|
356 | |Active VPC peering connections per VPC|50|125|
357 |
358 |
359 | ### [↖](#3_3)[↑](#3_3_2_4)[↓](#3_3_3_1) Transit Gateway
360 |
361 |
362 | #### [↖](#3_3)[↑](#3_3_3)[↓](#3_3_3_2) Overview
363 | AWS Transit Gateway connects VPCs and on-premises networks through a central hub. This simplifies your network and puts an end to complex peering relationships. It acts as a cloud router – each new connection is only made once.
364 |
365 | As you expand globally, inter-Region peering connects AWS Transit Gateways together using the AWS global network. Your data is automatically encrypted, and never travels over the public internet. And, because of its central position, AWS Transit Gateway Network Manager has a unique view over your entire network, even connecting to Software-Defined Wide Area Network (SD-WAN) devices.
366 | * Provides transitive peering between thousands of VPCs and on-premises networks in a hub-and-spoke (star) topology
367 | * Private IP connectivity
368 | * VPCs must be in same region as Transit Gateway
369 | * However, you can peer Transit Gateways across regions
370 | * VPCs can be in different accounts
371 | * Transit Gateway Route Tables: Control which VPCs can talk to which other VPCs
372 | * Works with Direct Connect Gateway, VPN connections
373 | * Instances in a VPC can access a NAT Gateway, NLB, PrivateLink, and EFS in other VPCs attached to the AWS Transit Gateway.
374 | * Share cross-account using Resource Access Manager
375 | * AWS Resource Access Manager (AWS RAM) lets you share your resources with any AWS account or through AWS Organizations. If you have multiple AWS accounts, you can create resources centrally and use AWS RAM to share those resources with other accounts.
376 | * Supports *IP Multicast* (not supported by any other AWS service)
377 | * On AWS:
378 | * Service - FAQs - User Guide
379 |
380 |
381 | #### [↖](#3_3)[↑](#3_3_3_1)[↓](#3_3_4) Setting up a Transit Gateway
382 | * Connected VPCs route to Transit Gateway
383 | * Transit Gateway Route Table determines which VPCs can talk to each other
384 |
385 |
386 | ### [↖](#3_3)[↑](#3_3_3_2)[↓](#3_3_5) Transit VPC (=Software VPN, not recommended any more)
387 | * Not an AWS offering, newer managed solution is Transit Gateway
388 | * Uses the public internet with a software VPN solution
389 | * Allows for transitive connectivity between VPC & locations
390 | * More complex routing rules, overlapping CIDR ranges, network-level packet filtering
391 |
392 |
393 | ### [↖](#3_3)[↑](#3_3_4)[↓](#3_4) AWS PrivateLink
394 | ...
395 |
396 | ---
397 |
398 |
399 | ## [↖](#top)[↑](#3_3_5)[↓](#3_4_1) Extending on-premises networks to VPCs
400 |
401 | * [AWS VPN](#3_4_1)
402 | * [AWS Direct Connect](#3_4_2)
403 |
404 |
405 |
406 | ### [↖](#3_4)[↑](#3_4)[↓](#3_4_2) AWS VPN
407 | ...
408 |
409 | ### [↖](#3_4)[↑](#3_4_1)[↓](#4) AWS Direct Connect
410 | ...
411 |
412 | ---
413 |
414 |
415 | # [↖](#top)[↑](#3_4_2)[↓](#4_1) Open
416 |
417 | ## [↖](#top)[↑](#4)[↓](#4_2) Services
418 | * RAM
419 |
420 | ## [↖](#top)[↑](#4_1)[↓](#4_3) Topics
421 | * IPv4 vs IPv6
422 | * Dynamic Routing Protocols (BGP)
423 |
424 | ## [↖](#top)[↑](#4_2)[↓](#4_4) Practice/Hands-on
425 | * VPC Peering
426 | * Transit Gateway
427 | * Transit VPC
428 | * PrivateLink/Endpoint service
429 |
430 | ---
431 |
432 |
433 | ## [↖](#top)[↑](#4_3) Supporting Material
434 | * [Exam Readiness: AWS Certified Advanced Networking - Specialty](https://www.aws.training/Details/Curriculum?id=21330) (free aws training)
435 | * [AWS Networking Fundamentals](https://www.youtube.com/watch?v=hiKPPy584Mg) (youtube)
436 |
437 |
--------------------------------------------------------------------------------
/developer-associate.md:
--------------------------------------------------------------------------------
1 | [toc_start]::
2 |
3 | ---
4 | * [AWS Developer Associate](#1)
5 | * [AWS Fundamentals](#2)
6 | * [Global infrastructure](#2_1)
7 | * [Storage overview](#2_2)
8 | * [Security Concepts](#2_3)
9 | * [Services](#3)
10 | * [IAM](#3_1)
11 | * [Secure Token Service (STS)](#3_2)
12 | * [S3](#3_3)
13 | * [Dynamo DB](#3_4)
14 | * [Elastic Compute Cloud (EC2)](#3_5)
15 | * [Elastic Load Balancer (ELB)](#3_6)
16 | * [SNS](#3_7)
17 | * [SQS](#3_8)
18 | * [Cloudformation](#3_9)
19 | * [Elastic Beanstalk (EB)](#3_10)
20 | * [Simple Workflow Service (SWF)](#3_11)
21 | * [Virtual Private Cloud (VPC)](#3_12)
22 | * [Relational Database Service (RDS)](#3_13)
23 | * [Etc](#4)
24 | ---
25 | [toc_end]::
26 |
27 | # [↖](#top)[↑](#)[↓](#2) Developer Associate
28 | > 6/2017 - 8/2017
29 |
30 |
31 | # [↖](#top)[↑](#1)[↓](#2_1) AWS Fundamentals
32 |
33 |
34 | ## [↖](#top)[↑](#2)[↓](#2_2) Global infrastructure
35 | * **Region** - grouping of data centers
36 | * **AZ** - one or more discrete data centers in a region. Redundancy throughout AZs in one region
37 | * **Edge Location** - location to deliver cached data fast -> Use *CloudFront* CDN to cache data
38 | close to where it's being used
39 |
40 |
41 | ## [↖](#top)[↑](#2_1)[↓](#2_2_1) Storage overview
42 |
43 | ### Instance store volumes
44 | * **Temporary block storage**
45 | * Physically attached to the host computer of the instance
46 | * Useful for often-changing data like caches & buffers
47 | * *Data is lost* when EC2 instance stops or terminates (*ephemeral* data)
48 |
49 | ### Elastic Block Storage (EBS)
50 | * **Permanent block storage**, independent of the instance
51 | * Attachable to running EC2 instances (same AZ)
52 | * Only accessible by a *single instance*
53 | * Supports snapshots
54 | * Can be encrypted
55 | * Stores redundantly in single AZ
56 | * Different volume options:
57 | * General purpose SSD
58 | * Provisioned IOPS
59 | * Magnetic volumes
60 |
61 | ### Elastic File System (EFS)
62 | * **Scalable file storage** for use with *Amazon EC2 instances*
63 | * Elastic storage capacity, growing and shrinking as files are added or removed
64 | * *Multiple EC2 instances* from *multiple AZs* can access an EFS file system at the same time
65 | * Stores redundantly in multiple AZs
66 |
67 | ### Amazon Glacier
68 | * Low cost, very slow retrieval
69 | * Can be integrated with S3 lifecycle policy
70 |
71 | ### Database Storage
72 | * *DynamoDB*
73 | * *RDS*
74 | * DBs on EC2 instances
75 | * *AWS Redshift* (data warehouse service)
76 |
77 | ### In-memory caching
78 | * *ElastiCache* (Memcached and Redis)
79 | * Software on EC2 instances
80 |
81 | ### Storage gateway
82 | * Integrate existing *on-premises storage* infrastructure and data with the AWS Cloud
83 |
84 |
85 | ## [↖](#top)[↑](#2_2_7)[↓](#3) Security Concepts
86 | * **Shared responsibility** environment
87 | * AWS is responsible for:
88 | * Server / Host level and below
89 | * Physical environment security
90 | * Hardware decommissioning
91 | * Traffic security (Networks, ACLs, SSL, DDOS-protection)
92 | * EC2 hypervisor isolation
93 | * User is responsible for:
94 | * IAM
95 | * MFA
96 | * Password/key-rotation
97 | * Access advisor (shows used permissions)
98 | * Trusted advisor (validates best practices)
99 | * Security groups
100 | * ACL (resource based policy)
101 | * VPC
102 |
103 |
104 | # [↖](#top)[↑](#2_3)[↓](#3_1) Services
105 |
106 | ## [↖](#top)[↑](#3)[↓](#3_1_1) IAM
107 | IAM is a global service that helps to securely control access to AWS resources.
108 |
109 | * **Users** hold credentials
110 | * **Groups** hold users, typically only provide permission to assume a role
111 | * **Roles** hold policies.
112 | * Can have **trust relationships** with trusted entities that can *assume* this role
113 | * **Policies** can be attached to users, groups or roles (preferred)
114 | * An **instance profile** is a container for an IAM role that you can use to pass role information to an
115 | EC2 instance when the instance starts.
116 | * Users and / or services assume roles
117 |
118 |
119 | ### Policies
120 | * Any actions on resources that are not explicitly allowed are **denied by default**
121 | * Structure
122 | * **E** - `effect` (*allow* / *deny*)
123 | * What the effect will be when the user requests the specific action
124 | * **P** - `principal` (*ARN*)
125 | * The account or user who is allowed access to the actions and resources in the statement
126 | * IAM policies do not have a principal (because they are attached to users, groups or roles)
127 | * **A** - `action` or `notaction`
128 | * Describes the specific action or actions that will be allowed or denied
129 | * **R** - `resource` or `notresource`
130 | * Specifies the object or objects that the statement covers
131 | * **C** - `condition`
132 | * Specifies conditions for when a policy is in effect
133 | * Can use **policy variables**
134 | * `aws:currentTime`, `aws:userid`, ...
135 |
136 | ```
137 | {
138 | "Version": "2012-10-17",
139 | "Statement": [
140 | {
141 | "Effect": "Allow",
142 | "Action": "s3:ListAllMyBuckets",
143 | "Resource": "arn:aws:s3:::*"
144 | },
145 | {
146 | "Effect": "Allow",
147 | "Action": [
148 | "s3:ListBucket",
149 | "s3:GetBucketLocation"
150 | ],
151 | "Resource": "arn:aws:s3:::productionapp"
152 | },
153 | {
154 | "Effect": "Allow",
155 | "Action": [
156 | "s3:GetObject",
157 | "s3:PutObject",
158 | "s3:DeleteObject"
159 | ],
160 | "Resource": "arn:aws:s3:::productionapp/*"
161 | }
162 | ]
163 | }
164 | ```
165 |
166 | #### IAM Policies
167 | * Managed policies (the new way)
168 | * Can be attached to multiple users, groups and roles
169 | * AWS managed policies
170 | * Updated by AWS when new APIs are released
171 | * Inline policies (the old way)
172 |
173 |
174 | ### Limits
175 | .|.
176 | -|-
177 | Groups per account|100
178 | Instance profiles|100
179 | Roles|500
180 | Server certificates|20
181 | Users|5000
182 |
183 |
184 | ## [↖](#top)[↑](#3_1_2)[↓](#3_2_1) Secure Token Service (STS)
185 | * Allows granting **temporary access** to authenticated users
186 | * IAM users
187 | * Web-based identity providers (google, facebook, ...)
188 | * Organization's existing identity system
189 | * Returns **temporary credentials** that expire after some time:
190 | * Access key ID and secret access key
191 | * Session token
192 |
193 |
194 | ### Terms
195 | * **Federation**
196 | * Trust relationship between identity provider and AWS
197 | * **Identity broker**
198 | * Broker in charge of mapping user to the right set of credentials
199 | * **Identity store**
200 | * Eg Google or Facebook
201 | * **Identities**
202 | * Users
203 |
204 |
205 | ### Scenarios
206 | * Temporary credentials with EC2
207 | * Assign IAM role to instance
208 | * Get temp credentials from *instance metadata*
209 | * Temporary credentials with SDK
210 | * Call `assumeRole`, extract temp credentials
211 | * Options for temporary credentials with API calls
212 | * *Sign request* with temp credentials
213 | * Add AC / SK to request (*header* or *query string*)
214 |
215 |
216 | ## [↖](#top)[↑](#3_2_2)[↓](#3_3_1) S3
217 |
218 | Amazon Simple Storage Service (S3) is object storage with a simple web service interface to store and
219 | retrieve any amount of data from anywhere on the web. It is designed to deliver 11x9 durability and
220 | scale past trillions of objects worldwide.
221 |
222 | * **Key**-**value** storage (folder-like structure is only a UI representation)
223 | * **Bucket** size is unlimited. Objects from 0B to 5TB.
224 | * HA and scalable, transparent data partitioning
225 | * Bucket lifecycle events can trigger *SNS*, *SQS* or *AWS Lambda*
226 | * New object created events
227 | * Object removal events
228 | * Reduced Redundancy Storage (RRS) object lost event
229 | * Bucket names have to be globally unique, should comply with DNS naming conventions.
230 | * `http://bucket.s3.amazonaws.com`
231 | * `http://bucket.s3-aws-region.amazonaws.com`
232 | * `http://s3.amazonaws.com/bucket`
233 | * `http://s3-aws-region.amazonaws.com/bucket`
234 |
235 |
236 | ### Performance & Consistency
237 | * Bucket operations **get** - **list** - **put** - **delete** - **head**
238 | * Implemented through *http* operations: `GET` - `PUT` - `DELETE` - `HEAD`
239 | * *Read-after-write consistency* for `PUT` of *new* objects.
240 | * *Eventual consistency* for *overwrite* `PUT` and `DELETE` (stale reads but low latency).
241 | * Can only delete a bucket that is empty.
242 | * *Scales* automatically, up to a certain limit:
243 | * Consistent:
244 | * `>100 PUT/LIST/DELETE/s`
245 | * `>300 GET/s`
246 | * Bursts:
247 | * `>300 PUT/LIST/DELETE/s`
248 | * `>800 GET/s`
249 | * Key names are used to determine which partition to store the object in.
250 | * Make sure keys are spread out (not sequential)
251 | * E.g. by adding a random prefix to the key name
252 | * For `GET` requests put *AWS CloudFront* in front of S3 bucket
253 | * Internal caching
254 | * Reduced latency - objects are physically closer to the consumer.
255 | * **Multipart upload**
256 | * Recommended for objects >=100MB, mandatory for >=5GB
257 | * Supports parallel uploads
258 | * Can pause & resume
259 | * Can upload file while it's being created
260 | * 3 step process:
261 | * Initiate multipart upload
262 | * `POST /ObjectName?uploads HTTP/1.1`
263 | * Upload of all parts
264 | * `PUT /ObjectName?partNumber=PartNumber&uploadId=UploadId HTTP/1.1`
265 | * Complete Multipart upload
266 | * ```
267 | POST /ObjectName?uploadId=UploadId HTTP/1.1
268 | ...
269 | ```
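
The upload step of the 3 step process needs one `PUT` per part, so the client first has to decide how to cut the object into parts. The sketch below only does that planning and makes no HTTP calls. `plan_parts` is a hypothetical helper, and the 100 MiB default part size is an assumption chosen to match the recommendation above, not a value S3 prescribes.

```python
MIB = 1024 * 1024


def plan_parts(object_size, part_size=100 * MIB):
    """Return (part_number, first_byte, last_byte) tuples covering the object.

    Part numbers start at 1, matching the `partNumber` query parameter.
    """
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    part_count = -(-object_size // part_size)  # integer ceiling division
    parts = []
    for index in range(part_count):
        first = index * part_size
        last = min(first + part_size, object_size) - 1
        parts.append((index + 1, first, last))
    return parts
```

Because every part has its own number and byte range, the parts can be uploaded in parallel or retried individually before the final "complete" request.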
270 |
271 |
272 | ### Hosting Static Websites
273 | `bucket.s3-website-aws-region.amazonaws.com`
274 | * Bucket name *must* match domain name. Every hosted bucket receives its own URL
275 | * Use *AWS Route 53* to integrate custom domains (also to automatically fail-over from dynamic website)
276 | * Specify `index` & `error` documents
277 | * In *AWS Route 53*: create hosted zone & record set
278 | * Might need to add CORS configuration to bucket (cross origin resource sharing)
279 |
280 |
281 | ### Access Control
282 | * **Effect** – This can be either allow or deny
283 | * **Principal** – Account or user who is allowed access to the actions and resources in the statement
284 | * **Actions** – For each resource, S3 supports a set of operations
285 | * **Resources** – Buckets and objects are the resources
286 | * Authorization works as a *union* of **IAM** & **bucket policies** and **bucket ACLs**
287 |
288 | #### Defaults
289 | * Bucket is *owned* by the AWS account that created it
290 | * Bucket ownership is not transferable
291 | * Bucket owner gets full permission (ACL)
292 | * The person paying the bills always has full control.
293 | * A person uploading an object into a bucket owns it by default.
294 |
295 | #### IAM
296 | * IAM policies (in general) specify what actions are allowed or denied on what AWS resources
297 | * Defined as JSON
298 | * Attached to IAM users, groups, or roles (so they cannot grant access to anonymous users)
299 | * Use if you’re more interested in *“What can this user do in AWS?”*
300 |
301 | #### Bucket policies
302 | * Specify what actions are allowed or denied for which principals on the bucket that the policy is
303 | attached to
304 | * Defined as JSON
305 | * Attached *only* to S3 buckets. Can, however, affect objects in the bucket.
306 | * Contain *principal* element (unnecessary for IAM)
307 | * Use if you’re more interested in *“Who can access this S3 bucket?”*
308 | * Easiest way to grant *cross-account permissions* for all `s3:*` permissions. (Cannot do this with ACLs.)
309 |
310 | #### ACLs
311 | * Defined as XML. Legacy, not recommended any more.
312 | * Can
313 | * be attached to individual objects (bucket policies only bucket level)
314 | * control access to object uploaded into a bucket from a *different* account.
315 | * Cannot
316 | * have conditions
317 | * explicitly deny actions
318 | * grant permission to bucket sub-resources (eg. lifecycle or static website configurations)
319 | * Other than *object ACL*s there are *bucket ACL*s as well - only for writing access log objects to a
320 | bucket.
321 |
322 | #### How to specify resources in a policy:
323 | .|.
324 | -|-
325 | `arn:partition:service:region:namespace:relative-id`|`arn:aws:s3:::mybucket`
326 | `arn:aws:s3:::*`|All buckets and objects in account
327 | `arn:aws:s3:::mybucket`|`mybucket`
328 | `arn:aws:s3:::mybucket/*`|All objects in `mybucket`
329 | `arn:aws:s3:::mybucket/mykey`|`mykey` in `mybucket`
330 | `arn:aws:s3:::mybucket/developers/${aws:username}/`|Folder matching the accessing user's name
331 |
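
The policy-variable row in the table above (written `${aws:username}` in IAM policy syntax) is resolved per request. A minimal sketch of that substitution, plus splitting an S3 ARN into bucket and key, might look as follows. `resolve_resource` and `split_s3_arn` are hypothetical helper names, and the request context is assumed to be a plain dict; AWS performs this resolution itself during policy evaluation.

```python
import re

# Matches policy variables such as ${aws:username}.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def resolve_resource(resource_template, request_context):
    """Replace each ${key} with its value from the request context.

    Unknown variables raise KeyError instead of being silently left in place.
    """
    return _VARIABLE.sub(lambda match: request_context[match.group(1)], resource_template)


def split_s3_arn(arn):
    """Split an S3 ARN into (bucket, key); key is '' for bucket-level ARNs."""
    prefix = "arn:aws:s3:::"
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 ARN: {arn}")
    bucket, _, key = arn[len(prefix):].partition("/")
    return bucket, key
```

Region and namespace are empty in S3 ARNs, which is why three colons follow `s3`.
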
332 | #### Pre-signed URLs
333 | All objects are private by default. Only the object owner has permission to access these objects.
334 | However, the object owner can optionally share objects with others by creating a **pre-signed URL**,
335 | using their own security credentials, to grant time-limited permission to download the objects.
336 |
337 |
338 | ### Logging
339 | * *AWS CloudTrail* logs S3 API calls for bucket-level operations (and much other information) and
340 | stores them in an S3 bucket. Could also send email notifications or trigger *SNS* notifications for
341 | specific events.
342 | * *S3 Server Access Logs* log on object level.
343 |
344 |
345 | ### Versioning
346 | * Works on bucket level (for *all* objects)
347 | * Versioning can either be *unversioned* (default), *enabled* or *suspended*
348 | * **Version ids** are automatically assigned to objects
349 | * IDs cannot be changed.
350 | * As long as versioning is *disabled*, id is set to `null`
351 | * Once enabled, versioning can only be suspended (but not disabled)
352 | * `PUT` creates a new version, `GET` returns the latest version. Specific versions can be requested.
353 | * `DELETE` (without version) marks latest version as deleted and returns a `404` for subsequent `GET`s.
354 | * Older versions (pre-delete) can still be requested.
355 | * Restore old version by deleting the new version or by copying the old version on top of the bucket.
356 | * `DELETE` (with a version) permanently deletes that version.
357 | * If versioning is *suspended*, S3 automatically adds a `null` version ID to every subsequent
358 | object stored thereafter
359 | * *Lifecycle Management policies* can automatically handle old versions, e.g. permanently delete or
360 | move to *AWS Glacier*.
361 | * Different versions of the same object can have different permissions.
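
The `PUT` / `GET` / `DELETE` rules above are easier to follow with a toy model. The class below is an in-memory sketch only, not an S3 client: `VersionedBucket` is a hypothetical name, version ids are simple counters, and a delete marker is modelled as a version whose body is `None`.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not an S3 client)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body); body None = delete marker
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker on top behaves like a 404.
            if not versions or versions[-1][1] is None:
                raise KeyError(f"404: {key}")
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError(f"404: {key}?versionId={version_id}")

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain DELETE only adds a delete marker; older versions survive.
            return self.put(key, None)
        # DELETE with a version id permanently removes that version.
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id
```

In this model, deleting the delete marker itself (a `DELETE` with the marker's version id) makes the previous version current again, which mirrors the "restore by deleting the new version" rule.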
362 |
363 |
364 | ### Encryption
365 |
366 | #### Protecting data in transit
367 | * Using an AWS KMS–Managed Customer Master Key (CMK)
368 | * Before *uploading* to S3, the client makes a request to KMS and receives a plain text encryption key and
369 | a cipher blob. The plain text key encrypts the object; the cipher blob is uploaded to S3 as object metadata.
370 | Decrypt by sending the cipher blob to KMS and using the returned plain text key.
371 | * Before *downloading* from S3, The client first downloads the encrypted object from Amazon S3 along
372 | with the cipher blob version of the data encryption key stored as object metadata. The client then
373 | sends the cipher blob to AWS KMS to get the plain text version of the same, so that it can decrypt
374 | the object data.
375 | * Using a Client-Side Master Key
376 | * Client provides a master key, S3 client generates random data
377 | key and encrypts with client's master key.
378 | * *Uploads* material description as part of the object metadata.
379 | * On *download* S3 client uses metadata to determine the right master key to use for decryption.
380 | * Use *SSL encryption*
381 |
382 | #### Protecting data at rest
383 | * Uses *AES-256* (or others)
384 | * Encryption can be enforced via bucket policy.
385 | * Enable server-side encryption by adding specific header to request (`x-amz-server-side-encryption`).
386 | * Server-Side Encryption with *Amazon S3-Managed Keys* (SSE-S3)
387 | * Each object is encrypted with a unique key employing strong multi-factor encryption
388 | * Furthermore it encrypts the key itself with a master key that is rotated regularly
389 | * Server-Side Encryption with *AWS KMS-Managed Keys* (SSE-KMS)
390 | * Similar to SSE-S3, with extra benefits
391 | * Separate permissions for the use of an envelope key
392 | * Has audit trail
393 | * Server-Side Encryption with *Customer-Provided Keys* (SSE-C)
394 | * Key is not stored with AWS (stores salted HMAC value instead)
395 |
396 |
397 | ### Storage classes
398 | .|.
399 | -|-
400 | S3 Standard|Durability 11x9
401 | |Availability 4x9
402 | S3 IA (infrequent access)|Durability 11x9
403 | |Availability 3x9
404 | S3 RRS (reduced redundancy storage)|Durability 4x9
405 | |Availability 4x9
406 |
407 |
408 | ### Request/response headers
409 | Request|Response
410 | -|-
411 | `x-amz-content-sha256`|`x-amz-delete-marker`
412 | `x-amz-date`|`x-amz-id-2 `
413 | `x-amz-security-token`|`x-amz-request-id`
414 | |`x-amz-version-id `
415 |
416 |
417 | ### Error codes
418 | .|.
419 | -|-
420 | 400 Bad Request|`ExpiredToken`
421 | 400 Bad Request|`InvalidToken`
422 | 400 Bad Request|`InvalidArgument`
423 | 400 Bad Request|`InvalidRequest`
424 | 400 Bad Request|`IncompleteBody`
425 | 400 Bad Request|`IncompleteDigest`
426 | 400 Bad Request|`InvalidBucketName`
427 | 403 Forbidden|`AccessDenied`
428 | 403 Forbidden|`InvalidAccessKeyId`
429 | 404 Not Found|`NoSuchBucket`
430 | 404 Not Found|`NoSuchKey`
431 | 409 Conflict|`BucketAlreadyExists`
432 | 409 Conflict|`BucketNotEmpty`
433 |
434 |
435 | ### Limits
436 | .|.
437 | -|-
438 | Buckets per account|100
439 | Bucket policy max size|20KB
440 | Object size|0B to 5TB
441 | Object size in a single `PUT`|5GB
442 |
443 |
444 | ## [↖](#top)[↑](#3_3_10)[↓](#3_4_1) Dynamo DB
445 |
446 |
447 | ### Overview
448 | * Fully managed **NoSQL** database
449 | * *HA* through different AZs, automatically spreads data and traffic across servers
450 | * Data is replicated across 3 geographically distributed facilities within a region
451 | * Can scale up and down depending on demand (no downtime, no performance degradation)
452 | * Built-in monitoring
453 | * User controlled read/write capacity (recently added: *auto-scaling*)
454 | * Big data: Integrates with *AWS Elastic MapReduce* and *Redshift*
455 | * No joins - create references to other tables manually (`table1#something`)
456 | * Option between **eventual consistency** or **strong consistency**
457 | * Conditional updates and concurrency control (**atomic counters**)
458 |
459 |
460 | ### Core components
461 | * A **table** is a collection of items.
462 | * Can be updated through a single `UpdateTable` command at a time (`ACTIVE` -> `UPDATING`)
463 | * An **item** is a group of one or more attributes that is uniquely identifiable among all of the
464 | other items. (*row* in a traditional db)
465 | * An **attribute** is a fundamental data element, something that does not need to be broken down any
466 | further. Can be nested up to 32 levels. (*column* in a traditional db)
467 | * **Primary keys** are used to uniquely identify each item in a table. Apart from that DynamoDB is
468 | *schemaless*, which means that neither the attributes nor their data types need to be defined
469 | beforehand
470 | * **Secondary indexes** are used to provide more querying flexibility
471 | * **Control plane** operations create and manage DynamoDB table
472 | * **Data plane** operations perform CRUD actions on data in a table
473 | * **DynamoDB streams** operations capture data modification events in DynamoDB tables
474 |
475 |
476 | ### Keys and indexes
477 |
478 | #### Partition key (PK)
479 | * **Partition key** is also called **hash attribute** or **primary key**
480 | * Must be unique, used for internal hash function (*unordered*)
481 | * Used to retrieve data
482 |
483 | #### PK & Sort key
484 | * **Composite PK**: *index* composed of hashed PK (*unordered*) and SK (*ordered*)
485 | * **Sort key** is also called **range attribute** or **range key**
486 | * Different items can have the same *PK*, must have different *SK*
487 |
488 |
489 | ### Secondary indexes
490 | * Associated with exactly one table, from which it obtains its data
491 | * Allows to query or scan data by an *alternate key* (other than PK/SK)
492 | * All secondary indexes are automatically maintained by DynamoDB as sparse objects
493 | * Items will only appear in an index if they exist in the base table
494 | * Makes querying very efficient
495 | * Only for `read` operations, `write` is not supported.
496 | * Tables with secondary indexes need to be created sequentially (`LimitExceededException`)
497 |
498 | #### Projected attributes
499 | * Attributes copied from the base table into an *index*
500 | * Makes them queryable
501 | * Different projection types
502 | * *KEYS_ONLY* - Only the index and primary keys are projected into the index
503 | * *INCLUDE* - Only the specified table attributes are projected into the index
504 | * *ALL* - All of the table attributes are projected into the index
505 |
506 | #### Local secondary index
507 | * Uses the *same PK*, but offers different *SK*
508 | * Every partition of a local secondary index is scoped to a base table partition that has the same
509 | partition key value
510 | * Local secondary indexes are extra tables that dynamo keeps in the background
511 | * Cannot be created after the base table has already been created.
512 | * Can choose *eventual consistency* or *strong consistency* at *creation* time
513 | * *Local* as in "co-located on the same partition"
514 | * Can request *not-projected* attributes for query or scan operation
515 | * Consumes read/write throughput from the original table.
516 |
517 | #### Global secondary index
518 | * Uses *different PK* and offers additional *SK* (or none).
519 | * *PK* does not have to be unique (unlike base table)
520 | * Queries on the global index can span all of the data in the base table, across all partitions
521 | * Can be created after the base table has already been created.
522 | * Only support *eventual consistency*
523 | * Have their own provisioned read/write throughput
524 | * Global secondary keys are distributed transactions across multiple partitions
525 | * Global as in "over many partitions"
526 | * Cannot request not-projected attributes for query or scan operation
527 |
528 |
529 | ### Capacity provisioning
530 | * Unit for operations:
531 | * 1 *strongly consistent* `read` per second (up to 4KB/s)
532 | * 2 *eventual consistent* `read` per second (up to 8KB/s)
533 | * 1 `write` per second (up to 1KB)
534 | * Algorithm
535 |
536 | .|.
537 | -|-
538 | .|*300 strongly consistent reads of 11KB per minute*
539 | Calculate read / writes per second|`300r/60s = 5r/s`
540 | Multiply with payload factor (rounded up to whole 4KB units)|`5r/s * ceil(11KB/4KB) = 5 * 3 = 15cu`
541 | If eventual consistent, divide by 2 (round up)|`15cu / 2 = 7.5 -> 8cu`
542 |
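
The algorithm in the table can be written down as a small calculation. This is a sketch under the assumption that item sizes round up to the next whole unit (which is how 11KB becomes a factor of 3) and that fractional results round up; `read_capacity_units` and `write_capacity_units` are hypothetical helper names.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """Read capacity units for a steady read rate, following the table above."""
    units_per_read = math.ceil(item_size_kb / 4)  # each unit covers up to 4KB
    units = reads_per_second * units_per_read
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)


def write_capacity_units(writes_per_second, item_size_kb):
    """Write capacity units: one unit per 1KB (rounded up) per write."""
    return math.ceil(writes_per_second * math.ceil(item_size_kb))
```

For the example in the table, `read_capacity_units(300 / 60, 11)` gives 15, and passing `strongly_consistent=False` gives 8.
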
543 | * More throughput -> more reads / writes per second
544 | * Exceeding allocated throughput may result in throttling of the operation. Check return code.
545 | * Failing to distribute data across partitions can result in `ProvisionedThroughputExceededException`
546 | * Local secondary index
547 | * `Read`
548 | * If only index keys and projected attributes are read, use the same calculation
549 | * If more than index keys and projected attributes are read, add extra latency and read capacity cost
550 | * Use read capacity from the index *and* for every item from the table
551 | * `Write` (to items in the base table that are indexed)
552 | * 1 for adding an item
553 | * 2 for changing the value of an item
554 | * 1 for deleting an item
555 | * Global secondary index
556 | * Read
557 | * Only supports eventual consistency, so 8KB/s base unit
558 | * Calculated the same as in tables, except that the size of the index entries is used instead
559 | of the size of the entire item
560 | * Write (to items in the base table that are indexed)
561 | * Putting, updating, or deleting items in a table consumes the index's write capacity units
562 |
563 |
564 | ### Query and scan operation
565 |
566 | #### Query
567 | * Finds items based on PK values
568 | * Can *only* query any table or secondary index that have a composite primary key
569 | * *Has* to use PK, *can* specify SK
570 | * Very efficient, only searches index
571 | * Result is ordered by SK
572 | * Returns all attributes or only specified subset
573 | * Eventually consistent per default, can request consistent read
574 | * Can use *conditional attributes*
575 |
576 |
577 | #### Scan
578 | * Reads every item in table (much worse performance than queries)
579 | * Can *filter* result (slows down performance)
580 | * The larger the data set in the table the slower the scan
581 | * *Eventual consistent* reads by default, can specify *strongly consistent*
582 | * Try to avoid scans
583 | * Use *Page Size* to limit how much data is retrieved at the same time
584 |
585 |
586 | ### Atomic and conditional updates
587 |
588 | #### Atomic Counters
589 | * Increment or decrement the value of an existing attribute without interfering with other writes
590 | * Requests are applied in the order they are received
591 | * *Not idempotent*
592 |
593 | #### Conditional updates
594 | * Only proceed if condition is met
595 | * *Idempotent*
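
Why one is idempotent and the other is not shows up as soon as a request is retried. The sketch below models an item as a plain dict; it is an illustration only, with hypothetical helper names. In DynamoDB both behaviours would go through `UpdateItem`, using an update expression for the counter and a condition expression for the conditional write.

```python
def atomic_increment(item, attribute, amount=1):
    """Atomic counter: every call changes the value, so a retry double-counts."""
    item[attribute] = item.get(attribute, 0) + amount
    return item


def conditional_update(item, attribute, expected, new_value):
    """Conditional update: only writes if the current value equals `expected`."""
    if item.get(attribute) != expected:
        return False  # condition failed, item untouched
    item[attribute] = new_value
    return True
```

Sending the same increment twice moves the counter twice, whereas repeating the same conditional update leaves the item in the state the first call produced.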
596 |
597 |
598 | ### How to grant temporary access
599 | * *Web Identity Federation* - use existing OpenId provider, eg. Amazon, Google, Facebook
600 | * *Amazon Cognito* does Web Identity Federation, also synchronizes app data
601 | * *IAM* - contains role for users to assume
602 |
603 |
604 | ### API
605 | * Control Plane
606 |
607 | Create and manage tables|.
608 | -|-
609 | `CreateTable`|Creates a table and specifies the primary index used for data access
610 | `DescribeTable`|Returns information such as primary key schema, throughput settings, index information
611 | `ListTables`|Returns the names of all of your tables in a list
612 | `UpdateTable`|Modifies the settings of a table or its indexes
613 | `DeleteTable`|Removes a table and all of its dependent objects
614 |
615 | * Data Plane
616 |
617 | Creating data|.|conditional?
618 | -|-|-
619 | `PutItem`|Creates a new item, or replaces an old item with a new item|yes
620 | `BatchWriteItem`|Puts or deletes multiple items in one or more tables|no
621 | |Called in a loop it typically checks for unprocessed items and submits a new `BWI` request for those
622 |
623 | Reading data|.|conditional?
624 | -|-|-
625 | `GetItem`|Returns a set of Attributes for an item that matches the PK|no
626 | `BatchGetItem`|Returns the attributes for multiple items from multiple tables using their PKs|no
627 | `Query`|Gets one or more items using the table *PK*, or from a secondary index using the index key|no
628 | `Scan`|Gets all items and attributes by performing a full scan across the table or a secondary index|no
629 |
630 | Updating data|.|conditional?
631 | -|-|-
632 | `UpdateItem`|Modifies one or more attributes in an item|yes
633 |
634 | Deleting data|.|conditional?
635 | -|-|-
636 | `DeleteItem`|Deletes a single item in a table by primary key|yes
637 | `BatchWriteItem`|Puts or deletes multiple items in one or more tables|no
638 | |Called in a loop it typically checks for unprocessed items and submits a new `BWI` request for those
639 |
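
The `BatchWriteItem` loop mentioned in the tables can be sketched as below. The function takes any client object with a `batch_write_item(RequestItems=...)` method that returns a dict containing `UnprocessedItems` (the shape used by the AWS SDK for Python). `batch_write_all`, `FlakyClient`, the attempt limit and the back-off values are assumptions for illustration; `FlakyClient` is a stand-in so the sketch runs without AWS access.

```python
import time


def batch_write_all(client, request_items, max_attempts=5, base_delay=0.1):
    """Resubmit unprocessed items until none are left or attempts run out.

    Returns whatever is still unprocessed (empty dict on success).
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get("UnprocessedItems", {})
        if not pending:
            break
        time.sleep(base_delay * (2 ** attempt))  # back off before retrying
    return pending


class FlakyClient:
    """Hypothetical stand-in for a DynamoDB client: accepts one item per call."""

    def __init__(self):
        self.written = []

    def batch_write_item(self, RequestItems):
        unprocessed = {}
        accepted = False
        for table, requests in RequestItems.items():
            for request in requests:
                if not accepted:
                    self.written.append((table, request))
                    accepted = True
                else:
                    unprocessed.setdefault(table, []).append(request)
        return {"UnprocessedItems": unprocessed}
```

Backing off between attempts gives a throttled table time to recover instead of resubmitting the same items immediately.
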
640 |
641 | ### Limits
642 | .|.
643 | -|-
644 | Tables per account/region|256
645 | Max read / write per table partition|3000 reads / 1000 writes
646 | Partition key|min 1B, max 2048B
647 | Sort key|min 1B, max 1024B
648 | Local secondary index per table|5
649 | Global secondary index per table|5
650 | Item size|1B to 400KB, including name & value
651 | Simultaneous `CreateTable`, `UpdateTable`, `DeleteTable`|up to 10
652 | Single `BatchGetItem`|Max 100 items, must be <16MB
653 | Single `BatchWriteItem`|Up to 25 *PutItem* or *DeleteItem*, must be <16MB
654 | *Query* and *Scan* result set limit|1MB data per call
655 |
656 |
657 | ## [↖](#top)[↑](#3_4_11)[↓](#3_5_1) Elastic Compute Cloud (EC2)
658 | * Resizable **compute capacity** in the cloud
659 | * Amazon Machine Image (AMI)
660 | * Unit of deployment
661 | * Packaged-up environment that includes all the necessary bits to set up and boot an instance
662 | * Can create AMI from configured *EC2* instance
663 |
664 |
665 | ### Different options
666 | * Payment models
667 | * **On-demand instances**
668 | * Pay for compute capacity by the hour, can be terminated by Amazon
669 | * **Reserved instances**
670 | * Provide a significant discount compared to On-Demand pricing and
671 | provide a capacity reservation when used in a specific Availability Zone
672 | * Can transfer between AZs
673 | * **Spot instances**
674 | * Bid on spare Amazon EC2 computing capacity, not available for all instance types
675 | * **Dedicated hosts**
676 | * A physical server with EC2 instance capacity fully dedicated to your use
677 | * Instance sizes & types
678 | * *Sizes*: nano / micro / small / medium / large
679 | * *Types*: general purpose / compute optimized / memory optimized / gpu / storage optimized
680 | * Pricing by
681 | * Compute time
682 | * Data transfer
683 | * Storage
684 | * Elastic IP address
685 | * Monitoring
686 | * Elastic load balancer
687 |
688 |
689 | ### Instance metadata & userdata
690 | * Data about an instance that can be used to configure or manage the running instance
691 | * Available from *running instance* under `http://169.254.169.254/latest/meta-data/`
692 | * Contains various data about the current instance (static & dynamic)
693 | * Can specify user-data
694 | * Allows to launch individual instances from same AMI
695 |
696 |
697 | ### API
698 | .|.
699 | -|-
700 | `DescribeImages`|Describe an Amazon Machine Image
701 | `RegisterImage`|Final process of creating an AMI
702 |
703 |
704 | ### Limits:
705 | .|.
706 | -|-
707 | Elastic IP addresses for EC2-Classic|5
708 |
709 |
710 | ## [↖](#top)[↑](#3_5_4)[↓](#3_6_1) Elastic Load Balancer (ELB)
711 | * **Distributes traffic** between instances that belong to the ELB group
712 | * Stops sending requests to unhealthy instances
713 | * Can store SSL certificates (offloads encryption to load balancer level)
714 | * Can configure session stickiness:
715 | * *LB issued cookie*
716 | * Easy to implement, not best balancing
717 | * *Application issued cookie*
718 | * Cookies based on application session, marginally better
719 | * *ElastiCache*
720 | * Better distribution, requires state to be stored in *DB* or in *EC* memory.
721 | * EC memory is the much better option
722 | * Relies on DNS / *Route53*
723 | * Can route traffic into instances running in private subnets
724 | * Needs to be configured with (empty) public subnets though.
725 |
726 |
727 | ### Limits:
728 | .|.
729 | -|-
730 | Total load balancers per region (ALB & ELB)|20
731 |
732 |
733 | ## [↖](#top)[↑](#3_6_1)[↓](#3_7_1) SNS
734 | * **Publishes** messages to **subscribers** via topic
735 | * **Pub-Sub-Service** for messaging
736 | * Scenarios:
737 | * *Fanout*: Many subscribers process events in parallel and asynchronously
738 | * *Push to SQS*: Services pull from SQS, when they become available
739 | * *Alert*: Notification triggered by event or threshold
740 | * **Mobile Notifications** to mobile devices
741 | * Sends *push notifications* to iOS, Android, Fire OS, Windows and Baidu-based devices
742 |
743 |
744 | ### Components
745 | * **Publisher** (producer)
746 | * Communicates asynchronously with subscribers
747 | * Policies determine which topic(s) publishers can write to
748 | * **Topics**
749 | * Unique name up to 256 characters
750 | * Stored redundantly on multiple servers and datacenters
751 | * **Subscribers** (consumer)
752 | * Subscribes to topic
753 | * Endpoints like mobile app, web server, email, *AWS SQS*, *AWS Lambda*
754 | * Email subscriptions need to be confirmed
755 | * **Messages**
756 | * JSON-formatted key-value pairs
757 | * Fixed set + additional attributes if required
758 | * `POST`s to https endpoints with specific headers
759 | * Contains topics- and subscription-ARN
760 | * To identify messages without parsing the body
761 | * Up to 10 for SQS.
762 | * Provider-specific for mobile push notifications
763 | * Messages can be signed and verified
764 | * Message data:
765 | * `Message`, `MessageId`, `Signature`, `SignatureVersion`, `SigningCertURL`, `Subject`,
766 | `Timestamp`, `TopicArn`, `Type`, `UnsubscribeURL`
767 |
768 |
769 | ### Managing access
770 | * Owner creates topic and controls access to it
771 | * Can use own API (Access Control) and / or *IAM*, similar to *S3*
772 | * Access control policies
773 | * Default denies, needs explicit allow
774 | * Can grant access across account (API call: *AddPermission*)
775 | * IAM
776 | * More fine grained or very coarse, can include conditions
777 | * Can grant temporary security credentials
778 |
779 |
780 | ### Mobile push notifications
781 | * Does not push to endpoint, but to PN-service (platform/provider specific)
782 | 1. Request *credentials* from mobile platforms (ADM, APNS, etc...)
783 | 2. Request a *token* from mobile platforms (*registration ID* for some platforms)
784 | 3. Create a *platform application object*
785 | 4. Create a *platform endpoint object*
786 | 5. Publish a *message* to the mobile endpoint
787 | * A single message can contain different data for different platforms
788 |
789 |
790 | ### API
791 | .|.
792 | -|-
793 | `CreateTopic`|Create a new topic.
794 | `DeleteTopic`|Delete a topic and all its subscriptions.
795 | `Publish`| Publish a new message to the topic.
796 | `ListTopics`|List of topics owned by a particular user (AWS ID).
797 | `ListSubscriptions`|List subscriptions owned by a particular user (AWS ID)
798 | `ListSubscriptionsByTopic`|List of subscriptions for a particular topic
799 | `Subscribe`|Register a new subscription on a topic, will generate a confirmation message from Amazon SNS
800 | `ConfirmSubscription`|Respond to a confirmation message, confirming to receive notifications from the topic
801 | `Unsubscribe`|Cancel a previously registered subscription
802 |
803 |
804 | ### Limits:
805 | .|.
806 | -|-
807 | Topics per account|100,000
808 |
809 |
810 | ## [↖](#top)[↑](#3_7_5)[↓](#3_8_1) SQS
811 | * Scalable **message queue** service
812 | * Allows *loose coupling* and *asynchronous processing*
813 | * **Pull** from *SQS* (*Push* to *SNS*)
814 | * PCI compliant
815 | * Allows for asynchronous processing
816 | * Protection against data loss on application failure
817 |
818 |
819 | ### Core features
820 | * Redundant infrastructure
821 | * Multiple readers / writers at the same time
822 | * Access control via *SQS policies* (similar to *IAM*)
823 | * **Standard queue**
824 | * Guarantees message delivery *at least once*
825 | * *No guarantee on message order*
826 | * No guarantee on not receiving duplicates (app has to deal with it)
827 | * **FIFO queue**
828 | * Guaranteed order
829 | * Exactly-once processing
830 | * *Message groups* - multiple ordered message groups within a single queue
831 | * Name ends in `.fifo`
832 | * **Delay queues**
833 | * Controls when a message becomes available
834 | * Between 0s and 15min, default 0s
835 | * **Visibility timeout**
836 | * Controls when a polled message becomes visible again
837 | * Configurable and extendable for individual messages
838 | * Between 0s and 12h, default 30s
839 | * **Message retention period**
840 | * Amount of time a message will live in the queue if it's not deleted
841 | * Between 1min and 14d, default 4d
842 | * **In flight message**
843 | * Sent to a client but have not yet been deleted or have not yet reached the end of their
844 | visibility window
845 | * **Deadletter queue**
846 | * Queue that other queues can send messages to when these were not successfully
847 | processed.
848 | * **Receive message wait time**
849 | * Value >0 enables *long polling*
850 | * Between 0s and 20s, default 0s (*short polling*)
851 |
852 |
853 | ### Message lifecycle
854 | * Component 1 sends message A to queue
855 | * `SendMessage`/`SendMessageBatch`
856 | * Component 2 retrieves A from queue.
857 | * A remains in queue while it's being processed, but is not returned to any other components
858 | * Message is now considered to be *in flight*.
859 | * `ReceiveMessage`
860 | * Component 2 deletes A from queue during visibility timeout
861 | * Otherwise it will get processed again
862 | * SQS does not delete a message when it is received (only when the retention period expires)
863 | * `DeleteMessage`/`DeleteMessageBatch`
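
The lifecycle above hinges on the visibility timeout: a received message stays in the queue but is hidden until the timeout passes or the consumer deletes it. The toy queue below models just that. It is an in-memory sketch, not SQS: `ToyQueue` is a hypothetical class, and the current time is passed in as a plain number so the behaviour is easy to follow.

```python
import itertools


class ToyQueue:
    """In-memory model of receive / delete with a visibility timeout (not SQS itself)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, invisible_until]
        self._ids = itertools.count(1)

    def send_message(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0]
        return message_id

    def receive_message(self, now):
        """Return (id, body) of the first visible message and put it in flight, or None."""
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def delete_message(self, message_id):
        self._messages.pop(message_id, None)
```

If the consumer never calls `delete_message`, the same message is handed out again once the timeout has elapsed, which is why processing should be safe to repeat.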
864 |
865 |
866 | ### Long polling vs short polling
867 | * **Short polling** returns immediately, could be *false empty* (e.g. message not fully propagated yet)
868 | * **Long polling** won't return unless there's a message in the queue or receive message wait time is
869 | exceeded. Also checks *every server* to avoid false empty responses
870 |
871 |
872 | ### API
873 | .|.
874 | -|-
875 | `SendMessage`/`SendMessageBatch`|Delivers a message to the specified queue (up to 10 per batch, <= 256KB)
876 | `ReceiveMessage`|Retrieves one or more messages (up to 10), `WaitTimeSeconds` for long poll
877 | `ChangeMessageVisibility`/`ChangeMessageVisibilityBatch`|Changes the visibility timeout of a message
878 | `DeleteMessage`/`DeleteMessageBatch`|Deletes the specified message from the specified queue
879 | `SetQueueAttributes`|e.g. `DelaySeconds`, `MessageRetentionPeriod`
880 | `GetQueueUrl`|
881 | `CreateQueue`|
882 | `DeleteQueue`|
883 | `ListQueues`|
884 |
885 |
886 | ### Limits:
887 | .|.
888 | -|-
889 | Max message size|256KB
890 | Max inflight messages|120,000
891 |
892 |
893 | ## [↖](#top)[↑](#3_8_5)[↓](#3_9_1) CloudFormation
894 | * Allows creating and provisioning **resources** in a reusable **template** fashion
895 | * A *CloudFormation* template is a `JSON` or `YAML` formatted text file
896 | * Related resources are managed in a single unit called a **stack**
897 | * All the resources in a stack are defined by the stack's *CloudFormation* template
898 | * Two ways to update a stack
899 | * *Direct update*
900 | * Create **change set**
901 | * Summary of proposed changes
902 | * Will **rollback** stack if it fails to create (can be disabled via API / console)
903 |
904 |
905 | ### Anatomy of template
906 | * *AWSTemplateFormatVersion*
907 | * *Description*
908 | * *Metadata*
909 | * Details about the template
910 | * *Parameters*
911 | * Values to pass in right before template creation
912 | * Allows validation per *regular expression*
913 | * *Mappings*
914 | * Maps keys to values (eg different values for different regions)
915 | * *Conditions*
916 | * Check values before deciding what to do
917 | * *Resources*
918 | * Creates resources
919 | * *Outputs*
920 | * Values to be exposed from the console or from API calls.
921 | * Can be used in a different stack (*cross stack references*)
922 |
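The template sections listed above can be illustrated with a small template built as a Python dictionary and serialised to JSON. This is only a sketch: the logical names (`InstanceTypeParam`, `RegionMap`, `WebServer`), the export name and the AMI IDs are made-up placeholders, and the *Metadata* and *Conditions* sections are left out for brevity.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example: one EC2 instance",
    "Parameters": {
        "InstanceTypeParam": {
            "Type": "String",
            "Default": "t2.micro",
            # Validation via regular expression
            "AllowedPattern": "^t2\\.(micro|small)$",
        }
    },
    "Mappings": {
        # Different values for different regions (placeholder AMI IDs)
        "RegionMap": {
            "us-east-1": {"AMI": "ami-11111111"},
            "eu-west-1": {"AMI": "ami-22222222"},
        }
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Ref": "InstanceTypeParam"},
                "ImageId": {
                    "Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]
                },
            },
        }
    },
    "Outputs": {
        "InstanceId": {
            "Value": {"Ref": "WebServer"},
            # Exporting makes the value usable in cross-stack references
            "Export": {"Name": "example-web-server-id"},
        }
    },
}

template_body = json.dumps(template, indent=2)
```

The resulting `template_body` string is what would be handed to CloudFormation as the template text.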
923 |
924 | ### Intrinsic Functions
925 | * Used to pass in values that are not available until runtime
926 | * Usable in *resource* properties, *metadata* attributes, and *update policy* attributes (auto-scaling)
927 | * `Fn::GetAtt`
928 | * Returns the value of an attribute from a resource in the template
929 | * `Fn::FindInMap`
930 | * Returns the value corresponding to keys in a two-level map that is declared in the *Mappings* section
931 | * `Fn::Join`
932 | * Appends a set of values into a single value, separated by the specified delimiter
933 | * `Fn::GetAZs`
934 | * Returns an array that lists *Availability Zones* for a specified region
935 | * `Fn::Select`
936 | * Returns a single object from a list of objects by index
937 | * `Fn::ImportValue`
938 | * Returns the value of an *Output* exported by another stack
939 | * `Fn::Split`
940 | * Split a string into a list of string values so that you can select an element from the resulting
941 | string list
942 | * `Fn::Sub`
943 | * Substitutes variables in an input string with values that you specify
944 | * `Ref`
945 | * Returns the value of the specified parameter or resource
946 |
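To make the behaviour of a few of these functions concrete, here is a toy resolver for `Ref`, `Fn::Join`, `Fn::Split`, `Fn::Select` and `Fn::FindInMap`. It is a sketch for study purposes, not how CloudFormation is implemented: `resolve` is a hypothetical helper, `Ref` only looks up parameters (not resources), and all values are invented.

```python
def resolve(node, params, mappings):
    """Recursively evaluate a small subset of intrinsic functions."""
    if isinstance(node, list):
        return [resolve(item, params, mappings) for item in node]
    if not isinstance(node, dict):
        return node  # plain literal
    if len(node) == 1:
        (key, arg), = node.items()
        if key == "Ref":
            return params[arg]
        if key == "Fn::Join":
            delimiter, values = arg
            return delimiter.join(resolve(values, params, mappings))
        if key == "Fn::Split":
            delimiter, text = arg
            return resolve(text, params, mappings).split(delimiter)
        if key == "Fn::Select":
            index, values = arg
            return resolve(values, params, mappings)[int(index)]
        if key == "Fn::FindInMap":
            map_name, top_key, second_key = resolve(arg, params, mappings)
            return mappings[map_name][top_key][second_key]
    # Ordinary property dictionary: resolve each value
    return {k: resolve(v, params, mappings) for k, v in node.items()}


params = {"Env": "dev", "AWS::Region": "us-east-1"}
mappings = {"RegionMap": {"us-east-1": {"AMI": "ami-11111111"}}}
properties = {
    "Name": {"Fn::Join": ["-", ["web", {"Ref": "Env"}]]},
    "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]},
    "SecondSubnet": {"Fn::Select": ["1", {"Fn::Split": [",", "subnet-a,subnet-b"]}]},
}
resolved = resolve(properties, params, mappings)
# resolved == {"Name": "web-dev", "ImageId": "ami-11111111",
#              "SecondSubnet": "subnet-b"}
```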
947 |
948 | ### Limits:
949 | .|.
950 | -|-
951 | Max stacks per region|200
952 | Max templates per region|unlimited
953 | Parameters|60
954 | Mappings|100
955 | Resources|200
956 | Outputs|60
957 |
958 |
959 | ## [↖](#top)[↑](#3_9_3)[↓](#3_10_1) Elastic Beanstalk (EB)
960 | * **Full stack** that provisions *capacity*, sets up *load balancing* and *auto-scaling* and configures
961 | *monitoring*
962 | * No need to create / manage infrastructure
963 | * Not good if full control of resource configuration is needed
964 | * Not everything fits into the EB model
965 |
966 |
967 | ### AWS-Stack
968 | * *EC2* instance
969 | * Instance *Security Group*
970 | * *Elastic Load Balancer*
971 | * *Load Balancer Security Group*
972 | * *Auto Scaling Group*
973 | * Automatically replaces instances if they become unavailable
974 | * *S3 Bucket*
975 | 	* Source code, logs & other artifacts
976 | * *CloudWatch Alarm*
977 | * 2 alarms that monitor load on instances & Auto Scaling group scaling up / down
978 | * *CloudFormation stack*
979 | * *Domain name*
980 | * `subdomain.region.elasticbeanstalk.com`
981 |
982 |
983 | ### Supports
984 | * Platform-specific application *source bundle* (e.g. Java `war` for Tomcat)
985 | * Go
986 | * Java with Tomcat
987 | * Java SE
988 | * .NET on Windows Server with IIS
989 | * Node.js
990 | * PHP
991 | * Python
992 | * Ruby (Passenger Standalone)
993 | * Ruby (Puma)
994 | * Single Container Docker
995 | * Multicontainer Docker
996 | * Preconfigured Docker (Glassfish)
997 | * Preconfigured Docker (Python 3.x)
998 | * Preconfigured Docker (Go)
999 |
1000 |
1001 | ### Core components
1002 | * **Application**
1003 | * Logical collection of *Elastic Beanstalk* components, including *environments*, *versions*, and
1004 | *environment configurations*. In Elastic Beanstalk an application is conceptually similar to a
1005 | folder.
1006 | * **Application version**
1007 | * An *application version* refers to a specific, labeled iteration of deployable code for a web
1008 | application
1009 | * **Environment**
1010 | * An environment is a version that is deployed onto AWS resources
1011 | * Runs only a single application version at a time
1012 | * Can run the same version or different versions in many environments at the same time
1013 | * **Environment Configuration**
1014 | * Collection of parameters and settings that define how an environment and its associated resources behave
1015 | * Updating a configuration will cause AWS to automatically apply the changes
1016 | * **Configuration template**
1017 | * Starting point for creating unique environment configurations
1018 |
1019 |
1020 | ### Limits:
1021 | .|.
1022 | -|-
1023 | Applications|75
1024 | Application Versions|1000
1025 | Environments|200
1026 |
1027 |
1028 | ## [↖](#top)[↑](#3_10_4)[↓](#3_11_1) Simple Workflow Service (SWF)
1029 | * **Task coordination** and **state management** service
1030 | * Distributed, scales up and down depending on task
1031 | * Works with *on-premise* and *cloud* apps
1032 | * Allows for *synchronous* or *asynchronous* processing
1033 | * Can contain human events
1034 | * Guaranteed order of execution
1035 | * Tasks can live up to one year (`31,536,000 seconds`)
1036 |
1037 |
1038 | ### Core components
1039 | * **Workflow**
1040 | * A workflow is a set of *activities* that carry out some objective, together with logic that
1041 | coordinates the activities.
1042 | * **Domain**
1043 | * Scope of a *workflow*
1044 | * An account can have multiple *domains*, each of which can contain multiple *workflows*
1045 | * *Workflows* in different *domains* cannot interact
1046 | * **Workflow Starter**
1047 | * Any application that can initiate workflow executions
1048 | * **Activity**
1049 | * Things carried out by a *workflow*
1050 | * **Activity Task**
1051 | * Represents one invocation of an *activity*
1052 | * Can run synchronously or asynchronously
1053 | * Gets assigned to worker
1054 | * *Decision task* tells decider that state of workflow has changed
1055 | * **Activity Worker**
1056 | * Is a program that receives *activity tasks*, performs them, and provides results back
1057 | * Might be used by a person
1058 | * Can live on *EC2* or on-premise
1059 | * **Decider**
1060 | * Coordination logic in a *workflow*
1061 | * Schedules *activity tasks*, provides input data to the *activity workers*, processes events that
1062 | arrive while the *workflow* is in progress, and ultimately ends (or closes) the *workflow* when the
1063 | objective has been completed.
1064 |
1065 |
1066 | ### Limits:
1067 | .|.
1068 | -|-
1069 | Maximum registered domains|100
1070 |
1071 |
1072 | ## [↖](#top)[↑](#3_11_2)[↓](#3_12_1) Virtual Private Cloud (VPC)
1073 | * Provisions a logically isolated section of the AWS cloud
1074 | * Spans over all AZs in a region
1075 | * Allows creating layered architectures
1076 | * Shared or dedicated tenancy (exclusive hardware or not)
1077 | * *Security groups* and subnet *network ACLs*
1078 | * Ability to extend on-premise network to cloud
1079 |
1080 |
1081 | ### Overview
1082 |
1083 | #### Default VPC (Amazon specific)
1084 | * Gives easy access to a VPC without having to configure it from scratch
1085 | * Has a default subnet in each AZ and a single internet gateway attached to the VPC
1086 | * Each instance launched automatically receives a *public IP* (very different to non-default VPC)
1087 | * Can be recreated if deleted (console or `aws ec2 create-default-vpc`)
1088 |
1089 | #### Non-default VPC (regular VPC)
1090 | * Instances only receive private IP addresses by default
1091 | * Resources *only* accessible through *Elastic IP*, *VPN* or *internet gateways*
1092 |
1093 | #### VPC Peering
1094 | * Connect VPCs through direct network routing
1095 | * Can occur between different accounts and VPCs, including across regions (inter-region peering)
1096 | * Allows instances to communicate with each other as if they were in the same network
1097 |
1098 | #### VPC Scenarios
1099 | * VPC with private subnet only -> single tier apps
1100 | * VPC with public and private subnets -> layered apps
1101 | * VPC with public, private subnets and hardware connected VPN -> extending apps to on-premise
1102 | * VPC with private subnets and hardware connected VPN -> extended VPN
1103 |
1104 |
1105 | ### Components
1106 | * **Subnet**
1107 | * In exactly one AZ
1108 | * If traffic is routed to an Internet gateway, the subnet is known as a public subnet
1109 | * If a subnet doesn't have a route to the Internet gateway, it's known as a private subnet
1110 | * EC2 instances are launched into subnets
1111 | * Use ssh-agent forwarding to connect from public to private instances
1112 | * Sometimes grouped into Subnet Groups, e.g. for caching or DB. Typically across AZs
1113 | * **Route Table**
1114 | * Contains a set of rules, called routes that determine where network traffic is directed to
1115 | * Each VPC automatically comes with a main route table that can be configured
1116 | * Each subnet in a VPC must be associated with a route table; the table controls the routing
1117 | for the subnet. A subnet can only be associated with one route table at a time, but multiple
1118 | subnets can be associated with the same route table
1119 | * Each route in a table specifies a destination CIDR and a target
1120 | * Every route table contains a local route for communication within the VPC
1121 | * Can have a *default route* 0.0.0.0/0 to route everything that doesn't have a specific rule
1122 | * **Elastic IP**
1123 | * Static IPv4 address mapped to an instance or network interface
1124 | * If attached to network interface it's decoupled from the instance's lifecycle
1125 | * Routes to private IP address of instance
1126 | * Can be remapped in case of failure.
1127 | * For use in a specific region only
1128 | * Can only map to instances in public subnets
1129 | * **Gateways**
1130 | * *Internet Gateway*
1131 | * Horizontally scaled, redundant, and highly available VPC component that allows communication
1132 | between instances in a VPC and the internet
1133 | * Provides a target in VPC route tables for internet-routable traffic
1134 | * Performs network address translation (NAT) for instances that have been assigned public
1135 | IPv4 addresses
1136 | * *Virtual Private Gateway*
1137 | * Has VPN connection to customer gateway attached
1138 | * Serves as VPN concentrator on the Amazon side of the VPN connection
1139 | * *Customer Gateway*
1140 | * A physical device or software application on your side of the VPN connection
1141 | * **NAT**
1142 | * *NAT Instances*
1143 | 		* Manually configured instance from a NAT AMI
1144 | 	* *NAT Gateway*
1145 | 		* AWS-managed service
1146 |
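The route table rules described above (a local route, an optional `0.0.0.0/0` default route, and the most specific match winning) can be sketched with Python's `ipaddress` module. This is an illustrative model only: `pick_route`, the CIDRs and the target name `igw-example` are assumptions for the example.

```python
import ipaddress


def pick_route(routes, destination_ip):
    """Return the target of the most specific route matching the IP."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: traffic is dropped
    # Longest prefix (most specific CIDR) wins
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


routes = [
    ("10.0.0.0/16", "local"),       # local route inside the VPC
    ("0.0.0.0/0", "igw-example"),   # default route to an internet gateway
]

print(pick_route(routes, "10.0.1.25"))      # prints "local"
print(pick_route(routes, "93.184.216.34"))  # prints "igw-example"
```

A subnet whose route table lacks the `0.0.0.0/0` entry pointing at an internet gateway would return `None` for the second lookup, which matches the definition of a private subnet.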
1147 | #### Structure & packet flow
1148 | * VPC (has *CIDR*)
1149 | * Gateway (Internet or VPN)
1150 | * Routes (one per subnet, can be shared)
1151 | * Network ACL (one per subnet, can be shared)
1152 | * Subnets (CIDRs match VPC's CIDR)
1153 | * Security Group (on VPC level)
1154 | * Instance (needs public IP for internet communication, either ELB or Elastic IP)
1155 |
1156 |
1157 | ### Security
1158 |
1159 | #### Network ACL
1160 | * Subnet level, acting as firewall
1161 | * Rules for inbound and outbound traffic
1162 | * Rules have numbers and are evaluated from low to high
1163 | * *Stateless*
1164 |
1165 | #### Security Groups
1166 | * Acts as a virtual firewall to control inbound and outbound traffic to instances
1167 | * Acts on instance level, not subnet level
1168 | * Rules for inbound and outbound traffic
1169 | * *Stateful* - will always allow response to (allowed) outbound traffic
1170 | * Can refer to other security group, e.g. allow traffic from there
1171 |
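The low-to-high evaluation of network ACL rules can be sketched as follows. It is a simplified model under stated assumptions: `AclRule` and `evaluate_acl` are hypothetical names, protocol matching is ignored, and traffic that matches no numbered rule is denied (the implicit final rule).

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    number: int
    cidr: str
    port_from: int
    port_to: int
    allow: bool


def evaluate_acl(rules, source_ip, port):
    """Apply the first matching rule, checking lowest rule numbers first."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip in ipaddress.ip_network(rule.cidr)
        in_ports = rule.port_from <= port <= rule.port_to
        if in_cidr and in_ports:
            return rule.allow
    return False  # nothing matched: implicit deny


inbound = [
    AclRule(200, "0.0.0.0/0", 443, 443, True),
    AclRule(100, "203.0.113.0/24", 443, 443, False),
]

print(evaluate_acl(inbound, "203.0.113.7", 443))   # prints False (rule 100 wins)
print(evaluate_acl(inbound, "198.51.100.9", 443))  # prints True (rule 200)
print(evaluate_acl(inbound, "198.51.100.9", 22))   # prints False (no match)
```

Because ACLs are stateless, a real setup would need a matching outbound rule for the response traffic, whereas a security group would allow the response automatically.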
1172 |
1173 | ### Limits:
1174 | .|.
1175 | -|-
1176 | VPCs per region|5
1177 | Subnets per VPC|200
1178 | Customer gateways per region|50
1179 | Gateway per region|5 Internet
1180 | Elastic IPs per account per region|5
1181 | VPN connections per region|50
1182 | Route tables per region|200
1183 | Security groups per region|500
1184 |
1185 |
1186 | ## [↖](#top)[↑](#3_12_4)[↓](#4) Relational Database Service (RDS)
1187 | * Set up, operate, and scale a **relational database** in the cloud
1188 | * Supports
1189 | * Amazon Aurora
1190 | * MySQL
1191 | * MariaDB
1192 | * Oracle
1193 | * SQL Server
1194 | * PostgreSQL
1195 | * Automates common administrative tasks such as performing **backups** and software **patching**
1196 | * *Automated backups*
1197 | * Performs a full daily snapshot
1198 | * Enables point-in-time recovery
1199 | * *DB Snapshots*
1200 | * User-initiated
1201 | * As frequent as desired
1202 | * Supports *encryption at rest* for all database engines
1203 | * **DB instance**
1204 | * Database environment in the cloud with specified *compute* and *storage* resources
1205 | * **Multi-AZ deployments**
1206 | * Provide enhanced availability and durability for DB Instances, making them a natural fit for
1207 | production database workloads
1208 | * **DB subnet group**
1209 | 	* Collection of subnets that you designate for the RDS DB Instances in a VPC
1210 | * **Maintenance window**
1211 | * Needs to be specified (or defaults to weekly) for maintenance events like scaling and patching
1212 | * **DB Parameter group**
1213 | * Acts as a “container” for engine configuration values that can be applied to one or more DB
1214 | Instances
1215 |
1216 |
1217 | # [↖](#top)[↑](#3_13)[↓](#) Etc
1218 | * *us-east-1* is the default region for all SDKs
1219 | * *Penetration tests* need to be announced
1220 |
--------------------------------------------------------------------------------
/devops-engineer-professional-02.md:
--------------------------------------------------------------------------------
1 |
2 | # DevOps Engineer Professional (C02)
3 |
4 | ## Comments per Service
5 |
6 | ### CodeStar
7 |
8 | #### CodeCommit
9 |
10 | - CodeCommit requires CloudWatch Events rule to trigger CodePipeline
11 | - Can trigger Lambda functions out of CodeCommit events
12 | - AWS provides several managed policies:
13 | - `AWSCodeCommitFullAccess` , `AWSCodeCommitPowerUser` , `AWSCodeCommitReadOnly`
14 | - Can use Approval Rule templates to e.g. trigger unit tests via CodeBuild
15 |
16 | #### CodePipeline
17 |
18 | - CodePipeline can execute cross-region actions
19 | - CodePipeline can deploy straight into S3
20 | - CodePipeline can have custom actions that invoke job workers
21 |
22 | #### CodeBuild
23 |
24 | - CodeBuild can be triggered directly from GitHub via webhook
25 | - CodeBuild supports build badges, which provide an embeddable, dynamically generated image (_badge_) that displays the status of the latest build for a project
26 |
27 | #### CodeDeploy
28 |
29 | - In EC2/On-Premises deployment, a CodeDeploy **deployment group** is a set of individual instances targeted for a deployment. A deployment group contains individually tagged instances, Amazon EC2 instances in Amazon EC2 Auto Scaling groups, or both.
30 | - CodeDeploy can terminate the original instances in the deployment group with a waiting period of 1 hour.
31 | - CodeDeploy has a default timeout of 1 hour to wait for scripts to finish
32 | - CodeDeploy failing on `AllowTraffic` can mean that health checks on ELB are misconfigured
33 | - Notifies via CloudWatch Events
34 |   - Lambda
35 | - SNS
36 | - Kinesis streams
37 | - SQS
38 | - Built-in targets (CloudWatch Alarms actions)
39 |
40 | #### CodeGuru
41 |
42 | - Amazon CodeGuru **Profiler** helps developers understand the runtime behaviour of their applications, improve performance, and decrease infrastructure costs.
43 | - Amazon CodeGuru **Reviewer** is an automated code review service that identifies critical defects and deviation from coding best practices for Java and Python code. Works on PRs
44 | - Reviewer can protect secrets and suggest code changes to mitigate
45 |
46 | ### IaC
47 |
48 | #### CloudFormation
49 |
50 | - CFN custom resources -> pre-signed URLs
51 | - In a stackset, global resources (like S3) have to be unique
52 | - CloudFormation drift detection requires manual intervention; use AWS Config to automate detection.
53 |
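The pre-signed URL mentioned for custom resources is where the handler reports success or failure. A minimal sketch of building that response body, assuming the documented custom resource request fields (`StackId`, `RequestId`, `LogicalResourceId`, `ResponseURL`); `build_cfn_response` and all sample values are hypothetical:

```python
import json


def build_cfn_response(event, success, physical_resource_id, data=None, reason=""):
    """Build the JSON body a custom resource handler sends back."""
    return json.dumps({
        "Status": "SUCCESS" if success else "FAILED",
        "Reason": reason or "See CloudWatch Logs for details",
        "PhysicalResourceId": physical_resource_id,
        # These three values must be echoed back from the request
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    })


# Made-up request event, for illustration only.
sample_event = {
    "RequestType": "Create",
    "ResponseURL": "https://example.invalid/presigned-url",
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/example/guid",
    "RequestId": "example-request-id",
    "LogicalResourceId": "MyCustomResource",
}

body = build_cfn_response(sample_event, True, "example-physical-id", {"Answer": "42"})
```

The handler would then send `body` with an HTTP `PUT` to `event["ResponseURL"]`; that step is left out here so the sketch runs without network access.
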
54 | #### Service Catalog
55 |
56 | - By using a launch role via **launch constraint**, you can instead limit the end users’ permissions to the minimum they require for that product
57 | - The **template constraint** limits the options that are available to end-users when they launch a product. It works by narrowing the allowable values for parameters that are defined in the product’s underlying AWS CloudFormation template
58 | - Apply template constraints to ensure that the end users can use products without breaching the compliance requirements of your organization
59 |
60 | #### OpsWorks
61 |
62 | - OpsWorks can create time-based instances for scaling of predictable workload, or load-based using CPU utilisation or load, or memory utilisation
63 |
64 | ### Compute
65 |
66 | #### EC2
67 |
68 | - EC2 memory metrics are not collected by default and need to have CloudWatch agent installed
69 | - EC2 can use built-in **instance recovery**
70 | - An instance is scheduled to be retired when AWS detects irreparable failure of the underlying hardware that hosts the instance.
71 | - When an instance reaches its scheduled retirement date, it is stopped or terminated by AWS.
72 | - AWS also sends an AWS Health event, which you can monitor and manage by using Amazon CloudWatch Events.
73 |
74 | #### ASG
75 |
76 | - ASG lifecycle states:
77 | - `Pending` (hooks `Pending:Wait`, `Pending:Proceed`)
78 | - `InService`
79 | - `Terminating` (hooks `Terminating:Wait`, `Terminating:Proceed`)
80 | - `Terminated`
81 | - `Pending:Wait` lifecycle hook can allow AMI upgrades before bringing them into service
82 | - `Terminating:Wait` lifecycle hook to collect instance data (e.g. logs) before final termination
83 | - Tags mentioned in the Auto Scaling group are _not_ propagated to EBS volumes
84 | - ASG: A warm pool gives you the ability to decrease latency for your applications that have exceptionally long boot times, for example, because instances need to write massive amounts of data to disk.
85 | - Can keep instances in pool _running_ or _stopped_
86 | - ASG can notify via SNS on failed instance launch
87 | - Can use Amazon EventBridge or Amazon CloudWatch Events to track the Auto Scaling Events
88 | - Can trigger Lambdas from ASG by filtering on EventBridge events
89 | - CloudFormation + ASGs:
90 | - `AutoScalingReplacingUpdate`: `WillReplace` `true` will wait for a complete replacement of the ASG and its instances before deleting the old ASG
91 | - `AutoScalingRollingUpdate`: replaces existing instance in ASG; valid options: MaxBatchSize, MinInstancesInService, MinSuccessfulInstancesPercent, PauseTime, SuspendProcesses, WaitOnResourceSignals
92 |
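The lifecycle states and hook points listed above can be sketched as a simple path through the states. This is a simplified model: `lifecycle_path` is a hypothetical helper, it assumes every hook completes with a continue result, and it ignores other states such as standby or warm pool states.

```python
def lifecycle_path(launch_hook=True, terminate_hook=True):
    """List the states an instance passes through from launch to termination."""
    path = ["Pending"]
    if launch_hook:
        # e.g. install or upgrade software before the instance goes into service
        path += ["Pending:Wait", "Pending:Proceed"]
    path += ["InService", "Terminating"]
    if terminate_hook:
        # e.g. collect logs before the instance is terminated
        path += ["Terminating:Wait", "Terminating:Proceed"]
    path.append("Terminated")
    return path


print(" -> ".join(lifecycle_path(terminate_hook=False)))
# prints "Pending -> Pending:Wait -> Pending:Proceed -> InService -> Terminating -> Terminated"
```
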
93 | #### Storage Gateway
94 |
95 | - Storage Gateway does not automatically refresh the cache if the files were added directly to S3. `RefreshCache` can be used to refresh the cache periodically.
96 | - **Tape gateway** is backed by Glacier, meant for backups etc
97 | - **File gateway** gets on-premises data into the cloud
98 | - **Volume gateway** is cloud-backed iSCSI block storage volumes
99 |
100 | #### SSM
101 |
102 | - `AWS-AmazonLinuxDefaultPatchBaseline` is a predefined patch baseline, doesn't do custom patches
103 | - `aws:runDocument` plugin runs SSM documents stored in Systems Manager or on a local share
104 | - `aws:downloadContent` plugin downloads an SSM document from a remote location to a local share
105 | - Can use SSM to create AMIs
106 |
107 | #### ELB
108 |
109 | - ALBs can be configured for 'dual stack' mode that allows IPv4 and IPv6
110 | - ALBs can have weightings between target groups
111 |
112 | ### Security
113 |
114 | #### IAM
115 |
116 | - `iam:PassRole` allows passing a role to a service, e.g. a developer passing a role to CloudFormation
117 |
118 | #### Firewall Manager
119 |
120 | - Firewall Manager can be used to configure and apply WAF ACLs to the ALBs in an AWS account. It can help centrally manage as well as apply them to new accounts added to the Organization in the future.
121 |
122 | #### KMS
123 |
124 | - **KMS grants** are commonly used by AWS services that integrate with AWS KMS to encrypt your data at rest.
125 | - The service creates a grant on behalf of a user in the account, uses its permissions, and retires the grant as soon as its task is complete.
126 |
127 | ### Compliance
128 |
129 | #### GuardDuty
130 |
131 | - Can be used for org-wide compliance
132 | - AWS recommends a separate delegated GuardDuty administrator account
133 | - Can auto-enable GuardDuty for all future Org accounts
134 | - Can configure GuardDuty **Trusted IP** list and **Threat IP** list and work with findings based on those
135 | - GuardDuty needs EventBridge for filtering
136 |
137 | #### Config
138 |
139 | - AWS Config can ensure all EC2 instances are managed by AWS Systems Manager.
140 | - AWS Config can find `ec2-volume-inuse-check`, but cannot detect how long a volume was unused for
141 | - `cloudformation-stack-drift-detection-check` checks if the actual configuration of a CloudFormation stack differs, or has drifted
142 | - `s3-bucket-ssl-requests-only` checks whether S3 buckets have policies that require requests to use SSL
143 | - Can deploy **conformance packs** into org accounts (from a delegated admin account)
144 | - Config itself is per region, use **Config Aggregator** for centralised collection of findings across regions & accounts
145 | - Uses aggregator account
146 | - By default, AWS Config will not automatically remediate the accounts that disabled its CloudTrail. You must manually set this up using a CloudWatch Events rule and a custom Lambda function that calls the StartLogging API to enable CloudTrail back again. Furthermore, the `cloudtrail-enabled` AWS Config managed rule is only available for the periodic trigger type and not Configuration changes.
147 |
148 | #### ControlTower
149 |
150 | - Use EventBridge to get notifications on Control Tower events like `CreateManagedAccount`
151 | - **Customizations for AWS Control Tower (CfCT)** helps you customize your AWS Control Tower landing zone and stay aligned with AWS best practices. Customizations are implemented with AWS CloudFormation templates and service control policies (SCPs).
152 | - CfCT capability is integrated with AWS Control Tower lifecycle events so that your resource deployments remain synchronized with your landing zone
153 |
154 | #### Org Policies
155 |
156 | - Are inherited down the path `org root` -> `ou` -> `accounts`
157 |
158 | #### Trusted Advisor
159 |
160 | - AWS Trusted Advisor checks identify ways to optimize your AWS infrastructure, improve security and performance, reduce costs, and monitor service quotas
161 | - Cost Optimization, Performance, Security, Fault Tolerance, and Service Limits
162 | - TrustedAdvisor can check for under-utilized EC2
163 | - Trusted Advisor's primary integration point is CloudWatch Events
164 | - With Trusted Advisor’s **Service Limit Dashboard**, you can view, refresh, and export utilization and limit data on a per-limit basis.
165 | - Metrics are published on Amazon CloudWatch in which you can create custom alarms
166 |
167 | #### Health
168 |
169 | - AWS Health is scanning public repos and can send events for compromised keys
170 | - On detection of an exposed IAM access key, AWS Health generates an `AWS_RISK_CREDENTIALS_EXPOSED` CloudWatch Event.
171 | - Also lists AWS Scheduled maintenance events on Health Dashboard
172 | - Can use CloudWatch Events/EventBridge to trigger workflows based on events
173 | - Can monitor AWS Health events using Amazon EventBridge or CloudWatch Events by calling the AWS Health API
174 |
175 | #### CloudTrail
176 |
177 | - Can set up trails for
178 | - **Data events**: These events provide insight into the resource operations performed on or within a resource. These are also known as data plane operations.
179 | - For S3 or Lambda data events
180 | - **Management events**: Management events provide insight into management operations that are performed on resources in your AWS account. These are also known as control plane operations.
181 |
182 | ### Networking
183 |
184 | #### VPC
185 |
186 | - NAT gateway does not span multiple AZs (instead: one gateway per AZ)
187 | - Can send VPC Flow Logs to CloudWatch Logs
188 |
189 | ### Storage
190 |
191 | #### Aurora
192 |
193 | - Read replicas are always asynchronous
194 | - AWS Aurora Global Database uses storage-based replication with typical latency of less than 1 second, using dedicated infrastructure that leaves your database fully available to serve application workloads.
195 | - 1 primary region (read/write), up to 5 secondary regions (read)
196 |   - In the event of a regional degradation or outage, one of the secondary regions can be promoted to read and write capabilities in less than 1 minute.
197 | - Aurora endpoints
198 | - single built-in cluster endpoint, connects to the primary instance of the cluster
199 | - reader endpoint for read-only connections for your Aurora cluster
200 |   - can have custom cluster endpoints (managed by Aurora) that can be READER, WRITER or ANY
201 |
202 | #### RDS
203 |
204 | - RDS creates and saves automated backups of your DB instance or Multi-AZ DB cluster during the backup window of your database.
205 | - default: 30min backup during 8h per-region night
206 | - Amazon RDS uses SNS to provide notification when an Amazon RDS event occurs.
207 |   - Can also use CloudWatch Events/EventBridge
208 | - Failover:
209 | - AZ outages => RDS multi-AZ deployment
210 | - Regional outages => RDS read replica
211 | - Multi-region deployments are like multi-AZ deployments, but other regions can be used for reads. Read replicas can be in the same AZ, same region, or cross-region
212 | - Read replicas have best RTO/RPO, but highest cost
213 |
214 | #### DynamoDB
215 |
216 | - In DynamoDB, `ThrottledWriteRequests` can indicate when to increase the maximum write capacity units in the table's Auto Scaling policy.
217 | - `WriteThrottleEvents` are requests to DynamoDB that exceed the provisioned write capacity units for a table or a global secondary index.
218 | - Can use Kinesis Data Streams to capture changes to DynamoDB
219 | - Amazon DynamoDB global tables provide a single-digit millisecond latency and make sure the data is available across regions.
220 | - DynamoDB Global Tables requires
221 | - tables are created in each region already
222 | - DynamoDB Streams is enabled
223 | - Don't have multiple lambdas read from DynamoDB Streams
224 | - Only one process per shard!
225 | - Better to use fan-out pattern
226 |
227 | #### Glue
228 |
229 | - AWS Glue is an efficient way to store object metadata. Combination: S3 - Glue - Athena - QuickSight
230 |
231 | #### S3
232 |
233 | - Can include a pre-calculated checksum as part of your request. Amazon S3 compares the provided checksum to the checksum that it calculates by using your specified algorithm
234 | - Can activate access logs and use Athena for analysis/queries
235 | - S3 cross-region replication is push-based: source bucket gets a replication rule, destination bucket gets a bucket policy, source needs IAM role for S3 service to assume
236 | - Configure a replication rule within the source bucket to activate the replication process.
237 | - Create a bucket policy in the destination bucket that grants the source bucket permission to replicate objects into it.
238 | - In the source AWS account, create an IAM role that Amazon S3 can assume to replicate objects. Enable versioning in both buckets.
239 | - AWS CloudTrail only logs bucket-level actions in your Amazon S3 buckets by default. If you want to record all object-level API activity in your S3 bucket, you can set up data events in CloudTrail
240 |
241 | ### Serverless
242 |
243 | #### API Gateway
244 |
245 | - API Gateway does not have specific metrics for individual http error codes like 403, only a generic `4XXError` metric
246 | - Can enable API **caching** in Amazon API Gateway to cache your endpoint's responses
247 |
248 | #### ECS
249 |
250 | - can set ECS tasks as a target of CloudWatch events
251 | - ECS/Fargate logs
252 | - add the required `logConfiguration` parameters to your task definition to turn on the `awslogs` log driver
253 | - ECS/EC2
254 | - container instances have an attached IAM role that contains `logs:CreateLogStream` and `logs:PutLogEvents`
255 | - to turn on the `awslogs` log driver, your Amazon ECS container instances require at least version 1.9.0 of the container agent
256 |
257 | ### Application Auto Scaling
258 |
259 | - Is a web service for automatically scaling scalable resources for individual AWS services beyond Amazon EC2
260 | - Lambda function provisioned concurrency
261 | - DynamoDB tables and global secondary indexes
262 | - Aurora replicas
263 | - Amazon Elastic Container Service (ECS) services
264 | - ...
265 |
266 | - **Target** tracking scaling – Scale a resource based on a target value for a specific CloudWatch metric.
267 | - **Step** scaling – Scale a resource based on a set of scaling adjustments that vary based on the size of the alarm breach.
268 | - **Scheduled** scaling – Scale a resource one time only or on a recurring schedule.
269 |
270 | ### Content Delivery
271 |
272 | #### CloudFront
273 |
274 | - **OriginGroup**: An origin group includes two origins (a primary origin and a second origin to failover to) and a failover criteria that you specify.
275 |
276 | ### Notifications/Events
277 |
278 | #### SNS
279 |
280 | - SNS defines a **delivery policy** for each delivery protocol. The delivery policy defines how Amazon SNS retries the delivery of messages when server-side errors occur (when the system that hosts the subscribed endpoint becomes unavailable).
281 | - When the delivery policy is exhausted, Amazon SNS stops retrying the delivery and discards the message
282 |     - unless a **dead-letter queue** is attached to the subscription.
283 | - For ECS notifications on **essential task** stopped, use EventBridge
284 | - For S3 fanout, use SNS and subscribe consumers to it
285 |
286 | ### Logging/Monitoring/Notification
287 |
288 | #### CloudWatch
289 |
290 | - CloudWatch Logs are always encrypted
291 | - CloudWatch _Metrics_ filters can be used to filter CloudWatch _Logs_
292 | - Can create CloudWatch **Alarm** for the `StatusCheckFailed_System` metric and select the EC2 action to recover the instance
293 | - **CloudWatch Logs Subscription** for near realtime feed of log events
294 | - "Getting logs out of CloudWatch for further processing"
295 | - from CloudWatch Logs, to _Kinesis_, _ElasticSearch_ or _Lambda_
296 | - CloudWatch has a predefined dashboard for CodeBuild metrics
297 | - You can call the EC2 `CreateSnapshot` API directly as a target from CloudWatch Events.
298 |
299 | #### KMS
300 |
301 | - KMS publishes metrics to CloudWatch; can define alarms and alerts on them
302 |
303 | #### Xray
304 |
305 | - Can run X-Ray daemon on AWS Elastic Beanstalk
306 | - X-Ray daemon uses UDP port 2000
307 |
308 | ---
309 |
310 | ## Comments per Topic
311 |
312 | ### Implement CI/CD Pipelines
313 |
314 | - CodeDeploy states + lifecycle hooks
315 | - CodeCommit IAM policies
316 | - CodeCommit needs CloudWatch Events/EventBridge to detect PRs
317 | - (EventBridge is the same service as CloudWatch Events, just with a new interface and more features exposed.)
318 | - GitHub needs a web hook to start a CodePipeline
319 | - CodeDeploy lifecycle hooks (reserved for CodeDeploy in parentheses):
320 | - `ApplicationStop`
321 | - (`DownloadBundle`)
322 | - `BeforeInstall`
323 | - (`Install`)
324 | - `AfterInstall`
325 | - `ApplicationStart`
326 | - `ValidateService`
327 | - `BeforeBlockTraffic`
328 | - (`BlockTraffic`)
329 | - `AfterBlockTraffic`
330 | - `BeforeAllowTraffic`
331 | - (`AllowTraffic`)
332 | - `AfterAllowTraffic`
333 | - Integrate automated testing into CI/CD pipelines
334 | - CloudWatch Logs + EventBridge to automate based on CodeBuild job results
335 | - CodeDeploy + EventBridge to automate based on CodeDeploy job results
336 | - EventBridge for CodePipeline scheduled events
337 | - CodeDeploy can integrate with CloudWatch Alarms to pause deployments
338 | - Build and manage artifacts
339 | - CodeBuild + CodePipeline + CodeDeploy + S3 for artifacts
340 | - S3 versioning + encryption required for CodePipeline
341 | - Implement deployment strategies for instance, container, and serverless environments
342 | - Elastic Beanstalk policies
343 | - All at once - fastest, but causes downtime; all remaining options have zero downtime
344 | - Rolling - still uses batches
345 | - Rolling with additional batch - to maintain full capacity during deploy
346 | - Immutable for when new & old versions must not be mixed and for fast rollback
347 | - Traffic splitting: for canary deploys
348 | - Blue/Green deployments: swap environment URLs; keep RDS in a separate stack; requires DNS change (all previous ones do not)
349 | - Lambda
350 | - canary deployments via alias weights
351 | - use CodeDeploy default deploy options:
352 | - Lambda: `LambdaLinear10PercentEvery10Minutes` (10% of traffic shifted at a time), `LambdaCanary10Percent10Minutes` (one 10% and one 90% deploy)
353 | - EC2: `AllAtOnce`, `OneAtATime`, `HalfAtATime`
354 | - ALB + EC2 + Route53 alias record swaps
355 | - OpsWorks Stack cloning + Route53 alias swaps
356 | - OpsWorks lifecycle stages
357 |
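The CodeDeploy traffic-shifting configurations for Lambda listed above can be sketched as simple schedules. This is a minimal illustration, assuming the first increment is shifted at minute 0; `linear_schedule` and `canary_schedule` are hypothetical helper names and not part of any AWS SDK.

```python
from typing import List, Tuple


def linear_schedule(step_percent: int, interval_minutes: int) -> List[Tuple[int, int]]:
    """Return (minute, percent_on_new_version) pairs for a linear shift."""
    schedule = []
    shifted = 0
    minute = 0
    while shifted < 100:
        # Shift one more step, never exceeding 100%
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


def canary_schedule(canary_percent: int, wait_minutes: int) -> List[Tuple[int, int]]:
    """Return (minute, percent_on_new_version) pairs for a two-step canary shift."""
    # First a small canary share, then everything that remains
    return [(0, canary_percent), (wait_minutes, 100)]


if __name__ == "__main__":
    # Mirrors LambdaLinear10PercentEvery10Minutes
    print(linear_schedule(10, 10))
    # Mirrors LambdaCanary10Percent10Minutes
    print(canary_schedule(10, 10))
```

Under these assumptions the linear configuration takes ten steps, while the canary configuration takes two (one 10% step and one 90% step).
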
358 | ### Config Management and IaC
359 |
360 | - Define cloud infrastructure and reusable components to provision and manage systems throughout their lifecycle
361 | - CloudFormation cross-stack references use exports + Fn::ImportValue
362 | - Inline Lambda functions in CFN
363 |     - When a custom resource invokes a Lambda function from CloudFormation, the request includes a pre-signed URL. The Lambda function is responsible for sending a response to that pre-signed URL to indicate whether the resource creation was successful.
364 | - Deploy automation to create, onboard, and secure AWS accounts in a multi-account/multi-region environment
365 | - Design and build automated solutions for complex tasks and large-scale environments
366 | - CloudFormation + ASGs:
367 | - `AutoScalingReplacingUpdate`: `WillReplace:true` will wait for a complete replacement of the ASG and its instances before deleting the old ASG
368 |       - `AutoScalingRollingUpdate`: replaces existing instances in the ASG; valid options: `MaxBatchSize`, `MinInstancesInService`, `MinSuccessfulInstancesPercent`, `PauseTime`, `SuspendProcesses`, `WaitOnResourceSignals`
369 | - OpsWorks can create _time-based_ instances for scaling of predictable workload, or _load-based_ using CPU utilisation or load, or memory utilisation
370 | - Collecting on-prem info:
371 | - Application Discovery Agent (install on each VM) or Agentless Discovery Connector (separate VM)
372 |
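The custom resource flow described above (CloudFormation passes a pre-signed URL, the Lambda function reports the outcome to it) could look roughly like the following sketch. It uses only the standard library; the fallback resource ID and the `Message` data key are hypothetical, and the empty `Content-Type` header is an assumption modelled on AWS's `cfn-response` helper.

```python
import json
import urllib.request


def build_response(event: dict, status: str, reason: str = "", data: dict = None) -> bytes:
    """Build the JSON body CloudFormation expects from a custom resource."""
    body = {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch Logs for details",
        # Hypothetical fallback ID; a real handler should return a stable ID
        "PhysicalResourceId": event.get("PhysicalResourceId", "example-resource-id"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }
    return json.dumps(body).encode("utf-8")


def send_response(event: dict, body: bytes) -> None:
    """PUT the response body to the pre-signed URL from the request."""
    request = urllib.request.Request(event["ResponseURL"], data=body, method="PUT")
    # Assumption: the pre-signed URL is signed for an empty content type
    request.add_header("Content-Type", "")
    with urllib.request.urlopen(request) as response:
        response.read()


def handler(event, context):
    """Lambda entry point: report success or failure back to CloudFormation."""
    try:
        # ... create, update, or delete the resource based on event["RequestType"] ...
        body = build_response(event, "SUCCESS", data={"Message": "done"})
    except Exception as error:
        body = build_response(event, "FAILED", reason=str(error))
    send_response(event, body)
```

If the function never responds, the stack operation waits until it times out, which is why the failure branch also sends a response.
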
373 | ### Resilient Cloud Solutions
374 |
375 | - Implement highly available solutions to meet resilience and business requirements
376 | - RDS:
377 | - AZ outages => RDS multi-AZ deployment
378 | - Regional outages => RDS read replica
379 | - Multi-region deployments are like multi-AZ deployments, but other regions can be used for reads. Read replicas can be in the same AZ, same region, or cross-region
380 | - Read replicas have best RTO/RPO, but highest cost
381 | - Frontend traffic switching => Route53 failover
382 | - AutoScaling with a min & max of 1 is actually sensible - it makes the instance auto-redeploy if it dies
383 | - Route53 policies: `simple`, `failover`, `geolocation`, `geoproximity`, `latency`, `multi-value answer`, `weighted`
384 | - Implement solutions that are scalable to meet business requirements
385 | - ASG lifecycle states:
386 | - `Pending` (hooks `Pending:Wait`, `Pending:Proceed`)
387 | - `InService`
388 | - `Terminating` (hooks `Terminating:Wait`, `Terminating:Proceed`)
389 | - `Terminated`
390 |       - EC2 Auto Scaling `Pending:Wait` lifecycle hook can allow AMI upgrades before bringing instances into service
391 | - `Terminating:Wait` lifecycle hook to collect instance data (e.g. logs) before final termination
392 |     - EKS: k8s Cluster Autoscaler or Karpenter
393 | - EKS networking:
394 | - VPC CNI plugin
395 | - Load Balancer Controller
396 | - CoreDNS
397 | - kube-proxy
398 | - Calico
399 | - Hybrid environment patching
400 | - Implement automated recovery processes to meet RTO/RPO requirements
401 |
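The ASG lifecycle states listed above can be read as a small state machine. The sketch below is a simplification that covers only the states named in these notes (other states such as standby are left out); `lifecycle_path` is a hypothetical helper name.

```python
from typing import List

# Transitions between the states named in the notes, in order
TRANSITIONS = {
    "Pending": "Pending:Wait",
    "Pending:Wait": "Pending:Proceed",
    "Pending:Proceed": "InService",
    "InService": "Terminating",
    "Terminating": "Terminating:Wait",
    "Terminating:Wait": "Terminating:Proceed",
    "Terminating:Proceed": "Terminated",
}


def lifecycle_path(with_hooks: bool = True) -> List[str]:
    """Walk from Pending to Terminated, optionally skipping the hook states."""
    path = ["Pending"]
    while path[-1] in TRANSITIONS:
        path.append(TRANSITIONS[path[-1]])
    if not with_hooks:
        # Without lifecycle hooks the Wait/Proceed states are not entered
        path = [state for state in path if ":" not in state]
    return path


if __name__ == "__main__":
    print(" -> ".join(lifecycle_path()))
    print(" -> ".join(lifecycle_path(with_hooks=False)))
```

The `Pending:Wait` and `Terminating:Wait` states are the points where custom actions (upgrades, log collection) can run before the instance moves on.
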
402 | ### Monitoring and Logging
403 |
404 | - Configure the collection, aggregation, and storage of logs and metrics
405 | - AWS Config Aggregator for centralised collection of findings across regions & accounts
406 | - EC2 custom logging requirements => CloudWatch Logs Agent
407 | - ECS Fargate logs => awslogs driver on task definition
408 | - CloudWatch has a predefined dashboard for CodeBuild metrics
409 | - Audit, monitor, and analyze logs and metrics to detect issues
410 | - near real time dashboards => QuickSight
411 | - near real time processing on CloudWatch logs:
412 | - Lambda subscription filter
413 | - Kinesis stream filter
414 | - ElasticSearch (OpenSearch) subscription filter
415 | - CloudTrail has log integrity checking which must be turned on
416 | - Automate monitoring and event management of complex environments
417 | - Service limit alerting => Trusted Advisor + CloudWatch Alarms + ServiceLimitUsage metric
418 |
419 | ### Incident and Event Response
420 |
421 | - Manage event sources to process, notify, and take action in response to events
422 | - S3 event notifications for data notifications like file deletion
423 | - RDS event notifications for multi-AZ failover events
424 | - EventBridge + AWS Health for notification about IAM credentials being exposed on GitHub, and for notifications about instance outages, etc.
425 | - CloudTrail _data_ events for object-level activity on S3
426 | - EC2 Auto Scaling groups => EventBridge
427 | - CodePipeline stage => EventBridge
428 | - CodeDeploy => CloudWatch Alarm + `MinimumHealthyHosts` metric can be used for rollbacks
429 | - OpsWorks self-healing => EventBridge
430 | - Implement configuration changes in response to events
431 | - Troubleshoot system and application failures
432 |
433 | ### Security and Compliance
434 |
435 | - Implement techniques for identity and access management at scale
436 | - Limit CodeCommit permissions via IAM policy which matches repo
437 | - S3 bucket policies for requiring TLS
438 | - Apply automation for security controls and data protection
439 | - Lifecycle management + auto-rotation of secrets => Secrets Manager
440 | - Cost-effective => SSM Parameter Store SecureStrings
441 | - Patching => SSM Patch Manager
442 | - Implement security monitoring and auditing solutions
443 |
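As an illustration of the "S3 bucket policies for requiring TLS" item above, the following sketch builds such a policy as a Python dictionary. It denies every S3 action when the request does not use secure transport; the bucket name `example-bucket` and the helper name `require_tls_policy` are placeholders.

```python
import json


def require_tls_policy(bucket_name: str) -> dict:
    """Return a bucket policy that denies all requests not made over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is false for plain HTTP requests
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(require_tls_policy("example-bucket"), indent=2))
```

An explicit `Deny` is used rather than a conditional `Allow`, because a deny overrides any allow granted elsewhere.
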
--------------------------------------------------------------------------------
/sysops-administrator-associate.md:
--------------------------------------------------------------------------------
1 | [toc_start]::
2 |
3 | ---
4 | * [AWS-SysOps-Administrator-Associate](#1)
5 | * [Monitoring And Metrics](#2)
6 | * [Virtualization Types](#2_1)
7 | * [EC2 Instance Types](#2_2)
8 | * [EC2 Monitoring](#2_3)
9 | * [EBS Monitoring](#2_4)
10 | * [EFS Monitoring](#2_5)
11 | * [CloudWatch](#2_6)
12 | * [Costs](#3)
13 | * [Consolidated Billing](#3_1)
14 | * [Billing Metrics & Alarms](#3_2)
15 | * [Costs Optimization](#3_3)
16 | * [Cost Explorer](#3_4)
17 | * [High Availability](#4)
18 | * [Scalability & Elasticity Fundamentals](#4_1)
19 | * [Reserved Instances](#4_2)
20 | * [Autoscaling vs Resizing](#4_3)
21 | * [Load Balancers](#4_4)
22 | * [RDS HA](#4_5)
23 | * [HA for IP-based Applications](#4_6)
24 | * [HA/Fault Tolerance for Bastion Hosts](#4_7)
25 | * [Analysis](#5)
26 | * [Optimize the environment to ensure maximum performance](#5_1)
27 | * [Identify Performance Bottlenecks and Implement Remedies](#5_2)
28 | * [Identify Potential Issues on a Given Application Deployment](#5_3)
29 | * [OpsWorks](#6)
30 | * [Overview and components](#6_1)
31 | * [Cloudformation](#6_2)
32 | * [Backups & Recovery](#7)
33 | * [AWS Services with automated backups](#7_1)
34 | * [Disaster Recovery Scenarios](#7_2)
35 | * [Storing log files and backups](#7_3)
36 | * [Security](#8)
37 | * [Implement and Manage Security Policies](#8_1)
38 | * [Ensure Data Integrity and Access Controls when Using the AWS Platform](#8_2)
39 | * [Share responsibility model](#8_3)
40 | * [AWS and IT Audits](#8_4)
41 | * [Networking](#9)
42 | * [Route53 Routing Policies](#9_1)
43 | * [VPC Essentials](#9_2)
44 | * [Limits:](#9_3)
45 | * [Etc](#10)
46 | * [Accessing the OS](#10_1)
47 | * [SQS](#10_2)
48 | * [DynamoDb](#10_3)
49 | ---
50 | [toc_end]::
51 |
52 | # [↖](#top)[↑](#)[↓](#2) SysOps Administrator Associate
53 | > 5/2018 - 9/2018
54 |
55 | ---
56 |
57 |
58 | # [↖](#top)[↑](#1)[↓](#2_1) Monitoring And Metrics
59 |
60 |
61 | ## [↖](#top)[↑](#2)[↓](#2_2) Virtualization Types
62 |
63 | Linux Amazon Machine Images use one of two types of virtualization:
64 |
65 | AMI|Type|Effect
66 | -|-|-
67 | **PV**|Paravirtual|Historically better performance than HVM, but no longer the case
68 | **HVM**|Hardware virtual machine|More modern, same or better performance than PV
69 |
70 |
71 | ## [↖](#top)[↑](#2_1)[↓](#2_3) EC2 Instance Types
72 |
73 | **General Purpose**|Balance of compute, memory and networking
74 | -|-
75 | **M5**<br/>(2017)|* Require HVM AMIs<br/>* Instance store via EBS or NVMe SSD (physically connected to the host server)
76 | **M4**<br/>(2015)|* Allows *enhanced networking*<br/>* EBS-optimized
77 | **M3**<br/>(2012)|* SSD (instance) store
78 | **T3**<br/>(2018)|* 30% better price performance
79 | **T2**<br/>(2014)|* Intended for workloads that do not use the full CPU constantly (e.g. web server)<br/>* Allows *burstable performance*<br/>* Burst credits allow the instance to 'burst' past the baseline performance up to 100%<br/>* 1 credit = 100% load per core per minute<br/>* Credits are earned per hour, expire after 24h<br/>* EBS storage only
80 |
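The T2 credit definition in the table above (1 credit = 100% load per core per minute) translates into a one-line calculation. The sketch below only applies that definition; `cpu_credits_used` is a hypothetical helper, and earning rates per instance size are not modelled because the notes do not list them.

```python
def cpu_credits_used(utilization_percent: float, vcpus: int, minutes: float) -> float:
    """Credits spent: one credit is one core at 100% load for one minute."""
    return (utilization_percent / 100.0) * vcpus * minutes


if __name__ == "__main__":
    # One core at 50% for two minutes costs the same as one core at 100% for one minute
    print(cpu_credits_used(50, 1, 2))
    print(cpu_credits_used(100, 1, 1))
```
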
81 | **Compute optimized**|Lowest price for *compute* performance
82 | -|-
83 | **C5**<br/>(2016)|* Intel Skylake<br/>* Use Nitro, Amazon’s lightweight hardware accelerated hypervisor<br/>* Better performance and pricing than C4
84 | **C4**<br/>(2015)|* Intel Haswell<br/>* Optimized for EC2<br/>* Allows *enhanced networking* and *clustering*<br/>* EBS-optimized
85 | **C3**<br/>(2013)|* SSD (instance) store<br/>* Allows *enhanced networking* and *clustering*
86 |
87 | **Memory optimized**|Lowest price for *memory* performance
88 | -|-
89 | **Z1d**<br/>(2018)|* Offer both high compute capacity and a high memory footprint<br/>* Ideal for workloads with high per-core licensing costs
90 | **X1**<br/>(2016)|* One of the lowest prices per GiB of RAM<br/>* SSD storage and EBS-optimized by default<br/>* **X1e** has even more RAM
91 | **R5**<br/>(2018)|* Use Nitro, Amazon’s lightweight hardware accelerated hypervisor
92 | **R4**<br/>(2016)|* Improved networking and EBS performance
93 | **R3**<br/>(2014)|* SSD (instance) store<br/>* High memory capacity<br/>* Allows *enhanced networking*
94 |
95 | **GPU optimized**|.
96 | -|-
97 | **P3**<br/>(2017)|* Faster than P2
98 | **P2**<br/>(2016)|* Intended for general-purpose GPU compute applications
99 | **G3**<br/>(2017)|* Optimized for graphics-intensive applications<br/>* Faster than G2
100 | **G2**<br/>(2013)|* High frequency processors<br/>* High-performance NVIDIA GPUs
101 |
102 |
103 | **Storage optimized**|Very fast SSD-backed instance storage optimized for high random I/O and high IOPS
104 | -|-
105 | **H1**<br/>(2017)|* HDD-based local storage<br/>* Delivers high disk throughput<br/>* Balance of compute and memory
106 | **I3**<br/>(2016)|* (NVMe) SSD-backed instance storage optimized for low latency<br/>* Very high random I/O performance
107 | **D2**<br/>(2015)|* Lowest price per disk throughput performance
108 | **I2**<br/>(2013)|* SSD (instance) store<br/>* Allows *enhanced networking*<br/>* Supports *TRIM* (more efficient SSD operations)
109 |
110 | **RDS instance types**|Optimized to fit different relational database use cases
111 | -|-
112 | **db.**|General purpose, memory optimized, burstable performance
113 |
114 |
115 |
116 |
117 | ## [↖](#top)[↑](#2_2)[↓](#2_3_1) EC2 Monitoring
118 |
119 |
120 | ### EC2 Status Checks
121 | * AWS performs automated checks on every running EC2 instance
122 | * Performed every minute
123 | * Each returns a pass or a fail status
124 |
125 | **System Status Check**
126 | * Loss of network connectivity
127 | * Loss of system power
128 | * Hardware/software issues on physical host
129 | * Solution
130 | * Stop and start instance
131 | * Terminate and re-launch instance
132 | * Contact AWS
133 | * Can configure for *auto-recovery*
134 | * Instance will be rebooted and retain instance id, (e)ip address, EBS volumes et al
135 |
136 | **Instance Status Check**
137 | * Failed system status check
138 | * Network/startup configuration issues
139 | * Memory/disk problems
140 | * Kernel compatibility issues
141 | * Solution
142 | * Fix problem
143 | * Stop and start instance
144 | * Terminate and re-launch instance, potentially with more memory/network/disk/...
145 |
146 |
147 | ## [↖](#top)[↑](#2_3_1)[↓](#2_4_1) EBS Monitoring
148 |
149 |
150 | ### EBS Status Checks
151 | * Run every 5 minutes
152 | * `insufficient data` if checks are still running
153 | * `ok` if all checks pass
154 | * `warning` typically has to do with performance **degradation** from provisioned IOPS
155 | * `impaired` if a check fails, e.g. the volume is **stalled** or not available
156 |
157 | * If Amazon EBS finds that data on a volume might be inconsistent, it disables I/O to that volume.
158 | * Changes status to `impaired`
159 | * This behaviour can be disabled
160 |
161 |
162 | ### EBS Performance Essentials
163 | **IOPS** (Input/Output Operations Per Second) is a common performance measurement used to benchmark
164 | computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area
165 | networks (SAN).
166 | * I/O size is capped at 256 KiB for SSD volumes and 1,024 KiB for HDD volumes because SSD volumes handle
167 | small or random I/O much more efficiently than HDD volumes.
168 | * SSDs deliver constant performance for both random and sequential I/O
169 | * HDDs have optimal performance for large and sequential I/O
170 |     * HDD can deliver more throughput but drastically fewer IOPS
171 |
172 | .|`gp2`|`io1`|`st1`|`sc1`
173 | -|-|-|-|-
174 | Volume type|General purpose SSD|Provisioned IOPS SSD|Throughput optimized HDD|Cold HDD
175 | Purpose|Balances price and performance|For mission-critical low-latency or high-throughput workloads|Low cost HDD volume designed for frequently accessed, throughput-intensive workloads |Lowest cost HDD volume designed for less frequently accessed workloads
176 | Volume Size|1 GiB - 16 TiB|4 GiB - 16 TiB|500 GiB - 16 TiB|500 GiB - 16 TiB
177 | Max. IOPS(1)/Volume|10,000|32,000|500|250
178 | Max. Throughput/Volume|160 MiB/s|500 MiB/s|500 MiB/s|250 MiB/s
179 | IOPS|* 3 IOPS per GB (larger volume means more IOPS)<br/>* 100 IOPS <-> 10,000 IOPS<br/>* Can burst to 3,000 IOPS if volume size is < 1TB<br/>* Bursting requires credits, which are earned at 3 IOPS/GB/second<br/>* Max 5.4 million credits (also initial value), enough for 3,000 IOPS for 30min<br/>* Running out of credits reverts volume back to baseline performance|* 30 IOPS per GB (larger volume means more IOPS), up to 20,000<br/>* Does not burst, delivers consistent IOPS rate instead|.|.
180 |
181 | > (1) gp2/io1 based on 16 KiB I/O size, st1/sc1 based on 1 MiB I/O size
182 |
183 | * Using *EBS optimized* instances guarantees optimal networking between EBS and EC2
184 | * Pre-warming/initialization
185 | * No longer needed for new EBS volumes
186 | * Storage blocks on volumes restored from snapshots do need to be initialized (read from)
187 |
188 |
189 | ## [↖](#top)[↑](#2_4_2)[↓](#2_5_1) EFS Monitoring
190 |
191 | * Two throughput modes to choose from for your file system
192 | * **Bursting** Throughput - throughput on Amazon EFS scales as your file system grows
193 | * **Provisioned** Throughput - you can instantly provision the throughput of your file system (in MiB/s) independent of the amount of data stored.
194 |
195 |
196 | ### Performance comparison
197 | .|Amazon EFS|Amazon EBS Provisioned IOPS (`io1`)
198 | -|-|-
199 | Per-operation latency|Low, consistent latency.|Lowest, consistent latency.
200 | Throughput scale|10+ GB per second.|Up to 2 GB per second.
201 |
202 |
203 | ### Storage Characteristics Comparison
204 |
205 | .|Amazon EFS|Amazon EBS Provisioned IOPS
206 | -|-|-
207 | Availability and durability|Data is stored redundantly across multiple AZs.|Data is stored redundantly in a single AZ.
208 | Access|Up to thousands of Amazon EC2 instances, from multiple AZs, can connect concurrently to a file system.|A single Amazon EC2 instance in a single AZ can connect to a file system.
209 | Use cases|Big data and analytics, media processing workflows, content management, web serving, and home directories.|Boot volumes, transactional and NoSQL databases, data warehousing, and ETL.
210 |
211 |
212 | ### S3 vs EFS vs EBS Comparison
213 |
214 | Amazon S3|Amazon EBS|Amazon EFS
215 | -|-|-
216 | Can be publicly accessible|Accessible only via the given EC2 Machine|Accessible via several EC2 machines and AWS services
217 | Web interface|File System interface|Web and file system interface
218 | Object Storage|Block Storage|File storage
219 | Scalable|Hardly scalable|Scalable
220 | Slower than EBS and EFS|Faster than S3 and EFS|Faster than S3, slower than EBS
221 | Good for storing backups|Is meant to be EC2 drive|Good for shareable applications and workloads
222 |
223 |
224 | ## [↖](#top)[↑](#2_5_3)[↓](#2_6_1) CloudWatch
225 | Monitoring service that plugs into many other services
226 |
227 | * **Metrics**
228 | * Based on currently used service
229 | * Not everything is available out of the box, e.g. no data on memory usage of EC2 instances
230 | * **Alarms**
231 | * Based on thresholds defined on metrics
232 | * Can be added to dashboard
233 | * Invoke *Lambda*, *SNS*, email, ...
234 | * Takes place once, at a specific point in time
235 | * Disable with `mon-disable-alarm-actions` via CLI
236 | * **Logs**
237 | * Log into *log groups*
238 | * **Events**
239 | * Define actions on things that happened
240 | * Define `cron`-based events
241 | * Events are recorded constantly over time
242 |
243 |
244 | ### Key metrics for EC2
245 |
246 | * EC2 metrics are based on what is exposed to the hypervisor.
247 | * *Basic Monitoring* (default) submits values every 5 minutes, *Detailed Monitoring* every minute
248 | * Can install CloudWatch agent (new)
249 |     * Provides access to more metrics
250 | * Can use CloudWatch monitoring scripts (old) to provide more metrics
251 |     * Perl scripts provided by AWS, need to manually install on instance
252 | * Use `cron` to automate sending data to CloudWatch
253 |
254 | Metric|Effect
255 | -|-
256 | `CPUUtilization`|The total CPU resources utilized within an instance at a given time.
257 | `DiskReadOps`,`DiskWriteOps`|The number of read (write) operations performed on all instance store volumes. This metric is applicable for instance store-backed AMI instances.
258 | `DiskReadBytes`,`DiskWriteBytes`|The number of bytes read (written) on all instance store volumes. This metric is applicable for instance store-backed AMI instances.
259 | `NetworkIn`,`NetworkOut`|The number of bytes received (sent) on all network interfaces by the instance
260 | `NetworkPacketsIn`,`NetworkPacketsOut`|The number of packets received (sent) on all network interfaces by the instance
261 | `StatusCheckFailed`,`StatusCheckFailed_Instance`,`StatusCheckFailed_System`|Reports whether the instance has passed both/instance/system status check in the last minute.
262 |
263 | * Can **not** monitor **memory usage**, **available disk space**, **swap usage**
264 |
265 |
266 | ### Key metrics for EBS
267 | Metric|Effect
268 | -|-
269 | `VolumeReadBytes`,`VolumeWriteBytes`|`sum` reports total bytes transferred, `average` also useful
270 | `VolumeReadOps`,`VolumeWriteOps`|total number of IO operations
271 | `VolumeQueueLength`|Number of read/write operation requests waiting to finish
272 | `VolumeTotalReadTime`,`VolumeTotalWriteTime`|Total number of seconds spent by all operations in a given time
273 | `VolumeThroughputPercentage`|Percentage of IOPS that was achieved out of total provisioned IOPS
274 | `VolumeConsumedReadWriteOps`|Total amount of r/w operations consumed within a specific time period
275 |
276 | * Can **not** monitor **disk usage percentage**
277 |
278 |
279 | ### Key metrics for EFS
280 | Metric|Effect
281 | -|-
282 | `BurstCreditBalance`|The number of burst credits that a file system has.
283 | `ClientConnections`|The number of client connections to a file system.
284 | `DataReadIOBytes`,`DataWriteIOBytes`|The number of bytes for each file system read(write) operation.
285 | `MetadataIOBytes`|The number of bytes for each metadata operation.
286 | `PercentIOLimit`|Shows how close a file system is to reaching the I/O limit of the General Purpose performance mode.
287 | `PermittedThroughput`|The maximum amount of throughput a file system is allowed.
288 | `TotalIOBytes`|The number of bytes for each file system operation, including data read, data write, and metadata operations.
289 |
290 |
291 | ### Key metrics for ELB (classic load balancer)
292 | Metric|Effect
293 | -|-
294 | `Latency`|Time it takes to receive a response. Measure `max` and `average`
295 | `BackendConnectionErrors`|Number of not successfully established connections to registered instances, measure `sum` and look at difference between `min` and `max`
296 | `SurgeQueueLength`|Total number of requests waiting to get routed, look at `max` and `average`
297 | `SpilloverCount`|Dropped requests because of exceeded surge queue. Look at `sum`
298 | `HTTPCode_ELB_3XX_Count`<br/>`HTTPCode_ELB_4XX_Count`<br/>`HTTPCode_ELB_5XX_Count`|The number of HTTP 3XX/4XX/5XX response codes that originate from the *load balancer*. This count does *not* include any response codes generated by the targets.
299 | `RequestCount`|Number of completed requests
300 | `HealthyHostCount`,`UnhealthyHostCount`|Self-explanatory
301 |
302 | * In case of sudden and very large increases in traffic it's possible to contact AWS and have them
303 | 'pre-warm' the *ELB*.
304 |
305 | > spillover and surge queue give an indication of the ELB being overloaded
306 |
307 | * Typically this means that the backend system cannot process requests as fast as they are coming in
308 | * Ideally load balance into an autoscaling group.
309 |
310 |
311 | ### Key metrics for ALB (application load balancer)
312 | Metric|Effect
313 | -|-
314 | `RequestCount`|Number of completed requests
315 | `HealthyHostCount`,`UnhealthyHostCount`|Self-explanatory
316 | `TargetResponseTime`|The time elapsed after the request leaves the load balancer until a response from the target is received.
317 | `HTTPCode_ELB_3XX_Count`<br/>`HTTPCode_ELB_4XX_Count`<br/>`HTTPCode_ELB_5XX_Count`|The number of HTTP 3XX/4XX/5XX response codes that originate from the *load balancer*. This count does *not* include any response codes generated by the targets.
318 |
319 |
320 | ### Key metrics for NLB (network load balancer)
321 | Metric|Effect
322 | -|-
323 | `ProcessedBytes`|The total number of bytes processed by the load balancer, including TCP/IP headers.
324 | `TCP_Client_Reset_Count`|The total number of reset (RST) packets sent from a client to a target.
325 | `TCP_ELB_Reset_Count`|The total number of reset (RST) packets generated by the load balancer.
326 | `TCP_Target_Reset_Count`|The total number of reset (RST) packets sent from a target to a client.
327 |
328 |
329 | ### Key metrics for ElastiCache
330 | Supports *memcached* and *redis*
331 |
332 | Metric|**memcached**|**redis**
333 | -|-|-
334 | .|Designed for simplicity|Supports a much richer set of features. Can be backed up if in *cluster* mode
335 | `cpu utilization`|* multithreaded<br/>* stay under 90%<br/>* -> increase size of node or add more nodes|* single threaded<br/>* stay under 90%/#cores<br/>* -> increase # read replicas or use larger cache instance
336 | `evictions`|* -> increase size or add nodes to cluster|* -> increase node size
337 | `concurrent connections`|* -> check application logic|* -> check application logic
338 | `swap usage`|* avoid swapping<br/>* -> increase `memcached_connections_overhead`|* avoid swapping<br/>* -> increase node size<br/>* -> increase `memory connection overhead` (will decrease memory available for cache)
339 |
340 |
341 |
342 |
343 | ### Key metrics for RDS
344 | Metric|Effect
345 | -|-
346 | `CPUUtilization`|Percentage of CPU utilization
347 | `DatabaseConnections`|Number of connections that we have at a given point in time
348 | `DiskQueueDepth`|Number of read/write requests waiting to access the disk
349 | `FreeableMemory`|Amount of available RAM
350 | `FreeStorageSpace`|Amount of available storage space
351 | `SwapUsage`|Amount of swap space used, i.e. data that belongs in memory but is stored on disk. An increase usually has to do with running out of available RAM
352 | `ReadIOPS`,`WriteIOPS`|Number of I/O operations completed per second. If we don’t have enough IOPS, performance will slow down
353 | `ReadLatency`,`WriteLatency`|Average amount of time taken per disk I/O operation. High latency can be solved with more IOPS
354 | `ReadThroughput`,`WriteThroughput`|`Average` is number of bytes read from or written to disk per second
356 |
357 |
358 | * Also look at *RDS Events*
359 |
360 | ---
361 |
362 |
363 | # [↖](#top)[↑](#2_6_8)[↓](#3_1) Costs
364 |
365 |
366 | ## [↖](#top)[↑](#3)[↓](#3_2) Consolidated Billing
367 | Set up a **billing account** to pay for multiple **linked accounts** at the same time.
368 |
369 | * Allows for **consolidated billing**. Does *not* give IAM visibility into linked accounts.
370 | * Enables **volume discounts** across linked accounts.
371 | * If one account uses *reserved instances*, other accounts running on similar *on demand* instances
372 | will be billed under the reserved instance price. Similar for *RDS* instances.
373 | * All *credits* earned while linked will be applied to consolidated bill.
374 |
375 | Limits:
376 | * Up to 20 linked accounts
377 |
378 |
379 | ## [↖](#top)[↑](#3_1)[↓](#3_3) Billing Metrics & Alarms
380 | * Only shows metrics of services that have been used.
381 | * Set up *billing alarms* based on billing metrics.
382 | * *Overall* billing alarm, or *service-specific* alarms
383 | * Can still be account-specific, even with consolidated billing
384 |
385 |
386 | ## [↖](#top)[↑](#3_2)[↓](#3_4) Costs Optimization
387 | * Purchase **EC2 Reserved Instances**
388 | * Commit for 1-3 years and get a discount
389 | * Minimize the number of running instances
390 | * Set up *CloudWatch* alarms to spin down underutilized instances
391 |     * Find balance between acceptable downtime & costs to eliminate this downtime
392 | * Remove unused **Load Balancers**
393 | * Look for idle (unattached) **EBS** volumes
394 | * Delete unused volumes
395 | * Take a *snapshot* to keep the data
396 | * Downsize volumes that aren't near full capacity
397 |     * Look for over-provisioned **IOPS**
398 | * Look for unassociated **Elastic IP** addresses
399 | * Look for idle **RDS** instances
400 | * Check for 0 connections
401 |
402 |
403 | ## [↖](#top)[↑](#3_3)[↓](#4) Cost Explorer
404 | * Costs per *time frame* per *service*, various grouping and filtering options
405 | * Provides forecasts
406 | * **Pricing API** allows downloading pricing information for specific services
407 |
408 | ---
409 |
410 |
411 | # [↖](#top)[↑](#3_4)[↓](#4_1) High Availability
412 |
413 |
414 | ## [↖](#top)[↑](#4)[↓](#4_2) Scalability & Elasticity Fundamentals
415 | * Pay only for *what* you need *when* you need it
416 | * Define minimum capacity
417 | * Define what needs to stretch out
418 |
419 | .|**Elasticity**|**Scalability**
420 | -|-|-
421 | .|*Scaling up/down on demand*|*Scaling for growth in order to meet long term requirements; typically does not focus on shrinking back*
422 | *DynamoDb*|Can provision more or less throughput|Stores as much data as we like, scales transparently
423 | *EC2*|Use autoscaling|More instances or bigger instance types
424 | *RDS*|./.|Bigger instances, more read replicas
425 |
426 |
427 | ## [↖](#top)[↑](#4_1)[↓](#4_3) Reserved Instances
428 | * *Reserve* instances for a specific period of time
429 | * *Standard* reserved instances (fixed instance type)
430 | * *Convertible* reserved instances (can be exchanged against another convertible instance type)
431 | * *Scheduled* reserved instances (purchased by the hour on a set schedule with a set instance type)
432 | * Up to 50% cheaper than a *fully utilized* on-demand instance (because we commit upfront to a certain usage)
433 | * Guarantees to *not* run into '*insufficient instance capacity*' issues if AWS is unable to provision instances in that AZ
434 | * Can resell reserved capacity on *Reserved Instance Marketplace*
435 | * Available for:
436 | * EC2
437 | * RDS (*reserved instances*)
438 | * DynamoDB (*reserved capacity*)
439 | * ElastiCache (*reserved nodes*)
440 | * CloudFront (*reserved capacity*)
441 | * Elastic MapReduce (*reserved EC2 instances*)
442 |     * ECS (*reserved EC2 instances*)
443 |
444 |
445 | ## [↖](#top)[↑](#4_2)[↓](#4_4) Autoscaling vs Resizing
446 | * **Auto Scaling** distributes load across multiple instances
447 | * *Scheduled Scaling* allows to scale or shrink on a schedule
448 |     * Relatively complex to set up
449 | * Applications need to be designed to benefit from multiple instances
450 | * Components
451 | * *Launch Configuration*
452 | * *Autoscaling Group*
453 | * *Scaling Policy*
454 |         * *CloudWatch Alarms*
455 |
456 | * **Changing instance size** increases/decreases available resources to the running application
457 | * *EBS* backed instances need to be stopped before resizing
458 |     * *Instance storage* needs to be migrated across
459 | * Not as flexible as auto scaling. Not elastic
460 | * Within an autoscaling group the to-be-resized instance might be treated as unhealthy
461 |
462 |
463 | ## [↖](#top)[↑](#4_3)[↓](#4_4_1) Load Balancers
464 |
465 | .|**ALB**|**NLB**|**ELB**
466 | -|-|-|-
467 | .|Application Load Balancer|Network Load Balancer|Classic Load Balancer
468 | Layer|7 (application layer)|4 (transport layer)|EC2-classic network (deprecated)
469 | Protocol|HTTP, HTTPS|TCP|TCP, SSL, HTTP, HTTPS
470 | Health checks|✔|✔|✔
471 | CloudWatch metrics|✔|✔|✔
472 | Logging|✔|✔|✔
473 | Zone failover|✔|✔|✔
474 | Connection draining|✔|✔|✔
475 | Load balancing to different ports on the same instance|✔|✔|.
476 | WebSockets|✔|✔|.
477 | IP Addresses as targets|✔|✔|.
478 | Load balancing deletion protection|✔|✔|.
479 | Path-based routing|✔|.|.
480 | Host-based routing|✔|.|.
481 | Native http/2|✔|.|.
482 | Configurable idle connection timeout|✔|.|✔
483 | Cross zone load-balancing|✔|✔|✔
484 | SSL-offloading|✔|.|✔
485 | Server-name indication|✔|.|✔
486 | Sticky-sessions|✔|.|✔
487 | Backend server encryption|✔|.|✔
488 | Static IP|.|✔|.
489 | Elastic IP|.|✔|.
490 | Preserve source IP address|.|✔|.
491 | Resource-based IAM permissions|✔|✔|✔
492 | Tag-based IAM permissions|✔|✔|.
493 | Slow start|✔|.|.
494 | User authentication|✔|.|.
495 | Redirects|✔|.|.
496 | Fixed responses|✔|.|.
497 |
498 |
499 | ### Elastic Load Balancer ('Classic LB')
500 |
501 |
502 | ### Overview
503 | * *External* load balancer
504 | * Public facing
505 | * Often used to distribute load between web servers
506 | * Provides public DNS host name
507 | * *Internal* load balancer
508 |     * Often used to distribute load between backend servers
509 | * Provides internal DNS host name
510 | * Configure (in AWS console)
511 | * Internal and external load balancer
512 | * Subnets for each AZ that traffic should be routed to
513 | * Can route into private subnets
514 | * Cross-zone load balancing
515 | * Connection draining (maximum time for the load balancer to keep connections alive before reporting the instance as
516 | de-registered)
517 |
518 |
519 | ### Sticky Sessions
520 | * Need to make sure that session is maintained between instances
521 | * Load Balancer generated stickiness (*duration based* session stickiness)
522 | * Application generated stickiness (*application based* session stickiness)
523 | * For HA, use *ElastiCache* to persist and share session state, so that maintaining
524 |   stickiness no longer matters
525 |
526 |
527 | ## [↖](#top)[↑](#4_4_3)[↓](#4_6) RDS HA
528 | * Create *subnets* in different AZs
529 | * Create *subnet group* in RDS dashboard
530 |     * Collection of subnets (typically private) in a VPC that is designated for DB instances
531 | * Should have subnets in at least two Availability Zones in a given region
532 | * Configure RDS for **multi-AZ-deployments** and turn replication on
533 | * Keeps a *synchronous* standby replica in a different AZ
534 | * Provisioned IOPS storage is recommended
535 | * Automatic failover in case of planned or unplanned outage of the first AZ
536 | * Most likely still has downtime
537 | * Can *force* failover by *rebooting*
538 | * Other benefits
539 | * Patching
540 | * Backups
541 | * *Aurora* can replicate across 3 AZs
542 | * Failover process is automated
543 | * AWS detects an issue and starts the failover process
544 | * DNS records are modified to point to the standby instance
545 | * Application re-establishes existing DB connections
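
The last step above is typically a retry loop: connections opened before the failover are dropped, and new ones can fail until the DNS change is visible to the client. A minimal sketch, assuming a hypothetical zero-argument `connect` callable that raises `ConnectionError` while the endpoint is unavailable (the name, attempt count and delays are illustrative, not an AWS API):

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `connect` until it succeeds, backing off exponentially.

    `connect` is a hypothetical callable (e.g. a wrapper around a DB driver)
    that raises ConnectionError while the failover is still in progress.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Always connecting through the RDS endpoint name (never a cached IP address) is what lets the retry eventually reach the new primary.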
546 |
547 |
548 | ## [↖](#top)[↑](#4_5)[↓](#4_7) HA for IP-based Applications
549 | * If the application requires specific IPs (that are hardcoded somewhere), autoscaling cannot be used
550 | * Use *Elastic IP* and standby instances in different AZs instead
551 | * Cannot use Elastic IP across different regions though
552 | * Scale by increasing instance size (vertical scaling)
553 |
554 |
555 | ## [↖](#top)[↑](#4_6)[↓](#5) HA/Fault Tolerance for Bastion Hosts
556 | * Assign Elastic IP to bastion host in AZ 1
557 | * This IP can also be whitelisted to comply with corporate regulations
558 | * Have another instance on standby in different AZ
559 | * Could be in *ASG* (min/max 1), so that it gets immediately replaced
560 | * Place 2 instances behind ELB and enable *SSH Keep Alive*
561 | * Place 1 instance behind ELB, configure *auto recovery*
562 |
563 | ---
564 |
565 |
566 | # [↖](#top)[↑](#4_7)[↓](#5_1) Analysis
567 |
568 |
569 | ## [↖](#top)[↑](#5)[↓](#5_1_1) Optimize the environment to ensure maximum performance
570 |
571 |
572 | ### Offloading database workload
573 | * Using **read replicas**
574 | * Read queries are routed to *read replicas*, reducing load on primary db instance
575 | (*source instance*)
576 | * Table indexes can be created on read replicas directly (and not on the master)
577 | * Some use cases (e.g. data analytics) can be performed exclusively against read replicas
578 | * To create read replicas, AWS initially creates a snapshot of the source instance
579 | * Multi-AZ failover instance (if enabled) is used for snapshotting
580 | * After that, all changes on the source instance are *asynchronously* replicated to the read replica
581 | * Implies data latency, which typically is acceptable.
582 | * `ReplicaLag` can be monitored and *CloudWatch* alarms can be configured
583 | * *Read replicas* are **not** the same as *multi-AZ failover* instances which
584 | * are *synchronously* updated
585 | * are designed to handle failover
586 | * don't receive any load unless failover actually happens
587 | * Often it is beneficial to have both read replicas and multi-AZ failover instances
588 | * Read replicas themselves can not use the Multi-AZ feature
589 | * A single master can have **up to 5** read replicas
590 | * Can be in different regions
591 |
592 | * Setting up a read replica
593 | * Configure from master instance or other read replica
594 | * Requires 'automated backups' to be enabled on source instance
595 | * Choice of db engine matters, because internal engine features are being used
596 | * Usually pick same database instance type as source instance uses
597 | * AWS provisions a different *endpoint* for the read replica
598 | * Configure use of endpoint on application level
599 |
600 | * Read replicas can be promoted to normal instances
601 | * E.g. use read replica to implement bigger changes on db level, after these have been finished
602 | promote to master instance
603 | * Useful for database sharding, could create replicas for each shard
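
Using the replica endpoint "on application level" usually means the application decides per statement which endpoint to talk to. A rough sketch, assuming hypothetical endpoint names and a deliberately naive "starts with SELECT" check:

```python
import itertools


class EndpointRouter:
    """Send writes to the primary endpoint and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._reads = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._reads) if is_read else self.primary
```

Because replication is asynchronous, reads that must see a write immediately should still go to the primary; this sketch does not handle that case.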
604 |
605 |
606 | ### Looking at EBS volumes
607 | * EBS *pre-warming*
608 | * Used to be required for maximum performance
609 | * Performance is reduced the very first time each block is accessed
610 | * Has been renamed to *initialization* and is no longer required if new EBS volumes are used
611 | * Still required for volumes that are restored from snapshots
612 | * Storage blocks must be initialized (pulled down from Amazon S3 and written to the volume)
613 | * Use `dd` or `fio` to *read* from every block
614 | * Only required if performance matters, obviously
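
`dd` and `fio` remain the usual tools for this. Purely to illustrate what "reading from every block" means, here is a rough Python equivalent; the device path mentioned in the docstring is an assumption and reading it would need root:

```python
def read_all_blocks(path, block_size=1024 * 1024):
    """Read `path` sequentially once and return the number of bytes read.

    Pointing this at a block device (hypothetical path, e.g. /dev/xvdf)
    touches every block, which is what initialization requires.
    """
    total = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:
                break  # end of device/file reached
            total += len(chunk)
    return total
```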
615 |
616 |
617 | ### Prewarming ELBs
618 | * ELB is designed to increase its resource capacity gradually
619 | * Prevents `HTTP 503` (ELB cannot handle any more requests)
620 | * Can contact AWS to `pre-warm` ELB
621 | * This should not really be required. Maybe if TV ads are running or so.
622 | * Use load testing tools to get a rough estimate of what the current ELB can handle
623 | * Increase at a rate no more than 50% per 5min.
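
The "no more than 50% per 5 min" rule of thumb translates into a simple ramp plan for a load test. A small sketch (function name and defaults are illustrative):

```python
def ramp_schedule(start_rps, target_rps, growth=0.5, step_minutes=5):
    """Return (minute, requests_per_second) steps, growing by `growth` per step."""
    if start_rps <= 0 or growth <= 0:
        raise ValueError("start_rps and growth must be positive")
    steps, minute, rps = [], 0, float(start_rps)
    while rps < target_rps:
        steps.append((minute, rps))
        rps *= 1 + growth
        minute += step_minutes
    steps.append((minute, float(target_rps)))  # final step is capped at the target
    return steps
```

For example, going from 100 to 500 requests per second takes four increases, i.e. 20 minutes.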
624 |
625 |
626 | ## [↖](#top)[↑](#5_1_3)[↓](#5_2_1) Identify Performance Bottlenecks and Implement Remedies
627 |
628 |
629 | ### Resizing or changing EBS root volumes
630 | * If EBS is at capacity
631 | * Either upgrade volume size to increase the amount of IOPS available
632 | * Or switch to Provisioned IOPS volumes (`io1`)
633 | * Resizing
634 | * Create snapshot of EBS volume first
635 | * Incrementally stored on S3
636 | * Can continue to use EBS volume while the snapshot is taking place
637 | * Create new volume from snapshot
638 | * Stop instance
639 | * Attach new volume
640 |
641 |
642 | ### Setting up certificates for Elastic Load Balancers
643 | * Offloading overhead from the instances behind the ELB
644 | * Create ELB and configure https
645 | * Certificate from
646 | * ACM (AWS managed)
647 | * IAM (for external certificates)
648 | * Upload directly
649 |
650 |
651 | ### Network bottlenecks
652 | * Primary network bottlenecks
653 | * EC2 instances
654 | * Instances in different AZs or regions
655 | * Different instance types get different bandwidth capacities
656 | * No absolute numbers communicated by AWS though
657 | * Not using *enhanced network capabilities* (not supported by some instance types)
658 | * Check for performance issues with `iperf3` (GitHub)
659 | * Measures performance for ip-based networks
660 | * Use VPC Peering to create a reliable connection
661 | * No single point of failure
662 | * Connection to on-prem networks
663 | * Use `Direct Connect`
664 |
665 |
666 | ## [↖](#top)[↑](#5_2_3)[↓](#5_3_1) Identify Potential Issues on a Given Application Deployment
667 |
668 |
669 | ### EBS Root Devices on Terminated Instances - Ensuring Data Durability
670 | * *EBS root volumes* will be deleted on instance termination as per default option
671 | * Could create snapshot before termination to backup data
672 | * Could change default settings
673 | * *Instance store root volumes* are ephemeral, their data is lost on instance termination
674 |
675 |
676 | ### Troubleshooting Auto Scaling Issues
677 | * Attempting to use wrong subnet
678 | * AZ no longer available or supported (outage)
679 | * Security group does not exist
680 | * Associated keypair does not exist
681 | * Auto scaling configuration is not working correctly
682 | * Instance type specification does not exist in that AZ
683 | * Auto scaling is not enabled on that subnet
684 | * Invalid EBS device mapping
685 | * Attempt to attach EBS block device to instance-store AMI
686 | * AMI issues
687 | * Attempt to use *placement groups* with instance types that don't support that
688 | * AWS running out of capacity in that AZ
689 | * If an instance is stopped, e.g. for updating it, autoscaling will consider it unhealthy,
690 | terminate it and launch a replacement. Need to suspend autoscaling first.
691 |
692 | ---
693 |
694 |
695 | # [↖](#top)[↑](#5_3_2)[↓](#6_1) OpsWorks
696 |
697 |
698 | ## [↖](#top)[↑](#6)[↓](#6_1_1) Overview and components
699 | * Declarative desired state engine
700 | * Automate, monitor and maintain deployments
701 | * **Cookbooks** define **recipes**
702 | * AWS' implementation of *Chef*
703 | * Original Chef
704 | * AWS-bespoke orchestration components
705 | * Components
706 | * **Stack**
707 | * Set of resources that is managed as a group
708 | * Whole service stack
709 | * **Layer**
710 | * Represent and configure components of a stack
711 | * E.g. loadbalancer layer, app layer, db layer
712 | * Share common configuration elements
713 | * **Instance**
714 | * Units of compute within the platform
715 | * Must be associated with at least one layer
716 | * Can run
717 | * 24/7
718 | * Load-based
719 | * Time-based
720 | * **Application**
721 | * Applications that are deployed on one or more instances
722 | * Deployed through source code repo or S3
723 | * Recipes
724 | * Created in ruby, used to customize different layers
725 | * Run at stack lifecycle events
726 | * `setup`
727 | * *Instance* has finished booting
728 | * `configure`
729 | * *Instance* enters or leaves the `online` state
730 | * *Elastic IP* is associated or disassociated
731 | * *Load balancer* is attached or detached
732 | * Event is executed on *all* instances, not only the impacted one
733 | * `deploy`
734 | * *Deploy command* is run on an instance
735 | * `undeploy`
736 | * *Undeploy command* is run on an instance
737 | * *App* is deleted
738 | * `shutdown`
739 | * When *instance* is shut down, before termination
740 | * Allows cleanup
741 | * Under the hood
742 | * *OpsWorks* **agent**
743 | * Configuration of machines
744 | * *OpsWorks* **automation engine**
745 | * *Create*, *update* & *delete* of various AWS components
746 | * Handles *loadbalancing*, *autoscaling* and *autohealing*
747 | * Supports *lifecycle* events
748 |
749 |
750 | ### Berkshelf
751 | * Addresses an *OpsWorks* shortcoming from old versions - only one repository for recipes
752 | * Was added in *OpsWorks* 11.10 and allows installing cookbooks from many repositories
753 |
754 | TODO: Quickstart OpsWorks
755 |
756 |
757 | ## [↖](#top)[↑](#6_1_1)[↓](#6_2_1) CloudFormation
758 |
759 |
760 | ### Overview
761 | * Allows creating and provisioning **resources** in a reusable **template** fashion
762 | * A *CloudFormation* template is a `JSON` or `YAML` formatted text file
763 | * Related resources are managed in a single unit called a **stack**
764 | * Controls lifecycle of managed resources
765 | * All the resources in a stack are defined by the stack's *CloudFormation* template
766 | * Stack has `name` & `id`
767 | * Two ways to update a stack
768 | * *Direct update*
769 | * Directly applies changes (if any)
770 | * *Change set*
771 | * Summary of proposed changes, can be applied or rejected
772 | * Will **rollback** stack if it fails to create (can be disabled via API/console)
773 | * A **stack policy** is an *IAM*-style policy document that defines which update actions are allowed on which stack resources
774 |
775 |
776 | ### Templates
777 | * `AWSTemplateFormatVersion`
778 | * `Description`
779 | * `Metadata`
780 | * Details about the template
781 | * `Parameters`
782 | * Values to pass in when the stack is created or updated
783 | * Type
784 | * `String`, `Number`, `List<Number>`, `CommaDelimitedList`
785 | * AWS-specific types like `AWS::EC2::KeyPair::KeyName`
786 | * Description
787 | * Default Value
788 | * Allowed Values
789 | * Allowed Pattern
790 | * Validation per *regular expression*
791 | * MinLength/MaxLength
792 | * MinValue/MaxValue
793 | * Problem:
794 | * Usage of parameters *might* make it hard to instantiate stacks without human interaction
795 | * *CloudFormation* is able to auto-generate many resource attributes, e.g. name
796 | * `Mappings`
797 | * Maps keys to values (e.g. different values for different regions)
798 | * `Conditions`
799 | * Check values before deciding what to do
800 | * `Resources`
801 | * Creates resources. Only mandatory section in a template.
802 | * Can have `Condition` element to toggle creation
803 | * `Outputs`
804 | * Values to be exposed from the console or from API calls.
805 | * Can be used in a different stack (*cross stack references*)
806 | * Can be:
807 | * Constructed value
808 | * Parameter reference
809 | * Pseudo parameter
810 | * Output from a function like `Fn::GetAtt` or `Ref`
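
To see how the sections fit together, the sketch below builds a small template as a Python dict and prints it as JSON. The parameter name, mapping keys and AMI IDs are placeholders invented for illustration, not real images:

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Single EC2 instance (illustrative sketch)",
    "Parameters": {
        "Env": {"Type": "String", "Default": "dev", "AllowedValues": ["dev", "prod"]},
    },
    "Mappings": {
        # Placeholder AMI IDs, not real images.
        "RegionMap": {
            "us-east-1": {"Ami": "ami-11111111"},
            "eu-west-1": {"Ami": "ami-22222222"},
        },
    },
    "Conditions": {"IsProd": {"Fn::Equals": [{"Ref": "Env"}, "prod"]}},
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Fn::If": ["IsProd", "m5.large", "t3.micro"]},
                "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "Ami"]},
            },
        },
    },
    "Outputs": {
        "InstanceId": {"Value": {"Ref": "WebServer"}},
        "PublicDns": {"Value": {"Fn::GetAtt": ["WebServer", "PublicDnsName"]}},
    },
}

print(json.dumps(template, indent=2))
```

Only `Resources` is required; the other sections are included to show where parameters, mappings and conditions are referenced.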
811 |
812 |
813 | ### Intrinsic Functions
814 | * Used to pass in values that are not available until runtime
815 | * Usable in `resource` properties, `metadata` attributes, and `update policy` attributes (auto-scaling)
816 | * `Ref`
817 | * Returns the value of the specified parameter, or the *default* identifier of the specified resource (usually its physical ID, e.g. instance id)
818 | * `Fn::GetAtt`
819 | * Returns the value of an attribute from an object, either the default or the specified attribute
820 | * Object is either from the same or a nested template
821 | * `Fn::Join`
822 | * Joins a set of values into a single value separated by the specified delimiter
823 | * `Fn::Sub`
824 | * Substitutes variables in an input string with values that you specify
825 | * `Fn::FindInMap`
826 | * Returns the value corresponding to keys in a two-level map that is declared in the *Mappings*
827 | section
828 | * `Fn::Select`
829 | * Returns a single object from a list of objects by index
830 | * `Fn::Base64`
831 | * Provides encoding, converts from plain text into base64
832 | * `Fn::GetAZs`
833 | * Returns an array that lists *Availability Zones* for a specified region
834 | * If region is omitted return AZs from the region the template is applied in
835 | * `Fn::ImportValue`
836 | * Returns the value of an *Output* exported by another stack
837 | * `Fn::Split`
838 | * Split a string into a list of string values so that you can select an element from the resulting
839 | string list
840 | * `Fn::If`
841 | * Takes a list of arguments (`boolean`, `string1`, `string2`)
842 | * Returns `string1` if `boolean` is `true`, `string2` otherwise
843 | * `Fn::And`, `Fn::Equals`, `Fn::Or`, `Fn::Not`
844 | * Good for `condition` element
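
The semantics of a few of these functions can be illustrated with a toy evaluator. This is a local sketch for intuition only, not how CloudFormation itself is implemented; it covers a small subset and takes parameter values, mappings and pre-computed conditions as plain dicts:

```python
def resolve(node, params, mappings, conditions):
    """Evaluate a small subset of intrinsic functions on a template fragment."""
    if isinstance(node, list):
        return [resolve(item, params, mappings, conditions) for item in node]
    if not isinstance(node, dict) or len(node) != 1:
        return node  # plain value, nothing to evaluate
    name, arg = next(iter(node.items()))
    if name == "Fn::If":
        # Only the chosen branch is evaluated.
        condition, if_true, if_false = arg
        chosen = if_true if conditions[condition] else if_false
        return resolve(chosen, params, mappings, conditions)
    arg = resolve(arg, params, mappings, conditions)
    if name == "Ref":
        return params[arg]
    if name == "Fn::Join":
        delimiter, values = arg
        return delimiter.join(values)
    if name == "Fn::Split":
        delimiter, text = arg
        return text.split(delimiter)
    if name == "Fn::Select":
        index, values = arg
        return values[int(index)]
    if name == "Fn::FindInMap":
        map_name, top_key, second_key = arg
        return mappings[map_name][top_key][second_key]
    return node  # unknown key: left as-is
```

For example, `{"Fn::Join": ["-", ["web", {"Ref": "Env"}]]}` with `Env` set to `prod` evaluates to `web-prod`.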
845 |
846 | ---
847 |
848 |
849 | # [↖](#top)[↑](#6_2_3)[↓](#7_1) Backups & Recovery
850 |
851 |
852 | ## [↖](#top)[↑](#7)[↓](#7_2) AWS Services with automated backups
853 | * RDS
854 | * Backups
855 | * *Transactional* storage engine recommended as DB engine
856 | * Degrades performance if multi-AZ is not enabled (taken from slave if enabled)
857 | * Deleting an instance deletes all *automated* backups
858 | * Backups are stored internally on S3
859 | * Point-in-time recovery (PITR) to within the last 5 minutes
860 |
861 | * Restoring
862 | * When restoring, only default parameters and security groups are associated with instance
863 | * Can change to different storage engine if closely related and enough space available
864 |
865 | * Elasticache
866 | * Backups
867 | * Available to Redis cluster only
868 | * Taking snapshots can degrade performance, should be performed on read replica
869 | * Backups are stored internally on S3
870 |
871 | * Redshift
872 | * Backups
873 | * Provides free storage equal to the storage capacity of the cluster
874 | * Snapshots can be automated or manual and are incremental
875 | * Backups are stored internally on S3
876 | * Restoring
877 | * Creates a new cluster and imports the data
878 |
879 | * EC2
880 | * Backups
881 | * No built-in automated backup solution
882 | * Snapshots of EBS volumes are incremental; taking one can degrade performance
883 | * Every snapshot will restore *all* data, even if older snapshots are deleted
884 | * Backups are stored internally on S3
885 |
886 |
887 | ## [↖](#top)[↑](#7_1)[↓](#7_2_1) Disaster Recovery Scenarios
888 |
889 |
890 | ### DR of on-prem infra
891 | * Use AWS as backup solution by storing VMs, snapshots and other data
892 | * 'Pilot light' - have bare minimum infra always ready and scale up as required
893 | * 'Hot standby' (aka 'multi site') - has everything ready to go
894 |
895 |
896 | ### DR of cloud infra
897 | * Duplicate the environment from one region to another
898 |
899 |
900 | ### DR of RDS data
901 | * Protection from multiple AZs being down
902 | * Reduce latency for global audience
903 | * Replica lag will most likely go up
904 | * Data transfer across regions is getting charged
905 | * May potentially run into bandwidth issues
906 | * Create read replica from existing DB instance, pick different region
907 | * Trigger setup process that will take some time
908 |
909 |
910 | ## [↖](#top)[↑](#7_2_3)[↓](#8) Storing log files and backups
911 | * Implement centralized logging
912 | * From there
913 | * Send to 3rd party tool for analysis
914 | * Backup to S3
915 | * 99.999999999% (11 nines) durability
916 | * Versioning
917 | * Lifecycle policies
918 |
919 | * Other logging options
920 | * S3 access logs
921 | * CloudTrail
922 | * CloudWatch
923 |
924 | ---
925 |
926 |
927 | # [↖](#top)[↑](#7_3)[↓](#8_1) Security
928 |
929 |
930 | ## [↖](#top)[↑](#8)[↓](#8_1_1) Implement and Manage Security Policies
931 |
932 |
933 | ### IAM
934 | IAM is a global service that helps to securely control access to AWS resources.
935 |
936 | * **Users** hold credentials
937 | * **Groups** hold users, typically only provide permission to assume a role
938 | * **Roles** hold policies.
939 | * Can have **trust relationships** with trusted entities that can *assume* this role
940 | * **Policies** can be attached to users, groups or roles (preferred)
941 | * An **instance profile** is a container for an IAM role that you can use to pass role information to an
942 | EC2 instance when the instance starts.
943 | * Users and/or services assume roles
944 |
945 |
946 | #### Policies
947 | * Any actions on resources that are not explicitly allowed are **denied by default**
948 | * Structure
949 | * **E** - `effect` (*allow*/*deny*)
950 | * What the effect will be when the user requests the specific action
951 | * **P** - `principal` (*ARN*)
952 | * The account or user who is allowed access to the actions and resources in the statement
953 | * IAM policies do not have a principal (because they are attached to users, groups or roles)
954 | * **A** - `action` or `notaction`
955 | * Describes the specific action or actions that will be allowed or denied
956 | * **R** - `resource` or `notresource`
957 | * Specifies the object or objects that the statement covers
958 | * **C** - `condition`
959 | * Specifies conditions for when a policy is in effect
960 | * Can use **policy variables**
961 | * `aws:currentTime`, `aws:userid`, ...
962 |
963 | ```
964 | {
965 | "Version": "2012-10-17",
966 | "Statement": [
967 | {
968 | "Effect": "Allow",
969 | "Action": "s3:ListAllMyBuckets",
970 | "Resource": "arn:aws:s3:::*"
971 | },
972 | {
973 | "Effect": "Allow",
974 | "Action": [
975 | "s3:ListBucket",
976 | "s3:GetBucketLocation"
977 | ],
978 | "Resource": "arn:aws:s3:::productionapp"
979 | },
980 | {
981 | "Effect": "Allow",
982 | "Action": [
983 | "s3:GetObject",
984 | "s3:PutObject",
985 | "s3:DeleteObject"
986 | ],
987 | "Resource": "arn:aws:s3:::productionapp/*"
988 | }
989 | ]
990 | }
991 | ```
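
How statements like these combine can be sketched as: an explicit deny wins, otherwise an explicit allow is needed, otherwise the request is denied by default. The simplified evaluator below ignores `Principal`, `Condition`, `NotAction`/`NotResource` and the case-insensitivity of real action matching; the `Deny` statement is a hypothetical addition to show the precedence:

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(statement, action, resource):
    return any(fnmatchcase(action, a) for a in _as_list(statement["Action"])) and any(
        fnmatchcase(resource, r) for r in _as_list(statement["Resource"])
    )


def is_allowed(statements, action, resource):
    """Explicit deny wins, then explicit allow, otherwise implicit deny."""
    matching = [s for s in statements if _matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)


statements = [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
     "Resource": "arn:aws:s3:::productionapp/*"},
    # Hypothetical extra statement, not part of the policy above.
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": "arn:aws:s3:::productionapp/secret/*"},
]
```

With these statements, `s3:GetObject` on `productionapp/report.csv` is allowed, the same action under `secret/` is denied by the explicit deny, and `s3:DeleteObject` is denied because nothing allows it.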
992 |
993 | #### IAM Policies
994 | * Managed policies (the new way)
995 | * Can be attached to multiple users, groups and roles
996 | * AWS managed policies
997 | * Updated by AWS when new APIs come out
998 | * Customer managed policies
999 | * Inline policies (the old way)
1000 |
1001 |
1002 | #### IAM roles and EC2
1003 |
1004 | * Create an IAM role.
1005 | * Define which accounts or AWS services can assume the role.
1006 | * EC2 here, could be other services
1007 | * Define which API actions and resources the application can use after assuming the role.
1008 | * Specify the role when you launch your instance, or attach the role to a running or stopped instance.
1009 | * Have the application retrieve a set of temporary credentials and use them.
1010 |
1011 | * Only one role can be assigned to an EC2 instance, and all applications share the same role and permissions
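
On an instance, the temporary credentials are served by the instance metadata service as a JSON document (under `latest/meta-data/iam/security-credentials/` followed by the role name), and the AWS SDKs fetch and refresh them automatically. If an application handles them itself, it has to watch the expiry. A minimal sketch; the helper name and the five-minute margin are assumptions:

```python
import json
from datetime import datetime, timedelta, timezone


def needs_refresh(credentials_json, now, margin=timedelta(minutes=5)):
    """Return True if the temporary credentials expire within `margin`."""
    credentials = json.loads(credentials_json)
    expiration = datetime.strptime(
        credentials["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return now >= expiration - margin


# Shape of the metadata response, with made-up values.
sample = json.dumps({
    "Code": "Success",
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-token",
    "Expiration": "2024-01-01T12:00:00Z",
})
```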
1012 |
1013 |
1014 | ### S3 IAM and bucket policy concepts
1015 |
1016 |
1017 | #### Defaults
1018 | * Bucket is *owned* by the AWS account that created it
1019 | * Bucket ownership is not transferable
1020 | * Bucket owner gets full permission (ACL)
1021 | * The person paying the bills always has full control.
1022 | * A person uploading an object into a bucket owns it by default.
1023 |
1024 |
1025 | #### Bucket policies (resource level)
1026 | * Specify what actions are allowed or denied for which principals on the bucket that the policy
1027 | is attached to
1028 | * Attached *only* to S3 buckets. Can, however, affect objects in the bucket.
1029 | * Contains *principal* element (unnecessary for IAM policies)
1030 | * Use if you’re more interested in *“Who can access this S3 bucket?”*
1031 | * Easiest way to grant *cross-account permissions* for all `s3:*` permissions. (Cannot do this
1032 | with ACLs.)
1033 | * Explicit *deny* in bucket policy overrides explicit *allow* in IAM policy
1034 | * Defined as JSON
1035 |
1036 | ```
1037 | {
1038 | "Version":"2012-10-17",
1039 | "Statement":
1040 | [
1041 | {
1042 | "Sid":"PutObjectAcl",
1043 | "Effect":"Allow",
1044 | "Principal":
1045 | {
1046 | "AWS":
1047 | [
1048 | "arn:aws:iam::111122223333:user/tom", "arn:aws:iam::444455556666:user/chris"
1049 | ]
1050 | },
1051 | "Action":
1052 | [
1053 | "s3:PutObject",
1054 | "s3:PutObjectAcl"
1055 | ],
1056 | "Resource":
1057 | [
1058 | "arn:aws:s3:::examplebucket/*"
1059 | ]
1060 | }
1061 | ]
1062 | }
1063 | ```
1064 |
1065 |
1066 | #### ACLs
1067 | * Defined as XML. Legacy, not recommended any more.
1068 | * Can
1069 | * be attached to individual objects (bucket policies only bucket level)
1070 | * control access to object uploaded into a bucket from a *different* account.
1071 | * Cannot
1072 | * have conditions
1073 | * explicitly deny actions
1074 | * grant permission to bucket sub-resources (e.g. lifecycle or static website configurations)
1075 | * Other than *object ACL*s there are *bucket ACL*s as well - only for writing access log objects to a
1076 | bucket.
1077 | ```
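1078 | <?xml version="1.0" encoding="UTF-8"?>
1079 | <AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
1080 |   <Owner>
1081 |     <ID>*** Owner-Canonical-User-ID ***</ID>
1082 |     <DisplayName>owner-display-name</DisplayName>
1083 |   </Owner>
1084 |   <AccessControlList>
1085 |     <Grant>
1086 |       <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1087 |                xsi:type="CanonicalUser">
1088 |         <ID>*** Owner-Canonical-User-ID ***</ID>
1089 |         <DisplayName>display-name</DisplayName>
1090 |       </Grantee>
1091 |       <Permission>FULL_CONTROL</Permission>
1092 |     </Grant>
1093 |   </AccessControlList>
1094 | </AccessControlPolicy>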
1095 | ```
1096 |
1097 |
1098 | #### IAM policies (user level)
1099 | * IAM policies (in general) specify what actions are allowed or denied on what AWS resources
1100 | * Attached to IAM users, groups, or roles (so they cannot grant access to anonymous users)
1101 | * Use if you’re more interested in *“What can this user do in AWS?”*
1102 |
1103 | .|.
1104 | -|-
1105 | `arn:partition:service:region:namespace:relative-id`|`arn:aws:s3:::mybucket`
1106 | `arn:aws:s3:::*`|All buckets and objects in account
1107 | `arn:aws:s3:::mybucket`|`mybucket`
1108 | `arn:aws:s3:::mybucket/*`|All objects in `mybucket`
1109 | `arn:aws:s3:::mybucket/mykey`|`mykey` in `mybucket`
1110 | `arn:aws:s3:::mybucket/developers/${aws:username}/`|folder matching the accessing user's name
1111 |
1112 |
1113 | #### CloudFront
1114 | * Can use CloudFront Origin Access Identity to restrict access to S3 objects
1115 |
1116 |
1117 | ## [↖](#top)[↑](#8_1_2_5)[↓](#8_2_1) Ensure Data Integrity and Access Controls when Using the AWS Platform
1118 |
1119 |
1120 | ### MFA
1121 | * *Should* be turned on for all console access
1122 | * *Can* be enabled for API access as well
1123 | * The administrator configures an AWS MFA device for each user who needs to make API requests that
1124 | require MFA authentication.
1125 | * The administrator creates policies for the users that include a *Condition* element that checks
1126 | whether the user authenticated with an AWS MFA device.
1127 | * The user calls one of the AWS STS API operations that support MFA parameters (`AssumeRole` or
1128 | `GetSessionToken`, depending on the scenario for MFA protection). As part of the
1129 | call, the user includes the device identifier for the device that's associated with the user. The
1130 | user also includes the time-based one-time password (TOTP) that the device generates. In either case,
1131 | the user gets back temporary security credentials that the user can then use to make additional
1132 | requests to AWS.
1133 | * This is not supported by all services (supported by *SQS*, *SNS*, *S3*)
1134 |
1135 | * *MFA delete* can be enabled by the bucket owner (root account) to require MFA before an object is permanently deleted
1136 |
1137 | ```
1138 | {
1139 | "Version": "2012-10-17",
1140 | "Statement": [{
1141 | "Effect": "Allow",
1142 | "Principal": {"AWS": ["ALICE", "BOB"]},
1143 | "Action": [ "s3:PutObject", "s3:DeleteObject" ],
1144 | "Resource": ["arn:aws:s3:::Alice-Bucket/*"],
1145 | "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
1146 | }]
1147 | }
1148 | ```
1149 |
1150 |
1151 | ### Secure Token Service (STS)
1152 | * Allows granting **temporary access** to authenticated users
1153 | * IAM users
1154 | * Web-based identity providers (google, facebook, ...)
1155 | * Organization's existing identity system
1156 | * Returns **temporary credentials** that expire after some time:
1157 | * Access key (access key ID and secret access key)
1158 | * Session token
1159 |
1160 |
1161 | #### Terms
1162 | * **Federation**
1163 | * Trust relationship between identity provider and AWS
1164 | * **Identity broker**
1165 | * Broker in charge of mapping user to the right set of credentials
1166 | * **Identity store**
1167 | * E.g. Google or Facebook
1168 | * **Identities**
1169 | * Users
1170 |
1171 |
1172 | #### Scenarios
1173 | * Temporary credentials with EC2
1174 | * Assign IAM role to instance
1175 | * Get temp credentials from *instance metadata*
1176 | * Temporary credentials with SDK
1177 | * Call `assumeRole`, extract temp credentials
1178 | * Options for temporary credentials with API calls
1179 | * *Sign request* with temp credentials
1180 | * Add AK/SK (access key/secret key) to request (*header* or *query string*)
1181 |
1182 |
1183 | ## [↖](#top)[↑](#8_2_2_2)[↓](#8_4) Shared responsibility model
1184 | * **Shared responsibility** environment
1185 | * AWS is responsible for:
1186 | * Server/Host level and below
1187 | * Physical environment security
1188 | * Hardware decommissioning
1189 | * Traffic security (Networks, ACLs, SSL, DDOS-protection)
1190 | * EC2 hypervisor isolation
1191 | * User is responsible for:
1192 | * IAM
1193 | * MFA
1194 | * Password/key-rotation
1195 | * Access advisor (shows used permissions)
1196 | * Trusted advisor (validates best practices)
1197 | * Security groups
1198 | * ACL (resource based policy)
1199 | * VPC
1200 |
1201 |
1202 | ## [↖](#top)[↑](#8_3)[↓](#9) AWS and IT Audits
1203 | * AWS performs self audits of changes to key services to monitor quality, maintain high standards, and
1204 | facilitate continuous improvement of the change management process
1205 | * For audits, AWS provides:
1206 | * *Security of the cloud*
1207 | * Information regarding their global infrastructure
1208 | * From the host operating system and virtualization layer down to the physical security of facilities
1209 | * Annual certifications and reports: (like the Service Organization Control (SOC) reports, ISO 27001
1210 | cert, PCI assessments)
1211 | * For audits, the customer provides:
1212 | * *Security in the cloud*
1213 | * Anything their organization puts on (or connects to) their AWS assets
1214 | * Examples: guest operating systems, apps on virtual machine instances, objects in S3,
1215 | databases such as RDS
1216 |
1217 | ---
1218 |
1219 |
1220 | # [↖](#top)[↑](#8_4)[↓](#9_1) Networking
1221 |
1222 |
1223 | ## [↖](#top)[↑](#9)[↓](#9_1_1) Route53 Routing Policies
1224 | * *Simple*
1225 | * *Weighted*
1226 | * *Latency*
1227 | * *Failover*
1228 | * *Geolocation*
1229 |
1230 |
1231 | ### DNS Failover
1232 | * Can set up *health checks* for endpoints or domains from within *Route53*
1233 | * Route 53 has health checkers in locations around the world. When you create a health check that
1234 | monitors an endpoint, health checkers start to send requests to the endpoint that you specify
1235 | to determine whether the endpoint is healthy.
1236 | * `evaluate target health`
1237 | * DNS entries are then being associated with health checks and can be configured to failover as
1238 | well (1 primary and n secondary recordsets)
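
The failover decision described above boils down to "answer with the primary while it is healthy, otherwise with a healthy secondary". A rough sketch with made-up addresses; what Route 53 answers when nothing is healthy is deliberately left out:

```python
def pick_record(primary, secondaries, healthy):
    """Return the primary if healthy, else the first healthy secondary, else None.

    `healthy` maps a record value to its latest health check result;
    records without a result are treated as unhealthy in this sketch.
    """
    for record in [primary, *secondaries]:
        if healthy.get(record, False):
            return record
    return None
```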
1239 |
1240 |
1241 | ### Weighted
1242 | * Control distribution of traffic with DNS entries
1243 | * This can be based on a certain percentage
1244 | * Set *routing policy* to weighted (instead of failover)
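
The percentage each record receives follows from its weight divided by the sum of all weights for that name. A short sketch (record names are illustrative; the all-zero case is out of scope here):

```python
def traffic_share(weights):
    """Map each record to its expected fraction of DNS responses."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}
```

Weights of 3 and 1 therefore send roughly 75% and 25% of the responses to the two records.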
1245 |
1246 |
1247 | ### Latency-based
1248 | * Control distribution of traffic based on latency.
1249 |
1250 |
1251 | ## [↖](#top)[↑](#9_1_3)[↓](#9_2_1) VPC Essentials
1252 | * Provisions a logically isolated section of the AWS cloud
1253 | * Spans over all AZs in a region
1254 | * Allows creating layered architectures
1255 | * Shared or dedicated tenancy (exclusive hardware or not)
1256 | * *Security groups* and subnet *network ACLs*
1257 | * Ability to extend on-premises network to cloud
1258 |
1259 |
1260 | ### Default VPC (Amazon specific)
1261 | * Gives easy access to a VPC without having to configure it from scratch
1262 | * Has a default subnet in each AZ and an internet gateway attached to the VPC
1263 | * Each instance launched automatically receives a *public IP* (very different to non-default VPC)
1264 | * Cannot be restored if deleted, but a new default VPC can be created
1265 |
1266 |
1267 | ### Non-default VPC (regular VPC)
1268 | * Only has private IP addresses
1269 | * Resources *only* accessible through *Elastic IP*, *VPN* or *internet gateways*
1270 | * Does not have a gateway attached
1271 |
1272 |
1273 | ### VPC Peering
1274 | * Connect VPCs through direct network routing
1275 | * Can occur between different accounts and VPCs, in the same region or across regions (inter-region peering)
1276 | * Allows instances to communicate with each other as if they were in the same network
1277 | * CIDRs must not overlap
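
Whether two VPCs are candidates for peering can be checked up front with the standard library. A small sketch (the function name is illustrative):

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Return True if the two VPC CIDR blocks do not overlap."""
    a = ipaddress.ip_network(cidr_a)
    b = ipaddress.ip_network(cidr_b)
    return not a.overlaps(b)
```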
1278 |
1279 |
1280 | ### VPC Scenarios
1281 | * VPC with private subnet only -> single tier apps
1282 | * VPC with public and private subnets -> layered apps
1283 | * VPC with public, private subnets and hardware connected VPN -> extending apps to on-premises
1284 | * VPC with private subnets and hardware connected VPN -> extended VPN
1285 |
1286 |
1287 | ### Components
1288 | * **Subnet**
1289 | * In exactly one AZ
1290 | * If a subnet doesn't have a route to the Internet gateway, it's known as a *private* subnet
1291 | * Instances receive
1292 | * *Private IP* address
1293 | * Internal DNS hostname
1294 | * If traffic is routed to an Internet gateway, the subnet is known as a *public* subnet
1295 | * Instances receive
1296 | * *Public IP* address
1297 | * External DNS hostname
1298 | * EC2 instances are launched into subnets
1299 | * Use ssh-agent forwarding to connect from public to private instances
1300 | * Sometimes grouped into Subnet Groups, e.g. for caching or DB. Typically across AZs
1301 | * **Route Table**
1302 | * Contains a set of rules, called routes that determine where network traffic is directed to
1303 | * Each VPC automatically comes with a main route table that can be configured
1304 | * Each subnet in a VPC must be associated with a route table; the table controls the routing
1305 | for the subnet. A subnet can only be associated with one route table at a time, but multiple
1306 | subnets can be associated with the same route table
1307 | * Each route in a table specifies a destination CIDR and a target
1308 | * Every route table contains a local route for communication within the VPC
1309 | * Can have a *default route* 0.0.0.0/0 to route everything that doesn't have a specific rule
1310 | * **Elastic IP**
1311 | * Static IPv4 address mapped to an *instance* or *network interface*
1312 | * If attached to network interface it's decoupled from the instance's lifecycle
1313 | * Routes to *private IP* address of instance
1314 | * Can be remapped in case of failure.
1315 | * For use in a specific region only
1316 | * Can only map to instances in public subnets
1317 | * **Gateways**
1318 | * *Internet Gateway*
1319 | * Horizontally scaled, redundant, and highly available VPC component that allows communication
1320 | between instances in a VPC and the internet
1321 | * Provides a target in VPC route tables for internet-routable traffic
1322 | * Performs network address translation (NAT) for instances that have been assigned public
1323 | IPv4 addresses
1324 | * *Virtual Private Gateway*
1325 | * Has VPN connection to customer gateway attached
1326 | * Serves as VPN concentrator on the Amazon side of the VPN connection
1327 | * *Customer Gateway*
1328 | * A physical device or software application on your side of the VPN connection
1329 | * **NAT**
1330 | * *NAT Instances*
1331 | * Manually configured instance from a NAT AMI
1332 | * *NAT Gateway*
1333 | * AWS-managed service
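
Returning to the route table notes above: each route pairs a destination CIDR with a target, and the most specific matching route is used. The following is a minimal sketch of that lookup, not how AWS implements it. The route entries and the target name `igw-example` are hypothetical.

```python
import ipaddress

# Hypothetical route table: (destination CIDR, target).
ROUTES = [
    ("10.0.0.0/16", "local"),       # local route for traffic inside the VPC
    ("0.0.0.0/0", "igw-example"),   # default route, made-up gateway name
]


def pick_target(routes, dest_ip):
    """Return the target of the most specific route matching dest_ip, or None."""
    addr = ipaddress.ip_address(dest_ip)
    matches = []
    for cidr, target in routes:
        network = ipaddress.ip_network(cidr)
        if addr in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The longest prefix (most specific route) wins.
    return max(matches, key=lambda match: match[0])[1]


print(pick_target(ROUTES, "10.0.1.25"))      # local
print(pick_target(ROUTES, "93.184.216.34"))  # igw-example
```

Traffic to an address inside the VPC matches both routes, but the `/16` local route is more specific than the `/0` default route, so it stays local.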
1334 |
1335 |
1336 | ### Security
1337 |
1338 | #### Network ACL
1339 | * Subnet level, acting as firewall
1340 | * Rules for inbound and outbound traffic
1341 | * Rules are numbered and evaluated from low to high; the first matching rule wins and the remaining rules are *not* evaluated
1342 | * *Stateless* - return traffic must be explicitly allowed by a rule
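
The rule ordering above can be sketched in a few lines of Python. This is an illustration only: the rule tuples, CIDRs, and the `evaluate` helper are hypothetical, and real NACL rules also carry a protocol and a port range.

```python
import ipaddress

# Hypothetical inbound rules: (rule number, source CIDR, port, action).
INBOUND_RULES = [
    (200, "0.0.0.0/0", 22, "allow"),
    (100, "203.0.113.0/24", 22, "deny"),
]


def evaluate(rules, source_ip, port):
    """Return the action of the lowest-numbered rule that matches."""
    addr = ipaddress.ip_address(source_ip)
    for _number, cidr, rule_port, action in sorted(rules, key=lambda r: r[0]):
        if rule_port == port and addr in ipaddress.ip_network(cidr):
            return action  # first match wins, later rules are skipped
    return "deny"  # nothing matched: traffic is denied


print(evaluate(INBOUND_RULES, "203.0.113.7", 22))   # deny
print(evaluate(INBOUND_RULES, "198.51.100.7", 22))  # allow
```

Rule 100 is checked before rule 200, so hosts in `203.0.113.0/24` are denied even though rule 200 would allow them.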
1343 |
1344 |
1345 | #### Security Groups
1346 | * Acts as a virtual firewall to control inbound and outbound traffic to instances
1347 | * Acts on instance level, not subnet level
1348 | * Rules for inbound and outbound traffic
1349 | * *Stateful* - responses to allowed traffic are always allowed, regardless of the rules for the opposite direction
1350 | * Can reference other security groups, e.g. to allow traffic from their members
1351 |
1352 |
1353 | #### Structure & packet flow
1354 | * VPC (has *CIDR*)
1355 | * Gateway (Internet or VPN)
1356 | * Routes (one per subnet, can be shared)
1357 | * Network ACL (one per subnet, can be shared)
1358 | * Subnets (CIDRs match VPC's CIDR)
1359 | * Security Group (on VPC level)
1360 | * Instance (needs public IP for internet communication, either ELB or Elastic IP)
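
The requirement that subnet CIDRs fall inside the VPC's CIDR can be checked with the standard library. A small sketch, with a hypothetical helper name and example ranges:

```python
import ipaddress


def subnet_fits_vpc(vpc_cidr, subnet_cidr):
    """Return True if subnet_cidr lies entirely within vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(vpc)


print(subnet_fits_vpc("10.0.0.0/16", "10.0.1.0/24"))  # True
print(subnet_fits_vpc("10.0.0.0/16", "10.1.0.0/24"))  # False
```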
1361 |
1362 | * Flow from internet
1363 | * Internet Gateway
1364 | * VPC Router (routes into desired subnet)
1365 | * Route Table (of that subnet)
1366 | * NACL
1367 | * Security Group
1368 | * Instance
1369 |
1370 |
1371 | #### Connection To On-prem Network/Direct Connect
1372 | * VPC
1373 | * (has attached) Virtual Private Gateway
1374 | * (has attached) VPN Connection
1375 | * (has attached) Customer Gateway
1376 |
1377 | TODO: VPN vs direct connect. Can I use VPN instead of DC?
1378 |
1379 |
1380 | ## [↖](#top)[↑](#9_2_6_4)[↓](#10) Limits:
1381 | .|.
1382 | -|-
1383 | VPCs per region|5
1384 | Subnets per VPC|200
1385 | Customer gateways per region|50
1386 | Virtual private gateways per region|5
1387 | Virtual private gateways per VPC|1
1388 | Internet gateways per region|5
1389 | Elastic IPs per account per region|5
1390 | VPN connections per region|50
1391 | Route tables per region|200
1392 | Security groups per region|500
1393 |
1394 |
1395 | # [↖](#top)[↑](#9_3)[↓](#10_1) Etc
1396 |
1397 |
1398 | ## [↖](#top)[↑](#10)[↓](#10_2) Accessing the OS
1399 | * Services that allow access to the underlying OS
1400 | * EC2
1401 | * ECS
1402 | * EB (Elastic Beanstalk)
1403 | * EMR (Elastic MapReduce)
1404 | * OpsWorks
1405 | * Services that hide the OS away (managed services)
1406 | * DynamoDB
1407 | * RDS
1408 |
1409 |
1410 | ## [↖](#top)[↑](#10_1)[↓](#10_3) SQS
1411 | * Default message retention period: 4 days (max 14 days)
1412 | * `DelaySeconds` will delay a message appearing in the queue
1413 | * Setting `WaitTimeSeconds` will enable *long polling* (can be more cost efficient)
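
A sketch of how the two parameters might be passed with boto3. The helper names and the queue URL argument are hypothetical. `demo` needs boto3 and AWS credentials, so the parameter builders are kept separate and can be checked offline.

```python
def delayed_send_params(queue_url, body, delay_seconds):
    """Build send_message arguments that hide the message for delay_seconds."""
    if not 0 <= delay_seconds <= 900:  # SQS allows delays of up to 15 minutes
        raise ValueError("DelaySeconds must be between 0 and 900")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "DelaySeconds": delay_seconds,
    }


def long_poll_params(queue_url, wait_seconds=20):
    """Build receive_message arguments that enable long polling."""
    if not 1 <= wait_seconds <= 20:  # 0 means short polling, 20 is the maximum
        raise ValueError("WaitTimeSeconds must be between 1 and 20 for long polling")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,
        "MaxNumberOfMessages": 10,
    }


def demo(queue_url):
    """Send one delayed message, then long-poll for messages."""
    import boto3  # third-party; imported here so the helpers work without it

    sqs = boto3.client("sqs")
    sqs.send_message(**delayed_send_params(queue_url, "hello", 30))
    response = sqs.receive_message(**long_poll_params(queue_url))
    return response.get("Messages", [])
```

Long polling waits for a message to arrive instead of returning an empty response immediately, which reduces the number of billed empty receives.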
1414 |
1415 |
1416 | ## [↖](#top)[↑](#10_2)[↓](#) DynamoDB
1417 | * Prefix the partition key with a hash to spread I/O evenly across many partitions
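
One way to build such a key, as a minimal sketch: the `#` separator, the prefix length, and the function name are assumptions for illustration, not a DynamoDB requirement. Because the prefix is derived from the key itself, a reader can recompute it when looking an item up.

```python
import hashlib


def prefixed_partition_key(natural_key, prefix_len=4):
    """Prepend a short, deterministic hash prefix to natural_key."""
    digest = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}#{natural_key}"


# Sequential keys such as dates receive unrelated prefixes.
for day in ("2017-06-01", "2017-06-02", "2017-06-03"):
    print(prefixed_partition_key(day))
```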
1418 |
--------------------------------------------------------------------------------