├── Domain 1- Cloud Computing Concepts and Architectures.md ├── Domain 10- Application Security.md ├── Domain 11- Data Security and Encryption.md ├── Domain 12- Identity, Access, and Entitlement Management.md ├── Domain 13- Security as a Service.md ├── Domain 14- Related Technologies.md ├── Domain 2- Governance and Enterprise Risk Management.md ├── Domain 3- Legal Issues, Contracts and Electronic Discovery.md ├── Domain 4- Compliance and Audit Management.md ├── Domain 5- Information Governance.md ├── Domain 6- Management Plane and Business Continuity.md ├── Domain 7- Infrastructure Security.md ├── Domain 8- Virtualization and Containers.md ├── Domain 9- Incident Response.md ├── Images ├── 1.1.2-1.png ├── 1.1.2.3-1.png ├── 1.1.3-1.png ├── 1.1.3.1-1.png ├── 1.1.3.2-1.png ├── 1.1.4-1.png ├── 1.2.1-1.png ├── 1.2.2-1.png └── 1.2.2.1-1.png └── README.md /Domain 11- Data Security and Encryption.md: -------------------------------------------------------------------------------- 1 | # Domain 11: Data Security and Encryption 2 | 3 | ## Introduction 4 | 5 | Data security is a key enforcement tool for information and data governance. As with all areas of cloud security its use should be risk-based since it is not appropriate to secure everything equally. 6 | 7 | This is true for data security overall, regardless of whether or not the cloud is involved. However, many organizations aren't as accustomed to trusting large amounts of their sensitive data—if not all of it—to a third party, or mixing all their internal data into a shared resource pool. As such, the instinct may be to set a blanket security policy for "anything in the cloud" instead of sticking with a risk-based approach, which will be far more secure and cost effective. 8 | 9 | For example, encrypting everything in SaaS because you don't trust that provider at all likely means that you shouldn't be using the provider in the first place. But encrypting everything is not a cure-all and may lead to a false sense of security, e.g. encrypting data traffic without ensuring the security of the devices themselves. 10 | 11 | By some perspectives information security *is* data security, but for our purposes this domain will focus on those controls related to securing the data itself, of which encryption is one of the most important. 12 | 13 | ## Overview 14 | 15 | ### Data Security Controls 16 | 17 | Data security controls tend to fall into three buckets. We cover all of these in this section: 18 | 19 | 1. Controlling what data goes into the cloud (and where). 20 | 2. Protecting and managing the data in the cloud. The key controls and processes are: 21 | * Access controls 22 | * Encryption 23 | * Architecture 24 | * Monitoring/alerting (of usage, configuration, lifecycle state, etc.) 25 | * Additional controls, including those related to the specific product/service/platform of your cloud provider, data loss prevention, and enterprise rights management. 26 | 3. Enforcing information lifecycle management security 27 | * Managing data location/residency. 28 | * Ensuring compliance, including audit artifacts (logs, configurations). 29 | * Backups and business continuity, which are covered in Domain 6. 30 | 31 | ### Cloud data storage types 32 | 33 | Since cloud storage is virtualized it tends to support different data storage types than used in traditional storage technologies. 
Below the virtualization layer these might use well-known data storage mechanisms, but the cloud storage virtualization technologies that cloud consumers access will be different. These are the most common: 34 | 35 | *Object storage:* Object storage is similar to a file system. "Objects" are typically files, which are then stored using a cloud platform-specific mechanism. Most access is through APIs, not standard file sharing protocols, although cloud providers may also offer front-end interfaces to support those protocols. 36 | 37 | *Volume storage:* This is essentially a virtual hard drive for instances/virtual machines. 38 | 39 | *Database:* Cloud platforms and providers may support a variety of different kinds of databases, including existing commercial and open source options as well as their own proprietary systems. Proprietary databases typically use their own APIs. Commercial or open source databases are hosted by the provider and typically use existing standards for connections. These can be relational or non-relational—the latter includes NoSQL and other key/value storage systems, or file system-based databases (e.g. HDFS). 40 | 41 | *Application/platform:* Examples of these would be a content delivery network (CDN), files stored in SaaS, caching, and other novel options. 42 | 43 | Most cloud platforms also use redundant, durable storage mechanisms that often utilize *data dispersion* (sometimes also known as *data fragmentation* or *bit splitting*). This process takes chunks of data, breaks them up, and then stores multiple copies on different physical storage to provide high durability. Data stored in this way is thus physically dispersed. A single file, for example, would not be located on a single hard drive. 44 | 45 | ### Managing data migrations to the cloud 46 | 47 | Before securing the data in the cloud, most organizations want some means of managing what data is stored in private and public cloud providers. This is often essential for compliance as much as or more than for security. 48 | 49 | To start, define your policies for which data types are allowed and where they are allowed, then tie these to your baseline security requirements. For example, "PII is allowed on x services assuming it meets y encryption and access control requirements." 50 | 51 | Then identify your key data repositories. Monitor them for large migrations/activity using tools like Database Activity Monitoring and File Activity Monitoring. This is essentially building an "early warning system" for large data transfers, but it’s also an important data security control to detect all sorts of major breaches and misuse scenarios. 52 | 53 | To detect actual migrations, monitor cloud usage and any data transfers. You can do this with the help of the following tools: 54 | 55 | *CASB:* Cloud Access and Security Brokers (also known as Cloud Security Gateways) discover internal use of cloud services using various mechanisms such as network monitoring, integrating with an existing network gateway or monitoring tool, or even by monitoring DNS queries. After discovering which services your users are connecting to, most of these products then offer monitoring of activity on approved services through API connections (when available) or inline interception (man-in-the-middle monitoring). Many support DLP and other security alerting and even offer controls to better manage use of sensitive data in cloud services (SaaS, PaaS, and IaaS).
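To make that discovery idea concrete, here is a minimal sketch, with a made-up log format and a tiny `KNOWN_SERVICES` catalog standing in for the thousands of rated services a real CASB ships with, of how proxy or DNS logs might be matched against a service list to flag unsanctioned usage and suspiciously large transfers:

```python
import re
from collections import defaultdict

# Hypothetical catalog of cloud service domains; a commercial CASB ships
# thousands of entries with risk ratings and ownership metadata.
KNOWN_SERVICES = {
    "examplestorage.com": {"name": "Example Object Storage", "sanctioned": True},
    "filesharepro.example": {"name": "FileShare Pro", "sanctioned": False},
}

# Invented log format: "<user> <destination domain> <bytes sent>"
LOG_LINE = re.compile(r"^(?P<user>\S+)\s+(?P<domain>\S+)\s+(?P<bytes_out>\d+)$")

def analyze(log_lines, upload_threshold=50 * 1024 * 1024):
    """Flag unsanctioned services and unusually large uploads per user/service."""
    uploads = defaultdict(int)
    findings = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        domain, user = m["domain"], m["user"]
        svc = next((v for d, v in KNOWN_SERVICES.items() if domain.endswith(d)), None)
        if svc is None:
            continue  # unknown destination; a real tool would risk-rate it
        if not svc["sanctioned"]:
            findings.append(f"{user} used unsanctioned service {svc['name']} ({domain})")
        uploads[(user, svc["name"])] += int(m["bytes_out"])
    for (user, name), total in uploads.items():
        if total > upload_threshold:
            findings.append(f"Possible bulk migration: {user} sent {total} bytes to {name}")
    return findings

if __name__ == "__main__":
    sample = ["alice storage.examplestorage.com 734003200",
              "bob www.filesharepro.example 1048576"]
    print("\n".join(analyze(sample)))
```

A commercial CASB layers risk ratings, API-based inspection of sanctioned services, and policy enforcement on top of this kind of matching.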
56 | 57 | *URL filtering:* While not as robust as a CASB, a URL filter/web gateway may help you understand which cloud services your users are using (or trying to use). 58 | 59 | *DLP:* If you monitor web traffic (and look inside SSL connections), a DLP tool may also help detect data migrations to cloud services. However, some cloud SDKs and APIs may encrypt portions of data and traffic that DLP tools can't unravel, and thus they won't be able to understand the payload. 60 | 61 | ********insert 11.1********* 62 | > Managing data migrations to the cloud. 63 | 64 | #### Securing cloud data transfers 65 | 66 | Ensure that you are protecting your data as it moves to the cloud. This necessitates understanding your provider’s data migration mechanisms, as leveraging provider mechanisms is often more secure and cost effective than "manual" data transfer methods like SFTP. For example, sending data to a provider's object storage over an API is likely much more reliable and secure than setting up your own SFTP server on a virtual machine in the same provider. 67 | 68 | There are a few options for in-transit encryption, depending on what the cloud platform supports. One way is to encrypt before sending to the cloud (client-side encryption). Network encryption (TLS/SFTP/etc.) is another option. Most cloud provider APIs use TLS by default; if not, pick a different provider, since this is an essential security capability. Proxy-based encryption may be a third option, where you place an encryption proxy in a trusted area between the cloud consumer and the cloud provider and the proxy manages the encryption before transferring the data to the provider. 69 | 70 | In some instances you may have to accept public or untrusted data. If you allow partners or the public to send you data, ensure you have security mechanisms in place to sanitize it before processing or mixing it with your existing data. Always isolate and scan this data before integrating it. 71 | 72 | ### Securing data in the cloud 73 | 74 | Access controls and encryption are the core data security controls across the various technologies. 75 | 76 | #### Cloud data access controls 77 | 78 | *Access controls* should be implemented with a minimum of three layers: 79 | 80 | * *Management plane:* These are your controls for managing access of users that directly access the cloud platform's management plane. For example, logging into the web console of an IaaS service will allow that user to access data in object storage. Fortunately, most cloud platforms and providers start with default-deny access control policies. 81 | 82 | * *Public and internal sharing controls:* If data is shared externally, to the public or partners that don't have direct access to the cloud platform, there will be a second layer of controls for this access. 83 | 84 | * *Application level controls:* As you build your own applications on the cloud platform you will design and implement your own controls to manage access. 85 | 86 | Options for access controls will vary based on cloud service model and provider-specific features. Create an entitlement matrix based on platform-specific capabilities. An entitlement matrix documents which users, groups, and roles should access which resources and functions.
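As a rough sketch of what that documentation can translate into, the snippet below encodes a small matrix as data and enforces it with a default-deny check; the role and action names are invented for illustration, and real platforms express the same idea in their own policy languages. The Project X table that follows shows the human-readable matrix itself.

```python
# Hypothetical entitlement matrix: role -> set of allowed actions.
# Cloud platforms encode the same idea in their own policy languages.
ENTITLEMENTS = {
    "storage-admin":  {"object:describe", "object:read", "object:write"},
    "dev":            {"object:describe", "object:read", "volume:describe"},
    "security-audit": {"object:describe", "volume:describe", "logs:read"},
}

def is_allowed(role: str, action: str) -> bool:
    """Default deny: anything not explicitly granted is refused."""
    return action in ENTITLEMENTS.get(role, set())

assert is_allowed("dev", "object:read")
assert not is_allowed("dev", "logs:read")            # not in the matrix -> denied
assert not is_allowed("contractor", "object:read")   # unknown role -> denied
```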
87 | 88 | Project X 89 | 90 | | Entitlement | super-admin | service-admin | storage-admin | dev | security-audit | security-admin | 91 | | ----- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | 92 | | Volume Describe All | X | X | | X | X | X | 93 | | Object Describe All | X | | X | X | X | X | 94 | | Volume Modify | X | X | | X | | X | 95 | | Read Logs | X | | | | X | X | 96 | | ... | X | X | X | X | X | X | 97 | 98 | 99 | Frequently (ideally continuously) validate that your controls meet your requirements, paying particular attention to any public shares. Consider setting up alerts for all new public shares or for changes in permissions that allow public access. 100 | 101 | *Fine-Grained Access Controls and Entitlement Mappings* 102 | 103 | The depth of potential entitlements will vary greatly from technology to technology. Some databases may support row-level security, while others offer little more than broad access controls. Some will allow you to tie entitlements to identity and enforcement mechanisms built into the cloud platform, while others rely completely on the storage platform itself merely running in virtual machines. 104 | 105 | It's important to understand your options, map them out, and build your matrix. This applies to more than just file access, of course; it also applies to databases and all your cloud data stores. 106 | 107 | #### Storage (At-Rest) Encryption and Tokenization 108 | 109 | Encryption options vary tremendously based on service model, provider, and application/deployment specifics. Key management is just as essential as encryption, and is thus covered in a subsequent section. 110 | 111 | Encryption and tokenization are two separate technologies. Encryption protects data by applying a mathematical algorithm that "scrambles" the data, which then can only be recovered by running it through an unscrambling (decryption) process with a corresponding key. The result is a blob of ciphertext. Tokenization, on the other hand, takes the data and replaces it with a random value. It then stores the original and the randomized version in a secure database for later recovery. 112 | 113 | Tokenization is often used when the *format* of the data is important (e.g. replacing credit card numbers in an existing system that requires the same format text string). Format Preserving Encryption encrypts data with a key while also keeping the same structural format, as tokenization does, but it may not be as cryptographically secure due to the compromises needed to preserve the format. 114 | 115 | There are three components of an encryption system: data, the encryption engine, and key management. The data is, of course, the information that you’re encrypting. The engine is what performs the mathematical process of encryption. Finally, the key manager handles the keys for the encryption. The overall design of the system focuses on where to put each of these components. 116 | 117 | When designing an encryption system, you should start with a threat model. For example, do you trust a cloud provider to manage your keys? How could the keys be exposed? Where should you locate the encryption engine to manage the threats you are concerned with? 118 | 119 | ##### IaaS Encryption 120 | 121 | IaaS volumes can be encrypted using different methods, depending on your data. 122 | 123 | *Volume storage encryption* 124 | 125 | * *Instance-managed encryption:* The encryption engine runs within the instance, and the key is stored in the volume but protected by a passphrase or keypair.
126 | * *Externally managed encryption:* The encryption engine runs in the instance, but the keys are managed externally and issued to the instance on request. 127 | 128 | *********insert 11.2********* 129 | > Externally managed volume encryption. 130 | 131 | *Object and file storage* 132 | 133 | * *Client-side encryption:* When object storage is used as the back-end for an application (including mobile applications), encrypt the data using an encryption engine embedded in the application or client. 134 | * *Server-side encryption:* Data is encrypted on the server (cloud) side after being transferred in. The cloud provider has access to the key and runs the encryption engine. 135 | * *Proxy encryption:* In this model you connect the volume to a special instance or appliance/software, and then connect your instance to the encryption instance. The proxy handles all crypto operations and may keep keys either onboard or externally. 136 | 137 | ##### PaaS Encryption 138 | 139 | PaaS encryption varies tremendously due to all the different PaaS platforms. 140 | 141 | * *Application layer encryption:* Data is encrypted in the PaaS application or the client accessing the platform. 142 | * *Database encryption:* Data is encrypted in the database using encryption that's built in and is supported by a database platform like Transparent Database Encryption (TDE) or at the field level. 143 | * *Other:* These are provider-managed layers in the application, such as the messaging queue. There are also IaaS options when that is used for underlying storage. 144 | 145 | ##### SaaS Encryption 146 | 147 | SaaS providers may use any of the options previously discussed. It is recommended to use per-customer keys when possible, in order to better enforce multitenancy isolation. The following options are for SaaS consumers: 148 | 149 | * *Provider-managed encryption:* Data is encrypted in the SaaS application and generally managed by the provider. 150 | * *Proxy encryption:* Data passes through an encryption proxy before being sent to the SaaS application. 151 | 152 | #### Key Management (including Customer-Managed Keys) 153 | 154 | The main considerations for key management are performance, accessibility, latency, and security. Can you get the right key to the right place at the right time while also meeting your security and compliance requirements? 155 | 156 | There are four potential options for handling key management: 157 | 158 | * *HSM/appliance:* Use a traditional hardware security module (HSM) or appliance-based key manager, which will typically need to be on-premises, and deliver the keys to the cloud over a dedicated connection. 159 | * *Virtual appliance/software:* Deploy a virtual appliance or software-based key manager in the cloud. 160 | * *Cloud provider service:* This is a key management service offered by the cloud provider. Before selecting this option, make sure you understand the security model and SLAs to understand if your key could be exposed. 161 | * *Hybrid:* You can also use a combination, such as using a HSM as the root of trust for keys but then delivering application-specific keys to a virtual appliance that's located in the cloud and only manages keys for its particular context. 162 | 163 | ##### Customer-Managed Keys 164 | 165 | A customer-managed key allows a cloud customer to manage their own encryption key while the provider manages the encryption engine. For example, using your own key to encrypt SaaS data within the SaaS platform. 
Many providers encrypt data by default, using keys completely in their control. Some may allow you to substitute your own key, which integrates with their encryption system. Make sure your vendor’s practices align with your requirements. 166 | 167 | ************insert 11.3********* 168 | > Customer managed keys. 169 | 170 | Some providers may require you to use a service within the provider to manage the key. Thus, although the key is customer-managed, it is still potentially available to the provider. This doesn't necessarily mean it is insecure: since the key management and data storage systems can be separated, it would require collusion on the part of multiple employees at the provider to potentially compromise data. However, keys and data could still be exposed by a government request, depending on local laws. You may be able to store the keys externally from the provider and only pass them over on a per-request basis. 171 | 172 | ### Data security architectures 173 | 174 | Application architecture impacts data security. The features your cloud provider offers can reduce the attack surface, but make sure to demand strong metastructure security. For example, gap networks by using cloud storage or a queue service that communicates on the provider's network, not within your virtual network. That forces attackers to either fundamentally compromise the cloud provider or limit themselves to application-level attacks, since network attack paths are closed. 175 | 176 | An example would be using object storage for data transfers and batch processing, rather than SFTPing to static instances. Another is message queue gapping—run application components on different virtual networks that are only bridged by passing data through the cloud provider's message queue service. This eliminates network attacks from one portion of the application to the other. 177 | 178 | ### Monitoring, auditing, and alerting 179 | 180 | These should tie into overall cloud monitoring. (See Domains 3, 6, and 7.) Identify (and alert about) any public access or entitlement changes on sensitive data. Use tagging to support alerting, when it’s available. 181 | 182 | You’ll need to monitor both API and storage access, since data may be exposed through either—in other words, accessing data in object storage via an API call or via a public sharing URL. Activity monitoring, including Database Activity Monitoring, may be an option. Make sure to store your logs in a secure location, like a dedicated logging account. 183 | 184 | ### Additional data security controls 185 | 186 | #### Cloud platform/provider-specific controls 187 | 188 | A cloud platform or provider may have data security controls that are not covered elsewhere in this domain. Although typically they will be some form of access control and encryption, this Guidance can't cover all possible options. 189 | 190 | #### Data Loss Prevention 191 | 192 | Data Loss Prevention (DLP) is typically a way to monitor and protect data that your employees access via monitoring local systems, web, email, and other traffic. It is not typically used within data centers, and thus is more applicable to SaaS than to PaaS or IaaS, where it is typically not deployed. 193 | 194 | * *CASB (Cloud Access and Security Brokers):* Some CASBs include basic DLP features for the sanctioned services they protect. For example, you could set a policy that a credit card number is never stored in a particular cloud service.
The effectiveness depends greatly on the particular tool, the cloud service, and how the CASB is integrated for monitoring. Some CASB tools can also route traffic to dedicated DLP platforms for more robust analysis than is typically available when the CASB offers DLP as a feature. 195 | * *Cloud provider feature:* The cloud provider themselves may offer DLP capabilities, such as a cloud file storage and collaboration platform that scans uploaded files for content and applies corresponding security policies. 196 | 197 | #### Enterprise Rights Management 198 | 199 | As with DLP, this is typically an employee security control that isn't always as applicable in cloud. Since all DRM/ERM is based on encryption, existing tools may break cloud capabilities, especially in SaaS. 200 | 201 | * *Full DRM:* This is traditional full digital rights management using an existing tool. For example, applying rights to a file before storing it in the cloud service. As mentioned, it may break cloud provider features, such as browser preview or collaboration, unless there is some sort of integration (which is rare at the time of this writing). 202 | * *Provider-based control:* The cloud platform may be able to enforce controls very similar to full DRM by using native capabilities. For example, user/device/view versus edit: a policy that only allows certain users to view a file in a web browser, while other users can download and/or edit the content. Some platforms can even tie these policies to specific devices, not just on a user level. 203 | 204 | #### Data masking and test data generation 205 | 206 | These are techniques to protect data used in development and test environments, or to limit real-time access to data in applications. 207 | 208 | * *Test data generation:* This is the creation of a database with non-sensitive test data based on a "real" database. It can use scrambling and other randomization techniques to create a data set that resembles the source in size and structure but lacks sensitive data. 209 | * *Dynamic masking:* Dynamic masking rewrites data on the fly, typically using a proxy mechanism, to mask all or part of data delivered to a user. It is usually used to protect some sensitive data in applications, for example masking out all but the last digits of a credit card number when presenting it to a user. 210 | 211 | ### Enforcing lifecycle management security 212 | 213 | * *Managing data location/residency:* At certain times, you’ll need to disable unneeded locations. Use encryption to enforce access at the container or object level. Then, even if the data moves to an unapproved location, the data is still protected unless the key moves with it. 214 | * *Ensuring compliance:* You don't merely need to implement controls to maintain compliance, you need to document and test those controls. These are "artifacts of compliance"; this includes any audit artifacts you will have. 215 | * *Backups and business continuity:* (see Domain 6) 216 | 217 | ## Recommendations 218 | 219 | * Understand the specific capabilities of the cloud platform you are using. 220 | * Don't dismiss cloud provider data security. In many cases it is more secure than building your own, and comes at a lower cost. 221 | * Create an entitlement matrix for determining access controls. Enforcement will vary based on cloud provider capabilities. 222 | * Consider CASB to monitor data flowing into SaaS. 
It may still be helpful for some PaaS and IaaS, but rely more on existing policies and data repository security for those types of large migrations. 223 | * Use the appropriate encryption option based on the threat model for your data, business, and technical requirements. 224 | * Consider use of provider-managed encryption and storage options. Where possible, use a customer-managed key. 225 | * Leverage architecture to improve data security. Don't rely completely on access controls and encryption. 226 | * Ensure both API and data-level monitoring are in place, and that logs meet compliance and lifecycle policy requirements 227 | * Standards exist to help establish good security and the proper use of encryption and key management techniques and processes. Specifically, NIST SP-800-57 and ANSI X9.69 and X9.73. -------------------------------------------------------------------------------- /Domain 12- Identity, Access, and Entitlement Management.md: -------------------------------------------------------------------------------- 1 | # Domain 12: Identity, Entitlement, and Access Management 2 | 3 | ## Introduction 4 | 5 | Identity, entitlement, and access management (IAM) are deeply impacted by cloud computing. In both public and private cloud two parties are required to manage IAM without compromising security. This domain focuses on what needs to change in identity management for cloud. While we review some fundamental concepts, the focus is on how cloud changes identity management, and what to do about it. 6 | 7 | Cloud computing introduces multiple changes to how we have traditionally managed IAM for internal systems. It isn't that these are necessarily new issues, but that they are bigger issues when dealing with the cloud. 8 | 9 | The key difference is the relationship between the cloud provider and the cloud consumer, even in private cloud. IAM can't be managed solely by one or the other and thus a trust relationship, designation of responsibilities, and the technical mechanics to enable them are required. More often than not this comes down to *federation*. This is exacerbated by the fact that most organizations have many (sometimes hundreds) of different cloud providers into which they need to extend their IAM. 10 | 11 | Cloud also tends to change faster, be more distributed (including across legal jurisdictional boundaries), add the complexity of the management plane, and rely more (often exclusively) on broad network communications for everything, which opens up core infrastructure administration to network attacks. Plus there are extensive differences between providers and between the different service and deployment models. 12 | 13 | > This domain focuses primarily on IAM between an organization and cloud providers or between cloud providers and services. It does not discuss all the aspects of managing IAM within a cloud application, such as the internal IAM for an enterprise application running on IaaS. Those issues are very similar to building similar applications and services in traditional infrastructure. 14 | 15 | ### How IAM is different in the cloud 16 | 17 | Identity and access management is always complicated. At the heart we are mapping some form of an entity (a person, system, piece of code, etc.) to a verifiable identity associated with various attributes (which can change based on current circumstances), and then making a decision on what they can or can't do based on entitlements. 
Even when you control the entire chain of that process, managing it across disparate systems and technologies in a secure and verifiable manner, especially at scale, is a challenge. 18 | 19 | In cloud computing, the fundamental problem is that multiple organizations are now managing the identity and access management to resources, which can greatly complicate the process. For example, imagine having to provision the same user on dozens—or hundreds—of different cloud services. *Federation* is the primary tool used to manage this problem, by building trust relationships between organizations and enforcing them through standards-based technologies. 20 | 21 | Federation and other IAM techniques and technologies have existed since before the first computers (just ask a bank or government), and over time many organizations have built patchworks and silos of IAM as their IT has evolved. Cloud computing is a bit of a forcing function since adopting cloud very quickly pushes organizations to confront their IAM practices and update them to deal with the differences of cloud. This brings both opportunities and challenges. 22 | 23 | At a high level, the migration to cloud is an opportunity to build new infrastructure and processes on modern architectures and standards. There have been tremendous advances in IAM over the years, yet many organizations have only been able to implement them in limited use cases due to budget and legacy infrastructure constraints. The adoption of cloud computing, be it a small project or an entire data center migration, means building new systems on new infrastructure that are generally architected using the latest IAM practices. 24 | 25 | These shifts also bring challenges. Moving to federation at scale with multiple internal and external parties can be complex and difficult to manage due to the sheer mathematics of all the variables involved. Determining and enforcing attributes and entitlements across disparate systems and technologies bring both process and technical issues. Even fundamental architectural decisions may be hampered by the wide variation in support among cloud providers and platforms. 26 | 27 | IAM spans essentially every domain in this document. This section starts with a quick review of some core terminology that not all readers may be familiar with, then delves into the cloud impacts firstly on identity, then on access and entitlement management. 28 | 29 | ## Overview 30 | 31 | IAM is a broad area of practice with its own lexicon that can be confusing for those who aren't domain specialists, especially since some terms have different meanings in different contexts (and are used in areas outside IAM). Even the term "IAM" is not universal and is often referred to as *Identity Management (IdM)*. 32 | 33 | Gartner defines IAM as ["the security discipline that enables the right individuals to access the right resources at the right times for the right reasons."](http://www.gartner.com/it-glossary/identity-and-access-management-iam/) Before we get into the details, here are the high level terms most relevant to our discussion of IAM in cloud computing: 34 | 35 | * *Entity:* the person or "thing" that will have an identity. It could be an individual, a system, a device, or application code. 36 | * *Identity:* the unique expression of an entity within a given namespace. 
An entity can have multiple digital identities, such as a single individual having a work identity (or even multiple identities, depending on the systems), a social media identity, and a personal identity. For example, if you are a single entry in a single directory server then that is your identity. 37 | * *Identifier:* the means by which an identity can be asserted. For digital identities this is often a cryptological token. In the real world it might be your passport. 38 | * *Attributes:* facets of an identity. Attributes can be relatively static (like an organizational unit) or highly dynamic (IP address, device being used, if the user authenticated with MFA, location, etc.). 39 | * *Persona:* the expression of an identity with attributes that indicates context. For example, a developer that logs into work and then connects to a cloud environment as a developer on a particular project. The identity is still the individual, and the persona is the individual in the context of that project. 40 | * *Role:* identities can have multiple roles which indicate context. "Role" is a confusing and abused term used in many different ways. For our purposes we will think of it as similar to a persona, or as a subset of a persona. For example, a given developer on a given project may have different roles, such as "super-admin" and "dev", which are then used to make access decisions. 41 | * *Authentication:* the process of confirming an identity. When you log in to a system you present a username (the identifier) and password (an attribute we refer to as an authentication factor). Also known as *Authn*. 42 | * *Multifactor Authentication (MFA):* use of multiple factors in authentication. Common options include one time passwords generated by a physical or virtual device/token (OTP), out-of-band validation through an OTP sent via text message, or confirmation from a mobile device, biometrics, or plug-in tokens. 43 | * *Access control:* restricting access to a resource. *Access management* is the process of managing access to the resources. 44 | * *Authorization:* allowing an identity access to something (e.g. data or a function). Also known as *Authz*. 45 | * *Entitlement:* mapping an identity (including roles, personas, and attributes) to an authorization. The entitlement is what they are allowed to do, and for documentation purposes we keep these in an *entitlement matrix*. 46 | * *Federated Identity Management:* the process of asserting an identity across different systems or organizations. This is the key enabler of *Single Sign On* and also core to managing IAM in cloud computing. 47 | * *Authoritative source:* the "root" source of an identity, such as the directory server that manages employee identities. 48 | * *Identity Provider:* the source of the identity in federation. The identity provider isn't always the authoritative source, but can sometimes rely on the authoritative source, especially if it is a broker for the process. 49 | * *Relying Party:* the system that relies on an identity assertion from an identity provider. 50 | 51 | There are a few more terms that will be covered in their relevant sections below, including the major IAM standards. Also, although this domain may seem overly focused on public cloud, all the same principles apply in private cloud; the scope, however, will be lessened since the organization may have more control over the entire stack. 
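To tie several of these terms together, here is a toy sketch, with invented names and attribute values, in which an *identity* carries *attributes*, an *entitlement* maps an *authorization* to required attribute values, and the access decision follows from both:

```python
from dataclasses import dataclass, field

@dataclass
class Identity:
    """A digital identity with its current attributes (some static, some dynamic)."""
    identifier: str
    attributes: dict = field(default_factory=dict)

# A hypothetical entitlement: the authorization "launch-instance" is granted
# only when the listed attribute conditions hold (an attribute-based rule).
ENTITLEMENT = {
    "authorization": "launch-instance",
    "conditions": {"group": "developers", "mfa": True},
}

def authorize(identity: Identity, requested: str, entitlement: dict) -> bool:
    """Grant the requested authorization only if every condition matches."""
    if requested != entitlement["authorization"]:
        return False
    return all(identity.attributes.get(k) == v
               for k, v in entitlement["conditions"].items())

dev = Identity("alice@example.com", {"group": "developers", "mfa": True, "ip": "203.0.113.7"})
print(authorize(dev, "launch-instance", ENTITLEMENT))   # True
dev.attributes["mfa"] = False                            # a dynamic attribute changed
print(authorize(dev, "launch-instance", ENTITLEMENT))   # False
```

This attribute-driven style of decision is revisited later in this domain under Attribute-Based Access Control.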
52 | 53 | ### IAM standards for cloud computing 54 | 55 | There are quite a few identity and access management standards out there, and many of them can be used in cloud computing. Despite the wide range of options the industry is settling on a core set that are most commonly seen in various deployments and are supported by the most providers. There are also some standards that are promising but aren't yet as widely used. This list doesn't reflect any particular endorsement and doesn't include all options but is merely representative of what is most commonly supported by the widest range of providers: 56 | 57 | * *Security Assertion Markup Language (SAML)* 2.0 is an OASIS standard for federated identity management that supports both authentication and authorization. It uses XML to make assertions between an identity provider and a relying party. Assertions can contain authentication statements, attribute statements, and authorization decision statements. SAML is very widely supported by both enterprise tools and cloud providers but can be complex to initially configure. 58 | * *OAuth* is an IETF standard for authorization that is very widely used for web services (including consumer services). OAuth is designed to work over HTTP and is currently on version 2.0, which is not compatible with version 1.0. To add a little confusion to the mix, OAuth 2.0 is more of a framework and less rigid than OAuth 1.0, which means implementations may not be compatible. It is most often used for delegating access control/authorizations between services. 59 | * *OpenID* is a standard for federated authentication that is very widely supported for web services. It is based on HTTP with URLs used to identify the identity provider and the user/identity (e.g. identity.identityprovider.com). The current version is OpenID Connect 1.0 and it is very commonly seen in consumer services. 60 | 61 | There are two other standards that aren't as commonly encountered but can be useful for cloud computing: 62 | 63 | * *eXtensible Access Control Markup Language (XACML)* is a standard for defining attribute-based access controls/authorizations. It is a policy language for defining access controls at a Policy Decision Point and then passing them to a Policy Enforcement Point. It can be used with both SAML and OAuth since it solves a different part of the problem—i.e. deciding what an entity is allowed to do with a set of attributes, as opposed to handling logins or delegation of authority. 64 | * *System for Cross-domain Identity Management (SCIM)* is a standard for exchanging identity information between domains. It can be used for provisioning and deprovisioning accounts in external systems and for exchanging attribute information. 65 | 66 | *********insert 12.2********* 67 | > Callout and diagram- How Federated Identity Management Works: Federation involves an *identity provider* making assertions to a *relying party* after building a trust relationship. At the heart are a series of cryptographic operations to build the trust relationship and exchange credentials. A practical example is a user logging into their work network, which hosts a directory server for accounts. That user then opens a browser connection to a SaaS application. Instead of logging in there are a series of behind-the-scenes operations where the identity provider (the internal directory server) asserts the identity of the user, that the user authenticated, as well as any attributes. 
The relying party trusts those assertions and logs the user in without the user entering any credentials. In fact, the relying party doesn't even have a username or password for that user; it relies on the identity provider to assert successful authentication. To the user they simply go to the website for the SaaS application and are logged in, assuming they successfully authenticated with the internal directory. 68 | 69 | This isn't to imply there aren't other techniques or standards used in cloud computing for identity, authentication, and authorization. Most cloud providers, especially IaaS, have their own internal IAM systems that might not use any of these standards or that can be connected to an organization using these standards. For example, HTTP request signing is very commonly used for authenticating REST APIs and authorization decisions are managed by internal policies on the cloud provider side. The request signing might still support SSO through SAML, or the API might be completely OAuth based, or use its own token mechanism. All are commonly encountered, but most enterprise-class cloud providers offer federation support of some sort. 70 | 71 | > Identity protocols and standards do not represent a complete solution by themselves, but they are a means to an end. The essential concepts when choosing an identity protocol are: 72 | 73 | * No protocol is a silver bullet that solves all identity and access control problems. 74 | * Identity protocols must be analyzed in the context of use case(s). For example, Browser-based Single Sign On, API keys, or mobile-to-cloud authentication could each lead companies to a different approach. 75 | * The key operating assumption should be that identity is a perimeter in and of itself, just like a DMZ. So any identity protocol has to be selected and engineered from the standpoint that it can traverse risky territory and withstand malice. 76 | 77 | ### Managing users and identities for cloud computing 78 | 79 | The "identity" part of identity management focuses on the processes and technologies for registering, provisioning, propagating, managing, and deprovisioning identities. Managing identities and provisioning them in systems are problems that information security has been tackling for decades. It wasn't so long ago that IT administrators needed to individually provision users in every different internal systems. Even today, with centralized directory servers and a range of standards, true Single Sign On for everything is relatively rare; users still manage a set of credentials, albeit a much smaller set than in the past. 80 | 81 | > A note on scope: The descriptions in this section are generic but do skew towards user management. The same principles apply to identities for services, devices, servers, code, and other entities, but the processes and details around those can be more complex and are tightly tied to application security and architectures. This domain also only includes limited discussion of all the internal identity management issues for cloud providers, for the same reasons. It isn't that these areas are less important; in many cases they are more important, but they also bring a complexity that can't be fully covered within the constraints of this Guidance. 
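The federation callout above describes the trust mechanics in prose. As a heavily simplified sketch, using the third-party `cryptography` package, with JSON and RSA signatures standing in for the XML assertions, certificates, and metadata exchange that SAML actually uses, and with all names invented, the pattern looks roughly like this:

```python
from datetime import datetime, timedelta, timezone
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# --- Identity provider side -------------------------------------------------
idp_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

def issue_assertion(subject, attributes, audience):
    """Sign a short-lived statement about an authenticated user."""
    assertion = {
        "subject": subject,
        "attributes": attributes,     # e.g. groups, MFA status
        "audience": audience,         # the relying party this is meant for
        "expires": (datetime.now(timezone.utc) + timedelta(minutes=5)).isoformat(),
    }
    payload = json.dumps(assertion, sort_keys=True).encode()
    signature = idp_key.sign(payload, padding.PKCS1v15(), hashes.SHA256())
    return payload, signature

# --- Relying party side -----------------------------------------------------
idp_public_key = idp_key.public_key()  # exchanged when the trust relationship is set up

def accept_assertion(payload, signature, expected_audience):
    """Verify the signature and basic conditions before trusting the assertion."""
    try:
        idp_public_key.verify(signature, payload, padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return None
    assertion = json.loads(payload)
    if assertion["audience"] != expected_audience:
        return None
    if datetime.fromisoformat(assertion["expires"]) < datetime.now(timezone.utc):
        return None
    return assertion  # the relying party logs the user in based on this

payload, sig = issue_assertion("alice@example.com",
                               {"groups": ["developers"], "mfa": True},
                               audience="saas.example")
print(accept_assertion(payload, sig, expected_audience="saas.example"))
```

The relying party never sees a password; it trusts the signed statement, which is why protecting the identity provider and its keys is so critical.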
82 | 83 | Cloud providers and cloud consumers need to start with the fundamental decision on how to manage identities: 84 | 85 | * Cloud providers need to nearly always support internal identities, identifiers, and attributes for users who directly access the service, while also supporting federation so that organizations don't have to manually provision and manage every user in the provider's system and issue everyone separate credentials. 86 | * Cloud consumers need to decide where they want to manage their identities and which architectural models and technologies they want to support to integrate with cloud providers. 87 | 88 | As a cloud consumer you can log into a cloud provider and create all your identities in their system. This is not scalable for most organizations, which is why most turn to federation. Keep in mind there *can* be exceptions where it makes sense to keep all or some of the identities with the cloud provider isolated, such as backup administrator accounts to help debug problems with the federated identity connection. 89 | 90 | When using federation the cloud consumer needs to determine the authoritative source that holds the unique identities they will federate. This is often an internal directory server. The next decision is whether to directly use the authoritative source as the identity provider, use a different identity source that feeds from the authoritative source (like a directory fed from an HR system), or to integrate an *identity broker*. There are two possible architectures: 91 | 92 | *******insert 12.2********* 93 | > Free-form vs. hub and spoke 94 | 95 | * *Free-form:* internal identity providers/sources (often directory servers) connect directly to cloud providers. 96 | * *Hub and spoke:* internal identity providers/sources communicate with a central broker or repository that then serves as the identity provider for federation to cloud providers. 97 | 98 | Directly federating internal directory servers in the free-form model raises a few issues: 99 | 100 | * The directory needs Internet access. This can be a problem, depending on existing topography, or it may violate security policies. 101 | * It may require users to VPN back to the corporate network before accessing cloud services. 102 | * Depending on the existing directory server, and especially if you have multiple directory servers in different organizational silos, federating to an external provider may be complex and technically difficult. 103 | 104 | *Identity brokers* handle federating between identity providers and relying parties (which may not always be a cloud service). They can be located on the network edge or even in the cloud in order to enable web-SSO. 105 | 106 | Identity providers don't need to be located only on-premises; many cloud providers now support cloud-based directory servers that support federation internally and with other cloud services. For example, more complex architectures can synchronize or federate a portion of an organization's identities for an internal directory through an identity broker and then to a cloud-hosted directory, which then serves as an identity provider for other federated connections. 107 | 108 | After determining the large-scale model there are still process and architectural decisions required for any implementation: 109 | 110 | * How to manage identities for application code, systems, devices, and other services. You may leverage the same model and standards or decide to take a different approach within cloud deployments and applications. 
For example, the descriptions above skew towards users accessing services, but may not apply equally for services talking to services, systems or devices, or for application components within an IaaS deployment. 111 | * Defining the identity provisioning process and how to integrate that into cloud deployments. There may also be multiple provisioning processes for different use cases, although the goal should be to have as unified a process as possible. 112 | * If the organization has an effective provisioning process in place for traditional infrastructure this should ideally be extended into cloud deployments. However, if existing internal processes are problematic then the organization should instead use the move to cloud as an opportunity to build a new, more effective process. 113 | * Provisioning and supporting individual cloud providers and deployments. There should be a formal process for adding new providers into the IAM infrastructure. This includes the process of establishing any needed federation connections as well as: 114 | * Mapping attributes (including roles) between the identity provider and the relying party. 115 | * Enabling required monitoring/logging, including identity-related security monitoring, such as behavioral analytics. 116 | * Building an entitlement matrix (discussed more in the next section). 117 | * Documenting any break/fix scenarios in case there is a technical failure of any of the federation (or other techniques) used for the relationship. 118 | * Ensuring incident response plans for potential account takeovers are in place, including takeovers of privileged accounts. 119 | * Implementing deprovisioning or entitlement change processes for identities and the cloud provider. With federation this requires work on both sides of the connection. 120 | 121 | Lastly, cloud providers need to determine which identity management standards they wish to support. Some providers support only federation while others support multiple IAM standards plus their own internal user/account management. Providers who serve enterprise markets will need to support federated identity, and most likely SAML. 122 | 123 | ### Authentication and Credentials 124 | 125 | Authentication is the process of proving or confirming an identity. In information security authentication most commonly refers to the act of a user logging in, but it also refers to essentially any time an entity proves who they are and assumes an identity. Authentication is the responsibility of the identity provider. 126 | 127 | The biggest impact of cloud computing on authentication is a greater need for *strong authentication* using *multiple factors*. This is for two reasons: 128 | 129 | * Broad network access means cloud services are always accessed over the network, and often over the Internet. Loss of credentials could more easily lead to an account takeover by an attacker, since attacks aren't restricted to the local network. 130 | * Greater use of federation for Single Sign On means one set of credentials can potentially compromise a greater number of cloud services. 131 | 132 | Multifactor authentication (MFA) offers one of the strongest options for reducing account takeovers. It isn't a panacea, but relying on a single factor (password) for cloud services is very high risk. When using MFA with federation the identity provider can and should pass the MFA status as an attribute to the relying party. 
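Most of the token-based MFA options listed next rely on one-time passwords derived from a shared secret and the current time. A minimal sketch of that TOTP calculation, in the style of RFC 6238 and with an arbitrary example secret, looks like this:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(shared_secret_b32: str, interval: int = 30, digits: int = 6) -> str:
    """Time-based one-time password, as used by many soft and hard tokens."""
    key = base64.b32decode(shared_secret_b32, casefold=True)
    counter = int(time.time()) // interval            # current 30-second window
    msg = struct.pack(">Q", counter)
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                         # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# Both the token (or authenticator app) and the verifier derive the same code
# from the shared secret and the current time window.
print(totp("JBSWY3DPEHPK3PXP"))
```

The identity provider verifies a submitted code by performing the same computation (usually tolerating a one-step clock skew) and can then assert the MFA attribute during federation, as described above.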
133 | 134 | There are multiple options for MFA, including: 135 | 136 | * *Hard tokens* are physical devices that generate one-time passwords for human entry or need to be plugged into a reader. These are the best option when the highest level of security is required. 137 | * *Soft tokens* work similarly to hard tokens but are software applications that run on a phone or computer. Soft tokens are also an excellent option but could be compromised if the user's device is compromised, and this risk needs to be considered in any threat model. 138 | * *Out-of-band passwords* are text or other messages sent to a user's phone (usually) and are then entered like any other one-time password generated by a token. Although also a good option, any threat model must consider message interception, especially with SMS. 139 | * *Biometrics* are growing as an option, thanks to biometric readers now commonly available on mobile phones. For cloud services the biometric itself is a *local protection*: the biometric information isn't sent to the cloud provider; instead, the authentication result is an attribute that can be sent to the provider. As such, the security and ownership of the local device need to be considered. 140 | 141 | [FIDO](https://fidoalliance.org) is one standard that may streamline stronger authentication for consumers while minimizing friction. 142 | 143 | ### Entitlement and Access Management 144 | 145 | The terms *entitlement, authorization, and access control* all overlap somewhat and are defined differently depending on the context. Although we defined them earlier in this section, here is a quick review. 146 | 147 | An *authorization* is permission to do something—access a file or network, or perform a certain function like an API call on a particular resource. 148 | 149 | An *access control* allows or denies the expression of that authorization, so it includes aspects like assuring that the user is authenticated before allowing access. 150 | 151 | An *entitlement* maps identities to authorizations and any required attributes (e.g. user x is allowed access to resource y when z attributes have designated values). We commonly refer to a map of these entitlements as an *entitlement matrix*. Entitlements are often encoded as technical *policies* for distribution and enforcement. 152 | 153 | This is only one definition of these terms and you may see them used differently in other documents. We also use the term *access management* as the "A" portion of IAM; it refers to the entire process of defining, propagating, and enforcing authorizations. 154 | 155 | > Sample Entitlement Matrix 156 | 157 | Project X 158 | 159 | | Entitlement | super-admin | service 1 admin | service 2 admin | dev | security audit | security admin | 160 | | ----- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | 161 | | Service 1 List All | X | X | | X | X | X | 162 | | Service 2 List All | X | | X | X | X | X | 163 | | Service 1 Modify Network | X | X | | X | | X | 164 | | Service 2 Modify Security Rule | X | X | | | | X | 165 | | Read Audit Logs | X | | | | X | X | 166 | | ... | X | X | X | X | X | X | 167 | 168 | Here's a real-world cloud example. The cloud provider has an API for launching new virtual machines. That API has a corresponding authorization to allow launching new machines, with additional authorization options for what virtual network a user can launch the VM within.
The cloud administrator creates an entitlement that says that users in the developer group can launch virtual machines only in their project network and only if they authenticated with MFA. The group and the use of MFA are attributes of the user's identity. That entitlement is written as a policy that is loaded into the cloud provider's system for enforcement. 169 | 170 | Cloud impacts entitlements, authorizations, and access management in multiple ways: 171 | 172 | * Cloud providers and platforms, like any other technology, will have their own set of potential authorizations specific to them. Unless the provider supports XACML (rare today) the cloud consumer will usually need to configure entitlements within the cloud platform directly. 173 | 174 | * The cloud provider is responsible for enforcing authorizations and access controls. 175 | 176 | * The cloud consumer is responsible for defining entitlements and properly configuring them within the cloud platform. 177 | 178 | * Cloud platforms tend to have greater support for the *Attribute-Based Access Control* model for IAM, which offers greater flexibility and security than the *Role-Based Access Control* model. RBAC is the traditional model for enforcing authorizations and relies on what is often a single attribute (a defined role). ABAC allows more granular and context-aware decisions by incorporating multiple attributes, such as role, location, authentication method, and more. 179 | * ABAC is the preferred model for cloud-based access management. 180 | 181 | * When using federation the cloud consumer is responsible for mapping attributes, including roles and groups, to the cloud provider and ensuring that these are properly communicated during authentication. 182 | 183 | * Cloud providers are responsible for supporting granular attributes and authorizations to enable ABAC and effective security for cloud consumers. 184 | 185 | ### Privileged User Management 186 | 187 | In terms of controlling risk, few things are more essential than privileged user management. The requirements mentioned above for strong authentication should be a strong consideration for any privileged user. In addition, account and session recording should be implemented to drive up accountability and visibility for privileged users. 188 | 189 | In some cases, it will be beneficial for a privileged user to sign in through a separate, tightly controlled system using higher levels of assurance for credential control, digital certificates, physically and logically separate access points, and/or jump hosts. 190 | 191 | ## Recommendations 192 | 193 | * Organizations should develop a comprehensive and formalized plan and processes for managing identities and authorizations with cloud services. 194 | * When connecting to external cloud providers, use federation, if possible, to extend existing identity management. Try to minimize silos of identities in cloud providers that are not tied to internal identities. 195 | * Consider the use of identity brokers where appropriate. 196 | * Cloud consumers are responsible for maintaining the identity provider and defining identities and attributes. 197 | * These should be based on an authoritative source. 198 | * Distributed organizations should consider using cloud-hosted directory servers when on-premises options either aren't available or do not meet requirements. 199 | * Cloud consumers should prefer MFA for all external cloud accounts and send MFA status as an attribute when using federated authentication.
200 | * Privileged identities should always use MFA. 201 | * Develop an entitlement matrix for each cloud provider and project, with an emphasis on access to the metastructure and/or management plane. 202 | * Translate entitlement matrices into technical policies when supported by the cloud provider or platform. 203 | * Prefer ABAC over RBAC for cloud computing. 204 | * Cloud providers should offer both internal identities and federation using open standards. 205 | * There are no magic protocols: pick your use cases and constraints first and find the right protocol second. -------------------------------------------------------------------------------- /Domain 13- Security as a Service.md: -------------------------------------------------------------------------------- 1 | # Domain 13: Security as a Service 2 | 3 | ## Introduction 4 | 5 | While most of this Guidance focuses on securing cloud platforms and deployments, this domain shifts direction to cover security services delivered *from* the cloud. These services, which are typically SaaS or PaaS, aren't necessarily used exclusively to protect cloud deployments; they are just as likely to help defend traditional on-premises infrastructure. 6 | 7 | Security as a Service (SecaaS) providers offer security capabilities *as a cloud service*. This includes dedicated SecaaS providers as well as packaged security features from general cloud-computing providers. Security as a Service encompasses a very wide range of possible technologies, but they must meet the following criteria: 8 | 9 | * SecaaS includes security products or services that are delivered as a cloud service. 10 | * To be considered SecaaS, the services must still meet the essential NIST characteristics for cloud computing, as defined in Domain 1. 11 | 12 | This section highlights some of the more common categories in the market, but SecaaS is constantly evolving and the descriptions and following list should not be considered canonical. There are examples and services not covered in this document, and more enter the market on a constant basis. 13 | 14 | ## Overview 15 | 16 | ### Potential benefits and concerns of SecaaS 17 | 18 | Before delving into the details of the different significant SecaaS categories, it is important to understand how SecaaS is different from both on-premises and self-managed security. To do so, consider the potential benefits and consequences. 19 | 20 | #### Potential benefits 21 | 22 | * *Cloud-computing benefits.* The normal potential benefits of cloud computing—such as reduced capital expenses, agility, redundancy, high availability, and resiliency—all apply to SecaaS. As with any other cloud provider, the magnitude of these benefits depends on the pricing, execution, and capabilities of the security provider. 23 | 24 | * *Staffing and expertise.* Many organizations struggle to employ, train, and retain security professionals across relevant domains of expertise. This can be exacerbated due to limitations of local markets, high costs for specialists, and balancing day-to-day needs with the high rate of attacker innovation. As such, SecaaS providers bring the benefit of extensive domain knowledge and research that may be unattainable for many organizations that are not solely focused on security or the specific security domain. 25 | 26 | * *Intelligence-sharing.* SecaaS providers protect multiple clients simultaneously and have the opportunity to share intelligence and data across them.
For example, finding a malware sample in one client allows the provider to immediately add it to their defensive platform, thus protecting all other customers. Practically speaking this isn't a magic wand, as the effectiveness will vary across categories, but since intelligence-sharing is built into the service, the potential upside is there. 27 | 28 | * *Deployment flexibility.* SecaaS may be better positioned to support evolving workplaces and cloud migrations, since it is itself a cloud-native model delivered using broad network access and elasticity. Services can typically handle more flexible deployment models, such as supporting distributed locations without the complexity of multi-site hardware installations. 29 | 30 | * *Insulation of clients.* In some cases, SecaaS can intercept attacks before they hit the organization directly. For example, spam filtering and cloud-based Web Application Firewalls are positioned *between* the attackers and the organization. They can absorb certain attacks before they ever reach the customer's assets. 31 | 32 | * *Scaling and cost.* The cloud model provides the consumer with a “Pay as you Grow” model, which also helps organizations focus on their core business and lets them leave security concerns to the experts. 33 | 34 | #### Potential concerns 35 | 36 | * *Lack of visibility.* Since services operate at a remove from the customer, they often provide less visibility or data compared to running one's own operation. The SecaaS provider may not reveal details of how it implements its own security and manages its own environment. Depending on the service and the provider, that may result in a difference in data sources and the level of detail available for things like monitoring and incidents. Some information that the customer may be accustomed to having may look different, have gaps, or not be available at all. The actual evidence and artifacts of compliance, as well as other investigative data, may not meet the customer's goals. All of this can and should be determined before entering into any agreement. 37 | 38 | * *Regulation differences.* Given global regulatory requirements, SecaaS providers may be unable to assure compliance in all jurisdictions that an organization operates in. 39 | 40 | * *Handling of regulated data.* Customers will also need assurance that any regulated data potentially vacuumed up as part of routine security scanning or a security incident is handled in accordance with any compliance requirements; this also needs to comply with aforementioned international jurisdictional differences. For example, employee monitoring in Europe is more restrictive than it is in the United States, and even basic security monitoring practices could violate workers' rights in that region. Likewise, if a SecaaS provider relocates its operations, due to data center migration or load balancing, it may violate regulations that have geographical restrictions in data residence. 41 | 42 | * *Data leakage.* As with any cloud computing service or product, there is always the concern of data from one cloud consumer leaking to another. This risk isn't unique to SecaaS, but the highly sensitive nature of security data (and other regulated data potentially exposed in security scanning or incidents) does mean that SecaaS providers should be held to the highest standards of multitenant isolation and segregation. Security-related data is also likely to be involved in litigation, law enforcement investigations, and other discovery situations. 
Customers want to ensure their data will not be exposed when these situations involve another client on the service. 43 | 44 | * *Changing providers.* Although simply switching SecaaS providers may on the surface seem easier than swapping out on-premises hardware and software, organizations may be concerned about lock-in due to potentially losing access to data, including historical data needed for compliance or investigative support. 45 | 46 | * *Migration to SecaaS.* For organizations that have existing security operations and on-premises legacy security control solutions, the migration to SecaaS and the boundary and interface between any in-house IT department and SecaaS providers must be well planned, exercised, and maintained. 47 | 48 | ### Major Categories of Security as a Service Offerings 49 | 50 | There are a large number of products and services that fall under the heading of Security as a Service. While the following is not a canonical list, it describes many of the more common categories seen as of this writing: 51 | 52 | #### Identity, Entitlement, and Access Management Services 53 | 54 | Identity-as-a-service is a generic term that covers one or many of the services that may comprise an identity ecosystem, such as Policy Enforcement Points (PEP-as-a-service), Policy Decision Points (PDP-as-a-service), Policy Access Points (PAP-as-a-service), services that provide Entities with Identity, services that provide attributes (e.g. Multi-Factor Authentication), and services that provide reputation. 55 | 56 | One of the more well-known categories heavily used in cloud security is Federated Identity Brokers. These services help intermediate IAM between an organization's existing identity providers (internal or cloud-hosted directories) and the many different cloud services used by the organization. They can provide web-based Single Sign-On (SSO), helping ease some of the complexity of connecting to a wide range of external services that use different federation configurations. 57 | 58 | There are two other categories commonly seen in cloud deployments. Strong authentication services use apps and infrastructure to simplify the integration of various strong authentication options, including mobile device apps and tokens for MFA. The other category hosts directory servers in the cloud to serve as an organization's identity provider. 59 | 60 | #### Cloud Access and Security Brokers (CASB, also known as Cloud Security Gateways) 61 | 62 | These products intercept communications that are directed towards a cloud service or directly connect to the service via API in order to monitor activity, enforce policy, and detect and/or prevent security issues. They are most commonly used to manage an organization's sanctioned and unsanctioned SaaS services. While there are on-premises CASB options, CASB is also often offered as a cloud-hosted service. 63 | 64 | CASBs can also connect to on-premises tools to help an organization detect, assess, and potentially block cloud usage and unapproved services. Many of these tools include risk-rating capabilities to help customers understand and categorize hundreds or thousands of cloud services. The ratings are based on a combination of the provider's assessments, which can be weighted and combined with the organization's priorities. 65 | 66 | Most providers also offer basic Data Loss Prevention for the covered cloud services, inherently or through partnership and integration with other services.
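As a simplified illustration of the API-based monitoring approach, the sketch below polls a hypothetical SaaS provider's audit-log endpoint and applies a naive DLP-style pattern check to externally shared files. The endpoint, token, and event field names are illustrative assumptions, not any specific provider's API; real CASB products implement far richer discovery, policy, and remediation engines.

```python
# Minimal sketch of API-mode, CASB-style monitoring (illustrative only).
# The audit-log URL, token, and event fields are hypothetical assumptions.
import re
import requests

AUDIT_LOG_URL = "https://api.example-saas.com/v1/audit/events"  # hypothetical endpoint
API_TOKEN = "REPLACE_WITH_OAUTH_TOKEN"

# Very naive DLP-style patterns: SSN-like and 16-digit card-like numbers.
DLP_PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_like": re.compile(r"\b(?:\d[ -]?){16}\b"),
}

def fetch_events():
    """Pull recent activity events from the (assumed) SaaS audit-log API."""
    resp = requests.get(
        AUDIT_LOG_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("events", [])

def flag_sensitive_sharing(events):
    """Flag externally shared files whose preview text matches a DLP pattern."""
    alerts = []
    for event in events:
        if event.get("action") == "file_shared" and event.get("shared_externally"):
            content = event.get("content_preview", "")
            for name, pattern in DLP_PATTERNS.items():
                if pattern.search(content):
                    alerts.append(
                        {"event_id": event.get("id"), "match": name, "user": event.get("user")}
                    )
    return alerts

if __name__ == "__main__":
    for alert in flag_sensitive_sharing(fetch_events()):
        print(
            f"ALERT: possible {alert['match']} data shared externally "
            f"by {alert['user']} (event {alert['event_id']})"
        )
```

The same pattern applies whether the broker consumes a provider's native audit API or sits inline; one advantage of the API mode is that it can also inspect data that is already stored in the service, not just new traffic.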
67 | 68 | Depending on the organization discussing "CASB," the term is also sometimes used to include Federated Identity Brokers. This can be confusing: although the combination of the "security gateway" and "identity broker" capabilities is possible and does exist, the market is still dominated by independent services for those two capabilities. 69 | 70 | #### Web Security (Web Security Gateways) 71 | 72 | Web Security involves real-time protection, offered either on-premises through software and/or appliance installation, or via the Cloud by proxying or redirecting web traffic to the cloud provider (or a hybrid of both). This provides an added layer of protection on top of other defenses, such as anti-malware software, to prevent malware from entering the enterprise via activities such as web browsing. It can also enforce policy rules around types of web access and the time frames when they are allowed. Application authorization management can provide an extra level of granular and contextual security enforcement for web applications. 73 | 74 | #### Email Security 75 | 76 | Email Security should provide control over inbound and outbound email, protecting the organization from risks like phishing and malicious attachments, as well as enforcing corporate policies like acceptable use and spam prevention, and providing business continuity options. 77 | 78 | In addition, the solution may support policy-based encryption of emails as well as integrating with various email server solutions. Many email security solutions also offer features like digital signatures that enable identification and non-repudiation. This category includes the full range of services, from those as simple as anti-spam features all the way to fully integrated email security gateways with advanced malware and phishing protection. 79 | 80 | #### Security Assessment 81 | 82 | Security assessments are third-party or customer-driven audits of cloud services or assessments of on-premises systems via cloud-provided solutions. Traditional security assessments for infrastructure, applications, and compliance audits are well defined and supported by multiple standards such as NIST, ISO, and CIS. A relatively mature toolset exists, and a number of tools have been implemented using the SecaaS delivery model. Using that model, subscribers get the typical benefits of cloud computing: elasticity, negligible setup time, low administration overhead, and pay-per-use with low initial investments. 83 | 84 | There are three main categories of security assessments: 85 | 86 | * Traditional security/vulnerability assessments of assets that are deployed in the cloud (e.g. virtual machines/instances for patches and vulnerabilities) or on-premises. 87 | * Application security assessments, including SAST, DAST, and management of RASP. 88 | * Cloud platform assessment tools that connect directly with the cloud service over API to assess not merely the assets deployed in the cloud, but the cloud configuration as well. 89 | 90 | #### Web Application Firewalls 91 | 92 | In a cloud-based Web Application Firewall (WAF), customers redirect traffic (using DNS) to a service that analyzes and filters traffic before passing it through to the destination web application. Many cloud WAFs also include anti-DDoS capabilities.
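Because the redirection is purely DNS-based, a common operational check is confirming that the public records for a protected application actually resolve to the WAF provider, since traffic that reaches the origin directly bypasses the filtering. The sketch below is a minimal, illustrative version of that check using only the Python standard library; the CIDR blocks are placeholder assumptions, not any vendor's published ranges.

```python
# Minimal sketch: verify a hostname resolves into an assumed cloud WAF address range.
# The CIDR blocks below are placeholders; substitute the ranges published by your WAF provider.
import ipaddress
import socket

ASSUMED_WAF_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # placeholder range
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder range
]

def resolved_addresses(hostname: str):
    """Return the set of addresses the public DNS answer currently contains."""
    infos = socket.getaddrinfo(hostname, None)
    return {ipaddress.ip_address(info[4][0]) for info in infos}

def is_behind_waf(hostname: str) -> bool:
    """True only if every resolved address falls inside an assumed WAF range."""
    addresses = resolved_addresses(hostname)
    return all(any(addr in net for net in ASSUMED_WAF_RANGES) for addr in addresses)

if __name__ == "__main__":
    host = "www.example.com"
    print(f"{host} behind WAF ranges: {is_behind_waf(host)}")
```

A fuller check would also confirm that the origin servers accept connections only from the WAF's egress addresses, since DNS alone does not prevent an attacker who discovers the origin from connecting to it directly.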
93 | 94 | #### Intrusion Detection/Prevention (IDS/IPS) 95 | 96 | Intrusion Detection/Prevention systems monitor behavior patterns using rule-based, heuristic, or behavioral models to detect anomalies in activity which might present risks to the enterprise. With IDS/IPS as a service, the information feeds into a service-provider's managed platform, as opposed to the customer being responsible for analyzing events themselves. Cloud IDS/IPS can use existing hardware for on-premises security, virtual appliances for in-cloud (see Domain 7 for the limitations), or host-based agents. 97 | 98 | #### Security Information & Event Management (SIEM) 99 | 100 | Security Information and Event Management (SIEM) systems aggregate (via push or pull mechanisms) log and event data from virtual and real networks, applications, and systems. This information is then correlated and analyzed to provide real-time reporting on and alerting of information or events that may require intervention or other types of responses. Cloud SIEMs collect this data in a cloud service, as opposed to a customer-managed, on-premises system. 101 | 102 | #### Encryption and Key Management 103 | 104 | These services encrypt data and/or manage encryption keys. They may be offered by cloud services to support customer-managed encryption and data security. They may be limited to only protecting assets within that specific cloud provider, or they may be accessible across multiple providers (and even on-premises, via API) for broader encryption management. The category also includes encryption proxies for SaaS, which intercept SaaS traffic to encrypt discrete data. 105 | 106 | However, encrypting data *outside* a SaaS platform may affect the ability of the platform to utilize the data in question. 107 | 108 | #### Business Continuity and Disaster Recovery 109 | 110 | Providers of cloud BC/DR services back up data from individual systems, data centers, or cloud services to a cloud platform instead of relying on local storage or shipping tapes. They may use a local gateway to speed up data transfers and local recoveries, with the cloud service serving as the final repository for worst-case scenarios or archival purposes. 111 | 112 | #### Security Management 113 | 114 | These services roll up traditional security management capabilities, such as EPP (endpoint) protection, agent management, network security, mobile device management, and so on into a single cloud service. This reduces or eliminates the need for local management servers and may be particularly well suited for distributed organizations. 115 | 116 | #### Distributed Denial of Service Protection 117 | 118 | By nature, most DDoS protections are cloud-based. They operate by rerouting traffic through the DDoS service in order to absorb attacks before they can affect the customer's own infrastructure. 119 | 120 | ## Recommendations 121 | 122 | * Before engaging a SecaaS provider, be sure to understand any security-specific requirements for data-handling (and availability), investigative, and compliance support. 123 | * Pay particular attention to handling of regulated data, like PII. 124 | * Understand your data retention needs and select a provider that can support data feeds that don't create a lock-in situation. 125 | * Ensure that the SecaaS service is compatible with your current and future plans, such as its supported cloud (and on-premises) platforms, the workstation and mobile operating systems it accommodates, and so on. 
-------------------------------------------------------------------------------- /Domain 14- Related Technologies.md: -------------------------------------------------------------------------------- 1 | # Domain 14: Related Technologies 2 | 3 | ## Introduction 4 | 5 | Throughout this Guidance we have focused on providing background information and best practices for directly securing cloud computing. As such a foundational technology there are also a variety of related technologies that bring their own particular security concerns. 6 | 7 | While covering all potential uses of cloud is well beyond the scope of this document, the Cloud Security Alliance feels it is important to include background and recommendations for key technologies that are interrelated with cloud. Some, such as containers and Software Defined Networks, are so tightly intertwined that we cover them in other respective domains of the Guidance. This Domain provides more depth on additional technologies that don't fit cleanly into existing domains. 8 | 9 | Breaking these out into their own section provides more flexibility to update coverage, adding and removing technologies as their usage shifts and new capabilities emerge. 10 | 11 | ## Overview 12 | 13 | Related technologies fall into two broad categories: 14 | 15 | * Technologies that rely nearly exclusively on cloud computing to operate. 16 | * Technologies that don't necessarily rely on cloud, but are commonly seen in cloud deployments. 17 | 18 | That isn't to say these technologies *can't* work without cloud, just that they are often seen overlapping or relying on cloud deployments and are so commonly seen that they have implications for the majority of cloud security professionals. 19 | 20 | The current list includes: 21 | 22 | * Big Data 23 | * Internet of Things (IoT) 24 | * Mobile devices 25 | * Serverless computing 26 | 27 | Each of these technologies is currently covered by additional Cloud Security Alliance research working groups in multiple ongoing projects and publications: 28 | 29 | * [Big Data Working Group](https://cloudsecurityalliance.org/group/big-data/) 30 | * [Internet of Things Working Group](https://cloudsecurityalliance.org/group/internet-of-things/) 31 | * [Mobile Working Group](https://cloudsecurityalliance.org/group/mobile/) 32 | 33 | ### Big Data 34 | 35 | Big data includes a collection of technologies for working with extremely large datasets that traditional data-processing tools are unable to manage. It's not any single technology but rather refers commonly to distributed collection, storage, and data-processing frameworks. 36 | 37 | Gartner defines it as such: ["Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."](http://www.gartner.com/newsroom/id/1731916) 38 | 39 | The "3 Vs" are commonly accepted as the core definition of big data, although there are many other interpretations. 40 | 41 | * *High volume:* a large size of data, in terms of number of records or attributes. 42 | * *High velocity:* fast generation and processing of data, i.e., real-time or stream data. 43 | * *High variety:* structured, semi-structured, or unstructured data. 44 | 45 | Cloud computing, due to its elasticity and massive storage capabilities, is very often where big data projects are deployed. 
Big data is not exclusive to cloud by any means, but big data technologies are very commonly integrated into cloud-computing applications and offered by cloud providers as IaaS or PaaS. 46 | 47 | There are three common components of big data, regardless of the specific toolset used: 48 | 49 | * *Distributed data collection:* Mechanisms to ingest large volumes of data, often of a streaming nature. This could be as "lightweight" as web-click streaming analytics and as complex as highly distributed scientific imaging or sensor data. Not all big data relies on distributed or streaming data collection, but it is a core big data technology. 50 | 51 | * *Distributed storage*: The ability to store the large data sets in distributed file systems (such as Google File System, Hadoop Distributed File System, etc.) or databases (often NoSQL), which is often required due to the limitations of non-distributed storage technologies. 52 | 53 | * *Distributed processing:* Tools capable of distributing processing jobs (such as map reduce, spark, etc.) for the effective analysis of data sets so massive and rapidly changing that single-origin processing can't effectively handle them. 54 | 55 | #### Security and privacy considerations 56 | 57 | Due to a combination of the highly distributed nature of big data applications (with data collection, storage, and processing all distributed among diverse nodes) and the sheer volume and potential sensitivity of the information, security and privacy are typically high priorities but are challenged by a patchwork of different tools and platforms. 58 | 59 | #### Data collection 60 | 61 | Data collection mechanisms will likely use intermediary storage that needs to be appropriately secured. This storage is used as part of the transfer of data from collection to storage. Even if primary storage is well-secured it's important to also check intermediary storage, which might be as simple as some swap space on a processing node. For example, if collection is run in containers or virtual machines, ensure the underlying storage is appropriately secured. Distributed analysis/processing nodes will also likely use some form of intermediate storage that will need additional security. This could be, for example, the volume storage for instances running processing jobs. 62 | 63 | #### Key management 64 | 65 | Key management for storage may be complicated depending on the exact mechanisms used due to the distributed nature of nodes. There are techniques to properly encrypt most big data storage layers today, and these align with our guidance in *Domain 11- Data Security and Encryption*. The complicating factor is that key management needs to handle distributing keys to multiple storage and analysis nodes. 66 | 67 | #### Security capabilities 68 | 69 | Not all big data technologies have robust security capabilities. In some cases cloud provider security capabilities can help compensate for the big data technology limitations. Both should be included in any security architecture and the details will be specific to the combination of technologies selected. 70 | 71 | #### Identity and Access Management 72 | 73 | Identity and Access Management will likely occur at both cloud and big data tool levels, which can complicate entitlement matrices. 74 | 75 | #### PaaS 76 | 77 | Many cloud providers are expanding big data support with *machine learning* and other platform as a service options that rely on access to enterprise data. 
These should not be used without a full understanding of potential data exposure, compliance, and privacy implications. For example, if the machine learning runs as PaaS inside the provider's infrastructure, where provider employees could technically access it, does that create a compliance exposure? 78 | 79 | This doesn't mean you shouldn't use the services; it just means you need to understand the implications and make appropriate risk decisions. Machine learning and other analysis services aren't necessarily insecure and don't necessarily violate privacy and compliance commitments. 80 | 81 | ### Internet of Things (IoT) 82 | 83 | The Internet of Things is a blanket term for non-traditional computing devices used in the physical world that utilize Internet connectivity. It includes everything from Internet-enabled operational technology (used by utilities like power and water) to fitness trackers, connected lightbulbs, medical devices, and beyond. These technologies are increasingly deployed in enterprise environments for applications such as: 84 | 85 | * Digital tracking of the supply chain. 86 | * Digital tracking of physical logistics. 87 | * Marketing, retail, and customer relationship management. 88 | * Connected healthcare and lifestyle applications for employees, or delivered to consumers. 89 | 90 | A very large percentage of these devices connect back to cloud computing infrastructure for their back-end processing and data storage. Key cloud security issues related to IoT include: 91 | 92 | * Secure data collection and sanitization. 93 | * Device registration, authentication, and authorization. One common issue encountered today is the use of stored credentials to make direct API calls to the back-end cloud provider. There are known cases of attackers decompiling applications or device software and then using those credentials for malicious purposes. 94 | * API security for connections from devices back to the cloud infrastructure. Aside from the stored credentials issue just mentioned, the APIs themselves could be decoded and used for attacks on the cloud infrastructure. 95 | * Encrypted communications. Many current devices use weak, outdated, or non-existent encryption, which places data and the devices at risk. 96 | * Ability to patch and update devices so they don't become a point of compromise. Currently, it is common for devices to be shipped as-is and never receive security updates for operating systems or applications. This has already caused multiple significant and highly publicized security incidents, such as massive botnet attacks based on compromised IoT devices. 97 | 98 | ### Mobile 99 | 100 | Mobile computing is neither new nor exclusive to cloud, but a very large percentage of mobile applications connect to cloud computing for their back-end processing. Cloud can be an ideal platform to support mobile since cloud providers are geographically distributed and designed for the kinds of highly dynamic workloads commonly experienced with mobile applications. This section won't discuss overall mobile security, just the portions that affect cloud security. 101 | 102 | The primary security issues for mobile computing (in the cloud context) are very similar to IoT, except a mobile phone or tablet is also a general-purpose computer: 103 | 104 | * Device registration, authentication, and authorization are a common source of issues.
Especially (again) the use of stored credentials, and even more so when the mobile device connects directly to the cloud provider's infrastructure/APIs. Attackers have been known to decompile mobile applications to reveal stored credentials which are then used to directly manipulate or attack the cloud infrastructure. Data stored on the device should also be protected with the assumption that the user of the device may be a hostile attacker. 105 | 106 | * Application APIs are also a potential source of compromise. Attackers are known to sniff API connections, in some cases using local proxies that they redirect their own devices towards, and then decompile the (likely now unencrypted) API calls and explore them for security weaknesses. Certificate pinning/validation inside the device application may help reduce this risk. 107 | 108 | For additional recommendations on the security of mobile and cloud computing see the latest research from the CSA [Mobile Working Group](https://cloudsecurityalliance.org/group/mobile/). 109 | 110 | ### Serverless Computing 111 | 112 | Serverless computing is the extensive use of certain PaaS capabilities to such a degree that all or some of an application stack runs in a cloud provider's environment without any customer-managed operating systems, or even containers. 113 | 114 | "Serverless computing" is a bit of a misnomer since there is always a server running the workload someplace, but those servers and their configuration and security are completely hidden from the cloud consumer. The consumer only manages settings for the service, and not any of the underlying hardware and software stacks. 115 | 116 | Serverless includes services such as: 117 | 118 | * Object storage 119 | * Cloud load balancers 120 | * Cloud databases 121 | * Machine learning 122 | * Message queues 123 | * Notification services 124 | * Code execution environments (These are generally restricted containers where a consumer runs uploaded application code.) 125 | * API gateways 126 | * Web servers 127 | 128 | Serverless capabilities may be deeply integrated by the cloud provider and tied together with event-driven systems and integrated IAM and messaging to support construction of complex applications without any customer management of servers, containers, or other infrastructure. 129 | 130 | From a security standpoint, key issues include: 131 | 132 | * Serverless places a much higher security burden on the cloud provider. Choosing your provider and understanding security SLAs and capabilities is absolutely critical. 133 | * Using serverless the cloud consumer will not have access to commonly-used monitoring and logging levels, such as server or network logs. Applications will need to integrate more logging and cloud providers should provide necessary logging to meet core security and compliance requirements. 134 | * Although the provider's services may be certified or attested for various compliance requirements, not necessarily every service will match every potential regulation. Providers need to keep compliance mappings up to date and customers need to ensure they only use services within their compliance scope. 135 | * There will be high levels of access to the cloud provider's management plane since that is the only way to integrate and use the serverless capabilities. 
136 | * Serverless can dramatically reduce attack surface and pathways and integrating serverless components may be an excellent way to break links in an attack chain, even if the entire application stack is not serverless. 137 | * Any vulnerability assessment or other security testing must comply with the provider's terms of service. Cloud consumers may no longer have the ability to directly test applications, or must test with a reduced scope, since the provider's infrastructure is now hosting everything and can't distinguish between legitimate tests and attacks. 138 | * Incident response may also be complicated and will definitely require changes in process and tooling to manage a serverless-based incident. 139 | 140 | ## Recommendations 141 | 142 | * Big data 143 | * Leverage cloud provider capabilities wherever possible, even if they overlap with big data tool security capabilities. This ensures you have proper protection within the cloud metastructure and the specific application stack. 144 | * Use encryption for primary, intermediary, and backup storage for both data collection and data storage planes. 145 | * Include both the big data tool and cloud platform Identity and Access Management in the project entitlement matrix. 146 | * Fully understand the potential benefits and risks of using a cloud machine learning or analytics service. Pay particular attention to privacy and compliance implications. 147 | * Cloud providers should ensure customer data is not exposed to employees or other administrators using technical and process controls. 148 | * Cloud providers should clearly publish which compliance standards their analytics and machine-learning services are compliant with (for their customers). 149 | * Cloud consumers should consider use of data masking or obfuscation when considering a service that doesn't meet security, privacy, or compliance requirements. 150 | * Follow additional big data security best practices, including those provided by the tool vendor (or Open Source project) and [the Cloud Security Alliance](https://downloads.cloudsecurityalliance.org/assets/research/big-data/BigData_Security_and_Privacy_Handbook.pdf). 151 | 152 | * Internet of Things 153 | * Ensure devices can be patched and upgraded. 154 | * Do not store static credentials on devices that could lead to compromise of the cloud application or infrastructure. 155 | * Follow best practices for secure device registration and authentication to the cloud-side application, typically using a federated identity standard. 156 | * Encrypt communications. 157 | * Use a secure data collection pipeline and sanitize data to prevent exploitation of the cloud application or infrastructure through attacks on the data-collection pipeline. 158 | * Assume all API requests are hostile. 159 | * Follow the additional, more-detailed guidance issued by the [CSA Internet of Things working group](https://cloudsecurityalliance.org/group/internet-of-things/). 160 | 161 | * Mobile 162 | * Follow your cloud provider's guidance on properly authenticating and authorizing mobile devices when designing an application that connects directly to the cloud infrastructure. 163 | * Use industry standards, typically federated identity, for connecting mobile device applications to cloud-hosted applications. 164 | * Never transfer unencrypted keys or credentials over the Internet. 165 | * Test all APIs under the assumption that a hostile attacker will have authenticated, unencrypted access. 
166 | * Consider certificate pinning and validation inside mobile applications. 167 | * Validate all API data and sanitize for security. 168 | * Implement server/cloud-side security monitoring for hostile API activity. 169 | * Ensure all data stored on device is secured and encrypted. 170 | * Sensitive data that could allow compromise of the application stack should not be stored locally on-device where a hostile user can potentially access it. 171 | * Follow the more detailed recommendations and research issued by the [CSA Mobile working group](https://cloudsecurityalliance.org/group/mobile/). 172 | 173 | * Serverless Computing 174 | * Cloud providers must clearly state which PaaS services have been assessed against which compliance requirements or standards. 175 | * Cloud consumers must only use serverless services that match their compliance and governance obligations. 176 | * Consider injecting serverless components into application stacks using architectures that reduce or eliminate attack surface and/or network attack paths. 177 | * Understand the impacts of serverless on security assessments and monitoring. 178 | * Cloud consumers will need to rely more on application-code scanning and logging and less on server and network logs. 179 | * Cloud consumers must update incident response processes for serverless deployments. 180 | * Although the cloud provider is responsible for security below the serverless platform level, the cloud consumer is still responsible for properly configuring and using the products. -------------------------------------------------------------------------------- /Domain 2- Governance and Enterprise Risk Management.md: -------------------------------------------------------------------------------- 1 | # Domain 2: Governance and Enterprise Risk Management 2 | 3 | ## Introduction 4 | 5 | Governance and risk management are incredibly broad topics. This guidance focuses on how they change in cloud computing; it is not and should not be considered a primer or comprehensive exploration of those topics outside of cloud. 6 | 7 | For security professionals, cloud computing impacts four areas of governance and risk management: 8 | 9 | * *Governance* includes the policy, process, and internal controls that comprise how an organization is run. Everything from the structures and policies to the leadership and other mechanisms for management. 10 | 11 | > For more information on governance please see 12 | * [ISO/IEC 38500:2015 - Information Technology - Governance of IT for the organization](http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=62816) 13 | * ISACA - COBIT - [A Business Framework for the Governance and Management of Enterprise IT](http://www.isaca.org/cobit/Pages/CobitFramework.aspx) 14 | * [ISO/IEC 27014:2013 - Information Technology - Security techniques - Governance of information security](http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=43754) 15 | 16 | * *Enterprise risk management* includes managing overall risk for the organization, aligned to the organization's governance and risk tolerance. Enterprise risk management includes all areas of risk, not merely those concerned with technology. 17 | 18 | * *Information risk management* covers managing the risk to information, including information technology. Organizations face all sorts of risks, from financial to physical, and information is only one of multiple assets an organization needs to manage. 
19 | 20 | * *Information security* comprises the tools and practices used to manage risk to information. Information security isn't the be-all and end-all of managing information risks; policies, contracts, insurance, and other mechanisms also play a role (including physical security for non-digital information). However, a—if not *the*—primary role of information security is to provide the processes and controls to protect electronic information and the systems we use to access it. 21 | 22 | In a simplified hierarchy, information security is a tool of information risk management, which is a tool of enterprise risk management, which is a tool of governance. The four are all closely related but require individual focus, processes, and tools. 23 | 24 | ************insert 2.1 here ************ 25 | > 2.1: A Simplified Risk and Governance Hierarchy 26 | 27 | > Legal issues and compliance are covered in Domains 3 and 4, respectively. Information risk management and data governance are covered in Domain 5. Information security is essentially the rest of this guidance. 28 | 29 | ## Overview 30 | 31 | ### Governance 32 | 33 | Cloud computing affects governance, since it either introduces a third party into the process (in the case of public cloud or hosted private cloud) or potentially alters internal governance structures in the case of self-hosted private cloud. The primary issue to remember when governing cloud computing is that *an organization can never outsource responsibility for governance*, even when using external providers. This is always true, cloud or not, but is useful to keep in mind when navigating cloud computing's concepts of shared responsibility models. 34 | 35 | Cloud service providers try to leverage economies of scale to manage costs and enable capabilities. This means creating extremely standardized services (including contracts and service level agreements) that are consistent across all customers. Governance models can't necessarily treat cloud providers the same way they'd treat dedicated external service providers, which typically customize their offerings, including legal agreements, for each client. 36 | 37 | Cloud computing changes the *responsibilities* and mechanisms for implementing and managing governance. Responsibilities and mechanisms for governance are defined in the contract, as with any business relationship. If the area of concern isn't in the contract, there are no mechanisms available to enforce it, and there is a governance gap. Governance gaps don't necessarily exclude using the provider, but they do require the customer to adjust their own processes to close the gaps or accept the associated risks. 38 | 39 | #### Tools of cloud governance 40 | 41 | As with any other area, there are specific management tools used for governance. This list focuses more on tools for external providers, but these same tools can often be used internally for private deployments: 42 | 43 | * Contracts 44 | 45 | The primary tool of governance is the contract between a cloud provider and a cloud customer (this is true for public and private cloud). The contract is your only guarantee of any level of service or commitment—assuming there is no breach of contract, which tosses everything into a legal scenario. Contracts are the primary tool to extend governance into business partners and providers.
46 | 47 | **************Insert 2.2 here***************** 48 | > Contracts define the relationship between providers and customers and are the primary tool for customers to extend governance to their suppliers. 49 | 50 | * Supplier (cloud provider) Assessments 51 | 52 | These assessments are performed by the potential cloud customer using available information and allowed processes/techniques. They combine contractual and manual research with third-party attestations (legal statements often used to communicate the results of an assessment or audit) and technical research. They are very similar to any supplier assessment and can include aspects like financial viability, history, feature offerings, third-party attestations, feedback from peers, and so on. More detail on assessments is covered later in this Domain and in Domain 4. 53 | 54 | * Compliance reporting 55 | 56 | Compliance reporting includes all the documentation on a provider's internal (i.e. self) and external compliance assessments. They are the reports from *audits of controls*, which an organization can perform themselves, a customer can perform on a provider (although this usually isn't an option in cloud), or have performed by a trusted third party. Third-party audits and assessments are preferred since they provide independent validation (assuming you trust the third party). 57 | 58 | Compliance reports are often available to cloud prospects and customers but may only be available under NDA or to contracted customers. This is often required by the firm that performed the audit and isn't necessarily something that's completely under the control of the cloud provider. 59 | 60 | Assessments and audits should be based on existing standards (of which there are many). It's critical to understand the scope, not just the standard used. Standards like the SSAE 16 have a defined scope, which includes both *what* is assessed (e.g. which of the provider's services) as well as *which controls* are assessed. A provider can thus "pass" an audit that doesn't include any security controls, which isn't overly useful for security and risk managers. Also consider the transitive trust required to treat a third-party assessment as equivalent to the activities that you might undertake when doing your own assessment. Not all audit firms (or auditors) are created equal, and the experience, history, and qualifications of the firm should be included in your governance decisions. 61 | 62 | The [Cloud Security Alliance STAR Registry](https://cloudsecurityalliance.org/star/#_overview) is an assurance program and documentation registry for cloud provider assessments based on the CSA Cloud Controls Matrix (CCM) and Consensus Assessments Initiative Questionnaire (CAIQ). Some providers also disclose documentation for additional certifications and assessments (including self-assessments). 63 | 64 | ### Enterprise Risk Management 65 | 66 | Enterprise Risk Management (ERM) is the overall management of risk for an organization. As with governance, the contract defines the roles and responsibilities for risk management between a cloud provider and a cloud customer. And, as with governance, you can never outsource your overall responsibility and accountability for risk management to an external provider.
67 | 68 | > For more on risk management see 69 | * [ISO 31000:2009 - Risk management – Principles and guidelines](http://www.iso.org/iso/catalogue_detail?csnumber=43170) 70 | * [ISO/IEC 31010:2009 - Risk management – Risk assessment techniques](http://www.iso.org/iso/catalogue_detail?csnumber=51073) 71 | * [NIST Special Publication 800-37 Revision 1 (updated June 5, 2014)](http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-37r1.pdf) 72 | 73 | Risk management in cloud is based on the *shared responsibilities model* (which we most often discuss in reference to security). The cloud provider accepts some responsibility for certain risks, and the cloud customer is responsible for anything beyond that. This is especially evident as you evaluate differences between the service models, where the provider manages more risks in SaaS and the consumer more in IaaS. But, again, the cloud consumer is ultimately responsible for ownership of the risks; they only pass on some of the *risk management* to the cloud provider. This holds true even with a self-hosted private cloud; in those situations an organizational unit is passing on some of their risk management to the internal cloud provider instead of an external party, and internal SLAs and procedures replace external contracts. 74 | 75 | ERM relies on good contracts and documentation to know where the division of responsibilities and potential for untreated risk lie. Whereas governance is nearly exclusively focused on contracts, risk management can delve deeper into the technology and process capabilities of the provider, based on their documentation. For example, a contract will rarely define how network security is actually implemented. Review of the provider's documentation will provide much more information to help with an effective risk decision. 76 | 77 | *Risk tolerance* is the amount of risk that the leadership and stakeholders of an organization are willing to accept. It varies based on the asset, and you shouldn't make a blanket risk decision about a particular provider; rather, assessments should align with the value and requirements of the assets involved. Just because a public cloud provider is external and a consumer might be concerned with shared infrastructure for some assets doesn't mean it isn't within risk tolerance for all assets. Over time this means that, practically speaking, you will build out a matrix of cloud services along with which types of assets are allowed in those services. Moving to the cloud doesn't change your risk tolerance; it just changes how risk is managed. 78 | 79 | ### The Effects of Service Model and Deployment Model 80 | 81 | In considering the various options available not only in Cloud Service Providers but also in the fundamental delivery of cloud services, attention must be paid to how the Service and Deployment models affect the ability to manage governance and risk. 82 | 83 | #### Service Models 84 | 85 | ##### Software as a Service (SaaS) 86 | 87 | In the majority of cases, SaaS presents the most critical example of the need for a negotiated contract. Such a contract will protect the ability to govern or validate risk as it relates to data stored, processed, and transmitted with and in the application. SaaS providers tend to cluster at either end of the size/capability spectrum and the likelihood of a negotiated contract is much higher when dealing with a small SaaS provider.
Unfortunately, many small SaaS providers are not able to operate at a level of sophistication that meets or exceeds customer governance and risk management capabilities. In concrete terms, the entire level of visibility into the actual operation of the infrastructure providing the SaaS application is limited to solely what is exposed in the user interface developed by the Cloud Provider. 88 | 89 | ##### Platform as a Service (PaaS) 90 | 91 | Continuing through the Service Models, the level of detail that is available (and the consequential ability to self-manage governance and risk issues) increases. The likelihood of a fully negotiated contract is likely lower here than with either of the other service models. That's because the core driver for most PaaS is to deliver a single capability with very high efficiency. 92 | 93 | PaaS is typically delivered with a rich API, and many PaaS providers have enabled the collection of some of the data necessary to prove that SLAs are being adhered to. That said, the customer is still in the position of having to exercise a significant effort in determining whether contract stipulations are effectively providing the level of control or support required to enable governance or risk management. 94 | 95 | ##### Infrastructure as a Service (IaaS) 96 | 97 | Infrastructure as a Service represents the closest that Cloud comes to a traditional data center (or even a traditional outsourced managed data center), and the good news is that the vast majority of existing governance and risk management activities that organizations have already built and utilize are directly transferable. There are, however, new complexities related to the underlying orchestration and management layers, as described in Domain 1, that enable the infrastructure which are often overlooked. 98 | 99 | In many ways, the governance and risk management of that orchestration and management layer is consistent with the underlying infrastructure (network, power, HVAC, etc.) of a traditional data center. The same governance and risk management issues are present, but the exposure of those systems is sufficiently different that changes to the existing process are required. For example, controlling who can make network configuration changes shifts from accounts on individual devices to the cloud management plane. 100 | 101 | #### Deployment Models 102 | 103 | ##### Public cloud 104 | 105 | Cloud customers have a reduced ability to govern operations in a public cloud since the provider is responsible for the management and governance of their infrastructure, employees, and everything else. The customers also often have reduced ability to negotiate contracts, which impacts how they extend their governance model into the cloud. Inflexible contracts are a natural property of multitenancy: providers can't necessarily adjust contracts and operations for each customer as everything runs on one set of resources, using one set of processes. Adapting for different customers increases costs, causing a trade-off, and often that's the dividing line between using public and private cloud. Hosted private cloud allows full customization, but at increased costs due to the loss of the economies of scale. 106 | 107 | This doesn't mean you shouldn't try to negotiate your contract, but recognize that this isn't always possible; instead, you'll need to either choose a different provider (which may actually be less secure), or adjust your needs and use alternate governance mechanisms to mitigate concerns. 
108 | 109 | To use an analogy, think of a shipping service. When you use a common carrier/provider you don't get to define their operations. You put your sensitive documents in a package and entrust them to meet their obligations to deliver it safely, securely, and within the expected Service Level Agreement. 110 | 111 | ##### Private cloud 112 | 113 | Public cloud isn't the only model that impacts governance; even private cloud will have an effect. If an organization allows a third party to own and/or manage the private cloud (which is very common), this is similar to how governance affects any outsourced provider. There will be shared responsibilities with obligations that are defined in the contract. 114 | 115 | Although you will likely have more control over contractual terms, it's still important to ensure they cover the needed governance mechanisms. As opposed to a public provider—which has various incentives to keep its service well-documented and at particular standard levels of performance, functionality, and competitiveness—a hosted private cloud may only offer exactly what is in the contract, with everything else at extra cost. This *must* be considered and accounted for in negotiations, with clauses to guarantee that the platform itself remains up to date and competitive. For example, by requiring the vendor to update to the latest version of the private cloud platform within a certain time period of release and after *your* sign-off. 116 | 117 | With a self-hosted private cloud governance will focus on internal service level agreements for the cloud consumers (business or other organizational units) and chargeback and billing models for providing access to the cloud. 118 | 119 | ##### Hybrid and Community Clouds 120 | 121 | When contemplating **hybrid cloud environments**, the governance strategy must consider the minimum common set of controls comprised of the Cloud Service Provider's contract and the organization's internal governance agreements. The cloud consumer is connecting either two cloud environments or a cloud environment and a data center. In either case the overall governance is the intersection of those two models. For example, if you connect your data center to your cloud over a dedicated network link you need to account for governance issues that will span both environments. 122 | 123 | Since **community clouds** are a shared platform with multiple organizations, but are not public, governance extends to the relationships with those members of the community, not just the provider and the customer. It's a mix of how you would approach public cloud and hosted private cloud governance, where the overall tools of governance and contracts will have some of the economies of scale of a public cloud provider, but be tunable based on community consensus, as with a hosted private cloud. This also includes community membership relations, financial relationships, and how to respond when a member leaves the community. 124 | 125 | #### Cloud risk management trade-offs 126 | 127 | There are advantages and disadvantages to managing enterprise risk for cloud deployments. These factors are, as you would expect, more pronounced for public cloud and hosted private cloud: 128 | 129 | * There is less physical control over assets and their controls and processes. You don't physically control the infrastructure or the provider's internal processes. 130 | * There is a greater reliance on contracts, audits, and assessments, as you lack day-to-day visibility or management. 
131 | * This creates an increased requirement for proactive management of the relationship and adherence to contracts, which extends beyond the initial contract signing and audits. Cloud providers also constantly evolve their products and services to remain competitive, and these ongoing innovations might exceed, strain, or not be covered by existing agreements and assessments. 132 | * Cloud customers have a reduced need (and associated reduction in costs) to manage risks that the cloud provider accepts under the shared responsibility model. You haven't outsourced accountability for managing the risk, but you can certainly outsource the management of some risks. 133 | 134 | #### Cloud risk management tools 135 | 136 | The following processes help form the foundation of managing risk in cloud computing deployments. One of the core tenets of risk management is that you can *manage, transfer, accept, or avoid* risks. But everything starts with a proper assessment: 137 | 138 | The supplier assessment sets the groundwork for the cloud risk management program: 139 | * Request or acquire documentation. 140 | * Review their security program and documentation. 141 | * Review any legal, regulatory, contractual, and jurisdictional requirements for both the provider and yourself. (See Domain 3: Legal for more.) 142 | * Evaluate the contracted service in the context of your information assets. 143 | * Separately evaluate the overall provider, such as finances/stability, reputation, and outsourcers. 144 | 145 | *************insert 2.3 here***************** 146 | > Supplier Assessment Process 147 | 148 | Periodically review audits and assessments to ensure they are up to date: 149 | * Don't assume all services from a particular provider meet the same audit/assessment standards. They can vary. 150 | * Periodic assessments should be scheduled and *automated* if possible. 151 | 152 | After reviewing and understanding what risks the cloud provider manages, what remains is *residual risk*. Residual risk may often be managed by controls that you implement (e.g. encryption). The availability and specific implementation of risk controls vary greatly across cloud providers, particular services/features, service models, and deployment models. If, after all your assessments and the controls that you implement yourself, there is still residual risk, your only options are to transfer it, accept it, or avoid it. 153 | 154 | Risk transfer, most often enabled by insurance, is an imperfect mechanism, especially for information risks. It can compensate for some of the financial loss associated with a primary loss event, but won't help with a secondary loss event (like loss of customers)—especially an intangible or difficult-to-quantify loss, such as reputation damage. From the perspective of insurance carriers, cyber-insurance is also a nascent field without the depth of actuarial tables used for other forms of insurance, like those for fire or flooding, and even the financial compensation may not match the costs associated with the primary loss event. Understand the limits. 155 | 156 | ## Recommendations 157 | 158 | * Identify the shared responsibilities of security and risk management based on the chosen cloud deployment *and* service model. Develop a Cloud Governance Framework/Model as per relevant industry best practices, global standards, and regulations like CSA CCM, COBIT 5, NIST RMF, ISO/IEC 27017, HIPAA, PCI DSS, EU GDPR, etc. 159 | * Understand how a contract affects your governance framework/model.
160 | * Obtain and review contracts (and any referenced documents) before entering into an agreement. 161 | * Don't assume that you can effectively negotiate contracts with a cloud provider—but this also shouldn't necessarily stop you from using that provider. 162 | * If a contract can't be effectively negotiated and you perceive an unacceptable risk, consider alternate mechanisms to manage that risk (e.g. monitoring or encryption). 163 | * Develop a process for cloud provider assessments. 164 | * This should include: 165 | * Contract review. 166 | * Self-reported compliance review. 167 | * Documentation and policies. 168 | * Available audits and assessments. 169 | * Service reviews adapting to the customer’s requirements. 170 | * Strong change-management policies to monitor changes in the organization's use of the cloud services. 171 | * Cloud provider re-assessments should occur on a scheduled basis and be automated if possible. 172 | * Cloud providers should offer easy access to documentation and reports needed by cloud prospects for assessments. 173 | * For example, the CSA STAR registry. 174 | * Align risk requirements to the specific assets involved and the risk tolerance for those assets. 175 | * Create a specific risk management and risk acceptance/mitigation methodology to assess the risks of every solution in the space 176 | * Use controls to manage residual risks. 177 | * If residual risks remain, choose to accept or avoid the risks. 178 | * Use tooling to track approved providers based on asset type (e.g. linked to data classification), cloud usage, and management. -------------------------------------------------------------------------------- /Domain 4- Compliance and Audit Management.md: -------------------------------------------------------------------------------- 1 | # Domain 4: Compliance and Audit Management 2 | 3 | ## Introduction 4 | 5 | Organizations face new challenges as they migrate from traditional data centers to the cloud. Delivering, measuring, and communicating compliance with a multitude of regulations across multiple jurisdictions is one of the largest of these challenges. Customers and providers alike need to understand and appreciate the jurisdictional differences and their implications on existing compliance and audit standards, processes, and practices. The distributed and virtualized nature of cloud computing requires significant adjustment from approaches based on definite and physical instantiations of information and processes. 6 | 7 | In addition to providers and customers, regulators and auditors are also adjusting to the new world of cloud computing. Few existing regulations were written to account for virtualized environments or cloud deployments. A cloud consumer can be challenged to show auditors that the organization is in compliance. Understanding the interaction of cloud computing and the regulatory environment is a key component of any cloud strategy. Cloud customers, auditors, and providers must consider and understand the following: 8 | 9 | * Regulatory implications for using a particular cloud service or providers, giving particular attention to any cross-border or multi-jurisdictional issues when applicable. 10 | * Assignment of compliance responsibilities between the provider and customer, including indirect providers (i.e., the cloud provider of your cloud provider). 
This includes the concept of *compliance inheritance* where a provider may have parts of their service certified as compliant which removes this from the audit scope of the customer, but the customer is still responsible for the compliance of everything they build on top of the provider. 11 | * Provider capabilities for demonstrating compliance, including document generation, evidence production, and process compliance, in a timely manner. 12 | 13 | Some additional cloud-specific issues to pay particular attention to include: 14 | 15 | * The role of provider audits and certifications and how those affect customer audit (or assessment) scope. 16 | * Understanding which features and services of a cloud provider are within the scope of which audits and assessments. 17 | * Managing compliance and audits over time. 18 | * Working with regulators and auditors who may lack experience with cloud computing technology. 19 | * Working with providers who may lack audit and or regulatory compliance experience. 20 | 21 | ## Overview 22 | 23 | Achieving and maintaining compliance with a plethora of modern regulations and standards is a core activity for most information security teams and a critical tool of governance and risk management. So much so that the tools and teams in this realm have their own acronym: *GRC*, for governance, risk, and compliance. Although very closely related with audits — which are a key mechanism to support, assure, and demonstrate compliance — there is more to compliance than audits and more to audits than using them to assure regulatory compliance. For our purposes: 24 | 25 | * Compliance validates awareness of and adherence to corporate obligations (e.g., corporate social responsibility, ethics, applicable laws, regulations, contracts, strategies and policies). The compliance process assesses the state of that awareness and adherence, further assessing the risks and potential costs of non-compliance against the costs to achieve compliance, and hence prioritize, fund, and initiate any corrective actions deemed necessary. 26 | 27 | * Audits are a key tool for proving (or disproving) compliance. We also use audits and assessments to support non-compliance risk decisions. 28 | 29 | This section discusses these interrelated domains individually to better focus on the implications cloud computing has on each. 30 | 31 | ### Compliance 32 | 33 | Information technology in the cloud (or anywhere really) is increasingly subject to a plethora of policies and regulations from governments, industry groups, business relationships, and other stakeholders. Compliance management is a tool of governance; it is how an organization assesses, remediates, and proves it is meeting these internal and external obligations. 34 | 35 | Regulations, in particular, typically have strong implications for information technology and its governance, especially in terms of monitoring, management, protection, and disclosure. Many regulations and obligations require a certain level of security, which is why information security is so deeply coupled with compliance. Security controls are thus an important tool to assure compliance, and evaluation and testing of these controls is a core activity for security professionals. This includes assessments even when performed by dedicated internal *or* external auditors. 36 | 37 | #### How Cloud Changes Compliance 38 | 39 | As with security, compliance in the cloud is a shared responsibility model. 
Both the cloud provider and customer have responsibilities, but the customer is *always ultimately responsible for their own compliance*. These responsibilities are defined through contracts, audits/assessments, and specifics of the compliance requirements. 40 | 41 | Cloud customers, particularly in public cloud, must rely more on third-party attestations of the provider to understand their compliance alignment and gaps. Since public cloud providers rely on economies of scale to manage costs they often will not allow customers to perform their own audits. Instead, similar to financial audits of public companies, they engage with a third-party firm to perform audits and issue attestations. Thus the cloud customer doesn't typically get to define the scope or perform the audit themselves. They will instead need to rely on these reports and attestations to determine if the service meets their compliance obligations. 42 | 43 | Many cloud providers are certified for various regulations and industry requirements, such as PCI DSS, SOC1, SOC2, HIPAA, best practices/frameworks like CSA CCM, and global/regional regulations like the EU GDPR. These are sometimes referred to as *pass-through audits*. A pass through audit is a form of *compliance inheritance*. In this model all or some of the cloud provider's infrastructure and services undergo an audit to a compliance standard. The provider takes responsibility for the costs and maintenance of these certifications. Provider audits, including pass-through audits, need to be understood within their limitations: 44 | 45 | * They certify that the *provider* is compliant. 46 | * It is still the responsibility of the customer to *build compliant applications and services on the cloud*. 47 | * This means the provider's infrastructure/services are not within scope of a customer's audit/assessment. But everything the customer builds themselves is still within scope. 48 | * The customer is still ultimately responsible for maintaining the compliance of what they build and manage. For example, if an IaaS provider is PCI DSS certified, the customer can build their own PCI-compliant service on that platform and the provider's infrastructure and operations should be outside the *customer's* assessment scope. However, the customer can just as easily run afoul of PCI and fail their assessment if they don't design their own application running in the cloud properly. 49 | 50 | ***************insert 4.1************** 51 | > With compliance inheritance the cloud provider's infrastructure is out of scope for a customer's compliance audit, but everything the customer configures and builds on top of the certified services is still within scope. 52 | 53 | Cloud compliance issues aren't merely limited to pass-through audits; the nature of cloud also creates additional differentiators. 54 | 55 | Many cloud providers offer globally distributed data centers running off a central management console/platform. It is still the customer's responsibility to manage and understand where to deploy data and services and still maintain their legal compliance across national and international jurisdictions. 56 | 57 | Organizations have the same responsibility in traditional computing, but the cloud dramatically reduces the friction of these potentially international deployments. E.g. 
a developer can potentially deploy regulated data in a non-compliant country without having to request an international data center and sign off on multiple levels of contracts, should the proper controls not be enabled to prevent this. 58 | 59 | Not all features and services within a given cloud provider are necessarily compliant and certified/audited with respect to all regulations and standards. It is incumbent on the cloud provider to communicate certifications and attestations clearly, and for customers to understand the scopes and limitations. 60 | 61 | ### Audit Management 62 | 63 | Proper organizational governance naturally includes audit and assurance. Audit must be independently conducted and should be robustly designed to reflect best practice, appropriate resources, and tested protocols and standards. Before delving into cloud implications we need to define the scope of audit management related to information security. 64 | 65 | Audits and assessments are mechanisms to *document compliance* with internal or external requirements (or identify deficiencies). Reporting needs to include a compliance determination as well as a list of identified issues, risks, and remediation recommendations. Audits and assessments aren't limited to information security, but those related to information security typically focus on evaluating the effectiveness of security management and controls. Most organizations are subject to a mix of internal and external audits and assessments to assure compliance with internal and external requirements. 66 | 67 | All audits have a variable scope and statement of applicability, which defines what is evaluated (e.g. all systems with financial data) and to which controls (e.g. an industry standard, custom scope, or both). An *attestation* is a legal statement from a third party, which can be used as their statement of audit findings. Attestations are a key tool when evaluating and working with cloud providers since the cloud customer does not always get to perform their own assessments. 68 | 69 | Audit management includes the management of all activities related to audits and assessments, such as determining requirements, scope, scheduling, and responsibilities. 70 | 71 | #### How cloud changes audit management 72 | 73 | Some cloud customers may be used to auditing third-party providers, but the nature of cloud computing and contracts with cloud providers will often preclude things like on-premise audits. Customers should understand that providers can (and often should) consider on-premise audits a security risk when providing multi-tenant services. Multiple on-premise audits from large numbers of customers present clear logistical and security challenges, especially when the provider relies on shared assets to create the resource pools. 74 | 75 | Customers working with these providers will have to rely more on third-party attestations than on audits they perform themselves. Depending on the audit standard, actual results may only be releasable under a nondisclosure agreement (NDA), which means customers will need to enter into a basic legal agreement before gaining access to attestations for risk assessments or other evaluative purposes. This is often due to legal or contractual requirements with the audit firm, not due to any attempts at obfuscation by the cloud provider.
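As a concrete way to track which provider services fall under which attestations, a simple register can help. The sketch below is a hypothetical, simplified example: the attestation identifiers, service names, and scope data are invented placeholders and would need to come from the provider's actual published attestations.

```python
# Hypothetical sketch: flag provider services used by a workload that fall
# outside a given attestation's published scope. The scope data below is
# invented; real values must come from the provider's attestation documents.

ATTESTATION_SCOPE = {
    "examplecloud-soc2-2023": {"object-storage", "virtual-machines", "managed-database"},
    "examplecloud-pci-dss-2023": {"object-storage", "virtual-machines"},
}

def out_of_scope_services(services_in_use, attestation_id):
    """Return the services a workload uses that the named attestation does not cover."""
    covered = ATTESTATION_SCOPE.get(attestation_id, set())
    return sorted(set(services_in_use) - covered)

if __name__ == "__main__":
    workload_services = ["object-storage", "managed-database", "serverless-functions"]
    gaps = out_of_scope_services(workload_services, "examplecloud-pci-dss-2023")
    # Anything listed here needs its own compliance evidence from the customer side.
    print("Services outside attestation scope:", gaps)
```

A register like this is only useful if it is kept in sync with the provider's current certifications, which reinforces the need for scheduled re-assessments.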
76 | 77 | Cloud providers should understand that customers still need assurance that the provider meets their contractual and regulatory obligations, and should thus provide rigorous third party attestations to prove they meet their obligations, especially when the provider does not allow direct customer assessments. These should be based on industry standards, with clearly defined scopes and the list of specific controls evaluated. Publishing certifications and attestations (to the degree legally allowed) will greatly assist cloud customers in evaluating providers. The Cloud Security Alliance STAR registry offers a central repository for providers to publicly release these documents. 78 | 79 | Some standards, like SSAE 16, attest that documented controls work as designed/required. The standard doesn't necessarily define the *scope of controls*, so both are needed to perform a full evaluation. Also, attestations and certifications don't necessarily apply equally to all services offered by a cloud provider. Providers should be clear about which services and features are covered, and it is the responsibility of the customer to pay attention and understand the implications on their use of the provider. 80 | 81 | Certain types of customer technical assessments and audits (such as a vulnerability assessment) may be limited in the provider's terms of service, and may require permission. This is often to help the provider distinguish between a legitimate assessment and an attack. 82 | 83 | It's important to remember that attestations and certifications are point-in-time activities. An attestation is a statement of an "over a period of time" assessment and may not be valid at any future point. Providers must keep any published results current or they risk exposing their customers to risks of non-compliance. Depending on contracts, this could even lead to legal exposures to the provider. Customers are also responsible for ensuring they rely on current results and track when their providers' statuses change over time. 84 | 85 | *Artifacts* are the logs, documentation, and other materials needed for audits and compliance; they are the evidence to support compliance activities. Both providers and customers have responsibilities for producing and managing their respective artifacts. Customers are ultimately responsible for the artifacts to support their own audits, and thus need to know what the provider offers, and create their own artifacts to cover any gaps. For example, by building more robust logging into an application since server logs on PaaS may not be available. 86 | 87 | **********insert 4.2*********** 88 | > Collecting and maintaining artifacts of compliance will change when using a cloud provider. 89 | 90 | ## Recommendations 91 | 92 | * Compliance, audit, and assurance should be *continuous*. They should not be seen as merely point in time activities, and many standards and regulations are moving more towards this model. This is especially true in cloud computing where both the provider and customer tend to be in more-constant flux and are rarely ever in a static state. 93 | * Cloud providers should: 94 | * Clearly communicate their audit results, certifications, and attestations with particular attention to: 95 | * The scope of assessments. 96 | * Which specific features/services are covered in which locations and jurisdictions. 97 | * How customer can deploy compliant applications and services on the cloud. 98 | * Any additional customer responsibilities and limitations. 
99 | * Cloud providers must maintain their certifications/attestations over time and proactively communicate any changes in status. 100 | * Cloud providers should engage in continuous compliance initiatives to avoid creating any gaps, and thus exposures, for their customers. 101 | * Provide customers with commonly needed evidence and artifacts of compliance, such as logs of administrative activity the customer cannot otherwise collect on their own. 102 | * Cloud customers should: 103 | * Understand their full compliance obligations before deploying, migrating to, or developing in the cloud. 104 | * Evaluate a provider's third-party attestations and certifications and align those to compliance needs. 105 | * Understand the scope of assessments and certifications, including both the controls and the features/services covered. 106 | * Attempt to select auditors with experience in cloud computing, especially if pass-through audits and certifications will be used to manage the customer's audit scope. 107 | * Ensure they understand what artifacts of compliance the provider offers, and effectively collect and manage those artifacts. 108 | * Create and collect their own artifacts when the provider's artifacts are not sufficient. 109 | * Keep a register of cloud providers used, relevant compliance requirements, and current status. The Cloud Security Alliance Cloud Controls Matrix can support this activity. 110 | 111 | 112 | -------------------------------------------------------------------------------- /Domain 5- Information Governance.md: -------------------------------------------------------------------------------- 1 | # Domain 5: Information Governance 2 | 3 | ## Introduction 4 | 5 | The primary goal of information security is to protect the fundamental data that powers our systems and applications. As companies transition to cloud computing, the traditional methods of securing data are challenged by cloud-based architectures. Elasticity, multi-tenancy, new physical and logical architectures, and abstracted controls require new data security strategies. In many cloud deployments, users even transfer data to external — or even public — environments in ways that would have been unthinkable only a few years ago. 6 | 7 | Managing information in the era of cloud computing is a daunting challenge that affects all organizations and requires not merely new technical protections but new approaches to fundamental governance. Although cloud computing has at least some effect on all areas of information governance, it particularly impacts compliance, privacy, and corporate policies due to the increased complexity in working with third parties and managing jurisdictional boundaries. 8 | 9 | Definition of information/data governance: 10 | 11 | *Ensuring the use of data and information complies with organizational policies, standards, and strategy — including regulatory, contractual, and business objectives.* 12 | 13 | Our data is always subject to a range of requirements. Some are placed on us by others — like regulatory agencies, customers, and partners — while others are self-defined based on our risk tolerance or simply how we want to manage operations. Information governance includes the corporate structures and controls we use to ensure we handle data in accordance with our goals and requirements. 14 | 15 | There are numerous aspects of having data stored in the cloud that have an impact on information and data governance requirements.
16 | 17 | * *Multi-tenancy:* Multi-tenancy presents complicated security implications. When data is stored in the public cloud, it's stored on shared infrastructure with other, untrusted tenants. Even in a private cloud environment it is stored and managed on infrastructure that's shared across different business units, which likely have different governance needs. 18 | 19 | * *Shared security responsibility:* With greater sharing of environments comes greater shared security responsibilities. Data is now more likely to be owned and managed by different teams or even organizations. So, it's important to recognize the difference between data custodianship and data ownership. 20 | 21 | * *Ownership*, as the name says, is about who owns the data. It's not always perfectly clear. If a customer provides you with data, you might own it or they might still legally own it, depending on law, contracts, and policies. If you host your data on a public cloud provider you *should* own it, but that might again depend on contracts. 22 | 23 | * *Custodianship* refers to who is *managing the data*. If a customer gives you their personal information and you don't have the rights to own it, you are merely the custodian. That means you can only use it in approved ways. If you use a public cloud provider they likewise become the custodian of the data, although you likely *also* have custodial responsibility depending on what controls you implement and manage yourself. Using a provider doesn't obviate your responsibility. Basically, the owner defines the rules (sometimes indirectly through regulation) and the custodian implements the rules. The lines and roles between owner and custodian are impacted by cloud infrastructure, particularly in the case of public cloud. 24 | 25 | By hosting customer data in the cloud, we are introducing a third party into the governance model: the cloud provider. 26 | 27 | * *Jurisdictional boundaries and data sovereignty* - Since cloud, by definition, enables broad network access, it increases the opportunities to host data in more locations (jurisdictions) and reduces the friction in migrating data. Some providers may not be as transparent about the physical location of the data, while in other cases additional controls may be needed to restrict data to particular locations. 28 | 29 | * *Compliance, regulations, and privacy policies* - All of these may be impacted by cloud due to the combination of a third-party provider and jurisdictional changes. E.g. your customer agreement may not allow you to share/use data on a cloud provider, or may have certain security requirements (like encryption). 30 | 31 | * *Destruction and removal of data* - This ties in to the technical capabilities of the cloud platform. Can you ensure the destruction and removal of data in accordance with policy? 32 | 33 | When migrating to cloud, use it as an opportunity to revisit information architectures. Many of our information architectures today are quite fractured as they were implemented over sometimes decades in the face of ever-changing technologies. Moving to cloud creates a greenfield opportunity to reexamine how you manage information and find ways to improve things. Don't lift and shift existing problems. 34 | 35 | ## Overview 36 | 37 | *Data/information governance* means ensuring that the use of data and information complies with organizational policies, standards, and strategy. This includes regulatory, contractual, and business requirements and objectives.
Data is different from information, but we tend to use them interchangeably. Information is data with value. For our purposes, we use both terms to mean the same thing since that is so common. 38 | 39 | ### Cloud Information Governance Domains 40 | 41 | We will not cover all of data governance, but we'll focus on where hosting in the cloud affects data governance. Cloud computing affects most data governance domains: 42 | 43 | * *Information Classification.* This is frequently tied to compliance and affects cloud destinations and handling requirements. Not everyone necessarily has a data classification program, but if you do, you need to adjust it for cloud computing. 44 | 45 | * *Information Management Policies.* These tie to classification, and if you have them, they need to be extended to cover the cloud. They should also cover the different SPI tiers, since sending data to a SaaS vendor versus building your own IaaS app is very different. You need to determine: What is allowed to go where in the cloud? Which products and services? With what security requirements? 46 | 47 | * *Location and Jurisdiction Policies.* These have very direct cloud implications. Any outside hosting must comply with locational and jurisdictional requirements. Understand that internal policies can be changed for cloud computing, but legal requirements are hard lines. (See the legal domain for more information on this.) Make sure you understand that treaties and laws may create conflicts. You need to work with your legal department when handling regulated data to ensure you comply as best you can. 48 | 49 | * *Authorizations.* Cloud computing requires minimal changes to authorizations, but see the data security lifecycle to understand how the cloud impacts them. 50 | 51 | * *Ownership.* Your organization is always responsible for data and information, and that responsibility can't be abrogated when moving to the cloud. 52 | 53 | * *Custodianship.* Your cloud provider may become the custodian. Data that is hosted but properly encrypted is still under the custodianship of the organization. 54 | 55 | * *Privacy.* Privacy is the sum of regulatory requirements, contractual obligations, and commitments to customers (e.g. public statements). You need to understand the total requirements and ensure information management and security policies align. 56 | 57 | * *Contractual controls.* This is your legal tool for extending governance requirements to a third party, like a cloud provider. 58 | 59 | * *Security controls.* Security controls are the tool to implement data governance. They change significantly in cloud computing. See the Data Security and Encryption domain. 60 | 61 | ### The Data Security Lifecycle 62 | 63 | Although Information Lifecycle Management is a fairly mature field, it doesn't map well to the needs of security professionals. The Data Security Lifecycle is different from Information Lifecycle Management, reflecting the different needs of the security audience. This is a summary of the lifecycle, and a complete version is available at http://www.securosis.com/blog/data-security-lifecycle-2.0. It is simply a tool to help understand the security boundaries and controls around data. It's not meant to be used as a rigorous tool for all types of data. It's a modeling tool to help evaluate data security at a high level and find focus points. 64 | 65 | The lifecycle includes six phases from creation to destruction.
Although it is shown as a linear progression, once created, data can bounce between phases without restriction, and may not pass through all stages (for example, not all data is eventually destroyed). 66 | 67 | **********Insert 5.1************ 68 | > The Data Security Lifecycle 69 | 70 | 1. *Create.* Creation is the generation of new digital content, or the alteration/updating/modifying of existing content. 71 | 2. *Store.* Storing is the act of committing the digital data to some sort of storage repository and typically occurs nearly simultaneously with creation. 72 | 3. *Use.* Data is viewed, processed, or otherwise used in some sort of activity, not including modification. 73 | 4. *Share.* Information is made accessible to others, such as between users, to customers, and to partners. 74 | 5. *Archive.* Data leaves active use and enters long-term storage. 75 | 6. *Destroy.* Data is permanently destroyed using physical or digital means (e.g., cryptoshredding). 76 | 77 | #### Locations and Entitlements 78 | 79 | The lifecycle represents the phases information passes through but doesn't address its location or how it is accessed. 80 | 81 | *Locations* 82 | 83 | This can be illustrated by thinking of the lifecycle not as a single, linear operation, but as a series of smaller lifecycles running in different operating environments. At nearly any phase data can move into, out of, and between these environments. 84 | 85 | ********Insert 5.2********* 86 | > Data is accessed and stored in multiple locations, each with its own lifecycle. 87 | 88 | Due to all the potential regulatory, contractual, and other jurisdictional issues, it is extremely important to understand both the logical and physical locations of data. 89 | 90 | *Entitlements* 91 | 92 | When users know where the data lives and how it moves, they need to know who is accessing it and how. There are two factors here: 93 | 94 | 1. Who accesses the data? 95 | 2. How can they access it (device & channel)? 96 | 97 | Data today is accessed using a variety of different devices. These devices have different security characteristics and may use different applications or clients. 98 | 99 | #### Functions, Actors, and Controls 100 | 101 | The next step identifies the functions that can be performed with the data, by a given actor (person or system) in a particular location. 102 | 103 | *Functions* 104 | 105 | There are three things we can do with a given datum: 106 | 107 | * *Read.* View/read the data, including creating, copying, file transfers, dissemination, and other exchanges of information. 108 | * *Process.* Perform a transaction on the data: update it; use it in a business processing transaction, etc. 109 | * *Store.* Hold the data (in a file, database, etc.). 110 | 111 | The table below shows which functions map to which phases of the lifecycle: 112 | 113 | Table 1—Information Lifecycle Phases 114 | 115 | | | Create | Store | Use | Share | Archive | Destroy | 116 | | ----- | ----- | ----- | ----- | ----- | ----- | ----- | 117 | | Read | X | X | X | X | X | X | 118 | | Process | X | | X | | | | 119 | | Store | | X | | | X | | 120 | 121 | An actor (person, application, or system/process, as opposed to the access device) performs each function in a location. 122 | 123 | *Controls* 124 | 125 | A control restricts a list of possible actions down to allowed actions. The table below shows one way to list the possibilities, which the user then maps to controls.
126 | 127 | Table 2—Possible and Allowed Controls 128 | 129 | | Function | | Actor | | Location | | 130 | | ----- | ----- | ----- | ----- | ----- | ----- | 131 | | Possible | Allowed | Possible | Allowed | Possible | Allowed | 132 | | | | | | | | 133 | | | | | | | | 134 | | | | | | | | 135 | 136 | **********Insert 5.3************** 137 | > Mapping the lifecycle to functions and controls. 138 | 139 | ## Recommendations 140 | 141 | * Determine your governance requirements for information before planning a transition to cloud. This includes legal and regulatory requirements, contractual obligations, and other corporate policies. Your corporate policies and standards may need to be updated to allow a third party to handle data. 142 | * Ensure information governance policies and practices extend to the cloud. This will be done through contractual and security controls. 143 | * When needed, use the data security lifecycle to help model data handling and controls. 144 | * Instead of lifting and shifting existing information architectures, take the opportunity of the migration to the cloud to re-think and re-structure what is often the fractured approach used in existing infrastructure. Don't bring bad habits. -------------------------------------------------------------------------------- /Domain 6- Management Plane and Business Continuity.md: -------------------------------------------------------------------------------- 1 | 2 | # Domain 6: Management Plane and Business Continuity 3 | 4 | ## Introduction 5 | 6 | The management plane is the single most significant security difference between traditional infrastructure and cloud computing. This isn't all of the metastructure (defined in Domain 1) but is the interface to connect with the metastructure and configure much of the cloud. 7 | 8 | We always have a management plane, the tools and interfaces we use to manage our infrastructure, platforms, and applications, but cloud abstracts and centralizes administrative management of resources. Instead of controlling a data center configuration with boxes and wires, it is now controlled with API calls and web consoles. 9 | 10 | Thus gaining access to the management plane is like gaining unfettered access to your data center, unless you put the proper security controls in place to limit who can access the management plane and what they can do within it. 11 | 12 | To think about it in security terms, the management plane consolidates many things we previously managed through separate systems and tools, and then makes them Internet-accessible with a single set of authentication credentials. This isn't a net loss for security — there are also gains — but it is most definitely different, and it impacts how we need to evaluate and manage security. 13 | 14 | Centralization also brings security benefits. There are no hidden resources; you always know where everything *you own* is at all times, and how it is configured. This is an emergent property of both broad network access and metered service. The cloud controller always needs to know what resources are in the pool, out of the pool, and where they are allocated. 15 | 16 | This doesn't mean that all the assets you put into the cloud are equally managed. The cloud controller can't peer into running servers or open up locked files, nor understand the implications of your specific data and information. 17 | 18 | In the end, this is an extension of the shared responsibility model discussed in Domain 1 and throughout this Guidance.
The cloud management plane is responsible for managing the assets of the resource pool, while the cloud consumer is responsible for how they configure those assets, and for the assets they deploy into the cloud. 19 | 20 | * The cloud provider is responsible for ensuring the management plane is secure and *necessary security features are exposed to the cloud consumer*, such as granular entitlements to control what someone can do even if they have management plane access. 21 | 22 | * The cloud consumer is responsible for properly configuring their use of the management plane, as well as for securing and managing their credentials. 23 | 24 | ### Business continuity and disaster recovery in the cloud 25 | 26 | BC/DR is just as important in cloud computing as it is for any other technology. Aside from the differences resulting from the potential involvement of a third-party provider (something we often deal with in BC/DR), there are additional considerations due to the inherent differences when using shared resources. 27 | 28 | The three main aspects of BC/DR in the cloud are: 29 | 30 | * Ensuring continuity and recovery within a given cloud provider. These are the tools and techniques to best architect your cloud deployment to keep things running if either what you deploy breaks, or a portion of the cloud provider breaks. 31 | 32 | * Preparing for and managing cloud provider outages. This extends from the more constrained problems that you can architect around within a provider to the wider outages that take down all or some of the provider in a way that exceeds the capabilities of inherent DR controls. 33 | 34 | * Considering options for portability, in case you need to migrate providers or platforms. This could be due to anything from desiring a different feature set to the complete loss of the provider if, for example, they go out of business or you have a legal dispute. 35 | 36 | #### Architect for failure 37 | 38 | Cloud platforms can be incredibly resilient, but single cloud assets are typically less resilient than in the case of traditional infrastructure. This is due to the inherently greater fragility of virtualized resources running in highly-complex environments. 39 | 40 | This mostly applies to compute, networking, and storage, since those allow closer to raw access, and cloud providers can leverage additional resiliency techniques for their platforms and applications that run on top of IaaS. 41 | 42 | However, this means that cloud providers tend to offer options to improve resiliency, often beyond that which is attainable (for equivalent costs) in traditional infrastructure. For example, by enabling multiple "zones" where you can deploy virtual machines within an auto-scaled group that encompasses physically distinct data centers for high-availability. Your application can be balanced across zones so that if an entire zone goes down your application still stays up. This is quite difficult to implement in a traditional data center, where it typically isn't cost-effective to build multiple, isolated physical zones across which you can deploy a cross-zone load-balanced application with automatic failover. 43 | 44 | But this extra resiliency is only achievable if you architect to leverage these capabilities. Deploying your application all in one zone, or even on a single virtual machine in a single zone, is likely to be less resilient than deploying on a single, well-maintained physical server. 
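To make the multiple-zone idea concrete, the following is a minimal, provider-agnostic sketch (the zone names and helper function are illustrative only, not part of any real cloud SDK) that checks whether a deployment keeps enough healthy capacity if any single zone is lost:

```python
# Minimal, provider-agnostic sketch (illustrative only, not a real cloud SDK):
# check whether a deployment retains enough capacity when any single zone fails.

def survives_single_zone_failure(instances_by_zone, min_healthy):
    """instances_by_zone maps a zone name to its instance count."""
    total = sum(instances_by_zone.values())
    # Simulate losing each zone in turn and verify the remaining capacity.
    return all(total - count >= min_healthy for count in instances_by_zone.values())

if __name__ == "__main__":
    balanced = {"zone-a": 2, "zone-b": 2, "zone-c": 2}
    single_zone = {"zone-a": 6}
    print(survives_single_zone_failure(balanced, min_healthy=4))     # True
    print(survives_single_zone_failure(single_zone, min_healthy=4))  # False
```

An auto-scaled group spread across zones behind a load balancer is the typical way providers expose this pattern, though the exact feature names and cost models vary by platform.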
45 | 46 | This is why "lift and shift" wholesale migration of existing applications without architectural changes can *reduce* resiliency. Existing applications are rarely architected and deployed to work with these resiliency options, yet straight-up virtualization and migration without changes can increase the odds of individual failures. 47 | 48 | The ability to manage resiliency yourself is higher with IaaS and much lower with SaaS, just like security. For SaaS you rely on the cloud provider keeping the entire application service up. With IaaS you can architect *your* application to account for failures, putting more responsibility in your hands. PaaS, as usual, is in the middle — some PaaS may have resiliency options that you can configure, while other platforms are completely in the hands of the provider. 49 | 50 | Overall, a risk-based approach is key: 51 | * Not all assets need equal continuity. 52 | * Don't drive yourself crazy by planning for full provider outages just because of the perceived loss of control. Look at historical performance. 53 | * Strive to design for RTOs and RPOs equivalent to those on traditional infrastructure. 54 | 55 | ## Overview 56 | 57 | ### Management Plane Security 58 | 59 | The management plane refers to the interfaces for managing your assets in the cloud. If you deploy virtual machines on a virtual network, the management plane is how you launch those machines and configure that network. For Software as a Service, the management plane is often the "admin" tab of the user interface, and is where you configure things like users, settings for the organization, etc. 60 | 61 | The management plane controls and configures the metastructure (defined in Domain 1), and is also part of the metastructure itself. As a reminder, cloud computing is taking physical assets (like networks and processors) and using them to build resource pools. Metastructure is the glue and guts to create, provision, and deprovision the pools. The management plane includes the interfaces for building and managing the cloud itself, but also the interfaces for cloud consumers to manage their own allocated resources of the cloud. 62 | 63 | The management plane is a key tool for enabling and enforcing separation and isolation in multi-tenancy. Limiting who can do what with the APIs is one important means for segregating out customers, or different users within a single tenant. 64 | 65 | #### Accessing the management plane 66 | 67 | APIs and web consoles are the way the management plane is delivered. Application Programming Interfaces allow for programmatic management of the cloud. They are the glue that holds the cloud's components together and enables their orchestration. Since not everyone wants to write programs to manage their cloud, web consoles provide visual interfaces. In many cases web consoles merely use the same APIs you can access directly. 68 | 69 | Cloud providers and platforms will also often offer Software Development Kits (SDKs) and Command Line Interfaces (CLIs) to make integrating with their APIs easier. 70 | 71 | * *Web consoles* are managed by the provider. They can be organization-specific (typically using DNS redirection tied to federated identity). For example, when you connect to your cloud file-sharing application you are redirected to your own "version" of the application after you log in. This version will have its own domain name associated with it, which allows you to integrate more easily with federated identity (e.g.
instead of all your users logging into "application.com" they log into "your-organization.application.com"). 72 | 73 | As mentioned, most web consoles offer a user interface for the same APIs that you can access directly. Although, depending on the platform or provider's development process, you may sometimes encounter a mismatch where either a web feature or an API call appear on one before the other. 74 | 75 | * *APIs* are typically [REST][1] for cloud services, since REST is easy to implement across the Internet. REST APIs have become the standard for web-based services since they run over HTTP/S and thus work well across diverse environments. 76 | 77 | These can use a variety of authentication mechanisms, as there is no single standard for authentication in REST. HTTP request signing and OAuth are the most common; both of these leverage cryptographic techniques to validate authentication requests. 78 | 79 | You still often see services that embed a password in the request. This is less secure and at higher risk for credential exposure. It's most often seen in older or poorly-designed web platforms that built their web interface first and only added consumer APIs later. If you do encounter this, you need to use dedicated accounts for API access if possible, in order to reduce the opportunities for credential exposure. 80 | 81 | #### Securing the management plane 82 | 83 | Identity and Access management (IAM) includes identification, authentication, and authorizations (including access management). This is how you determine who can do what within your cloud platform or provider. 84 | 85 | The specific options, configurations, and even concepts vary heavily between cloud providers and platforms. Each has their own implementation and may not even use the same definitions for things like "groups" and "roles." 86 | 87 | No matter the platform or provider there is always an account owner with super-admin privileges to manage the entire configuration. This should be enterprise-owned (not personal), tightly locked down, and nearly never used. 88 | 89 | Separate from the account-owner you can usually create super-admin accounts for individual admin use. Use these privileges sparingly; this should also be a smaller group since compromise or abuse of one of these accounts could allow someone to change or access essentially everything and anything. 90 | 91 | Your platform or provider may support lower-level administrative accounts that can only manage parts of the service. We sometimes call these "service administrators" or "day to day administrators". These accounts don't necessarily expose the entire deployment if they are abused or compromised and thus are better for common daily usage. They also help compartmentalize individual sessions, so it isn't unusual to allow a single human administrator access to multiple service administrator accounts (or roles) so they can log in with just the privileges they need for that particular action instead of having to expose a much wider range of entitlements. 92 | 93 | ********insert 6.1******** 94 | >Examples of baseline cloud management plane user accounts including super-administrators and service administrators. 95 | 96 | Both providers and consumers should consistently only allow the least privilege required for users, applications, and other management plane usage. 97 | 98 | All privileged user accounts should use multi-factor authentication (MFA). If possible, *all* cloud accounts (even individual user accounts) should use MFA. 
It is one of the most effective security controls for defending against a wide range of attacks. This is also true regardless of the service model: MFA is just as important for SaaS as it is for IaaS. 99 | 100 | (See the IAM domain for more information on IAM and the role of federation and strong authentication, much of which applies to the cloud management plane.) 101 | 102 | #### Management plane security when building/providing a cloud service 103 | 104 | When you are responsible for building and maintaining the management plane itself, such as in a private cloud deployment, that increases your responsibilities. When you consume the cloud you only configure the parts of the management plane that the provider exposes to you, but when you are the cloud provider you obviously are responsible for everything. 105 | 106 | Delving into implementation specifics is beyond the scope of this Guidance, but at a high level there are five major facets to building and managing a secure management plane: 107 | 108 | * *Perimeter security:* Protecting the management plane's components themselves, such as the web and API servers, from attack. It includes both lower-level network defenses as well as higher-level defenses against application attacks. 109 | 110 | * *Customer authentication:* Providing secure mechanisms for customers to authenticate to the management plane. This should use existing standards (like OAuth or HTTP request signing) that are cryptographically valid and well documented. Customer authentication should support MFA as an option or requirement. 111 | 112 | * *Internal authentication and credential passing:* The mechanisms your own employees use to connect with the non-customer-facing portions of the management plane. It also includes any translation between the customer's authentication and any internal API requests. Cloud providers should always mandate MFA for cloud management authentication. 113 | 114 | * *Authorization and entitlements:* The entitlements available to customers and the entitlements for internal administrators. Granular entitlements better enable customers to securely manage their own users and administrators. Internally, granular entitlements reduce the impact of compromised administrator accounts or employee abuse. 115 | 116 | * *Logging, monitoring, and alerting:* Robust logging and monitoring of administrative activity is essential for effective security and compliance. This applies both to what the customer does in their account, and to what employees do in their day-to-day management of the service. Alerting on unusual events is an important security control to ensure that monitoring is actionable, and not merely something you look at after the fact. Cloud customers should ideally be able to access logs of their own activity in the platform via API or another mechanism in order to integrate with their own security logging systems. 117 | 118 | ### Business continuity and disaster recovery 119 | 120 | Like security and compliance, business continuity and disaster recovery (BC/DR) is a shared responsibility. There are aspects that the cloud provider has to manage, but the cloud customer is also ultimately responsible for how they use and manage the cloud service. This is especially true when planning for outages of the cloud provider (or parts of the cloud provider's service). 121 | 122 | Also similar to security, customers have more control and responsibility in IaaS, less in SaaS, with PaaS in the middle. 123 | 124 | BC/DR must take a risk-based approach.
Many BC options may be cost prohibitive in the cloud, but may also not be necessary. This is no different than in traditional data centers, but it isn't unusual to want to over-compensate when losing physical control. For example, the odds of a major IaaS provider going out of business or changing their entire business model is low, but this isn't all that uncommon for a smaller venture-backed SaaS provider. 125 | 126 | * Ask the provider for outage statistics over time since this can help inform your risk decisions. 127 | 128 | * Remember that capabilities vary between providers and should be included in the vendor selection process. 129 | 130 | #### Business continuity within the cloud provider 131 | 132 | When you deploy assets into the cloud you can't assume the cloud will always be there, or always work the way you expect. Outages and issues are no more or less common than with any other technology, although the cloud can be overall more resilient when the provider includes mechanisms to better enable building resilient applications. 133 | 134 | This is a key point we need to spend a little more time on: As we've mentioned in a few places the very nature of virtualizing resources into pools typically creates *less* resiliency for any single asset, like a virtual machine. On the other hand, abstracting resources and managing everything through software opens up flexibility to more easily enable resiliency features like durable storage and cross-geographic load balancing. 135 | 136 | There is a huge range of options here, and not all providers or platforms are created equal, but you shouldn't assume that "the cloud" as a general term is more or less resilient than traditional infrastructure. Sometimes it's better, sometimes it's worse, and knowing the difference all comes down to your risk assessment and *how* you use the cloud service. 137 | 138 | This is why it is typically best to re-architect deployments when you migrate them to the cloud. Resiliency itself, and the fundamental mechanisms for ensuring resiliency, change. Direct "lift and shift" migrations are less likely to account for failures, nor will they take advantage of potential improvements from leveraging platform or service specific capabilities. 139 | 140 | The focus is on understanding and leveraging the platform's BC/DR features. Once you make the decision to deploy in the cloud you then want to optimize your use of included BC/DR features before adding on any additional capabilities through third-party tools. 141 | 142 | BC/DR must account for the entire logical stack: 143 | 144 | * Metastructure 145 | 146 | Since cloud configurations are controlled by software, these configurations should be backed up in a restorable format. This isn't always possible, and is pretty rare in SaaS, but there are tools to implement this in many IaaS platforms (including third-party options) using *Software Defined Infrastructure*. 147 | 148 | *Software Defined Infrastructure* allows you to create an infrastructure template to configure all or some aspects of a cloud deployment. These templates are then translated natively by the cloud platform or into API calls that orchestrate the configuration. 149 | 150 | This should include controls like IAM and logging, not merely architecture, network design, or service configurations. 151 | 152 | * Infrastructure 153 | 154 | As mentioned, any provider will offer features to support higher availability than can comparably be achieved in a traditional data center for the same cost. 
But these only work if you adjust your architecture. "Lifting and shifting" applications to the cloud without architectural adjustments or redesign will often result in lower availability. 155 | 156 | Be sure and understand the cost model for these features, especially for implementing them across the provider's physical locations/regions, where the cost can be high. Some assets and data must be converted to work across cloud locations/regions, for example, custom machine images used to launch servers. These assets must be included in plans. 157 | 158 | * Infostructure 159 | 160 | Data synchronization is often one of the more difficult issues to manage across locations, even if the actual storage costs are manageable. This is due to the size of data sets (vs. an infrastructure configuration) and keeping data in sync across locations and services, something that's often difficult even in a single storage location/system. 161 | 162 | * Applistructure 163 | 164 | Applistructure includes all of the above, but also the application assets like code, message queues, etc. When a cloud consumer builds their own cloud applications it's usually built on top of IaaS and/or PaaS, so resiliency and recovery are inherently tied to those layers. But Applistructure includes the full range of everything in an application. 165 | 166 | Understand PaaS limitations and lock-ins, and plan for the outage of a PaaS component. Platform services include a range of functions we used to manually implement in applications, everything from authentication systems to message queues and notifications. It isn't unusual for modern applications to even integrate these kinds of services from multiple different cloud providers, creating an intricate web. 167 | 168 | Discussing availability of the component/service with your providers is reasonable. For example, the database service from your infrastructure provider may not share the same performance and availability as their virtual machine hosting. 169 | 170 | When real-time switching isn't possible, design your application to gracefully fail in case of a service outage. There are many automation techniques to support this. For example, if your queue service goes down, that should trigger halting the front end so messages aren't lost. 171 | 172 | Downtime is always an option. You don't always need perfect availability, but if you *do* plan to accept an outage you should at least ensure you fail gracefully, with emergency downtime notification pages and responses. This may be possible using static stand-by via DNS redirection. 173 | 174 | "Chaos Engineering" is often used to help build resilient cloud deployments. Since everything cloud is API-based, Chaos Engineering uses tools to selectively degrade portions of the cloud to continuously test business continuity. 175 | 176 | This is often done in production, not just test environment, and forces engineers to assume failure instead of viewing it as only a possible event. By *designing systems for failure you can better absorb individual component failures*. 177 | 178 | #### Business continuity for loss of the cloud provider 179 | 180 | It is always possible that an entire cloud provider, or at least a major portion of their infrastructure (such as one specific geography) can go down. Planning for cloud provider outages is difficult, due to the natural lock-in of leveraging a provider's capabilities. 
Sometimes you can migrate to a different portion of their service, but in other cases an internal migration simply isn't an option, or you may be totally locked in. 181 | 182 | Depending on the history of your provider, and their internal availability capabilities, accepting this risk is often a legitimate option. 183 | 184 | Downtime may be another option, but it depends on your recovery time objective (RTO). However, some sort of static stand-by should be available via DNS redirection. Graceful failure should also include failure responses to API calls, if you offer APIs. 185 | 186 | Be wary of selecting a secondary provider or service if that service may also be located on, or reliant on, the same provider. It doesn't do you any good to use a backup storage provider if that provider happens to be based on the same infrastructure provider. 187 | 188 | Moving data between providers can be difficult, but it might be easy compared to moving metastructure, security controls, logging, and so on, which may be incompatible between platforms. 189 | 190 | SaaS may often be the biggest provider outage concern, due to total reliance on the provider. Scheduled data extraction and archiving may be your only BC option outside of accepting downtime. Extracting and archiving to another cloud service, especially IaaS/PaaS, may be a better option than moving it to local/on-premise storage. Again, take a risk-based approach that includes the unique history of your provider. 191 | 192 | Even if you have your data, you must have an alternate application that you know you can migrate it into. If you can't use the data, you don't have a viable recovery strategy. 193 | 194 | Test, test, and test. This may often be easier than in a traditional data center because you aren't constrained by physical resources, and only pay for use of certain assets during the life of the test. 195 | 196 | #### Business continuity for private cloud and providers 197 | 198 | This is completely on the provider's shoulders, and BC/DR includes everything down to the physical facilities. RTOs and RPOs will be stringent, since if the cloud goes down, everything goes down. 199 | 200 | If you are providing services to others, be aware of contractual requirements, including data residency, when building your BC plans. For example, failing over to a different geography in a different legal jurisdiction may violate contracts or local laws. 201 | 202 | ## Recommendations 203 | 204 | * Management plane (metastructure) security 205 | * Ensure there is strong perimeter security for API gateways and web consoles. 206 | * Use strong authentication and MFA. 207 | * Maintain tight control of primary account holder/root account credentials and consider dual-authority to access them. 208 | * Establishing multiple accounts with your provider will help with account granularity and limit the blast radius (with IaaS and PaaS). 209 | * Use separate super administrator and day-to-day administrator accounts instead of root/primary account holder credentials. 210 | * Consistently implement least privilege accounts for metastructure access. 211 | * This is why you separate development and test accounts with your cloud provider. 212 | * Enforce use of MFA whenever available. 213 | * Business continuity 214 | * Architect for failure. 215 | * Take a risk-based approach to everything. Even when you assume the worst, it doesn't mean you can afford or need to keep full availability if the worst happens. 216 | * Design for high availability within your cloud provider.
In IaaS and PaaS this is often easier and more cost effective than the equivalent in traditional infrastructure. 217 | * Take advantage of provider-specific features. 218 | * Understand provider history, capabilities, and limitations. 219 | * Cross-location should always be considered, but beware of costs depending on availability requirements. 220 | * Also ensure things like images and asset IDs are converted to work in the different locations. 221 | * Business Continuity for metastructure is as important as that for assets. 222 | * Prepare for graceful failure in case of a cloud provider outage. 223 | * This can include plans for interoperability and portability with other cloud providers or a different region with your current provider. 224 | * For super-high-availability applications, start with cross-location BC before attempting cross-provider BC. 225 | * Cloud providers, including private cloud, must provide the highest levels of availability and mechanisms for customers/users to manage aspects of their own availability. 226 | 227 | [1]: https://en.wikipedia.org/wiki/Representational_state_transfer -------------------------------------------------------------------------------- /Domain 8- Virtualization and Containers.md: -------------------------------------------------------------------------------- 1 | # Domain 8: Virtualization and Containers 2 | 3 | ## Introduction 4 | 5 | Virtualization isn't merely a tool for creating virtual machines—it's the core technology for enabling cloud computing. We use virtualization all throughout computing, from full operating virtual machines to virtual execution environments like the Java Virtual Machine, as well as in storage, networking, and beyond. 6 | 7 | Cloud computing is fundamentally based on pooling resources and virtualization is the technology used to convert fixed infrastructure into these pooled resources. Virtualization provides the *abstraction* needed for resource pools, which are then managed using orchestration. 8 | 9 | As mentioned, virtualization covers an extremely wide range of technologies; essentially any time we create an abstraction, we're using virtualization. For cloud computing we tend to focus on those specific aspects of virtualization used to create our resource pools, especially: 10 | 11 | * Compute 12 | * Network 13 | * Storage 14 | * Containers 15 | 16 | The aforementioned aren't the only categories of virtualization, but they are the ones most relevant to cloud computing. 17 | 18 | Understanding the impacts of virtualization on security is fundamental to properly architecting and implementing cloud security. Virtual assets provisioned from a resource pool may *look* just like the physical assets they replace, but that look and feel is really just a tool to help us better understand and manage what we see. It's also a useful way to leverage existing technologies, like operating systems, without having to completely rewrite them from scratch. Underneath, these virtual assets work completely differently from the resources they are abstracted from. 19 | 20 | ## Overview 21 | 22 | At its most basic, virtualization abstracts resources from their underlying physical assets. You can virtualize nearly anything in technology, from entire computers to networks to code. As mentioned in the introduction, cloud computing is fundamentally based on virtualization: It's how we abstract resources to create pools. Without virtualization, there is no cloud. 
23 | 24 | Many security processes are designed with the expectation of physical control over the underlying infrastructure. While this doesn't go away with cloud computing, virtualization adds two new layers for security controls: 25 | 26 | * *Security of the virtualization technology itself.* E.g. securing a hypervisor. 27 | * *Security controls for the virtual assets.* In many cases, this must be implemented differently than it would be in the corresponding physical equivalent. For example, as discussed in Domain 7, virtual firewalls are not the same as physical firewalls, and mere abstraction of a physical firewall into a virtual machine still may not meet deployment or security requirements. 28 | 29 | Virtualization security in cloud computing still follows the shared responsibility model. The cloud provider will always be responsible for securing the physical infrastructure and the virtualization platform itself. Meanwhile, the cloud customer is responsible for properly implementing the available virtualized security controls and understanding the underlying risks, based on what is implemented and managed by the cloud provider. For example, deciding when to encrypt virtualized storage, properly configuring the virtual network and firewalls, or deciding when to use dedicated hosting vs. a shared host. 30 | 31 | Since many of these controls touch upon other areas of cloud security, such as data security, we try to focus on the virtualization-specific concerns in this domain. The lines aren't always clear, however, and the bulk of cloud security controls are covered more deeply in the other domains of this Guidance. Domain 7: Infrastructure Security focuses extensively on virtual networks and workloads. 32 | 33 | ### Major virtualization categories relevant to cloud computing 34 | 35 | #### Compute 36 | 37 | Compute virtualization abstracts the running of code (including operating systems) from the underlying hardware. Instead of running directly on the hardware, the code runs on top of an abstraction layer that enables more flexible usage, such as running multiple operating systems on the same hardware (virtual machines). This is a simplification and we recommend further research into virtual machine managers and hypervisors if you are interested in learning more. 38 | 39 | Compute most commonly refers to virtual machines, but this is quickly changing, in large part due to ongoing technology evolution and adoption of containers. 40 | 41 | Containers and certain kinds of serverless infrastructure also abstract compute. These are different abstractions to create code execution environments, but they don't abstract a full operating system as a virtual machine does. (Containers are covered in more detail below.) 42 | 43 | ##### Cloud Provider Responsibilities 44 | 45 | The primary security responsibilities of the cloud provider in compute virtualization are to enforce *isolation* and maintain a *secure virtualization infrastructure*. 46 | 47 | * *Isolation* ensures that compute processes or memory in one virtual machine/container should not be visible to another. It is how we separate different tenants, even when they are running processes on the same physical hardware. 48 | 49 | * The cloud provider is also responsible for securing the *underlying infrastructure and the virtualization technology* from external attack or internal misuse. This means using patched and up-to-date hypervisors that are properly configured and supported with processes to keep them up to date and secure over time. 
The inability to patch hypervisors across a cloud deployment could create a fundamentally insecure cloud when a new vulnerability in the technology is discovered. 50 | 51 | Cloud providers should also support secure use of virtualization for cloud consumers. This means creating a secure chain of processes from the image (or other source) used to run the virtual machine all the way through a boot process with security and integrity. This ensures that tenants cannot launch machines based on images that they shouldn't have access to, such as those belonging to another tenant, and that a running virtual machine (or other process) is the one the customer expects to be running. 52 | 53 | In addition, cloud providers should assure customers that volatile memory is safe from unapproved monitoring, since important data could be exposed if another tenant, a malicious employee, or even an attacker is able to access running memory. 54 | 55 | ##### Cloud Consumer Responsibilities 56 | 57 | Meanwhile, the primary responsibility of the cloud consumer is to properly implement the security of whatever it deploys within the virtualized environment. Since the onus of compute virtualization security is on the provider, the customer tends to have only a few security options relating directly to the virtualization of the workload. There is quite a bit more to securing workloads, and that is covered in Domain 7. 58 | 59 | That said, there are still some virtualization-specific differences that the cloud consumer can address in their security implementation. Firstly, the cloud consumer should take advantage of the security controls for *managing* their virtual infrastructure, which will vary based on the cloud platform and often include: 60 | 61 | * *Security settings, such as identity management, for the virtual resources.* This is not the identity management *within* the resource, such as the operating system login credentials, but the identity management of who is allowed to access the cloud management of the resource—for example, stopping or changing the configuration of a virtual machine. See Domain 6 for specifics on management plane security. 62 | 63 | * *Monitoring and logging.* Domain 7 covers monitoring and logging of workloads, including how to handle system logs from virtual machines or containers, but the cloud platform will likely offer additional logging and monitoring at the virtualization level. This can include the status of a virtual machine, management events, performance, etc. 64 | 65 | * *Image asset management.* Cloud compute deployments are based on master images—be it a virtual machine, container, or other code—that are then run in the cloud. This is often highly automated and results in a larger number of images to base assets on, compared to traditional computing master images. Managing these—including which meet security requirements, where they can be deployed, and who has access to them—is an important security responsibility (a brief audit sketch follows this list). 66 | 67 | * *Use of dedicated hosting*, if available, based on the security context of the resource. In some situations you can specify that your assets run on hardware dedicated only to you (at higher cost), even on a multi-tenant cloud. This may help meet compliance requirements or satisfy security needs in special cases where sharing hardware with another tenant is considered a risk.
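As an illustration of the image asset management point above, the following is a minimal audit sketch. It assumes an AWS-style IaaS environment (boto3) and a hypothetical `ApprovedBaseline` tag applied by your image-build pipeline; other providers expose similar inventory APIs.

```python
import boto3  # assumes AWS-style IaaS credentials are configured; the tag name is a hypothetical convention

ec2 = boto3.client("ec2")

# List the account's own machine images and flag any that lack the
# (hypothetical) "ApprovedBaseline" tag set by the image-build pipeline.
images = ec2.describe_images(Owners=["self"])["Images"]

for image in images:
    tags = {tag["Key"]: tag["Value"] for tag in image.get("Tags", [])}
    if tags.get("ApprovedBaseline") != "true":
        print(f"Unapproved image: {image['ImageId']} ({image.get('Name', 'unnamed')})")
```

A report like this is only a starting point; the same inventory data can drive alerts or automated deregistration, depending on your risk tolerance.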
68 | 69 | Secondly, the cloud consumer is also responsible for security controls *within* the virtualized resource: 70 | 71 | * This includes all the standard security for the workload, be it a virtual machine, container, or application code. These are well covered by standard security best practices and the additional guidance in Domain 7. 72 | 73 | * Of particular concern is ensuring deployment of only secure configurations (e.g. a patched, updated virtual machine image). Due to the automation of cloud computing it is easy to deploy older configurations that may not be patched or properly secured. 74 | 75 | Other general compute security concerns include: 76 | 77 | * Virtualized resources tend to be more ephemeral and change at a more rapid pace. Any corresponding security, such as monitoring, must keep up with the pace. Again, the specifics are covered in more depth in Domain 7. 78 | 79 | * Host-level monitoring/logging may not be available, especially for serverless deployments. Alternative log methods may need to be implemented. For example, in a serverless deployment, you are unlikely to see system logs of the underlying platform and should offset by writing more robust application logging into your code. 80 | 81 | ### Network 82 | 83 | There are multiple kinds of virtual networks, from basic Virtual Local Area Networks (VLANs) to full Software Defined Networks (SDN). As a core of cloud infrastructure security these are covered both here and in Domain 7. 84 | 85 | To review, most cloud computing today uses SDN for virtualizing networks. (VLANs are often not suitable for cloud deployments since they lack important isolation capabilities for multitenancy.) 86 | 87 | SDN abstracts the network management plane from the underlying physical infrastructure, removing many typical networking constraints. For example, you can overlay multiple virtual networks, even ones that completely overlap their address ranges, over the same physical hardware, with all traffic properly segregated and isolated. SDNs are also defined using software settings and API calls, which supports orchestration and agility. 88 | 89 | Virtual networks are quite different than physical networks. They run on physical networks, but abstraction allows for deep modification on networking behavior in ways that impact many security processes and technologies. 90 | 91 | #### Monitoring and Filtering 92 | 93 | In particular, monitoring and filtering (including firewalls) change extensively due to the differences in how packets move around the virtual network. Resources may communicate on a physical server without traffic crossing the physical network. For example, if two virtual machines are located on the same physical machine there is no reason to route network traffic off the box and onto the network. Thus they can communicate directly, and monitoring and filtering tools inline on the network (or attached to the routing/switching hardware) will never see the traffic. 94 | 95 | *************insert 8.1************* 96 | > Virtual networks move packets in software and monitoring can't rely on sniffing the physical network connections. 97 | 98 | To compensate, you can route traffic to a virtual network monitoring or filtering tool on the same hardware (including a virtual machine version of a network security product). You can also bridge all network traffic back out to the network, or route it to a virtual appliance on the same virtual network. 
Each of these approaches has drawbacks since they create bottlenecks and less-efficient routing. 99 | 100 | The cloud platform/provider may not support access for direct network monitoring. Public cloud providers rarely allow full packet network monitoring to customers, due to the complexity (and cost). Thus you can't assume you will ever have access to raw packet data unless you collect it yourself in the host or by using a virtual appliance. 101 | 102 | With public cloud in particular, some communications between cloud services will occur on the provider's network; customer monitoring and filtering of that traffic isn't possible (and would create a security risk for the provider). For example, if you connect a serverless application to the cloud provider's object storage, database platform, message queue, or other PaaS product, this traffic would run natively on the provider's network, not necessarily within the customer-managed virtual network. As we move out of simple infrastructure virtualization the concept of a customer-managed network begins to fade. 103 | 104 | However, all modern cloud platforms offer built-in firewalls, which may offer advantages over corresponding physical firewalls. These are software firewalls that may operate within the SDN or the hypervisor. They typically offer fewer features than a modern, dedicated next-generation firewall, but these capabilities may not always be needed due to other inherent security provided by the cloud provider. 105 | 106 | #### Management Infrastructure 107 | 108 | Virtual networks for cloud computing always support remote management and, as such, securing the management plane/metastructure is critical. At times it is possible to create and destroy entire complex networks with a handful of API calls or a few clicks on a web console. 109 | 110 | ##### Cloud Provider Responsibilities 111 | 112 | The cloud provider is primarily responsible for building a secure network infrastructure and configuring it properly. The absolute top security priority is segregation and isolation of network traffic to prevent tenants from viewing one another's traffic. This is the most foundational security control for any multitenant network. 113 | 114 | The provider should disable packet sniffing or other metadata "leaks" that could expose data or configurations between tenants. Packet sniffing, even within a tenant's own virtual networks, should also be disabled to reduce the ability of an attacker to compromise a single node and use it to monitor the network, as is common on non-virtualized networks. Tagging or other SDN-level metadata should also not be exposed outside the management plane, or a compromised host could be used to span into the SDN itself. 115 | 116 | All virtual networks should enable built-in firewall capabilities for cloud consumers without the need for host firewalls or external products. The provider is also responsible for detecting and preventing attacks on the underlying physical network and virtualization platform. This includes perimeter security of the cloud itself. 117 | 118 | ##### Cloud Consumer Responsibilities 119 | 120 | Cloud consumers are primarily responsible for properly configuring their deployment of the virtual network, especially any virtual firewalls. 121 | 122 | Network architecture can play a larger role in virtual network security since we aren't constrained by physical connections and routing.
Since virtual networks are software constructs, the use of multiple, separate virtual networks may offer extensive compartmentalization advantages not possible on a traditional physical network. You can run every application stack in its own virtual network, which dramatically reduces the attack surface if a malicious actor gains a foothold. An equivalent architecture on a physical network is cost prohibitive. 123 | 124 | *Immutable* networks can be defined on some cloud platforms using software templates, which can help enforce known-good configurations. The entire known-good state of the network can be defined in a template, instead of having to manually configure all the settings. Aside from the ability to create multiple networks with a secure baseline, these can also be used to detect, and in some cases revert, deviations from known-good states. 125 | 126 | The cloud consumer is, again, responsible for proper rights management and configuration of exposed controls in the management plane. When virtual firewalls and/or monitoring don't meet security needs, the consumer may need to compensate with a virtual security appliance or host security agent. This falls under cloud infrastructure security and is covered in depth in Domain 7. 127 | 128 | #### Cloud Overlay Networks 129 | 130 | Cloud overlay networks are a special kind of WAN virtualization technology for creating networks that span multiple "base" networks. For example, an overlay network could span physical and cloud locations or multiple cloud networks, perhaps even on different providers. A full discussion is beyond the scope of this Guidance, but the same core security recommendations apply. 131 | 132 | ### Storage 133 | 134 | Storage virtualization is already common in most organizations—SAN and NAS are both common forms of storage virtualization—and storage security is discussed in more detail in Domain 11. 135 | 136 | Most virtualized storage is *durable* and keeps multiple copies of data in different locations so that drive failures are less likely to result in data loss. Encrypting those drives reduces the concern that swapping out a drive, which is a very frequent activity, could result in data exposure. 137 | 138 | However, this encryption doesn't protect data in any virtualized layers; it only protects the data at the physical storage layer. Depending on the type of storage the cloud provider may also (or instead) encrypt it at the virtualization layer, but this may not protect customer data from exposure to the cloud provider. Thus any additional protection should be provided using the advice in Domain 11. 139 | 140 | ### Containers 141 | 142 | Containers are highly portable code execution environments. To simplify, a virtual machine is a complete operating system, all the way down to the kernel. A container, meanwhile, is a virtual execution environment that features an isolated user space, but uses a shared kernel. A full discussion is beyond the scope of this Guidance and [you can read more about software containers at this Wikipedia entry](https://en.wikipedia.org/wiki/Operating-system-level_virtualization). 143 | 144 | Such containers can be built directly on top of physical servers or run on virtual machines. Current implementations rely on an existing kernel/operating system, which is why they can run inside a virtual machine even if nested virtualization is not supported by the hypervisor. (Software containers rely on a completely different isolation technology than hypervisors.)
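The shared-kernel point is easy to demonstrate. The minimal sketch below, which assumes a local Docker installation and access to the small `alpine` image, compares the kernel reported inside a container with the host's kernel; unlike a virtual machine, the container reports the host's kernel because it never boots its own.

```python
import platform
import subprocess

# A container shares the host kernel, so the kernel version reported inside it
# matches the host. A virtual machine, by contrast, boots its own kernel.
host_kernel = platform.release()

# Assumes Docker is installed locally and the "alpine" image can be pulled.
container_kernel = subprocess.run(
    ["docker", "run", "--rm", "alpine", "uname", "-r"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(f"Host kernel:      {host_kernel}")
print(f"Container kernel: {container_kernel}")
print("Shared kernel" if host_kernel == container_kernel else "Different kernels")
```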
145 | 146 | Software container systems always include three key components: 147 | * The execution environment (the container). 148 | * An orchestration and scheduling controller (which can be a collection of multiple tools). 149 | * A repository for the container images or code to execute. 150 | 151 | Together these are the place to run things, the things to run, and the management system to tie them together. 152 | 153 | Regardless of the technology platform, container security includes: 154 | 155 | * *Assuring the security of the underlying physical infrastructure (compute, network, storage).* This is no different than any other form of virtualization, but it now extends into the underlying operating system where the container's execution environment runs. 156 | 157 | * *Assuring the security of the management plane*, which in this case are the orchestrator and the scheduler. 158 | 159 | * *Properly securing the image repository.* The image repository should be in a secure location with appropriate access controls configured. This is both to prevent loss or unapproved modification of container images and definition files, as well as to forestall leaks of sensitive data through unapproved access to the files. Containers run so easily that it's also important that images are only able to deploy in the right security context. 160 | 161 | * *Building security into the tasks/code running inside the container.* It's still possible to run vulnerable software inside a container and, in some cases, this could expose the shared operating system or data from other containers. For example, it is possible to configure some containers to allow not merely access to the container's data on the file system but also root file system access. Allowing too much network access is also a possibility. These are all specific to the particular container platform and thus require securely configuring both the container environment *and* the images/container configurations themselves. 162 | 163 | Containers are rapidly evolving, which complicates some aspects of security, but doesn't mean that they are inherently insecure. 164 | 165 | Containers don't necessarily provide full security isolation, but they do provide task segregation. That said, virtual machines typically *do* provide security isolation. Thus you can put tasks of equivalent security context on the same set of physical or virtual hosts in order to provide greater security segregation. 166 | 167 | Container management systems and image repositories also have different security capabilities, based on which products you use. Security should learn and understand the capabilities of the products they need to support. Products should, at a minimum, support role-based access controls and strong authentication. They should also support secure configurations, such as isolating file system, process, and network access. 168 | 169 | A deep understanding of container security relies on a deep understanding of operating system internals, such as namespaces, network port mapping, memory, and storage access. 170 | 171 | Different host operating systems and container technologies offer different security capabilities. This assessment should be included in any container platform selection process. 172 | 173 | One key area to secure is which images/tasks/code are allowed into a particular execution environment. A secure repository with proper container management and scheduling will enable this. 
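As one illustration of controlling which images are allowed into an execution environment, here is a minimal, hypothetical sketch of an image-digest allowlist check performed before deployment. In practice this logic usually lives in the orchestrator's admission or policy controls rather than in application code, and the digest value and `deploy()` hand-off below are placeholders.

```python
# A minimal sketch of gating deployment on an image-digest allowlist.
# The digest value and the deploy() hand-off are hypothetical placeholders.

ALLOWED_DIGESTS = {
    "sha256:1111111111111111111111111111111111111111111111111111111111111111",
}

def is_approved(image_digest: str) -> bool:
    """Return True only for images explicitly approved for this environment."""
    return image_digest in ALLOWED_DIGESTS

def deploy(image_digest: str) -> None:
    if not is_approved(image_digest):
        raise PermissionError(f"Image {image_digest} is not on the approved list")
    print(f"Deploying {image_digest} ...")  # hand off to the scheduler/orchestrator here

deploy("sha256:1111111111111111111111111111111111111111111111111111111111111111")
```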
174 | 175 | ## Recommendations 176 | 177 | * Cloud providers should: 178 | * Inherently secure any underlying physical infrastructure used for virtualization. 179 | * Focus on assuring security isolation between tenants. 180 | * Provide sufficient security capabilities at the virtualization layers to allow cloud consumers to properly secure their assets. 181 | * Strongly defend the physical infrastructure and virtualization platforms from attack or internal compromise. 182 | * Implement all customer-managed virtualization features with a secure-by-default configuration. 183 | * Specific priorities: 184 | * Compute 185 | * Use secure hypervisors and implement a patch management process to keep them up to date. 186 | * Configure hypervisors to isolate virtual machines from each other. 187 | * Implement internal processes and technical security controls to prevent admin/non-tenant access to running VMs or volatile memory. 188 | * Network 189 | * Implement essential perimeter security defenses to protect the underlying networks from attack and, wherever possible, to detect and prevent attacks against consumers at the physical level, as well as at any virtual network layers that they can't directly protect themselves. 190 | * Assure isolation between virtual networks, even if those networks are all controlled by the same consumer. 191 | * Unless the consumer deliberately connects the separate virtual networks. 192 | * Implement internal security controls and policies to prevent both modification of consumer networks and monitoring of traffic without approval or outside contractual agreements. 193 | * Storage 194 | * Encrypt any underlying physical storage, if it is not already encrypted at another level, to prevent data exposure during drive replacements. 195 | * Isolate encryption from data-management functions to prevent unapproved access to customer data. 196 | 197 | * Cloud consumers should: 198 | * Ensure they understand the capabilities offered by their cloud providers as well as any security gaps. 199 | * Properly configure virtualization services in accordance with the guidance from the cloud provider and other industry best practices. 200 | * The bulk of fundamental virtualization security falls on the cloud provider, which is why most of the security recommendations for cloud consumers are covered in the other domains of this Guidance. 201 | * For containers: 202 | * Understand the security isolation capabilities of both the chosen container platform and underlying operating system, then choose the appropriate configuration. 203 | * Use physical or virtual machines to provide container isolation and group containers of the same security contexts on the same physical and/or virtual hosts. 204 | * Ensure that only approved, known, and secure container images or code can be deployed. 205 | * Appropriately secure the container orchestration/management and scheduler software stack(s). 206 | * Implement appropriate role-based access controls and strong authentication for all container and repository management. -------------------------------------------------------------------------------- /Domain 9- Incident Response.md: -------------------------------------------------------------------------------- 1 | # Domain 9: Incident Response 2 | 3 | ## Introduction 4 | 5 | Incident Response (IR) is a critical facet of any information security program. Preventive security controls have proven unable to completely eliminate the possibility that critical data could be compromised. 
Most organizations have some sort of IR plan to govern how they will investigate an attack, but as the cloud presents distinct differences in both access to forensic data and governance, organizations must consider how their IR processes will change. 6 | 7 | This domain seeks to identify those gaps pertinent to IR that are created by the unique characteristics of cloud computing. Security professionals may use this as a reference when developing response plans and conducting other activities during the preparation phase of the IR lifecycle. This domain is organized in accord with the commonly accepted Incident Response Lifecycle as described in the National Institute of Standards and Technology Computer Security Incident Handling Guide (NIST 800-61rev2 08/2012) [1]. Other international standard frameworks for incident response include ISO/IEC 27035 and the ENISA [Strategies for incident response and cyber crisis cooperation](https://www.enisa.europa.eu/publications/strategies-for-incident-response-and-cyber-crisis-cooperation). 8 | 9 | After describing the Incident Response Lifecycle, as laid out in NIST 800-61rev2, each subsequent section addresses a phase of the lifecycle and explores the potential considerations for responders as they work in a cloud environment. 10 | 11 | ## Overview 12 | 13 | ### Incident Response Lifecycle 14 | 15 | The incident response cycle is defined in the NIST 800-61rev2 document. It includes the following phases and major activities: 16 | 17 | ************insert 9.1*********** 18 | > The Incident Response Lifecycle 19 | 20 | * Preparation: "Establishing an incident response capability so that the organization is ready to respond to incidents." 21 | * Process to handle the incidents. 22 | * Handler Communications and Facilities. 23 | * Incident Analysis Hardware and Software. 24 | * Internal Documentation (Port lists, Asset Lists, Network diagrams, current baselines of network traffic). 25 | * Identifying training. 26 | * Evaluating infrastructure by proactive scanning and network monitoring, vulnerability assessments, and performing risk assessments. 27 | * Subscribing to third-party threat intelligence services. 28 | 29 | * Detection & Analysis 30 | * Alerts (Endpoint Protection, Network Security Monitoring, Host Monitoring, Account Creation, Privilege Escalation, other indicators of compromise, SIEM, Security Analytics (baseline and anomaly detection), and user behavior analytics). 31 | * Validate alerts (reducing false positives) and escalation. 32 | * Estimate the scope of the Incident. 33 | * Assign an Incident Manager who will coordinate further actions. 34 | * Designate a person who will communicate the incident containment and recovery status to Sr. Management. 35 | * Build a timeline of the attack. 36 | * Determine the extent of the potential data loss. 37 | * Notification and coordination activities. 38 | 39 | * Containment, Eradication & Recovery 40 | * Containment: Taking systems offline. Considerations for data loss versus service availability. Ensuring systems don't destroy themselves upon detection. 41 | * Eradication & Recovery: Clean up compromised devices and restore systems to normal operation. Confirm systems are functioning properly, deploy controls to prevent similar incidents. 42 | * Documenting the incident and gathering evidence (chain of custody). 43 | 44 | * Post-mortem 45 | * What could have been done better? Could the attack have been detected sooner? What additional data would have been helpful to isolate the attack faster? 
Does the IR process need to change? If so, how? 46 | 47 | ### How the cloud impacts IR 48 | 49 | Each of the phases of the lifecycle is affected to different degrees by a cloud deployment. Some of these are similar to any incident response in an outsourced environment where you need to coordinate with a third party. Other differences are more specific to the abstracted and automated nature of cloud. 50 | 51 | #### Preparation 52 | 53 | When preparing for cloud incident response, here are some major considerations: 54 | 55 | * *SLAs and Governance:* Any incident using a public cloud or hosted provider requires an understanding of service level agreements (SLAs), and likely coordination with the cloud provider. Keep in mind that, depending on your relationship with the provider, you may not have direct points of contact and might be limited to whatever is offered through standard support. A custom private cloud in a third-party data center will have a very different relationship than signing up through a website and clicking through a license agreement for a new SaaS application. 56 | 57 | Key questions include: What does your organization do? What is the cloud service provider (CSP) responsible for? Who are the points of contact? What are the response time expectations? What are the escalation procedures? Do you have out-of-band communication procedures (in case networks are impacted)? How do the hand-offs work? What data are you going to have access to? 58 | 59 | Be sure to test the process with the CSP if possible. Validate that escalations and roles/responsibilities are clear. Ensure the CSP has contacts to notify you of incidents they detect, and that such notifications are integrated into your process. For click-through services, notifications will likely be sent to your registration email address; these should be controlled by the enterprise and monitored continuously. Ensure that you have contacts, including out-of-band methods, for your CSP and that you test them. 60 | 61 | * *IaaS/PaaS vs. SaaS:* In a multitenant environment, how can data specific to your cloud be provided for investigation? For each major service you should understand and document what data and logs will be available in an incident. Don't assume you can contact a provider after the fact and collect data that isn't normally available. 62 | 63 | * *"Cloud jump kit:"* These are the tools needed to investigate in a remote location (as with cloud-based resources). For example, do you have tools to collect logs and metadata from the cloud platform? Do you have the ability to interpret the information? How do you obtain images of running virtual machines and what kind of data do you have access to: disk storage or volatile memory? 64 | 65 | * *Architect the cloud environment for faster detection, investigation, and response (containment and recoverability).* This means ensuring you have the proper configuration and architecture to support incident response: 66 | 67 | * Enable instrumentation, such as cloud API logs, and ensure that they feed to a secure location that's available to investigators in case of an incident. 68 | * Utilize isolation to ensure that attacks cannot spread and compromise the entire application. 69 | * Use immutable servers when possible. If an issue is detected, move workloads from compromised device onto a new instance in a known-good state. Employ a greater focus on file integrity monitoring and configuration management. 
* Implement application stack maps to understand where data is going to reside in order to factor in geographic differences in monitoring and data capture. 71 | * It can be very helpful to perform threat modeling and tabletop exercises to determine the most effective means of containment for different types of attacks on different components in the cloud stack. 72 | * This should include differences between responses for IaaS/PaaS/SaaS. 73 | 74 | #### Detection and Analysis 75 | 76 | Detection and analysis in a cloud environment may look nearly the same (for IaaS) and quite different (for SaaS). In all cases, the monitoring scope must cover the cloud's management plane, not merely the deployed assets. 77 | 78 | You may be able to leverage in-cloud monitoring and alerts that can kick off an automated IR workflow in order to speed up the response process. Some cloud providers offer these features for their platforms, and there are also some third-party monitoring options available. These may not be security-specific: many cloud platforms (IaaS and possibly PaaS) expose a variety of real-time and near-real-time monitoring metrics for performance and operational reasons. But security teams may also be able to leverage these for security needs. 79 | 80 | Cloud platforms also offer a variety of logs, which can sometimes be integrated into existing security operations/monitoring. These could range from operational logs to full logging of all API calls or management activity. Keep in mind that they are not available on all providers; you tend to see them more with IaaS and PaaS than SaaS. When log feeds aren't available you may be able to use the cloud console as a means to identify environment/configuration changes. 81 | 82 | *Data sources* for cloud incidents can be quite different from those used in incident response for traditional computing. There is significant overlap, such as system logs, but there are differences in terms of how data can be collected and in terms of new sources, such as feeds from the cloud management plane. 83 | 84 | As mentioned, cloud platform logs may be an option, but they are not universally available. Ideally they should show all management-plane activity. It's important to understand what is logged and the gaps that could affect incident analysis. Is all management activity recorded? Do the logs include automated system activities (like auto-scaling) or cloud provider management activities? In the case of a serious incident, providers may have other logs that are not normally available to customers. 85 | 86 | One challenge in collecting information may be limited network visibility. Network logs from a cloud provider will tend to be flow records, but not full packet capture. 87 | 88 | Where there are gaps you can sometimes instrument the technology stack with your own logging. This can work within instances, containers, and application code in order to gain telemetry important for the investigation. Pay particular attention to PaaS and serverless application architectures; you will likely need to add custom application-level logging (a minimal sketch follows below). 89 | 90 | External threat intelligence may also be useful, as it is with on-premises incident response, in order to help identify indicators of compromise and to get adversary information. 91 | 92 | Be aware that there are potential challenges when information provided by a CSP faces chain-of-custody questions. There are no reliable precedents established at this point.
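As noted above, PaaS and serverless architectures usually require custom application-level logging to fill visibility gaps. The sketch below uses only the Python standard library to emit structured (JSON) events that an investigator can later correlate with management-plane logs; the event names and fields are illustrative, not a standard.

```python
import json
import logging
import time

# Structured application logs preserve investigative telemetry even when
# host- or platform-level logs are unavailable (e.g., serverless platforms).
logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

def log_event(event_type: str, **details):
    logger.info(json.dumps({"timestamp": time.time(), "event": event_type, **details}))

# A hypothetical serverless-style handler recording security-relevant activity.
def handler(request):
    log_event("request_received", source_ip=request.get("source_ip"), path=request.get("path"))
    # ... application logic would go here ...
    log_event("object_read", bucket=request.get("bucket"), key=request.get("key"))
    return {"status": 200}

handler({"source_ip": "203.0.113.10", "path": "/report", "bucket": "example-bucket", "key": "q3.csv"})
```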
93 | 94 | *Forensics and investigative support* will also need to adapt, beyond understanding changes to data sources. 95 | 96 | Always factor in what the CSP can provide and whether it meets chain of custody requirements. Not every incident will result in legal action, but it's important to work with your legal team to understand where the lines are and where you could end up having chain-of-custody issues. 97 | 98 | There is a greater need to automate many of the forensic/investigation processes in cloud environments, because of their dynamic and higher-velocity nature. For example, evidence could be lost due to a normal auto-scaling activity or if an administrator decides to terminate a virtual machine involved in an investigation. Some examples of tasks you can automate include: 99 | 100 | * Snapshotting the storage of the virtual machine. 101 | * Capturing any metadata at the time of the alert, so that the analysis can happen based on what the infrastructure looked like at that time. 102 | * If your provider supports it, "pausing" the virtual machine, which will save the volatile memory state. 103 | 104 | You can also leverage the capabilities of the cloud platform to determine the extent of the potential compromise: 105 | 106 | * Analyze network flows to check if network isolation held up. You can also use API calls to snapshot the network and the virtual firewall rules state, which could give you an accurate picture of the entire stack at the time of the incident. 107 | * Examine configuration data to check if other similar instances were potentially exposed in the same attack. 108 | * Review data access logs (for cloud-based storage, if available) and management plane logs to see if the incident affected or spanned into the cloud platform. 109 | * Serverless and PaaS-based architectures will require additional correlation across the cloud platform and any self-generated application logs. 110 | 111 | #### Containment, Eradication and Recovery 112 | 113 | Always start by ensuring the cloud management plane/metastructure is free of an attacker. This will often involve invoking break-glass procedures to access the root or master credentials for the cloud account, in order to ensure that attacker activity isn't being masked or hidden from lower-level administrator accounts. Remember: You can't contain an attack if the attacker is still in the management plane. Attacks on cloud assets, such as virtual machines, may sometimes reveal management plane credentials that are then used to bridge into a wider, more serious attack. 114 | 115 | The cloud often provides a lot more flexibility in this phase of the response, especially for IaaS. Software-defined infrastructure allows you to quickly rebuild from scratch in a clean environment, and, for more isolated attacks, inherent cloud characteristics—such as auto-scale groups, API calls for changing virtual network or machine configurations, and snapshots—can speed quarantine, eradication, and recovery processes. For example, on many platforms you can instantly quarantine virtual machines by moving the instance out of the auto-scale group, isolating it with virtual firewalls, and replacing it. 116 | 117 | This also means there's no need to immediately "eradicate" the attacker before you identify their exploit mechanisms and the scope of the breach, since the new infrastructure/instances are clean; instead, you can simply isolate them. However, you still need to ensure the exploit path is closed and can't be used to infiltrate other production assets.
If there is concern that the management plane is breached, be sure to confirm that the templates or configurations for new infrastructure/applications have not been compromised. 118 | 119 | That said, these capabilities are not always universal: with SaaS and some PaaS you may be very limited and will thus need to rely more on the cloud provider. 120 | 121 | #### Post-mortem 122 | 123 | As with any attack, work with the internal response team and provider to figure out what worked and what didn't, then pinpoint any areas for improvement. Pay particular attention to the limitations in the data collected and figure out how to address the issues moving forward. 124 | 125 | It is hard to change SLAs, but if the agreed-upon response time, data, or other support wasn't sufficient, go back and try to renegotiate. 126 | 127 | ## Recommendations 128 | 129 | * SLAs and setting expectations around what the customer does versus what the provider does are the most important aspects of incident response for cloud-based resources. Clear communication of roles/responsibilities and practicing the response and hand-offs are critical. 130 | * Cloud customers must set up proper communication paths with the provider that can be utilized in the event of an incident. Existing open standards can facilitate incident communication. 131 | * Cloud customers must understand the content and format of data that the cloud provider will supply for analysis purposes and evaluate whether the available forensics data satisfies legal chain of custody requirements. 132 | * Cloud customers should also embrace continuous and serverless monitoring of cloud-based resources to detect potential issues earlier than in traditional data centers. 133 | * Data sources should be stored or copied into locations that maintain availability during incidents. 134 | * If needed and possible, they should also be handled to maintain a proper chain of custody. 135 | * Cloud-based applications should leverage automation and orchestration to streamline and accelerate the response, including containment and recovery (a brief containment sketch follows these recommendations). 136 | * For each cloud service provider used, the approach to detecting and handling incidents involving the resources hosted at that provider must be planned and described in the enterprise incident response plan. 137 | * The SLA with each cloud service provider must guarantee support for the incident handling required for the effective execution of the enterprise incident response plan. This must cover each stage of the incident handling process: detection, analysis, containment, eradication, and recovery. 138 | * Testing will be conducted at least annually or whenever there are significant changes to the application architecture. Customers should seek to integrate their testing procedures with those of their provider (and other partners) to the greatest extent possible.
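To illustrate the automation recommendation above, the following is a minimal containment sketch. It assumes an AWS-style IaaS API (boto3), a pre-created quarantine security group, and placeholder identifiers; the same pattern (isolate, then preserve evidence) applies on other platforms through their respective APIs.

```python
import boto3  # assumes an AWS-style IaaS API; the identifiers below are placeholders

ec2 = boto3.client("ec2")

def quarantine_instance(instance_id: str, quarantine_sg: str) -> None:
    """Isolate a suspect instance and preserve disk evidence without destroying state."""
    # Swap the instance onto a pre-created "quarantine" security group that
    # permits traffic only from the forensics workstation.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg])

    # Snapshot attached volumes so evidence is preserved at the time of containment.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                volume_id = mapping["Ebs"]["VolumeId"]
                ec2.create_snapshot(VolumeId=volume_id,
                                    Description=f"IR evidence for {instance_id}")

quarantine_instance("i-0123456789abcdef0", "sg-0123456789abcdef0")
```

In a real response this step would typically also remove the instance from any auto-scale group and pause or preserve volatile memory if the provider supports it.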
-------------------------------------------------------------------------------- /Images/1.1.2-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.1.2-1.png -------------------------------------------------------------------------------- /Images/1.1.2.3-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.1.2.3-1.png -------------------------------------------------------------------------------- /Images/1.1.3-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.1.3-1.png -------------------------------------------------------------------------------- /Images/1.1.3.1-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.1.3.1-1.png -------------------------------------------------------------------------------- /Images/1.1.3.2-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.1.3.2-1.png -------------------------------------------------------------------------------- /Images/1.1.4-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.1.4-1.png -------------------------------------------------------------------------------- /Images/1.2.1-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.2.1-1.png -------------------------------------------------------------------------------- /Images/1.2.2-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.2.2-1.png -------------------------------------------------------------------------------- /Images/1.2.2.1-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CloudSecurityAlliance/CSA-Guidance/cb2773570925e00c83accadca6e53d4d526f317a/Images/1.2.2.1-1.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # How to help the project 2 | 3 | Welcome to the Cloud Security Alliance Guidance 4.0 project on GitHub. Here is how to participate: 4 | 5 | * **We need your feedback!!!** Although we have a dedicated writing team, this is still a community project. The idea is to generate a cleaner and more consistent document than possible by solely relying on working groups to do their own writing, while still reflecting the collective wisdom of the community. 
6 | * All feedback and edits will be managed via GitHub so that all parts of the process are open and public. 7 | * You don't need to use any special command-line GitHub tools for this project. GitHub's web interface will allow you to read documents, provide feedback, and participate. But feel free to use git tools if you know how. 8 | * Here is how to use GitHub to provide feedback: 9 | * *Issues* are the best way to add comments. The authors can read and respond to them directly. When leaving an issue. please list the line number for the start of any specific section you are commenting on. 10 | * *Pull requests are for edits*. We *can't respond to all pull requests* because our only options are to ignore a pull or merge the changes. For consistency's sake, it is very hard to accept pull requests directly. All pull requests will be reviewed, some will be merged, and those we cannot directly merge will be treated as an issue/comment and closed. This is just a practical necessity, considering how many people will eventually be providing feedback. 11 | * For writing we are using the Markdown text format. If you want to edit and send pull requests you will need to learn Markdown (fortunately it's incredibly simple). GitHub renders Markdown directly, so unless you are actually editing content you won't need to learn it. 12 | * Keep all feedback public, on GitHub. This is essential for maintaining the independence and objectivity of this project. Even if you know any of the authors or CSA staff, please don't email private feedback, which will be ignored. 13 | 14 | We will do our absolute best to respond to all feedback (with the exception of pull requests, which we *will* review), but depending on volume we may need to combine feedback (and we understand some feedback will be contradictory). 15 | 16 | ## The project process 17 | 18 | Here is what you can expect: 19 | 20 | * We will have a separate file for each domain in the Guidance. 21 | * For each domain, we will first publish a detailed outline with expected changes, and then drafts. Domains will be open for feedback the entire time, but may be closed temporarily during specific writing phases (*e.g.,* after we collect comments on the outline, the author may close feedback as they develop the first draft). 22 | * For each domain, there will be an outline, first draft, and near-final draft. 23 | * The exception is Domain 1. We skipped the outline for that and went straight to the first draft to set a writing tone for the rest of the project. 24 | * The near-final drafts will be pulled from GitHub and converted into Word, with updated graphics, for final publication. 25 | 26 | If you have any questions or general comments, please let us know either here or through email to guidance@cloudsecurityalliance.org, and thank you for your help. 27 | 28 | ## Editing and style notes 29 | 30 | * All images should be placed in /images and named with the section they appear in, followed by a dash, followed by an enumerator. e.g. "1.1.2-1.png" for the first image in the directory. Please use standard Markdown image embedding. 31 | * Links should be referenced, not inline. Each link should be sequentially ordered. This makes things easier to read (look at Domain 1 for formatting examples -- it's easy). If links start getting out of order, feel free to use "1.1" or similar to neaten things up. 32 | * Images will be redone by a graphics team before publication, so don't worry about having them look consistent. 
33 | --------------------------------------------------------------------------------