├── CHANGELOG.rst ├── terraform ├── requirements.txt ├── SAMPLE-env.list ├── terraform-plugins │ └── terraform-plugins.tf ├── dynamodb │ ├── s3 │ │ └── s3.tf │ ├── securitygroup │ │ └── securitygroup.tf │ └── main.tf ├── Dockerfile ├── teardown_historical.sh ├── install_historical.sh └── infra │ ├── s3 │ └── s3.tf │ └── securitygroup │ └── securitygroup.tf ├── .coveragerc ├── .pylintrc ├── mkdocs ├── requirements-docs.txt ├── docs │ ├── troubleshooting.md │ ├── img │ │ ├── cw-events.png │ │ ├── historical.jpg │ │ ├── iam-setup.jpg │ │ ├── historical-s3.jpg │ │ └── historical-overview.jpg │ ├── extra.css │ ├── index.md │ ├── installation │ │ ├── index.md │ │ ├── iam.md │ │ ├── terraform.md │ │ └── configuration.md │ └── architecture.md ├── custom_theme │ └── img │ │ └── favicon.ico └── mkdocs.yml ├── historical ├── historical-cookiecutter │ ├── historical_{{cookiecutter.technology_slug}} │ │ ├── {{cookiecutter.technology_slug}} │ │ │ ├── __init__.py │ │ │ ├── differ.py │ │ │ ├── conftest.py │ │ │ ├── poller.py │ │ │ ├── models.py │ │ │ └── collector.py │ │ ├── requirements.txt │ │ ├── serverless_configs │ │ │ ├── prod.yml │ │ │ └── test.yml │ │ ├── package.json │ │ ├── README.md │ │ └── serverless.yaml │ └── cookiecutter.json ├── __init__.py ├── s3 │ ├── __init__.py │ ├── differ.py │ ├── poller.py │ ├── models.py │ └── collector.py ├── vpc │ ├── __init__.py │ ├── differ.py │ ├── models.py │ ├── poller.py │ └── collector.py ├── common │ ├── __init__.py │ ├── exceptions.py │ ├── extensions.py │ ├── accounts.py │ ├── util.py │ ├── cloudwatch.py │ ├── sqs.py │ └── proxy.py ├── tests │ ├── __init__.py │ ├── pynamodb_settings.py │ ├── test_cloudwatch.py │ ├── factories.py │ └── conftest.py ├── security_group │ ├── __init__.py │ ├── differ.py │ ├── models.py │ ├── poller.py │ └── collector.py ├── __about__.py ├── cli.py ├── mapping │ └── __init__.py ├── constants.py ├── attributes.py └── models.py ├── setup.cfg ├── .travis.yml ├── README.md ├── setup.py ├── tox.ini └── .gitignore /CHANGELOG.rst: -------------------------------------------------------------------------------- 1 | Changelog 2 | ========= -------------------------------------------------------------------------------- /terraform/requirements.txt: -------------------------------------------------------------------------------- 1 | historical>=0.4.10 2 | -------------------------------------------------------------------------------- /.coveragerc: -------------------------------------------------------------------------------- 1 | [report] 2 | include = historical/*.py 3 | -------------------------------------------------------------------------------- /.pylintrc: -------------------------------------------------------------------------------- 1 | [MESSAGES CONTROL] 2 | disable=C0301,R0913,W1202,W1203,R0903,R0201,R0801 3 | -------------------------------------------------------------------------------- /mkdocs/requirements-docs.txt: -------------------------------------------------------------------------------- 1 | mkdocs 2 | mkdocs-bootswatch 3 | pymdown-extensions 4 | -------------------------------------------------------------------------------- /mkdocs/docs/troubleshooting.md: -------------------------------------------------------------------------------- 1 | # Troubleshooting 2 | This doc will be updated in the future. 
3 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/{{cookiecutter.technology_slug}}/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /mkdocs/docs/img/cw-events.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Netflix-Skunkworks/historical/master/mkdocs/docs/img/cw-events.png -------------------------------------------------------------------------------- /mkdocs/docs/img/historical.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Netflix-Skunkworks/historical/master/mkdocs/docs/img/historical.jpg -------------------------------------------------------------------------------- /mkdocs/docs/img/iam-setup.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Netflix-Skunkworks/historical/master/mkdocs/docs/img/iam-setup.jpg -------------------------------------------------------------------------------- /mkdocs/docs/img/historical-s3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Netflix-Skunkworks/historical/master/mkdocs/docs/img/historical-s3.jpg -------------------------------------------------------------------------------- /mkdocs/custom_theme/img/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Netflix-Skunkworks/historical/master/mkdocs/custom_theme/img/favicon.ico -------------------------------------------------------------------------------- /mkdocs/docs/img/historical-overview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Netflix-Skunkworks/historical/master/mkdocs/docs/img/historical-overview.jpg -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/requirements.txt: -------------------------------------------------------------------------------- 1 | git+https://github.com/Netflix-Skunkworks/historical.git#egg=historical -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file = README.md 3 | 4 | [wheel] 5 | universal = 0 6 | 7 | [egg_info] 8 | tag_build = 9 | tag_date = 0 10 | tag_svn_revision = 0 11 | -------------------------------------------------------------------------------- /historical/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | """ 8 | -------------------------------------------------------------------------------- /historical/s3/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.s3 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Mike Grima 7 | """ 8 | -------------------------------------------------------------------------------- /terraform/SAMPLE-env.list: -------------------------------------------------------------------------------- 1 | AWS_ACCESS_KEY_ID=INSERTHERE 2 | AWS_SECRET_ACCESS_KEY=INSERTHERE 3 | AWS_SESSION_TOKEN=INSERTHERE 4 | TECH=s3|securitygroup 5 | TF_S3_BUCKET=INSERTHERE 6 | PRIMARY_REGION=INSERTHERE 7 | SECONDARY_REGIONS=INSERTHERE,INSERTHERE,INSERTHERE,INSERTHERE 8 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | python: 3 | - "3.6" 4 | 5 | before_install: 6 | - sudo rm -f /etc/boto.cfg 7 | 8 | install: 9 | - pip install tox-travis 10 | 11 | 12 | matrix: 13 | include: 14 | - env: 15 | - env: TOXENV=linters 16 | 17 | script: 18 | - tox 19 | -------------------------------------------------------------------------------- /terraform/terraform-plugins/terraform-plugins.tf: -------------------------------------------------------------------------------- 1 | // Use this file to pin versions for Terraform plugns: 2 | provider "aws" { 3 | region = "us-west-2" 4 | 5 | version = "1.39" 6 | } 7 | 8 | provider "local" { 9 | version = "1.1" 10 | } 11 | 12 | provider "null" { 13 | version = "1.0" 14 | } 15 | -------------------------------------------------------------------------------- /historical/vpc/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.vpc 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. author:: Mike Grima 8 | """ 9 | -------------------------------------------------------------------------------- /historical/common/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.common 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. author:: Mike Grima 8 | """ 9 | -------------------------------------------------------------------------------- /historical/tests/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.tests 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | .. 
author:: Kevin Glisson 8 | """ 9 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/serverless_configs/prod.yml: -------------------------------------------------------------------------------- 1 | accountId: {{cookiecutter.prod_account_id}} 2 | accountName: {{cookiecutter.prod_account_name}} 3 | pythonRequirements: 4 | dockerizePip: true 5 | invalidateCaches: true 6 | prune: 7 | automatic: true 8 | number: 3 -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/serverless_configs/test.yml: -------------------------------------------------------------------------------- 1 | accountId: {{cookiecutter.test_account_id}} 2 | accountName: {{cookiecutter.test_account_name}} 3 | pythonRequirements: 4 | dockerizePip: true 5 | invalidateCaches: true 6 | prune: 7 | automatic: true 8 | number: 3 -------------------------------------------------------------------------------- /historical/security_group/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.security_group 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. author:: Mike Grima 8 | """ 9 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/cookiecutter.json: -------------------------------------------------------------------------------- 1 | { 2 | "technology_name": "ELB", 3 | "technology_slug": "{{cookiecutter.technology_name.lower().replace(' ', '_').replace('-', '_')}}", 4 | "email": "kevin@example.com", 5 | "author": "Kevin Glisson", 6 | "team": "Team Rocket", 7 | "version": "0.1.0", 8 | "test_account_name": "test", 9 | "test_account_id": "", 10 | "prod_account_name": "prod", 11 | "prod_account_id": "", 12 | "_extensions": ["historical.common.extensions.HistoricalExtension"] 13 | } -------------------------------------------------------------------------------- /terraform/dynamodb/s3/s3.tf: -------------------------------------------------------------------------------- 1 | // S3 SPECIFIC VARIABLES: 2 | // Set the default values for the Read and Write capacities to your environment's needs 3 | variable "CURRENT_TABLE" { 4 | default = "HistoricalS3CurrentTable" 5 | } 6 | 7 | variable "CURRENT_TABLE_READ_CAP" { 8 | default = 100 9 | } 10 | 11 | variable "CURRENT_TABLE_WRITE_CAP" { 12 | default = 100 13 | } 14 | 15 | variable "DURABLE_TABLE" { 16 | default = "HistoricalS3DurableTable" 17 | } 18 | 19 | variable "DURABLE_TABLE_READ_CAP" { 20 | default = 100 21 | } 22 | 23 | variable "DURABLE_TABLE_WRITE_CAP" { 24 | default = 100 25 | } 26 | -------------------------------------------------------------------------------- /historical/common/exceptions.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.common.exceptions 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Mike Grima 7 | """ 8 | 9 | 10 | class DurableItemIsMissingException(Exception): 11 | """Exception for if a Durable Item is missing but should be found.""" 12 | 13 | pass 14 | 15 | 16 | class MissingProxyConfigurationException(Exception): 17 | """Exception if the Proxy is missing the proper configuration on how to operate.""" 18 | 19 | pass 20 | -------------------------------------------------------------------------------- /terraform/dynamodb/securitygroup/securitygroup.tf: -------------------------------------------------------------------------------- 1 | // SECURITY GROUP SPECIFIC VARIABLES: 2 | // Set the default values for the Read and Write capacities to your environment's needs 3 | variable "CURRENT_TABLE" { 4 | default = "HistoricalSecurityGroupCurrentTable" 5 | } 6 | 7 | variable "CURRENT_TABLE_READ_CAP" { 8 | default = 100 9 | } 10 | 11 | variable "CURRENT_TABLE_WRITE_CAP" { 12 | default = 100 13 | } 14 | 15 | variable "DURABLE_TABLE" { 16 | default = "HistoricalSecurityGroupDurableTable" 17 | } 18 | 19 | variable "DURABLE_TABLE_READ_CAP" { 20 | default = 100 21 | } 22 | 23 | variable "DURABLE_TABLE_WRITE_CAP" { 24 | default = 100 25 | } 26 | -------------------------------------------------------------------------------- /historical/tests/pynamodb_settings.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=E0401,C0103 2 | """ 3 | .. module: historical.tests.pynamodb_settings.py 4 | :platform: Unix 5 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 6 | :license: Apache, see LICENSE for more details. 7 | .. author:: Mike Grima 8 | """ 9 | import requests 10 | 11 | 12 | # This is a temporary file that is present to make PynamoDB work properly on unit tests. 13 | # This issue has more details: https://github.com/pynamodb/PynamoDB/issues/558 14 | # and will be fixed when this PR is merged: https://github.com/pynamodb/PynamoDB/pull/559 15 | 16 | session_cls = requests.Session 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Historical 2 | [![Build Status](https://travis-ci.org/Netflix-Skunkworks/historical.svg?branch=master)](https://travis-ci.org/Netflix-Skunkworks/historical) 3 | [![Coverage Status](https://coveralls.io/repos/github/Netflix-Skunkworks/historical/badge.svg?branch=master)](https://coveralls.io/github/Netflix-Skunkworks/historical?branch=master) 4 | [![PyPI version](https://badge.fury.io/py/historical.svg)](https://badge.fury.io/py/historical) 5 | 6 | ## THIS PROJECT IS ARCHIVED AND NO LONGER IN DEVELOPMENT 7 | 8 | Please review the documentation that is hosted here: [https://netflix-skunkworks.github.io/historical](https://netflix-skunkworks.github.io/historical). 9 | 10 | [![Historical Logo](mkdocs/docs/img/historical.jpg)](https://netflix-skunkworks.github.io/historical) 11 | -------------------------------------------------------------------------------- /historical/__about__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Mike Grima 7 | """ 8 | from __future__ import absolute_import, division, print_function 9 | 10 | __all__ = [ 11 | "__title__", "__summary__", "__uri__", "__version__", "__author__", 12 | "__email__", "__license__", "__copyright__", 13 | ] 14 | 15 | __title__ = "historical" 16 | __summary__ = ("Historical tracking of AWS resource configuration.") 17 | __uri__ = "https://github.com/Netflix-Skunkworks/historical" 18 | 19 | __version__ = "0.4.10" 20 | 21 | __author__ = "The Historical developers" 22 | __email__ = "security@netflix.com" 23 | 24 | __license__ = "Apache License, Version 2.0" 25 | __copyright__ = f"Copyright 2017 {__author__}" 26 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "historical-deploy", 3 | "version": "0.1.0", 4 | "description": "A collection of AWS Lambda functions for collecting and storing AWS configuration data.", 5 | "main": "index.js", 6 | "repository": { 7 | "type": "git", 8 | "url": "git+https://github.com/Netflix-Skunkworks/historical.git" 9 | }, 10 | "keywords": [ 11 | "python", 12 | "aws", 13 | "lambda", 14 | "serverless" 15 | ], 16 | "author": "{{cookiecutter.email}}", 17 | "license": "Apache", 18 | "bugs": { 19 | "url": "https://github.com/Netflix-Skunkworks/historical/issues" 20 | }, 21 | "homepage": "https://github.com/Netflix-Skunkworks/historical/#readme", 22 | "dependencies": { 23 | "serverless-prune-plugin": "^1.1.1", 24 | "serverless-python-requirements": "^2.2.1" 25 | } 26 | } 27 | -------------------------------------------------------------------------------- /historical/cli.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.cli 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Mike Grima 7 | """ 8 | import os 9 | import logging 10 | 11 | import click 12 | import click_log 13 | from cookiecutter.main import cookiecutter # pylint: disable=E0401 14 | 15 | from historical.__about__ import __version__ 16 | 17 | LOG = logging.getLogger('historical') 18 | click_log.basic_config(LOG) 19 | 20 | 21 | @click.group() 22 | @click_log.simple_verbosity_option(LOG) 23 | @click.version_option(version=__version__) 24 | def cli(): 25 | """Historical commandline for managing historical functions.""" 26 | pass 27 | 28 | 29 | @cli.command() 30 | def new(): 31 | """Creates a new historical technology.""" 32 | dir_path = os.path.dirname(os.path.realpath(__file__)) 33 | cookiecutter(os.path.join(dir_path, 'historical-cookiecutter/')) 34 | -------------------------------------------------------------------------------- /mkdocs/docs/extra.css: -------------------------------------------------------------------------------- 1 | .navbar .dropdown-menu>li>a, .navbar .dropdown-menu>li>a:focus { 2 | font-weight: 400; 3 | } 4 | 5 | .navbar-default { 6 | background-color: #00526E; 7 | font-weight: 400; 8 | } 9 | 10 | .navbar-default .navbar-nav>.active>a, .navbar-default .navbar-nav>.active>a:hover, .navbar-default .navbar-nav>.active>a:focus { 11 | background-color: #32748B; 12 | font-family: "Helvetica Neue",Helvetica,Arial,sans-serif; 13 | font-weight: 400; 14 | } 15 | 16 | .navbar-default .navbar-nav>li>a:hover, .navbar-default .navbar-nav>li>a:focus { 17 | background-color: #003142; 18 | font-family: "Helvetica Neue",Helvetica,Arial,sans-serif; 19 | font-weight: 400; 20 | } 21 | 22 | body { 23 | font-family: "Helvetica Neue",Helvetica,Arial,sans-serif; 24 | font-size: 1.65em; 25 | } 26 | 27 | h1, h2, h3, h4, h5, h6 { 28 | font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; 29 | font-weight: 500; 30 | } 31 | 32 | table { 33 | font-size: 15px; 34 | } -------------------------------------------------------------------------------- /mkdocs/mkdocs.yml: -------------------------------------------------------------------------------- 1 | site_name: Historical 2 | repo_url: https://github.com/Netflix-Skunkworks/historical/ 3 | repo_name: GitHub 4 | edit_uri: "" 5 | nav: 6 | - Welcome: index.md 7 | - Architecture: architecture.md 8 | - Installation and Configuration: 9 | - Instructions: installation/ 10 | - Prerequisites: installation/#prerequisites 11 | - IAM Setup: installation/iam.md 12 | - Terraform: installation/terraform.md 13 | - Configuration Reference: installation/configuration.md 14 | - Prepare Docker Container: installation/#prepare-docker-container 15 | - Installation: installation/#installation 16 | - Uninstallation: installation/#uninstallation 17 | - Troubleshooting: troubleshooting.md 18 | theme: 19 | name: yeti 20 | custom_dir: custom_theme/ 21 | extra_css: 22 | - extra.css 23 | 24 | # There are many available formatting extensions available, please read: 25 | # https://facelessuser.github.io/pymdown-extensions/ 26 | markdown_extensions: 27 | - toc: 28 | permalink: True 29 | - pymdownx.tilde 30 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/README.md: -------------------------------------------------------------------------------- 1 | ## ⚡ Historical Deploy 2 | 3 | [![serverless](http://public.serverless.com/badges/v3.svg)](http://www.serverless.com) 4 | 5 | ## About 6 | These are the serverless configuration files needed to various pieces of 
historical infrastructure. These are configuration files only. Historical itself is located at: 7 | 8 | https://github.com/Netflix-Skunkworks/historical 9 | 10 | 11 | ## Monitoring 12 | 13 | All of the functions are wrapped with the `RavenLambdaWrapper`. This decorator forwards lambda 14 | telemetry to a [Sentry](https://sentry.io) instance. This will have no effect unless you specify `SENTRY_DSN` 15 | in the Lambda's environment variables. 16 | 17 | 18 | ### Deployment 19 | 20 | Install python requirements: 21 | 22 | pip install -r requirements.txt 23 | 24 | Run the tests: 25 | 26 | py.test 27 | 28 | Get the serverless package: 29 | 30 | npm install serverless 31 | 32 | Fetch AWS credentials. 33 | 34 | Deploy package 35 | 36 | sls deploy --region us-east-1 --stage | 37 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/{{cookiecutter.technology_slug}}/differ.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: {{cookiecutter.technology_slug}}.differ 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: {{cookiecutter.author}} <{{cookiecutter.email}}> 7 | """ 8 | import logging 9 | 10 | from raven_python_lambda import RavenLambdaWrapper 11 | 12 | from historical.common.dynamodb import process_dynamodb_differ_record 13 | from .models import Durable{{cookiecutter.technology_slug | titlecase}}Model 14 | 15 | logging.basicConfig() 16 | log = logging.getLogger('historical') 17 | log.setLevel(logging.WARNING) 18 | 19 | 20 | @RavenLambdaWrapper() 21 | def handler(event, context): 22 | """ 23 | Historical security group event differ. 24 | 25 | Listens to the Historical current table and determines if there are differences that need to be persisted in the 26 | historical record. 27 | """ 28 | for record in event['Records']: 29 | process_dynamodb_differ_record(record, Durable{{cookiecutter.technology_slug | titlecase}}Model) 30 | -------------------------------------------------------------------------------- /historical/mapping/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.mapping 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | """ 8 | 9 | import os 10 | 11 | from historical.security_group.models import CurrentSecurityGroupModel, DurableSecurityGroupModel 12 | from historical.s3.models import CurrentS3Model, DurableS3Model 13 | from historical.vpc.models import CurrentVPCModel, DurableVPCModel 14 | 15 | # The HISTORICAL_TECHNOLOGY variable MUST be equal to that of an existing model's 'tech' Meta field. 
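# (Each model imported above defines a 'tech' value on its Meta class; the CURRENT_MAPPING and DURABLE_MAPPING dicts below use those Meta.tech values as keys to look up the matching PynamoDB model.)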
16 | HISTORICAL_TECHNOLOGY = os.environ.get('HISTORICAL_TECHNOLOGY') 17 | 18 | # Current Table Mapping: 19 | CURRENT_MAPPING = { 20 | CurrentSecurityGroupModel.Meta.tech: CurrentSecurityGroupModel, 21 | CurrentS3Model.Meta.tech: CurrentS3Model, 22 | CurrentVPCModel.Meta.tech: CurrentVPCModel 23 | } 24 | 25 | # Durable Table Mapping: 26 | DURABLE_MAPPING = { 27 | DurableSecurityGroupModel.Meta.tech: DurableSecurityGroupModel, 28 | DurableS3Model.Meta.tech: DurableS3Model, 29 | DurableVPCModel.Meta.tech: DurableVPCModel 30 | } 31 | -------------------------------------------------------------------------------- /historical/common/extensions.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.common.extensions 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | """ 8 | from jinja2.ext import Extension 9 | 10 | 11 | def titlecase(input_str): 12 | """Transforms a string to titlecase.""" 13 | return "".join([x.title() for x in input_str.split('_')]) 14 | 15 | 16 | class HistoricalExtension(Extension): 17 | """Extension class for Cookiecutters.""" 18 | 19 | def __init__(self, environment): 20 | """Instantiates the Historical Extension 21 | 22 | :param environment: 23 | """ 24 | super(HistoricalExtension, self).__init__(environment) 25 | environment.filters['titlecase'] = titlecase 26 | 27 | def parse(self, parser): 28 | """If any of the :attr:`tags` matched this method is called with the 29 | parser as first argument. The token the parser stream is pointing at 30 | is the name token that matched. This method has to return one or a 31 | list of multiple nodes. 32 | """ 33 | raise NotImplementedError() 34 | -------------------------------------------------------------------------------- /historical/vpc/differ.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.vpc.differ 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | """ 8 | import logging 9 | 10 | from raven_python_lambda import RavenLambdaWrapper 11 | 12 | from historical.common.dynamodb import process_dynamodb_differ_record 13 | from historical.common.util import deserialize_records 14 | from historical.constants import LOGGING_LEVEL 15 | from historical.vpc.models import CurrentVPCModel, DurableVPCModel 16 | 17 | logging.basicConfig() 18 | LOG = logging.getLogger('historical') 19 | LOG.setLevel(LOGGING_LEVEL) 20 | 21 | 22 | @RavenLambdaWrapper() 23 | def handler(event, context): # pylint: disable=W0613 24 | """ 25 | Historical security group event differ. 26 | 27 | Listens to the Historical current table and determines if there are differences that need to be persisted in the 28 | historical record. 
29 | """ 30 | # De-serialize the records: 31 | records = deserialize_records(event['Records']) 32 | 33 | for record in records: 34 | process_dynamodb_differ_record(record, CurrentVPCModel, DurableVPCModel) 35 | -------------------------------------------------------------------------------- /terraform/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM amazonlinux:1 2 | 3 | MAINTAINER Netflix OSS 4 | 5 | COPY requirements.txt /installer/requirements.txt 6 | COPY terraform-plugins /installer/terraform-plugins 7 | 8 | ARG TERRAFORM_VERSION=0.11.10 9 | 10 | RUN \ 11 | yum install python36 python36-devel gcc-c++ make zip unzip git jq aws-cli -y \ 12 | && curl https://releases.hashicorp.com/terraform/${TERRAFORM_VERSION}/terraform_${TERRAFORM_VERSION}_linux_amd64.zip -o terraform_installer.zip -s \ 13 | && unzip /terraform_installer.zip \ 14 | && cd /installer/terraform-plugins \ 15 | && /terraform init \ 16 | && mv .terraform/plugins/linux_amd64/* ./ \ 17 | && rm -Rf .terraform 18 | 19 | # ENVIRONMENT VARIABLES: 20 | ENV TECH="" 21 | ENV TF_S3_BUCKET="" 22 | ENV PRIMARY_REGION="" 23 | ENV SECONDARY_REGIONS="" 24 | 25 | # AWS CREDS: 26 | ENV AWS_ACCESS_KEY_ID="" 27 | ENV AWS_SECRET_ACCESS_KEY="" 28 | ENV AWS_SESSION_TOKEN="" 29 | 30 | # Do these later to help with caching: 31 | COPY install_historical.sh /installer/install_historical.sh 32 | COPY teardown_historical.sh /installer/teardown_historical.sh 33 | COPY dynamodb /installer/dynamodb 34 | COPY infra /installer/infra 35 | RUN chmod +x /installer/*.sh 36 | 37 | WORKDIR "/installer" 38 | ENTRYPOINT ["/installer/install_historical.sh"] 39 | -------------------------------------------------------------------------------- /historical/security_group/differ.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.security_group.differ 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | """ 8 | import logging 9 | 10 | from raven_python_lambda import RavenLambdaWrapper 11 | 12 | from historical.common.dynamodb import process_dynamodb_differ_record 13 | from historical.common.util import deserialize_records 14 | from historical.security_group.models import CurrentSecurityGroupModel, DurableSecurityGroupModel 15 | from historical.constants import LOGGING_LEVEL 16 | 17 | logging.basicConfig() 18 | LOG = logging.getLogger('historical') 19 | LOG.setLevel(LOGGING_LEVEL) 20 | 21 | 22 | @RavenLambdaWrapper() 23 | def handler(event, context): # pylint: disable=W0613 24 | """ 25 | Historical security group event differ. 26 | 27 | Listens to the Historical current table and determines if there are differences that need to be persisted in the 28 | historical record. 
29 | """ 30 | # De-serialize the records: 31 | records = deserialize_records(event['Records']) 32 | 33 | for record in records: 34 | process_dynamodb_differ_record(record, CurrentSecurityGroupModel, DurableSecurityGroupModel) 35 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/{{cookiecutter.technology_slug}}/conftest.py: -------------------------------------------------------------------------------- 1 | from historical.tests.conftest import * 2 | 3 | 4 | @pytest.fixture(scope='function') 5 | def {{cookiecutter.technology_slug}}s(ec2): 6 | """Creates {{cookiecutter.technology_slug}}s.""" 7 | # TODO create aws item 8 | # Example:: 9 | # yield ec2.create_vpc( 10 | # CidrBlock='192.168.1.1/32', 11 | # AmazonProvidedIpv6CidrBlock=True, 12 | # InstanceTenancy='default' 13 | # )['Vpc'] 14 | yield 15 | 16 | 17 | @pytest.fixture(scope='function') 18 | def current_{{cookiecutter.technology_slug}}_table(): 19 | from .models import Current{{cookiecutter.technology_slug | titlecase}}Model 20 | mock_dynamodb2().start() 21 | yield Current{{cookiecutter.technology_slug | titlecase}}Model.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 22 | mock_dynamodb2().stop() 23 | 24 | 25 | @pytest.fixture(scope='function') 26 | def durable_{{cookiecutter.technology_slug}}_table(): 27 | from .models import Durable{{cookiecutter.technology_slug | titlecase}}Model 28 | mock_dynamodb2().start() 29 | yield Durable{{cookiecutter.technology_slug | titlecase}}Model.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 30 | mock_dynamodb2().stop() -------------------------------------------------------------------------------- /historical/s3/differ.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.s3.differ 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | """ 8 | import logging 9 | 10 | from raven_python_lambda import RavenLambdaWrapper 11 | 12 | from historical.common.util import deserialize_records 13 | from historical.constants import LOGGING_LEVEL 14 | from historical.s3.models import CurrentS3Model, DurableS3Model 15 | from historical.common.dynamodb import process_dynamodb_differ_record 16 | 17 | logging.basicConfig() 18 | LOG = logging.getLogger('historical') 19 | LOG.setLevel(LOGGING_LEVEL) 20 | 21 | # Path to where in the dict the ephemeral field is -- starting with "root['M'][PathInConfigDontForgetDataType]..." 22 | # EPHEMERAL_PATHS = [] 23 | 24 | 25 | @RavenLambdaWrapper() 26 | def handler(event, context): # pylint: disable=W0613 27 | """ 28 | Historical S3 event differ. 29 | 30 | Listens to the Historical current table and determines if there are differences that need to be persisted in the 31 | historical record. 32 | """ 33 | # De-serialize the records: 34 | records = deserialize_records(event['Records']) 35 | 36 | for record in records: 37 | process_dynamodb_differ_record(record, CurrentS3Model, DurableS3Model) 38 | -------------------------------------------------------------------------------- /historical/constants.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. 
module: historical.constants 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | .. author:: Kevin Glisson 8 | """ 9 | import logging 10 | import os 11 | 12 | LOG_LEVELS = { 13 | 'CRITICAL': logging.CRITICAL, 14 | 'ERROR': logging.ERROR, 15 | 'WARNING': logging.WARNING, 16 | 'INFO': logging.INFO, 17 | 'DEBUG': logging.DEBUG 18 | } 19 | 20 | 21 | def extract_log_level_from_environment(k, default): 22 | """Gets the log level from the environment variable.""" 23 | return LOG_LEVELS.get(os.environ.get(k)) or int(os.environ.get(k, default)) 24 | 25 | 26 | # 24 hours in seconds is the default 27 | TTL_EXPIRY = int(os.environ.get('TTL_EXPIRY', 86400)) 28 | 29 | # By default, don't randomize the pollers (tasker or collector -- same env var): 30 | RANDOMIZE_POLLER = int(os.environ.get('RANDOMIZE_POLLER', 0)) 31 | 32 | CURRENT_REGION = os.environ.get('AWS_DEFAULT_REGION', 'us-east-1') 33 | HISTORICAL_ROLE = os.environ.get('HISTORICAL_ROLE', 'Historical') 34 | POLL_REGIONS = os.environ.get('POLL_REGIONS', 'us-east-1').split(",") 35 | PROXY_REGIONS = os.environ.get('PROXY_REGIONS', 'us-east-1').split(",") 36 | REGION_ATTR = os.environ.get('REGION_ATTR', 'Region') 37 | SIMPLE_DURABLE_PROXY = os.environ.get('SIMPLE_DURABLE_PROXY', False) 38 | LOGGING_LEVEL = extract_log_level_from_environment('LOGGING_LEVEL', logging.INFO) 39 | EVENT_TOO_BIG_FLAG = 'event_too_big' 40 | -------------------------------------------------------------------------------- /historical/common/accounts.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.common.accounts 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. author:: Mike Grima 8 | """ 9 | import os 10 | 11 | from swag_client.backend import SWAGManager 12 | from swag_client.util import parse_swag_config_options 13 | 14 | 15 | def parse_boolean(value): 16 | """Simple function to get a boolean value from string.""" 17 | if not value: 18 | return False 19 | 20 | if str(value).lower() == 'true': 21 | return True 22 | 23 | return False 24 | 25 | 26 | def get_historical_accounts(): 27 | """Fetches valid accounts from SWAG if enabled or a list accounts.""" 28 | if os.environ.get('SWAG_BUCKET', False): 29 | swag_opts = { 30 | 'swag.type': 's3', 31 | 'swag.bucket_name': os.environ['SWAG_BUCKET'], 32 | 'swag.data_file': os.environ.get('SWAG_DATA_FILE', 'accounts.json'), 33 | 'swag.region': os.environ.get('SWAG_REGION', 'us-east-1') 34 | } 35 | swag = SWAGManager(**parse_swag_config_options(swag_opts)) 36 | search_filter = f"[?provider=='aws' && owner=='{os.environ['SWAG_OWNER']}' && account_status!='deleted'" 37 | 38 | if parse_boolean(os.environ.get('TEST_ACCOUNTS_ONLY')): 39 | search_filter += " && environment=='test'" 40 | 41 | search_filter += ']' 42 | 43 | accounts = swag.get_service_enabled('historical', search_filter=search_filter) 44 | else: 45 | accounts = [{'id': account_id} for account_id in os.environ['ENABLED_ACCOUNTS'].split(',')] 46 | 47 | return accounts 48 | -------------------------------------------------------------------------------- /historical/common/util.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. 
module: historical.common.util 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | """ 8 | import json 9 | 10 | 11 | def deserialize_records(records): 12 | """ 13 | This properly deserializes records depending on where they came from: 14 | - SQS 15 | - SNS 16 | """ 17 | native_records = [] 18 | for record in records: 19 | parsed = json.loads(record['body']) 20 | 21 | # Is this a DynamoDB stream event? 22 | if isinstance(parsed, str): 23 | native_records.append(json.loads(parsed)) 24 | 25 | # Is this a subscription message from SNS? If so, skip it: 26 | elif parsed.get('Type') == 'SubscriptionConfirmation': 27 | continue 28 | 29 | # Is this from SNS (cross-region request -- SNS messages wrapped in SQS message) -- or an SNS proxied message? 30 | elif parsed.get('Message'): 31 | native_records.append(json.loads(parsed['Message'])) 32 | 33 | else: 34 | native_records.append(parsed) 35 | 36 | return native_records 37 | 38 | 39 | def pull_tag_dict(data): 40 | """This will pull out a list of Tag Name-Value objects, and return it as a dictionary. 41 | 42 | :param data: The dict collected from the collector. 43 | :returns dict: A dict of the tag names and their corresponding values. 44 | """ 45 | # If there are tags, set them to a normal dict, vs. a list of dicts: 46 | tags = data.pop('Tags', {}) or {} 47 | if tags: 48 | proper_tags = {} 49 | for tag in tags: 50 | proper_tags[tag['Key']] = tag['Value'] 51 | 52 | tags = proper_tags 53 | 54 | return tags 55 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | Historical 3 | ========== 4 | 5 | Allows for the tracking of AWS configuration data across accounts/regions/technologies. 
6 | 7 | """ 8 | import os.path 9 | 10 | from setuptools import find_packages, setup 11 | 12 | ROOT = os.path.realpath(os.path.join(os.path.dirname(__file__))) 13 | 14 | about = {} 15 | with open(os.path.join(ROOT, "historical", "__about__.py")) as f: 16 | exec(f.read(), about) 17 | 18 | 19 | install_requires = [ 20 | 'boto3>=1.9.47', 21 | 'cloudaux>=1.4.14', 22 | 'click>=6.7', 23 | 'pynamodb>=3.3.1', 24 | 'deepdiff>=3.3.0', 25 | 'raven-python-lambda>=0.1.7', 26 | 'marshmallow>=2.13.5', 27 | 'swag-client==0.4.3', 28 | 'python-dateutil==2.6.1', 29 | 'Jinja2==2.10' 30 | ] 31 | 32 | tests_require = [ 33 | 'cookiecutter==1.6.0', 34 | 'pytest==3.1.3', 35 | 'pytest-cov>=2.5.1', 36 | 'mock==2.0.0', 37 | 'moto>=1.3.2', 38 | 'coveralls==1.1', 39 | 'factory-boy==2.9.2', 40 | 'tox==3.4.0', 41 | ] 42 | 43 | 44 | setup( 45 | name=about["__title__"], 46 | version=about["__version__"], 47 | author=about["__author__"], 48 | author_email=about["__email__"], 49 | url=about["__uri__"], 50 | description=about["__summary__"], 51 | long_description='See README.md', 52 | packages=find_packages(), 53 | include_package_data=True, 54 | zip_safe=False, 55 | install_requires=install_requires, 56 | extras_require={ 57 | 'tests': tests_require 58 | }, 59 | entry_points={ 60 | 'console_scripts': [ 61 | 'historical = historical.cli:cli', 62 | ] 63 | }, 64 | keywords=['aws', 'account_management'], 65 | classifiers=[ 66 | 'Programming Language :: Python', 67 | 'Programming Language :: Python :: 3', 68 | 'Programming Language :: Python :: 3.6', 69 | ], 70 | ) 71 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = py36,linters 3 | 4 | [testenv] 5 | usedevelop = True 6 | passenv = TRAVIS TRAVIS_* 7 | deps = 8 | git+https://github.com/mikegrima/moto.git@instanceprofiles#egg=moto 9 | .[tests] 10 | mock 11 | pytest 12 | coveralls 13 | 14 | setenv = 15 | COVERAGE_FILE = test-reports/{envname}/.coverage 16 | PYTEST_ADDOPTS = --junitxml=test-reports/{envname}/junit.xml -vv 17 | # Fix for PynamoDB Vendored Requests: 18 | PYNAMODB_CONFIG = historical/tests/pynamodb_settings.py 19 | commands = 20 | pytest {posargs} --ignore=historical/historical-cookiecutter historical 21 | coveralls 22 | 23 | [testenv:linters] 24 | basepython = python3 25 | usedevelop = true 26 | deps = 27 | {[testenv:flake8]deps} 28 | {[testenv:pylint]deps} 29 | {[testenv:setuppy]deps} 30 | {[testenv:bandit]deps} 31 | commands = 32 | {[testenv:flake8]commands} 33 | {[testenv:pylint]commands} 34 | {[testenv:setuppy]commands} 35 | {[testenv:bandit]commands} 36 | 37 | [testenv:flake8] 38 | basepython = python3 39 | skip_install = true 40 | deps = 41 | flake8 42 | flake8-docstrings>=0.2.7 43 | flake8-import-order>=0.9 44 | commands = 45 | flake8 historical setup.py test 46 | 47 | [testenv:pylint] 48 | basepython = python3 49 | skip_install = false 50 | deps = 51 | pyflakes 52 | pylint 53 | commands = 54 | pylint --rcfile={toxinidir}/.pylintrc historical 55 | 56 | [testenv:setuppy] 57 | basepython = python3 58 | skip_install = true 59 | deps = 60 | commands = 61 | python setup.py check -m -s 62 | 63 | [testenv:bandit] 64 | basepython = python3 65 | skip_install = true 66 | deps = 67 | bandit 68 | commands = 69 | bandit --ini tox.ini -r historical 70 | 71 | [bandit] 72 | skips = B101 73 | 74 | [flake8] 75 | ignore = E501,I100,D205,D400,D401,I202,R0913,C901 76 | exclude = 77 | *.egg-info, 78 | *.pyc, 79 | .cache, 80 | .coverage.*, 
81 | .gradle, 82 | .tox, 83 | build, 84 | dist, 85 | htmlcov.* 86 | *-cookiecutter 87 | historical/tests/factories.py 88 | max-complexity = 10 89 | import-order-style = google 90 | application-import-names = flake8 91 | 92 | [pytest] 93 | norecursedirs=.* 94 | -------------------------------------------------------------------------------- /mkdocs/docs/index.md: -------------------------------------------------------------------------------- 1 |
2 | ![Historical](img/historical.jpg) 3 | 4 | 5 | **THIS PROJECT IS NO LONGER IN DEVELOPMENT. THIS IS ONLY HERE FOR ARCHIVAL PURPOSES ONLY!!**
6 | 7 | Historical is a serverless application that tracks and reacts to AWS resource modifications anywhere in 8 | your environment. Historical achieves this by describing AWS resources when they are changed, and keeping the history of those changes along with the the CloudTrail context of those changes. 9 | 10 | Historical persists data in two places: 11 | 12 | - A "Current" DynamoDB table, which is a cache of the current state of AWS resources 13 | - A "Durable" DynamoDB table, which stores the change history of AWS resources 14 | 15 | Historical enables downstream consumers to react to changes in the AWS environment 16 | without the need to directly describe the resource. This greatly increases speed of reaction, reduces IAM permission complexity, and also avoids rate limiting. 17 | 18 | ## How it works 19 | Historical leverages AWS CloudWatch Events. Events trigger a "Collector" Lambda function to describe the AWS resource that changed, and saves the configuration into a DynamoDB table. From this, a "Differ" Lambda function checks if the resource has changed from what was previously known about that resource. If the item has changed, a new change record is saved, which then enables downstream consumers the ability to react to changes in the environment as the environment effectively changes over time. 20 | 21 | The CloudTrail context on the change is preserved in the change history. 22 | 23 | ## Current Technologies Implemented 24 | 25 | - ### S3 26 | - ### Security Groups 27 | - ### IAM (In active development -- Coming Soon!) 28 | 29 | ## Architecture 30 | Please review the [Architecture](architecture.md) documentation for an in-depth description of the components involved. 31 | 32 | ## Installation & Configuration 33 | Please review the [Installation & Configuration](installation/) documentation for details. 34 | 35 | ## Troubleshooting 36 | Please review the [Troubleshooting](troubleshooting.md) doc if you are experiencing issues. 37 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/{{cookiecutter.technology_slug}}/poller.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: {{cookiecutter.technology_slug}}.poller 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: {{cookiecutter.author}} <{{cookiecutter.email}}> 7 | """ 8 | import os 9 | import logging 10 | 11 | from botocore.exceptions import ClientError 12 | 13 | from raven_python_lambda import RavenLambdaWrapper 14 | # from cloudaux.aws.ec2 import describe_security_groups 15 | 16 | # from historical.constants import CURRENT_REGION, HISTORICAL_ROLE 17 | from .models import {{cookiecutter.technology_slug}}_polling_schema 18 | from historical.common.accounts import get_historical_accounts 19 | from historical.common.kinesis import produce_events 20 | 21 | logging.basicConfig() 22 | log = logging.getLogger("historical") 23 | log.setLevel(logging.INFO) 24 | 25 | 26 | @RavenLambdaWrapper() 27 | def handler(event, context): 28 | """ 29 | Historical {{cookiecutter.technology_name}} event poller. 30 | 31 | This poller is run at a set interval in order to ensure that changes do not go undetected by historical. 32 | 33 | Historical pollers generate `polling events` which simulate changes. 
These polling events contain configuration 34 | data such as the account/region defining where the collector should attempt to gather data from. 35 | """ 36 | log.debug('Running poller. Configuration: {}'.format(event)) 37 | 38 | for account in get_historical_accounts(): 39 | try: 40 | # TODO describe all items 41 | # Example:: 42 | # 43 | # groups = describe_security_groups( 44 | # account_number=account['id'], 45 | # assume_role=HISTORICAL_ROLE, 46 | # region=CURRENT_REGION 47 | # ) 48 | # events = [security_group_polling_schema.serialize(account['id'], g) for g in groups['SecurityGroups']] 49 | events = [] 50 | produce_events(events, os.environ.get('HISTORICAL_STREAM', 'Historical{{cookiecutter.technology_slug | titlecase }}PollerStream')) 51 | log.debug('Finished generating polling events. Account: {} Events Created: {}'.format(account['id'], len(events))) 52 | except ClientError as e: 53 | log.warning('Unable to generate events for account. AccountId: {account_id} Reason: {reason}'.format( 54 | account_id=account['id'], 55 | reason=e 56 | )) 57 | 58 | -------------------------------------------------------------------------------- /historical/tests/test_cloudwatch.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.tests.test_s3 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | """ 8 | 9 | import json 10 | 11 | from historical.tests.factories import ( 12 | CloudwatchEventFactory, 13 | DetailFactory, 14 | serialize 15 | ) 16 | 17 | 18 | def test_filter_request_parameters(): 19 | """Tests that specific elements can be pulled out of the Request Parameters in the CloudWatch Event.""" 20 | from historical.common.cloudwatch import filter_request_parameters 21 | event = CloudwatchEventFactory( 22 | detail=DetailFactory( 23 | requestParameters={'GroupId': 'sg-4e386e31'} 24 | ) 25 | ) 26 | data = json.loads(json.dumps(event, default=serialize)) 27 | assert filter_request_parameters('GroupId', data) == 'sg-4e386e31' 28 | 29 | 30 | def test_get_user_identity(): 31 | """Tests that the User Identity can be pulled out of the CloudWatch Event.""" 32 | from historical.common.cloudwatch import get_user_identity 33 | event = CloudwatchEventFactory() 34 | data = json.loads(json.dumps(event, default=serialize)) 35 | assert get_user_identity(data) 36 | 37 | 38 | def test_get_principal(): 39 | """Tests that the Principal object can be pulled out of the CloudWatch Event.""" 40 | from historical.common.cloudwatch import get_principal 41 | event = CloudwatchEventFactory() 42 | data = json.loads(json.dumps(event, default=serialize)) 43 | assert get_principal(data) == 'joe@example.com' 44 | 45 | 46 | def test_get_region(): 47 | """Tests that the Region can be pulled out of the CloudWatch Event.""" 48 | from historical.common.cloudwatch import get_region 49 | event = CloudwatchEventFactory() 50 | data = json.loads(json.dumps(event, default=serialize)) 51 | assert get_region(data) == 'us-east-1' 52 | 53 | 54 | def test_get_event_time(): 55 | """Tests that the Event Time can be pulled out of the CloudWatch Event.""" 56 | from historical.common.cloudwatch import get_event_time 57 | event = CloudwatchEventFactory() 58 | data = json.loads(json.dumps(event, default=serialize)) 59 | assert get_event_time(data) 60 | 61 | 62 | def test_get_account_id(): 63 | """Tests that the Account ID can be pulled out of the CloudWatch Event.""" 64 | 
from historical.common.cloudwatch import get_account_id 65 | event = CloudwatchEventFactory() 66 | data = json.loads(json.dumps(event, default=serialize)) 67 | assert get_account_id(data) == '123456789012' 68 | -------------------------------------------------------------------------------- /terraform/teardown_historical.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | [ -z "$TECH" ] && echo "Need to set TECH -- one of [s3, securitygroup]" && exit 1; 4 | [ -z "$TF_S3_BUCKET" ] && echo "Need to set TF_S3_BUCKET -- the S3 bucket to use for Terraform" && exit 1; 5 | [ -z "$PRIMARY_REGION" ] && echo "Need to set PRIMARY_REGION." && exit 1; 6 | 7 | # AWS ENV VARS: 8 | [ -z "$AWS_ACCESS_KEY_ID" ] && echo "Need to set the AWS_ACCESS_KEY_ID" && exit 1; 9 | [ -z "$AWS_SECRET_ACCESS_KEY" ] && echo "Need to set the AWS_SECRET_ACCESS_KEY" && exit 1; 10 | [ -z "$AWS_SESSION_TOKEN" ] && echo "Need to set the AWS_SESSION_TOKEN" && exit 1; 11 | 12 | # Copy the requirements.txt file over: 13 | WORKING_DIR=$( pwd ) 14 | 15 | # Make an empty file to make Terraform happy: 16 | touch ${WORKING_DIR}/infra/lambda.zip 17 | 18 | # Tear down the stacks first: 19 | cd ${WORKING_DIR}/infra 20 | cp ${TECH}/${TECH}.tf ./ 21 | 22 | # Start the Terraform work: 23 | echo "[@] Now tearing down the infrastructure for each region -- starting with the PRIMARY REGION: ${PRIMARY_REGION}..." 24 | IFS=',' 25 | ALL_REGIONS=$PRIMARY_REGION,$SECONDARY_REGIONS 26 | for region in $ALL_REGIONS; 27 | do 28 | echo "[-->] Initializing Terraform for ${region}..." 29 | /terraform init -plugin-dir=/installer/terraform-plugins -backend-config "bucket=$TF_S3_BUCKET" -backend-config "key=terraform/$TECH/INFRA/$region" 30 | if [ $? -ne 0 ]; then 31 | echo "[X] Terraform init has failed!!" 32 | exit 1 33 | fi 34 | 35 | echo "[-->] Tearing down the stack now..." 36 | TF_VAR_REGION=${region} /terraform destroy -auto-approve 37 | if [ $? -ne 0 ]; then 38 | echo "[X] Terraform stack destroy has failed!! -- Sometimes this needs be run multiple times due to eventual consistency." 39 | exit 1 40 | fi 41 | echo "[+] Completed tearing down stack in ${region}." 42 | 43 | # Clear out the existing Terraform data: 44 | rm -Rf .terraform/ 45 | done 46 | 47 | echo "[-->] Initializing Terraform for DynamoDB work..." 48 | cd ${WORKING_DIR}/dynamodb 49 | # Copy the tech template into the local directory for Terraform to tear down the tech's DynamoDB components: 50 | cp ${TECH}/${TECH}.tf ./ 51 | /terraform init -plugin-dir=/installer/terraform-plugins -backend-config "bucket=$TF_S3_BUCKET" -backend-config "key=terraform/$TECH/DYNAMODB" 52 | if [ $? -ne 0 ]; then 53 | echo "[X] Terraform init has failed!!" 54 | exit 1 55 | fi 56 | echo "[-->] Tearing down the DynamoDB stack..." 57 | /terraform destroy -auto-approve 58 | if [ $? -ne 0 ]; then 59 | echo "[X] Terraform stack destroy has failed!!" 60 | exit 1 61 | fi 62 | echo "[+] Completed tearing down DynamoDB." 63 | 64 | echo "[@] DONE" 65 | -------------------------------------------------------------------------------- /historical/common/cloudwatch.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.common.cloudwatch 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. 
author:: Mike Grima 8 | """ 9 | from datetime import datetime 10 | 11 | from historical.constants import CURRENT_REGION 12 | 13 | 14 | def filter_request_parameters(field_name, msg, look_in_response=False): 15 | """ 16 | From an event, extract the field name from the message. 17 | Different API calls put this information in different places, so check a few places. 18 | """ 19 | val = msg['detail'].get(field_name, None) 20 | try: 21 | if not val: 22 | val = msg['detail'].get('requestParameters', {}).get(field_name, None) 23 | 24 | # If we STILL didn't find it -- check if it's in the response element (default off) 25 | if not val and look_in_response: 26 | if msg['detail'].get('responseElements'): 27 | val = msg['detail']['responseElements'].get(field_name, None) 28 | 29 | # Just in case... We didn't find the value, so just make it None: 30 | except AttributeError: 31 | val = None 32 | 33 | return val 34 | 35 | 36 | def get_user_identity(event): 37 | """Gets event identity from event.""" 38 | return event['detail'].get('userIdentity', {}) 39 | 40 | 41 | def get_principal(event): 42 | """Gets principal id from the event""" 43 | user_identity = get_user_identity(event) 44 | return user_identity.get('principalId', '').split(':')[-1] 45 | 46 | 47 | def get_region(event): 48 | """Get region from event details.""" 49 | return event['detail'].get('awsRegion', CURRENT_REGION) 50 | 51 | 52 | def get_event_time(event): 53 | """Gets the event time from an event""" 54 | return datetime.strptime(event['detail']['eventTime'], "%Y-%m-%dT%H:%M:%SZ") 55 | 56 | 57 | def get_account_id(event): 58 | """Gets the account id from an event""" 59 | return event['account'] 60 | 61 | 62 | def get_collected_details(event): 63 | """Gets collected details if the technology's poller already described the given asset""" 64 | return event['detail'].get('collected') 65 | 66 | 67 | def get_historical_base_info(event): 68 | """Gets the base details from the CloudWatch Event.""" 69 | data = { 70 | 'principalId': get_principal(event), 71 | 'userIdentity': get_user_identity(event), 72 | 'accountId': event['account'], 73 | 'userAgent': event['detail'].get('userAgent'), 74 | 'sourceIpAddress': event['detail'].get('sourceIPAddress'), 75 | 'requestParameters': event['detail'].get('requestParameters') 76 | } 77 | 78 | if event['detail'].get('eventTime'): 79 | data['eventTime'] = event['detail']['eventTime'] 80 | 81 | if event['detail'].get('eventSource'): 82 | data['eventSource'] = event['detail']['eventSource'] 83 | 84 | if event['detail'].get('eventName'): 85 | data['eventName'] = event['detail']['eventName'] 86 | 87 | return data 88 | -------------------------------------------------------------------------------- /terraform/install_historical.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | [ -z "$TECH" ] && echo "Need to set TECH -- one of [s3, securitygroup]" && exit 1; 4 | [ -z "$TF_S3_BUCKET" ] && echo "Need to set TF_S3_BUCKET -- the S3 bucket to use for Terraform" && exit 1; 5 | [ -z "$PRIMARY_REGION" ] && echo "Need to set PRIMARY_REGION." 
&& exit 1; 6 | 7 | # AWS ENV VARS: 8 | [ -z "$AWS_ACCESS_KEY_ID" ] && echo "Need to set the AWS_ACCESS_KEY_ID" && exit 1; 9 | [ -z "$AWS_SECRET_ACCESS_KEY" ] && echo "Need to set the AWS_SECRET_ACCESS_KEY" && exit 1; 10 | [ -z "$AWS_SESSION_TOKEN" ] && echo "Need to set the AWS_SESSION_TOKEN" && exit 1; 11 | 12 | # Copy the requirements.txt file over: 13 | WORKING_DIR=$( pwd ) 14 | echo "[-->] Copying the requirements file over to the build dir..." 15 | mkdir build 16 | cp requirements.txt build/ 17 | 18 | # Navigate to the build dir: 19 | cd build/ 20 | BUILD_SOURCE_DIR=$( pwd ) 21 | 22 | # Make the venv: 23 | echo "[...] Building the venv..." 24 | python36 -m venv venv 25 | source venv/bin/activate 26 | 27 | # Packaging: 28 | ZIP_NAME="historical-${TECH}.zip" 29 | echo "[...] Building the Lambda..." 30 | pip install -r requirements.txt -t ./artifacts 31 | echo "[...] Zipping the Lambda..." 32 | cd artifacts 33 | zip -r ${ZIP_NAME} . 34 | cd ${BUILD_SOURCE_DIR} 35 | 36 | # Make a sym link and place it in the Terraform infra dir for later reference. 37 | cd ${WORKING_DIR} 38 | ln -s ${WORKING_DIR}/build/artifacts/${ZIP_NAME} ${WORKING_DIR}/infra/lambda.zip 39 | 40 | # Start the Terraform work: 41 | echo "[-->] Initializing Terraform for DynamoDB work..." 42 | cd ./dynamodb 43 | # Copy the tech template into the local directory for Terraform to set up the tech's DynamoDB components: 44 | cp ${TECH}/${TECH}.tf ./ 45 | /terraform init -plugin-dir=/installer/terraform-plugins -backend-config "bucket=$TF_S3_BUCKET" -backend-config "key=terraform/$TECH/DYNAMODB" 46 | if [ $? -ne 0 ]; then 47 | echo "[X] Terraform init has failed!!" 48 | exit 1 49 | fi 50 | echo "[-->] Applying the DynamoDB template..." 51 | /terraform apply -auto-approve 52 | if [ $? -ne 0 ]; then 53 | echo "[X] Terraform application has failed!!" 54 | exit 1 55 | fi 56 | echo "[+] Completed applying Terraform for DynamoDB" 57 | 58 | echo "[@] Now deploying the rest of the infrastructure for each region -- starting with the PRIMARY REGION: ${PRIMARY_REGION}..." 59 | # Copy the tech template into the local directory for Terraform to set up the tech's complete infrastructure components: 60 | cd ${WORKING_DIR}/infra 61 | cp ${TECH}/${TECH}.tf ./ 62 | 63 | IFS=',' 64 | ALL_REGIONS=$PRIMARY_REGION,$SECONDARY_REGIONS 65 | for region in $ALL_REGIONS; 66 | do 67 | echo "[-->] Initializing Terraform for ${region}..." 68 | /terraform init -plugin-dir=/installer/terraform-plugins -backend-config "bucket=$TF_S3_BUCKET" -backend-config "key=terraform/$TECH/INFRA/$region" 69 | if [ $? -ne 0 ]; then 70 | echo "[X] Terraform init has failed!!" 71 | exit 1 72 | fi 73 | 74 | echo "[-->] Applying the template now..." 75 | TF_VAR_REGION=${region} /terraform apply -auto-approve 76 | if [ $? -ne 0 ]; then 77 | echo "[X] Terraform application has failed!!" 78 | exit 1 79 | fi 80 | echo "[+] Completed applying template in ${region}." 81 | 82 | # Clear out the existing Terraform data: 83 | rm -Rf .terraform/ 84 | done 85 | 86 | echo "[@] DONE" 87 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/{{cookiecutter.technology_slug}}/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: {{cookiecutter.technology_slug}}.models 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: {{cookiecutter.author}} <{{cookiecutter.email}}> 7 | """ 8 | from marshmallow import Schema, fields, post_dump 9 | 10 | from pynamodb.models import Model 11 | from pynamodb.indexes import GlobalSecondaryIndex, AllProjection 12 | from pynamodb.attributes import UnicodeAttribute, NumberAttribute, ListAttribute 13 | 14 | from historical.constants import CURRENT_REGION 15 | from historical.models import ( 16 | HistoricalPollingEventDetail, 17 | HistoricalPollingBaseModel, 18 | DurableHistoricalModel, 19 | CurrentHistoricalModel, 20 | AWSHistoricalMixin 21 | ) 22 | 23 | 24 | class {{cookiecutter.technology_slug | titlecase}}Model(object): 25 | # TODO add attributes specific to technology 26 | Tags = ListAttribute() 27 | 28 | 29 | class Durable{{cookiecutter.technology_slug | titlecase}}Model(Model, DurableHistoricalModel, AWSHistoricalMixin, {{cookiecutter.technology_slug | titlecase}}Model): 30 | class Meta: 31 | table_name = 'Historical{{cookiecutter.technology_slug | titlecase}}DurableTable' 32 | region = CURRENT_REGION 33 | 34 | 35 | class Current{{cookiecutter.technology_slug | titlecase}}Model(Model, CurrentHistoricalModel, AWSHistoricalMixin, {{cookiecutter.technology_slug | titlecase}}Model): 36 | class Meta: 37 | table_name = 'Historical{{cookiecutter.technology_slug | titlecase}}CurrentTable' 38 | region = CURRENT_REGION 39 | 40 | 41 | class ViewIndex(GlobalSecondaryIndex): 42 | class Meta: 43 | projection = AllProjection() 44 | region = CURRENT_REGION 45 | 46 | view = NumberAttribute(default=0, hash_key=True) 47 | 48 | 49 | class {{cookiecutter.technology_slug | titlecase}}PollingRequestParamsModel(Schema): 50 | # TODO add technology_slug validation fields 51 | owner_id = fields.Str(dump_to='ownerId', load_from='ownerId', required=True) 52 | 53 | 54 | class {{cookiecutter.technology_slug | titlecase}}PollingEventDetail(HistoricalPollingEventDetail): 55 | @post_dump 56 | def add_required_{{cookiecutter.technology_slug}}_polling_data(self, data): 57 | data['eventSource'] = 'historical.ec2.poller' 58 | data['eventName'] = 'HistoricalPoller' 59 | return data 60 | 61 | 62 | class {{cookiecutter.technology_slug | titlecase}}PollingEventModel(HistoricalPollingBaseModel): 63 | detail = fields.Nested({{cookiecutter.technology_slug | titlecase}}PollingEventDetail, required=True) 64 | 65 | @post_dump() 66 | def dump_security_group_polling_event_data(self, data): 67 | data['version'] = '1' 68 | return data 69 | 70 | # TODO add technology_slug specific fields 71 | def serialize(self, account, group): 72 | return self.dumps({ 73 | 'account': account, 74 | 'detail': { 75 | 'request_parameters': { 76 | 'groupId': group['GroupId'] 77 | } 78 | } 79 | }).data 80 | 81 | 82 | {{cookiecutter.technology_slug}}_polling_schema = {{cookiecutter.technology_slug | titlecase}}PollingEventModel(strict=True) 83 | -------------------------------------------------------------------------------- /historical/attributes.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.attributes 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. 
author:: Mike Grima 8 | """ 9 | import json 10 | import decimal 11 | 12 | from pynamodb.attributes import Attribute, BooleanAttribute, ListAttribute, MapAttribute, NumberAttribute 13 | 14 | import pynamodb 15 | from pynamodb.constants import NUMBER, STRING 16 | 17 | DATETIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ' 18 | 19 | 20 | class HistoricalUnicodeAttribute(Attribute): 21 | """A Historical unicode attribute. 22 | Replaces '' with the '<empty>' placeholder during serialization and correctly deserializes '<empty>' back to '' 23 | """ 24 | 25 | attr_type = STRING 26 | 27 | def serialize(self, value): 28 | """Returns a unicode string""" 29 | if value is None or not len(value): # pylint: disable=C1801 30 | return '<empty>' 31 | return value 32 | 33 | def deserialize(self, value): 34 | """Replaces the `<empty>` placeholders with empty strings.""" 35 | if value == '<empty>': 36 | return '' 37 | return value 38 | 39 | 40 | class EventTimeAttribute(Attribute): 41 | """An attribute for storing a UTC Datetime or iso8601 string.""" 42 | 43 | attr_type = STRING 44 | 45 | def serialize(self, value): 46 | """Takes a datetime object and returns a string""" 47 | if isinstance(value, str): 48 | return value 49 | return value.strftime(DATETIME_FORMAT) 50 | 51 | 52 | def decimal_default(obj): 53 | """Properly parse out the Decimal datatypes into proper int/float types.""" 54 | if isinstance(obj, decimal.Decimal): 55 | if obj % 1: 56 | return float(obj) 57 | return int(obj) 58 | raise TypeError 59 | 60 | 61 | # pylint: disable=R1705,C0200 62 | def fix_decimals(obj): 63 | """Removes the stupid Decimals 64 | 65 | See: https://github.com/boto/boto3/issues/369#issuecomment-302137290 66 | """ 67 | if isinstance(obj, list): 68 | for i in range(len(obj)): 69 | obj[i] = fix_decimals(obj[i]) 70 | return obj 71 | 72 | elif isinstance(obj, dict): 73 | for key, value in obj.items(): 74 | obj[key] = fix_decimals(value) 75 | return obj 76 | 77 | elif isinstance(obj, decimal.Decimal): 78 | if obj % 1 == 0: 79 | return int(obj) 80 | else: 81 | return float(obj) 82 | 83 | else: 84 | return obj 85 | 86 | 87 | class HistoricalDecimalAttribute(Attribute): 88 | """A number attribute""" 89 | 90 | attr_type = NUMBER 91 | 92 | def serialize(self, value): 93 | """Encode numbers as JSON""" 94 | return json.dumps(value, default=decimal_default) 95 | 96 | def deserialize(self, value): 97 | """Decode numbers from JSON""" 98 | return json.loads(value) 99 | 100 | 101 | pynamodb.attributes.SERIALIZE_CLASS_MAP = { 102 | dict: MapAttribute(), 103 | list: ListAttribute(), 104 | set: ListAttribute(), 105 | bool: BooleanAttribute(), 106 | float: NumberAttribute(), 107 | int: NumberAttribute(), 108 | str: HistoricalUnicodeAttribute(), 109 | decimal.Decimal: HistoricalDecimalAttribute() 110 | } 111 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | # Created by https://www.gitignore.io/api/python,visualstudiocode,node,serverless 3 | 4 | .idea 5 | *.cert 6 | *.key 7 | 8 | ### Node ### 9 | # Logs 10 | logs 11 | *.log 12 | npm-debug.log* 13 | yarn-debug.log* 14 | yarn-error.log* 15 | 16 | # Runtime data 17 | pids 18 | *.pid 19 | *.seed 20 | *.pid.lock 21 | 22 | # Directory for instrumented libs generated by jscoverage/JSCover 23 | lib-cov 24 | 25 | # Coverage directory used by tools like istanbul 26 | coverage 27 | 28 | # nyc test coverage 29 | .nyc_output 30 | 31 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 32 | .grunt 33 | 34 | # Bower
dependency directory (https://bower.io/) 35 | bower_components 36 | 37 | # node-waf configuration 38 | .lock-wscript 39 | 40 | # Compiled binary addons (http://nodejs.org/api/addons.html) 41 | build/Release 42 | 43 | # Dependency directories 44 | node_modules/ 45 | jspm_packages/ 46 | 47 | # Typescript v1 declaration files 48 | typings/ 49 | 50 | # Optional npm cache directory 51 | .npm 52 | 53 | # Optional eslint cache 54 | .eslintcache 55 | 56 | # Optional REPL history 57 | .node_repl_history 58 | 59 | # Output of 'npm pack' 60 | *.tgz 61 | 62 | # Yarn Integrity file 63 | .yarn-integrity 64 | 65 | # dotenv environment variables file 66 | .env 67 | 68 | 69 | ### Python ### 70 | # Byte-compiled / optimized / DLL files 71 | __pycache__/ 72 | *.py[cod] 73 | *$py.class 74 | 75 | # C extensions 76 | *.so 77 | 78 | # Distribution / packaging 79 | .Python 80 | env/ 81 | build/ 82 | develop-eggs/ 83 | dist/ 84 | downloads/ 85 | eggs/ 86 | .eggs/ 87 | lib/ 88 | lib64/ 89 | parts/ 90 | sdist/ 91 | var/ 92 | wheels/ 93 | *.egg-info/ 94 | .installed.cfg 95 | *.egg 96 | 97 | # PyInstaller 98 | # Usually these files are written by a python script from a template 99 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 100 | *.manifest 101 | *.spec 102 | 103 | # Installer logs 104 | pip-log.txt 105 | pip-delete-this-directory.txt 106 | 107 | # Unit test / coverage reports 108 | htmlcov/ 109 | .tox/ 110 | .coverage 111 | .coverage.* 112 | .cache 113 | nosetests.xml 114 | coverage.xml 115 | *,cover 116 | .hypothesis/ 117 | 118 | # Translations 119 | *.mo 120 | *.pot 121 | 122 | # Django stuff: 123 | local_settings.py 124 | 125 | # Flask stuff: 126 | instance/ 127 | .webassets-cache 128 | 129 | # Scrapy stuff: 130 | .scrapy 131 | 132 | # Sphinx documentation 133 | docs/_build/ 134 | 135 | # PyBuilder 136 | target/ 137 | 138 | # Jupyter Notebook 139 | .ipynb_checkpoints 140 | 141 | # pyenv 142 | .python-version 143 | 144 | # celery beat schedule file 145 | celerybeat-schedule 146 | 147 | # SageMath parsed files 148 | *.sage.py 149 | 150 | # dotenv 151 | 152 | # virtualenv 153 | .venv 154 | venv/ 155 | ENV/ 156 | venv 157 | 158 | # Spyder project settings 159 | .spyderproject 160 | .spyproject 161 | 162 | # Rope project settings 163 | .ropeproject 164 | 165 | # mkdocs documentation 166 | /site 167 | 168 | ### Serverless ### 169 | # Ignore build directory 170 | .serverless 171 | .requirements 172 | 173 | *.test.yml 174 | serverless.yml 175 | serverless_configs/* 176 | test-reports/* 177 | 178 | ### VisualStudioCode ### 179 | .vscode 180 | 181 | # End of https://www.gitignore.io/api/python,visualstudiocode,node,serverless 182 | 183 | .DS_Store 184 | .DS_Store/ 185 | 186 | ### MKDOCS ### 187 | mkdocs/site/ 188 | 189 | # Terraform 190 | env.list 191 | -------------------------------------------------------------------------------- /historical/security_group/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.security_group.models 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Kevin Glisson 7 | """ 8 | from marshmallow import fields, post_dump 9 | 10 | from pynamodb.attributes import UnicodeAttribute 11 | 12 | from historical.constants import CURRENT_REGION 13 | from historical.models import AWSHistoricalMixin, CurrentHistoricalModel, DurableHistoricalModel,\ 14 | HistoricalPollingBaseModel, HistoricalPollingEventDetail 15 | 16 | VERSION = 1 17 | 18 | 19 | class SecurityGroupModel: 20 | """Security Group specific fields for DynamoDB.""" 21 | 22 | GroupId = UnicodeAttribute() 23 | GroupName = UnicodeAttribute() 24 | VpcId = UnicodeAttribute(null=True) 25 | Region = UnicodeAttribute() 26 | 27 | 28 | class DurableSecurityGroupModel(DurableHistoricalModel, AWSHistoricalMixin, SecurityGroupModel): 29 | """The Durable Table model for Security Groups.""" 30 | 31 | class Meta: 32 | """Table details""" 33 | 34 | table_name = 'HistoricalSecurityGroupDurableTable' 35 | region = CURRENT_REGION 36 | tech = 'securitygroup' 37 | 38 | 39 | class CurrentSecurityGroupModel(CurrentHistoricalModel, AWSHistoricalMixin, SecurityGroupModel): 40 | """The Current Table model for Security Groups.""" 41 | 42 | class Meta: 43 | """Table details""" 44 | 45 | table_name = 'HistoricalSecurityGroupCurrentTable' 46 | region = CURRENT_REGION 47 | tech = 'securitygroup' 48 | 49 | 50 | class SecurityGroupPollingEventDetail(HistoricalPollingEventDetail): 51 | """Schema that provides the required fields for mimicking the CloudWatch Event for Polling.""" 52 | 53 | region = fields.Str(required=True, load_from='awsRegion', dump_to='awsRegion') 54 | 55 | @post_dump 56 | def add_required_security_group_polling_data(self, data): 57 | """Adds the required data to the JSON. 58 | 59 | :param data: 60 | :return: 61 | """ 62 | data['eventSource'] = 'historical.ec2.poller' 63 | data['eventName'] = 'PollSecurityGroups' 64 | return data 65 | 66 | 67 | class SecurityGroupPollingEventModel(HistoricalPollingBaseModel): 68 | """This is the Marshmallow schema for a Polling event. This is made to look like a CloudWatch Event.""" 69 | 70 | detail = fields.Nested(SecurityGroupPollingEventDetail, required=True) 71 | 72 | @post_dump() 73 | def dump_security_group_polling_event_data(self, data): 74 | """Adds the required data to the JSON. 75 | 76 | :param data: 77 | :return: 78 | """ 79 | data['version'] = '1' 80 | return data 81 | 82 | def serialize(self, account, group, region): 83 | """Serializes the JSON for the Polling Event Model. 84 | 85 | :param account: 86 | :param group: 87 | :param region: 88 | :return: 89 | """ 90 | return self.dumps({ 91 | 'account': account, 92 | 'detail': { 93 | 'request_parameters': { 94 | 'groupId': group['GroupId'] 95 | }, 96 | 'region': region, 97 | 'collected': group 98 | } 99 | }).data 100 | 101 | 102 | SECURITY_GROUP_POLLING_SCHEMA = SecurityGroupPollingEventModel(strict=True) 103 | -------------------------------------------------------------------------------- /historical/common/sqs.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.common.sqs 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Mike Grima 7 | """ 8 | import logging 9 | import uuid 10 | import random 11 | 12 | import boto3 13 | 14 | from historical.constants import CURRENT_REGION 15 | 16 | logging.basicConfig() 17 | LOG = logging.getLogger('historical') 18 | LOG.setLevel(logging.INFO) 19 | 20 | 21 | def chunks(event_list, chunk_size): 22 | """Yield successive n-sized chunks from the event list.""" 23 | for i in range(0, len(event_list), chunk_size): 24 | yield event_list[i:i + chunk_size] 25 | 26 | 27 | def get_queue_url(queue_name): 28 | """Get the URL of the SQS queue to send events to.""" 29 | client = boto3.client("sqs", CURRENT_REGION) 30 | queue = client.get_queue_url(QueueName=queue_name) 31 | 32 | return queue["QueueUrl"] 33 | 34 | 35 | def make_sqs_record(event, delay_seconds=0): 36 | """Get a dict with the components required for SQS""" 37 | return { 38 | "Id": uuid.uuid4().hex, 39 | "DelaySeconds": delay_seconds, 40 | "MessageBody": event 41 | } 42 | 43 | 44 | def get_random_delay(max_seconds): 45 | """Gets a randomized number between 0 and the max number in seconds for 46 | how long a message in SQS should be delayed. 47 | 48 | 900 seconds (15 min) is the maximum permitted by SQS. 49 | :param max_seconds: 50 | :return: 51 | """ 52 | return random.randint(0, max_seconds) # nosec 53 | 54 | 55 | def produce_events(events, queue_url, batch_size=10, randomize_delay=0): 56 | """ 57 | Efficiently sends events to the SQS event queue. 58 | 59 | Note: SQS has a max size of 10 items. Please be aware that this can make the messages go past size -- even 60 | with shrinking messages! 61 | 62 | Events can get randomized delays, maximum of 900 seconds. Set that in `randomize_delay` 63 | :param events: 64 | :param queue_url: 65 | :param batch_size: 66 | :param randomize_delay: 67 | """ 68 | client = boto3.client('sqs', region_name=CURRENT_REGION) 69 | 70 | for chunk in chunks(events, batch_size): 71 | records = [make_sqs_record(event, delay_seconds=get_random_delay(randomize_delay)) for event in chunk] 72 | 73 | client.send_message_batch(Entries=records, QueueUrl=queue_url) 74 | 75 | 76 | def group_records_by_type(records, update_events): 77 | """Break records into two lists; create/update events and delete events. 78 | 79 | :param records: 80 | :param update_events: 81 | :return update_records, delete_records: 82 | """ 83 | update_records, delete_records = [], [] 84 | for record in records: 85 | if record.get("detail-type", "") == "Scheduled Event": 86 | LOG.error("[X] Received a Scheduled Event in the Queue... Please check that your environment is set up" 87 | " correctly.") 88 | continue 89 | 90 | # Ignore SQS junk messages (like subscription notices and things): 91 | if not record.get("detail"): 92 | continue 93 | 94 | # Do not capture error events: 95 | if not record["detail"].get("errorCode"): 96 | if record['detail']['eventName'] in update_events: 97 | update_records.append(record) 98 | else: 99 | delete_records.append(record) 100 | 101 | return update_records, delete_records 102 | -------------------------------------------------------------------------------- /historical/vpc/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.vpc.models 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. 
author:: Mike Grima 8 | """ 9 | from marshmallow import fields, post_dump, Schema 10 | 11 | from pynamodb.attributes import BooleanAttribute, UnicodeAttribute 12 | 13 | from historical.constants import CURRENT_REGION 14 | from historical.models import ( 15 | AWSHistoricalMixin, 16 | CurrentHistoricalModel, 17 | DurableHistoricalModel, 18 | HistoricalPollingBaseModel, 19 | HistoricalPollingEventDetail, 20 | ) 21 | 22 | 23 | VERSION = 1 24 | 25 | 26 | class VPCModel: 27 | """VPC specific fields for DynamoDB.""" 28 | 29 | VpcId = UnicodeAttribute() 30 | State = UnicodeAttribute() 31 | CidrBlock = UnicodeAttribute() 32 | IsDefault = BooleanAttribute() 33 | Name = UnicodeAttribute(null=True) 34 | Region = UnicodeAttribute() 35 | 36 | 37 | class DurableVPCModel(DurableHistoricalModel, AWSHistoricalMixin, VPCModel): 38 | """The Durable Table model for VPC.""" 39 | 40 | class Meta: 41 | """Table details""" 42 | 43 | table_name = 'HistoricalVPCDurableTable' 44 | region = CURRENT_REGION 45 | tech = 'vpc' 46 | 47 | 48 | class CurrentVPCModel(CurrentHistoricalModel, AWSHistoricalMixin, VPCModel): 49 | """The Current Table model for VPC.""" 50 | 51 | class Meta: 52 | """Table details""" 53 | 54 | table_name = 'HistoricalVPCCurrentTable' 55 | region = CURRENT_REGION 56 | tech = 'vpc' 57 | 58 | 59 | class VPCPollingRequestParamsModel(Schema): 60 | """Schema with the required fields for the Poller to instruct the Collector to fetch VPC details.""" 61 | 62 | vpc_id = fields.Str(dump_to='vpcId', load_from='vpcId', required=True) 63 | owner_id = fields.Str(dump_to='ownerId', load_from='ownerId', required=True) 64 | 65 | 66 | class VPCPollingEventDetail(HistoricalPollingEventDetail): 67 | """Schema that provides the required fields for mimicking the CloudWatch Event for Polling.""" 68 | 69 | @post_dump 70 | def add_required_vpc_polling_data(self, data): 71 | """Adds the required data to the JSON. 72 | 73 | :param data: 74 | :return: 75 | """ 76 | data['eventSource'] = 'historical.ec2.poller' 77 | data['eventName'] = 'PollVpc' 78 | return data 79 | 80 | 81 | class VPCPollingEventModel(HistoricalPollingBaseModel): 82 | """This is the Marshmallow schema for a Polling event. This is made to look like a CloudWatch Event.""" 83 | 84 | detail = fields.Nested(VPCPollingEventDetail, required=True) 85 | 86 | @post_dump() 87 | def dump_vpc_polling_event_data(self, data): 88 | """Adds the required data to the JSON. 89 | 90 | :param data: 91 | :return: 92 | """ 93 | data['version'] = '1' 94 | return data 95 | 96 | def serialize(self, account, group): 97 | """Serializes the JSON for the Polling Event Model. 98 | 99 | :param account: 100 | :param group: 101 | :return: 102 | """ 103 | return self.dumps({ 104 | 'account': account, 105 | 'detail': { 106 | 'request_parameters': { 107 | 'vpcId': group['VpcId'] 108 | } 109 | } 110 | }).data 111 | 112 | 113 | VPC_POLLING_SCHEMA = VPCPollingEventModel(strict=True) 114 | -------------------------------------------------------------------------------- /historical/s3/poller.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.s3.poller 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Mike Grima 7 | """ 8 | import os 9 | import logging 10 | 11 | from botocore.exceptions import ClientError 12 | 13 | from cloudaux.aws.s3 import list_buckets 14 | 15 | from raven_python_lambda import RavenLambdaWrapper 16 | 17 | from historical.common.sqs import get_queue_url, produce_events 18 | from historical.common.util import deserialize_records 19 | from historical.constants import CURRENT_REGION, HISTORICAL_ROLE, LOGGING_LEVEL, RANDOMIZE_POLLER 20 | from historical.models import HistoricalPollerTaskEventModel 21 | from historical.s3.models import S3_POLLING_SCHEMA 22 | from historical.common.accounts import get_historical_accounts 23 | 24 | logging.basicConfig() 25 | LOG = logging.getLogger("historical") 26 | LOG.setLevel(LOGGING_LEVEL) 27 | 28 | 29 | @RavenLambdaWrapper() 30 | def poller_tasker_handler(event, context): # pylint: disable=W0613 31 | """ 32 | Historical S3 Poller Tasker. 33 | 34 | The Poller is run at a set interval in order to ensure that changes do not go undetected by Historical. 35 | 36 | Historical pollers generate `polling events` which simulate changes. These polling events contain configuration 37 | data such as the account/region defining where the collector should attempt to gather data from. 38 | 39 | This is the entry point. This will task subsequent Poller lambdas to list all of a given resource in a select few 40 | AWS accounts. 41 | """ 42 | LOG.debug('[@] Running Poller Tasker...') 43 | 44 | queue_url = get_queue_url(os.environ.get('POLLER_TASKER_QUEUE_NAME', 'HistoricalS3PollerTasker')) 45 | poller_task_schema = HistoricalPollerTaskEventModel() 46 | 47 | events = [poller_task_schema.serialize_me(account['id'], CURRENT_REGION) for account in get_historical_accounts()] 48 | 49 | try: 50 | produce_events(events, queue_url, randomize_delay=RANDOMIZE_POLLER) 51 | except ClientError as exc: 52 | LOG.error(f'[X] Unable to generate poller tasker events! Reason: {exc}') 53 | 54 | LOG.debug('[@] Finished tasking the pollers.') 55 | 56 | 57 | @RavenLambdaWrapper() 58 | def poller_processor_handler(event, context): # pylint: disable=W0613 59 | """ 60 | Historical S3 Poller Processor. 61 | 62 | This will receive events from the Poller Tasker, and will list all objects of a given technology for an 63 | account/region pair. This will generate `polling events` which simulate changes. These polling events contain 64 | configuration data such as the account/region defining where the collector should attempt to gather data from. 65 | """ 66 | LOG.debug('[@] Running Poller...') 67 | 68 | queue_url = get_queue_url(os.environ.get('POLLER_QUEUE_NAME', 'HistoricalS3Poller')) 69 | 70 | records = deserialize_records(event['Records']) 71 | 72 | for record in records: 73 | # Skip accounts that have role assumption errors: 74 | try: 75 | # List all buckets in the account: 76 | all_buckets = list_buckets(account_number=record['account_id'], 77 | assume_role=HISTORICAL_ROLE, 78 | session_name="historical-cloudwatch-s3list", 79 | region=record['region'])["Buckets"] 80 | 81 | events = [S3_POLLING_SCHEMA.serialize_me(record['account_id'], bucket) for bucket in all_buckets] 82 | produce_events(events, queue_url, randomize_delay=RANDOMIZE_POLLER) 83 | except ClientError as exc: 84 | LOG.error(f"[X] Unable to generate events for account. Account Id: {record['account_id']} Reason: {exc}") 85 | 86 | LOG.debug(f"[@] Finished generating polling events for account: {record['account_id']}. 
Events Created:" 87 | f" {len(record['account_id'])}") 88 | -------------------------------------------------------------------------------- /historical/vpc/poller.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.vpc.poller 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. author:: Mike Grima 8 | """ 9 | import os 10 | import logging 11 | 12 | from botocore.exceptions import ClientError 13 | 14 | from raven_python_lambda import RavenLambdaWrapper 15 | from cloudaux.aws.ec2 import describe_vpcs 16 | 17 | from historical.constants import HISTORICAL_ROLE, LOGGING_LEVEL, POLL_REGIONS, RANDOMIZE_POLLER 18 | from historical.common.util import deserialize_records 19 | from historical.vpc.models import VPC_POLLING_SCHEMA 20 | from historical.models import HistoricalPollerTaskEventModel 21 | from historical.common.accounts import get_historical_accounts 22 | from historical.common.sqs import get_queue_url, produce_events 23 | 24 | logging.basicConfig() 25 | LOG = logging.getLogger("historical") 26 | LOG.setLevel(LOGGING_LEVEL) 27 | 28 | 29 | @RavenLambdaWrapper() 30 | def poller_tasker_handler(event, context): # pylint: disable=W0613 31 | """ 32 | Historical VPC Poller Tasker. 33 | 34 | The Poller is run at a set interval in order to ensure that changes do not go undetected by Historical. 35 | 36 | Historical pollers generate `polling events` which simulate changes. These polling events contain configuration 37 | data such as the account/region defining where the collector should attempt to gather data from. 38 | 39 | This is the entry point. This will task subsequent Poller lambdas to list all of a given resource in a select few 40 | AWS accounts. 41 | """ 42 | LOG.debug('[@] Running Poller Tasker...') 43 | 44 | queue_url = get_queue_url(os.environ.get('POLLER_TASKER_QUEUE_NAME', 'HistoricalVPCPollerTasker')) 45 | poller_task_schema = HistoricalPollerTaskEventModel() 46 | 47 | events = [] 48 | for account in get_historical_accounts(): 49 | for region in POLL_REGIONS: 50 | events.append(poller_task_schema.serialize_me(account['id'], region)) 51 | 52 | try: 53 | produce_events(events, queue_url, randomize_delay=RANDOMIZE_POLLER) 54 | except ClientError as exc: 55 | LOG.error(f'[X] Unable to generate poller tasker events! Reason: {exc}') 56 | 57 | LOG.debug('[@] Finished tasking the pollers.') 58 | 59 | 60 | @RavenLambdaWrapper() 61 | def poller_processor_handler(event, context): # pylint: disable=W0613 62 | """ 63 | Historical Security Group Poller Processor. 64 | 65 | This will receive events from the Poller Tasker, and will list all objects of a given technology for an 66 | account/region pair. This will generate `polling events` which simulate changes. These polling events contain 67 | configuration data such as the account/region defining where the collector should attempt to gather data from. 
68 | """ 69 | LOG.debug('[@] Running Poller...') 70 | 71 | queue_url = get_queue_url(os.environ.get('POLLER_QUEUE_NAME', 'HistoricalVPCPoller')) 72 | 73 | records = deserialize_records(event['Records']) 74 | 75 | for record in records: 76 | # Skip accounts that have role assumption errors: 77 | try: 78 | vpcs = describe_vpcs( 79 | account_number=record['account_id'], 80 | assume_role=HISTORICAL_ROLE, 81 | region=record['region'] 82 | ) 83 | 84 | events = [VPC_POLLING_SCHEMA.serialize(record['account_id'], v) for v in vpcs] 85 | produce_events(events, queue_url, randomize_delay=RANDOMIZE_POLLER) 86 | LOG.debug(f"[@] Finished generating polling events. Account: {record['account_id']}/{record['region']} " 87 | f"Events Created: {len(events)}") 88 | except ClientError as exc: 89 | LOG.error(f"[X] Unable to generate events for account/region. Account Id/Region: {record['account_id']}" 90 | f"/{record['region']} Reason: {exc}") 91 | -------------------------------------------------------------------------------- /historical/s3/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.s3.models 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | """ 8 | from marshmallow import fields, post_dump, Schema 9 | from pynamodb.attributes import UnicodeAttribute 10 | 11 | from historical.constants import CURRENT_REGION 12 | from historical.models import AWSHistoricalMixin, CurrentHistoricalModel, DurableHistoricalModel, \ 13 | HistoricalPollingBaseModel, HistoricalPollingEventDetail 14 | 15 | 16 | # The schema version -- TODO: Get this from CloudAux 17 | VERSION = 9 18 | 19 | 20 | class S3Model: 21 | """S3 specific fields for DynamoDB.""" 22 | 23 | BucketName = UnicodeAttribute() 24 | Region = UnicodeAttribute() 25 | 26 | 27 | class DurableS3Model(DurableHistoricalModel, AWSHistoricalMixin, S3Model): 28 | """The Durable Table model for S3.""" 29 | 30 | class Meta: 31 | """Table Details""" 32 | 33 | table_name = 'HistoricalS3DurableTable' 34 | region = CURRENT_REGION 35 | tech = 's3' 36 | 37 | 38 | class CurrentS3Model(CurrentHistoricalModel, AWSHistoricalMixin, S3Model): 39 | """The Current Table model for S3.""" 40 | 41 | class Meta: 42 | """Table Details""" 43 | 44 | table_name = 'HistoricalS3CurrentTable' 45 | region = CURRENT_REGION 46 | tech = 's3' 47 | 48 | 49 | class S3PollingRequestParamsModel(Schema): 50 | """Schema with the required fields for the Poller to instruct the Collector to fetch S3 details.""" 51 | 52 | bucket_name = fields.Str(dump_to="bucketName", load_from="bucketName", required=True) 53 | creation_date = fields.Str(dump_to="creationDate", load_from="creationDate", required=True) 54 | 55 | 56 | class S3PollingEventDetail(HistoricalPollingEventDetail): 57 | """Schema that provides the required fields for mimicking the CloudWatch Event for Polling.""" 58 | 59 | request_parameters = fields.Nested(S3PollingRequestParamsModel, dump_to="requestParameters", 60 | load_from="requestParameters", required=True) 61 | event_source = fields.Str(load_only=True, load_from="eventSource", required=True) 62 | event_name = fields.Str(load_only=True, load_from="eventName", required=True) 63 | 64 | @post_dump 65 | def add_required_s3_polling_data(self, data): 66 | """Adds the required data to the JSON. 
67 | 68 | :param data: 69 | :return: 70 | """ 71 | data["eventSource"] = "historical.s3.poller" 72 | data["eventName"] = "PollS3" 73 | 74 | return data 75 | 76 | 77 | class S3PollingEventModel(HistoricalPollingBaseModel): 78 | """This is the Marshmallow schema for a Polling event. This is made to look like a CloudWatch Event.""" 79 | 80 | detail = fields.Nested(S3PollingEventDetail, required=True) 81 | version = fields.Str(load_only=True, required=True) 82 | 83 | @post_dump() 84 | def dump_s3_polling_event_data(self, data): 85 | """Adds the required data to the JSON. 86 | 87 | :param data: 88 | :return: 89 | """ 90 | data["version"] = "1" 91 | 92 | return data 93 | 94 | def serialize_me(self, account, bucket_details): 95 | """Serializes the JSON for the Polling Event Model. 96 | 97 | :param account: 98 | :param bucket_details: 99 | :return: 100 | """ 101 | return self.dumps({ 102 | "account": account, 103 | "detail": { 104 | "request_parameters": { 105 | "bucket_name": bucket_details["Name"], 106 | "creation_date": bucket_details["CreationDate"].replace( 107 | tzinfo=None, microsecond=0).isoformat() + "Z" 108 | } 109 | } 110 | }).data 111 | 112 | 113 | S3_POLLING_SCHEMA = S3PollingEventModel(strict=True) 114 | -------------------------------------------------------------------------------- /mkdocs/docs/installation/index.md: -------------------------------------------------------------------------------- 1 | # Installation & Configuration 2 | **Note: Some assembly is required.** 3 | 4 | There are many components that make up Historical. Included is a Docker container that you can use to run Terraform for installation. 5 | 6 | Please review each section below in order to ensure that all aspects of the installation go smoothly. This is important because there are _many_ components that have to be configured correctly for Historical to operate properly. 7 | 8 | ## Architecture 9 | Before reading this installation guide, please become familiar with the Historical architecture. This will assist you in making the proper configuration for Historical. [You can review that here](../architecture.md). 10 | 11 | ## Prerequisites 12 | Historical requires the following prerequisites: 13 | 14 | 1. An AWS account that is dedicated for Historical (this is highly recommended). 15 | 1. CloudTrail must be enabled for **ALL** accounts and **ALL** regions. 16 | 1. CloudWatch Event Buses must be configured to route **ALL** CloudWatch Events to the Historical account. [Please review and follow the AWS documentation for sending and receiving events between AWS accounts before continuing](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatchEvents-CrossAccountEventDelivery.html). 17 | - This diagram outlines how CloudWatch Event Buses should be configured: 18 | 19 | 1. You will need to create IAM roles in all the accounts to monitor first. This requires your own orchestration to complete. See the IAM section below for details. 20 | 1. Historical makes use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client) to define which AWS accounts Historical is enabled for. SWAG must be properly configured for Historical to operate. Alternatively, you can specify the AWS Account IDs that Historical will examine via an environment variable. However, it is _highly recommended_ that you make use of SWAG. 21 | 22 | ## IAM Setup 23 | Please review the [IAM Role setup guide here](iam.md) for instructions. 
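To make the role relationship concrete, here is a minimal sketch of the cross-account hop that the IAM guide describes: the Lambda functions (running as `HistoricalLambdaProfile`) assume the read-only `Historical` role in a monitored account and then describe resources there. The account ID, region, and session name below are placeholders, and Historical itself performs this step through CloudAux helpers (for example `describe_vpcs(account_number=..., assume_role=HISTORICAL_ROLE, region=...)`) rather than raw boto3:

    import boto3

    MONITORED_ACCOUNT_ID = '111111111111'  # placeholder -- one of the accounts Historical inventories

    # Assume the read-only 'Historical' role in the monitored account:
    sts = boto3.client('sts')
    creds = sts.assume_role(
        RoleArn=f'arn:aws:iam::{MONITORED_ACCOUNT_ID}:role/Historical',
        RoleSessionName='historical-example',  # placeholder session name
    )['Credentials']

    # Use the temporary credentials to describe resources in that account:
    ec2 = boto3.client(
        'ec2',
        region_name='us-west-2',  # placeholder region
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'],
    )
    print(len(ec2.describe_security_groups()['SecurityGroups']))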
24 | 25 | ## Terraform 26 | A set of **sample** [Terraform](https://terraform.io) templates are included to assist with the roll-out of the infrastructure. This is intended to be run within a Docker container (code also included). The Docker container will package the Historical Lambda code and run the Terraform templates to provision all of the infrastructure. 27 | 28 | This is used for both installation and uninstallation. [Please review the documentation in detail here](terraform.md). 29 | 30 | ## Configuration and Environment Variables 31 | **IMPORTANT:** There are many environment variables and configuration details that are required to be set. [Please review this page for details on this](configuration.md). 32 | 33 | ## Prepare Docker Container 34 | Once you have made the necessary changes to your Terraform configuration files, you will need to build the Docker container. 35 | 36 | 1. Please [install Docker](https://www.docker.com/get-started) if you haven't already. 37 | 1. Navigate to the `historical/terraform` directory. 38 | 1. In a terminal, run `docker build . -t historical_installer` 39 | 40 | At this point, you now have a Docker container with all the required components to deploy Historical. _If you need to make any adjustments, you will need to re-build your container._ 41 | 42 | ## Installation 43 | Terraform requires broad permissions. You will need a highly privileged AWS administrative role to run the Docker container. 44 | 45 | 1. Get credentials from an IAM role with administrative permissions. 46 | 1. Copy `terraform/SAMPLE-env.list` to `terraform/env.list` 47 | 1. Open `terraform/env.list`, and fill in the values. ALL values must be supplied and correct. See the [configuration documentation](configuration.md#docker-installer-specific-fields) for reference. 48 | 1. In a terminal, navigate to `terraform/` 49 | 1. Run Docker! `docker run --env-file ./env.list -t historical_installer` 50 | 51 | Hopefully this works! 52 | 53 | ## Uninstallation 54 | As with installation, you will need a highly privileged AWS administrative role to run the Docker container. 55 | 56 | 1. Get credentials from an IAM role with administrative permissions. 57 | 1. Use the `terraform/env.list` values used for installation. 58 | 1. In a terminal, navigate to `terraform/` 59 | 1. Run Docker! `docker run --env-file ./env.list --entrypoint /installer/teardown_historical.sh -t historical_installer` 60 | 61 | This *might* fail the first time it runs. This is because Terraform doesn't wait long enough for all the resources to be deleted in the primary region. Try running it again if it fails the first time. 62 | 63 | If it's still failing, you may need to find the resources that are failing to delete and manually delete them. 64 | 65 | Please note: Depending on how active the Lambda functions are, the CloudWatch Logs log groups may still be present after stack deletion. You will need to manually delete these in each of the primary and secondary regions. 66 | 67 | Hopefully this works well for you! 68 | 69 | ## Troubleshooting 70 | Please review the [Troubleshooting](../troubleshooting.md) doc if you are experiencing issues. 71 | -------------------------------------------------------------------------------- /mkdocs/docs/installation/iam.md: -------------------------------------------------------------------------------- 1 | # Historical IAM Role Setup Guide 2 | 3 | IAM roles need to be configured for Historical to properly inventory all of your accounts.
The following must be created: 4 | 5 | 1. The `HistoricalLambdaProfile` role which is used to launch the Historical Lambda functions. 6 | 1. The `Historical` role which the `HistoricalLambdaProfile` will assume to describe and collect details from the account in question. 7 | 8 | The architecture for this looks like this: 9 | 10 | 11 | ## Instructions 12 | 13 | ### Lambda Role 14 | 15 | 1. In the Historical account, create the `HistoricalLambdaProfile` IAM Role. This role needs to permit the `lambda.amazonaws.com` Service Principal access to it. Here is an example: 16 | 17 | *Trust Policy*: 18 | 19 | { 20 | "Version": "2012-10-17", 21 | "Statement": [ 22 | { 23 | "Effect": "Allow", 24 | "Principal": { 25 | "Service": "lambda.amazonaws.com" 26 | }, 27 | "Action": "sts:AssumeRole" 28 | } 29 | ] 30 | } 31 | 32 | 1. This role is being executed by AWS Lambda and requires the `AWSLambdaBasicExecutionRole` _AWS managed policy_ attached to it. This managed policy gives the Lambda access to write to CloudWatch Logs. VPC permissions are not required because Historical does not make use of ENIs or Security Groups. 33 | 34 | 1. The role then needs a set of _Inline Policies_ to grant it access to the resources required for the Lambda function to access the Historical resources. Please make a new Inline Policy named `HistoricalLambdaPerms` as follows (substitute `HISTORICAL-ACCOUNT-NUMBER-HERE` with the AWS account ID of the Historical account): 35 | 36 | { 37 | "Version": "2012-10-17", 38 | "Statement": [ 39 | { 40 | "Sid": "SQS", 41 | "Effect": "Allow", 42 | "Action": [ 43 | "sqs:DeleteMessage", 44 | "sqs:GetQueueAttributes", 45 | "sqs:GetQueueUrl", 46 | "sqs:ReceiveMessage", 47 | "sqs:SendMessage" 48 | ], 49 | "Resource": "arn:aws:sqs:*:HISTORICAL-ACCOUNT-NUMBER-HERE:Historical*" 50 | }, 51 | { 52 | "Sid": "SNS", 53 | "Effect": "Allow", 54 | "Action": "sns:Publish", 55 | "Resource": "arn:aws:sns:*:HISTORICAL-ACCOUNT-NUMBER-HERE:Historical*" 56 | }, 57 | { 58 | "Sid": "STS", 59 | "Effect": "Allow", 60 | "Action": "sts:AssumeRole", 61 | "Resource": "arn:aws:iam::*:role/Historical" 62 | }, 63 | { 64 | "Sid": "DynamoDB", 65 | "Effect": "Allow", 66 | "Action": [ 67 | "dynamodb:BatchGetItem", 68 | "dynamodb:BatchWriteItem", 69 | "dynamodb:DeleteItem", 70 | "dynamodb:DescribeStream", 71 | "dynamodb:DescribeTable", 72 | "dynamodb:GetItem", 73 | "dynamodb:GetRecords", 74 | "dynamodb:GetShardIterator", 75 | "dynamodb:ListStreams", 76 | "dynamodb:PutItem", 77 | "dynamodb:Query", 78 | "dynamodb:Scan", 79 | "dynamodb:UpdateItem" 80 | ], 81 | "Resource": "arn:aws:dynamodb:*:HISTORICAL-ACCOUNT-NUMBER-HERE:table/Historical*" 82 | } 83 | ] 84 | } 85 | 86 | 87 | ### Destination Account Roles 88 | 89 | You will mostly likely need your own orchestration to roll this out. This will need to be rolled out to ALL accounts that you are inventorying with Historical. 90 | 91 | The role is named `Historical` and has the following configuration details: 92 | 93 | 1. Trust Policy (substitute `HISTORICAL-ACCOUNT-NUMBER-HERE` with the AWS account ID of the Historical account): 94 | 95 | { 96 | "Version": "2012-10-17", 97 | "Statement": [ 98 | { 99 | "Effect": "Allow", 100 | "Principal": { 101 | "AWS": "arn:aws:iam::HISTORICAL-ACCOUNT-NUMBER-HERE:role/HistoricalLambdaProfile" 102 | }, 103 | "Action": "sts:AssumeRole", 104 | "Condition": {} 105 | } 106 | ] 107 | } 108 | 109 | 1. The `Historical` role needs read access to your resources. Simply attach the `ReadOnlyAccess` _AWS managed policy_ to the role and that is all. 
110 | 111 | 1. Duplicate this role to all of your accounts via your own orchestration and automation. 112 | 113 | ## Next Steps 114 | [Please return to the Installation documentation](../). 115 | -------------------------------------------------------------------------------- /historical/security_group/poller.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.security_group.poller 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. author:: Mike Grima 8 | """ 9 | import os 10 | import logging 11 | 12 | from botocore.exceptions import ClientError 13 | 14 | from raven_python_lambda import RavenLambdaWrapper 15 | from cloudaux.aws.ec2 import describe_security_groups 16 | 17 | from historical.common.sqs import get_queue_url, produce_events 18 | from historical.common.util import deserialize_records 19 | from historical.constants import HISTORICAL_ROLE, LOGGING_LEVEL, POLL_REGIONS, RANDOMIZE_POLLER 20 | from historical.models import HistoricalPollerTaskEventModel 21 | from historical.security_group.models import SECURITY_GROUP_POLLING_SCHEMA 22 | from historical.common.accounts import get_historical_accounts 23 | 24 | logging.basicConfig() 25 | LOG = logging.getLogger("historical") 26 | LOG.setLevel(LOGGING_LEVEL) 27 | 28 | 29 | @RavenLambdaWrapper() 30 | def poller_tasker_handler(event, context): # pylint: disable=W0613 31 | """ 32 | Historical Security Group Poller Tasker. 33 | 34 | The Poller is run at a set interval in order to ensure that changes do not go undetected by Historical. 35 | 36 | Historical pollers generate `polling events` which simulate changes. These polling events contain configuration 37 | data such as the account/region defining where the collector should attempt to gather data from. 38 | 39 | This is the entry point. This will task subsequent Poller lambdas to list all of a given resource in a select few 40 | AWS accounts. 41 | """ 42 | LOG.debug('[@] Running Poller Tasker...') 43 | 44 | queue_url = get_queue_url(os.environ.get('POLLER_TASKER_QUEUE_NAME', 'HistoricalSecurityGroupPollerTasker')) 45 | poller_task_schema = HistoricalPollerTaskEventModel() 46 | 47 | events = [] 48 | for account in get_historical_accounts(): 49 | for region in POLL_REGIONS: 50 | events.append(poller_task_schema.serialize_me(account['id'], region)) 51 | 52 | try: 53 | produce_events(events, queue_url, randomize_delay=RANDOMIZE_POLLER) 54 | except ClientError as exc: 55 | LOG.error(f'[X] Unable to generate poller tasker events! Reason: {exc}') 56 | 57 | LOG.debug('[@] Finished tasking the pollers.') 58 | 59 | 60 | @RavenLambdaWrapper() 61 | def poller_processor_handler(event, context): # pylint: disable=W0613 62 | """ 63 | Historical Security Group Poller Processor. 64 | 65 | This will receive events from the Poller Tasker, and will list all objects of a given technology for an 66 | account/region pair. This will generate `polling events` which simulate changes. These polling events contain 67 | configuration data such as the account/region defining where the collector should attempt to gather data from. 
68 | """ 69 | LOG.debug('[@] Running Poller...') 70 | 71 | collector_poller_queue_url = get_queue_url(os.environ.get('POLLER_QUEUE_NAME', 'HistoricalSecurityGroupPoller')) 72 | takser_queue_url = get_queue_url(os.environ.get('POLLER_TASKER_QUEUE_NAME', 'HistoricalSecurityGroupPollerTasker')) 73 | 74 | poller_task_schema = HistoricalPollerTaskEventModel() 75 | records = deserialize_records(event['Records']) 76 | 77 | for record in records: 78 | # Skip accounts that have role assumption errors: 79 | try: 80 | # Did we get a NextToken? 81 | if record.get('NextToken'): 82 | LOG.debug(f"[@] Received pagination token: {record['NextToken']}") 83 | groups = describe_security_groups( 84 | account_number=record['account_id'], 85 | assume_role=HISTORICAL_ROLE, 86 | region=record['region'], 87 | MaxResults=200, 88 | NextToken=record['NextToken'] 89 | ) 90 | else: 91 | groups = describe_security_groups( 92 | account_number=record['account_id'], 93 | assume_role=HISTORICAL_ROLE, 94 | region=record['region'], 95 | MaxResults=200 96 | ) 97 | 98 | # FIRST THINGS FIRST: Did we get a `NextToken`? If so, we need to enqueue that ASAP because 99 | # 'NextToken`s expire in 60 seconds! 100 | if groups.get('NextToken'): 101 | logging.debug(f"[-->] Pagination required {groups['NextToken']}. Tasking continuation.") 102 | produce_events( 103 | [poller_task_schema.serialize_me(record['account_id'], record['region'], 104 | next_token=groups['NextToken'])], 105 | takser_queue_url 106 | ) 107 | 108 | # Task the collector to perform all the DDB logic -- this will pass in the collected data to the 109 | # collector in very small batches. 110 | events = [SECURITY_GROUP_POLLING_SCHEMA.serialize(record['account_id'], g, record['region']) 111 | for g in groups['SecurityGroups']] 112 | produce_events(events, collector_poller_queue_url, batch_size=3) 113 | 114 | LOG.debug(f"[@] Finished generating polling events. Account: {record['account_id']}/{record['region']} " 115 | f"Events Created: {len(events)}") 116 | except ClientError as exc: 117 | LOG.error(f"[X] Unable to generate events for account/region. Account Id/Region: {record['account_id']}" 118 | f"/{record['region']} Reason: {exc}") 119 | -------------------------------------------------------------------------------- /mkdocs/docs/installation/terraform.md: -------------------------------------------------------------------------------- 1 | # Historical Terraform Setup 2 | 3 | A set of **sample** [Terraform](https://terraform.io) templates are included to assist with the roll-out of the infrastructure. This is intended to be run within a Docker container (code also included). The Docker container will: 4 | 5 | 1. Package the Historical Lambda code 6 | 1. Run the Terraform templates to provision all of the infrastructure 7 | 8 | This is all run within an [Amazon Linux](https://hub.docker.com/_/amazonlinux/) Docker container. Amazon Linux is required because Historical's dependencies make use of statically linked libraries, which will fail to run in the Lambda environment unless the binaries are built on Amazon Linux. 9 | 10 | You can also use this to uninstall Historical from your environment as well. 11 | 12 | **Please review each section below, as the details are very important:** 13 | 14 | ### Structure 15 | The Terraform templates are split into multiple components: 16 | 17 | 1. **Terraform Plugins** (located in terraform/terraform-plugins) 18 | 1. **DynamoDB** (located in terraform/dynamodb) 19 | 1. 
**Infrastructure** (located in terraform/infra) 20 | 21 | #### Terraform Backend Configuration 22 | We make the assumption that the Terraform backend is on S3. As such, you will need an S3 bucket that resides in the Historical AWS account. It is __highly recommended__ that you configure the Historical Terraform S3 bucket with versioning enabled. This is needed should there ever be an issue with the Terraform state. 23 | 24 | **NOTE:** For __ALL__ Terraform `main.tf` template files, at the top of the template file is a backend region configuration. It looks like this: 25 | 26 | terraform { 27 | backend "s3" { 28 | // Set this to where your Terraform S3 bucket is located (using us-west-2 as the example): 29 | region = "us-west-2" 30 | } 31 | } 32 | 33 | You will need to set the region to where your Terraform S3 bucket resides. In our examples, we are making use of `us-west-2`. 34 | 35 | #### Terraform Plugins 36 | This is a Terraform template that is executed in the Docker `build` step. This is done to pin the Terraform plugins to the Docker container so that they need not be re-downloaded later. It is important to keep the version numbers in this doc in sync with the rest of the templates. 37 | 38 | #### DynamoDB Templates 39 | This is used to construct the Global DynamoDB tables used by Historical. This is structured as follows: 40 | 41 | 1. `main.tf` - This is the main template with the components required to build out the Global DynamoDB tables for a given Historical stack. The sample included makes an **ASSUMPTION** that you will be utilizing `us-west-2` as your _PRIMARY REGION_, and `us-east-1` and `eu-west-1` as your _SECONDARY REGIONS_. 42 | - **You will need to modify this template accordingly to change the defaults set.** 43 | - This is used for ALL stacks. If you want to specify different primary and secondary regions for a given AWS resource type, then you will need to make your own modifications to the installation scripting to leverage different templates. 44 | 1. Per-resource type stack configurations. Included are details for S3 and Security Groups. There is a Terraform template for each resource type. This is where you can configure the read and write capacities for the tables. 45 | - **You will need to modify these templates accordingly to change the defaults set.** 46 | - By default the tables are configured with a read and write capacity of `100` units. Change this as necessary. 47 | 48 | When the installation scripts run, it copies over the resource type configuration to the same directory as the `main.tf` template. Terraform is then able to build out the infrastructure for a given resource type. 49 | 50 | #### Infrastructure 51 | This is organized similar to the DynamoDB templates. This must be executed _after_ the DynamoDB templates on installation and _before_ the DynamoDB templates on tear-down (for uninstallation should you need to tear down the stack). This is structured as follows: 52 | 53 | 1. `main.tf` - This is the main template with most of the infrastructure components identified. Very few (or no) changes need to be made here. 54 | - This is used for ALL stacks. 55 | 1. `off-regions.tf` - This outlines all of the off-region components that are required. This file has a duplicate of every region off-regions' components. Unfortunately, because Terraform lacks a great way to perform loops and iterations, we duplicate the configuration for each region. This makes the file very large and painful to edit. 
The sample included makes an **ASSUMPTION** that you will be utilizing `us-west-2` as your _PRIMARY REGION_, and `us-east-1` and `eu-west-1` as your _SECONDARY REGIONS_. Thus, all other regions are the off-regions in our sample. You will need to alter this should you want to change the regions for your deployment. 56 | - **You will need to modify this template accordingly to change the defaults set.** 57 | - This is used for ALL stacks. If you want to specify different primary, secondary, and off-regions for a given AWS resource type, then you will need to make your own modifications to the installation scripting to leverage different templates. 58 | 1. Per-resource type stack configurations. Included are details for S3 and Security Groups. There is a Terraform template for each resource type. This is where you need to configure a number of details. 59 | - _Most_ of the default values are fine and should not be changed. 60 | - You will need to set the `PRIMARY_REGION` and `POLLING_REGIONS` variables accordingly. With the exception of S3, the `POLLING_REGIONS` should include the primary and secondary regions in the list. 61 | - **You will need to review all of the variables and comments in the template to understand what they mean and how they should be set. If you change the defaults, you will need to make updates as necessary.** 62 | 63 | 64 | ## Configuration and Environment Variables 65 | **IMPORTANT:** There are many environment variables and configuration details that are required to be set. [Please review this page for details on this](configuration.md). 66 | 67 | 68 | ## Next Steps 69 | Once you have thoroughly reviewed this section, please return to the [installation documentation](../). 70 | -------------------------------------------------------------------------------- /terraform/infra/s3/s3.tf: -------------------------------------------------------------------------------- 1 | // Declare variables for the S3 Stack: 2 | variable "PRIMARY_REGION" { 3 | default = "us-west-2" // Change this for your infrastructure 4 | } 5 | 6 | // Define the regions to place the poller infrastructure here: 7 | variable "POLLING_REGIONS" { 8 | type = "list" 9 | 10 | default = ["us-west-2"] // Change this for your infrastructure 11 | } 12 | 13 | // Define the CloudWatch Event configuration: 14 | data "null_data_source" "cwe_config" { 15 | inputs = { 16 | off_regions_sns_name = "HistoricalS3CWEForwarder" 17 | 18 | rule_name = "HistoricalS3CloudWatchEventRule" 19 | rule_desc = "EventRule forwarding S3 Bucket changes." 20 | 21 | poller_rule_name = "HistoricalS3PollerEventRule" 22 | poller_rule_desc = "EventRule for Polling S3."
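    // The queue names below mirror the defaults that the Historical Lambda code looks up at runtime
    // (historical/s3/poller.py falls back to 'HistoricalS3PollerTasker' and 'HistoricalS3Poller' when the
    // POLLER_TASKER_QUEUE_NAME / POLLER_QUEUE_NAME environment variables are not set), so keep them in
    // sync with those variables if you rename the queues.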
23 | poller_rule_rate = "rate(6 hours)" 24 | 25 | sqs_poller_tasker_queue = "HistoricalS3PollerTasker" 26 | sqs_event_queue = "HistoricalS3Events" 27 | sqs_poller_collector_queue = "HistoricalS3Poller" 28 | differ_queue = "HistoricalS3Differ" 29 | 30 | rule_target_name = "HistoricalS3EventsToSQS" 31 | 32 | // Event Syntax: 33 | event_pattern = < 7 | """ 8 | import logging 9 | 10 | from botocore.exceptions import ClientError 11 | from pynamodb.exceptions import DeleteError 12 | 13 | from raven_python_lambda import RavenLambdaWrapper 14 | 15 | from cloudaux.aws.ec2 import describe_vpcs 16 | 17 | from historical.common.sqs import group_records_by_type 18 | from historical.constants import CURRENT_REGION, HISTORICAL_ROLE, LOGGING_LEVEL 19 | from historical.common import cloudwatch 20 | from historical.common.util import deserialize_records, pull_tag_dict 21 | from historical.vpc.models import CurrentVPCModel, VERSION 22 | 23 | logging.basicConfig() 24 | LOG = logging.getLogger('historical') 25 | LOG.setLevel(LOGGING_LEVEL) 26 | 27 | 28 | UPDATE_EVENTS = [ 29 | 'CreateVpc', 30 | 'ModifyVpcAttribute', 31 | 'PollVpc' 32 | ] 33 | 34 | DELETE_EVENTS = [ 35 | 'DeleteVpc' 36 | ] 37 | 38 | 39 | def get_arn(vpc_id, region, account_id): 40 | """Creates a vpc ARN.""" 41 | return f'arn:aws:ec2:{region}:{account_id}:vpc/{vpc_id}' 42 | 43 | 44 | def describe_vpc(record): 45 | """Attempts to describe vpc ids.""" 46 | account_id = record['account'] 47 | vpc_name = cloudwatch.filter_request_parameters('vpcName', record) 48 | vpc_id = cloudwatch.filter_request_parameters('vpcId', record) 49 | 50 | try: 51 | if vpc_id and vpc_name: # pylint: disable=R1705 52 | return describe_vpcs( 53 | account_number=account_id, 54 | assume_role=HISTORICAL_ROLE, 55 | region=CURRENT_REGION, 56 | Filters=[ 57 | { 58 | 'Name': 'vpc-id', 59 | 'Values': [vpc_id] 60 | } 61 | ] 62 | ) 63 | elif vpc_id: 64 | return describe_vpcs( 65 | account_number=account_id, 66 | assume_role=HISTORICAL_ROLE, 67 | region=CURRENT_REGION, 68 | VpcIds=[vpc_id] 69 | ) 70 | else: 71 | raise Exception('[X] Describe requires VpcId.') 72 | except ClientError as exc: 73 | if exc.response['Error']['Code'] == 'InvalidVpc.NotFound': 74 | return [] 75 | raise exc 76 | 77 | 78 | def create_delete_model(record): 79 | """Create a vpc model from a record.""" 80 | data = cloudwatch.get_historical_base_info(record) 81 | 82 | vpc_id = cloudwatch.filter_request_parameters('vpcId', record) 83 | 84 | arn = get_arn(vpc_id, cloudwatch.get_region(record), record['account']) 85 | 86 | LOG.debug(F'[-] Deleting Dynamodb Records. Hash Key: {arn}') 87 | 88 | # tombstone these records so that the deletion event time can be accurately tracked. 89 | data.update({ 90 | 'configuration': {} 91 | }) 92 | 93 | items = list(CurrentVPCModel.query(arn, limit=1)) 94 | 95 | if items: 96 | model_dict = items[0].__dict__['attribute_values'].copy() 97 | model_dict.update(data) 98 | model = CurrentVPCModel(**model_dict) 99 | model.save() 100 | return model 101 | 102 | return None 103 | 104 | 105 | def capture_delete_records(records): 106 | """Writes all of our delete events to DynamoDB.""" 107 | for record in records: 108 | model = create_delete_model(record) 109 | if model: 110 | try: 111 | model.delete(condition=(CurrentVPCModel.eventTime <= record['detail']['eventTime'])) 112 | except DeleteError: 113 | LOG.warning(f'[?] Unable to delete VPC. VPC does not exist. Record: {record}') 114 | else: 115 | LOG.warning(f'[?] Unable to delete VPC. VPC does not exist. 
Record: {record}') 116 | 117 | 118 | def get_vpc_name(vpc): 119 | """Fetches VPC Name (as tag) from VPC.""" 120 | for tag in vpc.get('Tags', []): 121 | if tag['Key'].lower() == 'name': 122 | return tag['Value'] 123 | 124 | return None 125 | 126 | 127 | def capture_update_records(records): 128 | """Writes all updated configuration info to DynamoDB""" 129 | for record in records: 130 | data = cloudwatch.get_historical_base_info(record) 131 | vpc = describe_vpc(record) 132 | 133 | if len(vpc) > 1: 134 | raise Exception(f'[X] Multiple vpcs found. Record: {record}') 135 | 136 | if not vpc: 137 | LOG.warning(f'[?] No vpc information found. Record: {record}') 138 | continue 139 | 140 | vpc = vpc[0] 141 | 142 | # determine event data for vpc 143 | LOG.debug(f'Processing vpc. VPC: {vpc}') 144 | data.update({ 145 | 'VpcId': vpc.get('VpcId'), 146 | 'arn': get_arn(vpc['VpcId'], cloudwatch.get_region(record), data['accountId']), 147 | 'configuration': vpc, 148 | 'State': vpc.get('State'), 149 | 'IsDefault': vpc.get('IsDefault'), 150 | 'CidrBlock': vpc.get('CidrBlock'), 151 | 'Name': get_vpc_name(vpc), 152 | 'Region': cloudwatch.get_region(record), 153 | 'version': VERSION 154 | }) 155 | 156 | data['Tags'] = pull_tag_dict(vpc) 157 | 158 | LOG.debug(f'[+] Writing DynamoDB Record. Records: {data}') 159 | 160 | current_revision = CurrentVPCModel(**data) 161 | current_revision.save() 162 | 163 | 164 | @RavenLambdaWrapper() 165 | def handler(event, context): # pylint: disable=W0613 166 | """ 167 | Historical vpc event collector. 168 | This collector is responsible for processing Cloudwatch events and polling events. 169 | """ 170 | records = deserialize_records(event['Records']) 171 | 172 | # Split records into two groups, update and delete. 173 | # We don't want to query for deleted records. 
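    # Anything whose eventName is in UPDATE_EVENTS is re-described below; everything else is treated as a deletion.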
174 | update_records, delete_records = group_records_by_type(records, UPDATE_EVENTS) 175 | capture_delete_records(delete_records) 176 | 177 | # filter out error events 178 | update_records = [e for e in update_records if not e['detail'].get('errorCode')] # pylint: disable=C0103 179 | 180 | # group records by account for more efficient processing 181 | LOG.debug(f'[@] Update Records: {records}') 182 | 183 | capture_update_records(update_records) 184 | -------------------------------------------------------------------------------- /terraform/dynamodb/main.tf: -------------------------------------------------------------------------------- 1 | terraform { 2 | backend "s3" { 3 | // Set this to where your Terraform S3 bucket is located (using us-west-2 as the example): 4 | region = "us-west-2" 5 | } 6 | } 7 | // ---------------------------- 8 | 9 | // ---------------------------- 10 | // Set up AWS for the primary region (this one is the main account where most API calls will be based from): 11 | provider "aws" { 12 | version = "1.39" 13 | 14 | // Set the region to where you need it: us-west-2 is the example: 15 | "region" = "us-west-2" 16 | } 17 | 18 | // Alias providers for the specifc tables: 19 | provider "aws" { 20 | version = "1.39" 21 | 22 | // This is the PRIMARY REGION ALIAS (us-west-2 is the example): 23 | "alias" = "us-west-2" 24 | "region" = "us-west-2" 25 | } 26 | 27 | // Create aliases for the SECONDARY REGIONS: 28 | provider "aws" { 29 | version = "1.39" 30 | 31 | "alias" = "us-east-1" 32 | "region" = "us-east-1" 33 | } 34 | 35 | provider "aws" { 36 | version = "1.39" 37 | 38 | "alias" = "eu-west-1" 39 | "region" = "eu-west-1" 40 | } 41 | 42 | // ------------ CURRENT TABLES ---------------- 43 | // Create the Current tables for all regions: 44 | resource "aws_dynamodb_table" "current_table_primary" { 45 | provider = "aws.us-west-2" // Set this to the alias pointed to for the PRIMARY REGION ALIAS 46 | 47 | name = "${var.CURRENT_TABLE}" 48 | read_capacity = "${var.CURRENT_TABLE_READ_CAP}" 49 | write_capacity = "${var.CURRENT_TABLE_WRITE_CAP}" 50 | hash_key = "arn" 51 | stream_enabled = true 52 | stream_view_type = "NEW_AND_OLD_IMAGES" 53 | 54 | attribute { 55 | name = "arn" 56 | type = "S" 57 | } 58 | 59 | ttl { 60 | attribute_name = "ttl" 61 | enabled = true 62 | } 63 | } 64 | 65 | // SET UP YOUR SECONDARY REGION TABLES HERE: 66 | resource "aws_dynamodb_table" "current_table_secondary_1" { 67 | provider = "aws.us-east-1" // Set this to the alias for your secondary table 68 | 69 | name = "${var.CURRENT_TABLE}" 70 | read_capacity = "${var.CURRENT_TABLE_READ_CAP}" 71 | write_capacity = "${var.CURRENT_TABLE_WRITE_CAP}" 72 | hash_key = "arn" 73 | stream_enabled = true 74 | stream_view_type = "NEW_AND_OLD_IMAGES" 75 | 76 | attribute { 77 | name = "arn" 78 | type = "S" 79 | } 80 | 81 | ttl { 82 | attribute_name = "ttl" 83 | enabled = true 84 | } 85 | } 86 | 87 | resource "aws_dynamodb_table" "current_table_secondary_2" { 88 | provider = "aws.eu-west-1" // Set this to the alias for your secondary table 89 | 90 | name = "${var.CURRENT_TABLE}" 91 | read_capacity = "${var.CURRENT_TABLE_READ_CAP}" 92 | write_capacity = "${var.CURRENT_TABLE_WRITE_CAP}" 93 | hash_key = "arn" 94 | stream_enabled = true 95 | stream_view_type = "NEW_AND_OLD_IMAGES" 96 | 97 | attribute { 98 | name = "arn" 99 | type = "S" 100 | } 101 | 102 | ttl { 103 | attribute_name = "ttl" 104 | enabled = true 105 | } 106 | } 107 | 108 | 109 | // GLOBAL DYNAMO TABLE: 110 | resource "aws_dynamodb_global_table" "current_table" { 
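  // This global table stitches the per-region Current tables above into a single replicated table.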
111 | // Set these to the proper tables above: 112 | depends_on = [ 113 | "aws_dynamodb_table.current_table_primary", 114 | "aws_dynamodb_table.current_table_secondary_1", 115 | "aws_dynamodb_table.current_table_secondary_2"] 116 | 117 | name = "${var.CURRENT_TABLE}" 118 | 119 | // Set the primary and secondary regions below: 120 | replica = { 121 | region_name = "us-west-2" 122 | } 123 | 124 | replica = { 125 | region_name = "us-east-1" 126 | } 127 | 128 | replica = { 129 | region_name = "eu-west-1" 130 | } 131 | } 132 | // ---------------------------- 133 | 134 | // ------------ DURABLE TABLES ---------------- 135 | 136 | // Create the Durable table: 137 | resource "aws_dynamodb_table" "durable_table_primary" { 138 | provider = "aws.us-west-2" // Set this to the alias pointed to for the PRIMARY REGION ALIAS 139 | 140 | name = "${var.DURABLE_TABLE}" 141 | read_capacity = "${var.DURABLE_TABLE_READ_CAP}" 142 | write_capacity = "${var.DURABLE_TABLE_WRITE_CAP}" 143 | hash_key = "arn" 144 | range_key = "eventTime" 145 | stream_enabled = true 146 | stream_view_type = "NEW_AND_OLD_IMAGES" 147 | 148 | attribute { 149 | name = "arn" 150 | type = "S" 151 | } 152 | 153 | attribute { 154 | name = "eventTime" 155 | type = "S" 156 | } 157 | } 158 | 159 | resource "aws_dynamodb_table" "durable_table_secondary_1" { 160 | provider = "aws.us-east-1" // Set this to the alias for your secondary table 161 | 162 | name = "${var.DURABLE_TABLE}" 163 | read_capacity = "${var.DURABLE_TABLE_READ_CAP}" 164 | write_capacity = "${var.DURABLE_TABLE_WRITE_CAP}" 165 | hash_key = "arn" 166 | range_key = "eventTime" 167 | stream_enabled = true 168 | stream_view_type = "NEW_AND_OLD_IMAGES" 169 | 170 | attribute { 171 | name = "arn" 172 | type = "S" 173 | } 174 | 175 | attribute { 176 | name = "eventTime" 177 | type = "S" 178 | } 179 | } 180 | 181 | resource "aws_dynamodb_table" "durable_table_secondary_2" { 182 | provider = "aws.eu-west-1" // Set this to the alias for your secondary table 183 | 184 | name = "${var.DURABLE_TABLE}" 185 | read_capacity = "${var.DURABLE_TABLE_READ_CAP}" 186 | write_capacity = "${var.DURABLE_TABLE_WRITE_CAP}" 187 | hash_key = "arn" 188 | range_key = "eventTime" 189 | stream_enabled = true 190 | stream_view_type = "NEW_AND_OLD_IMAGES" 191 | 192 | attribute { 193 | name = "arn" 194 | type = "S" 195 | } 196 | 197 | attribute { 198 | name = "eventTime" 199 | type = "S" 200 | } 201 | } 202 | 203 | // GLOBAL DYNAMO TABLE: 204 | resource "aws_dynamodb_global_table" "durable_table" { 205 | // Set these to the proper tables above: 206 | depends_on = [ 207 | "aws_dynamodb_table.durable_table_primary", 208 | "aws_dynamodb_table.durable_table_secondary_1", 209 | "aws_dynamodb_table.durable_table_secondary_2" 210 | ] 211 | 212 | name = "${var.DURABLE_TABLE}" 213 | 214 | // Set the primary and secondary regions below: 215 | replica = { 216 | region_name = "us-west-2" 217 | } 218 | 219 | replica = { 220 | region_name = "us-east-1" 221 | } 222 | 223 | replica = { 224 | region_name = "eu-west-1" 225 | } 226 | } 227 | // ----------------------------- 228 | -------------------------------------------------------------------------------- /mkdocs/docs/installation/configuration.md: -------------------------------------------------------------------------------- 1 | # Historical Environment Variables & Configuration 2 | 3 | Below is a reference of all of the environment variables that Historical makes use of, and the required/default status of them: 4 | 5 | Most of these variables are found in: 6 | 7 | - 
[`historical/constants.py`](https://github.com/Netflix-Skunkworks/historical/blob/master/historical/constants.py) 8 | - [`historical/mapping/__init__.py`](https://github.com/Netflix-Skunkworks/historical/blob/master/historical/mapping/__init__.py) 9 | 10 | **NOTE: All environment variables are Strings** 11 | 12 | ## Required Fields 13 | The fields below are required and **MUST** be configured by you in your Terraform templates: 14 | 15 | | Variable | Where to set | Sample Value | 16 | |:----------:|:-------------|:-------------| 17 | |`PRIMARY_REGION`|Per-stack Terraform template
`variable PRIMARY_REGION`|`us-west-2`| 18 | |`POLLING_REGIONS`|Per-stack Terraform template
`variable POLLING_REGIONS`|`["us-west-2", "us-east-1", "eu-west-1"]`
This should be set to the primary and secondary regions for most stacks.

S3 is the exception since it's a "global" namespace.
For S3, this is always set to the `PRIMARY_REGION`.

This populates the `POLL_REGIONS` env. var for the
Poller Lambdas.| 19 | |`REGION`|Infrastructure `main.tf`
This is a variable supplied
to Terraform when the
template is applied.|This value is used to determine if the current region
of the deployment is the primary region or a secondary region.| 20 | |`PROXY_REGIONS`|Per-stack Terraform template
`current_proxy_env_vars` and `durable_proxy_env_vars`|`us-east-1,eu-west-1,us-east-2,etc.`
This is a comma-separated string of regions.

The `current_proxy_env_vars` for the `PRIMARY_REGION` needs to be configured to contain the `PRIMARY_REGION` and all the "off-regions".

The `durable_proxy_env_vars` should contain ALL
the regions (default).| 21 | |`HISTORICAL_TECHNOLOGY`|Per-stack Terraform template
`durable_proxy_env_vars`|`s3` or `securitygroup`. This should be set in each sample stack properly.| 22 | |`SIMPLE_DURABLE_PROXY`|Per-stack Terraform template
`durable_proxy_env_vars`|`True` - This is the default value for the Durable Proxy.
Don't change this.

This value toggles whether the DynamoDB
stream events will be serialized nicely for downstream consumption or not.| 23 | |`ENABLED_ACCOUNTS`|Per-stack Terraform template
`env_vars`|`ACCOUNTID1,ACCOUNTID2,etc.`
If you are not making use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client), then you need to set this.| 24 | |`SWAG_BUCKET`|Per-stack Terraform template
`env_vars`|`some-s3-bucket-name`
Required if you are making use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client).| 25 | |`SWAG_DATA_FILE`|Per-stack Terraform template
`env_vars`|`v2/accounts.json`
Required if you are making use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client).
Points to where the `accounts.json` file is located.| 26 | |`SWAG_OWNER`|Per-stack Terraform template
`env_vars`|`yourcompany`
Required if you are making use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client).
The entity that owns the accounts you are monitoring.| 27 | |`SWAG_REGION`|Per-stack Terraform template
`env_vars`|`us-west-2`
Required if you are making use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client).
The region where the `SWAG_BUCKET` is located.| 28 | 29 | ### Default Required Fields 30 | These are fields that are required, but the default values are sufficient. These are not set in the Terraform templates. 31 | 32 | | Variable | Description & Defaults | 33 | |:----------:|:-------------| 34 | |`CURRENT_REGION`|This is populated by the `AWS_DEFAULT_REGION` environment variable provided by Lambda. This will be set to the region that the Lambda function is running in.| 35 | |`TTL_EXPIRY`|Default: `86400` seconds. This is the TTL for an item in the Current Table. This is used to account for missing deletion events.| 36 | |`HISTORICAL_ROLE`|Default: `Historical`. Don't change this -- this is the name of the IAM role that Historical needs to assume to describe resources.| 37 | |`REGION_ATTR`|Default: `Region`. Don't change this -- this is the name of the region attribute in the DynamoDB table.| 38 | |`EVENT_TOO_BIG_FLAG`|Default: `event_too_big`. Don't change this -- this is a field name that informs Historical downstream functions if an event is too big to fit in SNS and SQS (>256KB).| 39 | 40 | ## Optional Fields 41 | 42 | | Variable | Where to set | Sample Value | 43 | |:----------:|:-------------|:-------------| 44 | |`RANDOMIZE_POLLER`|Per-stack Terraform template
`poller_env_vars`|0 <= value <= 900. Number of seconds to delay
Polling messages in SQS.

It is recommended you set this to `"900"` for the Poller.| 45 | |`LOGGING_LEVEL`|Per-stack Terraform template
`env_vars`|[Any one of these values](https://github.com/Netflix-Skunkworks/historical/blob/master/historical/constants.py#L13-L17). `DEBUG` is recommended.| 46 | |`TEST_ACCOUNTS_ONLY`|Per-stack Terraform template
`env_vars`|Default `False`. This is used if you are making use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client).

Set this to `True` if you want your stack to _ONLY_ query
against "test" accounts. Useful for having
"test" and "prod" stacks.| 47 | |`PROXY_BATCH_SIZE`|Per-stack Terraform template
`current_proxy_env_vars`|Default: `10`. Set this if the batched event size is too
big (>256KB) to send to SQS. This should be refactored
in the future so that this is not necessary.| 48 | |`SENTRY_DSN`|Per-stack Terraform template
`env_vars`|If you make use of [Sentry](https://sentry.io/), then set this to your DSN.

Historical makes use of the [`raven-python-lambda`](https://github.com/Netflix-Skunkworks/raven-python-lambda) for Sentry.
You can also optionally use SQS as a transport layer for
Sentry messages via [`raven-sqs-proxy`](https://github.com/Netflix-Skunkworks/raven-sqs-proxy).| 49 | |Custom Tags|Per-stack Terraform template
`tags`|Add in a name-value pair of tags you want to affix
to your Lambda functions.| 50 | 51 | 52 | ## Docker Installer Specific Fields 53 | The fields below are specific for installation and uninstallation of Historical via the Docker container. These values are present in the [`terraform/SAMPLE-env.list`](https://github.com/Netflix-Skunkworks/historical/blob/master/terraform/SAMPLE-env.list) file. 54 | 55 | **ALL FIELDS BELOW ARE REQUIRED** 56 | 57 | | Variable | Sample Value | 58 | |:----------:|:-------------| 59 | |`AWS_ACCESS_KEY_ID`|The AWS Access Key ID for the credential that will be used to run Terraform. This is for a very powerful IAM Role.| 60 | |`AWS_SECRET_ACCESS_KEY`|The AWS Secret Access Key for the credential that will be used to run Terraform. This is for a very powerful IAM Role.| 61 | |`AWS_SESSION_TOKEN`|The AWS Session Token for the credential that will be used to run Terraform. This is for a very powerful IAM Role.| 62 | |`TECH`|The Historical resource type for the stack in question. Either `s3` or `securitygroup` (for now).| 63 | |`PRIMARY_REGION`|The Primary Region of your Historical Stack.| 64 | |`SECONDARY_REGIONS`|The Secondary Regions of your Historical Stack. This is a comma separated string.| 65 | 66 | 67 | ## Next Steps 68 | [Please return to the Installation documentation](../). 69 | -------------------------------------------------------------------------------- /historical/security_group/collector.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.security_group.collector 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | """ 8 | import logging 9 | 10 | from botocore.exceptions import ClientError 11 | from pynamodb.exceptions import DeleteError 12 | 13 | from raven_python_lambda import RavenLambdaWrapper 14 | 15 | from cloudaux.aws.ec2 import describe_security_groups 16 | 17 | from historical.common.sqs import group_records_by_type 18 | from historical.constants import HISTORICAL_ROLE, LOGGING_LEVEL 19 | from historical.common import cloudwatch 20 | from historical.common.util import deserialize_records, pull_tag_dict 21 | from historical.security_group.models import CurrentSecurityGroupModel, VERSION 22 | 23 | logging.basicConfig() 24 | LOG = logging.getLogger('historical') 25 | LOG.setLevel(LOGGING_LEVEL) 26 | 27 | 28 | UPDATE_EVENTS = [ 29 | 'AuthorizeSecurityGroupEgress', 30 | 'AuthorizeSecurityGroupIngress', 31 | 'RevokeSecurityGroupEgress', 32 | 'RevokeSecurityGroupIngress', 33 | 'CreateSecurityGroup', 34 | 'PollSecurityGroups' 35 | ] 36 | 37 | DELETE_EVENTS = [ 38 | 'DeleteSecurityGroup' 39 | ] 40 | 41 | 42 | def get_arn(group_id, region, account_id): 43 | """Creates a security group ARN.""" 44 | return f'arn:aws:ec2:{region}:{account_id}:security-group/{group_id}' 45 | 46 | 47 | def describe_group(record, region): 48 | """Attempts to describe group ids.""" 49 | account_id = record['account'] 50 | group_name = cloudwatch.filter_request_parameters('groupName', record) 51 | vpc_id = cloudwatch.filter_request_parameters('vpcId', record) 52 | group_id = cloudwatch.filter_request_parameters('groupId', record, look_in_response=True) 53 | 54 | # Did this get collected already by the poller? 
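    # If it did, the event's `detail.collected` field already carries the describe output, so no extra API call is needed.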
55 | if cloudwatch.get_collected_details(record): 56 | LOG.debug(f"[<--] Received already collected security group data: {record['detail']['collected']}") 57 | return [record['detail']['collected']] 58 | 59 | try: 60 | # Always depend on Group ID first: 61 | if group_id: # pylint: disable=R1705 62 | return describe_security_groups( 63 | account_number=account_id, 64 | assume_role=HISTORICAL_ROLE, 65 | region=region, 66 | GroupIds=[group_id] 67 | )['SecurityGroups'] 68 | 69 | elif vpc_id and group_name: 70 | return describe_security_groups( 71 | account_number=account_id, 72 | assume_role=HISTORICAL_ROLE, 73 | region=region, 74 | Filters=[ 75 | { 76 | 'Name': 'group-name', 77 | 'Values': [group_name] 78 | }, 79 | { 80 | 'Name': 'vpc-id', 81 | 'Values': [vpc_id] 82 | } 83 | ] 84 | )['SecurityGroups'] 85 | 86 | else: 87 | raise Exception('[X] Did not receive Group ID or VPC/Group Name pairs. ' 88 | f'We got: ID: {group_id} VPC/Name: {vpc_id}/{group_name}.') 89 | except ClientError as exc: 90 | if exc.response['Error']['Code'] == 'InvalidGroup.NotFound': 91 | return [] 92 | raise exc 93 | 94 | 95 | def create_delete_model(record): 96 | """Create a security group model from a record.""" 97 | data = cloudwatch.get_historical_base_info(record) 98 | 99 | group_id = cloudwatch.filter_request_parameters('groupId', record) 100 | # vpc_id = cloudwatch.filter_request_parameters('vpcId', record) 101 | # group_name = cloudwatch.filter_request_parameters('groupName', record) 102 | 103 | arn = get_arn(group_id, cloudwatch.get_region(record), record['account']) 104 | 105 | LOG.debug(f'[-] Deleting Dynamodb Records. Hash Key: {arn}') 106 | 107 | # Tombstone these records so that the deletion event time can be accurately tracked. 108 | data.update({'configuration': {}}) 109 | 110 | items = list(CurrentSecurityGroupModel.query(arn, limit=1)) 111 | 112 | if items: 113 | model_dict = items[0].__dict__['attribute_values'].copy() 114 | model_dict.update(data) 115 | model = CurrentSecurityGroupModel(**model_dict) 116 | model.save() 117 | return model 118 | 119 | return None 120 | 121 | 122 | def capture_delete_records(records): 123 | """Writes all of our delete events to DynamoDB.""" 124 | for rec in records: 125 | model = create_delete_model(rec) 126 | if model: 127 | try: 128 | model.delete(condition=(CurrentSecurityGroupModel.eventTime <= rec['detail']['eventTime'])) 129 | except DeleteError: 130 | LOG.warning(f'[X] Unable to delete security group. Security group does not exist. Record: {rec}') 131 | else: 132 | LOG.warning(f'[?] Unable to delete security group. Security group does not exist. Record: {rec}') 133 | 134 | 135 | def capture_update_records(records): 136 | """Writes all updated configuration info to DynamoDB""" 137 | for rec in records: 138 | data = cloudwatch.get_historical_base_info(rec) 139 | group = describe_group(rec, cloudwatch.get_region(rec)) 140 | 141 | if len(group) > 1: 142 | raise Exception(f'[X] Multiple groups found. Record: {rec}') 143 | 144 | if not group: 145 | LOG.warning(f'[?] No group information found. Record: {rec}') 146 | continue 147 | 148 | group = group[0] 149 | 150 | # Determine event data for group - and pop off items that are going to the top-level: 151 | LOG.debug(f'Processing group. 
Group: {group}') 152 | data.update({ 153 | 'GroupId': group['GroupId'], 154 | 'GroupName': group.pop('GroupName'), 155 | 'VpcId': group.pop('VpcId', None), 156 | 'arn': get_arn(group.pop('GroupId'), cloudwatch.get_region(rec), group.pop('OwnerId')), 157 | 'Region': cloudwatch.get_region(rec) 158 | }) 159 | 160 | data['Tags'] = pull_tag_dict(group) 161 | 162 | # Set the remaining items to the configuration: 163 | data['configuration'] = group 164 | 165 | # Set the version: 166 | data['version'] = VERSION 167 | 168 | LOG.debug(f'[+] Writing Dynamodb Record. Records: {data}') 169 | current_revision = CurrentSecurityGroupModel(**data) 170 | current_revision.save() 171 | 172 | 173 | @RavenLambdaWrapper() 174 | def handler(event, context): # pylint: disable=W0613 175 | """ 176 | Historical security group event collector. 177 | This collector is responsible for processing Cloudwatch events and polling events. 178 | """ 179 | records = deserialize_records(event['Records']) 180 | 181 | # Split records into two groups, update and delete. 182 | # We don't want to query for deleted records. 183 | update_records, delete_records = group_records_by_type(records, UPDATE_EVENTS) 184 | capture_delete_records(delete_records) 185 | 186 | # filter out error events 187 | update_records = [e for e in update_records if not e['detail'].get('errorCode')] 188 | 189 | # group records by account for more efficient processing 190 | LOG.debug(f'[@] Update Records: {records}') 191 | 192 | capture_update_records(update_records) 193 | -------------------------------------------------------------------------------- /historical/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.models 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Kevin Glisson 7 | .. author:: Mike Grima 8 | """ 9 | import time 10 | from datetime import datetime 11 | 12 | from marshmallow import fields, Schema 13 | from pynamodb.models import Model 14 | from pynamodb.attributes import ListAttribute, MapAttribute, NumberAttribute, UnicodeAttribute 15 | 16 | from historical.attributes import EventTimeAttribute, fix_decimals, HistoricalDecimalAttribute 17 | from historical.constants import TTL_EXPIRY 18 | 19 | 20 | EPHEMERAL_PATHS = [] 21 | 22 | 23 | def default_ttl(): 24 | """Return the default TTL as an int.""" 25 | return int(time.time() + TTL_EXPIRY) 26 | 27 | 28 | def default_event_time(): 29 | """Get the current time and format it for the event time.""" 30 | return datetime.utcnow().replace(tzinfo=None, microsecond=0).isoformat() + 'Z' 31 | 32 | 33 | class BaseHistoricalModel(Model): 34 | """This is the base Historical DynamoDB model. All Historical PynamoDB models should subclass this.""" 35 | 36 | # pylint: disable=R1701 37 | def __iter__(self): 38 | """Properly serialize the PynamoDB object as a `dict` via this function. 39 | Helper for serializing into a typical `dict`. 
See: https://github.com/pynamodb/PynamoDB/issues/152 40 | """ 41 | for name, attr in self.get_attributes().items(): 42 | try: 43 | if isinstance(attr, MapAttribute): 44 | name, obj = name, getattr(self, name).as_dict() 45 | yield name, fix_decimals(obj) # Don't forget to remove the stupid decimals :/ 46 | elif isinstance(attr, NumberAttribute) or isinstance(attr, HistoricalDecimalAttribute): 47 | yield name, int(attr.serialize(getattr(self, name))) 48 | elif isinstance(attr, ListAttribute): 49 | name, obj = name, [el.as_dict() for el in getattr(self, name)] 50 | yield name, fix_decimals(obj) # Don't forget to remove the stupid decimals :/ 51 | else: 52 | yield name, attr.serialize(getattr(self, name)) 53 | 54 | # For Nulls: 55 | except AttributeError: 56 | yield name, None 57 | 58 | 59 | class DurableHistoricalModel(BaseHistoricalModel): 60 | """The base Historical Durable (Differ) Table model base class.""" 61 | 62 | eventTime = EventTimeAttribute(range_key=True, default=default_event_time) 63 | 64 | 65 | class CurrentHistoricalModel(BaseHistoricalModel): 66 | """The base Historical Current Table model base class.""" 67 | 68 | eventTime = EventTimeAttribute(default=default_event_time) 69 | ttl = NumberAttribute(default=default_ttl()) 70 | eventSource = UnicodeAttribute() 71 | 72 | 73 | class AWSHistoricalMixin(BaseHistoricalModel): 74 | """This is the main Historical event mixin. All the major required (and optional) fields are here.""" 75 | 76 | arn = UnicodeAttribute(hash_key=True) 77 | accountId = UnicodeAttribute() 78 | configuration = MapAttribute() 79 | Tags = MapAttribute() 80 | version = HistoricalDecimalAttribute() 81 | userIdentity = MapAttribute(null=True) 82 | principalId = UnicodeAttribute(null=True) 83 | userAgent = UnicodeAttribute(null=True) 84 | sourceIpAddress = UnicodeAttribute(null=True) 85 | requestParameters = MapAttribute(null=True) 86 | eventName = UnicodeAttribute(null=True) 87 | 88 | 89 | class HistoricalPollingEventDetail(Schema): 90 | """This is the Marshmallow schema for a Polling event. This is made to look like a CloudWatch Event.""" 91 | 92 | # You must replace these: 93 | event_source = fields.Str(dump_to='eventSource', load_from='eventSource', required=True) 94 | event_name = fields.Str(dump_to='eventName', load_from='eventName', required=True) 95 | request_parameters = fields.Dict(dump_to='requestParameters', load_from='requestParameters', required=True) 96 | 97 | # This field is for technologies that lack a "list" method. For those technologies, the tasked poller 98 | # will perform all the describes and embed the major configuration details into this field: 99 | collected = fields.Dict(dump_to='collected', load_from='collected', required=False) 100 | # ^^ The collector will then need to look for this and figure out how to save it to DDB. 101 | 102 | event_time = fields.Str(dump_to='eventTime', load_from='eventTime', required=True, 103 | default=default_event_time, missing=default_event_time) 104 | 105 | 106 | class HistoricalPollingBaseModel(Schema): 107 | """This is a Marshmallow schema that holds objects that were described in the Poller. 108 | 109 | Data here will be passed onto the Collector so that the Collector need not fetch new 110 | data from AWS. 
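    The shape mirrors a CloudWatch Event so that the Collector can treat polled and CloudTrail-driven events the same way.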
111 | """ 112 | 113 | version = fields.Str(required=True) 114 | account = fields.Str(required=True) 115 | 116 | detail_type = fields.Str(load_from='detail-type', dump_to='detail-type', required=True, 117 | missing='Poller', default='Poller') 118 | source = fields.Str(required=True, missing='historical', default='historical') 119 | time = fields.Str(required=True, default=default_event_time, missing=default_event_time) 120 | 121 | # You must replace this: 122 | detail = fields.Nested(HistoricalPollingEventDetail, required=True) 123 | 124 | 125 | class HistoricalPollerTaskEventModel(Schema): 126 | """This is a Marshmallow schema that will trigger the Poller to perform the List/Describe AWS API calls. 127 | 128 | This informs the Poller which account and region to list/describe against. If a next_token is specified, then it 129 | will properly list/describe from from that pagination marker. 130 | """ 131 | 132 | account_id = fields.Str(required=True) 133 | region = fields.Str(required=True) 134 | next_token = fields.Str(load_from='NextToken', dump_to='NextToken') 135 | 136 | def serialize_me(self, account_id, region, next_token=None): 137 | """Dumps the proper JSON for the schema. 138 | 139 | :param account_id: 140 | :param region: 141 | :param next_token: 142 | :return: 143 | """ 144 | payload = { 145 | 'account_id': account_id, 146 | 'region': region 147 | } 148 | 149 | if next_token: 150 | payload['next_token'] = next_token 151 | 152 | return self.dumps(payload).data 153 | 154 | 155 | class SimpleDurableSchema(Schema): 156 | """This is a Marshmallow schema that represents a simplified serialized dict of the Durable Proxy events. 157 | 158 | This is so that downstream consumers of Historical events need-not worry too much about DynamoDB. This is a 159 | fully-outlined dict of all the data for representing a given technology. This will specify if the object was 160 | too big for SNS/SQS delivery. 161 | """ 162 | 163 | arn = fields.Str(required=True) 164 | event_time = fields.Str(required=True, default=default_event_time) 165 | tech = fields.Str(required=True) 166 | event_too_big = fields.Boolean(required=False) 167 | item = fields.Dict(required=False) 168 | 169 | def serialize_me(self, arn, event_time, tech, item=None): 170 | """Dumps the proper JSON for the schema. If the event is too big, then don't include the item. 171 | 172 | :param arn: 173 | :param event_time: 174 | :param tech: 175 | :param item: 176 | :return: 177 | """ 178 | payload = { 179 | 'arn': arn, 180 | 'event_time': event_time, 181 | 'tech': tech 182 | } 183 | 184 | if item: 185 | payload['item'] = item 186 | 187 | else: 188 | payload['event_too_big'] = True 189 | 190 | return self.dumps(payload).data.replace('', '') 191 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/{{cookiecutter.technology_slug}}/collector.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: {{cookiecutter.technology_slug}}.collector 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: {{cookiecutter.author}} <{{cookiecutter.email}}> 7 | """ 8 | import os 9 | import logging 10 | 11 | from pynamodb.exceptions import DeleteError 12 | 13 | from raven_python_lambda import RavenLambdaWrapper 14 | 15 | from historical.common import cloudwatch 16 | from historical.common.kinesis import deserialize_records 17 | from .models import Current{{cookiecutter.technology_slug | titlecase}}Model 18 | 19 | logging.basicConfig() 20 | log = logging.getLogger('historical') 21 | level = logging.getLevelName(os.environ.get('HISTORICAL_LOGGING_LEVEL', 'WARNING')) 22 | log.setLevel(level) 23 | 24 | 25 | # TODO update with your events 26 | UPDATE_EVENTS = [ 27 | 'HistoricalPoller' 28 | ] 29 | 30 | DELETE_EVENTS = [ 31 | 32 | ] 33 | 34 | 35 | def get_arn(id, account): 36 | """Gets arn for {{cookiecutter.technology_name}}""" 37 | # TODO make ARN for technology 38 | # Example:: 39 | # return 'arn:aws:ec2:{region}:{account_id}:security-group/{group_id}'.format( 40 | # group_id=group_id, 41 | # region=CURRENT_REGION, 42 | # account_id=account_id 43 | # ) 44 | return 45 | 46 | 47 | def group_records_by_type(records): 48 | """Break records into two lists; create/update events and delete events.""" 49 | update_records, delete_records = [], [] 50 | for r in records: 51 | if isinstance(r, str): 52 | break 53 | 54 | if r['detail']['eventName'] in UPDATE_EVENTS: 55 | update_records.append(r) 56 | else: 57 | delete_records.append(r) 58 | return update_records, delete_records 59 | 60 | 61 | def describe_technology(record): 62 | """Attempts to describe {{cookiecutter.technology_name}} ids.""" 63 | account_id = record['account'] 64 | 65 | # TODO describe the technology item 66 | # Example:: 67 | # group_name = cloudwatch.filter_request_parameters('groupName', record) 68 | # vpc_id = cloudwatch.filter_request_parameters('vpcId', record) 69 | # group_id = cloudwatch.filter_request_parameters('groupId', record) 70 | # 71 | # try: 72 | # if vpc_id and group_name: 73 | # return describe_security_groups( 74 | # account_number=account_id, 75 | # assume_role=HISTORICAL_ROLE, 76 | # region=CURRENT_REGION, 77 | # Filters=[ 78 | # { 79 | # 'Name': 'group-name', 80 | # 'Values': [group_name] 81 | # }, 82 | # { 83 | # 'Name': 'vpc-id', 84 | # 'Values': [vpc_id] 85 | # } 86 | # ] 87 | # )['SecurityGroups'] 88 | # elif group_id: 89 | # return describe_security_groups( 90 | # account_number=account_id, 91 | # assume_role=HISTORICAL_ROLE, 92 | # region=CURRENT_REGION, 93 | # GroupIds=[group_id] 94 | # )['SecurityGroups'] 95 | # else: 96 | # raise Exception('Describe requires a groupId or a groupName and VpcId.') 97 | # except ClientError as e: 98 | # if e.response['Error']['Code'] == 'InvalidGroup.NotFound': 99 | # return [] 100 | # raise e 101 | 102 | return 103 | 104 | 105 | def create_delete_model(record): 106 | """Create a {{cookiecutter.technology_name}} model from a record.""" 107 | data = cloudwatch.get_historical_base_info(record) 108 | 109 | # TODO get tech ID 110 | # Example:: 111 | # group_id = cloudwatch.filter_request_parameters('groupId', record) 112 | # vpc_id = cloudwatch.filter_request_parameters('vpcId', record) 113 | # group_name = cloudwatch.filter_request_parameters('groupName', record) 114 | 115 | tech_id = None 116 | arn = get_arn(tech_id, record['account']) 117 | 118 | log.debug('Deleting Dynamodb Records. Hash Key: {arn}'.format(arn=arn)) 119 | 120 | # tombstone these records so that the deletion event time can be accurately tracked. 
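    # An empty 'configuration' acts as the tombstone; the remaining attributes keep their last-known values.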
121 | data.update({ 122 | 'configuration': {} 123 | }) 124 | 125 | items = list(Current{{cookiecutter.technology_slug | titlecase}}Model.query(arn, limit=1)) 126 | 127 | if items: 128 | model_dict = items[0].__dict__['attribute_values'].copy() 129 | model_dict.update(data) 130 | model = Current{{cookiecutter.technology_slug | titlecase }}Model(**model_dict) 131 | model.save() 132 | return model 133 | 134 | 135 | def capture_delete_records(records): 136 | """Writes all of our delete events to DynamoDB.""" 137 | for r in records: 138 | model = create_delete_model(r) 139 | if model: 140 | try: 141 | model.delete(eventTime__le=r['detail']['eventTime']) 142 | except DeleteError as e: 143 | log.warning('Unable to delete {{cookiecutter.technology_name}}. {{cookiecutter.technology_name}} does not exist. Record: {record}'.format( 144 | record=r 145 | )) 146 | else: 147 | log.warning('Unable to delete {{cookiecutter.technology_name}}. {{cookiecutter.technology_name}} does not exist. Record: {record}'.format( 148 | record=r 149 | )) 150 | 151 | 152 | def capture_update_records(records): 153 | """Writes all updated configuration info to DynamoDB""" 154 | for record in records: 155 | data = cloudwatch.get_historical_base_info(record) 156 | items = describe_technology(record) 157 | 158 | if len(items) > 1: 159 | raise Exception('Multiple items found. Record: {record}'.format(record=record)) 160 | 161 | if not items: 162 | log.warning('No technology information found. Record: {record}'.format(record=record)) 163 | continue 164 | 165 | item = items[0] 166 | 167 | # determine event data for group 168 | log.debug('Processing item. Group: {}'.format(item)) 169 | 170 | # TODO update data 171 | # Example:: 172 | # data.update({ 173 | # 'GroupId': item['GroupId'], 174 | # 'GroupName': item['GroupName'], 175 | # 'Description': item['Description'], 176 | # 'VpcId': item.get('VpcId'), 177 | # 'Tags': item.get('Tags', []), 178 | # 'arn': get_arn(item['GroupId'], item['OwnerId']), 179 | # 'OwnerId': item['OwnerId'], 180 | # 'configuration': item, 181 | # 'Region': cloudwatch.get_region(record) 182 | # }) 183 | 184 | log.debug('Writing Dynamodb Record. Records: {record}'.format(record=data)) 185 | 186 | current_revision = Current{{cookiecutter.technology_slug | titlecase}}Model(**data) 187 | current_revision.save() 188 | 189 | 190 | @RavenLambdaWrapper() 191 | def handler(event, context): 192 | """ 193 | Historical {{cookiecutter.technology_name}} event collector. 194 | This collector is responsible for processing Cloudwatch events and polling events. 195 | """ 196 | records = deserialize_records(event['Records']) 197 | 198 | # Split records into two groups, update and delete. 199 | # We don't want to query for deleted records. 200 | update_records, delete_records = group_records_by_type(records) 201 | capture_delete_records(delete_records) 202 | 203 | # filter out error events 204 | update_records = [e for e in update_records if not e['detail'].get('errorCode')] 205 | 206 | # group records by account for more efficient processing 207 | log.debug('Update Records: {records}'.format(records=records)) 208 | 209 | capture_update_records(update_records) 210 | -------------------------------------------------------------------------------- /historical/s3/collector.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. 
module: historical.s3.collector 3 | :platform: Unix 4 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. author:: Mike Grima 7 | """ 8 | import logging 9 | from itertools import groupby 10 | 11 | from botocore.exceptions import ClientError 12 | from pynamodb.exceptions import PynamoDBConnectionError 13 | from raven_python_lambda import RavenLambdaWrapper 14 | from cloudaux.orchestration.aws.s3 import get_bucket 15 | 16 | from historical.common.sqs import group_records_by_type 17 | from historical.constants import CURRENT_REGION, HISTORICAL_ROLE, LOGGING_LEVEL 18 | from historical.common import cloudwatch 19 | from historical.common.util import deserialize_records 20 | from historical.s3.models import CurrentS3Model, VERSION 21 | 22 | logging.basicConfig() 23 | LOG = logging.getLogger('historical') 24 | LOG.setLevel(LOGGING_LEVEL) 25 | 26 | 27 | UPDATE_EVENTS = [ 28 | 'PollS3', # Polling event 29 | 'DeleteBucketCors', 30 | 'DeleteBucketLifecycle', 31 | 'DeleteBucketPolicy', 32 | 'DeleteBucketReplication', 33 | 'DeleteBucketTagging', 34 | 'DeleteBucketWebsite', 35 | 'CreateBucket', 36 | 'PutBucketAcl', 37 | 'PutBucketCors', 38 | 'PutBucketLifecycle', 39 | 'PutBucketPolicy', 40 | 'PutBucketLogging', 41 | 'PutBucketNotification', 42 | 'PutBucketReplication', 43 | 'PutBucketTagging', 44 | 'PutBucketRequestPayment', 45 | 'PutBucketVersioning', 46 | 'PutBucketWebsite' 47 | ] 48 | 49 | 50 | DELETE_EVENTS = [ 51 | 'DeleteBucket', 52 | ] 53 | 54 | 55 | def create_delete_model(record): 56 | """Create an S3 model from a record.""" 57 | arn = f"arn:aws:s3:::{cloudwatch.filter_request_parameters('bucketName', record)}" 58 | LOG.debug(f'[-] Deleting Dynamodb Records. Hash Key: {arn}') 59 | 60 | data = { 61 | 'arn': arn, 62 | 'principalId': cloudwatch.get_principal(record), 63 | 'userIdentity': cloudwatch.get_user_identity(record), 64 | 'accountId': record['account'], 65 | 'eventTime': record['detail']['eventTime'], 66 | 'BucketName': cloudwatch.filter_request_parameters('bucketName', record), 67 | 'Region': cloudwatch.get_region(record), 68 | 'Tags': {}, 69 | 'configuration': {}, 70 | 'eventSource': record['detail']['eventSource'], 71 | 'version': VERSION 72 | } 73 | 74 | return CurrentS3Model(**data) 75 | 76 | 77 | def process_delete_records(delete_records): 78 | """Process the requests for S3 bucket deletions""" 79 | for rec in delete_records: 80 | arn = f"arn:aws:s3:::{rec['detail']['requestParameters']['bucketName']}" 81 | 82 | # Need to check if the event is NEWER than the previous event in case 83 | # events are out of order. This could *possibly* happen if something 84 | # was deleted, and then quickly re-created. It could be *possible* for the 85 | # deletion event to arrive after the creation event. Thus, this will check 86 | # if the current event timestamp is newer and will only delete if the deletion 87 | # event is newer. 88 | try: 89 | LOG.debug(f'[-] Deleting bucket: {arn}') 90 | model = create_delete_model(rec) 91 | model.save(condition=(CurrentS3Model.eventTime <= rec['detail']['eventTime'])) 92 | model.delete() 93 | 94 | except PynamoDBConnectionError as pdce: 95 | LOG.warning(f"[?] Unable to delete bucket: {arn}. Either it doesn't exist, or this deletion event is stale " 96 | f"(arrived before a NEWER creation/update). 
The specific exception is: {pdce}") 97 | 98 | 99 | def process_update_records(update_records): 100 | """Process the requests for S3 bucket update requests""" 101 | events = sorted(update_records, key=lambda x: x['account']) 102 | 103 | # Group records by account for more efficient processing 104 | for account_id, events in groupby(events, lambda x: x['account']): 105 | events = list(events) 106 | 107 | # Grab the bucket names (de-dupe events): 108 | buckets = {} 109 | for event in events: 110 | # If the creation date is present, then use it: 111 | bucket_event = buckets.get(event['detail']['requestParameters']['bucketName'], { 112 | 'creationDate': event['detail']['requestParameters'].get('creationDate') 113 | }) 114 | bucket_event.update(event['detail']['requestParameters']) 115 | 116 | buckets[event['detail']['requestParameters']['bucketName']] = bucket_event 117 | buckets[event['detail']['requestParameters']['bucketName']]['eventDetails'] = event 118 | 119 | # Query AWS for current configuration 120 | for b_name, item in buckets.items(): 121 | LOG.debug(f'[~] Processing Create/Update for: {b_name}') 122 | # If the bucket does not exist, then simply drop the request -- 123 | # If this happens, there is likely a Delete event that has occurred and will be processed soon. 124 | try: 125 | bucket_details = get_bucket(b_name, 126 | account_number=account_id, 127 | include_created=(item.get('creationDate') is None), 128 | assume_role=HISTORICAL_ROLE, 129 | region=CURRENT_REGION) 130 | if bucket_details.get('Error'): 131 | LOG.error(f"[X] Unable to fetch details about bucket: {b_name}. " 132 | f"The error details are: {bucket_details['Error']}") 133 | continue 134 | 135 | except ClientError as cerr: 136 | if cerr.response['Error']['Code'] == 'NoSuchBucket': 137 | LOG.warning(f'[?] Received update request for bucket: {b_name} that does not ' 138 | 'currently exist. Skipping.') 139 | continue 140 | 141 | # Catch Access Denied exceptions as well: 142 | if cerr.response['Error']['Code'] == 'AccessDenied': 143 | LOG.error(f'[X] Unable to fetch details for S3 Bucket: {b_name} in {account_id}. Access is Denied. 
' 144 | 'Skipping...') 145 | continue 146 | raise Exception(cerr) 147 | 148 | # Pull out the fields we want: 149 | data = { 150 | 'arn': f'arn:aws:s3:::{b_name}', 151 | 'principalId': cloudwatch.get_principal(item['eventDetails']), 152 | 'userIdentity': cloudwatch.get_user_identity(item['eventDetails']), 153 | 'userAgent': item['eventDetails']['detail'].get('userAgent'), 154 | 'sourceIpAddress': item['eventDetails']['detail'].get('sourceIPAddress'), 155 | 'requestParameters': item['eventDetails']['detail'].get('requestParameters'), 156 | 'accountId': account_id, 157 | 'eventTime': item['eventDetails']['detail']['eventTime'], 158 | 'BucketName': b_name, 159 | 'Region': bucket_details.pop('Region'), 160 | # Duplicated in top level and configuration for secondary index 161 | 'Tags': bucket_details.pop('Tags', {}) or {}, 162 | 'eventSource': item['eventDetails']['detail']['eventSource'], 163 | 'eventName': item['eventDetails']['detail']['eventName'], 164 | 'version': VERSION 165 | } 166 | 167 | # Remove the fields we don't care about: 168 | del bucket_details['Arn'] 169 | del bucket_details['GrantReferences'] 170 | del bucket_details['_version'] 171 | del bucket_details['Name'] 172 | 173 | if not bucket_details.get('CreationDate'): 174 | bucket_details['CreationDate'] = item['creationDate'] 175 | 176 | data['configuration'] = bucket_details 177 | 178 | current_revision = CurrentS3Model(**data) 179 | current_revision.save() 180 | 181 | 182 | @RavenLambdaWrapper() 183 | def handler(event, context): # pylint: disable=W0613 184 | """ 185 | Historical S3 event collector. 186 | 187 | This collector is responsible for processing CloudWatch events and polling events. 188 | """ 189 | records = deserialize_records(event['Records']) 190 | 191 | # Split records into two groups, update and delete. 192 | # We don't want to query for deleted records. 193 | update_records, delete_records = group_records_by_type(records, UPDATE_EVENTS) 194 | 195 | LOG.debug('[@] Processing update records...') 196 | process_update_records(update_records) 197 | LOG.debug('[@] Completed processing of update records.') 198 | 199 | LOG.debug('[@] Processing delete records...') 200 | process_delete_records(delete_records) 201 | LOG.debug('[@] Completed processing of delete records.') 202 | 203 | LOG.debug('[@] Successfully updated current Historical table') 204 | -------------------------------------------------------------------------------- /historical/common/proxy.py: -------------------------------------------------------------------------------- 1 | """ 2 | .. module: historical.common.proxy 3 | :platform: Unix 4 | :copyright: (c) 2018 by Netflix Inc., see AUTHORS for more 5 | :license: Apache, see LICENSE for more details. 6 | .. 
author:: Mike Grima 7 | """ 8 | import logging 9 | import json 10 | import math 11 | import os 12 | import sys 13 | 14 | import boto3 15 | from retrying import retry 16 | 17 | from raven_python_lambda import RavenLambdaWrapper 18 | 19 | from historical.common.dynamodb import DESER, remove_global_dynamo_specific_fields 20 | from historical.common.exceptions import MissingProxyConfigurationException 21 | from historical.common.sqs import produce_events 22 | from historical.constants import CURRENT_REGION, EVENT_TOO_BIG_FLAG, PROXY_REGIONS, REGION_ATTR, SIMPLE_DURABLE_PROXY 23 | 24 | from historical.mapping import DURABLE_MAPPING, HISTORICAL_TECHNOLOGY 25 | 26 | LOG = logging.getLogger('historical') 27 | 28 | 29 | @retry(stop_max_attempt_number=4, wait_exponential_multiplier=1000, wait_exponential_max=1000) 30 | def _publish_sns_message(client, blob, topic_arn): 31 | client.publish(TopicArn=topic_arn, Message=blob) 32 | 33 | 34 | def shrink_blob(record, deletion): 35 | """ 36 | Makes a shrunken blob to be sent to SNS/SQS (due to the 256KB size limitations of SNS/SQS messages). 37 | This will essentially remove the "configuration" field such that the size of the SNS/SQS message remains under 38 | 256KB. 39 | :param record: 40 | :return: 41 | """ 42 | item = { 43 | "eventName": record["eventName"], 44 | EVENT_TOO_BIG_FLAG: (not deletion) 45 | } 46 | 47 | # To handle TTLs (if they happen) 48 | if record.get("userIdentity"): 49 | item["userIdentity"] = record["userIdentity"] 50 | 51 | # Remove the 'configuration' and 'requestParameters' fields from new and old images if applicable: 52 | if not deletion: 53 | # Only remove it from non-deletions: 54 | if record['dynamodb'].get('NewImage'): 55 | record['dynamodb']['NewImage'].pop('configuration', None) 56 | record['dynamodb']['NewImage'].pop('requestParameters', None) 57 | 58 | if record['dynamodb'].get('OldImage'): 59 | record['dynamodb']['OldImage'].pop('configuration', None) 60 | record['dynamodb']['OldImage'].pop('requestParameters', None) 61 | 62 | item['dynamodb'] = record['dynamodb'] 63 | 64 | return item 65 | 66 | 67 | @RavenLambdaWrapper() 68 | def handler(event, context): # pylint: disable=W0613 69 | """Historical S3 DynamoDB Stream Forwarder (the 'Proxy'). 70 | 71 | Passes events from the Historical DynamoDB stream and passes it to SNS or SQS for additional events to trigger. 72 | 73 | You can optionally use SNS or SQS. It is preferable to use SNS -> SQS, but in some cases, such as the Current stream 74 | to the Differ, this will make use of SQS to directly feed into the differ for performance purposes. 75 | """ 76 | queue_url = os.environ.get('PROXY_QUEUE_URL') 77 | topic_arn = os.environ.get('PROXY_TOPIC_ARN') 78 | 79 | if not queue_url and not topic_arn: 80 | raise MissingProxyConfigurationException('[X] Must set the `PROXY_QUEUE_URL` or the `PROXY_TOPIC_ARN` vars.') 81 | 82 | items_to_ship = [] 83 | 84 | # Must ALWAYS shrink for SQS because of 256KB limit of sending batched messages 85 | force_shrink = True if queue_url else False 86 | 87 | # Is this a "Simple Durable Proxy" -- that is -- are we stripping out all of the DynamoDB data from 88 | # the Differ? 
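    # If so, ship simplified (arn / event_time / tech / item) records; otherwise, forward the raw DynamoDB stream records.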
89 | record_maker = make_proper_simple_record if SIMPLE_DURABLE_PROXY else make_proper_dynamodb_record 90 | 91 | for record in event['Records']: 92 | # We should NOT be processing this if the item in question does not 93 | # reside in the PROXY_REGIONS 94 | correct_region = True 95 | for img in ['NewImage', 'OldImage']: 96 | if record['dynamodb'].get(img): 97 | if record['dynamodb'][img][REGION_ATTR]['S'] not in PROXY_REGIONS: 98 | LOG.debug(f"[/] Not processing record -- record event took place in:" 99 | f" {record['dynamodb'][img][REGION_ATTR]['S']}") 100 | correct_region = False 101 | break 102 | 103 | if not correct_region: 104 | continue 105 | 106 | # Global DynamoDB tables will update a record with the global table specific fields. This creates 2 events 107 | # whenever there is an update. The second update, which is a MODIFY event is not relevant and noise. This 108 | # needs to be skipped over to prevent duplicated events. This is a "gotcha" in Global DynamoDB tables. 109 | if detect_global_table_updates(record): 110 | continue 111 | 112 | items_to_ship.append(record_maker(record, force_shrink=force_shrink)) 113 | 114 | if items_to_ship: 115 | # SQS: 116 | if queue_url: 117 | produce_events(items_to_ship, queue_url, batch_size=int(os.environ.get('PROXY_BATCH_SIZE', 10))) 118 | 119 | # SNS: 120 | else: 121 | client = boto3.client("sns", region_name=CURRENT_REGION) 122 | for i in items_to_ship: 123 | _publish_sns_message(client, i, topic_arn) 124 | 125 | 126 | def detect_global_table_updates(record): 127 | """This will detect DDB Global Table updates that are not relevant to application data updates. These need to be 128 | skipped over as they are pure noise. 129 | 130 | :param record: 131 | :return: 132 | """ 133 | # This only affects MODIFY events. 134 | if record['eventName'] == 'MODIFY': 135 | # Need to compare the old and new images to check for GT specific changes only (just pop off the GT fields) 136 | old_image = remove_global_dynamo_specific_fields(record['dynamodb']['OldImage']) 137 | new_image = remove_global_dynamo_specific_fields(record['dynamodb']['NewImage']) 138 | 139 | if json.dumps(old_image, sort_keys=True) == json.dumps(new_image, sort_keys=True): 140 | return True 141 | 142 | return False 143 | 144 | 145 | def make_proper_dynamodb_record(record, force_shrink=False): 146 | """Prepares and ships an individual DynamoDB record over to SNS/SQS for future processing. 147 | 148 | :param record: 149 | :param force_shrink: 150 | :return: 151 | """ 152 | # Get the initial blob and determine if it is too big for SNS/SQS: 153 | blob = json.dumps(record) 154 | size = math.ceil(sys.getsizeof(blob) / 1024) 155 | 156 | # If it is too big, then we need to send over a smaller blob to inform the recipient that it needs to go out and 157 | # fetch the item from the Historical table! 158 | if size >= 200 or force_shrink: 159 | deletion = False 160 | # ^^ However -- deletions need to be handled differently, because the Differ won't be able to find a 161 | # deleted record. For deletions, we will only shrink the 'OldImage', but preserve the 'NewImage' since that is 162 | # "already" shrunken. 
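        # A deletion is recognized by the empty 'configuration' tombstone in the NewImage (written by the Collector).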
163 | if record['dynamodb'].get('NewImage'): 164 | # Config will be empty if there was a deletion: 165 | if not (record['dynamodb']['NewImage'].get('configuration', {}) or {}).get('M'): 166 | deletion = True 167 | 168 | blob = json.dumps(shrink_blob(record, deletion)) 169 | 170 | return blob 171 | 172 | 173 | def _get_durable_pynamo_obj(record_data, durable_model): 174 | image = remove_global_dynamo_specific_fields(record_data) 175 | data = {} 176 | 177 | for item, value in image.items(): 178 | # This could end up as loss of precision 179 | data[item] = DESER.deserialize(value) 180 | 181 | return durable_model(**data) 182 | 183 | 184 | def make_proper_simple_record(record, force_shrink=False): 185 | """Prepares and ships an individual simplified durable table record over to SNS/SQS for future processing. 186 | 187 | :param record: 188 | :param force_shrink: 189 | :return: 190 | """ 191 | # Convert to a simple object 192 | item = { 193 | 'arn': record['dynamodb']['Keys']['arn']['S'], 194 | 'event_time': record['dynamodb']['NewImage']['eventTime']['S'], 195 | 'tech': HISTORICAL_TECHNOLOGY 196 | } 197 | 198 | # We need to de-serialize the raw DynamoDB object into the proper PynamoDB obj: 199 | prepped_new_record = _get_durable_pynamo_obj(record['dynamodb']['NewImage'], 200 | DURABLE_MAPPING.get(HISTORICAL_TECHNOLOGY)) 201 | 202 | item['item'] = dict(prepped_new_record) 203 | 204 | # Get the initial blob and determine if it is too big for SNS/SQS: 205 | blob = json.dumps(item) 206 | size = math.ceil(sys.getsizeof(blob) / 1024) 207 | 208 | # If it is too big, then we need to send over a smaller blob to inform the recipient that it needs to go out and 209 | # fetch the item from the Historical table! 210 | if size >= 200 or force_shrink: 211 | del item['item'] 212 | 213 | item[EVENT_TOO_BIG_FLAG] = True 214 | 215 | blob = json.dumps(item) 216 | 217 | return blob.replace('', '') 218 | -------------------------------------------------------------------------------- /historical/historical-cookiecutter/historical_{{cookiecutter.technology_slug}}/serverless.yaml: -------------------------------------------------------------------------------- 1 | service: "historical-{{cookiecutter.technology_slug}}" 2 | 3 | provider: 4 | name: aws 5 | runtime: python3.6 6 | memorySize: 1024 7 | timeout: 300 8 | deploymentBucket: 9 | name: ${opt:region}-${self:custom.accountName}-{{cookiecutter.team}} 10 | 11 | custom: ${file(serverless_configs/${opt:stage}.yml)} 12 | 13 | functions: 14 | Collector: 15 | handler: historical.security_group.collector.handler 16 | description: Processes polling and cloudwatch events. 17 | tags: 18 | owner: {{cookiecutter.email}} 19 | role: arn:aws:iam::${self:custom.accountId}:role/HistoricalLambdaProfile 20 | events: 21 | - stream: 22 | type: kinesis 23 | arn: 24 | Fn::GetAtt: 25 | - Historical{{cookiecutter.technology_slug | titlecase}}Stream 26 | - Arn 27 | batchSize: 100 28 | startingPosition: LATEST 29 | 30 | - stream: 31 | type: kinesis 32 | arn: 33 | Fn::GetAtt: 34 | - Historical{{cookiecutter.technology_slug | titlecase}}PollerStream 35 | - Arn 36 | batchSize: 100 37 | startingPosition: LATEST 38 | environment: 39 | SENTRY_DSN: ${self:custom.sentryDSN} 40 | 41 | Poller: 42 | handler: historical.{{cookiecutter.technology_slug}}.poller.handler 43 | description: Scheduled event that describes {{cookiecutter.technology_name}}. 
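    # Invoked on a schedule by the PollerScheduledRule defined under resources below.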
44 | tags: 45 | owner: {{cookiecutter.email}} 46 | role: arn:aws:iam::${self:custom.accountId}:role/HistoricalLambdaProfile 47 | 48 | Differ: 49 | handler: historical.{{cookiecutter.technology_slug}}.differ.handler 50 | description: Stream based function that is resposible for finding differences. 51 | tags: 52 | owner: {{cookiecutter.email}} 53 | role: arn:aws:iam::${self:custom.accountId}:role/HistoricalLambdaProfile 54 | events: 55 | - stream: 56 | type: dynamodb 57 | arn: 58 | Fn::GetAtt: 59 | - Historical{{cookiecutter.technology_slug | titlecase}}CurrentTable 60 | - StreamArn 61 | resources: 62 | Resources: 63 | # The Kinesis Stream -- Where the events will go: 64 | Historical{{cookiecutter.technology_slug | titlecase}}Stream: 65 | Type: AWS::Kinesis::Stream 66 | Properties: 67 | Name: Historical{{cookiecutter.technology_slug | titlecase}}Stream 68 | ShardCount: 1 69 | 70 | # The Kinesis Polling Stream -- Where the polling events will go: 71 | Historical{{cookiecutter.technology_slug | titlecase}}PollerStream: 72 | Type: AWS::Kinesis::Stream 73 | Properties: 74 | Name: Historical{{cookiecutter.technology_slug | titlecase}}PollerStream 75 | ShardCount: 1 76 | 77 | # The events -- these will be placed on the Kinesis stream: 78 | CloudWatchEventRule: 79 | Type: AWS::Events::Rule 80 | DependsOn: 81 | - Historical{{cookiecutter.technology_slug | titlecase}}Stream 82 | Properties: 83 | Description: EventRule forwarding security group changes. 84 | EventPattern: 85 | source: 86 | - aws.ec2 87 | detail-type: 88 | - AWS API Call via CloudTrail 89 | detail: 90 | eventSource: 91 | - ec2.amazonaws.com 92 | eventName: 93 | # TODO Update with your events 94 | State: ENABLED 95 | Targets: 96 | - 97 | Arn: 98 | Fn::GetAtt: 99 | - Historical{{cookiecutter.technology_slug | titlecase}}Stream 100 | - Arn 101 | Id: EventStream 102 | RoleArn: arn:aws:iam::${self:custom.accountId}:role/service-role/AwsEventsInvokeKinesis 103 | 104 | # The "Current" DynamoDB table: 105 | Historical{{cookiecutter.technology_slug | titlecase}}CurrentTable: 106 | Type: AWS::DynamoDB::Table 107 | Properties: 108 | TableName: Historical{{cookiecutter.technology_slug | titlecase}}CurrentTable 109 | TimeToLiveSpecification: 110 | AttributeName: ttl 111 | Enabled: true 112 | AttributeDefinitions: 113 | - AttributeName: arn 114 | AttributeType: S 115 | KeySchema: 116 | - AttributeName: arn 117 | KeyType: HASH 118 | ProvisionedThroughput: 119 | ReadCapacityUnits: 100 120 | WriteCapacityUnits: 100 121 | StreamSpecification: 122 | StreamViewType: NEW_AND_OLD_IMAGES 123 | 124 | # The Durable (Historical) change DynamoDB table: 125 | Historical{{cookiecutter.technology_slug | titlecase}}DurableTable: 126 | Type: AWS::DynamoDB::Table 127 | Properties: 128 | TableName: Historical{{cookiecutter.technology_slug | titlecase}}DurableTable 129 | AttributeDefinitions: 130 | - AttributeName: arn 131 | AttributeType: S 132 | - AttributeName: eventTime 133 | AttributeType: S 134 | KeySchema: 135 | - AttributeName: arn 136 | KeyType: HASH 137 | - AttributeName: eventTime 138 | KeyType: RANGE 139 | ProvisionedThroughput: 140 | ReadCapacityUnits: 100 141 | WriteCapacityUnits: 100 142 | StreamSpecification: 143 | StreamViewType: NEW_AND_OLD_IMAGES 144 | 145 | # Lambdas 146 | CollectorLambdaFunction: 147 | Type: AWS::Lambda::Function 148 | DependsOn: 149 | - Historical{{cookiecutter.technology_slug | titlecase}}Stream 150 | 151 | DifferLambdaFunction: 152 | Type: AWS::Lambda::Function 153 | DependsOn: 154 | - Historical{{cookiecutter.technology_slug | 
titlecase}}CurrentTable 155 | 156 | PollerScheduledRule: 157 | Type: AWS::Events::Rule 158 | Properties: 159 | Description: ScheduledRule 160 | ScheduleExpression: rate(60 minutes) 161 | State: ENABLED 162 | Targets: 163 | - 164 | Arn: 165 | Fn::GetAtt: 166 | - PollerLambdaFunction 167 | - Arn 168 | Id: TargetFunctionV1 169 | 170 | PermissionForEventsToInvokeLambda: 171 | Type: AWS::Lambda::Permission 172 | Properties: 173 | FunctionName: 174 | Ref: PollerLambdaFunction 175 | Action: lambda:InvokeFunction 176 | Principal: events.amazonaws.com 177 | SourceArn: 178 | Fn::GetAtt: 179 | - PollerScheduledRule 180 | - Arn 181 | 182 | # Log group -- 1 for each function... 183 | CollectorLogGroup: 184 | Properties: 185 | RetentionInDays: "3" 186 | 187 | PollerLogGroup: 188 | Properties: 189 | RetentionInDays: "3" 190 | 191 | DifferLogGroup: 192 | Properties: 193 | RetentionInDays: "3" 194 | 195 | # Outputs -- for use in other dependent Historical deployments: 196 | Outputs: 197 | Historical{{cookiecutter.technology_slug | titlecase}}StreamArn: 198 | Description: Historical Security Group Event Kinesis Stream ARN 199 | Value: 200 | Fn::GetAtt: 201 | - Historical{{cookiecutter.technology_slug | titlecase}}Stream 202 | - Arn 203 | Export: 204 | Name: Historical{{cookiecutter.technology_slug | titlecase}}StreamArn 205 | 206 | Historical{{cookiecutter.technology_slug | titlecase}}PollerStreamArn: 207 | Description: Historical Security Group Poller Kinesis Stream ARN 208 | Value: 209 | Fn::GetAtt: 210 | - Historical{{cookiecutter.technology_slug | titlecase}}PollerStream 211 | - Arn 212 | Export: 213 | Name: Historical{{cookiecutter.technology_slug | titlecase}}PollerStreamArn 214 | 215 | Historical{{cookiecutter.technology_slug | titlecase}}CurrentTableArn: 216 | Description: Historical Security Group Current DynamoDB Table ARN 217 | Value: 218 | Fn::GetAtt: 219 | - Historical{{cookiecutter.technology_slug | titlecase}}CurrentTable 220 | - Arn 221 | Export: 222 | Name: Historical{{cookiecutter.technology_slug | titlecase}}CurrentTableArn 223 | 224 | Historical{{cookiecutter.technology_slug | titlecase}}CurrentTableStreamArn: 225 | Description: Historical Security Group Current DynamoDB Table Stream ARN 226 | Value: 227 | Fn::GetAtt: 228 | - Historical{{cookiecutter.technology_slug | titlecase}}CurrentTable 229 | - StreamArn 230 | Export: 231 | Name: Historical{{cookiecutter.technology_slug | titlecase}}CurrentTableStreamArn 232 | 233 | Historical{{cookiecutter.technology_slug | titlecase}}DurableTableArn: 234 | Description: Historical Security Group Durable DynamoDB Table ARN 235 | Value: 236 | Fn::GetAtt: 237 | - Historical{{cookiecutter.technology_slug | titlecase}}DurableTable 238 | - Arn 239 | Export: 240 | Name: Historical{{cookiecutter.technology_slug | titlecase}}DurableTableArn 241 | 242 | Historical{{cookiecutter.technology_slug | titlecase}}DurableTableStreamArn: 243 | Description: Historical Security Group Durable DynamoDB Table Stream ARN 244 | Value: 245 | Fn::GetAtt: 246 | - Historical{{cookiecutter.technology_slug | titlecase}}DurableTable 247 | - StreamArn 248 | Export: 249 | Name: Historical{{cookiecutter.technology_slug | titlecase}}DurableTableStreamArn 250 | 251 | plugins: 252 | - serverless-python-requirements 253 | - serverless-prune-plugin 254 | -------------------------------------------------------------------------------- /historical/tests/factories.py: -------------------------------------------------------------------------------- 1 | # pylint: 
disable=R0205,E1101,C0103,W0622,W0613 2 | """ 3 | .. module: historical.tests.factories 4 | :platform: Unix 5 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 6 | :license: Apache, see LICENSE for more details. 7 | .. author:: Kevin Glisson 8 | .. author:: Mike Grima 9 | """ 10 | import datetime 11 | 12 | from boto3.dynamodb.types import TypeSerializer 13 | from factory import SubFactory, Factory, post_generation # pylint: disable=E0401 14 | from factory.fuzzy import FuzzyDateTime, FuzzyText # pylint: disable=E0401 15 | import pytz # pylint: disable=E0401 16 | 17 | SERIA = TypeSerializer() 18 | 19 | 20 | def serialize(obj): 21 | """JSON serializer for objects not serializable by default json code""" 22 | 23 | if isinstance(obj, datetime.datetime): 24 | serial = obj.replace(microsecond=0).replace(tzinfo=None).isoformat() + "Z" 25 | return serial 26 | 27 | if isinstance(obj, bytes): 28 | return obj.decode('utf-8') 29 | 30 | return obj.__dict__ 31 | 32 | 33 | class SessionIssuer(object): 34 | """Model for the Session Issuer in the CloudWatch Event""" 35 | 36 | def __init__(self, userName, type, arn, principalId, accountId): 37 | self.userName = userName 38 | self.type = type 39 | self.arn = arn 40 | self.principalId = principalId 41 | self.accountId = accountId 42 | 43 | 44 | class SessionIssuerFactory(Factory): 45 | """Generates the Session Issuer component of the CloudWatch Event""" 46 | 47 | class Meta: 48 | """Defines the Model""" 49 | 50 | model = SessionIssuer 51 | 52 | userName = FuzzyText() 53 | type = 'Role' 54 | arn = 'arn:aws:iam::123456789012:role/historical_poller' 55 | principalId = 'AROAIKELBS2RNWG7KASDF' 56 | accountId = '123456789012' 57 | 58 | 59 | class UserIdentity(object): 60 | """Model for the User Identity component of the CloudWatch Event""" 61 | 62 | def __init__(self, sessionContext, principalId, type): 63 | self.sessionContext = sessionContext 64 | self.principalId = principalId 65 | self.type = type 66 | 67 | 68 | class UserIdentityFactory(Factory): 69 | """Generates the User Identity component of the CloudWatch Event""" 70 | 71 | class Meta: 72 | """Defines the Model""" 73 | 74 | model = UserIdentity 75 | 76 | sessionContext = SubFactory(SessionIssuerFactory) 77 | principalId = 'AROAIKELBS2RNWG7KASDF:joe@example.com' 78 | type = 'Service' 79 | 80 | 81 | class SQSData(object): 82 | """Model for an SQS Event Message""" 83 | 84 | def __init__(self, messageId, receiptHandle, body): 85 | self.messageId = messageId 86 | self.receiptHandle = receiptHandle 87 | self.body = body 88 | self.eventSource = "aws:sqs" 89 | 90 | 91 | class SQSDataFactory(Factory): 92 | """Generates the SQS Event Message""" 93 | 94 | class Meta: 95 | """Defines the Model""" 96 | 97 | model = SQSData 98 | 99 | body = FuzzyText() 100 | messageId = FuzzyText() 101 | receiptHandle = FuzzyText() 102 | 103 | 104 | class SQSRecord(object): 105 | """Model for an individual SQS Event Record""" 106 | 107 | def __init__(self, sqs): 108 | self.sqs = sqs 109 | 110 | 111 | class Records(object): 112 | """Generic Model for multiple Records for an event source (DynamoDB, SQS, SNS, etc.)""" 113 | 114 | def __init__(self, records): 115 | self.Records = records 116 | 117 | 118 | class RecordsFactory(Factory): 119 | """Factory for generating multiple Event (SNS, CloudWatch, Kinesis, DynamoDB, SQS) records.""" 120 | 121 | class Meta: 122 | """Defines the Model""" 123 | 124 | model = Records 125 | 126 | @post_generation 127 | def Records(self, create, extracted, **kwargs): 128 | """Generates the Records""" 
129 | if not create: 130 | # Simple build, do nothing. 131 | return 132 | 133 | if extracted: 134 | # A list of groups were passed in, use them 135 | for record in extracted: 136 | self.Records.append(record) 137 | 138 | 139 | class DynamoDBData(object): 140 | """Model for the DynamoDB Stream data itself""" 141 | 142 | def __init__(self, NewImage, OldImage, Keys): 143 | self.OldImage = {k: SERIA.serialize(v) for k, v in OldImage.items()} 144 | self.NewImage = {k: SERIA.serialize(v) for k, v in NewImage.items()} 145 | self.Keys = {k: SERIA.serialize(v) for k, v in Keys.items()} 146 | 147 | 148 | class DynamoDBDataFactory(Factory): 149 | """DynamoDB Stream Data Component Model""" 150 | 151 | class Meta: 152 | """Defines the Model""" 153 | 154 | model = DynamoDBData 155 | 156 | NewImage = {} 157 | Keys = {} 158 | OldImage = {} 159 | 160 | 161 | class DynamoDBRecord(object): 162 | """DynamoDB Stream Model""" 163 | 164 | def __init__(self, dynamodb, eventName, userIdentity): 165 | self.dynamodb = dynamodb 166 | self.eventName = eventName 167 | self.userIdentity = userIdentity 168 | 169 | 170 | class DynamoDBRecordFactory(Factory): 171 | """Factory generating a DynamoDBRecord""" 172 | 173 | class Meta: 174 | """Defines the Model""" 175 | 176 | model = DynamoDBRecord 177 | 178 | dynamodb = SubFactory(DynamoDBDataFactory) 179 | eventName = 'INSERT' 180 | userIdentity = SubFactory(UserIdentityFactory) 181 | 182 | 183 | class DynamoDBRecordsFactory(Factory): 184 | """Factory to generate DynamoDB Stream Events""" 185 | 186 | class Meta: 187 | """Defines the Model""" 188 | 189 | model = Records 190 | 191 | @post_generation 192 | def Records(self, create, extracted, **kwargs): 193 | """Generates the proper records""" 194 | if not create: 195 | # Simple build, do nothing. 
196 | return 197 | 198 | if extracted: 199 | # A list of groups were passed in, use them 200 | for record in extracted: 201 | self.Records.append(record) 202 | 203 | 204 | class Event(object): 205 | """The base of the Event Model""" 206 | 207 | def __init__(self, account, region, time): 208 | self.account = account 209 | self.region = region 210 | self.time = time 211 | 212 | 213 | class EventFactory(Factory): 214 | """Parent class for all event factories.""" 215 | 216 | class Meta: 217 | """Defines the Model""" 218 | 219 | model = Event 220 | 221 | account = '123456789012' 222 | region = 'us-east-1' 223 | time = FuzzyDateTime(datetime.datetime.utcnow().replace(tzinfo=pytz.utc)) 224 | 225 | 226 | class Detail(object): 227 | """The CloudWatch Event `detail` Model""" 228 | 229 | # pylint: disable=W0622,R0902 230 | def __init__(self, eventTime, awsEventType, awsRegion, eventName, userIdentity, id, eventSource, 231 | requestParameters, responseElements, collected=None): 232 | self.eventTime = eventTime 233 | self.awsRegion = awsRegion 234 | self.awsEventType = awsEventType 235 | self.userIdentity = userIdentity 236 | self.id = id 237 | self.eventSource = eventSource 238 | self.requestParameters = requestParameters 239 | self.responseElements = responseElements 240 | self.eventName = eventName 241 | self.collected = collected 242 | 243 | 244 | class DetailFactory(Factory): 245 | """Factory for making the CloudWatch Event `detail` component""" 246 | 247 | class Meta: 248 | """Defines the Model""" 249 | 250 | model = Detail 251 | 252 | eventTime = FuzzyDateTime(datetime.datetime.utcnow().replace(tzinfo=pytz.utc, microsecond=0)) 253 | awsEventType = 'AwsApiCall' 254 | userIdentity = SubFactory(UserIdentityFactory) 255 | id = FuzzyText() 256 | eventName = '' 257 | requestParameters = dict() 258 | responseElements = dict() 259 | eventSource = 'aws.ec2' 260 | awsRegion = 'us-east-1' 261 | collected = None 262 | 263 | 264 | class CloudwatchEvent(Event): 265 | """The CloudWatch Event Model""" 266 | 267 | def __init__(self, detail, account, region, time): 268 | self.detail = detail 269 | super().__init__(account, region, time) 270 | 271 | 272 | class CloudwatchEventFactory(EventFactory): 273 | """Factory for generating CloudWatch Events""" 274 | 275 | class Meta: 276 | """Defines the Model""" 277 | 278 | model = CloudwatchEvent 279 | 280 | detail = SubFactory(DetailFactory) 281 | 282 | 283 | class HistoricalPollingEvent(Event): 284 | """Polling Event Model""" 285 | 286 | def __init__(self, detail, account, region, time): 287 | self.detail = detail 288 | super().__init__(account, region, time) 289 | 290 | 291 | class HistoricalPollingEventFactory(CloudwatchEventFactory): 292 | """Factory for generating historical polling events""" 293 | 294 | class Meta: 295 | """Defines the model""" 296 | 297 | model = HistoricalPollingEvent 298 | 299 | detail = SubFactory(DetailFactory) 300 | 301 | 302 | class SnsData: 303 | """SNS Event model""" 304 | 305 | def __init__(self, Message, EventSource, EventVersion, EventSubscriptionArn): 306 | self.Message = Message 307 | self.EventSource = EventSource 308 | self.EventVersion = EventVersion 309 | self.EventSubscriptionArn = EventSubscriptionArn 310 | 311 | 312 | class SnsDataFactory(Factory): 313 | """SNS Event Model Factory""" 314 | 315 | class Meta: 316 | """Defines the model""" 317 | 318 | model = SnsData 319 | 320 | Message = FuzzyText() 321 | EventVersion = FuzzyText() 322 | EventSource = "aws:sns" 323 | EventSubscriptionArn = FuzzyText() 324 | 
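# ---------------------------------------------------------------------------
# Illustrative usage sketch (editor's addition; not part of the original
# module). It shows how the factories above compose into a full fake
# CloudWatch event for tests; the event name chosen here is an arbitrary
# assumption. Guarded so it never runs on import.
# ---------------------------------------------------------------------------
if __name__ == '__main__':
    import json

    example_event = CloudwatchEventFactory(
        detail=DetailFactory(eventName='CreateSecurityGroup'),
    )

    # `serialize` (defined above) handles datetimes and nested factory objects:
    payload = json.loads(json.dumps(example_event, default=serialize))
    assert payload['detail']['eventName'] == 'CreateSecurityGroup'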
-------------------------------------------------------------------------------- /historical/tests/conftest.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=E0401,C0103 2 | """ 3 | .. module: historical.tests.test_s3 4 | :platform: Unix 5 | :copyright: (c) 2017 by Netflix Inc., see AUTHORS for more 6 | :license: Apache, see LICENSE for more details. 7 | .. author:: Kevin Glisson 8 | .. author:: Mike Grima 9 | """ 10 | import os 11 | 12 | import boto3 13 | from mock import patch 14 | from moto import mock_sqs 15 | from moto.dynamodb2 import mock_dynamodb2 16 | from moto.s3 import mock_s3 17 | from moto.iam import mock_iam 18 | from moto.sts import mock_sts 19 | from moto.ec2 import mock_ec2 20 | import pytest 21 | 22 | 23 | @pytest.fixture(scope='function') 24 | def s3(): 25 | """Mocked S3 Fixture.""" 26 | with mock_s3(): 27 | yield boto3.client('s3', region_name='us-east-1') 28 | 29 | 30 | @pytest.fixture(scope='function') 31 | def ec2(): 32 | """Mocked EC2 Fixture.""" 33 | with mock_ec2(): 34 | yield boto3.client('ec2', region_name='us-east-1') 35 | 36 | 37 | @pytest.fixture(scope='function') 38 | def sts(): 39 | """Mocked STS Fixture.""" 40 | with mock_sts(): 41 | yield boto3.client('sts', region_name='us-east-1') 42 | 43 | 44 | @pytest.fixture(scope='function') 45 | def iam(): 46 | """Mocked IAM Fixture.""" 47 | with mock_iam(): 48 | yield boto3.client('iam', region_name='us-east-1') 49 | 50 | 51 | @pytest.fixture(scope='function') 52 | def dynamodb(): 53 | """Mocked DynamoDB Fixture.""" 54 | with mock_dynamodb2(): 55 | yield boto3.client('dynamodb', region_name='us-east-1') 56 | 57 | 58 | # pylint: disable=W0621,W0613 59 | @pytest.fixture(scope='function') 60 | def retry(): 61 | """Mock the retry library so that it doesn't retry.""" 62 | def mock_retry_decorator(*args, **kwargs): 63 | def retry(func): 64 | return func 65 | return retry 66 | 67 | patch_retry = patch('retrying.retry', mock_retry_decorator) 68 | yield patch_retry.start() 69 | 70 | patch_retry.stop() 71 | 72 | 73 | @pytest.fixture(scope='function') 74 | def swag_accounts(s3, retry): 75 | """Create mocked SWAG Accounts.""" 76 | from swag_client.backend import SWAGManager 77 | from swag_client.util import parse_swag_config_options 78 | 79 | bucket_name = 'SWAG' 80 | data_file = 'accounts.json' 81 | region = 'us-east-1' 82 | owner = 'third-party' 83 | 84 | s3.create_bucket(Bucket=bucket_name) 85 | os.environ['SWAG_BUCKET'] = bucket_name 86 | os.environ['SWAG_DATA_FILE'] = data_file 87 | os.environ['SWAG_REGION'] = region 88 | os.environ['SWAG_OWNER'] = owner 89 | 90 | swag_opts = { 91 | 'swag.type': 's3', 92 | 'swag.bucket_name': bucket_name, 93 | 'swag.data_file': data_file, 94 | 'swag.region': region, 95 | 'swag.cache_expires': 0 96 | } 97 | 98 | swag = SWAGManager(**parse_swag_config_options(swag_opts)) 99 | 100 | account = { 101 | 'aliases': ['test'], 102 | 'contacts': ['admins@test.net'], 103 | 'description': 'LOL, Test account', 104 | 'email': 'testaccount@test.net', 105 | 'environment': 'test', 106 | 'id': '012345678910', 107 | 'name': 'testaccount', 108 | 'owner': 'third-party', 109 | 'provider': 'aws', 110 | 'sensitive': False, 111 | 'account_status': 'ready', 112 | 'services': [ 113 | { 114 | 'name': 'historical', 115 | 'status': [ 116 | { 117 | 'region': 'all', 118 | 'enabled': True 119 | } 120 | ] 121 | } 122 | ] 123 | } 124 | 125 | swag.create(account) 126 | 127 | 128 | @pytest.fixture(scope='function') 129 | def historical_role(iam, sts): 130 | """Create 
the mocked Historical IAM role that Historical Lambdas would need to assume to List and 131 | Collect details about a given technology in the target account. 132 | 133 | """ 134 | iam.create_role(RoleName='historicalrole', AssumeRolePolicyDocument='{}') 135 | os.environ['HISTORICAL_ROLE'] = 'historicalrole' 136 | 137 | 138 | @pytest.fixture(scope='function') 139 | def historical_sqs(): 140 | """Create the Mocked SQS queues that are used throughout Historical.""" 141 | with mock_sqs(): 142 | client = boto3.client('sqs', region_name='us-east-1') 143 | 144 | # Poller Tasker Queue: 145 | client.create_queue(QueueName='pollertaskerqueue') 146 | os.environ['POLLER_TASKER_QUEUE_NAME'] = 'pollertaskerqueue' 147 | 148 | # Poller Queue: 149 | client.create_queue(QueueName='pollerqueue') 150 | os.environ['POLLER_QUEUE_NAME'] = 'pollerqueue' 151 | 152 | # Event Queue: 153 | client.create_queue(QueueName='eventqueue') 154 | os.environ['EVENT_QUEUE_NAME'] = 'eventqueue' 155 | 156 | # Proxy Queue: 157 | client.create_queue(QueueName='proxyqueue') 158 | 159 | yield client 160 | 161 | 162 | @pytest.fixture(scope='function') 163 | def buckets(s3): 164 | """Create Testing S3 buckets for testing the S3 stack.""" 165 | # Create buckets: 166 | for i in range(0, 50): 167 | s3.create_bucket(Bucket=f'testbucket{i}') 168 | s3.put_bucket_tagging( 169 | Bucket=f'testbucket{i}', 170 | Tagging={ 171 | 'TagSet': [ 172 | { 173 | 'Key': 'theBucketName', 174 | 'Value': f'testbucket{i}' 175 | } 176 | ] 177 | } 178 | ) 179 | s3.put_bucket_lifecycle_configuration(Bucket=f'testbucket{i}', LifecycleConfiguration={ 180 | 'Rules': [ 181 | { 182 | 'Expiration': { 183 | 'Days': 5 184 | }, 185 | 'ID': 'string', 186 | 'Filter': { 187 | 'Prefix': 'string', 188 | 'Tag': { 189 | 'Key': 'string', 190 | 'Value': 'string' 191 | }, 192 | 'And': { 193 | 'Prefix': 'string', 194 | 'Tags': [ 195 | { 196 | 'Key': 'string', 197 | 'Value': 'string' 198 | }, 199 | ] 200 | } 201 | }, 202 | 'Status': 'Enabled', 203 | 'NoncurrentVersionTransitions': [ 204 | { 205 | 'NoncurrentDays': 123, 206 | 'StorageClass': 'GLACIER' 207 | }, 208 | ], 209 | 'NoncurrentVersionExpiration': { 210 | 'NoncurrentDays': 123 211 | } 212 | } 213 | ] 214 | }) 215 | 216 | 217 | @pytest.fixture(scope='function') 218 | def security_groups(ec2): 219 | """Creates security groups.""" 220 | sg = ec2.create_security_group( 221 | Description='test security group', 222 | GroupName='test', 223 | VpcId='vpc-test' 224 | ) 225 | 226 | # Tag it: 227 | ec2.create_tags(Resources=[sg['GroupId']], Tags=[ 228 | { 229 | "Key": "Some", 230 | "Value": "Value" 231 | }, 232 | { 233 | "Key": "Empty", 234 | "Value": "" 235 | } 236 | ]) 237 | 238 | yield sg 239 | 240 | 241 | @pytest.fixture(scope='function') 242 | def vpcs(ec2): 243 | """Creates vpcs.""" 244 | yield ec2.create_vpc( 245 | CidrBlock='192.168.1.1/32', 246 | AmazonProvidedIpv6CidrBlock=True, 247 | InstanceTenancy='default' 248 | )['Vpc'] 249 | 250 | 251 | @pytest.fixture(scope='function') 252 | def mock_lambda_environment(): 253 | """Mocks out the AWS Lambda environment context that AWS Lambda passes into the handler.""" 254 | os.environ['SENTRY_ENABLED'] = 'f' 255 | 256 | class MockedContext: 257 | """Class that Mocks out the Lambda `context` object.""" 258 | 259 | def get_remaining_time_in_millis(self): 260 | """Mocked method to return the remaining Lambda time in milliseconds.""" 261 | return 99999 262 | 263 | return MockedContext() 264 | 265 | 266 | @pytest.fixture(scope='function') 267 | def current_security_group_table(): 268 | 
"""Create the Current Security Group Table.""" 269 | from historical.security_group.models import CurrentSecurityGroupModel 270 | mock_dynamodb2().start() 271 | yield CurrentSecurityGroupModel.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 272 | mock_dynamodb2().stop() 273 | 274 | 275 | @pytest.fixture(scope='function') 276 | def durable_security_group_table(): 277 | """Create the Durable Security Group Table.""" 278 | from historical.security_group.models import DurableSecurityGroupModel 279 | mock_dynamodb2().start() 280 | yield DurableSecurityGroupModel.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 281 | mock_dynamodb2().stop() 282 | 283 | 284 | @pytest.fixture(scope='function') 285 | def current_vpc_table(): 286 | """Create the Current VPC Table.""" 287 | from historical.vpc.models import CurrentVPCModel 288 | mock_dynamodb2().start() 289 | yield CurrentVPCModel.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 290 | mock_dynamodb2().stop() 291 | 292 | 293 | @pytest.fixture(scope='function') 294 | def durable_vpc_table(): 295 | """Create the Durable VPC Table.""" 296 | from historical.vpc.models import DurableVPCModel 297 | mock_dynamodb2().start() 298 | yield DurableVPCModel.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 299 | mock_dynamodb2().stop() 300 | 301 | 302 | @pytest.fixture(scope='function') 303 | def current_s3_table(dynamodb): 304 | """Create the Current S3 Table.""" 305 | from historical.s3.models import CurrentS3Model 306 | yield CurrentS3Model.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 307 | 308 | 309 | @pytest.fixture(scope='function') 310 | def durable_s3_table(dynamodb): 311 | """Create the Durable S3 Table.""" 312 | from historical.s3.models import DurableS3Model 313 | yield DurableS3Model.create_table(read_capacity_units=1, write_capacity_units=1, wait=True) 314 | -------------------------------------------------------------------------------- /mkdocs/docs/architecture.md: -------------------------------------------------------------------------------- 1 | # Historical Architecture 2 | Historical is a serverless AWS application that consists of many components. 3 | 4 | Historical is written in Python 3 and heavily leverages AWS technologies such as Lambda, SNS, SQS, DynamoDB, CloudTrail, and CloudWatch. 5 | 6 | ## General Architectural Overview 7 | Here is a diagram of the Historical Architecture: 8 | 9 | 10 | **Please Note:** This stack is deployed _for every technology monitored_! There are many, many Historical stacks that will be deployed. 11 | 12 | ### Polling vs. Events 13 | Historical is *both* a polling and event driven system. It will periodically poll AWS accounts for changes. However, because Historical responds to events in the environment, polling doesn't need to be very aggressive and only happens once every few hours. 14 | 15 | Polling is necessary because events are not 100% reliable. This ensures that data is current just in case an event is dropped. 16 | 17 | Historical is *eventually consistent*, and makes a *best effort* to maintain a current and up-to-date inventory of AWS resources. 18 | 19 | ## Prerequisite Overview 20 | 21 | This is a high-level overview of the prerequisites that are required to make Historical operate. For more details on setting up the required prerequisites, please review the [installation documentation](../installation). 22 | 23 | 1. **ALL AWS accounts** accounts have CloudTrail enabled. 24 | 1. 
**ALL AWS accounts** and **ALL regions** in those accounts have a CloudWatch Event rule that captures ALL events and sends them over the CloudWatch Event Bus to the Historical account for processing. 25 | 1. IAM roles exist in **ALL** accounts and are assumable by the Historical Lambda functions. 26 | 1. Historical makes use of [SWAG](https://github.com/Netflix-Skunkworks/swag-client) to define which AWS accounts Historical is enabled for. While not a hard requirement, use of SWAG is _highly recommended_. 27 | 28 | ## Regions 29 | Historical has the concept of regions that fit into three categories: 30 | 31 | - Primary region 32 | - Secondary region(s) 33 | - Off region(s) 34 | 35 | The **Primary Region** is considered the "Base" of Historical. This region has all of the major components that make up Historical. This region processes all in-region AND off-region originating events. 36 | 37 | The **Off Region(s)** are regions you don't have a lot of infrastructure deployed in. However, you still want visibility in these regions should events happen there. These regions have only a minimal amount of Historical-related infrastructure deployed. These regions will forward ALL events to the Primary Region for processing. 38 | 39 | The **Secondary Region(s)** are regions that are important to you. Secondary regions look like the primary region and process in-region events. If you have a lot of infrastructure within a region, you should place a Historical stack there. This will allow you to quickly receive and process events, and also gives your applications a regionally-local means of accessing Historical data. 40 | 41 | **Note:** Place a Historical off-region stack in any region that is not Primary or Secondary. This will ensure full visibility in your environment. 42 | 43 | ## Component Overview 44 | This section describes some of the high-level architectural components. 45 | 46 | ### Primary Components 47 | Below are the primary components of the Historical architecture: 48 | 49 | 1. CloudWatch Event Rules 50 | 1. CloudWatch Change Events 51 | 1. Poller 52 | 1. Collector 53 | 1. Current Table 54 | 1. DynamoDB Stream Proxy 55 | 1. Differ 56 | 1. Durable Table 57 | 1. Off-region SNS forwarders 58 | 59 | As a general overview, the infrastructure is an event-processing and enrichment pipeline. An event arrives, gets enriched with additional information, and notifications about the change are then provided to downstream subscribers. 60 | 61 | SQS queues are used in as many places as possible to invoke Lambda functions. SQS makes it easy to provide Lambda execution concurrency, auto-scaling, retry of failures without blocking, and dead-letter queuing capabilities. 62 | 63 | SNS topics are used to make it easy for any number (_N_) of interested parties to subscribe to the Historical DynamoDB table changes. Presently, this is only attached to the Durable table. More details on this below. 64 | 65 | ### CloudWatch Event Rules 66 | There are two different CloudWatch Event Rules: 67 | 68 | 1. Timed Events 69 | 1. Change Events 70 | 71 | Timed events are used to kick off the Poller. See the section on the Poller below for additional details. Change events are events that arrive from CloudWatch Events when an AWS resource's configuration changes. 72 | 73 | ### Poller 74 | The Poller's primary function is to obtain a full inventory of AWS resources. 75 | 76 | The Poller is split into two parts: 77 | 78 | 1. Poller Tasker 79 | 1. Poller 80 | 81 | The "Poller Tasker" is a Lambda function that iterates over all AWS accounts Historical is configured for, and tasks the Poller to *list* all resources in the given environment. 82 | 83 | The Poller Tasker in the *PRIMARY REGION* tasks the Poller to list resources that reside in the primary region and all off-regions. A Poller Tasker in a *SECONDARY REGION* will only task the Poller to list resources that reside in that same region. 84 | 85 | The Poller *lists* all resources in a given account/region, and tasks a "Poller Collector" to fetch details about the resource in question. 86 |
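To make the fan-out concrete, here is a minimal, hypothetical sketch of the tasking pattern described above. It is not the actual Historical implementation; the account list, message shape, and helper names are simplified assumptions (the real code resolves accounts via SWAG and uses Historical's own SQS utilities). It simply shows the Poller Tasker enqueuing one polling task per account and region onto the Poller's SQS queue.

```python
"""Hypothetical sketch of a Poller Tasker fan-out; not the real Historical code."""
import json
import os

import boto3


def task_pollers(accounts, regions):
    """Send one polling task per account/region pair to the Poller's SQS queue.

    The account and region lists are assumed inputs; POLLER_QUEUE_NAME is one of
    Historical's documented environment variables.
    """
    sqs = boto3.client('sqs')
    queue_url = sqs.get_queue_url(QueueName=os.environ['POLLER_QUEUE_NAME'])['QueueUrl']

    for account in accounts:
        for region in regions:
            # Each message tells a Poller which account/region to list resources in.
            sqs.send_message(
                QueueUrl=queue_url,
                MessageBody=json.dumps({'account_id': account, 'region': region}),
            )


if __name__ == '__main__':
    task_pollers(['123456789012'], ['us-east-1', 'us-west-2'])
```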
87 | ### Collector 88 | The Collector describes a given AWS resource and stores its configuration in the "Current" DynamoDB table. The Collector is split into two parts (same code, different invocation mechanisms): 89 | 90 | 1. Poller Collector 91 | 1. Event Collector 92 | 93 | The Poller Collector only responds to polling events. The Event Collector only responds to CloudWatch change events. 94 | 95 | The Collector is split into two parts to prevent change events from being sandwiched between polling events. Historical gives priority to change events over polling events to ensure the timeliness of resource configuration changes. 96 | 97 | In both cases, the Collector will go to the AWS account and region that the item resides in, and use `boto3` to describe the configuration of the resource. 98 | 99 | ### Current Table 100 | The "Current" table is a global DynamoDB table that stores the current configuration of a given resource in 101 | AWS. 102 | 103 | This acts as a cache for the current state of the environment. 104 | 105 | The Current table has a DynamoDB Stream that kicks off a DynamoDB Stream Proxy, which then invokes the Differ. 106 | 107 | #### Special Note: 108 | The Current table has a TTL set on all items. This TTL is updated any time a change event arrives, or when the Poller runs. The TTL is set to clean up orphaned items, which can happen if a deletion event is lost. Deleted items will not be picked up by the Poller (which only lists items that exist in the account) and thus will be removed from the Current table on TTL expiration. As a result, the Poller must "see" a resource at least once every few hours, or it will be deemed deleted from the environment. 109 | 110 | ### DynamoDB Stream Proxy 111 | The DynamoDB Stream Proxy is a Lambda function that proxies DynamoDB Stream events to SNS or SQS. The purpose is to task subsequent Lambda functions with the specific changes that happen to the DynamoDB table. 112 | 113 | The Historical infrastructure has two configurations for the DynamoDB Proxy: 114 | 115 | 1. Current Table Forwarder (DynamoDB Stream Proxy to Differ SQS) 116 | 1. Durable Table Forwarder (DynamoDB Stream Proxy to Change Notification SNS) 117 | 118 | The Current Table Forwarder proxies events to the SQS queue that invokes the Differ Lambda function. 119 | 120 | The Durable Table Forwarder proxies events to an SNS topic that can be subscribed to. SNS enables *N* subscribers to receive Historical events. The Durable table proxy serializes the DynamoDB Stream events into an easily consumable JSON document that contains the full and complete configuration of the resource in question, along with the CloudTrail context. This enables downstream applications to make intelligent decisions about the changes that occur, as they have the full and complete context of the resource and the changes made to it.
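As an illustration of the proxy concept only (the production proxy also handles batching, `PROXY_REGIONS` filtering, and oversized items), a stripped-down DynamoDB Stream handler that forwards records to SNS might look like the sketch below. The `TOPIC_ARN` environment variable and the message fields are assumptions made for this example.

```python
"""Hypothetical sketch of a DynamoDB Stream Proxy handler; not the real Historical code."""
import json
import os

import boto3

SNS = boto3.client('sns')


def handler(event, context):
    """Forward DynamoDB Stream records to an SNS topic for downstream subscribers."""
    for record in event.get('Records', []):
        # Prefer the new image; fall back to the old image for REMOVE events.
        image = record['dynamodb'].get('NewImage') or record['dynamodb'].get('OldImage', {})
        message = {
            'eventName': record['eventName'],   # INSERT / MODIFY / REMOVE
            'item': image,                      # still in DynamoDB's typed JSON form
        }
        SNS.publish(
            TopicArn=os.environ['TOPIC_ARN'],
            Message=json.dumps(message),
        )
```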
121 | 122 | #### Special Note: 123 | DynamoDB Streams in Global DynamoDB tables invoke this Lambda whenever a DynamoDB update occurs in ANY of the regions the table is configured to sync with. For the Current table, this can result in Historical Lambda functions _"stepping on each other's toes"_ (this is not a concern for Durable table changes). To avoid this, the Current table DynamoDB Stream Proxy has a `PROXY_REGIONS` environment variable that is configured to only proxy DynamoDB Stream updates for resources that reside in the specified regions. The *PRIMARY REGION* must be configured to proxy events that occur in the primary region and all off-regions. The *SECONDARY REGION(S)* must be configured to proxy only events that occur in the same region. 124 | 125 | #### Another Special Note: 126 | DynamoDB items are capped at 400KB. SNS and SQS have maximum message sizes of 256KB. Logic exists to handle cases where DynamoDB items are too big to send to SNS/SQS. Follow-up Lambdas and subscribers will need to make use of the Historical code to fetch the full configuration of the item from either the Current or Durable table (depending on the use case). Enhancements will be made in the future to make the data easier to consume in these (rare) circumstances. 127 | 128 | ### Differ 129 | The Differ is a Lambda function that gets invoked upon changes to the Current table. The DynamoDB stream provides the Differ (via the Proxy) with the current state of the resource that changed. The Differ checks whether the resource in question has had an effective change. If so, the Differ saves a new change record to the Durable table to maintain the history of the resource as it changes over time, and also saves the CloudTrail context. 130 | 131 | ### Durable Table 132 | The "Durable" table is a Global DynamoDB table that stores a resource's configuration along with its change history. 133 | 134 | The Durable table has a DynamoDB Stream that invokes another DynamoDB Stream Proxy. This is used to notify downstream subscribers of the effective changes that occur to the environment. 135 | 136 | ### Off-Region SNS Forwarders 137 | Very little infrastructure is intentionally deployed in the off-regions. This helps to reduce the cost and complexity of the Historical infrastructure. 138 | 139 | The off-region SNS forwarders are SNS topics that receive CloudWatch events for resource changes that occur in the off-regions. These topics forward events to the Event Collector SQS queue in the primary region for processing. 140 | 141 | ## Special Stacks 142 | Some resource types have different stack configurations due to nuances of the resource type. 143 | 144 | The following resource types have different stack types: 145 | 146 | - S3 147 | - IAM (Coming Soon!) 148 | 149 | ### S3 150 | The AWS S3 stack is almost identical to the standard stack. The difference is due to AWS S3 buckets having a globally unique namespace. 151 | 152 | Because it is not presently possible to poll for only in-region S3 buckets, the S3 poller lives in the primary region only. The poller in the primary region polls for all S3 buckets in all regions. 153 | 154 | The secondary regions will still respond to in-region events, but lack all polling components. 155 | 156 | This diagram showcases the S3 stack. 157 | 158 | ### IAM 159 | This is coming soon! 160 | 161 | ## Installation & Configuration 162 | 163 | Please refer to the [installation docs](../installation) for additional details.
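Finally, to tie the flow together, below is a minimal, hypothetical sketch of a downstream consumer of the Durable table change notifications (SNS fanned out to an SQS-triggered Lambda). The field names inside the message are assumptions based on the description above, not a documented schema; real consumers should use the Historical models to fetch the full configuration whenever an item exceeds the SNS/SQS size limits.

```python
"""Hypothetical downstream consumer of Historical change notifications; illustrative only."""
import json


def handler(event, context):
    """Handle SQS records that carry SNS-delivered Durable table change events."""
    for record in event.get('Records', []):
        envelope = json.loads(record['body'])      # The SQS body contains the SNS envelope.
        change = json.loads(envelope['Message'])   # The SNS Message contains the change event.

        arn = change.get('arn')
        event_name = change.get('eventName')
        print(f'{event_name} detected for {arn}')

        # NOTE: if the item was too large for SNS/SQS (256KB limit), the full
        # configuration will not be inline; fetch it from the Current or Durable
        # table using the Historical models instead.
```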
164 | --------------------------------------------------------------------------------