├── requirements-dev.txt ├── requirements.txt ├── assets ├── iceberg-table.png ├── iceberg-data-level-01.png ├── iceberg-data-level-02.png └── iceberg-data-level-03.png ├── cdk_stacks ├── __init__.py ├── s3.py ├── kds.py ├── glue_stream_data_schema.py ├── lakeformation_permissions.py ├── glue_streaming_job.py └── glue_job_role.py ├── CODE_OF_CONDUCT.md ├── source.bat ├── LICENSE ├── app.py ├── cdk.json ├── .gitignore ├── src ├── utils │ └── gen_fake_kinesis_stream_data.py └── main │ └── python │ ├── spark_iceberg_writes_with_dataframe.py │ ├── spark_iceberg_writes_with_sql_insert_overwrite.py │ └── spark_iceberg_writes_with_sql_merge_into.py ├── CONTRIBUTING.md ├── README.md └── glue-streaming-data-to-iceberg-table.svg /requirements-dev.txt: -------------------------------------------------------------------------------- 1 | boto3>=1.24.41 2 | mimesis==6.0.0 3 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aws-cdk-lib==2.59.0 2 | constructs>=10.0.0,<11.0.0 3 | -------------------------------------------------------------------------------- /assets/iceberg-table.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-glue-streaming-etl-with-apache-iceberg/HEAD/assets/iceberg-table.png -------------------------------------------------------------------------------- /assets/iceberg-data-level-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-glue-streaming-etl-with-apache-iceberg/HEAD/assets/iceberg-data-level-01.png -------------------------------------------------------------------------------- /assets/iceberg-data-level-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-glue-streaming-etl-with-apache-iceberg/HEAD/assets/iceberg-data-level-02.png -------------------------------------------------------------------------------- /assets/iceberg-data-level-03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-glue-streaming-etl-with-apache-iceberg/HEAD/assets/iceberg-data-level-03.png -------------------------------------------------------------------------------- /cdk_stacks/__init__.py: -------------------------------------------------------------------------------- 1 | from .kds import KdsStack 2 | from .glue_job_role import GlueJobRoleStack 3 | from .glue_stream_data_schema import GlueStreamDataSchemaStack 4 | from .glue_streaming_job import GlueStreamingJobStack 5 | from .lakeformation_permissions import DataLakePermissionsStack 6 | from .s3 import S3BucketStack 7 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 
5 | -------------------------------------------------------------------------------- /source.bat: -------------------------------------------------------------------------------- 1 | @echo off 2 | 3 | rem The sole purpose of this script is to make the command 4 | rem 5 | rem source .venv/bin/activate 6 | rem 7 | rem (which activates a Python virtualenv on Linux or Mac OS X) work on Windows. 8 | rem On Windows, this command just runs this batch file (the argument is ignored). 9 | rem 10 | rem Now we don't need to document a Windows command for activating a virtualenv. 11 | 12 | echo Executing .venv\Scripts\activate.bat for you 13 | .venv\Scripts\activate.bat 14 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
15 | 16 | -------------------------------------------------------------------------------- /cdk_stacks/s3.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | from urllib.parse import urlparse 6 | 7 | import aws_cdk as cdk 8 | 9 | from aws_cdk import ( 10 | Stack, 11 | aws_s3 as s3 12 | ) 13 | 14 | from constructs import Construct 15 | 16 | 17 | class S3BucketStack(Stack): 18 | 19 | def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None: 20 | super().__init__(scope, construct_id, **kwargs) 21 | 22 | glue_job_input_arguments = self.node.try_get_context('glue_job_input_arguments') 23 | s3_path = glue_job_input_arguments["--iceberg_s3_path"] 24 | s3_bucket_name = urlparse(s3_path).netloc 25 | 26 | s3_bucket = s3.Bucket(self, "s3bucket", 27 | removal_policy=cdk.RemovalPolicy.DESTROY, #XXX: Default: cdk.RemovalPolicy.RETAIN - The bucket will be orphaned 28 | bucket_name=s3_bucket_name) 29 | 30 | self.s3_bucket_name = s3_bucket.bucket_name 31 | 32 | cdk.CfnOutput(self, f'{self.stack_name}_S3Bucket', value=self.s3_bucket_name) 33 | -------------------------------------------------------------------------------- /cdk_stacks/kds.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import random 6 | import string 7 | 8 | import aws_cdk as cdk 9 | 10 | from aws_cdk import ( 11 | Duration, 12 | Stack, 13 | aws_kinesis, 14 | ) 15 | from constructs import Construct 16 | 17 | random.seed(23) 18 | 19 | 20 | class KdsStack(Stack): 21 | 22 | def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None: 23 | super().__init__(scope, construct_id, **kwargs) 24 | 25 | KINESIS_DEFAULT_STREAM_NAME = 'PUT-{}'.format(''.join(random.sample((string.ascii_letters), k=5))) 26 | KINESIS_STREAM_NAME = self.node.try_get_context('kinesis_stream_name') or KINESIS_DEFAULT_STREAM_NAME 27 | 28 | source_kinesis_stream = aws_kinesis.Stream(self, "SourceKinesisStreams", 29 | retention_period=Duration.hours(24), 30 | stream_mode=aws_kinesis.StreamMode.ON_DEMAND, 31 | stream_name=KINESIS_STREAM_NAME) 32 | 33 | self.kinesis_stream = source_kinesis_stream 34 | 35 | cdk.CfnOutput(self, f'{self.stack_name}_KinesisDataStreamName', value=self.kinesis_stream.stream_name) 36 | 37 | -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import os 3 | 4 | import aws_cdk as cdk 5 | 6 | from cdk_stacks import ( 7 | KdsStack, 8 | GlueJobRoleStack, 9 | GlueStreamDataSchemaStack, 10 | GlueStreamingJobStack, 11 | DataLakePermissionsStack, 12 | S3BucketStack 13 | ) 14 | 15 | APP_ENV = cdk.Environment(account=os.getenv('CDK_DEFAULT_ACCOUNT'), 16 | region=os.getenv('CDK_DEFAULT_REGION')) 17 | 18 | app = cdk.App() 19 | 20 | s3_bucket = S3BucketStack(app, 'IcebergS3Path') 21 | 22 | kds_stack = KdsStack(app, 'KinesisStreamAsGlueStreamingJobDataSource') 23 | kds_stack.add_dependency(s3_bucket) 24 | 25 | glue_job_role = GlueJobRoleStack(app, 'GlueStreamingSinkToIcebergJobRole') 26 | glue_job_role.add_dependency(kds_stack) 27 | 28 | glue_stream_schema = GlueStreamDataSchemaStack(app, 'GlueSchemaOnKinesisStream', 29 | kds_stack.kinesis_stream 30 | ) 31 | 
glue_stream_schema.add_dependency(kds_stack) 32 | 33 | grant_lake_formation_permissions = DataLakePermissionsStack(app, 'GrantLFPermissionsOnGlueJobRole', 34 | glue_job_role.glue_job_role 35 | ) 36 | grant_lake_formation_permissions.add_dependency(glue_job_role) 37 | grant_lake_formation_permissions.add_dependency(glue_stream_schema) 38 | 39 | glue_streaming_job = GlueStreamingJobStack(app, 'GlueStreamingSinkToIceberg', 40 | glue_job_role.glue_job_role, 41 | kds_stack.kinesis_stream 42 | ) 43 | glue_streaming_job.add_dependency(grant_lake_formation_permissions) 44 | 45 | app.synth() 46 | -------------------------------------------------------------------------------- /cdk.json: -------------------------------------------------------------------------------- 1 | { 2 | "app": "python3 app.py", 3 | "watch": { 4 | "include": [ 5 | "**" 6 | ], 7 | "exclude": [ 8 | "README.md", 9 | "cdk*.json", 10 | "requirements*.txt", 11 | "source.bat", 12 | "**/__init__.py", 13 | "python/__pycache__", 14 | "tests" 15 | ] 16 | }, 17 | "context": { 18 | "@aws-cdk/aws-lambda:recognizeLayerVersion": true, 19 | "@aws-cdk/core:checkSecretUsage": true, 20 | "@aws-cdk/core:target-partitions": [ 21 | "aws", 22 | "aws-cn" 23 | ], 24 | "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true, 25 | "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true, 26 | "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true, 27 | "@aws-cdk/aws-iam:minimizePolicies": true, 28 | "@aws-cdk/core:validateSnapshotRemovalPolicy": true, 29 | "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true, 30 | "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true, 31 | "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true, 32 | "@aws-cdk/aws-apigateway:disableCloudWatchRole": true, 33 | "@aws-cdk/core:enablePartitionLiterals": true, 34 | "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true, 35 | "@aws-cdk/aws-iam:standardizedServicePrincipals": true, 36 | "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true, 37 | "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true 38 | } 39 | } 40 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | Untitled*.ipynb 75 | 76 | # pyenv 77 | .python-version 78 | 79 | # celery beat schedule file 80 | celerybeat-schedule 81 | 82 | # SageMath parsed files 83 | *.sage.py 84 | 85 | # Environments 86 | .env 87 | .venv 88 | env/ 89 | venv/ 90 | ENV/ 91 | env.bak/ 92 | venv.bak/ 93 | 94 | # Spyder project settings 95 | .spyderproject 96 | .spyproject 97 | 98 | # Rope project settings 99 | .ropeproject 100 | 101 | # mkdocs documentation 102 | /site 103 | 104 | # mypy 105 | .mypy_cache/ 106 | 107 | .DS_Store 108 | .idea/ 109 | bin/ 110 | lib64 111 | pyvenv.cfg 112 | *.bak 113 | share/ 114 | cdk.out/ 115 | cdk.context.json* 116 | zap/ 117 | 118 | */.gitignore 119 | */setup.py 120 | */source.bat 121 | 122 | */*/.gitignore 123 | */*/setup.py 124 | */*/source.bat 125 | 126 | -------------------------------------------------------------------------------- /cdk_stacks/glue_stream_data_schema.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import aws_cdk as cdk 6 | 7 | from aws_cdk import ( 8 | Stack, 9 | aws_glue 10 | ) 11 | from constructs import Construct 12 | 13 | 14 | class GlueStreamDataSchemaStack(Stack): 15 | 16 | def __init__(self, scope: Construct, construct_id: str, kinesis_stream, **kwargs) -> None: 17 | super().__init__(scope, construct_id, **kwargs) 18 | 19 | glue_kinesis_table = self.node.try_get_context('glue_kinesis_table') 20 | database_name = glue_kinesis_table['database_name'] 21 | table_name = glue_kinesis_table['table_name'] 22 | columns = glue_kinesis_table.get('columns', []) 23 | 24 | cfn_database = aws_glue.CfnDatabase(self, "GlueCfnDatabase", 25 | catalog_id=cdk.Aws.ACCOUNT_ID, 26 | database_input=aws_glue.CfnDatabase.DatabaseInputProperty( 27 | name=database_name 28 | ) 29 | ) 30 | cfn_database.apply_removal_policy(cdk.RemovalPolicy.DESTROY) 31 | 32 | cfn_table = aws_glue.CfnTable(self, "GlueCfnTable", 33 | catalog_id=cdk.Aws.ACCOUNT_ID, 34 | database_name=database_name, 35 | table_input=aws_glue.CfnTable.TableInputProperty( 36 | name=table_name, 37 | parameters={"classification": "json"}, 38 | storage_descriptor=aws_glue.CfnTable.StorageDescriptorProperty( 39 | columns=columns, 40 | input_format="org.apache.hadoop.mapred.TextInputFormat", 41 | location=kinesis_stream.stream_name, 42 | output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", 43 | parameters={ 44 | "streamARN": kinesis_stream.stream_arn, 45 | "typeOfData": "kinesis" 46 | }, 47 | serde_info=aws_glue.CfnTable.SerdeInfoProperty( 48 | serialization_library="org.openx.data.jsonserde.JsonSerDe" 49 | ) 50 | ), 51 | table_type="EXTERNAL_TABLE" 52 | ) 53 | ) 54 | 55 | cfn_table.add_dependency(cfn_database) 56 | cfn_table.apply_removal_policy(cdk.RemovalPolicy.DESTROY) 57 | 58 | 
cdk.CfnOutput(self, f'{self.stack_name}_GlueDatabaseName', value=cfn_table.database_name) 59 | -------------------------------------------------------------------------------- /src/utils/gen_fake_kinesis_stream_data.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import sys 6 | import argparse 7 | from datetime import datetime 8 | import json 9 | import random 10 | import time 11 | 12 | import boto3 13 | from mimesis.locales import Locale 14 | from mimesis.schema import Field, Schema 15 | 16 | 17 | def main(): 18 | parser = argparse.ArgumentParser() 19 | 20 | parser.add_argument('--region-name', action='store', default='us-east-1', 21 | help='aws region name (default: us-east-1)') 22 | parser.add_argument('--stream-name', help='The name of the stream to put the data record into') 23 | parser.add_argument('--max-count', default=10, type=int, help='The max number of records to put (default: 10)') 24 | parser.add_argument('--dry-run', action='store_true') 25 | parser.add_argument('--console', action='store_true', help='Print out records ingested into the stream') 26 | 27 | options = parser.parse_args() 28 | 29 | _CURRENT_YEAR = datetime.now().year 30 | _NAMES = 'Arica,Burton,Cory,Fernando,Gonzalo,Kenton,Linsey,Micheal,Ricky,Takisha'.split(',') 31 | 32 | #XXX: For more information about synthetic data schema, see 33 | # https://github.com/aws-samples/aws-glue-streaming-etl-blog/blob/master/config/generate_data.py 34 | _ = Field(locale=Locale.EN) 35 | 36 | _schema = Schema(schema=lambda: { 37 | # "name": _("first_name"), 38 | "name": _("choice", items=_NAMES), 39 | "age": _("age"), 40 | "m_time": _("formatted_datetime", fmt="%Y-%m-%d %H:%M:%S", start=_CURRENT_YEAR, end=_CURRENT_YEAR) 41 | }) 42 | 43 | if not options.dry_run: 44 | kinesis_streams_client = boto3.client('kinesis', region_name=options.region_name) 45 | 46 | cnt = 0 47 | for record in _schema.iterator(options.max_count): 48 | cnt += 1 49 | 50 | if options.dry_run: 51 | print(f"{json.dumps(record)}") 52 | else: 53 | res = kinesis_streams_client.put_record( 54 | StreamName=options.stream_name, 55 | Data=f"{json.dumps(record)}\n", # convert JSON to JSON Line 56 | PartitionKey=f"{record['name']}" 57 | ) 58 | 59 | if options.console: 60 | print(f"{json.dumps(record)}") 61 | 62 | if cnt % 100 == 0: 63 | print(f'[INFO] {cnt} records are processed', file=sys.stderr) 64 | 65 | if res['ResponseMetadata']['HTTPStatusCode'] != 200: 66 | print(res, file=sys.stderr) 67 | time.sleep(random.choices([0.01, 0.03, 0.05, 0.07, 0.1])[-1]) 68 | print(f'[INFO] Total {cnt} records are processed', file=sys.stderr) 69 | 70 | 71 | if __name__ == '__main__': 72 | main() 73 | -------------------------------------------------------------------------------- /cdk_stacks/lakeformation_permissions.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import aws_cdk as cdk 6 | 7 | from aws_cdk import ( 8 | Stack, 9 | aws_lakeformation 10 | ) 11 | from constructs import Construct 12 | 13 | 14 | class DataLakePermissionsStack(Stack): 15 | 16 | def __init__(self, scope: Construct, construct_id: str, glue_job_role, **kwargs) -> None: 17 | super().__init__(scope, construct_id, **kwargs) 18 | 19 | glue_job_input_arguments = 
self.node.try_get_context('glue_kinesis_table') 20 | database_name = glue_job_input_arguments["database_name"] 21 | 22 | #XXXX: The role assumed by cdk is not a data lake administrator. 23 | # So, deploying PrincipalPermissions meets the error such as: 24 | # "Resource does not exist or requester is not authorized to access requested permissions." 25 | # In order to solve the error, it is necessary to promote the cdk execution role to the data lake administrator. 26 | # For example, https://github.com/aws-samples/data-lake-as-code/blob/mainline/lib/stacks/datalake-stack.ts#L68 27 | cfn_data_lake_settings = aws_lakeformation.CfnDataLakeSettings(self, "CfnDataLakeSettings", 28 | admins=[aws_lakeformation.CfnDataLakeSettings.DataLakePrincipalProperty( 29 | data_lake_principal_identifier=cdk.Fn.sub(self.synthesizer.cloud_formation_execution_role_arn) 30 | )] 31 | ) 32 | 33 | cfn_principal_permissions = aws_lakeformation.CfnPrincipalPermissions(self, "CfnPrincipalPermissions", 34 | permissions=["SELECT", "INSERT", "DELETE", "DESCRIBE", "ALTER"], 35 | permissions_with_grant_option=[], 36 | principal=aws_lakeformation.CfnPrincipalPermissions.DataLakePrincipalProperty( 37 | data_lake_principal_identifier=glue_job_role.role_arn 38 | ), 39 | resource=aws_lakeformation.CfnPrincipalPermissions.ResourceProperty( 40 | #XXX: Can't specify a TableWithColumns resource and a Table resource 41 | table=aws_lakeformation.CfnPrincipalPermissions.TableResourceProperty( 42 | catalog_id=cdk.Aws.ACCOUNT_ID, 43 | database_name=database_name, 44 | # name="ALL_TABLES", 45 | table_wildcard={} 46 | ) 47 | ) 48 | ) 49 | cfn_principal_permissions.apply_removal_policy(cdk.RemovalPolicy.DESTROY) 50 | 51 | #XXX: In order to keep resource destruction order, 52 | # set dependency between CfnDataLakeSettings and CfnPrincipalPermissions 53 | cfn_principal_permissions.add_dependency(cfn_data_lake_settings) 54 | 55 | cdk.CfnOutput(self, f'{self.stack_name}_Principal', 56 | value=cfn_principal_permissions.attr_principal_identifier) 57 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. 
You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 
60 | -------------------------------------------------------------------------------- /cdk_stacks/glue_streaming_job.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import aws_cdk as cdk 6 | 7 | from aws_cdk import ( 8 | Stack, 9 | aws_glue, 10 | aws_s3 as s3, 11 | ) 12 | from constructs import Construct 13 | 14 | 15 | class GlueStreamingJobStack(Stack): 16 | 17 | def __init__(self, scope: Construct, construct_id: str, glue_job_role, kinesis_stream, **kwargs) -> None: 18 | super().__init__(scope, construct_id, **kwargs) 19 | 20 | glue_assets_s3_bucket_name = self.node.try_get_context('glue_assets_s3_bucket_name') 21 | glue_job_script_file_name = self.node.try_get_context('glue_job_script_file_name') 22 | glue_job_input_arguments = self.node.try_get_context('glue_job_input_arguments') 23 | 24 | glue_job_default_arguments = { 25 | "--kinesis_stream_arn": kinesis_stream.stream_arn, 26 | "--enable-metrics": "true", 27 | "--enable-spark-ui": "true", 28 | "--spark-event-logs-path": f"s3://{glue_assets_s3_bucket_name}/sparkHistoryLogs/", 29 | "--enable-job-insights": "false", 30 | "--enable-glue-datacatalog": "true", 31 | "--enable-continuous-cloudwatch-log": "true", 32 | "--job-bookmark-option": "job-bookmark-disable", 33 | "--job-language": "python", 34 | "--TempDir": f"s3://{glue_assets_s3_bucket_name}/temporary/" 35 | } 36 | 37 | glue_job_default_arguments.update(glue_job_input_arguments) 38 | 39 | glue_job_name = self.node.try_get_context('glue_job_name') 40 | 41 | glue_connections_name = self.node.try_get_context('glue_connections_name') 42 | 43 | glue_cfn_job = aws_glue.CfnJob(self, "GlueStreamingETLJob", 44 | command=aws_glue.CfnJob.JobCommandProperty( 45 | name="gluestreaming", 46 | python_version="3", 47 | script_location="s3://{glue_assets}/scripts/{glue_job_script_file_name}".format( 48 | glue_assets=glue_assets_s3_bucket_name, 49 | glue_job_script_file_name=glue_job_script_file_name 50 | ) 51 | ), 52 | role=glue_job_role.role_arn, 53 | 54 | #XXX: Set only AllocatedCapacity or MaxCapacity 55 | # Do not set Allocated Capacity if using Worker Type and Number of Workers 56 | # allocated_capacity=2, 57 | connections=aws_glue.CfnJob.ConnectionsListProperty( 58 | connections=[glue_connections_name] 59 | ), 60 | default_arguments=glue_job_default_arguments, 61 | description="This job loads the data from Kinesis Data Streams to S3.", 62 | execution_property=aws_glue.CfnJob.ExecutionPropertyProperty( 63 | max_concurrent_runs=1 64 | ), 65 | #XXX: check AWS Glue Version in https://docs.aws.amazon.com/glue/latest/dg/add-job.html#create-job 66 | glue_version="3.0", 67 | #XXX: Do not set Max Capacity if using Worker Type and Number of Workers 68 | # max_capacity=2, 69 | max_retries=0, 70 | name=glue_job_name, 71 | # notification_property=aws_glue.CfnJob.NotificationPropertyProperty( 72 | # notify_delay_after=10 # 10 minutes 73 | # ), 74 | number_of_workers=2, 75 | timeout=2880, 76 | worker_type="G.1X" # ['Standard' | 'G.1X' | 'G.2X' | 'G.025x'] 77 | ) 78 | 79 | cdk.CfnOutput(self, f'{self.stack_name}_GlueJobName', value=glue_cfn_job.name) 80 | cdk.CfnOutput(self, f'{self.stack_name}_GlueJobRoleArn', value=glue_job_role.role_arn) 81 | -------------------------------------------------------------------------------- /cdk_stacks/glue_job_role.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env 
python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import aws_cdk as cdk 6 | 7 | from aws_cdk import ( 8 | Stack, 9 | aws_iam 10 | ) 11 | from constructs import Construct 12 | 13 | 14 | class GlueJobRoleStack(Stack): 15 | 16 | def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None: 17 | super().__init__(scope, construct_id, **kwargs) 18 | 19 | glue_job_role_policy_doc = aws_iam.PolicyDocument() 20 | glue_job_role_policy_doc.add_statements(aws_iam.PolicyStatement(**{ 21 | "sid": "AWSGlueJobDynamoDBAccess", 22 | "effect": aws_iam.Effect.ALLOW, 23 | #XXX: The ARN will be formatted as follows: 24 | # arn:{partition}:{service}:{region}:{account}:{resource}{sep}{resource-name} 25 | "resources": [self.format_arn(service="dynamodb", resource="table", resource_name="*")], 26 | "actions": [ 27 | "dynamodb:BatchGetItem", 28 | "dynamodb:DescribeStream", 29 | "dynamodb:DescribeTable", 30 | "dynamodb:GetItem", 31 | "dynamodb:Query", 32 | "dynamodb:Scan", 33 | "dynamodb:BatchWriteItem", 34 | "dynamodb:CreateTable", 35 | "dynamodb:DeleteTable", 36 | "dynamodb:DeleteItem", 37 | "dynamodb:UpdateTable", 38 | "dynamodb:UpdateItem", 39 | "dynamodb:PutItem" 40 | ] 41 | })) 42 | 43 | glue_job_role_policy_doc.add_statements(aws_iam.PolicyStatement(**{ 44 | "sid": "AWSGlueJobS3Access", 45 | "effect": aws_iam.Effect.ALLOW, 46 | #XXX: The ARN will be formatted as follows: 47 | # arn:{partition}:{service}:{region}:{account}:{resource}{sep}{resource-name} 48 | "resources": ["*"], 49 | "actions": [ 50 | "s3:GetBucketLocation", 51 | "s3:ListBucket", 52 | "s3:GetBucketAcl", 53 | "s3:GetObject", 54 | "s3:PutObject", 55 | "s3:DeleteObject" 56 | ] 57 | })) 58 | 59 | glue_job_role = aws_iam.Role(self, 'GlueJobRole', 60 | role_name='GlueStreamingJobRole-Iceberg', 61 | assumed_by=aws_iam.ServicePrincipal('glue.amazonaws.com'), 62 | inline_policies={ 63 | 'aws_glue_job_role_policy': glue_job_role_policy_doc 64 | }, 65 | managed_policies=[ 66 | aws_iam.ManagedPolicy.from_aws_managed_policy_name('service-role/AWSGlueServiceRole'), 67 | aws_iam.ManagedPolicy.from_aws_managed_policy_name('AmazonSSMReadOnlyAccess'), 68 | aws_iam.ManagedPolicy.from_aws_managed_policy_name('AmazonEC2ContainerRegistryReadOnly'), 69 | aws_iam.ManagedPolicy.from_aws_managed_policy_name('AWSGlueConsoleFullAccess'), 70 | aws_iam.ManagedPolicy.from_aws_managed_policy_name('AmazonKinesisReadOnlyAccess') 71 | ] 72 | ) 73 | 74 | #XXX: When creating a notebook with a role, that role is then passed to interactive sessions 75 | # so that the same role can be used in both places. 76 | # As such, the `iam:PassRole` permission needs to be part of the role's policy. 
77 | # More info at: https://docs.aws.amazon.com/glue/latest/ug/notebook-getting-started.html 78 | # 79 | glue_job_role.add_to_policy(aws_iam.PolicyStatement(**{ 80 | "sid": "AWSGlueJobIAMPassRole", 81 | "effect": aws_iam.Effect.ALLOW, 82 | #XXX: The ARN will be formatted as follows: 83 | # arn:{partition}:{service}:{region}:{account}:{resource}{sep}{resource-name} 84 | "resources": [self.format_arn(service="iam", region="", resource="role", resource_name=glue_job_role.role_name)], 85 | "conditions": { 86 | "StringLike": { 87 | "iam:PassedToService": [ 88 | "glue.amazonaws.com" 89 | ] 90 | } 91 | }, 92 | "actions": [ 93 | "iam:PassRole" 94 | ] 95 | })) 96 | 97 | self.glue_job_role = glue_job_role 98 | 99 | cdk.CfnOutput(self, f'{self.stack_name}_GlueJobRole', value=self.glue_job_role.role_name) 100 | cdk.CfnOutput(self, f'{self.stack_name}_GlueJobRoleArn', value=self.glue_job_role.role_arn) 101 | -------------------------------------------------------------------------------- /src/main/python/spark_iceberg_writes_with_dataframe.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import os 6 | import sys 7 | import re 8 | 9 | from awsglue.transforms import * 10 | from awsglue.utils import getResolvedOptions 11 | from pyspark.context import SparkContext 12 | from awsglue.context import GlueContext 13 | from awsglue.job import Job 14 | from awsglue import DynamicFrame 15 | 16 | from pyspark.conf import SparkConf 17 | from pyspark.sql import DataFrame, Row 18 | from pyspark.sql.types import * 19 | from pyspark.sql.functions import * 20 | 21 | 22 | def get_kinesis_stream_name_from_arn(stream_arn): 23 | ARN_PATTERN = re.compile(r'arn:aws:kinesis:([a-z0-9-]+):(\d+):stream/([a-zA-Z0-9-_]+)') 24 | results = ARN_PATTERN.match(stream_arn) 25 | return results.group(3) 26 | 27 | args = getResolvedOptions(sys.argv, ['JOB_NAME', 28 | 'catalog', 29 | 'database_name', 30 | 'table_name', 31 | 'kinesis_table_name', 32 | 'kinesis_stream_arn', 33 | 'starting_position_of_kinesis_iterator', 34 | 'iceberg_s3_path', 35 | 'lock_table_name', 36 | 'aws_region', 37 | 'window_size' 38 | ]) 39 | 40 | CATALOG = args['catalog'] 41 | 42 | ICEBERG_S3_PATH = args['iceberg_s3_path'] 43 | DATABASE = args['database_name'] 44 | TABLE_NAME = args['table_name'] 45 | DYNAMODB_LOCK_TABLE = args['lock_table_name'] 46 | 47 | KINESIS_TABLE_NAME = args['kinesis_table_name'] 48 | KINESIS_STREAM_ARN = args['kinesis_stream_arn'] 49 | KINESIS_STREAM_NAME = get_kinesis_stream_name_from_arn(KINESIS_STREAM_ARN) 50 | 51 | #XXX: starting_position_of_kinesis_iterator: ['LATEST', 'TRIM_HORIZON'] 52 | STARTING_POSITION_OF_KINESIS_ITERATOR = args.get('starting_position_of_kinesis_iterator', 'LATEST') 53 | 54 | AWS_REGION = args['aws_region'] 55 | WINDOW_SIZE = args.get('window_size', '100 seconds') 56 | 57 | def setSparkIcebergConf() -> SparkConf: 58 | conf_list = [ 59 | (f"spark.sql.catalog.{CATALOG}", "org.apache.iceberg.spark.SparkCatalog"), 60 | (f"spark.sql.catalog.{CATALOG}.warehouse", ICEBERG_S3_PATH), 61 | (f"spark.sql.catalog.{CATALOG}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog"), 62 | (f"spark.sql.catalog.{CATALOG}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"), 63 | (f"spark.sql.catalog.{CATALOG}.lock-impl", "org.apache.iceberg.aws.glue.DynamoLockManager"), 64 | (f"spark.sql.catalog.{CATALOG}.lock.table", DYNAMODB_LOCK_TABLE), 65 | ("spark.sql.extensions", 
"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"), 66 | ("spark.sql.iceberg.handle-timestamp-without-timezone", "true") 67 | ] 68 | spark_conf = SparkConf().setAll(conf_list) 69 | return spark_conf 70 | 71 | # Set the Spark + Glue context 72 | conf = setSparkIcebergConf() 73 | sc = SparkContext(conf=conf) 74 | glueContext = GlueContext(sc) 75 | spark = glueContext.spark_session 76 | job = Job(glueContext) 77 | job.init(args['JOB_NAME'], args) 78 | 79 | # Read from Kinesis Data Stream 80 | streaming_data = spark.readStream \ 81 | .format("kinesis") \ 82 | .option("streamName", KINESIS_STREAM_NAME) \ 83 | .option("endpointUrl", f"https://kinesis.{AWS_REGION}.amazonaws.com") \ 84 | .option("startingPosition", STARTING_POSITION_OF_KINESIS_ITERATOR) \ 85 | .load() 86 | 87 | streaming_data_df = streaming_data \ 88 | .select(from_json(col("data").cast("string"), \ 89 | glueContext.get_catalog_schema_as_spark_schema(DATABASE, KINESIS_TABLE_NAME)) \ 90 | .alias("source_table")) \ 91 | .select("source_table.*") \ 92 | .withColumn('m_time', to_timestamp(col('m_time'), 'yyyy-MM-dd HH:mm:ss')) 93 | 94 | table_identifier = f"{CATALOG}.{DATABASE}.{TABLE_NAME}" 95 | checkpointPath = os.path.join(args["TempDir"], args["JOB_NAME"], "checkpoint/") 96 | 97 | #XXX: Writing against partitioned table 98 | # https://iceberg.apache.org/docs/0.14.0/spark-structured-streaming/#writing-against-partitioned-table 99 | # Complete output mode not supported when there are no streaming aggregations on streaming DataFrame/Datasets 100 | query = streaming_data_df.writeStream \ 101 | .format("iceberg") \ 102 | .outputMode("append") \ 103 | .trigger(processingTime=WINDOW_SIZE) \ 104 | .option("path", table_identifier) \ 105 | .option("fanout-enabled", "true") \ 106 | .option("checkpointLocation", checkpointPath) \ 107 | .start() 108 | 109 | query.awaitTermination() 110 | -------------------------------------------------------------------------------- /src/main/python/spark_iceberg_writes_with_sql_insert_overwrite.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import os 6 | import sys 7 | import traceback 8 | 9 | from awsglue.transforms import * 10 | from awsglue.utils import getResolvedOptions 11 | from pyspark.context import SparkContext 12 | from awsglue.context import GlueContext 13 | from awsglue.job import Job 14 | from awsglue import DynamicFrame 15 | 16 | from pyspark.conf import SparkConf 17 | from pyspark.sql import DataFrame, Row 18 | from pyspark.sql.window import Window 19 | from pyspark.sql.functions import ( 20 | col, 21 | desc, 22 | row_number, 23 | to_timestamp 24 | ) 25 | 26 | 27 | args = getResolvedOptions(sys.argv, ['JOB_NAME', 28 | 'catalog', 29 | 'database_name', 30 | 'table_name', 31 | 'primary_key', 32 | 'kinesis_stream_arn', 33 | 'starting_position_of_kinesis_iterator', 34 | 'iceberg_s3_path', 35 | 'lock_table_name', 36 | 'aws_region', 37 | 'window_size' 38 | ]) 39 | 40 | CATALOG = args['catalog'] 41 | ICEBERG_S3_PATH = args['iceberg_s3_path'] 42 | DATABASE = args['database_name'] 43 | TABLE_NAME = args['table_name'] 44 | PRIMARY_KEY = args['primary_key'] 45 | DYNAMODB_LOCK_TABLE = args['lock_table_name'] 46 | KINESIS_STREAM_ARN = args['kinesis_stream_arn'] 47 | #XXX: starting_position_of_kinesis_iterator: ['LATEST', 'TRIM_HORIZON'] 48 | STARTING_POSITION_OF_KINESIS_ITERATOR = args.get('starting_position_of_kinesis_iterator', 
'LATEST') 49 | AWS_REGION = args['aws_region'] 50 | WINDOW_SIZE = args.get('window_size', '100 seconds') 51 | 52 | def setSparkIcebergConf() -> SparkConf: 53 | conf_list = [ 54 | (f"spark.sql.catalog.{CATALOG}", "org.apache.iceberg.spark.SparkCatalog"), 55 | (f"spark.sql.catalog.{CATALOG}.warehouse", ICEBERG_S3_PATH), 56 | (f"spark.sql.catalog.{CATALOG}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog"), 57 | (f"spark.sql.catalog.{CATALOG}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"), 58 | (f"spark.sql.catalog.{CATALOG}.lock-impl", "org.apache.iceberg.aws.glue.DynamoLockManager"), 59 | (f"spark.sql.catalog.{CATALOG}.lock.table", DYNAMODB_LOCK_TABLE), 60 | ("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"), 61 | ("spark.sql.iceberg.handle-timestamp-without-timezone", "true") 62 | ] 63 | spark_conf = SparkConf().setAll(conf_list) 64 | return spark_conf 65 | 66 | # Set the Spark + Glue context 67 | conf = setSparkIcebergConf() 68 | sc = SparkContext(conf=conf) 69 | glueContext = GlueContext(sc) 70 | spark = glueContext.spark_session 71 | job = Job(glueContext) 72 | job.init(args['JOB_NAME'], args) 73 | 74 | kds_df = glueContext.create_data_frame.from_options( 75 | connection_type="kinesis", 76 | connection_options={ 77 | "typeOfData": "kinesis", 78 | "streamARN": KINESIS_STREAM_ARN, 79 | "classification": "json", 80 | "startingPosition": f"{STARTING_POSITION_OF_KINESIS_ITERATOR}", 81 | "inferSchema": "true", 82 | }, 83 | transformation_ctx="kds_df", 84 | ) 85 | 86 | def processBatch(data_frame, batch_id): 87 | if data_frame.count() > 0: 88 | stream_data_dynf = DynamicFrame.fromDF( 89 | data_frame, glueContext, "from_data_frame" 90 | ) 91 | 92 | _df = spark.sql(f"SELECT * FROM {CATALOG}.{DATABASE}.{TABLE_NAME} LIMIT 0") 93 | 94 | # Apply De-duplication logic on input data to pick up the latest record based on timestamp and operation 95 | window = Window.partitionBy(PRIMARY_KEY).orderBy(desc("m_time")) 96 | stream_data_df = stream_data_dynf.toDF() 97 | stream_data_df = stream_data_df.withColumn('m_time', to_timestamp(col('m_time'), 'yyyy-MM-dd HH:mm:ss')) 98 | upsert_data_df = stream_data_df.withColumn("row", row_number().over(window)) \ 99 | .filter(col("row") == 1).drop("row") \ 100 | .select(_df.schema.names) 101 | 102 | upsert_data_df.createOrReplaceTempView(f"{TABLE_NAME}_upsert") 103 | # print(f"Table '{TABLE_NAME}' is upserting...") 104 | 105 | sql_query = f""" 106 | INSERT OVERWRITE {CATALOG}.{DATABASE}.{TABLE_NAME} SELECT * FROM {TABLE_NAME}_upsert 107 | """ 108 | try: 109 | spark.sql(sql_query) 110 | except Exception as ex: 111 | traceback.print_exc() 112 | raise ex 113 | 114 | 115 | checkpointPath = os.path.join(args["TempDir"], args["JOB_NAME"], "checkpoint/") 116 | 117 | glueContext.forEachBatch( 118 | frame=kds_df, 119 | batch_function=processBatch, 120 | options={ 121 | "windowSize": WINDOW_SIZE, 122 | "checkpointLocation": checkpointPath, 123 | } 124 | ) 125 | 126 | job.commit() 127 | -------------------------------------------------------------------------------- /src/main/python/spark_iceberg_writes_with_sql_merge_into.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- encoding: utf-8 -*- 3 | # vim: tabstop=2 shiftwidth=2 softtabstop=2 expandtab 4 | 5 | import os 6 | import sys 7 | import traceback 8 | 9 | from awsglue.transforms import * 10 | from awsglue.utils import getResolvedOptions 11 | from awsglue.context import GlueContext 12 | from awsglue.job 
import Job 13 | from awsglue import DynamicFrame 14 | 15 | from pyspark.context import SparkContext 16 | from pyspark.conf import SparkConf 17 | from pyspark.sql import DataFrame, Row 18 | from pyspark.sql.window import Window 19 | from pyspark.sql.functions import ( 20 | col, 21 | desc, 22 | row_number, 23 | to_timestamp 24 | ) 25 | 26 | args = getResolvedOptions(sys.argv, ['JOB_NAME', 27 | 'catalog', 28 | 'database_name', 29 | 'table_name', 30 | 'primary_key', 31 | 'kinesis_stream_arn', 32 | 'starting_position_of_kinesis_iterator', 33 | 'iceberg_s3_path', 34 | 'lock_table_name', 35 | 'aws_region', 36 | 'window_size' 37 | ]) 38 | 39 | CATALOG = args['catalog'] 40 | ICEBERG_S3_PATH = args['iceberg_s3_path'] 41 | DATABASE = args['database_name'] 42 | TABLE_NAME = args['table_name'] 43 | PRIMARY_KEY = args['primary_key'] 44 | DYNAMODB_LOCK_TABLE = args['lock_table_name'] 45 | KINESIS_STREAM_ARN = args['kinesis_stream_arn'] 46 | #XXX: starting_position_of_kinesis_iterator: ['LATEST', 'TRIM_HORIZON'] 47 | STARTING_POSITION_OF_KINESIS_ITERATOR = args.get('starting_position_of_kinesis_iterator', 'LATEST') 48 | AWS_REGION = args['aws_region'] 49 | WINDOW_SIZE = args.get('window_size', '100 seconds') 50 | 51 | def setSparkIcebergConf() -> SparkConf: 52 | conf_list = [ 53 | (f"spark.sql.catalog.{CATALOG}", "org.apache.iceberg.spark.SparkCatalog"), 54 | (f"spark.sql.catalog.{CATALOG}.warehouse", ICEBERG_S3_PATH), 55 | (f"spark.sql.catalog.{CATALOG}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog"), 56 | (f"spark.sql.catalog.{CATALOG}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"), 57 | (f"spark.sql.catalog.{CATALOG}.lock-impl", "org.apache.iceberg.aws.glue.DynamoLockManager"), 58 | (f"spark.sql.catalog.{CATALOG}.lock.table", DYNAMODB_LOCK_TABLE), 59 | ("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"), 60 | ("spark.sql.iceberg.handle-timestamp-without-timezone", "true") 61 | ] 62 | spark_conf = SparkConf().setAll(conf_list) 63 | return spark_conf 64 | 65 | # Set the Spark + Glue context 66 | conf = setSparkIcebergConf() 67 | sc = SparkContext(conf=conf) 68 | glueContext = GlueContext(sc) 69 | spark = glueContext.spark_session 70 | job = Job(glueContext) 71 | job.init(args['JOB_NAME'], args) 72 | 73 | kds_df = glueContext.create_data_frame.from_options( 74 | connection_type="kinesis", 75 | connection_options={ 76 | "typeOfData": "kinesis", 77 | "streamARN": KINESIS_STREAM_ARN, 78 | "classification": "json", 79 | "startingPosition": f"{STARTING_POSITION_OF_KINESIS_ITERATOR}", 80 | "inferSchema": "true", 81 | }, 82 | transformation_ctx="kds_df", 83 | ) 84 | 85 | def processBatch(data_frame, batch_id): 86 | if data_frame.count() > 0: 87 | stream_data_dynf = DynamicFrame.fromDF( 88 | data_frame, glueContext, "from_data_frame" 89 | ) 90 | 91 | tables_df = spark.sql(f"SHOW TABLES IN {CATALOG}.{DATABASE}") 92 | table_list = tables_df.select('tableName').rdd.flatMap(lambda x: x).collect() 93 | if f"{TABLE_NAME}" not in table_list: 94 | print(f"Table {TABLE_NAME} doesn't exist in {CATALOG}.{DATABASE}.") 95 | else: 96 | _df = spark.sql(f"SELECT * FROM {CATALOG}.{DATABASE}.{TABLE_NAME} LIMIT 0") 97 | 98 | # Apply De-duplication logic on input data to pick up the latest record based on timestamp and operation 99 | window = Window.partitionBy(PRIMARY_KEY).orderBy(desc("m_time")) 100 | stream_data_df = stream_data_dynf.toDF() 101 | stream_data_df = stream_data_df.withColumn('m_time', to_timestamp(col('m_time'), 'yyyy-MM-dd HH:mm:ss')) 102 | upsert_data_df = 
stream_data_df.withColumn("row", row_number().over(window)) \ 103 | .filter(col("row") == 1).drop("row") \ 104 | .select(_df.schema.names) 105 | 106 | upsert_data_df.createOrReplaceTempView(f"{TABLE_NAME}_upsert") 107 | # print(f"Table '{TABLE_NAME}' is upserting...") 108 | 109 | try: 110 | spark.sql(f"""MERGE INTO {CATALOG}.{DATABASE}.{TABLE_NAME} t 111 | USING {TABLE_NAME}_upsert s ON s.{PRIMARY_KEY} = t.{PRIMARY_KEY} 112 | WHEN MATCHED THEN UPDATE SET * 113 | WHEN NOT MATCHED THEN INSERT * 114 | """) 115 | except Exception as ex: 116 | traceback.print_exc() 117 | raise ex 118 | 119 | checkpointPath = os.path.join(args["TempDir"], args["JOB_NAME"], "checkpoint/") 120 | 121 | glueContext.forEachBatch( 122 | frame=kds_df, 123 | batch_function=processBatch, 124 | options={ 125 | "windowSize": WINDOW_SIZE, 126 | "checkpointLocation": checkpointPath, 127 | } 128 | ) 129 | 130 | job.commit() 131 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # AWS Glue Streaming ETL Job with Apace Iceberg CDK Python project! 3 | 4 |  5 | 6 | In this project, we create a streaming ETL job in AWS Glue to integrate Iceberg with a streaming use case and create an in-place updatable data lake on Amazon S3. 7 | 8 | After ingested to Amazon S3, you can query the data with [Amazon Athena](http://aws.amazon.com/athena). 9 | 10 | This project can be deployed with [AWS CDK Python](https://docs.aws.amazon.com/cdk/api/v2/). 11 | The `cdk.json` file tells the CDK Toolkit how to execute your app. 12 | 13 | This project is set up like a standard Python project. The initialization 14 | process also creates a virtualenv within this project, stored under the `.venv` 15 | directory. To create the virtualenv it assumes that there is a `python3` 16 | (or `python` for Windows) executable in your path with access to the `venv` 17 | package. If for any reason the automatic creation of the virtualenv fails, 18 | you can create the virtualenv manually. 19 | 20 | To manually create a virtualenv on MacOS and Linux: 21 | 22 | ``` 23 | $ python3 -m venv .venv 24 | ``` 25 | 26 | After the init process completes and the virtualenv is created, you can use the following 27 | step to activate your virtualenv. 28 | 29 | ``` 30 | $ source .venv/bin/activate 31 | ``` 32 | 33 | If you are a Windows platform, you would activate the virtualenv like this: 34 | 35 | ``` 36 | % .venv\Scripts\activate.bat 37 | ``` 38 | 39 | Once the virtualenv is activated, you can install the required dependencies. 40 | 41 | ``` 42 | (.venv) $ pip install -r requirements.txt 43 | ``` 44 | 45 | In case of `AWS Glue 3.0`, before synthesizing the CloudFormation, **you first set up Apache Iceberg connector for AWS Glue to use Apache Iceber with AWS Glue jobs.** (For more information, see [References](#references) (2)) 46 | 47 | Then you should set approperly the cdk context configuration file, `cdk.context.json`. 48 | 49 | For example: 50 |
51 | {
52 | "kinesis_stream_name": "iceberg-demo-stream",
53 | "glue_assets_s3_bucket_name": "aws-glue-assets-123456789012-atq4q5u",
54 | "glue_job_script_file_name": "spark_iceberg_writes_with_dataframe.py",
55 | "glue_job_name": "streaming_data_from_kds_into_iceberg_table",
56 | "glue_job_input_arguments": {
57 | "--catalog": "job_catalog",
58 | "--database_name": "iceberg_demo_db",
59 | "--table_name": "iceberg_demo_table",
60 | "--primary_key": "name",
61 | "--kinesis_table_name": "iceberg_demo_kinesis_stream_table",
62 | "--starting_position_of_kinesis_iterator": "LATEST",
63 | "--iceberg_s3_path": "s3://glue-iceberg-demo-atq4q5u/iceberg_demo_db",
64 | "--lock_table_name": "iceberg_lock",
65 | "--aws_region": "us-east-1",
66 | "--window_size": "100 seconds",
67 | "--extra-jars": "s3://aws-glue-assets-123456789012-atq4q5u/extra-jars/aws-sdk-java-2.17.224.jar",
68 | "--user-jars-first": "true"
69 | },
70 | "glue_connections_name": "iceberg-connection",
71 | "glue_kinesis_table": {
72 | "database_name": "iceberg_demo_db",
73 | "table_name": "iceberg_demo_kinesis_stream_table",
74 | "columns": [
75 | {
76 | "name": "name",
77 | "type": "string"
78 | },
79 | {
80 | "name": "age",
81 | "type": "int"
82 | },
83 | {
84 | "name": "m_time",
85 | "type": "string"
86 | }
87 | ]
88 | }
89 | }
90 |
91 |
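For reference, the CDK stacks consume these values through context lookups (`self.node.try_get_context(...)` in `cdk_stacks/*.py`). A minimal sketch of that pattern, assuming the example `cdk.context.json` above:
```
import aws_cdk as cdk

app = cdk.App()

# The same context lookups the stacks in cdk_stacks/ perform.
glue_job_input_arguments = app.node.try_get_context('glue_job_input_arguments') or {}
glue_kinesis_table = app.node.try_get_context('glue_kinesis_table') or {}

# e.g. the S3 stack derives its bucket name from '--iceberg_s3_path', and the
# schema stack reads the database, table, and column definitions.
print(glue_job_input_arguments.get('--iceberg_s3_path'))
print(glue_kinesis_table.get('database_name'), glue_kinesis_table.get('table_name'))
```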
92 | :information_source: The `--primary_key` option should be set to the Iceberg table's primary key column name.
93 |
94 | :warning: **You should create an S3 bucket for the Glue job script and upload the script file into that S3 bucket.**
95 |
96 | At this point you can now synthesize the CloudFormation template for this code.
97 |
98 | 99 | (.venv) $ export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text) 100 | (.venv) $ export CDK_DEFAULT_REGION=$(aws configure get region) 101 | (.venv) $ cdk synth --all 102 |103 | 104 | To add additional dependencies, for example other CDK libraries, just add 105 | them to your `setup.py` file and rerun the `pip install -r requirements.txt` 106 | command. 107 | 108 | ## Run Test 109 | 110 | 1. Set up **Apache Iceberg connector for AWS Glue** to use Apache Iceberg with AWS Glue jobs. 111 | 2. Create a S3 bucket for Apache Iceberg table 112 |
113 | (.venv) $ cdk deploy IcebergS3Path 114 |115 | 3. Create a Kinesis data stream 116 |
117 | (.venv) $ cdk deploy KinesisStreamAsGlueStreamingJobDataSource 118 |119 | 4. Define a schema for the streaming data 120 |
121 | (.venv) $ cdk deploy GlueSchemaOnKinesisStream 122 |123 | 124 | Running `cdk deploy GlueSchemaOnKinesisStream` command is like that we create a schema manually using the AWS Glue Data Catalog as the following steps: 125 | 126 | (1) On the AWS Glue console, choose **Data Catalog**.
146 | (.venv) $ wget https://repo1.maven.org/maven2/software/amazon/awssdk/aws-sdk-java/2.17.224/aws-sdk-java-2.17.224.jar 147 | (.venv) $ aws s3 cp aws-sdk-java-2.17.224.jar s3://aws-glue-assets-123456789012-atq4q5u/extra-jars/aws-sdk-java-2.17.224.jar 148 |149 | A Glue Streaming Job might fail because of the following error: 150 |
151 | py4j.protocol.Py4JJavaError: An error occurred while calling o135.start. 152 | : java.lang.NoSuchMethodError: software.amazon.awssdk.utils.SystemSetting.getStringValueFromEnvironmentVariable(Ljava/lang/String;)Ljava/util/Optional 153 |154 | We can work around the problem by starting the Glue Job with the additional parameters: 155 |
156 | --extra-jars s3://path/to/aws-sdk-for-java-v2.jar
157 | --user-jars-first true
158 |
159 | In order to do this, we might need to upload the **AWS SDK for Java 2.x** jar file into S3.
160 | 6. Create the Glue Streaming Job
161 |
162 | * (step 1) Select one of the Glue job scripts and upload it into S3
163 |
164 | **List of Glue Job Scripts**
165 | | File name | Spark Writes |
166 | |-----------|--------------|
167 | | spark_iceberg_writes_with_dataframe.py | DataFrame append |
168 | | spark_iceberg_writes_with_sql_insert_overwrite.py | SQL insert overwrite |
169 | | spark_iceberg_writes_with_sql_merge_into.py | SQL merge into |
170 |
171 |
172 | (.venv) $ ls src/main/python/ 173 | spark_iceberg_writes_with_dataframe.py 174 | spark_iceberg_writes_with_sql_insert_overwrite.py 175 | spark_iceberg_writes_with_sql_merge_into.py 176 | (.venv) $ aws s3 mb s3://aws-glue-assets-123456789012-atq4q5u --region us-east-1 177 | (.venv) $ aws s3 cp src/main/python/spark_iceberg_writes_with_dataframe.py s3://aws-glue-assets-123456789012-atq4q5u/scripts/ 178 |179 | 180 | * (step 2) Provision the Glue Streaming Job 181 | 182 |
183 | (.venv) $ cdk deploy GlueStreamingSinkToIcebergJobRole \
184 | GrantLFPermissionsOnGlueJobRole \
185 | GlueStreamingSinkToIceberg
186 |
187 | 7. Make sure the Glue job can access the Kinesis Data Streams table in the Glue Data Catalog database; otherwise, grant the required permissions to the Glue job role.
188 |
189 | We can check the currently granted permissions by running the following command:
190 |
191 | (.venv) $ aws lakeformation list-permissions | jq -r '.PrincipalResourcePermissions[] | select(.Principal.DataLakePrincipalIdentifier | endswith(":role/GlueStreamingJobRole-Iceberg"))'
192 |
193 | If no permissions are found, we need to manually grant the required permissions to the Glue job role by running the following command (a boto3 sketch of the same grant follows the CLI command below):
194 |
195 | (.venv) $ aws lakeformation grant-permissions \
196 | --principal DataLakePrincipalIdentifier=arn:aws:iam::{account-id}:role/GlueStreamingJobRole-Iceberg \
197 | --permissions SELECT DESCRIBE ALTER INSERT DELETE \
198 | --resource '{ "Table": {"DatabaseName": "iceberg_demo_db", "TableWildcard": {}} }'
199 |
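The same grant can also be scripted; a minimal boto3 sketch of the CLI command above (the account ID is a placeholder, and the role name follows this project's `GlueJobRoleStack`):
```
import boto3

lf_client = boto3.client('lakeformation', region_name='us-east-1')

# Grant the Glue job role the same table-level permissions as the CLI command above.
# Replace the placeholder account ID with your own.
lf_client.grant_permissions(
    Principal={
        'DataLakePrincipalIdentifier': 'arn:aws:iam::123456789012:role/GlueStreamingJobRole-Iceberg'
    },
    Resource={
        'Table': {
            'DatabaseName': 'iceberg_demo_db',
            'TableWildcard': {}
        }
    },
    Permissions=['SELECT', 'DESCRIBE', 'ALTER', 'INSERT', 'DELETE']
)
```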
200 | 8. Create a table with partitioned data in Amazon Athena
201 |
202 | Go to [Athena](https://console.aws.amazon.com/athena/home) on the AWS Management console.209 | CREATE DATABASE IF NOT EXISTS iceberg_demo_db 210 |211 | 212 | * (step 2) Create a table 213 | 214 | Copy the following query into the Athena query editor, replace the `xxxxxxx` in the last line under `LOCATION` with the string of your S3 bucket, and execute the query to create a new table. 215 |
216 | CREATE TABLE iceberg_demo_db.iceberg_demo_table (
217 |   name string,
218 |   age int,
219 |   m_time timestamp
220 | )
221 | PARTITIONED BY (`name`)
222 | LOCATION 's3://glue-iceberg-demo-atq4q5u/iceberg_demo_db/iceberg_demo_table'
223 | TBLPROPERTIES (
224 |   'table_type'='iceberg'
225 | );
226 |
227 | If the query is successful, a table named `iceberg_demo_table` is created and displayed on the left panel under the **Tables** section.
228 |
229 | If you get an error, check if (a) you have updated the `LOCATION` to the correct S3 bucket name, (b) you have `iceberg_demo_db` selected under the **Database** dropdown, and (c) you have `AwsDataCatalog` selected as the **Data source**.
230 |
231 | :information_source: If you fail to create the table, give Athena users access permissions on `iceberg_demo_db` through [AWS Lake Formation](https://console.aws.amazon.com/lakeformation/home), or grant any user who queries with Athena access to `iceberg_demo_db` by running the following command:
232 |
233 | (.venv) $ aws lakeformation grant-permissions \
234 | --principal DataLakePrincipalIdentifier=arn:aws:iam::{account-id}:user/example-user-id \
235 | --permissions CREATE_TABLE DESCRIBE ALTER DROP \
236 | --resource '{ "Database": { "Name": "iceberg_demo_db" } }'
237 | (.venv) $ aws lakeformation grant-permissions \
238 | --principal DataLakePrincipalIdentifier=arn:aws:iam::{account-id}:user/example-user-id \
239 | --permissions SELECT DESCRIBE ALTER INSERT DELETE DROP \
240 | --resource '{ "Table": {"DatabaseName": "iceberg_demo_db", "TableWildcard": {}} }'
241 |
242 |
243 | 9. Run glue job to load data from Kinesis Data Streams into S3
244 | 245 | (.venv) $ aws glue start-job-run --job-name streaming_data_from_kds_into_iceberg_table 246 |247 | 10. Generate streaming data 248 | 249 | We can synthetically generate data in JSON format using a simple Python application. 250 |
251 | (.venv) $ python src/utils/gen_fake_kinesis_stream_data.py \
252 |     --region-name us-east-1 \
253 |     --stream-name your-stream-name \
254 |     --console \
255 |     --max-count 10
256 |
257 |
258 | Synthetic data example, ordered by `name` and `m_time`:
259 |
260 | {"name": "Arica", "age": 48, "m_time": "2023-04-11 19:13:21"}
261 | {"name": "Arica", "age": 32, "m_time": "2023-10-20 17:24:17"}
262 | {"name": "Arica", "age": 45, "m_time": "2023-12-26 01:20:49"}
263 | {"name": "Fernando", "age": 16, "m_time": "2023-05-22 00:13:55"}
264 | {"name": "Gonzalo", "age": 37, "m_time": "2023-01-11 06:18:26"}
265 | {"name": "Gonzalo", "age": 60, "m_time": "2023-01-25 16:54:26"}
266 | {"name": "Micheal", "age": 45, "m_time": "2023-04-07 06:18:17"}
267 | {"name": "Micheal", "age": 44, "m_time": "2023-12-14 09:02:57"}
268 | {"name": "Takisha", "age": 48, "m_time": "2023-12-20 16:44:13"}
269 | {"name": "Takisha", "age": 24, "m_time": "2023-12-30 12:38:23"}
270 |
271 |
272 | Spark Writes using `DataFrame append` insert all records into the Iceberg table.
273 |
274 | {"name": "Arica", "age": 48, "m_time": "2023-04-11 19:13:21"}
275 | {"name": "Arica", "age": 32, "m_time": "2023-10-20 17:24:17"}
276 | {"name": "Arica", "age": 45, "m_time": "2023-12-26 01:20:49"}
277 | {"name": "Fernando", "age": 16, "m_time": "2023-05-22 00:13:55"}
278 | {"name": "Gonzalo", "age": 37, "m_time": "2023-01-11 06:18:26"}
279 | {"name": "Gonzalo", "age": 60, "m_time": "2023-01-25 16:54:26"}
280 | {"name": "Micheal", "age": 45, "m_time": "2023-04-07 06:18:17"}
281 | {"name": "Micheal", "age": 44, "m_time": "2023-12-14 09:02:57"}
282 | {"name": "Takisha", "age": 48, "m_time": "2023-12-20 16:44:13"}
283 | {"name": "Takisha", "age": 24, "m_time": "2023-12-30 12:38:23"}
284 |
285 |
286 | Spark Writes using `SQL insert overwrite` or `SQL merge into` insert the last updated records into the Iceberg table.
287 |
288 | {"name": "Arica", "age": 45, "m_time": "2023-12-26 01:20:49"}
289 | {"name": "Fernando", "age": 16, "m_time": "2023-05-22 00:13:55"}
290 | {"name": "Gonzalo", "age": 60, "m_time": "2023-01-25 16:54:26"}
291 | {"name": "Micheal", "age": 44, "m_time": "2023-12-14 09:02:57"}
292 | {"name": "Takisha", "age": 24, "m_time": "2023-12-30 12:38:23"}
293 |
294 | 11. Check streaming data in S3
295 |
296 | After `3~5` minutes, you can see that the streaming data have been delivered from **Kinesis Data Streams** to **S3**.
297 |
298 | 
299 | 
300 | 
301 | 
302 |
303 | 12. Run test query
304 |
305 | Enter the following SQL statement and execute the query.
306 | 307 | SELECT COUNT(*) 308 | FROM iceberg_demo_db.iceberg_demo_table; 309 |310 | 311 | ## Clean Up 312 | 313 | 1. Stop the glue job by replacing the job name in below command. 314 | 315 |
316 | (.venv) $ JOB_RUN_IDS=$(aws glue get-job-runs \ 317 | --job-name streaming_data_from_kds_into_iceberg_table | jq -r '.JobRuns[] | select(.JobRunState=="RUNNING") | .Id' \ 318 | | xargs) 319 | (.venv) $ aws glue batch-stop-job-run \ 320 | --job-name streaming_data_from_kds_into_iceberg_table \ 321 | --job-run-ids $JOB_RUN_IDS 322 |323 | 324 | 2. Delete the CloudFormation stack by running the below command. 325 | 326 |
327 | (.venv) $ cdk destroy --all 328 |329 | 330 | ## Useful commands 331 | 332 | * `cdk ls` list all stacks in the app 333 | * `cdk synth` emits the synthesized CloudFormation template 334 | * `cdk deploy` deploy this stack to your default AWS account/region 335 | * `cdk diff` compare deployed stack with current state 336 | * `cdk docs` open CDK documentation 337 | 338 | ## References 339 | 340 | * (1) [AWS Glue versions](https://docs.aws.amazon.com/glue/latest/dg/release-notes.html): The AWS Glue version determines the versions of Apache Spark and Python that AWS Glue supports. 341 | * (2) [Use the AWS Glue connector to read and write Apache Iceberg tables with ACID transactions and perform time travel \(2022-06-21\)](https://aws.amazon.com/ko/blogs/big-data/use-the-aws-glue-connector-to-read-and-write-apache-iceberg-tables-with-acid-transactions-and-perform-time-travel/) 342 | * (3) [Streaming Data into Apache Iceberg Tables Using AWS Kinesis and AWS Glue (2022-09-26)](https://www.dremio.com/subsurface/streaming-data-into-apache-iceberg-tables-using-aws-kinesis-and-aws-glue/) 343 | * (4) [Amazon Athena Using Iceberg tables](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html) 344 | * (5) [Streaming ETL jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/add-job-streaming.html) 345 | * (6) [AWS Glue job parameters](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html) 346 | * (7) [Crafting serverless streaming ETL jobs with AWS Glue](https://aws.amazon.com/ko/blogs/big-data/crafting-serverless-streaming-etl-jobs-with-aws-glue/) 347 | * (8) [Apache Iceberg - Spark Writes with SQL (v0.14.0)](https://iceberg.apache.org/docs/0.14.0/spark-writes/) 348 | * (9) [Apache Iceberg - Spark Structured Streaming (v0.14.0)](https://iceberg.apache.org/docs/0.14.0/spark-structured-streaming/) 349 | * (10) [Apache Iceberg - Writing against partitioned table (v0.14.0)](https://iceberg.apache.org/docs/0.14.0/spark-structured-streaming/#writing-against-partitioned-table) 350 | * Iceberg supports append and complete output modes: 351 | * `append`: appends the rows of every micro-batch to the table 352 | * `complete`: replaces the table contents every micro-batch 353 | 354 | Iceberg requires the data to be sorted according to the partition spec per task (Spark partition) in prior to write against partitioned table.
357 | pyspark.sql.utils.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrame/Datasets;
358 |
359 | * (11) [Apache Iceberg - Maintenance for streaming tables (v0.14.0)](https://iceberg.apache.org/docs/0.14.0/spark-structured-streaming/#maintenance-for-streaming-tables)
360 | * (12) [awsglue python package](https://github.com/awslabs/aws-glue-libs): The awsglue Python package contains the Python portion of the AWS Glue library. This library extends PySpark to support serverless ETL on AWS.
361 | * (13) [AWS Glue Notebook Samples](https://github.com/aws-samples/aws-glue-samples/tree/master/examples/notebooks) - sample iPython notebook files which show you how to use open data lake formats: Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue Interactive Sessions and AWS Glue Studio Notebook.
362 |
363 | ## Troubleshooting
364 |
365 | * Granting database or table permissions error using AWS CDK
366 |   * Error message:
367 |
368 | AWS::LakeFormation::PrincipalPermissions | CfnPrincipalPermissions Resource handler returned message: "Resource does not exist or requester is not authorized to access requested permissions. (Service: LakeFormation, Status Code: 400, Request ID: f4d5e58b-29b6-4889-9666-7e38420c9035)" (RequestToken: 4a4bb1d6-b051-032f-dd12-5951d7b4d2a9, HandlerErrorCode: AccessDenied) 369 |370 | * Solution: 371 | 372 | The role assumed by cdk is not a data lake administrator. (e.g., `cdk-hnb659fds-deploy-role-12345678912-us-east-1`)
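In this repository, `cdk_stacks/lakeformation_permissions.py` works around this by registering the CDK CloudFormation execution role as a Lake Formation data lake administrator before creating the principal permissions. A minimal sketch of that pattern, mirroring the stack in this repo:
```
import aws_cdk as cdk
from aws_cdk import Stack, aws_lakeformation
from constructs import Construct


class PromoteCdkExecRoleToLakeAdminStack(Stack):
  # Hypothetical stack name; the same CfnDataLakeSettings call is used in
  # cdk_stacks/lakeformation_permissions.py (DataLakePermissionsStack).
  def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)

    aws_lakeformation.CfnDataLakeSettings(self, "CfnDataLakeSettings",
      admins=[aws_lakeformation.CfnDataLakeSettings.DataLakePrincipalProperty(
        data_lake_principal_identifier=cdk.Fn.sub(
          self.synthesizer.cloud_formation_execution_role_arn)
      )]
    )
```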