├── .github └── PULL_REQUEST_TEMPLATE.md ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── EN ├── README.md ├── clean-up │ └── README.md ├── images │ ├── architecture_all.png │ ├── architecture_lab1.png │ ├── architecture_lab2.png │ ├── architecture_lab3.png │ ├── architecture_lab4.png │ ├── architecture_lab5.png │ └── architecture_lab6.png ├── lab1 │ ├── README.md │ ├── additional_info_lab1.md │ ├── additional_info_lab1_Fluentd.md │ ├── additional_info_lab1_IAM.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 1-cmd.txt │ │ │ └── 1-minilake_ec2.yaml │ │ └── us-east-1 │ │ │ ├── 1-cmd.txt │ │ │ └── 1-minilake_ec2.yaml │ └── images │ │ └── windows_login_ec2_capture01.png ├── lab2 │ ├── README.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 2-cmd.txt │ │ │ ├── 2-dashboard.json │ │ │ ├── 2-td-agent.conf │ │ │ └── 2-visualization.json │ │ └── us-east-1 │ │ │ ├── 2-cmd.txt │ │ │ ├── 2-dashboard.json │ │ │ ├── 2-td-agent.conf │ │ │ └── 2-visualization.json │ └── images │ │ ├── Lab2-Section1-Step1-4.png │ │ ├── kibana_capture01.png │ │ ├── kibana_dashboard.png │ │ ├── kibana_discover.png │ │ └── kibana_management.png ├── lab3 │ ├── README.md │ ├── additional_info_lab3.md │ └── asset │ │ ├── ap-northeast-1 │ │ ├── 3-cmd.txt │ │ ├── 3-dashboard.json │ │ ├── 3-define-extraction-query.txt │ │ ├── 3-td-agent.conf │ │ └── 3-visualization.json │ │ └── us-east-1 │ │ ├── 3-cmd.txt │ │ ├── 3-dashboard.json │ │ ├── 3-define-extraction-query.txt │ │ ├── 3-td-agent.conf │ │ └── 3-visualization.json ├── lab4 │ ├── README.md │ ├── additional_info_lab4.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 4-cmd.txt │ │ │ ├── 4-policydocument.txt │ │ │ ├── 4-td-agent1.conf │ │ │ └── 4-td-agent2.conf │ │ └── us-east-1 │ │ │ ├── 4-cmd.txt │ │ │ ├── 4-policydocument.txt │ │ │ ├── 4-td-agent1.conf │ │ │ └── 4-td-agent2.conf │ └── images │ │ └── quicksight_capture01.png ├── lab5 │ ├── README.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 5-cmd.txt │ │ │ ├── 5-minilake_privatesubnet.yaml │ │ │ ├── 5-td-agent1.conf │ │ │ └── 5-td-agent2.conf │ │ └── us-east-1 │ │ │ ├── 5-cmd.txt │ │ │ ├── 5-minilake_privatesubnet.yaml │ │ │ ├── 5-td-agent1.conf │ │ │ └── 5-td-agent2.conf │ └── images │ │ ├── Lab5-Section4-Step4-25.png │ │ └── quicksight_vpc_setting.png └── lab6 │ ├── README.md │ ├── additional_info_lab6.md │ ├── asset │ ├── ap-northeast-1 │ │ └── 6-cmd.txt │ └── us-east-1 │ │ └── 6-cmd.txt │ └── images │ ├── CSV_nopartition.png │ ├── CSV_partition.png │ ├── Parquet_nopartition.png │ ├── Parquet_partition.png │ └── glue_job_capture01.png ├── JP ├── README.md ├── clean-up │ └── README.md ├── images │ ├── architecture_all.png │ ├── architecture_lab1.png │ ├── architecture_lab2.png │ ├── architecture_lab3.png │ ├── architecture_lab4.png │ ├── architecture_lab5.png │ └── architecture_lab6.png ├── lab1 │ ├── README.md │ ├── additional_info_lab1.md │ ├── additional_info_lab1_Fluentd.md │ ├── additional_info_lab1_IAM.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 1-cmd.txt │ │ │ └── 1-minilake_ec2.yaml │ │ └── us-east-1 │ │ │ ├── 1-cmd.txt │ │ │ └── 1-minilake_ec2.yaml │ └── images │ │ └── windows_login_ec2_capture01.png ├── lab2 │ ├── README.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 2-cmd.txt │ │ │ ├── 2-dashboard.json │ │ │ ├── 2-td-agent.conf │ │ │ └── 2-visualization.json │ │ └── us-east-1 │ │ │ ├── 2-cmd.txt │ │ │ ├── 2-dashboard.json │ │ │ ├── 2-td-agent.conf │ │ │ └── 2-visualization.json │ └── images │ │ ├── Lab2-Section1-Step1-4.png │ │ ├── kibana_capture01.png │ │ ├── kibana_capture02.png │ │ ├── kibana_dashboard.png │ │ 
├── kibana_discover.png │ │ ├── kibana_management.png │ │ ├── kibana_pain.png │ │ └── kibana_pain2.png ├── lab3 │ ├── README.md │ ├── additional_info_lab3.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 3-cmd.txt │ │ │ ├── 3-dashboard.json │ │ │ ├── 3-define-extraction-query.txt │ │ │ ├── 3-td-agent.conf │ │ │ └── 3-visualization.json │ │ └── us-east-1 │ │ │ ├── 3-cmd.txt │ │ │ ├── 3-dashboard.json │ │ │ ├── 3-define-extraction-query.txt │ │ │ ├── 3-td-agent.conf │ │ │ └── 3-visualization.json │ └── images │ │ ├── Lab3-Section2-Step3-4.png │ │ └── kibana_pain2.png ├── lab4 │ ├── README.md │ ├── additional_info_lab4.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 4-cmd.txt │ │ │ ├── 4-policydocument.txt │ │ │ ├── 4-td-agent1.conf │ │ │ └── 4-td-agent2.conf │ │ └── us-east-1 │ │ │ ├── 4-cmd.txt │ │ │ ├── 4-policydocument.txt │ │ │ ├── 4-td-agent1.conf │ │ │ └── 4-td-agent2.conf │ └── images │ │ └── quicksight_capture01.png ├── lab5 │ ├── README.md │ ├── asset │ │ ├── ap-northeast-1 │ │ │ ├── 5-cmd.txt │ │ │ ├── 5-minilake_privatesubnet.yaml │ │ │ ├── 5-td-agent1.conf │ │ │ └── 5-td-agent2.conf │ │ └── us-east-1 │ │ │ ├── 5-cmd.txt │ │ │ ├── 5-minilake_privatesubnet.yaml │ │ │ ├── 5-td-agent1.conf │ │ │ └── 5-td-agent2.conf │ └── images │ │ ├── Lab5-Section4-Step2-7.png │ │ ├── Lab5-Section4-Step3-8.png │ │ ├── Lab5-Section4-Step3-9.png │ │ ├── Lab5-Section4-Step4-21.png │ │ ├── Lab5-Section4-Step4-25.png │ │ ├── Lab5-Section4-Step4-9.png │ │ ├── kibana_pain2.png │ │ └── quicksight_vpc_setting.png └── lab6 │ ├── README.md │ ├── additional_info_lab6.md │ ├── asset │ ├── ap-northeast-1 │ │ └── 6-cmd.txt │ └── us-east-1 │ │ └── 6-cmd.txt │ └── images │ ├── CSV_nopartition.png │ ├── CSV_partition.png │ ├── Parquet_nopartition.png │ ├── Parquet_partition.png │ └── glue_job_capture01.png ├── LICENSE ├── LICENSE-SAMPLECODE ├── LICENSE-SUMMARY ├── README.md └── datalake-handson-sample-data ├── .DS_Store └── amazon_reviews_JP.csv.gz /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | *Issue #, if available:* 2 | 3 | *Description of changes:* 4 | 5 | 6 | By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. 7 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Guidelines for contributing 2 | 3 | Thank you for your interest in contributing to AWS documentation! We greatly value feedback and contributions from our community. 4 | 5 | Please read through this document before you submit any pull requests or issues. It will help us work together more effectively. 6 | 7 | ## What to expect when you contribute 8 | 9 | When you submit a pull request, our team is notified and will respond as quickly as we can. We'll do our best to work with you to ensure that your pull request adheres to our style and standards. 
If we merge your pull request, we might make additional edits later for style or clarity. 10 | 11 | The AWS documentation source files on GitHub aren't published directly to the official documentation website. If we merge your pull request, we'll publish your changes to the documentation website as soon as we can, but they won't appear immediately or automatically. 12 | 13 | We look forward to receiving your pull requests for: 14 | 15 | * New content you'd like to contribute (such as new code samples or tutorials) 16 | * Inaccuracies in the content 17 | * Information gaps in the content that need more detail to be complete 18 | * Typos or grammatical errors 19 | * Suggested rewrites that improve clarity and reduce confusion 20 | 21 | **Note:** We all write differently, and you might not like how we've written or organized something currently. We want that feedback. But please be sure that your request for a rewrite is supported by the previous criteria. If it isn't, we might decline to merge it. 22 | 23 | ## How to contribute 24 | 25 | To contribute, send us a pull request. For small changes, such as fixing a typo or adding a link, you can use the [GitHub Edit Button](https://blog.github.com/2011-04-26-forking-with-the-edit-button/). For larger changes: 26 | 27 | 1. [Fork the repository](https://help.github.com/articles/fork-a-repo/). 28 | 2. In your fork, make your change in a branch that's based on this repo's **master** branch. 29 | 3. Commit the change to your fork, using a clear and descriptive commit message. 30 | 4. [Create a pull request](https://help.github.com/articles/creating-a-pull-request-from-a-fork/), answering any questions in the pull request form. 31 | 32 | Before you send us a pull request, please be sure that: 33 | 34 | 1. You're working from the latest source on the **master** branch. 35 | 2. You check [existing open](https://github.com/awsdocs/amazon-s3-datalake-handson/pulls), and [recently closed](https://github.com/awsdocs/amazon-s3-datalake-handson/pulls?q=is%3Apr+is%3Aclosed), pull requests to be sure that someone else hasn't already addressed the problem. 36 | 3. You [create an issue](https://github.com/awsdocs/amazon-s3-datalake-handson/issues/new) before working on a contribution that will take a significant amount of your time. 37 | 38 | For contributions that will take a significant amount of time, [open a new issue](https://github.com/awsdocs/amazon-s3-datalake-handson/issues/new) to pitch your idea before you get started. Explain the problem and describe the content you want to see added to the documentation. Let us know if you'll write it yourself or if you'd like us to help. We'll discuss your proposal with you and let you know whether we're likely to accept it. We don't want you to spend a lot of time on a contribution that might be outside the scope of the documentation or that's already in the works. 39 | 40 | ## Finding contributions to work on 41 | 42 | If you'd like to contribute, but don't have a project in mind, look at the [open issues](https://github.com/awsdocs/amazon-s3-datalake-handson/issues) in this repository for some ideas. Any issues with the [help wanted](https://github.com/awsdocs/amazon-s3-datalake-handson/labels/help%20wanted) or [enhancement](https://github.com/awsdocs/amazon-s3-datalake-handson/labels/enhancement) labels are a great place to start. 
43 | 44 | In addition to written content, we really appreciate new examples and code samples for our documentation, such as examples for different platforms or environments, and code samples in additional languages. 45 | 46 | ## Code of conduct 47 | 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). For more information, see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact [opensource-codeofconduct@amazon.com](mailto:opensource-codeofconduct@amazon.com) with any additional questions or comments. 49 | 50 | ## Security issue notifications 51 | 52 | If you discover a potential security issue, please notify AWS Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public issue on GitHub. 53 | 54 | ## Licensing 55 | 56 | See the [LICENSE](https://github.com/awsdocs/amazon-s3-datalake-handson/blob/master/LICENSE) file for this project's licensing. We will ask you to confirm the licensing of your contribution. We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 57 | -------------------------------------------------------------------------------- /EN/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | # Introduction 8 | ## The goal of this hands-on workshop 9 | 10 | Data lakes are used by many companies to store and analyze data. A data lake serves as a central repository for both structured and unstructured data taken from a variety of data sources. 11 | 12 | The goal of this hands-on is to learn how to build an infrastructure for analyzing big data with a data lake, by building an analysis pipeline that connects AWS big data services and other related services. 13 | 14 | 15 | ## Prerequisites 16 | - A PC (Windows, macOS, Linux, and so on) that can connect to AWS 17 | - AWS account prepared in advance 18 | - SSH client (Tera Term is recommended for Windows, especially for Japanese-language environments.) 19 | - Web browser (Firefox or Chrome is recommended.) 20 | 21 | # Overview of this hands-on workshop 22 | 23 | ## Six labs in this hands-on workshop 24 | This hands-on workshop consists of six labs.
25 | 26 | Lab1: Getting started (Required) 27 | AWS services in the topic: Amazon VPC, Amazon EC2, AWS CloudFormation, AWS IAM 28 | 29 | Lab2: Visualization of application logs in real time 30 | AWS services in the topic: Amazon Elasticsearch Service 31 | 32 | Lab3: Visualization of application logs in real time and alarm settings 33 | AWS services in the topic: Amazon CloudWatch, AWS Lambda, Amazon Elasticsearch Service 34 | 35 | Lab4: Application log persistence and long-term data analysis and visualization 36 | AWS services in the topic: Amazon Kinesis Data Firehose, Amazon S3, Amazon Athena, Amazon QuickSight 37 | 38 | Lab5: Data analysis using DWH on AWS 39 | AWS services in the topic: Amazon Kinesis Data Firehose, Amazon S3, Amazon Redshift, Amazon Redshift Spectrum, Amazon QuickSight 40 | 41 | Lab6: ETL processing of data using serverless 42 | AWS services in the topic: AWS Glue, Amazon Athena 43 | 44 | 45 | ## Three options of this hands-on workshop 46 | 47 | This hands-on can be completed by walking through one of the following paths. You have three options. 48 | (1) Implementation of a near real-time data analysis environment (speed layer): [Lab1](lab1/README.md) → [Lab2](lab2/README.md) → [Lab3](lab3/README.md) 49 | (2) Implementation of an environment for batch analysis of long-term data (batch layer), with optimization of performance and cost: [Lab1](lab1/README.md) → [Lab4](lab4/README.md) or [Lab5](lab5/README.md) → [Lab6](lab6/README.md) 50 | (3) All labs: [Lab1](lab1/README.md) → [Lab2](lab2/README.md) → [Lab3](lab3/README.md) → [Lab4](lab4/README.md) → [Lab5](lab5/README.md) → [Lab6](lab6/README.md) 51 | 52 | 53 | Once you complete all labs, you will have built the following architecture. 54 | 55 | 56 | 57 | 58 | With this architecture, you can create an almost completely serverless mechanism that enables near real-time analysis in the speed layer. 59 | For logging, this architecture can send alarms when specific conditions occur, and it can also save all log data at low cost for a long time. For data processing, it performs ETL as needed and runs ad hoc queries directly against the log data, so you can identify and extract the data to be analyzed and visualize the data loaded into the DWH using BI tools. 60 | 61 | ## Outline of each lab 62 | The outline of each lab is as follows. 63 | 64 | ### Lab1: Getting started (Required) 65 | Build the common environment required by the following five labs. 66 | Using AWS CloudFormation (CloudFormation), an Amazon VPC (VPC) and an Amazon EC2 (EC2) instance are built, and the permissions for the labs are configured with AWS IAM (IAM). Then the log collection software Fluentd is installed manually. 67 | 68 | - The steps of Lab1 are [here](lab1/README.md) 69 | 70 | 71 | 72 | AWS services in this Lab: VPC, EC2, CloudFormation, IAM 73 | 74 | ### Lab2: Visualization of application logs in real time 75 | In this Lab, we visualize, in real time, the log data recorded on the EC2 instance configured in “Lab1: Getting Started”. 76 | The logs output from EC2 are streamed to Amazon Elasticsearch Service (Elasticsearch Service) using Fluentd (OSS). 77 | The data is then visualized using Kibana, which comes with Elasticsearch Service.
78 | 79 | - The steps of Lab2 are [here](lab2/README.md) 80 | 81 | 82 | 83 | AWS services in this Lab: Elasticsearch Service 84 | 85 | ### Lab3: Visualization of application logs in real time and alarm settings 86 | 87 | In addition to the visualization performed in "Lab2: Visualization of application logs in real time", in this section we set an alarm to detect errors. 88 | We add an alarm notification step before the data is sent from Fluentd to Elasticsearch Service. Amazon CloudWatch (CloudWatch) and AWS Lambda (Lambda) are used for the alarm. 89 | 90 | - The steps of Lab3 are [here](lab3/README.md) 91 | 92 | 93 | 94 | AWS services in this Lab: CloudWatch, Lambda, Elasticsearch Service 95 | 96 | ### Lab4: Application log persistence and long-term data analysis and visualization 97 | 98 | After sending the stream data to Amazon Kinesis Data Firehose (Kinesis Data Firehose), you can save the data to Amazon S3 (S3) for long-term storage. After that, ad hoc analysis is performed using Amazon Athena (Athena). You can also visualize the data using Amazon QuickSight (QuickSight). 99 | 100 | - The steps of Lab4 are [here](lab4/README.md) 101 | 102 | 103 | 104 | AWS services in this Lab: Kinesis Data Firehose, S3, Athena, QuickSight 105 | 106 | ### Lab5: Data analysis using DWH on AWS 107 | 108 | After sending the stream data to Kinesis Data Firehose, you can save the data to S3 for long-term storage. After that, use Amazon Redshift Spectrum (Redshift Spectrum) to execute queries and visualize the data with QuickSight. 109 | 110 | - The steps of Lab5 are [here](lab5/README.md) 111 | 112 | 113 | 114 | AWS services in this Lab: Kinesis Data Firehose, S3, Athena, Redshift, Redshift Spectrum, QuickSight 115 | 116 | ### Lab6: ETL processing of data using serverless 117 | 118 | After sending the stream data to Kinesis Data Firehose, you can save the data to S3 for long-term storage. After that, use AWS Glue (Glue) to (1) convert the file format to Apache Parquet and (2) place the files in partitioned storage, saving the results in S3. Then use Athena or Redshift Spectrum to execute queries and visualize the results with QuickSight. 119 | 120 | - The steps of Lab6 are [here](lab6/README.md) 121 | 122 | 123 | 124 | AWS services in this Lab: Glue, Athena 125 | 126 | 127 | ## Precautions throughout this hands-on workshop 128 | 1. This hands-on workshop assumes that you use resources in the “Tokyo Region”. If you hit the limit on the number of resources in this region, you can create the environment in the Northern Virginia Region instead. In that case, replace all descriptions of “Tokyo Region (ap-northeast-1)” with “Northern Virginia Region (us-east-1)” in each hands-on document. We have prepared asset materials for both regions, so please use the materials for your region of choice. 129 | 130 | 2. The “supplemental explanation” provided in each lab is not a mandatory part of this hands-on workshop. Please use it as reference material. 131 | 132 | 3. If several people run this hands-on workshop using the same AWS account, please be careful that resource names do not overlap with each other. 133 | 134 | 4. In each procedure, you can freely change the names of items that are marked as “optional”. With the exception of S3 resources, it is recommended that you use the names as they appear in the materials so that you do not lose track of them. 135 | 136 | 5. In each procedure, a link to the related Asset material is provided. If you open the material in a browser, it is displayed in HTML format. For readability, download the file if necessary and proceed with the lab using the downloaded material.
137 | 138 | 139 | -------------------------------------------------------------------------------- /EN/clean-up/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Clean up 9 | 10 | Follow the instructions below. The deletion targets are listed for each lab. 11 | **If you do not complete this work, you will continue to be charged.** 12 | 13 | ## Lab6 14 | 15 | 1. Delete the Glue job (job name: minilake1) 16 | 17 | 2. Remove crawlers from Glue (crawler name: minilake-in1, minilake-out1, minilake-out2) 18 | 19 | 3. Delete tables from Glue (table name: minilake\_out1, minilake\_out2) 20 | 21 | 4. Delete the database from Glue (database name: minilake) 22 | 23 | 24 | ## Lab5 25 | 26 | 1. Remove the QuickSight VPC connection 27 | 28 | 2. Unsubscribe from QuickSight (click the account name in QuickSight → [Manage QuickSight] → [Account settings] → [Unsubscribe]) 29 | 30 | 3. Delete the S3 bucket (bucket name: [S3 bucket name created by you]) 31 | 32 | 4. Delete the Kinesis Firehose delivery stream (stream name: minilake1) 33 | 34 | 5. Delete the Redshift cluster 35 | 36 | 6. Delete the table from Glue (table name: ec2log\_external) 37 | 38 | 7. Delete the database from Glue (database name: spectrumdb) 39 | 40 | 8. Remove the manually added rule (access from qs-rs-private-conn) from the inbound rules of the handson-minilake-private security group 41 | 42 | 9. Delete the security group qs-rs-private-conn 43 | 44 | 10. Remove the “handson-minilake-private-subnet” stack from the CloudFormation console 45 | 46 | 47 | ## Lab4 48 | 49 | 1. Unsubscribe from QuickSight (click the account name in QuickSight → [Manage QuickSight] → [Account settings] → [Unsubscribe]) 50 | 51 | 2. Delete the S3 bucket (bucket name: [S3 bucket name created by you]) 52 | 53 | 3. Delete the Kinesis Firehose delivery stream (stream name: minilake1) 54 | 55 | 4. Remove the crawler from Glue (crawler name: minilake1) 56 | 57 | 5. Delete tables from Glue (table name: minilake\_in1, minilake\_out1, ec2log\_external) 58 | 59 | 6. Delete databases from Glue (database name: minilake, spectrumdb) 60 | 61 | 62 | ## Lab1–3 63 | 64 | 1. Delete CloudWatch Logs log groups (log group names below) 65 | - Tokyo region 66 | - /minilake_group 67 | - /aws/lambda/LogsToElasticsearch_handson-minilake 68 | - /aws/kinesisfirehose/minilake1 69 | - Streams related to "minilake" under /aws-glue/crawlers/ 70 | - The streams under /aws-glue/jobs/error and /aws-glue/jobs/output are created per Job ID. If you had not used Glue before this hands-on, you can delete these log groups entirely. 71 | 72 | 2. Delete the Lambda function (function name: LogsToElasticsearch-handson\_minilake) 73 | 74 | 3. Delete the CloudWatch alarm (alarm name: minilake_errlog) 75 | 76 | ## Lab1–Lab2 77 | 78 | 1. Delete the Amazon Elasticsearch Service domain (domain name: handson-minilake) 79 | 80 | 2. Delete the stack with CloudFormation (this deletes EC2, the EIP, and so on) 81 | 82 | 3. Delete the IAM role (role name: handson-minilake) 83 | 84 | That's all for cleaning up.
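If you prefer to clean up from the command line, the sketch below removes some of the resources listed above with the AWS CLI. This is a minimal example that assumes the default resource names used in these labs and a CLI configured for the right region; the S3 bucket name is a placeholder for the one you created, and the console steps above remain the reference procedure.

```
# Lab4/Lab6: Glue resources (default names from the labs)
$ aws glue delete-job --job-name minilake1
$ aws glue delete-crawler --name minilake-in1
$ aws glue delete-database --name minilake

# Lab4/Lab5: Kinesis Firehose delivery stream and your S3 bucket (placeholder name)
$ aws firehose delete-delivery-stream --delivery-stream-name minilake1
$ aws s3 rb s3://your-bucket-name --force

# Lab1-2: Elasticsearch Service domain and the CloudFormation stack
$ aws es delete-elasticsearch-domain --domain-name handson-minilake
$ aws cloudformation delete-stack --stack-name handson-minilake
```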
85 | -------------------------------------------------------------------------------- /EN/images/architecture_all.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/images/architecture_all.png -------------------------------------------------------------------------------- /EN/images/architecture_lab1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/images/architecture_lab1.png -------------------------------------------------------------------------------- /EN/images/architecture_lab2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/images/architecture_lab2.png -------------------------------------------------------------------------------- /EN/images/architecture_lab3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/images/architecture_lab3.png -------------------------------------------------------------------------------- /EN/images/architecture_lab4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/images/architecture_lab4.png -------------------------------------------------------------------------------- /EN/images/architecture_lab5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/images/architecture_lab5.png -------------------------------------------------------------------------------- /EN/images/architecture_lab6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/images/architecture_lab6.png -------------------------------------------------------------------------------- /EN/lab1/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Lab1: Getting started 9 | Build the common environment that is a prerequisite for the following five labs. 10 | Using AWS CloudFormation (CloudFormation), an Amazon VPC (VPC) and an Amazon EC2 (EC2) instance are built and the permissions are configured appropriately in AWS IAM (IAM). Then we will manually install Fluentd, the log collection software. 11 | 12 | ## Section1: Before you start 13 | ### Step1: Log in to the AWS Management Console 14 | 15 | 1. Log in to the AWS Management Console. After logging in, confirm that **[Tokyo]** is set in the region selection in the header section at the top right of the screen. 16 | 17 | **Note:** If it is not **[Tokyo]**, please change it to **[Tokyo]**. 18 | 19 | 2. Select **EC2** from the list of services in the AWS Management Console. From the left pane of **[EC2 Dashboard]**, click **[Key Pairs]**, click the **[Create Key Pair]** button, enter any value for **[Key pair name]** (Example: handson) and click **[Create]**. The private key (Example: handson.pem) is downloaded to your PC. 20 | **Note:** If you are using an existing key pair, skip this step.
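As a supplement to step 2, a key pair can also be created with the AWS CLI instead of the console. A minimal sketch, assuming the CLI is already configured for your account and the Tokyo region; the key name "handson" follows the example above:

```
$ aws ec2 create-key-pair --key-name handson \
    --query 'KeyMaterial' --output text > handson.pem
$ chmod 600 handson.pem
```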
21 | 22 | ## Section2: EC2 environment 23 | ### Step1: Launch an EC2 instance with CloudFormation 24 | 25 | Use CloudFormation to create a VPC and launch into the VPC an EC2 instance that continuously outputs logs. About 10 log entries appear every 2 minutes, and 300 error entries appear every 10 minutes. 26 | 27 | 1. Select **CloudFormation** from the list of services in the AWS Management Console. 28 | 29 | **Note:** If you cannot find CloudFormation, enter a part of the word, such as “cloudform”, in the search window and select it. 30 | 31 | 2. On the **[CloudFormation]** dashboard, click **[Create stack]** at the top right of the dashboard. 32 | 33 | 3. On the **[Create stack]** screen, select **[Template is ready]** in **[Prerequisite - Prepare template]**. 34 | 35 | **Note:** If it is selected by default, proceed to the next step, leaving the default as it is. 36 | 37 | 4. Then, in **[Specify template]** of the **[Create stack]** screen, select **[Upload a template file]**, click **[Choose file]**, specify the downloaded template "**1-minilake_ec2.yaml**" and click **[Next]**. 38 | 39 | **Asset** resource: [1-minilake_ec2.yaml](asset/ap-northeast-1/1-minilake_ec2.yaml) 40 | 41 | 5. Specify "**handson-minilake** (optional)" as **[Stack name]**, specify "**handson.pem** (optional)" created in **Section1** (or the name of your existing key pair) as **[KeyPair]**, and enter **handson-minilake-role** (optional) for RoleName. Then click **[Next]**. 42 | 43 | 6. In the optional **tag**, enter "**Name**" for **Key** and "**handson-minilake** (optional)" for **Value**, then click **[Next]**. 44 | 45 | 7. Review the contents of the final confirmation page, check "**I acknowledge that AWS CloudFormation might create IAM resources with custom names.**", then click **[Create stack]**. After a few minutes, EC2 launches and logs start to be output to **/root/es-demo/testapp.log**. 46 | 47 | **Note:** If you are logging in with SSM, we recommend taking a break of about 10 minutes, because there may be a time lag between when the instance starts and when SSM connectivity becomes available. 48 | 49 | 8. Log in to EC2 **with SSH and switch to root**. Then you can check the logs appearing every 2 minutes. 50 | 51 | **Note:** See [here](additional_info_lab1.md#EC2へのログイン方法) for EC2 login instructions. For the IP address of the EC2 connection destination, select the appropriate CloudFormation stack from the **[CloudFormation]** dashboard and click the **[Outputs]** tab. The **[AllowIPAddress]** field contains the IP address information. 52 | 53 | ``` 54 | $ sudo su - 55 | # tail -f /root/es-demo/testapp.log 56 | ``` 57 | 58 | **[Log output example]** 59 | 60 | ``` 61 | [2019-09-16 15:14:01+0900] WARNING prd-db02 uehara 1001 [This is Warning.] 62 | [2019-09-16 15:14:01+0900] INFO prd-db02 uehara 1001 [This is Information.] 63 | [2019-09-16 15:14:01+0900] INFO prd-web002 uchida 1001 [This is Information.] 64 | [2019-09-16 15:18:01+0900] INFO prd-ap001 uehara 1001 [This is Information.] 65 | [2019-09-16 15:18:01+0900] ERROR prd-db02 uchida 1001 [This is ERROR.] 66 | ```
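As a supplement to the note in step 8, the stack outputs, including **[AllowIPAddress]**, can also be read with the AWS CLI instead of the console. A sketch assuming the stack name "handson-minilake" from step 5:

```
$ aws cloudformation describe-stacks --stack-name handson-minilake \
    --query 'Stacks[0].Outputs' --output table
```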
67 | 68 | ## Section3: Conclusion 69 | 70 | Using CloudFormation, we configured the following: 71 | 72 | 1. You created a VPC and an EC2 instance that writes about 10 log entries every 2 minutes and 300 error entries every 10 minutes. 73 | 2. You granted the EC2 instance you built in your VPC permission to access AWS resources. Please see [here](./additional_info_lab1_IAM.md) for details. 74 | 3. You installed the log collection software Fluentd on the EC2 instance you built. Please see [here](./additional_info_lab1_Fluentd.md) for details. 75 | 76 | 77 | 78 | That's it for Lab1. Try the following procedure according to the path you have selected. 79 | 80 | (1) Implementation of a near real-time data analysis environment (speed layer): [Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) 81 | (2) Implementation of an environment for batch analysis of long-term data (batch layer), with optimization of performance and cost: [Lab1](../lab1/README.md) → [Lab4](../lab4/README.md) or [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md) 82 | (3) All labs: [Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) → [Lab4](../lab4/README.md) → [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md) 83 | 84 | Please follow [these instructions](../clean-up/README.md) when deleting the environment. -------------------------------------------------------------------------------- /EN/lab1/additional_info_lab1.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Addendum: How to log in to EC2 9 | Here are three ways to log in to EC2. 10 | 11 | 1. **Windows**: Log in using **Tera Term** 12 | 13 | 2. **Mac / Linux**: Log in using **Terminal** 14 | 15 | 3. **Windows / Mac / Linux**: Log in using **AWS Systems Manager Session Manager** (**Session Manager**) 16 | 17 | To log in to EC2 using Windows or Mac, the following information is required. 18 | 19 | - The private key file of the key pair specified when creating the instance (e.g. **handson.pem**) 20 | 21 | **Note:** For the creation procedure, see the Lab1 procedure. 22 | 23 | - The public IP address assigned to the instance 24 | 25 | ### Addendum: EC2 public IP address confirmation procedure 26 | 1. Log in to the AWS Management Console and select **EC2** from the list of services in the AWS Management Console. 27 | 28 | 2. From the left pane of the **[EC2 Dashboard]** screen, select **[Instances]**. 29 | 30 | 3. Select the corresponding instance from the instance list, copy the contents described in **[Public DNS (IPv4)]** from the contents of the **[Description]** tab at the bottom of the screen, and put it in a notepad on your computer to use later. 31 | 32 |
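The same information can be looked up with the AWS CLI instead of the console. A minimal sketch, assuming the instance carries the Name tag "handson-minilake" set in Lab1; change the query to PublicDnsName if you want the DNS name instead:

```
$ aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=handson-minilake" \
    --query 'Reservations[].Instances[].PublicIpAddress' --output text
```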
33 | ## 1. For Windows 34 | 35 | Log in to EC2 from Windows using the following procedure. 36 | 37 | 1. Launch Tera Term (ttssh.exe) 38 | 39 | **Note:** The module can be downloaded from “https://osdn.net/projects/ttssh2/”. 40 | 41 | 2. Enter the **[Public DNS]** of the instance into **[Host]** 42 | 43 | 3. Select **[SSH2]** for the **[SSH version]**, then click OK 44 | 45 | 4. When the screen below is displayed, click **[Continue]** 46 | 47 | 48 | 5. Enter "**ec2-user**" for the user name 49 | 50 | 6. Select **[Use RSA/DSA/ECDSA/ED25519 key]**. 51 | 52 | 7. Click **[Private key file]**, select the key pair file on your computer, **[key pair name].pem** (e.g. handson.pem), and connect. 53 | 54 | **Note:** Select "All files (\*.\*)" to view all files, including the private key file. 55 | 56 | ## 2. For Mac / Linux 57 | 58 | Use the following procedure to log in to EC2 from Mac / Linux. 59 | 60 | 1. Log in at the command line from the terminal. 61 | 62 | **Note:** Connection is not possible unless the permission of the private key file (pem file) is set to **600** in advance. 63 | 64 | ``` 65 | $ chmod 600 ~/Downloads/handson.pem 66 | $ ssh -i ~/Downloads/handson.pem ec2-user@[assigned public ip address] 67 | ``` 68 | 69 | 2. You will be asked "Are you sure you want to continue connecting (yes/no)?". Enter "yes" and log in. 70 | 71 | 72 | ## 3. For Session Manager 73 | 74 | Log in to EC2 from Session Manager as follows. 75 | 76 | **Note:** The required IAM roles between EC2 and Session Manager are already in place when you run AWS CloudFormation. Please see [here](./additional_info_lab1_IAM.md) for details. 77 | 78 | 1. Select **Systems Manager** from the list of services in the AWS Management Console, select **[Session Manager]**, and click **[Start session]**. 79 | 80 | 2. Specify the EC2 instance ID to be logged in to and click **[Start session]** to log in to EC2. 81 | 82 | **Note:** It may take about 5 minutes until the corresponding instance is displayed. 83 | 84 | 3. A command line appears in the browser. Execute the following command and switch to the **ec2-user** user. 85 | **Note:** By default, you are logged in as the user **ssm-user**. 86 | 87 | ``` 88 | $ whoami 89 | $ sudo su - ec2-user 90 | ``` 91 | 92 | 93 | ### Supplement: What to do if the instance is not displayed in the target instance list 94 | 95 | Select **EC2** from the list of services in the AWS Management Console, and confirm that the **[Status Checks]** of the instance "**handson-minilake** (optional)" created this time shows **[2/2 checks passed]**. If it is still initializing, wait for it to complete and check the Systems Manager target instances again. 96 | 97 | If the corresponding instance is not displayed even though initialization is complete, select **EC2** from the list of services in the AWS Management Console, select **[Instances]** from the left pane of the **[EC2 Dashboard]** screen, check the instance "**handson-minilake** (optional)" created this time, and click **[Actions] → [Instance State] → [Reboot]**. 98 | Then check the target instances of Systems Manager again. 99 | 100 | 101 | ## Checkpoints when SSH login does not work 102 | - Is the instance fully launched? 103 | - Did you configure the instance as specified at launch? 104 | - Is the destination IP address or host name correct? 105 | - Does the specified security group allow port 22 (SSH) or 3389 (RDP)? 106 | - Did you specify the key file that corresponds to the specified key pair? 107 | - Is the permission of the private key file 600? (When connecting from Mac / Linux) 108 | - Is communication established between EC2 and SSM? (When using Session Manager) 109 | 110 | 111 |
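As a supplement to the last checkpoint, EC2-to-SSM connectivity can be checked from the AWS CLI, and a session can also be started without the console. A sketch assuming the AWS CLI and the Session Manager plugin are installed on your PC; the instance ID is a placeholder:

```
# List instances registered with Systems Manager and their ping status
$ aws ssm describe-instance-information \
    --query 'InstanceInformationList[].[InstanceId,PingStatus]' --output table

# Start a session from the CLI (requires the Session Manager plugin)
$ aws ssm start-session --target i-0123456789abcdef0
```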
-------------------------------------------------------------------------------- /EN/lab1/additional_info_lab1_Fluentd.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Addendum: Installing Fluentd 9 | 10 | AWS CloudFormation launches EC2 from an AMI that already has the following steps applied. 11 | If you want to install Fluentd on any other EC2 instance, do the following. 12 | 13 | 1. Log in to EC2 and install redhat-lsb-core and gcc. 14 | 15 | **Note:** This step can be skipped because these packages are already installed on the prepared AMI. 16 | **Asset**: [1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 17 | 18 | ``` 19 | $ sudo su - 20 | # yum -y install redhat-lsb-core gcc 21 | ``` 22 | 23 | 2. Install td-agent. 24 | 25 | **Asset**: [1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 26 | 27 | ``` 28 | # rpm -ivh http://packages.treasuredata.com.s3.amazonaws.com/3/redhat/6/x86_64/td-agent-3.1.1-0.el6.x86_64.rpm 29 | ``` 30 | 31 | 3. Modify the **TD\_AGENT\_USER** specification in line 18 of **/etc/init.d/td-agent** from **td-agent** to **root**. 32 | 33 | **Note:** The command example uses the vi editor, but if you are more familiar with another editor, feel free to use it. 34 | **Asset**: [1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 35 | 36 | ``` 37 | # vi /etc/init.d/td-agent 38 | ``` 39 | 40 | **[Before change]** 41 | 42 | ``` 43 | TD_AGENT_USER=td-agent 44 | ``` 45 | 46 | **[After change]** 47 | 48 | ``` 49 | TD_AGENT_USER=root 50 | ``` 51 | 52 | 4. Set up Fluentd to start automatically. (The actual startup will take place later.) 53 | 54 | **Asset**: [1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 55 | 56 | ``` 57 | # chkconfig td-agent on 58 | ``` 59 | -------------------------------------------------------------------------------- /EN/lab1/additional_info_lab1_IAM.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Addendum: Creating and Attaching an IAM Role to EC2 9 | 10 | The following steps create an IAM role that gives EC2 permission to access AWS resources. In this hands-on, AWS CloudFormation automates these settings. 11 | 12 | 1. In the AWS Management Console, select **IAM** from the list of services, and then select **[Roles]** from the left pane of the **[Identity and Access Management (IAM)]** screen. 13 | 14 | 2. Click **[Create Role]**. 15 | 16 | 3. Select **[AWS Services]**, select EC2, and then click **[Next]**. 17 | 18 | 4. In the **[Add Permissions]** screen, click **[Next]** without making any changes. 19 | 20 | **Note:** At this stage, you create the role without a policy. Only **[AmazonEC2RoleforSSM]** is attached, for Session Manager users. 21 | 22 | 5. On the **[Name, review, and create]** screen, type **"handson-minilake (optional)"** for the role name, and then click **[Create role]**. 23 | 24 | 6. Select **[EC2]** from the service list in the AWS Management Console, select **[Instances]** from the left pane of the **[EC2 Dashboard]**, check the "**handson-minilake** (optional)" instance you just created, and click **[Actions]** > **[Security]** > **[Change IAM Role]**. 25 | 26 | 7. In the **[Change IAM Role]** screen, select "**handson-minilake** (optional)" for the IAM role, and click **[Save]**.
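For reference, the same role setup can be sketched with the AWS CLI. This is a minimal example under the assumption that you attach the **AmazonSSMManagedInstanceCore** policy used by the CloudFormation template; the instance ID is a placeholder:

```
# Trust policy that lets EC2 assume the role
$ cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
$ aws iam create-role --role-name handson-minilake \
    --assume-role-policy-document file://trust.json
$ aws iam attach-role-policy --role-name handson-minilake \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
$ aws iam create-instance-profile --instance-profile-name handson-minilake
$ aws iam add-role-to-instance-profile --instance-profile-name handson-minilake \
    --role-name handson-minilake
$ aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 \
    --iam-instance-profile Name=handson-minilake
```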
27 | 28 | -------------------------------------------------------------------------------- /EN/lab1/asset/ap-northeast-1/1-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | yum -y install redhat-lsb-core gcc 10 | 11 | 2. 12 | rpm -ivh http://packages.treasuredata.com.s3.amazonaws.com/3/redhat/6/x86_64/td-agent-3.1.1-0.el6.x86_64.rpm 13 | 14 | 3. 15 | vi /etc/init.d/td-agent 16 | 17 | 4. 18 | chkconfig td-agent on 19 | -------------------------------------------------------------------------------- /EN/lab1/asset/ap-northeast-1/1-minilake_ec2.yaml: -------------------------------------------------------------------------------- 1 | Parameters: 2 | KeyPair: 3 | Description: Name of an existing EC2 KeyPair to enable SSH access to the instance 4 | Type: "AWS::EC2::KeyPair::KeyName" 5 | MinLength: '1' 6 | MaxLength: '255' 7 | AllowedPattern: '[\x20-\x7E]*' 8 | ConstraintDescription: can contain only ASCII characters. 9 | RoleName: 10 | Description: Set role name 11 | Type: String 12 | MinLength: '1' 13 | MaxLength: '255' 14 | AllowedPattern: '[\x20-\x7E]*' 15 | Resources: 16 | # Create VPC 17 | MyVPC: 18 | Type: AWS::EC2::VPC 19 | Properties: 20 | CidrBlock: 10.0.0.0/24 21 | EnableDnsSupport: 'true' 22 | EnableDnsHostnames: 'true' 23 | InstanceTenancy: default 24 | Tags: 25 | - Key: Name 26 | Value: handson-minilake 27 | # Create Public RouteTable 28 | PublicRouteTable: 29 | Type: AWS::EC2::RouteTable 30 | Properties: 31 | VpcId: !Ref MyVPC 32 | Tags: 33 | - Key: Name 34 | Value: handson-minilake 35 | # Create Public Subnet A 36 | PublicSubnetA: 37 | Type: AWS::EC2::Subnet 38 | Properties: 39 | VpcId: !Ref MyVPC 40 | CidrBlock: 10.0.0.0/27 41 | AvailabilityZone: "ap-northeast-1a" 42 | Tags: 43 | - Key: Name 44 | Value: handson-minilake 45 | PubSubnetARouteTableAssociation: 46 | Type: AWS::EC2::SubnetRouteTableAssociation 47 | Properties: 48 | SubnetId: !Ref PublicSubnetA 49 | RouteTableId: !Ref PublicRouteTable 50 | # Create InternetGateway 51 | myInternetGateway: 52 | Type: "AWS::EC2::InternetGateway" 53 | Properties: 54 | Tags: 55 | - Key: Name 56 | Value: handson-minilake 57 | AttachGateway: 58 | Type: AWS::EC2::VPCGatewayAttachment 59 | Properties: 60 | VpcId: !Ref MyVPC 61 | InternetGatewayId: !Ref myInternetGateway 62 | myRoute: 63 | Type: AWS::EC2::Route 64 | DependsOn: myInternetGateway 65 | Properties: 66 | RouteTableId: !Ref PublicRouteTable 67 | DestinationCidrBlock: 0.0.0.0/0 68 | GatewayId: !Ref myInternetGateway 69 | MyEIP: 70 | Type: "AWS::EC2::EIP" 71 | Properties: 72 | Domain: vpc 73 | ElasticIPAssociate: 74 | DependsOn: MyEC2Instance 75 | Type: AWS::EC2::EIPAssociation 76 | Properties: 77 | AllocationId: !GetAtt MyEIP.AllocationId 78 | InstanceId: !Ref MyEC2Instance 79 | MyEC2Instance: 80 | Type: 'AWS::EC2::Instance' 81 | Properties: 82 | ImageId: ami-08c23a10b77e0835b 83 | InstanceType: t3.micro 84 | SubnetId: !Ref PublicSubnetA 85 | KeyName : 86 | Ref: KeyPair 87 | SecurityGroupIds: 88 | - Ref: MyEC2SecurityGroup 89 | IamInstanceProfile: !Ref InstanceProfile 90 | Tags: 91 | - Key: Name 92 | Value: handson-minilake 93 | MyEC2SecurityGroup: 94 | Type: 'AWS::EC2::SecurityGroup' 95 | Properties: 96 | GroupName: handson-minilake-sg 97 | 
GroupDescription: Enable SSH access via port 22 98 | VpcId: !Ref MyVPC 99 | SecurityGroupIngress: 100 | - IpProtocol: tcp 101 | FromPort: '22' 102 | ToPort: '22' 103 | CidrIp: 104 | '0.0.0.0/0' 105 | - IpProtocol: tcp 106 | FromPort: '5439' 107 | ToPort: '5439' 108 | CidrIp: 109 | '0.0.0.0/0' 110 | minilaketestrole: 111 | Type: 'AWS::IAM::Role' 112 | Properties: 113 | AssumeRolePolicyDocument: 114 | Version: '2012-10-17' 115 | Statement: 116 | - Effect: Allow 117 | Principal: 118 | Service: 119 | - ec2.amazonaws.com 120 | Action: 121 | - 'sts:AssumeRole' 122 | Path: / 123 | RoleName: !Ref RoleName 124 | ManagedPolicyArns: 125 | - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore 126 | InstanceProfile: 127 | Type: 'AWS::IAM::InstanceProfile' 128 | Properties: 129 | Path: '/' 130 | Roles: 131 | - !Ref minilaketestrole 132 | Outputs: 133 | AllowIPAddress: 134 | Description: EC2 PublicIP 135 | Value: !Join 136 | - ',' 137 | - - !Ref MyEIP 138 | -------------------------------------------------------------------------------- /EN/lab1/asset/us-east-1/1-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | yum -y install redhat-lsb-core gcc 10 | 11 | 2. 12 | rpm -ivh http://packages.treasuredata.com.s3.amazonaws.com/3/redhat/6/x86_64/td-agent-3.1.1-0.el6.x86_64.rpm 13 | 14 | 3. 15 | vi /etc/init.d/td-agent 16 | 17 | 4. 18 | chkconfig td-agent on 19 | -------------------------------------------------------------------------------- /EN/lab1/asset/us-east-1/1-minilake_ec2.yaml: -------------------------------------------------------------------------------- 1 | Parameters: 2 | InstanceType: 3 | Description: EC2 instance type 4 | Type: String 5 | Default: t3.micro 6 | AllowedValues: 7 | - t2.micro 8 | - t3.micro 9 | ConstraintDescription: must be a valid EC2 instance type. 10 | KeyPair: 11 | Description: Name of an existing EC2 KeyPair to enable SSH access to the instance 12 | Type: "AWS::EC2::KeyPair::KeyName" 13 | MinLength: '1' 14 | MaxLength: '255' 15 | AllowedPattern: '[\x20-\x7E]*' 16 | ConstraintDescription: can contain only ASCII characters. 
17 | Resources: 18 | # Create VPC 19 | MyVPC: 20 | Type: AWS::EC2::VPC 21 | Properties: 22 | CidrBlock: 10.0.0.0/24 23 | EnableDnsSupport: 'true' 24 | EnableDnsHostnames: 'true' 25 | InstanceTenancy: default 26 | Tags: 27 | - Key: Name 28 | Value: handson-minilake 29 | # Create Public RouteTable 30 | PublicRouteTable: 31 | Type: AWS::EC2::RouteTable 32 | Properties: 33 | VpcId: !Ref MyVPC 34 | Tags: 35 | - Key: Name 36 | Value: handson-minilake 37 | # Create Public Subnet A 38 | PublicSubnetA: 39 | Type: AWS::EC2::Subnet 40 | Properties: 41 | VpcId: !Ref MyVPC 42 | CidrBlock: 10.0.0.0/27 43 | AvailabilityZone: "us-east-1a" 44 | Tags: 45 | - Key: Name 46 | Value: handson-minilake 47 | PubSubnetARouteTableAssociation: 48 | Type: AWS::EC2::SubnetRouteTableAssociation 49 | Properties: 50 | SubnetId: !Ref PublicSubnetA 51 | RouteTableId: !Ref PublicRouteTable 52 | # Create InternetGateway 53 | myInternetGateway: 54 | Type: "AWS::EC2::InternetGateway" 55 | Properties: 56 | Tags: 57 | - Key: Name 58 | Value: handson-minilake 59 | AttachGateway: 60 | Type: AWS::EC2::VPCGatewayAttachment 61 | Properties: 62 | VpcId: !Ref MyVPC 63 | InternetGatewayId: !Ref myInternetGateway 64 | myRoute: 65 | Type: AWS::EC2::Route 66 | DependsOn: myInternetGateway 67 | Properties: 68 | RouteTableId: !Ref PublicRouteTable 69 | DestinationCidrBlock: 0.0.0.0/0 70 | GatewayId: !Ref myInternetGateway 71 | MyEIP: 72 | Type: "AWS::EC2::EIP" 73 | Properties: 74 | Domain: vpc 75 | ElasticIPAssociate: 76 | DependsOn: MyEC2Instance 77 | Type: AWS::EC2::EIPAssociation 78 | Properties: 79 | AllocationId: !GetAtt MyEIP.AllocationId 80 | InstanceId: !Ref MyEC2Instance 81 | MyEC2Instance: 82 | Type: 'AWS::EC2::Instance' 83 | Properties: 84 | ImageId: ami-010bdc9d71b0090aa 85 | InstanceType: t3.micro 86 | SubnetId: !Ref PublicSubnetA 87 | KeyName : 88 | Ref: KeyPair 89 | SecurityGroupIds: 90 | - Ref: MyEC2SecurityGroup 91 | Tags: 92 | - Key: Name 93 | Value: handson-minilake 94 | MyEC2SecurityGroup: 95 | Type: 'AWS::EC2::SecurityGroup' 96 | Properties: 97 | GroupName: handson-minilake-sg 98 | GroupDescription: Enable SSH access via port 22 99 | VpcId: !Ref MyVPC 100 | SecurityGroupIngress: 101 | - IpProtocol: tcp 102 | FromPort: '22' 103 | ToPort: '22' 104 | CidrIp: 105 | '0.0.0.0/0' 106 | - IpProtocol: tcp 107 | FromPort: '5439' 108 | ToPort: '5439' 109 | CidrIp: 110 | '0.0.0.0/0' 111 | Outputs: 112 | AllowIPAddress: 113 | Description: EC2 PublicIP 114 | Value: !Join 115 | - ',' 116 | - - !Ref MyEIP 117 | -------------------------------------------------------------------------------- /EN/lab1/images/windows_login_ec2_capture01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab1/images/windows_login_ec2_capture01.png -------------------------------------------------------------------------------- /EN/lab2/asset/ap-northeast-1/2-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 
1. 9 | td-agent-gem install -v 2.6.0 fluent-plugin-elasticsearch 10 | td-agent-gem list | grep plugin-elasticsearch 11 | 12 | *If the installation fails, you can uninstall as follows: 13 | td-agent-gem uninstall -v 2.6.0 fluent-plugin-elasticsearch 14 | 15 | 2. 16 | /etc/init.d/td-agent start 17 | 18 | 3. 19 | tail -f /var/log/td-agent/td-agent.log 20 | -------------------------------------------------------------------------------- /EN/lab2/asset/ap-northeast-1/2-dashboard.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "_id": "test1-dashboard", 4 | "_type": "dashboard", 5 | "_source": { 6 | "title": "test1-dashboard", 7 | "hits": 0, 8 | "description": "", 9 | "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test1-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test1-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":6,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 100\":\"rgb(0,104,55)\"}}},\"id\":\"test1-count\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"id\":\"test1-ranking\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"id\":\"test1-username\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\",\"7 - 13\":\"rgb(199,233,192)\"}}},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":6,\"y\":0,\"w\":5,\"h\":2,\"i\":\"10\"},\"id\":\"test1-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"}]", 10 | "optionsJSON": "{\"darkTheme\":false,\"hidePanelTitles\":false,\"useMargins\":false}", 11 | "version": 1, 12 | "timeRestore": false, 13 | "kibanaSavedObjectMeta": { 14 | "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}" 15 | } 16 | } 17 | } 18 | ] -------------------------------------------------------------------------------- /EN/lab2/asset/ap-northeast-1/2-td-agent.conf: -------------------------------------------------------------------------------- 1 | <source> 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?<time>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *?
(?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/ 6 | time_format %d/%b/%Y:%H:%M:%S %z 7 | types size:integer, status:integer, reqtime:float, runtime:float, time:time 8 | tag testappec2.log 9 | </source> 10 | 11 | <match testappec2.log> 12 | type_name testappec2log 13 | @type elasticsearch 14 | include_tag_key true 15 | tag_key @log_name 16 | host eshost 17 | port 443 18 | scheme https 19 | logstash_format true 20 | logstash_prefix testappec2log 21 | flush_interval 10s 22 | retry_limit 5 23 | buffer_type file 24 | buffer_path /var/log/td-agent/buffer/testapp.log.buffer 25 | reload_connections false 26 | </match> 27 | -------------------------------------------------------------------------------- /EN/lab2/asset/us-east-1/2-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | td-agent-gem install -v 2.6.0 fluent-plugin-elasticsearch 10 | td-agent-gem list | grep plugin-elasticsearch 11 | 12 | *If the installation fails, you can uninstall as follows: 13 | td-agent-gem uninstall -v 2.6.0 fluent-plugin-elasticsearch 14 | 15 | 2. 16 | /etc/init.d/td-agent start 17 | 18 | 3. 19 | tail -f /var/log/td-agent/td-agent.log 20 | -------------------------------------------------------------------------------- /EN/lab2/asset/us-east-1/2-dashboard.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "_id": "test1-dashboard", 4 | "_type": "dashboard", 5 | "_source": { 6 | "title": "test1-dashboard", 7 | "hits": 0, 8 | "description": "", 9 | "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test1-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test1-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":6,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 100\":\"rgb(0,104,55)\"}}},\"id\":\"test1-count\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"id\":\"test1-ranking\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"id\":\"test1-username\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\",\"7 -
13\":\"rgb(199,233,192)\"}}},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":6,\"y\":0,\"w\":5,\"h\":2,\"i\":\"10\"},\"id\":\"test1-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"}]", 10 | "optionsJSON": "{\"darkTheme\":false,\"hidePanelTitles\":false,\"useMargins\":false}", 11 | "version": 1, 12 | "timeRestore": false, 13 | "kibanaSavedObjectMeta": { 14 | "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}" 15 | } 16 | } 17 | } 18 | ] -------------------------------------------------------------------------------- /EN/lab2/asset/us-east-1/2-td-agent.conf: -------------------------------------------------------------------------------- 1 | 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?[^ ]* [^ ]*)\] (?[^ ]*) *? (?[^ ]*) * (?[^ ]*) * (?.*) \[(?.*)\]$/ 6 | time_format %d/%b/%Y:%H:%M:%S %z 7 | types size:integer, status:integer, reqtime:float, runtime:float, time:time 8 | tag testappec2.log 9 | 10 | 11 | 12 | type_name testappec2log 13 | @type elasticsearch 14 | include_tag_key true 15 | tag_key @log_name 16 | host eshost 17 | port 443 18 | scheme https 19 | logstash_format true 20 | logstash_prefix testappec2log 21 | flush_interval 10s 22 | retry_limit 5 23 | buffer_type file 24 | buffer_path /var/log/td-agent/buffer/testapp.log.buffer 25 | reload_connections false 26 | 27 | -------------------------------------------------------------------------------- /EN/lab2/images/Lab2-Section1-Step1-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/Lab2-Section1-Step1-4.png -------------------------------------------------------------------------------- /EN/lab2/images/kibana_capture01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/kibana_capture01.png -------------------------------------------------------------------------------- /EN/lab2/images/kibana_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/kibana_dashboard.png -------------------------------------------------------------------------------- /EN/lab2/images/kibana_discover.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/kibana_discover.png -------------------------------------------------------------------------------- /EN/lab2/images/kibana_management.png: -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- /EN/lab2/images/Lab2-Section1-Step1-4.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/Lab2-Section1-Step1-4.png
-------------------------------------------------------------------------------- /EN/lab2/images/kibana_capture01.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/kibana_capture01.png
-------------------------------------------------------------------------------- /EN/lab2/images/kibana_dashboard.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/kibana_dashboard.png
-------------------------------------------------------------------------------- /EN/lab2/images/kibana_discover.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/kibana_discover.png
-------------------------------------------------------------------------------- /EN/lab2/images/kibana_management.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab2/images/kibana_management.png
-------------------------------------------------------------------------------- /EN/lab3/README.md: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | # Lab3:Visualization of application logs in real time and alarm settings
9 | 
10 | In addition to the visualization performed in "Lab2: Visualization of application logs in real time", in this section we set up alarm detection.
11 | We add alarm notification in front of the path from Fluentd to Elasticsearch Service. Amazon CloudWatch (CloudWatch) and AWS Lambda (Lambda) are used for the alarm.
12 | 
13 | 
14 | ## Section1:EC2 settings
15 | ### Step1:IAM role settings
16 | 
17 | Add a policy to the created "**handson-minilake** (optional)" IAM role as follows.
18 | 
19 | 1. Select **IAM** from the list of services in the AWS Management Console, select **[Roles]** in the left pane of the **[Identity and Access Management (IAM)]** dashboard, and click the role name "**handson-minilake** (optional)".
20 | 
21 | 2. Select the **[Permissions]** tab and click **[Attach policies]**.
22 | 
23 | 3. Find **[CloudWatchLogsFullAccess]** using the search window, check it, and click **[Attach policy]**.
24 | 
25 | 4. Click the name of the changed role again, select the **[Permissions]** tab, and confirm that **[CloudWatchLogsFullAccess]** is attached.
26 | 
27 | ### Step2:Fluentd settings
28 | 
29 | Configure settings for sending log data from Fluentd to CloudWatch Logs.
30 | 
31 | 1. Log in to EC2 and install the CloudWatch Logs plugin.
32 | 
33 | **Asset** resource:[3-cmd.txt](asset/ap-northeast-1/3-cmd.txt)
34 | 
35 | ```
36 | $ sudo su -
37 | # td-agent-gem install fluent-plugin-cloudwatch-logs -v 0.4.4
38 | ```
39 | 
40 | 2. Confirm plugin installation.
41 | 
42 | **Asset** resource:[3-cmd.txt](asset/ap-northeast-1/3-cmd.txt)
43 | 
44 | ```
45 | # td-agent-gem list | grep cloudwatch-logs
46 | ```
47 | 
48 | **[Execution result example]**
49 | 
50 | ```
51 | fluent-plugin-cloudwatch-logs (0.4.4)
52 | ```
53 | 
54 | 3. To replace the settings in "**/etc/td-agent/td-agent.conf**", first delete its current contents. Open it with an editor such as vi and delete everything with ":%d".
55 | 
56 | ```
57 | # vi /etc/td-agent/td-agent.conf
58 | ```
59 | 
60 | 4. Copy the contents of "**3-td-agent.conf**" in **Asset** resource and paste them.
61 | 
62 | **Asset** resource:[3-td-agent.conf](asset/ap-northeast-1/3-td-agent.conf)
63 | 
64 | 5. Open the file "**/etc/init.d/td-agent**" and add the following line around line 14.
65 | 
66 | ```
67 | # vi /etc/init.d/td-agent
68 | ```
69 | 
70 | **[Example of the line to add]**
71 | 
72 | ```
73 | export AWS_REGION="ap-northeast-1"
74 | ```
75 | 
76 | **Note:** If you are working in a different region, change the value accordingly.
77 | 
78 | 6. Restart Fluentd.
79 | 
80 | **Asset** resource:[3-cmd.txt](asset/ap-northeast-1/3-cmd.txt)
81 | 
82 | ```
83 | # /etc/init.d/td-agent restart
84 | ```
85 | 
86 | 7. Check the Fluentd log and confirm that no errors are being output continuously.
87 | 
88 | **Asset** resource:[3-cmd.txt](asset/ap-northeast-1/3-cmd.txt)
89 | 
90 | ```
91 | # tail -f /var/log/td-agent/td-agent.log
92 | ```
93 | 
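*Optional:* you can also confirm delivery from the command line with the AWS CLI. This is a quick sketch assuming the log group and stream names used in this lab ("**minilake_group**" / "**testapplog_stream**"); adjust the region and names if you chose different ones.

```
# List the log streams created by the CloudWatch Logs plugin
aws logs describe-log-streams --region ap-northeast-1 --log-group-name minilake_group

# Fetch the newest events written by Fluentd
aws logs get-log-events --region ap-northeast-1 --log-group-name minilake_group \
    --log-stream-name testapplog_stream --limit 5
```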
94 | ## Section2:CloudWatch, Elasticsearch Service settings
95 | ### Step1:CloudWatch Logs settings
96 | 1. Select **CloudWatch** from the list of services in the AWS Management Console and then click **[Log groups]** in the left pane of the **[CloudWatch]** dashboard.
97 | 
98 | 2. Confirm that the log group "**minilake_group** (optional)" has been created and then click it.
99 | 
100 | **Note:** If there is no log after a few minutes, make sure that the IAM role is attached to the EC2 instance.
101 | 
102 | 3. Click the log stream "**testapplog_stream** (optional)" and confirm that the latest logs are output. Click **[Log Groups]** at the top of the screen to return to the log group list.
103 | 
104 | 4. Check the log group "**minilake_group** (optional)", click "**[Actions]**" and click **[Stream to Amazon Elasticsearch Service]**.
105 | 
106 | **Note:** A Lambda function is automatically created in the background.
107 | 
108 | 5. At **"Step 1: Choose Destination"**, select "**This Account**" for **[Select account]** and select "**handson-minilake** (optional)" in **[Amazon ES cluster]**. Select **[Create new IAM role]** in **[Lambda IAM Execution Role]**.
109 | 
110 | **Note:** If a pop-up blocker runs in your browser, allow pop-ups for this site and start over from the previous step.
111 | 
112 | 6. A role named "**lambda\_elasticsearch\_execution**" will be created. Click **[Allow]** at the bottom right.
113 | 
114 | 7. At **"Step 1: Choose Destination"**, simply click **[Next]**.
115 | 
116 | 8. At **"Step 2: Configure Log Format and Filters"**, select **[Other]** for **[Log Format]**, click **[Next]**, and then click **[Next]** at **"Step 3: Review"**. Click **[Start Streaming]** at **"Step 4: Confirmation"**.
117 | 
118 | 
119 | ### Step2:Elasticsearch Service settings
120 | 
121 | 1. Open the **Kibana** screen, click the ![kibana_management](../lab2/images/kibana_management.png) icon in the left pane of the **Kibana** screen, and then click **[Index Patterns]**.
122 | 
123 | 2. Click **[Create Index Pattern]**.
124 | 
125 | 3. Input "**cwl-***" in **[Index pattern]** and then click **[Next step]**.
126 | 
127 | **Note:** Indexing takes some time, so it may be a while before **[Next step]** becomes clickable.
128 | 
129 | 4. Select **[@timestamp]** in **[Time Filter field name]** and then click **[Create index pattern]**.
130 | 
131 | 5. Click the ![kibana_discover](../lab2/images/kibana_discover.png) icon in the left pane of the **Kibana** screen and select "**cwl-***" as the index. If values are being collected and graphed, you can move forward.
132 | 
133 | 6. Click the ![kibana_management](../lab2/images/kibana_management.png) icon in the left pane of the **Kibana** screen, and then click **[Saved Objects]**. Click **[Import]** at the top right of the screen.
134 | 
135 | 7. On the **[Saved Objects]** screen, click the **[Import]** icon, select "**3-visualization.json**" in **Asset** resource, and click the **[Import]** icon. At the next screen, select "**cwl-\***" and click **[Confirm all changes]** to complete the import. After importing without any errors, click **[Done]** to return to the original screen.
136 | 
137 | **Asset** resource:[3-visualization.json](asset/ap-northeast-1/3-visualization.json)
138 | 
139 | 8. Next, on the **[Import saved objects]** screen, click the **[Import]** icon again, select "**3-dashboard.json**" in **Asset** resource, and click the **[Import]** icon to import. After importing without any errors, click **[Done]** to return to the original screen.
140 | 
141 | **Asset** resource:[3-dashboard.json](asset/ap-northeast-1/3-dashboard.json)
142 | 
143 | ### Step3:CloudWatch Alarm settings
144 | 
145 | 1. Select **CloudWatch** from the list of services in the AWS Management Console, click **[Logs]**, check the log group "**minilake_group** (optional)" and click **[Create Metric Filter]**.
146 | 
147 | 2. Enter "**ERROR**" in the filter pattern, click **[Test Pattern]**, check the contents, and click **[Assign Metric]** at the bottom right of the screen.
148 | 
149 | 3. Enter "**minilake_errlog** (optional)" in **[Metric Name]** and click **[Create Filter]**.
150 | 
151 | 4. Your filter is created. Without closing the screen, click **[Create Alarm]** on the right side of the screen, and then set the alarm.
152 | 
153 | 5. To change the interval, click **[Edit]**, set **[Period]** to "**1 minute**", and click **[Select metric]**.
154 | 
155 | 6. Set as follows and click **[Next]**.
156 | 
157 | - Threshold type:Static
158 | - Define the alarm condition:Greater/Equal
159 | - Define the threshold value:50
160 | ▼Additional configuration
161 | - Datapoints to alarm:1/1
162 | 
163 | 7. Set as follows and click **[Create topic]** to create a topic.
164 | 
165 | - Select an SNS topic:Create new topic
166 | - Create a new topic…:Default\_CloudWatch\_Alarms\_Topic(Optional)
167 | - Email endpoints that will receive the notification…:Your email address that can receive email during this hands-on
168 | 
169 | **Note:** After registration, a confirmation email will be sent to the email address registered in this procedure. Click **[Confirm subscription]** in the email body.
170 | 
171 | 8. Now that you have all the settings you need, click **[Next]**.
172 | 
173 | 9. Enter "**minilake-handson-alarm** (optional)" in **[Alarm name]**, click **[Next]**, and click **[Create alarm]** on the subsequent screen.
174 | 
175 | 10. Click **[Alarms]** in **[CloudWatch]** and check the graph screen. Initially, the status is displayed as **[Insufficient data]**, but after a while it becomes **[OK]**. If 50 or more **ERROR** logs occur in one minute, the alarm is raised. Since the app is set to output 300 ERRORs every 10 minutes, an alarm will be raised about every 10 minutes.
176 | 
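For reference, the metric filter and alarm from steps 1–9 can also be created with the AWS CLI. This is a minimal sketch assuming the names used in this lab (log group "**minilake_group**", metric "**minilake_errlog**", alarm "**minilake-handson-alarm**"); replace the email address with your own.

```
# Metric filter: count log events that contain "ERROR"
aws logs put-metric-filter \
    --log-group-name minilake_group \
    --filter-name minilake_errlog \
    --filter-pattern "ERROR" \
    --metric-transformations metricName=minilake_errlog,metricNamespace=LogMetrics,metricValue=1

# SNS topic and email subscription for the notification
TOPIC_ARN=$(aws sns create-topic --name Default_CloudWatch_Alarms_Topic \
    --query TopicArn --output text)
aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol email \
    --notification-endpoint you@example.com

# Alarm: raise when 50 or more ERRORs arrive within a 1-minute period
aws cloudwatch put-metric-alarm \
    --alarm-name minilake-handson-alarm \
    --namespace LogMetrics --metric-name minilake_errlog \
    --statistic Sum --period 60 \
    --comparison-operator GreaterThanOrEqualToThreshold --threshold 50 \
    --evaluation-periods 1 \
    --alarm-actions "$TOPIC_ARN"
```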
177 | ### Note: Elasticsearch Service now supports event monitoring and alerts. Alerts are available for domains running Elasticsearch 6.2 or later. Click [here](additional_info_lab3.md) for detailed instructions that replace the applicable instructions in this hands-on.
178 | 
179 | ## Section3:Summary
180 | 
181 | We were able to build a real-time log visualization and monitoring environment.
182 | 
183 | 
184 | 
185 | That's it for Lab3. Try the following procedure according to the path you have selected.
186 | 
187 | (1) Implementation of near real-time data analysis environment (speed layer):[Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md)
188 | (2) Implementation of an environment for batch analysis of long-term data (batch layer) and optimization of performance and cost:[Lab1](../lab1/README.md) → [Lab4](../lab4/README.md) or [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md)
189 | (3) All labs:[Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) → [Lab4](../lab4/README.md) → [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md)
190 | 
191 | Please follow [these instructions](../clean-up/README.md) when deleting the environment.
192 | 
-------------------------------------------------------------------------------- /EN/lab3/additional_info_lab3.md: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | # Alert settings in Elasticsearch Service
9 | 
10 | Amazon Elasticsearch Service (Elasticsearch Service) has started supporting event monitoring and alerts. Alerts are available for domains running Elasticsearch 6.2 or later. Click [here](https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-elasticsearch-service-adds-event-monitoring-and-alerting-support/) for details.
11 | This page introduces alert notification settings using Amazon Simple Notification Service (Amazon SNS).
12 | 
13 | 1. Select **Simple Notification Service** from the list of services in the AWS Management Console, select **[Topics]** from the left pane, and click **[Create topic]**.
14 | 
15 | 2. On the **[Create topic]** screen, enter "**handson-minilake** (optional)" as the name and click **[Create topic]** to create the topic. For information on subscribing an endpoint to the topic, see [here](https://docs.aws.amazon.com/sns/latest/dg/sns-getting-started.html).
16 | 
17 | 3. To add a policy to the created "**handson-minilake** (optional)" IAM role, select **IAM** from the list of services in the AWS Management Console, select **[Roles]** and click the role name "**handson-minilake** (optional)".
18 | 
19 | 4. Select the **[Permissions]** tab and click **[Attach policies]**.
20 | 
21 | 5. Using the search window, check the **[AmazonSNSFullAccess]** policy and click **[Attach policy]**.
22 | 
23 | 6. Click the **[Trust relationships]** tab and click the **[Edit trust relationship]** button.
24 | 
25 | 7. On the **[Edit Trust Relationship]** screen, add **es.amazonaws.com** to the **"Service"** list that currently contains **"ec2.amazonaws.com"**, using "**[]**" and "**,**" as in the example below, and click **[Update Trust Policy]**.
26 | 
27 | **[Example]**
28 | 
29 | ```
30 | {
31 |   "Version": "2012-10-17",
32 |   "Statement": [
33 |     {
34 |       "Effect": "Allow",
35 |       "Principal": {
36 |         "Service": [
37 |           "glue.amazonaws.com",
38 |           "ec2.amazonaws.com",
39 |           "es.amazonaws.com"
40 |         ]
41 |       },
42 |       "Action": "sts:AssumeRole"
43 |     }
44 |   ]
45 | }
46 | ```
47 | 
48 | 8. Open **Kibana** and select **[Alerting]**. Select the **[Destinations]** tab and select **[Add Destination]**.
49 | 
50 | 9. On the **[Add Destination]** screen, enter the following and click **[Create]**.
51 | 
52 | - Name:sns-handson-minilake (optional)
53 | - Type:Amazon SNS
54 | - SNS topic ARN:ARN of the SNS topic used for notification
55 | - IAM role ARN:ARN of the role "handson-minilake (optional)"
56 | 
57 | 10. Next, configure the monitoring settings. Select **[Monitors]** and click **[Create monitor]**.
58 | 
59 | 11. Enter the following in **[Configure Monitor]**.
60 | 
61 | - Monitor name:monitor-handson-minilake (optional)
62 | - Frequency:By interval (default)
63 | - Every:3 Minutes
64 | 
65 | **Note:** If you set the interval to every minute, the data transferred from CloudWatch Logs to Elasticsearch Service may not have arrived yet, so set it to every 3 minutes.
66 | 
67 | 
68 | 12. Enter the following in **[Define Monitor]** and click **[Create]** at the bottom of the screen.
69 | 
70 | - How do you want to define the monitor?:Define using extraction query
71 | - Index:Select "cwl-*"
72 | - Define extraction query:Copy the contents of [3-define-extraction-query.txt](asset/ap-northeast-1/3-define-extraction-query.txt)
73 | 
74 | 13. Moving on to the **[Create Trigger]** screen, set the trigger in **[Define Trigger]**. Enter the following.
75 | 
76 | - Trigger name:trigger-handson-minilake (optional)
77 | - Severity level:1 (default)
78 | - Execution query response:Leave unchanged
79 | - Trigger condition:Change to 50
80 | ```
81 | ctx.results[0].hits.total > 50
82 | ```
83 | 
84 | 14. In **[Configure Actions]**, enter the following and click **[Create]** at the bottom of the screen.
85 | 
86 | - Action name:action-handson-minilake (optional)
87 | - Destination name:sns-handson-minilake – (Amazon SNS)
88 | - Message subject:alert-handson-minilake (optional)
89 | 
90 | 15. Since the system is set to generate 300 ERRORs every 10 minutes, you should see an alert triggered after about 10 minutes.
-------------------------------------------------------------------------------- /EN/lab3/asset/ap-northeast-1/3-cmd.txt: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | 1.
9 | td-agent-gem install fluent-plugin-cloudwatch-logs -v 0.4.4
10 | td-agent-gem list | grep fluent-plugin-cloudwatch-logs
11 | 
12 | 2.
13 | export AWS_REGION="ap-northeast-1"
14 | 
15 | 3.
16 | /etc/init.d/td-agent restart
17 | 
18 | 4.
19 | tail -f /var/log/td-agent/td-agent.log 20 | -------------------------------------------------------------------------------- /EN/lab3/asset/ap-northeast-1/3-dashboard.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "_id": "test2-dashboard", 4 | "_type": "dashboard", 5 | "_source": { 6 | "title": "test2-dashboard", 7 | "hits": 0, 8 | "description": "", 9 | "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test2-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test2-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":5,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"id\":\"test2-count\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 100\":\"rgb(0,104,55)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"id\":\"test2-ranking\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"id\":\"test2-username\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"7 - 13\":\"rgb(199,233,192)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":5,\"y\":0,\"w\":4,\"h\":2,\"i\":\"10\"},\"type\":\"visualization\",\"id\":\"test2-text\",\"version\":\"6.2.2\"}]", 10 | "optionsJSON": "{\"darkTheme\":false,\"useMargins\":false}", 11 | "version": 1, 12 | "timeRestore": false, 13 | "kibanaSavedObjectMeta": { 14 | "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}" 15 | } 16 | } 17 | } 18 | ] -------------------------------------------------------------------------------- /EN/lab3/asset/ap-northeast-1/3-define-extraction-query.txt: -------------------------------------------------------------------------------- 1 | { 2 | "size": 0, 3 | "query": { 4 | "bool": { 5 | "must": [ 6 | { 7 | "match": { 8 | "alarmlevel": { 9 | "query": "ERROR", 10 | "operator": "OR", 11 | "prefix_length": 0, 12 | "max_expansions": 50, 13 | "fuzzy_transpositions": true, 14 | "lenient": false, 15 | "zero_terms_query": "NONE", 16 | "auto_generate_synonyms_phrase_query": true, 17 | "boost": 1 18 | } 19 | } 20 | } 21 | ], 22 | "filter": [ 23 | { 24 | "range": { 25 | "@timestamp": { 26 | "from": 
"now-3m", 27 | "to": "now", 28 | "include_lower": true, 29 | "include_upper": true, 30 | "boost": 1 31 | } 32 | } 33 | } 34 | ], 35 | "adjust_pure_negative": true, 36 | "boost": 1 37 | } 38 | }, 39 | "aggregations": {} 40 | } -------------------------------------------------------------------------------- /EN/lab3/asset/ap-northeast-1/3-td-agent.conf: -------------------------------------------------------------------------------- 1 | 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?[^ ]* [^ ]*)\] (?[^ ]*) *? (?[^ ]*) * (?[^ ]*) * (?.*) \[(?.*)\]$/ 6 | time_format %d/%b/%Y:%H:%M:%S %z 7 | types size:integer, status:integer, reqtime:float, runtime:float, time:time 8 | tag testappec2.log 9 | 10 | 11 | 12 | @type cloudwatch_logs 13 | log_group_name minilake_group 14 | log_stream_name testapplog_stream 15 | auto_create_stream true 16 | 17 | -------------------------------------------------------------------------------- /EN/lab3/asset/us-east-1/3-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | td-agent-gem install fluent-plugin-cloudwatch-logs -v 0.4.4 10 | td-agent-gem list | grep fluent-plugin-cloudwatch-logs 11 | 12 | 2. 13 | export AWS_REGION="ap-northeast-1" 14 | 15 | ※バージニア北部で実施の場合 16 | export AWS_REGION="us-east-1" 17 | 18 | 19 | 3. 20 | /etc/init.d/td-agent restart 21 | 22 | 4. 23 | tail -f /var/log/td-agent/td-agent.log 24 | -------------------------------------------------------------------------------- /EN/lab3/asset/us-east-1/3-dashboard.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "_id": "test2-dashboard", 4 | "_type": "dashboard", 5 | "_source": { 6 | "title": "test2-dashboard", 7 | "hits": 0, 8 | "description": "", 9 | "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test2-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test2-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":5,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"id\":\"test2-count\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 
100\":\"rgb(0,104,55)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"id\":\"test2-ranking\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"id\":\"test2-username\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"7 - 13\":\"rgb(199,233,192)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":5,\"y\":0,\"w\":4,\"h\":2,\"i\":\"10\"},\"type\":\"visualization\",\"id\":\"test2-text\",\"version\":\"6.2.2\"}]", 10 | "optionsJSON": "{\"darkTheme\":false,\"useMargins\":false}", 11 | "version": 1, 12 | "timeRestore": false, 13 | "kibanaSavedObjectMeta": { 14 | "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}" 15 | } 16 | } 17 | } 18 | ] -------------------------------------------------------------------------------- /EN/lab3/asset/us-east-1/3-define-extraction-query.txt: -------------------------------------------------------------------------------- 1 | { 2 | "size": 0, 3 | "query": { 4 | "bool": { 5 | "must": [ 6 | { 7 | "match": { 8 | "alarmlevel": { 9 | "query": "ERROR", 10 | "operator": "OR", 11 | "prefix_length": 0, 12 | "max_expansions": 50, 13 | "fuzzy_transpositions": true, 14 | "lenient": false, 15 | "zero_terms_query": "NONE", 16 | "auto_generate_synonyms_phrase_query": true, 17 | "boost": 1 18 | } 19 | } 20 | } 21 | ], 22 | "filter": [ 23 | { 24 | "range": { 25 | "@timestamp": { 26 | "from": "now-3m", 27 | "to": "now", 28 | "include_lower": true, 29 | "include_upper": true, 30 | "boost": 1 31 | } 32 | } 33 | } 34 | ], 35 | "adjust_pure_negative": true, 36 | "boost": 1 37 | } 38 | }, 39 | "aggregations": {} 40 | } -------------------------------------------------------------------------------- /EN/lab3/asset/us-east-1/3-td-agent.conf: -------------------------------------------------------------------------------- 1 | 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?[^ ]* [^ ]*)\] (?[^ ]*) *? 
(?[^ ]*) * (?[^ ]*) * (?.*) \[(?.*)\]$/ 6 | time_format %d/%b/%Y:%H:%M:%S %z 7 | types size:integer, status:integer, reqtime:float, runtime:float, time:time 8 | tag testappec2.log 9 | 10 | 11 | 12 | @type cloudwatch_logs 13 | log_group_name minilake_group 14 | log_stream_name testapplog_stream 15 | auto_create_stream true 16 | 17 | -------------------------------------------------------------------------------- /EN/lab4/additional_info_lab4.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Additional guidance of query execution in Athena 9 | ## Execution example when data has accumulated for a while 10 | 11 | 1. Execute the following SQL in the query editor. 12 | 13 | ``` 14 | SELECT * FROM "minilake"."minilake_in1"; 15 | ``` 16 | 17 | **[Execution result example]** 18 | 19 | ``` 20 | (Run time: 4.66 seconds, Data scanned: 27.02 MB) 21 | ``` 22 | 23 | You can confirm that the execution time and the amount of scanned data increase, according with the amount of data. 24 | 25 | 26 | 2. Try running a query with a Where clause. 27 | 28 | ``` 29 | SELECT * FROM "minilake"."minilake_in1" where partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 = '14'; 30 | ``` 31 | 32 | **[Execution result example]** 33 | 34 | ``` 35 | (Run time: 2.46 seconds, Data scanned: 284.42 KB) 36 | ``` 37 | 38 | **Note:** Enter the date of the Where clause that has data. 39 | 40 | 41 | Similar to 1, you can confirm that the execution time and the amount of scanned data increase, according with the amount of data. 42 | 43 | 44 | 45 | **Athena charges you for the amount of data scanned, so if you don't scan volumes of data, you can keep costs low. In many cases, small scans can improve performance.** 46 | Use partition, compression, and columnar formats to reduce the amount of reading. Click [here](https://aws.amazon.com/jp/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) for reference information. 47 | 48 | 49 | -------------------------------------------------------------------------------- /EN/lab4/asset/ap-northeast-1/4-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | td-agent-gem install fluent-plugin-kinesis -v 2.1.0 10 | 11 | 2. 12 | td-agent-gem list | grep plugin-kinesis 13 | 14 | 3. 15 | export AWS_REGION="ap-northeast-1" 16 | 17 | 4. 18 | /etc/init.d/td-agent restart 19 | 20 | 5. 21 | SELECT * FROM "minilake"."minilake_in1"; 22 | 23 | 6. 
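Partition columns combine freely with ordinary value filters. As an illustration (the date values are hypothetical; use ones for which your bucket has data), the following counts only the ERROR records within a single day's partitions, so Athena still scans just that day's objects:

```
SELECT count(*) FROM "minilake"."minilake_in1"
where partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND alarmlevel = 'ERROR';
```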
-------------------------------------------------------------------------------- /EN/lab4/asset/ap-northeast-1/4-cmd.txt: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | 1.
9 | td-agent-gem install fluent-plugin-kinesis -v 2.1.0
10 | 
11 | 2.
12 | td-agent-gem list | grep plugin-kinesis
13 | 
14 | 3.
15 | export AWS_REGION="ap-northeast-1"
16 | 
17 | 4.
18 | /etc/init.d/td-agent restart
19 | 
20 | 5.
21 | SELECT * FROM "minilake"."minilake_in1";
22 | 
23 | 6.
24 | SELECT * FROM "minilake"."minilake_in1" where partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 = '14';
25 | 
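*Optional check (not part of the numbered steps): you can confirm from the CLI that the delivery stream and the crawled table exist. The names below are the ones created in this lab.

aws firehose describe-delivery-stream --delivery-stream-name minilake1 --query 'DeliveryStreamDescription.DeliveryStreamStatus'
aws glue get-table --database-name minilake --name minilake_in1 --query 'Table.PartitionKeys'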
-------------------------------------------------------------------------------- /EN/lab4/asset/ap-northeast-1/4-policydocument.txt: --------------------------------------------------------------------------------
1 | {
2 |   "Version": "2012-10-17",
3 |   "Statement": [
4 |     {
5 |       "Effect": "Allow",
6 |       "Principal": {
7 |         "Service": [
8 |           "glue.amazonaws.com",
9 |           "ec2.amazonaws.com"
10 |         ]
11 |       },
12 |       "Action": "sts:AssumeRole"
13 |     }
14 |   ]
15 | }
16 | 
-------------------------------------------------------------------------------- /EN/lab4/asset/ap-northeast-1/4-td-agent1.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type cloudwatch_logs
15 |     log_group_name minilake_group
16 |     log_stream_name testapplog_stream
17 |     auto_create_stream true
18 |   </store>
19 |   <store>
20 |     @type kinesis_firehose
21 |     delivery_stream_name minilake1
22 |     flush_interval 1s
23 |   </store>
24 | </match>
25 | 
-------------------------------------------------------------------------------- /EN/lab4/asset/ap-northeast-1/4-td-agent2.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type kinesis_firehose
15 |     delivery_stream_name minilake1
16 |     flush_interval 1s
17 |   </store>
18 | </match>
19 | 
-------------------------------------------------------------------------------- /EN/lab4/asset/us-east-1/4-cmd.txt: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | 1.
9 | td-agent-gem install fluent-plugin-kinesis -v 2.1.0
10 | 
11 | 2.
12 | td-agent-gem list | grep plugin-kinesis
13 | 
14 | 3.
15 | export AWS_REGION="us-east-1"
16 | 
17 | 4.
18 | /etc/init.d/td-agent restart
19 | 
20 | 5.
21 | SELECT * FROM "minilake"."minilake_in1";
22 | 
23 | 6.
24 | SELECT * FROM "minilake"."minilake_in1" where partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 = '14';
25 | 
-------------------------------------------------------------------------------- /EN/lab4/asset/us-east-1/4-policydocument.txt: --------------------------------------------------------------------------------
1 | {
2 |   "Version": "2012-10-17",
3 |   "Statement": [
4 |     {
5 |       "Effect": "Allow",
6 |       "Principal": {
7 |         "Service": [
8 |           "glue.amazonaws.com",
9 |           "ec2.amazonaws.com"
10 |         ]
11 |       },
12 |       "Action": "sts:AssumeRole"
13 |     }
14 |   ]
15 | }
16 | 
-------------------------------------------------------------------------------- /EN/lab4/asset/us-east-1/4-td-agent1.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type cloudwatch_logs
15 |     log_group_name minilake_group
16 |     log_stream_name testapplog_stream
17 |     auto_create_stream true
18 |   </store>
19 |   <store>
20 |     @type kinesis_firehose
21 |     delivery_stream_name minilake1
22 |     flush_interval 1s
23 |   </store>
24 | </match>
25 | 
-------------------------------------------------------------------------------- /EN/lab4/asset/us-east-1/4-td-agent2.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type kinesis_firehose
15 |     delivery_stream_name minilake1
16 |     flush_interval 1s
17 |   </store>
18 | </match>
19 | 
-------------------------------------------------------------------------------- /EN/lab4/images/quicksight_capture01.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab4/images/quicksight_capture01.png
-------------------------------------------------------------------------------- /EN/lab5/asset/ap-northeast-1/5-cmd.txt: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | 1.
9 | td-agent-gem install fluent-plugin-kinesis -v 2.1.0
10 | 
11 | 2.
12 | td-agent-gem list | grep plugin-kinesis
13 | 
14 | 3.
15 | export AWS_REGION="ap-northeast-1"
16 | 
17 | 4.
18 | /etc/init.d/td-agent restart
19 | 
20 | 
21 | 6.
22 | create table ec2log ( timestamp varchar, alarmlevel varchar, host varchar, number int2, text varchar );
23 | 
24 | 7.
25 | copy ec2log from 's3://[S3 BUCKET NAME]/minilake-in1' format as json 'auto' iam_role '[ARN of the created IAM role]';
26 | 
27 | 8.
28 | create external schema my_first_external_schema from data catalog database 'spectrumdb' iam_role '[ARN of the created IAM role]' create external database if not exists;
29 | 
30 | 9.
31 | create external table my_first_external_schema.ec2log_external ( timestamp varchar(max), alarmlevel varchar(max), host varchar(max), number int2, text varchar(max) ) partitioned by (year char(4), month char(2), day char(2), hour char(2)) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='timestamp,alarmlevel,host,number,text') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location 's3://[S3 BUCKET NAME]/minilake-in1/';
32 | 
33 | 10.
34 | select * from svv_external_schemas;
35 | 
36 | 11.
37 | select * from svv_external_databases;
38 | 
39 | 12.
40 | select * from svv_external_tables;
41 | 
42 | 13.
43 | The value of each partition in the "ADD PARTITION" clause must match the value of the partition in the S3 bucket path in the LOCATION clause. The following is an example.
44 | 
45 | ALTER TABLE my_first_external_schema.ec2log_external ADD PARTITION (year='2019', month='09', day='27', hour='14') LOCATION 's3://[S3 BUCKET NAME]/minilake-in1/2019/09/27/14';
46 | 
47 | 
48 | If you make a mistake when creating a partition, you can drop it as follows.
49 | Specify the same partition values that were used in ADD PARTITION.
50 | 
51 | ALTER TABLE my_first_external_schema.ec2log_external DROP PARTITION (year='2019', month='09', day='27', hour='14')
52 | 
53 | 
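*Once a partition has been added, a query that filters on the partition columns should return that hour's rows, for example (use date values that exist in your bucket):

select count(*) from my_first_external_schema.ec2log_external where year='2019' and month='09' and day='27' and hour='14';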
-------------------------------------------------------------------------------- /EN/lab5/asset/ap-northeast-1/5-minilake_privatesubnet.yaml: --------------------------------------------------------------------------------
1 | Parameters:
2 |   VpcId:
3 |     Description: select handson-minilake vpc id
4 |     Type: "AWS::EC2::VPC::Id"
5 |   EC2SecurityGroupId:
6 |     Description: select EC2 Security Group ID
7 |     Type: "AWS::EC2::SecurityGroup::Id"
8 | Resources:
9 |   # Create Private RouteTable
10 |   PrivateRouteTable:
11 |     Type: AWS::EC2::RouteTable
12 |     Properties:
13 |       VpcId: !Ref VpcId
14 |       Tags:
15 |         - Key: Name
16 |           Value: handson-minilake-private-rt
17 |   # Create Private Subnet A
18 |   PrivateSubnet:
19 |     Type: AWS::EC2::Subnet
20 |     Properties:
21 |       VpcId: !Ref VpcId
22 |       CidrBlock: 10.0.0.32/27
23 |       AvailabilityZone: "ap-northeast-1a"
24 |       Tags:
25 |         - Key: Name
26 |           Value: handson-minilake-private-sub
27 |   # Associate with Private subnet and route table for private subnet
28 |   PriSubnetRouteTableAssociation:
29 |     Type: AWS::EC2::SubnetRouteTableAssociation
30 |     Properties:
31 |       SubnetId: !Ref PrivateSubnet
32 |       RouteTableId: !Ref PrivateRouteTable
33 |   # Create EIP
34 |   MyEIP:
35 |     Type: "AWS::EC2::EIP"
36 |     Properties:
37 |       Domain: vpc
38 |   # Create NATGateway
39 |   myNATGateway:
40 |     Type: "AWS::EC2::NatGateway"
41 |     DependsOn:
42 |       - MyEIP
43 |       - PrivateSubnet
44 |     Properties:
45 |       AllocationId: !GetAtt MyEIP.AllocationId
46 |       SubnetId: !Ref PrivateSubnet
47 |       Tags:
48 |         - Key: Name
49 |           Value: handson-minilake-nat
50 |   # set NAT as default gateway on route table for private subnet
51 |   myPrivateRoute:
52 |     Type: AWS::EC2::Route
53 |     DependsOn: myNATGateway
54 |     Properties:
55 |       RouteTableId: !Ref PrivateRouteTable
56 |       DestinationCidrBlock: 0.0.0.0/0
57 |       NatGatewayId: !Ref myNATGateway
58 |   # create sg for RS
59 |   myRSSecurityGroup:
60 |     Type: 'AWS::EC2::SecurityGroup'
61 |     Properties:
62 |       GroupName: handson-minilake-sg-private
63 |       GroupDescription: Enable Redshift access via port 5439
64 |       VpcId: !Ref VpcId
65 |       SecurityGroupIngress:
66 |         - IpProtocol: tcp
67 |           FromPort: '5439'
68 |           ToPort: '5439'
69 |           SourceSecurityGroupId: !Ref EC2SecurityGroupId
70 | 
-------------------------------------------------------------------------------- /EN/lab5/asset/ap-northeast-1/5-td-agent1.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type cloudwatch_logs
15 |     log_group_name minilake_group
16 |     log_stream_name testapplog_stream
17 |     auto_create_stream true
18 |   </store>
19 |   <store>
20 |     @type kinesis_firehose
21 |     delivery_stream_name minilake1
22 |     flush_interval 1s
23 |   </store>
24 | </match>
25 | 
-------------------------------------------------------------------------------- /EN/lab5/asset/ap-northeast-1/5-td-agent2.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type kinesis_firehose
15 |     delivery_stream_name minilake1
16 |     flush_interval 1s
17 |   </store>
18 | </match>
19 | 
-------------------------------------------------------------------------------- /EN/lab5/asset/us-east-1/5-cmd.txt: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | 1.
9 | td-agent-gem install fluent-plugin-kinesis -v 2.1.0
10 | 
11 | 2.
12 | td-agent-gem list | grep plugin-kinesis
13 | 
14 | 3.
15 | export AWS_REGION="us-east-1"
16 | 
17 | 4.
18 | /etc/init.d/td-agent restart
19 | 
20 | 
21 | 6.
22 | create table ec2log ( timestamp varchar, alarmlevel varchar, host varchar, number int2, text varchar );
23 | 
24 | 7.
25 | copy ec2log from 's3://[S3 BUCKET NAME]/minilake-in1' format as json 'auto' iam_role '[ARN of the created IAM role]';
26 | 
27 | 8.
28 | create external schema my_first_external_schema from data catalog database 'spectrumdb' iam_role '[ARN of the created IAM role]' create external database if not exists;
29 | 
30 | 9.
31 | create external table my_first_external_schema.ec2log_external ( timestamp varchar(max), alarmlevel varchar(max), host varchar(max), number int2, text varchar(max) ) partitioned by (year char(4), month char(2), day char(2), hour char(2)) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='timestamp,alarmlevel,host,number,text') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location 's3://[S3 BUCKET NAME]/minilake-in1/';
32 | 
33 | 10.
34 | select * from svv_external_schemas;
35 | 
36 | 11.
37 | select * from svv_external_databases;
38 | 
39 | 12.
40 | select * from svv_external_tables;
41 | 
42 | 13.
43 | The value of each partition in the "ADD PARTITION" clause must match the value of the partition in the S3 bucket path in the LOCATION clause. The following is an example.
44 | 
45 | ALTER TABLE my_first_external_schema.ec2log_external ADD PARTITION (year='2019', month='09', day='27', hour='14') LOCATION 's3://[S3 BUCKET NAME]/minilake-in1/2019/09/27/14';
46 | 
47 | 
48 | If you make a mistake when creating a partition, you can drop it as follows.
49 | Specify the same partition values that were used in ADD PARTITION.
50 | 
51 | ALTER TABLE my_first_external_schema.ec2log_external DROP PARTITION (year='2019', month='09', day='27', hour='14')
52 | 
53 | 
-------------------------------------------------------------------------------- /EN/lab5/asset/us-east-1/5-minilake_privatesubnet.yaml: --------------------------------------------------------------------------------
1 | Parameters:
2 |   VpcId:
3 |     Description: select handson-minilake vpc id
4 |     Type: "AWS::EC2::VPC::Id"
5 |   EC2SecurityGroupId:
6 |     Description: select EC2 Security Group ID
7 |     Type: "AWS::EC2::SecurityGroup::Id"
8 | Resources:
9 |   # Create Private RouteTable
10 |   PrivateRouteTable:
11 |     Type: AWS::EC2::RouteTable
12 |     Properties:
13 |       VpcId: !Ref VpcId
14 |       Tags:
15 |         - Key: Name
16 |           Value: handson-minilake-private-rt
17 |   # Create Private Subnet A
18 |   PrivateSubnet:
19 |     Type: AWS::EC2::Subnet
20 |     Properties:
21 |       VpcId: !Ref VpcId
22 |       CidrBlock: 10.0.0.32/27
23 |       AvailabilityZone: "us-east-1a"
24 |       Tags:
25 |         - Key: Name
26 |           Value: handson-minilake-private-sub
27 |   # Associate with Private subnet and route table for private subnet
28 |   PriSubnetRouteTableAssociation:
29 |     Type: AWS::EC2::SubnetRouteTableAssociation
30 |     Properties:
31 |       SubnetId: !Ref PrivateSubnet
32 |       RouteTableId: !Ref PrivateRouteTable
33 |   # Create EIP
34 |   MyEIP:
35 |     Type: "AWS::EC2::EIP"
36 |     Properties:
37 |       Domain: vpc
38 |   # Create NATGateway
39 |   myNATGateway:
40 |     Type: "AWS::EC2::NatGateway"
41 |     DependsOn:
42 |       - MyEIP
43 |       - PrivateSubnet
44 |     Properties:
45 |       AllocationId: !GetAtt MyEIP.AllocationId
46 |       SubnetId: !Ref PrivateSubnet
47 |       Tags:
48 |         - Key: Name
49 |           Value: handson-minilake-nat
50 |   # set NAT as default gateway on route table for private subnet
51 |   myPrivateRoute:
52 |     Type: AWS::EC2::Route
53 |     DependsOn: myNATGateway
54 |     Properties:
55 |       RouteTableId: !Ref PrivateRouteTable
56 |       DestinationCidrBlock: 0.0.0.0/0
57 |       NatGatewayId: !Ref myNATGateway
58 |   # create sg for RS
59 |   myRSSecurityGroup:
60 |     Type: 'AWS::EC2::SecurityGroup'
61 |     Properties:
62 |       GroupName: handson-minilake-sg-private
63 |       GroupDescription: Enable Redshift access via port 5439
64 |       VpcId: !Ref VpcId
65 |       SecurityGroupIngress:
66 |         - IpProtocol: tcp
67 |           FromPort: '5439'
68 |           ToPort: '5439'
69 |           SourceSecurityGroupId: !Ref EC2SecurityGroupId
70 | 
-------------------------------------------------------------------------------- /EN/lab5/asset/us-east-1/5-td-agent1.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type cloudwatch_logs
15 |     log_group_name minilake_group
16 |     log_stream_name testapplog_stream
17 |     auto_create_stream true
18 |   </store>
19 |   <store>
20 |     @type kinesis_firehose
21 |     delivery_stream_name minilake1
22 |     flush_interval 1s
23 |   </store>
24 | </match>
25 | 
-------------------------------------------------------------------------------- /EN/lab5/asset/us-east-1/5-td-agent2.conf: --------------------------------------------------------------------------------
1 | <source>
2 |   @type tail
3 |   path /root/es-demo/testapp.log
4 |   pos_file /var/log/td-agent/testapp.log.pos
5 |   format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
6 |   time_format %d/%b/%Y:%H:%M:%S %z
7 |   types size:integer, status:integer, reqtime:float, runtime:float, time:time
8 |   tag testappec2.log
9 | </source>
10 | 
11 | <match testappec2.log>
12 |   @type copy
13 |   <store>
14 |     @type kinesis_firehose
15 |     delivery_stream_name minilake1
16 |     flush_interval 1s
17 |   </store>
18 | </match>
19 | 
-------------------------------------------------------------------------------- /EN/lab5/images/Lab5-Section4-Step4-25.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab5/images/Lab5-Section4-Step4-25.png
-------------------------------------------------------------------------------- /EN/lab5/images/quicksight_vpc_setting.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab5/images/quicksight_vpc_setting.png
-------------------------------------------------------------------------------- /EN/lab6/additional_info_lab6.md: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | # Additional guidance on query comparison in Athena
9 | ## JSON format vs JSON format (partitioned) vs Parquet format vs Parquet format (partitioned)
10 | 
11 | ### 1. JSON format
12 | 
13 | 
14 | 
15 | **[Query example]**
16 | 
17 | ```
18 | SELECT count(user) FROM "minilake"."minilake_in1" where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
19 | ```
20 | 
21 | 
22 | ### 2. JSON format (Partitioned by year, month, day and hour)
23 | 
24 | 
25 | 
26 | **[Query example]**
27 | 
28 | ```
29 | SELECT count(user) FROM "minilake"."minilake_in1" where user = 'uchida' and partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 >= '13' AND partition_3 <= '21';
30 | ```
31 | 
32 | 
33 | ### 3. Parquet format
34 | 
35 | 
36 | 
37 | 
38 | **[Query example]**
39 | 
40 | ```
41 | SELECT count(user) FROM "minilake"."minilake_out1" where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
42 | ```
43 | 
44 | 
45 | ### 4. Parquet format (Partitioned by user, year, month, day and hour)
46 | 
47 | 
48 | 
49 | **[Query example]**
50 | 
51 | ```
52 | SELECT count(user) FROM "minilake"."minilake_out2" where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
53 | ```
54 | 
55 | **Athena charges you for the amount of data scanned, so the less you scan, the lower the cost. In many cases, smaller scans also improve performance.**
56 | Use partitioning, compression, and columnar formats to reduce the amount of data read. Click [here](https://aws.amazon.com/jp/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) for reference information.
57 | 
58 | 
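If you would rather not set up a Glue job just to try this comparison, Athena's CTAS statement can write a partitioned Parquet copy of the JSON table directly. This is a sketch rather than part of the lab: the output table name and S3 prefix are hypothetical, and the partition column must come last in the SELECT list.

```
CREATE TABLE "minilake"."minilake_ctas_out"
WITH (
  format = 'PARQUET',
  external_location = 's3://[S3 BUCKET NAME]/minilake-ctas-out/',
  partitioned_by = ARRAY['user']
) AS
SELECT "timestamp", alarmlevel, host, "number", "text", "user"
FROM "minilake"."minilake_in1";
```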
-------------------------------------------------------------------------------- /EN/lab6/asset/ap-northeast-1/6-cmd.txt: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0
4 | 
5 | ------------------------------------------------------------------------------------
6 | 
7 | 
8 | 1.
9 | SELECT count(user) FROM "minilake"."minilake_in1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
10 | 
11 | 2.
12 | SELECT count(user) FROM "minilake"."minilake_out1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
13 | 
14 | 
15 | 3.
16 | #applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string")], transformation_ctx = "applymapping1")
17 | ###1
18 | applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string"),("partition_0", "string", "year", "string"), ("partition_1", "string", "month", "string"), ("partition_2", "string", "day", "string"), ("partition_3", "string", "hour", "string")], transformation_ctx = "applymapping1")
19 | 
20 | 
21 | #datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out"}, format = "parquet", transformation_ctx = "datasink4")
22 | ###1
23 | datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out2", "partitionKeys": ["user", "year", "month", "day", "hour"]}, format = "parquet", transformation_ctx = "datasink4")
24 | 
25 | 4.
26 | SELECT count(user) FROM "minilake"."minilake_out2" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
27 | 
28 | 
-------------------------------------------------------------------------------- /EN/lab6/asset/us-east-1/6-cmd.txt: --------------------------------------------------------------------------------
1 | ------------------------------------------------------------------------------------
2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | SELECT count(user) FROM "minilake"."minilake_in1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%'; 10 | 11 | 2. 12 | SELECT count(user) FROM "minilake"."minilake_out1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%'; 13 | 14 | 15 | 3. 16 | #applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string")], transformation_ctx = "applymapping1") 17 | ###1 18 | applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string"),("partition_0", "string", "year", "string"), ("partition_1", "string", "month", "string"), ("partition_2", "string", "day", "string"), ("partition_3", "string", "hour", "string")], transformation_ctx = "applymapping1") 19 | 20 | 21 | #datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out"}, format = "parquet", transformation_ctx = "datasink4") 22 | ###1 23 | datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out2", "partitionKeys": ["user", "year", "month", "day", "hour"]}, format = "parquet", transformation_ctx = "datasink4") 24 | 25 | 4. 
26 | SELECT count(user) FROM "minilake"."minilake_out2" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%'; 27 | 28 | -------------------------------------------------------------------------------- /EN/lab6/images/CSV_nopartition.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab6/images/CSV_nopartition.png -------------------------------------------------------------------------------- /EN/lab6/images/CSV_partition.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab6/images/CSV_partition.png -------------------------------------------------------------------------------- /EN/lab6/images/Parquet_nopartition.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab6/images/Parquet_nopartition.png -------------------------------------------------------------------------------- /EN/lab6/images/Parquet_partition.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab6/images/Parquet_partition.png -------------------------------------------------------------------------------- /EN/lab6/images/glue_job_capture01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/EN/lab6/images/glue_job_capture01.png -------------------------------------------------------------------------------- /JP/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # はじめに 9 | ## 本ハンズオンのゴール  10 | 幅広いデータソースからの構造化データまたは非構造化データの集中リポジトリとして使用できる Data Lake は、データの保存と分析の方法として多くの企業に取り入れられています。 11 | 12 | AWS のビッグデータ関連サービスを使用して実際に分析パイプラインを構築することを通して、 Data Lake とビッグデータ分析基盤構築の実感を持って頂くことをゴールとしています。 13 | 14 | ## 準備事項 15 | - AWS を利用可能なネットワークに接続された PC(Windows, Mac OS, Linux等) 16 | - 事前に用意していただいたAWSアカウント 17 | - SSH クライアント(Windows 環境では Tera Term を推奨) 18 | - ブラウザ(Firefox もしくは Chrome を推奨) 19 | 20 | # ハンズオンの概要 21 | 22 | ## ハンズオンの構成 23 | 本ハンズオンは6つのラボで構成されています。 24 | 25 | Lab1:はじめの準備(必須) 26 | 主に使用するAWSサービス:Amazon VPC, Amazon EC2, AWS CloudFormation, AWS IAM 27 | 28 | Lab2:アプリケーションログをリアルタイムで可視化 29 | 主に使用するAWSサービス:Amazon OpenSearch Service 30 | 31 | Lab3:アプリケーションログのリアルタイム可視化とアラーム 32 | 主に使用するAWSサービス:Amazon CloudWatch, AWS Lambda, Amazon OpenSearch Service 33 | 34 | Lab4:アプリケーションログの永続化と長期間データの分析と可視化 35 | 主に使用するAWSサービス:Amazon Kinesis Data Firehose, Amazon S3, Amazon Athena, Amazon QuickSight 36 | 37 | Lab5:クラウドDWHを使用したデータ分析 38 | 主に使用するAWSサービス: Amazon Kinesis Data Firehose, Amazon S3, Amazon Redshift, Amazon Redshift Spectrum, Amazon QuickSight 39 | 40 | Lab6:サーバーレスでデータのETL処理 41 | 主に使用するAWSサービス:AWS Glue, Amazon Athena 42 | 43 | 44 | ## ハンズオン実施パターン(3つ) 45 | 46 | 各 Lab を組み合わせることにより、以下3パターンのハンズオンを実施することが可能です。 47 | (1) ニアリアルタイムデータ分析環境(スピードレイヤ)の構築:[Lab1](lab1/README.md) → [Lab2](lab2/README.md) → [Lab3](lab3/README.md) 48 | (2) 長期間のデータをバッチ分析する環境(バッチレイヤ)の構築と、パフォーマンスとコストの最適化:[Lab1](lab1/README.md) → [Lab4](lab4/README.md) or [Lab5](lab5/README.md) → [Lab6](lab6/README.md) 49 | (3) すべて実施:[Lab1](lab1/README.md) → [Lab2](lab2/README.md) → [Lab3](lab3/README.md) → [Lab4](lab4/README.md) → [Lab5](lab5/README.md) → [Lab6](lab6/README.md) 50 | 51 | 52 | ハンズオン Lab1 〜 Lab6 まで完了すると、下記のような仕組みが構築できます。 53 | 54 | 55 | 56 | スピードレイヤでニアリアルタイム分析を行いながら、特定の条件時にアラームを飛ばしつつ、すべてのログデータを安価に長期保存しながら、必要に応じて ETL 処理を行った上で、アドホックにログデータに直接クエリしながら分析すべきデータを見極めつつ、 DWH で細かく分析を行うと同時に、 BI ツールで可視化する構成を、ほぼサーバレスで実現することが可能です。 57 | 58 | ## 各Labの概要 59 | 各 Lab の概要について、以下に示します。 60 | 61 | ### Lab1:はじめの準備 62 | 残りの5つの Lab で必要となる共通の環境を構築します。 63 | AWS CloudFormation(以降、CloudFormation)にて、 Amazon VPC(以降、VPC)、 Amazon EC2(以降、EC2)の構築、そして AWS IAM(以降、IAM)の権限設定を行います。CloudFormation を実行することで、FluentdがインストールされたEC2が起動します。 64 | 65 | - Lab1 の手順は[こちら](lab1/README.md) 66 | 67 | 68 | 69 | 主に使用する AWS サービス:VPC, EC2, CloudFormation, IAM 70 | 71 | ### Lab2:アプリケーションログをリアルタイムで可視化 72 | 「Lab1:はじめの準備」で構築した EC2 のログデータをリアルタイムで可視化するために、 EC2 で出力されるログを OSS の Fluentd を使ってストリームで Amazon OpenSearch Service(以降、OpenSearch Service)に送信し、 OpenSearch Service に付属している Kibana を使って、可視化を行います。 73 | 74 | - Lab2 の手順は[こちら](lab2/README.md) 75 | 76 | 77 | 78 | 主に使用する AWS サービス:OpenSearch Service 79 | 80 | ### Lab3:アプリケーションログのリアルタイム可視化とアラーム 81 | 「Lab2:アプリケーションログをリアルタイムで可視化」で実施した可視化に加え、アラーム検知を実施します。 82 | Fluentd から OpenSearch Service に送信する前段に Amazon CloudWatch(以降、CloudWatch)、 AWS Lambda(以降、Lambda)を配置して、アラーム通知をする処理を追加します。 83 | 84 | - Lab3 の手順は[こちら](lab3/README.md) 85 | 86 | 87 | 88 | 主に使用するサービス:CloudWatch, Lambda, OpenSearch Service 89 | 90 | ### Lab4:アプリケーションログの永続化と長期間データの分析と可視化 91 | ストリームデータを Amazon Kinesis Data Firehose(以降、Kinesis Data Firehose)に送信後、 Amazon S3(以降、S3)に保存することで長期保存します。その後、 Amazon Athena(以降、Athena)を用いて、アドホックな分析を行い、 Amazon QuickSight(以降、QuickSight)で可視化します。 92 | 93 | - Lab4 の手順は[こちら](lab4/README.md) 94 | 95 | 96 | 97 | 主に使用するサービス:Kinesis Data Firehose, S3, Athena, QuickSight 98 | 99 | ### Lab5:クラウドDWHを使用したデータ分析 100 | ストリームデータを Kinesis Data 
Firehose に送信後、 S3 に保存することで長期保存します。その後、 Amazon Redshift Spectrum(以降、Redshift Spectrum)を用いて、クエリを実行し、 QuickSight で可視化します。 101 | 102 | - Lab5 の手順は[こちら](lab5/README.md) 103 | 104 | 105 | 106 | 主に使用するサービス:Kinesis Data Firehose, S3, Athena, Redshift, Redshift Spectrum, QuickSight 107 | 108 | ### Lab6:サーバーレスでデータのETL処理 109 | ストリームデータを Kinesis Data Firehose に送信後、 S3 に保存することで長期保存します。その後、 AWS Glue(以降、Glue)を使って、①ファイルフォーマットを Apache Parquet 形式に変換する処理、②ファイルをパーティショニングして配置する処理を実行し、その結果を S3 に保存します。その後、 Athena や Redshift Spectrum を用いて、クエリを実行し、 QuickSight で可視化します。 110 | 111 | - Lab6 の手順は[こちら](lab6/README.md) 112 | 113 | 114 | 115 | 主に使用するサービス:Glue, Athena 116 | 117 | 118 | ## ハンズオン全体を通しての注意事項 119 | 1. 本ハンズオンは、基本的に「東京リージョン」を前提に記載されています。リソースなどの上限に引っかかってしまった場合は、「バージニア北部リージョン」での環境を作成することも可能です。その場合、各ハンズオン資料の「東京リージョン(ap-northeast-1)」の記載をすべて「バージニア北部(us-east-1)」に読み替える必要があります。また、asset資料については、両リージョンのものが用意されておりますので、該当のリージョンのものをご利用ください。 120 | 121 | 2. 各章で配置されている「補足説明」につきましては、本ハンズオンを進めていただく上では必須手順ではありません。参考資料としてください。 122 | 123 | 3. 同じ AWS アカウントで複数人が同時に本ハンズオンを実施される場合、適宜名前などが重複しないようにご留意ください。 124 | 125 | 4. 各手順において、「任意」と記載のあるものについては自由に名前を変更いただくことができますが、ハンズオン中に指定した名前がわからなくならないように、ハンズオン実施中はS3の名前以外、基本的にはそのままの名前で進めることを推奨いたします。 126 | 127 | 5. 各手順において、関連するAsset資料のリンクが配置されています。ブラウザから参照する場合、HTML形式での参照となります。必要に応じて、ファイルをダウンロードいただき、手順を進めてください。 128 | 129 | 130 | 131 | -------------------------------------------------------------------------------- /JP/clean-up/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # 後片付け 9 | 10 | 下記、後片付けを行います。それぞれの Lab 単位で削除対象を記述しています。 11 | **この作業まで完了していない場合、継続して課金が発生しますので、必ず実施してください。** 12 | 13 | ## Lab6 14 | 15 | 1. Glue のジョブ(ジョブ名:minilake1) 16 | 17 | 2. Glue からクローラの削除(クローラ名:minilake-in1、minilake-out1、minilake-out2) 18 | 19 | 3. Glue からテーブルの削除(テーブル名:minilake\_out1、minilake\_out2) 20 | 21 | 4. Glue からデータベースの削除(データベース名:minilake) 22 | 23 | 24 | ## Lab5 25 | 26 | 1. QuickSight の VPC 接続を削除 27 | 28 | 2. QuickSight の Unsubscribe(QuickSight のアカウント名をクリック → [QuickSight の管理] → [アカウント設定] → [サブスクリプション解除]) 29 | 30 | 3. S3 バケットの削除(バケット名: [ご自身で作成されたS3バケット名]) 31 | 32 | 4. Kinesis Data Firehose の delivery stream の削除(ストリーム名:minilake1) 33 | 34 | 5. Redshift クラスターの削除 35 | 36 | 6. Glue からテーブルの削除(テーブル名:ec2log_external) 37 | 38 | 7. Glue からデータベースの削除(データベース名:spectrumdb) 39 | 40 | 8. handson-minilake-private セキュリティグループのインバウンドルールに対して、 手動で追加したルール(qs-rs-private-conn からのアクセス許可)を削除 41 | 42 | 9. セキュリティグループの qs-rs-private-conn の削除 43 | 44 | 10. CloudFormation コンソールから、「handson-minilake-private-subnet」スタックを削除。 45 | 46 | 47 | ## Lab4 48 | 49 | 1. QuickSight の Unsubscribe(QuickSight のアカウント名をクリック → [QuickSight の管理] → [アカウント設定] → [サブスクリプション解除]) 50 | 51 | 2. S3 バケットの削除(バケット名: [ご自身で作成された S3 バケット名]) 52 | 53 | 3. Kinesis Data Firehose の delivery stream の削除(ストリーム名:minilake1) 54 | 55 | 4. Glue からクローラの削除(クローラ名:minilake1) 56 | 57 | 5. Glue からテーブルの削除(テーブル名:minilake\_in1、minilake\_out1、ec2log\_external) 58 | 59 | 6. Glue からデータベースの削除(データベース名:minilake、spectrumdb) 60 | 61 | 62 | ## Lab1〜3 まで 63 | 64 | 1. 
CloudWatch Logs の削除(ロググループ名以下) 65 | - 東京リージョン 66 | - /minilake_group 67 | - /aws/lambda/LogsToElasticsearch_handson-minilake 68 | - /aws/kinesisfirehose/minilake1 69 | - /aws-glue/crawlers/の”minilake”関連ストリーム 70 | - /aws-glue/jobs/error、 /aws-glue/jobs/outputのストリームはジョブIDごとに作成されます。Glueをいままで使っていなければロググループごと消してください。 71 | 72 | 2. Lambda の削除(Function 名:LogsToElasticsearch_handson-minilake) 73 | 74 | 3. CloudWatch アラーム(アラーム名:minilake_errlog) 75 | 76 | ## Lab1〜Lab2 77 | 78 | 1. Amazon OpenSearch Service の削除(ドメイン名:handson-minilake) 79 | 80 | 2. CloudFormation でスタック削除(EC2 や EIP の削除) 81 | 82 | 3. IAM ロールの削除(ロール名:handson-minilake-role) 83 | 84 | 4. (作成いただいた場合)キーペアの削除(キーペア名:handson) 85 | 86 | 後片付けは以上です。 87 | -------------------------------------------------------------------------------- /JP/images/architecture_all.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/images/architecture_all.png -------------------------------------------------------------------------------- /JP/images/architecture_lab1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/images/architecture_lab1.png -------------------------------------------------------------------------------- /JP/images/architecture_lab2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/images/architecture_lab2.png -------------------------------------------------------------------------------- /JP/images/architecture_lab3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/images/architecture_lab3.png -------------------------------------------------------------------------------- /JP/images/architecture_lab4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/images/architecture_lab4.png -------------------------------------------------------------------------------- /JP/images/architecture_lab5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/images/architecture_lab5.png -------------------------------------------------------------------------------- /JP/images/architecture_lab6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/images/architecture_lab6.png -------------------------------------------------------------------------------- /JP/lab1/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Lab1:はじめの準備 9 | 残りの5つの Lab で必要となる共通の環境を構築します。 10 | AWS CloudFormation(以降、CloudFormation)にて、 Amazon VPC(以降、VPC)、 Amazon EC2(以降、EC2)の構築、そして AWS IAM(以降、IAM)の権限設定を行います。CloudFormation を実行することで、FluentdがインストールされたEC2が起動します。 11 | 12 | ## Section1:事前準備 13 | ### Step1:AWS マネジメントコンソールにログイン 14 | 15 | 1. AWS マネジメントコンソールにログインします。ログイン後、画面右上のヘッダー部のリージョン選択にて、 **[東京]** となっていることを確認します。 16 | 17 | **Note:** **[東京]** となっていない場合は変更します。 18 | 19 | 2. AWS マネジメントコンソールのサービス一覧から **EC2** を選択します。 **[EC2 ダッシュボード]** の左ペインから **[キーペア]** をクリックし、 **[キーペアを作成]** ボタンをクリックし、**[名前]** に任意の値(例:handson)を入力し、 **[キーペアを作成]** をクリックします。操作しているパソコンに秘密鍵(例:handson.pem)がダウンロードされます。 20 | 21 | **Note:** 既存のキーペアを使われる場合は、こちらの手順は飛ばしてください。 22 | 23 | ## Section2:EC2 環境構築 24 | ### Step1:EC2 1台を CloudFormation で構築 25 | 26 | CloudFormation を使い、 VPC を作成し、作成された VPC にログを出力し続ける EC2 を構築します。 ログは2分おきに10件前後出力され、10分おきに300件のエラーログが出力されます。 27 | 28 | 1. AWS マネジメントコンソールのサービス一覧から **CloudFormation** を選択します。 29 | 30 | **Note:** CloudFormation が見つけられない場合、検索窓に「cloudform」などと入力し、選択します。 31 | 32 | 2. **[CloudFormation]** の画面において、画面右上の **[スタックの作成]** をクリックし、 **[新しいリソースを使用(標準)]** を選択します。 33 | 34 | **Note:** 場合によっては、 **[スタックの作成]** をクリックすることで、手順 3 の画面に遷移しますが、そのまま進めていただいて問題ありません。 35 | 36 | 3. **[スタックの作成]** 画面の **[前提条件 - テンプレートの準備]** において、 **[テンプレートの準備完了]** を選択します。 37 | 38 | **Note:** デフォルトで選択されている場合、そのまま進めます。 39 | 40 | 4. 続いて、 **[スタックの作成]** 画面の **[テンプレートの指定]** において、 **[テンプレートファイルのアップロード]** を選択し、 **[ファイルの選択]** をクリックし、ダウンロードしたテンプレート「 **1-minilake_ec2.yaml** 」を指定し、 **[次へ]** をクリックします。 41 | 42 | **Asset** 資料:[1-minilake_ec2.yaml](asset/ap-northeast-1/1-minilake_ec2.yaml) 43 | 44 | 5. **[スタックの名前]** に 「 **handson-minilake**(任意)」、 **[パラメータ]** の **[KeyPair]** に **Section1** で作成したキーペア「**handson**(任意)」、もしくは既に作成済みの場合はそのキーペアを指定し、 **[RoleName]** に「 **handson-minilake-role**(任意)」と入力し、 **[次へ]** をクリックします。 45 | 46 | 6. オプションの **タグ** で、 **キー** に 「 **Name** 」 、 **値** に 「 **handson-minilake**(任意)」 と入力し、 **[次へ]** をクリックします。 47 | 48 | 7. 最後の確認ページの内容を確認し、 確認ページ下部の「 **AWS CloudFormation によって IAM リソースがカスタム名で作成される場合があることを承認します。** 」にチェックを入れ **[スタックの作成]** をクリックします。数分ほど待つと EC2 一台ができあがり、 **/root/es-demo/testapp.log** にログ出力が始まります。 49 | 50 | **Note:** SSMでログインされる場合は、インスタンスが起動してからSSM接続ができるまでタイムラグが発生する可能性があるため、10分程度の休憩を取ることを推奨します。 51 | 52 | 8. EC2 へ **SSH ログインして root にスイッチし、** ログが2分おきに出力されていることを確認します。 53 | 54 | **Note:** EC2 のログイン方法については、[こちら](additional_info_lab1.md#EC2へのログイン方法)を参照ください。 EC2 の 接続先の IP アドレス情報につきましては、 **[CloudFormation]** の画面から、該当の CloudFormation のスタックを選択し、 **[出力]** のタブをクリックすると、 **[AllowIPAddress]** の情報から確認できます。 55 | 56 | ``` 57 | $ sudo su - 58 | # tail -f /root/es-demo/testapp.log 59 | ``` 60 | 61 | **[ログの出力例]** 62 | 63 | ``` 64 | [2019-09-16 15:14:01+0900] WARNING prd-db02 uehara 1001 [This is Warning.] 65 | [2019-09-16 15:14:01+0900] INFO prd-db02 uehara 1001 [This is Information.] 66 | [2019-09-16 15:14:01+0900] INFO prd-web002 uchida 1001 [This is Information.] 67 | [2019-09-16 15:18:01+0900] INFO prd-ap001 uehara 1001 [This is Information.] 68 | [2019-09-16 15:18:01+0900] ERROR prd-db02 uchida 1001 [This is ERROR.] 69 | ``` 70 | 71 | ## Section3:まとめ 72 | 73 | CloudFormation を使い、 以下の設定を行いました。 74 | 75 | 1. VPC を作成し、2分おきに10件前後のログを出力し、10分おきに300件のエラーログを出力し続ける EC2 を作成しました。 76 | 2. VPC 内に構築した EC2 に AWS リソースにアクセスするための権限を付与しました。詳細については[こちら](./additional_info_lab1_IAM.md) をご覧下さい。 77 | 3. 
構築したEC2に、ログ収集ソフトウェアの Fluentd をインストールしました。詳細については[こちら](./additional_info_lab1_Fluentd.md) をご覧下さい。 78 | 79 | 80 | 81 | Lab1 は以上です。選択されているパターンに合わせて次の手順を実施ください。 82 | 83 | (1) ニアリアルタイムデータ分析環境(スピードレイヤ)の構築:[Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) 84 | (2) 長期間のデータをバッチ分析する環境(バッチレイヤ)の構築と、パフォーマンスとコストの最適化:[Lab1](../lab1/README.md) → [Lab4](../lab4/README.md) or [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md) 85 | (3) すべて実施:[Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) → [Lab4](../lab4/README.md) → [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md) 86 | 87 | 環境を削除される際は、[こちら](../clean-up/README.md)の手順をご覧ください。 -------------------------------------------------------------------------------- /JP/lab1/additional_info_lab1.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # 補足説明:EC2へのログイン方法 9 | EC2へのログイン方法を3つ説明します。 10 | 11 | 1. **Windows** の場合:**Tera Term** を使用して、ログイン 12 | 13 | 2. **Mac / Linux** の場合:**ターミナル** を使用して、ログイン 14 | 15 | 3. **Windows / Mac / Linux** いずれも可能:**AWS Systems Manager Session Manager**(以下、 **Session Manager** )を使用して、ログイン 16 | 17 | Windows または Macを用いて、 EC2 にログインする場合、以下の情報が必要となります。 18 | 19 | - インスタンス作成時に指定したキーペアの秘密鍵ファイル(例:**handson.pem** ) 20 | 21 | **Note:** 作成手順については、 Lab1 の手順を参照ください。 22 | 23 | - インスタンスに割り当てられたパブリック IP アドレス 24 | 25 | ### 補足:EC2 のパブリック IP アドレスの確認手順 26 | 1. AWS マネジメントコンソールにログインし、 AWS マネジメントコンソールのサービス一覧から **EC2** を選択します。 27 | 28 | 2. **[EC2 ダッシュボード]** 画面の左ペインから、 **[インスタンス]** を選択します。 29 | 30 | 3. インスタンス一覧から該当のインスタンスを選択し、画面下部の **[説明]** タブの内容から、 **[パブリック DNS (IPv4)]** に記載の内容をコピーし、パソコンのメモ帳などに情報を記録します。 32 | 33 | 34 | ## 1.Windows の場合 35 | 36 | 以下の手順で、 Windows から EC2 にログインします。 37 | 38 | 1. Tera Term(ttssh.exe)を起動します。 39 | 40 | **Note:** 「http://sourceforge.jp/projects/ttssh2/」からモジュールはダウンロード可能です。 41 | 42 | 2. **[ホスト]** に対して、接続するインスタンスの **[パブリック DNS 名]** を入力します。 43 | 44 | 3. **[SSH バージョン]** に対して、**[SSH2]** を指定し、 **[OK]** をクリックします。 45 | 46 | 4. 下記画面が出たら、 **[続行]** をクリックします。 47 | 48 | 49 | 5. ユーザー名を 「 **ec2-user** 」 と入力します。 50 | 51 | 6. **[RSA/DSA/ECDSA/ED25519 鍵を使う]** を選択します。 52 | 53 | 7. **[秘密鍵ファイル]** をクリックし、パソコン上に配置しているキーペアのファイル **[キーペアの鍵の名前].pem** (例:handson.pem)を選択して、接続します。 54 | 55 | **Note:** ファイルを選択する際、「すべてのファイル(*.*)」を選択しないと秘密鍵のファイルが表示されません。 56 | 57 | 58 | ## 2.Mac / Linux の場合 59 | 60 | 以下の手順で、Mac / Linux からEC2にログインします。 61 | 62 | 1. ターミナルからコマンドラインでログインします。 63 | 64 | **Note:** 事前に秘密鍵のファイル(pemファイル)のパーミッションを **600** にしないと接続できません。 65 | 66 | ``` 67 | $ chmod 600 ~/Downloads/handson.pem 68 | $ ssh -i ~/Downloads/handson.pem ec2-user@[割当てられたパブリックIPアドレス] 69 | ``` 70 | 71 | 2. 「Are you sure you want to continue connecting (yes/no)?」と確認されるため、「yes」と入力し、ログインします。 72 | 73 | 74 | ## 3.Session Manager の場合 75 | 76 | 以下の手順で、 Session Manager から AWS マネジメントコンソール経由で EC2 にログインします。 77 | 78 | **Note:** EC2 と Session Manager 間で必要な IAM ロールの設定は、AWS CloudFormation の実行時に既に実施されています。 詳細については[こちら](./additional_info_lab1_IAM.md) をご覧下さい。 79 | 80 | 1. AWS マネジメントコンソールのサービス一覧から **Systems Manager** を選択し、左ペインから **[セッションマネージャー]** を選択し、 **[セッションを開始する]** をクリックします。 81 | 82 | 2. ログイン対象の EC2 インスタンスIDを指定し、 **[セッションを開始する]** をクリックし、 EC2 にログインします。 83 | 84 | **Note:** 該当インスタンスが表示されるまで、5分程度かかることが想定されます。 85 | 86 | 3. 
ウェブ上にコマンドラインが表示されます。下記コマンドを実行し、 **ec2-user** ユーザーにスイッチしておきます。 87 | **Note:** デフォルトでは **ssm-user** というユーザーでログインされます。 88 | 89 | ``` 90 | $ whoami 91 | $ sudo su - ec2-user 92 | ``` 93 | 94 | 95 | ### 補足:ターゲットインスタンスに対象インスタンスが表示されない場合の対処方法 96 | 97 | AWS マネジメントコンソールのサービス一覧から、 **EC2** を選択し、今回作成したインスタンス「 **handson-minilake** (任意)」の **[ステータスチェック]** が、**[2/2のチェックに合格した]** となっているか確認します。まだ初期化中の場合、完了するまで待ってから、もう一度 Systems Manager のターゲットインスタンスを確認します。 98 | 99 | 初期化が終了しているにも関わらず、該当のインスタンスが表示されない場合、 AWS マネージメントコンソールのサービス一覧から **EC2** を選択し、 **[EC2ダッシュボード]** 画面の左ペインから **[インスタンス]** を選択し、今回作成したインスタンス「 **handson-minilake** (任意)」にチェックを入れ、 **[アクション] → [インスタンスの状態] → [再起動]** をクリックします。 100 | その後、もう一度 Systems Manager のターゲットインスタンスを確認します。 101 | 102 | 103 | ## SSH ログインがうまくいかない場合の確認ポイント 104 | - インスタンスは完全に起動完了していますか? 105 | - 起動時に指定した内容どおりに起動していますか? 106 | - 接続先の IP アドレス あるいは ホスト名は正しいですか? 107 | - 指定した Security Group は 22(SSH) や 3389 (RDP)を有効にしていますか? 108 | - 指定した Key Pair と対応する鍵ファイルを指定していますか? 109 | - 秘密鍵ファイルのパーミッションは 600 になっていますか?(Mac / Linux から接続する場合) 110 | - EC2 と SSM 間の通信が確立されていますか?(Session Manager を利用する場合) 111 | 112 | 113 | -------------------------------------------------------------------------------- /JP/lab1/additional_info_lab1_Fluentd.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # 補足説明: Fluentd のインストール 9 | 10 | AWS CloudFormation では、以下の手順が実施された状態のAMIからEC2を起動しています。 11 | 任意のEC2 にFluentdをインストールしたい場合は、以下の設定を行なってください。 12 | 13 | 1. EC2 にログインし、 redhat-lsb-core と gcc をインストールします。 14 | 15 | **Note:** 準備済みの AMI にはすでにインストールされているため、スキップ可能な手順です。 16 | **Asset** 資料:[1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 17 | 18 | ``` 19 | $ sudo su - 20 | # yum -y install redhat-lsb-core gcc 21 | ``` 22 | 23 | 2. td-agent をインストールします。 24 | 25 | **Asset** 資料:[1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 26 | 27 | ``` 28 | # rpm -ivh http://packages.treasuredata.com.s3.amazonaws.com/3/redhat/6/x86_64/td-agent-3.1.1-0.el6.x86_64.rpm 29 | ``` 30 | 31 | 3. **/etc/init.d/td-agent** の18行目の **TD\_AGENT\_USER** の指定を **td-agent** から **root** に修正します。 32 | 33 | **Note:** コマンド例は、 vi エディタを使用した例ですが、他に使い慣れたエディタがある場合は、そちらをご利用ください。 34 | **Asset** 資料:[1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 35 | 36 | ``` 37 | # vi /etc/init.d/td-agent 38 | ``` 39 | 40 | **[変更前]** 41 | 42 | ``` 43 | TD_AGENT_USER=td-agent 44 | ``` 45 | 46 | **[変更後]** 47 | 48 | ``` 49 | TD_AGENT_USER=root 50 | ``` 51 | 52 | 4. Fluentd の自動起動設定をします。(実際の起動は後ほど行います。) 53 | 54 | **Asset** 資料:[1-cmd.txt](asset/ap-northeast-1/1-cmd.txt) 55 | 56 | ``` 57 | # chkconfig td-agent on 58 | ``` 59 | -------------------------------------------------------------------------------- /JP/lab1/additional_info_lab1_IAM.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # 補足説明: IAM ロールの作成と EC2 へのアタッチ 9 | 10 | EC2 に対して、AWS リソースへアクセスする為の権限を与える IAM ロールの作成手順です。AWS CloudFormation により以下の設定が自動化されています。 11 | 12 | 1. 
AWS マネジメントコンソールのサービス一覧から **IAM** を選択し、 **[Identity and Access Management (IAM)]** 画面の左ペインから **[ロール]** を選択します。 13 | 14 | 2. **[ロールの作成]** をクリックします。 15 | 16 | 3. **[AWS サービス]** を選択し、 **[EC2]** を選択、 **[次のステップ:アクセス権限]** をクリックします。 17 | 18 | 4. **[Attach アクセス権限ポリシー]** の画面で、何も変更せずにそのまま **[次のステップ:タグ]** をクリックします。 19 | 20 | **Note:** この段階ではポリシーなしでロールを作成します。 Session Manager 利用の方は **AmazonEC2RoleforSSM** のみがアタッチされています。 21 | 22 | 5. **[タグの追加(オプション)]** 画面で、そのまま **[次のステップ:確認]** をクリックします。 23 | 24 | 6. **ロール名** に「 **handson-minilake**(任意)」と入力し、 **[ロールの作成]** をクリックします。 25 | 26 | 7. AWS マネージメントコンソールのサービス一覧から **EC2** を選択し、 **[EC2 ダッシュボード]** 画面の左ペインから **[インスタンス]** を選択し、今回作成したインスタンス「 **handson-minilake**(任意)」にチェックを入れ、 **[アクション] → [セキュリティ] → [IAM ロールを変更]** をクリックします。 27 | 28 | 8. **[IAM ロールを変更]** の画面において、 **[IAM ロール]** に「 **handson-minilake**(任意)」を選択し、 **[保存]** をクリックします。 29 | 30 | -------------------------------------------------------------------------------- /JP/lab1/asset/ap-northeast-1/1-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | yum -y install redhat-lsb-core gcc 10 | 11 | 2. 12 | rpm -ivh http://packages.treasuredata.com.s3.amazonaws.com/3/redhat/6/x86_64/td-agent-3.1.1-0.el6.x86_64.rpm 13 | 14 | 3. 15 | vi /etc/init.d/td-agent 16 | 17 | 4. 18 | chkconfig td-agent on 19 | -------------------------------------------------------------------------------- /JP/lab1/asset/ap-northeast-1/1-minilake_ec2.yaml: -------------------------------------------------------------------------------- 1 | Parameters: 2 | KeyPair: 3 | Description: Name of an existing EC2 KeyPair to enable SSH access to the instance 4 | Type: "AWS::EC2::KeyPair::KeyName" 5 | MinLength: '1' 6 | MaxLength: '255' 7 | AllowedPattern: '[\x20-\x7E]*' 8 | ConstraintDescription: can contain only ASCII characters. 
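# (注記:元のテンプレートにはないコメントです)RoleName パラメータは、後続の minilaketestrole リソースで RoleName: !Ref RoleName として使用されます。
# IAM ロール名はアカウント内で一意である必要があります(Lab1 の入力例:handson-minilake-role)。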
9 | RoleName: 10 | Description: Set role name 11 | Type: String 12 | MinLength: '1' 13 | MaxLength: '255' 14 | AllowedPattern: '[\x20-\x7E]*' 15 | Resources: 16 | # Create VPC 17 | MyVPC: 18 | Type: AWS::EC2::VPC 19 | Properties: 20 | CidrBlock: 10.0.0.0/24 21 | EnableDnsSupport: 'true' 22 | EnableDnsHostnames: 'true' 23 | InstanceTenancy: default 24 | Tags: 25 | - Key: Name 26 | Value: handson-minilake 27 | # Create Public RouteTable 28 | PublicRouteTable: 29 | Type: AWS::EC2::RouteTable 30 | Properties: 31 | VpcId: !Ref MyVPC 32 | Tags: 33 | - Key: Name 34 | Value: handson-minilake 35 | # Create Public Subnet A 36 | PublicSubnetA: 37 | Type: AWS::EC2::Subnet 38 | Properties: 39 | VpcId: !Ref MyVPC 40 | CidrBlock: 10.0.0.0/27 41 | AvailabilityZone: "ap-northeast-1a" 42 | Tags: 43 | - Key: Name 44 | Value: handson-minilake 45 | PubSubnetARouteTableAssociation: 46 | Type: AWS::EC2::SubnetRouteTableAssociation 47 | Properties: 48 | SubnetId: !Ref PublicSubnetA 49 | RouteTableId: !Ref PublicRouteTable 50 | # Create InternetGateway 51 | myInternetGateway: 52 | Type: "AWS::EC2::InternetGateway" 53 | Properties: 54 | Tags: 55 | - Key: Name 56 | Value: handson-minilake 57 | AttachGateway: 58 | Type: AWS::EC2::VPCGatewayAttachment 59 | Properties: 60 | VpcId: !Ref MyVPC 61 | InternetGatewayId: !Ref myInternetGateway 62 | myRoute: 63 | Type: AWS::EC2::Route 64 | DependsOn: myInternetGateway 65 | Properties: 66 | RouteTableId: !Ref PublicRouteTable 67 | DestinationCidrBlock: 0.0.0.0/0 68 | GatewayId: !Ref myInternetGateway 69 | MyEIP: 70 | Type: "AWS::EC2::EIP" 71 | Properties: 72 | Domain: vpc 73 | ElasticIPAssociate: 74 | DependsOn: MyEC2Instance 75 | Type: AWS::EC2::EIPAssociation 76 | Properties: 77 | AllocationId: !GetAtt MyEIP.AllocationId 78 | InstanceId: !Ref MyEC2Instance 79 | MyEC2Instance: 80 | Type: 'AWS::EC2::Instance' 81 | Properties: 82 | ImageId: ami-08c23a10b77e0835b 83 | InstanceType: t3.micro 84 | SubnetId: !Ref PublicSubnetA 85 | KeyName : 86 | Ref: KeyPair 87 | SecurityGroupIds: 88 | - Ref: MyEC2SecurityGroup 89 | IamInstanceProfile: !Ref InstanceProfile 90 | Tags: 91 | - Key: Name 92 | Value: handson-minilake 93 | MyEC2SecurityGroup: 94 | Type: 'AWS::EC2::SecurityGroup' 95 | Properties: 96 | GroupName: handson-minilake-sg 97 | GroupDescription: Enable SSH access via port 22 98 | VpcId: !Ref MyVPC 99 | SecurityGroupIngress: 100 | - IpProtocol: tcp 101 | FromPort: '22' 102 | ToPort: '22' 103 | CidrIp: 104 | '0.0.0.0/0' 105 | - IpProtocol: tcp 106 | FromPort: '5439' 107 | ToPort: '5439' 108 | CidrIp: 109 | '0.0.0.0/0' 110 | minilaketestrole: 111 | Type: 'AWS::IAM::Role' 112 | Properties: 113 | AssumeRolePolicyDocument: 114 | Version: '2012-10-17' 115 | Statement: 116 | - Effect: Allow 117 | Principal: 118 | Service: 119 | - ec2.amazonaws.com 120 | Action: 121 | - 'sts:AssumeRole' 122 | Path: / 123 | RoleName: !Ref RoleName 124 | ManagedPolicyArns: 125 | - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore 126 | InstanceProfile: 127 | Type: 'AWS::IAM::InstanceProfile' 128 | Properties: 129 | Path: '/' 130 | Roles: 131 | - !Ref minilaketestrole 132 | Outputs: 133 | AllowIPAddress: 134 | Description: EC2 PublicIP 135 | Value: !Join 136 | - ',' 137 | - - !Ref MyEIP 138 | -------------------------------------------------------------------------------- /JP/lab1/asset/us-east-1/1-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. 
or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | yum -y install redhat-lsb-core gcc 10 | 11 | 2. 12 | rpm -ivh http://packages.treasuredata.com.s3.amazonaws.com/3/redhat/6/x86_64/td-agent-3.1.1-0.el6.x86_64.rpm 13 | 14 | 3. 15 | vi /etc/init.d/td-agent 16 | 17 | 4. 18 | chkconfig td-agent on 19 | -------------------------------------------------------------------------------- /JP/lab1/asset/us-east-1/1-minilake_ec2.yaml: -------------------------------------------------------------------------------- 1 | Parameters: 2 | KeyPair: 3 | Description: Name of an existing EC2 KeyPair to enable SSH access to the instance 4 | Type: "AWS::EC2::KeyPair::KeyName" 5 | MinLength: '1' 6 | MaxLength: '255' 7 | AllowedPattern: '[\x20-\x7E]*' 8 | ConstraintDescription: can contain only ASCII characters. 9 | RoleName: 10 | Description: Set role name 11 | Type: String 12 | MinLength: '1' 13 | MaxLength: '255' 14 | AllowedPattern: '[\x20-\x7E]*' 15 | Resources: 16 | # Create VPC 17 | MyVPC: 18 | Type: AWS::EC2::VPC 19 | Properties: 20 | CidrBlock: 10.0.0.0/24 21 | EnableDnsSupport: 'true' 22 | EnableDnsHostnames: 'true' 23 | InstanceTenancy: default 24 | Tags: 25 | - Key: Name 26 | Value: handson-minilake 27 | # Create Public RouteTable 28 | PublicRouteTable: 29 | Type: AWS::EC2::RouteTable 30 | Properties: 31 | VpcId: !Ref MyVPC 32 | Tags: 33 | - Key: Name 34 | Value: handson-minilake 35 | # Create Public Subnet A 36 | PublicSubnetA: 37 | Type: AWS::EC2::Subnet 38 | Properties: 39 | VpcId: !Ref MyVPC 40 | CidrBlock: 10.0.0.0/27 41 | AvailabilityZone: "us-east-1a" 42 | Tags: 43 | - Key: Name 44 | Value: handson-minilake 45 | PubSubnetARouteTableAssociation: 46 | Type: AWS::EC2::SubnetRouteTableAssociation 47 | Properties: 48 | SubnetId: !Ref PublicSubnetA 49 | RouteTableId: !Ref PublicRouteTable 50 | # Create InternetGateway 51 | myInternetGateway: 52 | Type: "AWS::EC2::InternetGateway" 53 | Properties: 54 | Tags: 55 | - Key: Name 56 | Value: handson-minilake 57 | AttachGateway: 58 | Type: AWS::EC2::VPCGatewayAttachment 59 | Properties: 60 | VpcId: !Ref MyVPC 61 | InternetGatewayId: !Ref myInternetGateway 62 | myRoute: 63 | Type: AWS::EC2::Route 64 | DependsOn: myInternetGateway 65 | Properties: 66 | RouteTableId: !Ref PublicRouteTable 67 | DestinationCidrBlock: 0.0.0.0/0 68 | GatewayId: !Ref myInternetGateway 69 | MyEIP: 70 | Type: "AWS::EC2::EIP" 71 | Properties: 72 | Domain: vpc 73 | ElasticIPAssociate: 74 | DependsOn: MyEC2Instance 75 | Type: AWS::EC2::EIPAssociation 76 | Properties: 77 | AllocationId: !GetAtt MyEIP.AllocationId 78 | InstanceId: !Ref MyEC2Instance 79 | MyEC2Instance: 80 | Type: 'AWS::EC2::Instance' 81 | Properties: 82 | ImageId: ami-07ffa58a0316a8fd3 83 | InstanceType: t3.micro 84 | SubnetId: !Ref PublicSubnetA 85 | KeyName : 86 | Ref: KeyPair 87 | SecurityGroupIds: 88 | - Ref: MyEC2SecurityGroup 89 | IamInstanceProfile: !Ref InstanceProfile 90 | Tags: 91 | - Key: Name 92 | Value: handson-minilake 93 | MyEC2SecurityGroup: 94 | Type: 'AWS::EC2::SecurityGroup' 95 | Properties: 96 | GroupName: handson-minilake-sg 97 | GroupDescription: Enable SSH access via port 22 98 | VpcId: !Ref MyVPC 99 | SecurityGroupIngress: 100 | - IpProtocol: tcp 101 | FromPort: '22' 102 | ToPort: '22' 103 | CidrIp: 104 | '0.0.0.0/0' 105 | - IpProtocol: tcp 106 | FromPort: '5439' 107 | ToPort: '5439' 108 | CidrIp: 109 | '0.0.0.0/0' 110 | minilaketestrole: 111 | Type: 
'AWS::IAM::Role' 112 | Properties: 113 | AssumeRolePolicyDocument: 114 | Version: '2012-10-17' 115 | Statement: 116 | - Effect: Allow 117 | Principal: 118 | Service: 119 | - ec2.amazonaws.com 120 | Action: 121 | - 'sts:AssumeRole' 122 | Path: / 123 | RoleName: !Ref RoleName 124 | ManagedPolicyArns: 125 | - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore 126 | InstanceProfile: 127 | Type: 'AWS::IAM::InstanceProfile' 128 | Properties: 129 | Path: '/' 130 | Roles: 131 | - !Ref minilaketestrole 132 | Outputs: 133 | AllowIPAddress: 134 | Description: EC2 PublicIP 135 | Value: !Join 136 | - ',' 137 | - - !Ref MyEIP 138 | -------------------------------------------------------------------------------- /JP/lab1/images/windows_login_ec2_capture01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab1/images/windows_login_ec2_capture01.png -------------------------------------------------------------------------------- /JP/lab2/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Lab2:アプリケーションログをリアルタイムで可視化 9 | 「Lab1:はじめの準備」で構築したEC2のログデータをリアルタイムで可視化するために、 EC2 で出力されるログを OSS の Fluentd を使ってストリームで Amazon OpenSearch Service(以降、OpenSearch Service)に送信し、 OpenSearch Service に付属している OpenSearch Dashboards を使って、可視化を行います。 10 | 2分おきに10件前後、10分おきに300件出力され続けるログを、 Fluentd を使って OpenSearch Service に転送し、 OpenSearch Dashboards で可視化します。 11 | 12 | ## Section1:OpenSearch Service の設定 13 | ### Step1:OpenSearch Service の起動 14 | 15 | 1. AWS マネージメントコンソールのサービス一覧から **OpenSearch Service** を選択し、 **[ドメインの作成]** をクリックします。 16 | 17 | 2. **"名前"** の項目において、 **"ドメイン名"** に「 **handson-minilake**(任意)」と入力します。 18 | 19 | 3. **"デプロイタイプ"** の項目において、 **"デプロイタイプ"** で **[開発およびテスト]** を選択します。 バージョンは変更せず、そのまま次へ進みます。 20 | 21 | **Note:** 今回の手順はバージョン1.0で確認しました。 22 | 23 | 4. **"データノード"** の項目において、 **"インスタンスタイプ"** に **[t3.small.search]** を選択します。その他の設定は変更せず、そのまま次へ進みます。 24 | 25 | 5. **"ネットワーク"** の項目において、 **"ネットワーク"** にある **[パブリックアクセス]** にチェックを入れます。 26 | 27 | 6. **"きめ細かなアクセスコントロール"** 項目にて **[きめ細かなアクセスコントロールを有効化]** にチェックが入っていることを確認し、 **[マスターユーザーの作成]** にチェックを入れ、 **"マスターユーザー名"** と **"マスターパスワード"** を以下の通り設定します。 28 | 29 | - マスターユーザー名:**aesadmin**(任意) 30 | - マスターユーザーのパスワード:**MyPassword&1**(任意) 31 | 32 | 7. 次に、アクセスポリシーの項目を設定します。 **"ドメインアクセスポリシー"** において **[ドメインレベルのアクセスポリシーの設定]** を選択します。 33 | 34 | - タイプに **[IPv4 アドレス]** を選択、プリンシパルに「 **[ご自身のIPアドレス](http://checkip.amazonaws.com/)** 」を入力、アクションに **[許可]** を選択 35 | 36 | - **[要素を追加]** をクリックし、タイプに **[IPv4 アドレス]** を選択、プリンシパルに「 **Lab1で作成したインスタンスのパブリックIP** 」を入力、アクションに **[許可]** を選択 37 | 38 | - **[要素を追加]** をクリックし、タイプに **[IAM ARN]** を選択、プリンシパルに「 **ご自身のAWSアカウントID** 」を入力、アクションに **[許可]** を選択 39 | 40 | 8. 上記の設定以外は全てデフォルトのままで、画面の一番下にある **[作成]** をクリックしドメインを作成します。 41 | 42 | **Note:** OpenSearch Service の作成が始まります。構築完了には 15 分ほどかかりますが完了を待たずに次の手順を進めてください。 43 | 44 | 45 | ## Section2:EC2, Fluentd, OpenSearch Service の設定 46 | ### Step1:IAM ロールの設定 47 | 48 | 作成済の「 **handson-minilake-role**(任意)」の IAM ロールに以下のようにポリシーを追加します。 49 | 50 | 1. 
AWS マネジメントコンソールのサービス一覧から **IAM** を選択し、 **[Identity and Access Management (IAM)]** 画面の左ペインから **[ロール]** を選択し、「 **handson-minilake-role**(任意)」のロール名をクリックします。 51 | 52 | 2. **[許可]** タブを選択し、 **[許可を追加]** をクリックし、 **[ポリシーをアタッチ]** をクリックします。 53 | 54 | 3. 検索窓で「 **amazones** 」と入れ検索し、 **[AmazonESFullAccess]** にチェックを入れ、 **[ポリシーのアタッチ]** をクリックします。 55 | 56 | 4. 変更実施したロール名を再びクリックし、 **[許可]** タブを選択し、 **[AmazonESFullAccess]** がアタッチされたことを確認します。 57 | 58 | 59 | ### Step2:Fluentd の設定 60 | 61 | Fluentd から OpenSearch Service にログデータを送信するための設定を行います。 62 | 63 | 1. AWS マネジメントコンソールのサービス一覧から **OpenSearch Service** を選択し、 **[Amazon OpenSearch Service ダッシュボード]** 画面から作成したドメイン名「 **handson-minilake**(任意)」をクリックし、 **[ドメインエンドポイント]** にある **URL の文字列** を **https://を含めない形** でパソコンのメモ帳などにメモしておきます。 64 | 65 | 2. EC2 にログインし、 OpenSearch のプラグインのインストール状況を確認します。 66 | 67 | **Asset** 資料:[2-cmd.txt](asset/ap-northeast-1/2-cmd.txt) 68 | 69 | ``` 70 | # td-agent-gem list | grep plugin-elasticsearch 71 | ``` 72 | 73 | **[実行結果例]** 74 | 75 | ``` 76 | fluent-plugin-elasticsearch (2.6.0, 2.4.0) 77 | ``` 78 | 79 | 3. 「 **/etc/td-agent/td-agent.conf** 」の設定を **Lab2** 向けに変更するために、あらかじめ用意しておいた **asset** に 以下の cp コマンドを用いて置き換えます。その際、ファイルを上書きするかの確認が入る為、 **yes** と入力します。 80 | 81 | ``` 82 | # cp -p /root/asset/2-td-agent.conf /etc/td-agent/td-agent.conf 83 | ``` 84 | 85 | 4. 置き換えたあと、内容を **vi** などのエディタを使用し、一部修正します。 **<エンドポイント>** の値を手順1でコピーしておいたエンドポイントの値と置き換え、保存します。 86 | 87 | **Note:** **host** の値として、 **https://** は含めません。 88 | 89 | ``` 90 | # vi /etc/td-agent/td-agent.conf 91 | ``` 92 | 93 | **[変更前]** 94 | 95 | ``` 96 | host <エンドポイント> 97 | ``` 98 | 99 | **[変更後の例]** 100 | 101 | ``` 102 | host search-handson-minilake-ikop2vbusshbf3pgnuqzlxxxxx.ap-northeast-1.es.amazonaws.com 103 | ``` 104 | 105 | 5. 上記に続き **<マスターユーザー名>** と **<マスターパスワード>** についても、 **[Step1:OpenSearch Service の起動のセクション6]** で作成した、 **"マスターユーザー名":** ```aesadmin(任意)``` と **"マスターパスワード":** ```MyPassword&1(任意)``` に置き換える形で修正します。 106 | 107 | **[変更前]** 108 | 109 | ``` 110 | user <マスターユーザー名> 111 | password <マスターパスワード> 112 | ``` 113 | **[変更後の例]** 114 | 115 | ``` 116 | user aesadmin 117 | password MyPassword&1 118 | ``` 119 | 6. td-agent のプロセスを起動します。 120 | 121 | **Asset** 資料:[2-cmd.txt](asset/ap-northeast-1/2-cmd.txt) 122 | 123 | ``` 124 | # /etc/init.d/td-agent restart 125 | ``` 126 | 127 | 7. Fluentd のログを確認します。 128 | 129 | **Asset** 資料:[2-cmd.txt](asset/ap-northeast-1/2-cmd.txt) 130 | 131 | ``` 132 | # tail -f /var/log/td-agent/td-agent.log 133 | ``` 134 | 135 |   **Note:** ログの中にエラーが出続けることがないかを確認します。起動に成功した場合、以下の文言が出力されます。 136 | 137 | ``` 138 | [info]: #0 Connection opened to OpenSearch cluster => {..... 139 | ``` 140 | ログ出力まで少し時間がかかる場合があります。 141 | 142 | ### Step3:OpenSearch Service の設定 143 | 144 | 1. AWS マネジメントコンソールのサービス一覧から **OpenSearch Service** を選択します。 145 | 146 | 2. 左ペインにある **[ドメイン]** を選択します。作成した「 **handson-minilake**(任意)」ドメインの **[ドメインのステータス]** が **[アクティブ]** で、 **[検索可能なドキュメント]** の件数が1件以上になっていることを確認し、「 **handson-minilake**(任意)」ドメインをクリックします。 147 | 148 | 3. **[OpenSearch Dashboards の URL]** をクリックします。 149 | 150 | 4. **[Open Distro for OpenSearch]** 画面が表示されるため、 **[Step1:OpenSearch Service の起動のセクション6]** で作成した、 **"マスターユーザー名"** と **"マスターパスワード"** を入力します。 151 | 152 | 5. **[Welcome to OpenSearch]** 画面が表示されるため、 **[Explore on my own]** を選択します。 153 | 154 | 6. **[Select your tenant]** のポップアップが表示されるため、 **[Private]** を選択し、 **[Confirm]** をクリックし、 **OpenSearch Dashboards** の画面を開きます。 155 | 156 | #### OpenSearch Dashboards での操作 157 | 158 | 7. 
**OpenSearch Dashboards** の画面左にある![kibana_pain](images/kibana_pain2.png)アイコンをクリックし、 **[Dashboard]** をクリックします。 159 | 160 | 8. **[Create index pattern]** をクリックし、 **[Create index pattern]** 画面において、 **[Index pattern name]** に「 ___testappec2log-*___ 」を入力し、右側の **[Next step]** をクリックします。 161 | 162 | 9. **[Time field]** において、 **[@timestamp]** を選択し、画面右下の **[Create index pattern]** をクリックします。 163 | 164 | 10. **OpenSearch Dashboards** の画面の左ペインにある **[Saved Objects]** をクリックします。画面右上の **[Import]** をクリックします。 165 | 166 | 11. **[Import saved objects]** 画面において、 **[Import]** アイコンをクリックし、 **Asset** 資料の「 **2-visualization.json** 」を選択し、 **[Import]** をクリックします。続く画面において、 **[New index pattern]** に対して、「 **testappec2log-\*** 」を選択し、 **[Confirm all changes]** をクリックし、インポートを完了します。問題なくインポートが完了したら、 **[Done]** をクリックすると、元の画面に戻ります。 167 | 168 | **Asset** 資料:[2-visualization.json](asset/ap-northeast-1/2-visualization.json) 169 | 170 | 12. 続いて、再度 **[Saved Objects]** 画面において、 **[Import]** アイコンをクリックし、 **Asset** 資料の「 **2-dashboard.json** 」を選択し、 **[Import]** をクリックし、インポートします。問題なくインポートが完了したら、 **[Done]** をクリックすると、元の画面に戻ります。 171 | 172 | **Asset** 資料:[2-dashboard.json](asset/ap-northeast-1/2-dashboard.json) 173 | 174 | 13. **OpenSearch Dashboards** の画面左にある![kibana_pain](images/kibana_pain2.png)アイコンをクリックし、ペインからインポートした「 **test1-dashboard** 」をクリックし、以下のように値が表示されていれば完了です。 175 | 176 | 177 | 178 | 14. **OpenSearch Dashboards** の画面にて、右上でタイムレンジが選べるため、期間を **[Last 1 hour]** にしてみます。グラフ表示が1時間の間の取得値に変化していることが確認できます。 179 | 180 | 15. **OpenSearch Dashboards** の画面左にある![kibana_pain](images/kibana_pain2.png)アイコンをクリックし、**[Discover]** をクリックします。 181 | 182 | 16. **"Available fields"** において、 **[alarmlevel]** の右の **[プラスボタン]** をクリックします。同じように **[user]** の右側の **[プラスボタン]** をクリックすると、対象のカラム(Time, alarmlevel, user)だけが表示されます。 183 | 184 | **Note:** **[プラスボタン]** はカーソルがある時にだけ表示されます。 185 | 186 | 17. 検索窓に「 **user:"imai"** 」と入力し、Enterを押すと、「 **imai** 」というユーザーでフィルタリングされます。 187 | 188 | 189 | ## Section3:まとめ 190 | 191 | EC2 からのログをストリームで OpenSearch Service に送り、 OpenSearch Dashboards で可視化してエラーログなどを探しやすくなりました。大量の EC2 がある場合、ログを探すのは大変なのでさらに高い効果が見込めます。 192 | 193 | 194 | 195 | Lab2 は以上です。選択されているパターンに合わせて次の手順を実施ください。 196 | 197 | (1) ニアリアルタイムデータ分析環境(スピードレイヤ)の構築:[Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) 198 | (2) 長期間のデータをバッチ分析する環境(バッチレイヤ)の構築と、パフォーマンスとコストの最適化:[Lab1](../lab1/README.md) → [Lab4](../lab4/README.md) or [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md) 199 | (3) すべて実施:[Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) → [Lab4](../lab4/README.md) → [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md) 200 | 201 | 環境を削除される際は、[こちら](../clean-up/README.md)の手順をご覧ください。 202 | -------------------------------------------------------------------------------- /JP/lab2/asset/ap-northeast-1/2-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | td-agent-gem install -v 2.6.0 fluent-plugin-elasticsearch 10 | td-agent-gem list | grep plugin-elasticsearch 11 | 12 | ※失敗したら以下でアンインストール 13 | td-agent-gem uninstall -v 2.6.0 fluent-plugin-elasticsearch 14 | 15 | 2. 16 | /etc/init.d/td-agent start 17 | 18 | 3. 
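※(原文の asset にはない注記)tail の実行中にエラーが出続ける場合は、td-agent.conf の host・user・password の設定値を見直してください。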
19 | tail -f /var/log/td-agent/td-agent.log 20 | -------------------------------------------------------------------------------- /JP/lab2/asset/ap-northeast-1/2-dashboard.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "_id": "test1-dashboard", 4 | "_type": "dashboard", 5 | "_source": { 6 | "title": "test1-dashboard", 7 | "hits": 0, 8 | "description": "", 9 | "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test1-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test1-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":6,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 100\":\"rgb(0,104,55)\"}}},\"id\":\"test1-count\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"id\":\"test1-ranking\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"id\":\"test1-username\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\",\"7 - 13\":\"rgb(199,233,192)\"}}},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":6,\"y\":0,\"w\":5,\"h\":2,\"i\":\"10\"},\"id\":\"test1-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"}]", 10 | "optionsJSON": "{\"darkTheme\":false,\"hidePanelTitles\":false,\"useMargins\":false}", 11 | "version": 1, 12 | "timeRestore": false, 13 | "kibanaSavedObjectMeta": { 14 | "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}" 15 | } 16 | } 17 | } 18 | ] -------------------------------------------------------------------------------- /JP/lab2/asset/ap-northeast-1/2-td-agent.conf: -------------------------------------------------------------------------------- 1 | <source> 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?<time>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/ 6 | time_format %d/%b/%Y:%H:%M:%S %z 7 | types size:integer, status:integer, reqtime:float, runtime:float, time:time 8 | tag testappec2.log 9 | </source> 10 | 11 | <match testappec2.log> 12 | type_name testappec2log 13 | @type elasticsearch 14 | include_tag_key true 15 | tag_key @log_name 16 | host <エンドポイント> 17 | port 443 18 | user <マスターユーザー名> 19 | password <マスターパスワード> 20 | scheme https 21 | logstash_format true 22 | logstash_prefix testappec2log 23 | flush_interval 10s 24 | retry_limit 5 25 | buffer_type file 26 | buffer_path /var/log/td-agent/buffer/testapp.log.buffer 27 | reload_connections false 28 | </match> 29 | -------------------------------------------------------------------------------- /JP/lab2/asset/us-east-1/2-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | td-agent-gem install -v 2.6.0 fluent-plugin-elasticsearch 10 | td-agent-gem list | grep plugin-elasticsearch 11 | 12 | ※失敗したら以下でアンインストール 13 | td-agent-gem uninstall -v 2.6.0 fluent-plugin-elasticsearch 14 | 15 | 2. 16 | /etc/init.d/td-agent start 17 | 18 | 3. 19 | tail -f /var/log/td-agent/td-agent.log 20 | -------------------------------------------------------------------------------- /JP/lab2/asset/us-east-1/2-dashboard.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "_id": "test1-dashboard", 4 | "_type": "dashboard", 5 | "_source": { 6 | "title": "test1-dashboard", 7 | "hits": 0, 8 | "description": "", 9 | "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test1-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test1-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":6,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 100\":\"rgb(0,104,55)\"}}},\"id\":\"test1-count\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"id\":\"test1-ranking\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"id\":\"test1-username\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\",\"7 - 13\":\"rgb(199,233,192)\"}}},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":6,\"y\":0,\"w\":5,\"h\":2,\"i\":\"10\"},\"id\":\"test1-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"}]", 10 | "optionsJSON": "{\"darkTheme\":false,\"hidePanelTitles\":false,\"useMargins\":false}", 11 | "version": 1, 12 | "timeRestore": false, 13 | "kibanaSavedObjectMeta": { 14 | "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}" 15 | } 16 | } 17 | } 18 | ] -------------------------------------------------------------------------------- /JP/lab2/asset/us-east-1/2-td-agent.conf: -------------------------------------------------------------------------------- 1 | <source> 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?<time>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/ 6 | time_format %d/%b/%Y:%H:%M:%S %z 7 | types size:integer, status:integer, reqtime:float, runtime:float, time:time 8 | tag testappec2.log 9 | </source> 10 | 11 | <match testappec2.log> 12 | type_name testappec2log 13 | @type elasticsearch 14 | include_tag_key true 15 | tag_key @log_name 16 | host eshost 17 | port 443 18 | scheme https 19 | logstash_format true 20 | logstash_prefix testappec2log 21 | flush_interval 10s 22 | retry_limit 5 23 | buffer_type file 24 | buffer_path /var/log/td-agent/buffer/testapp.log.buffer 25 | reload_connections false 26 | </match> 27 | -------------------------------------------------------------------------------- /JP/lab2/images/Lab2-Section1-Step1-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/Lab2-Section1-Step1-4.png -------------------------------------------------------------------------------- /JP/lab2/images/kibana_capture01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/kibana_capture01.png -------------------------------------------------------------------------------- /JP/lab2/images/kibana_capture02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/kibana_capture02.png -------------------------------------------------------------------------------- /JP/lab2/images/kibana_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/kibana_dashboard.png -------------------------------------------------------------------------------- /JP/lab2/images/kibana_discover.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/kibana_discover.png -------------------------------------------------------------------------------- /JP/lab2/images/kibana_management.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/kibana_management.png -------------------------------------------------------------------------------- /JP/lab2/images/kibana_pain.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/kibana_pain.png -------------------------------------------------------------------------------- /JP/lab2/images/kibana_pain2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab2/images/kibana_pain2.png -------------------------------------------------------------------------------- /JP/lab3/README.md: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | # Lab3:アプリケーションログのリアルタイム可視化とアラーム 9 | 「Lab2:アプリケーションログをリアルタイムで可視化」で実施した可視化に加え、アラーム検知を実施します。 10 | Fluentd から OpenSearch Service に送信する前段に Amazon CloudWatch(以降、CloudWatch)、 AWS Lambda(以降、Lambda)を配置して、アラーム通知をする処理を追加します。 11 | 12 | ## Section1:EC2 の設定変更 13 | ### Step1:IAM ロールの設定 14 | 15 | 作成済の「 **handson-minilake-role**(任意)」の IAM ロールに以下のようにポリシーを追加します。 16 | 17 | 1. AWS マネジメントコンソールのサービス一覧から **IAM** を選択し、 **[Identity and Access Management (IAM)]** 画面の左ペインから **[ロール]** を選択し、「 **handson-minilake-role**(任意)」のロール名をクリックします。 18 | 19 | 2. **[許可]** タブを選択し、 **[許可を追加]** をクリックし、 **[ポリシーをアタッチ]** をクリックします。 20 | 21 | 3. 検索などを使いながら、 **[CloudWatchLogsFullAccess]** のポリシーにチェックを入れ、 **[ポリシーをアタッチ]** をクリックします。 22 | 23 | 4. 変更実施したロールの **[許可]** タブを選択し、 **[CloudWatchLogsFullAccess]** がアタッチされたことを確認します。 24 | 25 | 5. 次に **[信頼関係]** タブを選択し、 **[信頼ポリシーを編集]** を押下し、ポリシーに以下を追加します。 26 | 27 | **[追加するポリシー]** 28 | 29 | ``` 30 | "lambda.amazonaws.com" 31 | ``` 32 | 33 | **[追加例]** 34 | 35 | ``` 36 | { 37 | "Version": "2012-10-17", 38 | "Statement": [ 39 | { 40 | "Effect": "Allow", 41 | "Principal": { 42 | "Service": [ 43 | "ec2.amazonaws.com", 44 | "lambda.amazonaws.com" 45 | ] 46 | }, 47 | "Action": "sts:AssumeRole" 48 | } 49 | ] 50 | } 51 | ``` 52 | 53 | 6. 次に **[信頼されたエンティティ]** に、 **[Lambda]** が追加されていることを確認します。 54 | 55 | 7. あとで使用するため、 **[ロールARN]** の値をメモしておきます。 56 | 57 | ### Step2:OpenSearch へのロールの認証設定 58 | 59 | 1. **OpenSearch Dashboards** の画面を開き、 **OpenSearch Dashboards** の画面左にある![kibana_pain](images/kibana_pain2.png)アイコンをクリックし、 **[Security]** をクリックします。 60 | 61 | 2. 左ペインにある **[Roles]** を押下し、Role一覧の中にある **[all_access]** をクリックします。 62 | 63 | 3. **[Mapped users]** のタブを選択し、 **[Manage mapping]** を押下します。 64 | 65 | 4. **[Backend roles]** に先ほどメモした、 「**handson-minilake-role**(任意)」の **[ロールARN]** の値を入力し **[Map]** ボタンを押下します。 66 | 67 | 68 | ### Step3:Fluentd の設定 69 | 70 | Fluentd から CloudWatch Logs にログデータを送信するための設定を行います。 71 | 72 | 1. 
EC2 にログインし、プラグインのインストール状況を確認します。 73 | 74 | **Asset** 資料:[3-cmd.txt](asset/ap-northeast-1/3-cmd.txt) 75 | 76 | ``` 77 | # td-agent-gem list | grep cloudwatch-logs 78 | ``` 79 | 80 | **[実行結果例]** 81 | 82 | ``` 83 | fluent-plugin-cloudwatch-logs (0.4.4) 84 | ``` 85 | 86 | 2. 「 **/etc/td-agent/td-agent.conf** 」の設定を **Lab3** 向けに変更するために、あらかじめ用意しておいた **asset** に 以下の cp コマンドを用いて置き換えます。その際、ファイルを上書きするかの確認が入る為、 **yes** と入力します。 87 | 88 | ``` 89 | # cp -p /root/asset/3-td-agent.conf /etc/td-agent/td-agent.conf 90 | ``` 91 | 92 | 3. Fluentd を再起動します。 93 | 94 | **Asset** 資料:[3-cmd.txt](asset/ap-northeast-1/3-cmd.txt) 95 | 96 | ``` 97 | # /etc/init.d/td-agent restart 98 | ``` 99 | 100 | 4. Fluentd のログを確認し、ログの中にエラーが出続けることがないかを確認します。 101 | 102 | **Asset** 資料:[3-cmd.txt](asset/ap-northeast-1/3-cmd.txt) 103 | 104 | ``` 105 | # tail -f /var/log/td-agent/td-agent.log 106 | ``` 107 | 108 | ## Section2:CloudWatch, OpenSearch Service の設定変更 109 | ### Step1:CloudWatch Logs の設定 110 | 111 | 1. AWS マネジメントコンソールのサービス一覧から **CloudWatch** を選択し、 **[CloudWatch]** の画面の左側ペインから **[ロググループ]** をクリックします。 112 | 113 | 2. ロググループ「 **minilake_group**(任意)」が出力されていることを確認し、クリックします。 114 | 115 | **Note:** 数分待ってもログが出ない場合は、 EC2 に IAM ロールがアタッチされてるか確認します。 116 | 117 | 3. ログストリーム「 **testapplog_stream**(任意)」をクリックします。直近のログが出力されていることを確認します。画面上部の **[ロググループ]** の文字列をクリックし、ロググループに戻ります。 118 | 119 | 4. ロググループ「 **minilake_group**(任意)」にチェックを入れ、 **[アクション]** をクリックし、 **[サブスクリプションフィルター]** の中にある **[Amazon OpenSearch Service サブスクリプションフィルターを作成]** をクリックします。 120 | 121 | **Note:** 裏側では自動で Lambda Function が作られます。 122 | 123 | 5. **"送信先の選択"** において、 **[アカウントの選択]** で **[このアカウント]** を選択し、 **[Amazon OpenSearch Service クラスター]** において、作成済みの「 **handson-minilake**(任意)」を選択し、 **[Lambda IAM Execution Role]** において、 「**handson-minilake-role**(任意)」 を選択します。 124 | 125 |   **Note:** ブラウザでポップアップブロックが走ったら、許可して、1つ前の手順からやり直して下さい。 126 | 127 | 6. **"ログ形式とフィルターを設定"** 画面において、 **[ログの形式]** に **[その他]** を選択し、 **[サブスクリプションフィルター名]** に、「**handson-minilake01**(任意)」と入力し、 **[ストリーミングの開始]** をクリックします。 128 | 129 | 130 | ### Step2:OpenSearch Service の設定 131 | 132 | 1. **OpenSearch Dashboards** の画面を開き、 **OpenSearch Dashboards** の画面左にある![kibana_pain](images/kibana_pain2.png)アイコンをクリックし、 **[Management]** 内にある **[Stack Management]** をクリックします。 133 | 134 | 2. 左ペインの **[Index Patterns]** を選択し、 **[Create index pattern]** をクリックします。 135 | 136 | 3. **[Index pattern name]** に「 __cwl-*__ 」を入力し、右側の **[Next step]** をクリックします。 137 | 138 | **Note:** インデックス作成に多少の時間がかかるため、 **[Next step]** がクリックできるようになるまで、少し時間がかかることが想定されます。 139 | 140 | 4. **[Time field]** に **[@timestamp]** を選択し、右下の **[Create index pattern]** をクリックします。 141 | 142 | 5. **OpenSearch Dashboards** の画面の左ペインから **[Saved Objects]** をクリックします。画面右上の **[Import]** アイコンをクリックします。 143 | 144 | 6. **[Import saved objects]** 画面において、 **[Import]** アイコンをクリックし、 **Asset** 資料の「 **3-visualization.json** 」を選択し、 **[Import]** をクリックします。続く画面において、 **[New index pattern]** に対して、「 **cwl-\*** 」を選択し、 **[Confirm all changes]** をクリックし、インポートを完了します。問題なくインポートが完了したら、 **[Done]** をクリックすると、元の画面に戻ります。 145 | 146 | **Asset** 資料:[3-visualization.json](asset/ap-northeast-1/3-visualization.json) 147 | 148 | 7. 続いて、再度 **[Saved Objects]** 画面において、 **[Import]** アイコンをクリックし、 **Asset** 資料の「 **3-dashboard.json** 」を選択し、 **[Import]** をクリックし、インポートします。問題なくインポートが完了したら、 **[Done]** をクリックすると、元の画面に戻ります。 149 | 150 | **Asset** 資料:[3-dashboard.json](asset/ap-northeast-1/3-dashboard.json) 151 | 152 | ### Step3:CloudWatch アラームの設定 153 | 154 | 1. 

## Section3: Summary

You have built an environment for real-time log monitoring and alerting.



This completes Lab3. Proceed to the next steps according to the pattern you have chosen.

(1) Building a near-real-time data analysis environment (speed layer): [Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md)
(2) Building an environment for batch analysis of long-term data (batch layer) and optimizing performance and cost: [Lab1](../lab1/README.md) → [Lab4](../lab4/README.md) or [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md)
(3) Doing everything: [Lab1](../lab1/README.md) → [Lab2](../lab2/README.md) → [Lab3](../lab3/README.md) → [Lab4](../lab4/README.md) → [Lab5](../lab5/README.md) → [Lab6](../lab6/README.md)

When deleting the environment, see the steps [here](../clean-up/README.md).
--------------------------------------------------------------------------------
/JP/lab3/additional_info_lab3.md:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


# Setting Up Alerts in Elasticsearch Service

Amazon Elasticsearch Service (hereafter Elasticsearch Service) has added support for event monitoring and alerting. Alerting is available on domains running Elasticsearch 6.2 or later. See [here](https://aws.amazon.com/jp/about-aws/whats-new/2019/04/amazon-elasticsearch-service-adds-event-monitoring-and-alerting-support/) for details.
This document introduces how to configure alert notifications using Amazon Simple Notification Service (hereafter Amazon SNS).

1. From the service list in the AWS Management Console, select **Simple Notification Service**, choose **[Topics]** in the left pane, and click **[Create topic]**.

2. On the **[Create topic]** screen, enter " **handson-minilake** (any name)" as the name and click **[Create topic]** to create the topic. For creating an endpoint subscription to the topic, refer to [this page](https://docs.aws.amazon.com/ja_jp/sns/latest/dg/sns-getting-started.html) and register one.
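
As a reference, steps 1 and 2 can also be performed with boto3. A minimal sketch; the e-mail address is a placeholder:

```
import boto3

sns = boto3.client("sns", region_name="ap-northeast-1")

# Create the topic (idempotent: returns the existing ARN if it already exists).
topic_arn = sns.create_topic(Name="handson-minilake")["TopicArn"]

# Subscribe an e-mail endpoint; the recipient must confirm the subscription.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")
print(topic_arn)
```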
3. To add a policy to the existing " **handson-minilake** (any name)" IAM role, select **IAM** from the service list in the AWS Management Console, click **[Roles]**, and click the " **handson-minilake** (any name)" role name.

4. Select the **[Permissions]** tab and click **[Attach policies]**.

5. Using search as needed, check the **[AmazonSNSFullAccess]** policy and click **[Attach policy]**.

6. Click the **[Trust relationships]** tab and click the **[Edit trust relationship]** button.

7. On the **[Edit trust relationship]** screen, add **es** to the **"Service": "ec2.amazonaws.com"** part: wrap the value in **[]**, separate the entries with a **comma**, add **es.amazonaws.com**, and click **[Update Trust Policy]**.

**[Example]**

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "es.amazonaws.com",
                    "ec2.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

8. Open **Kibana** and select **[Alerting]**. Select the **[Destinations]** tab and select **[Add destination]**.

9. On the **[Add destination]** screen, enter the following and click **[Create]**.

- Name: sns-handson-minilake (any name)
- Type: Amazon SNS
- SNS topic ARN: the ARN of the SNS topic used for notifications
- IAM role ARN: the ARN of the handson-minilake (any name) role

10. Next, configure the monitor. Select **[Monitors]** and click **[Create monitor]**.

11. Fill in **[Configure Monitor]** as follows.

- Monitor name: monitor-handson-minilake (any name)
- Frequency: By interval (default)
- Every: 3 Minutes

**Note:** With a 1-minute interval, the delivery of CloudWatch Logs data into Elasticsearch Service can lag slightly, so a 3-minute interval is used here.


12. Fill in **[Define Monitor]** as follows and click **[Create]** at the bottom of the screen.

- How do you want to define the monitor?: Define using extraction query
- Index: select cwl-*
- Define extraction query: copy the contents of the [3-define-extraction-query.txt](asset/ap-northeast-1/3-define-extraction-query.txt) file

13. You are taken to the **[Create Trigger]** screen; configure the trigger under **Define Trigger** with the following values.

- Trigger name: trigger-handson-minilake (any name)
- Severity level: 1 (default)
- Execution query response: leave unchanged
- Trigger condition: change the value to 50
```
ctx.results[0].hits.total > 50
```

14. Under **[Configure Actions]**, enter the following and click **[Create]** at the bottom of the screen to create the action.

- Action name: action-handson-minilake (any name)
- Destination name: sns-handson-minilake – (Amazon SNS)
- Message subject: alert-handson-minilake (any name)

15. Since the workload is configured to produce 300 ERROR entries every 10 minutes, you should be able to confirm that the alert is triggered after about 10 minutes.
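
To sanity-check the trigger condition without waiting for an alert, you can run the same count yourself against the `cwl-*` indices. A minimal sketch; the domain endpoint and master user credentials are placeholders:

```
import requests

ENDPOINT = "https://search-handson-minilake-xxxx.ap-northeast-1.es.amazonaws.com"  # placeholder

# Same condition as 3-define-extraction-query.txt: ERROR entries in the last 3 minutes.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"alarmlevel": "ERROR"}}],
            "filter": [{"range": {"@timestamp": {"from": "now-3m", "to": "now"}}}],
        }
    }
}
resp = requests.get(
    f"{ENDPOINT}/cwl-*/_count",
    auth=("master-user", "master-password"),  # placeholder credentials
    json=query,
)
print(resp.json()["count"], "(a value above 50 would trigger the alert)")
```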
--------------------------------------------------------------------------------
/JP/lab3/asset/ap-northeast-1/3-cmd.txt:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


1.
td-agent-gem install fluent-plugin-cloudwatch-logs -v 0.4.4
td-agent-gem list | grep fluent-plugin-cloudwatch-logs

2.
export AWS_REGION="ap-northeast-1"

3.
/etc/init.d/td-agent restart

4.
tail -f /var/log/td-agent/td-agent.log
--------------------------------------------------------------------------------
/JP/lab3/asset/ap-northeast-1/3-dashboard.json:
--------------------------------------------------------------------------------
[
  {
    "_id": "test2-dashboard",
    "_type": "dashboard",
    "_source": {
      "title": "test2-dashboard",
      "hits": 0,
      "description": "",
      "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test2-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test2-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":5,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"id\":\"test2-count\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 100\":\"rgb(0,104,55)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"id\":\"test2-ranking\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"id\":\"test2-username\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"7 - 13\":\"rgb(199,233,192)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":5,\"y\":0,\"w\":4,\"h\":2,\"i\":\"10\"},\"type\":\"visualization\",\"id\":\"test2-text\",\"version\":\"6.2.2\"}]",
      "optionsJSON": "{\"darkTheme\":false,\"useMargins\":false}",
      "version": 1,
      "timeRestore": false,
      "kibanaSavedObjectMeta": {
        "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}"
      }
    }
  }
]
--------------------------------------------------------------------------------
/JP/lab3/asset/ap-northeast-1/3-define-extraction-query.txt:
--------------------------------------------------------------------------------
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "alarmlevel": {
              "query": "ERROR",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1
            }
          }
        }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
"now-3m", 27 | "to": "now", 28 | "include_lower": true, 29 | "include_upper": true, 30 | "boost": 1 31 | } 32 | } 33 | } 34 | ], 35 | "adjust_pure_negative": true, 36 | "boost": 1 37 | } 38 | }, 39 | "aggregations": {} 40 | } -------------------------------------------------------------------------------- /JP/lab3/asset/ap-northeast-1/3-td-agent.conf: -------------------------------------------------------------------------------- 1 | 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?[^ ]* [^ ]*)\] (?[^ ]*) *? (?[^ ]*) * (?[^ ]*) * (?.*) \[(?.*)\]$/ 6 | time_format %d/%b/%Y:%H:%M:%S %z 7 | types size:integer, status:integer, reqtime:float, runtime:float, time:time 8 | tag testappec2.log 9 | 10 | 11 | 12 | @type cloudwatch_logs 13 | log_group_name minilake_group 14 | log_stream_name testapplog_stream 15 | auto_create_stream true 16 | 17 | -------------------------------------------------------------------------------- /JP/lab3/asset/us-east-1/3-cmd.txt: -------------------------------------------------------------------------------- 1 | ------------------------------------------------------------------------------------ 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | SPDX-License-Identifier: MIT-0 4 | 5 | ------------------------------------------------------------------------------------ 6 | 7 | 8 | 1. 9 | td-agent-gem install fluent-plugin-cloudwatch-logs -v 0.4.4 10 | td-agent-gem list | grep fluent-plugin-cloudwatch-logs 11 | 12 | 2. 13 | export AWS_REGION="ap-northeast-1" 14 | 15 | ※バージニア北部で実施の場合 16 | export AWS_REGION="us-east-1" 17 | 18 | 19 | 3. 20 | /etc/init.d/td-agent restart 21 | 22 | 4. 23 | tail -f /var/log/td-agent/td-agent.log 24 | -------------------------------------------------------------------------------- /JP/lab3/asset/us-east-1/3-dashboard.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "_id": "test2-dashboard", 4 | "_type": "dashboard", 5 | "_source": { 6 | "title": "test2-dashboard", 7 | "hits": 0, 8 | "description": "", 9 | "panelsJSON": "[{\"panelIndex\":\"1\",\"gridData\":{\"x\":0,\"y\":2,\"w\":12,\"h\":4,\"i\":\"1\"},\"id\":\"test2-alarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"3\",\"gridData\":{\"x\":6,\"y\":6,\"w\":6,\"h\":5,\"i\":\"3\"},\"id\":\"test2-hostalarmlevel\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"4\",\"gridData\":{\"x\":0,\"y\":0,\"w\":5,\"h\":2,\"i\":\"4\"},\"id\":\"test-text\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"5\",\"gridData\":{\"x\":3,\"y\":6,\"w\":3,\"h\":3,\"i\":\"5\"},\"id\":\"test2-count\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 
100\":\"rgb(0,104,55)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"6\",\"gridData\":{\"x\":0,\"y\":6,\"w\":3,\"h\":3,\"i\":\"6\"},\"id\":\"test2-ranking\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"7\",\"gridData\":{\"x\":0,\"y\":9,\"w\":6,\"h\":3,\"i\":\"7\"},\"id\":\"test2-username\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"8\",\"gridData\":{\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"i\":\"8\"},\"id\":\"fb6541a0-fbe1-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"embeddableConfig\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"defaultColors\":{\"0 - 7\":\"rgb(247,252,245)\",\"7 - 13\":\"rgb(199,233,192)\",\"13 - 20\":\"rgb(116,196,118)\",\"20 - 26\":\"rgb(35,139,69)\"}}},\"version\":\"6.2.2\"},{\"panelIndex\":\"9\",\"gridData\":{\"x\":0,\"y\":18,\"w\":12,\"h\":3,\"i\":\"9\"},\"id\":\"c05b0260-fbe2-11e7-a2c5-4f46f1590b26\",\"type\":\"visualization\",\"version\":\"6.2.2\"},{\"panelIndex\":\"10\",\"gridData\":{\"x\":5,\"y\":0,\"w\":4,\"h\":2,\"i\":\"10\"},\"type\":\"visualization\",\"id\":\"test2-text\",\"version\":\"6.2.2\"}]", 10 | "optionsJSON": "{\"darkTheme\":false,\"useMargins\":false}", 11 | "version": 1, 12 | "timeRestore": false, 13 | "kibanaSavedObjectMeta": { 14 | "searchSourceJSON": "{\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"default_field\":\"*\",\"query\":\"*\"}}},\"highlightAll\":true,\"version\":true}" 15 | } 16 | } 17 | } 18 | ] -------------------------------------------------------------------------------- /JP/lab3/asset/us-east-1/3-define-extraction-query.txt: -------------------------------------------------------------------------------- 1 | { 2 | "size": 0, 3 | "query": { 4 | "bool": { 5 | "must": [ 6 | { 7 | "match": { 8 | "alarmlevel": { 9 | "query": "ERROR", 10 | "operator": "OR", 11 | "prefix_length": 0, 12 | "max_expansions": 50, 13 | "fuzzy_transpositions": true, 14 | "lenient": false, 15 | "zero_terms_query": "NONE", 16 | "auto_generate_synonyms_phrase_query": true, 17 | "boost": 1 18 | } 19 | } 20 | } 21 | ], 22 | "filter": [ 23 | { 24 | "range": { 25 | "@timestamp": { 26 | "from": "now-3m", 27 | "to": "now", 28 | "include_lower": true, 29 | "include_upper": true, 30 | "boost": 1 31 | } 32 | } 33 | } 34 | ], 35 | "adjust_pure_negative": true, 36 | "boost": 1 37 | } 38 | }, 39 | "aggregations": {} 40 | } -------------------------------------------------------------------------------- /JP/lab3/asset/us-east-1/3-td-agent.conf: -------------------------------------------------------------------------------- 1 | 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?[^ ]* [^ ]*)\] (?[^ ]*) *? 
--------------------------------------------------------------------------------
/JP/lab3/asset/us-east-1/3-td-agent.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type cloudwatch_logs
  log_group_name minilake_group
  log_stream_name testapplog_stream
  auto_create_stream true
</match>
--------------------------------------------------------------------------------
/JP/lab3/images/Lab3-Section2-Step3-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab3/images/Lab3-Section2-Step3-4.png
--------------------------------------------------------------------------------
/JP/lab3/images/kibana_pain2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab3/images/kibana_pain2.png
--------------------------------------------------------------------------------
/JP/lab4/additional_info_lab4.md:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


# Supplementary Notes on Running Queries in Athena
## Example runs after some time has passed and data has accumulated

1. Run the following SQL in the query editor.

```
SELECT * FROM "minilake"."minilake_in1";
```

**[Example result]**

```
(Run time: 4.66 seconds, Data scanned: 27.02 MB)
```

You can confirm that the run time and the amount of data scanned grow with the volume of data.


2. Next, try running a query with a WHERE clause.

```
SELECT * FROM "minilake"."minilake_in1" where partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 = '14';
```

**[Example result]**

```
(Run time: 2.46 seconds, Data scanned: 284.42 KB)
```

**Note:** In the WHERE clause, use dates for which data actually exists.


As in step 1, you can confirm that the run time and the amount of data scanned grow with the volume of data.



**Athena is billed by the amount of data scanned, so the less you scan, the lower the cost; scanning less data also improves performance.**
To reduce the amount of data read, use partitioning, compression, and columnar formats. See [here](https://aws.amazon.com/jp/blogs/news/top-10-performance-tuning-tips-for-amazon-athena/) for more information.
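
For reference, the run time and data-scanned figures shown above can also be read programmatically from the query statistics, which is convenient when comparing queries. Below is a minimal boto3 sketch; the result output location is a placeholder:

```
import time
import boto3

athena = boto3.client("athena", region_name="ap-northeast-1")

qid = athena.start_query_execution(
    QueryString='SELECT * FROM "minilake"."minilake_in1" '
                "where partition_0 = '2019' AND partition_1 = '09' "
                "AND partition_2 = '27' AND partition_3 = '14';",
    ResultConfiguration={"OutputLocation": "s3://[S3 BUCKET NAME]/athena-results/"},  # placeholder
)["QueryExecutionId"]

# Poll until the query finishes, then print the same numbers Athena shows in the console.
while True:
    q = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]
    if q["Status"]["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

stats = q["Statistics"]
print(f"Run time: {stats['EngineExecutionTimeInMillis']} ms, "
      f"Data scanned: {stats['DataScannedInBytes']} bytes")
```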
--------------------------------------------------------------------------------
/JP/lab4/asset/ap-northeast-1/4-cmd.txt:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


1.
td-agent-gem install fluent-plugin-kinesis -v 2.1.0

2.
td-agent-gem list | grep plugin-kinesis

3.
export AWS_REGION="ap-northeast-1"

4.
/etc/init.d/td-agent restart

5.
SELECT * FROM "minilake"."minilake_in1";

6.
SELECT * FROM "minilake"."minilake_in1" where partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 = '14';
--------------------------------------------------------------------------------
/JP/lab4/asset/ap-northeast-1/4-policydocument.txt:
--------------------------------------------------------------------------------
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "glue.amazonaws.com",
          "ec2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
--------------------------------------------------------------------------------
/JP/lab4/asset/ap-northeast-1/4-td-agent1.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type copy
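  # The copy output duplicates every event to each <store> below:
  # one copy goes to CloudWatch Logs (used for the Lab3-style alarms),
  # the other to the Kinesis Data Firehose delivery stream "minilake1" for S3.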
  <store>
    @type cloudwatch_logs
    log_group_name minilake_group
    log_stream_name testapplog_stream
    auto_create_stream true
  </store>
  <store>
    @type kinesis_firehose
    delivery_stream_name minilake1
    flush_interval 1s
  </store>
</match>
--------------------------------------------------------------------------------
/JP/lab4/asset/ap-northeast-1/4-td-agent2.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type copy
  <store>
    @type kinesis_firehose
    delivery_stream_name minilake1
    flush_interval 1s
  </store>
</match>
--------------------------------------------------------------------------------
/JP/lab4/asset/us-east-1/4-cmd.txt:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


1.
td-agent-gem install fluent-plugin-kinesis -v 2.1.0

2.
td-agent-gem list | grep plugin-kinesis

3.
export AWS_REGION="us-east-1"

4.
/etc/init.d/td-agent restart

5.
SELECT * FROM "minilake"."minilake_in1";

6.
SELECT * FROM "minilake"."minilake_in1" where partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 = '14';
--------------------------------------------------------------------------------
/JP/lab4/asset/us-east-1/4-policydocument.txt:
--------------------------------------------------------------------------------
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "glue.amazonaws.com",
          "ec2.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
--------------------------------------------------------------------------------
/JP/lab4/asset/us-east-1/4-td-agent1.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type copy
  <store>
    @type cloudwatch_logs
    log_group_name minilake_group
    log_stream_name testapplog_stream
    auto_create_stream true
  </store>
  <store>
    @type kinesis_firehose
    delivery_stream_name minilake1
    flush_interval 1s
  </store>
</match>
--------------------------------------------------------------------------------
/JP/lab4/asset/us-east-1/4-td-agent2.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type copy
  <store>
    @type kinesis_firehose
    delivery_stream_name minilake1
    flush_interval 1s
  </store>
</match>
--------------------------------------------------------------------------------
/JP/lab4/images/quicksight_capture01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab4/images/quicksight_capture01.png
--------------------------------------------------------------------------------
/JP/lab5/asset/ap-northeast-1/5-cmd.txt:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


1.
td-agent-gem install fluent-plugin-kinesis -v 2.1.0

2.
td-agent-gem list | grep plugin-kinesis

3.
export AWS_REGION="ap-northeast-1"

4.
/etc/init.d/td-agent restart


6.
create table ec2log ( timestamp varchar, alarmlevel varchar, host varchar, number int2, text varchar );

7.
copy ec2log from 's3://[S3 BUCKET NAME]/minilake-in1' format as json 'auto' iam_role '[ARN of the IAM role you created]';

8.
create external schema my_first_external_schema from data catalog database 'spectrumdb' iam_role '[ARN of the IAM role you created]' create external database if not exists;

9.
create external table my_first_external_schema.ec2log_external ( timestamp varchar(max), alarmlevel varchar(max), host varchar(max), number int2, text varchar(max) ) partitioned by (year char(4), month char(2), day char(2), hour char(2)) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='timestamp,alarmlevel,host,number,text') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location 's3://[S3 BUCKET NAME]/minilake-in1/';

10.
select * from svv_external_schemas;

11.
select * from svv_external_databases;

12.
select * from svv_external_tables;

13.
Match the value of each partition key in the "ADD PARTITION" clause to the partition values in the S3 bucket path of the LOCATION clause. The following is just one example.

ALTER TABLE my_first_external_schema.ec2log_external ADD PARTITION (year='2019', month='09', day='27', hour='14') LOCATION 's3://[S3 BUCKET NAME]/minilake-in1/2019/09/27/14';


If you created a partition incorrectly, drop it as shown below, specifying the same partition values that were given in the ADD PARTITION statement above.

ALTER TABLE my_first_external_schema.ec2log_external DROP PARTITION (year='2019', month='09', day='27', hour='14')
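
(Reference) The SQL in steps 6-13 is normally run from a client connected to the Redshift cluster. As a non-authoritative sketch, the same statements can also be submitted through the Redshift Data API with boto3; the cluster identifier, database name, and database user below are placeholders:

import boto3

rsd = boto3.client("redshift-data", region_name="ap-northeast-1")
resp = rsd.execute_statement(
    ClusterIdentifier="handson-minilake",   # placeholder cluster identifier
    Database="dev",                         # placeholder database name
    DbUser="awsuser",                       # placeholder database user
    Sql="select * from svv_external_schemas;",
)
# Poll rsd.describe_statement(Id=resp["Id"]) and fetch rows with
# rsd.get_statement_result(Id=resp["Id"]) once the statement has finished.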
--------------------------------------------------------------------------------
/JP/lab5/asset/ap-northeast-1/5-minilake_privatesubnet.yaml:
--------------------------------------------------------------------------------
Parameters:
  VpcId:
    Description: select handson-minilake vpc id
    Type: "AWS::EC2::VPC::Id"
  EC2SecurityGroupId:
    Description: select EC2 Security Group ID
    Type: "AWS::EC2::SecurityGroup::Id"
Resources:
  # Create Private RouteTable
  PrivateRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VpcId
      Tags:
        - Key: Name
          Value: handson-minilake-private-rt
  # Create Private Subnet A
  PrivateSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VpcId
      CidrBlock: 10.0.0.32/27
      AvailabilityZone: "ap-northeast-1a"
      Tags:
        - Key: Name
          Value: handson-minilake-private-sub
  # Associate with Private subnet and route table for private subnet
  PriSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet
      RouteTableId: !Ref PrivateRouteTable
  # Create EIP
  MyEIP:
    Type: "AWS::EC2::EIP"
    Properties:
      Domain: vpc
  # Create NATGateway
  myNATGateway:
    Type: "AWS::EC2::NatGateway"
    DependsOn:
      - MyEIP
      - PrivateSubnet
    Properties:
      AllocationId: !GetAtt MyEIP.AllocationId
      SubnetId: !Ref PrivateSubnet
      Tags:
        - Key: Name
          Value: handson-minilake-nat
  # set NAT as default gateway on route table for private subnet
  myPrivateRoute:
    Type: AWS::EC2::Route
    DependsOn: myNATGateway
    Properties:
      RouteTableId: !Ref PrivateRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref myNATGateway
  # create sg for RS
  myRSSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupName: handson-minilake-sg-private
      GroupDescription: Enable access via port 5439
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '5439'
          ToPort: '5439'
          SourceSecurityGroupId: !Ref EC2SecurityGroupId
--------------------------------------------------------------------------------
/JP/lab5/asset/ap-northeast-1/5-td-agent1.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type copy
  <store>
    @type cloudwatch_logs
    log_group_name minilake_group
    log_stream_name testapplog_stream
    auto_create_stream true
  </store>
  <store>
    @type kinesis_firehose
    delivery_stream_name minilake1
    flush_interval 1s
  </store>
</match>
--------------------------------------------------------------------------------
/JP/lab5/asset/ap-northeast-1/5-td-agent2.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type copy
  <store>
    @type kinesis_firehose
    delivery_stream_name minilake1
    flush_interval 1s
  </store>
</match>
--------------------------------------------------------------------------------
/JP/lab5/asset/us-east-1/5-cmd.txt:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


1.
td-agent-gem install fluent-plugin-kinesis -v 2.1.0

2.
td-agent-gem list | grep plugin-kinesis

3.
export AWS_REGION="us-east-1"

4.
/etc/init.d/td-agent restart


6.
create table ec2log ( timestamp varchar, alarmlevel varchar, host varchar, number int2, text varchar );

7.
copy ec2log from 's3://[S3 BUCKET NAME]/minilake-in1' format as json 'auto' iam_role '[ARN of the IAM role you created]';

8.
create external schema my_first_external_schema from data catalog database 'spectrumdb' iam_role '[ARN of the IAM role you created]' create external database if not exists;

9.
create external table my_first_external_schema.ec2log_external ( timestamp varchar(max), alarmlevel varchar(max), host varchar(max), number int2, text varchar(max) ) partitioned by (year char(4), month char(2), day char(2), hour char(2)) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='timestamp,alarmlevel,host,number,text') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location 's3://[S3 BUCKET NAME]/minilake-in1/';

10.
select * from svv_external_schemas;

11.
select * from svv_external_databases;

12.
select * from svv_external_tables;

13.
Match the value of each partition key in the "ADD PARTITION" clause to the partition values in the S3 bucket path of the LOCATION clause. The following is just one example.

ALTER TABLE my_first_external_schema.ec2log_external ADD PARTITION (year='2019', month='09', day='27', hour='14') LOCATION 's3://[S3 BUCKET NAME]/minilake-in1/2019/09/27/14';


If you created a partition incorrectly, drop it as shown below, specifying the same partition values that were given in the ADD PARTITION statement above.

ALTER TABLE my_first_external_schema.ec2log_external DROP PARTITION (year='2019', month='09', day='27', hour='14')
43 | "ADD PARTITION"句の各パーティションの値は、LOCATION句のS3バケットパスのパーティションの値と合わせてください。下記は一例です。 44 | 45 | ALTER TABLE my_first_external_schema.ec2log_external ADD PARTITION (year='2019', month='09', day='27', hour='14') LOCATION 's3://[S3 BUCKET NAME]/minilake-in1/2019/09/27/14'; 46 | 47 | 48 | もしパーティション作成を間違えた場合は、以下を参考にDrop Partitionしてください。 49 | 上記ADD PARTITIONで指定したパーティションの値を指定して削除します。 50 | 51 | ALTER TABLE my_first_external_schema.ec2log_external DROP PARTITION (year='2019', month='09', day='27', hour='14') 52 | 53 | 54 | -------------------------------------------------------------------------------- /JP/lab5/asset/us-east-1/5-minilake_privatesubnet.yaml: -------------------------------------------------------------------------------- 1 | Parameters: 2 | VpcId: 3 | Description: select handson-minilake vpc id 4 | Type: "AWS::EC2::VPC::Id" 5 | EC2SecurityGroupId: 6 | Description: select EC2 Security Group ID 7 | Type: "AWS::EC2::SecurityGroup::Id" 8 | Resources: 9 | # Create Public RouteTable 10 | PrivateRouteTable: 11 | Type: AWS::EC2::RouteTable 12 | Properties: 13 | VpcId: !Ref VpcId 14 | Tags: 15 | - Key: Name 16 | Value: handson-minilake-private-rt 17 | # Create Private Subnet A 18 | PrivateSubnet: 19 | Type: AWS::EC2::Subnet 20 | Properties: 21 | VpcId: !Ref VpcId 22 | CidrBlock: 10.0.0.32/27 23 | AvailabilityZone: "us-east-1a" 24 | Tags: 25 | - Key: Name 26 | Value: handson-minilake-private-sub 27 | # Associate with Private subnet and route table for private subnet 28 | PriSubnetRouteTableAssociation: 29 | Type: AWS::EC2::SubnetRouteTableAssociation 30 | Properties: 31 | SubnetId: !Ref PrivateSubnet 32 | RouteTableId: !Ref PrivateRouteTable 33 | # Create EIP 34 | MyEIP: 35 | Type: "AWS::EC2::EIP" 36 | Properties: 37 | Domain: vpc 38 | # Create NATGateway 39 | myNATGateway: 40 | Type: "AWS::EC2::NatGateway" 41 | DependsOn: 42 | - MyEIP 43 | - PrivateSubnet 44 | Properties: 45 | AllocationId: !GetAtt MyEIP.AllocationId 46 | SubnetId: !Ref PrivateSubnet 47 | Tags: 48 | - Key: Name 49 | Value: handson-minilake-nat 50 | # set NAT as default gateway on route table for private subnet 51 | myPrivateRoute: 52 | Type: AWS::EC2::Route 53 | DependsOn: myNATGateway 54 | Properties: 55 | RouteTableId: !Ref PrivateRouteTable 56 | DestinationCidrBlock: 0.0.0.0/0 57 | NatGatewayId: !Ref myNATGateway 58 | # create sg for RS 59 | myRSSecurityGroup: 60 | Type: 'AWS::EC2::SecurityGroup' 61 | Properties: 62 | GroupName: handson-minilake-sg-private 63 | GroupDescription: Enable SSH access via port 5439 64 | VpcId: !Ref VpcId 65 | SecurityGroupIngress: 66 | - IpProtocol: tcp 67 | FromPort: '5439' 68 | ToPort: '5439' 69 | SourceSecurityGroupId: !Ref EC2SecurityGroupId 70 | -------------------------------------------------------------------------------- /JP/lab5/asset/us-east-1/5-td-agent1.conf: -------------------------------------------------------------------------------- 1 | 2 | @type tail 3 | path /root/es-demo/testapp.log 4 | pos_file /var/log/td-agent/testapp.log.pos 5 | format /^\[(?[^ ]* [^ ]*)\] (?[^ ]*) *? 
--------------------------------------------------------------------------------
/JP/lab5/asset/us-east-1/5-td-agent2.conf:
--------------------------------------------------------------------------------
<source>
  @type tail
  path /root/es-demo/testapp.log
  pos_file /var/log/td-agent/testapp.log.pos
  format /^\[(?<timestamp>[^ ]* [^ ]*)\] (?<alarmlevel>[^ ]*) *? (?<host>[^ ]*) * (?<user>[^ ]*) * (?<number>.*) \[(?<text>.*)\]$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types size:integer, status:integer, reqtime:float, runtime:float, time:time
  tag testappec2.log
</source>

<match testappec2.log>
  @type copy
  <store>
    @type kinesis_firehose
    delivery_stream_name minilake1
    flush_interval 1s
  </store>
</match>
--------------------------------------------------------------------------------
/JP/lab5/images/Lab5-Section4-Step2-7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/Lab5-Section4-Step2-7.png
--------------------------------------------------------------------------------
/JP/lab5/images/Lab5-Section4-Step3-8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/Lab5-Section4-Step3-8.png
--------------------------------------------------------------------------------
/JP/lab5/images/Lab5-Section4-Step3-9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/Lab5-Section4-Step3-9.png
--------------------------------------------------------------------------------
/JP/lab5/images/Lab5-Section4-Step4-21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/Lab5-Section4-Step4-21.png
--------------------------------------------------------------------------------
/JP/lab5/images/Lab5-Section4-Step4-25.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/Lab5-Section4-Step4-25.png
--------------------------------------------------------------------------------
/JP/lab5/images/Lab5-Section4-Step4-9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/Lab5-Section4-Step4-9.png
--------------------------------------------------------------------------------
/JP/lab5/images/kibana_pain2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/kibana_pain2.png
--------------------------------------------------------------------------------
/JP/lab5/images/quicksight_vpc_setting.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab5/images/quicksight_vpc_setting.png
--------------------------------------------------------------------------------
/JP/lab6/additional_info_lab6.md:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


# Supplementary Notes on Query Comparison in Athena
## JSON format vs JSON format (partitioned) vs Parquet format vs Parquet format (partitioned)

### 1. JSON format


**[Example query]**

```
SELECT count(user) FROM "minilake"."minilake_in1" where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
```


### 2. JSON format (partitioned by year, month, day, and hour)


**[Example query]**

```
SELECT count(user) FROM "minilake"."minilake_in1" where user = 'uchida' and partition_0 = '2019' AND partition_1 = '09' AND partition_2 = '27' AND partition_3 >= '13' AND partition_3 <= '21';
```


### 3. Parquet format


**[Example query]**

```
SELECT count(user) FROM "minilake"."minilake_out1" where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
```


### 4. Parquet format (partitioned by user, year, month, day, and hour)


**[Example query]**

```
SELECT count(user) FROM "minilake"."minilake_out2" where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
```

**Because Athena is billed by the amount of data scanned, scanning less data keeps costs low. In most cases, scanning less data also improves performance.**
To reduce the amount of data read, use partitioning, compression, and columnar formats. See [here](https://aws.amazon.com/jp/blogs/news/top-10-performance-tuning-tips-for-amazon-athena/) for more information.
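
For reference, the four patterns can be compared programmatically by reading each query's statistics from Athena. Below is a minimal boto3 sketch using the same tables and queries as above; the result output location is a placeholder:

```
import time
import boto3

athena = boto3.client("athena", region_name="ap-northeast-1")
OUTPUT = "s3://[S3 BUCKET NAME]/athena-results/"  # placeholder output location

QUERIES = {
    "1. JSON": 'SELECT count(user) FROM "minilake"."minilake_in1" '
               "where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%'",
    "2. JSON + partitions": 'SELECT count(user) FROM "minilake"."minilake_in1" '
               "where user = 'uchida' and partition_0 = '2019' AND partition_1 = '09' "
               "AND partition_2 = '27' AND partition_3 >= '13' AND partition_3 <= '21'",
    "3. Parquet": 'SELECT count(user) FROM "minilake"."minilake_out1" '
               "where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%'",
    "4. Parquet + partitions": 'SELECT count(user) FROM "minilake"."minilake_out2" '
               "where user = 'uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%'",
}

def statistics(sql):
    """Run one query and return its execution statistics."""
    qid = athena.start_query_execution(
        QueryString=sql, ResultConfiguration={"OutputLocation": OUTPUT}
    )["QueryExecutionId"]
    while True:
        q = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]
        if q["Status"]["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return q["Statistics"]
        time.sleep(1)

for label, sql in QUERIES.items():
    s = statistics(sql)
    print(f"{label}: {s['DataScannedInBytes'] / 1024:.1f} KB scanned, "
          f"{s['EngineExecutionTimeInMillis']} ms")
```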
--------------------------------------------------------------------------------
/JP/lab6/asset/ap-northeast-1/6-cmd.txt:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


1.
SELECT count(user) FROM "minilake"."minilake_in1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';

2.
SELECT count(user) FROM "minilake"."minilake_out1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';


3.
#applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string")], transformation_ctx = "applymapping1")
###1
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string"),("partition_0", "string", "year", "string"), ("partition_1", "string", "month", "string"), ("partition_2", "string", "day", "string"), ("partition_3", "string", "hour", "string")], transformation_ctx = "applymapping1")


#datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out"}, format = "parquet", transformation_ctx = "datasink4")
###1
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out2", "partitionKeys": ["user", "year", "month", "day", "hour"]}, format = "parquet", transformation_ctx = "datasink4")

4.
SELECT count(user) FROM "minilake"."minilake_out2" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
--------------------------------------------------------------------------------
/JP/lab6/asset/us-east-1/6-cmd.txt:
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

------------------------------------------------------------------------------------


1.
SELECT count(user) FROM "minilake"."minilake_in1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';

2.
SELECT count(user) FROM "minilake"."minilake_out1" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';


3.
#applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string")], transformation_ctx = "applymapping1")
###1
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("timestamp", "string", "timestamp", "string"), ("alarmlevel", "string", "alarmlevel", "string"), ("host", "string", "host", "string"), ("user", "string", "user", "string"), ("number", "string", "number", "string"), ("text", "string", "text", "string"),("partition_0", "string", "year", "string"), ("partition_1", "string", "month", "string"), ("partition_2", "string", "day", "string"), ("partition_3", "string", "hour", "string")], transformation_ctx = "applymapping1")


#datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out"}, format = "parquet", transformation_ctx = "datasink4")
###1
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://[S3 BUCKET NAME]/minilake-out2", "partitionKeys": ["user", "year", "month", "day", "hour"]}, format = "parquet", transformation_ctx = "datasink4")

4.
SELECT count(user) FROM "minilake"."minilake_out2" where user='uchida' and timestamp >= '2019-09-27 13%' AND timestamp <= '2019-09-27 21%';
--------------------------------------------------------------------------------
/JP/lab6/images/CSV_nopartition.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab6/images/CSV_nopartition.png
--------------------------------------------------------------------------------
/JP/lab6/images/CSV_partition.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab6/images/CSV_partition.png
--------------------------------------------------------------------------------
/JP/lab6/images/Parquet_nopartition.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab6/images/Parquet_nopartition.png
--------------------------------------------------------------------------------
/JP/lab6/images/Parquet_partition.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab6/images/Parquet_partition.png
--------------------------------------------------------------------------------
/JP/lab6/images/glue_job_capture01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/JP/lab6/images/glue_job_capture01.png
--------------------------------------------------------------------------------
/LICENSE-SAMPLECODE:
--------------------------------------------------------------------------------
Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/LICENSE-SUMMARY:
--------------------------------------------------------------------------------
Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.

The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## This hands-on has been migrated to the workshop below.

https://catalog.us-east-1.prod.workshops.aws/workshops/c65358ab-dfea-44f0-a764-cb4e5aef5f01/ja-JP

## Amazon S3 Data Lake Handson

This repository provides the contents for AWS Data Lake Handson in both Japanese and English.

## License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.

The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.
--------------------------------------------------------------------------------
/datalake-handson-sample-data/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/datalake-handson-sample-data/.DS_Store
--------------------------------------------------------------------------------
/datalake-handson-sample-data/amazon_reviews_JP.csv.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-s3-datalake-handson/77a5c739987367b8462c5334b320319885a57d96/datalake-handson-sample-data/amazon_reviews_JP.csv.gz
--------------------------------------------------------------------------------