├── README.md
└── python2.7
    └── kinesis_stream_put_to_s3_athena_partitioning.py

/README.md:
--------------------------------------------------------------------------------
## AWS Lambda functions


### python2.7/:

- **kinesis_stream_put_to_s3_athena_partitioning.py**

  Description: processes records from a Kinesis stream and writes each one to S3 as a JSON object, partitioning the data for Athena. The partition is based on the timestamp in the record's `d` field (`d.sec`), not on arrival time, so the order of the records in the stream does not matter.

  Environment variables:
  - `S3_BUCKET`: name of the S3 bucket (e.g. `bucket`)
  - `S3_PATH`: custom key prefix inside the bucket (e.g. `test/test1/test2`)

  Minimal example JSON record:

      {
        "_id": "asdasdasdasd",
        "d": {"sec": 1498746471}
      }

--------------------------------------------------------------------------------
/python2.7/kinesis_stream_put_to_s3_athena_partitioning.py:
--------------------------------------------------------------------------------
from __future__ import print_function

import os
import base64
import json
from datetime import datetime

import boto3

S3_BUCKET = os.environ['S3_BUCKET']
S3_PATH = os.environ['S3_PATH']

# Create the S3 client once per container so warm invocations reuse it.
s3 = boto3.client('s3')

print('Loading function')


def lambda_handler(event, context):
    for record in event['Records']:
        # Kinesis data is base64-encoded, so decode it first.
        str_payload = base64.b64decode(record['kinesis']['data'])
        json_payload = json.loads(str_payload)

        # Partition on the record's own timestamp (epoch seconds), read as UTC
        # so the layout does not depend on the runtime's local time zone.
        sec = json_payload['d']['sec']
        date = datetime.utcfromtimestamp(sec)

        # Hive-style key=value path segments that Athena can map to partitions:
        # <S3_PATH>/year=Y/month=M/day=D/hour=H/minute=MI/<sec>_<_id>.json
        file_path = "{}/year={}/month={}/day={}/hour={}/minute={}/{}_{}.json".format(
            S3_PATH, date.year, date.month, date.day, date.hour, date.minute,
            sec, json_payload['_id'])

        # print("path: " + file_path)

        s3.put_object(ContentType="application/json", Bucket=S3_BUCKET,
                      Key=file_path, Body=str_payload)

    return 'Successfully processed {} records.'.format(len(event['Records']))
--------------------------------------------------------------------------------
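The key layout built inside `lambda_handler` can be pulled out into a pure function and checked without any AWS access. The following is a minimal sketch, assuming the record shape from the README's example JSON. `build_partition_key` is a hypothetical helper, not part of the original script. It treats `d.sec` as UTC epoch seconds and, like the handler, does not zero-pad the partition values.

```python
from datetime import datetime, timedelta

# Naive datetime for the Unix epoch; adding a timedelta yields a naive UTC time
# without depending on the machine's local time zone.
EPOCH = datetime(1970, 1, 1)


def build_partition_key(s3_path, record):
    """Hypothetical helper: build the Hive-style S3 key for one decoded record.

    Layout (same as the Lambda handler):
    <s3_path>/year=Y/month=M/day=D/hour=H/minute=MI/<sec>_<_id>.json
    """
    sec = record["d"]["sec"]
    date = EPOCH + timedelta(seconds=sec)
    # Join the key=value partition segments in the order Athena will see them.
    partitions = "/".join(
        "{}={}".format(name, value)
        for name, value in (
            ("year", date.year),
            ("month", date.month),
            ("day", date.day),
            ("hour", date.hour),
            ("minute", date.minute),
        )
    )
    return "{}/{}/{}_{}.json".format(s3_path, partitions, sec, record["_id"])
```

For the README's example record with `S3_PATH` set to `test/test1/test2`, this returns `test/test1/test2/year=2017/month=6/day=29/hour=14/minute=27/1498746471_asdasdasdasd.json`. Keeping the key logic separate from the `put_object` call makes the partitioning testable on its own.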
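To exercise the decode step locally, a synthetic event can be built in the shape the handler reads: `event['Records'][i]['kinesis']['data']`, holding a base64-encoded payload. This is a minimal sketch. `make_kinesis_event` and `decode_records` are hypothetical helpers, and a real Kinesis event carries more per-record metadata than the single field included here.

```python
import base64
import json


def make_kinesis_event(payloads):
    """Hypothetical helper: wrap JSON payloads in a minimal Kinesis-style event.

    Only the field the handler touches is filled in:
    event["Records"][i]["kinesis"]["data"] (base64-encoded JSON).
    """
    records = []
    for payload in payloads:
        raw = json.dumps(payload).encode("utf-8")
        records.append({"kinesis": {"data": base64.b64encode(raw).decode("ascii")}})
    return {"Records": records}


def decode_records(event):
    """Hypothetical helper: decode each record the same way the handler does."""
    decoded = []
    for record in event["Records"]:
        str_payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(str_payload.decode("utf-8")))
    return decoded
```

A round trip through `make_kinesis_event` and `decode_records` should return the original payloads unchanged, which is a quick way to confirm that test events match what the handler expects before pointing it at a real stream.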