├── README
└── lambda_s3_kafka.py


/README:
--------------------------------------------------------------------------------
This is a demo Lambda function that produces events to a Kafka topic, notifying consumers about new files in S3 buckets.

To deploy this, you'll need:

* An Apache Kafka cluster. I used Confluent Cloud, deployed on GCP - because hybrid clouds are the most fun.
* Create a deployment package for Lambda - this is a zip that contains lambda_s3_kafka.py and all of its dependencies. In this case the only dependency is kafka-python, and you can pull it into the package directory by running: pip install kafka-python -t /Users/gwen/workspaces/lambda_s3_kafka (your directory is hopefully different).
* Upload the package to Lambda. I used the GUI. Make sure the handler is lambda_s3_kafka.lambda_handler, that you set the privileges correctly, and that you use the Python 2.7 runtime (at least that's what I used).
* You can test that the events arrive with: ccloud -c ccloud-gcp consume -b -t webapp, and you should see something like: "We have new object. In bucket gwen-hub, with key LICENSE.txt". A kafka-python consumer that does the same check is sketched after the notes below.

Notes:

* I used kafka-python rather than the more logical confluent-kafka-python because confluent-kafka-python depends on librdkafka, which is a C library. Building a deployment package on macOS and deploying it on the Linux image that Lambda runs on got a bit complicated with native binaries, so I skipped it for now.
* Note the extra SSL configs. You may or may not need them - depending on the version of your SSL dependency. But I don't control what Lambda is running.
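If you don't have the ccloud CLI handy, the same check can be done with a small kafka-python consumer. This is only a sketch: it reuses the bootstrap server and the webapp topic from lambda_s3_kafka.py, and the API key and secret placeholders are yours to fill in.

import json
import ssl

from kafka import KafkaConsumer

# Default TLS context; the extra SSL tweaks in lambda_s3_kafka.py may not be needed outside Lambda
context = ssl.create_default_context()

consumer = KafkaConsumer(
    'webapp',
    bootstrap_servers=['pkc-loyje.us-central1.gcp.confluent.cloud:9092'],
    security_protocol='SASL_SSL',
    sasl_mechanism='PLAIN',
    ssl_context=context,
    sasl_plain_username='<YOUR_API_KEY>',       # placeholder - your Confluent Cloud API key
    sasl_plain_password='<YOUR_API_SECRET>',    # placeholder - your Confluent Cloud API secret
    auto_offset_reset='earliest',
    value_deserializer=lambda m: json.loads(m.decode('ascii')))

# Print every notification the Lambda function publishes
for message in consumer:
    print(message.value)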
--------------------------------------------------------------------------------
/lambda_s3_kafka.py:
--------------------------------------------------------------------------------
from __future__ import print_function

import json
import boto3
from kafka import KafkaProducer
import urllib
import ssl
import logging

# Lambda pre-installs its own log handler; remove it so basicConfig below takes effect
root = logging.getLogger()
if root.handlers:
    for handler in root.handlers:
        root.removeHandler(handler)
logging.basicConfig(format='%(asctime)s %(message)s', level=logging.DEBUG)

print('Loading function')

s3 = boto3.client('s3')

# Extra SSL configs (see README note): disable TLS 1.0/1.1 so the handshake uses TLS 1.2
context = ssl.create_default_context()
context.options |= ssl.OP_NO_TLSv1
context.options |= ssl.OP_NO_TLSv1_1

producer = KafkaProducer(
    bootstrap_servers=['pkc-loyje.us-central1.gcp.confluent.cloud:9092'],
    value_serializer=lambda m: json.dumps(m).encode('ascii'),
    retry_backoff_ms=500,
    request_timeout_ms=20000,
    security_protocol='SASL_SSL',
    sasl_mechanism='PLAIN',
    ssl_context=context,
    sasl_plain_username='KAQ6FBDAGJHXTNUD',
    sasl_plain_password='+Vz/bZr89unWz8f2ufuDUeJgKSB2/BBFtAsxgCM6cstG2WrO6cK4lMTfoTyewSUv')


def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Pull the bucket name and object key out of the S3 notification record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    try:
        print("We have new object. In bucket {}, with key {}".format(bucket, key))
        future = producer.send("webapp", "We have new object. In bucket {}, with key {}".format(bucket, key))
        record_metadata = future.get(timeout=10)
        print("sent event to Kafka! topic {} partition {} offset {}".format(
            record_metadata.topic, record_metadata.partition, record_metadata.offset))
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
--------------------------------------------------------------------------------
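To smoke-test lambda_handler locally before uploading the package, you can call it with a hand-built event. This is a sketch under a couple of assumptions: the producer created at import time can reach the Confluent Cloud cluster from your machine, and the event only needs the fields the handler actually reads. The bucket and key are the ones from the example output in the README.

from lambda_s3_kafka import lambda_handler

# Trimmed-down S3 put notification: only the fields lambda_handler reads
sample_event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'gwen-hub'},
                'object': {'key': 'LICENSE.txt'},
            }
        }
    ]
}

# Lambda would pass a real context object; the handler never touches it, so None will do
lambda_handler(sample_event, None)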