├── Confluent Account.md ├── Confluent Topic Creation.md ├── ConfluentClusterSetup.md ├── Kafka key and secrets.md ├── LICENSE ├── README.md ├── cardekho_dataset.csv ├── install_python_packages.txt ├── kafka_assignment.txt ├── kafka_json_consumer.py ├── kafka_json_producer.py ├── requirements.txt └── restaurant_orders.csv /Confluent Account.md: -------------------------------------------------------------------------------- 1 | # kafka-tutorial 2 | 3 | To truly tap into Kafka, you need Confluent. 4 | You love Apache Kafka®, but not managing it. Confluent's cloud-native, complete, and fully managed service goes above and beyond Kafka so your best people can focus on delivering value to your business. 5 | 6 | With Confluent, capture and process customer interactions as they happen. Unlock a data-rich view of your customers' actions and preferences and engage with them in the most meaningful ways, personalizing their experiences across every channel in real time. 7 | 8 | 9 | ## Create Account 10 | 11 | 1. Open the signup page of Confluent Kafka 12 | [Confluent Kafka](https://www.confluent.io/get-started/) 13 | ![download](https://user-images.githubusercontent.com/34875169/169843335-edbf331f-96a2-499f-81a0-892dfeed9d78.png) 14 | 15 | 16 | 17 | 18 | 2. Go to the Confluent [Login page](https://confluent.cloud/signup/idp/google-oauth2?signup_source=iosocial&iov_id=49a3b680-a2a1-4f52-8c1d-888fae73120e&_ga=2.145483828.1681359053.1653306300-338132655.1653306300) 19 | 20 | ![download](https://user-images.githubusercontent.com/34875169/169844230-41d01336-f22d-4037-99e4-ac5e458e0c24.png) 21 | 22 | 23 | -------------------------------------------------------------------------------- /Confluent Topic Creation.md: -------------------------------------------------------------------------------- 1 | Open the Confluent [home page](https://confluent.cloud/home) 2 | ![download](https://user-images.githubusercontent.com/34875169/169838941-9f722c64-c149-4039-8656-31a832c6b7ce.png) 3 | Choose the default cluster 4 | ![download](https://user-images.githubusercontent.com/34875169/169838953-48a6bfa3-d434-4180-9277-147c3dd913e5.png) 5 | Choose the cluster 6 | ![download](https://user-images.githubusercontent.com/34875169/169838966-295cbbab-0388-49bb-808d-951a043857a7.png) 7 | Choose Topics 8 | ![download](https://user-images.githubusercontent.com/34875169/169838983-738d8f6d-e727-4f89-acb4-7daa7cd2270b.png) 9 | Click on Add topic 10 | ![download](https://user-images.githubusercontent.com/34875169/169838997-2306476f-5254-483c-bd09-627375bd5e46.png) 11 | Provide the topic details. 12 | ![download](https://user-images.githubusercontent.com/34875169/169839009-8b7e7bde-948e-4972-b0cd-b4115a3a8928.png) 13 | Now you can see that the topic has been created. 14 | ![download](https://user-images.githubusercontent.com/34875169/169839027-b7e5307f-0045-4d30-813c-bb7d7c6aa3c7.png) 15 | 16 | -------------------------------------------------------------------------------- /ConfluentClusterSetup.md: -------------------------------------------------------------------------------- 1 | ## Confluent Kafka Cluster setup 2 | 3 | 1. Open the [Home page](https://confluent.cloud/home) 4 | ![download](https://user-images.githubusercontent.com/34875169/169840757-cd9463c2-e2c7-4e6e-a142-0a3c6c988663.png) 5 | 6 | 2. Choose the Default cluster 7 | ![download](https://user-images.githubusercontent.com/34875169/169840798-ec8b73a4-7d1b-460b-8ff0-7515cf7412c0.png) 8 | 9 | 3. 
Add cluster 10 | ![download](https://user-images.githubusercontent.com/34875169/169840830-01cf05a8-0990-4c5e-8bdf-17fed1af1d0b.png) 11 | 12 | 4. Choose the free version 13 | ![download](https://user-images.githubusercontent.com/34875169/169840842-95ce0d9a-4138-41a4-b800-f5d77d912417.png) 14 | 15 | 5. Choose any cloud provider, or follow the same selections as in the image below 16 | ![download](https://user-images.githubusercontent.com/34875169/169840853-581fba6c-0683-47f0-a845-0bc7053ebcc8.png) 17 | 18 | 6. Provide a cluster name and then launch the cluster 19 | ![download](https://user-images.githubusercontent.com/34875169/169840870-accc7373-8955-49b7-9470-ebceebe0f24c.png) 20 | 21 | -------------------------------------------------------------------------------- /Kafka key and secrets.md: -------------------------------------------------------------------------------- 1 | Obtain the Kafka cluster key and secret 2 | 3 | 1. Open the [Homepage](https://confluent.cloud/home) 4 | ![download](https://user-images.githubusercontent.com/34875169/169841520-4166bc30-9a1d-4b9e-ac10-47afa7abc927.png) 5 | 6 | 2. Choose the default cluster 7 | ![download](https://user-images.githubusercontent.com/34875169/169841535-98372671-f95a-4047-9602-4ad35b817672.png) 8 | 9 | 3. Choose the cluster 10 | ![download](https://user-images.githubusercontent.com/34875169/169841548-efecc085-ca0b-4378-ac58-9acb626ef7a1.png) 11 | 12 | 4. Choose the Data integration option 13 | ![download](https://user-images.githubusercontent.com/34875169/169841560-1911095a-4cea-4efd-a291-95f86e0ba2eb.png) 14 | 15 | 5. Choose API keys 16 | ![download](https://user-images.githubusercontent.com/34875169/169841568-8cc7e3a1-c69f-4732-85e3-e04684100608.png) 17 | 18 | 6. Choose Global access and then choose Next 19 | ![download](https://user-images.githubusercontent.com/34875169/169841580-27d88d85-1fa4-44f5-84f8-af00c85dd9d8.png) 20 | 21 | 22 | 7. Download the key and secret; they will be used to send data to Kafka topics. 23 | ![download](https://user-images.githubusercontent.com/34875169/169842232-ab6dc20f-bbda-40a9-ad76-c50ddcb00e63.png) 24 | 25 | 26 | 27 | 28 | 
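Once the key and secret are downloaded, they are wired into the Kafka client configuration. A minimal sketch, assuming the credentials are exported as environment variables (`CONFLUENT_API_KEY` / `CONFLUENT_API_SECRET` are assumed names) rather than hardcoded the way the example scripts further below do:

```python
import os
from confluent_kafka import Producer

# Assumed environment variables holding the downloaded API key and secret:
#   export CONFLUENT_API_KEY=...
#   export CONFLUENT_API_SECRET=...
producer = Producer({
    "bootstrap.servers": "<BOOTSTRAP_SERVER>",  # from the cluster settings page
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": os.environ["CONFLUENT_API_KEY"],
    "sasl.password": os.environ["CONFLUENT_API_SECRET"],
})
```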
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Avnish Yadav 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![download](https://user-images.githubusercontent.com/34875169/169837256-b5cce5b4-0b10-4a5b-82b7-926f10690437.png) 2 | *** 3 | How to set up Confluent Kafka: 4 | 1. [Account Setup](Confluent%20Account.md) 5 | 2. [Cluster Setup](ConfluentClusterSetup.md) 6 | 3. [Kafka Topic](Confluent%20Topic%20Creation.md) 7 | 4. [Obtain secrets](Kafka%20key%20and%20secrets.md) 8 | *** 9 | 10 | Create a conda environment 11 | ``` 12 | conda create -p venv python==3.7 -y 13 | ``` 14 | 15 | Activate the conda environment 16 | ``` 17 | conda activate venv 18 | ``` 19 | 20 | To use Confluent Kafka, we need the following details from the Confluent dashboard. 21 | 22 | ``` 23 | confluentClusterName = "" 24 | confluentBootstrapServers = "" 25 | confluentTopicName = "" 26 | confluentApiKey = "" 27 | confluentSecret = "" 28 | ``` 29 | Add the libraries below to requirements.txt 30 | ``` 31 | confluent-kafka[avro,json,protobuf] 32 | pyspark==3.2.1 33 | ``` 34 | 35 | ### Read data from Kafka topic 36 | Import the necessary packages 37 | ``` 38 | from pyspark.sql import SparkSession 39 | ``` 40 | 41 | Create a Spark session object using the snippet below. 42 | ``` 43 | spark_session=SparkSession.builder.master("local[*]").appName("Confluent").getOrCreate() 44 | ``` 45 | Read data from the Kafka topic 46 | ``` 47 | df = (spark_session 48 |     .readStream 49 |     .format("kafka") 50 |     .option("kafka.bootstrap.servers", confluentBootstrapServers) 51 |     .option("kafka.security.protocol", "SASL_SSL") 52 |     .option("kafka.sasl.jaas.config", 53 |             "org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format(confluentApiKey, confluentSecret)) 54 |     .option("kafka.ssl.endpoint.identification.algorithm", "https") 55 |     .option("kafka.sasl.mechanism", "PLAIN") 56 |     .option("subscribe", confluentTopicName) 57 |     .option("startingOffsets", "earliest") 58 |     .option("failOnDataLoss", "false") 59 |     .load() 60 | ) 61 | ``` 62 | 63 | Process the data read from the Kafka topic by casting the binary key and value columns to strings 64 | 65 | ``` 66 | df = (df.withColumn('key_str',df['key'].cast('string').alias('key_str')).drop('key').withColumn('value_str',df['value'].cast('string').alias('value_str'))) 67 | ``` 68 | 
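The Kafka `key` and `value` columns arrive as binary, which is why they are cast to strings above. If the topic carries the car JSON records produced by `kafka_json_producer.py`, the string value can additionally be parsed into typed columns. A minimal sketch, assuming an illustrative subset of the fields (adjust the schema to the actual payload):

```python
from pyspark.sql.functions import from_json, col

# Illustrative subset of the car record fields; extend to match the registered schema.
value_schema = "car_name STRING, brand STRING, vehicle_age DOUBLE, km_driven DOUBLE, selling_price DOUBLE"

parsed_df = (df
    .withColumn("value_json", from_json(col("value_str"), value_schema))
    .select("key_str", "value_json.*"))
```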
69 | Write data in a JSON file (these file-sink snippets use `os.path.join`, so also add `import os` to the imports above). 70 | ``` 71 | query = (df.selectExpr("value_str").writeStream 72 |     .format("json") 73 |     .option("format", "append") 74 |     .trigger(processingTime="5 seconds") 75 |     .option("checkpointLocation", os.path.join("json_checkpoint")) 76 |     .option("path", os.path.join("json")) 77 |     .outputMode("append") 78 |     .start() 79 | ) 80 | query.awaitTermination() 81 | ``` 82 | 83 | Write data in a CSV file 84 | ``` 85 | query = (df.writeStream 86 |     .format("csv") 87 |     .option("format", "append") 88 |     .trigger(processingTime="5 seconds") 89 |     .option("checkpointLocation", os.path.join("csv_checkpoint")) 90 |     .option("path", os.path.join("csv")) 91 |     .outputMode("append") 92 |     .start() 93 | ) 94 | query.awaitTermination() 95 | ``` 96 | Write data to a Kafka topic 97 | ``` 98 | query = (df.writeStream 99 |     .format("kafka") 100 |     .option("kafka.bootstrap.servers", confluentBootstrapServers) 101 |     .option("kafka.security.protocol", "SASL_SSL") 102 |     .option("kafka.sasl.jaas.config", 103 |             "org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format( 104 |                 confluentApiKey, confluentSecret)) 105 |     .option("kafka.ssl.endpoint.identification.algorithm", "https") 106 |     .option("kafka.sasl.mechanism", "PLAIN") 107 |     .option("checkpointLocation", os.path.join("kafka_checkpoint")) 108 |     .option("topic", confluentTopicName).start()) 109 | 110 | 111 | query.awaitTermination() 112 | ``` 113 | 114 | *** 115 | Note: Don't run your Python script with the plain `python` command; 116 | use the `spark-submit` command below so the Kafka connector package is available. 117 | *** 118 | 119 | To run the Python script 120 | ```commandline 121 | spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 your_script.py 122 | ``` -------------------------------------------------------------------------------- /install_python_packages.txt: -------------------------------------------------------------------------------- 1 | 1. Install a Python 3 version (check for Windows, Linux and Mac) 2 | 3 | 2. If pip3 is not available, install it explicitly 4 | 5 | 3. pip3 install confluent_kafka 6 | 4. pip3 install pandas 7 | 5. pip3 install requests 8 | 9 | 10 | # During the execution of the final code you might face the error "No module named jsonschema" 11 | 12 | # pip3 install jsonschema 13 | 
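The assignment in the next file asks for the producer and consumer to read the latest schema from the Confluent Schema Registry at run time instead of hardcoding `schema_str` (as `kafka_json_producer.py` and `kafka_json_consumer.py` below currently do). A minimal sketch of that lookup, with placeholder credentials and assuming the default `<topic>-value` subject naming:

```python
from confluent_kafka.schema_registry import SchemaRegistryClient

# Placeholder endpoint and credentials; see "Kafka key and secrets.md".
schema_registry_client = SchemaRegistryClient({
    "url": "<SCHEMA_REGISTRY_URL>",
    "basic.auth.user.info": "<SR_API_KEY>:<SR_API_SECRET>",
})

# With the default TopicNameStrategy, the value schema of the topic
# "restaurent-take-away-data" is registered under the subject "<topic>-value".
latest = schema_registry_client.get_latest_version("restaurent-take-away-data-value")
schema_str = latest.schema.schema_str  # use this instead of a hardcoded schema_str
print("Fetched schema version:", latest.version)
```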
-------------------------------------------------------------------------------- /kafka_assignment.txt: -------------------------------------------------------------------------------- 1 | # Download the restaurant data from the link mentioned below 2 | 3 | Download Data Link -> https://github.com/shashank-mishra219/Confluent-Kafka-Setup/blob/main/restaurant_orders.csv 4 | 5 | Complete the tasks given below to finish this assignment. 6 | 7 | 1. Set up a Confluent Kafka account 8 | 2. Create one Kafka topic named "restaurent-take-away-data" with 3 partitions 9 | 3. Set up the key (string) & value (JSON) schemas in the Confluent schema registry 10 | 4. Write a Kafka producer program (Python or any other language) to read data records from the restaurant data CSV file; 11 |    make sure the schema is not hardcoded in the producer code, read the latest version of the schema and schema_str from the schema registry and use them for 12 |    data serialization. 13 | 5. From the producer code, publish the data to the Kafka topic one record at a time and use a dynamic key while publishing the records into the Kafka topic 14 | 6. Write Kafka consumer code and create two copies of the same consumer code, saved with different names (kafka_consumer_1.py & kafka_consumer_2.py); 15 |    again make sure the latest schema version and schema_str are not hardcoded in the consumer code, read them automatically from the schema registry to deserialize the data. 16 |    Now test two scenarios with your consumer code: 17 |    a.) Use the "group.id" property in the consumer config for both consumers and mention different group_ids in kafka_consumer_1.py & kafka_consumer_2.py, 18 |    apply the "earliest" offset property in both consumers and run these two consumers from two different terminals. Count how many records each consumer 19 |    consumed and printed on the terminal 20 |    b.) Use the "group.id" property in the consumer config for both consumers and mention the same group_id in kafka_consumer_1.py & kafka_consumer_2.py, 21 |    apply the "earliest" offset property in both consumers and run these two consumers from two different terminals. Count how many records each consumer 22 |    consumed and printed on the terminal 23 | 24 | 7. Once the above questions are done, write another Kafka consumer to read data from the Kafka topic, and from the consumer code create one CSV file "output.csv" 25 |    and append the consumed records to the output.csv file 26 | 27 | -------------------------------------------------------------------------------- /kafka_json_consumer.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | from confluent_kafka import Consumer 4 | from confluent_kafka.serialization import SerializationContext, MessageField 5 | from confluent_kafka.schema_registry.json_schema import JSONDeserializer 6 | 7 | 8 | API_KEY = 'HNUA2KUYENIP44PV' 9 | ENDPOINT_SCHEMA_URL = 'https://psrc-35wr2.us-central1.gcp.confluent.cloud' 10 | API_SECRET_KEY = 'TH5n14kG1JAD6b8rmf92Y6wyXPY66De2kzbiZUS0jytRfkxpEM4rWdlGVSsM/nFR' 11 | BOOTSTRAP_SERVER = 'pkc-lzvrd.us-west4.gcp.confluent.cloud:9092' 12 | SECURITY_PROTOCOL = 'SASL_SSL' 13 | SSL_MACHENISM = 'PLAIN' 14 | SCHEMA_REGISTRY_API_KEY = 'PBEUUAHOC2GTPJWT' 15 | SCHEMA_REGISTRY_API_SECRET = 'EuAq+lp9CJYCs2n/TKOdhk9C2bbMl0ZRyE6KfYJ0v2Ng6anqHnLzqAtCjSwMSE+Y' 16 | 17 | 18 | def sasl_conf(): 19 | 20 |     sasl_conf = {'sasl.mechanism': SSL_MACHENISM, 21 |                  # Set to SASL_SSL to enable TLS support. 
22 | # 'security.protocol': 'SASL_PLAINTEXT'} 23 | 'bootstrap.servers':BOOTSTRAP_SERVER, 24 | 'security.protocol': SECURITY_PROTOCOL, 25 | 'sasl.username': API_KEY, 26 | 'sasl.password': API_SECRET_KEY 27 | } 28 | return sasl_conf 29 | 30 | 31 | 32 | def schema_config(): 33 | return {'url':ENDPOINT_SCHEMA_URL, 34 | 35 | 'basic.auth.user.info':f"{SCHEMA_REGISTRY_API_KEY}:{SCHEMA_REGISTRY_API_SECRET}" 36 | 37 | } 38 | 39 | 40 | class Car: 41 | def __init__(self,record:dict): 42 | for k,v in record.items(): 43 | setattr(self,k,v) 44 | 45 | self.record=record 46 | 47 | @staticmethod 48 | def dict_to_car(data:dict,ctx): 49 | return Car(record=data) 50 | 51 | def __str__(self): 52 | return f"{self.record}" 53 | 54 | 55 | def main(topic): 56 | 57 | schema_str = """ 58 | { 59 | "$id": "http://example.com/myURI.schema.json", 60 | "$schema": "http://json-schema.org/draft-07/schema#", 61 | "additionalProperties": false, 62 | "description": "Sample schema to help you get started.", 63 | "properties": { 64 | "brand": { 65 | "description": "The type(v) type is used.", 66 | "type": "string" 67 | }, 68 | "car_name": { 69 | "description": "The type(v) type is used.", 70 | "type": "string" 71 | }, 72 | "engine": { 73 | "description": "The type(v) type is used.", 74 | "type": "number" 75 | }, 76 | "fuel_type": { 77 | "description": "The type(v) type is used.", 78 | "type": "string" 79 | }, 80 | "km_driven": { 81 | "description": "The type(v) type is used.", 82 | "type": "number" 83 | }, 84 | "max_power": { 85 | "description": "The type(v) type is used.", 86 | "type": "number" 87 | }, 88 | "mileage": { 89 | "description": "The type(v) type is used.", 90 | "type": "number" 91 | }, 92 | "model": { 93 | "description": "The type(v) type is used.", 94 | "type": "string" 95 | }, 96 | "seats": { 97 | "description": "The type(v) type is used.", 98 | "type": "number" 99 | }, 100 | "seller_type": { 101 | "description": "The type(v) type is used.", 102 | "type": "string" 103 | }, 104 | "selling_price": { 105 | "description": "The type(v) type is used.", 106 | "type": "number" 107 | }, 108 | "transmission_type": { 109 | "description": "The type(v) type is used.", 110 | "type": "string" 111 | }, 112 | "vehicle_age": { 113 | "description": "The type(v) type is used.", 114 | "type": "number" 115 | } 116 | }, 117 | "title": "SampleRecord", 118 | "type": "object" 119 | } 120 | """ 121 | json_deserializer = JSONDeserializer(schema_str, 122 | from_dict=Car.dict_to_car) 123 | 124 | consumer_conf = sasl_conf() 125 | consumer_conf.update({ 126 | 'group.id': 'group1', 127 | 'auto.offset.reset': "earliest"}) 128 | 129 | consumer = Consumer(consumer_conf) 130 | consumer.subscribe([topic]) 131 | 132 | 133 | while True: 134 | try: 135 | # SIGINT can't be handled when polling, limit timeout to 1 second. 136 | msg = consumer.poll(1.0) 137 | if msg is None: 138 | continue 139 | 140 | car = json_deserializer(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE)) 141 | 142 | if car is not None: 143 | print("User record {}: car: {}\n" 144 | .format(msg.key(), car)) 145 | except KeyboardInterrupt: 146 | break 147 | 148 | consumer.close() 149 | 150 | main("test_topic") -------------------------------------------------------------------------------- /kafka_json_producer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright 2020 Confluent Inc. 
5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | 18 | 19 | # A simple example demonstrating use of JSONSerializer. 20 | 21 | import argparse 22 | from uuid import uuid4 23 | from six.moves import input 24 | from confluent_kafka import Producer 25 | from confluent_kafka.serialization import StringSerializer, SerializationContext, MessageField 26 | from confluent_kafka.schema_registry import SchemaRegistryClient 27 | from confluent_kafka.schema_registry.json_schema import JSONSerializer 28 | #from confluent_kafka.schema_registry import * 29 | import pandas as pd 30 | from typing import List 31 | 32 | FILE_PATH = "/Users/shashankmishra/Desktop/Kafka Classes/Confluen Kafka Setup/Confluent-Kafka-Setup/cardekho_dataset.csv" 33 | columns=['car_name', 'brand', 'model', 'vehicle_age', 'km_driven', 'seller_type', 34 | 'fuel_type', 'transmission_type', 'mileage', 'engine', 'max_power', 35 | 'seats', 'selling_price'] 36 | 37 | API_KEY = 'HNUA2KUYENIP44PV' 38 | ENDPOINT_SCHEMA_URL = 'https://psrc-35wr2.us-central1.gcp.confluent.cloud' 39 | API_SECRET_KEY = 'TH5n14kG1JAD6b8rmf92Y6wyXPY66De2kzbiZUS0jytRfkxpEM4rWdlGVSsM/nFR' 40 | BOOTSTRAP_SERVER = 'pkc-lzvrd.us-west4.gcp.confluent.cloud:9092' 41 | SECURITY_PROTOCOL = 'SASL_SSL' 42 | SSL_MACHENISM = 'PLAIN' 43 | SCHEMA_REGISTRY_API_KEY = 'PBEUUAHOC2GTPJWT' 44 | SCHEMA_REGISTRY_API_SECRET = 'EuAq+lp9CJYCs2n/TKOdhk9C2bbMl0ZRyE6KfYJ0v2Ng6anqHnLzqAtCjSwMSE+Y' 45 | 46 | 47 | def sasl_conf(): 48 | 49 | sasl_conf = {'sasl.mechanism': SSL_MACHENISM, 50 | # Set to SASL_SSL to enable TLS support. 51 | # 'security.protocol': 'SASL_PLAINTEXT'} 52 | 'bootstrap.servers':BOOTSTRAP_SERVER, 53 | 'security.protocol': SECURITY_PROTOCOL, 54 | 'sasl.username': API_KEY, 55 | 'sasl.password': API_SECRET_KEY 56 | } 57 | return sasl_conf 58 | 59 | 60 | 61 | def schema_config(): 62 | return {'url':ENDPOINT_SCHEMA_URL, 63 | 64 | 'basic.auth.user.info':f"{SCHEMA_REGISTRY_API_KEY}:{SCHEMA_REGISTRY_API_SECRET}" 65 | 66 | } 67 | 68 | 69 | class Car: 70 | def __init__(self,record:dict): 71 | for k,v in record.items(): 72 | setattr(self,k,v) 73 | 74 | self.record=record 75 | 76 | @staticmethod 77 | def dict_to_car(data:dict,ctx): 78 | return Car(record=data) 79 | 80 | def __str__(self): 81 | return f"{self.record}" 82 | 83 | 84 | def get_car_instance(file_path): 85 | df=pd.read_csv(file_path) 86 | df=df.iloc[:,1:] 87 | cars:List[Car]=[] 88 | for data in df.values: 89 | car=Car(dict(zip(columns,data))) 90 | cars.append(car) 91 | yield car 92 | 93 | def car_to_dict(car:Car, ctx): 94 | """ 95 | Returns a dict representation of a User instance for serialization. 96 | Args: 97 | user (User): User instance. 98 | ctx (SerializationContext): Metadata pertaining to the serialization 99 | operation. 100 | Returns: 101 | dict: Dict populated with user attributes to be serialized. 
102 | """ 103 | 104 | # User._address must not be serialized; omit from dict 105 | return car.record 106 | 107 | 108 | def delivery_report(err, msg): 109 | """ 110 | Reports the success or failure of a message delivery. 111 | Args: 112 | err (KafkaError): The error that occurred on None on success. 113 | msg (Message): The message that was produced or failed. 114 | """ 115 | 116 | if err is not None: 117 | print("Delivery failed for User record {}: {}".format(msg.key(), err)) 118 | return 119 | print('User record {} successfully produced to {} [{}] at offset {}'.format( 120 | msg.key(), msg.topic(), msg.partition(), msg.offset())) 121 | 122 | 123 | def main(topic): 124 | 125 | schema_str = """ 126 | { 127 | "$id": "http://example.com/myURI.schema.json", 128 | "$schema": "http://json-schema.org/draft-07/schema#", 129 | "additionalProperties": false, 130 | "description": "Sample schema to help you get started.", 131 | "properties": { 132 | "brand": { 133 | "description": "The type(v) type is used.", 134 | "type": "string" 135 | }, 136 | "car_name": { 137 | "description": "The type(v) type is used.", 138 | "type": "string" 139 | }, 140 | "engine": { 141 | "description": "The type(v) type is used.", 142 | "type": "number" 143 | }, 144 | "fuel_type": { 145 | "description": "The type(v) type is used.", 146 | "type": "string" 147 | }, 148 | "km_driven": { 149 | "description": "The type(v) type is used.", 150 | "type": "number" 151 | }, 152 | "max_power": { 153 | "description": "The type(v) type is used.", 154 | "type": "number" 155 | }, 156 | "mileage": { 157 | "description": "The type(v) type is used.", 158 | "type": "number" 159 | }, 160 | "model": { 161 | "description": "The type(v) type is used.", 162 | "type": "string" 163 | }, 164 | "seats": { 165 | "description": "The type(v) type is used.", 166 | "type": "number" 167 | }, 168 | "seller_type": { 169 | "description": "The type(v) type is used.", 170 | "type": "string" 171 | }, 172 | "selling_price": { 173 | "description": "The type(v) type is used.", 174 | "type": "number" 175 | }, 176 | "transmission_type": { 177 | "description": "The type(v) type is used.", 178 | "type": "string" 179 | }, 180 | "vehicle_age": { 181 | "description": "The type(v) type is used.", 182 | "type": "number" 183 | } 184 | }, 185 | "title": "SampleRecord", 186 | "type": "object" 187 | } 188 | """ 189 | schema_registry_conf = schema_config() 190 | schema_registry_client = SchemaRegistryClient(schema_registry_conf) 191 | 192 | string_serializer = StringSerializer('utf_8') 193 | json_serializer = JSONSerializer(schema_str, schema_registry_client, car_to_dict) 194 | 195 | producer = Producer(sasl_conf()) 196 | 197 | print("Producing user records to topic {}. 
^C to exit.".format(topic)) 198 |     #while True: 199 |         # Serve on_delivery callbacks from previous calls to produce() 200 |     producer.poll(0.0) 201 |     try: 202 |         for car in get_car_instance(file_path=FILE_PATH): 203 | 204 |             print(car) 205 |             producer.produce(topic=topic, 206 |                             key=string_serializer(str(uuid4()), SerializationContext(topic, MessageField.KEY)), 207 |                             value=json_serializer(car, SerializationContext(topic, MessageField.VALUE)), 208 |                             on_delivery=delivery_report) 209 |             break  # stops after the first record; remove this break to publish the whole CSV 210 |     except KeyboardInterrupt: 211 |         pass 212 |     except ValueError: 213 |         print("Invalid input, discarding record...") 214 |         pass 215 | 216 |     print("\nFlushing records...") 217 |     producer.flush() 218 | 219 | main("test_topic") 220 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | confluent-kafka[avro,json,protobuf] 2 | pyspark==3.2.1 --------------------------------------------------------------------------------