├── Confluent Account.md ├── Confluent Topic Creation.md ├── ConfluentClusterSetup.md ├── Kafka key and secrets.md ├── LICENSE ├── README.md ├── cardekho_dataset.csv ├── install_python_packages.txt ├── kafka_assignment.txt ├── kafka_json_consumer.py ├── kafka_json_producer.py ├── requirements.txt └── restaurant_orders.csv /Confluent Account.md: -------------------------------------------------------------------------------- 1 | # kafka-tutorial 2 | 3 | To truly tap into Kafka, you need Confluent. 4 | You love Apache Kafka®, but not managing it. Confluent's cloud-native, complete, and fully managed service goes above and beyond Kafka so your best people can focus on delivering value to your business. 5 | 6 | With Confluent, capture and process customer interactions as they happen. Unlock a data-rich view of your customers' actions and preferences and engage with them in the most meaningful ways, personalizing their experiences across every channel in real time. 7 | 8 | 9 | ## Create Account 10 | 11 | 1. Open the signup page of Confluent Kafka 12 | [Confluent Kafka](https://www.confluent.io/get-started/) 13 | ![download](https://user-images.githubusercontent.com/34875169/169843335-edbf331f-96a2-499f-81a0-892dfeed9d78.png) 14 | 15 | 16 | 17 | 18 | 2. Go to the Confluent [Login page](https://confluent.cloud/signup/idp/google-oauth2?signup_source=iosocial&iov_id=49a3b680-a2a1-4f52-8c1d-888fae73120e&_ga=2.145483828.1681359053.1653306300-338132655.1653306300) 19 | 20 | ![download](https://user-images.githubusercontent.com/34875169/169844230-41d01336-f22d-4037-99e4-ac5e458e0c24.png) 21 | 22 | 23 | -------------------------------------------------------------------------------- /Confluent Topic Creation.md: -------------------------------------------------------------------------------- 1 | Open the Confluent [home page](https://confluent.cloud/home) 2 | ![download](https://user-images.githubusercontent.com/34875169/169838941-9f722c64-c149-4039-8656-31a832c6b7ce.png) 3 | Choose the default cluster 4 | ![download](https://user-images.githubusercontent.com/34875169/169838953-48a6bfa3-d434-4180-9277-147c3dd913e5.png) 5 | Choose the cluster 6 | ![download](https://user-images.githubusercontent.com/34875169/169838966-295cbbab-0388-49bb-808d-951a043857a7.png) 7 | Choose Topics 8 | ![download](https://user-images.githubusercontent.com/34875169/169838983-738d8f6d-e727-4f89-acb4-7daa7cd2270b.png) 9 | Click on Add topic 10 | ![download](https://user-images.githubusercontent.com/34875169/169838997-2306476f-5254-483c-bd09-627375bd5e46.png) 11 | Provide the topic details. 12 | ![download](https://user-images.githubusercontent.com/34875169/169839009-8b7e7bde-948e-4972-b0cd-b4115a3a8928.png) 13 | Now you can see that the topic has been created. 14 | ![download](https://user-images.githubusercontent.com/34875169/169839027-b7e5307f-0045-4d30-813c-bb7d7c6aa3c7.png) 15 | 16 | -------------------------------------------------------------------------------- /ConfluentClusterSetup.md: -------------------------------------------------------------------------------- 1 | ## Confluent Kafka Cluster setup 2 | 3 | 1. Open the [Home page](https://confluent.cloud/home) 4 | ![download](https://user-images.githubusercontent.com/34875169/169840757-cd9463c2-e2c7-4e6e-a142-0a3c6c988663.png) 5 | 6 | 2. Choose the Default cluster 7 | ![download](https://user-images.githubusercontent.com/34875169/169840798-ec8b73a4-7d1b-460b-8ff0-7515cf7412c0.png) 8 | 9 | 3. 
Add cluster 10 | ![download](https://user-images.githubusercontent.com/34875169/169840830-01cf05a8-0990-4c5e-8bdf-17fed1af1d0b.png) 11 | 12 | 4. Choose the free version 13 | ![download](https://user-images.githubusercontent.com/34875169/169840842-95ce0d9a-4138-41a4-b800-f5d77d912417.png) 14 | 15 | 5. Choose any cloud provider, or follow the same selections as in the image below 16 | ![download](https://user-images.githubusercontent.com/34875169/169840853-581fba6c-0683-47f0-a845-0bc7053ebcc8.png) 17 | 18 | 6. Provide a cluster name and then launch the cluster 19 | ![download](https://user-images.githubusercontent.com/34875169/169840870-accc7373-8955-49b7-9470-ebceebe0f24c.png) 20 | 21 | -------------------------------------------------------------------------------- /Kafka key and secrets.md: -------------------------------------------------------------------------------- 1 | Obtain the Kafka cluster key and secret 2 | 3 | 1. Open the [Homepage](https://confluent.cloud/home) 4 | ![download](https://user-images.githubusercontent.com/34875169/169841520-4166bc30-9a1d-4b9e-ac10-47afa7abc927.png) 5 | 6 | 2. Choose the default cluster 7 | ![download](https://user-images.githubusercontent.com/34875169/169841535-98372671-f95a-4047-9602-4ad35b817672.png) 8 | 9 | 3. Choose the cluster 10 | ![download](https://user-images.githubusercontent.com/34875169/169841548-efecc085-ca0b-4378-ac58-9acb626ef7a1.png) 11 | 12 | 4. Choose the Data integration option 13 | ![download](https://user-images.githubusercontent.com/34875169/169841560-1911095a-4cea-4efd-a291-95f86e0ba2eb.png) 14 | 15 | 5. Choose API keys 16 | ![download](https://user-images.githubusercontent.com/34875169/169841568-8cc7e3a1-c69f-4732-85e3-e04684100608.png) 17 | 18 | 6. Choose Global access and then choose Next 19 | ![download](https://user-images.githubusercontent.com/34875169/169841580-27d88d85-1fa4-44f5-84f8-af00c85dd9d8.png) 20 | 21 | 22 | 7. Download the key and secret; they will be used to send data to Kafka topics. 23 | ![download](https://user-images.githubusercontent.com/34875169/169842232-ab6dc20f-bbda-40a9-ad76-c50ddcb00e63.png) 24 | 25 | 26 | 27 | 28 | 
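Once the key and secret are downloaded, they are wired into the Kafka client configuration. A minimal sketch, assuming the credentials are exported as environment variables (`CONFLUENT_API_KEY` / `CONFLUENT_API_SECRET` are assumed names) rather than hardcoded the way the example scripts further below do:

```python
import os
from confluent_kafka import Producer

# Assumed environment variables holding the downloaded API key and secret:
#   export CONFLUENT_API_KEY=...
#   export CONFLUENT_API_SECRET=...
producer = Producer({
    "bootstrap.servers": "<BOOTSTRAP_SERVER>",  # from the cluster settings page
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": os.environ["CONFLUENT_API_KEY"],
    "sasl.password": os.environ["CONFLUENT_API_SECRET"],
})
```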
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Avnish Yadav 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![download](https://user-images.githubusercontent.com/34875169/169837256-b5cce5b4-0b10-4a5b-82b7-926f10690437.png) 2 | *** 3 | How to set up Confluent Kafka: 4 | 1. [Account Setup](Confluent%20Account.md) 5 | 2. [Cluster Setup](ConfluentClusterSetup.md) 6 | 3. [Kafka Topic](Confluent%20Topic%20Creation.md) 7 | 4. [Obtain secrets](Kafka%20key%20and%20secrets.md) 8 | *** 9 | 10 | Create a conda environment 11 | ``` 12 | conda create -p venv python==3.7 -y 13 | ``` 14 | 15 | Activate the conda environment 16 | ``` 17 | conda activate venv 18 | ``` 19 | 20 | To use Confluent Kafka, we need the following details from the Confluent dashboard. 21 | 22 | ``` 23 | confluentClusterName = "" 24 | confluentBootstrapServers = "" 25 | confluentTopicName = "" 26 | confluentApiKey = "" 27 | confluentSecret = "" 28 | ``` 29 | Add the libraries below to requirements.txt 30 | ``` 31 | confluent-kafka[avro,json,protobuf] 32 | pyspark==3.2.1 33 | ``` 34 | 35 | ### Read data from Kafka topic 36 | Import the necessary packages 37 | ``` 38 | from pyspark.sql import SparkSession 39 | ``` 40 | 41 | Create a Spark session object using the snippet below. 42 | ``` 43 | spark_session=SparkSession.builder.master("local[*]").appName("Confluent").getOrCreate() 44 | ``` 45 | Read data from the Kafka topic 46 | ``` 47 | df = (spark_session 48 |     .readStream 49 |     .format("kafka") 50 |     .option("kafka.bootstrap.servers", confluentBootstrapServers) 51 |     .option("kafka.security.protocol", "SASL_SSL") 52 |     .option("kafka.sasl.jaas.config", 53 |             "org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format(confluentApiKey, confluentSecret)) 54 |     .option("kafka.ssl.endpoint.identification.algorithm", "https") 55 |     .option("kafka.sasl.mechanism", "PLAIN") 56 |     .option("subscribe", confluentTopicName) 57 |     .option("startingOffsets", "earliest") 58 |     .option("failOnDataLoss", "false") 59 |     .load() 60 | ) 61 | ``` 62 | 63 | Process the data read from the Kafka topic by casting the binary key and value columns to strings 64 | 65 | ``` 66 | df = (df.withColumn('key_str',df['key'].cast('string').alias('key_str')).drop('key').withColumn('value_str',df['value'].cast('string').alias('value_str'))) 67 | ``` 68 | 
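The Kafka `key` and `value` columns arrive as binary, which is why they are cast to strings above. If the topic carries the car JSON records produced by `kafka_json_producer.py`, the string value can additionally be parsed into typed columns. A minimal sketch, assuming an illustrative subset of the fields (adjust the schema to the actual payload):

```python
from pyspark.sql.functions import from_json, col

# Illustrative subset of the car record fields; extend to match the registered schema.
value_schema = "car_name STRING, brand STRING, vehicle_age DOUBLE, km_driven DOUBLE, selling_price DOUBLE"

parsed_df = (df
    .withColumn("value_json", from_json(col("value_str"), value_schema))
    .select("key_str", "value_json.*"))
```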
69 | Write data in a JSON file (these file-sink snippets use `os.path.join`, so also add `import os` to the imports above). 70 | ``` 71 | query = (df.selectExpr("value_str").writeStream 72 |     .format("json") 73 |     .option("format", "append") 74 |     .trigger(processingTime="5 seconds") 75 |     .option("checkpointLocation", os.path.join("json_checkpoint")) 76 |     .option("path", os.path.join("json")) 77 |     .outputMode("append") 78 |     .start() 79 | ) 80 | query.awaitTermination() 81 | ``` 82 | 83 | Write data in a CSV file 84 | ``` 85 | query = (df.writeStream 86 |     .format("csv") 87 |     .option("format", "append") 88 |     .trigger(processingTime="5 seconds") 89 |     .option("checkpointLocation", os.path.join("csv_checkpoint")) 90 |     .option("path", os.path.join("csv")) 91 |     .outputMode("append") 92 |     .start() 93 | ) 94 | query.awaitTermination() 95 | ``` 96 | Write data to a Kafka topic 97 | ``` 98 | query = (df.writeStream 99 |     .format("kafka") 100 |     .option("kafka.bootstrap.servers", confluentBootstrapServers) 101 |     .option("kafka.security.protocol", "SASL_SSL") 102 |     .option("kafka.sasl.jaas.config", 103 |             "org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format( 104 |                 confluentApiKey, confluentSecret)) 105 |     .option("kafka.ssl.endpoint.identification.algorithm", "https") 106 |     .option("kafka.sasl.mechanism", "PLAIN") 107 |     .option("checkpointLocation", os.path.join("kafka_checkpoint")) 108 |     .option("topic", confluentTopicName).start()) 109 | 110 | 111 | query.awaitTermination() 112 | ``` 113 | 114 | *** 115 | Note: Don't run your Python script with the plain `python` command; 116 | use the `spark-submit` command below so the Kafka connector package is available. 117 | *** 118 | 119 | To run the Python script 120 | ```commandline 121 | spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 your_script.py 122 | ``` -------------------------------------------------------------------------------- /install_python_packages.txt: -------------------------------------------------------------------------------- 1 | 1. Install a Python 3 version (check for Windows, Linux and Mac) 2 | 3 | 2. If pip3 is not available, install it explicitly 4 | 5 | 3. pip3 install confluent_kafka 6 | 4. pip3 install pandas 7 | 5. pip3 install requests 8 | 9 | 10 | # During the execution of the final code you might face the error "No module named jsonschema" 11 | 12 | # pip3 install jsonschema 13 | 
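The assignment in the next file asks for the producer and consumer to read the latest schema from the Confluent Schema Registry at run time instead of hardcoding `schema_str` (as `kafka_json_producer.py` and `kafka_json_consumer.py` below currently do). A minimal sketch of that lookup, with placeholder credentials and assuming the default `<topic>-value` subject naming:

```python
from confluent_kafka.schema_registry import SchemaRegistryClient

# Placeholder endpoint and credentials; see "Kafka key and secrets.md".
schema_registry_client = SchemaRegistryClient({
    "url": "<SCHEMA_REGISTRY_URL>",
    "basic.auth.user.info": "<SR_API_KEY>:<SR_API_SECRET>",
})

# With the default TopicNameStrategy, the value schema of the topic
# "restaurent-take-away-data" is registered under the subject "<topic>-value".
latest = schema_registry_client.get_latest_version("restaurent-take-away-data-value")
schema_str = latest.schema.schema_str  # use this instead of a hardcoded schema_str
print("Fetched schema version:", latest.version)
```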
-------------------------------------------------------------------------------- /kafka_assignment.txt: -------------------------------------------------------------------------------- 1 | # Download the restaurant data from the link mentioned below 2 | 3 | Download Data Link -> https://github.com/shashank-mishra219/Confluent-Kafka-Setup/blob/main/restaurant_orders.csv 4 | 5 | Complete the tasks given below to finish this assignment. 6 | 7 | 1. Set up a Confluent Kafka account 8 | 2. Create one Kafka topic named "restaurent-take-away-data" with 3 partitions 9 | 3. Set up the key (string) & value (JSON) schemas in the Confluent schema registry 10 | 4. Write a Kafka producer program (Python or any other language) to read data records from the restaurant data CSV file; 11 |    make sure the schema is not hardcoded in the producer code, read the latest version of the schema and schema_str from the schema registry and use them for 12 |    data serialization. 13 | 5. From the producer code, publish the data to the Kafka topic one record at a time and use a dynamic key while publishing the records into the Kafka topic 14 | 6. Write Kafka consumer code and create two copies of the same consumer code, saved with different names (kafka_consumer_1.py & kafka_consumer_2.py); 15 |    again make sure the latest schema version and schema_str are not hardcoded in the consumer code, read them automatically from the schema registry to deserialize the data. 16 |    Now test two scenarios with your consumer code: 17 |    a.) Use the "group.id" property in the consumer config for both consumers and mention different group_ids in kafka_consumer_1.py & kafka_consumer_2.py, 18 |    apply the "earliest" offset property in both consumers and run these two consumers from two different terminals. Count how many records each consumer 19 |    consumed and printed on the terminal 20 |    b.) Use the "group.id" property in the consumer config for both consumers and mention the same group_id in kafka_consumer_1.py & kafka_consumer_2.py, 21 |    apply the "earliest" offset property in both consumers and run these two consumers from two different terminals. Count how many records each consumer 22 |    consumed and printed on the terminal 23 | 24 | 7. Once the above questions are done, write another Kafka consumer to read data from the Kafka topic, and from the consumer code create one CSV file "output.csv" 25 |    and append the consumed records to the output.csv file 26 | 27 | -------------------------------------------------------------------------------- /kafka_json_consumer.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | from confluent_kafka import Consumer 4 | from confluent_kafka.serialization import SerializationContext, MessageField 5 | from confluent_kafka.schema_registry.json_schema import JSONDeserializer 6 | 7 | 8 | API_KEY = 'HNUA2KUYENIP44PV' 9 | ENDPOINT_SCHEMA_URL = 'https://psrc-35wr2.us-central1.gcp.confluent.cloud' 10 | API_SECRET_KEY = 'TH5n14kG1JAD6b8rmf92Y6wyXPY66De2kzbiZUS0jytRfkxpEM4rWdlGVSsM/nFR' 11 | BOOTSTRAP_SERVER = 'pkc-lzvrd.us-west4.gcp.confluent.cloud:9092' 12 | SECURITY_PROTOCOL = 'SASL_SSL' 13 | SSL_MACHENISM = 'PLAIN' 14 | SCHEMA_REGISTRY_API_KEY = 'PBEUUAHOC2GTPJWT' 15 | SCHEMA_REGISTRY_API_SECRET = 'EuAq+lp9CJYCs2n/TKOdhk9C2bbMl0ZRyE6KfYJ0v2Ng6anqHnLzqAtCjSwMSE+Y' 16 | 17 | 18 | def sasl_conf(): 19 | 20 |     sasl_conf = {'sasl.mechanism': SSL_MACHENISM, 21 |                  # Set to SASL_SSL to enable TLS support. 
22 | # 'security.protocol': 'SASL_PLAINTEXT'} 23 | 'bootstrap.servers':BOOTSTRAP_SERVER, 24 | 'security.protocol': SECURITY_PROTOCOL, 25 | 'sasl.username': API_KEY, 26 | 'sasl.password': API_SECRET_KEY 27 | } 28 | return sasl_conf 29 | 30 | 31 | 32 | def schema_config(): 33 | return {'url':ENDPOINT_SCHEMA_URL, 34 | 35 | 'basic.auth.user.info':f"{SCHEMA_REGISTRY_API_KEY}:{SCHEMA_REGISTRY_API_SECRET}" 36 | 37 | } 38 | 39 | 40 | class Car: 41 | def __init__(self,record:dict): 42 | for k,v in record.items(): 43 | setattr(self,k,v) 44 | 45 | self.record=record 46 | 47 | @staticmethod 48 | def dict_to_car(data:dict,ctx): 49 | return Car(record=data) 50 | 51 | def __str__(self): 52 | return f"{self.record}" 53 | 54 | 55 | def main(topic): 56 | 57 | schema_str = """ 58 | { 59 | "$id": "http://example.com/myURI.schema.json", 60 | "$schema": "http://json-schema.org/draft-07/schema#", 61 | "additionalProperties": false, 62 | "description": "Sample schema to help you get started.", 63 | "properties": { 64 | "brand": { 65 | "description": "The type(v) type is used.", 66 | "type": "string" 67 | }, 68 | "car_name": { 69 | "description": "The type(v) type is used.", 70 | "type": "string" 71 | }, 72 | "engine": { 73 | "description": "The type(v) type is used.", 74 | "type": "number" 75 | }, 76 | "fuel_type": { 77 | "description": "The type(v) type is used.", 78 | "type": "string" 79 | }, 80 | "km_driven": { 81 | "description": "The type(v) type is used.", 82 | "type": "number" 83 | }, 84 | "max_power": { 85 | "description": "The type(v) type is used.", 86 | "type": "number" 87 | }, 88 | "mileage": { 89 | "description": "The type(v) type is used.", 90 | "type": "number" 91 | }, 92 | "model": { 93 | "description": "The type(v) type is used.", 94 | "type": "string" 95 | }, 96 | "seats": { 97 | "description": "The type(v) type is used.", 98 | "type": "number" 99 | }, 100 | "seller_type": { 101 | "description": "The type(v) type is used.", 102 | "type": "string" 103 | }, 104 | "selling_price": { 105 | "description": "The type(v) type is used.", 106 | "type": "number" 107 | }, 108 | "transmission_type": { 109 | "description": "The type(v) type is used.", 110 | "type": "string" 111 | }, 112 | "vehicle_age": { 113 | "description": "The type(v) type is used.", 114 | "type": "number" 115 | } 116 | }, 117 | "title": "SampleRecord", 118 | "type": "object" 119 | } 120 | """ 121 | json_deserializer = JSONDeserializer(schema_str, 122 | from_dict=Car.dict_to_car) 123 | 124 | consumer_conf = sasl_conf() 125 | consumer_conf.update({ 126 | 'group.id': 'group1', 127 | 'auto.offset.reset': "earliest"}) 128 | 129 | consumer = Consumer(consumer_conf) 130 | consumer.subscribe([topic]) 131 | 132 | 133 | while True: 134 | try: 135 | # SIGINT can't be handled when polling, limit timeout to 1 second. 136 | msg = consumer.poll(1.0) 137 | if msg is None: 138 | continue 139 | 140 | car = json_deserializer(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE)) 141 | 142 | if car is not None: 143 | print("User record {}: car: {}\n" 144 | .format(msg.key(), car)) 145 | except KeyboardInterrupt: 146 | break 147 | 148 | consumer.close() 149 | 150 | main("test_topic") -------------------------------------------------------------------------------- /kafka_json_producer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright 2020 Confluent Inc. 
5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | 18 | 19 | # A simple example demonstrating use of JSONSerializer. 20 | 21 | import argparse 22 | from uuid import uuid4 23 | from six.moves import input 24 | from confluent_kafka import Producer 25 | from confluent_kafka.serialization import StringSerializer, SerializationContext, MessageField 26 | from confluent_kafka.schema_registry import SchemaRegistryClient 27 | from confluent_kafka.schema_registry.json_schema import JSONSerializer 28 | #from confluent_kafka.schema_registry import * 29 | import pandas as pd 30 | from typing import List 31 | 32 | FILE_PATH = "/Users/shashankmishra/Desktop/Kafka Classes/Confluen Kafka Setup/Confluent-Kafka-Setup/cardekho_dataset.csv" 33 | columns=['car_name', 'brand', 'model', 'vehicle_age', 'km_driven', 'seller_type', 34 | 'fuel_type', 'transmission_type', 'mileage', 'engine', 'max_power', 35 | 'seats', 'selling_price'] 36 | 37 | API_KEY = 'HNUA2KUYENIP44PV' 38 | ENDPOINT_SCHEMA_URL = 'https://psrc-35wr2.us-central1.gcp.confluent.cloud' 39 | API_SECRET_KEY = 'TH5n14kG1JAD6b8rmf92Y6wyXPY66De2kzbiZUS0jytRfkxpEM4rWdlGVSsM/nFR' 40 | BOOTSTRAP_SERVER = 'pkc-lzvrd.us-west4.gcp.confluent.cloud:9092' 41 | SECURITY_PROTOCOL = 'SASL_SSL' 42 | SSL_MACHENISM = 'PLAIN' 43 | SCHEMA_REGISTRY_API_KEY = 'PBEUUAHOC2GTPJWT' 44 | SCHEMA_REGISTRY_API_SECRET = 'EuAq+lp9CJYCs2n/TKOdhk9C2bbMl0ZRyE6KfYJ0v2Ng6anqHnLzqAtCjSwMSE+Y' 45 | 46 | 47 | def sasl_conf(): 48 | 49 | sasl_conf = {'sasl.mechanism': SSL_MACHENISM, 50 | # Set to SASL_SSL to enable TLS support. 51 | # 'security.protocol': 'SASL_PLAINTEXT'} 52 | 'bootstrap.servers':BOOTSTRAP_SERVER, 53 | 'security.protocol': SECURITY_PROTOCOL, 54 | 'sasl.username': API_KEY, 55 | 'sasl.password': API_SECRET_KEY 56 | } 57 | return sasl_conf 58 | 59 | 60 | 61 | def schema_config(): 62 | return {'url':ENDPOINT_SCHEMA_URL, 63 | 64 | 'basic.auth.user.info':f"{SCHEMA_REGISTRY_API_KEY}:{SCHEMA_REGISTRY_API_SECRET}" 65 | 66 | } 67 | 68 | 69 | class Car: 70 | def __init__(self,record:dict): 71 | for k,v in record.items(): 72 | setattr(self,k,v) 73 | 74 | self.record=record 75 | 76 | @staticmethod 77 | def dict_to_car(data:dict,ctx): 78 | return Car(record=data) 79 | 80 | def __str__(self): 81 | return f"{self.record}" 82 | 83 | 84 | def get_car_instance(file_path): 85 | df=pd.read_csv(file_path) 86 | df=df.iloc[:,1:] 87 | cars:List[Car]=[] 88 | for data in df.values: 89 | car=Car(dict(zip(columns,data))) 90 | cars.append(car) 91 | yield car 92 | 93 | def car_to_dict(car:Car, ctx): 94 | """ 95 | Returns a dict representation of a User instance for serialization. 96 | Args: 97 | user (User): User instance. 98 | ctx (SerializationContext): Metadata pertaining to the serialization 99 | operation. 100 | Returns: 101 | dict: Dict populated with user attributes to be serialized. 
102 | """ 103 | 104 | # User._address must not be serialized; omit from dict 105 | return car.record 106 | 107 | 108 | def delivery_report(err, msg): 109 | """ 110 | Reports the success or failure of a message delivery. 111 | Args: 112 | err (KafkaError): The error that occurred on None on success. 113 | msg (Message): The message that was produced or failed. 114 | """ 115 | 116 | if err is not None: 117 | print("Delivery failed for User record {}: {}".format(msg.key(), err)) 118 | return 119 | print('User record {} successfully produced to {} [{}] at offset {}'.format( 120 | msg.key(), msg.topic(), msg.partition(), msg.offset())) 121 | 122 | 123 | def main(topic): 124 | 125 | schema_str = """ 126 | { 127 | "$id": "http://example.com/myURI.schema.json", 128 | "$schema": "http://json-schema.org/draft-07/schema#", 129 | "additionalProperties": false, 130 | "description": "Sample schema to help you get started.", 131 | "properties": { 132 | "brand": { 133 | "description": "The type(v) type is used.", 134 | "type": "string" 135 | }, 136 | "car_name": { 137 | "description": "The type(v) type is used.", 138 | "type": "string" 139 | }, 140 | "engine": { 141 | "description": "The type(v) type is used.", 142 | "type": "number" 143 | }, 144 | "fuel_type": { 145 | "description": "The type(v) type is used.", 146 | "type": "string" 147 | }, 148 | "km_driven": { 149 | "description": "The type(v) type is used.", 150 | "type": "number" 151 | }, 152 | "max_power": { 153 | "description": "The type(v) type is used.", 154 | "type": "number" 155 | }, 156 | "mileage": { 157 | "description": "The type(v) type is used.", 158 | "type": "number" 159 | }, 160 | "model": { 161 | "description": "The type(v) type is used.", 162 | "type": "string" 163 | }, 164 | "seats": { 165 | "description": "The type(v) type is used.", 166 | "type": "number" 167 | }, 168 | "seller_type": { 169 | "description": "The type(v) type is used.", 170 | "type": "string" 171 | }, 172 | "selling_price": { 173 | "description": "The type(v) type is used.", 174 | "type": "number" 175 | }, 176 | "transmission_type": { 177 | "description": "The type(v) type is used.", 178 | "type": "string" 179 | }, 180 | "vehicle_age": { 181 | "description": "The type(v) type is used.", 182 | "type": "number" 183 | } 184 | }, 185 | "title": "SampleRecord", 186 | "type": "object" 187 | } 188 | """ 189 | schema_registry_conf = schema_config() 190 | schema_registry_client = SchemaRegistryClient(schema_registry_conf) 191 | 192 | string_serializer = StringSerializer('utf_8') 193 | json_serializer = JSONSerializer(schema_str, schema_registry_client, car_to_dict) 194 | 195 | producer = Producer(sasl_conf()) 196 | 197 | print("Producing user records to topic {}. 
^C to exit.".format(topic)) 198 |     #while True: 199 |         # Serve on_delivery callbacks from previous calls to produce() 200 |     producer.poll(0.0) 201 |     try: 202 |         for car in get_car_instance(file_path=FILE_PATH): 203 | 204 |             print(car) 205 |             producer.produce(topic=topic, 206 |                             key=string_serializer(str(uuid4()), SerializationContext(topic, MessageField.KEY)), 207 |                             value=json_serializer(car, SerializationContext(topic, MessageField.VALUE)), 208 |                             on_delivery=delivery_report) 209 |             break  # stops after the first record; remove this break to publish the whole CSV 210 |     except KeyboardInterrupt: 211 |         pass 212 |     except ValueError: 213 |         print("Invalid input, discarding record...") 214 |         pass 215 | 216 |     print("\nFlushing records...") 217 |     producer.flush() 218 | 219 | main("test_topic") 220 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | confluent-kafka[avro,json,protobuf] 2 | pyspark==3.2.1 --------------------------------------------------------------------------------