├── CONTRIBUTING.md
├── DomainJoined-Producer-Consumer-With-TLS
│   ├── .gitignore
│   ├── README.md
│   ├── media
│   │   ├── Add_User.png
│   │   ├── Azure_Portal_UI.png
│   │   ├── Edit_Policy_UI.png
│   │   ├── Kafk_Policy_UI.png
│   │   └── Ranger_UI.png
│   ├── pom.xml
│   └── src
│       └── main
│           └── java
│               └── com
│                   └── microsoft
│                       └── example
│                           ├── AdminClientWrapper.java
│                           ├── Consumer.java
│                           ├── Producer.java
│                           └── Run.java
├── DomainJoined-Producer-Consumer
│   ├── .gitignore
│   ├── README.md
│   ├── media
│   │   ├── Add_User.png
│   │   ├── Azure_Portal_UI.png
│   │   ├── Edit_Policy_UI.png
│   │   ├── Kafk_Policy_UI.png
│   │   └── Ranger_UI.png
│   ├── pom.xml
│   └── src
│       └── main
│           └── java
│               └── com
│                   └── microsoft
│                       └── example
│                           ├── AdminClientWrapper.java
│                           ├── Consumer.java
│                           ├── Producer.java
│                           └── Run.java
├── LICENSE
├── Prebuilt-Jars
│   ├── kafka-producer-consumer-esp.jar
│   ├── kafka-producer-consumer-tls-esp.jar
│   └── kafka-producer-consumer.jar
├── Producer-Consumer
│   ├── .gitignore
│   ├── pom.xml
│   └── src
│       └── main
│           └── java
│               └── com
│                   └── microsoft
│                       └── example
│                           ├── AdminClientWrapper.java
│                           ├── Consumer.java
│                           ├── Producer.java
│                           └── Run.java
├── README.md
├── Streaming
│   ├── .gitignore
│   ├── pom.xml
│   └── src
│       └── main
│           └── java
│               └── com
│                   └── microsoft
│                       └── example
│                           └── Stream.java
└── azuredeploy.json
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to Azure samples
2 |
3 | Thank you for your interest in contributing to Azure samples!
4 |
5 | ## Ways to contribute
6 |
7 | You can contribute to [Azure samples](https://azure.microsoft.com/documentation/samples/) in a few different ways:
8 |
9 | - Submit feedback on [this sample page](https://azure.microsoft.com/documentation/samples/hdinsight-java-storm-wordcount/) whether it was helpful or not.
10 | - Submit issues through [issue tracker](https://github.com/Azure-Samples/hdinsight-java-storm-wordcount/issues) on GitHub. We are actively monitoring the issues and improving our samples.
11 | - If you wish to make code changes to samples, or contribute something new, please follow the [GitHub Forks / Pull requests model](https://help.github.com/articles/fork-a-repo/): Fork the sample repo, make the change and propose it back by submitting a pull request.
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/.gitignore:
--------------------------------------------------------------------------------
1 | target/
2 | pom.xml.tag
3 | pom.xml.releaseBackup
4 | pom.xml.versionsBackup
5 | pom.xml.next
6 | release.properties
7 | dependency-reduced-pom.xml
8 | buildNumber.properties
9 | .mvn/timing.properties
10 | .idea/
11 | *.log
12 | .classpath
13 | .project
14 | .settings/
15 | *.iml
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/README.md:
--------------------------------------------------------------------------------
1 | ---
2 | page_type: sample
3 | languages: java
4 | products:
5 | - azure
6 | - azure-hdinsight
7 | description: "Examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kerberized Kafka on HDInsight cluster."
8 | urlFragment: hdinsight-kafka-java-get-started
9 | ---
10 |
11 | # Java-based example of using the Kafka Consumer, Producer, and Streaming APIs
12 |
13 | The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with an Enterprise Security Package (ESP) Kafka cluster with TLS enabled on HDInsight.
14 |
15 | ## Prerequisites
16 |
17 | * Apache Kafka on HDInsight cluster. To learn how to create the cluster, see [Start with Apache Kafka on HDInsight](apache-kafka-get-started.md).
18 | * [Java Developer Kit (JDK) version 8](https://aka.ms/azure-jdks) or an equivalent, such as OpenJDK.
19 | * [Apache Maven](https://maven.apache.org/download.cgi) properly [installed](https://maven.apache.org/install.html) according to Apache. Maven is a project build system for Java projects.
20 | * An SSH client like PuTTY. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
21 |
22 | ## Understand the code
23 |
24 | If you're using an **Enterprise Security Package (ESP) with TLS Encryption** enabled Kafka cluster, use the application version located in the `DomainJoined-Producer-Consumer-With-TLS` subdirectory.
25 |
26 | The application consists primarily of five files:
27 | * `pom.xml`: This file defines the project dependencies, Java version, and packaging methods.
28 | * `Producer.java`: This file sends random sentences to Kafka using the producer API.
29 | * `Consumer.java`: This file uses the consumer API to read data from Kafka and emit it to STDOUT.
30 | * `AdminClientWrapper.java`: This file uses the admin API to create, describe, and delete Kafka topics.
31 | * `Run.java`: The command-line interface used to run the producer and consumer code.
32 |
33 | ### Pom.xml
34 |
35 | The important things to understand in the `pom.xml` file are:
36 |
37 | * Dependencies: This project relies on the Kafka producer and consumer APIs, which are provided by the `kafka-clients` package. The following XML code defines this dependency:
38 |
39 | ```xml
40 | <!-- Kafka client for producer/consumer operations -->
41 | <dependency>
42 |     <groupId>org.apache.kafka</groupId>
43 |     <artifactId>kafka-clients</artifactId>
44 |     <version>${kafka.version}</version>
45 | </dependency>
46 | ```
47 |
48 | The `${kafka.version}` entry is declared in the `<properties>..</properties>` section of `pom.xml`, and is configured to the Kafka version of the HDInsight cluster.
49 |
50 | * Plugins: Maven plugins provide various capabilities. In this project, the following plugins are used:
51 |
52 | * `maven-compiler-plugin`: Used to set the Java version used by the project to 8. This is the version of Java used by HDInsight 4.0.
53 | * `maven-shade-plugin`: Used to generate an uber jar that contains this application as well as any dependencies. It is also used to set the entry point of the application, so that you can directly run the Jar file without having to specify the main class.
54 |
55 | ### Producer.java
56 |
57 | The producer communicates with the Kafka broker hosts (worker nodes) and sends data to a Kafka topic. The following code snippet is from the [Producer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Producer.java) file from the [GitHub repository](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started) and shows how to set the producer properties.
58 |
59 | ```java
60 | Properties properties = new Properties();
61 | // Set the brokers (bootstrap servers)
62 | properties.setProperty("bootstrap.servers", brokers);
63 | // Set how to serialize key/value pairs
64 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
65 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
66 | // Set the TLS Encryption for Domain Joined TLS Encrypted cluster
67 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
68 | properties.setProperty("ssl.mechanism", "GSSAPI");
69 | properties.setProperty("sasl.kerberos.service.name", "kafka");
70 | // Set the SSL Truststore location and password
71 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
72 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
73 | // Set the SSL keystore location and password
74 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
75 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
76 | // Set the SSL key password
77 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
78 | KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
79 | ```
80 |
81 | ### Consumer.java
82 |
83 | The consumer communicates with the Kafka broker hosts (worker nodes), and reads records in a loop. The following code snippet from the [Consumer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Consumer.java) file sets the consumer properties.
84 |
85 | ```java
86 | KafkaConsumer<String, String> consumer;
87 | // Configure the consumer
88 | Properties properties = new Properties();
89 | // Point it to the brokers
90 | properties.setProperty("bootstrap.servers", brokers);
91 | // Set the consumer group (all consumers must belong to a group).
92 | properties.setProperty("group.id", groupId);
93 | // Set how to deserialize key/value pairs
94 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
95 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
96 | // Set the TLS Encryption for Domain Joined TLS Encrypted cluster
97 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
98 | properties.setProperty("ssl.mechanism", "GSSAPI");
99 | properties.setProperty("sasl.kerberos.service.name", "kafka");
100 | // Set the SSL Truststore location and password
101 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
102 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
103 | // Set the SSL keystore location and password
104 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
105 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
106 | // Set the SSL key password
107 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
108 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
109 | // with the earliest record in the stream.
110 | properties.setProperty("auto.offset.reset","earliest");
111 | consumer = new KafkaConsumer<>(properties);
112 | ```
113 |
114 | #### Note
115 | The following properties are required for an ESP cluster with TLS Encryption enabled.
116 | They must be set for the `AdminClient`, `Producer`, and `Consumer` alike.
117 | Your ESP cluster might have both TLS Encryption and Authentication enabled. Adjust the configuration as described in
118 | [Enable TLS Encryption on ESP cluster](https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-esp-kafka-ssl-encryption-authentication)
119 | ```java
120 | // Set the TLS Encryption for Domain Joined TLS Encrypted cluster
121 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
122 | properties.setProperty("ssl.mechanism", "GSSAPI");
123 | properties.setProperty("sasl.kerberos.service.name", "kafka");
124 | // Set the SSL Truststore location and password
125 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
126 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
127 | // Set the SSL keystore location and password
128 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
129 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
130 | // Set the SSL key password
131 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
132 | ```
133 | In this code, the consumer is configured to read from the start of the topic (`auto.offset.reset` is set to `earliest`).
134 | Adjust the properties above to match your keystore and truststore locations and passwords. Another possible value for `sasl.mechanism` is `PLAIN`.
135 |
136 | ### Run.java
137 |
138 | The [Run.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Run.java) file provides a command-line interface that runs either the producer or consumer code. You must provide the Kafka broker host information as a parameter. You can optionally include a group ID value, which is used by the consumer process. If you create multiple consumer instances using the same group ID, they'll load balance reading from the topic.
139 |
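As a quick reference, a typical invocation (using the `kafka-producer-consumer.jar` name and the `myTest` topic from the walkthrough below) looks like this; the trailing group ID argument is optional:

```bash
# Argument order: <command> <topic> <brokers> [groupid]
java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar consumer myTest $KAFKABROKERS myGroup
```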
140 | ## Use Pre-built JAR files
141 |
142 | Download the jars from the [Kafka Get Started Azure sample](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/Prebuilt-Jars). If your cluster is **Enterprise Security Package (ESP) with TLS Encryption** enabled, use `kafka-producer-consumer-tls-esp.jar`. Use the command below to copy the jars to your cluster.
143 |
144 | ```cmd
145 | scp kafka-producer-consumer-tls-esp.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
146 | ```
147 |
148 | ## Build the JAR files from code
149 |
150 | 1. Download and extract the examples from [https://github.com/Azure-Samples/hdinsight-kafka-java-get-started](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started).
151 |
152 | 2. If you are using an **Enterprise Security Package (ESP) with TLS Encryption** enabled Kafka cluster, set the location to the `DomainJoined-Producer-Consumer-With-TLS` subdirectory. Use the following command to build the application:
153 |
154 | ```cmd
155 | mvn clean package
156 | ```
157 |
158 | This command creates a directory named `target`, which contains a file named `kafka-producer-consumer-1.0-SNAPSHOT.jar`. For ESP with TLS clusters, the file will be `kafka-producer-consumer-tls-esp-1.0-SNAPSHOT.jar`.
159 |
160 | 3. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Enter the following command to copy the `kafka-producer-consumer-*.jar` file to your HDInsight cluster. When prompted enter the password for the SSH user.
161 |
162 | ```cmd
163 | scp ./target/kafka-producer-consumer*.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
164 | ```
165 |
166 | ## Run the example
168 |
169 | 1. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Open an SSH connection to the cluster, by entering the following command. If prompted, enter the password for the SSH user account.
170 |
171 | ```cmd
172 | ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
173 | ```
174 |
175 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal. Replace `<password>` with the cluster login password, then execute:
176 |
177 | ```bash
178 | sudo apt -y install jq
179 | export clusterName='<clustername>'
180 | export password='<password>'
181 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
182 | ```
183 |
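To confirm the variable was populated, print it:

```bash
# Expect a comma-separated list of broker host:port pairs on port 9092
echo $KAFKABROKERS
```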
184 | > **Note**
185 | > The `curl` command above requires Ambari access. If your cluster is behind an NSG, run it from a machine that can access Ambari.
186 | 1. Create a Kafka topic, `myTest`, by entering the following command:
187 |
188 | ```bash
189 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar create myTest $KAFKABROKERS
190 | ```
191 |
192 | 1. To run the producer and write data to the topic, use the following command:
193 |
194 | ```bash
195 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar producer myTest $KAFKABROKERS
196 | ```
197 |
198 | 1. Once the producer has finished, use the following command to read from the topic:
199 |
200 | ```bash
201 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar consumer myTest $KAFKABROKERS
203 | ```
204 |
205 | The records read, along with a count of records, are displayed.
206 |
207 | 1. Use __Ctrl + C__ to exit the consumer.
208 |
209 | ### Run the Example with another User (espkafkauser)
210 |
211 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal. Replace `<password>` with the cluster login password, then execute:
212 |
213 | ```bash
214 | sudo apt -y install jq
215 | export clusterName='<clustername>'
216 | export password='<password>'
217 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
218 | ```
219 | 2. Create the keytab file for `espkafkauser` with the following steps:
220 | ```bash
221 | ktutil
222 | ktutil: addent -password -p espkafkauser@TEST.COM -k 1 -e RC4-HMAC
223 | Password for espkafkauser@TEST.COM:
224 | ktutil: wkt espkafkauser.keytab
225 | ktutil: q
226 | ```
227 |
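To verify the keytab was written correctly, list its entries with `klist` (part of the Kerberos client tools):

```bash
# Each entry should show the principal espkafkauser@TEST.COM
klist -kt espkafkauser.keytab
```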
228 | **NOTE:**
229 | 1. `espkafkauser` should be part of your domain group; add it in the Ranger UI to grant CRUD operation privileges.
230 | 2. Keep the domain name (`TEST.COM`) in capital letters only. Otherwise, Kerberos will throw errors during CRUD operations.
231 |
232 | You will now have an `espkafkauser.keytab` file in the local directory. Next, create an `espkafkauser_jaas.conf` JAAS config file with the content given below:
233 |
234 | ```
235 | KafkaClient {
236 | com.sun.security.auth.module.Krb5LoginModule required
237 | useKeyTab=true
238 | storeKey=true
239 | keyTab="/home/sshuser/espkafkauser.keytab"
240 | useTicketCache=false
241 | serviceName="kafka"
242 | principal="espkafkauser@TEST.COM";
243 | };
244 | ```
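Optionally, sanity-check the keytab and principal by obtaining a Kerberos ticket before running the clients:

```bash
# Acquire a ticket using the keytab, then confirm it is cached
kinit -kt espkafkauser.keytab espkafkauser@TEST.COM
klist
```
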
245 | ### Steps to add espkafkauser on the Ranger UI
246 | 1. Go to the overview page of the cluster and use the Ambari UI URL to open Ranger. Sign in with the Ambari UI credentials.
247 |
248 | 
249 | ```
250 | Generic
251 | https://<clustername>.azurehdinsight.net/ranger
252 |
253 | Example
254 | https://espkafka.azurehdinsight.net/ranger
255 | ```
256 |
257 | 2. If everything is correct, you will see the Ranger dashboard. Click the Kafka link.
258 |
259 | 
260 |
261 |
262 | 3. The policy page shows which users, such as `kafka`, have access to perform CRUD operations on all topics.
263 |
264 | 
265 |
266 |
267 | 4. Edit the all-topic policy and add `espkafkauser` in the Select User dropdown. Click **Save** after making the changes.
268 |
269 | 
270 |
271 | 
272 |
273 |
274 | 5. If you cannot see your user in the dropdown, the user is not available in the AAD domain.
275 |
276 | 6. Execute CRUD operations on the head node for verification:
277 |
278 | ```bash
279 | # Sample command
280 | java -jar -Djava.security.auth.login.config=JAAS_CONFIG_FILE_PATH PRODUCER_CONSUMER_ESP_JAR_PATH create $TOPICNAME $KAFKABROKERS
281 |
282 | # Create
283 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar create $TOPICNAME $KAFKABROKERS
284 |
285 | # Describe
286 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar describe $TOPICNAME $KAFKABROKERS
287 |
288 | # Produce
289 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar producer $TOPICNAME $KAFKABROKERS
290 |
291 | # Consume
292 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer $TOPICNAME $KAFKABROKERS
293 |
294 | # Delete
295 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar delete $TOPICNAME $KAFKABROKERS
296 | ```
297 |
298 |
299 | ### Multiple consumers
300 |
301 | Kafka consumers use a consumer group when reading records. Using the same group with multiple consumers results in load balanced reads from a topic. Each consumer in the group receives a portion of the records.
302 |
303 | The consumer application accepts a parameter that is used as the group ID. For example, the following command starts a consumer using a group ID of `myGroup`:
304 |
305 | ```bash
306 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup
307 | ```
308 |
309 | Use __Ctrl + C__ to exit the consumer.
310 |
311 | To see this process in action, use the following command:
312 |
313 | With `kafka` as the user:
314 | ```bash
315 | tmux new-session 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
316 | \; split-window -h 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
317 | \; attach
318 | ```
319 |
320 | With a custom user:
321 | ```bash
322 | tmux new-session 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
323 | \; split-window -h 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
324 | \; attach
325 | ```
326 |
327 | This command uses `tmux` to split the terminal into two columns. A consumer is started in each column, with the same group ID value. Once the consumers finish reading, notice that each read only a portion of the records. Use __Ctrl + C__ twice to exit `tmux`.
328 |
329 | Consumption by clients within the same group is handled through the partitions for the topic. In this code sample, the `myTest` topic created earlier has eight partitions. If you start eight consumers, each consumer reads records from a single partition for the topic.
330 |
331 | > [!IMPORTANT]
332 | > There cannot be more consumer instances in a consumer group than partitions. In this example, one consumer group can contain up to eight consumers since that is the number of partitions in the topic. Or you can have multiple consumer groups, each with no more than eight consumers.
333 |
334 | Records stored in Kafka are stored in the order they're received within a partition. To achieve in-ordered delivery for records *within a partition*, create a consumer group where the number of consumer instances matches the number of partitions. To achieve in-ordered delivery for records *within the topic*, create a consumer group with only one consumer instance.
335 |
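To confirm the partition count for a topic, you can describe it with the Kafka CLI tools on the cluster; the `$KAFKAZKHOSTS` variable below is a hypothetical placeholder for the ZooKeeper connection string, which is not set elsewhere in this walkthrough:

```bash
# Describe the topic; the output lists one line per partition
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --topic myTest --zookeeper $KAFKAZKHOSTS
```
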
336 | ## Common Issues faced
337 |
338 | 1. Topic creation fails
339 |
340 |
341 | If your cluster is Enterprise Security Package (ESP) enabled, use the [pre-built JAR files for producer and consumer](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/Prebuilt-Jars/kafka-producer-consumer-tls-esp.jar).
342 |
343 |
344 | The ESP with TLS Encryption jar can be built from the code in the [`DomainJoined-Producer-Consumer-With-TLS` subdirectory](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/DomainJoined-Producer-Consumer-With-TLS).
345 |
346 |
347 | 1. Facing issues with ESP-enabled clusters
348 |
349 | If produce and consume operations fail and you are using an ESP-enabled cluster, check that the user `kafka` is present in all Ranger policies. If it is not present, add it to all Ranger policies.
350 |
351 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Add_User.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Add_User.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Azure_Portal_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Azure_Portal_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Edit_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Edit_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Kafk_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Kafk_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Ranger_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Ranger_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/pom.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3 |          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
4 |   <modelVersion>4.0.0</modelVersion>
5 |   <groupId>com.microsoft.example</groupId>
6 |   <artifactId>kafka-producer-consumer-tls-esp</artifactId>
7 |   <packaging>jar</packaging>
8 |   <version>1.0-SNAPSHOT</version>
9 |   <name>kafka-producer-consumer</name>
10 |   <url>http://maven.apache.org</url>
11 |   <properties>
12 |     <!-- Match this to the Kafka version of the HDInsight cluster -->
13 |     <kafka.version>2.1.1</kafka.version>
14 |   </properties>
15 |   <dependencies>
16 |     <dependency>
17 |       <!-- Kafka client for producer/consumer/admin operations -->
18 |       <groupId>org.apache.kafka</groupId>
19 |       <artifactId>kafka-clients</artifactId>
20 |       <version>${kafka.version}</version>
21 |     </dependency>
22 |   </dependencies>
23 |   <build>
24 |     <plugins>
25 |       <!-- Compile for Java 8, the version used by HDInsight 4.0 -->
26 |       <plugin>
27 |         <groupId>org.apache.maven.plugins</groupId>
28 |         <artifactId>maven-compiler-plugin</artifactId>
29 |         <version>3.3</version>
30 |         <configuration>
31 |           <source>1.8</source>
32 |           <target>1.8</target>
33 |         </configuration>
34 |       </plugin>
35 |       <!-- Build an uber jar and set the entry point so the jar is directly runnable -->
36 |       <plugin>
37 |         <groupId>org.apache.maven.plugins</groupId>
38 |         <artifactId>maven-shade-plugin</artifactId>
39 |         <version>2.3</version>
40 |         <configuration>
41 |           <transformers>
42 |             <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
43 |               <mainClass>com.microsoft.example.Run</mainClass>
44 |             </transformer>
45 |           </transformers>
46 |           <filters>
47 |             <!-- Exclude dependency signature files to avoid invalid-signature errors -->
48 |             <filter>
49 |               <artifact>*:*</artifact>
50 |               <excludes>
51 |                 <exclude>META-INF/*.SF</exclude>
52 |                 <exclude>META-INF/*.DSA</exclude>
53 |                 <exclude>META-INF/*.RSA</exclude>
54 |               </excludes>
55 |             </filter>
56 |           </filters>
57 |         </configuration>
58 |         <executions>
59 |           <execution>
60 |             <phase>package</phase>
61 |             <goals>
62 |               <goal>shade</goal>
63 |             </goals>
64 |           </execution>
65 |         </executions>
66 |       </plugin>
67 |     </plugins>
68 |   </build>
69 | </project>
70 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/AdminClientWrapper.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.producer.ProducerConfig;
4 | import org.apache.kafka.clients.admin.AdminClient;
5 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
6 | import org.apache.kafka.clients.admin.CreateTopicsResult;
7 | import org.apache.kafka.clients.admin.DeleteTopicsResult;
8 | import org.apache.kafka.clients.admin.TopicDescription;
9 | import org.apache.kafka.clients.admin.NewTopic;
10 |
11 | import org.apache.kafka.clients.admin.KafkaAdminClient;
12 | import org.apache.kafka.clients.CommonClientConfigs;
13 | import org.apache.kafka.common.config.SslConfigs;
14 |
15 |
16 | import java.util.Collection;
17 | import java.util.Collections;
18 | import java.util.concurrent.ExecutionException;
19 | import java.util.Properties;
20 | import java.util.Random;
21 | import java.io.IOException;
22 |
23 |
24 | public class AdminClientWrapper {
25 |
26 | public static Properties getProperties(String brokers) {
27 | Properties properties = new Properties();
28 | properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
29 |
30 | // Set how to serialize key/value pairs
31 | properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
32 | properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
33 | // specify the protocol for Domain Joined TLS Encrypted clusters
34 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
35 | properties.setProperty("ssl.mechanism", "GSSAPI");
36 | properties.setProperty("sasl.kerberos.service.name", "kafka");
37 | // specifiy the Truststore location and password
38 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
39 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
40 | // specifiy the Keystore location and password
41 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
42 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
43 | // specifiy the key password
44 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
45 | return properties;
46 | }
47 |
48 | public static void describeTopics(String brokers, String topicName) throws IOException {
49 | // Set properties used to configure admin client
50 | Properties properties = getProperties(brokers);
51 |
52 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
53 | // Make async call to describe the topic.
54 | final DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(Collections.singleton(topicName));
55 |
56 | TopicDescription description = describeTopicsResult.values().get(topicName).get();
57 | System.out.print(description.toString());
58 | } catch (Exception e) {
59 | System.out.print("Describe denied\n");
60 | System.out.print(e.getMessage());
61 | //throw new RuntimeException(e.getMessage(), e);
62 | }
63 | }
64 |
65 | public static void deleteTopics(String brokers, String topicName) throws IOException {
66 | // Set properties used to configure admin client
67 | Properties properties = getProperties(brokers);
68 |
69 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
70 | final DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Collections.singleton(topicName));
71 | deleteTopicsResult.values().get(topicName).get();
72 | System.out.print("Topic " + topicName + " deleted");
73 | } catch (Exception e) {
74 | System.out.print("Delete Topics denied\n");
75 | System.out.print(e.getMessage());
76 | //throw new RuntimeException(e.getMessage(), e);
77 | }
78 | }
79 |
80 | public static void createTopics(String brokers, String topicName) throws IOException {
81 | // Set properties used to configure admin client
82 | Properties properties = getProperties(brokers);
83 |
84 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
85 | int numPartitions = 8;
86 | short replicationFactor = (short)3;
87 | final NewTopic newTopic = new NewTopic(topicName, numPartitions, replicationFactor);
88 |
89 | final CreateTopicsResult createTopicsResult = adminClient.createTopics(Collections.singleton(newTopic));
90 | createTopicsResult.values().get(topicName).get();
91 | System.out.print("Topic " + topicName + " created");
92 | } catch (Exception e) {
93 | System.out.print("Create Topics denied\n");
94 | System.out.print(e.getMessage());
95 | //throw new RuntimeException(e.getMessage(), e);
96 | }
97 | }
98 | }
99 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Consumer.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.consumer.KafkaConsumer;
4 | import org.apache.kafka.clients.consumer.ConsumerRecords;
5 | import org.apache.kafka.clients.consumer.ConsumerRecord;
6 | import org.apache.kafka.clients.CommonClientConfigs;
7 | import org.apache.kafka.common.config.SslConfigs;
8 |
9 | import java.util.Properties;
10 | import java.util.Arrays;
11 |
12 | public class Consumer {
13 | public static int consume(String brokers, String groupId, String topicName) {
14 | // Create a consumer
15 | KafkaConsumer<String, String> consumer;
16 | // Configure the consumer
17 | Properties properties = new Properties();
18 | // Point it to the brokers
19 | properties.setProperty("bootstrap.servers", brokers);
20 | // Set the consumer group (all consumers must belong to a group).
21 | properties.setProperty("group.id", groupId);
22 | // Set how to deserialize key/value pairs
23 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
24 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
25 | // specify the protocol for Domain Joined TLS Encrypted clusters
26 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
27 | properties.setProperty("ssl.mechanism", "GSSAPI");
28 | properties.setProperty("sasl.kerberos.service.name", "kafka");
29 | // specifiy the Truststore location and password
30 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
31 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
32 | // specifiy the Keystore location and password
33 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
34 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
35 | // specifiy the key password
36 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
37 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
38 | // with the earliest record in the stream.
39 | properties.setProperty("auto.offset.reset","earliest");
40 |
41 | consumer = new KafkaConsumer<>(properties);
42 |
43 | // Subscribe to the 'test' topic
44 | consumer.subscribe(Arrays.asList(topicName));
45 |
46 | // Loop until ctrl + c
47 | int count = 0;
48 | while(true) {
49 | // Poll for records
50 | ConsumerRecords<String, String> records = consumer.poll(200);
51 | // Did we get any?
52 | if (records.count() == 0) {
53 | // timeout/nothing to read
54 | } else {
55 | // Yes, loop over records
56 | for(ConsumerRecord<String, String> record: records) {
57 | // Display record and count
58 | count += 1;
59 | System.out.println( count + ": " + record.value());
60 | }
61 | }
62 | }
63 | }
64 | }
65 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Producer.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.producer.KafkaProducer;
4 | import org.apache.kafka.clients.producer.ProducerRecord;
5 | import org.apache.kafka.clients.producer.ProducerConfig;
6 | import org.apache.kafka.clients.admin.AdminClient;
7 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
8 | import org.apache.kafka.clients.admin.KafkaAdminClient;
9 | import org.apache.kafka.clients.CommonClientConfigs;
10 | import org.apache.kafka.clients.admin.TopicDescription;
11 | import org.apache.kafka.common.config.SslConfigs;
12 |
13 | import java.util.Collection;
14 | import java.util.Collections;
15 | import java.util.concurrent.ExecutionException;
16 | import java.util.Properties;
17 | import java.util.Random;
18 | import java.io.IOException;
19 |
20 | public class Producer
21 | {
22 | public static void produce(String brokers, String topicName) throws IOException
23 | {
24 |
25 | // Set properties used to configure the producer
26 | Properties properties = new Properties();
27 | // Set the brokers (bootstrap servers)
28 | properties.setProperty("bootstrap.servers", brokers);
29 | // Set how to serialize key/value pairs
30 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
31 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
32 | // specify the protocol for Domain Joined TLS Encrypted clusters
33 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
34 | properties.setProperty("ssl.mechanism", "GSSAPI");
35 | properties.setProperty("sasl.kerberos.service.name", "kafka");
36 | // specifiy the Truststore location and password
37 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
38 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
39 | // specifiy the Keystore location and password
40 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
41 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
42 | // specifiy the key password
43 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
44 |
45 | KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
46 |
47 | // So we can generate random sentences
48 | Random random = new Random();
49 | String[] sentences = new String[] {
50 | "the cow jumped over the moon",
51 | "an apple a day keeps the doctor away",
52 | "four score and seven years ago",
53 | "snow white and the seven dwarfs",
54 | "i am at two with nature"
55 | };
56 |
57 | String progressAnimation = "|/-\\";
58 | // Produce a bunch of records
59 | for(int i = 0; i < 100; i++) {
60 | // Pick a sentence at random
61 | String sentence = sentences[random.nextInt(sentences.length)];
62 | // Send the sentence to the test topic
63 | try
64 | {
65 | producer.send(new ProducerRecord<String, String>(topicName, sentence)).get();
66 | }
67 | catch (Exception ex)
68 | {
69 | System.out.print(ex.getMessage());
70 | throw new IOException(ex.toString());
71 | }
72 | String progressBar = "\r" + progressAnimation.charAt(i % progressAnimation.length()) + " " + i;
73 | System.out.write(progressBar.getBytes());
74 | }
75 | }
76 | }
77 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Run.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import java.io.IOException;
4 | import java.util.UUID;
5 | import java.io.PrintWriter;
6 | import java.io.File;
7 | import java.lang.Exception;
8 |
9 | // Handle starting producer or consumer
10 | public class Run {
11 | public static void main(String[] args) throws IOException {
12 | if(args.length < 3) {
13 | usage();
14 | }
15 | // Get the brokers
16 | String brokers = args[2];
17 | String topicName = args[1];
18 | switch(args[0].toLowerCase()) {
19 | case "producer":
20 | Producer.produce(brokers, topicName);
21 | break;
22 | case "consumer":
23 | // Either a groupId was passed in, or we need a random one
24 | String groupId;
25 | if(args.length == 4) {
26 | groupId = args[3];
27 | } else {
28 | groupId = UUID.randomUUID().toString();
29 | }
30 | Consumer.consume(brokers, groupId, topicName);
31 | break;
32 | case "describe":
33 | AdminClientWrapper.describeTopics(brokers, topicName);
34 | break;
35 | case "create":
36 | AdminClientWrapper.createTopics(brokers, topicName);
37 | break;
38 | case "delete":
39 | AdminClientWrapper.deleteTopics(brokers, topicName);
40 | break;
41 | default:
42 | usage();
43 | }
44 | System.exit(0);
45 | }
46 | // Display usage
47 | public static void usage() {
48 | System.out.println("Usage:");
49 | System.out.println("kafka-example.jar <producer|consumer|describe|create|delete> <topicname> brokerhosts [groupid]");
50 | System.exit(1);
51 | }
52 | }
53 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/.gitignore:
--------------------------------------------------------------------------------
1 | target/
2 | pom.xml.tag
3 | pom.xml.releaseBackup
4 | pom.xml.versionsBackup
5 | pom.xml.next
6 | release.properties
7 | dependency-reduced-pom.xml
8 | buildNumber.properties
9 | .mvn/timing.properties
10 | .idea/
11 | *.log
12 | .classpath
13 | .project
14 | .settings/
15 | *.iml
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/README.md:
--------------------------------------------------------------------------------
1 | ---
2 | page_type: sample
3 | languages: java
4 | products:
5 | - azure
6 | - azure-hdinsight
7 | description: "Examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kerberized Kafka on HDInsight cluster."
8 | urlFragment: hdinsight-kafka-java-get-started
9 | ---
10 |
11 | # Java-based example of using the Kafka Consumer, Producer, and Streaming APIs
12 |
13 | The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kafka on HDInsight cluster.
14 |
15 | ## Prerequisites
16 |
17 | * Apache Kafka on HDInsight cluster. To learn how to create the cluster, see [Start with Apache Kafka on HDInsight](apache-kafka-get-started.md).
18 | * [Java Developer Kit (JDK) version 8](https://aka.ms/azure-jdks) or an equivalent, such as OpenJDK.
19 | * [Apache Maven](https://maven.apache.org/download.cgi) properly [installed](https://maven.apache.org/install.html) according to Apache. Maven is a project build system for Java projects.
20 | * An SSH client like PuTTY. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
21 |
22 | ## Understand the code
23 |
24 | If you're using an **Enterprise Security Package (ESP)** enabled Kafka cluster, use the application version located in the `DomainJoined-Producer-Consumer` subdirectory.
25 |
26 | The application consists primarily of five files:
27 | * `pom.xml`: This file defines the project dependencies, Java version, and packaging methods.
28 | * `Producer.java`: This file sends random sentences to Kafka using the producer API.
29 | * `Consumer.java`: This file uses the consumer API to read data from Kafka and emit it to STDOUT.
30 | * `AdminClientWrapper.java`: This file uses the admin API to create, describe, and delete Kafka topics.
31 | * `Run.java`: The command-line interface used to run the producer and consumer code.
32 |
33 | ### Pom.xml
34 |
35 | The important things to understand in the `pom.xml` file are:
36 |
37 | * Dependencies: This project relies on the Kafka producer and consumer APIs, which are provided by the `kafka-clients` package. The following XML code defines this dependency:
38 |
39 | ```xml
40 | <!-- Kafka client for producer/consumer operations -->
41 | <dependency>
42 |     <groupId>org.apache.kafka</groupId>
43 |     <artifactId>kafka-clients</artifactId>
44 |     <version>${kafka.version}</version>
45 | </dependency>
46 | ```
47 |
48 | The `${kafka.version}` entry is declared in the `<properties>..</properties>` section of `pom.xml`, and is configured to the Kafka version of the HDInsight cluster.
49 |
50 | * Plugins: Maven plugins provide various capabilities. In this project, the following plugins are used:
51 |
52 | * `maven-compiler-plugin`: Used to set the Java version used by the project to 8. This is the version of Java used by HDInsight 4.0.
53 | * `maven-shade-plugin`: Used to generate an uber jar that contains this application as well as any dependencies. It is also used to set the entry point of the application, so that you can directly run the Jar file without having to specify the main class.
54 |
55 | ### Producer.java
56 |
57 | The producer communicates with the Kafka broker hosts (worker nodes) and sends data to a Kafka topic. The following code snippet is from the [Producer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Producer.java) file from the [GitHub repository](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started) and shows how to set the producer properties.
58 |
59 | ```java
60 | Properties properties = new Properties();
61 | // Set the brokers (bootstrap servers)
62 | properties.setProperty("bootstrap.servers", brokers);
63 | // Set how to serialize key/value pairs
64 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
65 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
66 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
67 | KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
68 | ```
69 |
70 | ### Consumer.java
71 |
72 | The consumer communicates with the Kafka broker hosts (worker nodes), and reads records in a loop. The following code snippet from the [Consumer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Consumer.java) file sets the consumer properties.
73 |
74 | ```java
75 | KafkaConsumer<String, String> consumer;
76 | // Configure the consumer
77 | Properties properties = new Properties();
78 | // Point it to the brokers
79 | properties.setProperty("bootstrap.servers", brokers);
80 | // Set the consumer group (all consumers must belong to a group).
81 | properties.setProperty("group.id", groupId);
82 | // Set how to deserialize key/value pairs
83 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
84 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
85 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
86 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
87 | // with the earliest record in the stream.
88 | properties.setProperty("auto.offset.reset","earliest");
89 |
90 | consumer = new KafkaConsumer<>(properties);
91 | ```
92 |
93 | Notice the important property added for ESP clusters: `properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");`. It is critical to set this in the `AdminClient`, `Producer`, and `Consumer` configurations.
94 | In this code, the consumer is configured to read from the start of the topic (`auto.offset.reset` is set to `earliest`.)
95 |
96 | ### Run.java
97 |
98 | The [Run.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Run.java) file provides a command-line interface that runs either the producer or consumer code. You must provide the Kafka broker host information as a parameter. You can optionally include a group ID value, which is used by the consumer process. If you create multiple consumer instances using the same group ID, they'll load balance reading from the topic.
99 |
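As a quick reference, the consumer can be invoked like this (using the jar name and `myTest` topic from the walkthrough below); the trailing group ID argument is optional:

```bash
# Argument order: <command> <topic> <brokers> [groupid]
java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar consumer myTest $KAFKABROKERS myGroup
```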
100 | ## Use Pre-built JAR files
101 |
102 | Download the jars from the [Kafka Get Started Azure sample](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/Prebuilt-Jars). If your cluster is **Enterprise Security Package (ESP)** enabled, use `kafka-producer-consumer-esp.jar`. Use the command below to copy the jars to your cluster.
103 |
104 | ```cmd
105 | scp kafka-producer-consumer-esp.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
106 | ```
107 |
108 | ## Build the JAR files from code
109 |
110 |
111 | If you would like to skip this step, prebuilt jars can be downloaded from the `Prebuilt-Jars` subdirectory. Download `kafka-producer-consumer.jar`. If your cluster is **Enterprise Security Package (ESP)** enabled, use `kafka-producer-consumer-esp.jar`. Execute step 3 to copy the jar to your HDInsight cluster.
112 |
113 | 1. Download and extract the examples from [https://github.com/Azure-Samples/hdinsight-kafka-java-get-started](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started).
114 |
115 | 2. If you are using **Enterprise Security Package (ESP)** enabled Kafka cluster, you should set the location to `DomainJoined-Producer-Consumer` subdirectory. Use the following command to build the application:
116 |
117 | ```cmd
118 | mvn clean package
119 | ```
120 |
121 | This command creates a directory named `target`, which contains a file named `kafka-producer-consumer-1.0-SNAPSHOT.jar`. For ESP clusters, the file will be `kafka-producer-consumer-esp-1.0-SNAPSHOT.jar`.
122 |
123 | 3. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Enter the following command to copy the `kafka-producer-consumer-*.jar` file to your HDInsight cluster. When prompted enter the password for the SSH user.
124 |
125 | ```cmd
126 | scp ./target/kafka-producer-consumer*.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
127 | ```
128 |
129 | ## Run the example
131 |
132 | 1. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Open an SSH connection to the cluster, by entering the following command. If prompted, enter the password for the SSH user account.
133 |
134 | ```cmd
135 | ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
136 | ```
137 |
138 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal. Replace `<password>` with the cluster login password, then execute:
139 |
140 | ```bash
141 | sudo apt -y install jq
142 | export clusterName='<clustername>'
143 | export password='<password>'
144 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
145 | ```
146 |
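To confirm the variable was populated, print it:

```bash
# Expect a comma-separated list of broker host:port pairs on port 9092
echo $KAFKABROKERS
```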
147 | > **Note**
148 | > The `curl` command above requires Ambari access. If your cluster is behind an NSG, run it from a machine that can access Ambari.
149 | 1. Create a Kafka topic, `myTest`, by entering the following command:
150 |
151 | ```bash
152 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar create myTest $KAFKABROKERS
153 | ```
154 |
155 | 1. To run the producer and write data to the topic, use the following command:
156 |
157 | ```bash
158 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar producer myTest $KAFKABROKERS
159 | ```
160 |
161 | 1. Once the producer has finished, use the following command to read from the topic:
162 |
163 | ```bash
164 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar consumer myTest $KAFKABROKERS
166 | ```
167 |
168 | The records read, along with a count of records, are displayed.
169 |
170 | 1. Use __Ctrl + C__ to exit the consumer.
171 |
172 | ### Run the Example with another User (espkafkauser)
173 |
174 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal. Replace `<password>` with the cluster login password, then execute:
175 |
176 | ```bash
177 | sudo apt -y install jq
178 | export clusterName='<clustername>'
179 | export password='<password>'
180 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
181 | ```
182 | 2. Create the keytab file for `espkafkauser` with the following steps:
183 | ```bash
184 | ktutil
185 | ktutil: addent -password -p espkafkauser@TEST.COM -k 1 -e RC4-HMAC
186 | Password for espkafkauser@TEST.COM:
187 | ktutil: wkt espkafkauser.keytab
188 | ktutil: q
189 | ```
190 |
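To verify the keytab was written correctly, list its entries with `klist` (part of the Kerberos client tools):

```bash
# Each entry should show the principal espkafkauser@TEST.COM
klist -kt espkafkauser.keytab
```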
191 | **NOTE:**
192 | 1. `espkafkauser` should be part of your domain group; add it in the Ranger UI to grant CRUD operation privileges.
193 | 2. Keep the domain name (`TEST.COM`) in capital letters only. Otherwise, Kerberos will throw errors during CRUD operations.
194 |
195 | You will now have an `espkafkauser.keytab` file in the local directory. Next, create an `espkafkauser_jaas.conf` JAAS config file with the content given below:
196 |
197 | ```
198 | KafkaClient {
199 | com.sun.security.auth.module.Krb5LoginModule required
200 | useKeyTab=true
201 | storeKey=true
202 | keyTab="/home/sshuser/espkafkauser.keytab"
203 | useTicketCache=false
204 | serviceName="kafka"
205 | principal="espkafkauser@TEST.COM";
206 | };
207 | ```
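Optionally, sanity-check the keytab and principal by obtaining a Kerberos ticket before running the clients:

```bash
# Acquire a ticket using the keytab, then confirm it is cached
kinit -kt espkafkauser.keytab espkafkauser@TEST.COM
klist
```
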
208 | ### Steps to add espkafkauser on the Ranger UI
209 | 1. Go to the overview page of the cluster and use the Ambari UI URL to open Ranger. Sign in with the Ambari UI credentials.
210 |
211 | 
212 | ```
213 | Generic
214 | https://<clustername>.azurehdinsight.net/ranger
215 |
216 | Example
217 | https://espkafka.azurehdinsight.net/ranger
218 | ```
219 |
220 | 2. If everything is correct, you will see the Ranger dashboard. Click the Kafka link.
221 |
222 | 
223 |
224 |
225 | 3. The policy page shows which users, such as `kafka`, have access to perform CRUD operations on all topics.
226 |
227 | 
228 |
229 |
230 | 4. Edit the all-topic policy and add `espkafkauser` in the Select User dropdown. Click **Save** after making the changes.
231 |
232 | 
233 |
234 | 
235 |
236 |
237 | 5. If you cannot see your user in the dropdown, the user is not available in the AAD domain.
238 |
239 | 6. Execute CRUD operations on the head node for verification:
240 |
241 | ```bash
242 | # Sample command
243 | java -jar -Djava.security.auth.login.config=JAAS_CONFIG_FILE_PATH PRODUCER_CONSUMER_ESP_JAR_PATH create $TOPICNAME $KAFKABROKERS
244 |
245 | # Create
246 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar create $TOPICNAME $KAFKABROKERS
247 |
248 | # Describe
249 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar describe $TOPICNAME $KAFKABROKERS
250 |
251 | # Produce
252 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar producer $TOPICNAME $KAFKABROKERS
253 |
254 | # Consume
255 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer $TOPICNAME $KAFKABROKERS
256 |
257 | # Delete
258 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar delete $TOPICNAME $KAFKABROKERS
259 | ```
260 |
261 |
262 | ### Multiple consumers
263 |
264 | Kafka consumers use a consumer group when reading records. Using the same group with multiple consumers results in load balanced reads from a topic. Each consumer in the group receives a portion of the records.
265 |
266 | The consumer application accepts a parameter that is used as the group ID. For example, the following command starts a consumer using a group ID of `myGroup`:
267 |
268 | ```bash
269 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup
270 | ```
271 |
272 | Use __Ctrl + C__ to exit the consumer.
273 |
274 | To see this process in action, use the following command:
275 |
276 | With `kafka` as the user:
277 | ```bash
278 | tmux new-session 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
279 | \; split-window -h 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
280 | \; attach
281 | ```
282 |
283 | With a custom user:
284 | ```bash
285 | tmux new-session 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
286 | \; split-window -h 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
287 | \; attach
288 | ```
289 |
290 | This command uses `tmux` to split the terminal into two columns. A consumer is started in each column, with the same group ID value. Once the consumers finish reading, notice that each read only a portion of the records. Use __Ctrl + C__ twice to exit `tmux`.
291 |
292 | Consumption by clients within the same group is handled through the partitions for the topic. In this code sample, the `myTest` topic created earlier has eight partitions. If you start eight consumers, each consumer reads records from a single partition for the topic.
293 |
294 | > [!IMPORTANT]
295 | > There cannot be more consumer instances in a consumer group than partitions. In this example, one consumer group can contain up to eight consumers since that is the number of partitions in the topic. Or you can have multiple consumer groups, each with no more than eight consumers.
296 |
297 | Records in Kafka are stored within a partition in the order they're received. To achieve in-order delivery for records *within a partition*, create a consumer group where the number of consumer instances matches the number of partitions. To achieve in-order delivery for records *within the topic*, create a consumer group with only one consumer instance.
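
To observe the assignment directly, a small diagnostic consumer can print which partitions it owns after joining the group. This is a sketch, not part of this repository: the `ShowAssignment` class name is hypothetical, `myGroup` and `test` are the example values used above, and an ESP cluster would additionally need the security protocol and JAAS settings shown in the samples.

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ShowAssignment {
    public static void main(String[] args) {
        Properties props = new Properties();
        // args[0] carries the broker list, matching the other samples' usage
        props.setProperty("bootstrap.servers", args[0]);
        props.setProperty("group.id", "myGroup");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            // poll() joins the group and triggers partition assignment
            consumer.poll(Duration.ofSeconds(5));
            // With eight partitions and N group members, each member owns roughly 8/N partitions
            System.out.println("Assigned partitions: " + consumer.assignment());
        }
    }
}
```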
298 |
299 | ## Common issues
300 |
301 | 1. Topic creation fails
302 |
303 | If your cluster is Enterprise Security Pack enabled, use the [pre-built JAR files for producer and consumer](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/Prebuilt-Jars/kafka-producer-consumer-esp.jar).
304 |
305 | The ESP jar can be built from the code in the [`DomainJoined-Producer-Consumer` subdirectory](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/DomainJoined-Producer-Consumer). Note that the producer and consumer properties have an additional property, `CommonClientConfigs.SECURITY_PROTOCOL_CONFIG`, for ESP-enabled clusters.
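
For reference, this is the line in the samples' client configuration that the ESP build adds (from `getProperties` in `AdminClientWrapper`, and likewise in `Producer` and `Consumer`):

```java
// Domain-joined (ESP) clusters authenticate over SASL; the non-ESP build leaves this unset.
properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
```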
308 |
309 |
310 | 2. Produce or consume operations fail on an ESP-enabled cluster
311 |
312 | If produce and consume operations fail and you are using an ESP-enabled cluster, check that the user `kafka` is present in all Ranger policies. If it is missing, add it to all Ranger policies.
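
You can check this from the command line as well. The following is a sketch using Ranger's public REST API; the gateway path `/ranger/` and the `admin` credential are assumptions about a typical HDInsight ESP cluster, so adjust them for your environment:

```bash
# List each Ranger policy with the users attached to it, then look for 'kafka'
curl -sS -u admin:$PASSWORD \
  "https://$CLUSTERNAME.azurehdinsight.net/ranger/service/public/v2/api/policy" \
  | jq '.[] | {name: .name, users: [.policyItems[]?.users[]?]}'
```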
313 |
314 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Add_User.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Add_User.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Azure_Portal_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Azure_Portal_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Edit_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Edit_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Kafk_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Kafk_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Ranger_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Ranger_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/pom.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3 |   <modelVersion>4.0.0</modelVersion>
4 |   <groupId>com.microsoft.example</groupId>
5 |   <artifactId>kafka-producer-consumer-esp</artifactId>
6 |   <packaging>jar</packaging>
7 |   <version>1.0-SNAPSHOT</version>
8 |   <name>kafka-producer-consumer</name>
9 |   <url>http://maven.apache.org</url>
10 |
11 |   <properties>
12 |     <kafka.version>2.1.1</kafka.version>
13 |   </properties>
14 |
15 |   <dependencies>
16 |     <dependency>
17 |       <groupId>org.apache.kafka</groupId>
18 |       <artifactId>kafka-clients</artifactId>
19 |       <version>${kafka.version}</version>
20 |     </dependency>
21 |   </dependencies>
22 |
23 |   <build>
24 |     <plugins>
25 |       <plugin>
26 |         <groupId>org.apache.maven.plugins</groupId>
27 |         <artifactId>maven-compiler-plugin</artifactId>
28 |         <version>3.3</version>
29 |         <configuration>
30 |           <source>1.8</source>
31 |           <target>1.8</target>
32 |         </configuration>
33 |       </plugin>
34 |       <plugin>
35 |         <groupId>org.apache.maven.plugins</groupId>
36 |         <artifactId>maven-shade-plugin</artifactId>
37 |         <version>2.3</version>
38 |         <configuration>
39 |           <transformers>
40 |             <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
41 |               <mainClass>com.microsoft.example.Run</mainClass>
42 |             </transformer>
43 |           </transformers>
44 |           <filters>
45 |             <filter>
46 |               <artifact>*:*</artifact>
47 |               <excludes>
48 |                 <exclude>META-INF/*.SF</exclude>
49 |                 <exclude>META-INF/*.DSA</exclude>
50 |                 <exclude>META-INF/*.RSA</exclude>
51 |               </excludes>
52 |             </filter>
53 |           </filters>
54 |         </configuration>
55 |         <executions>
56 |           <execution>
57 |             <phase>package</phase>
58 |             <goals>
59 |               <goal>shade</goal>
60 |             </goals>
61 |           </execution>
62 |         </executions>
63 |       </plugin>
64 |     </plugins>
65 |   </build>
66 | </project>
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/AdminClientWrapper.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.producer.ProducerConfig;
4 | import org.apache.kafka.clients.admin.AdminClient;
5 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
6 | import org.apache.kafka.clients.admin.CreateTopicsResult;
7 | import org.apache.kafka.clients.admin.DeleteTopicsResult;
8 | import org.apache.kafka.clients.admin.TopicDescription;
9 | import org.apache.kafka.clients.admin.NewTopic;
10 |
11 | import org.apache.kafka.clients.admin.KafkaAdminClient;
12 | import org.apache.kafka.clients.CommonClientConfigs;
13 |
14 |
15 | import java.util.Collection;
16 | import java.util.Collections;
17 | import java.util.concurrent.ExecutionException;
18 | import java.util.Properties;
19 | import java.util.Random;
20 | import java.io.IOException;
21 |
22 |
23 | public class AdminClientWrapper {
24 |
25 | public static Properties getProperties(String brokers) {
26 | Properties properties = new Properties();
27 | properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
28 |
29 | // Set how to serialize key/value pairs
30 | properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
31 | properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
32 | // specify the protocol for Domain Joined clusters
33 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
34 |
35 | return properties;
36 | }
37 |
38 | public static void describeTopics(String brokers, String topicName) throws IOException {
39 | // Set properties used to configure admin client
40 | Properties properties = getProperties(brokers);
41 |
42 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
43 | // Make async call to describe the topic.
44 | final DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(Collections.singleton(topicName));
45 |
46 | TopicDescription description = describeTopicsResult.values().get(topicName).get();
47 | System.out.print(description.toString());
48 | } catch (Exception e) {
49 | System.out.print("Describe denied\n");
50 | System.out.print(e.getMessage());
51 | //throw new RuntimeException(e.getMessage(), e);
52 | }
53 | }
54 |
55 | public static void deleteTopics(String brokers, String topicName) throws IOException {
56 | // Set properties used to configure admin client
57 | Properties properties = getProperties(brokers);
58 |
59 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
60 | final DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Collections.singleton(topicName));
61 | deleteTopicsResult.values().get(topicName).get();
62 | System.out.print("Topic " + topicName + " deleted");
63 | } catch (Exception e) {
64 | System.out.print("Delete Topics denied\n");
65 | System.out.print(e.getMessage());
66 | //throw new RuntimeException(e.getMessage(), e);
67 | }
68 | }
69 |
70 | public static void createTopics(String brokers, String topicName) throws IOException {
71 | // Set properties used to configure admin client
72 | Properties properties = getProperties(brokers);
73 |
74 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
75 | int numPartitions = 8;
76 | short replicationFactor = (short)3;
77 | final NewTopic newTopic = new NewTopic(topicName, numPartitions, replicationFactor);
78 |
79 | final CreateTopicsResult createTopicsResult = adminClient.createTopics(Collections.singleton(newTopic));
80 | createTopicsResult.values().get(topicName).get();
81 | System.out.print("Topic " + topicName + " created");
82 | } catch (Exception e) {
83 | System.out.print("Create Topics denied\n");
84 | System.out.print(e.getMessage());
85 | //throw new RuntimeException(e.getMessage(), e);
86 | }
87 | }
88 | }
89 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Consumer.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.consumer.KafkaConsumer;
4 | import org.apache.kafka.clients.consumer.ConsumerRecords;
5 | import org.apache.kafka.clients.consumer.ConsumerRecord;
6 | import org.apache.kafka.clients.CommonClientConfigs;
7 | import java.util.Properties;
8 | import java.util.Arrays;
9 |
10 | public class Consumer {
11 | public static int consume(String brokers, String groupId, String topicName) {
12 | // Create a consumer
13 | KafkaConsumer<String, String> consumer;
14 | // Configure the consumer
15 | Properties properties = new Properties();
16 | // Point it to the brokers
17 | properties.setProperty("bootstrap.servers", brokers);
18 | // Set the consumer group (all consumers must belong to a group).
19 | properties.setProperty("group.id", groupId);
20 | // Set how to serialize key/value pairs
21 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
22 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
23 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
24 | // with the earliest record in the stream.
25 | properties.setProperty("auto.offset.reset","earliest");
26 |
27 | // specify the protocol for Domain Joined clusters
28 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
29 |
30 | consumer = new KafkaConsumer<>(properties);
31 |
32 | // Subscribe to the 'test' topic
33 | consumer.subscribe(Arrays.asList(topicName));
34 |
35 | // Loop until ctrl + c
36 | int count = 0;
37 | while(true) {
38 | // Poll for records
39 | ConsumerRecords<String, String> records = consumer.poll(200);
40 | // Did we get any?
41 | if (records.count() == 0) {
42 | // timeout/nothing to read
43 | } else {
44 | // Yes, loop over records
45 | for(ConsumerRecord<String, String> record: records) {
46 | // Display record and count
47 | count += 1;
48 | System.out.println( count + ": " + record.value());
49 | }
50 | }
51 | }
52 | }
53 | }
54 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Producer.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.producer.KafkaProducer;
4 | import org.apache.kafka.clients.producer.ProducerRecord;
5 | import org.apache.kafka.clients.producer.ProducerConfig;
6 | import org.apache.kafka.clients.admin.AdminClient;
7 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
8 | import org.apache.kafka.clients.admin.KafkaAdminClient;
9 | import org.apache.kafka.clients.CommonClientConfigs;
10 | import org.apache.kafka.clients.admin.TopicDescription;
11 |
12 | import java.util.Collection;
13 | import java.util.Collections;
14 | import java.util.concurrent.ExecutionException;
15 | import java.util.Properties;
16 | import java.util.Random;
17 | import java.io.IOException;
18 |
19 | public class Producer
20 | {
21 | public static void produce(String brokers, String topicName) throws IOException
22 | {
23 |
24 | // Set properties used to configure the producer
25 | Properties properties = new Properties();
26 | // Set the brokers (bootstrap servers)
27 | properties.setProperty("bootstrap.servers", brokers);
28 | // Set how to serialize key/value pairs
29 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
30 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
31 | // specify the protocol for Domain Joined clusters
32 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
33 |
34 | KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
35 |
36 | // So we can generate random sentences
37 | Random random = new Random();
38 | String[] sentences = new String[] {
39 | "the cow jumped over the moon",
40 | "an apple a day keeps the doctor away",
41 | "four score and seven years ago",
42 | "snow white and the seven dwarfs",
43 | "i am at two with nature"
44 | };
45 |
46 | String progressAnimation = "|/-\\";
47 | // Produce a bunch of records
48 | for(int i = 0; i < 100; i++) {
49 | // Pick a sentence at random
50 | String sentence = sentences[random.nextInt(sentences.length)];
51 | // Send the sentence to the test topic
52 | try
53 | {
54 | producer.send(new ProducerRecord<String, String>(topicName, sentence)).get();
55 | }
56 | catch (Exception ex)
57 | {
58 | System.out.print(ex.getMessage());
59 | throw new IOException(ex.toString());
60 | }
61 | String progressBar = "\r" + progressAnimation.charAt(i % progressAnimation.length()) + " " + i;
62 | System.out.write(progressBar.getBytes());
63 | }
64 | }
65 | }
66 |
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Run.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import java.io.IOException;
4 | import java.util.UUID;
5 | import java.io.PrintWriter;
6 | import java.io.File;
7 | import java.lang.Exception;
8 |
9 | // Handle starting producer or consumer
10 | public class Run {
11 | public static void main(String[] args) throws IOException {
12 | if(args.length < 3) {
13 | usage();
14 | }
15 | // Get the brokers
16 | String brokers = args[2];
17 | String topicName = args[1];
18 | switch(args[0].toLowerCase()) {
19 | case "producer":
20 | Producer.produce(brokers, topicName);
21 | break;
22 | case "consumer":
23 | // Either a groupId was passed in, or we need a random one
24 | String groupId;
25 | if(args.length == 4) {
26 | groupId = args[3];
27 | } else {
28 | groupId = UUID.randomUUID().toString();
29 | }
30 | Consumer.consume(brokers, groupId, topicName);
31 | break;
32 | case "describe":
33 | AdminClientWrapper.describeTopics(brokers, topicName);
34 | break;
35 | case "create":
36 | AdminClientWrapper.createTopics(brokers, topicName);
37 | break;
38 | case "delete":
39 | AdminClientWrapper.deleteTopics(brokers, topicName);
40 | break;
41 | default:
42 | usage();
43 | }
44 | System.exit(0);
45 | }
46 | // Display usage
47 | public static void usage() {
48 | System.out.println("Usage:");
49 | System.out.println("kafka-example.jar brokerhosts [groupid]");
50 | System.exit(1);
51 | }
52 | }
53 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2015 Microsoft Corporation
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/Prebuilt-Jars/kafka-producer-consumer-esp.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/Prebuilt-Jars/kafka-producer-consumer-esp.jar
--------------------------------------------------------------------------------
/Prebuilt-Jars/kafka-producer-consumer-tls-esp.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/Prebuilt-Jars/kafka-producer-consumer-tls-esp.jar
--------------------------------------------------------------------------------
/Prebuilt-Jars/kafka-producer-consumer.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/Prebuilt-Jars/kafka-producer-consumer.jar
--------------------------------------------------------------------------------
/Producer-Consumer/.gitignore:
--------------------------------------------------------------------------------
1 | target/
2 | pom.xml.tag
3 | pom.xml.releaseBackup
4 | pom.xml.versionsBackup
5 | pom.xml.next
6 | release.properties
7 | dependency-reduced-pom.xml
8 | buildNumber.properties
9 | .mvn/timing.properties
10 | .idea/
11 | *.log
12 | .classpath
13 | .project
14 | .settings/
15 | *.iml
--------------------------------------------------------------------------------
/Producer-Consumer/pom.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3 |   <modelVersion>4.0.0</modelVersion>
4 |   <groupId>com.microsoft.example</groupId>
5 |   <artifactId>kafka-producer-consumer</artifactId>
6 |   <packaging>jar</packaging>
7 |   <version>1.0-SNAPSHOT</version>
8 |   <name>kafka-producer-consumer</name>
9 |   <url>http://maven.apache.org</url>
10 |
11 |   <properties>
12 |     <kafka.version>2.1.1</kafka.version>
13 |   </properties>
14 |
15 |   <dependencies>
16 |     <dependency>
17 |       <groupId>org.apache.kafka</groupId>
18 |       <artifactId>kafka-clients</artifactId>
19 |       <version>${kafka.version}</version>
20 |     </dependency>
21 |   </dependencies>
22 |
23 |   <build>
24 |     <plugins>
25 |       <plugin>
26 |         <groupId>org.apache.maven.plugins</groupId>
27 |         <artifactId>maven-compiler-plugin</artifactId>
28 |         <version>3.3</version>
29 |         <configuration>
30 |           <source>1.8</source>
31 |           <target>1.8</target>
32 |         </configuration>
33 |       </plugin>
34 |       <plugin>
35 |         <groupId>org.apache.maven.plugins</groupId>
36 |         <artifactId>maven-shade-plugin</artifactId>
37 |         <version>2.3</version>
38 |         <configuration>
39 |           <transformers>
40 |             <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
41 |               <mainClass>com.microsoft.example.Run</mainClass>
42 |             </transformer>
43 |           </transformers>
44 |           <filters>
45 |             <filter>
46 |               <artifact>*:*</artifact>
47 |               <excludes>
48 |                 <exclude>META-INF/*.SF</exclude>
49 |                 <exclude>META-INF/*.DSA</exclude>
50 |                 <exclude>META-INF/*.RSA</exclude>
51 |               </excludes>
52 |             </filter>
53 |           </filters>
54 |         </configuration>
55 |         <executions>
56 |           <execution>
57 |             <phase>package</phase>
58 |             <goals>
59 |               <goal>shade</goal>
60 |             </goals>
61 |           </execution>
62 |         </executions>
63 |       </plugin>
64 |     </plugins>
65 |   </build>
66 | </project>
--------------------------------------------------------------------------------
/Producer-Consumer/src/main/java/com/microsoft/example/AdminClientWrapper.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.producer.ProducerConfig;
4 | import org.apache.kafka.clients.admin.AdminClient;
5 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
6 | import org.apache.kafka.clients.admin.CreateTopicsResult;
7 | import org.apache.kafka.clients.admin.DeleteTopicsResult;
8 | import org.apache.kafka.clients.admin.TopicDescription;
9 | import org.apache.kafka.clients.admin.NewTopic;
10 |
11 | import org.apache.kafka.clients.admin.KafkaAdminClient;
12 | import org.apache.kafka.clients.CommonClientConfigs;
13 |
14 |
15 | import java.util.Collection;
16 | import java.util.Collections;
17 | import java.util.concurrent.ExecutionException;
18 | import java.util.Properties;
19 | import java.util.Random;
20 | import java.io.IOException;
21 |
22 |
23 | public class AdminClientWrapper {
24 |
25 | public static Properties getProperties(String brokers) {
26 | Properties properties = new Properties();
27 | properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
28 |
29 | // Set how to serialize key/value pairs
30 | properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
31 | properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
32 | // specify the protocol for Domain Joined clusters
33 | //properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
34 |
35 | return properties;
36 | }
37 |
38 | public static void describeTopics(String brokers, String topicName) throws IOException {
39 | // Set properties used to configure admin client
40 | Properties properties = getProperties(brokers);
41 |
42 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
43 | // Make async call to describe the topic.
44 | final DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(Collections.singleton(topicName));
45 |
46 | TopicDescription description = describeTopicsResult.values().get(topicName).get();
47 | System.out.print(description.toString());
48 | } catch (Exception e) {
49 | System.out.print("Describe denied\n");
50 | System.out.print(e.getMessage());
51 | //throw new RuntimeException(e.getMessage(), e);
52 | }
53 | }
54 |
55 | public static void deleteTopics(String brokers, String topicName) throws IOException {
56 | // Set properties used to configure admin client
57 | Properties properties = getProperties(brokers);
58 |
59 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
60 | final DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Collections.singleton(topicName));
61 | deleteTopicsResult.values().get(topicName).get();
62 | System.out.print("Topic " + topicName + " deleted");
63 | } catch (Exception e) {
64 | System.out.print("Delete Topics denied\n");
65 | System.out.print(e.getMessage());
66 | //throw new RuntimeException(e.getMessage(), e);
67 | }
68 | }
69 |
70 | public static void createTopics(String brokers, String topicName) throws IOException {
71 | // Set properties used to configure admin client
72 | Properties properties = getProperties(brokers);
73 |
74 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) {
75 | int numPartitions = 8;
76 | short replicationFactor = (short)3;
77 | final NewTopic newTopic = new NewTopic(topicName, numPartitions, replicationFactor);
78 |
79 | final CreateTopicsResult createTopicsResult = adminClient.createTopics(Collections.singleton(newTopic));
80 | createTopicsResult.values().get(topicName).get();
81 | System.out.print("Topic " + topicName + " created");
82 | } catch (Exception e) {
83 | System.out.print("Create Topics denied\n");
84 | System.out.print(e.getMessage());
85 | //throw new RuntimeException(e.getMessage(), e);
86 | }
87 | }
88 | }
89 |
--------------------------------------------------------------------------------
/Producer-Consumer/src/main/java/com/microsoft/example/Consumer.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.consumer.KafkaConsumer;
4 | import org.apache.kafka.clients.consumer.ConsumerRecords;
5 | import org.apache.kafka.clients.consumer.ConsumerRecord;
6 | import java.util.Properties;
7 | import java.util.Arrays;
8 |
9 | public class Consumer {
10 | public static int consume(String brokers, String groupId, String topicName) {
11 | // Create a consumer
12 | KafkaConsumer<String, String> consumer;
13 | // Configure the consumer
14 | Properties properties = new Properties();
15 | // Point it to the brokers
16 | properties.setProperty("bootstrap.servers", brokers);
17 | // Set the consumer group (all consumers must belong to a group).
18 | properties.setProperty("group.id", groupId);
19 | // Set how to serialize key/value pairs
20 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
21 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
22 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
23 | // with the earliest record in the stream.
24 | properties.setProperty("auto.offset.reset","earliest");
25 |
26 | // specify the protocol for Domain Joined clusters
27 | //properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
28 |
29 | consumer = new KafkaConsumer<>(properties);
30 |
31 | // Subscribe to the 'test' topic
32 | consumer.subscribe(Arrays.asList(topicName));
33 |
34 | // Loop until ctrl + c
35 | int count = 0;
36 | while(true) {
37 | // Poll for records
38 | ConsumerRecords<String, String> records = consumer.poll(200);
39 | // Did we get any?
40 | if (records.count() == 0) {
41 | // timeout/nothing to read
42 | } else {
43 | // Yes, loop over records
44 | for(ConsumerRecord<String, String> record: records) {
45 | // Display record and count
46 | count += 1;
47 | System.out.println( count + ": " + record.value());
48 | }
49 | }
50 | }
51 | }
52 | }
53 |
--------------------------------------------------------------------------------
/Producer-Consumer/src/main/java/com/microsoft/example/Producer.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.clients.producer.KafkaProducer;
4 | import org.apache.kafka.clients.producer.ProducerRecord;
5 | import org.apache.kafka.clients.producer.ProducerConfig;
6 | import org.apache.kafka.clients.admin.AdminClient;
7 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
8 | import org.apache.kafka.clients.admin.KafkaAdminClient;
9 | import org.apache.kafka.clients.CommonClientConfigs;
10 | import org.apache.kafka.clients.admin.TopicDescription;
11 |
12 | import java.util.Collection;
13 | import java.util.Collections;
14 | import java.util.concurrent.ExecutionException;
15 | import java.util.Properties;
16 | import java.util.Random;
17 | import java.io.IOException;
18 |
19 | public class Producer
20 | {
21 | public static void produce(String brokers, String topicName) throws IOException
22 | {
23 |
24 | // Set properties used to configure the producer
25 | Properties properties = new Properties();
26 | // Set the brokers (bootstrap servers)
27 | properties.setProperty("bootstrap.servers", brokers);
28 | // Set how to serialize key/value pairs
29 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
30 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
31 | // specify the protocol for Domain Joined clusters
32 | //properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
33 |
34 | KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
35 |
36 | // So we can generate random sentences
37 | Random random = new Random();
38 | String[] sentences = new String[] {
39 | "the cow jumped over the moon",
40 | "an apple a day keeps the doctor away",
41 | "four score and seven years ago",
42 | "snow white and the seven dwarfs",
43 | "i am at two with nature"
44 | };
45 |
46 | String progressAnimation = "|/-\\";
47 | // Produce a bunch of records
48 | for(int i = 0; i < 100; i++) {
49 | // Pick a sentence at random
50 | String sentence = sentences[random.nextInt(sentences.length)];
51 | // Send the sentence to the test topic
52 | try
53 | {
54 | producer.send(new ProducerRecord<String, String>(topicName, sentence)).get();
55 | }
56 | catch (Exception ex)
57 | {
58 | System.out.print(ex.getMessage());
59 | throw new IOException(ex.toString());
60 | }
61 | String progressBar = "\r" + progressAnimation.charAt(i % progressAnimation.length()) + " " + i;
62 | System.out.write(progressBar.getBytes());
63 | }
64 | }
65 | }
66 |
--------------------------------------------------------------------------------
/Producer-Consumer/src/main/java/com/microsoft/example/Run.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import java.io.IOException;
4 | import java.util.UUID;
5 | import java.io.PrintWriter;
6 | import java.io.File;
7 | import java.lang.Exception;
8 |
9 | // Handle starting producer or consumer
10 | public class Run {
11 | public static void main(String[] args) throws IOException {
12 | if(args.length < 3) {
13 | usage();
14 | }
15 | // Get the brokers
16 | String brokers = args[2];
17 | String topicName = args[1];
18 | switch(args[0].toLowerCase()) {
19 | case "producer":
20 | Producer.produce(brokers, topicName);
21 | break;
22 | case "consumer":
23 | // Either a groupId was passed in, or we need a random one
24 | String groupId;
25 | if(args.length == 4) {
26 | groupId = args[3];
27 | } else {
28 | groupId = UUID.randomUUID().toString();
29 | }
30 | Consumer.consume(brokers, groupId, topicName);
31 | break;
32 | case "describe":
33 | AdminClientWrapper.describeTopics(brokers, topicName);
34 | break;
35 | case "create":
36 | AdminClientWrapper.createTopics(brokers, topicName);
37 | break;
38 | case "delete":
39 | AdminClientWrapper.deleteTopics(brokers, topicName);
40 | break;
41 | default:
42 | usage();
43 | }
44 | System.exit(0);
45 | }
46 | // Display usage
47 | public static void usage() {
48 | System.out.println("Usage:");
49 | System.out.println("kafka-example.jar brokerhosts [groupid]");
50 | System.exit(1);
51 | }
52 | }
53 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ---
2 | page_type: sample
3 | languages:
4 | - java
5 | products:
6 | - azure
7 | - azure-hdinsight
8 | description: "The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kafka on HDInsight cluster."
9 | urlFragment: hdinsight-kafka-java-get-started
10 | ---
11 |
12 | # Java-based example of using the Kafka Consumer, Producer, and Streaming APIs
13 |
14 | The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kafka on HDInsight cluster.
15 |
16 | There are two projects included in this repository:
17 |
18 | * Producer-Consumer: This contains a producer and consumer that use a Kafka topic named `test`.
19 |
20 | * Streaming: This contains an application that uses the Kafka streaming API (in Kafka 0.10.0 or higher) that reads data from the `test` topic, splits the data into words, and writes a count of words into the `wordcounts` topic.
21 |
22 | NOTE: Both projects assume Kafka 0.10.0, which is available with Kafka on HDInsight cluster version 3.6.
23 |
24 | ## Producer and Consumer
25 |
26 | To run the consumer and producer example, use the following steps:
27 |
28 | 1. Fork/Clone the repository to your development environment.
29 |
30 | 2. Install Java JDK 8 or higher. This was tested with Oracle Java 8, but should work with OpenJDK as well.
31 |
32 | 3. Install [Maven](http://maven.apache.org/).
33 |
34 | 4. Assuming Java and Maven are both on the path, and JAVA_HOME is configured correctly, use the following commands to build the consumer and producer example:
35 |
36 | cd Producer-Consumer
37 | mvn clean package
38 |
39 | A file named `kafka-producer-consumer-1.0-SNAPSHOT.jar` is now available in the `target` directory.
40 |
41 | 5. Use SCP to upload the file to the Kafka cluster:
42 |
43 | scp ./target/kafka-producer-consumer-1.0-SNAPSHOT.jar SSHUSER@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
44 |
45 | Replace **SSHUSER** with the SSH user for your cluster, and replace **CLUSTERNAME** with the name of your cluster. When prompted enter the password for the SSH user.
46 |
47 | 6. Use SSH to connect to the cluster:
48 |
49 | ssh SSHUSER@CLUSTERNAME-ssh.azurehdinsight.net
50 |
51 | 7. Use the following commands in the SSH session to get the Zookeeper hosts and Kafka brokers for the cluster. You need this information when working with Kafka. Note that `jq` is installed first, as it makes it easier to parse the JSON returned from Ambari. Replace __PASSWORD__ with the login (admin) password for the cluster. Replace __CLUSTERNAME__ with the name of the Kafka on HDInsight cluster.
52 |
53 | sudo apt -y install jq
54 | export KAFKAZKHOSTS=`curl -sS -u admin:$PASSWORD -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER | jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' | cut -d',' -f1,2`
55 |
56 | export KAFKABROKERS=`curl -sS -u admin:$PASSWORD -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2`
57 |
58 | 8. Use the following to verify that the environment variables have been correctly populated:
59 |
60 | echo '$KAFKAZKHOSTS='$KAFKAZKHOSTS
61 | echo '$KAFKABROKERS='$KAFKABROKERS
62 |
63 | The following is an example of the contents of `$KAFKAZKHOSTS`:
64 |
65 | zk0-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181,zk2-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181
66 |
67 | The following is an example of the contents of `$KAFKABROKERS`:
68 |
69 | wn1-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092,wn0-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092
70 |
71 | NOTE: This information may change as you perform scaling operations on the cluster, as this adds and removes worker nodes. You should always retrieve the Zookeeper and Broker information before working with Kafka.
72 |
73 | IMPORTANT: You don't have to provide all broker or Zookeeper nodes. A connection to one broker or Zookeeper node can be used to learn about the others. In this example, the list of hosts is trimmed to two entries.
74 |
75 | 9. This example uses a topic named `test`. Use the following to create this topic:
76 |
77 | /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 8 --topic test --zookeeper $KAFKAZKHOSTS
78 |
79 | 10. Use the producer-consumer example to write records to the topic:
80 |
81 | java -jar kafka-producer-consumer.jar producer test $KAFKABROKERS
82 |
83 | A counter displays how many records have been written.
84 |
85 | 11. Use the producer-consumer to read the records that were just written:
86 |
87 | java -jar kafka-producer-consumer.jar consumer test $KAFKABROKERS
88 |
89 | This returns a list of the random sentences, along with a count of how many are read.
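
    The consumer output is similar to the following; the counter is printed by the consumer itself (see `Consumer.java`), and the order of the sentences varies between runs:

        1: an apple a day keeps the doctor away
        2: the cow jumped over the moon
        3: four score and seven years ago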
90 |
91 | ## Streaming
92 |
93 | NOTE: The streaming example expects that you have already set up the `test` topic from the previous section.
94 |
95 | 1. On your development environment, change to the `Streaming` directory and use the following to create a jar for this project:
96 |
97 | mvn clean package
98 |
99 | 2. Use SCP to copy the `kafka-streaming-1.0-SNAPSHOT.jar` file to your HDInsight cluster:
100 |
101 | scp ./target/kafka-streaming-1.0-SNAPSHOT.jar SSHUSER@CLUSTERNAME-ssh.azurehdinsight.net:kafka-streaming.jar
102 |
103 | Replace **SSHUSER** with the SSH user for your cluster, and replace **CLUSTERNAME** with the name of your cluster. When prompted enter the password for the SSH user.
104 |
105 | 3. Once the file has been uploaded, return to the SSH connection to your HDInsight cluster and use the following commands to create the `wordcounts` and `wordcount-example-Counts-changelog` topics:
106 |
107 | /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 8 --topic wordcounts --zookeeper $KAFKAZKHOSTS
108 | /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 8 --topic wordcount-example-Counts-changelog --zookeeper $KAFKAZKHOSTS
109 |
110 | 4. Use the following command to start the streaming process in the background:
111 |
112 | java -jar kafka-streaming.jar $KAFKABROKERS 2>/dev/null &
113 |
114 | 5. While it is running, use the producer to send messages to the `test` topic:
115 |
116 | java -jar kafka-producer-consumer.jar producer test $KAFKABROKERS &>/dev/null &
117 |
118 | 6. Use the following to view the output that is written to the `wordcounts` topic:
119 |
120 | /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic wordcounts --from-beginning --formatter kafka.tools.DefaultMessageFormatter --property print.key=true --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
121 |
122 | NOTE: You have to tell the consumer to print the key (which contains the word value) and the deserializer to use for the key and value in order to view the data.
123 |
124 | The output is similar to the following:
125 |
126 | dwarfs 13635
127 | ago 13664
128 | snow 13636
129 | dwarfs 13636
130 | ago 13665
131 | a 13803
132 | ago 13666
133 | a 13804
134 | ago 13667
135 | ago 13668
136 | jumped 13640
137 | jumped 13641
138 | a 13805
139 | snow 13637
140 |
141 | 7. Use __Ctrl + C__ to exit the consumer, then use the `fg` command to bring the streaming background task to the foreground. Use __Ctrl + C__ to exit it also.
142 |
--------------------------------------------------------------------------------
/Streaming/.gitignore:
--------------------------------------------------------------------------------
1 | target/
2 | pom.xml.tag
3 | pom.xml.releaseBackup
4 | pom.xml.versionsBackup
5 | pom.xml.next
6 | release.properties
7 | dependency-reduced-pom.xml
8 | buildNumber.properties
9 | .mvn/timing.properties
10 | .idea/
11 | *.log
12 | .classpath
13 | .project
14 | .settings/
15 | *.iml
--------------------------------------------------------------------------------
/Streaming/pom.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3 |   <modelVersion>4.0.0</modelVersion>
4 |   <groupId>com.microsoft.example</groupId>
5 |   <artifactId>kafka-streaming</artifactId>
6 |   <packaging>jar</packaging>
7 |   <version>1.0-SNAPSHOT</version>
8 |   <name>kafka-streaming</name>
9 |   <url>http://maven.apache.org</url>
10 |
11 |   <properties>
12 |     <kafka.version>0.10.0.0</kafka.version>
13 |   </properties>
14 |
15 |   <dependencies>
16 |     <dependency>
17 |       <groupId>org.apache.kafka</groupId>
18 |       <artifactId>kafka-streams</artifactId>
19 |       <version>${kafka.version}</version>
20 |     </dependency>
21 |   </dependencies>
22 |
23 |   <build>
24 |     <plugins>
25 |       <plugin>
26 |         <groupId>org.apache.maven.plugins</groupId>
27 |         <artifactId>maven-compiler-plugin</artifactId>
28 |         <version>3.3</version>
29 |         <configuration>
30 |           <source>1.8</source>
31 |           <target>1.8</target>
32 |         </configuration>
33 |       </plugin>
34 |       <plugin>
35 |         <groupId>org.apache.maven.plugins</groupId>
36 |         <artifactId>maven-shade-plugin</artifactId>
37 |         <version>2.3</version>
38 |         <configuration>
39 |           <transformers>
40 |             <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
41 |               <mainClass>com.microsoft.example.Stream</mainClass>
42 |             </transformer>
43 |           </transformers>
44 |           <filters>
45 |             <filter>
46 |               <artifact>*:*</artifact>
47 |               <excludes>
48 |                 <exclude>META-INF/*.SF</exclude>
49 |                 <exclude>META-INF/*.DSA</exclude>
50 |                 <exclude>META-INF/*.RSA</exclude>
51 |               </excludes>
52 |             </filter>
53 |           </filters>
54 |         </configuration>
55 |         <executions>
56 |           <execution>
57 |             <phase>package</phase>
58 |             <goals>
59 |               <goal>shade</goal>
60 |             </goals>
61 |           </execution>
62 |         </executions>
63 |       </plugin>
64 |     </plugins>
65 |   </build>
66 | </project>
--------------------------------------------------------------------------------
/Streaming/src/main/java/com/microsoft/example/Stream.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 |
3 | import org.apache.kafka.common.serialization.Serde;
4 | import org.apache.kafka.common.serialization.Serdes;
5 | import org.apache.kafka.streams.KafkaStreams;
6 | import org.apache.kafka.streams.KeyValue;
7 | import org.apache.kafka.streams.StreamsConfig;
8 | import org.apache.kafka.streams.kstream.KStream;
9 | import org.apache.kafka.streams.kstream.KStreamBuilder;
10 |
11 | import java.util.Arrays;
12 | import java.util.Properties;
13 |
14 | public class Stream
15 | {
16 | public static void main( String[] args ) {
17 | Properties streamsConfig = new Properties();
18 | // The name must be unique on the Kafka cluster
19 | streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-example");
20 | // Brokers
21 | streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, args[0]);
22 | // Zookeeper
23 | //streamsConfig.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, args[1]);
24 | // SerDes for key and values
25 | streamsConfig.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
26 | streamsConfig.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
27 |
28 | // Serdes for the word and count
29 | Serde<String> stringSerde = Serdes.String();
30 | Serde<Long> longSerde = Serdes.Long();
31 |
32 | KStreamBuilder builder = new KStreamBuilder();
33 | KStream<String, String> sentences = builder.stream(stringSerde, stringSerde, "test");
34 | KStream<String, Long> wordCounts = sentences
35 | .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
36 | .map((key, word) -> new KeyValue<>(word, word))
37 | .countByKey("Counts")
38 | .toStream();
39 | wordCounts.to(stringSerde, longSerde, "wordcounts");
40 |
41 | KafkaStreams streams = new KafkaStreams(builder, streamsConfig);
42 | streams.start();
43 |
44 | Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
45 | }
46 | }
47 |
--------------------------------------------------------------------------------
/azuredeploy.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
3 | "contentVersion": "1.0.0.0",
4 | "parameters": {
5 | "clusterName": {
6 | "type": "string",
7 | "metadata": {
8 | "description": "The name of the Kafka cluster to create. This must be a unique name."
9 | }
10 | },
11 | "clusterLoginUserName": {
12 | "type": "string",
13 | "defaultValue": "admin",
14 | "metadata": {
15 | "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
16 | }
17 | },
18 | "clusterLoginPassword": {
19 | "type": "securestring",
20 | "metadata": {
21 | "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
22 | }
23 | },
24 | "sshUserName": {
25 | "type": "string",
26 | "defaultValue": "sshuser",
27 | "metadata": {
28 | "description": "These credentials can be used to remotely access the cluster."
29 | }
30 | },
31 | "sshPassword": {
32 | "type": "securestring",
33 | "metadata": {
34 | "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
35 | }
36 | }
37 | },
38 | "variables": {
39 | "defaultStorageAccount": {
40 | "name": "[uniqueString(resourceGroup().id)]",
41 | "type": "Standard_LRS"
42 | }
43 | },
44 | "resources": [
45 | {
46 | "type": "Microsoft.Storage/storageAccounts",
47 | "name": "[variables('defaultStorageAccount').name]",
48 | "location": "[resourceGroup().location]",
49 | "apiVersion": "2016-01-01",
50 | "sku": {
51 | "name": "[variables('defaultStorageAccount').type]"
52 | },
53 | "kind": "Storage",
54 | "properties": {}
55 | },
56 | {
57 | "name": "[parameters('clusterName')]",
58 | "type": "Microsoft.HDInsight/clusters",
59 | "location": "[resourceGroup().location]",
60 | "apiVersion": "2015-03-01-preview",
61 | "dependsOn": [
62 | "[concat('Microsoft.Storage/storageAccounts/',variables('defaultStorageAccount').name)]"
63 | ],
64 | "tags": { },
65 | "properties": {
66 | "clusterVersion": "3.6",
67 | "osType": "Linux",
68 | "clusterDefinition": {
69 | "kind": "kafka",
70 |
71 | "configurations": {
72 | "gateway": {
73 | "restAuthCredential.isEnabled": true,
74 | "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
75 | "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
76 | }
77 | }
78 | },
79 | "storageProfile": {
80 | "storageaccounts": [
81 | {
82 | "name": "[replace(replace(concat(reference(concat('Microsoft.Storage/storageAccounts/', variables('defaultStorageAccount').name), '2016-01-01').primaryEndpoints.blob),'https:',''),'/','')]",
83 | "isDefault": true,
84 | "container": "[parameters('clusterName')]",
85 | "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2016-01-01').keys[0].value]"
86 | }
87 | ]
88 | },
89 | "computeProfile": {
90 | "roles": [
91 | {
92 | "name": "headnode",
93 | "targetInstanceCount": "2",
94 | "hardwareProfile": {
95 | "vmSize": "Standard_D3_v2"
96 | },
97 | "osProfile": {
98 | "linuxOperatingSystemProfile": {
99 | "username": "[parameters('sshUserName')]",
100 | "password": "[parameters('sshPassword')]"
101 | }
102 | }
103 | },
104 | {
105 | "name": "workernode",
106 | "targetInstanceCount": 4,
107 | "hardwareProfile": {
108 | "vmSize": "Standard_D3_v2"
109 | },
110 | "dataDisksGroups": [
111 | {
112 | "disksPerNode": 2
113 | }
114 | ],
115 | "osProfile": {
116 | "linuxOperatingSystemProfile": {
117 | "username": "[parameters('sshUserName')]",
118 | "password": "[parameters('sshPassword')]"
119 | }
120 | }
121 | },
122 | {
123 | "name": "zookeepernode",
124 | "targetInstanceCount": "3",
125 | "hardwareProfile": {
126 | "vmSize": "Standard_A3"
127 | },
128 | "osProfile": {
129 | "linuxOperatingSystemProfile": {
130 | "username": "[parameters('sshUserName')]",
131 | "password": "[parameters('sshPassword')]"
132 | }
133 | }
134 | }
135 | ]
136 | }
137 | }
138 | }
139 | ],
140 | "outputs": {
141 | "cluster": {
142 | "type": "object",
143 | "value": "[reference(resourceId('Microsoft.HDInsight/clusters',parameters('clusterName')))]"
144 | }
145 | }
146 | }
--------------------------------------------------------------------------------