├── CONTRIBUTING.md ├── DomainJoined-Producer-Consumer-With-TLS ├── .gitignore ├── README.md ├── media │ ├── Add_User.png │ ├── Azure_Portal_UI.png │ ├── Edit_Policy_UI.png │ ├── Kafk_Policy_UI.png │ └── Ranger_UI.png ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── microsoft │ └── example │ ├── AdminClientWrapper.java │ ├── Consumer.java │ ├── Producer.java │ └── Run.java ├── DomainJoined-Producer-Consumer ├── .gitignore ├── README.md ├── media │ ├── Add_User.png │ ├── Azure_Portal_UI.png │ ├── Edit_Policy_UI.png │ ├── Kafk_Policy_UI.png │ └── Ranger_UI.png ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── microsoft │ └── example │ ├── AdminClientWrapper.java │ ├── Consumer.java │ ├── Producer.java │ └── Run.java ├── LICENSE ├── Prebuilt-Jars ├── kafka-producer-consumer-esp.jar ├── kafka-producer-consumer-tls-esp.jar └── kafka-producer-consumer.jar ├── Producer-Consumer ├── .gitignore ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── microsoft │ └── example │ ├── AdminClientWrapper.java │ ├── Consumer.java │ ├── Producer.java │ └── Run.java ├── README.md ├── Streaming ├── .gitignore ├── pom.xml └── src │ └── main │ └── java │ └── com │ └── microsoft │ └── example │ └── Stream.java └── azuredeploy.json /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Azure samples 2 | 3 | Thank you for your interest in contributing to Azure samples! 4 | 5 | ## Ways to contribute 6 | 7 | You can contribute to [Azure samples](https://azure.microsoft.com/documentation/samples/) in a few different ways: 8 | 9 | - Submit feedback on [this sample page](https://azure.microsoft.com/documentation/samples/hdinsight-java-storm-wordcount/) whether it was helpful or not. 10 | - Submit issues through [issue tracker](https://github.com/Azure-Samples/hdinsight-java-storm-wordcount/issues) on GitHub. We are actively monitoring the issues and improving our samples. 11 | - If you wish to make code changes to samples, or contribute something new, please follow the [GitHub Forks / Pull requests model](https://help.github.com/articles/fork-a-repo/): Fork the sample repo, make the change and propose it back by submitting a pull request. -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer-With-TLS/.gitignore: -------------------------------------------------------------------------------- 1 | target/ 2 | pom.xml.tag 3 | pom.xml.releaseBackup 4 | pom.xml.versionsBackup 5 | pom.xml.next 6 | release.properties 7 | dependency-reduced-pom.xml 8 | buildNumber.properties 9 | .mvn/timing.properties 10 | .idea/ 11 | *.log 12 | .classpath 13 | .project 14 | .settings/ 15 | *.iml -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer-With-TLS/README.md: -------------------------------------------------------------------------------- 1 | --- 2 | page_type: sample 3 | languages: java 4 | products: 5 | - azure 6 | - azure-hdinsight 7 | description: "Examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kerberized Kafka on HDInsight cluster." 8 | urlFragment: hdinsight-kafka-java-get-started 9 | --- 10 | 11 | # Java-based example of using the Kafka Consumer, Producer, and Streaming APIs 12 | 13 | The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with `ESP Kafka including TLS enabled` on HDInsight cluster. 
14 | 
15 | ## Prerequisites
16 | 
17 | * Apache Kafka on HDInsight cluster. To learn how to create the cluster, see [Start with Apache Kafka on HDInsight](apache-kafka-get-started.md).
18 | * [Java Developer Kit (JDK) version 8](https://aka.ms/azure-jdks) or an equivalent, such as OpenJDK.
19 | * [Apache Maven](https://maven.apache.org/download.cgi) properly [installed](https://maven.apache.org/install.html) according to Apache. Maven is a project build system for Java projects.
20 | * An SSH client like PuTTY. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
21 | 
22 | ## Understand the code
23 | 
24 | If you're using an **Enterprise Security Package (ESP) with TLS Encryption** enabled Kafka cluster, use the application version located in the `DomainJoined-Producer-Consumer-With-TLS` subdirectory.
25 | 
26 | The application consists primarily of five files:
27 | * `pom.xml`: This file defines the project dependencies, Java version, and packaging methods.
28 | * `Producer.java`: This file sends random sentences to Kafka using the producer API.
29 | * `Consumer.java`: This file uses the consumer API to read data from Kafka and emit it to STDOUT.
30 | * `AdminClientWrapper.java`: This file uses the admin API to create, describe, and delete Kafka topics.
31 | * `Run.java`: The command-line interface used to run the producer and consumer code.
32 | 
33 | ### Pom.xml
34 | 
35 | The important things to understand in the `pom.xml` file are:
36 | 
37 | * Dependencies: This project relies on the Kafka producer and consumer APIs, which are provided by the `kafka-clients` package. The following XML code defines this dependency:
38 | 
39 | ```xml
40 | <dependency>
41 |   <groupId>org.apache.kafka</groupId>
42 |   <artifactId>kafka-clients</artifactId>
43 |   <version>${kafka.version}</version>
44 | </dependency>
45 | ```
46 | 
47 | 
48 | The `${kafka.version}` entry is declared in the `<properties>..</properties>` section of `pom.xml`, and is configured to the Kafka version of the HDInsight cluster.
49 | 
50 | * Plugins: Maven plugins provide various capabilities. In this project, the following plugins are used:
51 | 
52 | * `maven-compiler-plugin`: Used to set the Java version used by the project to 8. This is the version of Java used by HDInsight 4.0.
53 | * `maven-shade-plugin`: Used to generate an uber jar that contains this application as well as any dependencies. It is also used to set the entry point of the application, so that you can directly run the Jar file without having to specify the main class.
54 | 
55 | ### Producer.java
56 | 
57 | The producer communicates with the Kafka broker hosts (worker nodes) and sends data to a Kafka topic. The following code snippet is from the [Producer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Producer.java) file from the [GitHub repository](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started) and shows how to set the producer properties.
58 | 
59 | ```java
60 | Properties properties = new Properties();
61 | // Set the brokers (bootstrap servers)
62 | properties.setProperty("bootstrap.servers", brokers);
63 | // Set how to serialize key/value pairs
64 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
65 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
66 | // Set the TLS Encryption for Domain Joined TLS Encrypted cluster
67 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
68 | properties.setProperty("sasl.mechanism", "GSSAPI");
69 | properties.setProperty("sasl.kerberos.service.name", "kafka");
70 | // Set the SSL truststore location and password
71 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
72 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
73 | // Set the SSL keystore location and password
74 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
75 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
76 | // Set the SSL key password
77 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
78 | KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
79 | ```
80 | 
81 | ### Consumer.java
82 | 
83 | The consumer communicates with the Kafka broker hosts (worker nodes), and reads records in a loop. The following code snippet from the [Consumer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Consumer.java) file sets the consumer properties.
84 | 
85 | ```java
86 | KafkaConsumer<String, String> consumer;
87 | // Configure the consumer
88 | Properties properties = new Properties();
89 | // Point it to the brokers
90 | properties.setProperty("bootstrap.servers", brokers);
91 | // Set the consumer group (all consumers must belong to a group).
92 | properties.setProperty("group.id", groupId);
93 | // Set how to deserialize key/value pairs
94 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
95 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
96 | // Set the TLS Encryption for Domain Joined TLS Encrypted cluster
97 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
98 | properties.setProperty("sasl.mechanism", "GSSAPI");
99 | properties.setProperty("sasl.kerberos.service.name", "kafka");
100 | // Set the SSL truststore location and password
101 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
102 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
103 | // Set the SSL keystore location and password
104 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
105 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
106 | // Set the SSL key password
107 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
108 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
109 | // with the earliest record in the stream.
110 | properties.setProperty("auto.offset.reset","earliest");
111 | consumer = new KafkaConsumer<>(properties);
112 | ```
113 | 
114 | #### Note:
115 | The properties below are the important additions for an ESP cluster with TLS Encryption enabled.
116 | It is critical to add them in the `AdminClient`, `Producer`, and `Consumer` configurations.
117 | Your ESP cluster might have both TLS encryption and authentication enabled. Adjust the configuration as described in
118 | [Enable TLS Encryption on ESP cluster](https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-esp-kafka-ssl-encryption-authentication).
119 | ```java
120 | // Set the TLS Encryption for Domain Joined TLS Encrypted cluster
121 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
122 | properties.setProperty("sasl.mechanism", "GSSAPI");
123 | properties.setProperty("sasl.kerberos.service.name", "kafka");
124 | // Set the SSL truststore location and password
125 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
126 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
127 | // Set the SSL keystore location and password
128 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
129 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
130 | // Set the SSL key password
131 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
132 | ```
133 | In this code, the consumer is configured to read from the start of the topic (`auto.offset.reset` is set to `earliest`).
134 | The properties above will vary with your keystore and truststore locations and passwords. Another possible value for `sasl.mechanism` is `PLAIN`.
135 | 
136 | ### Run.java
137 | 
138 | The [Run.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Run.java) file provides a command-line interface that runs either the producer or consumer code. You must provide the Kafka broker host information as a parameter. You can optionally include a group ID value, which is used by the consumer process. If you create multiple consumer instances using the same group ID, they'll load balance reading from the topic.
139 | 
140 | ## Use Pre-built JAR files
141 | 
142 | Download the jars from the [Kafka Get Started Azure sample](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/Prebuilt-Jars). If your cluster is **Enterprise Security Package (ESP) with TLS Encryption** enabled, use kafka-producer-consumer-tls-esp.jar. Use the command below to copy the jars to your cluster.
143 | 
144 | ```cmd
145 | scp kafka-producer-consumer-tls-esp.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
146 | ```
147 | 
148 | ## Build the JAR files from code
149 | 
150 | 1. Download and extract the examples from [https://github.com/Azure-Samples/hdinsight-kafka-java-get-started](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started).
151 | 
152 | 2. If you are using an **Enterprise Security Package (ESP) with TLS Encryption** enabled Kafka cluster, set the location to the `DomainJoined-Producer-Consumer-With-TLS` subdirectory.
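For example, change into that subdirectory before building (a sketch; the extracted folder name depends on where and how you unpacked the download):

```cmd
cd hdinsight-kafka-java-get-started\DomainJoined-Producer-Consumer-With-TLS
```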
Use the following command to build the application:
153 | 
154 | ```cmd
155 | mvn clean package
156 | ```
157 | 
158 | This command creates a directory named `target` that contains the packaged jar. For this ESP with TLS Encryption project, the file is `kafka-producer-consumer-tls-esp-1.0-SNAPSHOT.jar` (from the artifact ID in this project's `pom.xml`).
159 | 
160 | 3. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Enter the following command to copy the `kafka-producer-consumer-*.jar` file to your HDInsight cluster. When prompted, enter the password for the SSH user.
161 | 
162 | ```cmd
163 | scp ./target/kafka-producer-consumer*.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
164 | ```
165 | 
166 | ## Run the example
167 | 
168 | 
169 | 1. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Open an SSH connection to the cluster, by entering the following command. If prompted, enter the password for the SSH user account.
170 | 
171 | ```cmd
172 | ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
173 | ```
174 | 
175 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal. Replace `<password>` with the cluster login password, then execute:
176 | 
177 | ```bash
178 | sudo apt -y install jq
179 | export clusterName='<clustername>'
180 | export password='<password>'
181 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
182 | ```
183 | 
184 | > **Note**
185 | This command requires Ambari access. If your cluster is behind an NSG, run this command from a machine that can access Ambari.
186 | 1. Create a Kafka topic, `myTest`, by entering the following command:
187 | 
188 | ```bash
189 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar create myTest $KAFKABROKERS
190 | ```
191 | 
192 | 1. To run the producer and write data to the topic, use the following command:
193 | 
194 | ```bash
195 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar producer myTest $KAFKABROKERS
196 | ```
197 | 
198 | 1. Once the producer has finished, use the following command to read from the topic:
199 | 
200 | ```bash
201 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar consumer myTest $KAFKABROKERS
202 | ```
203 | 
204 | 
205 | The records read, along with a count of records, are displayed.
206 | 
207 | 1. Use __Ctrl + C__ to exit the consumer.
208 | 
209 | ### Run the Example with another User (espkafkauser)
210 | 
211 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal.
Replace `<password>` with the cluster login password, then execute:
212 | 
213 | ```bash
214 | sudo apt -y install jq
215 | export clusterName='<clustername>'
216 | export password='<password>'
217 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
218 | ```
219 | 2. Create the keytab file for espkafkauser with the steps below:
220 | ```bash
221 | ktutil
222 | ktutil: addent -password -p espkafkauser@TEST.COM -k 1 -e RC4-HMAC
223 | Password for espkafkauser@TEST.COM:
224 | ktutil: wkt espkafkauser.keytab
225 | ktutil: q
226 | ```
227 | 
228 | **NOTE:-**
229 | 1. espkafkauser should be part of your domain group; add it in the Ranger UI to grant it privileges for CRUD operations.
230 | 2. Keep the domain name (TEST.COM) in capitals only. Otherwise, Kerberos throws errors during CRUD operations.
231 | 
232 | You will now have an espkafkauser.keytab file in the local directory. Next, create a JAAS config file named espkafkauser_jaas.conf with the content given below:
233 | 
234 | ```
235 | KafkaClient {
236 | com.sun.security.auth.module.Krb5LoginModule required
237 | useKeyTab=true
238 | storeKey=true
239 | keyTab="/home/sshuser/espkafkauser.keytab"
240 | useTicketCache=false
241 | serviceName="kafka"
242 | principal="espkafkauser@TEST.COM";
243 | };
244 | ```
245 | ### Steps to add espkafkauser in the Ranger UI
246 | 1. Go to the overview page of the cluster and use the Ambari UI URL to open Ranger. Sign in with the Ambari UI credentials.
247 | 
248 | ![](media/Azure_Portal_UI.png)
249 | ```
250 | Generic
251 | https://<clustername>.azurehdinsight.net/ranger
252 | 
253 | Example
254 | https://espkafka.azurehdinsight.net/ranger
255 | ```
256 | 
257 | 2. If everything is correct, you will see the Ranger dashboard. Click the Kafka link.
258 | 
259 | ![](media/Ranger_UI.png)
260 | 
261 | 
262 | 3. The policy page shows that some users, such as kafka, have access to perform CRUD operations on all topics.
263 | 
264 | ![](media/Kafk_Policy_UI.png)
265 | 
266 | 
267 | 4. Edit the alltopic policy and add espkafkauser in the Select User dropdown. Click Save Policy after making the changes.
268 | 
269 | ![](media/Edit_Policy_UI.png)
270 | 
271 | ![](media/Add_User.png)
272 | 
273 | 
274 | 5. If you are not able to see the user in the dropdown, that means the user is not available in the AAD domain.
275 | 
276 | 6.
Now execute CRUD operations on the head node for verification:
277 | 
278 | ```bash
279 | # Sample command
280 | java -jar -Djava.security.auth.login.config=JAAS_CONFIG_FILE_PATH PRODUCER_CONSUMER_ESP_JAR_PATH create $TOPICNAME $KAFKABROKERS
281 | 
282 | # Create
283 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar create $TOPICNAME $KAFKABROKERS
284 | 
285 | # Describe
286 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar describe $TOPICNAME $KAFKABROKERS
287 | 
288 | # Produce
289 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar producer $TOPICNAME $KAFKABROKERS
290 | 
291 | # Consume
292 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer $TOPICNAME $KAFKABROKERS
293 | 
294 | # Delete
295 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar delete $TOPICNAME $KAFKABROKERS
296 | ```
297 | 
298 | 
299 | ### Multiple consumers
300 | 
301 | Kafka consumers use a consumer group when reading records. Using the same group with multiple consumers results in load-balanced reads from a topic. Each consumer in the group receives a portion of the records.
302 | 
303 | The consumer application accepts a parameter that is used as the group ID. For example, the following command starts a consumer using a group ID of `myGroup`:
304 | 
305 | ```bash
306 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup
307 | ```
308 | 
309 | Use __Ctrl + C__ to exit the consumer.
310 | 
311 | To see this process in action, use the following command:
312 | 
313 | With `kafka` as the user:
314 | ```bash
315 | tmux new-session 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
316 | \; split-window -h 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
317 | \; attach
318 | ```
319 | 
320 | With a custom user:
321 | ```bash
322 | tmux new-session 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
323 | \; split-window -h 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar consumer myTest $KAFKABROKERS myGroup' \
324 | \; attach
325 | ```
326 | 
327 | This command uses `tmux` to split the terminal into two columns. A consumer is started in each column, with the same group ID value. Once the consumers finish reading, notice that each read only a portion of the records. Use __Ctrl + C__ twice to exit `tmux`.
328 | 
329 | Consumption by clients within the same group is handled through the partitions for the topic. In this code sample, the `myTest` topic created earlier has eight partitions. If you start eight consumers, each consumer reads records from a single partition for the topic.
330 | 
331 | > [!IMPORTANT]
332 | > There cannot be more consumer instances in a consumer group than partitions. In this example, one consumer group can contain up to eight consumers since that is the number of partitions in the topic. Or you can have multiple consumer groups, each with no more than eight consumers.
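To confirm the partition count for a topic, one option is to reuse this sample's `describe` verb from the CRUD commands shown earlier (a usage sketch; it assumes the `user_jaas.conf` file, the prebuilt jar, and the `$KAFKABROKERS` variable from the previous steps):

```bash
java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-tls-esp.jar describe myTest $KAFKABROKERS
```

The topic description printed by `AdminClientWrapper.describeTopics` lists each partition of the topic.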
333 | 
334 | Records in Kafka are stored in the order they're received within a partition. To achieve in-ordered delivery for records *within a partition*, create a consumer group where the number of consumer instances matches the number of partitions. To achieve in-ordered delivery for records *within the topic*, create a consumer group with only one consumer instance.
335 | 
336 | ## Common Issues faced
337 | 
338 | 1. Topic creation fails
339 | 
340 | 
341 | If your cluster is Enterprise Security Package enabled, use the [pre-built JAR files for producer and consumer](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/Prebuilt-Jars/kafka-producer-consumer-tls-esp.jar).
342 | 
343 | 
344 | The ESP with TLS Encryption jar can be built from the code in the [`DomainJoined-Producer-Consumer-With-TLS` subdirectory](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/DomainJoined-Producer-Consumer-With-TLS).
345 | 
346 | 
347 | 1. Facing issues with ESP-enabled clusters
348 | 
349 | If produce and consume operations fail, and you are using an ESP enabled cluster, check that the user `kafka` is present in all Ranger policies. If it is not present, add it to all Ranger policies.
350 | 
351 | 
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Add_User.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Add_User.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Azure_Portal_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Azure_Portal_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Edit_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Edit_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Kafk_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Kafk_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/media/Ranger_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer-With-TLS/media/Ranger_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/pom.xml:
--------------------------------------------------------------------------------
1 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2 |   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3 |   <modelVersion>4.0.0</modelVersion>
4 |   <groupId>com.microsoft.example</groupId>
5 |   <artifactId>kafka-producer-consumer-tls-esp</artifactId>
6 |   <packaging>jar</packaging>
7 |   <version>1.0-SNAPSHOT</version>
8 |   <name>kafka-producer-consumer</name>
9 |   <url>http://maven.apache.org</url>
10 | 
11 |   <properties>
12 |     <kafka.version>2.1.1</kafka.version>
13 |   </properties>
14 | 
15 |   <dependencies>
16 |     <dependency>
17 |       <groupId>org.apache.kafka</groupId>
18 |       <artifactId>kafka-clients</artifactId>
19 |       <version>${kafka.version}</version>
20 |     </dependency>
21 |   </dependencies>
22 | 
23 |   <build>
24 |     <plugins>
25 |       <plugin>
26 |         <groupId>org.apache.maven.plugins</groupId>
27 |         <artifactId>maven-compiler-plugin</artifactId>
28 |         <version>3.3</version>
29 |         <configuration>
30 |           <source>1.8</source>
31 |           <target>1.8</target>
32 |         </configuration>
33 |       </plugin>
34 |       <plugin>
35 |         <groupId>org.apache.maven.plugins</groupId>
36 |         <artifactId>maven-shade-plugin</artifactId>
37 |         <version>2.3</version>
38 |         <configuration>
39 |           <transformers>
40 |             <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
41 |               <mainClass>com.microsoft.example.Run</mainClass>
42 |             </transformer>
43 |           </transformers>
44 |           <filters>
45 |             <filter>
46 |               <artifact>*:*</artifact>
47 |               <excludes>
48 |                 <exclude>META-INF/*.SF</exclude>
49 |                 <exclude>META-INF/*.DSA</exclude>
50 |                 <exclude>META-INF/*.RSA</exclude>
51 |               </excludes>
52 |             </filter>
53 |           </filters>
54 |         </configuration>
55 |         <executions>
56 |           <execution>
57 |             <phase>package</phase>
58 |             <goals>
59 |               <goal>shade</goal>
60 |             </goals>
61 |           </execution>
62 |         </executions>
63 |       </plugin>
64 |     </plugins>
65 |   </build>
66 | </project>
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/AdminClientWrapper.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 | 
3 | import org.apache.kafka.clients.producer.ProducerConfig;
4 | import org.apache.kafka.clients.admin.AdminClient;
5 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
6 | import org.apache.kafka.clients.admin.CreateTopicsResult;
7 | import org.apache.kafka.clients.admin.DeleteTopicsResult;
8 | import org.apache.kafka.clients.admin.TopicDescription;
9 | import org.apache.kafka.clients.admin.NewTopic;
10 | 
11 | import org.apache.kafka.clients.admin.KafkaAdminClient;
12 | import org.apache.kafka.clients.CommonClientConfigs;
13 | import org.apache.kafka.common.config.SslConfigs;
14 | 
15 | 
16 | import java.util.Collection;
17 | import java.util.Collections;
18 | import java.util.concurrent.ExecutionException;
19 | import java.util.Properties;
20 | import java.util.Random;
21 | import java.io.IOException;
22 | 
23 | 
24 | public class AdminClientWrapper {
25 | 
26 |     public static Properties getProperties(String brokers) {
27 |         Properties properties = new Properties();
28 |         properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
29 | 
30 |         // Set how to serialize key/value pairs
31 |         properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
32 |         properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
33 |         // specify the protocol for Domain Joined TLS Encrypted clusters
34 |         properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
35 |         properties.setProperty("sasl.mechanism", "GSSAPI");
36 |         properties.setProperty("sasl.kerberos.service.name", "kafka");
37 |         // specify the truststore location and password
38 |         properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
39 |         properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
40 |         // specify the keystore location and password
41 |         properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
42 |         properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
43 |         // specify the key password
44 |         properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
45 |         return properties;
46 |     }
47 | 
48 |     public static void describeTopics(String brokers, String topicName) throws IOException {
49 |         // Set properties used to configure admin client
50 |         Properties properties = 
getProperties(brokers); 51 | 52 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 53 | // Make async call to describe the topic. 54 | final DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(Collections.singleton(topicName)); 55 | 56 | TopicDescription description = describeTopicsResult.values().get(topicName).get(); 57 | System.out.print(description.toString()); 58 | } catch (Exception e) { 59 | System.out.print("Describe denied\n"); 60 | System.out.print(e.getMessage()); 61 | //throw new RuntimeException(e.getMessage(), e); 62 | } 63 | } 64 | 65 | public static void deleteTopics(String brokers, String topicName) throws IOException { 66 | // Set properties used to configure admin client 67 | Properties properties = getProperties(brokers); 68 | 69 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 70 | final DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Collections.singleton(topicName)); 71 | deleteTopicsResult.values().get(topicName).get(); 72 | System.out.print("Topic " + topicName + " deleted"); 73 | } catch (Exception e) { 74 | System.out.print("Delete Topics denied\n"); 75 | System.out.print(e.getMessage()); 76 | //throw new RuntimeException(e.getMessage(), e); 77 | } 78 | } 79 | 80 | public static void createTopics(String brokers, String topicName) throws IOException { 81 | // Set properties used to configure admin client 82 | Properties properties = getProperties(brokers); 83 | 84 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 85 | int numPartitions = 8; 86 | short replicationFactor = (short)3; 87 | final NewTopic newTopic = new NewTopic(topicName, numPartitions, replicationFactor); 88 | 89 | final CreateTopicsResult createTopicsResult = adminClient.createTopics(Collections.singleton(newTopic)); 90 | createTopicsResult.values().get(topicName).get(); 91 | System.out.print("Topic " + topicName + " created"); 92 | } catch (Exception e) { 93 | System.out.print("Create Topics denied\n"); 94 | System.out.print(e.getMessage()); 95 | //throw new RuntimeException(e.getMessage(), e); 96 | } 97 | } 98 | } 99 | -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Consumer.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import org.apache.kafka.clients.consumer.KafkaConsumer; 4 | import org.apache.kafka.clients.consumer.ConsumerRecords; 5 | import org.apache.kafka.clients.consumer.ConsumerRecord; 6 | import org.apache.kafka.clients.CommonClientConfigs; 7 | import org.apache.kafka.common.config.SslConfigs; 8 | 9 | import java.util.Properties; 10 | import java.util.Arrays; 11 | 12 | public class Consumer { 13 | public static int consume(String brokers, String groupId, String topicName) { 14 | // Create a consumer 15 | KafkaConsumer consumer; 16 | // Configure the consumer 17 | Properties properties = new Properties(); 18 | // Point it to the brokers 19 | properties.setProperty("bootstrap.servers", brokers); 20 | // Set the consumer group (all consumers must belong to a group). 
21 | properties.setProperty("group.id", groupId);
22 | // Set how to deserialize key/value pairs
23 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
24 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
25 | // specify the protocol for Domain Joined TLS Encrypted clusters
26 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
27 | properties.setProperty("sasl.mechanism", "GSSAPI");
28 | properties.setProperty("sasl.kerberos.service.name", "kafka");
29 | // specify the truststore location and password
30 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
31 | properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
32 | // specify the keystore location and password
33 | properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
34 | properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
35 | // specify the key password
36 | properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
37 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
38 | // with the earliest record in the stream.
39 | properties.setProperty("auto.offset.reset","earliest");
40 | 
41 | consumer = new KafkaConsumer<>(properties);
42 | 
43 | // Subscribe to the topic
44 | consumer.subscribe(Arrays.asList(topicName));
45 | 
46 | // Loop until ctrl + c
47 | int count = 0;
48 | while(true) {
49 | // Poll for records
50 | ConsumerRecords<String, String> records = consumer.poll(200);
51 | // Did we get any?
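// (Note: poll(200) returns after the 200 ms timeout even if no records arrived, so the count can be zero.)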
52 | if (records.count() == 0) {
53 | // timeout/nothing to read
54 | } else {
55 | // Yes, loop over records
56 | for(ConsumerRecord<String, String> record: records) {
57 | // Display record and count
58 | count += 1;
59 | System.out.println( count + ": " + record.value());
60 | }
61 | }
62 | }
63 | }
64 | }
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Producer.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 | 
3 | import org.apache.kafka.clients.producer.KafkaProducer;
4 | import org.apache.kafka.clients.producer.ProducerRecord;
5 | import org.apache.kafka.clients.producer.ProducerConfig;
6 | import org.apache.kafka.clients.admin.AdminClient;
7 | import org.apache.kafka.clients.admin.DescribeTopicsResult;
8 | import org.apache.kafka.clients.admin.KafkaAdminClient;
9 | import org.apache.kafka.clients.CommonClientConfigs;
10 | import org.apache.kafka.clients.admin.TopicDescription;
11 | import org.apache.kafka.common.config.SslConfigs;
12 | 
13 | import java.util.Collection;
14 | import java.util.Collections;
15 | import java.util.concurrent.ExecutionException;
16 | import java.util.Properties;
17 | import java.util.Random;
18 | import java.io.IOException;
19 | 
20 | public class Producer
21 | {
22 |     public static void produce(String brokers, String topicName) throws IOException
23 |     {
24 | 
25 |         // Set properties used to configure the producer
26 |         Properties properties = new Properties();
27 |         // Set the brokers (bootstrap servers)
28 |         properties.setProperty("bootstrap.servers", brokers);
29 |         // Set how to serialize key/value pairs
30 |         properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
31 |         properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
32 |         // specify the protocol for Domain Joined TLS Encrypted clusters
33 |         properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
34 |         properties.setProperty("sasl.mechanism", "GSSAPI");
35 |         properties.setProperty("sasl.kerberos.service.name", "kafka");
36 |         // specify the truststore location and password
37 |         properties.setProperty(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.truststore.jks");
38 |         properties.setProperty(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "MyClientPassword123");
39 |         // specify the keystore location and password
40 |         properties.setProperty(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG,"/home/sshuser/ssl/kafka.client.keystore.jks");
41 |         properties.setProperty(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "MyClientPassword123");
42 |         // specify the key password
43 |         properties.setProperty(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "MyClientPassword123");
44 | 
45 |         KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
46 | 
47 |         // So we can generate random sentences
48 |         Random random = new Random();
49 |         String[] sentences = new String[] {
50 |             "the cow jumped over the moon",
51 |             "an apple a day keeps the doctor away",
52 |             "four score and seven years ago",
53 |             "snow white and the seven dwarfs",
54 |             "i am at two with nature"
55 |         };
56 | 
57 |         String progressAnimation = "|/-\\";
58 |         // Produce a bunch of records
59 |         for(int i = 0; i < 100; i++) {
60 |             // Pick a sentence at random
61 |             String sentence = sentences[random.nextInt(sentences.length)];
62 |             // Send the sentence to the test topic
63 |             try
64 |             {
65 
| producer.send(new ProducerRecord(topicName, sentence)).get(); 66 | } 67 | catch (Exception ex) 68 | { 69 | System.out.print(ex.getMessage()); 70 | throw new IOException(ex.toString()); 71 | } 72 | String progressBar = "\r" + progressAnimation.charAt(i % progressAnimation.length()) + " " + i; 73 | System.out.write(progressBar.getBytes()); 74 | } 75 | } 76 | } 77 | -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer-With-TLS/src/main/java/com/microsoft/example/Run.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import java.io.IOException; 4 | import java.util.UUID; 5 | import java.io.PrintWriter; 6 | import java.io.File; 7 | import java.lang.Exception; 8 | 9 | // Handle starting producer or consumer 10 | public class Run { 11 | public static void main(String[] args) throws IOException { 12 | if(args.length < 3) { 13 | usage(); 14 | } 15 | // Get the brokers 16 | String brokers = args[2]; 17 | String topicName = args[1]; 18 | switch(args[0].toLowerCase()) { 19 | case "producer": 20 | Producer.produce(brokers, topicName); 21 | break; 22 | case "consumer": 23 | // Either a groupId was passed in, or we need a random one 24 | String groupId; 25 | if(args.length == 4) { 26 | groupId = args[3]; 27 | } else { 28 | groupId = UUID.randomUUID().toString(); 29 | } 30 | Consumer.consume(brokers, groupId, topicName); 31 | break; 32 | case "describe": 33 | AdminClientWrapper.describeTopics(brokers, topicName); 34 | break; 35 | case "create": 36 | AdminClientWrapper.createTopics(brokers, topicName); 37 | break; 38 | case "delete": 39 | AdminClientWrapper.deleteTopics(brokers, topicName); 40 | break; 41 | default: 42 | usage(); 43 | } 44 | System.exit(0); 45 | } 46 | // Display usage 47 | public static void usage() { 48 | System.out.println("Usage:"); 49 | System.out.println("kafka-example.jar brokerhosts [groupid]"); 50 | System.exit(1); 51 | } 52 | } 53 | -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer/.gitignore: -------------------------------------------------------------------------------- 1 | target/ 2 | pom.xml.tag 3 | pom.xml.releaseBackup 4 | pom.xml.versionsBackup 5 | pom.xml.next 6 | release.properties 7 | dependency-reduced-pom.xml 8 | buildNumber.properties 9 | .mvn/timing.properties 10 | .idea/ 11 | *.log 12 | .classpath 13 | .project 14 | .settings/ 15 | *.iml -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer/README.md: -------------------------------------------------------------------------------- 1 | --- 2 | page_type: sample 3 | languages: java 4 | products: 5 | - azure 6 | - azure-hdinsight 7 | description: "Examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kerberized Kafka on HDInsight cluster." 8 | urlFragment: hdinsight-kafka-java-get-started 9 | --- 10 | 11 | # Java-based example of using the Kafka Consumer, Producer, and Streaming APIs 12 | 13 | The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kafka on HDInsight cluster. 14 | 15 | ## Prerequisites 16 | 17 | * Apache Kafka on HDInsight cluster. To learn how to create the cluster, see [Start with Apache Kafka on HDInsight](apache-kafka-get-started.md). 
18 | * [Java Developer Kit (JDK) version 8](https://aka.ms/azure-jdks) or an equivalent, such as OpenJDK.
19 | * [Apache Maven](https://maven.apache.org/download.cgi) properly [installed](https://maven.apache.org/install.html) according to Apache. Maven is a project build system for Java projects.
20 | * An SSH client like PuTTY. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
21 | 
22 | ## Understand the code
23 | 
24 | If you're using an **Enterprise Security Package (ESP)** enabled Kafka cluster, use the application version located in the `DomainJoined-Producer-Consumer` subdirectory.
25 | 
26 | The application consists primarily of five files:
27 | * `pom.xml`: This file defines the project dependencies, Java version, and packaging methods.
28 | * `Producer.java`: This file sends random sentences to Kafka using the producer API.
29 | * `Consumer.java`: This file uses the consumer API to read data from Kafka and emit it to STDOUT.
30 | * `AdminClientWrapper.java`: This file uses the admin API to create, describe, and delete Kafka topics.
31 | * `Run.java`: The command-line interface used to run the producer and consumer code.
32 | 
33 | ### Pom.xml
34 | 
35 | The important things to understand in the `pom.xml` file are:
36 | 
37 | * Dependencies: This project relies on the Kafka producer and consumer APIs, which are provided by the `kafka-clients` package. The following XML code defines this dependency:
38 | 
39 | ```xml
40 | <dependency>
41 |   <groupId>org.apache.kafka</groupId>
42 |   <artifactId>kafka-clients</artifactId>
43 |   <version>${kafka.version}</version>
44 | </dependency>
45 | ```
46 | 
47 | 
48 | The `${kafka.version}` entry is declared in the `<properties>..</properties>` section of `pom.xml`, and is configured to the Kafka version of the HDInsight cluster.
49 | 
50 | * Plugins: Maven plugins provide various capabilities. In this project, the following plugins are used:
51 | 
52 | * `maven-compiler-plugin`: Used to set the Java version used by the project to 8. This is the version of Java used by HDInsight 4.0.
53 | * `maven-shade-plugin`: Used to generate an uber jar that contains this application as well as any dependencies. It is also used to set the entry point of the application, so that you can directly run the Jar file without having to specify the main class.
54 | 
55 | ### Producer.java
56 | 
57 | The producer communicates with the Kafka broker hosts (worker nodes) and sends data to a Kafka topic. The following code snippet is from the [Producer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Producer.java) file from the [GitHub repository](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started) and shows how to set the producer properties.
58 | 
59 | ```java
60 | Properties properties = new Properties();
61 | // Set the brokers (bootstrap servers)
62 | properties.setProperty("bootstrap.servers", brokers);
63 | // Set how to serialize key/value pairs
64 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
65 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
66 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
67 | KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
68 | ```
69 | 
70 | ### Consumer.java
71 | 
72 | The consumer communicates with the Kafka broker hosts (worker nodes), and reads records in a loop.
The following code snippet from the [Consumer.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Consumer.java) file sets the consumer properties.
73 | 
74 | ```java
75 | KafkaConsumer<String, String> consumer;
76 | // Configure the consumer
77 | Properties properties = new Properties();
78 | // Point it to the brokers
79 | properties.setProperty("bootstrap.servers", brokers);
80 | // Set the consumer group (all consumers must belong to a group).
81 | properties.setProperty("group.id", groupId);
82 | // Set how to deserialize key/value pairs
83 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
84 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
85 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
86 | // When a group is first created, it has no offset stored to start reading from. This tells it to start
87 | // with the earliest record in the stream.
88 | properties.setProperty("auto.offset.reset","earliest");
89 | 
90 | consumer = new KafkaConsumer<>(properties);
91 | ```
92 | 
93 | Notice the important property added for an ESP cluster: `properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");`. It is critical to add this in the `AdminClient`, `Producer`, and `Consumer` configurations.
94 | In this code, the consumer is configured to read from the start of the topic (`auto.offset.reset` is set to `earliest`).
95 | 
96 | ### Run.java
97 | 
98 | The [Run.java](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Run.java) file provides a command-line interface that runs either the producer or consumer code. You must provide the Kafka broker host information as a parameter. You can optionally include a group ID value, which is used by the consumer process. If you create multiple consumer instances using the same group ID, they'll load balance reading from the topic.
99 | 
100 | ## Use Pre-built JAR files
101 | 
102 | Download the jars from the [Kafka Get Started Azure sample](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/Prebuilt-Jars). If your cluster is **Enterprise Security Package (ESP)** enabled, use kafka-producer-consumer-esp.jar. Use the command below to copy the jars to your cluster.
103 | 
104 | ```cmd
105 | scp kafka-producer-consumer-esp.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
106 | ```
107 | 
108 | ## Build the JAR files from code
109 | 
110 | 
111 | If you would like to skip this step, prebuilt jars can be downloaded from the `Prebuilt-Jars` subdirectory. Download the kafka-producer-consumer.jar. If your cluster is **Enterprise Security Package (ESP)** enabled, use kafka-producer-consumer-esp.jar. Execute step 3 to copy the jar to your HDInsight cluster.
112 | 
113 | 1. Download and extract the examples from [https://github.com/Azure-Samples/hdinsight-kafka-java-get-started](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started).
114 | 
115 | 2. If you are using an **Enterprise Security Package (ESP)** enabled Kafka cluster, set the location to the `DomainJoined-Producer-Consumer` subdirectory.
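For example, change into that subdirectory before building (a sketch; the extracted folder name depends on where and how you unpacked the download):

```cmd
cd hdinsight-kafka-java-get-started\DomainJoined-Producer-Consumer
```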
Use the following command to build the application:
116 | 
117 | ```cmd
118 | mvn clean package
119 | ```
120 | 
121 | This command creates a directory named `target` that contains a file named `kafka-producer-consumer-1.0-SNAPSHOT.jar`. For ESP clusters, the file will be `kafka-producer-consumer-esp-1.0-SNAPSHOT.jar`.
122 | 
123 | 3. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Enter the following command to copy the `kafka-producer-consumer-*.jar` file to your HDInsight cluster. When prompted, enter the password for the SSH user.
124 | 
125 | ```cmd
126 | scp ./target/kafka-producer-consumer*.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
127 | ```
128 | 
129 | ## Run the example
130 | 
131 | 
132 | 1. Replace `sshuser` with the SSH user for your cluster, and replace `CLUSTERNAME` with the name of your cluster. Open an SSH connection to the cluster, by entering the following command. If prompted, enter the password for the SSH user account.
133 | 
134 | ```cmd
135 | ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
136 | ```
137 | 
138 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal. Replace `<password>` with the cluster login password, then execute:
139 | 
140 | ```bash
141 | sudo apt -y install jq
142 | export clusterName='<clustername>'
143 | export password='<password>'
144 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
145 | ```
146 | 
147 | > **Note**
148 | This command requires Ambari access. If your cluster is behind an NSG, run this command from a machine that can access Ambari.
149 | 1. Create a Kafka topic, `myTest`, by entering the following command:
150 | 
151 | ```bash
152 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar create myTest $KAFKABROKERS
153 | ```
154 | 
155 | 1. To run the producer and write data to the topic, use the following command:
156 | 
157 | ```bash
158 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar producer myTest $KAFKABROKERS
159 | ```
160 | 
161 | 1. Once the producer has finished, use the following command to read from the topic:
162 | 
163 | ```bash
164 | java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer.jar consumer myTest $KAFKABROKERS
165 | ```
166 | 
167 | 
168 | The records read, along with a count of records, are displayed.
169 | 
170 | 1. Use __Ctrl + C__ to exit the consumer.
171 | 
172 | ### Run the Example with another User (espkafkauser)
173 | 
174 | 1. To get the Kafka broker hosts, substitute the values for `<clustername>` and `<password>` in the following command and execute it. Use the same casing for `<clustername>` as shown in the Azure portal.
Replace `<password>` with the cluster login password, then execute:
175 | 
176 | ```bash
177 | sudo apt -y install jq
178 | export clusterName='<clustername>'
179 | export password='<password>'
180 | export KAFKABROKERS=$(curl -sS -u admin:$password -G https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2);
181 | ```
182 | 2. Create the keytab file for espkafkauser with the steps below:
183 | ```bash
184 | ktutil
185 | ktutil: addent -password -p espkafkauser@TEST.COM -k 1 -e RC4-HMAC
186 | Password for espkafkauser@TEST.COM:
187 | ktutil: wkt espkafkauser.keytab
188 | ktutil: q
189 | ```
190 | 
191 | **NOTE:-**
192 | 1. espkafkauser should be part of your domain group; add it in the Ranger UI to grant it privileges for CRUD operations.
193 | 2. Keep the domain name (TEST.COM) in capitals only. Otherwise, Kerberos throws errors during CRUD operations.
194 | 
195 | You will now have an espkafkauser.keytab file in the local directory. Next, create a JAAS config file named espkafkauser_jaas.conf with the content given below:
196 | 
197 | ```
198 | KafkaClient {
199 | com.sun.security.auth.module.Krb5LoginModule required
200 | useKeyTab=true
201 | storeKey=true
202 | keyTab="/home/sshuser/espkafkauser.keytab"
203 | useTicketCache=false
204 | serviceName="kafka"
205 | principal="espkafkauser@TEST.COM";
206 | };
207 | ```
208 | ### Steps to add espkafkauser in the Ranger UI
209 | 1. Go to the overview page of the cluster and use the Ambari UI URL to open Ranger. Sign in with the Ambari UI credentials.
210 | 
211 | ![](media/Azure_Portal_UI.png)
212 | ```
213 | Generic
214 | https://<clustername>.azurehdinsight.net/ranger
215 | 
216 | Example
217 | https://espkafka.azurehdinsight.net/ranger
218 | ```
219 | 
220 | 2. If everything is correct, you will see the Ranger dashboard. Click the Kafka link.
221 | 
222 | ![](media/Ranger_UI.png)
223 | 
224 | 
225 | 3. The policy page shows that some users, such as kafka, have access to perform CRUD operations on all topics.
226 | 
227 | ![](media/Kafk_Policy_UI.png)
228 | 
229 | 
230 | 4. Edit the alltopic policy and add espkafkauser in the Select User dropdown. Click Save Policy after making the changes.
231 | 
232 | ![](media/Edit_Policy_UI.png)
233 | 
234 | ![](media/Add_User.png)
235 | 
236 | 
237 | 5. If you are not able to see the user in the dropdown, that means the user is not available in the AAD domain.
238 | 
239 | 6. Now execute CRUD operations on the head node for verification:
240 | 
241 | ```bash
242 | # Sample command
243 | java -jar -Djava.security.auth.login.config=JAAS_CONFIG_FILE_PATH PRODUCER_CONSUMER_ESP_JAR_PATH create $TOPICNAME $KAFKABROKERS
244 | 
245 | # Create
246 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar create $TOPICNAME $KAFKABROKERS
247 | 
248 | # Describe
249 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar describe $TOPICNAME $KAFKABROKERS
250 | 
251 | # Produce
252 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar producer $TOPICNAME $KAFKABROKERS
253 | 
254 | # Consume
255 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer $TOPICNAME $KAFKABROKERS
256 | 
257 | # Delete
258 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar delete $TOPICNAME $KAFKABROKERS
259 | ```
260 | 
261 | 
262 | ### Multiple consumers
263 | 
264 | Kafka consumers use a consumer group when reading records.
Using the same group with multiple consumers results in load-balanced reads from a topic. Each consumer in the group receives a portion of the records.
265 | 
266 | The consumer application accepts a parameter that is used as the group ID. For example, the following command starts a consumer using a group ID of `myGroup`:
267 | 
268 | ```bash
269 | java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup
270 | ```
271 | 
272 | Use __Ctrl + C__ to exit the consumer.
273 | 
274 | To see this process in action, use the following command:
275 | 
276 | With `kafka` as the user:
277 | ```bash
278 | tmux new-session 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
279 | \; split-window -h 'java -jar -Djava.security.auth.login.config=/usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
280 | \; attach
281 | ```
282 | 
283 | With a custom user:
284 | ```bash
285 | tmux new-session 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
286 | \; split-window -h 'java -jar -Djava.security.auth.login.config=user_jaas.conf kafka-producer-consumer-esp.jar consumer myTest $KAFKABROKERS myGroup' \
287 | \; attach
288 | ```
289 | 
290 | This command uses `tmux` to split the terminal into two columns. A consumer is started in each column, with the same group ID value. Once the consumers finish reading, notice that each read only a portion of the records. Use __Ctrl + C__ twice to exit `tmux`.
291 | 
292 | Consumption by clients within the same group is handled through the partitions for the topic. In this code sample, the `myTest` topic created earlier has eight partitions. If you start eight consumers, each consumer reads records from a single partition for the topic.
293 | 
294 | > [!IMPORTANT]
295 | > There cannot be more consumer instances in a consumer group than partitions. In this example, one consumer group can contain up to eight consumers since that is the number of partitions in the topic. Or you can have multiple consumer groups, each with no more than eight consumers.
296 | 
297 | Records in Kafka are stored in the order they're received within a partition. To achieve in-ordered delivery for records *within a partition*, create a consumer group where the number of consumer instances matches the number of partitions. To achieve in-ordered delivery for records *within the topic*, create a consumer group with only one consumer instance.
298 | 
299 | ## Common Issues faced
300 | 
301 | 1. Topic creation fails
302 | 
303 | 
304 | If your cluster is Enterprise Security Package enabled, use the [pre-built JAR files for producer and consumer](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/Prebuilt-Jars/kafka-producer-consumer-esp.jar).
305 | 
306 | 
307 | The ESP jar can be built from the code in the [`DomainJoined-Producer-Consumer` subdirectory](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/DomainJoined-Producer-Consumer). Note that the producer and consumer properties have an additional property `CommonClientConfigs.SECURITY_PROTOCOL_CONFIG` for ESP enabled clusters.
308 | 
309 | 
310 | 1. 
299 | ## Common issues faced
300 | 
301 | 1. Topic creation fails
302 | 
303 | 
304 | If your cluster is Enterprise Security Package enabled, use the [pre-built JAR files for producer and consumer](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/blob/master/Prebuilt-Jars/kafka-producer-consumer-esp.jar).
305 | 
306 | 
307 | The ESP jar can be built from the code in the [`DomainJoined-Producer-Consumer` subdirectory](https://github.com/Azure-Samples/hdinsight-kafka-java-get-started/tree/master/DomainJoined-Producer-Consumer). Note that the producer and consumer properties have an additional property, `CommonClientConfigs.SECURITY_PROTOCOL_CONFIG`, for ESP-enabled clusters.
308 | 
309 | 
310 | 1. Produce or consume operations fail on ESP-enabled clusters
311 | 
312 | If produce and consume operations fail, and you are using an ESP-enabled cluster, check that the user `kafka` is present in all Ranger policies. If it is not present, add it to all Ranger policies.
313 | 
314 | 
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Add_User.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Add_User.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Azure_Portal_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Azure_Portal_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Edit_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Edit_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Kafk_Policy_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Kafk_Policy_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/media/Ranger_UI.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/DomainJoined-Producer-Consumer/media/Ranger_UI.png
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/pom.xml:
--------------------------------------------------------------------------------
1 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2 |          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3 |   <modelVersion>4.0.0</modelVersion>
4 |   <groupId>com.microsoft.example</groupId>
5 |   <artifactId>kafka-producer-consumer-esp</artifactId>
6 |   <packaging>jar</packaging>
7 |   <version>1.0-SNAPSHOT</version>
8 |   <name>kafka-producer-consumer</name>
9 |   <url>http://maven.apache.org</url>
10 | 
11 |   <properties>
12 |     <kafka.version>2.1.1</kafka.version>
13 |   </properties>
14 | 
15 |   <dependencies>
16 |     <dependency>
17 |       <groupId>org.apache.kafka</groupId>
18 |       <artifactId>kafka-clients</artifactId>
19 |       <version>${kafka.version}</version>
20 |     </dependency>
21 |   </dependencies>
22 |   <build>
23 |     <plugins>
24 |       <plugin>
25 |         <groupId>org.apache.maven.plugins</groupId>
26 |         <artifactId>maven-compiler-plugin</artifactId>
27 |         <version>3.3</version>
28 |         <configuration>
29 | 
30 |           <source>1.8</source>
31 |           <target>1.8</target>
32 |         </configuration>
33 |       </plugin>
34 | 
35 |       <plugin>
36 |         <groupId>org.apache.maven.plugins</groupId>
37 |         <artifactId>maven-shade-plugin</artifactId>
38 |         <version>2.3</version>
39 |         <configuration>
40 |           <transformers>
41 |             <transformer
42 |                 implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
43 | 
44 | 
45 |               <mainClass>com.microsoft.example.Run</mainClass>
46 |             </transformer>
47 |           </transformers>
48 | 
49 |           <filters>
50 |             <filter>
51 |               <artifact>*:*</artifact>
52 |               <excludes>
53 |                 <exclude>META-INF/*.SF</exclude>
54 |                 <exclude>META-INF/*.DSA</exclude>
55 |                 <exclude>META-INF/*.RSA</exclude>
56 |               </excludes>
57 |             </filter>
58 |           </filters>
59 |         </configuration>
60 |         <executions>
61 |           <execution>
62 |             <phase>package</phase>
63 |             <goals>
64 |               <goal>shade</goal>
65 |             </goals>
66 |           </execution>
67 |         </executions>
68 |       </plugin>
69 |     </plugins>
70 |   </build>
71 | </project>
72 | 
--------------------------------------------------------------------------------
/DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/AdminClientWrapper.java:
--------------------------------------------------------------------------------
1 | package com.microsoft.example;
2 | 
3 | import org.apache.kafka.clients.producer.ProducerConfig;
4 | import 
org.apache.kafka.clients.admin.AdminClient; 5 | import org.apache.kafka.clients.admin.DescribeTopicsResult; 6 | import org.apache.kafka.clients.admin.CreateTopicsResult; 7 | import org.apache.kafka.clients.admin.DeleteTopicsResult; 8 | import org.apache.kafka.clients.admin.TopicDescription; 9 | import org.apache.kafka.clients.admin.NewTopic; 10 | 11 | import org.apache.kafka.clients.admin.KafkaAdminClient; 12 | import org.apache.kafka.clients.CommonClientConfigs; 13 | 14 | 15 | import java.util.Collection; 16 | import java.util.Collections; 17 | import java.util.concurrent.ExecutionException; 18 | import java.util.Properties; 19 | import java.util.Random; 20 | import java.io.IOException; 21 | 22 | 23 | public class AdminClientWrapper { 24 | 25 | public static Properties getProperties(String brokers) { 26 | Properties properties = new Properties(); 27 | properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers); 28 | 29 | // Set how to serialize key/value pairs 30 | properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer"); 31 | properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer"); 32 | // specify the protocol for Domain Joined clusters 33 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT"); 34 | 35 | return properties; 36 | } 37 | 38 | public static void describeTopics(String brokers, String topicName) throws IOException { 39 | // Set properties used to configure admin client 40 | Properties properties = getProperties(brokers); 41 | 42 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 43 | // Make async call to describe the topic. 44 | final DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(Collections.singleton(topicName)); 45 | 46 | TopicDescription description = describeTopicsResult.values().get(topicName).get(); 47 | System.out.print(description.toString()); 48 | } catch (Exception e) { 49 | System.out.print("Describe denied\n"); 50 | System.out.print(e.getMessage()); 51 | //throw new RuntimeException(e.getMessage(), e); 52 | } 53 | } 54 | 55 | public static void deleteTopics(String brokers, String topicName) throws IOException { 56 | // Set properties used to configure admin client 57 | Properties properties = getProperties(brokers); 58 | 59 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 60 | final DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Collections.singleton(topicName)); 61 | deleteTopicsResult.values().get(topicName).get(); 62 | System.out.print("Topic " + topicName + " deleted"); 63 | } catch (Exception e) { 64 | System.out.print("Delete Topics denied\n"); 65 | System.out.print(e.getMessage()); 66 | //throw new RuntimeException(e.getMessage(), e); 67 | } 68 | } 69 | 70 | public static void createTopics(String brokers, String topicName) throws IOException { 71 | // Set properties used to configure admin client 72 | Properties properties = getProperties(brokers); 73 | 74 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 75 | int numPartitions = 8; 76 | short replicationFactor = (short)3; 77 | final NewTopic newTopic = new NewTopic(topicName, numPartitions, replicationFactor); 78 | 79 | final CreateTopicsResult createTopicsResult = adminClient.createTopics(Collections.singleton(newTopic)); 80 | createTopicsResult.values().get(topicName).get(); 81 | 
System.out.print("Topic " + topicName + " created"); 82 | } catch (Exception e) { 83 | System.out.print("Create Topics denied\n"); 84 | System.out.print(e.getMessage()); 85 | //throw new RuntimeException(e.getMessage(), e); 86 | } 87 | } 88 | } 89 | -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Consumer.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import org.apache.kafka.clients.consumer.KafkaConsumer; 4 | import org.apache.kafka.clients.consumer.ConsumerRecords; 5 | import org.apache.kafka.clients.consumer.ConsumerRecord; 6 | import org.apache.kafka.clients.CommonClientConfigs; 7 | import java.util.Properties; 8 | import java.util.Arrays; 9 | 10 | public class Consumer { 11 | public static int consume(String brokers, String groupId, String topicName) { 12 | // Create a consumer 13 | KafkaConsumer consumer; 14 | // Configure the consumer 15 | Properties properties = new Properties(); 16 | // Point it to the brokers 17 | properties.setProperty("bootstrap.servers", brokers); 18 | // Set the consumer group (all consumers must belong to a group). 19 | properties.setProperty("group.id", groupId); 20 | // Set how to serialize key/value pairs 21 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer"); 22 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer"); 23 | // When a group is first created, it has no offset stored to start reading from. This tells it to start 24 | // with the earliest record in the stream. 25 | properties.setProperty("auto.offset.reset","earliest"); 26 | 27 | // specify the protocol for Domain Joined clusters 28 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT"); 29 | 30 | consumer = new KafkaConsumer<>(properties); 31 | 32 | // Subscribe to the 'test' topic 33 | consumer.subscribe(Arrays.asList(topicName)); 34 | 35 | // Loop until ctrl + c 36 | int count = 0; 37 | while(true) { 38 | // Poll for records 39 | ConsumerRecords records = consumer.poll(200); 40 | // Did we get any? 
41 | if (records.count() == 0) { 42 | // timeout/nothing to read 43 | } else { 44 | // Yes, loop over records 45 | for(ConsumerRecord record: records) { 46 | // Display record and count 47 | count += 1; 48 | System.out.println( count + ": " + record.value()); 49 | } 50 | } 51 | } 52 | } 53 | } 54 | -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Producer.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import org.apache.kafka.clients.producer.KafkaProducer; 4 | import org.apache.kafka.clients.producer.ProducerRecord; 5 | import org.apache.kafka.clients.producer.ProducerConfig; 6 | import org.apache.kafka.clients.admin.AdminClient; 7 | import org.apache.kafka.clients.admin.DescribeTopicsResult; 8 | import org.apache.kafka.clients.admin.KafkaAdminClient; 9 | import org.apache.kafka.clients.CommonClientConfigs; 10 | import org.apache.kafka.clients.admin.TopicDescription; 11 | 12 | import java.util.Collection; 13 | import java.util.Collections; 14 | import java.util.concurrent.ExecutionException; 15 | import java.util.Properties; 16 | import java.util.Random; 17 | import java.io.IOException; 18 | 19 | public class Producer 20 | { 21 | public static void produce(String brokers, String topicName) throws IOException 22 | { 23 | 24 | // Set properties used to configure the producer 25 | Properties properties = new Properties(); 26 | // Set the brokers (bootstrap servers) 27 | properties.setProperty("bootstrap.servers", brokers); 28 | // Set how to serialize key/value pairs 29 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer"); 30 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer"); 31 | // specify the protocol for Domain Joined clusters 32 | properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT"); 33 | 34 | KafkaProducer producer = new KafkaProducer<>(properties); 35 | 36 | // So we can generate random sentences 37 | Random random = new Random(); 38 | String[] sentences = new String[] { 39 | "the cow jumped over the moon", 40 | "an apple a day keeps the doctor away", 41 | "four score and seven years ago", 42 | "snow white and the seven dwarfs", 43 | "i am at two with nature" 44 | }; 45 | 46 | String progressAnimation = "|/-\\"; 47 | // Produce a bunch of records 48 | for(int i = 0; i < 100; i++) { 49 | // Pick a sentence at random 50 | String sentence = sentences[random.nextInt(sentences.length)]; 51 | // Send the sentence to the test topic 52 | try 53 | { 54 | producer.send(new ProducerRecord(topicName, sentence)).get(); 55 | } 56 | catch (Exception ex) 57 | { 58 | System.out.print(ex.getMessage()); 59 | throw new IOException(ex.toString()); 60 | } 61 | String progressBar = "\r" + progressAnimation.charAt(i % progressAnimation.length()) + " " + i; 62 | System.out.write(progressBar.getBytes()); 63 | } 64 | } 65 | } 66 | -------------------------------------------------------------------------------- /DomainJoined-Producer-Consumer/src/main/java/com/microsoft/example/Run.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import java.io.IOException; 4 | import java.util.UUID; 5 | import java.io.PrintWriter; 6 | import java.io.File; 7 | import java.lang.Exception; 8 | 9 | // Handle starting producer or consumer 10 | 
public class Run { 11 | public static void main(String[] args) throws IOException { 12 | if(args.length < 3) { 13 | usage(); 14 | } 15 | // Get the brokers 16 | String brokers = args[2]; 17 | String topicName = args[1]; 18 | switch(args[0].toLowerCase()) { 19 | case "producer": 20 | Producer.produce(brokers, topicName); 21 | break; 22 | case "consumer": 23 | // Either a groupId was passed in, or we need a random one 24 | String groupId; 25 | if(args.length == 4) { 26 | groupId = args[3]; 27 | } else { 28 | groupId = UUID.randomUUID().toString(); 29 | } 30 | Consumer.consume(brokers, groupId, topicName); 31 | break; 32 | case "describe": 33 | AdminClientWrapper.describeTopics(brokers, topicName); 34 | break; 35 | case "create": 36 | AdminClientWrapper.createTopics(brokers, topicName); 37 | break; 38 | case "delete": 39 | AdminClientWrapper.deleteTopics(brokers, topicName); 40 | break; 41 | default: 42 | usage(); 43 | } 44 | System.exit(0); 45 | } 46 | // Display usage 47 | public static void usage() { 48 | System.out.println("Usage:"); 49 | System.out.println("kafka-example.jar brokerhosts [groupid]"); 50 | System.exit(1); 51 | } 52 | } 53 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Microsoft Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
-------------------------------------------------------------------------------- /Prebuilt-Jars/kafka-producer-consumer-esp.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/Prebuilt-Jars/kafka-producer-consumer-esp.jar -------------------------------------------------------------------------------- /Prebuilt-Jars/kafka-producer-consumer-tls-esp.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/Prebuilt-Jars/kafka-producer-consumer-tls-esp.jar -------------------------------------------------------------------------------- /Prebuilt-Jars/kafka-producer-consumer.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure-Samples/hdinsight-kafka-java-get-started/28497e8edf677bf63b4c50a0e1f6a25b6d3f35a7/Prebuilt-Jars/kafka-producer-consumer.jar -------------------------------------------------------------------------------- /Producer-Consumer/.gitignore: -------------------------------------------------------------------------------- 1 | target/ 2 | pom.xml.tag 3 | pom.xml.releaseBackup 4 | pom.xml.versionsBackup 5 | pom.xml.next 6 | release.properties 7 | dependency-reduced-pom.xml 8 | buildNumber.properties 9 | .mvn/timing.properties 10 | .idea/ 11 | *.log 12 | .classpath 13 | .project 14 | .settings/ 15 | *.iml -------------------------------------------------------------------------------- /Producer-Consumer/pom.xml: -------------------------------------------------------------------------------- 1 | 3 | 4.0.0 4 | com.microsoft.example 5 | kafka-producer-consumer 6 | jar 7 | 1.0-SNAPSHOT 8 | kafka-producer-consumer 9 | http://maven.apache.org 10 | 11 | 12 | 2.1.1 13 | 14 | 15 | 16 | 17 | org.apache.kafka 18 | kafka-clients 19 | ${kafka.version} 20 | 21 | 22 | 23 | 24 | 25 | org.apache.maven.plugins 26 | maven-compiler-plugin 27 | 3.3 28 | 29 | 30 | 1.8 31 | 1.8 32 | 33 | 34 | 35 | 36 | org.apache.maven.plugins 37 | maven-shade-plugin 38 | 2.3 39 | 40 | 41 | 42 | 43 | 44 | 45 | com.microsoft.example.Run 46 | 47 | 48 | 49 | 50 | 51 | *:* 52 | 53 | META-INF/*.SF 54 | META-INF/*.DSA 55 | META-INF/*.RSA 56 | 57 | 58 | 59 | 60 | 61 | 62 | package 63 | 64 | shade 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | -------------------------------------------------------------------------------- /Producer-Consumer/src/main/java/com/microsoft/example/AdminClientWrapper.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import org.apache.kafka.clients.producer.ProducerConfig; 4 | import org.apache.kafka.clients.admin.AdminClient; 5 | import org.apache.kafka.clients.admin.DescribeTopicsResult; 6 | import org.apache.kafka.clients.admin.CreateTopicsResult; 7 | import org.apache.kafka.clients.admin.DeleteTopicsResult; 8 | import org.apache.kafka.clients.admin.TopicDescription; 9 | import org.apache.kafka.clients.admin.NewTopic; 10 | 11 | import org.apache.kafka.clients.admin.KafkaAdminClient; 12 | import org.apache.kafka.clients.CommonClientConfigs; 13 | 14 | 15 | import java.util.Collection; 16 | import java.util.Collections; 17 | import java.util.concurrent.ExecutionException; 18 | import java.util.Properties; 19 | import java.util.Random; 20 | import 
java.io.IOException; 21 | 22 | 23 | public class AdminClientWrapper { 24 | 25 | public static Properties getProperties(String brokers) { 26 | Properties properties = new Properties(); 27 | properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers); 28 | 29 | // Set how to serialize key/value pairs 30 | properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer"); 31 | properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer"); 32 | // specify the protocol for Domain Joined clusters 33 | //properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT"); 34 | 35 | return properties; 36 | } 37 | 38 | public static void describeTopics(String brokers, String topicName) throws IOException { 39 | // Set properties used to configure admin client 40 | Properties properties = getProperties(brokers); 41 | 42 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 43 | // Make async call to describe the topic. 44 | final DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(Collections.singleton(topicName)); 45 | 46 | TopicDescription description = describeTopicsResult.values().get(topicName).get(); 47 | System.out.print(description.toString()); 48 | } catch (Exception e) { 49 | System.out.print("Describe denied\n"); 50 | System.out.print(e.getMessage()); 51 | //throw new RuntimeException(e.getMessage(), e); 52 | } 53 | } 54 | 55 | public static void deleteTopics(String brokers, String topicName) throws IOException { 56 | // Set properties used to configure admin client 57 | Properties properties = getProperties(brokers); 58 | 59 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 60 | final DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Collections.singleton(topicName)); 61 | deleteTopicsResult.values().get(topicName).get(); 62 | System.out.print("Topic " + topicName + " deleted"); 63 | } catch (Exception e) { 64 | System.out.print("Delete Topics denied\n"); 65 | System.out.print(e.getMessage()); 66 | //throw new RuntimeException(e.getMessage(), e); 67 | } 68 | } 69 | 70 | public static void createTopics(String brokers, String topicName) throws IOException { 71 | // Set properties used to configure admin client 72 | Properties properties = getProperties(brokers); 73 | 74 | try (final AdminClient adminClient = KafkaAdminClient.create(properties)) { 75 | int numPartitions = 8; 76 | short replicationFactor = (short)3; 77 | final NewTopic newTopic = new NewTopic(topicName, numPartitions, replicationFactor); 78 | 79 | final CreateTopicsResult createTopicsResult = adminClient.createTopics(Collections.singleton(newTopic)); 80 | createTopicsResult.values().get(topicName).get(); 81 | System.out.print("Topic " + topicName + " created"); 82 | } catch (Exception e) { 83 | System.out.print("Create Topics denied\n"); 84 | System.out.print(e.getMessage()); 85 | //throw new RuntimeException(e.getMessage(), e); 86 | } 87 | } 88 | } 89 | -------------------------------------------------------------------------------- /Producer-Consumer/src/main/java/com/microsoft/example/Consumer.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import org.apache.kafka.clients.consumer.KafkaConsumer; 4 | import org.apache.kafka.clients.consumer.ConsumerRecords; 5 | import 
org.apache.kafka.clients.consumer.ConsumerRecord; 6 | import java.util.Properties; 7 | import java.util.Arrays; 8 | 9 | public class Consumer { 10 | public static int consume(String brokers, String groupId, String topicName) { 11 | // Create a consumer 12 | KafkaConsumer consumer; 13 | // Configure the consumer 14 | Properties properties = new Properties(); 15 | // Point it to the brokers 16 | properties.setProperty("bootstrap.servers", brokers); 17 | // Set the consumer group (all consumers must belong to a group). 18 | properties.setProperty("group.id", groupId); 19 | // Set how to serialize key/value pairs 20 | properties.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer"); 21 | properties.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer"); 22 | // When a group is first created, it has no offset stored to start reading from. This tells it to start 23 | // with the earliest record in the stream. 24 | properties.setProperty("auto.offset.reset","earliest"); 25 | 26 | // specify the protocol for Domain Joined clusters 27 | //properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT"); 28 | 29 | consumer = new KafkaConsumer<>(properties); 30 | 31 | // Subscribe to the 'test' topic 32 | consumer.subscribe(Arrays.asList(topicName)); 33 | 34 | // Loop until ctrl + c 35 | int count = 0; 36 | while(true) { 37 | // Poll for records 38 | ConsumerRecords records = consumer.poll(200); 39 | // Did we get any? 40 | if (records.count() == 0) { 41 | // timeout/nothing to read 42 | } else { 43 | // Yes, loop over records 44 | for(ConsumerRecord record: records) { 45 | // Display record and count 46 | count += 1; 47 | System.out.println( count + ": " + record.value()); 48 | } 49 | } 50 | } 51 | } 52 | } 53 | -------------------------------------------------------------------------------- /Producer-Consumer/src/main/java/com/microsoft/example/Producer.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import org.apache.kafka.clients.producer.KafkaProducer; 4 | import org.apache.kafka.clients.producer.ProducerRecord; 5 | import org.apache.kafka.clients.producer.ProducerConfig; 6 | import org.apache.kafka.clients.admin.AdminClient; 7 | import org.apache.kafka.clients.admin.DescribeTopicsResult; 8 | import org.apache.kafka.clients.admin.KafkaAdminClient; 9 | import org.apache.kafka.clients.CommonClientConfigs; 10 | import org.apache.kafka.clients.admin.TopicDescription; 11 | 12 | import java.util.Collection; 13 | import java.util.Collections; 14 | import java.util.concurrent.ExecutionException; 15 | import java.util.Properties; 16 | import java.util.Random; 17 | import java.io.IOException; 18 | 19 | public class Producer 20 | { 21 | public static void produce(String brokers, String topicName) throws IOException 22 | { 23 | 24 | // Set properties used to configure the producer 25 | Properties properties = new Properties(); 26 | // Set the brokers (bootstrap servers) 27 | properties.setProperty("bootstrap.servers", brokers); 28 | // Set how to serialize key/value pairs 29 | properties.setProperty("key.serializer","org.apache.kafka.common.serialization.StringSerializer"); 30 | properties.setProperty("value.serializer","org.apache.kafka.common.serialization.StringSerializer"); 31 | // specify the protocol for Domain Joined clusters 32 | //properties.setProperty(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT"); 33 | 
34 | KafkaProducer producer = new KafkaProducer<>(properties); 35 | 36 | // So we can generate random sentences 37 | Random random = new Random(); 38 | String[] sentences = new String[] { 39 | "the cow jumped over the moon", 40 | "an apple a day keeps the doctor away", 41 | "four score and seven years ago", 42 | "snow white and the seven dwarfs", 43 | "i am at two with nature" 44 | }; 45 | 46 | String progressAnimation = "|/-\\"; 47 | // Produce a bunch of records 48 | for(int i = 0; i < 100; i++) { 49 | // Pick a sentence at random 50 | String sentence = sentences[random.nextInt(sentences.length)]; 51 | // Send the sentence to the test topic 52 | try 53 | { 54 | producer.send(new ProducerRecord(topicName, sentence)).get(); 55 | } 56 | catch (Exception ex) 57 | { 58 | System.out.print(ex.getMessage()); 59 | throw new IOException(ex.toString()); 60 | } 61 | String progressBar = "\r" + progressAnimation.charAt(i % progressAnimation.length()) + " " + i; 62 | System.out.write(progressBar.getBytes()); 63 | } 64 | } 65 | } 66 | -------------------------------------------------------------------------------- /Producer-Consumer/src/main/java/com/microsoft/example/Run.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import java.io.IOException; 4 | import java.util.UUID; 5 | import java.io.PrintWriter; 6 | import java.io.File; 7 | import java.lang.Exception; 8 | 9 | // Handle starting producer or consumer 10 | public class Run { 11 | public static void main(String[] args) throws IOException { 12 | if(args.length < 3) { 13 | usage(); 14 | } 15 | // Get the brokers 16 | String brokers = args[2]; 17 | String topicName = args[1]; 18 | switch(args[0].toLowerCase()) { 19 | case "producer": 20 | Producer.produce(brokers, topicName); 21 | break; 22 | case "consumer": 23 | // Either a groupId was passed in, or we need a random one 24 | String groupId; 25 | if(args.length == 4) { 26 | groupId = args[3]; 27 | } else { 28 | groupId = UUID.randomUUID().toString(); 29 | } 30 | Consumer.consume(brokers, groupId, topicName); 31 | break; 32 | case "describe": 33 | AdminClientWrapper.describeTopics(brokers, topicName); 34 | break; 35 | case "create": 36 | AdminClientWrapper.createTopics(brokers, topicName); 37 | break; 38 | case "delete": 39 | AdminClientWrapper.deleteTopics(brokers, topicName); 40 | break; 41 | default: 42 | usage(); 43 | } 44 | System.exit(0); 45 | } 46 | // Display usage 47 | public static void usage() { 48 | System.out.println("Usage:"); 49 | System.out.println("kafka-example.jar brokerhosts [groupid]"); 50 | System.exit(1); 51 | } 52 | } 53 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | --- 2 | page_type: sample 3 | languages: 4 | - java 5 | products: 6 | - azure 7 | - azure-hdinsight 8 | description: "The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kafka on HDInsight cluster." 9 | urlFragment: hdinsight-kafka-java-get-started 10 | --- 11 | 12 | # Java-based example of using the Kafka Consumer, Producer, and Streaming APIs 13 | 14 | The examples in this repository demonstrate how to use the Kafka Consumer, Producer, and Streaming APIs with a Kafka on HDInsight cluster. 
15 | 
16 | There are two projects included in this repository:
17 | 
18 | * Producer-Consumer: This contains a producer and consumer that use a Kafka topic named `test`.
19 | 
20 | * Streaming: This contains an application that uses the Kafka streaming API (in Kafka 0.10.0 or higher) that reads data from the `test` topic, splits the data into words, and writes a count of words into the `wordcounts` topic.
21 | 
22 | NOTE: Both projects assume Kafka 0.10.0, which is available with Kafka on HDInsight cluster version 3.6.
23 | 
24 | ## Producer and Consumer
25 | 
26 | To run the consumer and producer example, use the following steps:
27 | 
28 | 1. Fork/Clone the repository to your development environment.
29 | 
30 | 2. Install Java JDK 8 or higher. This was tested with Oracle Java 8, but it should also work with OpenJDK.
31 | 
32 | 3. Install [Maven](http://maven.apache.org/).
33 | 
34 | 4. Assuming Java and Maven are both on the path, and `JAVA_HOME` is configured correctly, use the following commands to build the consumer and producer example:
35 | 
36 |         cd Producer-Consumer
37 |         mvn clean package
38 | 
39 |     A file named `kafka-producer-consumer-1.0-SNAPSHOT.jar` is now available in the `target` directory.
40 | 
41 | 5. Use SCP to upload the file to the Kafka cluster:
42 | 
43 |         scp ./target/kafka-producer-consumer-1.0-SNAPSHOT.jar SSHUSER@CLUSTERNAME-ssh.azurehdinsight.net:kafka-producer-consumer.jar
44 | 
45 |     Replace **SSHUSER** with the SSH user for your cluster, and replace **CLUSTERNAME** with the name of your cluster. When prompted, enter the password for the SSH user.
46 | 
47 | 6. Use SSH to connect to the cluster:
48 | 
49 |         ssh SSHUSER@CLUSTERNAME-ssh.azurehdinsight.net
50 | 
51 | 7. Use the following commands in the SSH session to get the Zookeeper hosts and Kafka brokers for the cluster. You need this information when working with Kafka. Note that `jq` is also installed, as it makes it easier to parse the JSON returned from Ambari. The commands read the cluster login (admin) password from `$PASSWORD` and the name of the Kafka on HDInsight cluster from `$CLUSTERNAME`, so set those variables first.
52 | 
53 |         sudo apt -y install jq
54 |         export KAFKAZKHOSTS=`curl -sS -u admin:$PASSWORD -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER | jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' | cut -d',' -f1,2`
55 | 
56 |         export KAFKABROKERS=`curl -sS -u admin:$PASSWORD -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2`
57 | 
58 | 8. Use the following to verify that the environment variables have been correctly populated:
59 | 
60 |         echo '$KAFKAZKHOSTS='$KAFKAZKHOSTS
61 |         echo '$KAFKABROKERS='$KAFKABROKERS
62 | 
63 |     The following is an example of the contents of `$KAFKAZKHOSTS`:
64 | 
65 |         zk0-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181,zk2-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181
66 | 
67 |     The following is an example of the contents of `$KAFKABROKERS`:
68 | 
69 |         wn1-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092,wn0-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092
70 | 
71 | NOTE: This information may change as you perform scaling operations on the cluster, as this adds and removes worker nodes. You should always retrieve the Zookeeper and Broker information before working with Kafka.
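The trailing `cut -d',' -f1,2` in each of the export commands above is what trims the comma-separated host list down to its first two entries. A quick illustration with made-up host names:

        echo 'wn0:9092,wn1:9092,wn2:9092' | cut -d',' -f1,2
        # prints: wn0:9092,wn1:9092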
72 | 
73 | IMPORTANT: You don't have to provide all broker or Zookeeper nodes. A connection to one broker or Zookeeper node can be used to learn about the others. In this example, the list of hosts is trimmed to two entries.
74 | 
75 | 9. This example uses a topic named `test`. Use the following to create this topic:
76 | 
77 |         /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 8 --topic test --zookeeper $KAFKAZKHOSTS
78 | 
79 | 10. Use the producer-consumer example to write records to the topic:
80 | 
81 |         java -jar kafka-producer-consumer.jar producer test $KAFKABROKERS
82 | 
83 |     A counter displays how many records have been written.
84 | 
85 | 11. Use the producer-consumer to read the records that were just written:
86 | 
87 |         java -jar kafka-producer-consumer.jar consumer test $KAFKABROKERS
88 | 
89 |     This returns a list of the random sentences, along with a count of how many are read.
90 | 
91 | ## Streaming
92 | 
93 | NOTE: The streaming example expects that you have already set up the `test` topic from the previous section.
94 | 
95 | 1. On your development environment, change to the `Streaming` directory and use the following to create a jar for this project:
96 | 
97 |         mvn clean package
98 | 
99 | 2. Use SCP to copy the `kafka-streaming-1.0-SNAPSHOT.jar` file to your HDInsight cluster:
100 | 
101 |         scp ./target/kafka-streaming-1.0-SNAPSHOT.jar SSHUSER@CLUSTERNAME-ssh.azurehdinsight.net:kafka-streaming.jar
102 | 
103 |     Replace **SSHUSER** with the SSH user for your cluster, and replace **CLUSTERNAME** with the name of your cluster. When prompted, enter the password for the SSH user.
104 | 
105 | 3. Once the file has been uploaded, return to the SSH connection to your HDInsight cluster and use the following commands to create the `wordcounts` and `wordcount-example-Counts-changelog` topics:
106 | 
107 |         /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 8 --topic wordcounts --zookeeper $KAFKAZKHOSTS
108 |         /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 8 --topic wordcount-example-Counts-changelog --zookeeper $KAFKAZKHOSTS
109 | 
110 | 4. Use the following command to start the streaming process in the background:
111 | 
112 |         java -jar kafka-streaming.jar $KAFKABROKERS 2>/dev/null &
113 | 
114 | 5. While it is running, use the producer to send messages to the `test` topic:
115 | 
116 |         java -jar kafka-producer-consumer.jar producer test $KAFKABROKERS &>/dev/null &
117 | 
118 | 6. Use the following to view the output that is written to the `wordcounts` topic:
119 | 
120 |         /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic wordcounts --from-beginning --formatter kafka.tools.DefaultMessageFormatter --property print.key=true --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
121 | 
122 |     NOTE: You have to tell the consumer to print the key (which contains the word value) and which deserializers to use for the key and value in order to view the data.
123 | 
124 |     The output is similar to the following:
125 | 
126 |         dwarfs 13635
127 |         ago 13664
128 |         snow 13636
129 |         dwarfs 13636
130 |         ago 13665
131 |         a 13803
132 |         ago 13666
133 |         a 13804
134 |         ago 13667
135 |         ago 13668
136 |         jumped 13640
137 |         jumped 13641
138 |         a 13805
139 |         snow 13637
140 | 
141 | 7. 
Use __Ctrl + C__ to exit the consumer, then use the `fg` command to bring the streaming background task to the foreground. Use __Ctrl + C__ to exit it also. 142 | -------------------------------------------------------------------------------- /Streaming/.gitignore: -------------------------------------------------------------------------------- 1 | target/ 2 | pom.xml.tag 3 | pom.xml.releaseBackup 4 | pom.xml.versionsBackup 5 | pom.xml.next 6 | release.properties 7 | dependency-reduced-pom.xml 8 | buildNumber.properties 9 | .mvn/timing.properties 10 | .idea/ 11 | *.log 12 | .classpath 13 | .project 14 | .settings/ 15 | *.iml -------------------------------------------------------------------------------- /Streaming/pom.xml: -------------------------------------------------------------------------------- 1 | 3 | 4.0.0 4 | com.microsoft.example 5 | kafka-streaming 6 | jar 7 | 1.0-SNAPSHOT 8 | kafka-streaming 9 | http://maven.apache.org 10 | 11 | 0.10.0.0 12 | 13 | 14 | 15 | org.apache.kafka 16 | kafka-streams 17 | ${kafka.version} 18 | 19 | 20 | 21 | 22 | 23 | org.apache.maven.plugins 24 | maven-compiler-plugin 25 | 3.3 26 | 27 | 1.8 28 | 1.8 29 | 30 | 31 | 32 | 33 | org.apache.maven.plugins 34 | maven-shade-plugin 35 | 2.3 36 | 37 | 38 | 39 | 40 | 41 | 42 | com.microsoft.example.Stream 43 | 44 | 45 | 46 | 47 | 48 | *:* 49 | 50 | META-INF/*.SF 51 | META-INF/*.DSA 52 | META-INF/*.RSA 53 | 54 | 55 | 56 | 57 | 58 | 59 | package 60 | 61 | shade 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | -------------------------------------------------------------------------------- /Streaming/src/main/java/com/microsoft/example/Stream.java: -------------------------------------------------------------------------------- 1 | package com.microsoft.example; 2 | 3 | import org.apache.kafka.common.serialization.Serde; 4 | import org.apache.kafka.common.serialization.Serdes; 5 | import org.apache.kafka.streams.KafkaStreams; 6 | import org.apache.kafka.streams.KeyValue; 7 | import org.apache.kafka.streams.StreamsConfig; 8 | import org.apache.kafka.streams.kstream.KStream; 9 | import org.apache.kafka.streams.kstream.KStreamBuilder; 10 | 11 | import java.util.Arrays; 12 | import java.util.Properties; 13 | 14 | public class Stream 15 | { 16 | public static void main( String[] args ) { 17 | Properties streamsConfig = new Properties(); 18 | // The name must be unique on the Kafka cluster 19 | streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-example"); 20 | // Brokers 21 | streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, args[0]); 22 | // Zookeeper 23 | //streamsConfig.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, args[1]); 24 | // SerDes for key and values 25 | streamsConfig.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName()); 26 | streamsConfig.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName()); 27 | 28 | // Serdes for the word and count 29 | Serde stringSerde = Serdes.String(); 30 | Serde longSerde = Serdes.Long(); 31 | 32 | KStreamBuilder builder = new KStreamBuilder(); 33 | KStream sentences = builder.stream(stringSerde, stringSerde, "test"); 34 | KStream wordCounts = sentences 35 | .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+"))) 36 | .map((key, word) -> new KeyValue<>(word, word)) 37 | .countByKey("Counts") 38 | .toStream(); 39 | wordCounts.to(stringSerde, longSerde, "wordcounts"); 40 | 41 | KafkaStreams streams = new KafkaStreams(builder, streamsConfig); 42 | streams.start(); 43 | 44 | 
Runtime.getRuntime().addShutdownHook(new Thread(streams::close)); 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /azuredeploy.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", 3 | "contentVersion": "1.0.0.0", 4 | "parameters": { 5 | "clusterName": { 6 | "type": "string", 7 | "metadata": { 8 | "description": "The name of the Kafka cluster to create. This must be a unique name." 9 | } 10 | }, 11 | "clusterLoginUserName": { 12 | "type": "string", 13 | "defaultValue": "admin", 14 | "metadata": { 15 | "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards." 16 | } 17 | }, 18 | "clusterLoginPassword": { 19 | "type": "securestring", 20 | "metadata": { 21 | "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter." 22 | } 23 | }, 24 | "sshUserName": { 25 | "type": "string", 26 | "defaultValue": "sshuser", 27 | "metadata": { 28 | "description": "These credentials can be used to remotely access the cluster." 29 | } 30 | }, 31 | "sshPassword": { 32 | "type": "securestring", 33 | "metadata": { 34 | "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter." 35 | } 36 | } 37 | }, 38 | "variables": { 39 | "defaultStorageAccount": { 40 | "name": "[uniqueString(resourceGroup().id)]", 41 | "type": "Standard_LRS" 42 | } 43 | }, 44 | "resources": [ 45 | { 46 | "type": "Microsoft.Storage/storageAccounts", 47 | "name": "[variables('defaultStorageAccount').name]", 48 | "location": "[resourceGroup().location]", 49 | "apiVersion": "2016-01-01", 50 | "sku": { 51 | "name": "[variables('defaultStorageAccount').type]" 52 | }, 53 | "kind": "Storage", 54 | "properties": {} 55 | }, 56 | { 57 | "name": "[parameters('clusterName')]", 58 | "type": "Microsoft.HDInsight/clusters", 59 | "location": "[resourceGroup().location]", 60 | "apiVersion": "2015-03-01-preview", 61 | "dependsOn": [ 62 | "[concat('Microsoft.Storage/storageAccounts/',variables('defaultStorageAccount').name)]" 63 | ], 64 | "tags": { }, 65 | "properties": { 66 | "clusterVersion": "3.6", 67 | "osType": "Linux", 68 | "clusterDefinition": { 69 | "kind": "kafka", 70 | 71 | "configurations": { 72 | "gateway": { 73 | "restAuthCredential.isEnabled": true, 74 | "restAuthCredential.username": "[parameters('clusterLoginUserName')]", 75 | "restAuthCredential.password": "[parameters('clusterLoginPassword')]" 76 | } 77 | } 78 | }, 79 | "storageProfile": { 80 | "storageaccounts": [ 81 | { 82 | "name": "[replace(replace(concat(reference(concat('Microsoft.Storage/storageAccounts/', variables('defaultStorageAccount').name), '2016-01-01').primaryEndpoints.blob),'https:',''),'/','')]", 83 | "isDefault": true, 84 | "container": "[parameters('clusterName')]", 85 | "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2016-01-01').keys[0].value]" 86 | } 87 | ] 88 | }, 89 | "computeProfile": { 90 | "roles": [ 91 | { 92 | "name": "headnode", 93 | "targetInstanceCount": "2", 94 | "hardwareProfile": { 95 | "vmSize": "Standard_D3_v2" 96 | }, 97 | "osProfile": { 98 | "linuxOperatingSystemProfile": { 99 | "username": "[parameters('sshUserName')]", 100 | 
"password": "[parameters('sshPassword')]" 101 | } 102 | } 103 | }, 104 | { 105 | "name": "workernode", 106 | "targetInstanceCount": 4, 107 | "hardwareProfile": { 108 | "vmSize": "Standard_D3_v2" 109 | }, 110 | "dataDisksGroups": [ 111 | { 112 | "disksPerNode": 2 113 | } 114 | ], 115 | "osProfile": { 116 | "linuxOperatingSystemProfile": { 117 | "username": "[parameters('sshUserName')]", 118 | "password": "[parameters('sshPassword')]" 119 | } 120 | } 121 | }, 122 | { 123 | "name": "zookeepernode", 124 | "targetInstanceCount": "3", 125 | "hardwareProfile": { 126 | "vmSize": "Standard_A3" 127 | }, 128 | "osProfile": { 129 | "linuxOperatingSystemProfile": { 130 | "username": "[parameters('sshUserName')]", 131 | "password": "[parameters('sshPassword')]" 132 | } 133 | } 134 | } 135 | ] 136 | } 137 | } 138 | } 139 | ], 140 | "outputs": { 141 | "cluster": { 142 | "type": "object", 143 | "value": "[reference(resourceId('Microsoft.HDInsight/clusters',parameters('clusterName')))]" 144 | } 145 | } 146 | } --------------------------------------------------------------------------------