├── .gitignore
├── settings.conf
├── LICENCE.txt
├── demo-producer.py
├── demo-consumer.py
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
*~
*.pyc
--------------------------------------------------------------------------------
/settings.conf:
--------------------------------------------------------------------------------
[kafka_demo]

# Note that this topic will be created automatically if it does not exist
topic: test

# Comma-separated list of hosts
kafka_hosts: localhost:9092
zookeeper_hosts: localhost:2181

# How long to delay between displaying information in the terminal (in seconds)
display_interval: 5
--------------------------------------------------------------------------------
/LICENCE.txt:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2015 Carl Scheffler

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the
Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/demo-producer.py:
--------------------------------------------------------------------------------
from pykafka import KafkaClient
import time
import uuid
try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2


config = configparser.ConfigParser()
config.read('settings.conf')

kafka_client = KafkaClient(hosts=config.get('kafka_demo', 'kafka_hosts'))  # Create Kafka client
topic = kafka_client.topics[config.get('kafka_demo', 'topic')]  # This will create the topic if it does not exist
display_interval = int(config.get('kafka_demo', 'display_interval'))

# Single-argument print() works under both Python 2 and 3, matching the
# configparser shim above
print('Producing messages to topic %r. Press Ctrl-C to interrupt.' % topic.name)
display_iteration = 0
message_count = 0
start_time = time.time()
with topic.get_producer() as producer:  # Create Kafka producer on the given topic
    while True:
        identifier = str(uuid.uuid4())  # Encode the message (this should result in a byte string)
        producer.produce(identifier)  # Send the message to Kafka
        message_count += 1
        now = time.time()
        if now - start_time > display_interval:
            print('%i) %i messages produced at %.0f messages / second' % (
                display_iteration,
                message_count,
                message_count / (now - start_time)))
            display_iteration += 1
            message_count = 0
            start_time = time.time()
--------------------------------------------------------------------------------
/demo-consumer.py:
--------------------------------------------------------------------------------
from pykafka import KafkaClient
import time
import uuid
try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2


config = configparser.ConfigParser()
config.read('settings.conf')

kafka_client = KafkaClient(hosts=config.get('kafka_demo', 'kafka_hosts'))  # Create Kafka client
topic = kafka_client.topics[config.get('kafka_demo', 'topic')]  # This will create the topic if it does not exist
consumer = topic.get_balanced_consumer(
    consumer_group="test_group",
    auto_commit_enable=True,
    zookeeper_connect=config.get('kafka_demo', 'zookeeper_hosts'))
display_interval = int(config.get('kafka_demo', 'display_interval'))

print('Consuming messages from topic %r. Press Ctrl-C to interrupt.' % topic.name)
display_iteration = 0
message_count = 0
partitions = set()  # Track which partitions got consumed by this consumer
start_time = time.time()
while True:
    message = consumer.consume()  # Read one message from Kafka
    identifier = uuid.UUID(message.value)  # Decode the message
    message_count += 1
    partitions.add(message.partition.id)
    now = time.time()
    if now - start_time > display_interval:
        print('%i) %i messages consumed at %.0f messages / second - from partitions %r' % (
            display_iteration,
            message_count,
            message_count / (now - start_time),
            sorted(partitions)))
        display_iteration += 1
        message_count = 0
        partitions = set()
        start_time = time.time()
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Kafka demo scripts
==================

This is the demo kit that accompanies the talk I gave at PyConZA 2015 on Apache
Kafka. It contains scripts for demoing a producer and balanced consumers.

You can watch the talk here:

* https://www.youtube.com/watch?v=b8Cj5-LieH0


Dependencies
------------

This demo requires Kafka 0.8.2. If you do not have access to a Kafka cluster,
you can set it up in standalone mode by downloading it here:

* http://kafka.apache.org/downloads.html

and installing it following the instructions here:

* http://kafka.apache.org/documentation.html#quickstart

You will also need the pykafka client library. I used version 2.0.0, which was
the latest version at the time of writing.
Get it using

    pip install pykafka

or from the source repo here:

* https://github.com/Parsely/pykafka


Configuration
-------------

Edit `settings.conf` in this repo to connect to your Kafka and Zookeeper
hosts. By default, it is set up for a standalone Kafka installation on
localhost.

For the tests below to run properly, create a topic called "test" by executing

    cd /opt/kafka  # This is the default install directory
    ./bin/kafka-topics.sh --topic test --partitions 6 --create \
        --zookeeper localhost:2181 --replication-factor 1

The test topic will have 6 partitions (more on this in Demo 4 below).

To check that your topic got created properly, you can execute

    ./bin/kafka-topics.sh --describe --zookeeper localhost:2181

This will display information about all Kafka topics.


Demo 1: Running a producer
--------------------------

    python demo-producer.py

This will produce messages as fast as possible and display how many got
produced, once every 5 seconds.


Demo 2: Running a consumer
--------------------------

    python demo-consumer.py

This will consume already produced messages as fast as possible and display how
many got consumed, once every 5 seconds. It will also show the list of
partitions from which messages got consumed. More on partitions in Demo 4.

The consumer will block when it runs out of messages.


Demo 3: Clearing out your Kafka and Zookeeper data
--------------------------------------------------

To delete all Kafka logs and Zookeeper data, you can do the following. This is
for test purposes only and should obviously never be done in production.

1. Stop your Kafka server
2. Stop your Zookeeper server
3. `rm -Rf /tmp/zookeeper/*`
   This is the default data location for a standalone Zookeeper install. See
   the `dataDir` property in `config/zookeeper.properties` for where your
   installation is storing things.
4. `rm -Rf /tmp/kafka-logs/*`
   This is the default data location for a standalone Kafka install. See the
   `log.dirs` property in `config/server.properties` for where your
   installation is storing its Kafka logs.
5. Start your Zookeeper server
6. Start your Kafka server
7. Re-create your Kafka topics


Demo 4: Running a producer and multiple, balanced consumers
-----------------------------------------------------------

Run `demo-producer.py` and a few copies of `demo-consumer.py` simultaneously,
in different terminals. You should see that the different consumers
automatically read from different partitions. If a consumer is killed or added,
the partitions being read will automatically be rebalanced between the
consumers.

Note that this will work only if your topic has more than 1 partition. Nothing
useful happens if you have more consumers than you have partitions. See the
Configuration section above on how to create a topic with multiple partitions.
--------------------------------------------------------------------------------
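The partition balancing described in Demo 4 can be sketched with a toy assignment function. This is only an illustration: pykafka's balanced consumer negotiates a comparable split via Zookeeper, and `assign_partitions` below is a hypothetical helper, not a pykafka API. It shows why the 6-partition topic splits evenly over 2 or 3 consumers, and why extra consumers beyond 6 sit idle.

```python
def assign_partitions(partition_ids, consumer_names):
    """Deal partition ids out to consumers, round-robin."""
    assignment = {name: [] for name in consumer_names}
    for i, partition_id in enumerate(sorted(partition_ids)):
        assignment[consumer_names[i % len(consumer_names)]].append(partition_id)
    return assignment

# 6 partitions over 2 consumers: 3 partitions each
print(assign_partitions(range(6), ['consumer-a', 'consumer-b']))

# 6 partitions over 7 consumers: one consumer has nothing to read
print(assign_partitions(range(6), ['c%i' % i for i in range(7)]))
```

The real rebalance is triggered dynamically (consumers join or leave the `test_group` consumer group), but the end state is the same idea: each partition is owned by exactly one consumer in the group at a time.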