├── .github
    ├── CONTRIBUTING.md
    ├── ISSUE_TEMPLATE.md
    └── PULL_REQUEST_TEMPLATE.md
├── LICENSE
├── Makefile
├── README.md
├── docker-compose.yml
└── kubernetes
    ├── README.md
    ├── kafka-svc.yml
    ├── kafka.yml
    └── zookeeper.yml


/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing Guidelines

## Getting Started

### Cloning the repo

### Setting up for local dev

### Making a PR


## How you can contribute

### Adding features

### Filing, diagnosing, and reproducing bugs

### Adding documentation


## Tips for contributing

### External documentation

- Library docs

- Language/framework related docs


### Getting help

- Issue tracker

- StackOverflow topics

- Slack teams

- Gitter team

- Mailing list


## Code of Conduct


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
### Expected behaviour

### Actual behaviour

### Steps to reproduce the problem.

### Specifications like the version of the project, operating system, or hardware

--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
A reference to a related issue in the repository:

e.g. Fixes # .

A description of the changes proposed in the pull request:

e.g. Changes proposed in this pull request:

-

-

-

@mentions of the person or team responsible for reviewing proposed changes.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 William Martin Stewart

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
setup-zoo:
	-eval "$(docker-machine env default)"
	@docker-compose down -v
	@docker-compose build
	# @docker-compose rm -vf
	@docker-compose up -d zoo1 zoo2 zoo3

up: setup-zoo
	@echo '=== Sleeping for 8s while Zookeeper initialises ===' && sleep 8
	-docker-compose up
	@docker-compose down

reset:
	@docker-compose stop
	@docker-compose rm -vf

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Kafka with docker-compose

A three-broker Kafka cluster and three-node Zookeeper ensemble running in Docker with docker-compose.

## Overview

Based on @eliaslevy's work on a Zookeeper cluster in Kubernetes [here](https://github.com/eliaslevy/docker-zookeeper), and @wurstmeister's Kafka docker-compose [here](https://github.com/wurstmeister/kafka-docker).

Kafka requires a unique `host:port` combination for each broker. It can try to assign its own broker IDs, but those IDs aren't persistent across container restarts, so for now it is probably better to hardcode `KAFKA_BROKER_ID` for each instance; otherwise you get "Leader Not Available" issues.

I made this while experimenting with setting up Kafka in Kubernetes. I have included the Kubernetes config files and instructions for setting up a multi-broker Kafka cluster and Zookeeper ensemble [here](https://github.com/zoidbergwill/docker-compose-kafka/kubernetes/).

## Usage

To start the Zookeeper ensemble and Kafka cluster, assuming you have docker-compose (>= 1.6) installed:

1. Change the `KAFKA_ADVERTISED_HOST_NAME` values in `docker-compose.yml` to your `DOCKER_HOST` IP
   Note: If you're using [Docker toolbox](https://www.docker.com/products/docker-toolbox) then this is the IP from `env | grep DOCKER_HOST`
1. Run `make up`
1. Once Zookeeper and Kafka are done setting up, you can connect to Kafka with something like [pykafka](https://github.com/Parsely/pykafka) using your Docker host IP:

   ```Python
   from pykafka import KafkaClient

   # Use your Docker host IP; 9092 is the port published by kafka1.
   kafka_client = KafkaClient(hosts='192.168.99.100:9092')

   # Print the leader broker of the first partition of every topic.
   for topic in kafka_client.topics.values():
       print(topic.partitions[0].leader)
   ```

--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
version: '2'
services:
  kafka1:
    image: wurstmeister/kafka
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    ports:
      - "9092:9092"
    environment:
      KAFKA_LOG_DIRS: /kafka
      KAFKA_BROKER_ID: 1
      KAFKA_CREATE_TOPICS: test-topic-1:1:2,test-topic-2:1:2,test-topic-3:1:2
      KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_LOG_RETENTION_HOURS: "168"
      KAFKA_LOG_RETENTION_BYTES: "100000000"
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181

  kafka2:
    image: wurstmeister/kafka
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    ports:
      - "9093:9092"
    environment:
      KAFKA_LOG_DIRS: /kafka
      KAFKA_BROKER_ID: 2
      KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100
      KAFKA_ADVERTISED_PORT: 9093
      KAFKA_LOG_RETENTION_HOURS: "168"
      KAFKA_LOG_RETENTION_BYTES: "100000000"
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181

  kafka3:
    image: wurstmeister/kafka
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    ports:
      - "9094:9092"
    environment:
      KAFKA_LOG_DIRS: /kafka
      KAFKA_BROKER_ID: 3
      KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100
      KAFKA_ADVERTISED_PORT: 9094
      KAFKA_LOG_RETENTION_HOURS: "168"
      KAFKA_LOG_RETENTION_BYTES: "100000000"
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181

  zoo1:
    image: elevy/zookeeper:latest
    environment:
      MYID: 1
      SERVERS: zoo1,zoo2,zoo3
    ports:
      - "2181:2181"
      - "2888"
      - "3888"

  zoo2:
    image: elevy/zookeeper:latest
    environment:
      MYID: 2
      SERVERS: zoo1,zoo2,zoo3
    ports:
      - "2182:2181"
      - "2888"
      - "3888"

  zoo3:
    image: elevy/zookeeper:latest
    environment:
      MYID: 3
      SERVERS: zoo1,zoo2,zoo3
    ports:
      - "2183:2181"
      - "2888"
      - "3888"

--------------------------------------------------------------------------------
/kubernetes/README.md:
--------------------------------------------------------------------------------
# Setting up Kafka in Kubernetes

Dynamically getting the IPs for the Kafka brokers is somewhat complex, so instead I create a load balancer for each Kafka broker and hardcode the IP/port combination for each broker in the Deployment configs in `kafka.yml`.

The Kafka brokers are on different external ports because I'd eventually like to have them all on the same IP, like the docker-compose example. I think I could do it more easily using Ingress controllers in the future.

## Usage

1. Attach labels to nodes

   ```sh
   $ count=0; for node in $(kubectl get nodes -o=jsonpath="{.items[*].metadata.name}"); do count=$((count+1)); kubectl label nodes "$node" custom/node-id=$count; done
   ```

1. Create the Zookeeper services, Zookeeper Deployments, and Kafka services

   ```sh
   $ kubectl create -f kubernetes/zookeeper.yml
   ...
   $ kubectl create -f kubernetes/kafka-svc.yml
   ...
   ```

1. Run `kubectl get svc -w` and wait for the Kafka services to be assigned their external IPs

   ```sh
   $ kubectl get svc -w
   NAME      CLUSTER_IP   EXTERNAL_IP      PORT(S)    SELECTOR                AGE
   kafka-1   10.0.0.2                      9092/TCP   app=kafka,server-id=1   47s
   kafka-1   10.0.0.2     192.168.99.100   9092/TCP   app=kafka,server-id=1   54s
   ...
   ```

1. Add the IPs as the `KAFKA_ADVERTISED_HOST_NAME`s in `kubernetes/kafka.yml`

1. Create the Kafka Deployments

   ```sh
   $ kubectl create -f kubernetes/kafka.yml
   ...
   ```

## Future Work

- Kafka brokers and Zookeeper servers that support dynamic scaling
- Dynamic but cross-instance persistent broker IDs for Kafka
- Change to properly tagged latest Zookeeper and Kafka images

--------------------------------------------------------------------------------
/kubernetes/kafka-svc.yml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Service
metadata:
  name: kafka-1
spec:
  ports:
    - name: client
      port: 9092
  selector:
    app: kafka
    server-id: "1"
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-2
spec:
  ports:
    - name: client
      port: 9093
      targetPort: 9092
  selector:
    app: kafka
    server-id: "2"
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-3
spec:
  ports:
    - name: client
      port: 9094
      targetPort: 9092
  selector:
    app: kafka
    server-id: "3"
  type: LoadBalancer

--------------------------------------------------------------------------------
/kubernetes/kafka.yml:
--------------------------------------------------------------------------------
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kafka-1-deployment
spec:
  replicas: 1
  # Pod template; a Deployment needs its pod metadata and spec under `template`.
  template:
    metadata:
      name: kafka-1
      labels:
        app: kafka
        server-id: "1"
    spec:
      volumes:
        - name: data
          gcePersistentDisk:
            pdName: kafka-1
            fsType: ext4
      containers:
        - name: server
          image: wurstmeister/kafka:latest
          env:
            - name: KAFKA_LOG_DIRS
              value: /kafka/kafka-1
            - name: KAFKA_BROKER_ID
              value: "1"
            - name: KAFKA_ADVERTISED_HOST_NAME
              value: 192.168.99.100
            - name: KAFKA_ADVERTISED_PORT
              value: "9092"
            - name: KAFKA_LOG_RETENTION_HOURS
              value: "168"
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
            - name: KAFKA_MESSAGE_MAX_BYTES
              value: "50000000"
            - name: KAFKA_LOG_SEGMENT_BYTES
              value: "100000000"
            - name: KAFKA_REPLICA_FETCH_MAX_BYTES
              value: "50000000"
          ports:
            - containerPort: 9092
          volumeMounts:
            - mountPath: /kafka
              name: data
      nodeSelector:
        custom/node-id: "1"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kafka-2-deployment
spec:
  replicas: 1
  template:
    metadata:
      name: kafka-2
      labels:
        app: kafka
        server-id: "2"
    spec:
      volumes:
        - name: data
          gcePersistentDisk:
            pdName: kafka-2
            fsType: ext4
      containers:
        - name: server
          image: wurstmeister/kafka:latest
          env:
            - name: KAFKA_LOG_DIRS
              value: /kafka/kafka-2
            - name: KAFKA_BROKER_ID
              value: "2"
            - name: KAFKA_ADVERTISED_HOST_NAME
              value: 192.168.99.100
            - name: KAFKA_ADVERTISED_PORT
              value: "9093"
            - name: KAFKA_LOG_RETENTION_HOURS
              value: "168"
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
            - name: KAFKA_MESSAGE_MAX_BYTES
              value: "50000000"
            - name: KAFKA_LOG_SEGMENT_BYTES
              value: "100000000"
            - name: KAFKA_REPLICA_FETCH_MAX_BYTES
              value: "50000000"
          ports:
            - containerPort: 9092
          volumeMounts:
            - mountPath: /kafka
              name: data
      nodeSelector:
        custom/node-id: "2"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kafka-3-deployment
spec:
  replicas: 1
  template:
    metadata:
      name: kafka-3
      labels:
        app: kafka
        server-id: "3"
    spec:
      volumes:
        - name: data
          gcePersistentDisk:
            pdName: kafka-3
            fsType: ext4
      containers:
        - name: server
          image: wurstmeister/kafka:latest
          env:
            - name: KAFKA_LOG_DIRS
              value: /kafka/kafka-3
            - name: KAFKA_BROKER_ID
              value: "3"
            - name: KAFKA_CREATE_TOPICS
              value: test-topic-1:1:2,test-topic-2:1:2,test-topic-3:1:2
            - name: KAFKA_ADVERTISED_HOST_NAME
              value: 192.168.99.100
            - name: KAFKA_ADVERTISED_PORT
              value: "9094"
            - name: KAFKA_LOG_RETENTION_HOURS
              value: "168"
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
            - name: KAFKA_MESSAGE_MAX_BYTES
              value: "50000000"
            - name: KAFKA_LOG_SEGMENT_BYTES
              value: "100000000"
            - name: KAFKA_REPLICA_FETCH_MAX_BYTES
              value: "50000000"
          ports:
            - containerPort: 9092
          volumeMounts:
            - mountPath: /kafka
              name: data
      nodeSelector:
        custom/node-id: "3"

--------------------------------------------------------------------------------
/kubernetes/zookeeper.yml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
spec:
  ports:
    - name: client
      port: 2181
  selector:
    app: zookeeper
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-1
spec:
  ports:
    - name: client
      port: 2181
    - name: followers
      port: 2888
    - name: election
      port: 3888
  selector:
    app: zookeeper
    server-id: "1"
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-2
spec:
  ports:
    - name: client
      port: 2181
    - name: followers
      port: 2888
    - name: election
      port: 3888
  selector:
    app: zookeeper
    server-id: "2"
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-3
spec:
  ports:
    - name: client
      port: 2181
    - name: followers
      port: 2888
    - name: election
      port: 3888
  selector:
    app: zookeeper
    server-id: "3"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: zookeeper-1-deployment
spec:
  replicas: 1
  template:
    metadata:
      name: zookeeper-1
      labels:
        app: zookeeper
        server-id: "1"
    spec:
      volumes:
        - name: data
          emptyDir: {}
        - name: wal
          emptyDir:
            medium: Memory
      containers:
        - name: server
          image: elevy/zookeeper:latest
          env:
            - name: MYID
              value: "1"
            - name: SERVERS
              value: "zookeeper-1,zookeeper-2,zookeeper-3"
            - name: JVMFLAGS
              value: "-Xmx2G"
          ports:
            - containerPort: 2181
            - containerPort: 2888
            - containerPort: 3888
          volumeMounts:
            - mountPath: /zookeeper/data
              name: data
            - mountPath: /zookeeper/wal
              name: wal
      nodeSelector:
        custom/node-id: "1"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: zookeeper-2-deployment
spec:
  replicas: 1
  template:
    metadata:
      name: zookeeper-2
      labels:
        app: zookeeper
        server-id: "2"
    spec:
      volumes:
        - name: data
          emptyDir: {}
        - name: wal
          emptyDir:
            medium: Memory
      containers:
        - name: server
          image: elevy/zookeeper:latest
          env:
            - name: MYID
              value: "2"
            - name: SERVERS
              value: "zookeeper-1,zookeeper-2,zookeeper-3"
            - name: JVMFLAGS
              value: "-Xmx2G"
          ports:
            - containerPort: 2181
            - containerPort: 2888
            - containerPort: 3888
          volumeMounts:
            - mountPath: /zookeeper/data
              name: data
            - mountPath: /zookeeper/wal
              name: wal
      nodeSelector:
        custom/node-id: "2"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: zookeeper-3-deployment
spec:
  replicas: 1
  template:
    metadata:
      name: zookeeper-3
      labels:
        app: zookeeper
        server-id: "3"
    spec:
      volumes:
        - name: data
          emptyDir: {}
        - name: wal
          emptyDir:
            medium: Memory
      containers:
        - name: server
          image: elevy/zookeeper:latest
          env:
            - name: MYID
              value: "3"
            - name: SERVERS
              value: "zookeeper-1,zookeeper-2,zookeeper-3"
            - name: JVMFLAGS
              value: "-Xmx2G"
          ports:
            - containerPort: 2181
            - containerPort: 2888
            - containerPort: 3888
          volumeMounts:
            - mountPath: /zookeeper/data
              name: data
            - mountPath: /zookeeper/wal
              name: wal
      nodeSelector:
        custom/node-id: "3"

--------------------------------------------------------------------------------
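
The per-server values repeated across these configs (`MYID`, `SERVERS`, `KAFKA_ZOOKEEPER_CONNECT`, and the advertised Kafka ports) all follow one numbering pattern. The sketch below is not part of the repo; it only illustrates that pattern. The function names and the `prefix`, `base_port`, and `client_port` parameters are hypothetical, chosen for illustration.

```python
# Illustrative helpers (not part of the repo): derive the repeated
# per-server values from a server count, following the numbering used
# in docker-compose.yml and the Kubernetes manifests.


def zookeeper_hosts(count, prefix="zookeeper-"):
    """Return host names such as zookeeper-1..zookeeper-N (or zoo1..zooN)."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def servers_value(hosts):
    """Build the comma-separated SERVERS value used by the Zookeeper image."""
    return ",".join(hosts)


def zookeeper_connect(hosts, client_port=2181):
    """Build the KAFKA_ZOOKEEPER_CONNECT value: host:port pairs joined by commas."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


def advertised_port(broker_id, base_port=9092):
    """Broker 1 advertises 9092, broker 2 advertises 9093, and so on."""
    return base_port + broker_id - 1


if __name__ == "__main__":
    k8s_hosts = zookeeper_hosts(3)
    print(servers_value(k8s_hosts))
    print(zookeeper_connect(k8s_hosts))
    print(zookeeper_connect(zookeeper_hosts(3, prefix="zoo")))
    print([advertised_port(broker_id) for broker_id in (1, 2, 3)])
```

Generating these strings from a single count is one way to keep the hardcoded broker and server IDs consistent when a broker or Zookeeper server is added by hand.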