├── .gitignore ├── Dockerfile ├── Makefile ├── README.md ├── build-ureplicator ├── Dockerfile ├── Makefile ├── confd │ ├── CONFD.md │ ├── conf.d │ │ ├── consumer.properties.toml │ │ ├── helix.properties.toml │ │ ├── producer.properties.toml │ │ └── zookeeper.properties.toml │ └── templates │ │ ├── consumer.properties.tmpl │ │ ├── helix.properties.tmpl │ │ ├── log4j.properties.tmpl │ │ ├── producer.properties.tmpl │ │ ├── test-log4j.properties.tmpl │ │ ├── tools-log4j.properties.tmpl │ │ ├── topicmapping.properties.tmpl │ │ └── zookeeper.properties.tmpl └── entrypoint.sh ├── doc └── media │ ├── brookin-packet-loss-kafka-source-1min-3brokers.png │ ├── brookin-packet-loss-kafka-source-1min.png │ ├── brookin-packet-loss-kafka-source.png │ ├── brooklin-add-partitions-take1.png │ ├── brooklin-add-partitions-take2.png │ ├── brooklin-adding-new-worker.png │ ├── brooklin-downsize-destination-cluster-100mb.png │ ├── brooklin-downsize-destination-cluster.png │ ├── brooklin-kill-kafka-destination-pod-take2-aftershock1.png │ ├── brooklin-kill-kafka-destination-pod-take2-aftershock2.png │ ├── brooklin-kill-kafka-destination-pod-take2.png │ ├── brooklin-kill-kafka-destination-pod-take3.png │ ├── brooklin-kill-kafka-destination-pod-take4.png │ ├── brooklin-kill-kafka-destination-pod-take5.png │ ├── brooklin-kill-kafka-destination-pod.png │ ├── brooklin-kill-kafka-source-pod.png │ ├── brooklin-new-topic.png │ ├── brooklin-packat-loss-100mb.png │ ├── brooklin-packet-loss.png │ ├── brooklin-reduce-worker-pool-to-31.png │ ├── brooklin-remove-more-workers.png │ ├── brooklin-removing-more-workers-latency.png │ ├── brooklin-resize-kafka-source.png │ ├── brooklin-scale-down-and-up-100mb.png │ ├── brooklin-scale-down-and-up.png │ ├── brookling-killl-kafka-pod-take2-production-error-rate.png │ ├── downsize-destination-cluster.png │ ├── kill-kafka-source-pod.png │ ├── kill-pod-destination.png │ ├── new-topic.png │ ├── packet-loss-on-source-cluster.png │ ├── packet-loss-on-workers.png │ └── remove-worker.png ├── go.mk ├── go.mod ├── go.sum ├── k8s.mk ├── k8s ├── brooklin │ ├── 00namespace.yml │ ├── 20zookeeper.yml │ ├── 25env-config.yml │ ├── 25jmx-prometheus-javaagent-config.yml │ ├── 30brooklin.yml │ ├── 40monitoring.yml │ ├── delete-replicate-topic.sh │ ├── replicate-topic.sh │ └── test.sh ├── kafka-destination │ ├── 00namespace.yml │ ├── 10broker-config.yml │ ├── 10metrics-config.yml │ ├── 10zookeeper-config.yml │ ├── 20dns.yml │ ├── 20pzoo-service.yml │ ├── 30service.yml │ ├── 50kafka.yml │ ├── 50pzoo.yml │ ├── 60monitoring.yml │ └── test.sh ├── kafka-source │ ├── 00namespace.yml │ ├── 10broker-config.yml │ ├── 10metrics-config.yml │ ├── 10zookeeper-config.yml │ ├── 20dns.yml │ ├── 20pzoo-service.yml │ ├── 30service.yml │ ├── 50kafka.yml │ ├── 50pzoo.yml │ ├── 60monitoring.yml │ └── test.sh ├── monitoring │ ├── admin-cluster-role-binding.yaml │ ├── admin-service-account.yaml │ ├── graphite-exporter │ │ ├── configmap.yml │ │ ├── deployment.yaml │ │ ├── prometheus-scrape.yaml │ │ └── service.yaml │ ├── kube-state-metrics-cluster-role-binding.yaml │ ├── kube-state-metrics-cluster-role.yaml │ ├── kube-state-metrics-deployment.yaml │ ├── kube-state-metrics-role-binding.yaml │ ├── kube-state-metrics-role.yaml │ ├── kube-state-metrics-service-account.yaml │ ├── kube-state-metrics-service.yaml │ ├── monitoring-expose-kube-controller-manager.yaml │ ├── monitoring-expose-kube-scheduler.yaml │ └── patch │ │ ├── grafana-dashboard-definitions.yaml │ │ ├── grafana-datasources.yaml.tmpl │ │ └── template.sh ├── tester │ ├── 
consumer.yaml │ └── producer.yaml └── ureplicator │ ├── 00namespace.yml │ ├── 20zookeeper.yml │ ├── 25env-config.yml.tmpl │ ├── 25jmx-prometheus-javaagent-config.yml │ ├── 30ureplicator.yml │ ├── 40monitoring.yml │ ├── template.sh │ └── test.sh ├── lib ├── admin │ └── topic.go ├── cmd │ ├── cmd.go │ ├── consume.go │ └── produce.go ├── consumer │ ├── consumer.go │ ├── performance.go │ ├── sequences.go │ ├── sequences_test.go │ ├── throughput.go │ └── ui.go ├── gen │ └── main │ │ └── code-gen.go ├── log.go ├── message │ ├── data.go │ ├── message-no-headers-const-gen.go │ ├── message-no-headers.go │ ├── message-no-headers_test.go │ ├── message.go │ └── message_test.go ├── producer │ ├── monitor.go │ └── producer.go └── types │ └── types.go ├── main.go ├── results-brooklin.md ├── results-ureplicator.md └── running.md
/.gitignore:
--------------------------------------------------------------------------------
1 | vendor
2 | kafka-mirror-tester
3 | build-ureplicator/tmp/
4 | k8s/ureplicator/25env-config.yml
5 | k8s/monitoring/patch/grafana-datasources.yaml
6 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM ubuntu
2 |
3 | # Install the C lib for kafka
4 | RUN apt-get update
5 | RUN apt-get install -y --no-install-recommends apt-utils wget gnupg software-properties-common
6 | RUN apt-get install -y apt-transport-https ca-certificates
7 | RUN wget -qO - https://packages.confluent.io/deb/5.1/archive.key | apt-key add -
8 | RUN add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.1 stable main"
9 | RUN apt-get update
10 | RUN apt-get install -y librdkafka-dev
11 |
12 | # Install Go
13 | RUN add-apt-repository ppa:longsleep/golang-backports
14 | RUN apt-get update
15 | RUN apt-get install -y golang-1.11-go
16 |
17 | # Build the tool
18 | WORKDIR /go/src/github.com/appsflyer/kafka-mirror-tester
19 | COPY *.go ./
20 | COPY lib lib
21 | COPY vendor vendor
22 |
23 | RUN GOPATH=/go GOOS=linux /usr/lib/go-1.11/bin/go build -a -o main .
24 |
25 | EXPOSE 8000
26 |
27 | ENTRYPOINT ["./main"]
28 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | include go.mk
2 | include k8s.mk
3 |
4 | ####################
5 | # uReplicator docker
6 | ####################
7 | ureplicator-release:
8 | 	cd build-ureplicator; make release
9 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Kafka Mirror Tester
2 |
3 | Kafka Mirror Tester is a tool for testing the performance and correctness of Apache Kafka mirroring.
4 | Mirroring is not one of Kafka's built-in features, but there are third-party tools that implement it, namely:
5 |
6 | 1. Kafka's [Mirror Maker](https://kafka.apache.org/documentation.html#basic_ops_mirror_maker), a relatively simple tool within the Kafka project with some known limitations
7 | 2. [Confluent's Replicator](https://docs.confluent.io/current/multi-dc-replicator/index.html), a commercial tool from Confluent
8 | 3. Uber's open source [uReplicator](https://github.com/uber/uReplicator)
9 | 4. LinkedIn's [Brooklin](https://github.com/linkedin/brooklin)
10 |
11 | This test tool is indifferent to the underlying mirroring tool, so it is able to test all of the above-mentioned replicators.
12 |
13 | *The current implementation supports Uber's uReplicator and LinkedIn's Brooklin.*
14 |
15 | Presentation on this project: https://speakerdeck.com/rantav/infrastructure-testing-using-kubernetes
16 |
17 |
18 | ## High level design
19 |
20 | Mirroring typically takes place between two datacenters, as described below:
21 |
22 | ```
23 | ----------------------                          --------------------------------------
24 | |                    |                          |                                    |
25 | |     Source DC      |                          |           Destination DC           |
26 | |                    |                          |                                    |
27 | |  ----------------  |                          |  --------------  ----------------  |
28 | |  | Source Kafka |  |  - - - - - - - - - - ->  |  | Replicator |->| Target Kafka |  |
29 | |  ----------------  |                          |  --------------  ----------------  |
30 | |                    |                          |                                    |
31 | ----------------------                          --------------------------------------
32 | ```
33 |
34 | The test tool has the following goals in mind:
35 |
36 | 1. Correctness, mainly completeness - that all messages sent to the Source arrive, in order, at the Destination (per partition), with at-least-once semantics.
37 | 2. Performance - how long it takes for messages to get replicated and delivered to the Destination. This, of course, takes the laws of physics into consideration, e.g. inherent line latency.
38 |
39 | The test harness therefore comprises two components: the `producer` and the `consumer`.
40 |
41 | ### The producer
42 | The producer writes messages with sequence numbers and timestamps.
43 |
44 | ### The consumer
45 | The consumer reads messages and inspects the sequence numbers and timestamps to determine correctness and performance.
46 | This assumes the producer's and consumer's clocks are in sync (we don't require the precision of atomic clocks, but we do realize that out-of-sync clocks will influence accuracy).
47 |
48 | ## Lower level design
49 |
50 | The producer writes its messages to the source Kafka, adding its `producer-id`, `sequence`, `timestamp` and a `payload`.
51 | The producer is capable of throttling its output so that we achieve a predictable throughput.
52 |
53 | ```
54 | ----------------------                          --------------------------------------
55 | |                    |                          |                                    |
56 | |     Source DC      |                          |           Destination DC           |
57 | |                    |                          |                                    |
58 | |  ----------------  |                          |  --------------  ----------------  |
59 | |  | Source Kafka |  |  - - - - - - - - - - ->  |  | Replicator |->| Target Kafka |  |
60 | |  ----------------  |                          |  --------------  ----------------  |
61 | |        ↑           |                          |                        |           |
62 | |        |           |                          |                        |           |
63 | |        |           |                          |                        ↓           |
64 | |   ------------     |                          |                  ------------      |
65 | |   | producer |     |                          |                  | consumer |      |
66 | |   ------------     |                          |                  ------------      |
67 | ----------------------                          --------------------------------------
68 | ```
69 |
70 | ### Message format
71 |
72 | We aim for a simple, low-overhead message format utilizing Kafka's built-in header fields. Where headers are not supported (shamefully, the current reality with both uReplicator and Brooklin), we utilize an in-body message format.
73 |
74 | Message format:
75 | There are two variants of the message format, one that uses Kafka headers and one that does not.
76 | We implement both because, while headers are nicer and easier to use, neither uReplicator nor Brooklin currently supports them.
77 |
78 | Message format with headers (for simplicity we show JSON, but of course in Kafka it's all binary):
79 | ```
80 | {
81 |   value: payload,          // Payload size is determined by the user.
82 |   timestamp: produceTime,  // The producer embeds a timestamp in UTC
83 |   headers: {
84 |     id: producer-id,
85 |     seq: sequence-number
86 |   }
87 | }
88 | ```
89 |
90 | Message format without headers (encoded in the message body itself):
91 | ```
92 | +-------------------------------------------------+
93 | | producer-id;sequence-number;timestamp;payload...|
94 | +-------------------------------------------------+
95 | ```
96 |
97 | We add the `producer-id` so that we can run producers on multiple hosts and still be able to verify that all messages arrived.
98 |
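To make the in-body variant concrete, here is a minimal sketch of how such a headerless message could be encoded and decoded in Go. It is an illustration only, not the project's actual `lib/message` implementation: the `Message` type, the helper names and the millisecond timestamp resolution are assumptions made for this example; only the `producer-id;sequence-number;timestamp;payload` field order comes from the format above.

```go
package message

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// Message carries the fields described above: a producer id, a per-producer
// sequence number, the produce time and an opaque payload.
type Message struct {
	ProducerID string
	Sequence   uint64
	Timestamp  time.Time
	Payload    []byte
}

// Encode serializes m as "producer-id;sequence-number;timestamp;payload...".
// The timestamp is written as milliseconds since the Unix epoch (an assumed
// resolution for this sketch).
func Encode(m Message) []byte {
	header := fmt.Sprintf("%s;%d;%d;", m.ProducerID, m.Sequence,
		m.Timestamp.UnixNano()/int64(time.Millisecond))
	return append([]byte(header), m.Payload...)
}

// Decode parses a message produced by Encode. It splits on the first three
// semicolons only, so the payload itself may contain semicolons.
func Decode(b []byte) (Message, error) {
	parts := strings.SplitN(string(b), ";", 4)
	if len(parts) != 4 {
		return Message{}, fmt.Errorf("malformed message: %q", b)
	}
	seq, err := strconv.ParseUint(parts[1], 10, 64)
	if err != nil {
		return Message{}, err
	}
	ms, err := strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return Message{}, err
	}
	return Message{
		ProducerID: parts[0],
		Sequence:   seq,
		Timestamp:  time.Unix(0, ms*int64(time.Millisecond)),
		Payload:    []byte(parts[3]),
	}, nil
}
```

With headers enabled, the same three metadata fields would simply travel as Kafka record headers and the value would hold only the payload.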
99 | ### Producer
100 |
101 | Command line arguments:
102 |
103 | `--id`: Producer ID. May be the hostname, etc.
104 |
105 | `--topics`: List of topic names to write to, separated by commas
106 |
107 | `--throughput`: Number of messages per second per topic
108 |
109 | `--message-size`: Message size, including the header section (producer-id;sequence-number;timestamp;). The minimal message size is therefore around 30 bytes, due to the typical header length
110 |
111 | `--bootstrap-server`: A Kafka server from which to bootstrap
112 |
113 | `--use-message-headers`: Whether to use message headers to encode metadata (or encode it within the payload)
114 |
115 | The producer generates messages containing the header, padding them with payload as needed in order to reach the `message-size`, and sends them to Kafka.
116 | It tries to achieve the desired throughput (sending in batches and in parallel) but will not exceed it. If it is unable to achieve the desired throughput, it emits a log warning and continues. We also keep an eye on this using Prometheus and Grafana in the bigger picture.
117 | The throughput is measured as the number of messages / second / topic.
118 |
119 | ### Consumer
120 |
121 | Command line arguments:
122 |
123 | `--topics`: List of topic names to read from, separated by commas
124 |
125 | `--bootstrap-server`: A Kafka server from which to bootstrap
126 |
127 | `--use-message-headers`: Whether to use message headers to encode metadata (or encode it within the payload)
128 |
129 | The consumer reads the messages from each of the topics and calculates correctness and performance.
130 |
131 | Correctness is determined by the combination of `topic`, `producer-id` and `sequence-number` (e.g. if a specific producer has gaps, that means we're missing messages).
132 |
133 | There is a fine point to mention in that respect. When operating with multiple partitions we utilize Kafka's message `key` in order to ensure message routing correctness. When multiple consumers read (naturally, from multiple partitions), we want each consumer to be able to read *all* sequential messages *in the order* they were sent. To achieve that we use Kafka's message routing abilities, such that messages with the same key are always routed to the same partition. What matters is the number of partitions in the destination cluster. To achieve linearity we key each message by its sequence number modulo the number of partitions in the destination cluster. This way, ascending sequence numbers that share a key are sent to the same partition in the same order, and clients are then able to easily verify that all messages arrived in the order they were sent. A sketch of this keying and gap-check logic appears below.
134 |
135 |
136 | Latency is determined by the time gap between the `timestamp` and the current local consumer time. The consumer then emits a histogram of latency buckets.
137 |
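The keying and gap-detection logic described above is the part that is easiest to get wrong, so here is a minimal sketch of it in Go. It is illustrative only, not the project's actual `lib/consumer` code: the names (`MessageKey`, `checker`) are hypothetical, and it assumes the producer keys every message by its sequence number modulo the destination partition count, so that within one key the sequence is expected to advance by exactly that count.

```go
package consumer

import "strconv"

// MessageKey is what the producer would use as the Kafka message key: the
// sequence number modulo the number of partitions in the destination cluster.
// All messages sharing a remainder are routed to the same destination
// partition, in order, which is what lets the consumer verify ordering.
func MessageKey(seq, destinationPartitions uint64) string {
	return strconv.FormatUint(seq%destinationPartitions, 10)
}

// checker verifies, per (topic, producer-id, key), that sequence numbers
// advance by exactly destinationPartitions, i.e. that nothing was lost or
// reordered within a partition.
type checker struct {
	destinationPartitions uint64
	next                  map[string]uint64 // next expected sequence per topic/producer/key
	missing               uint64            // total sequence numbers never seen
}

func newChecker(destinationPartitions uint64) *checker {
	return &checker{
		destinationPartitions: destinationPartitions,
		next:                  make(map[string]uint64),
	}
}

// observe records one consumed message and returns how many sequence numbers
// (for the same topic, producer and key) appear to have been skipped.
func (c *checker) observe(topic, producerID string, seq uint64) uint64 {
	k := topic + "/" + producerID + "/" + MessageKey(seq, c.destinationPartitions)
	expected, seen := c.next[k]
	if !seen {
		// First message for this topic/producer/key combination.
		c.next[k] = seq + c.destinationPartitions
		return 0
	}
	if seq < expected {
		// An at-least-once duplicate or an old retransmission; not a gap.
		return 0
	}
	skipped := (seq - expected) / c.destinationPartitions
	c.missing += skipped
	c.next[k] = seq + c.destinationPartitions
	return skipped
}
```

A consumer would call `observe` for every record it reads and alert (or increment a Prometheus counter) whenever the returned gap is non-zero.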
138 | ## Open for discussion
139 |
140 | 1. If the last message from a producer got lost, we don't know about it. If all messages from a specific producer got lost, we won't know about it either (although it's possible to audit that manually). If a certain partition is not replicated, we can only detect it by means of traffic-volume monitoring, not precise counts.
141 |
142 | # Using it
143 | The tools in this project assume some familiarity with third-party tools, namely Kubernetes and AWS. Expert-level knowledge is not expected, but some familiarity with these tools is very helpful.
144 |
145 | For details on how to run it, see [Running it](running.md).
146 |
147 | # Results
148 |
149 | We have [benchmark results for uReplicator](results-ureplicator.md) and [benchmark results for Brooklin](results-brooklin.md).
150 |
--------------------------------------------------------------------------------
/build-ureplicator/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM openjdk:8-jre
2 |
3 | ADD https://github.com/kelseyhightower/confd/releases/download/v0.15.0/confd-0.15.0-linux-amd64 /usr/local/bin/confd
4 |
5 | ADD https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar /jmx_prometheus_javaagent-0.3.1.jar
6 |
7 | COPY tmp/uReplicator-master/uReplicator-Distribution/target/uReplicator-Distribution-pkg /uReplicator
8 |
9 | COPY tmp/uReplicator-master/config uReplicator/config
10 |
11 | COPY confd /etc/confd
12 |
13 | COPY entrypoint.sh /entrypoint.sh
14 | RUN chmod +x /entrypoint.sh && \
15 |     chmod +x /usr/local/bin/confd && \
16 |     chmod +x /uReplicator/bin/*.sh
17 |
18 | ENV JAVA_OPTS "${JAVA_OPTS} -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1"
19 |
20 | ENTRYPOINT [ "/entrypoint.sh" ]
--------------------------------------------------------------------------------
/build-ureplicator/Makefile:
--------------------------------------------------------------------------------
1 | REV := `git rev-parse --short HEAD`
2 |
3 |
4 | ####################
5 | # uReplicator docker
6 | ####################
7 | U_WORK_DIR := tmp
8 | U_BIN := ureplicator
9 | U_IMAGE := rantav/$(U_BIN)
10 |
11 | release: clean build deploy clean
12 |
13 | build:
14 | 	mkdir -p $(U_WORK_DIR)
15 | 	curl -sL https://github.com/uber/uReplicator/archive/master.tar.gz | tar xz -C $(U_WORK_DIR)
16 | 	cd $(U_WORK_DIR)/uReplicator-master && mvn package -DskipTests
17 | 	chmod u+x $(U_WORK_DIR)/uReplicator-master/bin/pkg/*.sh
18 |
19 | image:
20 | 	docker build -t $(U_IMAGE):$(REV) .
21 | 22 | deploy: image 23 | docker push $(U_IMAGE):$(REV) 24 | 25 | clean: 26 | @/bin/rm -rf $(U_WORK_DIR) 27 | -------------------------------------------------------------------------------- /build-ureplicator/confd/CONFD.md: -------------------------------------------------------------------------------- 1 | ## ignore me 2 | Generate confd 3 | ``` 4 | ls -al | awk '{print$9}' | while read line ; do echo "src =" '"'${line}'"' ; echo "dest =" '"/uReplicator/config/'${line}'.tmpl"' ; echo ; done 5 | {{ getenv "HOSTNAME" }} 6 | ``` 7 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/consumer.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "consumer.properties.tmpl" 3 | dest = "/uReplicator/config/consumer.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/helix.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "helix.properties.tmpl" 3 | dest = "/uReplicator/config/helix.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/producer.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "producer.properties.tmpl" 3 | dest = "/uReplicator/config/producer.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/zookeeper.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "zookeeper.properties.tmpl" 3 | dest = "/uReplicator/config/zookeeper.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/consumer.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | # see kafka.consumer.ConsumerConfig for more details 16 | 17 | # Zookeeper connection string 18 | # comma separated host:port pairs, each corresponding to a zk 19 | # server. e.g. 
"127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002" 20 | zookeeper.connect={{ getenv "SRC_ZK_CONNECT" }} 21 | 22 | # timeout in ms for connecting to zookeeper 23 | zookeeper.connection.timeout.ms=30000 24 | zookeeper.session.timeout.ms=30000 25 | 26 | #consumer group id 27 | group.id={{ getenv "CONSUMER_GROUP_ID" }} 28 | 29 | consumer.id={{ getenv "HOSTNAME" }} 30 | partition.assignment.strategy=roundrobin 31 | socket.receive.buffer.bytes={{ getenv "SOCKET_RECEIVE_BUFFER_BYTES" }} 32 | fetch.message.max.bytes={{ getenv "FETCH_MESSAGE_MAX_BYTES" }} 33 | queued.max.message.chunks=5 34 | 35 | #consumer timeout 36 | #consumer.timeout.ms=5000 37 | 38 | auto.offset.reset=smallest 39 | num.consumer.fetchers={{ getenv "NUM_CONSUMER_FETCHERS" }} 40 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/helix.properties.tmpl: -------------------------------------------------------------------------------- 1 | zkServer={{ getenv "HELIX_ZK_CONNECT" }} 2 | instanceId={{ getenv "HOSTNAME" }} 3 | helixClusterName={{ getenv "HELIX_CLUSTER_NAME" }} -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/log4j.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | 16 | log4j.rootLogger=INFO, stdout 17 | 18 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 19 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 20 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 21 | 22 | log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender 23 | log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH 24 | log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log 25 | log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout 26 | log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 27 | 28 | log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender 29 | log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH 30 | log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log 31 | log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout 32 | log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 33 | 34 | log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender 35 | log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH 36 | log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log 37 | log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout 38 | log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 39 | 40 | log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender 41 | log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH 42 | log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log 43 | log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout 44 | log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 45 | 46 | log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender 47 | log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH 48 | log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log 49 | log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout 50 | log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 51 | 52 | # Turn on all our debugging info 53 | #log4j.logger.kafka.producer.async.DefaultEventHandler=DEBUG, kafkaAppender 54 | #log4j.logger.kafka.client.ClientUtils=DEBUG, kafkaAppender 55 | #log4j.logger.kafka.perf=DEBUG, kafkaAppender 56 | #log4j.logger.kafka.perf.ProducerPerformance$ProducerThread=DEBUG, kafkaAppender 57 | #log4j.logger.org.I0Itec.zkclient.ZkClient=DEBUG 58 | log4j.logger.kafka=INFO, kafkaAppender 59 | 60 | log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender 61 | log4j.additivity.kafka.network.RequestChannel$=false 62 | 63 | #log4j.logger.kafka.network.Processor=TRACE, requestAppender 64 | #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender 65 | #log4j.additivity.kafka.server.KafkaApis=false 66 | log4j.logger.kafka.request.logger=WARN, requestAppender 67 | log4j.additivity.kafka.request.logger=false 68 | 69 | log4j.logger.kafka.controller=TRACE, controllerAppender 70 | log4j.additivity.kafka.controller=false 71 | 72 | log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender 73 | log4j.additivity.kafka.log.LogCleaner=false 74 | 75 | log4j.logger.state.change.logger=TRACE, stateChangeAppender 76 | log4j.additivity.state.change.logger=false 77 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/producer.properties.tmpl: 
-------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | # see kafka.producer.ProducerConfig for more details 16 | 17 | ############################# Producer Basics ############################# 18 | 19 | # list of brokers used for bootstrapping knowledge about the rest of the cluster 20 | # format: host1:port1,host2:port2 ... 21 | bootstrap.servers={{ getenv "DST_BOOTSTRAP_SERVERS" }} 22 | client.id={{ getenv "CONSUMER_GROUP_ID" }} 23 | 24 | # name of the partitioner class for partitioning events; default partition spreads data randomly 25 | #partitioner.class= 26 | 27 | # specifies whether the messages are sent asynchronously (async) or synchronously (sync) 28 | # NOT SUPPORTED 29 | #producer.type=async 30 | 31 | # specify the compression codec for all data generated: none, gzip, snappy, lz4. 32 | # the old config values work as well: 0, 1, 2, 3 for none, gzip, snappy, lz4, respectively 33 | # NOT SUPPORTED 34 | #compression.codec=snappy 35 | # CORRECT PROPERT?Y 36 | compression.type={{ getenv "PROD_COMPRESSION_TYPE" }} 37 | 38 | # message encoder 39 | # NOT SUPPORTED 40 | # serializer.class=kafka.serializer.DefaultEncoder 41 | key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer 42 | value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer 43 | 44 | # allow topic level compression 45 | #compressed.topics= 46 | 47 | # Alias for queue.buffering.max.ms: Delay in milliseconds to wait for messages in the producer queue to accumulate before constructing message batches (MessageSets) to transmit to brokers. A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. 
48 | linger.ms={{ getenv "PROD_LINGER_MS" }} 49 | 50 | ############################# Async Producer ############################# 51 | # maximum time, in milliseconds, for buffering data on the producer queue 52 | # NOT SUPPORTED 53 | #queue.buffering.max.ms={{ getenv "PROD_QUEUE_BUFFERING_MAX_MS" }} 54 | 55 | # the maximum size of the blocking queue for buffering on the producer 56 | # NOT SUPPORTED 57 | #queue.buffering.max.messages={{ getenv "PROD_QUEUE_BUFFERING_MAX_MESSAGES" }} 58 | 59 | # Timeout for event enqueue: 60 | # 0: events will be enqueued immediately or dropped if the queue is full 61 | # -ve: enqueue will block indefinitely if the queue is full 62 | # +ve: enqueue will block up to this many milliseconds if the queue is full 63 | #queue.enqueue.timeout.ms= 64 | 65 | # the number of messages batched at the producer 66 | # NOT SUPPORTED 67 | #batch.num.messages={{ getenv "PROD_BATCH_NUM_MESSAGES" }} 68 | 69 | send.buffer.bytes={{ getenv "PROD_SEND_BUFFER_BYTES" }} 70 | 71 | # The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled). 72 | max.in.flight.requests.per.connection={{ getenv "PROD_MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION" }} 73 | 74 | max.request.size={{ getenv "PROD_MAX_REQUEST_SIZE" }} 75 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/test-log4j.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | 16 | log4j.rootLogger=INFO, stdout 17 | 18 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 19 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 20 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 21 | 22 | log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender 23 | log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH 24 | log4j.appender.kafkaAppender.File=logs/server.log 25 | log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout 26 | log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 27 | 28 | log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender 29 | log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH 30 | log4j.appender.stateChangeAppender.File=logs/state-change.log 31 | log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout 32 | log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 33 | 34 | log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender 35 | log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH 36 | log4j.appender.requestAppender.File=logs/kafka-request.log 37 | log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout 38 | log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 39 | 40 | log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender 41 | log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH 42 | log4j.appender.controllerAppender.File=logs/controller.log 43 | log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout 44 | log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 45 | 46 | # Turn on all our debugging info 47 | #log4j.logger.kafka.producer.async.DefaultEventHandler=DEBUG, kafkaAppender 48 | #log4j.logger.kafka.client.ClientUtils=DEBUG, kafkaAppender 49 | log4j.logger.kafka.tools=DEBUG, kafkaAppender 50 | log4j.logger.kafka.tools.ProducerPerformance$ProducerThread=DEBUG, kafkaAppender 51 | #log4j.logger.org.I0Itec.zkclient.ZkClient=DEBUG 52 | log4j.logger.kafka=INFO, kafkaAppender 53 | 54 | log4j.logger.kafka.network.RequestChannel$=TRACE, requestAppender 55 | log4j.additivity.kafka.network.RequestChannel$=false 56 | 57 | #log4j.logger.kafka.network.Processor=TRACE, requestAppender 58 | #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender 59 | #log4j.additivity.kafka.server.KafkaApis=false 60 | log4j.logger.kafka.request.logger=TRACE, requestAppender 61 | log4j.additivity.kafka.request.logger=false 62 | 63 | log4j.logger.kafka.controller=TRACE, controllerAppender 64 | log4j.additivity.kafka.controller=false 65 | 66 | log4j.logger.state.change.logger=TRACE, stateChangeAppender 67 | log4j.additivity.state.change.logger=false 68 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/tools-log4j.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. 
You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | log4j.rootLogger=INFO, stdout 17 | 18 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 19 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 20 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 21 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/topicmapping.properties.tmpl: -------------------------------------------------------------------------------- 1 | dummyTopic dummyTopic1 -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/zookeeper.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | # the directory where the snapshot is stored. 
17 | dataDir=/tmp/zookeeper 18 | # the port at which the clients will connect 19 | clientPort=2181 20 | # disable the per-ip limit on the number of connections since this is a non-production config 21 | maxClientCnxns=0 22 | -------------------------------------------------------------------------------- /build-ureplicator/entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -ex 2 | 3 | if [[ "${LOGICAL_PROCESSORS}" == "" ]]; then 4 | LOGICAL_PROCESSORS=`getconf _NPROCESSORS_ONLN` 5 | fi 6 | 7 | export JAVA_OPTS="${JAVA_OPTS} -XX:ParallelGCThreads=${LOGICAL_PROCESSORS}" 8 | 9 | 10 | confd -onetime -backend env 11 | 12 | cd /uReplicator/bin/ 13 | 14 | if [ "${SERVICE_TYPE}" == "controller" ] ; then 15 | ./start-controller.sh \ 16 | -port 9000 \ 17 | -zookeeper "${HELIX_ZK_CONNECT}" \ 18 | -helixClusterName "${HELIX_CLUSTER_NAME}" \ 19 | -backUpToGit false \ 20 | -autoRebalanceDelayInSeconds 120 \ 21 | -localBackupFilePath /tmp/uReplicator-controller \ 22 | -enableAutoWhitelist true \ 23 | -enableAutoTopicExpansion true \ 24 | -srcKafkaZkPath "${SRC_ZK_CONNECT}" \ 25 | -destKafkaZkPath "${DST_ZK_CONNECT}" \ 26 | -initWaitTimeInSeconds 10 \ 27 | -refreshTimeInSeconds 20 \ 28 | -graphiteHost "${GRAPHITE_HOST}" \ 29 | -graphitePort "${GRAPHITE_PORT}" \ 30 | -env "${HELIX_ENV}" 31 | 32 | until [[ "OK" == "$(curl --silent http://localhost:9000/health)" ]]; do 33 | echo waiting 34 | sleep 1 35 | done 36 | 37 | TOPIC_LIST=( $(echo ${TOPICS} | sed "s/,/ /g") ) 38 | PARTITION_LIST=( $(echo ${PARTITIONS} | sed "s/,/ /g") ) 39 | 40 | for index in ${!TOPIC_LIST[*]}; do 41 | TOPIC="${TOPIC_LIST[$index]}" 42 | PARTITION="${PARTITION_LIST[$index]}" 43 | 44 | echo "Topic: ${TOPIC}, Partitions: ${PARTITION}" 45 | 46 | curl -X POST -d "{\"topic\": \"${TOPIC}\", \"numPartitions\": \"${PARTITION}\"}" http://localhost:9000/topics || true 47 | done 48 | 49 | elif [ "${SERVICE_TYPE}" == "worker" ] ; then 50 | 51 | WORKER_ABORT_ON_SEND_FAILURE="${WORKER_ABORT_ON_SEND_FAILURE:=false}" 52 | 53 | ./start-worker.sh \ 54 | --helix.config /uReplicator/config/helix.properties \ 55 | --consumer.config /uReplicator/config/consumer.properties \ 56 | --producer.config /uReplicator/config/producer.properties \ 57 | --abort.on.send.failure="${WORKER_ABORT_ON_SEND_FAILURE}" 58 | 59 | fi -------------------------------------------------------------------------------- /doc/media/brookin-packet-loss-kafka-source-1min-3brokers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookin-packet-loss-kafka-source-1min-3brokers.png -------------------------------------------------------------------------------- /doc/media/brookin-packet-loss-kafka-source-1min.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookin-packet-loss-kafka-source-1min.png -------------------------------------------------------------------------------- /doc/media/brookin-packet-loss-kafka-source.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookin-packet-loss-kafka-source.png -------------------------------------------------------------------------------- 
/doc/media/brooklin-add-partitions-take1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-add-partitions-take1.png -------------------------------------------------------------------------------- /doc/media/brooklin-add-partitions-take2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-add-partitions-take2.png -------------------------------------------------------------------------------- /doc/media/brooklin-adding-new-worker.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-adding-new-worker.png -------------------------------------------------------------------------------- /doc/media/brooklin-downsize-destination-cluster-100mb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-downsize-destination-cluster-100mb.png -------------------------------------------------------------------------------- /doc/media/brooklin-downsize-destination-cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-downsize-destination-cluster.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock1.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock2.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take2.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take3.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take4.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take4.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take5.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-source-pod.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-source-pod.png -------------------------------------------------------------------------------- /doc/media/brooklin-new-topic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-new-topic.png -------------------------------------------------------------------------------- /doc/media/brooklin-packat-loss-100mb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-packat-loss-100mb.png -------------------------------------------------------------------------------- /doc/media/brooklin-packet-loss.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-packet-loss.png -------------------------------------------------------------------------------- /doc/media/brooklin-reduce-worker-pool-to-31.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-reduce-worker-pool-to-31.png -------------------------------------------------------------------------------- /doc/media/brooklin-remove-more-workers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-remove-more-workers.png -------------------------------------------------------------------------------- /doc/media/brooklin-removing-more-workers-latency.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-removing-more-workers-latency.png -------------------------------------------------------------------------------- /doc/media/brooklin-resize-kafka-source.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-resize-kafka-source.png -------------------------------------------------------------------------------- /doc/media/brooklin-scale-down-and-up-100mb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-scale-down-and-up-100mb.png -------------------------------------------------------------------------------- /doc/media/brooklin-scale-down-and-up.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-scale-down-and-up.png -------------------------------------------------------------------------------- /doc/media/brookling-killl-kafka-pod-take2-production-error-rate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookling-killl-kafka-pod-take2-production-error-rate.png -------------------------------------------------------------------------------- /doc/media/downsize-destination-cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/downsize-destination-cluster.png -------------------------------------------------------------------------------- /doc/media/kill-kafka-source-pod.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/kill-kafka-source-pod.png -------------------------------------------------------------------------------- /doc/media/kill-pod-destination.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/kill-pod-destination.png -------------------------------------------------------------------------------- /doc/media/new-topic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/new-topic.png -------------------------------------------------------------------------------- /doc/media/packet-loss-on-source-cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/packet-loss-on-source-cluster.png -------------------------------------------------------------------------------- /doc/media/packet-loss-on-workers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/packet-loss-on-workers.png -------------------------------------------------------------------------------- /doc/media/remove-worker.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/remove-worker.png -------------------------------------------------------------------------------- /go.mk: -------------------------------------------------------------------------------- 1 | LOCAL_IP := `ifconfig | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1' | head -1` 2 | 3 | ########################### 4 | # Build, test, run 5 | ########################### 6 | go-setup: 7 | @echo For mac: brew install librdkafka 8 | @echo For linux install librdkafka-dev 9 | 10 | go-build: go-generate go-test 11 | go build ./... 12 | 13 | go-run-producer: 14 | # Check out http://localhost:8001/metrics 15 | go run main.go produce --bootstrap-servers localhost:9093 --id $$(hostname) --message-size 100 --throughput 10 --topics topic1,topic2 --use-message-headers 16 | 17 | go-run-consumer: 18 | # Check out http://localhost:8000/metrics 19 | go run main.go consume --bootstrap-servers localhost:9093 --consumer-group group-4 --topics topic1,topic2 --use-message-headers 20 | 21 | go-test: 22 | go test ./... 23 | 24 | go-generate: 25 | go generate ./... 26 | 27 | ######################### 28 | # Docker 29 | ######################### 30 | go-docker-build: go-test 31 | docker build . -t rantav/kafka-mirror-tester:latest 32 | 33 | go-docker-push: go-docker-build 34 | # push to dockerhub 35 | docker push rantav/kafka-mirror-tester 36 | 37 | go-docker-run-consumer: 38 | # Check out http://localhost:8000/metrics 39 | docker run -p 8000:8000 rantav/kafka-mirror-tester consume --bootstrap-servers $(LOCAL_IP):9093 --consumer-group group-4 --topics topic1,topic2 40 | 41 | go-docker-run-producer: 42 | # Check out http://localhost:8001/metrics 43 | docker run rantav/kafka-mirror-tester produce --bootstrap-servers $(LOCAL_IP):9093 --id $$(hostname) --message-size 100 --throughput 10 --topics topic1,topic2 44 | 45 | go-release: go-docker-push 46 | -------------------------------------------------------------------------------- /go.mod: -------------------------------------------------------------------------------- 1 | module github.com/appsflyer/kafka-mirror-tester 2 | 3 | go 1.13 4 | 5 | require ( 6 | github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973 7 | github.com/confluentinc/confluent-kafka-go v0.11.6 8 | github.com/davecgh/go-spew v1.1.1 9 | github.com/deckarep/golang-set v1.7.1 10 | github.com/dustin/go-humanize v1.0.0 11 | github.com/golang/protobuf v1.2.0 12 | github.com/inconshreveable/mousetrap v1.0.0 13 | github.com/jamiealquiza/tachymeter v1.1.2 14 | github.com/konsorten/go-windows-terminal-sequences v1.0.1 15 | github.com/matttproud/golang_protobuf_extensions v1.0.1 16 | github.com/paulbellamy/ratecounter v0.2.0 17 | github.com/pkg/errors v0.8.0 18 | github.com/pmezard/go-difflib v1.0.0 19 | github.com/prometheus/client_golang v0.9.2 20 | github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910 21 | github.com/prometheus/common v0.1.0 22 | github.com/prometheus/procfs v0.0.0-20190104112138-b1a0a9a36d74 23 | github.com/sirupsen/logrus v1.2.0 24 | github.com/spf13/cobra v0.0.3 25 | github.com/spf13/pflag v1.0.3 26 | github.com/stretchr/testify v1.2.2 27 | golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9 28 | golang.org/x/sys v0.0.0-20181213200352-4d1cda033e06 29 | golang.org/x/time v0.0.0-20181108054448-85acf8d2951c 30 | ) 31 | 
-------------------------------------------------------------------------------- /go.sum: -------------------------------------------------------------------------------- 1 | github.com/alecthomas/template v0.0.0-20160405071501-a0175ee3bccc/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc= 2 | github.com/alecthomas/units v0.0.0-20151022065526-2efee857e7cf/go.mod h1:ybxpYRFXyAe+OPACYpWeL0wqObRcbAqCMya13uyzqw0= 3 | github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973 h1:xJ4a3vCFaGF/jqvzLMYoU8P317H5OQ+Via4RmuPwCS0= 4 | github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q= 5 | github.com/confluentinc/confluent-kafka-go v0.11.6 h1:rEblubnNXCjRThwAGnFSzLKYIRAoXLDC3A9r4ciziHU= 6 | github.com/confluentinc/confluent-kafka-go v0.11.6/go.mod h1:u2zNLny2xq+5rWeTQjFHbDzzNuba4P1vo31r9r4uAdg= 7 | github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= 8 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 9 | github.com/deckarep/golang-set v1.7.1 h1:SCQV0S6gTtp6itiFrTqI+pfmJ4LN85S1YzhDf9rTHJQ= 10 | github.com/deckarep/golang-set v1.7.1/go.mod h1:93vsz/8Wt4joVM7c2AVqh+YRMiUSc14yDtF28KmMOgQ= 11 | github.com/dustin/go-humanize v1.0.0 h1:VSnTsYCnlFHaM2/igO1h6X3HA71jcobQuxemgkq4zYo= 12 | github.com/dustin/go-humanize v1.0.0/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk= 13 | github.com/go-kit/kit v0.8.0/go.mod h1:xBxKIO96dXMWWy0MnWVtmwkA9/13aqxPnvrjFYMA2as= 14 | github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9GBnD5lWE= 15 | github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY= 16 | github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ= 17 | github.com/golang/protobuf v1.2.0 h1:P3YflyNX/ehuJFLhxviNdFxQPkGK5cDcApsge1SqnvM= 18 | github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= 19 | github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8= 20 | github.com/jamiealquiza/tachymeter v1.1.2 h1:cOgpMYFejxGSAe5f5JOb7uNPZ53kmEYwwpCrw1vDh2Q= 21 | github.com/jamiealquiza/tachymeter v1.1.2/go.mod h1:Ayf6zPZKEnLsc3winWEXJRkTBhdHo58HODAu1oFJkYU= 22 | github.com/julienschmidt/httprouter v1.2.0/go.mod h1:SYymIcj16QtmaHHD7aYtjjsJG7VTCxuUUipMqKk8s4w= 23 | github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ= 24 | github.com/kr/logfmt v0.0.0-20140226030751-b84e30acd515/go.mod h1:+0opPa2QZZtGFBFZlji/RkVcI2GknAs/DXo4wKdlNEc= 25 | github.com/matttproud/golang_protobuf_extensions v1.0.1 h1:4hp9jkHxhMHkqkrB3Ix0jegS5sx/RkqARlsWZ6pIwiU= 26 | github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0= 27 | github.com/mwitkow/go-conntrack v0.0.0-20161129095857-cc309e4a2223/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U= 28 | github.com/paulbellamy/ratecounter v0.2.0 h1:2L/RhJq+HA8gBQImDXtLPrDXK5qAj6ozWVK/zFXVJGs= 29 | github.com/paulbellamy/ratecounter v0.2.0/go.mod h1:Hfx1hDpSGoqxkVVpBi/IlYD7kChlfo5C6hzIHwPqfFE= 30 | github.com/pkg/errors v0.8.0 h1:WdK/asTD0HN+q6hsWO3/vpuAkAr+tw6aNJNDFFf0+qw= 31 | github.com/pkg/errors v0.8.0/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= 32 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= 33 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 34 | github.com/prometheus/client_golang 
v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXPKyh/dDVn+NZz0KFw= 35 | github.com/prometheus/client_golang v0.9.2 h1:awm861/B8OKDd2I/6o1dy3ra4BamzKhYOiGItCeZ740= 36 | github.com/prometheus/client_golang v0.9.2/go.mod h1:OsXs2jCmiKlQ1lTBmv21f2mNfw4xf/QclQDMrYNZzcM= 37 | github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910 h1:idejC8f05m9MGOsuEi1ATq9shN03HrxNkD/luQvxCv8= 38 | github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910/go.mod h1:MbSGuTsp3dbXC40dX6PRTWyKYBIrTGTE9sqQNg2J8bo= 39 | github.com/prometheus/common v0.0.0-20181126121408-4724e9255275/go.mod h1:daVV7qP5qjZbuso7PdcryaAu0sAZbrN9i7WWcTMWvro= 40 | github.com/prometheus/common v0.1.0 h1:IxU7wGikQPAcoOd3/f4Ol7+vIKS1Sgu08tzjktR4nJE= 41 | github.com/prometheus/common v0.1.0/go.mod h1:TNfzLD0ON7rHzMJeJkieUDPYmFC7Snx/y86RQel1bk4= 42 | github.com/prometheus/procfs v0.0.0-20181005140218-185b4288413d/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= 43 | github.com/prometheus/procfs v0.0.0-20181204211112-1dc9a6cbc91a/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= 44 | github.com/prometheus/procfs v0.0.0-20190104112138-b1a0a9a36d74 h1:d1Xoc24yp/pXmWl2leBiBA+Tptce6cQsA+MMx/nOOcY= 45 | github.com/prometheus/procfs v0.0.0-20190104112138-b1a0a9a36d74/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= 46 | github.com/sirupsen/logrus v1.2.0 h1:juTguoYk5qI21pwyTXY3B3Y5cOTH3ZUyZCg1v/mihuo= 47 | github.com/sirupsen/logrus v1.2.0/go.mod h1:LxeOpSwHxABJmUn/MG1IvRgCAasNZTLOkJPxbbu5VWo= 48 | github.com/spf13/cobra v0.0.3 h1:ZlrZ4XsMRm04Fr5pSFxBgfND2EBVa1nLpiy1stUsX/8= 49 | github.com/spf13/cobra v0.0.3/go.mod h1:1l0Ry5zgKvJasoi3XT1TypsSe7PqH0Sj9dhYf7v3XqQ= 50 | github.com/spf13/pflag v1.0.3 h1:zPAT6CGy6wXeQ7NtTnaTerfKOsV6V6F8agHXFiazDkg= 51 | github.com/spf13/pflag v1.0.3/go.mod h1:DYY7MBk1bdzusC3SYhjObp+wFpr4gzcvqqNjLnInEg4= 52 | github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 53 | github.com/stretchr/testify v1.2.2 h1:bSDNvY7ZPG5RlJ8otE/7V6gMiyenm9RtJ7IUVIAoJ1w= 54 | github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs= 55 | golang.org/x/crypto v0.0.0-20180904163835-0709b304e793/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4= 56 | golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9 h1:mKdxBk7AujPs8kU4m80U72y/zjbZ3UcXC7dClwKbUI0= 57 | golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4= 58 | golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= 59 | golang.org/x/net v0.0.0-20181201002055-351d144fa1fc/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= 60 | golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= 61 | golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 62 | golang.org/x/sys v0.0.0-20181116152217-5ac8a444bdc5/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 63 | golang.org/x/sys v0.0.0-20181213200352-4d1cda033e06 h1:0oC8rFnE+74kEmuHZ46F6KHsMr5Gx2gUQPuNz28iQZM= 64 | golang.org/x/sys v0.0.0-20181213200352-4d1cda033e06/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 65 | golang.org/x/time v0.0.0-20181108054448-85acf8d2951c h1:fqgJT0MGcGpPgpWU7VRdRjuArfcOvC4AoJmILihzhDg= 66 | golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= 67 | gopkg.in/alecthomas/kingpin.v2 v2.2.6/go.mod h1:FMv+mEhP44yOT+4EoQTLFTRgOQ1FBLkstjWtayDeSgw= 68 | 
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 69 | gopkg.in/yaml.v2 v2.2.1/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 70 | -------------------------------------------------------------------------------- /k8s/brooklin/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: brooklin 5 | -------------------------------------------------------------------------------- /k8s/brooklin/20zookeeper.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: brooklin 5 | name: zk-hs 6 | labels: 7 | app: zk 8 | spec: 9 | ports: 10 | - port: 2888 11 | name: server 12 | - port: 3888 13 | name: leader-election 14 | clusterIP: None 15 | selector: 16 | app: zk 17 | --- 18 | apiVersion: v1 19 | kind: Service 20 | metadata: 21 | namespace: brooklin 22 | name: zookeeper 23 | labels: 24 | app: zk 25 | spec: 26 | ports: 27 | - port: 2181 28 | name: client 29 | selector: 30 | app: zk 31 | --- 32 | apiVersion: policy/v1beta1 33 | kind: PodDisruptionBudget 34 | metadata: 35 | namespace: brooklin 36 | name: zk-pdb 37 | spec: 38 | selector: 39 | matchLabels: 40 | app: zk 41 | maxUnavailable: 1 42 | --- 43 | apiVersion: apps/v1beta1 44 | kind: StatefulSet 45 | metadata: 46 | namespace: brooklin 47 | name: zk 48 | spec: 49 | selector: 50 | matchLabels: 51 | app: zk 52 | serviceName: zk-hs 53 | replicas: 1 54 | updateStrategy: 55 | type: RollingUpdate 56 | podManagementPolicy: Parallel 57 | template: 58 | metadata: 59 | labels: 60 | app: zk 61 | spec: 62 | affinity: 63 | podAntiAffinity: 64 | requiredDuringSchedulingIgnoredDuringExecution: 65 | - labelSelector: 66 | matchExpressions: 67 | - key: "app" 68 | operator: In 69 | values: 70 | - zk-hs 71 | topologyKey: "kubernetes.io/hostname" 72 | containers: 73 | - name: kubernetes-zookeeper 74 | imagePullPolicy: Always 75 | image: "k8s.gcr.io/kubernetes-zookeeper:1.0-3.4.10" # Consider an upgrade to ZK? 
76 | resources: 77 | requests: 78 | memory: "1Gi" 79 | cpu: "0.5" 80 | ports: 81 | - containerPort: 2181 82 | name: client 83 | - containerPort: 2888 84 | name: server 85 | - containerPort: 3888 86 | name: leader-election 87 | command: 88 | - sh 89 | - -c 90 | - "start-zookeeper \ 91 | --servers=1 \ 92 | --data_dir=/var/lib/zookeeper/data \ 93 | --data_log_dir=/var/lib/zookeeper/data/log \ 94 | --conf_dir=/opt/zookeeper/conf \ 95 | --client_port=2181 \ 96 | --election_port=3888 \ 97 | --server_port=2888 \ 98 | --tick_time=2000 \ 99 | --init_limit=10 \ 100 | --sync_limit=5 \ 101 | --heap=512M \ 102 | --max_client_cnxns=200 \ 103 | --snap_retain_count=3 \ 104 | --purge_interval=12 \ 105 | --max_session_timeout=40000 \ 106 | --min_session_timeout=4000 \ 107 | --log_level=INFO" 108 | readinessProbe: 109 | exec: 110 | command: 111 | - sh 112 | - -c 113 | - "zookeeper-ready 2181" 114 | initialDelaySeconds: 10 115 | timeoutSeconds: 5 116 | livenessProbe: 117 | exec: 118 | command: 119 | - sh 120 | - -c 121 | - "zookeeper-ready 2181" 122 | initialDelaySeconds: 10 123 | timeoutSeconds: 5 124 | volumeMounts: 125 | - name: data 126 | mountPath: /var/lib/zookeeper 127 | volumes: 128 | - name: data 129 | emptyDir: {} 130 | securityContext: 131 | runAsUser: 1000 132 | fsGroup: 1000 133 | -------------------------------------------------------------------------------- /k8s/brooklin/25env-config.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: brooklin-envs 5 | namespace: brooklin 6 | data: 7 | BROOKLIN_CLUSTER_NAME: brooklin-quickstart 8 | BROOKLIN_ZOOKEEPER_CONNECT: zookeeper.brooklin.svc.cluster.local:2181 9 | BROOKLIN_HTTP_PORT: "32311" 10 | KAFKA_TP_BOOTSTRAP_SERVERS: broker.kafka-destination.svc.cluster.local:9092 11 | KAFKA_TP_ZOOKEEPER_CONNECT: zookeeper.kafka-destination.svc.cluster.local:2181 12 | KAFKA_TP_CLIENT_ID: brooklin-producer-1 13 | BROOKLIN_CONFIG: /etc/brooklin-writable 14 | JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.rmi.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=127.0.0.1 " 15 | OPTS: "-javaagent:/etc/brooklin-writable/jmx_prometheus_javaagent-0.3.1.jar=8080:/etc/jmx-config/jmx-prometheus-javaagent-config.yml" 16 | 17 | configure.sh: |- 18 | #!/bin/sh 19 | set -x 20 | wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar -O /etc/brooklin-writable/jmx_prometheus_javaagent-0.3.1.jar 21 | cp /etc/brooklin/server.properties /etc/brooklin-writable/server.properties 22 | 23 | 24 | server.properties: |- 25 | ############################# Server Basics ############################# 26 | brooklin.server.coordinator.cluster=brooklin-cluster 27 | brooklin.server.coordinator.zkAddress=localhost:2181 28 | brooklin.server.httpPort=32311 29 | brooklin.server.connectorNames=testC,fileC,dirC,kafkaC,kafkaMirroringC 30 | brooklin.server.transportProviderNames=dirTP,kafkaTP 31 | brooklin.server.csvMetricsDir=/tmp/brooklin-example/ 32 | 33 | ########################### Test event producing connector Configs ###################### 34 | brooklin.server.connector.testC.factoryClassName=com.linkedin.datastream.connectors.TestEventProducingConnectorFactory 35 | 
brooklin.server.connector.testC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.LoadbalancingStrategyFactory 36 | brooklin.server.connector.testC.strategy.TasksPerDatastream = 4 37 | 38 | ########################### File connector Configs ###################### 39 | brooklin.server.connector.fileC.factoryClassName=com.linkedin.datastream.connectors.file.FileConnectorFactory 40 | brooklin.server.connector.fileC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 41 | brooklin.server.connector.fileC.strategy.maxTasks=1 42 | 43 | ########################### Directory connector Configs ###################### 44 | brooklin.server.connector.dirC.factoryClassName=com.linkedin.datastream.connectors.directory.DirectoryConnectorFactory 45 | brooklin.server.connector.dirC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 46 | brooklin.server.connector.dirC.strategy.maxTasks=1 47 | 48 | ########################### Kafka connector Configs ###################### 49 | brooklin.server.connector.kafkaC.factoryClassName=com.linkedin.datastream.connectors.kafka.KafkaConnectorFactory 50 | brooklin.server.connector.kafkaC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 51 | 52 | ########################### Kafka Mirroring connector Configs ###################### 53 | brooklin.server.connector.kafkaMirroringC.factoryClassName=com.linkedin.datastream.connectors.kafka.mirrormaker.KafkaMirrorMakerConnectorFactory 54 | brooklin.server.connector.kafkaMirroringC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 55 | brooklin.server.connector.kafkaMirroringC.consumer.max.poll.records=10000 56 | brooklin.server.connector.kafkaMirroringC.consumer.fetch.max.wait.ms=10000 57 | # fetch.max.bytes = 52428800 58 | # max.partition.fetch.bytes = 1048576 59 | # receive.buffer.bytes = 65536 60 | brooklin.server.connector.kafkaMirroringC.consumer.receive.buffer.bytes=524288 61 | brooklin.server.connector.kafkaMirroringC.consumer.max.partition.fetch.bytes=262144 62 | brooklin.server.connector.kafkaMirroringC.pausePartitionOnError=true 63 | brooklin.server.connector.kafkaMirroringC.pauseErrorPartitionDurationMs=30000 64 | 65 | ########################### Directory transport provider configs ###################### 66 | brooklin.server.transportProvider.dirTP.factoryClassName=com.linkedin.datastream.server.DirectoryTransportProviderAdminFactory 67 | 68 | ########################### Kafka transport provider configs ###################### 69 | brooklin.server.transportProvider.kafkaTP.factoryClassName=com.linkedin.datastream.kafka.KafkaTransportProviderAdminFactory 70 | brooklin.server.transportProvider.kafkaTP.bootstrap.servers=localhost:9092 71 | brooklin.server.transportProvider.kafkaTP.zookeeper.connect=localhost:2181 72 | brooklin.server.transportProvider.kafkaTP.client.id=datastream-producer 73 | # brooklin.server.transportProvider.kafkaTP.producer.linger.ms=1000 74 | # brooklin.server.transportProvider.kafkaTP.producer.batch.size=32768 75 | # brooklin.server.transportProvider.kafkaTP.producer.send.buffer.bytes=262144 76 | # brooklin.server.transportProvider.kafkaTP.producer.max.request.size=262144 77 | # brooklin.server.transportProvider.kafkaTP.producer.max.in.flight.requests.per.connection=1 78 | 79 | -------------------------------------------------------------------------------- /k8s/brooklin/25jmx-prometheus-javaagent-config.yml: 
-------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: brooklin-jmx-prometheus-javaagent-config 5 | namespace: brooklin 6 | data: 7 | jmx-prometheus-javaagent-config.yml: |+ 8 | startDelaySeconds: 0 9 | lowercaseOutputName: true 10 | lowercaseOutputLabelNames: true 11 | whitelistObjectNames: 12 | - "java.lang:*" 13 | - "metrics:*" 14 | - "kafka.consumer:*" 15 | - "kafka.producer:*" 16 | -------------------------------------------------------------------------------- /k8s/brooklin/30brooklin.yml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | namespace: brooklin 5 | name: brooklin 6 | labels: 7 | app: brooklin 8 | spec: 9 | replicas: 32 10 | selector: 11 | matchLabels: 12 | app: brooklin 13 | template: 14 | metadata: 15 | labels: 16 | app: brooklin 17 | spec: 18 | terminationGracePeriodSeconds: 10 19 | initContainers: 20 | - name: init-zk 21 | image: busybox 22 | command: 23 | - /bin/sh 24 | - -c 25 | - 'until [ "imok" = "$(echo ruok | nc -w 1 $(echo $BROOKLIN_ZOOKEEPER_CONNECT | cut -d: -f1) $(echo $BROOKLIN_ZOOKEEPER_CONNECT | cut -d: -f2))" ] ; do echo waiting ; sleep 10 ; done' 26 | envFrom: 27 | - configMapRef: 28 | name: brooklin-envs 29 | - name: init-config 30 | image: busybox 31 | envFrom: 32 | - configMapRef: 33 | name: brooklin-envs 34 | command: ['sh', '/etc/brooklin/configure.sh'] 35 | volumeMounts: 36 | - name: config 37 | mountPath: /etc/brooklin 38 | - name: config-writable 39 | mountPath: /etc/brooklin-writable 40 | containers: 41 | - name: brooklin 42 | image: rantav/brooklin:1.0.2-0 43 | imagePullPolicy: IfNotPresent 44 | env: 45 | - name: HEAP_OPTS 46 | value: "-Xmx2G -Xms2G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps" 47 | envFrom: 48 | - configMapRef: 49 | name: brooklin-envs 50 | ports: 51 | - name: service 52 | containerPort: 32311 53 | - name: metrics 54 | containerPort: 8080 55 | resources: 56 | requests: 57 | cpu: 700m 58 | memory: 2Gi 59 | limits: 60 | cpu: 700m 61 | volumeMounts: 62 | - name: jmx-config 63 | mountPath: /etc/jmx-config 64 | - name: config 65 | mountPath: /etc/brooklin 66 | - name: config-writable 67 | mountPath: /etc/brooklin-writable 68 | volumes: 69 | - name: jmx-config 70 | configMap: 71 | name: brooklin-jmx-prometheus-javaagent-config 72 | - name: config 73 | configMap: 74 | name: brooklin-envs 75 | - name: config-writable 76 | emptyDir: {} 77 | affinity: 78 | podAntiAffinity: 79 | requiredDuringSchedulingIgnoredDuringExecution: 80 | - labelSelector: 81 | matchExpressions: 82 | - key: app 83 | operator: In 84 | values: 85 | - kafka-destination 86 | namespaces: 87 | - kafka-destination 88 | topologyKey: "kubernetes.io/hostname" 89 | #- labelSelector: 90 | #matchExpressions: 91 | #- key: app 92 | #operator: In 93 | #values: 94 | #- brooklin 95 | #- key: component 96 | #operator: In 97 | #values: 98 | #- worker 99 | #topologyKey: "kubernetes.io/hostname" 100 | -------------------------------------------------------------------------------- /k8s/brooklin/40monitoring.yml: -------------------------------------------------------------------------------- 1 | # Headless service just for the sake of exposing the metrics 2 | apiVersion: v1 3 | kind: Service 4 | metadata: 5 | name: brooklin 6 | 
namespace: brooklin 7 | labels: 8 | app: brooklin 9 | spec: 10 | ports: 11 | - name: metrics 12 | port: 8080 13 | clusterIP: None 14 | selector: 15 | app: brooklin 16 | --- 17 | apiVersion: monitoring.coreos.com/v1 18 | kind: ServiceMonitor 19 | metadata: 20 | labels: 21 | k8s-app: brooklin 22 | name: brooklin 23 | namespace: monitoring 24 | spec: 25 | endpoints: 26 | - port: metrics 27 | jobLabel: k8s-app 28 | namespaceSelector: 29 | matchNames: 30 | - brooklin 31 | selector: 32 | matchLabels: 33 | app: brooklin 34 | --- 35 | apiVersion: rbac.authorization.k8s.io/v1beta1 36 | kind: ClusterRole 37 | metadata: 38 | name: prometheus-k8s 39 | namespace: brooklin 40 | rules: 41 | - apiGroups: [""] 42 | resources: 43 | - nodes 44 | - services 45 | - endpoints 46 | - pods 47 | verbs: ["get", "list", "watch"] 48 | - apiGroups: [""] 49 | resources: 50 | - configmaps 51 | verbs: ["get"] 52 | - nonResourceURLs: ["/metrics"] 53 | verbs: ["get"] 54 | --- 55 | apiVersion: rbac.authorization.k8s.io/v1beta1 56 | kind: ClusterRoleBinding 57 | metadata: 58 | name: prometheus-k8s 59 | roleRef: 60 | apiGroup: rbac.authorization.k8s.io 61 | kind: ClusterRole 62 | name: prometheus-k8s 63 | subjects: 64 | - kind: ServiceAccount 65 | name: prometheus-k8s 66 | namespace: monitoring 67 | -------------------------------------------------------------------------------- /k8s/brooklin/delete-replicate-topic.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | set -e 5 | 6 | if [ "$#" -ne 1 ]; then 7 | echo "Illegal number of parameters. Looking for topic name" 8 | exit 1 9 | fi 10 | 11 | topic_name=$1 12 | 13 | brooklin_pod=$(kubectl --context eu-west-1.k8s.local get pods -n brooklin -l app=brooklin -o 'jsonpath={.items[0].metadata.name}') 14 | kubectl --context eu-west-1.k8s.local exec -n brooklin $brooklin_pod -- bash -c "unset JMX_OPTS; unset JMX_PORT; unset OPTS; \$BROOKLIN_HOME/bin/brooklin-rest-client.sh -o DELETE -u http://localhost:32311/ -n mirror-$topic_name 2> /dev/null" 15 | -------------------------------------------------------------------------------- /k8s/brooklin/replicate-topic.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | set -e 5 | 6 | if [ "$#" -ne 1 ]; then 7 | echo "Illegal number of parameters. 
Looking for topic name" 8 | exit 1 9 | fi 10 | 11 | topic_name=$1 12 | 13 | kafka_source_ip=$(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po kafka-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}') 14 | brooklin_pod=$(kubectl --context eu-west-1.k8s.local get pods -n brooklin -l app=brooklin -o 'jsonpath={.items[0].metadata.name}') 15 | kubectl --context eu-west-1.k8s.local exec -n brooklin $brooklin_pod -- bash -c "unset JMX_OPTS; unset JMX_PORT; unset OPTS; \$BROOKLIN_HOME/bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n mirror-$topic_name -s \"kafka://$kafka_source_ip:9093/^$topic_name$\" -c kafkaMirroringC -t kafkaTP -m '{\"owner\":\"test-user\",\"system.reuseExistingDestination\":\"false\",\"system.destination.identityPartitioningEnabled\":true}' 2>/dev/null" 16 | -------------------------------------------------------------------------------- /k8s/brooklin/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test Kafka to see if a topic had been replicated 5 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/kafka-destination-0 --timeout=-1s 6 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/kafka-source-0 --timeout=-1s 7 | 8 | kubectl --context eu-west-1.k8s.local -n brooklin wait --for=condition=Available deployment/brooklin --timeout=-1s 9 | while [[ $(kubectl --context eu-west-1.k8s.local get pods -n brooklin -l app=brooklin -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}' | cut -d' ' -f1) != "True" ]]; do echo "waiting for brooklin pod..." && sleep 10; done 10 | 11 | # Run end to end tests. Produce to the source cluster, consume from the destination cluster 12 | TOPIC="_test_replicator_$(date +%s)" 13 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> REPLICATOR GREAT SUCCESS! 
<<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 14 | 15 | $(dirname "$0")/replicate-topic.sh $TOPIC 16 | 17 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 18 | 19 | $(dirname "$0")/delete-replicate-topic.sh $TOPIC -------------------------------------------------------------------------------- /k8s/kafka-destination/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: kafka-destination 5 | -------------------------------------------------------------------------------- /k8s/kafka-destination/10broker-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: broker-config 4 | namespace: kafka-destination 5 | apiVersion: v1 6 | data: 7 | init.sh: |- 8 | #!/bin/bash 9 | set -x 10 | 11 | KAFKA_BROKER_ID=${HOSTNAME##*-} 12 | sed "s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/" /etc/kafka/server.properties > /etc/kafka-writable/server.properties 13 | 14 | hash kubectl 2>/dev/null || { 15 | sed -i "s/#init#broker.rack=#init#/#init#broker.rack=# kubectl not found in path/" /etc/kafka-writable/server.properties 16 | } && { 17 | ZONE=$(kubectl get node "$NODE_NAME" -o=go-template='{{index .metadata.labels "failure-domain.beta.kubernetes.io/zone"}}') 18 | if [ $? -ne 0 ]; then 19 | sed -i "s/#init#broker.rack=#init#/#init#broker.rack=# zone lookup failed, see -c init-config logs/" /etc/kafka-writable/server.properties 20 | elif [ "x$ZONE" == "x" ]; then 21 | sed -i "s/#init#broker.rack=#init#/#init#broker.rack=# zone label not found for node $NODE_NAME/" /etc/kafka-writable/server.properties 22 | else 23 | sed -i "s/#init#broker.rack=#init#/broker.rack=$ZONE/" /etc/kafka-writable/server.properties 24 | fi 25 | } 26 | 27 | server.properties: |- 28 | # Licensed to the Apache Software Foundation (ASF) under one or more 29 | # contributor license agreements. See the NOTICE file distributed with 30 | # this work for additional information regarding copyright ownership. 31 | # The ASF licenses this file to You under the Apache License, Version 2.0 32 | # (the "License"); you may not use this file except in compliance with 33 | # the License. You may obtain a copy of the License at 34 | # 35 | # http://www.apache.org/licenses/LICENSE-2.0 36 | # 37 | # Unless required by applicable law or agreed to in writing, software 38 | # distributed under the License is distributed on an "AS IS" BASIS, 39 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 40 | # See the License for the specific language governing permissions and 41 | # limitations under the License. 42 | 43 | # see kafka.server.KafkaConfig for additional details and defaults 44 | 45 | ############################# Server Basics ############################# 46 | 47 | # The id of the broker. This must be set to a unique integer for each broker. 48 | #init#broker.id=#init# 49 | 50 | #init#broker.rack=#init# 51 | 52 | # Switch to enable topic deletion or not, default value is false 53 | delete.topic.enable=true 54 | 55 | ############################# Socket Server Settings ############################# 56 | 57 | # The address the socket server listens on. 
It will get the value returned from 58 | # java.net.InetAddress.getCanonicalHostName() if not configured. 59 | # FORMAT: 60 | # listeners = listener_name://host_name:port 61 | # EXAMPLE: 62 | # listeners = PLAINTEXT://your.host.name:9092 63 | #listeners=PLAINTEXT://:9092 64 | 65 | # Hostname and port the broker will advertise to producers and consumers. If not set, 66 | # it uses the value for "listeners" if configured. Otherwise, it will use the value 67 | # returned from java.net.InetAddress.getCanonicalHostName(). 68 | #advertised.listeners=PLAINTEXT://your.host.name:9092 69 | 70 | # Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details 71 | #listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL 72 | 73 | # The number of threads that the server uses for receiving requests from the network and sending responses to the network 74 | num.network.threads=8 75 | 76 | # The number of threads that the server uses for processing requests, which may include disk I/O 77 | num.io.threads=8 78 | 79 | # The send buffer (SO_SNDBUF) used by the socket server 80 | socket.send.buffer.bytes=104857600 81 | 82 | # The receive buffer (SO_RCVBUF) used by the socket server 83 | socket.receive.buffer.bytes=104857600 84 | 85 | # The maximum size of a request that the socket server will accept (protection against OOM) 86 | socket.request.max.bytes=104857600 87 | 88 | 89 | ############################# Log Basics ############################# 90 | 91 | # A comma seperated list of directories under which to store log files 92 | log.dirs=/tmp/kafka-logs 93 | 94 | # The default number of log partitions per topic. More partitions allow greater 95 | # parallelism for consumption, but this will also result in more files across 96 | # the brokers. 97 | num.partitions=1 98 | 99 | # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown. 100 | # This value is recommended to be increased for installations with data dirs located in RAID array. 101 | num.recovery.threads.per.data.dir=1 102 | 103 | ############################# Internal Topic Settings ############################# 104 | # The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state" 105 | # For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3. 106 | offsets.topic.replication.factor=1 107 | transaction.state.log.replication.factor=1 108 | transaction.state.log.min.isr=1 109 | 110 | ############################# Log Flush Policy ############################# 111 | 112 | # Messages are immediately written to the filesystem but by default we only fsync() to sync 113 | # the OS cache lazily. The following configurations control the flush of data to disk. 114 | # There are a few important trade-offs here: 115 | # 1. Durability: Unflushed data may be lost if you are not using replication. 116 | # 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush. 117 | # 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks. 118 | # The settings below allow one to configure the flush policy to flush data after a period of time or 119 | # every N messages (or both). This can be done globally and overridden on a per-topic basis. 
120 | 121 | # The number of messages to accept before forcing a flush of data to disk 122 | #log.flush.interval.messages=10000 123 | 124 | # The maximum amount of time a message can sit in a log before we force a flush 125 | #log.flush.interval.ms=1000 126 | 127 | ############################# Log Retention Policy ############################# 128 | 129 | # The following configurations control the disposal of log segments. The policy can 130 | # be set to delete segments after a period of time, or after a given size has accumulated. 131 | # A segment will be deleted whenever *either* of these criteria are met. Deletion always happens 132 | # from the end of the log. 133 | 134 | # The minimum age of a log file to be eligible for deletion due to age 135 | log.retention.hours=168 136 | 137 | # A size-based retention policy for logs. Segments are pruned from the log as long as the remaining 138 | # segments don't drop below log.retention.bytes. Functions independently of log.retention.hours. 139 | #log.retention.bytes=1073741824 140 | 141 | # The maximum size of a log segment file. When this size is reached a new log segment will be created. 142 | log.segment.bytes=1073741824 143 | 144 | # The interval at which log segments are checked to see if they can be deleted according 145 | # to the retention policies 146 | log.retention.check.interval.ms=60000 147 | 148 | ############################# Zookeeper ############################# 149 | 150 | # Zookeeper connection string (see zookeeper docs for details). 151 | # This is a comma separated host:port pairs, each corresponding to a zk 152 | # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002". 153 | # You can also append an optional chroot string to the urls to specify the 154 | # root directory for all kafka znodes. 155 | zookeeper.connect=zookeeper.kafka-destination.svc.cluster.local:2181 156 | 157 | # Timeout in ms for connecting to zookeeper 158 | zookeeper.connection.timeout.ms=6000 159 | 160 | 161 | ############################# Group Coordinator Settings ############################# 162 | 163 | # The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance. 164 | # The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms. 165 | # The default value for this is 3 seconds. 166 | # We override this to 0 here as it makes for a better out-of-the-box experience for development and testing. 167 | # However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup. 168 | group.initial.rebalance.delay.ms=0 169 | 170 | log4j.properties: |- 171 | # Licensed to the Apache Software Foundation (ASF) under one or more 172 | # contributor license agreements. See the NOTICE file distributed with 173 | # this work for additional information regarding copyright ownership. 174 | # The ASF licenses this file to You under the Apache License, Version 2.0 175 | # (the "License"); you may not use this file except in compliance with 176 | # the License. 
You may obtain a copy of the License at 177 | # 178 | # http://www.apache.org/licenses/LICENSE-2.0 179 | # 180 | # Unless required by applicable law or agreed to in writing, software 181 | # distributed under the License is distributed on an "AS IS" BASIS, 182 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 183 | # See the License for the specific language governing permissions and 184 | # limitations under the License. 185 | 186 | # Unspecified loggers and loggers with additivity=true output to server.log and stdout 187 | # Note that INFO only applies to unspecified loggers, the log level of the child logger is used otherwise 188 | log4j.rootLogger=INFO, stdout 189 | 190 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 191 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 192 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 193 | 194 | log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender 195 | log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH 196 | log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log 197 | log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout 198 | log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 199 | 200 | log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender 201 | log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH 202 | log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log 203 | log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout 204 | log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 205 | 206 | log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender 207 | log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH 208 | log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log 209 | log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout 210 | log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 211 | 212 | log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender 213 | log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH 214 | log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log 215 | log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout 216 | log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 217 | 218 | log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender 219 | log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH 220 | log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log 221 | log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout 222 | log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 223 | 224 | log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender 225 | log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd-HH 226 | log4j.appender.authorizerAppender.File=${kafka.logs.dir}/kafka-authorizer.log 227 | log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout 228 | log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 229 | 230 | # Change the two lines below to adjust ZK client logging 231 | log4j.logger.org.I0Itec.zkclient.ZkClient=INFO 232 | log4j.logger.org.apache.zookeeper=INFO 233 | 234 | # Change the two lines below to adjust the general broker logging level (output to server.log and stdout) 235 | log4j.logger.kafka=INFO 
236 | log4j.logger.org.apache.kafka=INFO 237 | 238 | # Change to DEBUG or TRACE to enable request logging 239 | log4j.logger.kafka.request.logger=WARN, requestAppender 240 | log4j.additivity.kafka.request.logger=false 241 | 242 | # Uncomment the lines below and change log4j.logger.kafka.network.RequestChannel$ to TRACE for additional output 243 | # related to the handling of requests 244 | #log4j.logger.kafka.network.Processor=TRACE, requestAppender 245 | #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender 246 | #log4j.additivity.kafka.server.KafkaApis=false 247 | log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender 248 | log4j.additivity.kafka.network.RequestChannel$=false 249 | 250 | log4j.logger.kafka.controller=TRACE, controllerAppender 251 | log4j.additivity.kafka.controller=false 252 | 253 | log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender 254 | log4j.additivity.kafka.log.LogCleaner=false 255 | 256 | log4j.logger.state.change.logger=TRACE, stateChangeAppender 257 | log4j.additivity.state.change.logger=false 258 | 259 | # Change to DEBUG to enable audit log for the authorizer 260 | log4j.logger.kafka.authorizer.logger=WARN, authorizerAppender 261 | log4j.additivity.kafka.authorizer.logger=false 262 | -------------------------------------------------------------------------------- /k8s/kafka-destination/10metrics-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: jmx-config 4 | namespace: kafka-destination 5 | apiVersion: v1 6 | data: 7 | jmx-kafka-prometheus.yml: |+ 8 | lowercaseOutputName: true 9 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 10 | ssl: false 11 | whitelistObjectNames: ["kafka.server:*","kafka.controller:*","java.lang:*"] 12 | rules: 13 | - pattern : kafka.server<>Value 14 | - pattern : kafka.server<>OneMinuteRate 15 | - pattern : kafka.server<>OneMinuteRate 16 | - pattern : kafka.server<>queue-size 17 | - pattern : kafka.server<>(Value|OneMinuteRate) 18 | - pattern : kafka.server<>(.*) 19 | - pattern : kafka.server<>(.*) 20 | - pattern : kafka.server<>queue-size 21 | - pattern : kafka.server<>OneMinuteRate 22 | - pattern : kafka.controller<>Value 23 | - pattern : java.lang<>SystemCpuLoad 24 | - pattern : java.langused 25 | - pattern : java.lang<>FreePhysicalMemorySize 26 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 27 | name: java_lang_threading_threadcount 28 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 29 | name: java_lang_operatingsystem_openfiledescriptorcount 30 | - pattern: 'java.lang(.+): .*' 31 | name: java_lang_memory_nonheapmemoryusage_$1 32 | 33 | jmx-zookeeper-prometheus.yaml: |+ 34 | startDelaySeconds: 0 35 | lowercaseOutputName: true 36 | lowercaseOutputLabelNames: true 37 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 38 | ssl: false 39 | whitelistObjectNames: ["java.lang:*","org.apache.ZooKeeperService:*"] 40 | rules: 41 | - pattern: 'java.lang(.+): .*' 42 | name: java_lang_Memory_HeapMemoryUsage_$1 43 | - pattern: 'java.lang(.+): .*' 44 | name: java_lang_Memory_NonHeapMemoryUsage_$1 45 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 46 | name: java_lang_OperatingSystem_OpenFileDescriptorCount 47 | - pattern: 'java.lang<.*>ProcessCpuLoad: .*' 48 | name: java_lang_OperatingSystem_ProcessCpuLoad 49 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 50 | name: java_lang_Threading_ThreadCount 51 | # These are still incorrect, they need more work 52 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 
53 | name: "zookeeper_$2" 54 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 55 | name: "zookeeper_$3" 56 | labels: 57 | replicaId: "$2" 58 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 59 | name: "zookeeper_$4" 60 | labels: 61 | replicaId: "$2" 62 | memberType: "$3" 63 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 64 | name: "zookeeper_$4_$5" 65 | labels: 66 | replicaId: "$2" 67 | memberType: "$3" 68 | -------------------------------------------------------------------------------- /k8s/kafka-destination/10zookeeper-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: zookeeper-config 4 | namespace: kafka-destination 5 | apiVersion: v1 6 | data: 7 | init.sh: |- 8 | #!/bin/bash 9 | set -x 10 | 11 | [ -z "$ID_OFFSET" ] && ID_OFFSET=1 12 | export ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-} + $ID_OFFSET)) 13 | echo "${ZOOKEEPER_SERVER_ID:-1}" | tee /var/lib/zookeeper/data/myid 14 | sed "s/server\.$ZOOKEEPER_SERVER_ID\=[a-z0-9.-]*/server.$ZOOKEEPER_SERVER_ID=0.0.0.0/" /etc/kafka/zookeeper.properties > /etc/kafka-writable/zookeeper.properties 15 | 16 | 17 | zookeeper.properties: |- 18 | tickTime=2000 19 | dataDir=/var/lib/zookeeper/data 20 | dataLogDir=/var/lib/zookeeper/log 21 | clientPort=2181 22 | initLimit=5 23 | syncLimit=2 24 | 25 | log4j.properties: |- 26 | log4j.rootLogger=INFO, stdout 27 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 28 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 29 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 30 | 31 | # Suppress connection log messages, three lines per livenessProbe execution 32 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN 33 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN 34 | -------------------------------------------------------------------------------- /k8s/kafka-destination/20dns.yml: -------------------------------------------------------------------------------- 1 | # A headless service to create DNS records 2 | --- 3 | apiVersion: v1 4 | kind: Service 5 | metadata: 6 | name: broker 7 | namespace: kafka-destination 8 | labels: 9 | app: kafka-destination 10 | spec: 11 | ports: 12 | - name: broker 13 | port: 9092 14 | - name: prometheus 15 | port: 5556 16 | clusterIP: None 17 | selector: 18 | app: kafka-destination 19 | -------------------------------------------------------------------------------- /k8s/kafka-destination/20pzoo-service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: pzoo 5 | namespace: kafka-destination 6 | spec: 7 | ports: 8 | - port: 2888 9 | name: peer 10 | - port: 3888 11 | name: leader-election 12 | clusterIP: None 13 | selector: 14 | app: zookeeper-replica 15 | storage: persistent 16 | -------------------------------------------------------------------------------- /k8s/kafka-destination/30service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: zookeeper 5 | namespace: kafka-destination 6 | labels: 7 | app: zookeeper 8 | spec: 9 | ports: 10 | - port: 2181 11 | name: client 12 | - port: 5556 13 | name: prometheus 14 | selector: 15 | app: zookeeper 16 | -------------------------------------------------------------------------------- /k8s/kafka-destination/50kafka.yml: -------------------------------------------------------------------------------- 1 | 
apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: kafka-destination 5 | namespace: kafka-destination 6 | spec: 7 | selector: 8 | matchLabels: 9 | app: kafka-destination 10 | serviceName: "broker" 11 | replicas: 16 12 | updateStrategy: 13 | type: OnDelete 14 | template: 15 | metadata: 16 | labels: 17 | app: kafka-destination 18 | spec: 19 | terminationGracePeriodSeconds: 30 20 | initContainers: 21 | - name: init-config 22 | image: solsson/kafka-initutils@sha256:c98d7fb5e9365eab391a5dcd4230fc6e72caf929c60f29ff091e3b0215124713 23 | env: 24 | - name: NODE_NAME 25 | valueFrom: 26 | fieldRef: 27 | fieldPath: spec.nodeName 28 | - name: POD_NAME 29 | valueFrom: 30 | fieldRef: 31 | fieldPath: metadata.name 32 | - name: POD_NAMESPACE 33 | valueFrom: 34 | fieldRef: 35 | fieldPath: metadata.namespace 36 | command: ['/bin/bash', '/etc/kafka/init.sh'] 37 | volumeMounts: 38 | - name: config 39 | mountPath: /etc/kafka 40 | - name: config-writable 41 | mountPath: /etc/kafka-writable 42 | containers: 43 | - name: broker 44 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 45 | env: 46 | - name: KAFKA_LOG4J_OPTS 47 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 48 | - name: JMX_PORT 49 | value: "5555" 50 | - name: KAFKA_HEAP_OPTS 51 | value: "-Xmx11G -Xms11G" 52 | ports: 53 | - name: inside 54 | containerPort: 9092 55 | - name: jmx 56 | containerPort: 5555 57 | command: 58 | - ./bin/kafka-server-start.sh 59 | - /etc/kafka-writable/server.properties 60 | resources: 61 | requests: 62 | cpu: 1200m 63 | memory: 12Gi 64 | ephemeral-storage: "80Gi" 65 | limits: 66 | memory: 12Gi 67 | readinessProbe: 68 | tcpSocket: 69 | port: inside 70 | timeoutSeconds: 1 71 | livenessProbe: 72 | tcpSocket: 73 | port: inside 74 | initialDelaySeconds: 60 75 | periodSeconds: 20 76 | timeoutSeconds: 1 77 | volumeMounts: 78 | - name: config 79 | mountPath: /etc/kafka 80 | - name: config-writable 81 | mountPath: /etc/kafka-writable 82 | - name: data 83 | mountPath: /var/lib/kafka/data 84 | - name: metrics 85 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 86 | command: 87 | - java 88 | - -XX:+UnlockExperimentalVMOptions 89 | - -XX:+UseCGroupMemoryLimitForHeap 90 | - -XX:MaxRAMFraction=1 91 | - -XshowSettings:vm 92 | - -jar 93 | - jmx_prometheus_httpserver.jar 94 | - "5556" 95 | - /etc/jmx-kafka/jmx-kafka-prometheus.yml 96 | ports: 97 | - name: prometheus 98 | containerPort: 5556 99 | resources: 100 | requests: 101 | cpu: 100m 102 | memory: 500Mi 103 | #limits: 104 | #memory: 200Mi 105 | volumeMounts: 106 | - name: jmx-config 107 | mountPath: /etc/jmx-kafka 108 | volumes: 109 | - name: config 110 | configMap: 111 | name: broker-config 112 | - name: config-writable 113 | emptyDir: {} 114 | - name: jmx-config 115 | configMap: 116 | name: jmx-config 117 | - name: data 118 | emptyDir: {} 119 | # affinity: 120 | # podAntiAffinity: 121 | # requiredDuringSchedulingIgnoredDuringExecution: 122 | # - labelSelector: 123 | # matchExpressions: 124 | # - key: app 125 | # operator: In 126 | # values: 127 | # - kafka-destination 128 | # topologyKey: "kubernetes.io/hostname" 129 | -------------------------------------------------------------------------------- /k8s/kafka-destination/50pzoo.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: pzoo-destination 5 | namespace: kafka-destination 
6 | spec: 7 | selector: 8 | matchLabels: 9 | app: zookeeper 10 | storage: persistent 11 | serviceName: "pzoo" 12 | replicas: 1 13 | updateStrategy: 14 | type: OnDelete 15 | template: 16 | metadata: 17 | labels: 18 | app: zookeeper 19 | storage: persistent 20 | annotations: 21 | spec: 22 | terminationGracePeriodSeconds: 10 23 | initContainers: 24 | - name: init-config 25 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 26 | command: ['/bin/bash', '/etc/kafka/init.sh'] 27 | volumeMounts: 28 | - name: config 29 | mountPath: /etc/kafka 30 | - name: config-writable 31 | mountPath: /etc/kafka-writable 32 | - name: data 33 | mountPath: /var/lib/zookeeper/data 34 | containers: 35 | - name: zookeeper 36 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 37 | env: 38 | - name: KAFKA_LOG4J_OPTS 39 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 40 | - name: JMX_PORT 41 | value: "5555" 42 | command: 43 | - ./bin/zookeeper-server-start.sh 44 | - /etc/kafka-writable/zookeeper.properties 45 | ports: 46 | - containerPort: 2181 47 | name: client 48 | - containerPort: 2888 49 | name: peer 50 | - containerPort: 3888 51 | name: leader-election 52 | - name: jmx 53 | containerPort: 5555 54 | resources: 55 | requests: 56 | cpu: 20m 57 | memory: 200Mi 58 | ephemeral-storage: "2Gi" 59 | readinessProbe: 60 | exec: 61 | command: 62 | - /bin/sh 63 | - -c 64 | - '[ "imok" = "$(echo ruok | nc -w 1 127.0.0.1 2181)" ]' 65 | volumeMounts: 66 | - name: config 67 | mountPath: /etc/kafka 68 | - name: config-writable 69 | mountPath: /etc/kafka-writable 70 | - name: data 71 | mountPath: /var/lib/zookeeper/data 72 | - name: metrics 73 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 74 | command: 75 | - java 76 | - -XX:+UnlockExperimentalVMOptions 77 | - -XX:+UseCGroupMemoryLimitForHeap 78 | - -XX:MaxRAMFraction=1 79 | - -XshowSettings:vm 80 | - -jar 81 | - jmx_prometheus_httpserver.jar 82 | - "5556" 83 | - /etc/jmx-config/jmx-zookeeper-prometheus.yaml 84 | ports: 85 | - name: prometheus 86 | containerPort: 5556 87 | resources: 88 | requests: 89 | cpu: 100m 90 | memory: 500Mi 91 | volumeMounts: 92 | - name: jmx-config 93 | mountPath: /etc/jmx-config 94 | volumes: 95 | - name: config 96 | configMap: 97 | name: zookeeper-config 98 | - name: config-writable 99 | emptyDir: {} 100 | - name: data 101 | emptyDir: {} 102 | - name: jmx-config 103 | configMap: 104 | name: jmx-config 105 | affinity: 106 | podAntiAffinity: 107 | requiredDuringSchedulingIgnoredDuringExecution: 108 | - labelSelector: 109 | matchExpressions: 110 | - key: app 111 | operator: In 112 | values: 113 | - zookeeper 114 | topologyKey: "kubernetes.io/hostname" 115 | -------------------------------------------------------------------------------- /k8s/kafka-destination/60monitoring.yml: -------------------------------------------------------------------------------- 1 | # Monitor kafka 2 | apiVersion: monitoring.coreos.com/v1 3 | kind: ServiceMonitor 4 | metadata: 5 | labels: 6 | k8s-app: kafka 7 | name: kafka 8 | namespace: monitoring 9 | spec: 10 | endpoints: 11 | - port: prometheus 12 | jobLabel: k8s-app 13 | namespaceSelector: 14 | matchNames: 15 | - kafka-destination 16 | selector: 17 | matchLabels: 18 | app: kafka-destination 19 | --- 20 | 21 | # Monitor zookeeper 22 | apiVersion: monitoring.coreos.com/v1 23 | kind: ServiceMonitor 24 | metadata: 25 | labels: 26 | k8s-app: 
zookeeper 27 | name: zookeeper 28 | namespace: monitoring 29 | spec: 30 | endpoints: 31 | - port: prometheus 32 | jobLabel: k8s-app 33 | namespaceSelector: 34 | matchNames: 35 | - kafka-destination 36 | selector: 37 | matchLabels: 38 | app: zookeeper 39 | --- 40 | 41 | # set permissions 42 | apiVersion: rbac.authorization.k8s.io/v1beta1 43 | kind: ClusterRole 44 | metadata: 45 | name: prometheus-k8s 46 | namespace: kafka-destination 47 | rules: 48 | - apiGroups: [""] 49 | resources: 50 | - nodes 51 | - services 52 | - endpoints 53 | - pods 54 | verbs: ["get", "list", "watch"] 55 | - apiGroups: [""] 56 | resources: 57 | - configmaps 58 | verbs: ["get"] 59 | - nonResourceURLs: ["/metrics"] 60 | verbs: ["get"] 61 | --- 62 | apiVersion: rbac.authorization.k8s.io/v1beta1 63 | kind: ClusterRoleBinding 64 | metadata: 65 | name: prometheus-k8s 66 | roleRef: 67 | apiGroup: rbac.authorization.k8s.io 68 | kind: ClusterRole 69 | name: prometheus-k8s 70 | subjects: 71 | - kind: ServiceAccount 72 | name: prometheus-k8s 73 | namespace: monitoring 74 | -------------------------------------------------------------------------------- /k8s/kafka-destination/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test ZK 5 | #kubectl exec -n kafka-destination pzoo-destination-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 create /foo bar 6 | #kubectl exec -n kafka-destination pzoo-destination-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /foo 7 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/pzoo-destination-0 --timeout=-1s 8 | 9 | # wait some, to make sure ZK is with us 10 | sleep 20 11 | 12 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination pzoo-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0" 13 | 14 | # Test Kafka from the inside 15 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/kafka-destination-0 --timeout=-1s 16 | 17 | # wait some, to make sure kafka is with us 18 | sleep 20 19 | 20 | TOPIC="_test_destination_$(date +%s)" 21 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> DESTINATION GREAT SUCCESS! 
<<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 22 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 23 | -------------------------------------------------------------------------------- /k8s/kafka-source/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: kafka-source 5 | -------------------------------------------------------------------------------- /k8s/kafka-source/10metrics-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: jmx-config 4 | namespace: kafka-source 5 | apiVersion: v1 6 | data: 7 | 8 | jmx-kafka-prometheus.yml: |+ 9 | lowercaseOutputName: true 10 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 11 | ssl: false 12 | whitelistObjectNames: ["kafka.server:*","kafka.controller:*","java.lang:*"] 13 | rules: 14 | - pattern : kafka.server<>Value 15 | - pattern : kafka.server<>OneMinuteRate 16 | - pattern : kafka.server<>OneMinuteRate 17 | - pattern : kafka.server<>queue-size 18 | - pattern : kafka.server<>(Value|OneMinuteRate) 19 | - pattern : kafka.server<>(.*) 20 | - pattern : kafka.server<>(.*) 21 | - pattern : kafka.server<>queue-size 22 | - pattern : kafka.server<>OneMinuteRate 23 | - pattern : kafka.controller<>Value 24 | - pattern : java.lang<>SystemCpuLoad 25 | - pattern : java.langused 26 | - pattern : java.lang<>FreePhysicalMemorySize 27 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 28 | name: java_lang_threading_threadcount 29 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 30 | name: java_lang_operatingsystem_openfiledescriptorcount 31 | - pattern: 'java.lang(.+): .*' 32 | name: java_lang_memory_nonheapmemoryusage_$1 33 | 34 | jmx-zookeeper-prometheus.yaml: |+ 35 | startDelaySeconds: 0 36 | lowercaseOutputName: true 37 | lowercaseOutputLabelNames: true 38 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 39 | ssl: false 40 | whitelistObjectNames: ["java.lang:*","org.apache.ZooKeeperService:*"] 41 | rules: 42 | - pattern: 'java.lang(.+): .*' 43 | name: java_lang_Memory_HeapMemoryUsage_$1 44 | - pattern: 'java.lang(.+): .*' 45 | name: java_lang_Memory_NonHeapMemoryUsage_$1 46 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 47 | name: java_lang_OperatingSystem_OpenFileDescriptorCount 48 | - pattern: 'java.lang<.*>ProcessCpuLoad: .*' 49 | name: java_lang_OperatingSystem_ProcessCpuLoad 50 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 51 | name: java_lang_Threading_ThreadCount 52 | # These are still incorrect, they need more work 53 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 54 | name: "zookeeper_$2" 55 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 56 | name: "zookeeper_$3" 57 | labels: 58 | replicaId: "$2" 59 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 60 | name: "zookeeper_$4" 61 | labels: 62 | replicaId: "$2" 63 | memberType: "$3" 64 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 65 | name: "zookeeper_$4_$5" 66 | labels: 67 | replicaId: "$2" 68 | memberType: "$3" 69 | -------------------------------------------------------------------------------- /k8s/kafka-source/10zookeeper-config.yml: -------------------------------------------------------------------------------- 1 | 
kind: ConfigMap 2 | metadata: 3 | name: zookeeper-config 4 | namespace: kafka-source 5 | apiVersion: v1 6 | data: 7 | init.sh: |- 8 | #!/bin/bash 9 | set -x 10 | 11 | [ -z "$ID_OFFSET" ] && ID_OFFSET=1 12 | export ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-} + $ID_OFFSET)) 13 | echo "${ZOOKEEPER_SERVER_ID:-1}" | tee /var/lib/zookeeper/data/myid 14 | sed "s/server\.$ZOOKEEPER_SERVER_ID\=[a-z0-9.-]*/server.$ZOOKEEPER_SERVER_ID=0.0.0.0/" /etc/kafka/zookeeper.properties > /etc/kafka-writable/zookeeper.properties 15 | 16 | zookeeper.properties: |- 17 | tickTime=2000 18 | dataDir=/var/lib/zookeeper/data 19 | dataLogDir=/var/lib/zookeeper/log 20 | clientPort=2181 21 | initLimit=5 22 | syncLimit=2 23 | 24 | log4j.properties: |- 25 | log4j.rootLogger=INFO, stdout 26 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 27 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 28 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 29 | 30 | # Suppress connection log messages, three lines per livenessProbe execution 31 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN 32 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN 33 | -------------------------------------------------------------------------------- /k8s/kafka-source/20dns.yml: -------------------------------------------------------------------------------- 1 | # A headless service to create DNS records. This is required for the inter-node communcation 2 | --- 3 | apiVersion: v1 4 | kind: Service 5 | metadata: 6 | name: broker 7 | namespace: kafka-source 8 | labels: 9 | app: kafka-source 10 | spec: 11 | ports: 12 | - name: internal 13 | port: 9092 14 | - name: prometheus 15 | port: 5556 16 | clusterIP: None 17 | selector: 18 | app: kafka-source 19 | -------------------------------------------------------------------------------- /k8s/kafka-source/20pzoo-service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: pzoo 5 | namespace: kafka-source 6 | spec: 7 | ports: 8 | - port: 2888 9 | name: peer 10 | - port: 3888 11 | name: leader-election 12 | clusterIP: None 13 | selector: 14 | app: zookeeper-main 15 | storage: persistent 16 | -------------------------------------------------------------------------------- /k8s/kafka-source/30service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: zookeeper 5 | namespace: kafka-source 6 | labels: 7 | app: zookeeper 8 | spec: 9 | ports: 10 | - port: 2181 11 | name: client 12 | - port: 5556 13 | name: prometheus 14 | selector: 15 | app: zookeeper 16 | -------------------------------------------------------------------------------- /k8s/kafka-source/50kafka.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: kafka-source 5 | namespace: kafka-source 6 | spec: 7 | selector: 8 | matchLabels: 9 | app: kafka-source 10 | serviceName: "broker" 11 | replicas: 16 12 | updateStrategy: 13 | type: OnDelete 14 | template: 15 | metadata: 16 | labels: 17 | app: kafka-source 18 | spec: 19 | terminationGracePeriodSeconds: 30 20 | initContainers: 21 | - name: init-config 22 | image: solsson/kafka-initutils@sha256:2cdb90ea514194d541c7b869ac15d2d530ca64889f56e270161fe4e5c3d076ea 23 | env: 24 | - name: NODE_NAME 25 | valueFrom: 26 | fieldRef: 27 | fieldPath: spec.nodeName 28 | - name: 
POD_NAME 29 | valueFrom: 30 | fieldRef: 31 | fieldPath: metadata.name 32 | - name: POD_NAMESPACE 33 | valueFrom: 34 | fieldRef: 35 | fieldPath: metadata.namespace 36 | - name: POD_IP 37 | valueFrom: 38 | fieldRef: 39 | fieldPath: status.podIP 40 | command: ['/bin/bash', '/etc/kafka/init.sh'] 41 | volumeMounts: 42 | - name: config 43 | mountPath: /etc/kafka 44 | - name: config-writable 45 | mountPath: /etc/kafka-writable 46 | containers: 47 | - name: broker 48 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 49 | env: 50 | - name: KAFKA_LOG4J_OPTS 51 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 52 | - name: JMX_PORT 53 | value: "5555" 54 | - name: KAFKA_HEAP_OPTS 55 | value: "-Xmx11G -Xms11G" 56 | ports: 57 | - name: broker-internal 58 | containerPort: 9092 59 | - name: broker-external 60 | containerPort: 9093 61 | hostPort: 9093 62 | - name: jmx 63 | containerPort: 5555 64 | command: 65 | - ./bin/kafka-server-start.sh 66 | - /etc/kafka-writable/server.properties 67 | resources: 68 | requests: 69 | cpu: 1200m 70 | memory: 12Gi 71 | ephemeral-storage: "80Gi" 72 | limits: 73 | memory: 12Gi 74 | readinessProbe: 75 | tcpSocket: 76 | port: broker-internal 77 | timeoutSeconds: 1 78 | livenessProbe: 79 | tcpSocket: 80 | port: broker-internal 81 | initialDelaySeconds: 60 82 | periodSeconds: 20 83 | timeoutSeconds: 1 84 | volumeMounts: 85 | - name: config 86 | mountPath: /etc/kafka 87 | - name: config-writable 88 | mountPath: /etc/kafka-writable 89 | - name: data 90 | mountPath: /var/lib/kafka/data 91 | - name: metrics 92 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 93 | command: 94 | - java 95 | - -XX:+UnlockExperimentalVMOptions 96 | - -XX:+UseCGroupMemoryLimitForHeap 97 | - -XX:MaxRAMFraction=1 98 | - -XshowSettings:vm 99 | - -jar 100 | - jmx_prometheus_httpserver.jar 101 | - "5556" 102 | - /etc/jmx-kafka/jmx-kafka-prometheus.yml 103 | ports: 104 | - name: prometheus 105 | containerPort: 5556 106 | resources: 107 | requests: 108 | cpu: 100m 109 | memory: 500Mi 110 | #limits: 111 | #memory: 200Mi 112 | volumeMounts: 113 | - name: jmx-config 114 | mountPath: /etc/jmx-kafka 115 | volumes: 116 | - name: config 117 | configMap: 118 | name: broker-config 119 | - name: config-writable 120 | emptyDir: {} 121 | - name: data 122 | emptyDir: {} 123 | - name: jmx-config 124 | configMap: 125 | name: jmx-config 126 | affinity: 127 | podAntiAffinity: 128 | requiredDuringSchedulingIgnoredDuringExecution: 129 | - labelSelector: 130 | matchExpressions: 131 | - key: app 132 | operator: In 133 | values: 134 | - kafka-source 135 | topologyKey: "kubernetes.io/hostname" 136 | -------------------------------------------------------------------------------- /k8s/kafka-source/50pzoo.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: pzoo-source 5 | namespace: kafka-source 6 | spec: 7 | selector: 8 | matchLabels: 9 | app: zookeeper 10 | storage: persistent 11 | serviceName: "pzoo" 12 | replicas: 1 13 | updateStrategy: 14 | type: OnDelete 15 | template: 16 | metadata: 17 | labels: 18 | app: zookeeper 19 | storage: persistent 20 | annotations: 21 | spec: 22 | terminationGracePeriodSeconds: 10 23 | initContainers: 24 | - name: init-config 25 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 26 | command: ['/bin/bash', 
'/etc/kafka/init.sh'] 27 | volumeMounts: 28 | - name: config 29 | mountPath: /etc/kafka 30 | - name: config-writable 31 | mountPath: /etc/kafka-writable 32 | - name: data 33 | mountPath: /var/lib/zookeeper/data 34 | containers: 35 | - name: zookeeper 36 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 37 | env: 38 | - name: KAFKA_LOG4J_OPTS 39 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 40 | - name: JMX_PORT 41 | value: "5555" 42 | command: 43 | - ./bin/zookeeper-server-start.sh 44 | - /etc/kafka-writable/zookeeper.properties 45 | ports: 46 | - containerPort: 2181 47 | hostPort: 2181 48 | name: client 49 | - containerPort: 2888 50 | name: peer 51 | - containerPort: 3888 52 | name: leader-election 53 | - name: jmx 54 | containerPort: 5555 55 | resources: 56 | requests: 57 | cpu: 200m 58 | memory: 2000Mi 59 | ephemeral-storage: "4Gi" 60 | readinessProbe: 61 | exec: 62 | command: 63 | - /bin/sh 64 | - -c 65 | - '[ "imok" = "$(echo ruok | nc -w 1 127.0.0.1 2181)" ]' 66 | volumeMounts: 67 | - name: config 68 | mountPath: /etc/kafka 69 | - name: config-writable 70 | mountPath: /etc/kafka-writable 71 | - name: data 72 | mountPath: /var/lib/zookeeper/data 73 | - name: metrics 74 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 75 | command: 76 | - java 77 | - -XX:+UnlockExperimentalVMOptions 78 | - -XX:+UseCGroupMemoryLimitForHeap 79 | - -XX:MaxRAMFraction=1 80 | - -XshowSettings:vm 81 | - -jar 82 | - jmx_prometheus_httpserver.jar 83 | - "5556" 84 | - /etc/jmx-config/jmx-zookeeper-prometheus.yaml 85 | ports: 86 | - name: prometheus 87 | containerPort: 5556 88 | resources: 89 | requests: 90 | cpu: 100m 91 | memory: 500Mi 92 | volumeMounts: 93 | - name: jmx-config 94 | mountPath: /etc/jmx-config 95 | volumes: 96 | - name: config 97 | configMap: 98 | name: zookeeper-config 99 | - name: config-writable 100 | emptyDir: {} 101 | - name: data 102 | emptyDir: {} 103 | - name: jmx-config 104 | configMap: 105 | name: jmx-config 106 | affinity: 107 | podAntiAffinity: 108 | requiredDuringSchedulingIgnoredDuringExecution: 109 | - labelSelector: 110 | matchExpressions: 111 | - key: app 112 | operator: In 113 | values: 114 | - zookeeper 115 | topologyKey: "kubernetes.io/hostname" 116 | -------------------------------------------------------------------------------- /k8s/kafka-source/60monitoring.yml: -------------------------------------------------------------------------------- 1 | # Monitor Kafka 2 | 3 | apiVersion: monitoring.coreos.com/v1 4 | kind: ServiceMonitor 5 | metadata: 6 | labels: 7 | k8s-app: kafka 8 | name: kafka 9 | namespace: monitoring 10 | spec: 11 | endpoints: 12 | - port: prometheus 13 | jobLabel: k8s-app 14 | namespaceSelector: 15 | matchNames: 16 | - kafka-source 17 | selector: 18 | matchLabels: 19 | app: kafka-source 20 | --- 21 | # Monitor zookeeper 22 | apiVersion: monitoring.coreos.com/v1 23 | kind: ServiceMonitor 24 | metadata: 25 | labels: 26 | k8s-app: zookeeper 27 | name: zookeeper 28 | namespace: monitoring 29 | spec: 30 | endpoints: 31 | - port: prometheus 32 | jobLabel: k8s-app 33 | namespaceSelector: 34 | matchNames: 35 | - kafka-source 36 | selector: 37 | matchLabels: 38 | app: zookeeper 39 | --- 40 | 41 | # Set permissions 42 | apiVersion: rbac.authorization.k8s.io/v1beta1 43 | kind: ClusterRole 44 | metadata: 45 | name: prometheus-k8s 46 | namespace: kafka-source 47 | rules: 48 | - apiGroups: [""] 49 | resources: 50 | - nodes 51 | - 
services 52 | - endpoints 53 | - pods 54 | verbs: ["get", "list", "watch"] 55 | - apiGroups: [""] 56 | resources: 57 | - configmaps 58 | verbs: ["get"] 59 | - nonResourceURLs: ["/metrics"] 60 | verbs: ["get"] 61 | --- 62 | apiVersion: rbac.authorization.k8s.io/v1beta1 63 | kind: ClusterRoleBinding 64 | metadata: 65 | name: prometheus-k8s 66 | roleRef: 67 | apiGroup: rbac.authorization.k8s.io 68 | kind: ClusterRole 69 | name: prometheus-k8s 70 | subjects: 71 | - kind: ServiceAccount 72 | name: prometheus-k8s 73 | namespace: monitoring 74 | --- 75 | # Monitor zookeeper 76 | -------------------------------------------------------------------------------- /k8s/kafka-source/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test ZK from the inside 5 | #kubectl --context us-east-1.k8s.local exec -n kafka-source pzoo-source-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 create /foo bar 6 | #kubectl --context us-east-1.k8s.local exec -n kafka-source pzoo-source-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /foo 7 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/pzoo-source-0 --timeout=-1s 8 | 9 | # wait some, to make sure ZK is with us 10 | sleep 20 11 | 12 | kubectl --context us-east-1.k8s.local exec -n kafka-source pzoo-source-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0" 13 | 14 | # Test ZK from the outside. We assume there's a zookeeper-shell installed locally on the developer's laptop 15 | zookeeper-shell $(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po pzoo-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}'):2181 get /brokers/ids/0 16 | 17 | 18 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/kafka-source-0 --timeout=-1s 19 | 20 | # wait some, to make sure kafka is with us 21 | sleep 20 22 | 23 | TOPIC="_test_source_$(date +%s)" 24 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> SOURCE GREAT SUCCESS! <<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 25 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 26 | 27 | # Test kafka from the outside. 
This assumes there's a locally installed kafka-console-consumer script 28 | kafka-console-consumer --bootstrap-server $(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po kafka-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}'):9093 --topic $TOPIC --from-beginning --max-messages 1 29 | -------------------------------------------------------------------------------- /k8s/monitoring/admin-cluster-role-binding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1beta1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | name: eks-admin 5 | roleRef: 6 | apiGroup: rbac.authorization.k8s.io 7 | kind: ClusterRole 8 | name: cluster-admin 9 | subjects: 10 | - kind: ServiceAccount 11 | name: eks-admin 12 | namespace: kube-system 13 | -------------------------------------------------------------------------------- /k8s/monitoring/admin-service-account.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | name: eks-admin 5 | namespace: kube-system 6 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/configmap.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: graphite-mapping 5 | namespace: monitoring 6 | data: 7 | graphite-mapping.conf: |- 8 | mappings: 9 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.worker.rebalance.*.* 10 | name: ureplicator_worker_rebalance 11 | labels: 12 | region: $1 13 | instance: $2 14 | nevermind1: $3 15 | nevermind2: $4 16 | metric: $5 17 | metric_type: $6 18 | component: worker 19 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.*.*.*.totalNumber.count 20 | name: ureplicator_worker 21 | labels: 22 | region: $1 23 | instance: $2 24 | perspective: $3 25 | metric: $4 26 | worker_instance: $5 27 | component: worker 28 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.*.*.* 29 | name: ureplicator_controller 30 | labels: 31 | region: $1 32 | instance: $2 33 | module: $3 34 | metric: $4 35 | metric_type: $5 36 | component: controller 37 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.KafkaBrokerTopicObserver.*.*.* 38 | name: ureplicator_topic_observer 39 | labels: 40 | region: $1 41 | instance: $2 42 | unit: $3 43 | direction: $4 44 | metric: $5 45 | metric_type: $6 46 | component: controller 47 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.topic.partitions.*.count 48 | name: ureplicator_topic_partitions 49 | labels: 50 | region: $1 51 | instance: $2 52 | nevermind1: $3 53 | nevermind2: $4 54 | metric: $5 55 | metric_type: count 56 | component: controller 57 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.AutoTopicWhitelistManager.*.* 58 | name: ureplicator_topic_whitelist_manager 59 | labels: 60 | region: $1 61 | instance: $2 62 | nevermind1: $3 63 | metric: $4 64 | metric_type: $5 65 | component: controller 66 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.leader.counter.count 67 | name: ureplicator_leader_count 68 | labels: 69 | region: $1 70 | instance: $2 71 | component: controller 72 | metric_type: count 73 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.topic.errorNumber.count 74 | name: 
ureplicator_topic_errors 75 | labels: 76 | region: $1 77 | instance: $2 78 | component: controller 79 | metric_type: count 80 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.topic.totalNumber.count 81 | name: ureplicator_topic_counts 82 | labels: 83 | region: $1 84 | instance: $2 85 | component: controller 86 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.worker.*.count 87 | name: ureplicator_worker_instances 88 | labels: 89 | region: $1 90 | instance: $2 91 | __nevermind: $3 92 | metric: $4 93 | metric_type: count 94 | component: worker 95 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | name: prometheus-graphite-exporter 5 | namespace: monitoring 6 | labels: 7 | app: prometheus 8 | component: graphite-exporter 9 | spec: 10 | replicas: 1 11 | template: 12 | metadata: 13 | name: prometheus-graphite-exporter 14 | labels: 15 | app: prometheus 16 | component: graphite-exporter 17 | spec: 18 | serviceAccountName: prometheus-k8s 19 | containers: 20 | - name: prometheus-graphite-exporter 21 | image: prom/graphite-exporter:master 22 | args: 23 | - '--graphite.mapping-config=/tmp/graphite-mapping.conf' 24 | ports: 25 | - name: importer 26 | containerPort: 9109 27 | - name: exporter 28 | containerPort: 9108 29 | resources: 30 | requests: 31 | cpu: 50m 32 | memory: 250Mi 33 | volumeMounts: 34 | - name: graphite-mapping-volume 35 | mountPath: /tmp/graphite-mapping.conf 36 | subPath: graphite-mapping.conf 37 | volumes: 38 | - name: graphite-mapping-volume 39 | configMap: 40 | name: graphite-mapping 41 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/prometheus-scrape.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | k8s-app: prometheus-graphite-exporter 6 | name: prometheus-graphite-exporter 7 | namespace: monitoring 8 | spec: 9 | endpoints: 10 | - port: prometheus-graphite-exporter 11 | jobLabel: k8s-app 12 | namespaceSelector: 13 | matchNames: 14 | - monitoring 15 | selector: 16 | matchLabels: 17 | app: prometheus 18 | component: graphite-exporter 19 | --- 20 | apiVersion: rbac.authorization.k8s.io/v1beta1 21 | kind: ClusterRole 22 | metadata: 23 | name: prometheus-k8s 24 | namespace: monitoring 25 | rules: 26 | - apiGroups: [""] 27 | resources: 28 | - nodes 29 | - services 30 | - endpoints 31 | - pods 32 | verbs: ["get", "list", "watch"] 33 | - apiGroups: [""] 34 | resources: 35 | - configmaps 36 | verbs: ["get"] 37 | - nonResourceURLs: ["/metrics"] 38 | verbs: ["get"] 39 | --- 40 | apiVersion: rbac.authorization.k8s.io/v1beta1 41 | kind: ClusterRoleBinding 42 | metadata: 43 | name: prometheus-k8s 44 | roleRef: 45 | apiGroup: rbac.authorization.k8s.io 46 | kind: ClusterRole 47 | name: prometheus-k8s 48 | subjects: 49 | - kind: ServiceAccount 50 | name: prometheus-k8s 51 | namespace: monitoring 52 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: monitoring 5 | name: prometheus-graphite-exporter 6 | labels: 7 | 
app: prometheus 8 | component: graphite-exporter 9 | spec: 10 | clusterIP: None 11 | ports: 12 | - name: prometheus-graphite-exporter 13 | port: 9108 14 | protocol: TCP 15 | selector: 16 | app: prometheus 17 | component: graphite-exporter 18 | type: ClusterIP 19 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-cluster-role-binding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: ClusterRoleBinding 4 | metadata: 5 | name: kube-state-metrics 6 | roleRef: 7 | apiGroup: rbac.authorization.k8s.io 8 | kind: ClusterRole 9 | name: kube-state-metrics 10 | subjects: 11 | - kind: ServiceAccount 12 | name: kube-state-metrics 13 | namespace: kube-system 14 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-cluster-role.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: ClusterRole 4 | metadata: 5 | name: kube-state-metrics 6 | rules: 7 | - apiGroups: [""] 8 | resources: 9 | - configmaps 10 | - secrets 11 | - nodes 12 | - pods 13 | - services 14 | - resourcequotas 15 | - replicationcontrollers 16 | - limitranges 17 | - persistentvolumeclaims 18 | - persistentvolumes 19 | - namespaces 20 | - endpoints 21 | verbs: ["list", "watch"] 22 | - apiGroups: ["extensions"] 23 | resources: 24 | - daemonsets 25 | - deployments 26 | - replicasets 27 | verbs: ["list", "watch"] 28 | - apiGroups: ["apps"] 29 | resources: 30 | - statefulsets 31 | verbs: ["list", "watch"] 32 | - apiGroups: ["batch"] 33 | resources: 34 | - cronjobs 35 | - jobs 36 | verbs: ["list", "watch"] 37 | - apiGroups: ["autoscaling"] 38 | resources: 39 | - horizontalpodautoscalers 40 | verbs: ["list", "watch"] 41 | - apiGroups: ["policy"] 42 | resources: 43 | - poddisruptionbudgets 44 | verbs: ["list", "watch"] 45 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | # Kubernetes versions after 1.9.0 should use apps/v1 3 | # Kubernetes versions before 1.8.0 should use apps/v1beta1 or extensions/v1beta1 4 | kind: Deployment 5 | metadata: 6 | name: kube-state-metrics 7 | namespace: kube-system 8 | spec: 9 | selector: 10 | matchLabels: 11 | k8s-app: kube-state-metrics 12 | replicas: 1 13 | template: 14 | metadata: 15 | labels: 16 | k8s-app: kube-state-metrics 17 | spec: 18 | serviceAccountName: kube-state-metrics 19 | containers: 20 | - name: kube-state-metrics 21 | image: quay.io/coreos/kube-state-metrics:v1.4.0 22 | ports: 23 | - name: http-metrics 24 | containerPort: 8080 25 | - name: telemetry 26 | containerPort: 8081 27 | readinessProbe: 28 | httpGet: 29 | path: /healthz 30 | port: 8080 31 | initialDelaySeconds: 5 32 | timeoutSeconds: 5 33 | - name: addon-resizer 34 | image: k8s.gcr.io/addon-resizer:1.8.3 35 | resources: 36 | limits: 37 | cpu: 150m 38 | memory: 50Mi 39 | requests: 40 | cpu: 150m 41 | memory: 50Mi 42 | env: 43 | - name: MY_POD_NAME 44 | valueFrom: 45 | fieldRef: 46 | fieldPath: metadata.name 47 | - name: MY_POD_NAMESPACE 48 | valueFrom: 49 | fieldRef: 50 | 
fieldPath: metadata.namespace 51 | command: 52 | - /pod_nanny 53 | - --container=kube-state-metrics 54 | - --cpu=100m 55 | - --extra-cpu=1m 56 | - --memory=100Mi 57 | - --extra-memory=2Mi 58 | - --threshold=5 59 | - --deployment=kube-state-metrics 60 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-role-binding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: RoleBinding 4 | metadata: 5 | name: kube-state-metrics 6 | namespace: kube-system 7 | roleRef: 8 | apiGroup: rbac.authorization.k8s.io 9 | kind: Role 10 | name: kube-state-metrics-resizer 11 | subjects: 12 | - kind: ServiceAccount 13 | name: kube-state-metrics 14 | namespace: kube-system 15 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-role.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: Role 4 | metadata: 5 | namespace: kube-system 6 | name: kube-state-metrics-resizer 7 | rules: 8 | - apiGroups: [""] 9 | resources: 10 | - pods 11 | verbs: ["get"] 12 | - apiGroups: ["extensions"] 13 | resources: 14 | - deployments 15 | resourceNames: ["kube-state-metrics"] 16 | verbs: ["get", "update"] 17 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-service-account.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | name: kube-state-metrics 5 | namespace: kube-system 6 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: kube-state-metrics 5 | namespace: kube-system 6 | labels: 7 | k8s-app: kube-state-metrics 8 | annotations: 9 | prometheus.io/scrape: 'true' 10 | spec: 11 | ports: 12 | - name: http-metrics 13 | port: 8080 14 | targetPort: http-metrics 15 | protocol: TCP 16 | - name: telemetry 17 | port: 8081 18 | targetPort: telemetry 19 | protocol: TCP 20 | selector: 21 | k8s-app: kube-state-metrics 22 | -------------------------------------------------------------------------------- /k8s/monitoring/monitoring-expose-kube-controller-manager.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: kube-system 5 | name: kube-controller-manager-prometheus-discovery 6 | labels: 7 | k8s-app: kube-controller-manager 8 | spec: 9 | selector: 10 | k8s-app: kube-controller-manager 11 | type: ClusterIP 12 | clusterIP: None 13 | ports: 14 | - name: http-metrics 15 | port: 10252 16 | targetPort: 10252 17 | protocol: TCP 18 | -------------------------------------------------------------------------------- /k8s/monitoring/monitoring-expose-kube-scheduler.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: kube-system 5 | name: kube-scheduler-prometheus-discovery 6 | labels: 7 | k8s-app: kube-scheduler 
8 | spec: 9 | selector: 10 | k8s-app: kube-scheduler 11 | type: ClusterIP 12 | clusterIP: None 13 | ports: 14 | - name: http-metrics 15 | port: 10251 16 | targetPort: 10251 17 | protocol: TCP 18 | -------------------------------------------------------------------------------- /k8s/monitoring/patch/grafana-datasources.yaml.tmpl: -------------------------------------------------------------------------------- 1 | data: 2 | prometheus.yaml: |- 3 | { 4 | "datasources": [ 5 | { 6 | "access": "proxy", 7 | "etitable": false, 8 | "name": "prometheus", 9 | "org_id": 1, 10 | "type": "prometheus", 11 | "url": "http://prometheus-k8s.monitoring.svc:9090", 12 | "version": 1 13 | }, 14 | { 15 | "access": "proxy", 16 | "etitable": false, 17 | "name": "us-east-1 source", 18 | "org_id": 1, 19 | "type": "prometheus", 20 | "url": "__US_EAST_1_PROMETHEUS__", 21 | "version": 1 22 | }, 23 | { 24 | "access": "proxy", 25 | "etitable": false, 26 | "name": "eu-west-1 destination", 27 | "org_id": 1, 28 | "type": "prometheus", 29 | "url": "http://prometheus-k8s.monitoring.svc:9090", 30 | "version": 1 31 | } 32 | ] 33 | } 34 | -------------------------------------------------------------------------------- /k8s/monitoring/patch/template.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set +x 3 | 4 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 5 | 6 | cd $DIR 7 | 8 | until LB="http://$(kubectl --context us-east-1.k8s.local get svc --namespace monitoring prometheus-k8s -o jsonpath="{.status.loadBalancer.ingress[0].hostname}"):$(kubectl --context us-east-1.k8s.local get svc --namespace monitoring prometheus-k8s -o jsonpath="{.spec.ports[0].port}")" 9 | do 10 | echo "Prometheus on us-east-1 isn't ready yet" 11 | sleep 5 12 | done 13 | 14 | sed "s|__US_EAST_1_PROMETHEUS__|$LB|" grafana-datasources.yaml.tmpl > grafana-datasources.yaml 15 | 16 | -------------------------------------------------------------------------------- /k8s/tester/consumer.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | name: kafka-mirror-tester-consumer 5 | labels: 6 | app: kafka-mirror-tester-consumer 7 | spec: 8 | replicas: 8 9 | selector: 10 | matchLabels: 11 | app: kafka-mirror-tester-consumer 12 | template: 13 | metadata: 14 | labels: 15 | app: kafka-mirror-tester-consumer 16 | spec: 17 | containers: 18 | - name: consumer 19 | image: rantav/kafka-mirror-tester:latest 20 | imagePullPolicy: Always 21 | args: 22 | - consume 23 | - --bootstrap-servers 24 | - broker.kafka-destination.svc.cluster.local:9092 25 | - --consumer-group 26 | - group-1 27 | - --topics 28 | - topic0 29 | - --retention 30 | - "300000" 31 | - --num-partitions 32 | - "64" 33 | - --num-replicas 34 | - "2" 35 | ports: 36 | - name: metrics 37 | containerPort: 8000 38 | # affinity: 39 | # podAntiAffinity: 40 | # requiredDuringSchedulingIgnoredDuringExecution: 41 | # - labelSelector: 42 | # matchExpressions: 43 | # - key: app 44 | # operator: In 45 | # values: 46 | # - kafka-mirror-tester-consumer 47 | # topologyKey: "kubernetes.io/hostname" 48 | # - labelSelector: 49 | # matchExpressions: 50 | # - key: app 51 | # operator: In 52 | # values: 53 | # - kafka-destination 54 | # namespaces: 55 | # - kafka-destination 56 | # topologyKey: "kubernetes.io/hostname" 57 | # - labelSelector: 58 | # matchExpressions: 59 | # - key: app 60 | # operator: In 61 | # values: 62 | # - ureplicator 63 | # - key: 
component 64 | # operator: In 65 | # values: 66 | # - worker 67 | # namespaces: 68 | # - ureplicator 69 | # topologyKey: "kubernetes.io/hostname" 70 | --- 71 | # Headless service just for the sake of exposing the metrics 72 | apiVersion: v1 73 | kind: Service 74 | metadata: 75 | name: kafka-mirror-tester-consumer 76 | labels: 77 | app: kafka-mirror-tester-consumer 78 | spec: 79 | ports: 80 | - name: metrics 81 | port: 8000 82 | clusterIP: None 83 | selector: 84 | app: kafka-mirror-tester-consumer 85 | --- 86 | apiVersion: monitoring.coreos.com/v1 87 | kind: ServiceMonitor 88 | metadata: 89 | labels: 90 | k8s-app: kafka-mirror-tester-consumer 91 | name: kafka-mirror-tester-consumer 92 | namespace: monitoring 93 | spec: 94 | endpoints: 95 | - port: metrics 96 | jobLabel: k8s-app 97 | namespaceSelector: 98 | matchNames: 99 | - default 100 | selector: 101 | matchLabels: 102 | app: kafka-mirror-tester-consumer 103 | -------------------------------------------------------------------------------- /k8s/tester/producer.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | name: kafka-mirror-tester-producer 5 | labels: 6 | app: kafka-mirror-tester-producer 7 | spec: 8 | replicas: 8 9 | selector: 10 | matchLabels: 11 | app: kafka-mirror-tester-producer 12 | template: 13 | metadata: 14 | labels: 15 | app: kafka-mirror-tester-producer 16 | spec: 17 | containers: 18 | - name: producer 19 | image: rantav/kafka-mirror-tester:latest 20 | imagePullPolicy: Always 21 | env: 22 | - name: ID 23 | valueFrom: 24 | fieldRef: 25 | fieldPath: status.podIP 26 | args: 27 | - produce 28 | - --bootstrap-servers 29 | - broker.kafka-source.svc.cluster.local:9092 30 | - --id 31 | - $(ID) 32 | - --message-size 33 | - "1000" 34 | - --throughput 35 | - "20000" 36 | - --topics 37 | - topic0 38 | - --retention 39 | - "300000" 40 | - --num-partitions 41 | - "64" 42 | - --num-replicas 43 | - "2" 44 | ports: 45 | - name: metrics 46 | containerPort: 8001 47 | #affinity: 48 | #podAntiAffinity: 49 | #requiredDuringSchedulingIgnoredDuringExecution: 50 | #- labelSelector: 51 | #matchExpressions: 52 | #- key: app 53 | #operator: In 54 | #values: 55 | #- kafka-mirror-tester-producer 56 | #topologyKey: "kubernetes.io/hostname" 57 | #- labelSelector: 58 | #matchExpressions: 59 | #- key: app 60 | #operator: In 61 | #values: 62 | #- kafka-source 63 | #namespaces: 64 | #- kafka-source 65 | #topologyKey: "kubernetes.io/hostname" 66 | --- 67 | # Headless service just for the sake of exposing the metrics 68 | apiVersion: v1 69 | kind: Service 70 | metadata: 71 | name: kafka-mirror-tester-producer 72 | labels: 73 | app: kafka-mirror-tester-producer 74 | spec: 75 | ports: 76 | - name: metrics 77 | port: 8001 78 | clusterIP: None 79 | selector: 80 | app: kafka-mirror-tester-producer 81 | --- 82 | apiVersion: monitoring.coreos.com/v1 83 | kind: ServiceMonitor 84 | metadata: 85 | labels: 86 | k8s-app: kafka-mirror-tester-producer 87 | name: kafka-mirror-tester-producer 88 | namespace: monitoring 89 | spec: 90 | endpoints: 91 | - port: metrics 92 | jobLabel: k8s-app 93 | namespaceSelector: 94 | matchNames: 95 | - default 96 | selector: 97 | matchLabels: 98 | app: kafka-mirror-tester-producer 99 | -------------------------------------------------------------------------------- /k8s/ureplicator/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | 
name: ureplicator 5 | -------------------------------------------------------------------------------- /k8s/ureplicator/20zookeeper.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: ureplicator 5 | name: zk-hs 6 | labels: 7 | app: zk 8 | spec: 9 | ports: 10 | - port: 2888 11 | name: server 12 | - port: 3888 13 | name: leader-election 14 | clusterIP: None 15 | selector: 16 | app: zk 17 | --- 18 | apiVersion: v1 19 | kind: Service 20 | metadata: 21 | namespace: ureplicator 22 | name: zookeeper 23 | labels: 24 | app: zk 25 | spec: 26 | ports: 27 | - port: 2181 28 | name: client 29 | selector: 30 | app: zk 31 | --- 32 | apiVersion: policy/v1beta1 33 | kind: PodDisruptionBudget 34 | metadata: 35 | namespace: ureplicator 36 | name: zk-pdb 37 | spec: 38 | selector: 39 | matchLabels: 40 | app: zk 41 | maxUnavailable: 1 42 | --- 43 | apiVersion: apps/v1beta1 44 | kind: StatefulSet 45 | metadata: 46 | namespace: ureplicator 47 | name: zk 48 | spec: 49 | selector: 50 | matchLabels: 51 | app: zk 52 | serviceName: zk-hs 53 | replicas: 1 54 | updateStrategy: 55 | type: RollingUpdate 56 | podManagementPolicy: Parallel 57 | template: 58 | metadata: 59 | labels: 60 | app: zk 61 | spec: 62 | affinity: 63 | podAntiAffinity: 64 | requiredDuringSchedulingIgnoredDuringExecution: 65 | - labelSelector: 66 | matchExpressions: 67 | - key: "app" 68 | operator: In 69 | values: 70 | - zk-hs 71 | topologyKey: "kubernetes.io/hostname" 72 | containers: 73 | - name: kubernetes-zookeeper 74 | imagePullPolicy: Always 75 | image: "k8s.gcr.io/kubernetes-zookeeper:1.0-3.4.10" 76 | resources: 77 | requests: 78 | memory: "1Gi" 79 | cpu: "0.5" 80 | ports: 81 | - containerPort: 2181 82 | name: client 83 | - containerPort: 2888 84 | name: server 85 | - containerPort: 3888 86 | name: leader-election 87 | command: 88 | - sh 89 | - -c 90 | - "start-zookeeper \ 91 | --servers=1 \ 92 | --data_dir=/var/lib/zookeeper/data \ 93 | --data_log_dir=/var/lib/zookeeper/data/log \ 94 | --conf_dir=/opt/zookeeper/conf \ 95 | --client_port=2181 \ 96 | --election_port=3888 \ 97 | --server_port=2888 \ 98 | --tick_time=2000 \ 99 | --init_limit=10 \ 100 | --sync_limit=5 \ 101 | --heap=512M \ 102 | --max_client_cnxns=200 \ 103 | --snap_retain_count=3 \ 104 | --purge_interval=12 \ 105 | --max_session_timeout=40000 \ 106 | --min_session_timeout=4000 \ 107 | --log_level=INFO" 108 | readinessProbe: 109 | exec: 110 | command: 111 | - sh 112 | - -c 113 | - "zookeeper-ready 2181" 114 | initialDelaySeconds: 10 115 | timeoutSeconds: 5 116 | livenessProbe: 117 | exec: 118 | command: 119 | - sh 120 | - -c 121 | - "zookeeper-ready 2181" 122 | initialDelaySeconds: 10 123 | timeoutSeconds: 5 124 | volumeMounts: 125 | - name: data 126 | mountPath: /var/lib/zookeeper 127 | volumes: 128 | - name: data 129 | emptyDir: {} 130 | securityContext: 131 | runAsUser: 1000 132 | fsGroup: 1000 133 | -------------------------------------------------------------------------------- /k8s/ureplicator/25env-config.yml.tmpl: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: ureplicator-envs 5 | namespace: ureplicator 6 | data: 7 | SRC_ZK_CONNECT: __SRC_ZK_CONNECT__ 8 | CONSUMER_GROUP_ID: ureplicator 9 | HELIX_CLUSTER_NAME: ureplicator 10 | HELIX_ENV: test.eu1 11 | HELIX_ZK_CONNECT: zookeeper.ureplicator.svc.cluster.local:2181 12 | HELIX_ZK_ADDRESS: zookeeper.ureplicator.svc.cluster.local 13 | 
HELIX_ZK_PORT: '2181' 14 | DST_ZK_CONNECT: zookeeper.kafka-destination.svc.cluster.local:2181 15 | DST_BOOTSTRAP_SERVERS: broker.kafka-destination.svc.cluster.local:9092 16 | WORKER_ABORT_ON_SEND_FAILURE: 'true' 17 | GRAPHITE_HOST: prometheus-graphite-exporter.monitoring.svc.cluster.local 18 | GRAPHITE_PORT: "9109" 19 | FETCH_MESSAGE_MAX_BYTES: "10485760" 20 | SOCKET_RECEIVE_BUFFER_BYTES: "10485760" 21 | NUM_CONSUMER_FETCHERS: "1" 22 | PROD_COMPRESSION_TYPE: none # none, gzip, snappy, lz4 23 | PROD_LINGER_MS: "1000" 24 | PROD_SEND_BUFFER_BYTES: "10485760" 25 | PROD_MAX_REQUEST_SIZE: "10485760" 26 | PROD_MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION: "10" 27 | JAVA_OPTS: -javaagent:/jmx_prometheus_javaagent-0.3.1.jar=8080:/etc/jmx-config/jmx-prometheus-javaagent-config.yml 28 | -------------------------------------------------------------------------------- /k8s/ureplicator/25jmx-prometheus-javaagent-config.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: ureplicator-jmx-prometheus-javaagent-config 5 | namespace: ureplicator 6 | data: 7 | jmx-prometheus-javaagent-config.yml: |+ 8 | startDelaySeconds: 0 9 | lowercaseOutputName: true 10 | lowercaseOutputLabelNames: true 11 | whitelistObjectNames: ["java.lang:*"] 12 | rules: 13 | - pattern: 'java.lang(.+): .*' 14 | name: java_lang_Memory_HeapMemoryUsage_$1 15 | - pattern: 'java.lang(.+): .*' 16 | name: java_lang_Memory_NonHeapMemoryUsage_$1 17 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 18 | name: java_lang_OperatingSystem_OpenFileDescriptorCount 19 | - pattern: 'java.lang<.*>ProcessCpuLoad: .*' 20 | name: java_lang_OperatingSystem_ProcessCpuLoad 21 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 22 | name: java_lang_Threading_ThreadCount 23 | -------------------------------------------------------------------------------- /k8s/ureplicator/30ureplicator.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: Deployment 3 | metadata: 4 | namespace: ureplicator 5 | name: ureplicator-controller 6 | labels: 7 | app: ureplicator 8 | component: controller 9 | spec: 10 | replicas: 1 11 | selector: 12 | matchLabels: 13 | app: ureplicator 14 | component: controller 15 | template: 16 | metadata: 17 | labels: 18 | app: ureplicator 19 | component: controller 20 | spec: 21 | terminationGracePeriodSeconds: 10 22 | initContainers: 23 | - name: init-zk 24 | image: busybox 25 | command: 26 | - /bin/sh 27 | - -c 28 | - 'until [ "imok" = "$(echo ruok | nc -w 1 $HELIX_ZK_ADDRESS $HELIX_ZK_PORT)" ] ; do echo waiting ; sleep 1 ; done' 29 | env: 30 | - name: SERVICE_TYPE 31 | value: "init" 32 | envFrom: 33 | - configMapRef: 34 | name: ureplicator-envs 35 | containers: 36 | - name: ureplicator-controller 37 | image: rantav/ureplicator:1c1677d 38 | env: 39 | - name: SERVICE_CMD 40 | value: "start-controller.sh" 41 | - name: SERVICE_TYPE 42 | value: "controller" 43 | envFrom: 44 | - configMapRef: 45 | name: ureplicator-envs 46 | ports: 47 | - name: api-port 48 | containerPort: 9000 49 | - name: metrics 50 | containerPort: 8080 51 | livenessProbe: 52 | httpGet: 53 | path: /health 54 | port: api-port 55 | initialDelaySeconds: 120 56 | timeoutSeconds: 10 57 | readinessProbe: 58 | httpGet: 59 | path: /health 60 | port: api-port 61 | initialDelaySeconds: 120 62 | timeoutSeconds: 10 63 | resources: 64 | requests: 65 | cpu: 1000m 66 | memory: 3000Mi 67 | limits: 68 | cpu: 1000m 69 | memory: 3000Mi 70 | 
volumeMounts: 71 | - name: tmp 72 | mountPath: /tmp/uReplicator-controller 73 | - name: jmx-config 74 | mountPath: /etc/jmx-config 75 | volumes: 76 | - name: tmp 77 | emptyDir: {} 78 | - name: jmx-config 79 | configMap: 80 | name: ureplicator-jmx-prometheus-javaagent-config 81 | --- 82 | apiVersion: extensions/v1beta1 83 | kind: Deployment 84 | metadata: 85 | namespace: ureplicator 86 | name: ureplicator-worker 87 | labels: 88 | app: ureplicator 89 | component: worker 90 | spec: 91 | replicas: 1 92 | selector: 93 | matchLabels: 94 | app: ureplicator 95 | component: worker 96 | template: 97 | metadata: 98 | labels: 99 | app: ureplicator 100 | component: worker 101 | spec: 102 | terminationGracePeriodSeconds: 10 103 | initContainers: 104 | - name: init-zk 105 | image: busybox 106 | command: 107 | - /bin/sh 108 | - -c 109 | - 'until [ "imok" = "$(echo ruok | nc -w 1 $HELIX_ZK_ADDRESS $HELIX_ZK_PORT)" ] ; do echo waiting ; sleep 10 ; done' 110 | envFrom: 111 | - configMapRef: 112 | name: ureplicator-envs 113 | containers: 114 | - name: ureplicator-worker 115 | image: rantav/ureplicator:1c1677d 116 | imagePullPolicy: Always 117 | env: 118 | - name: SERVICE_TYPE 119 | value: "worker" 120 | - name: SERVICE_CMD 121 | value: "start-worker.sh" 122 | envFrom: 123 | - configMapRef: 124 | name: ureplicator-envs 125 | ports: 126 | - name: metrics 127 | containerPort: 8080 128 | resources: 129 | requests: 130 | cpu: 800m 131 | memory: 3Gi 132 | #limits: 133 | #cpu: 1200m 134 | #memory: 6Gi 135 | volumeMounts: 136 | - name: jmx-config 137 | mountPath: /etc/jmx-config 138 | volumes: 139 | - name: jmx-config 140 | configMap: 141 | name: ureplicator-jmx-prometheus-javaagent-config 142 | affinity: 143 | podAntiAffinity: 144 | requiredDuringSchedulingIgnoredDuringExecution: 145 | - labelSelector: 146 | matchExpressions: 147 | - key: app 148 | operator: In 149 | values: 150 | - kafka-destination 151 | namespaces: 152 | - kafka-destination 153 | topologyKey: "kubernetes.io/hostname" 154 | #- labelSelector: 155 | #matchExpressions: 156 | #- key: app 157 | #operator: In 158 | #values: 159 | #- ureplicator 160 | #- key: component 161 | #operator: In 162 | #values: 163 | #- worker 164 | #topologyKey: "kubernetes.io/hostname" 165 | -------------------------------------------------------------------------------- /k8s/ureplicator/40monitoring.yml: -------------------------------------------------------------------------------- 1 | # Headless service just for the sake of exposing the metrics 2 | apiVersion: v1 3 | kind: Service 4 | metadata: 5 | name: ureplicator-controller 6 | namespace: ureplicator 7 | labels: 8 | app: ureplicator 9 | component: controller 10 | spec: 11 | ports: 12 | - name: metrics 13 | port: 8080 14 | clusterIP: None 15 | selector: 16 | app: ureplicator 17 | component: controller 18 | --- 19 | apiVersion: v1 20 | kind: Service 21 | metadata: 22 | name: ureplicator-worker 23 | namespace: ureplicator 24 | labels: 25 | app: ureplicator 26 | component: worker 27 | spec: 28 | ports: 29 | - name: metrics 30 | port: 8080 31 | clusterIP: None 32 | selector: 33 | app: ureplicator 34 | component: worker 35 | --- 36 | apiVersion: monitoring.coreos.com/v1 37 | kind: ServiceMonitor 38 | metadata: 39 | labels: 40 | k8s-app: ureplicator-controller 41 | name: ureplicator-controller 42 | namespace: monitoring 43 | spec: 44 | endpoints: 45 | - port: metrics 46 | jobLabel: k8s-app 47 | namespaceSelector: 48 | matchNames: 49 | - ureplicator 50 | selector: 51 | matchLabels: 52 | app: ureplicator 53 | component: 
controller 54 | --- 55 | apiVersion: monitoring.coreos.com/v1 56 | kind: ServiceMonitor 57 | metadata: 58 | labels: 59 | k8s-app: ureplicator-worker 60 | name: ureplicator-worker 61 | namespace: monitoring 62 | spec: 63 | endpoints: 64 | - port: metrics 65 | jobLabel: k8s-app 66 | namespaceSelector: 67 | matchNames: 68 | - ureplicator 69 | selector: 70 | matchLabels: 71 | app: ureplicator 72 | component: worker 73 | --- 74 | apiVersion: rbac.authorization.k8s.io/v1beta1 75 | kind: ClusterRole 76 | metadata: 77 | name: prometheus-k8s 78 | namespace: ureplicator 79 | rules: 80 | - apiGroups: [""] 81 | resources: 82 | - nodes 83 | - services 84 | - endpoints 85 | - pods 86 | verbs: ["get", "list", "watch"] 87 | - apiGroups: [""] 88 | resources: 89 | - configmaps 90 | verbs: ["get"] 91 | - nonResourceURLs: ["/metrics"] 92 | verbs: ["get"] 93 | --- 94 | apiVersion: rbac.authorization.k8s.io/v1beta1 95 | kind: ClusterRoleBinding 96 | metadata: 97 | name: prometheus-k8s 98 | roleRef: 99 | apiGroup: rbac.authorization.k8s.io 100 | kind: ClusterRole 101 | name: prometheus-k8s 102 | subjects: 103 | - kind: ServiceAccount 104 | name: prometheus-k8s 105 | namespace: monitoring 106 | -------------------------------------------------------------------------------- /k8s/ureplicator/template.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set +x 3 | 4 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 5 | 6 | cd $DIR 7 | 8 | until IP=$(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po pzoo-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}') 9 | do 10 | echo "ZK on source isn't ready yet" 11 | sleep 5 12 | done 13 | 14 | sed "s/__SRC_ZK_CONNECT__/$IP:2181/" 25env-config.yml.tmpl > 25env-config.yml 15 | 16 | -------------------------------------------------------------------------------- /k8s/ureplicator/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test Kafka to see if a topic had been replicated 5 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/kafka-destination-0 --timeout=-1s 6 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/kafka-source-0 --timeout=-1s 7 | 8 | kubectl --context eu-west-1.k8s.local -n ureplicator wait --for=condition=Available deployment/ureplicator-worker --timeout=-1s 9 | kubectl --context eu-west-1.k8s.local -n ureplicator wait --for=condition=Available deployment/ureplicator-controller --timeout=-1s 10 | 11 | # Run end to end tests. Produce to the source cluster, consume from the destination cluster 12 | TOPIC="_test_replicator_$(date +%s)" 13 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> REPLICATOR GREAT SUCCESS! 
<<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 14 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 15 | -------------------------------------------------------------------------------- /lib/admin/topic.go: -------------------------------------------------------------------------------- 1 | package admin 2 | 3 | // Package admin is used for kafka's admin api 4 | 5 | import ( 6 | "context" 7 | "fmt" 8 | 9 | "github.com/confluentinc/confluent-kafka-go/kafka" 10 | log "github.com/sirupsen/logrus" 11 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 12 | ) 13 | 14 | // MustCreateTopic creates a new topic with the specified number of partitions. 15 | // If the topic already exists, fails silently 16 | // On error - simply panics 17 | func MustCreateTopic( 18 | ctx context.Context, 19 | brokers types.Brokers, 20 | topic types.Topic, 21 | partitions, 22 | replicas, 23 | retentionMs uint) { 24 | a, err := kafka.NewAdminClient(&kafka.ConfigMap{"bootstrap.servers": string(brokers)}) 25 | if err != nil { 26 | log.Fatalf("%+v", err) 27 | return 28 | } 29 | defer a.Close() 30 | 31 | res, err := a.CreateTopics( 32 | ctx, 33 | []kafka.TopicSpecification{ 34 | { 35 | Topic: string(topic), 36 | NumPartitions: int(partitions), 37 | ReplicationFactor: int(replicas), 38 | Config: map[string]string{ 39 | "retention.ms": fmt.Sprintf("%d", retentionMs), 40 | }, 41 | }, 42 | }) 43 | if err != nil { 44 | log.Fatalf("%+v", err) 45 | return 46 | } 47 | 48 | log.Infof("Topic create result: %v", res) 49 | } 50 | -------------------------------------------------------------------------------- /lib/cmd/cmd.go: -------------------------------------------------------------------------------- 1 | // Package cmd is a cli layer that takes care of cli args etc. 2 | // powereve by cobra https://github.com/spf13/cobra 3 | package cmd 4 | 5 | import ( 6 | "fmt" 7 | "os" 8 | 9 | "github.com/spf13/cobra" 10 | ) 11 | 12 | // cobra root cmd 13 | var rootCmd = &cobra.Command{ 14 | Use: "kafka-mirror-tester", 15 | Short: "Kafka mirror tester is a test tool for kafka mirroring", 16 | Long: `A high throughput producer and consumer that stress kafka and validate message consumption order and latency.`, 17 | } 18 | 19 | // Execute is the main entry point for the CLI, using cobra lib. 
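// A minimal usage sketch. The flag names are defined in produce.go and consume.go below;
// the values shown here only mirror k8s/tester/producer.yaml and consumer.yaml and are
// purely illustrative:
//
//	kafka-mirror-tester produce --id "$(hostname)" \
//	  --bootstrap-servers broker.kafka-source.svc.cluster.local:9092 \
//	  --topics topic0 --message-size 1000 --throughput 20000 \
//	  --num-partitions 64 --num-replicas 2 --retention 300000
//
//	kafka-mirror-tester consume \
//	  --bootstrap-servers broker.kafka-destination.svc.cluster.local:9092 \
//	  --consumer-group group-1 --topics topic0 \
//	  --num-partitions 64 --num-replicas 2 --retention 300000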
20 | func Execute() { 21 | if err := rootCmd.Execute(); err != nil { 22 | fmt.Println(err) 23 | os.Exit(1) 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /lib/cmd/consume.go: -------------------------------------------------------------------------------- 1 | package cmd 2 | 3 | import ( 4 | "context" 5 | "strings" 6 | 7 | "github.com/spf13/cobra" 8 | "github.com/appsflyer/kafka-mirror-tester/lib/admin" 9 | "github.com/appsflyer/kafka-mirror-tester/lib/consumer" 10 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 11 | ) 12 | 13 | var ( 14 | cTopics *string 15 | cBootstraServers *string 16 | consumerGroup *string 17 | cUseMessageHeaders *bool 18 | cNumPartitions *uint 19 | cNumReplicas *uint 20 | cRetention *uint 21 | ) 22 | 23 | // consumeCmd represents the consume command 24 | var consumeCmd = &cobra.Command{ 25 | Use: "consume", 26 | Short: "Consume messages from kafka and aggregate results", 27 | Long: `Consumes messages from kafka and collects statistics about them. 28 | Namely latency statistics and sequence number bookeeping.`, 29 | Run: func(cmd *cobra.Command, args []string) { 30 | ctx := context.Background() 31 | brokers := types.Brokers(*cBootstraServers) 32 | ts := types.Topics(strings.Split(*cTopics, ",")) 33 | initialSequence := types.SequenceNumber(0) 34 | cg := types.ConsumerGroup(*consumerGroup) 35 | for _, t := range ts { 36 | admin.MustCreateTopic(ctx, brokers, types.Topic(t), *cNumPartitions, *cNumReplicas, *cRetention) 37 | } 38 | consumer.ConsumeAndAnalyze(ctx, brokers, ts, cg, initialSequence, *cUseMessageHeaders) 39 | }, 40 | } 41 | 42 | func init() { 43 | rootCmd.AddCommand(consumeCmd) 44 | 45 | cTopics = consumeCmd.Flags().String("topics", "", "List of topics to consume from (coma separated)") 46 | consumeCmd.MarkFlagRequired("topics") 47 | cBootstraServers = consumeCmd.Flags().String("bootstrap-servers", "", "List of host:port bootstrap servers (coma separated)") 48 | consumeCmd.MarkFlagRequired("bootstrap-servers") 49 | consumerGroup = consumeCmd.Flags().String("consumer-group", "", "The kafka consumer group name") 50 | consumeCmd.MarkFlagRequired("consumer-group") 51 | cUseMessageHeaders = consumeCmd.Flags().Bool("use-message-headers", false, "Whether to use message headers to pass metadata or use the payload instead") 52 | cNumPartitions = consumeCmd.Flags().Uint("num-partitions", 1, "Number of partitions to create per each topic (if the topics are new)") 53 | cNumReplicas = consumeCmd.Flags().Uint("num-replicas", 1, "Number of replicas to create per each topic (if the topics are new)") 54 | cRetention = consumeCmd.Flags().Uint("retention", 30000, "Data retention for the created topics. 
In ms.") 55 | } 56 | -------------------------------------------------------------------------------- /lib/cmd/produce.go: -------------------------------------------------------------------------------- 1 | package cmd 2 | 3 | import ( 4 | "github.com/spf13/cobra" 5 | 6 | "github.com/appsflyer/kafka-mirror-tester/lib/producer" 7 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 8 | ) 9 | 10 | var ( 11 | // CLI args 12 | producerID *string 13 | pTopics *string 14 | throughput *uint 15 | messageSize *uint 16 | pBootstraServers *string 17 | pUseMessageHeaders *bool 18 | pNumPartitions *uint 19 | pNumReplicas *uint 20 | pRetention *uint 21 | ) 22 | 23 | // produceCmd represents the produce command 24 | var produceCmd = &cobra.Command{ 25 | Use: "produce", 26 | Short: "Produce messages to kafka", 27 | Long: `The producer is a high-throughput kafka message producer. 28 | It sends sequence numbered and timestamped messages to kafka where by the consumer reads and validates. `, 29 | Run: func(cmd *cobra.Command, args []string) { 30 | brokers := types.Brokers(*pBootstraServers) 31 | id := types.ProducerID(*producerID) 32 | through := types.Throughput(*throughput) 33 | size := types.MessageSize(*messageSize) 34 | initialSequence := types.SequenceNumber(0) 35 | producer.ProduceToTopics(brokers, id, through, size, initialSequence, *pTopics, *pNumPartitions, *pNumReplicas, *pUseMessageHeaders, *pRetention) 36 | }, 37 | } 38 | 39 | func init() { 40 | rootCmd.AddCommand(produceCmd) 41 | producerID = produceCmd.Flags().String("id", "", "ID of the producer. You can use the hostname command") 42 | produceCmd.MarkFlagRequired("id") 43 | pTopics = produceCmd.Flags().String("topics", "", "List of topics to produce to (coma separated)") 44 | produceCmd.MarkFlagRequired("topics") 45 | throughput = produceCmd.Flags().Uint("throughput", 0, "Number of messages to send to each topic per second") 46 | produceCmd.MarkFlagRequired("throughput") 47 | messageSize = produceCmd.Flags().Uint("message-size", 0, "Message size to send (in bytes)") 48 | produceCmd.MarkFlagRequired("message-size") 49 | pBootstraServers = produceCmd.Flags().String("bootstrap-servers", "", "List of host:port bootstrap servers (coma separated)") 50 | produceCmd.MarkFlagRequired("bootstrap-servers") 51 | pUseMessageHeaders = produceCmd.Flags().Bool("use-message-headers", false, "Whether to use message headers to pass metadata or use the payload instead") 52 | pNumPartitions = produceCmd.Flags().Uint("num-partitions", 1, "Number of partitions to create per each topic (if the topics are new)") 53 | pNumReplicas = produceCmd.Flags().Uint("num-replicas", 1, "Number of replicas to create per each topic (if the topics are new)") 54 | pRetention = produceCmd.Flags().Uint("retention", 30000, "Data retention for the created topics. 
In ms.") 55 | } 56 | -------------------------------------------------------------------------------- /lib/consumer/consumer.go: -------------------------------------------------------------------------------- 1 | // Package consumer implements the consumption, performance measurement and validation logic of the test 2 | package consumer 3 | 4 | import ( 5 | "context" 6 | "fmt" 7 | "os" 8 | "os/signal" 9 | "syscall" 10 | 11 | "github.com/confluentinc/confluent-kafka-go/kafka" 12 | log "github.com/sirupsen/logrus" 13 | 14 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 15 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 16 | ) 17 | 18 | const ( 19 | 20 | // kafka consumer session timeout 21 | sessionTimeoutMs = 6000 22 | 23 | // For the purpose of performance monitoring we always want to start with the latest messages 24 | autoOffsetReset = "latest" 25 | ) 26 | 27 | // clientID is a friendly name for the client so that monitoring tool know who we are. 28 | var clientID string 29 | 30 | func init() { 31 | 32 | //log.SetLevel(log.TraceLevel) 33 | 34 | hostname, err := os.Hostname() 35 | if err != nil { 36 | log.Fatalf("Can't get hostname %+v", err) 37 | } 38 | clientID = fmt.Sprintf("kafka-mirror-tester-%s", hostname) 39 | } 40 | 41 | // ConsumeAndAnalyze consumes messages from the kafka topic and analyzes their correctness and performance. 42 | // The function blocks forever (or until the context is cancled, or until a signal is sent) 43 | func ConsumeAndAnalyze( 44 | ctx context.Context, 45 | brokers types.Brokers, 46 | topics types.Topics, 47 | group types.ConsumerGroup, 48 | initialSequence types.SequenceNumber, 49 | useMessageHeaders bool, 50 | ) { 51 | log.Infof("Starting the consumer. brokers=%s, topics=%s group=%s initialSequence=%d", 52 | brokers, topics, group, initialSequence) 53 | c, err := kafka.NewConsumer(&kafka.ConfigMap{ 54 | "bootstrap.servers": string(brokers), 55 | "group.id": string(group), 56 | "session.timeout.ms": sessionTimeoutMs, 57 | "go.events.channel.enable": true, 58 | "go.application.rebalance.enable": true, 59 | "client.id": clientID, 60 | // Enable generation of PartitionEOF when the 61 | // end of a partition is reached. 62 | "enable.partition.eof": true, 63 | "auto.offset.reset": autoOffsetReset, 64 | }) 65 | if err != nil { 66 | log.Fatalf("Failed to create consumer: %s\n", err) 67 | } 68 | defer c.Close() 69 | log.Debugf("Created Consumer %v\n", c) 70 | 71 | err = c.SubscribeTopics([]string(topics), nil) 72 | 73 | if err != nil { 74 | log.Fatalf("Failed to subscribe to topics %s: %s\n", topics, err) 75 | } 76 | 77 | serveConsumerUI() 78 | 79 | consumeForever(ctx, c, initialSequence, useMessageHeaders) 80 | } 81 | 82 | // loops through the kafka consumer channel and consumes all events 83 | // The loop runs forever until the context is cancled or a signal is sent (SIGINT or SIGTERM) 84 | func consumeForever( 85 | ctx context.Context, 86 | c *kafka.Consumer, 87 | initialSequence types.SequenceNumber, 88 | useMessageHeaders bool, 89 | ) { 90 | sigchan := make(chan os.Signal, 1) 91 | signal.Notify(sigchan, syscall.SIGINT, syscall.SIGTERM) 92 | 93 | for { 94 | select { 95 | case sig := <-sigchan: 96 | log.Infof("Caught signal %v: terminating", sig) 97 | // TODO: Write a summary message to the console before quitting. 98 | return 99 | case <-ctx.Done(): 100 | log.Infof("Done. 
%s", ctx.Err()) 101 | return 102 | case ev := <-c.Events(): 103 | // Most events are typically juse messages, still we are also interested in 104 | // Partition changes, EOF and Errors 105 | switch e := ev.(type) { 106 | case kafka.AssignedPartitions: 107 | log.Infof("AssignedPartitions %v", e) 108 | c.Assign(e.Partitions) 109 | case kafka.RevokedPartitions: 110 | log.Infof("RevokedPartitions %v", e) 111 | c.Unassign() 112 | case *kafka.Message: 113 | processMessage(e, useMessageHeaders) 114 | case kafka.PartitionEOF: 115 | log.Debugf("PartitionEOF Reached %v", e) 116 | case kafka.Error: 117 | // Errors should generally be considered as informational, the client will try to automatically recover 118 | log.Errorf("Error: %+v", e) 119 | } 120 | } 121 | } 122 | } 123 | 124 | // Process a single message, keeping track of latency data and sequence numbers. 125 | func processMessage( 126 | msg *kafka.Message, 127 | useMessageHeaders bool, 128 | ) { 129 | data := message.Extract(msg, useMessageHeaders) 130 | log.Tracef("Data: %s", data) 131 | validateSequence(data) 132 | collectThroughput(data) 133 | collectLatencyStats(data) 134 | } 135 | -------------------------------------------------------------------------------- /lib/consumer/performance.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "github.com/jamiealquiza/tachymeter" 5 | "github.com/prometheus/client_golang/prometheus" 6 | 7 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 8 | ) 9 | 10 | const ( 11 | // Define a sample size of 500. This affects memory consumption v/s precision. 12 | // It's probably OK to increase this number by a lot but didn't test it yet. 13 | tachymeterSampleSize = 500 14 | ) 15 | 16 | var ( 17 | // We use two tools to measure the time performance. 18 | // One is useful due to it's interface with prometheus and the other is useful as a CLI interface. 19 | 20 | // Prometheus 21 | latencyHistogram prometheus.Histogram 22 | 23 | // And this one has a mice text UI. 24 | tachymeterHistogram *tachymeter.Tachymeter 25 | 26 | // Define the time windows in which metrics are aggregater for. 27 | // How to read this? "20s1s" means a chart will be displayed for 20 seconds and each 28 | // item in this chart is a 1 second average. 29 | tachymeterMeasurementWindows = []string{"20s1s", "1m1s", "2m1s", "15m30s", "1h1m"} 30 | ) 31 | 32 | func init() { 33 | tachymeterHistogram = tachymeter.New(&tachymeter.Config{Size: tachymeterSampleSize}) 34 | } 35 | 36 | // Collect the latency stats from the data into the various counters. 
37 | func collectLatencyStats(data *message.Data) { 38 | latencyHistogram.Observe(float64(data.LatencyMS())) 39 | tachymeterHistogram.AddTime(data.Latency) 40 | } 41 | -------------------------------------------------------------------------------- /lib/consumer/sequences.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "fmt" 5 | "sync/atomic" 6 | 7 | "github.com/prometheus/client_golang/prometheus" 8 | log "github.com/sirupsen/logrus" 9 | 10 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 11 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 12 | ) 13 | 14 | var ( 15 | // Map of producer,topic,key -> latest sequence number recieved from this key 16 | receivedSequenceNumbers map[string]types.SequenceNumber 17 | 18 | // For each measurement there are two kind of counters, one is a simple counter that 19 | // simply keeps count of how many such events occured. 20 | // And the other is a prometheus.Counter instance which measures temporary values of that count 21 | // (e.g. last minute, last 15 minutes etc) 22 | sameMessagesCounter prometheus.Counter 23 | sameMessagesCount uint64 24 | oldMessagesCounter prometheus.Counter 25 | oldMessagesCount uint64 26 | inOrderMessagesCounter prometheus.Counter 27 | inOrderMessagesCount uint64 28 | skippedMessagesCounter prometheus.Counter 29 | skippedMessagesCount uint64 30 | ) 31 | 32 | func init() { 33 | receivedSequenceNumbers = make(map[string]types.SequenceNumber) 34 | } 35 | 36 | // For each message validates that the sequence numnber that corresponds to the producer and the topic 37 | // are in order. 38 | // If they are not in order, will log it and accumulate in counters. 39 | // The function accesses some global varialbe that aren't thread safe (receivedSequenceNumbers) which makes the function not thread safe by itself. 40 | func validateSequence(data *message.Data) { 41 | seq := data.Sequence 42 | key := createSeqnenceNumberKey(data.ProducerID, data.Topic, data.MessageKey) 43 | latestSeq, exists := receivedSequenceNumbers[key] 44 | if !exists { 45 | // key not found, let's insert it first 46 | if seq != 0 { 47 | log.Infof("Received initial sequence number > 0. topic=%s producer=%s key=%d number=%d", 48 | data.Topic, data.ProducerID, data.MessageKey, data.Sequence) 49 | } 50 | receivedSequenceNumbers[key] = seq 51 | log.Tracef("Message received first of it's producer-topic: %s", data) 52 | inOrderMessagesCounter.Add(1) 53 | atomic.AddUint64(&inOrderMessagesCount, 1) 54 | return 55 | } 56 | 57 | switch { 58 | case seq == latestSeq: 59 | // Same message twice? That's OK, let's just log it 60 | log.Debugf("Received the same message again: %s", data) 61 | sameMessagesCounter.Add(1) 62 | atomic.AddUint64(&sameMessagesCount, 1) 63 | case seq < latestSeq: 64 | // Received an old message 65 | log.Debugf("Received old data. Current seq=%d, but received %s", latestSeq, data) 66 | oldMessagesCounter.Add(1) 67 | atomic.AddUint64(&oldMessagesCount, 1) 68 | case seq == latestSeq+1: 69 | // That's just perfect! 70 | log.Tracef("Message received in order %s", data) 71 | inOrderMessagesCounter.Add(1) 72 | atomic.AddUint64(&inOrderMessagesCount, 1) 73 | case seq > latestSeq+1: 74 | // skipped a few sequences :-( 75 | howMany := seq - latestSeq 76 | log.Debugf("Skipped a few messages (%d messages). 
Current seq=%d, received %s", 77 | howMany, latestSeq, data) 78 | skippedMessagesCounter.Add(float64(howMany)) 79 | atomic.AddUint64(&skippedMessagesCount, uint64(howMany)) 80 | } 81 | receivedSequenceNumbers[key] = seq 82 | } 83 | 84 | // create a key for the sequence number map 85 | func createSeqnenceNumberKey(pid types.ProducerID, topic types.Topic, messageKey types.MessageKey) string { 86 | return fmt.Sprintf("%s:%s:%d", pid, topic, messageKey) 87 | } 88 | -------------------------------------------------------------------------------- /lib/consumer/sequences_test.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/stretchr/testify/assert" 7 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 8 | ) 9 | 10 | func TestValidateSequence(t *testing.T) { 11 | initPrometheus() 12 | assert := assert.New(t) 13 | 14 | // validate initial state 15 | assert.Equal(uint64(0), sameMessagesCount) 16 | assert.Equal(uint64(0), oldMessagesCount) 17 | assert.Equal(uint64(0), inOrderMessagesCount) 18 | assert.Equal(uint64(0), skippedMessagesCount) 19 | 20 | // Now start sending messages and observe counts 21 | data := &message.Data{ 22 | ProducerID: "1", 23 | Topic: "t", 24 | MessageKey: 1, 25 | Sequence: 0, 26 | } 27 | validateSequence(data) 28 | assert.Equal(uint64(0), sameMessagesCount) 29 | assert.Equal(uint64(0), oldMessagesCount) 30 | assert.Equal(uint64(1), inOrderMessagesCount) 31 | assert.Equal(uint64(0), skippedMessagesCount) 32 | 33 | // Same message again 34 | validateSequence(data) 35 | assert.Equal(uint64(1), sameMessagesCount) 36 | assert.Equal(uint64(0), oldMessagesCount) 37 | assert.Equal(uint64(1), inOrderMessagesCount) 38 | assert.Equal(uint64(0), skippedMessagesCount) 39 | 40 | // increase sequence 41 | data = &message.Data{ 42 | ProducerID: "1", 43 | Topic: "t", 44 | MessageKey: 1, 45 | Sequence: 1, 46 | } 47 | validateSequence(data) 48 | assert.Equal(uint64(1), sameMessagesCount) 49 | assert.Equal(uint64(0), oldMessagesCount) 50 | assert.Equal(uint64(2), inOrderMessagesCount) 51 | assert.Equal(uint64(0), skippedMessagesCount) 52 | 53 | // Send to a different topic 54 | data = &message.Data{ 55 | ProducerID: "1", 56 | Topic: "t2", 57 | MessageKey: 1, 58 | Sequence: 0, 59 | } 60 | validateSequence(data) 61 | assert.Equal(uint64(1), sameMessagesCount) 62 | assert.Equal(uint64(0), oldMessagesCount) 63 | assert.Equal(uint64(3), inOrderMessagesCount) 64 | assert.Equal(uint64(0), skippedMessagesCount) 65 | 66 | // Send from a different producer 67 | data = &message.Data{ 68 | ProducerID: "2", 69 | Topic: "t", 70 | MessageKey: 1, 71 | Sequence: 0, 72 | } 73 | validateSequence(data) 74 | assert.Equal(uint64(1), sameMessagesCount) 75 | assert.Equal(uint64(0), oldMessagesCount) 76 | assert.Equal(uint64(4), inOrderMessagesCount) 77 | assert.Equal(uint64(0), skippedMessagesCount) 78 | 79 | // Send with a different message key 80 | data = &message.Data{ 81 | ProducerID: "1", 82 | Topic: "t", 83 | MessageKey: 2, 84 | Sequence: 0, 85 | } 86 | validateSequence(data) 87 | assert.Equal(uint64(1), sameMessagesCount) 88 | assert.Equal(uint64(0), oldMessagesCount) 89 | assert.Equal(uint64(5), inOrderMessagesCount) 90 | assert.Equal(uint64(0), skippedMessagesCount) 91 | 92 | // Skip a few messages 93 | data = &message.Data{ 94 | ProducerID: "1", 95 | Topic: "t", 96 | MessageKey: 1, 97 | Sequence: 5, 98 | } 99 | validateSequence(data) 100 | assert.Equal(uint64(1), sameMessagesCount) 101 | 
assert.Equal(uint64(0), oldMessagesCount) 102 | assert.Equal(uint64(5), inOrderMessagesCount) 103 | assert.Equal(uint64(4), skippedMessagesCount) 104 | 105 | // Skip an old message 106 | data = &message.Data{ 107 | ProducerID: "1", 108 | Topic: "t", 109 | MessageKey: 1, 110 | Sequence: 2, 111 | } 112 | validateSequence(data) 113 | assert.Equal(uint64(1), sameMessagesCount) 114 | assert.Equal(uint64(1), oldMessagesCount) 115 | assert.Equal(uint64(5), inOrderMessagesCount) 116 | assert.Equal(uint64(4), skippedMessagesCount) 117 | } 118 | -------------------------------------------------------------------------------- /lib/consumer/throughput.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "sync/atomic" 5 | 6 | "github.com/prometheus/client_golang/prometheus" 7 | 8 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 9 | ) 10 | 11 | var ( 12 | bytesCounter prometheus.Counter 13 | bytesCount uint64 14 | messageCounter prometheus.Counter 15 | messageCount uint64 16 | ) 17 | 18 | // Count the total throughput (message count and byte count) 19 | func collectThroughput(data *message.Data) { 20 | bytes := data.TotalPayloadLength 21 | bytesCounter.Add(float64(bytes)) 22 | atomic.AddUint64(&bytesCount, bytes) 23 | messageCounter.Inc() 24 | atomic.AddUint64(&messageCount, 1) 25 | } 26 | -------------------------------------------------------------------------------- /lib/consumer/ui.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "fmt" 5 | "net/http" 6 | "sync" 7 | "sync/atomic" 8 | "time" 9 | 10 | humanize "github.com/dustin/go-humanize" 11 | "github.com/prometheus/client_golang/prometheus" 12 | "github.com/prometheus/client_golang/prometheus/promauto" 13 | "github.com/prometheus/client_golang/prometheus/promhttp" 14 | ) 15 | 16 | const ( 17 | terminalReportingFrequency = 10 * time.Second 18 | ) 19 | 20 | var ( 21 | // once is used for one-time initialization that we don't want to embed in the init function. 22 | once sync.Once 23 | ) 24 | 25 | // Serve the different UIs for viewing metrics. 26 | func serveConsumerUI() { 27 | once.Do(func() { 28 | terminalUI() 29 | initPrometheus() 30 | }) 31 | } 32 | 33 | func initPrometheus() { 34 | latencyHistogram = promauto.NewHistogram(prometheus.HistogramOpts{ 35 | Name: "message_arrival_latency_hist_ms", 36 | Help: "Latency in ms for message arrival e2e (histogram).", 37 | Buckets: prometheus.ExponentialBuckets(1000, 2, 9), // 9 buckets: 1sec,2sec,4,8,16... 
38 | }) 39 | sameMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 40 | Name: "same_message_count", 41 | Help: "Number of times the same message was consumed.", 42 | }) 43 | oldMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 44 | Name: "old_message_count", 45 | Help: "Number of times an old message was consumed.", 46 | }) 47 | inOrderMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 48 | Name: "in_order_message_count", 49 | Help: "Number of times a message was received in order (this is the happy path).", 50 | }) 51 | skippedMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 52 | Name: "skipped_message_count", 53 | Help: "Number of times a message was skipped.", 54 | }) 55 | messageCounter = promauto.NewCounter(prometheus.CounterOpts{ 56 | Name: "messages_consumed", 57 | Help: "Number of messages consumed from kafka.", 58 | }) 59 | bytesCounter = promauto.NewCounter(prometheus.CounterOpts{ 60 | Name: "bytes_consumed", 61 | Help: "Number of bytes consumed from kafka.", 62 | }) 63 | 64 | http.Handle("/metrics", promhttp.Handler()) 65 | go http.ListenAndServe(":8000", nil) 66 | } 67 | 68 | // Periodically emit statistics to the terminal. 69 | func terminalUI() { 70 | ticker := time.Tick(terminalReportingFrequency) 71 | const terminalWidth = 50 72 | var ( 73 | lastMessages uint64 74 | lastBytes uint64 75 | ) 76 | go func() { 77 | for { 78 | <-ticker 79 | messages := atomic.LoadUint64(&messageCount) 80 | bytes := atomic.LoadUint64(&bytesCount) 81 | reportingFrequencySec := uint64((terminalReportingFrequency / time.Second)) 82 | messageRate := int((messages - lastMessages) / reportingFrequencySec) 83 | bytesRate := uint64((bytes - lastBytes) / reportingFrequencySec) 84 | metrics := tachymeterHistogram.Calc() 85 | tachymeterHistogram.Reset() 86 | 87 | fmt.Printf("\n\n\n\tSTATS\n") 88 | //print a visual histogram of latencies 89 | fmt.Println(metrics.Histogram.String(terminalWidth)) 90 | // print statistics about latencies 91 | fmt.Println(metrics.String()) 92 | fmt.Printf("\nRead rate: %d messages/sec \t Byte rate: %s/sec \n", messageRate, humanize.Bytes(bytesRate)) 93 | fmt.Printf("\nsameMessagesCount=%d, oldMessagesCount=%d, inOrderMessagesCount=%d, skippedMessagesCount=%d", 94 | atomic.LoadUint64(&sameMessagesCount), 95 | atomic.LoadUint64(&oldMessagesCount), 96 | atomic.LoadUint64(&inOrderMessagesCount), 97 | atomic.LoadUint64(&skippedMessagesCount)) 98 | lastMessages = messages 99 | lastBytes = bytes 100 | } 101 | }() 102 | } 103 | -------------------------------------------------------------------------------- /lib/gen/main/code-gen.go: -------------------------------------------------------------------------------- 1 | // The following directive is necessary to make the package coherent: 2 | // +build ignore 3 | 4 | // This program generates code automatically and it should be run before build. 5 | // It can be invoked by running: go generate 6 | 7 | package main 8 | 9 | import ( 10 | "fmt" 11 | "html/template" 12 | "os" 13 | "strings" 14 | 15 | log "github.com/sirupsen/logrus" 16 | ) 17 | 18 | // max payload size 19 | 20 | const payloadLength = 1e6 21 | 22 | var messageConstTemplate = template.Must(template.New("").Parse(`// Code generated by go generate; DO NOT EDIT. 
23 | package message 24 | 25 | const payload = "{{ .Payload }}" 26 | `)) 27 | 28 | func main() { 29 | f, err := os.Create("message-no-headers-const-gen.go") 30 | if err != nil { 31 | log.Fatalf("Err: %+v", err) 32 | } 33 | defer f.Close() 34 | 35 | var b strings.Builder 36 | for i := 0; i < payloadLength; i++ { 37 | fmt.Fprintf(&b, "%d", i%10) 38 | } 39 | messageConstTemplate.Execute(f, struct { 40 | Payload string 41 | }{ 42 | Payload: b.String(), 43 | }) 44 | } 45 | -------------------------------------------------------------------------------- /lib/log.go: -------------------------------------------------------------------------------- 1 | package lib 2 | 3 | import log "github.com/sirupsen/logrus" 4 | 5 | // initialize logs 6 | func init() { 7 | log.SetFormatter(&log.TextFormatter{ 8 | ForceColors: true, 9 | FullTimestamp: true, 10 | }) 11 | } 12 | -------------------------------------------------------------------------------- /lib/message/data.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "fmt" 5 | "time" 6 | 7 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 8 | ) 9 | 10 | // Data represent the data sent in a message. 11 | type Data struct { 12 | ProducerID types.ProducerID 13 | MessageKey types.MessageKey 14 | Sequence types.SequenceNumber 15 | ProducerTimestamp time.Time 16 | ConsumerTimestamp time.Time 17 | Latency time.Duration // In nanoseconds 18 | Topic types.Topic 19 | // The actual payload (without metadata) 20 | Payload []byte 21 | // The total payload lenght, including metadata sent inside the payload 22 | TotalPayloadLength uint64 23 | } 24 | 25 | func (d Data) String() string { 26 | return fmt.Sprintf("message.Data[ProducerID=%s, MessageKey=%d, Topic=%s, Sequence=%d, Latency=%dms len(Payload)=%db]", 27 | d.ProducerID, d.MessageKey, d.Topic, d.Sequence, d.LatencyMS(), len(d.Payload)) 28 | } 29 | 30 | // LatencyMS returns the latency in ms 31 | func (d Data) LatencyMS() int64 { 32 | return int64(d.Latency / 1e6) 33 | } 34 | 35 | // Data parsed from the payload (when headers are not used) 36 | type parsedData struct { 37 | producerID types.ProducerID 38 | sequence types.SequenceNumber 39 | timestamp time.Time 40 | payload []byte 41 | } 42 | -------------------------------------------------------------------------------- /lib/message/message-no-headers.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "fmt" 5 | "strconv" 6 | "strings" 7 | "time" 8 | 9 | "github.com/pkg/errors" 10 | log "github.com/sirupsen/logrus" 11 | 12 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 13 | ) 14 | 15 | //go:generate go run ../gen/main/code-gen.go 16 | 17 | // Format a message based on the parameters 18 | func format( 19 | id types.ProducerID, 20 | seq types.SequenceNumber, 21 | timestamp time.Time, 22 | messageSize types.MessageSize, 23 | ) string { 24 | var b strings.Builder 25 | // build the header first 26 | fmt.Fprintf(&b, "%s;%d;%d;", id, seq, timestamp.UTC().UnixNano()) 27 | 28 | // See how much space left for payload and add chars based on the space left 29 | left := int(messageSize) - b.Len() 30 | if left > 0 { 31 | fmt.Fprintf(&b, payload[:left]) 32 | } 33 | return b.String() 34 | } 35 | 36 | // Parse parses the string message into the Data structure. 
37 | func parse(msg string) (data parsedData, err error) { 38 | parts := strings.Split(msg, ";") 39 | if len(parts) != 4 { 40 | err = errors.Errorf("msg should contain 4 parts but it doesn't. %s...", msg[:30]) 41 | return 42 | } 43 | 44 | data.producerID = types.ProducerID(parts[0]) 45 | sq, err := strconv.ParseInt(parts[1], 10, 64) 46 | if err != nil { 47 | err = errors.WithStack(err) 48 | return 49 | } 50 | 51 | data.sequence = types.SequenceNumber(sq) 52 | 53 | ts, err := parseTs(parts[2]) 54 | if err != nil { 55 | err = errors.WithStack(err) 56 | return 57 | } 58 | data.timestamp = ts 59 | 60 | data.payload = []byte(parts[3]) 61 | return 62 | } 63 | 64 | func parseTs(ts string) (time.Time, error) { 65 | i, err := strconv.ParseInt(ts, 10, 64) 66 | if err != nil { 67 | log.Fatalf("Malformed timestamp %s. %+v", ts, err) 68 | } 69 | nano := i % 1e9 70 | sec := i / 1e9 71 | t := time.Unix(sec, nano).UTC() 72 | return t, nil 73 | } 74 | -------------------------------------------------------------------------------- /lib/message/message-no-headers_test.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "testing" 5 | "time" 6 | 7 | "github.com/stretchr/testify/assert" 8 | "github.com/stretchr/testify/require" 9 | 10 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 11 | ) 12 | 13 | func TestFormat(t *testing.T) { 14 | 15 | assert := assert.New(t) 16 | now := time.Now() 17 | // Check length 18 | msg := format("1", 0, now, 100) 19 | assert.Equal(100, len(msg), "Length should be 100") 20 | 21 | // Check minimal length 22 | msg = format("1", 0, now, 1) 23 | assert.True(len(msg) > 1, "Length should be > 1") 24 | 25 | // Check very long messages 26 | msg = format("1", 0, now, 1e4) 27 | assert.Equal(int(1e4), len(msg), "Length should be 1e3") 28 | } 29 | 30 | func TestParse(t *testing.T) { 31 | 32 | assert := assert.New(t) 33 | now := time.Now() 34 | // Create a message 35 | msg := format("1", 0, now, 100) 36 | // Make sure at least one ms passed before parsing it 37 | time.Sleep(1 * time.Millisecond) 38 | parsed, err := parse(msg) 39 | 40 | now = time.Now() 41 | require.Nil(t, err, "There should not be an error") 42 | 43 | assert.Equal(types.ProducerID("1"), parsed.producerID, "ProducerID should be 1") 44 | assert.Equal(types.SequenceNumber(0), parsed.sequence, "Sequence should be 0") 45 | assert.True(parsed.timestamp.Before(now)) 46 | } 47 | 48 | // from fib_test.go 49 | func BenchmarkFormat(b *testing.B) { 50 | now := time.Now() 51 | for n := 0; n < b.N; n++ { 52 | format("xx", 5, now, 1000) 53 | } 54 | } 55 | -------------------------------------------------------------------------------- /lib/message/message.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "strconv" 5 | "time" 6 | 7 | "github.com/confluentinc/confluent-kafka-go/kafka" 8 | log "github.com/sirupsen/logrus" 9 | 10 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 11 | ) 12 | 13 | const ( 14 | // KeySequence identifies the sequence number header 15 | KeySequence = "seq" 16 | 17 | // KeyProducerID identifies the producer ID header 18 | KeyProducerID = "id" 19 | ) 20 | 21 | // Create a mew message with headers, timestamp and size. 22 | // Does not set TopicPartition. 
23 | func Create( 24 | producerID types.ProducerID, 25 | messageID types.MessageKey, 26 | seq types.SequenceNumber, 27 | size types.MessageSize, 28 | useMessageHeaders bool, 29 | ) *kafka.Message { 30 | ts := time.Now().UTC() 31 | msg := &kafka.Message{ 32 | Key: []byte(strconv.FormatUint(uint64(messageID), 10)), 33 | } 34 | if useMessageHeaders { 35 | msg.Timestamp = ts 36 | msg.TimestampType = kafka.TimestampCreateTime 37 | msg.Value = make([]byte, size) 38 | msg.Headers = []kafka.Header{ 39 | { 40 | Key: KeyProducerID, 41 | Value: []byte(producerID), 42 | }, 43 | { 44 | Key: KeySequence, 45 | Value: []byte(strconv.FormatInt(int64(seq), 10)), 46 | }, 47 | } 48 | } else { 49 | msg.Value = []byte(format(producerID, seq, ts, size)) 50 | } 51 | return msg 52 | } 53 | 54 | // Extract the data from the message and set timestamp and latencies 55 | func Extract( 56 | msg *kafka.Message, 57 | useMessageHeaders bool, 58 | ) *Data { 59 | now := time.Now().UTC() 60 | var topic types.Topic 61 | if msg.TopicPartition.Topic != nil { 62 | topic = types.Topic(*msg.TopicPartition.Topic) 63 | } else { 64 | topic = types.Topic("") 65 | } 66 | keyStr := string(msg.Key) 67 | ui, err := strconv.ParseUint(keyStr, 10, 64) 68 | if err != nil { 69 | log.Errorf("Malformed message key %s \t %s", keyStr, err) 70 | } 71 | key := types.MessageKey(ui) 72 | data := &Data{ 73 | ConsumerTimestamp: now, 74 | Topic: topic, 75 | TotalPayloadLength: uint64(len(msg.Value)), 76 | MessageKey: key, 77 | } 78 | if useMessageHeaders { 79 | data.ProducerID = getProducerID(msg) 80 | data.Sequence = getSequence(msg) 81 | data.ProducerTimestamp = msg.Timestamp 82 | data.Payload = msg.Value 83 | } else { 84 | parsed, err := parse(string(msg.Value)) 85 | if err != nil { 86 | log.Errorf("Error parsing message %s", string(msg.Value)) 87 | return data 88 | } 89 | data.ProducerID = parsed.producerID 90 | data.Sequence = parsed.sequence 91 | data.Payload = parsed.payload 92 | data.ProducerTimestamp = parsed.timestamp 93 | } 94 | data.Latency = data.ConsumerTimestamp.Sub(data.ProducerTimestamp) 95 | return data 96 | } 97 | 98 | func getProducerID(msg *kafka.Message) types.ProducerID { 99 | v := getHeader(msg, KeyProducerID) 100 | if v == nil { 101 | return types.ProducerID("") 102 | } 103 | return types.ProducerID(string(v)) 104 | } 105 | 106 | func getSequence(msg *kafka.Message) types.SequenceNumber { 107 | str := string(getHeader(msg, KeySequence)) 108 | if str == "" { 109 | return -1 110 | } 111 | i, err := strconv.ParseInt(str, 10, 64) 112 | if err != nil { 113 | log.Fatalf("Malformed Sequence Number %s. 
%+v", str, err) 114 | } 115 | return types.SequenceNumber(i) 116 | } 117 | 118 | func getHeader(msg *kafka.Message, key string) []byte { 119 | for _, h := range msg.Headers { 120 | if h.Key == key { 121 | return h.Value 122 | } 123 | } 124 | // header not found 125 | return nil 126 | } 127 | -------------------------------------------------------------------------------- /lib/message/message_test.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "testing" 5 | "time" 6 | 7 | "github.com/stretchr/testify/assert" 8 | "github.com/stretchr/testify/require" 9 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 10 | ) 11 | 12 | func TestCreateAndExtractWithHeaders(t *testing.T) { 13 | msg := Create("1", 2, 5, 100, true) 14 | require.NotNil(t, msg, "Message should not be nil") 15 | 16 | // Make sure at least one ms passed before parsing it 17 | time.Sleep(1 * time.Millisecond) 18 | data := Extract(msg, true) 19 | require.NotNil(t, data, "Data should not be nil") 20 | 21 | assert := assert.New(t) 22 | assert.Equal(types.ProducerID("1"), data.ProducerID, "ProducerID should be 1") 23 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 24 | assert.Equal(types.SequenceNumber(5), data.Sequence, "Sequence number should be 5") 25 | assert.True(data.Latency > 1, "Latency should be > 1") 26 | } 27 | 28 | func TestCreateAndExtractWithHouteaders(t *testing.T) { 29 | msg := Create("1", 2, 5, 100, false) 30 | require.NotNil(t, msg, "Message should not be nil") 31 | 32 | // Make sure at least one ms passed before parsing it 33 | time.Sleep(1 * time.Millisecond) 34 | data := Extract(msg, false) 35 | require.NotNil(t, data, "Data should not be nil") 36 | 37 | assert := assert.New(t) 38 | assert.Equal(types.ProducerID("1"), data.ProducerID, "ProducerID should be 1") 39 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 40 | assert.Equal(types.SequenceNumber(5), data.Sequence, "Sequence number should be 5") 41 | assert.True(data.Latency > 1, "Latency should be > 1") 42 | } 43 | 44 | func TestMissingHeaderFields(t *testing.T) { 45 | msg := Create("1", 2, 5, 100, true) 46 | require.NotNil(t, msg, "Message should not be nil") 47 | msg.Headers = msg.Headers[1:] 48 | data := Extract(msg, true) 49 | require.NotNil(t, data, "Data should not be nil") 50 | 51 | assert := assert.New(t) 52 | assert.Equal(types.ProducerID(""), data.ProducerID, "ProducerID should be 1") 53 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 54 | assert.Equal(types.SequenceNumber(5), data.Sequence, "Sequence number should be 5") 55 | } 56 | 57 | func TestMissingHeaders(t *testing.T) { 58 | msg := Create("1", 2, 5, 100, true) 59 | require.NotNil(t, msg, "Message should not be nil") 60 | msg.Headers = nil 61 | data := Extract(msg, true) 62 | require.NotNil(t, data, "Data should not be nil") 63 | 64 | assert := assert.New(t) 65 | assert.Equal(types.ProducerID(""), data.ProducerID, "ProducerID should be 1") 66 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 67 | assert.Equal(types.SequenceNumber(-1), data.Sequence, "Sequence number should be 5") 68 | } 69 | -------------------------------------------------------------------------------- /lib/producer/monitor.go: -------------------------------------------------------------------------------- 1 | package producer 2 | 3 | import ( 4 | "context" 5 | "net/http" 6 | "sync/atomic" 7 | "time" 8 | 9 | 
"github.com/confluentinc/confluent-kafka-go/kafka" 10 | mapset "github.com/deckarep/golang-set" 11 | "github.com/dustin/go-humanize" 12 | "github.com/paulbellamy/ratecounter" 13 | "github.com/prometheus/client_golang/prometheus" 14 | "github.com/prometheus/client_golang/prometheus/promauto" 15 | "github.com/prometheus/client_golang/prometheus/promhttp" 16 | log "github.com/sirupsen/logrus" 17 | 18 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 19 | ) 20 | 21 | const monitoringFrequency = 5 * time.Second 22 | 23 | var ( 24 | // messageRateCounter is used in order to observe the actual throughput 25 | messageRateCounter *ratecounter.RateCounter 26 | messageCounter prometheus.Counter 27 | messageSendErrors prometheus.Counter 28 | 29 | // bytesRateCounter measures the actual throughput in bytes 30 | bytesRateCounter *ratecounter.RateCounter 31 | bytesCounter prometheus.Counter 32 | 33 | topicsGauge prometheus.Gauge 34 | partitionsGauge prometheus.Gauge 35 | 36 | // number of messages that are bandwidth throttled or kafka-server throttled. 37 | // This is the number of messages that were supposed to be sent but got throttled and are lagging behind. 38 | badwidthThrottledMessages prometheus.Counter 39 | 40 | // Number of currently client-side in-flight messages (messages buffered but not yet sent) 41 | inflightMessageCount prometheus.GaugeFunc 42 | ) 43 | 44 | func init() { 45 | messageRateCounter = ratecounter.NewRateCounter(monitoringFrequency) 46 | bytesRateCounter = ratecounter.NewRateCounter(monitoringFrequency) 47 | } 48 | 49 | func reportMessageSent(m *kafka.Message) { 50 | messageRateCounter.Incr(1) 51 | messageCounter.Inc() 52 | l := len(m.Value) 53 | bytesRateCounter.Incr(int64(l)) 54 | bytesCounter.Add(float64(l)) 55 | } 56 | 57 | // periodically monitors the kafka writer. 58 | // Blocks forever or until canceled. 59 | func monitor( 60 | ctx context.Context, 61 | errorCounter *uint64, 62 | frequency time.Duration, 63 | desiredThroughput types.Throughput, 64 | id types.ProducerID, 65 | numTopics, numPartitions uint, 66 | producers mapset.Set, 67 | ) { 68 | initPrometheus(numTopics, numPartitions, producers) 69 | ticker := time.Tick(frequency) 70 | for { 71 | select { 72 | case <-ticker: 73 | printStats(errorCounter, frequency, desiredThroughput, id) 74 | case <-ctx.Done(): 75 | log.Infof("Monitor done. 
%s", ctx.Err()) 76 | return 77 | } 78 | } 79 | } 80 | 81 | func initPrometheus( 82 | numTopics, numPartitions uint, 83 | producers mapset.Set, 84 | ) { 85 | messageCounter = promauto.NewCounter(prometheus.CounterOpts{ 86 | Name: "messages_produced", 87 | Help: "Number of messages produced to kafka.", 88 | }) 89 | bytesCounter = promauto.NewCounter(prometheus.CounterOpts{ 90 | Name: "bytes_produced", 91 | Help: "Number of bytes produced to kafka.", 92 | }) 93 | topicsGauge = promauto.NewGauge(prometheus.GaugeOpts{ 94 | Name: "producer_number_of_topics", 95 | Help: "Number of topics that the producer writes to.", 96 | }) 97 | topicsGauge.Add(float64(numTopics)) 98 | 99 | partitionsGauge = promauto.NewGauge(prometheus.GaugeOpts{ 100 | Name: "producer_number_of_partitions", 101 | Help: "Number of partitions of each topic that the producer writes to.", 102 | }) 103 | partitionsGauge.Add(float64(numPartitions)) 104 | 105 | badwidthThrottledMessages = promauto.NewCounter(prometheus.CounterOpts{ 106 | Name: "bandwidth_throttled_messages", 107 | Help: "Number of messages throttled after sending.", 108 | }) 109 | inflightMessageCount = promauto.NewGaugeFunc(prometheus.GaugeOpts{ 110 | Name: "in_flight_message_count", 111 | Help: "Number of currently in-flight messages (client side)", 112 | }, inFlightMessageCounter(producers)) 113 | messageSendErrors = promauto.NewCounter(prometheus.CounterOpts{ 114 | Name: "message_send_errors", 115 | Help: "Number of message send errors.", 116 | }) 117 | http.Handle("/metrics", promhttp.Handler()) 118 | go http.ListenAndServe(":8001", nil) 119 | } 120 | 121 | func inFlightMessageCounter(producers mapset.Set) func() float64 { 122 | return func() float64 { 123 | sum := 0 124 | for p := range producers.Iterator().C { 125 | producer := p.(*kafka.Producer) 126 | sum += producer.Len() 127 | } 128 | return float64(sum) 129 | } 130 | } 131 | 132 | // Prints some runtime stats such as errors, throughputs etc 133 | func printStats( 134 | errorCounter *uint64, 135 | frequency time.Duration, 136 | desiredThroughput types.Throughput, 137 | id types.ProducerID, 138 | ) { 139 | frequencySeconds := int64(frequency / time.Second) 140 | messageThroughput := messageRateCounter.Rate() / frequencySeconds 141 | bytesThroughput := uint64(bytesRateCounter.Rate() / frequencySeconds) 142 | errors := atomic.LoadUint64(errorCounter) 143 | log.Infof(`Recent stats for %s: 144 | Throughput: %d messages / sec 145 | Throughput: %s / sec 146 | Total errors: %d 147 | `, id, messageThroughput, humanize.Bytes(bytesThroughput), errors) 148 | 149 | // How much slack we're willing to take if throughput is lower than desired 150 | const slack = .9 151 | 152 | if float32(messageThroughput) < float32(desiredThroughput)*slack { 153 | log.Warnf("Actual throughput is < desired throughput. 
%d < %d", messageThroughput, desiredThroughput) 154 | badwidthThrottledMessages.Add(float64(desiredThroughput) - float64(messageThroughput)) 155 | } 156 | } 157 | -------------------------------------------------------------------------------- /lib/producer/producer.go: -------------------------------------------------------------------------------- 1 | package producer 2 | 3 | // The producer package is responsible for producing messages and repotring success/failure WRT 4 | // delivery as well as capacity (is it able to produce the required throughput) 5 | 6 | import ( 7 | "context" 8 | "math" 9 | "strings" 10 | "sync" 11 | "sync/atomic" 12 | 13 | "golang.org/x/time/rate" 14 | 15 | "github.com/confluentinc/confluent-kafka-go/kafka" 16 | mapset "github.com/deckarep/golang-set" 17 | log "github.com/sirupsen/logrus" 18 | 19 | "github.com/appsflyer/kafka-mirror-tester/lib/admin" 20 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 21 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 22 | ) 23 | 24 | const ( 25 | // How much burst we allow for the rate limiter. 26 | // We provide a 0.1 burst ratio which means that at times the rate might go up to 10% or the desired rate (but not for log) 27 | // This is done in order to conpersate for slow starts. 28 | burstRatio = 0.1 29 | 30 | // Number of messages per producer that we allow in-flight before waiting and flushing 31 | inFlightThreshold = 100000000 32 | ) 33 | 34 | // ProduceToTopics spawms multiple producer threads and produces to all topics 35 | func ProduceToTopics( 36 | brokers types.Brokers, 37 | id types.ProducerID, 38 | throughput types.Throughput, 39 | size types.MessageSize, 40 | initialSequence types.SequenceNumber, 41 | topicsString string, 42 | numPartitions, numReplicas uint, 43 | useMessageHeaders bool, 44 | retentionMs uint, 45 | ) { 46 | // Count the total number of errors on this topic 47 | errorCounter := uint64(0) 48 | topics := strings.Split(topicsString, ",") 49 | producers := mapset.NewSet() 50 | ctx := context.Background() 51 | go monitor(ctx, &errorCounter, monitoringFrequency, throughput, id, uint(len(topics)), numPartitions, producers) 52 | 53 | var wg sync.WaitGroup 54 | for _, topic := range topics { 55 | t := types.Topic(topic) 56 | wg.Add(1) 57 | go func(topic types.Topic, partitions, replicas uint) { 58 | admin.MustCreateTopic(ctx, brokers, t, partitions, replicas, retentionMs) 59 | ProduceForever( 60 | ctx, 61 | brokers, 62 | t, 63 | id, 64 | initialSequence, 65 | partitions, 66 | throughput, 67 | size, 68 | useMessageHeaders, 69 | &errorCounter, 70 | producers) 71 | wg.Done() 72 | }(t, numPartitions, numReplicas) 73 | } 74 | wg.Wait() 75 | } 76 | 77 | // ProduceForever will produce messages to the topic forver or until canceled by the context. 78 | // It will try to acheive the desired throughput and if not - will log that. It will not exceed the throughput (measured by number of messages per second) 79 | // throughput is limited to 1M messages per second. 80 | func ProduceForever( 81 | ctx context.Context, 82 | brokers types.Brokers, 83 | topic types.Topic, 84 | id types.ProducerID, 85 | initialSequence types.SequenceNumber, 86 | numPartitions uint, 87 | throughput types.Throughput, 88 | messageSize types.MessageSize, 89 | useMessageHeaders bool, 90 | errorCounter *uint64, 91 | producers mapset.Set, 92 | ) { 93 | log.Infof("Starting the producer. 
brokers=%s, topic=%s id=%s throughput=%d size=%d initialSequence=%d", 94 | brokers, topic, id, throughput, messageSize, initialSequence) 95 | p, err := kafka.NewProducer(&kafka.ConfigMap{ 96 | "bootstrap.servers": string(brokers), 97 | "queue.buffering.max.ms": "1000", 98 | }) 99 | if err != nil { 100 | log.Fatalf("Failed to create producer: %s\n", err) 101 | } 102 | defer p.Close() 103 | producers.Add(p) 104 | producerForeverWithProducer( 105 | ctx, 106 | p, 107 | topic, 108 | id, 109 | initialSequence, 110 | numPartitions, 111 | throughput, 112 | messageSize, 113 | useMessageHeaders, 114 | errorCounter) 115 | } 116 | 117 | // producerForeverWithWriter produces kafka messages forever or until the context is canceled. 118 | // adheeers to maintaining the desired throughput. 119 | func producerForeverWithProducer( 120 | ctx context.Context, 121 | p *kafka.Producer, 122 | topic types.Topic, 123 | producerID types.ProducerID, 124 | initialSequence types.SequenceNumber, 125 | numPartitions uint, 126 | throughput types.Throughput, 127 | messageSize types.MessageSize, 128 | useMessageHeaders bool, 129 | errorCounter *uint64, 130 | ) { 131 | // the rate limiter regulates the producer by limiting its throughput (messages/sec) 132 | limiter := rate.NewLimiter(rate.Limit(throughput), int(math.Ceil(float64(throughput)*burstRatio))) 133 | 134 | // Sequence number per message 135 | seq := initialSequence 136 | 137 | go eventsProcessor(p, errorCounter) 138 | 139 | topicString := string(topic) 140 | tp := kafka.TopicPartition{Topic: &topicString, Partition: kafka.PartitionAny} 141 | for ; ; seq++ { 142 | err := limiter.Wait(ctx) 143 | if err != nil { 144 | log.Errorf("Error waiting %+v", err) 145 | continue 146 | } 147 | numPartitionsXprime := numPartitions * 17 // TO increase the likelihood of even partitioning 148 | messageKey := types.MessageKey(uint(seq) % numPartitionsXprime) 149 | scopedSeq := seq / types.SequenceNumber(numPartitionsXprime) 150 | produceMessage(ctx, p, tp, producerID, messageKey, scopedSeq, messageSize, useMessageHeaders) 151 | } 152 | } 153 | 154 | // produceMessage produces a single message to kafka. 155 | // message production is asyncrounous on the ProducerChannel 156 | func produceMessage( 157 | ctx context.Context, 158 | p *kafka.Producer, 159 | topicPartition kafka.TopicPartition, 160 | producerID types.ProducerID, 161 | messageKey types.MessageKey, 162 | seq types.SequenceNumber, 163 | messageSize types.MessageSize, 164 | useMessageHeaders bool, 165 | ) { 166 | if p.Len() > inFlightThreshold { 167 | p.Flush(1) 168 | } 169 | m := message.Create(producerID, messageKey, seq, messageSize, useMessageHeaders) 170 | m.TopicPartition = topicPartition 171 | p.ProduceChannel() <- m 172 | log.Tracef("Producing %s...", m) 173 | } 174 | 175 | // eventsProcessor processes the events emited by the producer p. 
176 | // It then logs errors, increments the passed-by-reference error counter and updates the throughput counter
177 | func eventsProcessor(
178 | 	p *kafka.Producer,
179 | 	errorCounter *uint64,
180 | ) {
181 | 	for e := range p.Events() {
182 | 		switch ev := e.(type) {
183 | 		case *kafka.Message:
184 | 			m := ev
185 | 			if m.TopicPartition.Error != nil {
186 | 				log.Errorf("Delivery failed: %v", m.TopicPartition.Error)
187 | 				atomic.AddUint64(errorCounter, 1)
188 | 				messageSendErrors.Inc()
189 | 			} else {
190 | 				reportMessageSent(m)
191 | 			}
192 | 		default:
193 | 			log.Infof("Ignored event: %s", ev)
194 | 		}
195 | 	}
196 | }
197 | 
--------------------------------------------------------------------------------
/lib/types/types.go:
--------------------------------------------------------------------------------
1 | package types
2 | 
3 | // Common type definitions for this project
4 | 
5 | // Brokers is a comma-separated string of host:port
6 | type Brokers string
7 | 
8 | // Throughput describes a message send throughput measured in messages per second
9 | type Throughput uint
10 | 
11 | // MessageSize describes a message size in bytes
12 | type MessageSize uint
13 | 
14 | // ProducerID describes an ID for a producer
15 | type ProducerID string
16 | 
17 | // MessageKey describes a message key in kafka. We define them as uint b/c we want
18 | // to enforce that as part of the business logic.
19 | // One thing worth mentioning is that in Kafka message keys are not generally required to be unique.
20 | // In fact we use them simply for message routing b/w partitions
21 | // and we expect each key to repeat many times
22 | type MessageKey uint
23 | 
24 | // Topic describes the name of a kafka topic
25 | type Topic string
26 | 
27 | // Topics is just an array of topics
28 | type Topics []string
29 | 
30 | // SequenceNumber represents a sequence number in a message, used for testing message ordering
31 | type SequenceNumber int64
32 | 
33 | // ConsumerGroup for kafka
34 | type ConsumerGroup string
35 | 
--------------------------------------------------------------------------------
/main.go:
--------------------------------------------------------------------------------
1 | package main
2 | 
3 | import (
4 | 	"github.com/appsflyer/kafka-mirror-tester/lib/cmd"
5 | )
6 | 
7 | func main() {
8 | 	cmd.Execute()
9 | }
10 | 
--------------------------------------------------------------------------------
/results-ureplicator.md:
--------------------------------------------------------------------------------
1 | # Results of the experiment - uReplicator
2 | In this experiment we set out to test the performance and correctness of uReplicator with two Kafka clusters located in two AWS regions: `us-east-1` (Virginia) and `eu-west-1` (Ireland).
3 | 
4 | To implement the experiment we created a specialized producer and consumer, written in Go. The producer generates messages in a predefined format at a configurable throughput; the consumer verifies message arrival and measures throughput and latency.
5 | 
6 | Details of the implementation of the producer and the consumer are in the [readme file](README.md).
7 | 
8 | ## Setup
9 | 
10 | We use *Kubernetes* to spin up servers in both datacenters and set up the two Kafka clusters as well as uReplicator, the producer and the consumer.
11 | 
12 | * The producer runs in us-east-1, producing data to the local cluster.
13 | * uReplicator runs in eu-west-1, consuming from the cluster in us-east-1 and producing to the local Kafka cluster in eu-west-1
14 | * The consumer runs in eu-west-1, consuming messages replicated to the local cluster by uReplicator
15 | 
16 | We tested a variety of configurations, but most of the tests were run with this setup:
17 | 
18 | * Kubernetes node types: `i3.large`
19 | * Kubernetes cluster sizes: 40 nodes in us-east-1, 48 nodes in eu-west-1
20 | * Kafka cluster sizes: 30 brokers in each cluster. Single zookeeper pod. Storage on ephemeral local disks
21 | * uReplicator: 8 workers, 3 controllers (1 controller in some tests)
22 | * Producer: 10 pods
23 | * Consumer: 4 pods
24 | * Produced messages: 1kB (1000 bytes) per message
25 | * Production throughput: 200k messages/sec
26 | * => This results in replication of *200 MB/sec*
27 | * Topics replicated by uReplicator: 1
28 | * Kafka replication factor: 3 in us-east-1, 2 in eu-west-1
29 | * Partitions: 150 partitions on both clusters as a baseline
30 | 
31 | (further configuration details such as memory and CPU allocation can be found in the k8s yaml files in this project)
32 | 
33 | ## Results
34 | 
35 | We ran multiple experiments; here are the highlights.
36 | 
37 | ### Long haul
38 | 
39 | We ran the 200 MB/sec workload for several hours.
40 | 
41 | **Result:** Looks good. Nothing suspicious happened. Over hours and hours the topics were correctly replicated.
42 | 
43 | ### Kill a broker in kafka-source
44 | 
45 | We kill a broker pod in kafka-source. When killed, k8s automatically re-provisions a new pod in the statefulset, which results in a few minutes of downtime for one of the brokers until it's back up. Since the replication factor is 3 we do not expect message loss, although this action might result in higher latency and lower throughput.
46 | 
47 | Killing a pod (example):
48 | 
49 | ```sh
50 | kubectl --context us-east-1.k8s.local -n kafka-source delete pod kafka-source-2
51 | ```
52 | 
53 | **Result:** We see a small hiccup in replication throughput and latency, and some message loss (about 20 messages out of 200k/sec, 0.01%). We don't have an explanation for this message loss, although in terms of correctness of our application it is definitely something we can live with.
54 | 
55 | 
56 | ![Kill a broker in kafka-source](doc/media/kill-kafka-source-pod.png "Kill a broker in kafka-source")
57 | 
58 | ### Reduce source cluster size permanently to 29
59 | 
60 | The baseline of the source cluster is 30 brokers. In this experiment we reduce the size to 29 permanently. Unlike the previous experiment, k8s will not re-provision the killed pod, so the cluster size remains 29. This is supposed to be OK since the replication factor is 3.
61 | 
62 | Scaling down the cluster to 29:
63 | 
64 | ```sh
65 | kubectl --context us-east-1.k8s.local -n kafka-source scale statefulset kafka-source --replicas 29
66 | ```
67 | 
68 | **Result:** The result is very similar to before. We see a slight hiccup in replication throughput and in some cases minor message loss (which we cannot explain), but that's all.
69 | 
70 | ### Add uReplicator worker
71 | 
72 | The original setup had 8 uReplicator workers. We want to check how adding an additional worker affects the cluster. Our expectation is that the workers rebalance and "continue as usual".
73 | 
74 | ```sh
75 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 9
76 | ```
77 | 
78 | **Result:** As expected, everything is normal; that's good.
79 | 
80 | ### Remove uReplicator worker
81 | 
82 | Similar to before, the baseline number of workers is 8. In this test we reduce them to 7 by scaling down the number of pods. We expect no message loss, but a small hiccup in latency until a rebalance.
83 | 
84 | ```sh
85 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 7
86 | ```
87 | 
88 | **Result:** We indeed see slowness, but after a rebalance (~2 minutes) the rest of the workers catch up. No message loss.
89 | 
90 | ![Remove worker](doc/media/remove-worker.png "Remove worker")
91 | 
92 | ### uReplicator under capacity and then back to capacity
93 | When removing more and more workers from uReplicator, at some stage it will run out of capacity and not be able to replicate at the desired throughput.
94 | 
95 | Our experiment is: remove more and more workers until uReplicator runs out of capacity, and only then re-add workers and see how fast it is able to pick up the pace.
96 | 
97 | Remove more and more workers:
98 | 
99 | ```sh
100 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 7
101 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 6
102 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 5
103 | ...
104 | 
105 | ```
106 | 
107 | And then, when you see it run out of capacity, start adding them back:
108 | 
109 | ```sh
110 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 6
111 | ```
112 | 
113 | It takes a long time (around 10 minutes) for the workers to catch up with the backlog, but eventually they do.
114 | Sometimes uReplicator needs to "get kicked" by adding or removing workers. For example, we saw that with 10 workers it might get stuck in a local minimum and then, when reduced to 8 workers, it suddenly gets a boost. It seems that sometimes the controllers don't know about the newly added workers. This can be monitored and fixed by looking at the `/instances` API on the controller.
115 | 
116 | ### Kill a broker in kafka-destination
117 | 
118 | We abruptly kill one of the destination cluster's brokers, allowing k8s to re-provision a new pod into its statefulset. We expect no message loss (as there's a replication factor of 3) and perhaps a slight performance hiccup.
119 | 
120 | ```sh
121 | kubectl --context eu-west-1.k8s.local -n kafka-destination delete pod kafka-destination-29
122 | ```
123 | 
124 | **Result:** There's a small hiccup and then things are back to normal. No message loss.
125 | 
126 | ![Kill pod in destination](doc/media/kill-pod-destination.png "Kill pod in destination")
127 | 
128 | ### Downsize the kafka-destination cluster to 29
129 | 
130 | The baseline of both Kafka clusters is 30 brokers. In this experiment we permanently remove one broker from the destination cluster by scaling down the statefulset. We expect no message loss (as there's a replication factor of 3) and perhaps a slight performance hiccup.
131 | 
132 | ```sh
133 | kubectl --context eu-west-1.k8s.local -n kafka-destination scale statefulset kafka-destination --replicas 29
134 | ```
135 | 
136 | **Result:** OK, no noticeable hiccups.
137 | 
138 | ![Downsize destination cluster](doc/media/downsize-destination-cluster.png "Downsize destination cluster")
139 | 
140 | ### Adding a new topic
141 | 
142 | In this experiment we add a new topic and want to see how fast uReplicator starts replicating it.
143 | 
144 | **Result:** Discovery of the new topic takes on the order of 2-3 minutes, which is OK.
145 | 
146 | ![Discover new topic](doc/media/new-topic.png)
147 | 
148 | ### Adding partitions to an existing topic
149 | 
150 | We want to test what happens when we repartition (i.e. add partitions to) an existing topic which is already being actively replicated. We expect uReplicator to pick up the new partitions and start replicating them as well.
151 | 
152 | We connect to one of the source cluster brokers and run the `kafka-topics` command:
153 | 
154 | ```sh
155 | $ make k8s-kafka-shell-source
156 | # ... connecting ...
157 | 
158 | $ unset JMX_PORT
159 | $ bin/kafka-topics.sh --zookeeper zookeeper:2181 --alter --topic topic5 --partitions 300
160 | ```
161 | 
162 | **Result:** This was a bit of a surprise: uReplicator did not pick up the new partitions. It continued replicating the old partitions but ignored the new ones.
163 | 
164 | To fix that we use uReplicator's API: we delete the topic from uReplicator and re-add it, after which uReplicator finally starts replicating the new partitions. This seems like a usability issue; we are not sure whether it is by design.
165 | 
166 | In order to send commands to the remote uReplicator controller(s) we open a local port with port-forwarding:
167 | 
168 | ```sh
169 | kubectl --context eu-west-1.k8s.local -n ureplicator port-forward ureplicator-controller-76ff85b889-l9mzl 9000
170 | ```
171 | 
172 | And now we can delete and recreate the topic with as many partitions as we need:
173 | 
174 | ```sh
175 | curl -X DELETE http://localhost:9000/topics/topic5
176 | curl -X POST -d '{"topic":"topic5", "numPartitions":"300"}' http://localhost:9000/topics
177 | ```
178 | 
179 | ### Add uReplicator controller
180 | 
181 | We try adding a new controller to make sure nothing breaks while doing so.
182 | 
183 | **Result:** Looks good, operation continues as normal.
184 | 
185 | ### Delete uReplicator controller
186 | 
187 | We delete a uReplicator controller (there were 3 to begin with) and make sure the rest of the controllers are able to continue operation as planned.
188 | 
189 | **Result:** Looks good, the rest of the controllers behave normally.
190 | 
191 | ### Delete all uReplicator controllers
192 | 
193 | We delete all uReplicator controllers to see what happens.
194 | 
195 | **Result:** For as long as there's no controller alive, the workers continue their normal operation; however, new topics are not picked up for replication. When the controllers are back up they pick up the information about the existing workers and take charge of topic replication again.
196 | 
197 | ### Packet loss: 10% on 4 replicator workers (out of 10)
198 | 
199 | Since the main scenario we deal with is replication over the Atlantic, we want to test behavior under simulated packet loss. We use Weave Scope's Traffic Control plugin to apply 10% packet loss on 4 out of 10 uReplicator workers (a hand-rolled `tc` sketch is shown after the result below); 10% packet loss is quite high.
200 | 
201 | **Result:** We see slowness in processing, but no message loss. That's good.
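For readers without Weave Scope, a roughly equivalent impairment can be applied by hand from inside the worker pods with `tc netem`. This is only a sketch and is not the method used in these tests; it assumes the worker image has `iproute2` available, that the container may run `tc` (NET_ADMIN), that the pod's interface is `eth0`, and `<worker-pod>` is a placeholder for one of the 4 chosen pods.

```sh
# Hypothetical manual alternative to the Weave Scope Traffic Control plugin:
# add 10% packet loss on eth0 inside one of the chosen worker pods.
kubectl --context eu-west-1.k8s.local -n ureplicator exec -it <worker-pod> -- \
  tc qdisc add dev eth0 root netem loss 10%

# Remove the impairment once the experiment is done.
kubectl --context eu-west-1.k8s.local -n ureplicator exec -it <worker-pod> -- \
  tc qdisc del dev eth0 root netem
```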
202 | 
203 | ![Packet loss on workers](doc/media/packet-loss-on-workers.png "packet loss on workers")
204 | 
205 | ### Packet loss: 10% on 4 brokers in source cluster
206 | 
207 | The source cluster has 30 nodes. In this experiment we apply 10% packet loss on 4 of the brokers. This is similar to the previous experiment, except that the packet loss is applied at the other end of the Atlantic.
208 | 
209 | **Result:** OK; we see slowness, and when the packet loss is removed the cluster catches up.
210 | 
211 | ![Packet loss on source cluster](doc/media/packet-loss-on-source-cluster.png "Packet loss on source cluster")
212 | 
213 | 
214 | ## Conclusion
215 | 
216 | All in all, uReplicator seems like a capable tool for the mission at hand. There are still a few blind spots and hiccups, but it seems ready to go.
217 | 
218 | ### No message headers
219 | 
220 | One of the relatively recent features added to Kafka is message headers. As of this writing *uReplicator does not support message headers*, meaning that if a message contains headers uReplicator will replicate the message but silently discard its headers.
221 | 
--------------------------------------------------------------------------------
/running.md:
--------------------------------------------------------------------------------
1 | # Installing and running the tool(s)
2 | 
3 | We assume some level of familiarity with the following tools and technologies:
4 | 
5 | * Kafka
6 | * uReplicator
7 | * Brooklin
8 | * AWS
9 | * Kubernetes
10 | 
11 | And some nice-to-have and useful skills:
12 | 
13 | * Prometheus
14 | * Grafana
15 | * Golang
16 | 
17 | ## Prerequisites and setup
18 | 
19 | The following tools are required:
20 | * `make` (already installed on most systems)
21 | * `AWS CLI`, with AWS keys set up
22 | * `kops`. The currently tested version is 1.10.0 (with brew it's `brew install kops@1.10.0` or `brew upgrade kops@1.10.0` or `brew switch kops 1.10.0`)
23 | * `kubectl` - the Kubernetes CLI
24 | * `kafka client tools`, in particular: `zookeeper-shell` and `kafka-console-consumer`
25 | 
26 | # Running it
27 | 
28 | NOTICE: This will incur costs from AWS. We set up hefty clusters and drive traffic between them, and this costs $$$.
29 | 
30 | ```
31 | make k8s-all # Wait for all resources to be created. This could take up to 40min, depending on the cluster size.
32 | ```
33 | 
34 | # Destroying it
35 | 
36 | ```
37 | make k8s-delete-all # And wait for all resources to get deleted. This can take a few minutes
38 | ```
39 | 
--------------------------------------------------------------------------------