├── .gitignore ├── Dockerfile ├── Makefile ├── README.md ├── build-ureplicator ├── Dockerfile ├── Makefile ├── confd │ ├── CONFD.md │ ├── conf.d │ │ ├── consumer.properties.toml │ │ ├── helix.properties.toml │ │ ├── producer.properties.toml │ │ └── zookeeper.properties.toml │ └── templates │ │ ├── consumer.properties.tmpl │ │ ├── helix.properties.tmpl │ │ ├── log4j.properties.tmpl │ │ ├── producer.properties.tmpl │ │ ├── test-log4j.properties.tmpl │ │ ├── tools-log4j.properties.tmpl │ │ ├── topicmapping.properties.tmpl │ │ └── zookeeper.properties.tmpl └── entrypoint.sh ├── doc └── media │ ├── brookin-packet-loss-kafka-source-1min-3brokers.png │ ├── brookin-packet-loss-kafka-source-1min.png │ ├── brookin-packet-loss-kafka-source.png │ ├── brooklin-add-partitions-take1.png │ ├── brooklin-add-partitions-take2.png │ ├── brooklin-adding-new-worker.png │ ├── brooklin-downsize-destination-cluster-100mb.png │ ├── brooklin-downsize-destination-cluster.png │ ├── brooklin-kill-kafka-destination-pod-take2-aftershock1.png │ ├── brooklin-kill-kafka-destination-pod-take2-aftershock2.png │ ├── brooklin-kill-kafka-destination-pod-take2.png │ ├── brooklin-kill-kafka-destination-pod-take3.png │ ├── brooklin-kill-kafka-destination-pod-take4.png │ ├── brooklin-kill-kafka-destination-pod-take5.png │ ├── brooklin-kill-kafka-destination-pod.png │ ├── brooklin-kill-kafka-source-pod.png │ ├── brooklin-new-topic.png │ ├── brooklin-packat-loss-100mb.png │ ├── brooklin-packet-loss.png │ ├── brooklin-reduce-worker-pool-to-31.png │ ├── brooklin-remove-more-workers.png │ ├── brooklin-removing-more-workers-latency.png │ ├── brooklin-resize-kafka-source.png │ ├── brooklin-scale-down-and-up-100mb.png │ ├── brooklin-scale-down-and-up.png │ ├── brookling-killl-kafka-pod-take2-production-error-rate.png │ ├── downsize-destination-cluster.png │ ├── kill-kafka-source-pod.png │ ├── kill-pod-destination.png │ ├── new-topic.png │ ├── packet-loss-on-source-cluster.png │ ├── packet-loss-on-workers.png │ └── remove-worker.png ├── go.mk ├── go.mod ├── go.sum ├── k8s.mk ├── k8s ├── brooklin │ ├── 00namespace.yml │ ├── 20zookeeper.yml │ ├── 25env-config.yml │ ├── 25jmx-prometheus-javaagent-config.yml │ ├── 30brooklin.yml │ ├── 40monitoring.yml │ ├── delete-replicate-topic.sh │ ├── replicate-topic.sh │ └── test.sh ├── kafka-destination │ ├── 00namespace.yml │ ├── 10broker-config.yml │ ├── 10metrics-config.yml │ ├── 10zookeeper-config.yml │ ├── 20dns.yml │ ├── 20pzoo-service.yml │ ├── 30service.yml │ ├── 50kafka.yml │ ├── 50pzoo.yml │ ├── 60monitoring.yml │ └── test.sh ├── kafka-source │ ├── 00namespace.yml │ ├── 10broker-config.yml │ ├── 10metrics-config.yml │ ├── 10zookeeper-config.yml │ ├── 20dns.yml │ ├── 20pzoo-service.yml │ ├── 30service.yml │ ├── 50kafka.yml │ ├── 50pzoo.yml │ ├── 60monitoring.yml │ └── test.sh ├── monitoring │ ├── admin-cluster-role-binding.yaml │ ├── admin-service-account.yaml │ ├── graphite-exporter │ │ ├── configmap.yml │ │ ├── deployment.yaml │ │ ├── prometheus-scrape.yaml │ │ └── service.yaml │ ├── kube-state-metrics-cluster-role-binding.yaml │ ├── kube-state-metrics-cluster-role.yaml │ ├── kube-state-metrics-deployment.yaml │ ├── kube-state-metrics-role-binding.yaml │ ├── kube-state-metrics-role.yaml │ ├── kube-state-metrics-service-account.yaml │ ├── kube-state-metrics-service.yaml │ ├── monitoring-expose-kube-controller-manager.yaml │ ├── monitoring-expose-kube-scheduler.yaml │ └── patch │ │ ├── grafana-dashboard-definitions.yaml │ │ ├── grafana-datasources.yaml.tmpl │ │ └── template.sh ├── tester │ ├── 
consumer.yaml │ └── producer.yaml └── ureplicator │ ├── 00namespace.yml │ ├── 20zookeeper.yml │ ├── 25env-config.yml.tmpl │ ├── 25jmx-prometheus-javaagent-config.yml │ ├── 30ureplicator.yml │ ├── 40monitoring.yml │ ├── template.sh │ └── test.sh ├── lib ├── admin │ └── topic.go ├── cmd │ ├── cmd.go │ ├── consume.go │ └── produce.go ├── consumer │ ├── consumer.go │ ├── performance.go │ ├── sequences.go │ ├── sequences_test.go │ ├── throughput.go │ └── ui.go ├── gen │ └── main │ │ └── code-gen.go ├── log.go ├── message │ ├── data.go │ ├── message-no-headers-const-gen.go │ ├── message-no-headers.go │ ├── message-no-headers_test.go │ ├── message.go │ └── message_test.go ├── producer │ ├── monitor.go │ └── producer.go └── types │ └── types.go ├── main.go ├── results-brooklin.md ├── results-ureplicator.md └── running.md
/.gitignore:
--------------------------------------------------------------------------------
1 | vendor
2 | kafka-mirror-tester
3 | build-ureplicator/tmp/
4 | k8s/ureplicator/25env-config.yml
5 | k8s/monitoring/patch/grafana-datasources.yaml
6 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM ubuntu
2 |
3 | # Install the C lib for kafka
4 | RUN apt-get update
5 | RUN apt-get install -y --no-install-recommends apt-utils wget gnupg software-properties-common
6 | RUN apt-get install -y apt-transport-https ca-certificates
7 | RUN wget -qO - https://packages.confluent.io/deb/5.1/archive.key | apt-key add -
8 | RUN add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.1 stable main"
9 | RUN apt-get update
10 | RUN apt-get install -y librdkafka-dev
11 |
12 | # Install Go
13 | RUN add-apt-repository ppa:longsleep/golang-backports
14 | RUN apt-get update
15 | RUN apt-get install -y golang-1.11-go
16 |
17 | # Build the tool
18 | WORKDIR /go/src/github.com/appsflyer/kafka-mirror-tester
19 | COPY *.go ./
20 | COPY lib lib
21 | COPY vendor vendor
22 |
23 | RUN GOPATH=/go GOOS=linux /usr/lib/go-1.11/bin/go build -a -o main .
24 |
25 | EXPOSE 8000
26 |
27 | ENTRYPOINT ["./main"]
28 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | include go.mk
2 | include k8s.mk
3 |
4 | ####################
5 | # uReplicator docker
6 | ####################
7 | ureplicator-release:
8 | 	cd build-ureplicator; make release
9 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Kafka Mirror Tester
2 |
3 | Kafka Mirror Tester is a tool for testing the performance and correctness of Apache Kafka mirroring.
4 | Mirroring is not one of Kafka's built-in features, but there are third-party tools that implement it, namely:
5 |
6 | 1. Kafka's [Mirror Maker](https://kafka.apache.org/documentation.html#basic_ops_mirror_maker), a relatively simple tool within the Kafka project with some known limitations
7 | 2. [Confluent's Replicator](https://docs.confluent.io/current/multi-dc-replicator/index.html), a commercial tool from Confluent
8 | 3. Uber's open source [uReplicator](https://github.com/uber/uReplicator)
9 | 4. LinkedIn's [Brooklin](https://github.com/linkedin/brooklin)
10 |
11 | This test tool is indifferent to the underlying mirroring tool, so it is able to test all of the above-mentioned replicators.
12 |
13 | *The current implementation supports Uber's uReplicator and LinkedIn's Brooklin.*
14 |
15 | Presentation on this project: https://speakerdeck.com/rantav/infrastructure-testing-using-kubernetes
16 |
17 |
18 | ## High level design
19 |
20 | Mirroring typically takes place between two datacenters, as described below:
21 |
22 | ```
23 | ----------------------                          --------------------------------------
24 | |                    |                          |                                    |
25 | |     Source DC      |                          |           Destination DC           |
26 | |                    |                          |                                    |
27 | |  ----------------  |                          |  --------------  ----------------  |
28 | |  | Source Kafka |  |  - - - - - - - - - - ->  |  | Replicator |->| Target Kafka |  |
29 | |  ----------------  |                          |  --------------  ----------------  |
30 | |                    |                          |                                    |
31 | ----------------------                          --------------------------------------
32 | ```
33 |
34 | The test tool has the following goals in mind:
35 |
36 | 1. Correctness, mainly completeness - that all messages sent to the Source arrive, in order, at the Destination (per partition), with at-least-once semantics.
37 | 2. Performance - how long it takes for messages to get replicated and delivered to the Destination. This, of course, takes the laws of physics into consideration, e.g. inherent line latency.
38 |
39 | The test harness therefore comprises two components: the `producer` and the `consumer`.
40 |
41 | ### The producer
42 | The producer writes messages with sequence numbers and timestamps.
43 |
44 | ### The consumer
45 | The consumer reads messages and inspects the sequence numbers and timestamps to determine correctness and performance.
46 | This assumes the producer's and consumer's clocks are in sync (we don't require the precision of atomic clocks, but we do realize that out-of-sync clocks will influence accuracy).
47 |
48 | ## Lower level design
49 |
50 | The producer writes its messages to the source Kafka, adding its `producer-id`, `sequence`, `timestamp` and a `payload`.
51 | The producer is capable of throttling its output so that we achieve a predictable throughput.
52 |
53 | ```
54 | ----------------------                          --------------------------------------
55 | |                    |                          |                                    |
56 | |     Source DC      |                          |           Destination DC           |
57 | |                    |                          |                                    |
58 | |  ----------------  |                          |  --------------  ----------------  |
59 | |  | Source Kafka |  |  - - - - - - - - - - ->  |  | Replicator |->| Target Kafka |  |
60 | |  ----------------  |                          |  --------------  ----------------  |
61 | |        ↑           |                          |                        |           |
62 | |        |           |                          |                        |           |
63 | |        |           |                          |                        ↓           |
64 | |   ------------     |                          |                  ------------      |
65 | |   | producer |     |                          |                  | consumer |      |
66 | |   ------------     |                          |                  ------------      |
67 | ----------------------                          --------------------------------------
68 | ```
69 |
70 | ### Message format
71 |
72 | We aim for a simple, low-overhead message format utilizing Kafka's built-in header fields. Where headers are not supported (shamefully, the current reality with both uReplicator and Brooklin), we utilize an in-body message format.
73 |
74 | Message format:
75 | There are two variants of the message format, one that uses Kafka headers and one that does not.
76 | We implement both because, while headers are nicer and easier to use, neither uReplicator nor Brooklin currently supports them.
77 |
78 | Message format with headers (for simplicity we show JSON, but of course in Kafka it's all binary):
79 | ```
80 | {
81 |   value: payload,          // Payload size is determined by the user.
82 |   timestamp: produceTime,  // The producer embeds a timestamp in UTC
83 |   headers: {
84 |     id: producer-id,
85 |     seq: sequence-number
86 |   }
87 | }
88 | ```
89 |
90 | Message format without headers (encoded in the message body itself):
91 | ```
92 | +-------------------------------------------------+
93 | | producer-id;sequence-number;timestamp;payload...|
94 | +-------------------------------------------------+
95 | ```
96 |
97 | We add the `producer-id` so that we can run producers on multiple hosts and still be able to verify that all messages arrived.
98 |
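To make the in-body variant concrete, here is a minimal sketch of how such a headerless message could be encoded and decoded in Go. It is an illustration only, not the project's actual `lib/message` implementation: the `Message` type, the helper names and the millisecond timestamp resolution are assumptions made for this example; only the `producer-id;sequence-number;timestamp;payload` field order comes from the format above.

```go
package message

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// Message carries the fields described above: a producer id, a per-producer
// sequence number, the produce time and an opaque payload.
type Message struct {
	ProducerID string
	Sequence   uint64
	Timestamp  time.Time
	Payload    []byte
}

// Encode serializes m as "producer-id;sequence-number;timestamp;payload...".
// The timestamp is written as milliseconds since the Unix epoch (an assumed
// resolution for this sketch).
func Encode(m Message) []byte {
	header := fmt.Sprintf("%s;%d;%d;", m.ProducerID, m.Sequence,
		m.Timestamp.UnixNano()/int64(time.Millisecond))
	return append([]byte(header), m.Payload...)
}

// Decode parses a message produced by Encode. It splits on the first three
// semicolons only, so the payload itself may contain semicolons.
func Decode(b []byte) (Message, error) {
	parts := strings.SplitN(string(b), ";", 4)
	if len(parts) != 4 {
		return Message{}, fmt.Errorf("malformed message: %q", b)
	}
	seq, err := strconv.ParseUint(parts[1], 10, 64)
	if err != nil {
		return Message{}, err
	}
	ms, err := strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return Message{}, err
	}
	return Message{
		ProducerID: parts[0],
		Sequence:   seq,
		Timestamp:  time.Unix(0, ms*int64(time.Millisecond)),
		Payload:    []byte(parts[3]),
	}, nil
}
```

With headers enabled, the same three metadata fields would simply travel as Kafka record headers and the value would hold only the payload.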
99 | ### Producer
100 |
101 | Command line arguments:
102 |
103 | `--id`: Producer ID. May be the hostname, etc.
104 |
105 | `--topics`: List of topic names to write to, separated by commas
106 |
107 | `--throughput`: Number of messages per second per topic
108 |
109 | `--message-size`: Message size, including the header section (producer-id;sequence-number;timestamp;). The minimal message size is therefore around 30 bytes, due to the typical header length
110 |
111 | `--bootstrap-server`: A Kafka server from which to bootstrap
112 |
113 | `--use-message-headers`: Whether to use message headers to encode metadata (or encode it within the payload)
114 |
115 | The producer generates messages containing the header, padding them with payload as needed in order to reach the `message-size`, and sends them to Kafka.
116 | It tries to achieve the desired throughput (sending in batches and in parallel) but will not exceed it. If it is unable to achieve the desired throughput, it emits a log warning and continues. We also keep an eye on this using Prometheus and Grafana in the bigger picture.
117 | The throughput is measured as the number of messages / second / topic.
118 |
119 | ### Consumer
120 |
121 | Command line arguments:
122 |
123 | `--topics`: List of topic names to read from, separated by commas
124 |
125 | `--bootstrap-server`: A Kafka server from which to bootstrap
126 |
127 | `--use-message-headers`: Whether to use message headers to encode metadata (or encode it within the payload)
128 |
129 | The consumer reads the messages from each of the topics and calculates correctness and performance.
130 |
131 | Correctness is determined by the combination of `topic`, `producer-id` and `sequence-number` (e.g. if a specific producer has gaps, that means we're missing messages).
132 |
133 | There is a fine point to mention in that respect. When operating with multiple partitions we utilize Kafka's message `key` in order to ensure message routing correctness. When multiple consumers read (naturally, from multiple partitions), we want each consumer to be able to read *all* sequential messages *in the order* they were sent. To achieve that we use Kafka's message routing abilities, such that messages with the same key are always routed to the same partition. What matters is the number of partitions in the destination cluster. To achieve linearity we key each message by its sequence number modulo the number of partitions in the destination cluster. This way, ascending sequence numbers that share a key are sent to the same partition in the same order, and clients are then able to easily verify that all messages arrived in the order they were sent. A sketch of this keying and gap-check logic appears below.
134 |
135 |
136 | Latency is determined by the time gap between the `timestamp` and the current local consumer time. The consumer then emits a histogram of latency buckets.
137 |
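The keying and gap-detection logic described above is the part that is easiest to get wrong, so here is a minimal sketch of it in Go. It is illustrative only, not the project's actual `lib/consumer` code: the names (`MessageKey`, `checker`) are hypothetical, and it assumes the producer keys every message by its sequence number modulo the destination partition count, so that within one key the sequence is expected to advance by exactly that count.

```go
package consumer

import "strconv"

// MessageKey is what the producer would use as the Kafka message key: the
// sequence number modulo the number of partitions in the destination cluster.
// All messages sharing a remainder are routed to the same destination
// partition, in order, which is what lets the consumer verify ordering.
func MessageKey(seq, destinationPartitions uint64) string {
	return strconv.FormatUint(seq%destinationPartitions, 10)
}

// checker verifies, per (topic, producer-id, key), that sequence numbers
// advance by exactly destinationPartitions, i.e. that nothing was lost or
// reordered within a partition.
type checker struct {
	destinationPartitions uint64
	next                  map[string]uint64 // next expected sequence per topic/producer/key
	missing               uint64            // total sequence numbers never seen
}

func newChecker(destinationPartitions uint64) *checker {
	return &checker{
		destinationPartitions: destinationPartitions,
		next:                  make(map[string]uint64),
	}
}

// observe records one consumed message and returns how many sequence numbers
// (for the same topic, producer and key) appear to have been skipped.
func (c *checker) observe(topic, producerID string, seq uint64) uint64 {
	k := topic + "/" + producerID + "/" + MessageKey(seq, c.destinationPartitions)
	expected, seen := c.next[k]
	if !seen {
		// First message for this topic/producer/key combination.
		c.next[k] = seq + c.destinationPartitions
		return 0
	}
	if seq < expected {
		// An at-least-once duplicate or an old retransmission; not a gap.
		return 0
	}
	skipped := (seq - expected) / c.destinationPartitions
	c.missing += skipped
	c.next[k] = seq + c.destinationPartitions
	return skipped
}
```

A consumer would call `observe` for every record it reads and alert (or increment a Prometheus counter) whenever the returned gap is non-zero.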
138 | ## Open for discussion
139 |
140 | 1. If the last message from a producer got lost, we don't know about it. If all messages from a specific producer got lost, we won't know about it either (although it's possible to audit that manually). If a certain partition is not replicated, we can only detect it by means of traffic-volume monitoring, not precise counts.
141 |
142 | # Using it
143 | The tools in this project assume some familiarity with third-party tools, namely Kubernetes and AWS. Expert-level knowledge is not expected, but some familiarity with these tools is very helpful.
144 |
145 | For details on how to run it, see [Running it](running.md).
146 |
147 | # Results
148 |
149 | We have [benchmark results for uReplicator](results-ureplicator.md) and [benchmark results for Brooklin](results-brooklin.md).
150 |
--------------------------------------------------------------------------------
/build-ureplicator/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM openjdk:8-jre
2 |
3 | ADD https://github.com/kelseyhightower/confd/releases/download/v0.15.0/confd-0.15.0-linux-amd64 /usr/local/bin/confd
4 |
5 | ADD https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar /jmx_prometheus_javaagent-0.3.1.jar
6 |
7 | COPY tmp/uReplicator-master/uReplicator-Distribution/target/uReplicator-Distribution-pkg /uReplicator
8 |
9 | COPY tmp/uReplicator-master/config uReplicator/config
10 |
11 | COPY confd /etc/confd
12 |
13 | COPY entrypoint.sh /entrypoint.sh
14 | RUN chmod +x /entrypoint.sh && \
15 |     chmod +x /usr/local/bin/confd && \
16 |     chmod +x /uReplicator/bin/*.sh
17 |
18 | ENV JAVA_OPTS "${JAVA_OPTS} -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1"
19 |
20 | ENTRYPOINT [ "/entrypoint.sh" ]
--------------------------------------------------------------------------------
/build-ureplicator/Makefile:
--------------------------------------------------------------------------------
1 | REV := `git rev-parse --short HEAD`
2 |
3 |
4 | ####################
5 | # uReplicator docker
6 | ####################
7 | U_WORK_DIR := tmp
8 | U_BIN := ureplicator
9 | U_IMAGE := rantav/$(U_BIN)
10 |
11 | release: clean build deploy clean
12 |
13 | build:
14 | 	mkdir -p $(U_WORK_DIR)
15 | 	curl -sL https://github.com/uber/uReplicator/archive/master.tar.gz | tar xz -C $(U_WORK_DIR)
16 | 	cd $(U_WORK_DIR)/uReplicator-master && mvn package -DskipTests
17 | 	chmod u+x $(U_WORK_DIR)/uReplicator-master/bin/pkg/*.sh
18 |
19 | image:
20 | 	docker build -t $(U_IMAGE):$(REV) .
21 | 22 | deploy: image 23 | docker push $(U_IMAGE):$(REV) 24 | 25 | clean: 26 | @/bin/rm -rf $(U_WORK_DIR) 27 | -------------------------------------------------------------------------------- /build-ureplicator/confd/CONFD.md: -------------------------------------------------------------------------------- 1 | ## ignore me 2 | Generate confd 3 | ``` 4 | ls -al | awk '{print$9}' | while read line ; do echo "src =" '"'${line}'"' ; echo "dest =" '"/uReplicator/config/'${line}'.tmpl"' ; echo ; done 5 | {{ getenv "HOSTNAME" }} 6 | ``` 7 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/consumer.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "consumer.properties.tmpl" 3 | dest = "/uReplicator/config/consumer.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/helix.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "helix.properties.tmpl" 3 | dest = "/uReplicator/config/helix.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/producer.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "producer.properties.tmpl" 3 | dest = "/uReplicator/config/producer.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/conf.d/zookeeper.properties.toml: -------------------------------------------------------------------------------- 1 | [template] 2 | src = "zookeeper.properties.tmpl" 3 | dest = "/uReplicator/config/zookeeper.properties" 4 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/consumer.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | # see kafka.consumer.ConsumerConfig for more details 16 | 17 | # Zookeeper connection string 18 | # comma separated host:port pairs, each corresponding to a zk 19 | # server. e.g. 
"127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002" 20 | zookeeper.connect={{ getenv "SRC_ZK_CONNECT" }} 21 | 22 | # timeout in ms for connecting to zookeeper 23 | zookeeper.connection.timeout.ms=30000 24 | zookeeper.session.timeout.ms=30000 25 | 26 | #consumer group id 27 | group.id={{ getenv "CONSUMER_GROUP_ID" }} 28 | 29 | consumer.id={{ getenv "HOSTNAME" }} 30 | partition.assignment.strategy=roundrobin 31 | socket.receive.buffer.bytes={{ getenv "SOCKET_RECEIVE_BUFFER_BYTES" }} 32 | fetch.message.max.bytes={{ getenv "FETCH_MESSAGE_MAX_BYTES" }} 33 | queued.max.message.chunks=5 34 | 35 | #consumer timeout 36 | #consumer.timeout.ms=5000 37 | 38 | auto.offset.reset=smallest 39 | num.consumer.fetchers={{ getenv "NUM_CONSUMER_FETCHERS" }} 40 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/helix.properties.tmpl: -------------------------------------------------------------------------------- 1 | zkServer={{ getenv "HELIX_ZK_CONNECT" }} 2 | instanceId={{ getenv "HOSTNAME" }} 3 | helixClusterName={{ getenv "HELIX_CLUSTER_NAME" }} -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/log4j.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | 16 | log4j.rootLogger=INFO, stdout 17 | 18 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 19 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 20 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 21 | 22 | log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender 23 | log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH 24 | log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log 25 | log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout 26 | log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 27 | 28 | log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender 29 | log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH 30 | log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log 31 | log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout 32 | log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 33 | 34 | log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender 35 | log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH 36 | log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log 37 | log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout 38 | log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 39 | 40 | log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender 41 | log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH 42 | log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log 43 | log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout 44 | log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 45 | 46 | log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender 47 | log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH 48 | log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log 49 | log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout 50 | log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 51 | 52 | # Turn on all our debugging info 53 | #log4j.logger.kafka.producer.async.DefaultEventHandler=DEBUG, kafkaAppender 54 | #log4j.logger.kafka.client.ClientUtils=DEBUG, kafkaAppender 55 | #log4j.logger.kafka.perf=DEBUG, kafkaAppender 56 | #log4j.logger.kafka.perf.ProducerPerformance$ProducerThread=DEBUG, kafkaAppender 57 | #log4j.logger.org.I0Itec.zkclient.ZkClient=DEBUG 58 | log4j.logger.kafka=INFO, kafkaAppender 59 | 60 | log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender 61 | log4j.additivity.kafka.network.RequestChannel$=false 62 | 63 | #log4j.logger.kafka.network.Processor=TRACE, requestAppender 64 | #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender 65 | #log4j.additivity.kafka.server.KafkaApis=false 66 | log4j.logger.kafka.request.logger=WARN, requestAppender 67 | log4j.additivity.kafka.request.logger=false 68 | 69 | log4j.logger.kafka.controller=TRACE, controllerAppender 70 | log4j.additivity.kafka.controller=false 71 | 72 | log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender 73 | log4j.additivity.kafka.log.LogCleaner=false 74 | 75 | log4j.logger.state.change.logger=TRACE, stateChangeAppender 76 | log4j.additivity.state.change.logger=false 77 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/producer.properties.tmpl: 
-------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | # see kafka.producer.ProducerConfig for more details 16 | 17 | ############################# Producer Basics ############################# 18 | 19 | # list of brokers used for bootstrapping knowledge about the rest of the cluster 20 | # format: host1:port1,host2:port2 ... 21 | bootstrap.servers={{ getenv "DST_BOOTSTRAP_SERVERS" }} 22 | client.id={{ getenv "CONSUMER_GROUP_ID" }} 23 | 24 | # name of the partitioner class for partitioning events; default partition spreads data randomly 25 | #partitioner.class= 26 | 27 | # specifies whether the messages are sent asynchronously (async) or synchronously (sync) 28 | # NOT SUPPORTED 29 | #producer.type=async 30 | 31 | # specify the compression codec for all data generated: none, gzip, snappy, lz4. 32 | # the old config values work as well: 0, 1, 2, 3 for none, gzip, snappy, lz4, respectively 33 | # NOT SUPPORTED 34 | #compression.codec=snappy 35 | # CORRECT PROPERT?Y 36 | compression.type={{ getenv "PROD_COMPRESSION_TYPE" }} 37 | 38 | # message encoder 39 | # NOT SUPPORTED 40 | # serializer.class=kafka.serializer.DefaultEncoder 41 | key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer 42 | value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer 43 | 44 | # allow topic level compression 45 | #compressed.topics= 46 | 47 | # Alias for queue.buffering.max.ms: Delay in milliseconds to wait for messages in the producer queue to accumulate before constructing message batches (MessageSets) to transmit to brokers. A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. 
48 | linger.ms={{ getenv "PROD_LINGER_MS" }} 49 | 50 | ############################# Async Producer ############################# 51 | # maximum time, in milliseconds, for buffering data on the producer queue 52 | # NOT SUPPORTED 53 | #queue.buffering.max.ms={{ getenv "PROD_QUEUE_BUFFERING_MAX_MS" }} 54 | 55 | # the maximum size of the blocking queue for buffering on the producer 56 | # NOT SUPPORTED 57 | #queue.buffering.max.messages={{ getenv "PROD_QUEUE_BUFFERING_MAX_MESSAGES" }} 58 | 59 | # Timeout for event enqueue: 60 | # 0: events will be enqueued immediately or dropped if the queue is full 61 | # -ve: enqueue will block indefinitely if the queue is full 62 | # +ve: enqueue will block up to this many milliseconds if the queue is full 63 | #queue.enqueue.timeout.ms= 64 | 65 | # the number of messages batched at the producer 66 | # NOT SUPPORTED 67 | #batch.num.messages={{ getenv "PROD_BATCH_NUM_MESSAGES" }} 68 | 69 | send.buffer.bytes={{ getenv "PROD_SEND_BUFFER_BYTES" }} 70 | 71 | # The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled). 72 | max.in.flight.requests.per.connection={{ getenv "PROD_MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION" }} 73 | 74 | max.request.size={{ getenv "PROD_MAX_REQUEST_SIZE" }} 75 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/test-log4j.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | 16 | log4j.rootLogger=INFO, stdout 17 | 18 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 19 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 20 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 21 | 22 | log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender 23 | log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH 24 | log4j.appender.kafkaAppender.File=logs/server.log 25 | log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout 26 | log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 27 | 28 | log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender 29 | log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH 30 | log4j.appender.stateChangeAppender.File=logs/state-change.log 31 | log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout 32 | log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 33 | 34 | log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender 35 | log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH 36 | log4j.appender.requestAppender.File=logs/kafka-request.log 37 | log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout 38 | log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 39 | 40 | log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender 41 | log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH 42 | log4j.appender.controllerAppender.File=logs/controller.log 43 | log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout 44 | log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 45 | 46 | # Turn on all our debugging info 47 | #log4j.logger.kafka.producer.async.DefaultEventHandler=DEBUG, kafkaAppender 48 | #log4j.logger.kafka.client.ClientUtils=DEBUG, kafkaAppender 49 | log4j.logger.kafka.tools=DEBUG, kafkaAppender 50 | log4j.logger.kafka.tools.ProducerPerformance$ProducerThread=DEBUG, kafkaAppender 51 | #log4j.logger.org.I0Itec.zkclient.ZkClient=DEBUG 52 | log4j.logger.kafka=INFO, kafkaAppender 53 | 54 | log4j.logger.kafka.network.RequestChannel$=TRACE, requestAppender 55 | log4j.additivity.kafka.network.RequestChannel$=false 56 | 57 | #log4j.logger.kafka.network.Processor=TRACE, requestAppender 58 | #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender 59 | #log4j.additivity.kafka.server.KafkaApis=false 60 | log4j.logger.kafka.request.logger=TRACE, requestAppender 61 | log4j.additivity.kafka.request.logger=false 62 | 63 | log4j.logger.kafka.controller=TRACE, controllerAppender 64 | log4j.additivity.kafka.controller=false 65 | 66 | log4j.logger.state.change.logger=TRACE, stateChangeAppender 67 | log4j.additivity.state.change.logger=false 68 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/tools-log4j.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. 
You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | log4j.rootLogger=INFO, stdout 17 | 18 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 19 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 20 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 21 | -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/topicmapping.properties.tmpl: -------------------------------------------------------------------------------- 1 | dummyTopic dummyTopic1 -------------------------------------------------------------------------------- /build-ureplicator/confd/templates/zookeeper.properties.tmpl: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | # the directory where the snapshot is stored. 
17 | dataDir=/tmp/zookeeper 18 | # the port at which the clients will connect 19 | clientPort=2181 20 | # disable the per-ip limit on the number of connections since this is a non-production config 21 | maxClientCnxns=0 22 | -------------------------------------------------------------------------------- /build-ureplicator/entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash -ex 2 | 3 | if [[ "${LOGICAL_PROCESSORS}" == "" ]]; then 4 | LOGICAL_PROCESSORS=`getconf _NPROCESSORS_ONLN` 5 | fi 6 | 7 | export JAVA_OPTS="${JAVA_OPTS} -XX:ParallelGCThreads=${LOGICAL_PROCESSORS}" 8 | 9 | 10 | confd -onetime -backend env 11 | 12 | cd /uReplicator/bin/ 13 | 14 | if [ "${SERVICE_TYPE}" == "controller" ] ; then 15 | ./start-controller.sh \ 16 | -port 9000 \ 17 | -zookeeper "${HELIX_ZK_CONNECT}" \ 18 | -helixClusterName "${HELIX_CLUSTER_NAME}" \ 19 | -backUpToGit false \ 20 | -autoRebalanceDelayInSeconds 120 \ 21 | -localBackupFilePath /tmp/uReplicator-controller \ 22 | -enableAutoWhitelist true \ 23 | -enableAutoTopicExpansion true \ 24 | -srcKafkaZkPath "${SRC_ZK_CONNECT}" \ 25 | -destKafkaZkPath "${DST_ZK_CONNECT}" \ 26 | -initWaitTimeInSeconds 10 \ 27 | -refreshTimeInSeconds 20 \ 28 | -graphiteHost "${GRAPHITE_HOST}" \ 29 | -graphitePort "${GRAPHITE_PORT}" \ 30 | -env "${HELIX_ENV}" 31 | 32 | until [[ "OK" == "$(curl --silent http://localhost:9000/health)" ]]; do 33 | echo waiting 34 | sleep 1 35 | done 36 | 37 | TOPIC_LIST=( $(echo ${TOPICS} | sed "s/,/ /g") ) 38 | PARTITION_LIST=( $(echo ${PARTITIONS} | sed "s/,/ /g") ) 39 | 40 | for index in ${!TOPIC_LIST[*]}; do 41 | TOPIC="${TOPIC_LIST[$index]}" 42 | PARTITION="${PARTITION_LIST[$index]}" 43 | 44 | echo "Topic: ${TOPIC}, Partitions: ${PARTITION}" 45 | 46 | curl -X POST -d "{\"topic\": \"${TOPIC}\", \"numPartitions\": \"${PARTITION}\"}" http://localhost:9000/topics || true 47 | done 48 | 49 | elif [ "${SERVICE_TYPE}" == "worker" ] ; then 50 | 51 | WORKER_ABORT_ON_SEND_FAILURE="${WORKER_ABORT_ON_SEND_FAILURE:=false}" 52 | 53 | ./start-worker.sh \ 54 | --helix.config /uReplicator/config/helix.properties \ 55 | --consumer.config /uReplicator/config/consumer.properties \ 56 | --producer.config /uReplicator/config/producer.properties \ 57 | --abort.on.send.failure="${WORKER_ABORT_ON_SEND_FAILURE}" 58 | 59 | fi -------------------------------------------------------------------------------- /doc/media/brookin-packet-loss-kafka-source-1min-3brokers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookin-packet-loss-kafka-source-1min-3brokers.png -------------------------------------------------------------------------------- /doc/media/brookin-packet-loss-kafka-source-1min.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookin-packet-loss-kafka-source-1min.png -------------------------------------------------------------------------------- /doc/media/brookin-packet-loss-kafka-source.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookin-packet-loss-kafka-source.png -------------------------------------------------------------------------------- 
/doc/media/brooklin-add-partitions-take1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-add-partitions-take1.png -------------------------------------------------------------------------------- /doc/media/brooklin-add-partitions-take2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-add-partitions-take2.png -------------------------------------------------------------------------------- /doc/media/brooklin-adding-new-worker.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-adding-new-worker.png -------------------------------------------------------------------------------- /doc/media/brooklin-downsize-destination-cluster-100mb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-downsize-destination-cluster-100mb.png -------------------------------------------------------------------------------- /doc/media/brooklin-downsize-destination-cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-downsize-destination-cluster.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock1.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take2-aftershock2.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take2.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take3.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take4.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take4.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod-take5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod-take5.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-destination-pod.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-destination-pod.png -------------------------------------------------------------------------------- /doc/media/brooklin-kill-kafka-source-pod.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-kill-kafka-source-pod.png -------------------------------------------------------------------------------- /doc/media/brooklin-new-topic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-new-topic.png -------------------------------------------------------------------------------- /doc/media/brooklin-packat-loss-100mb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-packat-loss-100mb.png -------------------------------------------------------------------------------- /doc/media/brooklin-packet-loss.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-packet-loss.png -------------------------------------------------------------------------------- /doc/media/brooklin-reduce-worker-pool-to-31.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-reduce-worker-pool-to-31.png -------------------------------------------------------------------------------- /doc/media/brooklin-remove-more-workers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-remove-more-workers.png -------------------------------------------------------------------------------- /doc/media/brooklin-removing-more-workers-latency.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-removing-more-workers-latency.png -------------------------------------------------------------------------------- /doc/media/brooklin-resize-kafka-source.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-resize-kafka-source.png -------------------------------------------------------------------------------- /doc/media/brooklin-scale-down-and-up-100mb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-scale-down-and-up-100mb.png -------------------------------------------------------------------------------- /doc/media/brooklin-scale-down-and-up.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brooklin-scale-down-and-up.png -------------------------------------------------------------------------------- /doc/media/brookling-killl-kafka-pod-take2-production-error-rate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/brookling-killl-kafka-pod-take2-production-error-rate.png -------------------------------------------------------------------------------- /doc/media/downsize-destination-cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/downsize-destination-cluster.png -------------------------------------------------------------------------------- /doc/media/kill-kafka-source-pod.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/kill-kafka-source-pod.png -------------------------------------------------------------------------------- /doc/media/kill-pod-destination.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/kill-pod-destination.png -------------------------------------------------------------------------------- /doc/media/new-topic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/new-topic.png -------------------------------------------------------------------------------- /doc/media/packet-loss-on-source-cluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/packet-loss-on-source-cluster.png -------------------------------------------------------------------------------- /doc/media/packet-loss-on-workers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/packet-loss-on-workers.png -------------------------------------------------------------------------------- /doc/media/remove-worker.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/AppsFlyer/kafka-mirror-tester/ce3ed8f8f0f9ac672921bb9b69bf89ca556b1afe/doc/media/remove-worker.png -------------------------------------------------------------------------------- /go.mk: -------------------------------------------------------------------------------- 1 | LOCAL_IP := `ifconfig | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1' | head -1` 2 | 3 | ########################### 4 | # Build, test, run 5 | ########################### 6 | go-setup: 7 | @echo For mac: brew install librdkafka 8 | @echo For linux install librdkafka-dev 9 | 10 | go-build: go-generate go-test 11 | go build ./... 12 | 13 | go-run-producer: 14 | # Check out http://localhost:8001/metrics 15 | go run main.go produce --bootstrap-servers localhost:9093 --id $$(hostname) --message-size 100 --throughput 10 --topics topic1,topic2 --use-message-headers 16 | 17 | go-run-consumer: 18 | # Check out http://localhost:8000/metrics 19 | go run main.go consume --bootstrap-servers localhost:9093 --consumer-group group-4 --topics topic1,topic2 --use-message-headers 20 | 21 | go-test: 22 | go test ./... 23 | 24 | go-generate: 25 | go generate ./... 26 | 27 | ######################### 28 | # Docker 29 | ######################### 30 | go-docker-build: go-test 31 | docker build . -t rantav/kafka-mirror-tester:latest 32 | 33 | go-docker-push: go-docker-build 34 | # push to dockerhub 35 | docker push rantav/kafka-mirror-tester 36 | 37 | go-docker-run-consumer: 38 | # Check out http://localhost:8000/metrics 39 | docker run -p 8000:8000 rantav/kafka-mirror-tester consume --bootstrap-servers $(LOCAL_IP):9093 --consumer-group group-4 --topics topic1,topic2 40 | 41 | go-docker-run-producer: 42 | # Check out http://localhost:8001/metrics 43 | docker run rantav/kafka-mirror-tester produce --bootstrap-servers $(LOCAL_IP):9093 --id $$(hostname) --message-size 100 --throughput 10 --topics topic1,topic2 44 | 45 | go-release: go-docker-push 46 | -------------------------------------------------------------------------------- /go.mod: -------------------------------------------------------------------------------- 1 | module github.com/appsflyer/kafka-mirror-tester 2 | 3 | go 1.13 4 | 5 | require ( 6 | github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973 7 | github.com/confluentinc/confluent-kafka-go v0.11.6 8 | github.com/davecgh/go-spew v1.1.1 9 | github.com/deckarep/golang-set v1.7.1 10 | github.com/dustin/go-humanize v1.0.0 11 | github.com/golang/protobuf v1.2.0 12 | github.com/inconshreveable/mousetrap v1.0.0 13 | github.com/jamiealquiza/tachymeter v1.1.2 14 | github.com/konsorten/go-windows-terminal-sequences v1.0.1 15 | github.com/matttproud/golang_protobuf_extensions v1.0.1 16 | github.com/paulbellamy/ratecounter v0.2.0 17 | github.com/pkg/errors v0.8.0 18 | github.com/pmezard/go-difflib v1.0.0 19 | github.com/prometheus/client_golang v0.9.2 20 | github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910 21 | github.com/prometheus/common v0.1.0 22 | github.com/prometheus/procfs v0.0.0-20190104112138-b1a0a9a36d74 23 | github.com/sirupsen/logrus v1.2.0 24 | github.com/spf13/cobra v0.0.3 25 | github.com/spf13/pflag v1.0.3 26 | github.com/stretchr/testify v1.2.2 27 | golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9 28 | golang.org/x/sys v0.0.0-20181213200352-4d1cda033e06 29 | golang.org/x/time v0.0.0-20181108054448-85acf8d2951c 30 | ) 31 | 
-------------------------------------------------------------------------------- /go.sum: -------------------------------------------------------------------------------- 1 | github.com/alecthomas/template v0.0.0-20160405071501-a0175ee3bccc/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc= 2 | github.com/alecthomas/units v0.0.0-20151022065526-2efee857e7cf/go.mod h1:ybxpYRFXyAe+OPACYpWeL0wqObRcbAqCMya13uyzqw0= 3 | github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973 h1:xJ4a3vCFaGF/jqvzLMYoU8P317H5OQ+Via4RmuPwCS0= 4 | github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q= 5 | github.com/confluentinc/confluent-kafka-go v0.11.6 h1:rEblubnNXCjRThwAGnFSzLKYIRAoXLDC3A9r4ciziHU= 6 | github.com/confluentinc/confluent-kafka-go v0.11.6/go.mod h1:u2zNLny2xq+5rWeTQjFHbDzzNuba4P1vo31r9r4uAdg= 7 | github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= 8 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 9 | github.com/deckarep/golang-set v1.7.1 h1:SCQV0S6gTtp6itiFrTqI+pfmJ4LN85S1YzhDf9rTHJQ= 10 | github.com/deckarep/golang-set v1.7.1/go.mod h1:93vsz/8Wt4joVM7c2AVqh+YRMiUSc14yDtF28KmMOgQ= 11 | github.com/dustin/go-humanize v1.0.0 h1:VSnTsYCnlFHaM2/igO1h6X3HA71jcobQuxemgkq4zYo= 12 | github.com/dustin/go-humanize v1.0.0/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk= 13 | github.com/go-kit/kit v0.8.0/go.mod h1:xBxKIO96dXMWWy0MnWVtmwkA9/13aqxPnvrjFYMA2as= 14 | github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9GBnD5lWE= 15 | github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY= 16 | github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ= 17 | github.com/golang/protobuf v1.2.0 h1:P3YflyNX/ehuJFLhxviNdFxQPkGK5cDcApsge1SqnvM= 18 | github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= 19 | github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8= 20 | github.com/jamiealquiza/tachymeter v1.1.2 h1:cOgpMYFejxGSAe5f5JOb7uNPZ53kmEYwwpCrw1vDh2Q= 21 | github.com/jamiealquiza/tachymeter v1.1.2/go.mod h1:Ayf6zPZKEnLsc3winWEXJRkTBhdHo58HODAu1oFJkYU= 22 | github.com/julienschmidt/httprouter v1.2.0/go.mod h1:SYymIcj16QtmaHHD7aYtjjsJG7VTCxuUUipMqKk8s4w= 23 | github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ= 24 | github.com/kr/logfmt v0.0.0-20140226030751-b84e30acd515/go.mod h1:+0opPa2QZZtGFBFZlji/RkVcI2GknAs/DXo4wKdlNEc= 25 | github.com/matttproud/golang_protobuf_extensions v1.0.1 h1:4hp9jkHxhMHkqkrB3Ix0jegS5sx/RkqARlsWZ6pIwiU= 26 | github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0= 27 | github.com/mwitkow/go-conntrack v0.0.0-20161129095857-cc309e4a2223/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U= 28 | github.com/paulbellamy/ratecounter v0.2.0 h1:2L/RhJq+HA8gBQImDXtLPrDXK5qAj6ozWVK/zFXVJGs= 29 | github.com/paulbellamy/ratecounter v0.2.0/go.mod h1:Hfx1hDpSGoqxkVVpBi/IlYD7kChlfo5C6hzIHwPqfFE= 30 | github.com/pkg/errors v0.8.0 h1:WdK/asTD0HN+q6hsWO3/vpuAkAr+tw6aNJNDFFf0+qw= 31 | github.com/pkg/errors v0.8.0/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= 32 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= 33 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 34 | github.com/prometheus/client_golang 
v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXPKyh/dDVn+NZz0KFw= 35 | github.com/prometheus/client_golang v0.9.2 h1:awm861/B8OKDd2I/6o1dy3ra4BamzKhYOiGItCeZ740= 36 | github.com/prometheus/client_golang v0.9.2/go.mod h1:OsXs2jCmiKlQ1lTBmv21f2mNfw4xf/QclQDMrYNZzcM= 37 | github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910 h1:idejC8f05m9MGOsuEi1ATq9shN03HrxNkD/luQvxCv8= 38 | github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910/go.mod h1:MbSGuTsp3dbXC40dX6PRTWyKYBIrTGTE9sqQNg2J8bo= 39 | github.com/prometheus/common v0.0.0-20181126121408-4724e9255275/go.mod h1:daVV7qP5qjZbuso7PdcryaAu0sAZbrN9i7WWcTMWvro= 40 | github.com/prometheus/common v0.1.0 h1:IxU7wGikQPAcoOd3/f4Ol7+vIKS1Sgu08tzjktR4nJE= 41 | github.com/prometheus/common v0.1.0/go.mod h1:TNfzLD0ON7rHzMJeJkieUDPYmFC7Snx/y86RQel1bk4= 42 | github.com/prometheus/procfs v0.0.0-20181005140218-185b4288413d/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= 43 | github.com/prometheus/procfs v0.0.0-20181204211112-1dc9a6cbc91a/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= 44 | github.com/prometheus/procfs v0.0.0-20190104112138-b1a0a9a36d74 h1:d1Xoc24yp/pXmWl2leBiBA+Tptce6cQsA+MMx/nOOcY= 45 | github.com/prometheus/procfs v0.0.0-20190104112138-b1a0a9a36d74/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= 46 | github.com/sirupsen/logrus v1.2.0 h1:juTguoYk5qI21pwyTXY3B3Y5cOTH3ZUyZCg1v/mihuo= 47 | github.com/sirupsen/logrus v1.2.0/go.mod h1:LxeOpSwHxABJmUn/MG1IvRgCAasNZTLOkJPxbbu5VWo= 48 | github.com/spf13/cobra v0.0.3 h1:ZlrZ4XsMRm04Fr5pSFxBgfND2EBVa1nLpiy1stUsX/8= 49 | github.com/spf13/cobra v0.0.3/go.mod h1:1l0Ry5zgKvJasoi3XT1TypsSe7PqH0Sj9dhYf7v3XqQ= 50 | github.com/spf13/pflag v1.0.3 h1:zPAT6CGy6wXeQ7NtTnaTerfKOsV6V6F8agHXFiazDkg= 51 | github.com/spf13/pflag v1.0.3/go.mod h1:DYY7MBk1bdzusC3SYhjObp+wFpr4gzcvqqNjLnInEg4= 52 | github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 53 | github.com/stretchr/testify v1.2.2 h1:bSDNvY7ZPG5RlJ8otE/7V6gMiyenm9RtJ7IUVIAoJ1w= 54 | github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs= 55 | golang.org/x/crypto v0.0.0-20180904163835-0709b304e793/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4= 56 | golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9 h1:mKdxBk7AujPs8kU4m80U72y/zjbZ3UcXC7dClwKbUI0= 57 | golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4= 58 | golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= 59 | golang.org/x/net v0.0.0-20181201002055-351d144fa1fc/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= 60 | golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= 61 | golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 62 | golang.org/x/sys v0.0.0-20181116152217-5ac8a444bdc5/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 63 | golang.org/x/sys v0.0.0-20181213200352-4d1cda033e06 h1:0oC8rFnE+74kEmuHZ46F6KHsMr5Gx2gUQPuNz28iQZM= 64 | golang.org/x/sys v0.0.0-20181213200352-4d1cda033e06/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 65 | golang.org/x/time v0.0.0-20181108054448-85acf8d2951c h1:fqgJT0MGcGpPgpWU7VRdRjuArfcOvC4AoJmILihzhDg= 66 | golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= 67 | gopkg.in/alecthomas/kingpin.v2 v2.2.6/go.mod h1:FMv+mEhP44yOT+4EoQTLFTRgOQ1FBLkstjWtayDeSgw= 68 | 
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 69 | gopkg.in/yaml.v2 v2.2.1/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 70 | -------------------------------------------------------------------------------- /k8s/brooklin/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: brooklin 5 | -------------------------------------------------------------------------------- /k8s/brooklin/20zookeeper.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: brooklin 5 | name: zk-hs 6 | labels: 7 | app: zk 8 | spec: 9 | ports: 10 | - port: 2888 11 | name: server 12 | - port: 3888 13 | name: leader-election 14 | clusterIP: None 15 | selector: 16 | app: zk 17 | --- 18 | apiVersion: v1 19 | kind: Service 20 | metadata: 21 | namespace: brooklin 22 | name: zookeeper 23 | labels: 24 | app: zk 25 | spec: 26 | ports: 27 | - port: 2181 28 | name: client 29 | selector: 30 | app: zk 31 | --- 32 | apiVersion: policy/v1beta1 33 | kind: PodDisruptionBudget 34 | metadata: 35 | namespace: brooklin 36 | name: zk-pdb 37 | spec: 38 | selector: 39 | matchLabels: 40 | app: zk 41 | maxUnavailable: 1 42 | --- 43 | apiVersion: apps/v1beta1 44 | kind: StatefulSet 45 | metadata: 46 | namespace: brooklin 47 | name: zk 48 | spec: 49 | selector: 50 | matchLabels: 51 | app: zk 52 | serviceName: zk-hs 53 | replicas: 1 54 | updateStrategy: 55 | type: RollingUpdate 56 | podManagementPolicy: Parallel 57 | template: 58 | metadata: 59 | labels: 60 | app: zk 61 | spec: 62 | affinity: 63 | podAntiAffinity: 64 | requiredDuringSchedulingIgnoredDuringExecution: 65 | - labelSelector: 66 | matchExpressions: 67 | - key: "app" 68 | operator: In 69 | values: 70 | - zk-hs 71 | topologyKey: "kubernetes.io/hostname" 72 | containers: 73 | - name: kubernetes-zookeeper 74 | imagePullPolicy: Always 75 | image: "k8s.gcr.io/kubernetes-zookeeper:1.0-3.4.10" # Consider an upgrade to ZK? 
76 | resources: 77 | requests: 78 | memory: "1Gi" 79 | cpu: "0.5" 80 | ports: 81 | - containerPort: 2181 82 | name: client 83 | - containerPort: 2888 84 | name: server 85 | - containerPort: 3888 86 | name: leader-election 87 | command: 88 | - sh 89 | - -c 90 | - "start-zookeeper \ 91 | --servers=1 \ 92 | --data_dir=/var/lib/zookeeper/data \ 93 | --data_log_dir=/var/lib/zookeeper/data/log \ 94 | --conf_dir=/opt/zookeeper/conf \ 95 | --client_port=2181 \ 96 | --election_port=3888 \ 97 | --server_port=2888 \ 98 | --tick_time=2000 \ 99 | --init_limit=10 \ 100 | --sync_limit=5 \ 101 | --heap=512M \ 102 | --max_client_cnxns=200 \ 103 | --snap_retain_count=3 \ 104 | --purge_interval=12 \ 105 | --max_session_timeout=40000 \ 106 | --min_session_timeout=4000 \ 107 | --log_level=INFO" 108 | readinessProbe: 109 | exec: 110 | command: 111 | - sh 112 | - -c 113 | - "zookeeper-ready 2181" 114 | initialDelaySeconds: 10 115 | timeoutSeconds: 5 116 | livenessProbe: 117 | exec: 118 | command: 119 | - sh 120 | - -c 121 | - "zookeeper-ready 2181" 122 | initialDelaySeconds: 10 123 | timeoutSeconds: 5 124 | volumeMounts: 125 | - name: data 126 | mountPath: /var/lib/zookeeper 127 | volumes: 128 | - name: data 129 | emptyDir: {} 130 | securityContext: 131 | runAsUser: 1000 132 | fsGroup: 1000 133 | -------------------------------------------------------------------------------- /k8s/brooklin/25env-config.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: brooklin-envs 5 | namespace: brooklin 6 | data: 7 | BROOKLIN_CLUSTER_NAME: brooklin-quickstart 8 | BROOKLIN_ZOOKEEPER_CONNECT: zookeeper.brooklin.svc.cluster.local:2181 9 | BROOKLIN_HTTP_PORT: "32311" 10 | KAFKA_TP_BOOTSTRAP_SERVERS: broker.kafka-destination.svc.cluster.local:9092 11 | KAFKA_TP_ZOOKEEPER_CONNECT: zookeeper.kafka-destination.svc.cluster.local:2181 12 | KAFKA_TP_CLIENT_ID: brooklin-producer-1 13 | BROOKLIN_CONFIG: /etc/brooklin-writable 14 | JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.rmi.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=127.0.0.1 " 15 | OPTS: "-javaagent:/etc/brooklin-writable/jmx_prometheus_javaagent-0.3.1.jar=8080:/etc/jmx-config/jmx-prometheus-javaagent-config.yml" 16 | 17 | configure.sh: |- 18 | #!/bin/sh 19 | set -x 20 | wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar -O /etc/brooklin-writable/jmx_prometheus_javaagent-0.3.1.jar 21 | cp /etc/brooklin/server.properties /etc/brooklin-writable/server.properties 22 | 23 | 24 | server.properties: |- 25 | ############################# Server Basics ############################# 26 | brooklin.server.coordinator.cluster=brooklin-cluster 27 | brooklin.server.coordinator.zkAddress=localhost:2181 28 | brooklin.server.httpPort=32311 29 | brooklin.server.connectorNames=testC,fileC,dirC,kafkaC,kafkaMirroringC 30 | brooklin.server.transportProviderNames=dirTP,kafkaTP 31 | brooklin.server.csvMetricsDir=/tmp/brooklin-example/ 32 | 33 | ########################### Test event producing connector Configs ###################### 34 | brooklin.server.connector.testC.factoryClassName=com.linkedin.datastream.connectors.TestEventProducingConnectorFactory 35 | 
brooklin.server.connector.testC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.LoadbalancingStrategyFactory 36 | brooklin.server.connector.testC.strategy.TasksPerDatastream = 4 37 | 38 | ########################### File connector Configs ###################### 39 | brooklin.server.connector.fileC.factoryClassName=com.linkedin.datastream.connectors.file.FileConnectorFactory 40 | brooklin.server.connector.fileC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 41 | brooklin.server.connector.fileC.strategy.maxTasks=1 42 | 43 | ########################### Directory connector Configs ###################### 44 | brooklin.server.connector.dirC.factoryClassName=com.linkedin.datastream.connectors.directory.DirectoryConnectorFactory 45 | brooklin.server.connector.dirC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 46 | brooklin.server.connector.dirC.strategy.maxTasks=1 47 | 48 | ########################### Kafka connector Configs ###################### 49 | brooklin.server.connector.kafkaC.factoryClassName=com.linkedin.datastream.connectors.kafka.KafkaConnectorFactory 50 | brooklin.server.connector.kafkaC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 51 | 52 | ########################### Kafka Mirroring connector Configs ###################### 53 | brooklin.server.connector.kafkaMirroringC.factoryClassName=com.linkedin.datastream.connectors.kafka.mirrormaker.KafkaMirrorMakerConnectorFactory 54 | brooklin.server.connector.kafkaMirroringC.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory 55 | brooklin.server.connector.kafkaMirroringC.consumer.max.poll.records=10000 56 | brooklin.server.connector.kafkaMirroringC.consumer.fetch.max.wait.ms=10000 57 | # fetch.max.bytes = 52428800 58 | # max.partition.fetch.bytes = 1048576 59 | # receive.buffer.bytes = 65536 60 | brooklin.server.connector.kafkaMirroringC.consumer.receive.buffer.bytes=524288 61 | brooklin.server.connector.kafkaMirroringC.consumer.max.partition.fetch.bytes=262144 62 | brooklin.server.connector.kafkaMirroringC.pausePartitionOnError=true 63 | brooklin.server.connector.kafkaMirroringC.pauseErrorPartitionDurationMs=30000 64 | 65 | ########################### Directory transport provider configs ###################### 66 | brooklin.server.transportProvider.dirTP.factoryClassName=com.linkedin.datastream.server.DirectoryTransportProviderAdminFactory 67 | 68 | ########################### Kafka transport provider configs ###################### 69 | brooklin.server.transportProvider.kafkaTP.factoryClassName=com.linkedin.datastream.kafka.KafkaTransportProviderAdminFactory 70 | brooklin.server.transportProvider.kafkaTP.bootstrap.servers=localhost:9092 71 | brooklin.server.transportProvider.kafkaTP.zookeeper.connect=localhost:2181 72 | brooklin.server.transportProvider.kafkaTP.client.id=datastream-producer 73 | # brooklin.server.transportProvider.kafkaTP.producer.linger.ms=1000 74 | # brooklin.server.transportProvider.kafkaTP.producer.batch.size=32768 75 | # brooklin.server.transportProvider.kafkaTP.producer.send.buffer.bytes=262144 76 | # brooklin.server.transportProvider.kafkaTP.producer.max.request.size=262144 77 | # brooklin.server.transportProvider.kafkaTP.producer.max.in.flight.requests.per.connection=1 78 | 79 | -------------------------------------------------------------------------------- /k8s/brooklin/25jmx-prometheus-javaagent-config.yml: 
-------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: brooklin-jmx-prometheus-javaagent-config 5 | namespace: brooklin 6 | data: 7 | jmx-prometheus-javaagent-config.yml: |+ 8 | startDelaySeconds: 0 9 | lowercaseOutputName: true 10 | lowercaseOutputLabelNames: true 11 | whitelistObjectNames: 12 | - "java.lang:*" 13 | - "metrics:*" 14 | - "kafka.consumer:*" 15 | - "kafka.producer:*" 16 | -------------------------------------------------------------------------------- /k8s/brooklin/30brooklin.yml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | namespace: brooklin 5 | name: brooklin 6 | labels: 7 | app: brooklin 8 | spec: 9 | replicas: 32 10 | selector: 11 | matchLabels: 12 | app: brooklin 13 | template: 14 | metadata: 15 | labels: 16 | app: brooklin 17 | spec: 18 | terminationGracePeriodSeconds: 10 19 | initContainers: 20 | - name: init-zk 21 | image: busybox 22 | command: 23 | - /bin/sh 24 | - -c 25 | - 'until [ "imok" = "$(echo ruok | nc -w 1 $(echo $BROOKLIN_ZOOKEEPER_CONNECT | cut -d: -f1) $(echo $BROOKLIN_ZOOKEEPER_CONNECT | cut -d: -f2))" ] ; do echo waiting ; sleep 10 ; done' 26 | envFrom: 27 | - configMapRef: 28 | name: brooklin-envs 29 | - name: init-config 30 | image: busybox 31 | envFrom: 32 | - configMapRef: 33 | name: brooklin-envs 34 | command: ['sh', '/etc/brooklin/configure.sh'] 35 | volumeMounts: 36 | - name: config 37 | mountPath: /etc/brooklin 38 | - name: config-writable 39 | mountPath: /etc/brooklin-writable 40 | containers: 41 | - name: brooklin 42 | image: rantav/brooklin:1.0.2-0 43 | imagePullPolicy: IfNotPresent 44 | env: 45 | - name: HEAP_OPTS 46 | value: "-Xmx2G -Xms2G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35 -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps" 47 | envFrom: 48 | - configMapRef: 49 | name: brooklin-envs 50 | ports: 51 | - name: service 52 | containerPort: 32311 53 | - name: metrics 54 | containerPort: 8080 55 | resources: 56 | requests: 57 | cpu: 700m 58 | memory: 2Gi 59 | limits: 60 | cpu: 700m 61 | volumeMounts: 62 | - name: jmx-config 63 | mountPath: /etc/jmx-config 64 | - name: config 65 | mountPath: /etc/brooklin 66 | - name: config-writable 67 | mountPath: /etc/brooklin-writable 68 | volumes: 69 | - name: jmx-config 70 | configMap: 71 | name: brooklin-jmx-prometheus-javaagent-config 72 | - name: config 73 | configMap: 74 | name: brooklin-envs 75 | - name: config-writable 76 | emptyDir: {} 77 | affinity: 78 | podAntiAffinity: 79 | requiredDuringSchedulingIgnoredDuringExecution: 80 | - labelSelector: 81 | matchExpressions: 82 | - key: app 83 | operator: In 84 | values: 85 | - kafka-destination 86 | namespaces: 87 | - kafka-destination 88 | topologyKey: "kubernetes.io/hostname" 89 | #- labelSelector: 90 | #matchExpressions: 91 | #- key: app 92 | #operator: In 93 | #values: 94 | #- brooklin 95 | #- key: component 96 | #operator: In 97 | #values: 98 | #- worker 99 | #topologyKey: "kubernetes.io/hostname" 100 | -------------------------------------------------------------------------------- /k8s/brooklin/40monitoring.yml: -------------------------------------------------------------------------------- 1 | # Headless service just for the sake of exposing the metrics 2 | apiVersion: v1 3 | kind: Service 4 | metadata: 5 | name: brooklin 6 | 
namespace: brooklin 7 | labels: 8 | app: brooklin 9 | spec: 10 | ports: 11 | - name: metrics 12 | port: 8080 13 | clusterIP: None 14 | selector: 15 | app: brooklin 16 | --- 17 | apiVersion: monitoring.coreos.com/v1 18 | kind: ServiceMonitor 19 | metadata: 20 | labels: 21 | k8s-app: brooklin 22 | name: brooklin 23 | namespace: monitoring 24 | spec: 25 | endpoints: 26 | - port: metrics 27 | jobLabel: k8s-app 28 | namespaceSelector: 29 | matchNames: 30 | - brooklin 31 | selector: 32 | matchLabels: 33 | app: brooklin 34 | --- 35 | apiVersion: rbac.authorization.k8s.io/v1beta1 36 | kind: ClusterRole 37 | metadata: 38 | name: prometheus-k8s 39 | namespace: brooklin 40 | rules: 41 | - apiGroups: [""] 42 | resources: 43 | - nodes 44 | - services 45 | - endpoints 46 | - pods 47 | verbs: ["get", "list", "watch"] 48 | - apiGroups: [""] 49 | resources: 50 | - configmaps 51 | verbs: ["get"] 52 | - nonResourceURLs: ["/metrics"] 53 | verbs: ["get"] 54 | --- 55 | apiVersion: rbac.authorization.k8s.io/v1beta1 56 | kind: ClusterRoleBinding 57 | metadata: 58 | name: prometheus-k8s 59 | roleRef: 60 | apiGroup: rbac.authorization.k8s.io 61 | kind: ClusterRole 62 | name: prometheus-k8s 63 | subjects: 64 | - kind: ServiceAccount 65 | name: prometheus-k8s 66 | namespace: monitoring 67 | -------------------------------------------------------------------------------- /k8s/brooklin/delete-replicate-topic.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | set -e 5 | 6 | if [ "$#" -ne 1 ]; then 7 | echo "Illegal number of parameters. Looking for topic name" 8 | exit 1 9 | fi 10 | 11 | topic_name=$1 12 | 13 | brooklin_pod=$(kubectl --context eu-west-1.k8s.local get pods -n brooklin -l app=brooklin -o 'jsonpath={.items[0].metadata.name}') 14 | kubectl --context eu-west-1.k8s.local exec -n brooklin $brooklin_pod -- bash -c "unset JMX_OPTS; unset JMX_PORT; unset OPTS; \$BROOKLIN_HOME/bin/brooklin-rest-client.sh -o DELETE -u http://localhost:32311/ -n mirror-$topic_name 2> /dev/null" 15 | -------------------------------------------------------------------------------- /k8s/brooklin/replicate-topic.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | set -e 5 | 6 | if [ "$#" -ne 1 ]; then 7 | echo "Illegal number of parameters. 
Looking for topic name" 8 | exit 1 9 | fi 10 | 11 | topic_name=$1 12 | 13 | kafka_source_ip=$(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po kafka-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}') 14 | brooklin_pod=$(kubectl --context eu-west-1.k8s.local get pods -n brooklin -l app=brooklin -o 'jsonpath={.items[0].metadata.name}') 15 | kubectl --context eu-west-1.k8s.local exec -n brooklin $brooklin_pod -- bash -c "unset JMX_OPTS; unset JMX_PORT; unset OPTS; \$BROOKLIN_HOME/bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n mirror-$topic_name -s \"kafka://$kafka_source_ip:9093/^$topic_name$\" -c kafkaMirroringC -t kafkaTP -m '{\"owner\":\"test-user\",\"system.reuseExistingDestination\":\"false\",\"system.destination.identityPartitioningEnabled\":true}' 2>/dev/null" 16 | -------------------------------------------------------------------------------- /k8s/brooklin/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test Kafka to see if a topic had been replicated 5 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/kafka-destination-0 --timeout=-1s 6 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/kafka-source-0 --timeout=-1s 7 | 8 | kubectl --context eu-west-1.k8s.local -n brooklin wait --for=condition=Available deployment/brooklin --timeout=-1s 9 | while [[ $(kubectl --context eu-west-1.k8s.local get pods -n brooklin -l app=brooklin -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}' | cut -d' ' -f1) != "True" ]]; do echo "waiting for brooklin pod..." && sleep 10; done 10 | 11 | # Run end to end tests. Produce to the source cluster, consume from the destination cluster 12 | TOPIC="_test_replicator_$(date +%s)" 13 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> REPLICATOR GREAT SUCCESS! 
<<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 14 | 15 | $(dirname "$0")/replicate-topic.sh $TOPIC 16 | 17 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 18 | 19 | $(dirname "$0")/delete-replicate-topic.sh $TOPIC -------------------------------------------------------------------------------- /k8s/kafka-destination/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: kafka-destination 5 | -------------------------------------------------------------------------------- /k8s/kafka-destination/10broker-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: broker-config 4 | namespace: kafka-destination 5 | apiVersion: v1 6 | data: 7 | init.sh: |- 8 | #!/bin/bash 9 | set -x 10 | 11 | KAFKA_BROKER_ID=${HOSTNAME##*-} 12 | sed "s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/" /etc/kafka/server.properties > /etc/kafka-writable/server.properties 13 | 14 | hash kubectl 2>/dev/null || { 15 | sed -i "s/#init#broker.rack=#init#/#init#broker.rack=# kubectl not found in path/" /etc/kafka-writable/server.properties 16 | } && { 17 | ZONE=$(kubectl get node "$NODE_NAME" -o=go-template='{{index .metadata.labels "failure-domain.beta.kubernetes.io/zone"}}') 18 | if [ $? -ne 0 ]; then 19 | sed -i "s/#init#broker.rack=#init#/#init#broker.rack=# zone lookup failed, see -c init-config logs/" /etc/kafka-writable/server.properties 20 | elif [ "x$ZONE" == "x" ]; then 21 | sed -i "s/#init#broker.rack=#init#/#init#broker.rack=# zone label not found for node $NODE_NAME/" /etc/kafka-writable/server.properties 22 | else 23 | sed -i "s/#init#broker.rack=#init#/broker.rack=$ZONE/" /etc/kafka-writable/server.properties 24 | fi 25 | } 26 | 27 | server.properties: |- 28 | # Licensed to the Apache Software Foundation (ASF) under one or more 29 | # contributor license agreements. See the NOTICE file distributed with 30 | # this work for additional information regarding copyright ownership. 31 | # The ASF licenses this file to You under the Apache License, Version 2.0 32 | # (the "License"); you may not use this file except in compliance with 33 | # the License. You may obtain a copy of the License at 34 | # 35 | # http://www.apache.org/licenses/LICENSE-2.0 36 | # 37 | # Unless required by applicable law or agreed to in writing, software 38 | # distributed under the License is distributed on an "AS IS" BASIS, 39 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 40 | # See the License for the specific language governing permissions and 41 | # limitations under the License. 42 | 43 | # see kafka.server.KafkaConfig for additional details and defaults 44 | 45 | ############################# Server Basics ############################# 46 | 47 | # The id of the broker. This must be set to a unique integer for each broker. 48 | #init#broker.id=#init# 49 | 50 | #init#broker.rack=#init# 51 | 52 | # Switch to enable topic deletion or not, default value is false 53 | delete.topic.enable=true 54 | 55 | ############################# Socket Server Settings ############################# 56 | 57 | # The address the socket server listens on. 
It will get the value returned from 58 | # java.net.InetAddress.getCanonicalHostName() if not configured. 59 | # FORMAT: 60 | # listeners = listener_name://host_name:port 61 | # EXAMPLE: 62 | # listeners = PLAINTEXT://your.host.name:9092 63 | #listeners=PLAINTEXT://:9092 64 | 65 | # Hostname and port the broker will advertise to producers and consumers. If not set, 66 | # it uses the value for "listeners" if configured. Otherwise, it will use the value 67 | # returned from java.net.InetAddress.getCanonicalHostName(). 68 | #advertised.listeners=PLAINTEXT://your.host.name:9092 69 | 70 | # Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details 71 | #listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL 72 | 73 | # The number of threads that the server uses for receiving requests from the network and sending responses to the network 74 | num.network.threads=8 75 | 76 | # The number of threads that the server uses for processing requests, which may include disk I/O 77 | num.io.threads=8 78 | 79 | # The send buffer (SO_SNDBUF) used by the socket server 80 | socket.send.buffer.bytes=104857600 81 | 82 | # The receive buffer (SO_RCVBUF) used by the socket server 83 | socket.receive.buffer.bytes=104857600 84 | 85 | # The maximum size of a request that the socket server will accept (protection against OOM) 86 | socket.request.max.bytes=104857600 87 | 88 | 89 | ############################# Log Basics ############################# 90 | 91 | # A comma seperated list of directories under which to store log files 92 | log.dirs=/tmp/kafka-logs 93 | 94 | # The default number of log partitions per topic. More partitions allow greater 95 | # parallelism for consumption, but this will also result in more files across 96 | # the brokers. 97 | num.partitions=1 98 | 99 | # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown. 100 | # This value is recommended to be increased for installations with data dirs located in RAID array. 101 | num.recovery.threads.per.data.dir=1 102 | 103 | ############################# Internal Topic Settings ############################# 104 | # The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state" 105 | # For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3. 106 | offsets.topic.replication.factor=1 107 | transaction.state.log.replication.factor=1 108 | transaction.state.log.min.isr=1 109 | 110 | ############################# Log Flush Policy ############################# 111 | 112 | # Messages are immediately written to the filesystem but by default we only fsync() to sync 113 | # the OS cache lazily. The following configurations control the flush of data to disk. 114 | # There are a few important trade-offs here: 115 | # 1. Durability: Unflushed data may be lost if you are not using replication. 116 | # 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush. 117 | # 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks. 118 | # The settings below allow one to configure the flush policy to flush data after a period of time or 119 | # every N messages (or both). This can be done globally and overridden on a per-topic basis. 
120 | 121 | # The number of messages to accept before forcing a flush of data to disk 122 | #log.flush.interval.messages=10000 123 | 124 | # The maximum amount of time a message can sit in a log before we force a flush 125 | #log.flush.interval.ms=1000 126 | 127 | ############################# Log Retention Policy ############################# 128 | 129 | # The following configurations control the disposal of log segments. The policy can 130 | # be set to delete segments after a period of time, or after a given size has accumulated. 131 | # A segment will be deleted whenever *either* of these criteria are met. Deletion always happens 132 | # from the end of the log. 133 | 134 | # The minimum age of a log file to be eligible for deletion due to age 135 | log.retention.hours=168 136 | 137 | # A size-based retention policy for logs. Segments are pruned from the log as long as the remaining 138 | # segments don't drop below log.retention.bytes. Functions independently of log.retention.hours. 139 | #log.retention.bytes=1073741824 140 | 141 | # The maximum size of a log segment file. When this size is reached a new log segment will be created. 142 | log.segment.bytes=1073741824 143 | 144 | # The interval at which log segments are checked to see if they can be deleted according 145 | # to the retention policies 146 | log.retention.check.interval.ms=60000 147 | 148 | ############################# Zookeeper ############################# 149 | 150 | # Zookeeper connection string (see zookeeper docs for details). 151 | # This is a comma separated host:port pairs, each corresponding to a zk 152 | # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002". 153 | # You can also append an optional chroot string to the urls to specify the 154 | # root directory for all kafka znodes. 155 | zookeeper.connect=zookeeper.kafka-destination.svc.cluster.local:2181 156 | 157 | # Timeout in ms for connecting to zookeeper 158 | zookeeper.connection.timeout.ms=6000 159 | 160 | 161 | ############################# Group Coordinator Settings ############################# 162 | 163 | # The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance. 164 | # The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms. 165 | # The default value for this is 3 seconds. 166 | # We override this to 0 here as it makes for a better out-of-the-box experience for development and testing. 167 | # However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup. 168 | group.initial.rebalance.delay.ms=0 169 | 170 | log4j.properties: |- 171 | # Licensed to the Apache Software Foundation (ASF) under one or more 172 | # contributor license agreements. See the NOTICE file distributed with 173 | # this work for additional information regarding copyright ownership. 174 | # The ASF licenses this file to You under the Apache License, Version 2.0 175 | # (the "License"); you may not use this file except in compliance with 176 | # the License. 
You may obtain a copy of the License at 177 | # 178 | # http://www.apache.org/licenses/LICENSE-2.0 179 | # 180 | # Unless required by applicable law or agreed to in writing, software 181 | # distributed under the License is distributed on an "AS IS" BASIS, 182 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 183 | # See the License for the specific language governing permissions and 184 | # limitations under the License. 185 | 186 | # Unspecified loggers and loggers with additivity=true output to server.log and stdout 187 | # Note that INFO only applies to unspecified loggers, the log level of the child logger is used otherwise 188 | log4j.rootLogger=INFO, stdout 189 | 190 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 191 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 192 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 193 | 194 | log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender 195 | log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH 196 | log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log 197 | log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout 198 | log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 199 | 200 | log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender 201 | log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH 202 | log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log 203 | log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout 204 | log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 205 | 206 | log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender 207 | log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH 208 | log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log 209 | log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout 210 | log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 211 | 212 | log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender 213 | log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH 214 | log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log 215 | log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout 216 | log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 217 | 218 | log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender 219 | log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH 220 | log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log 221 | log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout 222 | log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 223 | 224 | log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender 225 | log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd-HH 226 | log4j.appender.authorizerAppender.File=${kafka.logs.dir}/kafka-authorizer.log 227 | log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout 228 | log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n 229 | 230 | # Change the two lines below to adjust ZK client logging 231 | log4j.logger.org.I0Itec.zkclient.ZkClient=INFO 232 | log4j.logger.org.apache.zookeeper=INFO 233 | 234 | # Change the two lines below to adjust the general broker logging level (output to server.log and stdout) 235 | log4j.logger.kafka=INFO 
236 | log4j.logger.org.apache.kafka=INFO 237 | 238 | # Change to DEBUG or TRACE to enable request logging 239 | log4j.logger.kafka.request.logger=WARN, requestAppender 240 | log4j.additivity.kafka.request.logger=false 241 | 242 | # Uncomment the lines below and change log4j.logger.kafka.network.RequestChannel$ to TRACE for additional output 243 | # related to the handling of requests 244 | #log4j.logger.kafka.network.Processor=TRACE, requestAppender 245 | #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender 246 | #log4j.additivity.kafka.server.KafkaApis=false 247 | log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender 248 | log4j.additivity.kafka.network.RequestChannel$=false 249 | 250 | log4j.logger.kafka.controller=TRACE, controllerAppender 251 | log4j.additivity.kafka.controller=false 252 | 253 | log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender 254 | log4j.additivity.kafka.log.LogCleaner=false 255 | 256 | log4j.logger.state.change.logger=TRACE, stateChangeAppender 257 | log4j.additivity.state.change.logger=false 258 | 259 | # Change to DEBUG to enable audit log for the authorizer 260 | log4j.logger.kafka.authorizer.logger=WARN, authorizerAppender 261 | log4j.additivity.kafka.authorizer.logger=false 262 | -------------------------------------------------------------------------------- /k8s/kafka-destination/10metrics-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: jmx-config 4 | namespace: kafka-destination 5 | apiVersion: v1 6 | data: 7 | jmx-kafka-prometheus.yml: |+ 8 | lowercaseOutputName: true 9 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 10 | ssl: false 11 | whitelistObjectNames: ["kafka.server:*","kafka.controller:*","java.lang:*"] 12 | rules: 13 | - pattern : kafka.server<>Value 14 | - pattern : kafka.server<>OneMinuteRate 15 | - pattern : kafka.server<>OneMinuteRate 16 | - pattern : kafka.server<>queue-size 17 | - pattern : kafka.server<>(Value|OneMinuteRate) 18 | - pattern : kafka.server<>(.*) 19 | - pattern : kafka.server<>(.*) 20 | - pattern : kafka.server<>queue-size 21 | - pattern : kafka.server<>OneMinuteRate 22 | - pattern : kafka.controller<>Value 23 | - pattern : java.lang<>SystemCpuLoad 24 | - pattern : java.langused 25 | - pattern : java.lang<>FreePhysicalMemorySize 26 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 27 | name: java_lang_threading_threadcount 28 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 29 | name: java_lang_operatingsystem_openfiledescriptorcount 30 | - pattern: 'java.lang(.+): .*' 31 | name: java_lang_memory_nonheapmemoryusage_$1 32 | 33 | jmx-zookeeper-prometheus.yaml: |+ 34 | startDelaySeconds: 0 35 | lowercaseOutputName: true 36 | lowercaseOutputLabelNames: true 37 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 38 | ssl: false 39 | whitelistObjectNames: ["java.lang:*","org.apache.ZooKeeperService:*"] 40 | rules: 41 | - pattern: 'java.lang(.+): .*' 42 | name: java_lang_Memory_HeapMemoryUsage_$1 43 | - pattern: 'java.lang(.+): .*' 44 | name: java_lang_Memory_NonHeapMemoryUsage_$1 45 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 46 | name: java_lang_OperatingSystem_OpenFileDescriptorCount 47 | - pattern: 'java.lang<.*>ProcessCpuLoad: .*' 48 | name: java_lang_OperatingSystem_ProcessCpuLoad 49 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 50 | name: java_lang_Threading_ThreadCount 51 | # These are still incorrect, they need more work 52 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 
53 | name: "zookeeper_$2" 54 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 55 | name: "zookeeper_$3" 56 | labels: 57 | replicaId: "$2" 58 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 59 | name: "zookeeper_$4" 60 | labels: 61 | replicaId: "$2" 62 | memberType: "$3" 63 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 64 | name: "zookeeper_$4_$5" 65 | labels: 66 | replicaId: "$2" 67 | memberType: "$3" 68 | -------------------------------------------------------------------------------- /k8s/kafka-destination/10zookeeper-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: zookeeper-config 4 | namespace: kafka-destination 5 | apiVersion: v1 6 | data: 7 | init.sh: |- 8 | #!/bin/bash 9 | set -x 10 | 11 | [ -z "$ID_OFFSET" ] && ID_OFFSET=1 12 | export ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-} + $ID_OFFSET)) 13 | echo "${ZOOKEEPER_SERVER_ID:-1}" | tee /var/lib/zookeeper/data/myid 14 | sed "s/server\.$ZOOKEEPER_SERVER_ID\=[a-z0-9.-]*/server.$ZOOKEEPER_SERVER_ID=0.0.0.0/" /etc/kafka/zookeeper.properties > /etc/kafka-writable/zookeeper.properties 15 | 16 | 17 | zookeeper.properties: |- 18 | tickTime=2000 19 | dataDir=/var/lib/zookeeper/data 20 | dataLogDir=/var/lib/zookeeper/log 21 | clientPort=2181 22 | initLimit=5 23 | syncLimit=2 24 | 25 | log4j.properties: |- 26 | log4j.rootLogger=INFO, stdout 27 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 28 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 29 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 30 | 31 | # Suppress connection log messages, three lines per livenessProbe execution 32 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN 33 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN 34 | -------------------------------------------------------------------------------- /k8s/kafka-destination/20dns.yml: -------------------------------------------------------------------------------- 1 | # A headless service to create DNS records 2 | --- 3 | apiVersion: v1 4 | kind: Service 5 | metadata: 6 | name: broker 7 | namespace: kafka-destination 8 | labels: 9 | app: kafka-destination 10 | spec: 11 | ports: 12 | - name: broker 13 | port: 9092 14 | - name: prometheus 15 | port: 5556 16 | clusterIP: None 17 | selector: 18 | app: kafka-destination 19 | -------------------------------------------------------------------------------- /k8s/kafka-destination/20pzoo-service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: pzoo 5 | namespace: kafka-destination 6 | spec: 7 | ports: 8 | - port: 2888 9 | name: peer 10 | - port: 3888 11 | name: leader-election 12 | clusterIP: None 13 | selector: 14 | app: zookeeper-replica 15 | storage: persistent 16 | -------------------------------------------------------------------------------- /k8s/kafka-destination/30service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: zookeeper 5 | namespace: kafka-destination 6 | labels: 7 | app: zookeeper 8 | spec: 9 | ports: 10 | - port: 2181 11 | name: client 12 | - port: 5556 13 | name: prometheus 14 | selector: 15 | app: zookeeper 16 | -------------------------------------------------------------------------------- /k8s/kafka-destination/50kafka.yml: -------------------------------------------------------------------------------- 1 | 
apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: kafka-destination 5 | namespace: kafka-destination 6 | spec: 7 | selector: 8 | matchLabels: 9 | app: kafka-destination 10 | serviceName: "broker" 11 | replicas: 16 12 | updateStrategy: 13 | type: OnDelete 14 | template: 15 | metadata: 16 | labels: 17 | app: kafka-destination 18 | spec: 19 | terminationGracePeriodSeconds: 30 20 | initContainers: 21 | - name: init-config 22 | image: solsson/kafka-initutils@sha256:c98d7fb5e9365eab391a5dcd4230fc6e72caf929c60f29ff091e3b0215124713 23 | env: 24 | - name: NODE_NAME 25 | valueFrom: 26 | fieldRef: 27 | fieldPath: spec.nodeName 28 | - name: POD_NAME 29 | valueFrom: 30 | fieldRef: 31 | fieldPath: metadata.name 32 | - name: POD_NAMESPACE 33 | valueFrom: 34 | fieldRef: 35 | fieldPath: metadata.namespace 36 | command: ['/bin/bash', '/etc/kafka/init.sh'] 37 | volumeMounts: 38 | - name: config 39 | mountPath: /etc/kafka 40 | - name: config-writable 41 | mountPath: /etc/kafka-writable 42 | containers: 43 | - name: broker 44 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 45 | env: 46 | - name: KAFKA_LOG4J_OPTS 47 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 48 | - name: JMX_PORT 49 | value: "5555" 50 | - name: KAFKA_HEAP_OPTS 51 | value: "-Xmx11G -Xms11G" 52 | ports: 53 | - name: inside 54 | containerPort: 9092 55 | - name: jmx 56 | containerPort: 5555 57 | command: 58 | - ./bin/kafka-server-start.sh 59 | - /etc/kafka-writable/server.properties 60 | resources: 61 | requests: 62 | cpu: 1200m 63 | memory: 12Gi 64 | ephemeral-storage: "80Gi" 65 | limits: 66 | memory: 12Gi 67 | readinessProbe: 68 | tcpSocket: 69 | port: inside 70 | timeoutSeconds: 1 71 | livenessProbe: 72 | tcpSocket: 73 | port: inside 74 | initialDelaySeconds: 60 75 | periodSeconds: 20 76 | timeoutSeconds: 1 77 | volumeMounts: 78 | - name: config 79 | mountPath: /etc/kafka 80 | - name: config-writable 81 | mountPath: /etc/kafka-writable 82 | - name: data 83 | mountPath: /var/lib/kafka/data 84 | - name: metrics 85 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 86 | command: 87 | - java 88 | - -XX:+UnlockExperimentalVMOptions 89 | - -XX:+UseCGroupMemoryLimitForHeap 90 | - -XX:MaxRAMFraction=1 91 | - -XshowSettings:vm 92 | - -jar 93 | - jmx_prometheus_httpserver.jar 94 | - "5556" 95 | - /etc/jmx-kafka/jmx-kafka-prometheus.yml 96 | ports: 97 | - name: prometheus 98 | containerPort: 5556 99 | resources: 100 | requests: 101 | cpu: 100m 102 | memory: 500Mi 103 | #limits: 104 | #memory: 200Mi 105 | volumeMounts: 106 | - name: jmx-config 107 | mountPath: /etc/jmx-kafka 108 | volumes: 109 | - name: config 110 | configMap: 111 | name: broker-config 112 | - name: config-writable 113 | emptyDir: {} 114 | - name: jmx-config 115 | configMap: 116 | name: jmx-config 117 | - name: data 118 | emptyDir: {} 119 | # affinity: 120 | # podAntiAffinity: 121 | # requiredDuringSchedulingIgnoredDuringExecution: 122 | # - labelSelector: 123 | # matchExpressions: 124 | # - key: app 125 | # operator: In 126 | # values: 127 | # - kafka-destination 128 | # topologyKey: "kubernetes.io/hostname" 129 | -------------------------------------------------------------------------------- /k8s/kafka-destination/50pzoo.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: pzoo-destination 5 | namespace: kafka-destination 
6 | spec: 7 | selector: 8 | matchLabels: 9 | app: zookeeper 10 | storage: persistent 11 | serviceName: "pzoo" 12 | replicas: 1 13 | updateStrategy: 14 | type: OnDelete 15 | template: 16 | metadata: 17 | labels: 18 | app: zookeeper 19 | storage: persistent 20 | annotations: 21 | spec: 22 | terminationGracePeriodSeconds: 10 23 | initContainers: 24 | - name: init-config 25 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 26 | command: ['/bin/bash', '/etc/kafka/init.sh'] 27 | volumeMounts: 28 | - name: config 29 | mountPath: /etc/kafka 30 | - name: config-writable 31 | mountPath: /etc/kafka-writable 32 | - name: data 33 | mountPath: /var/lib/zookeeper/data 34 | containers: 35 | - name: zookeeper 36 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 37 | env: 38 | - name: KAFKA_LOG4J_OPTS 39 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 40 | - name: JMX_PORT 41 | value: "5555" 42 | command: 43 | - ./bin/zookeeper-server-start.sh 44 | - /etc/kafka-writable/zookeeper.properties 45 | ports: 46 | - containerPort: 2181 47 | name: client 48 | - containerPort: 2888 49 | name: peer 50 | - containerPort: 3888 51 | name: leader-election 52 | - name: jmx 53 | containerPort: 5555 54 | resources: 55 | requests: 56 | cpu: 20m 57 | memory: 200Mi 58 | ephemeral-storage: "2Gi" 59 | readinessProbe: 60 | exec: 61 | command: 62 | - /bin/sh 63 | - -c 64 | - '[ "imok" = "$(echo ruok | nc -w 1 127.0.0.1 2181)" ]' 65 | volumeMounts: 66 | - name: config 67 | mountPath: /etc/kafka 68 | - name: config-writable 69 | mountPath: /etc/kafka-writable 70 | - name: data 71 | mountPath: /var/lib/zookeeper/data 72 | - name: metrics 73 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 74 | command: 75 | - java 76 | - -XX:+UnlockExperimentalVMOptions 77 | - -XX:+UseCGroupMemoryLimitForHeap 78 | - -XX:MaxRAMFraction=1 79 | - -XshowSettings:vm 80 | - -jar 81 | - jmx_prometheus_httpserver.jar 82 | - "5556" 83 | - /etc/jmx-config/jmx-zookeeper-prometheus.yaml 84 | ports: 85 | - name: prometheus 86 | containerPort: 5556 87 | resources: 88 | requests: 89 | cpu: 100m 90 | memory: 500Mi 91 | volumeMounts: 92 | - name: jmx-config 93 | mountPath: /etc/jmx-config 94 | volumes: 95 | - name: config 96 | configMap: 97 | name: zookeeper-config 98 | - name: config-writable 99 | emptyDir: {} 100 | - name: data 101 | emptyDir: {} 102 | - name: jmx-config 103 | configMap: 104 | name: jmx-config 105 | affinity: 106 | podAntiAffinity: 107 | requiredDuringSchedulingIgnoredDuringExecution: 108 | - labelSelector: 109 | matchExpressions: 110 | - key: app 111 | operator: In 112 | values: 113 | - zookeeper 114 | topologyKey: "kubernetes.io/hostname" 115 | -------------------------------------------------------------------------------- /k8s/kafka-destination/60monitoring.yml: -------------------------------------------------------------------------------- 1 | # Monitor kafka 2 | apiVersion: monitoring.coreos.com/v1 3 | kind: ServiceMonitor 4 | metadata: 5 | labels: 6 | k8s-app: kafka 7 | name: kafka 8 | namespace: monitoring 9 | spec: 10 | endpoints: 11 | - port: prometheus 12 | jobLabel: k8s-app 13 | namespaceSelector: 14 | matchNames: 15 | - kafka-destination 16 | selector: 17 | matchLabels: 18 | app: kafka-destination 19 | --- 20 | 21 | # Monitor zookeeper 22 | apiVersion: monitoring.coreos.com/v1 23 | kind: ServiceMonitor 24 | metadata: 25 | labels: 26 | k8s-app: 
zookeeper 27 | name: zookeeper 28 | namespace: monitoring 29 | spec: 30 | endpoints: 31 | - port: prometheus 32 | jobLabel: k8s-app 33 | namespaceSelector: 34 | matchNames: 35 | - kafka-destination 36 | selector: 37 | matchLabels: 38 | app: zookeeper 39 | --- 40 | 41 | # set permissions 42 | apiVersion: rbac.authorization.k8s.io/v1beta1 43 | kind: ClusterRole 44 | metadata: 45 | name: prometheus-k8s 46 | namespace: kafka-destination 47 | rules: 48 | - apiGroups: [""] 49 | resources: 50 | - nodes 51 | - services 52 | - endpoints 53 | - pods 54 | verbs: ["get", "list", "watch"] 55 | - apiGroups: [""] 56 | resources: 57 | - configmaps 58 | verbs: ["get"] 59 | - nonResourceURLs: ["/metrics"] 60 | verbs: ["get"] 61 | --- 62 | apiVersion: rbac.authorization.k8s.io/v1beta1 63 | kind: ClusterRoleBinding 64 | metadata: 65 | name: prometheus-k8s 66 | roleRef: 67 | apiGroup: rbac.authorization.k8s.io 68 | kind: ClusterRole 69 | name: prometheus-k8s 70 | subjects: 71 | - kind: ServiceAccount 72 | name: prometheus-k8s 73 | namespace: monitoring 74 | -------------------------------------------------------------------------------- /k8s/kafka-destination/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test ZK 5 | #kubectl exec -n kafka-destination pzoo-destination-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 create /foo bar 6 | #kubectl exec -n kafka-destination pzoo-destination-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /foo 7 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/pzoo-destination-0 --timeout=-1s 8 | 9 | # wait some, to make sure ZK is with us 10 | sleep 20 11 | 12 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination pzoo-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0" 13 | 14 | # Test Kafka from the inside 15 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/kafka-destination-0 --timeout=-1s 16 | 17 | # wait some, to make sure kafka is with us 18 | sleep 20 19 | 20 | TOPIC="_test_destination_$(date +%s)" 21 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> DESTINATION GREAT SUCCESS! 
<<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 22 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 23 | -------------------------------------------------------------------------------- /k8s/kafka-source/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: kafka-source 5 | -------------------------------------------------------------------------------- /k8s/kafka-source/10metrics-config.yml: -------------------------------------------------------------------------------- 1 | kind: ConfigMap 2 | metadata: 3 | name: jmx-config 4 | namespace: kafka-source 5 | apiVersion: v1 6 | data: 7 | 8 | jmx-kafka-prometheus.yml: |+ 9 | lowercaseOutputName: true 10 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 11 | ssl: false 12 | whitelistObjectNames: ["kafka.server:*","kafka.controller:*","java.lang:*"] 13 | rules: 14 | - pattern : kafka.server<>Value 15 | - pattern : kafka.server<>OneMinuteRate 16 | - pattern : kafka.server<>OneMinuteRate 17 | - pattern : kafka.server<>queue-size 18 | - pattern : kafka.server<>(Value|OneMinuteRate) 19 | - pattern : kafka.server<>(.*) 20 | - pattern : kafka.server<>(.*) 21 | - pattern : kafka.server<>queue-size 22 | - pattern : kafka.server<>OneMinuteRate 23 | - pattern : kafka.controller<>Value 24 | - pattern : java.lang<>SystemCpuLoad 25 | - pattern : java.langused 26 | - pattern : java.lang<>FreePhysicalMemorySize 27 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 28 | name: java_lang_threading_threadcount 29 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 30 | name: java_lang_operatingsystem_openfiledescriptorcount 31 | - pattern: 'java.lang(.+): .*' 32 | name: java_lang_memory_nonheapmemoryusage_$1 33 | 34 | jmx-zookeeper-prometheus.yaml: |+ 35 | startDelaySeconds: 0 36 | lowercaseOutputName: true 37 | lowercaseOutputLabelNames: true 38 | jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi 39 | ssl: false 40 | whitelistObjectNames: ["java.lang:*","org.apache.ZooKeeperService:*"] 41 | rules: 42 | - pattern: 'java.lang(.+): .*' 43 | name: java_lang_Memory_HeapMemoryUsage_$1 44 | - pattern: 'java.lang(.+): .*' 45 | name: java_lang_Memory_NonHeapMemoryUsage_$1 46 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 47 | name: java_lang_OperatingSystem_OpenFileDescriptorCount 48 | - pattern: 'java.lang<.*>ProcessCpuLoad: .*' 49 | name: java_lang_OperatingSystem_ProcessCpuLoad 50 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 51 | name: java_lang_Threading_ThreadCount 52 | # These are still incorrect, they need more work 53 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 54 | name: "zookeeper_$2" 55 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 56 | name: "zookeeper_$3" 57 | labels: 58 | replicaId: "$2" 59 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 60 | name: "zookeeper_$4" 61 | labels: 62 | replicaId: "$2" 63 | memberType: "$3" 64 | - pattern: "org.apache.ZooKeeperService<>(\\w+)" 65 | name: "zookeeper_$4_$5" 66 | labels: 67 | replicaId: "$2" 68 | memberType: "$3" 69 | -------------------------------------------------------------------------------- /k8s/kafka-source/10zookeeper-config.yml: -------------------------------------------------------------------------------- 1 | 
kind: ConfigMap 2 | metadata: 3 | name: zookeeper-config 4 | namespace: kafka-source 5 | apiVersion: v1 6 | data: 7 | init.sh: |- 8 | #!/bin/bash 9 | set -x 10 | 11 | [ -z "$ID_OFFSET" ] && ID_OFFSET=1 12 | export ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-} + $ID_OFFSET)) 13 | echo "${ZOOKEEPER_SERVER_ID:-1}" | tee /var/lib/zookeeper/data/myid 14 | sed "s/server\.$ZOOKEEPER_SERVER_ID\=[a-z0-9.-]*/server.$ZOOKEEPER_SERVER_ID=0.0.0.0/" /etc/kafka/zookeeper.properties > /etc/kafka-writable/zookeeper.properties 15 | 16 | zookeeper.properties: |- 17 | tickTime=2000 18 | dataDir=/var/lib/zookeeper/data 19 | dataLogDir=/var/lib/zookeeper/log 20 | clientPort=2181 21 | initLimit=5 22 | syncLimit=2 23 | 24 | log4j.properties: |- 25 | log4j.rootLogger=INFO, stdout 26 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender 27 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 28 | log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n 29 | 30 | # Suppress connection log messages, three lines per livenessProbe execution 31 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN 32 | log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN 33 | -------------------------------------------------------------------------------- /k8s/kafka-source/20dns.yml: -------------------------------------------------------------------------------- 1 | # A headless service to create DNS records. This is required for the inter-node communcation 2 | --- 3 | apiVersion: v1 4 | kind: Service 5 | metadata: 6 | name: broker 7 | namespace: kafka-source 8 | labels: 9 | app: kafka-source 10 | spec: 11 | ports: 12 | - name: internal 13 | port: 9092 14 | - name: prometheus 15 | port: 5556 16 | clusterIP: None 17 | selector: 18 | app: kafka-source 19 | -------------------------------------------------------------------------------- /k8s/kafka-source/20pzoo-service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: pzoo 5 | namespace: kafka-source 6 | spec: 7 | ports: 8 | - port: 2888 9 | name: peer 10 | - port: 3888 11 | name: leader-election 12 | clusterIP: None 13 | selector: 14 | app: zookeeper-main 15 | storage: persistent 16 | -------------------------------------------------------------------------------- /k8s/kafka-source/30service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: zookeeper 5 | namespace: kafka-source 6 | labels: 7 | app: zookeeper 8 | spec: 9 | ports: 10 | - port: 2181 11 | name: client 12 | - port: 5556 13 | name: prometheus 14 | selector: 15 | app: zookeeper 16 | -------------------------------------------------------------------------------- /k8s/kafka-source/50kafka.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: kafka-source 5 | namespace: kafka-source 6 | spec: 7 | selector: 8 | matchLabels: 9 | app: kafka-source 10 | serviceName: "broker" 11 | replicas: 16 12 | updateStrategy: 13 | type: OnDelete 14 | template: 15 | metadata: 16 | labels: 17 | app: kafka-source 18 | spec: 19 | terminationGracePeriodSeconds: 30 20 | initContainers: 21 | - name: init-config 22 | image: solsson/kafka-initutils@sha256:2cdb90ea514194d541c7b869ac15d2d530ca64889f56e270161fe4e5c3d076ea 23 | env: 24 | - name: NODE_NAME 25 | valueFrom: 26 | fieldRef: 27 | fieldPath: spec.nodeName 28 | - name: 
POD_NAME 29 | valueFrom: 30 | fieldRef: 31 | fieldPath: metadata.name 32 | - name: POD_NAMESPACE 33 | valueFrom: 34 | fieldRef: 35 | fieldPath: metadata.namespace 36 | - name: POD_IP 37 | valueFrom: 38 | fieldRef: 39 | fieldPath: status.podIP 40 | command: ['/bin/bash', '/etc/kafka/init.sh'] 41 | volumeMounts: 42 | - name: config 43 | mountPath: /etc/kafka 44 | - name: config-writable 45 | mountPath: /etc/kafka-writable 46 | containers: 47 | - name: broker 48 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 49 | env: 50 | - name: KAFKA_LOG4J_OPTS 51 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 52 | - name: JMX_PORT 53 | value: "5555" 54 | - name: KAFKA_HEAP_OPTS 55 | value: "-Xmx11G -Xms11G" 56 | ports: 57 | - name: broker-internal 58 | containerPort: 9092 59 | - name: broker-external 60 | containerPort: 9093 61 | hostPort: 9093 62 | - name: jmx 63 | containerPort: 5555 64 | command: 65 | - ./bin/kafka-server-start.sh 66 | - /etc/kafka-writable/server.properties 67 | resources: 68 | requests: 69 | cpu: 1200m 70 | memory: 12Gi 71 | ephemeral-storage: "80Gi" 72 | limits: 73 | memory: 12Gi 74 | readinessProbe: 75 | tcpSocket: 76 | port: broker-internal 77 | timeoutSeconds: 1 78 | livenessProbe: 79 | tcpSocket: 80 | port: broker-internal 81 | initialDelaySeconds: 60 82 | periodSeconds: 20 83 | timeoutSeconds: 1 84 | volumeMounts: 85 | - name: config 86 | mountPath: /etc/kafka 87 | - name: config-writable 88 | mountPath: /etc/kafka-writable 89 | - name: data 90 | mountPath: /var/lib/kafka/data 91 | - name: metrics 92 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 93 | command: 94 | - java 95 | - -XX:+UnlockExperimentalVMOptions 96 | - -XX:+UseCGroupMemoryLimitForHeap 97 | - -XX:MaxRAMFraction=1 98 | - -XshowSettings:vm 99 | - -jar 100 | - jmx_prometheus_httpserver.jar 101 | - "5556" 102 | - /etc/jmx-kafka/jmx-kafka-prometheus.yml 103 | ports: 104 | - name: prometheus 105 | containerPort: 5556 106 | resources: 107 | requests: 108 | cpu: 100m 109 | memory: 500Mi 110 | #limits: 111 | #memory: 200Mi 112 | volumeMounts: 113 | - name: jmx-config 114 | mountPath: /etc/jmx-kafka 115 | volumes: 116 | - name: config 117 | configMap: 118 | name: broker-config 119 | - name: config-writable 120 | emptyDir: {} 121 | - name: data 122 | emptyDir: {} 123 | - name: jmx-config 124 | configMap: 125 | name: jmx-config 126 | affinity: 127 | podAntiAffinity: 128 | requiredDuringSchedulingIgnoredDuringExecution: 129 | - labelSelector: 130 | matchExpressions: 131 | - key: app 132 | operator: In 133 | values: 134 | - kafka-source 135 | topologyKey: "kubernetes.io/hostname" 136 | -------------------------------------------------------------------------------- /k8s/kafka-source/50pzoo.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: StatefulSet 3 | metadata: 4 | name: pzoo-source 5 | namespace: kafka-source 6 | spec: 7 | selector: 8 | matchLabels: 9 | app: zookeeper 10 | storage: persistent 11 | serviceName: "pzoo" 12 | replicas: 1 13 | updateStrategy: 14 | type: OnDelete 15 | template: 16 | metadata: 17 | labels: 18 | app: zookeeper 19 | storage: persistent 20 | annotations: 21 | spec: 22 | terminationGracePeriodSeconds: 10 23 | initContainers: 24 | - name: init-config 25 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 26 | command: ['/bin/bash', 
'/etc/kafka/init.sh'] 27 | volumeMounts: 28 | - name: config 29 | mountPath: /etc/kafka 30 | - name: config-writable 31 | mountPath: /etc/kafka-writable 32 | - name: data 33 | mountPath: /var/lib/zookeeper/data 34 | containers: 35 | - name: zookeeper 36 | image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1 37 | env: 38 | - name: KAFKA_LOG4J_OPTS 39 | value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties 40 | - name: JMX_PORT 41 | value: "5555" 42 | command: 43 | - ./bin/zookeeper-server-start.sh 44 | - /etc/kafka-writable/zookeeper.properties 45 | ports: 46 | - containerPort: 2181 47 | hostPort: 2181 48 | name: client 49 | - containerPort: 2888 50 | name: peer 51 | - containerPort: 3888 52 | name: leader-election 53 | - name: jmx 54 | containerPort: 5555 55 | resources: 56 | requests: 57 | cpu: 200m 58 | memory: 2000Mi 59 | ephemeral-storage: "4Gi" 60 | readinessProbe: 61 | exec: 62 | command: 63 | - /bin/sh 64 | - -c 65 | - '[ "imok" = "$(echo ruok | nc -w 1 127.0.0.1 2181)" ]' 66 | volumeMounts: 67 | - name: config 68 | mountPath: /etc/kafka 69 | - name: config-writable 70 | mountPath: /etc/kafka-writable 71 | - name: data 72 | mountPath: /var/lib/zookeeper/data 73 | - name: metrics 74 | image: solsson/kafka-prometheus-jmx-exporter@sha256:a23062396cd5af1acdf76512632c20ea6be76885dfc20cd9ff40fb23846557e8 75 | command: 76 | - java 77 | - -XX:+UnlockExperimentalVMOptions 78 | - -XX:+UseCGroupMemoryLimitForHeap 79 | - -XX:MaxRAMFraction=1 80 | - -XshowSettings:vm 81 | - -jar 82 | - jmx_prometheus_httpserver.jar 83 | - "5556" 84 | - /etc/jmx-config/jmx-zookeeper-prometheus.yaml 85 | ports: 86 | - name: prometheus 87 | containerPort: 5556 88 | resources: 89 | requests: 90 | cpu: 100m 91 | memory: 500Mi 92 | volumeMounts: 93 | - name: jmx-config 94 | mountPath: /etc/jmx-config 95 | volumes: 96 | - name: config 97 | configMap: 98 | name: zookeeper-config 99 | - name: config-writable 100 | emptyDir: {} 101 | - name: data 102 | emptyDir: {} 103 | - name: jmx-config 104 | configMap: 105 | name: jmx-config 106 | affinity: 107 | podAntiAffinity: 108 | requiredDuringSchedulingIgnoredDuringExecution: 109 | - labelSelector: 110 | matchExpressions: 111 | - key: app 112 | operator: In 113 | values: 114 | - zookeeper 115 | topologyKey: "kubernetes.io/hostname" 116 | -------------------------------------------------------------------------------- /k8s/kafka-source/60monitoring.yml: -------------------------------------------------------------------------------- 1 | # Monitor Kafka 2 | 3 | apiVersion: monitoring.coreos.com/v1 4 | kind: ServiceMonitor 5 | metadata: 6 | labels: 7 | k8s-app: kafka 8 | name: kafka 9 | namespace: monitoring 10 | spec: 11 | endpoints: 12 | - port: prometheus 13 | jobLabel: k8s-app 14 | namespaceSelector: 15 | matchNames: 16 | - kafka-source 17 | selector: 18 | matchLabels: 19 | app: kafka-source 20 | --- 21 | # Monitor zookeeper 22 | apiVersion: monitoring.coreos.com/v1 23 | kind: ServiceMonitor 24 | metadata: 25 | labels: 26 | k8s-app: zookeeper 27 | name: zookeeper 28 | namespace: monitoring 29 | spec: 30 | endpoints: 31 | - port: prometheus 32 | jobLabel: k8s-app 33 | namespaceSelector: 34 | matchNames: 35 | - kafka-source 36 | selector: 37 | matchLabels: 38 | app: zookeeper 39 | --- 40 | 41 | # Set permissions 42 | apiVersion: rbac.authorization.k8s.io/v1beta1 43 | kind: ClusterRole 44 | metadata: 45 | name: prometheus-k8s 46 | namespace: kafka-source 47 | rules: 48 | - apiGroups: [""] 49 | resources: 50 | - nodes 51 | - 
services 52 | - endpoints 53 | - pods 54 | verbs: ["get", "list", "watch"] 55 | - apiGroups: [""] 56 | resources: 57 | - configmaps 58 | verbs: ["get"] 59 | - nonResourceURLs: ["/metrics"] 60 | verbs: ["get"] 61 | --- 62 | apiVersion: rbac.authorization.k8s.io/v1beta1 63 | kind: ClusterRoleBinding 64 | metadata: 65 | name: prometheus-k8s 66 | roleRef: 67 | apiGroup: rbac.authorization.k8s.io 68 | kind: ClusterRole 69 | name: prometheus-k8s 70 | subjects: 71 | - kind: ServiceAccount 72 | name: prometheus-k8s 73 | namespace: monitoring 74 | --- 75 | # Monitor zookeeper 76 | -------------------------------------------------------------------------------- /k8s/kafka-source/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test ZK from the inside 5 | #kubectl --context us-east-1.k8s.local exec -n kafka-source pzoo-source-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 create /foo bar 6 | #kubectl --context us-east-1.k8s.local exec -n kafka-source pzoo-source-0 -- /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /foo 7 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/pzoo-source-0 --timeout=-1s 8 | 9 | # wait some, to make sure ZK is with us 10 | sleep 20 11 | 12 | kubectl --context us-east-1.k8s.local exec -n kafka-source pzoo-source-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0" 13 | 14 | # Test ZK from the outside. We assume there's a zookeeper-shell installed locally on the developer's laptop 15 | zookeeper-shell $(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po pzoo-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}'):2181 get /brokers/ids/0 16 | 17 | 18 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/kafka-source-0 --timeout=-1s 19 | 20 | # wait some, to make sure kafka is with us 21 | sleep 20 22 | 23 | TOPIC="_test_source_$(date +%s)" 24 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> SOURCE GREAT SUCCESS! <<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 25 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 26 | 27 | # Test kafka from the outside. 
This assumes there's a locally installed kafka-console-consumer script 28 | kafka-console-consumer --bootstrap-server $(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po kafka-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}'):9093 --topic $TOPIC --from-beginning --max-messages 1 29 | -------------------------------------------------------------------------------- /k8s/monitoring/admin-cluster-role-binding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1beta1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | name: eks-admin 5 | roleRef: 6 | apiGroup: rbac.authorization.k8s.io 7 | kind: ClusterRole 8 | name: cluster-admin 9 | subjects: 10 | - kind: ServiceAccount 11 | name: eks-admin 12 | namespace: kube-system 13 | -------------------------------------------------------------------------------- /k8s/monitoring/admin-service-account.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | name: eks-admin 5 | namespace: kube-system 6 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/configmap.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: graphite-mapping 5 | namespace: monitoring 6 | data: 7 | graphite-mapping.conf: |- 8 | mappings: 9 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.worker.rebalance.*.* 10 | name: ureplicator_worker_rebalance 11 | labels: 12 | region: $1 13 | instance: $2 14 | nevermind1: $3 15 | nevermind2: $4 16 | metric: $5 17 | metric_type: $6 18 | component: worker 19 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.*.*.*.totalNumber.count 20 | name: ureplicator_worker 21 | labels: 22 | region: $1 23 | instance: $2 24 | perspective: $3 25 | metric: $4 26 | worker_instance: $5 27 | component: worker 28 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.*.*.* 29 | name: ureplicator_controller 30 | labels: 31 | region: $1 32 | instance: $2 33 | module: $3 34 | metric: $4 35 | metric_type: $5 36 | component: controller 37 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.KafkaBrokerTopicObserver.*.*.* 38 | name: ureplicator_topic_observer 39 | labels: 40 | region: $1 41 | instance: $2 42 | unit: $3 43 | direction: $4 44 | metric: $5 45 | metric_type: $6 46 | component: controller 47 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.topic.partitions.*.count 48 | name: ureplicator_topic_partitions 49 | labels: 50 | region: $1 51 | instance: $2 52 | nevermind1: $3 53 | nevermind2: $4 54 | metric: $5 55 | metric_type: count 56 | component: controller 57 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.AutoTopicWhitelistManager.*.* 58 | name: ureplicator_topic_whitelist_manager 59 | labels: 60 | region: $1 61 | instance: $2 62 | nevermind1: $3 63 | metric: $4 64 | metric_type: $5 65 | component: controller 66 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.leader.counter.count 67 | name: ureplicator_leader_count 68 | labels: 69 | region: $1 70 | instance: $2 71 | component: controller 72 | metric_type: count 73 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.topic.errorNumber.count 74 | name: 
ureplicator_topic_errors 75 | labels: 76 | region: $1 77 | instance: $2 78 | component: controller 79 | metric_type: count 80 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.topic.totalNumber.count 81 | name: ureplicator_topic_counts 82 | labels: 83 | region: $1 84 | instance: $2 85 | component: controller 86 | - match: stats.test.counter.kafka-mirror-maker-controller.*.*.worker.*.count 87 | name: ureplicator_worker_instances 88 | labels: 89 | region: $1 90 | instance: $2 91 | __nevermind: $3 92 | metric: $4 93 | metric_type: count 94 | component: worker 95 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | name: prometheus-graphite-exporter 5 | namespace: monitoring 6 | labels: 7 | app: prometheus 8 | component: graphite-exporter 9 | spec: 10 | replicas: 1 11 | template: 12 | metadata: 13 | name: prometheus-graphite-exporter 14 | labels: 15 | app: prometheus 16 | component: graphite-exporter 17 | spec: 18 | serviceAccountName: prometheus-k8s 19 | containers: 20 | - name: prometheus-graphite-exporter 21 | image: prom/graphite-exporter:master 22 | args: 23 | - '--graphite.mapping-config=/tmp/graphite-mapping.conf' 24 | ports: 25 | - name: importer 26 | containerPort: 9109 27 | - name: exporter 28 | containerPort: 9108 29 | resources: 30 | requests: 31 | cpu: 50m 32 | memory: 250Mi 33 | volumeMounts: 34 | - name: graphite-mapping-volume 35 | mountPath: /tmp/graphite-mapping.conf 36 | subPath: graphite-mapping.conf 37 | volumes: 38 | - name: graphite-mapping-volume 39 | configMap: 40 | name: graphite-mapping 41 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/prometheus-scrape.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | k8s-app: prometheus-graphite-exporter 6 | name: prometheus-graphite-exporter 7 | namespace: monitoring 8 | spec: 9 | endpoints: 10 | - port: prometheus-graphite-exporter 11 | jobLabel: k8s-app 12 | namespaceSelector: 13 | matchNames: 14 | - monitoring 15 | selector: 16 | matchLabels: 17 | app: prometheus 18 | component: graphite-exporter 19 | --- 20 | apiVersion: rbac.authorization.k8s.io/v1beta1 21 | kind: ClusterRole 22 | metadata: 23 | name: prometheus-k8s 24 | namespace: monitoring 25 | rules: 26 | - apiGroups: [""] 27 | resources: 28 | - nodes 29 | - services 30 | - endpoints 31 | - pods 32 | verbs: ["get", "list", "watch"] 33 | - apiGroups: [""] 34 | resources: 35 | - configmaps 36 | verbs: ["get"] 37 | - nonResourceURLs: ["/metrics"] 38 | verbs: ["get"] 39 | --- 40 | apiVersion: rbac.authorization.k8s.io/v1beta1 41 | kind: ClusterRoleBinding 42 | metadata: 43 | name: prometheus-k8s 44 | roleRef: 45 | apiGroup: rbac.authorization.k8s.io 46 | kind: ClusterRole 47 | name: prometheus-k8s 48 | subjects: 49 | - kind: ServiceAccount 50 | name: prometheus-k8s 51 | namespace: monitoring 52 | -------------------------------------------------------------------------------- /k8s/monitoring/graphite-exporter/service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: monitoring 5 | name: prometheus-graphite-exporter 6 | labels: 7 | 
app: prometheus 8 | component: graphite-exporter 9 | spec: 10 | clusterIP: None 11 | ports: 12 | - name: prometheus-graphite-exporter 13 | port: 9108 14 | protocol: TCP 15 | selector: 16 | app: prometheus 17 | component: graphite-exporter 18 | type: ClusterIP 19 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-cluster-role-binding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: ClusterRoleBinding 4 | metadata: 5 | name: kube-state-metrics 6 | roleRef: 7 | apiGroup: rbac.authorization.k8s.io 8 | kind: ClusterRole 9 | name: kube-state-metrics 10 | subjects: 11 | - kind: ServiceAccount 12 | name: kube-state-metrics 13 | namespace: kube-system 14 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-cluster-role.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: ClusterRole 4 | metadata: 5 | name: kube-state-metrics 6 | rules: 7 | - apiGroups: [""] 8 | resources: 9 | - configmaps 10 | - secrets 11 | - nodes 12 | - pods 13 | - services 14 | - resourcequotas 15 | - replicationcontrollers 16 | - limitranges 17 | - persistentvolumeclaims 18 | - persistentvolumes 19 | - namespaces 20 | - endpoints 21 | verbs: ["list", "watch"] 22 | - apiGroups: ["extensions"] 23 | resources: 24 | - daemonsets 25 | - deployments 26 | - replicasets 27 | verbs: ["list", "watch"] 28 | - apiGroups: ["apps"] 29 | resources: 30 | - statefulsets 31 | verbs: ["list", "watch"] 32 | - apiGroups: ["batch"] 33 | resources: 34 | - cronjobs 35 | - jobs 36 | verbs: ["list", "watch"] 37 | - apiGroups: ["autoscaling"] 38 | resources: 39 | - horizontalpodautoscalers 40 | verbs: ["list", "watch"] 41 | - apiGroups: ["policy"] 42 | resources: 43 | - poddisruptionbudgets 44 | verbs: ["list", "watch"] 45 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | # Kubernetes versions after 1.9.0 should use apps/v1 3 | # Kubernetes versions before 1.8.0 should use apps/v1beta1 or extensions/v1beta1 4 | kind: Deployment 5 | metadata: 6 | name: kube-state-metrics 7 | namespace: kube-system 8 | spec: 9 | selector: 10 | matchLabels: 11 | k8s-app: kube-state-metrics 12 | replicas: 1 13 | template: 14 | metadata: 15 | labels: 16 | k8s-app: kube-state-metrics 17 | spec: 18 | serviceAccountName: kube-state-metrics 19 | containers: 20 | - name: kube-state-metrics 21 | image: quay.io/coreos/kube-state-metrics:v1.4.0 22 | ports: 23 | - name: http-metrics 24 | containerPort: 8080 25 | - name: telemetry 26 | containerPort: 8081 27 | readinessProbe: 28 | httpGet: 29 | path: /healthz 30 | port: 8080 31 | initialDelaySeconds: 5 32 | timeoutSeconds: 5 33 | - name: addon-resizer 34 | image: k8s.gcr.io/addon-resizer:1.8.3 35 | resources: 36 | limits: 37 | cpu: 150m 38 | memory: 50Mi 39 | requests: 40 | cpu: 150m 41 | memory: 50Mi 42 | env: 43 | - name: MY_POD_NAME 44 | valueFrom: 45 | fieldRef: 46 | fieldPath: metadata.name 47 | - name: MY_POD_NAMESPACE 48 | valueFrom: 49 | fieldRef: 50 | 
fieldPath: metadata.namespace 51 | command: 52 | - /pod_nanny 53 | - --container=kube-state-metrics 54 | - --cpu=100m 55 | - --extra-cpu=1m 56 | - --memory=100Mi 57 | - --extra-memory=2Mi 58 | - --threshold=5 59 | - --deployment=kube-state-metrics 60 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-role-binding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: RoleBinding 4 | metadata: 5 | name: kube-state-metrics 6 | namespace: kube-system 7 | roleRef: 8 | apiGroup: rbac.authorization.k8s.io 9 | kind: Role 10 | name: kube-state-metrics-resizer 11 | subjects: 12 | - kind: ServiceAccount 13 | name: kube-state-metrics 14 | namespace: kube-system 15 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-role.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1 3 | kind: Role 4 | metadata: 5 | namespace: kube-system 6 | name: kube-state-metrics-resizer 7 | rules: 8 | - apiGroups: [""] 9 | resources: 10 | - pods 11 | verbs: ["get"] 12 | - apiGroups: ["extensions"] 13 | resources: 14 | - deployments 15 | resourceNames: ["kube-state-metrics"] 16 | verbs: ["get", "update"] 17 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-service-account.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | name: kube-state-metrics 5 | namespace: kube-system 6 | -------------------------------------------------------------------------------- /k8s/monitoring/kube-state-metrics-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: kube-state-metrics 5 | namespace: kube-system 6 | labels: 7 | k8s-app: kube-state-metrics 8 | annotations: 9 | prometheus.io/scrape: 'true' 10 | spec: 11 | ports: 12 | - name: http-metrics 13 | port: 8080 14 | targetPort: http-metrics 15 | protocol: TCP 16 | - name: telemetry 17 | port: 8081 18 | targetPort: telemetry 19 | protocol: TCP 20 | selector: 21 | k8s-app: kube-state-metrics 22 | -------------------------------------------------------------------------------- /k8s/monitoring/monitoring-expose-kube-controller-manager.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: kube-system 5 | name: kube-controller-manager-prometheus-discovery 6 | labels: 7 | k8s-app: kube-controller-manager 8 | spec: 9 | selector: 10 | k8s-app: kube-controller-manager 11 | type: ClusterIP 12 | clusterIP: None 13 | ports: 14 | - name: http-metrics 15 | port: 10252 16 | targetPort: 10252 17 | protocol: TCP 18 | -------------------------------------------------------------------------------- /k8s/monitoring/monitoring-expose-kube-scheduler.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: kube-system 5 | name: kube-scheduler-prometheus-discovery 6 | labels: 7 | k8s-app: kube-scheduler 
8 | spec: 9 | selector: 10 | k8s-app: kube-scheduler 11 | type: ClusterIP 12 | clusterIP: None 13 | ports: 14 | - name: http-metrics 15 | port: 10251 16 | targetPort: 10251 17 | protocol: TCP 18 | -------------------------------------------------------------------------------- /k8s/monitoring/patch/grafana-datasources.yaml.tmpl: -------------------------------------------------------------------------------- 1 | data: 2 | prometheus.yaml: |- 3 | { 4 | "datasources": [ 5 | { 6 | "access": "proxy", 7 | "etitable": false, 8 | "name": "prometheus", 9 | "org_id": 1, 10 | "type": "prometheus", 11 | "url": "http://prometheus-k8s.monitoring.svc:9090", 12 | "version": 1 13 | }, 14 | { 15 | "access": "proxy", 16 | "etitable": false, 17 | "name": "us-east-1 source", 18 | "org_id": 1, 19 | "type": "prometheus", 20 | "url": "__US_EAST_1_PROMETHEUS__", 21 | "version": 1 22 | }, 23 | { 24 | "access": "proxy", 25 | "etitable": false, 26 | "name": "eu-west-1 destination", 27 | "org_id": 1, 28 | "type": "prometheus", 29 | "url": "http://prometheus-k8s.monitoring.svc:9090", 30 | "version": 1 31 | } 32 | ] 33 | } 34 | -------------------------------------------------------------------------------- /k8s/monitoring/patch/template.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set +x 3 | 4 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 5 | 6 | cd $DIR 7 | 8 | until LB="http://$(kubectl --context us-east-1.k8s.local get svc --namespace monitoring prometheus-k8s -o jsonpath="{.status.loadBalancer.ingress[0].hostname}"):$(kubectl --context us-east-1.k8s.local get svc --namespace monitoring prometheus-k8s -o jsonpath="{.spec.ports[0].port}")" 9 | do 10 | echo "Prometheus on us-east-1 isn't ready yet" 11 | sleep 5 12 | done 13 | 14 | sed "s|__US_EAST_1_PROMETHEUS__|$LB|" grafana-datasources.yaml.tmpl > grafana-datasources.yaml 15 | 16 | -------------------------------------------------------------------------------- /k8s/tester/consumer.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | name: kafka-mirror-tester-consumer 5 | labels: 6 | app: kafka-mirror-tester-consumer 7 | spec: 8 | replicas: 8 9 | selector: 10 | matchLabels: 11 | app: kafka-mirror-tester-consumer 12 | template: 13 | metadata: 14 | labels: 15 | app: kafka-mirror-tester-consumer 16 | spec: 17 | containers: 18 | - name: consumer 19 | image: rantav/kafka-mirror-tester:latest 20 | imagePullPolicy: Always 21 | args: 22 | - consume 23 | - --bootstrap-servers 24 | - broker.kafka-destination.svc.cluster.local:9092 25 | - --consumer-group 26 | - group-1 27 | - --topics 28 | - topic0 29 | - --retention 30 | - "300000" 31 | - --num-partitions 32 | - "64" 33 | - --num-replicas 34 | - "2" 35 | ports: 36 | - name: metrics 37 | containerPort: 8000 38 | # affinity: 39 | # podAntiAffinity: 40 | # requiredDuringSchedulingIgnoredDuringExecution: 41 | # - labelSelector: 42 | # matchExpressions: 43 | # - key: app 44 | # operator: In 45 | # values: 46 | # - kafka-mirror-tester-consumer 47 | # topologyKey: "kubernetes.io/hostname" 48 | # - labelSelector: 49 | # matchExpressions: 50 | # - key: app 51 | # operator: In 52 | # values: 53 | # - kafka-destination 54 | # namespaces: 55 | # - kafka-destination 56 | # topologyKey: "kubernetes.io/hostname" 57 | # - labelSelector: 58 | # matchExpressions: 59 | # - key: app 60 | # operator: In 61 | # values: 62 | # - ureplicator 63 | # - key: 
component 64 | # operator: In 65 | # values: 66 | # - worker 67 | # namespaces: 68 | # - ureplicator 69 | # topologyKey: "kubernetes.io/hostname" 70 | --- 71 | # Headless service just for the sake of exposing the metrics 72 | apiVersion: v1 73 | kind: Service 74 | metadata: 75 | name: kafka-mirror-tester-consumer 76 | labels: 77 | app: kafka-mirror-tester-consumer 78 | spec: 79 | ports: 80 | - name: metrics 81 | port: 8000 82 | clusterIP: None 83 | selector: 84 | app: kafka-mirror-tester-consumer 85 | --- 86 | apiVersion: monitoring.coreos.com/v1 87 | kind: ServiceMonitor 88 | metadata: 89 | labels: 90 | k8s-app: kafka-mirror-tester-consumer 91 | name: kafka-mirror-tester-consumer 92 | namespace: monitoring 93 | spec: 94 | endpoints: 95 | - port: metrics 96 | jobLabel: k8s-app 97 | namespaceSelector: 98 | matchNames: 99 | - default 100 | selector: 101 | matchLabels: 102 | app: kafka-mirror-tester-consumer 103 | -------------------------------------------------------------------------------- /k8s/tester/producer.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Deployment 3 | metadata: 4 | name: kafka-mirror-tester-producer 5 | labels: 6 | app: kafka-mirror-tester-producer 7 | spec: 8 | replicas: 8 9 | selector: 10 | matchLabels: 11 | app: kafka-mirror-tester-producer 12 | template: 13 | metadata: 14 | labels: 15 | app: kafka-mirror-tester-producer 16 | spec: 17 | containers: 18 | - name: producer 19 | image: rantav/kafka-mirror-tester:latest 20 | imagePullPolicy: Always 21 | env: 22 | - name: ID 23 | valueFrom: 24 | fieldRef: 25 | fieldPath: status.podIP 26 | args: 27 | - produce 28 | - --bootstrap-servers 29 | - broker.kafka-source.svc.cluster.local:9092 30 | - --id 31 | - $(ID) 32 | - --message-size 33 | - "1000" 34 | - --throughput 35 | - "20000" 36 | - --topics 37 | - topic0 38 | - --retention 39 | - "300000" 40 | - --num-partitions 41 | - "64" 42 | - --num-replicas 43 | - "2" 44 | ports: 45 | - name: metrics 46 | containerPort: 8001 47 | #affinity: 48 | #podAntiAffinity: 49 | #requiredDuringSchedulingIgnoredDuringExecution: 50 | #- labelSelector: 51 | #matchExpressions: 52 | #- key: app 53 | #operator: In 54 | #values: 55 | #- kafka-mirror-tester-producer 56 | #topologyKey: "kubernetes.io/hostname" 57 | #- labelSelector: 58 | #matchExpressions: 59 | #- key: app 60 | #operator: In 61 | #values: 62 | #- kafka-source 63 | #namespaces: 64 | #- kafka-source 65 | #topologyKey: "kubernetes.io/hostname" 66 | --- 67 | # Headless service just for the sake of exposing the metrics 68 | apiVersion: v1 69 | kind: Service 70 | metadata: 71 | name: kafka-mirror-tester-producer 72 | labels: 73 | app: kafka-mirror-tester-producer 74 | spec: 75 | ports: 76 | - name: metrics 77 | port: 8001 78 | clusterIP: None 79 | selector: 80 | app: kafka-mirror-tester-producer 81 | --- 82 | apiVersion: monitoring.coreos.com/v1 83 | kind: ServiceMonitor 84 | metadata: 85 | labels: 86 | k8s-app: kafka-mirror-tester-producer 87 | name: kafka-mirror-tester-producer 88 | namespace: monitoring 89 | spec: 90 | endpoints: 91 | - port: metrics 92 | jobLabel: k8s-app 93 | namespaceSelector: 94 | matchNames: 95 | - default 96 | selector: 97 | matchLabels: 98 | app: kafka-mirror-tester-producer 99 | -------------------------------------------------------------------------------- /k8s/ureplicator/00namespace.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | 
name: ureplicator 5 | -------------------------------------------------------------------------------- /k8s/ureplicator/20zookeeper.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | namespace: ureplicator 5 | name: zk-hs 6 | labels: 7 | app: zk 8 | spec: 9 | ports: 10 | - port: 2888 11 | name: server 12 | - port: 3888 13 | name: leader-election 14 | clusterIP: None 15 | selector: 16 | app: zk 17 | --- 18 | apiVersion: v1 19 | kind: Service 20 | metadata: 21 | namespace: ureplicator 22 | name: zookeeper 23 | labels: 24 | app: zk 25 | spec: 26 | ports: 27 | - port: 2181 28 | name: client 29 | selector: 30 | app: zk 31 | --- 32 | apiVersion: policy/v1beta1 33 | kind: PodDisruptionBudget 34 | metadata: 35 | namespace: ureplicator 36 | name: zk-pdb 37 | spec: 38 | selector: 39 | matchLabels: 40 | app: zk 41 | maxUnavailable: 1 42 | --- 43 | apiVersion: apps/v1beta1 44 | kind: StatefulSet 45 | metadata: 46 | namespace: ureplicator 47 | name: zk 48 | spec: 49 | selector: 50 | matchLabels: 51 | app: zk 52 | serviceName: zk-hs 53 | replicas: 1 54 | updateStrategy: 55 | type: RollingUpdate 56 | podManagementPolicy: Parallel 57 | template: 58 | metadata: 59 | labels: 60 | app: zk 61 | spec: 62 | affinity: 63 | podAntiAffinity: 64 | requiredDuringSchedulingIgnoredDuringExecution: 65 | - labelSelector: 66 | matchExpressions: 67 | - key: "app" 68 | operator: In 69 | values: 70 | - zk-hs 71 | topologyKey: "kubernetes.io/hostname" 72 | containers: 73 | - name: kubernetes-zookeeper 74 | imagePullPolicy: Always 75 | image: "k8s.gcr.io/kubernetes-zookeeper:1.0-3.4.10" 76 | resources: 77 | requests: 78 | memory: "1Gi" 79 | cpu: "0.5" 80 | ports: 81 | - containerPort: 2181 82 | name: client 83 | - containerPort: 2888 84 | name: server 85 | - containerPort: 3888 86 | name: leader-election 87 | command: 88 | - sh 89 | - -c 90 | - "start-zookeeper \ 91 | --servers=1 \ 92 | --data_dir=/var/lib/zookeeper/data \ 93 | --data_log_dir=/var/lib/zookeeper/data/log \ 94 | --conf_dir=/opt/zookeeper/conf \ 95 | --client_port=2181 \ 96 | --election_port=3888 \ 97 | --server_port=2888 \ 98 | --tick_time=2000 \ 99 | --init_limit=10 \ 100 | --sync_limit=5 \ 101 | --heap=512M \ 102 | --max_client_cnxns=200 \ 103 | --snap_retain_count=3 \ 104 | --purge_interval=12 \ 105 | --max_session_timeout=40000 \ 106 | --min_session_timeout=4000 \ 107 | --log_level=INFO" 108 | readinessProbe: 109 | exec: 110 | command: 111 | - sh 112 | - -c 113 | - "zookeeper-ready 2181" 114 | initialDelaySeconds: 10 115 | timeoutSeconds: 5 116 | livenessProbe: 117 | exec: 118 | command: 119 | - sh 120 | - -c 121 | - "zookeeper-ready 2181" 122 | initialDelaySeconds: 10 123 | timeoutSeconds: 5 124 | volumeMounts: 125 | - name: data 126 | mountPath: /var/lib/zookeeper 127 | volumes: 128 | - name: data 129 | emptyDir: {} 130 | securityContext: 131 | runAsUser: 1000 132 | fsGroup: 1000 133 | -------------------------------------------------------------------------------- /k8s/ureplicator/25env-config.yml.tmpl: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: ureplicator-envs 5 | namespace: ureplicator 6 | data: 7 | SRC_ZK_CONNECT: __SRC_ZK_CONNECT__ 8 | CONSUMER_GROUP_ID: ureplicator 9 | HELIX_CLUSTER_NAME: ureplicator 10 | HELIX_ENV: test.eu1 11 | HELIX_ZK_CONNECT: zookeeper.ureplicator.svc.cluster.local:2181 12 | HELIX_ZK_ADDRESS: zookeeper.ureplicator.svc.cluster.local 13 | 
HELIX_ZK_PORT: '2181' 14 | DST_ZK_CONNECT: zookeeper.kafka-destination.svc.cluster.local:2181 15 | DST_BOOTSTRAP_SERVERS: broker.kafka-destination.svc.cluster.local:9092 16 | WORKER_ABORT_ON_SEND_FAILURE: 'true' 17 | GRAPHITE_HOST: prometheus-graphite-exporter.monitoring.svc.cluster.local 18 | GRAPHITE_PORT: "9109" 19 | FETCH_MESSAGE_MAX_BYTES: "10485760" 20 | SOCKET_RECEIVE_BUFFER_BYTES: "10485760" 21 | NUM_CONSUMER_FETCHERS: "1" 22 | PROD_COMPRESSION_TYPE: none # none, gzip, snappy, lz4 23 | PROD_LINGER_MS: "1000" 24 | PROD_SEND_BUFFER_BYTES: "10485760" 25 | PROD_MAX_REQUEST_SIZE: "10485760" 26 | PROD_MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION: "10" 27 | JAVA_OPTS: -javaagent:/jmx_prometheus_javaagent-0.3.1.jar=8080:/etc/jmx-config/jmx-prometheus-javaagent-config.yml 28 | -------------------------------------------------------------------------------- /k8s/ureplicator/25jmx-prometheus-javaagent-config.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: ureplicator-jmx-prometheus-javaagent-config 5 | namespace: ureplicator 6 | data: 7 | jmx-prometheus-javaagent-config.yml: |+ 8 | startDelaySeconds: 0 9 | lowercaseOutputName: true 10 | lowercaseOutputLabelNames: true 11 | whitelistObjectNames: ["java.lang:*"] 12 | rules: 13 | - pattern: 'java.lang(.+): .*' 14 | name: java_lang_Memory_HeapMemoryUsage_$1 15 | - pattern: 'java.lang(.+): .*' 16 | name: java_lang_Memory_NonHeapMemoryUsage_$1 17 | - pattern: 'java.lang<.*>OpenFileDescriptorCount: .*' 18 | name: java_lang_OperatingSystem_OpenFileDescriptorCount 19 | - pattern: 'java.lang<.*>ProcessCpuLoad: .*' 20 | name: java_lang_OperatingSystem_ProcessCpuLoad 21 | - pattern: 'java.lang<(.*)>ThreadCount: .*' 22 | name: java_lang_Threading_ThreadCount 23 | -------------------------------------------------------------------------------- /k8s/ureplicator/30ureplicator.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1beta2 2 | kind: Deployment 3 | metadata: 4 | namespace: ureplicator 5 | name: ureplicator-controller 6 | labels: 7 | app: ureplicator 8 | component: controller 9 | spec: 10 | replicas: 1 11 | selector: 12 | matchLabels: 13 | app: ureplicator 14 | component: controller 15 | template: 16 | metadata: 17 | labels: 18 | app: ureplicator 19 | component: controller 20 | spec: 21 | terminationGracePeriodSeconds: 10 22 | initContainers: 23 | - name: init-zk 24 | image: busybox 25 | command: 26 | - /bin/sh 27 | - -c 28 | - 'until [ "imok" = "$(echo ruok | nc -w 1 $HELIX_ZK_ADDRESS $HELIX_ZK_PORT)" ] ; do echo waiting ; sleep 1 ; done' 29 | env: 30 | - name: SERVICE_TYPE 31 | value: "init" 32 | envFrom: 33 | - configMapRef: 34 | name: ureplicator-envs 35 | containers: 36 | - name: ureplicator-controller 37 | image: rantav/ureplicator:1c1677d 38 | env: 39 | - name: SERVICE_CMD 40 | value: "start-controller.sh" 41 | - name: SERVICE_TYPE 42 | value: "controller" 43 | envFrom: 44 | - configMapRef: 45 | name: ureplicator-envs 46 | ports: 47 | - name: api-port 48 | containerPort: 9000 49 | - name: metrics 50 | containerPort: 8080 51 | livenessProbe: 52 | httpGet: 53 | path: /health 54 | port: api-port 55 | initialDelaySeconds: 120 56 | timeoutSeconds: 10 57 | readinessProbe: 58 | httpGet: 59 | path: /health 60 | port: api-port 61 | initialDelaySeconds: 120 62 | timeoutSeconds: 10 63 | resources: 64 | requests: 65 | cpu: 1000m 66 | memory: 3000Mi 67 | limits: 68 | cpu: 1000m 69 | memory: 3000Mi 70 | 
volumeMounts: 71 | - name: tmp 72 | mountPath: /tmp/uReplicator-controller 73 | - name: jmx-config 74 | mountPath: /etc/jmx-config 75 | volumes: 76 | - name: tmp 77 | emptyDir: {} 78 | - name: jmx-config 79 | configMap: 80 | name: ureplicator-jmx-prometheus-javaagent-config 81 | --- 82 | apiVersion: extensions/v1beta1 83 | kind: Deployment 84 | metadata: 85 | namespace: ureplicator 86 | name: ureplicator-worker 87 | labels: 88 | app: ureplicator 89 | component: worker 90 | spec: 91 | replicas: 1 92 | selector: 93 | matchLabels: 94 | app: ureplicator 95 | component: worker 96 | template: 97 | metadata: 98 | labels: 99 | app: ureplicator 100 | component: worker 101 | spec: 102 | terminationGracePeriodSeconds: 10 103 | initContainers: 104 | - name: init-zk 105 | image: busybox 106 | command: 107 | - /bin/sh 108 | - -c 109 | - 'until [ "imok" = "$(echo ruok | nc -w 1 $HELIX_ZK_ADDRESS $HELIX_ZK_PORT)" ] ; do echo waiting ; sleep 10 ; done' 110 | envFrom: 111 | - configMapRef: 112 | name: ureplicator-envs 113 | containers: 114 | - name: ureplicator-worker 115 | image: rantav/ureplicator:1c1677d 116 | imagePullPolicy: Always 117 | env: 118 | - name: SERVICE_TYPE 119 | value: "worker" 120 | - name: SERVICE_CMD 121 | value: "start-worker.sh" 122 | envFrom: 123 | - configMapRef: 124 | name: ureplicator-envs 125 | ports: 126 | - name: metrics 127 | containerPort: 8080 128 | resources: 129 | requests: 130 | cpu: 800m 131 | memory: 3Gi 132 | #limits: 133 | #cpu: 1200m 134 | #memory: 6Gi 135 | volumeMounts: 136 | - name: jmx-config 137 | mountPath: /etc/jmx-config 138 | volumes: 139 | - name: jmx-config 140 | configMap: 141 | name: ureplicator-jmx-prometheus-javaagent-config 142 | affinity: 143 | podAntiAffinity: 144 | requiredDuringSchedulingIgnoredDuringExecution: 145 | - labelSelector: 146 | matchExpressions: 147 | - key: app 148 | operator: In 149 | values: 150 | - kafka-destination 151 | namespaces: 152 | - kafka-destination 153 | topologyKey: "kubernetes.io/hostname" 154 | #- labelSelector: 155 | #matchExpressions: 156 | #- key: app 157 | #operator: In 158 | #values: 159 | #- ureplicator 160 | #- key: component 161 | #operator: In 162 | #values: 163 | #- worker 164 | #topologyKey: "kubernetes.io/hostname" 165 | -------------------------------------------------------------------------------- /k8s/ureplicator/40monitoring.yml: -------------------------------------------------------------------------------- 1 | # Headless service just for the sake of exposing the metrics 2 | apiVersion: v1 3 | kind: Service 4 | metadata: 5 | name: ureplicator-controller 6 | namespace: ureplicator 7 | labels: 8 | app: ureplicator 9 | component: controller 10 | spec: 11 | ports: 12 | - name: metrics 13 | port: 8080 14 | clusterIP: None 15 | selector: 16 | app: ureplicator 17 | component: controller 18 | --- 19 | apiVersion: v1 20 | kind: Service 21 | metadata: 22 | name: ureplicator-worker 23 | namespace: ureplicator 24 | labels: 25 | app: ureplicator 26 | component: worker 27 | spec: 28 | ports: 29 | - name: metrics 30 | port: 8080 31 | clusterIP: None 32 | selector: 33 | app: ureplicator 34 | component: worker 35 | --- 36 | apiVersion: monitoring.coreos.com/v1 37 | kind: ServiceMonitor 38 | metadata: 39 | labels: 40 | k8s-app: ureplicator-controller 41 | name: ureplicator-controller 42 | namespace: monitoring 43 | spec: 44 | endpoints: 45 | - port: metrics 46 | jobLabel: k8s-app 47 | namespaceSelector: 48 | matchNames: 49 | - ureplicator 50 | selector: 51 | matchLabels: 52 | app: ureplicator 53 | component: 
controller 54 | --- 55 | apiVersion: monitoring.coreos.com/v1 56 | kind: ServiceMonitor 57 | metadata: 58 | labels: 59 | k8s-app: ureplicator-worker 60 | name: ureplicator-worker 61 | namespace: monitoring 62 | spec: 63 | endpoints: 64 | - port: metrics 65 | jobLabel: k8s-app 66 | namespaceSelector: 67 | matchNames: 68 | - ureplicator 69 | selector: 70 | matchLabels: 71 | app: ureplicator 72 | component: worker 73 | --- 74 | apiVersion: rbac.authorization.k8s.io/v1beta1 75 | kind: ClusterRole 76 | metadata: 77 | name: prometheus-k8s 78 | namespace: ureplicator 79 | rules: 80 | - apiGroups: [""] 81 | resources: 82 | - nodes 83 | - services 84 | - endpoints 85 | - pods 86 | verbs: ["get", "list", "watch"] 87 | - apiGroups: [""] 88 | resources: 89 | - configmaps 90 | verbs: ["get"] 91 | - nonResourceURLs: ["/metrics"] 92 | verbs: ["get"] 93 | --- 94 | apiVersion: rbac.authorization.k8s.io/v1beta1 95 | kind: ClusterRoleBinding 96 | metadata: 97 | name: prometheus-k8s 98 | roleRef: 99 | apiGroup: rbac.authorization.k8s.io 100 | kind: ClusterRole 101 | name: prometheus-k8s 102 | subjects: 103 | - kind: ServiceAccount 104 | name: prometheus-k8s 105 | namespace: monitoring 106 | -------------------------------------------------------------------------------- /k8s/ureplicator/template.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set +x 3 | 4 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 5 | 6 | cd $DIR 7 | 8 | until IP=$(kubectl --context us-east-1.k8s.local get node $(kubectl --context us-east-1.k8s.local -n kafka-source get po pzoo-source-0 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}') 9 | do 10 | echo "ZK on source isn't ready yet" 11 | sleep 5 12 | done 13 | 14 | sed "s/__SRC_ZK_CONNECT__/$IP:2181/" 25env-config.yml.tmpl > 25env-config.yml 15 | 16 | -------------------------------------------------------------------------------- /k8s/ureplicator/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -x 4 | # Test Kafka to see if a topic had been replicated 5 | kubectl --context eu-west-1.k8s.local -n kafka-destination wait --for=condition=Ready pod/kafka-destination-0 --timeout=-1s 6 | kubectl --context us-east-1.k8s.local -n kafka-source wait --for=condition=Ready pod/kafka-source-0 --timeout=-1s 7 | 8 | kubectl --context eu-west-1.k8s.local -n ureplicator wait --for=condition=Available deployment/ureplicator-worker --timeout=-1s 9 | kubectl --context eu-west-1.k8s.local -n ureplicator wait --for=condition=Available deployment/ureplicator-controller --timeout=-1s 10 | 11 | # Run end to end tests. Produce to the source cluster, consume from the destination cluster 12 | TOPIC="_test_replicator_$(date +%s)" 13 | kubectl --context us-east-1.k8s.local exec -n kafka-source kafka-source-0 -- bash -c "unset JMX_PORT; echo ' >>>>>>>>>>>>> REPLICATOR GREAT SUCCESS! 
<<<<<<<<<<<<<<<<' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $TOPIC" 14 | kubectl --context eu-west-1.k8s.local exec -n kafka-destination kafka-destination-0 -- bash -c "unset JMX_PORT; /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic $TOPIC --max-messages 1" 15 | -------------------------------------------------------------------------------- /lib/admin/topic.go: -------------------------------------------------------------------------------- 1 | package admin 2 | 3 | // Package admin is used for kafka's admin api 4 | 5 | import ( 6 | "context" 7 | "fmt" 8 | 9 | "github.com/confluentinc/confluent-kafka-go/kafka" 10 | log "github.com/sirupsen/logrus" 11 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 12 | ) 13 | 14 | // MustCreateTopic creates a new topic with the specified number of partitions. 15 | // If the topic already exists, fails silently 16 | // On error - simply panics 17 | func MustCreateTopic( 18 | ctx context.Context, 19 | brokers types.Brokers, 20 | topic types.Topic, 21 | partitions, 22 | replicas, 23 | retentionMs uint) { 24 | a, err := kafka.NewAdminClient(&kafka.ConfigMap{"bootstrap.servers": string(brokers)}) 25 | if err != nil { 26 | log.Fatalf("%+v", err) 27 | return 28 | } 29 | defer a.Close() 30 | 31 | res, err := a.CreateTopics( 32 | ctx, 33 | []kafka.TopicSpecification{ 34 | { 35 | Topic: string(topic), 36 | NumPartitions: int(partitions), 37 | ReplicationFactor: int(replicas), 38 | Config: map[string]string{ 39 | "retention.ms": fmt.Sprintf("%d", retentionMs), 40 | }, 41 | }, 42 | }) 43 | if err != nil { 44 | log.Fatalf("%+v", err) 45 | return 46 | } 47 | 48 | log.Infof("Topic create result: %v", res) 49 | } 50 | -------------------------------------------------------------------------------- /lib/cmd/cmd.go: -------------------------------------------------------------------------------- 1 | // Package cmd is a cli layer that takes care of cli args etc. 2 | // powereve by cobra https://github.com/spf13/cobra 3 | package cmd 4 | 5 | import ( 6 | "fmt" 7 | "os" 8 | 9 | "github.com/spf13/cobra" 10 | ) 11 | 12 | // cobra root cmd 13 | var rootCmd = &cobra.Command{ 14 | Use: "kafka-mirror-tester", 15 | Short: "Kafka mirror tester is a test tool for kafka mirroring", 16 | Long: `A high throughput producer and consumer that stress kafka and validate message consumption order and latency.`, 17 | } 18 | 19 | // Execute is the main entry point for the CLI, using cobra lib. 
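// A minimal usage sketch. The flag names are defined in produce.go and consume.go below;
// the values shown here only mirror k8s/tester/producer.yaml and consumer.yaml and are
// purely illustrative:
//
//	kafka-mirror-tester produce --id "$(hostname)" \
//	  --bootstrap-servers broker.kafka-source.svc.cluster.local:9092 \
//	  --topics topic0 --message-size 1000 --throughput 20000 \
//	  --num-partitions 64 --num-replicas 2 --retention 300000
//
//	kafka-mirror-tester consume \
//	  --bootstrap-servers broker.kafka-destination.svc.cluster.local:9092 \
//	  --consumer-group group-1 --topics topic0 \
//	  --num-partitions 64 --num-replicas 2 --retention 300000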
20 | func Execute() { 21 | if err := rootCmd.Execute(); err != nil { 22 | fmt.Println(err) 23 | os.Exit(1) 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /lib/cmd/consume.go: -------------------------------------------------------------------------------- 1 | package cmd 2 | 3 | import ( 4 | "context" 5 | "strings" 6 | 7 | "github.com/spf13/cobra" 8 | "github.com/appsflyer/kafka-mirror-tester/lib/admin" 9 | "github.com/appsflyer/kafka-mirror-tester/lib/consumer" 10 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 11 | ) 12 | 13 | var ( 14 | cTopics *string 15 | cBootstraServers *string 16 | consumerGroup *string 17 | cUseMessageHeaders *bool 18 | cNumPartitions *uint 19 | cNumReplicas *uint 20 | cRetention *uint 21 | ) 22 | 23 | // consumeCmd represents the consume command 24 | var consumeCmd = &cobra.Command{ 25 | Use: "consume", 26 | Short: "Consume messages from kafka and aggregate results", 27 | Long: `Consumes messages from kafka and collects statistics about them. 28 | Namely latency statistics and sequence number bookeeping.`, 29 | Run: func(cmd *cobra.Command, args []string) { 30 | ctx := context.Background() 31 | brokers := types.Brokers(*cBootstraServers) 32 | ts := types.Topics(strings.Split(*cTopics, ",")) 33 | initialSequence := types.SequenceNumber(0) 34 | cg := types.ConsumerGroup(*consumerGroup) 35 | for _, t := range ts { 36 | admin.MustCreateTopic(ctx, brokers, types.Topic(t), *cNumPartitions, *cNumReplicas, *cRetention) 37 | } 38 | consumer.ConsumeAndAnalyze(ctx, brokers, ts, cg, initialSequence, *cUseMessageHeaders) 39 | }, 40 | } 41 | 42 | func init() { 43 | rootCmd.AddCommand(consumeCmd) 44 | 45 | cTopics = consumeCmd.Flags().String("topics", "", "List of topics to consume from (coma separated)") 46 | consumeCmd.MarkFlagRequired("topics") 47 | cBootstraServers = consumeCmd.Flags().String("bootstrap-servers", "", "List of host:port bootstrap servers (coma separated)") 48 | consumeCmd.MarkFlagRequired("bootstrap-servers") 49 | consumerGroup = consumeCmd.Flags().String("consumer-group", "", "The kafka consumer group name") 50 | consumeCmd.MarkFlagRequired("consumer-group") 51 | cUseMessageHeaders = consumeCmd.Flags().Bool("use-message-headers", false, "Whether to use message headers to pass metadata or use the payload instead") 52 | cNumPartitions = consumeCmd.Flags().Uint("num-partitions", 1, "Number of partitions to create per each topic (if the topics are new)") 53 | cNumReplicas = consumeCmd.Flags().Uint("num-replicas", 1, "Number of replicas to create per each topic (if the topics are new)") 54 | cRetention = consumeCmd.Flags().Uint("retention", 30000, "Data retention for the created topics. 
In ms.") 55 | } 56 | -------------------------------------------------------------------------------- /lib/cmd/produce.go: -------------------------------------------------------------------------------- 1 | package cmd 2 | 3 | import ( 4 | "github.com/spf13/cobra" 5 | 6 | "github.com/appsflyer/kafka-mirror-tester/lib/producer" 7 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 8 | ) 9 | 10 | var ( 11 | // CLI args 12 | producerID *string 13 | pTopics *string 14 | throughput *uint 15 | messageSize *uint 16 | pBootstraServers *string 17 | pUseMessageHeaders *bool 18 | pNumPartitions *uint 19 | pNumReplicas *uint 20 | pRetention *uint 21 | ) 22 | 23 | // produceCmd represents the produce command 24 | var produceCmd = &cobra.Command{ 25 | Use: "produce", 26 | Short: "Produce messages to kafka", 27 | Long: `The producer is a high-throughput kafka message producer. 28 | It sends sequence numbered and timestamped messages to kafka where by the consumer reads and validates. `, 29 | Run: func(cmd *cobra.Command, args []string) { 30 | brokers := types.Brokers(*pBootstraServers) 31 | id := types.ProducerID(*producerID) 32 | through := types.Throughput(*throughput) 33 | size := types.MessageSize(*messageSize) 34 | initialSequence := types.SequenceNumber(0) 35 | producer.ProduceToTopics(brokers, id, through, size, initialSequence, *pTopics, *pNumPartitions, *pNumReplicas, *pUseMessageHeaders, *pRetention) 36 | }, 37 | } 38 | 39 | func init() { 40 | rootCmd.AddCommand(produceCmd) 41 | producerID = produceCmd.Flags().String("id", "", "ID of the producer. You can use the hostname command") 42 | produceCmd.MarkFlagRequired("id") 43 | pTopics = produceCmd.Flags().String("topics", "", "List of topics to produce to (coma separated)") 44 | produceCmd.MarkFlagRequired("topics") 45 | throughput = produceCmd.Flags().Uint("throughput", 0, "Number of messages to send to each topic per second") 46 | produceCmd.MarkFlagRequired("throughput") 47 | messageSize = produceCmd.Flags().Uint("message-size", 0, "Message size to send (in bytes)") 48 | produceCmd.MarkFlagRequired("message-size") 49 | pBootstraServers = produceCmd.Flags().String("bootstrap-servers", "", "List of host:port bootstrap servers (coma separated)") 50 | produceCmd.MarkFlagRequired("bootstrap-servers") 51 | pUseMessageHeaders = produceCmd.Flags().Bool("use-message-headers", false, "Whether to use message headers to pass metadata or use the payload instead") 52 | pNumPartitions = produceCmd.Flags().Uint("num-partitions", 1, "Number of partitions to create per each topic (if the topics are new)") 53 | pNumReplicas = produceCmd.Flags().Uint("num-replicas", 1, "Number of replicas to create per each topic (if the topics are new)") 54 | pRetention = produceCmd.Flags().Uint("retention", 30000, "Data retention for the created topics. 
In ms.") 55 | } 56 | -------------------------------------------------------------------------------- /lib/consumer/consumer.go: -------------------------------------------------------------------------------- 1 | // Package consumer implements the consumption, performance measurement and validation logic of the test 2 | package consumer 3 | 4 | import ( 5 | "context" 6 | "fmt" 7 | "os" 8 | "os/signal" 9 | "syscall" 10 | 11 | "github.com/confluentinc/confluent-kafka-go/kafka" 12 | log "github.com/sirupsen/logrus" 13 | 14 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 15 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 16 | ) 17 | 18 | const ( 19 | 20 | // kafka consumer session timeout 21 | sessionTimeoutMs = 6000 22 | 23 | // For the purpose of performance monitoring we always want to start with the latest messages 24 | autoOffsetReset = "latest" 25 | ) 26 | 27 | // clientID is a friendly name for the client so that monitoring tool know who we are. 28 | var clientID string 29 | 30 | func init() { 31 | 32 | //log.SetLevel(log.TraceLevel) 33 | 34 | hostname, err := os.Hostname() 35 | if err != nil { 36 | log.Fatalf("Can't get hostname %+v", err) 37 | } 38 | clientID = fmt.Sprintf("kafka-mirror-tester-%s", hostname) 39 | } 40 | 41 | // ConsumeAndAnalyze consumes messages from the kafka topic and analyzes their correctness and performance. 42 | // The function blocks forever (or until the context is cancled, or until a signal is sent) 43 | func ConsumeAndAnalyze( 44 | ctx context.Context, 45 | brokers types.Brokers, 46 | topics types.Topics, 47 | group types.ConsumerGroup, 48 | initialSequence types.SequenceNumber, 49 | useMessageHeaders bool, 50 | ) { 51 | log.Infof("Starting the consumer. brokers=%s, topics=%s group=%s initialSequence=%d", 52 | brokers, topics, group, initialSequence) 53 | c, err := kafka.NewConsumer(&kafka.ConfigMap{ 54 | "bootstrap.servers": string(brokers), 55 | "group.id": string(group), 56 | "session.timeout.ms": sessionTimeoutMs, 57 | "go.events.channel.enable": true, 58 | "go.application.rebalance.enable": true, 59 | "client.id": clientID, 60 | // Enable generation of PartitionEOF when the 61 | // end of a partition is reached. 62 | "enable.partition.eof": true, 63 | "auto.offset.reset": autoOffsetReset, 64 | }) 65 | if err != nil { 66 | log.Fatalf("Failed to create consumer: %s\n", err) 67 | } 68 | defer c.Close() 69 | log.Debugf("Created Consumer %v\n", c) 70 | 71 | err = c.SubscribeTopics([]string(topics), nil) 72 | 73 | if err != nil { 74 | log.Fatalf("Failed to subscribe to topics %s: %s\n", topics, err) 75 | } 76 | 77 | serveConsumerUI() 78 | 79 | consumeForever(ctx, c, initialSequence, useMessageHeaders) 80 | } 81 | 82 | // loops through the kafka consumer channel and consumes all events 83 | // The loop runs forever until the context is cancled or a signal is sent (SIGINT or SIGTERM) 84 | func consumeForever( 85 | ctx context.Context, 86 | c *kafka.Consumer, 87 | initialSequence types.SequenceNumber, 88 | useMessageHeaders bool, 89 | ) { 90 | sigchan := make(chan os.Signal, 1) 91 | signal.Notify(sigchan, syscall.SIGINT, syscall.SIGTERM) 92 | 93 | for { 94 | select { 95 | case sig := <-sigchan: 96 | log.Infof("Caught signal %v: terminating", sig) 97 | // TODO: Write a summary message to the console before quitting. 98 | return 99 | case <-ctx.Done(): 100 | log.Infof("Done. 
%s", ctx.Err()) 101 | return 102 | case ev := <-c.Events(): 103 | // Most events are typically juse messages, still we are also interested in 104 | // Partition changes, EOF and Errors 105 | switch e := ev.(type) { 106 | case kafka.AssignedPartitions: 107 | log.Infof("AssignedPartitions %v", e) 108 | c.Assign(e.Partitions) 109 | case kafka.RevokedPartitions: 110 | log.Infof("RevokedPartitions %v", e) 111 | c.Unassign() 112 | case *kafka.Message: 113 | processMessage(e, useMessageHeaders) 114 | case kafka.PartitionEOF: 115 | log.Debugf("PartitionEOF Reached %v", e) 116 | case kafka.Error: 117 | // Errors should generally be considered as informational, the client will try to automatically recover 118 | log.Errorf("Error: %+v", e) 119 | } 120 | } 121 | } 122 | } 123 | 124 | // Process a single message, keeping track of latency data and sequence numbers. 125 | func processMessage( 126 | msg *kafka.Message, 127 | useMessageHeaders bool, 128 | ) { 129 | data := message.Extract(msg, useMessageHeaders) 130 | log.Tracef("Data: %s", data) 131 | validateSequence(data) 132 | collectThroughput(data) 133 | collectLatencyStats(data) 134 | } 135 | -------------------------------------------------------------------------------- /lib/consumer/performance.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "github.com/jamiealquiza/tachymeter" 5 | "github.com/prometheus/client_golang/prometheus" 6 | 7 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 8 | ) 9 | 10 | const ( 11 | // Define a sample size of 500. This affects memory consumption v/s precision. 12 | // It's probably OK to increase this number by a lot but didn't test it yet. 13 | tachymeterSampleSize = 500 14 | ) 15 | 16 | var ( 17 | // We use two tools to measure the time performance. 18 | // One is useful due to it's interface with prometheus and the other is useful as a CLI interface. 19 | 20 | // Prometheus 21 | latencyHistogram prometheus.Histogram 22 | 23 | // And this one has a mice text UI. 24 | tachymeterHistogram *tachymeter.Tachymeter 25 | 26 | // Define the time windows in which metrics are aggregater for. 27 | // How to read this? "20s1s" means a chart will be displayed for 20 seconds and each 28 | // item in this chart is a 1 second average. 29 | tachymeterMeasurementWindows = []string{"20s1s", "1m1s", "2m1s", "15m30s", "1h1m"} 30 | ) 31 | 32 | func init() { 33 | tachymeterHistogram = tachymeter.New(&tachymeter.Config{Size: tachymeterSampleSize}) 34 | } 35 | 36 | // Collect the latency stats from the data into the various counters. 
37 | func collectLatencyStats(data *message.Data) { 38 | latencyHistogram.Observe(float64(data.LatencyMS())) 39 | tachymeterHistogram.AddTime(data.Latency) 40 | } 41 | -------------------------------------------------------------------------------- /lib/consumer/sequences.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "fmt" 5 | "sync/atomic" 6 | 7 | "github.com/prometheus/client_golang/prometheus" 8 | log "github.com/sirupsen/logrus" 9 | 10 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 11 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 12 | ) 13 | 14 | var ( 15 | // Map of producer,topic,key -> latest sequence number recieved from this key 16 | receivedSequenceNumbers map[string]types.SequenceNumber 17 | 18 | // For each measurement there are two kind of counters, one is a simple counter that 19 | // simply keeps count of how many such events occured. 20 | // And the other is a prometheus.Counter instance which measures temporary values of that count 21 | // (e.g. last minute, last 15 minutes etc) 22 | sameMessagesCounter prometheus.Counter 23 | sameMessagesCount uint64 24 | oldMessagesCounter prometheus.Counter 25 | oldMessagesCount uint64 26 | inOrderMessagesCounter prometheus.Counter 27 | inOrderMessagesCount uint64 28 | skippedMessagesCounter prometheus.Counter 29 | skippedMessagesCount uint64 30 | ) 31 | 32 | func init() { 33 | receivedSequenceNumbers = make(map[string]types.SequenceNumber) 34 | } 35 | 36 | // For each message validates that the sequence numnber that corresponds to the producer and the topic 37 | // are in order. 38 | // If they are not in order, will log it and accumulate in counters. 39 | // The function accesses some global varialbe that aren't thread safe (receivedSequenceNumbers) which makes the function not thread safe by itself. 40 | func validateSequence(data *message.Data) { 41 | seq := data.Sequence 42 | key := createSeqnenceNumberKey(data.ProducerID, data.Topic, data.MessageKey) 43 | latestSeq, exists := receivedSequenceNumbers[key] 44 | if !exists { 45 | // key not found, let's insert it first 46 | if seq != 0 { 47 | log.Infof("Received initial sequence number > 0. topic=%s producer=%s key=%d number=%d", 48 | data.Topic, data.ProducerID, data.MessageKey, data.Sequence) 49 | } 50 | receivedSequenceNumbers[key] = seq 51 | log.Tracef("Message received first of it's producer-topic: %s", data) 52 | inOrderMessagesCounter.Add(1) 53 | atomic.AddUint64(&inOrderMessagesCount, 1) 54 | return 55 | } 56 | 57 | switch { 58 | case seq == latestSeq: 59 | // Same message twice? That's OK, let's just log it 60 | log.Debugf("Received the same message again: %s", data) 61 | sameMessagesCounter.Add(1) 62 | atomic.AddUint64(&sameMessagesCount, 1) 63 | case seq < latestSeq: 64 | // Received an old message 65 | log.Debugf("Received old data. Current seq=%d, but received %s", latestSeq, data) 66 | oldMessagesCounter.Add(1) 67 | atomic.AddUint64(&oldMessagesCount, 1) 68 | case seq == latestSeq+1: 69 | // That's just perfect! 70 | log.Tracef("Message received in order %s", data) 71 | inOrderMessagesCounter.Add(1) 72 | atomic.AddUint64(&inOrderMessagesCount, 1) 73 | case seq > latestSeq+1: 74 | // skipped a few sequences :-( 75 | howMany := seq - latestSeq 76 | log.Debugf("Skipped a few messages (%d messages). 
Current seq=%d, received %s", 77 | howMany, latestSeq, data) 78 | skippedMessagesCounter.Add(float64(howMany)) 79 | atomic.AddUint64(&skippedMessagesCount, uint64(howMany)) 80 | } 81 | receivedSequenceNumbers[key] = seq 82 | } 83 | 84 | // create a key for the sequence number map 85 | func createSeqnenceNumberKey(pid types.ProducerID, topic types.Topic, messageKey types.MessageKey) string { 86 | return fmt.Sprintf("%s:%s:%d", pid, topic, messageKey) 87 | } 88 | -------------------------------------------------------------------------------- /lib/consumer/sequences_test.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/stretchr/testify/assert" 7 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 8 | ) 9 | 10 | func TestValidateSequence(t *testing.T) { 11 | initPrometheus() 12 | assert := assert.New(t) 13 | 14 | // validate initial state 15 | assert.Equal(uint64(0), sameMessagesCount) 16 | assert.Equal(uint64(0), oldMessagesCount) 17 | assert.Equal(uint64(0), inOrderMessagesCount) 18 | assert.Equal(uint64(0), skippedMessagesCount) 19 | 20 | // Now start sending messages and observe counts 21 | data := &message.Data{ 22 | ProducerID: "1", 23 | Topic: "t", 24 | MessageKey: 1, 25 | Sequence: 0, 26 | } 27 | validateSequence(data) 28 | assert.Equal(uint64(0), sameMessagesCount) 29 | assert.Equal(uint64(0), oldMessagesCount) 30 | assert.Equal(uint64(1), inOrderMessagesCount) 31 | assert.Equal(uint64(0), skippedMessagesCount) 32 | 33 | // Same message again 34 | validateSequence(data) 35 | assert.Equal(uint64(1), sameMessagesCount) 36 | assert.Equal(uint64(0), oldMessagesCount) 37 | assert.Equal(uint64(1), inOrderMessagesCount) 38 | assert.Equal(uint64(0), skippedMessagesCount) 39 | 40 | // increase sequence 41 | data = &message.Data{ 42 | ProducerID: "1", 43 | Topic: "t", 44 | MessageKey: 1, 45 | Sequence: 1, 46 | } 47 | validateSequence(data) 48 | assert.Equal(uint64(1), sameMessagesCount) 49 | assert.Equal(uint64(0), oldMessagesCount) 50 | assert.Equal(uint64(2), inOrderMessagesCount) 51 | assert.Equal(uint64(0), skippedMessagesCount) 52 | 53 | // Send to a different topic 54 | data = &message.Data{ 55 | ProducerID: "1", 56 | Topic: "t2", 57 | MessageKey: 1, 58 | Sequence: 0, 59 | } 60 | validateSequence(data) 61 | assert.Equal(uint64(1), sameMessagesCount) 62 | assert.Equal(uint64(0), oldMessagesCount) 63 | assert.Equal(uint64(3), inOrderMessagesCount) 64 | assert.Equal(uint64(0), skippedMessagesCount) 65 | 66 | // Send from a different producer 67 | data = &message.Data{ 68 | ProducerID: "2", 69 | Topic: "t", 70 | MessageKey: 1, 71 | Sequence: 0, 72 | } 73 | validateSequence(data) 74 | assert.Equal(uint64(1), sameMessagesCount) 75 | assert.Equal(uint64(0), oldMessagesCount) 76 | assert.Equal(uint64(4), inOrderMessagesCount) 77 | assert.Equal(uint64(0), skippedMessagesCount) 78 | 79 | // Send with a different message key 80 | data = &message.Data{ 81 | ProducerID: "1", 82 | Topic: "t", 83 | MessageKey: 2, 84 | Sequence: 0, 85 | } 86 | validateSequence(data) 87 | assert.Equal(uint64(1), sameMessagesCount) 88 | assert.Equal(uint64(0), oldMessagesCount) 89 | assert.Equal(uint64(5), inOrderMessagesCount) 90 | assert.Equal(uint64(0), skippedMessagesCount) 91 | 92 | // Skip a few messages 93 | data = &message.Data{ 94 | ProducerID: "1", 95 | Topic: "t", 96 | MessageKey: 1, 97 | Sequence: 5, 98 | } 99 | validateSequence(data) 100 | assert.Equal(uint64(1), sameMessagesCount) 101 | 
assert.Equal(uint64(0), oldMessagesCount) 102 | assert.Equal(uint64(5), inOrderMessagesCount) 103 | assert.Equal(uint64(4), skippedMessagesCount) 104 | 105 | // Skip an old message 106 | data = &message.Data{ 107 | ProducerID: "1", 108 | Topic: "t", 109 | MessageKey: 1, 110 | Sequence: 2, 111 | } 112 | validateSequence(data) 113 | assert.Equal(uint64(1), sameMessagesCount) 114 | assert.Equal(uint64(1), oldMessagesCount) 115 | assert.Equal(uint64(5), inOrderMessagesCount) 116 | assert.Equal(uint64(4), skippedMessagesCount) 117 | } 118 | -------------------------------------------------------------------------------- /lib/consumer/throughput.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "sync/atomic" 5 | 6 | "github.com/prometheus/client_golang/prometheus" 7 | 8 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 9 | ) 10 | 11 | var ( 12 | bytesCounter prometheus.Counter 13 | bytesCount uint64 14 | messageCounter prometheus.Counter 15 | messageCount uint64 16 | ) 17 | 18 | // Count the total throughput (message count and byte count) 19 | func collectThroughput(data *message.Data) { 20 | bytes := data.TotalPayloadLength 21 | bytesCounter.Add(float64(bytes)) 22 | atomic.AddUint64(&bytesCount, bytes) 23 | messageCounter.Inc() 24 | atomic.AddUint64(&messageCount, 1) 25 | } 26 | -------------------------------------------------------------------------------- /lib/consumer/ui.go: -------------------------------------------------------------------------------- 1 | package consumer 2 | 3 | import ( 4 | "fmt" 5 | "net/http" 6 | "sync" 7 | "sync/atomic" 8 | "time" 9 | 10 | humanize "github.com/dustin/go-humanize" 11 | "github.com/prometheus/client_golang/prometheus" 12 | "github.com/prometheus/client_golang/prometheus/promauto" 13 | "github.com/prometheus/client_golang/prometheus/promhttp" 14 | ) 15 | 16 | const ( 17 | terminalReportingFrequency = 10 * time.Second 18 | ) 19 | 20 | var ( 21 | // once is used for one-time initialization that we don't want to embed in the init function. 22 | once sync.Once 23 | ) 24 | 25 | // Serve the different UIs for viewing metrics. 26 | func serveConsumerUI() { 27 | once.Do(func() { 28 | terminalUI() 29 | initPrometheus() 30 | }) 31 | } 32 | 33 | func initPrometheus() { 34 | latencyHistogram = promauto.NewHistogram(prometheus.HistogramOpts{ 35 | Name: "message_arrival_latency_hist_ms", 36 | Help: "Latency in ms for message arrival e2e (histogram).", 37 | Buckets: prometheus.ExponentialBuckets(1000, 2, 9), // 9 buckets: 1sec,2sec,4,8,16... 
38 | }) 39 | sameMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 40 | Name: "same_message_count", 41 | Help: "Number of times the same message was consumed.", 42 | }) 43 | oldMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 44 | Name: "old_message_count", 45 | Help: "Number of times an old message was consumed.", 46 | }) 47 | inOrderMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 48 | Name: "in_order_message_count", 49 | Help: "Number of times a message was received in order (this is the happy path).", 50 | }) 51 | skippedMessagesCounter = promauto.NewCounter(prometheus.CounterOpts{ 52 | Name: "skipped_message_count", 53 | Help: "Number of times a message was skipped.", 54 | }) 55 | messageCounter = promauto.NewCounter(prometheus.CounterOpts{ 56 | Name: "messages_consumed", 57 | Help: "Number of messages consumed from kafka.", 58 | }) 59 | bytesCounter = promauto.NewCounter(prometheus.CounterOpts{ 60 | Name: "bytes_consumed", 61 | Help: "Number of bytes consumed from kafka.", 62 | }) 63 | 64 | http.Handle("/metrics", promhttp.Handler()) 65 | go http.ListenAndServe(":8000", nil) 66 | } 67 | 68 | // Periodically emit statistics to the terminal. 69 | func terminalUI() { 70 | ticker := time.Tick(terminalReportingFrequency) 71 | const terminalWidth = 50 72 | var ( 73 | lastMessages uint64 74 | lastBytes uint64 75 | ) 76 | go func() { 77 | for { 78 | <-ticker 79 | messages := atomic.LoadUint64(&messageCount) 80 | bytes := atomic.LoadUint64(&bytesCount) 81 | reportingFrequencySec := uint64((terminalReportingFrequency / time.Second)) 82 | messageRate := int((messages - lastMessages) / reportingFrequencySec) 83 | bytesRate := uint64((bytes - lastBytes) / reportingFrequencySec) 84 | metrics := tachymeterHistogram.Calc() 85 | tachymeterHistogram.Reset() 86 | 87 | fmt.Printf("\n\n\n\tSTATS\n") 88 | //print a visual histogram of latencies 89 | fmt.Println(metrics.Histogram.String(terminalWidth)) 90 | // print statistics about latencies 91 | fmt.Println(metrics.String()) 92 | fmt.Printf("\nRead rate: %d messages/sec \t Byte rate: %s/sec \n", messageRate, humanize.Bytes(bytesRate)) 93 | fmt.Printf("\nsameMessagesCount=%d, oldMessagesCount=%d, inOrderMessagesCount=%d, skippedMessagesCount=%d", 94 | atomic.LoadUint64(&sameMessagesCount), 95 | atomic.LoadUint64(&oldMessagesCount), 96 | atomic.LoadUint64(&inOrderMessagesCount), 97 | atomic.LoadUint64(&skippedMessagesCount)) 98 | lastMessages = messages 99 | lastBytes = bytes 100 | } 101 | }() 102 | } 103 | -------------------------------------------------------------------------------- /lib/gen/main/code-gen.go: -------------------------------------------------------------------------------- 1 | // The following directive is necessary to make the package coherent: 2 | // +build ignore 3 | 4 | // This program generates code automatically and it should be run before build. 5 | // It can be invoked by running: go generate 6 | 7 | package main 8 | 9 | import ( 10 | "fmt" 11 | "html/template" 12 | "os" 13 | "strings" 14 | 15 | log "github.com/sirupsen/logrus" 16 | ) 17 | 18 | // max payload size 19 | 20 | const payloadLength = 1e6 21 | 22 | var messageConstTemplate = template.Must(template.New("").Parse(`// Code generated by go generate; DO NOT EDIT. 
23 | package message 24 | 25 | const payload = "{{ .Payload }}" 26 | `)) 27 | 28 | func main() { 29 | f, err := os.Create("message-no-headers-const-gen.go") 30 | if err != nil { 31 | log.Fatalf("Err: %+v", err) 32 | } 33 | defer f.Close() 34 | 35 | var b strings.Builder 36 | for i := 0; i < payloadLength; i++ { 37 | fmt.Fprintf(&b, "%d", i%10) 38 | } 39 | messageConstTemplate.Execute(f, struct { 40 | Payload string 41 | }{ 42 | Payload: b.String(), 43 | }) 44 | } 45 | -------------------------------------------------------------------------------- /lib/log.go: -------------------------------------------------------------------------------- 1 | package lib 2 | 3 | import log "github.com/sirupsen/logrus" 4 | 5 | // initialize logs 6 | func init() { 7 | log.SetFormatter(&log.TextFormatter{ 8 | ForceColors: true, 9 | FullTimestamp: true, 10 | }) 11 | } 12 | -------------------------------------------------------------------------------- /lib/message/data.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "fmt" 5 | "time" 6 | 7 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 8 | ) 9 | 10 | // Data represent the data sent in a message. 11 | type Data struct { 12 | ProducerID types.ProducerID 13 | MessageKey types.MessageKey 14 | Sequence types.SequenceNumber 15 | ProducerTimestamp time.Time 16 | ConsumerTimestamp time.Time 17 | Latency time.Duration // In nanoseconds 18 | Topic types.Topic 19 | // The actual payload (without metadata) 20 | Payload []byte 21 | // The total payload lenght, including metadata sent inside the payload 22 | TotalPayloadLength uint64 23 | } 24 | 25 | func (d Data) String() string { 26 | return fmt.Sprintf("message.Data[ProducerID=%s, MessageKey=%d, Topic=%s, Sequence=%d, Latency=%dms len(Payload)=%db]", 27 | d.ProducerID, d.MessageKey, d.Topic, d.Sequence, d.LatencyMS(), len(d.Payload)) 28 | } 29 | 30 | // LatencyMS returns the latency in ms 31 | func (d Data) LatencyMS() int64 { 32 | return int64(d.Latency / 1e6) 33 | } 34 | 35 | // Data parsed from the payload (when headers are not used) 36 | type parsedData struct { 37 | producerID types.ProducerID 38 | sequence types.SequenceNumber 39 | timestamp time.Time 40 | payload []byte 41 | } 42 | -------------------------------------------------------------------------------- /lib/message/message-no-headers.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "fmt" 5 | "strconv" 6 | "strings" 7 | "time" 8 | 9 | "github.com/pkg/errors" 10 | log "github.com/sirupsen/logrus" 11 | 12 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 13 | ) 14 | 15 | //go:generate go run ../gen/main/code-gen.go 16 | 17 | // Format a message based on the parameters 18 | func format( 19 | id types.ProducerID, 20 | seq types.SequenceNumber, 21 | timestamp time.Time, 22 | messageSize types.MessageSize, 23 | ) string { 24 | var b strings.Builder 25 | // build the header first 26 | fmt.Fprintf(&b, "%s;%d;%d;", id, seq, timestamp.UTC().UnixNano()) 27 | 28 | // See how much space left for payload and add chars based on the space left 29 | left := int(messageSize) - b.Len() 30 | if left > 0 { 31 | fmt.Fprintf(&b, payload[:left]) 32 | } 33 | return b.String() 34 | } 35 | 36 | // Parse parses the string message into the Data structure. 
37 | func parse(msg string) (data parsedData, err error) { 38 | parts := strings.Split(msg, ";") 39 | if len(parts) != 4 { 40 | err = errors.Errorf("msg should contain 4 parts but it doesn't. %s...", msg[:30]) 41 | return 42 | } 43 | 44 | data.producerID = types.ProducerID(parts[0]) 45 | sq, err := strconv.ParseInt(parts[1], 10, 64) 46 | if err != nil { 47 | err = errors.WithStack(err) 48 | return 49 | } 50 | 51 | data.sequence = types.SequenceNumber(sq) 52 | 53 | ts, err := parseTs(parts[2]) 54 | if err != nil { 55 | err = errors.WithStack(err) 56 | return 57 | } 58 | data.timestamp = ts 59 | 60 | data.payload = []byte(parts[3]) 61 | return 62 | } 63 | 64 | func parseTs(ts string) (time.Time, error) { 65 | i, err := strconv.ParseInt(ts, 10, 64) 66 | if err != nil { 67 | log.Fatalf("Malformed timestamp %s. %+v", ts, err) 68 | } 69 | nano := i % 1e9 70 | sec := i / 1e9 71 | t := time.Unix(sec, nano).UTC() 72 | return t, nil 73 | } 74 | -------------------------------------------------------------------------------- /lib/message/message-no-headers_test.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "testing" 5 | "time" 6 | 7 | "github.com/stretchr/testify/assert" 8 | "github.com/stretchr/testify/require" 9 | 10 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 11 | ) 12 | 13 | func TestFormat(t *testing.T) { 14 | 15 | assert := assert.New(t) 16 | now := time.Now() 17 | // Check length 18 | msg := format("1", 0, now, 100) 19 | assert.Equal(100, len(msg), "Length should be 100") 20 | 21 | // Check minimal length 22 | msg = format("1", 0, now, 1) 23 | assert.True(len(msg) > 1, "Length should be > 1") 24 | 25 | // Check very long messages 26 | msg = format("1", 0, now, 1e4) 27 | assert.Equal(int(1e4), len(msg), "Length should be 1e3") 28 | } 29 | 30 | func TestParse(t *testing.T) { 31 | 32 | assert := assert.New(t) 33 | now := time.Now() 34 | // Create a message 35 | msg := format("1", 0, now, 100) 36 | // Make sure at least one ms passed before parsing it 37 | time.Sleep(1 * time.Millisecond) 38 | parsed, err := parse(msg) 39 | 40 | now = time.Now() 41 | require.Nil(t, err, "There should not be an error") 42 | 43 | assert.Equal(types.ProducerID("1"), parsed.producerID, "ProducerID should be 1") 44 | assert.Equal(types.SequenceNumber(0), parsed.sequence, "Sequence should be 0") 45 | assert.True(parsed.timestamp.Before(now)) 46 | } 47 | 48 | // from fib_test.go 49 | func BenchmarkFormat(b *testing.B) { 50 | now := time.Now() 51 | for n := 0; n < b.N; n++ { 52 | format("xx", 5, now, 1000) 53 | } 54 | } 55 | -------------------------------------------------------------------------------- /lib/message/message.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "strconv" 5 | "time" 6 | 7 | "github.com/confluentinc/confluent-kafka-go/kafka" 8 | log "github.com/sirupsen/logrus" 9 | 10 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 11 | ) 12 | 13 | const ( 14 | // KeySequence identifies the sequence number header 15 | KeySequence = "seq" 16 | 17 | // KeyProducerID identifies the producer ID header 18 | KeyProducerID = "id" 19 | ) 20 | 21 | // Create a mew message with headers, timestamp and size. 22 | // Does not set TopicPartition. 
23 | func Create( 24 | producerID types.ProducerID, 25 | messageID types.MessageKey, 26 | seq types.SequenceNumber, 27 | size types.MessageSize, 28 | useMessageHeaders bool, 29 | ) *kafka.Message { 30 | ts := time.Now().UTC() 31 | msg := &kafka.Message{ 32 | Key: []byte(strconv.FormatUint(uint64(messageID), 10)), 33 | } 34 | if useMessageHeaders { 35 | msg.Timestamp = ts 36 | msg.TimestampType = kafka.TimestampCreateTime 37 | msg.Value = make([]byte, size) 38 | msg.Headers = []kafka.Header{ 39 | { 40 | Key: KeyProducerID, 41 | Value: []byte(producerID), 42 | }, 43 | { 44 | Key: KeySequence, 45 | Value: []byte(strconv.FormatInt(int64(seq), 10)), 46 | }, 47 | } 48 | } else { 49 | msg.Value = []byte(format(producerID, seq, ts, size)) 50 | } 51 | return msg 52 | } 53 | 54 | // Extract the data from the message and set timestamp and latencies 55 | func Extract( 56 | msg *kafka.Message, 57 | useMessageHeaders bool, 58 | ) *Data { 59 | now := time.Now().UTC() 60 | var topic types.Topic 61 | if msg.TopicPartition.Topic != nil { 62 | topic = types.Topic(*msg.TopicPartition.Topic) 63 | } else { 64 | topic = types.Topic("") 65 | } 66 | keyStr := string(msg.Key) 67 | ui, err := strconv.ParseUint(keyStr, 10, 64) 68 | if err != nil { 69 | log.Errorf("Malformed message key %s \t %s", keyStr, err) 70 | } 71 | key := types.MessageKey(ui) 72 | data := &Data{ 73 | ConsumerTimestamp: now, 74 | Topic: topic, 75 | TotalPayloadLength: uint64(len(msg.Value)), 76 | MessageKey: key, 77 | } 78 | if useMessageHeaders { 79 | data.ProducerID = getProducerID(msg) 80 | data.Sequence = getSequence(msg) 81 | data.ProducerTimestamp = msg.Timestamp 82 | data.Payload = msg.Value 83 | } else { 84 | parsed, err := parse(string(msg.Value)) 85 | if err != nil { 86 | log.Errorf("Error parsing message %s", string(msg.Value)) 87 | return data 88 | } 89 | data.ProducerID = parsed.producerID 90 | data.Sequence = parsed.sequence 91 | data.Payload = parsed.payload 92 | data.ProducerTimestamp = parsed.timestamp 93 | } 94 | data.Latency = data.ConsumerTimestamp.Sub(data.ProducerTimestamp) 95 | return data 96 | } 97 | 98 | func getProducerID(msg *kafka.Message) types.ProducerID { 99 | v := getHeader(msg, KeyProducerID) 100 | if v == nil { 101 | return types.ProducerID("") 102 | } 103 | return types.ProducerID(string(v)) 104 | } 105 | 106 | func getSequence(msg *kafka.Message) types.SequenceNumber { 107 | str := string(getHeader(msg, KeySequence)) 108 | if str == "" { 109 | return -1 110 | } 111 | i, err := strconv.ParseInt(str, 10, 64) 112 | if err != nil { 113 | log.Fatalf("Malformed Sequence Number %s. 
%+v", str, err) 114 | } 115 | return types.SequenceNumber(i) 116 | } 117 | 118 | func getHeader(msg *kafka.Message, key string) []byte { 119 | for _, h := range msg.Headers { 120 | if h.Key == key { 121 | return h.Value 122 | } 123 | } 124 | // header not found 125 | return nil 126 | } 127 | -------------------------------------------------------------------------------- /lib/message/message_test.go: -------------------------------------------------------------------------------- 1 | package message 2 | 3 | import ( 4 | "testing" 5 | "time" 6 | 7 | "github.com/stretchr/testify/assert" 8 | "github.com/stretchr/testify/require" 9 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 10 | ) 11 | 12 | func TestCreateAndExtractWithHeaders(t *testing.T) { 13 | msg := Create("1", 2, 5, 100, true) 14 | require.NotNil(t, msg, "Message should not be nil") 15 | 16 | // Make sure at least one ms passed before parsing it 17 | time.Sleep(1 * time.Millisecond) 18 | data := Extract(msg, true) 19 | require.NotNil(t, data, "Data should not be nil") 20 | 21 | assert := assert.New(t) 22 | assert.Equal(types.ProducerID("1"), data.ProducerID, "ProducerID should be 1") 23 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 24 | assert.Equal(types.SequenceNumber(5), data.Sequence, "Sequence number should be 5") 25 | assert.True(data.Latency > 1, "Latency should be > 1") 26 | } 27 | 28 | func TestCreateAndExtractWithHouteaders(t *testing.T) { 29 | msg := Create("1", 2, 5, 100, false) 30 | require.NotNil(t, msg, "Message should not be nil") 31 | 32 | // Make sure at least one ms passed before parsing it 33 | time.Sleep(1 * time.Millisecond) 34 | data := Extract(msg, false) 35 | require.NotNil(t, data, "Data should not be nil") 36 | 37 | assert := assert.New(t) 38 | assert.Equal(types.ProducerID("1"), data.ProducerID, "ProducerID should be 1") 39 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 40 | assert.Equal(types.SequenceNumber(5), data.Sequence, "Sequence number should be 5") 41 | assert.True(data.Latency > 1, "Latency should be > 1") 42 | } 43 | 44 | func TestMissingHeaderFields(t *testing.T) { 45 | msg := Create("1", 2, 5, 100, true) 46 | require.NotNil(t, msg, "Message should not be nil") 47 | msg.Headers = msg.Headers[1:] 48 | data := Extract(msg, true) 49 | require.NotNil(t, data, "Data should not be nil") 50 | 51 | assert := assert.New(t) 52 | assert.Equal(types.ProducerID(""), data.ProducerID, "ProducerID should be 1") 53 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 54 | assert.Equal(types.SequenceNumber(5), data.Sequence, "Sequence number should be 5") 55 | } 56 | 57 | func TestMissingHeaders(t *testing.T) { 58 | msg := Create("1", 2, 5, 100, true) 59 | require.NotNil(t, msg, "Message should not be nil") 60 | msg.Headers = nil 61 | data := Extract(msg, true) 62 | require.NotNil(t, data, "Data should not be nil") 63 | 64 | assert := assert.New(t) 65 | assert.Equal(types.ProducerID(""), data.ProducerID, "ProducerID should be 1") 66 | assert.Equal(types.MessageKey(2), data.MessageKey, "MessageKey should be 2") 67 | assert.Equal(types.SequenceNumber(-1), data.Sequence, "Sequence number should be 5") 68 | } 69 | -------------------------------------------------------------------------------- /lib/producer/monitor.go: -------------------------------------------------------------------------------- 1 | package producer 2 | 3 | import ( 4 | "context" 5 | "net/http" 6 | "sync/atomic" 7 | "time" 8 | 9 | 
"github.com/confluentinc/confluent-kafka-go/kafka" 10 | mapset "github.com/deckarep/golang-set" 11 | "github.com/dustin/go-humanize" 12 | "github.com/paulbellamy/ratecounter" 13 | "github.com/prometheus/client_golang/prometheus" 14 | "github.com/prometheus/client_golang/prometheus/promauto" 15 | "github.com/prometheus/client_golang/prometheus/promhttp" 16 | log "github.com/sirupsen/logrus" 17 | 18 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 19 | ) 20 | 21 | const monitoringFrequency = 5 * time.Second 22 | 23 | var ( 24 | // messageRateCounter is used in order to observe the actual throughput 25 | messageRateCounter *ratecounter.RateCounter 26 | messageCounter prometheus.Counter 27 | messageSendErrors prometheus.Counter 28 | 29 | // bytesRateCounter measures the actual throughput in bytes 30 | bytesRateCounter *ratecounter.RateCounter 31 | bytesCounter prometheus.Counter 32 | 33 | topicsGauge prometheus.Gauge 34 | partitionsGauge prometheus.Gauge 35 | 36 | // number of messages that are bandwidth throttled or kafka-server throttled. 37 | // This is the number of messages that were supposed to be sent but got throttled and are lagging behind. 38 | badwidthThrottledMessages prometheus.Counter 39 | 40 | // Number of currently client-side in-flight messages (messages buffered but not yet sent) 41 | inflightMessageCount prometheus.GaugeFunc 42 | ) 43 | 44 | func init() { 45 | messageRateCounter = ratecounter.NewRateCounter(monitoringFrequency) 46 | bytesRateCounter = ratecounter.NewRateCounter(monitoringFrequency) 47 | } 48 | 49 | func reportMessageSent(m *kafka.Message) { 50 | messageRateCounter.Incr(1) 51 | messageCounter.Inc() 52 | l := len(m.Value) 53 | bytesRateCounter.Incr(int64(l)) 54 | bytesCounter.Add(float64(l)) 55 | } 56 | 57 | // periodically monitors the kafka writer. 58 | // Blocks forever or until canceled. 59 | func monitor( 60 | ctx context.Context, 61 | errorCounter *uint64, 62 | frequency time.Duration, 63 | desiredThroughput types.Throughput, 64 | id types.ProducerID, 65 | numTopics, numPartitions uint, 66 | producers mapset.Set, 67 | ) { 68 | initPrometheus(numTopics, numPartitions, producers) 69 | ticker := time.Tick(frequency) 70 | for { 71 | select { 72 | case <-ticker: 73 | printStats(errorCounter, frequency, desiredThroughput, id) 74 | case <-ctx.Done(): 75 | log.Infof("Monitor done. 
%s", ctx.Err()) 76 | return 77 | } 78 | } 79 | } 80 | 81 | func initPrometheus( 82 | numTopics, numPartitions uint, 83 | producers mapset.Set, 84 | ) { 85 | messageCounter = promauto.NewCounter(prometheus.CounterOpts{ 86 | Name: "messages_produced", 87 | Help: "Number of messages produced to kafka.", 88 | }) 89 | bytesCounter = promauto.NewCounter(prometheus.CounterOpts{ 90 | Name: "bytes_produced", 91 | Help: "Number of bytes produced to kafka.", 92 | }) 93 | topicsGauge = promauto.NewGauge(prometheus.GaugeOpts{ 94 | Name: "producer_number_of_topics", 95 | Help: "Number of topics that the producer writes to.", 96 | }) 97 | topicsGauge.Add(float64(numTopics)) 98 | 99 | partitionsGauge = promauto.NewGauge(prometheus.GaugeOpts{ 100 | Name: "producer_number_of_partitions", 101 | Help: "Number of partitions of each topic that the producer writes to.", 102 | }) 103 | partitionsGauge.Add(float64(numPartitions)) 104 | 105 | badwidthThrottledMessages = promauto.NewCounter(prometheus.CounterOpts{ 106 | Name: "bandwidth_throttled_messages", 107 | Help: "Number of messages throttled after sending.", 108 | }) 109 | inflightMessageCount = promauto.NewGaugeFunc(prometheus.GaugeOpts{ 110 | Name: "in_flight_message_count", 111 | Help: "Number of currently in-flight messages (client side)", 112 | }, inFlightMessageCounter(producers)) 113 | messageSendErrors = promauto.NewCounter(prometheus.CounterOpts{ 114 | Name: "message_send_errors", 115 | Help: "Number of message send errors.", 116 | }) 117 | http.Handle("/metrics", promhttp.Handler()) 118 | go http.ListenAndServe(":8001", nil) 119 | } 120 | 121 | func inFlightMessageCounter(producers mapset.Set) func() float64 { 122 | return func() float64 { 123 | sum := 0 124 | for p := range producers.Iterator().C { 125 | producer := p.(*kafka.Producer) 126 | sum += producer.Len() 127 | } 128 | return float64(sum) 129 | } 130 | } 131 | 132 | // Prints some runtime stats such as errors, throughputs etc 133 | func printStats( 134 | errorCounter *uint64, 135 | frequency time.Duration, 136 | desiredThroughput types.Throughput, 137 | id types.ProducerID, 138 | ) { 139 | frequencySeconds := int64(frequency / time.Second) 140 | messageThroughput := messageRateCounter.Rate() / frequencySeconds 141 | bytesThroughput := uint64(bytesRateCounter.Rate() / frequencySeconds) 142 | errors := atomic.LoadUint64(errorCounter) 143 | log.Infof(`Recent stats for %s: 144 | Throughput: %d messages / sec 145 | Throughput: %s / sec 146 | Total errors: %d 147 | `, id, messageThroughput, humanize.Bytes(bytesThroughput), errors) 148 | 149 | // How much slack we're willing to take if throughput is lower than desired 150 | const slack = .9 151 | 152 | if float32(messageThroughput) < float32(desiredThroughput)*slack { 153 | log.Warnf("Actual throughput is < desired throughput. 
%d < %d", messageThroughput, desiredThroughput) 154 | badwidthThrottledMessages.Add(float64(desiredThroughput) - float64(messageThroughput)) 155 | } 156 | } 157 | -------------------------------------------------------------------------------- /lib/producer/producer.go: -------------------------------------------------------------------------------- 1 | package producer 2 | 3 | // The producer package is responsible for producing messages and repotring success/failure WRT 4 | // delivery as well as capacity (is it able to produce the required throughput) 5 | 6 | import ( 7 | "context" 8 | "math" 9 | "strings" 10 | "sync" 11 | "sync/atomic" 12 | 13 | "golang.org/x/time/rate" 14 | 15 | "github.com/confluentinc/confluent-kafka-go/kafka" 16 | mapset "github.com/deckarep/golang-set" 17 | log "github.com/sirupsen/logrus" 18 | 19 | "github.com/appsflyer/kafka-mirror-tester/lib/admin" 20 | "github.com/appsflyer/kafka-mirror-tester/lib/message" 21 | "github.com/appsflyer/kafka-mirror-tester/lib/types" 22 | ) 23 | 24 | const ( 25 | // How much burst we allow for the rate limiter. 26 | // We provide a 0.1 burst ratio which means that at times the rate might go up to 10% or the desired rate (but not for log) 27 | // This is done in order to conpersate for slow starts. 28 | burstRatio = 0.1 29 | 30 | // Number of messages per producer that we allow in-flight before waiting and flushing 31 | inFlightThreshold = 100000000 32 | ) 33 | 34 | // ProduceToTopics spawms multiple producer threads and produces to all topics 35 | func ProduceToTopics( 36 | brokers types.Brokers, 37 | id types.ProducerID, 38 | throughput types.Throughput, 39 | size types.MessageSize, 40 | initialSequence types.SequenceNumber, 41 | topicsString string, 42 | numPartitions, numReplicas uint, 43 | useMessageHeaders bool, 44 | retentionMs uint, 45 | ) { 46 | // Count the total number of errors on this topic 47 | errorCounter := uint64(0) 48 | topics := strings.Split(topicsString, ",") 49 | producers := mapset.NewSet() 50 | ctx := context.Background() 51 | go monitor(ctx, &errorCounter, monitoringFrequency, throughput, id, uint(len(topics)), numPartitions, producers) 52 | 53 | var wg sync.WaitGroup 54 | for _, topic := range topics { 55 | t := types.Topic(topic) 56 | wg.Add(1) 57 | go func(topic types.Topic, partitions, replicas uint) { 58 | admin.MustCreateTopic(ctx, brokers, t, partitions, replicas, retentionMs) 59 | ProduceForever( 60 | ctx, 61 | brokers, 62 | t, 63 | id, 64 | initialSequence, 65 | partitions, 66 | throughput, 67 | size, 68 | useMessageHeaders, 69 | &errorCounter, 70 | producers) 71 | wg.Done() 72 | }(t, numPartitions, numReplicas) 73 | } 74 | wg.Wait() 75 | } 76 | 77 | // ProduceForever will produce messages to the topic forver or until canceled by the context. 78 | // It will try to acheive the desired throughput and if not - will log that. It will not exceed the throughput (measured by number of messages per second) 79 | // throughput is limited to 1M messages per second. 80 | func ProduceForever( 81 | ctx context.Context, 82 | brokers types.Brokers, 83 | topic types.Topic, 84 | id types.ProducerID, 85 | initialSequence types.SequenceNumber, 86 | numPartitions uint, 87 | throughput types.Throughput, 88 | messageSize types.MessageSize, 89 | useMessageHeaders bool, 90 | errorCounter *uint64, 91 | producers mapset.Set, 92 | ) { 93 | log.Infof("Starting the producer. 
brokers=%s, topic=%s id=%s throughput=%d size=%d initialSequence=%d", 94 | brokers, topic, id, throughput, messageSize, initialSequence) 95 | p, err := kafka.NewProducer(&kafka.ConfigMap{ 96 | "bootstrap.servers": string(brokers), 97 | "queue.buffering.max.ms": "1000", 98 | }) 99 | if err != nil { 100 | log.Fatalf("Failed to create producer: %s\n", err) 101 | } 102 | defer p.Close() 103 | producers.Add(p) 104 | producerForeverWithProducer( 105 | ctx, 106 | p, 107 | topic, 108 | id, 109 | initialSequence, 110 | numPartitions, 111 | throughput, 112 | messageSize, 113 | useMessageHeaders, 114 | errorCounter) 115 | } 116 | 117 | // producerForeverWithWriter produces kafka messages forever or until the context is canceled. 118 | // adheeers to maintaining the desired throughput. 119 | func producerForeverWithProducer( 120 | ctx context.Context, 121 | p *kafka.Producer, 122 | topic types.Topic, 123 | producerID types.ProducerID, 124 | initialSequence types.SequenceNumber, 125 | numPartitions uint, 126 | throughput types.Throughput, 127 | messageSize types.MessageSize, 128 | useMessageHeaders bool, 129 | errorCounter *uint64, 130 | ) { 131 | // the rate limiter regulates the producer by limiting its throughput (messages/sec) 132 | limiter := rate.NewLimiter(rate.Limit(throughput), int(math.Ceil(float64(throughput)*burstRatio))) 133 | 134 | // Sequence number per message 135 | seq := initialSequence 136 | 137 | go eventsProcessor(p, errorCounter) 138 | 139 | topicString := string(topic) 140 | tp := kafka.TopicPartition{Topic: &topicString, Partition: kafka.PartitionAny} 141 | for ; ; seq++ { 142 | err := limiter.Wait(ctx) 143 | if err != nil { 144 | log.Errorf("Error waiting %+v", err) 145 | continue 146 | } 147 | numPartitionsXprime := numPartitions * 17 // TO increase the likelihood of even partitioning 148 | messageKey := types.MessageKey(uint(seq) % numPartitionsXprime) 149 | scopedSeq := seq / types.SequenceNumber(numPartitionsXprime) 150 | produceMessage(ctx, p, tp, producerID, messageKey, scopedSeq, messageSize, useMessageHeaders) 151 | } 152 | } 153 | 154 | // produceMessage produces a single message to kafka. 155 | // message production is asyncrounous on the ProducerChannel 156 | func produceMessage( 157 | ctx context.Context, 158 | p *kafka.Producer, 159 | topicPartition kafka.TopicPartition, 160 | producerID types.ProducerID, 161 | messageKey types.MessageKey, 162 | seq types.SequenceNumber, 163 | messageSize types.MessageSize, 164 | useMessageHeaders bool, 165 | ) { 166 | if p.Len() > inFlightThreshold { 167 | p.Flush(1) 168 | } 169 | m := message.Create(producerID, messageKey, seq, messageSize, useMessageHeaders) 170 | m.TopicPartition = topicPartition 171 | p.ProduceChannel() <- m 172 | log.Tracef("Producing %s...", m) 173 | } 174 | 175 | // eventsProcessor processes the events emited by the producer p. 
176 | // It then logs errors, increments the passed-by-reference error counter and updates the throughput counter
177 | func eventsProcessor(
178 | 	p *kafka.Producer,
179 | 	errorCounter *uint64,
180 | ) {
181 | 	for e := range p.Events() {
182 | 		switch ev := e.(type) {
183 | 		case *kafka.Message:
184 | 			m := ev
185 | 			if m.TopicPartition.Error != nil {
186 | 				log.Errorf("Delivery failed: %v", m.TopicPartition.Error)
187 | 				atomic.AddUint64(errorCounter, 1)
188 | 				messageSendErrors.Inc()
189 | 			} else {
190 | 				reportMessageSent(m)
191 | 			}
192 | 		default:
193 | 			log.Infof("Ignored event: %s", ev)
194 | 		}
195 | 	}
196 | }
197 | 
--------------------------------------------------------------------------------
/lib/types/types.go:
--------------------------------------------------------------------------------
1 | package types
2 | 
3 | // Common type definitions for this project
4 | 
5 | // Brokers is a comma-separated string of host:port
6 | type Brokers string
7 | 
8 | // Throughput describes a message send throughput measured in messages per second
9 | type Throughput uint
10 | 
11 | // MessageSize describes a message size in bytes
12 | type MessageSize uint
13 | 
14 | // ProducerID describes an ID for a producer
15 | type ProducerID string
16 | 
17 | // MessageKey describes a message key in kafka. We define them as uint b/c we want
18 | // to enforce that as part of the business logic.
19 | // One thing worth mentioning is that in Kafka message keys are not generally required to be unique.
20 | // In fact we use them simply for message routing b/w partitions
21 | // and we expect each key to repeat many times
22 | type MessageKey uint
23 | 
24 | // Topic describes the name of a kafka topic
25 | type Topic string
26 | 
27 | // Topics is just an array of topics
28 | type Topics []string
29 | 
30 | // SequenceNumber represents a sequence number in a message, used for testing message ordering
31 | type SequenceNumber int64
32 | 
33 | // ConsumerGroup for kafka
34 | type ConsumerGroup string
35 | 
--------------------------------------------------------------------------------
/main.go:
--------------------------------------------------------------------------------
1 | package main
2 | 
3 | import (
4 | 	"github.com/appsflyer/kafka-mirror-tester/lib/cmd"
5 | )
6 | 
7 | func main() {
8 | 	cmd.Execute()
9 | }
10 | 
--------------------------------------------------------------------------------
/results-ureplicator.md:
--------------------------------------------------------------------------------
1 | # Results of the experiment - uReplicator
2 | In this experiment we set out to test the performance and correctness of uReplicator with two Kafka clusters located in two AWS regions: `us-east-1` (Virginia) and `eu-west-1` (Ireland).
3 | 
4 | To implement the experiment we created a specialized producer and consumer, written in Go. The producer generates messages in a predefined format at a configurable throughput; the consumer verifies message arrival and measures throughput and latency.
5 | 
6 | Details of the implementation of the producer and the consumer are in the [readme file](README.md).
7 | 
8 | ## Setup
9 | 
10 | We use *Kubernetes* to spin up servers in both datacenters and set up the two Kafka clusters as well as uReplicator, the producer and the consumer.
11 | 
12 | * The producer runs in us-east-1, producing data to the local cluster.
13 | * uReplicator runs in eu-west-1, consuming from the cluster in us-east-1 and producing to the local Kafka cluster in eu-west-1
14 | * The consumer runs in eu-west-1, consuming messages replicated to the local cluster by uReplicator
15 | 
16 | We tested a variety of configurations, but most of the tests were run with this setup:
17 | 
18 | * Kubernetes node types: `i3.large`
19 | * Kubernetes cluster sizes: 40 nodes in us-east-1, 48 nodes in eu-west-1
20 | * Kafka cluster sizes: 30 brokers in each cluster. Single zookeeper pod. Storage on ephemeral local disks
21 | * uReplicator: 8 workers, 3 controllers (1 controller in some tests)
22 | * Producer: 10 pods
23 | * Consumer: 4 pods
24 | * Produced messages: 1kB (1000 bytes) per message
25 | * Production throughput: 200k messages/sec
26 | * => This results in replication of *200 MB/sec*
27 | * Topics replicated by uReplicator: 1
28 | * Kafka replication factor: 3 in us-east-1, 2 in eu-west-1
29 | * Partitions: 150 partitions on both clusters as a baseline
30 | 
31 | (further configuration details such as memory and CPU allocation can be found in the k8s yaml files in this project)
32 | 
33 | ## Results
34 | 
35 | We ran multiple experiments; here are the highlights.
36 | 
37 | ### Long haul
38 | 
39 | We ran the 200 MB/sec workload for several hours.
40 | 
41 | **Result:** Looks good. Nothing suspicious happened. Over hours and hours the topics were correctly replicated.
42 | 
43 | ### Kill a broker in kafka-source
44 | 
45 | We kill a broker pod in kafka-source. When killed, k8s automatically re-provisions a new pod in the statefulset, which results in a few minutes of downtime for one of the brokers until it's back up. Since the replication factor is 3 we do not expect message loss, although this action might result in higher latency and lower throughput.
46 | 
47 | Killing a pod (example):
48 | 
49 | ```sh
50 | kubectl --context us-east-1.k8s.local -n kafka-source delete pod kafka-source-2
51 | ```
52 | 
53 | **Result:** We see a small hiccup in replication throughput and latency, and some message loss (about 20 messages out of 200k/sec, 0.01%). We don't have an explanation for this message loss, although in terms of correctness of our application it is definitely something we can live with.
54 | 
55 | 
56 | ![Kill a broker in kafka-source](doc/media/kill-kafka-source-pod.png "Kill a broker in kafka-source")
57 | 
58 | ### Reduce source cluster size permanently to 29
59 | 
60 | The baseline of the source cluster is 30 brokers. In this experiment we reduce the size to 29 permanently. Unlike the previous experiment, k8s will not re-provision the killed pod, so the cluster size remains 29. This is supposed to be OK since the replication factor is 3.
61 | 
62 | Scaling down the cluster to 29:
63 | 
64 | ```sh
65 | kubectl --context us-east-1.k8s.local -n kafka-source scale statefulset kafka-source --replicas 29
66 | ```
67 | 
68 | **Result:** The result is very similar to before. We see a slight hiccup in replication throughput and in some cases minor message loss (which we cannot explain), but that's all.
69 | 
70 | ### Add uReplicator worker
71 | 
72 | The original setup had 8 uReplicator workers. We want to check how adding an additional worker affects the cluster. Our expectation is that the workers rebalance and "continue as usual".
73 | 
74 | ```sh
75 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 9
76 | ```
77 | 
78 | **Result:** As expected, everything is normal; that's good.
79 | 
80 | ### Remove uReplicator worker
81 | 
82 | Similar to before, the baseline number of workers is 8. In this test we reduce them to 7 by scaling down the number of pods. We expect no message loss, but a small hiccup in latency until a rebalance.
83 | 
84 | ```sh
85 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 7
86 | ```
87 | 
88 | **Result:** We indeed see slowness, but after a rebalance (~2 minutes) the rest of the workers catch up. No message loss.
89 | 
90 | ![Remove worker](doc/media/remove-worker.png "Remove worker")
91 | 
92 | ### uReplicator under capacity and then back to capacity
93 | When removing more and more workers from uReplicator, at some stage it will run out of capacity and not be able to replicate at the desired throughput.
94 | 
95 | Our experiment is: remove more and more workers until uReplicator runs out of capacity, and only then re-add workers and see how fast it is able to pick up the pace.
96 | 
97 | Remove more and more workers:
98 | 
99 | ```sh
100 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 7
101 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 6
102 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 5
103 | ...
104 | 
105 | ```
106 | 
107 | And then, when you see it run out of capacity, start adding them back:
108 | 
109 | ```sh
110 | kubectl --context eu-west-1.k8s.local -n ureplicator scale deployment ureplicator-worker --replicas 6
111 | ```
112 | 
113 | It takes a long time (around 10 minutes) for the workers to catch up with the backlog, but eventually they do.
114 | Sometimes uReplicator needs to "get kicked" by adding or removing workers. For example, we saw that with 10 workers it might get stuck in a local minimum and then, when reduced to 8 workers, it suddenly gets a boost. It seems that sometimes the controllers don't know about the newly added workers. This can be monitored and fixed by looking at the `/instances` API on the controller.
115 | 
116 | ### Kill a broker in kafka-destination
117 | 
118 | We abruptly kill one of the destination cluster's brokers, allowing k8s to re-provision a new pod into its statefulset. We expect no message loss (as there's a replication factor of 3) and perhaps a slight performance hiccup.
119 | 
120 | ```sh
121 | kubectl --context eu-west-1.k8s.local -n kafka-destination delete pod kafka-destination-29
122 | ```
123 | 
124 | **Result:** There's a small hiccup and then things are back to normal. No message loss.
125 | 
126 | ![Kill pod in destination](doc/media/kill-pod-destination.png "Kill pod in destination")
127 | 
128 | ### Downsize the kafka-destination cluster to 29
129 | 
130 | The baseline of both Kafka clusters is 30 brokers. In this experiment we permanently remove one broker from the destination cluster by scaling down the statefulset. We expect no message loss (as there's a replication factor of 3) and perhaps a slight performance hiccup.
131 | 
132 | ```sh
133 | kubectl --context eu-west-1.k8s.local -n kafka-destination scale statefulset kafka-destination --replicas 29
134 | ```
135 | 
136 | **Result:** OK, no noticeable hiccups.
137 | 
138 | ![Downsize destination cluster](doc/media/downsize-destination-cluster.png "Downsize destination cluster")
139 | 
140 | ### Adding a new topic
141 | 
142 | In this experiment we add a new topic and want to see how fast uReplicator starts replicating it.
143 | 
144 | **Result:** Discovery of the new topic takes on the order of 2-3 minutes, which is OK.
145 | 
146 | ![Discover new topic](doc/media/new-topic.png)
147 | 
148 | ### Adding partitions to an existing topic
149 | 
150 | We want to test what happens when we repartition (i.e. add partitions to) an existing topic which is already being actively replicated. We expect uReplicator to pick up the new partitions and start replicating them as well.
151 | 
152 | We connect to one of the source cluster brokers and run the `kafka-topics` command:
153 | 
154 | ```sh
155 | $ make k8s-kafka-shell-source
156 | # ... connecting ...
157 | 
158 | $ unset JMX_PORT
159 | $ bin/kafka-topics.sh --zookeeper zookeeper:2181 --alter --topic topic5 --partitions 300
160 | ```
161 | 
162 | **Result:** This was a bit of a surprise: uReplicator did not pick up the new partitions. It continued replicating the old partitions but ignored the new ones.
163 | 
164 | To fix that we use uReplicator's API: we delete the topic from uReplicator and re-add it, after which uReplicator finally starts replicating the new partitions. This seems like a usability issue; we are not sure whether it is by design.
165 | 
166 | In order to send commands to the remote uReplicator controller(s) we open a local port with port-forwarding:
167 | 
168 | ```sh
169 | kubectl --context eu-west-1.k8s.local -n ureplicator port-forward ureplicator-controller-76ff85b889-l9mzl 9000
170 | ```
171 | 
172 | And now we can delete and recreate the topic with as many partitions as we need:
173 | 
174 | ```sh
175 | curl -X DELETE http://localhost:9000/topics/topic5
176 | curl -X POST -d '{"topic":"topic5", "numPartitions":"300"}' http://localhost:9000/topics
177 | ```
178 | 
179 | ### Add uReplicator controller
180 | 
181 | We try adding a new controller to make sure nothing breaks while doing so.
182 | 
183 | **Result:** Looks good, operation continues as normal.
184 | 
185 | ### Delete uReplicator controller
186 | 
187 | We delete a uReplicator controller (there were 3 to begin with) and make sure the rest of the controllers are able to continue operation as planned.
188 | 
189 | **Result:** Looks good, the rest of the controllers behave normally.
190 | 
191 | ### Delete all uReplicator controllers
192 | 
193 | We delete all uReplicator controllers to see what happens.
194 | 
195 | **Result:** For as long as there's no controller alive, the workers continue their normal operation; however, new topics are not picked up for replication. When the controllers are back up they pick up the information about the existing workers and take charge of topic replication again.
196 | 
197 | ### Packet loss: 10% on 4 replicator workers (out of 10)
198 | 
199 | Since the main scenario we deal with is replication over the Atlantic, we want to test behavior under simulated packet loss. We use Weave Scope's Traffic Control plugin to apply 10% packet loss on 4 out of 10 uReplicator workers (a hand-rolled `tc` sketch is shown after the result below); 10% packet loss is quite high.
200 | 
201 | **Result:** We see slowness in processing, but no message loss. That's good.
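For readers without Weave Scope, a roughly equivalent impairment can be applied by hand from inside the worker pods with `tc netem`. This is only a sketch and is not the method used in these tests; it assumes the worker image has `iproute2` available, that the container may run `tc` (NET_ADMIN), that the pod's interface is `eth0`, and `<worker-pod>` is a placeholder for one of the 4 chosen pods.

```sh
# Hypothetical manual alternative to the Weave Scope Traffic Control plugin:
# add 10% packet loss on eth0 inside one of the chosen worker pods.
kubectl --context eu-west-1.k8s.local -n ureplicator exec -it <worker-pod> -- \
  tc qdisc add dev eth0 root netem loss 10%

# Remove the impairment once the experiment is done.
kubectl --context eu-west-1.k8s.local -n ureplicator exec -it <worker-pod> -- \
  tc qdisc del dev eth0 root netem
```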
202 | 
203 | ![Packet loss on workers](doc/media/packet-loss-on-workers.png "packet loss on workers")
204 | 
205 | ### Packet loss: 10% on 4 brokers in source cluster
206 | 
207 | The source cluster has 30 nodes. In this experiment we apply 10% packet loss on 4 of the brokers. This is similar to the previous experiment, except that the packet loss is applied at the other end of the Atlantic.
208 | 
209 | **Result:** OK; we see slowness, and when the packet loss is removed the cluster catches up.
210 | 
211 | ![Packet loss on source cluster](doc/media/packet-loss-on-source-cluster.png "Packet loss on source cluster")
212 | 
213 | 
214 | ## Conclusion
215 | 
216 | All in all, uReplicator seems like a capable tool for the mission at hand. There are still a few blind spots and hiccups, but it seems ready to go.
217 | 
218 | ### No message headers
219 | 
220 | One of the relatively recent features added to Kafka is message headers. As of this writing *uReplicator does not support message headers*, meaning that if a message contains headers uReplicator will replicate the message but silently discard its headers.
221 | 
--------------------------------------------------------------------------------
/running.md:
--------------------------------------------------------------------------------
1 | # Installing and running the tool(s)
2 | 
3 | We assume some level of familiarity with the following tools and technologies:
4 | 
5 | * Kafka
6 | * uReplicator
7 | * Brooklin
8 | * AWS
9 | * Kubernetes
10 | 
11 | And some nice-to-have and useful skills:
12 | 
13 | * Prometheus
14 | * Grafana
15 | * Golang
16 | 
17 | ## Prerequisites and setup
18 | 
19 | The following tools are required:
20 | * `make` (already installed on most systems)
21 | * `AWS CLI`, with AWS keys set up
22 | * `kops`. The currently tested version is 1.10.0 (with brew it's `brew install kops@1.10.0` or `brew upgrade kops@1.10.0` or `brew switch kops 1.10.0`)
23 | * `kubectl` - the Kubernetes CLI
24 | * `kafka client tools`, in particular: `zookeeper-shell` and `kafka-console-consumer`
25 | 
26 | # Running it
27 | 
28 | NOTICE: This will incur costs from AWS. We set up hefty clusters and drive traffic between them, and this costs $$$.
29 | 
30 | ```
31 | make k8s-all # Wait for all resources to be created. This could take up to 40min, depending on the cluster size.
32 | ```
33 | 
34 | # Destroying it
35 | 
36 | ```
37 | make k8s-delete-all # And wait for all resources to get deleted. This can take a few minutes
38 | ```
39 | 
--------------------------------------------------------------------------------