├── README.md
└── kafka-reassign-tool

/README.md:
--------------------------------------------------------------------------------
# kafka-reassign-tool
A helper script for Kafka that makes it easier to change replicas for existing topics.

`kafka-reassign-tool` uses Kafka's standard `kafka-reassign-partitions.sh` script and generates input data for it.
The main purpose of this script is to make changing the replication factor for existing topics easier.
`kafka-reassign-tool` simply generates a partition reassignment JSON file that can be fed into `kafka-reassign-partitions.sh`
for execution.

What it can do:
* increase the replication factor for existing topics
* decrease the replication factor for existing topics
* remove all replicas from a particular broker so it can be decommissioned
* balance leaders

In all these cases `kafka-reassign-tool` tries to minimise the number of changes, which sets it apart from
the standard Kafka tools: those may generate a better distribution, but they can also generate a lot of data movement
by reallocating each partition to a completely different set of brokers (which may be fine for a small installation but
becomes an issue when you have lots of partitions and brokers).

Note: I wrote this tool before discovering, for example, LinkedIn's [kafka-tools](https://github.com/linkedin/kafka-tools). I am still using my tool now, but you may have better luck with these more mature ones.

## Configuration
`kafka-reassign-tool` needs to know where the standard Kafka scripts are located, as well as the Zookeeper URL.
If your Kafka is installed in `/opt/kafka`, you do not need to supply any additional command line options. Otherwise you
need to provide the `--kafka-home <dir>` command line option.
Either way, the script will attempt to read the Zookeeper URL from `config/server.properties` under the Kafka home directory.
If that does not work for any reason, you may need to add the `--zookeeper <url>` option.
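The Zookeeper URL lookup described above can be sketched roughly like this (a minimal standalone sketch; the helper name and the throwaway file are illustrative, not the tool's exact code):

```ruby
require 'tempfile'

# Illustrative sketch: pull the zookeeper.connect value out of a
# Kafka server.properties file (a simple key=value format).
def read_zookeeper_url(config_file)
  return nil unless File.file?(config_file)
  File.foreach(config_file) do |line|
    return $1.strip if line =~ /^zookeeper\.connect=(.*)/
  end
  nil
end

# Usage with a throwaway properties file
props = Tempfile.new('server.properties')
props.write("broker.id=1001\nzookeeper.connect=localhost:2181/kafka\n")
props.close
puts read_zookeeper_url(props.path)  # => localhost:2181/kafka
```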
## Changing replication factor
Running the tool with `--replication-factor 3`, for example, will change the partition assignment map so that each partition ends up with 3 replicas.
`kafka-reassign-tool` goes over all partitions one by one and:
* if a partition has fewer replicas than needed, it adds replicas by selecting the least used brokers (those that hold fewer replicas for the topic than others)
* if a partition has more replicas than needed, it removes the extra replicas by dropping the most used brokers (those that hold more replicas for the topic than others)
* if a partition is already at the target replication factor, the tool may still re-order its replicas to keep the leader distribution level among brokers

This is example output of the tool when it is asked to increase the replication factor of a topic from 2 replicas to 3:
```
$ kafka-reassign-tool --topic mytopic --replication-factor 3

Reading /opt/kafka/config/server.properties
Using zookeeper URL: localhost:2181/kafka
Reading list of brokers...
Reading list of topics...
------------------------
Brokers:
 1001
 1002
 1003
 1004
Topics:
 mytopic
------------------------
Getting current assignments...
Building new assignments...
 mytopic-0 : [1001, 1002] => [1001, 1002, 1003]
 mytopic-2 : [1002, 1004] => [1002, 1004, 1001]
 mytopic-1 : [1002, 1004] => [1002, 1003, 1004]
 mytopic-3 : [1003, 1001] => [1003, 1001, 1004]
Saving new assignments into new-assignments.json...
Done
```
And similar output when it is asked to decrease the replication factor of a topic from 2 replicas to 1:
```
$ kafka-reassign-tool --topic mytopic --replication-factor 1

Reading /opt/kafka/config/server.properties
Using zookeeper URL: localhost:2181/kafka
Reading list of brokers...
Reading list of topics...
------------------------
Brokers:
 1001
 1002
 1003
 1004
Topics:
 mytopic
------------------------
Getting current assignments...
Building new assignments...
 mytopic-0 : [1001, 1002] => [1001]
 mytopic-2 : [1002, 1004] => [1002]
 mytopic-1 : [1002, 1004] => [1004]
 mytopic-3 : [1003, 1001] => [1003]
Saving new assignments into new-assignments.json...
Done
```

Of course, it always makes sense to eyeball the changes `kafka-reassign-tool` plans before actually executing them.

### Bulk change
You can supply the `--topic` option multiple times:
```
$ kafka-reassign-tool --topic mytopic1 --topic mytopic2 --replication-factor 3
```
or you can omit it completely, in which case the same replication factor will be applied to all topics in your cluster:
```
$ kafka-reassign-tool --replication-factor 3
```
This, of course, only makes sense when all your topics should share the same replication factor.

## Decommissioning a broker
The `--brokers` command line option allows you to specify which brokers can be used for assignment.
`kafka-reassign-tool` removes all assignments from brokers that are not in that list (replacing them with other brokers to maintain the replication factor).
This can be used to decommission a certain broker by removing all replicas from it.

For example, the command below does not list broker 1004, so if any partition used it as a replica, that replica will be moved elsewhere:
```
$ kafka-reassign-tool --topic mytopic --replication-factor 2 --brokers 1001,1002,1003

Reading /opt/kafka/config/server.properties
Using zookeeper URL: localhost:2181/kafka
Reading list of brokers...
Reading list of topics...
------------------------
Brokers:
 1001
 1002
 1003
Topics:
 mytopic
------------------------
Getting current assignments...
Building new assignments...
 mytopic-0 : [1001, 1002] => [1001, 1002]
 mytopic-2 : [1002, 1004] => [1002, 1003]
 mytopic-1 : [1002, 1004] => [1002, 1001]
 mytopic-3 : [1003, 1001] => [1003, 1001]
Saving new assignments into new-assignments.json...
Done
```

Note that this operation can also be done in bulk by providing multiple `--topic` options, or by omitting the option completely to select all topics.
However, keep in mind that the same `--replication-factor` value will be used for all selected topics.

## Applying the change
After a successful invocation, `kafka-reassign-tool` generates a `new-assignments.json` file which can then be applied as:
```
kafka-reassign-partitions.sh --zookeeper <zookeeper-url> --reassignment-json-file new-assignments.json --execute --throttle 100000000
```
The example above throttles replication to 100 MB/s (the `--throttle` value is in bytes per second). You may decide to use a different limit or to omit it completely.
If throttling was used, at the end you must verify the reassignment with:
```
kafka-reassign-partitions.sh --zookeeper <zookeeper-url> --reassignment-json-file new-assignments.json --verify
```
so that the replication quota is removed.
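The file the tool writes follows the standard reassignment JSON format consumed by `kafka-reassign-partitions.sh`. A minimal sketch of producing such a file by hand (the topic name and broker ids below are made-up examples):

```ruby
require 'json'

# Sketch of the reassignment JSON shape that kafka-reassign-partitions.sh
# consumes; 'mytopic' and the broker ids are illustrative values.
assignment = {
  'version' => 1,
  'partitions' => [
    {'topic' => 'mytopic', 'partition' => 0, 'replicas' => [1001, 1002, 1003]},
    {'topic' => 'mytopic', 'partition' => 1, 'replicas' => [1002, 1003, 1004]}
  ]
}

File.write('new-assignments.json', JSON.pretty_generate(assignment))
```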
For more details see https://kafka.apache.org/documentation.html#rep-throttle

--------------------------------------------------------------------------------
/kafka-reassign-tool:
--------------------------------------------------------------------------------
#!/usr/bin/ruby

require 'yaml'
require 'json'
require 'open3'
require 'tempfile'
require 'optparse'

DEFAULT_KAFKA_ROOT = '/opt/kafka'
DEFAULT_KAFKA_CONFIG = '/config/server.properties'

$kafka_root = nil
$zookeeper_url = nil

def run(command, args = [], opts = {})
  opts = {
    :raise_on_exitstatus => true,
    :raise_on_err => true,
    :input => ''
  }.merge(opts)

  out = nil
  err = nil
  exit_status = nil
  Open3::popen3(command, *args) do |stdin, stdout, stderr, wait_thr|
    stdin << opts[:input]
    stdin.close
    out = stdout.read
    err = stderr.read
    exit_status = wait_thr.value
  end

  if (opts[:raise_on_exitstatus] && exit_status.exitstatus != 0) || (opts[:raise_on_err] && !err.empty?) then
    puts "#{command} failed. exitstatus=#{exit_status.exitstatus}, stderr:"
    puts err
    raise "#{command} failed"
  end

  {:out => out, :err => err, :exit_status => exit_status}
end

def get_kafka_root()
  return $kafka_root if $kafka_root
  DEFAULT_KAFKA_ROOT
end

def get_zk_url()
  return $zookeeper_url if $zookeeper_url

  config_file = get_kafka_root + DEFAULT_KAFKA_CONFIG

  if File.file? config_file then
    puts "Reading #{config_file}"
    File.open(config_file, 'r') do |file|
      file.each_line do |line|
        if line =~ /^zookeeper.connect=(.*)/ then
          $zookeeper_url = $1.strip
        end
      end
    end
  else
    puts "Config file #{config_file} does not exist"
  end

  raise "No zookeeper URL given" unless $zookeeper_url

  $zookeeper_url
end

def get_brokers()
  result = run(get_kafka_root + '/bin/zookeeper-shell.sh', [get_zk_url], {:input => 'ls /brokers/ids'})

  # Look for a line like
  #   [1003, 1001, 1002]
  brokers = nil
  result[:out].lines.each {|line|
    if line =~ /^ \[ ([^\]]+) \] /x then
      brokers = $1.split(',').collect(&:strip).collect(&:to_i)
    end
  }

  brokers
end

def get_topics()
  result = run(get_kafka_root + '/bin/kafka-topics.sh', [
    '--zookeeper', get_zk_url,
    '--list'
  ])
  result[:out].lines.collect(&:strip)
end

def topics_json(topics)
  # {
  #   "topics": [
  #     {"topic": "topicname1"},
  #     {"topic": "topicname2"}
  #   ],
  #   "version": 1
  # }
  data = {
    :version => 1,
    :topics => topics.collect{|t| { :topic => t }}
  }

  data.to_json
end

def get_current_assignments(topics, brokers)
  topics_file = Tempfile.new(File.basename(__FILE__) + '-topics')
  topics_file << topics_json(topics)
  topics_file.close

  result = run(get_kafka_root + '/bin/kafka-reassign-partitions.sh', [
    '--zookeeper', get_zk_url,
    '--topics-to-move-json-file', topics_file.path,
    '--broker-list', brokers.join(','),
    '--generate'
  ])

  # Expected output:
  #
  #   Current partition replica assignment
  #
  #   {"version":1,"partitions":[]}
  #   Proposed partition reassignment configuration
  #
  #   {"version":1,"partitions":[]}
  #
  # Note that on newer Kafka versions all of that is preceded by this line:
  #   Warning: --zookeeper is deprecated, and will be removed in a future version of Kafka.

  unless result[:out] =~ /Current partition replica assignment(.*)Proposed partition reassignment configuration(.*)\z/m
    puts "kafka-reassign-partitions.sh output:"
    puts result[:out]
    raise "Cannot parse assignment data"
  end

  JSON.parse($1)
end

# Returns a hash
# {
#   'topic_name' => {
#     'broker_id' => {
#       :replica => 0, # for how many partitions this broker has a replica
#       :leader => 0,  # for how many partitions this broker is the leader
#     },
#     ...
#   },
#   ...
# }
def get_broker_stats(assignments)
  stats = {}

  assignments['partitions'].each {|item|
    topic = item['topic']
    replicas = item['replicas']

    stats[topic] ||= {}
    topic_stats = stats[topic]

    leader = nil
    replicas.each {|replica|
      leader = replica if leader.nil?
      topic_stats[replica] ||= {:replica => 0, :leader => 0}
      topic_stats[replica][:replica] += 1
    }

    topic_stats[leader][:leader] += 1 if leader
  }

  stats
end

def set_replication(assignments, brokers, replication_factor)
  stats = get_broker_stats(assignments)

  # Make sure we have an entry for each broker even if the given topic never uses it
  stats.each {|topic, topic_data|
    brokers.each {|broker|
      topic_data[broker] ||= {:replica => 0, :leader => 0}
    }
  }

  result = assignments.clone()
  new_assignments = []
  result['partitions'] = new_assignments

  puts
  puts "Initial stats:"
  puts stats
  puts

  assignments['partitions'].each {|item|
    topic = item['topic']
    partition = item['partition']
    replicas = item['replicas']

    topic_stats = stats[topic]

    # Calculate for how many partitions a broker should be the leader.
    # I calculate the number of partitions as the sum of how many partitions each broker leads;
    # this kinda sucks, it would be easier if get_broker_stats returned that info, but
    # that would require changing its result structure once again. So next time.
    partition_count = topic_stats.values.collect {|data| data[:leader]}.reduce(0, :+)
    leader_target = partition_count / brokers.size

    # Remember what replicas we had before so we can see later if anything has changed
    orig_replicas = replicas.clone()

    # Remove replicas that are not in the given list of allowed brokers
    old_leader = replicas[0]
    (replicas - brokers).each {|broker|
      replicas.delete(broker)
      topic_stats[broker][:replica] -= 1
    }

    while replicas.size > replication_factor do
      # Remove the most used (for this topic) broker
      broker = topic_stats.select{|broker, data| replicas.include? broker}.max_by{|broker, data| data[:replica]}[0]

      topic_stats[broker][:replica] -= 1
      replicas.delete(broker)
    end

    while replicas.size < replication_factor do
      # Use the least used (for this topic) allowed broker as a new replica
      broker = topic_stats.select{|broker, data| (brokers - replicas).include? broker}.min_by{|broker, data| data[:replica]}[0]

      # Make the new broker an ordinary replica; the code below will promote it to leader if needed
      replicas.push broker
      topic_stats[broker][:replica] += 1
    end

    # If removing disallowed brokers or adding/removing replicas changed the leader, update the stats too.
    # We need up-to-date stats before using them to select a new leader.
    new_leader = replicas[0]
    if new_leader != old_leader then
      topic_stats[old_leader][:leader] -= 1 unless old_leader.nil?
      topic_stats[new_leader][:leader] += 1 unless new_leader.nil?
    end

    old_leader = new_leader

    # Select the best leader for this partition (based on which broker is the leader for the fewest partitions)
    candidate_leader_pos = nil
    candidate_leader_leads = leader_target

    replicas.each_with_index{|broker, pos|
      # For how many partitions this broker is currently the leader
      leader_for = topic_stats[broker][:leader]
      if leader_for < candidate_leader_leads then
        candidate_leader_pos = pos
        candidate_leader_leads = leader_for
      end
    }

    # Make our leader the first element in the list
    replicas.unshift(replicas.delete_at(candidate_leader_pos)) if candidate_leader_pos

    # Update leader stats again if the leader has changed
    new_leader = replicas[0]
    if new_leader != old_leader then
      topic_stats[old_leader][:leader] -= 1 unless old_leader.nil?
      topic_stats[new_leader][:leader] += 1 unless new_leader.nil?
    end

    next if replicas == orig_replicas

    puts " #{topic}-#{partition} : #{orig_replicas} => #{replicas}"

    item = item.clone()
    item['replicas'] = replicas

    # If the log_dirs configuration is present (it is optional) and it only
    # contains 'any' and nothing else, replace it with a new one
    # as the number of replicas may have changed.
    if item['log_dirs'] == ['any'] * orig_replicas.count
      item['log_dirs'] = ['any'] * replicas.count
    end

    new_assignments << item
  }

  puts
  puts "Final stats:"
  puts stats
  puts

  result
end

topics = []
brokers = []
replication_factor = nil

optparse = OptionParser.new do |opts|
  opts.on("--kafka-home <dir>",
          "Root directory of the Kafka installation",
          " (standard Kafka scripts must be under its bin/ directory)",
          " Default: #{DEFAULT_KAFKA_ROOT}") { |v| $kafka_root = v }
  opts.on("--zookeeper <url>",
          "The connection string for the zookeeper connection",
          " If not specified, an attempt is made to read it from the Kafka config file") { |v| $zookeeper_url = v }
  opts.on("--topic <topic>",
          "Can be specified multiple times to list one or more topics to apply the operation to.",
          " If the option is not used, the operation will apply to all existing topics") { |v| topics << v }
  opts.on("--brokers <list>",
          "Comma-separated list of brokers that partitions will be reassigned to",
          " If the option is not used, all brokers of the cluster will be used") { |v| brokers = v.split(',').collect(&:strip).collect(&:to_i) }
  opts.on("--replication-factor <n>",
          "Target replication factor. Required") { |v| replication_factor = v.to_i }
end

begin
  optparse.parse!
  raise OptionParser::MissingArgument.new('replication-factor') if replication_factor.nil?
rescue OptionParser::InvalidOption, OptionParser::MissingArgument
  puts $!.to_s
  puts optparse
  exit 1
end


zk_url = get_zk_url()
puts "Using zookeeper URL: #{zk_url}"

puts "Reading list of brokers..."
known_brokers = get_brokers()

if brokers.empty? then
  brokers = known_brokers
else
  brokers = brokers.sort.uniq
  invalid = brokers - known_brokers
  unless invalid.empty?
    raise "Unknown brokers: #{invalid}"
  end
end

puts "Reading list of topics..."
known_topics = get_topics()

if topics.empty? then
  topics = known_topics
else
  topics = topics.sort.uniq
  invalid = topics - known_topics
  unless invalid.empty?
    raise "Unknown topics: #{invalid}"
  end
end

puts "------------------------"
puts "Brokers:"
brokers.each {|broker| puts " #{broker}" }
puts "Topics:"
topics.each {|topic| puts " #{topic}" }
puts "------------------------"

if brokers.size < replication_factor then
  raise "Cannot achieve replication factor of #{replication_factor} with #{brokers.size} broker(s)"
end

puts "Getting current assignments..."
assignments = get_current_assignments(topics, known_brokers)

puts "Building new assignments..."
new_assignments = set_replication(assignments, brokers, replication_factor)

if new_assignments['partitions'].empty? then
  puts "No changes needed"
else
  puts "Saving new assignments into new-assignments.json..."
  File.open('new-assignments.json', 'w') { |file|
    file.write(new_assignments.to_json)
  }
  puts "Done"
  puts "To apply, run:"
  puts " #{get_kafka_root + '/bin/kafka-reassign-partitions.sh'} --zookeeper #{zk_url} --reassignment-json-file new-assignments.json --execute --throttle XXXXXXXX"
  puts "Then verify with:"
  puts " #{get_kafka_root + '/bin/kafka-reassign-partitions.sh'} --zookeeper #{zk_url} --reassignment-json-file new-assignments.json --verify"
  puts
end
--------------------------------------------------------------------------------