├── README.md
├── broker.properties
├── coordinator.properties
├── historical.properties
├── images
│   ├── broken_console.png
│   ├── console.png
│   ├── task_fail.png
│   ├── task_running.png
│   └── working_console.png
├── middleManager.properties
├── overlord.properties
├── task.log
└── update.sh
/README.md: --------------------------------------------------------------------------------
1 | **Table of Contents** 2 | 3 | - [Overview](#overview) 4 | - [Node Configuration](#node-configuration) 5 | - [JVM Settings](#jvm-settings) 6 | - [Issues](#issues) 7 | - [Coordinator Console Problems](#coordinator-console-problems) 8 | - [Resolved Issues](#resolved-issues) 9 | - [No Task Logs](#no-task-logs) 10 | - [Index Task Fails](#index-task-fails) 11 | - [Not enough direct memory](#not-enough-direct-memory) 12 | - [Exception with one of the sequences](#exception-with-one-of-the-sequences) 13 | 14 | --- 15 | 16 | # Overview 17 | 18 | Please see the included `.properties` files for the Druid configuration of each node type. At the top of each file is a summary of system resources, such as the number of processors the machine has and the size of main memory in bytes. 19 | 20 | ## Node Configuration 21 | 22 | My Druid cluster consists of the following node types, with their `runtime.properties` configuration files linked: 23 | 24 | * 3x [historical](historical.properties) 25 | * 1x [overlord](overlord.properties) 26 | * 1x [middleManager](middleManager.properties) 27 | * 1x [broker](broker.properties) 28 | * 1x [coordinator](coordinator.properties) 29 | * 1x mysql 30 | * 1x zookeeper 31 | 32 | **Each node is a separate AWS EC2 instance.** 33 | 34 | ## JVM Settings 35 | 36 | ### coordinator 37 | `java -server -Xmx10g -Xms10g -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -classpath /usr/local/druid-services/lib/*:/usr/local/druid-services/config/coordinator io.druid.cli.Main server coordinator` 38 | 39 | ### broker 40 | `java -server -Xmx20g -Xms20g -XX:NewSize=2g -XX:MaxNewSize=2g -XX:MaxDirectMemorySize=31g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -classpath /usr/local/druid-services/lib/*:/usr/local/druid-services/config/broker io.druid.cli.Main server broker` 41 | 42 | ### historical 43 | `java -server -Xmx4g -Xms4g -XX:NewSize=1g -XX:MaxDirectMemorySize=9g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -classpath /usr/local/druid-services/lib/*:/usr/local/druid-services/config/historical io.druid.cli.Main server historical` 44 | 45 | ### overlord 46 | `java -server -Xmx10g -Xms10g -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -classpath /usr/local/druid-services/lib/*:/usr/local/druid-services/config/overlord io.druid.cli.Main server overlord` 47 | 48 | ### middleManager 49 | `java -server -Xmx64m -Xms64m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -classpath /usr/local/druid-services/lib/*:/usr/local/druid-services/config/middleManager io.druid.cli.Main server middleManager` 50 | 51 | --- 52 | 53 | # Issues 54 | 55 | ## Coordinator Console Problems 56 | 57 | The Coordinator Console looks like this: ![alt tag](https://raw.github.com/lexicalunit/druid_config/master/images/console.png) 58 | 59 |
60 | When I go to `http://coordinator-ip:8080`, the Coordinator Console comes up and works perfectly. For example, the dropdowns are properly populated: 61 | ![alt tag](https://raw.github.com/lexicalunit/druid_config/master/images/working_console.png) 62 | 63 | However, when I go to, for example, `http://overlord-ip:8080` instead, the console comes up but nothing works properly. For example, the dropdowns are no longer populated: 64 | ![alt tag](https://raw.github.com/lexicalunit/druid_config/master/images/broken_console.png) 65 | 66 | --- 67 | 68 | # Resolved Issues 69 | 70 | ## No Task Logs 71 | 72 | After kicking off an Indexing Task, I can see that it is running in the console: 73 | ![alt tag](https://raw.github.com/lexicalunit/druid_config/master/images/task_running.png) 74 | 75 | Eventually, the task fails: 76 | ![alt tag](https://raw.github.com/lexicalunit/druid_config/master/images/task_fail.png) 77 | 78 | When I click on the `log (all)` link, I am taken to a page that just says: 79 | 80 | ``` 81 | No log was found for this task. The task may not exist, or it may not have begun running yet. 82 | ``` 83 | 84 | Given the following properties in my overlord configuration: 85 | 86 | ```properties 87 | druid.indexer.logs.type=s3 88 | druid.indexer.logs.s3Bucket=s3-bucket 89 | druid.indexer.logs.s3Prefix=logs 90 | ``` 91 | 92 | And the following properties in my middleManager configuration: 93 | 94 | ```properties 95 | druid.selectors.indexing.serviceName=druid:overlord 96 | ``` 97 | 98 | The logs should be available on S3; however, they are not: 99 | 100 | ```bash 101 | $ s3cmd ls s3://s3-bucket/ 102 | DIR s3://s3-bucket/click_conversion/ 103 | DIR s3://s3-bucket/click_conversion_weekly/ 104 | ``` 105 | 106 | ### Resolution 107 | 108 | The following needed to be added to the middleManager configuration as well (the peons spawned by the middleManager are what actually upload the task logs, so the overlord settings alone were not enough): 109 | 110 | ```properties 111 | druid.indexer.logs.type=s3 112 | druid.indexer.logs.s3Bucket=s3-bucket 113 | druid.indexer.logs.s3Prefix=logs 114 | ``` 115 | 116 | ## Index Task Fails 117 | 118 | I am trying to reindex my `click_conversion` dataset so that segments are indexed by `WEEK` rather than by `DAY`.
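For reference, a spec like the one below can be kicked off with a plain HTTP POST to the overlord's indexing service endpoint. A minimal sketch, assuming the spec is saved in a hypothetical file `reindex_weekly.json`:

```bash
# Submit the task spec to the overlord's indexing service; the response
# contains the task id that later shows up in the console and in the logs.
curl -s -X POST -H 'Content-Type: application/json' \
  --data @reindex_weekly.json \
  'http://overlord-ip:8080/druid/indexer/v1/task'
```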
119 | 120 | ### Index Task Description 121 | 122 | Here is my indexing task specification: 123 | 124 | ```json 125 | { 126 | "type": "index", 127 | "dataSource": "click_conversion_weekly", 128 | "granularitySpec": 129 | { 130 | "type": "uniform", 131 | "gran": "WEEK", 132 | "intervals": ["2014-04-06T00:00:00.000Z/2014-04-13T00:00:00.000Z"] 133 | }, 134 | "indexGranularity": "minute", 135 | "aggregators": [ 136 | {"type": "count", "name": "count"}, 137 | {"type": "doubleSum", "name": "commissions", "fieldName": "commissions"}, 138 | {"type": "doubleSum", "name": "sales", "fieldName": "sales"}, 139 | {"type": "doubleSum", "name": "orders", "fieldName": "orders"} 140 | ], 141 | "firehose": { 142 | "type": "ingestSegment", 143 | "dataSource": "click_conversion", 144 | "interval": "2014-04-06T00:00:00.000Z/2014-04-13T00:00:00.000Z" 145 | } 146 | } 147 | ``` 148 | 149 | ### Index Task Error 150 | 151 | The task eventually fails with the following error: 152 | 153 | ``` 154 | 2014-10-27 16:40:11,934 INFO [task-runner-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[reingest] to overlord[http://10.13.132.213:8080/druid/indexer/v1/action]: SegmentListUsedAction{dataSource='click_conversion', interval=2014-04-06T00:00:00.000Z/2014-04-13T00:00:00.000Z} 155 | 2014-10-27 16:40:50,224 WARN [task-runner-0] io.druid.indexing.common.index.YeOldePlumberSchool - Failed to merge and upload 156 | java.lang.IllegalStateException: Nothing indexed? 157 | at io.druid.indexing.common.index.YeOldePlumberSchool$1.finishJob(YeOldePlumberSchool.java:159) 158 | at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:444) 159 | at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:198) 160 | at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:219) 161 | at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:198) 162 | at java.util.concurrent.FutureTask.run(FutureTask.java:262) 163 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 164 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 165 | at java.lang.Thread.run(Thread.java:745) 166 | 2014-10-27 16:40:50,228 INFO [task-runner-0] io.druid.indexing.common.task.IndexTask - Task[index_click_conversion_weekly_2014-10-27T16:36:15.336Z] interval[2014-04-07T00:00:00.000Z/2014-04-14T00:00:00.000Z] partition[0] took in 12,968,203 rows (0 processed, 0 unparseable, 12,968,203 thrown away) and output 0 rows 167 | 2014-10-27 16:40:50,229 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_click_conversion_weekly_2014-10-27T16:36:15.336Z, type=index, dataSource=click_conversion_weekly}] 168 | java.lang.IllegalStateException: Nothing indexed? 
169 | at io.druid.indexing.common.index.YeOldePlumberSchool$1.finishJob(YeOldePlumberSchool.java:159) 170 | at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:444) 171 | at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:198) 172 | at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:219) 173 | at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:198) 174 | at java.util.concurrent.FutureTask.run(FutureTask.java:262) 175 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 176 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 177 | at java.lang.Thread.run(Thread.java:745) 178 | 2014-10-27 16:40:50,229 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Removing task directory: /tmp/persistent/task/index_click_conversion_weekly_2014-10-27T16:36:15.336Z/work 179 | 2014-10-27 16:40:50,242 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: { 180 | "id" : "index_click_conversion_weekly_2014-10-27T16:36:15.336Z", 181 | "status" : "FAILED", 182 | "duration" : 268865 183 | } 184 | 2014-10-27 16:40:50,248 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@47f36a24]. 185 | ``` 186 | 187 | Please see the full log for details: [task.log](task.log). 188 | 189 | ### Index Task Overlord Log 190 | 191 | And here is what the overlord log shows: 192 | 193 | ``` 194 | 2014-10-27 16:40:11,761 INFO [qtp1352013222-38] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[reingest]: SegmentListUsedAction{dataSource='click_conversion', interval=2014-04-06T00:00:00.000Z/2014-04-13T00:00:00.000Z} 195 | 2014-10-27 16:40:28,849 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 1 tasks from storage (0 tasks added, 0 tasks removed). 
196 | 2014-10-27 16:40:50,822 INFO [PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.136.64.60:8080] wrote FAILED status for task: index_click_conversion_weekly_2014-10-27T16:36:15.336Z 197 | 2014-10-27 16:40:50,822 INFO [PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.136.64.60:8080] completed task[index_click_conversion_weekly_2014-10-27T16:36:15.336Z] with status[FAILED] 198 | 2014-10-27 16:40:50,822 INFO [PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_click_conversion_weekly_2014-10-27T16:36:15.336Z 199 | 2014-10-27 16:40:50,823 INFO [PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_click_conversion_weekly_2014-10-27T16:36:15.336Z] on worker[10.136.64.60:8080] 200 | 2014-10-27 16:40:50,826 INFO [PathChildrenCache-0] io.druid.indexing.overlord.HeapMemoryTaskStorage - Updating task index_click_conversion_weekly_2014-10-27T16:36:15.336Z to status: TaskStatus{id=index_click_conversion_weekly_2014-10-27T16:36:15.336Z, status=FAILED, duration=0} 201 | 2014-10-27 16:40:50,826 INFO [PathChildrenCache-0] io.druid.indexing.overlord.TaskLockbox - Removing task[index_click_conversion_weekly_2014-10-27T16:36:15.336Z] from TaskLock[index_click_conversion_weekly_2014-10-27T16:36:15.336Z] 202 | 2014-10-27 16:40:50,826 INFO [PathChildrenCache-0] io.druid.indexing.overlord.TaskLockbox - TaskLock is now empty: TaskLock{groupId=index_click_conversion_weekly_2014-10-27T16:36:15.336Z, dataSource=click_conversion_weekly, interval=2014-03-31T00:00:00.000Z/2014-04-14T00:00:00.000Z, version=2014-10-27T16:36:15.337Z} 203 | 2014-10-27 16:40:50,826 INFO [PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Task done: IndexTask{id=index_click_conversion_weekly_2014-10-27T16:36:15.336Z, type=index, dataSource=click_conversion_weekly} 204 | 2014-10-27 16:40:50,827 INFO [PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Task FAILED: IndexTask{id=index_click_conversion_weekly_2014-10-27T16:36:15.336Z, type=index, dataSource=click_conversion_weekly} (0 run duration) 205 | 2014-10-27 16:40:50,827 INFO [PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Task[index_click_conversion_weekly_2014-10-27T16:36:15.336Z] went bye bye. 206 | 2014-10-27 16:41:28,849 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 0 tasks from storage (0 tasks added, 0 tasks removed). 207 | 2014-10-27 16:42:28,850 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 0 tasks from storage (0 tasks added, 0 tasks removed). 208 | ``` 209 | 210 | ### Resolution 211 | 212 | I had to work around a bug that the Druid devs found in the indexing service by directly specifying the `numShards` property in the task JSON (note that `targetPartitionSize` is disabled by setting it to `-1`, since `numShards` is given explicitly). This is the replacement task JSON: 213 | 214 | ```json 215 | { 216 | "type" : "index", 217 | "schema" : { 218 | "dataSchema" : { 219 | "dataSource" : "click_conversion_weekly", 220 | "metricsSpec" : [ { 221 | "type" : "count", 222 | "name" : "count" 223 | }, { 224 | "type" : "doubleSum", 225 | "name" : "commissions", 226 | "fieldName" : "commissions" 227 | }, { 228 | "type" : "doubleSum", 229 | "name" : "sales", 230 | "fieldName" : "sales" 231 | }, { 232 | "type" : "doubleSum", 233 | "name" : "orders", 234 | "fieldName" : "orders" 235 | } ], 236 | "granularitySpec" : { 237 | "type" : "uniform", 238 | "segmentGranularity" : "WEEK", 239 | "queryGranularity" : "MINUTE", 240 | "intervals" : [ "2014-04-13T00:00:00.000Z/2014-09-01T00:00:00.000Z" ] 241 | } 242 | }, 243 | "ioConfig" : { 244 | "type" : "index", 245 | "firehose" : { 246 | "type" : "ingestSegment", 247 | "dataSource" : "click_conversion", 248 | "interval" : "2014-04-13T00:00:00.000Z/2014-09-01T00:00:00.000Z" 249 | } 250 | }, 251 | "tuningConfig" : { 252 | "type" : "index", 253 | "rowFlushBoundary" : 500000, 254 | "targetPartitionSize": -1, 255 | "numShards" : 3 256 | } 257 | } 258 | } 259 | ``` 260 | 261 | I also needed to mount an EBS volume on the middleManager node and add the following configuration to it: 262 | 263 | ```properties 264 | druid.indexer.runner.javaOpts="-server -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" 265 | druid.indexer.task.chathandler.type=announce 266 | druid.indexer.task.baseTaskDir=/persistent 267 | ```
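To confirm the replacement task actually succeeds without clicking through the console, the overlord's indexing service API can be polled directly. A minimal sketch (the task id here is the failed run's id from the logs above; a resubmitted task gets a fresh id):

```bash
# Ask the overlord for the task's current status (RUNNING, SUCCESS, or FAILED).
curl -s 'http://overlord-ip:8080/druid/indexer/v1/task/index_click_conversion_weekly_2014-10-27T16:36:15.336Z/status'
```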
268 | 269 | ## Not enough direct memory 270 | 271 | I get the following in my historical node log: 272 | 273 | ``` 274 | 2014-10-29 21:30:44,610 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.segment.loading.SegmentLoaderConfig] from props[druid.segmentCache.] as [SegmentLoaderConfig{locations=[StorageLocationConfig{path=/indexCache, maxSize=21207667507}], deleteOnRemove=true, dropSegmentDelayMillis=30000, infoDir=null}] 275 | 2014-10-29 21:30:44,615 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.query.QueryConfig] from props[druid.query.] as [io.druid.query.QueryConfig@1159f15e] 276 | 2014-10-29 21:30:44,630 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.query.search.search.SearchQueryConfig] from props[druid.query.search.] as [io.druid.query.search.search.SearchQueryConfig@60f1057] 277 | 2014-10-29 21:30:44,638 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.query.groupby.GroupByQueryConfig] from props[druid.query.groupBy.]
as [io.druid.query.groupby.GroupByQueryConfig@70095013] 278 | 2014-10-29 21:30:44,641 INFO [main] org.skife.config.ConfigurationObjectFactory - Assigning value [1142857142] for [druid.processing.buffer.sizeBytes] on [io.druid.query.DruidProcessingConfig#intermediateComputeSizeBytes()] 279 | 2014-10-29 21:30:44,644 INFO [main] org.skife.config.ConfigurationObjectFactory - Assigning value [7] for [druid.processing.numThreads] on [io.druid.query.DruidProcessingConfig#getNumThreads()] 280 | 2014-10-29 21:30:44,644 INFO [main] org.skife.config.ConfigurationObjectFactory - Using method itself for [${base_path}.columnCache.sizeBytes] on [io.druid.query.DruidProcessingConfig#columnCacheSizeBytes()] 281 | 2014-10-29 21:30:44,645 INFO [main] org.skife.config.ConfigurationObjectFactory - Assigning default value [processing-%s] for [${base_path}.formatString] on [com.metamx.common.concurrent.ExecutorServiceConfig#getFormatString()] 282 | 2014-10-29 21:30:44,726 WARN [main] io.druid.guice.DruidProcessingModule - Guice provision errors: 283 | 284 | 1) Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, or druid.processing.numThreads: maxDirectMemory[4,294,967,296], memoryNeeded[9,142,857,136] = druid.processing.buffer.sizeBytes[1,142,857,142] * ( druid.processing.numThreads[7] + 1 ) 285 | 286 | 1 error 287 | com.google.inject.ProvisionException: Guice provision errors: 288 | 289 | 1) Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, or druid.processing.numThreads: maxDirectMemory[4,294,967,296], memoryNeeded[9,142,857,136] = druid.processing.buffer.sizeBytes[1,142,857,142] * ( druid.processing.numThreads[7] + 1 ) 290 | 291 | 1 error 292 | at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:87) 293 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 294 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 295 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 296 | at java.lang.reflect.Method.invoke(Method.java:606) 297 | at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:105) 298 | at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86) 299 | at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55) 300 | at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66) 301 | at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47) 302 | at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) 303 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 304 | at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) 305 | at com.google.inject.Scopes$1$1.get(Scopes.java:65) 306 | at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) 307 | at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38) 308 | at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62) 309 | at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107) 310 | at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88) 311 | at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269) 312 | at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) 313 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 314 | at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) 315 | at com.google.inject.Scopes$1$1.get(Scopes.java:65) 316 | at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) 317 | at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38) 318 | at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62) 319 | at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107) 320 | at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88) 321 | at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269) 322 | at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) 323 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 324 | at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) 325 | at com.google.inject.Scopes$1$1.get(Scopes.java:65) 326 | at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) 327 | at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56) 328 | at com.google.inject.internal.InjectorImpl$3$1.call(InjectorImpl.java:1005) 329 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 330 | at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1001) 331 | at com.google.inject.spi.ProviderLookup$1.get(ProviderLookup.java:90) 332 | at com.google.inject.spi.ProviderLookup$1.get(ProviderLookup.java:90) 333 | at com.google.inject.multibindings.MapBinder$RealMapBinder$2.get(MapBinder.java:389) 334 | at com.google.inject.multibindings.MapBinder$RealMapBinder$2.get(MapBinder.java:385) 335 | at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86) 336 | at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55) 337 | at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66) 338 | at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47) 339 | at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38) 340 | at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62) 341 | at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107) 342 | at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88) 343 | at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269) 344 | at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56) 345 | at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) 346 | at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 347 | at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) 348 | at com.google.inject.Scopes$1$1.get(Scopes.java:65) 349 | at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) 350 | at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38) 351 | at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62) 352 | at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107) 353 | at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88) 354 | at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269) 355 | at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) 356 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 357 | at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) 358 | at com.google.inject.Scopes$1$1.get(Scopes.java:65) 359 | at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) 360 | at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38) 361 | at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62) 362 | at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107) 363 | at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88) 364 | at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269) 365 | at com.google.inject.internal.InjectorImpl$3$1.call(InjectorImpl.java:1005) 366 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 367 | at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1001) 368 | at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1040) 369 | at io.druid.server.metrics.MetricsModule.getMonitorScheduler(MetricsModule.java:83) 370 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 371 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 372 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 373 | at java.lang.reflect.Method.invoke(Method.java:606) 374 | at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:105) 375 | at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86) 376 | at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55) 377 | at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66) 378 | at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47) 379 | at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) 380 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 381 | at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) 382 | at io.druid.guice.LifecycleScope$1.get(LifecycleScope.java:49) 383 | at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) 384 | at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56) 385 | at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) 386 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058) 387 | at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) 388 | at com.google.inject.Scopes$1$1.get(Scopes.java:65) 389 | at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) 390 | at com.google.inject.internal.InternalInjectorCreator$1.call(InternalInjectorCreator.java:205) 391 | at com.google.inject.internal.InternalInjectorCreator$1.call(InternalInjectorCreator.java:199) 392 | at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1051) 393 | at com.google.inject.internal.InternalInjectorCreator.loadEagerSingletons(InternalInjectorCreator.java:199) 394 | at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:180) 395 | at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:110) 396 | at com.google.inject.Guice.createInjector(Guice.java:96) 397 | at com.google.inject.Guice.createInjector(Guice.java:73) 398 | at com.google.inject.Guice.createInjector(Guice.java:62) 399 | at io.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:349) 400 | at io.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:56) 401 | at io.druid.cli.ServerRunnable.run(ServerRunnable.java:39) 402 | at io.druid.cli.Main.main(Main.java:90) 403 | ``` 404 | 405 | ### Resolution 406 | 407 | The JVM setting `-XX:MaxDirectMemorySize` on the historical nodes must be at least as big as `druid.processing.numThreads * druid.processing.buffer.sizeBytes`. 
408 | 409 | ## Exception with one of the sequences 410 | 411 | When trying to run a simple topN query after restarting my cluster, I get the following response: 412 | 413 | ```bash 414 | $ dq "http://${druid_broker}:8080/druid/v2/?pretty" topn_by_channel.json 415 | 416 | curl --silent --show-error -d @topn_by_channel.json -H 'content-type: application/json' 'http://broker-ip:8080/druid/v2/' --data-urlencode 'pretty' | python -mjson.tool | pygmentize -l json -f terminal256 417 | 418 | { 419 | "error": "null exception" 420 | } 421 | 422 | real 0m1.964s 423 | user 0m0.003s 424 | sys 0m0.002s 425 | ``` 426 | 427 | The topN query JSON: 428 | 429 | ```json 430 | { 431 | "queryType": "topN", 432 | "dataSource": "click_conversion", 433 | "granularity": "day", 434 | "dimension": "channel", 435 | "metric": "events", 436 | "threshold": 100, 437 | "aggregations": [ 438 | { 439 | "type": "longSum", 440 | "fieldName": "count", 441 | "name": "events" 442 | }, 443 | { 444 | "type": "longSum", 445 | "fieldName": "orders", 446 | "name": "orders" 447 | }, 448 | { 449 | "type": "doubleSum", 450 | "fieldName": "sales", 451 | "name": "sales" 452 | }, 453 | { 454 | "type": "doubleSum", 455 | "fieldName": "commissions", 456 | "name": "commissions" 457 | } 458 | ], 459 | "intervals": ["0/3000"], 460 | "context": {"useCache": false, "populateCache": false} 461 | } 462 | ``` 463 | 464 | Looking in the broker node's log, I see the following entry: 465 | 466 | ``` 467 | 2014-10-29 21:37:52,655 INFO [qtp1289413694-43] io.druid.server.QueryResource - null exception [3941ccaa-f425-4dfc-913f-8f3faa08d953] 468 | ``` 469 | 470 | Looking in the historical node's log, I see the following: 471 | 472 | ``` 473 | 2014-10-29 21:40:58,826 INFO [topN_click_conversion_[2014-04-11T00:00:00.000Z/2014-04-12T00:00:00.000Z]] io.druid.guice.DruidProcessingModule$IntermediateProcessingBufferPool - Allocating new intermediate processing buffer[219] of size[1,142,857,142] 474 | 2014-10-29 21:40:58,826 INFO [topN_click_conversion_[2014-04-14T00:00:00.000Z/2014-04-15T00:00:00.000Z]] io.druid.guice.DruidProcessingModule$IntermediateProcessingBufferPool - Allocating new intermediate processing buffer[220] of size[1,142,857,142] 475 | 2014-10-29 21:40:58,826 INFO [topN_click_conversion_[2014-04-17T00:00:00.000Z/2014-04-18T00:00:00.000Z]] io.druid.guice.DruidProcessingModule$IntermediateProcessingBufferPool - Allocating new intermediate processing buffer[221] of size[1,142,857,142] 476 | 2014-10-29 21:40:58,826 INFO [topN_click_conversion_[2014-04-22T00:00:00.000Z/2014-04-23T00:00:00.000Z]] io.druid.guice.DruidProcessingModule$IntermediateProcessingBufferPool - Allocating new intermediate processing buffer[222] of size[1,142,857,142] 477 | [Full GC[CMS: 15246K->15444K(3145728K), 0.1036770 secs] 314761K->15444K(4089472K), [CMS Perm : 39889K->39889K(66556K)], 0.1037830 secs] [Times: user=0.11 sys=0.00, real=0.10 secs] 478 | [Full GC[CMS: 15444K->15444K(3145728K), 0.0891800 secs] 15444K->15444K(4089472K), [CMS Perm : 39889K->39889K(66556K)], 0.0892730 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] 479 | [Full GC[CMS: 15444K->15444K(3145728K), 0.0885340 secs] 15444K->15444K(4089472K), [CMS Perm : 39889K->39889K(66556K)], 0.0886360 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] 480 | [Full GC[CMS: 15444K->15444K(3145728K), 0.0892900 secs] 17841K->15444K(4089472K), [CMS Perm : 39889K->39889K(66556K)], 0.0893870 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] 481 | 2014-10-29 21:40:59,199 ERROR [processing-6]
io.druid.query.ChainedExecutionQueryRunner - Exception with one of the sequences! 482 | java.lang.NullPointerException 483 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:231) 484 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:37) 485 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:64) 486 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:29) 487 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:84) 488 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:79) 489 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 490 | at com.metamx.common.guava.FilteringYieldingAccumulator.accumulate(FilteringYieldingAccumulator.java:69) 491 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 492 | at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104) 493 | at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81) 494 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 495 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 496 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 497 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 498 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 499 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 500 | at com.metamx.common.guava.YieldingSequenceBase.accumulate(YieldingSequenceBase.java:18) 501 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 502 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 503 | at io.druid.query.spec.SpecificSegmentQueryRunner$2$1.call(SpecificSegmentQueryRunner.java:78) 504 | at io.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:149) 505 | at io.druid.query.spec.SpecificSegmentQueryRunner.access$300(SpecificSegmentQueryRunner.java:35) 506 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.doItNamed(SpecificSegmentQueryRunner.java:140) 507 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.accumulate(SpecificSegmentQueryRunner.java:72) 508 | at com.metamx.common.guava.Sequences.toList(Sequences.java:113) 509 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:132) 510 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:118) 511 | at java.util.concurrent.FutureTask.run(FutureTask.java:262) 512 | at io.druid.query.PrioritizedExecutorService$PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:204) 513 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 514 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 515 | at java.lang.Thread.run(Thread.java:745) 516 | 2014-10-29 21:40:59,199 ERROR [processing-5] io.druid.query.ChainedExecutionQueryRunner - Exception with one of the sequences! 
517 | java.lang.NullPointerException 518 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:231) 519 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:37) 520 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:64) 521 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:29) 522 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:84) 523 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:79) 524 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 525 | at com.metamx.common.guava.FilteringYieldingAccumulator.accumulate(FilteringYieldingAccumulator.java:69) 526 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 527 | at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104) 528 | at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81) 529 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 530 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 531 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 532 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 533 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 534 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 535 | at com.metamx.common.guava.YieldingSequenceBase.accumulate(YieldingSequenceBase.java:18) 536 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 537 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 538 | at io.druid.query.spec.SpecificSegmentQueryRunner$2$1.call(SpecificSegmentQueryRunner.java:78) 539 | at io.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:149) 540 | at io.druid.query.spec.SpecificSegmentQueryRunner.access$300(SpecificSegmentQueryRunner.java:35) 541 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.doItNamed(SpecificSegmentQueryRunner.java:140) 542 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.accumulate(SpecificSegmentQueryRunner.java:72) 543 | at com.metamx.common.guava.Sequences.toList(Sequences.java:113) 544 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:132) 545 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:118) 546 | at java.util.concurrent.FutureTask.run(FutureTask.java:262) 547 | at io.druid.query.PrioritizedExecutorService$PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:204) 548 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 549 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 550 | at java.lang.Thread.run(Thread.java:745) 551 | 2014-10-29 21:40:59,207 INFO [topN_click_conversion_[2014-07-24T00:00:00.000Z/2014-07-25T00:00:00.000Z]] io.druid.guice.DruidProcessingModule$IntermediateProcessingBufferPool - Allocating new intermediate processing buffer[223] of size[1,142,857,142] 552 | 2014-10-29 21:40:59,207 INFO [topN_click_conversion_[2014-06-04T00:00:00.000Z/2014-06-05T00:00:00.000Z]] io.druid.guice.DruidProcessingModule$IntermediateProcessingBufferPool - Allocating new intermediate processing buffer[224] of size[1,142,857,142] 553 | [Full GC[CMS: 
15444K->15262K(3145728K), 0.0890440 secs] 44207K->15262K(4089472K), [CMS Perm : 39889K->39889K(66556K)], 0.0891450 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] 554 | [Full GC[CMS: 15262K->15247K(3145728K), 0.0889730 secs] 22454K->15247K(4089472K), [CMS Perm : 39889K->39889K(66556K)], 0.0890760 secs] [Times: user=0.09 sys=0.00, real=0.08 secs] 555 | 2014-10-29 21:40:59,386 ERROR [processing-1] io.druid.query.ChainedExecutionQueryRunner - Exception with one of the sequences! 556 | java.lang.NullPointerException 557 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:231) 558 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:37) 559 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:64) 560 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:29) 561 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:84) 562 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:79) 563 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 564 | at com.metamx.common.guava.FilteringYieldingAccumulator.accumulate(FilteringYieldingAccumulator.java:69) 565 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 566 | at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104) 567 | at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81) 568 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 569 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 570 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 571 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 572 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 573 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 574 | at com.metamx.common.guava.YieldingSequenceBase.accumulate(YieldingSequenceBase.java:18) 575 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 576 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 577 | at io.druid.query.spec.SpecificSegmentQueryRunner$2$1.call(SpecificSegmentQueryRunner.java:78) 578 | at io.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:149) 579 | at io.druid.query.spec.SpecificSegmentQueryRunner.access$300(SpecificSegmentQueryRunner.java:35) 580 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.doItNamed(SpecificSegmentQueryRunner.java:140) 581 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.accumulate(SpecificSegmentQueryRunner.java:72) 582 | at com.metamx.common.guava.Sequences.toList(Sequences.java:113) 583 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:132) 584 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:118) 585 | at java.util.concurrent.FutureTask.run(FutureTask.java:262) 586 | at io.druid.query.PrioritizedExecutorService$PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:204) 587 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 588 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 589 | at java.lang.Thread.run(Thread.java:745) 590 | 2014-10-29 21:40:59,386 ERROR [processing-4] 
io.druid.query.ChainedExecutionQueryRunner - Exception with one of the sequences! 591 | java.lang.NullPointerException 592 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:231) 593 | at io.druid.query.topn.PooledTopNAlgorithm.cleanup(PooledTopNAlgorithm.java:37) 594 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:64) 595 | at io.druid.query.topn.TopNMapFn.apply(TopNMapFn.java:29) 596 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:84) 597 | at io.druid.query.topn.TopNQueryEngine$1.apply(TopNQueryEngine.java:79) 598 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 599 | at com.metamx.common.guava.FilteringYieldingAccumulator.accumulate(FilteringYieldingAccumulator.java:69) 600 | at com.metamx.common.guava.MappingYieldingAccumulator.accumulate(MappingYieldingAccumulator.java:57) 601 | at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104) 602 | at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81) 603 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 604 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 605 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 606 | at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) 607 | at com.metamx.common.guava.FilteredSequence.toYielder(FilteredSequence.java:52) 608 | at com.metamx.common.guava.ResourceClosingSequence.toYielder(ResourceClosingSequence.java:25) 609 | at com.metamx.common.guava.YieldingSequenceBase.accumulate(YieldingSequenceBase.java:18) 610 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 611 | at io.druid.query.MetricsEmittingQueryRunner$1.accumulate(MetricsEmittingQueryRunner.java:103) 612 | at io.druid.query.spec.SpecificSegmentQueryRunner$2$1.call(SpecificSegmentQueryRunner.java:78) 613 | at io.druid.query.spec.SpecificSegmentQueryRunner.doNamed(SpecificSegmentQueryRunner.java:149) 614 | at io.druid.query.spec.SpecificSegmentQueryRunner.access$300(SpecificSegmentQueryRunner.java:35) 615 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.doItNamed(SpecificSegmentQueryRunner.java:140) 616 | at io.druid.query.spec.SpecificSegmentQueryRunner$2.accumulate(SpecificSegmentQueryRunner.java:72) 617 | at com.metamx.common.guava.Sequences.toList(Sequences.java:113) 618 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:132) 619 | at io.druid.query.ChainedExecutionQueryRunner$1$1$1.call(ChainedExecutionQueryRunner.java:118) 620 | at java.util.concurrent.FutureTask.run(FutureTask.java:262) 621 | at io.druid.query.PrioritizedExecutorService$PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:204) 622 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 623 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 624 | at java.lang.Thread.run(Thread.java:745) 625 | ``` 626 | 627 | The above lines repeat hundreds of times in the log. 628 | 629 | ### Resolution 630 | 631 | As in the previous issue, the JVM setting `-XX:MaxDirectMemorySize` on the historical nodes must be at least as big as `druid.processing.buffer.sizeBytes * (druid.processing.numThreads + 1)`.
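A quick way to confirm what a node is actually running with is the same `ps` trick that `update.sh` below uses. A sketch, reusing its `$WHIRR_PEM` and `$druid_historical_1` exports:

```bash
# Pull the live JVM flags off the historical node and pick out the direct memory cap.
ssh -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no \
  "$druid_historical_1" "ps -efw | grep java | grep -v grep" 2>/dev/null \
  | grep -o 'MaxDirectMemorySize=[^ ]*'
```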
632 | 633 | Explanation from Fangjin: 634 | 635 | The intermediate buffer is a block of off-heap memory used to hold intermediate results for computation (one of the biggest users of this buffer is topN queries). For example, if you need to compute aggregates for every dimension value in a topN dimension with very high cardinality, you may require a lot of memory to store these results. Using off-heap memory bounds the total resources used so we don't overflow the heap for computations, which could otherwise grow without bound when there are numerous concurrent queries. Because of the limited memory available on your nodes, you do have to make a tradeoff in how this memory gets used for various things. Druid needs some manual configuration for resource distribution right now and we hope this gets easier in the future. -------------------------------------------------------------------------------- /broker.properties: -------------------------------------------------------------------------------- 1 | # System Cores: 8 2 | # System Memory: 31564075008 bytes 3 | # Space at /: 8320901120 bytes 4 | 5 | # Server Module (All nodes) 6 | druid.service=druid/broker 7 | druid.host=10.166.117.181:8080 8 | druid.port=8080 9 | 10 | # Indexing Service Discovery Module (All nodes) 11 | druid.selectors.indexing.serviceName=druid:overlord 12 | 13 | # Curator Module (All nodes) 14 | druid.zk.service.host=10.158.107.246:2181 15 | 16 | # Metrics Module (All nodes) 17 | druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"] 18 | 19 | # DataSegment Pusher/Puller Module - S3 Deep Storage (All nodes) 20 | druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.146"] 21 | druid.storage.type=s3 22 | druid.s3.accessKey=XXXXXXXXXXXX 23 | druid.s3.secretKey=xxxxxxxxxxxxxxxxxxxx 24 | druid.storage.bucket=s3-bucket 25 | 26 | # DataSegment Pusher/Puller Module - Cassandra Deep Storage (All nodes) 27 | # druid.extensions.coordinates=["io.druid.extensions:druid-cassandra-storage:0.6.146"] 28 | # druid.storage.type=c* 29 | # druid.storage.host=none:9160 30 | # druid.storage.keyspace=druid 31 | 32 | # Emitter Module (All nodes) 33 | druid.emitter=logging 34 | 35 | # Druid Processing Module (Historical, Realtime, and Broker nodes) 36 | druid.processing.numThreads=7 37 | druid.processing.buffer.sizeBytes=2000000000 38 | 39 | # Queryable Module (Historical, Realtime, and Broker nodes) 40 | druid.request.logging.type=file 41 | druid.request.logging.dir=/usr/local/druid-services/logs 42 | 43 | # Http Client Module (Broker node) 44 | druid.broker.http.numConnections=20 45 | druid.broker.http.readTimeout=PT5M 46 | druid.broker.cache.type=local 47 | druid.broker.cache.sizeInBytes=4160450560 48 | 49 | # /!\ Counterintuitive Configuration Warning /!\ 50 | # Setting druid.emitter.logging.logLevel to "debug" DISABLES debug logging, 51 | # to enable debug logging, comment out the following line: 52 | druid.emitter.logging.logLevel=debug 53 | 54 | -------------------------------------------------------------------------------- /coordinator.properties: -------------------------------------------------------------------------------- 1 | # System Cores: 4 2 | # System Memory: 15773679616 bytes 3 | # Space at /: 8320901120 bytes 4 | 5 | # Server Module (All nodes) 6 | druid.service=druid/coordinator 7 | druid.host=10.91.147.92:8080 8 | druid.port=8080 9 | 10 | # Indexing Service Discovery Module (All nodes) 11 | druid.selectors.indexing.serviceName=druid:overlord 12 | 13 | # Curator Module
(All nodes) 14 | druid.zk.service.host=10.158.107.246:2181 15 | 16 | # Metrics Module (All nodes) 17 | druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"] 18 | 19 | # DataSegment Pusher/Puller Module - S3 Deep Storage (All nodes) 20 | druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.146"] 21 | druid.storage.type=s3 22 | druid.s3.accessKey=XXXXXXXXXXXX 23 | druid.s3.secretKey=xxxxxxxxxxxxxxxxxxxx 24 | druid.storage.bucket=s3-bucket 25 | 26 | # DataSegment Pusher/Puller Module - Cassandra Deep Storage (All nodes) 27 | # druid.extensions.coordinates=["io.druid.extensions:druid-cassandra-storage:0.6.146"] 28 | # druid.storage.type=c* 29 | # druid.storage.host=none:9160 30 | # druid.storage.keyspace=druid 31 | 32 | # Emitter Module (All nodes) 33 | druid.emitter=logging 34 | 35 | # Database Connector Module (Coordinator, Overlord, and MiddleManager nodes) 36 | druid.db.connector.connectURI=jdbc:mysql://10.91.140.56:3306/druid 37 | druid.db.connector.user=druid 38 | druid.db.connector.password=diurd 39 | 40 | # Coordinator Configuration 41 | druid.coordinator.startDelay=PT70s 42 | 43 | # /!\ Counterintuitive Configuration Warning /!\ 44 | # Setting druid.emitter.logging.logLevel to "debug" DISABLES debug logging, 45 | # to enable debug logging, comment out the following line: 46 | druid.emitter.logging.logLevel=debug 47 | 48 | -------------------------------------------------------------------------------- /historical.properties: -------------------------------------------------------------------------------- 1 | # System Cores: 8 2 | # System Memory: 31564075008 bytes 3 | # Space at /: 8320901120 bytes 4 | # Space at /indexCache: bytes 5 | 6 | # Server Module (All nodes) 7 | druid.service=druid/historical 8 | druid.host=10.63.203.153:8080 9 | druid.port=8080 10 | 11 | # Indexing Service Discovery Module (All nodes) 12 | druid.selectors.indexing.serviceName=druid:overlord 13 | 14 | # Curator Module (All nodes) 15 | druid.zk.service.host=10.158.107.246:2181 16 | 17 | # Metrics Module (All nodes - ServerMonitor for Realtime and Historical nodes only) 18 | druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor","io.druid.server.metrics.ServerMonitor"] 19 | 20 | # DataSegment Pusher/Puller Module - S3 Deep Storage (All nodes) 21 | druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.146"] 22 | druid.storage.type=s3 23 | druid.s3.accessKey=XXXXXXXXXXXX 24 | druid.s3.secretKey=xxxxxxxxxxxxxxxxxxxx 25 | druid.storage.bucket=s3-bucket 26 | 27 | # DataSegment Pusher/Puller Module - Cassandra Deep Storage (All nodes) 28 | # druid.extensions.coordinates=["io.druid.extensions:druid-cassandra-storage:0.6.146"] 29 | # druid.storage.type=c* 30 | # druid.storage.host=none:9160 31 | # druid.storage.keyspace=druid 32 | 33 | # Emitter Module (All nodes) 34 | druid.emitter=logging 35 | 36 | # Druid Processing Module (Historical, Realtime, and Broker nodes) 37 | druid.processing.numThreads=7 38 | druid.processing.buffer.sizeBytes=1142857142 39 | 40 | # Queryable Module (Historical, Realtime, and Broker nodes) 41 | druid.request.logging.type=file 42 | druid.request.logging.dir=/usr/local/druid-services/logs 43 | 44 | # Storage Node Module (Historical and Realtime nodes) 45 | druid.server.maxSize=21207667507 46 | druid.segmentCache.locations=[{"path": "/indexCache", "maxSize": 21207667507}] 47 | 48 | # /!\ Counterintuitive Configuration Warning /!\ 49 | # Setting druid.emitter.logging.logLevel to 
"debug" DISABLES debug logging, 50 | # to enable debug logging, comment out the following line: 51 | druid.emitter.logging.logLevel=debug 52 | 53 | -------------------------------------------------------------------------------- /images/broken_console.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lexicalunit/druid_config/98d807588ba2db9715db81bf71f4366cddbc98e7/images/broken_console.png -------------------------------------------------------------------------------- /images/console.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lexicalunit/druid_config/98d807588ba2db9715db81bf71f4366cddbc98e7/images/console.png -------------------------------------------------------------------------------- /images/task_fail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lexicalunit/druid_config/98d807588ba2db9715db81bf71f4366cddbc98e7/images/task_fail.png -------------------------------------------------------------------------------- /images/task_running.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lexicalunit/druid_config/98d807588ba2db9715db81bf71f4366cddbc98e7/images/task_running.png -------------------------------------------------------------------------------- /images/working_console.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lexicalunit/druid_config/98d807588ba2db9715db81bf71f4366cddbc98e7/images/working_console.png -------------------------------------------------------------------------------- /middleManager.properties: -------------------------------------------------------------------------------- 1 | # System Cores: 8 2 | # System Memory: 31564075008 bytes 3 | # Space at /: 8320901120 bytes 4 | 5 | # Server Module (All nodes) 6 | druid.service=druid/middleManager 7 | druid.host=10.237.181.35:8080 8 | druid.port=8080 9 | 10 | # Indexing Service Discovery Module (All nodes) 11 | druid.selectors.indexing.serviceName=druid:overlord 12 | 13 | # Curator Module (All nodes) 14 | druid.zk.service.host=10.158.107.246:2181 15 | 16 | # Metrics Module (All nodes) 17 | druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"] 18 | 19 | # DataSegment Pusher/Puller Module - S3 Deep Storage (All nodes) 20 | druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.146"] 21 | druid.storage.type=s3 22 | druid.s3.accessKey=XXXXXXXXXXXX 23 | druid.s3.secretKey=xxxxxxxxxxxxxxxxxxxx 24 | druid.storage.bucket=s3-bucket 25 | 26 | # DataSegment Pusher/Puller Module - Cassandra Deep Storage (All nodes) 27 | # druid.extensions.coordinates=["io.druid.extensions:druid-cassandra-storage:0.6.146"] 28 | # druid.storage.type=c* 29 | # druid.storage.host=none:9160 30 | # druid.storage.keyspace=druid 31 | 32 | # Emitter Module (All nodes) 33 | druid.emitter=logging 34 | 35 | # Database Connector Module (Coordinator, Overlord, and MiddleManager nodes) 36 | druid.db.connector.connectURI=jdbc:mysql://10.91.140.56:3306/druid 37 | druid.db.connector.user=druid 38 | druid.db.connector.password=diurd 39 | 40 | # Task Log Module (Overlord and MiddleManager node) 41 | druid.indexer.logs.type=s3 42 | druid.indexer.logs.s3Bucket=s3-bucket 43 | druid.indexer.logs.s3Prefix=logs 44 | 45 | # Middle Manager Configuration 46 | 
druid.worker.ip=10.237.181.35 47 | druid.worker.capacity=7 48 | 49 | # Peon Configuration 50 | druid.indexer.runner.javaOpts="-server -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" 51 | druid.indexer.task.chathandler.type=announce 52 | druid.indexer.task.baseTaskDir=/persistent 53 | 54 | # Taken from http://druid.io/docs/latest/Production-Cluster-Configuration.html 55 | druid.indexer.fork.property.druid.computation.buffer.size=536870912 56 | druid.indexer.fork.property.druid.processing.numThreads=3 57 | druid.indexer.fork.property.druid.request.logging.type=file 58 | druid.indexer.fork.property.druid.request.logging.dir=/usr/local/druid-services/logs 59 | druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/tmp", "maxSize": 0}] 60 | druid.indexer.fork.property.druid.server.http.numThreads=50 61 | 62 | # /!\ Counterintuitive Configuration Warning /!\ 63 | # Setting druid.emitter.logging.logLevel to "debug" DISABLES debug logging, 64 | # to enable debug logging, comment out the following line: 65 | druid.emitter.logging.logLevel=debug 66 | 67 | -------------------------------------------------------------------------------- /overlord.properties: -------------------------------------------------------------------------------- 1 | # System Cores: 4 2 | # System Memory: 15773679616 bytes 3 | # Space at /: 8320901120 bytes 4 | 5 | # Server Module (All nodes) 6 | druid.service=druid/overlord 7 | druid.host=10.33.150.229:8080 8 | druid.port=8080 9 | 10 | # Indexing Service Discovery Module (All nodes) 11 | druid.selectors.indexing.serviceName=druid:overlord 12 | 13 | # Curator Module (All nodes) 14 | druid.zk.service.host=10.158.107.246:2181 15 | 16 | # Metrics Module (All nodes) 17 | druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"] 18 | 19 | # DataSegment Pusher/Puller Module - S3 Deep Storage (All nodes) 20 | druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.146"] 21 | druid.storage.type=s3 22 | druid.s3.accessKey=XXXXXXXXXXXX 23 | druid.s3.secretKey=xxxxxxxxxxxxxxxxxxxx 24 | druid.storage.bucket=s3-bucket 25 | 26 | # DataSegment Pusher/Puller Module - Cassandra Deep Storage (All nodes) 27 | # druid.extensions.coordinates=["io.druid.extensions:druid-cassandra-storage:0.6.146"] 28 | # druid.storage.type=c* 29 | # druid.storage.host=none:9160 30 | # druid.storage.keyspace=druid 31 | 32 | # Emitter Module (All nodes) 33 | druid.emitter=logging 34 | 35 | # Database Connector Module (Coordinator, Overlord, and MiddleManager nodes) 36 | druid.db.connector.connectURI=jdbc:mysql://10.91.140.56:3306/druid 37 | druid.db.connector.user=druid 38 | druid.db.connector.password=diurd 39 | 40 | # Task Log Module (Overlord and MiddleManager node) 41 | druid.indexer.logs.type=s3 42 | druid.indexer.logs.s3Bucket=s3-bucket 43 | druid.indexer.logs.s3Prefix=logs 44 | 45 | # Overlord Configuration 46 | druid.indexer.queue.startDelay=PT0M 47 | druid.indexer.runner.type=remote 48 | 49 | # /!\ Counterintuitive Configuration Warning /!\ 50 | # Setting druid.emitter.logging.logLevel to "debug" DISABLES debug logging, 51 | # to enable debug logging, comment out the following line: 52 | druid.emitter.logging.logLevel=debug 53 | 54 | -------------------------------------------------------------------------------- /update.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | function abort() 4 | { 5 | echo $'\e[1;31merror:' $1 $'\e[0m' >&2 6 | exit 
1 7 | } 8 | 9 | [[ -n "$druid_coordinator" && -n "$druid_broker" && -n "$druid_historical_1" && -n "$druid_overlord" && -n "$druid_middleManager" ]] || abort "exports not set up" 10 | 11 | scp -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_coordinator:/usr/local/druid-services/config/coordinator/runtime.properties" coordinator.properties 12 | scp -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_broker:/usr/local/druid-services/config/broker/runtime.properties" broker.properties 13 | scp -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_historical_1:/usr/local/druid-services/config/historical/runtime.properties" historical.properties 14 | scp -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_overlord:/usr/local/druid-services/config/overlord/runtime.properties" overlord.properties 15 | scp -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_middleManager:/usr/local/druid-services/config/middleManager/runtime.properties" middleManager.properties 16 | 17 | for I in *.properties; do 18 | perl -pi -e 's/druid.s3.accessKey=.*/druid.s3.accessKey=XXXXXXXXXXXX/g' "$I" 19 | perl -pi -e 's/druid.s3.secretKey=.*/druid.s3.secretKey=xxxxxxxxxxxxxxxxxxxx/g' "$I" 20 | perl -pi -e 's/druid.storage.bucket=.*/druid.storage.bucket=s3-bucket/g' "$I" 21 | perl -pi -e 's/druid.indexer.logs.s3Bucket=.*/druid.indexer.logs.s3Bucket=s3-bucket/g' "$I" 22 | done 23 | 24 | echo "JVM settings:" 25 | ssh -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_coordinator" "ps -efw | grep java | grep -v grep" 2>/dev/null 26 | ssh -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_broker" "ps -efw | grep java | grep -v grep" 2>/dev/null 27 | ssh -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_historical_1" "ps -efw | grep java | grep -v grep" 2>/dev/null 28 | ssh -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_overlord" "ps -efw | grep java | grep -v grep" 2>/dev/null 29 | ssh -i "$WHIRR_PEM" -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no "$druid_middleManager" "ps -efw | grep java | grep -v grep" 2>/dev/null 30 | --------------------------------------------------------------------------------
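For reference, `update.sh` assumes these exports are already set. A sketch, with a hypothetical key path and the host IPs taken from the `druid.host` values in the `.properties` files above:

```bash
export WHIRR_PEM=~/.ssh/whirr.pem        # hypothetical path to the cluster's SSH key
export druid_coordinator=10.91.147.92
export druid_broker=10.166.117.181
export druid_historical_1=10.63.203.153
export druid_overlord=10.33.150.229
export druid_middleManager=10.237.181.35
```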