├── .DS_Store
├── bin
    └── README
├── default
    ├── .DS_Store
    ├── app.conf
    ├── data
    │   ├── .DS_Store
    │   └── ui
    │   │   ├── .DS_Store
    │   │   ├── nav
    │   │       └── default.xml
    │   │   └── views
    │   │       ├── .DS_Store
    │   │       ├── README
    │   │       ├── bucket_roll_analysis.xml
    │   │       ├── cluster_master_performance.xml
    │   │       ├── crash_dump_analysis.xml
    │   │       ├── debug_bucket_rolls.xml
    │   │       ├── debug_cache_manager_misses.xml
    │   │       ├── debug_incoming_forwarders.xml
    │   │       ├── debug_indexer_performance.xml
    │   │       ├── debug_ingestion.xml
    │   │       ├── debug_peer_is_down.xml
    │   │       ├── debug_replication.xml
    │   │       ├── debug_search.xml
    │   │       ├── discovery_forwarding_hierrachy.xml
    │   │       ├── discovery_searches.xml
    │   │       ├── event_delay_for_host.xml
    │   │       ├── event_delay_for_index.xml
    │   │       ├── event_delay_index_sourcetype.xml
    │   │       ├── event_distribution_measurement.xml
    │   │       ├── find_cluster_master_events.xml
    │   │       ├── home.xml
    │   │       ├── indexer_performance.xml
    │   │       ├── internal_indexes_breakdown.xml
    │   │       ├── measuring_concurrency.xml
    │   │       ├── roll_your_own_tstats_acceleration.xml
    │   │       ├── search_head_resource_utilisation.xml
    │   │       ├── search_performance_evaluator.xml
    │   │       ├── trace_back_indexer_search_load.xml
    │   │       ├── tstats_performance_comparision.xml
    │   │       └── vcpu_infrastructure_sizing.xml
    └── searches.conf
├── local
    ├── app.conf
    ├── data
    │   └── ui
    │   │   └── views
    │   │       ├── bucket_size_analysis
    │   │       ├── bursting_forwarders_and_indexing_delay.xml
    │   │       ├── event_distribution_measurements.xml
    │   │       ├── intermediate_forwarders_switching_efficiency_analysis.xml
    │   │       └── top_data_generating_source_forwarder_analysis.xml
    └── savedsearches.conf
├── metadata
    ├── default.meta
    └── local.meta
└── vcpu_pricing


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silkyrich/cluster_health_tools/6678502bc293585094f98e1ba87a1d40a0b3b538/.DS_Store


--------------------------------------------------------------------------------
/bin/README:
--------------------------------------------------------------------------------
1 | This is where you put any scripts you want to add to this app.
2 |


--------------------------------------------------------------------------------
/default/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silkyrich/cluster_health_tools/6678502bc293585094f98e1ba87a1d40a0b3b538/default/.DS_Store


--------------------------------------------------------------------------------
/default/app.conf:
--------------------------------------------------------------------------------
1 | #
2 | # Splunk app configuration file
3 | #
4 |
5 | [install]
6 | is_configured = 0
7 |
8 | [ui]
9 | is_visible = 1
10 | label = Event distribution tools
11 |
12 | [launcher]
13 | author = Richard Morgan
14 | description = A collection of dashboards to measure event distribution by various metrics
15 | version = 1.0
16 |
17 |


--------------------------------------------------------------------------------
/default/data/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silkyrich/cluster_health_tools/6678502bc293585094f98e1ba87a1d40a0b3b538/default/data/.DS_Store


--------------------------------------------------------------------------------
/default/data/ui/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silkyrich/cluster_health_tools/6678502bc293585094f98e1ba87a1d40a0b3b538/default/data/ui/.DS_Store


--------------------------------------------------------------------------------
/default/data/ui/nav/default.xml:
--------------------------------------------------------------------------------
1 |
8 |


--------------------------------------------------------------------------------
/default/data/ui/views/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/silkyrich/cluster_health_tools/6678502bc293585094f98e1ba87a1d40a0b3b538/default/data/ui/views/.DS_Store


--------------------------------------------------------------------------------
/default/data/ui/views/README:
--------------------------------------------------------------------------------
1 | There are a lot of dashboards in this collection of varying quality.
2 |
3 | Install the one called "home" into any app that you like, "search" for instance.
4 |
5 | The "home" dashboard has links to the major dashboards and a description of what they are trying to achieve.
6 |


--------------------------------------------------------------------------------
/default/data/ui/views/crash_dump_analysis.xml:
--------------------------------------------------------------------------------
1 |
254 |


--------------------------------------------------------------------------------
/default/data/ui/views/debug_bucket_rolls.xml:
--------------------------------------------------------------------------------
1 |
115 |


--------------------------------------------------------------------------------
/default/data/ui/views/debug_cache_manager_misses.xml:
--------------------------------------------------------------------------------
1 |
137 |


--------------------------------------------------------------------------------
/default/data/ui/views/debug_peer_is_down.xml:
--------------------------------------------------------------------------------
1 |
321 |


--------------------------------------------------------------------------------
/default/data/ui/views/debug_replication.xml:
--------------------------------------------------------------------------------
1 |
250 |


--------------------------------------------------------------------------------
/default/data/ui/views/debug_search.xml:
--------------------------------------------------------------------------------
1 |
224 |


--------------------------------------------------------------------------------
/default/data/ui/views/discovery_forwarding_hierrachy.xml:
--------------------------------------------------------------------------------
1 |
176 |


--------------------------------------------------------------------------------
/default/data/ui/views/discovery_searches.xml:
--------------------------------------------------------------------------------
1 |
230 |


--------------------------------------------------------------------------------
/default/data/ui/views/event_delay_for_host.xml:
--------------------------------------------------------------------------------
1 |
317 |


--------------------------------------------------------------------------------
/default/data/ui/views/event_delay_for_index.xml:
--------------------------------------------------------------------------------
1 |
219 |

--------------------------------------------------------------------------------
/default/data/ui/views/event_delay_index_sourcetype.xml:
--------------------------------------------------------------------------------
1 |
121 |


--------------------------------------------------------------------------------
/default/data/ui/views/event_distribution_measurement.xml:
--------------------------------------------------------------------------------
1 |
267 |


--------------------------------------------------------------------------------
/default/data/ui/views/find_cluster_master_events.xml:
--------------------------------------------------------------------------------
1 |
288 |


--------------------------------------------------------------------------------
/default/data/ui/views/home.xml:
--------------------------------------------------------------------------------
1 |Welcome to Richard Morgan's collection of diagnostic dashboards. As a Principal SE Architect at Splunk working in the EMEA region for many years, I have developed my own dashboards to debug and assess the correct and efficient working of Splunk. I use these dashboards in my day-to-day job and I am proud to share them with the wider community.
8 |
9 |All these dashboards have been built with the following principles:
10 |This tool reads the metrics about the forwarders connecting to your indexers. You can select an entire cluster or a single site, and the tool will rank forwarders by their contribution of data. This is important because it is very common for a single forwarder to send a significant amount of data to the cluster. In extreme cases it is not uncommon for 90% of the data to come from just 5% of the forwarders. It is therefore very important to tune this 5% to work efficiently and correctly. The remaining 95% of forwarders then become little more than rounding errors. This tool allows you to browse the forwarder population and, at a high level, understand each forwarder's contribution, its burstiness, its maximum data rates, how long it takes to sweep the entire cluster, and its software build and OS.
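The ranking idea described above can be sketched outside Splunk. The Python below is a minimal illustration, not the dashboard's actual search: it assumes you already have the total bytes sent per forwarder, and the function names `rank_forwarders` and `forwarders_for_share` and the sample hostnames are hypothetical. It ranks forwarders by contribution and finds how few of them account for a given share of the data.

```python
from typing import Dict, List, Tuple


def rank_forwarders(bytes_by_forwarder: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (forwarder, bytes, cumulative_share) tuples, largest sender first."""
    total = sum(bytes_by_forwarder.values())
    if total == 0:
        return []
    ranked = sorted(bytes_by_forwarder.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    running = 0
    for name, sent in ranked:
        running += sent
        result.append((name, sent, running / total))
    return result


def forwarders_for_share(bytes_by_forwarder: Dict[str, int], share: float = 0.9) -> List[str]:
    """Smallest set of top forwarders whose combined volume reaches `share` of the total."""
    selected = []
    for name, _sent, cumulative in rank_forwarders(bytes_by_forwarder):
        selected.append(name)
        if cumulative >= share:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical sample: one forwarder dominates the data volume.
    sample = {"fwd-a": 9200, "fwd-b": 500, "fwd-c": 200, "fwd-d": 100}
    for name, sent, cumulative in rank_forwarders(sample):
        print(f"{name:6s} {sent:6d} bytes  cumulative {cumulative:.0%}")
    print("Forwarders covering 90% of data:", forwarders_for_share(sample))
```

If the list returned by `forwarders_for_share` is a small fraction of the population, those are the forwarders worth tuning first.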
27 |This tool instruments all logging around the ingestion pipelines, including: pipeline utilization, load, parsing errors, blocking, connecting forwarders, channel creation, output performance. You can select a single pipeline and see metrics about the sources passing through it, identify problematic sources and drill backwards through the forwarding chain to the source.
31 |
32 |
33 |This tool measures event distribution in your Splunk environment. Select a site or a cluster to analyse, then select a subset of the indexes within that site or cluster to measure. The dashboard then measures, at regular intervals, how events are distributed across the search peers. The distribution is visualized in a series of charts.
37 |
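One way to put a number on "how events are distributed across the search peers" is sketched below in Python. This is an assumption for illustration only: the source does not say which statistic the dashboard uses, so the coefficient of variation (standard deviation divided by mean) is swapped in here, and the function names and peer names are hypothetical. A score of 0.0 means every peer holds the same number of events.

```python
from statistics import mean, pstdev
from typing import Dict, List


def distribution_score(events_per_peer: Dict[str, int]) -> float:
    """Coefficient of variation of event counts across peers; 0.0 is perfectly even."""
    counts = list(events_per_peer.values())
    avg = mean(counts) if counts else 0
    if avg == 0:
        return 0.0
    return pstdev(counts) / avg


def score_over_time(samples: List[Dict[str, int]]) -> List[float]:
    """Score the cumulative counts after each interval to see how quickly they even out.

    Each sample should list every peer, using 0 for peers that received nothing,
    otherwise idle peers are silently left out of the score.
    """
    running: Dict[str, int] = {}
    scores = []
    for sample in samples:
        for peer, count in sample.items():
            running[peer] = running.get(peer, 0) + count
        scores.append(distribution_score(running))
    return scores
```

Plotting the output of `score_over_time` against the interval number gives a rough picture of how long it takes for the data to spread evenly.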
38 |
39 |
40 |
41 |This tool measures event delay at scale simply by comparing the indexed field "_indextime" with _time using tstats. By default, the search targets every event received over the last second. Use this tool to find indexes that are receiving events from the past or the future, and then drill in to find which specific hosts are responsible for the ill-timed events.
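The comparison described above reduces to subtracting the event timestamp from the index timestamp. The Python below is a minimal sketch of that arithmetic, not the dashboard's tstats search; the function names, the tuple layout, the 60-second tolerance and the sample values are all assumptions made for illustration. A positive delay means the event arrived late (data from the past); a negative delay means the event carries a timestamp in the future.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each event is (index, host, _time, _indextime), both times in epoch seconds.
Event = Tuple[str, str, float, float]


def delay_seconds(event_time: float, index_time: float) -> float:
    """Delay = _indextime - _time. Positive: late data. Negative: future timestamp."""
    return index_time - event_time


def delay_by_index(events: Iterable[Event]) -> Dict[str, Dict[str, float]]:
    """Summarise the minimum, maximum and average delay per index."""
    delays = defaultdict(list)
    for index, _host, event_time, index_time in events:
        delays[index].append(delay_seconds(event_time, index_time))
    return {
        index: {"min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
        for index, vals in delays.items()
    }


def ill_timed_hosts(events: Iterable[Event], index: str, tolerance: float = 60.0) -> Dict[str, float]:
    """Hosts in `index` whose worst delay, late or early, exceeds `tolerance` seconds."""
    worst: Dict[str, float] = {}
    for idx, host, event_time, index_time in events:
        if idx != index:
            continue
        delay = delay_seconds(event_time, index_time)
        if abs(delay) > abs(worst.get(host, 0.0)):
            worst[host] = delay
    return {host: delay for host, delay in worst.items() if abs(delay) > tolerance}
```

Running `delay_by_index` first and then `ill_timed_hosts` on a suspicious index mirrors the find-then-drill-in workflow described above.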
45 |This tool is similar to "Event delay per index", but instead of measuring one second for all hosts, it measures one host for 24 hours (by default). Use this tool to understand whether a host is sending data to the cluster in a timely manner.
49 |
50 | https://raw.githubusercontent.com/silkyrich/cluster_health_tools/master/default/data/ui/views/debug_replication.xml
51 |
52 |Blocking on indexers can be caused by a few factors: disk IO, saturated CPU, network latency and network throughput. Network throughput can be a problem in cloud environments, where indexers are placed in various racks around the data center and contention for network resources is in play. Just because you have a 10Gbit interface doesn't mean you are going to get that between pairs of hosts. This dashboard measures the blocking reported by the indexers when sending to a remote host, allowing you to surface problematic indexers. It then plots the delay by reporting indexer, remote blocked indexer and individual bucket.
56 |
57 |
58 |
59 |
60 |
61 |This tool allows you to understand and measure the efficiency of bucket rolling across all indexes, and to drill down into the behaviour of specific indexes. Select a bucket and it constructs a search to find what is in that bucket. Use this tool to understand and reduce the frequency of bucket rolls.
65 |Event distribution is critical to search workload distribution and the scale out of your environment. This tool measures how well event distribution is working in your environment. You can select multiple indexes and see how quickly randomisation is working.
69 |
70 |
71 |