├── .gitignore ├── README.md ├── co2meter ├── README.md ├── archive.png ├── co2meter.ndjson ├── co2meter.png ├── co2meter.py ├── dashboard.png ├── index.png └── minio.png ├── directory-sizes ├── README.md ├── archive.png ├── dashboard.png ├── directory-sizes.py ├── index.png ├── logo.png └── minio.png ├── flight-tracker ├── README.md ├── archive.png ├── dashboard.png ├── flight-tracker.ndjson ├── flight-tracker.py ├── index.png ├── logo.png └── minio.png ├── gps ├── README.md ├── archive.png ├── dashboard.png ├── gps.ndjson ├── gps.png ├── index.png └── minio.png ├── haproxy-filebeat-module ├── 2-archive │ ├── haproxy-filebeat-module-archive.yml │ ├── haproxy-filebeat-module-reindex.yml │ └── haproxy-filebeat-module-structure.yml ├── 4-visualize │ └── dashboard.json └── README.md ├── images ├── architecture.png ├── caiv.png ├── data-source-assets.png ├── elk-data-lake.png ├── indexing.png ├── logical-elements.png ├── onboarding-data.png ├── terminology.png └── workflow.png ├── power-emu2 ├── README.md ├── archive.png ├── dashboard.png ├── emu-2.jpg ├── index.png ├── minio.png ├── power-emu2.ndjson └── power-emu2.py ├── power-hs300 ├── README.md ├── archive.png ├── dashboard.png ├── hs300.png ├── hs300.py ├── index.png ├── minio.png ├── power-hs300.ndjson └── reindex.yml ├── satellites ├── README.md ├── dashboard.png ├── satellites.ndjson └── satellites.png ├── setup ├── README.md ├── dead-letter-queue-archive.yml ├── distributor.conf ├── haproxy.cfg └── needs-classified-archive.yml ├── solar-enphase ├── README.md ├── archive.png ├── dashboard.png ├── index.png ├── minio.png ├── solar-enphase.py └── solar.png ├── temperature-dht22 ├── README.md ├── archive.png ├── dashboard.png ├── dht22.png ├── index.png ├── minio.png └── temperature-dht22.ndjson ├── utilization ├── 2-archive │ ├── utilization-archive.yml │ ├── utilization-reindex.yml │ └── utilization-structure.yml └── 3-index │ ├── README.md │ └── utilization-index-template.json └── weather-station ├── README.md ├── archive.png ├── dashboard.png ├── index.png ├── minio.png ├── weather-station.ndjson └── ws-1550-ip.png /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | *.swp 3 | *.bak 4 | *.orig 5 | *.dump 6 | *.keystore 7 | tmp 8 | -------------------------------------------------------------------------------- /co2meter/README.md: -------------------------------------------------------------------------------- 1 | # CO2 Monitoring 2 | 3 | co2meter 4 | 5 | The [CO2Mini](https://www.co2meter.com/collections/desktop/products/co2mini-co2-indoor-air-quality-monitor?variant=308811055) is an indoor air quality monitor that displays the CO2 level of the room it's in. It's often used in home and office settings since it's been shown that elevated CO2 levels can cause [fatigue](https://pubmed.ncbi.nlm.nih.gov/26273786/) and [impair decisions](https://newscenter.lbl.gov/2012/10/17/elevated-indoor-carbon-dioxide-impairs-decision-making-performance/). The CO2Mini connects to a computer via USB where it can be read programmatically. 6 | 7 | In this data source, we'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 
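Before diving in, it can help to confirm the monitor shows up as a HID device once it's plugged in. Below is a minimal, optional pre-flight check (assuming the default `/dev/hidraw0` node used by the script in Step #1; the index can differ if other HID devices are attached):

```python
#!/usr/bin/env python3
# Optional pre-flight check: confirm the CO2Mini's HID device node exists before
# setting up the collection script. The collection script reads /dev/hidraw0; if
# other HID devices are attached, the meter may enumerate at a different index.

import glob
import os

devices = sorted(glob.glob("/dev/hidraw*"))
if not devices:
    print("No /dev/hidraw* devices found -- is the CO2Mini plugged in?")
else:
    for dev in devices:
        if os.access(dev, os.R_OK):
            print(f"{dev}: readable")
        else:
            print(f"{dev}: present, but not readable as this user (use sudo or add a udev rule)")
```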
12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/co2meter.py` with the following contents: 16 | 17 | ​ [co2meter.py](co2meter.py) 18 | 19 | The script was originally written by [Henryk Plötz](https://hackaday.io/project/5301-reverse-engineering-a-low-cost-usb-co-monitor/log/17909-all-your-base-are-belong-to-us) and has only a few minor edits so it works with Python3. 20 | 21 | Take a few minutes to familiarize yourself with the script. There are a couple of labels you can change near the bottom. Adjust the values of `hostname` and `location` to suit your needs. 22 | 23 | With your CO2Mini plugged in, try running the script: 24 | 25 | ```bash 26 | chmod a+x ~/bin/co2meter.py 27 | sudo ~/bin/co2meter.py 28 | ``` 29 | 30 | We'll run our script with `sudo`, but you could add a `udev` rule to give your user permission to `/dev/hidraw0`. 31 | 32 | You should see output on `stdout` similar to: 33 | 34 | ```json 35 | {"@timestamp": "2021-09-01T20:38:06.353614", "hostname": "node", "location": "office", "co2_ppm": 438, "temp_c": 27.79, "temp_f": 82.02, "source": "CO2 Meter"} 36 | ``` 37 | 38 | Once you confirm the script is working, you can redirect its output to a log file: 39 | 40 | ```bash 41 | sudo touch /var/log/co2meter.log 42 | sudo chown ubuntu.ubuntu /var/log/co2meter.log 43 | ``` 44 | 45 | Create a logrotate entry so the log file doesn't grow unbounded: 46 | 47 | ```bash 48 | sudo vi /etc/logrotate.d/co2meter 49 | ``` 50 | 51 | Add the following logrotate content: 52 | 53 | ``` 54 | /var/log/co2meter.log { 55 | weekly 56 | rotate 12 57 | compress 58 | delaycompress 59 | missingok 60 | notifempty 61 | create 644 ubuntu ubuntu 62 | } 63 | ``` 64 | 65 | Add the following entry to your crontab with `crontab -e`: 66 | 67 | ``` 68 | * * * * * /home/ubuntu/bin/co2meter.py >> /var/log/co2meter.log 2>&1 69 | ``` 70 | 71 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 72 | 73 | ```bash 74 | tail -f /var/log/co2meter.log 75 | ``` 76 | 77 | If you're seeing output scroll each minute then you are successfully collecting data! 78 | 79 | ## Step #2 - Archive Data 80 | 81 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 82 | 83 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your CO2 data: 84 | 85 | ```yaml 86 | filebeat.inputs: 87 | - type: log 88 | enabled: true 89 | tags: ["co2meter"] 90 | paths: 91 | - /var/log/co2meter.log 92 | ``` 93 | 94 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 95 | 96 | Restart Filebeat: 97 | 98 | ```bash 99 | sudo systemctl restart filebeat 100 | ``` 101 | 102 | You may want to tail syslog to see if Filebeat restarts without any issues: 103 | 104 | ```bash 105 | tail -f /var/log/syslog | grep filebeat 106 | ``` 107 | 108 | At this point, we should have CO2 Meter data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the CO2 Meter data feed. 
109 | 
110 | Add the following conditional to your `distributor.yml` file:
111 | 
112 | ```
113 | } else if "co2meter" in [tags] {
114 |   pipeline {
115 |     send_to => ["co2meter-archive"]
116 |   }
117 | }
118 | ```
119 | 
120 | Create a Logstash pipeline called `co2meter-archive.yml` with the following contents:
121 | 
122 | ```
123 | input {
124 |   pipeline {
125 |     address => "co2meter-archive"
126 |   }
127 | }
128 | filter {
129 | }
130 | output {
131 |   s3 {
132 |     #
133 |     # Custom Settings
134 |     #
135 |     prefix => "co2meter/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
136 |     temporary_directory => "${S3_TEMP_DIR}/co2meter-archive"
137 |     access_key_id => "${S3_ACCESS_KEY}"
138 |     secret_access_key => "${S3_SECRET_KEY}"
139 |     endpoint => "${S3_ENDPOINT}"
140 |     bucket => "${S3_BUCKET}"
141 | 
142 |     #
143 |     # Standard Settings
144 |     #
145 |     validate_credentials_on_root_bucket => false
146 |     codec => json_lines
147 |     # Limit Data Lake file sizes to 5 GB
148 |     size_file => 5000000000
149 |     time_file => 60
150 |     # encoding => "gzip"
151 |     additional_settings => {
152 |       force_path_style => true
153 |       follow_redirects => false
154 |     }
155 |   }
156 | }
157 | ```
158 | 
159 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
160 | 
161 | ```bash
162 | sudo mv co2meter-archive.yml /etc/logstash/conf.d/
163 | ```
164 | 
165 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
166 | 
167 | ```
168 | - pipeline.id: "co2meter-archive"
169 |   path.config: "/etc/logstash/conf.d/co2meter-archive.yml"
170 | ```
171 | 
172 | And finally, restart the Logstash service:
173 | 
174 | ```bash
175 | sudo systemctl restart logstash
176 | ```
177 | 
178 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
179 | 
180 | ```bash
181 | sudo tail -f /var/log/logstash/logstash-plain.log
182 | ```
183 | 
184 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
185 | 
186 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
187 | 
188 | ![Stack Monitoring](archive.png)
189 | 
190 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
191 | 
192 | ![MinIO](minio.png)
193 | 
194 | If you see your data being stored, then you are successfully archiving!
195 | 
196 | ## Step #3 - Index Data
197 | 
198 | Once Logstash is archiving the data, next we need to index it with Elastic.
199 | 
200 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
201 | 
202 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
203 | 
204 | Create a new pipeline called `co2meter-index.yml` with the following content:
205 | 
206 | ```
207 | input {
208 |   pipeline {
209 |     address => "co2meter-index"
210 |   }
211 | }
212 | filter {
213 |   json {
214 |     source => "message"
215 |   }
216 |   json {
217 |     source => "message"
218 |   }
219 |   mutate {
220 |     remove_field => ["message", "agent", "host", "input", "log", "ecs", "@version"]
221 |   }
222 | }
223 | output {
224 |   elasticsearch {
225 |     #
226 |     # Custom Settings
227 |     #
228 |     id => "co2meter-index"
229 |     index => "co2meter-%{+YYYY.MM.dd}"
230 |     hosts => "${ES_ENDPOINT}"
231 |     user => "${ES_USERNAME}"
232 |     password => "${ES_PASSWORD}"
233 |   }
234 | }
235 | ```
236 | 
237 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
238 | 
239 | ```bash
240 | sudo mv co2meter-index.yml /etc/logstash/conf.d/
241 | ```
242 | 
243 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
244 | 
245 | ```
246 | - pipeline.id: "co2meter-index"
247 |   path.config: "/etc/logstash/conf.d/co2meter-index.yml"
248 | ```
249 | 
250 | Then update the `co2meter` conditional in your `distributor.yml` so it also sends events to this new pipeline (`send_to => ["co2meter-archive", "co2meter-index"]`), and finally, restart the Logstash service:
251 | 
252 | ```bash
253 | sudo systemctl restart logstash
254 | ```
255 | 
256 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
257 | 
258 | ```bash
259 | sudo tail -f /var/log/logstash/logstash-plain.log
260 | ```
261 | 
262 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
263 | 
264 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
265 | 
266 | ![Indexing](index.png)
267 | 
268 | ## Step #4 - Visualize Data
269 | 
270 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
271 | 
272 | Download this dashboard: [co2meter.ndjson](co2meter.ndjson)
273 | 
274 | Jump back into Kibana:
275 | 
276 | 1. Select "Stack Management" from the menu
277 | 2. Select "Saved Objects"
278 | 3. Click "Import" in the upper right
279 | 
280 | Once it's been imported, click on "CO2 Meter".
281 | 
282 | Congratulations! You should now be looking at data from your CO2 Meter in Elastic.
283 | 
284 | ![Dashboard](dashboard.png)
285 | 
286 | These graphs can be added to the [Weather Station](../weather-station) data source.
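If you ever want to spot-check the raw readings without going through Kibana, the NDJSON log the script produces is easy to summarize locally. Here is a minimal sketch, assuming the `/var/log/co2meter.log` path and the field names shown in Step #1:

```python
#!/usr/bin/env python3
# Summarize the co2meter log locally: min / average / max CO2 reading per location.
# Assumes the NDJSON format produced by co2meter.py above.

import json
from collections import defaultdict

readings = defaultdict(list)

with open("/var/log/co2meter.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines
        if "co2_ppm" in event:
            readings[event.get("location", "unknown")].append(event["co2_ppm"])

for location, ppm in readings.items():
    print(f"{location}: n={len(ppm)} min={min(ppm)} avg={sum(ppm) / len(ppm):.0f} max={max(ppm)} ppm")
```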
287 | -------------------------------------------------------------------------------- /co2meter/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/archive.png -------------------------------------------------------------------------------- /co2meter/co2meter.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"co2meter-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"74e365f0-0b03-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-01T09:03:21.562Z","version":"WzIxODM4MSwyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":11,\"h\":4,\"i\":\"83813ed8-374f-42f7-851e-453e236435be\"},\"panelIndex\":\"83813ed8-374f-42f7-851e-453e236435be\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# CO2 Meter\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":4,\"w\":48,\"h\":12,\"i\":\"2af5ddba-aad0-40ae-b802-4a9e3e83fc33\"},\"panelIndex\":\"2af5ddba-aad0-40ae-b802-4a9e3e83fc33\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"6619bac7-cb55-4c41-81cb-f417f1e8d4d4\":{\"columns\":{\"8b1ecdee-e774-4a13-b160-76f80601de32\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"bbd7882d-13ea-40f2-9d25-c863db2ff550\":{\"label\":\"Median of co2_ppm\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"co2_ppm\",\"isBucketed\":false,\"scale\":\"ratio\"},\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\":{\"label\":\"Top values of 
hostname.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"hostname.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\",\"8b1ecdee-e774-4a13-b160-76f80601de32\",\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"yLeftExtent\":{\"mode\":\"custom\",\"lowerBound\":0,\"upperBound\":1600},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"6619bac7-cb55-4c41-81cb-f417f1e8d4d4\",\"accessors\":[\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8b1ecdee-e774-4a13-b160-76f80601de32\",\"splitAccessor\":\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-6619bac7-cb55-4c41-81cb-f417f1e8d4d4\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":16,\"w\":48,\"h\":12,\"i\":\"35baf3b9-4b4f-4248-8044-6415c253c84c\"},\"panelIndex\":\"35baf3b9-4b4f-4248-8044-6415c253c84c\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"f4357ceb-17b0-4b3a-8118-8529aa62726d\":{\"columns\":{\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"07ebffaa-db61-4e78-b3cc-7bd46277170d\":{\"label\":\"Top values of hostname.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"hostname.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}},\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\":{\"label\":\"Median of 
temp_f\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"temp_f\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"07ebffaa-db61-4e78-b3cc-7bd46277170d\",\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\",\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"yLeftExtent\":{\"mode\":\"custom\",\"lowerBound\":0,\"upperBound\":100},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"f4357ceb-17b0-4b3a-8118-8529aa62726d\",\"accessors\":[\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\",\"splitAccessor\":\"07ebffaa-db61-4e78-b3cc-7bd46277170d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-f4357ceb-17b0-4b3a-8118-8529aa62726d\"}]},\"enhancements\":{}}}]","timeRestore":false,"title":"CO2 Meter","version":1},"coreMigrationVersion":"7.14.0","id":"90479db0-0b04-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"2af5ddba-aad0-40ae-b802-4a9e3e83fc33:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"2af5ddba-aad0-40ae-b802-4a9e3e83fc33:indexpattern-datasource-layer-6619bac7-cb55-4c41-81cb-f417f1e8d4d4","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"35baf3b9-4b4f-4248-8044-6415c253c84c:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"35baf3b9-4b4f-4248-8044-6415c253c84c:indexpattern-datasource-layer-f4357ceb-17b0-4b3a-8118-8529aa62726d","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-01T20:50:22.668Z","version":"WzIzNDk3MCwyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /co2meter/co2meter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/co2meter.png -------------------------------------------------------------------------------- /co2meter/co2meter.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import datetime 4 | import fcntl 5 | import json 6 | import sys 7 | import time 8 | 9 | def decrypt(key, data): 10 | cstate = [0x48, 0x74, 0x65, 0x6D, 0x70, 0x39, 0x39, 0x65] 11 | shuffle = [2, 4, 0, 7, 1, 6, 5, 3] 12 | phase1 = [0] * 8 13 | for i, o in enumerate(shuffle): 14 | phase1[o] = data[i] 15 | phase2 = [0] * 8 16 | for i in range(8): 17 | phase2[i] = phase1[i] ^ key[i] 18 | phase3 = [0] * 8 19 | 
for i in range(8): 20 | phase3[i] = ( (phase2[i] >> 3) | (phase2[ (i-1+8)%8 ] << 5) ) & 0xff 21 | ctmp = [0] * 8 22 | for i in range(8): 23 | ctmp[i] = ( (cstate[i] >> 4) | (cstate[i]<<4) ) & 0xff 24 | out = [0] * 8 25 | for i in range(8): 26 | out[i] = (0x100 + phase3[i] - ctmp[i]) & 0xff 27 | return out 28 | 29 | def hd(d): 30 | return " ".join("%02X" % e for e in d) 31 | 32 | if __name__ == "__main__": 33 | key = [0xc4, 0xc6, 0xc0, 0x92, 0x40, 0x23, 0xdc, 0x96] 34 | fp = open("/dev/hidraw0", "a+b", 0) 35 | set_report = [0] + [0xc4, 0xc6, 0xc0, 0x92, 0x40, 0x23, 0xdc, 0x96] 36 | fcntl.ioctl(fp, 0xC0094806, bytearray(set_report)) 37 | 38 | values = {} 39 | 40 | co2_ppm = 0 41 | temp_c = 1000 42 | i = 0 43 | 44 | while True: 45 | i += 1 46 | if i == 10: 47 | break 48 | data = list(fp.read(8)) 49 | decrypted = decrypt(key, data) 50 | if decrypted[4] != 0x0d or (sum(decrypted[:3]) & 0xff) != decrypted[3]: 51 | print(hd(data), " => ", hd(decrypted), "Checksum error") 52 | else: 53 | op = decrypted[0] 54 | val = decrypted[1] << 8 | decrypted[2] 55 | values[op] = val 56 | # http://co2meters.com/Documentation/AppNotes/AN146-RAD-0401-serial-communication.pdf 57 | if 0x50 in values: 58 | co2_ppm = values[0x50] 59 | if 0x42 in values: 60 | temp_c = values[0x42] / 16.0 - 273.15 61 | 62 | temp_f = (temp_c * 9 / 5) + 32 63 | output = { 64 | "@timestamp": datetime.datetime.utcnow().isoformat(), 65 | "hostname": "node-21", 66 | "location": "office", 67 | "co2_ppm": co2_ppm, 68 | "temp_c": float("%2.2f" % temp_c), 69 | "temp_f": float("%2.2f" % temp_f), 70 | "source": "CO2 Meter" 71 | } 72 | print(json.dumps(output)) 73 | -------------------------------------------------------------------------------- /co2meter/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/dashboard.png -------------------------------------------------------------------------------- /co2meter/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/index.png -------------------------------------------------------------------------------- /co2meter/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/minio.png -------------------------------------------------------------------------------- /directory-sizes/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Directory Sizes 2 | 3 | DHT22 4 | 5 | Keeping an eye on the growth of your Data Lake is useful for a few reasons: 6 | 7 | 1. See how fast each data source is growing on disk 8 | 2. Keep an eye on how much space you have available 9 | 3. Better understand the cost of storing each data source 10 | 11 | We'll use a Python script to query the size of each directory in our Data Lake (via NFS mount) in addition to recording the total size and space available for use. Our script will write to stdout which we'll redirect to a log file. From there, Filebeat will pick it up and send it to Elastic. 12 | 13 | ![Dashboard](dashboard.png) 14 | 15 | Let's get started. 
16 | 17 | ## Step #1 - Collect Data 18 | 19 | Create a Python script at `~/bin/directory-sizes.py` with the following contents (adjusting any values as you see fit): 20 | 21 | ```python 22 | #!/usr/bin/env python3 23 | 24 | import datetime 25 | import json 26 | import os 27 | 28 | path = "/mnt/data-lake" 29 | 30 | def get_size(start_path = path): 31 | total_size = 0 32 | for dirpath, dirnames, filenames in os.walk(start_path): 33 | for f in filenames: 34 | fp = os.path.join(dirpath, f) 35 | if not os.path.islink(fp): # skip symbolic links 36 | total_size += os.path.getsize(fp) 37 | return total_size 38 | 39 | if __name__ == "__main__": 40 | if os.path.ismount(path): 41 | # Get size of each directory 42 | for d in os.listdir(path): 43 | size_bytes = get_size(path + "/" + d) 44 | output = { 45 | "@timestamp": datetime.datetime.utcnow().isoformat(), 46 | "dir": d, 47 | "bytes": size_bytes 48 | } 49 | print(json.dumps(output)) 50 | 51 | # Get total, available, and free space 52 | statvfs = os.statvfs(path) 53 | output = { 54 | "@timestamp": datetime.datetime.utcnow().isoformat(), 55 | "total_bytes": statvfs.f_frsize * statvfs.f_blocks, # Size of filesystem in bytes 56 | "free_bytes": statvfs.f_frsize * statvfs.f_bfree, # Free bytes total 57 | "available_bytes": statvfs.f_frsize * statvfs.f_bavail, # Free bytes for users 58 | "mounted": True 59 | } 60 | print(json.dumps(output)) 61 | else: 62 | output = { 63 | "@timestamp": datetime.datetime.utcnow().isoformat(), 64 | "mounted": False 65 | } 66 | print(json.dumps(output)) 67 | ``` 68 | 69 | Try running the script from the command line: 70 | 71 | ```bash 72 | chmod a+x ~/bin/directory-sizes.py 73 | ~/bin/directory-sizes.py 74 | ``` 75 | 76 | The output should look like the following: 77 | 78 | ```json 79 | {"@timestamp": "2021-09-06T14:46:37.376487", "dir": "nginx", "bytes": 1445406508} 80 | {"@timestamp": "2021-09-06T14:46:39.673445", "dir": "system-metricbeat-module", "bytes": 62265436549} 81 | {"@timestamp": "2021-09-06T14:46:39.683812", "dir": "flights", "bytes": 5943006981} 82 | {"@timestamp": "2021-09-06T14:46:41.122360", "dir": "haproxy-metricbeat-module", "bytes": 15443596238} 83 | {"@timestamp": "2021-09-06T14:46:41.122731", "dir": "weather-historical", "bytes": 137599636} 84 | ... 85 | ``` 86 | 87 | Once you're able to successfully run the script, create a log file for its output: 88 | 89 | ```bash 90 | sudo touch /var/log/directory-sizes.log 91 | sudo chown ubuntu.ubuntu /var/log/directory-sizes.log 92 | ``` 93 | 94 | Create a logrotate entry so the log file doesn't grow unbounded: 95 | 96 | ``` 97 | sudo vi /etc/logrotate.d/directory-sizes 98 | ``` 99 | 100 | Add the following content: 101 | 102 | ``` 103 | /var/log/directory-sizes.log { 104 | weekly 105 | rotate 12 106 | compress 107 | delaycompress 108 | missingok 109 | notifempty 110 | create 644 ubuntu ubuntu 111 | } 112 | ``` 113 | 114 | Add the following entry to your crontab: 115 | 116 | ``` 117 | * * * * * sudo /home/ubuntu/bin/directory-sizes.py >> /var/log/directory-sizes.log 2>&1 118 | ``` 119 | 120 | Verify output by tailing the log file for a few minutes: 121 | 122 | ``` 123 | $ tail -f /var/log/directory-sizes.log 124 | ``` 125 | 126 | If you're seeing output scroll each minute then you are successfully collecting data! 127 | 128 | ## Step #2 - Archive Data 129 | 130 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 
131 | 
132 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your directory size data:
133 | 
134 | ```yaml
135 | filebeat.inputs:
136 | - type: log
137 |   enabled: true
138 |   tags: ["directory-sizes"]
139 |   paths:
140 |     - /var/log/directory-sizes.log
141 | ```
142 | 
143 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
144 | 
145 | Restart Filebeat:
146 | 
147 | ```bash
148 | sudo systemctl restart filebeat
149 | ```
150 | 
151 | You may want to tail syslog to see if Filebeat restarts without any issues:
152 | 
153 | ```bash
154 | tail -f /var/log/syslog | grep filebeat
155 | ```
156 | 
157 | At this point, we should have directory size data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the directory sizes data feed.
158 | 
159 | Add the following conditional to your `distributor.yml` file:
160 | 
161 | ```
162 | } else if "directory-sizes" in [tags] {
163 |   pipeline {
164 |     send_to => ["directory-sizes-archive"]
165 |   }
166 | }
167 | ```
168 | 
169 | Create a Logstash pipeline called `directory-sizes-archive.yml` with the following contents:
170 | 
171 | ```
172 | input {
173 |   pipeline {
174 |     address => "directory-sizes-archive"
175 |   }
176 | }
177 | filter {
178 | }
179 | output {
180 |   s3 {
181 |     #
182 |     # Custom Settings
183 |     #
184 |     prefix => "directory-sizes/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
185 |     temporary_directory => "${S3_TEMP_DIR}/directory-sizes-archive"
186 |     access_key_id => "${S3_ACCESS_KEY}"
187 |     secret_access_key => "${S3_SECRET_KEY}"
188 |     endpoint => "${S3_ENDPOINT}"
189 |     bucket => "${S3_BUCKET}"
190 | 
191 |     #
192 |     # Standard Settings
193 |     #
194 |     validate_credentials_on_root_bucket => false
195 |     codec => json_lines
196 |     # Limit Data Lake file sizes to 5 GB
197 |     size_file => 5000000000
198 |     time_file => 60
199 |     # encoding => "gzip"
200 |     additional_settings => {
201 |       force_path_style => true
202 |       follow_redirects => false
203 |     }
204 |   }
205 | }
206 | ```
207 | 
208 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
209 | 
210 | ```bash
211 | sudo mv directory-sizes-archive.yml /etc/logstash/conf.d/
212 | ```
213 | 
214 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
215 | 
216 | ```
217 | - pipeline.id: "directory-sizes-archive"
218 |   path.config: "/etc/logstash/conf.d/directory-sizes-archive.yml"
219 | ```
220 | 
221 | And finally, restart the Logstash service:
222 | 
223 | ```bash
224 | sudo systemctl restart logstash
225 | ```
226 | 
227 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
228 | 
229 | ```bash
230 | sudo tail -f /var/log/logstash/logstash-plain.log
231 | ```
232 | 
233 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
234 | 
235 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
236 | 
237 | ![Stack Monitoring](archive.png)
238 | 
239 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
240 | 
241 | ![MinIO](minio.png)
242 | 
243 | If you see your data being stored, then you are successfully archiving!
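Before wiring up the index pipeline, you can also sanity-check the numbers straight from the log file. Below is a small sketch that prints the most recent size reported for each directory, largest first, in GB. It assumes the `/var/log/directory-sizes.log` path used above; it accepts either the `dir` field emitted by the script in Step #1 or the `directory` field emitted by the copy of the script checked into this repo.

```python
#!/usr/bin/env python3
# Print the most recent size reported for each Data Lake directory, largest first.
# Assumes the NDJSON format produced by directory-sizes.py above.

import json

latest = {}

with open("/var/log/directory-sizes.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines
        name = event.get("dir") or event.get("directory")
        if name and "bytes" in event:
            latest[name] = event["bytes"]  # later lines overwrite earlier ones

for name, size in sorted(latest.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{size / 1e9:10.2f} GB  {name}")
```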
244 | 
245 | ## Step #3 - Index Data
246 | 
247 | Once Logstash is archiving the data, next we need to index it with Elastic.
248 | 
249 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
250 | 
251 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
252 | 
253 | Create a new pipeline called `directory-sizes-index.yml` with the following content:
254 | 
255 | ```
256 | input {
257 |   pipeline {
258 |     address => "directory-sizes-index"
259 |   }
260 | }
261 | filter {
262 |   json {
263 |     source => "message"
264 |   }
265 |   json {
266 |     source => "message"
267 |   }
268 |   date {
269 |     match => ["timestamp", "ISO8601"]
270 |   }
271 |   mutate {
272 |     remove_field => ["timestamp", "message"]
273 |     remove_field => ["tags", "agent", "input", "log", "path", "ecs", "@version"]
274 |   }
275 | }
276 | output {
277 |   elasticsearch {
278 |     #
279 |     # Custom Settings
280 |     #
281 |     id => "directory-sizes-index"
282 |     index => "directory-sizes-%{+YYYY.MM.dd}"
283 |     hosts => "${ES_ENDPOINT}"
284 |     user => "${ES_USERNAME}"
285 |     password => "${ES_PASSWORD}"
286 |   }
287 | }
288 | ```
289 | 
290 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
291 | 
292 | ```bash
293 | sudo mv directory-sizes-index.yml /etc/logstash/conf.d/
294 | ```
295 | 
296 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
297 | 
298 | ```
299 | - pipeline.id: "directory-sizes-index"
300 |   path.config: "/etc/logstash/conf.d/directory-sizes-index.yml"
301 | ```
302 | 
303 | Append your new pipeline to your tagged data in the `distributor.yml` pipeline:
304 | 
305 | ```
306 | } else if "directory-sizes" in [tags] {
307 |   pipeline {
308 |     send_to => ["directory-sizes-archive", "directory-sizes-index"]
309 |   }
310 | }
311 | ```
312 | 
313 | And finally, restart the Logstash service:
314 | 
315 | ```bash
316 | sudo systemctl restart logstash
317 | ```
318 | 
319 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
320 | 
321 | ```bash
322 | sudo tail -f /var/log/logstash/logstash-plain.log
323 | ```
324 | 
325 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
326 | 
327 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
328 | 
329 | ![Indexing](index.png)
330 | 
331 | ## Step #4 - Visualize Data
332 | 
333 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
334 | 
335 | Download this dashboard: [directory-sizes.ndjson](directory-sizes.ndjson)
336 | 
337 | Jump back into Kibana:
338 | 
339 | 1. Select "Stack Management" from the menu
340 | 2. Select "Saved Objects"
341 | 3. Click "Import" in the upper right
342 | 
343 | Once it's been imported, click on "Directory Sizes".
344 | 
345 | ![Dashboard](dashboard.png)
346 | 
347 | Congratulations! You should now be looking at the size of each directory in your Data Lake in Elastic.
348 | 349 | -------------------------------------------------------------------------------- /directory-sizes/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/archive.png -------------------------------------------------------------------------------- /directory-sizes/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/dashboard.png -------------------------------------------------------------------------------- /directory-sizes/directory-sizes.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import datetime 4 | import json 5 | import os 6 | 7 | path = "/mnt/data-lake" 8 | 9 | def get_size(start_path = path): 10 | total_size = 0 11 | for dirpath, dirnames, filenames in os.walk(start_path): 12 | for f in filenames: 13 | fp = os.path.join(dirpath, f) 14 | # skip if it is symbolic link 15 | if not os.path.islink(fp): 16 | total_size += os.path.getsize(fp) 17 | 18 | return total_size 19 | 20 | if __name__ == "__main__": 21 | 22 | if os.path.ismount(path): 23 | # Get size of each directory 24 | for d in os.listdir(path): 25 | size_bytes = get_size(path + "/" + d) 26 | output = { 27 | "@timestamp": datetime.datetime.utcnow().isoformat(), 28 | "directory": d, 29 | "bytes": size_bytes 30 | } 31 | print(json.dumps(output)) 32 | 33 | # Get total, available, and free space 34 | statvfs = os.statvfs(path) 35 | output = { 36 | "@timestamp": datetime.datetime.utcnow().isoformat(), 37 | "total_bytes": statvfs.f_frsize * statvfs.f_blocks, # Size of filesystem in bytes 38 | "free_bytes": statvfs.f_frsize * statvfs.f_bfree, # Free bytes total 39 | "available_bytes": statvfs.f_frsize * statvfs.f_bavail, # Free bytes for users 40 | "mounted": True 41 | } 42 | print(json.dumps(output)) 43 | else: 44 | output = { 45 | "@timestamp": datetime.datetime.utcnow().isoformat(), 46 | "mounted": False 47 | } 48 | print(json.dumps(output)) 49 | 50 | -------------------------------------------------------------------------------- /directory-sizes/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/index.png -------------------------------------------------------------------------------- /directory-sizes/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/logo.png -------------------------------------------------------------------------------- /directory-sizes/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/minio.png -------------------------------------------------------------------------------- /flight-tracker/README.md: -------------------------------------------------------------------------------- 1 | # Elastic Flight Tracker 2 | 3 | 4 | 5 | For this data source, we'll be using an [SDR](https://www.amazon.com/gp/product/B01GDN1T4S) to track aircraft flights via 
[ADS-B](https://mode-s.org/decode/). We'll use a Python script to decode the signals and write them to a log file. Elastic's Filebeat will pick them up from there and handle getting them to Logstash. 6 | 7 | We'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/flight-tracker.py` with the following contents: 16 | 17 | ​ [flight-tracker.py](flight-tracker.py) 18 | 19 | The script requires that your SDR be plugged in before running. 20 | 21 | Take a few minutes to familiarize yourself with the script. Adjust the values of `` and ``. You can use [LatLong.net](https://www.latlong.net/) to lookup your location. 22 | 23 | When you're ready, try running the script: 24 | 25 | ```bash 26 | chmod a+x ~/bin/flight-tracker.py 27 | sudo ~/bin/flight-tracker.py 28 | ``` 29 | 30 | It may take a few minutes to see output if you're in a quiet airspace, but once ~10 messages have been received you should see output on `stdout` similar to: 31 | 32 | ```json 33 | {"@timestamp": "2021-09-08T15:20:04.046427", "hex_ident": "A49DE9", "call_sign": null, "location": [42.05695, -88.04905], "altitude_ft": 31475, "speed_kts": 334, "track_angle_deg": 169, "vertical_speed_fpm": 3328, "speed_ref": "GS"} 34 | {"@timestamp": "2021-09-08T15:20:03.330181", "hex_ident": "A1D4BC", "call_sign": "ENY4299", "location": [41.78804, -88.11425], "altitude_ft": 9675, "speed_kts": 292, "track_angle_deg": 41, "vertical_speed_fpm": -1792, "speed_ref": "GS"} 35 | {"@timestamp": "2021-09-08T15:20:05.502300", "hex_ident": "ACC3B4", "call_sign": "AAL2080", "location": [41.91885, -88.03], "altitude_ft": 7600, "speed_kts": 289, "track_angle_deg": 45, "vertical_speed_fpm": -1536, "speed_ref": "GS"} 36 | ``` 37 | 38 | Once you confirm the script is working, you can redirect its output to a log file: 39 | 40 | ```bash 41 | sudo touch /var/log/flight-tracker.log 42 | sudo chown ubuntu.ubuntu /var/log/flight-tracker.log 43 | ``` 44 | 45 | Create a logrotate entry so the log file doesn't grow unbounded: 46 | 47 | ```bash 48 | sudo vi /etc/logrotate.d/flight-tracker 49 | ``` 50 | 51 | Add the following logrotate content: 52 | 53 | ``` 54 | /var/log/flight-tracker.log { 55 | weekly 56 | rotate 12 57 | compress 58 | delaycompress 59 | missingok 60 | notifempty 61 | create 644 ubuntu ubuntu 62 | } 63 | ``` 64 | 65 | Create a new bash script `~/bin/flight-tracker.sh` with the following: 66 | 67 | ```bash 68 | #!/bin/bash 69 | 70 | if pgrep -f "sudo /home/ubuntu/bin/flight-tracker.py" > /dev/null 71 | then 72 | echo "Already running." 73 | else 74 | echo "Not running. Restarting..." 75 | sudo /home/ubuntu/bin/flight-tracker.py >> /var/log/flight-tracker.log 2>&1 76 | fi 77 | ``` 78 | 79 | Make it executable: 80 | 81 | ```bash 82 | chmod a+x ~/bin/flight-tracker.sh 83 | ``` 84 | 85 | Add the following entry to your crontab with `crontab -e`: 86 | 87 | ``` 88 | * * * * * /home/ubuntu/bin/flight-tracker.sh >> /tmp/flight-tracker.log 2>&1 89 | ``` 90 | 91 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 92 | 93 | ```bash 94 | tail -f /var/log/flight-tracker.log 95 | ``` 96 | 97 | If you're seeing output every few seconds, then you are successfully collecting data! 98 | 99 | ## Step #2 - Archive Data 100 | 101 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 
102 | 
103 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your flight data:
104 | 
105 | ```yaml
106 | filebeat.inputs:
107 | - type: log
108 |   enabled: true
109 |   tags: ["flight-tracker"]
110 |   paths:
111 |     - /var/log/flight-tracker.log
112 | ```
113 | 
114 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
115 | 
116 | Restart Filebeat:
117 | 
118 | ```bash
119 | sudo systemctl restart filebeat
120 | ```
121 | 
122 | You may want to tail syslog to see if Filebeat restarts without any issues:
123 | 
124 | ```bash
125 | tail -f /var/log/syslog | grep filebeat
126 | ```
127 | 
128 | At this point, we should have flight tracker data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the flight tracker data feed.
129 | 
130 | Add the following conditional to your `distributor.yml` file:
131 | 
132 | ```
133 | } else if "flight-tracker" in [tags] {
134 |   pipeline {
135 |     send_to => ["flight-tracker-archive"]
136 |   }
137 | }
138 | ```
139 | 
140 | Create a Logstash pipeline called `flight-tracker-archive.yml` with the following contents:
141 | 
142 | ```
143 | input {
144 |   pipeline {
145 |     address => "flight-tracker-archive"
146 |   }
147 | }
148 | filter {
149 | }
150 | output {
151 |   s3 {
152 |     #
153 |     # Custom Settings
154 |     #
155 |     prefix => "flight-tracker/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
156 |     temporary_directory => "${S3_TEMP_DIR}/flight-tracker-archive"
157 |     access_key_id => "${S3_ACCESS_KEY}"
158 |     secret_access_key => "${S3_SECRET_KEY}"
159 |     endpoint => "${S3_ENDPOINT}"
160 |     bucket => "${S3_BUCKET}"
161 | 
162 |     #
163 |     # Standard Settings
164 |     #
165 |     validate_credentials_on_root_bucket => false
166 |     codec => json_lines
167 |     # Limit Data Lake file sizes to 5 GB
168 |     size_file => 5000000000
169 |     time_file => 60
170 |     # encoding => "gzip"
171 |     additional_settings => {
172 |       force_path_style => true
173 |       follow_redirects => false
174 |     }
175 |   }
176 | }
177 | ```
178 | 
179 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
180 | 
181 | ```bash
182 | sudo mv flight-tracker-archive.yml /etc/logstash/conf.d/
183 | ```
184 | 
185 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
186 | 
187 | ```
188 | - pipeline.id: "flight-tracker-archive"
189 |   path.config: "/etc/logstash/conf.d/flight-tracker-archive.yml"
190 | ```
191 | 
192 | And finally, restart the Logstash service:
193 | 
194 | ```bash
195 | sudo systemctl restart logstash
196 | ```
197 | 
198 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
199 | 
200 | ```bash
201 | sudo tail -f /var/log/logstash/logstash-plain.log
202 | ```
203 | 
204 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
205 | 
206 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
207 | 
208 | ![Stack Monitoring](archive.png)
209 | 
210 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
211 | 
212 | ![MinIO](minio.png)
213 | 
214 | If you see your data being stored, then you are successfully archiving!
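The archived events also make it easy to answer quick questions offline, such as how far away the aircraft you're receiving are. Below is a small sketch that computes the great-circle (haversine) distance from your receiver to each position report in the log. It assumes the NDJSON format shown in Step #1, and the receiver coordinates are placeholders you should replace with your own.

```python
#!/usr/bin/env python3
# Rough receiver-range check: haversine distance from the receiver to each
# position report in the flight-tracker log. Assumes the NDJSON format produced
# by flight-tracker.py above; RECEIVER coordinates are placeholders.

import json
from math import asin, cos, radians, sin, sqrt

RECEIVER = (41.978611, -87.904724)  # replace with your receiver's lat/lon

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # Earth radius ~6371 km

max_km = 0.0
with open("/var/log/flight-tracker.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines
        loc = event.get("location")
        if loc:
            km = haversine_km(RECEIVER[0], RECEIVER[1], loc[0], loc[1])
            max_km = max(max_km, km)

print(f"Farthest position report received: {max_km:.1f} km")
```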
215 | 
216 | ## Step #3 - Index Data
217 | 
218 | Once Logstash is archiving the data, next we need to index it with Elastic.
219 | 
220 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the majority of the data we're sending in. The one exception is the `location` field, which we need to explicitly map as a `geo_point` using an index template.
221 | 
222 | Jump into Kibana and create the following Index Template using Dev Tools:
223 | 
224 | ```
225 | PUT _index_template/flight-tracker
226 | {
227 |   "index_patterns": ["flight-tracker-*"],
228 |   "template": {
229 |     "settings": {},
230 |     "mappings": {
231 |       "properties": {
232 |         "location": {
233 |           "type": "geo_point"
234 |         }
235 |       }
236 |     },
237 |     "aliases": {}
238 |   }
239 | }
240 | ```
241 | 
242 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
243 | 
244 | Create a new pipeline called `flight-tracker-index.yml` with the following content:
245 | 
246 | ```
247 | input {
248 |   pipeline {
249 |     address => "flight-tracker-index"
250 |   }
251 | }
252 | filter {
253 |   json {
254 |     source => "message"
255 |   }
256 |   json {
257 |     source => "message"
258 |   }
259 |   mutate {
260 |     remove_field => ["message", "tags", "path"]
261 |     remove_field => ["agent", "host", "input", "log", "ecs", "@version"]
262 |   }
263 | }
264 | output {
265 |   elasticsearch {
266 |     #
267 |     # Custom Settings
268 |     #
269 |     id => "flight-tracker-index"
270 |     index => "flight-tracker-%{+YYYY.MM.dd}"
271 |     hosts => "${ES_ENDPOINT}"
272 |     user => "${ES_USERNAME}"
273 |     password => "${ES_PASSWORD}"
274 |   }
275 | }
276 | ```
277 | 
278 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
279 | 
280 | ```bash
281 | sudo mv flight-tracker-index.yml /etc/logstash/conf.d/
282 | ```
283 | 
284 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
285 | 
286 | ```
287 | - pipeline.id: "flight-tracker-index"
288 |   path.config: "/etc/logstash/conf.d/flight-tracker-index.yml"
289 | ```
290 | 
291 | Then update the `flight-tracker` conditional in your `distributor.yml` so it also sends events to this new pipeline (`send_to => ["flight-tracker-archive", "flight-tracker-index"]`), and finally, restart the Logstash service:
292 | 
293 | ```bash
294 | sudo systemctl restart logstash
295 | ```
296 | 
297 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
298 | 
299 | ```bash
300 | sudo tail -f /var/log/logstash/logstash-plain.log
301 | ```
302 | 
303 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
304 | 
305 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
306 | 
307 | ![Indexing](index.png)
308 | 
309 | ## Step #4 - Visualize Data
310 | 
311 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
312 | 
313 | Download this dashboard: [flight-tracker.ndjson](flight-tracker.ndjson)
314 | 
315 | Jump back into Kibana:
316 | 
317 | 1. Select "Stack Management" from the menu
318 | 2. Select "Saved Objects"
319 | 3. Click "Import" in the upper right
320 | 
321 | Once it's been imported, click on "Flight Tracker".
322 | 323 | ![Dashboard](dashboard.png) 324 | 325 | If you'd like to plot the location of your receiver (i.e., the orange tower in the Elastic Map), add the following document using Dev Tools (replacing the `lat` and`lon` with your location): 326 | 327 | ```JSON 328 | PUT /flight-tracker-receiver/_doc/1 329 | { 330 | "location": { 331 | "lat": 41.978611, 332 | "lon": -87.904724 333 | } 334 | } 335 | ``` 336 | 337 | Congratulations! You should now be looking at live flights in Elastic as they're being collected by your base station! 338 | -------------------------------------------------------------------------------- /flight-tracker/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/archive.png -------------------------------------------------------------------------------- /flight-tracker/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/dashboard.png -------------------------------------------------------------------------------- /flight-tracker/flight-tracker.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"flight-tracker-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"c10de000-10c0-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-08T16:21:00.040Z","version":"WzQwODEyMSwyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":12,\"h\":5,\"i\":\"11142c2c-8f3c-4eb0-9c2c-bc87dd272bc1\"},\"panelIndex\":\"11142c2c-8f3c-4eb0-9c2c-bc87dd272bc1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Flight Tracker\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":12,\"y\":0,\"w\":36,\"h\":5,\"i\":\"2cf8e227-485b-4209-af4e-9416692b8916\"},\"panelIndex\":\"2cf8e227-485b-4209-af4e-9416692b8916\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"451792f9-6bc9-4b2d-a8db-b88844fa22d0\":{\"columns\":{\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"95952347-70f3-4884-93f8-cef298545532\":{\"label\":\"Count of 
records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\",\"95952347-70f3-4884-93f8-cef298545532\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"451792f9-6bc9-4b2d-a8db-b88844fa22d0\",\"accessors\":[\"95952347-70f3-4884-93f8-cef298545532\"],\"position\":\"top\",\"seriesType\":\"bar_stacked\",\"showGridlines\":false,\"xAccessor\":\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"c10de000-10c0-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"c10de000-10c0-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-451792f9-6bc9-4b2d-a8db-b88844fa22d0\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"map\",\"gridData\":{\"x\":0,\"y\":5,\"w\":48,\"h\":33,\"i\":\"6641024f-134f-488a-98bb-e6e65eb47fcc\"},\"panelIndex\":\"6641024f-134f-488a-98bb-e6e65eb47fcc\",\"embeddableConfig\":{\"attributes\":{\"title\":\"Flight Tracker\",\"description\":\"\",\"layerListJSON\":\"[{\\\"sourceDescriptor\\\":{\\\"type\\\":\\\"EMS_TMS\\\",\\\"isAutoSelect\\\":true,\\\"id\\\":null},\\\"id\\\":\\\"b97e4b84-4fb0-414d-88cf-a3bfb409c8fe\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"TILE\\\"},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR_TILE\\\"},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"LIMIT\\\",\\\"id\\\":\\\"85e4a6d8-32db-4f58-8268-19505567a7ac\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"tooltipProperties\\\":[],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"topHitsSplitField\\\":\\\"\\\",\\\"topHitsSize\\\":1},\\\"id\\\":\\\"7b080642-7356-4d69-881d-914ccbc26fa0\\\",\\\"label\\\":\\\"Check-ins\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"airport\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#54B399\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#41937c\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":0}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":3}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"field\\\":{\\\"label\\\":\\\"track_deg\\\",\\\"name\\\":\\\"track_deg\\\",\\\"origin\\\":\\\"source\\\",\\\"type\\\":\\\"number\\\",\\\"supportsAutoDomain\\\":true},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3}}},\\\"labelText\\\":{\\\"type\\\":\\\"DYNAMI
C\\\",\\\"options\\\":{\\\"field\\\":null}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"circle\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"scalingType\\\":\\\"TOP_HITS\\\",\\\"sortField\\\":\\\"@timestamp\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"tooltipProperties\\\":[\\\"hex_code.keyword\\\",\\\"@timestamp\\\",\\\"call_sign\\\",\\\"altitude_ft\\\",\\\"speed_kts\\\",\\\"vertical_speed_fpm\\\"],\\\"topHitsSplitField\\\":\\\"hex_code.keyword\\\",\\\"topHitsSize\\\":1,\\\"id\\\":\\\"4973e2c9-1ca6-4518-bac7-088913a7d434\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"filterByMapBounds\\\":true},\\\"id\\\":\\\"5bb65a5f-f1c6-4e65-ae5f-f140bdecb838\\\",\\\"label\\\":\\\"Flights\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"airport\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#6092C0\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#4379aa\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":0}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":10}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"field\\\":{\\\"label\\\":\\\"track_deg\\\",\\\"name\\\":\\\"track_deg\\\",\\\"origin\\\":\\\"source\\\",\\\"type\\\":\\\"number\\\",\\\"supportsAutoDomain\\\":true},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3}}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"splitField\\\":\\\"hex_code.keyword\\\",\\\"sortField\\\":\\\"@timestamp\\\",\\\"id\\\":\\\"e5e0fe1f-b235-4f42-b361-2e0d7177b1a8\\\",\\\"type\\\":\\\"ES_GEO_LINE\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"metrics\\\":[{\\\"type\\\":\\\"count\\\"}]},\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"marker\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#54B399\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#41937c\\\
"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":2}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":6}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"circle\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"id\\\":\\\"90e78113-220f-439f-908e-5ec584444856\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"LIMIT\\\",\\\"id\\\":\\\"efde372c-8fee-4433-ad0b-5cf3b1ae0c1b\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":false,\\\"applyGlobalTime\\\":false,\\\"tooltipProperties\\\":[],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"topHitsSplitField\\\":\\\"\\\",\\\"topHitsSize\\\":1},\\\"id\\\":\\\"89aa1c7a-a9a0-4ad3-a15e-309539f3d74b\\\",\\\"label\\\":\\\"Base Station\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"communications-tower\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#f8a305\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#F8A305\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":1}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":16}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[],\\\"query\\\":{\\\"query\\\":\\\"_id : 
1\\\",\\\"language\\\":\\\"kuery\\\"}}]\",\"mapStateJSON\":\"{\\\"zoom\\\":8.47,\\\"center\\\":{\\\"lon\\\":-88.20859,\\\"lat\\\":41.95384},\\\"timeFilters\\\":{\\\"from\\\":\\\"now-3m\\\",\\\"to\\\":\\\"now\\\"},\\\"refreshConfig\\\":{\\\"isPaused\\\":false,\\\"interval\\\":10000},\\\"query\\\":{\\\"query\\\":\\\"\\\",\\\"language\\\":\\\"kuery\\\"},\\\"filters\\\":[],\\\"settings\\\":{\\\"autoFitToDataBounds\\\":false,\\\"backgroundColor\\\":\\\"#1d1e24\\\",\\\"disableInteractive\\\":false,\\\"disableTooltipControl\\\":false,\\\"hideToolbarOverlay\\\":false,\\\"hideLayerControl\\\":false,\\\"hideViewControl\\\":false,\\\"initialLocation\\\":\\\"LAST_SAVED_LOCATION\\\",\\\"fixedLocation\\\":{\\\"lat\\\":0,\\\"lon\\\":0,\\\"zoom\\\":2},\\\"browserLocation\\\":{\\\"zoom\\\":2},\\\"maxZoom\\\":24,\\\"minZoom\\\":0,\\\"showScaleControl\\\":false,\\\"showSpatialFilters\\\":true,\\\"showTimesliderToggleButton\\\":true,\\\"spatialFiltersAlpa\\\":0.3,\\\"spatialFiltersFillColor\\\":\\\"#DA8B45\\\",\\\"spatialFiltersLineColor\\\":\\\"#DA8B45\\\"}}\",\"uiStateJSON\":\"{\\\"isLayerTOCOpen\\\":true,\\\"openTOCDetails\\\":[]}\"},\"mapCenter\":{\"lat\":41.73541,\"lon\":-88.02419,\"zoom\":8.75},\"mapBuffer\":{\"minLon\":-89.29687,\"minLat\":40.9799,\"maxLon\":-86.48437,\"maxLat\":42.55308},\"isLayerTOCOpen\":true,\"openTOCDetails\":[],\"hiddenLayers\":[\"7b080642-7356-4d69-881d-914ccbc26fa0\"],\"enhancements\":{}}}]","timeRestore":false,"title":"Flight Tracker","version":1},"coreMigrationVersion":"7.14.0","id":"0ada6ef0-10c2-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"2cf8e227-485b-4209-af4e-9416692b8916:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"2cf8e227-485b-4209-af4e-9416692b8916:indexpattern-datasource-layer-451792f9-6bc9-4b2d-a8db-b88844fa22d0","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_1_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_2_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_3_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_4_source_index_pattern","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-08T16:47:35.330Z","version":"WzQwODk1OCwyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /flight-tracker/flight-tracker.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import pyModeS as pms 4 | 5 | from datetime import datetime, timedelta 6 | from json import dumps 7 | from pyModeS import common 8 | from pyModeS.extra.rtlreader import RtlReader 9 | 10 | class Flight: 11 | def __init__(self, hex_ident=None): 12 | self.hex_ident = hex_ident 13 | self.call_sign = None 14 | self.location = None 15 | self.altitude_ft = None 16 | self.speed_kts = None 17 | self.track_angle_deg = None 18 | self.vertical_speed_fpm = None 19 | self.speed_ref = None 20 | self.last_seen = None 21 | self.sent = False 22 | 23 | def has_info(self): 24 | return 
(#self.call_sign is not None and 25 | self.location is not None and 26 | self.altitude_ft is not None and 27 | self.track_angle_deg is not None and 28 | self.speed_kts is not None) 29 | 30 | def pretty_print(self): 31 | print(self.hex_ident, self.call_sign, self.location, self.altitude_ft, self.speed_kts, 32 | self.track_angle_deg, self.vertical_speed_fpm, self.speed_ref) 33 | 34 | def json_print(self): 35 | output = { 36 | "@timestamp": self.last_seen.isoformat(), 37 | "hex_code": self.hex_ident, 38 | "call_sign": self.call_sign, 39 | "location": { 40 | "lat": self.location[0], 41 | "lon": self.location[1] 42 | }, 43 | "altitude_ft": self.altitude_ft, 44 | "speed_kts": self.speed_kts, 45 | "track_deg": int(self.track_angle_deg), 46 | "vertical_speed_fpm": self.vertical_speed_fpm, 47 | "speed_ref": self.speed_ref 48 | } 49 | print(dumps(output)) 50 | 51 | 52 | class ADSBClient(RtlReader): 53 | def __init__(self): 54 | super(ADSBClient, self).__init__() 55 | self.flights = {} 56 | self.lat_ref = 57 | self.lon_ref = 58 | self.i = 0 59 | 60 | def handle_messages(self, messages): 61 | self.i += 1 62 | for msg, ts in messages: 63 | if len(msg) != 28: # wrong data length 64 | continue 65 | 66 | df = pms.df(msg) 67 | 68 | if df != 17: # not ADSB 69 | continue 70 | 71 | if pms.crc(msg) !=0: # CRC fail 72 | continue 73 | 74 | icao = pms.adsb.icao(msg) 75 | tc = pms.adsb.typecode(msg) 76 | flight = None 77 | 78 | if icao in self.flights: 79 | flight = self.flights[icao] 80 | else: 81 | flight = Flight(icao) 82 | 83 | flight.last_seen = datetime.now() 84 | 85 | # Message Type Codes: https://mode-s.org/api/ 86 | if tc >= 1 and tc <= 4: 87 | # Typecode 1-4 88 | flight.call_sign = pms.adsb.callsign(msg).strip('_') 89 | elif tc >= 9 and tc <= 18: 90 | # Typecode 9-18 (airborne, barometric height) 91 | flight.location = pms.adsb.airborne_position_with_ref(msg, 92 | self.lat_ref, self.lon_ref) 93 | flight.altitude_ft = pms.adsb.altitude(msg) 94 | flight.sent = False 95 | elif tc == 19: 96 | # Typecode: 19 97 | # Ground Speed (GS) or Airspeed (IAS/TAS) 98 | # Output (speed, track angle, vertical speed, tag): 99 | (flight.speed_kts, flight.track_angle_deg, flight.vertical_speed_fpm, 100 | flight.speed_ref) = pms.adsb.velocity(msg) 101 | 102 | self.flights[icao] = flight 103 | 104 | if self.i > 10: 105 | self.i = 0 106 | #print("Flights: ", len(self.flights)) 107 | for key in list(self.flights): 108 | f = self.flights[key] 109 | if f.has_info() and not f.sent: 110 | #f.pretty_print() 111 | f.json_print() 112 | f.sent = True 113 | elif f.last_seen < (datetime.now() - timedelta(minutes=5)): 114 | #print("Deleting ", key) 115 | del self.flights[key] 116 | 117 | 118 | if __name__ == "__main__": 119 | client = ADSBClient() 120 | client.run() 121 | -------------------------------------------------------------------------------- /flight-tracker/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/index.png -------------------------------------------------------------------------------- /flight-tracker/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/logo.png -------------------------------------------------------------------------------- /flight-tracker/minio.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/minio.png -------------------------------------------------------------------------------- /gps/README.md: -------------------------------------------------------------------------------- 1 | # GPS Monitoring 2 | 3 | gps 4 | 5 | 6 | 7 | The [VK-162 GPS Receiver](https://www.pishop.us/product/gps-antenna-vk-162/) is a low-cost, high-sensitivity GPS receiver with an internal antenna. It provides location & speed tracking with a high degree of accuracy, and can be used in a number of applications. It's built on the Ublox G6010 / G7020 low-power consumption GPS chipset and connects to a computer via USB where it can be read programmatically. 8 | 9 | In this data source, we'll build the following dashboard with Elastic: 10 | 11 | ![Dashboard](dashboard.png) 12 | 13 | Let's get started! 14 | 15 | ## Step #1 - Collect Data 16 | 17 | We use [gpsd](https://gpsd.gitlab.io/gpsd/) to query the GPS Receiver, and then [gpspipe](https://gpsd.gitlab.io/gpsd/gpspipe.html) to talk to `gpsd` to get location readings. 18 | 19 | If this is your first time setting up `gpsd`, you can refer to the [installation instructions](https://gpsd.gitlab.io/gpsd/installation.html). 20 | 21 | For Ubuntu systems, the installation process is as follows: 22 | 23 | ```bash 24 | sudo apt install gpsd-clients gpsd 25 | sudo systemctl enable gpsd 26 | ``` 27 | 28 | Create a file called `/etc/default/gpsd` with the following contents (changing the device to your setup, if necessary): 29 | 30 | ``` 31 | DEVICES="/dev/ttyACM0" 32 | ``` 33 | 34 | Then start the `gpsd` service: 35 | 36 | ```bash 37 | sudo systemctl start gpsd 38 | ``` 39 | 40 | And try querying it: 41 | 42 | ```bash 43 | gpspipe -w 44 | ``` 45 | 46 | You should see output similar to the following: 47 | 48 | ```json 49 | {"class":"VERSION","release":"3.20","rev":"3.20","proto_major":3,"proto_minor":14} 50 | {"class":"DEVICES","devices":[{"class":"DEVICE","path":"/dev/ttyACM0","driver":"u-blox","subtype":"SW 1.00 (59842),HW 00070000","subtype1":",PROTVER 14.00,GPS;SBAS;GLO;QZSS","activated":"2021-09-02T19:13:12.267Z","flags":1,"native":1,"bps":9600,"parity":"N","stopbits":1,"cycle":1.00,"mincycle":0.25}]} 51 | {"class":"WATCH","enable":true,"json":true,"nmea":false,"raw":0,"scaled":false,"timing":false,"split24":false,"pps":false} 52 | {"class":"TPV","device":"/dev/ttyACM0","status":2,"mode":3,"time":"2021-09-02T19:13:13.000Z","leapseconds":18,"ept":0.005,"lat":41.881832,"lon":-87.623177,"altHAE":206.865,"altMSL":240.666,"alt":240.666,"track":14.3003,"magtrack":10.5762,"magvar":-3.7,"speed":0.088,"climb":0.010,"eps":0.67,"ecefx":160358.79,"ecefy":-4754122.42,"ecefz":4234974.21,"ecefvx":0.02,"ecefvy":0.05,"ecefvz":0.07,"ecefpAcc":10.77,"ecefvAcc":0.67,"velN":0.085,"velE":0.022,"velD":-0.010,"geoidSep":-33.801} 53 | ``` 54 | 55 | The main things to look for are the `"class":"TPV"` objects, which are "time-position-velocity" reports. All of the fields included are described in the document [Core Protocol Responses](https://gpsd.gitlab.io/gpsd/gpsd_json.html#_core_protocol_responses). 
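If you want to see just the position fields while you're experimenting, a small sketch like the one below can pull them out of the TPV reports. This is purely illustrative (the script name `tpv-peek.py` is made up); it only relies on the field names shown in the sample output above (`time`, `lat`, `lon`, `speed`) and reads whatever `gpspipe -w` pipes to it:

```python
#!/usr/bin/env python3
# Illustrative only: print a few fields from TPV reports.
# Usage (assumes gpsd is running): gpspipe -w | ./tpv-peek.py
import json
import sys

for line in sys.stdin:
    try:
        report = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip partial or non-JSON lines
    if report.get("class") != "TPV":
        continue  # only time-position-velocity reports carry a fix
    # Not every TPV report includes a full fix, so use .get() with defaults
    print(report.get("time"), report.get("lat"), report.get("lon"), report.get("speed"))
```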
56 | 57 | Create a new shell script called `~/bin/gps.sh` with the following contents: 58 | 59 | ```bash 60 | #!/bin/bash 61 | 62 | gpspipe -w -n 10 | grep TPV | tail -n 1 63 | ``` 64 | 65 | Try running the script: 66 | 67 | ```bash 68 | chmod a+x ~/bin/gps.sh 69 | ~/bin/gps.sh 70 | ``` 71 | 72 | You should see output on `stdout` similar to: 73 | 74 | ```json 75 | {"class":"TPV","device":"/dev/ttyACM0","status":2,"mode":3,"time":"2021-09-02T19:13:13.000Z","leapseconds":18,"ept":0.005,"lat":41.881832,"lon":-87.623177,"altHAE":206.865,"altMSL":240.666,"alt":240.666,"track":14.3003,"magtrack":10.5762,"magvar":-3.7,"speed":0.088,"climb":0.010,"eps":0.67,"ecefx":160358.79,"ecefy":-4754122.42,"ecefz":4234974.21,"ecefvx":0.02,"ecefvy":0.05,"ecefvz":0.07,"ecefpAcc":10.77,"ecefvAcc":0.67,"velN":0.085,"velE":0.022,"velD":-0.010,"geoidSep":-33.801} 76 | ``` 77 | 78 | Once you confirm the script is working, you can redirect its output to a log file: 79 | 80 | ```bash 81 | sudo touch /var/log/gps.log 82 | sudo chown ubuntu.ubuntu /var/log/gps.log 83 | ``` 84 | 85 | Create a logrotate entry so the log file doesn't grow unbounded: 86 | 87 | ```bash 88 | sudo vi /etc/logrotate.d/gps 89 | ``` 90 | 91 | Add the following logrotate content: 92 | 93 | ``` 94 | /var/log/gps.log { 95 | weekly 96 | rotate 12 97 | compress 98 | delaycompress 99 | missingok 100 | notifempty 101 | create 644 ubuntu ubuntu 102 | } 103 | ``` 104 | 105 | Add the following entry to your crontab with `crontab -e`: 106 | 107 | ``` 108 | * * * * * /home/ubuntu/bin/gps.sh >> /var/log/gps.log 2>&1 109 | ``` 110 | 111 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 112 | 113 | ```bash 114 | tail -f /var/log/gps.log 115 | ``` 116 | 117 | If you're seeing output scroll each minute then you are successfully collecting data! 118 | 119 | ## Step #2 - Archive Data 120 | 121 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3. 122 | 123 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your GPS data: 124 | 125 | ```yaml 126 | filebeat.inputs: 127 | - type: log 128 | enabled: true 129 | tags: ["gps"] 130 | paths: 131 | - /var/log/gps.log 132 | ``` 133 | 134 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 135 | 136 | Restart Filebeat: 137 | 138 | ```bash 139 | sudo systemctl restart filebeat 140 | ``` 141 | 142 | You may want to tail syslog to see if Filebeat restarts without any issues: 143 | 144 | ```bash 145 | tail -f /var/log/syslog | grep filebeat 146 | ``` 147 | 148 | At this point, we should have GPS data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the GPS data feed. 
149 | 150 | Add the following conditional to your `distributor.yml` file: 151 | 152 | ``` 153 | } else if "gps" in [tags] { 154 | pipeline { 155 | send_to => ["gps-archive"] 156 | } 157 | } 158 | ``` 159 | 160 | Create a Logstash pipeline called `gps-archive.yml` with the following contents: 161 | 162 | ``` 163 | input { 164 | pipeline { 165 | address => "gps-archive" 166 | } 167 | } 168 | filter { 169 | } 170 | output { 171 | s3 { 172 | # 173 | # Custom Settings 174 | # 175 | prefix => "gps/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 176 | temporary_directory => "${S3_TEMP_DIR}/gps-archive" 177 | access_key_id => "${S3_ACCESS_KEY}" 178 | secret_access_key => "${S3_SECRET_KEY}" 179 | endpoint => "${S3_ENDPOINT}" 180 | bucket => "${S3_BUCKET}" 181 | 182 | # 183 | # Standard Settings 184 | # 185 | validate_credentials_on_root_bucket => false 186 | codec => json_lines 187 | # Limit Data Lake file sizes to 5 GB 188 | size_file => 5000000000 189 | time_file => 60 190 | # encoding => "gzip" 191 | additional_settings => { 192 | force_path_style => true 193 | follow_redirects => false 194 | } 195 | } 196 | } 197 | ``` 198 | 199 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 200 | 201 | ```bash 202 | sudo mv gps-archive.yml /etc/logstash/conf.d/ 203 | ``` 204 | 205 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 206 | 207 | ``` 208 | - pipeline.id: "gps-archive" 209 | path.config: "/etc/logstash/conf.d/gps-archive.conf" 210 | ``` 211 | 212 | And finally, restart the Logstash service: 213 | 214 | ```bash 215 | sudo systemctl restart logstash 216 | ``` 217 | 218 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 219 | 220 | ```bash 221 | sudo tail -f /var/log/logstash/logstash-plain.log 222 | ``` 223 | 224 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 225 | 226 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 227 | 228 | ![Stack Monitoring](archive.png) 229 | 230 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 231 | 232 | ![MinIO](minio.png) 233 | 234 | If you see your data being stored, then you are successfully archiving! 235 | 236 | ## Step #3 - Index Data 237 | 238 | Once Logstash is archiving the data, we need to index it with Elastic. 239 | 240 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the majority of data we're sending in. The one exception is for `geo_point` data which we need to explicitly add a mapping for, using an index template. 241 | 242 | Jump into Kibana and create the following Index Template using Dev Tools: 243 | 244 | ``` 245 | PUT _index_template/gps 246 | { 247 | "index_patterns": ["gps-*"], 248 | "template": { 249 | "settings": {}, 250 | "mappings": { 251 | "properties": { 252 | "location": { 253 | "type": "geo_point" 254 | } 255 | } 256 | }, 257 | "aliases": {} 258 | } 259 | } 260 | ``` 261 | 262 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 
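Before looking at the pipeline itself, here's a rough Python sketch of what that filter chain amounts to. This is illustration only (the sample event is made up, and Logstash does all of this for you): the Filebeat event carries the raw `gpspipe` line in its `message` field, that JSON is parsed into top-level fields, the Beats metadata is dropped, and `lat`/`lon` are joined into a `location` string that matches the `geo_point` mapping above.

```python
import json

# A made-up Filebeat event: the raw gpspipe line rides in "message"
event = {
    "message": '{"class":"TPV","lat":41.881832,"lon":-87.623177,"speed":0.088}',
    "agent": {"type": "filebeat"},
    "host": {"name": "node"},
    "log": {"file": {"path": "/var/log/gps.log"}},
}

# json filter: parse the "message" field into top-level fields
event.update(json.loads(event["message"]))

# mutate/remove_field: drop the Beats metadata we don't want to index
for field in ("message", "agent", "host", "input", "log", "ecs", "@version"):
    event.pop(field, None)

# mutate/add_field: combine lat/lon into a "lat, lon" string for the geo_point mapping
event["location"] = f'{event["lat"]}, {event["lon"]}'

print(event)  # roughly what gets sent to the gps-* index
```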
263 | 264 | Create a new pipeline called `gps-index.yml` with the following content: 265 | 266 | ``` 267 | input { 268 | pipeline { 269 | address => "gps-index" 270 | } 271 | } 272 | filter { 273 | json { 274 | source => "message" 275 | } 276 | json { 277 | source => "message" 278 | } 279 | mutate { 280 | remove_field => ["message", "agent", "host", "input", "log", "host", "ecs", "@version"] 281 | } 282 | mutate { 283 | add_field => { "[location]" => "%{[lat]}, %{[lon]}" } 284 | } 285 | } 286 | output { 287 | elasticsearch { 288 | # 289 | # Custom Settings 290 | # 291 | id => "gps-index" 292 | index => "gps-%{+YYYY.MM.dd}" 293 | hosts => "${ES_ENDPOINT}" 294 | user => "${ES_USERNAME}" 295 | password => "${ES_PASSWORD}" 296 | } 297 | } 298 | ``` 299 | 300 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 301 | 302 | ```bash 303 | sudo mv gps-index.yml /etc/logstash/conf.d/ 304 | ``` 305 | 306 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 307 | 308 | ``` 309 | - pipeline.id: "gps-index" 310 | path.config: "/etc/logstash/conf.d/gps-index.conf" 311 | ``` 312 | 313 | And finally, restart the Logstash service: 314 | 315 | ```bash 316 | sudo systemctl restart logstash 317 | ``` 318 | 319 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 320 | 321 | ```bash 322 | sudo tail -f /var/log/logstash/logstash-plain.log 323 | ``` 324 | 325 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 326 | 327 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 328 | 329 | ![Indexing](index.png) 330 | 331 | ## Step #4 - Visualize Data 332 | 333 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 334 | 335 | Download this dashboard: 336 | 337 | ​ [gps.ndjson](gps.ndjson) 338 | 339 | Jump into Kibana: 340 | 341 | 1. Select "Stack Management" from the menu 342 | 2. Select "Saved Objects" 343 | 3. Click "Import" in the upper right 344 | 345 | ![Dashboard](dashboard.png) 346 | 347 | Congratulations! You should now be looking at data from your GPS in Elastic. 
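If you'd like to spot-check the index outside of Kibana, a quick query works too. The following is an optional sketch that assumes the same `ES_ENDPOINT`, `ES_USERNAME`, and `ES_PASSWORD` values used in the Logstash pipelines are exported in your shell:

```python
#!/usr/bin/env python3
# Optional spot check: fetch the most recent GPS document from Elasticsearch.
import base64
import json
import os
import urllib.request

endpoint = os.environ["ES_ENDPOINT"].rstrip("/")
auth = base64.b64encode(
    f'{os.environ["ES_USERNAME"]}:{os.environ["ES_PASSWORD"]}'.encode()
).decode()

# Ask for the single newest document in any gps-* index
query = {"size": 1, "sort": [{"@timestamp": "desc"}]}
req = urllib.request.Request(
    f"{endpoint}/gps-*/_search",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json", "Authorization": f"Basic {auth}"},
)

with urllib.request.urlopen(req) as resp:
    hits = json.load(resp)["hits"]["hits"]

print(hits[0]["_source"] if hits else "No documents yet")
```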
348 | -------------------------------------------------------------------------------- /gps/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/archive.png -------------------------------------------------------------------------------- /gps/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/dashboard.png -------------------------------------------------------------------------------- /gps/gps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/gps.png -------------------------------------------------------------------------------- /gps/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/index.png -------------------------------------------------------------------------------- /gps/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/minio.png -------------------------------------------------------------------------------- /haproxy-filebeat-module/2-archive/haproxy-filebeat-module-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "haproxy-filebeat-module-archive" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | s3 { 10 | # 11 | # Custom Settings 12 | # 13 | prefix => "haproxy-filebeat-module/${S3_DATE_DIR}" 14 | temporary_directory => "${S3_TEMP_DIR}/haproxy-filebeat-module-archive" 15 | access_key_id => "${S3_ACCESS_KEY}" 16 | secret_access_key => "${S3_SECRET_KEY}" 17 | endpoint => "${S3_ENDPOINT}" 18 | bucket => "${S3_BUCKET}" 19 | 20 | # 21 | # Standard Settings 22 | # 23 | validate_credentials_on_root_bucket => false 24 | codec => json_lines 25 | # Limit Data Lake file sizes to 5 GB 26 | size_file => 5000000000 27 | time_file => 1 28 | # encoding => "gzip" 29 | additional_settings => { 30 | force_path_style => true 31 | follow_redirects => false 32 | } 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/2-archive/haproxy-filebeat-module-reindex.yml: -------------------------------------------------------------------------------- 1 | input { 2 | s3 { 3 | # 4 | # Custom Settings 5 | # 6 | prefix => "haproxy-filebeat-module/2021-01-04" 7 | temporary_directory => "${S3_TEMP_DIR}/haproxy-filebeat-module-reindex" 8 | access_key_id => "${S3_ACCESS_KEY}" 9 | secret_access_key => "${S3_SECRET_KEY}" 10 | endpoint => "${S3_ENDPOINT}" 11 | bucket => "${S3_BUCKET}" 12 | 13 | # 14 | # Standard Settings 15 | # 16 | watch_for_new_files => false 17 | codec => json_lines 18 | additional_settings => { 19 | force_path_style => true 20 | follow_redirects => false 21 | } 22 | } 23 | } 24 | filter { 25 | } 26 | output { 27 | pipeline { send_to => "haproxy-filebeat-module-structure" } 28 | } 29 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/2-archive/haproxy-filebeat-module-structure.yml: 
-------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "haproxy-filebeat-module-structure" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | elasticsearch { 10 | # 11 | # Custom Settings 12 | # 13 | id => "haproxy-filebeat-module-structure" 14 | index => "haproxy-filebeat-module" 15 | hosts => "${ES_ENDPOINT}" 16 | user => "${ES_USERNAME}" 17 | password => "${ES_PASSWORD}" 18 | } 19 | } 20 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/4-visualize/dashboard.json: -------------------------------------------------------------------------------- 1 | { 2 | "objects": [ 3 | { 4 | "attributes": { 5 | "description": "", 6 | "kibanaSavedObjectMeta": { 7 | "searchSourceJSON": { 8 | "filter": [], 9 | "index": "filebeat-*", 10 | "query": { 11 | "language": "kuery", 12 | "query": "" 13 | } 14 | } 15 | }, 16 | "title": "Backend breakdown [Filebeat HAProxy] ECS", 17 | "uiStateJSON": {}, 18 | "version": 1, 19 | "visState": { 20 | "aggs": [ 21 | { 22 | "enabled": true, 23 | "id": "1", 24 | "params": {}, 25 | "schema": "metric", 26 | "type": "count" 27 | }, 28 | { 29 | "enabled": true, 30 | "id": "2", 31 | "params": { 32 | "field": "haproxy.backend_name", 33 | "missingBucket": false, 34 | "missingBucketLabel": "Missing", 35 | "order": "desc", 36 | "orderBy": "1", 37 | "otherBucket": false, 38 | "otherBucketLabel": "Other", 39 | "size": 5 40 | }, 41 | "schema": "segment", 42 | "type": "terms" 43 | } 44 | ], 45 | "params": { 46 | "addLegend": true, 47 | "addTooltip": true, 48 | "isDonut": true, 49 | "labels": { 50 | "last_level": true, 51 | "show": false, 52 | "truncate": 100, 53 | "values": true 54 | }, 55 | "legendPosition": "right", 56 | "type": "pie" 57 | }, 58 | "title": "Backend breakdown [Filebeat HAProxy] ECS", 59 | "type": "pie" 60 | } 61 | }, 62 | "id": "55251360-aa32-11e8-9c06-877f0445e3e0-ecs", 63 | "type": "visualization", 64 | "updated_at": "2018-12-06T11:35:36.721Z", 65 | "version": 2 66 | }, 67 | { 68 | "attributes": { 69 | "description": "", 70 | "kibanaSavedObjectMeta": { 71 | "searchSourceJSON": { 72 | "filter": [], 73 | "index": "filebeat-*", 74 | "query": { 75 | "language": "kuery", 76 | "query": "" 77 | } 78 | } 79 | }, 80 | "title": "Frontend breakdown [Filebeat HAProxy] ECS", 81 | "uiStateJSON": {}, 82 | "version": 1, 83 | "visState": { 84 | "aggs": [ 85 | { 86 | "enabled": true, 87 | "id": "1", 88 | "params": {}, 89 | "schema": "metric", 90 | "type": "count" 91 | }, 92 | { 93 | "enabled": true, 94 | "id": "2", 95 | "params": { 96 | "field": "haproxy.frontend_name", 97 | "missingBucket": false, 98 | "missingBucketLabel": "Missing", 99 | "order": "desc", 100 | "orderBy": "1", 101 | "otherBucket": false, 102 | "otherBucketLabel": "Other", 103 | "size": 5 104 | }, 105 | "schema": "segment", 106 | "type": "terms" 107 | } 108 | ], 109 | "params": { 110 | "addLegend": true, 111 | "addTooltip": true, 112 | "isDonut": true, 113 | "labels": { 114 | "last_level": true, 115 | "show": false, 116 | "truncate": 100, 117 | "values": true 118 | }, 119 | "legendPosition": "right", 120 | "type": "pie" 121 | }, 122 | "title": "Frontend breakdown [Filebeat HAProxy] ECS", 123 | "type": "pie" 124 | } 125 | }, 126 | "id": "7fb671f0-aa32-11e8-9c06-877f0445e3e0-ecs", 127 | "type": "visualization", 128 | "updated_at": "2018-12-06T11:35:36.721Z", 129 | "version": 2 130 | }, 131 | { 132 | "attributes": { 133 | "description": "", 134 | "kibanaSavedObjectMeta": { 135 | "searchSourceJSON": 
{ 136 | "filter": [], 137 | "index": "filebeat-*", 138 | "query": { 139 | "language": "kuery", 140 | "query": "" 141 | } 142 | } 143 | }, 144 | "title": "IP Geohashes [Filebeat HAProxy] ECS", 145 | "uiStateJSON": { 146 | "mapCenter": [ 147 | 14.944784875088372, 148 | 5.09765625 149 | ] 150 | }, 151 | "version": 1, 152 | "visState": { 153 | "aggs": [ 154 | { 155 | "enabled": true, 156 | "id": "1", 157 | "params": { 158 | "field": "source.address" 159 | }, 160 | "schema": "metric", 161 | "type": "cardinality" 162 | }, 163 | { 164 | "enabled": true, 165 | "id": "2", 166 | "params": { 167 | "autoPrecision": true, 168 | "field": "source.geo.location", 169 | "isFilteredByCollar": true, 170 | "precision": 2, 171 | "useGeocentroid": true 172 | }, 173 | "schema": "segment", 174 | "type": "geohash_grid" 175 | } 176 | ], 177 | "params": { 178 | "addTooltip": true, 179 | "heatBlur": 15, 180 | "heatMaxZoom": 16, 181 | "heatMinOpacity": 0.1, 182 | "heatNormalizeData": true, 183 | "heatRadius": 25, 184 | "isDesaturated": true, 185 | "legendPosition": "bottomright", 186 | "mapCenter": [ 187 | 15, 188 | 5 189 | ], 190 | "mapType": "Scaled Circle Markers", 191 | "mapZoom": 2, 192 | "wms": { 193 | "enabled": false, 194 | "options": { 195 | "attribution": "Maps provided by USGS", 196 | "format": "image/png", 197 | "layers": "0", 198 | "styles": "", 199 | "transparent": true, 200 | "version": "1.3.0" 201 | }, 202 | "url": "https://basemap.nationalmap.gov/arcgis/services/USGSTopo/MapServer/WMSServer" 203 | } 204 | }, 205 | "title": "IP Geohashes [Filebeat HAProxy] ECS", 206 | "type": "tile_map" 207 | } 208 | }, 209 | "id": "11f8b9c0-aa32-11e8-9c06-877f0445e3e0-ecs", 210 | "type": "visualization", 211 | "updated_at": "2018-12-06T11:35:36.721Z", 212 | "version": 2 213 | }, 214 | { 215 | "attributes": { 216 | "description": "", 217 | "kibanaSavedObjectMeta": { 218 | "searchSourceJSON": { 219 | "filter": [], 220 | "index": "filebeat-*", 221 | "query": { 222 | "language": "kuery", 223 | "query": "" 224 | } 225 | } 226 | }, 227 | "title": "Response codes over time [Filebeat HAProxy] ECS", 228 | "uiStateJSON": { 229 | "vis": { 230 | "colors": { 231 | "200": "#508642", 232 | "204": "#629E51", 233 | "302": "#6ED0E0", 234 | "404": "#EAB839", 235 | "503": "#705DA0" 236 | } 237 | } 238 | }, 239 | "version": 1, 240 | "visState": { 241 | "aggs": [ 242 | { 243 | "enabled": true, 244 | "id": "1", 245 | "params": {}, 246 | "schema": "metric", 247 | "type": "count" 248 | }, 249 | { 250 | "enabled": true, 251 | "id": "2", 252 | "params": { 253 | "customInterval": "2h", 254 | "extended_bounds": {}, 255 | "field": "@timestamp", 256 | "interval": "auto", 257 | "min_doc_count": 1 258 | }, 259 | "schema": "segment", 260 | "type": "date_histogram" 261 | }, 262 | { 263 | "enabled": true, 264 | "id": "3", 265 | "params": { 266 | "field": "http.response.status_code", 267 | "missingBucket": false, 268 | "missingBucketLabel": "Missing", 269 | "order": "desc", 270 | "orderBy": "_term", 271 | "otherBucket": false, 272 | "otherBucketLabel": "Other", 273 | "size": 5 274 | }, 275 | "schema": "group", 276 | "type": "terms" 277 | } 278 | ], 279 | "params": { 280 | "addLegend": true, 281 | "addTimeMarker": false, 282 | "addTooltip": true, 283 | "categoryAxes": [ 284 | { 285 | "id": "CategoryAxis-1", 286 | "labels": { 287 | "show": true, 288 | "truncate": 100 289 | }, 290 | "position": "bottom", 291 | "scale": { 292 | "type": "linear" 293 | }, 294 | "show": true, 295 | "style": {}, 296 | "title": {}, 297 | "type": "category" 298 | } 299 | ], 300 | 
"grid": { 301 | "categoryLines": false, 302 | "style": { 303 | "color": "#eee" 304 | } 305 | }, 306 | "legendPosition": "right", 307 | "seriesParams": [ 308 | { 309 | "data": { 310 | "id": "1", 311 | "label": "Count" 312 | }, 313 | "drawLinesBetweenPoints": true, 314 | "mode": "stacked", 315 | "show": "true", 316 | "showCircles": true, 317 | "type": "histogram", 318 | "valueAxis": "ValueAxis-1" 319 | } 320 | ], 321 | "times": [], 322 | "type": "histogram", 323 | "valueAxes": [ 324 | { 325 | "id": "ValueAxis-1", 326 | "labels": { 327 | "filter": false, 328 | "rotate": 0, 329 | "show": true, 330 | "truncate": 100 331 | }, 332 | "name": "LeftAxis-1", 333 | "position": "left", 334 | "scale": { 335 | "mode": "normal", 336 | "type": "linear" 337 | }, 338 | "show": true, 339 | "style": {}, 340 | "title": { 341 | "text": "Count" 342 | }, 343 | "type": "value" 344 | } 345 | ] 346 | }, 347 | "title": "Response codes over time [Filebeat HAProxy] ECS", 348 | "type": "histogram" 349 | } 350 | }, 351 | "id": "68af8ef0-aa33-11e8-9c06-877f0445e3e0-ecs", 352 | "type": "visualization", 353 | "updated_at": "2018-12-06T11:35:36.721Z", 354 | "version": 2 355 | }, 356 | { 357 | "attributes": { 358 | "description": "Filebeat HAProxy module dashboard", 359 | "hits": 0, 360 | "kibanaSavedObjectMeta": { 361 | "searchSourceJSON": { 362 | "filter": [], 363 | "query": { 364 | "language": "kuery", 365 | "query": "" 366 | } 367 | } 368 | }, 369 | "optionsJSON": { 370 | "darkTheme": false, 371 | "hidePanelTitles": false, 372 | "useMargins": true 373 | }, 374 | "panelsJSON": [ 375 | { 376 | "embeddableConfig": {}, 377 | "gridData": { 378 | "h": 15, 379 | "i": "1", 380 | "w": 24, 381 | "x": 0, 382 | "y": 0 383 | }, 384 | "id": "55251360-aa32-11e8-9c06-877f0445e3e0-ecs", 385 | "panelIndex": "1", 386 | "type": "visualization", 387 | "version": "6.5.2" 388 | }, 389 | { 390 | "embeddableConfig": {}, 391 | "gridData": { 392 | "h": 15, 393 | "i": "2", 394 | "w": 24, 395 | "x": 24, 396 | "y": 0 397 | }, 398 | "id": "7fb671f0-aa32-11e8-9c06-877f0445e3e0-ecs", 399 | "panelIndex": "2", 400 | "type": "visualization", 401 | "version": "6.5.2" 402 | }, 403 | { 404 | "embeddableConfig": {}, 405 | "gridData": { 406 | "h": 15, 407 | "i": "3", 408 | "w": 24, 409 | "x": 0, 410 | "y": 15 411 | }, 412 | "id": "11f8b9c0-aa32-11e8-9c06-877f0445e3e0-ecs", 413 | "panelIndex": "3", 414 | "type": "visualization", 415 | "version": "6.5.2" 416 | }, 417 | { 418 | "embeddableConfig": {}, 419 | "gridData": { 420 | "h": 15, 421 | "i": "4", 422 | "w": 24, 423 | "x": 24, 424 | "y": 15 425 | }, 426 | "id": "68af8ef0-aa33-11e8-9c06-877f0445e3e0-ecs", 427 | "panelIndex": "4", 428 | "type": "visualization", 429 | "version": "6.5.2" 430 | } 431 | ], 432 | "timeRestore": false, 433 | "title": "[Filebeat HAProxy] Overview ECS", 434 | "version": 1 435 | }, 436 | "id": "3560d580-aa34-11e8-9c06-877f0445e3e0-ecs", 437 | "type": "dashboard", 438 | "updated_at": "2018-12-06T11:40:40.204Z", 439 | "version": 6 440 | } 441 | ], 442 | "version": "6.5.2" 443 | } 444 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/README.md: -------------------------------------------------------------------------------- 1 | We'll use the Filebeat HAProxy module to grab the HAProxy log file. 2 | 3 | You can grab it without the module, but only one method works at a 4 | time for Filebeat to read the file (you can't enable both). 
5 | 6 | We'll use the Filebeat HAProxy module since it cleanly persists the 7 | HAProxy log file messages while also providing the appropriate metadata 8 | for the other module artifacts: Ingest Pipeline, Kibana Dashboard, etc. 9 | 10 | ``` 11 | $ filebeat module enable haproxy 12 | $ cat /etc/filebeat/modules.d/haproxy.yml 13 | - module: haproxy 14 | log: 15 | enabled: true 16 | var.input: file 17 | ``` 18 | We should still be able to use the data collected by the module with 19 | the "raw" HAProxy data source adapter [here](/data-sources/haproxy). 20 | 21 | Since Beats Modules come with the ability to load the out-of-the-box 22 | assets using the beat, you can leverage that method as described below. 23 | 24 | Load Index Template 25 | 26 | [https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-template.html](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-template.html) 27 | 28 | Load Kibana Dashboards 29 | 30 | [https://www.elastic.co/guide/en/beats/filebeat/current/load-kibana-dashboards.html](https://www.elastic.co/guide/en/beats/filebeat/current/load-kibana-dashboards.html) 31 | -------------------------------------------------------------------------------- /images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/architecture.png -------------------------------------------------------------------------------- /images/caiv.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/caiv.png -------------------------------------------------------------------------------- /images/data-source-assets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/data-source-assets.png -------------------------------------------------------------------------------- /images/elk-data-lake.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/elk-data-lake.png -------------------------------------------------------------------------------- /images/indexing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/indexing.png -------------------------------------------------------------------------------- /images/logical-elements.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/logical-elements.png -------------------------------------------------------------------------------- /images/onboarding-data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/onboarding-data.png -------------------------------------------------------------------------------- /images/terminology.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/terminology.png -------------------------------------------------------------------------------- /images/workflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/workflow.png -------------------------------------------------------------------------------- /power-emu2/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Power with EMU-2 2 | 3 | EMU-2 4 | 5 | The [EMU-2](https://www.rainforestautomation.com/rfa-z105-2-emu-2-2/) by Rainforest Automation displays your smart meter's data in real time. We'll connect to it via USB and use a Python script to receive its messages. The device should output the current demand (kW), current meter reading, and even the current price per kWh. 6 | 7 | Our goal is to build the following dashboard: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started. 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new Python script called `~/bin/power-emu2.py` with the following contents: 16 | 17 | [power-emu2.py](power-emu2.py) 18 | 19 | You might need to adjust the USB port in the script to match your setup. Look for `/dev/ttyACM1` in the script. 20 | 21 | Decoding the messages from the EMU-2 can be tricky. There are technical documents to aid the process if you want to dig deeper than the provided Python script: 22 | 23 | * [Emu-2-Tech-Guide-1.05.pdf](https://github.com/rainforestautomation/Emu-Serial-API/blob/master/Emu-2-Tech-Guide-1.05.pdf) 24 | * [RAVEn. XML API Manual.pdf](https://rainforestautomation.com/wp-content/uploads/2014/02/raven_xml_api_r127.pdf) 25 | 26 | Try running the script from the command line: 27 | 28 | ```bash 29 | chmod a+x ~/bin/power-emu2.py 30 | sudo ~/bin/power-emu2.py 31 | ``` 32 | 33 | The output will include JSON-formatted messages from your smart meter: 34 | 35 | ```json 36 | {"message": "Starting", "timestamp": "2021-09-06T07:55:42Z", "status": "connected"} 37 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:55:23Z", "demand_kW": 0.558} 38 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:55:53Z", "demand_kW": 0.585} 39 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:56:08Z", "demand_kW": 0.63} 40 | {"message": "CurrentSummationDelivered", "timestamp": "2021-09-06T07:56:11Z", "summation_delivered": 73438571, "summation_received": 0, "meter_kWh": 73438.571} 41 | {"message": "PriceCluster", "timestamp": "2021-09-06T07:56:51Z", "price_cents_kWh": 5.399, "currency": 840, "tier": 0, "start_time": "2021-09-06T07:50:00Z", "duration": 1440} 42 | ``` 43 | 44 | Hit `^c` to quit the script. 
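To make those numbers concrete, here's a small sketch (separate from the collector, with a made-up name of `power-cost.py`) that reads these JSON lines and estimates your current cost rate. It only assumes the `demand_kW` and `price_cents_kWh` fields shown in the sample output above; for example, 0.63 kW at 5.399 ¢/kWh works out to roughly 3.4 ¢ per hour.

```python
#!/usr/bin/env python3
# Illustrative only: estimate the current cost rate from the EMU-2 log.
# Usage: ./power-cost.py < /var/log/power-emu2.log
import json
import sys

demand_kw = None
price_cents_kwh = None

for line in sys.stdin:
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip partial lines
    if msg.get("message") == "InstantaneousDemand":
        demand_kw = msg.get("demand_kW")          # latest demand reading
    elif msg.get("message") == "PriceCluster":
        price_cents_kwh = msg.get("price_cents_kWh")  # latest price reading

if demand_kw is not None and price_cents_kwh is not None:
    # kW * cents/kWh = cents per hour at the current draw
    print(f"{demand_kw} kW at {price_cents_kwh} c/kWh ~= {demand_kw * price_cents_kwh:.1f} cents/hour")
else:
    print("Need at least one InstantaneousDemand and one PriceCluster message")
```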
45 | 46 | Once you're able to successfully query the EMU-2, create a log file for its output: 47 | 48 | ```bash 49 | sudo touch /var/log/power-emu2.log 50 | sudo chown ubuntu.ubuntu /var/log/power-emu2.log 51 | ``` 52 | 53 | Create a logrotate entry so the log file doesn't grow unbounded: 54 | 55 | ``` 56 | sudo vi /etc/logrotate.d/power-emu2 57 | ``` 58 | 59 | Add the following content: 60 | 61 | ``` 62 | /var/log/power-emu2.log { 63 | weekly 64 | rotate 12 65 | compress 66 | delaycompress 67 | missingok 68 | notifempty 69 | create 644 ubuntu ubuntu 70 | } 71 | ``` 72 | 73 | Create a new bash script `~/bin/power-emu2.sh` with the following: 74 | 75 | ```bash 76 | #!/bin/bash 77 | 78 | if pgrep -f "sudo /home/ubuntu/bin/power-emu2.py" > /dev/null 79 | then 80 | echo "Already running." 81 | else 82 | echo "Not running. Restarting..." 83 | sudo /home/ubuntu/bin/power-emu2.py >> /var/log/power-emu2.log 2>&1 84 | fi 85 | ``` 86 | 87 | Add the following entry to your crontab: 88 | 89 | ``` 90 | * * * * * /home/ubuntu/bin/power-emu2.sh >> /tmp/power-emu2.log 2>&1 91 | ``` 92 | 93 | Verify output by tailing the log file for a few minutes: 94 | 95 | ``` 96 | tail -f /var/log/power-emu2.log 97 | ``` 98 | 99 | If you're seeing output scroll each minute then you are successfully collecting data! 100 | 101 | ## Step #2 - Archive Data 102 | 103 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3. 104 | 105 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your EMU-2 data: 106 | 107 | ```yaml 108 | filebeat.inputs: 109 | - type: log 110 | enabled: true 111 | tags: ["power-emu2"] 112 | paths: 113 | - /var/log/power-emu2.log 114 | ``` 115 | 116 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 117 | 118 | Restart Filebeat: 119 | 120 | ```bash 121 | sudo systemctl restart filebeat 122 | ``` 123 | 124 | You may want to tail syslog to see if Filebeat restarts without any issues: 125 | 126 | ```bash 127 | tail -f /var/log/syslog | grep filebeat 128 | ``` 129 | 130 | At this point, we should have EMU-2 data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the EMU-2 data feed. 
131 | 132 | Add the following conditional to your `distributor.yml` file: 133 | 134 | ``` 135 | } else if "power-emu2" in [tags] { 136 | pipeline { 137 | send_to => ["power-emu2-archive"] 138 | } 139 | } 140 | ``` 141 | 142 | Create a Logstash pipeline called `power-emu2-archive.yml` with the following contents: 143 | 144 | ``` 145 | input { 146 | pipeline { 147 | address => "power-emu2-archive" 148 | } 149 | } 150 | filter { 151 | } 152 | output { 153 | s3 { 154 | # 155 | # Custom Settings 156 | # 157 | prefix => "power-emu2/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 158 | temporary_directory => "${S3_TEMP_DIR}/power-emu2-archive" 159 | access_key_id => "${S3_ACCESS_KEY}" 160 | secret_access_key => "${S3_SECRET_KEY}" 161 | endpoint => "${S3_ENDPOINT}" 162 | bucket => "${S3_BUCKET}" 163 | 164 | # 165 | # Standard Settings 166 | # 167 | validate_credentials_on_root_bucket => false 168 | codec => json_lines 169 | # Limit Data Lake file sizes to 5 GB 170 | size_file => 5000000000 171 | time_file => 60 172 | # encoding => "gzip" 173 | additional_settings => { 174 | force_path_style => true 175 | follow_redirects => false 176 | } 177 | } 178 | } 179 | ``` 180 | 181 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 182 | 183 | ```bash 184 | sudo mv power-emu2-archive.yml /etc/logstash/conf.d/ 185 | ``` 186 | 187 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 188 | 189 | ``` 190 | - pipeline.id: "power-emu2-archive" 191 | path.config: "/etc/logstash/conf.d/power-emu2-archive.conf" 192 | ``` 193 | 194 | And finally, restart the Logstash service: 195 | 196 | ```bash 197 | sudo systemctl restart logstash 198 | ``` 199 | 200 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 201 | 202 | ```bash 203 | sudo tail -f /var/log/logstash/logstash-plain.log 204 | ``` 205 | 206 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 207 | 208 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 209 | 210 | ![Stack Monitoring](archive.png) 211 | 212 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 213 | 214 | ![MinIO](minio.png) 215 | 216 | If you see your data being stored, then you are successfully archiving! 217 | 218 | ## Step #3 - Index Data 219 | 220 | Once Logstash is archiving the data, next we need to index it with Elastic. 221 | 222 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 223 | 224 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 
225 | 226 | Create a new pipeline called `power-emu2-index.yml` with the following content: 227 | 228 | ``` 229 | input { 230 | pipeline { 231 | address => "power-emu2-index" 232 | } 233 | } 234 | filter { 235 | json { 236 | source => "message" 237 | skip_on_invalid_json => true 238 | } 239 | json { 240 | source => "message" 241 | skip_on_invalid_json => true 242 | } 243 | date { 244 | match => ["timestamp", "ISO8601"] 245 | } 246 | mutate { 247 | remove_field => ["timestamp"] 248 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"] 249 | } 250 | } 251 | output { 252 | elasticsearch { 253 | # 254 | # Custom Settings 255 | # 256 | id => "power-emu2-index" 257 | index => "power-emu2-%{+YYYY.MM.dd}" 258 | hosts => "${ES_ENDPOINT}" 259 | user => "${ES_USERNAME}" 260 | password => "${ES_PASSWORD}" 261 | } 262 | } 263 | ``` 264 | 265 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 266 | 267 | ```bash 268 | sudo mv power-emu2-index.yml /etc/logstash/conf.d/ 269 | ``` 270 | 271 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 272 | 273 | ``` 274 | - pipeline.id: "power-emu2-index" 275 | path.config: "/etc/logstash/conf.d/power-emu2-index.conf" 276 | ``` 277 | 278 | Append your new pipeline to your tagged data in the `distributor.yml` pipeline: 279 | 280 | ``` 281 | } else if "power-emu2" in [tags] { 282 | pipeline { 283 | send_to => ["power-emu2-archive", "power-emu2-index"] 284 | } 285 | } 286 | ``` 287 | 288 | And finally, restart the Logstash service: 289 | 290 | ```bash 291 | sudo systemctl restart logstash 292 | ``` 293 | 294 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 295 | 296 | ```bash 297 | sudo tail -f /var/log/logstash/logstash-plain.log 298 | ``` 299 | 300 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 301 | 302 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 303 | 304 | ![Indexing](index.png) 305 | 306 | ## Step #4 - Visualize Data 307 | 308 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 309 | 310 | Download this dashboard: [power-emu2.ndjson](power-emu2.ndjson) 311 | 312 | Jump back into Kibana: 313 | 314 | 1. Select "Stack Management" from the menu 315 | 2. Select "Saved Objects" 316 | 3. Click "Import" in the upper right 317 | 318 | Once it's been imported, click on "Power EMU-2". 319 | 320 | ![Dashboard](dashboard.png) 321 | 322 | Congratulations! You should now be looking at power data from your EMU-2 in Elastic. 
-------------------------------------------------------------------------------- /power-emu2/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/archive.png -------------------------------------------------------------------------------- /power-emu2/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/dashboard.png -------------------------------------------------------------------------------- /power-emu2/emu-2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/emu-2.jpg -------------------------------------------------------------------------------- /power-emu2/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/index.png -------------------------------------------------------------------------------- /power-emu2/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/minio.png -------------------------------------------------------------------------------- /power-emu2/power-emu2.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"power-emu2-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-04T20:09:58.325Z","version":"WzMxMzI2NCwyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":12,\"h\":8,\"i\":\"2492dbbd-f32f-4d29-946d-0b6971538d0e\"},\"panelIndex\":\"2492dbbd-f32f-4d29-946d-0b6971538d0e\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Power 
EMU-2\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":12,\"y\":0,\"w\":36,\"h\":8,\"i\":\"58d3b589-f688-4815-b206-94f8e5bcf246\"},\"panelIndex\":\"58d3b589-f688-4815-b206-94f8e5bcf246\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"9b5ecb03-f611-4038-bdd0-1e033b9b64b3\":{\"columns\":{\"9fa24993-a762-4277-aa6d-e471e99152e6\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"b27753d2-962f-407c-a41d-d0af4ff04199\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"9fa24993-a762-4277-aa6d-e471e99152e6\",\"b27753d2-962f-407c-a41d-d0af4ff04199\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar\",\"layers\":[{\"layerId\":\"9b5ecb03-f611-4038-bdd0-1e033b9b64b3\",\"accessors\":[\"b27753d2-962f-407c-a41d-d0af4ff04199\"],\"position\":\"top\",\"seriesType\":\"bar\",\"showGridlines\":false,\"xAccessor\":\"9fa24993-a762-4277-aa6d-e471e99152e6\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-9b5ecb03-f611-4038-bdd0-1e033b9b64b3\"}]},\"enhancements\":{},\"hidePanelTitles\":false},\"title\":\"Logs\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":8,\"w\":48,\"h\":11,\"i\":\"b1c3297d-5e54-4a20-8181-5cb483faf23d\"},\"panelIndex\":\"b1c3297d-5e54-4a20-8181-5cb483faf23d\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\":{\"columns\":{\"623795c1-643f-4db6-9378-059abb5dc58b\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\":{\"label\":\"Median of 
demand_kW\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"demand_kW\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"623795c1-643f-4db6-9378-059abb5dc58b\",\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\",\"accessors\":[\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"623795c1-643f-4db6-9378-059abb5dc58b\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Demand (kW)\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":19,\"w\":48,\"h\":9,\"i\":\"0161139a-4d80-41ab-aed3-7d3042269d5a\"},\"panelIndex\":\"0161139a-4d80-41ab-aed3-7d3042269d5a\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"25083df7-ad95-4b9c-882c-5d82a00a15ed\":{\"columns\":{\"45c88792-c3c8-40f6-a1a1-3643062294f8\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"ac569630-a85e-46fc-a682-c30e1b1bbef5\":{\"label\":\"Differences of Sum of meter_kWh\",\"dataType\":\"number\",\"operationType\":\"differences\",\"isBucketed\":false,\"scale\":\"ratio\",\"references\":[\"ff830719-e828-470a-8bdd-07e23f048769\"]},\"ff830719-e828-470a-8bdd-07e23f048769\":{\"label\":\"Sum of 
meter_kWh\",\"dataType\":\"number\",\"operationType\":\"sum\",\"sourceField\":\"meter_kWh\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"45c88792-c3c8-40f6-a1a1-3643062294f8\",\"ac569630-a85e-46fc-a682-c30e1b1bbef5\",\"ff830719-e828-470a-8bdd-07e23f048769\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\",\"showSingleSeries\":false},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"25083df7-ad95-4b9c-882c-5d82a00a15ed\",\"accessors\":[\"ac569630-a85e-46fc-a682-c30e1b1bbef5\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"45c88792-c3c8-40f6-a1a1-3643062294f8\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-25083df7-ad95-4b9c-882c-5d82a00a15ed\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":28,\"w\":48,\"h\":10,\"i\":\"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f\"},\"panelIndex\":\"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"d4322c84-ac6d-4819-aa31-75a87a04bb0f\":{\"columns\":{\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"021d0fc2-f73f-463f-a7c3-5d030e404f68\":{\"label\":\"Differences of Maximum of summation_delivered\",\"dataType\":\"number\",\"operationType\":\"differences\",\"isBucketed\":false,\"scale\":\"ratio\",\"references\":[\"aa4fb545-bc30-40a7-b6cc-942b35936030\"]},\"aa4fb545-bc30-40a7-b6cc-942b35936030\":{\"label\":\"Maximum of 
summation_delivered\",\"dataType\":\"number\",\"operationType\":\"max\",\"sourceField\":\"summation_delivered\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\",\"021d0fc2-f73f-463f-a7c3-5d030e404f68\",\"aa4fb545-bc30-40a7-b6cc-942b35936030\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"d4322c84-ac6d-4819-aa31-75a87a04bb0f\",\"accessors\":[\"021d0fc2-f73f-463f-a7c3-5d030e404f68\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-d4322c84-ac6d-4819-aa31-75a87a04bb0f\"}]},\"enhancements\":{}}}]","timeRestore":false,"title":"Power EMU-2","version":1},"coreMigrationVersion":"7.14.0","id":"b58dc570-0dbd-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"58d3b589-f688-4815-b206-94f8e5bcf246:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"58d3b589-f688-4815-b206-94f8e5bcf246:indexpattern-datasource-layer-9b5ecb03-f611-4038-bdd0-1e033b9b64b3","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"b1c3297d-5e54-4a20-8181-5cb483faf23d:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"b1c3297d-5e54-4a20-8181-5cb483faf23d:indexpattern-datasource-layer-6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"0161139a-4d80-41ab-aed3-7d3042269d5a:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"0161139a-4d80-41ab-aed3-7d3042269d5a:indexpattern-datasource-layer-25083df7-ad95-4b9c-882c-5d82a00a15ed","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f:indexpattern-datasource-layer-d4322c84-ac6d-4819-aa31-75a87a04bb0f","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-06T07:40:24.254Z","version":"WzM1MDE5OSwyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /power-emu2/power-emu2.py: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env python3 2 | 3 | import datetime 4 | import json 5 | import os 6 | import platform 7 | import serial 8 | import sys 9 | import time 10 | import xml.etree.ElementTree as et 11 | 12 | data = {} 13 | data['message'] = "Starting" 14 | d = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ") 15 | data['timestamp'] = d 16 | 17 | Y2K = 946684800 18 | 19 | try: 20 | dev = '/dev/ttyACM1' 21 | emu2 = serial.Serial(dev, 115200, timeout=1) 22 | data['status'] = "connected" 23 | except: 24 | data['status'] = "could not connect" 25 | print(json.dumps(data), flush=True) 26 | exit() 27 | 28 | print(json.dumps(data), flush=True) 29 | 30 | while True: 31 | try: 32 | msg = emu2.readlines() 33 | except: 34 | data = {} 35 | data['message'] = "error" 36 | d = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ") 37 | data['timestamp'] = d 38 | print(json.dumps(data)) 39 | exit() 40 | 41 | if msg == [] or msg[0].decode()[0] != '<': 42 | continue 43 | 44 | msg = ''.join([line.decode() for line in msg]) 45 | 46 | try: 47 | tree = et.fromstring(msg) 48 | #print(msg) 49 | except: 50 | continue 51 | 52 | data = {} 53 | data['message'] = tree.tag 54 | 55 | if tree.tag == 'InstantaneousDemand': 56 | # Received every 15 seconds 57 | ts = int(tree.find('TimeStamp').text, 16) 58 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 59 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 60 | data['timestamp'] = d 61 | power = int(tree.find('Demand').text, 16) 62 | power *= int(tree.find('Multiplier').text, 16) 63 | power /= int(tree.find('Divisor').text, 16) 64 | power = round(power, int(tree.find('DigitsRight').text, 16)) 65 | data['demand_kW'] = power 66 | elif tree.tag == 'PriceCluster': 67 | # Received every 1-2 minutes 68 | ts = int(tree.find('TimeStamp').text, 16) 69 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 70 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 71 | data['timestamp'] = d 72 | #data['price'] = int(tree.find('Price').text, 16) 73 | #data['trailing'] = int(tree.find('TrailingDigits').text, 16) 74 | data['price_cents_kWh'] = int(tree.find('Price').text, 16) 75 | data['price_cents_kWh'] /= 1000 76 | data['currency'] = int(tree.find('Currency').text, 16) 77 | # Currency uses ISO 4217 codes 78 | # US Dollar is code 840 79 | data['tier'] = int(tree.find('Tier').text, 16) 80 | st = int(tree.find('StartTime').text, 16) 81 | st = st + Y2K # st + Y2K = Unix Epoch Time 82 | d = datetime.datetime.fromtimestamp(st).strftime("%Y-%m-%dT%H:%M:%SZ") 83 | data['start_time'] = d 84 | data['duration'] = int(tree.find('Duration').text, 16) 85 | elif tree.tag == 'CurrentSummationDelivered': 86 | # Received every 3-5 minutes 87 | ts = int(tree.find('TimeStamp').text, 16) 88 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 89 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 90 | data['timestamp'] = d 91 | data['summation_delivered'] = int(tree.find('SummationDelivered').text, 16) 92 | data['summation_received'] = int(tree.find('SummationReceived').text, 16) 93 | energy = int(tree.find('SummationDelivered').text, 16) 94 | energy -= int(tree.find('SummationReceived').text, 16) 95 | energy *= int(tree.find('Multiplier').text, 16) 96 | energy /= int(tree.find('Divisor').text, 16) 97 | energy = round(energy, int(tree.find('DigitsRight').text, 16)) 98 | data['meter_kWh'] = energy 99 | elif tree.tag == 'TimeCluster': 100 | # Received every 15 minutes 101 | ts = int(tree.find('UTCTime').text, 16) 102 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 103 | d = 
datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 104 | data['timestamp'] = d 105 | ts = int(tree.find('LocalTime').text, 16) 106 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 107 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%S") 108 | data['local_time'] = d 109 | else: 110 | for child in tree: 111 | if child: 112 | value = int(child.text, 16) if child.text[:2] == '0x' else child.text 113 | data['unknown'] = value 114 | 115 | print(json.dumps(data), flush=True) 116 | -------------------------------------------------------------------------------- /power-hs300/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Power with HS300 2 | 3 | HS300 4 | 5 | The [Kasa Smart Wi-Fi Power Strip (HS300)](https://www.kasasmart.com/us/products/smart-plugs/kasa-smart-wi-fi-power-strip-hs300) is a consumer-grade power strip that allows you to independently control and monitor 6 smart outlets (and charge 3 devices with built-in USB ports). The power strip can be controlled via the Kasa Smart [iPhone](https://apps.apple.com/us/app/kasa-smart/id1034035493) app or [Android](https://play.google.com/store/apps/details?id=com.tplink.kasa_android&hl=en_US&gl=US) app. Furthermore, you can query it via API to get the electrical properties of each outlet. For example: 6 | 7 | * Voltage 8 | * Current 9 | * Watts 10 | * Watts per hour 11 | 12 | We'll use a Python script to query it each minute via a cron job, and redirect the output to a log file. From there, Filebeat will pick it up and send it into Elastic. Many data center grade PSUs also provide ways to query individual outlet metrics. A similar script could be written to extract this information in a commercial setting. 13 | 14 | In general, this exercise is meant to bring transparency to the cost of electricity to run a set of machines. If we know how much power a machine is consuming, we can calculate its electricity cost based on utility rates. 15 | 16 | ![Dashboard](dashboard.png) 17 | 18 | Let's get started. 19 | 20 | ## Step #1 - Collect Data 21 | 22 | Install the following Python module that knows how to query the power strip: 23 | 24 | ```bash 25 | pip3 install pyhs100 26 | ``` 27 | 28 | Find the IP address of the power strip: 29 | 30 | ```bash 31 | pyhs100 discover | grep IP 32 | ``` 33 | 34 | This should return an IP address for each HS300 on your network: 35 | 36 | ``` 37 | Host/IP: 192.168.1.5 38 | Host/IP: 192.168.1.6 39 | ``` 40 | 41 | Try querying the power strip: 42 | 43 | ```bash 44 | /home/ubuntu/.local/bin/pyhs100 --ip 192.168.1.5 emeter 45 | ``` 46 | 47 | You should see output similar to: 48 | 49 | ``` 50 | {0: {'voltage_mv': 112807, 'current_ma': 239, 'power_mw': 24620, 'total_wh': 12}, 1: {'voltag_mv': 112608, 'current_ma': 243, 'power_mw': 23948, 'total_wh': 12}, 2: {'voltage_mv': 112608, 'current_ma': 238, 'power_mw': 23453, 'total_wh': 11}, 3: {'voltage_mv': 112509, 'current_ma': 70, 'power_mw': 5399, 'total_wh': 4}, 4: {'voltage_mv': 112409, 'current_ma': 93, 'power_mw': 3130, 'total_wh': 1}, 5: {'voltage_mv': 109030, 'current_ma': 78, 'power_mw': 5787, 'total_wh': 2}} 51 | ``` 52 | 53 | This is not properly formatted JSON, but the script included with this data source will help clean it up. 
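If you'd like to see how that cleanup can work, here is a minimal sketch (not the script shipped with this data source, which uses simple string replacements) that parses the dict-style `pyhs100` output with Python's `ast` module; the `raw` string below is abbreviated from the sample output above:

```python
#!/usr/bin/env python3
# Minimal sketch: turn the dict-style output of `pyhs100 ... emeter`
# into valid JSON. The input is abbreviated; the real output has 6 outlets.

import ast
import json

raw = "{0: {'voltage_mv': 112807, 'current_ma': 239, 'power_mw': 24620, 'total_wh': 12}}"

# ast.literal_eval safely parses Python literals (single quotes, integer keys)
outlets = ast.literal_eval(raw)

# Convert millivolts / milliamps / milliwatts into volts / amps / watts
readings = [
    {
        "outlet": i,
        "volts": v["voltage_mv"] / 1000,
        "amps": v["current_ma"] / 1000,
        "watts": v["power_mw"] / 1000,
    }
    for i, v in outlets.items()
]

print(json.dumps(readings, indent=2))
```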
54 | 55 | After you've verified that you can query the power strip, download the following script and open it in your favorite editor: 56 | 57 | [power-hs300.py](power-hs300.py) 58 | 59 | Modify the script with the following: 60 | 61 | * Change the IP addresses to match that of your power strip(s) 62 | * Change the directory location of the `pyhs100` command 63 | * Change the names of each outlet in the `hosts` dictionary 64 | * Change the `label` argument in the `query_power_strip()` function calls 65 | 66 | Try running the script from the command line: 67 | 68 | ```bash 69 | chmod a+x ~/bin/power-hs300.py 70 | ~/bin/power-hs300.py 71 | ``` 72 | 73 | The output will include a JSON-formatted summary of each power outlet's metrics. 74 | 75 | ```json 76 | {"@timestamp": "2021-02-08T14:32:11.611868", "outlets": [{"ip": "192.168.1.5", "outlet": 0, "name": "node-1", "volts": 112.393, "amps": 0.254, "watts": 25.425, "label": "office"}, ...]} 77 | ``` 78 | 79 | When pretty-printed, it will look like this: 80 | 81 | ```json 82 | { 83 | "@timestamp": "2021-02-08T14:32:11.611868", 84 | "outlets": [ 85 | { 86 | "ip": "192.168.1.5", 87 | "label": "office", 88 | "outlet": 0, 89 | "name": "node-1", 90 | "volts": 112.393, 91 | "amps": 0.254, 92 | "watts": 25.425 93 | }, 94 | ... 95 | ] 96 | } 97 | ``` 98 | 99 | Once you're able to successfully query the power strip, create a log file for its output: 100 | 101 | ```bash 102 | sudo touch /var/log/power-hs300.log 103 | sudo chown ubuntu.ubuntu /var/log/power-hs300.log 104 | ``` 105 | 106 | Create a logrotate entry so the log file doesn't grow unbounded: 107 | 108 | ``` 109 | sudo vi /etc/logrotate.d/power-hs300 110 | ``` 111 | 112 | Add the following content: 113 | 114 | ``` 115 | /var/log/power-hs300.log { 116 | weekly 117 | rotate 12 118 | compress 119 | delaycompress 120 | missingok 121 | notifempty 122 | create 644 ubuntu ubuntu 123 | } 124 | ``` 125 | 126 | Add the following entry to your crontab: 127 | 128 | ``` 129 | * * * * * /home/ubuntu/bin/power-hs300.py >> /var/log/power-hs300.log 2>&1 130 | ``` 131 | 132 | Verify output by tailing the log file for a few minutes: 133 | 134 | ``` 135 | $ tail -f /var/log/power-hs300.log 136 | ``` 137 | 138 | If you're seeing output scroll each minute then you are successfully collecting data! 139 | 140 | ## Step #2 - Archive Data 141 | 142 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 143 | 144 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your HS300 data: 145 | 146 | ```yaml 147 | filebeat.inputs: 148 | - type: log 149 | enabled: true 150 | tags: ["power-hs300"] 151 | paths: 152 | - /var/log/power-hs300.log 153 | ``` 154 | 155 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 156 | 157 | Restart Filebeat: 158 | 159 | ```bash 160 | sudo systemctl restart filebeat 161 | ``` 162 | 163 | You may want to tail syslog to see if Filebeat restarts without any issues: 164 | 165 | ```bash 166 | tail -f /var/log/syslog | grep filebeat 167 | ``` 168 | 169 | At this point, we should have HS300 data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the HS300 data feed. 
170 | 171 | Add the following conditional to your `distributor.yml` file: 172 | 173 | ``` 174 | } else if "power-hs300" in [tags] { 175 | pipeline { 176 | send_to => ["power-hs300-archive"] 177 | } 178 | } 179 | ``` 180 | 181 | Create a Logstash pipeline called `power-hs300-archive.yml` with the following contents: 182 | 183 | ``` 184 | input { 185 | pipeline { 186 | address => "power-hs300-archive" 187 | } 188 | } 189 | filter { 190 | } 191 | output { 192 | s3 { 193 | # 194 | # Custom Settings 195 | # 196 | prefix => "power-hs300/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 197 | temporary_directory => "${S3_TEMP_DIR}/power-hs300-archive" 198 | access_key_id => "${S3_ACCESS_KEY}" 199 | secret_access_key => "${S3_SECRET_KEY}" 200 | endpoint => "${S3_ENDPOINT}" 201 | bucket => "${S3_BUCKET}" 202 | 203 | # 204 | # Standard Settings 205 | # 206 | validate_credentials_on_root_bucket => false 207 | codec => json_lines 208 | # Limit Data Lake file sizes to 5 GB 209 | size_file => 5000000000 210 | time_file => 60 211 | # encoding => "gzip" 212 | additional_settings => { 213 | force_path_style => true 214 | follow_redirects => false 215 | } 216 | } 217 | } 218 | ``` 219 | 220 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 221 | 222 | ```bash 223 | sudo mv power-hs300-archive.yml /etc/logstash/conf.d/ 224 | ``` 225 | 226 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 227 | 228 | ``` 229 | - pipeline.id: "power-hs300-archive" 230 | path.config: "/etc/logstash/conf.d/power-hs300-archive.yml" 231 | ``` 232 | 233 | And finally, restart the Logstash service: 234 | 235 | ```bash 236 | sudo systemctl restart logstash 237 | ``` 238 | 239 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors: 240 | 241 | ```bash 242 | sudo tail -f /var/log/logstash/logstash-plain.log 243 | ``` 244 | 245 | After a few seconds, you should see Logstash shut down and start with the new pipeline and no errors being emitted. 246 | 247 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 248 | 249 | ![Stack Monitoring](archive.png) 250 | 251 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 252 | 253 | ![MinIO](minio.png) 254 | 255 | If you see your data being stored, then you are successfully archiving! 256 | 257 | ## Step #3 - Index Data 258 | 259 | Once Logstash is archiving the data, next we need to index it with Elastic. 260 | 261 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 262 | 263 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
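To make the filter chain easier to follow, here is a rough Python sketch of the transformation it performs; the document shape is taken from the sample log line in Step #1, and the field handling mirrors the `split` and `ruby` filters in the pipeline that follows:

```python
#!/usr/bin/env python3
# Rough illustration only: show how one logged event becomes one flat
# document per outlet, with the timestamp marked as UTC ("Z" suffix).

import json

logged_event = {
    "@timestamp": "2021-02-08T14:32:11.611868",
    "outlets": [
        {"ip": "192.168.1.5", "outlet": 0, "name": "node-1",
         "volts": 112.393, "amps": 0.254, "watts": 25.425, "label": "office"},
    ],
}

docs = []
for outlet in logged_event["outlets"]:      # what the `split` filter does
    doc = dict(outlet)                      # outlet fields become top-level fields
    doc["@timestamp"] = logged_event["@timestamp"] + "Z"   # UTC fix from the `ruby` filter
    docs.append(doc)

print(json.dumps(docs, indent=2))
```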
264 | 265 | Create a new pipeline called `power-hs300-index.yml` with the following content: 266 | 267 | ``` 268 | input { 269 | pipeline { 270 | address => "power-hs300-index" 271 | } 272 | } 273 | filter { 274 | mutate { 275 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"] 276 | } 277 | json { 278 | source => "message" 279 | skip_on_invalid_json => true 280 | } 281 | if "_jsonparsefailure" in [tags] { 282 | drop { } 283 | } 284 | split { 285 | field => "outlets" 286 | } 287 | ruby { 288 | code => " 289 | event.get('outlets').each do |k, v| 290 | event.set(k, v) 291 | if k == '@timestamp' 292 | event.set(k, v + 'Z') 293 | end 294 | end 295 | event.remove('outlets') 296 | " 297 | } 298 | if "_rubyexception" in [tags] { 299 | drop { } 300 | } 301 | mutate { 302 | remove_field => ["message"] 303 | remove_field => ["@version"] 304 | } 305 | } 306 | output { 307 | elasticsearch { 308 | # 309 | # Custom Settings 310 | # 311 | id => "power-hs300-index" 312 | index => "power-hs300-%{+YYYY.MM.dd}" 313 | hosts => "${ES_ENDPOINT}" 314 | user => "${ES_USERNAME}" 315 | password => "${ES_PASSWORD}" 316 | } 317 | } 318 | ``` 319 | 320 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 321 | 322 | ```bash 323 | sudo mv power-hs300-index.yml /etc/logstash/conf.d/ 324 | ``` 325 | 326 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 327 | 328 | ``` 329 | - pipeline.id: "power-hs300-index" 330 | path.config: "/etc/logstash/conf.d/power-hs300-index.conf" 331 | ``` 332 | 333 | And finally, restart the Logstash service: 334 | 335 | ```bash 336 | sudo systemctl restart logstash 337 | ``` 338 | 339 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 340 | 341 | ```bash 342 | sudo tail -f /var/log/logstash/logstash-plain.log 343 | ``` 344 | 345 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 346 | 347 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 348 | 349 | ![Indexing](index.png) 350 | 351 | ## Step #4 - Visualize Data 352 | 353 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 354 | 355 | Download this dashboard: [power-hs300.ndjson](power-hs300.ndjson) 356 | 357 | Jump back into Kibana: 358 | 359 | 1. Select "Stack Management" from the menu 360 | 2. Select "Saved Objects" 361 | 3. Click "Import" in the upper right 362 | 363 | Once it's been imported, click on "Power HS300". 364 | 365 | ![Dashboard](dashboard.png) 366 | 367 | Congratulations! You should now be looking at power data from your HS300 in Elastic. 
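As a follow-on, the wattage you're now collecting can be turned into a dollar figure. The sketch below is not part of this data source; it just applies the basic watts-to-kWh arithmetic, and the `rate_per_kwh` value is a made-up example that you should replace with your own utility rate:

```python
#!/usr/bin/env python3
# Back-of-the-envelope estimate: monthly electricity cost of an outlet
# from its average wattage. The $0.12/kWh rate is an assumed example.

def monthly_cost(avg_watts, rate_per_kwh=0.12, hours=24 * 30):
    """Average watts -> kWh over `hours` -> cost at `rate_per_kwh`."""
    kwh = avg_watts / 1000 * hours
    return kwh * rate_per_kwh

# Example: an outlet averaging ~25.4 watts (like node-1 in the sample output)
print(f"~${monthly_cost(25.4):.2f} per month")   # ~$2.19 at $0.12/kWh
```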
-------------------------------------------------------------------------------- /power-hs300/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/archive.png -------------------------------------------------------------------------------- /power-hs300/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/dashboard.png -------------------------------------------------------------------------------- /power-hs300/hs300.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/hs300.png -------------------------------------------------------------------------------- /power-hs300/hs300.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import datetime 4 | import json 5 | import subprocess 6 | 7 | def query_power_strip(ip_addr, label, hosts, outlets, time): 8 | output = subprocess.getoutput("/home/ubuntu/.local/bin/pyhs100 --ip " + ip_addr + 9 | " emeter | grep voltage") 10 | output = output.replace("'", "\"") 11 | output = output.replace("0:", "\"0\":") 12 | output = output.replace("1:", "\"1\":") 13 | output = output.replace("2:", "\"2\":") 14 | output = output.replace("3:", "\"3\":") 15 | output = output.replace("4:", "\"4\":") 16 | output = output.replace("5:", "\"5\":") 17 | 18 | try: 19 | json_output = json.loads(output) 20 | for i in range(0, 6): 21 | reading = {} 22 | reading["ip"] = ip_addr 23 | reading["label"] = label 24 | reading["outlet"] = i 25 | reading["name"] = hosts[i] 26 | reading["volts"] = json_output[f"{i}"]["voltage_mv"] / 1000 27 | reading["amps"] = json_output[f"{i}"]["current_ma"] / 1000 28 | reading["watts"] = json_output[f"{i}"]["power_mw"] / 1000 29 | # Record then erase, the stats from the meter only at the top of each hour. 30 | # This gives us a clean "watts/hour" reading every 1 hour. 31 | if time.minute == 0: 32 | reading["watt_hours"] = json_output[f"{i}"]["total_wh"] 33 | erase_output = subprocess.getoutput("/home/ubuntu/.local/bin/pyhs100 --ip " + 34 | ip_addr + " emeter --erase") 35 | outlets.append(reading) 36 | except Exception as e: 37 | print(e) 38 | 39 | def main(): 40 | # This script is designed to run every minute. 41 | # If it's the top of the hour, the "watt_hours" are also queried, 42 | # which often makes the runtime of this script greater than 1 minute. 43 | # So we capture the time the script started because we'll likely write 44 | # to output after another invocation of this script. 45 | # Even though these events will be written "out of order", 46 | # recording the correct invocation time will be important. 
47 | now = datetime.datetime.utcnow() 48 | outlets = [] 49 | 50 | hosts = { 51 | 0: "node-22", 52 | 1: "5k-monitor", 53 | 2: "node-17", 54 | 3: "node-18", 55 | 4: "node-21", 56 | 5: "switch-8" 57 | } 58 | query_power_strip("192.168.1.81", "desk", hosts, outlets, now) 59 | 60 | hosts = { 61 | 0: "node-1", 62 | 1: "node-2", 63 | 2: "node-3", 64 | 3: "node-0", 65 | 4: "switch-8-poe", 66 | 5: "udm-pro" 67 | } 68 | query_power_strip("192.168.1.82", "office", hosts, outlets, now) 69 | 70 | hosts = { 71 | 0: "node-9", 72 | 1: "node-10", 73 | 2: "node-6", 74 | 3: "node-4", 75 | 4: "node-5", 76 | 5: "node-20" 77 | } 78 | query_power_strip("192.168.1.83", "basement", hosts, outlets, now) 79 | 80 | power = { 81 | "@timestamp": now.isoformat(), 82 | "outlets": outlets 83 | } 84 | 85 | print(json.dumps(power)) 86 | 87 | if __name__ == "__main__": 88 | main() 89 | 90 | -------------------------------------------------------------------------------- /power-hs300/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/index.png -------------------------------------------------------------------------------- /power-hs300/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/minio.png -------------------------------------------------------------------------------- /power-hs300/reindex.yml: -------------------------------------------------------------------------------- 1 | input { 2 | s3 { 3 | # 4 | # Custom Settings 5 | # 6 | prefix => "power-hs300/2021-02-15/15-" 7 | #prefix => "power-hs300/2021-01-29/16-00" 8 | temporary_directory => "${S3_TEMP_DIR}/reindex" 9 | access_key_id => "${S3_ACCESS_KEY}" 10 | secret_access_key => "${S3_SECRET_KEY}" 11 | endpoint => "${S3_ENDPOINT}" 12 | bucket => "${S3_BUCKET}" 13 | 14 | # 15 | # Standard Settings 16 | # 17 | watch_for_new_files => false 18 | sincedb_path => "/dev/null" 19 | codec => json_lines 20 | additional_settings => { 21 | force_path_style => true 22 | follow_redirects => false 23 | } 24 | } 25 | } 26 | filter { 27 | mutate { 28 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"] 29 | gsub => [ 30 | "message", "@timestamp", "ts" 31 | ] 32 | } 33 | json { 34 | source => "message" 35 | skip_on_invalid_json => true 36 | } 37 | if "_jsonparsefailure" in [tags] { 38 | drop { } 39 | } 40 | split { 41 | field => "outlets" 42 | } 43 | ruby { 44 | code => " 45 | event.get('outlets').each do |k, v| 46 | event.set(k, v) 47 | end 48 | event.remove('outlets') 49 | " 50 | } 51 | if "_rubyexception" in [tags] { 52 | drop { } 53 | } 54 | mutate { 55 | remove_field => ["message", "@timestamp"] 56 | } 57 | date { 58 | match => ["ts", "YYYY-MM-dd'T'HH:mm:ss.SSSSSS"] 59 | timezone => "UTC" 60 | target => "@timestamp" 61 | } 62 | mutate { 63 | remove_field => ["ts"] 64 | } 65 | } 66 | output { 67 | stdout { 68 | codec => dots 69 | } 70 | elasticsearch { 71 | index => "power-hs300-%{+YYYY.MM.dd}" 72 | hosts => "${ES_ENDPOINT}" 73 | user => "${ES_USERNAME}" 74 | password => "${ES_PASSWORD}" 75 | } 76 | } 77 | -------------------------------------------------------------------------------- /satellites/README.md: -------------------------------------------------------------------------------- 1 | # Tracking Satellites with Elastic 2 | 3 | satellites 4 | 5 | Often 
times data we collect will include geospatial information which is worth seeing on a map. [Elastic Maps](https://www.elastic.co/maps) is a great way to visualize this data to better understand how it is behaving. [The Elastic Stack](https://www.elastic.co/what-is/elk-stack) supports a wide range of [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) that include [geo points](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html). For this data source, we track the location of over 1,000 [Starlink](https://en.wikipedia.org/wiki/Starlink) satellites and the [International Space Station](https://en.wikipedia.org/wiki/International_Space_Station) (ISS). 6 | 7 | Plotting the location of these satellites involves getting the latest [TLE](https://en.wikipedia.org/wiki/Two-line_element_set) from [Celestrak](http://www.celestrak.com/Norad/elements/table.php?tleFile=starlink&title=Starlink%20Satellites&orbits=0&pointsPerRev=90&frame=1), then using the [Skyfield API](https://rhodesmill.org/skyfield/) to convert the TLE to a Latitude & Longitude by providing a time and date. 8 | 9 | After we get the data ingest and indexed, we will use [Elastic Maps](https://www.elastic.co/maps) to plot our data: 10 | 11 | ![Dashboard](dashboard.png) 12 | 13 | Let's get started! 14 | 15 | ## Step #1 - Collect Data 16 | 17 | Install the following Python module that knows how to convert TLE information into latitude & longitude: 18 | 19 | ```bash 20 | $ pip3 install skyfield 21 | ``` 22 | 23 | Create a new python script called `satellites.py` with the following contents: 24 | 25 | ```python 26 | #!/usr/bin/env python3 27 | 28 | import datetime, json, time 29 | from skyfield.api import load, wgs84 30 | 31 | def main(): 32 | stations_url = 'http://celestrak.com/NORAD/elements/stations.txt' 33 | stations = load.tle_file(stations_url, reload=True) 34 | starlink_url = 'http://celestrak.com/NORAD/elements/starlink.txt' 35 | starlinks = load.tle_file(starlink_url, reload=True) 36 | 37 | while True: 38 | now = datetime.datetime.utcnow() 39 | ts = load.timescale() 40 | 41 | satellites = [] 42 | output = {} 43 | output['@timestamp'] = now.strftime('%Y-%m-%dT%H:%M:%SZ') 44 | 45 | by_name = {station.name: station for station in stations} 46 | station = by_name['ISS (ZARYA)'] 47 | satellite = {} 48 | satellite['name'] = 'ISS' 49 | satellite['sat_num'] = station.model.satnum 50 | geocentric = station.at(ts.now()) 51 | subpoint = wgs84.subpoint(geocentric) 52 | geo_point = {} 53 | geo_point['lat'] = subpoint.latitude.degrees 54 | geo_point['lon'] = subpoint.longitude.degrees 55 | satellite['location'] = geo_point 56 | satellite['elevation'] = int(subpoint.elevation.m) 57 | satellites.append(satellite) 58 | 59 | for starlink in starlinks: 60 | try: 61 | geocentric = starlink.at(ts.now()) 62 | subpoint = wgs84.subpoint(geocentric) 63 | satellite = {} 64 | satellite['name'] = starlink.name 65 | satellite['sat_num'] = starlink.model.satnum 66 | geo_point = {} 67 | geo_point['lat'] = subpoint.latitude.degrees 68 | geo_point['lon'] = subpoint.longitude.degrees 69 | satellite['location'] = geo_point 70 | satellite['elevation'] = int(subpoint.elevation.m) 71 | satellites.append(satellite) 72 | except: 73 | pass 74 | 75 | output['satellites'] = satellites 76 | print(json.dumps(output)) 77 | 78 | time.sleep(3) 79 | 80 | if __name__ == "__main__": 81 | main() 82 | ``` 83 | 84 | Create a new bash script called `satellites.sh` with the following contents: 85 | 86 | ```bash 
87 | #!/bin/bash 88 | 89 | if pgrep -f "python3 /home/ubuntu/python/satellites/satellites.py" > /dev/null 90 | then 91 | echo "Already running." 92 | else 93 | echo "Not running. Restarting..." 94 | /home/ubuntu/python/satellites/satellites.py >> /var/log/satellites.log 95 | fi 96 | ``` 97 | 98 | You can store these wherever you'd like. A good place to put them is in a `~/python` and `~/bin` directory, respectively. 99 | 100 | Try running the Python script directly: 101 | 102 | ``` 103 | $ chmod a+x ~/python/satellites/satellites.py 104 | $ ~/python/satellites/satellites.py 105 | ``` 106 | 107 | You should see output similar to: 108 | 109 | ```json 110 | {"@timestamp": "2021-04-18T16:47:54Z", "satellites": [{"name": "ISS", "sat_num": 25544, "location": {"lat": -9.499628732834388, "lon": 5.524255661695312}, "elevation": 421272}, {"name": "STARLINK-24", "sat_num": 44238, "location": {"lat": -53.0987009533634, "lon": 75.21545552082654}, "elevation": 539139}]} 111 | ``` 112 | 113 | Once you confirm the script is working, you can redirect its output to a log file: 114 | 115 | ``` 116 | $ sudo touch /var/log/satellites.log 117 | $ sudo chown ubuntu.ubuntu /var/log/satellites.log 118 | ``` 119 | 120 | Create a logrotate entry so the log file doesn't grow unbounded: 121 | 122 | ``` 123 | $ sudo vi /etc/logrotate.d/satellites 124 | ``` 125 | 126 | Add the following content: 127 | 128 | ``` 129 | /var/log/satellites.log { 130 | weekly 131 | rotate 12 132 | compress 133 | delaycompress 134 | missingok 135 | notifempty 136 | create 644 ubuntu ubuntu 137 | } 138 | ``` 139 | 140 | Add the following entry to your crontab: 141 | 142 | ``` 143 | * * * * * /home/ubuntu/bin/satellites.sh > /dev/null 2>&1 144 | ``` 145 | 146 | Verify output by tailing the log file for a few minutes: 147 | 148 | ``` 149 | $ tail -f /var/log/satellites.log 150 | ``` 151 | 152 | Tell Filebeat to send events in it to Elasticsearch, by editing `/etc/filebeat/filebeat.yml`: 153 | 154 | ``` 155 | filebeat.inputs: 156 | - type: log 157 | enabled: true 158 | tags: ["satellites"] 159 | paths: 160 | - /var/log/satellites.log 161 | ``` 162 | 163 | Restart Filebeat: 164 | 165 | ``` 166 | $ sudo systemctl restart filebeat 167 | ``` 168 | 169 | We now have a reliable collection method that will queue the satellite data on disk to a log file. Next, we'll leverage Filebeat to manage all the domain-specific logic of handing it off to Logstash in a reliable manner, dealing with retries, backoff logic, and more. 170 | 171 | ## Step #2 - Archive Data 172 | 173 | Once you have a data source that's ready to archive, we'll turn to Filebeat to send in the data to Logstash. By default, our `distributor` pipeline will put any unrecognized data in a Data Lake bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the satellites data feed and create two pipelines that know how to archive it in the Data Lake. 
174 | 175 | Create an Archive Pipeline: 176 | 177 | ``` 178 | input { 179 | pipeline { 180 | address => "satellites-archive" 181 | } 182 | } 183 | filter { 184 | } 185 | output { 186 | s3 { 187 | # 188 | # Custom Settings 189 | # 190 | prefix => "satellites/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 191 | temporary_directory => "${S3_TEMP_DIR}/satellites-archive" 192 | access_key_id => "${S3_ACCESS_KEY}" 193 | secret_access_key => "${S3_SECRET_KEY}" 194 | endpoint => "${S3_ENDPOINT}" 195 | bucket => "${S3_BUCKET}" 196 | 197 | # 198 | # Standard Settings 199 | # 200 | validate_credentials_on_root_bucket => false 201 | codec => json_lines 202 | # Limit Data Lake file sizes to 5 GB 203 | size_file => 5000000000 204 | time_file => 60 205 | # encoding => "gzip" 206 | additional_settings => { 207 | force_path_style => true 208 | follow_redirects => false 209 | } 210 | } 211 | } 212 | ``` 213 | 214 | If you're doing this in environment with multiple Logstash instances, please adapt the instruction below to your workflow for deploying updates. Ansible is a great configuration management tool for this purpose. 215 | 216 | ## Step #3 - Index Data 217 | 218 | Once Logstash is archiving the data, we need to index it with Elastic. 219 | 220 | Create an Index Template: 221 | 222 | ``` 223 | PUT _index_template/satellites 224 | { 225 | "index_patterns": ["satellites-*"], 226 | "template": { 227 | "settings": {}, 228 | "mappings": { 229 | "properties": { 230 | "location": { 231 | "type": "geo_point" 232 | } 233 | } 234 | }, 235 | "aliases": {} 236 | } 237 | } 238 | ``` 239 | 240 | Create an Index Pipeline: 241 | 242 | ``` 243 | input { 244 | pipeline { 245 | address => "satellites-index" 246 | } 247 | } 248 | filter { 249 | json { 250 | source => "message" 251 | } 252 | json { 253 | source => "message" 254 | } 255 | split { 256 | field => "satellites" 257 | } 258 | mutate { 259 | rename => { "[satellites][name]" => "[name]" } 260 | rename => { "[satellites][sat_num]" => "[sat_num]" } 261 | rename => { "[satellites][location]" => "[location]" } 262 | rename => { "[satellites][elevation]" => "[elevation]" } 263 | remove_field => ["message", "agent", "input", "@version", "satellites"] 264 | } 265 | } 266 | output { 267 | elasticsearch { 268 | # 269 | # Custom Settings 270 | # 271 | id => "satellites-index" 272 | index => "satellites-%{+YYYY.MM.dd}" 273 | hosts => "${ES_ENDPOINT}" 274 | user => "${ES_USERNAME}" 275 | password => "${ES_PASSWORD}" 276 | } 277 | } 278 | ``` 279 | 280 | Deploy your pipeline. 281 | 282 | ## Step #4 - Visualize Data 283 | 284 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 
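Before importing the dashboard, it can be worth confirming that documents with a `geo_point` are actually landing in the `satellites-*` indices. The snippet below is an optional sanity check, not part of this data source; the endpoint and credentials are placeholders, so substitute the same values you stored in the Logstash keystore:

```python
#!/usr/bin/env python3
# Optional sanity check: fetch the most recent ISS document and print its
# location. The endpoint and credentials below are placeholders.

import json
import requests

ES_ENDPOINT = "https://elasticsearch.my-domain.com:9243"   # placeholder
AUTH = ("elastic", "changeme")                             # placeholder

query = {
    "size": 1,
    "sort": [{"@timestamp": "desc"}],
    "query": {"match": {"name": "ISS"}},
}

resp = requests.get(
    f"{ES_ENDPOINT}/satellites-*/_search",
    auth=AUTH,
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
hit = resp.json()["hits"]["hits"][0]["_source"]
print(hit["name"], hit["location"], hit["elevation"])
```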
285 | 286 | dashboard -------------------------------------------------------------------------------- /satellites/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/satellites/dashboard.png -------------------------------------------------------------------------------- /satellites/satellites.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"satellites-*"},"coreMigrationVersion":"7.12.0","id":"6979b870-9945-11eb-9bde-9dcf1fa82a43","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-04-18T17:45:39.267Z","version":"WzUyNDcyNywxMl0="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"7.12.0\",\"type\":\"map\",\"gridData\":{\"x\":0,\"y\":0,\"w\":48,\"h\":26,\"i\":\"dae266e1-0d25-4c8e-83e8-547fb8ae8cfd\"},\"panelIndex\":\"dae266e1-0d25-4c8e-83e8-547fb8ae8cfd\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"description\":\"\",\"layerListJSON\":\"[{\\\"sourceDescriptor\\\":{\\\"type\\\":\\\"EMS_TMS\\\",\\\"isAutoSelect\\\":true},\\\"id\\\":\\\"0722a46c-af6c-4837-ae8c-4a1895a2385a\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"TILE\\\"},\\\"type\\\":\\\"VECTOR_TILE\\\"},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"6979b870-9945-11eb-9bde-9dcf1fa82a43\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"TOP_HITS\\\",\\\"topHitsSplitField\\\":\\\"name.keyword\\\",\\\"topHitsSize\\\":1,\\\"id\\\":\\\"a24b740c-1f80-4013-a190-dfd89b2fab24\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"tooltipProperties\\\":[\\\"elevation\\\",\\\"name\\\"],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\"},\\\"id\\\":\\\"d4790890-f1d6-4730-bd88-4f8eca8e0fc0\\\",\\\"label\\\":\\\"Satellites\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"confectionery\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"color\\\":\\\"Green to Red\\\",\\\"colorCategory\\\":\\\"palette_0\\\",\\\"field\\\":{\\\"name\\\":\\\"elevation\\\",\\\"origin\\\":\\\"source\\\"},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3},\\\"type\\\":\\\"ORDINAL\\\",\\\"useCustomColorRamp\\\":false}},\\\"lineColor\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"color\\\":\\\"Green to 
Red\\\",\\\"colorCategory\\\":\\\"palette_0\\\",\\\"field\\\":{\\\"name\\\":\\\"elevation\\\",\\\"origin\\\":\\\"source\\\"},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3},\\\"type\\\":\\\"ORDINAL\\\",\\\"useCustomColorRamp\\\":false}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":1}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":6}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]}]\",\"mapStateJSON\":\"{\\\"zoom\\\":1.95,\\\"center\\\":{\\\"lon\\\":-99.72924,\\\"lat\\\":22.13816},\\\"timeFilters\\\":{\\\"from\\\":\\\"now-1m\\\",\\\"to\\\":\\\"now\\\"},\\\"refreshConfig\\\":{\\\"isPaused\\\":false,\\\"interval\\\":5000},\\\"query\\\":{\\\"query\\\":\\\"\\\",\\\"language\\\":\\\"kuery\\\"},\\\"filters\\\":[],\\\"settings\\\":{\\\"autoFitToDataBounds\\\":false,\\\"backgroundColor\\\":\\\"#1d1e24\\\",\\\"disableInteractive\\\":false,\\\"disableTooltipControl\\\":false,\\\"hideToolbarOverlay\\\":false,\\\"hideLayerControl\\\":false,\\\"hideViewControl\\\":false,\\\"initialLocation\\\":\\\"LAST_SAVED_LOCATION\\\",\\\"fixedLocation\\\":{\\\"lat\\\":0,\\\"lon\\\":0,\\\"zoom\\\":2},\\\"browserLocation\\\":{\\\"zoom\\\":2},\\\"maxZoom\\\":24,\\\"minZoom\\\":0,\\\"showScaleControl\\\":false,\\\"showSpatialFilters\\\":true,\\\"spatialFiltersAlpa\\\":0.3,\\\"spatialFiltersFillColor\\\":\\\"#DA8B45\\\",\\\"spatialFiltersLineColor\\\":\\\"#DA8B45\\\"}}\",\"uiStateJSON\":\"{\\\"isLayerTOCOpen\\\":true,\\\"openTOCDetails\\\":[]}\"},\"mapCenter\":{\"lat\":22.13816,\"lon\":-99.72924,\"zoom\":1.95},\"mapBuffer\":{\"minLon\":-395.447,\"minLat\":-88.57154500000001,\"maxLon\":195.98852,\"maxLat\":115.932715},\"isLayerTOCOpen\":true,\"openTOCDetails\":[],\"hiddenLayers\":[],\"enhancements\":{}}}]","timeRestore":false,"title":"Satellites","version":1},"coreMigrationVersion":"7.12.0","id":"4ae756b0-9ee0-11eb-892f-d146407b15b5","migrationVersion":{"dashboard":"7.11.0"},"references":[{"id":"6979b870-9945-11eb-9bde-9dcf1fa82a43","name":"layer_1_source_index_pattern","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-04-18T17:58:42.036Z","version":"WzUyNTEzNSwxMl0="} 3 | {"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /satellites/satellites.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/satellites/satellites.png -------------------------------------------------------------------------------- /setup/README.md: -------------------------------------------------------------------------------- 1 | # Setup 2 | 3 | To build the architecture for the Elastic Data Lake, you'll need these components: 4 | 5 | * Logstash 6 | * HAProxy (or equivalent) 7 | * S3 Data Store (or equivalent) 8 | * 
Elastic Cluster 9 | 10 | Here is the architecture we're building: 11 | 12 | ![](../images/architecture.png) 13 | 14 | ## Prerequisites 15 | 16 | This guide depends on you having an S3 store and Elasticsearch cluster already running. We'll use [Elastic Cloud](https://elastic.co) to run our Elasticsearch cluster and [Minio](https://www.digitalocean.com/community/tutorials/how-to-set-up-an-object-storage-server-using-minio-on-ubuntu-18-04) as an S3 data store (or any S3-compliant service). 17 | 18 | ## Step 1 - Logstash 19 | 20 | Identify the host you want to run Logstash. Depending on the volume of ingest you anticipate, you may want to run Logstash on multiple hosts (or containers). It scales easily so putting HAProxy in front of it (which we'll do next) will make it easy to add more capacity. 21 | 22 | Follow these instructions to get Logstash up and running: 23 | 24 | [https://www.elastic.co/guide/en/logstash/current/installing-logstash.html](https://www.elastic.co/guide/en/logstash/current/installing-logstash.html) 25 | 26 | Next, create a [Logstash keystore](https://www.elastic.co/guide/en/logstash/current/keystore.html) to store sensitive information and variables: 27 | 28 | ``` 29 | $ export LOGSTASH_KEYSTORE_PASS=mypassword 30 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash create 31 | ``` 32 | 33 | **Note:** Store this password somewhere safe. You will also need to add it to the environment that starts the Logstash process. 34 | 35 | We'll use the keystore to fill in variables about our Elasticsearch cluster: 36 | 37 | ``` 38 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_ENDPOINT 39 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_USERNAME 40 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_PASSWORD 41 | ``` 42 | 43 | The `ES_ENDPOINT` value should be a full domain with `https` prefix and `:9243` port suffix. For example: 44 | 45 | ``` 46 | https://elasticsearch.my-domain.com:9243 47 | ``` 48 | 49 | We'll also use the keystore to fill in variables about our S3 bucket: 50 | 51 | ``` 52 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_ENDPOINT 53 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_BUCKET 54 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_ACCESS_KEY 55 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_SECRET_KEY 56 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_DATE_DIR 57 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_TEMP_DIR 58 | ``` 59 | 60 | The `S3_DATE_DIR` variable is used to organize your data into `date/time` directories in the Data Lake. For example, `data-source/2021-01-01/13` will contain data collected January 1, 2021 during the 1PM GMT hour. Organizing your data in this manner gives you good granularity in terms of identifying what time windows you may want to re-index in the future. It allows you to reindex data from a year, month, day, or hour interval. An hour (as opposed to using the hour with minute granularity) provides a nice balance between flushing what's in Logstash to your archive relatively often, while not creating a "too many files" burden on the underlying archive file system. 
Many file systems can handle lots of files; it's more the latency involved in recalling them that we want to avoid. 61 | 62 | The recommended value for `S3_DATE_DIR` is: 63 | 64 | ``` 65 | %{+YYYY}-%{+MM}-%{+dd}/%{+HH} 66 | ``` 67 | 68 | The `S3_TEMP_DIR` variable should point to a directory where Logstash can temporarily store events. Since this directory will contain events, you may need to make it secure so that only the Logstash process can read it (in addition to write to it). 69 | 70 | If Logstash is running on an isolated host, you may set it to: 71 | 72 | ``` 73 | /tmp/logstash 74 | ``` 75 | 76 | ### Ansible Pipeline Management 77 | 78 | We'll configure Logstash using Ansible. Ansible is a popular software provisioning tool that makes deploying configuration updates to multiple servers a breeze. If you can SSH into a host, you can use Ansible to push configuration to it. 79 | 80 | Create a directory to hold the Logstash configuration we'll be pushing to each Logstash host. 81 | 82 | ``` 83 | $ mkdir logstash 84 | $ vi playbook-logstash.yml 85 | ``` 86 | 87 | Add the following content to your Logstash Ansible playbook. 88 | 89 | **Note:** Replace `node-1` and `node-2` with the names of your Logstash hosts. 90 | 91 | ``` 92 | --- 93 | - hosts: node-1:node-2 94 | become: yes 95 | gather_facts: no 96 | 97 | tasks: 98 | - name: Copy in pipelines.yml 99 | template: 100 | src: "pipelines.yml" 101 | dest: "/etc/logstash/pipelines.yml" 102 | mode: 0644 103 | 104 | - name: Remove existing pipelines 105 | file: 106 | path: "/etc/logstash/conf.d" 107 | state: absent 108 | 109 | - name: Copy in pipelines 110 | copy: 111 | src: "conf.d" 112 | dest: "/etc/logstash/" 113 | 114 | - name: Restart Logstash 115 | service: 116 | name: logstash 117 | state: restarted 118 | enabled: true 119 | 120 | ``` 121 | 122 | ## Step 2 - HAProxy 123 | 124 | Identify the host you want to run HAProxy. Many Linux distributions support installation from the standard distribution. 
125 | 126 | In Ubuntu, run: 127 | 128 | ``` 129 | $ sudo apt install haproxy 130 | ``` 131 | 132 | In Redhat, run: 133 | 134 | ``` 135 | $ sudo yum install haproxy 136 | ``` 137 | 138 | A sample configuration file is provided: [haproxy.cfg](haproxy.cfg) 139 | -------------------------------------------------------------------------------- /setup/dead-letter-queue-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | dead_letter_queue { 3 | pipeline_id => "haproxy-filebeat-module-structure" 4 | path => "${S3_TEMP_DIR}/dead-letter-queue" 5 | # This directory needs created by hand (change /tmp/logstash if necessary): 6 | # mkdir -p /tmp/logstash/dead-letter-queue/haproxy-filebeat-module-structure 7 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 8 | } 9 | dead_letter_queue { 10 | pipeline_id => "haproxy-metricbeat-module-structure" 11 | path => "${S3_TEMP_DIR}/dead-letter-queue" 12 | # This directory needs created by hand (change /tmp/logstash if necessary): 13 | # mkdir -p /tmp/logstash/dead-letter-queue/haproxy-metricbeat-module-structure 14 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 15 | } 16 | dead_letter_queue { 17 | pipeline_id => "system-filebeat-module-structure" 18 | path => "${S3_TEMP_DIR}/dead-letter-queue" 19 | # This directory needs created by hand (change /tmp/logstash if necessary): 20 | # mkdir -p /tmp/logstash/dead-letter-queue/system-filebeat-module-structure 21 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 22 | } 23 | dead_letter_queue { 24 | pipeline_id => "unknown-structure" 25 | path => "${S3_TEMP_DIR}/dead-letter-queue" 26 | # This directory needs created by hand (change /tmp/logstash if necessary): 27 | # mkdir -p /tmp/logstash/dead-letter-queue/unknown-structure 28 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 29 | } 30 | dead_letter_queue { 31 | pipeline_id => "utilization-structure" 32 | path => "${S3_TEMP_DIR}/dead-letter-queue" 33 | # This directory needs created by hand (change /tmp/logstash if necessary): 34 | # mkdir -p /tmp/logstash/dead-letter-queue/utilization-structure 35 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 36 | } 37 | } 38 | filter { 39 | } 40 | output { 41 | s3 { 42 | # 43 | # Custom Settings 44 | # 45 | prefix => "dead-letter-queue-archive/${S3_DATE_DIR}" 46 | temporary_directory => "${S3_TEMP_DIR}/dead-letter-queue-archive" 47 | access_key_id => "${S3_ACCESS_KEY}" 48 | secret_access_key => "${S3_SECRET_KEY}" 49 | endpoint => "${S3_ENDPOINT}" 50 | bucket => "${S3_BUCKET}" 51 | 52 | # 53 | # Standard Settings 54 | # 55 | validate_credentials_on_root_bucket => false 56 | codec => json_lines 57 | # Limit Data Lake file sizes to 5 GB 58 | size_file => 5000000000 59 | time_file => 1 60 | # encoding => "gzip" 61 | additional_settings => { 62 | force_path_style => true 63 | follow_redirects => false 64 | } 65 | } 66 | } 67 | -------------------------------------------------------------------------------- /setup/distributor.conf: -------------------------------------------------------------------------------- 1 | input { 2 | tcp { 3 | port => 4044 4 | } 5 | beats { 6 | port => 5044 7 | } 8 | } 9 | filter { 10 | # Raw data filters go here. 11 | # Filter out any data you don't want in the Data Lake or Elasticsearch. 
12 | } 13 | output { 14 | if "utilization" in [tags] { 15 | pipeline { 16 | send_to => ["utilization-archive", "utilization-structure"] 17 | } 18 | } else if [agent][type] == "filebeat" and [event][module] == "system" { 19 | pipeline { 20 | send_to => ["system-filebeat-module-archive", "system-filebeat-module-structure"] 21 | } 22 | } else if [agent][type] == "filebeat" and [event][module] == "haproxy" { 23 | pipeline { 24 | send_to => ["haproxy-filebeat-module-archive", "haproxy-filebeat-module-structure"] 25 | } 26 | } else if [agent][type] == "metricbeat" and [event][module] == "haproxy" { 27 | pipeline { 28 | send_to => ["haproxy-metricbeat-module-archive", "haproxy-metricbeat-module-structure"] 29 | } 30 | } else { 31 | pipeline { 32 | send_to => ["unknown-archive", "unknown-structure"] 33 | } 34 | } 35 | } 36 | -------------------------------------------------------------------------------- /setup/haproxy.cfg: -------------------------------------------------------------------------------- 1 | global 2 | log /dev/log local0 3 | log /dev/log local1 notice 4 | chroot /var/lib/haproxy 5 | stats socket 127.0.0.1:14567 6 | user haproxy 7 | group haproxy 8 | daemon 9 | tune.ssl.default-dh-param 2048 10 | 11 | ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384 12 | ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256 13 | ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets 14 | 15 | defaults 16 | log global 17 | mode http 18 | option httplog 19 | option dontlognull 20 | timeout connect 5000 21 | timeout client 50000 22 | timeout server 50000 23 | errorfile 400 /etc/haproxy/errors/400.http 24 | errorfile 403 /etc/haproxy/errors/403.http 25 | errorfile 408 /etc/haproxy/errors/408.http 26 | errorfile 500 /etc/haproxy/errors/500.http 27 | errorfile 502 /etc/haproxy/errors/502.http 28 | errorfile 503 /etc/haproxy/errors/503.http 29 | errorfile 504 /etc/haproxy/errors/504.http 30 | 31 | # Logstash TCP 32 | listen logstash-tcp:4443 33 | #log /dev/log local0 debug 34 | mode tcp 35 | bind *:4443 ssl crt /etc/haproxy/certs/logstash.corp-intranet.pem ca-file /etc/haproxy/certs/ca.crt verify required 36 | option tcp-check 37 | balance roundrobin 38 | server proxy 127.0.0.1:4044 check port 4044 39 | 40 | # Logstash Beats 41 | listen logstash-beats:5443 42 | #log /dev/log local0 debug 43 | mode tcp 44 | bind *:5443 ssl crt /etc/haproxy/certs/logstash.corp-intranet.pem ca-file /etc/haproxy/certs/ca.crt verify required 45 | option tcp-check 46 | balance roundrobin 47 | server proxy 127.0.0.1:5044 check port 5044 48 | 49 | # Elasticsearch 50 | listen elasticsearch:9243 51 | #log /dev/log local0 debug 52 | mode http 53 | bind *:9243 ssl crt /etc/haproxy/certs/corp-intranet.pem 54 | http-request add-header X-Found-Cluster f40ec3b5bf1c4d8d81b3934cb97c8a32 55 | option ssl-hello-chk 56 | server proxy f40ec3b5bf1c4d8d81b3934cb97c8a32.us-central1.gcp.cloud.es.io:9243 check ssl port 9243 verify none 57 | 58 | # MinIO 59 | listen minio:9443 60 | #log /dev/log local0 debug 61 | mode http 62 | bind *:9443 ssl crt /etc/haproxy/certs/corp-intranet.pem 63 | http-request set-header X-Forwarded-Port %[dst_port] 64 | http-request add-header X-Forwarded-Proto https if { ssl_fc } 65 | option tcp-check 66 | balance roundrobin 67 | server proxy 127.0.0.1:9000 check port 
9000 68 | -------------------------------------------------------------------------------- /setup/needs-classified-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "needs-classified-archive" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | s3 { 10 | # 11 | # Custom Settings 12 | # 13 | prefix => "NEEDS_CLASSIFIED/${S3_DATE_DIR}" 14 | temporary_directory => "${S3_TEMP_DIR}/needs-classified-archive" 15 | access_key_id => "${S3_ACCESS_KEY}" 16 | secret_access_key => "${S3_SECRET_KEY}" 17 | endpoint => "${S3_ENDPOINT}" 18 | bucket => "${S3_BUCKET}" 19 | 20 | # 21 | # Standard Settings 22 | # 23 | validate_credentials_on_root_bucket => false 24 | codec => json_lines 25 | # Limit Data Lake file sizes to 5 GB 26 | size_file => 5000000000 27 | time_file => 1 28 | # encoding => "gzip" 29 | additional_settings => { 30 | force_path_style => true 31 | follow_redirects => false 32 | } 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /solar-enphase/README.md: -------------------------------------------------------------------------------- 1 | # Solar Monitoring with Enphase 2 | 3 | solar 4 | 5 | The [IQ 7+](https://store.enphase.com/storefront/en-us/iq-7plus-microinverter) from Enphase is a microinverter compatible with 60 and 72-cell solar panels that can produce 295VA at peak power. Enphase provides an [API]( https://developer.enphase.com/docs#envoys) that allows us to query a set of these microinverters reporting into their service. They offer a range of [Plans](https://developer.enphase.com/plans), including a free plan, which we'll be using for this data source. 6 | 7 | For this data source, we'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/solar-enphase.py` with the following contents: 16 | 17 | ​ [solar-enphase.py](solar-enphase.py) 18 | 19 | The script queries a set of Enphase's APIs at different intervals. The goal being to stay within our alotted quota of 10k API calls per month. We'll write the data collected to our data lake, but only use a portion of it for analysis in Elastic. 20 | 21 | Take a few minutes to familiarize yourself with the script. There are a couple of labels you can change near the bottom. Adjust the values of ``, `` and `` to suit your needs. The Enphase [Developer Portal](https://developer.enphase.com) is where you can get these values. 22 | 23 | When you're ready, try running the script: 24 | 25 | ```bash 26 | chmod a+x ~/bin/solar-enphase.py 27 | ~/bin/solar-enphase.py 28 | ``` 29 | 30 | You may not see any output, and this is by design (not a great design, albeit, but it works for now). Since we're limited to ~300 API calls per day on the Free plan, the script checks to see if it's on a specific minute of the hour in order to determine which API calls to make. 31 | 32 | If you run the script at :00, :10, :20, :30, :40, or :50 past the hour, you should see output on `stdout` similar to: 33 | 34 | ```json 35 | [{"signal_strength":0,"micro_inverters":[{"id":40236944,"serial_number":"121927062331","model":"IQ7+","part_number":"800-00625-r02","sku":"IQ7PLUS-72-2-US","status":"normal","power_produced":28,"proc_load":"520-00082-r01-v04.27.04","param_table":"540-00242-r01-v04.22.09","envoy_serial_number":"111943015132",... 
36 | ``` 37 | 38 | Once you confirm the script is working, you can redirect its output to a log file: 39 | 40 | ```bash 41 | sudo touch /var/log/solar-enphase.log 42 | sudo chown ubuntu.ubuntu /var/log/solar-enphase.log 43 | ``` 44 | 45 | Create a logrotate entry so the log file doesn't grow unbounded: 46 | 47 | ```bash 48 | sudo vi /etc/logrotate.d/solar-enphase 49 | ``` 50 | 51 | Add the following logrotate content: 52 | 53 | ``` 54 | /var/log/solar-enphase.log { 55 | weekly 56 | rotate 12 57 | compress 58 | delaycompress 59 | missingok 60 | notifempty 61 | create 644 ubuntu ubuntu 62 | } 63 | ``` 64 | 65 | Add the following entry to your crontab with `crontab -e`: 66 | 67 | ``` 68 | * * * * * /home/ubuntu/bin/solar-enphase.py >> /var/log/solar-enphase.log 2>&1 69 | ``` 70 | 71 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 72 | 73 | ```bash 74 | tail -f /var/log/solar-enphase.log 75 | ``` 76 | 77 | If you're seeing output scroll every 10 minutes, then you are successfully collecting data! 78 | 79 | ## Step #2 - Archive Data 80 | 81 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3. 82 | 83 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your solar data: 84 | 85 | ```yaml 86 | filebeat.inputs: 87 | - type: log 88 | enabled: true 89 | tags: ["solar-enphase"] 90 | paths: 91 | - /var/log/solar-enphase.log 92 | ``` 93 | 94 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 95 | 96 | Restart Filebeat: 97 | 98 | ```bash 99 | sudo systemctl restart filebeat 100 | ``` 101 | 102 | You may want to tail syslog to see if Filebeat restarts without any issues: 103 | 104 | ```bash 105 | tail -f /var/log/syslog | grep filebeat 106 | ``` 107 | 108 | At this point, we should have Solar Enphase data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the Solar Enphase data feed.
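Before changing the Logstash configuration, you can optionally have Filebeat validate its own config and its connection to Logstash. A quick check, assuming Filebeat was installed from the standard DEB/RPM packages so the defaults point at `/etc/filebeat/filebeat.yml`:

```bash
# Validate the Filebeat configuration syntax
sudo filebeat test config

# Confirm Filebeat can reach the Logstash output defined in filebeat.yml
sudo filebeat test output
```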
109 | 110 | Add the following conditional to your `distributor.yml` file: 111 | 112 | ``` 113 | } else if "solar-enphase" in [tags] { 114 | pipeline { 115 | send_to => ["solar-enphase-archive"] 116 | } 117 | } 118 | ``` 119 | 120 | Create a Logstash pipeline called `solar-enphase-archive.yml` with the following contents: 121 | 122 | ``` 123 | input { 124 | pipeline { 125 | address => "solar-enphase-archive" 126 | } 127 | } 128 | filter { 129 | } 130 | output { 131 | s3 { 132 | # 133 | # Custom Settings 134 | # 135 | prefix => "solar-enphase/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 136 | temporary_directory => "${S3_TEMP_DIR}/solar-enphase-archive" 137 | access_key_id => "${S3_ACCESS_KEY}" 138 | secret_access_key => "${S3_SECRET_KEY}" 139 | endpoint => "${S3_ENDPOINT}" 140 | bucket => "${S3_BUCKET}" 141 | 142 | # 143 | # Standard Settings 144 | # 145 | validate_credentials_on_root_bucket => false 146 | codec => json_lines 147 | # Limit Data Lake file sizes to 5 GB 148 | size_file => 5000000000 149 | time_file => 60 150 | # encoding => "gzip" 151 | additional_settings => { 152 | force_path_style => true 153 | follow_redirects => false 154 | } 155 | } 156 | } 157 | ``` 158 | 159 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 160 | 161 | ```bash 162 | sudo mv solar-enphase-archive.yml /etc/logstash/conf.d/ 163 | ``` 164 | 165 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 166 | 167 | ``` 168 | - pipeline.id: "solar-enphase-archive" 169 | path.config: "/etc/logstash/conf.d/solar-enphase-archive.conf" 170 | ``` 171 | 172 | And finally, restart the Logstash service: 173 | 174 | ```bash 175 | sudo systemctl restart logstash 176 | ``` 177 | 178 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 179 | 180 | ```bash 181 | sudo tail -f /var/log/logstash/logstash-plain.log 182 | ``` 183 | 184 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 185 | 186 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 187 | 188 | ![Stack Monitoring](archive.png) 189 | 190 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 191 | 192 | ![MinIO](minio.png) 193 | 194 | If you see your data being stored, then you are successfully archiving! 195 | 196 | ## Step #3 - Index Data 197 | 198 | Once Logstash is archiving the data, next we need to index it with Elastic. 199 | 200 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 201 | 202 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 
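If you want to experiment with a filter chain like this before committing it to a pipeline file, one option is to feed a sample line from the log through a throwaway Logstash config and inspect the parsed event on `stdout`. A minimal sketch (not part of the final setup), assuming the standard package install path and a scratch data directory so it doesn't clash with the running Logstash service:

```bash
# Parse one sample event with the first json step of the chain and
# print the resulting event using the rubydebug codec
head -1 /var/log/solar-enphase.log | \
  /usr/share/logstash/bin/logstash --path.data /tmp/logstash-scratch -e '
    input { stdin { } }
    filter { json { source => "message" target => "tmp" } }
    output { stdout { codec => rubydebug } }
  '
```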
203 | 204 | Create a new pipeline called `solar-enphase-index.yml` with the following content: 205 | 206 | ``` 207 | input { 208 | pipeline { 209 | address => "solar-enphase-index" 210 | } 211 | } 212 | filter { 213 | json { 214 | source => "message" 215 | } 216 | if [message] =~ /^\[/ { 217 | json { 218 | source => "message" 219 | target => "tmp" 220 | } 221 | } else { 222 | drop { } 223 | } 224 | if "_jsonparsefailure" in [tags] { 225 | drop { } 226 | } 227 | mutate { 228 | remove_field => ["message"] 229 | } 230 | mutate { 231 | add_field => { 232 | "message" => "%{[tmp][0]}" 233 | } 234 | } 235 | mutate { 236 | remove_field => ["tmp"] 237 | } 238 | json { 239 | source => "message" 240 | } 241 | mutate { 242 | remove_field => ["message"] 243 | } 244 | split { 245 | field => "micro_inverters" 246 | } 247 | ruby { 248 | # Promote the keys inside micro_inverters to root, then remove micro_inverters 249 | code => ' 250 | event.get("micro_inverters").each { |k, v| 251 | event.set(k,v) 252 | } 253 | event.remove("micro_inverters") 254 | ' 255 | } 256 | date { 257 | match => ["last_report_date", "ISO8601"] 258 | } 259 | mutate { 260 | remove_field => ["last_report_date", "part_number", "envoy_serial_number", "param_table"] 261 | remove_field => ["model", "sku", "grid_profile", "proc_load", "id"] 262 | remove_field => ["agent", "host", "input", "log", "host", "ecs", "@version"] 263 | } 264 | } 265 | output { 266 | elasticsearch { 267 | # 268 | # Custom Settings 269 | # 270 | id => "solar-enphase-index" 271 | index => "solar-enphase-%{+YYYY.MM.dd}" 272 | hosts => "${ES_ENDPOINT}" 273 | user => "${ES_USERNAME}" 274 | password => "${ES_PASSWORD}" 275 | } 276 | } 277 | ``` 278 | 279 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 280 | 281 | ```bash 282 | sudo mv solar-enphase-index.yml /etc/logstash/conf.d/ 283 | ``` 284 | 285 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 286 | 287 | ``` 288 | - pipeline.id: "solar-enphase-index" 289 | path.config: "/etc/logstash/conf.d/solar-enphase-index.conf" 290 | ``` 291 | 292 | And finally, restart the Logstash service: 293 | 294 | ```bash 295 | sudo systemctl restart logstash 296 | ``` 297 | 298 | While Logstash is restarting, you can tail its log file in order to see if there are any configuration errors: 299 | 300 | ```bash 301 | sudo tail -f /var/log/logstash/logstash-plain.log 302 | ``` 303 | 304 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 305 | 306 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 307 | 308 | ![Indexing](index.png) 309 | 310 | ## Step #4 - Visualize Data 311 | 312 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 313 | 314 | Download this dashboard: [solar-enphase.ndjson](solar-enphase.ndjson) 315 | 316 | Jump back into Kibana: 317 | 318 | 1. Select "Stack Management" from the menu 319 | 2. Select "Saved Objects" 320 | 3. Click "Import" in the upper right 321 | 322 | Once it's been imported, click on "Solar Enphase". 323 | 324 | ![Dashboard](dashboard.png) 325 | 326 | Congratulations! You should now be looking at data from your Enphase solar system in Elastic.
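If you prefer scripting the import over clicking through the UI, Kibana's Saved Objects API accepts the same NDJSON file. A sketch, assuming Kibana is reachable at a `KIBANA_URL` you define and your Elastic credentials are exported as `ES_USERNAME` / `ES_PASSWORD`:

```bash
# Import the dashboard and its index pattern via the Saved Objects API
curl -u "${ES_USERNAME}:${ES_PASSWORD}" \
  -X POST "${KIBANA_URL}/api/saved_objects/_import?overwrite=true" \
  -H "kbn-xsrf: true" \
  --form file=@solar-enphase.ndjson
```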
327 | -------------------------------------------------------------------------------- /solar-enphase/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/archive.png -------------------------------------------------------------------------------- /solar-enphase/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/dashboard.png -------------------------------------------------------------------------------- /solar-enphase/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/index.png -------------------------------------------------------------------------------- /solar-enphase/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/minio.png -------------------------------------------------------------------------------- /solar-enphase/solar-enphase.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import urllib.request 4 | import urllib.parse 5 | import datetime 6 | 7 | def main(): 8 | # The Enphase API endpoints are detailed here: 9 | # https://developer.enphase.com/docs 10 | # Most of them don't need to be called more than once a day. 11 | 12 | now = datetime.datetime.utcnow() 13 | 14 | # Run once per day 15 | if now.hour == 0 and now.minute == 0: 16 | url = "https://api.enphaseenergy.com/api/v2/systems//energy_lifetime?key=&user_id=" 17 | f = urllib.request.urlopen(url) 18 | print(f.read().decode("utf-8")) 19 | 20 | url = "https://api.enphaseenergy.com/api/v2/systems//inventory?key=&user_id=" 21 | f = urllib.request.urlopen(url) 22 | print(f.read().decode("utf-8")) 23 | 24 | # Run once per hour 25 | if now.minute == 0: 26 | url = "https://api.enphaseenergy.com/api/v2/systems//summary?key=&user_id=" 27 | f = urllib.request.urlopen(url) 28 | print(f.read().decode("utf-8")) 29 | 30 | # Run every 10 minutes 31 | if now.minute % 10 == 0: 32 | # Get the status of each inverter 33 | url = "https://api.enphaseenergy.com/api/v2/systems/inverters_summary_by_envoy_or_site?key=&user_id=&site_id=" 34 | f = urllib.request.urlopen(url) 35 | print(f.read().decode("utf-8")) 36 | 37 | # The `stats` endpoint updates, at most, once every 5 minutes. 38 | # It isn't reliable though, so you can't expect a new reading every 5 minutes. 39 | # Due to this, we'll track all of it and use an enrich lookup in Logstash to 40 | # see if the 5-minute reading was already inserted into Elasticsearch. 41 | # { 42 | # "end_at": 1613239200, 43 | # "devices_reporting": 20, 44 | # "powr": 159, # Average power produced during this interval, measured in Watts. 45 | # "enwh": 13 # Energy produced during this interval, measured in Watt hours. 
46 | # } 47 | url = "https://api.enphaseenergy.com/api/v2/systems//stats?key=&user_id=" 48 | f = urllib.request.urlopen(url) 49 | print(f.read().decode("utf-8")) 50 | 51 | if __name__ == "__main__": 52 | main() 53 | -------------------------------------------------------------------------------- /solar-enphase/solar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/solar.png -------------------------------------------------------------------------------- /temperature-dht22/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Temperature with DHT22 2 | 3 | DHT22 4 | 5 | The [DHT22](http://www.adafruit.com/products/385), in a low-cost digital temperature and humidity sensor. It uses a capacitive humidity sensor and a thermistor to measure the surrounding air, and outputs a digital signal on the data pin reporting their values. The [AM2302](https://www.adafruit.com/product/393) is a wired version of this sensor which includes the required [4.7K - 10KΩ](https://raspberrypi.stackexchange.com/questions/12161/do-i-have-to-connect-a-resistor-to-my-dht22-humidity-sensor) resistor. The version by [FTCBlock](https://www.amazon.com/FTCBlock-Temperature-Humidity-Electronic-Practice/dp/B07H2RP26F) comes with GPIO jumpers that don't require a breadboard or soldering. 6 | 7 | We'll use a Python script to query the sensor each minute via a cron job, and redirect the output to a log file. From there, Filebeat will pick it up and send it to Elastic. 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started. 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Install the following Python module: 16 | 17 | ```bash 18 | sudo pip3 install Adafruit_DHT 19 | ``` 20 | 21 | Create a Python script at `~/bin/temperature-dht22.py` with the following contents (adjusting any values as you see fit): 22 | 23 | ```python 24 | #!/usr/bin/env python3 25 | 26 | import Adafruit_DHT 27 | import datetime 28 | import json 29 | import socket 30 | 31 | DHT_SENSOR = Adafruit_DHT.DHT22 32 | DHT_PIN = 4 33 | 34 | if __name__ == "__main__": 35 | humidity, temp_c = Adafruit_DHT.read_retry(DHT_SENSOR, DHT_PIN) 36 | temp_f = (temp_c * 9 / 5) + 32 37 | output = { 38 | "timestamp": datetime.datetime.utcnow().isoformat(), 39 | "host": socket.gethostname(), 40 | "temp_c": float("%2.2f" % temp_c), 41 | "temp_f": float("%2.2f" % temp_f), 42 | "humidity": float("%2.2f" % humidity), 43 | "location": "office", 44 | "source": "DHT22" 45 | } 46 | print(json.dumps(output)) 47 | ``` 48 | 49 | Try running the script from the command line: 50 | 51 | ```bash 52 | chmod a+x ~/bin/temperature-dht22.py 53 | sudo ~/bin/temperature-dht22.py 54 | ``` 55 | 56 | The output should look like the following: 57 | 58 | ```json 59 | {"timestamp": "2021-09-05T12:30:10.436436", "host": "node-19", "temp_c": 21.3, "temp_f": 70.34, "humidity": 60.2, "location": "office", "source": "DHT22"} 60 | ``` 61 | 62 | Once you're able to successfully query the sensor, create a log file for its output: 63 | 64 | ```bash 65 | sudo touch /var/log/temperature-dht22.log 66 | sudo chown ubuntu.ubuntu /var/log/temperature-dht22.log 67 | ``` 68 | 69 | Create a logrotate entry so the log file doesn't grow unbounded: 70 | 71 | ``` 72 | sudo vi /etc/logrotate.d/temperature-dht22 73 | ``` 74 | 75 | Add the following content: 76 | 77 | ``` 78 | /var/log/temperature-dht22.log { 79 | weekly 80 | 
rotate 12 81 | compress 82 | delaycompress 83 | missingok 84 | notifempty 85 | create 644 ubuntu ubuntu 86 | } 87 | ``` 88 | 89 | Add the following entry to your crontab: 90 | 91 | ``` 92 | * * * * * sudo /home/ubuntu/bin/temperature-dht22.py >> /var/log/temperature-dht22.log 2>&1 93 | ``` 94 | 95 | Verify output by tailing the log file for a few minutes: 96 | 97 | ``` 98 | $ tail -f /var/log/temperature-dht22.log 99 | ``` 100 | 101 | If you're seeing output scroll each minute then you are successfully collecting data! 102 | 103 | ## Step #2 - Archive Data 104 | 105 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 106 | 107 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your DHT22 data: 108 | 109 | ```yaml 110 | filebeat.inputs: 111 | - type: log 112 | enabled: true 113 | tags: ["temperature-dht22"] 114 | paths: 115 | - /var/log/temperature-dht22.log 116 | ``` 117 | 118 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 119 | 120 | Restart Filebeat: 121 | 122 | ```bash 123 | sudo systemctl restart filebeat 124 | ``` 125 | 126 | You may want to tail syslog to see if Filebeat restarts without any issues: 127 | 128 | ```bash 129 | tail -f /var/log/syslog | grep filebeat 130 | ``` 131 | 132 | At this point, we should have DHT22 data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the DHT22 data feed. 133 | 134 | Add the following conditional to your `distributor.yml` file: 135 | 136 | ``` 137 | } else if "temperature-dht22" in [tags] { 138 | pipeline { 139 | send_to => ["temperature-dht22-archive"] 140 | } 141 | } 142 | ``` 143 | 144 | Create a Logstash pipeline called `temperature-dht22-archive.yml` with the following contents: 145 | 146 | ``` 147 | input { 148 | pipeline { 149 | address => "temperature-dht22-archive" 150 | } 151 | } 152 | filter { 153 | } 154 | output { 155 | s3 { 156 | # 157 | # Custom Settings 158 | # 159 | prefix => "temperature-dht22/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 160 | temporary_directory => "${S3_TEMP_DIR}/temperature-dht22-archive" 161 | access_key_id => "${S3_ACCESS_KEY}" 162 | secret_access_key => "${S3_SECRET_KEY}" 163 | endpoint => "${S3_ENDPOINT}" 164 | bucket => "${S3_BUCKET}" 165 | 166 | # 167 | # Standard Settings 168 | # 169 | validate_credentials_on_root_bucket => false 170 | codec => json_lines 171 | # Limit Data Lake file sizes to 5 GB 172 | size_file => 5000000000 173 | time_file => 60 174 | # encoding => "gzip" 175 | additional_settings => { 176 | force_path_style => true 177 | follow_redirects => false 178 | } 179 | } 180 | } 181 | ``` 182 | 183 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 184 | 185 | ```bash 186 | sudo mv temperature-dht22-archive.yml /etc/logstash/conf.d/ 187 | ``` 188 | 189 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 190 | 191 | ``` 192 | - pipeline.id: "temperature-dht22-archive" 193 | path.config: "/etc/logstash/conf.d/temperature-dht22-archive.conf" 194 | ``` 195 | 196 | And finally, restart the Logstash service: 197 | 198 | ```bash 199 | sudo systemctl restart logstash 200 | ``` 201 | 202 | While Logstash is restarting, you 
can tail it's log file in order to see if there are any configuration errors: 203 | 204 | ```bash 205 | sudo tail -f /var/log/logstash/logstash-plain.log 206 | ``` 207 | 208 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 209 | 210 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 211 | 212 | ![Stack Monitoring](archive.png) 213 | 214 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 215 | 216 | ![MinIO](minio.png) 217 | 218 | If you see your data being stored, then you are successfully archiving! 219 | 220 | ## Step #3 - Index Data 221 | 222 | Once Logstash is archiving the data, next we need to index it with Elastic. 223 | 224 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 225 | 226 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 227 | 228 | Create a new pipeline called `temperature-dht22-index.yml` with the following content: 229 | 230 | ``` 231 | input { 232 | pipeline { 233 | address => "temperature-dht22-index" 234 | } 235 | } 236 | filter { 237 | json { 238 | source => "message" 239 | } 240 | json { 241 | source => "message" 242 | } 243 | date { 244 | match => ["timestamp", "ISO8601"] 245 | } 246 | mutate { 247 | remove_field => ["timestamp", "message"] 248 | remove_field => ["tags", "agent", "input", "log", "path", "ecs", "@version"] 249 | } 250 | } 251 | output { 252 | elasticsearch { 253 | # 254 | # Custom Settings 255 | # 256 | id => "temperature-dht22-index" 257 | index => "temperature-dht22-%{+YYYY.MM.dd}" 258 | hosts => "${ES_ENDPOINT}" 259 | user => "${ES_USERNAME}" 260 | password => "${ES_PASSWORD}" 261 | } 262 | } 263 | ``` 264 | 265 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 266 | 267 | ```bash 268 | sudo mv temperature-dht22-index.yml /etc/logstash/conf.d/ 269 | ``` 270 | 271 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 272 | 273 | ``` 274 | - pipeline.id: "temperature-dht22-index" 275 | path.config: "/etc/logstash/conf.d/temperature-dht22-index.conf" 276 | ``` 277 | 278 | Append your new pipeline to your tagged data in the `distributor.yml` pipeline: 279 | 280 | ``` 281 | } else if "temperature-dht22" in [tags] { 282 | pipeline { 283 | send_to => ["temperature-dht22-archive", "temperature-dht22-index"] 284 | } 285 | } 286 | ``` 287 | 288 | And finally, restart the Logstash service: 289 | 290 | ```bash 291 | sudo systemctl restart logstash 292 | ``` 293 | 294 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 295 | 296 | ```bash 297 | sudo tail -f /var/log/logstash/logstash-plain.log 298 | ``` 299 | 300 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 301 | 302 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 303 | 304 | ![Indexing](index.png) 305 | 306 | ## Step #4 - Visualize Data 307 | 308 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 
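Before importing the dashboard, it can help to confirm that documents are actually landing in the daily index. A quick check with `curl`, assuming your Elasticsearch endpoint and credentials are exported in your shell as `ES_ENDPOINT`, `ES_USERNAME`, and `ES_PASSWORD` (the same values referenced by the Logstash pipelines):

```bash
# List the temperature indices and their document counts
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/_cat/indices/temperature-dht22-*?v"

# Peek at a single indexed document
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/temperature-dht22-*/_search?size=1&pretty"
```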
309 | 310 | Download this dashboard: [temperature-dht22.ndjson](temperature-dht22.ndjson) 311 | 312 | Jump back into Kibana: 313 | 314 | 1. Select "Stack Management" from the menu 315 | 2. Select "Saved Objects" 316 | 3. Click "Import" in the upper right 317 | 318 | Once it's been imported, click on "Temperature DHT22". 319 | 320 | ![Dashboard](dashboard.png) 321 | 322 | Congratulations! You should now be looking at temperature data from your DHT22 in Elastic. 323 | 324 | -------------------------------------------------------------------------------- /temperature-dht22/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/archive.png -------------------------------------------------------------------------------- /temperature-dht22/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/dashboard.png -------------------------------------------------------------------------------- /temperature-dht22/dht22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/dht22.png -------------------------------------------------------------------------------- /temperature-dht22/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/index.png -------------------------------------------------------------------------------- /temperature-dht22/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/minio.png -------------------------------------------------------------------------------- /temperature-dht22/temperature-dht22.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"temperature-dht22-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-06T07:18:21.789Z","version":"WzM0OTQ3NywyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":20,\"h\":6,\"i\":\"bfa04254-31c1-429b-971d-921b5d6c33b1\"},\"panelIndex\":\"bfa04254-31c1-429b-971d-921b5d6c33b1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Temperature & Humidity\\nby 
DHT22\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":20,\"y\":0,\"w\":28,\"h\":6,\"i\":\"6bf93313-05d1-4657-bf19-2ac871b74009\"},\"panelIndex\":\"6bf93313-05d1-4657-bf19-2ac871b74009\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"eee22f36-699b-4bcf-b6a8-a1efaefa5549\":{\"columns\":{\"c8b44acf-fb21-4a7a-8d72-f212143dc087\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"7658a405-61eb-48aa-85c0-92254a942779\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"c8b44acf-fb21-4a7a-8d72-f212143dc087\",\"7658a405-61eb-48aa-85c0-92254a942779\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"eee22f36-699b-4bcf-b6a8-a1efaefa5549\",\"accessors\":[\"7658a405-61eb-48aa-85c0-92254a942779\"],\"position\":\"top\",\"seriesType\":\"bar_stacked\",\"showGridlines\":false,\"xAccessor\":\"c8b44acf-fb21-4a7a-8d72-f212143dc087\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-eee22f36-699b-4bcf-b6a8-a1efaefa5549\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Logs\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":6,\"w\":48,\"h\":10,\"i\":\"b2634629-b584-4b60-99d2-574db7c2d576\"},\"panelIndex\":\"b2634629-b584-4b60-99d2-574db7c2d576\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"ddd9b678-b4d2-4c07-8715-7338bc709326\":{\"columns\":{\"5f615941-0d1f-4e75-b85d-37a5de33199d\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\":{\"label\":\"Median of temp_f\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"temp_f\",\"isBucketed\":false,\"scale\":\"ratio\"},\"8b38e615-3eac-4a81-9da9-955d50ffb348\":{\"label\":\"Top values of 
host.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"host.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"8b38e615-3eac-4a81-9da9-955d50ffb348\",\"5f615941-0d1f-4e75-b85d-37a5de33199d\",\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"ddd9b678-b4d2-4c07-8715-7338bc709326\",\"accessors\":[\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"5f615941-0d1f-4e75-b85d-37a5de33199d\",\"splitAccessor\":\"8b38e615-3eac-4a81-9da9-955d50ffb348\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-ddd9b678-b4d2-4c07-8715-7338bc709326\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Temperature\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":16,\"w\":48,\"h\":9,\"i\":\"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619\"},\"panelIndex\":\"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"9436d677-5ba4-4307-838b-53d734ad969d\":{\"columns\":{\"8c427cf5-b467-4497-b49a-ffe256a862d4\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\":{\"label\":\"Median of humidity\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"humidity\",\"isBucketed\":false,\"scale\":\"ratio\"},\"eac3c026-b26a-4384-9020-d6eae264660d\":{\"label\":\"Top values of 
host.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"host.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"eac3c026-b26a-4384-9020-d6eae264660d\",\"8c427cf5-b467-4497-b49a-ffe256a862d4\",\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"9436d677-5ba4-4307-838b-53d734ad969d\",\"accessors\":[\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8c427cf5-b467-4497-b49a-ffe256a862d4\",\"splitAccessor\":\"eac3c026-b26a-4384-9020-d6eae264660d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-9436d677-5ba4-4307-838b-53d734ad969d\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Humidity\"}]","timeRestore":false,"title":"Temperature DHT22","version":1},"coreMigrationVersion":"7.14.0","id":"27df3f70-0ee3-11ec-b03a-7d8df502f497","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"6bf93313-05d1-4657-bf19-2ac871b74009:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"6bf93313-05d1-4657-bf19-2ac871b74009:indexpattern-datasource-layer-eee22f36-699b-4bcf-b6a8-a1efaefa5549","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"b2634629-b584-4b60-99d2-574db7c2d576:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"b2634629-b584-4b60-99d2-574db7c2d576:indexpattern-datasource-layer-ddd9b678-b4d2-4c07-8715-7338bc709326","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619:indexpattern-datasource-layer-9436d677-5ba4-4307-838b-53d734ad969d","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-06T07:22:13.101Z","version":"WzM0OTY3MywyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /utilization/2-archive/utilization-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "utilization-archive" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | s3 { 10 | # 11 | # Custom 
Settings 12 | # 13 | prefix => "utilization/${S3_DATE_DIR}" 14 | temporary_directory => "${S3_TEMP_DIR}/utilization-archive" 15 | access_key_id => "${S3_ACCESS_KEY}" 16 | secret_access_key => "${S3_SECRET_KEY}" 17 | endpoint => "${S3_ENDPOINT}" 18 | bucket => "${S3_BUCKET}" 19 | 20 | # 21 | # Standard Settings 22 | # 23 | validate_credentials_on_root_bucket => false 24 | codec => json_lines 25 | # Limit Data Lake file sizes to 5 GB 26 | size_file => 5000000000 27 | time_file => 1 28 | # encoding => "gzip" 29 | additional_settings => { 30 | force_path_style => true 31 | follow_redirects => false 32 | } 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /utilization/2-archive/utilization-reindex.yml: -------------------------------------------------------------------------------- 1 | input { 2 | s3 { 3 | # 4 | # Custom Settings 5 | # 6 | prefix => "utilization/2021-01-04" 7 | temporary_directory => "${S3_TEMP_DIR}/utilization-reindex" 8 | access_key_id => "${S3_ACCESS_KEY}" 9 | secret_access_key => "${S3_SECRET_KEY}" 10 | endpoint => "${S3_ENDPOINT}" 11 | bucket => "${S3_BUCKET}" 12 | 13 | # 14 | # Standard Settings 15 | # 16 | watch_for_new_files => false 17 | codec => json_lines 18 | additional_settings => { 19 | force_path_style => true 20 | follow_redirects => false 21 | } 22 | } 23 | } 24 | filter { 25 | } 26 | output { 27 | pipeline { send_to => "utilization-structure" } 28 | } 29 | -------------------------------------------------------------------------------- /utilization/2-archive/utilization-structure.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "utilization-structure" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | elasticsearch { 10 | # 11 | # Custom Settings 12 | # 13 | id => "utilization-structure" 14 | index => "utilization" 15 | hosts => "${ES_ENDPOINT}" 16 | user => "${ES_USERNAME}" 17 | password => "${ES_PASSWORD}" 18 | } 19 | } 20 | -------------------------------------------------------------------------------- /weather-station/README.md: -------------------------------------------------------------------------------- 1 | # Weather Station 2 | 3 | weather-station 4 | 5 | The [WS-1550-IP](https://ambientweather.com/amws1500.html) from Ambient Weather is a great amatuer weather station. The station itself is powered by solar with AA battery backup, it's relatively maintenance free, and it's a joy to observe in action. It communicates with a base station via 915 MHz that requires no setup. You can also add up to 8 additional sensors to collect temperature from various points within range, all wirelessly. The base station connects to the Internet via a hard-wired RJ-45 connection on your network, that it uses to upload the data it collects to Ambient Weather's free service. From there, you can query an [API](https://ambientweather.docs.apiary.io/#) to get your latest, hyper-local, weather. 6 | 7 | In this data source, we'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 
12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/weather-station.py` with the following contents: 16 | 17 | ```python 18 | #!/usr/bin/env python3 19 | 20 | import urllib.request 21 | import urllib.parse 22 | 23 | api_key = '' 24 | app_key = '' 25 | 26 | url = "https://api.ambientweather.net/v1/devices?apiKey=%s&applicationKey=%s" % (api_key, app_key) 27 | 28 | try: 29 | f = urllib.request.urlopen(url) 30 | except urllib.error.HTTPError as e: 31 | # Return code error (e.g. 404, 501, ...) 32 | print('[{"lastData": {"http_code": %s}}]' % (e.code)) 33 | except urllib.error.URLError as e: 34 | # Not an HTTP-specific error (e.g. connection refused) 35 | print('[{"lastData": {"http_error": "%s"}}]' % (e.reason)) 36 | else: 37 | # 200 38 | print(f.read().decode("utf-8")) 39 | ``` 40 | 41 | Enter your API key and Application key from the Ambient Weather service. 42 | 43 | This script queries the Ambient Weather API using your API key and Application key. It then prints the response to `stdout`. Once we've confirmed the script works, we'll redirect `stdout` to a log file. 44 | 45 | Try running the script: 46 | 47 | ```bash 48 | chmod a+x ~/bin/weather-station.py 49 | ~/bin/weather-station.py 50 | ``` 51 | 52 | You should see output similar to: 53 | 54 | ```json 55 | [{"macAddress":"00:0E:C6:20:0F:7B","lastData":{"dateutc":1630076460000,"winddir":186,"windspeedmph":0.22,"windgustmph":1.12,"maxdailygust":4.47,"tempf":82.4,"battout":1,"humidity":69,"hourlyrainin":0,"eventrainin":0,"dailyrainin":0,"weeklyrainin":1.22,"monthlyrainin":5.03,"yearlyrainin":21.34,"totalrainin":21.34,"tempinf":73.4,"battin":1,"humidityin":62, ... 56 | ``` 57 | 58 | Once you confirm the script is working, you can redirect its output to a log file: 59 | 60 | ```bash 61 | sudo touch /var/log/weather-station.log 62 | sudo chown ubuntu.ubuntu /var/log/weather-station.log 63 | ``` 64 | 65 | Create a logrotate entry so the log file doesn't grow unbounded: 66 | 67 | ```bash 68 | sudo vi /etc/logrotate.d/weather-station 69 | ``` 70 | 71 | Add the following logrotate content: 72 | 73 | ``` 74 | /var/log/weather-station.log { 75 | weekly 76 | rotate 12 77 | compress 78 | delaycompress 79 | missingok 80 | notifempty 81 | create 644 ubuntu ubuntu 82 | } 83 | ``` 84 | 85 | Add the following entry to your crontab with `crontab -e`: 86 | 87 | ``` 88 | * * * * * /home/ubuntu/bin/weather-station.py >> /var/log/weather-station.log 2>&1 89 | ``` 90 | 91 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 92 | 93 | ```bash 94 | tail -f /var/log/weather-station.log 95 | ``` 96 | 97 | If you're seeing output scroll each minute then you are successfully collecting data! 98 | 99 | ## Step #2 - Archive Data 100 | 101 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 102 | 103 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your weather data: 104 | 105 | ```yaml 106 | filebeat.inputs: 107 | - type: log 108 | enabled: true 109 | tags: ["weather-station"] 110 | paths: 111 | - /var/log/weather-station.log 112 | ``` 113 | 114 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 
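For reference, the tag ends up in each event's `tags` array alongside the raw API response in `message`; that `tags` field is what the `distributor` conditional below keys off of. A trimmed, illustrative event (the field values here are made up) looks roughly like:

```json
{
  "@timestamp": "2021-09-06T12:00:01.000Z",
  "message": "[{\"macAddress\":\"...\",\"lastData\":{\"tempf\":82.4}}]",
  "tags": ["weather-station"],
  "agent": { "type": "filebeat" },
  "log": { "file": { "path": "/var/log/weather-station.log" } }
}
```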
115 | 116 | Restart Filebeat: 117 | 118 | ```bash 119 | sudo systemctl restart filebeat 120 | ``` 121 | 122 | You may want to tail syslog to see if Filebeat restarts without any issues: 123 | 124 | ```bash 125 | tail -f /var/log/syslog | grep filebeat 126 | ``` 127 | 128 | At this point, we should have weather-station data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the weather station data feed. 129 | 130 | Add the following conditional to your `distributor.yml` file: 131 | 132 | ``` 133 | } else if "weather-station" in [tags] { 134 | pipeline { 135 | send_to => ["weather-station-archive"] 136 | } 137 | } 138 | ``` 139 | 140 | Create a Logstash pipeline called `weather-station-archive.yml` with the following contents: 141 | 142 | ``` 143 | input { 144 | pipeline { 145 | address => "weather-station-archive" 146 | } 147 | } 148 | filter { 149 | } 150 | output { 151 | s3 { 152 | # 153 | # Custom Settings 154 | # 155 | prefix => "weather-station/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 156 | temporary_directory => "${S3_TEMP_DIR}/weather-station-archive" 157 | access_key_id => "${S3_ACCESS_KEY}" 158 | secret_access_key => "${S3_SECRET_KEY}" 159 | endpoint => "${S3_ENDPOINT}" 160 | bucket => "${S3_BUCKET}" 161 | 162 | # 163 | # Standard Settings 164 | # 165 | validate_credentials_on_root_bucket => false 166 | codec => json_lines 167 | # Limit Data Lake file sizes to 5 GB 168 | size_file => 5000000000 169 | time_file => 60 170 | # encoding => "gzip" 171 | additional_settings => { 172 | force_path_style => true 173 | follow_redirects => false 174 | } 175 | } 176 | } 177 | ``` 178 | 179 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 180 | 181 | ```bash 182 | sudo mv weather-station-archive.yml /etc/logstash/conf.d/ 183 | ``` 184 | 185 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 186 | 187 | ``` 188 | - pipeline.id: "weather-station-archive" 189 | path.config: "/etc/logstash/conf.d/weather-station-archive.conf" 190 | ``` 191 | 192 | And finally, restart the Logstash service: 193 | 194 | ```bash 195 | sudo systemctl restart logstash 196 | ``` 197 | 198 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 199 | 200 | ```bash 201 | sudo tail -f /var/log/logstash/logstash-plain.log 202 | ``` 203 | 204 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 205 | 206 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 207 | 208 | ![Archive](archive.png) 209 | 210 | Check your S3 bucket to see if you're getting data directories created each minute for the current date & hour with data: 211 | 212 | ![Minio](minio.png) 213 | 214 | If you see your data being stored, then you are successfully archiving! 215 | 216 | ## Step #3 - Index Data 217 | 218 | Once Logstash is archiving the data, next we need to index it with Elastic. 219 | 220 | Jump into Kibana and open Dev Tools. 
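If you've run through this setup before, you may first want to check whether a template by this name already exists (the request returns a 404 if it does not). For example, with `curl` and the same endpoint variables the Logstash pipelines use:

```bash
# Check for an existing index template named weather-station
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/_index_template/weather-station?pretty"
```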
221 | 222 | Copy and paste the following content into Dev Tools to create an Index Template for our weather station data: 223 | 224 | ``` 225 | PUT _index_template/weather-station 226 | { 227 | "index_patterns": [ 228 | "weather-station-*" 229 | ], 230 | "template": { 231 | "mappings": { 232 | "dynamic_templates": [ 233 | { 234 | "integers": { 235 | "match_mapping_type": "long", 236 | "mapping": { 237 | "type": "float" 238 | } 239 | } 240 | } 241 | ], 242 | "properties": { 243 | "info.coords.geo.coordinates": { 244 | "type": "geo_point" 245 | } 246 | } 247 | } 248 | } 249 | } 250 | ``` 251 | 252 | For the most part, we'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. The exceptions here are the latitude & longitude of the weather station and the coercion of any `long` values into `float` values. First, for the latitude & longitude, we need to explicility tell Elasticsearch that this is a [`geo_point`](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html) type so that we can plot it on a map. If you start to track multiple weather stations in Elastic, plotting their locations on a map is very useful. Second, to prevent any values that happen to first come in as a whole number from determining the mapping type, we set a [`dynamic_template`](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html) to convert any `long` values into `float` values. 253 | 254 | Now, switch back to a terminal so we can create the Logstash pipeline to index the weather station data. 255 | 256 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), I iteratively built the following filter chain that can parse the raw JSON coming in. 257 | 258 | Create a new pipeline called `weather-station-index.yml` with the following content: 259 | 260 | ``` 261 | input { 262 | pipeline { 263 | address => "weather-station-index" 264 | } 265 | } 266 | filter { 267 | if [message] =~ /^\[/ { 268 | json { 269 | source => "message" 270 | target => "tmp" 271 | } 272 | } else { 273 | drop { } 274 | } 275 | if "_jsonparsefailure" in [tags] { 276 | drop { } 277 | } 278 | mutate { 279 | remove_field => ["message"] 280 | } 281 | mutate { 282 | add_field => { 283 | "message" => "%{[tmp][0]}" 284 | } 285 | } 286 | json { 287 | source => "message" 288 | } 289 | ruby { 290 | # Promote the keys inside lastData to root, then remove lastData 291 | code => ' 292 | event.get("lastData").each { |k, v| 293 | event.set(k,v) 294 | } 295 | event.remove("lastData") 296 | ' 297 | } 298 | date { 299 | match => ["date", "ISO8601"] 300 | } 301 | mutate { 302 | remove_field => ["message", "tmp", "path", "host", "macAddress", "date"] 303 | } 304 | } 305 | output { 306 | elasticsearch { 307 | # 308 | # Custom Settings 309 | # 310 | id => "weather-station-index" 311 | index => "weather-station-%{+YYYY.MM.dd}" 312 | hosts => "${ES_ENDPOINT}" 313 | user => "${ES_USERNAME}" 314 | password => "${ES_PASSWORD}" 315 | } 316 | } 317 | ``` 318 | 319 | This filter chain structures the raw data into a format that allows us to easily use Elastic's dynamic mapping feature. 320 | 321 | For the most part, we use the raw field names as provided to us by the Ambient Weather service. 
You can rename the raw field names to something more descriptive if you'd like, but then you'll also need to adjust the Dashboard provided in Step #4 to point to your field names. 322 | 323 | Ambient Weather provides a list of the units on each of their field names in the [Device Data Specs](https://github.com/ambient-weather/api-docs/wiki/Device-Data-Specs). 324 | 325 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 326 | 327 | ```bash 328 | sudo mv weather-station-index.yml /etc/logstash/conf.d/ 329 | ``` 330 | 331 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 332 | 333 | ``` 334 | - pipeline.id: "weather-station-index" 335 | path.config: "/etc/logstash/conf.d/weather-station-index.conf" 336 | ``` 337 | 338 | And finally, restart the Logstash service: 339 | 340 | ```bash 341 | sudo systemctl restart logstash 342 | ``` 343 | 344 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 345 | 346 | ```bash 347 | sudo tail -f /var/log/logstash/logstash-plain.log 348 | ``` 349 | 350 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 351 | 352 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 353 | 354 | ![Stack Monitoring](index.png) 355 | 356 | ## Step #4 - Visualize Data 357 | 358 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 359 | 360 | Download this dashboard: 361 | 362 | ​ [weather-station.ndjson](weather-station.ndjson) 363 | 364 | Jump into Kibana: 365 | 366 | 1. Select "Stack Management" from the menu 367 | 2. Select "Saved Objects" 368 | 3. Click "Import" in the upper right 369 | 370 | Once it's been imported, click on "Weather Station". 371 | 372 | Congratulations! You should now be looking at data from your weather station in Elastic. 
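If you later plot the station's location on a map and nothing shows up, it's worth confirming that the index template above was applied before the first documents arrived, so the coordinates were mapped as `geo_point` rather than plain numbers. You can inspect the live mapping with `curl` (same endpoint variables as the Logstash pipelines):

```bash
# Inspect how the station coordinates were mapped
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" \
  "${ES_ENDPOINT}/weather-station-*/_mapping/field/info.coords.geo.coordinates?pretty"
```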
373 | 374 | dashboard -------------------------------------------------------------------------------- /weather-station/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/archive.png -------------------------------------------------------------------------------- /weather-station/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/dashboard.png -------------------------------------------------------------------------------- /weather-station/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/index.png -------------------------------------------------------------------------------- /weather-station/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/minio.png -------------------------------------------------------------------------------- /weather-station/ws-1550-ip.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/ws-1550-ip.png --------------------------------------------------------------------------------