├── .gitignore
├── README.md
├── co2meter
│   ├── README.md
│   ├── archive.png
│   ├── co2meter.ndjson
│   ├── co2meter.png
│   ├── co2meter.py
│   ├── dashboard.png
│   ├── index.png
│   └── minio.png
├── directory-sizes
│   ├── README.md
│   ├── archive.png
│   ├── dashboard.png
│   ├── directory-sizes.py
│   ├── index.png
│   ├── logo.png
│   └── minio.png
├── flight-tracker
│   ├── README.md
│   ├── archive.png
│   ├── dashboard.png
│   ├── flight-tracker.ndjson
│   ├── flight-tracker.py
│   ├── index.png
│   ├── logo.png
│   └── minio.png
├── gps
│   ├── README.md
│   ├── archive.png
│   ├── dashboard.png
│   ├── gps.ndjson
│   ├── gps.png
│   ├── index.png
│   └── minio.png
├── haproxy-filebeat-module
│   ├── 2-archive
│   │   ├── haproxy-filebeat-module-archive.yml
│   │   ├── haproxy-filebeat-module-reindex.yml
│   │   └── haproxy-filebeat-module-structure.yml
│   ├── 4-visualize
│   │   └── dashboard.json
│   └── README.md
├── images
│   ├── architecture.png
│   ├── caiv.png
│   ├── data-source-assets.png
│   ├── elk-data-lake.png
│   ├── indexing.png
│   ├── logical-elements.png
│   ├── onboarding-data.png
│   ├── terminology.png
│   └── workflow.png
├── power-emu2
│   ├── README.md
│   ├── archive.png
│   ├── dashboard.png
│   ├── emu-2.jpg
│   ├── index.png
│   ├── minio.png
│   ├── power-emu2.ndjson
│   └── power-emu2.py
├── power-hs300
│   ├── README.md
│   ├── archive.png
│   ├── dashboard.png
│   ├── hs300.png
│   ├── hs300.py
│   ├── index.png
│   ├── minio.png
│   ├── power-hs300.ndjson
│   └── reindex.yml
├── satellites
│   ├── README.md
│   ├── dashboard.png
│   ├── satellites.ndjson
│   └── satellites.png
├── setup
│   ├── README.md
│   ├── dead-letter-queue-archive.yml
│   ├── distributor.conf
│   ├── haproxy.cfg
│   └── needs-classified-archive.yml
├── solar-enphase
│   ├── README.md
│   ├── archive.png
│   ├── dashboard.png
│   ├── index.png
│   ├── minio.png
│   ├── solar-enphase.py
│   └── solar.png
├── temperature-dht22
│   ├── README.md
│   ├── archive.png
│   ├── dashboard.png
│   ├── dht22.png
│   ├── index.png
│   ├── minio.png
│   └── temperature-dht22.ndjson
├── utilization
│   ├── 2-archive
│   │   ├── utilization-archive.yml
│   │   ├── utilization-reindex.yml
│   │   └── utilization-structure.yml
│   └── 3-index
│       ├── README.md
│       └── utilization-index-template.json
└── weather-station
    ├── README.md
    ├── archive.png
    ├── dashboard.png
    ├── index.png
    ├── minio.png
    ├── weather-station.ndjson
    └── ws-1550-ip.png
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | *.swp
3 | *.bak
4 | *.orig
5 | *.dump
6 | *.keystore
7 | tmp
8 |
--------------------------------------------------------------------------------
/co2meter/README.md:
--------------------------------------------------------------------------------
1 | # CO2 Monitoring
2 |
3 |
4 |
5 | The [CO2Mini](https://www.co2meter.com/collections/desktop/products/co2mini-co2-indoor-air-quality-monitor?variant=308811055) is an indoor air quality monitor that displays the CO2 level of the room it's in. It's often used in home and office settings since it's been shown that elevated CO2 levels can cause [fatigue](https://pubmed.ncbi.nlm.nih.gov/26273786/) and [impair decisions](https://newscenter.lbl.gov/2012/10/17/elevated-indoor-carbon-dioxide-impairs-decision-making-performance/). The CO2Mini connects to a computer via USB where it can be read programmatically.
6 |
7 | In this data source, we'll build the following dashboard with Elastic:
8 |
9 | 
10 |
11 | Let's get started!
12 |
13 | ## Step #1 - Collect Data
14 |
15 | Create a new python script called `~/bin/co2meter.py` with the following contents:
16 |
17 | [co2meter.py](co2meter.py)
18 |
19 | The script was originally written by [Henryk Plötz](https://hackaday.io/project/5301-reverse-engineering-a-low-cost-usb-co-monitor/log/17909-all-your-base-are-belong-to-us) and has only a few minor edits so it works with Python3.
20 |
21 | Take a few minutes to familiarize yourself with the script. There are a couple of labels you can change near the bottom. Adjust the values of `hostname` and `location` to suit your needs.
22 |
23 | With your CO2Mini plugged in, try running the script:
24 |
25 | ```bash
26 | chmod a+x ~/bin/co2meter.py
27 | sudo ~/bin/co2meter.py
28 | ```
29 |
30 | We'll run our script with `sudo`, but you could add a `udev` rule to give your user permission to `/dev/hidraw0`.
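
If you'd rather not run the collector as root, a `udev` rule is one way to grant access. The rule below is a sketch, not part of the original write-up: the file name is hypothetical and you should confirm your meter's vendor/product IDs with `lsusb` (`04d9:a052` is common for the CO2Mini).

```bash
# Hypothetical rule file name; adjust the vendor/product IDs to match your `lsusb` output
cat <<'EOF' | sudo tee /etc/udev/rules.d/99-co2mini.rules
KERNEL=="hidraw*", ATTRS{idVendor}=="04d9", ATTRS{idProduct}=="a052", MODE="0664", GROUP="plugdev"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger
```

With a rule like that in place (and your user in the matching group), the crontab entry below can run the script without `sudo`.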
31 |
32 | You should see output on `stdout` similar to:
33 |
34 | ```json
35 | {"@timestamp": "2021-09-01T20:38:06.353614", "hostname": "node", "location": "office", "co2_ppm": 438, "temp_c": 27.79, "temp_f": 82.02, "source": "CO2 Meter"}
36 | ```
37 |
38 | Once you confirm the script is working, you can redirect its output to a log file:
39 |
40 | ```bash
41 | sudo touch /var/log/co2meter.log
42 | sudo chown ubuntu:ubuntu /var/log/co2meter.log
43 | ```
44 |
45 | Create a logrotate entry so the log file doesn't grow unbounded:
46 |
47 | ```bash
48 | sudo vi /etc/logrotate.d/co2meter
49 | ```
50 |
51 | Add the following logrotate content:
52 |
53 | ```
54 | /var/log/co2meter.log {
55 |     weekly
56 |     rotate 12
57 |     compress
58 |     delaycompress
59 |     missingok
60 |     notifempty
61 |     create 644 ubuntu ubuntu
62 | }
63 | ```
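
You can ask logrotate for a dry run against just this entry to confirm it parses; the `-d` flag only reports what it would do without rotating anything:

```bash
sudo logrotate -d /etc/logrotate.d/co2meter
```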
64 |
65 | Add the following entry to your crontab with `crontab -e`:
66 |
67 | ```
68 | * * * * * /home/ubuntu/bin/co2meter.py >> /var/log/co2meter.log 2>&1
69 | ```
70 |
71 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute):
72 |
73 | ```bash
74 | tail -f /var/log/co2meter.log
75 | ```
76 |
77 | If you're seeing output scroll each minute then you are successfully collecting data!
78 |
79 | ## Step #2 - Archive Data
80 |
81 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3.
82 |
83 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your CO2 data:
84 |
85 | ```yaml
86 | filebeat.inputs:
87 | - type: log
88 |   enabled: true
89 |   tags: ["co2meter"]
90 |   paths:
91 |     - /var/log/co2meter.log
92 | ```
93 |
94 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
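
Before restarting, you can have Filebeat validate the configuration and its connection to Logstash; both `test` subcommands ship with Filebeat, so this is just an optional sanity check:

```bash
sudo filebeat test config
sudo filebeat test output
```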
95 |
96 | Restart Filebeat:
97 |
98 | ```bash
99 | sudo systemctl restart filebeat
100 | ```
101 |
102 | You may want to tail syslog to see if Filebeat restarts without any issues:
103 |
104 | ```bash
105 | tail -f /var/log/syslog | grep filebeat
106 | ```
107 |
108 | At this point, we should have CO2 Meter data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the CO2 Meter data feed.
109 |
110 | Add the following conditional to your `distributor.yml` file:
111 |
112 | ```
113 | } else if "co2meter" in [tags] {
114 |   pipeline {
115 |     send_to => ["co2meter-archive"]
116 |   }
117 | }
118 | ```
119 |
120 | Create a Logstash pipeline called `co2meter-archive.yml` with the following contents:
121 |
122 | ```
123 | input {
124 |   pipeline {
125 |     address => "co2meter-archive"
126 |   }
127 | }
128 | filter {
129 | }
130 | output {
131 |   s3 {
132 |     #
133 |     # Custom Settings
134 |     #
135 |     prefix => "co2meter/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
136 |     temporary_directory => "${S3_TEMP_DIR}/co2meter-archive"
137 |     access_key_id => "${S3_ACCESS_KEY}"
138 |     secret_access_key => "${S3_SECRET_KEY}"
139 |     endpoint => "${S3_ENDPOINT}"
140 |     bucket => "${S3_BUCKET}"
141 |
142 |     #
143 |     # Standard Settings
144 |     #
145 |     validate_credentials_on_root_bucket => false
146 |     codec => json_lines
147 |     # Limit Data Lake file sizes to 5 GB
148 |     size_file => 5000000000
149 |     time_file => 60
150 |     # encoding => "gzip"
151 |     additional_settings => {
152 |       force_path_style => true
153 |       follow_redirects => false
154 |     }
155 |   }
156 | }
157 | ```
158 |
159 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
160 |
161 | ```bash
162 | sudo mv co2meter-archive.yml /etc/logstash/conf.d/
163 | ```
164 |
165 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
166 |
167 | ```
168 | - pipeline.id: "co2meter-archive"
169 |   path.config: "/etc/logstash/conf.d/co2meter-archive.yml"
170 | ```
171 |
172 | And finally, restart the Logstash service:
173 |
174 | ```bash
175 | sudo systemctl restart logstash
176 | ```
177 |
178 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
179 |
180 | ```bash
181 | sudo tail -f /var/log/logstash/logstash-plain.log
182 | ```
183 |
184 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
185 |
186 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
187 |
188 | 
189 |
190 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
191 |
192 | 
193 |
194 | If you see your data being stored, then you are successfully archiving!
195 |
196 | ## Step #3 - Index Data
197 |
198 | Once Logstash is archiving the data, next we need to index it with Elastic.
199 |
200 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
201 |
202 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain was built to parse the raw JSON coming in.
203 |
204 | Create a new pipeline called `co2meter-index.yml` with the following content:
205 |
206 | ```
207 | input {
208 |   pipeline {
209 |     address => "co2meter-index"
210 |   }
211 | }
212 | filter {
213 |   json {
214 |     source => "message"
215 |   }
216 |   json {
217 |     source => "message"
218 |   }
219 |   mutate {
220 |     remove_field => ["message", "agent", "host", "input", "log", "ecs", "@version"]
221 |   }
222 | }
223 | output {
224 |   elasticsearch {
225 |     #
226 |     # Custom Settings
227 |     #
228 |     id => "co2meter-index"
229 |     index => "co2meter-%{+YYYY.MM.dd}"
230 |     hosts => "${ES_ENDPOINT}"
231 |     user => "${ES_USERNAME}"
232 |     password => "${ES_PASSWORD}"
233 |   }
234 | }
235 | ```
236 |
237 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
238 |
239 | ```bash
240 | sudo mv co2meter-index.yml /etc/logstash/conf.d/
241 | ```
242 |
243 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
244 |
245 | ```
246 | - pipeline.id: "co2meter-index"
247 |   path.config: "/etc/logstash/conf.d/co2meter-index.yml"
248 | ```
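
The index pipeline only receives events that the `distributor` sends its way. If your conditional from Step #2 still lists only the archive pipeline, append the new address to its `send_to` list, mirroring what the other data sources in this repo do:

```
} else if "co2meter" in [tags] {
  pipeline {
    send_to => ["co2meter-archive", "co2meter-index"]
  }
}
```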
249 |
250 | And finally, restart the Logstash service:
251 |
252 | ```bash
253 | sudo systemctl restart logstash
254 | ```
255 |
256 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
257 |
258 | ```bash
259 | sudo tail -f /var/log/logstash/logstash-plain.log
260 | ```
261 |
262 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
263 |
264 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
265 |
266 | 
267 |
268 | ## Step #4 - Visualize Data
269 |
270 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
271 |
272 | Download this dashboard: [co2meter.ndjson](co2meter.ndjson)
273 |
274 | Jump back into Kibana:
275 |
276 | 1. Select "Stack Management" from the menu
277 | 2. Select "Saved Objects"
278 | 3. Click "Import" in the upper right
279 |
280 | Once it's been imported, click on "CO2 Meter".
281 |
282 | Congratulations! You should now be looking at data from your CO2 Meter in Elastic.
283 |
284 | 
285 |
286 | These graphs can be added to the [Weather Station](../weather-station) data source.
287 |
--------------------------------------------------------------------------------
/co2meter/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/archive.png
--------------------------------------------------------------------------------
/co2meter/co2meter.ndjson:
--------------------------------------------------------------------------------
1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"co2meter-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"74e365f0-0b03-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-01T09:03:21.562Z","version":"WzIxODM4MSwyXQ=="}
2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":11,\"h\":4,\"i\":\"83813ed8-374f-42f7-851e-453e236435be\"},\"panelIndex\":\"83813ed8-374f-42f7-851e-453e236435be\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# CO2 Meter\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":4,\"w\":48,\"h\":12,\"i\":\"2af5ddba-aad0-40ae-b802-4a9e3e83fc33\"},\"panelIndex\":\"2af5ddba-aad0-40ae-b802-4a9e3e83fc33\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"6619bac7-cb55-4c41-81cb-f417f1e8d4d4\":{\"columns\":{\"8b1ecdee-e774-4a13-b160-76f80601de32\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"bbd7882d-13ea-40f2-9d25-c863db2ff550\":{\"label\":\"Median of co2_ppm\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"co2_ppm\",\"isBucketed\":false,\"scale\":\"ratio\"},\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\":{\"label\":\"Top values of hostname.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"hostname.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\",\"8b1ecdee-e774-4a13-b160-76f80601de32\",\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"yLeftExtent\":{\"mode\":\"custom\",\"lowerBound\":0,\"upperBound\":1600},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"6619bac7-cb55-4c41-81cb-f417f1e8d4d4\",\"accessors\":[\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8b1ecdee-e774-4a13-b160-76f80601de32\",\"splitAccessor\":\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-6619bac7-cb55-4c41-81cb-f417f1e8d4d4\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":16,\"w\":48,\"h\":12,\"i\":\"35ba
f3b9-4b4f-4248-8044-6415c253c84c\"},\"panelIndex\":\"35baf3b9-4b4f-4248-8044-6415c253c84c\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"f4357ceb-17b0-4b3a-8118-8529aa62726d\":{\"columns\":{\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"07ebffaa-db61-4e78-b3cc-7bd46277170d\":{\"label\":\"Top values of hostname.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"hostname.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}},\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\":{\"label\":\"Median of temp_f\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"temp_f\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"07ebffaa-db61-4e78-b3cc-7bd46277170d\",\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\",\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"yLeftExtent\":{\"mode\":\"custom\",\"lowerBound\":0,\"upperBound\":100},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"f4357ceb-17b0-4b3a-8118-8529aa62726d\",\"accessors\":[\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\",\"splitAccessor\":\"07ebffaa-db61-4e78-b3cc-7bd46277170d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-f4357ceb-17b0-4b3a-8118-8529aa62726d\"}]},\"enhancements\":{}}}]","timeRestore":false,"title":"CO2 Meter","version":1},"coreMigrationVersion":"7.14.0","id":"90479db0-0b04-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"2af5ddba-aad0-40ae-b802-4a9e3e83fc33:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"2af5ddba-aad0-40ae-b802-4a9e3e83fc33:indexpattern-datasource-layer-6619bac7-cb55-4c41-81cb-f417f1e8d4d4","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"35baf3b9-4b4f-4248-8044-6415c253c84c:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"35baf3b9-4b4f-4248-8044-6415c253c84c:indexpattern-datasource-layer-f4357ceb-17b0-4b3a-8118-8529aa62726d","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-01T20:50:22.668Z","version":"WzIzNDk3MCwyXQ=="}
3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
--------------------------------------------------------------------------------
/co2meter/co2meter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/co2meter.png
--------------------------------------------------------------------------------
/co2meter/co2meter.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import datetime
4 | import fcntl
5 | import json
6 | import sys
7 | import time
8 |
9 | def decrypt(key, data):
10 |     cstate = [0x48, 0x74, 0x65, 0x6D, 0x70, 0x39, 0x39, 0x65]
11 |     shuffle = [2, 4, 0, 7, 1, 6, 5, 3]
12 |     phase1 = [0] * 8
13 |     for i, o in enumerate(shuffle):
14 |         phase1[o] = data[i]
15 |     phase2 = [0] * 8
16 |     for i in range(8):
17 |         phase2[i] = phase1[i] ^ key[i]
18 |     phase3 = [0] * 8
19 |     for i in range(8):
20 |         phase3[i] = ( (phase2[i] >> 3) | (phase2[ (i-1+8)%8 ] << 5) ) & 0xff
21 |     ctmp = [0] * 8
22 |     for i in range(8):
23 |         ctmp[i] = ( (cstate[i] >> 4) | (cstate[i]<<4) ) & 0xff
24 |     out = [0] * 8
25 |     for i in range(8):
26 |         out[i] = (0x100 + phase3[i] - ctmp[i]) & 0xff
27 |     return out
28 |
29 | def hd(d):
30 |     return " ".join("%02X" % e for e in d)
31 |
32 | if __name__ == "__main__":
33 |     key = [0xc4, 0xc6, 0xc0, 0x92, 0x40, 0x23, 0xdc, 0x96]
34 |     fp = open("/dev/hidraw0", "a+b", 0)  # unbuffered access to the HID device
35 |     set_report = [0] + [0xc4, 0xc6, 0xc0, 0x92, 0x40, 0x23, 0xdc, 0x96]
36 |     fcntl.ioctl(fp, 0xC0094806, bytearray(set_report))  # HIDIOCSFEATURE(9): send the key to the meter
37 |
38 |     values = {}
39 |
40 |     co2_ppm = 0
41 |     temp_c = 1000
42 |     i = 0
43 |
44 |     while True:  # read a handful of reports, then emit a single reading
45 |         i += 1
46 |         if i == 10:
47 |             break
48 |         data = list(fp.read(8))
49 |         decrypted = decrypt(key, data)
50 |         if decrypted[4] != 0x0d or (sum(decrypted[:3]) & 0xff) != decrypted[3]:
51 |             print(hd(data), " => ", hd(decrypted), "Checksum error")
52 |         else:
53 |             op = decrypted[0]
54 |             val = decrypted[1] << 8 | decrypted[2]
55 |             values[op] = val
56 |             # http://co2meters.com/Documentation/AppNotes/AN146-RAD-0401-serial-communication.pdf
57 |             if 0x50 in values:
58 |                 co2_ppm = values[0x50]
59 |             if 0x42 in values:
60 |                 temp_c = values[0x42] / 16.0 - 273.15
61 |
62 |     temp_f = (temp_c * 9 / 5) + 32
63 |     output = {
64 |         "@timestamp": datetime.datetime.utcnow().isoformat(),
65 |         "hostname": "node-21",
66 |         "location": "office",
67 |         "co2_ppm": co2_ppm,
68 |         "temp_c": float("%2.2f" % temp_c),
69 |         "temp_f": float("%2.2f" % temp_f),
70 |         "source": "CO2 Meter"
71 |     }
72 |     print(json.dumps(output))
73 |
--------------------------------------------------------------------------------
/co2meter/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/dashboard.png
--------------------------------------------------------------------------------
/co2meter/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/index.png
--------------------------------------------------------------------------------
/co2meter/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/minio.png
--------------------------------------------------------------------------------
/directory-sizes/README.md:
--------------------------------------------------------------------------------
1 | # Monitoring Directory Sizes
2 |
3 |
4 |
5 | Keeping an eye on the growth of your Data Lake is useful for a few reasons:
6 |
7 | 1. See how fast each data source is growing on disk
8 | 2. Keep an eye on how much space you have available
9 | 3. Better understand the cost of storing each data source
10 |
11 | We'll use a Python script to query the size of each directory in our Data Lake (via NFS mount) in addition to recording the total size and space available for use. Our script will write to stdout which we'll redirect to a log file. From there, Filebeat will pick it up and send it to Elastic.
12 |
13 | 
14 |
15 | Let's get started.
16 |
17 | ## Step #1 - Collect Data
18 |
19 | Create a Python script at `~/bin/directory-sizes.py` with the following contents (adjusting any values as you see fit):
20 |
21 | ```python
22 | #!/usr/bin/env python3
23 |
24 | import datetime
25 | import json
26 | import os
27 |
28 | path = "/mnt/data-lake"
29 |
30 | def get_size(start_path = path):
31 |     total_size = 0
32 |     for dirpath, dirnames, filenames in os.walk(start_path):
33 |         for f in filenames:
34 |             fp = os.path.join(dirpath, f)
35 |             if not os.path.islink(fp): # skip symbolic links
36 |                 total_size += os.path.getsize(fp)
37 |     return total_size
38 |
39 | if __name__ == "__main__":
40 |     if os.path.ismount(path):
41 |         # Get size of each directory
42 |         for d in os.listdir(path):
43 |             size_bytes = get_size(path + "/" + d)
44 |             output = {
45 |                 "@timestamp": datetime.datetime.utcnow().isoformat(),
46 |                 "dir": d,
47 |                 "bytes": size_bytes
48 |             }
49 |             print(json.dumps(output))
50 |
51 |         # Get total, available, and free space
52 |         statvfs = os.statvfs(path)
53 |         output = {
54 |             "@timestamp": datetime.datetime.utcnow().isoformat(),
55 |             "total_bytes": statvfs.f_frsize * statvfs.f_blocks, # Size of filesystem in bytes
56 |             "free_bytes": statvfs.f_frsize * statvfs.f_bfree, # Free bytes total
57 |             "available_bytes": statvfs.f_frsize * statvfs.f_bavail, # Free bytes for users
58 |             "mounted": True
59 |         }
60 |         print(json.dumps(output))
61 |     else:
62 |         output = {
63 |             "@timestamp": datetime.datetime.utcnow().isoformat(),
64 |             "mounted": False
65 |         }
66 |         print(json.dumps(output))
67 | ```
68 |
69 | Try running the script from the command line:
70 |
71 | ```bash
72 | chmod a+x ~/bin/directory-sizes.py
73 | ~/bin/directory-sizes.py
74 | ```
75 |
76 | The output should look like the following:
77 |
78 | ```json
79 | {"@timestamp": "2021-09-06T14:46:37.376487", "dir": "nginx", "bytes": 1445406508}
80 | {"@timestamp": "2021-09-06T14:46:39.673445", "dir": "system-metricbeat-module", "bytes": 62265436549}
81 | {"@timestamp": "2021-09-06T14:46:39.683812", "dir": "flights", "bytes": 5943006981}
82 | {"@timestamp": "2021-09-06T14:46:41.122360", "dir": "haproxy-metricbeat-module", "bytes": 15443596238}
83 | {"@timestamp": "2021-09-06T14:46:41.122731", "dir": "weather-historical", "bytes": 137599636}
84 | ...
85 | ```
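
If you want a second opinion on the numbers, GNU `du` computes an apparent-size total much like the `os.path.getsize()` sums above. The two won't match byte-for-byte (`du` also counts directory entries), but they should be close. The `nginx` directory here is just an example taken from the sample output:

```bash
du -sb /mnt/data-lake/nginx
```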
86 |
87 | Once you're able to successfully run the script, create a log file for its output:
88 |
89 | ```bash
90 | sudo touch /var/log/directory-sizes.log
91 | sudo chown ubuntu:ubuntu /var/log/directory-sizes.log
92 | ```
93 |
94 | Create a logrotate entry so the log file doesn't grow unbounded:
95 |
96 | ```
97 | sudo vi /etc/logrotate.d/directory-sizes
98 | ```
99 |
100 | Add the following content:
101 |
102 | ```
103 | /var/log/directory-sizes.log {
104 |     weekly
105 |     rotate 12
106 |     compress
107 |     delaycompress
108 |     missingok
109 |     notifempty
110 |     create 644 ubuntu ubuntu
111 | }
112 | ```
113 |
114 | Add the following entry to your crontab:
115 |
116 | ```
117 | * * * * * sudo /home/ubuntu/bin/directory-sizes.py >> /var/log/directory-sizes.log 2>&1
118 | ```
119 |
120 | Verify output by tailing the log file for a few minutes:
121 |
122 | ```
123 | $ tail -f /var/log/directory-sizes.log
124 | ```
125 |
126 | If you're seeing output scroll each minute then you are successfully collecting data!
127 |
128 | ## Step #2 - Archive Data
129 |
130 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3.
131 |
132 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your directory sizes data:
133 |
134 | ```yaml
135 | filebeat.inputs:
136 | - type: log
137 |   enabled: true
138 |   tags: ["directory-sizes"]
139 |   paths:
140 |     - /var/log/directory-sizes.log
141 | ```
142 |
143 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
144 |
145 | Restart Filebeat:
146 |
147 | ```bash
148 | sudo systemctl restart filebeat
149 | ```
150 |
151 | You may want to tail syslog to see if Filebeat restarts without any issues:
152 |
153 | ```bash
154 | tail -f /var/log/syslog | grep filebeat
155 | ```
156 |
157 | At this point, we should have directory size data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the directory sizes data feed.
158 |
159 | Add the following conditional to your `distributor.yml` file:
160 |
161 | ```
162 | } else if "directory-sizes" in [tags] {
163 |   pipeline {
164 |     send_to => ["directory-sizes-archive"]
165 |   }
166 | }
167 | ```
168 |
169 | Create a Logstash pipeline called `directory-sizes-archive.yml` with the following contents:
170 |
171 | ```
172 | input {
173 |   pipeline {
174 |     address => "directory-sizes-archive"
175 |   }
176 | }
177 | filter {
178 | }
179 | output {
180 |   s3 {
181 |     #
182 |     # Custom Settings
183 |     #
184 |     prefix => "directory-sizes/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
185 |     temporary_directory => "${S3_TEMP_DIR}/directory-sizes-archive"
186 |     access_key_id => "${S3_ACCESS_KEY}"
187 |     secret_access_key => "${S3_SECRET_KEY}"
188 |     endpoint => "${S3_ENDPOINT}"
189 |     bucket => "${S3_BUCKET}"
190 |
191 |     #
192 |     # Standard Settings
193 |     #
194 |     validate_credentials_on_root_bucket => false
195 |     codec => json_lines
196 |     # Limit Data Lake file sizes to 5 GB
197 |     size_file => 5000000000
198 |     time_file => 60
199 |     # encoding => "gzip"
200 |     additional_settings => {
201 |       force_path_style => true
202 |       follow_redirects => false
203 |     }
204 |   }
205 | }
206 | ```
207 |
208 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
209 |
210 | ```bash
211 | sudo mv directory-sizes-archive.yml /etc/logstash/conf.d/
212 | ```
213 |
214 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
215 |
216 | ```
217 | - pipeline.id: "directory-sizes-archive"
218 |   path.config: "/etc/logstash/conf.d/directory-sizes-archive.yml"
219 | ```
220 |
221 | And finally, restart the Logstash service:
222 |
223 | ```bash
224 | sudo systemctl restart logstash
225 | ```
226 |
227 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
228 |
229 | ```bash
230 | sudo tail -f /var/log/logstash/logstash-plain.log
231 | ```
232 |
233 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
234 |
235 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
236 |
237 | 
238 |
239 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
240 |
241 | 
242 |
243 | If you see your data being stored, then you are successfully archiving!
244 |
245 | ## Step #3 - Index Data
246 |
247 | Once Logstash is archiving the data, next we need to index it with Elastic.
248 |
249 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
250 |
251 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain was built to parse the raw JSON coming in.
252 |
253 | Create a new pipeline called `directory-sizes-index.yml` with the following content:
254 |
255 | ```
256 | input {
257 |   pipeline {
258 |     address => "directory-sizes-index"
259 |   }
260 | }
261 | filter {
262 |   json {
263 |     source => "message"
264 |   }
265 |   json {
266 |     source => "message"
267 |   }
268 |   date {
269 |     match => ["timestamp", "ISO8601"]
270 |   }
271 |   mutate {
272 |     remove_field => ["timestamp", "message"]
273 |     remove_field => ["tags", "agent", "input", "log", "path", "ecs", "@version"]
274 |   }
275 | }
276 | output {
277 |   elasticsearch {
278 |     #
279 |     # Custom Settings
280 |     #
281 |     id => "directory-sizes-index"
282 |     index => "directory-sizes-%{+YYYY.MM.dd}"
283 |     hosts => "${ES_ENDPOINT}"
284 |     user => "${ES_USERNAME}"
285 |     password => "${ES_PASSWORD}"
286 |   }
287 | }
288 | ```
289 |
290 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
291 |
292 | ```bash
293 | sudo mv directory-sizes-index.yml /etc/logstash/conf.d/
294 | ```
295 |
296 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
297 |
298 | ```
299 | - pipeline.id: "directory-sizes-index"
300 |   path.config: "/etc/logstash/conf.d/directory-sizes-index.yml"
301 | ```
302 |
303 | Append your new pipeline to the `send_to` list for this tag in your `distributor.yml` pipeline:
304 |
305 | ```
306 | } else if "directory-sizes" in [tags] {
307 |   pipeline {
308 |     send_to => ["directory-sizes-archive", "directory-sizes-index"]
309 |   }
310 | }
311 | ```
312 |
313 | And finally, restart the Logstash service:
314 |
315 | ```bash
316 | sudo systemctl restart logstash
317 | ```
318 |
319 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
320 |
321 | ```bash
322 | sudo tail -f /var/log/logstash/logstash-plain.log
323 | ```
324 |
325 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
326 |
327 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
328 |
329 | 
330 |
331 | ## Step #4 - Visualize Data
332 |
333 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
334 |
335 | Download this dashboard: [directory-sizes.ndjson](directory-sizes.ndjson)
336 |
337 | Jump back into Kibana:
338 |
339 | 1. Select "Stack Management" from the menu
340 | 2. Select "Saved Objects"
341 | 3. Click "Import" in the upper right
342 |
343 | Once it's been imported, click on the "Directory Sizes" dashboard.
344 |
345 | 
346 |
347 | Congratulations! You should now be looking at the directory sizes of your Data Lake in Elastic.
348 |
349 |
--------------------------------------------------------------------------------
/directory-sizes/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/archive.png
--------------------------------------------------------------------------------
/directory-sizes/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/dashboard.png
--------------------------------------------------------------------------------
/directory-sizes/directory-sizes.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import datetime
4 | import json
5 | import os
6 |
7 | path = "/mnt/data-lake"
8 |
9 | def get_size(start_path = path):
10 |     total_size = 0
11 |     for dirpath, dirnames, filenames in os.walk(start_path):
12 |         for f in filenames:
13 |             fp = os.path.join(dirpath, f)
14 |             # skip if it is symbolic link
15 |             if not os.path.islink(fp):
16 |                 total_size += os.path.getsize(fp)
17 |
18 |     return total_size
19 |
20 | if __name__ == "__main__":
21 |
22 |     if os.path.ismount(path):
23 |         # Get size of each directory
24 |         for d in os.listdir(path):
25 |             size_bytes = get_size(path + "/" + d)
26 |             output = {
27 |                 "@timestamp": datetime.datetime.utcnow().isoformat(),
28 |                 "directory": d,
29 |                 "bytes": size_bytes
30 |             }
31 |             print(json.dumps(output))
32 |
33 |         # Get total, available, and free space
34 |         statvfs = os.statvfs(path)
35 |         output = {
36 |             "@timestamp": datetime.datetime.utcnow().isoformat(),
37 |             "total_bytes": statvfs.f_frsize * statvfs.f_blocks, # Size of filesystem in bytes
38 |             "free_bytes": statvfs.f_frsize * statvfs.f_bfree, # Free bytes total
39 |             "available_bytes": statvfs.f_frsize * statvfs.f_bavail, # Free bytes for users
40 |             "mounted": True
41 |         }
42 |         print(json.dumps(output))
43 |     else:
44 |         output = {
45 |             "@timestamp": datetime.datetime.utcnow().isoformat(),
46 |             "mounted": False
47 |         }
48 |         print(json.dumps(output))
49 |
50 |
--------------------------------------------------------------------------------
/directory-sizes/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/index.png
--------------------------------------------------------------------------------
/directory-sizes/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/logo.png
--------------------------------------------------------------------------------
/directory-sizes/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/minio.png
--------------------------------------------------------------------------------
/flight-tracker/README.md:
--------------------------------------------------------------------------------
1 | # Elastic Flight Tracker
2 |
3 |
4 |
5 | For this data source, we'll be using an [SDR](https://www.amazon.com/gp/product/B01GDN1T4S) to track aircraft flights via [ADS-B](https://mode-s.org/decode/). We'll use a Python script to decode the signals and write them to a log file. Elastic's Filebeat will pick them up from there and handle getting them to Logstash.
6 |
7 | We'll build the following dashboard with Elastic:
8 |
9 | 
10 |
11 | Let's get started!
12 |
13 | ## Step #1 - Collect Data
14 |
15 | Create a new python script called `~/bin/flight-tracker.py` with the following contents:
16 |
17 | [flight-tracker.py](flight-tracker.py)
18 |
19 | The script requires that your SDR be plugged in before running.
20 |
21 | Take a few minutes to familiarize yourself with the script. Adjust the values for your receiver's latitude and longitude. You can use [LatLong.net](https://www.latlong.net/) to look up your location.
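
If you want to confirm the dongle is detected before launching the script, the `rtl_test` utility will probe it, assuming the rtl-sdr tools are installed on the host (press Ctrl-C to stop):

```bash
rtl_test
```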
22 |
23 | When you're ready, try running the script:
24 |
25 | ```bash
26 | chmod a+x ~/bin/flight-tracker.py
27 | sudo ~/bin/flight-tracker.py
28 | ```
29 |
30 | It may take a few minutes to see output if you're in a quiet airspace, but once ~10 messages have been received you should see output on `stdout` similar to:
31 |
32 | ```json
33 | {"@timestamp": "2021-09-08T15:20:04.046427", "hex_ident": "A49DE9", "call_sign": null, "location": [42.05695, -88.04905], "altitude_ft": 31475, "speed_kts": 334, "track_angle_deg": 169, "vertical_speed_fpm": 3328, "speed_ref": "GS"}
34 | {"@timestamp": "2021-09-08T15:20:03.330181", "hex_ident": "A1D4BC", "call_sign": "ENY4299", "location": [41.78804, -88.11425], "altitude_ft": 9675, "speed_kts": 292, "track_angle_deg": 41, "vertical_speed_fpm": -1792, "speed_ref": "GS"}
35 | {"@timestamp": "2021-09-08T15:20:05.502300", "hex_ident": "ACC3B4", "call_sign": "AAL2080", "location": [41.91885, -88.03], "altitude_ft": 7600, "speed_kts": 289, "track_angle_deg": 45, "vertical_speed_fpm": -1536, "speed_ref": "GS"}
36 | ```
37 |
38 | Once you confirm the script is working, you can redirect its output to a log file:
39 |
40 | ```bash
41 | sudo touch /var/log/flight-tracker.log
42 | sudo chown ubuntu:ubuntu /var/log/flight-tracker.log
43 | ```
44 |
45 | Create a logrotate entry so the log file doesn't grow unbounded:
46 |
47 | ```bash
48 | sudo vi /etc/logrotate.d/flight-tracker
49 | ```
50 |
51 | Add the following logrotate content:
52 |
53 | ```
54 | /var/log/flight-tracker.log {
55 |     weekly
56 |     rotate 12
57 |     compress
58 |     delaycompress
59 |     missingok
60 |     notifempty
61 |     create 644 ubuntu ubuntu
62 | }
63 | ```
64 |
65 | Create a new bash script `~/bin/flight-tracker.sh` with the following:
66 |
67 | ```bash
68 | #!/bin/bash
69 |
70 | if pgrep -f "sudo /home/ubuntu/bin/flight-tracker.py" > /dev/null
71 | then
72 | echo "Already running."
73 | else
74 | echo "Not running. Restarting..."
75 | sudo /home/ubuntu/bin/flight-tracker.py >> /var/log/flight-tracker.log 2>&1
76 | fi
77 | ```
78 |
79 | Make it executable:
80 |
81 | ```bash
82 | chmod a+x ~/bin/flight-tracker.sh
83 | ```
84 |
85 | Add the following entry to your crontab with `crontab -e`:
86 |
87 | ```
88 | * * * * * /home/ubuntu/bin/flight-tracker.sh >> /tmp/flight-tracker.log 2>&1
89 | ```
90 |
91 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute):
92 |
93 | ```bash
94 | tail -f /var/log/flight-tracker.log
95 | ```
96 |
97 | If you're seeing output every few seconds, then you are successfully collecting data!
98 |
99 | ## Step #2 - Archive Data
100 |
101 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3.
102 |
103 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your flight data:
104 |
105 | ```yaml
106 | filebeat.inputs:
107 | - type: log
108 |   enabled: true
109 |   tags: ["flight-tracker"]
110 |   paths:
111 |     - /var/log/flight-tracker.log
112 | ```
113 |
114 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
115 |
116 | Restart Filebeat:
117 |
118 | ```bash
119 | sudo systemctl restart filebeat
120 | ```
121 |
122 | You may want to tail syslog to see if Filebeat restarts without any issues:
123 |
124 | ```bash
125 | tail -f /var/log/syslog | grep filebeat
126 | ```
127 |
128 | At this point, we should have flight tracker data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the flight tracker data feed.
129 |
130 | Add the following conditional to your `distributor.yml` file:
131 |
132 | ```
133 | } else if "flight-tracker" in [tags] {
134 |   pipeline {
135 |     send_to => ["flight-tracker-archive"]
136 |   }
137 | }
138 | ```
139 |
140 | Create a Logstash pipeline called `flight-tracker-archive.yml` with the following contents:
141 |
142 | ```
143 | input {
144 |   pipeline {
145 |     address => "flight-tracker-archive"
146 |   }
147 | }
148 | filter {
149 | }
150 | output {
151 |   s3 {
152 |     #
153 |     # Custom Settings
154 |     #
155 |     prefix => "flight-tracker/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
156 |     temporary_directory => "${S3_TEMP_DIR}/flight-tracker-archive"
157 |     access_key_id => "${S3_ACCESS_KEY}"
158 |     secret_access_key => "${S3_SECRET_KEY}"
159 |     endpoint => "${S3_ENDPOINT}"
160 |     bucket => "${S3_BUCKET}"
161 |
162 |     #
163 |     # Standard Settings
164 |     #
165 |     validate_credentials_on_root_bucket => false
166 |     codec => json_lines
167 |     # Limit Data Lake file sizes to 5 GB
168 |     size_file => 5000000000
169 |     time_file => 60
170 |     # encoding => "gzip"
171 |     additional_settings => {
172 |       force_path_style => true
173 |       follow_redirects => false
174 |     }
175 |   }
176 | }
177 | ```
178 |
179 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
180 |
181 | ```bash
182 | sudo mv flight-tracker-archive.yml /etc/logstash/conf.d/
183 | ```
184 |
185 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
186 |
187 | ```
188 | - pipeline.id: "flight-tracker-archive"
189 |   path.config: "/etc/logstash/conf.d/flight-tracker-archive.yml"
190 | ```
191 |
192 | And finally, restart the Logstash service:
193 |
194 | ```bash
195 | sudo systemctl restart logstash
196 | ```
197 |
198 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
199 |
200 | ```bash
201 | sudo tail -f /var/log/logstash/logstash-plain.log
202 | ```
203 |
204 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
205 |
206 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
207 |
208 | 
209 |
210 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
211 |
212 | 
213 |
214 | If you see your data being stored, then you are successfully archiving!
215 |
216 | ## Step #3 - Index Data
217 |
218 | Once Logstash is archiving the data, next we need to index it with Elastic.
219 |
220 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the majority of data we're sending in. The one exception is for `geo_point` data which we need to explicitly add a mapping for, using an index template.
221 |
222 | Jump into Kibana and create the following Index Template using Dev Tools:
223 |
224 | ```
225 | PUT _index_template/flight-tracker
226 | {
227 | "index_patterns": ["flight-tracker-*"],
228 | "template": {
229 | "settings": {},
230 | "mappings": {
231 | "properties": {
232 | "location": {
233 | "type": "geo_point"
234 | }
235 | }
236 | },
237 | "aliases": {}
238 | }
239 | }
240 | ```
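
You can confirm the template registered by fetching it back, either in Dev Tools or with curl (the curl form assumes `ES_ENDPOINT`, `ES_USERNAME`, and `ES_PASSWORD` are exported in your shell, as in the other checks in this repo):

```bash
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/_index_template/flight-tracker?pretty"
```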
241 |
242 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain was built to parse the raw JSON coming in.
243 |
244 | Create a new pipeline called `flight-tracker-index.yml` with the following content:
245 |
246 | ```
247 | input {
248 |   pipeline {
249 |     address => "flight-tracker-index"
250 |   }
251 | }
252 | filter {
253 |   json {
254 |     source => "message"
255 |   }
256 |   json {
257 |     source => "message"
258 |   }
259 |   mutate {
260 |     remove_field => ["message", "tags", "path"]
261 |     remove_field => ["agent", "host", "input", "log", "ecs", "@version"]
262 |   }
263 | }
264 | output {
265 |   elasticsearch {
266 |     #
267 |     # Custom Settings
268 |     #
269 |     id => "flight-tracker-index"
270 |     index => "flight-tracker-%{+YYYY.MM.dd}"
271 |     hosts => "${ES_ENDPOINT}"
272 |     user => "${ES_USERNAME}"
273 |     password => "${ES_PASSWORD}"
274 |   }
275 | }
276 | ```
277 |
278 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
279 |
280 | ```bash
281 | sudo mv flight-tracker-index.yml /etc/logstash/conf.d/
282 | ```
283 |
284 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
285 |
286 | ```
287 | - pipeline.id: "flight-tracker-index"
288 |   path.config: "/etc/logstash/conf.d/flight-tracker-index.yml"
289 | ```
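
As with the archive pipeline, events only reach this index pipeline if the `distributor` sends to it. Append the new address to the conditional you added in Step #2:

```
} else if "flight-tracker" in [tags] {
  pipeline {
    send_to => ["flight-tracker-archive", "flight-tracker-index"]
  }
}
```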
290 |
291 | And finally, restart the Logstash service:
292 |
293 | ```bash
294 | sudo systemctl restart logstash
295 | ```
296 |
297 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
298 |
299 | ```bash
300 | sudo tail -f /var/log/logstash/logstash-plain.log
301 | ```
302 |
303 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
304 |
305 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
306 |
307 | 
308 |
309 | ## Step #4 - Visualize Data
310 |
311 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
312 |
313 | Download this dashboard: [flight-tracker.ndjson](flight-tracker.ndjson)
314 |
315 | Jump back into Kibana:
316 |
317 | 1. Select "Stack Management" from the menu
318 | 2. Select "Saved Objects"
319 | 3. Click "Import" in the upper right
320 |
321 | Once it's been imported, click on "Flight Tracker".
322 |
323 | 
324 |
325 | If you'd like to plot the location of your receiver (i.e., the orange tower in the Elastic Map), add the following document using Dev Tools (replacing the `lat` and `lon` with your location):
326 |
327 | ```JSON
328 | PUT /flight-tracker-receiver/_doc/1
329 | {
330 | "location": {
331 | "lat": 41.978611,
332 | "lon": -87.904724
333 | }
334 | }
335 | ```
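
The Base Station layer in the map filters on `_id : 1`, so keeping that document ID is what makes the tower show up. You can read the document back to confirm it saved (same environment-variable assumption as the earlier curl checks):

```bash
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/flight-tracker-receiver/_doc/1?pretty"
```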
336 |
337 | Congratulations! You should now be looking at live flights in Elastic as they're being collected by your base station!
338 |
--------------------------------------------------------------------------------
/flight-tracker/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/archive.png
--------------------------------------------------------------------------------
/flight-tracker/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/dashboard.png
--------------------------------------------------------------------------------
/flight-tracker/flight-tracker.ndjson:
--------------------------------------------------------------------------------
1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"flight-tracker-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"c10de000-10c0-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-08T16:21:00.040Z","version":"WzQwODEyMSwyXQ=="}
2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":12,\"h\":5,\"i\":\"11142c2c-8f3c-4eb0-9c2c-bc87dd272bc1\"},\"panelIndex\":\"11142c2c-8f3c-4eb0-9c2c-bc87dd272bc1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Flight Tracker\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":12,\"y\":0,\"w\":36,\"h\":5,\"i\":\"2cf8e227-485b-4209-af4e-9416692b8916\"},\"panelIndex\":\"2cf8e227-485b-4209-af4e-9416692b8916\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"451792f9-6bc9-4b2d-a8db-b88844fa22d0\":{\"columns\":{\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"95952347-70f3-4884-93f8-cef298545532\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\",\"95952347-70f3-4884-93f8-cef298545532\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"451792f9-6bc9-4b2d-a8db-b88844fa22d0\",\"accessors\":[\"95952347-70f3-4884-93f8-cef298545532\"],\"position\":\"top\",\"seriesType\":\"bar_stacked\",\"showGridlines\":false,\"xAccessor\":\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"c10de000-10c0-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"c10de000-10c0-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-451792f9-6bc9-4b2d-a8db-b88844fa22d0\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"map\",\"gridData\":{\"x\":0,\"y\":5,\"w\":48,\"h\":33,\"i\":\"6641024f-134f-488a-98bb-e6e65eb47fcc\"},\"panelIndex\":\"6641024f-134f-488a-98bb-e6e65eb47fcc\",\"embeddableConfig\":{\"attributes\":{\"title\":\"Flight 
Tracker\",\"description\":\"\",\"layerListJSON\":\"[{\\\"sourceDescriptor\\\":{\\\"type\\\":\\\"EMS_TMS\\\",\\\"isAutoSelect\\\":true,\\\"id\\\":null},\\\"id\\\":\\\"b97e4b84-4fb0-414d-88cf-a3bfb409c8fe\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"TILE\\\"},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR_TILE\\\"},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"LIMIT\\\",\\\"id\\\":\\\"85e4a6d8-32db-4f58-8268-19505567a7ac\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"tooltipProperties\\\":[],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"topHitsSplitField\\\":\\\"\\\",\\\"topHitsSize\\\":1},\\\"id\\\":\\\"7b080642-7356-4d69-881d-914ccbc26fa0\\\",\\\"label\\\":\\\"Check-ins\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"airport\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#54B399\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#41937c\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":0}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":3}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"field\\\":{\\\"label\\\":\\\"track_deg\\\",\\\"name\\\":\\\"track_deg\\\",\\\"origin\\\":\\\"source\\\",\\\"type\\\":\\\"number\\\",\\\"supportsAutoDomain\\\":true},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3}}},\\\"labelText\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"field\\\":null}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"circle\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"scalingType\\\":\\\"TOP_HITS\\\",\\\"sortField\\\":\\\"@timestamp\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"tooltipProperties\\\":[\\\"hex_code.keyword\\\",\\\"@timestamp\\\",\\\"call_sign\\\",\\\"altitude_ft\\\",\\\"speed_kts\\\",\\\"vertical_speed_fpm\\\"],\\\"topHitsSplitField\\\":\\\"hex_code.keyword\\\",\\\"topHitsSize\\\":1,\\\"id\\\":\\\"4973e2c9-1ca6-4518-bac7-088913a7d434\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"filterByMapBounds\\\":true},\\\"id\\\":\\\"5bb65a5f-f1c6-4e65-ae5f-f140bdecb838\\\",\\\"label\\\":\\\"Flights\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"airport\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#6092C0\\\"}},\\\"lineColo
r\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#4379aa\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":0}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":10}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"field\\\":{\\\"label\\\":\\\"track_deg\\\",\\\"name\\\":\\\"track_deg\\\",\\\"origin\\\":\\\"source\\\",\\\"type\\\":\\\"number\\\",\\\"supportsAutoDomain\\\":true},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3}}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"splitField\\\":\\\"hex_code.keyword\\\",\\\"sortField\\\":\\\"@timestamp\\\",\\\"id\\\":\\\"e5e0fe1f-b235-4f42-b361-2e0d7177b1a8\\\",\\\"type\\\":\\\"ES_GEO_LINE\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"metrics\\\":[{\\\"type\\\":\\\"count\\\"}]},\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"marker\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#54B399\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#41937c\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":2}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":6}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"circle\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"id\\\":\\\"90e78113-220f-439f-908e-5ec584444856\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"LIMIT\\\",\\\"id\\\":\\\"efde372c-8fee-4433-ad0b-5cf3b1ae0c1b\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":false,\\\"applyGlobalTime\\\":false,\\\"tooltipProperties\\\":[],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"topHitsSplitField\\\":\\\"\\\",\\\"topHitsSize\\\":1},\\\"id\\\":\\\"89aa1c7a-a9a0-4ad3-a15e-309539f3d74b\\\",\\\"label\\\":\\\"Base 
Station\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"communications-tower\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#f8a305\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#F8A305\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":1}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":16}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[],\\\"query\\\":{\\\"query\\\":\\\"_id : 1\\\",\\\"language\\\":\\\"kuery\\\"}}]\",\"mapStateJSON\":\"{\\\"zoom\\\":8.47,\\\"center\\\":{\\\"lon\\\":-88.20859,\\\"lat\\\":41.95384},\\\"timeFilters\\\":{\\\"from\\\":\\\"now-3m\\\",\\\"to\\\":\\\"now\\\"},\\\"refreshConfig\\\":{\\\"isPaused\\\":false,\\\"interval\\\":10000},\\\"query\\\":{\\\"query\\\":\\\"\\\",\\\"language\\\":\\\"kuery\\\"},\\\"filters\\\":[],\\\"settings\\\":{\\\"autoFitToDataBounds\\\":false,\\\"backgroundColor\\\":\\\"#1d1e24\\\",\\\"disableInteractive\\\":false,\\\"disableTooltipControl\\\":false,\\\"hideToolbarOverlay\\\":false,\\\"hideLayerControl\\\":false,\\\"hideViewControl\\\":false,\\\"initialLocation\\\":\\\"LAST_SAVED_LOCATION\\\",\\\"fixedLocation\\\":{\\\"lat\\\":0,\\\"lon\\\":0,\\\"zoom\\\":2},\\\"browserLocation\\\":{\\\"zoom\\\":2},\\\"maxZoom\\\":24,\\\"minZoom\\\":0,\\\"showScaleControl\\\":false,\\\"showSpatialFilters\\\":true,\\\"showTimesliderToggleButton\\\":true,\\\"spatialFiltersAlpa\\\":0.3,\\\"spatialFiltersFillColor\\\":\\\"#DA8B45\\\",\\\"spatialFiltersLineColor\\\":\\\"#DA8B45\\\"}}\",\"uiStateJSON\":\"{\\\"isLayerTOCOpen\\\":true,\\\"openTOCDetails\\\":[]}\"},\"mapCenter\":{\"lat\":41.73541,\"lon\":-88.02419,\"zoom\":8.75},\"mapBuffer\":{\"minLon\":-89.29687,\"minLat\":40.9799,\"maxLon\":-86.48437,\"maxLat\":42.55308},\"isLayerTOCOpen\":true,\"openTOCDetails\":[],\"hiddenLayers\":[\"7b080642-7356-4d69-881d-914ccbc26fa0\"],\"enhancements\":{}}}]","timeRestore":false,"title":"Flight 
Tracker","version":1},"coreMigrationVersion":"7.14.0","id":"0ada6ef0-10c2-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"2cf8e227-485b-4209-af4e-9416692b8916:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"2cf8e227-485b-4209-af4e-9416692b8916:indexpattern-datasource-layer-451792f9-6bc9-4b2d-a8db-b88844fa22d0","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_1_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_2_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_3_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_4_source_index_pattern","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-08T16:47:35.330Z","version":"WzQwODk1OCwyXQ=="}
3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
--------------------------------------------------------------------------------
/flight-tracker/flight-tracker.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import pyModeS as pms
4 |
5 | from datetime import datetime, timedelta
6 | from json import dumps
7 | from pyModeS import common
8 | from pyModeS.extra.rtlreader import RtlReader
9 |
10 | class Flight:
11 | def __init__(self, hex_ident=None):
12 | self.hex_ident = hex_ident
13 | self.call_sign = None
14 | self.location = None
15 | self.altitude_ft = None
16 | self.speed_kts = None
17 | self.track_angle_deg = None
18 | self.vertical_speed_fpm = None
19 | self.speed_ref = None
20 | self.last_seen = None
21 | self.sent = False
22 |
23 | def has_info(self):
24 | return (#self.call_sign is not None and
25 | self.location is not None and
26 | self.altitude_ft is not None and
27 | self.track_angle_deg is not None and
28 | self.speed_kts is not None)
29 |
30 | def pretty_print(self):
31 | print(self.hex_ident, self.call_sign, self.location, self.altitude_ft, self.speed_kts,
32 | self.track_angle_deg, self.vertical_speed_fpm, self.speed_ref)
33 |
34 | def json_print(self):
35 | output = {
36 | "@timestamp": self.last_seen.isoformat(),
37 | "hex_code": self.hex_ident,
38 | "call_sign": self.call_sign,
39 | "location": {
40 | "lat": self.location[0],
41 | "lon": self.location[1]
42 | },
43 | "altitude_ft": self.altitude_ft,
44 | "speed_kts": self.speed_kts,
45 | "track_deg": int(self.track_angle_deg),
46 | "vertical_speed_fpm": self.vertical_speed_fpm,
47 | "speed_ref": self.speed_ref
48 | }
49 | print(dumps(output))
50 |
51 |
52 | class ADSBClient(RtlReader):
53 | def __init__(self):
54 | super(ADSBClient, self).__init__()
55 | self.flights = {}
56 |         self.lat_ref = 41.88    # placeholder; set to your receiver's latitude
57 |         self.lon_ref = -87.62   # placeholder; set to your receiver's longitude
58 | self.i = 0
59 |
60 | def handle_messages(self, messages):
61 | self.i += 1
62 | for msg, ts in messages:
63 | if len(msg) != 28: # wrong data length
64 | continue
65 |
66 | df = pms.df(msg)
67 |
68 | if df != 17: # not ADSB
69 | continue
70 |
71 | if pms.crc(msg) !=0: # CRC fail
72 | continue
73 |
74 | icao = pms.adsb.icao(msg)
75 | tc = pms.adsb.typecode(msg)
76 | flight = None
77 |
78 | if icao in self.flights:
79 | flight = self.flights[icao]
80 | else:
81 | flight = Flight(icao)
82 |
83 | flight.last_seen = datetime.now()
84 |
85 | # Message Type Codes: https://mode-s.org/api/
86 | if tc >= 1 and tc <= 4:
87 | # Typecode 1-4
88 | flight.call_sign = pms.adsb.callsign(msg).strip('_')
89 | elif tc >= 9 and tc <= 18:
90 | # Typecode 9-18 (airborne, barometric height)
91 | flight.location = pms.adsb.airborne_position_with_ref(msg,
92 | self.lat_ref, self.lon_ref)
93 | flight.altitude_ft = pms.adsb.altitude(msg)
94 | flight.sent = False
95 | elif tc == 19:
96 | # Typecode: 19
97 | # Ground Speed (GS) or Airspeed (IAS/TAS)
98 | # Output (speed, track angle, vertical speed, tag):
99 | (flight.speed_kts, flight.track_angle_deg, flight.vertical_speed_fpm,
100 | flight.speed_ref) = pms.adsb.velocity(msg)
101 |
102 | self.flights[icao] = flight
103 |
104 | if self.i > 10:
105 | self.i = 0
106 | #print("Flights: ", len(self.flights))
107 | for key in list(self.flights):
108 | f = self.flights[key]
109 | if f.has_info() and not f.sent:
110 | #f.pretty_print()
111 | f.json_print()
112 | f.sent = True
113 | elif f.last_seen < (datetime.now() - timedelta(minutes=5)):
114 | #print("Deleting ", key)
115 | del self.flights[key]
116 |
117 |
118 | if __name__ == "__main__":
119 | client = ADSBClient()
120 | client.run()
121 |
--------------------------------------------------------------------------------
/flight-tracker/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/index.png
--------------------------------------------------------------------------------
/flight-tracker/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/logo.png
--------------------------------------------------------------------------------
/flight-tracker/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/minio.png
--------------------------------------------------------------------------------
/gps/README.md:
--------------------------------------------------------------------------------
1 | # GPS Monitoring
2 |
3 |
4 |
5 |
6 |
7 | The [VK-162 GPS Receiver](https://www.pishop.us/product/gps-antenna-vk-162/) is a low-cost, high-sensitivity GPS receiver with an internal antenna. It provides location and speed tracking with a high degree of accuracy and can be used in a number of applications. It's built on the low-power Ublox G6010 / G7020 GPS chipset and connects to a computer via USB, where it can be read programmatically.
8 |
9 | In this data source, we'll build the following dashboard with Elastic:
10 |
11 | 
12 |
13 | Let's get started!
14 |
15 | ## Step #1 - Collect Data
16 |
17 | We use [gpsd](https://gpsd.gitlab.io/gpsd/) to query the GPS Receiver, and then [gpspipe](https://gpsd.gitlab.io/gpsd/gpspipe.html) to talk to `gpsd` to get location readings.
18 |
19 | If this is your first time setting up `gpsd`, you can refer to the [installation instructions](https://gpsd.gitlab.io/gpsd/installation.html).
20 |
21 | For Ubuntu systems, the installation process is as follows:
22 |
23 | ```bash
24 | sudo apt install gpsd-clients gpsd
25 | sudo systemctl enable gpsd
26 | ```
27 |
28 | Create a file called `/etc/default/gpsd` with the following contents (changing the device to your setup, if necessary):
29 |
30 | ```
31 | DEVICES="/dev/ttyACM0"
32 | ```
33 |
34 | Then start the `gpsd` service:
35 |
36 | ```bash
37 | sudo systemctl start gpsd
38 | ```
39 |
40 | And try querying it:
41 |
42 | ```bash
43 | gpspipe -w
44 | ```
45 |
46 | You should see output similar to the following:
47 |
48 | ```json
49 | {"class":"VERSION","release":"3.20","rev":"3.20","proto_major":3,"proto_minor":14}
50 | {"class":"DEVICES","devices":[{"class":"DEVICE","path":"/dev/ttyACM0","driver":"u-blox","subtype":"SW 1.00 (59842),HW 00070000","subtype1":",PROTVER 14.00,GPS;SBAS;GLO;QZSS","activated":"2021-09-02T19:13:12.267Z","flags":1,"native":1,"bps":9600,"parity":"N","stopbits":1,"cycle":1.00,"mincycle":0.25}]}
51 | {"class":"WATCH","enable":true,"json":true,"nmea":false,"raw":0,"scaled":false,"timing":false,"split24":false,"pps":false}
52 | {"class":"TPV","device":"/dev/ttyACM0","status":2,"mode":3,"time":"2021-09-02T19:13:13.000Z","leapseconds":18,"ept":0.005,"lat":41.881832,"lon":-87.623177,"altHAE":206.865,"altMSL":240.666,"alt":240.666,"track":14.3003,"magtrack":10.5762,"magvar":-3.7,"speed":0.088,"climb":0.010,"eps":0.67,"ecefx":160358.79,"ecefy":-4754122.42,"ecefz":4234974.21,"ecefvx":0.02,"ecefvy":0.05,"ecefvz":0.07,"ecefpAcc":10.77,"ecefvAcc":0.67,"velN":0.085,"velE":0.022,"velD":-0.010,"geoidSep":-33.801}
53 | ```
54 |
55 | The main objects to look for are the `"class":"TPV"` entries, which are "time-position-velocity" reports. All of the fields included are described in the document [Core Protocol Responses](https://gpsd.gitlab.io/gpsd/gpsd_json.html#_core_protocol_responses).
56 |
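If you have `jq` installed (not required anywhere else in this guide, just handy for exploring), you can pull out the handful of TPV fields we'll chart later:

```bash
gpspipe -w -n 10 | grep TPV | tail -n 1 | jq '{time, lat, lon, alt, speed, track}'
```
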
57 | Create a new shell script called `~/bin/gps.sh` with the following contents:
58 |
59 | ```bash
60 | #!/bin/bash
61 |
62 | gpspipe -w -n 10 | grep TPV | tail -n 1  # -w: JSON reports from gpsd, -n 10: exit after 10 reports; keep the last TPV
63 | ```
64 |
65 | Try running the script:
66 |
67 | ```bash
68 | chmod a+x ~/bin/gps.sh
69 | ~/bin/gps.sh
70 | ```
71 |
72 | You should see output on `stdout` similar to:
73 |
74 | ```json
75 | {"class":"TPV","device":"/dev/ttyACM0","status":2,"mode":3,"time":"2021-09-02T19:13:13.000Z","leapseconds":18,"ept":0.005,"lat":41.881832,"lon":-87.623177,"altHAE":206.865,"altMSL":240.666,"alt":240.666,"track":14.3003,"magtrack":10.5762,"magvar":-3.7,"speed":0.088,"climb":0.010,"eps":0.67,"ecefx":160358.79,"ecefy":-4754122.42,"ecefz":4234974.21,"ecefvx":0.02,"ecefvy":0.05,"ecefvz":0.07,"ecefpAcc":10.77,"ecefvAcc":0.67,"velN":0.085,"velE":0.022,"velD":-0.010,"geoidSep":-33.801}
76 | ```
77 |
78 | Once you confirm the script is working, create a log file to capture its output:
79 |
80 | ```bash
81 | sudo touch /var/log/gps.log
82 | sudo chown ubuntu.ubuntu /var/log/gps.log
83 | ```
84 |
85 | Create a logrotate entry so the log file doesn't grow unbounded:
86 |
87 | ```bash
88 | sudo vi /etc/logrotate.d/gps
89 | ```
90 |
91 | Add the following logrotate content:
92 |
93 | ```
94 | /var/log/gps.log {
95 | weekly
96 | rotate 12
97 | compress
98 | delaycompress
99 | missingok
100 | notifempty
101 | create 644 ubuntu ubuntu
102 | }
103 | ```
104 |
105 | Add the following entry to your crontab with `crontab -e`:
106 |
107 | ```
108 | * * * * * /home/ubuntu/bin/gps.sh >> /var/log/gps.log 2>&1
109 | ```
110 |
111 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute):
112 |
113 | ```bash
114 | tail -f /var/log/gps.log
115 | ```
116 |
117 | If you're seeing output scroll each minute then you are successfully collecting data!
118 |
119 | ## Step #2 - Archive Data
120 |
121 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which in turn sends it to S3.
122 |
123 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your GPS data:
124 |
125 | ```yaml
126 | filebeat.inputs:
127 | - type: log
128 | enabled: true
129 | tags: ["gps"]
130 | paths:
131 | - /var/log/gps.log
132 | ```
133 |
134 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
135 |
136 | Restart Filebeat:
137 |
138 | ```bash
139 | sudo systemctl restart filebeat
140 | ```
141 |
142 | You may want to tail syslog to see if Filebeat restarts without any issues:
143 |
144 | ```bash
145 | tail -f /var/log/syslog | grep filebeat
146 | ```
147 |
148 | At this point, we should have GPS data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the GPS data feed.
149 |
150 | Add the following conditional to your `distributor.yml` file:
151 |
152 | ```
153 | } else if "gps" in [tags] {
154 | pipeline {
155 | send_to => ["gps-archive"]
156 | }
157 | }
158 | ```
159 |
160 | Create a Logstash pipeline called `gps-archive.yml` with the following contents:
161 |
162 | ```
163 | input {
164 | pipeline {
165 | address => "gps-archive"
166 | }
167 | }
168 | filter {
169 | }
170 | output {
171 | s3 {
172 | #
173 | # Custom Settings
174 | #
175 | prefix => "gps/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
176 | temporary_directory => "${S3_TEMP_DIR}/gps-archive"
177 | access_key_id => "${S3_ACCESS_KEY}"
178 | secret_access_key => "${S3_SECRET_KEY}"
179 | endpoint => "${S3_ENDPOINT}"
180 | bucket => "${S3_BUCKET}"
181 |
182 | #
183 | # Standard Settings
184 | #
185 | validate_credentials_on_root_bucket => false
186 | codec => json_lines
187 | # Limit Data Lake file sizes to 5 GB
188 | size_file => 5000000000
189 | time_file => 60
190 | # encoding => "gzip"
191 | additional_settings => {
192 | force_path_style => true
193 | follow_redirects => false
194 | }
195 | }
196 | }
197 | ```
198 |
199 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
200 |
201 | ```bash
202 | sudo mv gps-archive.yml /etc/logstash/conf.d/
203 | ```
204 |
205 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
206 |
207 | ```
208 | - pipeline.id: "gps-archive"
209 |   path.config: "/etc/logstash/conf.d/gps-archive.yml"
210 | ```
211 |
212 | And finally, restart the Logstash service:
213 |
214 | ```bash
215 | sudo systemctl restart logstash
216 | ```
217 |
218 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
219 |
220 | ```bash
221 | sudo tail -f /var/log/logstash/logstash-plain.log
222 | ```
223 |
224 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
225 |
226 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
227 |
228 | 
229 |
230 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
231 |
232 | 
233 |
234 | If you see your data being stored, then you are successfully archiving!
235 |
236 | ## Step #3 - Index Data
237 |
238 | Once Logstash is archiving the data, we need to index it with Elastic.
239 |
240 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the majority of data we're sending in. The one exception is for `geo_point` data which we need to explicitly add a mapping for, using an index template.
241 |
242 | Jump into Kibana and create the following Index Template using Dev Tools:
243 |
244 | ```
245 | PUT _index_template/gps
246 | {
247 | "index_patterns": ["gps-*"],
248 | "template": {
249 | "settings": {},
250 | "mappings": {
251 | "properties": {
252 | "location": {
253 | "type": "geo_point"
254 | }
255 | }
256 | },
257 | "aliases": {}
258 | }
259 | }
260 | ```
261 |
262 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), we built the following filter chain to parse the raw JSON coming in.
263 |
264 | Create a new pipeline called `gps-index.yml` with the following content:
265 |
266 | ```
267 | input {
268 | pipeline {
269 | address => "gps-index"
270 | }
271 | }
272 | filter {
273 | json {
274 | source => "message"
275 | }
276 | json {
277 | source => "message"
278 | }
279 | mutate {
280 |     remove_field => ["message", "agent", "host", "input", "log", "ecs", "@version"]
281 | }
282 | mutate {
283 | add_field => { "[location]" => "%{[lat]}, %{[lon]}" }
284 | }
285 | }
286 | output {
287 | elasticsearch {
288 | #
289 | # Custom Settings
290 | #
291 | id => "gps-index"
292 | index => "gps-%{+YYYY.MM.dd}"
293 | hosts => "${ES_ENDPOINT}"
294 | user => "${ES_USERNAME}"
295 | password => "${ES_PASSWORD}"
296 | }
297 | }
298 | ```
299 |
300 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
301 |
302 | ```bash
303 | sudo mv gps-index.yml /etc/logstash/conf.d/
304 | ```
305 |
306 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
307 |
308 | ```
309 | - pipeline.id: "gps-index"
310 |   path.config: "/etc/logstash/conf.d/gps-index.yml"
311 | ```
312 |
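Also update the `gps` conditional you added to `distributor.yml` in Step #2 so tagged events are sent to the new index pipeline as well as the archive pipeline (this mirrors the other data sources in this repository):

```
} else if "gps" in [tags] {
  pipeline {
    send_to => ["gps-archive", "gps-index"]
  }
}
```
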
313 | And finally, restart the Logstash service:
314 |
315 | ```bash
316 | sudo systemctl restart logstash
317 | ```
318 |
319 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
320 |
321 | ```bash
322 | sudo tail -f /var/log/logstash/logstash-plain.log
323 | ```
324 |
325 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
326 |
327 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
328 |
329 | 
330 |
331 | ## Step #4 - Visualize Data
332 |
333 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
334 |
335 | Download this dashboard:
336 |
337 | [gps.ndjson](gps.ndjson)
338 |
339 | Jump into Kibana:
340 |
341 | 1. Select "Stack Management" from the menu
342 | 2. Select "Saved Objects"
343 | 3. Click "Import" in the upper right
344 |
345 | 
346 |
347 | Congratulations! You should now be looking at data from your GPS in Elastic.
348 |
--------------------------------------------------------------------------------
/gps/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/archive.png
--------------------------------------------------------------------------------
/gps/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/dashboard.png
--------------------------------------------------------------------------------
/gps/gps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/gps.png
--------------------------------------------------------------------------------
/gps/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/index.png
--------------------------------------------------------------------------------
/gps/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/minio.png
--------------------------------------------------------------------------------
/haproxy-filebeat-module/2-archive/haproxy-filebeat-module-archive.yml:
--------------------------------------------------------------------------------
1 | input {
2 | pipeline {
3 | address => "haproxy-filebeat-module-archive"
4 | }
5 | }
6 | filter {
7 | }
8 | output {
9 | s3 {
10 | #
11 | # Custom Settings
12 | #
13 | prefix => "haproxy-filebeat-module/${S3_DATE_DIR}"
14 | temporary_directory => "${S3_TEMP_DIR}/haproxy-filebeat-module-archive"
15 | access_key_id => "${S3_ACCESS_KEY}"
16 | secret_access_key => "${S3_SECRET_KEY}"
17 | endpoint => "${S3_ENDPOINT}"
18 | bucket => "${S3_BUCKET}"
19 |
20 | #
21 | # Standard Settings
22 | #
23 | validate_credentials_on_root_bucket => false
24 | codec => json_lines
25 | # Limit Data Lake file sizes to 5 GB
26 | size_file => 5000000000
27 | time_file => 1
28 | # encoding => "gzip"
29 | additional_settings => {
30 | force_path_style => true
31 | follow_redirects => false
32 | }
33 | }
34 | }
35 |
--------------------------------------------------------------------------------
/haproxy-filebeat-module/2-archive/haproxy-filebeat-module-reindex.yml:
--------------------------------------------------------------------------------
1 | input {
2 | s3 {
3 | #
4 | # Custom Settings
5 | #
6 | prefix => "haproxy-filebeat-module/2021-01-04"
7 | temporary_directory => "${S3_TEMP_DIR}/haproxy-filebeat-module-reindex"
8 | access_key_id => "${S3_ACCESS_KEY}"
9 | secret_access_key => "${S3_SECRET_KEY}"
10 | endpoint => "${S3_ENDPOINT}"
11 | bucket => "${S3_BUCKET}"
12 |
13 | #
14 | # Standard Settings
15 | #
16 | watch_for_new_files => false
17 | codec => json_lines
18 | additional_settings => {
19 | force_path_style => true
20 | follow_redirects => false
21 | }
22 | }
23 | }
24 | filter {
25 | }
26 | output {
27 | pipeline { send_to => "haproxy-filebeat-module-structure" }
28 | }
29 |
--------------------------------------------------------------------------------
/haproxy-filebeat-module/2-archive/haproxy-filebeat-module-structure.yml:
--------------------------------------------------------------------------------
1 | input {
2 | pipeline {
3 | address => "haproxy-filebeat-module-structure"
4 | }
5 | }
6 | filter {
7 | }
8 | output {
9 | elasticsearch {
10 | #
11 | # Custom Settings
12 | #
13 | id => "haproxy-filebeat-module-structure"
14 | index => "haproxy-filebeat-module"
15 | hosts => "${ES_ENDPOINT}"
16 | user => "${ES_USERNAME}"
17 | password => "${ES_PASSWORD}"
18 | }
19 | }
20 |
--------------------------------------------------------------------------------
/haproxy-filebeat-module/4-visualize/dashboard.json:
--------------------------------------------------------------------------------
1 | {
2 | "objects": [
3 | {
4 | "attributes": {
5 | "description": "",
6 | "kibanaSavedObjectMeta": {
7 | "searchSourceJSON": {
8 | "filter": [],
9 | "index": "filebeat-*",
10 | "query": {
11 | "language": "kuery",
12 | "query": ""
13 | }
14 | }
15 | },
16 | "title": "Backend breakdown [Filebeat HAProxy] ECS",
17 | "uiStateJSON": {},
18 | "version": 1,
19 | "visState": {
20 | "aggs": [
21 | {
22 | "enabled": true,
23 | "id": "1",
24 | "params": {},
25 | "schema": "metric",
26 | "type": "count"
27 | },
28 | {
29 | "enabled": true,
30 | "id": "2",
31 | "params": {
32 | "field": "haproxy.backend_name",
33 | "missingBucket": false,
34 | "missingBucketLabel": "Missing",
35 | "order": "desc",
36 | "orderBy": "1",
37 | "otherBucket": false,
38 | "otherBucketLabel": "Other",
39 | "size": 5
40 | },
41 | "schema": "segment",
42 | "type": "terms"
43 | }
44 | ],
45 | "params": {
46 | "addLegend": true,
47 | "addTooltip": true,
48 | "isDonut": true,
49 | "labels": {
50 | "last_level": true,
51 | "show": false,
52 | "truncate": 100,
53 | "values": true
54 | },
55 | "legendPosition": "right",
56 | "type": "pie"
57 | },
58 | "title": "Backend breakdown [Filebeat HAProxy] ECS",
59 | "type": "pie"
60 | }
61 | },
62 | "id": "55251360-aa32-11e8-9c06-877f0445e3e0-ecs",
63 | "type": "visualization",
64 | "updated_at": "2018-12-06T11:35:36.721Z",
65 | "version": 2
66 | },
67 | {
68 | "attributes": {
69 | "description": "",
70 | "kibanaSavedObjectMeta": {
71 | "searchSourceJSON": {
72 | "filter": [],
73 | "index": "filebeat-*",
74 | "query": {
75 | "language": "kuery",
76 | "query": ""
77 | }
78 | }
79 | },
80 | "title": "Frontend breakdown [Filebeat HAProxy] ECS",
81 | "uiStateJSON": {},
82 | "version": 1,
83 | "visState": {
84 | "aggs": [
85 | {
86 | "enabled": true,
87 | "id": "1",
88 | "params": {},
89 | "schema": "metric",
90 | "type": "count"
91 | },
92 | {
93 | "enabled": true,
94 | "id": "2",
95 | "params": {
96 | "field": "haproxy.frontend_name",
97 | "missingBucket": false,
98 | "missingBucketLabel": "Missing",
99 | "order": "desc",
100 | "orderBy": "1",
101 | "otherBucket": false,
102 | "otherBucketLabel": "Other",
103 | "size": 5
104 | },
105 | "schema": "segment",
106 | "type": "terms"
107 | }
108 | ],
109 | "params": {
110 | "addLegend": true,
111 | "addTooltip": true,
112 | "isDonut": true,
113 | "labels": {
114 | "last_level": true,
115 | "show": false,
116 | "truncate": 100,
117 | "values": true
118 | },
119 | "legendPosition": "right",
120 | "type": "pie"
121 | },
122 | "title": "Frontend breakdown [Filebeat HAProxy] ECS",
123 | "type": "pie"
124 | }
125 | },
126 | "id": "7fb671f0-aa32-11e8-9c06-877f0445e3e0-ecs",
127 | "type": "visualization",
128 | "updated_at": "2018-12-06T11:35:36.721Z",
129 | "version": 2
130 | },
131 | {
132 | "attributes": {
133 | "description": "",
134 | "kibanaSavedObjectMeta": {
135 | "searchSourceJSON": {
136 | "filter": [],
137 | "index": "filebeat-*",
138 | "query": {
139 | "language": "kuery",
140 | "query": ""
141 | }
142 | }
143 | },
144 | "title": "IP Geohashes [Filebeat HAProxy] ECS",
145 | "uiStateJSON": {
146 | "mapCenter": [
147 | 14.944784875088372,
148 | 5.09765625
149 | ]
150 | },
151 | "version": 1,
152 | "visState": {
153 | "aggs": [
154 | {
155 | "enabled": true,
156 | "id": "1",
157 | "params": {
158 | "field": "source.address"
159 | },
160 | "schema": "metric",
161 | "type": "cardinality"
162 | },
163 | {
164 | "enabled": true,
165 | "id": "2",
166 | "params": {
167 | "autoPrecision": true,
168 | "field": "source.geo.location",
169 | "isFilteredByCollar": true,
170 | "precision": 2,
171 | "useGeocentroid": true
172 | },
173 | "schema": "segment",
174 | "type": "geohash_grid"
175 | }
176 | ],
177 | "params": {
178 | "addTooltip": true,
179 | "heatBlur": 15,
180 | "heatMaxZoom": 16,
181 | "heatMinOpacity": 0.1,
182 | "heatNormalizeData": true,
183 | "heatRadius": 25,
184 | "isDesaturated": true,
185 | "legendPosition": "bottomright",
186 | "mapCenter": [
187 | 15,
188 | 5
189 | ],
190 | "mapType": "Scaled Circle Markers",
191 | "mapZoom": 2,
192 | "wms": {
193 | "enabled": false,
194 | "options": {
195 | "attribution": "Maps provided by USGS",
196 | "format": "image/png",
197 | "layers": "0",
198 | "styles": "",
199 | "transparent": true,
200 | "version": "1.3.0"
201 | },
202 | "url": "https://basemap.nationalmap.gov/arcgis/services/USGSTopo/MapServer/WMSServer"
203 | }
204 | },
205 | "title": "IP Geohashes [Filebeat HAProxy] ECS",
206 | "type": "tile_map"
207 | }
208 | },
209 | "id": "11f8b9c0-aa32-11e8-9c06-877f0445e3e0-ecs",
210 | "type": "visualization",
211 | "updated_at": "2018-12-06T11:35:36.721Z",
212 | "version": 2
213 | },
214 | {
215 | "attributes": {
216 | "description": "",
217 | "kibanaSavedObjectMeta": {
218 | "searchSourceJSON": {
219 | "filter": [],
220 | "index": "filebeat-*",
221 | "query": {
222 | "language": "kuery",
223 | "query": ""
224 | }
225 | }
226 | },
227 | "title": "Response codes over time [Filebeat HAProxy] ECS",
228 | "uiStateJSON": {
229 | "vis": {
230 | "colors": {
231 | "200": "#508642",
232 | "204": "#629E51",
233 | "302": "#6ED0E0",
234 | "404": "#EAB839",
235 | "503": "#705DA0"
236 | }
237 | }
238 | },
239 | "version": 1,
240 | "visState": {
241 | "aggs": [
242 | {
243 | "enabled": true,
244 | "id": "1",
245 | "params": {},
246 | "schema": "metric",
247 | "type": "count"
248 | },
249 | {
250 | "enabled": true,
251 | "id": "2",
252 | "params": {
253 | "customInterval": "2h",
254 | "extended_bounds": {},
255 | "field": "@timestamp",
256 | "interval": "auto",
257 | "min_doc_count": 1
258 | },
259 | "schema": "segment",
260 | "type": "date_histogram"
261 | },
262 | {
263 | "enabled": true,
264 | "id": "3",
265 | "params": {
266 | "field": "http.response.status_code",
267 | "missingBucket": false,
268 | "missingBucketLabel": "Missing",
269 | "order": "desc",
270 | "orderBy": "_term",
271 | "otherBucket": false,
272 | "otherBucketLabel": "Other",
273 | "size": 5
274 | },
275 | "schema": "group",
276 | "type": "terms"
277 | }
278 | ],
279 | "params": {
280 | "addLegend": true,
281 | "addTimeMarker": false,
282 | "addTooltip": true,
283 | "categoryAxes": [
284 | {
285 | "id": "CategoryAxis-1",
286 | "labels": {
287 | "show": true,
288 | "truncate": 100
289 | },
290 | "position": "bottom",
291 | "scale": {
292 | "type": "linear"
293 | },
294 | "show": true,
295 | "style": {},
296 | "title": {},
297 | "type": "category"
298 | }
299 | ],
300 | "grid": {
301 | "categoryLines": false,
302 | "style": {
303 | "color": "#eee"
304 | }
305 | },
306 | "legendPosition": "right",
307 | "seriesParams": [
308 | {
309 | "data": {
310 | "id": "1",
311 | "label": "Count"
312 | },
313 | "drawLinesBetweenPoints": true,
314 | "mode": "stacked",
315 | "show": "true",
316 | "showCircles": true,
317 | "type": "histogram",
318 | "valueAxis": "ValueAxis-1"
319 | }
320 | ],
321 | "times": [],
322 | "type": "histogram",
323 | "valueAxes": [
324 | {
325 | "id": "ValueAxis-1",
326 | "labels": {
327 | "filter": false,
328 | "rotate": 0,
329 | "show": true,
330 | "truncate": 100
331 | },
332 | "name": "LeftAxis-1",
333 | "position": "left",
334 | "scale": {
335 | "mode": "normal",
336 | "type": "linear"
337 | },
338 | "show": true,
339 | "style": {},
340 | "title": {
341 | "text": "Count"
342 | },
343 | "type": "value"
344 | }
345 | ]
346 | },
347 | "title": "Response codes over time [Filebeat HAProxy] ECS",
348 | "type": "histogram"
349 | }
350 | },
351 | "id": "68af8ef0-aa33-11e8-9c06-877f0445e3e0-ecs",
352 | "type": "visualization",
353 | "updated_at": "2018-12-06T11:35:36.721Z",
354 | "version": 2
355 | },
356 | {
357 | "attributes": {
358 | "description": "Filebeat HAProxy module dashboard",
359 | "hits": 0,
360 | "kibanaSavedObjectMeta": {
361 | "searchSourceJSON": {
362 | "filter": [],
363 | "query": {
364 | "language": "kuery",
365 | "query": ""
366 | }
367 | }
368 | },
369 | "optionsJSON": {
370 | "darkTheme": false,
371 | "hidePanelTitles": false,
372 | "useMargins": true
373 | },
374 | "panelsJSON": [
375 | {
376 | "embeddableConfig": {},
377 | "gridData": {
378 | "h": 15,
379 | "i": "1",
380 | "w": 24,
381 | "x": 0,
382 | "y": 0
383 | },
384 | "id": "55251360-aa32-11e8-9c06-877f0445e3e0-ecs",
385 | "panelIndex": "1",
386 | "type": "visualization",
387 | "version": "6.5.2"
388 | },
389 | {
390 | "embeddableConfig": {},
391 | "gridData": {
392 | "h": 15,
393 | "i": "2",
394 | "w": 24,
395 | "x": 24,
396 | "y": 0
397 | },
398 | "id": "7fb671f0-aa32-11e8-9c06-877f0445e3e0-ecs",
399 | "panelIndex": "2",
400 | "type": "visualization",
401 | "version": "6.5.2"
402 | },
403 | {
404 | "embeddableConfig": {},
405 | "gridData": {
406 | "h": 15,
407 | "i": "3",
408 | "w": 24,
409 | "x": 0,
410 | "y": 15
411 | },
412 | "id": "11f8b9c0-aa32-11e8-9c06-877f0445e3e0-ecs",
413 | "panelIndex": "3",
414 | "type": "visualization",
415 | "version": "6.5.2"
416 | },
417 | {
418 | "embeddableConfig": {},
419 | "gridData": {
420 | "h": 15,
421 | "i": "4",
422 | "w": 24,
423 | "x": 24,
424 | "y": 15
425 | },
426 | "id": "68af8ef0-aa33-11e8-9c06-877f0445e3e0-ecs",
427 | "panelIndex": "4",
428 | "type": "visualization",
429 | "version": "6.5.2"
430 | }
431 | ],
432 | "timeRestore": false,
433 | "title": "[Filebeat HAProxy] Overview ECS",
434 | "version": 1
435 | },
436 | "id": "3560d580-aa34-11e8-9c06-877f0445e3e0-ecs",
437 | "type": "dashboard",
438 | "updated_at": "2018-12-06T11:40:40.204Z",
439 | "version": 6
440 | }
441 | ],
442 | "version": "6.5.2"
443 | }
444 |
--------------------------------------------------------------------------------
/haproxy-filebeat-module/README.md:
--------------------------------------------------------------------------------
1 | We'll use the Filebeat HAProxy module to grab the HAProxy log file.
2 |
3 | You can also collect the log file without the module, but Filebeat can only
4 | read it one way at a time (you can't enable both methods).
5 |
6 | We'll use the Filebeat HAProxy module since it cleanly persists the
7 | HAProxy log file messages while also providing the appropriate metadata
8 | for the other module artifacts: Ingest Pipeline, Kibana Dashboard, etc.
9 |
10 | ```
11 | $ filebeat modules enable haproxy
12 | $ cat /etc/filebeat/modules.d/haproxy.yml
13 | - module: haproxy
14 | log:
15 | enabled: true
16 | var.input: file
17 | ```
18 | We should still be able to use the data collected by the module with
19 | the "raw" HAProxy data source adapter [here](/data-sources/haproxy).
20 |
21 | Since Beats modules ship with out-of-the-box assets that the Beat itself can
22 | load, you can use that method as described below.
23 |
24 | Load Index Template
25 |
26 | [https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-template.html](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-template.html)
27 |
28 | Load Kibana Dashboards
29 |
30 | [https://www.elastic.co/guide/en/beats/filebeat/current/load-kibana-dashboards.html](https://www.elastic.co/guide/en/beats/filebeat/current/load-kibana-dashboards.html)
31 |
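For example (a minimal sketch, assuming Filebeat on that host can reach your Elasticsearch and Kibana endpoints, and that you want the module's stock index template and dashboards rather than the data-lake naming used elsewhere in this repo):

```
$ filebeat setup --index-management
$ filebeat setup --dashboards
```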
--------------------------------------------------------------------------------
/images/architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/architecture.png
--------------------------------------------------------------------------------
/images/caiv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/caiv.png
--------------------------------------------------------------------------------
/images/data-source-assets.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/data-source-assets.png
--------------------------------------------------------------------------------
/images/elk-data-lake.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/elk-data-lake.png
--------------------------------------------------------------------------------
/images/indexing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/indexing.png
--------------------------------------------------------------------------------
/images/logical-elements.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/logical-elements.png
--------------------------------------------------------------------------------
/images/onboarding-data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/onboarding-data.png
--------------------------------------------------------------------------------
/images/terminology.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/terminology.png
--------------------------------------------------------------------------------
/images/workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/workflow.png
--------------------------------------------------------------------------------
/power-emu2/README.md:
--------------------------------------------------------------------------------
1 | # Monitoring Power with EMU-2
2 |
3 |
4 |
5 | The [EMU-2](https://www.rainforestautomation.com/rfa-z105-2-emu-2-2/) by Rainforest Automation displays your smart meter's data in real time. We'll connect to it via USB and use a Python script to receive its messages. The device should output the current demand (kW), current meter reading, and even the current price per kWh.
6 |
7 | Our goal is to build the following dashboard:
8 |
9 | 
10 |
11 | Let's get started.
12 |
13 | ## Step #1 - Collect Data
14 |
15 | Create a new python script called `~/bin/power-emu2.py` with the following contents:
16 |
17 | [power-emu2.py](power-emu2.py)
18 |
19 | You may need to adjust the USB port in the script to match your setup. Look for `/dev/ttyACM1` in the script.
20 |
21 | Decoding the messages from the EMU-2 can be tricky. There are technical documents to aid the process if you want to dig deeper than the provided Python script:
22 |
23 | * [Emu-2-Tech-Guide-1.05.pdf](https://github.com/rainforestautomation/Emu-Serial-API/blob/master/Emu-2-Tech-Guide-1.05.pdf)
24 | * [RAVEn. XML API Manual.pdf](https://rainforestautomation.com/wp-content/uploads/2014/02/raven_xml_api_r127.pdf)
25 |
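As a rough illustration of what the script is doing under the hood (a minimal sketch, not part of `power-emu2.py`; the sample XML fragment below is hypothetical but uses the field names from the RAVEn XML API manual), an `InstantaneousDemand` message decodes to kW like this:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# RAVEn timestamps count seconds from 2000-01-01 00:00:00 UTC
Y2K = datetime(2000, 1, 1, tzinfo=timezone.utc)

sample = """<InstantaneousDemand>
  <TimeStamp>0x28d0a3b7</TimeStamp>
  <Demand>0x00000226</Demand>
  <Multiplier>0x00000001</Multiplier>
  <Divisor>0x000003e8</Divisor>
</InstantaneousDemand>"""

root = ET.fromstring(sample)

demand = int(root.findtext("Demand"), 16)
if demand > 0x7FFFFFFF:        # Demand is a signed 32-bit value (can go negative with solar)
    demand -= 0x100000000
multiplier = int(root.findtext("Multiplier"), 16) or 1   # treat 0 as 1
divisor = int(root.findtext("Divisor"), 16) or 1         # treat 0 as 1
timestamp = Y2K + timedelta(seconds=int(root.findtext("TimeStamp"), 16))

# e.g. {'timestamp': '2021-...', 'demand_kW': 0.55}
print({"timestamp": timestamp.isoformat(), "demand_kW": demand * multiplier / divisor})
```
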
26 | Try running the script from the command line:
27 |
28 | ```bash
29 | chmod a+x ~/bin/power-emu2.py
30 | sudo ~/bin/power-emu2.py
31 | ```
32 |
33 | The output will include JSON-formatted messages from your smart meter.
34 |
35 | ```json
36 | {"message": "Starting", "timestamp": "2021-09-06T07:55:42Z", "status": "connected"}
37 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:55:23Z", "demand_kW": 0.558}
38 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:55:53Z", "demand_kW": 0.585}
39 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:56:08Z", "demand_kW": 0.63}
40 | {"message": "CurrentSummationDelivered", "timestamp": "2021-09-06T07:56:11Z", "summation_delivered": 73438571, "summation_received": 0, "meter_kWh": 73438.571}
41 | {"message": "PriceCluster", "timestamp": "2021-09-06T07:56:51Z", "price_cents_kWh": 5.399, "currency": 840, "tier": 0, "start_time": "2021-09-06T07:50:00Z", "duration": 1440}
42 | ```
43 |
44 | Hit `^c` to quit the script.
45 |
46 | Once you're able to successfully read from the EMU-2, create a log file for its output:
47 |
48 | ```bash
49 | sudo touch /var/log/power-emu2.log
50 | sudo chown ubuntu.ubuntu /var/log/power-emu2.log
51 | ```
52 |
53 | Create a logrotate entry so the log file doesn't grow unbounded:
54 |
55 | ```
56 | sudo vi /etc/logrotate.d/power-emu2
57 | ```
58 |
59 | Add the following content:
60 |
61 | ```
62 | /var/log/power-emu2.log {
63 | weekly
64 | rotate 12
65 | compress
66 | delaycompress
67 | missingok
68 | notifempty
69 | create 644 ubuntu ubuntu
70 | }
71 | ```
72 |
73 | Create a new bash script `~/bin/power-emu2.sh` with the following:
74 |
75 | ```bash
76 | #!/bin/bash
77 |
78 | if pgrep -f "sudo /home/ubuntu/bin/power-emu2.py" > /dev/null
79 | then
80 | echo "Already running."
81 | else
82 | echo "Not running. Restarting..."
83 | sudo /home/ubuntu/bin/power-emu2.py >> /var/log/power-emu2.log 2>&1
84 | fi
85 | ```
86 |
87 | Add the following entry to your crontab:
88 |
89 | ```
90 | * * * * * /home/ubuntu/bin/power-emu2.sh >> /tmp/power-emu2.log 2>&1
91 | ```
92 |
93 | Verify output by tailing the log file for a few minutes:
94 |
95 | ```
96 | tail -f /var/log/power-emu2.log
97 | ```
98 |
99 | If you're seeing output scroll each minute then you are successfully collecting data!
100 |
101 | ## Step #2 - Archive Data
102 |
103 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which in turn sends it to S3.
104 |
105 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your EMU-2 data:
106 |
107 | ```yaml
108 | filebeat.inputs:
109 | - type: log
110 | enabled: true
111 | tags: ["power-emu2"]
112 | paths:
113 | - /var/log/power-emu2.log
114 | ```
115 |
116 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
117 |
118 | Restart Filebeat:
119 |
120 | ```bash
121 | sudo systemctl restart filebeat
122 | ```
123 |
124 | You may want to tail syslog to see if Filebeat restarts without any issues:
125 |
126 | ```bash
127 | tail -f /var/log/syslog | grep filebeat
128 | ```
129 |
130 | At this point, we should have EMU-2 data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the EMU-2 data feed.
131 |
132 | Add the following conditional to your `distributor.yml` file:
133 |
134 | ```
135 | } else if "power-emu2" in [tags] {
136 | pipeline {
137 | send_to => ["power-emu2-archive"]
138 | }
139 | }
140 | ```
141 |
142 | Create a Logstash pipeline called `power-emu2-archive.yml` with the following contents:
143 |
144 | ```
145 | input {
146 | pipeline {
147 | address => "power-emu2-archive"
148 | }
149 | }
150 | filter {
151 | }
152 | output {
153 | s3 {
154 | #
155 | # Custom Settings
156 | #
157 | prefix => "power-emu2/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
158 | temporary_directory => "${S3_TEMP_DIR}/power-emu2-archive"
159 | access_key_id => "${S3_ACCESS_KEY}"
160 | secret_access_key => "${S3_SECRET_KEY}"
161 | endpoint => "${S3_ENDPOINT}"
162 | bucket => "${S3_BUCKET}"
163 |
164 | #
165 | # Standard Settings
166 | #
167 | validate_credentials_on_root_bucket => false
168 | codec => json_lines
169 | # Limit Data Lake file sizes to 5 GB
170 | size_file => 5000000000
171 | time_file => 60
172 | # encoding => "gzip"
173 | additional_settings => {
174 | force_path_style => true
175 | follow_redirects => false
176 | }
177 | }
178 | }
179 | ```
180 |
181 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
182 |
183 | ```bash
184 | sudo mv power-emu2-archive.yml /etc/logstash/conf.d/
185 | ```
186 |
187 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
188 |
189 | ```
190 | - pipeline.id: "power-emu2-archive"
191 |   path.config: "/etc/logstash/conf.d/power-emu2-archive.yml"
192 | ```
193 |
194 | And finally, restart the Logstash service:
195 |
196 | ```bash
197 | sudo systemctl restart logstash
198 | ```
199 |
200 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
201 |
202 | ```bash
203 | sudo tail -f /var/log/logstash/logstash-plain.log
204 | ```
205 |
206 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
207 |
208 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
209 |
210 | 
211 |
212 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
213 |
214 | 
215 |
216 | If you see your data being stored, then you are successfully archiving!
217 |
218 | ## Step #3 - Index Data
219 |
220 | Once Logstash is archiving the data, we need to index it with Elastic.
221 |
222 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
223 |
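Once data starts flowing, you can confirm what Elasticsearch inferred by checking the generated mapping in Dev Tools (the index pattern here assumes the daily `power-emu2-*` indices created below):

```
GET power-emu2-*/_mapping
```
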
224 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), we built the following filter chain to parse the raw JSON coming in.
225 |
226 | Create a new pipeline called `power-emu2-index.yml` with the following content:
227 |
228 | ```
229 | input {
230 | pipeline {
231 | address => "power-emu2-index"
232 | }
233 | }
234 | filter {
235 | json {
236 | source => "message"
237 | skip_on_invalid_json => true
238 | }
239 | json {
240 | source => "message"
241 | skip_on_invalid_json => true
242 | }
243 | date {
244 | match => ["timestamp", "ISO8601"]
245 | }
246 | mutate {
247 | remove_field => ["timestamp"]
248 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"]
249 | }
250 | }
251 | output {
252 | elasticsearch {
253 | #
254 | # Custom Settings
255 | #
256 | id => "power-emu2-index"
257 | index => "power-emu2-%{+YYYY.MM.dd}"
258 | hosts => "${ES_ENDPOINT}"
259 | user => "${ES_USERNAME}"
260 | password => "${ES_PASSWORD}"
261 | }
262 | }
263 | ```
264 |
265 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
266 |
267 | ```bash
268 | sudo mv power-emu2-index.yml /etc/logstash/conf.d/
269 | ```
270 |
271 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
272 |
273 | ```
274 | - pipeline.id: "power-emu2-index"
275 |   path.config: "/etc/logstash/conf.d/power-emu2-index.yml"
276 | ```
277 |
278 | Add your new index pipeline to the `send_to` list for the `power-emu2` tag in your `distributor.yml` pipeline:
279 |
280 | ```
281 | } else if "power-emu2" in [tags] {
282 | pipeline {
283 | send_to => ["power-emu2-archive", "power-emu2-index"]
284 | }
285 | }
286 | ```
287 |
288 | And finally, restart the Logstash service:
289 |
290 | ```bash
291 | sudo systemctl restart logstash
292 | ```
293 |
294 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
295 |
296 | ```bash
297 | sudo tail -f /var/log/logstash/logstash-plain.log
298 | ```
299 |
300 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
301 |
302 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
303 |
304 | 
305 |
306 | ## Step #4 - Visualize Data
307 |
308 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
309 |
310 | Download this dashboard: [power-emu2.ndjson](power-emu2.ndjson)
311 |
312 | Jump back into Kibana:
313 |
314 | 1. Select "Stack Management" from the menu
315 | 2. Select "Saved Objects"
316 | 3. Click "Import" in the upper right
317 |
318 | Once it's been imported, click on "Power EMU-2".
319 |
320 | 
321 |
322 | Congratulations! You should now be looking at power data from your EMU-2 in Elastic.
--------------------------------------------------------------------------------
/power-emu2/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/archive.png
--------------------------------------------------------------------------------
/power-emu2/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/dashboard.png
--------------------------------------------------------------------------------
/power-emu2/emu-2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/emu-2.jpg
--------------------------------------------------------------------------------
/power-emu2/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/index.png
--------------------------------------------------------------------------------
/power-emu2/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/minio.png
--------------------------------------------------------------------------------
/power-emu2/power-emu2.ndjson:
--------------------------------------------------------------------------------
1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"power-emu2-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-04T20:09:58.325Z","version":"WzMxMzI2NCwyXQ=="}
2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":12,\"h\":8,\"i\":\"2492dbbd-f32f-4d29-946d-0b6971538d0e\"},\"panelIndex\":\"2492dbbd-f32f-4d29-946d-0b6971538d0e\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Power EMU-2\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":12,\"y\":0,\"w\":36,\"h\":8,\"i\":\"58d3b589-f688-4815-b206-94f8e5bcf246\"},\"panelIndex\":\"58d3b589-f688-4815-b206-94f8e5bcf246\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"9b5ecb03-f611-4038-bdd0-1e033b9b64b3\":{\"columns\":{\"9fa24993-a762-4277-aa6d-e471e99152e6\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"b27753d2-962f-407c-a41d-d0af4ff04199\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"9fa24993-a762-4277-aa6d-e471e99152e6\",\"b27753d2-962f-407c-a41d-d0af4ff04199\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar\",\"layers\":[{\"layerId\":\"9b5ecb03-f611-4038-bdd0-1e033b9b64b3\",\"accessors\":[\"b27753d2-962f-407c-a41d-d0af4ff04199\"],\"position\":\"top\",\"seriesType\":\"bar\",\"showGridlines\":false,\"xAccessor\":\"9fa24993-a762-4277-aa6d-e471e99152e6\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-9b5ecb03-f611-4038-bdd0-1e033b9b64b3\"}]},\"enhancements\":{},\"hidePanelTitles\":false},\"title\":\"Logs\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":8,\"w\":48,\"h\":11,\"i\":\"b1c3297d-5e54-4a20-8181-5cb483faf23d\"},\"panelIndex\":\"b1c3297d-5e54-4a20-8181-5cb483faf23d\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\":{\"columns\":{\"623795c1-643f-4db6-9378-059abb5dc58b\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBuc
keted\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\":{\"label\":\"Median of demand_kW\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"demand_kW\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"623795c1-643f-4db6-9378-059abb5dc58b\",\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\",\"accessors\":[\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"623795c1-643f-4db6-9378-059abb5dc58b\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Demand (kW)\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":19,\"w\":48,\"h\":9,\"i\":\"0161139a-4d80-41ab-aed3-7d3042269d5a\"},\"panelIndex\":\"0161139a-4d80-41ab-aed3-7d3042269d5a\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"25083df7-ad95-4b9c-882c-5d82a00a15ed\":{\"columns\":{\"45c88792-c3c8-40f6-a1a1-3643062294f8\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"ac569630-a85e-46fc-a682-c30e1b1bbef5\":{\"label\":\"Differences of Sum of meter_kWh\",\"dataType\":\"number\",\"operationType\":\"differences\",\"isBucketed\":false,\"scale\":\"ratio\",\"references\":[\"ff830719-e828-470a-8bdd-07e23f048769\"]},\"ff830719-e828-470a-8bdd-07e23f048769\":{\"label\":\"Sum of 
meter_kWh\",\"dataType\":\"number\",\"operationType\":\"sum\",\"sourceField\":\"meter_kWh\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"45c88792-c3c8-40f6-a1a1-3643062294f8\",\"ac569630-a85e-46fc-a682-c30e1b1bbef5\",\"ff830719-e828-470a-8bdd-07e23f048769\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\",\"showSingleSeries\":false},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"25083df7-ad95-4b9c-882c-5d82a00a15ed\",\"accessors\":[\"ac569630-a85e-46fc-a682-c30e1b1bbef5\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"45c88792-c3c8-40f6-a1a1-3643062294f8\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-25083df7-ad95-4b9c-882c-5d82a00a15ed\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":28,\"w\":48,\"h\":10,\"i\":\"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f\"},\"panelIndex\":\"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"d4322c84-ac6d-4819-aa31-75a87a04bb0f\":{\"columns\":{\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"021d0fc2-f73f-463f-a7c3-5d030e404f68\":{\"label\":\"Differences of Maximum of summation_delivered\",\"dataType\":\"number\",\"operationType\":\"differences\",\"isBucketed\":false,\"scale\":\"ratio\",\"references\":[\"aa4fb545-bc30-40a7-b6cc-942b35936030\"]},\"aa4fb545-bc30-40a7-b6cc-942b35936030\":{\"label\":\"Maximum of 
summation_delivered\",\"dataType\":\"number\",\"operationType\":\"max\",\"sourceField\":\"summation_delivered\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\",\"021d0fc2-f73f-463f-a7c3-5d030e404f68\",\"aa4fb545-bc30-40a7-b6cc-942b35936030\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"d4322c84-ac6d-4819-aa31-75a87a04bb0f\",\"accessors\":[\"021d0fc2-f73f-463f-a7c3-5d030e404f68\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-d4322c84-ac6d-4819-aa31-75a87a04bb0f\"}]},\"enhancements\":{}}}]","timeRestore":false,"title":"Power EMU-2","version":1},"coreMigrationVersion":"7.14.0","id":"b58dc570-0dbd-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"58d3b589-f688-4815-b206-94f8e5bcf246:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"58d3b589-f688-4815-b206-94f8e5bcf246:indexpattern-datasource-layer-9b5ecb03-f611-4038-bdd0-1e033b9b64b3","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"b1c3297d-5e54-4a20-8181-5cb483faf23d:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"b1c3297d-5e54-4a20-8181-5cb483faf23d:indexpattern-datasource-layer-6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"0161139a-4d80-41ab-aed3-7d3042269d5a:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"0161139a-4d80-41ab-aed3-7d3042269d5a:indexpattern-datasource-layer-25083df7-ad95-4b9c-882c-5d82a00a15ed","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f:indexpattern-datasource-layer-d4322c84-ac6d-4819-aa31-75a87a04bb0f","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-06T07:40:24.254Z","version":"WzM1MDE5OSwyXQ=="}
3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
--------------------------------------------------------------------------------
/power-emu2/power-emu2.py:
--------------------------------------------------------------------------------
1 | #! /usr/bin/env python3
2 |
3 | import datetime
4 | import json
5 | import os
6 | import platform
7 | import serial
8 | import sys
9 | import time
10 | import xml.etree.ElementTree as et
11 |
12 | data = {}
13 | data['message'] = "Starting"
14 | d = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
15 | data['timestamp'] = d
16 |
17 | Y2K = 946684800  # seconds between 1970-01-01 and 2000-01-01; EMU-2 timestamps count from 2000-01-01 UTC
18 |
19 | try:
20 | dev = '/dev/ttyACM1'
21 | emu2 = serial.Serial(dev, 115200, timeout=1)
22 | data['status'] = "connected"
23 | except:
24 | data['status'] = "could not connect"
25 | print(json.dumps(data), flush=True)
26 | exit()
27 |
28 | print(json.dumps(data), flush=True)
29 |
30 | while True:
31 | try:
32 | msg = emu2.readlines()
33 | except:
34 | data = {}
35 | data['message'] = "error"
36 | d = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
37 | data['timestamp'] = d
38 | print(json.dumps(data))
39 | exit()
40 |
41 | if msg == [] or msg[0].decode()[0] != '<':
42 | continue
43 |
44 | msg = ''.join([line.decode() for line in msg])
45 |
46 | try:
47 | tree = et.fromstring(msg)
48 | #print(msg)
49 | except:
50 | continue
51 |
52 | data = {}
53 | data['message'] = tree.tag
54 |
55 | if tree.tag == 'InstantaneousDemand':
56 | # Received every 15 seconds
57 | ts = int(tree.find('TimeStamp').text, 16)
58 | t = ts + Y2K # ts + Y2K = Unix Epoch Time
59 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ")
60 | data['timestamp'] = d
61 | power = int(tree.find('Demand').text, 16)
62 | power *= int(tree.find('Multiplier').text, 16)
63 | power /= int(tree.find('Divisor').text, 16)
64 | power = round(power, int(tree.find('DigitsRight').text, 16))
65 | data['demand_kW'] = power
66 | elif tree.tag == 'PriceCluster':
67 | # Received every 1-2 minutes
68 | ts = int(tree.find('TimeStamp').text, 16)
69 | t = ts + Y2K # ts + Y2K = Unix Epoch Time
70 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ")
71 | data['timestamp'] = d
72 | #data['price'] = int(tree.find('Price').text, 16)
73 | #data['trailing'] = int(tree.find('TrailingDigits').text, 16)
74 | data['price_cents_kWh'] = int(tree.find('Price').text, 16)
75 | data['price_cents_kWh'] /= 1000
76 | data['currency'] = int(tree.find('Currency').text, 16)
77 | # Currency uses ISO 4217 codes
78 | # US Dollar is code 840
79 | data['tier'] = int(tree.find('Tier').text, 16)
80 | st = int(tree.find('StartTime').text, 16)
81 | st = st + Y2K # st + Y2K = Unix Epoch Time
82 | d = datetime.datetime.fromtimestamp(st).strftime("%Y-%m-%dT%H:%M:%SZ")
83 | data['start_time'] = d
84 | data['duration'] = int(tree.find('Duration').text, 16)
85 | elif tree.tag == 'CurrentSummationDelivered':
86 | # Received every 3-5 minutes
87 | ts = int(tree.find('TimeStamp').text, 16)
88 | t = ts + Y2K # ts + Y2K = Unix Epoch Time
89 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ")
90 | data['timestamp'] = d
91 | data['summation_delivered'] = int(tree.find('SummationDelivered').text, 16)
92 | data['summation_received'] = int(tree.find('SummationReceived').text, 16)
93 | energy = int(tree.find('SummationDelivered').text, 16)
94 | energy -= int(tree.find('SummationReceived').text, 16)
95 | energy *= int(tree.find('Multiplier').text, 16)
96 | energy /= int(tree.find('Divisor').text, 16)
97 | energy = round(energy, int(tree.find('DigitsRight').text, 16))
98 | data['meter_kWh'] = energy
99 | elif tree.tag == 'TimeCluster':
100 | # Received every 15 minutes
101 | ts = int(tree.find('UTCTime').text, 16)
102 | t = ts + Y2K # ts + Y2K = Unix Epoch Time
103 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ")
104 | data['timestamp'] = d
105 | ts = int(tree.find('LocalTime').text, 16)
106 | t = ts + Y2K # ts + Y2K = Unix Epoch Time
107 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%S")
108 | data['local_time'] = d
109 | else:
110 | for child in tree:
111 | if child:
112 | value = int(child.text, 16) if child.text[:2] == '0x' else child.text
113 | data['unknown'] = value
114 |
115 | print(json.dumps(data), flush=True)
116 |
--------------------------------------------------------------------------------
/power-hs300/README.md:
--------------------------------------------------------------------------------
1 | # Monitoring Power with HS300
2 |
3 |
4 |
5 | The [Kasa Smart Wi-Fi Power Strip (HS300)](https://www.kasasmart.com/us/products/smart-plugs/kasa-smart-wi-fi-power-strip-hs300) is a consumer-grade power strip that allows you to independently control and monitor 6 smart outlets (and charge 3 devices with built-in USB ports). The power strip can be controlled via the Kasa Smart [iPhone](https://apps.apple.com/us/app/kasa-smart/id1034035493) app or [Android](https://play.google.com/store/apps/details?id=com.tplink.kasa_android&hl=en_US&gl=US) app. Furthermore, you can query it via API to get the electrical properties of each outlet. For example:
6 |
7 | * Voltage
8 | * Current
9 | * Watts
10 | * Watts per hour
11 |
12 | We'll use a Python script to query it each minute via a cron job, and redirect the output to a log file. From there, Filebeat will pick it up and send it into Elastic. Many data-center-grade PSUs also provide ways to query individual outlet metrics, so a similar script could be written to extract this information in a commercial setting.
13 |
14 | In general, this exercise is meant to bring transparency to the cost of electricity to run a set of machines. If we know how much power a machine is consuming, we can calculate its electricity cost based on utility rates.
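
As a rough, hypothetical example (the wattage and utility rate below are assumptions, not measurements):

```python
# Back-of-the-envelope electricity cost for one machine.
# Both inputs are illustrative assumptions; plug in your own readings and rate.
watts = 25.0          # average draw reported for an outlet
rate_per_kwh = 0.15   # utility rate in $ per kWh

kwh_per_month = watts / 1000 * 24 * 30        # 0.025 kW * 720 h = 18 kWh
cost_per_month = kwh_per_month * rate_per_kwh

print(f"{kwh_per_month:.1f} kWh/month ~= ${cost_per_month:.2f}/month")  # 18.0 kWh/month ~= $2.70/month
```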
15 |
16 | 
17 |
18 | Let's get started.
19 |
20 | ## Step #1 - Collect Data
21 |
22 | Install the following Python module that knows how to query the power strip:
23 |
24 | ```bash
25 | pip3 install pyhs100
26 | ```
27 |
28 | Find the IP address of the power strip:
29 |
30 | ```bash
31 | pyhs100 discover | grep IP
32 | ```
33 |
34 | This should return an IP address for each HS300 on your network:
35 |
36 | ```
37 | Host/IP: 192.168.1.5
38 | Host/IP: 192.168.1.6
39 | ```
40 |
41 | Try querying the power strip:
42 |
43 | ```bash
44 | /home/ubuntu/.local/bin/pyhs100 --ip 192.168.1.5 emeter
45 | ```
46 |
47 | You should see output similar to:
48 |
49 | ```
50 | {0: {'voltage_mv': 112807, 'current_ma': 239, 'power_mw': 24620, 'total_wh': 12}, 1: {'voltage_mv': 112608, 'current_ma': 243, 'power_mw': 23948, 'total_wh': 12}, 2: {'voltage_mv': 112608, 'current_ma': 238, 'power_mw': 23453, 'total_wh': 11}, 3: {'voltage_mv': 112509, 'current_ma': 70, 'power_mw': 5399, 'total_wh': 4}, 4: {'voltage_mv': 112409, 'current_ma': 93, 'power_mw': 3130, 'total_wh': 1}, 5: {'voltage_mv': 109030, 'current_ma': 78, 'power_mw': 5787, 'total_wh': 2}}
51 | ```
52 |
53 | This is not properly formatted JSON, but the script included with this data source will help clean it up.
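
If you're curious how that cleanup works, here's a minimal sketch that mirrors what the script does: quote the outlet-number keys and swap single quotes for double quotes so the output becomes valid JSON (the `raw` string below is a trimmed-down sample).

```python
import json

# A trimmed-down sample of pyhs100 emeter output (Python dict syntax, not valid JSON)
raw = "{0: {'voltage_mv': 112807, 'current_ma': 239, 'power_mw': 24620, 'total_wh': 12}}"

cleaned = raw.replace("'", '"')                    # single quotes -> double quotes
for i in range(6):
    cleaned = cleaned.replace(f"{i}:", f'"{i}":')  # quote the outlet-number keys

outlets = json.loads(cleaned)
print(outlets["0"]["power_mw"] / 1000)             # watts, e.g. 24.62
```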
54 |
55 | After you've verified that you can query the power strip, download the following script and open it in your favorite editor:
56 |
57 | [power-hs300.py](power-hs300.py)
58 |
59 | Modify the script with the following:
60 |
61 | * Change the IP addresses to match that of your power strip(s)
62 | * Change the directory location of the `pyhs100` command
63 | * Change the names of each outlet in the `hosts` dictionary
64 | * Change the `label` argument in the `query_power_strip()` function calls
65 |
66 | Try running the script from the command line:
67 |
68 | ```bash
69 | chmod a+x ~/bin/power-hs300.py
70 | ~/bin/power-hs300.py
71 | ```
72 |
73 | The output will include a JSON-formatted summary of each power outlet's metrics.
74 |
75 | ```json
76 | {"@timestamp": "2021-02-08T14:32:11.611868", "outlets": [{"ip": "192.168.1.5", "outlet": 0, "name": "node-1", "volts": 112.393, "amps": 0.254, "watts": 25.425, "label": "office"}, ...]}
77 | ```
78 |
79 | When pretty-printed, it will look like this:
80 |
81 | ```json
82 | {
83 | "@timestamp": "2021-02-08T14:32:11.611868",
84 | "outlets": [
85 | {
86 | "ip": "192.168.1.5",
87 | "label": "office",
88 | "outlet": 0,
89 | "name": "node-1",
90 | "volts": 112.393,
91 | "amps": 0.254,
92 | "watts": 25.425
93 | },
94 | ...
95 | ]
96 | }
97 | ```
98 |
99 | Once you're able to successfully query the power strip, create a log file for its output:
100 |
101 | ```bash
102 | sudo touch /var/log/power-hs300.log
103 | sudo chown ubuntu.ubuntu /var/log/power-hs300.log
104 | ```
105 |
106 | Create a logrotate entry so the log file doesn't grow unbounded:
107 |
108 | ```
109 | sudo vi /etc/logrotate.d/power-hs300
110 | ```
111 |
112 | Add the following content:
113 |
114 | ```
115 | /var/log/power-hs300.log {
116 | weekly
117 | rotate 12
118 | compress
119 | delaycompress
120 | missingok
121 | notifempty
122 | create 644 ubuntu ubuntu
123 | }
124 | ```
125 |
126 | Add the following entry to your crontab:
127 |
128 | ```
129 | * * * * * /home/ubuntu/bin/power-hs300.py >> /var/log/power-hs300.log 2>&1
130 | ```
131 |
132 | Verify output by tailing the log file for a few minutes:
133 |
134 | ```
135 | $ tail -f /var/log/power-hs300.log
136 | ```
137 |
138 | If you're seeing output scroll each minute then you are successfully collecting data!
139 |
140 | ## Step #2 - Archive Data
141 |
142 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which in turn sends it to S3.
143 |
144 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your HS300 data:
145 |
146 | ```yaml
147 | filebeat.inputs:
148 | - type: log
149 | enabled: true
150 | tags: ["power-hs300"]
151 | paths:
152 | - /var/log/power-hs300.log
153 | ```
154 |
155 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
156 |
157 | Restart Filebeat:
158 |
159 | ```bash
160 | sudo systemctl restart filebeat
161 | ```
162 |
163 | You may want to tail syslog to see if Filebeat restarts without any issues:
164 |
165 | ```bash
166 | tail -f /var/log/syslog | grep filebeat
167 | ```
168 |
169 | At this point, we should have HS300 data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the HS300 data feed.
170 |
171 | Add the following conditional to your `distributor.yml` file:
172 |
173 | ```
174 | } else if "power-hs300" in [tags] {
175 | pipeline {
176 | send_to => ["power-hs300-archive"]
177 | }
178 | }
179 | ```
180 |
181 | Create a Logstash pipeline called `power-hs300-archive.yml` with the following contents:
182 |
183 | ```
184 | input {
185 | pipeline {
186 | address => "power-hs300-archive"
187 | }
188 | }
189 | filter {
190 | }
191 | output {
192 | s3 {
193 | #
194 | # Custom Settings
195 | #
196 | prefix => "power-hs300/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
197 | temporary_directory => "${S3_TEMP_DIR}/power-hs300-archive"
198 | access_key_id => "${S3_ACCESS_KEY}"
199 | secret_access_key => "${S3_SECRET_KEY}"
200 | endpoint => "${S3_ENDPOINT}"
201 | bucket => "${S3_BUCKET}"
202 |
203 | #
204 | # Standard Settings
205 | #
206 | validate_credentials_on_root_bucket => false
207 | codec => json_lines
208 | # Limit Data Lake file sizes to 5 GB
209 | size_file => 5000000000
210 | time_file => 60
211 | # encoding => "gzip"
212 | additional_settings => {
213 | force_path_style => true
214 | follow_redirects => false
215 | }
216 | }
217 | }
218 | ```
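
A quick note on the standard settings: `time_file => 60` closes and uploads the current archive file every 60 minutes, which lines up with the hourly directory in `prefix`, while `size_file` caps any single object at roughly 5 GB in case an hour of data is unusually large.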
219 |
220 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
221 |
222 | ```bash
223 | sudo mv power-hs300-archive.yml /etc/logstash/conf.d/
224 | ```
225 |
226 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
227 |
228 | ```
229 | - pipeline.id: "power-hs300-archive"
230 |   path.config: "/etc/logstash/conf.d/power-hs300-archive.yml"
231 | ```
232 |
233 | And finally, restart the Logstash service:
234 |
235 | ```bash
236 | sudo systemctl restart logstash
237 | ```
238 |
239 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
240 |
241 | ```bash
242 | sudo tail -f /var/log/logstash/logstash-plain.log
243 | ```
244 |
245 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
246 |
247 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
248 |
249 | 
250 |
251 | Check your S3 bucket to see if directories for the current date & hour are being created and populated with data:
252 |
253 | 
254 |
255 | If you see your data being stored, then you are successfully archiving!
256 |
257 | ## Step #3 - Index Data
258 |
259 | Once Logstash is archiving the data, we need to index it with Elastic.
260 |
261 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
262 |
263 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain was built to parse the raw JSON coming in.
264 |
265 | Create a new pipeline called `power-hs300-index.yml` with the following content:
266 |
267 | ```
268 | input {
269 | pipeline {
270 | address => "power-hs300-index"
271 | }
272 | }
273 | filter {
274 | mutate {
275 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"]
276 | }
277 | json {
278 | source => "message"
279 | skip_on_invalid_json => true
280 | }
281 | if "_jsonparsefailure" in [tags] {
282 | drop { }
283 | }
284 | split {
285 | field => "outlets"
286 | }
287 | ruby {
288 | code => "
289 | event.get('outlets').each do |k, v|
290 | event.set(k, v)
291 | if k == '@timestamp'
292 | event.set(k, v + 'Z')
293 | end
294 | end
295 | event.remove('outlets')
296 | "
297 | }
298 | if "_rubyexception" in [tags] {
299 | drop { }
300 | }
301 | mutate {
302 | remove_field => ["message"]
303 | remove_field => ["@version"]
304 | }
305 | }
306 | output {
307 | elasticsearch {
308 | #
309 | # Custom Settings
310 | #
311 | id => "power-hs300-index"
312 | index => "power-hs300-%{+YYYY.MM.dd}"
313 | hosts => "${ES_ENDPOINT}"
314 | user => "${ES_USERNAME}"
315 | password => "${ES_PASSWORD}"
316 | }
317 | }
318 | ```
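
If it helps to picture what the `split` and `ruby` filters are doing, here's a rough Python equivalent (a sketch, not the actual pipeline): each log line holding an `outlets` array is fanned out into one flat document per outlet, with the outlet's fields promoted to the top level (the `ruby` block also appends a `Z` to any `@timestamp` it finds there to mark it as UTC).

```python
import json

line = ('{"@timestamp": "2021-02-08T14:32:11.611868", "outlets": ['
        '{"ip": "192.168.1.5", "label": "office", "outlet": 0, "name": "node-1", "volts": 112.393}, '
        '{"ip": "192.168.1.5", "label": "office", "outlet": 1, "name": "node-2", "volts": 112.608}]}')

event = json.loads(line)
for outlet in event["outlets"]:             # split { field => "outlets" }
    doc = dict(outlet)                      # promote the outlet's fields to the top level
    doc["@timestamp"] = event["@timestamp"]
    print(json.dumps(doc))                  # one document per outlet
```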
319 |
320 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
321 |
322 | ```bash
323 | sudo mv power-hs300-index.yml /etc/logstash/conf.d/
324 | ```
325 |
326 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
327 |
328 | ```
329 | - pipeline.id: "power-hs300-index"
330 |   path.config: "/etc/logstash/conf.d/power-hs300-index.yml"
331 | ```
332 |
333 | And finally, restart the Logstash service:
334 |
335 | ```bash
336 | sudo systemctl restart logstash
337 | ```
338 |
339 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
340 |
341 | ```bash
342 | sudo tail -f /var/log/logstash/logstash-plain.log
343 | ```
344 |
345 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
346 |
347 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
348 |
349 | 
350 |
351 | ## Step #4 - Visualize Data
352 |
353 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
354 |
355 | Download this dashboard: [power-hs300.ndjson](power-hs300.ndjson)
356 |
357 | Jump back into Kibana:
358 |
359 | 1. Select "Stack Management" from the menu
360 | 2. Select "Saved Objects"
361 | 3. Click "Import" in the upper right
362 |
363 | Once it's been imported, click on "Power HS300".
364 |
365 | 
366 |
367 | Congratulations! You should now be looking at power data from your HS300 in Elastic.
--------------------------------------------------------------------------------
/power-hs300/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/archive.png
--------------------------------------------------------------------------------
/power-hs300/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/dashboard.png
--------------------------------------------------------------------------------
/power-hs300/hs300.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/hs300.png
--------------------------------------------------------------------------------
/power-hs300/hs300.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import datetime
4 | import json
5 | import subprocess
6 |
7 | def query_power_strip(ip_addr, label, hosts, outlets, time):
8 | output = subprocess.getoutput("/home/ubuntu/.local/bin/pyhs100 --ip " + ip_addr +
9 | " emeter | grep voltage")
10 | output = output.replace("'", "\"")
11 | output = output.replace("0:", "\"0\":")
12 | output = output.replace("1:", "\"1\":")
13 | output = output.replace("2:", "\"2\":")
14 | output = output.replace("3:", "\"3\":")
15 | output = output.replace("4:", "\"4\":")
16 | output = output.replace("5:", "\"5\":")
17 |
18 | try:
19 | json_output = json.loads(output)
20 | for i in range(0, 6):
21 | reading = {}
22 | reading["ip"] = ip_addr
23 | reading["label"] = label
24 | reading["outlet"] = i
25 | reading["name"] = hosts[i]
26 | reading["volts"] = json_output[f"{i}"]["voltage_mv"] / 1000
27 | reading["amps"] = json_output[f"{i}"]["current_ma"] / 1000
28 | reading["watts"] = json_output[f"{i}"]["power_mw"] / 1000
29 | # Record then erase, the stats from the meter only at the top of each hour.
30 | # This gives us a clean "watts/hour" reading every 1 hour.
31 | if time.minute == 0:
32 | reading["watt_hours"] = json_output[f"{i}"]["total_wh"]
33 | erase_output = subprocess.getoutput("/home/ubuntu/.local/bin/pyhs100 --ip " +
34 | ip_addr + " emeter --erase")
35 | outlets.append(reading)
36 | except Exception as e:
37 | print(e)
38 |
39 | def main():
40 | # This script is designed to run every minute.
41 | # If it's the top of the hour, the "watt_hours" are also queried,
42 | # which often makes the runtime of this script greater than 1 minute.
43 | # So we capture the time the script started because we'll likely write
44 | # to output after another invocation of this script.
45 | # Even though these events will be written "out of order",
46 | # recording the correct invocation time will be important.
47 | now = datetime.datetime.utcnow()
48 | outlets = []
49 |
50 | hosts = {
51 | 0: "node-22",
52 | 1: "5k-monitor",
53 | 2: "node-17",
54 | 3: "node-18",
55 | 4: "node-21",
56 | 5: "switch-8"
57 | }
58 | query_power_strip("192.168.1.81", "desk", hosts, outlets, now)
59 |
60 | hosts = {
61 | 0: "node-1",
62 | 1: "node-2",
63 | 2: "node-3",
64 | 3: "node-0",
65 | 4: "switch-8-poe",
66 | 5: "udm-pro"
67 | }
68 | query_power_strip("192.168.1.82", "office", hosts, outlets, now)
69 |
70 | hosts = {
71 | 0: "node-9",
72 | 1: "node-10",
73 | 2: "node-6",
74 | 3: "node-4",
75 | 4: "node-5",
76 | 5: "node-20"
77 | }
78 | query_power_strip("192.168.1.83", "basement", hosts, outlets, now)
79 |
80 | power = {
81 | "@timestamp": now.isoformat(),
82 | "outlets": outlets
83 | }
84 |
85 | print(json.dumps(power))
86 |
87 | if __name__ == "__main__":
88 | main()
89 |
90 |
--------------------------------------------------------------------------------
/power-hs300/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/index.png
--------------------------------------------------------------------------------
/power-hs300/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/minio.png
--------------------------------------------------------------------------------
/power-hs300/reindex.yml:
--------------------------------------------------------------------------------
1 | input {
2 | s3 {
3 | #
4 | # Custom Settings
5 | #
6 | prefix => "power-hs300/2021-02-15/15-"
7 | #prefix => "power-hs300/2021-01-29/16-00"
8 | temporary_directory => "${S3_TEMP_DIR}/reindex"
9 | access_key_id => "${S3_ACCESS_KEY}"
10 | secret_access_key => "${S3_SECRET_KEY}"
11 | endpoint => "${S3_ENDPOINT}"
12 | bucket => "${S3_BUCKET}"
13 |
14 | #
15 | # Standard Settings
16 | #
17 | watch_for_new_files => false
18 | sincedb_path => "/dev/null"
19 | codec => json_lines
20 | additional_settings => {
21 | force_path_style => true
22 | follow_redirects => false
23 | }
24 | }
25 | }
26 | filter {
27 | mutate {
28 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"]
29 | gsub => [
30 | "message", "@timestamp", "ts"
31 | ]
32 | }
33 | json {
34 | source => "message"
35 | skip_on_invalid_json => true
36 | }
37 | if "_jsonparsefailure" in [tags] {
38 | drop { }
39 | }
40 | split {
41 | field => "outlets"
42 | }
43 | ruby {
44 | code => "
45 | event.get('outlets').each do |k, v|
46 | event.set(k, v)
47 | end
48 | event.remove('outlets')
49 | "
50 | }
51 | if "_rubyexception" in [tags] {
52 | drop { }
53 | }
54 | mutate {
55 | remove_field => ["message", "@timestamp"]
56 | }
57 | date {
58 | match => ["ts", "YYYY-MM-dd'T'HH:mm:ss.SSSSSS"]
59 | timezone => "UTC"
60 | target => "@timestamp"
61 | }
62 | mutate {
63 | remove_field => ["ts"]
64 | }
65 | }
66 | output {
67 | stdout {
68 | codec => dots
69 | }
70 | elasticsearch {
71 | index => "power-hs300-%{+YYYY.MM.dd}"
72 | hosts => "${ES_ENDPOINT}"
73 | user => "${ES_USERNAME}"
74 | password => "${ES_PASSWORD}"
75 | }
76 | }
77 |
--------------------------------------------------------------------------------
/satellites/README.md:
--------------------------------------------------------------------------------
1 | # Tracking Satellites with Elastic
2 |
3 |
4 |
5 | Oftentimes the data we collect includes geospatial information that is worth seeing on a map. [Elastic Maps](https://www.elastic.co/maps) is a great way to visualize this data to better understand how it is behaving. [The Elastic Stack](https://www.elastic.co/what-is/elk-stack) supports a wide range of [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) that include [geo points](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html). For this data source, we track the location of over 1,000 [Starlink](https://en.wikipedia.org/wiki/Starlink) satellites and the [International Space Station](https://en.wikipedia.org/wiki/International_Space_Station) (ISS).
6 |
7 | Plotting the location of these satellites involves getting the latest [TLE](https://en.wikipedia.org/wiki/Two-line_element_set) from [Celestrak](http://www.celestrak.com/Norad/elements/table.php?tleFile=starlink&title=Starlink%20Satellites&orbits=0&pointsPerRev=90&frame=1), then using the [Skyfield API](https://rhodesmill.org/skyfield/) to convert the TLE to a Latitude & Longitude by providing a time and date.
8 |
9 | After we get the data ingested and indexed, we will use [Elastic Maps](https://www.elastic.co/maps) to plot our data:
10 |
11 | 
12 |
13 | Let's get started!
14 |
15 | ## Step #1 - Collect Data
16 |
17 | Install the following Python module that knows how to convert TLE information into latitude & longitude:
18 |
19 | ```bash
20 | $ pip3 install skyfield
21 | ```
22 |
23 | Create a new python script called `satellites.py` with the following contents:
24 |
25 | ```python
26 | #!/usr/bin/env python3
27 |
28 | import datetime, json, time
29 | from skyfield.api import load, wgs84
30 |
31 | def main():
32 | stations_url = 'http://celestrak.com/NORAD/elements/stations.txt'
33 | stations = load.tle_file(stations_url, reload=True)
34 | starlink_url = 'http://celestrak.com/NORAD/elements/starlink.txt'
35 | starlinks = load.tle_file(starlink_url, reload=True)
36 |
37 | while True:
38 | now = datetime.datetime.utcnow()
39 | ts = load.timescale()
40 |
41 | satellites = []
42 | output = {}
43 | output['@timestamp'] = now.strftime('%Y-%m-%dT%H:%M:%SZ')
44 |
45 | by_name = {station.name: station for station in stations}
46 | station = by_name['ISS (ZARYA)']
47 | satellite = {}
48 | satellite['name'] = 'ISS'
49 | satellite['sat_num'] = station.model.satnum
50 | geocentric = station.at(ts.now())
51 | subpoint = wgs84.subpoint(geocentric)
52 | geo_point = {}
53 | geo_point['lat'] = subpoint.latitude.degrees
54 | geo_point['lon'] = subpoint.longitude.degrees
55 | satellite['location'] = geo_point
56 | satellite['elevation'] = int(subpoint.elevation.m)
57 | satellites.append(satellite)
58 |
59 | for starlink in starlinks:
60 | try:
61 | geocentric = starlink.at(ts.now())
62 | subpoint = wgs84.subpoint(geocentric)
63 | satellite = {}
64 | satellite['name'] = starlink.name
65 | satellite['sat_num'] = starlink.model.satnum
66 | geo_point = {}
67 | geo_point['lat'] = subpoint.latitude.degrees
68 | geo_point['lon'] = subpoint.longitude.degrees
69 | satellite['location'] = geo_point
70 | satellite['elevation'] = int(subpoint.elevation.m)
71 | satellites.append(satellite)
72 | except:
73 | pass
74 |
75 | output['satellites'] = satellites
76 | print(json.dumps(output))
77 |
78 | time.sleep(3)
79 |
80 | if __name__ == "__main__":
81 | main()
82 | ```
83 |
84 | Create a new bash script called `satellites.sh` with the following contents:
85 |
86 | ```bash
87 | #!/bin/bash
88 |
89 | if pgrep -f "python3 /home/ubuntu/python/satellites/satellites.py" > /dev/null
90 | then
91 | echo "Already running."
92 | else
93 | echo "Not running. Restarting..."
94 | /home/ubuntu/python/satellites/satellites.py >> /var/log/satellites.log
95 | fi
96 | ```
97 |
98 | You can store these wherever you'd like. Good places for them are the `~/python` and `~/bin` directories, respectively.
99 |
100 | Try running the Python script directly:
101 |
102 | ```
103 | $ chmod a+x ~/python/satellites/satellites.py
104 | $ ~/python/satellites/satellites.py
105 | ```
106 |
107 | You should see output similar to:
108 |
109 | ```json
110 | {"@timestamp": "2021-04-18T16:47:54Z", "satellites": [{"name": "ISS", "sat_num": 25544, "location": {"lat": -9.499628732834388, "lon": 5.524255661695312}, "elevation": 421272}, {"name": "STARLINK-24", "sat_num": 44238, "location": {"lat": -53.0987009533634, "lon": 75.21545552082654}, "elevation": 539139}]}
111 | ```
112 |
113 | Once you confirm the script is working, you can redirect its output to a log file:
114 |
115 | ```
116 | $ sudo touch /var/log/satellites.log
117 | $ sudo chown ubuntu.ubuntu /var/log/satellites.log
118 | ```
119 |
120 | Create a logrotate entry so the log file doesn't grow unbounded:
121 |
122 | ```
123 | $ sudo vi /etc/logrotate.d/satellites
124 | ```
125 |
126 | Add the following content:
127 |
128 | ```
129 | /var/log/satellites.log {
130 | weekly
131 | rotate 12
132 | compress
133 | delaycompress
134 | missingok
135 | notifempty
136 | create 644 ubuntu ubuntu
137 | }
138 | ```
139 |
140 | Add the following entry to your crontab:
141 |
142 | ```
143 | * * * * * /home/ubuntu/bin/satellites.sh > /dev/null 2>&1
144 | ```
145 |
146 | Verify output by tailing the log file for a few minutes:
147 |
148 | ```
149 | $ tail -f /var/log/satellites.log
150 | ```
151 |
152 | Tell Filebeat to send the events in it to Elastic by editing `/etc/filebeat/filebeat.yml`:
153 |
154 | ```
155 | filebeat.inputs:
156 | - type: log
157 | enabled: true
158 | tags: ["satellites"]
159 | paths:
160 | - /var/log/satellites.log
161 | ```
162 |
163 | Restart Filebeat:
164 |
165 | ```
166 | $ sudo systemctl restart filebeat
167 | ```
168 |
169 | We now have a reliable collection method that queues the satellite data on disk in a log file. Next, we'll leverage Filebeat to handle the domain-specific logic of shipping it to Logstash reliably, dealing with retries, backoff logic, and more.
170 |
171 | ## Step #2 - Archive Data
172 |
173 | Once you have a data source that's ready to archive, we'll turn to Filebeat to send in the data to Logstash. By default, our `distributor` pipeline will put any unrecognized data in a Data Lake bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the satellites data feed and create two pipelines that know how to archive it in the Data Lake.
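
The `distributor` change mirrors the earlier data sources: add a conditional that matches the `satellites` tag set by Filebeat and routes those events to the `satellites-archive` pipeline address (and to a `satellites-index` address if you index in the same pass).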
174 |
175 | Create an Archive Pipeline:
176 |
177 | ```
178 | input {
179 | pipeline {
180 | address => "satellites-archive"
181 | }
182 | }
183 | filter {
184 | }
185 | output {
186 | s3 {
187 | #
188 | # Custom Settings
189 | #
190 | prefix => "satellites/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
191 | temporary_directory => "${S3_TEMP_DIR}/satellites-archive"
192 | access_key_id => "${S3_ACCESS_KEY}"
193 | secret_access_key => "${S3_SECRET_KEY}"
194 | endpoint => "${S3_ENDPOINT}"
195 | bucket => "${S3_BUCKET}"
196 |
197 | #
198 | # Standard Settings
199 | #
200 | validate_credentials_on_root_bucket => false
201 | codec => json_lines
202 | # Limit Data Lake file sizes to 5 GB
203 | size_file => 5000000000
204 | time_file => 60
205 | # encoding => "gzip"
206 | additional_settings => {
207 | force_path_style => true
208 | follow_redirects => false
209 | }
210 | }
211 | }
212 | ```
213 |
214 | If you're doing this in an environment with multiple Logstash instances, please adapt these instructions to your workflow for deploying updates. Ansible is a great configuration management tool for this purpose.
215 |
216 | ## Step #3 - Index Data
217 |
218 | Once Logstash is archiving the data, we need to index it with Elastic.
219 |
220 | Create an Index Template:
221 |
222 | ```
223 | PUT _index_template/satellites
224 | {
225 | "index_patterns": ["satellites-*"],
226 | "template": {
227 | "settings": {},
228 | "mappings": {
229 | "properties": {
230 | "location": {
231 | "type": "geo_point"
232 | }
233 | }
234 | },
235 | "aliases": {}
236 | }
237 | }
238 | ```
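
Creating this template before the first `satellites-*` index exists matters: dynamic mapping on its own would map `location` as an object with two numeric fields rather than a `geo_point`, and Elastic Maps would not be able to plot it.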
239 |
240 | Create an Index Pipeline:
241 |
242 | ```
243 | input {
244 | pipeline {
245 | address => "satellites-index"
246 | }
247 | }
248 | filter {
249 | json {
250 | source => "message"
251 | }
252 | json {
253 | source => "message"
254 | }
255 | split {
256 | field => "satellites"
257 | }
258 | mutate {
259 | rename => { "[satellites][name]" => "[name]" }
260 | rename => { "[satellites][sat_num]" => "[sat_num]" }
261 | rename => { "[satellites][location]" => "[location]" }
262 | rename => { "[satellites][elevation]" => "[elevation]" }
263 | remove_field => ["message", "agent", "input", "@version", "satellites"]
264 | }
265 | }
266 | output {
267 | elasticsearch {
268 | #
269 | # Custom Settings
270 | #
271 | id => "satellites-index"
272 | index => "satellites-%{+YYYY.MM.dd}"
273 | hosts => "${ES_ENDPOINT}"
274 | user => "${ES_USERNAME}"
275 | password => "${ES_PASSWORD}"
276 | }
277 | }
278 | ```
279 |
280 | Deploy your pipeline.
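
Before heading to Kibana, it can help to confirm documents are landing in the index. Here's a minimal sketch using Python's `requests` library; the endpoint, username, and password are placeholders for your own values (the same ones stored in the Logstash keystore).

```python
import requests

# Placeholders: substitute your ES_ENDPOINT, ES_USERNAME, and ES_PASSWORD values
es = "https://elasticsearch.my-domain.com:9243"
auth = ("elastic", "changeme")

resp = requests.get(
    f"{es}/satellites-*/_search",
    auth=auth,
    json={"size": 1, "sort": [{"@timestamp": "desc"}]},
)
hit = resp.json()["hits"]["hits"][0]["_source"]
print(hit["name"], hit["location"])   # e.g. ISS {'lat': ..., 'lon': ...}
```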
281 |
282 | ## Step #4 - Visualize Data
283 |
284 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
285 |
286 |
--------------------------------------------------------------------------------
/satellites/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/satellites/dashboard.png
--------------------------------------------------------------------------------
/satellites/satellites.ndjson:
--------------------------------------------------------------------------------
1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"satellites-*"},"coreMigrationVersion":"7.12.0","id":"6979b870-9945-11eb-9bde-9dcf1fa82a43","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-04-18T17:45:39.267Z","version":"WzUyNDcyNywxMl0="}
2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"7.12.0\",\"type\":\"map\",\"gridData\":{\"x\":0,\"y\":0,\"w\":48,\"h\":26,\"i\":\"dae266e1-0d25-4c8e-83e8-547fb8ae8cfd\"},\"panelIndex\":\"dae266e1-0d25-4c8e-83e8-547fb8ae8cfd\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"description\":\"\",\"layerListJSON\":\"[{\\\"sourceDescriptor\\\":{\\\"type\\\":\\\"EMS_TMS\\\",\\\"isAutoSelect\\\":true},\\\"id\\\":\\\"0722a46c-af6c-4837-ae8c-4a1895a2385a\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"TILE\\\"},\\\"type\\\":\\\"VECTOR_TILE\\\"},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"6979b870-9945-11eb-9bde-9dcf1fa82a43\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"TOP_HITS\\\",\\\"topHitsSplitField\\\":\\\"name.keyword\\\",\\\"topHitsSize\\\":1,\\\"id\\\":\\\"a24b740c-1f80-4013-a190-dfd89b2fab24\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"tooltipProperties\\\":[\\\"elevation\\\",\\\"name\\\"],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\"},\\\"id\\\":\\\"d4790890-f1d6-4730-bd88-4f8eca8e0fc0\\\",\\\"label\\\":\\\"Satellites\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"confectionery\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"color\\\":\\\"Green to Red\\\",\\\"colorCategory\\\":\\\"palette_0\\\",\\\"field\\\":{\\\"name\\\":\\\"elevation\\\",\\\"origin\\\":\\\"source\\\"},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3},\\\"type\\\":\\\"ORDINAL\\\",\\\"useCustomColorRamp\\\":false}},\\\"lineColor\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"color\\\":\\\"Green to 
Red\\\",\\\"colorCategory\\\":\\\"palette_0\\\",\\\"field\\\":{\\\"name\\\":\\\"elevation\\\",\\\"origin\\\":\\\"source\\\"},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3},\\\"type\\\":\\\"ORDINAL\\\",\\\"useCustomColorRamp\\\":false}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":1}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":6}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]}]\",\"mapStateJSON\":\"{\\\"zoom\\\":1.95,\\\"center\\\":{\\\"lon\\\":-99.72924,\\\"lat\\\":22.13816},\\\"timeFilters\\\":{\\\"from\\\":\\\"now-1m\\\",\\\"to\\\":\\\"now\\\"},\\\"refreshConfig\\\":{\\\"isPaused\\\":false,\\\"interval\\\":5000},\\\"query\\\":{\\\"query\\\":\\\"\\\",\\\"language\\\":\\\"kuery\\\"},\\\"filters\\\":[],\\\"settings\\\":{\\\"autoFitToDataBounds\\\":false,\\\"backgroundColor\\\":\\\"#1d1e24\\\",\\\"disableInteractive\\\":false,\\\"disableTooltipControl\\\":false,\\\"hideToolbarOverlay\\\":false,\\\"hideLayerControl\\\":false,\\\"hideViewControl\\\":false,\\\"initialLocation\\\":\\\"LAST_SAVED_LOCATION\\\",\\\"fixedLocation\\\":{\\\"lat\\\":0,\\\"lon\\\":0,\\\"zoom\\\":2},\\\"browserLocation\\\":{\\\"zoom\\\":2},\\\"maxZoom\\\":24,\\\"minZoom\\\":0,\\\"showScaleControl\\\":false,\\\"showSpatialFilters\\\":true,\\\"spatialFiltersAlpa\\\":0.3,\\\"spatialFiltersFillColor\\\":\\\"#DA8B45\\\",\\\"spatialFiltersLineColor\\\":\\\"#DA8B45\\\"}}\",\"uiStateJSON\":\"{\\\"isLayerTOCOpen\\\":true,\\\"openTOCDetails\\\":[]}\"},\"mapCenter\":{\"lat\":22.13816,\"lon\":-99.72924,\"zoom\":1.95},\"mapBuffer\":{\"minLon\":-395.447,\"minLat\":-88.57154500000001,\"maxLon\":195.98852,\"maxLat\":115.932715},\"isLayerTOCOpen\":true,\"openTOCDetails\":[],\"hiddenLayers\":[],\"enhancements\":{}}}]","timeRestore":false,"title":"Satellites","version":1},"coreMigrationVersion":"7.12.0","id":"4ae756b0-9ee0-11eb-892f-d146407b15b5","migrationVersion":{"dashboard":"7.11.0"},"references":[{"id":"6979b870-9945-11eb-9bde-9dcf1fa82a43","name":"layer_1_source_index_pattern","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-04-18T17:58:42.036Z","version":"WzUyNTEzNSwxMl0="}
3 | {"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
--------------------------------------------------------------------------------
/satellites/satellites.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/satellites/satellites.png
--------------------------------------------------------------------------------
/setup/README.md:
--------------------------------------------------------------------------------
1 | # Setup
2 |
3 | To build the architecture for the Elastic Data Lake, you'll need these components:
4 |
5 | * Logstash
6 | * HAProxy (or equivalent)
7 | * S3 Data Store (or equivalent)
8 | * Elastic Cluster
9 |
10 | Here is the architecture we're building:
11 |
12 | 
13 |
14 | ## Prerequisites
15 |
16 | This guide depends on you having an S3 store and Elasticsearch cluster already running. We'll use [Elastic Cloud](https://elastic.co) to run our Elasticsearch cluster and [Minio](https://www.digitalocean.com/community/tutorials/how-to-set-up-an-object-storage-server-using-minio-on-ubuntu-18-04) as an S3 data store (or any S3-compliant service).
17 |
18 | ## Step 1 - Logstash
19 |
20 | Identify the host where you want to run Logstash. Depending on the volume of ingest you anticipate, you may want to run Logstash on multiple hosts (or containers). It scales easily, so putting HAProxy in front of it (which we'll do next) makes it easy to add more capacity.
21 |
22 | Follow these instructions to get Logstash up and running:
23 |
24 | [https://www.elastic.co/guide/en/logstash/current/installing-logstash.html](https://www.elastic.co/guide/en/logstash/current/installing-logstash.html)
25 |
26 | Next, create a [Logstash keystore](https://www.elastic.co/guide/en/logstash/current/keystore.html) to store sensitive information and variables:
27 |
28 | ```
29 | $ export LOGSTASH_KEYSTORE_PASS=mypassword
30 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash create
31 | ```
32 |
33 | **Note:** Store this password somewhere safe. You will also need to add it to the environment that starts the Logstash process.
34 |
35 | We'll use the keystore to fill in variables about our Elasticsearch cluster:
36 |
37 | ```
38 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_ENDPOINT
39 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_USERNAME
40 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_PASSWORD
41 | ```
42 |
43 | The `ES_ENDPOINT` value should be a full domain with `https` prefix and `:9243` port suffix. For example:
44 |
45 | ```
46 | https://elasticsearch.my-domain.com:9243
47 | ```
48 |
49 | We'll also use the keystore to fill in variables about our S3 bucket:
50 |
51 | ```
52 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_ENDPOINT
53 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_BUCKET
54 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_ACCESS_KEY
55 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_SECRET_KEY
56 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_DATE_DIR
57 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_TEMP_DIR
58 | ```
59 |
60 | The `S3_DATE_DIR` variable is used to organize your data into `date/time` directories in the Data Lake. For example, `data-source/2021-01-01/13` will contain data collected January 1, 2021 during the 1PM GMT hour. Organizing your data in this manner gives you good granularity in terms of identifying what time windows you may want to re-index in the future. It allows you to reindex data from a year, month, day, or hour interval. An hour (as opposed to using the hour with minute granularity) provides a nice balance between flushing what's in Logstash to your archive relatively often, while not creating a "too many files" burden on the underlying archive file system. Many file systems can handle lots of files; it's more the latency involved in recalling them that we want to avoid.
61 |
62 | The recommended value for `S3_DATE_DIR` is:
63 |
64 | ```
65 | %{+YYYY}-%{+MM}-%{+dd}/%{+HH}
66 | ```
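
The `%{+...}` tokens are formatted against each event's timestamp, so an event from January 1, 2021 at 13:42 UTC ends up under `data-source/2021-01-01/13/`. The Python sketch below shows the equivalent formatting, just to make the mapping concrete:

```python
from datetime import datetime, timezone

event_time = datetime(2021, 1, 1, 13, 42, tzinfo=timezone.utc)
date_dir = event_time.strftime("%Y-%m-%d/%H")   # same shape as %{+YYYY}-%{+MM}-%{+dd}/%{+HH}
print(f"data-source/{date_dir}/")               # data-source/2021-01-01/13/
```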
67 |
68 | The `S3_TEMP_DIR` variable should point to a directory where Logstash can temporarily store events. Since this directory will contain events, you may need to make it secure so that only the Logstash process can read it (in addition to write to it).
69 |
70 | If Logstash is running on an isolated host, you may set it to:
71 |
72 | ```
73 | /tmp/logstash
74 | ```
75 |
76 | ### Ansible Pipeline Management
77 |
78 | We'll configure Logstash using Ansible. Ansible is a popular software provisioning tool that makes deploying configuration updates to multiple servers a breeze. If you can SSH into a host, you can use Ansible to push configuration to it.
79 |
80 | Create a directory to hold the Logstash configuration we'll be pushing to each Logstash host.
81 |
82 | ```
83 | $ mkdir logstash
84 | $ vi playbook-logstash.yml
85 | ```
86 |
87 | Add the following content to your Logstash Ansible playbook.
88 |
89 | **Note:** Replace `node-1` and `node-2` with the names of your Logstash hosts.
90 |
91 | ```
92 | ---
93 | - hosts: node-1:node-2
94 | become: yes
95 | gather_facts: no
96 |
97 | tasks:
98 | - name: Copy in pipelines.yml
99 | template:
100 | src: "pipelines.yml"
101 | dest: "/etc/logstash/pipelines.yml"
102 | mode: 0644
103 |
104 | - name: Remove existing pipelines
105 | file:
106 | path: "/etc/logstash/conf.d"
107 | state: absent
108 |
109 | - name: Copy in pipelines
110 | copy:
111 | src: "conf.d"
112 | dest: "/etc/logstash/"
113 |
114 | - name: Restart Logstash
115 | service:
116 | name: logstash
117 | state: restarted
118 | enabled: true
119 |
120 | ```
121 |
122 | ## Step 2 - HAProxy
123 |
124 | Identify the host where you want to run HAProxy. Most Linux distributions include it in their standard package repositories.
125 |
126 | On Ubuntu, run:
127 |
128 | ```
129 | $ sudo apt install haproxy
130 | ```
131 |
132 | On Red Hat, run:
133 |
134 | ```
135 | $ sudo yum install haproxy
136 | ```
137 |
138 | A sample configuration file is provided: [haproxy.cfg](haproxy.cfg)
139 |
--------------------------------------------------------------------------------
/setup/dead-letter-queue-archive.yml:
--------------------------------------------------------------------------------
1 | input {
2 | dead_letter_queue {
3 | pipeline_id => "haproxy-filebeat-module-structure"
4 | path => "${S3_TEMP_DIR}/dead-letter-queue"
5 | # This directory needs created by hand (change /tmp/logstash if necessary):
6 | # mkdir -p /tmp/logstash/dead-letter-queue/haproxy-filebeat-module-structure
7 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue
8 | }
9 | dead_letter_queue {
10 | pipeline_id => "haproxy-metricbeat-module-structure"
11 | path => "${S3_TEMP_DIR}/dead-letter-queue"
12 | # This directory needs created by hand (change /tmp/logstash if necessary):
13 | # mkdir -p /tmp/logstash/dead-letter-queue/haproxy-metricbeat-module-structure
14 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue
15 | }
16 | dead_letter_queue {
17 | pipeline_id => "system-filebeat-module-structure"
18 | path => "${S3_TEMP_DIR}/dead-letter-queue"
19 | # This directory needs created by hand (change /tmp/logstash if necessary):
20 | # mkdir -p /tmp/logstash/dead-letter-queue/system-filebeat-module-structure
21 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue
22 | }
23 | dead_letter_queue {
24 | pipeline_id => "unknown-structure"
25 | path => "${S3_TEMP_DIR}/dead-letter-queue"
26 | # This directory needs created by hand (change /tmp/logstash if necessary):
27 | # mkdir -p /tmp/logstash/dead-letter-queue/unknown-structure
28 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue
29 | }
30 | dead_letter_queue {
31 | pipeline_id => "utilization-structure"
32 | path => "${S3_TEMP_DIR}/dead-letter-queue"
33 | # This directory needs created by hand (change /tmp/logstash if necessary):
34 | # mkdir -p /tmp/logstash/dead-letter-queue/utilization-structure
35 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue
36 | }
37 | }
38 | filter {
39 | }
40 | output {
41 | s3 {
42 | #
43 | # Custom Settings
44 | #
45 | prefix => "dead-letter-queue-archive/${S3_DATE_DIR}"
46 | temporary_directory => "${S3_TEMP_DIR}/dead-letter-queue-archive"
47 | access_key_id => "${S3_ACCESS_KEY}"
48 | secret_access_key => "${S3_SECRET_KEY}"
49 | endpoint => "${S3_ENDPOINT}"
50 | bucket => "${S3_BUCKET}"
51 |
52 | #
53 | # Standard Settings
54 | #
55 | validate_credentials_on_root_bucket => false
56 | codec => json_lines
57 | # Limit Data Lake file sizes to 5 GB
58 | size_file => 5000000000
59 | time_file => 1
60 | # encoding => "gzip"
61 | additional_settings => {
62 | force_path_style => true
63 | follow_redirects => false
64 | }
65 | }
66 | }
67 |
--------------------------------------------------------------------------------
/setup/distributor.conf:
--------------------------------------------------------------------------------
1 | input {
2 | tcp {
3 | port => 4044
4 | }
5 | beats {
6 | port => 5044
7 | }
8 | }
9 | filter {
10 | # Raw data filters go here.
11 | # Filter out any data you don't want in the Data Lake or Elasticsearch.
12 | }
13 | output {
14 | if "utilization" in [tags] {
15 | pipeline {
16 | send_to => ["utilization-archive", "utilization-structure"]
17 | }
18 | } else if [agent][type] == "filebeat" and [event][module] == "system" {
19 | pipeline {
20 | send_to => ["system-filebeat-module-archive", "system-filebeat-module-structure"]
21 | }
22 | } else if [agent][type] == "filebeat" and [event][module] == "haproxy" {
23 | pipeline {
24 | send_to => ["haproxy-filebeat-module-archive", "haproxy-filebeat-module-structure"]
25 | }
26 | } else if [agent][type] == "metricbeat" and [event][module] == "haproxy" {
27 | pipeline {
28 | send_to => ["haproxy-metricbeat-module-archive", "haproxy-metricbeat-module-structure"]
29 | }
30 | } else {
31 | pipeline {
32 | send_to => ["unknown-archive", "unknown-structure"]
33 | }
34 | }
35 | }
36 |
--------------------------------------------------------------------------------
/setup/haproxy.cfg:
--------------------------------------------------------------------------------
1 | global
2 | log /dev/log local0
3 | log /dev/log local1 notice
4 | chroot /var/lib/haproxy
5 | stats socket 127.0.0.1:14567
6 | user haproxy
7 | group haproxy
8 | daemon
9 | tune.ssl.default-dh-param 2048
10 |
11 | ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
12 | ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
13 | ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
14 |
15 | defaults
16 | log global
17 | mode http
18 | option httplog
19 | option dontlognull
20 | timeout connect 5000
21 | timeout client 50000
22 | timeout server 50000
23 | errorfile 400 /etc/haproxy/errors/400.http
24 | errorfile 403 /etc/haproxy/errors/403.http
25 | errorfile 408 /etc/haproxy/errors/408.http
26 | errorfile 500 /etc/haproxy/errors/500.http
27 | errorfile 502 /etc/haproxy/errors/502.http
28 | errorfile 503 /etc/haproxy/errors/503.http
29 | errorfile 504 /etc/haproxy/errors/504.http
30 |
31 | # Logstash TCP
32 | listen logstash-tcp:4443
33 | #log /dev/log local0 debug
34 | mode tcp
35 | bind *:4443 ssl crt /etc/haproxy/certs/logstash.corp-intranet.pem ca-file /etc/haproxy/certs/ca.crt verify required
36 | option tcp-check
37 | balance roundrobin
38 | server proxy 127.0.0.1:4044 check port 4044
39 |
40 | # Logstash Beats
41 | listen logstash-beats:5443
42 | #log /dev/log local0 debug
43 | mode tcp
44 | bind *:5443 ssl crt /etc/haproxy/certs/logstash.corp-intranet.pem ca-file /etc/haproxy/certs/ca.crt verify required
45 | option tcp-check
46 | balance roundrobin
47 | server proxy 127.0.0.1:5044 check port 5044
48 |
49 | # Elasticsearch
50 | listen elasticsearch:9243
51 | #log /dev/log local0 debug
52 | mode http
53 | bind *:9243 ssl crt /etc/haproxy/certs/corp-intranet.pem
54 | http-request add-header X-Found-Cluster f40ec3b5bf1c4d8d81b3934cb97c8a32
55 | option ssl-hello-chk
56 | server proxy f40ec3b5bf1c4d8d81b3934cb97c8a32.us-central1.gcp.cloud.es.io:9243 check ssl port 9243 verify none
57 |
58 | # MinIO
59 | listen minio:9443
60 | #log /dev/log local0 debug
61 | mode http
62 | bind *:9443 ssl crt /etc/haproxy/certs/corp-intranet.pem
63 | http-request set-header X-Forwarded-Port %[dst_port]
64 | http-request add-header X-Forwarded-Proto https if { ssl_fc }
65 | option tcp-check
66 | balance roundrobin
67 | server proxy 127.0.0.1:9000 check port 9000
68 |
--------------------------------------------------------------------------------
/setup/needs-classified-archive.yml:
--------------------------------------------------------------------------------
1 | input {
2 | pipeline {
3 | address => "needs-classified-archive"
4 | }
5 | }
6 | filter {
7 | }
8 | output {
9 | s3 {
10 | #
11 | # Custom Settings
12 | #
13 | prefix => "NEEDS_CLASSIFIED/${S3_DATE_DIR}"
14 | temporary_directory => "${S3_TEMP_DIR}/needs-classified-archive"
15 | access_key_id => "${S3_ACCESS_KEY}"
16 | secret_access_key => "${S3_SECRET_KEY}"
17 | endpoint => "${S3_ENDPOINT}"
18 | bucket => "${S3_BUCKET}"
19 |
20 | #
21 | # Standard Settings
22 | #
23 | validate_credentials_on_root_bucket => false
24 | codec => json_lines
25 | # Limit Data Lake file sizes to 5 GB
26 | size_file => 5000000000
27 | time_file => 1
28 | # encoding => "gzip"
29 | additional_settings => {
30 | force_path_style => true
31 | follow_redirects => false
32 | }
33 | }
34 | }
35 |
--------------------------------------------------------------------------------
/solar-enphase/README.md:
--------------------------------------------------------------------------------
1 | # Solar Monitoring with Enphase
2 |
3 |
4 |
5 | The [IQ 7+](https://store.enphase.com/storefront/en-us/iq-7plus-microinverter) from Enphase is a microinverter compatible with 60- and 72-cell solar panels that can produce 295 VA at peak power. Enphase provides an [API](https://developer.enphase.com/docs#envoys) that allows us to query a set of these microinverters reporting into their service. They offer a range of [Plans](https://developer.enphase.com/plans), including a free plan, which we'll be using for this data source.
6 |
7 | For this data source, we'll build the following dashboard with Elastic:
8 |
9 | 
10 |
11 | Let's get started!
12 |
13 | ## Step #1 - Collect Data
14 |
15 | Create a new python script called `~/bin/solar-enphase.py` with the following contents:
16 |
17 | [solar-enphase.py](solar-enphase.py)
18 |
19 | The script queries a set of Enphase's APIs at different intervals. The goal is to stay within our allotted quota of 10k API calls per month. We'll write the data collected to our data lake, but only use a portion of it for analysis in Elastic.
20 |
21 | Take a few minutes to familiarize yourself with the script. There are a few values you'll need to change: adjust the API key, user ID, and system ID to suit your needs. The Enphase [Developer Portal](https://developer.enphase.com) is where you can get these values.
22 |
23 | When you're ready, try running the script:
24 |
25 | ```bash
26 | chmod a+x ~/bin/solar-enphase.py
27 | ~/bin/solar-enphase.py
28 | ```
29 |
30 | You may not see any output, and this is by design (not a great design, admittedly, but it works for now). Since we're limited to ~300 API calls per day on the Free plan, the script checks which minute of the hour it's running on to determine which API calls to make.
31 |
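Concretely, the gating in [solar-enphase.py](solar-enphase.py) boils down to a few checks on the current UTC time. The following is just a sketch of that logic, with the endpoint calls elided:

```python
import datetime

now = datetime.datetime.utcnow()

if now.hour == 0 and now.minute == 0:
    pass  # once per day: energy_lifetime and inventory endpoints

if now.minute == 0:
    pass  # once per hour: summary endpoint

if now.minute % 10 == 0:
    pass  # every 10 minutes: inverters_summary_by_envoy_or_site and stats endpoints
```
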
32 | If you run the script at :00, :10, :20, :30, :40, or :50 past the hour, you should see output on `stdout` similar to:
33 |
34 | ```json
35 | [{"signal_strength":0,"micro_inverters":[{"id":40236944,"serial_number":"121927062331","model":"IQ7+","part_number":"800-00625-r02","sku":"IQ7PLUS-72-2-US","status":"normal","power_produced":28,"proc_load":"520-00082-r01-v04.27.04","param_table":"540-00242-r01-v04.22.09","envoy_serial_number":"111943015132",...
36 | ```
37 |
38 | Once you confirm the script is working, you can redirect its output to a log file:
39 |
40 | ```bash
41 | sudo touch /var/log/solar-enphase.log
42 | sudo chown ubuntu.ubuntu /var/log/solar-enphase.log
43 | ```
44 |
45 | Create a logrotate entry so the log file doesn't grow unbounded:
46 |
47 | ```bash
48 | sudo vi /etc/logrotate.d/solar-enphase
49 | ```
50 |
51 | Add the following logrotate content:
52 |
53 | ```
54 | /var/log/solar-enphase.log {
55 | weekly
56 | rotate 12
57 | compress
58 | delaycompress
59 | missingok
60 | notifempty
61 | create 644 ubuntu ubuntu
62 | }
63 | ```
64 |
65 | Add the following entry to your crontab with `crontab -e`:
66 |
67 | ```
68 | * * * * * /home/ubuntu/bin/solar-enphase.py >> /var/log/solar-enphase.log 2>&1
69 | ```
70 |
71 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute):
72 |
73 | ```bash
74 | tail -f /var/log/solar-enphase.log
75 | ```
76 |
77 | If you're seeing output scroll every 10 minutes, then you are successfully collecting data!
78 |
79 | ## Step #2 - Archive Data
80 |
81 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3.
82 |
83 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your Solar Enphase data:
84 |
85 | ```yaml
86 | filebeat.inputs:
87 | - type: log
88 | enabled: true
89 | tags: ["solar-enphase"]
90 | paths:
91 | - /var/log/solar-enphase.log
92 | ```
93 |
94 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
95 |
96 | Restart Filebeat:
97 |
98 | ```bash
99 | sudo systemctl restart filebeat
100 | ```
101 |
102 | You may want to tail syslog to see if Filebeat restarts without any issues:
103 |
104 | ```bash
105 | tail -f /var/log/syslog | grep filebeat
106 | ```
107 |
108 | At this point, we should have Solar Enphase data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the Solar Enphase data feed.
109 |
110 | Add the following conditional to your `distributor.yml` file:
111 |
112 | ```
113 | } else if "solar-enphase" in [tags] {
114 | pipeline {
115 | send_to => ["solar-enphase-archive"]
116 | }
117 | }
118 | ```
119 |
120 | Create a Logstash pipeline called `solar-enphase-archive.yml` with the following contents:
121 |
122 | ```
123 | input {
124 | pipeline {
125 | address => "solar-enphase-archive"
126 | }
127 | }
128 | filter {
129 | }
130 | output {
131 | s3 {
132 | #
133 | # Custom Settings
134 | #
135 | prefix => "solar-enphase/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
136 | temporary_directory => "${S3_TEMP_DIR}/solar-enphase-archive"
137 | access_key_id => "${S3_ACCESS_KEY}"
138 | secret_access_key => "${S3_SECRET_KEY}"
139 | endpoint => "${S3_ENDPOINT}"
140 | bucket => "${S3_BUCKET}"
141 |
142 | #
143 | # Standard Settings
144 | #
145 | validate_credentials_on_root_bucket => false
146 | codec => json_lines
147 | # Limit Data Lake file sizes to 5 GB
148 | size_file => 5000000000
149 | time_file => 60
150 | # encoding => "gzip"
151 | additional_settings => {
152 | force_path_style => true
153 | follow_redirects => false
154 | }
155 | }
156 | }
157 | ```
158 |
159 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
160 |
161 | ```bash
162 | sudo mv solar-enphase-archive.yml /etc/logstash/conf.d/
163 | ```
164 |
165 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
166 |
167 | ```
168 | - pipeline.id: "solar-enphase-archive"
169 |   path.config: "/etc/logstash/conf.d/solar-enphase-archive.yml"
170 | ```
171 |
172 | And finally, restart the Logstash service:
173 |
174 | ```bash
175 | sudo systemctl restart logstash
176 | ```
177 |
178 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
179 |
180 | ```bash
181 | sudo tail -f /var/log/logstash/logstash-plain.log
182 | ```
183 |
184 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
185 |
186 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
187 |
188 | 
189 |
190 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
191 |
192 | 
193 |
194 | If you see your data being stored, then you are successfully archiving!
195 |
196 | ## Step #3 - Index Data
197 |
198 | Once Logstash is archiving the data, next we need to index it with Elastic.
199 |
200 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
201 |
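Once the index pipeline below is in place and documents are flowing, you can spot-check the field types Elasticsearch inferred from Kibana's Dev Tools (the index pattern matches the daily indices created by the pipeline):

```
GET solar-enphase-*/_mapping
```
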
202 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain was built to parse the raw JSON coming in.
203 |
204 | Create a new pipeline called `solar-enphase-index.yml` with the following content:
205 |
206 | ```
207 | input {
208 | pipeline {
209 | address => "solar-enphase-index"
210 | }
211 | }
212 | filter {
213 | json {
214 | source => "message"
215 | }
216 | if [message] =~ /^\[/ {
217 | json {
218 | source => "message"
219 | target => "tmp"
220 | }
221 | } else {
222 | drop { }
223 | }
224 | if "_jsonparsefailure" in [tags] {
225 | drop { }
226 | }
227 | mutate {
228 | remove_field => ["message"]
229 | }
230 | mutate {
231 | add_field => {
232 | "message" => "%{[tmp][0]}"
233 | }
234 | }
235 | mutate {
236 | remove_field => ["tmp"]
237 | }
238 | json {
239 | source => "message"
240 | }
241 | mutate {
242 | remove_field => ["message"]
243 | }
244 | split {
245 | field => "micro_inverters"
246 | }
247 | ruby {
248 |     # Promote the keys inside micro_inverters to root, then remove micro_inverters
249 | code => '
250 | event.get("micro_inverters").each { |k, v|
251 | event.set(k,v)
252 | }
253 | event.remove("micro_inverters")
254 | '
255 | }
256 | date {
257 | match => ["last_report_date", "ISO8601"]
258 | }
259 | mutate {
260 | remove_field => ["last_report_date", "part_number", "envoy_serial_number", "param_table"]
261 | remove_field => ["model", "sku", "grid_profile", "proc_load", "id"]
262 | remove_field => ["agent", "host", "input", "log", "host", "ecs", "@version"]
263 | }
264 | }
265 | output {
266 | elasticsearch {
267 | #
268 | # Custom Settings
269 | #
270 | id => "solar-enphase-index"
271 | index => "solar-enphase-%{+YYYY.MM.dd}"
272 | hosts => "${ES_ENDPOINT}"
273 | user => "${ES_USERNAME}"
274 | password => "${ES_PASSWORD}"
275 | }
276 | }
277 | ```
278 |
279 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts:
280 |
281 | ```bash
282 | sudo mv solar-enphase-index.yml /etc/logstash/conf.d/
283 | ```
284 |
285 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
286 |
287 | ```
288 | - pipeline.id: "solar-enphase-index"
289 |   path.config: "/etc/logstash/conf.d/solar-enphase-index.yml"
290 | ```
291 |
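The `distributor.yml` conditional from Step #2 only routes these events to the archive pipeline. Mirroring the pattern used by the other data sources, update it so tagged events are also sent to the new index pipeline:

```
} else if "solar-enphase" in [tags] {
  pipeline {
    send_to => ["solar-enphase-archive", "solar-enphase-index"]
  }
}
```
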
292 | And finally, restart the Logstash service:
293 |
294 | ```bash
295 | sudo systemctl restart logstash
296 | ```
297 |
298 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
299 |
300 | ```bash
301 | sudo tail -f /var/log/logstash/logstash-plain.log
302 | ```
303 |
304 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
305 |
306 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
307 |
308 | 
309 |
310 | ## Step #4 - Visualize Data
311 |
312 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
313 |
314 | Download this dashboard: [solar-enphase.ndjson](solar-enphase.ndjson)
315 |
316 | Jump back into Kibana:
317 |
318 | 1. Select "Stack Management" from the menu
319 | 2. Select "Saved Objects"
320 | 3. Click "Import" in the upper right
321 |
322 | Once it's been imported, click on "Solar Enphase".
323 |
324 | 
325 |
326 | Congratulations! You should now be looking at data from your Solar Enphase in Elastic.
327 |
--------------------------------------------------------------------------------
/solar-enphase/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/archive.png
--------------------------------------------------------------------------------
/solar-enphase/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/dashboard.png
--------------------------------------------------------------------------------
/solar-enphase/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/index.png
--------------------------------------------------------------------------------
/solar-enphase/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/minio.png
--------------------------------------------------------------------------------
/solar-enphase/solar-enphase.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import urllib.request
4 | import urllib.parse
5 | import datetime
6 |
7 | def main():
8 | # The Enphase API endpoints are detailed here:
9 | # https://developer.enphase.com/docs
10 | # Most of them don't need to be called more than once a day.
11 |
12 | now = datetime.datetime.utcnow()
13 |
14 | # Run once per day
15 | if now.hour == 0 and now.minute == 0:
16 | url = "https://api.enphaseenergy.com/api/v2/systems//energy_lifetime?key=&user_id="
17 | f = urllib.request.urlopen(url)
18 | print(f.read().decode("utf-8"))
19 |
20 | url = "https://api.enphaseenergy.com/api/v2/systems//inventory?key=&user_id="
21 | f = urllib.request.urlopen(url)
22 | print(f.read().decode("utf-8"))
23 |
24 | # Run once per hour
25 | if now.minute == 0:
26 | url = "https://api.enphaseenergy.com/api/v2/systems//summary?key=&user_id="
27 | f = urllib.request.urlopen(url)
28 | print(f.read().decode("utf-8"))
29 |
30 | # Run every 10 minutes
31 | if now.minute % 10 == 0:
32 | # Get the status of each inverter
33 | url = "https://api.enphaseenergy.com/api/v2/systems/inverters_summary_by_envoy_or_site?key=&user_id=&site_id="
34 | f = urllib.request.urlopen(url)
35 | print(f.read().decode("utf-8"))
36 |
37 | # The `stats` endpoint updates, at most, once every 5 minutes.
38 | # It isn't reliable though, so you can't expect a new reading every 5 minutes.
39 | # Due to this, we'll track all of it and use an enrich lookup in Logstash to
40 | # see if the 5-minute reading was already inserted into Elasticsearch.
41 | # {
42 | # "end_at": 1613239200,
43 | # "devices_reporting": 20,
44 | # "powr": 159, # Average power produced during this interval, measured in Watts.
45 | # "enwh": 13 # Energy produced during this interval, measured in Watt hours.
46 | # }
47 | url = "https://api.enphaseenergy.com/api/v2/systems//stats?key=&user_id="
48 | f = urllib.request.urlopen(url)
49 | print(f.read().decode("utf-8"))
50 |
51 | if __name__ == "__main__":
52 | main()
53 |
--------------------------------------------------------------------------------
/solar-enphase/solar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/solar.png
--------------------------------------------------------------------------------
/temperature-dht22/README.md:
--------------------------------------------------------------------------------
1 | # Monitoring Temperature with DHT22
2 |
3 |
4 |
5 | The [DHT22](http://www.adafruit.com/products/385) is a low-cost digital temperature and humidity sensor. It uses a capacitive humidity sensor and a thermistor to measure the surrounding air, and outputs a digital signal on the data pin reporting their values. The [AM2302](https://www.adafruit.com/product/393) is a wired version of this sensor which includes the required [4.7K - 10KΩ](https://raspberrypi.stackexchange.com/questions/12161/do-i-have-to-connect-a-resistor-to-my-dht22-humidity-sensor) resistor. The version by [FTCBlock](https://www.amazon.com/FTCBlock-Temperature-Humidity-Electronic-Practice/dp/B07H2RP26F) comes with GPIO jumpers that don't require a breadboard or soldering.
6 |
7 | We'll use a Python script to query the sensor each minute via a cron job, and redirect the output to a log file. From there, Filebeat will pick it up and send it to Elastic.
8 |
9 | 
10 |
11 | Let's get started.
12 |
13 | ## Step #1 - Collect Data
14 |
15 | Install the following Python module:
16 |
17 | ```bash
18 | sudo pip3 install Adafruit_DHT
19 | ```
20 |
21 | Create a Python script at `~/bin/temperature-dht22.py` with the following contents (adjusting any values as you see fit):
22 |
23 | ```python
24 | #!/usr/bin/env python3
25 |
26 | import Adafruit_DHT
27 | import datetime
28 | import json
29 | import socket
30 |
31 | DHT_SENSOR = Adafruit_DHT.DHT22
32 | DHT_PIN = 4
33 |
34 | if __name__ == "__main__":
35 | humidity, temp_c = Adafruit_DHT.read_retry(DHT_SENSOR, DHT_PIN)
36 | temp_f = (temp_c * 9 / 5) + 32
37 | output = {
38 | "timestamp": datetime.datetime.utcnow().isoformat(),
39 | "host": socket.gethostname(),
40 | "temp_c": float("%2.2f" % temp_c),
41 | "temp_f": float("%2.2f" % temp_f),
42 | "humidity": float("%2.2f" % humidity),
43 | "location": "office",
44 | "source": "DHT22"
45 | }
46 | print(json.dumps(output))
47 | ```
48 |
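One caveat worth noting: `Adafruit_DHT.read_retry` returns `None` for both values if it can't get a reading after its retries, which would cause the temperature math above to raise an exception. If you want to guard against stray tracebacks in the log, a minimal check (a sketch, not part of the original script) right after the read could look like:

```python
humidity, temp_c = Adafruit_DHT.read_retry(DHT_SENSOR, DHT_PIN)
if humidity is None or temp_c is None:
    # Sensor read failed after retries; skip this minute's reading
    raise SystemExit(1)
```
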
49 | Try running the script from the command line:
50 |
51 | ```bash
52 | chmod a+x ~/bin/temperature-dht22.py
53 | sudo ~/bin/temperature-dht22.py
54 | ```
55 |
56 | The output should look like the following:
57 |
58 | ```json
59 | {"timestamp": "2021-09-05T12:30:10.436436", "host": "node-19", "temp_c": 21.3, "temp_f": 70.34, "humidity": 60.2, "location": "office", "source": "DHT22"}
60 | ```
61 |
62 | Once you're able to successfully query the sensor, create a log file for its output:
63 |
64 | ```bash
65 | sudo touch /var/log/temperature-dht22.log
66 | sudo chown ubuntu.ubuntu /var/log/temperature-dht22.log
67 | ```
68 |
69 | Create a logrotate entry so the log file doesn't grow unbounded:
70 |
71 | ```
72 | sudo vi /etc/logrotate.d/temperature-dht22
73 | ```
74 |
75 | Add the following content:
76 |
77 | ```
78 | /var/log/temperature-dht22.log {
79 | weekly
80 | rotate 12
81 | compress
82 | delaycompress
83 | missingok
84 | notifempty
85 | create 644 ubuntu ubuntu
86 | }
87 | ```
88 |
89 | Add the following entry to your crontab:
90 |
91 | ```
92 | * * * * * sudo /home/ubuntu/bin/temperature-dht22.py >> /var/log/temperature-dht22.log 2>&1
93 | ```
94 |
95 | Verify output by tailing the log file for a few minutes:
96 |
97 | ```bash
98 | tail -f /var/log/temperature-dht22.log
99 | ```
100 |
101 | If you're seeing output scroll each minute then you are successfully collecting data!
102 |
103 | ## Step #2 - Archive Data
104 |
105 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3.
106 |
107 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your DHT22 data:
108 |
109 | ```yaml
110 | filebeat.inputs:
111 | - type: log
112 | enabled: true
113 | tags: ["temperature-dht22"]
114 | paths:
115 | - /var/log/temperature-dht22.log
116 | ```
117 |
118 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
119 |
120 | Restart Filebeat:
121 |
122 | ```bash
123 | sudo systemctl restart filebeat
124 | ```
125 |
126 | You may want to tail syslog to see if Filebeat restarts without any issues:
127 |
128 | ```bash
129 | tail -f /var/log/syslog | grep filebeat
130 | ```
131 |
132 | At this point, we should have DHT22 data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the DHT22 data feed.
133 |
134 | Add the following conditional to your `distributor.yml` file:
135 |
136 | ```
137 | } else if "temperature-dht22" in [tags] {
138 | pipeline {
139 | send_to => ["temperature-dht22-archive"]
140 | }
141 | }
142 | ```
143 |
144 | Create a Logstash pipeline called `temperature-dht22-archive.yml` with the following contents:
145 |
146 | ```
147 | input {
148 | pipeline {
149 | address => "temperature-dht22-archive"
150 | }
151 | }
152 | filter {
153 | }
154 | output {
155 | s3 {
156 | #
157 | # Custom Settings
158 | #
159 | prefix => "temperature-dht22/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
160 | temporary_directory => "${S3_TEMP_DIR}/temperature-dht22-archive"
161 | access_key_id => "${S3_ACCESS_KEY}"
162 | secret_access_key => "${S3_SECRET_KEY}"
163 | endpoint => "${S3_ENDPOINT}"
164 | bucket => "${S3_BUCKET}"
165 |
166 | #
167 | # Standard Settings
168 | #
169 | validate_credentials_on_root_bucket => false
170 | codec => json_lines
171 | # Limit Data Lake file sizes to 5 GB
172 | size_file => 5000000000
173 | time_file => 60
174 | # encoding => "gzip"
175 | additional_settings => {
176 | force_path_style => true
177 | follow_redirects => false
178 | }
179 | }
180 | }
181 | ```
182 |
183 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
184 |
185 | ```bash
186 | sudo mv temperature-dht22-archive.yml /etc/logstash/conf.d/
187 | ```
188 |
189 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
190 |
191 | ```
192 | - pipeline.id: "temperature-dht22-archive"
193 |   path.config: "/etc/logstash/conf.d/temperature-dht22-archive.yml"
194 | ```
195 |
196 | And finally, restart the Logstash service:
197 |
198 | ```bash
199 | sudo systemctl restart logstash
200 | ```
201 |
202 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
203 |
204 | ```bash
205 | sudo tail -f /var/log/logstash/logstash-plain.log
206 | ```
207 |
208 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
209 |
210 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
211 |
212 | 
213 |
214 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
215 |
216 | 
217 |
218 | If you see your data being stored, then you are successfully archiving!
219 |
220 | ## Step #3 - Index Data
221 |
222 | Once Logstash is archiving the data, next we need to index it with Elastic.
223 |
224 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
225 |
226 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain was built to parse the raw JSON coming in.
227 |
228 | Create a new pipeline called `temperature-dht22-index.yml` with the following content:
229 |
230 | ```
231 | input {
232 | pipeline {
233 | address => "temperature-dht22-index"
234 | }
235 | }
236 | filter {
237 | json {
238 | source => "message"
239 | }
240 | json {
241 | source => "message"
242 | }
243 | date {
244 | match => ["timestamp", "ISO8601"]
245 | }
246 | mutate {
247 | remove_field => ["timestamp", "message"]
248 | remove_field => ["tags", "agent", "input", "log", "path", "ecs", "@version"]
249 | }
250 | }
251 | output {
252 | elasticsearch {
253 | #
254 | # Custom Settings
255 | #
256 | id => "temperature-dht22-index"
257 | index => "temperature-dht22-%{+YYYY.MM.dd}"
258 | hosts => "${ES_ENDPOINT}"
259 | user => "${ES_USERNAME}"
260 | password => "${ES_PASSWORD}"
261 | }
262 | }
263 | ```
264 |
265 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts:
266 |
267 | ```bash
268 | sudo mv temperature-dht22-index.yml /etc/logstash/conf.d/
269 | ```
270 |
271 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
272 |
273 | ```
274 | - pipeline.id: "temperature-dht22-index"
275 |   path.config: "/etc/logstash/conf.d/temperature-dht22-index.yml"
276 | ```
277 |
278 | Next, update the conditional for your tagged data in the `distributor.yml` pipeline so events are sent to both the archive and index pipelines:
279 |
280 | ```
281 | } else if "temperature-dht22" in [tags] {
282 | pipeline {
283 | send_to => ["temperature-dht22-archive", "temperature-dht22-index"]
284 | }
285 | }
286 | ```
287 |
288 | And finally, restart the Logstash service:
289 |
290 | ```bash
291 | sudo systemctl restart logstash
292 | ```
293 |
294 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
295 |
296 | ```bash
297 | sudo tail -f /var/log/logstash/logstash-plain.log
298 | ```
299 |
300 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
301 |
302 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
303 |
304 | 
305 |
306 | ## Step #4 - Visualize Data
307 |
308 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
309 |
310 | Download this dashboard: [temperature-dht22.ndjson](temperature-dht22.ndjson)
311 |
312 | Jump back into Kibana:
313 |
314 | 1. Select "Stack Management" from the menu
315 | 2. Select "Saved Objects"
316 | 3. Click "Import" in the upper right
317 |
318 | Once it's been imported, click on "Temperature DHT22".
319 |
320 | 
321 |
322 | Congratulations! You should now be looking at temperature data from your DHT22 in Elastic.
323 |
324 |
--------------------------------------------------------------------------------
/temperature-dht22/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/archive.png
--------------------------------------------------------------------------------
/temperature-dht22/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/dashboard.png
--------------------------------------------------------------------------------
/temperature-dht22/dht22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/dht22.png
--------------------------------------------------------------------------------
/temperature-dht22/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/index.png
--------------------------------------------------------------------------------
/temperature-dht22/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/minio.png
--------------------------------------------------------------------------------
/temperature-dht22/temperature-dht22.ndjson:
--------------------------------------------------------------------------------
1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"temperature-dht22-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-06T07:18:21.789Z","version":"WzM0OTQ3NywyXQ=="}
2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":20,\"h\":6,\"i\":\"bfa04254-31c1-429b-971d-921b5d6c33b1\"},\"panelIndex\":\"bfa04254-31c1-429b-971d-921b5d6c33b1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Temperature & Humidity\\nby DHT22\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":20,\"y\":0,\"w\":28,\"h\":6,\"i\":\"6bf93313-05d1-4657-bf19-2ac871b74009\"},\"panelIndex\":\"6bf93313-05d1-4657-bf19-2ac871b74009\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"eee22f36-699b-4bcf-b6a8-a1efaefa5549\":{\"columns\":{\"c8b44acf-fb21-4a7a-8d72-f212143dc087\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"7658a405-61eb-48aa-85c0-92254a942779\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"c8b44acf-fb21-4a7a-8d72-f212143dc087\",\"7658a405-61eb-48aa-85c0-92254a942779\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"eee22f36-699b-4bcf-b6a8-a1efaefa5549\",\"accessors\":[\"7658a405-61eb-48aa-85c0-92254a942779\"],\"position\":\"top\",\"seriesType\":\"bar_stacked\",\"showGridlines\":false,\"xAccessor\":\"c8b44acf-fb21-4a7a-8d72-f212143dc087\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-eee22f36-699b-4bcf-b6a8-a1efaefa5549\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Logs\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":6,\"w\":48,\"h\":10,\"i\":\"b2634629-b584-4b60-99d2-574db7c2d576\"},\"panelIndex\":\"b2634629-b584-4b60-99d2-574db7c2d576\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"ddd9b678-b4d2-4c07-8715-7338bc709326\":{\"columns\":{\"5f615941-0d1f-4e75-b85d-37a5de33199d\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\
":\"interval\",\"params\":{\"interval\":\"auto\"}},\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\":{\"label\":\"Median of temp_f\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"temp_f\",\"isBucketed\":false,\"scale\":\"ratio\"},\"8b38e615-3eac-4a81-9da9-955d50ffb348\":{\"label\":\"Top values of host.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"host.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"8b38e615-3eac-4a81-9da9-955d50ffb348\",\"5f615941-0d1f-4e75-b85d-37a5de33199d\",\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"ddd9b678-b4d2-4c07-8715-7338bc709326\",\"accessors\":[\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"5f615941-0d1f-4e75-b85d-37a5de33199d\",\"splitAccessor\":\"8b38e615-3eac-4a81-9da9-955d50ffb348\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-ddd9b678-b4d2-4c07-8715-7338bc709326\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Temperature\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":16,\"w\":48,\"h\":9,\"i\":\"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619\"},\"panelIndex\":\"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"9436d677-5ba4-4307-838b-53d734ad969d\":{\"columns\":{\"8c427cf5-b467-4497-b49a-ffe256a862d4\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\":{\"label\":\"Median of humidity\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"humidity\",\"isBucketed\":false,\"scale\":\"ratio\"},\"eac3c026-b26a-4384-9020-d6eae264660d\":{\"label\":\"Top values of 
host.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"host.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"eac3c026-b26a-4384-9020-d6eae264660d\",\"8c427cf5-b467-4497-b49a-ffe256a862d4\",\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"9436d677-5ba4-4307-838b-53d734ad969d\",\"accessors\":[\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8c427cf5-b467-4497-b49a-ffe256a862d4\",\"splitAccessor\":\"eac3c026-b26a-4384-9020-d6eae264660d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-9436d677-5ba4-4307-838b-53d734ad969d\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Humidity\"}]","timeRestore":false,"title":"Temperature DHT22","version":1},"coreMigrationVersion":"7.14.0","id":"27df3f70-0ee3-11ec-b03a-7d8df502f497","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"6bf93313-05d1-4657-bf19-2ac871b74009:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"6bf93313-05d1-4657-bf19-2ac871b74009:indexpattern-datasource-layer-eee22f36-699b-4bcf-b6a8-a1efaefa5549","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"b2634629-b584-4b60-99d2-574db7c2d576:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"b2634629-b584-4b60-99d2-574db7c2d576:indexpattern-datasource-layer-ddd9b678-b4d2-4c07-8715-7338bc709326","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619:indexpattern-datasource-layer-9436d677-5ba4-4307-838b-53d734ad969d","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-06T07:22:13.101Z","version":"WzM0OTY3MywyXQ=="}
3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]}
--------------------------------------------------------------------------------
/utilization/2-archive/utilization-archive.yml:
--------------------------------------------------------------------------------
1 | input {
2 | pipeline {
3 | address => "utilization-archive"
4 | }
5 | }
6 | filter {
7 | }
8 | output {
9 | s3 {
10 | #
11 | # Custom Settings
12 | #
13 | prefix => "utilization/${S3_DATE_DIR}"
14 | temporary_directory => "${S3_TEMP_DIR}/utilization-archive"
15 | access_key_id => "${S3_ACCESS_KEY}"
16 | secret_access_key => "${S3_SECRET_KEY}"
17 | endpoint => "${S3_ENDPOINT}"
18 | bucket => "${S3_BUCKET}"
19 |
20 | #
21 | # Standard Settings
22 | #
23 | validate_credentials_on_root_bucket => false
24 | codec => json_lines
25 | # Limit Data Lake file sizes to 5 GB
26 | size_file => 5000000000
27 | time_file => 1
28 | # encoding => "gzip"
29 | additional_settings => {
30 | force_path_style => true
31 | follow_redirects => false
32 | }
33 | }
34 | }
35 |
--------------------------------------------------------------------------------
/utilization/2-archive/utilization-reindex.yml:
--------------------------------------------------------------------------------
1 | input {
2 | s3 {
3 | #
4 | # Custom Settings
5 | #
6 | prefix => "utilization/2021-01-04"
7 | temporary_directory => "${S3_TEMP_DIR}/utilization-reindex"
8 | access_key_id => "${S3_ACCESS_KEY}"
9 | secret_access_key => "${S3_SECRET_KEY}"
10 | endpoint => "${S3_ENDPOINT}"
11 | bucket => "${S3_BUCKET}"
12 |
13 | #
14 | # Standard Settings
15 | #
16 | watch_for_new_files => false
17 | codec => json_lines
18 | additional_settings => {
19 | force_path_style => true
20 | follow_redirects => false
21 | }
22 | }
23 | }
24 | filter {
25 | }
26 | output {
27 | pipeline { send_to => "utilization-structure" }
28 | }
29 |
--------------------------------------------------------------------------------
/utilization/2-archive/utilization-structure.yml:
--------------------------------------------------------------------------------
1 | input {
2 | pipeline {
3 | address => "utilization-structure"
4 | }
5 | }
6 | filter {
7 | }
8 | output {
9 | elasticsearch {
10 | #
11 | # Custom Settings
12 | #
13 | id => "utilization-structure"
14 | index => "utilization"
15 | hosts => "${ES_ENDPOINT}"
16 | user => "${ES_USERNAME}"
17 | password => "${ES_PASSWORD}"
18 | }
19 | }
20 |
--------------------------------------------------------------------------------
/weather-station/README.md:
--------------------------------------------------------------------------------
1 | # Weather Station
2 |
3 |
4 |
5 | The [WS-1550-IP](https://ambientweather.com/amws1500.html) from Ambient Weather is a great amateur weather station. The station itself is powered by solar with AA battery backup, it's relatively maintenance free, and it's a joy to observe in action. It communicates with a base station over 915 MHz and requires no setup. You can also add up to 8 additional sensors to collect temperature from various points within range, all wirelessly. The base station connects to the Internet via a hard-wired RJ-45 connection on your network, which it uses to upload the data it collects to Ambient Weather's free service. From there, you can query an [API](https://ambientweather.docs.apiary.io/#) to get your latest, hyper-local weather.
6 |
7 | In this data source, we'll build the following dashboard with Elastic:
8 |
9 | 
10 |
11 | Let's get started!
12 |
13 | ## Step #1 - Collect Data
14 |
15 | Create a new python script called `~/bin/weather-station.py` with the following contents:
16 |
17 | ```python
18 | #!/usr/bin/env python3
19 |
20 | import urllib.request
21 | import urllib.parse
22 |
23 | api_key = ''
24 | app_key = ''
25 |
26 | url = "https://api.ambientweather.net/v1/devices?apiKey=%s&applicationKey=%s" % (api_key, app_key)
27 |
28 | try:
29 | f = urllib.request.urlopen(url)
30 | except urllib.error.HTTPError as e:
31 | # Return code error (e.g. 404, 501, ...)
32 | print('[{"lastData": {"http_code": %s}}]' % (e.code))
33 | except urllib.error.URLError as e:
34 | # Not an HTTP-specific error (e.g. connection refused)
35 | print('[{"lastData": {"http_error": "%s"}}]' % (e.reason))
36 | else:
37 | # 200
38 | print(f.read().decode("utf-8"))
39 | ```
40 |
41 | Enter your API key and Application key from the Ambient Weather service.
42 |
43 | This script queries the Ambient Weather API using your API key and Application key. It then prints the response to `stdout`. Once we've confirmed the script works, we'll redirect `stdout` to a log file.
44 |
45 | Try running the script:
46 |
47 | ```bash
48 | chmod a+x ~/bin/weather-station.py
49 | ~/bin/weather-station.py
50 | ```
51 |
52 | You should see output similar to:
53 |
54 | ```json
55 | [{"macAddress":"00:0E:C6:20:0F:7B","lastData":{"dateutc":1630076460000,"winddir":186,"windspeedmph":0.22,"windgustmph":1.12,"maxdailygust":4.47,"tempf":82.4,"battout":1,"humidity":69,"hourlyrainin":0,"eventrainin":0,"dailyrainin":0,"weeklyrainin":1.22,"monthlyrainin":5.03,"yearlyrainin":21.34,"totalrainin":21.34,"tempinf":73.4,"battin":1,"humidityin":62, ...
56 | ```
57 |
58 | Once you confirm the script is working, you can redirect its output to a log file:
59 |
60 | ```bash
61 | sudo touch /var/log/weather-station.log
62 | sudo chown ubuntu.ubuntu /var/log/weather-station.log
63 | ```
64 |
65 | Create a logrotate entry so the log file doesn't grow unbounded:
66 |
67 | ```bash
68 | sudo vi /etc/logrotate.d/weather-station
69 | ```
70 |
71 | Add the following logrotate content:
72 |
73 | ```
74 | /var/log/weather-station.log {
75 | weekly
76 | rotate 12
77 | compress
78 | delaycompress
79 | missingok
80 | notifempty
81 | create 644 ubuntu ubuntu
82 | }
83 | ```
84 |
85 | Add the following entry to your crontab with `crontab -e`:
86 |
87 | ```
88 | * * * * * /home/ubuntu/bin/weather-station.py >> /var/log/weather-station.log 2>&1
89 | ```
90 |
91 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute):
92 |
93 | ```bash
94 | tail -f /var/log/weather-station.log
95 | ```
96 |
97 | If you're seeing output scroll each minute then you are successfully collecting data!
98 |
99 | ## Step #2 - Archive Data
100 |
101 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3.
102 |
103 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your weather data:
104 |
105 | ```yaml
106 | filebeat.inputs:
107 | - type: log
108 | enabled: true
109 | tags: ["weather-station"]
110 | paths:
111 | - /var/log/weather-station.log
112 | ```
113 |
114 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
115 |
116 | Restart Filebeat:
117 |
118 | ```bash
119 | sudo systemctl restart filebeat
120 | ```
121 |
122 | You may want to tail syslog to see if Filebeat restarts without any issues:
123 |
124 | ```bash
125 | tail -f /var/log/syslog | grep filebeat
126 | ```
127 |
128 | At this point, we should have weather-station data flowing into Logstash. By default, however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the weather station data feed.
129 |
130 | Add the following conditional to your `distributor.yml` file:
131 |
132 | ```
133 | } else if "weather-station" in [tags] {
134 | pipeline {
135 | send_to => ["weather-station-archive"]
136 | }
137 | }
138 | ```
139 |
140 | Create a Logstash pipeline called `weather-station-archive.yml` with the following contents:
141 |
142 | ```
143 | input {
144 | pipeline {
145 | address => "weather-station-archive"
146 | }
147 | }
148 | filter {
149 | }
150 | output {
151 | s3 {
152 | #
153 | # Custom Settings
154 | #
155 | prefix => "weather-station/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
156 | temporary_directory => "${S3_TEMP_DIR}/weather-station-archive"
157 | access_key_id => "${S3_ACCESS_KEY}"
158 | secret_access_key => "${S3_SECRET_KEY}"
159 | endpoint => "${S3_ENDPOINT}"
160 | bucket => "${S3_BUCKET}"
161 |
162 | #
163 | # Standard Settings
164 | #
165 | validate_credentials_on_root_bucket => false
166 | codec => json_lines
167 | # Limit Data Lake file sizes to 5 GB
168 | size_file => 5000000000
169 | time_file => 60
170 | # encoding => "gzip"
171 | additional_settings => {
172 | force_path_style => true
173 | follow_redirects => false
174 | }
175 | }
176 | }
177 | ```
178 |
179 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
180 |
181 | ```bash
182 | sudo mv weather-station-archive.yml /etc/logstash/conf.d/
183 | ```
184 |
185 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
186 |
187 | ```
188 | - pipeline.id: "weather-station-archive"
189 |   path.config: "/etc/logstash/conf.d/weather-station-archive.yml"
190 | ```
191 |
192 | And finally, restart the Logstash service:
193 |
194 | ```bash
195 | sudo systemctl restart logstash
196 | ```
197 |
198 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
199 |
200 | ```bash
201 | sudo tail -f /var/log/logstash/logstash-plain.log
202 | ```
203 |
204 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
205 |
206 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
207 |
208 | 
209 |
210 | Check your S3 bucket to see if you're getting data directories created each minute for the current date & hour with data:
211 |
212 | 
213 |
214 | If you see your data being stored, then you are successfully archiving!
215 |
216 | ## Step #3 - Index Data
217 |
218 | Once Logstash is archiving the data, next we need to index it with Elastic.
219 |
220 | Jump into Kibana and open Dev Tools.
221 |
222 | Copy and paste the following content into Dev Tools to create an Index Template for our weather station data:
223 |
224 | ```
225 | PUT _index_template/weather-station
226 | {
227 | "index_patterns": [
228 | "weather-station-*"
229 | ],
230 | "template": {
231 | "mappings": {
232 | "dynamic_templates": [
233 | {
234 | "integers": {
235 | "match_mapping_type": "long",
236 | "mapping": {
237 | "type": "float"
238 | }
239 | }
240 | }
241 | ],
242 | "properties": {
243 | "info.coords.geo.coordinates": {
244 | "type": "geo_point"
245 | }
246 | }
247 | }
248 | }
249 | }
250 | ```
251 |
252 | For the most part, we'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. The exceptions here are the latitude & longitude of the weather station and the coercion of any `long` values into `float` values. First, for the latitude & longitude, we need to explicitly tell Elasticsearch that this is a [`geo_point`](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html) type so that we can plot it on a map. If you start to track multiple weather stations in Elastic, plotting their locations on a map is very useful. Second, to prevent any values that happen to first come in as a whole number from determining the mapping type, we set a [`dynamic_template`](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html) to convert any `long` values into `float` values.
253 |
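After running the `PUT`, you can confirm the template was stored by reading it back in Dev Tools:

```
GET _index_template/weather-station
```
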
254 | Now, switch back to a terminal so we can create the Logstash pipeline to index the weather station data.
255 |
256 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), I iteratively built the following filter chain that can parse the raw JSON coming in.
257 |
258 | Create a new pipeline called `weather-station-index.yml` with the following content:
259 |
260 | ```
261 | input {
262 | pipeline {
263 | address => "weather-station-index"
264 | }
265 | }
266 | filter {
267 | if [message] =~ /^\[/ {
268 | json {
269 | source => "message"
270 | target => "tmp"
271 | }
272 | } else {
273 | drop { }
274 | }
275 | if "_jsonparsefailure" in [tags] {
276 | drop { }
277 | }
278 | mutate {
279 | remove_field => ["message"]
280 | }
281 | mutate {
282 | add_field => {
283 | "message" => "%{[tmp][0]}"
284 | }
285 | }
286 | json {
287 | source => "message"
288 | }
289 | ruby {
290 | # Promote the keys inside lastData to root, then remove lastData
291 | code => '
292 | event.get("lastData").each { |k, v|
293 | event.set(k,v)
294 | }
295 | event.remove("lastData")
296 | '
297 | }
298 | date {
299 | match => ["date", "ISO8601"]
300 | }
301 | mutate {
302 | remove_field => ["message", "tmp", "path", "host", "macAddress", "date"]
303 | }
304 | }
305 | output {
306 | elasticsearch {
307 | #
308 | # Custom Settings
309 | #
310 | id => "weather-station-index"
311 | index => "weather-station-%{+YYYY.MM.dd}"
312 | hosts => "${ES_ENDPOINT}"
313 | user => "${ES_USERNAME}"
314 | password => "${ES_PASSWORD}"
315 | }
316 | }
317 | ```
318 |
319 | This filter chain structures the raw data into a format that allows us to easily use Elastic's dynamic mapping feature.
320 |
321 | For the most part, we use the raw field names as provided to us by the Ambient Weather service. You can rename the raw field names to something more descriptive if you'd like, but then you'll also need to adjust the Dashboard provided in Step #4 to point to your field names.
322 |
323 | Ambient Weather provides a list of the units on each of their field names in the [Device Data Specs](https://github.com/ambient-weather/api-docs/wiki/Device-Data-Specs).
324 |
325 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts:
326 |
327 | ```bash
328 | sudo mv weather-station-index.yml /etc/logstash/conf.d/
329 | ```
330 |
331 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
332 |
333 | ```
334 | - pipeline.id: "weather-station-index"
335 |   path.config: "/etc/logstash/conf.d/weather-station-index.yml"
336 | ```
337 |
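As with the archive step, tagged events also need to reach this new index pipeline. Following the same pattern used for the other data sources, update the `weather-station` conditional in `distributor.yml`:

```
} else if "weather-station" in [tags] {
  pipeline {
    send_to => ["weather-station-archive", "weather-station-index"]
  }
}
```
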
338 | And finally, restart the Logstash service:
339 |
340 | ```bash
341 | sudo systemctl restart logstash
342 | ```
343 |
344 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
345 |
346 | ```bash
347 | sudo tail -f /var/log/logstash/logstash-plain.log
348 | ```
349 |
350 | After a few seconds, you should see Logstash shut down and start back up with the new pipeline, with no errors being emitted.
351 |
352 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
353 |
354 | 
355 |
356 | ## Step #4 - Visualize Data
357 |
358 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
359 |
360 | Download this dashboard:
361 |
362 | [weather-station.ndjson](weather-station.ndjson)
363 |
364 | Jump into Kibana:
365 |
366 | 1. Select "Stack Management" from the menu
367 | 2. Select "Saved Objects"
368 | 3. Click "Import" in the upper right
369 |
370 | Once it's been imported, click on "Weather Station".
371 |
372 | Congratulations! You should now be looking at data from your weather station in Elastic.
373 |
374 |
--------------------------------------------------------------------------------
/weather-station/archive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/archive.png
--------------------------------------------------------------------------------
/weather-station/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/dashboard.png
--------------------------------------------------------------------------------
/weather-station/index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/index.png
--------------------------------------------------------------------------------
/weather-station/minio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/minio.png
--------------------------------------------------------------------------------
/weather-station/ws-1550-ip.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/ws-1550-ip.png
--------------------------------------------------------------------------------