├── .gitignore ├── README.md ├── co2meter ├── README.md ├── archive.png ├── co2meter.ndjson ├── co2meter.png ├── co2meter.py ├── dashboard.png ├── index.png └── minio.png ├── directory-sizes ├── README.md ├── archive.png ├── dashboard.png ├── directory-sizes.py ├── index.png ├── logo.png └── minio.png ├── flight-tracker ├── README.md ├── archive.png ├── dashboard.png ├── flight-tracker.ndjson ├── flight-tracker.py ├── index.png ├── logo.png └── minio.png ├── gps ├── README.md ├── archive.png ├── dashboard.png ├── gps.ndjson ├── gps.png ├── index.png └── minio.png ├── haproxy-filebeat-module ├── 2-archive │ ├── haproxy-filebeat-module-archive.yml │ ├── haproxy-filebeat-module-reindex.yml │ └── haproxy-filebeat-module-structure.yml ├── 4-visualize │ └── dashboard.json └── README.md ├── images ├── architecture.png ├── caiv.png ├── data-source-assets.png ├── elk-data-lake.png ├── indexing.png ├── logical-elements.png ├── onboarding-data.png ├── terminology.png └── workflow.png ├── power-emu2 ├── README.md ├── archive.png ├── dashboard.png ├── emu-2.jpg ├── index.png ├── minio.png ├── power-emu2.ndjson └── power-emu2.py ├── power-hs300 ├── README.md ├── archive.png ├── dashboard.png ├── hs300.png ├── hs300.py ├── index.png ├── minio.png ├── power-hs300.ndjson └── reindex.yml ├── satellites ├── README.md ├── dashboard.png ├── satellites.ndjson └── satellites.png ├── setup ├── README.md ├── dead-letter-queue-archive.yml ├── distributor.conf ├── haproxy.cfg └── needs-classified-archive.yml ├── solar-enphase ├── README.md ├── archive.png ├── dashboard.png ├── index.png ├── minio.png ├── solar-enphase.py └── solar.png ├── temperature-dht22 ├── README.md ├── archive.png ├── dashboard.png ├── dht22.png ├── index.png ├── minio.png └── temperature-dht22.ndjson ├── utilization ├── 2-archive │ ├── utilization-archive.yml │ ├── utilization-reindex.yml │ └── utilization-structure.yml └── 3-index │ ├── README.md │ └── utilization-index-template.json └── weather-station ├── README.md ├── archive.png ├── dashboard.png ├── index.png ├── minio.png ├── weather-station.ndjson └── ws-1550-ip.png /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | *.swp 3 | *.bak 4 | *.orig 5 | *.dump 6 | *.keystore 7 | tmp 8 | -------------------------------------------------------------------------------- /co2meter/README.md: -------------------------------------------------------------------------------- 1 | # CO2 Monitoring 2 | 3 | co2meter 4 | 5 | The [CO2Mini](https://www.co2meter.com/collections/desktop/products/co2mini-co2-indoor-air-quality-monitor?variant=308811055) is an indoor air quality monitor that displays the CO2 level of the room it's in. It's often used in home and office settings since it's been shown that elevated CO2 levels can cause [fatigue](https://pubmed.ncbi.nlm.nih.gov/26273786/) and [impair decisions](https://newscenter.lbl.gov/2012/10/17/elevated-indoor-carbon-dioxide-impairs-decision-making-performance/). The CO2Mini connects to a computer via USB where it can be read programmatically. 6 | 7 | In this data source, we'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 
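Before diving in, it can help to confirm the monitor shows up as a HID device once it's plugged in. Below is a minimal, optional pre-flight check (assuming the default `/dev/hidraw0` node used by the script in Step #1; the index can differ if other HID devices are attached):

```python
#!/usr/bin/env python3
# Optional pre-flight check: confirm the CO2Mini's HID device node exists before
# setting up the collection script. The collection script reads /dev/hidraw0; if
# other HID devices are attached, the meter may enumerate at a different index.

import glob
import os

devices = sorted(glob.glob("/dev/hidraw*"))
if not devices:
    print("No /dev/hidraw* devices found -- is the CO2Mini plugged in?")
else:
    for dev in devices:
        if os.access(dev, os.R_OK):
            print(f"{dev}: readable")
        else:
            print(f"{dev}: present, but not readable as this user (use sudo or add a udev rule)")
```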
12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/co2meter.py` with the following contents: 16 | 17 | ​ [co2meter.py](co2meter.py) 18 | 19 | The script was originally written by [Henryk Plötz](https://hackaday.io/project/5301-reverse-engineering-a-low-cost-usb-co-monitor/log/17909-all-your-base-are-belong-to-us) and has only a few minor edits so it works with Python3. 20 | 21 | Take a few minutes to familiarize yourself with the script. There are a couple of labels you can change near the bottom. Adjust the values of `hostname` and `location` to suit your needs. 22 | 23 | With your CO2Mini plugged in, try running the script: 24 | 25 | ```bash 26 | chmod a+x ~/bin/co2meter.py 27 | sudo ~/bin/co2meter.py 28 | ``` 29 | 30 | We'll run our script with `sudo`, but you could add a `udev` rule to give your user permission to `/dev/hidraw0`. 31 | 32 | You should see output on `stdout` similar to: 33 | 34 | ```json 35 | {"@timestamp": "2021-09-01T20:38:06.353614", "hostname": "node", "location": "office", "co2_ppm": 438, "temp_c": 27.79, "temp_f": 82.02, "source": "CO2 Meter"} 36 | ``` 37 | 38 | Once you confirm the script is working, you can redirect its output to a log file: 39 | 40 | ```bash 41 | sudo touch /var/log/co2meter.log 42 | sudo chown ubuntu.ubuntu /var/log/co2meter.log 43 | ``` 44 | 45 | Create a logrotate entry so the log file doesn't grow unbounded: 46 | 47 | ```bash 48 | sudo vi /etc/logrotate.d/co2meter 49 | ``` 50 | 51 | Add the following logrotate content: 52 | 53 | ``` 54 | /var/log/co2meter.log { 55 | weekly 56 | rotate 12 57 | compress 58 | delaycompress 59 | missingok 60 | notifempty 61 | create 644 ubuntu ubuntu 62 | } 63 | ``` 64 | 65 | Add the following entry to your crontab with `crontab -e`: 66 | 67 | ``` 68 | * * * * * /home/ubuntu/bin/co2meter.py >> /var/log/co2meter.log 2>&1 69 | ``` 70 | 71 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 72 | 73 | ```bash 74 | tail -f /var/log/co2meter.log 75 | ``` 76 | 77 | If you're seeing output scroll each minute then you are successfully collecting data! 78 | 79 | ## Step #2 - Archive Data 80 | 81 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 82 | 83 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your CO2 data: 84 | 85 | ```yaml 86 | filebeat.inputs: 87 | - type: log 88 | enabled: true 89 | tags: ["co2meter"] 90 | paths: 91 | - /var/log/co2meter.log 92 | ``` 93 | 94 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 95 | 96 | Restart Filebeat: 97 | 98 | ```bash 99 | sudo systemctl restart filebeat 100 | ``` 101 | 102 | You may want to tail syslog to see if Filebeat restarts without any issues: 103 | 104 | ```bash 105 | tail -f /var/log/syslog | grep filebeat 106 | ``` 107 | 108 | At this point, we should have CO2 Meter data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the CO2 Meter data feed. 
109 | 
110 | Add the following conditional to your `distributor.yml` file:
111 | 
112 | ```
113 | } else if "co2meter" in [tags] {
114 |   pipeline {
115 |     send_to => ["co2meter-archive"]
116 |   }
117 | }
118 | ```
119 | 
120 | Create a Logstash pipeline called `co2meter-archive.yml` with the following contents:
121 | 
122 | ```
123 | input {
124 |   pipeline {
125 |     address => "co2meter-archive"
126 |   }
127 | }
128 | filter {
129 | }
130 | output {
131 |   s3 {
132 |     #
133 |     # Custom Settings
134 |     #
135 |     prefix => "co2meter/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
136 |     temporary_directory => "${S3_TEMP_DIR}/co2meter-archive"
137 |     access_key_id => "${S3_ACCESS_KEY}"
138 |     secret_access_key => "${S3_SECRET_KEY}"
139 |     endpoint => "${S3_ENDPOINT}"
140 |     bucket => "${S3_BUCKET}"
141 | 
142 |     #
143 |     # Standard Settings
144 |     #
145 |     validate_credentials_on_root_bucket => false
146 |     codec => json_lines
147 |     # Limit Data Lake file sizes to 5 GB
148 |     size_file => 5000000000
149 |     time_file => 60
150 |     # encoding => "gzip"
151 |     additional_settings => {
152 |       force_path_style => true
153 |       follow_redirects => false
154 |     }
155 |   }
156 | }
157 | ```
158 | 
159 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
160 | 
161 | ```bash
162 | sudo mv co2meter-archive.yml /etc/logstash/conf.d/
163 | ```
164 | 
165 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
166 | 
167 | ```
168 | - pipeline.id: "co2meter-archive"
169 |   path.config: "/etc/logstash/conf.d/co2meter-archive.yml"
170 | ```
171 | 
172 | And finally, restart the Logstash service:
173 | 
174 | ```bash
175 | sudo systemctl restart logstash
176 | ```
177 | 
178 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
179 | 
180 | ```bash
181 | sudo tail -f /var/log/logstash/logstash-plain.log
182 | ```
183 | 
184 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
185 | 
186 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
187 | 
188 | ![Stack Monitoring](archive.png)
189 | 
190 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
191 | 
192 | ![MinIO](minio.png)
193 | 
194 | If you see your data being stored, then you are successfully archiving!
195 | 
196 | ## Step #3 - Index Data
197 | 
198 | Once Logstash is archiving the data, next we need to index it with Elastic.
199 | 
200 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
201 | 
202 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
203 | 
204 | Create a new pipeline called `co2meter-index.yml` with the following content:
205 | 
206 | ```
207 | input {
208 |   pipeline {
209 |     address => "co2meter-index"
210 |   }
211 | }
212 | filter {
213 |   json {
214 |     source => "message"
215 |   }
216 |   json {
217 |     source => "message"
218 |   }
219 |   mutate {
220 |     remove_field => ["message", "agent", "host", "input", "log", "ecs", "@version"]
221 |   }
222 | }
223 | output {
224 |   elasticsearch {
225 |     #
226 |     # Custom Settings
227 |     #
228 |     id => "co2meter-index"
229 |     index => "co2meter-%{+YYYY.MM.dd}"
230 |     hosts => "${ES_ENDPOINT}"
231 |     user => "${ES_USERNAME}"
232 |     password => "${ES_PASSWORD}"
233 |   }
234 | }
235 | ```
236 | 
237 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
238 | 
239 | ```bash
240 | sudo mv co2meter-index.yml /etc/logstash/conf.d/
241 | ```
242 | 
243 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
244 | 
245 | ```
246 | - pipeline.id: "co2meter-index"
247 |   path.config: "/etc/logstash/conf.d/co2meter-index.yml"
248 | ```
249 | 
250 | Then update the `co2meter` conditional in your `distributor.yml` so it also sends events to this new pipeline (`send_to => ["co2meter-archive", "co2meter-index"]`), and finally, restart the Logstash service:
251 | 
252 | ```bash
253 | sudo systemctl restart logstash
254 | ```
255 | 
256 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
257 | 
258 | ```bash
259 | sudo tail -f /var/log/logstash/logstash-plain.log
260 | ```
261 | 
262 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
263 | 
264 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
265 | 
266 | ![Indexing](index.png)
267 | 
268 | ## Step #4 - Visualize Data
269 | 
270 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
271 | 
272 | Download this dashboard: [co2meter.ndjson](co2meter.ndjson)
273 | 
274 | Jump back into Kibana:
275 | 
276 | 1. Select "Stack Management" from the menu
277 | 2. Select "Saved Objects"
278 | 3. Click "Import" in the upper right
279 | 
280 | Once it's been imported, click on "CO2 Meter".
281 | 
282 | Congratulations! You should now be looking at data from your CO2 Meter in Elastic.
283 | 
284 | ![Dashboard](dashboard.png)
285 | 
286 | These graphs can be added to the [Weather Station](../weather-station) data source.
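If you ever want to spot-check the raw readings without going through Kibana, the NDJSON log the script produces is easy to summarize locally. Here is a minimal sketch, assuming the `/var/log/co2meter.log` path and the field names shown in Step #1:

```python
#!/usr/bin/env python3
# Summarize the co2meter log locally: min / average / max CO2 reading per location.
# Assumes the NDJSON format produced by co2meter.py above.

import json
from collections import defaultdict

readings = defaultdict(list)

with open("/var/log/co2meter.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines
        if "co2_ppm" in event:
            readings[event.get("location", "unknown")].append(event["co2_ppm"])

for location, ppm in readings.items():
    print(f"{location}: n={len(ppm)} min={min(ppm)} avg={sum(ppm) / len(ppm):.0f} max={max(ppm)} ppm")
```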
287 | -------------------------------------------------------------------------------- /co2meter/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/archive.png -------------------------------------------------------------------------------- /co2meter/co2meter.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"co2meter-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"74e365f0-0b03-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-01T09:03:21.562Z","version":"WzIxODM4MSwyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":11,\"h\":4,\"i\":\"83813ed8-374f-42f7-851e-453e236435be\"},\"panelIndex\":\"83813ed8-374f-42f7-851e-453e236435be\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# CO2 Meter\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":4,\"w\":48,\"h\":12,\"i\":\"2af5ddba-aad0-40ae-b802-4a9e3e83fc33\"},\"panelIndex\":\"2af5ddba-aad0-40ae-b802-4a9e3e83fc33\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"6619bac7-cb55-4c41-81cb-f417f1e8d4d4\":{\"columns\":{\"8b1ecdee-e774-4a13-b160-76f80601de32\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"bbd7882d-13ea-40f2-9d25-c863db2ff550\":{\"label\":\"Median of co2_ppm\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"co2_ppm\",\"isBucketed\":false,\"scale\":\"ratio\"},\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\":{\"label\":\"Top values of 
hostname.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"hostname.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\",\"8b1ecdee-e774-4a13-b160-76f80601de32\",\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"yLeftExtent\":{\"mode\":\"custom\",\"lowerBound\":0,\"upperBound\":1600},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"6619bac7-cb55-4c41-81cb-f417f1e8d4d4\",\"accessors\":[\"bbd7882d-13ea-40f2-9d25-c863db2ff550\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8b1ecdee-e774-4a13-b160-76f80601de32\",\"splitAccessor\":\"5ed35199-2f7e-4098-b627-ba46c5bf53f8\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-6619bac7-cb55-4c41-81cb-f417f1e8d4d4\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":16,\"w\":48,\"h\":12,\"i\":\"35baf3b9-4b4f-4248-8044-6415c253c84c\"},\"panelIndex\":\"35baf3b9-4b4f-4248-8044-6415c253c84c\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"f4357ceb-17b0-4b3a-8118-8529aa62726d\":{\"columns\":{\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"07ebffaa-db61-4e78-b3cc-7bd46277170d\":{\"label\":\"Top values of hostname.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"hostname.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}},\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\":{\"label\":\"Median of 
temp_f\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"temp_f\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"07ebffaa-db61-4e78-b3cc-7bd46277170d\",\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\",\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"yLeftExtent\":{\"mode\":\"custom\",\"lowerBound\":0,\"upperBound\":100},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"f4357ceb-17b0-4b3a-8118-8529aa62726d\",\"accessors\":[\"c589e9a4-17c7-4f72-96d3-cdccebbe04a2\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8fe94aed-879a-4a2f-89c1-e787fe8a55fe\",\"splitAccessor\":\"07ebffaa-db61-4e78-b3cc-7bd46277170d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"74e365f0-0b03-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-f4357ceb-17b0-4b3a-8118-8529aa62726d\"}]},\"enhancements\":{}}}]","timeRestore":false,"title":"CO2 Meter","version":1},"coreMigrationVersion":"7.14.0","id":"90479db0-0b04-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"2af5ddba-aad0-40ae-b802-4a9e3e83fc33:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"2af5ddba-aad0-40ae-b802-4a9e3e83fc33:indexpattern-datasource-layer-6619bac7-cb55-4c41-81cb-f417f1e8d4d4","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"35baf3b9-4b4f-4248-8044-6415c253c84c:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"74e365f0-0b03-11ec-b013-53a9df7625dd","name":"35baf3b9-4b4f-4248-8044-6415c253c84c:indexpattern-datasource-layer-f4357ceb-17b0-4b3a-8118-8529aa62726d","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-01T20:50:22.668Z","version":"WzIzNDk3MCwyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /co2meter/co2meter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/co2meter.png -------------------------------------------------------------------------------- /co2meter/co2meter.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import datetime 4 | import fcntl 5 | import json 6 | import sys 7 | import time 8 | 9 | def decrypt(key, data): 10 | cstate = [0x48, 0x74, 0x65, 0x6D, 0x70, 0x39, 0x39, 0x65] 11 | shuffle = [2, 4, 0, 7, 1, 6, 5, 3] 12 | phase1 = [0] * 8 13 | for i, o in enumerate(shuffle): 14 | phase1[o] = data[i] 15 | phase2 = [0] * 8 16 | for i in range(8): 17 | phase2[i] = phase1[i] ^ key[i] 18 | phase3 = [0] * 8 19 | 
for i in range(8): 20 | phase3[i] = ( (phase2[i] >> 3) | (phase2[ (i-1+8)%8 ] << 5) ) & 0xff 21 | ctmp = [0] * 8 22 | for i in range(8): 23 | ctmp[i] = ( (cstate[i] >> 4) | (cstate[i]<<4) ) & 0xff 24 | out = [0] * 8 25 | for i in range(8): 26 | out[i] = (0x100 + phase3[i] - ctmp[i]) & 0xff 27 | return out 28 | 29 | def hd(d): 30 | return " ".join("%02X" % e for e in d) 31 | 32 | if __name__ == "__main__": 33 | key = [0xc4, 0xc6, 0xc0, 0x92, 0x40, 0x23, 0xdc, 0x96] 34 | fp = open("/dev/hidraw0", "a+b", 0) 35 | set_report = [0] + [0xc4, 0xc6, 0xc0, 0x92, 0x40, 0x23, 0xdc, 0x96] 36 | fcntl.ioctl(fp, 0xC0094806, bytearray(set_report)) 37 | 38 | values = {} 39 | 40 | co2_ppm = 0 41 | temp_c = 1000 42 | i = 0 43 | 44 | while True: 45 | i += 1 46 | if i == 10: 47 | break 48 | data = list(fp.read(8)) 49 | decrypted = decrypt(key, data) 50 | if decrypted[4] != 0x0d or (sum(decrypted[:3]) & 0xff) != decrypted[3]: 51 | print(hd(data), " => ", hd(decrypted), "Checksum error") 52 | else: 53 | op = decrypted[0] 54 | val = decrypted[1] << 8 | decrypted[2] 55 | values[op] = val 56 | # http://co2meters.com/Documentation/AppNotes/AN146-RAD-0401-serial-communication.pdf 57 | if 0x50 in values: 58 | co2_ppm = values[0x50] 59 | if 0x42 in values: 60 | temp_c = values[0x42] / 16.0 - 273.15 61 | 62 | temp_f = (temp_c * 9 / 5) + 32 63 | output = { 64 | "@timestamp": datetime.datetime.utcnow().isoformat(), 65 | "hostname": "node-21", 66 | "location": "office", 67 | "co2_ppm": co2_ppm, 68 | "temp_c": float("%2.2f" % temp_c), 69 | "temp_f": float("%2.2f" % temp_f), 70 | "source": "CO2 Meter" 71 | } 72 | print(json.dumps(output)) 73 | -------------------------------------------------------------------------------- /co2meter/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/dashboard.png -------------------------------------------------------------------------------- /co2meter/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/index.png -------------------------------------------------------------------------------- /co2meter/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/co2meter/minio.png -------------------------------------------------------------------------------- /directory-sizes/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Directory Sizes 2 | 3 | DHT22 4 | 5 | Keeping an eye on the growth of your Data Lake is useful for a few reasons: 6 | 7 | 1. See how fast each data source is growing on disk 8 | 2. Keep an eye on how much space you have available 9 | 3. Better understand the cost of storing each data source 10 | 11 | We'll use a Python script to query the size of each directory in our Data Lake (via NFS mount) in addition to recording the total size and space available for use. Our script will write to stdout which we'll redirect to a log file. From there, Filebeat will pick it up and send it to Elastic. 12 | 13 | ![Dashboard](dashboard.png) 14 | 15 | Let's get started. 
16 | 17 | ## Step #1 - Collect Data 18 | 19 | Create a Python script at `~/bin/directory-sizes.py` with the following contents (adjusting any values as you see fit): 20 | 21 | ```python 22 | #!/usr/bin/env python3 23 | 24 | import datetime 25 | import json 26 | import os 27 | 28 | path = "/mnt/data-lake" 29 | 30 | def get_size(start_path = path): 31 | total_size = 0 32 | for dirpath, dirnames, filenames in os.walk(start_path): 33 | for f in filenames: 34 | fp = os.path.join(dirpath, f) 35 | if not os.path.islink(fp): # skip symbolic links 36 | total_size += os.path.getsize(fp) 37 | return total_size 38 | 39 | if __name__ == "__main__": 40 | if os.path.ismount(path): 41 | # Get size of each directory 42 | for d in os.listdir(path): 43 | size_bytes = get_size(path + "/" + d) 44 | output = { 45 | "@timestamp": datetime.datetime.utcnow().isoformat(), 46 | "dir": d, 47 | "bytes": size_bytes 48 | } 49 | print(json.dumps(output)) 50 | 51 | # Get total, available, and free space 52 | statvfs = os.statvfs(path) 53 | output = { 54 | "@timestamp": datetime.datetime.utcnow().isoformat(), 55 | "total_bytes": statvfs.f_frsize * statvfs.f_blocks, # Size of filesystem in bytes 56 | "free_bytes": statvfs.f_frsize * statvfs.f_bfree, # Free bytes total 57 | "available_bytes": statvfs.f_frsize * statvfs.f_bavail, # Free bytes for users 58 | "mounted": True 59 | } 60 | print(json.dumps(output)) 61 | else: 62 | output = { 63 | "@timestamp": datetime.datetime.utcnow().isoformat(), 64 | "mounted": False 65 | } 66 | print(json.dumps(output)) 67 | ``` 68 | 69 | Try running the script from the command line: 70 | 71 | ```bash 72 | chmod a+x ~/bin/directory-sizes.py 73 | ~/bin/directory-sizes.py 74 | ``` 75 | 76 | The output should look like the following: 77 | 78 | ```json 79 | {"@timestamp": "2021-09-06T14:46:37.376487", "dir": "nginx", "bytes": 1445406508} 80 | {"@timestamp": "2021-09-06T14:46:39.673445", "dir": "system-metricbeat-module", "bytes": 62265436549} 81 | {"@timestamp": "2021-09-06T14:46:39.683812", "dir": "flights", "bytes": 5943006981} 82 | {"@timestamp": "2021-09-06T14:46:41.122360", "dir": "haproxy-metricbeat-module", "bytes": 15443596238} 83 | {"@timestamp": "2021-09-06T14:46:41.122731", "dir": "weather-historical", "bytes": 137599636} 84 | ... 85 | ``` 86 | 87 | Once you're able to successfully run the script, create a log file for its output: 88 | 89 | ```bash 90 | sudo touch /var/log/directory-sizes.log 91 | sudo chown ubuntu.ubuntu /var/log/directory-sizes.log 92 | ``` 93 | 94 | Create a logrotate entry so the log file doesn't grow unbounded: 95 | 96 | ``` 97 | sudo vi /etc/logrotate.d/directory-sizes 98 | ``` 99 | 100 | Add the following content: 101 | 102 | ``` 103 | /var/log/directory-sizes.log { 104 | weekly 105 | rotate 12 106 | compress 107 | delaycompress 108 | missingok 109 | notifempty 110 | create 644 ubuntu ubuntu 111 | } 112 | ``` 113 | 114 | Add the following entry to your crontab: 115 | 116 | ``` 117 | * * * * * sudo /home/ubuntu/bin/directory-sizes.py >> /var/log/directory-sizes.log 2>&1 118 | ``` 119 | 120 | Verify output by tailing the log file for a few minutes: 121 | 122 | ``` 123 | $ tail -f /var/log/directory-sizes.log 124 | ``` 125 | 126 | If you're seeing output scroll each minute then you are successfully collecting data! 127 | 128 | ## Step #2 - Archive Data 129 | 130 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 
131 | 
132 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your directory size data:
133 | 
134 | ```yaml
135 | filebeat.inputs:
136 | - type: log
137 |   enabled: true
138 |   tags: ["directory-sizes"]
139 |   paths:
140 |     - /var/log/directory-sizes.log
141 | ```
142 | 
143 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
144 | 
145 | Restart Filebeat:
146 | 
147 | ```bash
148 | sudo systemctl restart filebeat
149 | ```
150 | 
151 | You may want to tail syslog to see if Filebeat restarts without any issues:
152 | 
153 | ```bash
154 | tail -f /var/log/syslog | grep filebeat
155 | ```
156 | 
157 | At this point, we should have directory size data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the directory sizes data feed.
158 | 
159 | Add the following conditional to your `distributor.yml` file:
160 | 
161 | ```
162 | } else if "directory-sizes" in [tags] {
163 |   pipeline {
164 |     send_to => ["directory-sizes-archive"]
165 |   }
166 | }
167 | ```
168 | 
169 | Create a Logstash pipeline called `directory-sizes-archive.yml` with the following contents:
170 | 
171 | ```
172 | input {
173 |   pipeline {
174 |     address => "directory-sizes-archive"
175 |   }
176 | }
177 | filter {
178 | }
179 | output {
180 |   s3 {
181 |     #
182 |     # Custom Settings
183 |     #
184 |     prefix => "directory-sizes/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
185 |     temporary_directory => "${S3_TEMP_DIR}/directory-sizes-archive"
186 |     access_key_id => "${S3_ACCESS_KEY}"
187 |     secret_access_key => "${S3_SECRET_KEY}"
188 |     endpoint => "${S3_ENDPOINT}"
189 |     bucket => "${S3_BUCKET}"
190 | 
191 |     #
192 |     # Standard Settings
193 |     #
194 |     validate_credentials_on_root_bucket => false
195 |     codec => json_lines
196 |     # Limit Data Lake file sizes to 5 GB
197 |     size_file => 5000000000
198 |     time_file => 60
199 |     # encoding => "gzip"
200 |     additional_settings => {
201 |       force_path_style => true
202 |       follow_redirects => false
203 |     }
204 |   }
205 | }
206 | ```
207 | 
208 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
209 | 
210 | ```bash
211 | sudo mv directory-sizes-archive.yml /etc/logstash/conf.d/
212 | ```
213 | 
214 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
215 | 
216 | ```
217 | - pipeline.id: "directory-sizes-archive"
218 |   path.config: "/etc/logstash/conf.d/directory-sizes-archive.yml"
219 | ```
220 | 
221 | And finally, restart the Logstash service:
222 | 
223 | ```bash
224 | sudo systemctl restart logstash
225 | ```
226 | 
227 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
228 | 
229 | ```bash
230 | sudo tail -f /var/log/logstash/logstash-plain.log
231 | ```
232 | 
233 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
234 | 
235 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
236 | 
237 | ![Stack Monitoring](archive.png)
238 | 
239 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
240 | 
241 | ![MinIO](minio.png)
242 | 
243 | If you see your data being stored, then you are successfully archiving!
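Before wiring up the index pipeline, you can also sanity-check the numbers straight from the log file. Below is a small sketch that prints the most recent size reported for each directory, largest first, in GB. It assumes the `/var/log/directory-sizes.log` path used above; it accepts either the `dir` field emitted by the script in Step #1 or the `directory` field emitted by the copy of the script checked into this repo.

```python
#!/usr/bin/env python3
# Print the most recent size reported for each Data Lake directory, largest first.
# Assumes the NDJSON format produced by directory-sizes.py above.

import json

latest = {}

with open("/var/log/directory-sizes.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines
        name = event.get("dir") or event.get("directory")
        if name and "bytes" in event:
            latest[name] = event["bytes"]  # later lines overwrite earlier ones

for name, size in sorted(latest.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{size / 1e9:10.2f} GB  {name}")
```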
244 | 
245 | ## Step #3 - Index Data
246 | 
247 | Once Logstash is archiving the data, next we need to index it with Elastic.
248 | 
249 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in.
250 | 
251 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
252 | 
253 | Create a new pipeline called `directory-sizes-index.yml` with the following content:
254 | 
255 | ```
256 | input {
257 |   pipeline {
258 |     address => "directory-sizes-index"
259 |   }
260 | }
261 | filter {
262 |   json {
263 |     source => "message"
264 |   }
265 |   json {
266 |     source => "message"
267 |   }
268 |   date {
269 |     match => ["timestamp", "ISO8601"]
270 |   }
271 |   mutate {
272 |     remove_field => ["timestamp", "message"]
273 |     remove_field => ["tags", "agent", "input", "log", "path", "ecs", "@version"]
274 |   }
275 | }
276 | output {
277 |   elasticsearch {
278 |     #
279 |     # Custom Settings
280 |     #
281 |     id => "directory-sizes-index"
282 |     index => "directory-sizes-%{+YYYY.MM.dd}"
283 |     hosts => "${ES_ENDPOINT}"
284 |     user => "${ES_USERNAME}"
285 |     password => "${ES_PASSWORD}"
286 |   }
287 | }
288 | ```
289 | 
290 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
291 | 
292 | ```bash
293 | sudo mv directory-sizes-index.yml /etc/logstash/conf.d/
294 | ```
295 | 
296 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
297 | 
298 | ```
299 | - pipeline.id: "directory-sizes-index"
300 |   path.config: "/etc/logstash/conf.d/directory-sizes-index.yml"
301 | ```
302 | 
303 | Append your new pipeline to your tagged data in the `distributor.yml` pipeline:
304 | 
305 | ```
306 | } else if "directory-sizes" in [tags] {
307 |   pipeline {
308 |     send_to => ["directory-sizes-archive", "directory-sizes-index"]
309 |   }
310 | }
311 | ```
312 | 
313 | And finally, restart the Logstash service:
314 | 
315 | ```bash
316 | sudo systemctl restart logstash
317 | ```
318 | 
319 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
320 | 
321 | ```bash
322 | sudo tail -f /var/log/logstash/logstash-plain.log
323 | ```
324 | 
325 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
326 | 
327 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
328 | 
329 | ![Indexing](index.png)
330 | 
331 | ## Step #4 - Visualize Data
332 | 
333 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
334 | 
335 | Download this dashboard: [directory-sizes.ndjson](directory-sizes.ndjson)
336 | 
337 | Jump back into Kibana:
338 | 
339 | 1. Select "Stack Management" from the menu
340 | 2. Select "Saved Objects"
341 | 3. Click "Import" in the upper right
342 | 
343 | Once it's been imported, click on "Directory Sizes".
344 | 
345 | ![Dashboard](dashboard.png)
346 | 
347 | Congratulations! You should now be looking at the size of each directory in your Data Lake in Elastic.
348 | 349 | -------------------------------------------------------------------------------- /directory-sizes/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/archive.png -------------------------------------------------------------------------------- /directory-sizes/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/dashboard.png -------------------------------------------------------------------------------- /directory-sizes/directory-sizes.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import datetime 4 | import json 5 | import os 6 | 7 | path = "/mnt/data-lake" 8 | 9 | def get_size(start_path = path): 10 | total_size = 0 11 | for dirpath, dirnames, filenames in os.walk(start_path): 12 | for f in filenames: 13 | fp = os.path.join(dirpath, f) 14 | # skip if it is symbolic link 15 | if not os.path.islink(fp): 16 | total_size += os.path.getsize(fp) 17 | 18 | return total_size 19 | 20 | if __name__ == "__main__": 21 | 22 | if os.path.ismount(path): 23 | # Get size of each directory 24 | for d in os.listdir(path): 25 | size_bytes = get_size(path + "/" + d) 26 | output = { 27 | "@timestamp": datetime.datetime.utcnow().isoformat(), 28 | "directory": d, 29 | "bytes": size_bytes 30 | } 31 | print(json.dumps(output)) 32 | 33 | # Get total, available, and free space 34 | statvfs = os.statvfs(path) 35 | output = { 36 | "@timestamp": datetime.datetime.utcnow().isoformat(), 37 | "total_bytes": statvfs.f_frsize * statvfs.f_blocks, # Size of filesystem in bytes 38 | "free_bytes": statvfs.f_frsize * statvfs.f_bfree, # Free bytes total 39 | "available_bytes": statvfs.f_frsize * statvfs.f_bavail, # Free bytes for users 40 | "mounted": True 41 | } 42 | print(json.dumps(output)) 43 | else: 44 | output = { 45 | "@timestamp": datetime.datetime.utcnow().isoformat(), 46 | "mounted": False 47 | } 48 | print(json.dumps(output)) 49 | 50 | -------------------------------------------------------------------------------- /directory-sizes/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/index.png -------------------------------------------------------------------------------- /directory-sizes/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/logo.png -------------------------------------------------------------------------------- /directory-sizes/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/directory-sizes/minio.png -------------------------------------------------------------------------------- /flight-tracker/README.md: -------------------------------------------------------------------------------- 1 | # Elastic Flight Tracker 2 | 3 | 4 | 5 | For this data source, we'll be using an [SDR](https://www.amazon.com/gp/product/B01GDN1T4S) to track aircraft flights via 
[ADS-B](https://mode-s.org/decode/). We'll use a Python script to decode the signals and write them to a log file. Elastic's Filebeat will pick them up from there and handle getting them to Logstash. 6 | 7 | We'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/flight-tracker.py` with the following contents: 16 | 17 | ​ [flight-tracker.py](flight-tracker.py) 18 | 19 | The script requires that your SDR be plugged in before running. 20 | 21 | Take a few minutes to familiarize yourself with the script. Adjust the values of `` and ``. You can use [LatLong.net](https://www.latlong.net/) to lookup your location. 22 | 23 | When you're ready, try running the script: 24 | 25 | ```bash 26 | chmod a+x ~/bin/flight-tracker.py 27 | sudo ~/bin/flight-tracker.py 28 | ``` 29 | 30 | It may take a few minutes to see output if you're in a quiet airspace, but once ~10 messages have been received you should see output on `stdout` similar to: 31 | 32 | ```json 33 | {"@timestamp": "2021-09-08T15:20:04.046427", "hex_ident": "A49DE9", "call_sign": null, "location": [42.05695, -88.04905], "altitude_ft": 31475, "speed_kts": 334, "track_angle_deg": 169, "vertical_speed_fpm": 3328, "speed_ref": "GS"} 34 | {"@timestamp": "2021-09-08T15:20:03.330181", "hex_ident": "A1D4BC", "call_sign": "ENY4299", "location": [41.78804, -88.11425], "altitude_ft": 9675, "speed_kts": 292, "track_angle_deg": 41, "vertical_speed_fpm": -1792, "speed_ref": "GS"} 35 | {"@timestamp": "2021-09-08T15:20:05.502300", "hex_ident": "ACC3B4", "call_sign": "AAL2080", "location": [41.91885, -88.03], "altitude_ft": 7600, "speed_kts": 289, "track_angle_deg": 45, "vertical_speed_fpm": -1536, "speed_ref": "GS"} 36 | ``` 37 | 38 | Once you confirm the script is working, you can redirect its output to a log file: 39 | 40 | ```bash 41 | sudo touch /var/log/flight-tracker.log 42 | sudo chown ubuntu.ubuntu /var/log/flight-tracker.log 43 | ``` 44 | 45 | Create a logrotate entry so the log file doesn't grow unbounded: 46 | 47 | ```bash 48 | sudo vi /etc/logrotate.d/flight-tracker 49 | ``` 50 | 51 | Add the following logrotate content: 52 | 53 | ``` 54 | /var/log/flight-tracker.log { 55 | weekly 56 | rotate 12 57 | compress 58 | delaycompress 59 | missingok 60 | notifempty 61 | create 644 ubuntu ubuntu 62 | } 63 | ``` 64 | 65 | Create a new bash script `~/bin/flight-tracker.sh` with the following: 66 | 67 | ```bash 68 | #!/bin/bash 69 | 70 | if pgrep -f "sudo /home/ubuntu/bin/flight-tracker.py" > /dev/null 71 | then 72 | echo "Already running." 73 | else 74 | echo "Not running. Restarting..." 75 | sudo /home/ubuntu/bin/flight-tracker.py >> /var/log/flight-tracker.log 2>&1 76 | fi 77 | ``` 78 | 79 | Make it executable: 80 | 81 | ```bash 82 | chmod a+x ~/bin/flight-tracker.sh 83 | ``` 84 | 85 | Add the following entry to your crontab with `crontab -e`: 86 | 87 | ``` 88 | * * * * * /home/ubuntu/bin/flight-tracker.sh >> /tmp/flight-tracker.log 2>&1 89 | ``` 90 | 91 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 92 | 93 | ```bash 94 | tail -f /var/log/flight-tracker.log 95 | ``` 96 | 97 | If you're seeing output every few seconds, then you are successfully collecting data! 98 | 99 | ## Step #2 - Archive Data 100 | 101 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 
102 | 
103 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your flight data:
104 | 
105 | ```yaml
106 | filebeat.inputs:
107 | - type: log
108 |   enabled: true
109 |   tags: ["flight-tracker"]
110 |   paths:
111 |     - /var/log/flight-tracker.log
112 | ```
113 | 
114 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream.
115 | 
116 | Restart Filebeat:
117 | 
118 | ```bash
119 | sudo systemctl restart filebeat
120 | ```
121 | 
122 | You may want to tail syslog to see if Filebeat restarts without any issues:
123 | 
124 | ```bash
125 | tail -f /var/log/syslog | grep filebeat
126 | ```
127 | 
128 | At this point, we should have flight tracker data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the flight tracker data feed.
129 | 
130 | Add the following conditional to your `distributor.yml` file:
131 | 
132 | ```
133 | } else if "flight-tracker" in [tags] {
134 |   pipeline {
135 |     send_to => ["flight-tracker-archive"]
136 |   }
137 | }
138 | ```
139 | 
140 | Create a Logstash pipeline called `flight-tracker-archive.yml` with the following contents:
141 | 
142 | ```
143 | input {
144 |   pipeline {
145 |     address => "flight-tracker-archive"
146 |   }
147 | }
148 | filter {
149 | }
150 | output {
151 |   s3 {
152 |     #
153 |     # Custom Settings
154 |     #
155 |     prefix => "flight-tracker/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}"
156 |     temporary_directory => "${S3_TEMP_DIR}/flight-tracker-archive"
157 |     access_key_id => "${S3_ACCESS_KEY}"
158 |     secret_access_key => "${S3_SECRET_KEY}"
159 |     endpoint => "${S3_ENDPOINT}"
160 |     bucket => "${S3_BUCKET}"
161 | 
162 |     #
163 |     # Standard Settings
164 |     #
165 |     validate_credentials_on_root_bucket => false
166 |     codec => json_lines
167 |     # Limit Data Lake file sizes to 5 GB
168 |     size_file => 5000000000
169 |     time_file => 60
170 |     # encoding => "gzip"
171 |     additional_settings => {
172 |       force_path_style => true
173 |       follow_redirects => false
174 |     }
175 |   }
176 | }
177 | ```
178 | 
179 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
180 | 
181 | ```bash
182 | sudo mv flight-tracker-archive.yml /etc/logstash/conf.d/
183 | ```
184 | 
185 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
186 | 
187 | ```
188 | - pipeline.id: "flight-tracker-archive"
189 |   path.config: "/etc/logstash/conf.d/flight-tracker-archive.yml"
190 | ```
191 | 
192 | And finally, restart the Logstash service:
193 | 
194 | ```bash
195 | sudo systemctl restart logstash
196 | ```
197 | 
198 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
199 | 
200 | ```bash
201 | sudo tail -f /var/log/logstash/logstash-plain.log
202 | ```
203 | 
204 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
205 | 
206 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
207 | 
208 | ![Stack Monitoring](archive.png)
209 | 
210 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data:
211 | 
212 | ![MinIO](minio.png)
213 | 
214 | If you see your data being stored, then you are successfully archiving!
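The archived events also make it easy to answer quick questions offline, such as how far away the aircraft you're receiving are. Below is a small sketch that computes the great-circle (haversine) distance from your receiver to each position report in the log. It assumes the NDJSON format shown in Step #1, and the receiver coordinates are placeholders you should replace with your own.

```python
#!/usr/bin/env python3
# Rough receiver-range check: haversine distance from the receiver to each
# position report in the flight-tracker log. Assumes the NDJSON format produced
# by flight-tracker.py above; RECEIVER coordinates are placeholders.

import json
from math import asin, cos, radians, sin, sqrt

RECEIVER = (41.978611, -87.904724)  # replace with your receiver's lat/lon

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # Earth radius ~6371 km

max_km = 0.0
with open("/var/log/flight-tracker.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines
        loc = event.get("location")
        if loc:
            km = haversine_km(RECEIVER[0], RECEIVER[1], loc[0], loc[1])
            max_km = max(max_km, km)

print(f"Farthest position report received: {max_km:.1f} km")
```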
215 | 
216 | ## Step #3 - Index Data
217 | 
218 | Once Logstash is archiving the data, next we need to index it with Elastic.
219 | 
220 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the majority of the data we're sending in. The one exception is the `location` field, which we need to explicitly map as a `geo_point` using an index template.
221 | 
222 | Jump into Kibana and create the following Index Template using Dev Tools:
223 | 
224 | ```
225 | PUT _index_template/flight-tracker
226 | {
227 |   "index_patterns": ["flight-tracker-*"],
228 |   "template": {
229 |     "settings": {},
230 |     "mappings": {
231 |       "properties": {
232 |         "location": {
233 |           "type": "geo_point"
234 |         }
235 |       }
236 |     },
237 |     "aliases": {}
238 |   }
239 | }
240 | ```
241 | 
242 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
243 | 
244 | Create a new pipeline called `flight-tracker-index.yml` with the following content:
245 | 
246 | ```
247 | input {
248 |   pipeline {
249 |     address => "flight-tracker-index"
250 |   }
251 | }
252 | filter {
253 |   json {
254 |     source => "message"
255 |   }
256 |   json {
257 |     source => "message"
258 |   }
259 |   mutate {
260 |     remove_field => ["message", "tags", "path"]
261 |     remove_field => ["agent", "host", "input", "log", "ecs", "@version"]
262 |   }
263 | }
264 | output {
265 |   elasticsearch {
266 |     #
267 |     # Custom Settings
268 |     #
269 |     id => "flight-tracker-index"
270 |     index => "flight-tracker-%{+YYYY.MM.dd}"
271 |     hosts => "${ES_ENDPOINT}"
272 |     user => "${ES_USERNAME}"
273 |     password => "${ES_PASSWORD}"
274 |   }
275 | }
276 | ```
277 | 
278 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts:
279 | 
280 | ```bash
281 | sudo mv flight-tracker-index.yml /etc/logstash/conf.d/
282 | ```
283 | 
284 | Add the pipeline to your `/etc/logstash/pipelines.yml` file:
285 | 
286 | ```
287 | - pipeline.id: "flight-tracker-index"
288 |   path.config: "/etc/logstash/conf.d/flight-tracker-index.yml"
289 | ```
290 | 
291 | Then update the `flight-tracker` conditional in your `distributor.yml` so it also sends events to this new pipeline (`send_to => ["flight-tracker-archive", "flight-tracker-index"]`), and finally, restart the Logstash service:
292 | 
293 | ```bash
294 | sudo systemctl restart logstash
295 | ```
296 | 
297 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors:
298 | 
299 | ```bash
300 | sudo tail -f /var/log/logstash/logstash-plain.log
301 | ```
302 | 
303 | After a few seconds, you should see Logstash shut down and then start up with the new pipeline and no errors being emitted.
304 | 
305 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline:
306 | 
307 | ![Indexing](index.png)
308 | 
309 | ## Step #4 - Visualize Data
310 | 
311 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana.
312 | 
313 | Download this dashboard: [flight-tracker.ndjson](flight-tracker.ndjson)
314 | 
315 | Jump back into Kibana:
316 | 
317 | 1. Select "Stack Management" from the menu
318 | 2. Select "Saved Objects"
319 | 3. Click "Import" in the upper right
320 | 
321 | Once it's been imported, click on "Flight Tracker".
322 | 323 | ![Dashboard](dashboard.png) 324 | 325 | If you'd like to plot the location of your receiver (i.e., the orange tower in the Elastic Map), add the following document using Dev Tools (replacing the `lat` and`lon` with your location): 326 | 327 | ```JSON 328 | PUT /flight-tracker-receiver/_doc/1 329 | { 330 | "location": { 331 | "lat": 41.978611, 332 | "lon": -87.904724 333 | } 334 | } 335 | ``` 336 | 337 | Congratulations! You should now be looking at live flights in Elastic as they're being collected by your base station! 338 | -------------------------------------------------------------------------------- /flight-tracker/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/archive.png -------------------------------------------------------------------------------- /flight-tracker/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/dashboard.png -------------------------------------------------------------------------------- /flight-tracker/flight-tracker.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"flight-tracker-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"c10de000-10c0-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-08T16:21:00.040Z","version":"WzQwODEyMSwyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":12,\"h\":5,\"i\":\"11142c2c-8f3c-4eb0-9c2c-bc87dd272bc1\"},\"panelIndex\":\"11142c2c-8f3c-4eb0-9c2c-bc87dd272bc1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Flight Tracker\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":12,\"y\":0,\"w\":36,\"h\":5,\"i\":\"2cf8e227-485b-4209-af4e-9416692b8916\"},\"panelIndex\":\"2cf8e227-485b-4209-af4e-9416692b8916\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"451792f9-6bc9-4b2d-a8db-b88844fa22d0\":{\"columns\":{\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"95952347-70f3-4884-93f8-cef298545532\":{\"label\":\"Count of 
records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\",\"95952347-70f3-4884-93f8-cef298545532\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"451792f9-6bc9-4b2d-a8db-b88844fa22d0\",\"accessors\":[\"95952347-70f3-4884-93f8-cef298545532\"],\"position\":\"top\",\"seriesType\":\"bar_stacked\",\"showGridlines\":false,\"xAccessor\":\"ea213a93-dd58-4eb6-9b0b-4c747c1592d2\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"c10de000-10c0-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"c10de000-10c0-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-451792f9-6bc9-4b2d-a8db-b88844fa22d0\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"map\",\"gridData\":{\"x\":0,\"y\":5,\"w\":48,\"h\":33,\"i\":\"6641024f-134f-488a-98bb-e6e65eb47fcc\"},\"panelIndex\":\"6641024f-134f-488a-98bb-e6e65eb47fcc\",\"embeddableConfig\":{\"attributes\":{\"title\":\"Flight Tracker\",\"description\":\"\",\"layerListJSON\":\"[{\\\"sourceDescriptor\\\":{\\\"type\\\":\\\"EMS_TMS\\\",\\\"isAutoSelect\\\":true,\\\"id\\\":null},\\\"id\\\":\\\"b97e4b84-4fb0-414d-88cf-a3bfb409c8fe\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"TILE\\\"},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR_TILE\\\"},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"LIMIT\\\",\\\"id\\\":\\\"85e4a6d8-32db-4f58-8268-19505567a7ac\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"tooltipProperties\\\":[],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"topHitsSplitField\\\":\\\"\\\",\\\"topHitsSize\\\":1},\\\"id\\\":\\\"7b080642-7356-4d69-881d-914ccbc26fa0\\\",\\\"label\\\":\\\"Check-ins\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"airport\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#54B399\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#41937c\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":0}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":3}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"field\\\":{\\\"label\\\":\\\"track_deg\\\",\\\"name\\\":\\\"track_deg\\\",\\\"origin\\\":\\\"source\\\",\\\"type\\\":\\\"number\\\",\\\"supportsAutoDomain\\\":true},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3}}},\\\"labelText\\\":{\\\"type\\\":\\\"DYNAMI
C\\\",\\\"options\\\":{\\\"field\\\":null}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"circle\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"scalingType\\\":\\\"TOP_HITS\\\",\\\"sortField\\\":\\\"@timestamp\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"tooltipProperties\\\":[\\\"hex_code.keyword\\\",\\\"@timestamp\\\",\\\"call_sign\\\",\\\"altitude_ft\\\",\\\"speed_kts\\\",\\\"vertical_speed_fpm\\\"],\\\"topHitsSplitField\\\":\\\"hex_code.keyword\\\",\\\"topHitsSize\\\":1,\\\"id\\\":\\\"4973e2c9-1ca6-4518-bac7-088913a7d434\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"filterByMapBounds\\\":true},\\\"id\\\":\\\"5bb65a5f-f1c6-4e65-ae5f-f140bdecb838\\\",\\\"label\\\":\\\"Flights\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"airport\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#6092C0\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#4379aa\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":0}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":10}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"field\\\":{\\\"label\\\":\\\"track_deg\\\",\\\"name\\\":\\\"track_deg\\\",\\\"origin\\\":\\\"source\\\",\\\"type\\\":\\\"number\\\",\\\"supportsAutoDomain\\\":true},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3}}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"splitField\\\":\\\"hex_code.keyword\\\",\\\"sortField\\\":\\\"@timestamp\\\",\\\"id\\\":\\\"e5e0fe1f-b235-4f42-b361-2e0d7177b1a8\\\",\\\"type\\\":\\\"ES_GEO_LINE\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"metrics\\\":[{\\\"type\\\":\\\"count\\\"}]},\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"marker\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#54B399\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#41937c\\\
"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":2}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":6}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"circle\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"id\\\":\\\"90e78113-220f-439f-908e-5ec584444856\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"c10de000-10c0-11ec-b013-53a9df7625dd\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"LIMIT\\\",\\\"id\\\":\\\"efde372c-8fee-4433-ad0b-5cf3b1ae0c1b\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":false,\\\"applyGlobalTime\\\":false,\\\"tooltipProperties\\\":[],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\",\\\"topHitsSplitField\\\":\\\"\\\",\\\"topHitsSize\\\":1},\\\"id\\\":\\\"89aa1c7a-a9a0-4ad3-a15e-309539f3d74b\\\",\\\"label\\\":\\\"Base Station\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"communications-tower\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#f8a305\\\"}},\\\"lineColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#F8A305\\\"}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":1}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":16}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"includeInFitToBounds\\\":true,\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[],\\\"query\\\":{\\\"query\\\":\\\"_id : 
1\\\",\\\"language\\\":\\\"kuery\\\"}}]\",\"mapStateJSON\":\"{\\\"zoom\\\":8.47,\\\"center\\\":{\\\"lon\\\":-88.20859,\\\"lat\\\":41.95384},\\\"timeFilters\\\":{\\\"from\\\":\\\"now-3m\\\",\\\"to\\\":\\\"now\\\"},\\\"refreshConfig\\\":{\\\"isPaused\\\":false,\\\"interval\\\":10000},\\\"query\\\":{\\\"query\\\":\\\"\\\",\\\"language\\\":\\\"kuery\\\"},\\\"filters\\\":[],\\\"settings\\\":{\\\"autoFitToDataBounds\\\":false,\\\"backgroundColor\\\":\\\"#1d1e24\\\",\\\"disableInteractive\\\":false,\\\"disableTooltipControl\\\":false,\\\"hideToolbarOverlay\\\":false,\\\"hideLayerControl\\\":false,\\\"hideViewControl\\\":false,\\\"initialLocation\\\":\\\"LAST_SAVED_LOCATION\\\",\\\"fixedLocation\\\":{\\\"lat\\\":0,\\\"lon\\\":0,\\\"zoom\\\":2},\\\"browserLocation\\\":{\\\"zoom\\\":2},\\\"maxZoom\\\":24,\\\"minZoom\\\":0,\\\"showScaleControl\\\":false,\\\"showSpatialFilters\\\":true,\\\"showTimesliderToggleButton\\\":true,\\\"spatialFiltersAlpa\\\":0.3,\\\"spatialFiltersFillColor\\\":\\\"#DA8B45\\\",\\\"spatialFiltersLineColor\\\":\\\"#DA8B45\\\"}}\",\"uiStateJSON\":\"{\\\"isLayerTOCOpen\\\":true,\\\"openTOCDetails\\\":[]}\"},\"mapCenter\":{\"lat\":41.73541,\"lon\":-88.02419,\"zoom\":8.75},\"mapBuffer\":{\"minLon\":-89.29687,\"minLat\":40.9799,\"maxLon\":-86.48437,\"maxLat\":42.55308},\"isLayerTOCOpen\":true,\"openTOCDetails\":[],\"hiddenLayers\":[\"7b080642-7356-4d69-881d-914ccbc26fa0\"],\"enhancements\":{}}}]","timeRestore":false,"title":"Flight Tracker","version":1},"coreMigrationVersion":"7.14.0","id":"0ada6ef0-10c2-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"2cf8e227-485b-4209-af4e-9416692b8916:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"2cf8e227-485b-4209-af4e-9416692b8916:indexpattern-datasource-layer-451792f9-6bc9-4b2d-a8db-b88844fa22d0","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_1_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_2_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_3_source_index_pattern","type":"index-pattern"},{"id":"c10de000-10c0-11ec-b013-53a9df7625dd","name":"6641024f-134f-488a-98bb-e6e65eb47fcc:layer_4_source_index_pattern","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-08T16:47:35.330Z","version":"WzQwODk1OCwyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /flight-tracker/flight-tracker.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import pyModeS as pms 4 | 5 | from datetime import datetime, timedelta 6 | from json import dumps 7 | from pyModeS import common 8 | from pyModeS.extra.rtlreader import RtlReader 9 | 10 | class Flight: 11 | def __init__(self, hex_ident=None): 12 | self.hex_ident = hex_ident 13 | self.call_sign = None 14 | self.location = None 15 | self.altitude_ft = None 16 | self.speed_kts = None 17 | self.track_angle_deg = None 18 | self.vertical_speed_fpm = None 19 | self.speed_ref = None 20 | self.last_seen = None 21 | self.sent = False 22 | 23 | def has_info(self): 24 | return 
(#self.call_sign is not None and 25 | self.location is not None and 26 | self.altitude_ft is not None and 27 | self.track_angle_deg is not None and 28 | self.speed_kts is not None) 29 | 30 | def pretty_print(self): 31 | print(self.hex_ident, self.call_sign, self.location, self.altitude_ft, self.speed_kts, 32 | self.track_angle_deg, self.vertical_speed_fpm, self.speed_ref) 33 | 34 | def json_print(self): 35 | output = { 36 | "@timestamp": self.last_seen.isoformat(), 37 | "hex_code": self.hex_ident, 38 | "call_sign": self.call_sign, 39 | "location": { 40 | "lat": self.location[0], 41 | "lon": self.location[1] 42 | }, 43 | "altitude_ft": self.altitude_ft, 44 | "speed_kts": self.speed_kts, 45 | "track_deg": int(self.track_angle_deg), 46 | "vertical_speed_fpm": self.vertical_speed_fpm, 47 | "speed_ref": self.speed_ref 48 | } 49 | print(dumps(output)) 50 | 51 | 52 | class ADSBClient(RtlReader): 53 | def __init__(self): 54 | super(ADSBClient, self).__init__() 55 | self.flights = {} 56 | self.lat_ref = 57 | self.lon_ref = 58 | self.i = 0 59 | 60 | def handle_messages(self, messages): 61 | self.i += 1 62 | for msg, ts in messages: 63 | if len(msg) != 28: # wrong data length 64 | continue 65 | 66 | df = pms.df(msg) 67 | 68 | if df != 17: # not ADSB 69 | continue 70 | 71 | if pms.crc(msg) !=0: # CRC fail 72 | continue 73 | 74 | icao = pms.adsb.icao(msg) 75 | tc = pms.adsb.typecode(msg) 76 | flight = None 77 | 78 | if icao in self.flights: 79 | flight = self.flights[icao] 80 | else: 81 | flight = Flight(icao) 82 | 83 | flight.last_seen = datetime.now() 84 | 85 | # Message Type Codes: https://mode-s.org/api/ 86 | if tc >= 1 and tc <= 4: 87 | # Typecode 1-4 88 | flight.call_sign = pms.adsb.callsign(msg).strip('_') 89 | elif tc >= 9 and tc <= 18: 90 | # Typecode 9-18 (airborne, barometric height) 91 | flight.location = pms.adsb.airborne_position_with_ref(msg, 92 | self.lat_ref, self.lon_ref) 93 | flight.altitude_ft = pms.adsb.altitude(msg) 94 | flight.sent = False 95 | elif tc == 19: 96 | # Typecode: 19 97 | # Ground Speed (GS) or Airspeed (IAS/TAS) 98 | # Output (speed, track angle, vertical speed, tag): 99 | (flight.speed_kts, flight.track_angle_deg, flight.vertical_speed_fpm, 100 | flight.speed_ref) = pms.adsb.velocity(msg) 101 | 102 | self.flights[icao] = flight 103 | 104 | if self.i > 10: 105 | self.i = 0 106 | #print("Flights: ", len(self.flights)) 107 | for key in list(self.flights): 108 | f = self.flights[key] 109 | if f.has_info() and not f.sent: 110 | #f.pretty_print() 111 | f.json_print() 112 | f.sent = True 113 | elif f.last_seen < (datetime.now() - timedelta(minutes=5)): 114 | #print("Deleting ", key) 115 | del self.flights[key] 116 | 117 | 118 | if __name__ == "__main__": 119 | client = ADSBClient() 120 | client.run() 121 | -------------------------------------------------------------------------------- /flight-tracker/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/index.png -------------------------------------------------------------------------------- /flight-tracker/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/logo.png -------------------------------------------------------------------------------- /flight-tracker/minio.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/flight-tracker/minio.png -------------------------------------------------------------------------------- /gps/README.md: -------------------------------------------------------------------------------- 1 | # GPS Monitoring 2 | 3 | gps 4 | 5 | 6 | 7 | The [VK-162 GPS Receiver](https://www.pishop.us/product/gps-antenna-vk-162/) is a low-cost, high-sensitivity GPS receiver with an internal antenna. It provides location & speed tracking with a high degree of accuracy, and can be used in a number of applications. It's built on the Ublox G6010 / G7020 low-power consumption GPS chipset and connects to a computer via USB where it can be read programmatically. 8 | 9 | In this data source, we'll build the following dashboard with Elastic: 10 | 11 | ![Dashboard](dashboard.png) 12 | 13 | Let's get started! 14 | 15 | ## Step #1 - Collect Data 16 | 17 | We use [gpsd](https://gpsd.gitlab.io/gpsd/) to query the GPS Receiver, and then [gpspipe](https://gpsd.gitlab.io/gpsd/gpspipe.html) to talk to `gpsd` to get location readings. 18 | 19 | If this is your first time setting up `gpsd`, you can refer to the [installation instructions](https://gpsd.gitlab.io/gpsd/installation.html). 20 | 21 | For Ubuntu systems, the installation process is as follows: 22 | 23 | ```bash 24 | sudo apt install gpsd-clients gpsd 25 | sudo systemctl enable gpsd 26 | ``` 27 | 28 | Create a file called `/etc/default/gpsd` with the following contents (changing the device to your setup, if necessary): 29 | 30 | ``` 31 | DEVICES="/dev/ttyACM0" 32 | ``` 33 | 34 | Then start the `gpsd` service: 35 | 36 | ```bash 37 | sudo systemctl start gpsd 38 | ``` 39 | 40 | And try querying it: 41 | 42 | ```bash 43 | gpspipe -w 44 | ``` 45 | 46 | You should see output similar to the following: 47 | 48 | ```json 49 | {"class":"VERSION","release":"3.20","rev":"3.20","proto_major":3,"proto_minor":14} 50 | {"class":"DEVICES","devices":[{"class":"DEVICE","path":"/dev/ttyACM0","driver":"u-blox","subtype":"SW 1.00 (59842),HW 00070000","subtype1":",PROTVER 14.00,GPS;SBAS;GLO;QZSS","activated":"2021-09-02T19:13:12.267Z","flags":1,"native":1,"bps":9600,"parity":"N","stopbits":1,"cycle":1.00,"mincycle":0.25}]} 51 | {"class":"WATCH","enable":true,"json":true,"nmea":false,"raw":0,"scaled":false,"timing":false,"split24":false,"pps":false} 52 | {"class":"TPV","device":"/dev/ttyACM0","status":2,"mode":3,"time":"2021-09-02T19:13:13.000Z","leapseconds":18,"ept":0.005,"lat":41.881832,"lon":-87.623177,"altHAE":206.865,"altMSL":240.666,"alt":240.666,"track":14.3003,"magtrack":10.5762,"magvar":-3.7,"speed":0.088,"climb":0.010,"eps":0.67,"ecefx":160358.79,"ecefy":-4754122.42,"ecefz":4234974.21,"ecefvx":0.02,"ecefvy":0.05,"ecefvz":0.07,"ecefpAcc":10.77,"ecefvAcc":0.67,"velN":0.085,"velE":0.022,"velD":-0.010,"geoidSep":-33.801} 53 | ``` 54 | 55 | The main things to look for are the `"class":"TPV"` objects, which are "time-position-velocity" reports. All of the fields included are described in the document [Core Protocol Responses](https://gpsd.gitlab.io/gpsd/gpsd_json.html#_core_protocol_responses). 
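If you want to see just the position fields while you're experimenting, a small sketch like the one below can pull them out of the TPV reports. This is purely illustrative (the script name `tpv-peek.py` is made up); it only relies on the field names shown in the sample output above (`time`, `lat`, `lon`, `speed`) and reads whatever `gpspipe -w` pipes to it:

```python
#!/usr/bin/env python3
# Illustrative only: print a few fields from TPV reports.
# Usage (assumes gpsd is running): gpspipe -w | ./tpv-peek.py
import json
import sys

for line in sys.stdin:
    try:
        report = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip partial or non-JSON lines
    if report.get("class") != "TPV":
        continue  # only time-position-velocity reports carry a fix
    # Not every TPV report includes a full fix, so use .get() with defaults
    print(report.get("time"), report.get("lat"), report.get("lon"), report.get("speed"))
```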
56 | 57 | Create a new shell script called `~/bin/gps.sh` with the following contents: 58 | 59 | ```bash 60 | #!/bin/bash 61 | 62 | gpspipe -w -n 10 | grep TPV | tail -n 1 63 | ``` 64 | 65 | Try running the script: 66 | 67 | ```bash 68 | chmod a+x ~/bin/gps.sh 69 | ~/bin/gps.sh 70 | ``` 71 | 72 | You should see output on `stdout` similar to: 73 | 74 | ```json 75 | {"class":"TPV","device":"/dev/ttyACM0","status":2,"mode":3,"time":"2021-09-02T19:13:13.000Z","leapseconds":18,"ept":0.005,"lat":41.881832,"lon":-87.623177,"altHAE":206.865,"altMSL":240.666,"alt":240.666,"track":14.3003,"magtrack":10.5762,"magvar":-3.7,"speed":0.088,"climb":0.010,"eps":0.67,"ecefx":160358.79,"ecefy":-4754122.42,"ecefz":4234974.21,"ecefvx":0.02,"ecefvy":0.05,"ecefvz":0.07,"ecefpAcc":10.77,"ecefvAcc":0.67,"velN":0.085,"velE":0.022,"velD":-0.010,"geoidSep":-33.801} 76 | ``` 77 | 78 | Once you confirm the script is working, you can redirect its output to a log file: 79 | 80 | ```bash 81 | sudo touch /var/log/gps.log 82 | sudo chown ubuntu.ubuntu /var/log/gps.log 83 | ``` 84 | 85 | Create a logrotate entry so the log file doesn't grow unbounded: 86 | 87 | ```bash 88 | sudo vi /etc/logrotate.d/gps 89 | ``` 90 | 91 | Add the following logrotate content: 92 | 93 | ``` 94 | /var/log/gps.log { 95 | weekly 96 | rotate 12 97 | compress 98 | delaycompress 99 | missingok 100 | notifempty 101 | create 644 ubuntu ubuntu 102 | } 103 | ``` 104 | 105 | Add the following entry to your crontab with `crontab -e`: 106 | 107 | ``` 108 | * * * * * /home/ubuntu/bin/gps.sh >> /var/log/gps.log 2>&1 109 | ``` 110 | 111 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 112 | 113 | ```bash 114 | tail -f /var/log/gps.log 115 | ``` 116 | 117 | If you're seeing output scroll each minute then you are successfully collecting data! 118 | 119 | ## Step #2 - Archive Data 120 | 121 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3. 122 | 123 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your GPS data: 124 | 125 | ```yaml 126 | filebeat.inputs: 127 | - type: log 128 | enabled: true 129 | tags: ["gps"] 130 | paths: 131 | - /var/log/gps.log 132 | ``` 133 | 134 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 135 | 136 | Restart Filebeat: 137 | 138 | ```bash 139 | sudo systemctl restart filebeat 140 | ``` 141 | 142 | You may want to tail syslog to see if Filebeat restarts without any issues: 143 | 144 | ```bash 145 | tail -f /var/log/syslog | grep filebeat 146 | ``` 147 | 148 | At this point, we should have GPS data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the GPS data feed. 
149 | 150 | Add the following conditional to your `distributor.yml` file: 151 | 152 | ``` 153 | } else if "gps" in [tags] { 154 | pipeline { 155 | send_to => ["gps-archive"] 156 | } 157 | } 158 | ``` 159 | 160 | Create a Logstash pipeline called `gps-archive.yml` with the following contents: 161 | 162 | ``` 163 | input { 164 | pipeline { 165 | address => "gps-archive" 166 | } 167 | } 168 | filter { 169 | } 170 | output { 171 | s3 { 172 | # 173 | # Custom Settings 174 | # 175 | prefix => "gps/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 176 | temporary_directory => "${S3_TEMP_DIR}/gps-archive" 177 | access_key_id => "${S3_ACCESS_KEY}" 178 | secret_access_key => "${S3_SECRET_KEY}" 179 | endpoint => "${S3_ENDPOINT}" 180 | bucket => "${S3_BUCKET}" 181 | 182 | # 183 | # Standard Settings 184 | # 185 | validate_credentials_on_root_bucket => false 186 | codec => json_lines 187 | # Limit Data Lake file sizes to 5 GB 188 | size_file => 5000000000 189 | time_file => 60 190 | # encoding => "gzip" 191 | additional_settings => { 192 | force_path_style => true 193 | follow_redirects => false 194 | } 195 | } 196 | } 197 | ``` 198 | 199 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 200 | 201 | ```bash 202 | sudo mv gps-archive.yml /etc/logstash/conf.d/ 203 | ``` 204 | 205 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 206 | 207 | ``` 208 | - pipeline.id: "gps-archive" 209 | path.config: "/etc/logstash/conf.d/gps-archive.conf" 210 | ``` 211 | 212 | And finally, restart the Logstash service: 213 | 214 | ```bash 215 | sudo systemctl restart logstash 216 | ``` 217 | 218 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 219 | 220 | ```bash 221 | sudo tail -f /var/log/logstash/logstash-plain.log 222 | ``` 223 | 224 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 225 | 226 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 227 | 228 | ![Stack Monitoring](archive.png) 229 | 230 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 231 | 232 | ![MinIO](minio.png) 233 | 234 | If you see your data being stored, then you are successfully archiving! 235 | 236 | ## Step #3 - Index Data 237 | 238 | Once Logstash is archiving the data, we need to index it with Elastic. 239 | 240 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the majority of data we're sending in. The one exception is for `geo_point` data which we need to explicitly add a mapping for, using an index template. 241 | 242 | Jump into Kibana and create the following Index Template using Dev Tools: 243 | 244 | ``` 245 | PUT _index_template/gps 246 | { 247 | "index_patterns": ["gps-*"], 248 | "template": { 249 | "settings": {}, 250 | "mappings": { 251 | "properties": { 252 | "location": { 253 | "type": "geo_point" 254 | } 255 | } 256 | }, 257 | "aliases": {} 258 | } 259 | } 260 | ``` 261 | 262 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 
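Before looking at the pipeline itself, here's a rough Python sketch of what that filter chain amounts to. This is illustration only (the sample event is made up, and Logstash does all of this for you): the Filebeat event carries the raw `gpspipe` line in its `message` field, that JSON is parsed into top-level fields, the Beats metadata is dropped, and `lat`/`lon` are joined into a `location` string that matches the `geo_point` mapping above.

```python
import json

# A made-up Filebeat event: the raw gpspipe line rides in "message"
event = {
    "message": '{"class":"TPV","lat":41.881832,"lon":-87.623177,"speed":0.088}',
    "agent": {"type": "filebeat"},
    "host": {"name": "node"},
    "log": {"file": {"path": "/var/log/gps.log"}},
}

# json filter: parse the "message" field into top-level fields
event.update(json.loads(event["message"]))

# mutate/remove_field: drop the Beats metadata we don't want to index
for field in ("message", "agent", "host", "input", "log", "ecs", "@version"):
    event.pop(field, None)

# mutate/add_field: combine lat/lon into a "lat, lon" string for the geo_point mapping
event["location"] = f'{event["lat"]}, {event["lon"]}'

print(event)  # roughly what gets sent to the gps-* index
```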
263 | 264 | Create a new pipeline called `gps-index.yml` with the following content: 265 | 266 | ``` 267 | input { 268 | pipeline { 269 | address => "gps-index" 270 | } 271 | } 272 | filter { 273 | json { 274 | source => "message" 275 | } 276 | json { 277 | source => "message" 278 | } 279 | mutate { 280 | remove_field => ["message", "agent", "host", "input", "log", "host", "ecs", "@version"] 281 | } 282 | mutate { 283 | add_field => { "[location]" => "%{[lat]}, %{[lon]}" } 284 | } 285 | } 286 | output { 287 | elasticsearch { 288 | # 289 | # Custom Settings 290 | # 291 | id => "gps-index" 292 | index => "gps-%{+YYYY.MM.dd}" 293 | hosts => "${ES_ENDPOINT}" 294 | user => "${ES_USERNAME}" 295 | password => "${ES_PASSWORD}" 296 | } 297 | } 298 | ``` 299 | 300 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 301 | 302 | ```bash 303 | sudo mv gps-index.yml /etc/logstash/conf.d/ 304 | ``` 305 | 306 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 307 | 308 | ``` 309 | - pipeline.id: "gps-index" 310 | path.config: "/etc/logstash/conf.d/gps-index.conf" 311 | ``` 312 | 313 | And finally, restart the Logstash service: 314 | 315 | ```bash 316 | sudo systemctl restart logstash 317 | ``` 318 | 319 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 320 | 321 | ```bash 322 | sudo tail -f /var/log/logstash/logstash-plain.log 323 | ``` 324 | 325 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 326 | 327 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 328 | 329 | ![Indexing](index.png) 330 | 331 | ## Step #4 - Visualize Data 332 | 333 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 334 | 335 | Download this dashboard: 336 | 337 | ​ [gps.ndjson](gps.ndjson) 338 | 339 | Jump into Kibana: 340 | 341 | 1. Select "Stack Management" from the menu 342 | 2. Select "Saved Objects" 343 | 3. Click "Import" in the upper right 344 | 345 | ![Dashboard](dashboard.png) 346 | 347 | Congratulations! You should now be looking at data from your GPS in Elastic. 
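If you'd like to spot-check the index outside of Kibana, a quick query works too. The following is an optional sketch that assumes the same `ES_ENDPOINT`, `ES_USERNAME`, and `ES_PASSWORD` values used in the Logstash pipelines are exported in your shell:

```python
#!/usr/bin/env python3
# Optional spot check: fetch the most recent GPS document from Elasticsearch.
import base64
import json
import os
import urllib.request

endpoint = os.environ["ES_ENDPOINT"].rstrip("/")
auth = base64.b64encode(
    f'{os.environ["ES_USERNAME"]}:{os.environ["ES_PASSWORD"]}'.encode()
).decode()

# Ask for the single newest document in any gps-* index
query = {"size": 1, "sort": [{"@timestamp": "desc"}]}
req = urllib.request.Request(
    f"{endpoint}/gps-*/_search",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json", "Authorization": f"Basic {auth}"},
)

with urllib.request.urlopen(req) as resp:
    hits = json.load(resp)["hits"]["hits"]

print(hits[0]["_source"] if hits else "No documents yet")
```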
348 | -------------------------------------------------------------------------------- /gps/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/archive.png -------------------------------------------------------------------------------- /gps/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/dashboard.png -------------------------------------------------------------------------------- /gps/gps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/gps.png -------------------------------------------------------------------------------- /gps/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/index.png -------------------------------------------------------------------------------- /gps/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/gps/minio.png -------------------------------------------------------------------------------- /haproxy-filebeat-module/2-archive/haproxy-filebeat-module-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "haproxy-filebeat-module-archive" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | s3 { 10 | # 11 | # Custom Settings 12 | # 13 | prefix => "haproxy-filebeat-module/${S3_DATE_DIR}" 14 | temporary_directory => "${S3_TEMP_DIR}/haproxy-filebeat-module-archive" 15 | access_key_id => "${S3_ACCESS_KEY}" 16 | secret_access_key => "${S3_SECRET_KEY}" 17 | endpoint => "${S3_ENDPOINT}" 18 | bucket => "${S3_BUCKET}" 19 | 20 | # 21 | # Standard Settings 22 | # 23 | validate_credentials_on_root_bucket => false 24 | codec => json_lines 25 | # Limit Data Lake file sizes to 5 GB 26 | size_file => 5000000000 27 | time_file => 1 28 | # encoding => "gzip" 29 | additional_settings => { 30 | force_path_style => true 31 | follow_redirects => false 32 | } 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/2-archive/haproxy-filebeat-module-reindex.yml: -------------------------------------------------------------------------------- 1 | input { 2 | s3 { 3 | # 4 | # Custom Settings 5 | # 6 | prefix => "haproxy-filebeat-module/2021-01-04" 7 | temporary_directory => "${S3_TEMP_DIR}/haproxy-filebeat-module-reindex" 8 | access_key_id => "${S3_ACCESS_KEY}" 9 | secret_access_key => "${S3_SECRET_KEY}" 10 | endpoint => "${S3_ENDPOINT}" 11 | bucket => "${S3_BUCKET}" 12 | 13 | # 14 | # Standard Settings 15 | # 16 | watch_for_new_files => false 17 | codec => json_lines 18 | additional_settings => { 19 | force_path_style => true 20 | follow_redirects => false 21 | } 22 | } 23 | } 24 | filter { 25 | } 26 | output { 27 | pipeline { send_to => "haproxy-filebeat-module-structure" } 28 | } 29 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/2-archive/haproxy-filebeat-module-structure.yml: 
-------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "haproxy-filebeat-module-structure" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | elasticsearch { 10 | # 11 | # Custom Settings 12 | # 13 | id => "haproxy-filebeat-module-structure" 14 | index => "haproxy-filebeat-module" 15 | hosts => "${ES_ENDPOINT}" 16 | user => "${ES_USERNAME}" 17 | password => "${ES_PASSWORD}" 18 | } 19 | } 20 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/4-visualize/dashboard.json: -------------------------------------------------------------------------------- 1 | { 2 | "objects": [ 3 | { 4 | "attributes": { 5 | "description": "", 6 | "kibanaSavedObjectMeta": { 7 | "searchSourceJSON": { 8 | "filter": [], 9 | "index": "filebeat-*", 10 | "query": { 11 | "language": "kuery", 12 | "query": "" 13 | } 14 | } 15 | }, 16 | "title": "Backend breakdown [Filebeat HAProxy] ECS", 17 | "uiStateJSON": {}, 18 | "version": 1, 19 | "visState": { 20 | "aggs": [ 21 | { 22 | "enabled": true, 23 | "id": "1", 24 | "params": {}, 25 | "schema": "metric", 26 | "type": "count" 27 | }, 28 | { 29 | "enabled": true, 30 | "id": "2", 31 | "params": { 32 | "field": "haproxy.backend_name", 33 | "missingBucket": false, 34 | "missingBucketLabel": "Missing", 35 | "order": "desc", 36 | "orderBy": "1", 37 | "otherBucket": false, 38 | "otherBucketLabel": "Other", 39 | "size": 5 40 | }, 41 | "schema": "segment", 42 | "type": "terms" 43 | } 44 | ], 45 | "params": { 46 | "addLegend": true, 47 | "addTooltip": true, 48 | "isDonut": true, 49 | "labels": { 50 | "last_level": true, 51 | "show": false, 52 | "truncate": 100, 53 | "values": true 54 | }, 55 | "legendPosition": "right", 56 | "type": "pie" 57 | }, 58 | "title": "Backend breakdown [Filebeat HAProxy] ECS", 59 | "type": "pie" 60 | } 61 | }, 62 | "id": "55251360-aa32-11e8-9c06-877f0445e3e0-ecs", 63 | "type": "visualization", 64 | "updated_at": "2018-12-06T11:35:36.721Z", 65 | "version": 2 66 | }, 67 | { 68 | "attributes": { 69 | "description": "", 70 | "kibanaSavedObjectMeta": { 71 | "searchSourceJSON": { 72 | "filter": [], 73 | "index": "filebeat-*", 74 | "query": { 75 | "language": "kuery", 76 | "query": "" 77 | } 78 | } 79 | }, 80 | "title": "Frontend breakdown [Filebeat HAProxy] ECS", 81 | "uiStateJSON": {}, 82 | "version": 1, 83 | "visState": { 84 | "aggs": [ 85 | { 86 | "enabled": true, 87 | "id": "1", 88 | "params": {}, 89 | "schema": "metric", 90 | "type": "count" 91 | }, 92 | { 93 | "enabled": true, 94 | "id": "2", 95 | "params": { 96 | "field": "haproxy.frontend_name", 97 | "missingBucket": false, 98 | "missingBucketLabel": "Missing", 99 | "order": "desc", 100 | "orderBy": "1", 101 | "otherBucket": false, 102 | "otherBucketLabel": "Other", 103 | "size": 5 104 | }, 105 | "schema": "segment", 106 | "type": "terms" 107 | } 108 | ], 109 | "params": { 110 | "addLegend": true, 111 | "addTooltip": true, 112 | "isDonut": true, 113 | "labels": { 114 | "last_level": true, 115 | "show": false, 116 | "truncate": 100, 117 | "values": true 118 | }, 119 | "legendPosition": "right", 120 | "type": "pie" 121 | }, 122 | "title": "Frontend breakdown [Filebeat HAProxy] ECS", 123 | "type": "pie" 124 | } 125 | }, 126 | "id": "7fb671f0-aa32-11e8-9c06-877f0445e3e0-ecs", 127 | "type": "visualization", 128 | "updated_at": "2018-12-06T11:35:36.721Z", 129 | "version": 2 130 | }, 131 | { 132 | "attributes": { 133 | "description": "", 134 | "kibanaSavedObjectMeta": { 135 | "searchSourceJSON": 
{ 136 | "filter": [], 137 | "index": "filebeat-*", 138 | "query": { 139 | "language": "kuery", 140 | "query": "" 141 | } 142 | } 143 | }, 144 | "title": "IP Geohashes [Filebeat HAProxy] ECS", 145 | "uiStateJSON": { 146 | "mapCenter": [ 147 | 14.944784875088372, 148 | 5.09765625 149 | ] 150 | }, 151 | "version": 1, 152 | "visState": { 153 | "aggs": [ 154 | { 155 | "enabled": true, 156 | "id": "1", 157 | "params": { 158 | "field": "source.address" 159 | }, 160 | "schema": "metric", 161 | "type": "cardinality" 162 | }, 163 | { 164 | "enabled": true, 165 | "id": "2", 166 | "params": { 167 | "autoPrecision": true, 168 | "field": "source.geo.location", 169 | "isFilteredByCollar": true, 170 | "precision": 2, 171 | "useGeocentroid": true 172 | }, 173 | "schema": "segment", 174 | "type": "geohash_grid" 175 | } 176 | ], 177 | "params": { 178 | "addTooltip": true, 179 | "heatBlur": 15, 180 | "heatMaxZoom": 16, 181 | "heatMinOpacity": 0.1, 182 | "heatNormalizeData": true, 183 | "heatRadius": 25, 184 | "isDesaturated": true, 185 | "legendPosition": "bottomright", 186 | "mapCenter": [ 187 | 15, 188 | 5 189 | ], 190 | "mapType": "Scaled Circle Markers", 191 | "mapZoom": 2, 192 | "wms": { 193 | "enabled": false, 194 | "options": { 195 | "attribution": "Maps provided by USGS", 196 | "format": "image/png", 197 | "layers": "0", 198 | "styles": "", 199 | "transparent": true, 200 | "version": "1.3.0" 201 | }, 202 | "url": "https://basemap.nationalmap.gov/arcgis/services/USGSTopo/MapServer/WMSServer" 203 | } 204 | }, 205 | "title": "IP Geohashes [Filebeat HAProxy] ECS", 206 | "type": "tile_map" 207 | } 208 | }, 209 | "id": "11f8b9c0-aa32-11e8-9c06-877f0445e3e0-ecs", 210 | "type": "visualization", 211 | "updated_at": "2018-12-06T11:35:36.721Z", 212 | "version": 2 213 | }, 214 | { 215 | "attributes": { 216 | "description": "", 217 | "kibanaSavedObjectMeta": { 218 | "searchSourceJSON": { 219 | "filter": [], 220 | "index": "filebeat-*", 221 | "query": { 222 | "language": "kuery", 223 | "query": "" 224 | } 225 | } 226 | }, 227 | "title": "Response codes over time [Filebeat HAProxy] ECS", 228 | "uiStateJSON": { 229 | "vis": { 230 | "colors": { 231 | "200": "#508642", 232 | "204": "#629E51", 233 | "302": "#6ED0E0", 234 | "404": "#EAB839", 235 | "503": "#705DA0" 236 | } 237 | } 238 | }, 239 | "version": 1, 240 | "visState": { 241 | "aggs": [ 242 | { 243 | "enabled": true, 244 | "id": "1", 245 | "params": {}, 246 | "schema": "metric", 247 | "type": "count" 248 | }, 249 | { 250 | "enabled": true, 251 | "id": "2", 252 | "params": { 253 | "customInterval": "2h", 254 | "extended_bounds": {}, 255 | "field": "@timestamp", 256 | "interval": "auto", 257 | "min_doc_count": 1 258 | }, 259 | "schema": "segment", 260 | "type": "date_histogram" 261 | }, 262 | { 263 | "enabled": true, 264 | "id": "3", 265 | "params": { 266 | "field": "http.response.status_code", 267 | "missingBucket": false, 268 | "missingBucketLabel": "Missing", 269 | "order": "desc", 270 | "orderBy": "_term", 271 | "otherBucket": false, 272 | "otherBucketLabel": "Other", 273 | "size": 5 274 | }, 275 | "schema": "group", 276 | "type": "terms" 277 | } 278 | ], 279 | "params": { 280 | "addLegend": true, 281 | "addTimeMarker": false, 282 | "addTooltip": true, 283 | "categoryAxes": [ 284 | { 285 | "id": "CategoryAxis-1", 286 | "labels": { 287 | "show": true, 288 | "truncate": 100 289 | }, 290 | "position": "bottom", 291 | "scale": { 292 | "type": "linear" 293 | }, 294 | "show": true, 295 | "style": {}, 296 | "title": {}, 297 | "type": "category" 298 | } 299 | ], 300 | 
"grid": { 301 | "categoryLines": false, 302 | "style": { 303 | "color": "#eee" 304 | } 305 | }, 306 | "legendPosition": "right", 307 | "seriesParams": [ 308 | { 309 | "data": { 310 | "id": "1", 311 | "label": "Count" 312 | }, 313 | "drawLinesBetweenPoints": true, 314 | "mode": "stacked", 315 | "show": "true", 316 | "showCircles": true, 317 | "type": "histogram", 318 | "valueAxis": "ValueAxis-1" 319 | } 320 | ], 321 | "times": [], 322 | "type": "histogram", 323 | "valueAxes": [ 324 | { 325 | "id": "ValueAxis-1", 326 | "labels": { 327 | "filter": false, 328 | "rotate": 0, 329 | "show": true, 330 | "truncate": 100 331 | }, 332 | "name": "LeftAxis-1", 333 | "position": "left", 334 | "scale": { 335 | "mode": "normal", 336 | "type": "linear" 337 | }, 338 | "show": true, 339 | "style": {}, 340 | "title": { 341 | "text": "Count" 342 | }, 343 | "type": "value" 344 | } 345 | ] 346 | }, 347 | "title": "Response codes over time [Filebeat HAProxy] ECS", 348 | "type": "histogram" 349 | } 350 | }, 351 | "id": "68af8ef0-aa33-11e8-9c06-877f0445e3e0-ecs", 352 | "type": "visualization", 353 | "updated_at": "2018-12-06T11:35:36.721Z", 354 | "version": 2 355 | }, 356 | { 357 | "attributes": { 358 | "description": "Filebeat HAProxy module dashboard", 359 | "hits": 0, 360 | "kibanaSavedObjectMeta": { 361 | "searchSourceJSON": { 362 | "filter": [], 363 | "query": { 364 | "language": "kuery", 365 | "query": "" 366 | } 367 | } 368 | }, 369 | "optionsJSON": { 370 | "darkTheme": false, 371 | "hidePanelTitles": false, 372 | "useMargins": true 373 | }, 374 | "panelsJSON": [ 375 | { 376 | "embeddableConfig": {}, 377 | "gridData": { 378 | "h": 15, 379 | "i": "1", 380 | "w": 24, 381 | "x": 0, 382 | "y": 0 383 | }, 384 | "id": "55251360-aa32-11e8-9c06-877f0445e3e0-ecs", 385 | "panelIndex": "1", 386 | "type": "visualization", 387 | "version": "6.5.2" 388 | }, 389 | { 390 | "embeddableConfig": {}, 391 | "gridData": { 392 | "h": 15, 393 | "i": "2", 394 | "w": 24, 395 | "x": 24, 396 | "y": 0 397 | }, 398 | "id": "7fb671f0-aa32-11e8-9c06-877f0445e3e0-ecs", 399 | "panelIndex": "2", 400 | "type": "visualization", 401 | "version": "6.5.2" 402 | }, 403 | { 404 | "embeddableConfig": {}, 405 | "gridData": { 406 | "h": 15, 407 | "i": "3", 408 | "w": 24, 409 | "x": 0, 410 | "y": 15 411 | }, 412 | "id": "11f8b9c0-aa32-11e8-9c06-877f0445e3e0-ecs", 413 | "panelIndex": "3", 414 | "type": "visualization", 415 | "version": "6.5.2" 416 | }, 417 | { 418 | "embeddableConfig": {}, 419 | "gridData": { 420 | "h": 15, 421 | "i": "4", 422 | "w": 24, 423 | "x": 24, 424 | "y": 15 425 | }, 426 | "id": "68af8ef0-aa33-11e8-9c06-877f0445e3e0-ecs", 427 | "panelIndex": "4", 428 | "type": "visualization", 429 | "version": "6.5.2" 430 | } 431 | ], 432 | "timeRestore": false, 433 | "title": "[Filebeat HAProxy] Overview ECS", 434 | "version": 1 435 | }, 436 | "id": "3560d580-aa34-11e8-9c06-877f0445e3e0-ecs", 437 | "type": "dashboard", 438 | "updated_at": "2018-12-06T11:40:40.204Z", 439 | "version": 6 440 | } 441 | ], 442 | "version": "6.5.2" 443 | } 444 | -------------------------------------------------------------------------------- /haproxy-filebeat-module/README.md: -------------------------------------------------------------------------------- 1 | We'll use the Filebeat HAProxy module to grab the HAProxy log file. 2 | 3 | You can grab it without the module, but only one method works at a 4 | time for Filebeat to read the file (you can't enable both). 
5 | 6 | We'll use the Filebeat HAProxy module since it cleanly persists the 7 | HAProxy log file messages while also providing the appropriate metadata 8 | for the other module artifacts: Ingest Pipeline, Kibana Dashboard, etc. 9 | 10 | ``` 11 | $ filebeat module enable haproxy 12 | $ cat /etc/filebeat/modules.d/haproxy.yml 13 | - module: haproxy 14 | log: 15 | enabled: true 16 | var.input: file 17 | ``` 18 | We should still be able to use the data collected by the module with 19 | the "raw" HAProxy data source adapter [here](/data-sources/haproxy). 20 | 21 | Since Beats Modules come with the ability to load the out-of-the-box 22 | assets using the beat, you can leverage that method as described below. 23 | 24 | Load Index Template 25 | 26 | [https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-template.html](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-template.html) 27 | 28 | Load Kibana Dashboards 29 | 30 | [https://www.elastic.co/guide/en/beats/filebeat/current/load-kibana-dashboards.html](https://www.elastic.co/guide/en/beats/filebeat/current/load-kibana-dashboards.html) 31 | -------------------------------------------------------------------------------- /images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/architecture.png -------------------------------------------------------------------------------- /images/caiv.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/caiv.png -------------------------------------------------------------------------------- /images/data-source-assets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/data-source-assets.png -------------------------------------------------------------------------------- /images/elk-data-lake.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/elk-data-lake.png -------------------------------------------------------------------------------- /images/indexing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/indexing.png -------------------------------------------------------------------------------- /images/logical-elements.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/logical-elements.png -------------------------------------------------------------------------------- /images/onboarding-data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/onboarding-data.png -------------------------------------------------------------------------------- /images/terminology.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/terminology.png -------------------------------------------------------------------------------- /images/workflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/images/workflow.png -------------------------------------------------------------------------------- /power-emu2/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Power with EMU-2 2 | 3 | EMU-2 4 | 5 | The [EMU-2](https://www.rainforestautomation.com/rfa-z105-2-emu-2-2/) by Rainforest Automation displays your smart meter's data in real time. We'll connect to it via USB and use a Python script to receive its messages. The device should output the current demand (kW), current meter reading, and even the current price per kWh. 6 | 7 | Our goal is to build the following dashboard: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started. 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new Python script called `~/bin/power-emu2.py` with the following contents: 16 | 17 | [power-emu2.py](power-emu2.py) 18 | 19 | You might need to adjust the USB port in the script to match your setup. Look for `/dev/ttyACM1` in the script. 20 | 21 | Decoding the messages from the EMU-2 can be tricky. There are technical documents to aid the process if you want to dig deeper than the provided Python script: 22 | 23 | * [Emu-2-Tech-Guide-1.05.pdf](https://github.com/rainforestautomation/Emu-Serial-API/blob/master/Emu-2-Tech-Guide-1.05.pdf) 24 | * [RAVEn. XML API Manual.pdf](https://rainforestautomation.com/wp-content/uploads/2014/02/raven_xml_api_r127.pdf) 25 | 26 | Try running the script from the command line: 27 | 28 | ```bash 29 | chmod a+x ~/bin/power-emu2.py 30 | sudo ~/bin/power-emu2.py 31 | ``` 32 | 33 | The output will include JSON-formatted messages from your smart meter: 34 | 35 | ```json 36 | {"message": "Starting", "timestamp": "2021-09-06T07:55:42Z", "status": "connected"} 37 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:55:23Z", "demand_kW": 0.558} 38 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:55:53Z", "demand_kW": 0.585} 39 | {"message": "InstantaneousDemand", "timestamp": "2021-09-06T07:56:08Z", "demand_kW": 0.63} 40 | {"message": "CurrentSummationDelivered", "timestamp": "2021-09-06T07:56:11Z", "summation_delivered": 73438571, "summation_received": 0, "meter_kWh": 73438.571} 41 | {"message": "PriceCluster", "timestamp": "2021-09-06T07:56:51Z", "price_cents_kWh": 5.399, "currency": 840, "tier": 0, "start_time": "2021-09-06T07:50:00Z", "duration": 1440} 42 | ``` 43 | 44 | Hit `^c` to quit the script. 
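To make those numbers concrete, here's a small sketch (separate from the collector, with a made-up name of `power-cost.py`) that reads these JSON lines and estimates your current cost rate. It only assumes the `demand_kW` and `price_cents_kWh` fields shown in the sample output above; for example, 0.63 kW at 5.399 ¢/kWh works out to roughly 3.4 ¢ per hour.

```python
#!/usr/bin/env python3
# Illustrative only: estimate the current cost rate from the EMU-2 log.
# Usage: ./power-cost.py < /var/log/power-emu2.log
import json
import sys

demand_kw = None
price_cents_kwh = None

for line in sys.stdin:
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip partial lines
    if msg.get("message") == "InstantaneousDemand":
        demand_kw = msg.get("demand_kW")          # latest demand reading
    elif msg.get("message") == "PriceCluster":
        price_cents_kwh = msg.get("price_cents_kWh")  # latest price reading

if demand_kw is not None and price_cents_kwh is not None:
    # kW * cents/kWh = cents per hour at the current draw
    print(f"{demand_kw} kW at {price_cents_kwh} c/kWh ~= {demand_kw * price_cents_kwh:.1f} cents/hour")
else:
    print("Need at least one InstantaneousDemand and one PriceCluster message")
```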
45 | 46 | Once you're able to successfully query the EMU-2, create a log file for its output: 47 | 48 | ```bash 49 | sudo touch /var/log/power-emu2.log 50 | sudo chown ubuntu.ubuntu /var/log/power-emu2.log 51 | ``` 52 | 53 | Create a logrotate entry so the log file doesn't grow unbounded: 54 | 55 | ``` 56 | sudo vi /etc/logrotate.d/power-emu2 57 | ``` 58 | 59 | Add the following content: 60 | 61 | ``` 62 | /var/log/power-emu2.log { 63 | weekly 64 | rotate 12 65 | compress 66 | delaycompress 67 | missingok 68 | notifempty 69 | create 644 ubuntu ubuntu 70 | } 71 | ``` 72 | 73 | Create a new bash script `~/bin/power-emu2.sh` with the following: 74 | 75 | ```bash 76 | #!/bin/bash 77 | 78 | if pgrep -f "sudo /home/ubuntu/bin/power-emu2.py" > /dev/null 79 | then 80 | echo "Already running." 81 | else 82 | echo "Not running. Restarting..." 83 | sudo /home/ubuntu/bin/power-emu2.py >> /var/log/power-emu2.log 2>&1 84 | fi 85 | ``` 86 | 87 | Add the following entry to your crontab: 88 | 89 | ``` 90 | * * * * * /home/ubuntu/bin/power-emu2.sh >> /tmp/power-emu2.log 2>&1 91 | ``` 92 | 93 | Verify output by tailing the log file for a few minutes: 94 | 95 | ``` 96 | tail -f /var/log/power-emu2.log 97 | ``` 98 | 99 | If you're seeing output scroll each minute then you are successfully collecting data! 100 | 101 | ## Step #2 - Archive Data 102 | 103 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3. 104 | 105 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your EMU-2 data: 106 | 107 | ```yaml 108 | filebeat.inputs: 109 | - type: log 110 | enabled: true 111 | tags: ["power-emu2"] 112 | paths: 113 | - /var/log/power-emu2.log 114 | ``` 115 | 116 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 117 | 118 | Restart Filebeat: 119 | 120 | ```bash 121 | sudo systemctl restart filebeat 122 | ``` 123 | 124 | You may want to tail syslog to see if Filebeat restarts without any issues: 125 | 126 | ```bash 127 | tail -f /var/log/syslog | grep filebeat 128 | ``` 129 | 130 | At this point, we should have EMU-2 data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the EMU-2 data feed. 
131 | 132 | Add the following conditional to your `distributor.yml` file: 133 | 134 | ``` 135 | } else if "power-emu2" in [tags] { 136 | pipeline { 137 | send_to => ["power-emu2-archive"] 138 | } 139 | } 140 | ``` 141 | 142 | Create a Logstash pipeline called `power-emu2-archive.yml` with the following contents: 143 | 144 | ``` 145 | input { 146 | pipeline { 147 | address => "power-emu2-archive" 148 | } 149 | } 150 | filter { 151 | } 152 | output { 153 | s3 { 154 | # 155 | # Custom Settings 156 | # 157 | prefix => "power-emu2/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 158 | temporary_directory => "${S3_TEMP_DIR}/power-emu2-archive" 159 | access_key_id => "${S3_ACCESS_KEY}" 160 | secret_access_key => "${S3_SECRET_KEY}" 161 | endpoint => "${S3_ENDPOINT}" 162 | bucket => "${S3_BUCKET}" 163 | 164 | # 165 | # Standard Settings 166 | # 167 | validate_credentials_on_root_bucket => false 168 | codec => json_lines 169 | # Limit Data Lake file sizes to 5 GB 170 | size_file => 5000000000 171 | time_file => 60 172 | # encoding => "gzip" 173 | additional_settings => { 174 | force_path_style => true 175 | follow_redirects => false 176 | } 177 | } 178 | } 179 | ``` 180 | 181 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 182 | 183 | ```bash 184 | sudo mv power-emu2-archive.yml /etc/logstash/conf.d/ 185 | ``` 186 | 187 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 188 | 189 | ``` 190 | - pipeline.id: "power-emu2-archive" 191 | path.config: "/etc/logstash/conf.d/power-emu2-archive.conf" 192 | ``` 193 | 194 | And finally, restart the Logstash service: 195 | 196 | ```bash 197 | sudo systemctl restart logstash 198 | ``` 199 | 200 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 201 | 202 | ```bash 203 | sudo tail -f /var/log/logstash/logstash-plain.log 204 | ``` 205 | 206 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 207 | 208 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 209 | 210 | ![Stack Monitoring](archive.png) 211 | 212 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 213 | 214 | ![MinIO](minio.png) 215 | 216 | If you see your data being stored, then you are successfully archiving! 217 | 218 | ## Step #3 - Index Data 219 | 220 | Once Logstash is archiving the data, next we need to index it with Elastic. 221 | 222 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 223 | 224 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 
225 | 226 | Create a new pipeline called `power-emu2-index.yml` with the following content: 227 | 228 | ``` 229 | input { 230 | pipeline { 231 | address => "power-emu2-index" 232 | } 233 | } 234 | filter { 235 | json { 236 | source => "message" 237 | skip_on_invalid_json => true 238 | } 239 | json { 240 | source => "message" 241 | skip_on_invalid_json => true 242 | } 243 | date { 244 | match => ["timestamp", "ISO8601"] 245 | } 246 | mutate { 247 | remove_field => ["timestamp"] 248 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"] 249 | } 250 | } 251 | output { 252 | elasticsearch { 253 | # 254 | # Custom Settings 255 | # 256 | id => "power-emu2-index" 257 | index => "power-emu2-%{+YYYY.MM.dd}" 258 | hosts => "${ES_ENDPOINT}" 259 | user => "${ES_USERNAME}" 260 | password => "${ES_PASSWORD}" 261 | } 262 | } 263 | ``` 264 | 265 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 266 | 267 | ```bash 268 | sudo mv power-emu2-index.yml /etc/logstash/conf.d/ 269 | ``` 270 | 271 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 272 | 273 | ``` 274 | - pipeline.id: "power-emu2-index" 275 | path.config: "/etc/logstash/conf.d/power-emu2-index.conf" 276 | ``` 277 | 278 | Append your new pipeline to your tagged data in the `distributor.yml` pipeline: 279 | 280 | ``` 281 | } else if "power-emu2" in [tags] { 282 | pipeline { 283 | send_to => ["power-emu2-archive", "power-emu2-index"] 284 | } 285 | } 286 | ``` 287 | 288 | And finally, restart the Logstash service: 289 | 290 | ```bash 291 | sudo systemctl restart logstash 292 | ``` 293 | 294 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 295 | 296 | ```bash 297 | sudo tail -f /var/log/logstash/logstash-plain.log 298 | ``` 299 | 300 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 301 | 302 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 303 | 304 | ![Indexing](index.png) 305 | 306 | ## Step #4 - Visualize Data 307 | 308 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 309 | 310 | Download this dashboard: [power-emu2.ndjson](power-emu2.ndjson) 311 | 312 | Jump back into Kibana: 313 | 314 | 1. Select "Stack Management" from the menu 315 | 2. Select "Saved Objects" 316 | 3. Click "Import" in the upper right 317 | 318 | Once it's been imported, click on "Power EMU-2". 319 | 320 | ![Dashboard](dashboard.png) 321 | 322 | Congratulations! You should now be looking at power data from your EMU-2 in Elastic. 
-------------------------------------------------------------------------------- /power-emu2/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/archive.png -------------------------------------------------------------------------------- /power-emu2/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/dashboard.png -------------------------------------------------------------------------------- /power-emu2/emu-2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/emu-2.jpg -------------------------------------------------------------------------------- /power-emu2/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/index.png -------------------------------------------------------------------------------- /power-emu2/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-emu2/minio.png -------------------------------------------------------------------------------- /power-emu2/power-emu2.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"power-emu2-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-04T20:09:58.325Z","version":"WzMxMzI2NCwyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":12,\"h\":8,\"i\":\"2492dbbd-f32f-4d29-946d-0b6971538d0e\"},\"panelIndex\":\"2492dbbd-f32f-4d29-946d-0b6971538d0e\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Power 
EMU-2\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":12,\"y\":0,\"w\":36,\"h\":8,\"i\":\"58d3b589-f688-4815-b206-94f8e5bcf246\"},\"panelIndex\":\"58d3b589-f688-4815-b206-94f8e5bcf246\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"9b5ecb03-f611-4038-bdd0-1e033b9b64b3\":{\"columns\":{\"9fa24993-a762-4277-aa6d-e471e99152e6\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"b27753d2-962f-407c-a41d-d0af4ff04199\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"9fa24993-a762-4277-aa6d-e471e99152e6\",\"b27753d2-962f-407c-a41d-d0af4ff04199\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar\",\"layers\":[{\"layerId\":\"9b5ecb03-f611-4038-bdd0-1e033b9b64b3\",\"accessors\":[\"b27753d2-962f-407c-a41d-d0af4ff04199\"],\"position\":\"top\",\"seriesType\":\"bar\",\"showGridlines\":false,\"xAccessor\":\"9fa24993-a762-4277-aa6d-e471e99152e6\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-9b5ecb03-f611-4038-bdd0-1e033b9b64b3\"}]},\"enhancements\":{},\"hidePanelTitles\":false},\"title\":\"Logs\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":8,\"w\":48,\"h\":11,\"i\":\"b1c3297d-5e54-4a20-8181-5cb483faf23d\"},\"panelIndex\":\"b1c3297d-5e54-4a20-8181-5cb483faf23d\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\":{\"columns\":{\"623795c1-643f-4db6-9378-059abb5dc58b\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\":{\"label\":\"Median of 
demand_kW\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"demand_kW\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"623795c1-643f-4db6-9378-059abb5dc58b\",\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\",\"accessors\":[\"b49999ce-ffb9-4188-9356-e363e7ae2ca8\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"623795c1-643f-4db6-9378-059abb5dc58b\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Demand (kW)\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":19,\"w\":48,\"h\":9,\"i\":\"0161139a-4d80-41ab-aed3-7d3042269d5a\"},\"panelIndex\":\"0161139a-4d80-41ab-aed3-7d3042269d5a\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"25083df7-ad95-4b9c-882c-5d82a00a15ed\":{\"columns\":{\"45c88792-c3c8-40f6-a1a1-3643062294f8\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"ac569630-a85e-46fc-a682-c30e1b1bbef5\":{\"label\":\"Differences of Sum of meter_kWh\",\"dataType\":\"number\",\"operationType\":\"differences\",\"isBucketed\":false,\"scale\":\"ratio\",\"references\":[\"ff830719-e828-470a-8bdd-07e23f048769\"]},\"ff830719-e828-470a-8bdd-07e23f048769\":{\"label\":\"Sum of 
meter_kWh\",\"dataType\":\"number\",\"operationType\":\"sum\",\"sourceField\":\"meter_kWh\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"45c88792-c3c8-40f6-a1a1-3643062294f8\",\"ac569630-a85e-46fc-a682-c30e1b1bbef5\",\"ff830719-e828-470a-8bdd-07e23f048769\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\",\"showSingleSeries\":false},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"25083df7-ad95-4b9c-882c-5d82a00a15ed\",\"accessors\":[\"ac569630-a85e-46fc-a682-c30e1b1bbef5\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"45c88792-c3c8-40f6-a1a1-3643062294f8\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-25083df7-ad95-4b9c-882c-5d82a00a15ed\"}]},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":28,\"w\":48,\"h\":10,\"i\":\"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f\"},\"panelIndex\":\"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"d4322c84-ac6d-4819-aa31-75a87a04bb0f\":{\"columns\":{\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"021d0fc2-f73f-463f-a7c3-5d030e404f68\":{\"label\":\"Differences of Maximum of summation_delivered\",\"dataType\":\"number\",\"operationType\":\"differences\",\"isBucketed\":false,\"scale\":\"ratio\",\"references\":[\"aa4fb545-bc30-40a7-b6cc-942b35936030\"]},\"aa4fb545-bc30-40a7-b6cc-942b35936030\":{\"label\":\"Maximum of 
summation_delivered\",\"dataType\":\"number\",\"operationType\":\"max\",\"sourceField\":\"summation_delivered\",\"isBucketed\":false,\"scale\":\"ratio\"}},\"columnOrder\":[\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\",\"021d0fc2-f73f-463f-a7c3-5d030e404f68\",\"aa4fb545-bc30-40a7-b6cc-942b35936030\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"d4322c84-ac6d-4819-aa31-75a87a04bb0f\",\"accessors\":[\"021d0fc2-f73f-463f-a7c3-5d030e404f68\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"6e0e774c-eb62-4ae1-abbd-5680c42f85f9\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"140f1bf0-0dbc-11ec-b013-53a9df7625dd\",\"name\":\"indexpattern-datasource-layer-d4322c84-ac6d-4819-aa31-75a87a04bb0f\"}]},\"enhancements\":{}}}]","timeRestore":false,"title":"Power EMU-2","version":1},"coreMigrationVersion":"7.14.0","id":"b58dc570-0dbd-11ec-b013-53a9df7625dd","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"58d3b589-f688-4815-b206-94f8e5bcf246:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"58d3b589-f688-4815-b206-94f8e5bcf246:indexpattern-datasource-layer-9b5ecb03-f611-4038-bdd0-1e033b9b64b3","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"b1c3297d-5e54-4a20-8181-5cb483faf23d:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"b1c3297d-5e54-4a20-8181-5cb483faf23d:indexpattern-datasource-layer-6ba7a7ef-3bea-46d9-b61a-bc2a3c673c1d","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"0161139a-4d80-41ab-aed3-7d3042269d5a:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"0161139a-4d80-41ab-aed3-7d3042269d5a:indexpattern-datasource-layer-25083df7-ad95-4b9c-882c-5d82a00a15ed","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"140f1bf0-0dbc-11ec-b013-53a9df7625dd","name":"1c3d281f-94f6-45e8-be45-5d0f24f7ee2f:indexpattern-datasource-layer-d4322c84-ac6d-4819-aa31-75a87a04bb0f","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-06T07:40:24.254Z","version":"WzM1MDE5OSwyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /power-emu2/power-emu2.py: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env python3 2 | 3 | import datetime 4 | import json 5 | import os 6 | import platform 7 | import serial 8 | import sys 9 | import time 10 | import xml.etree.ElementTree as et 11 | 12 | data = {} 13 | data['message'] = "Starting" 14 | d = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ") 15 | data['timestamp'] = d 16 | 17 | Y2K = 946684800 18 | 19 | try: 20 | dev = '/dev/ttyACM1' 21 | emu2 = serial.Serial(dev, 115200, timeout=1) 22 | data['status'] = "connected" 23 | except: 24 | data['status'] = "could not connect" 25 | print(json.dumps(data), flush=True) 26 | exit() 27 | 28 | print(json.dumps(data), flush=True) 29 | 30 | while True: 31 | try: 32 | msg = emu2.readlines() 33 | except: 34 | data = {} 35 | data['message'] = "error" 36 | d = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ") 37 | data['timestamp'] = d 38 | print(json.dumps(data)) 39 | exit() 40 | 41 | if msg == [] or msg[0].decode()[0] != '<': 42 | continue 43 | 44 | msg = ''.join([line.decode() for line in msg]) 45 | 46 | try: 47 | tree = et.fromstring(msg) 48 | #print(msg) 49 | except: 50 | continue 51 | 52 | data = {} 53 | data['message'] = tree.tag 54 | 55 | if tree.tag == 'InstantaneousDemand': 56 | # Received every 15 seconds 57 | ts = int(tree.find('TimeStamp').text, 16) 58 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 59 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 60 | data['timestamp'] = d 61 | power = int(tree.find('Demand').text, 16) 62 | power *= int(tree.find('Multiplier').text, 16) 63 | power /= int(tree.find('Divisor').text, 16) 64 | power = round(power, int(tree.find('DigitsRight').text, 16)) 65 | data['demand_kW'] = power 66 | elif tree.tag == 'PriceCluster': 67 | # Received every 1-2 minutes 68 | ts = int(tree.find('TimeStamp').text, 16) 69 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 70 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 71 | data['timestamp'] = d 72 | #data['price'] = int(tree.find('Price').text, 16) 73 | #data['trailing'] = int(tree.find('TrailingDigits').text, 16) 74 | data['price_cents_kWh'] = int(tree.find('Price').text, 16) 75 | data['price_cents_kWh'] /= 1000 76 | data['currency'] = int(tree.find('Currency').text, 16) 77 | # Currency uses ISO 4217 codes 78 | # US Dollar is code 840 79 | data['tier'] = int(tree.find('Tier').text, 16) 80 | st = int(tree.find('StartTime').text, 16) 81 | st = st + Y2K # st + Y2K = Unix Epoch Time 82 | d = datetime.datetime.fromtimestamp(st).strftime("%Y-%m-%dT%H:%M:%SZ") 83 | data['start_time'] = d 84 | data['duration'] = int(tree.find('Duration').text, 16) 85 | elif tree.tag == 'CurrentSummationDelivered': 86 | # Received every 3-5 minutes 87 | ts = int(tree.find('TimeStamp').text, 16) 88 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 89 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 90 | data['timestamp'] = d 91 | data['summation_delivered'] = int(tree.find('SummationDelivered').text, 16) 92 | data['summation_received'] = int(tree.find('SummationReceived').text, 16) 93 | energy = int(tree.find('SummationDelivered').text, 16) 94 | energy -= int(tree.find('SummationReceived').text, 16) 95 | energy *= int(tree.find('Multiplier').text, 16) 96 | energy /= int(tree.find('Divisor').text, 16) 97 | energy = round(energy, int(tree.find('DigitsRight').text, 16)) 98 | data['meter_kWh'] = energy 99 | elif tree.tag == 'TimeCluster': 100 | # Received every 15 minutes 101 | ts = int(tree.find('UTCTime').text, 16) 102 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 103 | d = 
datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%SZ") 104 | data['timestamp'] = d 105 | ts = int(tree.find('LocalTime').text, 16) 106 | t = ts + Y2K # ts + Y2K = Unix Epoch Time 107 | d = datetime.datetime.fromtimestamp(t).strftime("%Y-%m-%dT%H:%M:%S") 108 | data['local_time'] = d 109 | else: 110 | for child in tree: 111 | if child: 112 | value = int(child.text, 16) if child.text[:2] == '0x' else child.text 113 | data['unknown'] = value 114 | 115 | print(json.dumps(data), flush=True) 116 | -------------------------------------------------------------------------------- /power-hs300/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Power with HS300 2 | 3 | HS300 4 | 5 | The [Kasa Smart Wi-Fi Power Strip (HS300)](https://www.kasasmart.com/us/products/smart-plugs/kasa-smart-wi-fi-power-strip-hs300) is a consumer-grade power strip that allows you to independently control and monitor 6 smart outlets (and charge 3 devices with built-in USB ports). The power strip can be controlled via the Kasa Smart [iPhone](https://apps.apple.com/us/app/kasa-smart/id1034035493) app or [Android](https://play.google.com/store/apps/details?id=com.tplink.kasa_android&hl=en_US&gl=US) app. Furthermore, you can query it via API to get the electrical properties of each outlet. For example: 6 | 7 | * Voltage 8 | * Current 9 | * Watts 10 | * Watts per hour 11 | 12 | We'll use a Python script to query it each minute via a cron job, and redirect the output to a log file. From there, Filebeat will pick it up and send it into Elastic. Many data center grade PSUs also provide ways to query individual outlet metrics. A similar script could be written to extract this information in a commercial setting. 13 | 14 | In general, this exercise is meant to bring transparency to the cost of electricity to run a set of machines. If we know how much power a machine is consuming, we can calculate its electricity cost based on utility rates. 15 | 16 | ![Dashboard](dashboard.png) 17 | 18 | Let's get started. 19 | 20 | ## Step #1 - Collect Data 21 | 22 | Install the following Python module that knows how to query the power strip: 23 | 24 | ```bash 25 | pip3 install pyhs100 26 | ``` 27 | 28 | Find the IP address of the power strip: 29 | 30 | ```bash 31 | pyhs100 discover | grep IP 32 | ``` 33 | 34 | This should return an IP address for each HS300 on your network: 35 | 36 | ``` 37 | Host/IP: 192.168.1.5 38 | Host/IP: 192.168.1.6 39 | ``` 40 | 41 | Try querying the power strip: 42 | 43 | ```bash 44 | /home/ubuntu/.local/bin/pyhs100 --ip 192.168.1.5 emeter 45 | ``` 46 | 47 | You should see output similar to: 48 | 49 | ``` 50 | {0: {'voltage_mv': 112807, 'current_ma': 239, 'power_mw': 24620, 'total_wh': 12}, 1: {'voltag_mv': 112608, 'current_ma': 243, 'power_mw': 23948, 'total_wh': 12}, 2: {'voltage_mv': 112608, 'current_ma': 238, 'power_mw': 23453, 'total_wh': 11}, 3: {'voltage_mv': 112509, 'current_ma': 70, 'power_mw': 5399, 'total_wh': 4}, 4: {'voltage_mv': 112409, 'current_ma': 93, 'power_mw': 3130, 'total_wh': 1}, 5: {'voltage_mv': 109030, 'current_ma': 78, 'power_mw': 5787, 'total_wh': 2}} 51 | ``` 52 | 53 | This is not properly formatted JSON, but the script included with this data source will help clean it up. 
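If you'd like to see how that cleanup can work, here is a minimal sketch (not the script shipped with this data source, which uses simple string replacements) that parses the dict-style `pyhs100` output with Python's `ast` module; the `raw` string below is abbreviated from the sample output above:

```python
#!/usr/bin/env python3
# Minimal sketch: turn the dict-style output of `pyhs100 ... emeter`
# into valid JSON. The input is abbreviated; the real output has 6 outlets.

import ast
import json

raw = "{0: {'voltage_mv': 112807, 'current_ma': 239, 'power_mw': 24620, 'total_wh': 12}}"

# ast.literal_eval safely parses Python literals (single quotes, integer keys)
outlets = ast.literal_eval(raw)

# Convert millivolts / milliamps / milliwatts into volts / amps / watts
readings = [
    {
        "outlet": i,
        "volts": v["voltage_mv"] / 1000,
        "amps": v["current_ma"] / 1000,
        "watts": v["power_mw"] / 1000,
    }
    for i, v in outlets.items()
]

print(json.dumps(readings, indent=2))
```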
54 | 55 | After you've verified that you can query the power strip, download the following script and open it in your favorite editor: 56 | 57 | [power-hs300.py](power-hs300.py) 58 | 59 | Modify the script with the following: 60 | 61 | * Change the IP addresses to match that of your power strip(s) 62 | * Change the directory location of the `pyhs100` command 63 | * Change the names of each outlet in the `hosts` dictionary 64 | * Change the `label` argument in the `query_power_strip()` function calls 65 | 66 | Try running the script from the command line: 67 | 68 | ```bash 69 | chmod a+x ~/bin/power-hs300.py 70 | ~/bin/power-hs300.py 71 | ``` 72 | 73 | The output will include a JSON-formatted summary of each power outlet's metrics. 74 | 75 | ```json 76 | {"@timestamp": "2021-02-08T14:32:11.611868", "outlets": [{"ip": "192.168.1.5", "outlet": 0, "name": "node-1", "volts": 112.393, "amps": 0.254, "watts": 25.425, "label": "office"}, ...]} 77 | ``` 78 | 79 | When pretty-printed, it will look like this: 80 | 81 | ```json 82 | { 83 | "@timestamp": "2021-02-08T14:32:11.611868", 84 | "outlets": [ 85 | { 86 | "ip": "192.168.1.5", 87 | "label": "office", 88 | "outlet": 0, 89 | "name": "node-1", 90 | "volts": 112.393, 91 | "amps": 0.254, 92 | "watts": 25.425 93 | }, 94 | ... 95 | ] 96 | } 97 | ``` 98 | 99 | Once you're able to successfully query the power strip, create a log file for its output: 100 | 101 | ```bash 102 | sudo touch /var/log/power-hs300.log 103 | sudo chown ubuntu.ubuntu /var/log/power-hs300.log 104 | ``` 105 | 106 | Create a logrotate entry so the log file doesn't grow unbounded: 107 | 108 | ``` 109 | sudo vi /etc/logrotate.d/power-hs300 110 | ``` 111 | 112 | Add the following content: 113 | 114 | ``` 115 | /var/log/power-hs300.log { 116 | weekly 117 | rotate 12 118 | compress 119 | delaycompress 120 | missingok 121 | notifempty 122 | create 644 ubuntu ubuntu 123 | } 124 | ``` 125 | 126 | Add the following entry to your crontab: 127 | 128 | ``` 129 | * * * * * /home/ubuntu/bin/power-hs300.py >> /var/log/power-hs300.log 2>&1 130 | ``` 131 | 132 | Verify output by tailing the log file for a few minutes: 133 | 134 | ``` 135 | $ tail -f /var/log/power-hs300.log 136 | ``` 137 | 138 | If you're seeing output scroll each minute then you are successfully collecting data! 139 | 140 | ## Step #2 - Archive Data 141 | 142 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 143 | 144 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your HS300 data: 145 | 146 | ```yaml 147 | filebeat.inputs: 148 | - type: log 149 | enabled: true 150 | tags: ["power-hs300"] 151 | paths: 152 | - /var/log/power-hs300.log 153 | ``` 154 | 155 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 156 | 157 | Restart Filebeat: 158 | 159 | ```bash 160 | sudo systemctl restart filebeat 161 | ``` 162 | 163 | You may want to tail syslog to see if Filebeat restarts without any issues: 164 | 165 | ```bash 166 | tail -f /var/log/syslog | grep filebeat 167 | ``` 168 | 169 | At this point, we should have HS300 data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the HS300 data feed. 
170 | 171 | Add the following conditional to your `distributor.yml` file: 172 | 173 | ``` 174 | } else if "power-hs300" in [tags] { 175 | pipeline { 176 | send_to => ["power-hs300-archive"] 177 | } 178 | } 179 | ``` 180 | 181 | Create a Logstash pipeline called `power-hs300-archive.yml` with the following contents: 182 | 183 | ``` 184 | input { 185 | pipeline { 186 | address => "power-hs300-archive" 187 | } 188 | } 189 | filter { 190 | } 191 | output { 192 | s3 { 193 | # 194 | # Custom Settings 195 | # 196 | prefix => "power-hs300/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 197 | temporary_directory => "${S3_TEMP_DIR}/power-hs300-archive" 198 | access_key_id => "${S3_ACCESS_KEY}" 199 | secret_access_key => "${S3_SECRET_KEY}" 200 | endpoint => "${S3_ENDPOINT}" 201 | bucket => "${S3_BUCKET}" 202 | 203 | # 204 | # Standard Settings 205 | # 206 | validate_credentials_on_root_bucket => false 207 | codec => json_lines 208 | # Limit Data Lake file sizes to 5 GB 209 | size_file => 5000000000 210 | time_file => 60 211 | # encoding => "gzip" 212 | additional_settings => { 213 | force_path_style => true 214 | follow_redirects => false 215 | } 216 | } 217 | } 218 | ``` 219 | 220 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 221 | 222 | ```bash 223 | sudo mv power-hs300-archive.yml /etc/logstash/conf.d/ 224 | ``` 225 | 226 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 227 | 228 | ``` 229 | - pipeline.id: "power-hs300-archive" 230 | path.config: "/etc/logstash/conf.d/power-hs300-archive.yml" 231 | ``` 232 | 233 | And finally, restart the Logstash service: 234 | 235 | ```bash 236 | sudo systemctl restart logstash 237 | ``` 238 | 239 | While Logstash is restarting, you can tail its log file to see if there are any configuration errors: 240 | 241 | ```bash 242 | sudo tail -f /var/log/logstash/logstash-plain.log 243 | ``` 244 | 245 | After a few seconds, you should see Logstash shut down and start with the new pipeline and no errors being emitted. 246 | 247 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 248 | 249 | ![Stack Monitoring](archive.png) 250 | 251 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 252 | 253 | ![MinIO](minio.png) 254 | 255 | If you see your data being stored, then you are successfully archiving! 256 | 257 | ## Step #3 - Index Data 258 | 259 | Once Logstash is archiving the data, next we need to index it with Elastic. 260 | 261 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 262 | 263 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in.
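To make the filter chain easier to follow, here is a rough Python sketch of the transformation it performs; the document shape is taken from the sample log line in Step #1, and the field handling mirrors the `split` and `ruby` filters in the pipeline that follows:

```python
#!/usr/bin/env python3
# Rough illustration only: show how one logged event becomes one flat
# document per outlet, with the timestamp marked as UTC ("Z" suffix).

import json

logged_event = {
    "@timestamp": "2021-02-08T14:32:11.611868",
    "outlets": [
        {"ip": "192.168.1.5", "outlet": 0, "name": "node-1",
         "volts": 112.393, "amps": 0.254, "watts": 25.425, "label": "office"},
    ],
}

docs = []
for outlet in logged_event["outlets"]:      # what the `split` filter does
    doc = dict(outlet)                      # outlet fields become top-level fields
    doc["@timestamp"] = logged_event["@timestamp"] + "Z"   # UTC fix from the `ruby` filter
    docs.append(doc)

print(json.dumps(docs, indent=2))
```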
264 | 265 | Create a new pipeline called `power-hs300-index.yml` with the following content: 266 | 267 | ``` 268 | input { 269 | pipeline { 270 | address => "power-hs300-index" 271 | } 272 | } 273 | filter { 274 | mutate { 275 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"] 276 | } 277 | json { 278 | source => "message" 279 | skip_on_invalid_json => true 280 | } 281 | if "_jsonparsefailure" in [tags] { 282 | drop { } 283 | } 284 | split { 285 | field => "outlets" 286 | } 287 | ruby { 288 | code => " 289 | event.get('outlets').each do |k, v| 290 | event.set(k, v) 291 | if k == '@timestamp' 292 | event.set(k, v + 'Z') 293 | end 294 | end 295 | event.remove('outlets') 296 | " 297 | } 298 | if "_rubyexception" in [tags] { 299 | drop { } 300 | } 301 | mutate { 302 | remove_field => ["message"] 303 | remove_field => ["@version"] 304 | } 305 | } 306 | output { 307 | elasticsearch { 308 | # 309 | # Custom Settings 310 | # 311 | id => "power-hs300-index" 312 | index => "power-hs300-%{+YYYY.MM.dd}" 313 | hosts => "${ES_ENDPOINT}" 314 | user => "${ES_USERNAME}" 315 | password => "${ES_PASSWORD}" 316 | } 317 | } 318 | ``` 319 | 320 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 321 | 322 | ```bash 323 | sudo mv power-hs300-index.yml /etc/logstash/conf.d/ 324 | ``` 325 | 326 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 327 | 328 | ``` 329 | - pipeline.id: "power-hs300-index" 330 | path.config: "/etc/logstash/conf.d/power-hs300-index.conf" 331 | ``` 332 | 333 | And finally, restart the Logstash service: 334 | 335 | ```bash 336 | sudo systemctl restart logstash 337 | ``` 338 | 339 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 340 | 341 | ```bash 342 | sudo tail -f /var/log/logstash/logstash-plain.log 343 | ``` 344 | 345 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 346 | 347 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 348 | 349 | ![Indexing](index.png) 350 | 351 | ## Step #4 - Visualize Data 352 | 353 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 354 | 355 | Download this dashboard: [power-hs300.ndjson](power-hs300.ndjson) 356 | 357 | Jump back into Kibana: 358 | 359 | 1. Select "Stack Management" from the menu 360 | 2. Select "Saved Objects" 361 | 3. Click "Import" in the upper right 362 | 363 | Once it's been imported, click on "Power HS300". 364 | 365 | ![Dashboard](dashboard.png) 366 | 367 | Congratulations! You should now be looking at power data from your HS300 in Elastic. 
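As a follow-on, the wattage you're now collecting can be turned into a dollar figure. The sketch below is not part of this data source; it just applies the basic watts-to-kWh arithmetic, and the `rate_per_kwh` value is a made-up example that you should replace with your own utility rate:

```python
#!/usr/bin/env python3
# Back-of-the-envelope estimate: monthly electricity cost of an outlet
# from its average wattage. The $0.12/kWh rate is an assumed example.

def monthly_cost(avg_watts, rate_per_kwh=0.12, hours=24 * 30):
    """Average watts -> kWh over `hours` -> cost at `rate_per_kwh`."""
    kwh = avg_watts / 1000 * hours
    return kwh * rate_per_kwh

# Example: an outlet averaging ~25.4 watts (like node-1 in the sample output)
print(f"~${monthly_cost(25.4):.2f} per month")   # ~$2.19 at $0.12/kWh
```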
-------------------------------------------------------------------------------- /power-hs300/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/archive.png -------------------------------------------------------------------------------- /power-hs300/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/dashboard.png -------------------------------------------------------------------------------- /power-hs300/hs300.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/hs300.png -------------------------------------------------------------------------------- /power-hs300/hs300.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import datetime 4 | import json 5 | import subprocess 6 | 7 | def query_power_strip(ip_addr, label, hosts, outlets, time): 8 | output = subprocess.getoutput("/home/ubuntu/.local/bin/pyhs100 --ip " + ip_addr + 9 | " emeter | grep voltage") 10 | output = output.replace("'", "\"") 11 | output = output.replace("0:", "\"0\":") 12 | output = output.replace("1:", "\"1\":") 13 | output = output.replace("2:", "\"2\":") 14 | output = output.replace("3:", "\"3\":") 15 | output = output.replace("4:", "\"4\":") 16 | output = output.replace("5:", "\"5\":") 17 | 18 | try: 19 | json_output = json.loads(output) 20 | for i in range(0, 6): 21 | reading = {} 22 | reading["ip"] = ip_addr 23 | reading["label"] = label 24 | reading["outlet"] = i 25 | reading["name"] = hosts[i] 26 | reading["volts"] = json_output[f"{i}"]["voltage_mv"] / 1000 27 | reading["amps"] = json_output[f"{i}"]["current_ma"] / 1000 28 | reading["watts"] = json_output[f"{i}"]["power_mw"] / 1000 29 | # Record then erase, the stats from the meter only at the top of each hour. 30 | # This gives us a clean "watts/hour" reading every 1 hour. 31 | if time.minute == 0: 32 | reading["watt_hours"] = json_output[f"{i}"]["total_wh"] 33 | erase_output = subprocess.getoutput("/home/ubuntu/.local/bin/pyhs100 --ip " + 34 | ip_addr + " emeter --erase") 35 | outlets.append(reading) 36 | except Exception as e: 37 | print(e) 38 | 39 | def main(): 40 | # This script is designed to run every minute. 41 | # If it's the top of the hour, the "watt_hours" are also queried, 42 | # which often makes the runtime of this script greater than 1 minute. 43 | # So we capture the time the script started because we'll likely write 44 | # to output after another invocation of this script. 45 | # Even though these events will be written "out of order", 46 | # recording the correct invocation time will be important. 
47 | now = datetime.datetime.utcnow() 48 | outlets = [] 49 | 50 | hosts = { 51 | 0: "node-22", 52 | 1: "5k-monitor", 53 | 2: "node-17", 54 | 3: "node-18", 55 | 4: "node-21", 56 | 5: "switch-8" 57 | } 58 | query_power_strip("192.168.1.81", "desk", hosts, outlets, now) 59 | 60 | hosts = { 61 | 0: "node-1", 62 | 1: "node-2", 63 | 2: "node-3", 64 | 3: "node-0", 65 | 4: "switch-8-poe", 66 | 5: "udm-pro" 67 | } 68 | query_power_strip("192.168.1.82", "office", hosts, outlets, now) 69 | 70 | hosts = { 71 | 0: "node-9", 72 | 1: "node-10", 73 | 2: "node-6", 74 | 3: "node-4", 75 | 4: "node-5", 76 | 5: "node-20" 77 | } 78 | query_power_strip("192.168.1.83", "basement", hosts, outlets, now) 79 | 80 | power = { 81 | "@timestamp": now.isoformat(), 82 | "outlets": outlets 83 | } 84 | 85 | print(json.dumps(power)) 86 | 87 | if __name__ == "__main__": 88 | main() 89 | 90 | -------------------------------------------------------------------------------- /power-hs300/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/index.png -------------------------------------------------------------------------------- /power-hs300/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/power-hs300/minio.png -------------------------------------------------------------------------------- /power-hs300/reindex.yml: -------------------------------------------------------------------------------- 1 | input { 2 | s3 { 3 | # 4 | # Custom Settings 5 | # 6 | prefix => "power-hs300/2021-02-15/15-" 7 | #prefix => "power-hs300/2021-01-29/16-00" 8 | temporary_directory => "${S3_TEMP_DIR}/reindex" 9 | access_key_id => "${S3_ACCESS_KEY}" 10 | secret_access_key => "${S3_SECRET_KEY}" 11 | endpoint => "${S3_ENDPOINT}" 12 | bucket => "${S3_BUCKET}" 13 | 14 | # 15 | # Standard Settings 16 | # 17 | watch_for_new_files => false 18 | sincedb_path => "/dev/null" 19 | codec => json_lines 20 | additional_settings => { 21 | force_path_style => true 22 | follow_redirects => false 23 | } 24 | } 25 | } 26 | filter { 27 | mutate { 28 | remove_field => ["log", "input", "agent", "tags", "@version", "ecs", "host"] 29 | gsub => [ 30 | "message", "@timestamp", "ts" 31 | ] 32 | } 33 | json { 34 | source => "message" 35 | skip_on_invalid_json => true 36 | } 37 | if "_jsonparsefailure" in [tags] { 38 | drop { } 39 | } 40 | split { 41 | field => "outlets" 42 | } 43 | ruby { 44 | code => " 45 | event.get('outlets').each do |k, v| 46 | event.set(k, v) 47 | end 48 | event.remove('outlets') 49 | " 50 | } 51 | if "_rubyexception" in [tags] { 52 | drop { } 53 | } 54 | mutate { 55 | remove_field => ["message", "@timestamp"] 56 | } 57 | date { 58 | match => ["ts", "YYYY-MM-dd'T'HH:mm:ss.SSSSSS"] 59 | timezone => "UTC" 60 | target => "@timestamp" 61 | } 62 | mutate { 63 | remove_field => ["ts"] 64 | } 65 | } 66 | output { 67 | stdout { 68 | codec => dots 69 | } 70 | elasticsearch { 71 | index => "power-hs300-%{+YYYY.MM.dd}" 72 | hosts => "${ES_ENDPOINT}" 73 | user => "${ES_USERNAME}" 74 | password => "${ES_PASSWORD}" 75 | } 76 | } 77 | -------------------------------------------------------------------------------- /satellites/README.md: -------------------------------------------------------------------------------- 1 | # Tracking Satellites with Elastic 2 | 3 | satellites 4 | 5 | Often 
times data we collect will include geospatial information which is worth seeing on a map. [Elastic Maps](https://www.elastic.co/maps) is a great way to visualize this data to better understand how it is behaving. [The Elastic Stack](https://www.elastic.co/what-is/elk-stack) supports a wide range of [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) that include [geo points](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html). For this data source, we track the location of over 1,000 [Starlink](https://en.wikipedia.org/wiki/Starlink) satellites and the [International Space Station](https://en.wikipedia.org/wiki/International_Space_Station) (ISS). 6 | 7 | Plotting the location of these satellites involves getting the latest [TLE](https://en.wikipedia.org/wiki/Two-line_element_set) from [Celestrak](http://www.celestrak.com/Norad/elements/table.php?tleFile=starlink&title=Starlink%20Satellites&orbits=0&pointsPerRev=90&frame=1), then using the [Skyfield API](https://rhodesmill.org/skyfield/) to convert the TLE to a Latitude & Longitude by providing a time and date. 8 | 9 | After we get the data ingest and indexed, we will use [Elastic Maps](https://www.elastic.co/maps) to plot our data: 10 | 11 | ![Dashboard](dashboard.png) 12 | 13 | Let's get started! 14 | 15 | ## Step #1 - Collect Data 16 | 17 | Install the following Python module that knows how to convert TLE information into latitude & longitude: 18 | 19 | ```bash 20 | $ pip3 install skyfield 21 | ``` 22 | 23 | Create a new python script called `satellites.py` with the following contents: 24 | 25 | ```python 26 | #!/usr/bin/env python3 27 | 28 | import datetime, json, time 29 | from skyfield.api import load, wgs84 30 | 31 | def main(): 32 | stations_url = 'http://celestrak.com/NORAD/elements/stations.txt' 33 | stations = load.tle_file(stations_url, reload=True) 34 | starlink_url = 'http://celestrak.com/NORAD/elements/starlink.txt' 35 | starlinks = load.tle_file(starlink_url, reload=True) 36 | 37 | while True: 38 | now = datetime.datetime.utcnow() 39 | ts = load.timescale() 40 | 41 | satellites = [] 42 | output = {} 43 | output['@timestamp'] = now.strftime('%Y-%m-%dT%H:%M:%SZ') 44 | 45 | by_name = {station.name: station for station in stations} 46 | station = by_name['ISS (ZARYA)'] 47 | satellite = {} 48 | satellite['name'] = 'ISS' 49 | satellite['sat_num'] = station.model.satnum 50 | geocentric = station.at(ts.now()) 51 | subpoint = wgs84.subpoint(geocentric) 52 | geo_point = {} 53 | geo_point['lat'] = subpoint.latitude.degrees 54 | geo_point['lon'] = subpoint.longitude.degrees 55 | satellite['location'] = geo_point 56 | satellite['elevation'] = int(subpoint.elevation.m) 57 | satellites.append(satellite) 58 | 59 | for starlink in starlinks: 60 | try: 61 | geocentric = starlink.at(ts.now()) 62 | subpoint = wgs84.subpoint(geocentric) 63 | satellite = {} 64 | satellite['name'] = starlink.name 65 | satellite['sat_num'] = starlink.model.satnum 66 | geo_point = {} 67 | geo_point['lat'] = subpoint.latitude.degrees 68 | geo_point['lon'] = subpoint.longitude.degrees 69 | satellite['location'] = geo_point 70 | satellite['elevation'] = int(subpoint.elevation.m) 71 | satellites.append(satellite) 72 | except: 73 | pass 74 | 75 | output['satellites'] = satellites 76 | print(json.dumps(output)) 77 | 78 | time.sleep(3) 79 | 80 | if __name__ == "__main__": 81 | main() 82 | ``` 83 | 84 | Create a new bash script called `satellites.sh` with the following contents: 85 | 86 | ```bash 
87 | #!/bin/bash 88 | 89 | if pgrep -f "python3 /home/ubuntu/python/satellites/satellites.py" > /dev/null 90 | then 91 | echo "Already running." 92 | else 93 | echo "Not running. Restarting..." 94 | /home/ubuntu/python/satellites/satellites.py >> /var/log/satellites.log 95 | fi 96 | ``` 97 | 98 | You can store these wherever you'd like. A good place to put them is in a `~/python` and `~/bin` directory, respectively. 99 | 100 | Try running the Python script directly: 101 | 102 | ``` 103 | $ chmod a+x ~/python/satellites/satellites.py 104 | $ ~/python/satellites/satellites.py 105 | ``` 106 | 107 | You should see output similar to: 108 | 109 | ```json 110 | {"@timestamp": "2021-04-18T16:47:54Z", "satellites": [{"name": "ISS", "sat_num": 25544, "location": {"lat": -9.499628732834388, "lon": 5.524255661695312}, "elevation": 421272}, {"name": "STARLINK-24", "sat_num": 44238, "location": {"lat": -53.0987009533634, "lon": 75.21545552082654}, "elevation": 539139}]} 111 | ``` 112 | 113 | Once you confirm the script is working, you can redirect its output to a log file: 114 | 115 | ``` 116 | $ sudo touch /var/log/satellites.log 117 | $ sudo chown ubuntu.ubuntu /var/log/satellites.log 118 | ``` 119 | 120 | Create a logrotate entry so the log file doesn't grow unbounded: 121 | 122 | ``` 123 | $ sudo vi /etc/logrotate.d/satellites 124 | ``` 125 | 126 | Add the following content: 127 | 128 | ``` 129 | /var/log/satellites.log { 130 | weekly 131 | rotate 12 132 | compress 133 | delaycompress 134 | missingok 135 | notifempty 136 | create 644 ubuntu ubuntu 137 | } 138 | ``` 139 | 140 | Add the following entry to your crontab: 141 | 142 | ``` 143 | * * * * * /home/ubuntu/bin/satellites.sh > /dev/null 2>&1 144 | ``` 145 | 146 | Verify output by tailing the log file for a few minutes: 147 | 148 | ``` 149 | $ tail -f /var/log/satellites.log 150 | ``` 151 | 152 | Tell Filebeat to send events in it to Elasticsearch, by editing `/etc/filebeat/filebeat.yml`: 153 | 154 | ``` 155 | filebeat.inputs: 156 | - type: log 157 | enabled: true 158 | tags: ["satellites"] 159 | paths: 160 | - /var/log/satellites.log 161 | ``` 162 | 163 | Restart Filebeat: 164 | 165 | ``` 166 | $ sudo systemctl restart filebeat 167 | ``` 168 | 169 | We now have a reliable collection method that will queue the satellite data on disk to a log file. Next, we'll leverage Filebeat to manage all the domain-specific logic of handing it off to Logstash in a reliable manner, dealing with retries, backoff logic, and more. 170 | 171 | ## Step #2 - Archive Data 172 | 173 | Once you have a data source that's ready to archive, we'll turn to Filebeat to send in the data to Logstash. By default, our `distributor` pipeline will put any unrecognized data in a Data Lake bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the satellites data feed and create two pipelines that know how to archive it in the Data Lake. 
174 | 175 | Create an Archive Pipeline: 176 | 177 | ``` 178 | input { 179 | pipeline { 180 | address => "satellites-archive" 181 | } 182 | } 183 | filter { 184 | } 185 | output { 186 | s3 { 187 | # 188 | # Custom Settings 189 | # 190 | prefix => "satellites/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 191 | temporary_directory => "${S3_TEMP_DIR}/satellites-archive" 192 | access_key_id => "${S3_ACCESS_KEY}" 193 | secret_access_key => "${S3_SECRET_KEY}" 194 | endpoint => "${S3_ENDPOINT}" 195 | bucket => "${S3_BUCKET}" 196 | 197 | # 198 | # Standard Settings 199 | # 200 | validate_credentials_on_root_bucket => false 201 | codec => json_lines 202 | # Limit Data Lake file sizes to 5 GB 203 | size_file => 5000000000 204 | time_file => 60 205 | # encoding => "gzip" 206 | additional_settings => { 207 | force_path_style => true 208 | follow_redirects => false 209 | } 210 | } 211 | } 212 | ``` 213 | 214 | If you're doing this in environment with multiple Logstash instances, please adapt the instruction below to your workflow for deploying updates. Ansible is a great configuration management tool for this purpose. 215 | 216 | ## Step #3 - Index Data 217 | 218 | Once Logstash is archiving the data, we need to index it with Elastic. 219 | 220 | Create an Index Template: 221 | 222 | ``` 223 | PUT _index_template/satellites 224 | { 225 | "index_patterns": ["satellites-*"], 226 | "template": { 227 | "settings": {}, 228 | "mappings": { 229 | "properties": { 230 | "location": { 231 | "type": "geo_point" 232 | } 233 | } 234 | }, 235 | "aliases": {} 236 | } 237 | } 238 | ``` 239 | 240 | Create an Index Pipeline: 241 | 242 | ``` 243 | input { 244 | pipeline { 245 | address => "satellites-index" 246 | } 247 | } 248 | filter { 249 | json { 250 | source => "message" 251 | } 252 | json { 253 | source => "message" 254 | } 255 | split { 256 | field => "satellites" 257 | } 258 | mutate { 259 | rename => { "[satellites][name]" => "[name]" } 260 | rename => { "[satellites][sat_num]" => "[sat_num]" } 261 | rename => { "[satellites][location]" => "[location]" } 262 | rename => { "[satellites][elevation]" => "[elevation]" } 263 | remove_field => ["message", "agent", "input", "@version", "satellites"] 264 | } 265 | } 266 | output { 267 | elasticsearch { 268 | # 269 | # Custom Settings 270 | # 271 | id => "satellites-index" 272 | index => "satellites-%{+YYYY.MM.dd}" 273 | hosts => "${ES_ENDPOINT}" 274 | user => "${ES_USERNAME}" 275 | password => "${ES_PASSWORD}" 276 | } 277 | } 278 | ``` 279 | 280 | Deploy your pipeline. 281 | 282 | ## Step #4 - Visualize Data 283 | 284 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 
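Before importing the dashboard, it can be worth confirming that documents with a `geo_point` are actually landing in the `satellites-*` indices. The snippet below is an optional sanity check, not part of this data source; the endpoint and credentials are placeholders, so substitute the same values you stored in the Logstash keystore:

```python
#!/usr/bin/env python3
# Optional sanity check: fetch the most recent ISS document and print its
# location. The endpoint and credentials below are placeholders.

import json
import requests

ES_ENDPOINT = "https://elasticsearch.my-domain.com:9243"   # placeholder
AUTH = ("elastic", "changeme")                             # placeholder

query = {
    "size": 1,
    "sort": [{"@timestamp": "desc"}],
    "query": {"match": {"name": "ISS"}},
}

resp = requests.get(
    f"{ES_ENDPOINT}/satellites-*/_search",
    auth=AUTH,
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
hit = resp.json()["hits"]["hits"][0]["_source"]
print(hit["name"], hit["location"], hit["elevation"])
```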
285 | 286 | dashboard -------------------------------------------------------------------------------- /satellites/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/satellites/dashboard.png -------------------------------------------------------------------------------- /satellites/satellites.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"satellites-*"},"coreMigrationVersion":"7.12.0","id":"6979b870-9945-11eb-9bde-9dcf1fa82a43","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-04-18T17:45:39.267Z","version":"WzUyNDcyNywxMl0="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"7.12.0\",\"type\":\"map\",\"gridData\":{\"x\":0,\"y\":0,\"w\":48,\"h\":26,\"i\":\"dae266e1-0d25-4c8e-83e8-547fb8ae8cfd\"},\"panelIndex\":\"dae266e1-0d25-4c8e-83e8-547fb8ae8cfd\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"description\":\"\",\"layerListJSON\":\"[{\\\"sourceDescriptor\\\":{\\\"type\\\":\\\"EMS_TMS\\\",\\\"isAutoSelect\\\":true},\\\"id\\\":\\\"0722a46c-af6c-4837-ae8c-4a1895a2385a\\\",\\\"label\\\":null,\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":1,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"TILE\\\"},\\\"type\\\":\\\"VECTOR_TILE\\\"},{\\\"sourceDescriptor\\\":{\\\"indexPatternId\\\":\\\"6979b870-9945-11eb-9bde-9dcf1fa82a43\\\",\\\"geoField\\\":\\\"location\\\",\\\"filterByMapBounds\\\":true,\\\"scalingType\\\":\\\"TOP_HITS\\\",\\\"topHitsSplitField\\\":\\\"name.keyword\\\",\\\"topHitsSize\\\":1,\\\"id\\\":\\\"a24b740c-1f80-4013-a190-dfd89b2fab24\\\",\\\"type\\\":\\\"ES_SEARCH\\\",\\\"applyGlobalQuery\\\":true,\\\"applyGlobalTime\\\":true,\\\"tooltipProperties\\\":[\\\"elevation\\\",\\\"name\\\"],\\\"sortField\\\":\\\"\\\",\\\"sortOrder\\\":\\\"desc\\\"},\\\"id\\\":\\\"d4790890-f1d6-4730-bd88-4f8eca8e0fc0\\\",\\\"label\\\":\\\"Satellites\\\",\\\"minZoom\\\":0,\\\"maxZoom\\\":24,\\\"alpha\\\":0.75,\\\"visible\\\":true,\\\"style\\\":{\\\"type\\\":\\\"VECTOR\\\",\\\"properties\\\":{\\\"icon\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"confectionery\\\"}},\\\"fillColor\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"color\\\":\\\"Green to Red\\\",\\\"colorCategory\\\":\\\"palette_0\\\",\\\"field\\\":{\\\"name\\\":\\\"elevation\\\",\\\"origin\\\":\\\"source\\\"},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3},\\\"type\\\":\\\"ORDINAL\\\",\\\"useCustomColorRamp\\\":false}},\\\"lineColor\\\":{\\\"type\\\":\\\"DYNAMIC\\\",\\\"options\\\":{\\\"color\\\":\\\"Green to 
Red\\\",\\\"colorCategory\\\":\\\"palette_0\\\",\\\"field\\\":{\\\"name\\\":\\\"elevation\\\",\\\"origin\\\":\\\"source\\\"},\\\"fieldMetaOptions\\\":{\\\"isEnabled\\\":true,\\\"sigma\\\":3},\\\"type\\\":\\\"ORDINAL\\\",\\\"useCustomColorRamp\\\":false}},\\\"lineWidth\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":1}},\\\"iconSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":6}},\\\"iconOrientation\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"orientation\\\":0}},\\\"labelText\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"value\\\":\\\"\\\"}},\\\"labelColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#FFFFFF\\\"}},\\\"labelSize\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"size\\\":14}},\\\"labelBorderColor\\\":{\\\"type\\\":\\\"STATIC\\\",\\\"options\\\":{\\\"color\\\":\\\"#000000\\\"}},\\\"symbolizeAs\\\":{\\\"options\\\":{\\\"value\\\":\\\"icon\\\"}},\\\"labelBorderSize\\\":{\\\"options\\\":{\\\"size\\\":\\\"SMALL\\\"}}},\\\"isTimeAware\\\":true},\\\"type\\\":\\\"VECTOR\\\",\\\"joins\\\":[]}]\",\"mapStateJSON\":\"{\\\"zoom\\\":1.95,\\\"center\\\":{\\\"lon\\\":-99.72924,\\\"lat\\\":22.13816},\\\"timeFilters\\\":{\\\"from\\\":\\\"now-1m\\\",\\\"to\\\":\\\"now\\\"},\\\"refreshConfig\\\":{\\\"isPaused\\\":false,\\\"interval\\\":5000},\\\"query\\\":{\\\"query\\\":\\\"\\\",\\\"language\\\":\\\"kuery\\\"},\\\"filters\\\":[],\\\"settings\\\":{\\\"autoFitToDataBounds\\\":false,\\\"backgroundColor\\\":\\\"#1d1e24\\\",\\\"disableInteractive\\\":false,\\\"disableTooltipControl\\\":false,\\\"hideToolbarOverlay\\\":false,\\\"hideLayerControl\\\":false,\\\"hideViewControl\\\":false,\\\"initialLocation\\\":\\\"LAST_SAVED_LOCATION\\\",\\\"fixedLocation\\\":{\\\"lat\\\":0,\\\"lon\\\":0,\\\"zoom\\\":2},\\\"browserLocation\\\":{\\\"zoom\\\":2},\\\"maxZoom\\\":24,\\\"minZoom\\\":0,\\\"showScaleControl\\\":false,\\\"showSpatialFilters\\\":true,\\\"spatialFiltersAlpa\\\":0.3,\\\"spatialFiltersFillColor\\\":\\\"#DA8B45\\\",\\\"spatialFiltersLineColor\\\":\\\"#DA8B45\\\"}}\",\"uiStateJSON\":\"{\\\"isLayerTOCOpen\\\":true,\\\"openTOCDetails\\\":[]}\"},\"mapCenter\":{\"lat\":22.13816,\"lon\":-99.72924,\"zoom\":1.95},\"mapBuffer\":{\"minLon\":-395.447,\"minLat\":-88.57154500000001,\"maxLon\":195.98852,\"maxLat\":115.932715},\"isLayerTOCOpen\":true,\"openTOCDetails\":[],\"hiddenLayers\":[],\"enhancements\":{}}}]","timeRestore":false,"title":"Satellites","version":1},"coreMigrationVersion":"7.12.0","id":"4ae756b0-9ee0-11eb-892f-d146407b15b5","migrationVersion":{"dashboard":"7.11.0"},"references":[{"id":"6979b870-9945-11eb-9bde-9dcf1fa82a43","name":"layer_1_source_index_pattern","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-04-18T17:58:42.036Z","version":"WzUyNTEzNSwxMl0="} 3 | {"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /satellites/satellites.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/satellites/satellites.png -------------------------------------------------------------------------------- /setup/README.md: -------------------------------------------------------------------------------- 1 | # Setup 2 | 3 | To build the architecture for the Elastic Data Lake, you'll need these components: 4 | 5 | * Logstash 6 | * HAProxy (or equivalent) 7 | * S3 Data Store (or equivalent) 8 | * 
Elastic Cluster 9 | 10 | Here is the architecture we're building: 11 | 12 | ![](../images/architecture.png) 13 | 14 | ## Prerequisites 15 | 16 | This guide depends on you having an S3 store and Elasticsearch cluster already running. We'll use [Elastic Cloud](https://elastic.co) to run our Elasticsearch cluster and [Minio](https://www.digitalocean.com/community/tutorials/how-to-set-up-an-object-storage-server-using-minio-on-ubuntu-18-04) as an S3 data store (or any S3-compliant service). 17 | 18 | ## Step 1 - Logstash 19 | 20 | Identify the host you want to run Logstash. Depending on the volume of ingest you anticipate, you may want to run Logstash on multiple hosts (or containers). It scales easily so putting HAProxy in front of it (which we'll do next) will make it easy to add more capacity. 21 | 22 | Follow these instructions to get Logstash up and running: 23 | 24 | [https://www.elastic.co/guide/en/logstash/current/installing-logstash.html](https://www.elastic.co/guide/en/logstash/current/installing-logstash.html) 25 | 26 | Next, create a [Logstash keystore](https://www.elastic.co/guide/en/logstash/current/keystore.html) to store sensitive information and variables: 27 | 28 | ``` 29 | $ export LOGSTASH_KEYSTORE_PASS=mypassword 30 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash create 31 | ``` 32 | 33 | **Note:** Store this password somewhere safe. You will also need to add it to the environment that starts the Logstash process. 34 | 35 | We'll use the keystore to fill in variables about our Elasticsearch cluster: 36 | 37 | ``` 38 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_ENDPOINT 39 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_USERNAME 40 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_PASSWORD 41 | ``` 42 | 43 | The `ES_ENDPOINT` value should be a full domain with `https` prefix and `:9243` port suffix. For example: 44 | 45 | ``` 46 | https://elasticsearch.my-domain.com:9243 47 | ``` 48 | 49 | We'll also use the keystore to fill in variables about our S3 bucket: 50 | 51 | ``` 52 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_ENDPOINT 53 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_BUCKET 54 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_ACCESS_KEY 55 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_SECRET_KEY 56 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_DATE_DIR 57 | $ sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add S3_TEMP_DIR 58 | ``` 59 | 60 | The `S3_DATE_DIR` variable is used to organize your data into `date/time` directories in the Data Lake. For example, `data-source/2021-01-01/13` will contain data collected January 1, 2021 during the 1PM GMT hour. Organizing your data in this manner gives you good granularity in terms of identifying what time windows you may want to re-index in the future. It allows you to reindex data from a year, month, day, or hour interval. An hour (as opposed to using the hour with minute granularity) provides a nice balance between flushing what's in Logstash to your archive relatively often, while not creating a "too many files" burden on the underlying archive file system. 
Many file systems can handle lots of files; it's more the latency involved in recalling them that we want to avoid. 61 | 62 | The recommended value for `S3_DATE_DIR` is: 63 | 64 | ``` 65 | %{+YYYY}-%{+MM}-%{+dd}/%{+HH} 66 | ``` 67 | 68 | The `S3_TEMP_DIR` variable should point to a directory where Logstash can temporarily store events. Since this directory will contain events, you may need to make it secure so that only the Logstash process can read it (in addition to write to it). 69 | 70 | If Logstash is running on an isolated host, you may set it to: 71 | 72 | ``` 73 | /tmp/logstash 74 | ``` 75 | 76 | ### Ansible Pipeline Management 77 | 78 | We'll configure Logstash using Ansible. Ansible is a popular software provisioning tool that makes deploying configuration updates to multiple servers a breeze. If you can SSH into a host, you can use Ansible to push configuration to it. 79 | 80 | Create a directory to hold the Logstash configuration we'll be pushing to each Logstash host. 81 | 82 | ``` 83 | $ mkdir logstash 84 | $ vi playbook-logstash.yml 85 | ``` 86 | 87 | Add the following content to your Logstash Ansible playbook. 88 | 89 | **Note:** Replace `node-1` and `node-2` with the names of your Logstash hosts. 90 | 91 | ``` 92 | --- 93 | - hosts: node-1:node-2 94 | become: yes 95 | gather_facts: no 96 | 97 | tasks: 98 | - name: Copy in pipelines.yml 99 | template: 100 | src: "pipelines.yml" 101 | dest: "/etc/logstash/pipelines.yml" 102 | mode: 0644 103 | 104 | - name: Remove existing pipelines 105 | file: 106 | path: "/etc/logstash/conf.d" 107 | state: absent 108 | 109 | - name: Copy in pipelines 110 | copy: 111 | src: "conf.d" 112 | dest: "/etc/logstash/" 113 | 114 | - name: Restart Logstash 115 | service: 116 | name: logstash 117 | state: restarted 118 | enabled: true 119 | 120 | ``` 121 | 122 | ## Step 2 - HAProxy 123 | 124 | Identify the host you want to run HAProxy. Many Linux distributions support installation from the standard distribution. 
125 | 126 | In Ubuntu, run: 127 | 128 | ``` 129 | $ sudo apt install haproxy 130 | ``` 131 | 132 | In Redhat, run: 133 | 134 | ``` 135 | $ sudo yum install haproxy 136 | ``` 137 | 138 | A sample configuration file is provided: [haproxy.cfg](haproxy.cfg) 139 | -------------------------------------------------------------------------------- /setup/dead-letter-queue-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | dead_letter_queue { 3 | pipeline_id => "haproxy-filebeat-module-structure" 4 | path => "${S3_TEMP_DIR}/dead-letter-queue" 5 | # This directory needs created by hand (change /tmp/logstash if necessary): 6 | # mkdir -p /tmp/logstash/dead-letter-queue/haproxy-filebeat-module-structure 7 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 8 | } 9 | dead_letter_queue { 10 | pipeline_id => "haproxy-metricbeat-module-structure" 11 | path => "${S3_TEMP_DIR}/dead-letter-queue" 12 | # This directory needs created by hand (change /tmp/logstash if necessary): 13 | # mkdir -p /tmp/logstash/dead-letter-queue/haproxy-metricbeat-module-structure 14 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 15 | } 16 | dead_letter_queue { 17 | pipeline_id => "system-filebeat-module-structure" 18 | path => "${S3_TEMP_DIR}/dead-letter-queue" 19 | # This directory needs created by hand (change /tmp/logstash if necessary): 20 | # mkdir -p /tmp/logstash/dead-letter-queue/system-filebeat-module-structure 21 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 22 | } 23 | dead_letter_queue { 24 | pipeline_id => "unknown-structure" 25 | path => "${S3_TEMP_DIR}/dead-letter-queue" 26 | # This directory needs created by hand (change /tmp/logstash if necessary): 27 | # mkdir -p /tmp/logstash/dead-letter-queue/unknown-structure 28 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 29 | } 30 | dead_letter_queue { 31 | pipeline_id => "utilization-structure" 32 | path => "${S3_TEMP_DIR}/dead-letter-queue" 33 | # This directory needs created by hand (change /tmp/logstash if necessary): 34 | # mkdir -p /tmp/logstash/dead-letter-queue/utilization-structure 35 | # chown -R logstash.logstash /tmp/logstash/dead-letter-queue 36 | } 37 | } 38 | filter { 39 | } 40 | output { 41 | s3 { 42 | # 43 | # Custom Settings 44 | # 45 | prefix => "dead-letter-queue-archive/${S3_DATE_DIR}" 46 | temporary_directory => "${S3_TEMP_DIR}/dead-letter-queue-archive" 47 | access_key_id => "${S3_ACCESS_KEY}" 48 | secret_access_key => "${S3_SECRET_KEY}" 49 | endpoint => "${S3_ENDPOINT}" 50 | bucket => "${S3_BUCKET}" 51 | 52 | # 53 | # Standard Settings 54 | # 55 | validate_credentials_on_root_bucket => false 56 | codec => json_lines 57 | # Limit Data Lake file sizes to 5 GB 58 | size_file => 5000000000 59 | time_file => 1 60 | # encoding => "gzip" 61 | additional_settings => { 62 | force_path_style => true 63 | follow_redirects => false 64 | } 65 | } 66 | } 67 | -------------------------------------------------------------------------------- /setup/distributor.conf: -------------------------------------------------------------------------------- 1 | input { 2 | tcp { 3 | port => 4044 4 | } 5 | beats { 6 | port => 5044 7 | } 8 | } 9 | filter { 10 | # Raw data filters go here. 11 | # Filter out any data you don't want in the Data Lake or Elasticsearch. 
12 | } 13 | output { 14 | if "utilization" in [tags] { 15 | pipeline { 16 | send_to => ["utilization-archive", "utilization-structure"] 17 | } 18 | } else if [agent][type] == "filebeat" and [event][module] == "system" { 19 | pipeline { 20 | send_to => ["system-filebeat-module-archive", "system-filebeat-module-structure"] 21 | } 22 | } else if [agent][type] == "filebeat" and [event][module] == "haproxy" { 23 | pipeline { 24 | send_to => ["haproxy-filebeat-module-archive", "haproxy-filebeat-module-structure"] 25 | } 26 | } else if [agent][type] == "metricbeat" and [event][module] == "haproxy" { 27 | pipeline { 28 | send_to => ["haproxy-metricbeat-module-archive", "haproxy-metricbeat-module-structure"] 29 | } 30 | } else { 31 | pipeline { 32 | send_to => ["unknown-archive", "unknown-structure"] 33 | } 34 | } 35 | } 36 | -------------------------------------------------------------------------------- /setup/haproxy.cfg: -------------------------------------------------------------------------------- 1 | global 2 | log /dev/log local0 3 | log /dev/log local1 notice 4 | chroot /var/lib/haproxy 5 | stats socket 127.0.0.1:14567 6 | user haproxy 7 | group haproxy 8 | daemon 9 | tune.ssl.default-dh-param 2048 10 | 11 | ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384 12 | ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256 13 | ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets 14 | 15 | defaults 16 | log global 17 | mode http 18 | option httplog 19 | option dontlognull 20 | timeout connect 5000 21 | timeout client 50000 22 | timeout server 50000 23 | errorfile 400 /etc/haproxy/errors/400.http 24 | errorfile 403 /etc/haproxy/errors/403.http 25 | errorfile 408 /etc/haproxy/errors/408.http 26 | errorfile 500 /etc/haproxy/errors/500.http 27 | errorfile 502 /etc/haproxy/errors/502.http 28 | errorfile 503 /etc/haproxy/errors/503.http 29 | errorfile 504 /etc/haproxy/errors/504.http 30 | 31 | # Logstash TCP 32 | listen logstash-tcp:4443 33 | #log /dev/log local0 debug 34 | mode tcp 35 | bind *:4443 ssl crt /etc/haproxy/certs/logstash.corp-intranet.pem ca-file /etc/haproxy/certs/ca.crt verify required 36 | option tcp-check 37 | balance roundrobin 38 | server proxy 127.0.0.1:4044 check port 4044 39 | 40 | # Logstash Beats 41 | listen logstash-beats:5443 42 | #log /dev/log local0 debug 43 | mode tcp 44 | bind *:5443 ssl crt /etc/haproxy/certs/logstash.corp-intranet.pem ca-file /etc/haproxy/certs/ca.crt verify required 45 | option tcp-check 46 | balance roundrobin 47 | server proxy 127.0.0.1:5044 check port 5044 48 | 49 | # Elasticsearch 50 | listen elasticsearch:9243 51 | #log /dev/log local0 debug 52 | mode http 53 | bind *:9243 ssl crt /etc/haproxy/certs/corp-intranet.pem 54 | http-request add-header X-Found-Cluster f40ec3b5bf1c4d8d81b3934cb97c8a32 55 | option ssl-hello-chk 56 | server proxy f40ec3b5bf1c4d8d81b3934cb97c8a32.us-central1.gcp.cloud.es.io:9243 check ssl port 9243 verify none 57 | 58 | # MinIO 59 | listen minio:9443 60 | #log /dev/log local0 debug 61 | mode http 62 | bind *:9443 ssl crt /etc/haproxy/certs/corp-intranet.pem 63 | http-request set-header X-Forwarded-Port %[dst_port] 64 | http-request add-header X-Forwarded-Proto https if { ssl_fc } 65 | option tcp-check 66 | balance roundrobin 67 | server proxy 127.0.0.1:9000 check port 
9000 68 | -------------------------------------------------------------------------------- /setup/needs-classified-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "needs-classified-archive" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | s3 { 10 | # 11 | # Custom Settings 12 | # 13 | prefix => "NEEDS_CLASSIFIED/${S3_DATE_DIR}" 14 | temporary_directory => "${S3_TEMP_DIR}/needs-classified-archive" 15 | access_key_id => "${S3_ACCESS_KEY}" 16 | secret_access_key => "${S3_SECRET_KEY}" 17 | endpoint => "${S3_ENDPOINT}" 18 | bucket => "${S3_BUCKET}" 19 | 20 | # 21 | # Standard Settings 22 | # 23 | validate_credentials_on_root_bucket => false 24 | codec => json_lines 25 | # Limit Data Lake file sizes to 5 GB 26 | size_file => 5000000000 27 | time_file => 1 28 | # encoding => "gzip" 29 | additional_settings => { 30 | force_path_style => true 31 | follow_redirects => false 32 | } 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /solar-enphase/README.md: -------------------------------------------------------------------------------- 1 | # Solar Monitoring with Enphase 2 | 3 | solar 4 | 5 | The [IQ 7+](https://store.enphase.com/storefront/en-us/iq-7plus-microinverter) from Enphase is a microinverter compatible with 60 and 72-cell solar panels that can produce 295VA at peak power. Enphase provides an [API]( https://developer.enphase.com/docs#envoys) that allows us to query a set of these microinverters reporting into their service. They offer a range of [Plans](https://developer.enphase.com/plans), including a free plan, which we'll be using for this data source. 6 | 7 | For this data source, we'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/solar-enphase.py` with the following contents: 16 | 17 | ​ [solar-enphase.py](solar-enphase.py) 18 | 19 | The script queries a set of Enphase's APIs at different intervals. The goal being to stay within our alotted quota of 10k API calls per month. We'll write the data collected to our data lake, but only use a portion of it for analysis in Elastic. 20 | 21 | Take a few minutes to familiarize yourself with the script. There are a couple of labels you can change near the bottom. Adjust the values of ``, `` and `` to suit your needs. The Enphase [Developer Portal](https://developer.enphase.com) is where you can get these values. 22 | 23 | When you're ready, try running the script: 24 | 25 | ```bash 26 | chmod a+x ~/bin/solar-enphase.py 27 | ~/bin/solar-enphase.py 28 | ``` 29 | 30 | You may not see any output, and this is by design (not a great design, albeit, but it works for now). Since we're limited to ~300 API calls per day on the Free plan, the script checks to see if it's on a specific minute of the hour in order to determine which API calls to make. 31 | 32 | If you run the script at :00, :10, :20, :30, :40, or :50 past the hour, you should see output on `stdout` similar to: 33 | 34 | ```json 35 | [{"signal_strength":0,"micro_inverters":[{"id":40236944,"serial_number":"121927062331","model":"IQ7+","part_number":"800-00625-r02","sku":"IQ7PLUS-72-2-US","status":"normal","power_produced":28,"proc_load":"520-00082-r01-v04.27.04","param_table":"540-00242-r01-v04.22.09","envoy_serial_number":"111943015132",... 
36 | ``` 37 | 38 | Once you confirm the script is working, you can redirect its output to a log file: 39 | 40 | ```bash 41 | sudo touch /var/log/solar-enphase.log 42 | sudo chown ubuntu.ubuntu /var/log/solar-enphase.log 43 | ``` 44 | 45 | Create a logrotate entry so the log file doesn't grow unbounded: 46 | 47 | ```bash 48 | sudo vi /etc/logrotate.d/solar-enphase 49 | ``` 50 | 51 | Add the following logrotate content: 52 | 53 | ``` 54 | /var/log/solar-enphase.log { 55 | weekly 56 | rotate 12 57 | compress 58 | delaycompress 59 | missingok 60 | notifempty 61 | create 644 ubuntu ubuntu 62 | } 63 | ``` 64 | 65 | Add the following entry to your crontab with `crontab -e`: 66 | 67 | ``` 68 | * * * * * /home/ubuntu/bin/solar-enphase.py >> /var/log/solar-enphase.log 2>&1 69 | ``` 70 | 71 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 72 | 73 | ```bash 74 | tail -f /var/log/solar-enphase.log 75 | ``` 76 | 77 | If you're seeing output scroll every 10 minutes, then you are successfully collecting data! 78 | 79 | ## Step #2 - Archive Data 80 | 81 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash, which will in turn send it to S3. 82 | 83 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your solar data: 84 | 85 | ```yaml 86 | filebeat.inputs: 87 | - type: log 88 | enabled: true 89 | tags: ["solar-enphase"] 90 | paths: 91 | - /var/log/solar-enphase.log 92 | ``` 93 | 94 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 95 | 96 | Restart Filebeat: 97 | 98 | ```bash 99 | sudo systemctl restart filebeat 100 | ``` 101 | 102 | You may want to tail syslog to see if Filebeat restarts without any issues: 103 | 104 | ```bash 105 | tail -f /var/log/syslog | grep filebeat 106 | ``` 107 | 108 | At this point, we should have Solar Enphase data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the Solar Enphase data feed.
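Before changing the Logstash configuration, you can optionally have Filebeat validate its own config and its connection to Logstash. A quick check, assuming Filebeat was installed from the standard DEB/RPM packages so the defaults point at `/etc/filebeat/filebeat.yml`:

```bash
# Validate the Filebeat configuration syntax
sudo filebeat test config

# Confirm Filebeat can reach the Logstash output defined in filebeat.yml
sudo filebeat test output
```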
109 | 110 | Add the following conditional to your `distributor.yml` file: 111 | 112 | ``` 113 | } else if "solar-enphase" in [tags] { 114 | pipeline { 115 | send_to => ["solar-enphase-archive"] 116 | } 117 | } 118 | ``` 119 | 120 | Create a Logstash pipeline called `solar-enphase-archive.yml` with the following contents: 121 | 122 | ``` 123 | input { 124 | pipeline { 125 | address => "solar-enphase-archive" 126 | } 127 | } 128 | filter { 129 | } 130 | output { 131 | s3 { 132 | # 133 | # Custom Settings 134 | # 135 | prefix => "solar-enphase/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 136 | temporary_directory => "${S3_TEMP_DIR}/solar-enphase-archive" 137 | access_key_id => "${S3_ACCESS_KEY}" 138 | secret_access_key => "${S3_SECRET_KEY}" 139 | endpoint => "${S3_ENDPOINT}" 140 | bucket => "${S3_BUCKET}" 141 | 142 | # 143 | # Standard Settings 144 | # 145 | validate_credentials_on_root_bucket => false 146 | codec => json_lines 147 | # Limit Data Lake file sizes to 5 GB 148 | size_file => 5000000000 149 | time_file => 60 150 | # encoding => "gzip" 151 | additional_settings => { 152 | force_path_style => true 153 | follow_redirects => false 154 | } 155 | } 156 | } 157 | ``` 158 | 159 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 160 | 161 | ```bash 162 | sudo mv solar-enphase-archive.yml /etc/logstash/conf.d/ 163 | ``` 164 | 165 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 166 | 167 | ``` 168 | - pipeline.id: "solar-enphase-archive" 169 | path.config: "/etc/logstash/conf.d/solar-enphase-archive.conf" 170 | ``` 171 | 172 | And finally, restart the Logstash service: 173 | 174 | ```bash 175 | sudo systemctl restart logstash 176 | ``` 177 | 178 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 179 | 180 | ```bash 181 | sudo tail -f /var/log/logstash/logstash-plain.log 182 | ``` 183 | 184 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 185 | 186 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 187 | 188 | ![Stack Monitoring](archive.png) 189 | 190 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 191 | 192 | ![MinIO](minio.png) 193 | 194 | If you see your data being stored, then you are successfully archiving! 195 | 196 | ## Step #3 - Index Data 197 | 198 | Once Logstash is archiving the data, next we need to index it with Elastic. 199 | 200 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 201 | 202 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 
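If you want to experiment with a filter chain like this before committing it to a pipeline file, one option is to feed a sample line from the log through a throwaway Logstash config and inspect the parsed event on `stdout`. A minimal sketch (not part of the final setup), assuming the standard package install path and a scratch data directory so it doesn't clash with the running Logstash service:

```bash
# Parse one sample event with the first json step of the chain and
# print the resulting event using the rubydebug codec
head -1 /var/log/solar-enphase.log | \
  /usr/share/logstash/bin/logstash --path.data /tmp/logstash-scratch -e '
    input { stdin { } }
    filter { json { source => "message" target => "tmp" } }
    output { stdout { codec => rubydebug } }
  '
```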
203 | 204 | Create a new pipeline called `solar-enphase-index.yml` with the following content: 205 | 206 | ``` 207 | input { 208 | pipeline { 209 | address => "solar-enphase-index" 210 | } 211 | } 212 | filter { 213 | json { 214 | source => "message" 215 | } 216 | if [message] =~ /^\[/ { 217 | json { 218 | source => "message" 219 | target => "tmp" 220 | } 221 | } else { 222 | drop { } 223 | } 224 | if "_jsonparsefailure" in [tags] { 225 | drop { } 226 | } 227 | mutate { 228 | remove_field => ["message"] 229 | } 230 | mutate { 231 | add_field => { 232 | "message" => "%{[tmp][0]}" 233 | } 234 | } 235 | mutate { 236 | remove_field => ["tmp"] 237 | } 238 | json { 239 | source => "message" 240 | } 241 | mutate { 242 | remove_field => ["message"] 243 | } 244 | split { 245 | field => "micro_inverters" 246 | } 247 | ruby { 248 | # Promote the keys inside micro_inverters to root, then remove micro_inverters 249 | code => ' 250 | event.get("micro_inverters").each { |k, v| 251 | event.set(k,v) 252 | } 253 | event.remove("micro_inverters") 254 | ' 255 | } 256 | date { 257 | match => ["last_report_date", "ISO8601"] 258 | } 259 | mutate { 260 | remove_field => ["last_report_date", "part_number", "envoy_serial_number", "param_table"] 261 | remove_field => ["model", "sku", "grid_profile", "proc_load", "id"] 262 | remove_field => ["agent", "host", "input", "log", "host", "ecs", "@version"] 263 | } 264 | } 265 | output { 266 | elasticsearch { 267 | # 268 | # Custom Settings 269 | # 270 | id => "solar-enphase-index" 271 | index => "solar-enphase-%{+YYYY.MM.dd}" 272 | hosts => "${ES_ENDPOINT}" 273 | user => "${ES_USERNAME}" 274 | password => "${ES_PASSWORD}" 275 | } 276 | } 277 | ``` 278 | 279 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 280 | 281 | ```bash 282 | sudo mv solar-enphase-index.yml /etc/logstash/conf.d/ 283 | ``` 284 | 285 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 286 | 287 | ``` 288 | - pipeline.id: "solar-enphase-index" 289 | path.config: "/etc/logstash/conf.d/solar-enphase-index.conf" 290 | ``` 291 | 292 | And finally, restart the Logstash service: 293 | 294 | ```bash 295 | sudo systemctl restart logstash 296 | ``` 297 | 298 | While Logstash is restarting, you can tail its log file in order to see if there are any configuration errors: 299 | 300 | ```bash 301 | sudo tail -f /var/log/logstash/logstash-plain.log 302 | ``` 303 | 304 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 305 | 306 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 307 | 308 | ![Indexing](index.png) 309 | 310 | ## Step #4 - Visualize Data 311 | 312 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 313 | 314 | Download this dashboard: [solar-enphase.ndjson](solar-enphase.ndjson) 315 | 316 | Jump back into Kibana: 317 | 318 | 1. Select "Stack Management" from the menu 319 | 2. Select "Saved Objects" 320 | 3. Click "Import" in the upper right 321 | 322 | Once it's been imported, click on "Solar Enphase". 323 | 324 | ![Dashboard](dashboard.png) 325 | 326 | Congratulations! You should now be looking at data from your Enphase solar system in Elastic.
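If you prefer scripting the import over clicking through the UI, Kibana's Saved Objects API accepts the same NDJSON file. A sketch, assuming Kibana is reachable at a `KIBANA_URL` you define and your Elastic credentials are exported as `ES_USERNAME` / `ES_PASSWORD`:

```bash
# Import the dashboard and its index pattern via the Saved Objects API
curl -u "${ES_USERNAME}:${ES_PASSWORD}" \
  -X POST "${KIBANA_URL}/api/saved_objects/_import?overwrite=true" \
  -H "kbn-xsrf: true" \
  --form file=@solar-enphase.ndjson
```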
327 | -------------------------------------------------------------------------------- /solar-enphase/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/archive.png -------------------------------------------------------------------------------- /solar-enphase/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/dashboard.png -------------------------------------------------------------------------------- /solar-enphase/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/index.png -------------------------------------------------------------------------------- /solar-enphase/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/minio.png -------------------------------------------------------------------------------- /solar-enphase/solar-enphase.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import urllib.request 4 | import urllib.parse 5 | import datetime 6 | 7 | def main(): 8 | # The Enphase API endpoints are detailed here: 9 | # https://developer.enphase.com/docs 10 | # Most of them don't need to be called more than once a day. 11 | 12 | now = datetime.datetime.utcnow() 13 | 14 | # Run once per day 15 | if now.hour == 0 and now.minute == 0: 16 | url = "https://api.enphaseenergy.com/api/v2/systems//energy_lifetime?key=&user_id=" 17 | f = urllib.request.urlopen(url) 18 | print(f.read().decode("utf-8")) 19 | 20 | url = "https://api.enphaseenergy.com/api/v2/systems//inventory?key=&user_id=" 21 | f = urllib.request.urlopen(url) 22 | print(f.read().decode("utf-8")) 23 | 24 | # Run once per hour 25 | if now.minute == 0: 26 | url = "https://api.enphaseenergy.com/api/v2/systems//summary?key=&user_id=" 27 | f = urllib.request.urlopen(url) 28 | print(f.read().decode("utf-8")) 29 | 30 | # Run every 10 minutes 31 | if now.minute % 10 == 0: 32 | # Get the status of each inverter 33 | url = "https://api.enphaseenergy.com/api/v2/systems/inverters_summary_by_envoy_or_site?key=&user_id=&site_id=" 34 | f = urllib.request.urlopen(url) 35 | print(f.read().decode("utf-8")) 36 | 37 | # The `stats` endpoint updates, at most, once every 5 minutes. 38 | # It isn't reliable though, so you can't expect a new reading every 5 minutes. 39 | # Due to this, we'll track all of it and use an enrich lookup in Logstash to 40 | # see if the 5-minute reading was already inserted into Elasticsearch. 41 | # { 42 | # "end_at": 1613239200, 43 | # "devices_reporting": 20, 44 | # "powr": 159, # Average power produced during this interval, measured in Watts. 45 | # "enwh": 13 # Energy produced during this interval, measured in Watt hours. 
46 | # } 47 | url = "https://api.enphaseenergy.com/api/v2/systems//stats?key=&user_id=" 48 | f = urllib.request.urlopen(url) 49 | print(f.read().decode("utf-8")) 50 | 51 | if __name__ == "__main__": 52 | main() 53 | -------------------------------------------------------------------------------- /solar-enphase/solar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/solar-enphase/solar.png -------------------------------------------------------------------------------- /temperature-dht22/README.md: -------------------------------------------------------------------------------- 1 | # Monitoring Temperature with DHT22 2 | 3 | DHT22 4 | 5 | The [DHT22](http://www.adafruit.com/products/385), in a low-cost digital temperature and humidity sensor. It uses a capacitive humidity sensor and a thermistor to measure the surrounding air, and outputs a digital signal on the data pin reporting their values. The [AM2302](https://www.adafruit.com/product/393) is a wired version of this sensor which includes the required [4.7K - 10KΩ](https://raspberrypi.stackexchange.com/questions/12161/do-i-have-to-connect-a-resistor-to-my-dht22-humidity-sensor) resistor. The version by [FTCBlock](https://www.amazon.com/FTCBlock-Temperature-Humidity-Electronic-Practice/dp/B07H2RP26F) comes with GPIO jumpers that don't require a breadboard or soldering. 6 | 7 | We'll use a Python script to query the sensor each minute via a cron job, and redirect the output to a log file. From there, Filebeat will pick it up and send it to Elastic. 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started. 12 | 13 | ## Step #1 - Collect Data 14 | 15 | Install the following Python module: 16 | 17 | ```bash 18 | sudo pip3 install Adafruit_DHT 19 | ``` 20 | 21 | Create a Python script at `~/bin/temperature-dht22.py` with the following contents (adjusting any values as you see fit): 22 | 23 | ```python 24 | #!/usr/bin/env python3 25 | 26 | import Adafruit_DHT 27 | import datetime 28 | import json 29 | import socket 30 | 31 | DHT_SENSOR = Adafruit_DHT.DHT22 32 | DHT_PIN = 4 33 | 34 | if __name__ == "__main__": 35 | humidity, temp_c = Adafruit_DHT.read_retry(DHT_SENSOR, DHT_PIN) 36 | temp_f = (temp_c * 9 / 5) + 32 37 | output = { 38 | "timestamp": datetime.datetime.utcnow().isoformat(), 39 | "host": socket.gethostname(), 40 | "temp_c": float("%2.2f" % temp_c), 41 | "temp_f": float("%2.2f" % temp_f), 42 | "humidity": float("%2.2f" % humidity), 43 | "location": "office", 44 | "source": "DHT22" 45 | } 46 | print(json.dumps(output)) 47 | ``` 48 | 49 | Try running the script from the command line: 50 | 51 | ```bash 52 | chmod a+x ~/bin/temperature-dht22.py 53 | sudo ~/bin/temperature-dht22.py 54 | ``` 55 | 56 | The output should look like the following: 57 | 58 | ```json 59 | {"timestamp": "2021-09-05T12:30:10.436436", "host": "node-19", "temp_c": 21.3, "temp_f": 70.34, "humidity": 60.2, "location": "office", "source": "DHT22"} 60 | ``` 61 | 62 | Once you're able to successfully query the sensor, create a log file for its output: 63 | 64 | ```bash 65 | sudo touch /var/log/temperature-dht22.log 66 | sudo chown ubuntu.ubuntu /var/log/temperature-dht22.log 67 | ``` 68 | 69 | Create a logrotate entry so the log file doesn't grow unbounded: 70 | 71 | ``` 72 | sudo vi /etc/logrotate.d/temperature-dht22 73 | ``` 74 | 75 | Add the following content: 76 | 77 | ``` 78 | /var/log/temperature-dht22.log { 79 | weekly 80 | 
rotate 12 81 | compress 82 | delaycompress 83 | missingok 84 | notifempty 85 | create 644 ubuntu ubuntu 86 | } 87 | ``` 88 | 89 | Add the following entry to your crontab: 90 | 91 | ``` 92 | * * * * * sudo /home/ubuntu/bin/temperature-dht22.py >> /var/log/temperature-dht22.log 2>&1 93 | ``` 94 | 95 | Verify output by tailing the log file for a few minutes: 96 | 97 | ``` 98 | $ tail -f /var/log/temperature-dht22.log 99 | ``` 100 | 101 | If you're seeing output scroll each minute then you are successfully collecting data! 102 | 103 | ## Step #2 - Archive Data 104 | 105 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 106 | 107 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your DHT22 data: 108 | 109 | ```yaml 110 | filebeat.inputs: 111 | - type: log 112 | enabled: true 113 | tags: ["temperature-dht22"] 114 | paths: 115 | - /var/log/temperature-dht22.log 116 | ``` 117 | 118 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 119 | 120 | Restart Filebeat: 121 | 122 | ```bash 123 | sudo systemctl restart filebeat 124 | ``` 125 | 126 | You may want to tail syslog to see if Filebeat restarts without any issues: 127 | 128 | ```bash 129 | tail -f /var/log/syslog | grep filebeat 130 | ``` 131 | 132 | At this point, we should have DHT22 data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the DHT22 data feed. 133 | 134 | Add the following conditional to your `distributor.yml` file: 135 | 136 | ``` 137 | } else if "temperature-dht22" in [tags] { 138 | pipeline { 139 | send_to => ["temperature-dht22-archive"] 140 | } 141 | } 142 | ``` 143 | 144 | Create a Logstash pipeline called `temperature-dht22-archive.yml` with the following contents: 145 | 146 | ``` 147 | input { 148 | pipeline { 149 | address => "temperature-dht22-archive" 150 | } 151 | } 152 | filter { 153 | } 154 | output { 155 | s3 { 156 | # 157 | # Custom Settings 158 | # 159 | prefix => "temperature-dht22/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 160 | temporary_directory => "${S3_TEMP_DIR}/temperature-dht22-archive" 161 | access_key_id => "${S3_ACCESS_KEY}" 162 | secret_access_key => "${S3_SECRET_KEY}" 163 | endpoint => "${S3_ENDPOINT}" 164 | bucket => "${S3_BUCKET}" 165 | 166 | # 167 | # Standard Settings 168 | # 169 | validate_credentials_on_root_bucket => false 170 | codec => json_lines 171 | # Limit Data Lake file sizes to 5 GB 172 | size_file => 5000000000 173 | time_file => 60 174 | # encoding => "gzip" 175 | additional_settings => { 176 | force_path_style => true 177 | follow_redirects => false 178 | } 179 | } 180 | } 181 | ``` 182 | 183 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 184 | 185 | ```bash 186 | sudo mv temperature-dht22-archive.yml /etc/logstash/conf.d/ 187 | ``` 188 | 189 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 190 | 191 | ``` 192 | - pipeline.id: "temperature-dht22-archive" 193 | path.config: "/etc/logstash/conf.d/temperature-dht22-archive.conf" 194 | ``` 195 | 196 | And finally, restart the Logstash service: 197 | 198 | ```bash 199 | sudo systemctl restart logstash 200 | ``` 201 | 202 | While Logstash is restarting, you 
can tail it's log file in order to see if there are any configuration errors: 203 | 204 | ```bash 205 | sudo tail -f /var/log/logstash/logstash-plain.log 206 | ``` 207 | 208 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 209 | 210 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 211 | 212 | ![Stack Monitoring](archive.png) 213 | 214 | Check your S3 bucket to see if you're getting data directories created for the current date & hour with data: 215 | 216 | ![MinIO](minio.png) 217 | 218 | If you see your data being stored, then you are successfully archiving! 219 | 220 | ## Step #3 - Index Data 221 | 222 | Once Logstash is archiving the data, next we need to index it with Elastic. 223 | 224 | We'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. 225 | 226 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), the following filter chain has been built that can parse the raw JSON coming in. 227 | 228 | Create a new pipeline called `temperature-dht22-index.yml` with the following content: 229 | 230 | ``` 231 | input { 232 | pipeline { 233 | address => "temperature-dht22-index" 234 | } 235 | } 236 | filter { 237 | json { 238 | source => "message" 239 | } 240 | json { 241 | source => "message" 242 | } 243 | date { 244 | match => ["timestamp", "ISO8601"] 245 | } 246 | mutate { 247 | remove_field => ["timestamp", "message"] 248 | remove_field => ["tags", "agent", "input", "log", "path", "ecs", "@version"] 249 | } 250 | } 251 | output { 252 | elasticsearch { 253 | # 254 | # Custom Settings 255 | # 256 | id => "temperature-dht22-index" 257 | index => "temperature-dht22-%{+YYYY.MM.dd}" 258 | hosts => "${ES_ENDPOINT}" 259 | user => "${ES_USERNAME}" 260 | password => "${ES_PASSWORD}" 261 | } 262 | } 263 | ``` 264 | 265 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 266 | 267 | ```bash 268 | sudo mv temperature-dht22-index.yml /etc/logstash/conf.d/ 269 | ``` 270 | 271 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 272 | 273 | ``` 274 | - pipeline.id: "temperature-dht22-index" 275 | path.config: "/etc/logstash/conf.d/temperature-dht22-index.conf" 276 | ``` 277 | 278 | Append your new pipeline to your tagged data in the `distributor.yml` pipeline: 279 | 280 | ``` 281 | } else if "temperature-dht22" in [tags] { 282 | pipeline { 283 | send_to => ["temperature-dht22-archive", "temperature-dht22-index"] 284 | } 285 | } 286 | ``` 287 | 288 | And finally, restart the Logstash service: 289 | 290 | ```bash 291 | sudo systemctl restart logstash 292 | ``` 293 | 294 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 295 | 296 | ```bash 297 | sudo tail -f /var/log/logstash/logstash-plain.log 298 | ``` 299 | 300 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 301 | 302 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 303 | 304 | ![Indexing](index.png) 305 | 306 | ## Step #4 - Visualize Data 307 | 308 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 
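Before importing the dashboard, it can help to confirm that documents are actually landing in the daily index. A quick check with `curl`, assuming your Elasticsearch endpoint and credentials are exported in your shell as `ES_ENDPOINT`, `ES_USERNAME`, and `ES_PASSWORD` (the same values referenced by the Logstash pipelines):

```bash
# List the temperature indices and their document counts
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/_cat/indices/temperature-dht22-*?v"

# Peek at a single indexed document
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/temperature-dht22-*/_search?size=1&pretty"
```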
309 | 310 | Download this dashboard: [temperature-dht22.ndjson](temperature-dht22.ndjson) 311 | 312 | Jump back into Kibana: 313 | 314 | 1. Select "Stack Management" from the menu 315 | 2. Select "Saved Objects" 316 | 3. Click "Import" in the upper right 317 | 318 | Once it's been imported, click on "Temperature DHT22". 319 | 320 | ![Dashboard](dashboard.png) 321 | 322 | Congratulations! You should now be looking at temperature data from your DHT22 in Elastic. 323 | 324 | -------------------------------------------------------------------------------- /temperature-dht22/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/archive.png -------------------------------------------------------------------------------- /temperature-dht22/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/dashboard.png -------------------------------------------------------------------------------- /temperature-dht22/dht22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/dht22.png -------------------------------------------------------------------------------- /temperature-dht22/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/index.png -------------------------------------------------------------------------------- /temperature-dht22/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/temperature-dht22/minio.png -------------------------------------------------------------------------------- /temperature-dht22/temperature-dht22.ndjson: -------------------------------------------------------------------------------- 1 | {"attributes":{"fieldAttrs":"{}","fields":"[]","runtimeFieldMap":"{}","timeFieldName":"@timestamp","title":"temperature-dht22-*","typeMeta":"{}"},"coreMigrationVersion":"7.14.0","id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","migrationVersion":{"index-pattern":"7.11.0"},"references":[],"type":"index-pattern","updated_at":"2021-09-06T07:18:21.789Z","version":"WzM0OTQ3NywyXQ=="} 2 | {"attributes":{"description":"","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"optionsJSON":"{\"useMargins\":true,\"syncColors\":false,\"hidePanelTitles\":false}","panelsJSON":"[{\"version\":\"7.14.0\",\"type\":\"visualization\",\"gridData\":{\"x\":0,\"y\":0,\"w\":20,\"h\":6,\"i\":\"bfa04254-31c1-429b-971d-921b5d6c33b1\"},\"panelIndex\":\"bfa04254-31c1-429b-971d-921b5d6c33b1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"\",\"description\":\"\",\"type\":\"markdown\",\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Temperature & Humidity\\nby 
DHT22\"},\"uiState\":{},\"data\":{\"aggs\":[],\"searchSource\":{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}}},\"enhancements\":{}}},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":20,\"y\":0,\"w\":28,\"h\":6,\"i\":\"6bf93313-05d1-4657-bf19-2ac871b74009\"},\"panelIndex\":\"6bf93313-05d1-4657-bf19-2ac871b74009\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"eee22f36-699b-4bcf-b6a8-a1efaefa5549\":{\"columns\":{\"c8b44acf-fb21-4a7a-8d72-f212143dc087\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"7658a405-61eb-48aa-85c0-92254a942779\":{\"label\":\"Count of records\",\"dataType\":\"number\",\"operationType\":\"count\",\"isBucketed\":false,\"scale\":\"ratio\",\"sourceField\":\"Records\"}},\"columnOrder\":[\"c8b44acf-fb21-4a7a-8d72-f212143dc087\",\"7658a405-61eb-48aa-85c0-92254a942779\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"bar_stacked\",\"layers\":[{\"layerId\":\"eee22f36-699b-4bcf-b6a8-a1efaefa5549\",\"accessors\":[\"7658a405-61eb-48aa-85c0-92254a942779\"],\"position\":\"top\",\"seriesType\":\"bar_stacked\",\"showGridlines\":false,\"xAccessor\":\"c8b44acf-fb21-4a7a-8d72-f212143dc087\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-eee22f36-699b-4bcf-b6a8-a1efaefa5549\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Logs\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":6,\"w\":48,\"h\":10,\"i\":\"b2634629-b584-4b60-99d2-574db7c2d576\"},\"panelIndex\":\"b2634629-b584-4b60-99d2-574db7c2d576\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"ddd9b678-b4d2-4c07-8715-7338bc709326\":{\"columns\":{\"5f615941-0d1f-4e75-b85d-37a5de33199d\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\":{\"label\":\"Median of temp_f\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"temp_f\",\"isBucketed\":false,\"scale\":\"ratio\"},\"8b38e615-3eac-4a81-9da9-955d50ffb348\":{\"label\":\"Top values of 
host.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"host.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"8b38e615-3eac-4a81-9da9-955d50ffb348\",\"5f615941-0d1f-4e75-b85d-37a5de33199d\",\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"ddd9b678-b4d2-4c07-8715-7338bc709326\",\"accessors\":[\"7a7f6240-bd2d-4782-bc0e-68f5ddfe61b6\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"5f615941-0d1f-4e75-b85d-37a5de33199d\",\"splitAccessor\":\"8b38e615-3eac-4a81-9da9-955d50ffb348\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-ddd9b678-b4d2-4c07-8715-7338bc709326\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Temperature\"},{\"version\":\"7.14.0\",\"type\":\"lens\",\"gridData\":{\"x\":0,\"y\":16,\"w\":48,\"h\":9,\"i\":\"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619\"},\"panelIndex\":\"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619\",\"embeddableConfig\":{\"attributes\":{\"title\":\"\",\"type\":\"lens\",\"visualizationType\":\"lnsXY\",\"state\":{\"datasourceStates\":{\"indexpattern\":{\"layers\":{\"9436d677-5ba4-4307-838b-53d734ad969d\":{\"columns\":{\"8c427cf5-b467-4497-b49a-ffe256a862d4\":{\"label\":\"@timestamp\",\"dataType\":\"date\",\"operationType\":\"date_histogram\",\"sourceField\":\"@timestamp\",\"isBucketed\":true,\"scale\":\"interval\",\"params\":{\"interval\":\"auto\"}},\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\":{\"label\":\"Median of humidity\",\"dataType\":\"number\",\"operationType\":\"median\",\"sourceField\":\"humidity\",\"isBucketed\":false,\"scale\":\"ratio\"},\"eac3c026-b26a-4384-9020-d6eae264660d\":{\"label\":\"Top values of 
host.keyword\",\"dataType\":\"string\",\"operationType\":\"terms\",\"scale\":\"ordinal\",\"sourceField\":\"host.keyword\",\"isBucketed\":true,\"params\":{\"size\":3,\"orderBy\":{\"type\":\"column\",\"columnId\":\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"},\"orderDirection\":\"desc\",\"otherBucket\":true,\"missingBucket\":false}}},\"columnOrder\":[\"eac3c026-b26a-4384-9020-d6eae264660d\",\"8c427cf5-b467-4497-b49a-ffe256a862d4\",\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"],\"incompleteColumns\":{}}}}},\"visualization\":{\"legend\":{\"isVisible\":true,\"position\":\"right\"},\"valueLabels\":\"hide\",\"fittingFunction\":\"None\",\"curveType\":\"CURVE_MONOTONE_X\",\"valuesInLegend\":true,\"yLeftExtent\":{\"mode\":\"full\"},\"yRightExtent\":{\"mode\":\"full\"},\"axisTitlesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"tickLabelsVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"gridlinesVisibilitySettings\":{\"x\":true,\"yLeft\":true,\"yRight\":true},\"preferredSeriesType\":\"line\",\"layers\":[{\"layerId\":\"9436d677-5ba4-4307-838b-53d734ad969d\",\"accessors\":[\"c8c24ecc-082a-4eeb-a039-aef253a1a9de\"],\"position\":\"top\",\"seriesType\":\"line\",\"showGridlines\":false,\"xAccessor\":\"8c427cf5-b467-4497-b49a-ffe256a862d4\",\"splitAccessor\":\"eac3c026-b26a-4384-9020-d6eae264660d\"}]},\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filters\":[]},\"references\":[{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-current-indexpattern\"},{\"type\":\"index-pattern\",\"id\":\"9dff6140-0ee2-11ec-b03a-7d8df502f497\",\"name\":\"indexpattern-datasource-layer-9436d677-5ba4-4307-838b-53d734ad969d\"}]},\"hidePanelTitles\":false,\"enhancements\":{}},\"title\":\"Humidity\"}]","timeRestore":false,"title":"Temperature DHT22","version":1},"coreMigrationVersion":"7.14.0","id":"27df3f70-0ee3-11ec-b03a-7d8df502f497","migrationVersion":{"dashboard":"7.14.0"},"references":[{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"6bf93313-05d1-4657-bf19-2ac871b74009:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"6bf93313-05d1-4657-bf19-2ac871b74009:indexpattern-datasource-layer-eee22f36-699b-4bcf-b6a8-a1efaefa5549","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"b2634629-b584-4b60-99d2-574db7c2d576:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"b2634629-b584-4b60-99d2-574db7c2d576:indexpattern-datasource-layer-ddd9b678-b4d2-4c07-8715-7338bc709326","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619:indexpattern-datasource-current-indexpattern","type":"index-pattern"},{"id":"9dff6140-0ee2-11ec-b03a-7d8df502f497","name":"5c2cc2a8-b6ae-4362-bbc4-df113d3e4619:indexpattern-datasource-layer-9436d677-5ba4-4307-838b-53d734ad969d","type":"index-pattern"}],"type":"dashboard","updated_at":"2021-09-06T07:22:13.101Z","version":"WzM0OTY3MywyXQ=="} 3 | {"excludedObjects":[],"excludedObjectsCount":0,"exportedCount":2,"missingRefCount":0,"missingReferences":[]} -------------------------------------------------------------------------------- /utilization/2-archive/utilization-archive.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "utilization-archive" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | s3 { 10 | # 11 | # Custom 
Settings 12 | # 13 | prefix => "utilization/${S3_DATE_DIR}" 14 | temporary_directory => "${S3_TEMP_DIR}/utilization-archive" 15 | access_key_id => "${S3_ACCESS_KEY}" 16 | secret_access_key => "${S3_SECRET_KEY}" 17 | endpoint => "${S3_ENDPOINT}" 18 | bucket => "${S3_BUCKET}" 19 | 20 | # 21 | # Standard Settings 22 | # 23 | validate_credentials_on_root_bucket => false 24 | codec => json_lines 25 | # Limit Data Lake file sizes to 5 GB 26 | size_file => 5000000000 27 | time_file => 1 28 | # encoding => "gzip" 29 | additional_settings => { 30 | force_path_style => true 31 | follow_redirects => false 32 | } 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /utilization/2-archive/utilization-reindex.yml: -------------------------------------------------------------------------------- 1 | input { 2 | s3 { 3 | # 4 | # Custom Settings 5 | # 6 | prefix => "utilization/2021-01-04" 7 | temporary_directory => "${S3_TEMP_DIR}/utilization-reindex" 8 | access_key_id => "${S3_ACCESS_KEY}" 9 | secret_access_key => "${S3_SECRET_KEY}" 10 | endpoint => "${S3_ENDPOINT}" 11 | bucket => "${S3_BUCKET}" 12 | 13 | # 14 | # Standard Settings 15 | # 16 | watch_for_new_files => false 17 | codec => json_lines 18 | additional_settings => { 19 | force_path_style => true 20 | follow_redirects => false 21 | } 22 | } 23 | } 24 | filter { 25 | } 26 | output { 27 | pipeline { send_to => "utilization-structure" } 28 | } 29 | -------------------------------------------------------------------------------- /utilization/2-archive/utilization-structure.yml: -------------------------------------------------------------------------------- 1 | input { 2 | pipeline { 3 | address => "utilization-structure" 4 | } 5 | } 6 | filter { 7 | } 8 | output { 9 | elasticsearch { 10 | # 11 | # Custom Settings 12 | # 13 | id => "utilization-structure" 14 | index => "utilization" 15 | hosts => "${ES_ENDPOINT}" 16 | user => "${ES_USERNAME}" 17 | password => "${ES_PASSWORD}" 18 | } 19 | } 20 | -------------------------------------------------------------------------------- /weather-station/README.md: -------------------------------------------------------------------------------- 1 | # Weather Station 2 | 3 | weather-station 4 | 5 | The [WS-1550-IP](https://ambientweather.com/amws1500.html) from Ambient Weather is a great amatuer weather station. The station itself is powered by solar with AA battery backup, it's relatively maintenance free, and it's a joy to observe in action. It communicates with a base station via 915 MHz that requires no setup. You can also add up to 8 additional sensors to collect temperature from various points within range, all wirelessly. The base station connects to the Internet via a hard-wired RJ-45 connection on your network, that it uses to upload the data it collects to Ambient Weather's free service. From there, you can query an [API](https://ambientweather.docs.apiary.io/#) to get your latest, hyper-local, weather. 6 | 7 | In this data source, we'll build the following dashboard with Elastic: 8 | 9 | ![Dashboard](dashboard.png) 10 | 11 | Let's get started! 
12 | 13 | ## Step #1 - Collect Data 14 | 15 | Create a new python script called `~/bin/weather-station.py` with the following contents: 16 | 17 | ```python 18 | #!/usr/bin/env python3 19 | 20 | import urllib.request 21 | import urllib.parse 22 | 23 | api_key = '' 24 | app_key = '' 25 | 26 | url = "https://api.ambientweather.net/v1/devices?apiKey=%s&applicationKey=%s" % (api_key, app_key) 27 | 28 | try: 29 | f = urllib.request.urlopen(url) 30 | except urllib.error.HTTPError as e: 31 | # Return code error (e.g. 404, 501, ...) 32 | print('[{"lastData": {"http_code": %s}}]' % (e.code)) 33 | except urllib.error.URLError as e: 34 | # Not an HTTP-specific error (e.g. connection refused) 35 | print('[{"lastData": {"http_error": "%s"}}]' % (e.reason)) 36 | else: 37 | # 200 38 | print(f.read().decode("utf-8")) 39 | ``` 40 | 41 | Enter your API key and Application key from the Ambient Weather service. 42 | 43 | This script queries the Ambient Weather API using your API key and Application key. It then prints the response to `stdout`. Once we've confirmed the script works, we'll redirect `stdout` to a log file. 44 | 45 | Try running the script: 46 | 47 | ```bash 48 | chmod a+x ~/bin/weather-station.py 49 | ~/bin/weather-station.py 50 | ``` 51 | 52 | You should see output similar to: 53 | 54 | ```json 55 | [{"macAddress":"00:0E:C6:20:0F:7B","lastData":{"dateutc":1630076460000,"winddir":186,"windspeedmph":0.22,"windgustmph":1.12,"maxdailygust":4.47,"tempf":82.4,"battout":1,"humidity":69,"hourlyrainin":0,"eventrainin":0,"dailyrainin":0,"weeklyrainin":1.22,"monthlyrainin":5.03,"yearlyrainin":21.34,"totalrainin":21.34,"tempinf":73.4,"battin":1,"humidityin":62, ... 56 | ``` 57 | 58 | Once you confirm the script is working, you can redirect its output to a log file: 59 | 60 | ```bash 61 | sudo touch /var/log/weather-station.log 62 | sudo chown ubuntu.ubuntu /var/log/weather-station.log 63 | ``` 64 | 65 | Create a logrotate entry so the log file doesn't grow unbounded: 66 | 67 | ```bash 68 | sudo vi /etc/logrotate.d/weather-station 69 | ``` 70 | 71 | Add the following logrotate content: 72 | 73 | ``` 74 | /var/log/weather-station.log { 75 | weekly 76 | rotate 12 77 | compress 78 | delaycompress 79 | missingok 80 | notifempty 81 | create 644 ubuntu ubuntu 82 | } 83 | ``` 84 | 85 | Add the following entry to your crontab with `crontab -e`: 86 | 87 | ``` 88 | * * * * * /home/ubuntu/bin/weather-station.py >> /var/log/weather-station.log 2>&1 89 | ``` 90 | 91 | Verify output by tailing the log file for a few minutes (since cron is only running the script at the start of each minute): 92 | 93 | ```bash 94 | tail -f /var/log/weather-station.log 95 | ``` 96 | 97 | If you're seeing output scroll each minute then you are successfully collecting data! 98 | 99 | ## Step #2 - Archive Data 100 | 101 | Once your data is ready to archive, we'll use Filebeat to send it to Logstash which will in turn sends it to S3. 102 | 103 | Add the following to the Filebeat config `/etc/filebeat/filebeat.yml` on the host logging your weather data: 104 | 105 | ```yaml 106 | filebeat.inputs: 107 | - type: log 108 | enabled: true 109 | tags: ["weather-station"] 110 | paths: 111 | - /var/log/weather-station.log 112 | ``` 113 | 114 | This tells Filebeat where the log file is located and it adds a tag to each event. We'll refer to that tag in Logstash so we can easily isolate events from this data stream. 
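For reference, the tag ends up in each event's `tags` array alongside the raw API response in `message`; that `tags` field is what the `distributor` conditional below keys off of. A trimmed, illustrative event (the field values here are made up) looks roughly like:

```json
{
  "@timestamp": "2021-09-06T12:00:01.000Z",
  "message": "[{\"macAddress\":\"...\",\"lastData\":{\"tempf\":82.4}}]",
  "tags": ["weather-station"],
  "agent": { "type": "filebeat" },
  "log": { "file": { "path": "/var/log/weather-station.log" } }
}
```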
115 | 116 | Restart Filebeat: 117 | 118 | ```bash 119 | sudo systemctl restart filebeat 120 | ``` 121 | 122 | You may want to tail syslog to see if Filebeat restarts without any issues: 123 | 124 | ```bash 125 | tail -f /var/log/syslog | grep filebeat 126 | ``` 127 | 128 | At this point, we should have weather-station data flowing into Logstash. By default however, our `distributor` pipeline in Logstash will put any unrecognized data in our Data Lake / S3 bucket called `NEEDS_CLASSIFIED`. To change this, we're going to update the `distributor` pipeline to recognize the weather station data feed. 129 | 130 | Add the following conditional to your `distributor.yml` file: 131 | 132 | ``` 133 | } else if "weather-station" in [tags] { 134 | pipeline { 135 | send_to => ["weather-station-archive"] 136 | } 137 | } 138 | ``` 139 | 140 | Create a Logstash pipeline called `weather-station-archive.yml` with the following contents: 141 | 142 | ``` 143 | input { 144 | pipeline { 145 | address => "weather-station-archive" 146 | } 147 | } 148 | filter { 149 | } 150 | output { 151 | s3 { 152 | # 153 | # Custom Settings 154 | # 155 | prefix => "weather-station/%{+YYYY}-%{+MM}-%{+dd}/%{+HH}" 156 | temporary_directory => "${S3_TEMP_DIR}/weather-station-archive" 157 | access_key_id => "${S3_ACCESS_KEY}" 158 | secret_access_key => "${S3_SECRET_KEY}" 159 | endpoint => "${S3_ENDPOINT}" 160 | bucket => "${S3_BUCKET}" 161 | 162 | # 163 | # Standard Settings 164 | # 165 | validate_credentials_on_root_bucket => false 166 | codec => json_lines 167 | # Limit Data Lake file sizes to 5 GB 168 | size_file => 5000000000 169 | time_file => 60 170 | # encoding => "gzip" 171 | additional_settings => { 172 | force_path_style => true 173 | follow_redirects => false 174 | } 175 | } 176 | } 177 | ``` 178 | 179 | Put this pipeline in your Logstash configuration directory so it gets loaded whenever Logstash restarts: 180 | 181 | ```bash 182 | sudo mv weather-station-archive.yml /etc/logstash/conf.d/ 183 | ``` 184 | 185 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 186 | 187 | ``` 188 | - pipeline.id: "weather-station-archive" 189 | path.config: "/etc/logstash/conf.d/weather-station-archive.conf" 190 | ``` 191 | 192 | And finally, restart the Logstash service: 193 | 194 | ```bash 195 | sudo systemctl restart logstash 196 | ``` 197 | 198 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 199 | 200 | ```bash 201 | sudo tail -f /var/log/logstash/logstash-plain.log 202 | ``` 203 | 204 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 205 | 206 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 207 | 208 | ![Archive](archive.png) 209 | 210 | Check your S3 bucket to see if you're getting data directories created each minute for the current date & hour with data: 211 | 212 | ![Minio](minio.png) 213 | 214 | If you see your data being stored, then you are successfully archiving! 215 | 216 | ## Step #3 - Index Data 217 | 218 | Once Logstash is archiving the data, next we need to index it with Elastic. 219 | 220 | Jump into Kibana and open Dev Tools. 
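If you've run through this setup before, you may first want to check whether a template by this name already exists (the request returns a 404 if it does not). For example, with `curl` and the same endpoint variables the Logstash pipelines use:

```bash
# Check for an existing index template named weather-station
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" "${ES_ENDPOINT}/_index_template/weather-station?pretty"
```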
221 | 222 | Copy and paste the following content into Dev Tools to create an Index Template for our weather station data: 223 | 224 | ``` 225 | PUT _index_template/weather-station 226 | { 227 | "index_patterns": [ 228 | "weather-station-*" 229 | ], 230 | "template": { 231 | "mappings": { 232 | "dynamic_templates": [ 233 | { 234 | "integers": { 235 | "match_mapping_type": "long", 236 | "mapping": { 237 | "type": "float" 238 | } 239 | } 240 | } 241 | ], 242 | "properties": { 243 | "info.coords.geo.coordinates": { 244 | "type": "geo_point" 245 | } 246 | } 247 | } 248 | } 249 | } 250 | ``` 251 | 252 | For the most part, we'll use Elastic's [Dynamic field mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html) feature to automatically create the right [Field data types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) for the data we're sending in. The exceptions here are the latitude & longitude of the weather station and the coercion of any `long` values into `float` values. First, for the latitude & longitude, we need to explicility tell Elasticsearch that this is a [`geo_point`](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html) type so that we can plot it on a map. If you start to track multiple weather stations in Elastic, plotting their locations on a map is very useful. Second, to prevent any values that happen to first come in as a whole number from determining the mapping type, we set a [`dynamic_template`](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html) to convert any `long` values into `float` values. 253 | 254 | Now, switch back to a terminal so we can create the Logstash pipeline to index the weather station data. 255 | 256 | Using the [Logstash Toolkit](http://github.com/gose/logstash-toolkit), I iteratively built the following filter chain that can parse the raw JSON coming in. 257 | 258 | Create a new pipeline called `weather-station-index.yml` with the following content: 259 | 260 | ``` 261 | input { 262 | pipeline { 263 | address => "weather-station-index" 264 | } 265 | } 266 | filter { 267 | if [message] =~ /^\[/ { 268 | json { 269 | source => "message" 270 | target => "tmp" 271 | } 272 | } else { 273 | drop { } 274 | } 275 | if "_jsonparsefailure" in [tags] { 276 | drop { } 277 | } 278 | mutate { 279 | remove_field => ["message"] 280 | } 281 | mutate { 282 | add_field => { 283 | "message" => "%{[tmp][0]}" 284 | } 285 | } 286 | json { 287 | source => "message" 288 | } 289 | ruby { 290 | # Promote the keys inside lastData to root, then remove lastData 291 | code => ' 292 | event.get("lastData").each { |k, v| 293 | event.set(k,v) 294 | } 295 | event.remove("lastData") 296 | ' 297 | } 298 | date { 299 | match => ["date", "ISO8601"] 300 | } 301 | mutate { 302 | remove_field => ["message", "tmp", "path", "host", "macAddress", "date"] 303 | } 304 | } 305 | output { 306 | elasticsearch { 307 | # 308 | # Custom Settings 309 | # 310 | id => "weather-station-index" 311 | index => "weather-station-%{+YYYY.MM.dd}" 312 | hosts => "${ES_ENDPOINT}" 313 | user => "${ES_USERNAME}" 314 | password => "${ES_PASSWORD}" 315 | } 316 | } 317 | ``` 318 | 319 | This filter chain structures the raw data into a format that allows us to easily use Elastic's dynamic mapping feature. 320 | 321 | For the most part, we use the raw field names as provided to us by the Ambient Weather service. 
You can rename the raw field names to something more descriptive if you'd like, but then you'll also need to adjust the Dashboard provided in Step #4 to point to your field names. 322 | 323 | Ambient Weather provides a list of the units on each of their field names in the [Device Data Specs](https://github.com/ambient-weather/api-docs/wiki/Device-Data-Specs). 324 | 325 | Put this pipeline in your Logstash configuration directory so it gets loaded in whenever Logstash restarts: 326 | 327 | ```bash 328 | sudo mv weather-station-index.yml /etc/logstash/conf.d/ 329 | ``` 330 | 331 | Add the pipeline to your `/etc/logstash/pipelines.yml` file: 332 | 333 | ``` 334 | - pipeline.id: "weather-station-index" 335 | path.config: "/etc/logstash/conf.d/weather-station-index.conf" 336 | ``` 337 | 338 | And finally, restart the Logstash service: 339 | 340 | ```bash 341 | sudo systemctl restart logstash 342 | ``` 343 | 344 | While Logstash is restarting, you can tail it's log file in order to see if there are any configuration errors: 345 | 346 | ```bash 347 | sudo tail -f /var/log/logstash/logstash-plain.log 348 | ``` 349 | 350 | After a few seconds, you should see Logstash shutdown and start with the new pipeline and no errors being emitted. 351 | 352 | Check your cluster's Stack Monitoring to see if we're getting events through the pipeline: 353 | 354 | ![Stack Monitoring](index.png) 355 | 356 | ## Step #4 - Visualize Data 357 | 358 | Once Elasticsearch is indexing the data, we want to visualize it in Kibana. 359 | 360 | Download this dashboard: 361 | 362 | ​ [weather-station.ndjson](weather-station.ndjson) 363 | 364 | Jump into Kibana: 365 | 366 | 1. Select "Stack Management" from the menu 367 | 2. Select "Saved Objects" 368 | 3. Click "Import" in the upper right 369 | 370 | Once it's been imported, click on "Weather Station". 371 | 372 | Congratulations! You should now be looking at data from your weather station in Elastic. 
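If you later plot the station's location on a map and nothing shows up, it's worth confirming that the index template above was applied before the first documents arrived, so the coordinates were mapped as `geo_point` rather than plain numbers. You can inspect the live mapping with `curl` (same endpoint variables as the Logstash pipelines):

```bash
# Inspect how the station coordinates were mapped
curl -s -u "${ES_USERNAME}:${ES_PASSWORD}" \
  "${ES_ENDPOINT}/weather-station-*/_mapping/field/info.coords.geo.coordinates?pretty"
```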
373 | 374 | dashboard -------------------------------------------------------------------------------- /weather-station/archive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/archive.png -------------------------------------------------------------------------------- /weather-station/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/dashboard.png -------------------------------------------------------------------------------- /weather-station/index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/index.png -------------------------------------------------------------------------------- /weather-station/minio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/minio.png -------------------------------------------------------------------------------- /weather-station/ws-1550-ip.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gose/elastic-data-lake/588bb1b4bab4cf10970e2e0562c11604205d4e3e/weather-station/ws-1550-ip.png --------------------------------------------------------------------------------