├── .gitignore
├── CHANGES
├── LICENSE
├── Readme.md
├── docker
│   ├── .env
│   ├── data
│   │   └── .empty
│   ├── docker-compose.yml
│   └── zeek2es
│       ├── Dockerfile
│       └── entrypoint.sh
├── images
│   ├── kibana-aggregation.png
│   ├── kibana-map.png
│   ├── kibana-subnet-search.png
│   ├── kibana-timeseries.png
│   ├── kibana.png
│   └── multi-log-correlation.png
├── process_log.sh
├── process_logs_as_datastream.sh
├── process_logs_to_stdout.sh
├── setup.py
└── zeek2es.py
/.gitignore:
--------------------------------------------------------------------------------
1 | build/
2 | *.so
3 | *.c
4 | .DS_Store
5 | docker/data
6 |
--------------------------------------------------------------------------------
/CHANGES:
--------------------------------------------------------------------------------
1 | v0.3.15 Improved Humio import.
2 | v0.3.14 Removed a print statement.
3 | v0.3.13 Fixed some errors on Humio import.
4 | v0.3.12 Will continue to populate data after a Humio error.
5 | v0.3.11 Added Humio support.
6 | v0.3.10 Improved Docker components.
7 | v0.3.9 Fixed a variable check when there is no output.
8 | v0.3.8 Fixed up some minor issues with JSON stdout output.
9 | v0.3.7 Added Docker pieces.
10 | v0.3.6 Fixed a bug with the slash on the end of the ES url option.
11 | v0.3.5 Removed need for trailing slash on ES URL.
12 | v0.3.4 Made datastream names consistent with ES expectations if -d is used without an index name.
13 | v0.3.3 Added best compression option and fixed helper script.
14 | v0.3.2 Fixed a bug with a grep command.
15 | v0.3.1 Added more logic to make ready for Elastic v8.
16 | v0.3.0 Added filtering on keys. Cleaned up some argparse logic, breaking previous command lines.
17 | v0.2.20 Fix wording.
18 | v0.2.19 Fix a bug in a helper script.
19 | v0.2.18 Added the -p command line argument to split additional fields.
20 | v0.2.17 Fixed various things in the help scripts. Refactor.
21 | v0.2.16 Fixed a typo in a helper script.
22 | v0.2.15 Refactor helper script.
23 | v0.2.14 Added a fswatch helper script.
24 | v0.2.13 Refactored the helper script.
25 | v0.2.12 Added a supporting shell script for data streams.
26 | v0.2.11 Fixed a mapping issue with data streams.
27 | v0.2.10 Fixed help screen output.
28 | v0.2.9 Added hashdates option to use random hashes instead of dates in indices.
29 | v0.2.8 Added lifecycle policy for shard size rollover.
30 | v0.2.7 Added data stream capability.
31 | v0.2.6 Added capability to output only certain fields.
32 | v0.2.5 Added Cython and Python lambda filtering capabilities.
33 | v0.2.4 Added error checking for empty field.
34 | v0.2.3 Added keyword sub field capabilities with -k option.
35 | Added more documentation to readme.
36 | v0.2.2 Added a split ingest pipeline on the "service" field.
37 | v0.2.1 Added ES pipeline capability, which allows for Geolocation on IP addresses.
38 | v0.2.0 Removed some index checking, made indices on log type and day to
39 | reduce the number of open indices. Remove state documents.
40 | Other odds and ends. Added @timestamp for ease.
41 | v0.1.16 Added JSON input support with -j.
42 | v0.1.15 Fix a bug with timezone translation.
43 | v0.1.14 Add timezone support.
44 | v0.1.13 Tune down the -l parameter.
45 | v0.1.12 Added origtime command line option.
46 | v0.1.11 Improvements to processing speed.
47 | v0.1.10 Add option to keep original times.
48 | v0.1.9 Remove stderr output from zeek-cut.
49 | v0.1.8 Added system name to log, if available.
50 | v0.1.7 Improved index name generation.
51 | v0.1.6 Get date from log rather than path.
52 | v0.1.5 Added more debug output.
53 | v0.1.4 Added some error checking.
54 | v0.1.3 Added number of items processed to state document.
55 | v0.1.2 Added state information and --checkstate command line option.
56 | v0.1.1 Added file name to JSON documents.
57 | v0.1.0 Initial release.
58 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2021, Corelight, Inc. All rights reserved.
2 |
3 | Redistribution and use in source and binary forms, with or without
4 | modification, are permitted provided that the following conditions are
5 | met:
6 |
7 | (1) Redistributions of source code must retain the above copyright
8 | notice, this list of conditions and the following disclaimer.
9 |
10 | (2) Redistributions in binary form must reproduce the above copyright
11 | notice, this list of conditions and the following disclaimer in
12 | the documentation and/or other materials provided with the
13 | distribution.
14 |
15 | (3) Neither the name of Corelight nor the names of any contributors
16 | may be used to endorse or promote products derived from this
17 | software without specific prior written permission.
18 |
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 |
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # zeek2es.py
2 |
3 | This Python application translates [Zeek's](https://zeek.org/) ASCII TSV and JSON
4 | logs into [ElasticSearch's bulk load JSON format](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html#add-multiple-documents).
5 |
6 | ## Table of Contents:
7 | - [Introduction](#introduction)
8 | - [Installation](#installation)
9 | - [Elastic v8.0+](#elastic80)
10 | - [Docker](#docker)
11 | - [Upgrading zeek2es](#upgradingzeek2es)
12 | - [ES Ingest Pipeline](#esingestpipeline)
13 | - [Filtering Data](#filteringdata)
14 | - [Python Filters](#pythonfilters)
15 | - [Filter on Keys](#filteronkeys)
16 | - [Command Line Examples](#commandlineexamples)
17 | - [Command Line Options](#commandlineoptions)
18 | - [Requirements](#requirements)
19 | - [Notes](#notes)
20 | - [Humio](#humio)
21 | - [JSON Log Input](#jsonloginput)
22 | - [Data Streams](#datastreams)
23 | - [Helper Scripts](#helperscripts)
24 | - [Cython](#cython)
25 |
26 | ## Introduction
27 |
28 | ![Kibana](images/kibana.png)
29 |
30 | Want to see multiple Zeek logs for the same connection ID (uid)
31 | or file ID (fuid)? Here are the hits from files.log, http.log, and
32 | conn.log for a single uid:
33 |
34 | ![Multi-log correlation](images/multi-log-correlation.png)
35 |
36 | You can perform subnet searching on Zeek's 'addr' type:
37 |
38 | ![Subnet search](images/kibana-subnet-search.png)
39 |
40 | You can create time series graphs, such as this NTP and HTTP graph:
41 |
42 | ![Time series](images/kibana-timeseries.png)
43 |
44 | IP Addresses can be Geolocated with the `-g` command line option:
45 |
46 | ![Geolocation map](images/kibana-map.png)
47 |
48 | Aggregations are simple and quick:
49 |
50 | ![Aggregation](images/kibana-aggregation.png)
51 |
52 | This application will "just work" when Zeek log formats change. The logic reads
53 | the field names and associated types to set up the mappings correctly in
54 | ElasticSearch.
55 |
56 | This application will recognize gzip or uncompressed logs. It assumes
57 | you have ElasticSearch set up on your localhost at the default port.
58 | If you do not have ElasticSearch you can output the JSON to stdout with the `-s -b` command line options
59 | to process with the [jq application](https://stedolan.github.io/jq).
60 |
61 | You can add a keyword subfield to text fields with the `-k` command line option. This is useful
62 | for aggregations in Kibana.
63 |
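For example, this hedged invocation adds keyword subfields to the `service` and `history` text fields of a connection log (the field list is illustrative):

```
python zeek2es.py conn.log.gz -k service history
```
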
64 | If Python is already on your system, there is nothing more to copy to
65 | your machine than [Elasticsearch, Kibana](https://www.elastic.co/start) and [zeek2es.py](zeek2es.py),
66 | provided you already have the [requests](https://docs.python-requests.org/en/latest/) library installed.
67 |
68 | ## Installation
69 |
70 | Assuming you meet the [requirements](#requirements), there is none. You just
71 | copy [zeek2es.py](zeek2es.py) to your host and run it with Python. Once Zeek
72 | logs have been imported with automatic index name generation (meaning, you did not supply the `-i` option)
73 | you will find your indices named "zeek_`zeeklogname`_`date`", where `zeeklogname` is a log name like `conn`
74 | and the `date` is in `YYYY-MM-DD` format. Set your Kibana index pattern to match `zeek*` in this case. If
75 | you named your index with the `-i` option, you will need to create a Kibana index pattern that
76 | matches your naming scheme.
77 |
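For example, a conn.log dated Dec 31, 2021 imported without `-i` would produce an index named:

```
zeek_conn_2021-12-31
```
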
78 | If you are upgrading zeek2es, please see [the section on upgrading zeek2es](#upgradingzeek2es).
79 |
80 | ### Elastic v8.0+
81 |
82 | If you are using Elastic v8.0+, it has security enabled by default. This adds a requirement of a username
83 | and password, plus HTTPS.
84 |
85 | If you want to be able to delete indices/data streams with wildcards (as examples in this readme show),
86 | edit `elasticsearch.yml` with the following line:
87 |
88 | ```
89 | action.destructive_requires_name: false
90 | ```
91 |
92 | You will also need to change the curl commands in this readme to contain `-k -u elastic:`
93 | where the `elastic` user's password is set with a command like the following:
94 |
95 | ```
96 | ./bin/elasticsearch-reset-password -u elastic -i
97 | ```
98 |
99 | You can use `zeek2es.py` with the `--user` and `--passwd` command line options to specify your
100 | credentials to ES. You can also supply these options via the extra command line arguments for the helper
101 | scripts.
102 |
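A minimal sketch of such an invocation, assuming the default `elastic` user (substitute your own password):

```
python zeek2es.py conn.log.gz -u https://localhost:9200 --user elastic --passwd yourpassword
```
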
103 | ### Docker
104 |
105 | Probably the easiest way to use this code is through Docker. All of the files are in the `docker` directory.
106 | First, you will want to edit the lines with `CHANGEME!!!` in the `.env` file to fit your environment.
107 | You will also need to edit the Elastic password in `docker/zeek2es/entrypoint.sh` to match. It can be found after the `--passwd` option.
108 | Next, you can change directory into the `docker` directory and type the following commands to bring
109 | up a zeek2es and Elasticsearch cluster:
110 |
111 | ```
112 | docker-compose build
113 | docker-compose up
114 | ```
115 |
116 | Now you can put logs in the `VOLUME_MOUNT/data/logs` directory (the `VOLUME_MOUNT` you set in the `.env` file).
117 | When logs are CREATED in this directory, zeek2es will begin processing them and pushing them into Elasticsearch.
118 | You can then log in to https://localhost:5601 with the username and password you set up in the `.env` file.
119 | By default there is a self-signed certificate, but you can change that if you edit the docker compose files. Once inside
120 | Kibana you will go to Stack Management->Data Views and create a data view for `logs*` with the timestamp `@timestamp`.
121 | Now you will be able to go to Discover and start searching your logs! Your data is persistent in the `VOLUME_MOUNT/data` directory you set.
122 | If you would like to remove all data, just `rm -rf VOLUME_MOUNT/data`, substituting the directory you set into that remove command.
123 | The next time you start your cluster it will be brand new for more data.
124 |
125 | ## Upgrading zeek2es
126 |
127 | Most upgrades should be as simple as copying the newer [zeek2es.py](zeek2es.py) over
128 | the old one. In some cases, the ES ingest pipeline required for the `-g` command line option
129 | might change during an upgrade. Therefore, it is strongly recommended that you delete
130 | your [ingest pipeline](#esingestpipeline) before you run a new version of zeek2es.py.
131 |
132 | ### ES Ingest Pipeline
133 |
134 | If you need to [delete the "zeekgeoip" ES ingest pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-pipeline-api.html)
135 | used to geolocate IP addresses with the `-g` command line option, you can either do it graphically
136 | through Kibana's Stack Management->Ingest Pipelines or this command will do it for you:
137 |
138 | ```
139 | curl -X DELETE "localhost:9200/_ingest/pipeline/zeekgeoip?pretty"
140 | ```
141 |
142 | Running this command is strongly recommended whenever you update your copy of zeek2es.py.
143 |
144 | ## Filtering Data
145 |
146 | ### Python Filters
147 |
148 | zeek2es provides filtering capabilities for your Zeek logs before they are stored in ElasticSearch. This
149 | functionality can be enabled with the `-a` or `-f` options. The filters are constructed from Python
150 | lambda functions, where the input is a Python dictionary representing the output JSON document. You can add a
151 | filter to only store connection logs where the `service` field is populated using the `-f` option with
152 | this lambda filter file:
153 |
154 | ```
155 | lambda x: 'service' in x and len(x['service']) > 0
156 | ```
157 |
158 | Or maybe you'd like to filter for connections that have at least 1,024 bytes, with at least 1 byte coming from
159 | the destination:
160 |
161 | ```
162 | lambda x: 'orig_ip_bytes' in x and 'resp_ip_bytes' in x and x['orig_ip_bytes'] + x['resp_ip_bytes'] > 1024 and x['resp_ip_bytes'] > 0
163 | ```
164 |
165 | Simpler lambda filters can be provided on the command line via the `-a` option. This filter will only store
166 | connection log entries where the originator IP address is part of the `192.0.0.0/8` network:
167 |
168 | ```
169 | python zeek2es.py conn.log.gz -a "lambda x: 'id.orig_h' in x and ipaddress.ip_address(x['id.orig_h']) in ipaddress.ip_network('192.0.0.0/8')"
170 | ```
171 |
172 | For power users, the `-f` option will allow you to define a full function (instead of Python's lambda functions) so you can write functions that
173 | span multiple lines.
174 |
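As a minimal sketch, a multi-line filter file might look like the following hypothetical example, which keeps only completed connections (`conn_state` of `SF`) with a detected service; the field choices are illustrative:

```
lambda x: (
    'service' in x
    and len(x['service']) > 0
    and x.get('conn_state') == 'SF'
)
```
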
175 | ### Filter on Keys
176 |
177 | In some instances you might want to pull data from one log that depends on another. An
178 | example would be finding all `ssl.log` rows that have a `uid` matching previously
179 | indexed rows from `conn.log`, or vice versa. You can filter by importing your
180 | `conn.log` files with the `-o uid uid.txt` command line option. This will log all uids that were
181 | indexed to a file named `uid.txt`. Then, when you import your `ssl.log` files you will provide
182 | the `-e uid uid.txt` command line option. This will only import SSL rows
183 | containing `uid` values that are in `uid.txt`, previously built from our import of `conn.log`, as sketched below.
184 |
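The two-pass workflow might look like this (file names are illustrative):

```
python zeek2es.py conn.log.gz -o uid uid.txt
python zeek2es.py ssl.log.gz -e uid uid.txt
```
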
185 | ## Command Line Examples
186 |
187 | ```
188 | python zeek2es.py your_zeek_log.gz -i your_es_index_name
189 | ```
190 |
191 | This script can be run in parallel on all connection logs, 10 at a time, with the following command:
192 |
193 | ```
194 | find /some/dir -name "conn*.log.gz" | parallel -j 10 python zeek2es.py {1} :::: -
195 | ```
196 |
197 | If you would like to automatically import all conn.log files as they are created in a directory, the following
198 | [fswatch](https://emcrisostomo.github.io/fswatch/) command will do that for you:
199 |
200 | ```
201 | fswatch -m poll_monitor --event Created -r /data/logs/zeek/ | awk '/^.*\/conn.*\.log\.gz$/' | parallel -j 5 python ~/zeek2es.py {} -g -d :::: -
202 | ```
203 |
204 | If you have the jq command installed you can perform searches across all your logs for a common
205 | field like connection uid, even without ElasticSearch:
206 |
207 | ```
208 | find /usr/local/var/logs -name "*.log.gz" -exec python ~/Source/zeek2es/zeek2es.py {} -s -b -z \; | jq -c '. | select(.uid=="CLbPij1vThLvQ2qDKh")'
209 | ```
210 |
211 | You can use much more complex jq queries than this if you are familiar with jq.
212 |
213 | If you want to remove all of your Zeek data from ElasticSearch, this command will do it for you:
214 |
215 | ```
216 | curl -X DELETE http://localhost:9200/zeek*
217 | ```
218 |
219 | Since the indices have the date appended to them, you could
220 | delete Dec 31, 2021 with the following command:
221 |
222 | ```
223 | curl -X DELETE http://localhost:9200/zeek_*_2021-12-31
224 | ```
225 |
226 | You could delete all conn.log entries with this command:
227 |
228 | ```
229 | curl -X DELETE http://localhost:9200/zeek_conn_*
230 | ```
231 |
232 | ## Command Line Options
233 |
234 | ```
235 | $ python zeek2es.py -h
236 | usage: zeek2es.py [-h] [-i ESINDEX] [-u ESURL] [--user USER] [--passwd PASSWD]
237 | [-l LINES] [-n NAME] [-k KEYWORDS [KEYWORDS ...]]
238 | [-a LAMBDAFILTER] [-f FILTERFILE]
239 | [-y OUTPUTFIELDS [OUTPUTFIELDS ...]] [-d DATASTREAM]
240 | [--compress] [-o fieldname filename] [-e fieldname filename]
241 | [-g] [-p SPLITFIELDS [SPLITFIELDS ...]] [-j] [-r] [-t] [-s]
242 | [-b] [--humio HUMIO HUMIO] [-c] [-w] [-z]
243 | filename
244 |
245 | Process Zeek ASCII logs into ElasticSearch.
246 |
247 | positional arguments:
248 | filename The Zeek log in *.log or *.gz format. Include the full path.
249 |
250 | optional arguments:
251 | -h, --help show this help message and exit
252 | -i ESINDEX, --esindex ESINDEX
253 | The Elasticsearch index/data stream name.
254 | -u ESURL, --esurl ESURL
255 | The Elasticsearch URL. A trailing slash is optional. Use https for Elastic v8+. (default: http://localhost:9200)
256 | --user USER The Elasticsearch user. (default: disabled)
257 | --passwd PASSWD The Elasticsearch password. Note this will put your password in your shell history file. (default: disabled)
258 | -l LINES, --lines LINES
259 | Lines to buffer for RESTful operations. (default: 10,000)
260 | -n NAME, --name NAME The name of the system to add to the index for uniqueness. (default: empty string)
261 | -k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...]
262 | A list of text fields to add a keyword subfield. (default: service)
263 | -a LAMBDAFILTER, --lambdafilter LAMBDAFILTER
264 | A Python lambda function, when eval'd will filter your output JSON dict. (default: empty string)
265 | -f FILTERFILE, --filterfile FILTERFILE
266 | A Python function file, when eval'd will filter your output JSON dict. (default: empty string)
267 | -y OUTPUTFIELDS [OUTPUTFIELDS ...], --outputfields OUTPUTFIELDS [OUTPUTFIELDS ...]
268 | A list of fields to keep for the output. Must include ts. (default: empty string)
269 | -d DATASTREAM, --datastream DATASTREAM
270 | Instead of an index, use a data stream that will rollover at this many GB.
271 | Recommended is 50 or less. (default: 0 - disabled)
272 | --compress If a datastream is used, enable best compression.
273 | -o fieldname filename, --logkey fieldname filename
274 | A field to log to a file. Example: uid uid.txt.
275 | Will append to the file! Delete file before running if appending is undesired.
276 | This option can be called more than once. (default: empty - disabled)
277 | -e fieldname filename, --filterkeys fieldname filename
278 | A field to filter with keys from a file. Example: uid uid.txt. (default: empty string - disabled)
279 | -g, --ingestion Use the ingestion pipeline to do things like geolocate IPs and split services. Takes longer, but worth it.
280 | -p SPLITFIELDS [SPLITFIELDS ...], --splitfields SPLITFIELDS [SPLITFIELDS ...]
281 | A list of additional fields to split with the ingestion pipeline, if enabled.
282 | (default: empty string - disabled)
283 | -j, --jsonlogs Assume input logs are JSON.
284 | -r, --origtime Keep the numerical time format, not milliseconds as ES needs.
285 | -t, --timestamp Keep the time in timestamp format.
286 | -s, --stdout Print JSON to stdout instead of sending to Elasticsearch directly.
287 | -b, --nobulk Remove the ES bulk JSON header. Requires --stdout.
288 | --humio HUMIO HUMIO First argument is the Humio URL, the second argument is the ingest token.
289 | -c, --cython Use Cython execution by loading the local zeek2es.so file through an import.
290 | Run python setup.py build_ext --inplace first to make your zeek2es.so file!
291 | -w, --hashdates Use hashes instead of dates for the index name.
292 | -z, --supresswarnings
293 | Suppress any type of warning. Die stoically and silently.
294 |
295 | To delete indices:
296 |
297 | curl -X DELETE http://localhost:9200/zeek*?pretty
298 |
299 | To delete data streams:
300 |
301 | curl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty
302 |
303 | To delete index templates:
304 |
305 | curl -X DELETE http://localhost:9200/_index_template/zeek*?pretty
306 |
307 | To delete the lifecycle policy:
308 |
309 | curl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty
310 |
311 | You will need to add -k -u elastic_user:password if you are using Elastic v8+.
312 | ```
313 |
314 | ## Requirements
315 |
316 | - A Unix-like environment (macOS works!)
317 | - Python
318 | - [requests](https://docs.python-requests.org/en/latest/) Python library installed, such as with `pip`.
319 |
320 | ## Notes
321 |
322 | ### Humio
323 |
324 | To import your data into Humio you will need to set up a repository with the `corelight-json` parser. Obtain
325 | the ingest token for the repository and you can import your data with a command such as:
326 |
327 | ```
328 | python3 zeek2es.py -s -b --humio http://localhost:8080 b005bf74-1ed3-4871-904f-9460a4687202 http.log
329 | ```
330 |
331 | The URL should be in the format `http://yourserver:8080`; the rest of the path is added by the
332 | `zeek2es.py` script automatically for you.
333 |
334 | ### JSON Log Input
335 |
336 | Since Zeek JSON logs do not have type information like the ASCII TSV versions, only limited type information
337 | can be provided to ElasticSearch. You will notice this most for Zeek "addr" log fields that
338 | are not id$orig_h and id$resp_h, since the type information is not available to translate the field into
339 | ElasticSearch's "ip" type. Since address fields will not be of type "ip", you will not be able to use
340 | subnet searches, for example, like you could for the TSV logs. Saving Zeek logs in ASCII TSV
341 | format provides for greater long-term flexibility.
342 |
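Importing a JSON log is otherwise identical to the TSV case; only the `-j` flag changes, as in this minimal example:

```
python zeek2es.py conn.log -j
```
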
343 | ### Data Streams
344 |
345 | You can use data streams instead of indices for large logs with the `-d` command line option. This
346 | option creates index templates beginning with `zeek_`. It also creates a lifecycle policy
347 | named `zeek-lifecycle-policy`. If you would like to delete all of your data streams, lifecycle policies,
348 | and index templates, these commands will do it for you:
349 |
350 | ```
351 | curl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty
352 | curl -X DELETE http://localhost:9200/_index_template/zeek*?pretty
353 | curl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty
354 | ```
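
To create such a data stream in the first place, an invocation might look like this hedged example (a 50 GB rollover with best compression; the values are illustrative):

```
python zeek2es.py conn.log.gz -d 50 --compress
```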
355 |
356 | ### Helper Scripts
357 |
358 | There are two scripts that will help you make your logs into data streams such as `logs-zeek-conn`.
359 | The first script is [process_logs_as_datastream.sh](process_logs_as_datastream.sh), which, given
360 | a list of logs and directories, will import them as data streams. The second script
361 | is [process_log.sh](process_log.sh), which can be used to import logs
362 | one at a time. This script can also be used to monitor logs created in a directory with
363 | [fswatch](https://emcrisostomo.github.io/fswatch/). Both scripts have example command lines
364 | if you run them without any parameters.
365 |
366 | ```
367 | $ ./process_logs_as_datastream.sh
368 | Usage: ./process_logs_as_datastream.sh NJOBS "ADDITIONAL_ARGS_TO_ZEEK2ES" "LIST_OF_LOGS_DELIMITED_BY_SPACES" DIR1 DIR2 ...
369 |
370 | Example:
371 | time ./process_logs_as_datastream.sh 16 "" "amqp bgp conn dce_rpc dhcp dns dpd files ftp http ipsec irc kerberos modbus modbus_register_change mount mqtt mysql nfs notice ntlm ntp ospf portmap radius reporter rdp rfb rip ripng sip smb_cmd smb_files smb_mapping smtp snmp socks ssh ssl stun syslog tunnel vpn weird wireguard x509" /usr/local/var/logs
372 | ```
373 |
374 | ```
375 | $ ./process_log.sh
376 | Usage: ./process_log.sh LOGFILENAME "ADDITIONAL_ARGS_TO_ZEEK2ES"
377 |
378 | Example:
379 | fswatch -m poll_monitor --event Created -r /data/logs/zeek | awk '/^.*\/(conn|dns|http)\..*\.log\.gz$/' | parallel -j 16 ./process_log.sh {} "" :::: -
380 | ```
381 |
382 | You will need to edit these scripts and command lines according to your environment.
383 |
384 | Any file named after a log, such as `conn_filter.txt`, in the `filter_file_dir` (by default your home directory) will be applied as a lambda
385 | filter file to the corresponding log input. This allows you to set up all of your filters in one directory and import multiple log files with
386 | that set of filters in one command with [process_logs_as_datastream.sh](process_logs_as_datastream.sh).
387 |
388 | The following lines should delete all Zeek data in ElasticSearch, whether you use indices,
389 | data streams, or these helper scripts:
390 |
391 | ```
392 | curl -X DELETE http://localhost:9200/zeek*?pretty
393 | curl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty
394 | curl -X DELETE http://localhost:9200/_data_stream/logs-zeek*?pretty
395 | curl -X DELETE http://localhost:9200/_index_template/zeek*?pretty
396 | curl -X DELETE http://localhost:9200/_index_template/logs-zeek*?pretty
397 | curl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty
398 | ```
399 |
400 | ... or if using Elastic v8+ ...
401 |
402 | ```
403 | curl -X DELETE -k -u elastic:password https://localhost:9200/zeek*?pretty
404 | curl -X DELETE -k -u elastic:password https://localhost:9200/_data_stream/zeek*?pretty
405 | curl -X DELETE -k -u elastic:password https://localhost:9200/_data_stream/logs-zeek*?pretty
406 | curl -X DELETE -k -u elastic:password https://localhost:9200/_index_template/zeek*?pretty
407 | curl -X DELETE -k -u elastic:password https://localhost:9200/_index_template/logs-zeek*?pretty
408 | curl -X DELETE -k -u elastic:password https://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty
409 | ```
410 |
411 | But to be able to do this in v8+ you will need to configure Elastic as described
412 | in the section [Elastic v8.0+](#elastic80).
413 |
414 | ### Cython
415 |
416 | If you'd like to try [Cython](https://cython.org/), you must run `python setup.py build_ext --inplace`
417 | first to generate your compiled file. You must do this every time you update zeek2es!
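
A sketch of the full sequence, building the extension and then running with `-c`:

```
python setup.py build_ext --inplace
python zeek2es.py conn.log.gz -c
```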
--------------------------------------------------------------------------------
/docker/.env:
--------------------------------------------------------------------------------
1 | # Password for the 'elastic' user (at least 6 characters) CHANGEME!!!
2 | ELASTIC_PASSWORD=elastic
3 |
4 | # Password for the 'kibana_system' user (at least 6 characters) CHANGEME!!!
5 | KIBANA_PASSWORD=elasticANDkibana
6 |
7 | # Version of Elastic products
8 | STACK_VERSION=8.1.3
9 |
10 | # Set the cluster name
11 | CLUSTER_NAME=docker-cluster
12 |
13 | # Set to 'basic' or 'trial' to automatically start the 30-day trial
14 | LICENSE=basic
15 | #LICENSE=trial
16 |
17 | # Port to expose Elasticsearch HTTP API to the host
18 | ES_PORT=9200
19 | #ES_PORT=127.0.0.1:9200
20 |
21 | # Port to expose Kibana to the host
22 | KIBANA_PORT=5601
23 | #KIBANA_PORT=80
24 |
25 | # Increase or decrease based on the available host memory (in bytes)
26 | MEM_LIMIT=1073741824
27 |
28 | # Project namespace (defaults to the current folder name if not set)
29 | #COMPOSE_PROJECT_NAME=myproject
30 |
31 | # Where the data directory resides for volumes CHANGEME!!!
32 | VOLUME_MOUNT=./
--------------------------------------------------------------------------------
/docker/data/.empty:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/docker/data/.empty
--------------------------------------------------------------------------------
/docker/docker-compose.yml:
--------------------------------------------------------------------------------
1 | version: "2.2"
2 |
3 | services:
4 | setup:
5 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
6 | volumes:
7 | - certs:/usr/share/elasticsearch/config/certs
8 | user: "0"
9 | command: >
10 | bash -c '
11 | if [ x${ELASTIC_PASSWORD} == x ]; then
12 | echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
13 | exit 1;
14 | elif [ x${KIBANA_PASSWORD} == x ]; then
15 | echo "Set the KIBANA_PASSWORD environment variable in the .env file";
16 | exit 1;
17 | fi;
18 | if [ ! -f config/certs/ca.zip ]; then
19 | echo "Creating CA";
20 | bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
21 | unzip config/certs/ca.zip -d config/certs;
22 | fi;
23 | if [ ! -f config/certs/certs.zip ]; then
24 | echo "Creating certs";
25 | echo -ne \
26 | "instances:\n"\
27 | " - name: es01\n"\
28 | " dns:\n"\
29 | " - es01\n"\
30 | " - localhost\n"\
31 | " ip:\n"\
32 | " - 127.0.0.1\n"\
33 | " - name: es02\n"\
34 | " dns:\n"\
35 | " - es02\n"\
36 | " - localhost\n"\
37 | " ip:\n"\
38 | " - 127.0.0.1\n"\
39 | " - name: es03\n"\
40 | " dns:\n"\
41 | " - es03\n"\
42 | " - localhost\n"\
43 | " ip:\n"\
44 | " - 127.0.0.1\n"\
45 | " - name: kibana\n"\
46 | " dns:\n"\
47 | " - kibana\n"\
48 | " - localhost\n"\
49 | " ip:\n"\
50 | " - 127.0.0.1\n"\
51 | > config/certs/instances.yml;
52 | bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
53 | unzip config/certs/certs.zip -d config/certs;
54 | fi;
55 | echo "Setting file permissions"
56 | chown -R root:root config/certs;
57 | find . -type d -exec chmod 750 \{\} \;;
58 | find . -type f -exec chmod 640 \{\} \;;
59 | echo "Waiting for Elasticsearch availability";
60 | until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
61 | echo "Setting kibana_system password";
62 | until curl -s -X POST --cacert config/certs/ca/ca.crt -u elastic:${ELASTIC_PASSWORD} -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
63 | echo "All done!";
64 | '
65 | healthcheck:
66 | test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
67 | interval: 1s
68 | timeout: 5s
69 | retries: 120
70 | container_name: "setup"
71 |
72 | es01:
73 | depends_on:
74 | setup:
75 | condition: service_healthy
76 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
77 | restart: "unless-stopped"
78 | volumes:
79 | - certs:/usr/share/elasticsearch/config/certs
80 | - ${VOLUME_MOUNT}/data/es01:/usr/share/elasticsearch/data
81 | ports:
82 | - ${ES_PORT}:9200
83 | environment:
84 | - node.name=es01
85 | - cluster.name=${CLUSTER_NAME}
86 | - cluster.initial_master_nodes=es01,es02,es03
87 | - discovery.seed_hosts=es02,es03
88 | - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
89 | - bootstrap.memory_lock=true
90 | - xpack.security.enabled=true
91 | - xpack.security.http.ssl.enabled=true
92 | - xpack.security.http.ssl.key=certs/es01/es01.key
93 | - xpack.security.http.ssl.certificate=certs/es01/es01.crt
94 | - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
95 | - xpack.security.http.ssl.verification_mode=certificate
96 | - xpack.security.transport.ssl.enabled=true
97 | - xpack.security.transport.ssl.key=certs/es01/es01.key
98 | - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
99 | - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
100 | - xpack.security.transport.ssl.verification_mode=certificate
101 | - xpack.license.self_generated.type=${LICENSE}
102 | mem_limit: ${MEM_LIMIT}
103 | ulimits:
104 | memlock:
105 | soft: -1
106 | hard: -1
107 | healthcheck:
108 | test:
109 | [
110 | "CMD-SHELL",
111 | "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
112 | ]
113 | interval: 10s
114 | timeout: 10s
115 | retries: 120
116 | container_name: "es01"
117 |
118 | es02:
119 | depends_on:
120 | - es01
121 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
122 | restart: "unless-stopped"
123 | volumes:
124 | - certs:/usr/share/elasticsearch/config/certs
125 | - ${VOLUME_MOUNT}/data/es02:/usr/share/elasticsearch/data
126 | environment:
127 | - node.name=es02
128 | - cluster.name=${CLUSTER_NAME}
129 | - cluster.initial_master_nodes=es01,es02,es03
130 | - discovery.seed_hosts=es01,es03
131 | - bootstrap.memory_lock=true
132 | - xpack.security.enabled=true
133 | - xpack.security.http.ssl.enabled=true
134 | - xpack.security.http.ssl.key=certs/es02/es02.key
135 | - xpack.security.http.ssl.certificate=certs/es02/es02.crt
136 | - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
137 | - xpack.security.http.ssl.verification_mode=certificate
138 | - xpack.security.transport.ssl.enabled=true
139 | - xpack.security.transport.ssl.key=certs/es02/es02.key
140 | - xpack.security.transport.ssl.certificate=certs/es02/es02.crt
141 | - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
142 | - xpack.security.transport.ssl.verification_mode=certificate
143 | - xpack.license.self_generated.type=${LICENSE}
144 | mem_limit: ${MEM_LIMIT}
145 | ulimits:
146 | memlock:
147 | soft: -1
148 | hard: -1
149 | healthcheck:
150 | test:
151 | [
152 | "CMD-SHELL",
153 | "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
154 | ]
155 | interval: 10s
156 | timeout: 10s
157 | retries: 120
158 | container_name: "es02"
159 |
160 | es03:
161 | depends_on:
162 | - es02
163 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
164 | restart: "unless-stopped"
165 | volumes:
166 | - certs:/usr/share/elasticsearch/config/certs
167 | - ${VOLUME_MOUNT}/data/es03:/usr/share/elasticsearch/data
168 | environment:
169 | - node.name=es03
170 | - cluster.name=${CLUSTER_NAME}
171 | - cluster.initial_master_nodes=es01,es02,es03
172 | - discovery.seed_hosts=es01,es02
173 | - bootstrap.memory_lock=true
174 | - xpack.security.enabled=true
175 | - xpack.security.http.ssl.enabled=true
176 | - xpack.security.http.ssl.key=certs/es03/es03.key
177 | - xpack.security.http.ssl.certificate=certs/es03/es03.crt
178 | - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
179 | - xpack.security.http.ssl.verification_mode=certificate
180 | - xpack.security.transport.ssl.enabled=true
181 | - xpack.security.transport.ssl.key=certs/es03/es03.key
182 | - xpack.security.transport.ssl.certificate=certs/es03/es03.crt
183 | - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
184 | - xpack.security.transport.ssl.verification_mode=certificate
185 | - xpack.license.self_generated.type=${LICENSE}
186 | mem_limit: ${MEM_LIMIT}
187 | ulimits:
188 | memlock:
189 | soft: -1
190 | hard: -1
191 | healthcheck:
192 | test:
193 | [
194 | "CMD-SHELL",
195 | "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
196 | ]
197 | interval: 10s
198 | timeout: 10s
199 | retries: 120
200 | container_name: "es03"
201 |
202 | kibana:
203 | depends_on:
204 | es01:
205 | condition: service_healthy
206 | es02:
207 | condition: service_healthy
208 | es03:
209 | condition: service_healthy
210 | image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
211 | restart: "unless-stopped"
212 | volumes:
213 | - certs:/usr/share/kibana/config/certs
214 | - ${VOLUME_MOUNT}/data/kibana:/usr/share/kibana/data
215 | ports:
216 | - ${KIBANA_PORT}:5601
217 | environment:
218 | - SERVERNAME=kibana
219 | - ELASTICSEARCH_HOSTS=https://es01:9200
220 | - ELASTICSEARCH_USERNAME=kibana_system
221 | - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
222 | - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
223 | - SERVER_SSL_ENABLED=true
224 | - SERVER_SSL_KEY=/usr/share/kibana/config/certs/kibana/kibana.key
225 | - SERVER_SSL_CERTIFICATE=/usr/share/kibana/config/certs/kibana/kibana.crt
226 | - SERVER_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
227 | # - SERVER_SSL_PASSWORD=${KIBANA_CERT_PASSWORD}
228 | mem_limit: ${MEM_LIMIT}
229 | healthcheck:
230 | test:
231 | [
232 | "CMD-SHELL",
233 | "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
234 | ]
235 | interval: 10s
236 | timeout: 10s
237 | retries: 120
238 | container_name: "kibana"
239 |
240 | zeek2es:
241 | build:
242 | context: ./zeek2es
243 | dockerfile: Dockerfile
244 | restart: "unless-stopped"
245 | depends_on:
246 | es01:
247 | condition: service_healthy
248 | es02:
249 | condition: service_healthy
250 | es03:
251 | condition: service_healthy
252 | command: >
253 | bash -c '
254 | chmod 755 /entrypoint.sh;
255 | /entrypoint.sh
256 | '
257 | volumes:
258 | - ./zeek2es/entrypoint.sh:/entrypoint.sh
259 | - ${VOLUME_MOUNT}/data/logs:/logs
260 | tty: true
261 | container_name: "zeek2es"
262 |
263 | volumes:
264 | certs:
265 | driver: local
--------------------------------------------------------------------------------
/docker/zeek2es/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM ubuntu:jammy
2 |
3 | RUN apt-get -q update && \
4 | DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
5 | curl \
6 | fswatch \
7 | geoipupdate \
8 | git \
9 | iproute2 \
10 | jq \
11 | less \
12 | netcat \
13 | net-tools \
14 | parallel \
15 | python3 \
16 | python3-dev \
17 | python3-pip \
18 | python3-setuptools \
19 | python3-wheel \
20 | swig \
21 | tcpdump \
22 | tcpreplay \
23 | termshark \
24 | tshark \
25 | vim \
26 | wget \
27 | zeek-aux && \
28 | pip3 install --no-cache-dir pre-commit requests && \
29 | curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.2.0-amd64.deb && \
30 | dpkg -i filebeat-8.2.0-amd64.deb && \
31 | rm filebeat-8.2.0-amd64.deb && \
32 | apt-get clean && rm -rf /var/lib/apt/lists/* && rm -rf ~/.cache/pip
33 |
34 | # Install zeek2es
35 | RUN cd / && git clone https://github.com/corelight/zeek2es.git
36 |
37 | #COPY entrypoint.sh /entrypoint.sh
38 | #RUN chmod 755 /entrypoint.sh
--------------------------------------------------------------------------------
/docker/zeek2es/entrypoint.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | fswatch -m poll_monitor --event Created -r /logs | parallel -j 3 python3 /zeek2es/zeek2es.py {} --compress -g -l 5000 -d 25 -u https://es01:9200 --user elastic --passwd elastic :::: -
--------------------------------------------------------------------------------
/images/kibana-aggregation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-aggregation.png
--------------------------------------------------------------------------------
/images/kibana-map.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-map.png
--------------------------------------------------------------------------------
/images/kibana-subnet-search.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-subnet-search.png
--------------------------------------------------------------------------------
/images/kibana-timeseries.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-timeseries.png
--------------------------------------------------------------------------------
/images/kibana.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana.png
--------------------------------------------------------------------------------
/images/multi-log-correlation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/multi-log-correlation.png
--------------------------------------------------------------------------------
/process_log.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Things you can set:
4 | zeek2es_path=~/Source/zeek2es/zeek2es.py
5 | filter_file_dir=~/
6 | num_of_lines=50000
7 | logfiledelim=\\.
8 | stream_prepend="logs-zeek-"
9 | stream_ending=""
10 | pythoncmd="python3"
11 | zeek2esargs="-g -l $num_of_lines"
12 |
13 | # Error checking
14 | if [ "$#" -ne 2 ]; then
15 | echo "Usage: $0 LOGFILENAME \"ADDITIONAL_ARGS_TO_ZEEK2ES\"" >&2
16 | echo >&2
17 | echo "Example:" >&2
18 | echo " fswatch -m poll_monitor --event Created -r /data/logs/zeek | awk '/^.*\/(conn|dns|http)\..*\.log\.gz$/' | parallel -j 16 $0 {} \"\"" :::: - >&2
19 | exit 1
20 | fi
21 |
22 | # Things set from the command line
23 | logfile=$1
24 | additional_args=$2
25 |
26 | echo "Processing $logfile..."
27 | regex="s/.*\/\([^0-9\.]*\)$logfiledelim[0-9].*\.log\.gz/\1/" # extracts the log type (e.g. conn) from the file path
28 | log_type=`echo $logfile | sed $regex`
29 | echo $log_type
30 |
31 | zeek2esargsplus=$zeek2esargs" -i $stream_prepend$log_type$stream_ending "$additional_args
32 |
33 | filterfile=$filter_file_dir$log_type"_filter.txt"
34 |
35 | if [ -f $filterfile ]; then
36 | echo " Using filter file "$filterfile
37 | $pythoncmd $zeek2es_path $logfile $zeek2esargsplus -f $filterfile
38 | else
39 | echo " No filter file found for "$filterfile
40 | $pythoncmd $zeek2es_path $logfile $zeek2esargsplus
41 | fi
--------------------------------------------------------------------------------
/process_logs_as_datastream.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Things you can set:
4 | zeek2es_path=~/Source/zeek2es/zeek2es.py
5 | lognamedelim=\\.
6 | #zeek2es_path=~/zeek2es.py
7 | #lognamedelim=_2
8 | filter_file_dir=~/
9 | num_of_lines=50000
10 | num_of_gb=50
11 | pythoncmd="python3"
12 | zeek2esargs="-g -l $num_of_lines"
13 |
14 | # Error checking
15 | if [ "$#" -lt 4 ]; then
16 | echo "Usage: $0 NJOBS \"ADDITIONAL_ARGS_TO_ZEEK2ES\" \"LIST_OF_LOGS_DELIMITED_BY_SPACES\" DIR1 DIR2 ..." >&2
17 | echo >&2
18 | echo "Example:" >&2
19 | echo " time $0 16 \"\" \"amqp bgp conn dce_rpc dhcp dns dpd files ftp http ipsec irc kerberos modbus modbus_register_change mount mqtt mysql nfs notice ntlm ntp ospf portmap radius reporter rdp rfb rip ripng sip smb_cmd smb_files smb_mapping smtp snmp socks ssh ssl stun syslog tunnel vpn weird wireguard x509\" /usr/local/var/logs" >&2
20 | exit 1
21 | fi
22 |
23 | # Things set from the command line
24 | njobs=$1
25 | additional_args=$2
26 | logs=$3
27 | logdirs=${@:4}
28 |
29 | # Iterate through the *.log.gz files in the supplied directory
30 | for val in $logs; do
31 | zeek2esargsplus=$zeek2esargs" --compress -d "$num_of_gb" "$additional_args
32 | echo "Processing $val logs..."
33 | filename_re="/^.*\/"$val$lognamedelim".*\.log\.gz$/" # awk regex matching this log type's rotated *.log.gz files
34 |
35 | filterfile=$filter_file_dir$val"_filter.txt"
36 |
37 | if [ -f $filterfile ]; then
38 | echo " Using filter file "$filterfile
39 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus -f $filterfile :::: -
40 | else
41 | echo " No filter file found for "$filterfile
42 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus :::: -
43 | fi
44 | done
--------------------------------------------------------------------------------
/process_logs_to_stdout.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Things you can set:
4 | zeek2es_path=~/Source/zeek2es/zeek2es.py
5 | lognamedelim=\\.
6 | #zeek2es_path=~/zeek2es.py
7 | #lognamedelim=_2
8 | filter_file_dir=~/
9 | num_of_lines=50000
10 | stream_prepend="logs-zeek-"
11 | stream_ending=""
12 | pythoncmd="python3"
13 | zeek2esargs="-s -b"
14 |
15 | # Error checking
16 | if [ "$#" -lt 4 ]; then
17 | echo "Usage: $0 NJOBS \"ADDITIONAL_ARGS_TO_ZEEK2ES\" \"LIST_OF_LOGS_DELIMITED_BY_SPACES\" DIR1 DIR2 ..." >&2
18 | echo >&2
19 | echo "Example:" >&2
20 | echo " time $0 16 \"\" \"amqp bgp conn dce_rpc dhcp dns dpd files ftp http ipsec irc kerberos modbus modbus_register_change mount mqtt mysql nfs notice ntlm ntp ospf portmap radius reporter rdp rfb rip ripng sip smb_cmd smb_files smb_mapping smtp snmp socks ssh ssl stun syslog tunnel vpn weird wireguard x509\" /usr/local/var/logs" >&2
21 | exit 1
22 | fi
23 |
24 | # Things set from the command line
25 | njobs=$1
26 | additional_args=$2
27 | logs=$3
28 | logdirs=${@:4}
29 |
30 | # Iterate through the *.log.gz files in the supplied directory
31 | for val in $logs; do
32 | zeek2esargsplus=$zeek2esargs" "$additional_args
33 | filename_re="/^.*\/"$val$lognamedelim".*\.log\.gz$/" # awk regex matching this log type's rotated *.log.gz files
34 |
35 | filterfile=$filter_file_dir$val"_filter.txt"
36 |
37 | if [ -f $filterfile ]; then
38 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus -f $filterfile :::: -
39 | else
40 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus :::: -
41 | fi
42 | done
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 | from Cython.Build import cythonize
3 |
4 | setup(
5 | ext_modules = cythonize("zeek2es.py")
6 | )
7 |
--------------------------------------------------------------------------------
/zeek2es.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import subprocess
3 | import json
4 | import csv
5 | import io
6 | import requests
7 | from requests.auth import HTTPBasicAuth
8 | from urllib3.exceptions import InsecureRequestWarning
9 | import datetime
10 | import re
11 | import argparse
12 | import random
13 | import time
14 | # Making these available for lambda filter input.
15 | import ipaddress
16 | import os
17 |
18 | # The number of bits to use in a random hash.
19 | hashbits = 128
20 |
21 | # Disable SSL warnings.
22 | requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)
23 |
24 | # We do this to add a little extra help at the end.
25 | class MyParser(argparse.ArgumentParser):
26 | def print_help(self):
27 | super().print_help()
28 | print("")
29 | print("To delete indices:\n\n\tcurl -X DELETE http://localhost:9200/zeek*?pretty\n")
30 | print("To delete data streams:\n\n\tcurl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty\n")
31 | print("To delete index templates:\n\n\tcurl -X DELETE http://localhost:9200/_index_template/zeek*?pretty\n")
32 | print("To delete the lifecycle policy:\n\n\tcurl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty\n")
33 | print("You will need to add -k -u elastic_user:password if you are using Elastic v8+.\n")
34 |
35 | # This takes care of arg parsing
36 | def parseargs():
37 | parser = MyParser(description='Process Zeek ASCII logs into ElasticSearch.', formatter_class=argparse.RawTextHelpFormatter)
38 | parser.add_argument('filename',
39 | help='The Zeek log in *.log or *.gz format. Include the full path.')
40 | parser.add_argument('-i', '--esindex', help='The Elasticsearch index/data stream name.')
41 | parser.add_argument('-u', '--esurl', default="http://localhost:9200", help='The Elasticsearch URL. A trailing slash is optional. Use https for Elastic v8+. (default: http://localhost:9200)')
42 | parser.add_argument('--user', default="", help='The Elasticsearch user. (default: disabled)')
43 | parser.add_argument('--passwd', default="", help='The Elasticsearch password. Note this will put your password in your shell history file. (default: disabled)')
44 | parser.add_argument('-l', '--lines', default=10000, type=int, help='Lines to buffer for RESTful operations. (default: 10,000)')
45 | parser.add_argument('-n', '--name', default="", help='The name of the system to add to the index for uniqueness. (default: empty string)')
46 | parser.add_argument('-k', '--keywords', nargs="+", default="service", help='A list of text fields to add a keyword subfield. (default: service)')
47 | parser.add_argument('-a', '--lambdafilter', default="", help='A Python lambda function, when eval\'d will filter your output JSON dict. (default: empty string)')
48 | parser.add_argument('-f', '--filterfile', default="", help='A Python function file, when eval\'d will filter your output JSON dict. (default: empty string)')
49 | parser.add_argument('-y', '--outputfields', nargs="+", default="", help='A list of fields to keep for the output. Must include ts. (default: empty string)')
50 | parser.add_argument('-d', '--datastream', default=0, type=int, help='Instead of an index, use a data stream that will rollover at this many GB.\nRecommended is 50 or less. (default: 0 - disabled)')
51 | parser.add_argument('--compress', action="store_true", help='If a datastream is used, enable best compression.')
52 | parser.add_argument('-o', '--logkey', nargs=2, action='append', metavar=('fieldname','filename'), default=[], help='A field to log to a file. Example: uid uid.txt. \nWill append to the file! Delete file before running if appending is undesired. \nThis option can be called more than once. (default: empty - disabled)')
53 | parser.add_argument('-e', '--filterkeys', nargs=2, metavar=('fieldname','filename'), default="", help='A field to filter with keys from a file. Example: uid uid.txt. (default: empty string - disabled)')
54 | parser.add_argument('-g', '--ingestion', action="store_true", help='Use the ingestion pipeline to do things like geolocate IPs and split services. Takes longer, but worth it.')
55 | parser.add_argument('-p', '--splitfields', nargs="+", default="", help='A list of additional fields to split with the ingestion pipeline, if enabled.\n(default: empty string - disabled)')
56 | parser.add_argument('-j', '--jsonlogs', action="store_true", help='Assume input logs are JSON.')
57 | parser.add_argument('-r', '--origtime', action="store_true", help='Keep the numerical time format, not milliseconds as ES needs.')
58 | parser.add_argument('-t', '--timestamp', action="store_true", help='Keep the time in timestamp format.')
59 | parser.add_argument('-s', '--stdout', action="store_true", help='Print JSON to stdout instead of sending to Elasticsearch directly.')
60 | parser.add_argument('-b', '--nobulk', action="store_true", help='Remove the ES bulk JSON header. Requires --stdout.')
61 | parser.add_argument('--humio', nargs=2, default="", help='First argument is the Humio URL, the second argument is the ingest token.')
62 | parser.add_argument('-c', '--cython', action="store_true", help='Use Cython execution by loading the local zeek2es.so file through an import.\nRun python setup.py build_ext --inplace first to make your zeek2es.so file!')
63 | parser.add_argument('-w', '--hashdates', action="store_true", help='Use hashes instead of dates for the index name.')
64 | parser.add_argument('-z', '--supresswarnings', action="store_true", help='Suppress any type of warning. Die stoically and silently.')
65 | args = parser.parse_args()
66 | return args
67 |
68 | # A function to send data in bulk to ES.
69 | def sendbulk(args, outstring, es_index, filename):
70 | # Elastic username and password auth
71 | auth = None
72 | if (len(args['user']) > 0):
73 | auth = HTTPBasicAuth(args['user'], args['passwd'])
74 |
75 | if len(args['humio']) != 2:
76 | if not args['stdout']:
77 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl']
78 |
79 | res = requests.put(esurl+'/_bulk', headers={'Content-Type': 'application/json'},
80 | data=outstring.encode('UTF-8'), auth=auth, verify=False)
81 | if not res.ok:
82 | if not args['supresswarnings']:
83 | print("WARNING! PUT did not return OK! Your index {} is incomplete. Filename: {} Response: {} {}".format(es_index, filename, res, res.text))
84 | else:
85 | print(outstring.strip())
86 | else:
87 | # Send to Humio
88 | Headers = { "Authorization" : "Bearer "+args['humio'][1] }
89 | data = [{"messages" : outstring.strip().split('\n') }]
90 | while True:
91 | try:
92 | r = requests.post(args['humio'][0]+'/api/v1/ingest/humio-unstructured', headers=Headers, json=data)
93 | break
94 | except Exception as exc:
95 | if not args['supresswarnings']:
96 | print("WARNING, Humio error: {}".format(exc))
97 | time.sleep(1)
98 |
99 | # A function to send the datastream info to ES.
100 | def senddatastream(args, es_index, mappings):
101 | # Elastic username and password auth
102 | auth = None
103 | if (len(args['user']) > 0):
104 | auth = HTTPBasicAuth(args['user'], args['passwd'])
105 |
106 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl']
107 |
108 | lifecycle_policy = {"policy": {"phases": {"hot": {"actions": {"rollover": {"max_primary_shard_size": "{}GB".format(args['datastream'])}}}}}}
109 | res = requests.put(esurl+"/_ilm/policy/zeek-lifecycle-policy", headers={'Content-Type': 'application/json'},
110 | data=json.dumps(lifecycle_policy).encode('UTF-8'), auth=auth, verify=False)
111 | index_template = {"index_patterns": [es_index], "data_stream": {}, "composed_of": [], "priority": 500,
112 | "template": {"settings": {"index.lifecycle.name": "zeek-lifecycle-policy"}, "mappings": mappings["mappings"]}}
113 | if (args['compress']):
114 | index_template["template"]["settings"]["index"] = {"codec": "best_compression"}
115 | res = requests.put(esurl+"/_index_template/"+es_index, headers={'Content-Type': 'application/json'},
116 | data=json.dumps(index_template).encode('UTF-8'), auth=auth, verify=False)
117 |
118 | # A function to send mappings to ES.
119 | def sendmappings(args, es_index, mappings):
120 | # Elastic username and password auth
121 | auth = None
122 | if (len(args['user']) > 0):
123 | auth = HTTPBasicAuth(args['user'], args['passwd'])
124 |
125 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl']
126 |
127 | res = requests.put(esurl+"/"+es_index, headers={'Content-Type': 'application/json'},
128 | data=json.dumps(mappings).encode('UTF-8'), auth=auth, verify=False)
129 |
130 | # A function to send the ingest pipeline to ES.
131 | def sendpipeline(args, ingest_pipeline):
132 | # Elastic username and password auth
133 | auth = None
134 | if (len(args['user']) > 0):
135 | auth = HTTPBasicAuth(args['user'], args['passwd'])
136 |
137 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl']
138 |
139 | res = requests.put(esurl+"/_ingest/pipeline/zeekgeoip", headers={'Content-Type': 'application/json'},
140 | data=json.dumps(ingest_pipeline).encode('UTF-8'), auth=auth, verify=False)
141 |
142 | # Everything important is in here.
143 | def main(**args):
144 |
145 | # Takes care of the fields we want to output, if not all.
146 | outputfields = []
147 | if (len(args['outputfields']) > 0):
148 | outputfields = args['outputfields']
149 |
150 | # Takes care of logging keys to a file.
151 | logkeyfields = []
152 | logkeys_fds = []
153 | if (len(args['logkey']) > 0):
154 | for lk in args['logkey']:
155 | thefield, thefile = lk[0], lk[1]
156 | f = open(thefile, "a+")
157 | logkeyfields.append(thefield)
158 | logkeys_fds.append(f)
159 |
160 | # Takes care of loading keys from a file to use in a filter.
161 | filterkeys = set()
162 | filterkeys_field = None
163 | if (len(args['filterkeys']) > 0):
164 | filterkeys_field = args['filterkeys'][0]
165 | filterkeys_file = args['filterkeys'][1]
166 | with open(filterkeys_file, "r") as infile:
167 | filterkeys = set(infile.read().splitlines())
168 |
169 | # This takes care of fields where we want to add the keyword field.
170 | keywords = []
171 | if (len(args['keywords']) > 0):
172 | keywords = args['keywords']
173 |
174 | # Error checking
175 | if args['esindex'] and args['stdout']:
176 | if not args['supresswarnings']:
177 | print("Cannot write to Elasticsearch and stdout at the same time.")
178 | exit(-1)
179 |
180 | # Error checking
181 | if args['nobulk'] and not args['stdout']:
182 | if not args['supresswarnings']:
183 | print("The nobulk option can only be used with the stdout option.")
184 | exit(-2)
185 |
186 | # Error checking
187 | if len(args['humio']) > 0 and (not args['stdout'] or not args['nobulk'] or args['timestamp']):
188 | if not args['supresswarnings']:
189 | print("The Humio option can only be used with the stdout and nobulk options, and cannot have the timestamp option.")
190 | exit(-5)
191 |
192 | # Error checking: the origtime option requires the timestamp option.
193 | if not args['timestamp'] and args['origtime']:
194 | if not args['supresswarnings']:
195 | print("The origtime option can only be used with the timestamp option.")
196 | exit(-3)
197 |
198 | # Error checking: lambdafilter and filterfile are mutually exclusive.
199 | if len(args['lambdafilter']) > 0 and len(args['filterfile']) > 0:
200 | if not args['supresswarnings']:
201 | print("The lambdafilter option cannot be used with the filterfile option.")
202 | exit(-7)
203 |
204 | # Load the Python filter: a lambda string from the command line, or the contents of a filter file.
205 | filterfilter = None
206 | if len(args['lambdafilter']) > 0:
207 | filterfilter = eval(args['lambdafilter'])
208 |
209 | if len(args['filterfile']) > 0:
210 | with open(args['filterfile'], "r") as ff:
211 | filterfilter = eval(ff.read())
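    | # Either way, filterfilter must be a predicate usable with filter(), e.g.
    | # (a hypothetical example) lambda row: row.get("id.resp_p") == 80 would
    | # keep only records whose responder port is 80.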
212 |
213 | # The file we are processing.
214 | filename = args['filename']
215 |
216 | # Detect if the log is compressed or not.
217 | if filename.split(".")[-1].lower() == "gz":
218 | # gzip -d -c works on both Linux and macOS.
219 | zcat_name = ["gzip", "-d", "-c"]
220 | else:
221 | zcat_name = ["cat"]
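    | # Decompression runs in a separate process, so the head/grep stages below
    | # can stream from it without reading the whole log into memory.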
222 |
223 | # Set up the ingest pipeline.
224 | ingest_pipeline = {"description": "Zeek Log Ingestion Pipeline.", "processors": [ ]}
225 |
226 | if args['ingestion']:
227 | fields_to_split = []
228 | if len(args['splitfields']) > 0:
229 | fields_to_split = args['splitfields']
230 | ingest_pipeline["processors"] += [{"dot_expander": {"field": "*"}}]
231 | ingest_pipeline["processors"] += [{"split": {"field": "service", "separator": ",", "ignore_missing": True, "ignore_failure": True}}]
232 | for f in fields_to_split:
233 | ingest_pipeline["processors"] += [{"split": {"field": f, "separator": ",", "ignore_missing": True, "ignore_failure": True}}]
234 | ingest_pipeline["processors"] += [{"geoip": {"field": "id.orig_h", "target_field": "geoip_orig", "ignore_missing": True}}]
235 | ingest_pipeline["processors"] += [{"geoip": {"field": "id.resp_h", "target_field": "geoip_resp", "ignore_missing": True}}]
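    | # With no -p fields, the resulting pipeline expands dotted field names,
    | # splits the comma-separated "service" field, and geolocates id.orig_h and
    | # id.resp_h into geoip_orig and geoip_resp.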
236 |
237 | # This section takes care of TSV logs. Skip ahead for the JSON logic.
238 | if not args['jsonlogs']:
239 | # Get the date
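    | # Equivalent shell: zcat <file> | head | grep '#open'; Zeek's #open header
    | # line carries the log's open timestamp.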
240 |
241 | zcat_process = subprocess.Popen(zcat_name+[filename],
242 | stdout=subprocess.PIPE)
243 |
244 | head_process = subprocess.Popen(['head'],
245 | stdin=zcat_process.stdout,
246 | stdout=subprocess.PIPE)
247 |
248 | grep_process = subprocess.Popen(['grep', '#open'],
249 | stdin=head_process.stdout,
250 | stdout=subprocess.PIPE)
251 |
252 | try:
253 | log_date = datetime.datetime.strptime(grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1], "%Y-%m-%d-%H-%M-%S")
254 | except Exception:
255 | if not args['supresswarnings']:
256 | print("Date not found from Zeek log! {}".format(filename))
257 | exit(-4)
258 |
259 | # Get the Zeek log path
260 |
261 | zcat_process = subprocess.Popen(zcat_name+[filename],
262 | stdout=subprocess.PIPE)
263 |
264 | head_process = subprocess.Popen(['head'],
265 | stdin=zcat_process.stdout,
266 | stdout=subprocess.PIPE)
267 |
268 | grep_process = subprocess.Popen(['grep', '#path'],
269 | stdin=head_process.stdout,
270 | stdout=subprocess.PIPE)
271 |
272 | zeek_log_path = grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1]
273 |
274 | # Build the ES index.
275 | if not args['esindex']:
276 | if args['datastream'] > 0:
277 | es_index = "logs-zeek-{}".format(zeek_log_path)
278 | else:
279 | sysname = ""
280 | if (len(args['name']) > 0):
281 | sysname = "{}_".format(args['name'])
282 | # We allow for hashes instead of dates in the index name.
283 | if not args['hashdates']:
284 | es_index = "zeek_"+sysname+"{}_{}".format(zeek_log_path, log_date.date())
285 | else:
286 | es_index = "zeek_"+sysname+"{}_{}".format(zeek_log_path, random.getrandbits(hashbits))
287 | else:
288 | es_index = args['esindex']
289 |
290 | es_index = es_index.replace(':', '_').replace("/", "_")
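    | # For example, a conn log dated 2021-06-01 from system "sensor1" becomes
    | # zeek_sensor1_conn_2021-06-01, or logs-zeek-conn when -d is used.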
291 |
292 | # Get the Zeek fields from the log file.
293 |
294 | zcat_process = subprocess.Popen(zcat_name+[filename],
295 | stdout=subprocess.PIPE)
296 |
297 | head_process = subprocess.Popen(['head'],
298 | stdin=zcat_process.stdout,
299 | stdout=subprocess.PIPE)
300 |
301 | grep_process = subprocess.Popen(['grep', '#fields'],
302 | stdin=head_process.stdout,
303 | stdout=subprocess.PIPE)
304 |
305 | fields = grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1:]
306 |
307 | # Get the Zeek types from the log file.
308 |
309 | zcat_process = subprocess.Popen(zcat_name+[filename],
310 | stdout=subprocess.PIPE)
311 |
312 | head_process = subprocess.Popen(['head'],
313 | stdin=zcat_process.stdout,
314 | stdout=subprocess.PIPE)
315 |
316 | grep_process = subprocess.Popen(['grep', '#types'],
317 | stdin=head_process.stdout,
318 | stdout=subprocess.PIPE)
319 |
320 | types = grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1:]
321 |
322 | # Read TSV
323 |
324 | zcat_process = subprocess.Popen(zcat_name+[filename],
325 | stdout=subprocess.PIPE)
326 |
327 | grep_process = subprocess.Popen(['grep', '-E', '-v', '^#'],
328 | stdin=zcat_process.stdout,
329 | stdout=subprocess.PIPE)
330 |
331 | # Raise the CSV field size limit so unusually long Zeek fields do not abort parsing.
332 | csv.field_size_limit(sys.maxsize)
333 |
334 | # Only process if we have a valid log file.
335 | if len(types) > 0 and len(fields) > 0:
336 | read_tsv = csv.reader(io.TextIOWrapper(grep_process.stdout), delimiter="\t", quoting=csv.QUOTE_NONE)
337 |
338 | # Put mappings
339 |
340 | mappings = {"mappings": {"properties": dict(geoip_orig=dict(properties=dict(location=dict(type="geo_point"))), geoip_resp=dict(properties=dict(location=dict(type="geo_point"))))}}
341 |
342 | for i in range(len(fields)):
343 | if types[i] == "time":
344 | mappings["mappings"]["properties"][fields[i]] = {"type": "date"}
345 | elif types[i] == "addr":
346 | mappings["mappings"]["properties"][fields[i]] = {"type": "ip"}
347 | elif types[i] == "string":
348 | # Special cases
349 | if fields[i] in keywords:
350 | mappings["mappings"]["properties"][fields[i]] = {"type": "text", "fields": { "keyword": { "type": "keyword" }}}
351 | else:
352 | mappings["mappings"]["properties"][fields[i]] = {"type": "text"}
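    | # For example, a "ts" column of Zeek type time maps to {"type": "date"},
    | # and a string field named with -k gets a .keyword subfield so it can be
    | # aggregated as well as searched.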
353 |
354 | # Put index template for data stream
355 |
356 | if args["datastream"] > 0:
357 | senddatastream(args, es_index, mappings)
358 |
359 | # Put data
360 |
361 | putmapping = False
362 | putpipeline = False
363 | n = 0
364 | items = 0
365 | outstring = ""
366 | ofl = len(outputfields)
367 |
368 | # Iterate through every row in the TSV.
369 | for row in read_tsv:
370 | # Build the dict and fill in any default info.
371 | d = dict(zeek_log_filename=filename, zeek_log_path=zeek_log_path)
372 | if (len(args['name']) > 0):
373 | d["zeek_log_system_name"] = args['name']
374 | i = 0
375 | added_val = False
376 |
377 | # For each column in the row.
378 | for col in row:
379 | # Convert each column according to its Zeek type. If output fields were
380 | # named on the command line, only those fields are kept.
381 | if types[i] == "time":
382 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields):
383 | gmt_mydt = datetime.datetime.fromtimestamp(float(col), tz=datetime.timezone.utc) # timezone-aware, so .timestamp() below stays UTC
384 | if not args['timestamp']:
385 | d[fields[i]] = "{}T{}".format(gmt_mydt.date(), gmt_mydt.time())
386 | else:
387 | if args['origtime']:
388 | d[fields[i]] = gmt_mydt.timestamp()
389 | else:
390 | d[fields[i]] = gmt_mydt.timestamp()*1000 # ES expects epoch milliseconds
391 | added_val = True
392 | elif types[i] == "interval" or types[i] == "double":
393 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields):
394 | d[fields[i]] = float(col)
395 | added_val = True
396 | elif types[i] == "bool":
397 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields):
398 | d[fields[i]] = col == "T"
399 | added_val = True
400 | elif types[i] == "port" or types[i] == "count" or types[i] == "int":
401 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields):
402 | d[fields[i]] = int(col)
403 | added_val = True
404 | elif types[i].startswith("vector") or types[i].startswith("set"):
405 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields):
406 | d[fields[i]] = col.split(",")
407 | added_val = True
408 | else:
409 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields):
410 | d[fields[i]] = col
411 | added_val = True
412 | i += 1
413 |
414 | # Only add rows that have a timestamp; when filter keys are in use, the key field's value must be in the set (.get avoids a KeyError when the field is absent).
415 | if added_val and "ts" in d and (not filterkeys_field or d.get(filterkeys_field) in filterkeys):
416 | # This is the Python function filtering logic.
417 | filter_data = False
418 | if filterfilter:
419 | output = list(filter(filterfilter, [d]))
420 | if len(output) == 0:
421 | filter_data = True
422 |
423 | # If we haven't filtered using the Python filter function...
424 | if not filter_data:
425 | # Log the keys to a file, if desired.
426 | i = 0
427 | for lkf in logkeyfields:
428 | lkfd = logkeys_fds[i]
429 | if lkf in d:
430 | if isinstance(d[lkf], list):
431 | for z in d[lkf]:
432 | lkfd.write(z)
433 | lkfd.write("\n")
434 | else:
435 | lkfd.write(d[lkf])
436 | lkfd.write("\n")
437 | i += 1
438 |
439 | # Create the bulk header.
440 | if not args['nobulk']:
441 | bulk_header = dict(create=dict(_index=es_index))
442 | if len(ingest_pipeline["processors"]) > 0:
443 | bulk_header["create"]["pipeline"] = "zeekgeoip"
444 | outstring += json.dumps(bulk_header)+"\n"
445 | # Prepare the output and increment counters
446 | if args['humio']:
447 | d['ts'] = d['ts'] + "Z"
448 | if "_write_ts" in d:
449 | d['_write_ts'] = d['_write_ts'] + "Z"
450 | else:
451 | d["_write_ts"] = d["ts"]
452 | if "_path" not in d:
453 | d["_path"] = zeek_log_path
454 | if (len(args['name'].strip()) > 0):
455 | d["_system_name"] = args['name'].strip()
456 | d["@timestamp"] = d["ts"]
457 | outstring += json.dumps(d)+"\n"
458 | n += 1
459 | items += 1
460 | # If we aren't using stdout, prepare the ES index/datastream.
461 | if not args['stdout']:
462 | if not putmapping:
463 | sendmappings(args, es_index, mappings)
464 | putmapping = True
465 | if not putpipeline and len(ingest_pipeline["processors"]) > 0:
466 | sendpipeline(args, ingest_pipeline)
467 | putpipeline = True
468 |
469 | # Once we have buffered at least args['lines'] lines, send them to ES.
470 | if n >= args['lines'] and len(outstring) > 0:
471 | sendbulk(args, outstring, es_index, filename)
472 | outstring = ""
473 | n = 0
474 |
475 | # Send any remaining lines that did not fill a full batch.
476 | if n != 0 and len(outstring) > 0:
477 | sendbulk(args, outstring, es_index, filename)
478 | else:
479 | # This does everything the TSV version does, but for JSON
480 | # Read JSON log
481 | zcat_process = subprocess.Popen(zcat_name+[filename],
482 | stdout=subprocess.PIPE)
483 | j_in = io.TextIOWrapper(zcat_process.stdout)
484 |
485 | zeek_log_path = ""
486 | items = 0
487 | n = 0
488 | outstring = ""
489 | es_index = ""
490 |
491 | # Put mappings
492 |
493 | mappings = {"mappings": {"properties": dict(ts=dict(type="date"), geoip_orig=dict(properties=dict(location=dict(type="geo_point"))),
494 | geoip_resp=dict(properties=dict(location=dict(type="geo_point"))))}}
495 | mappings["mappings"]["properties"]["id.orig_h"] = {"type": "ip"}
496 | mappings["mappings"]["properties"]["id.resp_h"] = {"type": "ip"}
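    | # JSON logs carry no #types header, so only ts and the two address fields
    | # get explicit mappings; everything else is left to ES dynamic mapping.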
497 | putmapping = False
498 | putpipeline = False
499 | putdatastream = False
500 |
501 | # Read the JSON log line by line until EOF.
502 | while True:
503 | line = j_in.readline()
504 |
505 | # An empty read means EOF, so leave the loop.
506 | if not line:
507 | break
508 |
509 | # Load our data so we can process it.
510 | j_data = json.loads(line)
511 |
512 | # Only process data that has a timestamp field.
513 | if "ts" in j_data:
514 | # Here we deal with the time output format.
515 | gmt_mydt = datetime.datetime.fromtimestamp(float(j_data["ts"]), tz=datetime.timezone.utc) # timezone-aware, so .timestamp() below stays UTC
516 |
517 | if not args['timestamp']:
518 | j_data["ts"] = "{}T{}".format(gmt_mydt.date(), gmt_mydt.time())
519 | else:
520 | if args['origtime']:
521 | j_data["ts"] = gmt_mydt.timestamp()
522 | else:
523 | # ES uses ms
524 | j_data["ts"] = gmt_mydt.timestamp()*1000
525 |
526 | # On the first record we do not yet have an es_index name, so build one here.
527 | if es_index == "":
528 | sysname = ""
529 |
530 | if (len(args['name']) > 0):
531 | sysname = "{}_".format(args['name'])
532 |
533 | # Since the JSON logs do not include the Zeek log path, we try to guess it from the name.
534 | try:
535 | zeek_log_path = re.search(r".*/([^._]+).*", filename).group(1).lower()
536 | except Exception:
537 | print("Log path cannot be found from filename: {}".format(filename))
538 | exit(-5)
539 |
540 | # We allow for hashes instead of dates in our index name.
541 | if not args['hashdates']:
542 | es_index = "zeek_{}{}_{}".format(sysname, zeek_log_path, gmt_mydt.date())
543 | else:
544 | es_index = "zeek_{}{}_{}".format(sysname, zeek_log_path, random.getrandbits(hashbits))
545 |
546 | es_index = es_index.replace(':', '_').replace("/", "_")
547 |
548 | # If we are not sending the data to stdout, we prepare the ES index or datastream.
549 | if not args['stdout']:
550 | if not putmapping:
551 | sendmappings(args, es_index, mappings)
552 | putmapping = True
553 | if not putpipeline and len(ingest_pipeline["processors"]) > 0:
554 | sendpipeline(args, ingest_pipeline)
555 | putpipeline = True
556 | if args["datastream"] > 0 and not putdatastream:
557 | senddatastream(args, es_index, mappings)
558 | putdatastream = True
559 |
560 | # We add the system name, if desired.
561 | if (len(args['name']) > 0):
562 | j_data["zeek_log_system_name"] = args['name']
563 |
564 | # Check whether the filter keys, if any, admit this record (.get avoids a KeyError when the field is absent).
565 | if not filterkeys_field or j_data.get(filterkeys_field) in filterkeys:
566 | # This check below is for the Python filters.
567 | filter_data = False
568 | if filterfilter:
569 | output = list(filter(filterfilter, [j_data]))
570 | if len(output) == 0:
571 | filter_data = True
572 |
573 | if not filter_data:
574 | # We log the keys, if so desired.
575 | i = 0
576 | for lkf in logkeyfields:
577 | lkfd = logkeys_fds[i]
578 | if lkf in j_data:
579 | if isinstance(j_data[lkf], list):
580 | for z in j_data[lkf]:
581 | lkfd.write(z)
582 | lkfd.write("\n")
583 | else:
584 | lkfd.write(j_data[lkf])
585 | lkfd.write("\n")
586 | i += 1
587 | items += 1
588 |
589 | if not args['nobulk']:
590 | bulk_header = dict(create=dict(_index=es_index))
591 | if len(ingest_pipeline["processors"]) > 0:
592 | bulk_header["create"]["pipeline"] = "zeekgeoip"
593 | outstring += json.dumps(bulk_header)+"\n"
594 | j_data["@timestamp"] = j_data["ts"]
595 | # Here we only include the output fields identified via the command line.
596 | if len(outputfields) > 0:
597 | new_j_data = {}
598 | for o in outputfields:
599 | if o in j_data:
600 | new_j_data[o] = j_data[o]
601 | j_data = new_j_data
602 | outstring += json.dumps(j_data) + "\n"
603 | n += 1
604 |
605 | # Here we output a set of lines to the ES server.
606 | if n >= args['lines'] and len(outstring) > 0:
607 | sendbulk(args, outstring, es_index, filename)
608 | outstring = ""
609 | n = 0
610 |
611 | # We send the last of the data to the ES server, if there is any left.
612 | if n != 0 and len(outstring) > 0:
613 | sendbulk(args, outstring, es_index, filename)
614 |
615 | # Entry point: run main() directly, or through the compiled zeek2es module when the cython option is set.
616 | if __name__ == "__main__":
617 | args = parseargs()
618 | if args.cython:
619 | import zeek2es
620 | zeek2es.main(**vars(args))
621 | else:
622 | main(**vars(args))
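    | # Hypothetical invocation (the actual flags are defined by parseargs
    | # earlier in this file): python zeek2es.py conn.log.gz
    | # would stream one TSV conn log into per-day zeek_conn_* indices.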
--------------------------------------------------------------------------------