├── .gitignore ├── CHANGES ├── LICENSE ├── Readme.md ├── docker ├── .env ├── data │ └── .empty ├── docker-compose.yml └── zeek2es │ ├── Dockerfile │ └── entrypoint.sh ├── images ├── kibana-aggregation.png ├── kibana-map.png ├── kibana-subnet-search.png ├── kibana-timeseries.png ├── kibana.png └── multi-log-correlation.png ├── process_log.sh ├── process_logs_as_datastream.sh ├── process_logs_to_stdout.sh ├── setup.py └── zeek2es.py /.gitignore: -------------------------------------------------------------------------------- 1 | build/ 2 | *.so 3 | *.c 4 | .DS_Store 5 | docker/data 6 | -------------------------------------------------------------------------------- /CHANGES: -------------------------------------------------------------------------------- 1 | v0.3.15 Improved Humio import. 2 | v0.3.14 Removed a print statement. 3 | v0.3.13 Fixed some errors on Humio import. 4 | v0.3.12 Will continue to populate data after a Humio error. 5 | v0.3.11 Added Humio support. 6 | v0.3.10 Improved Docker components. 7 | v0.3.9 Fixed a variable check when there is no output. 8 | v0.3.8 Fixed up some minor issues with JSON stdout output. 9 | v0.3.7 Added Docker pieces. 10 | v0.3.6 Fixed a bug with the slash on the end of the ES url option. 11 | v0.3.5 Removed need for trailing slash on ES URL. 12 | v0.3.4 Made datastream names consistent with ES expectations if -d is used without an index name. 13 | v0.3.3 Added best compression option and fixed helper script. 14 | v0.3.2 Fixed a bug with a grep command. 15 | v0.3.1 Added more logic to make ready for Elastic v8. 16 | v0.3.0 Added filtering on keys. Cleaned up some argparse logic, breaking previous command lines. 17 | v0.2.20 Fix wording. 18 | v0.2.19 Fix a bug in a helper script. 19 | v0.2.18 Added the -p command line argument to split additional fields. 20 | v0.2.17 Fixed various things in the help scripts. Refactor. 21 | v0.2.16 Fixed a typo in a helper script. 22 | v0.2.15 Refactor helper script. 23 | v0.2.14 Added a fswatch helper script. 24 | v0.2.13 Refactored the helper script. 25 | v0.2.12 Added a supporting shell script for data streams. 26 | v0.2.11 Fixed a mapping issue with data streams. 27 | v0.2.10 Fixed help screen output. 28 | v0.2.9 Added hashdates option to use random hashes instead of dates in indices. 29 | v0.2.8 Added lifecycle policy for shard size rollover. 30 | v0.2.7 Added data stream capability. 31 | v0.2.6 Added capability to output only certain fields. 32 | v0.2.5 Added Cython and Python lambda filtering capabilities. 33 | v0.2.4 Added error checking for empty field. 34 | v0.2.3 Added keyword sub field capabilities with -k option. 35 | Added more documentation to readme. 36 | v0.2.2 Added a split ingest pipeline on the "service" field. 37 | v0.2.1 Added ES pipeline capability, which allows for Geolocation on IP addresses. 38 | v0.2.0 Removed some index checking, made indices on log type and day to 39 | reduce the number of open indices. Remove state documents. 40 | Other odds and ends. Added @timestamp for ease. 41 | v0.1.16 Added JSON input support with -j. 42 | v0.1.15 Fix a bug with timezone translation. 43 | v0.1.14 Add timezone support. 44 | v0.1.13 Tune down the -l parameter. 45 | v0.1.12 Added origtime command line option. 46 | v0.1.11 Improvements to processing speed. 47 | v0.1.10 Add option to keep original times. 48 | v0.1.9 Remove stderr output from zeek-cut. 49 | v0.1.8 Added system name to log, if available. 50 | v0.1.7 Improved index name generation. 51 | v0.1.6 Get date from log rather than path. 
52 | v0.1.5 Added more debug output. 53 | v0.1.4 Added some error checking. 54 | v0.1.3 Added number of items processed to state document. 55 | v0.1.2 Added state information and --checkstate command line option. 56 | v0.1.1 Added file name to JSON documents. 57 | v0.1.0 Initial release. 58 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2021, Corelight, Inc. All rights reserved. 2 | 3 | Redistribution and use in source and binary forms, with or without 4 | modification, are permitted provided that the following conditions are 5 | met: 6 | 7 | (1) Redistributions of source code must retain the above copyright 8 | notice, this list of conditions and the following disclaimer. 9 | 10 | (2) Redistributions in binary form must reproduce the above copyright 11 | notice, this list of conditions and the following disclaimer in 12 | the documentation and/or other materials provided with the 13 | distribution. 14 | 15 | (3) Neither the name of Corelight nor the names of any contributors 16 | may be used to endorse or promote products derived from this 17 | software without specific prior written permission. 18 | 19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 20 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 21 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 22 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 23 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 24 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 25 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 26 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 27 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 28 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /Readme.md: -------------------------------------------------------------------------------- 1 | # zeek2es.py 2 | 3 | This Python application translates [Zeek's](https://zeek.org/) ASCII TSV and JSON 4 | logs into [ElasticSearch's bulk load JSON format](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html#add-multiple-documents). 5 | 6 | ## Table of Contents: 7 | - [Introduction](#introduction) 8 | - [Installation](#installation) 9 | - [Elastic v8.0+](#elastic80) 10 | - [Docker](#docker) 11 | - [Upgrading zeek2es](#upgradingzeek2es) 12 | - [ES Ingest Pipeline](#esingestpipeline) 13 | - [Filtering Data](#filteringdata) 14 | - [Python Filters](#pythonfilters) 15 | - [Filter on Keys](#filteronkeys) 16 | - [Command Line Examples](#commandlineexamples) 17 | - [Command Line Options](#commandlineoptions) 18 | - [Requirements](#requirements) 19 | - [Notes](#notes) 20 | - [Humio](#humio) 21 | - [JSON Log Input](#jsonloginput) 22 | - [Data Streams](#datastreams) 23 | - [Helper Scripts](#helperscripts) 24 | - [Cython](#cython) 25 | 26 | ## Introduction 27 | 28 | ![Kibana](images/kibana.png) 29 | 30 | Want to see multiple Zeek logs for the same connection ID (uid) 31 | or file ID (fuid)? 
Here are the hits from files.log, http.log, and 32 | conn.log for a single uid: 33 | 34 | ![Kibana](images/multi-log-correlation.png) 35 | 36 | You can perform subnet searching on Zeek's 'addr' type: 37 | 38 | ![Kibana Subnet Searching](images/kibana-subnet-search.png) 39 | 40 | You can create time series graphs, such as this NTP and HTTP graph: 41 | 42 | ![Kibana Time Series](images/kibana-timeseries.png) 43 | 44 | IP addresses can be geolocated with the `-g` command line option: 45 | 46 | ![Kibana Mapping](images/kibana-map.png) 47 | 48 | Aggregations are simple and quick: 49 | 50 | ![Kibana Aggregation](images/kibana-aggregation.png) 51 | 52 | This application will "just work" when Zeek log formats change. The logic reads 53 | the field names and associated types to set up the mappings correctly in 54 | ElasticSearch. 55 | 56 | This application will recognize gzipped or uncompressed logs. This application assumes 57 | you have ElasticSearch set up on your localhost at the default port. 58 | If you do not have ElasticSearch you can output the JSON to stdout with the `-s -b` command line options 59 | to process with the [jq application](https://stedolan.github.io/jq). 60 | 61 | You can add a keyword subfield to text fields with the `-k` command line option. This is useful 62 | for aggregations in Kibana. 63 | 64 | If Python is already on your system, there is nothing additional to copy over 65 | to your machine other than [Elasticsearch, Kibana](https://www.elastic.co/start) and [zeek2es.py](zeek2es.py), 66 | provided you already have the [requests](https://docs.python-requests.org/en/latest/) library installed. 67 | 68 | ## Installation 69 | 70 | Assuming you meet the [requirements](#requirements), there is none. You just 71 | copy [zeek2es.py](zeek2es.py) to your host and run it with Python. Once Zeek 72 | logs have been imported with automatic index name generation (meaning, you did not supply the `-i` option) 73 | you will find your indices named "zeek_`zeeklogname`_`date`", where `zeeklogname` is a log name like `conn` 74 | and the `date` is in `YYYY-MM-DD` format. Set your Kibana index pattern to match `zeek*` in this case. If 75 | you named your index with the `-i` option, you will need to create a Kibana index pattern that 76 | matches your naming scheme. 77 | 78 | If you are upgrading zeek2es, please see [the section on upgrading zeek2es](#upgradingzeek2es). 79 | 80 | ### Elastic v8.0+ 81 | 82 | If you are using Elastic v8.0+, it has security enabled by default. This adds a requirement of a username 83 | and password, plus HTTPS. 84 | 85 | If you want to be able to delete indices/data streams with wildcards (as examples in this readme show), 86 | edit `elasticsearch.yml` with the following line: 87 | 88 | ``` 89 | action.destructive_requires_name: false 90 | ``` 91 | 92 | You will also need to change the curl commands in this readme to contain `-k -u elastic:` 93 | where the `elastic` user's password is set with a command like the following: 94 | 95 | ``` 96 | ./bin/elasticsearch-reset-password -u elastic -i 97 | ``` 98 | 99 | You can use `zeek2es.py` with the `--user` and `--passwd` command line options to specify your 100 | credentials to ES. You can also supply these options via the extra command line arguments for the helper 101 | scripts. 102 | 103 | ### Docker 104 | 105 | Probably the easiest way to use this code is through Docker. All of the files are in the `docker` directory.
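For reference, that directory is laid out as follows:

```
docker
├── .env
├── data
│   └── .empty
├── docker-compose.yml
└── zeek2es
    ├── Dockerfile
    └── entrypoint.sh
```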
106 | First, you will want to edit the lines with `CHANGEME!!!` in the `.env` file to fit your environment. 107 | You will also need to edit the Elastic password in `docker/zeek2es/entrypoint.sh` to match. It can be found after the `--passwd` option. 108 | Next, you can change directory into the `docker` directory and type the following commands to bring 109 | up a zeek2es and Elasticsearch cluster: 110 | 111 | ``` 112 | docker-compose build 113 | docker-compose up 114 | ``` 115 | 116 | Now you can put logs in the `VOLUME_MOUNT/data/logs` directory (where `VOLUME_MOUNT` is the directory you set in the `.env` file). 117 | When logs are CREATED in this directory, zeek2es will begin processing them and pushing them into Elasticsearch. 118 | You can then log in to https://localhost:5601 with the `elastic` user and the password you set up in the `.env` file. 119 | By default there is a self-signed certificate, but you can change that if you edit the docker compose files. Once inside 120 | Kibana you will go to Stack Management->Data Views and create a data view for `logs*` with the timestamp field `@timestamp`. 121 | Now you will be able to go to Discover and start searching your logs! Your data is persistent in the `VOLUME_MOUNT/data` directory you set. 122 | If you would like to remove all data, just `rm -rf VOLUME_MOUNT/data`, substituting the directory you set into that remove command. 123 | The next time you start your cluster it will be brand new for more data. 124 | 125 | ## Upgrading zeek2es 126 | 127 | Most upgrades should be as simple as copying the newer [zeek2es.py](zeek2es.py) over 128 | the old one. In some cases, the ES ingest pipeline required for the `-g` command line option 129 | might change during an upgrade. Therefore, it is strongly recommended that you delete 130 | your [ingest pipeline](#esingestpipeline) before you run a new version of zeek2es.py. 131 | 132 | ### ES Ingest Pipeline 133 | 134 | If you need to [delete the "zeekgeoip" ES ingest pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-pipeline-api.html) 135 | used to geolocate IP addresses with the `-g` command line option, you can either do it graphically 136 | through Kibana's Stack Management->Ingest Pipelines or this command will do it for you: 137 | 138 | ``` 139 | curl -X DELETE "localhost:9200/_ingest/pipeline/zeekgeoip?pretty" 140 | ``` 141 | 142 | Running this command is strongly recommended whenever you update your copy of zeek2es.py. 143 | 144 | ## Filtering Data 145 | 146 | ### Python Filters 147 | 148 | zeek2es provides filtering capabilities for your Zeek logs before they are stored in ElasticSearch. This 149 | functionality can be enabled with the `-a` or `-f` options. The filters are constructed from Python 150 | lambda functions, where the input is a Python dictionary representing the output JSON document. You can add a 151 | filter to only store connection logs where the `service` field is populated using the `-f` option with 152 | this lambda filter file: 153 | 154 | ``` 155 | lambda x: 'service' in x and len(x['service']) > 0 156 | ``` 157 | 158 | Or maybe you'd like to filter for connections that have more than 1,024 bytes total, with at least 1 byte coming from 159 | the destination: 160 | 161 | ``` 162 | lambda x: 'orig_ip_bytes' in x and 'resp_ip_bytes' in x and x['orig_ip_bytes'] + x['resp_ip_bytes'] > 1024 and x['resp_ip_bytes'] > 0 163 | ``` 164 |
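To apply a filter file, pass its path with the `-f` option. For example, assuming one of the lambdas above is saved to a file named `conn_filter.txt` (an arbitrary name here, though it matches the naming convention the helper scripts look for), a connection log could be imported through it like this:

```
python zeek2es.py conn.log.gz -f conn_filter.txt
```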
165 | Simpler lambda filters can be provided on the command line via the `-a` option. This filter will only store 166 | connection log entries where the originator IP address is part of the `192.0.0.0/8` network: 167 | 168 | ``` 169 | python zeek2es.py conn.log.gz -a "lambda x: 'id.orig_h' in x and ipaddress.ip_address(x['id.orig_h']) in ipaddress.ip_network('192.0.0.0/8')" 170 | ``` 171 | 172 | For power users, the `-f` option will allow you to define a full function (instead of Python's lambda functions) so you can write functions that 173 | span multiple lines. 174 | 175 | ### Filter on Keys 176 | 177 | In some instances you might want to pull data from one log that depends on another. An 178 | example would be finding all `ssl.log` rows that have a `uid` matching previously 179 | indexed rows from `conn.log`, or vice versa. You can filter by importing your 180 | `conn.log` files with the `-o uid uid.txt` command line. This will log all uids that were 181 | indexed to a file named `uid.txt`. Then, when you import your `ssl.log` files you will provide 182 | the `-e uid uid.txt` command line. This will only import SSL rows 183 | containing `uid` values that are in `uid.txt`, previously built from our import of `conn.log`. 184 | 185 | ## Command Line Examples 186 | 187 | ``` 188 | python zeek2es.py your_zeek_log.gz -i your_es_index_name 189 | ``` 190 | 191 | This script can be run in parallel on all connection logs, 10 at a time, with the following command: 192 | 193 | ``` 194 | find /some/dir -name "conn*.log.gz" | parallel -j 10 python zeek2es.py {1} :::: - 195 | ``` 196 | 197 | If you would like to automatically import all conn.log files as they are created in a directory, the following 198 | [fswatch](https://emcrisostomo.github.io/fswatch/) command will do that for you: 199 | 200 | ``` 201 | fswatch -m poll_monitor --event Created -r /data/logs/zeek/ | awk '/^.*\/conn.*\.log\.gz$/' | parallel -j 5 python ~/zeek2es.py {} -g -d 50 :::: - 202 | ``` 203 | 204 | If you have the jq command installed you can perform searches across all your logs for a common 205 | field like connection uid, even without ElasticSearch: 206 | 207 | ``` 208 | find /usr/local/var/logs -name "*.log.gz" -exec python ~/Source/zeek2es/zeek2es.py {} -s -b -z \; | jq -c '. | select(.uid=="CLbPij1vThLvQ2qDKh")' 209 | ``` 210 | 211 | You can use much more complex jq queries than this if you are familiar with jq. 212 | 213 | If you want to remove all of your Zeek data from ElasticSearch, this command will do it for you: 214 | 215 | ``` 216 | curl -X DELETE http://localhost:9200/zeek* 217 | ``` 218 | 219 | Since the indices have the date appended to them, you could 220 | delete Dec 31, 2021 with the following command: 221 | 222 | ``` 223 | curl -X DELETE http://localhost:9200/zeek_*_2021-12-31 224 | ``` 225 | 226 | You could delete all conn.log entries with this command: 227 | 228 | ``` 229 | curl -X DELETE http://localhost:9200/zeek_conn_* 230 | ``` 231 | 232 | ## Command Line Options 233 | 234 | ``` 235 | $ python zeek2es.py -h 236 | usage: zeek2es.py [-h] [-i ESINDEX] [-u ESURL] [--user USER] [--passwd PASSWD] 237 | [-l LINES] [-n NAME] [-k KEYWORDS [KEYWORDS ...]] 238 | [-a LAMBDAFILTER] [-f FILTERFILE] 239 | [-y OUTPUTFIELDS [OUTPUTFIELDS ...]] [-d DATASTREAM] 240 | [--compress] [-o fieldname filename] [-e fieldname filename] 241 | [-g] [-p SPLITFIELDS [SPLITFIELDS ...]] [-j] [-r] [-t] [-s] 242 | [-b] [--humio HUMIO HUMIO] [-c] [-w] [-z] 243 | filename 244 | 245 | Process Zeek ASCII logs into ElasticSearch. 246 | 247 | positional arguments: 248 | filename The Zeek log in *.log or *.gz format.
Include the full path. 249 | 250 | optional arguments: 251 | -h, --help show this help message and exit 252 | -i ESINDEX, --esindex ESINDEX 253 | The Elasticsearch index/data stream name. 254 | -u ESURL, --esurl ESURL 255 | The Elasticsearch URL. Use ending slash. Use https for Elastic v8+. (default: http://localhost:9200) 256 | --user USER The Elasticsearch user. (default: disabled) 257 | --passwd PASSWD The Elasticsearch password. Note this will put your password in this shell history file. (default: disabled) 258 | -l LINES, --lines LINES 259 | Lines to buffer for RESTful operations. (default: 10,000) 260 | -n NAME, --name NAME The name of the system to add to the index for uniqueness. (default: empty string) 261 | -k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...] 262 | A list of text fields to add a keyword subfield. (default: service) 263 | -a LAMBDAFILTER, --lambdafilter LAMBDAFILTER 264 | A Python lambda function, when eval'd will filter your output JSON dict. (default: empty string) 265 | -f FILTERFILE, --filterfile FILTERFILE 266 | A Python function file, when eval'd will filter your output JSON dict. (default: empty string) 267 | -y OUTPUTFIELDS [OUTPUTFIELDS ...], --outputfields OUTPUTFIELDS [OUTPUTFIELDS ...] 268 | A list of fields to keep for the output. Must include ts. (default: empty string) 269 | -d DATASTREAM, --datastream DATASTREAM 270 | Instead of an index, use a data stream that will rollover at this many GB. 271 | Recommended is 50 or less. (default: 0 - disabled) 272 | --compress If a datastream is used, enable best compression. 273 | -o fieldname filename, --logkey fieldname filename 274 | A field to log to a file. Example: uid uid.txt. 275 | Will append to the file! Delete file before running if appending is undesired. 276 | This option can be called more than once. (default: empty - disabled) 277 | -e fieldname filename, --filterkeys fieldname filename 278 | A field to filter with keys from a file. Example: uid uid.txt. (default: empty string - disabled) 279 | -g, --ingestion Use the ingestion pipeline to do things like geolocate IPs and split services. Takes longer, but worth it. 280 | -p SPLITFIELDS [SPLITFIELDS ...], --splitfields SPLITFIELDS [SPLITFIELDS ...] 281 | A list of additional fields to split with the ingestion pipeline, if enabled. 282 | (default: empty string - disabled) 283 | -j, --jsonlogs Assume input logs are JSON. 284 | -r, --origtime Keep the numerical time format, not milliseconds as ES needs. 285 | -t, --timestamp Keep the time in timestamp format. 286 | -s, --stdout Print JSON to stdout instead of sending to Elasticsearch directly. 287 | -b, --nobulk Remove the ES bulk JSON header. Requires --stdout. 288 | --humio HUMIO HUMIO First argument is the Humio URL, the second argument is the ingest token. 289 | -c, --cython Use Cython execution by loading the local zeek2es.so file through an import. 290 | Run python setup.py build_ext --inplace first to make your zeek2es.so file! 291 | -w, --hashdates Use hashes instead of dates for the index name. 292 | -z, --supresswarnings 293 | Supress any type of warning. Die stoically and silently. 
294 | 295 | To delete indices: 296 | 297 | curl -X DELETE http://localhost:9200/zeek*?pretty 298 | 299 | To delete data streams: 300 | 301 | curl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty 302 | 303 | To delete index templates: 304 | 305 | curl -X DELETE http://localhost:9200/_index_template/zeek*?pretty 306 | 307 | To delete the lifecycle policy: 308 | 309 | curl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty 310 | 311 | You will need to add -k -u elastic_user:password if you are using Elastic v8+. 312 | ``` 313 | 314 | ## Requirements 315 | 316 | - A Unix-like environment (macOS works!) 317 | - Python 318 | - [requests](https://docs.python-requests.org/en/latest/) Python library installed, such as with `pip`. 319 | 320 | ## Notes 321 | 322 | ### Humio 323 | 324 | To import your data into Humio you will need to set up a repository with the `corelight-json` parser. Obtain 325 | the ingest token for the repository and you can import your data with a command such as: 326 | 327 | ``` 328 | python3 zeek2es.py -s -b --humio http://localhost:8080 b005bf74-1ed3-4871-904f-9460a4687202 http.log 329 | ``` 330 | 331 | The URL should be in the format `http://yourserver:8080`; the rest of the path is added automatically by the 332 | `zeek2es.py` script for you. 333 | 334 | ### JSON Log Input 335 | 336 | Since Zeek JSON logs do not have type information like the ASCII TSV versions, only limited type information 337 | can be provided to ElasticSearch. You will notice this most for Zeek "addr" log fields that 338 | are not id$orig_h and id$resp_h, since the type information is not available to translate the field into 339 | ElasticSearch's "ip" type. Since address fields will not be of type "ip", you will not be able to use 340 | subnet searches, for example, like you could for the TSV logs. Saving Zeek logs in ASCII TSV 341 | format provides greater long-term flexibility. 342 | 343 | ### Data Streams 344 | 345 | You can use data streams instead of indices for large logs with the `-d` command line option. This 346 | option creates index templates beginning with `zeek_`. It also creates a lifecycle policy 347 | named `zeek-lifecycle-policy`. If you would like to delete all of your data streams, lifecycle policies, 348 | and index templates, these commands will do it for you: 349 | 350 | ``` 351 | curl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty 352 | curl -X DELETE http://localhost:9200/_index_template/zeek*?pretty 353 | curl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty 354 | ``` 355 | 356 | ### Helper Scripts 357 | 358 | There are two scripts that will help you make your logs into data streams such as `logs-zeek-conn`. 359 | The first script is [process_logs_as_datastream.sh](process_logs_as_datastream.sh), which, given 360 | a list of logs and directories, will import them as data streams. The second script 361 | is [process_log.sh](process_log.sh), and it can be used to import logs 362 | one at a time. This script can also be used to monitor logs created in a directory with 363 | [fswatch](https://emcrisostomo.github.io/fswatch/). Both scripts have example command lines 364 | if you run them without any parameters. 365 | 366 | ``` 367 | $ ./process_logs_as_datastream.sh 368 | Usage: ./process_logs_as_datastream.sh NJOBS "ADDITIONAL_ARGS_TO_ZEEK2ES" "LIST_OF_LOGS_DELIMITED_BY_SPACES" DIR1 DIR2 ...
369 | 370 | Example: 371 | time ./process_logs_as_datastream.sh 16 "" "amqp bgp conn dce_rpc dhcp dns dpd files ftp http ipsec irc kerberos modbus modbus_register_change mount mqtt mysql nfs notice ntlm ntp ospf portmap radius reporter rdp rfb rip ripng sip smb_cmd smb_files smb_mapping smtp snmp socks ssh ssl stun syslog tunnel vpn weird wireguard x509" /usr/local/var/logs 372 | ``` 373 | 374 | ``` 375 | $ ./process_log.sh 376 | Usage: ./process_log.sh LOGFILENAME "ADDITIONAL_ARGS_TO_ZEEK2ES" 377 | 378 | Example: 379 | fswatch -m poll_monitor --event Created -r /data/logs/zeek | awk '/^.*\/(conn|dns|http)\..*\.log\.gz$/' | parallel -j 16 ./process_log.sh {} "" :::: - 380 | ``` 381 | 382 | You will need to edit these scripts and command lines according to your environment. 383 | 384 | Any file named after a log type, such as `conn_filter.txt`, in the `filter_file_dir` (by default your home directory) will be applied as a lambda 385 | filter file to the corresponding log input. This allows you to set up all of your filters in one directory and import multiple log files with 386 | that set of filters in one command with [process_logs_as_datastream.sh](process_logs_as_datastream.sh). 387 | 388 | The following lines should delete all Zeek data in ElasticSearch whether you use indices, 389 | data streams, or these helper scripts: 390 | 391 | ``` 392 | curl -X DELETE http://localhost:9200/zeek*?pretty 393 | curl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty 394 | curl -X DELETE http://localhost:9200/_data_stream/logs-zeek*?pretty 395 | curl -X DELETE http://localhost:9200/_index_template/zeek*?pretty 396 | curl -X DELETE http://localhost:9200/_index_template/logs-zeek*?pretty 397 | curl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty 398 | ``` 399 | 400 | ... or if using Elastic v8+ ... 401 | 402 | ``` 403 | curl -X DELETE -k -u elastic:password https://localhost:9200/zeek*?pretty 404 | curl -X DELETE -k -u elastic:password https://localhost:9200/_data_stream/zeek*?pretty 405 | curl -X DELETE -k -u elastic:password https://localhost:9200/_data_stream/logs-zeek*?pretty 406 | curl -X DELETE -k -u elastic:password https://localhost:9200/_index_template/zeek*?pretty 407 | curl -X DELETE -k -u elastic:password https://localhost:9200/_index_template/logs-zeek*?pretty 408 | curl -X DELETE -k -u elastic:password https://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty 409 | ``` 410 | 411 | But to be able to do this in v8+ you will need to configure Elastic as described 412 | in the section [Elastic v8.0+](#elastic80). 413 | 414 | ### Cython 415 | 416 | If you'd like to try [Cython](https://cython.org/), you must run `python setup.py build_ext --inplace` 417 | first to generate your compiled file. You must do this every time you update zeek2es! -------------------------------------------------------------------------------- /docker/.env: -------------------------------------------------------------------------------- 1 | # Password for the 'elastic' user (at least 6 characters) CHANGEME!!! 2 | ELASTIC_PASSWORD=elastic 3 | 4 | # Password for the 'kibana_system' user (at least 6 characters) CHANGEME!!!
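# (kibana_system is the internal account Kibana uses to connect to Elasticsearch; you still log in to Kibana as the 'elastic' user)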
5 | KIBANA_PASSWORD=elasticANDkibana 6 | 7 | # Version of Elastic products 8 | STACK_VERSION=8.1.3 9 | 10 | # Set the cluster name 11 | CLUSTER_NAME=docker-cluster 12 | 13 | # Set to 'basic' or 'trial' to automatically start the 30-day trial 14 | LICENSE=basic 15 | #LICENSE=trial 16 | 17 | # Port to expose Elasticsearch HTTP API to the host 18 | ES_PORT=9200 19 | #ES_PORT=127.0.0.1:9200 20 | 21 | # Port to expose Kibana to the host 22 | KIBANA_PORT=5601 23 | #KIBANA_PORT=80 24 | 25 | # Increase or decrease based on the available host memory (in bytes) 26 | MEM_LIMIT=1073741824 27 | 28 | # Project namespace (defaults to the current folder name if not set) 29 | #COMPOSE_PROJECT_NAME=myproject 30 | 31 | # Where the data directory resides for volumes CHANGEME!!! 32 | VOLUME_MOUNT=./ -------------------------------------------------------------------------------- /docker/data/.empty: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/docker/data/.empty -------------------------------------------------------------------------------- /docker/docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "2.2" 2 | 3 | services: 4 | setup: 5 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION} 6 | volumes: 7 | - certs:/usr/share/elasticsearch/config/certs 8 | user: "0" 9 | command: > 10 | bash -c ' 11 | if [ x${ELASTIC_PASSWORD} == x ]; then 12 | echo "Set the ELASTIC_PASSWORD environment variable in the .env file"; 13 | exit 1; 14 | elif [ x${KIBANA_PASSWORD} == x ]; then 15 | echo "Set the KIBANA_PASSWORD environment variable in the .env file"; 16 | exit 1; 17 | fi; 18 | if [ ! -f config/certs/ca.zip ]; then 19 | echo "Creating CA"; 20 | bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip; 21 | unzip config/certs/ca.zip -d config/certs; 22 | fi; 23 | if [ ! -f config/certs/certs.zip ]; then 24 | echo "Creating certs"; 25 | echo -ne \ 26 | "instances:\n"\ 27 | " - name: es01\n"\ 28 | " dns:\n"\ 29 | " - es01\n"\ 30 | " - localhost\n"\ 31 | " ip:\n"\ 32 | " - 127.0.0.1\n"\ 33 | " - name: es02\n"\ 34 | " dns:\n"\ 35 | " - es02\n"\ 36 | " - localhost\n"\ 37 | " ip:\n"\ 38 | " - 127.0.0.1\n"\ 39 | " - name: es03\n"\ 40 | " dns:\n"\ 41 | " - es03\n"\ 42 | " - localhost\n"\ 43 | " ip:\n"\ 44 | " - 127.0.0.1\n"\ 45 | " - name: kibana\n"\ 46 | " dns:\n"\ 47 | " - kibana\n"\ 48 | " - localhost\n"\ 49 | " ip:\n"\ 50 | " - 127.0.0.1\n"\ 51 | > config/certs/instances.yml; 52 | bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key; 53 | unzip config/certs/certs.zip -d config/certs; 54 | fi; 55 | echo "Setting file permissions" 56 | chown -R root:root config/certs; 57 | find . -type d -exec chmod 750 \{\} \;; 58 | find . 
-type f -exec chmod 640 \{\} \;; 59 | echo "Waiting for Elasticsearch availability"; 60 | until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done; 61 | echo "Setting kibana_system password"; 62 | until curl -s -X POST --cacert config/certs/ca/ca.crt -u elastic:${ELASTIC_PASSWORD} -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done; 63 | echo "All done!"; 64 | ' 65 | healthcheck: 66 | test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"] 67 | interval: 1s 68 | timeout: 5s 69 | retries: 120 70 | container_name: "setup" 71 | 72 | es01: 73 | depends_on: 74 | setup: 75 | condition: service_healthy 76 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION} 77 | restart: "unless-stopped" 78 | volumes: 79 | - certs:/usr/share/elasticsearch/config/certs 80 | - ${VOLUME_MOUNT}/data/es01:/usr/share/elasticsearch/data 81 | ports: 82 | - ${ES_PORT}:9200 83 | environment: 84 | - node.name=es01 85 | - cluster.name=${CLUSTER_NAME} 86 | - cluster.initial_master_nodes=es01,es02,es03 87 | - discovery.seed_hosts=es02,es03 88 | - ELASTIC_PASSWORD=${ELASTIC_PASSWORD} 89 | - bootstrap.memory_lock=true 90 | - xpack.security.enabled=true 91 | - xpack.security.http.ssl.enabled=true 92 | - xpack.security.http.ssl.key=certs/es01/es01.key 93 | - xpack.security.http.ssl.certificate=certs/es01/es01.crt 94 | - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt 95 | - xpack.security.http.ssl.verification_mode=certificate 96 | - xpack.security.transport.ssl.enabled=true 97 | - xpack.security.transport.ssl.key=certs/es01/es01.key 98 | - xpack.security.transport.ssl.certificate=certs/es01/es01.crt 99 | - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt 100 | - xpack.security.transport.ssl.verification_mode=certificate 101 | - xpack.license.self_generated.type=${LICENSE} 102 | mem_limit: ${MEM_LIMIT} 103 | ulimits: 104 | memlock: 105 | soft: -1 106 | hard: -1 107 | healthcheck: 108 | test: 109 | [ 110 | "CMD-SHELL", 111 | "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'", 112 | ] 113 | interval: 10s 114 | timeout: 10s 115 | retries: 120 116 | container_name: "es01" 117 | 118 | es02: 119 | depends_on: 120 | - es01 121 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION} 122 | restart: "unless-stopped" 123 | volumes: 124 | - certs:/usr/share/elasticsearch/config/certs 125 | - ${VOLUME_MOUNT}/data/es02:/usr/share/elasticsearch/data 126 | environment: 127 | - node.name=es02 128 | - cluster.name=${CLUSTER_NAME} 129 | - cluster.initial_master_nodes=es01,es02,es03 130 | - discovery.seed_hosts=es01,es03 131 | - bootstrap.memory_lock=true 132 | - xpack.security.enabled=true 133 | - xpack.security.http.ssl.enabled=true 134 | - xpack.security.http.ssl.key=certs/es02/es02.key 135 | - xpack.security.http.ssl.certificate=certs/es02/es02.crt 136 | - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt 137 | - xpack.security.http.ssl.verification_mode=certificate 138 | - xpack.security.transport.ssl.enabled=true 139 | - xpack.security.transport.ssl.key=certs/es02/es02.key 140 | - xpack.security.transport.ssl.certificate=certs/es02/es02.crt 141 | - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt 142 | - xpack.security.transport.ssl.verification_mode=certificate 143 | - 
xpack.license.self_generated.type=${LICENSE} 144 | mem_limit: ${MEM_LIMIT} 145 | ulimits: 146 | memlock: 147 | soft: -1 148 | hard: -1 149 | healthcheck: 150 | test: 151 | [ 152 | "CMD-SHELL", 153 | "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'", 154 | ] 155 | interval: 10s 156 | timeout: 10s 157 | retries: 120 158 | container_name: "es02" 159 | 160 | es03: 161 | depends_on: 162 | - es02 163 | image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION} 164 | restart: "unless-stopped" 165 | volumes: 166 | - certs:/usr/share/elasticsearch/config/certs 167 | - ${VOLUME_MOUNT}/data/es03:/usr/share/elasticsearch/data 168 | environment: 169 | - node.name=es03 170 | - cluster.name=${CLUSTER_NAME} 171 | - cluster.initial_master_nodes=es01,es02,es03 172 | - discovery.seed_hosts=es01,es02 173 | - bootstrap.memory_lock=true 174 | - xpack.security.enabled=true 175 | - xpack.security.http.ssl.enabled=true 176 | - xpack.security.http.ssl.key=certs/es03/es03.key 177 | - xpack.security.http.ssl.certificate=certs/es03/es03.crt 178 | - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt 179 | - xpack.security.http.ssl.verification_mode=certificate 180 | - xpack.security.transport.ssl.enabled=true 181 | - xpack.security.transport.ssl.key=certs/es03/es03.key 182 | - xpack.security.transport.ssl.certificate=certs/es03/es03.crt 183 | - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt 184 | - xpack.security.transport.ssl.verification_mode=certificate 185 | - xpack.license.self_generated.type=${LICENSE} 186 | mem_limit: ${MEM_LIMIT} 187 | ulimits: 188 | memlock: 189 | soft: -1 190 | hard: -1 191 | healthcheck: 192 | test: 193 | [ 194 | "CMD-SHELL", 195 | "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'", 196 | ] 197 | interval: 10s 198 | timeout: 10s 199 | retries: 120 200 | container_name: "es03" 201 | 202 | kibana: 203 | depends_on: 204 | es01: 205 | condition: service_healthy 206 | es02: 207 | condition: service_healthy 208 | es03: 209 | condition: service_healthy 210 | image: docker.elastic.co/kibana/kibana:${STACK_VERSION} 211 | restart: "unless-stopped" 212 | volumes: 213 | - certs:/usr/share/kibana/config/certs 214 | - ${VOLUME_MOUNT}/data/kibana:/usr/share/kibana/data 215 | ports: 216 | - ${KIBANA_PORT}:5601 217 | environment: 218 | - SERVERNAME=kibana 219 | - ELASTICSEARCH_HOSTS=https://es01:9200 220 | - ELASTICSEARCH_USERNAME=kibana_system 221 | - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD} 222 | - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt 223 | - SERVER_SSL_ENABLED=true 224 | - SERVER_SSL_KEY=/usr/share/kibana/config/certs/kibana/kibana.key 225 | - SERVER_SSL_CERTIFICATE=/usr/share/kibana/config/certs/kibana/kibana.crt 226 | - SERVER_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt 227 | # - SERVER_SSL_PASSWORD=${KIBANA_CERT_PASSWORD} 228 | mem_limit: ${MEM_LIMIT} 229 | healthcheck: 230 | test: 231 | [ 232 | "CMD-SHELL", 233 | "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'", 234 | ] 235 | interval: 10s 236 | timeout: 10s 237 | retries: 120 238 | container_name: "kibana" 239 | 240 | zeek2es: 241 | build: 242 | context: ./zeek2es 243 | dockerfile: Dockerfile 244 | restart: "unless-stopped" 245 | depends_on: 246 | es01: 247 | condition: service_healthy 248 | es02: 249 | condition: service_healthy 250 | es03: 251 | condition: service_healthy 252 | command: > 253 | bash -c ' 254 | chmod 755 
/entrypoint.sh; 255 | /entrypoint.sh 256 | ' 257 | volumes: 258 | - ./zeek2es/entrypoint.sh:/entrypoint.sh 259 | - ${VOLUME_MOUNT}/data/logs:/logs 260 | tty: true 261 | container_name: "zeek2es" 262 | 263 | volumes: 264 | certs: 265 | driver: local -------------------------------------------------------------------------------- /docker/zeek2es/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:jammy 2 | 3 | RUN apt-get -q update && \ 4 | DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \ 5 | curl \ 6 | fswatch \ 7 | geoipupdate \ 8 | git \ 9 | iproute2 \ 10 | jq \ 11 | less \ 12 | netcat \ 13 | net-tools \ 14 | parallel \ 15 | python3 \ 16 | python3-dev \ 17 | python3-pip \ 18 | python3-setuptools \ 19 | python3-wheel \ 20 | swig \ 21 | tcpdump \ 22 | tcpreplay \ 23 | termshark \ 24 | tshark \ 25 | vim \ 26 | wget \ 27 | zeek-aux && \ 28 | pip3 install --no-cache-dir pre-commit requests && \ 29 | curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.2.0-amd64.deb && \ 30 | dpkg -i filebeat-8.2.0-amd64.deb && \ 31 | rm filebeat-8.2.0-amd64.deb && \ 32 | apt-get clean && rm -rf /var/lib/apt/lists/* && rm -rf ~/.cache/pip 33 | 34 | # Install zeek2es 35 | RUN cd / && git clone https://github.com/corelight/zeek2es.git 36 | 37 | #COPY entrypoint.sh /entrypoint.sh 38 | #RUN chmod 755 /entrypoint.sh -------------------------------------------------------------------------------- /docker/zeek2es/entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | fswatch -m poll_monitor --event Created -r /logs | parallel -j 3 python3 /zeek2es/zeek2es.py {} --compress -g -l 5000 -d 25 -u https://es01:9200 --user elastic --passwd elastic :::: - -------------------------------------------------------------------------------- /images/kibana-aggregation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-aggregation.png -------------------------------------------------------------------------------- /images/kibana-map.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-map.png -------------------------------------------------------------------------------- /images/kibana-subnet-search.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-subnet-search.png -------------------------------------------------------------------------------- /images/kibana-timeseries.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana-timeseries.png -------------------------------------------------------------------------------- /images/kibana.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/kibana.png -------------------------------------------------------------------------------- /images/multi-log-correlation.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/corelight/zeek2es/078b531dc27741e3dad26880f58aaee859e8721d/images/multi-log-correlation.png -------------------------------------------------------------------------------- /process_log.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Things you can set: 4 | zeek2es_path=~/Source/zeek2es/zeek2es.py 5 | filter_file_dir=~/ 6 | num_of_lines=50000 7 | logfiledelim=\\. 8 | stream_prepend="logs-zeek-" 9 | stream_ending="" 10 | pythoncmd="python3" 11 | zeek2esargs="-g -l $num_of_lines" 12 | 13 | # Error checking 14 | if [ "$#" -ne 2 ]; then 15 | echo "Usage: $0 LOGFILENAME \"ADDITIONAL_ARGS_TO_ZEEK2ES\"" >&2 16 | echo >&2 17 | echo "Example:" >&2 18 | echo " fswatch -m poll_monitor --event Created -r /data/logs/zeek | awk '/^.*\/(conn|dns|http)\..*\.log\.gz$/' | parallel -j 16 $0 {} \"\"" :::: - >&2 19 | exit 1 20 | fi 21 | 22 | # Things set from the command line 23 | logfile=$1 24 | additional_args=$2 25 | 26 | echo "Processing $logfile..." 27 | regex="s/.*\/\([^0-9\.]*\)$logfiledelim[0-9].*\.log\.gz/\1/" 28 | log_type=`echo $logfile | sed $regex` 29 | echo $log_type 30 | 31 | zeek2esargsplus=$zeek2esargs" -i $stream_prepend$log_type$stream_ending "$additional_args 32 | 33 | filterfile=$filter_file_dir$log_type"_filter.txt" 34 | 35 | if [ -f $filterfile ]; then 36 | echo " Using filter file "$filterfile 37 | $pythoncmd $zeek2es_path $logfile $zeek2esargsplus -f $filterfile 38 | else 39 | echo " No filter file found for "$filterfile 40 | $pythoncmd $zeek2es_path $logfile $zeek2esargsplus 41 | fi -------------------------------------------------------------------------------- /process_logs_as_datastream.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Things you can set: 4 | zeek2es_path=~/Source/zeek2es/zeek2es.py 5 | lognamedelim=\\. 6 | #zeek2es_path=~/zeek2es.py 7 | #lognamedelim=_2 8 | filter_file_dir=~/ 9 | num_of_lines=50000 10 | num_of_gb=50 11 | pythoncmd="python3" 12 | zeek2esargs="-g -l $num_of_lines" 13 | 14 | # Error checking 15 | if [ "$#" -lt 4 ]; then 16 | echo "Usage: $0 NJOBS \"ADDITIONAL_ARGS_TO_ZEEK2ES\" \"LIST_OF_LOGS_DELIMITED_BY_SPACES\" DIR1 DIR2 ..." >&2 17 | echo >&2 18 | echo "Example:" >&2 19 | echo " time $0 16 \"\" \"amqp bgp conn dce_rpc dhcp dns dpd files ftp http ipsec irc kerberos modbus modbus_register_change mount mqtt mysql nfs notice ntlm ntp ospf portmap radius reporter rdp rfb rip ripng sip smb_cmd smb_files smb_mapping smtp snmp socks ssh ssl stun syslog tunnel vpn weird wireguard x509\" /usr/local/var/logs" >&2 20 | exit 1 21 | fi 22 | 23 | # Things set from the command line 24 | njobs=$1 25 | additional_args=$2 26 | logs=$3 27 | logdirs=${@:4} 28 | 29 | # Iterate through the *.log.gz files in the supplied directory 30 | for val in $logs; do 31 | zeek2esargsplus=$zeek2esargs" --compress -d "$num_of_gb" "$additional_args 32 | echo "Processing $val logs..." 
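# Build the awk pattern matching this log type's compressed files under the given directories
# (with the default lognamedelim of '\.' this matches paths like .../<log>.<anything>.log.gz)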
33 | filename_re="/^.*\/"$val$lognamedelim".*\.log\.gz$/" 34 | 35 | filterfile=$filter_file_dir$val"_filter.txt" 36 | 37 | if [ -f $filterfile ]; then 38 | echo " Using filter file "$filterfile 39 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus -f $filterfile :::: - 40 | else 41 | echo " No filter file found for "$filterfile 42 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus :::: - 43 | fi 44 | done -------------------------------------------------------------------------------- /process_logs_to_stdout.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Things you can set: 4 | zeek2es_path=~/Source/zeek2es/zeek2es.py 5 | lognamedelim=\\. 6 | #zeek2es_path=~/zeek2es.py 7 | #lognamedelim=_2 8 | filter_file_dir=~/ 9 | num_of_lines=50000 10 | stream_prepend="logs-zeek-" 11 | stream_ending="" 12 | pythoncmd="python3" 13 | zeek2esargs="-s -b" 14 | 15 | # Error checking 16 | if [ "$#" -lt 4 ]; then 17 | echo "Usage: $0 NJOBS \"ADDITIONAL_ARGS_TO_ZEEK2ES\" \"LIST_OF_LOGS_DELIMITED_BY_SPACES\" DIR1 DIR2 ..." >&2 18 | echo >&2 19 | echo "Example:" >&2 20 | echo " time $0 16 \"\" \"amqp bgp conn dce_rpc dhcp dns dpd files ftp http ipsec irc kerberos modbus modbus_register_change mount mqtt mysql nfs notice ntlm ntp ospf portmap radius reporter rdp rfb rip ripng sip smb_cmd smb_files smb_mapping smtp snmp socks ssh ssl stun syslog tunnel vpn weird wireguard x509\" /usr/local/var/logs" >&2 21 | exit 1 22 | fi 23 | 24 | # Things set from the command line 25 | njobs=$1 26 | additional_args=$2 27 | logs=$3 28 | logdirs=${@:4} 29 | 30 | # Iterate through the *.log.gz files in the supplied directory 31 | for val in $logs; do 32 | zeek2esargsplus=$zeek2esargs" "$additional_args 33 | filename_re="/^.*\/"$val$lognamedelim".*\.log\.gz$/" 34 | 35 | filterfile=$filter_file_dir$val"_filter.txt" 36 | 37 | if [ -f $filterfile ]; then 38 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus -f $filterfile :::: - 39 | else 40 | find $logdirs | awk $filename_re | parallel -j $njobs $pythoncmd $zeek2es_path {} $zeek2esargsplus :::: - 41 | fi 42 | done -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | from Cython.Build import cythonize 3 | 4 | setup( 5 | ext_modules = cythonize("zeek2es.py") 6 | ) 7 | -------------------------------------------------------------------------------- /zeek2es.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import subprocess 3 | import json 4 | import csv 5 | import io 6 | import requests 7 | from requests.auth import HTTPBasicAuth 8 | from urllib3.exceptions import InsecureRequestWarning 9 | import datetime 10 | import re 11 | import argparse 12 | import random 13 | import time 14 | # Making these available for lambda filter input. 15 | import ipaddress 16 | import os 17 | 18 | # The number of bits to use in a random hash. 19 | hashbits = 128 20 | 21 | # Disable SSL warnings. 22 | requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning) 23 | 24 | # We do this to add a little extra help at the end. 
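# MyParser only overrides print_help() so the curl deletion examples below are appended to the standard argparse help output.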
25 | class MyParser(argparse.ArgumentParser): 26 | def print_help(self): 27 | super().print_help() 28 | print("") 29 | print("To delete indices:\n\n\tcurl -X DELETE http://localhost:9200/zeek*?pretty\n") 30 | print("To delete data streams:\n\n\tcurl -X DELETE http://localhost:9200/_data_stream/zeek*?pretty\n") 31 | print("To delete index templates:\n\n\tcurl -X DELETE http://localhost:9200/_index_template/zeek*?pretty\n") 32 | print("To delete the lifecycle policy:\n\n\tcurl -X DELETE http://localhost:9200/_ilm/policy/zeek-lifecycle-policy?pretty\n") 33 | print("You will need to add -k -u elastic_user:password if you are using Elastic v8+.\n") 34 | 35 | # This takes care of arg parsing 36 | def parseargs(): 37 | parser = MyParser(description='Process Zeek ASCII logs into ElasticSearch.', formatter_class=argparse.RawTextHelpFormatter) 38 | parser.add_argument('filename', 39 | help='The Zeek log in *.log or *.gz format. Include the full path.') 40 | parser.add_argument('-i', '--esindex', help='The Elasticsearch index/data stream name.') 41 | parser.add_argument('-u', '--esurl', default="http://localhost:9200", help='The Elasticsearch URL. Use ending slash. Use https for Elastic v8+. (default: http://localhost:9200)') 42 | parser.add_argument('--user', default="", help='The Elasticsearch user. (default: disabled)') 43 | parser.add_argument('--passwd', default="", help='The Elasticsearch password. Note this will put your password in this shell history file. (default: disabled)') 44 | parser.add_argument('-l', '--lines', default=10000, type=int, help='Lines to buffer for RESTful operations. (default: 10,000)') 45 | parser.add_argument('-n', '--name', default="", help='The name of the system to add to the index for uniqueness. (default: empty string)') 46 | parser.add_argument('-k', '--keywords', nargs="+", default="service", help='A list of text fields to add a keyword subfield. (default: service)') 47 | parser.add_argument('-a', '--lambdafilter', default="", help='A Python lambda function, when eval\'d will filter your output JSON dict. (default: empty string)') 48 | parser.add_argument('-f', '--filterfile', default="", help='A Python function file, when eval\'d will filter your output JSON dict. (default: empty string)') 49 | parser.add_argument('-y', '--outputfields', nargs="+", default="", help='A list of fields to keep for the output. Must include ts. (default: empty string)') 50 | parser.add_argument('-d', '--datastream', default=0, type=int, help='Instead of an index, use a data stream that will rollover at this many GB.\nRecommended is 50 or less. (default: 0 - disabled)') 51 | parser.add_argument('--compress', action="store_true", help='If a datastream is used, enable best compression.') 52 | parser.add_argument('-o', '--logkey', nargs=2, action='append', metavar=('fieldname','filename'), default=[], help='A field to log to a file. Example: uid uid.txt. \nWill append to the file! Delete file before running if appending is undesired. \nThis option can be called more than once. (default: empty - disabled)') 53 | parser.add_argument('-e', '--filterkeys', nargs=2, metavar=('fieldname','filename'), default="", help='A field to filter with keys from a file. Example: uid uid.txt. (default: empty string - disabled)') 54 | parser.add_argument('-g', '--ingestion', action="store_true", help='Use the ingestion pipeline to do things like geolocate IPs and split services. 
Takes longer, but worth it.') 55 | parser.add_argument('-p', '--splitfields', nargs="+", default="", help='A list of additional fields to split with the ingestion pipeline, if enabled.\n(default: empty string - disabled)') 56 | parser.add_argument('-j', '--jsonlogs', action="store_true", help='Assume input logs are JSON.') 57 | parser.add_argument('-r', '--origtime', action="store_true", help='Keep the numerical time format, not milliseconds as ES needs.') 58 | parser.add_argument('-t', '--timestamp', action="store_true", help='Keep the time in timestamp format.') 59 | parser.add_argument('-s', '--stdout', action="store_true", help='Print JSON to stdout instead of sending to Elasticsearch directly.') 60 | parser.add_argument('-b', '--nobulk', action="store_true", help='Remove the ES bulk JSON header. Requires --stdout.') 61 | parser.add_argument('--humio', nargs=2, default="", help='First argument is the Humio URL, the second argument is the ingest token.') 62 | parser.add_argument('-c', '--cython', action="store_true", help='Use Cython execution by loading the local zeek2es.so file through an import.\nRun python setup.py build_ext --inplace first to make your zeek2es.so file!') 63 | parser.add_argument('-w', '--hashdates', action="store_true", help='Use hashes instead of dates for the index name.') 64 | parser.add_argument('-z', '--supresswarnings', action="store_true", help='Supress any type of warning. Die stoically and silently.') 65 | args = parser.parse_args() 66 | return args 67 | 68 | # A function to send data in bulk to ES. 69 | def sendbulk(args, outstring, es_index, filename): 70 | # Elastic username and password auth 71 | auth = None 72 | if (len(args['user']) > 0): 73 | auth = HTTPBasicAuth(args['user'], args['passwd']) 74 | 75 | if len(args['humio']) != 2: 76 | if not args['stdout']: 77 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl'] 78 | 79 | res = requests.put(esurl+'/_bulk', headers={'Content-Type': 'application/json'}, 80 | data=outstring.encode('UTF-8'), auth=auth, verify=False) 81 | if not res.ok: 82 | if not args['supresswarnings']: 83 | print("WARNING! PUT did not return OK! Your index {} is incomplete. Filename: {} Response: {} {}".format(es_index, filename, res, res.text)) 84 | else: 85 | print(outstring.strip()) 86 | else: 87 | # Send to Humio 88 | Headers = { "Authorization" : "Bearer "+args['humio'][1] } 89 | data = [{"messages" : outstring.strip().split('\n') }] 90 | while True: 91 | try: 92 | r = requests.post(args['humio'][0]+'/api/v1/ingest/humio-unstructured', headers=Headers, json=data) 93 | break 94 | except Exception as exc: 95 | if not args['supresswarnings']: 96 | print("WARNING, Humio error: {}".format(exc)) 97 | time.sleep(1) 98 | 99 | # A function to send the datastream info to ES. 
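# It creates the 'zeek-lifecycle-policy' ILM policy (rollover at the -d size in GB) and an index template
# named after the data stream, optionally enabling best_compression when --compress is given.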
100 | def senddatastream(args, es_index, mappings): 101 | # Elastic username and password auth 102 | auth = None 103 | if (len(args['user']) > 0): 104 | auth = HTTPBasicAuth(args['user'], args['passwd']) 105 | 106 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl'] 107 | 108 | lifecycle_policy = {"policy": {"phases": {"hot": {"actions": {"rollover": {"max_primary_shard_size": "{}GB".format(args['datastream'])}}}}}} 109 | res = requests.put(esurl+"/_ilm/policy/zeek-lifecycle-policy", headers={'Content-Type': 'application/json'}, 110 | data=json.dumps(lifecycle_policy).encode('UTF-8'), auth=auth, verify=False) 111 | index_template = {"index_patterns": [es_index], "data_stream": {}, "composed_of": [], "priority": 500, 112 | "template": {"settings": {"index.lifecycle.name": "zeek-lifecycle-policy"}, "mappings": mappings["mappings"]}} 113 | if (args['compress']): 114 | index_template["template"]["settings"]["index"] = {"codec": "best_compression"} 115 | res = requests.put(esurl+"/_index_template/"+es_index, headers={'Content-Type': 'application/json'}, 116 | data=json.dumps(index_template).encode('UTF-8'), auth=auth, verify=False) 117 | 118 | # A function to send mappings to ES. 119 | def sendmappings(args, es_index, mappings): 120 | # Elastic username and password auth 121 | auth = None 122 | if (len(args['user']) > 0): 123 | auth = HTTPBasicAuth(args['user'], args['passwd']) 124 | 125 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl'] 126 | 127 | res = requests.put(esurl+"/"+es_index, headers={'Content-Type': 'application/json'}, 128 | data=json.dumps(mappings).encode('UTF-8'), auth=auth, verify=False) 129 | 130 | # A function to send the ingest pipeline to ES. 131 | def sendpipeline(args, ingest_pipeline): 132 | # Elastic username and password auth 133 | auth = None 134 | if (len(args['user']) > 0): 135 | auth = HTTPBasicAuth(args['user'], args['passwd']) 136 | 137 | esurl = args['esurl'][:-1] if args['esurl'].endswith('/') else args['esurl'] 138 | 139 | res = requests.put(esurl+"/_ingest/pipeline/zeekgeoip", headers={'Content-Type': 'application/json'}, 140 | data=json.dumps(ingest_pipeline).encode('UTF-8'), auth=auth, verify=False) 141 | 142 | # Everything important is in here. 143 | def main(**args): 144 | 145 | # Takes care of the fields we want to output, if not all. 146 | outputfields = [] 147 | if (len(args['outputfields']) > 0): 148 | outputfields = args['outputfields'] 149 | 150 | # Takes care of logging keys to a file. 151 | logkeyfields = [] 152 | logkeys_fds = [] 153 | if (len(args['logkey']) > 0): 154 | for lk in args['logkey']: 155 | thefield, thefile = lk[0], lk[1] 156 | f = open(thefile, "a+") 157 | logkeyfields.append(thefield) 158 | logkeys_fds.append(f) 159 | 160 | # Takes care of loading keys from a file to use in a filter. 161 | filterkeys = set() 162 | filterkeys_field = None 163 | if (len(args['filterkeys']) > 0): 164 | filterkeys_field = args['filterkeys'][0] 165 | filterkeys_file = args['filterkeys'][1] 166 | with open(filterkeys_file, "r") as infile: 167 | filterkeys = set(infile.read().splitlines()) 168 | 169 | # This takes care of fields where we want to add the keyword field. 
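# (keyword subfields let Kibana aggregate on the exact values of these otherwise analyzed text fields)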
170 | keywords = [] 171 | if (len(args['keywords']) > 0): 172 | keywords = args['keywords'] 173 | 174 | # Error checking 175 | if args['esindex'] and args['stdout']: 176 | if not args['supresswarnings']: 177 | print("Cannot write to Elasticsearch and stdout at the same time.") 178 | exit(-1) 179 | 180 | # Error checking 181 | if args['nobulk'] and not args['stdout']: 182 | if not args['supresswarnings']: 183 | print("The nobulk option can only be used with the stdout option.") 184 | exit(-2) 185 | 186 | # Error checking 187 | if len(args['humio']) > 0 and (not args['stdout'] or not args['nobulk'] or args['timestamp']): 188 | if not args['supresswarnings']: 189 | print("The Humio option can only be used with the stdout and nobulk options, and cannot have the timestamp option.") 190 | exit(-5) 191 | 192 | # Error checking 193 | if not args['timestamp'] and args['origtime']: 194 | if not args['supresswarnings']: 195 | print("The origtime option can only be used with the timestamp option.") 196 | exit(-3) 197 | 198 | # Error checking 199 | if len(args['lambdafilter']) > 0 and len(args['filterfile']) > 0: 200 | if not args['supresswarnings']: 201 | print("The lambdafilter option cannot be used with the filterfile option.") 202 | exit(-7) 203 | 204 | # This takes care of loading the Python filters. 205 | filterfilter = None 206 | if len(args['lambdafilter']) > 0: 207 | filterfilter = eval(args['lambdafilter']) 208 | 209 | if len(args['filterfile']) > 0: 210 | with open(args['filterfile'], "r") as ff: 211 | filterfilter = eval(ff.read()) 212 | 213 | # The file we are processing. 214 | filename = args['filename'] 215 | 216 | # Detect if the log is compressed or not. 217 | if filename.split(".")[-1].lower() == "gz": 218 | # This works on Linux and MacOs 219 | zcat_name = ["gzip", "-d", "-c"] 220 | else: 221 | zcat_name = ["cat"] 222 | 223 | # Setup the ingest pipeline 224 | ingest_pipeline = {"description": "Zeek Log Ingestion Pipeline.", "processors": [ ]} 225 | 226 | if args['ingestion']: 227 | fields_to_split = [] 228 | if len(args['splitfields']) > 0: 229 | fields_to_split = args['splitfields'] 230 | ingest_pipeline["processors"] += [{"dot_expander": {"field": "*"}}] 231 | ingest_pipeline["processors"] += [{"split": {"field": "service", "separator": ",", "ignore_missing": True, "ignore_failure": True}}] 232 | for f in fields_to_split: 233 | ingest_pipeline["processors"] += [{"split": {"field": f, "separator": ",", "ignore_missing": True, "ignore_failure": True}}] 234 | ingest_pipeline["processors"] += [{"geoip": {"field": "id.orig_h", "target_field": "geoip_orig", "ignore_missing": True}}] 235 | ingest_pipeline["processors"] += [{"geoip": {"field": "id.resp_h", "target_field": "geoip_resp", "ignore_missing": True}}] 236 | 237 | # This section takes care of TSV logs. Skip ahead for the JSON logic. 238 | if not args['jsonlogs']: 239 | # Get the date 240 | 241 | zcat_process = subprocess.Popen(zcat_name+[filename], 242 | stdout=subprocess.PIPE) 243 | 244 | head_process = subprocess.Popen(['head'], 245 | stdin=zcat_process.stdout, 246 | stdout=subprocess.PIPE) 247 | 248 | grep_process = subprocess.Popen(['grep', '#open'], 249 | stdin=head_process.stdout, 250 | stdout=subprocess.PIPE) 251 | 252 | try: 253 | log_date = datetime.datetime.strptime(grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1], "%Y-%m-%d-%H-%M-%S") 254 | except: 255 | if not args['supresswarnings']: 256 | print("Date not found from Zeek log! 
{}".format(filename)) 257 | exit(-4) 258 | 259 | # Get the Zeek log path 260 | 261 | zcat_process = subprocess.Popen(zcat_name+[filename], 262 | stdout=subprocess.PIPE) 263 | 264 | head_process = subprocess.Popen(['head'], 265 | stdin=zcat_process.stdout, 266 | stdout=subprocess.PIPE) 267 | 268 | grep_process = subprocess.Popen(['grep', '#path'], 269 | stdin=head_process.stdout, 270 | stdout=subprocess.PIPE) 271 | 272 | zeek_log_path = grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1] 273 | 274 | # Build the ES index. 275 | if not args['esindex']: 276 | if args['datastream'] > 0: 277 | es_index = "logs-zeek-{}".format(zeek_log_path) 278 | else: 279 | sysname = "" 280 | if (len(args['name']) > 0): 281 | sysname = "{}_".format(args['name']) 282 | # We allow for hashes instead of dates in the index name. 283 | if not args['hashdates']: 284 | es_index = "zeek_"+sysname+"{}_{}".format(zeek_log_path, log_date.date()) 285 | else: 286 | es_index = "zeek_"+sysname+"{}_{}".format(zeek_log_path, random.getrandbits(hashbits)) 287 | else: 288 | es_index = args['esindex'] 289 | 290 | es_index = es_index.replace(':', '_').replace("/", "_") 291 | 292 | # Get the Zeek fields from the log file. 293 | 294 | zcat_process = subprocess.Popen(zcat_name+[filename], 295 | stdout=subprocess.PIPE) 296 | 297 | head_process = subprocess.Popen(['head'], 298 | stdin=zcat_process.stdout, 299 | stdout=subprocess.PIPE) 300 | 301 | grep_process = subprocess.Popen(['grep', '#fields'], 302 | stdin=head_process.stdout, 303 | stdout=subprocess.PIPE) 304 | 305 | fields = grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1:] 306 | 307 | # Get the Zeek types from the log file. 308 | 309 | zcat_process = subprocess.Popen(zcat_name+[filename], 310 | stdout=subprocess.PIPE) 311 | 312 | head_process = subprocess.Popen(['head'], 313 | stdin=zcat_process.stdout, 314 | stdout=subprocess.PIPE) 315 | 316 | grep_process = subprocess.Popen(['grep', '#types'], 317 | stdin=head_process.stdout, 318 | stdout=subprocess.PIPE) 319 | 320 | types = grep_process.communicate()[0].decode('UTF-8').strip().split('\t')[1:] 321 | 322 | # Read TSV 323 | 324 | zcat_process = subprocess.Popen(zcat_name+[filename], 325 | stdout=subprocess.PIPE) 326 | 327 | grep_process = subprocess.Popen(['grep', '-E', '-v', '^#'], 328 | stdin=zcat_process.stdout, 329 | stdout=subprocess.PIPE) 330 | 331 | # Make the max size 332 | csv.field_size_limit(sys.maxsize) 333 | 334 | # Only process if we have a valid log file. 
335 | if len(types) > 0 and len(fields) > 0: 336 | read_tsv = csv.reader(io.TextIOWrapper(grep_process.stdout), delimiter="\t", quoting=csv.QUOTE_NONE) 337 | 338 | # Put mappings 339 | 340 | mappings = {"mappings": {"properties": dict(geoip_orig=dict(properties=dict(location=dict(type="geo_point"))), geoip_resp=dict(properties=dict(location=dict(type="geo_point"))))}} 341 | 342 | for i in range(len(fields)): 343 | if types[i] == "time": 344 | mappings["mappings"]["properties"][fields[i]] = {"type": "date"} 345 | elif types[i] == "addr": 346 | mappings["mappings"]["properties"][fields[i]] = {"type": "ip"} 347 | elif types[i] == "string": 348 | # Special cases 349 | if fields[i] in keywords: 350 | mappings["mappings"]["properties"][fields[i]] = {"type": "text", "fields": { "keyword": { "type": "keyword" }}} 351 | else: 352 | mappings["mappings"]["properties"][fields[i]] = {"type": "text"} 353 | 354 | # Put index template for data stream 355 | 356 | if args["datastream"] > 0: 357 | senddatastream(args, es_index, mappings) 358 | 359 | # Put data 360 | 361 | putmapping = False 362 | putpipeline = False 363 | n = 0 364 | items = 0 365 | outstring = "" 366 | ofl = len(outputfields) 367 | 368 | # Iterate through every row in the TSV. 369 | for row in read_tsv: 370 | # Build the dict and fill in any default info. 371 | d = dict(zeek_log_filename=filename, zeek_log_path=zeek_log_path) 372 | if (len(args['name']) > 0): 373 | d["zeek_log_system_name"] = args['name'] 374 | i = 0 375 | added_val = False 376 | 377 | # For each column in the row. 378 | for col in row: 379 | # Process the data using a method for each type. We also will only output fields of a certain name, 380 | # if identified on the command line. 381 | if types[i] == "time": 382 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields): 383 | gmt_mydt = datetime.datetime.utcfromtimestamp(float(col)) 384 | if not args['timestamp']: 385 | d[fields[i]] = "{}T{}".format(gmt_mydt.date(), gmt_mydt.time()) 386 | else: 387 | if args['origtime']: 388 | d[fields[i]] = gmt_mydt.timestamp() 389 | else: 390 | d[fields[i]] = gmt_mydt.timestamp()*1000 391 | added_val = True 392 | elif types[i] == "interval" or types[i] == "double": 393 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields): 394 | d[fields[i]] = float(col) 395 | added_val = True 396 | elif types[i] == "bool": 397 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields): 398 | d[fields[i]] = col == "T" 399 | added_val = True 400 | elif types[i] == "port" or types[i] == "count" or types[i] == "int": 401 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields): 402 | d[fields[i]] = int(col) 403 | added_val = True 404 | elif types[i].startswith("vector") or types[i].startswith("set"): 405 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields): 406 | d[fields[i]] = col.split(",") 407 | added_val = True 408 | else: 409 | if col != '-' and col != '(empty)' and col != '' and (ofl == 0 or fields[i] in outputfields): 410 | d[fields[i]] = col 411 | added_val = True 412 | i += 1 413 | 414 | # Here we only add data if there is a timestamp, and if the filter keys are used we make sure our key exists. 415 | if added_val and "ts" in d and (not filterkeys_field or (filterkeys_field and d[filterkeys_field] in filterkeys)): 416 | # This is the Python function filtering logic. 
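# The callable built earlier from lambdafilter/filterfile is applied to the one-row list [d];
# if it returns False for the row, the row is dropped. An illustrative (assumed) filter that
# keeps only traffic to port 53 could be: lambda x: "id.resp_p" in x and x["id.resp_p"] == 53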
417 | filter_data = False 418 | if filterfilter: 419 | output = list(filter(filterfilter, [d])) 420 | if len(output) == 0: 421 | filter_data = True 422 | 423 | # If we haven't filtered using the Python filter function... 424 | if not filter_data: 425 | # Log the keys to a file, if desired. 426 | i = 0 427 | for lkf in logkeyfields: 428 | lkfd = logkeys_fds[i] 429 | if lkf in d: 430 | if isinstance(d[lkf], list): 431 | for z in d[lkf]: 432 | lkfd.write(z) 433 | lkfd.write("\n") 434 | else: 435 | lkfd.write(d[lkf]) 436 | lkfd.write("\n") 437 | i += 1 438 | 439 | # Create the bulk header. 440 | if not args['nobulk']: 441 | i = dict(create=dict(_index=es_index)) 442 | if len(ingest_pipeline["processors"]) > 0: 443 | i["create"]["pipeline"] = "zeekgeoip" 444 | outstring += json.dumps(i)+"\n" 445 | # Prepare the output and increment counters 446 | if args['humio']: 447 | d['ts'] = d['ts'] + "Z" 448 | if "_write_ts" in d: 449 | d['_write_ts'] = d['_write_ts'] + "Z" 450 | else: 451 | d["_write_ts"] = d["ts"] 452 | if "_path" not in d: 453 | d["_path"] = zeek_log_path 454 | if (len(args['name'].strip()) > 0): 455 | d["_system_name"] = args['name'].strip() 456 | d["@timestamp"] = d["ts"] 457 | outstring += json.dumps(d)+"\n" 458 | n += 1 459 | items += 1 460 | # If we aren't using stdout, prepare the ES index/datastream. 461 | if not args['stdout']: 462 | if putmapping == False: 463 | sendmappings(args, es_index, mappings) 464 | putmapping = True 465 | if putpipeline == False and len(ingest_pipeline["processors"]) > 0: 466 | sendpipeline(args, ingest_pipeline) 467 | putpipeline = True 468 | 469 | # Once we get more than "lines", we send it to ES 470 | if n >= args['lines'] and len(outstring) > 0: 471 | sendbulk(args, outstring, es_index, filename) 472 | outstring = "" 473 | n = 0 474 | 475 | # We do this one last time to get rid of any remaining lines. 476 | if n != 0 and len(outstring) > 0: 477 | sendbulk(args, outstring, es_index, filename) 478 | else: 479 | # This does everything the TSV version does, but for JSON 480 | # Read JSON log 481 | zcat_process = subprocess.Popen(zcat_name+[filename], 482 | stdout=subprocess.PIPE) 483 | j_in = io.TextIOWrapper(zcat_process.stdout) 484 | 485 | zeek_log_path = "" 486 | items = 0 487 | n = 0 488 | outstring = "" 489 | es_index = "" 490 | 491 | # Put mappings 492 | 493 | mappings = {"mappings": {"properties": dict(ts=dict(type="date"), geoip_orig=dict(properties=dict(location=dict(type="geo_point"))), 494 | geoip_resp=dict(properties=dict(location=dict(type="geo_point"))))}} 495 | mappings["mappings"]["properties"]["id.orig_h"] = {"type": "ip"} 496 | mappings["mappings"]["properties"]["id.resp_h"] = {"type": "ip"} 497 | putmapping = False 498 | putpipeline = False 499 | putdatastream = False 500 | 501 | # We continue until broken. 502 | while True: 503 | line = j_in.readline() 504 | 505 | # Here is where we break out of the while True loop. 506 | if not line: 507 | break 508 | 509 | # Load our data so we can process it. 510 | j_data = json.loads(line) 511 | 512 | # Only process data that has a timestamp field. 513 | if "ts" in j_data: 514 | # Here we deal with the time output format. 
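# As in the TSV branch, the epoch "ts" is rewritten either as an ISO-8601-style string
# (e.g. 2021-01-01T00:00:00, illustrative value) or, when the timestamp option is set, left
# as an epoch value in seconds (origtime) or in milliseconds, which is what Elasticsearch expects.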
515 | gmt_mydt = datetime.datetime.utcfromtimestamp(float(j_data["ts"])) 516 | 517 | if not args['timestamp']: 518 | j_data["ts"] = "{}T{}".format(gmt_mydt.date(), gmt_mydt.time()) 519 | else: 520 | if args['origtime']: 521 | j_data["ts"] = gmt_mydt.timestamp() 522 | else: 523 | # ES uses ms 524 | j_data["ts"] = gmt_mydt.timestamp()*1000 525 | 526 | # This happens when we go through this loop the first time and do not have an es_index name. 527 | if es_index == "": 528 | sysname = "" 529 | 530 | if (len(args['name']) > 0): 531 | sysname = "{}_".format(args['name']) 532 | 533 | # Since the JSON logs do not include the Zeek log path, we try to guess it from the name. 534 | try: 535 | zeek_log_path = re.search(".*\/([^\._]+).*", filename).group(1).lower() 536 | except: 537 | print("Log path cannot be found from filename: {}".format(filename)) 538 | exit(-5) 539 | 540 | # We allow for hahes instead of dates in our index name. 541 | if not args['hashdates']: 542 | es_index = "zeek_{}{}_{}".format(sysname, zeek_log_path, gmt_mydt.date()) 543 | else: 544 | es_index = "zeek_{}{}_{}".format(sysname, zeek_log_path, random.getrandbits(hashbits)) 545 | 546 | es_index = es_index.replace(':', '_').replace("/", "_") 547 | 548 | # If we are not sending the data to stdout, we prepare the ES index or datastream. 549 | if not args['stdout']: 550 | if putmapping == False: 551 | sendmappings(args, es_index, mappings) 552 | putmapping = True 553 | if putpipeline == False and len(ingest_pipeline["processors"]) > 0: 554 | sendpipeline(args, ingest_pipeline) 555 | putpipeline = True 556 | if args["datastream"] > 0 and putdatastream == False: 557 | senddatastream(args, es_index, mappings) 558 | putdatastream = True 559 | 560 | # We add the system name, if desired. 561 | if (len(args['name']) > 0): 562 | j_data["zeek_log_system_name"] = args['name'] 563 | 564 | # Here we are checking if the keys will filter the data in. 565 | if not filterkeys_field or (filterkeys_field and j_data[filterkeys_field] in filterkeys): 566 | # This check below is for the Python filters. 567 | filter_data = False 568 | if filterfilter: 569 | output = list(filter(filterfilter, [j_data])) 570 | if len(output) == 0: 571 | filter_data = True 572 | 573 | if not filter_data: 574 | # We log the keys, if so desired. 575 | i = 0 576 | for lkf in logkeyfields: 577 | lkfd = logkeys_fds[i] 578 | if lkf in j_data: 579 | if isinstance(j_data[lkf], list): 580 | for z in j_data[lkf]: 581 | lkfd.write(z) 582 | lkfd.write("\n") 583 | else: 584 | lkfd.write(j_data[lkf]) 585 | lkfd.write("\n") 586 | i += 1 587 | items += 1 588 | 589 | if not args['nobulk']: 590 | i = dict(create=dict(_index=es_index)) 591 | if len(ingest_pipeline["processors"]) > 0: 592 | i["create"]["pipeline"] = "zeekgeoip" 593 | outstring += json.dumps(i)+"\n" 594 | j_data["@timestamp"] = j_data["ts"] 595 | # Here we only include the output fields identified via the command line. 596 | if len(outputfields) > 0: 597 | new_j_data = {} 598 | for o in outputfields: 599 | if o in j_data: 600 | new_j_data[o] = j_data[o] 601 | j_data = new_j_data 602 | outstring += json.dumps(j_data) + "\n" 603 | n += 1 604 | 605 | # Here we output a set of lines to the ES server. 606 | if n >= args['lines'] and len(outstring) > 0: 607 | sendbulk(args, outstring, es_index, filename) 608 | outstring = "" 609 | n = 0 610 | 611 | # We send the last of the data to the ES server, if there is any left. 
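# Rows are accumulated into outstring in batches of args['lines']; this final call flushes
# any partial batch that never reached that threshold before the input ended.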
612 | if n != 0 and len(outstring) > 0: 613 | sendbulk(args, outstring, es_index, filename) 614 | 615 | # This deals with running as a plain script vs. as a Cython-compiled module. 616 | if __name__ == "__main__": 617 | args = parseargs() 618 | if args.cython: 619 | import zeek2es 620 | zeek2es.main(**vars(args)) 621 | else: 622 | main(**vars(args)) --------------------------------------------------------------------------------