├── Rakefile ├── .travis.yml ├── NOTICE.TXT ├── .github ├── PULL_REQUEST_TEMPLATE.md ├── ISSUE_TEMPLATE.md └── CONTRIBUTING.md ├── Gemfile ├── CONTRIBUTORS ├── .gitignore ├── logstash-filter-aggregate.gemspec ├── spec └── filters │ ├── aggregate_spec_helper.rb │ └── aggregate_spec.rb ├── BUILD.md ├── README.md ├── CHANGELOG.md ├── LICENSE ├── lib └── logstash │ └── filters │ └── aggregate.rb └── docs └── index.asciidoc /Rakefile: -------------------------------------------------------------------------------- 1 | require "logstash/devutils/rake" 2 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | import: 2 | - logstash-plugins/.ci:travis/travis.yml@1.x -------------------------------------------------------------------------------- /NOTICE.TXT: -------------------------------------------------------------------------------- 1 | Elasticsearch 2 | Copyright 2012-2015 Elasticsearch 3 | 4 | This product includes software developed by The Apache Software 5 | Foundation (http://www.apache.org/). -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Thanks for contributing to Logstash! If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/ 2 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source 'https://rubygems.org' 2 | 3 | gemspec 4 | 5 | logstash_path = ENV["LOGSTASH_PATH"] || "../../logstash" 6 | use_logstash_source = ENV["LOGSTASH_SOURCE"] && ENV["LOGSTASH_SOURCE"].to_s == "1" 7 | 8 | if Dir.exist?(logstash_path) && use_logstash_source 9 | gem 'logstash-core', :path => "#{logstash_path}/logstash-core" 10 | gem 'logstash-core-plugin-api', :path => "#{logstash_path}/logstash-core-plugin-api" 11 | end 12 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Please post all product and debugging questions on our [forum](https://discuss.elastic.co/c/logstash). Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here. 2 | 3 | For all general issues, please provide the following details for fast resolution: 4 | 5 | - Version: 6 | - Operating System: 7 | - Config File (if you have sensitive info, please remove it): 8 | - Sample Data: 9 | - Steps to Reproduce: 10 | -------------------------------------------------------------------------------- /CONTRIBUTORS: -------------------------------------------------------------------------------- 1 | The following is a list of people who have contributed ideas, code, bug 2 | reports, or in general have helped logstash along its way. 3 | 4 | Maintainers: 5 | * Fabien Baligand (fbaligand) 6 | 7 | Contributors: 8 | * Fabien Baligand (fbaligand) 9 | * Artur Kronenberg (pandaadb) 10 | * Fernando Galandrini (fjgal) 11 | 12 | Note: If you've sent us patches, bug reports, or otherwise contributed to 13 | Logstash, and you aren't on the list above and want to be, please let us know 14 | and we'll make sure you're here. Contributions from folks like you are what make 15 | open source awesome. 
16 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.lock 2 | *.gem 3 | *.rbc 4 | /.config 5 | /coverage/ 6 | /InstalledFiles 7 | /pkg/ 8 | /spec/reports/ 9 | /test/tmp/ 10 | /test/version_tmp/ 11 | /tmp/ 12 | 13 | ## Specific to RubyMotion: 14 | .dat* 15 | .repl_history 16 | build/ 17 | 18 | ## Documentation cache and generated files: 19 | /.yardoc/ 20 | /_yardoc/ 21 | /doc/ 22 | /rdoc/ 23 | 24 | ## Environment normalisation: 25 | /.bundle/ 26 | /vendor/bundle 27 | /lib/bundler/man/ 28 | 29 | # for a library or gem, you might want to ignore these files since the code is 30 | # intended to run in multiple environments; otherwise, check them in: 31 | # Gemfile.lock 32 | # .ruby-version 33 | # .ruby-gemset 34 | 35 | # unless supporting rvm < 1.11.0 or doing something fancy, ignore this: 36 | .rvmrc 37 | /.project 38 | /.buildpath 39 | -------------------------------------------------------------------------------- /logstash-filter-aggregate.gemspec: -------------------------------------------------------------------------------- 1 | Gem::Specification.new do |s| 2 | s.name = 'logstash-filter-aggregate' 3 | s.version = '2.10.0' 4 | s.licenses = ['Apache-2.0'] 5 | s.summary = 'Aggregates information from several events originating with a single task' 6 | s.description = 'This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program' 7 | s.authors = ['Elastic', 'Fabien Baligand'] 8 | s.email = 'info@elastic.co' 9 | s.homepage = 'https://github.com/logstash-plugins/logstash-filter-aggregate' 10 | s.require_paths = ['lib'] 11 | 12 | # Files 13 | s.files = Dir["lib/**/*","spec/**/*","*.gemspec","*.md","CONTRIBUTORS","Gemfile","LICENSE","NOTICE.TXT", "vendor/jar-dependencies/**/*.jar", "vendor/jar-dependencies/**/*.rb", "VERSION", "docs/**/*"] 14 | 15 | # Tests 16 | s.test_files = s.files.grep(%r{^(test|spec|features)/}) 17 | 18 | # Special flag to let us know this is actually a logstash plugin 19 | s.metadata = { 'logstash_plugin' => 'true', 'logstash_group' => 'filter' } 20 | 21 | # Gem dependencies 22 | s.add_runtime_dependency 'logstash-core-plugin-api', '>= 1.60', '<= 2.99' 23 | 24 | # Gem test dependencies 25 | s.add_development_dependency 'logstash-devutils' 26 | end 27 | -------------------------------------------------------------------------------- /spec/filters/aggregate_spec_helper.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require "logstash/filters/aggregate" 3 | 4 | def event(data = {}) 5 | LogStash::Event.new(data) 6 | end 7 | 8 | def timestamp(iso8601) 9 | LogStash::Timestamp.new(iso8601) 10 | end 11 | 12 | def start_event(data = {}) 13 | data["logger"] = "TASK_START" 14 | event(data) 15 | end 16 | 17 | def update_event(data = {}) 18 | data["logger"] = "SQL" 19 | event(data) 20 | end 21 | 22 | def end_event(data = {}) 23 | data["logger"] = "TASK_END" 24 | event(data) 25 | end 26 | 27 | def setup_filter(config = {}) 28 | config["task_id"] ||= "%{taskid}" 29 | filter = LogStash::Filters::Aggregate.new(config) 30 | filter.register() 31 | return filter 32 | end 33 | 34 | def filter(event) 35 | @start_filter.filter(event) 36 | @update_filter.filter(event) 37 | @end_filter.filter(event) 38 | end 39 | 40 | def pipelines() 41 | 
LogStash::Filters::Aggregate.class_variable_get(:@@pipelines) 42 | end 43 | 44 | def current_pipeline() 45 | pipelines()['main'] 46 | end 47 | 48 | def aggregate_maps() 49 | current_pipeline().aggregate_maps 50 | end 51 | 52 | def taskid_eviction_instance() 53 | current_pipeline().flush_instance_map["%{taskid}"] 54 | end 55 | 56 | def pipeline_close_instance() 57 | current_pipeline().pipeline_close_instance 58 | end 59 | 60 | def aggregate_maps_path_set() 61 | current_pipeline().aggregate_maps_path_set 62 | end 63 | 64 | def reset_timeout_management() 65 | current_pipeline().flush_instance_map.clear() 66 | current_pipeline().last_flush_timestamp_map.clear() 67 | end 68 | 69 | def reset_pipeline_variables() 70 | pipelines().clear() 71 | # reset_timeout_management() 72 | # aggregate_maps().clear() 73 | # current_pipeline().pipeline_close_instance = nil 74 | # current_pipeline().aggregate_maps_path_set = false 75 | end 76 | -------------------------------------------------------------------------------- /BUILD.md: -------------------------------------------------------------------------------- 1 | # Logstash Plugin 2 | 3 | This is a plugin for [Logstash](https://github.com/elastic/logstash). 4 | 5 | It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way. 6 | 7 | ## Documentation 8 | 9 | Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation is placed under one [central location](http://www.elasticsearch.org/guide/en/logstash/current/). 10 | 11 | - For formatting code or config examples, you can use the asciidoc `[source,ruby]` directive 12 | - For more asciidoc formatting tips, see the excellent reference here https://github.com/elastic/docs#asciidoc-guide 13 | 14 | ## Developing 15 | 16 | ### 1. Plugin Development and Testing 17 | 18 | #### Code 19 | - To get started, you'll need JRuby with the Bundler gem installed. 20 | 21 | - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example). 22 | 23 | - Install dependencies 24 | ```sh 25 | bundle install 26 | ``` 27 | 28 | #### Test 29 | 30 | - Update your dependencies 31 | 32 | ```sh 33 | bundle install 34 | ``` 35 | 36 | - Run tests 37 | 38 | ```sh 39 | bundle exec rspec 40 | ``` 41 | 42 | ### 2. Running your unpublished Plugin in Logstash 43 | 44 | #### 2.1 Run in a local Logstash clone 45 | 46 | - Edit Logstash `Gemfile` and add the local plugin path, for example: 47 | ```ruby 48 | gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome" 49 | ``` 50 | - Install plugin 51 | ```sh 52 | bin/plugin install --no-verify 53 | ``` 54 | - Run Logstash with your plugin 55 | ```sh 56 | bin/logstash -e 'filter {awesome {}}' 57 | ``` 58 | At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash. A quick smoke test is sketched below.
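To check end to end that Logstash actually picks the plugin up, you can pipe a single line through it and inspect the resulting event. This is a minimal sketch: `awesome` is the placeholder plugin name used above, and the exact `bin/logstash` flags may vary between Logstash versions.

```sh
# Hypothetical smoke test: send one line through the placeholder filter
# and print the decorated event to stdout for inspection.
echo "hello" | bin/logstash -e 'input { stdin {} } filter { awesome {} } output { stdout { codec => rubydebug } }'
```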
59 | 60 | #### 2.2 Run in an installed Logstash 61 | 62 | You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory, or you can build the gem and install it using: 63 | 64 | - Build your plugin gem 65 | ```sh 66 | gem build logstash-filter-awesome.gemspec 67 | ``` 68 | - Install the plugin from the Logstash home 69 | ```sh 70 | bin/plugin install /your/local/plugin/logstash-filter-awesome.gem 71 | ``` 72 | - Start Logstash and proceed to test the plugin 73 | 74 | ## Contributing 75 | 76 | All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin. 77 | 78 | Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here. 79 | 80 | It is more important to the community that you are able to contribute. 81 | 82 | For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file. -------------------------------------------------------------------------------- /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Logstash 2 | 3 | All contributions are welcome: ideas, patches, documentation, bug reports, 4 | complaints, etc! 5 | 6 | Programming is not a required skill, and there are many ways to help out! 7 | It is more important to us that you are able to contribute. 8 | 9 | That said, some basic guidelines, which you are free to ignore :) 10 | 11 | ## Want to learn? 12 | 13 | Want to lurk about and see what others are doing with Logstash? 14 | 15 | * The IRC channel (#logstash on irc.freenode.org) is a good place for this 16 | * The [forum](https://discuss.elastic.co/c/logstash) is also 17 | great for learning from others. 18 | 19 | ## Got Questions? 20 | 21 | Have a problem you want Logstash to solve for you? 22 | 23 | * You can ask a question in the [forum](https://discuss.elastic.co/c/logstash) 24 | * Alternatively, you are welcome to join the IRC channel #logstash on 25 | irc.freenode.org and ask for help there! 26 | 27 | ## Have an Idea or Feature Request? 28 | 29 | * File a ticket on [GitHub](https://github.com/elastic/logstash/issues). Please remember that GitHub is used only for issues and feature requests. If you have a general question, the [forum](https://discuss.elastic.co/c/logstash) or IRC would be the best place to ask. 30 | 31 | ## Something Not Working? Found a Bug? 32 | 33 | If you think you found a bug, it probably is a bug. 34 | 35 | * If it is a general Logstash or a pipeline issue, file it in [Logstash GitHub](https://github.com/elasticsearch/logstash/issues) 36 | * If it is specific to a plugin, please file it in the respective repository under [logstash-plugins](https://github.com/logstash-plugins) 37 | * or ask on the [forum](https://discuss.elastic.co/c/logstash). 38 | 39 | # Contributing Documentation and Code Changes 40 | 41 | If you have a bugfix or new feature that you would like to contribute to 42 | Logstash, and you think it will take more than a few minutes to produce the fix 43 | (i.e., write code), it is worth discussing the change with the Logstash users and developers first!
You can reach us via [GitHub](https://github.com/elastic/logstash/issues), the [forum](https://discuss.elastic.co/c/logstash), or via IRC (#logstash on freenode). 44 | Please note that Pull Requests without tests will not be merged. If you would like to contribute but do not have experience with writing tests, please ping us on IRC/forum or create a PR and ask for our help. 45 | 46 | ## Contributing to plugins 47 | 48 | Check our [documentation](https://www.elastic.co/guide/en/logstash/current/contributing-to-logstash.html) on how to contribute to plugins or write your own! It is super easy! 49 | 50 | ## Contribution Steps 51 | 52 | 1. Test your changes! [Run](https://github.com/elastic/logstash#testing) the test suite 53 | 2. Please make sure you have signed our [Contributor License 54 | Agreement](https://www.elastic.co/contributor-agreement/). We are not 55 | asking you to assign copyright to us, but to give us the right to distribute 56 | your code without restriction. We ask this of all contributors in order to 57 | assure our users of the origin and continuing existence of the code. You 58 | only need to sign the CLA once. 59 | 3. Send a pull request! Push your changes to your fork of the repository and 60 | [submit a pull 61 | request](https://help.github.com/articles/using-pull-requests). In the pull 62 | request, describe what your changes do and mention any bugs/issues related 63 | to the pull request. 64 | 65 | 66 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Aggregate Logstash Plugin 2 | 3 | [![Travis Build Status](https://travis-ci.com/logstash-plugins/logstash-filter-aggregate.svg)](https://travis-ci.com/logstash-plugins/logstash-filter-aggregate) 4 | 5 | This is a plugin for [Logstash](https://github.com/elastic/logstash). 6 | 7 | It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way. 8 | 9 | ## Documentation 10 | 11 | Latest aggregate plugin documentation is available [here](docs/index.asciidoc). 12 | 13 | Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/). 14 | 15 | - For formatting code or config examples, you can use the asciidoc `[source,ruby]` directive 16 | - For more asciidoc formatting tips, see the excellent reference here https://github.com/elastic/docs#asciidoc-guide 17 | 18 | ## Changelog 19 | 20 | Read [CHANGELOG.md](CHANGELOG.md). 21 | 22 | ## Need Help? 23 | 24 | Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. 25 | 26 | ## Developing 27 | 28 | ### 1. Plugin Development and Testing 29 | 30 | #### Code 31 | - To get started, you'll need JRuby with the Bundler gem installed. 32 | 33 | - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
34 | 35 | - Install dependencies 36 | ```sh 37 | bundle install 38 | ``` 39 | 40 | #### Test 41 | 42 | - Update your dependencies 43 | 44 | ```sh 45 | bundle install 46 | ``` 47 | 48 | - Run tests 49 | 50 | ```sh 51 | bundle exec rspec 52 | ``` 53 | 54 | ### 2. Running your unpublished Plugin in Logstash 55 | 56 | #### 2.1 Run in a local Logstash clone 57 | 58 | - Edit Logstash `Gemfile` and add the local plugin path, for example: 59 | ```ruby 60 | gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome" 61 | ``` 62 | - Install plugin 63 | ```sh 64 | # Logstash 2.3 and higher 65 | bin/logstash-plugin install --no-verify 66 | 67 | # Prior to Logstash 2.3 68 | bin/plugin install --no-verify 69 | 70 | ``` 71 | - Run Logstash with your plugin 72 | ```sh 73 | bin/logstash -e 'filter {awesome {}}' 74 | ``` 75 | At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash. 76 | 77 | #### 2.2 Run in an installed Logstash 78 | 79 | You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory or you can build the gem and install it using: 80 | 81 | - Build your plugin gem 82 | ```sh 83 | gem build logstash-filter-awesome.gemspec 84 | ``` 85 | - Install the plugin from the Logstash home 86 | ```sh 87 | # Logstash 2.3 and higher 88 | bin/logstash-plugin install --no-verify 89 | 90 | # Prior to Logstash 2.3 91 | bin/plugin install --no-verify 92 | 93 | ``` 94 | - Start Logstash and proceed to test the plugin 95 | 96 | ## Contributing 97 | 98 | All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin. 99 | 100 | Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here. 101 | 102 | It is more important to the community that you are able to contribute. 103 | 104 | For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file. -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | ## 2.10.0 2 | - new feature: add ability to generate new event during code execution (#116) 3 | 4 | ## 2.9.2 5 | - bugfix: remove 'default_timeout' at pipeline level (fix #112) 6 | - ci: update travis ci configuration 7 | 8 | ## 2.9.1 9 | - bugfix: fix inactivity timeout feature when processing old logs (PR [#103](https://github.com/logstash-plugins/logstash-filter-aggregate/pull/103), thanks @jdratlif for his contribution!) 
10 | - docs: fix several typos in documentation 11 | - docs: enhance example 4 documentation 12 | - ci: enhance plugin continuous integration 13 | 14 | ## 2.9.0 15 | - new feature: add ability to dynamically define a custom `timeout` or `inactivity_timeout` in `code` block (fix issues [#91](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/91) and [#92](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/92)) 16 | - new feature: add meta information available in `code` block through `map_meta` variable 17 | - new feature: add Logstash metrics, specific to aggregate plugin: aggregate_maps, pushed_events, task_timeouts, code_errors, timeout_code_errors 18 | - new feature: validate at startup that `map_action` option equals 'create', 'update' or 'create_or_update' 19 | 20 | ## 2.8.0 21 | - new feature: add 'timeout_timestamp_field' option (fix issue [#81](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/81)) 22 | When set, this option lets you compute the timeout based on an event timestamp field (and not system time). 23 | It's particularly useful when processing old logs. 24 | 25 | ## 2.7.2 26 | - bugfix: fix synchronisation issue at Logstash shutdown (issue [#75](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/75)) 27 | 28 | ## 2.7.1 29 | - docs: update gemspec summary 30 | 31 | ## 2.7.0 32 | - new feature: add support for multiple pipelines (for Logstash 6.0+) 33 | aggregate maps, timeout options, and aggregate_maps_path are now stored per pipeline. 34 | Each pipeline is independent. 35 | - docs: fix line breaks in documentation examples 36 | 37 | ## 2.6.4 38 | - bugfix: fix an NPE issue at Logstash 6.0 shutdown 39 | - docs: remove all redundant documentation in aggregate.rb (now only present in docs/index.asciidoc) 40 | 41 | ## 2.6.3 42 | - docs: fix some documentation issues 43 | 44 | ## 2.6.2 45 | - docs: remove incorrectly coded, redundant links 46 | 47 | ## 2.6.1 48 | - docs: bump patch level for doc build 49 | 50 | ## 2.6.0 51 | - new feature: add 'inactivity_timeout' option. 52 | Events for a given `task_id` will be aggregated for as long as they keep arriving within the defined `inactivity_timeout` option - the inactivity timeout is reset each time a new event happens. In contrast, `timeout` is never reset and happens after `timeout` seconds since aggregation map creation. 53 | 54 | ## 2.5.2 55 | - bugfix: fix 'aggregate_maps_path' load (issue [#62](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/62)).
Logstash restart failed when no data was provided in the 'aggregate_maps_path' file for some aggregate task_id patterns 56 | - enhancement: at Logstash startup, check that 'task_id' option contains a field reference expression (else raise error) 57 | - docs: enhance examples 58 | - docs: clarify that tasks are tied to their task_id pattern, even if they have the same task_id value 59 | 60 | ## 2.5.1 61 | - enhancement: when final flush occurs (just before Logstash shutdown), add `_aggregatefinalflush` tag on generated timeout events 62 | - bugfix: when final flush occurs (just before Logstash shutdown), push last aggregate map as event (if push_previous_map_as_event=true) 63 | - bugfix: fix 'timeout_task_id_field' feature when push_previous_map_as_event=true 64 | - bugfix: fix aggregate_maps_path feature (bug since v2.4.0) 65 | - internal: add debug logging 66 | - internal: refactor flush management static variables 67 | 68 | ## 2.5.0 69 | - new feature: add compatibility with Logstash 5 70 | - breaking: need Logstash 2.4 or later 71 | 72 | ## 2.4.0 73 | - new feature: You can now define timeout options per task_id pattern (fix issue [#42](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/42)) 74 | timeout options are: `timeout, timeout_code, push_map_as_event_on_timeout, push_previous_map_as_event, timeout_task_id_field, timeout_tags` 75 | - validation: a configuration error is thrown at startup if you define any timeout option on several aggregate filters for the same task_id pattern 76 | - breaking: if you use `aggregate_maps_path` option, the storage format has changed, so you have to delete the `aggregate_maps_path` file before starting Logstash 77 | 78 | ## 2.3.1 79 | - new feature: Add new option "timeout_tags" so that you can add tags to generated timeout events 80 | 81 | ## 2.3.0 82 | - new feature: Add new option "push_map_as_event_on_timeout" so that when a task timeout happens the aggregation map can be yielded as a new event 83 | - new feature: Add new option "timeout_code" which takes the timeout event populated with the aggregation map and executes code on it. This works for "push_map_as_event_on_timeout" as well as "push_previous_map_as_event" 84 | - new feature: Add new option "timeout_task_id_field" which is used to map the task_id on timeout events. An illustrative configuration combining these timeout options is sketched below.
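For illustration, a minimal configuration combining the timeout options introduced in 2.3.x/2.4.0 might look like the following sketch; the `code` block, field names, and values are hypothetical, only the option names come from the entries above.

```ruby
filter {
  aggregate {
    task_id => "%{taskid}"
    code => "map['sql_duration'] ||= 0; map['sql_duration'] += event.get('duration')"
    push_map_as_event_on_timeout => true
    timeout => 120                        # seconds since aggregate map creation
    timeout_task_id_field => "taskid"     # copy the task id into the generated timeout event
    timeout_code => "event.set('late', true)"
    timeout_tags => ["_aggregatetimeout"]
  }
}
```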
85 | 86 | ## 2.2.0 87 | - new feature: add new option "push_previous_map_as_event" so that each time aggregate plugin detects a new task id, it pushes previous aggregate map as a new logstash event 88 | 89 | ## 2.1.2 90 | - bugfix: clarify default timeout behaviour: by default, timeout is 1800s 91 | 92 | ## 2.1.1 93 | - bugfix: when "aggregate_maps_path" option is defined in more than one aggregate filter, raise a Logstash::ConfigurationError 94 | - bugfix: add support for logstash hot reload feature 95 | 96 | ## 2.1.0 97 | - new feature: add new option "aggregate_maps_path" so that aggregate maps can be stored at logstash shutdown and reloaded at logstash startup 98 | 99 | ## 2.0.5 100 | - internal,deps: Depend on logstash-core-plugin-api instead of logstash-core, removing the need to mass update plugins on major releases of logstash 101 | - breaking: need Logstash 2.3 or later 102 | 103 | ## 2.0.4 104 | - internal,deps: New dependency requirements for logstash-core for the 5.0 release 105 | 106 | ## 2.0.3 107 | - bugfix: fix issue [#10](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/10): a numeric task_id is now processed correctly 108 | 109 | ## 2.0.2 110 | - bugfix: fix issue [#5](https://github.com/logstash-plugins/logstash-filter-aggregate/issues/5): when the code call raises an exception, the error is logged and the event is tagged '_aggregateexception'. This avoids a Logstash crash. 111 | 112 | ## 2.0.0 113 | - internal: Plugins were updated to follow the new shutdown semantic; this mainly allows Logstash to instruct input plugins to terminate gracefully, instead of using Thread.raise on the plugins' threads. 114 | Ref: https://github.com/elastic/logstash/pull/3895 115 | - internal,deps: Dependency on logstash-core update to 2.0 116 | 117 | ## 0.1.3 118 | - breaking: remove "milestone" method call, which is deprecated in logstash 1.5; breaks compatibility with logstash 1.4 119 | - internal,test: enhanced tests using 'expect' command 120 | - docs: add a second example in documentation 121 | 122 | ## 0.1.2 123 | - compatible with logstash 1.4 124 | - first version available on github 125 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License.
26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. 
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. 
Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright 2020 Elastic and contributors 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 
203 | -------------------------------------------------------------------------------- /spec/filters/aggregate_spec.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require "logstash/devutils/rspec/spec_helper" 3 | require "logstash/filters/aggregate" 4 | require_relative "aggregate_spec_helper" 5 | 6 | describe LogStash::Filters::Aggregate do 7 | 8 | before(:each) do 9 | reset_pipeline_variables() 10 | @start_filter = setup_filter({ "map_action" => "create", "code" => "map['sql_duration'] = 0" }) 11 | @update_filter = setup_filter({ "map_action" => "update", "code" => "map['sql_duration'] += event.get('duration')" }) 12 | @end_filter = setup_filter({"timeout_task_id_field" => "my_id", "push_map_as_event_on_timeout" => true, "map_action" => "update", "code" => "event.set('sql_duration', map['sql_duration'])", "end_of_task" => true, "timeout" => 5, "inactivity_timeout" => 2, "timeout_code" => "event.set('test', 'testValue')", "timeout_tags" => ["tag1", "tag2"] }) 13 | end 14 | 15 | context "Validation" do 16 | describe "and register a filter with a task_id without dynamic expression" do 17 | it "raises a LogStash::ConfigurationError" do 18 | expect { 19 | setup_filter({ "code" => "", "task_id" => "static_value" }) 20 | }.to raise_error(LogStash::ConfigurationError) 21 | end 22 | end 23 | describe "and register a filter with inactivity_timeout longer than timeout" do 24 | it "raises a LogStash::ConfigurationError" do 25 | expect { 26 | # use a different task_id pattern, otherwise the timeout settings cannot be updated 27 | setup_filter({ "task_id" => "%{taskid2}", "code" => "", "timeout" => 2, "inactivity_timeout" => 3 }) 28 | }.to raise_error(LogStash::ConfigurationError) 29 | end 30 | end 31 | end 32 | 33 | context "Start event" do 34 | describe "and receiving an event without task_id" do 35 | it "does not record it" do 36 | @start_filter.filter(event()) 37 | expect(aggregate_maps["%{taskid}"]).to be_empty 38 | end 39 | end 40 | describe "and receiving an event with task_id" do 41 | it "records it" do 42 | event = start_event("taskid" => "id123") 43 | @start_filter.filter(event) 44 | 45 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 46 | expect(aggregate_maps["%{taskid}"]["id123"]).not_to be_nil 47 | expect(aggregate_maps["%{taskid}"]["id123"].creation_timestamp).to be >= event.timestamp.time 48 | expect(aggregate_maps["%{taskid}"]["id123"].map["sql_duration"]).to eq(0) 49 | end 50 | end 51 | 52 | describe "and receiving two 'start events' for the same task_id" do 53 | it "keeps the first one and does nothing with the second one" do 54 | 55 | first_start_event = start_event("taskid" => "id124") 56 | @start_filter.filter(first_start_event) 57 | 58 | first_update_event = update_event("taskid" => "id124", "duration" => 2) 59 | @update_filter.filter(first_update_event) 60 | 61 | sleep(1) 62 | second_start_event = start_event("taskid" => "id124") 63 | @start_filter.filter(second_start_event) 64 | 65 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 66 | expect(aggregate_maps["%{taskid}"]["id124"].creation_timestamp).to be < second_start_event.timestamp.time 67 | expect(aggregate_maps["%{taskid}"]["id124"].map["sql_duration"]).to eq(first_update_event.get("duration")) 68 | end 69 | end 70 | end 71 | 72 | context "End event" do 73 | describe "receiving an end event" do 74 | describe "but without a previous 'start event'" do 75 | it "does nothing with the event" do 76 | end_event = end_event("taskid"
=> "id124") 77 | @end_filter.filter(end_event) 78 | 79 | expect(aggregate_maps["%{taskid}"]).to be_empty 80 | expect(end_event.get("sql_duration")).to be_nil 81 | end 82 | end 83 | end 84 | end 85 | 86 | context "Start/end events interaction" do 87 | describe "receiving a 'start event'" do 88 | before(:each) do 89 | @task_id_value = "id_123" 90 | @start_event = start_event({"taskid" => @task_id_value}) 91 | @start_filter.filter(@start_event) 92 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 93 | end 94 | 95 | describe "and receiving an end event" do 96 | describe "and without an id" do 97 | it "does nothing" do 98 | end_event = end_event() 99 | @end_filter.filter(end_event) 100 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 101 | expect(end_event.get("sql_duration")).to be_nil 102 | end 103 | end 104 | 105 | describe "and an id different from the one of the 'start event'" do 106 | it "does nothing" do 107 | different_id_value = @task_id_value + "_different" 108 | @end_filter.filter(end_event("taskid" => different_id_value)) 109 | 110 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 111 | expect(aggregate_maps["%{taskid}"][@task_id_value]).not_to be_nil 112 | end 113 | end 114 | 115 | describe "and the same id of the 'start event'" do 116 | it "add 'sql_duration' field to the end event and deletes the aggregate map associated to taskid" do 117 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 118 | expect(aggregate_maps["%{taskid}"][@task_id_value].map["sql_duration"]).to eq(0) 119 | 120 | @update_filter.filter(update_event("taskid" => @task_id_value, "duration" => 2)) 121 | expect(aggregate_maps["%{taskid}"][@task_id_value].map["sql_duration"]).to eq(2) 122 | 123 | end_event = end_event("taskid" => @task_id_value) 124 | @end_filter.filter(end_event) 125 | 126 | expect(aggregate_maps["%{taskid}"]).to be_empty 127 | expect(end_event.get("sql_duration")).to eq(2) 128 | end 129 | 130 | end 131 | end 132 | end 133 | end 134 | 135 | context "Event with integer task id" do 136 | it "works as well as with a string task id" do 137 | start_event = start_event("taskid" => 124) 138 | @start_filter.filter(start_event) 139 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 140 | end 141 | end 142 | 143 | context "Event which causes an exception when code call" do 144 | it "intercepts exception, logs the error and tags the event with '_aggregateexception'" do 145 | @start_filter = setup_filter({ "code" => "fail 'Test'" }) 146 | start_event = start_event("taskid" => "id124") 147 | @start_filter.filter(start_event) 148 | 149 | expect(start_event.get("tags")).to eq(["_aggregateexception"]) 150 | end 151 | end 152 | 153 | context "flush call" do 154 | before(:each) do 155 | @end_filter.timeout = 1 156 | expect(@end_filter.timeout).to eq(1) 157 | @task_id_value = "id_123" 158 | @start_event = start_event({"taskid" => @task_id_value}) 159 | @start_filter.filter(@start_event) 160 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 161 | end 162 | 163 | describe "no timeout defined in none filter" do 164 | it "defines a default timeout on a default filter" do 165 | reset_timeout_management() 166 | @end_filter.timeout = nil 167 | expect(taskid_eviction_instance).to be_nil 168 | @end_filter.flush() 169 | expect(taskid_eviction_instance).to eq(@end_filter) 170 | expect(@end_filter.timeout).to eq(LogStash::Filters::Aggregate::DEFAULT_TIMEOUT) 171 | end 172 | end 173 | 174 | describe "timeout is defined on another filter" do 175 | it "taskid eviction_instance is not updated" do 176 | 
expect(taskid_eviction_instance).not_to be_nil 177 | @start_filter.flush() 178 | expect(taskid_eviction_instance).not_to eq(@start_filter) 179 | expect(taskid_eviction_instance).to eq(@end_filter) 180 | end 181 | end 182 | 183 | describe "no timeout defined on the filter" do 184 | it "event is not removed" do 185 | sleep(2) 186 | @start_filter.flush() 187 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 188 | end 189 | end 190 | 191 | describe "timeout defined on the filter" do 192 | it "event is not removed if not expired" do 193 | entries = @end_filter.flush() 194 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 195 | expect(entries).to be_empty 196 | end 197 | it "removes event if expired and creates a new timeout event" do 198 | sleep(2) 199 | entries = @end_filter.flush() 200 | expect(aggregate_maps["%{taskid}"]).to be_empty 201 | expect(entries.size).to eq(1) 202 | expect(entries[0].get("my_id")).to eq("id_123") # task id 203 | expect(entries[0].get("sql_duration")).to eq(0) # Aggregation map 204 | expect(entries[0].get("test")).to eq("testValue") # Timeout code 205 | expect(entries[0].get("tags")).to eq(["tag1", "tag2"]) # Timeout tags 206 | end 207 | end 208 | 209 | describe "timeout defined on another filter with another task_id pattern" do 210 | it "does not remove event" do 211 | another_filter = setup_filter({ "task_id" => "%{another_taskid}", "code" => "", "timeout" => 1 }) 212 | sleep(2) 213 | entries = another_filter.flush() 214 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 215 | expect(entries).to be_empty 216 | end 217 | end 218 | 219 | context "inactivity_timeout" do 220 | before(:each) do 221 | @end_filter.timeout = 4 222 | expect(@end_filter.timeout).to eq(4) 223 | @end_filter.inactivity_timeout = 2 224 | expect(@end_filter.inactivity_timeout).to eq(2) 225 | @task_id_value = "id_123" 226 | @start_event = start_event({"taskid" => @task_id_value}) 227 | @start_filter.filter(@start_event) 228 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 229 | end 230 | describe "event arrives before inactivity_timeout" do 231 | it "does not remove event" do 232 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 233 | sleep(1) 234 | @start_filter.filter(start_event({"task_id" => @task_id_value})) 235 | entries = @end_filter.flush() 236 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 237 | expect(entries).to be_empty 238 | end 239 | end 240 | describe "no event arrives after inactivity_timeout" do 241 | it "removes event" do 242 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 243 | sleep(3) 244 | entries = @end_filter.flush() 245 | expect(aggregate_maps["%{taskid}"]).to be_empty 246 | expect(entries.size).to eq(1) 247 | end 248 | end 249 | describe "timeout expires while events arrive within inactivity_timeout" do 250 | it "removes event" do 251 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 252 | sleep(1) 253 | @start_filter.filter(start_event({"task_id" => @task_id_value})) 254 | sleep(1) 255 | @start_filter.filter(start_event({"task_id" => @task_id_value})) 256 | sleep(1) 257 | @start_filter.filter(start_event({"task_id" => @task_id_value})) 258 | sleep(2) 259 | @start_filter.filter(start_event({"task_id" => @task_id_value})) 260 | entries = @end_filter.flush() 261 | expect(aggregate_maps["%{taskid}"]).to be_empty 262 | expect(entries.size).to eq(1) 263 | end 264 | end 265 | end 266 | end 267 | 268 | context "aggregate_maps_path option is defined, " do 269 | describe "close event then register event, " do 270 | it "stores aggregate
maps to configured file and then loads aggregate maps from file" do 271 | store_file = "aggregate_maps" 272 | File.delete(store_file) if File.exist?(store_file) 273 | expect(File.exist?(store_file)).to be false 274 | 275 | one_filter = setup_filter({ "task_id" => "%{one_special_field}", "code" => ""}) 276 | store_filter = setup_filter({ "code" => "map['sql_duration'] = 0", "aggregate_maps_path" => store_file }) 277 | expect(aggregate_maps["%{one_special_field}"]).to be_empty 278 | expect(aggregate_maps["%{taskid}"]).to be_empty 279 | 280 | start_event = start_event("taskid" => 124) 281 | filter = store_filter.filter(start_event) 282 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 283 | 284 | @end_filter.close() 285 | expect(aggregate_maps).not_to be_empty 286 | 287 | store_filter.close() 288 | expect(File.exist?(store_file)).to be true 289 | expect(current_pipeline).to be_nil 290 | 291 | one_filter = setup_filter({ "task_id" => "%{one_special_field}", "code" => ""}) 292 | store_filter = setup_filter({ "code" => "map['sql_duration'] = 0", "aggregate_maps_path" => store_file }) 293 | expect(File.exist?(store_file)).to be false 294 | expect(aggregate_maps["%{one_special_field}"]).to be_empty 295 | expect(aggregate_maps["%{taskid}"].size).to eq(1) 296 | end 297 | end 298 | 299 | describe "when aggregate_maps_path option is defined in 2 instances, " do 300 | it "raises Logstash::ConfigurationError" do 301 | expect { 302 | setup_filter({ "code" => "", "aggregate_maps_path" => "aggregate_maps1" }) 303 | setup_filter({ "code" => "", "aggregate_maps_path" => "aggregate_maps2" }) 304 | }.to raise_error(LogStash::ConfigurationError) 305 | end 306 | end 307 | end 308 | 309 | context "Logstash reload occurs, " do 310 | describe "close method is called, " do 311 | it "reinitializes pipelines" do 312 | @end_filter.close() 313 | expect(current_pipeline).to be_nil 314 | 315 | @end_filter.register() 316 | expect(current_pipeline).not_to be_nil 317 | expect(aggregate_maps).not_to be_nil 318 | expect(pipeline_close_instance).to be_nil 319 | end 320 | end 321 | end 322 | 323 | context "push_previous_map_as_event option is defined, " do 324 | describe "when push_previous_map_as_event option is activated on another filter with same task_id pattern" do 325 | it "should throw a LogStash::ConfigurationError" do 326 | expect { 327 | setup_filter({"code" => "map['taskid'] = event.get('taskid')", "push_previous_map_as_event" => true}) 328 | }.to raise_error(LogStash::ConfigurationError) 329 | end 330 | end 331 | 332 | describe "when a new task id is detected, " do 333 | it "should push previous map as new event" do 334 | push_filter = setup_filter({ "task_id" => "%{ppm_id}", "code" => "map['ppm_id'] = event.get('ppm_id')", "push_previous_map_as_event" => true, "timeout" => 5, "timeout_task_id_field" => "timeout_task_id_field" }) 335 | push_filter.filter(event({"ppm_id" => "1"})) { |yield_event| fail "task 1 shouldn't have yield event" } 336 | push_filter.filter(event({"ppm_id" => "2"})) do |yield_event| 337 | expect(yield_event.get("ppm_id")).to eq("1") 338 | expect(yield_event.get("timeout_task_id_field")).to eq("1") 339 | end 340 | expect(aggregate_maps["%{ppm_id}"].size).to eq(1) 341 | end 342 | end 343 | 344 | describe "when timeout happens, " do 345 | it "flush method should return last map as new event" do 346 | push_filter = setup_filter({ "task_id" => "%{ppm_id}", "code" => "map['ppm_id'] = event.get('ppm_id')", "push_previous_map_as_event" => true, "timeout" => 1, "timeout_code" => "event.set('test', 
'testValue')" }) 347 | push_filter.filter(event({"ppm_id" => "1"})) 348 | sleep(2) 349 | events_to_flush = push_filter.flush() 350 | expect(events_to_flush).not_to be_nil 351 | expect(events_to_flush.size).to eq(1) 352 | expect(events_to_flush[0].get("ppm_id")).to eq("1") 353 | expect(events_to_flush[0].get('test')).to eq("testValue") 354 | expect(aggregate_maps["%{ppm_id}"].size).to eq(0) 355 | end 356 | end 357 | 358 | describe "when Logstash shutdown happens, " do 359 | it "flush method should return last map as new event even if timeout has not occured" do 360 | push_filter = setup_filter({ "task_id" => "%{ppm_id}", "code" => "", "push_previous_map_as_event" => true, "timeout" => 4 }) 361 | push_filter.filter(event({"ppm_id" => "1"})) 362 | events_to_flush = push_filter.flush({:final=>false}) 363 | expect(events_to_flush).to be_empty 364 | expect(aggregate_maps["%{ppm_id}"].size).to eq(1) 365 | events_to_flush = push_filter.flush({:final=>true}) 366 | expect(events_to_flush).not_to be_nil 367 | expect(events_to_flush.size).to eq(1) 368 | expect(events_to_flush[0].get("tags")).to eq(["_aggregatefinalflush"]) 369 | expect(aggregate_maps["%{ppm_id}"].size).to eq(0) 370 | end 371 | end 372 | end 373 | 374 | context "timeout_timestamp_field option is defined, " do 375 | describe "when 3 old events arrive, " do 376 | it "should push a new aggregated event using timeout based on events timestamp" do 377 | agg_filter = setup_filter({ "task_id" => "%{ppm_id}", "code" => "map['sql_duration'] ||= 0; map['sql_duration'] += event.get('duration')", "timeout_timestamp_field" => "@timestamp", "push_map_as_event_on_timeout" => true, "timeout" => 120 }) 378 | agg_filter.filter(event({"ppm_id" => "1", "duration" => 2, "@timestamp" => timestamp("2018-01-31T00:00:00Z")})) { |yield_event| fail "it shouldn't have yield event" } 379 | agg_filter.filter(event({"ppm_id" => "1", "duration" => 3, "@timestamp" => timestamp("2018-01-31T00:00:01Z")})) { |yield_event| fail "it shouldn't have yield event" } 380 | events_to_flush = agg_filter.flush() 381 | expect(events_to_flush).to be_empty 382 | agg_filter.filter(event({"ppm_id" => "1", "duration" => 4, "@timestamp" => timestamp("2018-01-31T00:05:00Z")})) do |yield_event| 383 | expect(yield_event).not_to be_nil 384 | expect(yield_event.get("sql_duration")).to eq(5) 385 | end 386 | expect(aggregate_maps["%{ppm_id}"].size).to eq(1) 387 | expect(aggregate_maps["%{ppm_id}"]["1"].map["sql_duration"]).to eq(4) 388 | end 389 | end 390 | end 391 | 392 | context "custom timeout on map_meta, " do 393 | describe "when map_meta.timeout=0, " do 394 | it "should push a new aggregated event immediately" do 395 | agg_filter = setup_filter({ "task_id" => "%{ppm_id}", "code" => "map['sql_duration'] = 2; map_meta.timeout = 0", "push_map_as_event_on_timeout" => true, "timeout" => 120 }) 396 | agg_filter.filter(event({"ppm_id" => "1"})) do |yield_event| 397 | expect(yield_event).not_to be_nil 398 | expect(yield_event.get("sql_duration")).to eq(2) 399 | end 400 | expect(aggregate_maps["%{ppm_id}"]).to be_empty 401 | end 402 | end 403 | describe "when map_meta.timeout=0 and push_map_as_event_on_timeout=false, " do 404 | it "should just remove expired map and not push an aggregated event" do 405 | agg_filter = setup_filter({ "task_id" => "%{ppm_id}", "code" => "map_meta.timeout = 0", "push_map_as_event_on_timeout" => false, "timeout" => 120 }) 406 | agg_filter.filter(event({"ppm_id" => "1"})) { |yield_event| fail "it shouldn't have yield event" } 407 | expect(aggregate_maps["%{ppm_id}"]).to 
be_empty 408 | end 409 | end 410 | describe "when map_meta.inactivity_timeout=1, " do 411 | it "should push a new aggregated event at next flush call" do 412 | agg_filter = setup_filter({ "task_id" => "%{ppm_id}", "code" => "map['sql_duration'] = 2; map_meta.inactivity_timeout = 1", "push_map_as_event_on_timeout" => true, "timeout" => 120 }) 413 | agg_filter.filter(event({"ppm_id" => "1"})) { |yield_event| fail "it shouldn't have yield event" } 414 | expect(aggregate_maps["%{ppm_id}"].size).to eq(1) 415 | sleep(2) 416 | events_to_flush = agg_filter.flush() 417 | expect(events_to_flush.size).to eq(1) 418 | expect(aggregate_maps["%{ppm_id}"]).to be_empty 419 | end 420 | end 421 | end 422 | 423 | context "Custom event generation code is used" do 424 | describe "when a new event is manually generated" do 425 | it "should push a new event immediately" do 426 | agg_filter = setup_filter({ "task_id" => "%{task_id}", "code" => "map['sql_duration'] = 2; new_event_block.call(LogStash::Event.new({:my_sql_duration => map['sql_duration']}))", "timeout" => 120 }) 427 | agg_filter.filter(event({"task_id" => "1"})) do |yield_event| 428 | expect(yield_event).not_to be_nil 429 | expect(yield_event.get("my_sql_duration")).to eq(2) 430 | end 431 | end 432 | end 433 | 434 | end 435 | 436 | end -------------------------------------------------------------------------------- /lib/logstash/filters/aggregate.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | 3 | require "logstash/filters/base" 4 | require "logstash/namespace" 5 | require "thread" 6 | require "logstash/util/decorators" 7 | 8 | 9 | class LogStash::Filters::Aggregate < LogStash::Filters::Base 10 | 11 | 12 | # ############## # 13 | # CONFIG OPTIONS # 14 | # ############## # 15 | 16 | 17 | config_name "aggregate" 18 | 19 | config :task_id, :validate => :string, :required => true 20 | 21 | config :code, :validate => :string, :required => true 22 | 23 | config :map_action, :validate => ["create", "update", "create_or_update"], :default => "create_or_update" 24 | 25 | config :end_of_task, :validate => :boolean, :default => false 26 | 27 | config :aggregate_maps_path, :validate => :string, :required => false 28 | 29 | config :timeout, :validate => :number, :required => false 30 | 31 | config :inactivity_timeout, :validate => :number, :required => false 32 | 33 | config :timeout_code, :validate => :string, :required => false 34 | 35 | config :push_map_as_event_on_timeout, :validate => :boolean, :required => false, :default => false 36 | 37 | config :push_previous_map_as_event, :validate => :boolean, :required => false, :default => false 38 | 39 | config :timeout_timestamp_field, :validate => :string, :required => false 40 | 41 | config :timeout_task_id_field, :validate => :string, :required => false 42 | 43 | config :timeout_tags, :validate => :array, :required => false, :default => [] 44 | 45 | 46 | # ################## # 47 | # INSTANCE VARIABLES # 48 | # ################## # 49 | 50 | 51 | # pointer to current pipeline context 52 | attr_accessor :current_pipeline 53 | 54 | # boolean indicating if expired maps should be checked on every flush call (typically because a custom timeout has been set on a map) 55 | attr_accessor :check_expired_maps_on_every_flush 56 | 57 | # ################ # 58 | # STATIC VARIABLES # 59 | # ################ # 60 | 61 | 62 | # Default timeout (in seconds) when not defined in plugin configuration 63 | DEFAULT_TIMEOUT = 1800 64 | 65 | # Store all shared aggregate
attributes per pipeline id 66 | @@pipelines = {} 67 | 68 | 69 | # ####### # 70 | # METHODS # 71 | # ####### # 72 | 73 | 74 | # Initialize plugin 75 | public 76 | def register 77 | 78 | @logger.debug("Aggregate register call", :code => @code) 79 | 80 | # validate task_id option 81 | if !@task_id.match(/%\{.+\}/) 82 | raise LogStash::ConfigurationError, "Aggregate plugin: task_id pattern '#{@task_id}' must contain a dynamic expression like '%{field}'" 83 | end 84 | 85 | # process lambda expression to call in each filter call 86 | eval("@codeblock = lambda { |event, map, map_meta, &new_event_block| #{@code} }", binding, "(aggregate filter code)") 87 | 88 | # process lambda expression to call in the timeout case or previous event case 89 | if @timeout_code 90 | eval("@timeout_codeblock = lambda { |event| #{@timeout_code} }", binding, "(aggregate filter timeout code)") 91 | end 92 | 93 | # init pipeline context 94 | @@pipelines[pipeline_id] ||= LogStash::Filters::Aggregate::Pipeline.new() 95 | @current_pipeline = @@pipelines[pipeline_id] 96 | 97 | @current_pipeline.mutex.synchronize do 98 | 99 | # timeout management : define eviction_instance for current task_id pattern 100 | if has_timeout_options? 101 | if @current_pipeline.flush_instance_map.has_key?(@task_id) 102 | # all timeout options have to be defined in only one aggregate filter per task_id pattern 103 | raise LogStash::ConfigurationError, "Aggregate plugin: For task_id pattern '#{@task_id}', more than one filter defines timeout options. All timeout options have to be defined in only one aggregate filter per task_id pattern. Timeout options are : #{display_timeout_options}" 104 | end 105 | @current_pipeline.flush_instance_map[@task_id] = self 106 | @logger.debug("Aggregate timeout for '#{@task_id}' pattern: #{@timeout} seconds") 107 | end 108 | 109 | # inactivity timeout management: make sure it is lower than timeout 110 | if @inactivity_timeout && ((@timeout && @inactivity_timeout > @timeout) || (@timeout.nil? 
&& @inactivity_timeout > DEFAULT_TIMEOUT)) 111 | raise LogStash::ConfigurationError, "Aggregate plugin: For task_id pattern #{@task_id}, inactivity_timeout (#{@inactivity_timeout}) must be lower than timeout (#{@timeout})" 112 | end 113 | 114 | # reinit pipeline_close_instance (if necessary) 115 | if !@current_pipeline.aggregate_maps_path_set && @current_pipeline.pipeline_close_instance 116 | @current_pipeline.pipeline_close_instance = nil 117 | end 118 | 119 | # check if aggregate_maps_path option has already been set on another instance else set @current_pipeline.aggregate_maps_path_set 120 | if @aggregate_maps_path 121 | if @current_pipeline.aggregate_maps_path_set 122 | @current_pipeline.aggregate_maps_path_set = false 123 | raise LogStash::ConfigurationError, "Aggregate plugin: Option 'aggregate_maps_path' must be set on only one aggregate filter" 124 | else 125 | @current_pipeline.aggregate_maps_path_set = true 126 | @current_pipeline.pipeline_close_instance = self 127 | end 128 | end 129 | 130 | # load aggregate maps from file (if option defined) 131 | if @aggregate_maps_path && File.exist?(@aggregate_maps_path) 132 | File.open(@aggregate_maps_path, "r") { |from_file| @current_pipeline.aggregate_maps.merge!(Marshal.load(from_file)) } 133 | File.delete(@aggregate_maps_path) 134 | @logger.info("Aggregate maps loaded from : #{@aggregate_maps_path}") 135 | end 136 | 137 | # init aggregate_maps 138 | @current_pipeline.aggregate_maps[@task_id] ||= {} 139 | update_aggregate_maps_metric() 140 | 141 | end 142 | end 143 | 144 | # Called when Logstash stops 145 | public 146 | def close 147 | 148 | @logger.debug("Aggregate close call", :code => @code) 149 | 150 | # define pipeline close instance if none is already defined 151 | @current_pipeline.pipeline_close_instance = self if @current_pipeline.pipeline_close_instance.nil? 152 | 153 | if @current_pipeline.pipeline_close_instance == self 154 | # store aggregate maps to file (if option defined) 155 | @current_pipeline.mutex.synchronize do 156 | @current_pipeline.aggregate_maps.delete_if { |key, value| value.empty? } 157 | if @aggregate_maps_path && !@current_pipeline.aggregate_maps.empty? 158 | File.open(@aggregate_maps_path, "w"){ |to_file| Marshal.dump(@current_pipeline.aggregate_maps, to_file) } 159 | @logger.info("Aggregate maps stored to : #{@aggregate_maps_path}") 160 | end 161 | end 162 | 163 | # remove pipeline context for Logstash reload 164 | @@pipelines.delete(pipeline_id) 165 | end 166 | 167 | end 168 | 169 | # This method is invoked each time an event matches the filter 170 | public 171 | def filter(event, &new_event_block) 172 | 173 | # define task id 174 | task_id = event.sprintf(@task_id) 175 | return if task_id.nil? || task_id == @task_id 176 | 177 | noError = false 178 | event_to_yield = nil 179 | 180 | # protect aggregate_maps against concurrent access, using a mutex 181 | @current_pipeline.mutex.synchronize do 182 | 183 | # if timeout is based on event timestamp, check if task_id map is expired and should be removed 184 | if @timeout_timestamp_field 185 | event_to_yield = remove_expired_map_based_on_event_timestamp(task_id, event) 186 | end 187 | 188 | # retrieve the current aggregate map 189 | aggregate_maps_element = @current_pipeline.aggregate_maps[@task_id][task_id] 190 | 191 | # case where aggregate map isn't already created 192 | if aggregate_maps_element.nil? 
193 | return if @map_action == "update" 194 | 195 | # create new event from previous map, if @push_previous_map_as_event is enabled 196 | if @push_previous_map_as_event && !@current_pipeline.aggregate_maps[@task_id].empty? 197 | event_to_yield = extract_previous_map_as_event() 198 | end 199 | 200 | # create aggregate map 201 | creation_timestamp = reference_timestamp(event) 202 | aggregate_maps_element = LogStash::Filters::Aggregate::Element.new(creation_timestamp, task_id) 203 | @current_pipeline.aggregate_maps[@task_id][task_id] = aggregate_maps_element 204 | update_aggregate_maps_metric() 205 | else 206 | return if @map_action == "create" 207 | end 208 | 209 | # update last event timestamp 210 | aggregate_maps_element.lastevent_timestamp = reference_timestamp(event) 211 | aggregate_maps_element.difference_from_lastevent_to_now = (Time.now - aggregate_maps_element.lastevent_timestamp).to_i 212 | 213 | # execute the code to read/update map and event 214 | map = aggregate_maps_element.map 215 | begin 216 | @codeblock.call(event, map, aggregate_maps_element, &new_event_block) 217 | @logger.debug("Aggregate successful filter code execution", :code => @code) 218 | noError = true 219 | rescue => exception 220 | @logger.error("Aggregate exception occurred", 221 | :error => exception, 222 | :code => @code, 223 | :map => map, 224 | :event_data => event.to_hash_with_metadata) 225 | event.tag("_aggregateexception") 226 | metric.increment(:code_errors) 227 | end 228 | 229 | # delete the map if task is ended 230 | @current_pipeline.aggregate_maps[@task_id].delete(task_id) if @end_of_task 231 | update_aggregate_maps_metric() 232 | 233 | # process custom timeout set by code block 234 | if (aggregate_maps_element.timeout || aggregate_maps_element.inactivity_timeout) 235 | event_to_yield = process_map_timeout(aggregate_maps_element) 236 | end 237 | 238 | end 239 | 240 | # match the filter, only if no error occurred 241 | filter_matched(event) if noError 242 | 243 | # yield previous map as new event if set 244 | yield event_to_yield if event_to_yield 245 | end 246 | 247 | # Process a custom timeout defined in aggregate map element 248 | # Returns an event to yield if timeout=0 and push_map_as_event_on_timeout=true 249 | def process_map_timeout(element) 250 | event_to_yield = nil 251 | init_pipeline_timeout_management() 252 | if (element.timeout == 0 || element.inactivity_timeout == 0) 253 | @current_pipeline.aggregate_maps[@task_id].delete(element.task_id) 254 | if @current_pipeline.flush_instance_map[@task_id].push_map_as_event_on_timeout 255 | event_to_yield = create_timeout_event(element.map, element.task_id) 256 | end 257 | @logger.debug("Aggregate remove expired map with task_id=#{element.task_id} and custom timeout=0") 258 | metric.increment(:task_timeouts) 259 | update_aggregate_maps_metric() 260 | else 261 | @current_pipeline.flush_instance_map[@task_id].check_expired_maps_on_every_flush ||= true 262 | end 263 | return event_to_yield 264 | end 265 | 266 | # Create a new event from the aggregation_map and the corresponding task_id 267 | # This will create the event and 268 | # if @timeout_task_id_field is set, it will set the task_id on the timeout event 269 | # if @timeout_code is set, it will execute the timeout code on the created timeout event 270 | # returns the newly created event 271 | def create_timeout_event(aggregation_map, task_id) 272 | 273 | @logger.debug("Aggregate create_timeout_event call with task_id '#{task_id}'") 274 | 275 | event_to_yield = LogStash::Event.new(aggregation_map) 
276 | 277 | if @timeout_task_id_field 278 | event_to_yield.set(@timeout_task_id_field, task_id) 279 | end 280 | 281 | LogStash::Util::Decorators.add_tags(@timeout_tags, event_to_yield, "filters/#{self.class.name}") 282 | 283 | 284 | # Call timeout code block if available 285 | if @timeout_code 286 | begin 287 | @timeout_codeblock.call(event_to_yield) 288 | rescue => exception 289 | @logger.error("Aggregate exception occurred", 290 | :error => exception, 291 | :timeout_code => @timeout_code, 292 | :timeout_event_data => event_to_yield.to_hash_with_metadata) 293 | event_to_yield.tag("_aggregateexception") 294 | metric.increment(:timeout_code_errors) 295 | end 296 | end 297 | 298 | metric.increment(:pushed_events) 299 | 300 | return event_to_yield 301 | end 302 | 303 | # Extract the previous map in aggregate maps, and return it as a new Logstash event 304 | def extract_previous_map_as_event 305 | previous_entry = @current_pipeline.aggregate_maps[@task_id].shift() 306 | previous_task_id = previous_entry[0] 307 | previous_map = previous_entry[1].map 308 | update_aggregate_maps_metric() 309 | return create_timeout_event(previous_map, previous_task_id) 310 | end 311 | 312 | # Necessary to indicate Logstash to periodically call 'flush' method 313 | def periodic_flush 314 | true 315 | end 316 | 317 | # This method is invoked by LogStash every 5 seconds. 318 | def flush(options = {}) 319 | 320 | @logger.trace("Aggregate flush call with #{options}") 321 | 322 | # init flush/timeout properties for current pipeline 323 | init_pipeline_timeout_management() 324 | 325 | # launch timeout management only every interval of (@inactivity_timeout / 2) seconds or at Logstash shutdown 326 | if @current_pipeline.flush_instance_map[@task_id] == self && @current_pipeline.aggregate_maps[@task_id] && (!@current_pipeline.last_flush_timestamp_map.has_key?(@task_id) || Time.now > @current_pipeline.last_flush_timestamp_map[@task_id] + @inactivity_timeout / 2 || options[:final] || @check_expired_maps_on_every_flush) 327 | events_to_flush = remove_expired_maps() 328 | 329 | # at Logstash shutdown, if push_previous_map_as_event is enabled, it's important to force flush (particularly for jdbc input plugin) 330 | @current_pipeline.mutex.synchronize do 331 | if options[:final] && @push_previous_map_as_event && !@current_pipeline.aggregate_maps[@task_id].empty? 332 | events_to_flush << extract_previous_map_as_event() 333 | end 334 | end 335 | 336 | update_aggregate_maps_metric() 337 | 338 | # tag flushed events, indicating "final flush" special event 339 | if options[:final] 340 | events_to_flush.each { |event_to_flush| event_to_flush.tag("_aggregatefinalflush") } 341 | end 342 | 343 | # update last flush timestamp 344 | @current_pipeline.last_flush_timestamp_map[@task_id] = Time.now 345 | 346 | # return events to flush into Logstash pipeline 347 | return events_to_flush 348 | else 349 | return [] 350 | end 351 | end 352 | 353 | # init flush/timeout properties for current pipeline 354 | def init_pipeline_timeout_management() 355 | 356 | # Define default flush instance that manages timeout (if not defined by user) 357 | if !@current_pipeline.flush_instance_map.has_key?(@task_id) 358 | @current_pipeline.flush_instance_map[@task_id] = self 359 | end 360 | 361 | # Define timeout and inactivity_timeout (if not defined by user) 362 | if @current_pipeline.flush_instance_map[@task_id] == self 363 | if @timeout.nil? 
364 | @timeout = DEFAULT_TIMEOUT 365 | @logger.debug("Aggregate timeout for '#{@task_id}' pattern: #{@timeout} seconds (default value)") 366 | end 367 | if @inactivity_timeout.nil? 368 | @inactivity_timeout = @timeout 369 | end 370 | end 371 | 372 | end 373 | 374 | # Remove the expired Aggregate maps from @current_pipeline.aggregate_maps if they are older than timeout or if no new event has been received since inactivity_timeout. 375 | # If @push_previous_map_as_event option is set, or @push_map_as_event_on_timeout is set, expired maps are returned as new events to be flushed to Logstash pipeline. 376 | def remove_expired_maps() 377 | events_to_flush = [] 378 | default_min_timestamp = Time.now - @timeout 379 | default_min_inactivity_timestamp = Time.now - @inactivity_timeout 380 | 381 | @current_pipeline.mutex.synchronize do 382 | 383 | @logger.debug("Aggregate remove_expired_maps call with '#{@task_id}' pattern and #{@current_pipeline.aggregate_maps[@task_id].length} maps") 384 | 385 | @current_pipeline.aggregate_maps[@task_id].delete_if do |key, element| 386 | min_timestamp = element.timeout ? Time.now - element.timeout : default_min_timestamp 387 | min_inactivity_timestamp = element.inactivity_timeout ? Time.now - element.inactivity_timeout : default_min_inactivity_timestamp 388 | if element.creation_timestamp + element.difference_from_creation_to_now < min_timestamp || element.lastevent_timestamp + element.difference_from_lastevent_to_now < min_inactivity_timestamp 389 | if @push_previous_map_as_event || @push_map_as_event_on_timeout 390 | events_to_flush << create_timeout_event(element.map, key) 391 | end 392 | @logger.debug("Aggregate remove expired map with task_id=#{key}") 393 | metric.increment(:task_timeouts) 394 | next true 395 | end 396 | next false 397 | end 398 | end 399 | 400 | # disable check_expired_maps_on_every_flush if there are no more maps 401 | if @current_pipeline.aggregate_maps[@task_id].length == 0 && @check_expired_maps_on_every_flush 402 | @check_expired_maps_on_every_flush = nil 403 | end 404 | 405 | return events_to_flush 406 | end 407 | 408 | # Remove the expired Aggregate map associated with task_id if it is older than timeout or if no new event has been received since inactivity_timeout (relative to current event timestamp). 409 | # If @push_previous_map_as_event option is set, or @push_map_as_event_on_timeout is set, expired map is returned as new event to be flushed to Logstash pipeline. 410 | def remove_expired_map_based_on_event_timestamp(task_id, event) 411 | 412 | @logger.debug("Aggregate remove_expired_map_based_on_event_timestamp call with task_id : '#{@task_id}'") 413 | 414 | # get aggregate map element 415 | element = @current_pipeline.aggregate_maps[@task_id][task_id] 416 | return nil if element.nil? 417 | 418 | init_pipeline_timeout_management() 419 | 420 | event_to_flush = nil 421 | event_timestamp = reference_timestamp(event) 422 | min_timestamp = element.timeout ? event_timestamp - element.timeout : event_timestamp - @timeout 423 | min_inactivity_timestamp = element.inactivity_timeout ? 
event_timestamp - element.inactivity_timeout : event_timestamp - @inactivity_timeout 424 | 425 | if element.creation_timestamp < min_timestamp || element.lastevent_timestamp < min_inactivity_timestamp 426 | if @push_previous_map_as_event || @push_map_as_event_on_timeout 427 | event_to_flush = create_timeout_event(element.map, task_id) 428 | end 429 | @current_pipeline.aggregate_maps[@task_id].delete(task_id) 430 | @logger.debug("Aggregate remove expired map with task_id=#{task_id}") 431 | metric.increment(:task_timeouts) 432 | end 433 | 434 | return event_to_flush 435 | end 436 | 437 | # return whether this filter instance has any timeout option enabled in Logstash configuration 438 | def has_timeout_options?() 439 | return ( 440 | timeout || 441 | inactivity_timeout || 442 | timeout_code || 443 | push_map_as_event_on_timeout || 444 | push_previous_map_as_event || 445 | timeout_timestamp_field || 446 | timeout_task_id_field || 447 | !timeout_tags.empty? 448 | ) 449 | end 450 | 451 | # display all possible timeout options 452 | def display_timeout_options() 453 | return [ 454 | "timeout", 455 | "inactivity_timeout", 456 | "timeout_code", 457 | "push_map_as_event_on_timeout", 458 | "push_previous_map_as_event", 459 | "timeout_timestamp_field", 460 | "timeout_task_id_field", 461 | "timeout_tags" 462 | ].join(", ") 463 | end 464 | 465 | # return current pipeline id 466 | def pipeline_id() 467 | if @execution_context 468 | return @execution_context.pipeline_id 469 | else 470 | return "main" 471 | end 472 | end 473 | 474 | # compute and return "reference" timestamp to compute timeout : 475 | # by default "system current time" or event timestamp if timeout_timestamp_field option is defined 476 | def reference_timestamp(event) 477 | return (@timeout_timestamp_field) ? 
event.get(@timeout_timestamp_field).time : Time.now 478 | end 479 | 480 | # update "aggregate_maps" metric, with aggregate maps count associated with the configured task_id pattern 481 | def update_aggregate_maps_metric() 482 | aggregate_maps = @current_pipeline.aggregate_maps[@task_id] 483 | if aggregate_maps 484 | metric.gauge(:aggregate_maps, aggregate_maps.length) 485 | end 486 | end 487 | 488 | end # class LogStash::Filters::Aggregate 489 | 490 | # Element of "aggregate_maps" 491 | class LogStash::Filters::Aggregate::Element 492 | 493 | attr_accessor :creation_timestamp, :lastevent_timestamp, :difference_from_creation_to_now, :difference_from_lastevent_to_now, :timeout, :inactivity_timeout, :task_id, :map 494 | 495 | def initialize(creation_timestamp, task_id) 496 | @creation_timestamp = creation_timestamp 497 | @lastevent_timestamp = creation_timestamp 498 | @difference_from_creation_to_now = (Time.now - creation_timestamp).to_i 499 | @difference_from_lastevent_to_now = @difference_from_creation_to_now 500 | @timeout = nil 501 | @inactivity_timeout = nil 502 | @task_id = task_id 503 | @map = {} 504 | end 505 | end 506 | 507 | # shared aggregate attributes for each pipeline 508 | class LogStash::Filters::Aggregate::Pipeline 509 | 510 | attr_accessor :aggregate_maps, :mutex, :flush_instance_map, :last_flush_timestamp_map, :aggregate_maps_path_set, :pipeline_close_instance 511 | 512 | def initialize() 513 | # Stores all aggregate maps, per task_id pattern, then per task_id value 514 | @aggregate_maps = {} 515 | 516 | # Mutex used to synchronize access to 'aggregate_maps' 517 | @mutex = Mutex.new 518 | 519 | # For each "task_id" pattern, defines which Aggregate instance will process flush() call, processing expired Aggregate elements (older than timeout) 520 | # For each entry, key is "task_id pattern" and value is "aggregate instance" 521 | @flush_instance_map = {} 522 | 523 | # last time timeout management in flush() method was launched, per "task_id" pattern 524 | @last_flush_timestamp_map = {} 525 | 526 | # flag indicating if aggregate_maps_path option has already been set on one aggregate instance 527 | @aggregate_maps_path_set = false 528 | 529 | # defines which Aggregate instance will close Aggregate variables associated with the current pipeline 530 | @pipeline_close_instance = nil 531 | end 532 | end 533 | -------------------------------------------------------------------------------- /docs/index.asciidoc: -------------------------------------------------------------------------------- 1 | :plugin: aggregate 2 | :type: filter 3 | 4 | /////////////////////////////////////////// 5 | START - GENERATED VARIABLES, DO NOT EDIT! 6 | /////////////////////////////////////////// 7 | :version: %VERSION% 8 | :release_date: %RELEASE_DATE% 9 | :changelog_url: %CHANGELOG_URL% 10 | :include_path: ../../../../logstash/docs/include 11 | /////////////////////////////////////////// 12 | END - GENERATED VARIABLES, DO NOT EDIT! 13 | /////////////////////////////////////////// 14 | 15 | [id="plugins-{type}s-{plugin}"] 16 | 17 | === Aggregate filter plugin 18 | 19 | include::{include_path}/plugin_header.asciidoc[] 20 | 21 | 22 | [id="plugins-{type}s-{plugin}-description"] 23 | ==== Description 24 | 25 | 26 | The aim of this filter is to aggregate information available among several events (typically log lines) belonging to the same task, 27 | and finally push aggregated information into the final task event. 
28 | 29 | You should be very careful to set Logstash filter workers to 1 (`-w 1` in https://www.elastic.co/guide/en/logstash/current/running-logstash-command-line.html#command-line-flags[command-line flags]) for this filter to work correctly 30 | otherwise events may be processed out of sequence and unexpected results will occur. 31 | 32 | 33 | [id="plugins-{type}s-{plugin}-example1"] 34 | ==== Example #1 35 | 36 | * with these given logs : 37 | 38 | [source,ruby] 39 | ---------------------------------- 40 | INFO - 12345 - TASK_START - start 41 | INFO - 12345 - SQL - sqlQuery1 - 12 42 | INFO - 12345 - SQL - sqlQuery2 - 34 43 | INFO - 12345 - TASK_END - end 44 | ---------------------------------- 45 | 46 | * you can aggregate "sql duration" for the whole task with this configuration : 47 | 48 | [source,ruby] 49 | ---------------------------------- 50 | filter { 51 | grok { 52 | match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:taskid} - %{NOTSPACE:logger} - %{WORD:label}( - %{INT:duration:int})?" ] 53 | } 54 | 55 | if [logger] == "TASK_START" { 56 | aggregate { 57 | task_id => "%{taskid}" 58 | code => "map['sql_duration'] = 0" 59 | map_action => "create" 60 | } 61 | } 62 | 63 | if [logger] == "SQL" { 64 | aggregate { 65 | task_id => "%{taskid}" 66 | code => "map['sql_duration'] += event.get('duration')" 67 | map_action => "update" 68 | } 69 | } 70 | 71 | if [logger] == "TASK_END" { 72 | aggregate { 73 | task_id => "%{taskid}" 74 | code => "event.set('sql_duration', map['sql_duration'])" 75 | map_action => "update" 76 | end_of_task => true 77 | timeout => 120 78 | } 79 | } 80 | } 81 | ---------------------------------- 82 | 83 | * the final event then looks like : 84 | 85 | [source,ruby] 86 | ---------------------------------- 87 | { 88 | "message" => "INFO - 12345 - TASK_END - end", 89 | "sql_duration" => 46 90 | } 91 | ---------------------------------- 92 | 93 | the field `sql_duration` is added and contains the sum of all SQL query durations. 94 | 95 | 96 | [id="plugins-{type}s-{plugin}-example2"] 97 | ==== Example #2 : no start event 98 | 99 | * If you have the same logs as in example #1, but without a start log : 100 | 101 | [source,ruby] 102 | ---------------------------------- 103 | INFO - 12345 - SQL - sqlQuery1 - 12 104 | INFO - 12345 - SQL - sqlQuery2 - 34 105 | INFO - 12345 - TASK_END - end 106 | ---------------------------------- 107 | 108 | * you can also aggregate "sql duration" with a slightly different configuration : 109 | 110 | [source,ruby] 111 | ---------------------------------- 112 | filter { 113 | grok { 114 | match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:taskid} - %{NOTSPACE:logger} - %{WORD:label}( - %{INT:duration:int})?" ] 115 | } 116 | 117 | if [logger] == "SQL" { 118 | aggregate { 119 | task_id => "%{taskid}" 120 | code => "map['sql_duration'] ||= 0 ; map['sql_duration'] += event.get('duration')" 121 | } 122 | } 123 | 124 | if [logger] == "TASK_END" { 125 | aggregate { 126 | task_id => "%{taskid}" 127 | code => "event.set('sql_duration', map['sql_duration'])" 128 | end_of_task => true 129 | timeout => 120 130 | } 131 | } 132 | } 133 | ---------------------------------- 134 | 135 | * the final event is exactly the same as in example #1 136 | * the key point is the "||=" ruby operator. 
It initializes the 'sql_duration' map entry to 0 only if this map entry is not already initialized 137 | 138 | 139 | [id="plugins-{type}s-{plugin}-example3"] 140 | ==== Example #3 : no end event 141 | 142 | Third use case: You have no specific end event. 143 | 144 | A typical case is aggregating or tracking user behaviour. We can track a user by their ID through the events; however, once the user stops interacting, the events stop coming in. There is no specific event indicating the end of the user's interaction. 145 | 146 | In this case, we can enable the option 'push_map_as_event_on_timeout' to push the aggregation map as a new event when a timeout occurs. 147 | In addition, we can enable 'timeout_code' to execute code on the populated timeout event. 148 | We can also add 'timeout_task_id_field' so we can correlate the task_id, which in this case would be the user's ID. 149 | 150 | * Given these logs: 151 | 152 | [source,ruby] 153 | ---------------------------------- 154 | INFO - 12345 - Clicked One 155 | INFO - 12345 - Clicked Two 156 | INFO - 12345 - Clicked Three 157 | ---------------------------------- 158 | 159 | * You can aggregate the number of clicks the user made like this: 160 | 161 | [source,ruby] 162 | ---------------------------------- 163 | filter { 164 | grok { 165 | match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:user_id} - %{GREEDYDATA:msg_text}" ] 166 | } 167 | 168 | aggregate { 169 | task_id => "%{user_id}" 170 | code => "map['clicks'] ||= 0; map['clicks'] += 1;" 171 | push_map_as_event_on_timeout => true 172 | timeout_task_id_field => "user_id" 173 | timeout => 600 # 10 minutes timeout 174 | timeout_tags => ['_aggregatetimeout'] 175 | timeout_code => "event.set('several_clicks', event.get('clicks') > 1)" 176 | } 177 | } 178 | ---------------------------------- 179 | 180 | * After ten minutes, this will yield an event like: 181 | 182 | [source,json] 183 | ---------------------------------- 184 | { 185 | "user_id": "12345", 186 | "clicks": 3, 187 | "several_clicks": true, 188 | "tags": [ 189 | "_aggregatetimeout" 190 | ] 191 | } 192 | ---------------------------------- 193 | 194 | 195 | [id="plugins-{type}s-{plugin}-example4"] 196 | ==== Example #4 : no end event and tasks come one after the other 197 | 198 | Fourth use case : like example #3, you have no specific end event, but also, tasks come one after the other. 199 | 200 | That is to say : tasks are not interleaved. All task1 events come, then all task2 events come, ... 201 | 202 | In that case, you don't want to wait for the task timeout to flush the aggregation map. 203 | 204 | * A typical case is aggregating results from jdbc input plugin. 
205 | * Given that you have this SQL query : `SELECT country_name, town_name FROM town` 206 | * Using jdbc input plugin, you get these 3 events : 207 | 208 | [source,json] 209 | ---------------------------------- 210 | { "country_name": "France", "town_name": "Paris" } 211 | { "country_name": "France", "town_name": "Marseille" } 212 | { "country_name": "USA", "town_name": "New-York" } 213 | ---------------------------------- 214 | 215 | * And you would like to push these 2 result events into elasticsearch : 216 | 217 | [source,json] 218 | ---------------------------------- 219 | { "country_name": "France", "towns": [ {"town_name": "Paris"}, {"town_name": "Marseille"} ] } 220 | { "country_name": "USA", "towns": [ {"town_name": "New-York"} ] } 221 | ---------------------------------- 222 | 223 | * You can do that using `push_previous_map_as_event` aggregate plugin option : 224 | 225 | [source,ruby] 226 | ---------------------------------- 227 | filter { 228 | aggregate { 229 | task_id => "%{country_name}" 230 | code => " 231 | map['country_name'] ||= event.get('country_name') 232 | map['towns'] ||= [] 233 | map['towns'] << {'town_name' => event.get('town_name')} 234 | event.cancel() 235 | " 236 | push_previous_map_as_event => true 237 | timeout => 3 238 | } 239 | } 240 | ---------------------------------- 241 | 242 | * The key point is that each time the aggregate plugin detects a new `country_name`, it pushes the previous aggregate map as a new Logstash event, and then creates a new empty map for the next country 243 | * When the 3s timeout expires, the last aggregate map is pushed as a new event 244 | * Initial events (which are not aggregated) are dropped because they are useless (thanks to `event.cancel()`) 245 | * Last point: if a field is not filled in for every event (say the "town_postcode" field), the `||=` operator lets you push the first non-null value into the aggregate map. Example: `map['town_postcode'] ||= event.get('town_postcode')` 246 | 247 | 248 | [id="plugins-{type}s-{plugin}-example5"] 249 | ==== Example #5 : no end event and push events as soon as possible 250 | 251 | Fifth use case: like example #3, there is no end event. 252 | 253 | Events keep coming for an indefinite time and you want to push the aggregation map as soon as possible after the last user interaction without waiting for the `timeout`. 254 | 255 | This allows the aggregated events to be pushed closer to real time. 256 | 257 | 258 | A typical case is aggregating or tracking user behaviour. 259 | 260 | We can track a user by their ID through the events; however, once the user stops interacting, the events stop coming in. 261 | 262 | There is no specific event indicating the end of the user's interaction. 263 | 264 | The user interaction is considered ended when no events for the specified user (task_id) arrive within the specified `inactivity_timeout`. 265 | 266 | If the user continues interacting for longer than `timeout` seconds (since the first event), the aggregation map will still be deleted and pushed as a new event when the timeout occurs. 267 | 268 | The difference with example #3 is that the events will be pushed as soon as the user stops interacting for `inactivity_timeout` seconds instead of waiting for the end of `timeout` seconds since the first event. 269 | 270 | In this case, we can enable the option 'push_map_as_event_on_timeout' to push the aggregation map as a new event when the inactivity timeout occurs. 
271 | 272 | In addition, we can enable 'timeout_code' to execute code on the populated timeout event. 273 | 274 | We can also add 'timeout_task_id_field' so we can correlate the task_id, which in this case would be the user's ID. 275 | 276 | 277 | * Given these logs: 278 | 279 | [source,ruby] 280 | ---------------------------------- 281 | INFO - 12345 - Clicked One 282 | INFO - 12345 - Clicked Two 283 | INFO - 12345 - Clicked Three 284 | ---------------------------------- 285 | 286 | * You can aggregate the number of clicks the user made like this: 287 | 288 | [source,ruby] 289 | ---------------------------------- 290 | filter { 291 | grok { 292 | match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:user_id} - %{GREEDYDATA:msg_text}" ] 293 | } 294 | aggregate { 295 | task_id => "%{user_id}" 296 | code => "map['clicks'] ||= 0; map['clicks'] += 1;" 297 | push_map_as_event_on_timeout => true 298 | timeout_task_id_field => "user_id" 299 | timeout => 3600 # 1 hour timeout, user activity will be considered finished one hour after the first event, even if events keep coming 300 | inactivity_timeout => 300 # 5 minutes timeout, user activity will be considered finished if no new events arrive 5 minutes after the last event 301 | timeout_tags => ['_aggregatetimeout'] 302 | timeout_code => "event.set('several_clicks', event.get('clicks') > 1)" 303 | } 304 | } 305 | ---------------------------------- 306 | 307 | * After five minutes of inactivity or one hour since the first event, this will yield an event like: 308 | 309 | [source,json] 310 | ---------------------------------- 311 | { 312 | "user_id": "12345", 313 | "clicks": 3, 314 | "several_clicks": true, 315 | "tags": [ 316 | "_aggregatetimeout" 317 | ] 318 | } 319 | ---------------------------------- 320 | 321 | 322 | [id="plugins-{type}s-{plugin}-howitworks"] 323 | ==== How it works 324 | * the filter needs a "task_id" to correlate events (log lines) of the same task 325 | * at the task beginning, the filter creates a map attached to the task_id 326 | * for each event, you can execute code using 'event' and 'map' (for instance, copy an event field to the map) 327 | * in the final event, you can execute a final piece of code (for instance, add map data to the final event) 328 | * after the final event, the map attached to the task is deleted (thanks to `end_of_task => true`) 329 | * an aggregate map is tied to one task_id value which is tied to one task_id pattern. So if you have 2 filters with different task_id patterns, even if you have the same task_id value, they won't share the same aggregate map. 330 | * in one filter configuration, it is recommended to define a timeout option to protect the feature against unterminated tasks. It tells the filter to delete expired maps 331 | * if no timeout is defined, by default, all maps older than 1800 seconds are automatically deleted 332 | * all timeout options have to be defined in only one aggregate filter per task_id pattern (per pipeline). 
Timeout options are : timeout, inactivity_timeout, timeout_code, push_map_as_event_on_timeout, push_previous_map_as_event, timeout_timestamp_field, timeout_task_id_field, timeout_tags 333 | * if `code` execution raises an exception, the error is logged and the event is tagged '_aggregateexception' 334 | 335 | 336 | [id="plugins-{type}s-{plugin}-usecases"] 337 | ==== Use Cases 338 | * extract some cool metrics from task logs and push them into the final task log event (like in examples #1 and #2) 339 | * extract error information in any task log line, and push it into the final task event (to get a final event with all error information if any) 340 | * extract all back-end calls as a list, and push this list into the final task event (to get a task profile) 341 | * extract all http headers logged in several lines to push this list into the final task event (complete http request info) 342 | * for every back-end call, collect call details available on several lines, analyse them and finally tag the final back-end call log line (error, timeout, business-warning, ...) 343 | * Finally, task id can be any correlation id matching your need : it can be a session id, a file path, ... 344 | 345 | 346 | [id="plugins-{type}s-{plugin}-options"] 347 | ==== Aggregate Filter Configuration Options 348 | 349 | This plugin supports the following configuration options plus the <<plugins-{type}s-{plugin}-common-options>> described later. 350 | 351 | [cols="<,<,<",options="header",] 352 | |======================================================================= 353 | |Setting |Input type|Required 354 | | <<plugins-{type}s-{plugin}-aggregate_maps_path>> |<<string,string>>, a valid filesystem path|No 355 | | <<plugins-{type}s-{plugin}-code>> |<<string,string>>|Yes 356 | | <<plugins-{type}s-{plugin}-end_of_task>> |<<boolean,boolean>>|No 357 | | <<plugins-{type}s-{plugin}-inactivity_timeout>> |<<number,number>>|No 358 | | <<plugins-{type}s-{plugin}-map_action>> |<<string,string>>, one of `["create", "update", "create_or_update"]`|No 359 | | <<plugins-{type}s-{plugin}-push_map_as_event_on_timeout>> |<<boolean,boolean>>|No 360 | | <<plugins-{type}s-{plugin}-push_previous_map_as_event>> |<<boolean,boolean>>|No 361 | | <<plugins-{type}s-{plugin}-task_id>> |<<string,string>>|Yes 362 | | <<plugins-{type}s-{plugin}-timeout>> |<<number,number>>|No 363 | | <<plugins-{type}s-{plugin}-timeout_code>> |<<string,string>>|No 364 | | <<plugins-{type}s-{plugin}-timeout_tags>> |<<array,array>>|No 365 | | <<plugins-{type}s-{plugin}-timeout_task_id_field>> |<<string,string>>|No 366 | | <<plugins-{type}s-{plugin}-timeout_timestamp_field>> |<<string,string>>|No 367 | |======================================================================= 368 | 369 | Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all 370 | filter plugins. 371 | 372 |   373 | 374 | [id="plugins-{type}s-{plugin}-aggregate_maps_path"] 375 | ===== `aggregate_maps_path` 376 | 377 | * Value type is <<string,string>> 378 | * There is no default value for this setting. 379 | 380 | The path to the file where aggregate maps are stored when Logstash stops 381 | and are loaded from when Logstash starts. 382 | 383 | If not defined, aggregate maps will not be stored when Logstash stops and will be lost. 384 | Must be defined in only one aggregate filter per pipeline (as aggregate maps are shared at pipeline level). 385 | 386 | Example: 387 | [source,ruby] 388 | filter { 389 | aggregate { 390 | aggregate_maps_path => "/path/to/.aggregate_maps" 391 | } 392 | } 393 | 394 | [id="plugins-{type}s-{plugin}-code"] 395 | ===== `code` 396 | 397 | * This is a required setting. 398 | * Value type is <<string,string>> 399 | * There is no default value for this setting. 400 | 401 | The code to execute to update the aggregated map, using the current event. 402 | 403 | Or, conversely, the code to execute to update the event, using the aggregated map. 404 | 405 | Available variables are: 406 | 407 | `event`: current Logstash event 408 | 409 | `map`: aggregated map associated with `task_id`, containing key/value pairs. Data structure is a ruby http://ruby-doc.org/core-1.9.1/Hash.html[Hash] 410 | 411 | `map_meta`: meta information associated with the aggregate map. It allows you to set a custom `timeout` or `inactivity_timeout`. 412 | It also allows you to get `creation_timestamp`, `lastevent_timestamp` and `task_id`. 
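For instance, a minimal sketch, mirroring the plugin's test suite, that uses `map_meta` to expire a single task from the `code` block: setting `map_meta.timeout = 0` pushes the map immediately when `push_map_as_event_on_timeout` is enabled, and `map_meta.inactivity_timeout` can be set the same way (the `%{taskid}` pattern is just an illustrative task id):

[source,ruby]
filter {
  aggregate {
    task_id => "%{taskid}"
    code => "map['sql_duration'] = 2; map_meta.timeout = 0"
    push_map_as_event_on_timeout => true
    timeout => 120
  }
}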
413 | 414 | `new_event_block`: block used to emit new Logstash events. See the second example below for how to use it. 415 | 416 | When the option `push_map_as_event_on_timeout` is set to true, if you set `map_meta.timeout=0` in the `code` block, then the aggregated map is immediately pushed as a new event. 417 | 418 | 419 | Example: 420 | [source,ruby] 421 | filter { 422 | aggregate { 423 | code => "map['sql_duration'] += event.get('duration')" 424 | } 425 | } 426 | 427 | 428 | To create additional events during code execution, to be emitted immediately, you can use the `new_event_block.call(event)` function, as in the following example: 429 | 430 | [source,ruby] 431 | filter { 432 | aggregate { 433 | code => " 434 | data = {:my_sql_duration => map['sql_duration']} 435 | generated_event = LogStash::Event.new(data) 436 | generated_event.set('my_other_field', 34) 437 | new_event_block.call(generated_event) 438 | " 439 | } 440 | } 441 | 442 | The parameter of the function `new_event_block.call` must be of type `LogStash::Event`. 443 | To create such an object, the constructor of the same class can be used: `LogStash::Event.new()`. 444 | `LogStash::Event.new()` can receive a parameter of type ruby http://ruby-doc.org/core-1.9.1/Hash.html[Hash] to initialize the new event fields. 445 | 446 | 447 | [id="plugins-{type}s-{plugin}-end_of_task"] 448 | ===== `end_of_task` 449 | 450 | * Value type is <<boolean,boolean>> 451 | * Default value is `false` 452 | 453 | Tell the filter that the task is ended and, therefore, to delete the aggregate map after code execution. 454 | 455 | [id="plugins-{type}s-{plugin}-inactivity_timeout"] 456 | ===== `inactivity_timeout` 457 | 458 | * Value type is <<number,number>> 459 | * There is no default value for this setting. 460 | 461 | The amount of seconds (since the last event) after which a task is considered expired. 462 | 463 | When the timeout occurs for a task, its aggregate map is evicted. 464 | 465 | If 'push_map_as_event_on_timeout' or 'push_previous_map_as_event' is set to true, the task aggregation map is pushed as a new Logstash event. 466 | 467 | `inactivity_timeout` can be defined for each "task_id" pattern. 468 | 469 | `inactivity_timeout` must be lower than `timeout`. 470 | 471 | [id="plugins-{type}s-{plugin}-map_action"] 472 | ===== `map_action` 473 | 474 | * Value type is <<string,string>> 475 | * Default value is `"create_or_update"` 476 | 477 | Tell the filter what to do with the aggregate map. 478 | 479 | `"create"`: create the map, and execute the code only if the map wasn't created before 480 | 481 | `"update"`: don't create the map, and execute the code only if the map was created before 482 | 483 | `"create_or_update"`: create the map if it wasn't created before, and execute the code in all cases 484 | 485 | [id="plugins-{type}s-{plugin}-push_map_as_event_on_timeout"] 486 | ===== `push_map_as_event_on_timeout` 487 | 488 | * Value type is <<boolean,boolean>> 489 | * Default value is `false` 490 | 491 | When this option is enabled, each time a task timeout is detected, it pushes the task aggregation map as a new Logstash event. 492 | This makes it possible to detect and process task timeouts in Logstash, and also to manage tasks that have no explicit end event. 493 | 494 | [id="plugins-{type}s-{plugin}-push_previous_map_as_event"] 495 | ===== `push_previous_map_as_event` 496 | 497 | * Value type is <<boolean,boolean>> 498 | * Default value is `false` 499 | 500 | When this option is enabled, each time the aggregate plugin detects a new task id, it pushes the previous aggregate map as a new Logstash event, 501 | and then creates a new empty map for the next task. 
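A minimal configuration sketch, reusing the jdbc use case from example #4 above (`event.cancel()` drops the original rows once they are aggregated):

[source,ruby]
filter {
  aggregate {
    task_id => "%{country_name}"
    code => "
      map['country_name'] ||= event.get('country_name')
      map['towns'] ||= []
      map['towns'] << {'town_name' => event.get('town_name')}
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}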
502 | 503 | WARNING: this option works fine only if tasks come one after the other. It means : all task1 events, then all task2 events, etc... 504 | 505 | [id="plugins-{type}s-{plugin}-task_id"] 506 | ===== `task_id` 507 | 508 | * This is a required setting. 509 | * Value type is <<string,string>> 510 | * There is no default value for this setting. 511 | 512 | The expression defining the task ID used to correlate logs. 513 | 514 | This value must uniquely identify the task. 515 | 516 | Example: 517 | [source,ruby] 518 | filter { 519 | aggregate { 520 | task_id => "%{type}%{my_task_id}" 521 | } 522 | } 523 | 524 | [id="plugins-{type}s-{plugin}-timeout"] 525 | ===== `timeout` 526 | 527 | * Value type is <<number,number>> 528 | * Default value is `1800` 529 | 530 | The amount of seconds (since the first event) after which a task is considered expired. 531 | 532 | When the timeout occurs for a task, its aggregate map is evicted. 533 | 534 | If 'push_map_as_event_on_timeout' or 'push_previous_map_as_event' is set to true, the task aggregation map is pushed as a new Logstash event. 535 | 536 | Timeout can be defined for each "task_id" pattern. 537 | 538 | [id="plugins-{type}s-{plugin}-timeout_code"] 539 | ===== `timeout_code` 540 | 541 | * Value type is <<string,string>> 542 | * There is no default value for this setting. 543 | 544 | The code to execute to complete the timeout-generated event, when `'push_map_as_event_on_timeout'` or `'push_previous_map_as_event'` is set to true. 545 | The code block will have access to the newly generated timeout event that is pre-populated with the aggregation map. 546 | 547 | If `'timeout_task_id_field'` is set, the event is also populated with the task_id value. 548 | 549 | Example: 550 | [source,ruby] 551 | filter { 552 | aggregate { 553 | timeout_code => "event.set('state', 'timeout')" 554 | } 555 | } 556 | 557 | [id="plugins-{type}s-{plugin}-timeout_tags"] 558 | ===== `timeout_tags` 559 | 560 | * Value type is <<array,array>> 561 | * Default value is `[]` 562 | 563 | Defines tags to add when a timeout event is generated and yielded. 564 | 565 | Example: 566 | [source,ruby] 567 | filter { 568 | aggregate { 569 | timeout_tags => ["aggregate_timeout"] 570 | } 571 | } 572 | 573 | [id="plugins-{type}s-{plugin}-timeout_task_id_field"] 574 | ===== `timeout_task_id_field` 575 | 576 | * Value type is <<string,string>> 577 | * There is no default value for this setting. 578 | 579 | This option indicates the timeout-generated event's field where the current "task_id" value will be set. 580 | This can help correlate which tasks have timed out. 581 | 582 | By default, if this option is not set, the task id value won't be set in the timeout-generated event. 583 | 584 | Example: 585 | [source,ruby] 586 | filter { 587 | aggregate { 588 | timeout_task_id_field => "task_id" 589 | } 590 | } 591 | 592 | [id="plugins-{type}s-{plugin}-timeout_timestamp_field"] 593 | ===== `timeout_timestamp_field` 594 | 595 | * Value type is <<string,string>> 596 | * There is no default value for this setting. 597 | 598 | By default, timeout is computed using the system time where Logstash is running. 599 | 600 | When this option is set, the timeout is computed using the event timestamp field indicated in this option. 601 | It means that when the first event arrives on the aggregate filter and induces map creation, the map creation time will be equal to this event's timestamp. 602 | Then, each time a new event arrives on the aggregate filter, its timestamp is compared to the map creation time to check whether the timeout has happened. 
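For instance, with `timeout => 120` and `timeout_timestamp_field => "@timestamp"` (the scenario exercised in the plugin's test suite), a map created by an event stamped `2018-01-31T00:00:00Z` is considered expired as soon as an event stamped `2018-01-31T00:05:00Z` arrives for the same task, whatever the current system time is.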
603 | 604 | This option is particularly useful when processing old logs with the option `push_map_as_event_on_timeout => true`. 605 | It lets you generate timeout-based aggregated events from old logs, where system time would be inappropriate. 606 | 607 | Warning : for this option to work properly, it must be set on the first aggregate filter. 608 | 609 | Example: 610 | [source,ruby] 611 | filter { 612 | aggregate { 613 | timeout_timestamp_field => "@timestamp" 614 | } 615 | } 616 | 617 | 618 | [id="plugins-{type}s-{plugin}-common-options"] 619 | include::{include_path}/{type}.asciidoc[] 620 | --------------------------------------------------------------------------------