├── Gemfile
├── LICENSE
├── README.md
├── Rakefile
├── fluent-plugin-sanitizer.gemspec
├── lib
│   └── fluent
│       └── plugin
│           └── filter_sanitizer.rb
└── test
    ├── helper.rb
    ├── lib
    │   └── fluent
    │       └── plugin
    │           └── filter_sanitizer.rb
    └── plugin
        └── test_filter_sanitizer.rb
/Gemfile:
--------------------------------------------------------------------------------
1 | source "https://rubygems.org"
2 |
3 | gemspec
4 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # fluent-plugin-sanitizer
2 | Sanitizer is a [Fluentd](https://fluentd.org/) filter plugin that masks sensitive information. With Sanitizer, you can mask values selected by key on the fly as records pass through Fluentd. Custom rules let you specify what to mask: IP addresses, hostnames in FQDN style, regular expressions, and keywords. For IP addresses and hostnames, Sanitizer provides built-in options that make it easy to mask them even when they are embedded in complex messages.
3 |
4 | ## Installation
5 | When you are using OSS Fluentd :
6 | ```
7 | fluent-gem install fluent-plugin-sanitizer
8 | ```
9 | When you are using td-agent :
10 | ```
11 | td-agent-gem install fluent-plugin-sanitizer
12 | ```
13 |
14 | ## Configuration
15 | ### Parameters
16 | - hash_salt (optional) : Salt prepended to the original value before the hash value is calculated. (default: empty string)
17 | - hash_scheme (optional) : Hash scheme to use for generating hash value. Supported schemes are `md5`,`sha1`,`sha256`,`sha384` and `sha512`. (default: `md5`)
18 | - rule options :
19 |   - keys (mandatory) : Names of the keys whose values will be masked. You can specify multiple keys. For nested keys, use {parent key}.{child key}, for example "kubernetes.master_url".
20 |   - pattern_ipv4 (optional) : Mask IP addresses in IPv4 format. You can use “true” or “false”. (default: false)
21 |   - pattern_fqdn (optional) : Mask hostnames in FQDN style. You can use “true” or “false”. (default: false)
22 |   - pattern_regex (optional) : Mask values that match a custom regular expression.
23 |   - regex_capture_group (optional) : If the regular expression defines a named capture group, you can specify the name of the capture group to be masked.
24 |   - pattern_regex_prefix (optional) : Prefix used for masked values. (default: Regex)
25 |   - pattern_keywords (optional) : Mask values that match custom keywords. You can specify multiple keywords.
26 |   - pattern_keywords_prefix (optional) : Prefix used for masked values. (default: Keywords)
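
The dotted `{parent key}.{child key}` notation can be illustrated in plain Ruby. This is only a sketch: the plugin itself resolves keys through Fluentd's `record_accessor` helper, and `fetch_nested` and the sample record below are hypothetical names made up for this example.

```ruby
# Hypothetical helper (not part of the plugin): resolve a dotted key such as
# "kubernetes.master_url" against a nested record using Hash#dig.
def fetch_nested(record, dotted_key)
  record.dig(*dotted_key.split("."))
end

# Sample record invented for illustration.
record = {
  "source" => "10.0.0.2",
  "kubernetes" => { "master_url" => "https://10.0.0.1:443" }
}

puts fetch_nested(record, "source")
puts fetch_nested(record, "kubernetes.master_url")
```

A key whose path does not exist in the record resolves to `nil` in this sketch, which is roughly why a rule can list keys that appear only in some records.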
27 |
28 | You can specify multiple rules in a single configuration. You can also combine multiple pattern options in a single rule, as in the following sample.
29 |
30 | ```
31 | <filter **>
32 |   @type sanitizer
33 |   hash_salt mysalt
34 |   <rule>
35 |     keys source, kubernetes.master_url
36 |     pattern_ipv4 true
37 |   </rule>
38 |   <rule>
39 |     keys hostname, host
40 |     pattern_fqdn true
41 |   </rule>
42 |   <rule>
43 |     keys message, system.log
44 |     pattern_regex /^Hello World!$/
45 |     pattern_keywords password, passwd
46 |   </rule>
47 | </filter>
48 | ```
49 |
50 | ## Use cases
51 | ### Mask IP addresses and Hostnames
52 | Masking IP addresses and hostnames is one of the typical use cases of security operations. You just need to specify the name of keys that potentially have IP addresses and hostnames in value. Here is a configuration sample as well as input and output samples.
53 |
54 | **Configuration sample**
55 | ```
56 | <filter **>
57 |   @type sanitizer
58 |   hash_salt mysalt
59 |   hash_scheme md5
60 |   <rule>
61 |     keys ip
62 |     pattern_ipv4 true
63 |   </rule>
64 |   <rule>
65 |     keys host
66 |     pattern_fqdn true
67 |   </rule>
68 |   <rule>
69 |     keys system.url, system.log
70 |     pattern_ipv4 true
71 |     pattern_fqdn true
72 |   </rule>
73 | </filter>
74 | ```
75 | **Input sample**
76 | ```
77 | {
78 | "ip" : "192.168.10.10",
79 | "host" : "test01.demo.com",
80 | "system" : {
81 | "url" : "https://test02.demo.com:8000/event",
82 | "log" : "access from 192.168.10.100 was blocked"
83 | }
84 | }
85 | ```
86 | **Output sample**
87 | ```
88 | {
89 | "ip" : "IPv4_94712b06963e277fe28469388323665d",
90 | "host" : "FQDN_37de34e3d799de477c742d8d7bb35550",
91 | "system" : {
92 | "url" : "https://FQDN_e9a59624f555d02f06209c9942dded19:8000/event",
93 | "log" : "access from IPv4_f7374d61e6d21dc1105f70358a5f8e8f was blocked"
94 | }
95 | }
96 | ```
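
How a masked token such as `IPv4_...` is put together can be sketched in plain Ruby. Judging from `lib/fluent/plugin/filter_sanitizer.rb`, the token is a prefix, an underscore, and the hex digest of the salt concatenated with the original value. The helper name `mask_token` and its keyword arguments are assumptions made for this example, not part of the plugin's interface.

```ruby
require "digest"

# Hypothetical helper for illustration: builds "<prefix>_<hexdigest(salt + value)>",
# mirroring the salt-then-value hashing used in the plugin source.
def mask_token(prefix, value, salt: "", scheme: :md5)
  digest_class =
    case scheme
    when :sha1   then Digest::SHA1
    when :sha256 then Digest::SHA256
    when :sha384 then Digest::SHA384
    when :sha512 then Digest::SHA512
    else              Digest::MD5
    end
  "#{prefix}_#{digest_class.hexdigest(salt + value)}"
end

token = mask_token("IPv4", "192.168.10.10", salt: "mysalt")
puts token
```

Because the digest is deterministic for a given salt and scheme, the same original value always maps to the same token, so masked records can still be correlated with each other.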
97 | ### Mask words that match custom keywords and regular expressions
98 | Sanitizer can also help when log messages include sensitive information such as SSNs and phone numbers. If you know the exact keyword that needs to be masked, you can use the keyword option. If you want to mask information that matches a custom regular expression, you can use the regex option.
99 |
100 | **Configuration sample**
101 | ```
102 | <filter **>
103 |   @type sanitizer
104 |   hash_salt mysalt
105 |   <rule>
106 |     keys user.ssn
107 |     pattern_regex /^(?!(000|666|9))\d{3}-(?!00)\d{2}-(?!0000)\d{4}$/
108 |     pattern_regex_prefix SSN
109 |   </rule>
110 |   <rule>
111 |     keys user.phone
112 |     pattern_regex /^\d{3}-?\d{3}-?\d{4}$/
113 |     pattern_regex_prefix Phone
114 |   </rule>
115 | </filter>
116 | ```
117 | **Input sample**
118 | ```
119 | {
120 | "user" : {
121 | "ssn" : "123-45-6789",
122 | "phone" : "123-456-7890"
123 | }
124 | }
125 | ```
126 | **Output sample**
127 | ```
128 | {
129 | "user" : {
130 | "ssn" : "SSN_f6b6430343a9a749e12db8a112ca74e9",
131 | "phone" : "Phone_0a25187902a0cf755627397eb085d736"
132 | }
133 | }
134 | ```
135 | From v0.1.2, the "regex_capture_group" option is available. With this option, you can mask a specific part of the original message instead of the whole value.
136 |
137 | **Configuration sample**
138 | ```
139 | <rule>
140 |   keys user.email
141 |   pattern_regex /(?<user>\w+)\@\w+.\w+/
142 |   regex_capture_group "user"
143 |   pattern_regex_prefix "USER"
144 | </rule>
145 | ```
146 | **Input sample**
147 | ```
148 | {
149 | "user" : {
150 | "email" : "user1@demo.com"
151 | }
152 | }
153 | ```
154 | **Output sample**
155 | ```
156 | {
157 | "user" : {
158 | "email" : "USER_321865df6f0ce6bdf3ea16f74623534a@demo.com"
159 | }
160 | }
161 | ```
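
The named capture group mechanics behind this option can be tried in plain Ruby. The sketch below assumes the pattern names its group `user`, matching `regex_capture_group "user"`. It only shows which part of the value the group selects and does not reproduce the plugin's masking step. The dot is escaped here so that it matches a literal period.

```ruby
# Pattern with a named capture group called "user" (assumed name).
pattern = /(?<user>\w+)@\w+\.\w+/
match = pattern.match("user1@demo.com")

# Regexp#names lists the named groups defined in the pattern.
puts pattern.names.inspect
# MatchData#[] with the group name returns the captured part of the value.
puts match[:user]
```

Here the group captures only the local part of the address, which is why the domain `demo.com` stays readable in the output sample above.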
162 |
163 | ### Tips : Debug how sanitizer works
164 | When you design custom rules in a configuration file, you might want to see how Sanitizer maps original values to hash values. You can check this by running td-agent/Fluentd with the debug option enabled. The debug information appears in the td-agent/Fluentd log file, as in the following log message sample.
165 |
166 | **Log message sample**
167 | ```
168 | YYYY-MM-DD Time fluent.debug: {"message":"[pattern_regex] sanitize '123-45-6789' to 'SSN_f6b6430343a9a749e12db8a112ca74e9'"}
169 | YYYY-MM-DD Time fluent.debug: {"message":"[pattern_regex] sanitize '123-456-7890' to 'Phone_0a25187902a0cf755627397eb085d736'"}
170 | ```
171 | ## Contribute
172 | Contributions to fluent-plugin-sanitizer are always welcome.
173 |
174 |
175 | ## Copyright
176 | * Copyright(c) 2021- TK Kubota
177 | * License
178 | * Apache License, Version 2.0
179 |
--------------------------------------------------------------------------------
/Rakefile:
--------------------------------------------------------------------------------
1 | require "bundler"
2 | Bundler::GemHelper.install_tasks
3 |
4 | require "rake/testtask"
5 |
6 | Rake::TestTask.new(:test) do |t|
7 | t.libs.push("lib", "test")
8 | t.test_files = FileList["test/**/test_*.rb"]
9 | t.verbose = true
10 | t.warning = true
11 | end
12 |
13 | task default: [:test]
14 |
--------------------------------------------------------------------------------
/fluent-plugin-sanitizer.gemspec:
--------------------------------------------------------------------------------
1 | lib = File.expand_path("../lib", __FILE__)
2 | $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
3 |
4 | Gem::Specification.new do |spec|
5 | spec.name = "fluent-plugin-sanitizer"
6 | spec.version = "0.1.3"
7 | spec.authors = ["TK Kubota"]
8 | spec.email = ["tkubota@ctc-america.com"]
9 |
10 | spec.summary = %q{Fluentd filter plugin that sanitizes sensitive information.}
11 | spec.description = %q{fluent-plugin-sanitizer is a Fluentd filter plugin that sanitizes sensitive information with custom rules. It provides options to sanitize values with custom regular expressions and keywords, as well as built-in options that let users easily sanitize IP addresses and hostnames in complex messages.}
12 | spec.homepage = "https://github.com/fluent/fluent-plugin-sanitizer"
13 | spec.license = "Apache-2.0"
14 |
15 | test_files, files = `git ls-files -z`.split("\x0").partition do |f|
16 | f.match(%r{^(test|spec|features)/})
17 | end
18 | spec.files = files
19 | spec.executables = files.grep(%r{^bin/}) { |f| File.basename(f) }
20 | spec.test_files = test_files
21 | spec.require_paths = ["lib"]
22 |
23 | spec.add_development_dependency "bundler", "~> 1.14"
24 | spec.add_development_dependency "rake", "~> 12.0"
25 | spec.add_development_dependency "test-unit", "~> 3.0"
26 | spec.add_runtime_dependency "fluentd", [">= 0.14.10", "< 2"]
27 | end
28 |
--------------------------------------------------------------------------------
/lib/fluent/plugin/filter_sanitizer.rb:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright 2021- TK Kubota
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | # http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 |
16 | require "fluent/plugin/filter"
17 | require "digest"
18 |
19 | module Fluent
20 | module Plugin
21 | class SanitizerFilter < Fluent::Plugin::Filter
22 | Fluent::Plugin.register_filter("sanitizer", self)
23 |
24 | helpers :event_emitter, :record_accessor
25 |
26 | desc "Hash salt to be used to generate hash values with specified hash(optional)"
27 | config_param :hash_salt, :string, default: ""
28 |
29 | desc "Hash scheme to use for generating hash value (supported schemes are md5,sha1,sha256,sha384,sha512) (optional)"
30 | config_param :hash_scheme, :enum, list: [:md5, :sha1, :sha256, :sha384, :sha512], default: :md5
31 |
32 | config_section :rule, param_name: :rules, multi: true do
33 | desc "Name of keys whose values are to be sanitized"
34 | config_param :keys, :array, default: []
35 | desc "Sanitize if values contain IPv4 (optional)"
36 | config_param :pattern_ipv4, :bool, default: false
37 | desc "Sanitize if values contain Hostname in FQDN style (optional)"
38 | config_param :pattern_fqdn, :bool, default: false
39 | desc "Sanitize if values match custom regular expression (optional)"
40 | config_param :pattern_regex, :regexp, default: /^$/
41 | desc "Target capture group name to be masked (optional)"
42 | config_param :regex_capture_group, :string, default:""
43 | desc "Prefix for pattern_regex (optional)"
44 | config_param :pattern_regex_prefix, :string, default: "Regex"
45 | desc "Sanitize if values match custom keywords (optional)"
46 | config_param :pattern_keywords, :array, default: []
47 | desc "Prefix for pattern_keywords (optional)"
48 | config_param :pattern_keywords_prefix, :string, default: "Keywords"
49 | end
50 |
51 | def configure(conf)
52 | super
53 | @salt = conf['hash_salt']
54 | @salt = "" if @salt.nil?
55 | @hash_scheme = conf['hash_scheme']
56 | @sanitize_func =
57 | case @hash_scheme
58 | when "sha1"
59 | Proc.new { |str| Digest::SHA1.hexdigest(@salt + str) }
60 | when "sha256"
61 | Proc.new { |str| Digest::SHA256.hexdigest(@salt +str) }
62 | when "sha384"
63 | Proc.new { |str| Digest::SHA384.hexdigest(@salt +str) }
64 | when "sha512"
65 | Proc.new { |str| Digest::SHA512.hexdigest(@salt +str) }
66 | else
67 | Proc.new { |str| Digest::MD5.hexdigest(@salt +str) }
68 | end
69 |
70 | @sanitizerules = []
71 | @rules.each do |rule|
72 | if rule.keys.empty?
73 | raise Fluent::ConfigError, "You need to specify at least one key in rule statement."
74 | else
75 | keys = rule.keys
76 | end
77 |
78 | if rule.pattern_ipv4 || !rule.pattern_ipv4
79 | pattern_ipv4 = rule.pattern_ipv4
80 | else
81 | raise Fluent::ConfigError, "true or false is available for pattern_ipv4 option."
82 | end
83 |
84 | if rule.pattern_fqdn || !rule.pattern_fqdn
85 | pattern_fqdn = rule.pattern_fqdn
86 | else
87 | raise Fluent::ConfigError, "true or false is available for pattern_fqdn option."
88 | end
89 |
90 | if rule.pattern_regex.class == Regexp
91 | pattern_regex = rule.pattern_regex
92 | regex_capture_group = rule.regex_capture_group
93 | else
94 | raise Fluent::ConfigError, "You need to specify Regexp for pattern_regex option."
95 | end
96 |
97 | pattern_keywords = rule.pattern_keywords
98 |
99 | regex_prefix = rule.pattern_regex_prefix
100 | keywords_prefix = rule.pattern_keywords_prefix
101 |
102 | @sanitizerules.push([keys, pattern_ipv4, pattern_fqdn, pattern_regex, regex_capture_group, pattern_keywords, regex_prefix, keywords_prefix])
103 | end
104 | end
105 |
106 | def filter(tag, time, record)
107 | @sanitizerules.each do |keys, pattern_ipv4, pattern_fqdn, pattern_regex, regex_capture_group, pattern_keywords, regex_prefix, keywords_prefix|
108 | keys.each do |key|
109 | accessor = record_accessor_create("$."+key.to_s)
110 | begin
111 | if pattern_ipv4 && accessor.call(record)
112 | accessor.set(record, sanitize_ipv4_val(accessor.call(record).to_s))
113 | end
114 | if pattern_fqdn && accessor.call(record)
115 | accessor.set(record, sanitize_fqdn_val(accessor.call(record).to_s))
116 | end
117 | if !pattern_regex.to_s.eql?("(?-mix:^$)") && accessor.call(record)
118 | if regex_capture_group.empty?
119 | accessor.set(record, sanitize_regex_val(accessor.call(record), regex_prefix, pattern_regex))
120 | else
121 | accessor.set(record, sanitize_regex_val_capture(accessor.call(record), regex_prefix, pattern_regex, regex_capture_group))
122 | end
123 | #end
124 | end
125 | if !pattern_keywords.empty? && accessor.call(record)
126 | accessor.set(record, sanitize_keywords_val(accessor.call(record).to_s, pattern_keywords, keywords_prefix))
127 | end
128 | rescue => e
129 | log.warn "Skipping this key", error_class: e.class, error: e.message
130 | end
131 | end
132 | end
133 | record
134 | end
135 |
136 | def include_ipv4?(str)
137 | str.match?(/^.*\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.*$/)
138 | end
139 |
140 | def is_ipv4?(str)
141 | str.match?(/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/)
142 | end
143 |
144 | def is_ipv4_port?(str)
145 | str.match?(/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:[0-9]{1,5}$/)
146 | end
147 |
148 | def include_fqdn?(str)
149 | str.match?(/^.*\b(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.){2,}([A-Za-z]|[A-Za-z][A-Za-z\-]*[A-Za-z]){2,}.*$/)
150 | end
151 |
152 | def is_fqdn?(str)
153 | str.match?(/^\b(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.){2,}([A-Za-z]|[A-Za-z][A-Za-z\-]*[A-Za-z]){2,}$/)
154 | end
155 |
156 | def is_fqdn_port?(str)
157 | str.match?(/^\b(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.){2,}([A-Za-z]|[A-Za-z][A-Za-z\-]*[A-Za-z]){2,}:[0-9]{1,5}$/)
158 | end
159 |
160 | def is_url?(str)
161 | str.match?(/^[a-zA-Z0-9]{2,}:\/\/.*$/)
162 | end
163 |
164 | def subtract_quotations(str)
165 | str.gsub(/\\\"|\'|\"|\\\'/,'')
166 | end
167 |
168 | def sanitize_ipv4(str)
169 | return "IPv4_"+ @sanitize_func.call(str)
170 | end
171 |
172 | def sanitize_fqdn(str)
173 | return "FQDN_"+ @sanitize_func.call(str)
174 | end
175 |
176 | def sanitize_val(str, prefix)
177 | s = prefix + "_" + @sanitize_func.call(str)
178 | $log.debug "[pattern_regex] sanitize '#{str}' to '#{s}'" if str != s
179 | return s
180 | end
181 |
182 | def sanitize_regex(str, prefix, regex)
183 | regex_p = Regexp.new(regex)
184 | if str =~ regex_p
185 | scans = str.scan(regex).flatten
186 | if scans.any?{ |e| e.nil? }
187 | return prefix + "_" + @sanitize_func.call(str)
188 | else
189 | scans.each do |s|
190 | mask = prefix + "_" + @sanitize_func.call(str)
191 | str = str.gsub(s, mask)
192 | end
193 | end
194 | return str
195 | else
196 | $log.debug "[pattern_regex] #{str} does not match given regex #{regex}. skip this rule."
197 | return str
198 | end
199 | end
200 |
201 | def sanitize_regex_capture(str, prefix, regex, capture_group)
202 | regex_p = Regexp.new(regex)
203 | if str =~ regex_p
204 | if str.match(regex).names.include?(capture_group)
205 | scans = str.scan(regex).flatten
206 | scans.each do |s|
207 | mask = prefix + "_" + @sanitize_func.call(str)
208 | str = str.gsub(s, mask)
209 | end
210 | return str
211 | else
212 | $log.debug "[pattern_regex] regex pattern matched but capture group '#{capture_group}' does not exist. Skip this rule."
213 | return str
214 | end
215 | else
216 | $log.debug "[pattern_regex] #{str} does not match given regex #{regex}. Skip this rule."
217 | return str
218 | end
219 | end
220 |
221 | def sanitize_keyword(str, prefix)
222 | return prefix + "_" + @sanitize_func.call(str)
223 | end
224 |
225 | def sanitize_ipv4_port(str)
226 | ip_port = []
227 | str.split(":").each do |s|
228 | s = sanitize_ipv4(s) if is_ipv4?(s)
229 | ip_port.push(s)
230 | end
231 | return ip_port.join(":")
232 | end
233 |
234 | def sanitize_fqdn_port(str)
235 | fqdn_port = []
236 | str.split(":").each do |s|
237 | s = sanitize_fqdn(s) if is_fqdn?(s)
238 | fqdn_port.push(s)
239 | end
240 | return fqdn_port.join(":")
241 | end
242 |
243 | def sanitize_ipv4_url(str)
244 | ip_url = []
245 | str.split("://").each do |s|
246 | if s.include?("/")
247 | url_slash = []
248 | s.split("/").each do |ss|
249 | ss = sanitize_ipv4(ss) if is_ipv4?(ss)
250 | ss = sanitize_ipv4_port(ss) if is_ipv4_port?(ss)
251 | url_slash.push(ss)
252 | end
253 | s = url_slash.join("/")
254 | else
255 | s = sanitize_ipv4(s) if is_ipv4?(s)
256 | s = sanitize_ipv4_port(s) if is_ipv4_port?(s)
257 | end
258 | ip_url.push(s)
259 | end
260 | return ip_url.join("://")
261 | end
262 |
263 | def sanitize_fqdn_url(str)
264 | fqdn_url = []
265 | str.split("://").each do |s|
266 | if s.include?("/")
267 | url_slash = []
268 | s.split("/").each do |ss|
269 | ss = sanitize_fqdn(ss) if is_fqdn?(ss)
270 | ss = sanitize_fqdn_port(ss) if is_fqdn_port?(ss)
271 | url_slash.push(ss)
272 | end
273 | s = url_slash.join("/")
274 | else
275 | s = sanitize_fqdn(s) if is_fqdn?(s)
276 | s = sanitize_fqdn_port(s) if is_fqdn_port?(s)
277 | end
278 | fqdn_url.push(s)
279 | end
280 | return fqdn_url.join("://")
281 | end
282 |
283 | def sanitize_ipv4_val(v)
284 | line = []
285 | if v.include?(",")
286 | v.split(",").each do |s|
287 | s = subtract_quotations(s)
288 | if include_ipv4?(s)
289 | if is_url?(s)
290 | s = sanitize_ipv4_url(s)
291 | else
292 | s = sanitize_ipv4(s) if is_ipv4?(s)
293 | s = sanitize_ipv4_port(s) if is_ipv4_port?(s)
294 | end
295 | end
296 | line.push(s)
297 | end
298 | return line.join(",")
299 | else
300 | v.split().each do |s|
301 | s = subtract_quotations(s)
302 | if include_ipv4?(s)
303 | if is_url?(s)
304 | s = sanitize_ipv4_url(s)
305 | else
306 | s = sanitize_ipv4(s) if is_ipv4?(s)
307 | s = sanitize_ipv4_port(s) if is_ipv4_port?(s)
308 | end
309 | end
310 | line.push(s)
311 | end
312 | $log.debug "[pattern_ipv4] sanitize '#{v}' to '#{line.join(" ")}'" if v != line.join(" ")
313 | return line.join(" ")
314 | end
315 | end
316 |
317 | def sanitize_fqdn_val(v)
318 | line = []
319 | if v.include?(",")
320 | v.split(",").each do |s|
321 | s = subtract_quotations(s)
322 | if include_fqdn?(s)
323 | if is_url?(s)
324 | s = sanitize_fqdn_url(s)
325 | else
326 | s = sanitize_fqdn(s) if is_fqdn?(s)
327 | s = sanitize_fqdn_port(s) if is_fqdn_port?(s)
328 | end
329 | end
330 | line.push(s)
331 | end
332 | return line.join(",")
333 | else
334 | v.split().each do |s|
335 | s = subtract_quotations(s)
336 | if include_fqdn?(s)
337 | if is_url?(s)
338 | s = sanitize_fqdn_url(s)
339 | else
340 | s = sanitize_fqdn(s) if is_fqdn?(s)
341 | s = sanitize_fqdn_port(s) if is_fqdn_port?(s)
342 | end
343 | end
344 | line.push(s)
345 | end
346 | $log.debug "[pattern_fqdn] sanitize '#{v}' to '#{line.join(" ")}'" if v != line.join(" ")
347 | return line.join(" ")
348 | end
349 | end
350 |
351 | def sanitize_regex_val(v, prefix, regex)
352 | s = sanitize_regex(v, prefix, regex)
353 | $log.debug "[pattern_regex] sanitize '#{v}' to '#{s}'" if v != s
354 | return s
355 | end
356 |
357 | def sanitize_regex_val_capture(v, prefix, regex, capture_group)
358 | s = sanitize_regex_capture(v, prefix, regex, capture_group)
359 | $log.debug "[pattern_regex] sanitize '#{v}' to '#{s}'" if v != s
360 | return s
361 | end
362 |
363 | def sanitize_keywords_val(v, keywords, prefix)
364 | line = []
365 | v.split().each do |vv|
366 | if keywords.include?(vv)
367 | line.push(sanitize_keyword(vv, prefix))
368 | else
369 | line.push(vv)
370 | end
371 | end
372 | $log.debug "[pattern_keywords] sanitize '#{v}' to '#{line.join(" ")}'" if v != line.join(" ")
373 | return line.join(" ")
374 | end
375 |
376 | end
377 | end
378 | end
379 |
--------------------------------------------------------------------------------
/test/helper.rb:
--------------------------------------------------------------------------------
1 | $LOAD_PATH.unshift(File.expand_path("../../", __FILE__))
2 | require "test-unit"
3 | require "fluent/test"
4 | require "fluent/test/driver/filter"
5 | require "fluent/test/helpers"
6 |
7 | Test::Unit::TestCase.include(Fluent::Test::Helpers)
8 | Test::Unit::TestCase.extend(Fluent::Test::Helpers)
9 |
--------------------------------------------------------------------------------
/test/lib/fluent/plugin/filter_sanitizer.rb:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright 2021- TODO: Write your name
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | # http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 |
16 | require "fluent/plugin/filter"
17 |
18 | module Fluent
19 | module Plugin
20 | class SanitizerFilter < Fluent::Plugin::Filter
21 | Fluent::Plugin.register_filter("sanitizer", self)
22 |
23 | def filter(tag, time, record)
24 | end
25 | end
26 | end
27 | end
28 |
--------------------------------------------------------------------------------
/test/plugin/test_filter_sanitizer.rb:
--------------------------------------------------------------------------------
1 | require "helper"
2 | require "fluent/plugin/filter_sanitizer.rb"
3 |
4 | class SanitizerFilterTest < Test::Unit::TestCase
5 | setup do
6 | Fluent::Test.setup
7 | end
8 |
9 | test "failure" do
10 | flunk
11 | end
12 |
13 | private
14 |
15 | def create_driver(conf)
16 | Fluent::Test::Driver::Filter.new(Fluent::Plugin::SanitizerFilter).configure(conf)
17 | end
18 | end
19 |
--------------------------------------------------------------------------------