├── .github
│   ├── CONTRIBUTING.md
│   ├── ISSUE_TEMPLATE.md
│   └── PULL_REQUEST_TEMPLATE.md
├── .gitignore
├── .travis.yml
├── CHANGELOG.md
├── CONTRIBUTORS
├── Gemfile
├── LICENSE
├── NOTICE.TXT
├── README.md
├── Rakefile
├── build.gradle
├── docs
│   └── index.asciidoc
├── gradle.properties
├── gradle
│   └── wrapper
│       ├── gradle-wrapper.jar
│       └── gradle-wrapper.properties
├── gradlew
├── gradlew.bat
├── lib
│   └── logstash
│       └── outputs
│           ├── webhdfs.rb
│           └── webhdfs_helper.rb
├── logstash-output-webhdfs.gemspec
└── spec
    ├── integration
    │   └── webhdfs_spec.rb
    └── outputs
        ├── webhdfs_helper_spec.rb
        └── webhdfs_spec.rb
--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to Logstash
 2 | 
 3 | All contributions are welcome: ideas, patches, documentation, bug reports,
 4 | complaints, etc!
 5 | 
 6 | Programming is not a required skill, and there are many ways to help out!
 7 | It is more important to us that you are able to contribute.
 8 | 
 9 | That said, here are some basic guidelines, which you are free to ignore :)
10 | 
11 | ## Want to learn?
12 | 
13 | Want to lurk about and see what others are doing with Logstash?
14 | 
15 | * The IRC channel (#logstash on irc.freenode.org) is a good place for this
16 | * The [forum](https://discuss.elastic.co/c/logstash) is also
17 |   great for learning from others.
18 | 
19 | ## Got Questions?
20 | 
21 | Have a problem you want Logstash to solve for you?
22 | 
23 | * You can ask a question in the [forum](https://discuss.elastic.co/c/logstash)
24 | * Alternatively, you are welcome to join the IRC channel #logstash on
25 |   irc.freenode.org and ask for help there!
26 | 
27 | ## Have an Idea or Feature Request?
28 | 
29 | * File a ticket on [GitHub](https://github.com/elastic/logstash/issues). Please remember that GitHub is used only for issues and feature requests. If you have a general question, the [forum](https://discuss.elastic.co/c/logstash) or IRC would be the best place to ask.
30 | 
31 | ## Something Not Working? Found a Bug?
32 | 
33 | If you think you found a bug, it probably is a bug.
34 | 
35 | * If it is a general Logstash or a pipeline issue, file it in [Logstash GitHub](https://github.com/elasticsearch/logstash/issues)
36 | * If it is specific to a plugin, please file it in the respective repository under [logstash-plugins](https://github.com/logstash-plugins)
37 | * or ask on the [forum](https://discuss.elastic.co/c/logstash).
38 | 
39 | # Contributing Documentation and Code Changes
40 | 
41 | If you have a bugfix or new feature that you would like to contribute to
42 | Logstash, and you think it will take more than a few minutes to produce the fix
43 | (i.e., write code), it is worth discussing the change with the Logstash users and developers first! You can reach us via [GitHub](https://github.com/elastic/logstash/issues), the [forum](https://discuss.elastic.co/c/logstash), or via IRC (#logstash on freenode IRC).
44 | Please note that pull requests without tests will not be merged. If you would like to contribute but do not have experience with writing tests, please ping us on IRC/forum, or create a PR and ask for our help.
45 | 
46 | ## Contributing to plugins
47 | 
48 | Check our [documentation](https://www.elastic.co/guide/en/logstash/current/contributing-to-logstash.html) on how to contribute to plugins or write your own! It is super easy!
49 | 
50 | ## Contribution Steps
51 | 
52 | 1. Test your changes! [Run](https://github.com/elastic/logstash#testing) the test suite
53 | 2. Please make sure you have signed our [Contributor License
54 |    Agreement](https://www.elastic.co/contributor-agreement/). We are not
55 |    asking you to assign copyright to us, but to give us the right to distribute
56 |    your code without restriction. We ask this of all contributors in order to
57 |    assure our users of the origin and continuing existence of the code. You
58 |    only need to sign the CLA once.
59 | 3. Send a pull request!
Push your changes to your fork of the repository and 60 | [submit a pull 61 | request](https://help.github.com/articles/using-pull-requests). In the pull 62 | request, describe what your changes do and mention any bugs/issues related 63 | to the pull request. 64 | 65 | 66 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Please post all product and debugging questions on our [forum](https://discuss.elastic.co/c/logstash). Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here. 2 | 3 | For all general issues, please provide the following details for fast resolution: 4 | 5 | - Version: 6 | - Operating System: 7 | - Config File (if you have sensitive info, please remove it): 8 | - Sample Data: 9 | - Steps to Reproduce: 10 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Thanks for contributing to Logstash! 
If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/
2 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.lock
2 | 
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | import:
2 | - logstash-plugins/.ci:travis/travis.yml@1.x
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
 1 | ## 3.1.0
 2 | - Fix: remove snappy gem as a dependency in favor of directly vendoring the snappy jar. [#46](https://github.com/logstash-plugins/logstash-output-webhdfs/pull/46)
 3 | 
 4 | ## 3.0.6
 5 | - Docs: Set the default_codec doc attribute.
 6 | 
 7 | ## 3.0.5
 8 | - Update gemspec summary
 9 | 
10 | ## 3.0.4
11 | - Fix some documentation issues
12 | 
13 | ## 3.0.2
14 | - Relax constraint on logstash-core-plugin-api to >= 1.60 <= 2.99
15 | 
16 | ## 3.0.1
17 | - Force encoding the gemspec of this plugin into utf-8 to make sure updating all the plugins works; see https://github.com/elastic/logstash/issues/5468
18 | - Use oracle JDK8 for travis build
19 | 
20 | ## 3.0.0
21 | - Use new Event API defined in Logstash 5.x (backwards incompatible change)
22 | 
23 | ## 2.0.4
24 | - Depend on logstash-core-plugin-api instead of logstash-core, removing the need to mass update plugins on major releases of logstash
25 | 
26 | ## 2.0.3
27 | - New dependency requirements for logstash-core for the 5.0 release
28 | 
29 | ## 2.0.0
30 | - Plugins were updated to follow the new shutdown semantic; this mainly allows Logstash to instruct input plugins to terminate gracefully,
31 |   instead of using Thread.raise on the plugins' threads.
Ref: https://github.com/elastic/logstash/pull/3895 28 | - Dependency on logstash-core update to 2.0 29 | 30 | ## 0.1.0 31 | * First version of the webhdfs plugin output 32 | -------------------------------------------------------------------------------- /CONTRIBUTORS: -------------------------------------------------------------------------------- 1 | The following is a list of people who have contributed ideas, code, bug 2 | reports, or in general have helped logstash along its way. 3 | 4 | Maintainers: 5 | * Björn Puttmann, dbap GmbH (dstore-dbap) 6 | 7 | Contributors: 8 | * Björn Puttmann, dbap GmbH (dstore-dbap) 9 | * Pier-Hugues Pellerin (ph) 10 | * Pere Urbón (purbon) 11 | * Suyog Rao (suyograo) 12 | * João Duarte (jsvd) 13 | * Shaunak Kashyap (ycombinator) 14 | 15 | Note: If you've sent us patches, bug reports, or otherwise contributed to 16 | Logstash, and you aren't on the list above and want to be, please let us know 17 | and we'll make sure you're here. Contributions from folks like you are what make 18 | open source awesome. 
19 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source 'https://rubygems.org' 2 | 3 | gemspec 4 | 5 | logstash_path = ENV["LOGSTASH_PATH"] || "../../logstash" 6 | use_logstash_source = ENV["LOGSTASH_SOURCE"] && ENV["LOGSTASH_SOURCE"].to_s == "1" 7 | 8 | if Dir.exist?(logstash_path) && use_logstash_source 9 | gem 'logstash-core', :path => "#{logstash_path}/logstash-core" 10 | gem 'logstash-core-plugin-api', :path => "#{logstash_path}/logstash-core-plugin-api" 11 | end 12 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 
26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. 
For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. 
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. 
You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. 
(Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright 2020 Elastic and contributors 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /NOTICE.TXT: -------------------------------------------------------------------------------- 1 | Elasticsearch 2 | Copyright 2012-2015 Elasticsearch 3 | 4 | This product includes software developed by The Apache Software 5 | Foundation (http://www.apache.org/). -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Logstash Plugin 2 | 3 | [![Travis Build Status](https://travis-ci.com/logstash-plugins/logstash-output-webhdfs.svg)](https://travis-ci.com/logstash-plugins/logstash-output-webhdfs) 4 | 5 | This is a plugin for [Logstash](https://github.com/elastic/logstash). 6 | 7 | It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way. 
 8 | 
 9 | ## Documentation
10 | 
11 | Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code will first be converted into asciidoc and then into html. All plugin documentation is placed in one [central location](http://www.elastic.co/guide/en/logstash/current/).
12 | 
13 | - For formatting code or config examples, you can use the asciidoc `[source,ruby]` directive
14 | - For more asciidoc formatting tips, see the excellent reference here https://github.com/elastic/docs#asciidoc-guide
15 | 
16 | ## Need Help?
17 | 
18 | Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.
19 | 
20 | ## Developing
21 | 
22 | ### 1. Plugin Development and Testing
23 | 
24 | #### Code
25 | - To get started, you'll need JRuby with the Bundler gem installed.
26 | 
27 | - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
28 | 
29 | - Install dependencies
30 | ```sh
31 | bundle install
32 | ```
33 | 
34 | #### Test
35 | 
36 | - Update your dependencies
37 | 
38 | ```sh
39 | bundle install
40 | ```
41 | 
42 | - Run tests
43 | 
44 | ```sh
45 | bundle exec rspec
46 | ```
47 | 
48 | ### 2.
Running your unpublished Plugin in Logstash 49 | 50 | #### 2.1 Run in a local Logstash clone 51 | 52 | - Edit Logstash `Gemfile` and add the local plugin path, for example: 53 | ```ruby 54 | gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome" 55 | ``` 56 | - Install plugin 57 | ```sh 58 | # Logstash 2.3 and higher 59 | bin/logstash-plugin install --no-verify 60 | 61 | # Prior to Logstash 2.3 62 | bin/plugin install --no-verify 63 | 64 | ``` 65 | - Run Logstash with your plugin 66 | ```sh 67 | bin/logstash -e 'filter {awesome {}}' 68 | ``` 69 | At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash. 70 | 71 | #### 2.2 Run in an installed Logstash 72 | 73 | You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory or you can build the gem and install it using: 74 | 75 | - Build your plugin gem 76 | ```sh 77 | gem build logstash-filter-awesome.gemspec 78 | ``` 79 | - Install the plugin from the Logstash home 80 | ```sh 81 | # Logstash 2.3 and higher 82 | bin/logstash-plugin install --no-verify 83 | 84 | # Prior to Logstash 2.3 85 | bin/plugin install --no-verify 86 | 87 | ``` 88 | - Start Logstash and proceed to test the plugin 89 | 90 | ## Contributing 91 | 92 | All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin. 93 | 94 | Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here. 95 | 96 | It is more important to the community that you are able to contribute. 97 | 98 | For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file. 
99 | -------------------------------------------------------------------------------- /Rakefile: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require "jars/installer" 3 | require "fileutils" 4 | require "logstash/devutils/rake" 5 | 6 | task :default do 7 | system('rake -vT') 8 | end 9 | 10 | task :vendor do 11 | exit(1) unless system './gradlew vendor' 12 | end 13 | 14 | task :clean do 15 | ["vendor/jar-dependencies", "Gemfile.lock"].each do |p| 16 | FileUtils.rm_rf(p) 17 | end 18 | end -------------------------------------------------------------------------------- /build.gradle: -------------------------------------------------------------------------------- 1 | import java.nio.file.Files 2 | import static java.nio.file.StandardCopyOption.REPLACE_EXISTING 3 | /* 4 | * Licensed to Elasticsearch under one or more contributor 5 | * license agreements. See the NOTICE file distributed with 6 | * this work for additional information regarding copyright 7 | * ownership. Elasticsearch licenses this file to you under 8 | * the Apache License, Version 2.0 (the "License"); you may 9 | * not use this file except in compliance with the License. 10 | * You may obtain a copy of the License at 11 | * 12 | * http://www.apache.org/licenses/LICENSE-2.0 13 | * 14 | * Unless required by applicable law or agreed to in writing, 15 | * software distributed under the License is distributed on an 16 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 17 | * KIND, either express or implied. See the License for the 18 | * specific language governing permissions and limitations 19 | * under the License. 
20 | */ 21 | buildscript { 22 | repositories { 23 | mavenCentral() 24 | } 25 | } 26 | 27 | plugins { 28 | id 'java' 29 | id 'maven-publish' 30 | id 'distribution' 31 | id 'idea' 32 | } 33 | 34 | group "org.logstash.outputs" 35 | 36 | java { 37 | sourceCompatibility = JavaVersion.VERSION_1_8 38 | } 39 | 40 | repositories { 41 | mavenCentral() 42 | } 43 | 44 | dependencies { 45 | implementation 'org.xerial.snappy:snappy-java:1.1.10.5' 46 | } 47 | 48 | task generateGemJarRequiresFile { 49 | doLast { 50 | File jars_file = file('lib/logstash-output-webhdfs_jars.rb') 51 | jars_file.newWriter().withWriter { w -> 52 | w << "# AUTOGENERATED BY THE GRADLE SCRIPT. DO NOT EDIT.\n\n" 53 | w << "require \'jar_dependencies\'\n" 54 | configurations.runtimeClasspath.allDependencies.each { 55 | w << "require_jar(\'${it.group}\', \'${it.name}\', \'${it.version}\')\n" 56 | } 57 | } 58 | } 59 | } 60 | 61 | task vendor { 62 | doLast { 63 | String vendorPathPrefix = "vendor/jar-dependencies" 64 | configurations.runtimeClasspath.allDependencies.each { dep -> 65 | File f = configurations.runtimeClasspath.filter { it.absolutePath.contains("${dep.group}/${dep.name}/${dep.version}") }.singleFile 66 | String groupPath = dep.group.replaceAll('\\.', '/') 67 | File newJarFile = file("${vendorPathPrefix}/${groupPath}/${dep.name}/${dep.version}/${dep.name}-${dep.version}.jar") 68 | newJarFile.mkdirs() 69 | Files.copy(f.toPath(), newJarFile.toPath(), REPLACE_EXISTING) 70 | } 71 | } 72 | } 73 | 74 | vendor.dependsOn(generateGemJarRequiresFile) 75 | -------------------------------------------------------------------------------- /docs/index.asciidoc: -------------------------------------------------------------------------------- 1 | :plugin: webhdfs 2 | :type: output 3 | :default_codec: line 4 | 5 | /////////////////////////////////////////// 6 | START - GENERATED VARIABLES, DO NOT EDIT! 
 7 | ///////////////////////////////////////////
 8 | :version: %VERSION%
 9 | :release_date: %RELEASE_DATE%
10 | :changelog_url: %CHANGELOG_URL%
11 | :include_path: ../../../../logstash/docs/include
12 | ///////////////////////////////////////////
13 | END - GENERATED VARIABLES, DO NOT EDIT!
14 | ///////////////////////////////////////////
15 | 
16 | [id="plugins-{type}s-{plugin}"]
17 | 
18 | === Webhdfs output plugin
19 | 
20 | include::{include_path}/plugin_header.asciidoc[]
21 | 
22 | ==== Description
23 | 
24 | This plugin sends Logstash events into files in HDFS via
25 | the https://hadoop.apache.org/docs/r1.0.4/webhdfs.html[webhdfs] REST API.
26 | 
27 | ==== Dependencies
28 | This plugin has no dependency on jars from hadoop, thus reducing configuration and compatibility
29 | problems. It uses the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (@see: https://github.com/kzk/webhdfs).
30 | Optional dependencies are the zlib and snappy gems if you use the compression functionality.
31 | 
32 | ==== Operational Notes
33 | If you get an error like:
34 | 
35 |     Max write retries reached. Exception: initialize: name or service not known {:level=>:error}
36 | 
37 | make sure that the hostname of your namenode is resolvable on the host running Logstash. When creating/appending
38 | to a file, webhdfs sometimes sends a `307 TEMPORARY_REDIRECT` with the `HOSTNAME` of the machine it is running on.
39 | 
40 | ==== Usage
41 | This is an example of Logstash config:
42 | 
43 | [source,ruby]
44 | ----------------------------------
45 | input {
46 |    ...
47 | }
48 | filter {
49 |    ...
 50 | }
 51 | output {
 52 |     webhdfs {
 53 |         host => "127.0.0.1"        # (required)
 54 |         port => 50070              # (optional, default: 50070)
 55 |         path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"  # (required)
 56 |         user => "hue"              # (required)
 57 |     }
 58 | }
 59 | ----------------------------------
 60 | 
 61 | [id="plugins-{type}s-{plugin}-options"]
 62 | ==== Webhdfs Output Configuration Options
 63 | 
 64 | This plugin supports the following configuration options plus the <<plugins-{type}s-{plugin}-common-options>> described later.
 65 | 
 66 | [cols="<,<,<",options="header",]
 67 | |=======================================================================
 68 | |Setting |Input type|Required
 69 | | <<plugins-{type}s-{plugin}-compression>> |<<string,string>>, one of `["none", "snappy", "gzip"]`|No
 70 | | <<plugins-{type}s-{plugin}-flush_size>> |<<number,number>>|No
 71 | | <<plugins-{type}s-{plugin}-host>> |<<string,string>>|Yes
 72 | | <<plugins-{type}s-{plugin}-idle_flush_time>> |<<number,number>>|No
 73 | | <<plugins-{type}s-{plugin}-kerberos_keytab>> |<<string,string>>|No
 74 | | <<plugins-{type}s-{plugin}-open_timeout>> |<<number,number>>|No
 75 | | <<plugins-{type}s-{plugin}-path>> |<<string,string>>|Yes
 76 | | <<plugins-{type}s-{plugin}-port>> |<<number,number>>|No
 77 | | <<plugins-{type}s-{plugin}-read_timeout>> |<<number,number>>|No
 78 | | <<plugins-{type}s-{plugin}-retry_interval>> |<<number,number>>|No
 79 | | <<plugins-{type}s-{plugin}-retry_known_errors>> |<<boolean,boolean>>|No
 80 | | <<plugins-{type}s-{plugin}-retry_times>> |<<number,number>>|No
 81 | | <<plugins-{type}s-{plugin}-single_file_per_thread>> |<<boolean,boolean>>|No
 82 | | <<plugins-{type}s-{plugin}-snappy_bufsize>> |<<number,number>>|No
 83 | | <<plugins-{type}s-{plugin}-snappy_format>> |<<string,string>>, one of `["stream", "file"]`|No
 84 | | <<plugins-{type}s-{plugin}-ssl_cert>> |<<string,string>>|No
 85 | | <<plugins-{type}s-{plugin}-ssl_key>> |<<string,string>>|No
 86 | | <<plugins-{type}s-{plugin}-standby_host>> |<<string,string>>|No
 87 | | <<plugins-{type}s-{plugin}-standby_port>> |<<number,number>>|No
 88 | | <<plugins-{type}s-{plugin}-use_httpfs>> |<<boolean,boolean>>|No
 89 | | <<plugins-{type}s-{plugin}-use_kerberos_auth>> |<<boolean,boolean>>|No
 90 | | <<plugins-{type}s-{plugin}-use_ssl_auth>> |<<boolean,boolean>>|No
 91 | | <<plugins-{type}s-{plugin}-user>> |<<string,string>>|Yes
 92 | |=======================================================================
 93 | 
 94 | Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all
 95 | output plugins.
 96 | 
 97 | &nbsp;
 98 | 
 99 | [id="plugins-{type}s-{plugin}-compression"]
100 | ===== `compression`
101 | 
102 |   * Value can be any of: `none`, `snappy`, `gzip`
103 |   * Default value is `"none"`
104 | 
105 | Compress the output. One of `none`, `snappy`, `gzip`.
106 | 
107 | [id="plugins-{type}s-{plugin}-flush_size"]
108 | ===== `flush_size`
109 | 
110 |   * Value type is <<number,number>>
111 |   * Default value is `500`
112 | 
113 | Send data to webhdfs when the event count exceeds this value, even if `store_interval_in_secs` is not reached.
114 | 
115 | [id="plugins-{type}s-{plugin}-host"]
116 | ===== `host`
117 | 
118 |   * This is a required setting.
119 |   * Value type is <<string,string>>
120 |   * There is no default value for this setting.
121 | 
122 | The server name for webhdfs/httpfs connections.
123 | 
124 | [id="plugins-{type}s-{plugin}-idle_flush_time"]
125 | ===== `idle_flush_time`
126 | 
127 |   * Value type is <<number,number>>
128 |   * Default value is `1`
129 | 
130 | Send data to webhdfs in intervals of x seconds.
131 | 
132 | [id="plugins-{type}s-{plugin}-kerberos_keytab"]
133 | ===== `kerberos_keytab`
134 | 
135 |   * Value type is <<string,string>>
136 |   * There is no default value for this setting.
137 | 
138 | Set the kerberos keytab file. Note that the gssapi library needs to be available to use this.
139 | 
140 | [id="plugins-{type}s-{plugin}-open_timeout"]
141 | ===== `open_timeout`
142 | 
143 |   * Value type is <<number,number>>
144 |   * Default value is `30`
145 | 
146 | WebHdfs open timeout, default 30s.
147 | 
148 | [id="plugins-{type}s-{plugin}-path"]
149 | ===== `path`
150 | 
151 |   * This is a required setting.
152 |   * Value type is <<string,string>>
153 |   * There is no default value for this setting.
154 | 
155 | The path to the file to write to. Event fields can be used here,
156 | as well as date fields in the joda time format, e.g.:
157 | `/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log`
158 | 
159 | [id="plugins-{type}s-{plugin}-port"]
160 | ===== `port`
161 | 
162 |   * Value type is <<number,number>>
163 |   * Default value is `50070`
164 | 
165 | The server port for webhdfs/httpfs connections.
166 | 
167 | [id="plugins-{type}s-{plugin}-read_timeout"]
168 | ===== `read_timeout`
169 | 
170 |   * Value type is <<number,number>>
171 |   * Default value is `30`
172 | 
173 | The WebHdfs read timeout, default 30s.
174 | 
175 | [id="plugins-{type}s-{plugin}-retry_interval"]
176 | ===== `retry_interval`
177 | 
178 |   * Value type is <<number,number>>
179 |   * Default value is `0.5`
180 | 
181 | How long should we wait between retries.
182 | 
183 | [id="plugins-{type}s-{plugin}-retry_known_errors"]
184 | ===== `retry_known_errors`
185 | 
186 |   * Value type is <<boolean,boolean>>
187 |   * Default value is `true`
188 | 
189 | Retry some known webhdfs errors. These may be caused by race conditions when appending to the same file, etc.
190 | 191 | [id="plugins-{type}s-{plugin}-retry_times"] 192 | ===== `retry_times` 193 | 194 | * Value type is <<number,number>> 195 | * Default value is `5` 196 | 197 | How many times should we retry. If retry_times is exceeded, an error will be logged and the event will be discarded. 198 | 199 | [id="plugins-{type}s-{plugin}-single_file_per_thread"] 200 | ===== `single_file_per_thread` 201 | 202 | * Value type is <<boolean,boolean>> 203 | * Default value is `false` 204 | 205 | Avoid appending to the same file from multiple threads. 206 | This solves some problems with multiple logstash output threads and locked file leases in webhdfs. 207 | If this option is set to true, %{[@metadata][thread_id]} needs to be used in the path config setting. 208 | 209 | [id="plugins-{type}s-{plugin}-snappy_bufsize"] 210 | ===== `snappy_bufsize` 211 | 212 | * Value type is <<number,number>> 213 | * Default value is `32768` 214 | 215 | Set snappy chunksize. Only necessary for stream format. Defaults to 32k. Max is 65536. 216 | @see http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt 217 | 218 | [id="plugins-{type}s-{plugin}-snappy_format"] 219 | ===== `snappy_format` 220 | 221 | * Value can be any of: `stream`, `file` 222 | * Default value is `"stream"` 223 | 224 | Set snappy format. One of "stream", "file". Set to stream to be hive compatible. 225 | 226 | [id="plugins-{type}s-{plugin}-ssl_cert"] 227 | ===== `ssl_cert` 228 | 229 | * Value type is <<string,string>> 230 | * There is no default value for this setting. 231 | 232 | Set ssl cert file. 233 | 234 | [id="plugins-{type}s-{plugin}-ssl_key"] 235 | ===== `ssl_key` 236 | 237 | * Value type is <<string,string>> 238 | * There is no default value for this setting. 239 | 240 | Set ssl key file. 241 | 242 | [id="plugins-{type}s-{plugin}-standby_host"] 243 | ===== `standby_host` 244 | 245 | * Value type is <<string,string>> 246 | * Default value is `false` 247 | 248 | Standby namenode for HA HDFS.
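The `standby_host` setting pairs with `standby_port` to enable automatic failover between two namenodes. A hypothetical HA configuration (host names are placeholders) might look like:

[source,ruby]
----------------------------------
output {
  webhdfs {
    host => "namenode1.example.org"          # active namenode
    standby_host => "namenode2.example.org"  # failover target
    standby_port => 50070
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "hue"
  }
}
----------------------------------

When a write fails with a `StandbyException` or a connection error, the plugin swaps the active and standby clients and retries.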
249 | 250 | [id="plugins-{type}s-{plugin}-standby_port"] 251 | ===== `standby_port` 252 | 253 | * Value type is <<number,number>> 254 | * Default value is `50070` 255 | 256 | Standby namenode port for HA HDFS. 257 | 258 | [id="plugins-{type}s-{plugin}-use_httpfs"] 259 | ===== `use_httpfs` 260 | 261 | * Value type is <<boolean,boolean>> 262 | * Default value is `false` 263 | 264 | Use httpfs mode if set to true, else webhdfs. 265 | 266 | [id="plugins-{type}s-{plugin}-use_kerberos_auth"] 267 | ===== `use_kerberos_auth` 268 | 269 | * Value type is <<boolean,boolean>> 270 | * Default value is `false` 271 | 272 | Set kerberos authentication. 273 | 274 | [id="plugins-{type}s-{plugin}-use_ssl_auth"] 275 | ===== `use_ssl_auth` 276 | 277 | * Value type is <<boolean,boolean>> 278 | * Default value is `false` 279 | 280 | Set ssl authentication. Note that the openssl library needs to be available to use this. 281 | 282 | [id="plugins-{type}s-{plugin}-user"] 283 | ===== `user` 284 | 285 | * This is a required setting. 286 | * Value type is <<string,string>> 287 | * There is no default value for this setting. 288 | 289 | The username for webhdfs.
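The authentication options above can be combined. A hypothetical kerberos-plus-ssl configuration (the keytab and certificate paths are placeholders) might look like:

[source,ruby]
----------------------------------
output {
  webhdfs {
    host => "namenode.example.org"
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "logstash"
    use_kerberos_auth => true
    kerberos_keytab => "/etc/security/keytabs/logstash.keytab"  # requires the gssapi gem
    use_ssl_auth => true
    ssl_key => "/etc/logstash/ssl/logstash.key"                 # requires the openssl library
    ssl_cert => "/etc/logstash/ssl/logstash.crt"
  }
}
----------------------------------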
290 | 291 | 292 | 293 | [id="plugins-{type}s-{plugin}-common-options"] 294 | include::{include_path}/{type}.asciidoc[] 295 | 296 | :default_codec!: -------------------------------------------------------------------------------- /gradle.properties: -------------------------------------------------------------------------------- 1 | org.gradle.daemon=false 2 | -------------------------------------------------------------------------------- /gradle/wrapper/gradle-wrapper.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logstash-plugins/logstash-output-webhdfs/1de4c34bf391359d1960cdd3d289375a3b0644a7/gradle/wrapper/gradle-wrapper.jar -------------------------------------------------------------------------------- /gradle/wrapper/gradle-wrapper.properties: -------------------------------------------------------------------------------- 1 | distributionBase=GRADLE_USER_HOME 2 | distributionPath=wrapper/dists 3 | distributionUrl=https\://services.gradle.org/distributions/gradle-8.7-bin.zip 4 | networkTimeout=10000 5 | validateDistributionUrl=true 6 | zipStoreBase=GRADLE_USER_HOME 7 | zipStorePath=wrapper/dists 8 | -------------------------------------------------------------------------------- /gradlew: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # 4 | # Copyright © 2015-2021 the original authors. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # https://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | # 18 | 19 | ############################################################################## 20 | # 21 | # Gradle start up script for POSIX generated by Gradle. 22 | # 23 | # Important for running: 24 | # 25 | # (1) You need a POSIX-compliant shell to run this script. If your /bin/sh is 26 | # noncompliant, but you have some other compliant shell such as ksh or 27 | # bash, then to run this script, type that shell name before the whole 28 | # command line, like: 29 | # 30 | # ksh Gradle 31 | # 32 | # Busybox and similar reduced shells will NOT work, because this script 33 | # requires all of these POSIX shell features: 34 | # * functions; 35 | # * expansions «$var», «${var}», «${var:-default}», «${var+SET}», 36 | # «${var#prefix}», «${var%suffix}», and «$( cmd )»; 37 | # * compound commands having a testable exit status, especially «case»; 38 | # * various built-in commands including «command», «set», and «ulimit». 39 | # 40 | # Important for patching: 41 | # 42 | # (2) This script targets any POSIX shell, so it avoids extensions provided 43 | # by Bash, Ksh, etc; in particular arrays are avoided. 44 | # 45 | # The "traditional" practice of packing multiple parameters into a 46 | # space-separated string is a well documented source of bugs and security 47 | # problems, so this is (mostly) avoided, by progressively accumulating 48 | # options in "$@", and eventually passing that to Java. 49 | # 50 | # Where the inherited environment variables (DEFAULT_JVM_OPTS, JAVA_OPTS, 51 | # and GRADLE_OPTS) rely on word-splitting, this is performed explicitly; 52 | # see the in-line comments for details. 53 | # 54 | # There are tweaks for specific operating systems such as AIX, CygWin, 55 | # Darwin, MinGW, and NonStop. 
56 | # 57 | # (3) This script is generated from the Groovy template 58 | # https://github.com/gradle/gradle/blob/HEAD/subprojects/plugins/src/main/resources/org/gradle/api/internal/plugins/unixStartScript.txt 59 | # within the Gradle project. 60 | # 61 | # You can find Gradle at https://github.com/gradle/gradle/. 62 | # 63 | ############################################################################## 64 | 65 | # Attempt to set APP_HOME 66 | 67 | # Resolve links: $0 may be a link 68 | app_path=$0 69 | 70 | # Need this for daisy-chained symlinks. 71 | while 72 | APP_HOME=${app_path%"${app_path##*/}"} # leaves a trailing /; empty if no leading path 73 | [ -h "$app_path" ] 74 | do 75 | ls=$( ls -ld "$app_path" ) 76 | link=${ls#*' -> '} 77 | case $link in #( 78 | /*) app_path=$link ;; #( 79 | *) app_path=$APP_HOME$link ;; 80 | esac 81 | done 82 | 83 | # This is normally unused 84 | # shellcheck disable=SC2034 85 | APP_BASE_NAME=${0##*/} 86 | # Discard cd standard output in case $CDPATH is set (https://github.com/gradle/gradle/issues/25036) 87 | APP_HOME=$( cd "${APP_HOME:-./}" > /dev/null && pwd -P ) || exit 88 | 89 | # Use the maximum available, or set MAX_FD != -1 to use that value. 90 | MAX_FD=maximum 91 | 92 | warn () { 93 | echo "$*" 94 | } >&2 95 | 96 | die () { 97 | echo 98 | echo "$*" 99 | echo 100 | exit 1 101 | } >&2 102 | 103 | # OS specific support (must be 'true' or 'false'). 104 | cygwin=false 105 | msys=false 106 | darwin=false 107 | nonstop=false 108 | case "$( uname )" in #( 109 | CYGWIN* ) cygwin=true ;; #( 110 | Darwin* ) darwin=true ;; #( 111 | MSYS* | MINGW* ) msys=true ;; #( 112 | NONSTOP* ) nonstop=true ;; 113 | esac 114 | 115 | CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar 116 | 117 | 118 | # Determine the Java command to use to start the JVM. 
119 | if [ -n "$JAVA_HOME" ] ; then 120 | if [ -x "$JAVA_HOME/jre/sh/java" ] ; then 121 | # IBM's JDK on AIX uses strange locations for the executables 122 | JAVACMD=$JAVA_HOME/jre/sh/java 123 | else 124 | JAVACMD=$JAVA_HOME/bin/java 125 | fi 126 | if [ ! -x "$JAVACMD" ] ; then 127 | die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME 128 | 129 | Please set the JAVA_HOME variable in your environment to match the 130 | location of your Java installation." 131 | fi 132 | else 133 | JAVACMD=java 134 | if ! command -v java >/dev/null 2>&1 135 | then 136 | die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. 137 | 138 | Please set the JAVA_HOME variable in your environment to match the 139 | location of your Java installation." 140 | fi 141 | fi 142 | 143 | # Increase the maximum file descriptors if we can. 144 | if ! "$cygwin" && ! "$darwin" && ! "$nonstop" ; then 145 | case $MAX_FD in #( 146 | max*) 147 | # In POSIX sh, ulimit -H is undefined. That's why the result is checked to see if it worked. 148 | # shellcheck disable=SC2039,SC3045 149 | MAX_FD=$( ulimit -H -n ) || 150 | warn "Could not query maximum file descriptor limit" 151 | esac 152 | case $MAX_FD in #( 153 | '' | soft) :;; #( 154 | *) 155 | # In POSIX sh, ulimit -n is undefined. That's why the result is checked to see if it worked. 156 | # shellcheck disable=SC2039,SC3045 157 | ulimit -n "$MAX_FD" || 158 | warn "Could not set maximum file descriptor limit to $MAX_FD" 159 | esac 160 | fi 161 | 162 | # Collect all arguments for the java command, stacking in reverse order: 163 | # * args from the command line 164 | # * the main class name 165 | # * -classpath 166 | # * -D...appname settings 167 | # * --module-path (only if needed) 168 | # * DEFAULT_JVM_OPTS, JAVA_OPTS, and GRADLE_OPTS environment variables. 
169 | 170 | # For Cygwin or MSYS, switch paths to Windows format before running java 171 | if "$cygwin" || "$msys" ; then 172 | APP_HOME=$( cygpath --path --mixed "$APP_HOME" ) 173 | CLASSPATH=$( cygpath --path --mixed "$CLASSPATH" ) 174 | 175 | JAVACMD=$( cygpath --unix "$JAVACMD" ) 176 | 177 | # Now convert the arguments - kludge to limit ourselves to /bin/sh 178 | for arg do 179 | if 180 | case $arg in #( 181 | -*) false ;; # don't mess with options #( 182 | /?*) t=${arg#/} t=/${t%%/*} # looks like a POSIX filepath 183 | [ -e "$t" ] ;; #( 184 | *) false ;; 185 | esac 186 | then 187 | arg=$( cygpath --path --ignore --mixed "$arg" ) 188 | fi 189 | # Roll the args list around exactly as many times as the number of 190 | # args, so each arg winds up back in the position where it started, but 191 | # possibly modified. 192 | # 193 | # NB: a `for` loop captures its iteration list before it begins, so 194 | # changing the positional parameters here affects neither the number of 195 | # iterations, nor the values presented in `arg`. 196 | shift # remove old arg 197 | set -- "$@" "$arg" # push replacement arg 198 | done 199 | fi 200 | 201 | 202 | # Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. 203 | DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"' 204 | 205 | # Collect all arguments for the java command: 206 | # * DEFAULT_JVM_OPTS, JAVA_OPTS, JAVA_OPTS, and optsEnvironmentVar are not allowed to contain shell fragments, 207 | # and any embedded shellness will be escaped. 208 | # * For example: A user cannot expect ${Hostname} to be expanded, as it is an environment variable and will be 209 | # treated as '${Hostname}' itself on the command line. 210 | 211 | set -- \ 212 | "-Dorg.gradle.appname=$APP_BASE_NAME" \ 213 | -classpath "$CLASSPATH" \ 214 | org.gradle.wrapper.GradleWrapperMain \ 215 | "$@" 216 | 217 | # Stop when "xargs" is not available. 218 | if ! 
command -v xargs >/dev/null 2>&1 219 | then 220 | die "xargs is not available" 221 | fi 222 | 223 | # Use "xargs" to parse quoted args. 224 | # 225 | # With -n1 it outputs one arg per line, with the quotes and backslashes removed. 226 | # 227 | # In Bash we could simply go: 228 | # 229 | # readarray ARGS < <( xargs -n1 <<<"$var" ) && 230 | # set -- "${ARGS[@]}" "$@" 231 | # 232 | # but POSIX shell has neither arrays nor command substitution, so instead we 233 | # post-process each arg (as a line of input to sed) to backslash-escape any 234 | # character that might be a shell metacharacter, then use eval to reverse 235 | # that process (while maintaining the separation between arguments), and wrap 236 | # the whole thing up as a single "set" statement. 237 | # 238 | # This will of course break if any of these variables contains a newline or 239 | # an unmatched quote. 240 | # 241 | 242 | eval "set -- $( 243 | printf '%s\n' "$DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS" | 244 | xargs -n1 | 245 | sed ' s~[^-[:alnum:]+,./:=@_]~\\&~g; ' | 246 | tr '\n' ' ' 247 | )" '"$@"' 248 | 249 | exec "$JAVACMD" "$@" 250 | -------------------------------------------------------------------------------- /gradlew.bat: -------------------------------------------------------------------------------- 1 | @rem 2 | @rem Copyright 2015 the original author or authors. 3 | @rem 4 | @rem Licensed under the Apache License, Version 2.0 (the "License"); 5 | @rem you may not use this file except in compliance with the License. 6 | @rem You may obtain a copy of the License at 7 | @rem 8 | @rem https://www.apache.org/licenses/LICENSE-2.0 9 | @rem 10 | @rem Unless required by applicable law or agreed to in writing, software 11 | @rem distributed under the License is distributed on an "AS IS" BASIS, 12 | @rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | @rem See the License for the specific language governing permissions and 14 | @rem limitations under the License. 
15 | @rem 16 | 17 | @if "%DEBUG%"=="" @echo off 18 | @rem ########################################################################## 19 | @rem 20 | @rem Gradle startup script for Windows 21 | @rem 22 | @rem ########################################################################## 23 | 24 | @rem Set local scope for the variables with windows NT shell 25 | if "%OS%"=="Windows_NT" setlocal 26 | 27 | set DIRNAME=%~dp0 28 | if "%DIRNAME%"=="" set DIRNAME=. 29 | @rem This is normally unused 30 | set APP_BASE_NAME=%~n0 31 | set APP_HOME=%DIRNAME% 32 | 33 | @rem Resolve any "." and ".." in APP_HOME to make it shorter. 34 | for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi 35 | 36 | @rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. 37 | set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m" 38 | 39 | @rem Find java.exe 40 | if defined JAVA_HOME goto findJavaFromJavaHome 41 | 42 | set JAVA_EXE=java.exe 43 | %JAVA_EXE% -version >NUL 2>&1 44 | if %ERRORLEVEL% equ 0 goto execute 45 | 46 | echo. 1>&2 47 | echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. 1>&2 48 | echo. 1>&2 49 | echo Please set the JAVA_HOME variable in your environment to match the 1>&2 50 | echo location of your Java installation. 1>&2 51 | 52 | goto fail 53 | 54 | :findJavaFromJavaHome 55 | set JAVA_HOME=%JAVA_HOME:"=% 56 | set JAVA_EXE=%JAVA_HOME%/bin/java.exe 57 | 58 | if exist "%JAVA_EXE%" goto execute 59 | 60 | echo. 1>&2 61 | echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% 1>&2 62 | echo. 1>&2 63 | echo Please set the JAVA_HOME variable in your environment to match the 1>&2 64 | echo location of your Java installation. 
1>&2 65 | 66 | goto fail 67 | 68 | :execute 69 | @rem Setup the command line 70 | 71 | set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar 72 | 73 | 74 | @rem Execute Gradle 75 | "%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %* 76 | 77 | :end 78 | @rem End local scope for the variables with windows NT shell 79 | if %ERRORLEVEL% equ 0 goto mainEnd 80 | 81 | :fail 82 | rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of 83 | rem the _cmd.exe /c_ return code! 84 | set EXIT_CODE=%ERRORLEVEL% 85 | if %EXIT_CODE% equ 0 set EXIT_CODE=1 86 | if not ""=="%GRADLE_EXIT_CONSOLE%" exit %EXIT_CODE% 87 | exit /b %EXIT_CODE% 88 | 89 | :mainEnd 90 | if "%OS%"=="Windows_NT" endlocal 91 | 92 | :omega 93 | -------------------------------------------------------------------------------- /lib/logstash/outputs/webhdfs.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require "logstash/namespace" 3 | require "logstash/outputs/base" 4 | require "stud/buffer" 5 | require "logstash/outputs/webhdfs_helper" 6 | 7 | # This plugin sends Logstash events into files in HDFS via 8 | # the https://hadoop.apache.org/docs/r1.0.4/webhdfs.html[webhdfs] REST API. 9 | # 10 | # ==== Dependencies 11 | # This plugin has no dependency on jars from hadoop, thus reducing configuration and compatibility 12 | # problems. It uses the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (@see: https://github.com/kzk/webhdfs). 13 | # Optional dependencies are zlib if you use the compression functionality. 14 | # 15 | # ==== Operational Notes 16 | # If you get an error like: 17 | # 18 | # Max write retries reached. Exception: initialize: name or service not known {:level=>:error} 19 | # 20 | # make sure that the hostname of your namenode is resolvable on the host running Logstash. 
When creating/appending 21 | # to a file, webhdfs sometimes sends a `307 TEMPORARY_REDIRECT` with the `HOSTNAME` of the machine it's running on. 22 | # 23 | # ==== Usage 24 | # This is an example of Logstash config: 25 | # 26 | # [source,ruby] 27 | # ---------------------------------- 28 | # input { 29 | #   ... 30 | # } 31 | # filter { 32 | #   ... 33 | # } 34 | # output { 35 | #   webhdfs { 36 | #     host => "127.0.0.1"                 # (required) 37 | #     port => 50070                       # (optional, default: 50070) 38 | #     path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"  # (required) 39 | #     user => "hue"                       # (required) 40 | #   } 41 | # } 42 | # ---------------------------------- 43 | 44 | class LogStash::Outputs::WebHdfs < LogStash::Outputs::Base 45 | 46 | include Stud::Buffer 47 | include LogStash::Outputs::WebHdfsHelper 48 | 49 | config_name "webhdfs" 50 | 51 | MAGIC = "\x82SNAPPY\x0".force_encoding Encoding::ASCII_8BIT 52 | DEFAULT_VERSION = 1 53 | MINIMUM_COMPATIBLE_VERSION = 1 54 | 55 | # The server name for webhdfs/httpfs connections. 56 | config :host, :validate => :string, :required => true 57 | 58 | # The server port for webhdfs/httpfs connections. 59 | config :port, :validate => :number, :default => 50070 60 | 61 | # Standby namenode for HA HDFS. 62 | config :standby_host, :validate => :string, :default => false 63 | 64 | # Standby namenode port for HA HDFS. 65 | config :standby_port, :validate => :number, :default => 50070 66 | 67 | # The username for webhdfs. 68 | config :user, :validate => :string, :required => true 69 | 70 | # The path to the file to write to. Event fields can be used here, 71 | # as well as date fields in the joda time format, e.g.: 72 | # `/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log` 73 | config :path, :validate => :string, :required => true 74 | 75 | # Send data to webhdfs in x-second intervals.
76 | config :idle_flush_time, :validate => :number, :default => 1 77 | 78 | # Send data to webhdfs if the event count exceeds this value, even if `idle_flush_time` is not reached. 79 | config :flush_size, :validate => :number, :default => 500 80 | 81 | # WebHdfs open timeout, default 30s. 82 | config :open_timeout, :validate => :number, :default => 30 83 | 84 | # The WebHdfs read timeout, default 30s. 85 | config :read_timeout, :validate => :number, :default => 30 86 | 87 | # Use httpfs mode if set to true, else webhdfs. 88 | config :use_httpfs, :validate => :boolean, :default => false 89 | 90 | # Avoid appending to the same file from multiple threads. 91 | # This solves some problems with multiple logstash output threads and locked file leases in webhdfs. 92 | # If this option is set to true, %{[@metadata][thread_id]} needs to be used in the path config setting. 93 | config :single_file_per_thread, :validate => :boolean, :default => false 94 | 95 | # Retry some known webhdfs errors. These may be caused by race conditions when appending to the same file, etc. 96 | config :retry_known_errors, :validate => :boolean, :default => true 97 | 98 | # How long to wait between retries, in seconds. 99 | config :retry_interval, :validate => :number, :default => 0.5 100 | 101 | # How many times should we retry. If retry_times is exceeded, an error will be logged and the event will be discarded. 102 | config :retry_times, :validate => :number, :default => 5 103 | 104 | # Compress output. One of ['none', 'snappy', 'gzip'] 105 | config :compression, :validate => ["none", "snappy", "gzip"], :default => "none" 106 | 107 | # Set snappy chunksize. Only necessary for stream format. Defaults to 32k. Max is 65536. 108 | # @see http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt 109 | config :snappy_bufsize, :validate => :number, :default => 32768 110 | 111 | # Set snappy format. One of "stream", "file". Set to stream to be hive compatible.
112 | config :snappy_format, :validate => ["stream", "file"], :default => "stream" 113 | 114 | # Set kerberos authentication. 115 | config :use_kerberos_auth, :validate => :boolean, :default => false 116 | 117 | # Set kerberos keytab file. Note that the gssapi library needs to be available to use this. 118 | config :kerberos_keytab, :validate => :string 119 | 120 | # Set ssl authentication. Note that the openssl library needs to be available to use this. 121 | config :use_ssl_auth, :validate => :boolean, :default => false 122 | 123 | # Set ssl key file. 124 | config :ssl_key, :validate => :string 125 | 126 | # Set ssl cert file. 127 | config :ssl_cert, :validate => :string 128 | 129 | ## Set codec. 130 | default :codec, 'line' 131 | 132 | public 133 | 134 | def register 135 | load_module('webhdfs') 136 | 137 | # in case of snappy the jars are already included and no wrapper module has to be loaded. 138 | if @compression == "gzip" 139 | load_module('zlib') 140 | end 141 | @main_namenode_failed = false 142 | @standby_client = false 143 | @files = {} 144 | # Create and test standby client if configured. 145 | if @standby_host 146 | @standby_client = prepare_client(@standby_host, @standby_port, @user) 147 | begin 148 | test_client(@standby_client) 149 | rescue => e 150 | logger.warn("Could not connect to standby namenode #{@standby_client.host}. Error: #{e.message}. Trying main webhdfs namenode.") 151 | end 152 | end 153 | @client = prepare_client(@host, @port, @user) 154 | begin 155 | test_client(@client) 156 | rescue => e 157 | # If no standy host is configured, we need to exit here. 158 | if not @standby_host 159 | raise 160 | else 161 | # If a standby host is configured, try this before giving up. 162 | logger.error("Could not connect to #{@client.host}:#{@client.port}. Error: #{e.message}") 163 | do_failover 164 | end 165 | end 166 | # Make sure @path contains %{[@metadata][thread_id]} format value if @single_file_per_thread is set to true. 
167 | if @single_file_per_thread and !@path.include? "%{[@metadata][thread_id]}" 168 | @logger.error("Please set %{[@metadata][thread_id]} format value in @path if @single_file_per_thread is active.") 169 | raise LogStash::ConfigurationError 170 | end 171 | buffer_initialize( 172 | :max_items => @flush_size, 173 | :max_interval => @idle_flush_time, 174 | :logger => @logger 175 | ) 176 | @codec.on_event do |event, encoded_event| 177 | encoded_event 178 | end 179 | end # def register 180 | 181 | def receive(event) 182 | buffer_receive(event) 183 | end # def receive 184 | 185 | def flush(events=nil, close=false) 186 | return if not events 187 | newline = "\n" 188 | output_files = Hash.new { |hash, key| hash[key] = "" } 189 | events.each do |event| 190 | # Add thread_id to event metadata to be used as format value in path configuration. 191 | if @single_file_per_thread 192 | event.set("[@metadata][thread_id]", Thread.current.object_id.to_s) 193 | end 194 | path = event.sprintf(@path) 195 | event_as_string = @codec.encode(event) 196 | event_as_string += newline unless event_as_string.end_with? newline 197 | output_files[path] << event_as_string 198 | end 199 | output_files.each do |path, output| 200 | if @compression == "gzip" 201 | path += ".gz" 202 | output = compress_gzip(output) 203 | elsif @compression == "snappy" 204 | path += ".snappy" 205 | if @snappy_format == "file" 206 | output = compress_snappy_file(output) 207 | else 208 | output = compress_snappy_stream(output) 209 | end 210 | end 211 | write_data(path, output) 212 | end 213 | end 214 | 215 | def write_data(path, data) 216 | # Retry @retry_times times. This can solve problems like leases being held by another process. Sadly this is not a 217 | # KNOWN_ERROR in ruby's webhdfs client. 218 | write_tries = 0 219 | begin 220 | # Try to append to an already existing file, which will work most of the time. 221 | @client.append(path, data) 222 | # File does not exist, so create it.
223 | rescue WebHDFS::FileNotFoundError 224 | # Add snappy header if format is "file". 225 | if @compression == "snappy" and @snappy_format == "file" 226 | @client.create(path, get_snappy_header! + data) 227 | else 228 | @client.create(path, data) 229 | end 230 | # Handle other write errors and retry to write max. @retry_times. 231 | rescue => e 232 | # Handle StandbyException and do failover. Still we want to exit if write_tries >= @retry_times. 233 | if @standby_client && (e.message.match(/Failed to connect to host/) || e.message.match(/StandbyException/)) 234 | do_failover 235 | write_tries += 1 236 | retry 237 | end 238 | if write_tries < @retry_times 239 | @logger.warn("webhdfs write caused an exception: #{e.message}. Maybe you should increase retry_interval or reduce the number of workers. Retrying...") 240 | sleep(@retry_interval * write_tries) 241 | write_tries += 1 242 | retry 243 | else 244 | # Issue error after max retries. 245 | @logger.error("Max write retries reached. Events will be discarded.
Exception: #{e.message}") 246 | end 247 | end 248 | end 249 | 250 | def do_failover 251 | if not @standby_client 252 | return 253 | end 254 | @logger.warn("Failing over from #{@client.host}:#{@client.port} to #{@standby_client.host}:#{@standby_client.port}.") 255 | @client, @standby_client = @standby_client, @client 256 | end 257 | 258 | def close 259 | buffer_flush(:final => true) 260 | end # def close 261 | end # class LogStash::Outputs::WebHdfs 262 | -------------------------------------------------------------------------------- /lib/logstash/outputs/webhdfs_helper.rb: -------------------------------------------------------------------------------- 1 | require "logstash/namespace" 2 | 3 | module LogStash 4 | module Outputs 5 | module WebHdfsHelper 6 | 7 | # Load a module 8 | # @param module_name [String] A module name 9 | # @raise [LoadError] If the module could not be loaded 10 | def load_module(module_name) 11 | begin 12 | require module_name 13 | rescue LoadError 14 | @logger.error("Module #{module_name} could not be loaded.") 15 | raise 16 | end 17 | end 18 | 19 | # Set up a WebHDFS client 20 | # @param host [String] The WebHDFS location 21 | # @param port [Number] The port used for communication 22 | # @param username [String] A valid HDFS user 23 | # @return [WebHDFS] A configured client instance 24 | def prepare_client(host, port, username) 25 | client = WebHDFS::Client.new(host, port, username) 26 | if @use_kerberos_auth 27 | require 'gssapi' 28 | client.kerberos = true 29 | client.kerberos_keytab = @kerberos_keytab 30 | end 31 | if @use_ssl_auth 32 | require 'openssl' 33 | client.ssl = true 34 | client.ssl_key = OpenSSL::PKey::RSA.new(open(@ssl_key)) 35 | client.ssl_cert = OpenSSL::X509::Certificate.new(open(@ssl_cert)) 36 | end 37 | client.httpfs_mode = @use_httpfs 38 | client.open_timeout = @open_timeout 39 | client.read_timeout = @read_timeout 40 | client.retry_known_errors = @retry_known_errors 41 | client.retry_interval = @retry_interval if
@retry_interval 42 | client.retry_times = @retry_times if @retry_times 43 | client 44 | end 45 | # Test client connection. 46 | # @param client [WebHDFS] webhdfs client object. 47 | def test_client(client) 48 | begin 49 | client.list('/') 50 | rescue => e 51 | @logger.error("Webhdfs check request failed. (namenode: #{client.host}:#{client.port}, Exception: #{e.message})") 52 | raise 53 | end 54 | end 55 | 56 | # Compress data using the gzip methods. 57 | # @param data [String] stream of data to be compressed 58 | # @return [String] the compressed stream of data 59 | def compress_gzip(data) 60 | buffer = StringIO.new('','w') 61 | compressor = Zlib::GzipWriter.new(buffer) 62 | begin 63 | compressor.write(data) 64 | ensure 65 | compressor.close() 66 | end 67 | buffer.string 68 | end 69 | 70 | # Compress snappy file. 71 | # @param data [binary] stream of data to be compressed 72 | # @return [String] the compressed stream of data 73 | def compress_snappy_file(data) 74 | # Encode data to ASCII_8BIT (binary) 75 | data = data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace) 76 | buffer = StringIO.new('', 'w') 77 | buffer.set_encoding(Encoding::ASCII_8BIT) 78 | compressed = snappy_deflate(data) 79 | buffer << [compressed.size, compressed].pack("Na*") 80 | buffer.string 81 | end 82 | 83 | def snappy_deflate(input) 84 | raw_bytes = input.bytes.to_java :byte # needed to force the instance to be a byte[] and match the argument type in the subsequent Snappy call 85 | 86 | compressed = Java::org.xerial.snappy.Snappy.compress(raw_bytes) 87 | 88 | String.from_java_bytes(compressed) 89 | end 90 | 91 | def snappy_inflate(input) 92 | raw_bytes = input.bytes.to_java :byte # needed to force the instance to be a byte[] and match the argument type in the subsequent Snappy call 93 | uncompressed_length = Java::org.xerial.snappy.Snappy.uncompressedLength(raw_bytes, 0, raw_bytes.length) 94 | uncompressed = Java::byte[uncompressed_length].new 95 |
Java::org.xerial.snappy.Snappy.uncompress(raw_bytes, 0, raw_bytes.length, uncompressed, 0) 96 | 97 | String.from_java_bytes(uncompressed) 98 | end 99 | 100 | def compress_snappy_stream(data) 101 | # Encode data to ASCII_8BIT (binary) 102 | data = data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace) 103 | buffer = StringIO.new 104 | buffer.set_encoding(Encoding::ASCII_8BIT) 105 | chunks = data.scan(/.{1,#{@snappy_bufsize}}/m) 106 | chunks.each do |chunk| 107 | compressed = snappy_deflate(chunk) 108 | buffer << [chunk.size, compressed.size, compressed].pack("NNa*") 109 | end 110 | return buffer.string 111 | end 112 | 113 | def get_snappy_header! 114 | [MAGIC, DEFAULT_VERSION, MINIMUM_COMPATIBLE_VERSION].pack("a8NN") 115 | end 116 | 117 | end 118 | end 119 | end 120 | -------------------------------------------------------------------------------- /logstash-output-webhdfs.gemspec: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | Gem::Specification.new do |s| 3 | 4 | s.name = 'logstash-output-webhdfs' 5 | s.version = '3.1.0' 6 | s.licenses = ['Apache License (2.0)'] 7 | s.summary = "Sends Logstash events to HDFS using the `webhdfs` REST API" 8 | s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. 
This gem is not a stand-alone program" 9 | s.authors = ["Björn Puttmann, loshkovskyi, Elastic"] 10 | s.email = 'b.puttmann@dbap.de' 11 | s.homepage = "http://www.dbap.de" 12 | s.require_paths = ['lib', 'vendor/jar-dependencies'] 13 | 14 | # Files 15 | s.files = Dir["lib/**/*","spec/**/*","*.gemspec","*.md","CONTRIBUTORS","Gemfile","LICENSE","NOTICE.TXT", "vendor/jar-dependencies/**/*.jar", "vendor/jar-dependencies/**/*.rb", "VERSION", "docs/**/*"] 16 | 17 | # Tests 18 | s.test_files = s.files.grep(%r{^(test|spec|features)/}) 19 | 20 | # Special flag to let us know this is actually a logstash plugin 21 | s.metadata = { "logstash_plugin" => "true", "logstash_group" => "output" } 22 | 23 | # Gem dependencies 24 | s.add_runtime_dependency "logstash-core-plugin-api", ">= 1.60", "<= 2.99" 25 | s.add_runtime_dependency 'webhdfs' 26 | s.add_development_dependency 'logstash-devutils' 27 | 28 | s.add_development_dependency 'logstash-codec-line' 29 | s.add_development_dependency 'logstash-codec-json' 30 | 31 | s.platform = 'java' 32 | end 33 | -------------------------------------------------------------------------------- /spec/integration/webhdfs_spec.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require 'logstash/devutils/rspec/spec_helper' 3 | require 'logstash/outputs/webhdfs' 4 | require 'webhdfs' 5 | require 'json' 6 | 7 | describe LogStash::Outputs::WebHdfs, :integration => true do 8 | let(:host) { 'localhost' } 9 | let(:port) { 50070 } 10 | let(:user) { 'test' } 11 | let(:test_file) { '/user/' + user + '/%{host}.test' } 12 | let(:hdfs_file_name) { 'user/' + user + '/localhost.test' } 13 | 14 | let(:config) { { 'host' => host, 'user' => user, 'path' => test_file, 'compression' => 'none' } } 15 | 16 | subject(:plugin) { LogStash::Plugin.lookup("output", "webhdfs").new(config) } 17 | 18 | let(:webhdfs_client) { WebHDFS::Client.new(host, port, user) } 19 | 20 | let(:event) { LogStash::Event.new('message' => 
'Hello world!', 'source' => 'out of the blue', 21 | 'type' => 'generator', 'host' => 'localhost' ) } 22 | 23 | describe "register and close" do 24 | 25 | it 'should register with default values' do 26 | expect { subject.register }.to_not raise_error 27 | end 28 | 29 | end 30 | 31 | describe '#write' do 32 | 33 | let(:config) { { 'host' => host, 'user' => user, 'flush_size' => 10, 34 | 'path' => test_file, 'compression' => 'none' } } 35 | 36 | after(:each) do 37 | webhdfs_client.delete(hdfs_file_name) 38 | end 39 | 40 | describe "writing plain files" do 41 | 42 | before(:each) do 43 | subject.register 44 | subject.receive(event) 45 | subject.close 46 | end 47 | 48 | it 'should use the correct filename pattern' do 49 | expect { webhdfs_client.read(hdfs_file_name) }.to_not raise_error 50 | end 51 | 52 | context "using the line codec without format" do 53 | 54 | let(:config) { { 'host' => host, 'user' => user, 'flush_size' => 10, 55 | 'path' => test_file, 'compression' => 'none', 'codec' => 'line' } } 56 | 57 | it 'should match the event data' do 58 | expect(webhdfs_client.read(hdfs_file_name).strip).to eq(event.to_s) 59 | end 60 | 61 | end 62 | 63 | context "using the json codec" do 64 | 65 | let(:config) { { 'host' => host, 'user' => user, 'flush_size' => 10, 66 | 'path' => test_file, 'compression' => 'none', 'codec' => 'json' } } 67 | 68 | 69 | it 'should match the event data' do 70 | expect(webhdfs_client.read(hdfs_file_name).strip).to eq(event.to_json) 71 | end 72 | 73 | end 74 | 75 | context "when flushing events" do 76 | 77 | let(:config) { { 'host' => host, 'user' => user, 'flush_size' => 10, 'idle_flush_time' => 2, 78 | 'path' => test_file, 'compression' => 'none', 'codec' => 'json' } } 79 | 80 | before(:each) do 81 | webhdfs_client.delete(hdfs_file_name) 82 | end 83 | 84 | it 'should flush after configured idle time' do 85 | subject.register 86 | subject.receive(event) 87 | expect { webhdfs_client.read(hdfs_file_name) }.to
raise_error(WebHDFS::FileNotFoundError) 88 | sleep 3 89 | expect { webhdfs_client.read(hdfs_file_name) }.to_not raise_error 90 | expect(webhdfs_client.read(hdfs_file_name).strip).to eq(event.to_json) 91 | end 92 | 93 | end 94 | 95 | end 96 | 97 | describe "#compression" do 98 | 99 | before(:each) do 100 | subject.register 101 | 500.times do 102 | subject.receive(event) 103 | end 104 | subject.close 105 | end 106 | 107 | context "when using no compression" do 108 | 109 | let(:config) { { 'host' => host, 'user' => user, 'flush_size' => 10, 110 | 'path' => test_file, 'compression' => 'none', 'codec' => 'line' } } 111 | 112 | it 'should write some messages uncompressed' do 113 | expect(webhdfs_client.read(hdfs_file_name).lines.count).to eq(500) 114 | end 115 | 116 | end 117 | 118 | context "when using gzip compression" do 119 | 120 | let(:config) { { 'host' => host, 'user' => user, 121 | 'path' => test_file, 'compression' => 'gzip', 'codec' => 'line' } } 122 | 123 | it 'should write some messages gzip compressed' do 124 | expect(Zlib::Inflate.new(47).inflate(webhdfs_client.read("#{hdfs_file_name}.gz")).lines.count).to eq(500) # window bits 47 = 32 + 15, auto-detects the gzip header 125 | webhdfs_client.delete("#{hdfs_file_name}.gz") 126 | end 127 | end 128 | end 129 | end 130 | end -------------------------------------------------------------------------------- /spec/outputs/webhdfs_helper_spec.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require 'logstash/devutils/rspec/spec_helper' 3 | require 'logstash/outputs/webhdfs' 4 | require 'webhdfs' 5 | require 'logstash-output-webhdfs_jars' 6 | 7 | 8 | describe "webhdfs helpers" do 9 | 10 | let(:host) { 'localhost' } 11 | let(:user) { 'hadoop' } 12 | let(:path) { '/test.log' } 13 | 14 | let(:config) { { 'host' => host, 'user' => user, 'path' => path, 'compression' => 'none' } } 15 | 16 | let(:sample_data) { "Something very very very long to compress" } 17 | 18 | subject(:plugin) {
LogStash::Plugin.lookup("output", "webhdfs").new(config) } 19 | 20 | context "when compressing using vendor snappy" do 21 | it "should return a valid byte array" do 22 | compressed = subject.compress_snappy_file(sample_data) 23 | 24 | expect(compressed).not_to be_nil 25 | end 26 | 27 | it "should contain all the data" do 28 | compressed = subject.compress_snappy_file(sample_data) 29 | 30 | # strip the 4-byte (32-bit) length prefix added by compress_snappy_file 31 | uncompressed = subject.snappy_inflate(compressed[4..-1]) 32 | 33 | expect(uncompressed).to eq(sample_data) 34 | end 35 | end 36 | end 37 | 38 | -------------------------------------------------------------------------------- /spec/outputs/webhdfs_spec.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require 'logstash/devutils/rspec/spec_helper' 3 | require 'logstash/outputs/webhdfs' 4 | require 'webhdfs' 5 | require 'json' 6 | 7 | describe 'outputs/webhdfs' do 8 | 9 | let(:host) { 'localhost' } 10 | let(:user) { 'hadoop' } 11 | let(:path) { '/test.log' } 12 | 13 | let(:config) { { 'host' => host, 'user' => user, 'path' => path, 'compression' => 'none' } } 14 | 15 | subject(:plugin) { LogStash::Plugin.lookup("output", "webhdfs").new(config) } 16 | 17 | describe '#initializing' do 18 | 19 | it 'should fail to register without required values' do 20 | plugin = LogStash::Plugin.lookup("output", "webhdfs") 21 | expect { plugin.new }.to raise_error(LogStash::ConfigurationError) 22 | end 23 | 24 | context "default values" do 25 | 26 | it 'should have default port' do 27 | expect(subject.port).to eq(50070) 28 | end 29 | 30 | it 'should have default idle_flush_time' do 31 | expect(subject.idle_flush_time).to eq(1) 32 | end 33 | it 'should have default flush_size' do 34 | expect(subject.flush_size).to eq(500) 35 | end 36 | 37 | it 'should have default open_timeout' do 38 | expect(subject.open_timeout).to eq(30) 39 | end 40 |
41 | it 'should have default read_timeout' do 42 | expect(subject.read_timeout).to eq(30) 43 | end 44 | 45 | it 'should have default use_httpfs' do 46 | expect(subject.use_httpfs).to eq(false) 47 | end 48 | 49 | it 'should have default retry_known_errors' do 50 | expect(subject.retry_known_errors).to eq(true) 51 | end 52 | 53 | it 'should have default retry_interval' do 54 | expect(subject.retry_interval).to eq(0.5) 55 | end 56 | 57 | it 'should have default retry_times' do 58 | expect(subject.retry_times).to eq(5) 59 | end 60 | 61 | it 'should have default snappy_bufsize' do 62 | expect(subject.snappy_bufsize).to eq(32768) 63 | end 64 | 65 | it 'should have default snappy_format' do 66 | expect(subject.snappy_format).to eq('stream') 67 | end 68 | 69 | end 70 | end 71 | end 72 | --------------------------------------------------------------------------------
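The gzip path through `webhdfs_helper.rb` can be exercised outside of Logstash and Hadoop. The sketch below is a minimal, standalone reconstruction of the `compress_gzip` helper, paired with the same header-auto-detecting inflate the integration spec uses to read compressed files back; the sample string is illustrative only.

```ruby
require 'zlib'
require 'stringio'

# Standalone sketch of the plugin's compress_gzip helper: write the data
# through a GzipWriter into an in-memory buffer and return the raw gzip
# bytes (mirrors lib/logstash/outputs/webhdfs_helper.rb).
def compress_gzip(data)
  buffer = StringIO.new('', 'w')
  compressor = Zlib::GzipWriter.new(buffer)
  begin
    compressor.write(data)
  ensure
    compressor.close
  end
  buffer.string
end

data = "hello webhdfs\n" * 3
compressed = compress_gzip(data)

# Window bits of 47 (32 + 15) tell Zlib::Inflate to auto-detect the gzip
# header, the same trick the integration spec relies on.
restored = Zlib::Inflate.new(47).inflate(compressed)
```

Because `Zlib::Inflate.new(47)` recognizes the gzip header on its own, the specs can verify files written with `compression => 'gzip'` without constructing a separate `Zlib::GzipReader`.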