├── .github ├── CONTRIBUTING.md ├── ISSUE_TEMPLATE.md └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── .travis.yml ├── CHANGELOG.md ├── CONTRIBUTORS ├── Gemfile ├── LICENSE ├── NOTICE.TXT ├── README.md ├── Rakefile ├── docs └── index.asciidoc ├── lib └── logstash │ └── inputs │ ├── s3.rb │ └── s3 │ └── patch.rb ├── logstash-input-s3.gemspec └── spec ├── fixtures ├── cloudfront.log ├── compressed.log.gee.zip ├── compressed.log.gz ├── compressed.log.gzip ├── invalid_utf8.gbk.log ├── json.log ├── json_with_message.log ├── multiline.log ├── multiple_compressed_streams.gz └── uncompressed.log ├── inputs ├── s3_spec.rb └── sincedb_spec.rb ├── integration └── s3_spec.rb └── support └── helpers.rb /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Logstash 2 | 3 | All contributions are welcome: ideas, patches, documentation, bug reports, 4 | complaints, etc! 5 | 6 | Programming is not a required skill, and there are many ways to help out! 7 | It is more important to us that you are able to contribute. 8 | 9 | That said, some basic guidelines, which you are free to ignore :) 10 | 11 | ## Want to learn? 12 | 13 | Want to lurk about and see what others are doing with Logstash? 14 | 15 | * The irc channel (#logstash on irc.freenode.org) is a good place for this 16 | * The [forum](https://discuss.elastic.co/c/logstash) is also 17 | great for learning from others. 18 | 19 | ## Got Questions? 20 | 21 | Have a problem you want Logstash to solve for you? 22 | 23 | * You can ask a question in the [forum](https://discuss.elastic.co/c/logstash) 24 | * Alternately, you are welcome to join the IRC channel #logstash on 25 | irc.freenode.org and ask for help there! 26 | 27 | ## Have an Idea or Feature Request? 28 | 29 | * File a ticket on [GitHub](https://github.com/elastic/logstash/issues). Please remember that GitHub is used only for issues and feature requests. If you have a general question, the [forum](https://discuss.elastic.co/c/logstash) or IRC would be the best place to ask. 30 | 31 | ## Something Not Working? Found a Bug? 32 | 33 | If you think you found a bug, it probably is a bug. 34 | 35 | * If it is a general Logstash or a pipeline issue, file it in [Logstash GitHub](https://github.com/elasticsearch/logstash/issues) 36 | * If it is specific to a plugin, please file it in the respective repository under [logstash-plugins](https://github.com/logstash-plugins) 37 | * or ask the [forum](https://discuss.elastic.co/c/logstash). 38 | 39 | # Contributing Documentation and Code Changes 40 | 41 | If you have a bugfix or new feature that you would like to contribute to 42 | logstash, and you think it will take more than a few minutes to produce the fix 43 | (ie; write code), it is worth discussing the change with the Logstash users and developers first! You can reach us via [GitHub](https://github.com/elastic/logstash/issues), the [forum](https://discuss.elastic.co/c/logstash), or via IRC (#logstash on freenode irc) 44 | Please note that Pull Requests without tests will not be merged. If you would like to contribute but do not have experience with writing tests, please ping us on IRC/forum or create a PR and ask our help. 45 | 46 | ## Contributing to plugins 47 | 48 | Check our [documentation](https://www.elastic.co/guide/en/logstash/current/contributing-to-logstash.html) on how to contribute to plugins or write your own! It is super easy! 49 | 50 | ## Contribution Steps 51 | 52 | 1. Test your changes! 
[Run](https://github.com/elastic/logstash#testing) the test suite 53 | 2. Please make sure you have signed our [Contributor License 54 | Agreement](https://www.elastic.co/contributor-agreement/). We are not 55 | asking you to assign copyright to us, but to give us the right to distribute 56 | your code without restriction. We ask this of all contributors in order to 57 | assure our users of the origin and continuing existence of the code. You 58 | only need to sign the CLA once. 59 | 3. Send a pull request! Push your changes to your fork of the repository and 60 | [submit a pull 61 | request](https://help.github.com/articles/using-pull-requests). In the pull 62 | request, describe what your changes do and mention any bugs/issues related 63 | to the pull request. 64 | 65 | 66 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Please post all product and debugging questions on our [forum](https://discuss.elastic.co/c/logstash). Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here. 2 | 3 | For all general issues, please provide the following details for fast resolution: 4 | 5 | - Version: 6 | - Operating System: 7 | - Config File (if you have sensitive info, please remove it): 8 | - Sample Data: 9 | - Steps to Reproduce: 10 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Thanks for contributing to Logstash! If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/ 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.gem 2 | Gemfile.lock 3 | .bundle 4 | vendor 5 | coverage/ 6 | .idea/* 7 | .ruby-version 8 | .github/ 9 | .gradle/ 10 | .rakeTasks 11 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | import: 2 | - logstash-plugins/.ci:travis/travis.yml@1.x -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | ## 3.8.4 2 | - Refactoring, reuse code to manage `additional_settings` from mixin-aws [#237](https://github.com/logstash-plugins/logstash-input-s3/pull/237) 3 | 4 | ## 3.8.3 5 | - Fix missing `metadata` and `type` of the last event [#223](https://github.com/logstash-plugins/logstash-input-s3/pull/223) 6 | 7 | ## 3.8.2 8 | - Refactor: read sincedb time once per bucket listing [#233](https://github.com/logstash-plugins/logstash-input-s3/pull/233) 9 | 10 | ## 3.8.1 11 | - Feat: cast true/false values for additional_settings [#232](https://github.com/logstash-plugins/logstash-input-s3/pull/232) 12 | 13 | ## 3.8.0 14 | - Add ECS v8 support. 15 | 16 | ## 3.7.0 17 | - Add ECS support. [#228](https://github.com/logstash-plugins/logstash-input-s3/pull/228) 18 | - Fix missing file in cutoff time change. [#224](https://github.com/logstash-plugins/logstash-input-s3/pull/224) 19 | 20 | ## 3.6.0 21 | - Fixed unprocessed file with the same `last_modified` in ingestion. 
[#220](https://github.com/logstash-plugins/logstash-input-s3/pull/220) 22 | 23 | ## 3.5.2 24 | - [DOC]Added note that only AWS S3 is supported. No other S3 compatible storage solutions are supported. [#208](https://github.com/logstash-plugins/logstash-input-s3/issues/208) 25 | 26 | ## 3.5.1 27 | - [DOC]Added example for `exclude_pattern` and reordered option descriptions [#204](https://github.com/logstash-plugins/logstash-input-s3/issues/204) 28 | 29 | ## 3.5.0 30 | - Added support for including objects restored from Glacier or Glacier Deep [#199](https://github.com/logstash-plugins/logstash-input-s3/issues/199) 31 | - Added `gzip_pattern` option, enabling more flexible determination of whether a file is gzipped [#165](https://github.com/logstash-plugins/logstash-input-s3/issues/165) 32 | - Refactor: log exception: class + unify logging messages a bit [#201](https://github.com/logstash-plugins/logstash-input-s3/pull/201) 33 | 34 | ## 3.4.1 35 | - Fixed link formatting for input type (documentation) 36 | 37 | ## 3.4.0 38 | - Skips objects that are archived to AWS Glacier with a helpful log message (previously they would log as matched, but then fail to load events) [#160](https://github.com/logstash-plugins/logstash-input-s3/pull/160) 39 | - Added `watch_for_new_files` option, enabling single-batch imports [#159](https://github.com/logstash-plugins/logstash-input-s3/pull/159) 40 | 41 | ## 3.3.7 42 | - Added ability to optionally include S3 object properties inside @metadata [#155](https://github.com/logstash-plugins/logstash-input-s3/pull/155) 43 | 44 | ## 3.3.6 45 | - Fixed error in documentation by removing illegal commas [#154](https://github.com/logstash-plugins/logstash-input-s3/pull/154) 46 | 47 | ## 3.3.5 48 | - [#136](https://github.com/logstash-plugins/logstash-input-s3/pull/136) Avoid plugin crashes when encountering 'bad' files in S3 buckets 49 | 50 | ## 3.3.4 51 | - Log entry when bucket is empty #150 52 | 53 | ## 3.3.3 54 | - Symbolize hash keys for additional_settings hash #148 55 | 56 | ## 3.3.2 57 | - Docs: Set the default_codec doc attribute. 58 | 59 | ## 3.3.1 60 | - Improve error handling when listing/downloading from S3 #144 61 | 62 | ## 3.3.0 63 | - Add documentation for endpoint, role_arn and role_session_name #142 64 | - Add support for additional_settings option #141 65 | 66 | ## 3.2.0 67 | - Add support for auto-detecting gzip files with `.gzip` extension, in addition to existing support for `*.gz` 68 | - Improve performance of gzip decoding by 10x by using Java's Zlib 69 | 70 | ## 3.1.9 71 | - Change default sincedb path to live in `{path.data}/plugins/inputs/s3` instead of $HOME. 72 | Prior Logstash installations (using $HOME default) are automatically migrated. 73 | - Don't download the file if the length is 0 #2 74 | 75 | ## 3.1.8 76 | - Update gemspec summary 77 | 78 | ## 3.1.7 79 | - Fix missing last multi-line entry #120 80 | 81 | ## 3.1.6 82 | - Fix some documentation issues 83 | 84 | ## 3.1.4 85 | - Avoid parsing non string elements #109 86 | 87 | ## 3.1.3 88 | - The plugin will now include the s3 key in the metadata #105 89 | 90 | ## 3.1.2 91 | - Fix an issue when the remote file contains multiple blob of gz in the same file #101 92 | - Make the integration suite run 93 | - Remove uneeded development dependency 94 | 95 | ## 3.1.1 96 | - Relax constraint on logstash-core-plugin-api to >= 1.60 <= 2.99 97 | 98 | ## 3.1.0 99 | - breaking,config: Remove deprecated config `credentials` and `region_endpoint`. Please use AWS mixin. 
100 | 101 | ## 3.0.1 102 | - Republish all the gems under jruby. 103 | 104 | ## 3.0.0 105 | - Update the plugin to the version 2.0 of the plugin api, this change is required for Logstash 5.0 compatibility. See https://github.com/elastic/logstash/issues/5141 106 | 107 | ## 2.0.6 108 | - Depend on logstash-core-plugin-api instead of logstash-core, removing the need to mass update plugins on major releases of logstash 109 | 110 | ## 2.0.5 111 | - New dependency requirements for logstash-core for the 5.0 release 112 | 113 | ## 2.0.4 114 | - Fix for Error: No Such Key problem when deleting 115 | 116 | ## 2.0.3 117 | - Do not raise an exception if the sincedb file is empty, instead return the current time #66 118 | 119 | ## 2.0.0 120 | - Plugins were updated to follow the new shutdown semantic, this mainly allows Logstash to instruct input plugins to terminate gracefully, 121 | instead of using Thread.raise on the plugins' threads. Ref: https://github.com/elastic/logstash/pull/3895 122 | - Dependency on logstash-core update to 2.0 123 | 124 | -------------------------------------------------------------------------------- /CONTRIBUTORS: -------------------------------------------------------------------------------- 1 | The following is a list of people who have contributed ideas, code, bug 2 | reports, or in general have helped logstash along its way. 3 | 4 | Contributors: 5 | * Aaron Mildenstein (untergeek) 6 | * Adam Tucker (adamjt) 7 | * John Pariseau (ururk) 8 | * Jordan Sissel (jordansissel) 9 | * Mathieu Guillaume (mguillaume) 10 | * Pier-Hugues Pellerin (ph) 11 | * Richard Pijnenburg (electrical) 12 | * Suyog Rao (suyograo) 13 | * Ted Timmons (tedder) 14 | * Ryan O'Keeffe (danielredoak) 15 | 16 | Note: If you've sent us patches, bug reports, or otherwise contributed to 17 | Logstash, and you aren't on the list above and want to be, please let us know 18 | and we'll make sure you're here. Contributions from folks like you are what make 19 | open source awesome. 20 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source 'https://rubygems.org' 2 | 3 | gemspec 4 | 5 | logstash_path = ENV["LOGSTASH_PATH"] || "../../logstash" 6 | use_logstash_source = ENV["LOGSTASH_SOURCE"] && ENV["LOGSTASH_SOURCE"].to_s == "1" 7 | 8 | if Dir.exist?(logstash_path) && use_logstash_source 9 | gem 'logstash-core', :path => "#{logstash_path}/logstash-core" 10 | gem 'logstash-core-plugin-api', :path => "#{logstash_path}/logstash-core-plugin-api" 11 | end 12 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. 
For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 
135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright 2020 Elastic and contributors 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 
194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /NOTICE.TXT: -------------------------------------------------------------------------------- 1 | Elasticsearch 2 | Copyright 2012-2015 Elasticsearch 3 | 4 | This product includes software developed by The Apache Software 5 | Foundation (http://www.apache.org/). -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Logstash Plugin 2 | 3 | [![Travis Build Status](https://travis-ci.com/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.com/logstash-plugins/logstash-input-s3) 4 | 5 | This is a plugin for [Logstash](https://github.com/elastic/logstash). 6 | 7 | It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way. 8 | 9 | ## Required S3 Permissions 10 | 11 | This plugin reads from your S3 bucket, and would require the following 12 | permissions applied to the AWS IAM Policy being used: 13 | 14 | * `s3:ListBucket` to check if the S3 bucket exists and list objects in it. 15 | * `s3:GetObject` to check object metadata and download objects from S3 buckets. 16 | 17 | You might also need `s3:DeleteObject` when setting S3 input to delete on read. 18 | And the `s3:CreateBucket` permission to create a backup bucket unless already 19 | exists. 20 | In addition, when `backup_to_bucket` is used, the `s3:PutObject` action is also required. 21 | 22 | For buckets that have versioning enabled, you might need to add additional 23 | permissions. 24 | 25 | More information about S3 permissions can be found at - 26 | http://docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html 27 | 28 | ## Documentation 29 | 30 | Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/). 31 | 32 | - For formatting code or config example, you can use the asciidoc `[source,ruby]` directive 33 | - For more asciidoc formatting tips, see the excellent reference here https://github.com/elastic/docs#asciidoc-guide 34 | 35 | ## Need Help? 36 | 37 | Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. 38 | 39 | ## Developing 40 | 41 | ### 1. Plugin Development and Testing 42 | 43 | #### Code 44 | - To get started, you'll need JRuby with the Bundler gem installed. 45 | 46 | - Create a new plugin or clone and existing from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example). 
47 | 48 | - Install dependencies 49 | ```sh 50 | bundle install 51 | ``` 52 | 53 | #### Test 54 | 55 | - Update your dependencies 56 | 57 | ```sh 58 | bundle install 59 | ``` 60 | 61 | - Run tests 62 | 63 | ```sh 64 | bundle exec rspec 65 | ``` 66 | 67 | ### 2. Running your unpublished Plugin in Logstash 68 | 69 | #### 2.1 Run in a local Logstash clone 70 | 71 | - Edit Logstash `Gemfile` and add the local plugin path, for example: 72 | ```ruby 73 | gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome" 74 | ``` 75 | - Install plugin 76 | ```sh 77 | # Logstash 2.3 and higher 78 | bin/logstash-plugin install --no-verify 79 | 80 | # Prior to Logstash 2.3 81 | bin/plugin install --no-verify 82 | 83 | ``` 84 | - Run Logstash with your plugin 85 | ```sh 86 | bin/logstash -e 'filter {awesome {}}' 87 | ``` 88 | At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash. 89 | 90 | #### 2.2 Run in an installed Logstash 91 | 92 | You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory or you can build the gem and install it using: 93 | 94 | - Build your plugin gem 95 | ```sh 96 | gem build logstash-filter-awesome.gemspec 97 | ``` 98 | - Install the plugin from the Logstash home 99 | ```sh 100 | # Logstash 2.3 and higher 101 | bin/logstash-plugin install --no-verify 102 | 103 | # Prior to Logstash 2.3 104 | bin/plugin install --no-verify 105 | 106 | ``` 107 | - Start Logstash and proceed to test the plugin 108 | 109 | ## Contributing 110 | 111 | All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin. 112 | 113 | Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here. 114 | 115 | It is more important to the community that you are able to contribute. 116 | 117 | For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file. 118 | -------------------------------------------------------------------------------- /Rakefile: -------------------------------------------------------------------------------- 1 | @files=[] 2 | 3 | task :default do 4 | system("rake -T") 5 | end 6 | 7 | require "logstash/devutils/rake" 8 | -------------------------------------------------------------------------------- /docs/index.asciidoc: -------------------------------------------------------------------------------- 1 | :plugin: s3 2 | :type: input 3 | :default_codec: plain 4 | 5 | /////////////////////////////////////////// 6 | START - GENERATED VARIABLES, DO NOT EDIT! 7 | /////////////////////////////////////////// 8 | :version: %VERSION% 9 | :release_date: %RELEASE_DATE% 10 | :changelog_url: %CHANGELOG_URL% 11 | :include_path: ../../../../logstash/docs/include 12 | /////////////////////////////////////////// 13 | END - GENERATED VARIABLES, DO NOT EDIT! 14 | /////////////////////////////////////////// 15 | 16 | [id="plugins-{type}s-{plugin}"] 17 | 18 | === S3 input plugin 19 | 20 | include::{include_path}/plugin_header.asciidoc[] 21 | 22 | ==== Description 23 | 24 | Stream events from files from a S3 bucket. 25 | 26 | IMPORTANT: The S3 input plugin only supports AWS S3. 27 | Other S3 compatible storage solutions are not supported. 
28 | 29 | Each line from each file generates an event. 30 | Files ending in `.gz` are handled as gzip'ed files. 31 | 32 | Files that are archived to AWS Glacier will be skipped. 33 | 34 | [id="plugins-{type}s-{plugin}-ecs_metadata"] 35 | ==== Event Metadata and the Elastic Common Schema (ECS) 36 | This plugin adds cloudfront metadata to event. 37 | When ECS compatibility is disabled, the value is stored in the root level. 38 | When ECS is enabled, the value is stored in the `@metadata` where it can be used by other plugins in your pipeline. 39 | 40 | Here’s how ECS compatibility mode affects output. 41 | [cols="> described later. 53 | 54 | [cols="<,<,<",options="header",] 55 | |======================================================================= 56 | |Setting |Input type|Required 57 | | <> |<>|No 58 | | <> |<>|No 59 | | <> |<>|No 60 | | <> |<>|No 61 | | <> |<>|No 62 | | <> |<>|No 63 | | <> |<>|Yes 64 | | <> |<>|No 65 | | <> |<>|No 66 | | <> |<>|No 67 | | <> |<>|No 68 | | <> |<>|No 69 | | <> |<>|No 70 | | <> |<>|No 71 | | <> |<>|No 72 | | <> |<>|No 73 | | <> |<>|No 74 | | <> |<>|No 75 | | <> |<>|No 76 | | <> |<>|No 77 | | <> |<>|No 78 | | <> |<>|No 79 | | <> |<>|No 80 | | <> |<>|No 81 | |======================================================================= 82 | 83 | Also see <> for a list of options supported by all 84 | input plugins. 85 | 86 |   87 | 88 | [id="plugins-{type}s-{plugin}-access_key_id"] 89 | ===== `access_key_id` 90 | 91 | * Value type is <> 92 | * There is no default value for this setting. 93 | 94 | This plugin uses the AWS SDK and supports several ways to get credentials, which will be tried in this order: 95 | 96 | 1. Static configuration, using `access_key_id` and `secret_access_key` params in logstash plugin config 97 | 2. External credentials file specified by `aws_credentials_file` 98 | 3. Environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` 99 | 4. Environment variables `AMAZON_ACCESS_KEY_ID` and `AMAZON_SECRET_ACCESS_KEY` 100 | 5. IAM Instance Profile (available when running inside EC2) 101 | 102 | 103 | [id="plugins-{type}s-{plugin}-additional_settings"] 104 | ===== `additional_settings` 105 | 106 | * Value type is <> 107 | * Default value is `{}` 108 | 109 | Key-value pairs of settings and corresponding values used to parametrize 110 | the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example: 111 | 112 | [source,ruby] 113 | input { 114 | s3 { 115 | access_key_id => "1234" 116 | secret_access_key => "secret" 117 | bucket => "logstash-test" 118 | additional_settings => { 119 | force_path_style => true 120 | follow_redirects => false 121 | } 122 | } 123 | } 124 | 125 | [id="plugins-{type}s-{plugin}-aws_credentials_file"] 126 | ===== `aws_credentials_file` 127 | 128 | * Value type is <> 129 | * There is no default value for this setting. 130 | 131 | Path to YAML file containing a hash of AWS credentials. 132 | This file will only be loaded if `access_key_id` and 133 | `secret_access_key` aren't set. 
The contents of the 134 | file should look like this: 135 | 136 | [source,ruby] 137 | ---------------------------------- 138 | :access_key_id: "12345" 139 | :secret_access_key: "54321" 140 | ---------------------------------- 141 | 142 | 143 | [id="plugins-{type}s-{plugin}-backup_add_prefix"] 144 | ===== `backup_add_prefix` 145 | 146 | * Value type is <> 147 | * Default value is `nil` 148 | 149 | Append a prefix to the key (full path including file name in s3) after processing. 150 | If backing up to another (or the same) bucket, this effectively lets you 151 | choose a new 'folder' to place the files in 152 | 153 | [id="plugins-{type}s-{plugin}-backup_to_bucket"] 154 | ===== `backup_to_bucket` 155 | 156 | * Value type is <> 157 | * Default value is `nil` 158 | 159 | Name of a S3 bucket to backup processed files to. 160 | 161 | [id="plugins-{type}s-{plugin}-backup_to_dir"] 162 | ===== `backup_to_dir` 163 | 164 | * Value type is <> 165 | * Default value is `nil` 166 | 167 | Path of a local directory to backup processed files to. 168 | 169 | [id="plugins-{type}s-{plugin}-bucket"] 170 | ===== `bucket` 171 | 172 | * This is a required setting. 173 | * Value type is <> 174 | * There is no default value for this setting. 175 | 176 | The name of the S3 bucket. 177 | 178 | [id="plugins-{type}s-{plugin}-delete"] 179 | ===== `delete` 180 | 181 | * Value type is <> 182 | * Default value is `false` 183 | 184 | Whether to delete processed files from the original bucket. 185 | 186 | [id="plugins-{type}s-{plugin}-ecs_compatibility"] 187 | ===== `ecs_compatibility` 188 | 189 | * Value type is <> 190 | * Supported values are: 191 | ** `disabled`: does not use ECS-compatible field names 192 | ** `v1`,`v8`: uses metadata fields that are compatible with Elastic Common Schema 193 | 194 | Controls this plugin's compatibility with the 195 | {ecs-ref}[Elastic Common Schema (ECS)]. 196 | See <> for detailed information. 197 | 198 | [id="plugins-{type}s-{plugin}-endpoint"] 199 | ===== `endpoint` 200 | 201 | * Value type is <> 202 | * There is no default value for this setting. 203 | 204 | The endpoint to connect to. By default it is constructed using the value of `region`. 205 | This is useful when connecting to S3 compatible services, but beware that these aren't 206 | guaranteed to work correctly with the AWS SDK. 207 | 208 | [id="plugins-{type}s-{plugin}-exclude_pattern"] 209 | ===== `exclude_pattern` 210 | 211 | * Value type is <> 212 | * Default value is `nil` 213 | 214 | Ruby style regexp of keys to exclude from the bucket. 215 | 216 | Note that files matching the pattern are skipped _after_ they have been listed. 217 | Consider using <> instead where possible. 218 | 219 | Example: 220 | 221 | [source,ruby] 222 | ----- 223 | "exclude_pattern" => "\/2020\/04\/" 224 | ----- 225 | 226 | This pattern excludes all logs containing "/2020/04/" in the path. 227 | 228 | 229 | [id="plugins-{type}s-{plugin}-gzip_pattern"] 230 | ===== `gzip_pattern` 231 | 232 | * Value type is <> 233 | * Default value is `"\.gz(ip)?$"` 234 | 235 | Regular expression used to determine whether an input file is in gzip format. 236 | 237 | [id="plugins-{type}s-{plugin}-include_object_properties"] 238 | ===== `include_object_properties` 239 | 240 | * Value type is <> 241 | * Default value is `false` 242 | 243 | Whether or not to include the S3 object's properties (last_modified, content_type, metadata) into each Event at 244 | `[@metadata][s3]`. Regardless of this setting, `[@metadata][s3][key]` will always be present. 
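For example, the object key that is always recorded at `[@metadata][s3][key]` can be copied into the event itself by a later filter (the target field name is arbitrary):

[source,ruby]
-----
filter {
  mutate {
    add_field => { "s3_object_key" => "%{[@metadata][s3][key]}" }
  }
}
-----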
245 | 246 | [id="plugins-{type}s-{plugin}-interval"] 247 | ===== `interval` 248 | 249 | * Value type is <> 250 | * Default value is `60` 251 | 252 | Interval to wait between to check the file list again after a run is finished. 253 | Value is in seconds. 254 | 255 | [id="plugins-{type}s-{plugin}-prefix"] 256 | ===== `prefix` 257 | 258 | * Value type is <> 259 | * Default value is `nil` 260 | 261 | If specified, the prefix of filenames in the bucket must match (not a regexp) 262 | 263 | [id="plugins-{type}s-{plugin}-proxy_uri"] 264 | ===== `proxy_uri` 265 | 266 | * Value type is <> 267 | * There is no default value for this setting. 268 | 269 | URI to proxy server if required 270 | 271 | [id="plugins-{type}s-{plugin}-region"] 272 | ===== `region` 273 | 274 | * Value type is <> 275 | * Default value is `"us-east-1"` 276 | 277 | The AWS Region 278 | 279 | [id="plugins-{type}s-{plugin}-role_arn"] 280 | ===== `role_arn` 281 | 282 | * Value type is <> 283 | * There is no default value for this setting. 284 | 285 | The AWS IAM Role to assume, if any. 286 | This is used to generate temporary credentials, typically for cross-account access. 287 | See the https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html[AssumeRole API documentation] for more information. 288 | 289 | [id="plugins-{type}s-{plugin}-role_session_name"] 290 | ===== `role_session_name` 291 | 292 | * Value type is <> 293 | * Default value is `"logstash"` 294 | 295 | Session name to use when assuming an IAM role. 296 | 297 | [id="plugins-{type}s-{plugin}-secret_access_key"] 298 | ===== `secret_access_key` 299 | 300 | * Value type is <> 301 | * There is no default value for this setting. 302 | 303 | The AWS Secret Access Key 304 | 305 | [id="plugins-{type}s-{plugin}-session_token"] 306 | ===== `session_token` 307 | 308 | * Value type is <> 309 | * There is no default value for this setting. 310 | 311 | The AWS Session token for temporary credential 312 | 313 | [id="plugins-{type}s-{plugin}-sincedb_path"] 314 | ===== `sincedb_path` 315 | 316 | * Value type is <> 317 | * Default value is `nil` 318 | 319 | Where to write the since database (keeps track of the date 320 | the last handled file was added to S3). The default will write 321 | sincedb files to in the directory '{path.data}/plugins/inputs/s3/' 322 | 323 | If specified, this setting must be a filename path and not just a directory. 324 | 325 | [id="plugins-{type}s-{plugin}-temporary_directory"] 326 | ===== `temporary_directory` 327 | 328 | * Value type is <> 329 | * Default value is `"/tmp/logstash"` 330 | 331 | Set the directory where logstash will store the tmp files before processing them. 332 | 333 | [id="plugins-{type}s-{plugin}-watch_for_new_files"] 334 | ===== `watch_for_new_files` 335 | 336 | * Value type is <> 337 | * Default value is `true` 338 | 339 | Whether or not to watch for new files. 340 | Disabling this option causes the input to close itself after processing the files from a single listing. 
341 | 342 | [id="plugins-{type}s-{plugin}-common-options"] 343 | include::{include_path}/{type}.asciidoc[] 344 | 345 | :default_codec!: 346 | -------------------------------------------------------------------------------- /lib/logstash/inputs/s3.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require "logstash/inputs/base" 3 | require "logstash/namespace" 4 | require "logstash/plugin_mixins/aws_config" 5 | require "time" 6 | require "date" 7 | require "tmpdir" 8 | require "stud/interval" 9 | require "stud/temporary" 10 | require "aws-sdk" 11 | require "logstash/inputs/s3/patch" 12 | require "logstash/plugin_mixins/ecs_compatibility_support" 13 | 14 | require 'java' 15 | 16 | Aws.eager_autoload! 17 | # Stream events from files from a S3 bucket. 18 | # 19 | # Each line from each file generates an event. 20 | # Files ending in `.gz` are handled as gzip'ed files. 21 | class LogStash::Inputs::S3 < LogStash::Inputs::Base 22 | 23 | java_import java.io.InputStream 24 | java_import java.io.InputStreamReader 25 | java_import java.io.FileInputStream 26 | java_import java.io.BufferedReader 27 | java_import java.util.zip.GZIPInputStream 28 | java_import java.util.zip.ZipException 29 | 30 | include LogStash::PluginMixins::AwsConfig::V2 31 | include LogStash::PluginMixins::ECSCompatibilitySupport(:disabled, :v1, :v8 => :v1) 32 | 33 | config_name "s3" 34 | 35 | default :codec, "plain" 36 | 37 | # The name of the S3 bucket. 38 | config :bucket, :validate => :string, :required => true 39 | 40 | # If specified, the prefix of filenames in the bucket must match (not a regexp) 41 | config :prefix, :validate => :string, :default => nil 42 | 43 | config :additional_settings, :validate => :hash, :default => {} 44 | 45 | # The path to use for writing state. The state stored by this plugin is 46 | # a memory of files already processed by this plugin. 47 | # 48 | # If not specified, the default is in `{path.data}/plugins/inputs/s3/...` 49 | # 50 | # Should be a path with filename not just a directory. 51 | config :sincedb_path, :validate => :string, :default => nil 52 | 53 | # Name of a S3 bucket to backup processed files to. 54 | config :backup_to_bucket, :validate => :string, :default => nil 55 | 56 | # Append a prefix to the key (full path including file name in s3) after processing. 57 | # If backing up to another (or the same) bucket, this effectively lets you 58 | # choose a new 'folder' to place the files in 59 | config :backup_add_prefix, :validate => :string, :default => nil 60 | 61 | # Path of a local directory to backup processed files to. 62 | config :backup_to_dir, :validate => :string, :default => nil 63 | 64 | # Whether to delete processed files from the original bucket. 65 | config :delete, :validate => :boolean, :default => false 66 | 67 | # Interval to wait between to check the file list again after a run is finished. 68 | # Value is in seconds. 69 | config :interval, :validate => :number, :default => 60 70 | 71 | # Whether to watch for new files with the interval. 72 | # If false, overrides any interval and only lists the s3 bucket once. 73 | config :watch_for_new_files, :validate => :boolean, :default => true 74 | 75 | # Ruby style regexp of keys to exclude from the bucket 76 | config :exclude_pattern, :validate => :string, :default => nil 77 | 78 | # Set the directory where logstash will store the tmp files before processing them. 
79 | # default to the current OS temporary directory in linux /tmp/logstash 80 | config :temporary_directory, :validate => :string, :default => File.join(Dir.tmpdir, "logstash") 81 | 82 | # Whether or not to include the S3 object's properties (last_modified, content_type, metadata) 83 | # into each Event at [@metadata][s3]. Regardless of this setting, [@metdata][s3][key] will always 84 | # be present. 85 | config :include_object_properties, :validate => :boolean, :default => false 86 | 87 | # Regular expression used to determine whether an input file is in gzip format. 88 | # default to an expression that matches *.gz and *.gzip file extensions 89 | config :gzip_pattern, :validate => :string, :default => "\.gz(ip)?$" 90 | 91 | CUTOFF_SECOND = 3 92 | 93 | def initialize(*params) 94 | super 95 | @cloudfront_fields_key = ecs_select[disabled: 'cloudfront_fields', v1: '[@metadata][s3][cloudfront][fields]'] 96 | @cloudfront_version_key = ecs_select[disabled: 'cloudfront_version', v1: '[@metadata][s3][cloudfront][version]'] 97 | end 98 | 99 | def register 100 | require "fileutils" 101 | require "digest/md5" 102 | require "aws-sdk-resources" 103 | 104 | @logger.info("Registering", :bucket => @bucket, :region => @region) 105 | 106 | s3 = get_s3object 107 | 108 | @s3bucket = s3.bucket(@bucket) 109 | 110 | unless @backup_to_bucket.nil? 111 | @backup_bucket = s3.bucket(@backup_to_bucket) 112 | begin 113 | s3.client.head_bucket({ :bucket => @backup_to_bucket}) 114 | rescue Aws::S3::Errors::NoSuchBucket 115 | s3.create_bucket({ :bucket => @backup_to_bucket}) 116 | end 117 | end 118 | 119 | unless @backup_to_dir.nil? 120 | Dir.mkdir(@backup_to_dir, 0700) unless File.exists?(@backup_to_dir) 121 | end 122 | 123 | FileUtils.mkdir_p(@temporary_directory) unless Dir.exist?(@temporary_directory) 124 | 125 | if !@watch_for_new_files && original_params.include?('interval') 126 | logger.warn("`watch_for_new_files` has been disabled; `interval` directive will be ignored.") 127 | end 128 | end 129 | 130 | def run(queue) 131 | @current_thread = Thread.current 132 | Stud.interval(@interval) do 133 | process_files(queue) 134 | stop unless @watch_for_new_files 135 | end 136 | end # def run 137 | 138 | def list_new_files 139 | objects = [] 140 | found = false 141 | current_time = Time.now 142 | sincedb_time = sincedb.read 143 | begin 144 | @s3bucket.objects(:prefix => @prefix).each do |log| 145 | found = true 146 | @logger.debug('Found key', :key => log.key) 147 | if ignore_filename?(log.key) 148 | @logger.debug('Ignoring', :key => log.key) 149 | elsif log.content_length <= 0 150 | @logger.debug('Object Zero Length', :key => log.key) 151 | elsif log.last_modified <= sincedb_time 152 | @logger.debug('Object Not Modified', :key => log.key) 153 | elsif log.last_modified > (current_time - CUTOFF_SECOND).utc # file modified within last two seconds will be processed in next cycle 154 | @logger.debug('Object Modified After Cutoff Time', :key => log.key) 155 | elsif (log.storage_class == 'GLACIER' || log.storage_class == 'DEEP_ARCHIVE') && !file_restored?(log.object) 156 | @logger.debug('Object Archived to Glacier', :key => log.key) 157 | else 158 | objects << log 159 | @logger.debug("Added to objects[]", :key => log.key, :length => objects.length) 160 | end 161 | end 162 | @logger.info('No files found in bucket', :prefix => prefix) unless found 163 | rescue Aws::Errors::ServiceError => e 164 | @logger.error("Unable to list objects in bucket", :exception => e.class, :message => e.message, :backtrace => e.backtrace, :prefix => 
prefix) 165 | end 166 | objects.sort_by { |log| log.last_modified } 167 | end # def fetch_new_files 168 | 169 | def backup_to_bucket(object) 170 | unless @backup_to_bucket.nil? 171 | backup_key = "#{@backup_add_prefix}#{object.key}" 172 | @backup_bucket.object(backup_key).copy_from(:copy_source => "#{object.bucket_name}/#{object.key}") 173 | if @delete 174 | object.delete() 175 | end 176 | end 177 | end 178 | 179 | def backup_to_dir(filename) 180 | unless @backup_to_dir.nil? 181 | FileUtils.cp(filename, @backup_to_dir) 182 | end 183 | end 184 | 185 | def process_files(queue) 186 | objects = list_new_files 187 | 188 | objects.each do |log| 189 | if stop? 190 | break 191 | else 192 | process_log(queue, log) 193 | end 194 | end 195 | end # def process_files 196 | 197 | def stop 198 | # @current_thread is initialized in the `#run` method, 199 | # this variable is needed because the `#stop` is a called in another thread 200 | # than the `#run` method and requiring us to call stop! with a explicit thread. 201 | Stud.stop!(@current_thread) 202 | end 203 | 204 | private 205 | 206 | # Read the content of the local file 207 | # 208 | # @param [Queue] Where to push the event 209 | # @param [String] Which file to read from 210 | # @param [S3Object] Source s3 object 211 | # @return [Boolean] True if the file was completely read, false otherwise. 212 | def process_local_log(queue, filename, object) 213 | @logger.debug('Processing file', :filename => filename) 214 | metadata = {} 215 | # Currently codecs operates on bytes instead of stream. 216 | # So all IO stuff: decompression, reading need to be done in the actual 217 | # input and send as bytes to the codecs. 218 | read_file(filename) do |line| 219 | if stop? 220 | @logger.warn("Logstash S3 input, stop reading in the middle of the file, we will read it again when logstash is started") 221 | return false 222 | end 223 | 224 | @codec.decode(line) do |event| 225 | # We are making an assumption concerning cloudfront 226 | # log format, the user will use the plain or the line codec 227 | # and the message key will represent the actual line content. 228 | # If the event is only metadata the event will be drop. 229 | # This was the behavior of the pre 1.5 plugin. 230 | # 231 | # The line need to go through the codecs to replace 232 | # unknown bytes in the log stream before doing a regexp match or 233 | # you will get a `Error: invalid byte sequence in UTF-8' 234 | if event_is_metadata?(event) 235 | @logger.debug('Event is metadata, updating the current cloudfront metadata', :event => event) 236 | update_metadata(metadata, event) 237 | else 238 | push_decoded_event(queue, metadata, object, event) 239 | end 240 | end 241 | end 242 | # #ensure any stateful codecs (such as multi-line ) are flushed to the queue 243 | @codec.flush do |event| 244 | push_decoded_event(queue, metadata, object, event) 245 | end 246 | 247 | return true 248 | end # def process_local_log 249 | 250 | def push_decoded_event(queue, metadata, object, event) 251 | decorate(event) 252 | 253 | if @include_object_properties 254 | event.set("[@metadata][s3]", object.data.to_h) 255 | else 256 | event.set("[@metadata][s3]", {}) 257 | end 258 | 259 | event.set("[@metadata][s3][key]", object.key) 260 | event.set(@cloudfront_version_key, metadata[:cloudfront_version]) unless metadata[:cloudfront_version].nil? 261 | event.set(@cloudfront_fields_key, metadata[:cloudfront_fields]) unless metadata[:cloudfront_fields].nil? 
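    # hand the fully decorated event over to the pipeline queue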
262 | 263 | queue << event 264 | end 265 | 266 | def event_is_metadata?(event) 267 | return false unless event.get("message").class == String 268 | line = event.get("message") 269 | version_metadata?(line) || fields_metadata?(line) 270 | end 271 | 272 | def version_metadata?(line) 273 | line.start_with?('#Version: ') 274 | end 275 | 276 | def fields_metadata?(line) 277 | line.start_with?('#Fields: ') 278 | end 279 | 280 | def update_metadata(metadata, event) 281 | line = event.get('message').strip 282 | 283 | if version_metadata?(line) 284 | metadata[:cloudfront_version] = line.split(/#Version: (.+)/).last 285 | end 286 | 287 | if fields_metadata?(line) 288 | metadata[:cloudfront_fields] = line.split(/#Fields: (.+)/).last 289 | end 290 | end 291 | 292 | def read_file(filename, &block) 293 | if gzip?(filename) 294 | read_gzip_file(filename, block) 295 | else 296 | read_plain_file(filename, block) 297 | end 298 | rescue => e 299 | # skip any broken file 300 | @logger.error("Failed to read file, processing skipped", :exception => e.class, :message => e.message, :filename => filename) 301 | end 302 | 303 | def read_plain_file(filename, block) 304 | File.open(filename, 'rb') do |file| 305 | file.each(&block) 306 | end 307 | end 308 | 309 | def read_gzip_file(filename, block) 310 | file_stream = FileInputStream.new(filename) 311 | gzip_stream = GZIPInputStream.new(file_stream) 312 | decoder = InputStreamReader.new(gzip_stream, "UTF-8") 313 | buffered = BufferedReader.new(decoder) 314 | 315 | while (line = buffered.readLine()) 316 | block.call(line) 317 | end 318 | ensure 319 | buffered.close unless buffered.nil? 320 | decoder.close unless decoder.nil? 321 | gzip_stream.close unless gzip_stream.nil? 322 | file_stream.close unless file_stream.nil? 323 | end 324 | 325 | def gzip?(filename) 326 | Regexp.new(@gzip_pattern).match(filename) 327 | end 328 | 329 | def sincedb 330 | @sincedb ||= if @sincedb_path.nil? 331 | @logger.info("Using default generated file for the sincedb", :filename => sincedb_file) 332 | SinceDB::File.new(sincedb_file) 333 | else 334 | @logger.info("Using the provided sincedb_path", :sincedb_path => @sincedb_path) 335 | SinceDB::File.new(@sincedb_path) 336 | end 337 | end 338 | 339 | def sincedb_file 340 | digest = Digest::MD5.hexdigest("#{@bucket}+#{@prefix}") 341 | dir = File.join(LogStash::SETTINGS.get_value("path.data"), "plugins", "inputs", "s3") 342 | FileUtils::mkdir_p(dir) 343 | path = File.join(dir, "sincedb_#{digest}") 344 | 345 | # Migrate old default sincedb path to new one. 346 | if ENV["HOME"] 347 | # This is the old file path including the old digest mechanism. 348 | # It remains as a way to automatically upgrade users with the old default ($HOME) 349 | # to the new default (path.data) 350 | old = File.join(ENV["HOME"], ".sincedb_" + Digest::MD5.hexdigest("#{@bucket}+#{@prefix}")) 351 | if File.exist?(old) 352 | logger.info("Migrating old sincedb in $HOME to {path.data}") 353 | FileUtils.mv(old, path) 354 | end 355 | end 356 | 357 | path 358 | end 359 | 360 | def ignore_filename?(filename) 361 | if @prefix == filename 362 | return true 363 | elsif filename.end_with?("/") 364 | return true 365 | elsif (@backup_add_prefix && @backup_to_bucket == @bucket && filename =~ /^#{backup_add_prefix}/) 366 | return true 367 | elsif @exclude_pattern.nil? 
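      # no exclude_pattern configured, so there is nothing left to check and the key is kept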
368 | return false 369 | elsif filename =~ Regexp.new(@exclude_pattern) 370 | return true 371 | else 372 | return false 373 | end 374 | end 375 | 376 | def process_log(queue, log) 377 | @logger.debug("Processing", :bucket => @bucket, :key => log.key) 378 | object = @s3bucket.object(log.key) 379 | 380 | filename = File.join(temporary_directory, File.basename(log.key)) 381 | if download_remote_file(object, filename) 382 | if process_local_log(queue, filename, object) 383 | if object.last_modified == log.last_modified 384 | backup_to_bucket(object) 385 | backup_to_dir(filename) 386 | delete_file_from_bucket(object) 387 | FileUtils.remove_entry_secure(filename, true) 388 | sincedb.write(log.last_modified) 389 | else 390 | @logger.info("#{log.key} is updated at #{object.last_modified} and will process in the next cycle") 391 | end 392 | end 393 | else 394 | FileUtils.remove_entry_secure(filename, true) 395 | end 396 | end 397 | 398 | # Stream the remove file to the local disk 399 | # 400 | # @param [S3Object] Reference to the remove S3 objec to download 401 | # @param [String] The Temporary filename to stream to. 402 | # @return [Boolean] True if the file was completely downloaded 403 | def download_remote_file(remote_object, local_filename) 404 | completed = false 405 | @logger.debug("Downloading remote file", :remote_key => remote_object.key, :local_filename => local_filename) 406 | File.open(local_filename, 'wb') do |s3file| 407 | return completed if stop? 408 | begin 409 | remote_object.get(:response_target => s3file) 410 | completed = true 411 | rescue Aws::Errors::ServiceError => e 412 | @logger.warn("Unable to download remote file", :exception => e.class, :message => e.message, :remote_key => remote_object.key) 413 | end 414 | end 415 | completed 416 | end 417 | 418 | def delete_file_from_bucket(object) 419 | if @delete and @backup_to_bucket.nil? 420 | object.delete() 421 | end 422 | end 423 | 424 | def get_s3object 425 | s3 = Aws::S3::Resource.new(aws_options_hash || {}) 426 | end 427 | 428 | def file_restored?(object) 429 | begin 430 | restore = object.data.restore 431 | if restore && restore.match(/ongoing-request\s?=\s?["']false["']/) 432 | if restore = restore.match(/expiry-date\s?=\s?["'](.*?)["']/) 433 | expiry_date = DateTime.parse(restore[1]) 434 | return true if DateTime.now < expiry_date # restored 435 | else 436 | @logger.debug("No expiry-date header for restore request: #{object.data.restore}") 437 | return nil # no expiry-date found for ongoing request 438 | end 439 | end 440 | rescue => e 441 | @logger.debug("Could not determine Glacier restore status", :exception => e.class, :message => e.message) 442 | end 443 | return false 444 | end 445 | 446 | module SinceDB 447 | class File 448 | def initialize(file) 449 | @sincedb_path = file 450 | end 451 | 452 | # @return [Time] 453 | def read 454 | if ::File.exists?(@sincedb_path) 455 | content = ::File.read(@sincedb_path).chomp.strip 456 | # If the file was created but we didn't have the time to write to it 457 | return content.empty? ? Time.new(0) : Time.parse(content) 458 | else 459 | return Time.new(0) 460 | end 461 | end 462 | 463 | def write(since = nil) 464 | since = Time.now if since.nil? 
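        # persist the timestamp as plain text; #read parses it back with Time.parse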
465 | ::File.open(@sincedb_path, 'w') { |file| file.write(since.to_s) } 466 | end 467 | end 468 | end 469 | end # class LogStash::Inputs::S3 470 | -------------------------------------------------------------------------------- /lib/logstash/inputs/s3/patch.rb: -------------------------------------------------------------------------------- 1 | # This is a patch related to autoloading and Ruby 2 | # 3 | # The fix exists in JRuby 9k but not in the current JRuby; it is not clear when or if it will be backported 4 | # https://github.com/jruby/jruby/issues/3645 5 | # 6 | # AWS is doing tricky name discovery in the module to generate the correct error class and 7 | # this strategy is bogus in JRuby, and `eager_autoload` doesn't fix this issue. 8 | # 9 | # This will be a short-lived patch since AWS is removing the need. 10 | # see: https://github.com/aws/aws-sdk-ruby/issues/1301#issuecomment-261115960 11 | old_stderr = $stderr 12 | 13 | $stderr = StringIO.new 14 | begin 15 | module Aws 16 | const_set(:S3, Aws::S3) 17 | end 18 | ensure 19 | $stderr = old_stderr 20 | end 21 | -------------------------------------------------------------------------------- /logstash-input-s3.gemspec: -------------------------------------------------------------------------------- 1 | Gem::Specification.new do |s| 2 | 3 | s.name = 'logstash-input-s3' 4 | s.version = '3.8.4' 5 | s.licenses = ['Apache-2.0'] 6 | s.summary = "Streams events from files in an S3 bucket" 7 | s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program" 8 | s.authors = ["Elastic"] 9 | s.email = 'info@elastic.co' 10 | s.homepage = "http://www.elastic.co/guide/en/logstash/current/index.html" 11 | s.require_paths = ["lib"] 12 | 13 | # Files 14 | s.files = Dir["lib/**/*","spec/**/*","*.gemspec","*.md","CONTRIBUTORS","Gemfile","LICENSE","NOTICE.TXT", "vendor/jar-dependencies/**/*.jar", "vendor/jar-dependencies/**/*.rb", "VERSION", "docs/**/*"] 15 | 16 | # Tests 17 | s.test_files = s.files.grep(%r{^(test|spec|features)/}) 18 | 19 | # Special flag to let us know this is actually a logstash plugin 20 | s.metadata = { "logstash_plugin" => "true", "logstash_group" => "input" } 21 | 22 | # Gem dependencies 23 | s.add_runtime_dependency "logstash-core-plugin-api", ">= 2.1.12", "<= 2.99" 24 | s.add_runtime_dependency 'logstash-mixin-aws', '>= 5.1.0' 25 | s.add_runtime_dependency 'stud', '~> 0.0.18' 26 | # s.add_runtime_dependency 'aws-sdk-resources', '>= 2.0.33' 27 | s.add_development_dependency 'logstash-devutils' 28 | s.add_development_dependency "logstash-codec-json" 29 | s.add_development_dependency "logstash-codec-multiline" 30 | s.add_runtime_dependency 'logstash-mixin-ecs_compatibility_support', '~>1.2' 31 | end 32 | -------------------------------------------------------------------------------- /spec/fixtures/cloudfront.log: -------------------------------------------------------------------------------- 1 | #Version: 1.0 2 | #Fields: date time x-edge-location c-ip x-event sc-bytes x-cf-status x-cf-client-id cs-uri-stem cs-uri-query c-referrer x-page-url​ c-user-agent x-sname x-sname-query x-file-ext x-sid 3 | 2010-03-12 23:51:20 SEA4 192.0.2.147 connect 2014 OK bfd8a98bee0840d9b871b7f6ade9908f rtmp://shqshne4jdp4b6.cloudfront.net/cfx/st​ key=value http://player.longtailvideo.com/player.swf http://www.longtailvideo.com/support/jw-player-setup-wizard?example=204 LNX%2010,0,32,18 - - - - 4 | 2010-03-12 23:51:21 SEA4
192.0.2.222 play 3914 OK bfd8a98bee0840d9b871b7f6ade9908f rtmp://shqshne4jdp4b6.cloudfront.net/cfx/st​ key=value http://player.longtailvideo.com/player.swf http://www.longtailvideo.com/support/jw-player-setup-wizard?example=204 LNX%2010,0,32,18 myvideo p=2&q=4 flv 1 5 | -------------------------------------------------------------------------------- /spec/fixtures/compressed.log.gee.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logstash-plugins/logstash-input-s3/b7f42d7cd7c09b5343e5e60fe2bd7971f039eecd/spec/fixtures/compressed.log.gee.zip -------------------------------------------------------------------------------- /spec/fixtures/compressed.log.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logstash-plugins/logstash-input-s3/b7f42d7cd7c09b5343e5e60fe2bd7971f039eecd/spec/fixtures/compressed.log.gz -------------------------------------------------------------------------------- /spec/fixtures/compressed.log.gzip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logstash-plugins/logstash-input-s3/b7f42d7cd7c09b5343e5e60fe2bd7971f039eecd/spec/fixtures/compressed.log.gzip -------------------------------------------------------------------------------- /spec/fixtures/invalid_utf8.gbk.log: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logstash-plugins/logstash-input-s3/b7f42d7cd7c09b5343e5e60fe2bd7971f039eecd/spec/fixtures/invalid_utf8.gbk.log -------------------------------------------------------------------------------- /spec/fixtures/json.log: -------------------------------------------------------------------------------- 1 | { "hello": "world" } 2 | { "hello": "awesome world" } 3 | -------------------------------------------------------------------------------- /spec/fixtures/json_with_message.log: -------------------------------------------------------------------------------- 1 | { "message": ["GET", 32, "/health"] } 2 | { "message": true } 3 | -------------------------------------------------------------------------------- /spec/fixtures/multiline.log: -------------------------------------------------------------------------------- 1 | __SEPARATOR__ 2 | file:1 record:1 line:1 3 | file:1 record:1 line:2 4 | __SEPARATOR__ 5 | file:1 record:2 line:1 6 | file:1 record:2 line:2 -------------------------------------------------------------------------------- /spec/fixtures/multiple_compressed_streams.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logstash-plugins/logstash-input-s3/b7f42d7cd7c09b5343e5e60fe2bd7971f039eecd/spec/fixtures/multiple_compressed_streams.gz -------------------------------------------------------------------------------- /spec/fixtures/uncompressed.log: -------------------------------------------------------------------------------- 1 | 2010-03-12 23:51:20 SEA4 192.0.2.147 connect 2014 OK bfd8a98bee0840d9b871b7f6ade9908f rtmp://shqshne4jdp4b6.cloudfront.net/cfx/st​ key=value http://player.longtailvideo.com/player.swf http://www.longtailvideo.com/support/jw-player-setup-wizard?example=204 LNX%2010,0,32,18 - - - - 2 | 2010-03-12 23:51:21 SEA4 192.0.2.222 play 3914 OK bfd8a98bee0840d9b871b7f6ade9908f rtmp://shqshne4jdp4b6.cloudfront.net/cfx/st​ key=value http://player.longtailvideo.com/player.swf 
http://www.longtailvideo.com/support/jw-player-setup-wizard?example=204 LNX%2010,0,32,18 myvideo p=2&q=4 flv 1 3 | -------------------------------------------------------------------------------- /spec/inputs/s3_spec.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require "logstash/devutils/rspec/spec_helper" 3 | require "logstash/devutils/rspec/shared_examples" 4 | require "logstash/inputs/s3" 5 | require "logstash/codecs/multiline" 6 | require "logstash/errors" 7 | require "aws-sdk-resources" 8 | require_relative "../support/helpers" 9 | require "stud/temporary" 10 | require "aws-sdk" 11 | require "fileutils" 12 | require 'logstash/plugin_mixins/ecs_compatibility_support/spec_helper' 13 | 14 | describe LogStash::Inputs::S3 do 15 | let(:temporary_directory) { Stud::Temporary.pathname } 16 | let(:sincedb_path) { Stud::Temporary.pathname } 17 | let(:day) { 3600 * 24 } 18 | let(:creds) { Aws::Credentials.new('1234', 'secret') } 19 | let(:config) { 20 | { 21 | "access_key_id" => "1234", 22 | "secret_access_key" => "secret", 23 | "bucket" => "logstash-test", 24 | "temporary_directory" => temporary_directory, 25 | "sincedb_path" => File.join(sincedb_path, ".sincedb") 26 | } 27 | } 28 | let(:cutoff) { LogStash::Inputs::S3::CUTOFF_SECOND } 29 | 30 | 31 | before do 32 | FileUtils.mkdir_p(sincedb_path) 33 | Aws.config[:stub_responses] = true 34 | Thread.abort_on_exception = true 35 | end 36 | 37 | context "when interrupting the plugin" do 38 | let(:config) { super().merge({ "interval" => 5 }) } 39 | let(:s3_obj) { double(:key => "awesome-key", :last_modified => Time.now.round, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) } 40 | 41 | before do 42 | expect_any_instance_of(LogStash::Inputs::S3).to receive(:list_new_files).and_return(TestInfiniteS3Object.new(s3_obj)) 43 | end 44 | 45 | it_behaves_like "an interruptible input plugin" do 46 | let(:allowed_lag) { 16 } if LOGSTASH_VERSION.split('.').first.to_i <= 6 47 | end 48 | end 49 | 50 | describe "#register" do 51 | subject { LogStash::Inputs::S3.new(config) } 52 | 53 | context "with temporary directory" do 54 | let(:temporary_directory) { Stud::Temporary.pathname } 55 | 56 | it "creates the direct when it doesn't exist" do 57 | expect { subject.register }.to change { Dir.exist?(temporary_directory) }.from(false).to(true) 58 | end 59 | end 60 | end 61 | 62 | describe '#get_s3object' do 63 | subject { LogStash::Inputs::S3.new(settings) } 64 | 65 | context 'with modern access key options' do 66 | let(:settings) { 67 | { 68 | "access_key_id" => "1234", 69 | "secret_access_key" => "secret", 70 | "proxy_uri" => "http://example.com", 71 | "bucket" => "logstash-test", 72 | } 73 | } 74 | 75 | it 'should instantiate AWS::S3 clients with a proxy set' do 76 | expect(Aws::S3::Resource).to receive(:new).with({ 77 | :credentials => kind_of(Aws::Credentials), 78 | :http_proxy => 'http://example.com', 79 | :region => subject.region 80 | }) 81 | 82 | subject.send(:get_s3object) 83 | end 84 | end 85 | 86 | describe "additional_settings" do 87 | context "supported settings" do 88 | let(:settings) { 89 | { 90 | "additional_settings" => { "force_path_style" => 'true', "ssl_verify_peer" => 'false', "profile" => 'logstash' }, 91 | "bucket" => "logstash-test", 92 | } 93 | } 94 | 95 | it 'should instantiate AWS::S3 clients with force_path_style set' do 96 | expect(Aws::S3::Resource).to receive(:new).with({ 97 | :region => subject.region, 98 | 
:force_path_style => true, :ssl_verify_peer => false, :profile => 'logstash' 99 | }).and_call_original 100 | 101 | subject.send(:get_s3object) 102 | end 103 | end 104 | 105 | context 'when an unknown setting is given' do 106 | let(:settings) { 107 | { 108 | "additional_settings" => { "this_setting_doesnt_exist" => true }, 109 | "bucket" => "logstash-test", 110 | } 111 | } 112 | 113 | it 'should raise an error' do 114 | expect { subject.send(:get_s3object) }.to raise_error(ArgumentError) 115 | end 116 | end 117 | end 118 | end 119 | 120 | describe "#list_new_files" do 121 | before { allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects_list } } 122 | 123 | let!(:present_object_after_cutoff) {double(:key => 'this-should-not-be-present', :last_modified => Time.now, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) } 124 | let!(:present_object) {double(:key => 'this-should-be-present', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) } 125 | let!(:archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) } 126 | let!(:deep_archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) } 127 | let!(:restored_object) {double(:key => 'this-should-be-restored-from-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) } 128 | let!(:deep_restored_object) {double(:key => 'this-should-be-restored-from-deep-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'DEEP_ARCHIVE', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) } 129 | let(:objects_list) { 130 | [ 131 | double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100, :storage_class => 'STANDARD'), 132 | double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50, :storage_class => 'STANDARD'), 133 | archived_object, 134 | restored_object, 135 | deep_restored_object, 136 | present_object, 137 | present_object_after_cutoff 138 | ] 139 | } 140 | 141 | it 'should allow user to exclude files from the s3 bucket' do 142 | plugin = LogStash::Inputs::S3.new(config.merge({ "exclude_pattern" => "^exclude" })) 143 | plugin.register 144 | 145 | files = plugin.list_new_files.map { |item| item.key } 146 | expect(files).to include(present_object.key) 147 | expect(files).to include(restored_object.key) 148 | expect(files).to include(deep_restored_object.key) 149 | expect(files).to_not include('exclude-this-file-1') # matches exclude pattern 150 | expect(files).to_not include('exclude/logstash') # matches exclude pattern 151 | expect(files).to_not include(archived_object.key) # archived 152 | expect(files).to_not include(deep_archived_object.key) # archived 153 | expect(files).to_not include(present_object_after_cutoff.key) # after cutoff 154 | expect(files.size).to eq(3) 155 | end 156 | 157 | it 'should support not providing a exclude pattern' do 158 | plugin = 
LogStash::Inputs::S3.new(config) 159 | plugin.register 160 | 161 | files = plugin.list_new_files.map { |item| item.key } 162 | expect(files).to include(present_object.key) 163 | expect(files).to include(restored_object.key) 164 | expect(files).to include(deep_restored_object.key) 165 | expect(files).to include('exclude-this-file-1') # no exclude pattern given 166 | expect(files).to include('exclude/logstash') # no exclude pattern given 167 | expect(files).to_not include(archived_object.key) # archived 168 | expect(files).to_not include(deep_archived_object.key) # archived 169 | expect(files).to_not include(present_object_after_cutoff.key) # after cutoff 170 | expect(files.size).to eq(5) 171 | end 172 | 173 | context 'when all files are excluded from a bucket' do 174 | let(:objects_list) { 175 | [ 176 | double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100, :storage_class => 'STANDARD'), 177 | double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50, :storage_class => 'STANDARD'), 178 | ] 179 | } 180 | 181 | it 'should not log that no files were found in the bucket' do 182 | plugin = LogStash::Inputs::S3.new(config.merge({ "exclude_pattern" => "^exclude" })) 183 | plugin.register 184 | allow(plugin.logger).to receive(:debug).with(anything, anything) 185 | 186 | expect(plugin.logger).not_to receive(:info).with(/No files found/, anything) 187 | expect(plugin.logger).to receive(:debug).with(/Ignoring/, anything) 188 | expect(plugin.list_new_files).to be_empty 189 | end 190 | end 191 | 192 | context 'with an empty bucket' do 193 | let(:objects_list) { [] } 194 | 195 | it 'should log that no files were found in the bucket' do 196 | plugin = LogStash::Inputs::S3.new(config) 197 | plugin.register 198 | allow(plugin.logger).to receive(:info).with(/Using the provided sincedb_path/, anything) 199 | expect(plugin.logger).to receive(:info).with(/No files found/, anything) 200 | expect(plugin.list_new_files).to be_empty 201 | end 202 | end 203 | 204 | context "If the bucket is the same as the backup bucket" do 205 | it 'should ignore files from the bucket if they match the backup prefix' do 206 | objects_list = [ 207 | double(:key => 'mybackup-log-1', :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'), 208 | present_object 209 | ] 210 | 211 | allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects_list } 212 | 213 | plugin = LogStash::Inputs::S3.new(config.merge({ 'backup_add_prefix' => 'mybackup', 214 | 'backup_to_bucket' => config['bucket']})) 215 | plugin.register 216 | 217 | files = plugin.list_new_files.map { |item| item.key } 218 | expect(files).to include(present_object.key) 219 | expect(files).to_not include('mybackup-log-1') # matches backup prefix 220 | expect(files.size).to eq(1) 221 | end 222 | end 223 | 224 | it 'should ignore files older than X' do 225 | plugin = LogStash::Inputs::S3.new(config.merge({ 'backup_add_prefix' => 'exclude-this-file'})) 226 | 227 | 228 | allow_any_instance_of(LogStash::Inputs::S3::SinceDB::File).to receive(:read).and_return(Time.now - day) 229 | plugin.register 230 | 231 | files = plugin.list_new_files.map { |item| item.key } 232 | expect(files).to include(present_object.key) 233 | expect(files).to include(restored_object.key) 234 | expect(files).to include(deep_restored_object.key) 235 | expect(files).to_not include('exclude-this-file-1') # too old 236 | expect(files).to_not include('exclude/logstash') # too old 237 | 
expect(files).to_not include(archived_object.key) # archived 238 | expect(files).to_not include(deep_archived_object.key) # archived 239 | expect(files).to_not include(present_object_after_cutoff.key) # after cutoff 240 | expect(files.size).to eq(3) 241 | end 242 | 243 | it 'should ignore file if the file match the prefix' do 244 | prefix = 'mysource/' 245 | 246 | objects_list = [ 247 | double(:key => prefix, :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'), 248 | present_object 249 | ] 250 | 251 | allow_any_instance_of(Aws::S3::Bucket).to receive(:objects).with(:prefix => prefix) { objects_list } 252 | 253 | plugin = LogStash::Inputs::S3.new(config.merge({ 'prefix' => prefix })) 254 | plugin.register 255 | expect(plugin.list_new_files.map { |item| item.key }).to eq([present_object.key]) 256 | end 257 | 258 | it 'should sort return object sorted by last_modification date with older first' do 259 | objects = [ 260 | double(:key => 'YESTERDAY', :last_modified => Time.now - day, :content_length => 5, :storage_class => 'STANDARD'), 261 | double(:key => 'TODAY', :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'), 262 | double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now - cutoff, :content_length => 5, :storage_class => 'STANDARD'), 263 | double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') 264 | ] 265 | 266 | allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects } 267 | 268 | 269 | plugin = LogStash::Inputs::S3.new(config) 270 | plugin.register 271 | expect(plugin.list_new_files.map { |item| item.key }).to eq(['TWO_DAYS_AGO', 'YESTERDAY', 'TODAY_BEFORE_CUTOFF']) 272 | end 273 | 274 | describe "when doing backup on the s3" do 275 | it 'should copy to another s3 bucket when keeping the original file' do 276 | plugin = LogStash::Inputs::S3.new(config.merge({ "backup_to_bucket" => "mybackup"})) 277 | plugin.register 278 | 279 | s3object = Aws::S3::Object.new('mybucket', 'testkey') 280 | expect_any_instance_of(Aws::S3::Object).to receive(:copy_from).with(:copy_source => "mybucket/testkey") 281 | expect(s3object).to_not receive(:delete) 282 | 283 | plugin.backup_to_bucket(s3object) 284 | end 285 | 286 | it 'should copy to another s3 bucket when deleting the original file' do 287 | plugin = LogStash::Inputs::S3.new(config.merge({ "backup_to_bucket" => "mybackup", "delete" => true })) 288 | plugin.register 289 | 290 | s3object = Aws::S3::Object.new('mybucket', 'testkey') 291 | expect_any_instance_of(Aws::S3::Object).to receive(:copy_from).with(:copy_source => "mybucket/testkey") 292 | expect(s3object).to receive(:delete) 293 | 294 | plugin.backup_to_bucket(s3object) 295 | end 296 | 297 | it 'should add the specified prefix to the backup file' do 298 | plugin = LogStash::Inputs::S3.new(config.merge({ "backup_to_bucket" => "mybackup", 299 | "backup_add_prefix" => 'backup-' })) 300 | plugin.register 301 | 302 | s3object = Aws::S3::Object.new('mybucket', 'testkey') 303 | expect_any_instance_of(Aws::S3::Object).to receive(:copy_from).with(:copy_source => "mybucket/testkey") 304 | expect(s3object).to_not receive(:delete) 305 | 306 | plugin.backup_to_bucket(s3object) 307 | end 308 | end 309 | 310 | it 'should support doing local backup of files' do 311 | Stud::Temporary.directory do |backup_dir| 312 | Stud::Temporary.file do |source_file| 313 | backup_file = File.join(backup_dir.to_s, Pathname.new(source_file.path).basename.to_s) 314 | 315 | plugin = 
LogStash::Inputs::S3.new(config.merge({ "backup_to_dir" => backup_dir })) 316 | 317 | plugin.backup_to_dir(source_file) 318 | 319 | expect(File.exists?(backup_file)).to eq(true) 320 | end 321 | end 322 | end 323 | end 324 | 325 | shared_examples "generated events" do 326 | let(:events_to_process) { 2 } 327 | 328 | it 'should process events' do 329 | events = fetch_events(config) 330 | expect(events.size).to eq(events_to_process) 331 | expect(events[0].get("[@metadata][s3][key]")).to eql log.key 332 | expect(events[1].get("[@metadata][s3][key]")).to eql log.key 333 | end 334 | 335 | it "deletes the temporary file" do 336 | events = fetch_events(config) 337 | expect(Dir.glob(File.join(temporary_directory, "*")).size).to eq(0) 338 | end 339 | end 340 | 341 | context 'while communicating with s3' do 342 | let(:config) { 343 | { 344 | "access_key_id" => "1234", 345 | "secret_access_key" => "secret", 346 | "bucket" => "logstash-test", 347 | "codec" => "json", 348 | } 349 | } 350 | %w(AccessDenied NotFound).each do |error| 351 | context "while listing bucket contents, #{error} is returned" do 352 | before do 353 | Aws.config[:s3] = { 354 | stub_responses: { 355 | list_objects: error 356 | } 357 | } 358 | end 359 | 360 | it 'should not crash the plugin' do 361 | events = fetch_events(config) 362 | expect(events.size).to eq(0) 363 | end 364 | end 365 | end 366 | 367 | %w(AccessDenied NoSuchKey).each do |error| 368 | context "when retrieving an object, #{error} is returned" do 369 | let(:objects) { [log] } 370 | let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') } 371 | 372 | let(:config) { 373 | { 374 | "access_key_id" => "1234", 375 | "secret_access_key" => "secret", 376 | "bucket" => "logstash-test", 377 | "codec" => "json", 378 | } 379 | } 380 | before do 381 | Aws.config[:s3] = { 382 | stub_responses: { 383 | get_object: error 384 | } 385 | } 386 | allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects } 387 | end 388 | 389 | it 'should not crash the plugin' do 390 | events = fetch_events(config) 391 | expect(events.size).to eq(0) 392 | end 393 | end 394 | end 395 | end 396 | 397 | context 'when working with logs' do 398 | let(:objects) { [log] } 399 | let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5, :data => { "etag" => 'c2c966251da0bc3229d12c2642ba50a4' }, :storage_class => 'STANDARD') } 400 | let(:data) { File.read(log_file) } 401 | 402 | before do 403 | Aws.config[:s3] = { 404 | stub_responses: { 405 | get_object: { body: data } 406 | } 407 | } 408 | allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects } 409 | allow_any_instance_of(Aws::S3::Bucket).to receive(:object).with(log.key) { log } 410 | expect(log).to receive(:get).with(instance_of(Hash)) do |arg| 411 | File.open(arg[:response_target], 'wb') { |s3file| s3file.write(data) } 412 | end 413 | end 414 | 415 | context "when event doesn't have a `message` field" do 416 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'json.log') } 417 | let(:config) { 418 | { 419 | "access_key_id" => "1234", 420 | "secret_access_key" => "secret", 421 | "bucket" => "logstash-test", 422 | "codec" => "json", 423 | } 424 | } 425 | 426 | include_examples "generated events" 427 | end 428 | 429 | context "when event does have a `message` field" do 430 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'json_with_message.log') } 431 | let(:config) { 
432 | { 433 | "access_key_id" => "1234", 434 | "secret_access_key" => "secret", 435 | "bucket" => "logstash-test", 436 | "codec" => "json", 437 | } 438 | } 439 | 440 | include_examples "generated events" 441 | end 442 | 443 | context "multiple compressed streams" do 444 | let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') } 445 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'multiple_compressed_streams.gz') } 446 | 447 | include_examples "generated events" do 448 | let(:events_to_process) { 16 } 449 | end 450 | end 451 | 452 | context 'compressed' do 453 | let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') } 454 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gz') } 455 | 456 | include_examples "generated events" 457 | end 458 | 459 | context 'compressed with gzip extension and using default gzip_pattern option' do 460 | let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') } 461 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gzip') } 462 | 463 | include_examples "generated events" 464 | end 465 | 466 | context 'compressed with gzip extension and using custom gzip_pattern option' do 467 | let(:config) { super().merge({ "gzip_pattern" => "gee.zip$" }) } 468 | let(:log) { double(:key => 'log.gee.zip', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') } 469 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gee.zip') } 470 | include_examples "generated events" 471 | end 472 | 473 | context 'plain text' do 474 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') } 475 | 476 | include_examples "generated events" 477 | end 478 | 479 | context 'multi-line' do 480 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'multiline.log') } 481 | let(:config) { 482 | { 483 | "access_key_id" => "1234", 484 | "secret_access_key" => "secret", 485 | "bucket" => "logstash-test", 486 | "codec" => LogStash::Codecs::Multiline.new( {"pattern" => "__SEPARATOR__", "negate" => "true", "what" => "previous"}) 487 | } 488 | } 489 | 490 | include_examples "generated events" 491 | end 492 | 493 | context 'encoded' do 494 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'invalid_utf8.gbk.log') } 495 | 496 | include_examples "generated events" 497 | end 498 | 499 | context 'cloudfront' do 500 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'cloudfront.log') } 501 | 502 | describe "metadata", :ecs_compatibility_support, :aggregate_failures do 503 | ecs_compatibility_matrix(:disabled, :v1) do |ecs_select| 504 | before(:each) do 505 | allow_any_instance_of(described_class).to receive(:ecs_compatibility).and_return(ecs_compatibility) 506 | end 507 | 508 | it 'should extract metadata from cloudfront log' do 509 | events = fetch_events(config) 510 | 511 | events.each do |event| 512 | expect(event.get ecs_select[disabled: "cloudfront_fields", v1: "[@metadata][s3][cloudfront][fields]"] ).to eq('date time x-edge-location c-ip x-event sc-bytes x-cf-status x-cf-client-id cs-uri-stem cs-uri-query c-referrer x-page-url​ c-user-agent x-sname x-sname-query x-file-ext x-sid') 513 | expect(event.get ecs_select[disabled: "cloudfront_version", 
v1: "[@metadata][s3][cloudfront][version]"] ).to eq('1.0') 514 | end 515 | end 516 | end 517 | end 518 | 519 | include_examples "generated events" 520 | end 521 | 522 | context 'when include_object_properties is set to true' do 523 | let(:config) { super().merge({ "include_object_properties" => true }) } 524 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') } 525 | 526 | it 'should extract object properties onto [@metadata][s3]' do 527 | events = fetch_events(config) 528 | events.each do |event| 529 | expect(event.get('[@metadata][s3]')).to include(log.data) 530 | end 531 | end 532 | 533 | include_examples "generated events" 534 | end 535 | 536 | context 'when include_object_properties is set to false' do 537 | let(:config) { super().merge({ "include_object_properties" => false }) } 538 | let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') } 539 | 540 | it 'should NOT extract object properties onto [@metadata][s3]' do 541 | events = fetch_events(config) 542 | events.each do |event| 543 | expect(event.get('[@metadata][s3]')).to_not include(log.data) 544 | end 545 | end 546 | 547 | include_examples "generated events" 548 | end 549 | end 550 | 551 | describe "data loss" do 552 | let(:s3_plugin) { LogStash::Inputs::S3.new(config) } 553 | let(:queue) { [] } 554 | 555 | before do 556 | s3_plugin.register 557 | end 558 | 559 | context 'events come after cutoff time' do 560 | it 'should be processed in next cycle' do 561 | s3_objects = [ 562 | double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now.round - 2 * day, :content_length => 5, :storage_class => 'STANDARD'), 563 | double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'), 564 | double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now.round - cutoff, :content_length => 5, :storage_class => 'STANDARD'), 565 | double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD'), 566 | double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD') 567 | ] 568 | size = s3_objects.length 569 | 570 | allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_objects } 571 | allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects) 572 | expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original 573 | expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size) 574 | expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size) 575 | expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size) 576 | 577 | # first iteration 578 | s3_plugin.process_files(queue) 579 | 580 | # second iteration 581 | sleep(cutoff + 1) 582 | s3_plugin.process_files(queue) 583 | end 584 | end 585 | 586 | context 's3 object updated after getting summary' do 587 | it 'should not update sincedb' do 588 | s3_summary = [ 589 | double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'), 590 | double(:key => 'TODAY', :last_modified => Time.now.round - (cutoff * 10), :content_length => 5, :storage_class => 'STANDARD') 591 | ] 592 | 593 | s3_objects = [ 594 | double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'), 595 | double(:key => 'TODAY_UPDATED', :last_modified => Time.now.round, :content_length => 5, :storage_class => 
'STANDARD') 596 | ] 597 | 598 | size = s3_objects.length 599 | 600 | allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_summary } 601 | allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects) 602 | expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original 603 | expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size) 604 | expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size) 605 | expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size) 606 | 607 | s3_plugin.process_files(queue) 608 | expect(s3_plugin.send(:sincedb).read).to eq(s3_summary[0].last_modified) 609 | end 610 | end 611 | end 612 | end 613 | -------------------------------------------------------------------------------- /spec/inputs/sincedb_spec.rb: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | require "logstash/devutils/rspec/spec_helper" 3 | require "logstash/inputs/s3" 4 | require "stud/temporary" 5 | require "fileutils" 6 | 7 | describe LogStash::Inputs::S3::SinceDB::File do 8 | let(:file) { Stud::Temporary.file.path } 9 | subject { LogStash::Inputs::S3::SinceDB::File.new(file) } 10 | before do 11 | FileUtils.touch(file) 12 | end 13 | 14 | it "doesn't raise an exception if the file is empty" do 15 | expect { subject.read }.not_to raise_error 16 | end 17 | end 18 | -------------------------------------------------------------------------------- /spec/integration/s3_spec.rb: -------------------------------------------------------------------------------- 1 | require "logstash/devutils/rspec/spec_helper" 2 | require "logstash/inputs/s3" 3 | require "aws-sdk" 4 | require "fileutils" 5 | require_relative "../support/helpers" 6 | 7 | describe LogStash::Inputs::S3, :integration => true, :s3 => true do 8 | before do 9 | Thread.abort_on_exception = true 10 | 11 | upload_file('../fixtures/uncompressed.log', "#{prefix}uncompressed_1.log") 12 | upload_file('../fixtures/compressed.log.gz', "#{prefix}compressed_1.log.gz") 13 | sleep(LogStash::Inputs::S3::CUTOFF_SECOND + 1) 14 | end 15 | 16 | after do 17 | delete_remote_files(prefix) 18 | FileUtils.rm_rf(temporary_directory) 19 | delete_remote_files(backup_prefix) 20 | end 21 | 22 | let(:temporary_directory) { Stud::Temporary.directory } 23 | let(:prefix) { 'logstash-s3-input-prefix/' } 24 | 25 | let(:minimal_settings) { { "access_key_id" => ENV['AWS_ACCESS_KEY_ID'], 26 | "secret_access_key" => ENV['AWS_SECRET_ACCESS_KEY'], 27 | "bucket" => ENV['AWS_LOGSTASH_TEST_BUCKET'], 28 | "region" => ENV["AWS_REGION"] || "us-east-1", 29 | "prefix" => prefix, 30 | "temporary_directory" => temporary_directory } } 31 | let(:backup_prefix) { "backup/" } 32 | let(:backup_bucket) { "logstash-s3-input-backup" } 33 | 34 | it "supports a prefix to scope the remote files" do 35 | events = fetch_events(minimal_settings) 36 | expect(events.size).to eq(4) 37 | end 38 | 39 | 40 | it "adds a prefix to the backup files" do 41 | fetch_events(minimal_settings.merge({ "backup_to_bucket" => ENV["AWS_LOGSTASH_TEST_BUCKET"], 42 | "backup_add_prefix" => backup_prefix })) 43 | expect(list_remote_files(backup_prefix).size).to eq(2) 44 | end 45 | 46 | it "allows you to back up to a local directory" do 47 | Stud::Temporary.directory do |backup_dir| 48 | fetch_events(minimal_settings.merge({ "backup_to_dir" => backup_dir })) 49 | expect(Dir.glob(File.join(backup_dir, "*")).size).to eq(2) 50 | end 51 | end 52 | 53 | context "remote backup" do 54 | before do
55 | create_bucket(backup_bucket) 56 | end 57 | 58 | it "backs up to another bucket" do 59 | fetch_events(minimal_settings.merge({ "backup_to_bucket" => backup_bucket })) 60 | expect(list_remote_files("", backup_bucket).size).to eq(2) 61 | end 62 | 63 | after do 64 | delete_bucket(backup_bucket) 65 | end 66 | end 67 | end 68 | -------------------------------------------------------------------------------- /spec/support/helpers.rb: -------------------------------------------------------------------------------- 1 | def fetch_events(settings) 2 | queue = [] 3 | s3 = LogStash::Inputs::S3.new(settings) 4 | s3.register 5 | s3.process_files(queue) 6 | queue 7 | end 8 | 9 | # delete_files(prefix) 10 | def upload_file(local_file, remote_name) 11 | bucket = s3object.bucket(ENV['AWS_LOGSTASH_TEST_BUCKET']) 12 | file = File.expand_path(File.join(File.dirname(__FILE__), local_file)) 13 | bucket.object(remote_name).upload_file(file) 14 | end 15 | 16 | def delete_remote_files(prefix) 17 | bucket = s3object.bucket(ENV['AWS_LOGSTASH_TEST_BUCKET']) 18 | bucket.objects(:prefix => prefix).each { |object| object.delete } 19 | end 20 | 21 | def list_remote_files(prefix, target_bucket = ENV['AWS_LOGSTASH_TEST_BUCKET']) 22 | bucket = s3object.bucket(target_bucket) 23 | bucket.objects(:prefix => prefix).collect(&:key) 24 | end 25 | 26 | def create_bucket(name) 27 | s3object.bucket(name).create 28 | end 29 | 30 | def delete_bucket(name) 31 | s3object.bucket(name).objects.map(&:delete) 32 | s3object.bucket(name).delete 33 | end 34 | 35 | def s3object 36 | Aws::S3::Resource.new 37 | end 38 | 39 | class TestInfiniteS3Object 40 | def initialize(s3_obj) 41 | @s3_obj = s3_obj 42 | end 43 | 44 | def each 45 | counter = 1 46 | 47 | loop do 48 | yield @s3_obj 49 | counter += 1 50 | end 51 | end 52 | end --------------------------------------------------------------------------------