├── AUTHORS ├── ChangeLog ├── Gemfile ├── README.md ├── Rakefile ├── VERSION ├── fluent-plugin-dynamodb.gemspec ├── lib └── fluent │ └── plugin │ └── out_dynamodb.rb └── test └── out_dynamodb.rb /AUTHORS: -------------------------------------------------------------------------------- 1 | Takashi Matsuno 2 | Sadayuki Furuhashi 3 | 4 | -------------------------------------------------------------------------------- /ChangeLog: -------------------------------------------------------------------------------- 1 | Release 0.1.8 - 2012/07/10 2 | 3 | * Fix gem.homepage url 4 | 5 | Release 0.1.7 - 2012/06/17 6 | 7 | * Inherits DetachMultiProcessMixin 8 | 9 | Release 0.1.6 - 2012/06/12 10 | 11 | * Optimized write(chunk) method not to collect all records in memory 12 | 13 | Release 0.1.5 - 2012/06/10 14 | 15 | * First release 16 | 17 | Release 0.1.0 - 2012/06/09 18 | 19 | * First commit 20 | 21 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source "http://rubygems.org" 2 | 3 | gemspec 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Amazon DynamoDB output plugin for [Fluentd](http://fluentd.org) event collector 2 | 3 | ## Requirements 4 | 5 | | fluent-plugin-dynamodb | fluentd | ruby | 6 | |------------------------|---------|------| 7 | | >= 0.2.0 | >= v0.14.0 | >= 2.1 | 8 | | < 0.2.0 | >= v0.12.0 | >= 1.9 | 9 | 10 | ## Installation 11 | 12 | $ fluent-gem install fluent-plugin-dynamodb 13 | 14 | ## Configuration 15 | 16 | 17 | ### DynamoDB 18 | 19 | First, create a table in DynamoDB. The Management Console is the easiest way to do this. 20 | 21 | Choose the table name, hash attribute name and throughput as you like. fluent-plugin-dynamodb loads your table schema and writes the event stream to your table.
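The table can also be created programmatically instead of through the Management Console. The following is a minimal sketch, not part of the plugin: it assumes a string hash key named `id` and a table named `access_log`, and the helper name `table_definition` and the throughput values are hypothetical. The hash follows the parameter shape accepted by `Aws::DynamoDB::Client#create_table` in the aws-sdk-dynamodb gem.

```ruby
# Build the parameters for Aws::DynamoDB::Client#create_table.
# `table_definition` is a hypothetical helper, not part of the plugin.
def table_definition(table_name, hash_key, read_units: 5, write_units: 5)
  {
    table_name: table_name,
    # Only key attributes are declared; all other attributes are schemaless.
    attribute_definitions: [
      { attribute_name: hash_key, attribute_type: "S" } # "S" means string
    ],
    key_schema: [
      { attribute_name: hash_key, key_type: "HASH" }
    ],
    provisioned_throughput: {
      read_capacity_units: read_units,
      write_capacity_units: write_units
    }
  }
end

params = table_definition("access_log", "id")

# With credentials and the aws-sdk-dynamodb gem available, the table could
# then be created as follows (commented out so the sketch runs offline):
#
#   require 'aws-sdk-dynamodb'
#   client = Aws::DynamoDB::Client.new(region: "ap-northeast-1")
#   client.create_table(params)
```

Declaring only the hash key matches how the plugin behaves: when a record lacks the hash key attribute, the plugin fills it with a generated UUID string.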
22 | 23 | 24 | ### Fluentd 25 | 26 | 27 | @type dynamodb 28 | aws_key_id AWS_ACCESS_KEY 29 | aws_sec_key AWS_SECRET_ACCESS_KEY 30 | proxy_uri http://user:password@192.168.0.250:3128/ 31 | dynamo_db_endpoint https://dynamodb.ap-northeast-1.amazonaws.com 32 | dynamo_db_table access_log 33 | 34 | 35 | * **aws\_key\_id (optional)** - AWS access key id. This parameter is required when your agent is not running on an EC2 instance with an IAM Instance Profile. 36 | * **aws\_sec\_key (optional)** - AWS secret key. This parameter is required when your agent is not running on an EC2 instance with an IAM Instance Profile. 37 | * **proxy_uri (optional)** - URL of your proxy. 38 | * **dynamo\_db\_endpoint (required)** - endpoint of DynamoDB. See [Regions and Endpoints](http://docs.amazonwebservices.com/general/latest/gr/rande.html#ddb_region) 39 | * **dynamo\_db\_table (required)** - name of the DynamoDB table. 40 | 41 | ## TIPS 42 | 43 | ### retrieving data 44 | 45 | fluent-plugin-dynamodb automatically adds a **time** attribute along with all other attributes of the record. 46 | For example, if you read Apache's access log via fluentd, the table will look like this. 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 |
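| id (Hash Key) | time | host | path | method | referer | code | agent | size |
|---------------|------|------|------|--------|---------|------|-------|------|
| "a937f980-b304-11e1-bc96-c82a14fffef2" | "2012-06-10T05:26:46Z" | "192.168.0.3" | "/index.html" | "GET" | "-" | "200" | "Mozilla/5.0" | "4286" |
| "a87fc51e-b308-11e1-ba0f-5855caf50759" | "2012-06-10T05:28:23Z" | "192.168.0.4" | "/sample.html" | "GET" | "-" | "200" | "Mozilla/5.0" | "8933" |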
83 | 84 | An item can be retrieved by its key, but fluent-plugin-dynamodb uses a UUID as the primary key. 85 | There is no simple way to retrieve the logs you want. 86 | You can write a scan filter with the AWS SDK like [this](https://gist.github.com/2906291), but I think Hive on EMR is the best practice. 87 | 88 | ### multiprocessing 89 | 90 | If you need high throughput and have ample provisioned throughput and abundant buffer, you can set up multiprocessing. fluent-plugin-dynamodb supports **multi workers**, so you can launch 6 workers as follows. 91 | 92 | 93 | @type dynamodb 94 | aws_key_id AWS_ACCESS_KEY 95 | aws_sec_key AWS_SECRET_ACCESS_KEY 96 | proxy_uri http://user:password@192.168.0.250:3128/ 97 | dynamo_db_endpoint https://dynamodb.ap-northeast-1.amazonaws.com 98 | dynamo_db_table access_log 99 | 100 | 101 | workers 6 102 | 103 | 104 | ### multi-region redundancy 105 | 106 | As you know, fluentd has a **copy** output plugin. 107 | So you can easily set up multi-region redundancy as follows. 108 | 109 | 110 | @type copy 111 | 112 | @type dynamodb 113 | aws_key_id AWS_ACCESS_KEY 114 | aws_sec_key AWS_SECRET_ACCESS_KEY 115 | dynamo_db_table test 116 | dynamo_db_endpoint https://dynamodb.ap-northeast-1.amazonaws.com 117 | 118 | 119 | @type dynamodb 120 | aws_key_id AWS_ACCESS_KEY 121 | aws_sec_key AWS_SECRET_ACCESS_KEY 122 | dynamo_db_table test 123 | dynamo_db_endpoint https://dynamodb.ap-southeast-1.amazonaws.com 124 | 125 | 126 | 127 | ## TODO 128 | 129 | * auto-create table 130 | * tag_mapped 131 | 132 | ## Copyright 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 |
| | |
|---|---|
| Copyright | Copyright (c) 2012- Takashi Matsuno |
| License | Apache License, Version 2.0 |
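The scan-filter approach mentioned under "retrieving data" can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked gist: it assumes the `time` attribute holds ISO 8601 UTC strings as in the sample table, so string comparison matches chronological order. The helper name `scan_params_for` is hypothetical. The hash follows the parameter shape accepted by `Aws::DynamoDB::Client#scan`.

```ruby
require 'time'

# Build parameters for Aws::DynamoDB::Client#scan that keep only items
# whose `time` attribute lies between `from` and `to` (inclusive).
# `scan_params_for` is a hypothetical helper, not part of the plugin.
def scan_params_for(table_name, from, to)
  {
    table_name: table_name,
    # "#t" is a placeholder for the attribute name, so the expression
    # does not depend on whether `time` is a reserved word.
    filter_expression: "#t BETWEEN :from AND :to",
    expression_attribute_names: { "#t" => "time" },
    expression_attribute_values: {
      ":from" => from.getutc.iso8601,
      ":to"   => to.getutc.iso8601
    }
  }
end

params = scan_params_for(
  "access_log",
  Time.utc(2012, 6, 10, 5, 0, 0),
  Time.utc(2012, 6, 10, 6, 0, 0)
)

# With credentials and the aws-sdk-dynamodb gem available, the scan could
# be issued as follows (commented out so the sketch runs offline):
#
#   require 'aws-sdk-dynamodb'
#   client = Aws::DynamoDB::Client.new(region: "ap-northeast-1")
#   client.scan(params).items.each { |item| puts item["path"] }
```

A scan reads every item in the table and applies the filter afterwards, so it consumes read capacity for the whole table. That cost is one reason to prefer a batch tool such as Hive on EMR for large tables.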
142 | -------------------------------------------------------------------------------- /Rakefile: -------------------------------------------------------------------------------- 1 | 2 | require 'bundler' 3 | Bundler::GemHelper.install_tasks 4 | 5 | require 'rake/testtask' 6 | 7 | Rake::TestTask.new(:test) do |test| 8 | test.libs << 'lib' << 'test' 9 | test.test_files = FileList['test/*.rb'] 10 | test.verbose = true 11 | end 12 | 13 | task :default => [:build] 14 | 15 | -------------------------------------------------------------------------------- /VERSION: -------------------------------------------------------------------------------- 1 | 0.2.0 2 | -------------------------------------------------------------------------------- /fluent-plugin-dynamodb.gemspec: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | $:.push File.expand_path('../lib', __FILE__) 3 | 4 | Gem::Specification.new do |gem| 5 | gem.name = "fluent-plugin-dynamodb" 6 | gem.description = "Amazon DynamoDB output plugin for Fluent event collector" 7 | gem.homepage = "https://github.com/gonsuke/fluent-plugin-dynamodb" 8 | gem.summary = gem.description 9 | gem.license = "Apache-2.0" 10 | gem.version = File.read("VERSION").strip 11 | gem.authors = ["Takashi Matsuno"] 12 | gem.email = "g0n5uk3@gmail.com" 13 | gem.has_rdoc = false 14 | #gem.platform = Gem::Platform::RUBY 15 | gem.files = `git ls-files`.split("\n") 16 | gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n") 17 | gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) } 18 | gem.require_paths = ['lib'] 19 | 20 | gem.add_dependency "fluentd", [">= 0.14.15", "< 2"] 21 | gem.add_dependency "aws-sdk-dynamodb", [">= 1.0.0", "< 2"] 22 | gem.add_dependency "uuidtools", "~> 2.1.0" 23 | gem.add_development_dependency "rake", ">= 0.9.2" 24 | gem.add_development_dependency "test-unit", ">= 3.1.0" 25 | end 26 | 
-------------------------------------------------------------------------------- /lib/fluent/plugin/out_dynamodb.rb: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | require 'fluent/plugin/output' 3 | require 'aws-sdk-dynamodb' 4 | require 'msgpack' 5 | require 'time' 6 | require 'uuidtools' 7 | 8 | module Fluent::Plugin 9 | 10 | 11 | class DynamoDBOutput < Fluent::Plugin::Output 12 | Fluent::Plugin.register_output('dynamodb', self) 13 | 14 | helpers :compat_parameters 15 | 16 | DEFAULT_BUFFER_TYPE = "memory" 17 | 18 | BATCHWRITE_ITEM_LIMIT = 25 19 | BATCHWRITE_CONTENT_SIZE_LIMIT = 1024*1024 20 | 21 | config_param :aws_key_id, :string, :default => nil, :secret => true 22 | config_param :aws_sec_key, :string, :default => nil, :secret => true 23 | config_param :proxy_uri, :string, :default => nil 24 | config_param :dynamo_db_region, :string, default: ENV["AWS_REGION"] || "us-east-1" 25 | config_param :dynamo_db_table, :string 26 | config_param :dynamo_db_endpoint, :string, :default => nil 27 | config_param :time_format, :string, :default => nil 28 | config_param :add_time_attribute, :bool, :default => true 29 | config_param :detach_process, :integer, :default => 2 30 | 31 | config_section :buffer do 32 | config_set_default :@type, DEFAULT_BUFFER_TYPE 33 | end 34 | 35 | def configure(conf) 36 | compat_parameters_convert(conf, :buffer) 37 | super 38 | 39 | @timef = Fluent::TimeFormatter.new(@time_format, @localtime) 40 | end 41 | 42 | def start 43 | options = {} 44 | if @aws_key_id && @aws_sec_key 45 | options[:access_key_id] = @aws_key_id 46 | options[:secret_access_key] = @aws_sec_key 47 | end 48 | options[:region] = @dynamo_db_region if @dynamo_db_region 49 | options[:endpoint] = @dynamo_db_endpoint 50 | options[:proxy_uri] = @proxy_uri if @proxy_uri 51 | 52 | super 53 | 54 | begin 55 | restart_session(options) 56 | valid_table(@dynamo_db_table) 57 | rescue Fluent::ConfigError => e 58 | log.fatal 
"ConfigError: Please check your configuration, then restart fluentd. '#{e}'" 59 | exit! 60 | rescue Exception => e 61 | log.fatal "UnknownError: '#{e}'" 62 | exit! 63 | end 64 | end 65 | 66 | def restart_session(options) 67 | @dynamo_db = Aws::DynamoDB::Client.new(options) 68 | @resource = Aws::DynamoDB::Resource.new(client: @dynamo_db) 69 | 70 | end 71 | 72 | def valid_table(table_name) 73 | table = @resource.table(table_name) 74 | @hash_key = table.key_schema.select{|e| e.key_type == "HASH" }.first 75 | range_key_candidate = table.key_schema.select{|e| e.key_type == "RANGE" } 76 | @range_key = range_key_candidate.first if range_key_candidate 77 | end 78 | 79 | def match_type!(key, record) 80 | if key.key_type == "NUMBER" 81 | potential_value = record[key.attribute_name].to_i 82 | if potential_value == 0 83 | log.fatal "Failed attempt to cast hash_key to Integer." 84 | end 85 | record[key.attribute_name] = potential_value 86 | end 87 | end 88 | 89 | def format(tag, time, record) 90 | if !record.key?(@hash_key.attribute_name) 91 | record[@hash_key.attribute_name] = UUIDTools::UUID.timestamp_create.to_s 92 | end 93 | match_type!(@hash_key, record) 94 | 95 | formatted_time = @timef.format(time) 96 | if @range_key 97 | if !record.key?(@range_key.attribute_name) 98 | record[@range_key.attribute_name] = formatted_time 99 | end 100 | match_type!(@range_key, record) 101 | end 102 | record['time'] = formatted_time if @add_time_attribute 103 | 104 | record.to_msgpack 105 | end 106 | 107 | def formatted_to_msgpack_binary? 108 | true 109 | end 110 | 111 | def multi_workers_ready? 
112 | true 113 | end 114 | 115 | def write(chunk) 116 | batch_size = 0 117 | batch_records = [] 118 | chunk.msgpack_each {|record| 119 | batch_records << { 120 | put_request: { 121 | item: record 122 | } 123 | } 124 | batch_size += record.to_json.length # FIXME: heuristic 125 | if batch_records.size >= BATCHWRITE_ITEM_LIMIT || batch_size >= BATCHWRITE_CONTENT_SIZE_LIMIT 126 | batch_put_records(batch_records) 127 | batch_records.clear 128 | batch_size = 0 129 | end 130 | } 131 | unless batch_records.empty? 132 | batch_put_records(batch_records) 133 | end 134 | end 135 | 136 | def batch_put_records(records) 137 | @dynamo_db.batch_write_item(request_items: { @dynamo_db_table => records }) 138 | end 139 | 140 | end 141 | 142 | 143 | end 144 | -------------------------------------------------------------------------------- /test/out_dynamodb.rb: -------------------------------------------------------------------------------- 1 | require 'fluent/test' 2 | require 'fluent/test/helpers' 3 | require 'fluent/test/driver/output' 4 | require 'fluent/plugin/out_dynamodb' 5 | 6 | class DynamoDBOutputTest < Test::Unit::TestCase 7 | include Fluent::Test::Helpers 8 | 9 | def setup 10 | Fluent::Test.setup 11 | end 12 | 13 | CONFIG = %[ 14 | aws_key_id test_key_id 15 | aws_sec_key test_sec_key 16 | dynamo_db_table test_table 17 | dynamo_db_endpoint test.endpoint 18 | utc 19 | buffer_type memory 20 | ] 21 | 22 | def create_driver(conf = CONFIG) 23 | Fluent::Test::Driver::Output.new(Fluent::Plugin::DynamoDBOutput) do 24 | def write(chunk) 25 | chunk.read 26 | end 27 | end.configure(conf) 28 | end 29 | 30 | def test_configure 31 | d = create_driver 32 | assert_equal 'test_key_id', d.instance.aws_key_id 33 | assert_equal 'test_sec_key', d.instance.aws_sec_key 34 | assert_equal 'test_table', d.instance.dynamo_db_table 35 | assert_equal 'test.endpoint', d.instance.dynamo_db_endpoint 36 | end 37 | 38 | def test_format 39 | d = create_driver 40 | 41 | time = event_time("2011-01-02 13:14:15 
UTC") 42 | d.run(default_tag: 'test') do 43 | d.feed(time, {"a"=>1}) 44 | d.feed(time, {"a"=>2}) 45 | end 46 | 47 | expected = [{'a' => 1}].to_msgpack + [{'a' => 2}].to_msgpack 48 | assert_equal expected, d.formatted 49 | end 50 | 51 | def test_write 52 | d = create_driver 53 | 54 | time = event_time("2011-01-02 13:14:15 UTC") 55 | d.run(default_tag: 'test') do 56 | d.feed(time, {"a"=>1}) 57 | d.feed(time, {"a"=>2}) 58 | end 59 | 60 | data = d.events 61 | 62 | assert_equal [time, {'a' => 1}].to_msgpack + [time, {'a' => 2}].to_msgpack, data 63 | end 64 | 65 | end 66 | --------------------------------------------------------------------------------