├── AUTHORS
├── ChangeLog
├── Gemfile
├── README.md
├── Rakefile
├── VERSION
├── fluent-plugin-dynamodb.gemspec
├── lib
│   └── fluent
│       └── plugin
│           └── out_dynamodb.rb
└── test
    └── out_dynamodb.rb
/AUTHORS:
--------------------------------------------------------------------------------
Takashi Matsuno
Sadayuki Furuhashi
--------------------------------------------------------------------------------
/ChangeLog:
--------------------------------------------------------------------------------
Release 0.1.8 - 2012/07/10

* Fix gem.homepage url

Release 0.1.7 - 2012/06/17

* Inherits DetachMultiProcessMixin

Release 0.1.6 - 2012/06/12

* Optimized write(chunk) method not to collect all records in memory

Release 0.1.5 - 2012/06/10

* First release

Release 0.1.0 - 2012/06/09

* First commit
--------------------------------------------------------------------------------
/Gemfile:
--------------------------------------------------------------------------------
source "https://rubygems.org"

gemspec
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Amazon DynamoDB output plugin for [Fluentd](http://fluentd.org) event collector

## Requirements

| fluent-plugin-dynamodb | fluentd    | ruby   |
|------------------------|------------|--------|
| >= 0.2.0               | >= v0.14.0 | >= 2.1 |
| < 0.2.0                | >= v0.12.0 | >= 1.9 |

## Installation

    $ fluent-gem install fluent-plugin-dynamodb

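## Configuration

### DynamoDB

First, create a table in DynamoDB. The easiest way is via the AWS Management Console.

Choose the table name, hash key attribute name and provisioned throughput as you like. On startup, fluent-plugin-dynamodb loads the table's key schema and then writes the event stream to that table. If a record does not contain the hash key attribute, a time-based UUID is generated for it; if the table also has a range key and the record does not contain it, the formatted event time is used.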
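
A minimal sketch of how one record becomes an item, loosely following the plugin's `format` method. It assumes the default `time_format` (ISO 8601 in UTC) and uses Ruby's `SecureRandom.uuid`, a random version 4 UUID, in place of the plugin's time-based UUIDTools UUID; `build_item` and its keyword arguments are hypothetical names.

```ruby
require 'securerandom'
require 'time'

# Hypothetical helper mirroring the plugin's format step.
# hash_key / range_key are attribute names; range_key may be nil.
# SecureRandom.uuid (v4) stands in for UUIDTools' time-based UUID.
def build_item(record, event_time, hash_key:, range_key: nil, add_time_attribute: true)
  item = record.dup
  formatted_time = Time.at(event_time).utc.iso8601
  # Generate a primary key only when the record does not carry one.
  item[hash_key] = SecureRandom.uuid unless item.key?(hash_key)
  # Fall back to the event time for a missing range key.
  item[range_key] = formatted_time if range_key && !item.key?(range_key)
  item['time'] = formatted_time if add_time_attribute
  item
end

item = build_item({ 'host' => '192.168.0.3', 'path' => '/index.html' },
                  Time.utc(2012, 6, 10, 5, 26, 46).to_i, hash_key: 'id')
puts item['time'] # prints 2012-06-10T05:26:46Z
```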
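
### Fluentd

    <match dynamodb.**>
      @type dynamodb
      aws_key_id AWS_ACCESS_KEY
      aws_sec_key AWS_SECRET_ACCESS_KEY
      proxy_uri http://user:password@192.168.0.250:3128/
      dynamo_db_region ap-northeast-1
      dynamo_db_endpoint https://dynamodb.ap-northeast-1.amazonaws.com
      dynamo_db_table access_log
    </match>

* **aws\_key\_id (optional)** - AWS access key ID. Required when credentials are not otherwise available, e.g. when your agent is not running on an EC2 instance with an IAM instance profile.
* **aws\_sec\_key (optional)** - AWS secret access key. Required under the same conditions as `aws_key_id`.
* **proxy_uri (optional)** - URL of your HTTP proxy.
* **dynamo\_db\_region (optional)** - AWS region used for requests. Defaults to the `AWS_REGION` environment variable, or `us-east-1` if it is unset. It should match the region of `dynamo_db_endpoint`.
* **dynamo\_db\_endpoint (required)** - DynamoDB endpoint. See [Regions and Endpoints](http://docs.amazonwebservices.com/general/latest/gr/rande.html#ddb_region).
* **dynamo\_db\_table (required)** - name of the DynamoDB table.
* **time\_format (optional)** - `strftime` format of the `time` attribute (and of an automatically filled range key). Defaults to ISO 8601 in UTC, e.g. `2012-06-10T05:26:46Z`.
* **add\_time\_attribute (optional)** - whether to add the `time` attribute to each item. Defaults to `true`.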
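
These parameters end up as options of the AWS SDK client. A minimal sketch of that mapping, assuming the aws-sdk-dynamodb v3 option names `access_key_id`, `secret_access_key`, `region`, `endpoint` and `http_proxy`; `client_options` is a hypothetical helper whose result would be passed to `Aws::DynamoDB::Client.new`.

```ruby
# Hypothetical helper: turn plugin parameters into Aws::DynamoDB::Client options.
# Unset (nil) parameters are left out so the SDK can apply its own defaults,
# e.g. the default credential chain or the endpoint derived from the region.
def client_options(aws_key_id: nil, aws_sec_key: nil, proxy_uri: nil,
                   region: ENV['AWS_REGION'] || 'us-east-1', endpoint: nil)
  options = { region: region }
  # Static credentials are used only when both halves are given.
  if aws_key_id && aws_sec_key
    options[:access_key_id] = aws_key_id
    options[:secret_access_key] = aws_sec_key
  end
  options[:endpoint] = endpoint if endpoint
  # aws-sdk v2/v3 name the proxy option :http_proxy.
  options[:http_proxy] = proxy_uri if proxy_uri
  options
end

opts = client_options(region: 'ap-northeast-1',
                      endpoint: 'https://dynamodb.ap-northeast-1.amazonaws.com')
# With aws-sdk-dynamodb installed: client = Aws::DynamoDB::Client.new(opts)
```

Keeping `region` in every case matters because requests are signed for that region even when an explicit endpoint is set.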

## TIPS

### retrieving data

fluent-plugin-dynamodb adds a **time** attribute to each item, along with every attribute of the record.
For example, if you read Apache's access log via Fluentd, the items in the table will look like this:

| id (Hash Key) | time | host | path | method | referer | code | agent | size |
|---|---|---|---|---|---|---|---|---|
| "a937f980-b304-11e1-bc96-c82a14fffef2" | "2012-06-10T05:26:46Z" | "192.168.0.3" | "/index.html" | "GET" | "-" | "200" | "Mozilla/5.0" | "4286" |
| "a87fc51e-b308-11e1-ba0f-5855caf50759" | "2012-06-10T05:28:23Z" | "192.168.0.4" | "/sample.html" | "GET" | "-" | "200" | "Mozilla/5.0" | "8933" |

Items can be retrieved by key, but fluent-plugin-dynamodb uses a generated UUID as the hash key, so there is no simple way to retrieve the logs you want.
You can write a scan filter with the AWS SDK like [this](https://gist.github.com/2906291), but I think Hive on EMR is the best practice.
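
A minimal sketch of such a scan filter with aws-sdk-dynamodb v3, assuming the default ISO 8601 `time` attribute (its UTC strings sort chronologically); `scan_by_time_range` is a hypothetical helper. A scan still reads the whole table, so it consumes read capacity in proportion to the table size, not to the number of matching items.

```ruby
# Hypothetical helper: scan a table for items whose `time` attribute lies in
# [from, to]. `client` is expected to behave like Aws::DynamoDB::Client.
def scan_by_time_range(client, table_name, from, to)
  items = []
  params = {
    table_name: table_name,
    filter_expression: '#t BETWEEN :from AND :to',
    # `time` is a DynamoDB reserved word, so alias it with a placeholder.
    expression_attribute_names: { '#t' => 'time' },
    expression_attribute_values: { ':from' => from, ':to' => to }
  }
  loop do
    resp = client.scan(params)
    items.concat(resp.items)
    # Each scan call returns one page; continue from last_evaluated_key.
    last_key = resp.last_evaluated_key
    break if last_key.nil? || last_key.empty?
    params = params.merge(exclusive_start_key: last_key)
  end
  items
end
```

With the SDK installed, `client` would be something like `Aws::DynamoDB::Client.new(region: "ap-northeast-1")`, called as `scan_by_time_range(client, "access_log", "2012-06-10T05:00:00Z", "2012-06-10T06:00:00Z")`.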

### multiprocessing

If you need high throughput and have enough provisioned throughput and buffer capacity, you can set up multiprocessing. fluent-plugin-dynamodb supports Fluentd's **multi workers** feature, so you can launch 6 workers as follows.

    <match dynamodb.**>
      @type dynamodb
      aws_key_id AWS_ACCESS_KEY
      aws_sec_key AWS_SECRET_ACCESS_KEY
      proxy_uri http://user:password@192.168.0.250:3128/
      dynamo_db_region ap-northeast-1
      dynamo_db_endpoint https://dynamodb.ap-northeast-1.amazonaws.com
      dynamo_db_table access_log
    </match>

    <system>
      workers 6
    </system>

### multi-region redundancy

Fluentd ships with the **copy** output plugin, so you can easily set up multi-region redundancy as follows.

    <match dynamodb.**>
      @type copy
      <store>
        @type dynamodb
        aws_key_id AWS_ACCESS_KEY
        aws_sec_key AWS_SECRET_ACCESS_KEY
        dynamo_db_table test
        dynamo_db_region ap-northeast-1
        dynamo_db_endpoint https://dynamodb.ap-northeast-1.amazonaws.com
      </store>
      <store>
        @type dynamodb
        aws_key_id AWS_ACCESS_KEY
        aws_sec_key AWS_SECRET_ACCESS_KEY
        dynamo_db_table test
        dynamo_db_region ap-southeast-1
        dynamo_db_endpoint https://dynamodb.ap-southeast-1.amazonaws.com
      </store>
    </match>

## TODO

* auto-create table
* tag_mapped

## Copyright

| Copyright | Copyright (c) 2012- Takashi Matsuno |
|-----------|-------------------------------------|
| License   | Apache License, Version 2.0         |
--------------------------------------------------------------------------------
/Rakefile:
--------------------------------------------------------------------------------
require 'bundler'
Bundler::GemHelper.install_tasks

require 'rake/testtask'

Rake::TestTask.new(:test) do |test|
  test.libs << 'lib' << 'test'
  test.test_files = FileList['test/*.rb']
  test.verbose = true
end

task :default => [:build]
--------------------------------------------------------------------------------
/VERSION:
--------------------------------------------------------------------------------
0.2.0
--------------------------------------------------------------------------------
/fluent-plugin-dynamodb.gemspec:
--------------------------------------------------------------------------------
# encoding: utf-8
$:.push File.expand_path('../lib', __FILE__)

Gem::Specification.new do |gem|
  gem.name = "fluent-plugin-dynamodb"
  gem.description = "Amazon DynamoDB output plugin for Fluent event collector"
  gem.homepage = "https://github.com/gonsuke/fluent-plugin-dynamodb"
  gem.summary = gem.description
  gem.license = "Apache-2.0"
  gem.version = File.read("VERSION").strip
  gem.authors = ["Takashi Matsuno"]
  gem.email = "g0n5uk3@gmail.com"
  gem.has_rdoc = false
  #gem.platform = Gem::Platform::RUBY
  gem.files = `git ls-files`.split("\n")
  gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
  gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
  gem.require_paths = ['lib']

  gem.add_dependency "fluentd", [">= 0.14.15", "< 2"]
  gem.add_dependency "aws-sdk-dynamodb", [">= 1.0.0", "< 2"]
  gem.add_dependency "uuidtools", "~> 2.1.0"
  gem.add_development_dependency "rake", ">= 0.9.2"
  gem.add_development_dependency "test-unit", ">= 3.1.0"
end
--------------------------------------------------------------------------------
/lib/fluent/plugin/out_dynamodb.rb:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
require 'fluent/plugin/output'
require 'aws-sdk-dynamodb'
require 'json'
require 'msgpack'
require 'time'
require 'uuidtools'

module Fluent::Plugin

  class DynamoDBOutput < Fluent::Plugin::Output
    Fluent::Plugin.register_output('dynamodb', self)

    helpers :compat_parameters

    DEFAULT_BUFFER_TYPE = "memory"

    # BatchWriteItem accepts at most 25 put/delete requests per call.
    BATCHWRITE_ITEM_LIMIT = 25
    # Soft cap on the estimated payload size of one batch (see #write).
    BATCHWRITE_CONTENT_SIZE_LIMIT = 1024*1024

    config_param :aws_key_id, :string, :default => nil, :secret => true
    config_param :aws_sec_key, :string, :default => nil, :secret => true
    config_param :proxy_uri, :string, :default => nil
    config_param :dynamo_db_region, :string, default: ENV["AWS_REGION"] || "us-east-1"
    config_param :dynamo_db_table, :string
    config_param :dynamo_db_endpoint, :string, :default => nil
    config_param :time_format, :string, :default => nil
    config_param :add_time_attribute, :bool, :default => true
    config_param :detach_process, :integer, :default => 2

    config_section :buffer do
      config_set_default :@type, DEFAULT_BUFFER_TYPE
    end

    def configure(conf)
      compat_parameters_convert(conf, :buffer)
      super

      @timef = Fluent::TimeFormatter.new(@time_format, @localtime)
    end

    def start
      options = {}
      if @aws_key_id && @aws_sec_key
        options[:access_key_id] = @aws_key_id
        options[:secret_access_key] = @aws_sec_key
      end
      options[:region] = @dynamo_db_region if @dynamo_db_region
      # Override the SDK's region-based endpoint only when one is configured.
      options[:endpoint] = @dynamo_db_endpoint if @dynamo_db_endpoint
      # aws-sdk v2/v3 take the proxy as :http_proxy (:proxy_uri was the v1 name).
      options[:http_proxy] = @proxy_uri if @proxy_uri

      super

      begin
        restart_session(options)
        valid_table(@dynamo_db_table)
      rescue Fluent::ConfigError => e
        log.fatal "ConfigError: Please check your configuration, then restart fluentd. '#{e}'"
        exit!
      rescue Exception => e
        log.fatal "UnknownError: '#{e}'"
        exit!
      end
    end

    def restart_session(options)
      @dynamo_db = Aws::DynamoDB::Client.new(options)
      @resource = Aws::DynamoDB::Resource.new(client: @dynamo_db)
    end

    # Loads the table's key schema and the scalar type ("S", "N" or "B")
    # of each attribute declared in attribute_definitions.
    def valid_table(table_name)
      table = @resource.table(table_name)
      @hash_key = table.key_schema.find { |e| e.key_type == "HASH" }
      @range_key = table.key_schema.find { |e| e.key_type == "RANGE" }
      @attribute_types = table.attribute_definitions.each_with_object({}) { |d, types|
        types[d.attribute_name] = d.attribute_type
      }
    end

    # Casts a key attribute to Integer when the table declares it as a number.
    # key_schema entries only carry key_type ("HASH"/"RANGE"), so the
    # attribute type has to come from attribute_definitions.
    def match_type!(key, record)
      if @attribute_types[key.attribute_name] == "N"
        potential_value = record[key.attribute_name].to_i
        if potential_value == 0
          log.fatal "Failed attempt to cast #{key.attribute_name} to Integer."
        end
        record[key.attribute_name] = potential_value
      end
    end

    def format(tag, time, record)
      if !record.key?(@hash_key.attribute_name)
        record[@hash_key.attribute_name] = UUIDTools::UUID.timestamp_create.to_s
      end
      match_type!(@hash_key, record)

      formatted_time = @timef.format(time)
      if @range_key
        if !record.key?(@range_key.attribute_name)
          record[@range_key.attribute_name] = formatted_time
        end
        match_type!(@range_key, record)
      end
      record['time'] = formatted_time if @add_time_attribute

      record.to_msgpack
    end

    def formatted_to_msgpack_binary?
      true
    end

    def multi_workers_ready?
      true
    end

    def write(chunk)
      batch_size = 0
      batch_records = []
      chunk.msgpack_each {|record|
        batch_records << {
          put_request: {
            item: record
          }
        }
        batch_size += record.to_json.length # FIXME: heuristic
        if batch_records.size >= BATCHWRITE_ITEM_LIMIT || batch_size >= BATCHWRITE_CONTENT_SIZE_LIMIT
          batch_put_records(batch_records)
          batch_records.clear
          batch_size = 0
        end
      }
      unless batch_records.empty?
        batch_put_records(batch_records)
      end
    end

    def batch_put_records(records)
      @dynamo_db.batch_write_item(request_items: { @dynamo_db_table => records })
    end

  end

end
--------------------------------------------------------------------------------
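`batch_put_records` above sends each batch once and ignores the response. DynamoDB's BatchWriteItem can return throttled requests in `unprocessed_items`, and those items are then not written. A minimal sketch of resending them with exponential backoff, assuming a client that behaves like `Aws::DynamoDB::Client#batch_write_item` (whose `unprocessed_items` maps table names to write requests and is empty when everything was written); `put_with_retry`, `max_attempts`, `base_delay` and `sleeper` are hypothetical names and values, not part of the plugin.

```ruby
# Hypothetical replacement for batch_put_records: resend unprocessed items.
# `client` is expected to respond like Aws::DynamoDB::Client#batch_write_item.
def put_with_retry(client, table_name, requests, max_attempts: 5, base_delay: 0.05,
                   sleeper: ->(seconds) { sleep(seconds) })
  request_items = { table_name => requests }
  max_attempts.times do |attempt|
    # Exponential backoff before every retry (not before the first call).
    sleeper.call(base_delay * (2**(attempt - 1))) if attempt > 0
    resp = client.batch_write_item(request_items: request_items)
    request_items = resp.unprocessed_items
    # Done once DynamoDB reports nothing left over.
    return true if request_items.nil? || request_items.empty?
  end
  remaining = request_items.values.map(&:size).sum
  raise "#{remaining} items still unprocessed after #{max_attempts} attempts"
end
```

Raising at the end lets Fluentd's buffer retry the whole chunk. Because the hash key is fixed when the record is formatted into the buffer, resent put requests overwrite the same items instead of duplicating them.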
/test/out_dynamodb.rb:
--------------------------------------------------------------------------------
require 'fluent/test'
require 'fluent/test/helpers'
require 'fluent/test/driver/output'
require 'fluent/plugin/out_dynamodb'

class DynamoDBOutputTest < Test::Unit::TestCase
  include Fluent::Test::Helpers

  def setup
    Fluent::Test.setup
  end

  CONFIG = %[
    aws_key_id test_key_id
    aws_sec_key test_sec_key
    dynamo_db_table test_table
    dynamo_db_endpoint test.endpoint
    utc
    buffer_type memory
  ]

  def create_driver(conf = CONFIG)
    Fluent::Test::Driver::Output.new(Fluent::Plugin::DynamoDBOutput) do
      def write(chunk)
        chunk.read
      end
    end.configure(conf)
  end

  def test_configure
    d = create_driver
    assert_equal 'test_key_id', d.instance.aws_key_id
    assert_equal 'test_sec_key', d.instance.aws_sec_key
    assert_equal 'test_table', d.instance.dynamo_db_table
    assert_equal 'test.endpoint', d.instance.dynamo_db_endpoint
  end

  def test_format
    d = create_driver

    time = event_time("2011-01-02 13:14:15 UTC")
    d.run(default_tag: 'test') do
      d.feed(time, {"a"=>1})
      d.feed(time, {"a"=>2})
    end

    expected = [{'a' => 1}].to_msgpack + [{'a' => 2}].to_msgpack
    assert_equal expected, d.formatted
  end

  def test_write
    d = create_driver

    time = event_time("2011-01-02 13:14:15 UTC")
    d.run(default_tag: 'test') do
      d.feed(time, {"a"=>1})
      d.feed(time, {"a"=>2})
    end

    data = d.events

    assert_equal [time, {'a' => 1}].to_msgpack + [time, {'a' => 2}].to_msgpack, data
  end

end
--------------------------------------------------------------------------------
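The driver tests above run the plugin's `start`, which connects to DynamoDB, so the batching rule in `write` is hard to exercise there. A minimal sketch of the same rule as a standalone function (`each_batch` is a hypothetical name): at most 25 put requests per BatchWriteItem call, and a batch is also flushed once the JSON-length estimate reaches about 1 MB, mirroring `BATCHWRITE_ITEM_LIMIT` and `BATCHWRITE_CONTENT_SIZE_LIMIT`.

```ruby
require 'json'

# Hypothetical helper extracted from the plugin's write(chunk) loop: yields
# BatchWriteItem-sized groups of put requests. Sizes are estimated from the
# JSON length, the same heuristic the plugin uses.
def each_batch(records, item_limit: 25, size_limit: 1024 * 1024)
  batch = []
  size = 0
  records.each do |record|
    batch << { put_request: { item: record } }
    size += record.to_json.length
    if batch.size >= item_limit || size >= size_limit
      yield batch
      # Start a fresh array so a batch the caller kept is not emptied.
      batch = []
      size = 0
    end
  end
  # Flush the remainder, like the plugin's final batch_put_records call.
  yield batch unless batch.empty?
end
```

Unlike the plugin, which reuses one array and calls `clear`, this version allocates a new array per batch, so callers may safely keep the yielded batches (as a test collecting them would).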