├── .eslintrc.yml ├── .gitignore ├── README.md ├── art └── diagram.png ├── buildspec.yml ├── package-lock.json ├── package.json ├── serverless.yml └── src └── main └── js └── lambda ├── link └── index.js ├── load └── index.js ├── playback ├── index.js └── test.json ├── pump └── index.js └── scan └── index.js /.eslintrc.yml: -------------------------------------------------------------------------------- 1 | env: 2 | es6: true 3 | node: true 4 | extends: 'eslint:recommended' 5 | parserOptions: 6 | sourceType: module 7 | ecmaVersion: 8 8 | rules: 9 | no-console: 'off' 10 | indent: 11 | - error 12 | - 4 13 | linebreak-style: 14 | - error 15 | - unix 16 | quotes: 17 | - error 18 | - single 19 | semi: 20 | - error 21 | - always 22 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /node_modules/ 2 | /.DS_Store 3 | /.project 4 | /.serverless/ 5 | /.webpack/ 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # dynamodb-event-store ![codebuild](https://codebuild.us-east-1.amazonaws.com/badges?uuid=eyJlbmNyeXB0ZWREYXRhIjoieUNIR0o0d0tpRmJab0huNkxETitablhCYWNmc3l0d0pLanRIbFQyRlJmNmRRQ2lTcGFhZUZyR3VjeEx3ZEwxdVRLOUozb0tyQWdtbm9iTXcwaGpxOXRJPSIsIml2UGFyYW1ldGVyU3BlYyI6IlJSWldUMW5TNHpCTzBGMzIiLCJtYXRlcmlhbFNldFNlcmlhbCI6MX0%3D&branch=master) 2 | Simple Event Store for DynamoDB 3 | 4 | ![diagram](art/diagram.png) 5 | ## What is it? 6 | This is a *proof of concept* implementation to store a log of time based events (in this case from Kinesis) in a NoSQL database like DynamoDB. While it's certainly not ready for production use (you have been warned), it could evolve into a robust solution easily -- I think. 7 | 8 | The idea is to be able to store events as they are happening, and to have a way to play them back sequentially from an arbitrary point in time. This is very useful for Event Sourcing, to keep the ledger of events for a potentially infinite amount of data and time, when the Event Stream may offer only limited retention. 9 | 10 | Storing time based events in DynamoDB is, in fact, not a trivial problem. The main issue is that a naive partition key/range key schema will typically run into the [hot key/partition problem](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html), hit size limitations for the partition, or make it impossible to play events back in sequence. 11 | 12 | 13 | ### Naive solutions: 14 | 15 | **Random partition key and time based range key** 16 | 17 | |Partition Key|Range Key|Data| 18 | |----|----|----| 19 | |379d96af-5058-401f-ab7c|2018-03-04T14:59:44.810Z|{"foo":12}| 20 | |cf046f75-beb6-414c-b0ba|2018-03-04T14:59:44.321Z|{"foo":31}| 21 | |c115d8ac-205e-4ed7-8f86|2018-03-04T14:59:44.030Z|{"foo":10}| 22 | 23 | This provides optimal resource utilization, as writes are spread across partitions, but it becomes very difficult to play events back in sequence, because the partition key is random and cannot be guessed to rebuild the original time sequence. Also note that using global secondary indexes would just move the hot partition problem to the index tables.
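To make the problem concrete, this is roughly what a write under this naive schema looks like with the DocumentClient (a minimal sketch; the table name and item shape are hypothetical):

```js
const AWS = require('aws-sdk');
const uuidv4 = require('uuid/v4');
const dynamoDb = new AWS.DynamoDB.DocumentClient();

// The random partition key spreads writes evenly across partitions,
// but the keys can never be enumerated back in time order.
dynamoDb.put({
    TableName: 'naive-event-table', // hypothetical
    Item: {
        partition_key: uuidv4(),              // random, unguessable
        range_key: new Date().toISOString(),  // full event timestamp
        data: { foo: 12 }
    }
}).promise();
```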
24 | 25 | 26 | **Predictable partition key and time based range key** 27 | 28 | |Partition Key|Range Key|Data| 29 | |----|----|----| 30 | |2018-03-04T14:59:44.000Z|2018-03-04T14:59:44.030Z|{"foo":10}| 31 | |2018-03-04T14:59:44.000Z|2018-03-04T14:59:44.321Z|{"foo":31}| 32 | |2018-03-04T14:59:44.000Z|2018-03-04T14:59:44.810Z|{"foo":12}| 33 | 34 | This makes it easy to scan the table, with a predictable partition key (based on a one-second resolution), but the writes all happen at the same time, during the same second, on the same partition, making it "hot" and likely exceeding the provisioned throughput. Increasing the throughput would still not fully solve the problem, as each partition only gets a slice of the total provisioned throughput, depending on the number of partitions in use. Since each partition holds at most 10GB, the table will eventually grow to hundreds of partitions, and any increase in throughput will benefit each partition by only a small percentage. 35 | 36 | ### This implementation: 37 | The idea is to combine the two naive solutions above into one that is *less naive* and provides both optimal write performance and easy event playback and ordering. 38 | 39 | One *buffer* table is used to write events using a quasi-random partition key, with a timestamp as another column (it doesn't need to be a range key). Periodically, say every minute, the buffer table is scanned and records are moved into the *event* table, with a partition key that is predictable as in the naive example before (the timestamp at second resolution), and the full timestamp as range key. Because the scan operation on the buffer table returns records in random order, writes are spread across potentially 60 different partitions (when scanning every minute), therefore avoiding or reducing hot partitions. 40 | 41 | Depending on the frequency of events, and hence the rate of writing, you can use a different time resolution and a different scan period. For instance, for a lower event frequency, the time unit could be set to 1 minute, and the scan period increased to 10 minutes. 42 | In this way, events will be moved from the buffer table to the event table every ten minutes, and writes will be spread across 10 partitions. 43 | 44 | All of this is implemented purely with serverless Lambda functions, without the need to provision anything but the tables and the Kinesis streams. 45 | 46 | Please note that in this POC, the partition key used for the buffer table is actually a `sha256` hash of the Kinesis message. This allows handling failures, as the *pump* lambda will simply retry updating the same record in case of a failure writing to DynamoDB (which in turn would cause the lambda to restart reading events from the stream). 47 | 48 | To load test, there's also an API Gateway *load* lambda that can be invoked with something like Apache Benchmark and returns a single pixel after having pushed the request object data into Kinesis. For instance, running: 49 | 50 | ```bash 51 | $ ab -k -n 1000 -c 10 https://xxxx.execute-api.us-east-1.amazonaws.com/dev/track.gif 52 | ``` 53 | 54 | you will be pushing 1,000 events (from 10 clients) into Kinesis and the DynamoDB event store. 55 | 56 | You could also put the URL in a web page on a highly trafficked site with something like the following: 57 | ```html 58 | <img src="https://xxxx.execute-api.us-east-1.amazonaws.com/dev/track.gif"/> 59 | ``` 60 | and push actual click stream data into your event store. 61 | 62 | In this implementation, the playback of events targets a separate Kinesis stream.
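Under the hood, the playback function walks the event table one time slot at a time and forwards every stored payload to the playback stream; a condensed sketch of the core loop (the full version in `src/main/js/lambda/playback/index.js` adds result paging and retries):

```js
// Walk the event table slot by slot, replaying each stored record
// into the playback Kinesis stream in time order.
while (start < end) {
    const records = await dynamoDb.query({
        TableName: process.env.DYNAMO_EVENT_TABLE,
        KeyConditionExpression: 'event_time_slot = :slot',
        ExpressionAttributeValues: { ':slot': start }
    }).promise();
    for (const record of records.Items) {
        await kinesis.putRecord({
            StreamName: process.env.PLAYBACK_STREAM,
            Data: JSON.stringify(record.record_payload.kinesis.data),
            PartitionKey: record.record_payload.kinesis.partitionKey
        }).promise();
    }
    start = new Date(Date.parse(start) + timeUnit).toISOString();
}
```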
You can test it by running 64 | ```bash 65 | $ sls invoke local -f playback 66 | 67 | Starting event playback on event-store-dev-event-table from 2018-03-16T19:56:02.000Z to 2018-03-16T19:56:32.000Z 68 | 2018-03-16T19:56:02.000Z 69 | 2018-03-16T19:56:03.000Z: 2018-03-16T19:56:03.801Z [3] 70 | 2018-03-16T19:56:04.000Z: 2018-03-16T19:56:04.036Z [9] 71 | 2018-03-16T19:56:05.000Z: 2018-03-16T19:56:05.041Z [25] 72 | 2018-03-16T19:56:06.000Z: 2018-03-16T19:56:06.034Z [32] 73 | 2018-03-16T19:56:07.000Z: 2018-03-16T19:56:07.012Z [40] 74 | 2018-03-16T19:56:08.000Z: 2018-03-16T19:56:08.002Z [32] 75 | 2018-03-16T19:56:09.000Z: 2018-03-16T19:56:09.016Z [39] 76 | 2018-03-16T19:56:10.000Z: 2018-03-16T19:56:10.006Z [37] 77 | 2018-03-16T19:56:11.000Z: 2018-03-16T19:56:11.020Z [43] 78 | 2018-03-16T19:56:12.000Z: 2018-03-16T19:56:12.038Z [41] 79 | 2018-03-16T19:56:13.000Z: 2018-03-16T19:56:13.007Z [41] 80 | 2018-03-16T19:56:14.000Z: 2018-03-16T19:56:14.009Z [39] 81 | 2018-03-16T19:56:15.000Z: 2018-03-16T19:56:15.002Z [40] 82 | 2018-03-16T19:56:16.000Z: 2018-03-16T19:56:16.063Z [31] 83 | 2018-03-16T19:56:17.000Z: 2018-03-16T19:56:17.034Z [14] 84 | 2018-03-16T19:56:18.000Z: 2018-03-16T19:56:18.010Z [36] 85 | 2018-03-16T19:56:19.000Z: 2018-03-16T19:56:19.015Z [41] 86 | 2018-03-16T19:56:20.000Z: 2018-03-16T19:56:20.000Z [36] 87 | 2018-03-16T19:56:21.000Z: 2018-03-16T19:56:21.005Z [42] 88 | 2018-03-16T19:56:22.000Z: 2018-03-16T19:56:22.018Z [44] 89 | 2018-03-16T19:56:23.000Z: 2018-03-16T19:56:23.062Z [43] 90 | 2018-03-16T19:56:24.000Z: 2018-03-16T19:56:24.013Z [40] 91 | 2018-03-16T19:56:25.000Z: 2018-03-16T19:56:25.005Z [41] 92 | 2018-03-16T19:56:26.000Z: 2018-03-16T19:56:26.015Z [41] 93 | 2018-03-16T19:56:27.000Z: 2018-03-16T19:56:27.008Z [33] 94 | 2018-03-16T19:56:28.000Z: 2018-03-16T19:56:28.005Z [36] 95 | 2018-03-16T19:56:29.000Z: 2018-03-16T19:56:29.032Z [42] 96 | 2018-03-16T19:56:30.000Z: 2018-03-16T19:56:30.000Z [43] 97 | 2018-03-16T19:56:31.000Z: 2018-03-16T19:56:31.009Z [12] 98 | "Success" 99 | ``` 100 | which, by default, will push the last 15 minutes of events into the playback stream. 101 | 102 | The idea is that, if you ever need to play back events, you can avoid saturating the main event stream, and you don't need to mark events as "special" playback events. The downside, of course, is that you will need to subscribe to both the event stream and the playback stream. 103 | 104 | ### Functions 105 | The following Lambda functions are deployed as part of this project: 106 | * **load**: _Load events into Kinesis from API Gateway_. This is an extremely simple load simulator that you can easily trigger with Apache Benchmark or any other web load testing tool. 107 | * **pump**: _Push events from Kinesis into the DynamoDB buffer table with a well distributed partition key_. This function subscribes to the event stream, and duly copies every record into the buffer table, using a hash of the actual message as the partition key. 108 | * **scan**: _Periodically scan the buffer table and store any new record into the event table_. This is an example of a "forking lambda", as it will scan the buffer table and fork new copies of itself to finish the task when there are more records than can be returned in a single scan operation. 109 | 110 | * **playback**: _Load events from the DynamoDB event store into the playback Kinesis stream_. This is a sample implementation for a playback function.
It can be invoked directly with `sls invoke local -f playback -p src/main/js/lambda/playback/test.json` where test.json contains a `start` and an `end` timestamp to define the window of events to play back. If invoked without parameters, it will default to playing back the last 15 minutes of events. 111 | 112 | ### How to run it 113 | The code is written using the Serverless Framework, so just install it with: 114 | ```bash 115 | $ npm -g install serverless 116 | ``` 117 | Then, clone this project, install all the dependencies and deploy: 118 | ```bash 119 | $ npm install && sls deploy 120 | ``` 121 | Once the deploy completes, the tables, the Kinesis streams and the Lambda functions are in place, and you can start using it with a simple curl command: 122 | ```bash 123 | $ curl -v https://xxxx.execute-api.us-east-1.amazonaws.com/dev/track.gif 124 | ``` 125 | and then verify that records are placed in the buffer table and then moved to the event table. 126 | 127 | ### Notes 128 | This project also uses AWS X-Ray to provide resource utilization metrics. 129 | The default capacity set for the DynamoDB *buffer* table is 25 reads, 50 writes; the capacity for the event table is 10 reads, 25 writes (as defined in `serverless.yml`). This can be adequate for a small to medium load. 130 | Using DynamoDB autoscaling may also work well for moderately spiking load. 131 | -------------------------------------------------------------------------------- /art/diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alessandrobologna/dynamodb-event-store/2f238fde6008023fb6a67e5a9d8d1dd6ee5c944e/art/diagram.png -------------------------------------------------------------------------------- /buildspec.yml: -------------------------------------------------------------------------------- 1 | version: 0.2 2 | 3 | phases: 4 | install: 5 | commands: 6 | - npm -g install serverless 7 | pre_build: 8 | commands: 9 | - npm install 10 | build: 11 | commands: 12 | - sls deploy 13 | -------------------------------------------------------------------------------- /package-lock.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "dynamodb-event-store", 3 | "version": "1.0.0", 4 | "lockfileVersion": 1, 5 | "requires": true, 6 | "dependencies": { 7 | "array-uniq": { 8 | "version": "1.0.2", 9 | "resolved": "https://registry.npmjs.org/array-uniq/-/array-uniq-1.0.2.tgz", 10 | "integrity": "sha1-X8w3OSB3VyPP1k1lxkvvU7+eum0=" 11 | }, 12 | "async": { 13 | "version": "1.0.0", 14 | "resolved": "https://registry.npmjs.org/async/-/async-1.0.0.tgz", 15 | "integrity": "sha1-+PwEyjoTeErenhZBr5hXjPvWR6k=" 16 | }, 17 | "async-listener": { 18 | "version": "0.6.9", 19 | "resolved": "https://registry.npmjs.org/async-listener/-/async-listener-0.6.9.tgz", 20 | "integrity": "sha1-UbyV5BCVQX8zki+03uTyMrMiZIg=", 21 | "requires": { 22 | "semver": "5.5.0", 23 | "shimmer": "1.2.0" 24 | } 25 | }, 26 | "aws-sdk": { 27 | "version": "2.226.1", 28 | "resolved": "https://registry.npmjs.org/aws-sdk/-/aws-sdk-2.226.1.tgz", 29 | "integrity": "sha1-jm4rRmqvJss1cUkF1fk05sgxepk=", 30 | "dev": true, 31 | "requires": { 32 | "buffer": "4.9.1", 33 | "events": "1.1.1", 34 | "ieee754": "1.1.8", 35 | "jmespath": "0.15.0", 36 | "querystring": "0.2.0", 37 | "sax": "1.2.1", 38 | "url": "0.10.3", 39 | "uuid": "3.1.0", 40 | "xml2js": "0.4.17", 41 | "xmlbuilder": "4.2.1" 42 | } 43 | }, 44 | "aws-xray-sdk": { 45 | "version": "1.2.0", 46 | "resolved":
"https://registry.npmjs.org/aws-xray-sdk/-/aws-xray-sdk-1.2.0.tgz", 47 | "integrity": "sha1-/+FqfqyCh2Nfoc5Aw2E3J2669Co=", 48 | "requires": { 49 | "aws-xray-sdk-core": "1.2.0", 50 | "aws-xray-sdk-express": "1.2.0", 51 | "aws-xray-sdk-mysql": "1.2.0", 52 | "aws-xray-sdk-postgres": "1.2.0", 53 | "pkginfo": "0.4.1" 54 | } 55 | }, 56 | "aws-xray-sdk-core": { 57 | "version": "1.2.0", 58 | "resolved": "https://registry.npmjs.org/aws-xray-sdk-core/-/aws-xray-sdk-core-1.2.0.tgz", 59 | "integrity": "sha512-HOSVn0O7ZXKTTgGErpG8GjFfSxhK5vfihqNAPoyi7GYMB16Y25BeWBkfpYVOEWWGYL/9bDcBcRAVc1NJwVJiHQ==", 60 | "requires": { 61 | "continuation-local-storage": "3.2.1", 62 | "moment": "2.22.1", 63 | "pkginfo": "0.4.1", 64 | "semver": "5.5.0", 65 | "underscore": "1.9.0", 66 | "winston": "2.4.2" 67 | } 68 | }, 69 | "aws-xray-sdk-express": { 70 | "version": "1.2.0", 71 | "resolved": "https://registry.npmjs.org/aws-xray-sdk-express/-/aws-xray-sdk-express-1.2.0.tgz", 72 | "integrity": "sha1-i/NSkIndtHsnNjFFdpcNTGifPzE=" 73 | }, 74 | "aws-xray-sdk-mysql": { 75 | "version": "1.2.0", 76 | "resolved": "https://registry.npmjs.org/aws-xray-sdk-mysql/-/aws-xray-sdk-mysql-1.2.0.tgz", 77 | "integrity": "sha1-n4m1yRN6utyrn2mifMRVx/5P3L0=" 78 | }, 79 | "aws-xray-sdk-postgres": { 80 | "version": "1.2.0", 81 | "resolved": "https://registry.npmjs.org/aws-xray-sdk-postgres/-/aws-xray-sdk-postgres-1.2.0.tgz", 82 | "integrity": "sha1-a4NCdkNZ6CPq9b5azvurY//DfH4=" 83 | }, 84 | "base64-js": { 85 | "version": "1.3.0", 86 | "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.3.0.tgz", 87 | "integrity": "sha512-ccav/yGvoa80BQDljCxsmmQ3Xvx60/UpBIij5QN21W3wBi/hhIC9OoO+KLpu9IJTS9j4DRVJ3aDDF9cMSoa2lw==", 88 | "dev": true 89 | }, 90 | "buffer": { 91 | "version": "4.9.1", 92 | "resolved": "https://registry.npmjs.org/buffer/-/buffer-4.9.1.tgz", 93 | "integrity": "sha1-bRu2AbB6TvztlwlBMgkwJ8lbwpg=", 94 | "dev": true, 95 | "requires": { 96 | "base64-js": "1.3.0", 97 | "ieee754": "1.1.8", 98 | "isarray": "1.0.0" 99 | } 100 | }, 101 | "colors": { 102 | "version": "1.0.3", 103 | "resolved": "https://registry.npmjs.org/colors/-/colors-1.0.3.tgz", 104 | "integrity": "sha1-BDP0TYCWgP3rYO0mDxsMJi6CpAs=" 105 | }, 106 | "continuation-local-storage": { 107 | "version": "3.2.1", 108 | "resolved": "https://registry.npmjs.org/continuation-local-storage/-/continuation-local-storage-3.2.1.tgz", 109 | "integrity": "sha1-EfYT906RT+mzTJKtLSj+auHbf/s=", 110 | "requires": { 111 | "async-listener": "0.6.9", 112 | "emitter-listener": "1.1.1" 113 | } 114 | }, 115 | "cycle": { 116 | "version": "1.0.3", 117 | "resolved": "https://registry.npmjs.org/cycle/-/cycle-1.0.3.tgz", 118 | "integrity": "sha1-IegLK+hYD5i0aPN5QwZisEbDStI=" 119 | }, 120 | "emitter-listener": { 121 | "version": "1.1.1", 122 | "resolved": "https://registry.npmjs.org/emitter-listener/-/emitter-listener-1.1.1.tgz", 123 | "integrity": "sha1-6Lu+gkS8jg0LTvcc0UKUx/JBx+w=", 124 | "requires": { 125 | "shimmer": "1.2.0" 126 | } 127 | }, 128 | "events": { 129 | "version": "1.1.1", 130 | "resolved": "https://registry.npmjs.org/events/-/events-1.1.1.tgz", 131 | "integrity": "sha1-nr23Y1rQmccNzEwqH1AEKI6L2SQ=", 132 | "dev": true 133 | }, 134 | "eyes": { 135 | "version": "0.1.8", 136 | "resolved": "https://registry.npmjs.org/eyes/-/eyes-0.1.8.tgz", 137 | "integrity": "sha1-Ys8SAjTGg3hdkCNIqADvPgzCC8A=" 138 | }, 139 | "ieee754": { 140 | "version": "1.1.8", 141 | "resolved": "https://registry.npmjs.org/ieee754/-/ieee754-1.1.8.tgz", 142 | "integrity": "sha1-vjPUCsEO8ZJnAfbwii2G+/0a0+Q=", 143 | "dev": true 
144 | }, 145 | "isarray": { 146 | "version": "1.0.0", 147 | "resolved": "https://registry.npmjs.org/isarray/-/isarray-1.0.0.tgz", 148 | "integrity": "sha1-u5NdSFgsuhaMBoNJV6VKPgcSTxE=", 149 | "dev": true 150 | }, 151 | "isstream": { 152 | "version": "0.1.2", 153 | "resolved": "https://registry.npmjs.org/isstream/-/isstream-0.1.2.tgz", 154 | "integrity": "sha1-R+Y/evVa+m+S4VAOaQ64uFKcCZo=" 155 | }, 156 | "jmespath": { 157 | "version": "0.15.0", 158 | "resolved": "https://registry.npmjs.org/jmespath/-/jmespath-0.15.0.tgz", 159 | "integrity": "sha1-o/Iiqarp+Wb10nx5ZRDigJF2Qhc=", 160 | "dev": true 161 | }, 162 | "lodash": { 163 | "version": "4.17.5", 164 | "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.5.tgz", 165 | "integrity": "sha1-maktZcAnLevoyWtgV7yPv6O+1RE=", 166 | "dev": true 167 | }, 168 | "moment": { 169 | "version": "2.22.1", 170 | "resolved": "https://registry.npmjs.org/moment/-/moment-2.22.1.tgz", 171 | "integrity": "sha512-shJkRTSebXvsVqk56I+lkb2latjBs8I+pc2TzWc545y2iFnSjm7Wg0QMh+ZWcdSLQyGEau5jI8ocnmkyTgr9YQ==" 172 | }, 173 | "pkginfo": { 174 | "version": "0.4.1", 175 | "resolved": "https://registry.npmjs.org/pkginfo/-/pkginfo-0.4.1.tgz", 176 | "integrity": "sha1-tUGO8EOd5UJfxJlQQtztFPsqhP8=" 177 | }, 178 | "punycode": { 179 | "version": "1.3.2", 180 | "resolved": "https://registry.npmjs.org/punycode/-/punycode-1.3.2.tgz", 181 | "integrity": "sha1-llOgNvt8HuQjQvIyXM7v6jkmxI0=", 182 | "dev": true 183 | }, 184 | "querystring": { 185 | "version": "0.2.0", 186 | "resolved": "https://registry.npmjs.org/querystring/-/querystring-0.2.0.tgz", 187 | "integrity": "sha1-sgmEkgO7Jd+CDadW50cAWHhSFiA=", 188 | "dev": true 189 | }, 190 | "randomstring": { 191 | "version": "1.1.5", 192 | "resolved": "https://registry.npmjs.org/randomstring/-/randomstring-1.1.5.tgz", 193 | "integrity": "sha1-bfBij3XL1ZMpMNn+OrTpVqGFGMM=", 194 | "requires": { 195 | "array-uniq": "1.0.2" 196 | } 197 | }, 198 | "sax": { 199 | "version": "1.2.1", 200 | "resolved": "https://registry.npmjs.org/sax/-/sax-1.2.1.tgz", 201 | "integrity": "sha1-e45lYZCyKOgaZq6nSEgNgozS03o=", 202 | "dev": true 203 | }, 204 | "semver": { 205 | "version": "5.5.0", 206 | "resolved": "https://registry.npmjs.org/semver/-/semver-5.5.0.tgz", 207 | "integrity": "sha1-3Eu8emyp2Rbe5dQ1FvAJK1j3uKs=" 208 | }, 209 | "serverless-apigw-binary": { 210 | "version": "0.4.4", 211 | "resolved": "https://registry.npmjs.org/serverless-apigw-binary/-/serverless-apigw-binary-0.4.4.tgz", 212 | "integrity": "sha1-88/9g2UyKiodi95f2Uswa3pk1Ps=", 213 | "dev": true 214 | }, 215 | "serverless-plugin-tracing": { 216 | "version": "2.0.0", 217 | "resolved": "https://registry.npmjs.org/serverless-plugin-tracing/-/serverless-plugin-tracing-2.0.0.tgz", 218 | "integrity": "sha1-32uLMWasm7cKN8f8h1AUsjaRWPY=", 219 | "dev": true 220 | }, 221 | "shimmer": { 222 | "version": "1.2.0", 223 | "resolved": "https://registry.npmjs.org/shimmer/-/shimmer-1.2.0.tgz", 224 | "integrity": "sha1-+Wb3VVeJdj502IQRk2haXnhzZmU=" 225 | }, 226 | "stack-trace": { 227 | "version": "0.0.10", 228 | "resolved": "https://registry.npmjs.org/stack-trace/-/stack-trace-0.0.10.tgz", 229 | "integrity": "sha1-VHxws0fo0ytOEI6hoqFZ5f3eGcA=" 230 | }, 231 | "underscore": { 232 | "version": "1.9.0", 233 | "resolved": "https://registry.npmjs.org/underscore/-/underscore-1.9.0.tgz", 234 | "integrity": "sha512-4IV1DSSxC1QK48j9ONFK1MoIAKKkbE8i7u55w2R6IqBqbT7A/iG7aZBCR2Bi8piF0Uz+i/MG1aeqLwl/5vqF+A==" 235 | }, 236 | "url": { 237 | "version": "0.10.3", 238 | "resolved": 
"https://registry.npmjs.org/url/-/url-0.10.3.tgz", 239 | "integrity": "sha1-Ah5NnHcF8hu/N9A861h2dAJ3TGQ=", 240 | "dev": true, 241 | "requires": { 242 | "punycode": "1.3.2", 243 | "querystring": "0.2.0" 244 | } 245 | }, 246 | "uuid": { 247 | "version": "3.1.0", 248 | "resolved": "https://registry.npmjs.org/uuid/-/uuid-3.1.0.tgz", 249 | "integrity": "sha512-DIWtzUkw04M4k3bf1IcpS2tngXEL26YUD2M0tMDUpnUrz2hgzUBlD55a4FjdLGPvfHxS6uluGWvaVEqgBcVa+g==", 250 | "dev": true 251 | }, 252 | "winston": { 253 | "version": "2.4.2", 254 | "resolved": "https://registry.npmjs.org/winston/-/winston-2.4.2.tgz", 255 | "integrity": "sha512-4S/Ad4ZfSNl8OccCLxnJmNISWcm2joa6Q0YGDxlxMzH0fgSwWsjMt+SmlNwCqdpaPg3ev1HKkMBsIiXeSUwpbA==", 256 | "requires": { 257 | "async": "1.0.0", 258 | "colors": "1.0.3", 259 | "cycle": "1.0.3", 260 | "eyes": "0.1.8", 261 | "isstream": "0.1.2", 262 | "stack-trace": "0.0.10" 263 | } 264 | }, 265 | "xml2js": { 266 | "version": "0.4.17", 267 | "resolved": "https://registry.npmjs.org/xml2js/-/xml2js-0.4.17.tgz", 268 | "integrity": "sha1-F76T6q4/O3eTWceVtBlwWogX6Gg=", 269 | "dev": true, 270 | "requires": { 271 | "sax": "1.2.1", 272 | "xmlbuilder": "4.2.1" 273 | } 274 | }, 275 | "xmlbuilder": { 276 | "version": "4.2.1", 277 | "resolved": "https://registry.npmjs.org/xmlbuilder/-/xmlbuilder-4.2.1.tgz", 278 | "integrity": "sha1-qlijBBoGb5DqoWwvU4n/GfP0YaU=", 279 | "dev": true, 280 | "requires": { 281 | "lodash": "4.17.5" 282 | } 283 | } 284 | } 285 | } 286 | -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "dynamodb-event-store", 3 | "version": "1.0.0", 4 | "description": "Sample dynamodb based event store", 5 | "repository": { 6 | "type": "git", 7 | "url": "git+https://github.com/alessandrobologna/dynamodb-event-store.git" 8 | }, 9 | "author": "Alessandro Bologna", 10 | "license": "ISC", 11 | "bugs": { 12 | "url": "https://github.com/alessandrobologna/dynamodb-event-store/issues" 13 | }, 14 | "homepage": "https://github.com/alessandrobologna/dynamodb-event-store#readme", 15 | "dependencies": { 16 | "aws-xray-sdk": "^1.2.0", 17 | "randomstring": "^1.1.5" 18 | }, 19 | "devDependencies": { 20 | "aws-sdk": "^2.205.0", 21 | "serverless-apigw-binary": "^0.4.4", 22 | "serverless-plugin-tracing": "^2.0.0" 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /serverless.yml: -------------------------------------------------------------------------------- 1 | service: 2 | name: ${{env:SERVICE, 'event-store'}} 3 | 4 | provider: 5 | name: aws 6 | runtime: nodejs8.10 7 | variableSyntax: "\\${{([\\s\\S]+?)}}" 8 | stage: ${{opt:stage, 'dev'}} 9 | tracing: true 10 | environment: 11 | EVENT_STREAM: ${{self:service}}-${{self:provider.stage}}-stream 12 | PLAYBACK_STREAM: ${{self:service}}-${{self:provider.stage}}-playback 13 | S3_BUCKET: ${{self:service}}-${{self:provider.stage}}-bucket 14 | DYNAMO_BUFFER_TABLE: ${{self:service}}-${{self:provider.stage}}-buffer-table 15 | DYNAMO_EVENT_TABLE: ${{self:service}}-${{self:provider.stage}}-event-table 16 | iamRoleStatements: 17 | - Effect: Allow 18 | Action: 19 | - dynamodb:* 20 | Resource: "*" 21 | - Effect: Allow 22 | Action: 23 | - lambda:InvokeFunction 24 | Resource: "*" 25 | - Effect: Allow 26 | Action: 27 | - xray:PutTelemetryRecords 28 | - xray:PutTraceSegments 29 | Resource: "*" 30 | - Effect: "Allow" 31 | Action: 32 | - "kinesis:PutRecord" 33 | - "kinesis:PutRecords" 34 | 
Resource: 35 | - Fn::GetAtt: 36 | - EventStream 37 | - Arn 38 | - Fn::GetAtt: 39 | - PlaybackStream 40 | - Arn 41 | plugins: 42 | - serverless-plugin-tracing 43 | 44 | functions: 45 | playback: 46 | handler: src/main/js/lambda/playback/index.handler 47 | timeout: 300 48 | memorySize: 1500 49 | 50 | link: 51 | handler: src/main/js/lambda/link/index.handler 52 | timeout: 300 53 | memorySize: 1500 54 | 55 | load: 56 | handler: src/main/js/lambda/load/index.handler 57 | events: 58 | - http: 59 | path: claps/create 60 | method: post 61 | integration: lambda 62 | request: 63 | template: 64 | application/json: '{ "body" : $input.json(''$'') }' 65 | 66 | scan: 67 | handler: src/main/js/lambda/scan/index.handler 68 | timeout: 300 69 | memorySize: 1500 70 | events: 71 | - schedule: rate(1 minute) 72 | pump: 73 | handler: src/main/js/lambda/pump/index.handler 74 | timeout: 300 75 | memorySize: 1500 76 | events: 77 | - stream: 78 | type: kinesis 79 | arn: 80 | Fn::GetAtt: 81 | - EventStream 82 | - Arn 83 | batchSize: 500 84 | startingPosition: TRIM_HORIZON 85 | 86 | package: 87 | include: 88 | - src/main/js/lambda/** 89 | - node_modules/** 90 | exclude: 91 | - ./** 92 | 93 | resources: 94 | Description: DynamoDB Event Store 95 | Resources: 96 | EventStore: 97 | Type: 'AWS::DynamoDB::Table' 98 | Properties: 99 | TableName: ${{self:provider.environment.DYNAMO_EVENT_TABLE}} 100 | AttributeDefinitions: 101 | - AttributeName: event_time_slot 102 | AttributeType: S 103 | - AttributeName: event_id 104 | AttributeType: S 105 | - AttributeName: event_time_stamp 106 | AttributeType: S 107 | KeySchema: 108 | - AttributeName: event_time_slot 109 | KeyType: HASH 110 | - AttributeName: "event_id" 111 | KeyType: "RANGE" 112 | LocalSecondaryIndexes: 113 | - IndexName: "event_time_stamp" 114 | KeySchema: 115 | - AttributeName: "event_time_slot" 116 | KeyType: "HASH" 117 | - AttributeName: event_time_stamp 118 | KeyType: RANGE 119 | Projection: 120 | ProjectionType: ALL 121 | ProvisionedThroughput: 122 | ReadCapacityUnits: 10 123 | WriteCapacityUnits: 25 124 | EventBuffer: 125 | Type: 'AWS::DynamoDB::Table' 126 | Properties: 127 | TableName: ${{self:provider.environment.DYNAMO_BUFFER_TABLE}} 128 | AttributeDefinitions: 129 | - AttributeName: partition_key 130 | AttributeType: S 131 | KeySchema: 132 | - AttributeName: partition_key 133 | KeyType: HASH 134 | ProvisionedThroughput: 135 | ReadCapacityUnits: 25 136 | WriteCapacityUnits: 50 137 | EventStream: 138 | Type: AWS::Kinesis::Stream 139 | Properties: 140 | Name: ${{self:provider.environment.EVENT_STREAM}} 141 | ShardCount: 1 142 | PlaybackStream: 143 | Type: AWS::Kinesis::Stream 144 | Properties: 145 | Name: ${{self:provider.environment.PLAYBACK_STREAM}} 146 | ShardCount: 1 147 | 148 | Outputs: 149 | EventStream: 150 | Description: Event Stream name 151 | Value: 152 | 'Ref': EventStream 153 | EventStreamARN: 154 | Description: Event Stream ARN 155 | Value: 156 | 'Fn::GetAtt': 157 | - EventStream 158 | - Arn 159 | PlaybackStream: 160 | Description: Playback Stream name 161 | Value: 162 | 'Ref': PlaybackStream 163 | PlaybackStreamARN: 164 | Description: Playback Stream ARN 165 | Value: 166 | 'Fn::GetAtt': 167 | - PlaybackStream 168 | - Arn 169 | EventStore: 170 | Description: Event Store DynamoDB Table name 171 | Value: 172 | 'Ref': EventStore 173 | EventStoreARN: 174 | Description: Event Store DynamoDB Table ARN 175 | Value: 176 | 'Fn::GetAtt': 177 | - EventStore 178 | - Arn 179 | EventBuffer: 180 | Description: Event Buffer DynamoDB Table name 181 | Value: 182 | 
'Ref': EventBuffer 183 | EventBufferARN: 184 | Description: Event Buffer DynamoDB Table ARN 185 | Value: 186 | 'Fn::GetAtt': 187 | - EventBuffer 188 | - Arn -------------------------------------------------------------------------------- /src/main/js/lambda/link/index.js: -------------------------------------------------------------------------------- 1 | const AWS = require('aws-sdk'); 2 | const dynamoDb = new AWS.DynamoDB.DocumentClient({ 3 | region: process.env.AWS_REGION 4 | }); 5 | const lambda = new AWS.Lambda({ 6 | region: process.env.AWS_REGION 7 | }); 8 | const timeUnit = parseInt(process.env.TIME_UNIT || '1000'); 9 | 10 | /* 11 | * Periodically scan the event table, and link records to their successor 12 | 13 | 00:01:00 => {next : 00:01:10} 14 | 00:01:10 => {next : 00:01:15} 15 | 00:01:15 => {next : 00:01:18} 16 | 00:01:18 => {next : undefined} 17 | 18 | */ 19 | 20 | exports.handler = async (event, context, callback) => { 21 | event = event || {}; 22 | // if no event.start is provided, start from 15 minutes ago 23 | event.start = event.start || new Date(Math.floor(new Date() / timeUnit - 15 * 60) * timeUnit).toISOString(); 24 | let current = new Date(Date.parse(event.start)).toISOString(); 25 | 26 | // stop linking at the current timestamp 27 | event.end = event.end ? new Date(Date.parse(event.end)).toISOString() : new Date().toISOString(); 28 | 29 | console.log(`Starting event linking on ${process.env.DYNAMO_EVENT_TABLE} from ${event.start} to ${event.end}`); 30 | try { 31 | let event_time_slot = undefined; 32 | let previous = undefined; 33 | // leave this lambda before it times out, keeping a margin of five time units 34 | while (current < event.end && context.getRemainingTimeInMillis() > timeUnit * 5) { 35 | const params = { 36 | TableName: process.env.DYNAMO_EVENT_TABLE, 37 | KeyConditionExpression: 'event_time_slot = :event_time_slot', 38 | ExpressionAttributeValues: { 39 | ':event_time_slot': current 40 | }, 41 | Limit: 1 // we just need to retrieve the first record for each time slot 42 | }; 43 | 44 | const records = await dynamoDb.query(params).promise(); 45 | if (records.Items.length > 0) { 46 | const record = records.Items[0]; 47 | if (previous) { 48 | previous.next_slot = record.event_time_slot; 49 | const result = await dynamoDb.put({ 50 | TableName: process.env.DYNAMO_EVENT_TABLE, 51 | Item: previous 52 | }).promise(); 53 | console.log(`Linked : ${previous.event_time_slot} => ${record.event_time_slot}`); 54 | } 55 | previous = record; 56 | } else { 57 | console.log(`Skipped : ${current}`); 58 | } 59 | // set a new start key to extract more records 60 | current = new Date(Date.parse(current) + timeUnit).toISOString(); 61 | } 62 | console.log(`Processed ${(Date.parse(current) - Date.parse(event.start))/1000} time slots`); 63 | if (current < event.end) { 64 | // exited the loop because of an impending timeout 65 | console.log(`Invoking continuation lambda before timeout`); 66 | const result = await lambda.invoke({ 67 | FunctionName: context.functionName, 68 | InvocationType: 'Event', 69 | Payload: JSON.stringify({ 70 | 'start' : current, 71 | 'end' : event.end, 72 | 'continue' : true 73 | }), 74 | Qualifier: context.functionVersion 75 | }).promise(); 76 | console.log(`Invoked continuation lambda with start: ${current}`); 77 | } 78 | } catch (error) { 79 | console.error("An error occurred during linking", error); 80 | return callback(error); 81 | } 82 | callback(null, "Success"); 83 | }; 84 | 85 | -------------------------------------------------------------------------------- /src/main/js/lambda/load/index.js:
-------------------------------------------------------------------------------- 1 | 'use strict'; 2 | 3 | const AWSXRay = require('aws-xray-sdk-core'); 4 | const AWS = AWSXRay.captureAWS(require('aws-sdk')); 5 | const uuidv4 = require('uuid/v4'); 6 | const randomstring = require('randomstring'); 7 | 8 | const kinesis = new AWS.Kinesis({ 9 | region: process.env.AWS_REGION 10 | }); 11 | /* 12 | * Load events into Kinesis from API Gateway 13 | */ 14 | 15 | exports.handler = (event, context, callback) => { 16 | // generate a random message body if an empty one is provided 17 | if (event.body && Object.keys(event.body).length === 0) { 18 | event.body = { 19 | 'MessageType' : 'Claps', 20 | 'MemberId' : randomstring.generate({ 21 | length: 2, 22 | charset: 'hex' 23 | }), 24 | 'PostId' : randomstring.generate({ 25 | length: 2, 26 | charset: 'hex' 27 | }), 28 | get EntityId() { 29 | return `${this.MessageType}-${this.MemberId}-${this.PostId}`; 30 | } 31 | }; 32 | } 33 | let response = { 34 | statusCode: 200, 35 | headers: { 36 | 'Content-Type': 'application/json', 37 | 'Cache-Control': 'no-cache' 38 | }, 39 | body: event.body 40 | }; 41 | event['@timestamp'] = new Date().toISOString(); 42 | kinesis.putRecord({ 43 | Data: JSON.stringify(event), 44 | PartitionKey: event.body && event.body.EntityId || uuidv4(), // default to a random uuid if no EntityId is provided 45 | StreamName: process.env.EVENT_STREAM 46 | }).promise().then(data => { 47 | if (data.SequenceNumber) { 48 | response.headers['X-SequenceNumber'] = data.SequenceNumber; 49 | } 50 | callback(null, response); 51 | }).catch(error => { 52 | console.error('Kinesis error', error); 53 | callback(error); 54 | }); 55 | }; 56 | 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /src/main/js/lambda/playback/index.js: -------------------------------------------------------------------------------- 1 | const AWS = require('aws-sdk'); 2 | const dynamoDb = new AWS.DynamoDB.DocumentClient({ 3 | region: process.env.AWS_REGION 4 | }); 5 | const kinesis = new AWS.Kinesis({ 6 | region: process.env.AWS_REGION 7 | }); 8 | 9 | const timeUnit = parseInt(process.env.TIME_UNIT || '1000'); 10 | 11 | /* 12 | * Load events from the DynamoDB event store into the playback Kinesis stream 13 | */ 14 | 15 | exports.handler = async (event, context, callback) => { 16 | event = event || {}; 17 | // if no event.start is provided, start from 15 minutes ago 18 | event.start = event.start || new Date(Math.floor(new Date() / timeUnit - 15 * 60) * timeUnit).toISOString(); 19 | 20 | let start = new Date(Date.parse(event.start)).toISOString(); 21 | let end = event.end ?
new Date(Date.parse(event.end)).toISOString() : new Date().toISOString(); 22 | console.log(`Starting event playback on ${process.env.DYNAMO_EVENT_TABLE} from ${event.start} to ${end}`); 23 | try { 24 | while (start < end) { 25 | const params = { 26 | TableName: process.env.DYNAMO_EVENT_TABLE, 27 | KeyConditionExpression: 'event_time_slot = :event_time_slot', 28 | ExpressionAttributeValues: { ':event_time_slot': start }, 29 | Limit: 500 // max number of records in a kinesis put operation 30 | }; 31 | // make sure all records are played back 32 | while (true) { 33 | const records = await dynamoDb.query(params).promise(); 34 | if (records.Items.length > 0) { 35 | let response = { 36 | SequenceNumber: 0 37 | }; 38 | for (let record of records.Items) { 39 | const data = JSON.stringify(record.record_payload.kinesis.data); 40 | while (true) { 41 | try { 42 | response = await kinesis.putRecord({ 43 | StreamName: process.env.PLAYBACK_STREAM, Data: data, 44 | PartitionKey: record.record_payload.kinesis.partitionKey, 45 | SequenceNumberForOrdering: response.SequenceNumber 46 | }).promise(); 47 | break; 48 | } catch (error) { 49 | console.log(error); 50 | continue; 51 | } 52 | } 53 | } 54 | console.log(`${records.Items[0].event_time_slot}: ${records.Items[0].event_time_stamp} [${records.Items.length}]`); 55 | if (!records.LastEvaluatedKey) { 56 | // no more records for this time slot 57 | break; 58 | } 59 | // set a new start key to extract more records 60 | params.ExclusiveStartKey = records.LastEvaluatedKey; 61 | 62 | } else { 63 | console.log(start); 64 | break; 65 | } 66 | } 67 | start = new Date(Date.parse(start) + timeUnit).toISOString(); 68 | } 69 | } catch (error) { 70 | console.error("An error occurred during playback", error); 71 | return callback(error); 72 | } 73 | callback(null, "Success"); 74 | }; 75 | 76 | -------------------------------------------------------------------------------- /src/main/js/lambda/playback/test.json: -------------------------------------------------------------------------------- 1 | { 2 | "start" : "2018-03-16T19:56:02.000Z", 3 | "end" : "2018-03-16T19:56:32.000Z" 4 | } -------------------------------------------------------------------------------- /src/main/js/lambda/pump/index.js: -------------------------------------------------------------------------------- 1 | const AWSXRay = require('aws-xray-sdk'); 2 | const crypto = require('crypto'); 3 | const AWS = AWSXRay.captureAWS(require('aws-sdk')); 4 | const dynamoDb = new AWS.DynamoDB.DocumentClient({ 5 | region: process.env.AWS_REGION 6 | }); 7 | 8 | /* 9 | * Push events from Kinesis into the DynamoDB buffer table with a well distributed partition key 10 | */ 11 | exports.handler = (event, context, callback) => { 12 | console.log("Processing " + event.Records.length + " records"); 13 | const promises = event.Records.map(record => { 14 | // use a sha256 hash of the eventSourceARN and the eventID as the partition key 15 | const item = { 16 | partition_key: crypto.createHash('sha256').update(record.eventSourceARN + record.eventID).digest("hex"), 17 | event_id: record.eventID, 18 | event_time_stamp: Math.ceil(record.kinesis.approximateArrivalTimestamp * 1000), 19 | record_payload: record 20 | }; 21 | return new Promise((resolve, reject) => { 22 | dynamoDb.put({ 23 | TableName: process.env.DYNAMO_BUFFER_TABLE, 24 | Item: item 25 | }, (error, result) => { 26 | if (error) { 27 | return reject(error); 28 | } 29 | resolve(item.partition_key); 30 | }) 31 | }); 32 | }); 33 | 34 | Promise.all(promises).then(results => { 35 | console.log("Consumed " + results.length + " records from kinesis"); 36 |
console.log(JSON.stringify(results)); 37 | callback(null, "Success"); 38 | }).catch(function (err) { 39 | console.error('A promise failed to resolve', err); 40 | callback(err); 41 | }); 42 | }; 43 | -------------------------------------------------------------------------------- /src/main/js/lambda/scan/index.js: -------------------------------------------------------------------------------- 1 | const AWSXRay = require('aws-xray-sdk'); 2 | const AWS = AWSXRay.captureAWS(require('aws-sdk')); 3 | const dynamoDb = new AWS.DynamoDB.DocumentClient({ 4 | region: process.env.AWS_REGION 5 | }); 6 | const lambda = new AWS.Lambda({ 7 | region: process.env.AWS_REGION 8 | }); 9 | 10 | const timeUnit = parseInt(process.env.TIME_UNIT || '1000'); 11 | const decodePayload = 'true' === (process.env.DECODE_PAYLOAD || 'true'); 12 | 13 | /* 14 | * Periodically scan the buffer table and store any new record into the event table 15 | */ 16 | exports.handler = async (event, context, callback) => { 17 | const params = { 18 | TableName: process.env.DYNAMO_BUFFER_TABLE, 19 | Limit: 500 20 | }; 21 | if (event.LastEvaluatedKey) { 22 | console.log('Resuming execution from ' + JSON.stringify(event.LastEvaluatedKey)); 23 | params.ExclusiveStartKey = event.LastEvaluatedKey; 24 | } 25 | 26 | try { 27 | const response = await dynamoDb.scan(params).promise(); 28 | if (response.LastEvaluatedKey) { 29 | console.log('There are more records to process, invoking another execution'); 30 | event.LastEvaluatedKey = response.LastEvaluatedKey; 31 | await lambda.invoke({ 32 | FunctionName: context.functionName, 33 | InvocationType: 'Event', 34 | Payload: JSON.stringify(event), 35 | Qualifier: context.functionVersion 36 | }).promise(); 37 | } 38 | if (response.Items.length > 0) { 39 | for (let record of response.Items) { 40 | if (decodePayload) { 41 | try { 42 | record.record_payload.kinesis.data = JSON.parse(Buffer.from(record.record_payload.kinesis.data, 'base64')); 43 | } catch (e) { 44 | console.warn('Could not decode the payload'); 45 | // it wasn't JSON after all 46 | } 47 | } 48 | // create an item whose partition key is the slot timestamp and whose range key is the full timestamp 49 | console.log('putting'); 50 | await dynamoDb.put({ 51 | TableName: process.env.DYNAMO_EVENT_TABLE, 52 | Item: { 53 | event_time_slot: new Date(Math.floor(record.event_time_stamp / timeUnit) * timeUnit).toISOString(), 54 | event_time_stamp: new Date(record.event_time_stamp).toISOString(), 55 | event_id: record.event_id, 56 | record_payload: record.record_payload 57 | } 58 | }).promise(); 59 | // if the put call fails, an exception will be thrown and the delete method will not be invoked 60 | console.log('deleting'); 61 | await dynamoDb.delete({ 62 | TableName: process.env.DYNAMO_BUFFER_TABLE, 63 | Key: { 64 | partition_key: record.partition_key 65 | } 66 | }).promise(); 67 | } 68 | console.log(`Processed ${response.Items.length} records`); 69 | } else { 70 | console.log('Buffer table is empty, nothing to do'); 71 | } 72 | return callback(null, 'Success'); 73 | } catch (e) { 74 | console.error('Exception during execution:', e); 75 | callback(e); 76 | } 77 | }; 78 | --------------------------------------------------------------------------------