├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.rst
├── pg2kinesis
    ├── __init__.py
    ├── __main__.py
    ├── formatter.py
    ├── log.py
    ├── slot.py
    └── stream.py
├── requirements.txt
├── setup.cfg
├── setup.py
└── tests
    ├── __init__.py
    ├── test___main__.py
    ├── test_formatter.py
    ├── test_slot.py
    └── test_stream.py


/.gitignore:
--------------------------------------------------------------------------------
 1 | *.pyc
 2 | .cache
 3 | .coverage/
 4 | .eggs/
 5 | .idea/
 6 | build/
 7 | dist/
 8 | pg2kinesis.egg-info/
 9 | tests/__pycache__/
10 | *.DS_Store
11 | env/
12 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | dist: trusty
 2 | language: python
 3 | 
 4 | branches:
 5 |   only:
 6 |     - master
 7 | 
 8 | 
 9 | sudo: false
10 | 
11 | python:
12 |   - "2.7"
13 |   - "3.3"
14 |   - "3.4"
15 |   - "3.5"
16 |   - "3.6"
17 | 
18 | install: pip install -r requirements.txt
19 | 
20 | script: pytest
21 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 Handshake Corp
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include *.rst *.txt LICENSE .travis.yml
2 | recursive-include tests *.py
3 | 


--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
  1 | ==========
  2 | pg2kinesis
  3 | ==========
  4 | 
  5 | 
  6 | pg2kinesis uses `logical decoding
  7 | <https://www.postgresql.org/docs/9.4/static/logicaldecoding.html>`_
  8 | in Postgres 9.4 or later to capture a consistent, continuous stream of events
  9 | from the database and publishes them to an `AWS Kinesis <https://aws.amazon.com/kinesis/>`_
 10 | stream in a format of your choosing.
 11 | 
 12 | It does this without requiring any changes to your schema like triggers or
 13 | "shadow" columns or tables, and has a negligible impact on database performance.
 14 | This is done while being extremely fault tolerant. No data loss will be incurred
 15 | on any type of underlying system failure including process crashes, network
 16 | outages, or ec2 instance failures. However, in these situations there will
 17 | likely be records that are sent more than once, so your consumer should be
 18 | designed with this in mind.
 19 | 
 20 | The fault tolerance comes from guarantees provided by the underlying
 21 | technologies and from the "2-phase commit" style of publishing inherent to the
 22 | design of the program. Changes are first peeked from the replication slot and
 23 | published to Kinesis. Once Kinesis successfully recieves a batch of records, we
 24 | advance the `xmin <https://www.postgresql.org/docs/9.4/static/catalog-pg-replication-slots.html>`_
 25 | of the slot, thereby telling postgres it is safe to reclaim the space taken by
 26 | the WAL. As is always the case with logical replication slots, unacknowledged
 27 | data on the slot will consume disk on the database until it is read.
 28 | 
 29 | There are other utilities that do similar things, often by injecting a C library
 30 | into Postgres to do data transformations in place. Unfortunately these
 31 | approaches are not suitable for managed databases like AWS' RDS where support
 32 | for various plugins is limited and ultimately determined by the hosting provider.
 33 | We specifically created pg2kinesis to make use of logical decoding on
 34 | `Amazon's RDS for PostgreSQL <https://aws.amazon.com/rds/postgresql/>`_. Amazon
 35 | supports logical decoding with either the `test_decoding <https://www.postgresql.org/docs/9.4/static/test-decoding.html>`_
 36 | or `wal2json <https://aws.amazon.com/about-aws/whats-new/2017/07/amazon-rds-for-postgresql-supports-new-minor-versions-9-6-3-and-9-5-7-and-9-4-12-and-9-3-17/>`_
 37 | output plugins. This utility takes the output of either plugin, transforms it
 38 | based on a formatter you can define, guarantees publishing to a Kinesis stream
 39 | in *transaction commit time order* and with a guarantee that *no data will be lost*.
 40 | 
 41 | Installation
 42 | ------------
 43 | 
 44 | Prerequisites
 45 | ^^^^^^^^^^^^^
 46 | 
 47 |  #. Python 2.7*, 3.3+
 48 |  #. AWS-CLI installed and configured
 49 |  #. A PostgreSQL 9.4+ server with logical replication enabled
 50 |  #. A Kinesis stream
 51 | 
 52 | Install
 53 | ^^^^^^^
 54 | 
 55 |  ``pip install pg2kinesis``
 56 | 
 57 | 
 58 | Tests
 59 | -----
 60 | 
 61 | To run tests you will need a clone of the repo and have to install some additional requirements:
 62 | 
 63 |  #. ``git clone git@github.com:handshake/pg2kinesis.git``
 64 |  #. ``cd pg2kinesis``
 65 |  #. ``pip install -r requirements.txt``
 66 |  #. ``(cd tests && pytest)``
 67 | 
 68 | 
 69 | Usage
 70 | -----
 71 | 
 72 | Run ``pg2kinesis --help`` to get a list of the latest command line options.
 73 | 
 74 | By default pg2kinesis attempts to connect to a local postgres instance and
 75 | publish to a stream named ``pg2kinesis`` using the AWS credentials in the
 76 | environment the utility was invoked in.
 77 | 
 78 | On successful start it will query your database for the primary key definitions
 79 | of every table in ``--pg-dbname``. This is used to identify the correct column
 80 | in the test_decoding output to publish. If a table does not have a primary key
 81 | its changes will **NOT** be published unless using wal2json and ``--full-change``.
 82 | 
 83 | You have the choice for 3 different textual formats that will be sent to the
 84 | kinesis stream:
 85 | 
 86 | * ``CSV``: outputs strings to Kinesis that look like::
 87 | 
 88 |     0,CDC,<transaction_id (xid)>,<table name>,<dml operation:DELETE|INSERT|UPDATE>,<primary key of row>
 89 | 
 90 | * ``CSVPayload``: outputs similar to the above except the 3rd column is now a
 91 |   json object representing the change.
 92 | 
 93 |   .. code-block::
 94 | 
 95 |       0,CDC,{
 96 |         "xid": <transaction_id>
 97 |         "table": <...>
 98 |         "operation: <...>
 99 |         "pkey": <...>
100 |       }
101 | 
102 | * If ``wal2json`` is being used, this can either be the primary key as above or
103 |   the full changed row.
104 | 
105 |   .. code-block::
106 | 
107 |       0,CDC,{
108 |         "xid": 30355,
109 |         "change": {
110 |           "kind": "insert",
111 |           "columnnames": ["a", "b"],
112 |           "columntypes": ["int4", "int4"],
113 |           "table": "foo",
114 |           "columnvalues": [1, null],
115 |           "schema": "public"
116 |         }
117 |       }
118 | 
119 | 
120 | Shout Outs
121 | ----------
122 | 
123 | pg2kinesis is based on the ideas of others including:
124 | 
125 | * Logical Decoding: a new world of data exchange applications for Postgres SQL
126 |   [(`slides <https://www.slideshare.net/8kdata/postgresql-logical-decoding/>`_)]
127 | * psycopg2 [(`main <http://initd.org/psycopg/>`_]) [(`repo
128 |   <https://github.com/psycopg/psycopg2/>`__)]
129 | * bottledwater-pg [(`blog <https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka>`_)] [(`repo <https://github.com/confluentinc/bottledwater-pg/>`__)]
130 | * wal2json [(`repo <https://github.com/eulerto/wal2json/>`__)]
131 | 
132 | 
133 | Future Road Map
134 | ---------------
135 | 
136 | * Support full change output from test_decoding plugin
137 | * Allow HUPing to notify utility to regenerate primary key cache
138 | * Support above on a schedule specified via commandline with sensible default of once an hour.
139 | 


--------------------------------------------------------------------------------
/pg2kinesis/__init__.py:
--------------------------------------------------------------------------------
 1 | __version__ = '0.7.0'
 2 | 
 3 | __uri__ = 'https://github.com/handshake/pg2kinesis'
 4 | __title__ = 'pg2kinesis'
 5 | __description__ = 'A utility that enables PostgreSQL 9.4+ to logically replicate to Amazon''s Kinesis'
 6 | __doc__ = __description__ + ' <' + __uri__ + '>'
 7 | 
 8 | __author__ = 'Shon T. Urbas, Geoff Johnson'
 9 | __email__ = 'shon.urbas@gmail.com, geoff.johnson@handshake.com'
10 | 
11 | __license__ = 'MIT'
12 | __copyright__ = 'Copyright (c) 2016 Handshake Corp.'
13 | 


--------------------------------------------------------------------------------
/pg2kinesis/__main__.py:
--------------------------------------------------------------------------------
  1 | from __future__ import division
  2 | import time
  3 | 
  4 | import click
  5 | 
  6 | from .slot import SlotReader
  7 | from .formatter import get_formatter
  8 | from .stream import StreamWriter
  9 | from .log import logger
 10 | 
 11 | 
 12 | SUPPORTED_OPERATIONS = ['update', 'insert', 'delete', 'truncate']
 13 | 
 14 | @click.command()
 15 | @click.option('--pg-dbname', '-d', help='Database to connect to.')
 16 | @click.option('--pg-host', '-h', default='',
 17 |               help='Postgres server location. Leave empty if localhost.')
 18 | @click.option('--pg-port', '-p', default='5432', help='Postgres port.')
 19 | @click.option('--pg-user', '-u', help='Postgres user')
 20 | @click.option('--pg-sslmode', help='Postgres SSL mode', default='prefer')
 21 | @click.option('--pg-slot-name', '-s', default='pg2kinesis',
 22 |               help='Postgres replication slot name.')
 23 | @click.option('--pg-slot-output-plugin', default='test_decoding',
 24 |               type=click.Choice(['test_decoding', 'wal2json']),
 25 |               help='Postgres replication slot output plugin')
 26 | @click.option('--stream-name', '-k', default='pg2kinesis',
 27 |               help='Kinesis stream name.')
 28 | @click.option('--message-formatter', '-f', default='CSVPayload',
 29 |               type=click.Choice(['CSVPayload', 'CSV']),
 30 |               help='Kinesis record formatter.')
 31 | @click.option('--table-pat', help='Optional regular expression for table names.')
 32 | @click.option('--full-change', default=False, is_flag=True,
 33 |               help='Emit all columns of a changed row.')
 34 | @click.option('--create-slot', default=False, is_flag=True,
 35 |               help='Attempt to on start create a the slot.')
 36 | @click.option('--recreate-slot', default=False, is_flag=True,
 37 |               help='Deletes the slot on start if it exists and then creates.')
 38 | @click.option('--operations', default='all', type=click.Choice(['all'] + SUPPORTED_OPERATIONS),
 39 |               multiple=True, help = 'Which operations to replicate to kinesis, Default: all')
 40 | def main(pg_dbname, pg_host, pg_port, pg_user, pg_sslmode, pg_slot_name, pg_slot_output_plugin,
 41 |          stream_name, message_formatter, table_pat, operations, full_change, create_slot, recreate_slot):
 42 |     if 'all' in operations:
 43 |         operations = SUPPORTED_OPERATIONS
 44 | 
 45 |     if full_change:
 46 |         assert message_formatter == 'CSVPayload', 'Full changes must be formatted as JSON.'
 47 |         assert pg_slot_output_plugin == 'wal2json', 'Full changes must use wal2json.'
 48 | 
 49 |     logger.info('Starting pg2kinesis replicating the following operations: %s', ','.join(operations))
 50 |     logger.info('Getting kinesis stream writer')
 51 |     writer = StreamWriter(stream_name)
 52 | 
 53 |     with SlotReader(pg_dbname, pg_host, pg_port, pg_user, pg_sslmode, pg_slot_name,
 54 |                     pg_slot_output_plugin) as reader:
 55 | 
 56 |         if recreate_slot:
 57 |             reader.delete_slot()
 58 |             reader.create_slot()
 59 |         elif create_slot:
 60 |             reader.create_slot()
 61 | 
 62 |         pk_map = reader.primary_key_map
 63 |         formatter = get_formatter(message_formatter, pk_map,
 64 |                                   pg_slot_output_plugin, full_change, table_pat)
 65 | 
 66 |         consume = Consume(formatter, writer, operations)
 67 | 
 68 |         # Blocking. Responds to Control-C.
 69 |         reader.process_replication_stream(consume)
 70 | 
 71 | class Consume(object):
 72 |     def __init__(self, formatter, writer, filter_operations):
 73 |         self.cum_msg_count = 0
 74 |         self.cum_msg_size = 0
 75 |         self.msg_window_size = 0
 76 |         self.msg_window_count = 0
 77 |         self.cur_window = 0
 78 | 
 79 |         self.formatter = formatter
 80 |         self.writer = writer
 81 |         self.filter_operations = filter_operations
 82 | 
 83 |     def should_send_to_kinesis(self, fmt_msg):
 84 |         return fmt_msg.change.operation in self.filter_operations
 85 | 
 86 |     def __call__(self, change):
 87 |         self.cum_msg_count += 1
 88 |         self.cum_msg_size += change.data_size
 89 | 
 90 |         self.msg_window_size += change.data_size
 91 |         self.msg_window_count += 1
 92 | 
 93 |         fmt_msgs = self.formatter(change.payload)
 94 | 
 95 |         progress_msg = 'xid: {:12} win_count:{:>10} win_size:{:>10}mb cum_count:{:>10} cum_size:{:>10}mb'
 96 | 
 97 |         for fmt_msg in fmt_msgs:
 98 |             if not self.should_send_to_kinesis(fmt_msg):
 99 |                 fmt_msg = None
100 | 
101 |             did_put = self.writer.put_message(fmt_msg)
102 |             if did_put:
103 |                 change.cursor.send_feedback(flush_lsn=change.data_start)
104 |                 logger.info('Flushed LSN: {}'.format(change.data_start))
105 | 
106 |             int_time = int(time.time())
107 |             if not int_time % 10 and int_time != self.cur_window:
108 |                 logger.info(progress_msg.format(
109 |                     self.formatter.cur_xact, self.msg_window_count,
110 |                     self.msg_window_size / 1048576, self.cum_msg_count,
111 |                     self.cum_msg_size / 1048576))
112 | 
113 |                 self.cur_window = int_time
114 |                 self.msg_window_size = 0
115 |                 self.msg_window_count = 0
116 | 
117 | if __name__ == '__main__':
118 |     main()
119 | 


--------------------------------------------------------------------------------
/pg2kinesis/formatter.py:
--------------------------------------------------------------------------------
  1 | from __future__ import unicode_literals
  2 | 
  3 | import json
  4 | import re
  5 | import sys
  6 | 
  7 | from .log import logger
  8 | 
  9 | from collections import namedtuple
 10 | 
 11 | # Tuples representing changes as pulled from database
 12 | Change = namedtuple('Change', 'xid, table, operation, pkey')
 13 | FullChange = namedtuple('FullChange', 'xid, change')
 14 | FullChange.operation = property(lambda self: self.change.get('kind'))
 15 | 
 16 | # Final product of Formatter, a Change and the Change formatted.
 17 | Message = namedtuple('Message', 'change, fmt_msg')
 18 | 
 19 | COL_TYPE_VALUE_TEMPLATE_PAT = r"{col_name}\[{col_type}\]:'?([\w\-]+)'?"
 20 | MISSING_TABLE_ERR = 'Unable to locate table: "{}"'
 21 | MISSING_PK_ERR = 'Unable to locate primary key for table "{}"'
 22 | 
 23 | class Formatter(object):
 24 |     VERSION = 0
 25 |     TYPE = 'CDC'
 26 |     IGNORED_CHANGES = {'COMMIT'}
 27 | 
 28 |     def __init__(self, primary_key_map, output_plugin='test_decoding',
 29 |                  full_change=False, table_pat=None):
 30 | 
 31 |         self._primary_key_patterns = {}
 32 |         self.output_plugin = output_plugin
 33 |         self.primary_key_map = primary_key_map
 34 |         self.full_change = full_change
 35 |         self.table_pat = table_pat if table_pat is not None else r'[\w_\.]+'
 36 |         self.table_re = re.compile(self.table_pat)
 37 |         self.cur_xact = ''
 38 | 
 39 |         for k, v in getattr(primary_key_map, 'iteritems', primary_key_map.items)():
 40 |             # ":" added to make later look up not need to trim trailing ":".
 41 |             self._primary_key_patterns[k + ":"] = re.compile(
 42 |                 COL_TYPE_VALUE_TEMPLATE_PAT.format(col_name=v.col_name, col_type=v.col_type)
 43 |             )
 44 | 
 45 |     def _preprocess_test_decoding_change(self, change):
 46 |         """
 47 |         Takes a message payload from the test_decoding plugin and distills it
 48 |         into a Change tuple currently only looking for primary key.
 49 | 
 50 |         They look like this:
 51 |             "table table_test: UPDATE: uuid[uuid]:'00079f3e-0479-4475-acff-4f225cc5188a' another_col[text]'bling'"
 52 | 
 53 |         :param change: a message payload from postgres' test_decoding plugin.
 54 |         :return: A list of type Change
 55 |         """
 56 | 
 57 |         rec = change.split(' ', 3)
 58 | 
 59 |         if rec[0] == 'BEGIN':
 60 |             self.cur_xact = rec[1]
 61 |         elif rec[0] in self.IGNORED_CHANGES:
 62 |             pass
 63 |         elif rec[0] == 'table':
 64 |             table_name = rec[1][:-1]
 65 | 
 66 |             if self.table_re.search(table_name):
 67 |                 try:
 68 |                     mat = self._primary_key_patterns[rec[1]].search(rec[3])
 69 |                 except KeyError:
 70 |                     self._log_and_raise(MISSING_TABLE_ERR.format(rec[1]))
 71 |                 else:
 72 |                     if mat:
 73 |                         pkey = mat.groups()[0]
 74 |                         return [Change(xid=self.cur_xact, table=table_name,
 75 |                                        operation=rec[2][:-1], pkey=pkey)]
 76 |                     else:
 77 |                         self._log_and_raise(MISSING_PK_ERR.format(table_name))
 78 |         else:
 79 |             self._log_and_raise('Unknown change: "{}"'.format(change))
 80 | 
 81 |         return []
 82 | 
 83 |     def _preprocess_wal2json_change(self, change):
 84 |         """
 85 |         Takes a message payload from the wal2json plugin and distills it into a
 86 |         list of Change or FullChange tuples.
 87 | 
 88 |         They look like this:
 89 |             {
 90 |                 "xid": 1234567890
 91 |                 "change": [
 92 |                     {
 93 |                         "kind": "insert",
 94 |                         "schema": "public",
 95 |                         "table": "some_table",
 96 |                         "columnnames": ["id"],
 97 |                         "columntypes": ["int4"],
 98 |                         "columnvalues": [42]
 99 |                     }
100 |                 ]
101 |             }
102 |         :param change: a message payload from postgres wal2json plugin.
103 |         :return: A list of type Change or FullChange
104 |         """
105 | 
106 |         change_dictionary = json.loads(change)
107 |         if not change_dictionary:
108 |             return []
109 | 
110 |         self.cur_xact = change_dictionary['xid']
111 |         changes = []
112 | 
113 |         for change in change_dictionary['change']:
114 |             table_name = change['table']
115 |             schema = change['schema']
116 |             if self.table_re.search(table_name):
117 |                 if self.full_change:
118 |                     changes.append(FullChange(xid=self.cur_xact, change=change))
119 |                 else:
120 |                     try:
121 |                         full_table = '{}.{}'.format(schema, table_name)
122 |                         primary_key = self.primary_key_map[full_table]
123 |                     except KeyError:
124 |                         self._log_and_raise(MISSING_TABLE_ERR.format(full_table))
125 |                     else:
126 |                         value_index = change['columnnames'].index(primary_key.col_name)
127 |                         pkey = str(change['columnvalues'][value_index])
128 |                         changes.append(Change(xid=self.cur_xact,
129 |                                               table=full_table,
130 |                                               operation=change['kind'].lower(),
131 |                                               pkey=pkey))
132 |         return changes
133 | 
134 |     @staticmethod
135 |     def _log_and_raise(msg):
136 |         logger.error(msg)
137 |         raise Exception(msg)
138 | 
139 |     def __call__(self, change):
140 |         if self.output_plugin == 'test_decoding':
141 |             pp_changes = self._preprocess_test_decoding_change(change)
142 |         elif self.output_plugin == 'wal2json':
143 |             pp_changes = self._preprocess_wal2json_change(change)
144 |         return [self.produce_formatted_message(pp_change) for pp_change in pp_changes]
145 | 
146 |     def produce_formatted_message(self, change):
147 |         raise NotImplementedError
148 | 
149 | 
150 | class CSVFormatter(Formatter):
151 |     VERSION = 0
152 |     def produce_formatted_message(self, change):
153 |         fmt_msg = '{},{},{},{},{},{}'.format(CSVFormatter.VERSION,
154 |                                              CSVFormatter.TYPE, *change)
155 |         return Message(change=change, fmt_msg=fmt_msg)
156 | 
157 | 
158 | class CSVPayloadFormatter(Formatter):
159 |     VERSION = 0
160 |     def produce_formatted_message(self, change):
161 |         fmt_msg = '{},{},{}'.format(CSVFormatter.VERSION, CSVFormatter.TYPE,
162 |                                     json.dumps(change._asdict()))
163 |         return Message(change=change, fmt_msg=fmt_msg)
164 | 
165 | 
166 | def get_formatter(name, primary_key_map, output_plugin, full_change, table_pat):
167 |     formatter_f = getattr(sys.modules[__name__], '%sFormatter' % name)
168 |     return formatter_f(primary_key_map, output_plugin, full_change, table_pat)
169 | 


--------------------------------------------------------------------------------
/pg2kinesis/log.py:
--------------------------------------------------------------------------------
1 | import logging
2 | 
3 | FORMAT = '%(asctime)-15s %(levelname)s %(message)s'
4 | logging.basicConfig(format=FORMAT)
5 | logger = logging.getLogger()
6 | logger.setLevel(logging.INFO)
7 | 


--------------------------------------------------------------------------------
/pg2kinesis/slot.py:
--------------------------------------------------------------------------------
  1 | from collections import namedtuple
  2 | import threading
  3 | 
  4 | import psycopg2
  5 | import psycopg2.extras
  6 | import psycopg2.extensions
  7 | import psycopg2.errorcodes
  8 | 
  9 | from .log import logger
 10 | 
 11 | psycopg2.extensions.register_type(psycopg2.extensions.UNICODE, None)
 12 | psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY, None)
 13 | 
 14 | PrimaryKeyMapItem = namedtuple('PrimaryKeyMapItem', 'table_name, col_name, col_type, col_ord_pos')
 15 | 
 16 | 
 17 | class SlotReader(object):
 18 |     PK_SQL = """
 19 |     SELECT CONCAT(table_schema, '.', table_name), column_name, data_type, ordinal_position
 20 |     FROM information_schema.tables
 21 |     LEFT JOIN (
 22 |         SELECT CONCAT(table_schema, '.', table_name), column_name, data_type, c.ordinal_position,
 23 |                     table_catalog, table_schema, table_name
 24 |         FROM information_schema.table_constraints
 25 |         JOIN information_schema.key_column_usage AS kcu
 26 |             USING (constraint_catalog, constraint_schema, constraint_name,
 27 |                         table_catalog, table_schema, table_name)
 28 |         JOIN information_schema.columns AS c
 29 |             USING (table_catalog, table_schema, table_name, column_name)
 30 |         WHERE constraint_type = 'PRIMARY KEY'
 31 |     ) as q using (table_catalog, table_schema, table_name)
 32 |     ORDER BY ordinal_position;
 33 |     """
 34 | 
 35 |     def __init__(self, database, host, port, user, sslmode, slot_name,
 36 |                  output_plugin='test_decoding'):
 37 |         # Cool fact: using connections as context manager doesn't close them on
 38 |         # success after leaving with block
 39 |         self._db_confg = dict(database=database, host=host, port=port, user=user, sslmode=sslmode)
 40 |         self._repl_conn = None
 41 |         self._repl_cursor = None
 42 |         self._normal_conn = None
 43 |         self.slot_name = slot_name
 44 |         self.output_plugin = output_plugin
 45 |         self.cur_lag = 0
 46 | 
 47 |     def __enter__(self):
 48 |         self._normal_conn = self._get_connection()
 49 |         self._normal_conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
 50 |         self._repl_conn = self._get_connection(connection_factory=psycopg2.extras.LogicalReplicationConnection)
 51 |         self._repl_cursor = self._repl_conn.cursor()
 52 | 
 53 |         return self
 54 | 
 55 |     def __exit__(self, exc_type, exc_val, exc_tb):
 56 |         """
 57 |         Be a good citizen and try to clean up on the way out.
 58 |         """
 59 | 
 60 |         try:
 61 |             self._repl_cursor.close()
 62 |         except Exception:
 63 |             pass
 64 | 
 65 |         try:
 66 |             self._repl_conn.close()
 67 |         except Exception:
 68 |             pass
 69 | 
 70 |         try:
 71 |             self._normal_conn.close()
 72 |         except Exception:
 73 |             pass
 74 | 
 75 |     def _get_connection(self, connection_factory=None, cursor_factory=None):
 76 |         return psycopg2.connect(connection_factory=connection_factory,
 77 |                                 cursor_factory=cursor_factory, **self._db_confg)
 78 | 
 79 |     def _execute_and_fetch(self, sql, *params):
 80 |         with self._normal_conn.cursor() as cur:
 81 |             if params:
 82 |                 cur.execute(sql, params)
 83 |             else:
 84 |                 cur.execute(sql)
 85 | 
 86 |             return cur.fetchall()
 87 | 
 88 |     @property
 89 |     def primary_key_map(self):
 90 |         logger.info('Getting primary key map')
 91 |         result = map(PrimaryKeyMapItem._make, self._execute_and_fetch(SlotReader.PK_SQL))
 92 |         pk_map = {rec.table_name: rec for rec in result}
 93 | 
 94 |         return pk_map
 95 | 
 96 |     def create_slot(self):
 97 |         logger.info('Creating slot %s' % self.slot_name)
 98 |         try:
 99 |             self._repl_cursor.create_replication_slot(self.slot_name,
100 |                                                       slot_type=psycopg2.extras.REPLICATION_LOGICAL,
101 |                                                       output_plugin=self.output_plugin)
102 |         except psycopg2.ProgrammingError as p:
103 |             # Will be raised if slot exists already.
104 |             if p.pgcode != psycopg2.errorcodes.DUPLICATE_OBJECT:
105 |                 logger.error(p)
106 |                 raise
107 |             else:
108 |                 logger.info('Slot %s is already present.' % self.slot_name)
109 | 
110 |     def delete_slot(self):
111 |         logger.info('Deleting slot %s' % self.slot_name)
112 |         try:
113 |             self._repl_cursor.drop_replication_slot(self.slot_name)
114 |         except psycopg2.ProgrammingError as p:
115 |             # Will be raised if slot exists already.
116 |             if p.pgcode != psycopg2.errorcodes.UNDEFINED_OBJECT:
117 |                 logger.error(p)
118 |                 raise
119 |             else:
120 |                 logger.info('Slot %s was not found.' % self.slot_name)
121 | 
122 |     def process_replication_stream(self, consume):
123 |         logger.info('Starting the consumption of slot "%s"!' % self.slot_name)
124 |         if self.output_plugin == 'wal2json':
125 |             options = {'include-xids': 1}
126 |         else:
127 |             options = None
128 |         self._repl_cursor.start_replication(self.slot_name, options=options)
129 |         self._repl_cursor.consume_stream(consume)
130 | 


--------------------------------------------------------------------------------
/pg2kinesis/stream.py:
--------------------------------------------------------------------------------
 1 | import time
 2 | import aws_kinesis_agg.aggregator
 3 | import boto3
 4 | 
 5 | from botocore.exceptions import ClientError
 6 | from .log import logger
 7 | 
 8 | class StreamWriter(object):
 9 |     def __init__(self, stream_name, back_off_limit=60, send_window=13):
10 |         self.stream_name = stream_name
11 |         self.back_off_limit = back_off_limit
12 |         self.last_send = 0
13 | 
14 |         self._kinesis = boto3.client('kinesis')
15 |         self._sequence_number_for_ordering = '0'
16 |         self._record_agg = aws_kinesis_agg.aggregator.RecordAggregator()
17 |         self._send_window = send_window
18 | 
19 |         try:
20 |             self._kinesis.create_stream(StreamName=stream_name, ShardCount=1)
21 |         except ClientError as e:
22 |             # ResourceInUseException is raised when the stream already exists
23 |             if e.response['Error']['Code'] != 'ResourceInUseException':
24 |                 logger.error(e)
25 |                 raise
26 | 
27 |         waiter = self._kinesis.get_waiter('stream_exists')
28 | 
29 |         # waits up to 180 seconds for stream to exist
30 |         waiter.wait(StreamName=self.stream_name)
31 | 
32 |     def put_message(self, fmt_msg):
33 |         agg_record = None
34 | 
35 |         if fmt_msg:
36 |             agg_record = self._record_agg.add_user_record(str(fmt_msg.change.xid), fmt_msg.fmt_msg)
37 | 
38 |         # agg_record will be a complete record if aggregation is full.
39 |         if agg_record or (self._send_window and time.time() - self.last_send > self._send_window):
40 |             agg_record = agg_record if agg_record else self._record_agg.clear_and_get()
41 |             self._send_agg_record(agg_record)
42 |             self.last_send = time.time()
43 | 
44 |         return agg_record
45 | 
46 |     def _send_agg_record(self, agg_record):
47 |         if agg_record is None:
48 |             return
49 | 
50 |         pk, _, data = agg_record.get_contents()
51 |         logger.info('Sending %s records. Size %s. PK: %s' %
52 |                     (agg_record.get_num_user_records(), agg_record.get_size_bytes(), pk))
53 | 
54 |         back_off = .05
55 |         while back_off < self.back_off_limit:
56 |             try:
57 |                 result = self._kinesis.put_record(Data=data,
58 |                                                   PartitionKey=pk,
59 |                                                   SequenceNumberForOrdering=self._sequence_number_for_ordering,
60 |                                                   StreamName=self.stream_name)
61 | 
62 |             except ClientError as e:
63 |                 if e.response['Error']["Code"] == 'ProvisionedThroughputExceededException':
64 |                     back_off *= 2
65 |                     logger.warning('Provisioned throughput exceeded: sleeping %ss' % back_off)
66 |                     time.sleep(back_off)
67 |                 else:
68 |                     logger.error(e)
69 |                     raise
70 |             else:
71 |                 logger.debug('Sequence number: %s' % result['SequenceNumber'])
72 |                 break
73 |         else:
74 |             raise Exception('ProvisionedThroughputExceededException caused a backed off too many times!')
75 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | aws_kinesis_agg==1.1.0
 2 | boto3==1.6.19
 3 | botocore==1.9.19
 4 | click==6.3.0
 5 | freezegun==0.3.6
 6 | ipdb==0.10.2
 7 | ipython<6.0.0,>=0.10.2
 8 | protobuf==3.0.0
 9 | psycopg2==2.7.4
10 | pytest==3.0.4
11 | mock==2.0.0
12 | 


--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [bdist_wheel]
2 | universal = 0
3 | 
4 | [metadata]
5 | license_file = LICENSE
6 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | import codecs
 2 | import os
 3 | import re
 4 | 
 5 | from setuptools import setup, find_packages
 6 | 
 7 | 
 8 | ###############################################################################
 9 | 
10 | NAME = 'pg2kinesis'
11 | PACKAGES = find_packages(where='.')
12 | 
13 | if not PACKAGES:
14 |     raise RuntimeError('Package NOT FOUND!')
15 | 
16 | META_PATH = os.path.join('pg2kinesis', '__init__.py')
17 | KEYWORDS = ['logical replication', 'kinesis', 'database']
18 | CLASSIFIERS = [
19 |     'Development Status :: 5 - Production/Stable',
20 |     'Intended Audience :: Developers',
21 |     'Natural Language :: English',
22 |     'License :: OSI Approved :: MIT License',
23 |     'Operating System :: OS Independent',
24 |     'Programming Language :: Python',
25 |     'Programming Language :: Python :: 2',
26 |     'Programming Language :: Python :: 2.7',
27 |     'Programming Language :: Python :: 3',
28 |     'Programming Language :: Python :: 3.3',
29 |     'Programming Language :: Python :: 3.4',
30 |     'Programming Language :: Python :: 3.5',
31 |     'Programming Language :: Python :: 3.6',
32 |     'Programming Language :: Python :: Implementation :: CPython',
33 |     'Topic :: Utilities',
34 | ]
35 | INSTALL_REQUIRES = [
36 |     'aws_kinesis_agg>=1.1.0',
37 |     'boto3>=1.6.19',
38 |     'botocore>=1.9.19',
39 |     'click>=6.3.0',
40 |     'protobuf>=3.0.0',
41 |     'psycopg2>=2.7.4',
42 | ]
43 | 
44 | ###############################################################################
45 | 
46 | HERE = os.path.abspath(os.path.dirname(__file__))
47 | 
48 | 
49 | def read(*parts):
50 |     """
51 |     Build an absolute path from *parts* and and return the contents of the
52 |     resulting file.  Assume UTF-8 encoding.
53 |     """
54 |     with codecs.open(os.path.join(HERE, *parts), 'rb', 'utf-8') as f:
55 |         return f.read()
56 | 
57 | 
58 | META_FILE = read(META_PATH)
59 | 
60 | 
61 | def find_meta(meta):
62 |     """
63 |     Extract __*meta*__ from META_FILE.
64 |     """
65 |     meta_match = re.search(
66 |         r'^__{meta}__ = [\'"]([^\'"]*)[\'"]'.format(meta=meta),
67 |         META_FILE, re.M
68 |     )
69 |     if meta_match:
70 |         return meta_match.group(1)
71 |     raise RuntimeError('Unable to find __{meta}__ string.'.format(meta=meta))
72 | 
73 | 
74 | VERSION = find_meta('version')
75 | URI = find_meta('uri')
76 | LONG = read('README.rst')
77 | 
78 | if __name__ == "__main__":
79 |     setup(
80 |         name=NAME,
81 |         description=find_meta("description"),
82 |         license=find_meta("license"),
83 |         url=URI,
84 |         version=VERSION,
85 |         author=find_meta("author"),
86 |         author_email=find_meta("email"),
87 |         maintainer=find_meta("author"),
88 |         maintainer_email=find_meta("email"),
89 |         keywords=KEYWORDS,
90 |         long_description=LONG,
91 |         packages=PACKAGES,
92 |         zip_safe=True,
93 |         classifiers=CLASSIFIERS,
94 |         install_requires=INSTALL_REQUIRES,
95 |         entry_points={
96 |             'console_scripts': ['pg2kinesis=pg2kinesis.__main__:main'],
97 |         }
98 |     )
99 | 


--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/surbas/pg2kinesis/5db63a5476d05aedb524da99b8279642b51aa319/tests/__init__.py


--------------------------------------------------------------------------------
/tests/test___main__.py:
--------------------------------------------------------------------------------
  1 | from __future__ import unicode_literals
  2 | 
  3 | from mock import Mock, call, patch
  4 | 
  5 | from pg2kinesis.__main__ import Consume
  6 | from pg2kinesis.formatter import Message, Change, FullChange
  7 | 
  8 | 
  9 | def test_consume():
 10 |     mock_formatter = Mock(return_value='fmt_msg')
 11 |     # required to avoid formatting error if cur_xact is logged
 12 |     mock_formatter.cur_xact = 'TEST_TRANSACTION'
 13 |     mock_writer = Mock()
 14 | 
 15 |     consume = Consume(mock_formatter, mock_writer, ['insert', 'update', 'delete', 'truncate'])
 16 | 
 17 |     mock_change = Mock()
 18 |     mock_change.data_start = 10
 19 |     mock_change.data_size = 100
 20 |     mock_change.payload = 'PAYLOAD'
 21 | 
 22 |     mock_writer.put_message = Mock(return_value=False)
 23 |     consume.should_send_to_kinesis = Mock(return_value=True)
 24 |     consume(mock_change)
 25 |     assert mock_writer.put_message.called, 'Sanity'
 26 |     assert call.cursor.send_feedback(flush_lsn=10) not in mock_change.mock_calls, \
 27 |         'we did not send feedback!'
 28 | 
 29 |     mock_writer.put_message = Mock(return_value=True)
 30 |     consume(mock_change)
 31 |     assert mock_writer.put_message.called, 'Sanity'
 32 |     assert call.cursor.send_feedback(flush_lsn=10) in mock_change.mock_calls, \
 33 |         'we sent feedback!'
 34 | 
 35 | 
 36 |     mock_time = Mock()
 37 |     mock_time.return_value = 11.0
 38 | 
 39 |     consume.msg_window_size = 0
 40 |     consume.msg_window_count = 0
 41 |     consume.cur_window = 10
 42 |     with patch('time.time', mock_time):
 43 |         consume(mock_change)
 44 |         assert consume.cur_window == 10, 'cur window not updated if time is non-10-multiple'
 45 |         assert consume.msg_window_size == 100, 'msg_window_size not reset if time is non-10-multiple'
 46 |         assert consume.msg_window_count == 1, 'msg_window_count not reset if time is non-10-multiple'
 47 | 
 48 |     mock_time.return_value = 20.0
 49 |     with patch('time.time', mock_time):
 50 |         consume(mock_change)
 51 |         assert consume.cur_window == 20, 'cur window updated if time is multiple of 10'
 52 |         assert consume.msg_window_size == 0, 'msg_window_size reset if time is multiple of 10'
 53 |         assert consume.msg_window_count == 0, 'msg_window_count reset if time is multiple of 10'
 54 | 
 55 |     with patch('time.time', mock_time):
 56 |         consume(mock_change)
 57 |         assert consume.msg_window_size == 100, 'msg_window_size not reset if time is same as cur_window'
 58 | 
 59 | 
 60 | def test_consume_excludes():
 61 |     mock_formatter = Mock(return_value=[
 62 |         Message(Change(1, 'my_table', 'insert', 1), 'formatted_message1'),
 63 |         Message(FullChange(1, {
 64 |             "kind": "update",
 65 |             "schema": "public",
 66 |             "table": "my_table",
 67 |             "columnnames": ["id"],
 68 |             "columntypes": ["int4"],
 69 |             "columnvalues": [42]
 70 |         }), 'formatted_message2')
 71 |     ])
 72 |     mock_formatter.cur_xact = 'TEST_TRANSACTION'
 73 |     mock_change = Mock(data_start=10, data_size=100, payload='PAYLOAD')
 74 |     mock_writer = Mock()
 75 |     mock_writer.put_message.return_value = False
 76 | 
 77 |     consume = Consume(mock_formatter, mock_writer, ['delete'])
 78 |     consume(mock_change)
 79 | 
 80 |     mock_writer.put_message.assert_has_calls([call(None), call(None)])
 81 | 
 82 | 
 83 | def test_consume_includes():
 84 |     formatted_messages = [
 85 |         Message(Change(1, 'my_table', 'delete', 1), 'formatted_message1'),
 86 |         Message(FullChange(1, {
 87 |             "kind": "delete",
 88 |             "schema": "public",
 89 |             "table": "my_table",
 90 |             "columnnames": ["id"],
 91 |             "columntypes": ["int4"],
 92 |             "columnvalues": [42]
 93 |         }), 'formatted_message2')
 94 |     ]
 95 |     mock_formatter = Mock(return_value=formatted_messages)
 96 |     mock_formatter.cur_xact = 'TEST_TRANSACTION'
 97 |     mock_change = Mock(data_start=10, data_size=100, payload='PAYLOAD', cursor=Mock())
 98 |     mock_change.cursor.send_feedback.return_value = True
 99 |     mock_writer = Mock()
100 |     mock_writer.put_message.return_value = True
101 | 
102 |     consume = Consume(mock_formatter, mock_writer, ['delete'])
103 |     consume(mock_change)
104 | 
105 |     mock_writer.put_message.assert_has_calls([call(msg) for msg in formatted_messages])
106 |     mock_change.cursor.send_feedback.assert_has_calls([call(flush_lsn=mock_change.data_start)])
107 | 


--------------------------------------------------------------------------------
/tests/test_formatter.py:
--------------------------------------------------------------------------------
  1 | # coding=utf-8
  2 | from __future__ import unicode_literals
  3 | import json
  4 | 
  5 | import mock
  6 | import pytest
  7 | 
  8 | from pg2kinesis.slot import PrimaryKeyMapItem
  9 | from pg2kinesis.formatter import Change, CSVFormatter, CSVPayloadFormatter, Formatter, get_formatter
 10 | 
 11 | 
 12 | def get_formatter_produce_formatted_message(cls):
 13 |     # type: (Formatter, unicode) -> Message
 14 |     change = Change(xid=1, table=u'public.blue', operation=u'Update', pkey=u'123456')
 15 |     result = cls({}).produce_formatted_message(change)
 16 |     assert result.change == change
 17 |     return result
 18 | 
 19 | 
 20 | def test_CSVFormatter_produce_formatted_message():
 21 |     result = get_formatter_produce_formatted_message(CSVFormatter)
 22 | 
 23 |     assert result.fmt_msg == u'0,CDC,1,public.blue,Update,123456'
 24 | 
 25 | 
 26 | def test_CSVPayloadFormatter_produce_formatted_message():
 27 |     result = get_formatter_produce_formatted_message(CSVPayloadFormatter)
 28 |     assert result.fmt_msg.startswith(u'0,CDC,')
 29 |     payload = result.fmt_msg.split(',', 2)[-1]
 30 |     assert json.loads(payload) == dict(xid=1, table=u'public.blue', operation=u'Update', pkey=u'123456')
 31 | 
 32 | 
 33 | @pytest.fixture
 34 | def pkey_map():
 35 |     return {'public.test_table': PrimaryKeyMapItem(u'public.test_table', u'uuid', u'uuid', 0),
 36 |             'public.test_table2': PrimaryKeyMapItem(u'public.test_table2', u'name', u'character varying', 0)}
 37 | 
 38 | 
 39 | @pytest.fixture(params=[CSVFormatter, CSVPayloadFormatter, Formatter])
 40 | def formatter(request, pkey_map):
 41 |     return request.param(pkey_map)
 42 | 
 43 | 
 44 | def test___init__(formatter):
 45 |     # processes primary key map (adds colon)
 46 | 
 47 |     patterns = formatter._primary_key_patterns
 48 | 
 49 |     assert len(patterns) == 2
 50 |     assert u'public.test_table:' in patterns, 'with colon'
 51 |     assert u'public.test_table2:' in patterns, 'with colon'
 52 |     assert u'public.test_table' not in patterns, 'without colon should not be in patterns'
 53 |     assert u'public.test_table2' not in patterns, 'without colon should not be in patterns'
 54 |     assert patterns[u'public.test_table:'].pattern == r"uuid\[uuid\]:'?([\w\-]+)'?"
 55 |     assert patterns[u'public.test_table2:'].pattern == r"name\[character varying\]:'?([\w\-]+)'?"
 56 | 
 57 | 
 58 | def test__preprocess_test_decoding_change(formatter):
 59 |     # assert begin -> None + cur trans
 60 |     assert formatter.cur_xact == ''
 61 |     result = formatter._preprocess_test_decoding_change(u'BEGIN 100')
 62 |     assert result == []
 63 |     assert formatter.cur_xact == u'100'
 64 | 
 65 |     # assert commit -> None
 66 |     formatter.cur_xact = ''
 67 |     result = formatter._preprocess_test_decoding_change(u'COMMIT')
 68 |     assert result == []
 69 |     assert formatter.cur_xact == ''
 70 | 
 71 |     # error states
 72 |     with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise:
 73 |         formatter._preprocess_test_decoding_change(u'UNKNOWN BLING')
 74 |         assert mock_log_and_raise.called
 75 |         mock_log_and_raise.assert_called_with(u'Unknown change: "UNKNOWN BLING"')
 76 | 
 77 |     with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise:
 78 |         formatter._preprocess_test_decoding_change(u"table not_a_table: UPDATE: uuid[uuid]:'00079f3e-0479-4475-acff-4f225cc5188a'")
 79 |         assert mock_log_and_raise.called
 80 |         mock_log_and_raise.assert_called_with(u'Unable to locate table: "not_a_table:"')
 81 | 
 82 |     with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise:
 83 |         formatter._preprocess_test_decoding_change(u"table public.test_table: UPDATE: not[not]:'00079f3e-0479-4475-acff-4f225cc5188a'")
 84 |         assert mock_log_and_raise.called
 85 |         mock_log_and_raise.assert_called_with(u'Unable to locate primary key for table "public.test_table"')
 86 | 
 87 |     # assert on proper match
 88 |     formatter.cur_xact = '1337'
 89 |     change = formatter._preprocess_test_decoding_change(
 90 |         u"table public.test_table: UPDATE: uuid[uuid]:'00079f3e-0479-4475-acff-4f225cc5188a'")[0]
 91 | 
 92 |     assert change.xid == u'1337'
 93 |     assert change.table == u'public.test_table'
 94 |     assert change.operation == u'UPDATE'
 95 |     assert change.pkey == u'00079f3e-0479-4475-acff-4f225cc5188a'
 96 | 
 97 |     change = formatter._preprocess_test_decoding_change(
 98 |         u"table public.test_table2: DELETE: name[character varying]:'Bling-2'")[0]
 99 | 
100 |     assert change.xid == u'1337'
101 |     assert change.table == u'public.test_table2'
102 |     assert change.operation == u'DELETE'
103 |     assert change.pkey == u'Bling-2'
104 | 
105 | 
106 | def test__preprocess_wal2json_change(formatter):
107 |     formatter.cur_xact = ''
108 |     result = formatter._preprocess_wal2json_change(u"""{
109 |                 "xid": 101,
110 |                 "change": []
111 |             }""")
112 |     assert result == []
113 |     assert formatter.cur_xact == 101
114 | 
115 |     with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise:
116 |         formatter._preprocess_wal2json_change(u"""{
117 |                 "xid": 100,
118 |                 "change": [
119 |                     {
120 |                         "kind": "insert",
121 |                         "schema": "public",
122 |                         "table": "not_a_table",
123 |                         "columnnames": ["uuid"],
124 |                         "columntypes": ["int4"],
125 |                         "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"]
126 |                     }
127 |                 ]
128 |             }""")
129 |         assert mock_log_and_raise.called
130 |         mock_log_and_raise.assert_called_with(u'Unable to locate table: "public.not_a_table"')
131 | 
132 |     # assert on proper match
133 |     formatter.cur_xact = '1337'
134 |     change = formatter._preprocess_wal2json_change(u"""{
135 |                 "xid": 1337,
136 |                 "change": [
137 |                     {
138 |                         "kind": "insert",
139 |                         "schema": "public",
140 |                         "table": "test_table",
141 |                         "columnnames": ["uuid"],
142 |                         "columntypes": ["int4"],
143 |                         "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"]
144 |                     }
145 |                 ]
146 |             }""")[0]
147 | 
148 |     assert change.xid == 1337
149 |     assert change.table == u'public.test_table'
150 |     assert change.operation == u'insert'
151 |     assert change.pkey == u'00079f3e-0479-4475-acff-4f225cc5188a'
152 | 
153 |     change = formatter._preprocess_wal2json_change(u"""{
154 |                 "xid": 1337,
155 |                 "change": [
156 |                     {
157 |                         "kind": "delete",
158 |                         "schema": "public",
159 |                         "table": "test_table2",
160 |                         "columnnames": ["name"],
161 |                         "columntypes": ["varchar"],
162 |                         "columnvalues": ["Bling-2"]
163 |                     }
164 |                 ]
165 |             }""")[0]
166 | 
167 |     assert change.xid == 1337
168 |     assert change.table == u'public.test_table2'
169 |     assert change.operation == u'delete'
170 |     assert change.pkey == u'Bling-2'
171 | 
172 | 
173 | def test__preprocess_wal2json_full_change(formatter):
174 |     formatter.cur_xact = ''
175 |     formatter.full_change = True
176 | 
177 |     result = formatter._preprocess_wal2json_change(u"""{
178 |                 "xid": 101,
179 |                 "change": []
180 |             }""")
181 |     assert result == []
182 |     assert formatter.cur_xact == 101
183 | 
184 |     # Full changes from wal2json are not validated against known table map
185 |     with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise:
186 |         formatter._preprocess_wal2json_change(u"""{
187 |                 "xid": 100,
188 |                 "change": [
189 |                     {
190 |                         "kind": "insert",
191 |                         "schema": "public",
192 |                         "table": "not_a_table",
193 |                         "columnnames": ["uuid"],
194 |                         "columntypes": ["int4"],
195 |                         "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"]
196 |                     }
197 |                 ]
198 |             }""")
199 |         assert not mock_log_and_raise.called
200 | 
201 |     # assert on proper match
202 |     formatter.cur_xact = '1337'
203 |     change = formatter._preprocess_wal2json_change(u"""{
204 |                 "xid": 1337,
205 |                 "change": [
206 |                     {
207 |                         "kind": "insert",
208 |                         "schema": "public",
209 |                         "table": "test_table",
210 |                         "columnnames": ["uuid"],
211 |                         "columntypes": ["int4"],
212 |                         "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"]
213 |                     }
214 |                 ]
215 |             }""")[0]
216 | 
217 |     assert change.xid == 1337
218 |     assert change.change == {
219 |         "kind": "insert",
220 |         "schema": "public",
221 |         "table": "test_table",
222 |         "columnnames": ["uuid"],
223 |         "columntypes": ["int4"],
224 |         "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"]
225 |     }
226 | 
227 |     change = formatter._preprocess_wal2json_change(u"""{
228 |                 "xid": 1337,
229 |                 "change": [
230 |                     {
231 |                         "kind": "delete",
232 |                         "schema": "public",
233 |                         "table": "test_table2",
234 |                         "columnnames": ["name"],
235 |                         "columntypes": ["varchar"],
236 |                         "columnvalues": ["Bling-2"]
237 |                     }
238 |                 ]
239 |             }""")[0]
240 | 
241 |     assert change.xid == 1337
242 |     assert change.change == {
243 |         "kind": "delete",
244 |         "schema": "public",
245 |         "table": "test_table2",
246 |         "columnnames": ["name"],
247 |         "columntypes": ["varchar"],
248 |         "columnvalues": ["Bling-2"]
249 |     }
250 | 
251 | 
252 | def test_log_and_raise(formatter):
253 | 
254 |     with mock.patch('logging.Logger.error') as mock_log, pytest.raises(Exception) as e_info:
255 |         formatter._log_and_raise(u'HELP!')
256 | 
257 |     assert str(e_info.value) == u'HELP!'
258 |     mock_log.assert_called_with(u'HELP!')
259 | 
260 | 
261 | def test___call__(formatter):
262 |     with mock.patch.object(formatter, '_preprocess_test_decoding_change', return_value=[]) as mock_preprocess_test_decoding_change, \
263 |             mock.patch.object(formatter, 'produce_formatted_message', return_value=None) as mock_pfm:
264 |         assert not mock_preprocess_test_decoding_change.called
265 |         result = formatter('COMMIT')
266 |         assert mock_preprocess_test_decoding_change.called
267 |         assert result == []
268 |         assert not mock_pfm.called
269 | 
270 |     with mock.patch.object(formatter, '_preprocess_test_decoding_change', return_value=['blue']) as mock_preprocess_test_decoding_change, \
271 |             mock.patch.object(formatter, 'produce_formatted_message', return_value='blue msg') as mock_pfm:
272 |         assert not mock_preprocess_test_decoding_change.called
273 |         result = formatter('blue message')
274 |         assert mock_preprocess_test_decoding_change.called
275 |         assert result == ['blue msg']
276 |         mock_pfm.assert_called_with('blue')
277 | 
278 | 
279 | def test_get_formatter():
280 |     with mock.patch.object(Formatter, '__init__', return_value=None) as mocked:
281 |         result = get_formatter('CSVPayload', 1, 2, 3, 4)
282 |         assert isinstance(result, CSVPayloadFormatter)
283 |         assert mocked.called
284 |         mocked.assert_called_with(1, 2, 3, 4)
285 | 
286 |     with mock.patch.object(Formatter, '__init__', return_value=None) as mocked:
287 |         result = get_formatter('CSV', 1, 2, 3, 4)
288 |         assert isinstance(result, CSVFormatter)
289 |         assert mocked.called
290 |         mocked.assert_called_with(1, 2, 3, 4)
291 | 


--------------------------------------------------------------------------------
/tests/test_slot.py:
--------------------------------------------------------------------------------
  1 | from mock import call, Mock, MagicMock, patch, PropertyMock
  2 | 
  3 | import pytest
  4 | import psycopg2
  5 | import psycopg2.errorcodes
  6 | 
  7 | from pg2kinesis.slot import SlotReader
  8 | 
  9 | 
 10 | @pytest.fixture
 11 | def slot():
 12 |     slot = SlotReader('blah_db', 'blah_host', 'blah_port', 'blah_user', 'blah_sslmode', 'pg2kinesis')
 13 |     slot._repl_cursor = Mock()
 14 |     slot._repl_conn = Mock()
 15 |     slot._normal_conn = Mock()
 16 | 
 17 |     return slot
 18 | 
 19 | 
 20 | def test__enter__(slot):
 21 |     # returns its self
 22 | 
 23 |     with patch.object(slot, '_get_connection', side_effects=[Mock(), Mock()]) as mock_gc:
 24 |         assert slot == slot.__enter__(), 'Returns itself'
 25 |         assert mock_gc.call_count == 2
 26 | 
 27 |         assert call.set_isolation_level(0) in slot._normal_conn.method_calls, 'make sure we are in autocommit'
 28 |         assert call.cursor() in slot._repl_conn.method_calls, 'we opened a cursor'
 29 | 
 30 |     with patch.object(slot, '_get_connection', side_effects=[Mock(), Mock()]) as mock_gc:
 31 |         slot.__enter__()
 32 | 
 33 | 
 34 | def test__exit__(slot):
 35 |     slot.__exit__(None, None, None)
 36 | 
 37 |     assert call.close() in slot._repl_cursor.method_calls
 38 |     assert call.close() in slot._repl_conn.method_calls
 39 |     assert call.close() in slot._normal_conn.method_calls
 40 | 
 41 |     slot._repl_cursor.close = Mock(side_effect=Exception)
 42 |     slot._repl_conn.close = Mock(side_effect=Exception)
 43 |     slot._normal_conn.close= Mock(side_effect=Exception)
 44 |     slot.__exit__(None, None, None)
 45 | 
 46 |     assert slot._repl_cursor.close.called, "Still called even thought call above raised"
 47 |     assert slot._repl_conn.close.called, "Still called even thought call above raised"
 48 |     assert slot._normal_conn.close.called, "Still called even thought call above raised"
 49 | 
 50 | 
 51 | def test_create_slot(slot):
 52 | 
 53 |     with patch.object(psycopg2.ProgrammingError, 'pgcode',
 54 |                       new_callable=PropertyMock,
 55 |                       return_value=psycopg2.errorcodes.DUPLICATE_OBJECT):
 56 |         pe = psycopg2.ProgrammingError()
 57 | 
 58 | 
 59 |         slot._repl_cursor.create_replication_slot = Mock(side_effect=pe)
 60 |         slot.create_slot()
 61 |         slot._repl_cursor.create_replication_slot.assert_called_with('pg2kinesis',
 62 |                                                                      slot_type=psycopg2.extras.REPLICATION_LOGICAL,
 63 |                                                                      output_plugin=u'test_decoding')
 64 |     with patch.object(psycopg2.ProgrammingError, 'pgcode',
 65 |                           new_callable=PropertyMock,
 66 |                           return_value=-1):
 67 |         pe = psycopg2.ProgrammingError()
 68 |         slot._repl_cursor.create_replication_slot = Mock(side_effect=pe)
 69 | 
 70 |         with pytest.raises(psycopg2.ProgrammingError) as e_info:
 71 |             slot.create_slot()
 72 |             slot._repl_cursor.create_replication_slot.assert_called_with('pg2kinesis',
 73 |                                                                          slot_type=psycopg2.extras.REPLICATION_LOGICAL,
 74 |                                                                          output_plugin=u'test_decoding')
 75 |         assert e_info.value.pgcode == -1
 76 | 
 77 |         slot._repl_cursor.create_replication_slot = Mock(side_effect=Exception)
 78 |     with pytest.raises(Exception):
 79 |         slot.create_slot()
 80 |         slot._repl_cursor.create_replication_slot.assert_called_with('pg2kinesis',
 81 |                                                                          slot_type=psycopg2.extras.REPLICATION_LOGICAL,
 82 |                                                                          output_plugin=u'test_decoding')
 83 | 
 84 | 
 85 | def test_delete_slot(slot):
 86 |     with patch.object(psycopg2.ProgrammingError, 'pgcode',
 87 |                       new_callable=PropertyMock,
 88 |                       return_value=psycopg2.errorcodes.UNDEFINED_OBJECT):
 89 |         pe = psycopg2.ProgrammingError()
 90 |         slot._repl_cursor.drop_replication_slot = Mock(side_effect=pe)
 91 |         slot.delete_slot()
 92 |     slot._repl_cursor.drop_replication_slot.assert_called_with('pg2kinesis')
 93 | 
 94 |     with patch.object(psycopg2.ProgrammingError, 'pgcode',
 95 |                       new_callable=PropertyMock,
 96 |                       return_value=-1):
 97 |         pe = psycopg2.ProgrammingError()
 98 |         slot._repl_cursor.create_replication_slot = Mock(side_effect=pe)
 99 |         with pytest.raises(psycopg2.ProgrammingError) as e_info:
100 |             slot.delete_slot()
101 |             slot._repl_cursor.drop_replication_slot.assert_called_with('pg2kinesis')
102 | 
103 |             assert e_info.value.pgcode == -1
104 | 
105 |     slot._repl_cursor.create_replication_slot = Mock(side_effect=Exception)
106 |     with pytest.raises(Exception):
107 |         slot.delete_slot()
108 |         slot._repl_cursor.drop_replication_slot.assert_called_with('pg2kinesis')
109 | 
110 | 
111 | def test__get_connection(slot):
112 |     with patch('psycopg2.connect') as mock_connect:
113 |         slot._get_connection()
114 |         kall = call(connection_factory=None, cursor_factory=None, database='blah_db', host='blah_host',
115 |                     port='blah_port', user='blah_user')
116 |         assert mock_connect.called_with(kall)
117 | 
118 |         slot._get_connection(connection_factory='connection_fact', cursor_factory='cursor_fact')
119 |         kall = call(connection_factory='connection_fact', cursor_factory='cursor_fact', database='blah_db', host='blah_host',
120 |                     port='blah_port', user = 'blah_user')
121 |         assert mock_connect.called_with(kall)
122 | 
123 | 
124 | def test_primary_key_map(slot):
125 |     slot._execute_and_fetch = Mock(return_value=[('test_table', 'pkey', 'uuid', 0),
126 |                                                  ('test_table2', 'pkey', 'uuid', 0),
127 |                                                  ('blue', 'bkey', 'char var', 10)
128 |                                                  ])
129 | 
130 |     pkey_map = slot.primary_key_map
131 | 
132 |     assert len(pkey_map) == 3
133 |     assert 'test_table' in pkey_map
134 |     assert 'test_table2' in pkey_map
135 |     assert 'blue' in pkey_map
136 | 
137 |     assert pkey_map['blue'].table_name == 'blue'
138 |     assert pkey_map['blue'].col_name == 'bkey'
139 |     assert pkey_map['blue'].col_type == 'char var'
140 |     assert pkey_map['blue'].col_ord_pos == 10
141 | 
142 | 
143 | def test_execute_and_fetch(slot):
144 |     norm_conn = slot._normal_conn
145 |     mock_cur = MagicMock()
146 |     norm_conn.cursor = Mock(return_value=mock_cur)
147 | 
148 |     slot._execute_and_fetch('SQL SQL STATEMENT', 1, 2, 3)
149 |     call.execute('SQL SQL STATEMENT', (1, 2, 3)) in mock_cur.method_calls
150 |     assert call.__enter__().fetchall() in mock_cur.mock_calls
151 | 
152 |     mock_cur.reset_mock()
153 |     slot._execute_and_fetch('SQL SQL STATEMENT')
154 |     call.execute('SQL SQL STATEMENT') in mock_cur.method_calls
155 |     assert call.__enter__().fetchall() in mock_cur.mock_calls
156 | 
157 | 
158 | def test_process_replication_stream(slot):
159 |     consume = Mock()
160 |     slot.process_replication_stream(consume)
161 | 
162 |     assert call.start_replication('pg2kinesis', options=None) in  slot._repl_cursor.method_calls, 'We started replication event loop'
163 |     assert call.consume_stream(consume) in slot._repl_cursor.method_calls, 'We pass consume to this method'
164 | 
165 | 


--------------------------------------------------------------------------------
/tests/test_stream.py:
--------------------------------------------------------------------------------
 1 | import time
 2 | 
 3 | from freezegun import freeze_time
 4 | from mock import patch, call, Mock
 5 | import pytest
 6 | import boto3
 7 | from botocore.exceptions import ClientError
 8 | 
 9 | from pg2kinesis.stream import StreamWriter
10 | 
11 | @pytest.fixture()
12 | def writer():
13 |     with patch('aws_kinesis_agg.aggregator.RecordAggregator'), patch.object(boto3, 'client'):
14 |         writer = StreamWriter('blah')
15 |     return writer
16 | 
17 | def test__init__():
18 |     mock_client = Mock()
19 |     with patch.object(boto3, 'client', return_value=mock_client):
20 |         error_response = {'Error': {'Code': 'ResourceInUseException'}}
21 |         mock_client.create_stream = Mock(side_effect=ClientError(error_response, 'create_stream'))
22 | 
23 |         StreamWriter('blah')
24 |         assert mock_client.create_stream.called
25 |         assert call.get_waiter('stream_exists') in mock_client.method_calls, "We handled stream existence"
26 | 
27 |         error_response = {'Error': {'Code': 'Something else'}}
28 |         mock_client.create_stream = Mock(side_effect=ClientError(error_response, 'create_stream'))
29 | 
30 |         mock_client.reset_mock()
31 |         with pytest.raises(ClientError):
32 |             StreamWriter('blah')
33 |             assert mock_client.create_stream.called
34 |             assert call.get_waiter('stream_exists') not in mock_client.method_calls, "never reached"
35 | 
36 | 
37 | def test_put_message(writer):
38 | 
39 |     writer._send_agg_record = Mock()
40 | 
41 |     msg = Mock()
42 |     msg.change.xid = 10
43 |     msg.fmt_msg = object()
44 | 
45 |     writer.last_send = 1445444940.0 - 10      # "2015-10-21 16:28:50"
46 |     with freeze_time('2015-10-21 16:29:00'):  # -> 1445444940.0
47 |         result = writer.put_message(None)
48 | 
49 |         assert result is None, 'With no message or timeout we did not force a send'
50 |         assert not writer._send_agg_record.called, 'we did not force a send'
51 | 
52 |         writer._record_agg.add_user_record = Mock(return_value=None)
53 |         result = writer.put_message(msg)
54 |         assert result is None, 'With message, no timeout and not a full agg we do not send'
55 |         assert not writer._send_agg_record.called, 'we did not force a send'
56 | 
57 |     with freeze_time('2015-10-21 16:29:10'):  # -> 1445444950.0
58 |         result = writer.put_message(None)
59 |         assert result is not None, 'Timeout forced a send'
60 |         assert writer._send_agg_record.called, 'We sent a record'
61 |         assert writer.last_send == 1445444950.0, 'updated window'
62 | 
63 |     with freeze_time('2015-10-21 16:29:20'):  # -> 1445444960.0
64 |         writer._send_agg_record.reset_mock()
65 |         writer._record_agg.add_user_record = Mock(return_value='blue')
66 |         result = writer.put_message(msg)
67 | 
68 |         assert result == 'blue', 'We passed in a message that forced the agg to report full'
69 |         assert writer._send_agg_record.called, 'We sent a record'
70 |         assert writer.last_send == 1445444960.0, 'updated window'
71 | 
72 | 
73 | def test__send_agg_record(writer):
74 |     assert writer._send_agg_record(None) is None, 'Do not do anything if agg_record is None'
75 | 
76 |     agg_rec = Mock()
77 |     agg_rec.get_contents = Mock(return_value=(1, 2, 'datablob'))
78 | 
79 |     err = ClientError({'Error': {'Code': 'ProvisionedThroughputExceededException'}}, 'put_record')
80 | 
81 |     writer._kinesis.put_record = Mock(side_effect=[err, err, err, {'SequenceNumber': 12345}])
82 | 
83 |     with patch.object(time, 'sleep') as mock_sleep:
84 |         writer._send_agg_record(agg_rec)
85 |         assert mock_sleep.call_count == 3, "We had to back off 3 times so we slept"
86 |         assert mock_sleep.call_args_list == [call(.1), call(.2), call(.4)], 'Geometric back off!'
87 | 
88 |     with pytest.raises(ClientError):
89 |         writer._kinesis.put_record = Mock(side_effect=ClientError({'Error': {'Code': 'Something else'}},
90 |                                                                   'put_record'))
91 |         writer._send_agg_record(agg_rec)
92 | 
93 |     writer.back_off_limit = .3  # Will bust on third go around
94 |     writer._kinesis.put_record = Mock(side_effect=[err, err, err, {'SequenceNumber': 12345}])
95 |     with pytest.raises(Exception) as e_info, patch.object(time, 'sleep'):
96 |         writer._send_agg_record(agg_rec)
97 |         assert e_info.value.message == 'ProvisionedThroughputExceededException caused a backed off too many times!', \
98 |             'We raise on too many throughput errors'
99 | 


--------------------------------------------------------------------------------