├── .gitignore ├── .travis.yml ├── LICENSE ├── MANIFEST.in ├── README.rst ├── pg2kinesis ├── __init__.py ├── __main__.py ├── formatter.py ├── log.py ├── slot.py └── stream.py ├── requirements.txt ├── setup.cfg ├── setup.py └── tests ├── __init__.py ├── test___main__.py ├── test_formatter.py ├── test_slot.py └── test_stream.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | .cache 3 | .coverage/ 4 | .eggs/ 5 | .idea/ 6 | build/ 7 | dist/ 8 | pg2kinesis.egg-info/ 9 | tests/__pycache__/ 10 | *.DS_Store 11 | env/ 12 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | dist: trusty 2 | language: python 3 | 4 | branches: 5 | only: 6 | - master 7 | 8 | 9 | sudo: false 10 | 11 | python: 12 | - "2.7" 13 | - "3.3" 14 | - "3.4" 15 | - "3.5" 16 | - "3.6" 17 | 18 | install: pip install -r requirements.txt 19 | 20 | script: pytest 21 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Handshake Corp 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include *.rst *.txt LICENSE .travis.yml 2 | recursive-include tests *.py 3 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | ========== 2 | pg2kinesis 3 | ========== 4 | 5 | 6 | pg2kinesis uses `logical decoding 7 | `_ 8 | in Postgres 9.4 or later to capture a consistent, continuous stream of events 9 | from the database and publish them to an `AWS Kinesis `_ 10 | stream in a format of your choosing. 11 | 12 | It does this without requiring any changes to your schema like triggers or 13 | "shadow" columns or tables, and has a negligible impact on database performance. 14 | It does this while being extremely fault tolerant: no data loss will be incurred 15 | on any type of underlying system failure, including process crashes, network 16 | outages, or EC2 instance failures. However, in these situations there will 17 | likely be records that are sent more than once, so your consumer should be 18 | designed with this in mind. 19 | 20 | The fault tolerance comes from guarantees provided by the underlying 21 | technologies and from the "2-phase commit" style of publishing inherent to the 22 | design of the program.
Changes are first peeked from the replication slot and 23 | published to Kinesis. Once Kinesis successfully receives a batch of records, we 24 | advance the `xmin `_ 25 | of the slot, thereby telling Postgres it is safe to reclaim the space taken by 26 | the WAL. As is always the case with logical replication slots, unacknowledged 27 | data on the slot will consume disk on the database until it is read. 28 | 29 | There are other utilities that do similar things, often by injecting a C library 30 | into Postgres to do data transformations in place. Unfortunately these 31 | approaches are not suitable for managed databases like AWS' RDS where support 32 | for various plugins is limited and ultimately determined by the hosting provider. 33 | We specifically created pg2kinesis to make use of logical decoding on 34 | `Amazon's RDS for PostgreSQL `_. Amazon 35 | supports logical decoding with either the `test_decoding `_ 36 | or `wal2json `_ 37 | output plugins. This utility takes the output of either plugin, transforms it 38 | with a formatter you can define, and publishes the result to a Kinesis stream 39 | in *transaction commit time order* with a guarantee that *no data will be lost*. 40 | 41 | Installation 42 | ------------ 43 | 44 | Prerequisites 45 | ^^^^^^^^^^^^^ 46 | 47 | #. Python 2.7 or 3.3+ 48 | #. AWS-CLI installed and configured 49 | #. A PostgreSQL 9.4+ server with logical replication enabled 50 | #. A Kinesis stream 51 | 52 | Install 53 | ^^^^^^^ 54 | 55 | ``pip install pg2kinesis`` 56 | 57 | 58 | Tests 59 | ----- 60 | 61 | To run the tests you will need a clone of the repo and some additional requirements: 62 | 63 | #. ``git clone git@github.com:handshake/pg2kinesis.git`` 64 | #. ``cd pg2kinesis`` 65 | #. ``pip install -r requirements.txt`` 66 | #. ``(cd tests && pytest)`` 67 | 68 | 69 | Usage 70 | ----- 71 | 72 | Run ``pg2kinesis --help`` to get a list of the latest command line options.
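For illustration, a typical invocation might look like the sketch below. The flag names come from the project's own option list; the host, database, user, and stream values are placeholders, not defaults.

```shell
# Hypothetical invocation: create the replication slot on first start and
# publish full-row wal2json changes to a named Kinesis stream.
# db.example.com / appdb / replicator / app-cdc are placeholder values.
pg2kinesis --create-slot \
           --pg-host db.example.com \
           --pg-dbname appdb \
           --pg-user replicator \
           --pg-slot-output-plugin wal2json \
           --full-change \
           --message-formatter CSVPayload \
           --stream-name app-cdc
```

Note that ``--full-change`` requires both the ``wal2json`` output plugin and the ``CSVPayload`` formatter, per the assertions in ``__main__.py``.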
73 | 74 | By default pg2kinesis attempts to connect to a local postgres instance and 75 | publish to a stream named ``pg2kinesis`` using the AWS credentials in the 76 | environment the utility was invoked in. 77 | 78 | On successful start it will query your database for the primary key definitions 79 | of every table in ``--pg-dbname``. This is used to identify the correct column 80 | in the test_decoding output to publish. If a table does not have a primary key 81 | its changes will **NOT** be published unless using wal2json and ``--full-change``. 82 | 83 | You can choose between three textual formats that will be sent to the 84 | Kinesis stream: 85 | 86 | * ``CSV``: outputs strings to Kinesis that look like:: 87 | 88 | 0,CDC,,,, 89 | 90 | * ``CSVPayload``: outputs strings similar to the above, except the 3rd column is now a 91 | JSON object representing the change. 92 | 93 | .. code-block:: 94 | 95 | 0,CDC,{ 96 | "xid": <...> 97 | "table": <...> 98 | "operation": <...> 99 | "pkey": <...> 100 | } 101 | 102 | * If ``wal2json`` is being used, this can either be the primary key as above or 103 | the full changed row. 104 | 105 | ..
code-block:: 106 | 107 | 0,CDC,{ 108 | "xid": 30355, 109 | "change": { 110 | "kind": "insert", 111 | "columnnames": ["a", "b"], 112 | "columntypes": ["int4", "int4"], 113 | "table": "foo", 114 | "columnvalues": [1, null], 115 | "schema": "public" 116 | } 117 | } 118 | 119 | 120 | Shout Outs 121 | ---------- 122 | 123 | pg2kinesis is based on the ideas of others including: 124 | 125 | * Logical Decoding: a new world of data exchange applications for Postgres SQL 126 | [(`slides `_)] 127 | * psycopg2 [(`main `_)] [(`repo 128 | `__)] 129 | * bottledwater-pg [(`blog `_)] [(`repo `__)] 130 | * wal2json [(`repo `__)] 131 | 132 | 133 | Future Road Map 134 | --------------- 135 | 136 | * Support full change output from the test_decoding plugin 137 | * Allow HUPing to notify the utility to regenerate the primary key cache 138 | * Support the above on a schedule specified via the command line, with a sensible default of once an hour. 139 | -------------------------------------------------------------------------------- /pg2kinesis/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = '0.7.0' 2 | 3 | __uri__ = 'https://github.com/handshake/pg2kinesis' 4 | __title__ = 'pg2kinesis' 5 | __description__ = "A utility that enables PostgreSQL 9.4+ to logically replicate to Amazon's Kinesis" 6 | __doc__ = __description__ + ' <' + __uri__ + '>' 7 | 8 | __author__ = 'Shon T. Urbas, Geoff Johnson' 9 | __email__ = 'shon.urbas@gmail.com, geoff.johnson@handshake.com' 10 | 11 | __license__ = 'MIT' 12 | __copyright__ = 'Copyright (c) 2016 Handshake Corp.'
13 | -------------------------------------------------------------------------------- /pg2kinesis/__main__.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import time 3 | 4 | import click 5 | 6 | from .slot import SlotReader 7 | from .formatter import get_formatter 8 | from .stream import StreamWriter 9 | from .log import logger 10 | 11 | 12 | SUPPORTED_OPERATIONS = ['update', 'insert', 'delete', 'truncate'] 13 | 14 | @click.command() 15 | @click.option('--pg-dbname', '-d', help='Database to connect to.') 16 | @click.option('--pg-host', '-h', default='', 17 | help='Postgres server location. Leave empty if localhost.') 18 | @click.option('--pg-port', '-p', default='5432', help='Postgres port.') 19 | @click.option('--pg-user', '-u', help='Postgres user.') 20 | @click.option('--pg-sslmode', help='Postgres SSL mode.', default='prefer') 21 | @click.option('--pg-slot-name', '-s', default='pg2kinesis', 22 | help='Postgres replication slot name.') 23 | @click.option('--pg-slot-output-plugin', default='test_decoding', 24 | type=click.Choice(['test_decoding', 'wal2json']), 25 | help='Postgres replication slot output plugin.') 26 | @click.option('--stream-name', '-k', default='pg2kinesis', 27 | help='Kinesis stream name.') 28 | @click.option('--message-formatter', '-f', default='CSVPayload', 29 | type=click.Choice(['CSVPayload', 'CSV']), 30 | help='Kinesis record formatter.') 31 | @click.option('--table-pat', help='Optional regular expression for table names.') 32 | @click.option('--full-change', default=False, is_flag=True, 33 | help='Emit all columns of a changed row.') 34 | @click.option('--create-slot', default=False, is_flag=True, 35 | help='Attempt to create the slot on start.') 36 | @click.option('--recreate-slot', default=False, is_flag=True, 37 | help='Delete the slot on start if it exists, then create it.') 38 | @click.option('--operations', default=('all',), type=click.Choice(['all'] +
SUPPORTED_OPERATIONS), 39 | multiple=True, help='Which operations to replicate to Kinesis. Default: all') 40 | def main(pg_dbname, pg_host, pg_port, pg_user, pg_sslmode, pg_slot_name, pg_slot_output_plugin, 41 | stream_name, message_formatter, table_pat, operations, full_change, create_slot, recreate_slot): 42 | if 'all' in operations: 43 | operations = SUPPORTED_OPERATIONS 44 | 45 | if full_change: 46 | assert message_formatter == 'CSVPayload', 'Full changes must be formatted as JSON.' 47 | assert pg_slot_output_plugin == 'wal2json', 'Full changes must use wal2json.' 48 | 49 | logger.info('Starting pg2kinesis replicating the following operations: %s', ','.join(operations)) 50 | logger.info('Getting Kinesis stream writer') 51 | writer = StreamWriter(stream_name) 52 | 53 | with SlotReader(pg_dbname, pg_host, pg_port, pg_user, pg_sslmode, pg_slot_name, 54 | pg_slot_output_plugin) as reader: 55 | 56 | if recreate_slot: 57 | reader.delete_slot() 58 | reader.create_slot() 59 | elif create_slot: 60 | reader.create_slot() 61 | 62 | pk_map = reader.primary_key_map 63 | formatter = get_formatter(message_formatter, pk_map, 64 | pg_slot_output_plugin, full_change, table_pat) 65 | 66 | consume = Consume(formatter, writer, operations) 67 | 68 | # Blocking. Responds to Control-C.
69 | reader.process_replication_stream(consume) 70 | 71 | class Consume(object): 72 | def __init__(self, formatter, writer, filter_operations): 73 | self.cum_msg_count = 0 74 | self.cum_msg_size = 0 75 | self.msg_window_size = 0 76 | self.msg_window_count = 0 77 | self.cur_window = 0 78 | 79 | self.formatter = formatter 80 | self.writer = writer 81 | self.filter_operations = filter_operations 82 | 83 | def should_send_to_kinesis(self, fmt_msg): 84 | return fmt_msg.change.operation in self.filter_operations 85 | 86 | def __call__(self, change): 87 | self.cum_msg_count += 1 88 | self.cum_msg_size += change.data_size 89 | 90 | self.msg_window_size += change.data_size 91 | self.msg_window_count += 1 92 | 93 | fmt_msgs = self.formatter(change.payload) 94 | 95 | progress_msg = 'xid: {:12} win_count:{:>10} win_size:{:>10}mb cum_count:{:>10} cum_size:{:>10}mb' 96 | 97 | for fmt_msg in fmt_msgs: 98 | if not self.should_send_to_kinesis(fmt_msg): 99 | fmt_msg = None 100 | 101 | did_put = self.writer.put_message(fmt_msg) 102 | if did_put: 103 | change.cursor.send_feedback(flush_lsn=change.data_start) 104 | logger.info('Flushed LSN: {}'.format(change.data_start)) 105 | 106 | int_time = int(time.time()) 107 | if not int_time % 10 and int_time != self.cur_window: 108 | logger.info(progress_msg.format( 109 | self.formatter.cur_xact, self.msg_window_count, 110 | self.msg_window_size / 1048576, self.cum_msg_count, 111 | self.cum_msg_size / 1048576)) 112 | 113 | self.cur_window = int_time 114 | self.msg_window_size = 0 115 | self.msg_window_count = 0 116 | 117 | if __name__ == '__main__': 118 | main() 119 | -------------------------------------------------------------------------------- /pg2kinesis/formatter.py: -------------------------------------------------------------------------------- 1 | from __future__ import unicode_literals 2 | 3 | import json 4 | import re 5 | import sys 6 | 7 | from .log import logger 8 | 9 | from collections import namedtuple 10 | 11 | # Tuples 
representing changes as pulled from database 12 | Change = namedtuple('Change', 'xid, table, operation, pkey') 13 | FullChange = namedtuple('FullChange', 'xid, change') 14 | FullChange.operation = property(lambda self: self.change.get('kind')) 15 | 16 | # Final product of Formatter, a Change and the Change formatted. 17 | Message = namedtuple('Message', 'change, fmt_msg') 18 | 19 | COL_TYPE_VALUE_TEMPLATE_PAT = r"{col_name}\[{col_type}\]:'?([\w\-]+)'?" 20 | MISSING_TABLE_ERR = 'Unable to locate table: "{}"' 21 | MISSING_PK_ERR = 'Unable to locate primary key for table "{}"' 22 | 23 | class Formatter(object): 24 | VERSION = 0 25 | TYPE = 'CDC' 26 | IGNORED_CHANGES = {'COMMIT'} 27 | 28 | def __init__(self, primary_key_map, output_plugin='test_decoding', 29 | full_change=False, table_pat=None): 30 | 31 | self._primary_key_patterns = {} 32 | self.output_plugin = output_plugin 33 | self.primary_key_map = primary_key_map 34 | self.full_change = full_change 35 | self.table_pat = table_pat if table_pat is not None else r'[\w_\.]+' 36 | self.table_re = re.compile(self.table_pat) 37 | self.cur_xact = '' 38 | 39 | for k, v in getattr(primary_key_map, 'iteritems', primary_key_map.items)(): 40 | # ":" appended so later lookups do not need to trim a trailing ":". 41 | self._primary_key_patterns[k + ":"] = re.compile( 42 | COL_TYPE_VALUE_TEMPLATE_PAT.format(col_name=v.col_name, col_type=v.col_type) 43 | ) 44 | 45 | def _preprocess_test_decoding_change(self, change): 46 | """ 47 | Takes a message payload from the test_decoding plugin and distills it 48 | into a Change tuple, currently only looking for the primary key. 49 | 50 | They look like this: 51 | "table table_test: UPDATE: uuid[uuid]:'00079f3e-0479-4475-acff-4f225cc5188a' another_col[text]:'bling'" 52 | 53 | :param change: a message payload from postgres' test_decoding plugin.
54 | :return: A list of type Change 55 | """ 56 | 57 | rec = change.split(' ', 3) 58 | 59 | if rec[0] == 'BEGIN': 60 | self.cur_xact = rec[1] 61 | elif rec[0] in self.IGNORED_CHANGES: 62 | pass 63 | elif rec[0] == 'table': 64 | table_name = rec[1][:-1] 65 | 66 | if self.table_re.search(table_name): 67 | try: 68 | mat = self._primary_key_patterns[rec[1]].search(rec[3]) 69 | except KeyError: 70 | self._log_and_raise(MISSING_TABLE_ERR.format(rec[1])) 71 | else: 72 | if mat: 73 | pkey = mat.groups()[0] 74 | return [Change(xid=self.cur_xact, table=table_name, 75 | operation=rec[2][:-1], pkey=pkey)] 76 | else: 77 | self._log_and_raise(MISSING_PK_ERR.format(table_name)) 78 | else: 79 | self._log_and_raise('Unknown change: "{}"'.format(change)) 80 | 81 | return [] 82 | 83 | def _preprocess_wal2json_change(self, change): 84 | """ 85 | Takes a message payload from the wal2json plugin and distills it into a 86 | list of Change or FullChange tuples. 87 | 88 | They look like this: 89 | { 90 | "xid": 1234567890, 91 | "change": [ 92 | { 93 | "kind": "insert", 94 | "schema": "public", 95 | "table": "some_table", 96 | "columnnames": ["id"], 97 | "columntypes": ["int4"], 98 | "columnvalues": [42] 99 | } 100 | ] 101 | } 102 | :param change: a message payload from postgres wal2json plugin.
103 | :return: A list of type Change or FullChange 104 | """ 105 | 106 | change_dictionary = json.loads(change) 107 | if not change_dictionary: 108 | return [] 109 | 110 | self.cur_xact = change_dictionary['xid'] 111 | changes = [] 112 | 113 | for change in change_dictionary['change']: 114 | table_name = change['table'] 115 | schema = change['schema'] 116 | if self.table_re.search(table_name): 117 | if self.full_change: 118 | changes.append(FullChange(xid=self.cur_xact, change=change)) 119 | else: 120 | try: 121 | full_table = '{}.{}'.format(schema, table_name) 122 | primary_key = self.primary_key_map[full_table] 123 | except KeyError: 124 | self._log_and_raise(MISSING_TABLE_ERR.format(full_table)) 125 | else: 126 | value_index = change['columnnames'].index(primary_key.col_name) 127 | pkey = str(change['columnvalues'][value_index]) 128 | changes.append(Change(xid=self.cur_xact, 129 | table=full_table, 130 | operation=change['kind'].lower(), 131 | pkey=pkey)) 132 | return changes 133 | 134 | @staticmethod 135 | def _log_and_raise(msg): 136 | logger.error(msg) 137 | raise Exception(msg) 138 | 139 | def __call__(self, change): 140 | if self.output_plugin == 'test_decoding': 141 | pp_changes = self._preprocess_test_decoding_change(change) 142 | elif self.output_plugin == 'wal2json': 143 | pp_changes = self._preprocess_wal2json_change(change) 144 | return [self.produce_formatted_message(pp_change) for pp_change in pp_changes] 145 | 146 | def produce_formatted_message(self, change): 147 | raise NotImplementedError 148 | 149 | 150 | class CSVFormatter(Formatter): 151 | VERSION = 0 152 | def produce_formatted_message(self, change): 153 | fmt_msg = '{},{},{},{},{},{}'.format(CSVFormatter.VERSION, 154 | CSVFormatter.TYPE, *change) 155 | return Message(change=change, fmt_msg=fmt_msg) 156 | 157 | 158 | class CSVPayloadFormatter(Formatter): 159 | VERSION = 0 160 | def produce_formatted_message(self, change): 161 | fmt_msg = '{},{},{}'.format(CSVFormatter.VERSION, 
CSVFormatter.TYPE, 162 | json.dumps(change._asdict())) 163 | return Message(change=change, fmt_msg=fmt_msg) 164 | 165 | 166 | def get_formatter(name, primary_key_map, output_plugin, full_change, table_pat): 167 | formatter_f = getattr(sys.modules[__name__], '%sFormatter' % name) 168 | return formatter_f(primary_key_map, output_plugin, full_change, table_pat) 169 | -------------------------------------------------------------------------------- /pg2kinesis/log.py: -------------------------------------------------------------------------------- 1 | import logging 2 | 3 | FORMAT = '%(asctime)-15s %(levelname)s %(message)s' 4 | logging.basicConfig(format=FORMAT) 5 | logger = logging.getLogger() 6 | logger.setLevel(logging.INFO) 7 | -------------------------------------------------------------------------------- /pg2kinesis/slot.py: -------------------------------------------------------------------------------- 1 | from collections import namedtuple 2 | import threading 3 | 4 | import psycopg2 5 | import psycopg2.extras 6 | import psycopg2.extensions 7 | import psycopg2.errorcodes 8 | 9 | from .log import logger 10 | 11 | psycopg2.extensions.register_type(psycopg2.extensions.UNICODE, None) 12 | psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY, None) 13 | 14 | PrimaryKeyMapItem = namedtuple('PrimaryKeyMapItem', 'table_name, col_name, col_type, col_ord_pos') 15 | 16 | 17 | class SlotReader(object): 18 | PK_SQL = """ 19 | SELECT CONCAT(table_schema, '.', table_name), column_name, data_type, ordinal_position 20 | FROM information_schema.tables 21 | LEFT JOIN ( 22 | SELECT CONCAT(table_schema, '.', table_name), column_name, data_type, c.ordinal_position, 23 | table_catalog, table_schema, table_name 24 | FROM information_schema.table_constraints 25 | JOIN information_schema.key_column_usage AS kcu 26 | USING (constraint_catalog, constraint_schema, constraint_name, 27 | table_catalog, table_schema, table_name) 28 | JOIN information_schema.columns AS c 29 | 
USING (table_catalog, table_schema, table_name, column_name) 30 | WHERE constraint_type = 'PRIMARY KEY' 31 | ) as q using (table_catalog, table_schema, table_name) 32 | ORDER BY ordinal_position; 33 | """ 34 | 35 | def __init__(self, database, host, port, user, sslmode, slot_name, 36 | output_plugin='test_decoding'): 37 | # Cool fact: using connections as context manager doesn't close them on 38 | # success after leaving with block 39 | self._db_confg = dict(database=database, host=host, port=port, user=user, sslmode=sslmode) 40 | self._repl_conn = None 41 | self._repl_cursor = None 42 | self._normal_conn = None 43 | self.slot_name = slot_name 44 | self.output_plugin = output_plugin 45 | self.cur_lag = 0 46 | 47 | def __enter__(self): 48 | self._normal_conn = self._get_connection() 49 | self._normal_conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT) 50 | self._repl_conn = self._get_connection(connection_factory=psycopg2.extras.LogicalReplicationConnection) 51 | self._repl_cursor = self._repl_conn.cursor() 52 | 53 | return self 54 | 55 | def __exit__(self, exc_type, exc_val, exc_tb): 56 | """ 57 | Be a good citizen and try to clean up on the way out. 
58 | """ 59 | 60 | try: 61 | self._repl_cursor.close() 62 | except Exception: 63 | pass 64 | 65 | try: 66 | self._repl_conn.close() 67 | except Exception: 68 | pass 69 | 70 | try: 71 | self._normal_conn.close() 72 | except Exception: 73 | pass 74 | 75 | def _get_connection(self, connection_factory=None, cursor_factory=None): 76 | return psycopg2.connect(connection_factory=connection_factory, 77 | cursor_factory=cursor_factory, **self._db_confg) 78 | 79 | def _execute_and_fetch(self, sql, *params): 80 | with self._normal_conn.cursor() as cur: 81 | if params: 82 | cur.execute(sql, params) 83 | else: 84 | cur.execute(sql) 85 | 86 | return cur.fetchall() 87 | 88 | @property 89 | def primary_key_map(self): 90 | logger.info('Getting primary key map') 91 | result = map(PrimaryKeyMapItem._make, self._execute_and_fetch(SlotReader.PK_SQL)) 92 | pk_map = {rec.table_name: rec for rec in result} 93 | 94 | return pk_map 95 | 96 | def create_slot(self): 97 | logger.info('Creating slot %s' % self.slot_name) 98 | try: 99 | self._repl_cursor.create_replication_slot(self.slot_name, 100 | slot_type=psycopg2.extras.REPLICATION_LOGICAL, 101 | output_plugin=self.output_plugin) 102 | except psycopg2.ProgrammingError as p: 103 | # Will be raised if slot exists already. 104 | if p.pgcode != psycopg2.errorcodes.DUPLICATE_OBJECT: 105 | logger.error(p) 106 | raise 107 | else: 108 | logger.info('Slot %s is already present.' % self.slot_name) 109 | 110 | def delete_slot(self): 111 | logger.info('Deleting slot %s' % self.slot_name) 112 | try: 113 | self._repl_cursor.drop_replication_slot(self.slot_name) 114 | except psycopg2.ProgrammingError as p: 115 | # Will be raised if slot does not exist. 116 | if p.pgcode != psycopg2.errorcodes.UNDEFINED_OBJECT: 117 | logger.error(p) 118 | raise 119 | else: 120 | logger.info('Slot %s was not found.' % self.slot_name) 121 | 122 | def process_replication_stream(self, consume): 123 | logger.info('Starting the consumption of slot "%s"!'
% self.slot_name) 124 | if self.output_plugin == 'wal2json': 125 | options = {'include-xids': 1} 126 | else: 127 | options = None 128 | self._repl_cursor.start_replication(self.slot_name, options=options) 129 | self._repl_cursor.consume_stream(consume) 130 | -------------------------------------------------------------------------------- /pg2kinesis/stream.py: -------------------------------------------------------------------------------- 1 | import time 2 | import aws_kinesis_agg.aggregator 3 | import boto3 4 | 5 | from botocore.exceptions import ClientError 6 | from .log import logger 7 | 8 | class StreamWriter(object): 9 | def __init__(self, stream_name, back_off_limit=60, send_window=13): 10 | self.stream_name = stream_name 11 | self.back_off_limit = back_off_limit 12 | self.last_send = 0 13 | 14 | self._kinesis = boto3.client('kinesis') 15 | self._sequence_number_for_ordering = '0' 16 | self._record_agg = aws_kinesis_agg.aggregator.RecordAggregator() 17 | self._send_window = send_window 18 | 19 | try: 20 | self._kinesis.create_stream(StreamName=stream_name, ShardCount=1) 21 | except ClientError as e: 22 | # ResourceInUseException is raised when the stream already exists 23 | if e.response['Error']['Code'] != 'ResourceInUseException': 24 | logger.error(e) 25 | raise 26 | 27 | waiter = self._kinesis.get_waiter('stream_exists') 28 | 29 | # waits up to 180 seconds for stream to exist 30 | waiter.wait(StreamName=self.stream_name) 31 | 32 | def put_message(self, fmt_msg): 33 | agg_record = None 34 | 35 | if fmt_msg: 36 | agg_record = self._record_agg.add_user_record(str(fmt_msg.change.xid), fmt_msg.fmt_msg) 37 | 38 | # agg_record will be a complete record if aggregation is full. 
39 | if agg_record or (self._send_window and time.time() - self.last_send > self._send_window): 40 | agg_record = agg_record if agg_record else self._record_agg.clear_and_get() 41 | self._send_agg_record(agg_record) 42 | self.last_send = time.time() 43 | 44 | return agg_record 45 | 46 | def _send_agg_record(self, agg_record): 47 | if agg_record is None: 48 | return 49 | 50 | pk, _, data = agg_record.get_contents() 51 | logger.info('Sending %s records. Size %s. PK: %s' % 52 | (agg_record.get_num_user_records(), agg_record.get_size_bytes(), pk)) 53 | 54 | back_off = .05 55 | while back_off < self.back_off_limit: 56 | try: 57 | result = self._kinesis.put_record(Data=data, 58 | PartitionKey=pk, 59 | SequenceNumberForOrdering=self._sequence_number_for_ordering, 60 | StreamName=self.stream_name) 61 | 62 | except ClientError as e: 63 | if e.response['Error']["Code"] == 'ProvisionedThroughputExceededException': 64 | back_off *= 2 65 | logger.warning('Provisioned throughput exceeded: sleeping %ss' % back_off) 66 | time.sleep(back_off) 67 | else: 68 | logger.error(e) 69 | raise 70 | else: 71 | logger.debug('Sequence number: %s' % result['SequenceNumber']) 72 | break 73 | else: 74 | raise Exception('ProvisionedThroughputExceededException caused us to back off too many times!') 75 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aws_kinesis_agg==1.1.0 2 | boto3==1.6.19 3 | botocore==1.9.19 4 | click==6.3.0 5 | freezegun==0.3.6 6 | ipdb==0.10.2 7 | ipython<6.0.0,>=0.10.2 8 | protobuf==3.0.0 9 | psycopg2==2.7.4 10 | pytest==3.0.4 11 | mock==2.0.0 12 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [bdist_wheel] 2 | universal = 0 3 | 4 | [metadata] 5 | license_file = LICENSE 6 |
-------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import codecs 2 | import os 3 | import re 4 | 5 | from setuptools import setup, find_packages 6 | 7 | 8 | ############################################################################### 9 | 10 | NAME = 'pg2kinesis' 11 | PACKAGES = find_packages(where='.') 12 | 13 | if not PACKAGES: 14 | raise RuntimeError('Package NOT FOUND!') 15 | 16 | META_PATH = os.path.join('pg2kinesis', '__init__.py') 17 | KEYWORDS = ['logical replication', 'kinesis', 'database'] 18 | CLASSIFIERS = [ 19 | 'Development Status :: 5 - Production/Stable', 20 | 'Intended Audience :: Developers', 21 | 'Natural Language :: English', 22 | 'License :: OSI Approved :: MIT License', 23 | 'Operating System :: OS Independent', 24 | 'Programming Language :: Python', 25 | 'Programming Language :: Python :: 2', 26 | 'Programming Language :: Python :: 2.7', 27 | 'Programming Language :: Python :: 3', 28 | 'Programming Language :: Python :: 3.3', 29 | 'Programming Language :: Python :: 3.4', 30 | 'Programming Language :: Python :: 3.5', 31 | 'Programming Language :: Python :: 3.6', 32 | 'Programming Language :: Python :: Implementation :: CPython', 33 | 'Topic :: Utilities', 34 | ] 35 | INSTALL_REQUIRES = [ 36 | 'aws_kinesis_agg>=1.1.0', 37 | 'boto3>=1.6.19', 38 | 'botocore>=1.9.19', 39 | 'click>=6.3.0', 40 | 'protobuf>=3.0.0', 41 | 'psycopg2>=2.7.4', 42 | ] 43 | 44 | ############################################################################### 45 | 46 | HERE = os.path.abspath(os.path.dirname(__file__)) 47 | 48 | 49 | def read(*parts): 50 | """ 51 | Build an absolute path from *parts* and return the contents of the 52 | resulting file. Assume UTF-8 encoding.
53 | """ 54 | with codecs.open(os.path.join(HERE, *parts), 'rb', 'utf-8') as f: 55 | return f.read() 56 | 57 | 58 | META_FILE = read(META_PATH) 59 | 60 | 61 | def find_meta(meta): 62 | """ 63 | Extract __*meta*__ from META_FILE. 64 | """ 65 | meta_match = re.search( 66 | r'^__{meta}__ = [\'"]([^\'"]*)[\'"]'.format(meta=meta), 67 | META_FILE, re.M 68 | ) 69 | if meta_match: 70 | return meta_match.group(1) 71 | raise RuntimeError('Unable to find __{meta}__ string.'.format(meta=meta)) 72 | 73 | 74 | VERSION = find_meta('version') 75 | URI = find_meta('uri') 76 | LONG = read('README.rst') 77 | 78 | if __name__ == "__main__": 79 | setup( 80 | name=NAME, 81 | description=find_meta("description"), 82 | license=find_meta("license"), 83 | url=URI, 84 | version=VERSION, 85 | author=find_meta("author"), 86 | author_email=find_meta("email"), 87 | maintainer=find_meta("author"), 88 | maintainer_email=find_meta("email"), 89 | keywords=KEYWORDS, 90 | long_description=LONG, 91 | packages=PACKAGES, 92 | zip_safe=True, 93 | classifiers=CLASSIFIERS, 94 | install_requires=INSTALL_REQUIRES, 95 | entry_points={ 96 | 'console_scripts': ['pg2kinesis=pg2kinesis.__main__:main'], 97 | } 98 | ) 99 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/surbas/pg2kinesis/5db63a5476d05aedb524da99b8279642b51aa319/tests/__init__.py -------------------------------------------------------------------------------- /tests/test___main__.py: -------------------------------------------------------------------------------- 1 | from __future__ import unicode_literals 2 | 3 | from mock import Mock, call, patch 4 | 5 | from pg2kinesis.__main__ import Consume 6 | from pg2kinesis.formatter import Message, Change, FullChange 7 | 8 | 9 | def test_consume(): 10 | mock_formatter = Mock(return_value='fmt_msg') 11 | # required to avoid formatting error if 
cur_xact is logged 12 | mock_formatter.cur_xact = 'TEST_TRANSACTION' 13 | mock_writer = Mock() 14 | 15 | consume = Consume(mock_formatter, mock_writer, ['insert', 'update', 'delete', 'truncate']) 16 | 17 | mock_change = Mock() 18 | mock_change.data_start = 10 19 | mock_change.data_size = 100 20 | mock_change.payload = 'PAYLOAD' 21 | 22 | mock_writer.put_message = Mock(return_value=False) 23 | consume.should_send_to_kinesis = Mock(return_value=True) 24 | consume(mock_change) 25 | assert mock_writer.put_message.called, 'Sanity' 26 | assert call.cursor.send_feedback(flush_lsn=10) not in mock_change.mock_calls, \ 27 | 'we did not send feedback!' 28 | 29 | mock_writer.put_message = Mock(return_value=True) 30 | consume(mock_change) 31 | assert mock_writer.put_message.called, 'Sanity' 32 | assert call.cursor.send_feedback(flush_lsn=10) in mock_change.mock_calls, \ 33 | 'we sent feedback!' 34 | 35 | 36 | mock_time = Mock() 37 | mock_time.return_value = 11.0 38 | 39 | consume.msg_window_size = 0 40 | consume.msg_window_count = 0 41 | consume.cur_window = 10 42 | with patch('time.time', mock_time): 43 | consume(mock_change) 44 | assert consume.cur_window == 10, 'cur window not updated if time is non-10-multiple' 45 | assert consume.msg_window_size == 100, 'msg_window_size not reset if time is non-10-multiple' 46 | assert consume.msg_window_count == 1, 'msg_window_count not reset if time is non-10-multiple' 47 | 48 | mock_time.return_value = 20.0 49 | with patch('time.time', mock_time): 50 | consume(mock_change) 51 | assert consume.cur_window == 20, 'cur window updated if time is multiple of 10' 52 | assert consume.msg_window_size == 0, 'msg_window_size reset if time is multiple of 10' 53 | assert consume.msg_window_count == 0, 'msg_window_count reset if time is multiple of 10' 54 | 55 | with patch('time.time', mock_time): 56 | consume(mock_change) 57 | assert consume.msg_window_size == 100, 'msg_window_size not reset if time is same as cur_window' 58 | 59 | 60 | def 
test_consume_excludes(): 61 | mock_formatter = Mock(return_value=[ 62 | Message(Change(1, 'my_table', 'insert', 1), 'formatted_message1'), 63 | Message(FullChange(1, { 64 | "kind": "update", 65 | "schema": "public", 66 | "table": "my_table", 67 | "columnnames": ["id"], 68 | "columntypes": ["int4"], 69 | "columnvalues": [42] 70 | }), 'formatted_message2') 71 | ]) 72 | mock_formatter.cur_xact = 'TEST_TRANSACTION' 73 | mock_change = Mock(data_start=10, data_size=100, payload='PAYLOAD') 74 | mock_writer = Mock() 75 | mock_writer.put_message.return_value = False 76 | 77 | consume = Consume(mock_formatter, mock_writer, ['delete']) 78 | consume(mock_change) 79 | 80 | mock_writer.put_message.assert_has_calls([call(None), call(None)]) 81 | 82 | 83 | def test_consume_includes(): 84 | formatted_messages = [ 85 | Message(Change(1, 'my_table', 'delete', 1), 'formatted_message1'), 86 | Message(FullChange(1, { 87 | "kind": "delete", 88 | "schema": "public", 89 | "table": "my_table", 90 | "columnnames": ["id"], 91 | "columntypes": ["int4"], 92 | "columnvalues": [42] 93 | }), 'formatted_message2') 94 | ] 95 | mock_formatter = Mock(return_value=formatted_messages) 96 | mock_formatter.cur_xact = 'TEST_TRANSACTION' 97 | mock_change = Mock(data_start=10, data_size=100, payload='PAYLOAD', cursor=Mock()) 98 | mock_change.cursor.send_feedback.return_value = True 99 | mock_writer = Mock() 100 | mock_writer.put_message.return_value = True 101 | 102 | consume = Consume(mock_formatter, mock_writer, ['delete']) 103 | consume(mock_change) 104 | 105 | mock_writer.put_message.assert_has_calls([call(msg) for msg in formatted_messages]) 106 | mock_change.cursor.send_feedback.assert_has_calls([call(flush_lsn=mock_change.data_start)]) 107 | -------------------------------------------------------------------------------- /tests/test_formatter.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | from __future__ import unicode_literals 3 | import 
json 4 | 5 | import mock 6 | import pytest 7 | 8 | from pg2kinesis.slot import PrimaryKeyMapItem 9 | from pg2kinesis.formatter import Change, CSVFormatter, CSVPayloadFormatter, Formatter, get_formatter 10 | 11 | 12 | def get_formatter_produce_formatted_message(cls): 13 | # type: (type) -> Message 14 | change = Change(xid=1, table=u'public.blue', operation=u'Update', pkey=u'123456') 15 | result = cls({}).produce_formatted_message(change) 16 | assert result.change == change 17 | return result 18 | 19 | 20 | def test_CSVFormatter_produce_formatted_message(): 21 | result = get_formatter_produce_formatted_message(CSVFormatter) 22 | 23 | assert result.fmt_msg == u'0,CDC,1,public.blue,Update,123456' 24 | 25 | 26 | def test_CSVPayloadFormatter_produce_formatted_message(): 27 | result = get_formatter_produce_formatted_message(CSVPayloadFormatter) 28 | assert result.fmt_msg.startswith(u'0,CDC,') 29 | payload = result.fmt_msg.split(',', 2)[-1] 30 | assert json.loads(payload) == dict(xid=1, table=u'public.blue', operation=u'Update', pkey=u'123456') 31 | 32 | 33 | @pytest.fixture 34 | def pkey_map(): 35 | return {'public.test_table': PrimaryKeyMapItem(u'public.test_table', u'uuid', u'uuid', 0), 36 | 'public.test_table2': PrimaryKeyMapItem(u'public.test_table2', u'name', u'character varying', 0)} 37 | 38 | 39 | @pytest.fixture(params=[CSVFormatter, CSVPayloadFormatter, Formatter]) 40 | def formatter(request, pkey_map): 41 | return request.param(pkey_map) 42 | 43 | 44 | def test___init__(formatter): 45 | # processes primary key map (adds colon) 46 | 47 | patterns = formatter._primary_key_patterns 48 | 49 | assert len(patterns) == 2 50 | assert u'public.test_table:' in patterns, 'with colon' 51 | assert u'public.test_table2:' in patterns, 'with colon' 52 | assert u'public.test_table' not in patterns, 'without colon should not be in patterns' 53 | assert u'public.test_table2' not in patterns, 'without colon should not be in patterns' 54 | assert 
patterns[u'public.test_table:'].pattern == r"uuid\[uuid\]:'?([\w\-]+)'?" 55 | assert patterns[u'public.test_table2:'].pattern == r"name\[character varying\]:'?([\w\-]+)'?" 56 | 57 | 58 | def test__preprocess_test_decoding_change(formatter): 59 | # assert begin -> None + cur trans 60 | assert formatter.cur_xact == '' 61 | result = formatter._preprocess_test_decoding_change(u'BEGIN 100') 62 | assert result == [] 63 | assert formatter.cur_xact == u'100' 64 | 65 | # assert commit -> None 66 | formatter.cur_xact = '' 67 | result = formatter._preprocess_test_decoding_change(u'COMMIT') 68 | assert result == [] 69 | assert formatter.cur_xact == '' 70 | 71 | # error states 72 | with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise: 73 | formatter._preprocess_test_decoding_change(u'UNKNOWN BLING') 74 | assert mock_log_and_raise.called 75 | mock_log_and_raise.assert_called_with(u'Unknown change: "UNKNOWN BLING"') 76 | 77 | with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise: 78 | formatter._preprocess_test_decoding_change(u"table not_a_table: UPDATE: uuid[uuid]:'00079f3e-0479-4475-acff-4f225cc5188a'") 79 | assert mock_log_and_raise.called 80 | mock_log_and_raise.assert_called_with(u'Unable to locate table: "not_a_table:"') 81 | 82 | with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise: 83 | formatter._preprocess_test_decoding_change(u"table public.test_table: UPDATE: not[not]:'00079f3e-0479-4475-acff-4f225cc5188a'") 84 | assert mock_log_and_raise.called 85 | mock_log_and_raise.assert_called_with(u'Unable to locate primary key for table "public.test_table"') 86 | 87 | # assert on proper match 88 | formatter.cur_xact = '1337' 89 | change = formatter._preprocess_test_decoding_change( 90 | u"table public.test_table: UPDATE: uuid[uuid]:'00079f3e-0479-4475-acff-4f225cc5188a'")[0] 91 | 92 | assert change.xid == u'1337' 93 | assert change.table == u'public.test_table' 94 | assert change.operation == u'UPDATE' 95 | assert 
change.pkey == u'00079f3e-0479-4475-acff-4f225cc5188a' 96 | 97 | change = formatter._preprocess_test_decoding_change( 98 | u"table public.test_table2: DELETE: name[character varying]:'Bling-2'")[0] 99 | 100 | assert change.xid == u'1337' 101 | assert change.table == u'public.test_table2' 102 | assert change.operation == u'DELETE' 103 | assert change.pkey == u'Bling-2' 104 | 105 | 106 | def test__preprocess_wal2json_change(formatter): 107 | formatter.cur_xact = '' 108 | result = formatter._preprocess_wal2json_change(u"""{ 109 | "xid": 101, 110 | "change": [] 111 | }""") 112 | assert result == [] 113 | assert formatter.cur_xact == 101 114 | 115 | with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise: 116 | formatter._preprocess_wal2json_change(u"""{ 117 | "xid": 100, 118 | "change": [ 119 | { 120 | "kind": "insert", 121 | "schema": "public", 122 | "table": "not_a_table", 123 | "columnnames": ["uuid"], 124 | "columntypes": ["int4"], 125 | "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"] 126 | } 127 | ] 128 | }""") 129 | assert mock_log_and_raise.called 130 | mock_log_and_raise.assert_called_with(u'Unable to locate table: "public.not_a_table"') 131 | 132 | # assert on proper match 133 | formatter.cur_xact = '1337' 134 | change = formatter._preprocess_wal2json_change(u"""{ 135 | "xid": 1337, 136 | "change": [ 137 | { 138 | "kind": "insert", 139 | "schema": "public", 140 | "table": "test_table", 141 | "columnnames": ["uuid"], 142 | "columntypes": ["int4"], 143 | "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"] 144 | } 145 | ] 146 | }""")[0] 147 | 148 | assert change.xid == 1337 149 | assert change.table == u'public.test_table' 150 | assert change.operation == u'insert' 151 | assert change.pkey == u'00079f3e-0479-4475-acff-4f225cc5188a' 152 | 153 | change = formatter._preprocess_wal2json_change(u"""{ 154 | "xid": 1337, 155 | "change": [ 156 | { 157 | "kind": "delete", 158 | "schema": "public", 159 | "table": "test_table2", 160 | 
"columnnames": ["name"], 161 | "columntypes": ["varchar"], 162 | "columnvalues": ["Bling-2"] 163 | } 164 | ] 165 | }""")[0] 166 | 167 | assert change.xid == 1337 168 | assert change.table == u'public.test_table2' 169 | assert change.operation == u'delete' 170 | assert change.pkey == u'Bling-2' 171 | 172 | 173 | def test__preprocess_wal2json_full_change(formatter): 174 | formatter.cur_xact = '' 175 | formatter.full_change = True 176 | 177 | result = formatter._preprocess_wal2json_change(u"""{ 178 | "xid": 101, 179 | "change": [] 180 | }""") 181 | assert result == [] 182 | assert formatter.cur_xact == 101 183 | 184 | # Full changes from wal2json are not validated against known table map 185 | with mock.patch.object(formatter, '_log_and_raise') as mock_log_and_raise: 186 | formatter._preprocess_wal2json_change(u"""{ 187 | "xid": 100, 188 | "change": [ 189 | { 190 | "kind": "insert", 191 | "schema": "public", 192 | "table": "not_a_table", 193 | "columnnames": ["uuid"], 194 | "columntypes": ["int4"], 195 | "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"] 196 | } 197 | ] 198 | }""") 199 | assert not mock_log_and_raise.called 200 | 201 | # assert on proper match 202 | formatter.cur_xact = '1337' 203 | change = formatter._preprocess_wal2json_change(u"""{ 204 | "xid": 1337, 205 | "change": [ 206 | { 207 | "kind": "insert", 208 | "schema": "public", 209 | "table": "test_table", 210 | "columnnames": ["uuid"], 211 | "columntypes": ["int4"], 212 | "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"] 213 | } 214 | ] 215 | }""")[0] 216 | 217 | assert change.xid == 1337 218 | assert change.change == { 219 | "kind": "insert", 220 | "schema": "public", 221 | "table": "test_table", 222 | "columnnames": ["uuid"], 223 | "columntypes": ["int4"], 224 | "columnvalues": ["00079f3e-0479-4475-acff-4f225cc5188a"] 225 | } 226 | 227 | change = formatter._preprocess_wal2json_change(u"""{ 228 | "xid": 1337, 229 | "change": [ 230 | { 231 | "kind": "delete", 232 | "schema": "public", 
233 | "table": "test_table2", 234 | "columnnames": ["name"], 235 | "columntypes": ["varchar"], 236 | "columnvalues": ["Bling-2"] 237 | } 238 | ] 239 | }""")[0] 240 | 241 | assert change.xid == 1337 242 | assert change.change == { 243 | "kind": "delete", 244 | "schema": "public", 245 | "table": "test_table2", 246 | "columnnames": ["name"], 247 | "columntypes": ["varchar"], 248 | "columnvalues": ["Bling-2"] 249 | } 250 | 251 | 252 | def test_log_and_raise(formatter): 253 | 254 | with mock.patch('logging.Logger.error') as mock_log, pytest.raises(Exception) as e_info: 255 | formatter._log_and_raise(u'HELP!') 256 | 257 | assert str(e_info.value) == u'HELP!' 258 | mock_log.assert_called_with(u'HELP!') 259 | 260 | 261 | def test___call__(formatter): 262 | with mock.patch.object(formatter, '_preprocess_test_decoding_change', return_value=[]) as mock_preprocess_test_decoding_change, \ 263 | mock.patch.object(formatter, 'produce_formatted_message', return_value=None) as mock_pfm: 264 | assert not mock_preprocess_test_decoding_change.called 265 | result = formatter('COMMIT') 266 | assert mock_preprocess_test_decoding_change.called 267 | assert result == [] 268 | assert not mock_pfm.called 269 | 270 | with mock.patch.object(formatter, '_preprocess_test_decoding_change', return_value=['blue']) as mock_preprocess_test_decoding_change, \ 271 | mock.patch.object(formatter, 'produce_formatted_message', return_value='blue msg') as mock_pfm: 272 | assert not mock_preprocess_test_decoding_change.called 273 | result = formatter('blue message') 274 | assert mock_preprocess_test_decoding_change.called 275 | assert result == ['blue msg'] 276 | mock_pfm.assert_called_with('blue') 277 | 278 | 279 | def test_get_formatter(): 280 | with mock.patch.object(Formatter, '__init__', return_value=None) as mocked: 281 | result = get_formatter('CSVPayload', 1, 2, 3, 4) 282 | assert isinstance(result, CSVPayloadFormatter) 283 | assert mocked.called 284 | mocked.assert_called_with(1, 2, 3, 4) 285 | 286 
| with mock.patch.object(Formatter, '__init__', return_value=None) as mocked: 287 | result = get_formatter('CSV', 1, 2, 3, 4) 288 | assert isinstance(result, CSVFormatter) 289 | assert mocked.called 290 | mocked.assert_called_with(1, 2, 3, 4) 291 | -------------------------------------------------------------------------------- /tests/test_slot.py: -------------------------------------------------------------------------------- 1 | from mock import call, Mock, MagicMock, patch, PropertyMock 2 | 3 | import pytest 4 | import psycopg2 5 | import psycopg2.errorcodes 6 | 7 | from pg2kinesis.slot import SlotReader 8 | 9 | 10 | @pytest.fixture 11 | def slot(): 12 | slot = SlotReader('blah_db', 'blah_host', 'blah_port', 'blah_user', 'blah_sslmode', 'pg2kinesis') 13 | slot._repl_cursor = Mock() 14 | slot._repl_conn = Mock() 15 | slot._normal_conn = Mock() 16 | 17 | return slot 18 | 19 | 20 | def test__enter__(slot): 21 | # returns itself 22 | 23 | with patch.object(slot, '_get_connection', side_effect=[Mock(), Mock()]) as mock_gc: 24 | assert slot == slot.__enter__(), 'Returns itself' 25 | assert mock_gc.call_count == 2 26 | 27 | assert call.set_isolation_level(0) in slot._normal_conn.method_calls, 'make sure we are in autocommit' 28 | assert call.cursor() in slot._repl_conn.method_calls, 'we opened a cursor' 29 | 30 | with patch.object(slot, '_get_connection', side_effect=[Mock(), Mock()]) as mock_gc: 31 | slot.__enter__() 32 | 33 | 34 | def test__exit__(slot): 35 | slot.__exit__(None, None, None) 36 | 37 | assert call.close() in slot._repl_cursor.method_calls 38 | assert call.close() in slot._repl_conn.method_calls 39 | assert call.close() in slot._normal_conn.method_calls 40 | 41 | slot._repl_cursor.close = Mock(side_effect=Exception) 42 | slot._repl_conn.close = Mock(side_effect=Exception) 43 | slot._normal_conn.close = Mock(side_effect=Exception) 44 | slot.__exit__(None, None, None) 45 | 46 | assert slot._repl_cursor.close.called, "Still called even though call 
above raised" 47 | assert slot._repl_conn.close.called, "Still called even though call above raised" 48 | assert slot._normal_conn.close.called, "Still called even though call above raised" 49 | 50 | 51 | def test_create_slot(slot): 52 | 53 | with patch.object(psycopg2.ProgrammingError, 'pgcode', 54 | new_callable=PropertyMock, 55 | return_value=psycopg2.errorcodes.DUPLICATE_OBJECT): 56 | pe = psycopg2.ProgrammingError() 57 | 58 | 59 | slot._repl_cursor.create_replication_slot = Mock(side_effect=pe) 60 | slot.create_slot() 61 | slot._repl_cursor.create_replication_slot.assert_called_with('pg2kinesis', 62 | slot_type=psycopg2.extras.REPLICATION_LOGICAL, 63 | output_plugin=u'test_decoding') 64 | with patch.object(psycopg2.ProgrammingError, 'pgcode', 65 | new_callable=PropertyMock, 66 | return_value=-1): 67 | pe = psycopg2.ProgrammingError() 68 | slot._repl_cursor.create_replication_slot = Mock(side_effect=pe) 69 | 70 | with pytest.raises(psycopg2.ProgrammingError) as e_info: 71 | slot.create_slot() 72 | slot._repl_cursor.create_replication_slot.assert_called_with('pg2kinesis', 73 | slot_type=psycopg2.extras.REPLICATION_LOGICAL, 74 | output_plugin=u'test_decoding') 75 | assert e_info.value.pgcode == -1 76 | 77 | slot._repl_cursor.create_replication_slot = Mock(side_effect=Exception) 78 | with pytest.raises(Exception): 79 | slot.create_slot() 80 | slot._repl_cursor.create_replication_slot.assert_called_with('pg2kinesis', 81 | slot_type=psycopg2.extras.REPLICATION_LOGICAL, 82 | output_plugin=u'test_decoding') 83 | 84 | 85 | def test_delete_slot(slot): 86 | with patch.object(psycopg2.ProgrammingError, 'pgcode', 87 | new_callable=PropertyMock, 88 | return_value=psycopg2.errorcodes.UNDEFINED_OBJECT): 89 | pe = psycopg2.ProgrammingError() 90 | slot._repl_cursor.drop_replication_slot = Mock(side_effect=pe) 91 | slot.delete_slot() 92 | slot._repl_cursor.drop_replication_slot.assert_called_with('pg2kinesis') 93 | 94 | with patch.object(psycopg2.ProgrammingError, 'pgcode', 95 
| new_callable=PropertyMock, 96 | return_value=-1): 97 | pe = psycopg2.ProgrammingError() 98 | slot._repl_cursor.drop_replication_slot = Mock(side_effect=pe) 99 | with pytest.raises(psycopg2.ProgrammingError) as e_info: 100 | slot.delete_slot() 101 | slot._repl_cursor.drop_replication_slot.assert_called_with('pg2kinesis') 102 | 103 | assert e_info.value.pgcode == -1 104 | 105 | slot._repl_cursor.drop_replication_slot = Mock(side_effect=Exception) 106 | with pytest.raises(Exception): 107 | slot.delete_slot() 108 | slot._repl_cursor.drop_replication_slot.assert_called_with('pg2kinesis') 109 | 110 | 111 | def test__get_connection(slot): 112 | with patch('psycopg2.connect') as mock_connect: 113 | slot._get_connection() 114 | kall = call(connection_factory=None, cursor_factory=None, database='blah_db', host='blah_host', 115 | port='blah_port', user='blah_user') 116 | assert mock_connect.call_args == kall 117 | 118 | slot._get_connection(connection_factory='connection_fact', cursor_factory='cursor_fact') 119 | kall = call(connection_factory='connection_fact', cursor_factory='cursor_fact', database='blah_db', host='blah_host', 120 | port='blah_port', user='blah_user') 121 | assert mock_connect.call_args == kall 122 | 123 | 124 | def test_primary_key_map(slot): 125 | slot._execute_and_fetch = Mock(return_value=[('test_table', 'pkey', 'uuid', 0), 126 | ('test_table2', 'pkey', 'uuid', 0), 127 | ('blue', 'bkey', 'char var', 10) 128 | ]) 129 | 130 | pkey_map = slot.primary_key_map 131 | 132 | assert len(pkey_map) == 3 133 | assert 'test_table' in pkey_map 134 | assert 'test_table2' in pkey_map 135 | assert 'blue' in pkey_map 136 | 137 | assert pkey_map['blue'].table_name == 'blue' 138 | assert pkey_map['blue'].col_name == 'bkey' 139 | assert pkey_map['blue'].col_type == 'char var' 140 | assert pkey_map['blue'].col_ord_pos == 10 141 | 142 | 143 | def test_execute_and_fetch(slot): 144 | norm_conn = slot._normal_conn 145 | mock_cur = MagicMock() 146 | norm_conn.cursor = 
Mock(return_value=mock_cur) 147 | 148 | slot._execute_and_fetch('SQL SQL STATEMENT', 1, 2, 3) 149 | assert call.__enter__().execute('SQL SQL STATEMENT', (1, 2, 3)) in mock_cur.mock_calls 150 | assert call.__enter__().fetchall() in mock_cur.mock_calls 151 | 152 | mock_cur.reset_mock() 153 | slot._execute_and_fetch('SQL SQL STATEMENT') 154 | assert call.__enter__().execute('SQL SQL STATEMENT') in mock_cur.mock_calls 155 | assert call.__enter__().fetchall() in mock_cur.mock_calls 156 | 157 | 158 | def test_process_replication_stream(slot): 159 | consume = Mock() 160 | slot.process_replication_stream(consume) 161 | 162 | assert call.start_replication('pg2kinesis', options=None) in slot._repl_cursor.method_calls, 'We started replication event loop' 163 | assert call.consume_stream(consume) in slot._repl_cursor.method_calls, 'We pass consume to this method' 164 | 165 | -------------------------------------------------------------------------------- /tests/test_stream.py: -------------------------------------------------------------------------------- 1 | import time 2 | 3 | from freezegun import freeze_time 4 | from mock import patch, call, Mock 5 | import pytest 6 | import boto3 7 | from botocore.exceptions import ClientError 8 | 9 | from pg2kinesis.stream import StreamWriter 10 | 11 | @pytest.fixture() 12 | def writer(): 13 | with patch('aws_kinesis_agg.aggregator.RecordAggregator'), patch.object(boto3, 'client'): 14 | writer = StreamWriter('blah') 15 | return writer 16 | 17 | def test__init__(): 18 | mock_client = Mock() 19 | with patch.object(boto3, 'client', return_value=mock_client): 20 | error_response = {'Error': {'Code': 'ResourceInUseException'}} 21 | mock_client.create_stream = Mock(side_effect=ClientError(error_response, 'create_stream')) 22 | 23 | StreamWriter('blah') 24 | assert mock_client.create_stream.called 25 | assert call.get_waiter('stream_exists') in mock_client.method_calls, "We handled stream existence" 26 | 27 | error_response = {'Error': {'Code': 'Something else'}} 28 | 
mock_client.create_stream = Mock(side_effect=ClientError(error_response, 'create_stream')) 29 | 30 | mock_client.reset_mock() 31 | with pytest.raises(ClientError): 32 | StreamWriter('blah') 33 | assert mock_client.create_stream.called 34 | assert call.get_waiter('stream_exists') not in mock_client.method_calls, "never reached" 35 | 36 | 37 | def test_put_message(writer): 38 | 39 | writer._send_agg_record = Mock() 40 | 41 | msg = Mock() 42 | msg.change.xid = 10 43 | msg.fmt_msg = object() 44 | 45 | writer.last_send = 1445444940.0 - 10 # "2015-10-21 16:28:50" 46 | with freeze_time('2015-10-21 16:29:00'): # -> 1445444940.0 47 | result = writer.put_message(None) 48 | 49 | assert result is None, 'With no message or timeout we did not force a send' 50 | assert not writer._send_agg_record.called, 'we did not force a send' 51 | 52 | writer._record_agg.add_user_record = Mock(return_value=None) 53 | result = writer.put_message(msg) 54 | assert result is None, 'With message, no timeout and not a full agg we do not send' 55 | assert not writer._send_agg_record.called, 'we did not force a send' 56 | 57 | with freeze_time('2015-10-21 16:29:10'): # -> 1445444950.0 58 | result = writer.put_message(None) 59 | assert result is not None, 'Timeout forced a send' 60 | assert writer._send_agg_record.called, 'We sent a record' 61 | assert writer.last_send == 1445444950.0, 'updated window' 62 | 63 | with freeze_time('2015-10-21 16:29:20'): # -> 1445444960.0 64 | writer._send_agg_record.reset_mock() 65 | writer._record_agg.add_user_record = Mock(return_value='blue') 66 | result = writer.put_message(msg) 67 | 68 | assert result == 'blue', 'We passed in a message that forced the agg to report full' 69 | assert writer._send_agg_record.called, 'We sent a record' 70 | assert writer.last_send == 1445444960.0, 'updated window' 71 | 72 | 73 | def test__send_agg_record(writer): 74 | assert writer._send_agg_record(None) is None, 'Do not do anything if agg_record is None' 75 | 76 | agg_rec = Mock() 
77 | agg_rec.get_contents = Mock(return_value=(1, 2, 'datablob')) 78 | 79 | err = ClientError({'Error': {'Code': 'ProvisionedThroughputExceededException'}}, 'put_record') 80 | 81 | writer._kinesis.put_record = Mock(side_effect=[err, err, err, {'SequenceNumber': 12345}]) 82 | 83 | with patch.object(time, 'sleep') as mock_sleep: 84 | writer._send_agg_record(agg_rec) 85 | assert mock_sleep.call_count == 3, "We had to back off 3 times so we slept" 86 | assert mock_sleep.call_args_list == [call(.1), call(.2), call(.4)], 'Geometric back off!' 87 | 88 | with pytest.raises(ClientError): 89 | writer._kinesis.put_record = Mock(side_effect=ClientError({'Error': {'Code': 'Something else'}}, 90 | 'put_record')) 91 | writer._send_agg_record(agg_rec) 92 | 93 | writer.back_off_limit = .3 # Will bust on third go around 94 | writer._kinesis.put_record = Mock(side_effect=[err, err, err, {'SequenceNumber': 12345}]) 95 | with pytest.raises(Exception) as e_info, patch.object(time, 'sleep'): 96 | writer._send_agg_record(agg_rec) 97 | assert str(e_info.value) == 'ProvisionedThroughputExceededException caused a backed off too many times!', \ 98 | 'We raise on too many throughput errors' 99 | --------------------------------------------------------------------------------
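test__send_agg_record above exercises StreamWriter's retry behaviour: doubling sleeps of .1, .2, .4 seconds while Kinesis reports ProvisionedThroughputExceededException, and an exception once the next delay would exceed back_off_limit. A minimal sketch of that geometric back-off loop, assuming a standalone helper (the function name and signature here are illustrative, not pg2kinesis's actual API, and it is simplified to retry any exception where the real writer re-raises non-throughput ClientErrors):

```python
import time


def send_with_backoff(put_record, data, back_off_limit=0.8):
    """Retry put_record with doubling sleeps: 0.1, 0.2, 0.4, ...

    Raises once the next delay would exceed back_off_limit.
    Illustrative sketch only; names do not match pg2kinesis internals.
    """
    delay = 0.1
    while True:
        try:
            return put_record(data)
        except Exception:
            # Simplified: the real writer only retries throughput errors.
            if delay > back_off_limit:
                raise Exception('backed off too many times')
            time.sleep(delay)  # geometric back-off
            delay *= 2
```

With back_off_limit = .3, the third retry would need a .4-second sleep, which is why the last assertion in the test above expects a raise.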