├── LICENSE
├── README.md
├── install-justone-kafka-sink-pg-1.0.sql
├── justone-kafka-sink-pg-json-connector.properties
├── justone-kafka-sink-pg-json-standalone.properties
├── pom.xml
├── src
│   └── main
│       └── java
│           └── com
│               └── justone
│                   └── kafka
│                       └── sink
│                           └── pg
│                               └── json
│                                   ├── PostgreSQLSinkConnector.java
│                                   └── PostgreSQLSinkTask.java
└── uninstall-justone-kafka-sink-pg-1.0.sql

/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 duncanpauly

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# *THIS REPOSITORY IS NOT SUPPORTED*

# kafka-sink-pg-json

## Description

Kafka sink connector for streaming JSON messages into a PostgreSQL table.

The connector receives message values in JSON format, parses them into column values and writes one row to a table for
each message received.

The connector provides configuration options for controlling:

* How JSON messages are parsed into relational rows
* Message delivery semantics (at most once, at least once or exactly once)

## Requirements

* Kafka 0.9 or later
* PostgreSQL 9.0 or later (or a compatible database that supports the COPY interface and the pl/pgsql language)

## Components

### SQL files

* install-justone-kafka-sink-pg-1.0.sql
* uninstall-justone-kafka-sink-pg-1.0.sql

### Property files

* justone-kafka-sink-pg-json-standalone.properties
* justone-kafka-sink-pg-json-connector.properties

### Jar files

* justone-kafka-sink-pg-json-1.0.jar
* justone-json-1.0.jar
* justone-pgwriter-1.0.jar
* postgresql-9.3-1103.jdbc4.jar

## Installation

* Place the jar files in the Kafka library directory (libs)
* Place the property files in the Kafka configuration directory (config)
* Install the package in the database using \i install-justone-kafka-sink-pg-1.0.sql from a psql session, as shown below

Note that the package must be installed in each database the connector will be used with. If the package has not been installed in
the database, you will see an error of the form ERROR: schema "$justone$kafka$connect$sink" does not exist.
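For example, with a hypothetical database named sensordb and a database user named kafka, the installation session looks like:

    $ psql -d sensordb -U kafka
    sensordb=# \i install-justone-kafka-sink-pg-1.0.sql
    sensordb=# \q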
## Uninstall

If you wish to uninstall the package from the database, you can use \i uninstall-justone-kafka-sink-pg-1.0.sql from a psql session.

## Usage

Edit the justone-kafka-sink-pg-json-connector.properties file (see below) to configure the behaviour of the sink connector.

To run the connector in standalone mode, use the following command from the Kafka home directory:

    bin/connect-standalone.sh config/justone-kafka-sink-pg-json-standalone.properties config/justone-kafka-sink-pg-json-connector.properties

To run the connector in distributed mode, see the Kafka documentation.

Please note the following:

* Only the value component of a message is parsed (if an optional key is provided, it is disregarded)
* The connector does not use any schema information and will ignore any schema associated with a message
* If a message contains no elements which can be reached by the configured parse paths, then a row with all null columns is inserted into the table

Typically, a separate topic is configured for each table. The connector can consume messages from multiple topics, but be aware that a message which does not contain any of the configured parse paths will cause a row with null columns to be inserted.

## Configuration

### Kafka Connect

The value converter used by Kafka Connect should be StringConverter and the following property should be set:

    value.converter=org.apache.kafka.connect.storage.StringConverter

This has already been set in the supplied justone-kafka-sink-pg-json-standalone.properties file - but you will likely
need to set it yourself if using another property file for distributed mode.

### Sink Connector

The sink connector properties (justone-kafka-sink-pg-json-connector.properties) are as follows:

* tasks.max - number of tasks to be assigned to the connector. Mandatory. Must be 1 or more.
* topics - topics to consume from. Mandatory.
* db.host - server address/name of the database host. Optional. Default is localhost.
* db.database - database to connect to. Mandatory.
* db.username - username to connect to the database with. Mandatory.
* db.password - password to use for user authentication. Optional. Default is none.
* db.schema - schema of the table to append to. Mandatory.
* db.table - name of the table to append to. Mandatory.
* db.columns - comma separated list of columns to receive JSON element values. Mandatory.
* db.json.parse - comma separated list of parse paths to retrieve JSON elements by (see below). Mandatory.
* db.delivery - type of delivery. Must be one of fastest, guaranteed, synchronized (see below). Optional. Default is synchronized.
* db.buffer.size - buffer size, in bytes, for caching table writes (8000000 in the supplied properties file)

A filled-in example is shown after the delivery modes below.

## Delivery Modes

The connector offers 3 delivery modes:

* Fastest - a message will be delivered at most once, but may be lost.
* Guaranteed - a message is guaranteed to be delivered, but may be duplicated.
* Synchronized - a message is delivered exactly once.

Delivery semantics are controlled by setting the db.delivery property in justone-kafka-sink-pg-json-connector.properties.
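For illustration, a minimal sketch of a completed connector configuration is shown below, using a hypothetical topic, database and table (all values other than name and connector.class are placeholders to replace with your own):

    name=justone-kafka-sink-pg-json
    connector.class=com.justone.kafka.sink.pg.json.PostgreSQLSinkConnector
    tasks.max=1
    topics=readings
    db.host=localhost
    db.database=sensordb
    db.username=kafka
    db.schema=public
    db.table=readings
    db.columns=id,latitude,longitude,acceleration
    db.json.parse=/@identity,/@location/@latitude,/@location/@longitude,/@acceleration
    db.delivery=synchronized
    db.buffer.size=8000000

The column list and parse paths here correspond to the worked example in the JSON Parsing section below.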
Note that the synchronized mode stores Kafka state in the database, and if you subsequently run the connector in a non-synchronized mode
(fastest or guaranteed) then any Kafka state for that table is discarded from the database.

## JSON Parsing

Elements from a JSON message are parsed out into column values, as specified by the list of parse paths in the db.json.parse property.
Each parse path describes the parse route through the message to an element to be extracted from the message. The extracted element may be
any JSON type (null, boolean, number, string, array, object) and the string representation of the extracted element is placed into a column in the
sink table. A parse path corresponds to the column name in the db.columns property at the same list position.

A parse path represents an element hierarchy and is expressed as a string of element identifiers, separated by a delimiting character (typically /):

* A child element within an object is specified using @key, where key is the key of the child element.
* A child element within an array is specified using #index, where index is the index of the child element (starting at 0).

A path must start with the delimiter used to separate element identifiers. This first character is arbitrary and can be chosen to avoid
conflict with key names.

Below are some example paths for the following message:

    {"identity":71293145,"location":{"latitude":51.5009449,"longitude":-2.4773414},"acceleration":[0.01,0.0,0.0]}

* /@identity - the path to element 71293145
* /@location/@longitude - the path to element -2.4773414
* /@acceleration/#0 - the path to element 0.01
* /@location - the path to element {"latitude":51.5009449, "longitude":-2.4773414}

The data type of a column receiving an element must be compatible with the element value passed to it.
When a non-scalar element (object or array) is passed into a column, the target column should be a TEXT, JSON or VARCHAR data type.

To insert messages in the above format into a table with id, latitude, longitude and acceleration columns, the db.columns and db.json.parse configuration properties would be:

* db.columns = id,latitude,longitude,acceleration
* db.json.parse = /@identity,/@location/@latitude,/@location/@longitude,/@acceleration

Note the corresponding positions between columns and their respective parse paths. A matching table definition is sketched at the end of this section.

Where a path does not exist in the JSON message, a null value is placed in the column value. For example, /@foo/@bar would return a null
value from the example message above.
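A sink table matching that configuration could be defined as sketched below; the column types are illustrative and any types compatible with the incoming element values will do:

    CREATE TABLE public.readings (
      id           BIGINT,           -- receives /@identity
      latitude     DOUBLE PRECISION, -- receives /@location/@latitude
      longitude    DOUBLE PRECISION, -- receives /@location/@longitude
      acceleration TEXT              -- receives the /@acceleration array as its string representation
    );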
## Internals

For synchronized delivery, a package of pl/pgsql functions is installed in the database.

The package provides the functions for starting, flushing and getting state to ensure synchronized delivery semantics (exactly once)
from Kafka to the sink table.

The functions and state information are stored in the "$justone$kafka$connect$sink" schema. Within this schema, each sink table has a
corresponding state table called <schema>.<table>, using the schema and name of the sink table.

A state table contains a row for each topic, partition and offset and has the following table definition:

         Column      |       Type        | Modifiers
    -----------------+-------------------+-----------
     kafka_topic     | character varying | not null
     kafka_partition | integer           | not null
     kafka_offset    | bigint            | not null

The start() function is called when a Kafka sink task is started. It creates a temporary sink table and also creates a Kafka
state table if it does not already exist. The sink task can then insert rows into the temporary sink table.

The state() function returns rows from the state table for a specified sink table. This information can be used by the sink
connector to initialise offsets for synchronizing consumption with the table state. Note that this function may return no
rows if the sink table has not been flushed.

The flush() function is called during a sink task flush. It copies rows from the temporary sink table to the permanent sink table
and refreshes the Kafka state information in the state table. Both steps are performed in the same transaction to guarantee
synchronization.

The drop() function is called to drop synchronization state if non-synchronized delivery is used by the sink task.

Process flow is typically:

    SELECT "$justone$kafka$connect$sink".start(<schema>,<table>);
    SELECT kafkaTopic,kafkaPartition,kafkaOffset FROM "$justone$kafka$connect$sink".state(<schema>,<table>);
    ...insert rows into temporary sink table...
    SELECT "$justone$kafka$connect$sink".flush(<schema>,<table>,<topics>,<partitions>,<offsets>);
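For example, a synchronized task session for the hypothetical public.readings table used earlier would issue statements along these lines (the topic, partition and offset literals are illustrative):

    SELECT "$justone$kafka$connect$sink".start('public','readings');
    SELECT kafkaTopic,kafkaPartition,kafkaOffset FROM "$justone$kafka$connect$sink".state('public','readings');
    INSERT INTO pg_temp.readings VALUES (71293145, 51.5009449, -2.4773414, '[0.01,0.0,0.0]');
    SELECT "$justone$kafka$connect$sink".flush('public','readings',
           ARRAY['readings']::VARCHAR[], ARRAY[0]::INTEGER[], ARRAY[42]::BIGINT[]);

Because start() creates the temporary sink table, the insert and flush must run in the same database session.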
## Dependencies

* Kafka Connect API - connect-api-0.9.0.1.jar
* JustOne JSON parser - justone-json-1.0.jar
* JustOne pg writer - justone-pgwriter-1.0.jar

## Support

This repository is not supported.

## Author

Duncan Pauly
--------------------------------------------------------------------------------
/install-justone-kafka-sink-pg-1.0.sql:
--------------------------------------------------------------------------------
/*

MIT License

Copyright (c) 2016 JustOne Database Inc

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

*/

--------------------------------------------------------------------------------------------------------------------------------
--
-- Installs the package for Kafka synchronized delivery used by JustOne Kafka PostgreSQL sink connectors.
--
-- The package provides the functions for starting, flushing and getting state to ensure synchronized delivery semantics (exactly once)
-- from Kafka to the sink table.
--
-- The functions and state information are stored in the "$justone$kafka$connect$sink" schema. Within this schema, each sink table has a
-- corresponding state table called "<schema>.<table>" where <schema> and <table> are the schema and name of the sink table respectively.
-- A state table contains a row for each topic, partition and offset.
--
-- The start() function is called when a Kafka sink task is started. It creates a temporary sink table and also creates a Kafka
-- state table if it does not already exist. The sink task can then insert rows into the temporary sink table.
--
-- The state() function returns rows from the state table for a specified sink table. This information can be used by the sink
-- connector to initialise offsets for synchronizing consumption with the table state. Note that this function may return no
-- rows if the sink table has not been flushed.
--
-- The flush() function is called during a sink task flush. It copies rows from the temporary sink table to the permanent sink table
-- and refreshes the Kafka state information in the state table. This is performed in the same transaction to guarantee
-- synchronization.
--
-- The drop() function is called to drop synchronization state if non-synchronized delivery is used by the sink task.
--
-- Process flow is typically:
--
--   SELECT "$justone$kafka$connect$sink".start(<schema>,<table>);
--   SELECT kafkaTopic,kafkaPartition,kafkaOffset FROM "$justone$kafka$connect$sink".state(<schema>,<table>);
--   ...insert rows into temporary sink table...
--   SELECT "$justone$kafka$connect$sink".flush(<schema>,<table>,<topics>,<partitions>,<offsets>);
--
--------------------------------------------------------------------------------------------------------------------------------
--
-- Version History
-- Version 1.0, 20 April 2016, Duncan Pauly
--
--------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------
-- Schema for Kafka synchronisation package
--------------------------------------------------------------------------------------------------------------------------------
DROP SCHEMA IF EXISTS "$justone$kafka$connect$sink" CASCADE;
CREATE SCHEMA "$justone$kafka$connect$sink";

--------------------------------------------------------------------------------------------------------------------------------
--
-- Function to start task for a sink table
--
-- Parameters:
--
--   schemaName - schema of sink table
--   tableName - name of sink table
--
--------------------------------------------------------------------------------------------------------------------------------
DROP FUNCTION IF EXISTS "$justone$kafka$connect$sink".start(VARCHAR,VARCHAR) CASCADE;
CREATE OR REPLACE FUNCTION "$justone$kafka$connect$sink".start(schemaName VARCHAR, tableName VARCHAR) RETURNS VOID AS $$
BEGIN

  /* create temporary sink table */
  EXECUTE format('CREATE TEMPORARY TABLE %2$s (LIKE "%1$s"."%2$s")',schemaName,tableName);

  /* create state table if it does not exist */
  EXECUTE format('CREATE TABLE IF NOT EXISTS "$justone$kafka$connect$sink"."%1$s.%2$s" (kafka_topic VARCHAR NOT NULL, kafka_partition INTEGER NOT NULL, kafka_offset BIGINT NOT NULL) TABLESPACE pg_default', schemaName, tableName);

  RETURN;

END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------------------------------------------------------------
--
-- Function to flush sink table state
--
-- Parameters:
--
--   schemaName - schema of sink table being flushed
--   tableName - name of sink table being flushed
--   kafkaTopics - array of Kafka topic states
--   kafkaPartitions - array of Kafka partition states
--   kafkaOffsets - array of Kafka offset states
--
--------------------------------------------------------------------------------------------------------------------------------
DROP FUNCTION IF EXISTS "$justone$kafka$connect$sink".flush(VARCHAR,VARCHAR,VARCHAR[], INTEGER[], BIGINT[]) CASCADE;
CREATE OR REPLACE FUNCTION "$justone$kafka$connect$sink".flush(schemaName VARCHAR, tableName VARCHAR, kafkaTopics VARCHAR[], kafkaPartitions INTEGER[], kafkaOffsets BIGINT[]) RETURNS VOID AS $$
BEGIN

  /* ensure temporary sink table exists */
  IF (NOT pg_table_is_visible(format('pg_temp."%1$s"',tableName)::regclass)) THEN
    RAISE EXCEPTION 'No temporary sink table';
  END IF;

  /* copy temporary sink table to permanent sink table */
  EXECUTE format('INSERT INTO "%1$s"."%2$s" SELECT * FROM pg_temp."%2$s"', schemaName, tableName);

  /* truncate temporary sink table */
  EXECUTE format('TRUNCATE TABLE pg_temp."%1$s"', tableName);

  /* truncate state table */
  EXECUTE format('TRUNCATE TABLE "$justone$kafka$connect$sink"."%1$s.%2$s"', schemaName, tableName);

  /* insert kafka information into state table */
  EXECUTE format('INSERT INTO "$justone$kafka$connect$sink"."%1$s.%2$s" (kafka_topic, kafka_partition, kafka_offset) SELECT unnest($1),unnest($2),unnest($3)', schemaName, tableName) USING kafkaTopics, kafkaPartitions, kafkaOffsets;

  RETURN;

END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------------------------------------------------------------
--
-- Function to return Kafka state information for a sink table
--
-- Parameters:
--
--   schemaName - schema of sink table to get state information for
--   tableName - name of sink table to get state information for
--
-- Return:
--
--   kafkaTopic - topic state
--   kafkaPartition - partition state
--   kafkaOffset - offset state
--------------------------------------------------------------------------------------------------------------------------------
DROP FUNCTION IF EXISTS "$justone$kafka$connect$sink".state(VARCHAR,VARCHAR) CASCADE;
CREATE OR REPLACE FUNCTION "$justone$kafka$connect$sink".state(schemaName VARCHAR, tableName VARCHAR)
RETURNS TABLE (kafkaTopic VARCHAR, kafkaPartition INTEGER, kafkaOffset BIGINT) AS $$
BEGIN

  /* query and return rows from state table */
  RETURN QUERY EXECUTE format('SELECT kafka_topic, kafka_partition, kafka_offset FROM "$justone$kafka$connect$sink"."%1$s.%2$s"',schemaName,tableName);

END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------------------------------------------------------------
--
-- Function to drop synchronization state for a sink table
--
-- Parameters:
--
--   schemaName - schema of sink table
--   tableName - name of sink table
--
--------------------------------------------------------------------------------------------------------------------------------
DROP FUNCTION IF EXISTS "$justone$kafka$connect$sink".drop(VARCHAR,VARCHAR) CASCADE;
CREATE OR REPLACE FUNCTION "$justone$kafka$connect$sink".drop(schemaName VARCHAR, tableName VARCHAR) RETURNS VOID AS $$
BEGIN

  /* drop state table if it exists */
  EXECUTE format('DROP TABLE IF EXISTS "$justone$kafka$connect$sink"."%1$s.%2$s"', schemaName, tableName);

  RETURN;

END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------------------------------------------------------------
--
-- Grants
--
--------------------------------------------------------------------------------------------------------------------------------
GRANT ALL ON SCHEMA "$justone$kafka$connect$sink" TO public;
GRANT ALL ON ALL FUNCTIONS IN SCHEMA "$justone$kafka$connect$sink" TO public;
--------------------------------------------------------------------------------
/justone-kafka-sink-pg-json-connector.properties:
--------------------------------------------------------------------------------
# MIT License
#
# Copyright (c) 2016 JustOne Database Inc
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
# Name of the connector (do not change)
#
name=justone-kafka-sink-pg-json
#
# Connector class (do not change)
#
connector.class=com.justone.kafka.sink.pg.json.PostgreSQLSinkConnector
#
# Number of tasks to be assigned to the connector (mandatory)
#
tasks.max=1
#
# Topics to consume from (mandatory)
#
topics=???
#
# Server address/name hosting the database (optional - default is localhost)
#
db.host=localhost
#
# Name of the database to connect to (mandatory)
#
db.database=???
#
# Name of the user to connect to the database with (mandatory)
#
db.username=???
#
# Password to use for user authentication (optional - default is none)
#
# db.password=none
#
# Schema of the table (mandatory)
#
db.schema=public
#
# Table to receive rows (mandatory)
#
db.table=???
#
# Comma separated list of columns to receive json elements (mandatory)
#
db.columns=???
#
# Comma separated list of parse paths to retrieve json elements by (mandatory)
#
db.json.parse=???
#
# Type of delivery (mandatory). Must be one of fastest, guaranteed, synchronized
#
db.delivery=synchronized
#
# Buffer size (bytes) used to cache writes
#
db.buffer.size=8000000
#
--------------------------------------------------------------------------------
/justone-kafka-sink-pg-json-standalone.properties:
--------------------------------------------------------------------------------
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

bootstrap.servers=localhost:9092

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
#
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
#
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
#
key.converter.schemas.enable=false
value.converter.schemas.enable=false
#
# The internal converter used for offsets and config data is configurable and must be specified, but most users will
# always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
#
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
#
# Location of Kafka offsets
#
offset.storage.file.filename=/tmp/connect.offsets
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!-- Note: the XML markup of this file was stripped in the source; the element structure below is reconstructed from the surviving values and the standard Maven POM layout -->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.justone</groupId>
  <artifactId>justone-kafka-sink-pg-json</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>

  <name>kafka-sink-pg-json</name>
  <url>http://maven.apache.org</url>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>connect-api</artifactId>
      <version>0.9.0.1</version>
    </dependency>
    <dependency>
      <groupId>com.justone</groupId>
      <artifactId>justone-json</artifactId>
      <version>1.0.1</version>
    </dependency>
    <dependency>
      <groupId>com.justone</groupId>
      <artifactId>justone-pgwriter</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>
</project>
--------------------------------------------------------------------------------
/src/main/java/com/justone/kafka/sink/pg/json/PostgreSQLSinkConnector.java:
--------------------------------------------------------------------------------
/*

MIT License

Copyright (c) 2016 JustOne Database Inc

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

*/

package com.justone.kafka.sink.pg.json;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.connector.ConnectorContext;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;

/**
 * Kafka sink connector for PostgreSQL
 *
 * @author Duncan Pauly
 * @version 1.0
 *
 */
public class PostgreSQLSinkConnector extends SinkConnector {
  /**
   * Version of the connector
   */
  public final static String VERSION="1.0a";
  /**
   * Configuration properties for the connector
   */
  private Map<String, String> fProperties;

  /**
   * Returns version of the connector
   * @return version
   */
  @Override
  public String version() {

    return VERSION;//return version

  }//version()

  /**
   * Initialise the connector
   * @param ctx context of the connector
   */
  @Override
  public void initialize(ConnectorContext ctx) {
    //do nothing
  }//initialize()

  /**
   * Initialise the connector
   * @param ctx context of the connector
   * @param taskConfigs task configurations
   */
  @Override
  public void initialize(ConnectorContext ctx, List<Map<String, String>> taskConfigs) {
    //do nothing
  }//initialize()

  /**
   * Start the connector
   * @param props connector configuration properties
   */
  @Override
  public void start(Map<String, String> props) {

    fProperties=props;//set connector configuration properties

  }//start()

  /**
   * Stop the connector
   */
  @Override
  public void stop() {
    //do nothing
  }//stop()

  /**
   * Returns class of task
   * @return class of task
   */
  @Override
  public Class<? extends Task> taskClass() {
    return PostgreSQLSinkTask.class;//return task class
  }//taskClass()

  /**
   * Returns task configurations
   * @param maxTasks maximum tasks to execute
   * @return task configurations
   */
  @Override
  public List<Map<String, String>> taskConfigs(int maxTasks) {

    ArrayList<Map<String, String>> configurations = new ArrayList<>();//construct list of configurations

    for (int i = 0; i < maxTasks; i++) {//for each task
      configurations.add(fProperties);//add connector configuration
    }//for each task

    return configurations;//return task configurations

  }//taskConfigs()

}//PostgreSQLSinkConnector{}
--------------------------------------------------------------------------------
/src/main/java/com/justone/kafka/sink/pg/json/PostgreSQLSinkTask.java:
--------------------------------------------------------------------------------
/*

MIT License

Copyright (c) 2016 JustOne Database Inc

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

*/
package com.justone.kafka.sink.pg.json;

import java.io.IOException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collection;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.HashMap;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;
import org.apache.kafka.connect.sink.SinkTaskContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.justone.pgwriter.TableWriter;

import com.justone.json.Element;
import com.justone.json.Parser;
import com.justone.json.Path;

/**
 * Task for PostgreSQL sink connector
 * @author Duncan Pauly
 * @version 1.0
 */
public class PostgreSQLSinkTask extends SinkTask {

  /**
   * Fastest delivery semantic
   */
  private static final int FASTEST = 0;
  /**
   * Guaranteed delivery semantic
   */
  private static final int GUARANTEED = 1;
  /**
   * Synchronised delivery semantic
   */
  private static final int SYNCHRONIZED = 2;
  /**
   * Delivery configuration options
   */
  private static final String[] DELIVERY=new String[]{"FASTEST","GUARANTEED","SYNCHRONIZED"};
  /**
   * Database host server property key
   */
  public static final String HOST_CONFIG = "db.host";
  /**
   * Database name property key
   */
  public static final String DATABASE_CONFIG = "db.database";
  /**
   * Database username property key
   */
  public static final String USER_CONFIG = "db.username";
  /**
   * Database password property key
   */
  public static final String PASSWORD_CONFIG = "db.password";
  /**
   * Schema name property key
   */
  public static final String SCHEMA_CONFIG = "db.schema";
  /**
   * Table name property key
   */
  public static final String TABLE_CONFIG = "db.table";
  /**
   * Table column names property key
   */
  public static final String COLUMN_CONFIG = "db.columns";
  /**
   * Delivery semantics property key
   */
  public static final String DELIVERY_CONFIG = "db.delivery";
  /**
   * JSON parse paths property key
   */
  public static final String PARSE_CONFIG = "db.json.parse";
  /**
   * Buffer size property key
   */
  public static final String BUFFER_CONFIG = "db.buffer.size";
  /**
   * Synchronise command to start sink task
   */
  private static final String SYNC_START = "SELECT \"$justone$kafka$connect$sink\".start('<schema>','<table>')";
  /**
   * Synchronise command to get synchronisation state
   */
  private static final String SYNC_STATE = "SELECT kafkaTopic,kafkaPartition,kafkaOffset FROM \"$justone$kafka$connect$sink\".state('<schema>','<table>')";
  /**
   * Synchronise command to flush
   */
  private static final String SYNC_FLUSH = "SELECT \"$justone$kafka$connect$sink\".flush('<schema>','<table>',?,?,?)";
  /**
   * Synchronise command to drop synchronisation state
   */
  private static final String SYNC_DROP = "SELECT \"$justone$kafka$connect$sink\".drop('<schema>','<table>')";
  /**
   * Logger for trace messages
   */
  private static final Logger fLog = LoggerFactory.getLogger(PostgreSQLSinkTask.class);
  /**
   * Sink task context
   */
  private SinkTaskContext iTaskContext;
  /**
   * Table writer for appending to the table
   */
  private TableWriter iWriter;
  /**
   * Paths for JSON parsing
   */
  private Path[] iPaths;
  /**
   * Parser for JSON parsing
   */
  private Parser iParser;
  /**
   * Delivery semantic
   */
  private int iDelivery;
  /**
   * Database connection
   */
  private Connection iConnection;
  /**
   * Sink table flush statement
   */
  private PreparedStatement iFlushStatement;

  /**
   * Constructor for sink task
   */
  public PostgreSQLSinkTask() {
  }//PostgreSQLSinkTask()

  /**
   * Return connector version
   * @return version string
   */
  @Override
  public String version() {

    return PostgreSQLSinkConnector.VERSION;//return connector version

  }//version()

  /**
   * Initialise sink task
   * @param context context of the sink task
   */
  @Override
  public void initialize(SinkTaskContext context) {

    iTaskContext=context;//save task context

  }//initialize()

  /**
   * Start the task
   * @param props configuration properties
   * @throws ConnectException if failed to start
   */
  @Override
  public void start(Map<String, String> props) throws ConnectException {

    fLog.trace("Starting");

    /* log connector configuration */
    String configuration="\n";
    configuration=configuration+'\t'+HOST_CONFIG+':'+props.get(HOST_CONFIG)+'\n';
    configuration=configuration+'\t'+DATABASE_CONFIG+':'+props.get(DATABASE_CONFIG)+'\n';
    configuration=configuration+'\t'+USER_CONFIG+':'+props.get(USER_CONFIG)+'\n';
    configuration=configuration+'\t'+PASSWORD_CONFIG+':'+props.get(PASSWORD_CONFIG)+'\n';
    configuration=configuration+'\t'+SCHEMA_CONFIG+':'+props.get(SCHEMA_CONFIG)+'\n';
    configuration=configuration+'\t'+TABLE_CONFIG+':'+props.get(TABLE_CONFIG)+'\n';
    configuration=configuration+'\t'+COLUMN_CONFIG+':'+props.get(COLUMN_CONFIG)+'\n';
    configuration=configuration+'\t'+PARSE_CONFIG+':'+props.get(PARSE_CONFIG)+'\n';
    configuration=configuration+'\t'+BUFFER_CONFIG+':'+props.get(BUFFER_CONFIG)+'\n';
    configuration=configuration+'\t'+DELIVERY_CONFIG+':'+props.get(DELIVERY_CONFIG)+'\n';
    fLog.info("Sink connector configuration: " + configuration);

    try {

      /* get configuration properties */
      String host=props.get(HOST_CONFIG);//database host
      String database=props.get(DATABASE_CONFIG);//database name
      String username=props.get(USER_CONFIG);//database username
      String password=props.get(PASSWORD_CONFIG);//database password
      String schema=props.get(SCHEMA_CONFIG);//schema of table to sink to
      String table=props.get(TABLE_CONFIG);//name of table to sink to
      String columnList=props.get(COLUMN_CONFIG);//columns to sink to
      Integer bufferSize=Integer.parseInt(props.get(BUFFER_CONFIG));//task buffer size
      String pathList=props.get(PARSE_CONFIG);//list of JSON parse paths
      String delivery=props.get(DELIVERY_CONFIG);//delivery semantics required

      /* validate configuration */
      if (database==null) throw new ConnectException("Database not configured");//database name is mandatory
      if (schema==null) throw new ConnectException("Schema not configured");//schema name is mandatory
      if (table==null) throw new ConnectException("Table not configured");//table name is mandatory
      if (columnList==null) throw new ConnectException("Columns not configured");//column list is mandatory
      if (pathList==null) throw new ConnectException("Parse paths not configured");//path list is mandatory
      if (bufferSize<0) throw new ConnectException("Buffer size configuration is invalid");//buffer size must not be negative

      /* construct parse paths from path list */
      String[] columns=columnList.split("\\,");//split column list into separate strings
      String[] paths=pathList.split("\\,");//split path list into separate strings
      iPaths=new Path[paths.length];//construct array of paths
      for (int i=0;i<paths.length;i++) {//for each parse path
        iPaths[i]=new Path(paths[i]);//construct parse path (assumed constructor; the original statement was garbled in the source)
      }//for each parse path

      /* NB: the delivery-mode selection and writer/connection setup below were garbled in the source and are
         reconstructed from later references; the TableWriter signature and connection accessor are assumed */

      /* determine delivery semantic */
      iDelivery=-1;//assume no delivery semantic matched
      for (int i=0;i<DELIVERY.length;i++)//for each delivery option
        if (DELIVERY[i].equalsIgnoreCase(delivery)) iDelivery=i;//match configured delivery semantic
      if (iDelivery<0) throw new ConnectException("Delivery configuration is invalid");//delivery semantic must be recognised

      /* construct table writer and database connection */
      iWriter=new TableWriter(host,database,username,password,schema,table,columns,bufferSize);//construct table writer (assumed signature)
      iConnection=iWriter.getConnection();//get database connection from the writer (assumed accessor)
      Statement statement=iConnection.createStatement();//create statement for synchronisation commands

      if (iDelivery==SYNCHRONIZED) {//if synchronized delivery

        /* start the sink task */
        String start=SYNC_START.replace("<schema>",schema).replace("<table>",table);//prepare start statement
        statement.executeQuery(start);//perform start

        /* fetch table state */
        String state=SYNC_STATE.replace("<schema>",schema).replace("<table>",table);//prepare state query statement
        ResultSet resultSet=statement.executeQuery(state);//perform state query

        if (resultSet.isBeforeFirst()) {//if state is not empty
          HashMap<TopicPartition, Long> offsetMap=new HashMap<>();//construct map of offsets
          while (resultSet.next()) {//for each state row
            String topic=resultSet.getString(1);//get topic
            Integer partition=resultSet.getInt(2);//get partition number
            Long offset=resultSet.getLong(3);//get offset number
            offsetMap.put(new TopicPartition(topic,partition),offset);//append to map of offsets
          }//for each partition
          resultSet.close();//be a good citizen

          iTaskContext.offset(offsetMap);//synchronise offsets

        }//if state is not empty

        /* prepare flush statement */
        String flush=SYNC_FLUSH.replace("<schema>",schema).replace("<table>",table);//prepare flush statement
        iFlushStatement=iConnection.prepareStatement(flush);//set flush statement

      } else {//else non-synchronised delivery

        /* drop synchronization state */
        String drop=SYNC_DROP.replace("<schema>",schema).replace("<table>",table);//prepare drop statement
        statement.executeQuery(drop);//perform drop

      }//if synchronized delivery

      iParser=new Parser();//construct parser

    } catch (NumberFormatException | SQLException | IOException exception) {
      throw new ConnectException(exception);//ho hum...
    }//try{}

  }//start()

  /**
   * Parses JSON value in each record and appends JSON elements to the table
   * @param sinkRecords records to be written
   * @throws ConnectException if put fails
   */
  @Override
  public void put(Collection<SinkRecord> sinkRecords) throws ConnectException {

    for (SinkRecord record : sinkRecords) {//for each sink record

      fLog.trace("Put message {}", record.value());

      try {

        iParser.parse(record.value().toString());//parse record value

        /* append parsed JSON elements to the table */
        /* NB: this loop body was garbled in the source and is reconstructed; the parser and writer calls are
           assumed signatures consistent with the README, not confirmed source */
        for (int i=0;i<iPaths.length;i++) {//for each parse path
          Element element=iParser.getElement(iPaths[i]);//get element at parse path, or null if absent (assumed accessor)
          if (element==null) iWriter.write(null);//write null column value when the path is missing (assumed writer API)
          else iWriter.write(element.toString());//write string representation of the element (assumed writer API)
        }//for each parse path
        iWriter.writeRow();//complete the table row for this message (assumed writer API)

      } catch (IOException exception) {
        throw new ConnectException(exception);
      }//try{}

    }//for each sink record

  }//put()

  /**
   * Flushes buffered writes and synchronises Kafka state with the sink table
   * @param offsets map of topic partition offsets to flush
   * @throws ConnectException if flush fails
   */
  @Override
  public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) throws ConnectException {

    fLog.trace("Flush start at "+System.currentTimeMillis());

    try {

      if (iDelivery>FASTEST)//if guaranteed or synchronized
        iWriter.flush();//flush table writes

      if (iDelivery==SYNCHRONIZED) {//if synchronized delivery

        /* create topic, partition and offset arrays for database flush function call */

        int size=offsets.size();//get number of flush map entries
        String[] topicArray=new String[size];//create array for topics
        Integer[] partitionArray=new Integer[size];//create array for partitions
        Long[] offsetArray=new Long[size];//create array for offsets

        /* populate topic, partition and offset arrays */

        Iterator<Entry<TopicPartition, OffsetAndMetadata>> iterator=offsets.entrySet().iterator();//create map iterator
        for (int i=0;i<size;i++) {//for each flush map entry
          Entry<TopicPartition, OffsetAndMetadata> entry=iterator.next();//get next entry
          TopicPartition key=entry.getKey();//get topic partition key
          OffsetAndMetadata value=entry.getValue();//get offset value
          topicArray[i]=key.topic();//put topic into array
          partitionArray[i]=key.partition();//put partition into array
          offsetArray[i]=value.offset();//put offset into array
        }//for each flush map entry

        /* bind arrays to flush statement */

        iFlushStatement.setArray(1, iConnection.createArrayOf("varchar", topicArray));//bind topic array
        iFlushStatement.setArray(2, iConnection.createArrayOf("integer", partitionArray));//bind partition array
        iFlushStatement.setArray(3, iConnection.createArrayOf("bigint", offsetArray));//bind offset array

        /* execute the database flush function */

        iFlushStatement.executeQuery();

      }//if synchronized delivery

    } catch (SQLException | IOException exception) {
      throw new ConnectException(exception);
    }//try{}

    fLog.trace("Flush stop at "+System.currentTimeMillis());

  }//flush()

  /**
   * Stop the sink task
   * @throws ConnectException
   */
  @Override
  public void stop() throws ConnectException {

    fLog.trace("Stopping");

    try {

      iWriter.close();//close table writer

    } catch (IOException exception) {
      throw new ConnectException(exception);
    }//try{}

  }//stop()

}//PostgreSQLSinkTask
--------------------------------------------------------------------------------
/uninstall-justone-kafka-sink-pg-1.0.sql:
--------------------------------------------------------------------------------
/*

MIT License

Copyright (c) 2016 JustOne Database Inc

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy,
modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

*/

--------------------------------------------------------------------------------------------------------------------------------
--
-- Uninstalls synchronisation package for JustOne Kafka PostgreSQL sink connectors.
--
--------------------------------------------------------------------------------------------------------------------------------
--
-- Version History
-- Version 1.0, 20 April 2016, Duncan Pauly
--
--------------------------------------------------------------------------------------------------------------------------------

DROP SCHEMA IF EXISTS "$justone$kafka$connect$sink" CASCADE;
--------------------------------------------------------------------------------