├── .gitignore ├── README.md ├── calcite └── kafka.model.json ├── ksqlDB ├── ksql-cli.sh └── ksqlDB-deployment.yaml ├── materialize ├── materialize-deployment.yaml └── psql-cli.sh └── stream-generator ├── .gitignore ├── Dockerfile ├── generator-deployment.yaml ├── kafa-topics.yaml ├── poetry.lock ├── pyproject.toml └── stream_generator.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | stream-generator/stream_generator.egg-info/ 3 | 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Streaming SQL 2 | 3 | This repo has example Kubernetes deployment files and example applications for various streaming SQL options. 4 | 5 | ## Kafka 6 | 7 | All the examples in this repo assume you have an Apache Kafka cluster running and accessible on your Kubernetes cluster. [Strimzi](https://strimzi.io/) is a great way to get Kafka up and running on Kubernetes. 8 | 9 | Install the latest version of the Strimzi Kafka Cluster Operator on your cluster with the following command, replacing `<my-namespace>` with your target namespace (see [the docs](https://strimzi.io/docs/latest/#downloads-str) for other deployment options): 10 | 11 | ```bash 12 | curl -L http://strimzi.io/install/latest \ 13 | | sed 's/namespace: .*/namespace: <my-namespace>/' \ 14 | | kubectl apply -f - -n <my-namespace> 15 | ``` 16 | 17 | Then you can spin up a 3-node Kafka cluster with the following command (see the [Strimzi project repository](https://github.com/strimzi/strimzi-kafka-operator/tree/master/examples/kafka) for other example Kafka deployments): 18 | 19 | ```bash 20 | kubectl apply -f \ 21 | https://strimzi.io/examples/latest/kafka/kafka-persistent.yaml \ 22 | -n <my-namespace> 23 | ``` 24 | 25 | ## Stream Generator 26 | 27 | To provide an example streaming source for the streaming SQL implementations in this repo, an example stream generator deployment is provided. 
This will stream the [Wikipedia](https://wikipedia.org) changes log into a Kafka topic (removing any non-change messages and errors). This stream source can be deployed using the following command: 28 | 29 | ```bash 30 | kubectl apply -f stream-generator/generator-deployment.yaml 31 | ``` 32 | 33 | By default this will stream messages to the broker bootstrap address for the Strimzi cluster described in the section above. If you are using a different setup, change the `KAFKA_BOOTSTRAP_SERVERS` environment variable in the deployment file. 34 | The generator will stream changes into the `wiki-changes` topic on the configured Kafka broker. If you do not have topic auto-creation enabled, you should create that topic first. If you are using the Strimzi deployment above, which has the Topic Operator enabled, the topic can be created using the command below: 35 | 36 | ```bash 37 | kubectl apply -f stream-generator/kafa-topics.yaml 38 | ``` 39 | 40 | ## ksqlDB 41 | 42 | [ksqlDB](https://ksqldb.io/) is a streaming SQL implementation based on [Apache Kafka](https://kafka.apache.org/) and the [Kafka Streams](https://kafka.apache.org/documentation/streams/) library. 43 | 44 | It is not a fully open source solution as it is licensed under the Confluent Community License Agreement v1.0. 45 | 46 | To test out ksqlDB, you need to change the `KSQL_BOOTSTRAP_SERVERS` environment variable in `ksqlDB-deployment.yaml` to match the bootstrap address of your Kafka cluster. 
If you are using Strimzi with the example above, then this should be set as below (if you changed the `metadata.name` field of the `kafka` custom resource, then change `my-cluster` to match the new name): 47 | 48 | ```yaml 49 | - name: KSQL_BOOTSTRAP_SERVERS 50 | value: PLAINTEXT://my-cluster-kafka-bootstrap:9092 51 | ``` 52 | 53 | Then you can deploy your ksqlDB instance: 54 | 55 | ```bash 56 | $ kubectl apply -f ksqlDB/ksqlDB-deployment.yaml 57 | ``` 58 | 59 | Once deployed, you can interact with the server using the `ksql-cli` command line client running in another pod via the `ksql-cli.sh` script: 60 | 61 | ```bash 62 | $ ./ksqlDB/ksql-cli.sh <my-namespace> 1 63 | ``` 64 | 65 | The first argument is the namespace the ksqlDB instance is deployed in, and the second is a number used to label different invocations of the CLI if you want to run more than one instance for data entry and analysis. 66 | 67 | Now you can play with ksqlDB and follow the [project quickstart guide](https://ksqldb.io/quickstart.html). 68 | 69 | Alternatively, you can write queries against the Wikipedia changes stream. For example, if you want to create a stream which contains the user IDs and the titles of the Wikipedia articles they are editing, you can use the command below: 70 | 71 | ```sql 72 | CREATE STREAM userTitles (user VARCHAR, title VARCHAR) WITH (kafka_topic='wiki-changes', key='user', value_format='json'); 73 | ``` 74 | 75 | You can then see the contents of that stream by using the query below: 76 | 77 | ```sql 78 | SELECT * FROM userTitles EMIT CHANGES; 79 | ``` 80 | 81 | You can create tables from this and other streams, and then query them like tables in a database. 82 | 83 | ## Materialize 84 | 85 | [Materialize](https://materialize.io/) uses a streaming engine based on timely dataflow to provide a PostgreSQL-compatible SQL interface. 86 | 87 | This is not a fully open source solution as it is licensed under the Business Source License v1.1. 
This limits you to running the code on a single instance unless you purchase a licence. 88 | 89 | You can deploy the Materialize server using the following command: 90 | 91 | ```bash 92 | $ kubectl apply -f materialize/materialize-deployment.yaml 93 | ``` 94 | 95 | Once deployed, you can interact with the server using the `psql` command line client running in another pod via the `psql-cli.sh` script: 96 | 97 | ```bash 98 | $ ./materialize/psql-cli.sh <my-namespace> 1 99 | ``` 100 | 101 | The first argument is the namespace the Materialize instance is deployed in, and the second is a number used to label different invocations of the CLI if you want to run more than one instance for data entry and analysis. 102 | 103 | Now you can play with Materialize and follow the [project quickstart guide](https://materialize.io/docs/get-started/). 104 | 105 | ## Apache Calcite 106 | 107 | [Apache Calcite](https://calcite.apache.org/) provides libraries and tools to parse and optimise SQL queries and run them on a large number of different storage layers and execution engines. This includes [experimental support](https://calcite.apache.org/docs/kafka_adapter.html) for Apache Kafka as a data source, which allows you to query topics using Calcite's [Streaming SQL](https://calcite.apache.org/docs/stream.html) support. 108 | 109 | An example Kafka Adapter model file is provided in the `/calcite` folder. This is attached to a simple stream of usernames and titles from the wiki changes stream. 110 | 111 | You can run the Calcite `sqlline` tool by cloning the [Calcite repo](https://github.com/apache/calcite) and running the following command from the repo root: 112 | 113 | ```bash 114 | $ ./sqlline 115 | ``` 116 | 117 | To connect to the Kafka cluster running in `minikube`, you will need to add an external listener to your cluster definition. 
You can add the following config for an unsecured external listener: 118 | 119 | ```yaml 120 | kafka: 121 | listeners: 122 | external: 123 | tls: false 124 | type: nodeport 125 | ``` 126 | 127 | And then expose the listener port using: 128 | 129 | ```bash 130 | kubectl port-forward svc/my-cluster-kafka-external-bootstrap 9094:9094 131 | ``` 132 | 133 | This will make the Kafka bootstrap service available at `localhost:9094`. 134 | 135 | You can then start the Calcite `sqlline` tool and connect to the `user-titles` stream (defined in the `kafka.model.json` schema file) using the command below: 136 | 137 | ```bash 138 | sqlline> !connect jdbc:calcite:model=kafka.model.json admin admin 139 | ``` 140 | 141 | The above assumes that you placed `kafka.model.json` in the Calcite repo root, but you can pass a relative path after the `=` pointing to the schema file. 142 | 143 | You can see the `USER_TITLES` table by running the command below: 144 | 145 | ```sql 146 | SELECT STREAM * FROM KAFKA.USER_TITLES LIMIT 10; 147 | ``` 148 | 149 | The `LIMIT` clause is there so that the otherwise unbounded streaming query returns promptly. 
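If the query returns no rows, it can be worth checking that the `user-titles` topic is actually receiving messages before debugging the Calcite side. A quick sanity check (a sketch, assuming the Strimzi cluster from the earlier sections is named `my-cluster` and deployed in `<my-namespace>`) is to run the console consumer that ships inside the Strimzi broker image:

```bash
# Read a handful of records from the user-titles topic using the
# console consumer bundled in the Strimzi broker pod. The pod name
# my-cluster-kafka-0 assumes the example cluster name used above.
kubectl -n <my-namespace> exec -ti my-cluster-kafka-0 -- \
  bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic user-titles \
  --from-beginning \
  --max-messages 5
```

If nothing is consumed, check the stream generator pod logs (`kubectl logs deployment/stream-generator`) and the topic configuration first.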
150 | 151 | 152 | 153 | 154 | -------------------------------------------------------------------------------- /calcite/kafka.model.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": "1.0", 3 | "defaultSchema": "KAFKA", 4 | "schemas": [ 5 | { 6 | "name": "KAFKA", 7 | "tables": [ 8 | { 9 | "name": "USER_TITLES", 10 | "factory": "org.apache.calcite.adapter.kafka.KafkaTableFactory", 11 | "operand": { 12 | "bootstrap.servers": "localhost:9094", 13 | "topic.name": "user-titles", 14 | "consumer.params": { 15 | "group.id": "calcite-ut-consumer", 16 | "key.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer", 17 | "value.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer" 18 | } 19 | } 20 | } 21 | ] 22 | } 23 | ] 24 | } 25 | -------------------------------------------------------------------------------- /ksqlDB/ksql-cli.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | kubectl -n $1 run ksqldb-cli-$2 -ti \ 4 | --image=confluentinc/ksqldb-cli:0.8.1 \ 5 | --rm=true --restart=Never \ 6 | -- /usr/bin/ksql http://ksqldb-server:8088 7 | -------------------------------------------------------------------------------- /ksqlDB/ksqlDB-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: ksqldb-deployment 5 | labels: 6 | app: ksqldb 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: ksqldb 12 | template: 13 | metadata: 14 | labels: 15 | app: ksqldb 16 | spec: 17 | containers: 18 | - name: ksqldb 19 | image: confluentinc/ksqldb-server:0.8.1 20 | ports: 21 | - containerPort: 8088 22 | env: 23 | - name: KSQL_BOOTSTRAP_SERVERS 24 | value: PLAINTEXT://my-cluster-kafka-bootstrap:9092 25 | - name: KSQL_LISTENERS 26 | value: http://0.0.0.0:8088 27 | - name: 
KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE 28 | value: "true" 29 | - name: KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE 30 | value: "true" 31 | --- 32 | apiVersion: v1 33 | kind: Service 34 | metadata: 35 | name: ksqldb-server 36 | spec: 37 | selector: 38 | app: ksqldb 39 | ports: 40 | - protocol: TCP 41 | port: 8088 42 | targetPort: 8088 -------------------------------------------------------------------------------- /materialize/materialize-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: materialize-deployment 5 | labels: 6 | app: materialize 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: materialize 12 | template: 13 | metadata: 14 | labels: 15 | app: materialize 16 | spec: 17 | containers: 18 | - name: materialize 19 | image: materialize/materialized:latest 20 | ports: 21 | - containerPort: 6875 22 | env: 23 | - name: MZ_THREADS 24 | value: "2" 25 | --- 26 | apiVersion: v1 27 | kind: Service 28 | metadata: 29 | name: materialize-server 30 | spec: 31 | selector: 32 | app: materialize 33 | ports: 34 | - protocol: TCP 35 | port: 6875 36 | targetPort: 6875 37 | -------------------------------------------------------------------------------- /materialize/psql-cli.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | kubectl -n $1 run psql-cli-$2 -ti \ 4 | --image=postgres:latest \ 5 | --rm=true --restart=Never \ 6 | -- psql -h materialize-server -p 6875 materialize 7 | -------------------------------------------------------------------------------- /stream-generator/.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/* 2 | .mypy_cache/* 3 | -------------------------------------------------------------------------------- /stream-generator/Dockerfile: 
-------------------------------------------------------------------------------- 1 | FROM centos:8 2 | 3 | RUN yum -y update \ 4 | && yum -y install python3 python3-pip \ 5 | && yum -y clean all 6 | 7 | RUN pip3 install kafka-python sseclient 8 | 9 | COPY stream_generator.py . 10 | 11 | CMD ["python3","stream_generator.py"] 12 | -------------------------------------------------------------------------------- /stream-generator/generator-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: stream-generator 5 | labels: 6 | app: streamgen 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: streamgen 12 | template: 13 | metadata: 14 | labels: 15 | app: streamgen 16 | spec: 17 | containers: 18 | - name: generator 19 | image: tomncooper/wiki-change-generator:latest 20 | imagePullPolicy: Always 21 | env: 22 | - name: KAFKA_BOOTSTRAP_SERVERS 23 | value: my-cluster-kafka-bootstrap:9092 24 | -------------------------------------------------------------------------------- /stream-generator/kafa-topics.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: kafka.strimzi.io/v1beta1 2 | kind: KafkaTopic 3 | metadata: 4 | name: wiki-changes 5 | labels: 6 | strimzi.io/cluster: my-cluster 7 | spec: 8 | partitions: 4 9 | replicas: 3 10 | config: 11 | retention.ms: 7200000 12 | segment.bytes: 1073741824 13 | --- 14 | apiVersion: kafka.strimzi.io/v1beta1 15 | kind: KafkaTopic 16 | metadata: 17 | name: user-titles 18 | labels: 19 | strimzi.io/cluster: my-cluster 20 | spec: 21 | partitions: 4 22 | replicas: 3 23 | config: 24 | retention.ms: 7200000 25 | segment.bytes: 1073741824 26 | 27 | 28 | -------------------------------------------------------------------------------- /stream-generator/poetry.lock: -------------------------------------------------------------------------------- 1 | [[package]] 2 | category = 
"dev" 3 | description = "Disable App Nap on OS X 10.9" 4 | marker = "sys_platform == \"darwin\"" 5 | name = "appnope" 6 | optional = false 7 | python-versions = "*" 8 | version = "0.1.0" 9 | 10 | [[package]] 11 | category = "dev" 12 | description = "Specifications for callback functions passed in to an API" 13 | name = "backcall" 14 | optional = false 15 | python-versions = "*" 16 | version = "0.1.0" 17 | 18 | [[package]] 19 | category = "main" 20 | description = "Python package for providing Mozilla's CA Bundle." 21 | name = "certifi" 22 | optional = false 23 | python-versions = "*" 24 | version = "2020.4.5.1" 25 | 26 | [[package]] 27 | category = "main" 28 | description = "Universal encoding detector for Python 2 and 3" 29 | name = "chardet" 30 | optional = false 31 | python-versions = "*" 32 | version = "3.0.4" 33 | 34 | [[package]] 35 | category = "dev" 36 | description = "Cross-platform colored terminal text." 37 | marker = "sys_platform == \"win32\"" 38 | name = "colorama" 39 | optional = false 40 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 41 | version = "0.4.3" 42 | 43 | [[package]] 44 | category = "dev" 45 | description = "Decorators for Humans" 46 | name = "decorator" 47 | optional = false 48 | python-versions = ">=2.6, !=3.0.*, !=3.1.*" 49 | version = "4.4.2" 50 | 51 | [[package]] 52 | category = "main" 53 | description = "Internationalized Domain Names in Applications (IDNA)" 54 | name = "idna" 55 | optional = false 56 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 57 | version = "2.9" 58 | 59 | [[package]] 60 | category = "dev" 61 | description = "IPython: Productive Interactive Computing" 62 | name = "ipython" 63 | optional = false 64 | python-versions = ">=3.6" 65 | version = "7.13.0" 66 | 67 | [package.dependencies] 68 | appnope = "*" 69 | backcall = "*" 70 | colorama = "*" 71 | decorator = "*" 72 | jedi = ">=0.10" 73 | pexpect = "*" 74 | pickleshare = "*" 75 | prompt-toolkit = ">=2.0.0,<3.0.0 || 
>3.0.0,<3.0.1 || >3.0.1,<3.1.0" 76 | pygments = "*" 77 | setuptools = ">=18.5" 78 | traitlets = ">=4.2" 79 | 80 | [package.extras] 81 | all = ["numpy (>=1.14)", "testpath", "notebook", "nose (>=0.10.1)", "nbconvert", "requests", "ipywidgets", "qtconsole", "ipyparallel", "Sphinx (>=1.3)", "pygments", "nbformat", "ipykernel"] 82 | doc = ["Sphinx (>=1.3)"] 83 | kernel = ["ipykernel"] 84 | nbconvert = ["nbconvert"] 85 | nbformat = ["nbformat"] 86 | notebook = ["notebook", "ipywidgets"] 87 | parallel = ["ipyparallel"] 88 | qtconsole = ["qtconsole"] 89 | test = ["nose (>=0.10.1)", "requests", "testpath", "pygments", "nbformat", "ipykernel", "numpy (>=1.14)"] 90 | 91 | [[package]] 92 | category = "dev" 93 | description = "Vestigial utilities from IPython" 94 | name = "ipython-genutils" 95 | optional = false 96 | python-versions = "*" 97 | version = "0.2.0" 98 | 99 | [[package]] 100 | category = "dev" 101 | description = "An autocompletion tool for Python that can be used for text editors." 102 | name = "jedi" 103 | optional = false 104 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 105 | version = "0.17.0" 106 | 107 | [package.dependencies] 108 | parso = ">=0.7.0" 109 | 110 | [package.extras] 111 | qa = ["flake8 (3.7.9)"] 112 | testing = ["colorama", "docopt", "pytest (>=3.9.0,<5.0.0)"] 113 | 114 | [[package]] 115 | category = "main" 116 | description = "Pure Python client for Apache Kafka" 117 | name = "kafka-python" 118 | optional = false 119 | python-versions = "*" 120 | version = "2.0.1" 121 | 122 | [[package]] 123 | category = "dev" 124 | description = "Optional static typing for Python" 125 | name = "mypy" 126 | optional = false 127 | python-versions = ">=3.5" 128 | version = "0.770" 129 | 130 | [package.dependencies] 131 | mypy-extensions = ">=0.4.3,<0.5.0" 132 | typed-ast = ">=1.4.0,<1.5.0" 133 | typing-extensions = ">=3.7.4" 134 | 135 | [package.extras] 136 | dmypy = ["psutil (>=4.0)"] 137 | 138 | [[package]] 139 | category = "dev" 140 | 
description = "Experimental type system extensions for programs checked with the mypy typechecker." 141 | name = "mypy-extensions" 142 | optional = false 143 | python-versions = "*" 144 | version = "0.4.3" 145 | 146 | [[package]] 147 | category = "dev" 148 | description = "A Python Parser" 149 | name = "parso" 150 | optional = false 151 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" 152 | version = "0.7.0" 153 | 154 | [package.extras] 155 | testing = ["docopt", "pytest (>=3.0.7)"] 156 | 157 | [[package]] 158 | category = "dev" 159 | description = "Pexpect allows easy control of interactive console applications." 160 | marker = "sys_platform != \"win32\"" 161 | name = "pexpect" 162 | optional = false 163 | python-versions = "*" 164 | version = "4.8.0" 165 | 166 | [package.dependencies] 167 | ptyprocess = ">=0.5" 168 | 169 | [[package]] 170 | category = "dev" 171 | description = "Tiny 'shelve'-like database with concurrency support" 172 | name = "pickleshare" 173 | optional = false 174 | python-versions = "*" 175 | version = "0.7.5" 176 | 177 | [[package]] 178 | category = "dev" 179 | description = "Library for building powerful interactive command lines in Python" 180 | name = "prompt-toolkit" 181 | optional = false 182 | python-versions = ">=3.6.1" 183 | version = "3.0.5" 184 | 185 | [package.dependencies] 186 | wcwidth = "*" 187 | 188 | [[package]] 189 | category = "dev" 190 | description = "Run a subprocess in a pseudo terminal" 191 | marker = "sys_platform != \"win32\"" 192 | name = "ptyprocess" 193 | optional = false 194 | python-versions = "*" 195 | version = "0.6.0" 196 | 197 | [[package]] 198 | category = "dev" 199 | description = "Pygments is a syntax highlighting package written in Python." 200 | name = "pygments" 201 | optional = false 202 | python-versions = ">=3.5" 203 | version = "2.6.1" 204 | 205 | [[package]] 206 | category = "main" 207 | description = "Python HTTP for Humans." 
208 | name = "requests" 209 | optional = false 210 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" 211 | version = "2.23.0" 212 | 213 | [package.dependencies] 214 | certifi = ">=2017.4.17" 215 | chardet = ">=3.0.2,<4" 216 | idna = ">=2.5,<3" 217 | urllib3 = ">=1.21.1,<1.25.0 || >1.25.0,<1.25.1 || >1.25.1,<1.26" 218 | 219 | [package.extras] 220 | security = ["pyOpenSSL (>=0.14)", "cryptography (>=1.3.4)"] 221 | socks = ["PySocks (>=1.5.6,<1.5.7 || >1.5.7)", "win-inet-pton"] 222 | 223 | [[package]] 224 | category = "main" 225 | description = "Python 2 and 3 compatibility utilities" 226 | name = "six" 227 | optional = false 228 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*" 229 | version = "1.14.0" 230 | 231 | [[package]] 232 | category = "main" 233 | description = "Python client library for reading Server Sent Event streams." 234 | name = "sseclient" 235 | optional = false 236 | python-versions = "*" 237 | version = "0.0.26" 238 | 239 | [package.dependencies] 240 | requests = ">=2.9" 241 | six = "*" 242 | 243 | [[package]] 244 | category = "dev" 245 | description = "Traitlets Python config system" 246 | name = "traitlets" 247 | optional = false 248 | python-versions = "*" 249 | version = "4.3.3" 250 | 251 | [package.dependencies] 252 | decorator = "*" 253 | ipython-genutils = "*" 254 | six = "*" 255 | 256 | [package.extras] 257 | test = ["pytest", "mock"] 258 | 259 | [[package]] 260 | category = "dev" 261 | description = "a fork of Python 2 and 3 ast modules with type comment support" 262 | name = "typed-ast" 263 | optional = false 264 | python-versions = "*" 265 | version = "1.4.1" 266 | 267 | [[package]] 268 | category = "dev" 269 | description = "Backported and Experimental Type Hints for Python 3.5+" 270 | name = "typing-extensions" 271 | optional = false 272 | python-versions = "*" 273 | version = "3.7.4.2" 274 | 275 | [[package]] 276 | category = "main" 277 | description = "HTTP library with thread-safe connection pooling, file 
post, and more." 278 | name = "urllib3" 279 | optional = false 280 | python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, <4" 281 | version = "1.25.9" 282 | 283 | [package.extras] 284 | brotli = ["brotlipy (>=0.6.0)"] 285 | secure = ["certifi", "cryptography (>=1.3.4)", "idna (>=2.0.0)", "pyOpenSSL (>=0.14)", "ipaddress"] 286 | socks = ["PySocks (>=1.5.6,<1.5.7 || >1.5.7,<2.0)"] 287 | 288 | [[package]] 289 | category = "dev" 290 | description = "Measures number of Terminal column cells of wide-character codes" 291 | name = "wcwidth" 292 | optional = false 293 | python-versions = "*" 294 | version = "0.1.9" 295 | 296 | [metadata] 297 | content-hash = "6f3d5cafb770579231bdb7cf9f68431c5f96d85c3f6241b81f3de5d0c3bef6f3" 298 | python-versions = "^3.8" 299 | 300 | [metadata.files] 301 | appnope = [ 302 | {file = "appnope-0.1.0-py2.py3-none-any.whl", hash = "sha256:5b26757dc6f79a3b7dc9fab95359328d5747fcb2409d331ea66d0272b90ab2a0"}, 303 | {file = "appnope-0.1.0.tar.gz", hash = "sha256:8b995ffe925347a2138d7ac0fe77155e4311a0ea6d6da4f5128fe4b3cbe5ed71"}, 304 | ] 305 | backcall = [ 306 | {file = "backcall-0.1.0.tar.gz", hash = "sha256:38ecd85be2c1e78f77fd91700c76e14667dc21e2713b63876c0eb901196e01e4"}, 307 | {file = "backcall-0.1.0.zip", hash = "sha256:bbbf4b1e5cd2bdb08f915895b51081c041bac22394fdfcfdfbe9f14b77c08bf2"}, 308 | ] 309 | certifi = [ 310 | {file = "certifi-2020.4.5.1-py2.py3-none-any.whl", hash = "sha256:1d987a998c75633c40847cc966fcf5904906c920a7f17ef374f5aa4282abd304"}, 311 | {file = "certifi-2020.4.5.1.tar.gz", hash = "sha256:51fcb31174be6e6664c5f69e3e1691a2d72a1a12e90f872cbdb1567eb47b6519"}, 312 | ] 313 | chardet = [ 314 | {file = "chardet-3.0.4-py2.py3-none-any.whl", hash = "sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"}, 315 | {file = "chardet-3.0.4.tar.gz", hash = "sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae"}, 316 | ] 317 | colorama = [ 318 | {file = "colorama-0.4.3-py2.py3-none-any.whl", 
hash = "sha256:7d73d2a99753107a36ac6b455ee49046802e59d9d076ef8e47b61499fa29afff"}, 319 | {file = "colorama-0.4.3.tar.gz", hash = "sha256:e96da0d330793e2cb9485e9ddfd918d456036c7149416295932478192f4436a1"}, 320 | ] 321 | decorator = [ 322 | {file = "decorator-4.4.2-py2.py3-none-any.whl", hash = "sha256:41fa54c2a0cc4ba648be4fd43cff00aedf5b9465c9bf18d64325bc225f08f760"}, 323 | {file = "decorator-4.4.2.tar.gz", hash = "sha256:e3a62f0520172440ca0dcc823749319382e377f37f140a0b99ef45fecb84bfe7"}, 324 | ] 325 | idna = [ 326 | {file = "idna-2.9-py2.py3-none-any.whl", hash = "sha256:a068a21ceac8a4d63dbfd964670474107f541babbd2250d61922f029858365fa"}, 327 | {file = "idna-2.9.tar.gz", hash = "sha256:7588d1c14ae4c77d74036e8c22ff447b26d0fde8f007354fd48a7814db15b7cb"}, 328 | ] 329 | ipython = [ 330 | {file = "ipython-7.13.0-py3-none-any.whl", hash = "sha256:eb8d075de37f678424527b5ef6ea23f7b80240ca031c2dd6de5879d687a65333"}, 331 | {file = "ipython-7.13.0.tar.gz", hash = "sha256:ca478e52ae1f88da0102360e57e528b92f3ae4316aabac80a2cd7f7ab2efb48a"}, 332 | ] 333 | ipython-genutils = [ 334 | {file = "ipython_genutils-0.2.0-py2.py3-none-any.whl", hash = "sha256:72dd37233799e619666c9f639a9da83c34013a73e8bbc79a7a6348d93c61fab8"}, 335 | {file = "ipython_genutils-0.2.0.tar.gz", hash = "sha256:eb2e116e75ecef9d4d228fdc66af54269afa26ab4463042e33785b887c628ba8"}, 336 | ] 337 | jedi = [ 338 | {file = "jedi-0.17.0-py2.py3-none-any.whl", hash = "sha256:cd60c93b71944d628ccac47df9a60fec53150de53d42dc10a7fc4b5ba6aae798"}, 339 | {file = "jedi-0.17.0.tar.gz", hash = "sha256:df40c97641cb943661d2db4c33c2e1ff75d491189423249e989bcea4464f3030"}, 340 | ] 341 | kafka-python = [ 342 | {file = "kafka-python-2.0.1.tar.gz", hash = "sha256:e59ad42dec8c7d54e3fbba0c1f8b54c44d92a3392d88242962d0c29803f2f6f8"}, 343 | {file = "kafka_python-2.0.1-py2.py3-none-any.whl", hash = "sha256:513431184ecd08e706ca53421ff23e269fc052374084b45b49640419564dd704"}, 344 | ] 345 | mypy = [ 346 | {file = 
"mypy-0.770-cp35-cp35m-macosx_10_6_x86_64.whl", hash = "sha256:a34b577cdf6313bf24755f7a0e3f3c326d5c1f4fe7422d1d06498eb25ad0c600"}, 347 | {file = "mypy-0.770-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:86c857510a9b7c3104cf4cde1568f4921762c8f9842e987bc03ed4f160925754"}, 348 | {file = "mypy-0.770-cp35-cp35m-win_amd64.whl", hash = "sha256:a8ffcd53cb5dfc131850851cc09f1c44689c2812d0beb954d8138d4f5fc17f65"}, 349 | {file = "mypy-0.770-cp36-cp36m-macosx_10_6_x86_64.whl", hash = "sha256:7687f6455ec3ed7649d1ae574136835a4272b65b3ddcf01ab8704ac65616c5ce"}, 350 | {file = "mypy-0.770-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:3beff56b453b6ef94ecb2996bea101a08f1f8a9771d3cbf4988a61e4d9973761"}, 351 | {file = "mypy-0.770-cp36-cp36m-win_amd64.whl", hash = "sha256:15b948e1302682e3682f11f50208b726a246ab4e6c1b39f9264a8796bb416aa2"}, 352 | {file = "mypy-0.770-cp37-cp37m-macosx_10_6_x86_64.whl", hash = "sha256:b90928f2d9eb2f33162405f32dde9f6dcead63a0971ca8a1b50eb4ca3e35ceb8"}, 353 | {file = "mypy-0.770-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:c56ffe22faa2e51054c5f7a3bc70a370939c2ed4de308c690e7949230c995913"}, 354 | {file = "mypy-0.770-cp37-cp37m-win_amd64.whl", hash = "sha256:8dfb69fbf9f3aeed18afffb15e319ca7f8da9642336348ddd6cab2713ddcf8f9"}, 355 | {file = "mypy-0.770-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:219a3116ecd015f8dca7b5d2c366c973509dfb9a8fc97ef044a36e3da66144a1"}, 356 | {file = "mypy-0.770-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:7ec45a70d40ede1ec7ad7f95b3c94c9cf4c186a32f6bacb1795b60abd2f9ef27"}, 357 | {file = "mypy-0.770-cp38-cp38-win_amd64.whl", hash = "sha256:f91c7ae919bbc3f96cd5e5b2e786b2b108343d1d7972ea130f7de27fdd547cf3"}, 358 | {file = "mypy-0.770-py3-none-any.whl", hash = "sha256:3b1fc683fb204c6b4403a1ef23f0b1fac8e4477091585e0c8c54cbdf7d7bb164"}, 359 | {file = "mypy-0.770.tar.gz", hash = "sha256:8a627507ef9b307b46a1fea9513d5c98680ba09591253082b4c48697ba05a4ae"}, 360 | ] 361 | mypy-extensions = [ 362 | {file = 
"mypy_extensions-0.4.3-py2.py3-none-any.whl", hash = "sha256:090fedd75945a69ae91ce1303b5824f428daf5a028d2f6ab8a299250a846f15d"}, 363 | {file = "mypy_extensions-0.4.3.tar.gz", hash = "sha256:2d82818f5bb3e369420cb3c4060a7970edba416647068eb4c5343488a6c604a8"}, 364 | ] 365 | parso = [ 366 | {file = "parso-0.7.0-py2.py3-none-any.whl", hash = "sha256:158c140fc04112dc45bca311633ae5033c2c2a7b732fa33d0955bad8152a8dd0"}, 367 | {file = "parso-0.7.0.tar.gz", hash = "sha256:908e9fae2144a076d72ae4e25539143d40b8e3eafbaeae03c1bfe226f4cdf12c"}, 368 | ] 369 | pexpect = [ 370 | {file = "pexpect-4.8.0-py2.py3-none-any.whl", hash = "sha256:0b48a55dcb3c05f3329815901ea4fc1537514d6ba867a152b581d69ae3710937"}, 371 | {file = "pexpect-4.8.0.tar.gz", hash = "sha256:fc65a43959d153d0114afe13997d439c22823a27cefceb5ff35c2178c6784c0c"}, 372 | ] 373 | pickleshare = [ 374 | {file = "pickleshare-0.7.5-py2.py3-none-any.whl", hash = "sha256:9649af414d74d4df115d5d718f82acb59c9d418196b7b4290ed47a12ce62df56"}, 375 | {file = "pickleshare-0.7.5.tar.gz", hash = "sha256:87683d47965c1da65cdacaf31c8441d12b8044cdec9aca500cd78fc2c683afca"}, 376 | ] 377 | prompt-toolkit = [ 378 | {file = "prompt_toolkit-3.0.5-py3-none-any.whl", hash = "sha256:df7e9e63aea609b1da3a65641ceaf5bc7d05e0a04de5bd45d05dbeffbabf9e04"}, 379 | {file = "prompt_toolkit-3.0.5.tar.gz", hash = "sha256:563d1a4140b63ff9dd587bda9557cffb2fe73650205ab6f4383092fb882e7dc8"}, 380 | ] 381 | ptyprocess = [ 382 | {file = "ptyprocess-0.6.0-py2.py3-none-any.whl", hash = "sha256:d7cc528d76e76342423ca640335bd3633420dc1366f258cb31d05e865ef5ca1f"}, 383 | {file = "ptyprocess-0.6.0.tar.gz", hash = "sha256:923f299cc5ad920c68f2bc0bc98b75b9f838b93b599941a6b63ddbc2476394c0"}, 384 | ] 385 | pygments = [ 386 | {file = "Pygments-2.6.1-py3-none-any.whl", hash = "sha256:ff7a40b4860b727ab48fad6360eb351cc1b33cbf9b15a0f689ca5353e9463324"}, 387 | {file = "Pygments-2.6.1.tar.gz", hash = "sha256:647344a061c249a3b74e230c739f434d7ea4d8b1d5f3721bc0f3558049b38f44"}, 388 | ] 389 | 
requests = [ 390 | {file = "requests-2.23.0-py2.py3-none-any.whl", hash = "sha256:43999036bfa82904b6af1d99e4882b560e5e2c68e5c4b0aa03b655f3d7d73fee"}, 391 | {file = "requests-2.23.0.tar.gz", hash = "sha256:b3f43d496c6daba4493e7c431722aeb7dbc6288f52a6e04e7b6023b0247817e6"}, 392 | ] 393 | six = [ 394 | {file = "six-1.14.0-py2.py3-none-any.whl", hash = "sha256:8f3cd2e254d8f793e7f3d6d9df77b92252b52637291d0f0da013c76ea2724b6c"}, 395 | {file = "six-1.14.0.tar.gz", hash = "sha256:236bdbdce46e6e6a3d61a337c0f8b763ca1e8717c03b369e87a7ec7ce1319c0a"}, 396 | ] 397 | sseclient = [ 398 | {file = "sseclient-0.0.26.tar.gz", hash = "sha256:33f45ab71bb6369025d6a1014e15f12774f7ea25b7e80eeb00bd73668d5fefad"}, 399 | ] 400 | traitlets = [ 401 | {file = "traitlets-4.3.3-py2.py3-none-any.whl", hash = "sha256:70b4c6a1d9019d7b4f6846832288f86998aa3b9207c6821f3578a6a6a467fe44"}, 402 | {file = "traitlets-4.3.3.tar.gz", hash = "sha256:d023ee369ddd2763310e4c3eae1ff649689440d4ae59d7485eb4cfbbe3e359f7"}, 403 | ] 404 | typed-ast = [ 405 | {file = "typed_ast-1.4.1-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:73d785a950fc82dd2a25897d525d003f6378d1cb23ab305578394694202a58c3"}, 406 | {file = "typed_ast-1.4.1-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:aaee9905aee35ba5905cfb3c62f3e83b3bec7b39413f0a7f19be4e547ea01ebb"}, 407 | {file = "typed_ast-1.4.1-cp35-cp35m-win32.whl", hash = "sha256:0c2c07682d61a629b68433afb159376e24e5b2fd4641d35424e462169c0a7919"}, 408 | {file = "typed_ast-1.4.1-cp35-cp35m-win_amd64.whl", hash = "sha256:4083861b0aa07990b619bd7ddc365eb7fa4b817e99cf5f8d9cf21a42780f6e01"}, 409 | {file = "typed_ast-1.4.1-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:269151951236b0f9a6f04015a9004084a5ab0d5f19b57de779f908621e7d8b75"}, 410 | {file = "typed_ast-1.4.1-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:24995c843eb0ad11a4527b026b4dde3da70e1f2d8806c99b7b4a7cf491612652"}, 411 | {file = "typed_ast-1.4.1-cp36-cp36m-manylinux1_x86_64.whl", hash = 
"sha256:fe460b922ec15dd205595c9b5b99e2f056fd98ae8f9f56b888e7a17dc2b757e7"}, 412 | {file = "typed_ast-1.4.1-cp36-cp36m-win32.whl", hash = "sha256:4e3e5da80ccbebfff202a67bf900d081906c358ccc3d5e3c8aea42fdfdfd51c1"}, 413 | {file = "typed_ast-1.4.1-cp36-cp36m-win_amd64.whl", hash = "sha256:249862707802d40f7f29f6e1aad8d84b5aa9e44552d2cc17384b209f091276aa"}, 414 | {file = "typed_ast-1.4.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:8ce678dbaf790dbdb3eba24056d5364fb45944f33553dd5869b7580cdbb83614"}, 415 | {file = "typed_ast-1.4.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:c9e348e02e4d2b4a8b2eedb48210430658df6951fa484e59de33ff773fbd4b41"}, 416 | {file = "typed_ast-1.4.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:bcd3b13b56ea479b3650b82cabd6b5343a625b0ced5429e4ccad28a8973f301b"}, 417 | {file = "typed_ast-1.4.1-cp37-cp37m-win32.whl", hash = "sha256:d5d33e9e7af3b34a40dc05f498939f0ebf187f07c385fd58d591c533ad8562fe"}, 418 | {file = "typed_ast-1.4.1-cp37-cp37m-win_amd64.whl", hash = "sha256:0666aa36131496aed8f7be0410ff974562ab7eeac11ef351def9ea6fa28f6355"}, 419 | {file = "typed_ast-1.4.1-cp38-cp38-macosx_10_15_x86_64.whl", hash = "sha256:d205b1b46085271b4e15f670058ce182bd1199e56b317bf2ec004b6a44f911f6"}, 420 | {file = "typed_ast-1.4.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:6daac9731f172c2a22ade6ed0c00197ee7cc1221aa84cfdf9c31defeb059a907"}, 421 | {file = "typed_ast-1.4.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:498b0f36cc7054c1fead3d7fc59d2150f4d5c6c56ba7fb150c013fbc683a8d2d"}, 422 | {file = "typed_ast-1.4.1-cp38-cp38-win32.whl", hash = "sha256:715ff2f2df46121071622063fc7543d9b1fd19ebfc4f5c8895af64a77a8c852c"}, 423 | {file = "typed_ast-1.4.1-cp38-cp38-win_amd64.whl", hash = "sha256:fc0fea399acb12edbf8a628ba8d2312f583bdbdb3335635db062fa98cf71fca4"}, 424 | {file = "typed_ast-1.4.1-cp39-cp39-macosx_10_15_x86_64.whl", hash = "sha256:d43943ef777f9a1c42bf4e552ba23ac77a6351de620aa9acf64ad54933ad4d34"}, 425 | {file = "typed_ast-1.4.1.tar.gz", hash = 
"sha256:8c8aaad94455178e3187ab22c8b01a3837f8ee50e09cf31f1ba129eb293ec30b"}, 426 | ] 427 | typing-extensions = [ 428 | {file = "typing_extensions-3.7.4.2-py2-none-any.whl", hash = "sha256:f8d2bd89d25bc39dabe7d23df520442fa1d8969b82544370e03d88b5a591c392"}, 429 | {file = "typing_extensions-3.7.4.2-py3-none-any.whl", hash = "sha256:6e95524d8a547a91e08f404ae485bbb71962de46967e1b71a0cb89af24e761c5"}, 430 | {file = "typing_extensions-3.7.4.2.tar.gz", hash = "sha256:79ee589a3caca649a9bfd2a8de4709837400dfa00b6cc81962a1e6a1815969ae"}, 431 | ] 432 | urllib3 = [ 433 | {file = "urllib3-1.25.9-py2.py3-none-any.whl", hash = "sha256:88206b0eb87e6d677d424843ac5209e3fb9d0190d0ee169599165ec25e9d9115"}, 434 | {file = "urllib3-1.25.9.tar.gz", hash = "sha256:3018294ebefce6572a474f0604c2021e33b3fd8006ecd11d62107a5d2a963527"}, 435 | ] 436 | wcwidth = [ 437 | {file = "wcwidth-0.1.9-py2.py3-none-any.whl", hash = "sha256:cafe2186b3c009a04067022ce1dcd79cb38d8d65ee4f4791b8888d6599d1bbe1"}, 438 | {file = "wcwidth-0.1.9.tar.gz", hash = "sha256:ee73862862a156bf77ff92b09034fc4825dd3af9cf81bc5b360668d425f3c5f1"}, 439 | ] 440 | -------------------------------------------------------------------------------- /stream-generator/pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "stream-generator" 3 | version = "0.1.0" 4 | description = "" 5 | authors = ["Thomas Cooper "] 6 | 7 | [tool.poetry.dependencies] 8 | python = "^3.8" 9 | sseclient = "^0.0.26" 10 | kafka-python = "^2.0.1" 11 | 12 | [tool.poetry.dev-dependencies] 13 | ipython = "^7.13.0" 14 | mypy = "^0.770" 15 | 16 | [build-system] 17 | requires = ["poetry>=0.12"] 18 | build-backend = "poetry.masonry.api" 19 | -------------------------------------------------------------------------------- /stream-generator/stream_generator.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import json 3 | import os 4 | import sys 5 
| import signal 6 | 7 | from argparse import ArgumentParser, Namespace 8 | from typing import Iterator, Optional 9 | 10 | from sseclient import SSEClient as EventSource 11 | from kafka import KafkaProducer 12 | 13 | WIKIPEDIA_CHANGES: str = "https://stream.wikimedia.org/v2/stream/recentchange" 14 | 15 | BOOTSTRAP_ENV_VAR: str = "KAFKA_BOOTSTRAP_SERVERS" 16 | 17 | LOG: logging.Logger = logging.getLogger("streamingSQL.generator") 18 | 19 | def wikipedia_changes() -> Iterator[Optional[dict]]: 20 | 21 | for event in EventSource(WIKIPEDIA_CHANGES): 22 | if event.event == 'message': 23 | LOG.debug("Processing new message") 24 | try: 25 | change = json.loads(event.data) 26 | except json.JSONDecodeError: 27 | LOG.warning("JSON Decode failed on message: %s", event.data) 28 | yield None 29 | else: 30 | LOG.debug("Yielding message") 31 | yield change 32 | 33 | def setup_logging(logger: logging.Logger, debug: bool = False) -> None: 34 | 35 | style = "{" 36 | 37 | if debug: 38 | level: int = logging.DEBUG 39 | console_fmt: str = "{asctime} | {levelname} | {name} | F:{funcName} | L:{lineno} | {message}" 40 | 41 | else: 42 | level = logging.INFO 43 | console_fmt = "{asctime} | {levelname} | {name} | {message}" 44 | 45 | console_handler: logging.StreamHandler = logging.StreamHandler() 46 | console_handler.setFormatter(logging.Formatter(fmt=console_fmt, style=style)) 47 | logger.addHandler(console_handler) 48 | logger.setLevel(level) 49 | 50 | def create_parser() -> ArgumentParser: 51 | 52 | parser: ArgumentParser = ArgumentParser(description="Source stream generation program") 53 | 54 | parser.add_argument( 55 | "--debug", 56 | required=False, 57 | action="store_true", 58 | help="Flag indicating if debug information should be logged.", 59 | ) 60 | 61 | parser.add_argument( 62 | "-bs", 63 | "--bootstrap_servers", 64 | required=False, 65 | help="The server bootstrap string for the Kafka cluster", 66 | ) 67 | 68 | return parser 69 | 70 | def send_to_kafka(producer: KafkaProducer, topic: str,
payload_str: str, key_str: Optional[str] = None) -> None: 71 | 72 | try: 73 | payload: bytes = payload_str.encode("UTF-8") 74 | except UnicodeError as serialise_error: 75 | LOG.warning("Error encoding message payload to bytes: %s", str(serialise_error)) 76 | return None 77 | 78 | key: Optional[bytes] 79 | if key_str: 80 | try: 81 | key = key_str.encode("UTF-8") 82 | except UnicodeError as serialise_error: 83 | LOG.warning("Error encoding message key to bytes: %s", str(serialise_error)) 84 | return None 85 | 86 | else: 87 | key = None 88 | 89 | try: 90 | if key: 91 | producer.send(topic, key=key, value=payload) 92 | else: 93 | producer.send(topic, value=payload) 94 | except Exception as kafka_error: 95 | LOG.warning("Error when sending to Kafka: %s", str(kafka_error)) 96 | 97 | if __name__ == "__main__": 98 | 99 | PARSER: ArgumentParser = create_parser() 100 | ARGS: Namespace = PARSER.parse_args() 101 | 102 | TOP_LOG: logging.Logger = logging.getLogger("streamingSQL") 103 | 104 | setup_logging(TOP_LOG, ARGS.debug) 105 | 106 | TOP_LOG.info("Starting stream generation program") 107 | 108 | TOP_LOG.info( 109 | "Command line bootstrap servers argument: %s", 110 | ARGS.bootstrap_servers, 111 | ) 112 | 113 | # If the command line args are specified then use them. If not then look for env vars
 114 | # and if they aren't present exit. 115 | if not ARGS.bootstrap_servers: 116 | if BOOTSTRAP_ENV_VAR in os.environ: 117 | BOOTSTRAP: str = os.environ[BOOTSTRAP_ENV_VAR] 118 | TOP_LOG.info("Using Kafka bootstrap address (%s) defined in %s environment variable", 119 | BOOTSTRAP, BOOTSTRAP_ENV_VAR) 120 | else: 121 | TOP_LOG.error( 122 | "Kafka bootstrap servers string was not supplied via the command line " 123 | "argument or the environment variable (%s). 
Exiting.", BOOTSTRAP_ENV_VAR) 124 | sys.exit(1) 125 | else: 126 | BOOTSTRAP = ARGS.bootstrap_servers 127 | 128 | TOP_LOG.info( 129 | "Creating Kafka Producer for Kafka Cluster at: %s", BOOTSTRAP, 130 | ) 131 | 132 | PRODUCER: KafkaProducer = KafkaProducer(bootstrap_servers=BOOTSTRAP) 133 | 134 | def terminate_handler(sigterm, frame): 135 | TOP_LOG.warning("SIGTERM signal received, closing Kafka Producer") 136 | PRODUCER.close() 137 | sys.exit(0) 138 | # Intercept SIGTERM and close PRODUCER gracefully 139 | signal.signal(signal.SIGTERM, terminate_handler) 140 | 141 | WIKI_CHANGES: Iterator[Optional[dict]] = wikipedia_changes() 142 | 143 | try: 144 | TOP_LOG.info("Starting stream generation") 145 | while True: 146 | 147 | try: 148 | change: Optional[dict] = next(WIKI_CHANGES) 149 | except Exception as read_err: 150 | TOP_LOG.warning("Error fetching Wikipedia change message: %s", str(read_err)) 151 | continue 152 | 153 | if not change: 154 | TOP_LOG.warning("Returned wiki change was empty") 155 | continue 156 | 157 | try: 158 | payload_str: str = json.dumps(change) 159 | except Exception as json_error: 160 | TOP_LOG.warning("Error encoding change message to JSON: %s", str(json_error)) 161 | continue 162 | 163 | # Send raw wiki change to wiki-changes 164 | send_to_kafka(PRODUCER, "wiki-changes", payload_str=payload_str) 165 | 166 | # Send user and title to user-titles topic for simple example. 167 | try: 168 | send_to_kafka(PRODUCER, "user-titles", key_str=change["user"], payload_str=change["title"]) 169 | except KeyError as key_error: 170 | TOP_LOG.error("Wiki change dictionary did not contain expected key: %s", str(key_error)) 171 | continue 172 | 173 | except KeyboardInterrupt: 174 | TOP_LOG.info("Closing Kafka producer...") 175 | PRODUCER.close() 176 | TOP_LOG.info("Kafka producer closed. Exiting.") 177 | 178 | --------------------------------------------------------------------------------