├── .gitignore
├── README.md
├── compose.yaml
├── consumer_thumbnail.jpg
├── github-firehose
│   ├── main.py
│   └── requirements.txt
├── kafka_to_google_thumbnail.jpg
├── performance_thumbnail.jpg
├── processor_thumbnail.png
├── producer_thumbnail.png
├── read_from_kafka
│   ├── main.py
│   └── requirements.txt
├── send_to_kafka
│   ├── main.py
│   └── requirements.txt
├── weather_processor
│   ├── main.py
│   └── requirements.txt
└── weather_to_google
    ├── main.py
    └── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
env/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Python and Kafka with Quix Streams

Supporting code for our [step-by-step coding walkthroughs][youtube-playlist].

# A Simple Producer

[![YouTube Producer Video Thumbnail](producer_thumbnail.png?raw=true)][youtube-producer]
There's a complete [walkthrough video here][youtube-producer]. To run the code in this repo:

```sh
# Set up an environment.
cd send_to_kafka
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

# Run the producer.
python3 main.py
```

# A Simple Consumer

[![YouTube Consumer Video Thumbnail](consumer_thumbnail.jpg?raw=true)][youtube-consumer]
There's a complete [walkthrough video here][youtube-consumer]. To run the code in this repo:

```sh
# Set up an environment.
cd read_from_kafka
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

# Run the consumer.
python3 main.py
```

# A Simple Processor

[![YouTube Processor Video Thumbnail](processor_thumbnail.png?raw=true)][youtube-processor]
There's a complete [walkthrough video here][youtube-processor].
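The processor's core transformation is a plain function, so you can try it on its own before bringing Kafka into the picture. This is a standalone sketch of the `i18n_weather` logic from `weather_processor/main.py` (no broker required):

```python
# Convert a weather message's Celsius reading to Fahrenheit and Kelvin,
# mirroring the i18n_weather function in weather_processor/main.py.
def i18n_weather(msg):
    celsius = msg["current"]["temperature_2m"]
    fahrenheit = (celsius * 9 / 5) + 32
    kelvin = celsius + 273.15

    return {
        "celsius": celsius,
        "fahrenheit": round(fahrenheit, 2),
        "kelvin": round(kelvin, 2),
    }


if __name__ == "__main__":
    sample = {"current": {"temperature_2m": 20.0}}
    print(i18n_weather(sample))  # {'celsius': 20.0, 'fahrenheit': 68.0, 'kelvin': 293.15}
```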
To run the code in this repo:

```sh
# Set up an environment.
cd weather_processor
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

# Run the stream processor.
python3 main.py
```

# Stream Processing to Google Spreadsheets

[![YouTube Kafka To Google Video Thumbnail](kafka_to_google_thumbnail.jpg?raw=true)][youtube-kafka-to-google]
There's a complete [walkthrough video here][youtube-kafka-to-google].

To run the code in this repo, you'll first need to create a Google Developer
client API key file, using the [Google Developer
Console][google-developer-console]. Copy the `client_secret.json` file it
gives you into the `weather_to_google` directory. Then:

```sh
# Set up an environment.
cd weather_to_google
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

# Run the stream processor.
python3 main.py
```

# Performant Python Producers

[![YouTube Performant Python Producers Video Thumbnail](performance_thumbnail.jpg?raw=true)][youtube-performance]
There's a complete [walkthrough video here][youtube-performance]. To run the code in this repo:

```sh
# Set up an environment.
cd github-firehose
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

# Run the producer.
python3 main.py
```

[youtube-producer]: https://youtu.be/D2NYvGlbK0M
[youtube-consumer]: https://youtu.be/eCsSAzTy5cE
[youtube-processor]: https://youtu.be/5sqegy_EPa0
[youtube-kafka-to-google]: https://youtu.be/UHuQndx83I8
[youtube-performance]: https://youtu.be/mdhEXg5Pny8
[youtube-playlist]: https://www.youtube.com/playlist?list=PL5gMntduShmyJd2fsflN1jwLW9XtDMFAX
[google-developer-console]: https://console.cloud.google.com/
--------------------------------------------------------------------------------
/compose.yaml:
--------------------------------------------------------------------------------
services:
  kafka-broker:
    image: docker.redpanda.com/redpandadata/redpanda:v24.1.1
    command: |
      redpanda start
      --smp 1
      --overprovisioned
      --node-id 0
      --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      --advertise-kafka-addr internal://kafka-broker:9092,external://localhost:9092
      --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
      --advertise-pandaproxy-addr internal://kafka-broker:8082,external://localhost:18082
      --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
      --rpc-addr kafka-broker:33145
      --advertise-rpc-addr kafka-broker:33145
      --mode dev-container
      --set auto_create_topics_enabled=true
    ports:
      - 18081:18081
      - 18082:18082
      - 9092:19092
      - 19644:9644
  console:
    image: docker.redpanda.com/redpandadata/console:v2.5.2
    entrypoint: /bin/sh
    command: |-
      -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
    ports:
      - 8080:8080
    environment:
      CONFIG_FILEPATH: /tmp/config.yml
      CONSOLE_CONFIG_FILE: >
        kafka:
          brokers: ["kafka-broker:9092"]
        schemaRegistry:
          enabled: true
          urls: ["http://kafka-broker:8081"]
        redpanda:
          adminApi:
            enabled: true
            urls: ["http://kafka-broker:9644"]
        connect:
          enabled: true
          clusters:
            - name: local-connect-cluster
              url: http://connect:8083
--------------------------------------------------------------------------------
/consumer_thumbnail.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quixio/simple-kafka-python/51dcdabd4f01aa543116626132f7aaa9e5afca35/consumer_thumbnail.jpg
--------------------------------------------------------------------------------
/github-firehose/main.py:
--------------------------------------------------------------------------------
import json
import logging
from pprint import pformat

from quixstreams import Application
from requests_sse import EventSource


def handle_stats(stats_msg: str) -> None:
    stats = json.loads(stats_msg)
    logging.info("STATS: %s", pformat(stats))


def main() -> None:
    logging.info("START")

    app = Application(
        # compose.yaml publishes the broker's external listener on
        # localhost:9092, the same address the other scripts use.
        broker_address="localhost:9092",
        loglevel="DEBUG",
        producer_extra_config={
            "statistics.interval.ms": 3 * 1000,
            "stats_cb": handle_stats,
            "debug": "msg",
            "linger.ms": 200,
            "compression.type": "gzip",
        },
    )

    with (
        app.get_producer() as producer,
        EventSource(
            "http://github-firehose.libraries.io/events",
            timeout=30,
        ) as event_source,
    ):
        for event in event_source:
            value = json.loads(event.data)
            key = value["id"]
            logging.debug("Got: %s", pformat(value))

            producer.produce(
                topic="github_events",
                key=key,
                value=json.dumps(value),
            )


if __name__ == "__main__":
    try:
        logging.basicConfig(level="INFO")
        main()
    except KeyboardInterrupt:
        pass
--------------------------------------------------------------------------------
/github-firehose/requirements.txt:
--------------------------------------------------------------------------------
quixstreams>=2.4.0
requests-sse>=0.3.2
--------------------------------------------------------------------------------
/kafka_to_google_thumbnail.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quixio/simple-kafka-python/51dcdabd4f01aa543116626132f7aaa9e5afca35/kafka_to_google_thumbnail.jpg
--------------------------------------------------------------------------------
/performance_thumbnail.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quixio/simple-kafka-python/51dcdabd4f01aa543116626132f7aaa9e5afca35/performance_thumbnail.jpg
--------------------------------------------------------------------------------
/processor_thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quixio/simple-kafka-python/51dcdabd4f01aa543116626132f7aaa9e5afca35/processor_thumbnail.png
--------------------------------------------------------------------------------
/producer_thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quixio/simple-kafka-python/51dcdabd4f01aa543116626132f7aaa9e5afca35/producer_thumbnail.png
--------------------------------------------------------------------------------
/read_from_kafka/main.py:
--------------------------------------------------------------------------------
from quixstreams import Application
import json


def main():
    app = Application(
        broker_address="localhost:9092",
        loglevel="DEBUG",
        consumer_group="weather_reader",
        auto_offset_reset="latest",
    )

    with app.get_consumer() as consumer:
        consumer.subscribe(["weather_data_demo"])

        while True:
            msg = consumer.poll(1)

            if msg is None:
                print("Waiting...")
            elif msg.error() is not None:
                raise Exception(msg.error())
            else:
                key = msg.key().decode("utf8")
                value = json.loads(msg.value())
                offset = msg.offset()

                print(f"{offset} {key} {value}")
                consumer.store_offsets(msg)


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        pass
--------------------------------------------------------------------------------
/read_from_kafka/requirements.txt:
--------------------------------------------------------------------------------
quixstreams==2.4
--------------------------------------------------------------------------------
/send_to_kafka/main.py:
--------------------------------------------------------------------------------
import requests
import time
import json
import logging
from quixstreams import Application


def get_weather():
    response = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": 51.5,
            "longitude": -0.11,
            "current": "temperature_2m",
        },
    )

    return response.json()


def main():
    app = Application(
        broker_address="localhost:9092",
        loglevel="DEBUG",
    )

    with app.get_producer() as producer:
        while True:
            weather = get_weather()
            logging.debug("Got weather: %s", weather)
            producer.produce(
                topic="weather_data_demo",
                key="London",
                value=json.dumps(weather),
            )
            logging.info("Produced. Sleeping...")
            time.sleep(300)


if __name__ == "__main__":
    logging.basicConfig(level="DEBUG")
    main()
--------------------------------------------------------------------------------
/send_to_kafka/requirements.txt:
--------------------------------------------------------------------------------
requests==2.31
quixstreams==2.4
--------------------------------------------------------------------------------
/weather_processor/main.py:
--------------------------------------------------------------------------------
import logging
from quixstreams import Application


def main():
    logging.info("START")
    app = Application(
        broker_address="localhost:9092",
        loglevel="DEBUG",
        auto_offset_reset="earliest",
        consumer_group="weather_processor",
    )

    input_topic = app.topic("weather_data_demo")
    output_topic = app.topic("weather_i18n")

    def i18n_weather(msg):
        celsius = msg["current"]["temperature_2m"]
        fahrenheit = (celsius * 9 / 5) + 32
        kelvin = celsius + 273.15

        new_msg = {
            "celsius": celsius,
            "fahrenheit": round(fahrenheit, 2),
            "kelvin": round(kelvin, 2),
        }

        logging.debug("Returning: %s", new_msg)

        return new_msg

    sdf = app.dataframe(input_topic)
    sdf = sdf.apply(i18n_weather)
    sdf = sdf.to_topic(output_topic)

    app.run(sdf)


if __name__ == "__main__":
    logging.basicConfig(level="DEBUG")
    main()
--------------------------------------------------------------------------------
/weather_processor/requirements.txt:
--------------------------------------------------------------------------------
quixstreams==2.4
--------------------------------------------------------------------------------
/weather_to_google/main.py:
--------------------------------------------------------------------------------
import logging
from quixstreams import Application
from datetime import timedelta
import pygsheets


def initializer_fn(msg):
    temperature = msg["current"]["temperature_2m"]

    return {
        "open": temperature,
        "high": temperature,
        "low": temperature,
        "close": temperature,
    }


def reducer_fn(summary, msg):
    temperature = msg["current"]["temperature_2m"]

    return {
        "open": summary["open"],
        "high": max(summary["high"], temperature),
        "low": min(summary["low"], temperature),
        "close": temperature,
    }


def main():
    app = Application(
        broker_address="localhost:9092",
        loglevel="DEBUG",
        consumer_group="weather_to_google",
        auto_offset_reset="earliest",
    )

    input_topic = app.topic("weather_data_demo")

    sdf = app.dataframe(input_topic)

    # sdf = sdf.group_into_hourly_batches(...)
    sdf = sdf.tumbling_window(duration_ms=timedelta(hours=1))

    # sdf = sdf.summarize_that_hour(...)
    sdf = sdf.reduce(
        initializer=initializer_fn,
        reducer=reducer_fn,
    )
    sdf = sdf.final()

    sdf = sdf.update(lambda msg: logging.debug("Got: %s", msg))

    google_api = pygsheets.authorize()
    workspace = google_api.open("Weather Sheet")
    sheet = workspace[0]
    sheet.update_values(
        "A1",
        [
            [
                "Start",
                "End",
                "Open",
                "High",
                "Low",
                "Close",
                "Date",
            ]
        ],
    )

    def to_google(msg):
        sheet.insert_rows(
            1,
            values=[
                msg["start"],
                msg["end"],
                msg["value"]["open"],
                msg["value"]["high"],
                msg["value"]["low"],
                msg["value"]["close"],
                "=EPOCHTODATE(A2 / 1000)",
            ],
        )

    # update() passes each message through unchanged; to_google is
    # called purely for its side effect of writing a spreadsheet row.
    sdf = sdf.update(to_google)

    app.run(sdf)


if __name__ == "__main__":
    logging.basicConfig(level="DEBUG")
    main()
--------------------------------------------------------------------------------
/weather_to_google/requirements.txt:
--------------------------------------------------------------------------------
quixstreams >= 2.4
pygsheets >= 2.0.6
--------------------------------------------------------------------------------
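
A quick way to sanity-check the OHLC aggregation in `weather_to_google/main.py` without a broker or Google credentials is to drive `initializer_fn` and `reducer_fn` by hand, the way a tumbling window would. A minimal sketch — the `aggregate` fold loop is ours, only the two functions come from the repo:

```python
# Re-creates the hourly open/high/low/close summary from
# weather_to_google/main.py against an in-memory list of messages.

def initializer_fn(msg):
    temperature = msg["current"]["temperature_2m"]
    return {"open": temperature, "high": temperature,
            "low": temperature, "close": temperature}


def reducer_fn(summary, msg):
    temperature = msg["current"]["temperature_2m"]
    return {
        "open": summary["open"],
        "high": max(summary["high"], temperature),
        "low": min(summary["low"], temperature),
        "close": temperature,
    }


def aggregate(messages):
    # The window initializes its state from the first message,
    # then folds each later message into the running summary.
    summary = initializer_fn(messages[0])
    for msg in messages[1:]:
        summary = reducer_fn(summary, msg)
    return summary


if __name__ == "__main__":
    readings = [{"current": {"temperature_2m": t}} for t in (14.1, 15.3, 13.8, 14.6)]
    print(aggregate(readings))  # {'open': 14.1, 'high': 15.3, 'low': 13.8, 'close': 14.6}
```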