├── 01-processes.livemd ├── 02-tasks.livemd ├── 03-genstage.livemd ├── 04-flow.livemd ├── 05-broadway.livemd ├── LICENSE.txt ├── README.md └── images ├── 00-introduction_avatar_and_name.png.png ├── 03-genstage_architetcture.png ├── 03-genstage_demand.png ├── 04-flow_partitioning.png └── 05-broadway_architecture.png /01-processes.livemd: -------------------------------------------------------------------------------- 1 | # Processes Basics 2 | 3 | ## Introduction 4 | 5 | All Elixir code runs inside *processes*. In this livebook, we'll take a look at the basics of processes, how to spawn them, and how to have them communicate with each other. 6 | 7 | ## Spawning 8 | 9 | The most basic operation to create a new process is to **spawn** it. `spawn/1` takes a function, creates a new process, and executes the function in the new process. 10 | 11 | The process exits when the function returns or when there's an explicit exit (such as an uncaught `raise`). 12 | 13 | Each process is identified by a **PID** (Process IDentifier). 14 | 15 | ```elixir 16 | pid = 17 | spawn(fn -> 18 | Process.sleep(2000) 19 | IO.puts("Finished the expensive computation.") 20 | end) 21 | 22 | IO.puts("Just spawned a process with PID #{inspect(pid)}. Let's wait a bit.") 23 | ``` 24 | 25 | ## Sending and Receiving Messages 26 | 27 | Processes communicate via messages. A process can **send** or **receive** messages. 28 | 29 | ### Sending 30 | 31 | To send messages, you use `send(pid, message)`. The caller process sends `message` to the process identified by `pid`. Sending is **asynchronous**, that is, `send/2` returns as soon as the message is sent. There's no guarantee that when `send/2` returns the destination process has received the message. 32 | 33 | ### Receiving 34 | 35 | To receive messages, a process calls `receive`. `receive` is blocking: it will halt execution until a message that matches one of the listed patterns arrives to the caller process. 36 | 37 | ```elixir 38 | pid = 39 | spawn(fn -> 40 | receive do 41 | message -> 42 | IO.puts("#{inspect(self())} received a message: #{inspect(message)}") 43 | end 44 | end) 45 | 46 | Process.alive?(pid) 47 | ``` 48 | 49 | ```elixir 50 | send(pid, :hello_world) 51 | ``` 52 | 53 | As you can see, the process receives the message and prints it to standard output. Since the function that we passed to `spawn/1` returns after printing the message, the process itself finishes its execution. You can see this by checking whether `pid` represents a process that is alive. 54 | 55 | ```elixir 56 | Process.alive?(pid) 57 | ``` 58 | 59 | ### Pattern Matching on Receive 60 | 61 | `receive` supports multiple `->` clauses. When a message arrives, the first clause that matches it gets executed. This is analogous to `case`. 62 | 63 | ```elixir 64 | pid = 65 | spawn(fn -> 66 | receive do 67 | :ping -> 68 | IO.puts("pong!") 69 | 70 | message -> 71 | IO.puts("#{inspect(self())} received a message: #{inspect(message)}") 72 | end 73 | end) 74 | 75 | send(pid, :ping) 76 | ``` 77 | 78 | ### The Process Mailbox 79 | 80 | Each process has a **mailbox** where all messages it receives end up. It's conceptually similar to a queue. 81 | 82 | When a process receives a message, this message ends up in the process mailbox. The next time there is a `receive` call in that process, this is the algorithm that gets executed to determine what to do with that message. 
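Here's a minimal sketch you can run in a cell to see this selective behavior in action (the atoms `:ping` and `:other` are just for illustration):

```elixir
send(self(), :other)
send(self(), :ping)

# The mailbox now holds :other followed by :ping. This receive skips
# :other, since no clause matches it, and executes the :ping clause.
receive do
  :ping -> IO.puts("Matched :ping, even though :other arrived first")
end

# :other is still sitting in the mailbox, in its original position.
receive do
  message -> IO.puts("Still in the mailbox: #{inspect(message)}")
end
```

The diagram below walks through the same algorithm step by step.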
83 | 
84 | 
85 | 
86 | 
87 | 
88 | ```mermaid
89 | sequenceDiagram;
90 |   participant other_pid as Other process
91 |   participant pid as Process
92 |   participant mailbox as Mailbox
93 | 
94 |   other_pid->>pid: Send message
95 |   pid->>mailbox: Put message at the end
96 | 
97 |   pid->>pid: Wait for the next call to receive
98 | 
99 |   loop Every message in mailbox
100 |     alt Matches one clause
101 |       pid->>pid: Execute clause
102 |     else
103 |       pid->>mailbox: Store in the same position
104 |     end
105 |   end
106 | ```
107 | 
108 | 
109 | 
110 | Two important things:
111 | 
112 | * If the loop reaches the end of the mailbox and no messages match any `receive` clauses, `receive` blocks until a new message arrives and the algorithm gets executed again.
113 | * If `receive` is called when there are already messages in the mailbox, the algorithm executes right away (it doesn't wait for a new message).
114 | 
115 | 
116 | 
117 | ### Receive Timeout
118 | 
119 | By default, `receive` blocks indefinitely. However, it supports an `after` clause. This clause lets you specify a timeout after which the corresponding code executes and `receive` returns. Be careful using `receive` without `after`, since it could cause the process to hang indefinitely if there's a bug in your list of patterns.
120 | 
121 | ```elixir
122 | spawn(fn ->
123 |   receive do
124 |     message ->
125 |       IO.puts("#{inspect(self())} received a message: #{inspect(message)}")
126 |   after
127 |     5_000 ->
128 |       IO.puts("Timeout, no messages")
129 |   end
130 | end)
131 | ```
132 | 
133 | ## Parallel Map
134 | 
135 | Let's use what we know so far to implement parallel mapping.
136 | 
137 | ```elixir
138 | defmodule Parallel do
139 |   def map(enum, fun) do
140 |     # Let's take note of the "parent" PID, since if we call self()
141 |     # in the function we pass to spawn/1 then we get the PID of the
142 |     # spawned process.
143 |     parent = self()
144 | 
145 |     pids =
146 |       Enum.map(enum, fn elem ->
147 |         spawn(fn ->
148 |           # Compute the mapped element.
149 |           mapped_elem = fun.(elem)
150 | 
151 |           # Send the result back to the "parent".
152 |           send(parent, {self(), mapped_elem})
153 |         end)
154 |       end)
155 | 
156 |     Enum.map(pids, fn pid ->
157 |       receive do
158 |         {^pid, mapped_elem} -> mapped_elem
159 |       end
160 |     end)
161 |   end
162 | end
163 | ```
164 | 
165 | This code is **full** of bugs. 🙈 It doesn't use `after` for timeouts, it doesn't do any error handling, and more. However, it illustrates the idea! Let's give it a spin.
166 | 
167 | To see the parallelism in action, let's map over a list of integers representing *timeouts* in milliseconds. We'll map the `Process.sleep/1` function over those. First, let's use `Enum.map/2` to see what happens when we map sequentially, one item at a time:
168 | 
169 | ```elixir
170 | {elapsed, _result} = :timer.tc(fn -> Enum.map([1000, 1000, 1000], &Process.sleep/1) end)
171 | IO.puts("Elapsed time: #{elapsed / 1_000_000} s")
172 | ```
173 | 
174 | It takes roughly 3s to execute the code, which makes perfect sense. If we use our `Parallel.map/2` function, it should hopefully take around 1s!
175 | 176 | ```elixir 177 | {elapsed, _result} = :timer.tc(fn -> Parallel.map([1000, 1000, 1000], &Process.sleep/1) end) 178 | IO.puts("Elapsed time: #{elapsed / 1_000_000} s") 179 | ``` 180 | -------------------------------------------------------------------------------- /02-tasks.livemd: -------------------------------------------------------------------------------- 1 | # Tasks 2 | 3 | ## Processes On Steroids 4 | 5 | Tasks are processes that handle a bunch of the nitty-gritty details for you. They're the natural evolution from processes for many use cases. 6 | 7 | Let's start by exploring the `async` + `await` use case. 8 | 9 | ### `async` and `await` 10 | 11 | [`Task.async/1`](https://hexdocs.pm/elixir/Task.html#async/1) spawns a new task, similarly to `spawn/1`. Instead of sending messages back and forth, you can use [`Task.await/1`](https://hexdocs.pm/elixir/Task.html#await/2) to collect the value *returned* by the spawned task. 12 | 13 | ```elixir 14 | task = 15 | Task.async(fn -> 16 | Process.sleep(1000) 17 | IO.puts("Expensive computation is done!") 18 | Enum.random(1..100) 19 | end) 20 | 21 | IO.puts("Running task...") 22 | Task.await(task) 23 | ``` 24 | 25 | ### Yielding 26 | 27 | [`Task.yield/2`](https://hexdocs.pm/elixir/Task.html#yield/2) is similar to `Task.await/1`, but it returns `nil` if the task doesn't return a value within the specified timeout (`Task.await/1` exits instead). 28 | 29 | ```elixir 30 | task = 31 | Task.async(fn -> 32 | Process.sleep(1000) 33 | IO.puts("Expensive computation is done!") 34 | Enum.random(1..100) 35 | end) 36 | 37 | Task.yield(task, _timeout = 5000) 38 | ``` 39 | 40 | As you can see, if the task returns in time, `Task.yield/2` returns `{:ok, result}`. Let's see what happens if the task *doesn't* return in time, instead: 41 | 42 | ```elixir 43 | task = 44 | Task.async(fn -> 45 | Process.sleep(1000) 46 | IO.puts("Expensive computation is done!") 47 | Enum.random(1..100) 48 | end) 49 | 50 | Task.yield(task, _timeout = 500) 51 | ``` 52 | 53 | `Task.yield/2` returns `nil`, but after a while the task seems to still print something. That's because `Task.yield/2` "peeks" into whether the task finished, but doesn't shut the task down in case it hasn't finished. To stop the task, we can use [`Task.shutdown/1`](https://hexdocs.pm/elixir/Task.html#shutdown/2). 54 | 55 | `Task.yield/2` and `Task.shutdown/1` are often combined to implement the use case when you need a computation to be bound by time. It goes something like this: 56 | 57 | 1. Start the computation 58 | 2. Do some other work on the side 59 | 3. When you're ready, check the result of the task with `Task.yield/2`. 60 | 4. If the task does not complete within the timeout, shut down the task. 61 | 62 | `Task.shutdown/1` also takes care of race conditions, which can happen in case the task completes right as we are telling it to shut down. 63 | 64 | ```elixir 65 | task = 66 | Task.async(fn -> 67 | Process.sleep(1000) 68 | IO.puts("Expensive computation is done!") 69 | Enum.random(1..100) 70 | end) 71 | 72 | IO.puts("Running task...") 73 | Task.yield(task, 500) || Task.shutdown(task) 74 | ``` 75 | 76 | ## Parallel Map — Take #2 with async_stream 77 | 78 | `Task` provides *the most underrated function* (IMO) in all of Elixir's standard library: [`Task.async_stream/3`](https://hexdocs.pm/elixir/Task.html#async_stream/3). It takes an enumerable and a function, and returns a stream that maps the function over the enumerable **in parallel**. 
79 | 
80 | ```elixir
81 | stream =
82 |   Task.async_stream([200, 100, 400], fn timeout ->
83 |     Process.sleep(timeout)
84 |     IO.puts("Slept for #{timeout} ms")
85 |     timeout * 2
86 |   end)
87 | 
88 | Enum.to_list(stream)
89 | ```
90 | 
91 | Seems like nothing special, right? Well, it is!
92 | 
93 | `async_stream`'s coolest feature is that it uses a **bounded number of processes**. You can control this number through the `:max_concurrency` option, and it defaults to the number of schedulers online (usually the number of cores on your machine). This feature is huge: our previous naive parallel-map implementation would spawn one process per element in the enumerable, regardless of the number of elements. Billions of processes? Not good. `async_stream` will happily churn through infinite streams, using `:max_concurrency` processes at a time.
94 | 
95 | `async_stream` is also **flexible**. It accepts any enumerable as its input (including infinite streams) and itself returns an enumerable.
96 | 
97 | ## When to Use Tasks
98 | 
99 | * If you want to perform a few requests to different services and then collect the results
100 | * If you need a simple parallel mapping approach
101 | * If you need to perform a computation in a limited timeframe and want to stop it if it times out
102 | * When you want to spawn a computation in the background ([`Task.start/1`](https://hexdocs.pm/elixir/Task.html#start/1)), for something like side effects
103 | 
104 | ## Practical Tips
105 | 
106 | ### Tip #1 — `ordered: false` with `async_stream`
107 | 
108 | If you're using `Task.async_stream/3` and don't care about the ordering of results, use the `ordered: false` option.
109 | 
110 | This is great for when you're using `async_stream/3` to parallelize side-effects over a collection, for example. It's also useful when you're going to do something with the mapped collection that doesn't require ordering, like aggregating into a map.
111 | 
112 | ```elixir
113 | print_after_timeout = fn timeout ->
114 |   Process.sleep(timeout)
115 |   IO.puts("Slept for #{timeout} ms")
116 |   timeout
117 | end
118 | 
119 | [200, 100, 400]
120 | |> Task.async_stream(print_after_timeout, ordered: false)
121 | |> Enum.to_list()
122 | ```
123 | 
124 | As you can see, the results are returned in the order in which they finish computing, and not in the order of the original list.
125 | 
126 | ### Tip #2 — Follow the Documentation for `Task.yield/2`
127 | 
128 | The [documentation for `Task.yield/2`](https://hexdocs.pm/elixir/Task.html#yield/2) has a great code snippet to use when you need to perform a time-capped computation.
129 | 
130 | ```elixir
131 | task =
132 |   Task.async(fn ->
133 |     Process.sleep(Enum.random(499..501))
134 |     IO.puts("Done!")
135 |   end)
136 | 
137 | case Task.yield(task, 500) || Task.shutdown(task) do
138 |   {:ok, result} -> result
139 |   nil -> :timeout
140 | end
141 | ```
142 | 
143 | ### Tip #3 — `async_stream` Goes a Long Way
144 | 
145 | Before talking about GenStage, Broadway, and Flow, I want to stress the importance of `async_stream`. I've seen many solutions built on GenStage or Flow that were essentially overengineered `async_stream`s. `async_stream` has some limitations, but combining the bounded number of processes, the optional ordering, and the fact that it processes lazy streams makes it a great choice in many situations.
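To drive Tip #3 home, here's a small sketch (the numbers are arbitrary) of `async_stream` lazily working through an *infinite* stream while never running more than four task processes at a time:

```elixir
# Square an infinite stream of integers in parallel. :max_concurrency
# caps the number of concurrent tasks, and Enum.take/2 halts the
# pipeline after ten results.
Stream.iterate(1, &(&1 + 1))
|> Task.async_stream(fn n -> n * n end, max_concurrency: 4)
|> Stream.map(fn {:ok, square} -> square end)
|> Enum.take(10)
```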
146 | 
-------------------------------------------------------------------------------- /03-genstage.livemd: --------------------------------------------------------------------------------
1 | # GenStage
2 | 
3 | ```elixir
4 | Mix.install([:gen_stage])
5 | 
6 | {:ok, _} = Application.ensure_all_started(:crypto)
7 | ```
8 | 
9 | ## What Is GenStage
10 | 
11 | GenStage is a library maintained by the Elixir core team. It provides an abstraction over asynchronous computation that happens through multiple *stages*.
12 | 
13 | The idea is this: you have something generating **events** (whatever an event is). You want to feed those events through a pipeline of *stages* with varying topologies. That's what GenStage gives you.
14 | 
15 | A GenStage stage is an OTP behaviour, similar to `GenServer` or `:gen_statem`. Below is a small example of a minimal producer stage.
16 | 
17 | ```elixir
18 | defmodule SampleStage do
19 |   use GenStage
20 | 
21 |   @impl true
22 |   def init(_), do: {:producer, :nostate}
23 | 
24 |   @impl true
25 |   def handle_demand(_demand, :nostate) do
26 |     {:noreply, _events = [], :nostate}
27 |   end
28 | end
29 | ```
30 | 
31 | ## Stage Types
32 | 
33 | GenStage provides three stage types:
34 | 
35 | * *Producer* stages
36 | * *Producer-consumer* stages
37 | * *Consumer* stages
38 | 
39 | A GenStage pipeline must have at least **one producer** and **one consumer**, plus any number of producer-consumers in between. A pipeline could look something like this:
40 | 
41 | ![](images/03-genstage_architetcture.png)
42 | 
43 | GenStage stages signal their type by returning it from the `init/1` callback.
44 | 
45 | ## Demand
46 | 
47 | A foundational notion in GenStage is **demand**. We say that GenStage pipelines are *demand-driven*. Pipelines don't flow from producers to consumers directly: the flow starts the opposite way. The consuming end of the pipeline "sends demand" upstream, declaring itself ready to consume `n` events.
48 | 
49 | The demand flows upstream through any producer-consumers and eventually up to the producer. The producer should always generate events according to the demand that reaches it. Only then do events flow downstream through the stages.
50 | 
51 | ![](images/03-genstage_demand.png)
52 | 
53 | The point of this demand-driven flow is to provide **backpressure**. Events will flow through the pipeline only as fast as stages can consume them.
54 | 
55 | ## A Simple Pipeline
56 | 
57 | ### Producer
58 | 
59 | Let's start with producers. A producer's job is to produce events according to the downstream **demand**. It has to implement the `handle_demand/2` callback. GenStage invokes this callback whenever there is downstream demand, and passes the demand as an integer to it.
60 | 
61 | As an example, let's build a producer that just produces random binaries, indefinitely.
62 | 
63 | ```elixir
64 | defmodule RandomBinaryProducer do
65 |   use GenStage
66 | 
67 |   def start_link(binary_size) do
68 |     GenStage.start_link(__MODULE__, binary_size)
69 |   end
70 | 
71 |   @impl true
72 |   def init(binary_size) do
73 |     {:producer, binary_size}
74 |   end
75 | 
76 |   @impl true
77 |   def handle_demand(demand, binary_size = _state) do
78 |     # Processing is expensive! Let's simulate that by sleeping for a bit.
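    # (`demand` is the number of events that downstream stages have asked
    # for; handle_demand/2 should emit at most that many events.)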
79 | Process.sleep(Enum.random(1000..5000)) 80 | 81 | events = 82 | Stream.repeatedly(fn -> :crypto.strong_rand_bytes(binary_size) end) 83 | |> Enum.take(demand) 84 | 85 | {:noreply, events, binary_size} 86 | end 87 | end 88 | ``` 89 | 90 | ### Consumer 91 | 92 | Now, let's add the dumbest consumer: it'll consume these random binaries and print them to the standard output. So much for interesting examples, right?! 93 | 94 | Consumers implement the `handle_events/3` callback, which is invoked when there are new events coming from the producer. `handle_events/3` can return events to pass downstream to the pipeline. For consumers, however, that list of events must always be `[]`. We'll see it in action in producer-consumers. 95 | 96 | ```elixir 97 | defmodule PrinterConsumer do 98 | use GenStage 99 | 100 | def start_link do 101 | GenStage.start_link(__MODULE__, :nostate) 102 | end 103 | 104 | @impl true 105 | def init(:nostate) do 106 | {:consumer, :nostate} 107 | end 108 | 109 | @impl true 110 | def handle_events(binaries, _from, state) do 111 | Enum.each(binaries, &IO.inspect(&1, label: "Binary consumed in #{inspect(self())}")) 112 | {:noreply, _events = [], state} 113 | end 114 | end 115 | ``` 116 | 117 | ### Wiring It Up 118 | 119 | Alright, we're ready to run our pipeline. As we mentioned, events flow downstream but the pipeline is "kicked off" by demand going upstream, from consumers all the way to producers. 120 | 121 | For this reason, we have to glue the pipeline together starting from the consumer. GenStage provides functions to subscribe a consumer to a producer, such as `GenStage.sync_subscribe/3`. 122 | 123 | ```elixir 124 | {:ok, producer} = RandomBinaryProducer.start_link(_size = 12) 125 | {:ok, consumer1} = PrinterConsumer.start_link() 126 | {:ok, consumer2} = PrinterConsumer.start_link() 127 | 128 | IO.puts("Ready, set, go!") 129 | 130 | {:ok, subscription_tag1} = 131 | GenStage.sync_subscribe(consumer1, 132 | to: producer, 133 | cancel: :temporary, 134 | min_demand: 5, 135 | max_demand: 10 136 | ) 137 | 138 | {:ok, subscription_tag2} = 139 | GenStage.sync_subscribe(consumer2, 140 | to: producer, 141 | cancel: :temporary, 142 | min_demand: 5, 143 | max_demand: 10 144 | ) 145 | 146 | # After 10s, we shut down the pipeline to avoid it printing forever. 147 | Process.sleep(10_000) 148 | GenStage.cancel({producer, subscription_tag1}, :shutdown) 149 | GenStage.cancel({producer, subscription_tag2}, :shutdown) 150 | ``` 151 | 152 | ### Producer-consumer 153 | 154 | Let's add a producer-consumer. It's going to add the MD5 hash of each event it consumes, and emit the event downstream as `{original_event, md5_hash}`. 155 | 156 | ```elixir 157 | defmodule Hasher do 158 | use GenStage 159 | 160 | def start_link do 161 | GenStage.start_link(__MODULE__, :nostate) 162 | end 163 | 164 | @impl true 165 | def init(:nostate) do 166 | {:producer_consumer, :nostate} 167 | end 168 | 169 | @impl true 170 | def handle_events(events, _from, :nostate) do 171 | events = 172 | for event <- events do 173 | {event, Base.encode64(:erlang.md5(event))} 174 | end 175 | 176 | # Here, "events" is not empty. 
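    # Returning a non-empty list from a producer-consumer is exactly what
    # sends events downstream to the consumers subscribed to this stage.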
177 |     {:noreply, events, :nostate}
178 |   end
179 | end
180 | ```
181 | 
182 | ```elixir
183 | {:ok, producer} = RandomBinaryProducer.start_link(_size = 12)
184 | {:ok, producer_consumer} = Hasher.start_link()
185 | {:ok, consumer} = PrinterConsumer.start_link()
186 | 
187 | IO.puts("Ready, set, go!")
188 | 
189 | {:ok, first_subscription_tag} =
190 |   GenStage.sync_subscribe(consumer,
191 |     to: producer_consumer,
192 |     cancel: :temporary,
193 |     min_demand: 2,
194 |     max_demand: 5
195 |   )
196 | 
197 | {:ok, second_subscription_tag} =
198 |   GenStage.sync_subscribe(producer_consumer,
199 |     to: producer,
200 |     cancel: :temporary,
201 |     min_demand: 2,
202 |     max_demand: 5
203 |   )
204 | 
205 | Process.sleep(10_000)
206 | GenStage.cancel({producer, second_subscription_tag}, :shutdown)
207 | GenStage.cancel({producer_consumer, first_subscription_tag}, :shutdown)
208 | ```
209 | 
210 | ## Dispatching
211 | 
212 | Event dispatching is the missing piece in our understanding of GenStage. GenStage producers and producer-consumers dispatch events downstream based on **dispatchers**, which can implement different dispatching strategies.
213 | 
214 | The default dispatcher is the "demand dispatcher" (`GenStage.DemandDispatcher`). It hands events to the downstream consumer with the *highest demand*. This is intuitive: if a consumer has high demand, it means it already processed events and has "bandwidth" to process more.
215 | 
216 | You can write your own dispatcher by writing a module that implements the [`GenStage.Dispatcher` behaviour](https://hexdocs.pm/gen_stage/GenStage.Dispatcher.html). GenStage also ships with two other useful dispatchers.
217 | 
218 | ### [`GenStage.BroadcastDispatcher`](https://hexdocs.pm/gen_stage/GenStage.BroadcastDispatcher.html)
219 | 
220 | This dispatcher dispatches *copies* of events to all subscribed downstream consumers. It can be useful, for example, when the same events need to be consumed by consumers that perform different kinds of work.
221 | 
222 | In the example below, you'll notice how the same random binary is printed *twice*, once for each consumer.
223 | 224 | ```elixir 225 | defmodule RandomBinaryBroadcaster do 226 | use GenStage 227 | 228 | def start_link(binary_size) do 229 | GenStage.start_link(__MODULE__, binary_size) 230 | end 231 | 232 | @impl true 233 | def init(binary_size) do 234 | {:producer, binary_size, dispatcher: GenStage.BroadcastDispatcher} 235 | end 236 | 237 | @impl true 238 | def handle_demand(demand, binary_size = _state) do 239 | Process.sleep(Enum.random(1000..5000)) 240 | 241 | events = 242 | Stream.repeatedly(fn -> :crypto.strong_rand_bytes(binary_size) end) 243 | |> Enum.take(demand) 244 | 245 | {:noreply, events, binary_size} 246 | end 247 | end 248 | 249 | {:ok, producer} = RandomBinaryBroadcaster.start_link(_size = 12) 250 | {:ok, consumer1} = PrinterConsumer.start_link() 251 | {:ok, consumer2} = PrinterConsumer.start_link() 252 | 253 | IO.puts("Ready, set, go!") 254 | 255 | {:ok, subscription_tag1} = 256 | GenStage.sync_subscribe(consumer1, 257 | to: producer, 258 | cancel: :temporary, 259 | min_demand: 5, 260 | max_demand: 10 261 | ) 262 | 263 | {:ok, subscription_tag2} = 264 | GenStage.sync_subscribe(consumer2, 265 | to: producer, 266 | cancel: :temporary, 267 | min_demand: 5, 268 | max_demand: 10 269 | ) 270 | 271 | Process.sleep(10_000) 272 | GenStage.cancel({producer, subscription_tag1}, :shutdown) 273 | GenStage.cancel({producer, subscription_tag2}, :shutdown) 274 | ``` 275 | 276 | ### [`GenStage.PartitionDispatcher`](https://hexdocs.pm/gen_stage/GenStage.PartitionDispatcher.html) 277 | 278 | This dispatcher dispatches events based on a **partitioning key** on the event itself. Consumers can subscribe to a producer that uses this dispatcher and specify the partition they want to consume. This is useful to dispatch events deterministically, which can help with keeping state in the consumer (think of caching, ownership, and so on). 279 | 280 | ```elixir 281 | defmodule PartitionProducer do 282 | use GenStage 283 | 284 | require Integer 285 | 286 | def start_link do 287 | GenStage.start_link(__MODULE__, :no_state) 288 | end 289 | 290 | @impl true 291 | def init(:no_state) do 292 | dispatcher = {GenStage.PartitionDispatcher, partitions: [:odd, :even], hash: &hash/1} 293 | {:producer, :no_state, dispatcher: dispatcher} 294 | end 295 | 296 | @impl true 297 | def handle_demand(demand, state) do 298 | Process.sleep(Enum.random(1000..5000)) 299 | {:noreply, Enum.take_random(1..1000, demand), state} 300 | end 301 | 302 | defp hash(event) when Integer.is_even(event), do: {event, :even} 303 | defp hash(event) when Integer.is_odd(event), do: {event, :odd} 304 | end 305 | 306 | {:ok, producer} = PartitionProducer.start_link() 307 | {:ok, consumer1} = PrinterConsumer.start_link() 308 | {:ok, consumer2} = PrinterConsumer.start_link() 309 | 310 | IO.puts("Ready, set, go!") 311 | 312 | {:ok, subscription_tag1} = 313 | GenStage.sync_subscribe(consumer1, 314 | to: producer, 315 | partition: :even, 316 | cancel: :temporary, 317 | min_demand: 5, 318 | max_demand: 10 319 | ) 320 | 321 | {:ok, subscription_tag2} = 322 | GenStage.sync_subscribe(consumer2, 323 | to: producer, 324 | partition: :odd, 325 | cancel: :temporary, 326 | min_demand: 5, 327 | max_demand: 10 328 | ) 329 | 330 | Process.sleep(10_000) 331 | GenStage.cancel({producer, subscription_tag1}, :shutdown) 332 | GenStage.cancel({producer, subscription_tag2}, :shutdown) 333 | ``` 334 | 335 | As you can see, all odd integers are printed by the same consumer, and all the even ones are printed by the same (other) consumer. 
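As a final note on wiring: you don't have to subscribe stages manually with `GenStage.sync_subscribe/3`. A stage can also subscribe at startup by returning the `:subscribe_to` option from `init/1`. Here's a minimal sketch (the module name is made up, and we assume the producer's PID gets passed in):

```elixir
defmodule AutoSubscribedPrinter do
  use GenStage

  def start_link(producer) do
    GenStage.start_link(__MODULE__, producer)
  end

  @impl true
  def init(producer) do
    # Subscribe to the producer as soon as this stage starts.
    {:consumer, :nostate, subscribe_to: [{producer, min_demand: 5, max_demand: 10}]}
  end

  @impl true
  def handle_events(events, _from, state) do
    Enum.each(events, &IO.inspect(&1, label: "Consumed in #{inspect(self())}"))
    {:noreply, [], state}
  end
end
```

This comes in handy in supervision trees, where you usually don't want to wire subscriptions up by hand.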
336 | 
-------------------------------------------------------------------------------- /04-flow.livemd: --------------------------------------------------------------------------------
1 | # Flow
2 | 
3 | ```elixir
4 | Mix.install([:req, :flow])
5 | ```
6 | 
7 | ## What Is Flow?
8 | 
9 | Flow is an abstraction built on top of GenStage. Its job is *processing collections asynchronously* through a series of stages.
10 | 
11 | > But that's kind of what we did with `Task.async_stream/3`! — you, now
12 | 
13 | Indeed, the two share a few similarities:
14 | 
15 | * They both work on *bounded* or *unbounded* collections
16 | * They both process collection elements in parallel
17 | 
18 | However, Flow is more powerful, versatile, and customizable than `Task.async_stream/3`. Because of this, it's also more complex and has a steeper learning curve, so make sure it solves your problem better than `async_stream` before using it!
19 | 
20 | Being built on GenStage, Flow is the next layer of abstraction when thinking about parallel data processing and map/reduce algorithms.
21 | 
22 | ## Counting Words
23 | 
24 | Let's implement the classic map/reduce toy example: counting the occurrences of every word in a text. We'll fetch the words from the [baconipsum.com](https://baconipsum.com) API. 🥓
25 | 
26 | #### The `Enum` Version
27 | 
28 | Let's start with how we'd do this with `Enum`.
29 | 
30 | ```elixir
31 | start_time = System.monotonic_time(:millisecond)
32 | 
33 | result =
34 |   1..10
35 |   |> Enum.map(fn _ ->
36 |     Req.get!("https://baconipsum.com/api/?type=meat-and-filler&paras=50&format=text").body
37 |   end)
38 |   |> Enum.flat_map(fn paragraph ->
39 |     String.split(paragraph, "\n", trim: true)
40 |   end)
41 |   |> Enum.flat_map(&String.split(&1, " ", trim: true))
42 |   |> Enum.reduce(%{}, fn word, acc ->
43 |     Map.update(acc, word, 1, &(&1 + 1))
44 |   end)
45 |   |> Enum.sort_by(fn {_word, count} -> count end, :desc)
46 | 
47 | end_time = System.monotonic_time(:millisecond)
48 | IO.puts("Took us #{end_time - start_time}ms")
49 | 
50 | result
51 | ```
52 | 
53 | Cool. Cool cool cool. This solution works. However, it is **sequential**. It also loads the whole text into memory for every request and then it keeps eagerly evaluating expressions. Each of those steps can be expensive: first we split the whole text into a full list of lines, then each line into a full list of words, and so on. We fully compute each step before moving to the next.
54 | 
55 | I know what you're thinking: this is what **streams** are for. You're kind of right.
56 | 
57 | #### The `Stream` Version
58 | 
59 | It's easy enough to change the `Enum`-based version to use streams instead.
60 | 
61 | ```elixir
62 | start_time = System.monotonic_time(:millisecond)
63 | 
64 | result =
65 |   1..10
66 |   |> Stream.map(fn _ ->
67 |     Req.get!("https://baconipsum.com/api/?type=meat-and-filler&paras=50&format=text").body
68 |   end)
69 |   |> Stream.flat_map(fn paragraph ->
70 |     String.splitter(paragraph, "\n", trim: true)
71 |   end)
72 |   |> Stream.flat_map(&String.splitter(&1, " ", trim: true))
73 |   |> Enum.reduce(%{}, fn word, acc ->
74 |     Map.update(acc, word, 1, &(&1 + 1))
75 |   end)
76 |   |> Enum.sort_by(fn {_word, count} -> count end, :desc)
77 | 
78 | end_time = System.monotonic_time(:millisecond)
79 | IO.puts("Took us #{end_time - start_time}ms")
80 | 
81 | result
82 | ```
83 | 
84 | The differences are:
85 | 
86 | * We use [`String.splitter/3`](https://hexdocs.pm/elixir/String.html#splitter/3) instead of `String.split/3`, since it returns a lazy stream
87 | * We change `Enum.flat_map/2` into `Stream.flat_map/2`
88 | 
89 | Just like that, our solution is now *lazy* and won't compute every step and load it fully into memory. That's fantastic, but our execution time is essentially the same! Splitting strings and the like can take memory, but it's blazing-fast compared to performing HTTP requests.
90 | 
91 | ### Enter Flow
92 | 
93 | Flow tries its hardest to provide an API that's almost identical to `Enum` and `Stream`. This is how we'd rewrite our example.
94 | 
95 | ```elixir
96 | start_time = System.monotonic_time(:millisecond)
97 | 
98 | result =
99 |   1..10
100 |   |> Flow.from_enumerable(max_demand: 1, min_demand: 0)
101 |   |> Flow.map(fn _ ->
102 |     Req.get!("https://baconipsum.com/api/?type=meat-and-filler&paras=50&format=text").body
103 |   end)
104 |   |> Flow.flat_map(fn paragraph ->
105 |     String.splitter(paragraph, "\n", trim: true)
106 |   end)
107 |   |> Flow.flat_map(&String.splitter(&1, " ", trim: true))
108 |   |> Flow.partition()
109 |   |> Flow.reduce(fn -> %{} end, fn word, acc ->
110 |     Map.update(acc, word, 1, &(&1 + 1))
111 |   end)
112 |   |> Enum.sort_by(fn {_word, count} -> count end, :desc)
113 | 
114 | end_time = System.monotonic_time(:millisecond)
115 | IO.puts("Took us #{end_time - start_time}ms")
116 | 
117 | result
118 | ```
119 | 
120 | ## Partitioning
121 | 
122 | The example above works thanks to a little function we snuck in there: [`Flow.partition/1`](https://hexdocs.pm/flow/Flow.html#partition/1).
123 | 
124 | Flow executes computations in different processes, including the `Flow.reduce/3` step. This means that if we don't have a way to make sure words are deterministically processed in specific processes, we're not going to be able to perform the reduce step in parallel. This is a well-known challenge in map/reduce: if you want the reduce step to happen in parallel, you need to divide the output of the map step into subsets that you can reduce on their own.
125 | 
126 | `Flow.partition/1` does exactly that: it introduces a new set of *stages* and makes sure the same word is always mapped to the same stage. It does this through a hash function.
127 | 
128 | ![](images/04-flow_partitioning.png)
129 | 
130 | ## Windows and Triggers
131 | 
132 | We won't go into too much detail on windows and triggers, but they're a powerful feature of Flow.
133 | 
134 | The reason for having these features is *unbounded collections*. When working with unbounded collections, we can't simply `Flow.reduce/3` like we did in the word-counting example. We'd never arrive at a final reduced result if our collection has infinitely many elements.
135 | 
136 | Windows and triggers address this problem.
Windows let us split the collection into chunks (based on time or on element count, for example), and triggers let us tell the pipeline when to "flush" the results we have computed so far. Then, Flow can run its operations on *windows* of data instead of on the unbounded collection.
137 | 
138 | Once we specify a window strategy, we can use triggers to *materialize* the data in the window. The parallel computation now happens on the window itself, not on all the data like in the examples above.
139 | 
140 | All events belong to the **global window** by default, which is returned by [`Flow.Window.global/0`](https://hexdocs.pm/flow/Flow.Window.html#global/0).
141 | 
142 | ```elixir
143 | Flow.Window.global()
144 | ```
145 | 
146 | ### Triggers
147 | 
148 | Let's see a trigger in action. We'll use the [`Flow.Window.trigger_every/2` trigger](https://hexdocs.pm/flow/Flow.Window.html#trigger_every/2), which triggers every `n` elements in the window.
149 | 
150 | ```elixir
151 | window = Flow.Window.global() |> Flow.Window.trigger_every(10)
152 | 
153 | Flow.from_enumerable(1..100)
154 | |> Flow.partition(window: window, stages: 1)
155 | |> Flow.reduce(fn -> 0 end, &(&1 + &2))
156 | |> Flow.emit(:state)
157 | |> Enum.to_list()
158 | ```
159 | 
160 | As you can see in the output, each element in the returned list is the accumulated sum of all elements seen so far, emitted once every ten elements.
161 | 
162 | #### What to Emit?
163 | 
164 | You might notice a little call to [`Flow.emit/2`](https://hexdocs.pm/flow/Flow.html#emit/2) in the pipeline in our previous example. You can use `Flow.emit/2` to tell Flow which value you want to emit to the next stage of the computation. Often, you'd use `Flow.emit(flow, :events)`. In our case, what we want to emit to the next stage is the *state* calculated by `Flow.reduce/3`, so we use `Flow.emit(:state)`.
165 | 
166 | ### Other Window Types
167 | 
168 | Flow supports other window types too. The [`Flow.Window.count/1` window](https://hexdocs.pm/flow/Flow.Window.html#count/1) groups the collection into windows of `n` elements each. We can see how this plays out with the `trigger_every` trigger we used above. In the example below, we consider a window of 15 elements at a time, but emit the trigger every 6 elements.
169 | 
170 | ```elixir
171 | window = Flow.Window.count(15) |> Flow.Window.trigger_every(6)
172 | 
173 | Stream.repeatedly(fn -> 5 end)
174 | |> Flow.from_enumerable()
175 | |> Flow.partition(window: window, stages: 1)
176 | |> Flow.reduce(fn -> 0 end, &(&1 + &2))
177 | |> Flow.emit(:state)
178 | |> Enum.take(10)
179 | ```
180 | 
181 | ## Stages
182 | 
183 | As mentioned at the beginning of this livebook, Flow is built on top of GenStage. In many cases, you won't even notice it. However, it's important because it means all the GenStage features we know and love power Flow as well. For example, Flow pipelines have built-in backpressure through GenStage's demand. You can also use GenStage producers or producer-consumers to feed a Flow pipeline through [`Flow.from_stages/2`](https://hexdocs.pm/flow/Flow.html#from_stages/2).
184 | 
185 | To control parallelism and batching in Flow, you'll see GenStage concepts pop up. For example, the `Flow.partition/2` function we used above uses a `:stages` option to control how many processes to use in the partitioning. You can also use `:min_demand` and `:max_demand` to control batching.
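As a quick sketch of those knobs (the option values here are arbitrary), this is how you'd spread a reduce over four partition stages with small batches flowing between them:

```elixir
# Count integers by their remainder modulo 3, fanning the reduce step
# out over 4 partition stages. Small min/max demand means small batches.
1..1000
|> Flow.from_enumerable(max_demand: 50)
|> Flow.partition(stages: 4, min_demand: 1, max_demand: 10)
|> Flow.reduce(fn -> %{} end, fn n, acc -> Map.update(acc, rem(n, 3), 1, &(&1 + 1)) end)
|> Enum.to_list()
```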
186 | 
-------------------------------------------------------------------------------- /05-broadway.livemd: --------------------------------------------------------------------------------
1 | # Broadway
2 | 
3 | ```elixir
4 | Mix.install([
5 |   :amqp,
6 |   :broadway,
7 |   :broadway_rabbitmq,
8 |   :jason,
9 |   :kino
10 | ])
11 | 
12 | case AMQP.Connection.open() do
13 |   {:ok, conn} ->
14 |     :ok = AMQP.Connection.close(conn)
15 | 
16 |   {:error, reason} ->
17 |     raise """
18 |     there doesn't seem to be a RabbitMQ instance accepting connections at
19 |     localhost:5672: #{inspect(reason)}
20 |     """
21 | end
22 | 
23 | defmodule VisualHelpers do
24 |   # Don't do this. It's Broadway private API and it could change any time!
25 |   def broadway_sup_tree(pipeline) do
26 |     sup = :sys.get_state(pipeline).supervisor_pid
27 |     # Kino.Process.sup_tree/2 doesn't render non-alive processes correctly, so let's
28 |     # remove them.
29 |     Supervisor.delete_child(sup, Broadway.Topology.RateLimiter)
30 |     Kino.Process.sup_tree(sup, direction: :left_right)
31 |   end
32 | end
33 | ```
34 | 
35 | ## Before We Start
36 | 
37 | In the setup block above, we try to connect to RabbitMQ to check whether you have it running. We use RabbitMQ in this livebook to feed messages to our Broadway examples.
38 | 
39 | If you don't have RabbitMQ installed, the easiest way to quickly spin up an instance is through Docker:
40 | 
41 | ```
42 | $ docker run --rm -it -p 5672:5672 rabbitmq
43 | ```
44 | 
45 | ## The Big Stage
46 | 
47 | Broadway is a tool to build **data ingestion** and **data processing** pipelines. It's a sibling of Flow that builds on top of GenStage. You can think of Flow as the solution to the problem "how do I use GenStage in a simple way to process collections?". Broadway is the answer to the question "how do I use GenStage in a simple way to ingest and process data from different sources?".
48 | 
49 | Compared to GenStage, Broadway is more **declarative**. It provides many abstractions that you could build yourself with GenStage, but it makes them easy to configure. Some of those are:
50 | 
51 | * batching
52 | * (automatic) acknowledgements
53 | * graceful shutdown and draining
54 | * rate limiting
55 | * instrumentation
56 | 
57 | That all sounds fantastic. Let's dive in.
58 | 
59 | ## Broadway's Architecture
60 | 
61 | Broadway's architecture is made of three main components:
62 | 
63 | * a set of **producers** — these produce the messages that feed the pipeline
64 | * a set of **processors** — these do per-message processing
65 | * an (optional) set of **batch processors** — these, if present, work on batches of *processed* messages
66 | 
67 | In its simplest form, a Broadway pipeline is made of producers and processors.
68 | 
69 | ![](images/05-broadway_architecture.png)
70 | 
71 | ### Producers
72 | 
73 | A Broadway producer is a **GenStage producer** that emits `Broadway.Message` structs as its events. One of the coolest things about Broadway is that its ecosystem comes with several existing producers. These produce messages out of the most common message queues, databases, and more. Some of the existing producers are:
74 | 
75 | * [`broadway_kafka`](https://github.com/dashbitco/broadway_kafka)
76 | * [`broadway_rabbitmq`](https://github.com/dashbitco/broadway_rabbitmq)
77 | * [`broadway_sqs`](https://github.com/dashbitco/broadway_sqs)
78 | * [`broadway_cloud_pub_sub`](https://github.com/dashbitco/broadway_cloud_pub_sub)
79 | 
80 | ## Pipeline Example
81 | 
82 | Let's start with an example of a simple Broadway pipeline.
We'll use the RabbitMQ producer. Our pipeline will consume messages from a RabbitMQ queue and, guess what, print them to standard output! 😀 Read [the RabbitMQ section above](#before-we-start) to make sure you have RabbitMQ running locally.
83 | 
84 | ```elixir
85 | defmodule RabbitMQPrinterPipeline do
86 |   use Broadway
87 | 
88 |   def start_link do
89 |     producer_opts = [
90 |       # The queue to consume from.
91 |       queue: "my_queue",
92 |       on_failure: :reject_and_requeue
93 |     ]
94 | 
95 |     Broadway.start_link(__MODULE__,
96 |       name: __MODULE__,
97 |       producer: [
98 |         module: {BroadwayRabbitMQ.Producer, producer_opts},
99 |         concurrency: 1
100 |       ],
101 |       processors: [
102 |         default: [concurrency: 2]
103 |       ]
104 |     )
105 |   end
106 | 
107 |   # The only callback you need to process messages.
108 |   @impl true
109 |   def handle_message(:default, message, _context) do
110 |     IO.inspect(message, label: "Message in processor #{inspect(self())}")
111 |     message
112 |   end
113 | end
114 | ```
115 | 
116 | Before running this, let's open a connection to RabbitMQ using the [AMQP library](https://github.com/pma/amqp). We also declare an exchange and a queue, and bind the queue to the exchange. Take a look at [CloudAMQP's RabbitMQ introduction](https://www.cloudamqp.com/blog/part1-rabbitmq-for-beginners-what-is-rabbitmq.html) if you want to learn more about RabbitMQ concepts.
117 | 
118 | ```elixir
119 | {:ok, conn} = AMQP.Connection.open()
120 | {:ok, channel} = AMQP.Channel.open(conn)
121 | :ok = AMQP.Exchange.declare(channel, "my_exchange", :topic)
122 | {:ok, _info} = AMQP.Queue.declare(channel, "my_queue")
123 | :ok = AMQP.Queue.bind(channel, "my_queue", "my_exchange", routing_key: "print.*")
124 | 
125 | # Publish messages
126 | :ok = AMQP.Basic.publish(channel, "my_exchange", "print.this", "hello world!")
127 | :ok = AMQP.Basic.publish(channel, "my_exchange", "print.that", "this is Broadway speaking!")
128 | 
129 | :ok = AMQP.Channel.close(channel)
130 | :ok = AMQP.Connection.close(conn)
131 | ```
132 | 
133 | Okay, we're ready to rumble. We already published a couple of messages in the code above. These are now sitting in the `my_queue` queue in RabbitMQ. When we connect a consumer to it, RabbitMQ will deliver the messages to that consumer. We'll start our Broadway pipeline below, which does exactly that: it subscribes a consumer to the queue.
134 | 
135 | ```elixir
136 | # Start the pipeline
137 | {:ok, pipeline} = RabbitMQPrinterPipeline.start_link()
138 | 
139 | # Shut it down after 3 seconds
140 | Process.sleep(3_000)
141 | Broadway.stop(pipeline)
142 | ```
143 | 
144 | As you can see, we return the `Broadway.Message` struct itself from the `handle_message/3` callback. This functional approach is fantastic, because it lets us operate purely on data. It also lets us tell Broadway when something went wrong, for example by returning a "failed" message using [`Broadway.Message.failed/2`](https://hexdocs.pm/broadway/Broadway.Message.html#failed/2).
145 | 
146 | When `handle_message/3` returns, Broadway **acknowledges** the message. What this means depends on the producer. In RabbitMQ's case, it means that Broadway does a RabbitMQ *ack* of the message, which in turn causes RabbitMQ itself to consider the message as consumed. For other producers, semantics might be slightly different.
147 | 
148 | If there's a crash in `handle_message/3`, Broadway will follow the `:on_failure` option we gave when starting the producer. In this case, it will reject the message and requeue it, using the RabbitMQ "nack" operation with `requeue: true`.
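For instance, here's a sketch of a drop-in replacement for the `handle_message/3` callback above that marks messages with invalid JSON as failed instead of crashing (it reuses the `Jason` dependency we installed in the setup block):

```elixir
@impl true
def handle_message(:default, message, _context) do
  case Jason.decode(message.data) do
    {:ok, decoded} ->
      Broadway.Message.put_data(message, decoded)

    {:error, _reason} ->
      # Broadway applies the :on_failure policy to failed messages.
      Broadway.Message.failed(message, "invalid JSON")
  end
end
```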
149 | 
150 | ## Batching
151 | 
152 | Broadway pipelines also support batching. You can declare different batchers when starting the Broadway pipeline. When you return a message from the `handle_message/3` callback, you can put a **batcher** term in it. Broadway uses this term to route the message to the right batcher. Let's see an example where we consume messages from RabbitMQ that contain a simple JSON payload like:
153 | 
154 | ```json
155 | {"value": 42}
156 | ```
157 | 
158 | We'll get the `"value"` key and batch it based on whether it's even or odd.
159 | 
160 | ```elixir
161 | defmodule RabbitMQBatchedPipeline do
162 |   use Broadway
163 | 
164 |   require Integer
165 | 
166 |   def start_link do
167 |     producer_opts = [
168 |       queue: "my_queue",
169 |       on_failure: :reject_and_requeue
170 |     ]
171 | 
172 |     Broadway.start_link(__MODULE__,
173 |       name: __MODULE__,
174 |       producer: [
175 |         module: {BroadwayRabbitMQ.Producer, producer_opts},
176 |         concurrency: 1
177 |       ],
178 |       processors: [
179 |         default: [concurrency: 2]
180 |       ],
181 |       batchers: [
182 |         odd: [concurrency: 1, batch_size: 3],
183 |         even: [concurrency: 1, batch_size: 3]
184 |       ]
185 |     )
186 |   end
187 | 
188 |   @impl true
189 |   def handle_message(:default, message, _context) do
190 |     %{"value" => value} = Jason.decode!(message.data)
191 |     message = Broadway.Message.put_data(message, value)
192 | 
193 |     if Integer.is_even(value) do
194 |       Broadway.Message.put_batcher(message, :even)
195 |     else
196 |       Broadway.Message.put_batcher(message, :odd)
197 |     end
198 |   end
199 | 
200 |   @impl true
201 |   def handle_batch(batcher, messages, _batch_info, _context) do
202 |     messages
203 |     |> Enum.map(& &1.data)
204 |     |> IO.inspect(label: "Batch of messages in #{inspect(batcher)} batcher")
205 | 
206 |     messages
207 |   end
208 | end
209 | ```
210 | 
211 | Let's run this code. We'll also visualize the supervision tree started by Broadway at the end, which is pretty cool and gives you an idea of what's going on under the hood.
212 | 
213 | ```elixir
214 | {:ok, conn} = AMQP.Connection.open()
215 | {:ok, channel} = AMQP.Channel.open(conn)
216 | :ok = AMQP.Exchange.declare(channel, "my_exchange", :topic)
217 | {:ok, _info} = AMQP.Queue.declare(channel, "my_queue")
218 | :ok = AMQP.Queue.bind(channel, "my_queue", "my_exchange", routing_key: "print.*")
219 | 
220 | # Publish messages
221 | for int <- 500..520 do
222 |   :ok = AMQP.Basic.publish(channel, "my_exchange", "print.this", ~s({"value": #{int}}))
223 | end
224 | 
225 | # Start the pipeline
226 | {:ok, pipeline} = RabbitMQBatchedPipeline.start_link()
227 | 
228 | # Shut it down after 5 seconds
229 | Task.start(fn ->
230 |   Process.sleep(5_000)
231 |   Broadway.stop(pipeline)
232 | end)
233 | 
234 | # Visualize the supervision tree started by Broadway.
235 | VisualHelpers.broadway_sup_tree(pipeline)
236 | ```
237 | 
238 | Batching works on **batch size** plus **batch timeout**. If a batch reaches the configured size, it gets handed to the batcher. If it doesn't reach the configured size within a configurable timeout, it gets handed to the batcher anyway when the timeout expires, as we can see in the example above with the last two batches.
-------------------------------------------------------------------------------- /LICENSE.txt: --------------------------------------------------------------------------------
1 | Copyright 2022 Andrea Leopardi
2 | 
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at 6 | 7 | http://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Asynchronous Processing in Elixir 🏃 2 | 3 | ![Cover image of some abstract lights][cover-image] 4 | 5 | This is a short interactive guide to asynchronous data processing in Elixir. It 6 | uses [Livebook][livebook] to show interactive Elixir snippets that you can run 7 | on your own machine. 8 | 9 | ## How Do I Use This? 10 | 11 | There are a handful of livebooks in this guide. Using the badges below, you can 12 | import them to your computer and run and explore them there. My recommendation 13 | is to use the Livebook desktop app that you can find [on its website][livebook]. 14 | 15 | ### 01 — Processes 16 | 17 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F01-processes.livemd) 18 | 19 | In this livebook we talk about the basics of processes and message passing. 20 | 21 | ### 02 — Tasks 22 | 23 | Here, we talk about the `Task` abstraction that ships in Elixir's standard 24 | library. 25 | 26 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F02-tasks.livemd) 27 | 28 | ### 03 — GenStage 29 | 30 | [GenStage] is an Elixir library maintained by the Elixir core team. It lets you 31 | build pipelines of stages through which events flow. 32 | 33 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F03-genstage.livemd) 34 | 35 | ### 04 — Flow 36 | 37 | [Flow] is another Elixir library maintained by the core team. It builds on top 38 | of GenStage to provide an API similar to `Enum` and `Stream`, but for parallel 39 | data processing. 40 | 41 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F04-flow.livemd) 42 | 43 | ### 05 — Broadway 44 | 45 | [Broadway] lets you build declarative data ingestion and processing pipelines. 46 | It supports several sources (RabbitMQ, Kafka, AWS SQS, and more). 47 | 48 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F05-broadway.livemd) 49 | 50 | ## License 51 | 52 | See [the license file](./LICENSE.txt). 53 | 54 | "Social preview" photo by [Bofu Shaw](https://unsplash.com/@hikeshaw?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/speed?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText). 
55 | 56 | [livebook]: https://livebook.dev 57 | [GenStage]: https://github.com/elixir-lang/gen_stage 58 | [Flow]: https://github.com/elixir-lang/flow 59 | [Broadway]: https://elixir-broadway.org 60 | [cover-image]: https://user-images.githubusercontent.com/3890250/182093532-159e5bcc-dcd7-40d7-9030-da914f3db0bb.jpg 61 | -------------------------------------------------------------------------------- /images/00-introduction_avatar_and_name.png.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/00-introduction_avatar_and_name.png.png -------------------------------------------------------------------------------- /images/03-genstage_architetcture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/03-genstage_architetcture.png -------------------------------------------------------------------------------- /images/03-genstage_demand.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/03-genstage_demand.png -------------------------------------------------------------------------------- /images/04-flow_partitioning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/04-flow_partitioning.png -------------------------------------------------------------------------------- /images/05-broadway_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/05-broadway_architecture.png --------------------------------------------------------------------------------