├── 01-processes.livemd ├── 02-tasks.livemd ├── 03-genstage.livemd ├── 04-flow.livemd ├── 05-broadway.livemd ├── LICENSE.txt ├── README.md └── images ├── 00-introduction_avatar_and_name.png.png ├── 03-genstage_architetcture.png ├── 03-genstage_demand.png ├── 04-flow_partitioning.png └── 05-broadway_architecture.png /01-processes.livemd: -------------------------------------------------------------------------------- 1 | # Processes Basics 2 | 3 | ## Introduction 4 | 5 | All Elixir code runs inside *processes*. In this livebook, we'll take a look at the basics of processes, how to spawn them, and how to have them communicate with each other. 6 | 7 | ## Spawning 8 | 9 | The most basic operation to create a new process is to **spawn** it. `spawn/1` takes a function, creates a new process, and executes the function in the new process. 10 | 11 | The process exits when the function returns or when there's an explicit exit (such as an uncaught `raise`). 12 | 13 | Each process is identified by a **PID** (Process IDentifier). 14 | 15 | ```elixir 16 | pid = 17 | spawn(fn -> 18 | Process.sleep(2000) 19 | IO.puts("Finished the expensive computation.") 20 | end) 21 | 22 | IO.puts("Just spawned a process with PID #{inspect(pid)}. Let's wait a bit.") 23 | ``` 24 | 25 | ## Sending and Receiving Messages 26 | 27 | Processes communicate via messages. A process can **send** or **receive** messages. 28 | 29 | ### Sending 30 | 31 | To send messages, you use `send(pid, message)`. The caller process sends `message` to the process identified by `pid`. Sending is **asynchronous**, that is, `send/2` returns as soon as the message is sent. There's no guarantee that when `send/2` returns the destination process has received the message. 32 | 33 | ### Receiving 34 | 35 | To receive messages, a process calls `receive`. `receive` is blocking: it will halt execution until a message that matches one of the listed patterns arrives to the caller process. 36 | 37 | ```elixir 38 | pid = 39 | spawn(fn -> 40 | receive do 41 | message -> 42 | IO.puts("#{inspect(self())} received a message: #{inspect(message)}") 43 | end 44 | end) 45 | 46 | Process.alive?(pid) 47 | ``` 48 | 49 | ```elixir 50 | send(pid, :hello_world) 51 | ``` 52 | 53 | As you can see, the process receives the message and prints it to standard output. Since the function that we passed to `spawn/1` returns after printing the message, the process itself finishes its execution. You can see this by checking whether `pid` represents a process that is alive. 54 | 55 | ```elixir 56 | Process.alive?(pid) 57 | ``` 58 | 59 | ### Pattern Matching on Receive 60 | 61 | `receive` supports multiple `->` clauses. When a message arrives, the first clause that matches it gets executed. This is analogous to `case`. 62 | 63 | ```elixir 64 | pid = 65 | spawn(fn -> 66 | receive do 67 | :ping -> 68 | IO.puts("pong!") 69 | 70 | message -> 71 | IO.puts("#{inspect(self())} received a message: #{inspect(message)}") 72 | end 73 | end) 74 | 75 | send(pid, :ping) 76 | ``` 77 | 78 | ### The Process Mailbox 79 | 80 | Each process has a **mailbox** where all messages it receives end up. It's conceptually similar to a queue. 81 | 82 | When a process receives a message, this message ends up in the process mailbox. The next time there is a `receive` call in that process, this is the algorithm that gets executed to determine what to do with that message. 
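Here's a minimal sketch you can run in a cell to see this selective behavior in action (the atoms `:ping` and `:other` are just for illustration):

```elixir
send(self(), :other)
send(self(), :ping)

# The mailbox now holds :other followed by :ping. This receive skips
# :other, since no clause matches it, and executes the :ping clause.
receive do
  :ping -> IO.puts("Matched :ping, even though :other arrived first")
end

# :other is still sitting in the mailbox, in its original position.
receive do
  message -> IO.puts("Still in the mailbox: #{inspect(message)}")
end
```

The diagram below walks through the same algorithm step by step.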
83 | 
84 | 
85 | 
86 | 
87 | 
88 | ```mermaid
89 | sequenceDiagram;
90 |   participant other_pid as Other process
91 |   participant pid as Process
92 |   participant mailbox as Mailbox
93 | 
94 |   other_pid->>pid: Send message
95 |   pid->>mailbox: Put message at the end
96 | 
97 |   pid->>pid: Wait for the next call to receive
98 | 
99 |   loop Every message in mailbox
100 |     alt Matches one clause
101 |       pid->>pid: Execute clause
102 |     else
103 |       pid->>mailbox: Store in the same position
104 |     end
105 |   end
106 | ```
107 | 
108 | 
109 | 
110 | Two important things:
111 | 
112 | * If the loop reaches the end of the mailbox and no messages match any `receive` clauses, `receive` blocks until a new message arrives and the algorithm gets executed again.
113 | * If `receive` is called when there are already messages in the mailbox, the algorithm executes right away (it doesn't wait for a new message).
114 | 
115 | 
116 | 
117 | ### Receive Timeout
118 | 
119 | By default, `receive` blocks indefinitely. However, it supports an `after` clause. This clause lets you specify a timeout after which the corresponding code executes and `receive` returns. Be careful using `receive` without `after`, since it could cause the process to hang indefinitely if there's a bug in your list of patterns.
120 | 
121 | ```elixir
122 | spawn(fn ->
123 |   receive do
124 |     message ->
125 |       IO.puts("#{inspect(self())} received a message: #{inspect(message)}")
126 |   after
127 |     5_000 ->
128 |       IO.puts("Timeout, no messages")
129 |   end
130 | end)
131 | ```
132 | 
133 | ## Parallel Map
134 | 
135 | Let's use what we know so far to implement parallel mapping.
136 | 
137 | ```elixir
138 | defmodule Parallel do
139 |   def map(enum, fun) do
140 |     # Let's take note of the "parent" PID, since if we call self()
141 |     # in the function we pass to spawn/1 then we get the PID of the
142 |     # spawned process.
143 |     parent = self()
144 | 
145 |     pids =
146 |       Enum.map(enum, fn elem ->
147 |         spawn(fn ->
148 |           # Compute the mapped element.
149 |           mapped_elem = fun.(elem)
150 | 
151 |           # Send the result back to the "parent".
152 |           send(parent, {self(), mapped_elem})
153 |         end)
154 |       end)
155 | 
156 |     Enum.map(pids, fn pid ->
157 |       receive do
158 |         {^pid, mapped_elem} -> mapped_elem
159 |       end
160 |     end)
161 |   end
162 | end
163 | ```
164 | 
165 | This code is **full** of bugs. 🙈 It doesn't use `after` for timeouts, it doesn't do any error handling, and more. However, it illustrates the idea! Let's give it a spin.
166 | 
167 | To see the parallelism in action, let's map over a list of integers representing *timeouts* in milliseconds. We'll map the `Process.sleep/1` function over those. First, let's use `Enum.map/2` to see what happens when we map sequentially, one item at a time:
168 | 
169 | ```elixir
170 | {elapsed, _result} = :timer.tc(fn -> Enum.map([1000, 1000, 1000], &Process.sleep/1) end)
171 | IO.puts("Elapsed time: #{elapsed / 1_000_000} s")
172 | ```
173 | 
174 | It takes roughly 3s to execute the code, which makes perfect sense. If we use our `Parallel.map/2` function, it should hopefully take around 1s!
175 | 176 | ```elixir 177 | {elapsed, _result} = :timer.tc(fn -> Parallel.map([1000, 1000, 1000], &Process.sleep/1) end) 178 | IO.puts("Elapsed time: #{elapsed / 1_000_000} s") 179 | ``` 180 | -------------------------------------------------------------------------------- /02-tasks.livemd: -------------------------------------------------------------------------------- 1 | # Tasks 2 | 3 | ## Processes On Steroids 4 | 5 | Tasks are processes that handle a bunch of the nitty-gritty details for you. They're the natural evolution from processes for many use cases. 6 | 7 | Let's start by exploring the `async` + `await` use case. 8 | 9 | ### `async` and `await` 10 | 11 | [`Task.async/1`](https://hexdocs.pm/elixir/Task.html#async/1) spawns a new task, similarly to `spawn/1`. Instead of sending messages back and forth, you can use [`Task.await/1`](https://hexdocs.pm/elixir/Task.html#await/2) to collect the value *returned* by the spawned task. 12 | 13 | ```elixir 14 | task = 15 | Task.async(fn -> 16 | Process.sleep(1000) 17 | IO.puts("Expensive computation is done!") 18 | Enum.random(1..100) 19 | end) 20 | 21 | IO.puts("Running task...") 22 | Task.await(task) 23 | ``` 24 | 25 | ### Yielding 26 | 27 | [`Task.yield/2`](https://hexdocs.pm/elixir/Task.html#yield/2) is similar to `Task.await/1`, but it returns `nil` if the task doesn't return a value within the specified timeout (`Task.await/1` exits instead). 28 | 29 | ```elixir 30 | task = 31 | Task.async(fn -> 32 | Process.sleep(1000) 33 | IO.puts("Expensive computation is done!") 34 | Enum.random(1..100) 35 | end) 36 | 37 | Task.yield(task, _timeout = 5000) 38 | ``` 39 | 40 | As you can see, if the task returns in time, `Task.yield/2` returns `{:ok, result}`. Let's see what happens if the task *doesn't* return in time, instead: 41 | 42 | ```elixir 43 | task = 44 | Task.async(fn -> 45 | Process.sleep(1000) 46 | IO.puts("Expensive computation is done!") 47 | Enum.random(1..100) 48 | end) 49 | 50 | Task.yield(task, _timeout = 500) 51 | ``` 52 | 53 | `Task.yield/2` returns `nil`, but after a while the task seems to still print something. That's because `Task.yield/2` "peeks" into whether the task finished, but doesn't shut the task down in case it hasn't finished. To stop the task, we can use [`Task.shutdown/1`](https://hexdocs.pm/elixir/Task.html#shutdown/2). 54 | 55 | `Task.yield/2` and `Task.shutdown/1` are often combined to implement the use case when you need a computation to be bound by time. It goes something like this: 56 | 57 | 1. Start the computation 58 | 2. Do some other work on the side 59 | 3. When you're ready, check the result of the task with `Task.yield/2`. 60 | 4. If the task does not complete within the timeout, shut down the task. 61 | 62 | `Task.shutdown/1` also takes care of race conditions, which can happen in case the task completes right as we are telling it to shut down. 63 | 64 | ```elixir 65 | task = 66 | Task.async(fn -> 67 | Process.sleep(1000) 68 | IO.puts("Expensive computation is done!") 69 | Enum.random(1..100) 70 | end) 71 | 72 | IO.puts("Running task...") 73 | Task.yield(task, 500) || Task.shutdown(task) 74 | ``` 75 | 76 | ## Parallel Map — Take #2 with async_stream 77 | 78 | `Task` provides *the most underrated function* (IMO) in all of Elixir's standard library: [`Task.async_stream/3`](https://hexdocs.pm/elixir/Task.html#async_stream/3). It takes an enumerable and a function, and returns a stream that maps the function over the enumerable **in parallel**. 
79 | 
80 | ```elixir
81 | stream =
82 |   Task.async_stream([200, 100, 400], fn timeout ->
83 |     Process.sleep(timeout)
84 |     IO.puts("Slept for #{timeout} ms")
85 |     timeout * 2
86 |   end)
87 | 
88 | Enum.to_list(stream)
89 | ```
90 | 
91 | Seems like nothing special, right? Well, it is!
92 | 
93 | `async_stream`'s coolest feature is that it uses a **bounded number of processes**. You can control this number through the `:max_concurrency` option, and it defaults to the number of schedulers online (usually the number of cores on your machine). This feature is huge: our previous naive parallel-map implementation would spawn one process per element in the enumerable, regardless of the number of elements. Billions of processes? Not good. `async_stream` will happily churn through infinite streams, using `:max_concurrency` processes at a time.
94 | 
95 | `async_stream` is also **flexible**. It accepts any enumerable as its input (including infinite streams) and itself returns an enumerable.
96 | 
97 | ## When to Use Tasks
98 | 
99 | * If you want to perform a few requests to different services and then collect the results
100 | * If you need a simple parallel mapping approach
101 | * If you need to perform a computation in a limited timeframe and want to stop it if it times out
102 | * When you want to spawn a computation in the background ([`Task.start/1`](https://hexdocs.pm/elixir/Task.html#start/1)), for something like side effects
103 | 
104 | ## Practical Tips
105 | 
106 | ### Tip #1 — `ordered: false` with `async_stream`
107 | 
108 | If you're using `Task.async_stream/3` and don't care about the ordering of results, use the `ordered: false` option.
109 | 
110 | This is great for when you're using `async_stream/3` to parallelize side-effects over a collection, for example. It's also useful when you're going to do something with the mapped collection that doesn't require ordering, like aggregating into a map.
111 | 
112 | ```elixir
113 | print_after_timeout = fn timeout ->
114 |   Process.sleep(timeout)
115 |   IO.puts("Slept for #{timeout} ms")
116 |   timeout
117 | end
118 | 
119 | [200, 100, 400]
120 | |> Task.async_stream(print_after_timeout, ordered: false)
121 | |> Enum.to_list()
122 | ```
123 | 
124 | As you can see, the results are returned in the order in which they finish computing, and not in the order of the original list.
125 | 
126 | ### Tip #2 — Follow the Documentation for `Task.yield/2`
127 | 
128 | The [documentation for `Task.yield/2`](https://hexdocs.pm/elixir/Task.html#yield/2) has a great code snippet to use when you need to perform a time-capped computation.
129 | 
130 | ```elixir
131 | task =
132 |   Task.async(fn ->
133 |     Process.sleep(Enum.random(499..501))
134 |     IO.puts("Done!")
135 |   end)
136 | 
137 | case Task.yield(task, 500) || Task.shutdown(task) do
138 |   {:ok, result} -> result
139 |   nil -> :timeout
140 | end
141 | ```
142 | 
143 | ### Tip #3 — `async_stream` Goes a Long Way
144 | 
145 | Before talking about GenStage, Broadway, and Flow, I want to stress the importance of `async_stream`. I've seen many solutions built on GenStage or Flow that were essentially overengineered `async_stream`s. `async_stream` has some limitations, but combining the bounded number of processes, the optional ordering, and the fact that it processes lazy streams makes it a great choice in many situations.
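To drive Tip #3 home, here's a small sketch (the numbers are arbitrary) of `async_stream` lazily working through an *infinite* stream while never running more than four task processes at a time:

```elixir
# Square an infinite stream of integers in parallel. :max_concurrency
# caps the number of concurrent tasks, and Enum.take/2 halts the
# pipeline after ten results.
Stream.iterate(1, &(&1 + 1))
|> Task.async_stream(fn n -> n * n end, max_concurrency: 4)
|> Stream.map(fn {:ok, square} -> square end)
|> Enum.take(10)
```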
146 | 
-------------------------------------------------------------------------------- /03-genstage.livemd: --------------------------------------------------------------------------------
1 | # GenStage
2 | 
3 | ```elixir
4 | Mix.install([:gen_stage])
5 | 
6 | {:ok, _} = Application.ensure_all_started(:crypto)
7 | ```
8 | 
9 | ## What Is GenStage
10 | 
11 | GenStage is a library maintained by the Elixir core team. It provides an abstraction over asynchronous computation that happens through multiple *stages*.
12 | 
13 | The idea is this: you have something generating **events** (whatever an event is). You want to feed those events through a pipeline of *stages* with varying topologies. That's what GenStage gives you.
14 | 
15 | A GenStage stage is an OTP behaviour, similar to `GenServer` or `:gen_statem`. Below is a small example of a minimal producer stage.
16 | 
17 | ```elixir
18 | defmodule SampleStage do
19 |   use GenStage
20 | 
21 |   @impl true
22 |   def init(_), do: {:producer, :nostate}
23 | 
24 |   @impl true
25 |   def handle_demand(_demand, :nostate) do
26 |     {:noreply, _events = [], :nostate}
27 |   end
28 | end
29 | ```
30 | 
31 | ## Stage Types
32 | 
33 | GenStage provides three stage types:
34 | 
35 | * *Producer* stages
36 | * *Producer-consumer* stages
37 | * *Consumer* stages
38 | 
39 | A GenStage pipeline must have at least **one producer** and **one consumer**, plus any number of producer-consumers in between. A pipeline could look something like this:
40 | 
41 | ![](images/03-genstage_architetcture.png)
42 | 
43 | GenStage stages signal their type by returning it from the `init/1` callback.
44 | 
45 | ## Demand
46 | 
47 | A foundational notion in GenStage is **demand**. We say that GenStage pipelines are *demand-driven*. Pipelines don't flow from producers to consumers directly: the flow starts the opposite way. The consuming end of the pipeline "sends demand" upstream, declaring itself ready to consume `n` events.
48 | 
49 | The demand flows upstream through any producer-consumers and eventually up to the producer. The producer should always generate events according to the demand that reaches it. Only then do events flow downstream through the stages.
50 | 
51 | ![](images/03-genstage_demand.png)
52 | 
53 | The point of this demand-driven flow is to provide **backpressure**. Events will flow through the pipeline only as fast as stages can consume them.
54 | 
55 | ## A Simple Pipeline
56 | 
57 | ### Producer
58 | 
59 | Let's start with producers. A producer's job is to produce events according to the downstream **demand**. It has to implement the `handle_demand/2` callback. GenStage invokes this callback whenever there is downstream demand, and passes the demand as an integer to it.
60 | 
61 | As an example, let's build a producer that just produces random binaries, indefinitely.
62 | 
63 | ```elixir
64 | defmodule RandomBinaryProducer do
65 |   use GenStage
66 | 
67 |   def start_link(binary_size) do
68 |     GenStage.start_link(__MODULE__, binary_size)
69 |   end
70 | 
71 |   @impl true
72 |   def init(binary_size) do
73 |     {:producer, binary_size}
74 |   end
75 | 
76 |   @impl true
77 |   def handle_demand(demand, binary_size = _state) do
78 |     # Processing is expensive! Let's simulate that by sleeping for a bit.
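    # (`demand` is the number of events that downstream stages have asked
    # for; handle_demand/2 should emit at most that many events.)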
79 | Process.sleep(Enum.random(1000..5000)) 80 | 81 | events = 82 | Stream.repeatedly(fn -> :crypto.strong_rand_bytes(binary_size) end) 83 | |> Enum.take(demand) 84 | 85 | {:noreply, events, binary_size} 86 | end 87 | end 88 | ``` 89 | 90 | ### Consumer 91 | 92 | Now, let's add the dumbest consumer: it'll consume these random binaries and print them to the standard output. So much for interesting examples, right?! 93 | 94 | Consumers implement the `handle_events/3` callback, which is invoked when there are new events coming from the producer. `handle_events/3` can return events to pass downstream to the pipeline. For consumers, however, that list of events must always be `[]`. We'll see it in action in producer-consumers. 95 | 96 | ```elixir 97 | defmodule PrinterConsumer do 98 | use GenStage 99 | 100 | def start_link do 101 | GenStage.start_link(__MODULE__, :nostate) 102 | end 103 | 104 | @impl true 105 | def init(:nostate) do 106 | {:consumer, :nostate} 107 | end 108 | 109 | @impl true 110 | def handle_events(binaries, _from, state) do 111 | Enum.each(binaries, &IO.inspect(&1, label: "Binary consumed in #{inspect(self())}")) 112 | {:noreply, _events = [], state} 113 | end 114 | end 115 | ``` 116 | 117 | ### Wiring It Up 118 | 119 | Alright, we're ready to run our pipeline. As we mentioned, events flow downstream but the pipeline is "kicked off" by demand going upstream, from consumers all the way to producers. 120 | 121 | For this reason, we have to glue the pipeline together starting from the consumer. GenStage provides functions to subscribe a consumer to a producer, such as `GenStage.sync_subscribe/3`. 122 | 123 | ```elixir 124 | {:ok, producer} = RandomBinaryProducer.start_link(_size = 12) 125 | {:ok, consumer1} = PrinterConsumer.start_link() 126 | {:ok, consumer2} = PrinterConsumer.start_link() 127 | 128 | IO.puts("Ready, set, go!") 129 | 130 | {:ok, subscription_tag1} = 131 | GenStage.sync_subscribe(consumer1, 132 | to: producer, 133 | cancel: :temporary, 134 | min_demand: 5, 135 | max_demand: 10 136 | ) 137 | 138 | {:ok, subscription_tag2} = 139 | GenStage.sync_subscribe(consumer2, 140 | to: producer, 141 | cancel: :temporary, 142 | min_demand: 5, 143 | max_demand: 10 144 | ) 145 | 146 | # After 10s, we shut down the pipeline to avoid it printing forever. 147 | Process.sleep(10_000) 148 | GenStage.cancel({producer, subscription_tag1}, :shutdown) 149 | GenStage.cancel({producer, subscription_tag2}, :shutdown) 150 | ``` 151 | 152 | ### Producer-consumer 153 | 154 | Let's add a producer-consumer. It's going to add the MD5 hash of each event it consumes, and emit the event downstream as `{original_event, md5_hash}`. 155 | 156 | ```elixir 157 | defmodule Hasher do 158 | use GenStage 159 | 160 | def start_link do 161 | GenStage.start_link(__MODULE__, :nostate) 162 | end 163 | 164 | @impl true 165 | def init(:nostate) do 166 | {:producer_consumer, :nostate} 167 | end 168 | 169 | @impl true 170 | def handle_events(events, _from, :nostate) do 171 | events = 172 | for event <- events do 173 | {event, Base.encode64(:erlang.md5(event))} 174 | end 175 | 176 | # Here, "events" is not empty. 
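    # Returning a non-empty list from a producer-consumer is exactly what
    # sends events downstream to the consumers subscribed to this stage.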
177 |     {:noreply, events, :nostate}
178 |   end
179 | end
180 | ```
181 | 
182 | ```elixir
183 | {:ok, producer} = RandomBinaryProducer.start_link(_size = 12)
184 | {:ok, producer_consumer} = Hasher.start_link()
185 | {:ok, consumer} = PrinterConsumer.start_link()
186 | 
187 | IO.puts("Ready, set, go!")
188 | 
189 | {:ok, first_subscription_tag} =
190 |   GenStage.sync_subscribe(consumer,
191 |     to: producer_consumer,
192 |     cancel: :temporary,
193 |     min_demand: 2,
194 |     max_demand: 5
195 |   )
196 | 
197 | {:ok, second_subscription_tag} =
198 |   GenStage.sync_subscribe(producer_consumer,
199 |     to: producer,
200 |     cancel: :temporary,
201 |     min_demand: 2,
202 |     max_demand: 5
203 |   )
204 | 
205 | Process.sleep(10_000)
206 | GenStage.cancel({producer, second_subscription_tag}, :shutdown)
207 | GenStage.cancel({producer_consumer, first_subscription_tag}, :shutdown)
208 | ```
209 | 
210 | ## Dispatching
211 | 
212 | Event dispatching is the missing piece in our understanding of GenStage. GenStage producers and producer-consumers dispatch events downstream based on **dispatchers**, which can implement different dispatching strategies.
213 | 
214 | The default dispatcher is the "demand dispatcher" (`GenStage.DemandDispatcher`). It hands events to the downstream consumer with the *highest demand*. This is intuitive: if a consumer has high demand, it means it already processed events and has "bandwidth" to process more.
215 | 
216 | You can write your own dispatcher by writing a module that implements the [`GenStage.Dispatcher` behaviour](https://hexdocs.pm/gen_stage/GenStage.Dispatcher.html). GenStage also ships with two other useful dispatchers.
217 | 
218 | ### [`GenStage.BroadcastDispatcher`](https://hexdocs.pm/gen_stage/GenStage.BroadcastDispatcher.html)
219 | 
220 | This dispatcher dispatches *copies* of events to all subscribed downstream consumers. It can be useful, for example, when the same events need to be consumed by consumers that perform different kinds of work.
221 | 
222 | In the example below, you'll notice how the same random binary is printed *twice*, once for each consumer.
223 | 224 | ```elixir 225 | defmodule RandomBinaryBroadcaster do 226 | use GenStage 227 | 228 | def start_link(binary_size) do 229 | GenStage.start_link(__MODULE__, binary_size) 230 | end 231 | 232 | @impl true 233 | def init(binary_size) do 234 | {:producer, binary_size, dispatcher: GenStage.BroadcastDispatcher} 235 | end 236 | 237 | @impl true 238 | def handle_demand(demand, binary_size = _state) do 239 | Process.sleep(Enum.random(1000..5000)) 240 | 241 | events = 242 | Stream.repeatedly(fn -> :crypto.strong_rand_bytes(binary_size) end) 243 | |> Enum.take(demand) 244 | 245 | {:noreply, events, binary_size} 246 | end 247 | end 248 | 249 | {:ok, producer} = RandomBinaryBroadcaster.start_link(_size = 12) 250 | {:ok, consumer1} = PrinterConsumer.start_link() 251 | {:ok, consumer2} = PrinterConsumer.start_link() 252 | 253 | IO.puts("Ready, set, go!") 254 | 255 | {:ok, subscription_tag1} = 256 | GenStage.sync_subscribe(consumer1, 257 | to: producer, 258 | cancel: :temporary, 259 | min_demand: 5, 260 | max_demand: 10 261 | ) 262 | 263 | {:ok, subscription_tag2} = 264 | GenStage.sync_subscribe(consumer2, 265 | to: producer, 266 | cancel: :temporary, 267 | min_demand: 5, 268 | max_demand: 10 269 | ) 270 | 271 | Process.sleep(10_000) 272 | GenStage.cancel({producer, subscription_tag1}, :shutdown) 273 | GenStage.cancel({producer, subscription_tag2}, :shutdown) 274 | ``` 275 | 276 | ### [`GenStage.PartitionDispatcher`](https://hexdocs.pm/gen_stage/GenStage.PartitionDispatcher.html) 277 | 278 | This dispatcher dispatches events based on a **partitioning key** on the event itself. Consumers can subscribe to a producer that uses this dispatcher and specify the partition they want to consume. This is useful to dispatch events deterministically, which can help with keeping state in the consumer (think of caching, ownership, and so on). 279 | 280 | ```elixir 281 | defmodule PartitionProducer do 282 | use GenStage 283 | 284 | require Integer 285 | 286 | def start_link do 287 | GenStage.start_link(__MODULE__, :no_state) 288 | end 289 | 290 | @impl true 291 | def init(:no_state) do 292 | dispatcher = {GenStage.PartitionDispatcher, partitions: [:odd, :even], hash: &hash/1} 293 | {:producer, :no_state, dispatcher: dispatcher} 294 | end 295 | 296 | @impl true 297 | def handle_demand(demand, state) do 298 | Process.sleep(Enum.random(1000..5000)) 299 | {:noreply, Enum.take_random(1..1000, demand), state} 300 | end 301 | 302 | defp hash(event) when Integer.is_even(event), do: {event, :even} 303 | defp hash(event) when Integer.is_odd(event), do: {event, :odd} 304 | end 305 | 306 | {:ok, producer} = PartitionProducer.start_link() 307 | {:ok, consumer1} = PrinterConsumer.start_link() 308 | {:ok, consumer2} = PrinterConsumer.start_link() 309 | 310 | IO.puts("Ready, set, go!") 311 | 312 | {:ok, subscription_tag1} = 313 | GenStage.sync_subscribe(consumer1, 314 | to: producer, 315 | partition: :even, 316 | cancel: :temporary, 317 | min_demand: 5, 318 | max_demand: 10 319 | ) 320 | 321 | {:ok, subscription_tag2} = 322 | GenStage.sync_subscribe(consumer2, 323 | to: producer, 324 | partition: :odd, 325 | cancel: :temporary, 326 | min_demand: 5, 327 | max_demand: 10 328 | ) 329 | 330 | Process.sleep(10_000) 331 | GenStage.cancel({producer, subscription_tag1}, :shutdown) 332 | GenStage.cancel({producer, subscription_tag2}, :shutdown) 333 | ``` 334 | 335 | As you can see, all odd integers are printed by the same consumer, and all the even ones are printed by the same (other) consumer. 
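As a final note on wiring: you don't have to subscribe stages manually with `GenStage.sync_subscribe/3`. A stage can also subscribe at startup by returning the `:subscribe_to` option from `init/1`. Here's a minimal sketch (the module name is made up, and we assume the producer's PID gets passed in):

```elixir
defmodule AutoSubscribedPrinter do
  use GenStage

  def start_link(producer) do
    GenStage.start_link(__MODULE__, producer)
  end

  @impl true
  def init(producer) do
    # Subscribe to the producer as soon as this stage starts.
    {:consumer, :nostate, subscribe_to: [{producer, min_demand: 5, max_demand: 10}]}
  end

  @impl true
  def handle_events(events, _from, state) do
    Enum.each(events, &IO.inspect(&1, label: "Consumed in #{inspect(self())}"))
    {:noreply, [], state}
  end
end
```

This comes in handy in supervision trees, where you usually don't want to wire subscriptions up by hand.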
336 | 
-------------------------------------------------------------------------------- /04-flow.livemd: --------------------------------------------------------------------------------
1 | # Flow
2 | 
3 | ```elixir
4 | Mix.install([:req, :flow])
5 | ```
6 | 
7 | ## What Is Flow?
8 | 
9 | Flow is an abstraction built on top of GenStage. Its job is *processing collections asynchronously* through a series of stages.
10 | 
11 | > But that's kind of what we did with `Task.async_stream/3`! — you, now
12 | 
13 | Indeed, the two share a few similarities:
14 | 
15 | * They both work on *bounded* or *unbounded* collections
16 | * They both process collection elements in parallel
17 | 
18 | However, Flow is more powerful, versatile, and customizable than `Task.async_stream/3`. Because of this, it's also more complex and has a steeper learning curve, so make sure it solves your problem better than `async_stream` before using it!
19 | 
20 | Being built on GenStage, Flow is the next layer of abstraction when thinking about parallel data processing and map/reduce algorithms.
21 | 
22 | ## Counting Words
23 | 
24 | Let's implement the classic map/reduce toy example: counting the occurrences of every word in a text. We'll fetch the words from the [baconipsum.com](https://baconipsum.com) API. 🥓
25 | 
26 | #### The `Enum` Version
27 | 
28 | Let's start with how we'd do this with `Enum`.
29 | 
30 | ```elixir
31 | start_time = System.monotonic_time(:millisecond)
32 | 
33 | result =
34 |   1..10
35 |   |> Enum.map(fn _ ->
36 |     Req.get!("https://baconipsum.com/api/?type=meat-and-filler&paras=50&format=text").body
37 |   end)
38 |   |> Enum.flat_map(fn paragraph ->
39 |     String.split(paragraph, "\n", trim: true)
40 |   end)
41 |   |> Enum.flat_map(&String.split(&1, " ", trim: true))
42 |   |> Enum.reduce(%{}, fn word, acc ->
43 |     Map.update(acc, word, 1, &(&1 + 1))
44 |   end)
45 |   |> Enum.sort_by(fn {_word, count} -> count end, :desc)
46 | 
47 | end_time = System.monotonic_time(:millisecond)
48 | IO.puts("Took us #{end_time - start_time}ms")
49 | 
50 | result
51 | ```
52 | 
53 | Cool. Cool cool cool. This solution works. However, it is **sequential**. It also loads the whole text into memory for every request and then it keeps eagerly evaluating expressions. Each of those steps can be expensive: first we split the whole text into a full list of lines, then each line into a full list of words, and so on. We fully compute each step before moving to the next.
54 | 
55 | I know what you're thinking: this is what **streams** are for. You're kind of right.
56 | 
57 | #### The `Stream` Version
58 | 
59 | It's easy enough to change the `Enum`-based version to use streams instead.
60 | 
61 | ```elixir
62 | start_time = System.monotonic_time(:millisecond)
63 | 
64 | result =
65 |   1..10
66 |   |> Stream.map(fn _ ->
67 |     Req.get!("https://baconipsum.com/api/?type=meat-and-filler&paras=50&format=text").body
68 |   end)
69 |   |> Stream.flat_map(fn paragraph ->
70 |     String.splitter(paragraph, "\n", trim: true)
71 |   end)
72 |   |> Stream.flat_map(&String.splitter(&1, " ", trim: true))
73 |   |> Enum.reduce(%{}, fn word, acc ->
74 |     Map.update(acc, word, 1, &(&1 + 1))
75 |   end)
76 |   |> Enum.sort_by(fn {_word, count} -> count end, :desc)
77 | 
78 | end_time = System.monotonic_time(:millisecond)
79 | IO.puts("Took us #{end_time - start_time}ms")
80 | 
81 | result
82 | ```
83 | 
84 | The differences are:
85 | 
86 | * We use [`String.splitter/3`](https://hexdocs.pm/elixir/String.html#splitter/3) instead of `String.split/3`, since it returns a lazy stream
87 | * We change `Enum.flat_map/2` into `Stream.flat_map/2`
88 | 
89 | Just like that, our solution is now *lazy* and won't compute every step and load it fully into memory. That's fantastic, but our execution time is essentially the same! Splitting strings and the like can take memory, but it's blazing-fast compared to performing HTTP requests.
90 | 
91 | ### Enter Flow
92 | 
93 | Flow tries its hardest to provide an API that's almost identical to `Enum` and `Stream`. This is how we'd rewrite our example.
94 | 
95 | ```elixir
96 | start_time = System.monotonic_time(:millisecond)
97 | 
98 | result =
99 |   1..10
100 |   |> Flow.from_enumerable(max_demand: 1, min_demand: 0)
101 |   |> Flow.map(fn _ ->
102 |     Req.get!("https://baconipsum.com/api/?type=meat-and-filler&paras=50&format=text").body
103 |   end)
104 |   |> Flow.flat_map(fn paragraph ->
105 |     String.splitter(paragraph, "\n", trim: true)
106 |   end)
107 |   |> Flow.flat_map(&String.splitter(&1, " ", trim: true))
108 |   |> Flow.partition()
109 |   |> Flow.reduce(fn -> %{} end, fn word, acc ->
110 |     Map.update(acc, word, 1, &(&1 + 1))
111 |   end)
112 |   |> Enum.sort_by(fn {_word, count} -> count end, :desc)
113 | 
114 | end_time = System.monotonic_time(:millisecond)
115 | IO.puts("Took us #{end_time - start_time}ms")
116 | 
117 | result
118 | ```
119 | 
120 | ## Partitioning
121 | 
122 | The example above works thanks to a little function we snuck in there: [`Flow.partition/1`](https://hexdocs.pm/flow/Flow.html#partition/1).
123 | 
124 | Flow executes computations in different processes, including the `Flow.reduce/3` step. This means that if we don't have a way to make sure words are deterministically processed in specific processes, we're not going to be able to perform the reduce step in parallel. This is a well-known challenge in map/reduce: if you want the reduce step to happen in parallel, you need to divide the output of the map step into subsets that you can reduce on their own.
125 | 
126 | `Flow.partition/1` does exactly that: it introduces a new set of *stages* and makes sure the same word is always mapped to the same stage. It does this through a hash function.
127 | 
128 | ![](images/04-flow_partitioning.png)
129 | 
130 | ## Windows and Triggers
131 | 
132 | We won't go into too much detail on windows and triggers, but they're a powerful feature of Flow.
133 | 
134 | The reason for having these features is *unbounded collections*. When working with unbounded collections, we can't simply `Flow.reduce/3` like we did in the word-counting example. We'd never arrive at a final reduced result if our collection has infinitely many elements.
135 | 
136 | Windows and triggers address this problem.
Windows let us split the collection into chunks (based on time or on element count, for example), and triggers let us tell the pipeline when to "flush" the results we have computed so far. Then, Flow can run its operations on *windows* of data instead of on the unbounded collection.
137 | 
138 | Once we specify a window strategy, we can use triggers to *materialize* the data in the window. The parallel computation now happens on the window itself, not on all the data like in the examples above.
139 | 
140 | All events belong to the **global window** by default, which is returned by [`Flow.Window.global/0`](https://hexdocs.pm/flow/Flow.Window.html#global/0).
141 | 
142 | ```elixir
143 | Flow.Window.global()
144 | ```
145 | 
146 | ### Triggers
147 | 
148 | Let's see a trigger in action. We'll use the [`Flow.Window.trigger_every/2` trigger](https://hexdocs.pm/flow/Flow.Window.html#trigger_every/2), which triggers every `n` elements in the window.
149 | 
150 | ```elixir
151 | window = Flow.Window.global() |> Flow.Window.trigger_every(10)
152 | 
153 | Flow.from_enumerable(1..100)
154 | |> Flow.partition(window: window, stages: 1)
155 | |> Flow.reduce(fn -> 0 end, &(&1 + &2))
156 | |> Flow.emit(:state)
157 | |> Enum.to_list()
158 | ```
159 | 
160 | As you can see in the output, each element in the returned list is the accumulated sum of all elements seen so far, emitted once every ten elements.
161 | 
162 | #### What to Emit?
163 | 
164 | You might notice a little call to [`Flow.emit/2`](https://hexdocs.pm/flow/Flow.html#emit/2) in the pipeline in our previous example. You can use `Flow.emit/2` to tell Flow which value you want to emit to the next stage of the computation. Often, you'd use `Flow.emit(flow, :events)`. In our case, what we want to emit to the next stage is the *state* calculated by `Flow.reduce/3`, so we use `Flow.emit(:state)`.
165 | 
166 | ### Other Window Types
167 | 
168 | Flow supports other window types too. The [`Flow.Window.count/1` window](https://hexdocs.pm/flow/Flow.Window.html#count/1) groups the collection into windows of `n` elements each. We can see how this plays out with the `trigger_every` trigger we used above. In the example below, we consider a window of 15 elements at a time, but emit the trigger every 6 elements.
169 | 
170 | ```elixir
171 | window = Flow.Window.count(15) |> Flow.Window.trigger_every(6)
172 | 
173 | Stream.repeatedly(fn -> 5 end)
174 | |> Flow.from_enumerable()
175 | |> Flow.partition(window: window, stages: 1)
176 | |> Flow.reduce(fn -> 0 end, &(&1 + &2))
177 | |> Flow.emit(:state)
178 | |> Enum.take(10)
179 | ```
180 | 
181 | ## Stages
182 | 
183 | As mentioned at the beginning of this livebook, Flow is built on top of GenStage. In many cases, you won't even notice it. However, it's important because it means all the GenStage features we know and love power Flow as well. For example, Flow pipelines have built-in backpressure through GenStage's demand. You can also use GenStage producers or producer-consumers to feed a Flow pipeline through [`Flow.from_stages/2`](https://hexdocs.pm/flow/Flow.html#from_stages/2).
184 | 
185 | To control parallelism and batching in Flow, you'll see GenStage concepts pop up. For example, the `Flow.partition/2` function we used above uses a `:stages` option to control how many processes to use in the partitioning. You can also use `:min_demand` and `:max_demand` to control batching.
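As a quick sketch of those knobs (the option values here are arbitrary), this is how you'd spread a reduce over four partition stages with small batches flowing between them:

```elixir
# Count integers by their remainder modulo 3, fanning the reduce step
# out over 4 partition stages. Small min/max demand means small batches.
1..1000
|> Flow.from_enumerable(max_demand: 50)
|> Flow.partition(stages: 4, min_demand: 1, max_demand: 10)
|> Flow.reduce(fn -> %{} end, fn n, acc -> Map.update(acc, rem(n, 3), 1, &(&1 + 1)) end)
|> Enum.to_list()
```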
186 | 
-------------------------------------------------------------------------------- /05-broadway.livemd: --------------------------------------------------------------------------------
1 | # Broadway
2 | 
3 | ```elixir
4 | Mix.install([
5 |   :amqp,
6 |   :broadway,
7 |   :broadway_rabbitmq,
8 |   :jason,
9 |   :kino
10 | ])
11 | 
12 | case AMQP.Connection.open() do
13 |   {:ok, conn} ->
14 |     :ok = AMQP.Connection.close(conn)
15 | 
16 |   {:error, reason} ->
17 |     raise """
18 |     there doesn't seem to be a RabbitMQ instance accepting connections at
19 |     localhost:5672: #{inspect(reason)}
20 |     """
21 | end
22 | 
23 | defmodule VisualHelpers do
24 |   # Don't do this. It's Broadway private API and it could change any time!
25 |   def broadway_sup_tree(pipeline) do
26 |     sup = :sys.get_state(pipeline).supervisor_pid
27 |     # Kino.Process.sup_tree/2 doesn't render non-alive processes correctly, so let's
28 |     # remove them.
29 |     Supervisor.delete_child(sup, Broadway.Topology.RateLimiter)
30 |     Kino.Process.sup_tree(sup, direction: :left_right)
31 |   end
32 | end
33 | ```
34 | 
35 | ## Before We Start
36 | 
37 | In the setup block above, we try to connect to RabbitMQ to check whether you have it running. We use RabbitMQ in this livebook to feed messages to our Broadway examples.
38 | 
39 | If you don't have RabbitMQ installed, the easiest way to quickly spin up an instance is through Docker:
40 | 
41 | ```
42 | $ docker run --rm -it -p 5672:5672 rabbitmq
43 | ```
44 | 
45 | ## The Big Stage
46 | 
47 | Broadway is a tool to build **data ingestion** and **data processing** pipelines. It's a sibling of Flow that builds on top of GenStage. You can think of Flow as the solution to the problem "how do I use GenStage in a simple way to process collections?". Broadway is the answer to the question "how do I use GenStage in a simple way to ingest and process data from different sources?".
48 | 
49 | Compared to GenStage, Broadway is more **declarative**. It provides many abstractions that you could build yourself with GenStage, but it makes them easy to configure. Some of those are:
50 | 
51 | * batching
52 | * (automatic) acknowledgements
53 | * graceful shutdown and draining
54 | * rate limiting
55 | * instrumentation
56 | 
57 | That all sounds fantastic. Let's dive in.
58 | 
59 | ## Broadway's Architecture
60 | 
61 | Broadway's architecture is made of three main components:
62 | 
63 | * a set of **producers** — these produce the messages that feed the pipeline
64 | * a set of **processors** — these do per-message processing
65 | * an (optional) set of **batch processors** — these, if present, work on batches of *processed* messages
66 | 
67 | In its simplest form, a Broadway pipeline is made of producers and processors.
68 | 
69 | ![](images/05-broadway_architecture.png)
70 | 
71 | ### Producers
72 | 
73 | A Broadway producer is a **GenStage producer** that emits `Broadway.Message` structs as its events. One of the coolest things about Broadway is that its ecosystem comes with several existing producers. These produce messages out of the most common message queues, databases, and more. Some of the existing producers are:
74 | 
75 | * [`broadway_kafka`](https://github.com/dashbitco/broadway_kafka)
76 | * [`broadway_rabbitmq`](https://github.com/dashbitco/broadway_rabbitmq)
77 | * [`broadway_sqs`](https://github.com/dashbitco/broadway_sqs)
78 | * [`broadway_cloud_pub_sub`](https://github.com/dashbitco/broadway_cloud_pub_sub)
79 | 
80 | ## Pipeline Example
81 | 
82 | Let's start with an example of a simple Broadway pipeline.
We'll use the RabbitMQ producer. Our pipeline will consume messages from a RabbitMQ queue and, guess what, print them to standard output! 😀 Read [the RabbitMQ section above](#before-we-start) to make sure you have RabbitMQ running locally.
83 | 
84 | ```elixir
85 | defmodule RabbitMQPrinterPipeline do
86 |   use Broadway
87 | 
88 |   def start_link do
89 |     producer_opts = [
90 |       # The queue to consume from.
91 |       queue: "my_queue",
92 |       on_failure: :reject_and_requeue
93 |     ]
94 | 
95 |     Broadway.start_link(__MODULE__,
96 |       name: __MODULE__,
97 |       producer: [
98 |         module: {BroadwayRabbitMQ.Producer, producer_opts},
99 |         concurrency: 1
100 |       ],
101 |       processors: [
102 |         default: [concurrency: 2]
103 |       ]
104 |     )
105 |   end
106 | 
107 |   # The only callback you need to process messages.
108 |   @impl true
109 |   def handle_message(:default, message, _context) do
110 |     IO.inspect(message, label: "Message in processor #{inspect(self())}")
111 |     message
112 |   end
113 | end
114 | ```
115 | 
116 | Before running this, let's open a connection to RabbitMQ using the [AMQP library](https://github.com/pma/amqp). We also declare an exchange and a queue, and bind the queue to the exchange. Take a look at [CloudAMQP's RabbitMQ introduction](https://www.cloudamqp.com/blog/part1-rabbitmq-for-beginners-what-is-rabbitmq.html) if you want to learn more about RabbitMQ concepts.
117 | 
118 | ```elixir
119 | {:ok, conn} = AMQP.Connection.open()
120 | {:ok, channel} = AMQP.Channel.open(conn)
121 | :ok = AMQP.Exchange.declare(channel, "my_exchange", :topic)
122 | {:ok, _info} = AMQP.Queue.declare(channel, "my_queue")
123 | :ok = AMQP.Queue.bind(channel, "my_queue", "my_exchange", routing_key: "print.*")
124 | 
125 | # Publish messages
126 | :ok = AMQP.Basic.publish(channel, "my_exchange", "print.this", "hello world!")
127 | :ok = AMQP.Basic.publish(channel, "my_exchange", "print.that", "this is Broadway speaking!")
128 | 
129 | :ok = AMQP.Channel.close(channel)
130 | :ok = AMQP.Connection.close(conn)
131 | ```
132 | 
133 | Okay, we're ready to rumble. We already published a couple of messages in the code above. These are now sitting in the `my_queue` queue in RabbitMQ. When we connect a consumer to it, RabbitMQ will deliver the messages to that consumer. We'll start our Broadway pipeline below, which does exactly that: it subscribes a consumer to the queue.
134 | 
135 | ```elixir
136 | # Start the pipeline
137 | {:ok, pipeline} = RabbitMQPrinterPipeline.start_link()
138 | 
139 | # Shut it down after 3 seconds
140 | Process.sleep(3_000)
141 | Broadway.stop(pipeline)
142 | ```
143 | 
144 | As you can see, we return the `Broadway.Message` struct itself from the `handle_message/3` callback. This functional approach is fantastic, because it lets us operate purely on data. It also lets us tell Broadway when something went wrong, for example by returning a "failed" message using [`Broadway.Message.failed/2`](https://hexdocs.pm/broadway/Broadway.Message.html#failed/2).
145 | 
146 | When `handle_message/3` returns, Broadway **acknowledges** the message. What this means depends on the producer. In RabbitMQ's case, it means that Broadway does a RabbitMQ *ack* of the message, which in turn causes RabbitMQ itself to consider the message as consumed. For other producers, semantics might be slightly different.
147 | 
148 | If there's a crash in `handle_message/3`, Broadway will follow the `:on_failure` option we gave when starting the producer. In this case, it will reject the message and requeue it, using the RabbitMQ "nack" operation with `requeue: true`.
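For instance, here's a sketch of a drop-in replacement for the `handle_message/3` callback above that marks messages with invalid JSON as failed instead of crashing (it reuses the `Jason` dependency we installed in the setup block):

```elixir
@impl true
def handle_message(:default, message, _context) do
  case Jason.decode(message.data) do
    {:ok, decoded} ->
      Broadway.Message.put_data(message, decoded)

    {:error, _reason} ->
      # Broadway applies the :on_failure policy to failed messages.
      Broadway.Message.failed(message, "invalid JSON")
  end
end
```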
149 | 
150 | ## Batching
151 | 
152 | Broadway pipelines also support batching. You can declare different batchers when starting the Broadway pipeline. When you return a message from the `handle_message/3` callback, you can put a **batcher** term in it. Broadway uses this term to route the message to the right batcher. Let's see an example where we consume messages from RabbitMQ that contain a simple JSON payload like:
153 | 
154 | ```json
155 | {"value": 42}
156 | ```
157 | 
158 | We'll get the `"value"` key and batch it based on whether it's even or odd.
159 | 
160 | ```elixir
161 | defmodule RabbitMQBatchedPipeline do
162 |   use Broadway
163 | 
164 |   require Integer
165 | 
166 |   def start_link do
167 |     producer_opts = [
168 |       queue: "my_queue",
169 |       on_failure: :reject_and_requeue
170 |     ]
171 | 
172 |     Broadway.start_link(__MODULE__,
173 |       name: __MODULE__,
174 |       producer: [
175 |         module: {BroadwayRabbitMQ.Producer, producer_opts},
176 |         concurrency: 1
177 |       ],
178 |       processors: [
179 |         default: [concurrency: 2]
180 |       ],
181 |       batchers: [
182 |         odd: [concurrency: 1, batch_size: 3],
183 |         even: [concurrency: 1, batch_size: 3]
184 |       ]
185 |     )
186 |   end
187 | 
188 |   @impl true
189 |   def handle_message(:default, message, _context) do
190 |     %{"value" => value} = Jason.decode!(message.data)
191 |     message = Broadway.Message.put_data(message, value)
192 | 
193 |     if Integer.is_even(value) do
194 |       Broadway.Message.put_batcher(message, :even)
195 |     else
196 |       Broadway.Message.put_batcher(message, :odd)
197 |     end
198 |   end
199 | 
200 |   @impl true
201 |   def handle_batch(batcher, messages, _batch_info, _context) do
202 |     messages
203 |     |> Enum.map(& &1.data)
204 |     |> IO.inspect(label: "Batch of messages in #{inspect(batcher)} batcher")
205 | 
206 |     messages
207 |   end
208 | end
209 | ```
210 | 
211 | Let's run this code. We'll also visualize the supervision tree started by Broadway at the end, which is pretty cool and gives you an idea of what's going on under the hood.
212 | 
213 | ```elixir
214 | {:ok, conn} = AMQP.Connection.open()
215 | {:ok, channel} = AMQP.Channel.open(conn)
216 | :ok = AMQP.Exchange.declare(channel, "my_exchange", :topic)
217 | {:ok, _info} = AMQP.Queue.declare(channel, "my_queue")
218 | :ok = AMQP.Queue.bind(channel, "my_queue", "my_exchange", routing_key: "print.*")
219 | 
220 | # Publish messages
221 | for int <- 500..520 do
222 |   :ok = AMQP.Basic.publish(channel, "my_exchange", "print.this", ~s({"value": #{int}}))
223 | end
224 | 
225 | # Start the pipeline
226 | {:ok, pipeline} = RabbitMQBatchedPipeline.start_link()
227 | 
228 | # Shut it down after 5 seconds
229 | Task.start(fn ->
230 |   Process.sleep(5_000)
231 |   Broadway.stop(pipeline)
232 | end)
233 | 
234 | # Visualize the supervision tree started by Broadway.
235 | VisualHelpers.broadway_sup_tree(pipeline)
236 | ```
237 | 
238 | Batching works on **batch size** plus **batch timeout**. If a batch reaches the configured size, it gets handed to the batcher. If it doesn't reach the configured size within a configurable timeout, it gets handed to the batcher anyway when the timeout expires, as we can see in the example above with the last two batches.
-------------------------------------------------------------------------------- /LICENSE.txt: --------------------------------------------------------------------------------
1 | Copyright 2022 Andrea Leopardi
2 | 
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at 6 | 7 | http://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Asynchronous Processing in Elixir 🏃 2 | 3 | ![Cover image of some abstract lights][cover-image] 4 | 5 | This is a short interactive guide to asynchronous data processing in Elixir. It 6 | uses [Livebook][livebook] to show interactive Elixir snippets that you can run 7 | on your own machine. 8 | 9 | ## How Do I Use This? 10 | 11 | There are a handful of livebooks in this guide. Using the badges below, you can 12 | import them to your computer and run and explore them there. My recommendation 13 | is to use the Livebook desktop app that you can find [on its website][livebook]. 14 | 15 | ### 01 — Processes 16 | 17 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F01-processes.livemd) 18 | 19 | In this livebook we talk about the basics of processes and message passing. 20 | 21 | ### 02 — Tasks 22 | 23 | Here, we talk about the `Task` abstraction that ships in Elixir's standard 24 | library. 25 | 26 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F02-tasks.livemd) 27 | 28 | ### 03 — GenStage 29 | 30 | [GenStage] is an Elixir library maintained by the Elixir core team. It lets you 31 | build pipelines of stages through which events flow. 32 | 33 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F03-genstage.livemd) 34 | 35 | ### 04 — Flow 36 | 37 | [Flow] is another Elixir library maintained by the core team. It builds on top 38 | of GenStage to provide an API similar to `Enum` and `Stream`, but for parallel 39 | data processing. 40 | 41 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F04-flow.livemd) 42 | 43 | ### 05 — Broadway 44 | 45 | [Broadway] lets you build declarative data ingestion and processing pipelines. 46 | It supports several sources (RabbitMQ, Kafka, AWS SQS, and more). 47 | 48 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatyouhide%2Fguide_async_processing_in_elixir%2Fmain%2F05-broadway.livemd) 49 | 50 | ## License 51 | 52 | See [the license file](./LICENSE.txt). 53 | 54 | "Social preview" photo by [Bofu Shaw](https://unsplash.com/@hikeshaw?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/speed?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText). 
55 | 56 | [livebook]: https://livebook.dev 57 | [GenStage]: https://github.com/elixir-lang/gen_stage 58 | [Flow]: https://github.com/elixir-lang/flow 59 | [Broadway]: https://elixir-broadway.org 60 | [cover-image]: https://user-images.githubusercontent.com/3890250/182093532-159e5bcc-dcd7-40d7-9030-da914f3db0bb.jpg 61 | -------------------------------------------------------------------------------- /images/00-introduction_avatar_and_name.png.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/00-introduction_avatar_and_name.png.png -------------------------------------------------------------------------------- /images/03-genstage_architetcture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/03-genstage_architetcture.png -------------------------------------------------------------------------------- /images/03-genstage_demand.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/03-genstage_demand.png -------------------------------------------------------------------------------- /images/04-flow_partitioning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/04-flow_partitioning.png -------------------------------------------------------------------------------- /images/05-broadway_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/whatyouhide/guide_async_processing_in_elixir/6dc6f71d1e01db64dc7d55ca1b0a89cef74356b9/images/05-broadway_architecture.png --------------------------------------------------------------------------------