├── .gitignore
├── README.md
├── config
    └── config.exs
├── lib
    ├── genstage_example.ex
    ├── genstage_example
    │   ├── consumer.ex
    │   ├── producer.ex
    │   ├── task.ex
    │   └── task_db_interface.ex
    └── repo.ex
├── mix.exs
├── mix.lock
├── priv
    └── repo
    │   └── migrations
    │       └── 20161023200119_add_tasks.exs
└── test
    └── test_helper.exs


/.gitignore:
--------------------------------------------------------------------------------
1 | # The directory Mix will write compiled artifacts to.
2 | /_build
3 |
4 | # If you run "mix test --cover", coverage assets end up here.
5 | /cover
6 |
7 | # The directory Mix downloads your dependencies sources to.
8 | /deps
9 |
10 | # Where 3rd-party dependencies like ExDoc output generated docs.
11 | /doc
12 |
13 | # If the VM crashes, it generates a dump, let's ignore it too.
14 | erl_crash.dump
15 |
16 | # Also ignore archive artifacts (built via "mix archive.build").
17 | *.ez
18 |
19 | *.sw*
20 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # GenStage Tutorial
2 |
3 | ## Introduction
4 | So what is GenStage? From the official documentation, it is a "specification and computational flow for Elixir", but what does that mean to us?
5 | That description is vague and covers a lot, so here we'll dive in and build something on top of it to understand its goals.
6 | We could go into the technical and theoretical implications of this, but instead let's try a pragmatic approach and just get it to work.
7 |
8 | First, let's imagine we have a server that constantly emits numbers.
9 | It starts at the number we give it, then counts up by one from there onward.
10 | This is what we would call our producer.
11 | Each time it emits a number, this is an event, and we want to handle it with a consumer.
12 | A consumer simply takes what a producer emits and does something to it.
13 | In our case, we will display the count.
14 | There is a lot more to GenStage at a technical and applied level, but we will build up the specifics and definitions in later lessons. For now we just want a running example we can build on.
15 |
16 | ## Getting Started: A Sample GenStage Project
17 | We'll begin by generating a simple project that has a supervision tree:
18 |
19 | ```shell
20 | $ mix new genstage_example --sup
21 | $ cd genstage_example
22 | ```
23 |
24 | Let's set up some basic things for the future of our application.
25 | Since GenStage is generally used as a transformation pipeline, let's imagine we have a background worker of some sort.
26 | This worker will need to persist whatever it changes, so we should get a database set up, but we can worry about that in a later lesson.
27 | To start, all we need to do is add `gen_stage` to our deps in `mix.exs`.
28 |
29 | ```elixir
30 | . . .
31 |   defp deps do
32 |     [
33 |       {:gen_stage, "~> 0.7"},
34 |     ]
35 |   end
36 | . . .
37 | ```
38 |
39 | Now, we should fetch our dependencies and compile before we start setup:
40 |
41 | ```shell
42 | $ mix do deps.get, compile
43 | ```
44 |
45 | Let's build a producer, our first building block for putting this new tool to use!
46 |
47 | ## Building A Producer
48 | To get started, we want to create a producer that emits a constant stream of events for our consumer to handle.
49 | This is quite simple with the rudimentary example of a counter.
50 | Let's create a namespaced directory under `lib` and go from there, so that our file layout matches our module names:
51 |
52 | ```shell
53 | $ mkdir lib/genstage_example
54 | $ touch lib/genstage_example/producer.ex
55 | ```
56 |
57 | Now we can add the code:
58 |
59 | ```elixir
60 | defmodule GenstageExample.Producer do
61 |   alias Experimental.GenStage
62 |   use GenStage
63 |
64 |   def start_link do
65 |     GenStage.start_link(__MODULE__, 0, name: __MODULE__)
66 |     # naming allows us to handle failure
67 |   end
68 |
69 |   def init(counter) do
70 |     {:producer, counter}
71 |   end
72 |
73 |   def handle_demand(demand, state) do
74 |     # the default demand is 1000
75 |     events = Enum.to_list(state..state + demand - 1)
76 |     # e.g. [0, 1, ..., 999] on the first call
77 |     # the list is non-empty, so these events are emitted immediately
78 |     {:noreply, events, (state + demand)}
79 |   end
80 | end
81 | ```
82 |
83 | Let's break this down line by line.
84 | To begin with, we have our initial declarations:
85 |
86 | ```elixir
87 | . . .
88 | defmodule GenstageExample.Producer do
89 |   alias Experimental.GenStage
90 |   use GenStage
91 | . . .
92 | ```
93 |
94 | What this does is a couple of simple things.
95 | First, we declare our module, and soon after we alias `Experimental.GenStage`.
96 | This is simply because we will be calling it more than once, and the alias makes that more convenient.
97 | The `use GenStage` line is much akin to `use GenServer`.
98 | This line brings in the default behaviour and functions, saving us from a large amount of boilerplate.
99 |
100 | If we go further, we see the first two primary functions for startup:
101 |
102 | ```elixir
103 | . . .
104 |   def start_link do
105 |     GenStage.start_link(__MODULE__, 0, name: __MODULE__)
106 |   end
107 |
108 |   def init(counter) do
109 |     {:producer, counter}
110 |   end
111 | . . .
112 | ```
113 |
114 | These two functions offer a very simple start.
115 | First, we have our standard `start_link/0` function.
116 | Inside it, we call `GenStage.start_link/3`, beginning with the argument `__MODULE__`, which tells GenStage that the callbacks live in our current module.
117 | Next, we pass the initial argument, which `init/1` receives as the starting value of our counter (`0` in the full module above).
118 | The `name: __MODULE__` option in the full module registers the process under the module's name, like any other named process.
119 | This is what lets our consumer subscribe to `GenstageExample.Producer` by name later on.
120 | In `init/1` we simply set the counter as our state, and label ourselves as a producer.
121 |
122 | Finally, we get to the real meat of our code's functionality:
123 |
124 | ```elixir
125 | . . .
126 |   def handle_demand(demand, state) do
127 |     events = Enum.to_list(state..state + demand - 1)
128 |     {:noreply, events, (state + demand)}
129 |   end
130 | . . .
131 | ```
132 |
133 | `handle_demand/2` must be implemented by every producer-type GenStage module.
134 | In this case, we emit exactly `demand` consecutive integers starting at the current counter, then store `state + demand` as the next counter.
135 | This might not make a ton of sense until we build our consumer, so let's move on to that now.
136 |
137 | ## Building A Consumer
138 | The consumer will handle the events that are broadcast by our producer.
139 | For now, we won't worry about things like broadcast strategies, or what the internals are truly doing.
140 | We'll start by showing all the code and then break it down.
141 |
142 | ```elixir
143 | defmodule GenstageExample.Consumer do
144 |   alias Experimental.GenStage
145 |   use GenStage
146 |
147 |   def start_link do
148 |     GenStage.start_link(__MODULE__, :state_doesnt_matter)
149 |   end
150 |
151 |   def init(state) do
152 |     {:consumer, state, subscribe_to: [GenstageExample.Producer]}
153 |   end
154 |
155 |   def handle_events(events, _from, state) do
156 |     for event <- events do
157 |       IO.inspect {self(), event, state}
158 |     end
159 |     {:noreply, [], state}
160 |   end
161 | end
162 | ```
163 |
164 | To start, let's look at the beginning functions just like last time:
165 |
166 | ```elixir
167 | defmodule GenstageExample.Consumer do
168 |   alias Experimental.GenStage
169 |   use GenStage
170 |
171 |   def start_link do
172 |     GenStage.start_link(__MODULE__, :state_doesnt_matter)
173 |   end
174 |
175 |   def init(state) do
176 |     {:consumer, state, subscribe_to: [GenstageExample.Producer]}
177 |   end
178 | . . .
179 | ```
180 |
181 | To begin, much like in our producer, we set up our `start_link/0` and `init/1` functions.
182 | In `start_link/0` we simply pass the module and an initial state, like last time; unlike the producer, we don't register a name.
183 | The state is arbitrary for the consumer, and can be literally whatever we please, in this case `:state_doesnt_matter`.
184 |
185 | In `init/1` we simply take the state and set up our expected tuple.
186 | It expects us to give the `:consumer` atom first, then the state.
187 | Our `subscribe_to` option is optional.
188 | What this does is subscribe us to our producer module.
189 | The reason for doing it here is that if the consumer crashes and is restarted, `init/1` runs again, so it re-subscribes and resumes receiving emitted events.
190 |
191 | ```elixir
192 | . . .
193 |   def handle_events(events, _from, state) do
194 |     for event <- events do
195 |       IO.inspect {self(), event, state}
196 |     end
197 |     {:noreply, [], state}
198 |   end
199 | . . .
200 | ```
201 |
202 | This is the meat of our consumer, `handle_events/3`.
203 | `handle_events/3` must be implemented by any `consumer` or `producer_consumer` type of GenStage module.
204 | What this does for us is quite simple.
205 | We take a list of events, and iterate through these.
206 | From there, we inspect the `pid` of our consumer, the event (in this case the current count), and the state.
207 | After that, we return `:noreply`: because we are a consumer we don't emit events, so the second element of the tuple is an empty list, and we simply pass on the state.
208 |
209 | ## Wiring It Together
210 | To get all of this to work we only have to make one simple change.
211 | Open up `lib/genstage_example.ex` and add them as workers so that they start automatically with our application:
212 |
213 | ```elixir
214 | . . .
215 |     children = [
216 |       worker(GenstageExample.Producer, []),
217 |       worker(GenstageExample.Consumer, []),
218 |     ]
219 | . . .
220 | ```
221 |
222 | With this, if things are all correct, we can run IEx and we should see everything working:
223 |
224 | ```elixir
225 | iex(1)> {#PID<0.205.0>, 0, :state_doesnt_matter}
226 | {#PID<0.205.0>, 1, :state_doesnt_matter}
227 | {#PID<0.205.0>, 2, :state_doesnt_matter}
228 | {#PID<0.205.0>, 3, :state_doesnt_matter}
229 | {#PID<0.205.0>, 4, :state_doesnt_matter}
230 | {#PID<0.205.0>, 5, :state_doesnt_matter}
231 | {#PID<0.205.0>, 6, :state_doesnt_matter}
232 | {#PID<0.205.0>, 7, :state_doesnt_matter}
233 | {#PID<0.205.0>, 8, :state_doesnt_matter}
234 | {#PID<0.205.0>, 9, :state_doesnt_matter}
235 | {#PID<0.205.0>, 10, :state_doesnt_matter}
236 | {#PID<0.205.0>, 11, :state_doesnt_matter}
237 | {#PID<0.205.0>, 12, :state_doesnt_matter}
238 | . . .
239 | ```
240 |
241 | ## Tinkering: For Science and Understanding
242 | From here, we have a working flow.
243 | There is a producer emitting our counter, and our consumer is displaying all of this and continuing the flow.
244 | Now, what if we wanted multiple consumers?
245 | Right now, if we examine the `IO.inspect/1` output, we see that every single event is handled by a single PID.
246 | This isn't very Elixir-y.
247 | We have massive concurrency built in, so we should leverage it as much as possible.
248 | Let's make some adjustments so that we can have multiple workers by modifying `lib/genstage_example.ex`:
249 |
250 | ```elixir
251 | . . .
252 |     children = [
253 |       worker(GenstageExample.Producer, []),
254 |       worker(GenstageExample.Consumer, [], id: 1),
255 |       worker(GenstageExample.Consumer, [], id: 2),
256 |     ]
257 | . . .
258 | ```
259 |
260 | Now, let's fire up IEx again:
261 |
262 | ```elixir
263 | $ iex -S mix
264 | iex(1)> {#PID<0.205.0>, 0, :state_doesnt_matter}
265 | {#PID<0.205.0>, 1, :state_doesnt_matter}
266 | {#PID<0.205.0>, 2, :state_doesnt_matter}
267 | {#PID<0.207.0>, 3, :state_doesnt_matter}
268 | . . .
269 | ```
270 |
271 | As you can see, we have multiple PIDs now, simply by adding a line of code and giving our consumers IDs.
272 | But we can take this even further:
273 |
274 | ```elixir
275 | . . .
276 |     children = [
277 |       worker(GenstageExample.Producer, []),
278 |     ]
279 |     consumers = for id <- 1..(System.schedulers_online() * 12) do
280 |       # schedulers_online/0 normally matches the number of cores on the machine
281 |       worker(GenstageExample.Consumer, [], id: id)
282 |     end
283 |
284 |     opts = [strategy: :one_for_one, name: GenstageExample.Supervisor]
285 |     Supervisor.start_link(children ++ consumers, opts)
286 | . . .
287 | ```
288 |
289 | What we are doing here is quite simple.
290 | First, we get the number of cores on the machine with `System.schedulers_online/0`, and from there we create 12 workers per core, each just like the ones we had.
291 | With 12 consumers per core, events are now handled concurrently rather than by a single process.
292 |
293 | ```elixir
294 | . . .
295 | {#PID<0.229.0>, 63697, :state_doesnt_matter}
296 | {#PID<0.224.0>, 53190, :state_doesnt_matter}
297 | {#PID<0.223.0>, 72687, :state_doesnt_matter}
298 | {#PID<0.238.0>, 69688, :state_doesnt_matter}
299 | {#PID<0.196.0>, 62696, :state_doesnt_matter}
300 | {#PID<0.212.0>, 52713, :state_doesnt_matter}
301 | {#PID<0.233.0>, 72175, :state_doesnt_matter}
302 | {#PID<0.214.0>, 51712, :state_doesnt_matter}
303 | {#PID<0.227.0>, 66190, :state_doesnt_matter}
304 | {#PID<0.234.0>, 58694, :state_doesnt_matter}
305 | {#PID<0.211.0>, 55694, :state_doesnt_matter}
306 | {#PID<0.240.0>, 64698, :state_doesnt_matter}
307 | {#PID<0.193.0>, 50692, :state_doesnt_matter}
308 | {#PID<0.207.0>, 56683, :state_doesnt_matter}
309 | {#PID<0.213.0>, 71684, :state_doesnt_matter}
310 | {#PID<0.235.0>, 53712, :state_doesnt_matter}
311 | {#PID<0.208.0>, 51197, :state_doesnt_matter}
312 | {#PID<0.200.0>, 61689, :state_doesnt_matter}
313 | . . .
314 | ```
315 |
316 | We no longer get the ordering we had with a single consumer, but every increment is still being hit and processed.
317 |
318 | We can take this a step further and change our broadcasting strategy from the default in our producer:
319 |
320 | ```elixir
321 | . . .
322 |   def init(counter) do
323 |     {:producer, counter, dispatcher: GenStage.BroadcastDispatcher}
324 |   end
325 | . . .
326 | ```
327 |
328 | This dispatcher accumulates demand from all consumers before broadcasting its events to all of them.
329 | If we fire up IEx we can see the implication:
330 |
331 | ```elixir
332 | . . .
333 | {#PID<0.200.0>, 1689, :state_doesnt_matter}
334 | {#PID<0.230.0>, 1690, :state_doesnt_matter}
335 | {#PID<0.196.0>, 1679, :state_doesnt_matter}
336 | {#PID<0.215.0>, 1683, :state_doesnt_matter}
337 | {#PID<0.237.0>, 1687, :state_doesnt_matter}
338 | {#PID<0.205.0>, 1682, :state_doesnt_matter}
339 | {#PID<0.206.0>, 1695, :state_doesnt_matter}
340 | {#PID<0.216.0>, 1682, :state_doesnt_matter}
341 | {#PID<0.217.0>, 1689, :state_doesnt_matter}
342 | {#PID<0.233.0>, 1681, :state_doesnt_matter}
343 | {#PID<0.223.0>, 1689, :state_doesnt_matter}
344 | {#PID<0.193.0>, 1194, :state_doesnt_matter}
345 | . . .
346 | ```
347 |
348 | Note that some numbers now show up more than once: with `BroadcastDispatcher`, every consumer receives every event.
349 |
350 |
351 | ## Setting Up Postgres to Extend Our Producer
352 | To go further we'll need to bring in a database to store our progress and status.
353 | This is quite simple using [Ecto](LINKTOLESSON).
354 | To get started let's add it and the PostgreSQL adapter to `mix.exs`:
355 |
356 | ```elixir
357 | . . .
358 |   defp deps do
359 |     [
360 |       {:gen_stage, "~> 0.7"},
361 |       {:ecto, "~> 2.0"},
362 |       {:postgrex, "~> 0.12.1"},
363 |     ]
364 |   end
365 | . . .
366 |
367 | ```
368 |
369 | Fetch the dependencies and compile:
370 |
371 | ```shell
372 | $ mix do deps.get, compile
373 | ```
374 |
375 | And now we can add a repo for setup in `lib/repo.ex`:
376 |
377 | ```elixir
378 | defmodule GenstageExample.Repo do
379 |   use Ecto.Repo,
380 |     otp_app: :genstage_example
381 | end
382 | ```
383 |
384 | With this in place, we can set up our config next in `config/config.exs`:
385 |
386 | ```elixir
387 | use Mix.Config
388 |
389 | config :genstage_example, ecto_repos: [GenstageExample.Repo]
390 |
391 | config :genstage_example, GenstageExample.Repo,
392 |   adapter: Ecto.Adapters.Postgres,
393 |   database: "genstage_example",
394 |   username: "your_username",
395 |   password: "your_password",
396 |   hostname: "localhost",
397 |   port: "5432"
398 | ```
399 |
400 | And if we add a supervisor to `lib/genstage_example.ex` we can now start working with the DB:
401 |
402 | ```elixir
403 | . . .
404 |   def start(_type, _args) do
405 |     import Supervisor.Spec, warn: false
406 |
407 |     children = [
408 |       supervisor(GenstageExample.Repo, []),
409 |       worker(GenstageExample.Producer, []),
410 |     ]
411 |   end
412 | . . .
413 | ```
414 |
415 | But we should also make an interface to do that, so let's import our query interface and repo into the producer:
416 |
417 | ```elixir
418 | . . .
419 |   import Ecto.Query
420 |   import GenstageExample.Repo
421 | . . .
422 | ```
423 |
424 | Now we need to create our migration:
425 |
426 | ```shell
427 | $ mix ecto.gen.migration setup_tasks status:text payload:binary
428 | ```
429 |
430 | Now that we have a functional database, we can start storing things.
431 | Let's revert the dispatcher change in our producer, as we only made it to demonstrate that there are dispatchers other than the default.
432 |
433 | ```elixir
434 | . . .
435 |   def init(counter) do
436 |     {:producer, counter}
437 |   end
438 | . . .
439 | ```
440 |
441 | ### Modelling the Rest of the Functionality
442 |
443 | Now that the boilerplate is done and we have a simple producer/consumer pair wired together, we should come up with a model for what it will actually run.
444 | At the end of the day we are trying to make a task runner.
445 | To do this, we probably want to abstract the interface for tasks and DB interfacing into their own modules.
446 | To start, let's create our `Task` module to model our actual tasks to be run:
447 |
448 | ```elixir
449 | defmodule GenstageExample.Task do
450 |   def enqueue(status, payload) do
451 |     GenstageExample.TaskDBInterface.insert_tasks(status, payload)
452 |   end
453 |
454 |   def take(limit) do
455 |     GenstageExample.TaskDBInterface.take_tasks(limit)
456 |   end
457 | end
458 | ```
459 |
460 | This is a _really_ simple interface to abstract a given task's functionality.
461 | We only have 2 functions.
462 | Although the module they call doesn't exist yet, this gives us the shape we need to build a very simple interface.
463 | These can be broken down as follows:
464 |
465 | 1. `enqueue/2` - Enqueue a task to be run
466 | 2. `take/1` - Take a given number of tasks to run from the database
467 |
468 | This gives us the interface we need: we can enqueue tasks to be run and grab tasks to run. Now we can define the rest.
469 | Let's create an interface with our database in its own module:
470 |
471 | ```elixir
472 | defmodule GenstageExample.TaskDBInterface do
473 |   import Ecto.Query
474 |
475 |   def take_tasks(limit) do
476 |     {:ok, {count, events}} =
477 |       GenstageExample.Repo.transaction fn ->
478 |         ids = GenstageExample.Repo.all waiting(limit)
479 |         GenstageExample.Repo.update_all by_ids(ids), [set: [status: "running"]], [returning: [:id, :payload]]
480 |       end
481 |     {count, events}
482 |   end
483 |
484 |   def insert_tasks(status, payload) do
485 |     GenstageExample.Repo.insert_all "tasks", [
486 |       %{status: status, payload: payload}
487 |     ]
488 |   end
489 |
490 |   def update_task_status(id, status) do
491 |     GenstageExample.Repo.update_all by_ids([id]), set: [status: status]
492 |   end
493 |
494 |   defp by_ids(ids) do
495 |     from t in "tasks", where: t.id in ^ids
496 |   end
497 |
498 |   defp waiting(limit) do
499 |     from t in "tasks",
500 |       where: t.status == "waiting",
501 |       limit: ^limit,
502 |       select: t.id,
503 |       lock: "FOR UPDATE SKIP LOCKED"
504 |   end
505 | end
506 | ```
507 |
508 | This one is a bit more complex, but we'll break it down piece by piece.
509 | We have 3 main functions, and 2 private helpers:
510 |
511 | #### Main Functions
512 | 1. `take_tasks/1`
513 | 2. `insert_tasks/2`
514 | 3. `update_task_status/2`
515 |
516 | With `take_tasks/1` we have the bulk of our logic.
517 | This function will be called to grab tasks we have queued to run them.
518 | Let's look at the code:
519 |
520 | ```elixir
521 | . . .
522 |   def take_tasks(limit) do
523 |     {:ok, {count, events}} =
524 |       GenstageExample.Repo.transaction fn ->
525 |         ids = GenstageExample.Repo.all waiting(limit)
526 |         GenstageExample.Repo.update_all by_ids(ids), [set: [status: "running"]], [returning: [:id, :payload]]
527 |       end
528 |     {count, events}
529 |   end
530 | . . .
531 | ```
532 |
533 | We do a few things here.
534 | First, we go in and we wrap everything in a transaction.
535 | This keeps the select and the update atomic, so two callers can't claim the same tasks.
536 | Inside here, we get the ids of all tasks waiting to be executed up to some limit, and set them to `running` as their status.
537 | We return the `count` of tasks we claimed and the events to be run in the consumer.
538 |
539 | Next we have `insert_tasks/2`:
540 |
541 | ```elixir
542 | . . .
543 |   def insert_tasks(status, payload) do
544 |     GenstageExample.Repo.insert_all "tasks", [
545 |       %{status: status, payload: payload}
546 |     ]
547 |   end
548 | . . .
549 | ```
550 |
551 | This one is a bit simpler.
552 | We just insert a task to be run with a given payload binary.
553 |
554 | Finally, we have `update_task_status/2`, which is also quite simple:
555 |
556 | ```elixir
557 | . . .
558 |   def update_task_status(id, status) do
559 |     GenstageExample.Repo.update_all by_ids([id]), set: [status: status]
560 |   end
561 | . . .
562 | ```
563 |
564 | Here we simply update a task to the status we want, using a given id.
565 |
566 | #### Helpers
567 | Our helpers are called primarily inside of `take_tasks/1`, but `by_ids/1` is also used by `update_task_status/2`.
568 |
569 | ```elixir
570 | . . .
571 |   defp by_ids(ids) do
572 |     from t in "tasks", where: t.id in ^ids
573 |   end
574 |
575 |   defp waiting(limit) do
576 |     from t in "tasks",
577 |       where: t.status == "waiting",
578 |       limit: ^limit,
579 |       select: t.id,
580 |       lock: "FOR UPDATE SKIP LOCKED"
581 |   end
582 | . . .
583 | ```
584 |
585 | Neither of these has a ton of complexity.
586 | `by_ids/1` simply grabs all tasks that match in a given list of IDs.
587 |
588 | `waiting/1` finds all tasks that have the status `waiting`, up to a given limit.
589 | However, there is one note to make on `waiting/1`.
590 | We lock the selected rows with `FOR UPDATE SKIP LOCKED`, which skips rows already locked by another transaction, a feature available in PostgreSQL 9.5+.
591 | Outside of this, it is a very simple `SELECT` statement.
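
To make the contract of `take_tasks/1` concrete without a database, here is a minimal in-memory sketch. `TakeSketch` and its `take/2` function are hypothetical stand-ins, not part of the project: they assume tasks are plain maps with `:id`, `:status`, and `:payload` keys, and they only mimic the select-then-update step, not the row locking that `FOR UPDATE SKIP LOCKED` provides.

```elixir
defmodule TakeSketch do
  # Hypothetical in-memory stand-in for TaskDBInterface.take_tasks/1.
  # Returns {{count, events}, updated_tasks}; the events have the shape
  # the real query returns: maps with :id and :payload.
  def take(tasks, limit) do
    # Like waiting/1: ids of up to `limit` tasks whose status is "waiting".
    ids =
      tasks
      |> Enum.filter(&(&1.status == "waiting"))
      |> Enum.take(limit)
      |> Enum.map(& &1.id)

    # Like update_all with by_ids/1: mark the claimed tasks as "running".
    updated =
      Enum.map(tasks, fn task ->
        if task.id in ids, do: %{task | status: "running"}, else: task
      end)

    # Like `returning: [:id, :payload]`: only these two fields come back.
    events =
      updated
      |> Enum.filter(&(&1.id in ids))
      |> Enum.map(&Map.take(&1, [:id, :payload]))

    {{length(events), events}, updated}
  end
end

tasks = [
  %{id: 1, status: "waiting", payload: "a"},
  %{id: 2, status: "running", payload: "b"},
  %{id: 3, status: "waiting", payload: "c"},
  %{id: 4, status: "waiting", payload: "d"}
]

{{count, events}, tasks_after} = TakeSketch.take(tasks, 2)
IO.inspect({count, events})
```

With a limit of 2, the sketch claims tasks 1 and 3 and leaves task 4 waiting for the next call. The `%{id: ..., payload: ...}` maps in `events` are the same shape the consumer will later pattern match on.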
592 |
593 | Now that we have our DB interface defined as it is used in the primary API, we can move on to the producer, consumer, and last bits of configuration.
594 |
595 | ### Producer, Consumer, and Final Configuration
596 |
597 | #### Final Config
598 | We will need to do a bit of configuration in `lib/genstage_example.ex` to clarify things as well as give us the final functionality we will need to run jobs.
599 | This is what we will end up with:
600 |
601 | ```elixir
602 | . . .
603 |   def start(_type, _args) do
604 |     import Supervisor.Spec, warn: false
605 |     # 12 workers / system core
606 |     consumers = for id <- 1..(System.schedulers_online() * 12) do
607 |       worker(GenstageExample.Consumer, [], id: id)
608 |     end
609 |     producers = [
610 |       worker(Producer, []),
611 |     ]
612 |
613 |     supervisors = [
614 |       supervisor(GenstageExample.Repo, []),
615 |       supervisor(Task.Supervisor, [[name: GenstageExample.TaskSupervisor]]),
616 |     ]
617 |     children = supervisors ++ producers ++ consumers
618 |
619 |     opts = [strategy: :one_for_one, name: GenstageExample.Supervisor]
620 |     Supervisor.start_link(children, opts)
621 |   end
622 |
623 |   def start_later(module, function, args) do
624 |     payload = {module, function, args} |> :erlang.term_to_binary()
625 |     Repo.insert_all("tasks", [
626 |       %{status: "waiting", payload: payload}
627 |     ])
628 |     notify_producer()
629 |   end
630 |
631 |   def notify_producer do
632 |     send(Producer, :data_inserted)
633 |   end
634 |
635 |   defdelegate enqueue(module, function, args), to: Producer
636 | . . .
637 | ```
638 |
639 | Let's tackle this from the top down.
640 | First, `start/2`:
641 |
642 | ```elixir
643 | . . .
644 |   def start(_type, _args) do
645 |     import Supervisor.Spec, warn: false
646 |     # 12 workers / system core
647 |     consumers = for id <- 1..(System.schedulers_online() * 12) do
648 |       worker(GenstageExample.Consumer, [], id: id)
649 |     end
650 |     producers = [
651 |       worker(Producer, []),
652 |     ]
653 |
654 |     supervisors = [
655 |       supervisor(GenstageExample.Repo, []),
656 |       supervisor(Task.Supervisor, [[name: GenstageExample.TaskSupervisor]]),
657 |     ]
658 |     children = supervisors ++ producers ++ consumers
659 |
660 |     opts = [strategy: :one_for_one, name: GenstageExample.Supervisor]
661 |     Supervisor.start_link(children, opts)
662 |   end
663 | . . .
664 | ```
665 | First of all, you will notice we are now defining producers, consumers, and supervisors separately.
666 | I find this convention to work quite well to illustrate the intentions of various processes and trees we are starting here.
667 | In these 3 lists we set up 12 consumers / CPU core, set up a single producer, and then our supervisors for the Repo, as well as one new one.
668 |
669 | This new supervisor is run through `Task.Supervisor`, which is built into Elixir.
670 | We give it a name, `GenstageExample.TaskSupervisor`, so it is easily referred to in our GenStage code.
671 | Now, we define our children as the concatenation of all these lists.
672 |
673 | Next, we have `start_later/3`:
674 |
675 | ```elixir
676 | . . .
677 |   def start_later(module, function, args) do
678 |     payload = {module, function, args} |> :erlang.term_to_binary()
679 |     Repo.insert_all("tasks", [
680 |       %{status: "waiting", payload: payload}
681 |     ])
682 |     notify_producer()
683 |   end
684 | . . .
685 | ```
686 | This function takes a module, a function, and a list of arguments.
687 | It then encodes them as a binary using Erlang's built-in `:erlang.term_to_binary/1`.
688 | From here, we then insert the task as `waiting`, and we notify a producer that a task has been inserted to run.
689 |
690 | Now let's check out `notify_producer/0`:
691 |
692 | ```elixir
693 | . . .
694 |   def notify_producer do
695 |     send(Producer, :data_inserted)
696 |   end
697 | . . .
698 | ```
699 |
700 | This function is quite simple.
701 | We send our producer a message, `:data_inserted`, simply so that it knows what we did.
702 | The atom itself is arbitrary, as long as the producer has a `handle_info/2` clause that matches it; I chose this one to make the meaning clear.
703 |
704 | Last, but not least we do some simple delegation:
705 |
706 | ```elixir
707 | . . .
708 |   defdelegate enqueue(module, function, args), to: Producer
709 | . . .
710 | ```
711 | This simply makes it so that calling `GenstageExample.enqueue(module, function, args)` is delegated to the function of the same name in our producer.
712 |
713 | ### Producer Setup
714 | Our producer doesn't need a ton of work.
715 | First, we'll alter our `handle_demand/2` to actually do something with our events:
716 |
717 | ```elixir
718 | . . .
719 |   def handle_demand(demand, state) when demand > 0 do
720 |     serve_jobs(demand + state)
721 |   end
722 | . . .
723 | ```
724 |
725 | We haven't defined `serve_jobs/1` yet, but we'll get there.
726 | The concept is simple: when we get a demand and the demand is > 0, we try to serve the new demand plus whatever demand is still outstanding in our state.
727 |
728 | Now that we will be sending a message to the producer when we run `start_later/3`, we will want to respond to it with a `handle_info/2` call:
729 |
730 | ```elixir
731 | . . .
732 |   def handle_info(:enqueued, state) do
733 |     {count, events} = GenstageExample.Task.take(state)
734 |     {:noreply, events, state - count}
735 |   end
736 | . . .
737 | ```
738 |
739 | With this, we simply respond by taking up to `state` tasks, the demand we still owe our consumers.
740 |
741 | Now let's define `serve_jobs/1`:
742 |
743 | ```elixir
744 | . . .
745 |   def serve_jobs(limit) do
746 |     {count, events} = GenstageExample.Task.take(limit)
747 |     Process.send_after(@name, :enqueued, 60_000)
748 |     {:noreply, events, limit - count}
749 |   end
750 | . . .
751 | ```
752 |
753 | Now, we schedule a message to our producer, to arrive in one minute, telling it to handle `:enqueued` again.
754 | Note that we address the process by `@name`, which we will need to add at the top as a module attribute:
755 |
756 | ```elixir
757 | . . .
758 |   @name __MODULE__
759 | . . .
760 | ```
761 |
762 | Let's define that last function to handle the `:enqueued` message now, too:
763 |
764 | ```elixir
765 | . . .
766 |   def handle_cast(:enqueued, state) do
767 |     serve_jobs(state)
768 |   end
769 | . . .
770 | ```
771 |
772 | This simply serves jobs when we tell the producer that tasks have been enqueued, using `state` as the number of tasks it may take.
773 |
774 | ## Setting Up the Consumer for Real Work
775 | Our consumer is where we do the work.
776 | Now that we have our producer storing tasks, we want to have the consumer handle this as well.
777 | There is a good bit of work to be done here tying into our work so far.
778 | The core of the consumer is `handle_events/3`, so let's flesh out the functionality we wish to have there and define it as we go further:
779 |
780 |
781 | ```elixir
782 | . . .
783 |   def handle_events(events, _from, state) do
784 |     for event <- events do
785 |       %{id: id, payload: payload} = event
786 |       {module, function, args} = payload |> deconstruct_payload
787 |       task = start_task(module, function, args)
788 |       yield_to_and_update_task(task, id)
789 |     end
790 |     {:noreply, [], state}
791 |   end
792 | . . .
793 | ```
794 |
795 | At its core, this setup simply wants to run the task encoded in each event's binary payload.
796 | To do this we get the data from the event, deconstruct it, and then start and yield to a task.
797 | These functions aren't defined yet, so let's create them:
798 |
799 |
800 | ```elixir
801 | . . .
802 |   def deconstruct_payload(payload) do
803 |     payload |> :erlang.binary_to_term()
804 |   end
805 | . . .
806 | ```
807 | We can use `:erlang.binary_to_term/1`, the built-in inverse of the `:erlang.term_to_binary/1` call we used earlier, to get our module, function, and args back out.
808 | Now we need to start the task:
809 |
810 | ```elixir
811 | . . .
812 |   def start_task(mod, func, args) do
813 |     Task.Supervisor.async_nolink(TaskSupervisor, mod, func, args)
814 |   end
815 | . . .
816 | ```
817 |
818 | Here we leverage the supervisor we created at the beginning to run this in a task.
819 | Now we need to define `yield_to_and_update_task/2`:
820 |
821 | ```elixir
822 | . . .
823 |   def yield_to_and_update_task(task, id) do
824 |     task
825 |     |> Task.yield(1000)
826 |     |> yield_to_status(task)
827 |     |> update(id)
828 |   end
829 | . . .
830 | ```
831 |
832 | Now this brings in more pieces we've yet to define, but the core is simple.
833 | We wait up to 1 second for the task to finish.
834 | From here, we respond to what `Task.yield/2` returns (which will be `{:ok, result}`, `{:exit, reason}`, or `nil`) and handle it as such.
835 | After that we update our task via our DB interface to get things current.
836 | Let's define `yield_to_status/2` for each of the scenarios we mentioned:
837 |
838 | ```elixir
839 | . . .
840 |   def yield_to_status({:ok, _}, _) do
841 |     "success"
842 |   end
843 |
844 |   def yield_to_status({:exit, _}, _) do
845 |     "error"
846 |   end
847 |
848 |   def yield_to_status(nil, task) do
849 |     Task.shutdown(task)
850 |     "timeout"
851 |   end
852 | . . .
853 | ```
854 | These simply match on the value returned by `Task.yield/2` and respond appropriately.
855 | If the task takes more than a second, we shut it down so it doesn't keep running after we have given up on it.
856 |
857 | If we make another function to update the database after consumption, we are set to go:
858 |
859 | ```elixir
860 | . . .
861 |   defp update(status, id) do
862 |     GenstageExample.TaskDBInterface.update_task_status(id, status)
863 |   end
864 | . . .
865 | ```
866 |
867 | And here we just call it through our database interface and update the status after yielding to allow the task time to run.
868 |
869 | From this, we can see our finalized consumer:
870 |
871 | ```elixir
872 | defmodule GenstageExample.Consumer do
873 |   alias Experimental.GenStage
874 |   use GenStage
875 |   alias GenstageExample.{Producer, TaskSupervisor}
876 |
877 |   def start_link do
878 |     GenStage.start_link(__MODULE__, :state_doesnt_matter)
879 |   end
880 |
881 |   def init(state) do
882 |     {:consumer, state, subscribe_to: [Producer]}
883 |   end
884 |
885 |   def handle_events(events, _from, state) do
886 |     for event <- events do
887 |       %{id: id, payload: payload} = event
888 |       {module, function, args} = payload |> deconstruct_payload
889 |       task = start_task(module, function, args)
890 |       yield_to_and_update_task(task, id)
891 |     end
892 |     {:noreply, [], state}
893 |   end
894 |
895 |   defp yield_to_and_update_task(task, id) do
896 |     task
897 |     |> Task.yield(1000)
898 |     |> yield_to_status(task)
899 |     |> update(id)
900 |   end
901 |
902 |   defp start_task(mod, func, args) do
903 |     Task.Supervisor.async_nolink(TaskSupervisor, mod, func, args)
904 |   end
905 |
906 |   defp yield_to_status({:ok, _}, _) do
907 |     "success"
908 |   end
909 |
910 |   defp yield_to_status({:exit, _}, _) do
911 |     "error"
912 |   end
913 |
914 |   defp yield_to_status(nil, task) do
915 |     Task.shutdown(task)
916 |     "timeout"
917 |   end
918 |
919 |   defp update(status, id) do
920 |     GenstageExample.TaskDBInterface.update_task_status(id, status)
921 |   end
922 |
923 |   defp deconstruct_payload(payload) do
924 |     payload |> :erlang.binary_to_term()
925 |   end
926 | end
927 | ```
928 |
929 | Now, if we go into IEx:
930 |
931 | ```elixir
932 | $ iex -S mix
933 | iex> GenstageExample.enqueue(IO, :puts, ["wuddup"])
934 | #=>
935 | 16:39:31.014 [debug] QUERY OK db=137.4ms
936 | INSERT INTO "tasks" ("payload","status") VALUES ($1,$2) [<<131, 104, 3, 100, 0, 9, 69, 108, 105, 120, 105, 114, 46, 73, 79, 100, 0, 4, 112,
117, 116, 115, 108, 0, 0, 0, 1, 109, 0, 0, 0, 6, 119, 117, 100, 100, 117, 112, 106>>, "waiting"] 937 | :ok 938 | 939 | 16:39:31.015 [debug] QUERY OK db=0.4ms queue=0.1ms 940 | begin [] 941 | 942 | 16:39:31.025 [debug] QUERY OK source="tasks" db=9.6ms 943 | SELECT t0."id" FROM "tasks" AS t0 WHERE (t0."status" = 'waiting') LIMIT $1 FOR UPDATE SKIP LOCKED [49000] 944 | 945 | 16:39:31.026 [debug] QUERY OK source="tasks" db=0.8ms 946 | UPDATE "tasks" AS t0 SET "status" = $1 WHERE (t0."id" = ANY($2)) RETURNING t0."id", t0."payload" ["running", [5]] 947 | 948 | 16:39:31.040 [debug] QUERY OK db=13.5ms 949 | commit [] 950 | iex(2)> wuddup 951 | 952 | 16:39:31.060 [debug] QUERY OK source="tasks" db=1.3ms 953 | UPDATE "tasks" AS t0 SET "status" = $1 WHERE (t0."id" = ANY($2)) ["success", [5]] 954 | ``` 955 | 956 | It works: we are storing and running tasks! 957 | -------------------------------------------------------------------------------- /config/config.exs: -------------------------------------------------------------------------------- 1 | use Mix.Config 2 | 3 | config :genstage_example, ecto_repos: [GenstageExample.Repo] 4 | 5 | config :genstage_example, GenstageExample.Repo, 6 | adapter: Ecto.Adapters.Postgres, 7 | database: "genstage_example", 8 | username: "bobdawg", 9 | password: "", 10 | hostname: "localhost", 11 | port: "5432" 12 | 13 | -------------------------------------------------------------------------------- /lib/genstage_example.ex: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample do 2 | use Application 3 | 4 | alias GenstageExample.{Producer, Repo} 5 | def start(_type, _args) do 6 | import Supervisor.Spec, warn: false 7 | # 12 workers / system core 8 | consumers = for id <- 1..(System.schedulers_online() * 12) do 9 | worker(GenstageExample.Consumer, [], id: id) 10 | end 11 | producers = [ 12 | worker(Producer, []), 13 | ] 14 | 15 | supervisors = [ 16 | supervisor(GenstageExample.Repo, []), 17
| supervisor(Task.Supervisor, [[name: GenstageExample.TaskSupervisor]]), 18 | ] 19 | children = supervisors ++ producers ++ consumers 20 | 21 | opts = [strategy: :one_for_one, name: GenstageExample.Supervisor] 22 | Supervisor.start_link(children, opts) 23 | end 24 | 25 | def start_later(module, function, args) do 26 | payload = {module, function, args} |> :erlang.term_to_binary 27 | Repo.insert_all("tasks", [ 28 | %{status: "waiting", payload: payload} 29 | ]) 30 | notify_producer() 31 | end 32 | 33 | def notify_producer do 34 | send(Producer, :enqueued) 35 | end 36 | 37 | defdelegate enqueue(module, function, args), to: Producer 38 | end 39 | -------------------------------------------------------------------------------- /lib/genstage_example/consumer.ex: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample.Consumer do 2 | alias Experimental.GenStage 3 | use GenStage 4 | alias GenstageExample.{Producer, TaskSupervisor} 5 | 6 | def start_link do 7 | GenStage.start_link(__MODULE__, :state_doesnt_matter) 8 | end 9 | 10 | def init(state) do 11 | {:consumer, state, subscribe_to: [Producer]} 12 | end 13 | 14 | def handle_events(events, _from, state) do 15 | for event <- events do 16 | %{id: id, payload: payload} = event 17 | {module, function, args} = payload |> deconstruct_payload 18 | task = start_task(module, function, args) 19 | yield_to_and_update_task(task, id) 20 | end 21 | {:noreply, [], state} 22 | end 23 | 24 | defp start_task(mod, func, args) do 25 | Task.Supervisor.async_nolink(TaskSupervisor, mod, func, args) 26 | end 27 | 28 | defp yield_to_status({:ok, _}, _) do 29 | "success" 30 | end 31 | 32 | defp yield_to_status({:exit, _}, _) do 33 | "error" 34 | end 35 | 36 | defp yield_to_status(nil, task) do 37 | Task.shutdown(task) 38 | "timeout" 39 | end 40 | 41 | defp update(status, id) do 42 | GenstageExample.TaskDBInterface.update_task_status(id, status) 43 | end 44 | 45 | defp
yield_to_and_update_task(task, id) do 46 | task 47 | |> Task.yield(1000) 48 | |> yield_to_status(task) 49 | |> update(id) 50 | end 51 | 52 | defp deconstruct_payload payload do 53 | payload |> :erlang.binary_to_term 54 | end 55 | end 56 | -------------------------------------------------------------------------------- /lib/genstage_example/producer.ex: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample.Producer do 2 | alias Experimental.GenStage 3 | use GenStage 4 | 5 | @name __MODULE__ 6 | 7 | def start_link do 8 | GenStage.start_link(__MODULE__, 0, name: @name) 9 | end 10 | 11 | def init(counter) do 12 | {:producer, counter} 13 | end 14 | 15 | def enqueue(module, function, args) do 16 | payload = {module, function, args} |> construct_payload 17 | GenstageExample.Task.enqueue("waiting", payload) 18 | Process.send(@name, :enqueued, []) 19 | :ok 20 | end 21 | 22 | def handle_cast(:enqueued, state) do 23 | serve_jobs(state) 24 | end 25 | 26 | def handle_demand(demand, state) do 27 | serve_jobs(demand + state) 28 | end 29 | 30 | def handle_info(:enqueued, state) do 31 | {count, events} = GenstageExample.Task.take(state) 32 | {:noreply, events, state - count} 33 | end 34 | 35 | def serve_jobs limit do 36 | {count, events} = GenstageExample.Task.take(limit) 37 | Process.send_after(@name, :enqueued, 60_000) 38 | {:noreply, events, limit - count} 39 | end 40 | 41 | defp construct_payload({module, function, args}) do 42 | {module, function, args} |> :erlang.term_to_binary 43 | end 44 | end 45 | -------------------------------------------------------------------------------- /lib/genstage_example/task.ex: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample.Task do 2 | def enqueue(status, payload) do 3 | GenstageExample.TaskDBInterface.insert_tasks(status, payload) 4 | end 5 | 6 | def take(limit) do 7 | GenstageExample.TaskDBInterface.take_tasks(limit) 8 | 
end 9 | end 10 | -------------------------------------------------------------------------------- /lib/genstage_example/task_db_interface.ex: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample.TaskDBInterface do 2 | import Ecto.Query 3 | 4 | def take_tasks(limit) do 5 | {:ok, {count, events}} = 6 | GenstageExample.Repo.transaction fn -> 7 | ids = GenstageExample.Repo.all waiting(limit) 8 | GenstageExample.Repo.update_all by_ids(ids), [set: [status: "running"]], [returning: [:id, :payload]] 9 | end 10 | {count, events} 11 | end 12 | 13 | def insert_tasks(status, payload) do 14 | GenstageExample.Repo.insert_all "tasks", [ 15 | %{status: status, payload: payload} 16 | ] 17 | end 18 | 19 | def update_task_status(id, status) do 20 | GenstageExample.Repo.update_all by_ids([id]), set: [status: status] 21 | end 22 | 23 | defp by_ids(ids) do 24 | from t in "tasks", where: t.id in ^ids 25 | end 26 | 27 | defp waiting(limit) do 28 | from t in "tasks", 29 | where: t.status == "waiting", 30 | limit: ^limit, 31 | select: t.id, 32 | lock: "FOR UPDATE SKIP LOCKED" 33 | end 34 | end 35 | -------------------------------------------------------------------------------- /lib/repo.ex: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample.Repo do 2 | use Ecto.Repo, 3 | otp_app: :genstage_example 4 | end 5 | -------------------------------------------------------------------------------- /mix.exs: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample.Mixfile do 2 | use Mix.Project 3 | 4 | def project do 5 | [app: :genstage_example, 6 | version: "0.1.0", 7 | elixir: "~> 1.3", 8 | build_embedded: Mix.env == :prod, 9 | start_permanent: Mix.env == :prod, 10 | package: package(), 11 | deps: deps()] 12 | end 13 | 14 | # Configuration for the OTP application 15 | # 16 | # Type "mix help compile.app" for more information 17 
| def application do 18 | [applications: [:logger, :postgrex], 19 | mod: {GenstageExample, []}] 20 | end 21 | 22 | # Dependencies can be Hex packages: 23 | # 24 | # {:mydep, "~> 0.3.0"} 25 | # 26 | # Or git/path repositories: 27 | # 28 | # {:mydep, git: "https://github.com/elixir-lang/mydep.git", tag: "0.1.0"} 29 | # 30 | # Type "mix help deps" for more examples and options 31 | defp deps do 32 | [ 33 | {:ecto, "~> 2.0"}, 34 | {:postgrex, "~> 0.12.1"}, 35 | {:gen_stage, "~> 0.7"}, 36 | {:ex_doc, ">= 0.0.0", only: :dev}, 37 | ] 38 | end 39 | 40 | defp package do 41 | [# These are the default files included in the package 42 | name: :genstage_example, 43 | files: ["doc", "lib", "priv", "mix.exs", "README*"], 44 | maintainers: ["Robert Grayson"], 45 | description: "A simple genstage example - a task runner", 46 | licenses: ["DWTFYWPL"], 47 | links: %{"GitHub" => "https://github.com/ybur-yug/genstage_example", 48 | "Docs" => "http://elixirschool.com/"}] 49 | end 50 | end 51 | -------------------------------------------------------------------------------- /mix.lock: -------------------------------------------------------------------------------- 1 | %{"connection": {:hex, :connection, "1.0.4", "a1cae72211f0eef17705aaededacac3eb30e6625b04a6117c1b2db6ace7d5976", [:mix], []}, 2 | "db_connection": {:hex, :db_connection, "1.0.0", "63c03e520d54886a66104d34e32397ba960db6e74b596ce221592c07d6a40d8d", [:mix], [{:connection, "~> 1.0.2", [hex: :connection, optional: false]}, {:poolboy, "~> 1.5", [hex: :poolboy, optional: true]}, {:sbroker, "~> 1.0", [hex: :sbroker, optional: true]}]}, 3 | "decimal": {:hex, :decimal, "1.2.0", "462960fd71af282e570f7b477f6be56bf8968e68277d4d0b641a635269bf4b0d", [:mix], []}, 4 | "earmark": {:hex, :earmark, "1.0.2", "a0b0904d74ecc14da8bd2e6e0248e1a409a2bc91aade75fcf428125603de3853", [:mix], []}, 5 | "ecto": {:hex, :ecto, "2.0.5", "7f4c79ac41ffba1a4c032b69d7045489f0069c256de606523c65d9f8188e502d", [:mix], [{:db_connection, "~> 1.0-rc.4", [hex:
:db_connection, optional: true]}, {:decimal, "~> 1.1.2 or ~> 1.2", [hex: :decimal, optional: false]}, {:mariaex, "~> 0.7.7", [hex: :mariaex, optional: true]}, {:poison, "~> 1.5 or ~> 2.0", [hex: :poison, optional: true]}, {:poolboy, "~> 1.5", [hex: :poolboy, optional: false]}, {:postgrex, "~> 0.12.0", [hex: :postgrex, optional: true]}, {:sbroker, "~> 1.0-beta", [hex: :sbroker, optional: true]}]}, 6 | "ex_doc": {:hex, :ex_doc, "0.14.3", "e61cec6cf9731d7d23d254266ab06ac1decbb7651c3d1568402ec535d387b6f7", [:mix], [{:earmark, "~> 1.0", [hex: :earmark, optional: false]}]}, 7 | "gen_stage": {:hex, :gen_stage, "0.7.0", "d8ab7f294ca0fb7ca001c33a97d25876561f086ecd149689f43c0d2ffccf4fff", [:mix], []}, 8 | "poolboy": {:hex, :poolboy, "1.5.1", "6b46163901cfd0a1b43d692657ed9d7e599853b3b21b95ae5ae0a777cf9b6ca8", [:rebar], []}, 9 | "postgrex": {:hex, :postgrex, "0.12.1", "2f8b46cb3a44dcd42f42938abedbfffe7e103ba4ce810ccbeee8dcf27ca0fb06", [:mix], [{:connection, "~> 1.0", [hex: :connection, optional: false]}, {:db_connection, "~> 1.0-rc.4", [hex: :db_connection, optional: false]}, {:decimal, "~> 1.0", [hex: :decimal, optional: false]}]}} 10 | -------------------------------------------------------------------------------- /priv/repo/migrations/20161023200119_add_tasks.exs: -------------------------------------------------------------------------------- 1 | defmodule GenstageExample.Repo.Migrations.AddTasks do 2 | use Ecto.Migration 3 | 4 | def change do 5 | create table(:tasks) do 6 | add :payload, :binary 7 | add :status, :string 8 | end 9 | end 10 | end 11 | -------------------------------------------------------------------------------- /test/test_helper.exs: -------------------------------------------------------------------------------- 1 | ExUnit.start() 2 | --------------------------------------------------------------------------------