├── LICENSE
└── readme.MD

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 João Duarte

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/readme.MD:
--------------------------------------------------------------------------------
# Table of contents
- [Bibliography](#bibliography)
- [Why is concurrency hard?](#why-is-concurrency-hard)
  * [Concurrency problems](#concurrency-problems)
- [Communicating Sequential Processes (CSP)](#communicating-sequential-processes-csp)
  * [What is the difference between concurrency and parallelism?](#what-is-the-difference-between-concurrency-and-parallelism)
  * [What is CSP?](#what-is-csp)
  * [How does CSP (and Golang) help?](#how-does-csp-and-golang-help)
  * [Go's Philosophy on Concurrency](#gos-philosophy-on-concurrency)
- [Go's Concurrency Building Blocks](#gos-concurrency-building-blocks)
  * [Goroutines](#goroutines)
    + [The M:N Scheduler](#the-mn-scheduler)
  * [The `sync` package](#the-sync-package)
    + [WaitGroup](#waitgroup)
    + [Mutex and RWMutex](#mutex-and-rwmutex)
    + [Cond](#cond)
    + [Once](#once)
    + [Pool](#pool)
  * [Channels](#channels)
    + [Buffered vs unbuffered](#buffered-vs-unbuffered)
    + [Channel owners and channel consumers](#channel-owners-and-channel-consumers)
  * [The `select` statement](#the-select-statement)
- [Concurrency Patterns in Go](#concurrency-patterns-in-go)
  * [Confinement](#confinement)
  * [The for-select loop](#the-for-select-loop)
  * [How to prevent goroutine leaks?](#how-to-prevent-goroutine-leaks)
  * [The or-channel](#the-or-channel)
  * [Error handling](#error-handling)
  * [Pipelines](#pipelines)
  * [Useful generators](#useful-generators)
    + [Repeat](#repeat)
    + [Take](#take)
    + [The repeatFn](#the-repeatfn)
    + [Fan-out, fan-in](#fan-out-fan-in)
    + [The or-done channel](#the-or-done-channel)
    + [The tee-channel](#the-tee-channel)
    + [The bridge-channel](#the-bridge-channel)
  * [The context package](#the-context-package)
- [Concurrency at scale](#concurrency-at-scale)
  * [Error Propagation](#error-propagation)
  * [Timeouts and Cancellations](#timeouts-and-cancellations)
  * [Heartbeats](#heartbeats)
  * [Replicated requests](#replicated-requests)

# Introduction
I decided to properly start learning concurrency in Golang. I had previously worked with it but didn't really understand it. So I bought Concurrency in Go by Katherine Cox-Buday and started to take notes on what I was learning. This is basically a **very** slimmed down version of the contents of the book. I think it's useful, so I decided to share it.

# Bibliography
1. Concurrency in Go - Katherine Cox-Buday - [buy here](https://www.amazon.com/Concurrency-Go-Tools-Techniques-Developers/dp/1491941197)

Everything in this article is based on this book by Katherine Cox-Buday. All the knowledge and nearly all examples come from her book. This article is purely my annotations of the book and what I learned from it. All credits go to Katherine.

# Why is concurrency hard?
## Concurrency problems
### Race Conditions
1. When two or more operations must execute in the correct order, but the program has not been written so that this order is guaranteed.
2. It usually shows up as a data race: two threads access the same variable or resource at the same time, and at least one of them writes to it (a minimal example is sketched after this list of problems).

### Atomicity
1. Something is atomic when, within a given context, it is indivisible and uninterruptible by anything else.
2. If a variable/resource is confined to a goroutine (its context) and is not shared with other routines, then operations on it are effectively atomic: changes to that resource are uninterruptible and indivisible.

### Memory Access Synchronisation
1. When there is a data race, we can wrap the **critical section** (the part of the code that needs exclusive access to a shared resource) with a mutex, locking before it and unlocking after it. When we do this, we are synchronising memory access.
2. This fixes the data race, but it doesn't fix the race condition (the threads still compete to read/write first).
3. It can create maintenance and performance problems. We should try to keep critical sections as small as possible, and only broaden them if the locking overhead becomes inefficient.

### Deadlocks
1. When all concurrent processes (threads or goroutines) are blocked, each waiting on another one in order to continue. The program will never resume without external intervention.
2. A simple example is a shared mutex being locked twice without being unlocked in between.
3. The *Coffman Conditions* help identify when we have a deadlock:
   1. *Mutual exclusion* - a concurrent process holds exclusive rights to a shared resource.
   2. *Wait for condition* - a process holds a resource and is simultaneously waiting for another resource.
   3. *No preemption* - a resource held by a process can only be released by that process.
   4. *Circular wait* - a process waits on a chain of processes that ultimately waits on it (P1 waits on P2, which waits on P1).

### Livelocks
1. When concurrent processes are actively performing operations, but those operations do nothing to advance the state of the program.
2. For instance, when trying to manually avoid a deadlock, we might make each process step aside for the other; both processes then get caught in a loop of avoiding each other without ever advancing.
3. Like two people meeting in a hallway, each stepping left and right to let the other pass, but never succeeding.

### Starvation
1. When a concurrent process is too greedy and prevents other processes from doing their work, for instance by keeping a shared mutex locked for too long.
2. The greedy process might even look more efficient than the others, but in reality it is just hogging the shared resource (exclusive access to it) and leaving nothing for the rest.
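To make the first two problems concrete, here is a minimal, self-contained sketch of a data race (mine, not from the book); the `counter` variable and the loop bounds are arbitrary, and running it with `go run -race` should flag the unsynchronised increments:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var counter int
	var wg sync.WaitGroup

	wg.Add(2)
	for i := 0; i < 2; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				counter++ // unsynchronised read-modify-write: a data race
			}
		}()
	}
	wg.Wait()

	// The printed value is unpredictable. Wrapping the increment in a mutex
	// (see the Mutex section below) synchronises memory access, but deciding
	// *who* should increment first is still a design problem (the race condition).
	fmt.Println("counter:", counter)
}
```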
## When writing concurrent code, try to answer these questions for the reader of the function:
1. Who is responsible for starting the concurrency/goroutines: the caller or the function?
2. Who is responsible for the synchronisation of memory?
3. How is the problem space mapped onto concurrency primitives?

# Communicating Sequential Processes (CSP)
## What is the difference between concurrency and parallelism?
1. We write concurrent code that we expect to run in parallel.
2. Ideally, our concurrent code should not need to know whether it is actually running in parallel.
3. Concurrency is a property of the code: structuring the program as multiple computations that make progress over the same period. Parallelism is a property of the running program: multiple computations executing simultaneously on different processors.

## What is CSP?
1. A paper written by C. A. R. (Tony) Hoare in 1978, from which Go takes its principles regarding concurrency; it is basically where channels come from.
2. It argued that input and output should be language primitives, to facilitate communication between parallel processes (which is exactly what Go's channels provide).
3. It made this communication possible without relying on exclusive memory access.

## How does CSP (and Golang) help?
1. Golang provides channels (communication inputs and outputs between processes), goroutines (processes) and the `select` statement.
2. Channels:
   1. We can easily communicate between different goroutines.
   2. There is less need for memory access synchronisation.
   3. They allow us to compose the outputs and inputs of subsystems, and to combine them with timeouts or cancellations.
3. Goroutines:
   1. Are very lightweight; we don't have to worry about creating one (unlike Java threads, for instance).
   2. Free us from thinking about parallelism in our code, and allow us to model problems closer to their natural level of concurrency.
   3. Are multiplexed onto OS threads automatically and scheduled for us by the Go runtime. We don't have to worry about these optimisations.
   4. Are scaled dynamically by Go; we don't have to worry about specific OS/hardware limits, because the runtime handles that for us.
4. Select:
   1. A complement to channels.
   2. Enables the difficult bits of composing channels: waiting for events, selecting a message from competing channels, continuing if there are no messages waiting, and so on.

## Go's Philosophy on Concurrency
1. "Share memory by communicating, don't communicate by sharing memory."
2. Aim for simplicity. Use channels when possible, and treat goroutines like a free resource.
3. Use memory access synchronisation (mutexes) only when necessary.

# Go's Concurrency Building Blocks
## Goroutines
1. Every program in Go has at least one goroutine: the main routine.
2. A cheap and basic unit of organisation in a Go program.
3. It's a function that runs concurrently alongside other code.
4. They run in the same address space they were created in (they share variables).
5. A goroutine might never run if main exits before it gets the chance to start or print to stdout.
6. They are very lightweight (a few kilobytes each).
7. It's possible to create thousands of goroutines in the same address space.

### Examples of goroutine instantiation

```go
func main() {
	go sayHello()
	go func() {
		fmt.Println("cenas")
	}()
	// continue doing other things.
}

func sayHello() {
	fmt.Println("hello world")
}
```

### The M:N Scheduler
1. Go's mechanism for hosting goroutines.
2. Maps `M` green threads onto `N` OS threads.
   1. **Green threads** - threads managed by the language's runtime.
   2. **OS threads** - expensive to create and manage.
3. Goroutines are scheduled onto the green threads.
4. The scheduler handles the distribution across the available threads and ensures that when goroutines become blocked, other goroutines can run.

## The `sync` package
1. Contains concurrency primitives that are useful for low-level memory access synchronisation.

### WaitGroup
1. Allows us to wait for a set of concurrent operations to complete when we don't care about (or have another way of collecting) the results of those operations.

#### Example
```go
var wg sync.WaitGroup

wg.Add(1) // Increment the group's counter: one more routine running.
go func() {
	defer wg.Done() // Tell the group this routine has finished.
	fmt.Println("first goroutine sleeping")
	time.Sleep(1 * time.Second)
}()

wg.Add(1)
go func() {
	defer wg.Done()
	fmt.Println("second goroutine sleeping")
	time.Sleep(1 * time.Second)
}()

wg.Wait() // Block until all routines in the WaitGroup have finished.
fmt.Println("all goroutines are complete.")
```

### Mutex and RWMutex
1. Allow us to wrap critical sections so that only one goroutine at a time accesses the shared memory (memory access synchronisation).
2. Try to use them in an encapsulated manner (for instance inside a struct that supports concurrent use), instead of exposing the lock to callers.
3. **Locking the same mutex twice without unlocking it in between will cause a deadlock!**
4. RWMutex is like a regular Mutex, but it lets us state whether we are synchronising a read or a write: any number of readers may hold the lock at once, as long as no writer holds it (see the sketch after the example below).

#### Example
```go
type ConcurrentCounter struct {
	mu    sync.Mutex
	value int
}

func (c *ConcurrentCounter) Increment() {
	c.mu.Lock() // Enter the critical section: guarantees only one goroutine writes to value at a time.
	c.value++
	c.mu.Unlock() // Calling Unlock with defer is also nice.
}
```
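The notes above don't show `RWMutex` in action, so here is a minimal sketch of my own (not from the book): the same counter idea, where the hypothetical `Value` method takes the read lock so that many readers can proceed concurrently while writers still get exclusive access.

```go
type ReadMostlyCounter struct {
	mu    sync.RWMutex
	value int
}

func (c *ReadMostlyCounter) Increment() {
	c.mu.Lock() // Writer lock: exclusive access while we mutate value.
	defer c.mu.Unlock()
	c.value++
}

func (c *ReadMostlyCounter) Value() int {
	c.mu.RLock() // Reader lock: any number of readers may hold this at once.
	defer c.mu.RUnlock()
	return c.value
}
```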
### Cond
1. A rendezvous point for goroutines waiting for, or announcing, the occurrence of an event/signal.

#### Example
This example adds five items to a queue with a capacity of 5. Whenever the queue reaches a length of 2, the main routine waits; every add also schedules a removal one second later, which signals the waiting routine to continue.

```go
c := sync.NewCond(&sync.Mutex{})
queue := make([]interface{}, 0, 5)

removeFromQueue := func(delay time.Duration) {
	time.Sleep(delay)
	c.L.Lock() // Enter the critical section.
	queue = queue[1:]
	fmt.Println("removed from queue")
	c.L.Unlock() // Exit the critical section.
	c.Signal()   // Signal that our operation is done.
}

for i := 0; i < 5; i++ {
	c.L.Lock() // Enter the critical section. Prevent concurrent writes.
	for len(queue) == 2 {
		c.Wait() // Suspend the main routine here until a signal arrives.
		// c.Wait() calls c.L.Unlock() when it suspends, and c.L.Lock() again before it returns.
	}
	fmt.Println("adding to queue")
	queue = append(queue, struct{}{})
	go removeFromQueue(time.Second * 1)
	c.L.Unlock() // Exit the critical section.
}

Output (one possible interleaving):
adding to queue
adding to queue
removed from queue
adding to queue
removed from queue
adding to queue
removed from queue
adding to queue
```

We can also use `Broadcast()` instead of `Signal()`. `Broadcast()` wakes all goroutines waiting on the Cond; `Signal()` wakes only the goroutine that has been waiting the longest.

### Once
1. Ensures that the function passed to `Do` is only ever executed once, even across different goroutines.
#### Example
```go
var count int
increment := func() {
	count++
}

var once sync.Once

var wg sync.WaitGroup
wg.Add(100)
for i := 0; i < 100; i++ {
	go func() {
		defer wg.Done()
		once.Do(increment)
	}()
}

wg.Wait()
fmt.Printf("count is %d\n", count)

Output:
count is 1
```

### Pool
1. A concurrency-safe implementation of the object pool design pattern.
2. Creates and makes available a fixed number (a pool) of things for use. Used for expensive resources such as database connections: only N instances are created, but M operations can request access to them.
3. It can improve response time and reduce allocations, but avoid reaching for it unless you have measured that you need it.

#### Example
```go
pool := &sync.Pool{
	New: func() interface{} {
		fmt.Println("creating new instance")
		return struct{}{}
	},
}

pool.Get()             // Gets an available instance, or calls pool.New() when the pool is empty.
instance := pool.Get() // The pool is empty again, so New is called a second time.
pool.Put(instance)     // Put the retrieved instance back in the pool so it can be reused. Saves allocations.
pool.Get()             // Reuses the instance that was put back, so New is not called.

Output:
creating new instance
creating new instance
```

## Channels
1. Streams of data (input and output) that allow communication between goroutines. Derived from Hoare's CSP.
2. Can be bidirectional or unidirectional (read-only or write-only). Unidirectional channel types are useful for defining who owns and who merely consumes a channel.

```go
var receiveStream <-chan string
var sendStream chan<- string
dataStream := make(chan string)

// Valid statements: a bidirectional channel converts implicitly to a unidirectional one.
receiveStream = dataStream
sendStream = dataStream

go func() {
	dataStream <- "hello world"
}()
fmt.Println(<-dataStream) // Prints "hello world". Blocks until there is something in the channel.
```

3. We can close a channel with `close(dataStream)`. A receive from a closed channel, `data, ok := <-dataStream`, returns immediately with `ok` set to `false` and `data` set to the zero value (here `""`).
4. We can loop over a channel with `for data := range dataStream {}`; the loop blocks while waiting for each value and ends as soon as `dataStream` is closed.

### Buffered vs unbuffered
1. Buffered channels have a capacity, `make(chan int, 3)`, whereas unbuffered channels don't, `make(chan int)`.
2. Buffered channels are a FIFO (first in, first out) queue.
3. Unbuffered channels are used for synchronous communication.
4. An unbuffered channel hands data to the receiving routine as soon as it is sent; the send blocks until a receiver is ready.
5. Sends on a buffered channel don't block until the buffer is full; consuming routines can receive values whenever the buffer is non-empty (demonstrated in the sketch after this list).
6. An unbuffered channel blocks a receiving goroutine whenever it is empty and waiting to be filled.
7. A buffered channel also blocks: receivers block while it is empty, and senders block while it is at full capacity.
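Here is a small sketch of my own (not from the book) demonstrating points 1, 2 and 5 above, as well as the close/range behaviour from the previous section: with a capacity of 2, both sends complete before any receiver exists, and ranging over the closed channel drains it in FIFO order.

```go
dataStream := make(chan string, 2) // buffered channel with capacity 2

dataStream <- "first"  // does not block: the buffer has room
dataStream <- "second" // does not block: the buffer is now full
close(dataStream)      // the owner is done writing

// Ranging drains the buffered values in FIFO order and stops, because the channel is closed.
for data := range dataStream {
	fmt.Println(data)
}

// A receive from a closed, drained channel returns immediately with the zero value and ok == false.
data, ok := <-dataStream
fmt.Printf("%q %v\n", data, ok)

Output:
first
second
"" false
```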
### Channel owners and channel consumers
#### Owners should:
1. Instantiate the channel.
2. Perform writes, or pass ownership to another goroutine.
3. Close the channel.
4. Encapsulate 1, 2 and 3 and expose them via a read-only channel.

This reduces the risk of deadlocks (from writing to a nil channel) and of panics (from closing a nil channel, writing to a closed channel, or closing a channel more than once).

#### Consumers should:
1. Know when a channel is closed.
2. Responsibly handle blocking, for whatever reason (using the `select` statement).

#### Example:
This way, the lifecycle of `resultStream` is encapsulated, and the code is easy to read and maintain.
```go
chanOwner := func() <-chan int { // returns a read-only channel
	resultStream := make(chan int, 5)
	go func() {
		defer close(resultStream)
		for i := 0; i <= 5; i++ {
			resultStream <- i
		}
	}()
	return resultStream
}

resultStream := chanOwner()
for result := range resultStream {
	fmt.Printf("received: %d\n", result)
}
fmt.Println("done receiving")

Output:
received: 0
received: 1
received: 2
received: 3
received: 4
received: 5
done receiving
```

## The `select` statement
1. Binds channels together.
2. Waits on several channel operations at once and runs the logic of whichever case becomes ready.

### Example:
```go
done := make(chan interface{})
go func() {
	time.Sleep(5 * time.Second)
	close(done)
}()

workCounter := 0
timeout := time.After(10 * time.Second) // a timeout: give up if the done signal never arrives within 10 seconds.
loop:
for {
	select {
	case <-done:
		break loop // a plain break would only exit the select, not the for loop.
	case <-timeout:
		break loop
	default:
	}
	// simulate work
	workCounter++
	time.Sleep(1 * time.Second)
}
fmt.Printf("achieved %d work cycles\n", workCounter)

Output:
achieved 5 work cycles
```

# Concurrency Patterns in Go
## Confinement
Confinement means making data available from only one concurrent process: by encapsulating the data and having clear owners and consumers, the owner can safely write to and close the channel, and operations on the confined data are effectively atomic without extra synchronisation.

**Example:**
```go
chanOwner := func() <-chan int {
	results := make(chan int, 5)
	go func() {
		defer close(results)
		for i := 0; i < 5; i++ {
			results <- i
		}
	}()
	return results
}

consumer := func(results <-chan int) {
	for result := range results {
		fmt.Printf("received: %d\n", result)
	}
	fmt.Println("done receiving")
}

results := chanOwner()
consumer(results)
```

## The for-select loop
We can loop indefinitely waiting to be stopped, or loop over iteration variables, and use a `select` statement to decide when to return from the loop.

**Example:**
```go
for {
	select {
	case <-done:
		return
	default:
	}
	// do non-preemptable work
}
```

## How to prevent goroutine leaks?
Although goroutines are very lightweight, they are not cleaned up by the garbage collector, which means we need to make sure they don't keep running (or stay blocked) forever.
We do that by passing a read-only `done` channel to the goroutine and exiting the routine once that channel is closed. The parent routine, which knows when the child routine must end, can then terminate it simply by closing that channel.

**Example:**
```go
newRandStream := func(done <-chan interface{}) <-chan int {
	randStream := make(chan int)
	go func() {
		defer fmt.Println("newRandStream has exited.")
		defer close(randStream)
		for {
			select {
			case randStream <- rand.Int():
			case <-done:
				return
			}
		}
	}()
	return randStream
}

done := make(chan interface{})
randStream := newRandStream(done)
fmt.Println("3 random ints:")
for i := 1; i <= 3; i++ {
	fmt.Printf("%d: %d\n", i, <-randStream)
}
close(done)

// simulate ongoing work
time.Sleep(1 * time.Second)

Output:
3 random ints:
1: 121324212
2: 423344243
3: 123123231
newRandStream has exited.
```

If we never closed the `done` channel, we would never see the "newRandStream has exited." message, and the goroutine would leak.

## The or-channel
We can combine multiple `done` channels into a single `done` channel that closes as soon as any of its component channels closes. This is done with a recursive function, and it's a good pattern to have at hand. The next example waits for the first of X channels (or goroutines) to close, using roughly X/2 goroutines internally.

**Example:**
```go
// Declared first so the closure can call itself recursively.
var or func(channels ...<-chan interface{}) <-chan interface{}
or = func(channels ...<-chan interface{}) <-chan interface{} {
	switch len(channels) {
	case 0:
		return nil
	case 1:
		return channels[0]
	}

	orDone := make(chan interface{})
	go func() {
		defer close(orDone)
		switch len(channels) {
		case 2:
			select {
			case <-channels[0]:
			case <-channels[1]:
			}
		default:
			select {
			case <-channels[0]:
			case <-channels[1]:
			case <-channels[2]:
			case <-or(append(channels[3:], orDone)...):
			}
		}
	}()
	return orDone
}

sig := func(after time.Duration) <-chan interface{} {
	c := make(chan interface{})
	go func() {
		defer close(c)
		time.Sleep(after)
	}()
	return c
}

start := time.Now()
<-or(
	sig(time.Second*5),
	sig(time.Second*1),
	sig(time.Hour*2),
	sig(time.Minute*1),
)
fmt.Printf("time since start: %v", time.Since(start))

Output:
time since start: 1.000004s
```

## Error handling
The goroutine with the most context about the whole application (usually the parent goroutine, or whichever routine started the work) should handle the errors of its children. We can do that by bundling the error together with the result and sending both back on the result channel.

**Example:**
```go
type Result struct {
	Error    error
	Response *http.Response
}

checkStatus := func(done <-chan interface{}, urls ...string) <-chan Result {
	results := make(chan Result)
	go func() {
		defer close(results)
		for _, url := range urls {
			resp, err := http.Get(url)
			result := Result{Error: err, Response: resp}
			select {
			case <-done:
				return
			case results <- result:
			}
		}
	}()
	return results
}

done := make(chan interface{})
defer close(done)

urls := []string{"a", "https://www.google.com", "b", "c", "d"}
errCount := 0
for result := range checkStatus(done, urls...) {
	if result.Error != nil {
		errCount++
		fmt.Printf("error: %v\n", result.Error)
		if errCount == 3 {
			fmt.Println("too many errors, exiting process")
			break
		}
		continue
	}
	fmt.Printf("Response: %v\n", result.Response.Status)
}

Output:
error: Get "a": unsupported protocol scheme ""
Response: 200 OK
error: Get "b": unsupported protocol scheme ""
error: Get "c": unsupported protocol scheme ""
too many errors, exiting process
```

## Pipelines
Pipelines are an idea that comes from functional programming: stages (functions) whose output feeds the input of the next stage. They can be a **batch pipeline** (each stage takes and returns a whole slice) or a **stream pipeline** (each stage takes and returns a single element). Here is an example of a simple batch pipeline:

```go
add := func(nums []int, addition int) []int {
	result := make([]int, len(nums))
	for i, num := range nums {
		result[i] = num + addition
	}
	return result
}

multiply := func(nums []int, multiplier int) []int {
	result := make([]int, len(nums))
	for i, num := range nums {
		result[i] = num * multiplier
	}
	return result
}

initialArr := []int{1, 2, 3, 4}
result := add(multiply(initialArr, 2), 1)
fmt.Println(result)

Output:
[3 5 7 9]
```

### How can we use pipelines in concurrency?
1. Use channels as the pipeline's inputs and outputs.

***Example:***

```go
generator := func(done <-chan interface{}, integers ...int) <-chan int {
	intStream := make(chan int)
	go func() {
		defer close(intStream)
		for _, i := range integers {
			select {
			case <-done:
				return
			case intStream <- i:
			}
		}
	}()
	return intStream
}

multiply := func(
	done <-chan interface{},
	intStream <-chan int,
	multiplier int,
) <-chan int {
	multipliedStream := make(chan int)
	go func() {
		defer close(multipliedStream)
		for i := range intStream {
			select {
			case <-done:
				return
			case multipliedStream <- i * multiplier:
			}
		}
	}()
	return multipliedStream
}

add := func(
	done <-chan interface{},
	intStream <-chan int,
	additive int,
) <-chan int {
	additionStream := make(chan int)
	go func() {
		defer close(additionStream)
		for i := range intStream {
			select {
			case <-done:
				return
			case additionStream <- i + additive:
			}
		}
	}()
	return additionStream
}

done := make(chan interface{})
defer close(done)

pipeline := multiply(done, add(done, multiply(done, generator(done, 1, 2, 3, 4), 2), 1), 2)

for v := range pipeline {
	fmt.Println(v)
}

Output:
6
10
14
18
```

## Useful generators
This is a collection of useful functions/snippets that you will commonly use or see in concurrent Go projects.

### Repeat
This generator repeats the values you pass to it indefinitely, until you tell it to stop:

```go
repeat := func(
	done <-chan interface{},
	values ...interface{},
) <-chan interface{} {
	valueStream := make(chan interface{})
	go func() {
		defer close(valueStream)
		for {
			for _, v := range values {
				select {
				case <-done:
					return
				case valueStream <- v:
				}
			}
		}
	}()
	return valueStream
}
```

### Take
This generator "takes" the first `num` items off an incoming stream and then exits:

```go
take := func(
	done <-chan interface{},
	valueStream <-chan interface{},
	num int,
) <-chan interface{} {
	takeStream := make(chan interface{})
	go func() {
		defer close(takeStream)
		for i := 0; i < num; i++ {
			select {
			case <-done:
				return
			case takeStream <- <-valueStream:
			}
		}
	}()
	return takeStream
}

done := make(chan interface{})
defer close(done)
for num := range take(done, repeat(done, 1), 5) {
	fmt.Printf("%v ", num)
}

Output:
1 1 1 1 1
```

### The repeatFn
If we generalise repeat by giving it a callback, we can call a function over and over again and stream its results on a channel of the desired type:

```go
repeatFn := func(
	done <-chan interface{},
	fn func() interface{},
) <-chan interface{} {
	valueStream := make(chan interface{})
	go func() {
		defer close(valueStream)
		for {
			select {
			case <-done:
				return
			case valueStream <- fn():
			}
		}
	}()
	return valueStream
}

done := make(chan interface{})
defer close(done)
randFn := func() interface{} {
	return rand.Int()
}

for num := range take(done, repeatFn(done, randFn), 5) {
	fmt.Println(num)
}

Output:
1234
54332
3467567
234
34456
```

### Fan-out, fan-in
Fan-out is the process of starting multiple goroutines to handle input from a pipeline stage; fan-in is the process of combining their multiple output channels into one. When should we use this pattern? When a stage:
1. doesn't rely on values that it calculated before;
2. takes a long time to run;
3. produces output whose order doesn't matter.

Fan-out is easy: just launch multiple instances of a particular stage (here assuming a `primeFinder` stage fed by a `randIntStream` channel):
```go
numFinders := 4
finders := make([]<-chan interface{}, numFinders)
for i := 0; i < numFinders; i++ {
	finders[i] = primeFinder(done, randIntStream)
}
```

Fan-in joins (the term is *multiplexing*) the multiple streams of data into a single stream. Here is an example:

```go
fanIn := func(
	done <-chan interface{},
	channels ...<-chan interface{},
) <-chan interface{} {
	var wg sync.WaitGroup
	multiplexedStream := make(chan interface{})

	multiplex := func(c <-chan interface{}) {
		defer wg.Done()
		for i := range c {
			select {
			case <-done:
				return
			case multiplexedStream <- i:
			}
		}
	}

	wg.Add(len(channels))
	for _, c := range channels {
		go multiplex(c)
	}

	go func() {
		wg.Wait()
		close(multiplexedStream)
	}()

	return multiplexedStream
}
```

We can now consume this multiplexed stream, which combines all of the channels (each fed by its own goroutine) into one; a usage sketch follows.
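To tie the two snippets together, here is a usage sketch of my own (not from the book): it assumes the `primeFinder` stage and `randIntStream` channel mentioned above exist, along with the `take` generator from earlier, and it consumes the fanned-in stream like any other single channel.

```go
done := make(chan interface{})
defer close(done)

// Fan out: numFinders copies of the slow primeFinder stage, all reading from the same randIntStream.
numFinders := 4
finders := make([]<-chan interface{}, numFinders)
for i := 0; i < numFinders; i++ {
	finders[i] = primeFinder(done, randIntStream)
}

// Fan in: multiplex the finders back into a single stream and take the first
// 10 primes, regardless of which finder produced each of them.
for prime := range take(done, fanIn(done, finders...), 10) {
	fmt.Println(prime)
}
```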
### The or-done channel
This is a way to improve readability: when using this pattern, we don't need to check whether the channel is closed every time we read from it, because the wrapper function does it for us:

```go
orDone := func(done, c <-chan interface{}) <-chan interface{} {
	valStream := make(chan interface{})
	go func() {
		defer close(valStream)
		for {
			select {
			case <-done:
				return
			case v, ok := <-c:
				if ok == false {
					return
				}
				select {
				case valStream <- v:
				case <-done:
				}
			}
		}
	}()
	return valStream
}

for val := range orDone(done, myChan) {
	// ... work with val
}
```

### The tee-channel
This reads from one input channel and exposes the values on two output channels, so you can send the same data to two separate parts of your system.
```go
tee := func(
	done <-chan interface{},
	in <-chan interface{},
) (<-chan interface{}, <-chan interface{}) {
	out1 := make(chan interface{})
	out2 := make(chan interface{})
	go func() {
		defer close(out1)
		defer close(out2)
		for val := range orDone(done, in) {
			out1, out2 := out1, out2 // local copies we can set to nil independently
			for i := 0; i < 2; i++ {
				select {
				case <-done:
				case out1 <- val:
					out1 = nil // this output has received val; disable its case
				case out2 <- val:
					out2 = nil
				}
			}
		}
	}()
	return out1, out2
}
```

For each value read from `in`, this waits until both `out1` and `out2` have received it; both channels always carry exactly the same data.

### The bridge-channel
This is a useful way of flattening a channel of channels into a single channel. It lets our consumers deal with a single stream of values even when the input is a stream of streams.

```go
bridge := func(
	done <-chan interface{},
	chanStream <-chan <-chan interface{},
) <-chan interface{} {
	valStream := make(chan interface{})
	go func() {
		defer close(valStream)
		for {
			var stream <-chan interface{}
			select {
			case maybeStream, ok := <-chanStream:
				if ok == false {
					return
				}
				stream = maybeStream
			case <-done:
				return
			}
			for val := range orDone(done, stream) {
				select {
				case valStream <- val:
				case <-done:
				}
			}
		}
	}()
	return valStream
}
```

## The context package
We can use the context package to decide whether we should stop work and end a goroutine. It gives us a standard way of handling timeouts and cancellations (both shown below).

```go
var wg sync.WaitGroup

work := func(ctx context.Context) {
	defer wg.Done()
	for i := 0; i < 200; i++ {
		select {
		case <-time.After(5 * time.Second):
			fmt.Println("starting...", i)
		case <-ctx.Done():
			fmt.Println("context was canceled", i)
			return
		}
	}
}

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()

wg.Add(1)
go work(ctx)
wg.Wait()

fmt.Println("finished")

Output:
context was canceled 0
finished
```
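The example above uses `context.WithTimeout`; for the cancellation side, here is a small sketch of my own (not from the book) using `context.WithCancel`, reusing the same `work` function and `wg` from the previous snippet. The 2-second `time.AfterFunc` simply stands in for whatever part of the program decides to cancel.

```go
ctx, cancel := context.WithCancel(context.Background())

wg.Add(1)
go work(ctx) // same work function and WaitGroup as above

// Somewhere else in the program we decide to cancel; here, simply after 2 seconds.
time.AfterFunc(2*time.Second, cancel)

wg.Wait()
fmt.Println("finished")
```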

# Concurrency At Scale
These techniques will allow us to make our systems more scalable.

## Error Propagation
1. Errors should always be first-class citizens in our Go code.
2. Errors shouldn't just be dumped in front of the user.
3. We should try to make error handling an asset of our systems.
4. Errors should include the following critical information:
   1. What happened;
   2. When and where it happened;
   3. A friendly user-facing message;
   4. How the user can get more information.
5. Wrapping errors this way lets us know that any error that is *not* wrapped is an unhandled bug or an error case we didn't account for.
6. It gives us a lot more context on our errors.

**Example of good error handling**
```go
// Custom error type that wraps a lower-level error with context.
type MyError struct {
	Inner      error // wraps the original error
	Message    string
	Stacktrace string
	Misc       map[string]interface{} // additional details such as a hash of the stack trace, an ID, any kind of contextual information
}

func wrapError(err error, messagef string, msgArgs ...interface{}) MyError {
	return MyError{
		Inner:      err,
		Message:    fmt.Sprintf(messagef, msgArgs...),
		Stacktrace: string(debug.Stack()),
		Misc:       make(map[string]interface{}),
	}
}

func (err MyError) Error() string {
	return err.Message
}

// Let's call this next module the "LowLevel" module.
type LowLevelErr struct {
	error
}

func isGloballyExec(path string) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, LowLevelErr{wrapError(err, err.Error())}
	}
	return info.Mode().Perm()&0100 == 0100, nil
}

// Let's call this next module the "IntermediateLevel" module.
type IntermediateErr struct {
	error
}

func runJob(id string) error {
	const jobBinPath = "/path/to/binary"
	isExecutable, err := isGloballyExec(jobBinPath)
	if err != nil {
		return IntermediateErr{wrapError(
			err,
			"cannot run job %q: binaries are not available",
			id,
		)}
	} else if isExecutable == false {
		return wrapError(
			nil,
			"cannot run job %q: binaries are not executable",
			id,
		)
	}
	return exec.Command(jobBinPath, "--id="+id).Run()
}

// main
func handleError(key int, err error, message string) {
	log.SetPrefix(fmt.Sprintf("[logID: %v]: ", key))
	log.Printf("%#v", err)
	fmt.Printf("[%v] %v\n", key, message)
}

func main() {
	log.SetOutput(os.Stdout)
	log.SetFlags(log.Ltime | log.LUTC)

	err := runJob("1")
	if err != nil {
		msg := "Unexpected error, please contact someone"
		if _, ok := err.(IntermediateErr); ok {
			msg = err.Error()
		}
		handleError(1, err, msg)
	}
}

Output:
[logID: 1]: 22:00:00 main.IntermediateErr{error: main.MyError{Inner: main.LowLevelErr{error: main.MyError{Inner: (*os.PathError)(0xc4200123f0), Message: "stat /path/to/binary: no such file or directory", Stacktrace: "stacktrace", ...}}, ...}}
```

## Timeouts and Cancellations
We should make sure all of our concurrent code is preemptable, so that if an operation is cancelled at any point, the goroutine can stop what it is doing and return cleanly (a sketch of a preemptable `longCalculation` follows the example below).

```go
func someFunction() {
	// ... previous code; done, valueStream and resultStream come from the surrounding context.
	var value interface{}
	select {
	case <-done:
		return
	case value = <-valueStream:
	}

	result := reallyLongCalculation(done, value)

	select {
	case <-done:
		return
	case resultStream <- result:
	}
}

func reallyLongCalculation(done <-chan interface{}, value interface{}) interface{} {
	intermediateRes := longCalculation(done, value)
	return longCalculation(done, intermediateRes)
}
```
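The snippet above assumes that `longCalculation` is itself preemptable. A minimal sketch of my own (not from the book) of what that might look like: the work is split into small steps, and the `done` channel is checked between steps. Both the step count and the `doOneStep` helper are hypothetical.

```go
func longCalculation(done <-chan interface{}, value interface{}) interface{} {
	result := value
	for i := 0; i < 1000; i++ { // hypothetical work split into 1000 small steps
		select {
		case <-done:
			return result // preempted: return whatever we have so far
		default:
		}
		result = doOneStep(result) // hypothetical single, fast unit of work
	}
	return result
}
```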

## Heartbeats
Heartbeats are a way for concurrent processes to signal life to outside parties. We can add a heartbeat that fires at a time interval, or a heartbeat that fires at the beginning of each unit of work. Heartbeats are not always necessary, but they are useful when:
* the goroutine needs to be tested;
* the goroutine takes a long time to run, and heartbeats help with debugging and with spotting which goroutines are unhealthy.

### Heartbeat that runs at a time interval:
Useful for checking whether a goroutine is healthy (for instance, if its channels are not closed but we stop receiving heartbeats).

```go
doWork := func(
	done <-chan interface{},
	pulseInterval time.Duration,
) (<-chan interface{}, <-chan time.Time) {
	heartbeat := make(chan interface{})
	results := make(chan time.Time)
	go func() {
		defer close(heartbeat)
		defer close(results)

		pulse := time.Tick(pulseInterval)
		workGen := time.Tick(2 * pulseInterval) // simulates incoming work

		sendPulse := func() {
			select {
			case heartbeat <- struct{}{}:
			default:
				// We want to keep running even if nobody is listening to our heartbeat.
			}
		}
		sendResult := func(r time.Time) {
			for {
				select {
				case <-done:
					return
				case <-pulse:
					sendPulse() // keep heartbeating while we wait to deliver the result.
				case results <- r:
					return
				}
			}
		}

		for {
			select {
			case <-done:
				return
			case <-pulse:
				sendPulse()
			case r := <-workGen:
				sendResult(r)
			}
		}
	}()
	return heartbeat, results
}

done := make(chan interface{})
time.AfterFunc(10*time.Second, func() { close(done) })

const timeout = 2 * time.Second
heartbeat, results := doWork(done, timeout/2)
for {
	select {
	case _, ok := <-heartbeat:
		if ok == false {
			return
		}
		fmt.Println("pulse")
	case r, ok := <-results:
		if ok == false {
			return
		}
		fmt.Printf("results %v\n", r.Second())
	case <-time.After(timeout):
		return
	}
}

Output:
pulse
pulse
results 54
pulse
pulse
results 56
pulse
pulse
results 58
pulse
```

### Heartbeat that runs at the beginning of a unit of work:
This kind is useful for testing a goroutine. The following is an example of a goroutine with this kind of heartbeat, and of how to unit test it.

```go
func DoWork(done <-chan interface{}, nums ...int) (<-chan interface{}, <-chan int) {
	heartbeatStream := make(chan interface{}, 1)
	workStream := make(chan int)
	go func() {
		defer close(heartbeatStream)
		defer close(workStream)

		for _, n := range nums {
			select {
			case heartbeatStream <- struct{}{}:
			default:
				// Continue even if we can't send the heartbeat: the buffered channel is
				// full and nobody is listening, and we don't care if no one is.
			}

			select {
			case <-done:
				return
			case workStream <- n:
			}
		}
	}()
	return heartbeatStream, workStream
}

func TestDoWork_GeneratedAllNumbers(t *testing.T) {
	done := make(chan interface{})
	defer close(done)

	intSlice := []int{1, 2, 3, 4, 5}
	heartbeat, results := DoWork(done, intSlice...)
	<-heartbeat // This makes sure that DoWork has started its work.
	for i, expected := range intSlice {
		select {
		case r := <-results:
			if expected != r {
				t.Errorf("index %v: expected %v, but got %v", i, expected, r)
			}
		case <-time.After(1 * time.Second):
			t.Fatal("test timed out")
		}
	}
}
```

## Replicated requests
This is about starting multiple handlers for the same request and using whichever response comes back first.

```go
doWork := func(done <-chan interface{}, id int, wg *sync.WaitGroup, result chan<- int) {
	started := time.Now()
	defer wg.Done()

	// Simulate random load.
	simulatedLoadTime := time.Duration(1+rand.Intn(5)) * time.Second
	select {
	case <-done:
	case <-time.After(simulatedLoadTime):
	}

	select {
	case <-done:
	case result <- id:
	}

	took := time.Since(started)
	// Display how long each handler would have taken.
	if took < simulatedLoadTime {
		took = simulatedLoadTime
	}

	fmt.Printf("%v took %v\n", id, took)
}

done := make(chan interface{})
result := make(chan int)

var wg sync.WaitGroup
wg.Add(10)

// Here we start 10 handlers to handle our request.
for i := 0; i < 10; i++ {
	go doWork(done, i, &wg, result)
}

// This line grabs the first returned value from the group of handlers.
firstReturned := <-result

// Here we cancel all the remaining handlers.
// This ensures they don't continue to do unnecessary work.
close(done)
wg.Wait()

fmt.Printf("Received an answer from #%v\n", firstReturned)
```
--------------------------------------------------------------------------------