├── 01_my_journey_towards_understanding_distribution.md ├── 02_composing_actor_behavior.md ├── 03_local-first_device_management.md ├── 04_freedom.md ├── 05_actors_are_low-level.md ├── 06_time_in_local-first_systems.md ├── 07_freedom_from_fear.md ├── 08_peer-to-peer_internet.md ├── 09_building_a_smart_car_fan.md ├── README.md └── images └── 09 ├── Arduino_in_car.png ├── Arduino_nano_33_ble_sense.png ├── Autodach.png ├── app_capabilities.png ├── buck converter driver circuit.png ├── buck_converter.png ├── buck_driver.png ├── iPhone_app.png └── info_plist.png /01_my_journey_towards_understanding_distribution.md: -------------------------------------------------------------------------------- 1 | Roland Kuhn, Oct 18–Nov 20, 2016 2 | 3 | # My journey towards understanding distribution 4 | 5 | When asked why I like the Actor Model I usually say “because it models distribution exactly.” This means that it expresses what it means for computation to be distributed, without fluff and without hiding essential features. The purpose of this article is to persist something I learnt recently about this statement; I hope it is useful to others as well. 6 | 7 | _Disclaimer: someone else has probably written down all the salient points in the eighties—apologies for not taking the time to research this aspect, this is the way I prefer to learn._ 8 | 9 | ## Starting point: untyped Akka Actors 10 | 11 | It was my privilege to be part of the rewrite of Akka between version 1.3 and 2.0. During this period we made several fundamental changes to how the toolkit works and what it guarantees. The golden rule for every change was “if it cannot be guaranteed to always work under distribution, then it must not be done.” We had an intuitive understanding of what it means to be distributed, something along the lines of 12 | 13 | * sender and receiver of a message can be on systems that are far apart (in terms of communication latency), so knowing when something has been processed is not really meaningful, because 14 | * all communication is unreliable: messages can be lost or delayed arbitrarily, and 15 | * processes (Actors) can fail independently from each other, whether on the same machine or on different parts of a network. 16 | 17 | One deeply ingrained mental reflex within the Akka team was formed back then, assuming that all consensus between Actors is problematic and should be avoided wherever possible. It takes a lot of time to reach consensus, and once it has been reached parts of the system can already be far ahead of what has been decided, potentially invalidating the result unless processes are designed to take propagation time into account. 18 | 19 | What can we offer in terms of user API and features under these severe constraints? The Actor Model defines three features and implies a fourth (but we’ll come back to the implication later): 20 | 21 | * sending messages 22 | * changing behavior between messages (i.e. sequential processing) 23 | * creating more Actors 24 | 25 | Akka implements all these, although it deviates in terms of message delivery guarantees: instead of building in _reliable delivery_ we made that optional, arguing that we should leave it up to the user to decide which level of reliability is required—e.g. we think that without (redundant) persistence it would not really be reliable because a power outage can break the guarantee, but requiring a persistent storage just to run a few local Actors is definitely very heavy. 
The important constraint here is that user APIs have a 1:1 mapping to operational semantics, i.e. an ActorRef _must always_ behave in the same fashion, it cannot be made more or less reliable by way of configuration because that would not be obvious when looking at the expression `ref ! msg`. 26 | 27 | Additional features offered by Akka are 28 | 29 | * mandatory parental supervision (including bounding the child Actor’s lifetime by its parent’s) 30 | * lifecycle monitoring a.k.a. _DeathWatch_ 31 | 32 | Going beyond what the Actor Model provides comes at a cost, it requires a certain level of coherency within the cluster of nodes that hosts the actors. Only with consensus on when to assume that a node has fatally failed can we guarantee that these features keep working with the same semantics under all conditions. Reading the mailing list makes it clear that this price is not to be underestimated. One recurring question is why nodes get kicked out of the cluster and why they cannot come back later (explanation: once a node has been declared dead all supervision and death watch notifications have been fired, so coming back from the dead would lead to Actors that misbehave like zombies). 33 | 34 | ## Vantage point: Akka Typed 35 | 36 | The untyped nature of Actor interactions irked me from the very beginning. Sending a message is mediated by the `!` operator, essentially a function from Any to Unit—completely unconstrained and without feedback. The lack of feedback is a concession to modeling a distributed system, since all communication has a high price. The lack of typing constraints seems accidental, though, and we see the same problem again when looking at how an Actor is defined: it is a _partial_ function from Any to Unit, making every Actor a black box that may or may not do anything when you send a message to it. This gives Actors a lot of freedom, but it also makes static reasoning rather difficult—it feels a bit like injecting a bubble of JavaScript into the type-safe Scala world. 37 | 38 | Since version 2.4 Akka ships with [Akka Typed](http://doc.akka.io/docs/akka/current/scala/typed.html), the third incarnation of the wish to improve the situation by restricting the type of messages accepted by an Actor, allowing the compiler to reject clearly incorrect programs. In essence, an Actor’s definition is now given by a _total_ function from some input message type to the next behavior (restricted to be of the same type). Correspondingly, it becomes possible and prudent to parameterize the Actor reference by the same message type, rejecting invalid inputs. 39 | 40 | ~~~scala 41 | // pseudo-Scala syntax with dotty extensions 42 | type ActorRef[-T] = T => Unit 43 | type Behavior[T] = (T | Signal) => Behavior[T] // of course this is cyclic, so it needs a trait 44 | ~~~ 45 | 46 | This change inspired many cleanups in internals and auxiliary features, but it also opened up possibilities of expressing more than just static Actor types: by including appropriately typed ActorRefs in messages the types occurring during a conversation between Actors can evolve as time passes, going through different protocol steps.
47 | 48 | ~~~scala 49 | case class Authenticate(token: Token, replyTo: ActorRef[AuthResponse]) 50 | 51 | sealed trait AuthResponse 52 | case class AuthSuccess(session: ActorRef[SessionCommand]) extends AuthResponse 53 | case class AuthFailure(reason: String) extends AuthResponse 54 | ~~~ 55 | 56 | Modeling a protocol like this and exposing only an `ActorRef[Authenticate]` to clients does not only inhibit them from sending the entirely wrong message type, it also expresses the dependency of the session availability upon successful authentication—without having an `ActorRef[SessionCommand]` the compiler will not accept the sending of such messages. 57 | 58 | ## Tangent: Protocols 59 | 60 | The previous example works by using different message types for each protocol step, which can get unwieldy after a while. Another caveat is that the number of messages sent at each step is not statically verified, the client could send multiple times, perhaps even going back to a previous protocol step by retaining that step’s ActorRef. And of course this scheme would break down as soon as protocols involve cycles that reuse the same type at different times. 61 | 62 | What is necessary to get a grip on these problems is to describe multi-step protocols in terms of their shape. One promising approach is called [_Session Types_](http://groups.inf.ed.ac.uk/abcd/), but not all questions have been answered here. For example it remains problematic to express the linearity of the process (i.e. the inability to go back in time and use previously invalidated knowledge) within programming languages such that the result is comprehensible to mere humans. One approximation is presented by Alceste Scalas’ [lchannels library](http://alcestes.github.io/lchannels/instructions.html). 63 | 64 | ## The path towards compositionality 65 | 66 | The formulation of Actors consciously focuses on a single entry point for messages. This entry point can be rebound in untyped Akka using `context.become(...)`, or it is the result of each message processing in Erlang or Akka Typed. The consequence for composing an Actor from different behavior pieces (i.e. making it do different things with different interlocutors) is that all messages come in via this one ingress point and must be demultiplexed to reach their correct destination within the internal logic. This is mildly annoying in untyped Actors, but it can be downright frustrating for Akka Typed, requiring casts to formulate a behavior that accepts both `Authenticate` and `SessionCommand` but only exposes the former to the public, for example. Strongly typed logic requires principled means of composition, this seems to be universally true whether composing pure functions or distributed computations. 67 | 68 | Alex Prokopec’s [presentation at ScalaDays 2016](https://www.youtube.com/watch?v=7lulYWWD4Qo&index=11&list=PLLMLOC3WM2r7kLKJPHKnyJgdiBGWaKlJf) in Berlin was a transformative experience for me in that it showed a way out of this dilemma. It is the nature of the Actor Model to designate different Actor identities (their references) for different purposes. We can use this to build a bigger entity that can talk a different protocol with each of its interlocutors. Creating independent Actors has the downside of losing internal consistency—_Actors are isolated islands of sanity in a sea of distributed chaos_—so the trick is to virtualize the Actor and create multiple ingress points, each with their own identity. 
Where Alex uses stream processing semantics à la RxJava I was immediately attracted by the idea of using π-calculus for the internal composition of these compound Actors. 69 | 70 | The first version of a [possible session DSL](https://github.com/akka/akka/pull/21212/files#diff-b8845b35e24817f4231a4f11d7a86865R203) on top of Akka Typed was quickly created based on a monadic description of the sequential and concurrent composition of primitive actions and calculations. 71 | 72 | ~~~scala 73 | val server = toBehavior(for { 74 | backend ← initialize 75 | server ← register(backend) 76 | } yield run(server, backend)) 77 | 78 | private def initialize: Process[ActorRef[BackendCommand]] = { 79 | val getBackend = channel[Receptionist.Listing[BackendCommand]](1) 80 | actorContext.system.receptionist ! Receptionist.Find(BackendKey)(getBackend.ref) 81 | for (listing ← readAndSeal(getBackend)) yield { 82 | if (listing.addresses.isEmpty) timer((), 1.second).map(_ ⇒ initialize) 83 | else unit(listing.addresses.head) 84 | } 85 | } 86 | 87 | ... 88 | ~~~ 89 | 90 | The core abstraction is a Process that eventually computes a value of a given type. `flatMap` or `map` is used for sequential composition and there is a `fork(process)` action that is used to create concurrent threads of execution. The complete resulting Process is evaluated within a single Actor (the `toBehavior` function wraps it in a suitable interpreter), reacting to inputs as they become available and asked for: the `readAndSeal` operation suspends the process until a message is available on the `getBackend` channel created as part of the `initialize` Process. 91 | 92 | The primitives offered by this library sketch match the actions and composition features of π-calculus: 93 | 94 | * channel creation 95 | * sending 96 | * receiving 97 | * sequence 98 | * choice 99 | * parallelization 100 | 101 | Things started looking really good, a world of nicely composable and reusable behavior pieces began building itself in my imagination. 102 | 103 | ## The trough (yes, of disillusionment) 104 | 105 | My dream universe crumbled when I asked myself what would happen whenever a crucial message—one that unlocks the next protocol step for another piece of Process—were to not arrive, for whatever reason. Making delivery reliable does not fix the problem that other Actors can fail independently, and with the writing end of a channel being an ActorRef it would be entirely reasonable to depend on remote systems to make progress—location transparency is a very strong semantic promise. The fix would of course be to place an upper bound on the waiting time for a receive operation, but then a local Process would need to fail. Assuming that local processes would coordinate also via channels, independent failure of processes would imply that channels can be orphaned—this is a resource safety problem that would require (distributed) GC to be solved. (Channels can also be orphaned due to programmer error, of course, so avoiding a fatal resource leak seems prudent in any case.) 106 | 107 | Another thing I realized was that in a π-calculus expression on paper we can spot and eliminate dead processes that cannot possibly make progress anymore because they are waiting to send or receive along a channel that is not known to any other process. This kind of dead Process elimination is not practically possible in an implementation based on opaque Scala closures. 
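To see what this paper reasoning looks like, here is a toy example of my own in standard π-calculus notation: in the term `(νc)( c(x).P | Q )`, where `Q` never mentions the restricted channel `c`, no output on `c` can ever occur, so a reader may strike out `c(x).P` as dead. An interpreter that only holds opaque Scala closures cannot look inside them and therefore has no way to perform this elimination.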
108 | 109 | But the most severe difficulty is that the defining feature of π-calculus, namely the ability to send channels around, proves extremely challenging to implement in practice. The sending side is trivially solved by exposing it as an ActorRef. For the receiving side it would be necessary to enable message sends to be delivered to a set of readers whose only defining characteristic is that they are currently in possession of a reference to the channel and ready to receive, and it would need to be guaranteed that only exactly one of the readers actually gets the message. 110 | 111 | This kind of global coordination is what we eschew in Akka, at least for the basic primitives, since it is so expensive. The premise is that the basic solution should be scalable without practical limits, in principle infinitely. We strive to get as close to this ideal as possible. Not everybody needs infinite scalability and there are valid cases where a sequentially consistent database is the right solution, but that does not keep us from pushing the envelope. 112 | 113 | ## Climbing up the ridge 114 | 115 | It would be possible to “fix” channel usage by not allowing the receiving end to be serializable (meaning that it cannot be sent across the network) and throwing an exception if a receive operation is attempted from the wrong Actor’s context. This would be ugly, not only because it deviates fundamentally from π-calculus but also because it is bad practice to offer non-total functions as user API whose function depends on circumstances that are invisible in the code or types. 116 | 117 | Selecting which channel to read from is convenient for us humans, it matches how we interact as well: we walk around and talk to different people in a sequence we choose in order to reach our goals. Actors are forced to deal with whatever message comes in next, which in real life would correspond to getting distracted all the time. Unfortunately the freedom to select the channel is precisely what proved problematic above, so we now turn our attention to alternative approaches. 118 | 119 | One alternative was presented by Alex with his Reactors. The API for a channel allows transformations and other reactions to be attached in stream-processing fashion. This is already better, but it still allows a channel reference to be passed to another Reactor and wreak havoc by receiving from the wrong context. 120 | 121 | **This made me realize the advantage of having the receive operation as an _implicit_ property of the API, something that is not freely accessible by user code. This is precisely how the Actor Model avoids this pitfall, it defines three kinds of actions that can be taken _in response to a message_, but it does not permit the actor to actively ask for a message to come in.** 122 | 123 | So the other alternative is … Actors. Each channel is created from a behavior that describes how it will react to incoming messages, including the ability to change behavior betwixt them. This means that continuing a conversation with another Actor is done by creating a channel with the continuation and sending that back to the interlocutor. This is precisely how Carl Hewitt and Gul Agha have envisioned and advertised it from the very beginning, albeit without the inherent concurrency. 
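To make this continuation-passing style concrete, here is a small sketch of my own, written against the current Akka Typed API (not the 2016 `akka.typed` artifacts used elsewhere in this post) and with invented protocol names: the seller answers a quote request by sending back a freshly created, differently typed reference that represents the next step of the conversation.

~~~scala
import akka.actor.typed.ActorRef
import akka.actor.typed.scaladsl.Behaviors

object QuoteProtocol {
  final case class GetQuote(product: String, replyTo: ActorRef[Quote])
  final case class Quote(price: BigDecimal, orderHere: ActorRef[PlaceOrder])
  final case class PlaceOrder(quantity: Int, replyTo: ActorRef[OrderConfirmed])
  final case class OrderConfirmed(quantity: Int)

  // The publicly advertised entry point only understands GetQuote.
  val seller = Behaviors.receive[GetQuote] { (ctx, msg) =>
    // The continuation of the conversation gets its own, differently typed identity ...
    val orderDesk = ctx.spawnAnonymous(Behaviors.receiveMessage[PlaceOrder] { order =>
      order.replyTo ! OrderConfirmed(order.quantity)
      Behaviors.stopped
    })
    // ... and that fresh reference is handed back to the interlocutor as "where to continue".
    msg.replyTo ! Quote(BigDecimal(42), orderDesk)
    Behaviors.same
  }
}
~~~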
124 | 125 | ## Vantage point: sub-Actors 126 | 127 | The improved model could be described as an implicit packaging of channel creation with Process creation and the removal of the argument to `read` operations—reading only has access to the single input channel of a Process. Sequential composition could choose to reuse the same channel, parallel composition would need to communicate results via previously established continuation channels if appropriate. 128 | 129 | The advantage over bare Akka Typed would be the removal of boilerplate code to create and install continuation behaviors, plus the reuse of a single Actor as a scheduling unit makes this fine-grained usage feasible without incurring forbiddingly high overhead in terms of Actor creation and inter-thread messaging. It would mean that step-wise definition of behavioral processes within Actors can conveniently be written down, reused, and composed. 130 | 131 | ## Have we reached the top? 132 | 133 | In the beginning I declared that I have learnt something about the statement “Actors model distribution exactly.” The learning consists in realizing just how exactly this model fits to the problem: there does not seem to be any room between the features of the Actor Model and the semantics of distributed systems. In particular, while for example the Wikipedia page on process calculi states that π-calculus and the Actor Model can be seen as duals, I do no longer think that this is true—to me it seems that π-calculus offers too rich a feature set in order to allow infinitely scalable implementations. This aspect triggered some more research, in particular about the [expressiveness of asynchronous π-calculus](https://arxiv.org/abs/cs/9809008) (thanks to Chris Meiklejohn for the pointer!); I also encountered a very helpful [FAQ about π-calculus](https://cs.cmu.edu/~wing/publications/Wing02a.pdf) that explains its intended use. My conclusion is that the tools of π-calculus are interesting for formal description and verification of protocols—where not all possible expressiveness of the calculus is actually used—and that it is not suitable as direct inspiration for end-user API. 134 | 135 | On the other hand I did not find a true formalization of the Actor Model into a calculus in the sense that it becomes mathematically tractable in a similar fashion to other calculi, with equivalence and congruence relations and all the nice theorems that follow. It might well be that instead of such a formalization we need to derive constraints on an Actor’s behavior from external protocol descriptions, lifting them to the source level by using code generation (e.g. by encoding the whole session as Alceste has done, or by generating suitably linked message classes, depending on how much safety can be achieved with a reasonable end-user API). Or we need to extract the Actor’s actions in an abstract behavior tree that can be represented as π-calculus processes, to be analyzed externally. My main conceptual difficulty is that the primitive action of sending a message unreliably and with arbitrary delay maps to a non-trivial π process, leading to combinatorial explosions in terms of reduction possibilities; but I am not (yet?) ready to abandon the goal of having the most basic construct be efficiently implementable even for infinitely scalable systems. 136 | 137 | My current takeaway is that we should first try out composable sub-Actors and any other such model that others can think up. And then we take it further from there. 
138 | 139 | _For the second part please see [Composing Actor Behavior](02_composing_actor_behavior.md)_ 140 | 141 | # Comments 142 | 143 | Please leave comments [on the pull request](https://github.com/rkuhn/blog/pull/1) or [on specific lines](https://github.com/rkuhn/blog/pull/1/files). 144 | 145 | --- 146 | _Writing space sponsored by [BAYMARKETS](http://baymarkets.com/) in Stockholm (tack så mycket!)_ 147 | -------------------------------------------------------------------------------- /02_composing_actor_behavior.md: -------------------------------------------------------------------------------- 1 | Roland Kuhn, Jan 21, 2017 2 | 3 | # Composing Actor Behavior 4 | 5 | In my [previous post](01_my_journey_towards_understanding_distribution.md) I took you on a journey towards a better understanding of distributed computing. The journey ended—somewhat dissatisfactorily—with the insight that due to its inherent properties the Actor model is very well suited for describing distributed systems, but reasoning about it is more difficult than for π-calculus. As @nestmann has [pointed out in the comments](https://github.com/rkuhn/blog/pull/1#issuecomment-266908620) there is a connection between these aspects, π-calculus is not in the same “class of distributability” as the Actor model, Join-calculus, or the localized π-calculus. 6 | 7 | Since I do not feel competent to contribute to the theoretical discourse on these topics, I have worked on a concrete implementation of the sub-actor concept mentioned at the end of the previous post. The result is an expressive toolbox for Actor behaviors that should combine well with an adaptation of @alcestes’ [lchannels](https://github.com/alcestes/lchannels) library: generating message classes that represent the succession of protocol steps in a session type. 8 | 9 | ## The basic abstraction 10 | 11 | Since the term “sub-actor” is a bit awkward—also typographically—I will call the basic building block a _process_. Every process describes a sequence of operations (e.g. awaiting a message, querying the environment, etc.) and is hosted by an Actor. These Actors are not directly visible in the programming abstraction, the implementation comes from the library and contains an interpreter for multiple concurrent processes. A very simple first program illustrates the basic setup: 12 | 13 | ~~~scala 14 | import akka.typed._ 15 | import akka.typed.ScalaProcess._ 16 | 17 | object FirstStep extends App { 18 | 19 | val main = 20 | OpDSL[Nothing] { implicit opDSL => 21 | for { 22 | self <- opProcessSelf 23 | actor <- opActorSelf 24 | } yield { 25 | println(s"Hello World!") 26 | println(s"My process reference is $self,") 27 | println(s"and I live in the actor $actor.") 28 | } 29 | } 30 | 31 | ActorSystem("first-step", main.toBehavior) 32 | } 33 | ~~~ 34 | 35 | This code is taken from [the process demo project](https://github.com/rkuhn/akka-typed-process-demo/blob/58731693461899813e414f0e9d9a09fe580e62c6/src/main/scala/com/rolandkuhn/process_demo/FirstStep.scala) which contains a modified version of the the Akka Typed artifacts that contains the process DSL. Running this program with sbt looks like the following: 36 | 37 | ~~~ 38 | rk:akka-typed-process-demo rkuhn$ sbt run 39 | [... snip ...] 40 | [warn] Multiple main classes detected. 
Run 'show discoveredMainClasses' to see the list 41 | 42 | Multiple main classes detected, select one to run: 43 | 44 | [1] com.rolandkuhn.process_demo.AskPattern 45 | [2] com.rolandkuhn.process_demo.FirstStep 46 | [3] com.rolandkuhn.process_demo.HelloWorld 47 | [4] com.rolandkuhn.process_demo.Parallelism 48 | 49 | Enter number: 2 50 | 51 | [info] Running com.rolandkuhn.process_demo.FirstStep 52 | Hello World! 53 | My process reference is Actor[typed://first-step/user/$!a#0], 54 | and I live in the actor Actor[typed://first-step/user#0]. 55 | ~~~ 56 | 57 | We see the result of the `println` statements in the `yield` block of the main process. The friendly greeting is nice, but more interesting are the two identities that are printed on the final lines: `opProcessSelf` is a handle for the input channel of the current process while `opActorSelf` is a handle for the main input channel of the Actor that hosts this process. As you can see, these two are in a parent–child relationship according to Akka `ActorRef` rules: the process has its own reference named `$!a` under the namespace of the parent. That parent in the example is the guardian actor for the ActorSystem, always named `/user`. The `#0` part to the right should disambiguate different incarnations of an Actor, but that is not yet fully implemented. 58 | 59 | Processes are always constructed within a lexical scope defined by an `OpDSL` instance. This is necessary because a process has exactly one ingress point for messages and the type of these messages is fixed by the OpDSL type parameter; this parameter was `Nothing` in the example above because that process does not receive any messages. We demonstrate the use of different process context by building a slightly less trivial “hello world” example. First, we construct a server process that listens for `WhatIsYourName` messages and responds with the string `"Hello"`. 60 | 61 | ~~~scala 62 | case class WhatIsYourName(replyTo: ActorRef[String]) 63 | 64 | val sayHello = 65 | OpDSL.loopInf[WhatIsYourName] { implicit opDSL => 66 | for { 67 | request <- opRead 68 | } request.replyTo ! "Hello" 69 | } 70 | ~~~ 71 | 72 | The `OpDSL.loopInf` constructor will construct and run its argument in an indefinite sequence, in this case alternating between awaiting a request with `opRead` and sending a response to the `ActorRef` that was contained in the request. A slightly different formulation might use pattern matching within the for-comprehension, as shown in the definition of the second process we need for our “hello world”. 73 | 74 | ~~~scala 75 | val theWorld = 76 | OpDSL.loopInf[WhatIsYourName] { implicit opDSL => 77 | for { 78 | WhatIsYourName(replyTo) <- opRead 79 | } replyTo ! "World" 80 | } 81 | ~~~ 82 | 83 | The clou of the process abstraction is that the `sayHello` and `theWorld` values can be used as building blocks for larger behaviors by composing them. The main process we are building here will first obtain its own handle—prepared to receive strings from the two processes defined above—and then fork or spawn the two helpers. In each case a `WhatIsYourName` request is sent and the response is read. Finally, both responses are combined in a single output statement. 84 | 85 | ~~~scala 86 | val main = 87 | OpDSL[String] { implicit opDSL => 88 | for { 89 | self <- opProcessSelf 90 | 91 | hello <- opFork(sayHello.named("hello")) 92 | _ = hello.ref ! WhatIsYourName(self) 93 | greeting <- opRead 94 | 95 | world <- opSpawn(theWorld.named("world")) 96 | _ = world ! 
MainCmd(WhatIsYourName(self)) 97 | name <- opRead 98 | 99 | } yield { 100 | println(s"$greeting $name!") 101 | hello.cancel() 102 | } 103 | } 104 | ~~~ 105 | 106 | Running [this process](https://github.com/rkuhn/akka-typed-process-demo/blob/58731693461899813e414f0e9d9a09fe580e62c6/src/main/scala/com/rolandkuhn/process_demo/HelloWorld.scala) will print the expected `Hello World!` to the console. The greeting is constructed from the inputs of two processes running concurrently to the main process: 107 | 108 | * the one named “hello” is forked, which means that it is hosted by the same Actor as the main process; therefore this process must also be canceled at the end, otherwise its infinite loop would keep waiting for messages indefinitely 109 | * the one named “world” is spawned as the main process of a real child actor, meaning that it can also be executed in parallel (given enough CPU cores) instead of sequentially sharing the CPU time allotted to the guardian actor with its main process; since the child Actor will need to also receive some internal management messages, we need to wrap the message destined for the main process in a `MainCmd` envelope. 110 | 111 | So far we have discussed the basic operations of running process steps sequentially, concurrently, or in parallel. With one addition we will be able to formulate our own abstractions on top of this foundation. 112 | 113 | ## Sequential composition 114 | 115 | The main concern with compositionality of typed processes has been discussed in the previous post under [the path towards compositionality](https://github.com/rkuhn/blog/blob/master/01_my_journey_towards_understanding_distribution.md#the-path-towards-compositionality): if a process is characterized by the type of messages it can receive, then doing two different activities in sequence implies having first one type and then another. Typestate (see [the paper from 1986](http://dl.acm.org/citation.cfm?id=10693) by R.E. Strom and S. Yemini) may be able to model such a transition, but no mainstream programming language includes this capability today. We work around this restriction by introducing `opCall`. This operation spawns the given process within the same host Actor, but instead of running concurrently the caller is suspended until the called process has run to completion and returned its computed value. The called process is free to use a differently typed `ActorRef` for itself, allowing interactions that do not affect the caller in any way—the called process is encapsulated just as a function call that only returns its final value. 116 | 117 | Since it is so common to send a request and expect back a response, we demonstrate this feature by creating a new building block, the “ask” operation. 118 | 119 | ~~~scala 120 | def opAsk[T, U](target: ActorRef[T], msg: ActorRef[U] => T): Operation[U, U] = 121 | OpDSL[U] { implicit opDSL => 122 | for { 123 | self <- opProcessSelf 124 | _ = target ! msg(self) 125 | } yield opRead 126 | } 127 | ~~~ 128 | 129 | This operation assumes to be run in a process whose message type is `U` and it will produce a result of that same type. To do that, it obtains its own process handle, uses the given function to insert the handle into the request message, sends the request to the target `ActorRef`, and returns the result of awaiting the response message. Using this new operation our “hello world” example becomes quite a bit shorter. 
130 | 131 | ~~~scala 132 | val mainForName = WhatIsYourName andThen MainCmd.apply _ 133 | 134 | val main = 135 | OpDSL[Nothing] { implicit opDSL => 136 | for { 137 | hello <- opSpawn(sayName.named("hello")) 138 | greeting <- opCall(opAsk(hello, mainForName).named("getGreeting")) 139 | world <- opFork(sayName.named("world")) 140 | name <- opCall(opAsk(world.ref, WhatIsYourName).named("getName")) 141 | } yield { 142 | println(s"$greeting $name!") 143 | world.cancel() 144 | } 145 | } 146 | ~~~ 147 | 148 | Sending the first request to the spawned process in the child Actor is a little more complicated since the request needs to be wrapped additionally inside a `MainCmd` envelope—the Actor uses a set of internal messages for triggering the interpreter and this envelope allows messages to be forwarded to the main process after unwrapping them. Another formulation for that function value would be `(replyTo: ActorRef[String]) => MainCmd(WhatIsYourName(replyTo))`. 149 | 150 | Running [this process](https://github.com/rkuhn/akka-typed-process-demo/blob/4dc1d1c3a197ebbf51c45fd189119baa2f2d1fcb/src/main/scala/com/rolandkuhn/process_demo/AskPattern.scala) does not print the expected greeting, though: instead it prints `hello user!`. The reason lies in the implementation of `sayName` that demonstrates the difference between spawning and forking—hence the “hello” comes from the child Actor’s name while the “user” comes from the guardian actor that hosts the forked process. 151 | 152 | ~~~scala 153 | val sayName = 154 | OpDSL[WhatIsYourName] { implicit opDSL => 155 | for { 156 | self <- opActorSelf 157 | } yield OpDSL.loopInf { _ => 158 | for { 159 | request <- opRead 160 | } request.replyTo ! self.path.name 161 | } 162 | } 163 | ~~~ 164 | 165 | The trick here is that the `ActorRef` contains the main process’ name as the last segment of its `ActorPath`. This snippet also shows how to start an infinite server process after doing some initialization. 166 | 167 | ## Adding parallelism 168 | 169 | The basic functionality for allowing parallel execution is already present in the form of `opSpawn` but that operator ignores the spawned process’ result value. In order to speed up lengthy computations we should be able to build an abstraction that can turn a list of process descriptions into a list of results, running computations in parallel. This is an interesting example because it corroborates the completeness of the provided process algebra. 170 | 171 | ~~~scala 172 | def opParallel[T](processes: Process[_, T]*)(implicit opDSL: OpDSL): Operation[opDSL.Self, List[T]] = { 173 | 174 | def forkAll(self: ActorRef[(T, Int)], index: Int, procs: List[Process[_, T]])(implicit opDSL: OpDSL): Operation[opDSL.Self, Unit] = 175 | procs match { 176 | case x :: xs => 177 | opFork(x.foreach(t => self ! (t -> index))) 178 | .flatMap(s => forkAll(self, index + 1, xs)) 179 | case Nil => opUnit(()) 180 | } 181 | 182 | opCall(OpDSL[(T, Int)] { implicit opDSL => 183 | for { 184 | self <- opProcessSelf 185 | _ <- forkAll(self, 0, processes.toList) 186 | results <- OpDSL.loop(processes.size)(_ => opRead) 187 | } yield results.sortBy(_._2).map(_._1) 188 | }.withMailboxCapacity(processes.size)) 189 | } 190 | ~~~ 191 | 192 | Constructing this abstraction comes with a bit more boilerplate because we will be using `opCall` to run the processes that oversees the parallel execution, awaiting the list of results. 
This `opCall` needs to know the process context it is embedded into in order to return the right type—it does not care about its self reference and can thus provide the type of the process that is constructed in the calling scope. Scala’s implicit arguments model exactly this situation (and Dotty’s [implicit function types](https://www.scala-lang.org/blog/2016/12/07/implicit-function-types.html) will make this boilerplate go away eventually), the implicit `OpDSL` instance conveys the type information through the type member `Self`. 193 | 194 | The implementation of parallel execution proceeds in four steps: 195 | 196 | * first obtain a self-reference capable of receiving pairs of result value and integer 197 | * then fork all given processes, appending a step that sends the computed value together with the sequence index of that process to the self-reference 198 | * use a loop constructor to read the right number of responses, resulting in an unordered list 199 | * sort the list according to the original sequence and return only the result values 200 | 201 | Calling the process that collects results from the parallel tasks requires us to think about a detail that has not played a role so far. Every process handle is backed by a message queue, where senders enqueue and `opRead` operations dequeue items. The default queue capacity is 1 because it is very common to not have multiple messages outstanding. Exceptions would be server processes or other processes that employ non-determinism in the form of receiving from multiple senders without coordination. The process we are building will need to be able to receive a number of messages equal to the size of the list of processes it is asked to query in parallel. Configuring the mailbox capacity (using `withMailboxCapacity`) or a few other options is the difference between an `Operation` (i.e. a sequence of steps) and a `Process`. 202 | 203 | Using this new abstraction we can make our “hello world” go parallel: 204 | 205 | ~~~scala 206 | val main = 207 | OpDSL[Nothing] { implicit opDSL => 208 | for { 209 | hello <- opSpawn(sayName.named("hello")) 210 | world <- opSpawn(sayName.named("world")) 211 | List(greeting, name) <- opParallel( 212 | opAsk(hello, mainToName).withTimeout(1.second), 213 | opAsk(world, mainToName).withTimeout(1.second) 214 | ) 215 | } yield { 216 | println(s"$greeting $name!") 217 | } 218 | } 219 | ~~~ 220 | 221 | Parallelism itself is enabled by spawning the processes that compute the answers in child Actors, then we use `opParallel` to ask both of them, returning the ordered list of responses. One noteworthy aspect here is that we set a timeout of 1 second for each of the ask operations—if any one of these expires the whole Actor hosting these processes will fail. This choice of linked failure has been made because allowing concurrency (of a specifically restricted kind) within an Actor is already stretching it—allowing processes to fail independently would make Actors distributed on the inside, and that way lies madness. 222 | 223 | ## Current status and future plans 224 | 225 | The process DSL presented in this blog post is currently an open [pull request](https://github.com/akka/akka/pull/22087) towards Akka, awaiting some naming discussions (in particular on the “op” prefix for the built-in operations). I am very happy with how the encoding has worked out and am reasonably confident that a very similar process DSL will soon land in an Akka release near you. 
If you want to play around with the current code, please clone the [akka-typed-process-demo](https://github.com/rkuhn/akka-typed-process-demo) repository. It also contains source jars for the embedded implementation jar. **For eclipse users: please ensure that you select a 2.12 Scala installation for compilation, otherwise you will encounter binary incompatibility issues.** 226 | 227 | An aspect that is not discussed in this post is that the proposed DSL also contains operations that embed a type-indexed set of state monads within each Actor, allowing processes to keep state, possibly share it, and in a future version also persist it—this is the foreseen integration point with Akka Persistence. You can follow the discussion [on akka-meta](https://github.com/akka/akka-meta/issues/7). 228 | 229 | The next step will be the integration of this process DSL with automatically derived protocol message types, allowing an already useful subset of the static protocol verification that is possible with the Scribble language. While not guaranteeing perfect safety, it will make many aspects of Actor interactions accessible to rigorous compile-time checks. 230 | 231 | # Comments 232 | 233 | Please leave comments [on the pull request](https://github.com/rkuhn/blog/pull/2) or [on specific lines](https://github.com/rkuhn/blog/pull/2/files). 234 | -------------------------------------------------------------------------------- /03_local-first_device_management.md: -------------------------------------------------------------------------------- 1 | # Local-First Device Management 2 | 3 | This morning I set up a new printer at home. 4 | For reasons that will become clear soon below, I was reminded of the first time I set up a printer in 1988: 5 | I connected the power cord and the DB25 cable to the Epson LQ-500, hooked it up to the Atari Mega-ST2, started Signum! and very contently listened to how the bulky printer translated the parallel port’s digital pulses into tiny needle thrusts that left black marks on the paper. 6 | 7 | 33 years later the experience couldn’t be more different. 8 | Information technology has developed astounding capabilities and changed the way we interact with our planet, and it has done so by adding many layers of indirection. 9 | Setting up the printer today entailed connecting the power cord, going to the manufacturer’s website, downloading of the order of 100 million bytes of software, starting it, and anxiously following its completely opaque progress — abiding by the vague requirement to “stay in the vicinity of the printer”. 10 | At some point the printer made a sound and some restarts later it appeared in the system settings. 11 | I could now use the downloaded program to configure the printer, but I soon noticed that that only works via the cloud, demanding a working internet connection. 12 | The printer has a built-in web server that allows local access, but getting there is most inconvenient because my web browser keeps on insisting that the authenticity of that “website” cannot be ascertained; 13 | less determined users will simply give up. 14 | 15 | The stark contrast between these two experiences caused me to appreciate the growth in complexity that end-users have to dig through today when things are not working. 16 | Coming from the background of [local-first cooperation](https://local-first-cooperation.github.io/website/), the follow-up question is: 17 | How *should* this work? 18 | What are the absolutely required complexities and what is decorum? 
19 | In the following we’ll attempt an approximation of an answer. 20 | 21 | ## Ownership and physical access 22 | 23 | In my household there are two kinds of electronic devices: 24 | those with lots of customisation and therefore a strong connection to my person, and all the rest. 25 | The first category comprises my phone, tablet, and laptop whereas the second category ranges from the central heating controller over chargers, printers, headphones, powerline adapters to set-top boxes and the internet router. 26 | The categories only differ in the fact that I exert a level of ownership over the former that I need to explicitly relinquish for another person to make use of them, whereas the latter can just be reappropriated by anyone who lays their hands on them — they all have a factory reset procedure or an admin password printed on the back. 27 | 28 | From this viewpoint I should be able to take my printer into operation by just having it in front of me, plugging in cables and pressing buttons. 29 | 30 | ## Administrative power 31 | 32 | For a simple device or one that has a user interface for human interaction, it is reasonable to treat me — the one who took it into operation — as the person from whom the device shall accept commands. 33 | This can be secured by a PIN that I choose during setup so that someone else will have to perform a factory reset before they can control the device. 34 | 35 | In our networked world of many devices this frequently is not enough. 36 | The printer, for example, should allow my laptop to send it print jobs, but it initially has no way of identifying the laptop and linking it to me. 37 | Once this link is established, I will be able to print but also to perform more complex workflows like changing configuration settings or telling the printer whom else it should serve or trust. 38 | 39 | So how shall we bootstrap this process? 40 | 41 | First, every device needs an unforgeable identity. 42 | As cryptography and cryptanalysis evolve, the implementation details will keep changing over the decades. 43 | Currently, a common choice is to tie the identity to an ed25519 private key and use the corresponding public key as the “name” for the device. 44 | This way each communication session between two devices can be authenticated by an ECDH key exchange in which each side proves to the other that it possesses the needed private key for their name without revealing that key. 45 | 46 | Second, we shall make use of physical proximity to initiate a communication: 47 | I might press some buttons on the printer to tell it to await an incoming session, then initiate the session from my laptop (assuming something akin to mDNS discovery). 48 | The printer can use the signal strength, timing, and possibly a short code to validate that the session is indeed the expected one. 49 | Once established, the session can be used to initialise the trust settings of the printer to recognise my laptop in the future. 50 | While I’m at it, I’ll probably also authorise my phone, my watch, my direct neural interface, … I digress. 51 | 52 | ## Privilege delegation 53 | 54 | In many cases, I’ll want to enable more things to cooperate. 55 | The printer should also allow other family members to print and tell them about ink levels — although they may not be authorised to pass along their privileges to others. 56 | This could be done on the level of individual devices. 
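Before moving on to delegation, a minimal sketch of my own may make the identity mechanics more tangible. It is not a protocol specification: it uses the JDK's built-in Ed25519 support (Java 15 or newer) and shows a plain challenge-and-signature round as the simplest way to prove possession of a private key, whereas the text above describes the richer ECDH-based session setup; all names are invented.

```scala
import java.security.{ KeyPairGenerator, PublicKey, Signature }

object DeviceIdentity {
  // The device identity: an Ed25519 key pair whose public key serves as the device "name".
  private val keyPair = KeyPairGenerator.getInstance("Ed25519").generateKeyPair()

  def name: PublicKey = keyPair.getPublic

  // Prove possession of the private key by signing a challenge chosen by the peer.
  def prove(challenge: Array[Byte]): Array[Byte] = {
    val sig = Signature.getInstance("Ed25519")
    sig.initSign(keyPair.getPrivate)
    sig.update(challenge)
    sig.sign()
  }

  // The peer checks the proof against the claimed public key, never seeing the private key.
  def verify(claimedName: PublicKey, challenge: Array[Byte], proof: Array[Byte]): Boolean = {
    val sig = Signature.getInstance("Ed25519")
    sig.initVerify(claimedName)
    sig.update(challenge)
    sig.verify(proof)
  }
}
```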
57 | 58 | If passing along privileges is fine there is an interesting alternative: 59 | giving a name to the privilege so that it can be stored and sent around. 60 | Like for cryptographic device names this means creating an ed25519 key pair and associating the privilege with the public key as a name. 61 | Then anyone is allowed to print who can prove by an ECDH key exchange that they possess the corresponding private key. 62 | 63 | With this in place I could create a QR code containing the printer’s identity and the private key that authorises printing so that guests can also print if I choose to show them the code. 64 | The downside is that from that moment on I must assume that they retain the ability to prove possession of the right to print, so I’ll have to revoke it at intervals or whenever necessary and create a new key pair for this purpose. 65 | 66 | ## Conclusion 67 | 68 | Based on the reasoning above it is clear that local-first cooperation with household electronics is definitely possible. 69 | We possess the cryptographic tools to make it secure and express the same behaviour as for my old LQ-500 — with the added convenience that I can now print via wifi. 70 | 71 | This does not mean that cloud-based business models are out of the question. 72 | Today’s printer feels more like buying the service of having a *point of presence* for printing in my home, with ink cartridges being ordered automatically. 73 | Any number of services could be offered that are enabled by the vast resources available in the cloud, even though I personally prefer the basic function of colouring spots on paper to work without an internet connection. 74 | 75 | The part that I want to change is the weight of physical possession. 76 | Even if my printer is not owned but just a rented printing service, I still want to retain the right and ability to transfer ownership of my end of this deal to someone else by just handing them the device. 77 | There is no cloud interaction necessary in such workflows, as shown above. 78 | Local-first cooperation is not dogmatic, it only demands that those features that can work in a purely local fashion should really do so. 79 | 80 | ## Epilogue 81 | 82 | I left some parts of this topic for later posts, most notably the data models and communication protocols involved in the interaction between devices like my laptop and printer for the purpose of exercising ownership and administration. 83 | And also the corner we have painted ourselves into by creating an HTML/CSS/JS monster that we can only trust based on delicate interactions between web browser standards and centralised identity proofs that simply don’t work for local services. 84 | -------------------------------------------------------------------------------- /04_freedom.md: -------------------------------------------------------------------------------- 1 | # Taking a walk 2 | 3 | Yesterday we had a lot of snow and today I woke up to −16°C and sunshine — I decided to take a walk. 4 | While I was out there, following my whims, I realised that we have lost something essential, something core to our being. 5 | 6 | When I had made my decision, I put on warm clothes, opened the door, and started walking. 7 | There was no process, not even a thought spent on asking someone’s permission, justifying my need, or even explaining what I was doing. 8 | 9 | This is freedom. 10 | 11 | If I had to justify what I was doing, to others or even to myself, then I wouldn’t be free. 
12 | If someone else could demand that I explain myself to them, then I wouldn’t be free. 13 | Even if I just had to give a motivation what what I was doing, then I wouldn’t be free. 14 | 15 | The last point may sound pedantic, but it is essential: 16 | if all our actions followed from permitted extrapolations of what we were before, then life would be deterministic, we would not have freedom of choice. 17 | Whether this choice is an illusion created by some invisible hand is an interesting question, but not relevant here. 18 | My concern is my human hunger for heartfelt freedom. 19 | 20 | How have we lost it? 21 | 22 | I must admit that the above contains a small contradiction: 23 | how can we have lost something that is essential to us, as human beings? 24 | The more precise description is that we have subdued it, we have lost sight of it, we have allowed our freedom to be put into a shrine, out of reach. 25 | 26 | There are many ways in which this has happened all over the world, to varying degrees. 27 | Our freedom is curtailed wherever we have to answer to someone, whether they represent legal authority(*), sheer brutality, or some higher cause. 28 | And with today’s communication platforms everyone can demand an explanation from everyone else for anything they do. 29 | We have seen oppressive systems in the past, developing from extremely labour-intensive physical means to ever more sophisticated hierarchical power exertion, and all of these were supplemented by peer pressure. 30 | The latter has now become a central element, examples abound from politics over culture to the sciences, independent of background — political or otherwise. 31 | 32 | Perhaps this last insight, bitter as it is, bears a glimmer of hope: 33 | we have not actually lost our hunger for freedom, we are just channeling it in unhealthy ways. 34 | People are substituting their lost freedom by usurping power over others, sometimes petty and sometimes for a presumed higher cause. 35 | 36 | I am convinced that by giving people back their freedom, by not stifling them anymore, we can again talk to each other to sort out our problems. 37 | We do have a lot that we should talk about in earnest! 38 | But the current oppressive way of trying to deal with our issues — great and small — is leading us into a dead end. 39 | 40 | --- 41 | 42 | (*) The rule of the law is the single most important invention in human civilisation, it is the basis for everything we have achieved, so we must respect legal authority. 43 | The onus is on the legislative to very carefully weigh everyone’s freedom before restricting it by law — and we are not very good at this part yet. 44 | -------------------------------------------------------------------------------- /05_actors_are_low-level.md: -------------------------------------------------------------------------------- 1 | # Actors are a low-level tool 2 | 3 | A few days ago Sergey Bykov published an article on [why he doesn’t use the term Actor anymore](https://docs.temporal.io/blog/sergey-the-curse-of-the-a-word/). 4 | I’ve known Sergey for a few years now, we met at several conferences, were in program committees together and we’re both on the Reactive Foundation’s advisory council. 5 | But funny enough, our first contact was precisely one of those discussions he mentions, leading to [an in-depth comparison between Akka and Orleans](https://github.com/akka/akka-meta/blob/master/ComparisonWithOrleans.md). 
6 | 7 | Since May 2015 my own understanding of distributed computing has evolved as well. 8 | The first step was to recognise just [how precisely the Actor Model characterises distributed systems](https://github.com/rkuhn/blog/blob/master/01_my_journey_towards_understanding_distribution.md). 9 | But after that I started building programming tools for automating high-level workflows on the factory shop floor, and I realised that Actors by themselves are not all that useful. 10 | They are too low-level. 11 | 12 | ## Actors are a concurrency and distribution _primitive_ 13 | 14 | Implementations of the Actor Model offer an API that is rather small and seemingly simple: a message receiver is run in a loop and there is a handle for sending messages to it. 15 | Since this API is larger than — say — that of a mutex, we are tricked into believing that an Actor is a higher-level construct. 16 | This impression is corroborated by the fact that Actor runtimes employ mutexes or atomic variables under the hood. 17 | 18 | But the impression is still incorrect. 19 | An Actor is a programming primitive quite like a mutex or a promise/future, but it has two parts that are easily conflated. 20 | The first part is the description of the message processing loop, which often is used as a concurrency control structure (processes “one message at a time”). 21 | The second part is the Actor reference, the handle that allows sending messages. 22 | This is the part that makes an Actor useful in a distributed setting, since Actor references can usually be sent across the network. 23 | 24 | As I argue in the article linked above, the resulting package allows exactly the expression of distributed programs. 25 | A distributed system is built from a group of Actors, each one being one primitive building block. 26 | Designing and implementing Actors therefore requires a comprehensive understanding of distributed systems, the API forces the programmer to take a corresponding viewpoint. 27 | 28 | This is really nice and powerful if you want to write a library that solves some problem using a distributed system: you get to work with the real thing, gloves off, hands dirty, but you’re in full control. 29 | As an end-user API for people from a business background this is less suitable, and we’ll get back to how this observation surfaced in Akka. 30 | 31 | ## One Actor is always local 32 | 33 | Going back to the original definition, an Actor is an entity living somewhere on a network node, tied to and identified by its mailbox. 34 | It takes one message out of the mailbox, processes it, then starts over. 35 | The Actor is created at some point in time and it may choose to become “inert” at a later point in time (which is equivalent to being stopped and re-routing its mailbox into Nirvana) — in other words, an Actor has a linear lifecycle. 36 | 37 | Saying that one particular Actor is distributed does not make much sense because according to the rules it can only process one message at a time anyway. 38 | Actors are building blocks for distributed systems, one Actor is not even a system, let alone a distributed one. 39 | 40 | This presents another piece of evidence that the Actor Model doesn’t really solve high-level problems. 41 | Business use-cases often require distributed systems for redundancy and fail-over, so that the resulting business solution has the resilience it requires. 
42 | Business entities can therefore not have a 1:1 relationship with Actors, such an entity will need to be an abstraction over a group of Actors that live in different locations. 43 | 44 | ## How do I tell my local CPU how to run my Actor? 45 | 46 | Given that I have designed an Actor as a solution to one of my problems, how do I write that down? 47 | The design will describe the accepted messages, the state managed by the Actor, and the logic that determines what to do with each received message. 48 | The first two parts are types and data structures while the last part is a procedure. 49 | 50 | Taking a step back, what does an _atomic integer_ require of me? 51 | I only need to provide an initial value and then I can use the methods provided, like `get_and_add` or `compare_exchange`. 52 | In case of a _mutex_ I use the provided constructor and then I can `lock` and `unlock` it, dividing my program into regions inside and outside of the exclusive zone. 53 | One more level up, a _future_ is a handle for a value that may be provided at a later point in time. 54 | In order to use it I need to describe a computation or some external resource and then my program uses callbacks to consume that result when it becomes available. 55 | 56 | The funny thing is, `async`/`await` has been added to many programming languages as a tool for working with futures, but what this language construct allows you to write is the definition of an asynchronous _procedure_. 57 | This is exactly what we need when describing how an Actor should act. 58 | If you want to form a mental model of how an Actor works, my recommendation is to picture an asynchronous loop consuming messages from a queue. 59 | The Actor reference is nothing but the sending side of that queue. 60 | 61 | As most of my daily work is done in Rust nowadays, here’s how that could look: 62 | 63 | ```rust 64 | async fn pong(mut mailbox: Mailbox<Ping>) -> Result<(), SenderGone> { 65 | let mut count = 10; 66 | while count > 0 { 67 | let Ping { mut reply } = mailbox.next().await?; 68 | reply.send(count); 69 | count -= 1; 70 | } 71 | Ok(()) 72 | } 73 | ``` 74 | 75 | The compiler will turn this into a state machine that suspends when it hits `.await`, keeps track of how many iterations remain, and returns “success of unit” when done. 76 | This state machine implements the `Future` contract, so that the function call result of `pong(mailbox)` can be spawned as a task on a futures runtime. 77 | I mention this here to make it dead obvious that each Actor will need a CPU to run on whenever a message needs to be processed. 78 | The corresponding complexity of providing this infrastructure is another reason why I consider Actors as low-level tools. 79 | 80 | ## High-level business logic requires other abstractions 81 | 82 | This whole article was sparked by Sergey’s post, which is mainly about a higher-level — and much more useful — programming abstraction. 83 | He describes Orleans “grains” as cloud-native objects, as persistent entities with a business meaning. 84 | A grain just exists somewhere in a silo, which is a cluster of cloud nodes in some computing center. 85 | The important part is that a grain implements some workflow, it describes and defines an object in the virtual space — which may well have close ties to an object or process in the real world. 86 | The programmer is freed from the concerns of when and where to schedule the evaluation of a grain or how to ensure the persistence of its state.
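To make "persistent entity with a business meaning" concrete, here is a minimal sketch of my own in Scala, using Akka Persistence Typed rather than Orleans; the shopping-cart domain and every name in it are invented for illustration.

```scala
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{ Effect, EventSourcedBehavior }

object Cart {
  sealed trait Command
  final case class AddItem(name: String) extends Command

  sealed trait Event
  final case class ItemAdded(name: String) extends Event

  final case class State(items: List[String])

  // The entity is addressed by a stable business identity and outlives restarts;
  // where it runs and how its events are stored is the runtime's concern.
  def apply(cartId: String): EventSourcedBehavior[Command, Event, State] =
    EventSourcedBehavior[Command, Event, State](
      persistenceId = PersistenceId.ofUniqueId(s"cart-$cartId"),
      emptyState = State(Nil),
      commandHandler = (_, cmd) => cmd match { case AddItem(name) => Effect.persist(ItemAdded(name)) },
      eventHandler = (state, evt) => evt match { case ItemAdded(name) => State(name :: state.items) }
    )
}
```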
87 |
88 | Akka added the [`PersistentActor`](https://doc.akka.io/docs/akka/current/persistence.html) API for the very same reasons; this API is a close cousin of Orleans’ grain.
89 | While some design details and choices are different, their _raison d’être_ is the same: Actors are too low-level, so there exists an obvious but non-trivial extension package that presents the programmer with a more comprehensive tool.
90 | Of course this larger package has already made some choices and it restricts the design space for the programmer, but that is exactly the reason why it is more useful.
91 |
92 | At Actyx I recently blogged about [Local Twins](https://developer.actyx.com/blog/2021/04/29/partial-connectivity-ux), which is another example of this kind.
93 | The design goal here is to offer replicated business logic with 100% availability in a peer-to-peer network; the logic always makes progress even if there is only a single device it can run on.
94 | While Actors are certainly a helpful underlying primitive, Local Twins are far more useful to application programmers since they include ready-made choices for handling persistence, domain modeling, and distributed conflict resolution.
95 |
96 | ## Conclusion
97 |
98 | Sergey looked at this topic from the perspective of teaching Orleans while using Actor vocabulary, which creates a number of difficulties.
99 | My take is that Actors were never meant to be a high-level abstraction in the realm of business logic.
100 | The short summary would be that we come to the same conclusion for very similar reasons, but use different paths to get there.
101 |
102 | The Actor Model is a precise characterisation of what each individual part of a distributed system can do.
103 | Business entities and workflows, on the other hand, describe the resulting behaviour that an underlying distributed system should achieve.
104 | Until we as an industry have gained an understanding of the link between individual Actors and the whole system’s emergent behaviour, we will have to assume that no single concept can be stretched over this whole range without breaking.
105 | We will thus continue to need higher-level abstractions to describe the business purpose, as well as low-level abstractions like Actors, futures, mutexes, sockets, etc. for the technical implementation.
106 |
--------------------------------------------------------------------------------
/06_time_in_local-first_systems.md:
--------------------------------------------------------------------------------
1 | # Time in Local-First Systems
2 |
3 | A lot has been written about the two ways of keeping track of time in distributed systems.
4 | The first way — chosen either by the careless or the particularly resourceful — is to use physical time, measured by clocks that advance at a fixed rate.
5 | Although broken by general relativity in principle, they work quite well if the system in question is constrained to this planet.
6 | The second way — the way you learnt about in your distributed systems lecture — is to use logical time, first proposed by Leslie Lamport.
7 | Here, clock ticks are driven by the communication between network nodes as well as the actions being performed on those nodes.
8 |
9 | While the above seems to cover all (i.e. both) angles, there is indeed a crack between them.
10 | This post explores use-cases that defy or evade all solutions known to me.
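Since Lamport clocks play a central role in what follows, here is a minimal sketch for reference; `LamportClock` and its methods are my own naming, not taken from any particular library. The jump in `observe` when a larger remote timestamp arrives is exactly what causes the sorting trouble described below.

```rust
/// Minimal Lamport clock sketch: the counter advances by one for every local
/// event, and jumps forward when a message carrying a larger timestamp arrives.
#[derive(Default)]
struct LamportClock {
    time: u64,
}

impl LamportClock {
    /// Called for every local event (including sending a message);
    /// returns the timestamp to attach to that event.
    fn tick(&mut self) -> u64 {
        self.time += 1;
        self.time
    }

    /// Called when a message stamped by another node is received.
    fn observe(&mut self, remote: u64) -> u64 {
        self.time = self.time.max(remote) + 1;
        self.time
    }
}

fn main() {
    let mut a = LamportClock::default();
    let mut b = LamportClock::default();
    // Node B is busy and ticks a lot while A is mostly idle.
    let t_a = a.tick(); // 1
    let t_b = (0..100).map(|_| b.tick()).last().unwrap(); // 100
    // When A receives a message from B, its clock jumps far ahead.
    assert_eq!(t_a, 1);
    assert_eq!(a.observe(t_b), 101);
}
```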
11 |
12 | ## Enter local-first cooperation
13 |
14 | In [local-first cooperation](https://www.local-first-cooperation.org/) we strive for autonomous nodes that work together with nearby peers when and if such peers are available.
15 | The characteristic of such networks is that there is no central authority that controls the nodes, meaning that for example the device clocks cannot be trusted across devices.
16 | And there is also no expectation of stable or long-lasting communication relationships: nodes form fleeting associations to exchange information and work together.
17 | In fact, many use-cases like remote maintenance workers or logistics vehicles require that individual devices are fine with being isolated over long stretches of time.
18 | No peers, and no internet either.
19 |
20 | ## Confusion ensues
21 |
22 | This is problematic for logical clocks.
23 | Imagine two nodes working together from time to time and doing relevant things in between their information exchanges.
24 | The logical clock of one device may advance at a much smaller rate than that of the other, for example because more is happening on that other device or it is in contact with other nodes that also drive the clock forward.
25 | Whenever the two nodes synchronise, the first node’s clock would make a large jump forward due to [Lamport’s algorithm](https://en.wikipedia.org/wiki/Lamport_timestamp).
26 |
27 | Since the purpose of a clock is to generate timestamps and then sort events by those timestamps, it is obvious that the first node’s events would tend to be sorted at the beginning while the second node’s events would mostly be sorted after those.
28 | In theory, this should be fine because sorting by Lamport timestamps only guarantees correct sorting between causally related events, and the events produced during a network partition are unrelated — they are concurrent.
29 | In practice, this makes it hard to write business logic that makes sense of the merged event history, though.
30 | The two nodes might be mobile devices used by persons, and these persons might have communicated some of the facts recorded, for example by shouting or by a phone call.
31 | Or a single person may have seen the two devices while wifi was down, and some of the events were created by that person on the first device only because the second device showed a relevant piece of status information.
32 |
33 | So the first problem is that causal links may be formed outside the system, and those causal links will not be recorded and thus also not be respected within the system.
34 |
35 | ## We’re addicted to time
36 |
37 | You’re addicted to something _exactly if_ you cannot stop using that thing even if you want to — you have no choice regarding this thing.
38 | Time itself is of this quality, our whole existence is defined by the linear forward progression of time, so each one of us is trapped by the concept of time.
39 |
40 | One of the consequences is that we frequently need and even want to measure the passage of time, we are compelled to quantify it.
41 | Therefore, we cannot disregard physical clocks, as they are the only mechanism we have to measure time.
42 | On the other hand we have not yet created clocks that we can trust across computing device boundaries.
43 | We have developed sophisticated mechanisms to keep clocks synchronised in a continuously functioning network, and we have created foundational clocks that demand massive operational effort to keep running, but we do not yet have a cheap and portable solution to the problem in the general case. 44 | 45 | This is the reason why the event trace of a distributed system may be difficult to represent in terms of temporal reporting. 46 | Consider that you receive some information on your mobile phone (e.g. about a parcel delivery) and you tap a button to confirm it (i.e. to open the delivery box for the courier). 47 | There is a clear causality between these two events, but if one of the clocks is off by a few seconds and your button tap happens quickly then the measured time interval between the two events may turn out to be negative. 48 | 49 | How do you report this? 50 | If you round up all negative time intervals to zero, then the sum of time intervals along a causality chain no longer matches the time that passed from beginning to end. 51 | And if you report the computed difference in timestamps, the result is nonsensical. 52 | 53 | Note how there is no option to report the actually correct real-world time period it took: we simply have no clock that could measure it. 54 | We can only choose between two incorrect options. 55 | 56 | ## How can we fix this? 57 | 58 | Here are a bunch of non-solutions: 59 | 60 | - require device clocks to be in sync before permitting information to flow 61 | 62 | This is not a solution because the information itself usually carries timestamps of past events. 63 | Requiring the devices to sync would only produce a false sense of safety, negative reported time intervals are not prevented. 64 | And depending on how much hassle this synchronisation presents for the user, the network may become unusable and thus effectively useless. 65 | 66 | - artificially advance Lamport clocks to approximate a physical clock 67 | 68 | This sounds enticing at first because causality will be mostly in line with physical time, but lagging clocks will still timestamp events such that they are sorted too early. 69 | And synchronising with a clock that is way ahead will switch the system again into pure Lamport mode because a correct clock will not make the same jump ahead. 70 | 71 | - record as much causality information as possible and sort concurrent events by physical timestamp 72 | 73 | This is probably the best we can do, but it only makes it somewhat less challenging to interpret concurrent events. 74 | Causally related events may still describe a timeline along which the physical clock jumps backwards, which can happen when events come from different devices or when the physical clock of one device needed a significant backwards correction. 75 | 76 | In summary, as far as I can see there is no real solution to this problem. 77 | Under the constraints that we can trust neither our physical clocks nor our ability to communicate, we are left without a mechanism for synchronisation. 78 | 79 | ## So what can we do? 80 | 81 | Within the constraints of pure local-first cooperation the only thing we can do is to record causality within event traces as best we can. 82 | The interpretation of physical timestamps in these traces will always be problematic as they are open to deliberate or accidental tampering. 83 | 84 | If we weaken the constraints, there are a few options: 85 | 86 | 1. 
Define and use **trusted physical clocks** that can be used to obtain qualified timestamps; this reduces the autonomy of network nodes, in particular their ability to record facts while they don’t have access to communication.
87 | 2. Employ a **consensus protocol** like Raft or Paxos to manage one centralised event trace with its timings; there probably are some subtleties in the “timings” part: nodes would effectively need to synchronise their physical clocks based on the central ledger.
88 | 3. Restrict deployment to **highly available network infrastructure** and pay the operational cost required for that; in this case any resulting negative time intervals will be anomalies due to operational failure, to be minimised by the standard operating procedures.
89 |
90 | Next to that, there is a grey area that I’ll explore more in the future.
91 | Instead of providing a rock-solid infrastructure solution to this problem, we may also restrict the application designer so that issues caused by clock skew become less important.
92 | The general idea is that if two devices intensely cooperate, they may reasonably be expected to communicate frequently enough so that their event traces are nicely ordered by causality links.
93 | And then we may use physical device clocks only to measure the relative passage of time since the most recent communication.
94 | A network of devices using such a scheme may well exhibit clock drift relative to UTC, for example due to uncertainties about message propagation delays.
95 | But at least for networks of limited size this may well be good enough.
96 |
--------------------------------------------------------------------------------
/07_freedom_from_fear.md:
--------------------------------------------------------------------------------
1 | _(deutsche Version [siehe unten](#freiheit-von-angst))_
2 |
3 | # Freedom from Fear
4 |
5 | Recently, the phrase «your freedom ends where mine begins» has been used more frequently in the context of the pandemic.
6 | We’ll get to sources of this pseudo-quote further down.
7 | The purpose of this article is to illuminate one particularly nasty way in which this apparently sound principle is abused and contorted.
8 |
9 | ## A thought experiment
10 |
11 | Imagine a conversation between two fictitious personas representing current strife:
12 | one who demands that others be forcibly vaccinated, and one who refuses to be forced — whether this second persona would voluntarily take the shot otherwise is a different question.
13 | Here, we deliberately choose to explore the situation where the first position is motivated and justified by fear:
14 |
15 | - fear of them or their loved ones getting infected
16 | - fear of severe illness and death
17 | - fear of new variants of the virus
18 | - fear of societal collapse
19 |
20 | Whether these fears correspond to actual risks is immaterial for this thought experiment, more on this later.
21 | The first persona is in a dreadful position: a joyful life is unlikely while such fears reside in a human mind.
22 | It thus seems reasonable to restore this persona’s ability to pursue happiness by removing the cause of the fear, i.e. by vaccinating all humans.
23 | Whether vaccination actually mitigates the risks is again immaterial, since fear is irrational.
24 |
25 | Taking stock, we have concluded that due to the freedom rights of the first persona everyone, including the second persona, needs to get vaccinated.
26 |
27 | Now, taking a look from the other perspective, it is only fair to assume that the second persona is also motivated by fear.
28 | They might be horrified by syringes, or terrified of a foreign substance circulating in their veins or of possibly fatal side effects.
29 | Following the same argument, it would again be reasonable to restore this persona’s ability to pursue happiness by foregoing vaccination.
30 | And since these fears are no less irrational than the others, it is again immaterial if there are actual risks corresponding to them.
31 |
32 | We have reached an impasse: if we apply the same rules and reasoning to both personas, then we must at the same time forcibly vaccinate and forego vaccination.
33 |
34 | Such a contradiction can only arise if its seed is already included in our premise
35 | (in mathematical terms, if you start from falsehood then you can prove anything you like, even with correct reasoning).
36 |
37 | ## What do we learn from this?
38 |
39 | *As demonstrated above, the argument “I have a right to freedom from fear” is flawed; we cannot use it.*
40 |
41 | This finding also makes sense regarding responsibilities: everybody is responsible for their own fears.
42 | For this reason, neuroses and psychoses are considered illnesses and treated accordingly.
43 | The norm is that every human learns to control their fears well enough to be able to partake in society.
44 |
45 | So if we want to resolve the conflict in the thought experiment above, we need to take fear out of the equation.
46 | The first persona needs to assess the actual risks and severities attached to getting infected, falling sick, encountering new virus variants, or societal collapse.
47 | Likewise, the second persona needs to assess the actual risks and severities of being punctured by a syringe, injected with an approved vaccine, or suffering fatal side-effects.
48 | To keep this article on point, I won’t even try to go into the details here.
49 |
50 | ## Quotes and context
51 |
52 | Besides the quantification of risks, there are other aspects to consider, which brings us back to the occasion of this article.
53 | Most people who I saw use the initial quote attributed it to Immanuel Kant.
54 | He said something similar, but a lot more precise:
55 |
56 | > No one can force me to be happy after their fashion, but everyone may choose their own path to pursue happiness, as long as they don’t deny such a right — if consistent with a common law applicable to everyone — to someone else. — [link](https://korpora.zim.uni-duisburg-essen.de/Kant/aa08/290.html#z27)
57 |
58 | Kant explicitly includes the requirement that my freedom limits yours only if my freedom is based on common law that can equally be applied to you and me.
59 | In other words, I can only demand of you what you can demand of me; freedom from fear clearly fails this test.
60 |
61 | Another manifestation of this principle can be found in Article 2 of the German constitution:
62 |
63 | > Everyone has the right to free development of their personality, provided that they don’t violate the rights of others and the constitutional order.
64 |
65 | The mentioned rights refer to the collection of civil rights of which this article is one.
66 | The civil rights circumscribe the protected sphere which belongs to each citizen and where that citizen is fully responsible for making use of their rights and liberties as they see fit.
67 | 68 | --- 69 | _German version_ 70 | 71 | # Freiheit von Angst 72 | 73 | In letzter Zeit wird im Zusammenhang mit der Pandemie häufiger der Satz "Ihre Freiheit endet dort, wo meine beginnt" verwendet. 74 | Auf die Quellen dieses Pseudo-Zitats gehe ich weiter unten ein. 75 | Der Zweck dieses Artikels ist es, eine besonders unangenehme Art und Weise zu beleuchten, in der dieser an sich solide Grundsatz missbraucht und verdreht wird. 76 | 77 | ## Ein Gedankenexperiment 78 | 79 | Stellen Sie sich ein Gespräch zwischen zwei fiktiven Personen vor, die den aktuellen Streit repräsentieren: 80 | Eine, die den Auftrag hat, andere zwangsweise zu impfen, und eine, die sich weigert, sich zwangsweise impfen zu lassen - ob diese zweite Person sich sonst freiwillig impfen lassen würde, ist eine andere Frage. 81 | Hier wollen wir bewusst die Situation untersuchen, in der die Position der ersten Person durch Angst motiviert und gerechtfertigt ist: 82 | 83 | - Angst davor, dass sie oder ihre Angehörigen infiziert werden 84 | - Angst vor schwerer Krankheit und Tod 85 | - Angst vor neuen Varianten des Virus 86 | - Angst vor dem Zusammenbruch der Gesellschaft 87 | 88 | Ob diese Ängste tatsächlichen Risiken entsprechen, ist für dieses Gedankenexperiment unerheblich, dazu später mehr. 89 | Die erste Person ist in einer schrecklichen Lage: ein erfülltes Leben ist unwahrscheinlich, solange solche Ängste einem menschlichen Geist innewohnen. 90 | Es erscheint daher vernünftig, die Fähigkeit dieser Person, nach Glück zu streben, wiederherzustellen, indem die Ursache der Angst beseitigt wird, d. h. indem alle Menschen geimpft werden. 91 | Ob die Impfung die Risiken tatsächlich mindert, ist wiederum unerheblich, denn Angst ist irrational. 92 | 93 | Wir kommen also zu dem Schluss, dass aufgrund der Freiheitsrechte der ersten Person alle Menschen, auch die zweite Person, geimpft werden müssen. 94 | 95 | Betrachtet man nun die andere Perspektive, so kann man davon ausgehen, dass auch die zweite Person durch Angst motiviert ist. 96 | Sie könnte sich vor Spritzen fürchten, vor einer fremden Substanz, die in ihren Adern zirkuliert, oder vor möglicherweise tödlichen Nebenwirkungen. 97 | Mit demselben Argument wäre es auch für diese Person vernünftig, ihr Glück durch den Verzicht auf die Impfung wiederherzustellen. 98 | Und da diese Ängste nicht weniger irrational sind als die anderen, ist es wiederum unerheblich, ob ihnen tatsächliche Risiken entsprechen. 99 | 100 | Wir befinden uns in einer Sackgasse: Wenn wir auf beide Personen dieselben Regeln und Überlegungen anwenden, dann müssen wir gleichzeitig zwangsimpfen und auf die Impfung verzichten. 101 | 102 | Ein solcher Widerspruch kann nur entstehen, wenn sein Keim bereits in unserer Prämisse angelegt ist 103 | (mathematisch ausgedrückt: Wenn man von der Unwahrheit ausgeht, kann man alles Mögliche beweisen, auch mit richtiger Argumentation). 104 | 105 | ## Was lernen wir daraus? 106 | 107 | *Wie oben gezeigt, ist das Argument "Ich habe ein Recht auf Freiheit von Angst" fehlerhaft, wir können es nicht verwenden.* 108 | 109 | Diese Erkenntnis macht auch in Bezug auf die Verantwortung Sinn: Jeder ist für seine eigenen Ängste verantwortlich. 110 | Aus diesem Grund werden Neurosen und Psychosen als Krankheiten betrachtet und entsprechend behandelt. 111 | Die Norm ist, dass jeder Mensch lernt, seine Ängste gut genug zu kontrollieren, um an der Gesellschaft teilhaben zu können. 
112 |
113 | Wenn wir also den Konflikt des obigen Gedankenexperiments lösen wollen, müssen wir die Angst aus der Gleichung herausnehmen.
114 | Die erste Person muss die tatsächlichen Risiken und Schweregrade abschätzen, die mit einer Ansteckung, einer Erkrankung, dem Auftreten neuer Virusvarianten oder dem Zusammenbruch der Gesellschaft verbunden sind.
115 | Ebenso muss die zweite Person die tatsächlichen Risiken und Schweregrade einschätzen, die damit verbunden sind, von einer Spritze verletzt zu werden, einen zugelassenen Impfstoff zur Anwendung zu bringen oder tödliche Nebenwirkungen zu erleiden.
116 | Um diesen Artikel auf den Punkt zu bringen, werde ich hier nicht einmal versuchen, auf die Details einzugehen.
117 |
118 | ## Zitate und Kontext
119 |
120 | Neben der Quantifizierung der Risiken gibt es noch weitere Aspekte zu berücksichtigen, womit wir wieder beim Anlass dieses Artikels wären.
121 | Die meisten Leute, die ich das ursprüngliche Zitat verwenden gesehen habe, schrieben es Immanuel Kant zu.
122 | Er sagte etwas Ähnliches, aber viel präziser:
123 |
124 | > Niemand kann mich zwingen auf seine Art (wie er sich das Wohlsein anderer Menschen denkt) glücklich zu sein, sondern ein jeder darf seine Glückseligkeit auf dem Wege suchen, welcher ihm selbst gut dünkt, wenn er nur der Freiheit Anderer, einem ähnlichen Zwecke nachzustreben, die mit der Freiheit von jedermann nach einem möglichen allgemeinen Gesetze zusammen bestehen kann, (d.i. diesem Rechte des Anderen) nicht Abbruch thut. - [link](https://korpora.zim.uni-duisburg-essen.de/Kant/aa08/290.html#z27)
125 |
126 | Kant stellt ausdrücklich die Forderung auf, dass meine Freiheit die Ihre nur dann einschränkt, wenn meine Freiheit auf einem allgemeinen Recht beruht, das auf Sie und mich gleichermaßen angewendet werden kann.
127 | Mit anderen Worten: Ich kann von Ihnen nur verlangen, was Sie auch von mir verlangen können; die Freiheit von Angst besteht diesen Test eindeutig nicht.
128 |
129 | Eine weitere Ausprägung dieses Prinzips findet sich in Artikel 2 des deutschen Grundgesetzes:
130 |
131 | > Jeder hat das Recht auf freie Entfaltung seiner Persönlichkeit, soweit er nicht die Rechte anderer verletzt und nicht gegen die verfassungsmäßige Ordnung oder das Sittengesetz verstößt.
132 |
133 | Bei den genannten Rechten handelt es sich um den Kanon der Grundrechte, zu dem auch dieser Artikel gehört.
134 | Die Grundrechte umschreiben die geschützte Sphäre, die jedem Bürger gehört und in der er seine Rechte und Freiheiten in eigener Verantwortung wahrnehmen kann.
135 |
136 | _(Übersetzt mit Hilfe von www.DeepL.com/Translator)_
137 |
--------------------------------------------------------------------------------
/08_peer-to-peer_internet.md:
--------------------------------------------------------------------------------
1 | # A peer-to-peer Internet
2 |
3 | _In this post I articulate something I [learnt yesterday](https://github.com/libp2p/rust-libp2p/issues/2657#issuecomment-1142310386) and explore its ramifications._
4 |
5 | For a while I’ve been dreaming of a second Internet, co-existing with the current one, consisting not of myriads of clients connecting to numerous servers but of countless peers connecting to each other.
6 | As social beings we have no trouble imagining such a system — it matches how we interact with our friends and neighbours, with staff at businesses, our co-workers, etc.
7 | Some interactions prompt or are prompted by physical proximity, others are conducted over remote communication.
8 | One restriction we are intuitively aware of is that a successful conversation can only occur if the interlocutors speak the same language, both literally as well as figuratively. 9 | And we tend to seek out certain persons when we have a concrete problem to solve. 10 | 11 | ## How does this translate to programming? 12 | 13 | It would be natural to give each network participant — be that a mobile phone, a server in the cloud, or a thermometer — a cryptographic identity and call that `PeerId`. 14 | Each peer would then maintain an “address book” associating names and purposes with `PeerId`s. 15 | And there would be a form of neighbourhood discovery by which each peer can learn of other peers that are nearby, including some levels of indirect reachability (like a peer of mine could see other peers I cannot see with my own radio antenna). 16 | 17 | In this world, a network application would want to deal in `PeerId`s: open a stream to that peer, receive datagrams from that peer, etc. 18 | Whether the application implements video chat, document sharing, or a hiking tour guide doesn’t matter for our further deliberations. 19 | The important conclusion from this generality is that in addition to `PeerId` we need the concept of a `ProtocolName`: the intended language to be spoken. 20 | There would be a plethora of such names because there are many different and useful ways of implementing any of the aforementioned applications and their communication. 21 | `ProtocolName`s would naturally employ namespacing and versioning, e.g. `/voice/rtp` or `/doc/automerge/v2`. 22 | 23 | So if I wanted to make a voice call to a friend, I’d use an app that tries to open a signaling stream using the range of protocols supported locally. 24 | The friend’s `PeerId` would be searched in the network neighbourhood or in a [DHT](https://en.wikipedia.org/wiki/Distributed_hash_table), a communication path found (like an IP address and port), and a low-level connection made. 25 | Then both sides compare their lists of supported `ProtocolName`s and choose one from the intersection. 26 | The application uses the negotiated protocol to stream the audio bits back and forth over the established connection. 27 | 28 | ## The state of the art 29 | 30 | Contrast that with how network programming is done today: the application deals in IP addresses and port numbers, where the former is usually ephemeral and the latter is a very weak representation of a `ProtocolName` with no negotiation. 31 | This established paradigm works well for clients connecting to servers at well-known coordinates, where IP addresses and port numbers are stable. 32 | 33 | But it does not work well for peer-to-peer networking. 34 | How should an app on a watch know that the peer I want to communicate with happens to be in the same private network right now? 35 | Even though it costs far less to use the local network, such calls today are still routed via cloud nodes because that is much easier to program. 36 | Local network awareness is being added to iOS and Android, but its growth is slow because each application needs to explicitly make provisions for local communication, using proprietary protocols. 37 | 38 | One ongoing effort to improve this situation is [libp2p](https://libp2p.io/), which I’ve been using (and very slightly improving) for the past few years at [Actyx](https://developer.actyx.com/). 39 | 40 | ## But what about privacy? 
41 |
42 | If I use a single `PeerId` to make myself discoverable, then everyone else will be able to follow my (network) movements.
43 | In the real world, following someone’s every move is considered unacceptable behaviour and is in many jurisdictions even punishable by law.
44 | With the technical means sketched and implied above, it would become possible to stalk a person from anywhere on the planet without risk of being noticed (modulo possible but costly countermeasures).
45 |
46 | Taking a look at the situation today, we find that cloud services perform the implicit function of hiding their users’ location.
47 | Only when switching to a direct connection for real-time data streams can we see the current IP addresses of our interlocutors.
48 | These addresses are in the vast majority of cases only temporarily associated with a mobile device or home network; ordinary people have no legal access to the real name behind a transport address.
49 | The association with a specific person (e.g. via a username) is encapsulated by the cloud service.
50 |
51 | We can transfer this scheme into the peer-to-peer world by relegating the aforementioned `PeerId` to become an ephemeral transport identity.
52 | The personal identity may have the appearance and cryptographic structure of a `PeerId`, while its location would be hidden by a relay service.
53 | This has recently been implemented in [libp2p DCUtR](https://github.com/libp2p/specs/blob/master/relay/DCUtR.md), albeit for the reason of allowing connections to peers behind strict firewalls.
54 | If two peers trust each other enough, a relay service could also facilitate a direct connection between them, e.g. by selectively disclosing the current transport identity.
55 |
56 | ## The path forward
57 |
58 | I’ll keep experimenting with these concepts in the context of [rust-libp2p](https://github.com/libp2p/rust-libp2p).
59 | The currently emerging plans for opening negotiated substreams to some `PeerId` sound very promising and should be complemented by a mechanism for sending and receiving unreliable datagrams.
60 | In addition to these point-to-point facilities there is already the battle-tested [gossipsub](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md) protocol for broadcasts.
61 | Together with the infrastructure services of [identify](https://github.com/libp2p/specs/tree/master/identify), [kademlia](https://github.com/libp2p/specs/tree/master/kad-dht), etc. this should become a fertile ground for exploring peer-to-peer and local-first network software design.
62 |
63 |
--------------------------------------------------------------------------------
/09_building_a_smart_car_fan.md:
--------------------------------------------------------------------------------
1 | # Building a Smart Car Fan
2 |
3 |
4 |
5 | Earlier this year I installed a 435W solar module on top of my car, charging a 2.2kWh power station via the two wires you can see running toward the back of the roof. In order to make do with only one hole in the roof (and have an easy plan of how to keep it rainproof) I decided to pass these rather stiff wires through extra holes drilled in the fan’s plastic housing. This all worked out quite nicely!
6 |
7 | Now we have more than enough power to charge our electronic devices while on the road (or off the grid). In the future this should be enough to power a small fridge as well — when the fridge is needed the solar module will also produce the required power.
There was just one issue: I didn’t like the stress on the power station due to the car heating up to 60°C or more in summer. The solution is to simply provide enough circulation to keep the air temperature inside not too far from the temperature outside. But switching on and off the fan manually is annoying, and it may not even be possible when I’m not staying near my car for a few days. 8 | 9 | So the goal was to add some electronics that would do this for me. The power station wouldn’t be challenged at all providing electricity for some embedded processor and a thermal sensor. And while we’re at it we could also regulate the voltage to be able to throttle the fan from its usual 70W (which sounds like an idling airplane engine) to something more tolerable while staying inside the van. And of course all this should be possible with an iPhone app. 10 | 11 | ## Switching power to a 12V / 6A load 12 | 13 | Asking the wizards at the [local maker space](https://flipdot.org) quickly converged on buying — not building — a DC-DC step-down converter (a.k.a. buck converter). These devices are quite efficient and cheap (I paid less than 10€), but they are highly integrated. 14 | 15 | 16 | 17 | The two radiators dissipate the heat of a Schottky diode and an [XLSEMI XL4016E](https://www.alldatasheet.com/datasheet-pdf/pdf/763185/ETC2/XL4016E1.html), respectively. The latter basically takes a feedback signal to generate a pulse-width modulation signal that gates the input voltage via a power mosfet into the coil you see in the image, whereby cutting the supply will draw current from common (through the Schottky diode), filling the large capacitors such that the desired output voltage is reached. That voltage is selected by adjusting the potentiometer at the top right, setting a voltage divider such that the correct output voltage will result in a feedback of 1.25V. 18 | 19 | At this point I thought that I’ll need to remove that potentiometer and connect instead a voltage divider that I can control from an embedded processor (e.g. a ladder of resistors with transistors in parallel). But the wizards convinced me that it is smarter to supply a PWM signal to the feedback pin instead: 20 | 21 | - when I want the converter to work, let the pin float (i.e. do its usual thing) 22 | - when I want it to stop working for a while, pull the pin to >2.2V (as is documented in the data sheet on page 8; a 1N4118 has a max. forward voltage of 1.1V) 23 | 24 | When I tried doing this with an Arduino dev board at 5V I found that overriding the buck converter’s feedback signal in the second case requires a rather low impedance. This problem is exacerbated when running the Arduino nano (the device I’m trying out for this project) at its internally regulated 3.3V. So I had to revisit the electronics classes I took more than twenty years ago and with some experimentation I arrived at the following circuit: 25 | 26 | 27 | 28 | The idea is to drive the gate of the right MOSFET to common when the Arduino supplies 3.3V on its output pin, which will deplete the channel and thus the output on the right will be floating. When the Arduino pin is pulled to common, the left MOSFET channel is depleted so that the 100kΩ resistor pulls the right MOSFET’s gate high, supplying the output pin with about 10V in my case (V_dd at 12V). 29 | 30 | Using such a circuit on a breadboard I tried it out with the Arduino toggling its pin every 10ms. 
The oscilloscope revealed that while it only takes about 600µs to switch off the converter, it takes nearly 2ms to get it started again after the pin is pulled HIGH. This provides the parameters for the PWM we’ll use further below. The finished setup consists of just three items soldered to the back side of the buck converter PCB (using 2N7000 transistors):
31 |
32 |
33 |
34 | ## Arduino programming
35 |
36 | We have the means, now use them: the [Arduino nano 33 BLE sense](https://store.arduino.cc/products/arduino-nano-33-ble-sense) comes with 14 digital GPIO pins and various sensors; we’ll only use the [Renesas HS3003](https://www.renesas.com/us/en/products/sensor-products/environmental-sensors/humidity-temperature-sensors/hs3003-high-performance-relative-humidity-and-temperature-sensor) (temperature & humidity) for now.
37 |
38 | Weirdly, the official Arduino library doesn’t quite work (I only got 0% relative humidity back), but fortunately the chip is quite easy to work with:
39 |
40 |
41 |
42 | The main interaction of this board shall happen via Bluetooth Low Energy, and I must say that I was very pleasantly surprised by how simple this is on Arduino. First declare the service and its characteristics, making sure to add the `BLENotify` flag for easy reading in the iPhone app later:
43 |
44 |
45 |
46 | Then some setup code to start the BLE infrastructure:
47 |
48 |
49 |
50 | In the main loop we only need to check whether some values were written and react:
51 |
52 |
53 |
54 | You can see the other places where `writeValue` is used to update the characteristics where applicable; notification of the central device occurs in the background.
55 |
56 | In `throttle` mode the `run_throttle` function is used. Instead of programming timer registers by hand I used the `mbed::Ticker` facility that comes with the nRF52840 firmware. The ISR is quite simple and short; the code is only enlarged by the switch statement that configures the slow fan speeds: recall that it takes 2ms to switch on the buck converter, so we need to keep the signal HIGH for at least 4ms (I tried 3ms but found that that generates some strange sound from the fan motor). Getting the voltage down far enough for really slow speeds means extending the period of being switched off — this is exacerbated by the huge capacitors on the converter that are eagerly charged at the beginning of the cycle when the feedback signal is strong, and by the fact that a slow fan drains the capacitors more slowly as well. On setting 1 we’d expect the voltage to be about 2/22 of 12V, but in reality it ends up around 5V.
57 |
58 |
59 |
60 | While scrolling you may find that the above word on sensor usage wasn’t entirely correct: just for the fun of it I tried out the gesture sensor so that the fan control can be switched between `on`, `off`, and `auto` modes at the flick of a finger. The rest of the main loop deals with the thermal hysteresis of switching on the fan when the temperature rises above 30°C and off when it falls below 29°C. I have tried this and found it to be nice: the fan kicks in for a bit every few minutes when the outside temperature is 22°C with an overcast sky, and it stays on while the sun shines on the car roof. To detect this I put the Arduino right up there:
61 |
62 | ![](images/09/Arduino_in_car.png)
63 |
64 | ## iPhone programming
65 |
66 | This turned out to be quite a bit more complicated than all of the activities described above: Xcode is a complex and under-documented product, as are the Swift libraries.
Once I had found [some](https://github.com/adafruit/Basic-Chat/tree/master/Basic%20Chat%20MVC) [examples](https://www.freecodecamp.org/news/ultimate-how-to-bluetooth-swift-with-hardware-in-20-minutes/) I got the basic structure implemented quite quickly: 67 | 68 | 69 | 70 | But of course BLE wouldn’t work out of the box, you need to _know_ that some localization keys need to be included in `Info.plist`, which you cannot (or at least shouldn’t) edit directly. You need to select the top-level project entry in the navigation tree and then edit the build settings: 71 | 72 | ![](images/09/info_plist.png) 73 | 74 | This will lead to the dialogue being shown to the iPhone user whether the app may use Bluetooth (instead of crashing the app). The nice thing about Xcode is that the very same app can also be built for macos without any extra hassle. But of course that doesn’t quite work, as on that platform the request for Bluetooth needs to be registered under “Signing & Capabilities” instead: 75 | 76 | ![](images/09/app_capabilities.png) 77 | 78 | But with these it all worked quite beautifully! Now I have an app on my iPhone with which I can control the fan from a distance of about 6–8m around the car. 79 | 80 | 81 | 82 | (yeah, I’m not a master designer, but it gets the job done) 83 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Roland Kuhn’s Blog 2 | 3 | Copyright © Roland Kuhn 2016 _(Though the presentation of this blog is blatantly copied from the [Letters from Klang](https://github.com/viktorklang/blog#letters-from-klang).)_ 4 | 5 | See the [Archive](https://github.com/rkuhn/blog/commits/master) for a list of blog post in reverse chronological order 6 | 7 | For an RSS feed of the commits to this repository, use the following [URL](https://github.com/rkuhn/blog/commits.atom) 8 | 9 | Contact information: 10 | [Twitter](https://twitter.com/rolandkuhn), 11 | [LinkedIn](https://de.linkedin.com/in/roland-kuhn-17828a57), 12 | [website](https://rolandkuhn.com/) 13 | -------------------------------------------------------------------------------- /images/09/Arduino_in_car.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/Arduino_in_car.png -------------------------------------------------------------------------------- /images/09/Arduino_nano_33_ble_sense.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/Arduino_nano_33_ble_sense.png -------------------------------------------------------------------------------- /images/09/Autodach.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/Autodach.png -------------------------------------------------------------------------------- /images/09/app_capabilities.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/app_capabilities.png -------------------------------------------------------------------------------- /images/09/buck converter driver circuit.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/buck converter driver circuit.png -------------------------------------------------------------------------------- /images/09/buck_converter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/buck_converter.png -------------------------------------------------------------------------------- /images/09/buck_driver.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/buck_driver.png -------------------------------------------------------------------------------- /images/09/iPhone_app.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/iPhone_app.png -------------------------------------------------------------------------------- /images/09/info_plist.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rkuhn/blog/941e014d8787f4b2639c3ab6b6d33241f7fab27a/images/09/info_plist.png --------------------------------------------------------------------------------