├── Annotation
│   └── README.md
├── AttributeGrammars
│   └── README.md
├── Introduction.md
├── Mu-Nu
│   └── README.md
├── NaturalTransformations
│   └── README.md
├── README.md
├── Streaming
│   └── README.md
├── glossary.md
└── schemas
    └── README.md

/Annotation/README.md:
--------------------------------------------------------------------------------
# How do I annotate a tree?

Once you’ve created a [pattern functor](../glossary.md#pattern-functor) for your structure, there is a general-purpose structure for tree annotation – `Cofree f ann`. Since it is also a tree, it has a pattern functor itself – `EnvT ann f a`. These two types _may_ be familiar to you from other contexts. `Cofree` is the dual to `Free` and has had corresponding recognition for a particular use case. `EnvT` is known as the environment monad transformer. However, we’re not taking advantage of any of these things – we only care about the structure. `EnvT ann f a` is isomorphic to `(ann, f a)`, so you can picture `Cofree` similarly – as a recursive tuple, where `fst` contains the annotation and `snd` contains the tree from the current node.

The most direct way to annotate a tree is to use an [algebra](../glossary.md#algebra) like `f (Cofree f a) -> Cofree f a`, creating the next level of the tree at each step. But there are a number of ways to reduce the boilerplate implied by this (and make the operation more general).

## `Transform`

Rather than explicitly [folding](../glossary.md#fold), you can create a [natural transformation](../glossary.md#natural-transformation) `forall a. f a -> EnvT ann f a`. This is only possible if you don’t need context from the rest of the tree to create the annotation, but if it _is_ possible, then [you get a lot of flexibility](../NaturalTransformations/README.md).

## `attributeAlgebra`

(NB: Should perhaps rename this to `annotateAlgebra`? Also, `attributeAlgebra` in Matryoshka is probably very broken.)

`attributeAlgebra :: (f ann -> ann) -> f (Cofree f ann) -> Cofree f ann`

This “algebra transformer” converts an algebra that calculates the annotation for a node based on the annotations of its children into one that annotates the original tree.

This is another example of how recursion schemes allow us to focus on only one thing at a time. We don’t need to think about annotating the tree with a value – only about how to calculate that value for the node we’re looking at. And then there are generic operations to give us the fully annotated tree from that.

## But …

Often you annotate a tree and then consume the annotation in a second pass immediately afterward. And one of the things that recursion schemes supposedly offers is a way to “fuse” multiple passes over a structure into one. Is there some way we can do that here?

Yes! There is a generalized fold called a “[zygomorphism](../glossary.md#zygomorphism)”. _zygo-_ is a prefix meaning something like “paired” (think of a _zygote_, which is a cell formed by the pairing of two gametes). And the algebra used for a zygomorphism is paired like that – `f (ann, a) -> a`. It pairs the [carrier](../glossary.md#carrier) with an extra value providing context to the algebra.

Where does that context come from, though? The name of the type variable may give us a hint – we previously had an algebra that could give us the annotation we want – `f ann -> ann`.
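
For instance, here is what such an “annotation algebra” might look like in Scala, using a small hypothetical `ExprF` pattern functor (an illustrative sketch, not part of any library):

```scala
sealed trait ExprF[A]
final case class Lit[A](value: Int)        extends ExprF[A]
final case class Add[A](left: A, right: A) extends ExprF[A]

// An “annotation algebra” of shape `f ann -> ann`: the height of a node,
// computed only from the heights already attached to its children.
val height: ExprF[Int] => Int = {
  case Lit(_)    => 1
  case Add(l, r) => 1 + math.max(l, r)
}
```

Handed to something like `attributeAlgebra` above (or to a zygomorphism, as below), this one function is all we have to write to annotate every node with its height.
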
So, a zygomorphism takes this “annotation algebra” in addition to the algebra containing the tuple.

Here’s an example where we have `annotate` as the “annotation algebra” and `consume` as the primary algebra.
```haskell
annotate :: f ann -> ann
consume :: f (ann, a) -> a

myFold :: Mu f -> a
myFold = gcata (distTuple annotate) consume
```
```scala
annotate: F[Ann] => Ann
consume: F[(Ann, A)] => A

myFold: Mu[F] => A = _.zygo(annotate, consume)
```

You might have noticed that `f (ann, a)` isn’t quite the same shape as our `EnvT` pattern functor. Often this new shape works, but there is an [Elgot](../glossary.md#Elgot) variation that gives us the `(ann, f a)` shape we might need:
```haskell
econsume :: (ann, f a) -> a

myFold' :: Mu f -> a
myFold' = egcata (distTuple annotate) econsume
```
```scala
econsume: (Ann, F[A]) => A

myFold0: Mu[F] => A = _.ezygo(annotate, econsume)
```

So, we now have a fusion property for annotation –
```haskell
cata (econsume . runEnvT) . cata (attributeAlgebra annotate)
== egcata (distTuple annotate) econsume
-- Alternatively, if your `econsume` algebra uses `EnvT` (as it would if you had
-- previously been folding a Cofree):
cata econsume . cata (attributeAlgebra annotate)
== egcata (distTuple annotate) (econsume . EnvT)
```

There is a simpler equality for the case of simply _building_ an annotated tree, without additionally folding it.
```haskell
cata (attributeAlgebra annotate) == egcata (distTuple annotate) (embed . EnvT)
```

--------------------------------------------------------------------------------
/AttributeGrammars/README.md:
--------------------------------------------------------------------------------
# How do I Pass Data Toward the Leaves?

One common problem that often leads developers toward `ana` is the desire to be able to pass some extra data (“attributes”) toward the leaves. Since `ana` works top-down, building the outermost nodes first, this is a natural intuition. And if your transformation looks like `Mu f -> Nu g`, then `Coalgebra g (Mu f)` can easily become `Coalgebra g (attributes, Mu f)` and you’re on your way!

However, things are often more complicated than that. For example, what if your transformation is more like `Mu f -> a`, or even worse, `Mu f -> Either error (Nu g)`? These are cases where `cata` tends to serve you much better, but you still have to figure out how to pass data in the opposite direction of the fold!

## Avoiding Partiality

`Mu f -> a` is a problem for `ana`, because the result needs to be some fixed-point type, but we simply have `a`. The simple approach for this is the `Partial` monad, which is `type Partial a = Nu (Either a)`. This gets us close to the type we want: `Mu f -> Partial a`. But what that `Partial` type means is that forcing the result may never terminate. (Pretty sweet that the type system can represent that, right?) Sometimes `Partial` is totally justified, but if you’re introducing it simply to pass data toward the leaves, you’re better off trying to use a fold.
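
To make the “may never terminate” point concrete, here is a hand-rolled Scala sketch of the idea behind `Partial` (directly recursive rather than literally `Nu (Either a)`, and not any library’s API):

```scala
// Either a result now, or one more layer of “not yet” – possibly forever.
sealed trait Partial[A]
final case class Now[A](value: A)                 extends Partial[A]
final case class Later[A](rest: () => Partial[A]) extends Partial[A]

// Forcing a Partial value is inherently partial, so an honest runner has to
// give up after some number of steps.
def runFor[A](p: Partial[A], steps: Int): Option[A] = p match {
  case Now(a)                 => Some(a)
  case Later(_) if steps <= 0 => None
  case Later(rest)            => runFor(rest(), steps - 1)
}
```
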
`Mu f -> a` is pretty straightforward as folds go. Just use `Algebra f a`. But the problem is that during the transformation, we somehow need to pass attributes from closer to the root. To do this, we introduce the attributes as an “environment” in which the value (`a`) is calculated. The environment monad is usually called `Reader`, and `Reader a b` is `a -> b`. So, we can use `Algebra f (attributes -> a)` to introduce attributes passed from the root. At each step, the current node can pass whatever `attributes` it wants to its children – taking into account the `attributes` passed to itself by its parent or not. For the entire fold, it means the result is a function, which needs to be passed some initial value (often something like a monoidal identity). This isn’t too different from the `Coalgebra` case above, where the unfold needs to be passed a tuple with the initial attribute along with the seed.

## Avoiding `Unsafe` Partiality

In the case of `Mu f -> Either error (Nu g)`, it’s tempting to see the `Nu` in the result and lean toward an unfold. However, using a Coalgebra like `Mu f -> Either error (g (Mu f))` is problematic, because the only way to pull the monad (`Either error` in this case) to the outside of the entire fixed-point is to force the entire tree to ensure that you never get a `Left` anywhere. But the entire point of `Nu` is to indicate that you can’t guarantee the tree is finite. So you’re really looking to blow things up. This form of partiality is unsafe, because it isn’t represented in the type. So while Haskell and Scala libraries tend to provide these operations, they’re not even possible in total languages.

The same technique can be used as above to avoid this problem – using the “environment” monad again. There’s a bit more variation here, as _where_ the function goes can vary. If you can identify the error without the attributes, `AlgebraM (Either error) f (attributes -> Nu g)` is a reasonable choice. It has the benefit of letting `cataM` handle the threading of the monad for you. However, if the error is only identified _with_ the attributes, then you’re stuck with `Algebra f (attributes -> Either error (Nu g))`, which means you have to manage the monad yourself.

## Be Aware

Even though `((->) attributes)` is itself a monad, you can’t use `AlgebraM (Reader attributes) f (Nu g)`. While both result in `attributes -> Nu g` after folding, `AlgebraM` (with `cataM`) will result in every node being passed the same value for `attributes` that you pass at the top level. This is simply a consequence of the general way a monad can be extracted over the fold. To be able to control the values for `attributes`, the function must be managed explicitly in the algebra, which is what `Algebra f (attributes -> Nu g)` requires.

## Just another word for function

“Folding to a function” is an implementation of a technique from Knuth called “[attribute grammars](https://en.wikipedia.org/wiki/Attribute_grammar)”. In an attribute grammar, a formal grammar may have “inherited” and “synthesized” attributes attached to help it calculate a value. Inherited attributes are inherited from the parent and synthesized attributes are passed from the children. In recursion schemes, that looks like
```haskell
Algebra f (inherited -> (synthesized, a))
```
Any attribute grammar should be able to be modeled this way. As a result, I often say “attribute grammar” when I mean a fold with a carrier that’s a function (which is intended to pass some attribute toward the leaves).
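
Here is a minimal Scala sketch of “folding to a function”, where the inherited attribute is an indentation depth (the `RoseF` pattern functor and the names are illustrative, not library code):

```scala
final case class RoseF[A](label: String, children: List[A])

// The carrier is a function from the inherited attribute (the depth) to the
// result. Each node renders itself at the depth handed down by its parent,
// and hands `depth + 1` down to its children.
val render: RoseF[Int => String] => (Int => String) = {
  case RoseF(label, children) =>
    depth =>
      (("  " * depth + label) :: children.map(child => child(depth + 1)))
        .mkString("\n")
}

// After a fold like `cata(render)`, the result is still a function, which we
// pass the initial attribute for the root, e.g. `tree.cata(render).apply(0)`.
```
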
--------------------------------------------------------------------------------
/Introduction.md:
--------------------------------------------------------------------------------
# Introduction

## Ingredients

Every recipe usually starts with a list of ingredients. In a recursion schemes cookbook though, we only use a very small variety of ingredients. Actually we only have three of them, and some recipes use only two. Our three ingredients are *pattern-functors*, *f-algebras* and *fixpoint types*.

### Pattern-functors

This is the main ingredient for each recipe. It will determine the "shape" of the computation. For example, if we want to model a binary tree traversal, we would use the following functor as our pattern:

``` scala
sealed trait TreeF[A]
final case class Node[A](left: A, right: A) extends TreeF[A]
final case class Leaf[A]() extends TreeF[A]
```

It is not always the case that the definition of a functor (here a simple ADT) resembles the shape of the resulting computation so closely. For instance, you might be surprised that the following functor allows us to represent computations over an infinite stream of `Foo`s:

``` scala
type Stream[A] = (Foo, A)
```

Every recipe will start by picking the right pattern-functor. Matryoshka provides a bunch of general-purpose patterns in the `matryoshka.patterns` package, but sometimes you'll have to build a custom pattern to meet your needs (most of the time using an ADT like `TreeF` above). In such cases you should always try to avoid polluting your ADT with unnecessary information. In other words, try not to add fields that do not participate in the recursive *structure* (or shape) of your pattern. As an illustration, you can notice that we didn't define any `label` field in `TreeF`.

### F-Algebras

While pattern-functors determine the shape of the computation we build, f-algebras determine what operation we perform at each step of the computation. Given a functor `F` and a *carrier* `A`,
* An `Algebra[F, A]` is simply a function `F[A] => A`
* A `Coalgebra[F, A]` is simply a function `A => F[A]`

From now on we'll use the term f-algebra to talk about algebras and coalgebras indiscriminately, and we'll specifically say "algebra" or "coalgebra" when the distinction is necessary.

Algebras are always applied to the pattern "bottom-up". In our `TreeF` example that means that an algebra will first be applied to a leaf, the result of this application will be "pushed" into that leaf's parent, then the algebra will be applied to the parent, and so on up to the root node.

Dually, coalgebras are always applied "top-down". We start from a seed value of type `A`, the coalgebra produces the root of the tree, then it is applied again to the seeds contained in that root to produce its children, and so on until the leaves are reached.

It's important to keep in mind that f-algebras can only operate on a single node at a time. In other words, from the body of an f-algebra it is not possible to know where we are in our structure or to peek at another part of it.

Nevertheless, most non-trivial problems will require such capabilities. Fortunately, we can emulate these capabilities using "embellished" — or, to stick with our cooking metaphor, "flavored" — f-algebras.
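
For example, here is what a concrete algebra and coalgebra on `TreeF` might look like (a sketch; in matryoshka terms these are an `Algebra[TreeF, Int]` and a `Coalgebra[TreeF, Int]`):

```scala
// An algebra: collapses one TreeF layer, assuming its children have already
// been collapsed to their leaf counts.
val countLeaves: TreeF[Int] => Int = {
  case Leaf()            => 1
  case Node(left, right) => left + right
}

// A coalgebra: grows one TreeF layer from a seed, here building a complete
// tree of the given depth.
val completeTree: Int => TreeF[Int] = depth =>
  if (depth <= 0) Leaf() else Node(depth - 1, depth - 1)
```

Applying `countLeaves` bottom-up (with `cata`) folds a tree down to an `Int`; applying `completeTree` top-down (with `ana`) grows a tree from an `Int` seed.
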
### Fixpoint Types

Pattern-functors are just regular functors that are used recursively (ie applied to themselves). In a statically-typed language this can be problematic when it comes to writing functions that take or return arbitrary instances of a given pattern.

Consider the following values:

``` scala
val leaf: TreeF[Nothing] = Leaf()
val node: TreeF[TreeF[Nothing]] = Node(Leaf(), Leaf())
```

Both have a different static type, so without any helper, there's no way to write a function that would accept both as an argument (unless we dirty our hands by typing `Any` or `AnyRef`).

Fixpoint types are exactly the helper we need. A fixpoint type is a type of the shape `T[_[_]]`, ie a type constructor that takes a type constructor as its parameter. If we're able to *embed* every "node" of a structure drawn from a pattern `F` into `T`, then we can use the type `T[F]` to represent arbitrary structures based on `F`:

``` scala
val leaf: T[TreeF] = Leaf().embed
val node: T[TreeF] = Node(leaf, leaf).embed
```

In addition, we need a way to *project* a `T[F]` onto an `F[T[F]]` to regain access to our functor and apply algebras.

Matryoshka provides several fixpoint types that have at least one of these two capabilities (*project* or *embed*) and that differ only in their run-time characteristics (stack-safety in particular).

Picking the right fixpoint type isn't crucial in coming up with a correct solution to a given problem, although it *is* crucial to use a fixpoint suited to the characteristics of your run-time inputs. It is therefore a good practice to abstract over the fixpoint type and delegate the choice as much as possible.

The `Recursive`, `Corecursive` and `Birecursive` typeclasses capture the two capabilities of fixpoint types:
* An instance of `Recursive.Aux[T, F]` has a method `project(t: T): F[T]`
* An instance of `Corecursive.Aux[T, F]` has a method `embed(f: F[T]): T`
* An instance of `Birecursive.Aux[T, F]` has both

So instead of writing this overly specific signature:

``` scala
def foo(tree: Fix[TreeF]): Fix[TreeF] = ???
```

We will always write

``` scala
def foo[T](tree: T)(implicit T: Recursive.Aux[T, TreeF]): T = ???
```

choosing between `Recursive`, `Corecursive` and `Birecursive` depending on whether we need to project, embed or both within the body of the function.

From now on we'll say "recursive `T` on `F`" (resp. corecursive or birecursive) to refer to a type that has an instance of `Recursive.Aux[T, F]`, sometimes omitting "on `F`" when it's unambiguous.

## Cooking Method

And finally, to complete our recipe, we need to decide how we combine these ingredients together. That is, we need to choose the right scheme for the job.

There are three families of schemes: folds, unfolds and refolds.
* folds take an algebra on `F` with carrier `A` and produce a function from `T` to `A` (for any recursive `T`)
* unfolds take a coalgebra on `F` with carrier `A` and produce a function from `A` to `T`
* refolds take an algebra on `F` with carrier `B` and a coalgebra on the same `F` with carrier `A` and produce a function from `A` to `B` (notice that no fixpoint type is needed here).

In each of these families, schemes differ only by the "flavor" of the f-algebra(s) they take as parameter. That means that you can either choose your f-algebra(s) first and deduce the scheme you need, or choose a scheme first and it will tell you the f-algebra(s) you need. Whichever you find the easiest.
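
As a quick illustration of the three families, here is a sketch that reuses `countLeaves` and `completeTree` from the F-Algebras section (the hand-written `refold` below is just the idea behind `hylo`, specialised to `TreeF`):

```scala
// A refold needs no fixpoint type at all: it applies the coalgebra on the way
// down and the algebra on the way back up, never building the intermediate tree.
def refold[A, B](coalgebra: A => TreeF[A], algebra: TreeF[B] => B)(seed: A): B =
  algebra(coalgebra(seed) match {
    case Leaf()            => Leaf()
    case Node(left, right) => Node(refold(coalgebra, algebra)(left),
                                   refold(coalgebra, algebra)(right))
  })

// Counts the leaves of a complete tree of depth 3 without ever materialising it.
val eight: Int = refold(completeTree, countLeaves)(3)
```
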
A thing that's worth keeping in mind though is that all schemes are internally expressed in terms of a refold (there's no other way, actually). Folds are implemented as refolds whose coalgebra merely `project`s the structure, and unfolds as refolds whose algebra merely `embed`s it.

If you want to minimize the number of traversals needed by your solution, you should always try to express it in terms of a minimal number of successive refolds. For example, if you find yourself chaining an `ana` and then a `cata` on the same structure, you should definitely use a `hylo` instead.

--------------------------------------------------------------------------------
/Mu-Nu/README.md:
--------------------------------------------------------------------------------
# Which fixed-point operator should I use?

https://gitter.im/slamdata/matryoshka?at=5b46226f7b811a6d63e33981

## `Mu` is smaller than `Nu`

The various fixed-point operators are a source of confusion. You commonly see at least three: `Fix`, `Mu`, and `Nu`. When should you choose one over the other, and why do they all appear to have the same operations available?

For starters, `Mu` represents the “least fixed point” of a functor, and `Nu` represents the “greatest fixed point” of a functor. “Least” and “greatest” have some relative meaning there, but what does it really mean in this context? For our purposes, we can say that `Mu` models _finite_ structures and `Nu` models _potentially_-infinite ones. Note that values in `Nu` don’t have to be infinite – it subsumes all the values of `Mu` – but once we have a value of `Nu`, we have lost any guarantee that it is finite.

Where do other recursive structures fall on this spectrum from “least” to “greatest”? Well, it largely comes down to laziness – if a data structure is strict in its recursive parameter, it is finite, and if it is lazy, it is potentially-infinite.

```haskell
data LeastList a = Nil | Cons a !(LeastList a)
data GreatestList a = Nil | Cons a (GreatestList a)

data XNor a b = Neither | Both a b

type LeastList' a = Mu (XNor a)
type GreatestList' a = Nu (XNor a)
```

In this example, we have created strict (`!`) and lazy versions of linked lists in Haskell. Since Haskell’s built-in list is also lazy, there is no finitary proof. Most data structures in Haskell are lazy, so they are the greatest fixed points of their functors, while in Scala, most data structures are strict, so they are the least fixed points of their functors. In Haskell, we tend to _pretend_ that our structures are finite (e.g., by calling `foldr` on a list, expecting it to terminate).

And that (finally) brings us to `Fix`. `Fix` is commonly defined in a way that makes what recursion schemes do as obvious as possible:

```haskell
data Fix f = Fix { unfix :: f (Fix f) }
```

```scala
final case class Fix[F[_]](unfix: F[Fix[F]])
```

These use “direct recursion” to show how `Fix` repeatedly nests the same functor recursively. But these two similar definitions have quite different meanings – since Haskell is lazy by default, its `Fix` isn’t finitary, while Scala’s is. So, for some pedagogical ease, the notion of least/greatest fixed points is glossed over. In summary – Haskell’s `Fix` is akin to `Nu`, while Scala’s is akin to `Mu`.
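
To see the strict/lazy distinction in Scala itself, here is a small sketch (the `Lazy` wrapper, `StreamF`, and `ones` are made-up illustrations, not matryoshka types):

```scala
// The strict Fix above can only be built from the leaves up, so every Fix[F]
// value is finite – it behaves like Mu. A by-name variant behaves like Nu
// (or Haskell's Fix): it can hold infinite values.
final class Lazy[F[_]](thunk: () => F[Lazy[F]]) {
  def unfix: F[Lazy[F]] = thunk()
}
object Lazy {
  def apply[F[_]](f: => F[Lazy[F]]): Lazy[F] = new Lazy(() => f)
}

type StreamF[A] = (Int, A) // the (Foo, A) stream pattern from the Introduction, with Foo = Int

// An infinite stream of 1s: fine, because nothing is forced until you ask.
def ones: Lazy[StreamF] = Lazy((1, ones))
// There is no way to tie this knot with the strict Fix.
```
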
You may be familiar with the phrase “making illegal states unrepresentable”. This is often touted as a benefit of strong type systems. What this generally means is having the fewest values of your type without precluding any valid ones. So, as a rough guideline, we should try to use `Mu` when we can, and fall back to `Nu` when we have to.

## When does this question come up?

### `hylo`morphisms

### building structures

### transforming structures

Often you want to do a transformation from one recursive type to another one (although this can frequently be avoided – but that is for a different chapter). If you know you are starting from a finite structure, you can retain that finitary proof by using a fold to transform the structure. However, if the result is potentially infinite, that approach won’t work, and you’ll be forced to use an unfold. Also, if you don’t know whether the input has a finitary proof (`Mu`) or not (`Nu`), then you can use an unfold, and you will end up with a result without a finitary proof.

Let’s take a simple example – `drop`, which will shorten a list by some amount. It is common to see this as an unfold operation:
```haskell
-- (a sketch: the seed pairs the count to drop with the remaining list, and
-- `Left` splices in whatever remains once enough elements have been dropped)
drop :: Coalgebra (Either [a]) (XNor a) (Natural, [a])
drop (_, [])    = Neither
drop (0, h : t) = Both h (Left t)
drop (n, _ : t) = drop (n - 1, t)
```

--------------------------------------------------------------------------------
/NaturalTransformations/README.md:
--------------------------------------------------------------------------------
# How Do I Convert Between (Co)Recursive Structures?

https://gitter.im/slamdata/matryoshka?at=5b5742a2c86c4f0b472f38c0

When you’re transforming data between two recursive structures, your best bet is to define a natural transformation rather than using an `Algebra` or `Coalgebra` directly.

```haskell
-- ideally
myNat :: forall a. f a -> g a
-- so now you can
cata (embed <<< myNat)
-- or
ana (myNat <<< project)
-- the former can give you a finite structure (but only if you’ve started with
-- one) and the latter can take any structure, but will result in a potentially-
-- infinite one. So rather than defining both an `Algebra` and a `Coalgebra` for
-- the various cases, you have one natural transformation for whichever you
-- need.

-- But there’s also magical fusion …
myGAlg :: Algebra g b

myFFold :: Mu f -> b
myFFold = cata myGAlg <<< cata (embed <<< myNat)

myFFold' :: Mu f -> b
myFFold' = cata (myGAlg <<< myNat)
-- natural transformations compose very nicely with whatever other (co)algebras
-- you may have, never even constructing a value of `Mu g` along the way.
```

```haskell
-- however, sometimes your `f` can result in multiple `g`s
myNatOrMore :: forall a. f a -> g (Free g a)
-- and sometimes zero `g`s
myNatMoreOrLess :: forall a. f a -> Free g a

-- The Free cases are _mostly_ a bit trickier, except for
gana distFree (myNatOrMore <<< project)
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Recursion Schemes Cookbook

Welcome to the recursion schemes cookbook! This document is intended as a complement to [matryoshka](https://github.com/slamdata/matryoshka)'s documentation.
Its goal is to help you come up with strategies to solve real-life problems using matryoshka (and recursion schemes in general).

## Reading this book

We can see recursion schemes as a (quite unusual) way to build functions by using a recursive data structure as a blueprint of the computation (a... pattern). Understanding how the three ingredients work together requires a little effort but is within the reach of any motivated programmer. On the other hand, this being a rather unusual way to express computations, the main difficulty is to find the right ingredients to solve a concrete problem.

That's the goal of this cookbook: to provide you with detailed examples of the most frequent uses of recursion schemes, explaining the thought process that leads from the definition of a problem to the expression of a solution in terms of pattern-functors and f-algebras.

It turns out that people seem to use recursion schemes to solve two major families of problems:
* Compiling or interpreting programs (ie working with ASTs)
* Manipulating schemas (or any generic representation of data)

Each comes with its own set of challenges (although those sets have a non-empty intersection). This cookbook is therefore divided into two main parts, each dedicated to one of these two families of problems.

## Table of contents

* [Working With Schemas](schemas/README.md)
  * [Working With Schemas Only](schemas/README.md#working-with-schemas-only)
  * [Knowing Where You Are](schemas/README.md#knowing-where-you-are)
  * [Remembering What You Did Before](schemas/README.md#remembering-what-you-did-before)
  * [Working With Schemas and Data](schemas/README.md#working-with-schemas-and-data)
* Working With ASTs
  * [Which fixed-point operator should I use?](Mu-Nu/README.md)
  * [How do I convert between (co)recursive structures?](NaturalTransformations/README.md)
  * [How do I pass data toward the leaves?](AttributeGrammars/README.md)
  * How do I restrict which nodes can occur where? (mutual recursion, inductive type families)
  * How can I apply this to DAGs? (`RootedGraph`)
  * Fixing frustrations of directly-recursive types
  * How do I avoid creating two ASTs that are 90% the same, so I can tighten the type after a transformation? (`Coproduct`)
  * [How do I annotate a tree?](Annotation/README.md)
  * How do I compose algebras and coalgebras efficiently?
  * [Streaming](Streaming/README.md)
* [Glossary](glossary.md)

--------------------------------------------------------------------------------
/Streaming/README.md:
--------------------------------------------------------------------------------
# Streaming

“I want `anaM`.”

You probably don’t.

```haskell
anaM :: (Corecursive t f, Traversable f) => (a -> m (f a)) -> a -> m t
```

This looks like a reasonable enough type, and it’s not hard to implement in most languages. But when you think through what it has to do, you can see the problem.

Say you have `expandFoo :: Natural -> Either String (Foo Natural)` as your coalgebra. Let’s first look at it with (non-monadic) `ana`.

```haskell
ana (Compose <<< expandFoo) :: Natural -> Nu (Compose (Either String) Foo)
```

This is unproblematic. However, we don’t have a nice `Either String (Nu Foo)` like we were hoping for. There’s an `Either` before each node, and we need to handle them one at a time.
The only way to pull them out is to walk the whole structure, `sequenceA`ing each `Foo (Either String (Nu Foo))` to `Either String (Foo (Nu Foo))` and `embed`ding it. But [what do we know about `Nu`?](../Mu-Nu/README.md) The values may be infinitely large. That’s a problem if we want to walk the whole structure.

That’s roughly what `anaM` does (in one pass, though). So, it’s necessarily partial – it may never complete.

How do we get around this?

There are two approaches, and I recommend you explore them in this order:

## 1. use a fold instead

When you have an input with a `Recursive` instance, you can sometimes convert your `Coalgebra` to an `Algebra` (or, even better, to a `NaturalTransformation`). We lucked out – our example above uses `Natural`, which has `Recursive Natural Maybe`. So, after thinking hard for a while, perhaps we manage to get to `natToFoo :: Maybe (Mu Foo) -> Either String (Mu Foo)`.
```haskell
cataM natToFoo :: Natural -> Either String (Mu Foo)
```
This has to do the same traversal as we did above, but since we know `Natural` is finite, we can walk the structure without worrying about non-termination … which is exactly what `cata` already does, so we can `sequenceA` in that same pass.

Notice that we also have `Mu Foo` instead of `Nu Foo` in the result. This isn’t guaranteed, but often if you can fold to something instead of unfold to it, you get to keep the finitary proof that you started with.

But, this approach doesn’t always work, so …

## 2. try streaming

Doing this within a recursion scheme library requires writing a bit of your own machinery _for now_. But the annoying type we had before – `Nu (Compose (Either String) Foo)` – is very similar to what you see in effectful stream libraries. The types you often see look more like this, though
```haskell
type Stream m a = Nu (Coproduct m (XNor a))
```
- Where `Compose m f` means something like “always an `m` followed by an `f`”, `Coproduct m f` means something like “either an `m` or an `f` at each step”. So, you might get five `Right`s in a row before seeing the next `Foo`.
- The `Either String` is generalized to an arbitrary functor.
- `Foo :: * -> *` is replaced by `XNor a :: * -> *`, which is the pattern functor for `List`-like things, since most streaming is done over a sequence of things.

But recursion schemes aren’t restricted in those ways – you can stream arbitrary tree structures, exploring only the branches you need to. For the time being, you should be able to do a lot of similar stuff with an effectful streaming library (say, `conduit` in Haskell or FS2 in Scala).

--------------------------------------------------------------------------------
/glossary.md:
--------------------------------------------------------------------------------
# Glossary

## A

### algebra

### anamorphism

### apomorphism

## C

### carrier

In an algebra like `f a -> a`, `a` is called the _carrier_.

### catamorphism

### chronomorphism

### coalgebra

## D

### distributive law

### dynamorphism

## E

### Elgot

A variation of generalized algebras that swaps the parameters from `f (w a) -> a` to `w (f a) -> a`. The name is taken from the “Elgot algebra” and is indicated in recursion scheme libraries with an `e` prefix.

## F

### `Fix`

A [fixed-point operator](#fixed-point-operator) implemented using native recursion. Depending on the evaluation model of the language, it may be more akin to [Mu](#Mu) (in a strict language like Scala) or [Nu](#Nu) (in a lazy language like Haskell). It is generally unimplementable in total languages, but the clarity of its definition makes it useful for teaching recursion schemes.

### fixed-point operator

Any of a number of type constructors that have the (poly)kind `(k -> k) -> k` (most commonly seen where `k = *`). [Mu](#Mu), [Nu](#Nu), and [Fix](#Fix) are the most-frequently seen.

### fold

### fusion

### futumorphism

## G

### generalized algebra

### generalized fold

### greatest fixed point

## H

### histomorphism

### hylomorphism

## L

### least fixed point

## M

### metamorphism

### Mu (Μ)

### mutumorphism

## N

### natural transformation

Sometimes represented by the Greek lowercase eta (η).

### Nu (Ν)

## P

### pattern functor

### phi (ɸ)

### psi (ψ)

## U

### unfold

## Z

### zygomorphism

_zygo-_ is a prefix meaning something like “paired” (think of a _zygote_, which is a cell formed by the pairing of two gametes). And the algebra used for a zygomorphism is paired like that – `f (ann, a) -> a`. It pairs the carrier with an extra value providing context to the algebra.

--------------------------------------------------------------------------------
/schemas/README.md:
--------------------------------------------------------------------------------
# Working With Schemas

Schemas are tree-like objects that unambiguously and completely describe a data structure. Schemas are useful whenever we need to move data in and out across the boundaries of our applications. A generic way to use and take advantage of schemas regardless of their concrete representation (there are a bunch of competing schema formats out there, like Avro, Thrift, Protobuf, etc) is therefore much needed.

### The SchemaF Pattern-Functor

We obviously need a pattern-functor whose structure reflects the general structure of schemas. Of course you'll want to tailor this pattern to fit your business problem, keeping only the bits of the general structure you need, but you'll probably end up with something that looks like the following:

```scala
sealed trait SchemaF[A]

// struct aka record aka object aka dictionary
final case class StructF[A](fields: Map[String, A]) extends SchemaF[A]

final case class ArrayF[A](element: A) extends SchemaF[A]

final case class UnionF[A](alternatives: List[A]) extends SchemaF[A]

// suppose we have a `Type` ADT that represents simple types such as Int, String, etc
final case class ValueF[A](tpe: Type) extends SchemaF[A]
```

You might not care about unions at all and therefore omit the `UnionF` case, or you might prefer to have a specific case for every simple type (like `final case class IntF[A]() extends SchemaF[A]`) and so on.

In any case, it's always a good idea to provide a `Traverse` instance for your pattern right away, rather than just a mere `Functor`. It'll allow you to use all the "monadic-flavored" schemes out of the box.
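
Such an instance might look like this (a sketch written against scalaz, which matryoshka is built on; the `Type` ADT is the one assumed above):

```scala
import scalaz._, Scalaz._

implicit val schemaFTraverse: Traverse[SchemaF] = new Traverse[SchemaF] {
  def traverseImpl[G[_]: Applicative, A, B](fa: SchemaF[A])(f: A => G[B]): G[SchemaF[B]] =
    fa match {
      case StructF(fields)      => fields.traverse(f).map(fs => StructF(fs): SchemaF[B])
      case ArrayF(element)      => f(element).map(e => ArrayF(e): SchemaF[B])
      case UnionF(alternatives) => alternatives.traverse(f).map(as => UnionF(as): SchemaF[B])
      case ValueF(tpe)          => (ValueF(tpe): SchemaF[B]).point[G]
    }
}
```
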
It'll allow you to use all the "monadic-flavored" schemes out of the box. 26 | 27 | ## Working With Schemas Only 28 | 29 | A lot of work can be done using only a schema or to put it differently, a lot of interesting functions can be built by applying some f-algebras to our `SchemaF` pattern. 30 | 31 | We can produce functions that convert schemas between different formats (Avro, Thrift, Protobuf, etc), that produce data-validator for a schema, that ensure compatibility across different versions of a schema and so on. 32 | 33 | ### Knowing Where You Are 34 | 35 | It's sometimes needed to know the position of a given schema "node" within the global schema. For example, you want the data-validator you're building to add a precise path to the error messages it produces. 36 | 37 | Such information is not present in our pattern-functor (and it shouldn't be in yours either). This means that we need a way to *label* each "node" of our schema with its path first. 38 | 39 | First we need to notice that the only way to build such paths is to go top-down from the "root" of our schema, which means that we're looking for a coalgebra. 40 | 41 | Provided we can compute such path, what we want is to label each element of a given schema with its path. The `matryoshka.patterns.EnvT` pattern-functor allows just that. Given a label type `E` and a (pattern-)functor `F`, `EnvT[E, F, ?]` is the pattern-functor that has the exact same structure as `F` but with each "node" labelled with a value of type `E`. 42 | 43 | ``` scala 44 | final case class EnvT[E, F[_], A](run: (E, F[A])) 45 | ``` 46 | 47 | So we want to build a `Coalgebra[EnvT[Path, SchemaF, ?], A]` but we still need to find the right carrier (the `A` type variable). We will surely need our recursive `T` on `SchemaF`, but we also need to *carry along* the path from the root, so we'll use `(Path, T)` as our carrier. Now we're ready to implement our coalgebra. 48 | 49 | ``` scala 50 | def labelWithPath[T](implicit T: Recursive.Aux[T, SchemaF]): Coalgebra[EnvT[Path, SchemaF, ?], (Path, T)] = { 51 | case (path, t) => 52 | t.project match { 53 | case StructF(fields) => 54 | EnvT((path, StructF(fields.map{ case (name, tt) => 55 | name -> (path / name, tt) 56 | }))) 57 | case schema => EnvT((path, schema)) 58 | } 59 | } 60 | ``` 61 | 62 | For every struct field, we *push down* a new path that is the current path to which we append the name of that field. The other case have no influence on the path so we simply push the current one. 63 | 64 | We need to do that by hand because the name of a field is not visible from the field itself, so we have no way to write a function of type `(Path, SchemaF[T]) => Path` that would actually build a correct path. 65 | 66 | If we were able to write such function, we could use the `attributeTopDown` function from the `Recursive` trait. For example, labelling each element of a schema with its depth would look like: 67 | 68 | ``` scala 69 | def labelWithDepth[T, U](t: T)(implicit T: Recursive.Aux[T, SchemaF], 70 | U: Corecursive.Aux[U, EnvT[Int, SchemaF, ?]]): U = 71 | T.attributeTopDown[U, Int](t, 0) { 72 | case (depth, _) => depth + 1 73 | } 74 | ``` 75 | 76 | ### Remembering What You Did Before 77 | 78 | The main strength of recursion schemes is that you get to process a "tree" one "node" at a time, which means that your algebra is oblivious of the result it produced before (elswhere on the "tree"). 79 | 80 | Lets be a bit more precise. 
### Remembering What You Did Before

The main strength of recursion schemes is that you get to process a "tree" one "node" at a time, which means that your algebra is oblivious of the result it produced before (elsewhere on the "tree").

Let's be a bit more precise. The carrier of any f-algebra carries precisely the result of the previous execution of said f-algebra, so we always remember what we did just before. But what if we need to remember what we did on a totally unrelated part of the tree?

Concretely, say we want to serialize any `T` recursive on `SchemaF` to `org.apache.avro.Schema`. It's definitely doable since we can express every possible `SchemaF` case in terms of `avro.Schema`, structure-wise. The problem is that the Avro API mandates that, when building a `Schema`, we register every record (the Avro representation of our `StructF`) under a name that is unique across the whole `Schema`. This is because Avro is primarily meant as a binary representation of some classes in a codebase, but that's not the case here, so these mandatory unique names are irrelevant to us. All we're interested in is the *structure* of our `SchemaF`, so if we're able to deterministically derive a name from an arbitrary `SchemaF` we're good to go, provided that we can remember the names we've already registered.

So we need to write an algebra that works in a *context* where it can *register* and *look up* facts, that is, a context that's able to maintain an *updatable state*: the `State` monad.

In conclusion, we want to write an `AlgebraM[State[Registry, ?], SchemaF, avro.Schema]`. In the `Registry` managed by the state monad, we'll store a mapping from name to `Schema` for all the partial records we've already built. That way we'll be able to avoid duplicate names.

``` scala
import org.apache.avro.Schema

type Registry = Map[String, Schema]

def name(structure: Map[String, Schema]): String = ???
def buildRecordSchema(name: String, fields: Map[String, Schema]): Schema = ???

def toAvroAlg: AlgebraM[State[Registry, ?], SchemaF, Schema] = {

  case StructF(fields) =>
    State { registry =>
      val n = name(fields)
      if (registry.contains(n)) // recalling what we did in the past
        (registry, registry(n))
      else {
        val schema = buildRecordSchema(n, fields)
        (
          registry + (n -> schema), // memorizing what we just did
          schema
        )
      }
    }

  // (the remaining SchemaF cases are omitted here)
}

def toAvro[T](t: T)(implicit T: Recursive.Aux[T, SchemaF]): Schema =
  t.cataM(toAvroAlg).run(Map.empty)._2
```

As a bonus, since we reuse a record we've already registered whenever we encounter a `StructF` that has the exact same structure (provided that `name` is injective), we're guaranteed to build the most compact `Schema` we can.

## Working With Schemas And Data

--------------------------------------------------------------------------------