├── .gitignore
├── LICENSE
├── README.md
├── Setup.hs
├── explanation.md
├── higher-order-unification.cabal
├── src
    ├── Client.hs
    └── Unification.hs
└── stack.yaml


/.gitignore:
--------------------------------------------------------------------------------
1 | .stack-work
2 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2017 Danny Gratzer
 2 | 
 3 | Permission is hereby granted, free of charge, to any person obtaining
 4 | a copy of this software and associated documentation files (the
 5 | "Software"), to deal in the Software without restriction, including
 6 | without limitation the rights to use, copy, modify, merge, publish,
 7 | distribute, sublicense, and/or sell copies of the Software, and to
 8 | permit persons to whom the Software is furnished to do so, subject to
 9 | the following conditions:
10 | 
11 | The above copyright notice and this permission notice shall be included
12 | in all copies or substantial portions of the Software.
13 | 
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
17 | IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
18 | CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
19 | TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
20 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
21 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## higher-order-unification
 2 | 
 3 | A simple, concise implementation of Huet's algorithm. Written because
 4 | it's difficult to translate the simple prose explanations of
 5 | algorithms often adopted by the unification community to a working
 6 | piece of code. The code is documented fully in `explanation.md`.
 7 | 
 8 | An example of how higher-order unification might be used may be found
 9 | in `src/Client.hs` which provides a simple type-inference/checking
10 | algorithm for a dependently typed language with `Type : Type`.
11 | 


--------------------------------------------------------------------------------
/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 | 


--------------------------------------------------------------------------------
/explanation.md:
--------------------------------------------------------------------------------
  1 | ## An Explanation of Unification.hs
  2 | 
  3 | In order to make this code useful to others, I would like to take the
  4 | time to explain exactly how it works. In this file, we will go through
  5 | the unification algorithm being used and how it is implemented in the
  6 | code.
  7 | 
  8 | ### The Problem
  9 | 
 10 | Before beginning, it's worth clarifying the problem that we're
 11 | attempting to solve with this code, namely, what is higher order
 12 | unification? The simple answer is that we want to take two terms with
 13 | "holes" in them, called metavariables. We then want to figure out how
 14 | to replace those metavariables with programs so that the two terms,
 15 | once fully filled in, evaluate to the same term. Our language will
 16 | contain the following constructs,
 17 | 
 18 |  1. Variables
 19 |  2. Functions (lambdas) and correspondingly application
 20 |  3. Metavariables
 21 |  4. Function types, in the typical style for a dependently typed
 22 |  language: pi types
 23 |  5. Universe types
 24 | 
 25 | This was originally designed to be part of a typechecker for a
 26 | particular dependently typed language, hence the pi types and
 27 | universes, but they can safely be ignored and treated as particular
 28 | constants.
 29 | 
 30 | The main issue is, that it's actually undecidable to do this in the
 31 | general case. It's therefore only possible to implement a semidecision
 32 | procedure that performs relatively well in practice. By a semidecision
 33 | procedure, I mean an algorithm that will terminate with a solution
 34 | when possible and reject only some of the time. This procedure is
 35 | called Huet's algorithm and it's essentially a refinement of the
 36 | following algorithm
 37 | 
 38 |  1. Generate a solution
 39 |  2. Test it
 40 |  3. If the solution was correct, stop
 41 |  4. Else, go to 1
 42 | 
 43 | This is not exactly the most sophisticated algorithm but it does have
 44 | the benefit of being obviously a correct semidecision procedure for
 45 | our problem. The idea with Huet's algorithm is to gradually produce a
 46 | solution and to only produce solutions that are at least not
 47 | obviously wrong. By doing this, we drastically cut down the search
 48 | space and produce answers reasonably quickly.
 49 | 
 50 | ### The Set Up
 51 | 
 52 | To begin with, we introduce the tools that we will need to even code
 53 | up the unification algorithm. The first critical point is how we
 54 | define the language we're unifying in the first place. I will
 55 | represent terms using the so-called "locally nameless" approach. This
 56 | means that we use
 57 | [de Bruijn](https://en.wikipedia.org/wiki/De_Bruijn_index) to
 58 | represent bound variables. However, for free variables we will
 59 | generate globally unique identifiers to simplify the process of
 60 | carrying around contexts or the like. This does mean that our AST has
 61 | two different constructors for variables, free and bound.
 62 | 
 63 | ``` haskell
 64 |     type Id = Int
 65 |     type Index = Int
 66 |     data Term = FreeVar Id
 67 |               | LocalVar Index
 68 |               | MetaVar Id
 69 |               | Uni
 70 |               | Ap Term Term
 71 |               | Lam Term
 72 |               | Pi Term Term
 73 |               deriving (Eq, Show, Ord)
 74 | ```
 75 | 
 76 | Since we're using de Bruijn indices, we also need to define a crucial
 77 | helper function called `raise :: Int -> Term -> Term`. This raises all
 78 | the variables wrapped in a `LocalVar` constructor up by `i`. This is
 79 | done by recursing over the inputted term.
 80 | 
 81 | ``` haskell
 82 |     raise :: Int -> Term -> Term
 83 |     raise = go 0
 84 |       where go lower i t = case t of
 85 |               FreeVar i -> FreeVar i
 86 |               LocalVar j -> if i > lower then LocalVar (i + j) else LocalVar j
 87 |               MetaVar i -> MetaVar i
 88 |               Uni -> Uni
 89 |               Ap l r -> go lower i l `Ap` go lower i r
 90 |               Lam body -> Lam (go (lower + 1) i body)
 91 |               Pi tp body -> Pi (go lower i tp) (go (lower + 1) i body)
 92 | ```
 93 | 
 94 | Using this, we can define substitution on terms. This will be useful
 95 | later on directly. For this, we first define the notion of replacing a
 96 | de Bruijn variable with a term.
 97 | 
 98 | ``` haskell
 99 |     subst :: Term -> Int -> Term -> Term
100 |     subst new i t = case t of
101 |       FreeVar i -> FreeVar i
102 |       LocalVar j -> case compare j i of
103 |         LT -> LocalVar j
104 |         EQ -> new
105 |         GT -> LocalVar (j - 1)
106 |       MetaVar i -> MetaVar i
107 |       Uni -> Uni
108 |       Ap l r -> subst new i l `Ap` subst new i r
109 |       Lam body -> Lam (subst (raise 1 new) (i + 1) body)
110 |       Pi tp body -> Pi (subst new i tp) (subst (raise 1 new) (i + 1) body)
111 | ```
112 | 
113 | Notice that we have used `raise` to escape `new` as we go under
114 | binders to avoid capturing variables. Similarly, since we're removing
115 | a binding level, if we have any de Bruijn variables that refer to a
116 | binding site outside of the one we're working with we have to lower it
117 | to compensate. That is the reason for the line `GT -> LocalVar (j - 1)`.
118 | Apart from these two complications, substitution is just hunting for
119 | all occurrences of `LocalVar i` and replacing it with `new`. However,
120 | we also have this metavariables so it makes sense that we have a
121 | notion of substitution for these as well. It's simpler than the above
122 | substitution function because we don't have to worry about lowering
123 | variables that might be affected by deleting a metavariable binding
124 | since we're using globally unique identifiers for them.
125 | 
126 | ``` haskell
127 |     substMV :: Term -> Id -> Term -> Term
128 |     substMV new i t = case t of
129 |       FreeVar i -> FreeVar i
130 |       LocalVar i -> LocalVar i
131 |       MetaVar j -> if i == j then new else MetaVar j
132 |       Uni -> Uni
133 |       Ap l r -> substMV new i l `Ap` substMV new i r
134 |       Lam body -> Lam (substMV (raise 1 new) i body)
135 |       Pi tp body -> Pi (substMV new i tp) (substMV (raise 1 new) i body)
136 | ```
137 | 
138 | Now there are only a few more utility functions left before we can get
139 | to the actual unification. We need a function to gather all of the
140 | metavariables in a term. For this we use a `Set` from `containers` and
141 | just fold over the structure of the term.
142 | 
143 | ``` haskell
144 |     metavars :: Term -> S.Set Id
145 |     metavars t = case t of
146 |       FreeVar i -> S.empty
147 |       LocalVar i -> S.empty
148 |       MetaVar j -> S.singleton j
149 |       Uni -> S.empty
150 |       Ap l r -> metavars l <> metavars r
151 |       Lam body -> metavars body
152 |       Pi tp body -> metavars tp <> metavars body
153 | ```
154 | 
155 | Another useful function will be necessary for enforcing the condition
156 | that we only unify metavariables with closed terms (no capturing). In
157 | order to handle this, we will need to check that a given term is
158 | closed. This is as simple as looking to see if it mentions the
159 | `FreeVar` constructor since `LocalVar` is used for only bound
160 | variables by invariant.
161 | 
162 | ``` haskell
163 |     isClosed :: Term -> Bool
164 |     isClosed t = case t of
165 |       FreeVar i -> False
166 |       LocalVar i -> True
167 |       MetaVar j -> True
168 |       Uni -> True
169 |       Ap l r -> isClosed l && isClosed r
170 |       Lam body -> isClosed body
171 |       Pi tp body -> isClosed tp && isClosed body
172 | ```
173 | 
174 | The last complicated utility function is `reduce`. This is actually
175 | just a simple interpreter for the language we defined earlier. It
176 | essentially repeatedly searches for `Ap (Lam ...) ...` and when it
177 | finds such an occurrence substitutes the argument into the body of the
178 | function as one might expect. I have made this function reduce
179 | everywhere because it seems to provide a significant performance
180 | improvement in many cases.
181 | 
182 | ``` haskell
183 |     reduce :: Term -> Term
184 |     reduce t = case t of
185 |       FreeVar i -> FreeVar i
186 |       LocalVar j -> LocalVar j
187 |       MetaVar i -> MetaVar i
188 |       Uni -> Uni
189 |       Ap l r -> case reduce l of
190 |         Lam body -> reduce (subst r 0 body)
191 |         l' -> Ap l' (reduce r)
192 |       Lam body -> Lam (reduce body)
193 |       Pi tp body -> Pi (reduce tp) (reduce body)
194 | ```
195 | 
196 | The remaining utility funcitons are simply checks and manipulations
197 | that we will frequently need on terms. We have a function which checks
198 | whether a term is of the form `M e1 e2 e3 ...` for some metavariable
199 | `M`; such terms are said to be stuck.
200 | 
201 | ``` haskell
202 |     isStuck :: Term -> Bool
203 |     isStuck MetaVar {} = True
204 |     isStuck (Ap f _) = isStuck f
205 |     isStuck _ = False
206 | ```
207 | 
208 | The remaining utility functions simply convert telescopes of
209 | applications, `f a1 a2 a3 ...`, into an function and a list of
210 | arguments, `(f, [a1 ... an])` and then we have a function to put
211 | things back again.
212 | 
213 | ``` haskell
214 |     peelApTelescope :: Term -> (Term, [Term])
215 |     peelApTelescope t = go t []
216 |       where go (Ap f r) rest = go f (r : rest)
217 |             go t rest = (t, rest)
218 | 
219 |     applyApTelescope :: Term -> [Term] -> Term
220 |     applyApTelescope = foldl' Ap
221 | ```
222 | 
223 | We are now in a position to turn to implementing the actual
224 | unification algorithm with all of our utilities in hand.
225 | 
226 | ### The Unification Algorithm
227 | 
228 | There are really only two key functions in implementing the
229 | unification algorithm. We can either take an existing constraint and
230 | simplify it, or take a constraint and produce a list of partial
231 | solutions, at least one of which is correct if the constraint is
232 | solvable. The first function is remarkably similar to the first-order
233 | case of unification, we essentially take a constraint and produce a
234 | set of constraints which are equivalent to the original one. For
235 | instance, if our constraint that we're trying to solve is
236 | 
237 | ``` haskell
238 |     FreeVar 0 `Ap` E === FreeVar 0 `Ap` E'
239 | ```
240 | 
241 | It's easy to see that we might as well solve constraint `E === E'`
242 | which is strictly simpler. This is what the function `simplify`
243 | does. It has the type
244 | 
245 | ``` haskell
246 |     simplify :: Constraint -> UnifyM (S.Set Constraint)
247 | ```
248 | 
249 | In order to work with generating fresh metavariables and (later)
250 | backtracking, we use the monad `UnifyM`. This is defined, as is
251 | `Constraint`, as a type synonym
252 | 
253 | ``` haskell
254 |     type UnifyM = LogicT (Gen Id)
255 |     type Constraint = (Term, Term)
256 | ```
257 | 
258 | Here we are using the package
259 | [logict](https://hackage.haskell.org/package/logict) to provide
260 | backtracking. My tutorial of this package can be found
261 | [here](https://jozefg.bitbucket.io/posts/2014-07-10-reading-logict.html). We
262 | are also using a package a threw together a few years ago called
263 | [`monad-gen`](https://hackage.haskell.org/package/monad-gen), it just
264 | provides a simple monad for generating fresh values. The sort of thing
265 | that I always end up needing in compilers. Without further-ado, let's
266 | start going through the cases for `simplify`. Each one of which
267 | corresponds to a simplifying move we are allowed to make on a
268 | constraint, ordered in terms of priority.
269 | 
270 | ``` haskell
271 |     simplify (t1, t2)
272 |       | t1 == t2 = return S.empty
273 | ```
274 | 
275 | We start out with a nice and simple case, if the two terms of the
276 | constraint are literally identical, we have no further goals. Next we
277 | have two cases integrating reduction. If either term is reducible at
278 | all we reduce it and try to simplify the remaining goals.
279 | 
280 | ``` haskell
281 |       | reduce t1 /= t1 = simplify (reduce t1, t2)
282 |       | reduce t2 /= t2 = simplify (t1, reduce t2)
283 | ```
284 | 
285 | This is how we integrate the fact that our unification is modulo
286 | reduction (we allow two terms to unify if they reduce to the same
287 | thing). Next comes the cases that are a little more sophisticated and
288 | correspond more closely to our original motivating example. If our two
289 | terms are a several things applied to free variables, we know the
290 | following
291 | 
292 |  1. The free variables have to be the same
293 |  2. All of the arguments must unify
294 | 
295 | This is captured by the following branch of simplify.
296 | 
297 | ``` haskell
298 |       | (FreeVar i, cxt) <- peelApTelescope t1,
299 |         (FreeVar j, cxt') <- peelApTelescope t2 = do
300 |           guard (i == j && length cxt == length cxt')
301 |           fold <$> mapM simplify (zip cxt cxt')
302 | ```
303 | 
304 | This code just codifies the procedure that we have informally sketched
305 | above. If we're trying to unify `A a1 ... an` and `B b1 ... bm` for
306 | two free variables `A` and `B` then we must have `A = B` and `n = m`
307 | since we have to find a solution that works for any `A` and any
308 | `B`. Finally, we then just need to unify `ai` with `bi`. The next two
309 | cases are congruence type rules. We basically just produce new
310 | constraints saying that `Lam e === Lam e'` if and only if `e ===
311 | e'`. There is a small amount of bookkeeping done to make sure that
312 | free variables are correctly represented by a globally unique
313 | `FreeVar i`. The same thing is done for `Pi` except, since `Pi`s are
314 | annotated with a type we also add a constraint for these types as well.
315 | 
316 | ``` haskell
317 |       | Lam body1 <- t1,
318 |         Lam body2 <- t2 = do
319 |           v <- FreeVar <$> lift gen
320 |           return $ S.singleton (subst v 0 body1, subst v 0 body2)
321 |       | Pi tp1 body1 <- t1,
322 |         Pi tp2 body2 <- t2 = do
323 |           v <- FreeVar <$> lift gen
324 |           return $ S.fromList
325 |             [(subst v 0 body1, subst v 0 body2),
326 |              (tp1, tp2)]
327 | ```
328 | 
329 | The final case is to decide whether or not the constraint is "stuck"
330 | on a metavariable, in which case we'll need to guess a solution for a
331 | metavariable or whether the constraint is just impossible. If neither
332 | constraint is stuck, we fail using `mzero` and if we're stuck then we
333 | just return the inputted constraint since we can make it no simpler.
334 | 
335 | ``` haskell
336 |       | otherwise =
337 |         if isStuck t1 || isStuck t2 then return $ S.singleton (t1, t2) else mzero
338 | ```
339 | 
340 | Now we turn to the most complicated part of the algorithm, where we
341 | actual try and produce possible and partial solutions for our
342 | unification constraints. The basic idea is to work with constraints of
343 | the form
344 | 
345 | ``` haskell
346 |     M a1 a2 ... an = A b1 b2 ... bm
347 | ```
348 | 
349 | where `M` is a metavariable and `A` is a some term, probably a free
350 | variable. These are called flex-rigid equations because one half is
351 | flexible, a metavariable, while one half is rigid. The first part of
352 | this code is to extract the relevant pieces of information from the
353 | constraint. Therefore, the code roughly looks like
354 | 
355 | ``` haskell
356 |     tryFlexRigid :: Constraint -> [UnifyM [Subst]]
357 |     tryFlexRigid (t1, t2)
358 |       | (MetaVar i, cxt1) <- peelApTelescope t1,
359 |         (stuckTerm, cxt2) <- peelApTelescope t2,
360 |         not (i `S.member` metavars t2) = error "TODO"
361 |       | (MetaVar i, cxt1) <- peelApTelescope t2,
362 |         (stuckTerm, cxt2) <- peelApTelescope t1,
363 |         not (i `S.member` metavars t1) = error "TODO"
364 |       | otherwise = []
365 | ```
366 | 
367 | This simply uses `peelApTelescope` to extract the 4 components `M`,
368 | `(a1 ... an)`, `A` and `(b1 ... bm)`. The resulting type is "morally"
369 | supposed to be `[Subst]` but for technical reasons we need to
370 | `[UnifyM [Subst]]` because we need to generate metavariables for the
371 | substitutions. There are exactly 2 forms that `M` may take
372 | 
373 |  - `M = λ x1. ... λ xn. xi (M1 x1 ... xn) ... (Mr x1 ... xn)`
374 |  - `M = λ x1. ... λ xn. A (M1 x1 ... xn) ... (Mr x1 ... xn)`
375 |    (if `A` is closed)
376 | 
377 | These are the only two forms that `M` can take because if `M` is any
378 | other constant or free variable than it would immediately
379 | contradictory, `M` couldn't possibly unify with `A b1 ... bm` as we
380 | need it to. Therefore, `tryFlexRigid` will produce a list of such
381 | substitutions (mod effects) replacing `M` with both of these. Since we
382 | don't know how many subterms we must apply to `xi` or `A` this will be
383 | an infinitely long list. More on this complication will
384 | follow. Therefore, we can replace `error "TODO"` with
385 | 
386 | ``` haskell
387 |     type Subst = M.Map Id Term
388 | 
389 |     tryFlexRigid :: Constraint -> [UnifyM [Subst]]
390 |     tryFlexRigid (t1, t2)
391 |       | (MetaVar i, cxt1) <- peelApTelescope t1,
392 |         (stuckTerm, cxt2) <- peelApTelescope t2,
393 |         not (i `S.member` metavars t2) = proj (length cxt1) i stuckTerm 0
394 |       | (MetaVar i, cxt1) <- peelApTelescope t2,
395 |         (stuckTerm, cxt2) <- peelApTelescope t1,
396 |         not (i `S.member` metavars t1) = proj (length cxt1) i stuckTerm 0
397 |       | otherwise = []
398 | ```
399 | 
400 | Here `proj` generates the list of substitutions. It's arguments are
401 | 
402 |  1. The number of bound variables
403 |  2. The metavariable we're trying to find substitutions for
404 |  3. The term `A` that we may use to construct a substitution for `M`
405 |  4. The number of subterms to generate (this will be incremented in
406 |     the recursive call)
407 | 
408 | It's defined just as
409 | 
410 | ``` haskell
411 |     proj bvars mv f nargs =
412 |       generateSubst bvars mv f nargs : proj bvars mv f (nargs + 1)
413 | ```
414 | 
415 | Now the work is done in the actual function
416 | `generateSubst :: Int -> Id -> Term -> Int -> UnifyM [Subst]`. We have
417 | already explained the behavior of `generateSubst`, it's just going to
418 | create all possible substitutions of the form described above. There
419 | is little more to say than to just show the code.
420 | 
421 | ``` haskell
422 |     generateSubst bvars mv f nargs = do
423 |       let mkLam tm = foldr ($) tm (replicate bvars Lam)
424 |       let saturateMV tm = foldl' Ap tm (map LocalVar [0..bvars - 1])
425 |       let mkSubst = M.singleton mv
426 |       args <- map saturateMV . map MetaVar <$> replicateM nargs (lift gen)
427 |       return [mkSubst . mkLam $ applyApTelescope t args
428 |              | t <- map LocalVar [0..bvars - 1] ++
429 |                     if isClosed f then [f] else []]
430 | ```
431 | 
432 | All that is left to do is to tie these two functions together in to
433 | try and produce a solution in general. One small caveat is that we
434 | need a few simple functions for working with substitutions. One to
435 | take a `Subst` and perform all the indicated replacements on a term
436 | and one to take two substitutions and perform a disjoint merge on
437 | them.
438 | 
439 | ``` haskell
440 |     manySubst :: Subst -> Term -> Term
441 |     manySubst s t = M.foldrWithKey (\mv sol t -> substMV sol mv t) t s
442 | 
443 |     (<+>) :: Subst -> Subst -> Subst
444 |     s1 <+> s2 | not (M.null (M.intersection s1 s2)) = error "Impossible"
445 |     s1 <+> s2 = M.union (manySubst s1 <$> s2) s1
446 | ```
447 | 
448 | Now our main function, `unify` will take the current substitution and
449 | a set of constraints and produce a solution substitution and a set of
450 | flex-flex equations. These are equations of the form
451 | `M a1 ... an = M' b1 ... bn`. It is provable that so called flex-flex
452 | equations are always solvable (cf Huet's lemma) but solving them in a
453 | canonical way is impossible so we instead produce the solution "up to"
454 | flex-flex equations and let the user deal with the ambiguity however
455 | they choose. For example, such an equation in resulting from Agda's
456 | unification algorithm will produce the error "unresolved
457 | metavariables" because the metavariable is not canonically
458 | determined. Therefore, our main algorithm proceeds in the following
459 | steps
460 | 
461 | 1. Apply the given substitution to all our constraints.
462 | 2. Simplify the set of constraints to remove any obvious ones.
463 | 3. Separate flex-flex equations from flex-rigid ones.
464 | 4. Pick a flex-rigid equation at random, if there are none, we're
465 |    done.
466 | 5. Use `tryFlexRigid` to get a list of possible solutions
467 | 6. Try each solution and attempt to unify the remaining constraints,
468 |    backtracking if we get stuck
469 | 
470 | In order to implement 2, we define a function which is simply the
471 | "closure" of `simplify` and applies it until there is no more
472 | simplification to be done.
473 | 
474 | ``` haskell
475 |     repeatedlySimplify :: S.Set Constraint -> UnifyM (S.Set Constraint)
476 |     repeatedlySimplify cs = do
477 |       cs' <- fold <$> traverse simplify (S.toList cs)
478 |       if cs' == cs then return cs else repeatedlySimplify cs'
479 | ```
480 | 
481 | Apart from this, the main routine four unification is quite
482 | declarative
483 | 
484 | ``` haskell
485 |     unify :: Subst -> S.Set Constraint -> UnifyM (Subst, S.Set Constraint)
486 |     unify s cs = do
487 |       let cs' = applySubst s cs
488 |       cs'' <- repeatedlySimplify cs'
489 |       let (flexflexes, flexrigids) = S.partition flexflex cs''
490 |       if S.null flexrigids
491 |         then return (s, flexflexes)
492 |         else do
493 |           let psubsts = tryFlexRigid (S.findMax flexrigids)
494 |           trySubsts psubsts (flexrigids <> flexflexes)
495 | ```
496 | 
497 | The first line implements step 1, using
498 | `applySubst :: Subst -> S.Set Constraint -> S.Set Constraint` to apply
499 | our substitution. The next line simplifies the constraints so we're
500 | left with flex-flex or flex-rigid constraints. After this, we can
501 | partition the constraints into these two classes. From here, we simply
502 | implement steps 4-6 making use of the helper function `trySubst`
503 | 
504 | ``` haskell
505 |     trySubsts :: [UnifyM [Subst]] -> S.Set Constraint -> UnifyM (Subst,S.Set Constraint)
506 | ```
507 | 
508 | This function takes care of peeling out each substitution and applying
509 | it to the constraints we have lying around. In order to cope with the
510 | fact that all of these are potentially infinite and we need to fairly
511 | search the resulting space, we make use of
512 | `interleave :: m a -> m a -> m a` from `logict`. It's essentially
513 | equivalent to `mplus` from the list monad but search fairly in the
514 | case of infinite lists. This takes care of handling backtracking in a
515 | seamless and mostly invisible way, Haskell is fun sometimes! The code
516 | for implementing this is essentially just `interleave`-ing all the
517 | recursive calls to `unify` that we need to make using `mzero`,
518 | failure, for when we've run out of substitutions to try.
519 | 
520 | ``` haskell
521 |     trySubsts [] cs = mzero
522 |     trySubsts (mss : psubsts) cs = do
523 |       ss <- mss
524 |       let these = foldr interleave mzero [unify (newS <+> s) cs | newS <- ss]
525 |       let those = trySubsts psubsts cs
526 |       these `interleave` those
527 | ```
528 | 
529 | Putting all of this code together, we have completed a higher-order
530 | unificaiton algorithm! To make a top-level function to play with, we
531 | add a driver function which runs `unify` and strips out all of the
532 | monads of `UnifyM`
533 | 
534 | ``` haskell
535 |     driver :: Constraint -> Maybe (Subst, S.Set Constraint)
536 |     driver = listToMaybe . runGenFrom 100 . observeAllT . unify M.empty . S.singleton
537 | ```
538 | 


--------------------------------------------------------------------------------
/higher-order-unification.cabal:
--------------------------------------------------------------------------------
 1 | name:                higher-order-unification
 2 | version:             0.1.0.0
 3 | synopsis:            A simple implementation of higher order unification
 4 | license:             MIT
 5 | license-file:        LICENSE
 6 | author:              Danny Gratzer
 7 | maintainer:          jozefg@cmu.edu
 8 | category:            Language
 9 | build-type:          Simple
10 | extra-source-files:  README.md
11 | cabal-version:       >=1.10
12 | 
13 | 
14 | library
15 |   exposed-modules:     Unification
16 |                      , Client
17 |   build-depends:       base >=4 && <5
18 |                      , containers
19 |                      , logict
20 |                      , monad-gen
21 |                      , mtl
22 |   hs-source-dirs:      src
23 |   default-language:    Haskell2010
24 | 


--------------------------------------------------------------------------------
/src/Client.hs:
--------------------------------------------------------------------------------
 1 | module Client (infer) where
 2 | import           Control.Monad
 3 | import           Control.Monad.Gen
 4 | import           Control.Monad.Trans
 5 | import qualified Data.Map      as M
 6 | import           Data.Maybe
 7 | import           Data.Monoid
 8 | import qualified Data.Set      as S
 9 | import           Unification
10 | 
11 | typeOf :: M.Map Id Term
12 |        -> M.Map Id Term
13 |        -> Term
14 |        -> UnifyM (Term, S.Set Constraint)
15 | typeOf mcxt cxt t = case t of
16 |   LocalVar i -> mzero
17 |   FreeVar i -> maybe mzero (\x -> return (x, S.empty)) $ M.lookup i cxt
18 |   MetaVar i -> maybe mzero (\x -> return (x, S.empty)) $ M.lookup i mcxt
19 |   Uni -> return (Uni, S.empty)
20 |   Ap l r -> do
21 |     (tpl, cl) <- typeOf mcxt cxt l
22 |     (tpr, cr) <- typeOf mcxt cxt r
23 |     case tpl of
24 |       Pi from to -> return (subst r 0 to, cl <> cr <> S.singleton (from, tpr))
25 |       _ -> do
26 |         (m1, m2) <- (,) <$> lift gen <*> lift gen
27 |         return ( MetaVar m2 `Ap` r
28 |                , cl <> cr <>
29 |                  S.fromList [ (tpl, Pi (MetaVar m1) (MetaVar m2 `Ap` LocalVar 0))
30 |                             , (tpr, MetaVar m1) ])
31 |   Lam b -> do
32 |     (v, m) <- (,) <$> lift gen <*> lift gen
33 |     (to, cs) <-
34 |       typeOf (M.insert m Uni mcxt) (M.insert v (MetaVar m) cxt)
35 |       (subst (FreeVar v) 0 b)
36 |     return ( Pi (MetaVar m) (substFV (LocalVar 0) v (raise 1 to))
37 |            , cs <> S.singleton (MetaVar m, MetaVar m))
38 |   Pi from to -> do
39 |     v <- lift gen
40 |     (fromTp, fromCs) <- typeOf mcxt cxt from
41 |     (toTp, toCs) <- typeOf mcxt (M.insert v from cxt) (subst (FreeVar v) 0 to)
42 |     return (Uni, fromCs <> toCs <> S.fromList [(Uni, fromTp), (Uni, toTp)])
43 | 
44 | infer :: Term -> Maybe (Term, S.Set Constraint)
45 | infer t = listToMaybe . runUnifyM $ go
46 |   where go = do
47 |           (tp, cs) <- typeOf M.empty M.empty t
48 |           (subst, flexflex) <- unify M.empty cs
49 |           return (manySubst subst tp, flexflex)
50 | 


--------------------------------------------------------------------------------
/src/Unification.hs:
--------------------------------------------------------------------------------
  1 | {-# LANGUAGE FlexibleContexts #-}
  2 | module Unification
  3 |   ( Id(..)
  4 |   , Index(..)
  5 |   , Term(..)
  6 |   , raise
  7 |   , subst
  8 |   , substMV
  9 |   , manySubst
 10 |   , substFV
 11 |   , UnifyM (..)
 12 |   , Constraint(..)
 13 |   , Subst (..)
 14 |   , unify
 15 |   , runUnifyM
 16 |   , driver) where
 17 | import           Control.Monad
 18 | import           Control.Monad.Gen
 19 | import           Control.Monad.Logic
 20 | import           Control.Monad.Trans
 21 | import qualified Data.Map.Strict     as M
 22 | import           Data.Maybe
 23 | import           Data.Foldable
 24 | import           Data.Monoid
 25 | import qualified Data.Set            as S
 26 | 
 27 | --------------------------------------------------
 28 | ------------------ the language ------------------
 29 | --------------------------------------------------
 30 | 
 31 | type Id = Int
 32 | type Index = Int
 33 | data Term = FreeVar Id
 34 |           | LocalVar Index
 35 |           | MetaVar Id
 36 |           | Uni
 37 |           | Ap Term Term
 38 |           | Lam Term
 39 |           | Pi Term Term
 40 |           deriving (Eq, Show, Ord)
 41 | 
 42 | -- | Raise @LocalVar@s without a binding by @i@ amount. Used to avoid
 43 | -- capture in terms with free de Bruijn variables.
 44 | raise :: Int -> Term -> Term
 45 | raise = go 0
 46 |   where go lower i t = case t of
 47 |           FreeVar i -> FreeVar i
 48 |           LocalVar j -> if i > lower then LocalVar (i + j) else LocalVar j
 49 |           MetaVar i -> MetaVar i
 50 |           Uni -> Uni
 51 |           Ap l r -> go lower i l `Ap` go lower i r
 52 |           Lam body -> Lam (go (lower + 1) i body)
 53 |           Pi tp body -> Pi (go lower i tp) (go (lower + 1) i body)
 54 | 
 55 | -- | Substitute a term for the de Bruijn variable @i@.
 56 | subst :: Term -> Int -> Term -> Term
 57 | subst new i t = case t of
 58 |   FreeVar i -> FreeVar i
 59 |   LocalVar j -> case compare j i of
 60 |     LT -> LocalVar j
 61 |     EQ -> new
 62 |     GT -> LocalVar (j - 1)
 63 |   MetaVar i -> MetaVar i
 64 |   Uni -> Uni
 65 |   Ap l r -> subst new i l `Ap` subst new i r
 66 |   Lam body -> Lam (subst (raise 1 new) (i + 1) body)
 67 |   Pi tp body -> Pi (subst new i tp) (subst (raise 1 new) (i + 1) body)
 68 | 
 69 | -- | Substitute a term for all metavariables with a given identifier.
 70 | substMV :: Term -> Id -> Term -> Term
 71 | substMV new i t = case t of
 72 |   FreeVar i -> FreeVar i
 73 |   LocalVar i -> LocalVar i
 74 |   MetaVar j -> if i == j then new else MetaVar j
 75 |   Uni -> Uni
 76 |   Ap l r -> substMV new i l `Ap` substMV new i r
 77 |   Lam body -> Lam (substMV (raise 1 new) i body)
 78 |   Pi tp body -> Pi (substMV new i tp) (substMV (raise 1 new) i body)
 79 | 
 80 | -- | Substitute a term for all free variable with a given identifier.
 81 | substFV :: Term -> Id -> Term -> Term
 82 | substFV new i t = case t of
 83 |   FreeVar j -> if i == j then new else FreeVar j
 84 |   MetaVar i -> MetaVar i
 85 |   LocalVar i -> LocalVar i
 86 |   Uni -> Uni
 87 |   Ap l r -> substFV new i l `Ap` substFV new i r
 88 |   Lam body -> Lam (substFV (raise 1 new) i body)
 89 |   Pi tp body -> Pi (substFV new i tp) (substFV (raise 1 new) i body)
 90 | 
 91 | -- | Gather all the metavariables in a term into a set.
 92 | metavars :: Term -> S.Set Id
 93 | metavars t = case t of
 94 |   FreeVar i -> S.empty
 95 |   LocalVar i -> S.empty
 96 |   MetaVar j -> S.singleton j
 97 |   Uni -> S.empty
 98 |   Ap l r -> metavars l <> metavars r
 99 |   Lam body -> metavars body
100 |   Pi tp body -> metavars tp <> metavars body
101 | 
102 | -- | Returns @True@ if a term has no free variables and is therefore a
103 | -- valid candidate for a solution to a metavariable.
104 | isClosed :: Term -> Bool
105 | isClosed t = case t of
106 |   FreeVar i -> False
107 |   LocalVar i -> True
108 |   MetaVar j -> True
109 |   Uni -> True
110 |   Ap l r -> isClosed l && isClosed r
111 |   Lam body -> isClosed body
112 |   Pi tp body -> isClosed tp && isClosed body
113 | 
114 | -- | Implement reduction for the language. Given a term, normalize it.
115 | -- This boils down mainly to applying lambdas to their arguments and all
116 | -- the appropriate congruence rules.
117 | reduce :: Term -> Term
118 | reduce t = case t of
119 |   FreeVar i -> FreeVar i
120 |   LocalVar j -> LocalVar j
121 |   MetaVar i -> MetaVar i
122 |   Uni -> Uni
123 |   Ap l r -> case reduce l of
124 |     Lam body -> reduce (subst r 0 body)
125 |     l' -> Ap l' (reduce r)
126 |   Lam body -> Lam (reduce body)
127 |   Pi tp body -> Pi (reduce tp) (reduce body)
128 | 
129 | -- | Check to see if a term is blocked on applying a metavariable.
130 | isStuck :: Term -> Bool
131 | isStuck MetaVar {} = True
132 | isStuck (Ap f _) = isStuck f
133 | isStuck _ = False
134 | 
135 | -- | Turn @f a1 a2 a3 a4 ... an@ to @(f, [a1...an])@.
136 | peelApTelescope :: Term -> (Term, [Term])
137 | peelApTelescope t = go t []
138 |   where go (Ap f r) rest = go f (r : rest)
139 |         go t rest = (t, rest)
140 | 
141 | -- | Turn @(f, [a1...an])@ into @f a1 a2 a3 a4 ... an@.
142 | applyApTelescope :: Term -> [Term] -> Term
143 | applyApTelescope = foldl' Ap
144 | 
145 | -----------------------------------------------------------------
146 | -------------- the actual unification code ----------------------
147 | -----------------------------------------------------------------
148 | 
149 | type UnifyM = LogicT (Gen Id)
150 | type Constraint = (Term, Term)
151 | 
152 | -- | Given a constraint, produce a collection of equivalent but
153 | -- simpler constraints. Any solution for the returned set of
154 | -- constraints should be a solution for the original constraint.
155 | simplify :: Constraint -> UnifyM (S.Set Constraint)
156 | simplify (t1, t2)
157 |   | t1 == t2 && S.null (metavars t1) = return S.empty
158 |   | reduce t1 /= t1 = simplify (reduce t1, t2)
159 |   | reduce t2 /= t2 = simplify (t1, reduce t2)
160 |   | (FreeVar i, cxt) <- peelApTelescope t1,
161 |     (FreeVar j, cxt') <- peelApTelescope t2 = do
162 |       guard (i == j && length cxt == length cxt')
163 |       fold <$> mapM simplify (zip cxt cxt')
164 |   | Lam body1 <- t1,
165 |     Lam body2 <- t2 = do
166 |       v <- FreeVar <$> lift gen
167 |       return $ S.singleton (subst v 0 body1, subst v 0 body2)
168 |   | Pi tp1 body1 <- t1,
169 |     Pi tp2 body2 <- t2 = do
170 |       v <- FreeVar <$> lift gen
171 |       return $ S.fromList
172 |         [(subst v 0 body1, subst v 0 body2),
173 |          (tp1, tp2)]
174 |   | otherwise =
175 |     if isStuck t1 || isStuck t2 then return $ S.singleton (t1, t2) else mzero
176 | 
177 | type Subst = M.Map Id Term
178 | 
179 | -- | Generate all possible solutions to flex-rigid equations as an
180 | -- infinite list of computations producing finite lists.
181 | tryFlexRigid :: Constraint -> [UnifyM [Subst]]
182 | tryFlexRigid (t1, t2)
183 |   | (MetaVar i, cxt1) <- peelApTelescope t1,
184 |     (stuckTerm, cxt2) <- peelApTelescope t2,
185 |     not (i `S.member` metavars t2) = proj (length cxt1) i stuckTerm 0
186 |   | (MetaVar i, cxt1) <- peelApTelescope t2,
187 |     (stuckTerm, cxt2) <- peelApTelescope t1,
188 |     not (i `S.member` metavars t1) = proj (length cxt1) i stuckTerm 0
189 |   | otherwise = []
190 |   where proj bvars mv f nargs =
191 |           generateSubst bvars mv f nargs : proj bvars mv f (nargs + 1)
192 |         generateSubst bvars mv f nargs = do
193 |           let mkLam tm = foldr ($) tm (replicate bvars Lam)
194 |           let saturateMV tm = foldl' Ap tm (map LocalVar [0..bvars - 1])
195 |           let mkSubst = M.singleton mv
196 |           args <- map saturateMV . map MetaVar <$> replicateM nargs (lift gen)
197 |           return [mkSubst . mkLam $ applyApTelescope t args
198 |                  | t <- map LocalVar [0..bvars - 1] ++
199 |                         if isClosed f then [f] else []]
200 | 
201 | -- | The reflexive transitive closure of @simplify@
202 | repeatedlySimplify :: S.Set Constraint -> UnifyM (S.Set Constraint)
203 | repeatedlySimplify cs = do
204 |   cs' <- fold <$> traverse simplify (S.toList cs)
205 |   if cs' == cs then return cs else repeatedlySimplify cs'
206 | 
207 | manySubst :: Subst -> Term -> Term
208 | manySubst s t = M.foldrWithKey (\mv sol t -> substMV sol mv t) t s
209 | 
210 | (<+>) :: Subst -> Subst -> Subst
211 | s1 <+> s2 | not (M.null (M.intersection s1 s2)) = error "Impossible"
212 | s1 <+> s2 = M.union (manySubst s1 <$> s2) s1
213 | 
214 | -- | The top level function, given a substitution and a set of
215 | -- constraints, produce a solution substution and the resulting set of
216 | -- flex-flex equations.
217 | unify :: Subst -> S.Set Constraint -> UnifyM (Subst, S.Set Constraint)
218 | unify s cs = do
219 |   let cs' = applySubst s cs
220 |   cs'' <- repeatedlySimplify cs'
221 |   let (flexflexes, flexrigids) = S.partition flexflex cs''
222 |   if S.null flexrigids
223 |     then return (s, flexflexes)
224 |     else do
225 |       let psubsts = tryFlexRigid (S.findMax flexrigids)
226 |       trySubsts psubsts (flexrigids <> flexflexes)
227 |   where applySubst s = S.map (\(t1, t2) -> (manySubst s t1, manySubst s t2))
228 |         flexflex (t1, t2) = isStuck t1 && isStuck t2
229 |         trySubsts [] cs = mzero
230 |         trySubsts (mss : psubsts) cs = do
231 |           ss <- mss
232 |           let these = foldr interleave mzero [unify (newS <+> s) cs | newS <- ss]
233 |           let those = trySubsts psubsts cs
234 |           these `interleave` those
235 | 
236 | runUnifyM :: UnifyM a -> [a]
237 | runUnifyM = runGenFrom 100 . observeAllT
238 | 
239 | -- | Solve a constraint and return the remaining flex-flex constraints
240 | -- and the substitution for it.
241 | driver :: Constraint -> Maybe (Subst, S.Set Constraint)
242 | driver = listToMaybe . runUnifyM . unify M.empty . S.singleton
243 | 


--------------------------------------------------------------------------------
/stack.yaml:
--------------------------------------------------------------------------------
1 | flags: {}
2 | packages:
3 | - '.'
4 | extra-deps: [monad-gen-0.3.0.1]
5 | resolver: lts-9.0
6 | 


--------------------------------------------------------------------------------