A Literate Programming Solution to the Potter Kata (http://codingdojo.org/cgi-bin/wiki.pl?KataPotter)
Table of Contents

- 1 Introduction
- 2 Problem Description
- 3 Procedure
  - 3.1 Find all partitions and then calculate their costs
  - 3.2 Find minimum cost partition directly via dynamic programming

1 Introduction
This is a very special document in that it not only explains the inner workings of this software system but also contains the complete source code of its implementation. Such a document is called a Literate Program[1] after the software development paradigm proposed by Donald Knuth in the late 1970s and first implemented in his 1981 WEB system.

Unlike WEB and its later offspring CWEB, which were each limited to a single programming language (Pascal and C respectively), this Literate Program has been written using Emacs' Org-Mode[2], allowing us to freely intermix any number of programming languages to create our final product. The main programming language we will use here is Clojure[3], a modern dialect of Lisp that targets the Java Virtual Machine (JVM)[4]. However, if we needed helper programs in other languages (e.g., Bash, C++, Python), their code could also be freely intermixed in this document and automatically extracted later for compilation and execution.

In addition to reading this document, its Org source file (potter.org) may be opened in Emacs and manipulated in three ways:

- Tangle: Typing `M-x org-babel-tangle' will cause Emacs to extract all the source code blocks within potter.org into separate files and rearrange them into valid compilation order so that they may be compiled into an executable application.
- Weave: Typing `M-x org-export-as-pdf' or `M-x org-export-as-html' will cause Emacs to generate an attractively typeset version of potter.org as either a PDF file (potter.pdf) or an HTML webpage (potter.html) respectively. This is likely the way in which the manual you are currently reading was created.
- Evaluate: If potter.org is opened in Emacs' Org major mode (`M-x org-mode'), and Emacs is connected to an external Clojure process[5], the code blocks within potter.org may be loaded and executed individually by the connected Clojure server using `M-x org-babel-execute-src-block' (typically bound to `C-c C-c').[6]

See comments in the header section of potter.org for the specific order in which the above Emacs commands should be executed.

2 Problem Description
Once upon a time there was a series of 5 books about a very English hero called Harry. (At least when this Kata was invented, there were only 5. Since then they have multiplied.) Children all over the world thought he was fantastic, and, of course, so did the publisher. So in a gesture of immense generosity to mankind (and to increase sales), they set up the following pricing model to take advantage of Harry's magical powers.

One copy of any of the five books costs 8 EUR. If, however, you buy two different books from the series, you get a 5% discount on those two books. If you buy 3 different books, you get a 10% discount. With 4 different books, you get a 20% discount. If you go the whole hog, and buy all 5, you get a huge 25% discount.

Note that if you buy, say, four books, of which 3 are different titles, you get a 10% discount on the 3 that form part of a set, but the fourth book still costs 8 EUR.
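
As a quick check of the rule just stated, the price of that four-book purchase can be worked out by hand. The figures below follow directly from the discounts listed above; nothing else is assumed.

```latex
\[
\underbrace{3 \times 8 \times (1 - 0.10)}_{\text{set of 3 different titles}}
\;+\;
\underbrace{8}_{\text{fourth book}}
\;=\; 21.60 + 8.00 \;=\; 29.60\ \text{EUR}
\]
```

Without any discount the same four books would cost 32.00 EUR.
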
Potter mania is sweeping the country and parents of teenagers everywhere are queueing up with shopping baskets overflowing with Potter books. Your mission is to write a piece of code to calculate the price of any conceivable shopping basket, giving as big a discount as possible.

3 Procedure
Given a shopping basket specification [1 1 3 2 1 5 3 4] (i.e., a vector of the books present in the basket by their number in the series), our goal is to find the partition of the basket's contents that minimizes the total cost of purchasing the books in the basket.

3.1 Find all partitions and then calculate their costs

One approach that we could take to solve this problem is as follows:

1. Find all partitions of the shopping basket contents.
2. Calculate the cost of the shopping basket contents using each partition.
3. Select the minimum cost partition.

    (def find-all-basket-partitions find-all-basket-partitions-via-tree-traversal)

    (defn find-minimum-cost-partition-naive [shopping-basket-books]
      (let [all-partitions (find-all-basket-partitions shopping-basket-books)
            all-costs      (map calculate-partition-cost all-partitions)]
        ;; Returns a map entry of the form [partition cost].
        (apply min-key val (zipmap all-partitions all-costs))))

3.1.1 Find all basket partitions via power sets
A partition \(P\) of a set \(S\) is a set of subsets of \(S\) for which the following three conditions hold:

1. \(P\) does not contain the empty set \(\emptyset\) (i.e., \(\emptyset \notin P\)).
2. The union of the elements of \(P\) is equal to \(S\).
3. The intersection of any two distinct elements of \(P\) is the empty set \(\emptyset\).

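    ;; The code blocks in this section rely on these names being referred into
    ;; the namespace; the header of potter.org may already take care of this.
    (require '[clojure.set :refer [union intersection]]
             '[clojure.math.combinatorics :refer [combinations subsets]])

    (defn partition? [P S]
      (and (not (contains? P #{}))
           (= (apply union P) S)
           (every? #(= (intersection (first %) (second %)) #{}) (combinations P 2))))
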
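
To see the three conditions at work on concrete inputs, here is a minimal sketch that needs only clojure.set. The name partition-sketch? is hypothetical and not part of potter.org. It also swaps in a different disjointness test: when the blocks already cover \(S\), they are pairwise disjoint exactly when their sizes add up to \(|S|\).

```clojure
(require '[clojure.set :as set])

;; Hypothetical variant of partition?, written for illustration only.
(defn partition-sketch? [P S]
  (and (not (contains? P #{}))                    ; condition 1: no empty block
       (= (apply set/union P) S)                  ; condition 2: blocks cover S
       (= (reduce + (map count P)) (count S))))   ; condition 3: no element counted twice

(partition-sketch? #{#{1 2} #{3}}   #{1 2 3})   ; true
(partition-sketch? #{#{1 2} #{2 3}} #{1 2 3})   ; false, 2 appears in two blocks
(partition-sketch? #{#{1} #{3}}     #{1 2 3})   ; false, 2 is not covered
```
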
The set of all subsets of a set \(S\) (including the empty set \(\emptyset\) and \(S\) itself) is called the power set of \(S\). The number of elements in the power set of \(S\) is equal to \(2^{|S|}\), where \(|S|\) is the number of elements in \(S\).

    (defn find-power-set [S]
      (set (map set (subsets S))))

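
The \(2^{|S|}\) count can be checked without the combinatorics library. The following is a sketch only; power-set-sketch is a hypothetical helper that is not part of potter.org, and it builds the power set by doubling the collection of subsets once per element.

```clojure
;; Hypothetical helper, written for illustration only.
(defn power-set-sketch [S]
  (reduce (fn [subsets-so-far x]
            ;; every existing subset appears once without x and once with x
            (into subsets-so-far (map #(conj % x) subsets-so-far)))
          #{#{}}
          S))

(power-set-sketch #{1 2})             ; #{#{} #{1} #{2} #{1 2}}
(count (power-set-sketch #{1 2 3}))   ; 8, i.e. 2^3
```
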
To find all partitions of the set \(S\), we could naively find all subsets of its power set that satisfy the partition? predicate given above.

    (defn find-all-partitions [S]
      (filter #(partition? % S) (subsets (find-power-set S))))

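
It helps to put numbers on how many candidates this filter has to inspect. The power set has \(2^{|S|}\) elements, so it has \(2^{2^{|S|}}\) subsets, while the number of true partitions of an \(n\)-element set is the Bell number \(B_n\). The figures below are standard values and are not computed in potter.org.

```latex
\[
\begin{array}{c|c|c}
|S| & \text{candidates } 2^{2^{|S|}} & \text{partitions } B_{|S|} \\ \hline
3 & 2^{8} = 256 & 5 \\
8 & 2^{256} \approx 1.16 \times 10^{77} & 4140
\end{array}
\]
```

The second row corresponds to the eight-book example basket, which is already far out of reach for this formulation.
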
Of course, since we are working with sets and no redundant elements are allowed within sets, we must begin our analysis by mapping the input shopping basket to a set of distinct elements. We do this by representing each book in the basket by its index in the input vector. Finally, once we have found all partitions of the index set, we translate the returned indices back to their book numbers.

    (defn find-all-basket-partitions-via-power-sets [shopping-basket-books]
      (let [S (set (range (count shopping-basket-books)))]
        (for [P (find-all-partitions S)]
          (for [subset P]
            (map shopping-basket-books subset)))))

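
The translation from indices back to book numbers works because a Clojure vector can be called as a function of its indices, which is also why the basket must be passed as a vector. A minimal sketch, assuming one hand-picked partition of the index set; index-partition and indices->books are hypothetical names used only here.

```clojure
(def basket [1 1 3 2 1 5 3 4])

;; One hypothetical partition of the index set #{0 1 ... 7}.
(def index-partition #{#{0 2 3} #{1 6} #{4 5 7}})

;; (basket 2) returns 3, the book stored at index 2.
(defn indices->books [basket index-partition]
  (for [block index-partition]
    (sort (map basket block))))

(indices->books basket index-partition)
;; yields the groups (1 2 3), (1 3) and (1 4 5), in unspecified order
```
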
3.1.2 Find all basket partitions via tree traversal
Although mathematically correct, our first formulation is extremely computationally inefficient and will scale poorly as the size of \(S\) grows. Recall that our goal is to maximize the discount available to the shopper, and since no discounts are applied for groups of fewer than two books, we can exclude all such sets within the power set.

    (defn find-discounted-subsets [S]
      (remove #(< (count %) 2) (subsets S)))

    ;; Equivalent formulation: enumerate the subsets of sizes 2 through 5 directly.
    (defn find-discounted-subsets-alternate [S]
      (mapcat #(combinations S %) (range 2 6)))

Since we are working with sets and no redundant elements are allowed within a set, we must begin our analysis by mapping the shopping basket contents to a set of distinct elements. For our second attempt, we do this by creating a map of distinct books (by their number in the series) to the number of times each appears in the basket.
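
For the example basket, this map can be produced with the core function frequencies. The var name book-freqs is chosen here for illustration only.

```clojure
(def book-freqs (frequencies [1 1 3 2 1 5 3 4]))

;; Book 1 appears three times, book 3 twice, and the rest once each.
(= book-freqs {1 3, 2 1, 3 2, 4 1, 5 1})   ; true

;; The keys of the map are the distinct books still in the basket.
(sort (keys book-freqs))                   ; (1 2 3 4 5)
```
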

We can then envision a basket partitioning procedure that proceeds by iteratively selecting one of the discounted subsets of the distinct books remaining in the basket until the basket is either empty or only contains books which cannot be grouped into a discounted subset. These remaining books are then grouped together to form the final subset of the partition.

In order to explore all such possible partitions, we construct a tree whose nodes are pairs of (book-freqs-in-basket, selected-book-groups). Successor nodes are constructed by selecting all discounted subsets of the parent node's book-freqs-in-basket and, when none remain, simply grouping together any books still in book-freqs-in-basket as the final undiscounted subset. In such a tree, each path from the root node (i.e., the initial shopping basket contents) to a leaf node (i.e., one whose book-freqs-in-basket value is nil) represents a partition of the basket. Each leaf node's selected-book-groups field will contain a complete partition of the shopping basket contents. To find all partitions, we simply traverse this tree and return the selected-book-groups field on each leaf node.

    (defstruct node :book-freqs-in-basket :selected-book-groups)

    (defn remove-from-basket [book-freqs subset]
      (into {} (remove #(zero? (val %)) (reduce #(update-in %1 [%2] dec) book-freqs subset))))

    (defn expand-book-freqs [book-freqs]
      (mapcat (fn [[book-id frequency]] (repeat frequency book-id)) book-freqs))

    (defn successors [{:keys [book-freqs-in-basket selected-book-groups]}]
      (let [distinct-books (keys book-freqs-in-basket)]
        (if-let [discounted-book-groups (seq (find-discounted-subsets distinct-books))]
          (for [books discounted-book-groups]
            (struct-map node
                        :book-freqs-in-basket (remove-from-basket book-freqs-in-basket books)
                        :selected-book-groups (cons books selected-book-groups)))
          (let [undiscounted-book-group (expand-book-freqs book-freqs-in-basket)]
            (list (struct-map node
                              :book-freqs-in-basket nil
                              :selected-book-groups (if (seq undiscounted-book-group)
                                                      (cons undiscounted-book-group selected-book-groups)
                                                      selected-book-groups)))))))

    (defn leaf-node? [node]
      (nil? (:book-freqs-in-basket node)))

    (defn find-next-partition [[open-list partition]]
      (if-let [node (first open-list)]
        (if (leaf-node? node)
          [(rest open-list) (:selected-book-groups node)]
          (recur [(concat (successors node) (rest open-list)) nil]))))

    (defn find-all-basket-partitions-via-tree-traversal [shopping-basket-books]
      (let [root-node (struct-map node
                                  :book-freqs-in-basket (frequencies shopping-basket-books)
                                  :selected-book-groups ())]
        (->> [(list root-node) nil]
             (iterate find-next-partition)
             rest
             (take-while seq)
             (map second))))

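
A single step of this traversal can be traced by hand with the two basket helpers. They are repeated verbatim below only so that the snippet runs on its own; the input maps are made-up examples.

```clojure
(defn remove-from-basket [book-freqs subset]
  (into {} (remove #(zero? (val %)) (reduce #(update-in %1 [%2] dec) book-freqs subset))))

(defn expand-book-freqs [book-freqs]
  (mapcat (fn [[book-id frequency]] (repeat frequency book-id)) book-freqs))

;; Selecting the group [1 2] from a basket holding three copies of book 1,
;; one copy of book 2 and two copies of book 3 removes book 2 entirely:
(remove-from-basket {1 3, 2 1, 3 2} [1 2])   ; {1 2, 3 2}

;; Once only one title is left, its copies form the final undiscounted group:
(expand-book-freqs {1 2})                    ; (1 1)
```
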
3.1.3 Calculate partition cost
The cost of a partition is simply calculated as the sum of the costs of its bins.

    (defn calculate-partition-cost [partition]
      (reduce + (map calculate-bin-cost partition)))

To calculate the cost of a bin, we first determine the bin discount, which is a function of the number of distinct books in the bin as described in Section 2 (Problem Description).

    (defn get-bin-discount [bin]
      (case (count (distinct bin))
        2 0.05
        3 0.10
        4 0.20
        5 0.25
        0.0))

We then multiply the number of books in the bin by the base book price (given as 8 euros in the problem statement) and apply the bin discount to the result.

    (def base-book-price 8.00)

    (defn calculate-bin-cost [bin]
      (* base-book-price (count bin) (- 1.0 (get-bin-discount bin))))

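
A worked comparison shows why all partitions have to be costed rather than always taking the largest set first. The basket [1 1 2 2 3 3 4 5] used below is an illustrative assumption and is not the example basket from Section 3.

```latex
\begin{align*}
\text{cost}\big([1\,2\,3\,4\,5],\,[1\,2\,3]\big)
  &= 8 \cdot 5 \cdot (1 - 0.25) + 8 \cdot 3 \cdot (1 - 0.10)
   = 30.00 + 21.60 = 51.60 \\
\text{cost}\big([1\,2\,3\,4],\,[1\,2\,3\,5]\big)
  &= 2 \cdot 8 \cdot 4 \cdot (1 - 0.20)
   = 2 \cdot 25.60 = 51.20
\end{align*}
```

Two sets of four are cheaper than a set of five plus a set of three, even though the latter contains the single largest discount.
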
3.2 Find minimum cost partition directly via dynamic programming
The tree traversal approach described in Section 3.1.2 (Find all basket partitions via tree traversal) does successfully return all partitions of the shopping basket contents. However, if order is disregarded, many of the returned partitions end up being redundant. As this translates into wasted computation, we would like to find an even more efficient partitioning scheme that eliminates redundant entries.

The approach we will try this time is called dynamic programming. Under this scheme, the minimum cost partition of the shopping basket contents will be defined recursively as the partition which minimizes the sum of the first selected book group's cost and the minimum partition cost of the remaining shopping basket contents.

Ultimately, this algorithm will also perform what is essentially a depth-first tree search on the states of the shopping basket's contents after each successive book group selection. This means we will be searching the same state space as we did in the tree traversal approach from the previous section.

However, what is unique about the dynamic programming methodology is that we can avoid redundant searches through the state space by memoizing the minimum cost partition at each stage of our tree traversal in terms of the remaining shopping basket contents. Since we will be representing what is in the basket as a frequency table, the order in which we select book groups from the basket will not affect the number of memoized states.

For readability, we simply recalculate the partition cost at each unmemoized step of the tree traversal. If we found this to be a major efficiency problem in our final application, we could calculate the bin cost of each newly selected book group and add that to the minimum partition cost of the remaining shopping basket contents at each step. We leave this as an exercise for the reader.

    (defn find-minimum-cost-partition-aux [book-freqs-in-basket]
      (if (seq book-freqs-in-basket)
        (let [distinct-books (keys book-freqs-in-basket)]
          (if-let [discounted-book-groups (seq (find-discounted-subsets distinct-books))]
            (apply min-key calculate-partition-cost
                   (for [books discounted-book-groups]
                     (cons books (find-minimum-cost-partition-aux (remove-from-basket book-freqs-in-basket books)))))
            (let [undiscounted-book-group (expand-book-freqs book-freqs-in-basket)]
              (list undiscounted-book-group))))))
    ;; Rebinding the var makes the recursive calls above go through the memoized version.
    (def find-minimum-cost-partition-aux (memoize find-minimum-cost-partition-aux))

    (defn find-minimum-cost-partition-via-dynamic-programming [shopping-basket-books]
      (let [minimum-cost-partition (find-minimum-cost-partition-aux (frequencies shopping-basket-books))]
        [minimum-cost-partition (calculate-partition-cost minimum-cost-partition)]))

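
One possible shape for the exercise mentioned above is sketched here: the recursion returns only the minimum cost, adding each newly selected group's cost to the memoized minimum cost of the remaining basket. This is a sketch under stated assumptions, not the author's solution. All names are hypothetical, prices are kept in integer cents to avoid floating-point noise, and subsets are built with clojure.core alone so the block runs without the combinatorics library.

```clojure
;; Discount in percent for a group of n distinct titles (from Section 2).
(def discount-percent {1 0, 2 5, 3 10, 4 20, 5 25})

(defn group-cost-cents
  "Cost in cents of a group of n distinct titles at 800 cents each."
  [n]
  (* 8 n (- 100 (discount-percent n))))

(defn subsets-of-size-2-plus
  "All subsets of the distinct items xs that contain at least two elements."
  [xs]
  (->> xs
       (reduce (fn [acc x] (into acc (map #(conj % x) acc))) [#{}])
       (filter #(>= (count %) 2))))

(defn remove-group
  "Decrement the count of every title in group, dropping titles that reach zero."
  [freqs group]
  (reduce (fn [m book]
            (if (= 1 (m book)) (dissoc m book) (update m book dec)))
          freqs
          group))

(declare min-cost-cents)

(defn- min-cost-cents* [freqs]
  (let [groups (subsets-of-size-2-plus (keys freqs))]
    (if (seq groups)
      (apply min (for [g groups]
                   (+ (group-cost-cents (count g))
                      (min-cost-cents (remove-group freqs g)))))
      ;; Zero or one distinct title left: no discount applies.
      (* 800 (reduce + (vals freqs))))))

;; Memoized on the frequency map, so each basket state is solved once.
(def min-cost-cents (memoize min-cost-cents*))

(min-cost-cents (frequencies [1 1 2 2 3 3 4 5]))   ; 5120, i.e. 51.20 EUR
```

Because only a number is carried back up the recursion, this variant cannot report which partition achieved the minimum; recovering it would require returning the chosen groups alongside the cost.
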
Footnotes:
[1] See http://en.wikipedia.org/wiki/Literate_programming for more information.

[2] http://orgmode.org

[3] http://clojure.org

[4] See http://en.wikipedia.org/wiki/Java_virtual_machine for more information.

[5] Connecting to an external Clojure process is beyond the scope of this document but requires setting up either SLIME + Swank-Clojure and typing `M-x clojure-jack-in' or nrepl.el + NREPL and typing `M-x nrepl-jack-in'.

[6] See http://orgmode.org/manual/Evaluating-code-blocks.html for more information.