├── .gitignore ├── java └── bench │ ├── core.class │ ├── core$LLNode.class │ ├── core$BinaryTreeNode.class │ ├── LLNode.java │ ├── BinaryTreeNode.java │ └── core.java ├── doc └── intro.md ├── test └── performancepaper │ └── core_test.clj ├── project.clj ├── CHANGELOG.md ├── LICENSE ├── src └── performancepaper │ └── core.clj └── README.org /.gitignore: -------------------------------------------------------------------------------- 1 | target/ 2 | /.project 3 | /.settings/ 4 | /.classpath 5 | *.clj~ 6 | checkouts/ -------------------------------------------------------------------------------- /java/bench/core.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joinr/performancepaper/HEAD/java/bench/core.class -------------------------------------------------------------------------------- /java/bench/core$LLNode.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joinr/performancepaper/HEAD/java/bench/core$LLNode.class -------------------------------------------------------------------------------- /java/bench/core$BinaryTreeNode.class: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joinr/performancepaper/HEAD/java/bench/core$BinaryTreeNode.class -------------------------------------------------------------------------------- /doc/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction to performancepaper 2 | 3 | TODO: write [great documentation](http://jacobian.org/writing/what-to-write/) 4 | -------------------------------------------------------------------------------- /java/bench/LLNode.java: -------------------------------------------------------------------------------- 1 | package bench; 2 | 3 | public class LLNode{ 4 | public LLNode next ; 5 | public LLNode (LLNode next){ 6 | this.next = next ;} 7 
| } 8 | -------------------------------------------------------------------------------- /test/performancepaper/core_test.clj: -------------------------------------------------------------------------------- 1 | (ns performancepaper.core-test 2 | (:require [clojure.test :refer :all] 3 | [performancepaper.core :refer :all])) 4 | 5 | (deftest a-test 6 | (testing "FIXME, I fail." 7 | (is (= 0 1)))) 8 | -------------------------------------------------------------------------------- /java/bench/BinaryTreeNode.java: -------------------------------------------------------------------------------- 1 | package bench; 2 | 3 | //BinaryTreeNode not provided by the author.... 4 | //joinr's interpretation of BinaryTreeNode 5 | public class BinaryTreeNode{ 6 | public int value; 7 | public BinaryTreeNode left; 8 | public BinaryTreeNode right; 9 | public BinaryTreeNode(int v){ 10 | value = v; 11 | } 12 | } 13 | -------------------------------------------------------------------------------- /project.clj: -------------------------------------------------------------------------------- 1 | (defproject performancepaper "0.1.0-SNAPSHOT" 2 | :description "FIXME: write description" 3 | :url "http://example.com/FIXME" 4 | :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0" 5 | :url "https://www.eclipse.org/legal/epl-2.0/"} 6 | :dependencies [[org.clojure/clojure "1.10.1"]] 7 | :java-source-paths ["java"]) 8 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Change Log 2 | All notable changes to this project will be documented in this file. This change log follows the conventions of [keepachangelog.com](http://keepachangelog.com/). 3 | 4 | ## [Unreleased] 5 | ### Changed 6 | - Add a new arity to `make-widget-async` to provide a different widget shape. 
7 | 8 | ## [0.1.1] - 2020-09-07 9 | ### Changed 10 | - Documentation on how to make the widgets. 11 | 12 | ### Removed 13 | - `make-widget-sync` - we're all async, all the time. 14 | 15 | ### Fixed 16 | - Fixed widget maker to keep working when daylight savings switches over. 17 | 18 | ## 0.1.0 - 2020-09-07 19 | ### Added 20 | - Files from the new template. 21 | - Widget maker public API - `make-widget-sync`. 22 | 23 | [Unreleased]: https://github.com/your-name/performancepaper/compare/0.1.1...HEAD 24 | [0.1.1]: https://github.com/your-name/performancepaper/compare/0.1.0...0.1.1 25 | -------------------------------------------------------------------------------- /java/bench/core.java: -------------------------------------------------------------------------------- 1 | package bench; 2 | 3 | import java.util.HashMap; 4 | import java.util.Queue; 5 | import java.util.ArrayList; 6 | import java.util.Arrays; 7 | import java.util.LinkedList; 8 | import bench.LLNode; 9 | import bench.BinaryTreeNode; 10 | 11 | public class core { 12 | // The idea behind this approach was that seeing the way the languages perform in 13 | // common fundamental tasks would give the reader an idea of how the languages will 14 | // perform in their application. The reason that the fundamental areas selected were 15 | // separated into their own experiments rather than putting them all into the same 16 | // program, was so that the reader could more easily predict which language is 17 | // better for their specific tasks. 18 | 19 | //The Java version that was used to execute both the Clojure and the Java 20 | //code was 1.8.0_60. The JVM was run with the arguments -Xmx11g and -Xss11g to 21 | //increase the max heap and stack space to 11 gigabytes when needed for the 22 | //experiments. 23 | 24 | // 3.1 25 | 26 | // The recursion experiment consisted of a number of recursion calls with only 27 | // a counter as a parameter and a simple exit condition.
It was designed to test 28 | // the performance of function calls in the two languages. The counter was a 29 | // primitive integer in both languages and was decreased by one for each 30 | // recursive call. 31 | 32 | //Execution times were measured for problem sizes of 2000, 20000, 200000, 2000000 33 | //and 20000000, 34 | 35 | //3.1 Pure Recursion 36 | 37 | public static void Recurse(int cnt) 38 | {if (cnt > 0) 39 | Recurse (cnt - 1);} 40 | 41 | 42 | // (defn pure-recursion [cnt] 43 | // (if (> cnt 0) 44 | // (pure-recursion 45 | // (- cnt 1)))) 46 | 47 | 48 | //3.2 49 | 50 | // The sorting experiment consisted of sorting a collection of integers. In Clojure 51 | // this was done by sorting a list of integers, shuffled by the shuffle function, 52 | // using the sort function, all of which are included in the clojure.core library. In 53 | // Java this was done similarly by sorting an array of primitive integers, which 54 | // was shuffled using java.util.Collections.shuffle, using the Arrays.sort function. 55 | 56 | //Execution times were measured for collections with 2000, 20000, 200000, 57 | //2000000 and 20000000 integers. 58 | 59 | public static int[] createArray (int size) 60 | {int counter = Integer.MIN_VALUE; 61 | ArrayList<Integer> arrList = new ArrayList<Integer>(size) ; 62 | for(int i = 0; i < size ; ++ i) 63 | arrList.add (counter ++); 64 | java.util.Collections.shuffle(arrList); 65 | int[] retArr = new int[size] ; 66 | for(int i = 0; i < size ; ++ i ) 67 | retArr [i] = arrList.get(i); 68 | Arrays.sort(retArr); 69 | return retArr;} 70 | 71 | // (let [list (-> (create-list size (atom Integer/MIN_VALUE)) 72 | // (shuffle))] 73 | // ...) //author elides this, and `create-list` is not provided. 74 | 75 | // (sort list) 76 | 77 | //3.3 Map Creation 78 | 79 | // The map creation experiment consisted of adding integers as keys and values to a 80 | // map.
In Java they were added to a HashMap from the java.util library, and in Clojure 81 | // they were added to the built-in persistent map data structure. 82 | 83 | //Execution times were measured for 20000, 63246, 200000, 632456 and 2000000 84 | //different key-value pairs. 85 | 86 | public static HashMap<Integer, Integer> createMap (int sze) 87 | {HashMap<Integer, Integer> retMap = new HashMap<Integer, Integer>(sze) ; 88 | for (int i = 0; i < sze ;) 89 | retMap.put(i , ++ i ) ; 90 | return retMap ;} 91 | 92 | // (defn create-map[size] 93 | // (loop [map (transient {}), 94 | // i (int size)] 95 | // (if (> i 0) 96 | // (recur (assoc! map i (+ i 1)) (- i 1) ) 97 | // (persistent! map)))) 98 | 99 | //3.4 Object Creation 100 | 101 | // The object creation experiment consisted of creating a linked list without 102 | // values. In Java a custom class was used to create the links while in Clojure 103 | // nested persistent maps were used. The links were created backwards in both 104 | // languages, meaning that the first object created would have a next-pointer with 105 | // a null value, and the second object created would point to the first, and so on. 106 | 107 | // Execution times were measured for 100000, 316228, 1000000, 3162278 and 10000000 108 | // linked objects. 109 | 110 | // 111 | public static LLNode createObjects (int count ) 112 | {LLNode last = null ; 113 | for (int i = 0; i < count; ++ i) 114 | last = new LLNode(last) ; 115 | return last;} 116 | 117 | // (defn create-objects [count] 118 | // (loop [last nil 119 | // i (int count)] 120 | // (if (= 0 i ) 121 | // last 122 | // (recur {:next last} (- i 1))))) 123 | 124 | //3.5 Binary Tree DFS 125 | 126 | // The binary tree DFS experiment consisted of searching a binary tree for a 127 | // value it did not contain using depth first search. The depth first search was 128 | // implemented recursively in both languages. In Java the binary tree was 129 | // represented by a custom class while in Clojure they were represented using nested 130 | // persistent maps.
131 | 132 | public static BinaryTreeNode createBinaryTree (int depth, int[] counter) 133 | {if (depth == 0) return null; 134 | int value = counter[0]++; 135 | BinaryTreeNode btn = new BinaryTreeNode(value); 136 | btn.left = createBinaryTree(depth - 1, counter); 137 | btn.right = createBinaryTree(depth - 1 , counter); 138 | return btn ;} 139 | 140 | public static boolean binaryTreeDFS(BinaryTreeNode root, int target) 141 | {if (root == null) return false ; 142 | return root.value == target || 143 | binaryTreeDFS(root.left, target) || 144 | binaryTreeDFS (root.right, target);} 145 | 146 | public static boolean binaryTreeDFSTest(int depth, int target) 147 | { 148 | int[] counter = new int[1]; 149 | counter[0] = 0; 150 | return binaryTreeDFS(createBinaryTree(depth,counter),target); 151 | } 152 | 153 | // (defn create-binary-tree [depth counter-atom] 154 | // (when (> depth 0) 155 | // (let [val @counter-atom] 156 | // (swap! counter-atom inc ) 157 | // {:value val 158 | // :left (create-binary-tree (- depth 1) counter-atom ) 159 | // :right (create-binary-tree (- depth 1) counter-atom )}))) 160 | 161 | // (defn binary-tree-DFS [root target] 162 | // (if (nil? root) 163 | // false 164 | // (or (= (:value root) target) 165 | // (binary-tree-DFS (:left root) target) 166 | // (binary-tree-DFS (:right root) target)))) 167 | 168 | //3.6 Binary Tree BFS 169 | 170 | //The binary tree BFS, similar to the binary tree DFS experiment, consisted 171 | //of searching a binary tree for a value it did not contain, but using breadth 172 | //first search. The breadth first search was implemented iteratively in both 173 | //languages. In Java the binary tree was represented by a custom class while in 174 | //Clojure they were represented using nested persistent maps. 175 | 176 | public static boolean binaryTreeBFS(BinaryTreeNode root, int target) 177 | {Queue<BinaryTreeNode> queue = new LinkedList<BinaryTreeNode>() ; 178 | queue.add(root) ; 179 | while (!
queue.isEmpty()) 180 | {BinaryTreeNode item = queue.poll(); 181 | if (item.value == target) return true; 182 | if (item.left != null) queue.add (item.left); 183 | if (item.right != null) queue.add (item.right);} 184 | return false;} 185 | 186 | public static boolean binaryTreeBFSTest(int depth, int target) 187 | { 188 | int[] counter = new int[1]; 189 | counter[0] = 0; 190 | return binaryTreeBFS(createBinaryTree(depth,counter),target); 191 | } 192 | 193 | // (defn binary-tree-BFS [root target] 194 | // (loop [queue (conj clojure.lang.PersistentQueue/EMPTY root)] 195 | // (if (empty? queue) 196 | // false 197 | // (let [item (peek queue)] 198 | // (if (= target (:value item)) 199 | // true 200 | // (recur (as-> (pop queue) $ 201 | // (if (nil? (:left item)) 202 | // $ 203 | // (conj $ (:left item))) 204 | // (if (nil? (:right item)) 205 | // $ 206 | // (conj $ (:right item )))))))))) 207 | } 208 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Eclipse Public License - v 2.0 2 | 3 | THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE 4 | PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION 5 | OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT. 6 | 7 | 1. DEFINITIONS 8 | 9 | "Contribution" means: 10 | 11 | a) in the case of the initial Contributor, the initial content 12 | Distributed under this Agreement, and 13 | 14 | b) in the case of each subsequent Contributor: 15 | i) changes to the Program, and 16 | ii) additions to the Program; 17 | where such changes and/or additions to the Program originate from 18 | and are Distributed by that particular Contributor. A Contribution 19 | "originates" from a Contributor if it was added to the Program by 20 | such Contributor itself or anyone acting on such Contributor's behalf. 
21 | Contributions do not include changes or additions to the Program that 22 | are not Modified Works. 23 | 24 | "Contributor" means any person or entity that Distributes the Program. 25 | 26 | "Licensed Patents" mean patent claims licensable by a Contributor which 27 | are necessarily infringed by the use or sale of its Contribution alone 28 | or when combined with the Program. 29 | 30 | "Program" means the Contributions Distributed in accordance with this 31 | Agreement. 32 | 33 | "Recipient" means anyone who receives the Program under this Agreement 34 | or any Secondary License (as applicable), including Contributors. 35 | 36 | "Derivative Works" shall mean any work, whether in Source Code or other 37 | form, that is based on (or derived from) the Program and for which the 38 | editorial revisions, annotations, elaborations, or other modifications 39 | represent, as a whole, an original work of authorship. 40 | 41 | "Modified Works" shall mean any work in Source Code or other form that 42 | results from an addition to, deletion from, or modification of the 43 | contents of the Program, including, for purposes of clarity any new file 44 | in Source Code form that contains any contents of the Program. Modified 45 | Works shall not include works that contain only declarations, 46 | interfaces, types, classes, structures, or files of the Program solely 47 | in each case in order to link to, bind by name, or subclass the Program 48 | or Modified Works thereof. 49 | 50 | "Distribute" means the acts of a) distributing or b) making available 51 | in any manner that enables the transfer of a copy. 52 | 53 | "Source Code" means the form of a Program preferred for making 54 | modifications, including but not limited to software source code, 55 | documentation source, and configuration files. 
56 | 57 | "Secondary License" means either the GNU General Public License, 58 | Version 2.0, or any later versions of that license, including any 59 | exceptions or additional permissions as identified by the initial 60 | Contributor. 61 | 62 | 2. GRANT OF RIGHTS 63 | 64 | a) Subject to the terms of this Agreement, each Contributor hereby 65 | grants Recipient a non-exclusive, worldwide, royalty-free copyright 66 | license to reproduce, prepare Derivative Works of, publicly display, 67 | publicly perform, Distribute and sublicense the Contribution of such 68 | Contributor, if any, and such Derivative Works. 69 | 70 | b) Subject to the terms of this Agreement, each Contributor hereby 71 | grants Recipient a non-exclusive, worldwide, royalty-free patent 72 | license under Licensed Patents to make, use, sell, offer to sell, 73 | import and otherwise transfer the Contribution of such Contributor, 74 | if any, in Source Code or other form. This patent license shall 75 | apply to the combination of the Contribution and the Program if, at 76 | the time the Contribution is added by the Contributor, such addition 77 | of the Contribution causes such combination to be covered by the 78 | Licensed Patents. The patent license shall not apply to any other 79 | combinations which include the Contribution. No hardware per se is 80 | licensed hereunder. 81 | 82 | c) Recipient understands that although each Contributor grants the 83 | licenses to its Contributions set forth herein, no assurances are 84 | provided by any Contributor that the Program does not infringe the 85 | patent or other intellectual property rights of any other entity. 86 | Each Contributor disclaims any liability to Recipient for claims 87 | brought by any other entity based on infringement of intellectual 88 | property rights or otherwise. 
As a condition to exercising the 89 | rights and licenses granted hereunder, each Recipient hereby 90 | assumes sole responsibility to secure any other intellectual 91 | property rights needed, if any. For example, if a third party 92 | patent license is required to allow Recipient to Distribute the 93 | Program, it is Recipient's responsibility to acquire that license 94 | before distributing the Program. 95 | 96 | d) Each Contributor represents that to its knowledge it has 97 | sufficient copyright rights in its Contribution, if any, to grant 98 | the copyright license set forth in this Agreement. 99 | 100 | e) Notwithstanding the terms of any Secondary License, no 101 | Contributor makes additional grants to any Recipient (other than 102 | those set forth in this Agreement) as a result of such Recipient's 103 | receipt of the Program under the terms of a Secondary License 104 | (if permitted under the terms of Section 3). 105 | 106 | 3. REQUIREMENTS 107 | 108 | 3.1 If a Contributor Distributes the Program in any form, then: 109 | 110 | a) the Program must also be made available as Source Code, in 111 | accordance with section 3.2, and the Contributor must accompany 112 | the Program with a statement that the Source Code for the Program 113 | is available under this Agreement, and informs Recipients how to 114 | obtain it in a reasonable manner on or through a medium customarily 115 | used for software exchange; and 116 | 117 | b) the Contributor may Distribute the Program under a license 118 | different than this Agreement, provided that such license: 119 | i) effectively disclaims on behalf of all other Contributors all 120 | warranties and conditions, express and implied, including 121 | warranties or conditions of title and non-infringement, and 122 | implied warranties or conditions of merchantability and fitness 123 | for a particular purpose; 124 | 125 | ii) effectively excludes on behalf of all other Contributors all 126 | liability for damages, including 
direct, indirect, special, 127 | incidental and consequential damages, such as lost profits; 128 | 129 | iii) does not attempt to limit or alter the recipients' rights 130 | in the Source Code under section 3.2; and 131 | 132 | iv) requires any subsequent distribution of the Program by any 133 | party to be under a license that satisfies the requirements 134 | of this section 3. 135 | 136 | 3.2 When the Program is Distributed as Source Code: 137 | 138 | a) it must be made available under this Agreement, or if the 139 | Program (i) is combined with other material in a separate file or 140 | files made available under a Secondary License, and (ii) the initial 141 | Contributor attached to the Source Code the notice described in 142 | Exhibit A of this Agreement, then the Program may be made available 143 | under the terms of such Secondary Licenses, and 144 | 145 | b) a copy of this Agreement must be included with each copy of 146 | the Program. 147 | 148 | 3.3 Contributors may not remove or alter any copyright, patent, 149 | trademark, attribution notices, disclaimers of warranty, or limitations 150 | of liability ("notices") contained within the Program from any copy of 151 | the Program which they Distribute, provided that Contributors may add 152 | their own appropriate notices. 153 | 154 | 4. COMMERCIAL DISTRIBUTION 155 | 156 | Commercial distributors of software may accept certain responsibilities 157 | with respect to end users, business partners and the like. While this 158 | license is intended to facilitate the commercial use of the Program, 159 | the Contributor who includes the Program in a commercial product 160 | offering should do so in a manner which does not create potential 161 | liability for other Contributors. 
Therefore, if a Contributor includes 162 | the Program in a commercial product offering, such Contributor 163 | ("Commercial Contributor") hereby agrees to defend and indemnify every 164 | other Contributor ("Indemnified Contributor") against any losses, 165 | damages and costs (collectively "Losses") arising from claims, lawsuits 166 | and other legal actions brought by a third party against the Indemnified 167 | Contributor to the extent caused by the acts or omissions of such 168 | Commercial Contributor in connection with its distribution of the Program 169 | in a commercial product offering. The obligations in this section do not 170 | apply to any claims or Losses relating to any actual or alleged 171 | intellectual property infringement. In order to qualify, an Indemnified 172 | Contributor must: a) promptly notify the Commercial Contributor in 173 | writing of such claim, and b) allow the Commercial Contributor to control, 174 | and cooperate with the Commercial Contributor in, the defense and any 175 | related settlement negotiations. The Indemnified Contributor may 176 | participate in any such claim at its own expense. 177 | 178 | For example, a Contributor might include the Program in a commercial 179 | product offering, Product X. That Contributor is then a Commercial 180 | Contributor. If that Commercial Contributor then makes performance 181 | claims, or offers warranties related to Product X, those performance 182 | claims and warranties are such Commercial Contributor's responsibility 183 | alone. Under this section, the Commercial Contributor would have to 184 | defend claims against the other Contributors related to those performance 185 | claims and warranties, and if a court requires any other Contributor to 186 | pay any damages as a result, the Commercial Contributor must pay 187 | those damages. 188 | 189 | 5. 
NO WARRANTY 190 | 191 | EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, AND TO THE EXTENT 192 | PERMITTED BY APPLICABLE LAW, THE PROGRAM IS PROVIDED ON AN "AS IS" 193 | BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR 194 | IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF 195 | TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR 196 | PURPOSE. Each Recipient is solely responsible for determining the 197 | appropriateness of using and distributing the Program and assumes all 198 | risks associated with its exercise of rights under this Agreement, 199 | including but not limited to the risks and costs of program errors, 200 | compliance with applicable laws, damage to or loss of data, programs 201 | or equipment, and unavailability or interruption of operations. 202 | 203 | 6. DISCLAIMER OF LIABILITY 204 | 205 | EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, AND TO THE EXTENT 206 | PERMITTED BY APPLICABLE LAW, NEITHER RECIPIENT NOR ANY CONTRIBUTORS 207 | SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 208 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST 209 | PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 210 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 211 | ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE 212 | EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE 213 | POSSIBILITY OF SUCH DAMAGES. 214 | 215 | 7. GENERAL 216 | 217 | If any provision of this Agreement is invalid or unenforceable under 218 | applicable law, it shall not affect the validity or enforceability of 219 | the remainder of the terms of this Agreement, and without further 220 | action by the parties hereto, such provision shall be reformed to the 221 | minimum extent necessary to make such provision valid and enforceable. 
222 | 223 | If Recipient institutes patent litigation against any entity 224 | (including a cross-claim or counterclaim in a lawsuit) alleging that the 225 | Program itself (excluding combinations of the Program with other software 226 | or hardware) infringes such Recipient's patent(s), then such Recipient's 227 | rights granted under Section 2(b) shall terminate as of the date such 228 | litigation is filed. 229 | 230 | All Recipient's rights under this Agreement shall terminate if it 231 | fails to comply with any of the material terms or conditions of this 232 | Agreement and does not cure such failure in a reasonable period of 233 | time after becoming aware of such noncompliance. If all Recipient's 234 | rights under this Agreement terminate, Recipient agrees to cease use 235 | and distribution of the Program as soon as reasonably practicable. 236 | However, Recipient's obligations under this Agreement and any licenses 237 | granted by Recipient relating to the Program shall continue and survive. 238 | 239 | Everyone is permitted to copy and distribute copies of this Agreement, 240 | but in order to avoid inconsistency the Agreement is copyrighted and 241 | may only be modified in the following manner. The Agreement Steward 242 | reserves the right to publish new versions (including revisions) of 243 | this Agreement from time to time. No one other than the Agreement 244 | Steward has the right to modify this Agreement. The Eclipse Foundation 245 | is the initial Agreement Steward. The Eclipse Foundation may assign the 246 | responsibility to serve as the Agreement Steward to a suitable separate 247 | entity. Each new version of the Agreement will be given a distinguishing 248 | version number. The Program (including Contributions) may always be 249 | Distributed subject to the version of the Agreement under which it was 250 | received. 
In addition, after a new version of the Agreement is published, 251 | Contributor may elect to Distribute the Program (including its 252 | Contributions) under the new version. 253 | 254 | Except as expressly stated in Sections 2(a) and 2(b) above, Recipient 255 | receives no rights or licenses to the intellectual property of any 256 | Contributor under this Agreement, whether expressly, by implication, 257 | estoppel or otherwise. All rights in the Program not expressly granted 258 | under this Agreement are reserved. Nothing in this Agreement is intended 259 | to be enforceable by any entity that is not a Contributor or Recipient. 260 | No third-party beneficiary rights are created under this Agreement. 261 | 262 | Exhibit A - Form of Secondary Licenses Notice 263 | 264 | "This Source Code may also be made available under the following 265 | Secondary Licenses when the conditions for such availability set forth 266 | in the Eclipse Public License, v. 2.0 are satisfied: {name license(s), 267 | version(s), and exceptions or additional permissions here}." 268 | 269 | Simply including a copy of this Agreement, including this Exhibit A 270 | is not sufficient to license the Source Code under Secondary Licenses. 271 | 272 | If it is not possible or desirable to put the notice in a particular 273 | file, then You may include the notice in a location (such as a LICENSE 274 | file in a relevant directory) where a recipient would be likely to 275 | look for such a notice. 276 | 277 | You may add additional accurate notices of copyright ownership. 278 | -------------------------------------------------------------------------------- /src/performancepaper/core.clj: -------------------------------------------------------------------------------- 1 | (ns performancepaper.core 2 | (:require [criterium.core :as c]) 3 | (:import bench.core 4 | [java.util Collections Arrays ArrayList])) 5 | 6 | (defmacro with-unchecked [& body] 7 | `(do (set! 
*unchecked-math* :warn-on-boxed) 8 | ~@body 9 | (set! *unchecked-math* false) 10 | nil)) 11 | 12 | ;;Quoted from paper: 13 | ;; "The idea behind this approach was that seeing the way the languages perform in 14 | ;; common fundamental tasks would give the reader an idea of how the languages will 15 | ;; perform in their application. The reason that the fundamental areas selected were 16 | ;; separated into their own experiments rather than putting them all into the same 17 | ;; program, was so that the reader could more easily predict which language is 18 | ;; better for their specific tasks. 19 | 20 | ;;The Java version that was used to execute both the Clojure and the Java 21 | ;;code was 1.8.0_60. The JVM was run with the arguments -Xmx11g and -Xss11g to 22 | ;;increase the max heap and stack space to 11 gigabytes when needed for the 23 | ;;experiments." 24 | 25 | ;; 3.1 26 | 27 | ;; "The recursion experiment consisted of a number of recursion calls with only 28 | ;; a counter as a parameter and a simple exit condition. It was designed to test 29 | ;; the performance of function calls in the two languages. The counter was a 30 | ;; primitive integer in both languages and was decreased by one for each 31 | ;; recursive call. 32 | 33 | ;;Execution times were measured for problem sizes of 2000, 20000, 200000, 2000000 34 | ;;and 20000000," 35 | 36 | ;;3.1 Pure Recursion 37 | 38 | ;; private void Recurse(int cnt) 39 | ;; {if (cnt > 0) 40 | ;; Recurse (cnt - 1);} 41 | 42 | ;; performancepaper.core> (c/quick-bench (bench.core/Recurse 10)) 43 | ;; Evaluation count : 49359366 in 6 samples of 8226561 calls.
44 | ;; Execution time mean : 10.123866 ns 45 | ;; Execution time std-deviation : 0.199596 ns 46 | ;; Execution time lower quantile : 9.852564 ns ( 2.5%) 47 | ;; Execution time upper quantile : 10.309200 ns (97.5%) 48 | ;; Overhead used : 1.797578 ns 49 | 50 | ;;21x slower 51 | (defn pure-recursion [cnt] 52 | (if (> cnt 0) 53 | (pure-recursion 54 | (- cnt 1)))) 55 | 56 | ;; performancepaper.core> (c/quick-bench (pure-recursion 10)) 57 | ;; Evaluation count : 2776386 in 6 samples of 462731 calls. 58 | ;; Execution time mean : 217.748915 ns 59 | ;; Execution time std-deviation : 3.708932 ns 60 | ;; Execution time lower quantile : 213.224904 ns ( 2.5%) 61 | ;; Execution time upper quantile : 221.431481 ns (97.5%) 62 | ;; Overhead used : 1.804565 ns 63 | 64 | ;;1.5x 65 | (with-unchecked 66 | (defn pure-recursion2 [^long cnt] 67 | (if (pos? cnt) 68 | (pure-recursion2 (dec cnt)))) 69 | ) 70 | 71 | ;; Evaluation count : 34678608 in 6 samples of 5779768 calls. 72 | ;; Execution time mean : 15.723221 ns 73 | ;; Execution time std-deviation : 0.156759 ns 74 | ;; Execution time lower quantile : 15.545890 ns ( 2.5%) 75 | ;; Execution time upper quantile : 15.907675 ns (97.5%) 76 | ;; Overhead used : 1.804565 ns 77 | 78 | (defn pure-recursion3 [cnt] 79 | (if (> cnt 0) 80 | (recur (dec cnt)))) 81 | 82 | ;; Evaluation count : 9460254 in 6 samples of 1576709 calls. 83 | ;; Execution time mean : 62.493096 ns 84 | ;; Execution time std-deviation : 0.772831 ns 85 | ;; Execution time lower quantile : 61.875935 ns ( 2.5%) 86 | ;; Execution time upper quantile : 63.504731 ns (97.5%) 87 | ;; Overhead used : 1.797578 ns 88 | ;; nil 89 | 90 | ;;faster than java. 91 | 92 | ;;0.697x, faster. We're also somewhat cheating at the 93 | ;;machine level, but at the language level, "recur" is fair 94 | ;;game to avoid function call overhead, which java can't do. 
95 | (defn pure-recursion4 [^long cnt] 96 | (if (> cnt 0) 97 | (recur (dec cnt)))) 98 | 99 | ;; performancepaper.core> (c/quick-bench (pure-recursion4 10)) 100 | ;; Evaluation count : 68567172 in 6 samples of 11427862 calls. 101 | ;; Execution time mean : 6.972233 ns 102 | ;; Execution time std-deviation : 0.059662 ns 103 | ;; Execution time lower quantile : 6.887818 ns ( 2.5%) 104 | ;; Execution time upper quantile : 7.030860 ns (97.5%) 105 | ;; Overhead used : 1.797578 ns 106 | ;; nil 107 | 108 | (with-unchecked 109 | (defn pure-recursion5 [^long cnt] 110 | (if (> cnt 0) 111 | (recur (dec cnt)))) 112 | ) 113 | 114 | 115 | ;; Evaluation count : 73179480 in 6 samples of 12196580 calls. 116 | ;; Execution time mean : 6.533542 ns 117 | ;; Execution time std-deviation : 0.173063 ns 118 | ;; Execution time lower quantile : 6.235797 ns ( 2.5%) 119 | ;; Execution time upper quantile : 6.687093 ns (97.5%) 120 | ;; Overhead used : 1.797578 ns 121 | 122 | ;; Found 1 outliers in 6 samples (16.6667 %) 123 | ;; low-severe 1 (16.6667 %) 124 | ;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers 125 | 126 | ;;3.2 127 | 128 | ;; "The sorting experiment consisted of sorting a collection of integers. In Clojure 129 | ;; this was done by sorting alistof integers, shuffled by theshufflefunc-tion, 130 | ;; using thesortfunction, all of which are included in theclojure.corelibrary. In 131 | ;; Java this was done similarly by sorting an array of primitive in-tegers, which 132 | ;; was shuffled usingjava.util.Collections.shuffle, using theAr-rays.sortfunction. 133 | 134 | ;;Execution times were measured for collec-tions with 2000, 20000, 200000, 135 | ;;2000000 and 20000000 integers." 
136 | 137 | ;; private int[] createArray (int size) 138 | ;; {int counter = Integer.MIN_VALUE; 139 | ;; ArrayList arrList= new ArrayList (size) ; 140 | ;; for(int i = 0; i < size ; ++ i) 141 | ;; arrList.add (counter ++); 142 | ;; java.util.Collections.shuffle(arrList); 143 | ;; int[] retArr = new int[size] ; 144 | ;; for(int i = 0; i < size ; ++ i ) 145 | ;; retArr [i] = arrList.get(i); 146 | ;; return retArr;} 147 | 148 | ;; Arrays.sort(array) ; 149 | 150 | ;; performancepaper.core> (c/quick-bench (core/createArray 100)) 151 | ;; Evaluation count : 138942 in 6 samples of 23157 calls. 152 | ;; Execution time mean : 4.369374 µs 153 | ;; Execution time std-deviation : 63.001723 ns 154 | ;; Execution time lower quantile : 4.310739 µs ( 2.5%) 155 | ;; Execution time upper quantile : 4.467841 µs (97.5%) 156 | ;; Overhead used : 1.797578 ns 157 | 158 | ;;Clojure implemention underspecified 159 | 160 | ;; (let [list (-> (create-list size (atom Integer/MIN_VALUE)) 161 | ;; (shuffle))] 162 | ;; ...) ;;author elides this, and `create-list` is not provided. 163 | 164 | ;; (sort list) 165 | 166 | ;;7.99 ~ 8x, slower 167 | (defn create-sorted-array [n] 168 | (->> (range Integer/MIN_VALUE 0 1) 169 | (take n) 170 | shuffle 171 | sort)) 172 | 173 | ;; performancepaper.core> (c/quick-bench (create-sorted-array 100)) 174 | ;; Evaluation count : 17532 in 6 samples of 2922 calls. 
175 | ;; Execution time mean : 34.841374 µs 176 | ;; Execution time std-deviation : 549.515702 ns 177 | ;; Execution time lower quantile : 34.210927 µs ( 2.5%) 178 | ;; Execution time upper quantile : 35.646224 µs (97.5%) 179 | ;; Overhead used : 1.804565 ns 180 | 181 | ;; Found 1 outliers in 6 samples (16.6667 %) 182 | ;; low-severe 1 (16.6667 %) 183 | ;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers 184 | 185 | ;;3x 186 | (defn create-sorted-array2 [^long n] 187 | (let [^ArrayList alist 188 | (->> (range Integer/MIN_VALUE 0 1) 189 | (transduce (take n) 190 | (completing (fn [^ArrayList acc n] 191 | (doto acc (.add n)))) 192 | (java.util.ArrayList. n))) 193 | _ (java.util.Collections/shuffle alist)] 194 | (doto (int-array alist) Arrays/sort))) 195 | 196 | ;; Evaluation count : 46506 in 6 samples of 7751 calls. 197 | ;; Execution time mean : 12.985146 µs 198 | ;; Execution time std-deviation : 570.944434 ns 199 | ;; Execution time lower quantile : 12.451225 µs ( 2.5%) 200 | ;; Execution time upper quantile : 13.917159 µs (97.5%) 201 | ;; Overhead used : 1.800162 ns 202 | 203 | ;; Found 1 outliers in 6 samples (16.6667 %) 204 | ;; low-severe 1 (16.6667 %) 205 | ;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers 206 | ;; nil 207 | 208 | ;;1.07x, slightly slower but meh. 209 | (with-unchecked 210 | (defn create-sorted-array3 [^long size] 211 | (let [^ArrayList alist 212 | (loop [^ArrayList acc (java.util.ArrayList. size) 213 | counter (int Integer/MIN_VALUE) 214 | n 0] 215 | (if (< n size) 216 | (let [c (inc counter)] 217 | (recur (doto acc (.add c)) 218 | c 219 | (inc n))) 220 | acc)) 221 | _ (Collections/shuffle alist) 222 | res (int-array size)] 223 | (dotimes [i size] (aset res i ^int (.get alist i))) 224 | (doto res Arrays/sort)))) 225 | 226 | ;; performancepaper.core> (c/quick-bench (create-sorted-array3 100)) 227 | ;; Evaluation count : 130794 in 6 samples of 21799 calls. 
228 | ;; Execution time mean : 4.669894 µs 229 | ;; Execution time std-deviation : 179.454425 ns 230 | ;; Execution time lower quantile : 4.477268 µs ( 2.5%) 231 | ;; Execution time upper quantile : 4.902860 µs (97.5%) 232 | ;; Overhead used : 1.800162 ns 233 | 234 | ;;3.3 Map Creation 235 | 236 | ;;"The map creation experiment consisted of adding integers as keys and valuesto a 237 | ;; map. In Java they were added to aHashMapfrom thejava.util library, andin Clojure 238 | ;; they were added to the built-inpersistent mapdata structure. 239 | 240 | ;;Execution times were measured for20000, 63246, 200000, 632456 and 2000000 241 | ;;different key-value pairs." 242 | 243 | ;; private HashMap createMap (int sze) 244 | ;; {HashMap retMap= new HashMap(sze) ; 245 | ;; for (int i = 0; i < sze ;) 246 | ;; retMap.put(i , ++ i ) ; 247 | ;; return retMap ;} 248 | 249 | ;; Evaluation count : 538998 in 6 samples of 89833 calls. 250 | ;; Execution time mean : 1.178573 µs 251 | ;; Execution time std-deviation : 40.404054 ns 252 | ;; Execution time lower quantile : 1.142367 µs ( 2.5%) 253 | ;; Execution time upper quantile : 1.237344 µs (97.5%) 254 | ;; Overhead used : 1.800162 ns 255 | 256 | ;;9x 257 | (defn create-map [size] 258 | (loop [map (transient {}), 259 | i (int size)] 260 | (if (> i 0) 261 | (recur (assoc! map i (+ i 1)) (- i 1) ) 262 | (persistent! map)))) 263 | 264 | ;; Evaluation count : 61686 in 6 samples of 10281 calls. 265 | ;; Execution time mean : 9.874480 µs 266 | ;; Execution time std-deviation : 96.973621 ns 267 | ;; Execution time lower quantile : 9.750675 µs ( 2.5%) 268 | ;; Execution time upper quantile : 9.964194 µs (97.5%) 269 | ;; Overhead used : 1.800162 ns 270 | 271 | ;;not much change, still around 9x slower. 272 | (with-unchecked 273 | (defn create-map2 [size] 274 | (loop [^clojure.lang.ITransientAssociative 275 | map (transient {}), 276 | i (int size)] 277 | (if (> i 0) 278 | (recur (.assoc map i (+ i 1)) 279 | (- i 1)) 280 | (persistent! 
map))))) 281 | ;; performancepaper.core> (c/quick-bench (create-map2 100)) 282 | ;; Evaluation count : 61260 in 6 samples of 10210 calls. 283 | ;; Execution time mean : 9.576160 µs 284 | ;; Execution time std-deviation : 147.638187 ns 285 | ;; Execution time lower quantile : 9.392887 µs ( 2.5%) 286 | ;; Execution time upper quantile : 9.723504 µs (97.5%) 287 | ;; Overhead used : 1.804565 ns 288 | 289 | 290 | ;;1.04x, slower but meh. 291 | (with-unchecked 292 | (defn create-map3 [^ long size] 293 | (let [^java.util.HashMap map (java.util.HashMap. size)] 294 | (dotimes [i size] 295 | (.put map i (+ i 1)))))) 296 | 297 | ;; performancepaper.core> (c/quick-bench (create-map3 100)) 298 | ;; Evaluation count : 487116 in 6 samples of 81186 calls. 299 | ;; Execution time mean : 1.229078 µs 300 | ;; Execution time std-deviation : 30.572826 ns 301 | ;; Execution time lower quantile : 1.191533 µs ( 2.5%) 302 | ;; Execution time upper quantile : 1.268660 µs (97.5%) 303 | ;; Overhead used : 1.804565 ns 304 | 305 | 306 | ;;3.4 Object Creation 307 | 308 | ;; The object creation experiment consisted of creating a linked list without 309 | ;; val-ues. In Java a custom class was used to create the links while in Clojure 310 | ;; nestedpresistent maps were used. The links were created backwards in both 311 | ;; lan-guages, meaning that the first object created would have a next-pointer with 312 | ;; anull value, and the second object created would point to the first, and so on. 
313 | 314 | ;; Execution times were measured for 100000, 316228, 1000000, 3162278 and 10000000 315 | ;; linked objects. 316 | 317 | ;; private class LLNode{ 318 | ;; public LLNode next ; 319 | ;; public LLNode (LLNode next ){ 320 | ;; this.next = next ;}} 321 | 322 | ;; 323 | ;; private LLNode createObjects (int count ) 324 | ;; {LLNode last = null ; 325 | ;; for (int i = 0; i < count; ++ i) 326 | ;; last = new LLNode(last) ; 327 | ;; return last;} 328 | 329 | ;; performancepaper.core> (c/quick-bench (bench.core/createObjects 100)) 330 | ;; Evaluation count : 2368566 in 6 samples of 394761 calls. 331 | ;; Execution time mean : 249.927510 ns 332 | ;; Execution time std-deviation : 4.557640 ns 333 | ;; Execution time lower quantile : 244.464795 ns ( 2.5%) 334 | ;; Execution time upper quantile : 254.444188 ns (97.5%) 335 | ;; Overhead used : 1.800162 ns 336 | 337 | ;;2.7x, slower 338 | (defn create-objects [count] 339 | (loop [last nil 340 | i (int count)] 341 | (if (= 0 i ) 342 | last 343 | (recur {:next last} (- i 1))))) 344 | 345 | ;; Evaluation count : 916590 in 6 samples of 152765 calls. 346 | ;; Execution time mean : 673.619823 ns 347 | ;; Execution time std-deviation : 26.588156 ns 348 | ;; Execution time lower quantile : 647.556044 ns ( 2.5%) 349 | ;; Execution time upper quantile : 701.464334 ns (97.5%) 350 | ;; Overhead used : 1.800162 ns 351 | 352 | ;;as expected, marginal improvements. Allocations 353 | ;;are hurting us here, as well as array-map instantiation. 354 | ;;We're on a slow path compared to java. 355 | (with-unchecked 356 | (defn create-objects2 [count] 357 | (loop [last nil 358 | i (int count)] 359 | (if (== i 0) 360 | last 361 | (recur {:next last} (- i 1)))))) 362 | 363 | ;; Evaluation count : 933462 in 6 samples of 155577 calls.
364 | ;; Execution time mean : 646.923626 ns 365 | ;; Execution time std-deviation : 11.946099 ns 366 | ;; Execution time lower quantile : 634.453274 ns ( 2.5%) 367 | ;; Execution time upper quantile : 664.344180 ns (97.5%) 368 | ;; Overhead used : 1.800162 ns 369 | 370 | ;;records are faster to construct, but implement a bunch of 371 | ;;stuff and carry more state, so more setup. Still very 372 | ;;much faster to create when you have fixed fields, like 373 | ;;the node class. 374 | (defrecord ll-node [next]) 375 | 376 | ;;1.39x, slower but getting close. 377 | (defn create-objects3 [count] 378 | (loop [last nil 379 | i (int count)] 380 | (if (== i 0) 381 | last 382 | (recur (ll-node. last) (- i 1))))) 383 | 384 | ;; Evaluation count : 1699422 in 6 samples of 283237 calls. 385 | ;; Execution time mean : 348.583970 ns 386 | ;; Execution time std-deviation : 6.587955 ns 387 | ;; Execution time lower quantile : 337.022098 ns ( 2.5%) 388 | ;; Execution time upper quantile : 354.655388 ns (97.5%) 389 | ;; Overhead used : 1.800162 ns 390 | 391 | ;; Found 1 outliers in 6 samples (16.6667 %) 392 | ;; low-severe 1 (16.6667 %) 393 | ;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers 394 | 395 | ;;revisit 396 | ;;checked comparisons don't buy us anything, can we get allocation 397 | ;;faster? 398 | (with-unchecked 399 | (defn create-objects4 [^long count] 400 | (loop [last nil 401 | i count] 402 | (if (zero? i) #_(== i 0) 403 | last 404 | (recur (ll-node. last) (dec i)))))) 405 | 406 | 407 | ;;types have less to setup, very barebones like the node class. 408 | (deftype ll-node-type [next]) 409 | 410 | ;;~1x, pretty much identical to java now 411 | (with-unchecked 412 | (defn create-objects5 [^long count] 413 | (loop [last nil 414 | i count] 415 | (if (== i 0) 416 | last 417 | (recur (ll-node-type. last) (dec i)))))) 418 | ;; Evaluation count : 2440158 in 6 samples of 406693 calls. 
419 | ;; Execution time mean : 249.399392 ns 420 | ;; Execution time std-deviation : 5.009429 ns 421 | ;; Execution time lower quantile : 244.748218 ns ( 2.5%) 422 | ;; Execution time upper quantile : 256.732288 ns (97.5%) 423 | ;; Overhead used : 1.800162 ns 424 | 425 | ;;3.5 Binary Tree DFS 426 | 427 | ;; The binary tree DFS experiment consisted of searching a binary tree for a 428 | ;; valueit did not contain using depth first search. The depth first search was 429 | ;; implemented recursively in both languages. In Java the binary tree was 430 | ;; representedby a custom class while in Clojure they were represented using nested 431 | ;; persistent maps. 432 | 433 | 434 | ;; public BinaryTreeNode createBinaryTree (int depth, int[] counter) 435 | ;; {if (depth == 0) return null; 436 | ;; int value = counter[0]++; 437 | ;; BinaryTreeNode btn = new BinaryTreeNode(value); 438 | ;; btn.left = createBinaryTree(depth - 1, counter) ; 439 | ;; btn.right = createBinaryTree(depth - 1 , counter) ; 440 | ;; return btn ;} 441 | 442 | ;; public boolean binaryTreeDFS(BinaryTreeNode root, int target) 443 | ;; {if (root == null) return false ; 444 | ;; return root.value == target || 445 | ;; binaryTreeDFS(root.left, target) || 446 | ;; binaryTreeDFS (root.right, target);} 447 | 448 | ;;Added by joinr 449 | ;; public boolean binaryTreeDFSTest(int depth, int target) 450 | ;; { 451 | ;; int[] counter = new int[1]; 452 | ;; counter[0] = 0; 453 | ;; return binaryTreeBFS(createBinaryTree(depth,counter),target); 454 | ;; } 455 | 456 | ;; performancepaper.core> (c/quick-bench (bench.core/binaryTreeDFSTest 7 126)) 457 | 458 | ;; Evaluation count : 643680 in 6 samples of 107280 calls. 
459 | ;; Execution time mean : 900.028340 ns 460 | ;; Execution time std-deviation : 25.156556 ns 461 | ;; Execution time lower quantile : 873.937425 ns ( 2.5%) 462 | ;; Execution time upper quantile : 927.532690 ns (97.5%) 463 | ;; Overhead used : 1.804565 ns 464 | 465 | (defn create-binary-tree [depth counter-atom] 466 | (when (> depth 0) 467 | (let [val @counter-atom] 468 | (swap! counter-atom inc ) 469 | {:value val 470 | :left (create-binary-tree (- depth 1) counter-atom ) 471 | :right (create-binary-tree (- depth 1) counter-atom )}))) 472 | 473 | (defn binary-tree-DFS [root target] 474 | (if (nil? root) 475 | false 476 | (or (= (:value root) target) 477 | (binary-tree-DFS (:left root) target) 478 | (binary-tree-DFS (:right root) target)))) 479 | 480 | ;;14x, we got keyword access, map allocation, recursion, and using atom as a 481 | ;;mutable counter, boxed numeric comparisons...lots of room to improve. 482 | (defn binary-tree-DFS-test [depth target] 483 | (binary-tree-DFS (create-binary-tree depth (atom 0)) 126)) 484 | 485 | ;; Evaluation count : 46068 in 6 samples of 7678 calls. 486 | ;; Execution time mean : 12.656700 µs 487 | ;; Execution time std-deviation : 244.046759 ns 488 | ;; Execution time lower quantile : 12.465987 µs ( 2.5%) 489 | ;; Execution time upper quantile : 13.059028 µs (97.5%) 490 | ;; Overhead used : 1.804565 ns 491 | 492 | (with-unchecked 493 | (defn create-binary-tree2 [^long depth counter-atom] 494 | (when (> depth 0) 495 | (let [val @counter-atom] 496 | (swap! counter-atom inc) 497 | {:value val 498 | :left (create-binary-tree2 (- depth 1) counter-atom) 499 | :right (create-binary-tree2 (- depth 1) counter-atom)})))) 500 | 501 | (defn binary-tree-DFS2 [root ^long target] 502 | (if (nil? 
root) 503 | false 504 | (or (== (root :value) target) 505 | (binary-tree-DFS2 (root :left) target) 506 | (binary-tree-DFS2 (root :right) target)))) 507 | 508 | ;;6.2x, unboxed numerics and faster keyword access help a bit 509 | ;;We are still allocating though, so building the tree is 510 | ;;probably the slow point. 511 | (defn binary-tree-DFS-test2 [depth target] 512 | (binary-tree-DFS2 (create-binary-tree2 depth (atom 0)) 126)) 513 | 514 | ;; Evaluation count : 115992 in 6 samples of 19332 calls. 515 | ;; Execution time mean : 5.588021 µs 516 | ;; Execution time std-deviation : 779.251559 ns 517 | ;; Execution time lower quantile : 5.140534 µs ( 2.5%) 518 | ;; Execution time upper quantile : 6.925430 µs (97.5%) 519 | ;; Overhead used : 2.332732 ns 520 | 521 | ;; Found 1 outliers in 6 samples (16.6667 %) 522 | ;; low-severe 1 (16.6667 %) 523 | ;; Variance from outliers : 31.8454 % Variance is moderately inflated by outliers 524 | 525 | ;;as before, we know that types are barebones classes. 526 | (deftype binary-node [^int value left right]) 527 | 528 | (with-unchecked 529 | (defn create-binary-tree3 [^long depth counter-atom] 530 | (when (> depth 0) 531 | (let [^long val @counter-atom] 532 | (vreset! counter-atom (inc val)) 533 | (binary-node. val 534 | (create-binary-tree3 (- depth 1) counter-atom) 535 | (create-binary-tree3 (- depth 1) counter-atom)))))) 536 | 537 | (defn binary-tree-DFS3 [^binary-node root ^long target] 538 | (if (nil? root) 539 | false 540 | (or (== (.value root) target) 541 | (binary-tree-DFS3 (.left root) target) 542 | (binary-tree-DFS3 (.right root) target)))) 543 | 544 | ;;2.96x, using a custom type and a volatile as a mutable 545 | ;;counter gets us closer. 546 | (defn binary-tree-DFS-test3 [depth target] 547 | (binary-tree-DFS3 (create-binary-tree3 depth (volatile! 0)) 126)) 548 | ;; Evaluation count : 222192 in 6 samples of 37032 calls. 
549 | ;; Execution time mean : 2.665373 µs 550 | ;; Execution time std-deviation : 69.489473 ns 551 | ;; Execution time lower quantile : 2.580740 µs ( 2.5%) 552 | ;; Execution time upper quantile : 2.737338 µs (97.5%) 553 | ;; Overhead used : 1.804565 ns 554 | 555 | 556 | (with-unchecked 557 | (defn create-binary-tree4 [^long depth ^ints counter] 558 | (when (> depth 0) 559 | (let [val (aget counter 0)] 560 | (aset counter 0 (inc val)) 561 | (binary-node. val 562 | (create-binary-tree4 (- depth 1) counter) 563 | (create-binary-tree4 (- depth 1) counter)))))) 564 | 565 | (defn binary-tree-DFS4 [^binary-node root ^long target] 566 | (if root 567 | (or (== (.value root) target) 568 | (binary-tree-DFS4 (.left root) target) 569 | (binary-tree-DFS4 (.right root) target)) 570 | false)) 571 | 572 | ;;1.27x, like the java version, using a mutable int array as a counter 573 | ;;saves time on boxing with the volatile, gets us closer. 574 | (defn binary-tree-DFS-test4 [depth target] 575 | (binary-tree-DFS4 (create-binary-tree4 depth (doto (int-array 1) (aset 0 1))) 126)) 576 | 577 | ;; Evaluation count : 524934 in 6 samples of 87489 calls. 578 | ;; Execution time mean : 1.158351 µs 579 | ;; Execution time std-deviation : 46.874432 ns 580 | ;; Execution time lower quantile : 1.116454 µs ( 2.5%) 581 | ;; Execution time upper quantile : 1.222972 µs (97.5%) 582 | ;; Overhead used : 1.804565 ns 583 | 584 | ;;3.6 Binary Tree BFS 585 | 586 | ;;The binary tree BFS, similar to the binary tree DFS experiment, consisted 587 | ;;of searching a binary tree for a value it did not contain, but using breadth 588 | ;;first search. The breadth first search was implemented iteratively in both 589 | ;;languages. In Java the binary tree was represented by a custom class while in 590 | ;;Clojure they were represented using nested persistent maps.
591 | 592 | ;; public boolean binaryTreeBFS(BinaryTreeNode root, int target) 593 | ;; {Queuequeue= new LinkedList () ; 594 | ;; queue.add(root) ; 595 | ;; while (! queue.isEmpty()) 596 | ;; {BinaryTreeNode item = queue.poll(); 597 | ;; if (item.value == target) return true; 598 | ;; if (item.left != null) queue.add (item.left); 599 | ;; if (item.right != null) queue.add (item.right);} 600 | ;; return false;} 601 | 602 | ;;Added by joinr 603 | ;; public boolean binaryTreeBFSTest(int depth, int target) 604 | ;; { 605 | ;; int[] counter = new int[1]; 606 | ;; counter[0] = 0; 607 | ;; return binaryTreeBFS(createBinaryTree(depth,counter),target); 608 | ;; } 609 | 610 | ;;Not sure why we're getting clipped here...Java mutable linked list 611 | ;;queue should be faster out of the box, but who knows. Results are 612 | ;;identical, so looks consistent! 613 | 614 | ;; performancepaper.core> (c/quick-bench (bench.core/binaryTreeBFSTest 7 126)) 615 | ;; Evaluation count : 465144 in 6 samples of 77524 calls. 616 | ;; Execution time mean : 1.325622 µs 617 | ;; Execution time std-deviation : 31.643248 ns 618 | ;; Execution time lower quantile : 1.301545 µs ( 2.5%) 619 | ;; Execution time upper quantile : 1.376586 µs (97.5%) 620 | ;; Overhead used : 1.804565 ns 621 | 622 | (defn binary-tree-BFS [root target] 623 | (loop [queue (conj clojure.lang.PersistentQueue/EMPTY root)] 624 | (if (empty? queue) 625 | false 626 | (let [item (peek queue)] 627 | (if (= target (:value item)) 628 | true 629 | (recur (as-> (pop queue) $ 630 | (if (nil? (:left item)) 631 | $ 632 | (conj $ (:left item))) 633 | (if (nil? (:right item)) 634 | $ 635 | (conj $ (:right item)))))))))) 636 | 637 | ;;way faster for some reason... 638 | ;;20.8x slower using a persistent queue and original map-based 639 | ;;nodes. 
640 | (defn binary-tree-BFS-test [depth tgt] 641 | (binary-tree-BFS (create-binary-tree depth (atom 0)) 126)) 642 | 643 | ;; performancepaper.core> (c/quick-bench (binary-tree-BFS-test 7 126)) 644 | ;; Evaluation count : 23448 in 6 samples of 3908 calls. 645 | ;; Execution time mean : 27.534318 µs 646 | ;; Execution time std-deviation : 3.168409 µs 647 | ;; Execution time lower quantile : 25.831461 µs ( 2.5%) 648 | ;; Execution time upper quantile : 32.973576 µs (97.5%) 649 | ;; Overhead used : 1.804565 ns 650 | 651 | ;; Found 1 outliers in 6 samples (16.6667 %) 652 | ;; low-severe 1 (16.6667 %) 653 | ;; Variance from outliers : 31.1481 % Variance is moderately inflated by outliers 654 | 655 | ;;0.89x, a bit faster surprisingly. 656 | (defn binary-tree-BFS-test2 [depth tgt] 657 | (binary-tree-BFS (create-binary-tree4 depth (doto (int-array 1) (aset 0 0))) 126)) 658 | 659 | ;; performancepaper.core> (c/quick-bench (binary-tree-BFS-test2 7 126)) 660 | ;; Evaluation count : 509616 in 6 samples of 84936 calls. 661 | ;; Execution time mean : 1.221056 µs 662 | ;; Execution time std-deviation : 28.469631 ns 663 | ;; Execution time lower quantile : 1.193429 µs ( 2.5%) 664 | ;; Execution time upper quantile : 1.257014 µs (97.5%) 665 | ;; Overhead used : 1.804565 ns 666 | -------------------------------------------------------------------------------- /README.org: -------------------------------------------------------------------------------- 1 | #+TITLE: Exploring the Methodology of "A performance comparison of Clojure and Java" by Gustav Krantz 2 | #+Author: joinr 3 | 4 | * Introduction 5 | 6 | The author ( [[https://www.diva-portal.org/smash/get/diva2:1424342/FULLTEXT01.pdf][full paper]] ) sought to establish a micro benchmark comparison between simple java 7 | programs and their implementations in Clojure. 
From there, a sample-based 8 | performance profiling methodology was applied to allow for JIT warm up and 9 | establish well founded statistical measures of each implementation. Per the 10 | author: 11 | 12 | "The idea behind this approach was that seeing the way the languages perform in 13 | common fundamental tasks would give the reader an idea of how the languages will 14 | perform in their application. The reason that the fundamental areas selected were 15 | separated into their own experiments rather than putting them all into the same 16 | program, was so that the reader could more easily predict which language is 17 | better for their specific tasks." 18 | 19 | "The Java version that was used to execute both the Clojure and the Java 20 | code was 1.8.0_60. The JVM was run with the arguments –Xmx11g and -Xss11g to 21 | increase the max heap and stack space to 11 gigabytes when needed for the 22 | experiments." 23 | 24 | After reading a post about this paper on r/clojure on reddit, I decided to 25 | walk through the author's code and offer some insights for the Clojure community 26 | and perhaps the java community regarding the efficiency of the micro benchmarks. 27 | 28 | Where possible, I will delineate apples:apples and apples:oranges during the 29 | walk through, as well as detail areas where I had to fill in the gaps from the paper 30 | with my own hopefully correct implementation. For purposes of consistency and 31 | reproducibility, I inline all of my measurements in the source code, using the 32 | criterium library to benchmark both the clojure and java implementations. I also 33 | caveat the results in that I did not explore the same design space as the author 34 | with regard to experimental parameters (e.g. recursion count, collection size, 35 | tree depth, etc.). Rather, I stuck with relatively small collections amenable 36 | to quick iterative benchmarking and experimentation. 
I believe the resulting 37 | performance improvements will hold up at scale though (based on experience). 38 | 39 | ** Platform 40 | All noted measurements were taken on a platform similar to the original paper's, namely 41 | OpenJDK 1.8. The Java measurements are provided as a baseline for comparison, to 42 | nullify the effects of different hardware. 43 | 44 | * 3.1 Pure Recursion 45 | 46 | "The recursion experiment consisted of a number of recursion calls with only 47 | a counter as a parameter and a simple exit condition. It was designed to test 48 | the performance of function calls in the two languages. The counter was a 49 | primitive integer in both languages and was decreased by one for each 50 | recursive call." 51 | 52 | "Execution times were measured for problem sizes of 2000, 20000, 200000, 2000000 53 | and 20000000," 54 | 55 | #+begin_src java 56 | private void Recurse(int cnt) 57 | {if (cnt > 0) 58 | Recurse (cnt - 1);} 59 | #+end_src 60 | 61 | #+begin_src clojure 62 | performancepaper.core> (c/quick-bench (bench.core/Recurse 10)) 63 | Evaluation count : 49359366 in 6 samples of 8226561 calls. 64 | Execution time mean : 10.123866 ns 65 | Execution time std-deviation : 0.199596 ns 66 | Execution time lower quantile : 9.852564 ns ( 2.5%) 67 | Execution time upper quantile : 10.309200 ns (97.5%) 68 | Overhead used : 1.797578 ns 69 | 70 | ;; 21x slower than the java baseline 71 | (defn pure-recursion [cnt] 72 | (if (> cnt 0) 73 | (pure-recursion 74 | (- cnt 1)))) 75 | 76 | performancepaper.core> (c/quick-bench (pure-recursion 10)) 77 | Evaluation count : 2776386 in 6 samples of 462731 calls. 78 | Execution time mean : 217.748915 ns 79 | Execution time std-deviation : 3.708932 ns 80 | Execution time lower quantile : 213.224904 ns ( 2.5%) 81 | Execution time upper quantile : 221.431481 ns (97.5%) 82 | Overhead used : 1.804565 ns 83 | #+end_src 84 | 85 | Off the bat, we are in a pretty bad spot.
Thankfully this can be heavily mitigated 86 | by using unchecked math and a type hint ( ~with-unchecked~ has the effect of 87 | setting ~*unchecked-math*~ to true for the scope of the body, then reverting to 88 | ~false~ after evaluation). 89 | 90 | #+begin_src clojure 91 | 92 | (with-unchecked 93 | (defn pure-recursion2 [^long cnt] 94 | (if (pos? cnt) 95 | (pure-recursion2 (dec cnt))))) 96 | 97 | Evaluation count : 34678608 in 6 samples of 5779768 calls. 98 | Execution time mean : 15.723221 ns 99 | Execution time std-deviation : 0.156759 ns 100 | Execution time lower quantile : 15.545890 ns ( 2.5%) 101 | Execution time upper quantile : 15.907675 ns (97.5%) 102 | Overhead used : 1.804565 ns 103 | #+end_src 104 | 105 | That gets us to within 1.5x. 106 | 107 | We can still do better, although it may be argued over whether 108 | going this route deviates from the author's original intent. 109 | Since the original intent was to measure function call overhead, 110 | we can actually leverage a language feature that clojure provides 111 | to eliminate that overhead completely. Where there was naive recursion 112 | in the original, we can just use the ~recur~ form to semantically 113 | re-enter the function with new arguments, while operationally the 114 | clojure compiler optimizes this to a loop: 115 | 116 | #+begin_src clojure 117 | (defn pure-recursion4 [^long cnt] 118 | (if (> cnt 0) 119 | (recur (dec cnt)))) 120 | 121 | performancepaper.core> (c/quick-bench (pure-recursion4 10)) 122 | Evaluation count : 68567172 in 6 samples of 11427862 calls. 123 | Execution time mean : 6.972233 ns 124 | Execution time std-deviation : 0.059662 ns 125 | Execution time lower quantile : 6.887818 ns ( 2.5%) 126 | Execution time upper quantile : 7.030860 ns (97.5%) 127 | Overhead used : 1.797578 ns 128 | nil 129 | #+end_src 130 | 131 | We are now 0.697x of the original java runtime, so faster. 
We're also somewhat 132 | cheating at the machine level, but at the language level, ~recur~ (in my 133 | opinion) is fair game to avoid function call overhead, which java can't do. 134 | 135 | 136 | * 3.2 Sorting 137 | 138 | "The sorting experiment consisted of sorting a collection of integers. In Clojure 139 | this was done by sorting a list of integers, shuffled by the shuffle function, 140 | using the sort function, all of which are included in the clojure.core library. In 141 | Java this was done similarly by sorting an array of primitive integers, which 142 | was shuffled using java.util.Collections.shuffle, using the Arrays.sort function. 143 | 144 | Execution times were measured for collections with 2000, 20000, 200000, 145 | 2000000 and 20000000 integers." 146 | 147 | #+begin_src java 148 | private int[] createArray (int size) 149 | {int counter = Integer.MIN_VALUE; 150 | ArrayList<Integer> arrList = new ArrayList<Integer> (size) ; 151 | for(int i = 0; i < size ; ++ i) 152 | arrList.add (counter ++); 153 | java.util.Collections.shuffle(arrList); 154 | int[] retArr = new int[size] ; 155 | for(int i = 0; i < size ; ++ i ) 156 | retArr [i] = arrList.get(i); 157 | return retArr;} 158 | 159 | Arrays.sort(array) ; 160 | #+end_src 161 | 162 | #+begin_src clojure 163 | performancepaper.core> (c/quick-bench (core/createArray 100)) 164 | Evaluation count : 138942 in 6 samples of 23157 calls. 165 | Execution time mean : 4.369374 µs 166 | Execution time std-deviation : 63.001723 ns 167 | Execution time lower quantile : 4.310739 µs ( 2.5%) 168 | Execution time upper quantile : 4.467841 µs (97.5%) 169 | Overhead used : 1.797578 ns 170 | 171 | ;; Clojure implementation underspecified 172 | 173 | (let [list (-> (create-list size (atom Integer/MIN_VALUE)) 174 | (shuffle))] 175 | ...) ;; author elides this, and `create-list` is not provided.
176 | 177 | (sort list) 178 | #+end_src 179 | 180 | Since the original paper elided the exact source code for 181 | the clojure implementation, I filled in the rest to maintain 182 | a bit of consistency with what was provided and the java 183 | implementation: 184 | 185 | #+begin_src clojure 186 | (defn create-sorted-array [n] 187 | (->> (range Integer/MIN_VALUE 0 1) 188 | (take n) 189 | shuffle 190 | sort)) 191 | 192 | performancepaper.core> (c/quick-bench (create-sorted-array 100)) 193 | Evaluation count : 17532 in 6 samples of 2922 calls. 194 | Execution time mean : 34.841374 µs 195 | Execution time std-deviation : 549.515702 ns 196 | Execution time lower quantile : 34.210927 µs ( 2.5%) 197 | Execution time upper quantile : 35.646224 µs (97.5%) 198 | Overhead used : 1.804565 ns 199 | 200 | Found 1 outliers in 6 samples (16.6667 %) 201 | low-severe 1 (16.6667 %) 202 | Variance from outliers : 13.8889 % Variance is moderately inflated by outliers 203 | #+end_src 204 | As a starting point, we are roughly 8x slower than the java implementation. 205 | We can improve this to 3x and stay within Clojure idioms though. One thing 206 | to target is to avoid creating copies of stuff; since we are producing 207 | a sorted array using an intermediate ArrayList, we can bypass clojure.core/shuffle 208 | since it creates an intermediate clojure vector we don't need: 209 | 210 | #+begin_src clojure 211 | (defn create-sorted-array2 [^long n] 212 | (let [^ArrayList alist 213 | (->> (range Integer/MIN_VALUE 0 1) 214 | (transduce (take n) 215 | (completing (fn [^ArrayList acc n] 216 | (doto acc (.add n)))) 217 | (java.util.ArrayList. n))) 218 | _ (java.util.Collections/shuffle alist)] 219 | (doto (int-array alist) Arrays/sort))) 220 | 221 | Evaluation count : 46506 in 6 samples of 7751 calls. 
222 | Execution time mean : 12.985146 µs 223 | Execution time std-deviation : 570.944434 ns 224 | Execution time lower quantile : 12.451225 µs ( 2.5%) 225 | Execution time upper quantile : 13.917159 µs (97.5%) 226 | Overhead used : 1.800162 ns 227 | 228 | Found 1 outliers in 6 samples (16.6667 %) 229 | low-severe 1 (16.6667 %) 230 | Variance from outliers : 13.8889 % Variance is moderately inflated by outliers 231 | nil 232 | #+end_src 233 | 234 | We still incur overhead in a couple of places, namely 235 | transduce has some checking inside its internal loop, 236 | and coercing the ArrayList into a seq for ~int-array~ 237 | is substantially slower than iterating the ArrayList and 238 | updating a pre-allocated int-array, as java does. Using 239 | more interop, we get to 1.07x, slightly slower but not bad: 240 | 241 | #+begin_src clojure 242 | (with-unchecked 243 | (defn create-sorted-array3 [^long size] 244 | (let [^ArrayList alist 245 | (loop [^ArrayList acc (java.util.ArrayList. size) 246 | counter (int Integer/MIN_VALUE) 247 | n 0] 248 | (if (< n size) 249 | (let [c (inc counter)] 250 | (recur (doto acc (.add c)) 251 | c 252 | (inc n))) 253 | acc)) 254 | _ (Collections/shuffle alist) 255 | res (int-array size)] 256 | (dotimes [i size] (aset res i ^int (.get alist i))) 257 | (doto res Arrays/sort)))) 258 | 259 | performancepaper.core> (c/quick-bench (create-sorted-array3 100)) 260 | Evaluation count : 130794 in 6 samples of 21799 calls. 261 | Execution time mean : 4.669894 µs 262 | Execution time std-deviation : 179.454425 ns 263 | Execution time lower quantile : 4.477268 µs ( 2.5%) 264 | Execution time upper quantile : 4.902860 µs (97.5%) 265 | Overhead used : 1.800162 ns 266 | #+end_src 267 | 268 | * 3.3 Map Creation 269 | 270 | "The map creation experiment consisted of adding integers as keys and values to a 271 | map.
In Java they were added to a HashMap from the java.util library, and in 272 | Clojure they were added to the built-in persistent map data structure. 273 | 274 | Execution times were measured for 20000, 63246, 200000, 632456 and 2000000 275 | different key-value pairs." 276 | 277 | #+begin_src java 278 | private HashMap createMap (int sze) 279 | {HashMap retMap= new HashMap(sze) ; 280 | for (int i = 0; i < sze ;) 281 | retMap.put(i , ++ i ) ; 282 | return retMap ;} 283 | #+end_src 284 | 285 | #+begin_src clojure 286 | (c/quick-bench (bench.core/createMap 100)) 287 | Evaluation count : 538998 in 6 samples of 89833 calls. 288 | Execution time mean : 1.178573 µs 289 | Execution time std-deviation : 40.404054 ns 290 | Execution time lower quantile : 1.142367 µs ( 2.5%) 291 | Execution time upper quantile : 1.237344 µs (97.5%) 292 | Overhead used : 1.800162 ns 293 | #+end_src 294 | 295 | We are comparing a java program that builds a mutable hashmap via tight loop 296 | iteration against a clojure program that uses a transient clojure hashmap to 297 | build the map and then coerces it into a persistent clojure map. 298 | 299 | #+begin_src clojure 300 | (defn create-map [size] 301 | (loop [map (transient {}), 302 | i (int size)] 303 | (if (> i 0) 304 | (recur (assoc! map i (+ i 1)) (- i 1) ) 305 | (persistent! map)))) 306 | 307 | Evaluation count : 61686 in 6 samples of 10281 calls. 308 | Execution time mean : 9.874480 µs 309 | Execution time std-deviation : 96.973621 ns 310 | Execution time lower quantile : 9.750675 µs ( 2.5%) 311 | Execution time upper quantile : 9.964194 µs (97.5%) 312 | Overhead used : 1.800162 ns 313 | #+end_src 314 | 315 | Our baseline is ~9x slower, despite the use of 316 | transients.
We may try to leverage unchecked 317 | math as before, and direct method invocation 318 | to make things a tad more efficient: 319 | #+begin_src clojure 320 | (with-unchecked 321 | (defn create-map2 [size] 322 | (loop [^clojure.lang.ITransientAssociative 323 | map (transient {}), 324 | i (int size)] 325 | (if (> i 0) 326 | (recur (.assoc map i (+ i 1)) 327 | (- i 1)) 328 | (persistent! map))))) 329 | 330 | performancepaper.core> (c/quick-bench (create-map2 100)) 331 | Evaluation count : 61260 in 6 samples of 10210 calls. 332 | Execution time mean : 9.576160 µs 333 | Execution time std-deviation : 147.638187 ns 334 | Execution time lower quantile : 9.392887 µs ( 2.5%) 335 | Execution time upper quantile : 9.723504 µs (97.5%) 336 | Overhead used : 1.804565 ns 337 | #+end_src 338 | 339 | Looks like not much change; still around 9x slower. 340 | It seems that the cost of building and coercing a transient 341 | map is still substantially outweighed by a pure mutable 342 | java hashmap that pays no coercion cost. Thankfully, 343 | we can just use java hashmaps from clojure via interop: 344 | 345 | #+begin_src clojure 346 | (with-unchecked 347 | (defn create-map3 [^long size] 348 | (let [^java.util.HashMap map (java.util.HashMap. size)] 349 | (dotimes [i size] 350 | (.put map i (+ i 1))) map))) 351 | 352 | performancepaper.core> (c/quick-bench (create-map3 100)) 353 | Evaluation count : 487116 in 6 samples of 81186 calls. 354 | Execution time mean : 1.229078 µs 355 | Execution time std-deviation : 30.572826 ns 356 | Execution time lower quantile : 1.191533 µs ( 2.5%) 357 | Execution time upper quantile : 1.268660 µs (97.5%) 358 | Overhead used : 1.804565 ns 359 | #+end_src 360 | 361 | Leveraging interop leaves us at 1.04x: slower, but perhaps 362 | within the margin of error. 363 | 364 | * 3.4 Object Creation 365 | 366 | "The object creation experiment consisted of creating a linked list without 367 | values.
In Java a custom class was used to create the links while in Clojure 368 | nested persistent maps were used. The links were created backwards in both 369 | languages, meaning that the first object created would have a next-pointer with 370 | a null value, and the second object created would point to the first, and so on. 371 | 372 | Execution times were measured for 100000, 316228, 1000000, 3162278 and 10000000 373 | linked objects" 374 | 375 | #+begin_src java 376 | private class LLNode{ 377 | public LLNode next ; 378 | public LLNode (LLNode next ){ 379 | this.next = next ;} 380 | } 381 | 382 | private LLNode createObjects (int count ) 383 | {LLNode last = null ; 384 | for (int i = 0; i < count; ++ i) 385 | last = new LLNode(last) ; 386 | return last;} 387 | #+end_src 388 | 389 | #+begin_src clojure 390 | performancepaper.core> (c/quick-bench (bench.core/createObjects 100)) 391 | Evaluation count : 2368566 in 6 samples of 394761 calls. 392 | Execution time mean : 249.927510 ns 393 | Execution time std-deviation : 4.557640 ns 394 | Execution time lower quantile : 244.464795 ns ( 2.5%) 395 | Execution time upper quantile : 254.444188 ns (97.5%) 396 | Overhead used : 1.800162 ns 397 | 398 | (defn create-objects [count] 399 | (loop [last nil 400 | i (int count)] 401 | (if (= 0 i ) 402 | last 403 | (recur {:next last} (- i 1))))) 404 | 405 | Evaluation count : 916590 in 6 samples of 152765 calls. 406 | Execution time mean : 673.619823 ns 407 | Execution time std-deviation : 26.588156 ns 408 | Execution time lower quantile : 647.556044 ns ( 2.5%) 409 | Execution time upper quantile : 701.464334 ns (97.5%) 410 | Overhead used : 1.800162 ns 411 | #+end_src 412 | 413 | Our baseline implementation compares a java class-based implementation to a 414 | clojure hash-map based one.
Notably, unlike the java implementation, the hashmap 415 | must pay a key lookup cost to access fields, and has a higher 416 | construction/allocation cost as opposed to a simple class constructor with fixed 417 | fields (LLNode). Clojure starts off about 2.7x slower. 418 | 419 | Allocations are hurting us here, as well as array-map instantiation. We're on a 420 | slow path compared to java. We can add unchecked math, and get some marginal gains, 421 | 422 | #+begin_src clojure 423 | (with-unchecked 424 | (defn create-objects2 [count] 425 | (loop [last nil 426 | i (int count)] 427 | (if (== i 0) 428 | last 429 | (recur {:next last} (- i 1)))))) 430 | 431 | Evaluation count : 933462 in 6 samples of 155577 calls. 432 | Execution time mean : 646.923626 ns 433 | Execution time std-deviation : 11.946099 ns 434 | Execution time lower quantile : 634.453274 ns ( 2.5%) 435 | Execution time upper quantile : 664.344180 ns (97.5%) 436 | Overhead used : 1.800162 ns 437 | #+end_src 438 | 439 | but the real target is to get a simpler container that's easy to construct. 440 | 441 | Records are faster to construct than maps, but they implement a number of interfaces and carry 442 | extra state, so there is still some setup compared to a plain class. Still, they are much faster to create 443 | when you have fixed fields, like the node class. 444 | 445 | #+begin_src clojure 446 | (defrecord ll-node [next]) 447 | 448 | (defn create-objects3 [count] 449 | (loop [last nil 450 | i (int count)] 451 | (if (== i 0) 452 | last 453 | (recur (ll-node. last) (- i 1))))) 454 | 455 | Evaluation count : 1699422 in 6 samples of 283237 calls.
456 | Execution time mean : 348.583970 ns 457 | Execution time std-deviation : 6.587955 ns 458 | Execution time lower quantile : 337.022098 ns ( 2.5%) 459 | Execution time upper quantile : 354.655388 ns (97.5%) 460 | Overhead used : 1.800162 ns 461 | 462 | Found 1 outliers in 6 samples (16.6667 %) 463 | low-severe 1 (16.6667 %) 464 | Variance from outliers : 13.8889 % Variance is moderately inflated by outliers 465 | #+end_src 466 | 467 | 468 | Record-based is now 1.39x slower; getting close. 469 | As it turns out, types have even less to set up; they are barebones, much like the node class. 470 | 471 | #+begin_src clojure 472 | (deftype ll-node-type [next]) 473 | 474 | (with-unchecked 475 | (defn create-objects5 [^long count] 476 | (loop [last nil 477 | i count] 478 | (if (== i 0) 479 | last 480 | (recur (ll-node-type. last) (dec i)))))) 481 | Evaluation count : 2440158 in 6 samples of 406693 calls. 482 | Execution time mean : 249.399392 ns 483 | Execution time std-deviation : 5.009429 ns 484 | Execution time lower quantile : 244.748218 ns ( 2.5%) 485 | Execution time upper quantile : 256.732288 ns (97.5%) 486 | Overhead used : 1.800162 ns 487 | #+end_src 488 | 489 | With a barebones class equivalent and direct field access, 490 | we get ~1x, pretty much identical to java now, with very similar 491 | code. 492 | 493 | 494 | * 3.5 Binary Tree DFS 495 | 496 | "The binary tree DFS experiment consisted of searching a binary tree for a 497 | value it did not contain using depth first search. The depth first search was 498 | implemented recursively in both languages. In Java the binary tree was 499 | represented by a custom class while in Clojure they were represented using nested 500 | persistent maps." 501 | 502 | We have a situation similar to the object creation in 3.4 here, 503 | where the clojure solution is implemented on top of generic 504 | hashmaps, while the java implementation leverages classes and 505 | field access.
Persistent hashmaps should have a somewhat higher 506 | instantiation and key lookup cost compared to raw classes. 507 | 508 | #+begin_src java 509 | public BinaryTreeNode createBinaryTree (int depth, int[] counter) 510 | {if (depth == 0) return null; 511 | int value = counter[0]++; 512 | BinaryTreeNode btn = new BinaryTreeNode(value); 513 | btn.left = createBinaryTree(depth - 1, counter) ; 514 | btn.right = createBinaryTree(depth - 1 , counter) ; 515 | return btn ;} 516 | 517 | public boolean binaryTreeDFS(BinaryTreeNode root, int target) 518 | {if (root == null) return false ; 519 | return root.value == target || 520 | binaryTreeDFS(root.left, target) || 521 | binaryTreeDFS (root.right, target);} 522 | 523 | //Added by joinr 524 | public boolean binaryTreeDFSTest(int depth, int target) 525 | { 526 | int[] counter = new int[1]; 527 | counter[0] = 0; 528 | return binaryTreeDFS(createBinaryTree(depth,counter),target); 529 | } 530 | #+end_src 531 | 532 | #+begin_src clojure 533 | performancepaper.core> (c/quick-bench (bench.core/binaryTreeDFSTest 7 126)) 534 | 535 | Evaluation count : 643680 in 6 samples of 107280 calls. 536 | Execution time mean : 900.028340 ns 537 | Execution time std-deviation : 25.156556 ns 538 | Execution time lower quantile : 873.937425 ns ( 2.5%) 539 | Execution time upper quantile : 927.532690 ns (97.5%) 540 | Overhead used : 1.804565 ns 541 | 542 | (defn create-binary-tree [depth counter-atom] 543 | (when (> depth 0) 544 | (let [val @counter-atom] 545 | (swap! counter-atom inc ) 546 | {:value val 547 | :left (create-binary-tree (- depth 1) counter-atom ) 548 | :right (create-binary-tree (- depth 1) counter-atom )}))) 549 | 550 | (defn binary-tree-DFS [root target] 551 | (if (nil?
root) 552 | false 553 | (or (= (:value root) target) 554 | (binary-tree-DFS (:left root) target) 555 | (binary-tree-DFS (:right root) target)))) 556 | 557 | (defn binary-tree-DFS-test [depth target] 558 | (binary-tree-DFS (create-binary-tree depth (atom 0)) 126)) 559 | 560 | Evaluation count : 46068 in 6 samples of 7678 calls. 561 | Execution time mean : 12.656700 µs 562 | Execution time std-deviation : 244.046759 ns 563 | Execution time lower quantile : 12.465987 µs ( 2.5%) 564 | Execution time upper quantile : 13.059028 µs (97.5%) 565 | Overhead used : 1.804565 ns 566 | 567 | #+end_src 568 | 569 | We start at 14x slower, although there is a lot of incidental overhead to 570 | explore: 571 | 572 | - keyword access, 573 | - map allocation, 574 | - recursion, 575 | - using atom as a mutable numeric counter, 576 | - boxed numeric comparisons 577 | 578 | with potentially lots of room to improve. 579 | 580 | #+begin_src clojure 581 | (with-unchecked 582 | (defn create-binary-tree2 [^long depth counter-atom] 583 | (when (> depth 0) 584 | (let [val @counter-atom] 585 | (swap! counter-atom inc) 586 | {:value val 587 | :left (create-binary-tree2 (- depth 1) counter-atom) 588 | :right (create-binary-tree2 (- depth 1) counter-atom)})))) 589 | 590 | (defn binary-tree-DFS2 [root ^long target] 591 | (if (nil? root) 592 | false 593 | (or (== (root :value) target) 594 | (binary-tree-DFS2 (root :left) target) 595 | (binary-tree-DFS2 (root :right) target)))) 596 | 597 | (defn binary-tree-DFS-test2 [depth target] 598 | (binary-tree-DFS2 (create-binary-tree2 depth (atom 0)) 126)) 599 | 600 | Evaluation count : 115992 in 6 samples of 19332 calls. 
601 | Execution time mean : 5.588021 µs 602 | Execution time std-deviation : 779.251559 ns 603 | Execution time lower quantile : 5.140534 µs ( 2.5%) 604 | Execution time upper quantile : 6.925430 µs (97.5%) 605 | Overhead used : 2.332732 ns 606 | 607 | Found 1 outliers in 6 samples (16.6667 %) 608 | low-severe 1 (16.6667 %) 609 | Variance from outliers : 31.8454 % Variance is moderately inflated by outliers 610 | #+end_src 611 | 612 | At 6.2x, unboxed numerics and faster keyword access help a bit, but they 613 | are not the choke point. We are still allocating though, so building the tree is 614 | probably the slow point. 615 | 616 | As before, we know that types are barebones classes. Direct class 617 | instantiation is faster than map creation, and direct field access 618 | is faster than key lookup. We can also probably gain a bit of 619 | speed by looking at our counter, switching from an atom to a 620 | volatile for perhaps a little gain: 621 | 622 | #+begin_src clojure 623 | (deftype binary-node [^int value left right]) 624 | 625 | (with-unchecked 626 | (defn create-binary-tree3 [^long depth counter-atom] 627 | (when (> depth 0) 628 | (let [^long val @counter-atom] 629 | (vreset! counter-atom (inc val)) 630 | (binary-node. val 631 | (create-binary-tree3 (- depth 1) counter-atom) 632 | (create-binary-tree3 (- depth 1) counter-atom)))))) 633 | 634 | (defn binary-tree-DFS3 [^binary-node root ^long target] 635 | (if (nil? root) 636 | false 637 | (or (== (.value root) target) 638 | (binary-tree-DFS3 (.left root) target) 639 | (binary-tree-DFS3 (.right root) target)))) 640 | 641 | (defn binary-tree-DFS-test3 [depth target] 642 | (binary-tree-DFS3 (create-binary-tree3 depth (volatile! 0)) 126)) 643 | Evaluation count : 222192 in 6 samples of 37032 calls. 
644 | Execution time mean : 2.665373 µs 645 | Execution time std-deviation : 69.489473 ns 646 | Execution time lower quantile : 2.580740 µs ( 2.5%) 647 | Execution time upper quantile : 2.737338 µs (97.5%) 648 | Overhead used : 1.804565 ns 649 | #+end_src 650 | 651 | So that leaves 2.96x; using a custom type and a volatile as a mutable 652 | counter gets us much closer. One difference with the java implementation 653 | is the use of the counter; it's a primitive int array leading to 654 | unboxed operations and primitive math. Our counter (either an atom 655 | or a volatile) has a tad bit of overhead compared to mutating a 656 | primitive array. Let's copy the java implementation and use 657 | an array: 658 | 659 | #+begin_src clojure 660 | (with-unchecked 661 | (defn create-binary-tree4 [^long depth ^ints counter] 662 | (when (> depth 0) 663 | (let [val (aget counter 0)] 664 | (aset counter 0 (inc val)) 665 | (binary-node. val 666 | (create-binary-tree4 (- depth 1) counter) 667 | (create-binary-tree4 (- depth 1) counter)))))) 668 | 669 | (defn binary-tree-DFS4 [^binary-node root ^long target] 670 | (if root 671 | (or (== (.value root) target) 672 | (binary-tree-DFS4 (.left root) target) 673 | (binary-tree-DFS4 (.right root) target)) 674 | false)) 675 | 676 | (defn binary-tree-DFS-test4 [depth target] 677 | (binary-tree-DFS4 (create-binary-tree4 depth (doto (int-array 1) (aset 0 1))) 126)) 678 | 679 | Evaluation count : 524934 in 6 samples of 87489 calls. 680 | Execution time mean : 1.158351 µs 681 | Execution time std-deviation : 46.874432 ns 682 | Execution time lower quantile : 1.116454 µs ( 2.5%) 683 | Execution time upper quantile : 1.222972 µs (97.5%) 684 | Overhead used : 1.804565 ns 685 | #+end_src 686 | 687 | That leaves us with 1.27x, and like the java version, we use a mutable int array 688 | as a counter to save time on boxing with the volatile. 
There are perhaps more 689 | non-obvious optimizations, but I'm stopping here for now since the code is still relatively 690 | high-level and fairly idiomatic. 691 | 692 | * 3.6 Binary Tree BFS 693 | 694 | "The binary tree BFS, similar to the binary tree DFS experiment consisted 695 | of searching a binary tree for a value it did not contain, but using breadth 696 | first search. The breadth first search was implemented iteratively in both 697 | languages. In Java the binary tree was represented by a custom class while in 698 | Clojure they were represented using nested persistent maps." 699 | 700 | #+begin_src java 701 | public boolean binaryTreeBFS(BinaryTreeNode root, int target) 702 | {Queue<BinaryTreeNode> queue = new LinkedList<>() ; 703 | queue.add(root) ; 704 | while (! queue.isEmpty()) 705 | {BinaryTreeNode item = queue.poll(); 706 | if (item.value == target) return true; 707 | if (item.left != null) queue.add (item.left); 708 | if (item.right != null) queue.add (item.right);} 709 | return false;} 710 | 711 | //Added by joinr 712 | public boolean binaryTreeBFSTest(int depth, int target) 713 | { 714 | int[] counter = new int[1]; 715 | counter[0] = 0; 716 | return binaryTreeBFS(createBinaryTree(depth,counter),target); 717 | } 718 | #+end_src 719 | 720 | Here we are comparing a java implementation - based on a mutable 721 | queue (based on a doubly linked list) for the search fringe - 722 | against a clojure implementation that uses a persistent queue. 723 | 724 | #+begin_src clojure 725 | performancepaper.core> (c/quick-bench (bench.core/binaryTreeBFSTest 7 126)) 726 | Evaluation count : 465144 in 6 samples of 77524 calls.
727 | Execution time mean : 1.325622 µs 728 | Execution time std-deviation : 31.643248 ns 729 | Execution time lower quantile : 1.301545 µs ( 2.5%) 730 | Execution time upper quantile : 1.376586 µs (97.5%) 731 | Overhead used : 1.804565 ns 732 | 733 | (defn binary-tree-BFS [root target] 734 | (loop [queue (conj clojure.lang.PersistentQueue/EMPTY root)] 735 | (if (empty? queue) 736 | false 737 | (let [item (peek queue)] 738 | (if (= target (:value item)) 739 | true 740 | (recur (as-> (pop queue) $ 741 | (if (nil? (:left item)) 742 | $ 743 | (conj $ (:left item))) 744 | (if (nil? (:right item)) 745 | $ 746 | (conj $ (:right item)))))))))) 747 | 748 | (defn binary-tree-BFS-test [depth tgt] 749 | (binary-tree-BFS (create-binary-tree depth (atom 0)) 126)) 750 | 751 | performancepaper.core> (c/quick-bench (binary-tree-BFS-test 7 126)) 752 | Evaluation count : 23448 in 6 samples of 3908 calls. 753 | Execution time mean : 27.534318 µs 754 | Execution time std-deviation : 3.168409 µs 755 | Execution time lower quantile : 25.831461 µs ( 2.5%) 756 | Execution time upper quantile : 32.973576 µs (97.5%) 757 | Overhead used : 1.804565 ns 758 | 759 | Found 1 outliers in 6 samples (16.6667 %) 760 | low-severe 1 (16.6667 %) 761 | Variance from outliers : 31.1481 % Variance is moderately inflated by outliers 762 | 763 | #+end_src 764 | 765 | As expected, the map-based clojure implementation with its persistent queue is 20.8x 766 | slower than the java implementation that stores information in plain classes and 767 | uses a mutable queue. Let's apply the lessons from DFS and use 768 | our ~deftype~ based nodes to build the tree, then search it: 769 | 770 | #+begin_src clojure 771 | (defn binary-tree-BFS-test2 [depth tgt] 772 | (binary-tree-BFS (create-binary-tree4 depth (doto (int-array 1) (aset 0 0))) 126)) 773 | 774 | performancepaper.core> (c/quick-bench (binary-tree-BFS-test2 7 126)) 775 | Evaluation count : 509616 in 6 samples of 84936 calls.
776 | Execution time mean : 1.221056 µs 777 | Execution time std-deviation : 28.469631 ns 778 | Execution time lower quantile : 1.193429 µs ( 2.5%) 779 | Execution time upper quantile : 1.257014 µs (97.5%) 780 | Overhead used : 1.804565 ns 781 | #+end_src 782 | 783 | We end up at 0.89x, which is surprisingly a bit faster. I would 784 | naively expect mutable implementations to have a 2-4x edge in most 785 | cases, but we may have a niche for the persistent queue here. One caveat: ~binary-tree-BFS~ still reads nodes with keyword lookups such as ~(:value item)~, and a plain ~deftype~ does not support keyword lookup, so those calls return nil on ~binary-node~ instances. If that is what happens here, the loop stops after the root and this timing mostly reflects tree construction. 786 | 787 | 788 | * Conclusion 789 | I ran through basic optimizations and idiomatic rewrites to explore each benchmark, using 790 | criterium to compare the java implementation and the clojure ones. 791 | 792 | I started with the original implementations from the paper, then added 793 | derivative versions suffixed by N, e.g. some-fn, some-fn2, some-fn3, etc. 794 | 795 | The goal here was to provide a layered approach to showing the impact of each 796 | optimization. In almost all cases (except for the BFS test, whose performance I don't 797 | fully understand), we see a typical pattern: 798 | 799 | - the clojure implementation starts off about 10x worse or more, 800 | - then you get some immediate gains with low-hanging optimizations, 801 | - then eventually converge on typed java interop in the limit to get either 802 | - equivalent performance, within some percentage (like 18% or less), 803 | - or better in a few cases. 804 | 805 | The BFS benchmark in clojure was surprisingly a bit better using a persistent queue 806 | with similar optimization from the DFS, which is interesting since I would 807 | "imagine" that the mutable queue implementation in the jvm version would have an 808 | advantage. 809 | 810 | Other than that, the other benchmarks are predictable (from an experimental perspective). 811 | 812 | I guess the real interest is comparing apples:apples in such microbenchmarks.
The 813 | evolutionary pattern of optimization was to start with perhaps intentionally naive 814 | clojure implementations - which leverage persistent structures and boxed math, and 815 | carry a bit of overhead compared to their statically typed java counterparts - 816 | and then gradually morph toward something closer to the host (java) to level 817 | the playing field. We add hints and primitive math, leverage efficient class-based 818 | field access and instantiation, and where necessary, direct java interop to compete 819 | with java. 820 | 821 | I'd like to address some points made by the author: 822 | 823 | ** Optimality Criticism 824 | Like any good researcher, the author addresses some possible criticisms openly: 825 | 826 | - "All of the code tested was implemented by the researcher and it might not be 827 | optimal for some experiments, meaning that there might exist faster 828 | solutions." 829 | 830 | I think we have demonstrated that this is the case for the sample code, although 831 | I'm not entirely certain whether the code in this repository is admissible under 832 | potentially unstated criteria in the original paper. If there are no 833 | constraints placed on Clojure, we can typically approach Java performance given 834 | the tight level of host interop (as well as more esoteric techniques like 835 | runtime bytecode gen via asm and similar libraries). I think the raw 836 | java implementations still edge out clojure in like-for-like cases (e.g. 837 | primitive math and mutable collections), but the margins are certainly 838 | far smaller than the range demonstrated in the paper (at least for the 839 | subset of testing performed here). 840 | 841 | - "This work is intended for private persons and companies to use when 842 | evaluating which language to use for their programming projects. This saves 843 | time and potentially money for the readers, benefiting the society’s economic 844 | sustainability positively, albeit very little."
845 | 846 | - "These results strongly suggest that the use of Clojure over Java comes with a 847 | cost of both startup and runtime performance." 848 | 849 | I hope to provide - if not additional context for pedagogical reasons - a bit of 850 | a counterpoint to the observations in the paper. 851 | 852 | # LocalWords: interop hashmap barebones clojure hashmaps 853 | --------------------------------------------------------------------------------