# The Array Olympics!

## Overview

This repo is for collecting thoughts and ideas for benchmarking ndarray computations. We are less focused on the speed of core numerical computations (which have been benchmarked elsewhere, e.g. [here](http://lessthanoptimal.github.io/Java-Matrix-Benchmark/) and [here](http://julialang.org/benchmarks/)) and more focused on how different ndarray abstractions perform across complex, real-world workflows that apply multiple functions, aggregations, and reshapings of data over different dimensions, especially in distributed computing settings.

For now, this repo collects notes on the **events** we want to consider, the **settings** in which we want to evaluate those events, and the **languages** we want to include. Moving forward, we can use it to collect implementations of the benchmarks, a CLI to run them, and a small website to display the results.

## Settings

(Note that some of these settings apply only to some of the languages / frameworks.)

Single node
- 1 core
- 8 cores
- GPU

Multiple nodes
- 5 nodes
- 10 nodes
- 20 nodes

## Events

Suggested events include:

1. Aggregation
2. Mapped function + aggregation
3. Burrows–Wheeler transform
4. Aggregate images by key
5. SVD (including out-of-core)

Detailed descriptions are available in the [wiki](https://github.com/freeman-lab/array-olympics/wiki).

## Languages / Frameworks

Many of us do a lot of this kind of work in Python, but in the Spark setting we should also evaluate against Scala, and in the local setting it would be interesting to compare against other languages (including Julia and JavaScript).

Note that all languages are appropriate for the "single node + 1 core" setting, but only some are relevant to the other settings (e.g. only Spark for the multiple-node settings, only torch and theano for GPUs).

- numpy (python)
- theano (python)
- dask.array (python)
- scijs-ndarray (javascript)
- torch (lua)
- spark (scala)
- spark (python)
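To make the events more concrete, here is a minimal single-node sketch of all five events using numpy (the first entry above), plus a small timing helper. It is illustrative only: the actual event definitions live in the wiki, so the array shapes, the reduction axis, the `sqrt(|x|)` mapped function, the integer image keys, and the `$` sentinel in the Burrows–Wheeler transform are all assumptions, and the helper names (`timed`, `aggregation`, `mapped_aggregation`, `bwt`, `aggregate_by_key`, `svd`) are hypothetical. The SVD here is in-memory only; the out-of-core variant would need a chunked framework such as dask.array.

```python
import time

import numpy as np


def timed(fn, *args, repeats=5):
    """Run fn(*args) `repeats` times; return (best wall-clock seconds, last result)."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def aggregation(x):
    # Event 1 (assumed form): sum over the first axis.
    return x.sum(axis=0)


def mapped_aggregation(x):
    # Event 2 (assumed form): elementwise sqrt(|x|), then mean over the first axis.
    return np.sqrt(np.abs(x)).mean(axis=0)


def bwt(s):
    # Event 3: naive Burrows-Wheeler transform via sorted rotations, O(n^2 log n).
    # Assumes "$" does not occur in s; it is appended as the end-of-string sentinel.
    s = s + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def aggregate_by_key(images, keys):
    # Event 4 (assumed form): mean image per integer key.
    # images has shape (n, h, w); keys has shape (n,).
    return {int(k): images[keys == k].mean(axis=0) for k in np.unique(keys)}


def svd(x):
    # Event 5, in-memory only: thin SVD, returns (u, s, vt).
    return np.linalg.svd(x, full_matrices=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2000, 500))
    images = rng.standard_normal((100, 64, 64))
    keys = rng.integers(0, 10, size=100)
    for name, fn, args in [
        ("aggregation", aggregation, (x,)),
        ("mapped function + aggregation", mapped_aggregation, (x,)),
        ("Burrows-Wheeler transform", bwt, ("banana" * 100,)),
        ("aggregate images by key", aggregate_by_key, (images, keys)),
        ("SVD", svd, (x,)),
    ]:
        seconds, _ = timed(fn, *args)
        print(f"{name}: {seconds:.4f} s")
```

Running equivalent functions in each framework on the same inputs would give a like-for-like comparison for the "single node + 1 core" setting. Taking the best of several runs in `timed` is one common way to reduce timing noise, not a prescribed methodology for this repo.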