└── README.md


/README.md:
--------------------------------------------------------------------------------
 1 | # The Array Olympics!
 2 | 
 3 | ## Overview
 4 | 
 5 | This repo is for collecting thoughts and ideas for benchmarking ndarray computations. We are less focused on the speed of core numerical computations (which have been covered elsewhere, e.g. [here](http://lessthanoptimal.github.io/Java-Matrix-Benchmark/) and [here](http://julialang.org/benchmarks/)), and more focused on how different ndarray abstractions perform across complex, real-world workflows -- applying multiple functions / aggregations / reshapings of data over different dimensions -- especially in distributed computing settings.
 6 | 
 7 | For now this repo can serve to collect notes on the **events** we want to consider, the **settings** in which we want to evaluate those events, and the **languages** we want to include. Moving forward, we can use this repo to collect implementations of the benchmarks, a CLI to run them, and a little website to show them.
 8 | 
 9 | ## Settings
10 | 
11 | (Note that some of these settings apply only to some of the languages / frameworks.)
12 | 
13 | Single node
14 | - 1 core
15 | - 8 cores
16 | - GPU
17 | 
18 | Multiple nodes
19 | - 5 nodes 
20 | - 10 nodes 
21 | - 20 nodes 
22 | 
23 | ## Events
24 | 
25 | Suggested events include:
26 | 
27 | 1. Aggregation
28 | 2. Mapped function + aggregation
29 | 3. Burrows–Wheeler transform
30 | 4. Aggregate images by key
31 | 5. SVD (including out-of-core)
32 | 
33 | Detailed descriptions are available in the [wiki](https://github.com/freeman-lab/array-olympics/wiki).
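
As a rough illustration of what events 1 and 2 might look like in the "single node + 1 core" setting, here is a minimal numpy sketch. The function names, the array shape, and the choice of mean / square / sum are assumptions made for illustration only; the actual event definitions are the ones described in the wiki.

```python
import timeit

import numpy as np


def aggregate(x):
    """Event 1 (assumed form): mean over the first axis."""
    return x.mean(axis=0)


def map_then_aggregate(x):
    """Event 2 (assumed form): elementwise square, then sum over the last axis."""
    return np.square(x).sum(axis=-1)


def time_event(func, x, repeat=3, number=5):
    """Best wall-clock seconds per call over `repeat` rounds of `number` calls."""
    timings = timeit.repeat(lambda: func(x), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    # Hypothetical input: a stack of 100 random 64 x 64 "images".
    rng = np.random.default_rng(0)
    data = rng.standard_normal((100, 64, 64))
    for event in (aggregate, map_then_aggregate):
        print(f"{event.__name__}: {time_event(event, data):.6f} s per call")
```

Taking the minimum over several rounds is one common way to reduce timing noise; other frameworks in the list below would need their own equivalents of these two functions.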
34 | 
35 | ## Languages / Frameworks
36 | 
37 | Many of us do a lot of this kind of work in Python, but for the Spark setting we should consider evaluating against Scala as well, and in the local setting it would be interesting to compare with other languages (including Julia and JavaScript).
38 | 
39 | Note that all languages are appropriate for the "single node + 1 core" setting, but only some are relevant to other settings (e.g. only Spark for the multiple-node settings, only torch and theano for GPUs, etc.).
40 | 
41 | - numpy (python)
42 | - theano (python)
43 | - dask.array (python)
44 | - scijs-ndarray (javascript)
45 | - torch (lua)
46 | - spark (scala)
47 | - spark (python)
48 | 
49 | 


--------------------------------------------------------------------------------