├── .gitignore
├── README.md
├── chapel
│   ├── fmtime.log
│   ├── kernel.chpl
│   ├── kernelmatrix.chpl
│   ├── script.chpl
│   ├── script.sh
│   └── time.log
├── charts
│   ├── benchplot.jpg
│   ├── benchplot.svg
│   ├── charts.r
│   ├── fmbenchplot.jpg
│   ├── fmbenchplot.svg
│   ├── ndsliceDiagnostic.jpg
│   └── ndsliceDiagnostic.svg
├── d
│   ├── arrays.d
│   ├── fmtime.log
│   ├── kernel.d
│   ├── mathdemo.d
│   ├── script.d
│   ├── script.sh
│   └── time.log
├── data
│   ├── chapelBench.csv
│   ├── dBench.csv
│   ├── dNDSliceBench.csv
│   ├── juliaBench.csv
│   └── ndsliceTime.log
├── docs
│   └── kernel.pdf
├── fmdata
│   ├── chapelBench.csv
│   ├── dBench.csv
│   └── juliaBench.csv
├── julia
│   ├── KernelMatrix.jl
│   ├── fmtime.log
│   ├── script.jl
│   ├── script.sh
│   └── time.log
├── ndslice
│   ├── dub.json
│   └── source
│       ├── app.d
│       └── ndslice
│           └── kernels.d
└── script.sh
/.gitignore:
--------------------------------------------------------------------------------
1 | *.o
2 | */.dub
3 | ndslice/ndslice
4 | d/mathdemo
5 | d/script
6 | ndslice/dub.selections.json
7 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # A look at Chapel, D, and Julia using Kernel Matrix calculations
2 |
3 | *Author: Dr Chibisi Chima-Okereke*
4 | *Date: 2020-06-11 (Updated article with IEEE and fast math calculations for Chapel, D, and Julia)*
5 |
6 | ## Introduction
7 |
8 | It seems that each time you turn around, there is a new programming language aimed at solving some specific problem set. The proliferation of programming languages is closely tied to the growth of data, and the increasing demand for “data science” computing is part of the same phenomenon. In the field of scientific computing, Chapel, D, and Julia are highly relevant programming languages. They arise from different needs and are aimed at different problem sets: Chapel focuses on data parallelism on multicore machines and large clusters, D was developed as a productive, safer alternative to C++, and Julia was created for technical and scientific computing. All three languages emphasize performance as a feature. This article benchmarks their performance on kernel matrix calculations and presents approaches to performance optimization and other usability features of the languages.
9 |
10 | Kernel matrix calculations are the basis of kernel methods in machine learning applications. They scale rather poorly, as `O(m n^2)`, where `n` is the number of items and `m` is the number of elements in each item. In our exercise `m` is held constant and we look at the execution time of each implementation as `n` increases. Here `m = 784` and `n = 1k, 5k, 10k, 20k, 30k`; each calculation is run 3 times and the average is taken. We disallow any use of BLAS and only allow packages or modules from the standard library of each language. D does not have "mathematical"-style arrays as Julia and Chapel do, so a Matrix object is implemented. The performance of this Matrix object is then compared with calculations using Mir, a multidimensional array package written in D, to make sure that the Matrix implementation reflects the true performance of D. The details for the calculation of the kernel matrix and kernel functions are given [here](https://github.com/dataPulverizer/KernelMatrixBenchmark/blob/master/docs/kernel.pdf).
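As a rough illustration of where the `O(m n^2)` cost comes from, here is a minimal pure-Python sketch (not the benchmark code; `kernel_matrix`, `dot_product`, and `pair_evaluations` are hypothetical names). It assumes each item is an equal-length sequence of floats, evaluates each unordered pair once, and mirrors the value across the diagonal, which is the same symmetry the Chapel, D, and Julia implementations below exploit.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot_product(x: Vector, y: Vector) -> float:
    """Dot-product kernel: O(m) work for items of length m."""
    return sum(a * b for a, b in zip(x, y))


def kernel_matrix(kernel: Kernel, items: List[Vector]) -> List[List[float]]:
    """Evaluate the kernel once per pair (i >= j) and mirror it across the diagonal."""
    n = len(items)
    mat = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):  # lower triangle, including the diagonal
            mat[i][j] = kernel(items[i], items[j])
            mat[j][i] = mat[i][j]  # kernel matrices are symmetric
    return mat


def pair_evaluations(n: int) -> int:
    """Number of kernel calls made by kernel_matrix: n(n + 1) / 2."""
    return n * (n + 1) // 2
```

For the largest case, `pair_evaluations(30000)` is 450,015,000 kernel calls, each touching `m = 784` elements.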
11 |
12 | Two benchmark types are given: one in IEEE mode (the default) and another in fast math mode, which violates IEEE floating point calculation standards. Fast math allows reassociation transformations of floating point instructions, which can give results that differ from the IEEE case; it lets the compiler assume that `NaN` and `Inf` never occur, substitutes approximate calculations for functions such as `log`, `sin`, and `sqrt`, and makes [other concessions to standard practice](https://llvm.org/docs/LangRef.html#fast-math-flags) to boost performance. In most real world applications the IEEE standard rather than fast math would be used. However, as with other compiler safety features such as bounds checking, once production ready code has been created, fast math can give an added performance boost if the analyst is sure, and has tests (perhaps with proofs) showing, that using it will not adversely affect the result.
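Reassociation matters because floating point addition is not associative. The minimal Python sketch below (Python floats are IEEE 754 doubles on typical platforms) sums the same three numbers in two orders. A compiler that is allowed to reassociate, for example to split a reduction into several partial sums for SIMD, can change results in the same way. The function names are illustrative only.

```python
def left_to_right(values):
    """Sum in the order written: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def right_to_left(values):
    """Same values, opposite association: v0 + (v1 + (v2 + ...))."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


values = [0.1, 0.2, 0.3]
print(left_to_right(values))  # 0.6000000000000001
print(right_to_left(values))  # 0.6
```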
13 |
14 | While preparing the code for this article, the Chapel, D, and Julia communities were very helpful and patient with all enquiries, so they are acknowledged here.
15 |
16 | In terms of bias, going in I was much more familiar with D and Julia than with Chapel. However, getting the best performance from each language required a lot of interaction with each programming community, and I have done my best to be aware of my biases and to correct for them. If the reader has any issue with the way this analysis has been conducted, they can raise it in the [GitHub repository](https://github.com/dataPulverizer/KernelMatrixBenchmark) where the code to carry out the calculation is located.
17 |
18 | ## Language Benchmarks for Kernel Matrix Calculation
19 |
20 |
21 | The [above chart](https://github.com/dataPulverizer/KernelMatrixBenchmark/blob/master/charts/charts.r) shows the benchmark time in seconds (log scale) against the number of items (`n` as above) for nine kernels, each executed in Chapel, D, and Julia using IEEE mathematics. The chart below repeats the benchmark using fast math calculations in each language.
22 |
23 |
24 | In the IEEE floating point case, Julia performs better than D and Chapel in all but the `log` and `power` kernels, where D performs better. When fast math is used, Julia falls behind Chapel and D in all but the `power` kernel benchmark, where D performs best and Julia second. Chapel and D show very similar performance in all but the `log` and `power` benchmarks.
25 |
26 | In the interest of transparency, the mathematics functions used in D were taken from C's math library, which the D compiler makes available through its [`core.stdc.math`](https://dlang.org/library/core/stdc/math.html) module. This was done because the mathematical functions in D's standard library [`std.math`](https://dlang.org/phobos/std_math.html) can be slow. The math functions used are given in the import statement in the [script for calculating kernel functions](https://github.com/dataPulverizer/KernelMatrixBenchmark/blob/master/d/kernel.d). By way of comparison, consider the [mathdemo.d](https://github.com/dataPulverizer/KernelMatrixBenchmark/blob/master/d/mathdemo.d) script, which compares the imported C `log` function with D's `log` function from `std.math`:
27 |
28 | ```bash
29 | $ ldc2 -O --boundscheck=off --mcpu=native mathdemo.d && ./mathdemo
30 | Time taken for c log: 0.58623 seconds.
31 | Time taken for d log: 2.3747 seconds.
32 | ```
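The source of `mathdemo.d` is not reproduced here, so the following is only a hedged Python sketch of the kind of timing harness behind such comparisons, combined with the "run 3 times and average" rule described earlier. `average_runtime` and `log_loop` are hypothetical names, and any timings it prints are not comparable to the D numbers above.

```python
import math
import time
from typing import Callable


def average_runtime(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn `repeats` times and return the mean wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / repeats


def log_loop(count: int = 1_000_000) -> float:
    """Hypothetical workload: accumulate log(k) for k = 1..count."""
    acc = 0.0
    for k in range(1, count + 1):
        acc += math.log(k)
    return acc


print(f"math.log loop: {average_runtime(lambda: log_loop(100_000)):.4f} seconds")
```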
33 |
34 | #### Suitability of Matrix object used
35 | The Matrix object used in the D benchmark was implemented specifically because the use of modules outside language standard libraries was disallowed for this article (discussed later). To make sure that this implementation is competitive, i.e. that it does not unfairly represent D's performance, it is compared with Mir's ndslice library, which is also written in D. The chart below shows the difference in execution time of the kernel matrix calculation between the Matrix implementation and ndslice, as a percentage of Matrix's kernel benchmark running time. Negative values mean that ndslice is slower and positive values mean that ndslice is faster. Performance is about the same across the kernels: sometimes ndslice is slightly faster and at other times slightly slower, so the Matrix object used is a fair representation of D's performance.
36 |
37 |
38 |
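The percentage plotted in the ndslice chart can be written as a one-line formula. The sketch below assumes it is computed exactly as the text describes, `100 * (t_matrix - t_ndslice) / t_matrix`, so that positive values mean ndslice was faster; the function name is hypothetical and `charts.r` may compute it differently.

```python
def ndslice_percent_difference(matrix_seconds: float, ndslice_seconds: float) -> float:
    """Running-time difference as a percentage of the Matrix running time.

    Positive: ndslice was faster. Negative: ndslice was slower.
    """
    if matrix_seconds <= 0:
        raise ValueError("matrix_seconds must be positive")
    return 100.0 * (matrix_seconds - ndslice_seconds) / matrix_seconds
```

For example, `ndslice_percent_difference(10.0, 9.5)` returns `5.0`, meaning ndslice took 5% less time than Matrix.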
39 | ## Environment
40 |
41 | The code was run on a computer with an Ubuntu 20.04 OS, 32 GB memory and an Intel® Core™ i9-8950HK CPU @ 2.90GHz with 6 cores and 12 threads.
42 |
43 | ```bash
44 | $ julia --version
45 | julia version 1.4.1
46 | ```
47 |
48 | ```bash
49 | $ dmd --version
50 | DMD64 D Compiler v2.090.1
51 | ```
52 |
53 | ```bash
54 | $ ldc2 --version
55 | LDC - the LLVM D compiler (1.18.0):
56 | based on DMD v2.088.1 and LLVM 9.0.0
57 | ```
58 |
59 | ```bash
60 | $ chpl --version
61 | chpl version 1.22.0
62 | ```
63 |
64 | ### Compilation
65 |
66 | Compilation is done with scripts: see the `script.sh` file in each language folder and the `script.sh` script in the [home folder](https://github.com/dataPulverizer/KernelMatrixBenchmark) of the repository.
67 |
68 | ## Implementations
69 |
70 | Efforts were made to avoid non-standard libraries while implementing these kernel functions. The reasons for this are:
71 |
72 | * It is completely transparent and shows how each language works. This article is about the programming languages and what the reader can expect in terms of performance and other factors discussed later. Packages can sometimes give a false impression about what using a programming language is like.
73 | * Packages outside standard libraries can go extinct so avoiding external libraries keeps the article and code relevant.
74 | * It makes it easy for a reader to copy and run the code after installing the language. Having to install external libraries can be a bit of a "faff".
75 |
76 |
77 | ### Chapel
78 |
79 | Chapel uses a `forall` loop to parallelize over threads, with `guided` iteration (from the `DynamicIters` module) over the indices. Array rows and columns are usually indexed in a user friendly manner, but below they are accessed using pointers (`c_ptr`), which can be a way of boosting performance.
80 |
81 | ```chpl
82 | proc calculateKernelMatrix(K, data: [?D] ?T)
83 | {
84 |   var n = D.dim(0).last; // assumes data is indexed from (1, 1)
85 |   var p = D.dim(1).last;
86 |   var E: domain(2) = {D.dim(0), D.dim(0)};
87 |   var mat: [E] T;
88 |   var rowPointers: [1..n] c_ptr(T) =
89 |     forall i in 1..n do c_ptrTo(data[i, 1]);
90 |
91 |   forall j in guided(1..n by -1) {
92 |     for i in j..n {
93 |       mat[i, j] = K.kernel(rowPointers[i], rowPointers[j], p);
94 |       mat[j, i] = mat[i, j];
95 |     }
96 |   }
97 |   return mat;
98 | }
99 | ```
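The pointer trick relies on each row of `data` being contiguous in memory, so that `xrow[k]` inside a kernel walks along a single row. A hedged Python analogue uses a flat `array` buffer and integer row offsets in place of `c_ptr` values. `row_offsets` and `dot_from_offsets` are hypothetical names.

```python
from array import array
from typing import List


def row_offsets(n_rows: int, n_cols: int) -> List[int]:
    """Start offset of each row in a row-major buffer (the analogue of rowPointers)."""
    return [i * n_cols for i in range(n_rows)]


def dot_from_offsets(buf: array, xoff: int, yoff: int, p: int) -> float:
    """Dot product of two rows read straight from the flat buffer, like xrow[k] * yrow[k]."""
    total = 0.0
    for k in range(p):
        total += buf[xoff + k] * buf[yoff + k]
    return total


# Two rows of three elements stored back to back: [1, 2, 3] and [4, 5, 6].
buf = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
offsets = row_offsets(2, 3)
print(dot_from_offsets(buf, offsets[0], offsets[1], 3))  # 32.0
```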
100 |
101 | ### D
102 |
103 | D uses a `taskPool` of threads from its `std.parallelism` module to parallelize code. The D code underwent the least change for performance optimization; much of the performance benefit came from the specific compiler used and the flags selected (discussed later). The implementation of `Matrix` allows columns to be selected by reference with `refColumnSelect`.
104 |
105 | ```d
106 | auto calculateKernelMatrix(alias K, T)(K!(T) kernel, Matrix!(T) data)
107 | {
108 |     long n = data.ncol; // each item is stored as a column of data
109 |     auto mat = Matrix!(T)(n, n);
110 |
111 |     foreach(j; taskPool.parallel(iota(n))) // taskPool: std.parallelism; iota: std.range
112 |     {
113 |         auto arrj = data.refColumnSelect(j).array;
114 |         foreach(long i; j..n)
115 |         {
116 |             mat[i, j] = kernel(data.refColumnSelect(i).array, arrj);
117 |             mat[j, i] = mat[i, j];
118 |         }
119 |     }
120 |     return mat;
121 | }
122 | ```
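The `Matrix` type and `refColumnSelect` are not shown in this README, so the following is a hypothetical Python sketch of the same idea. It assumes column-major storage (each item is a column, as `data.ncol` above suggests) and returns a column as a zero-copy `memoryview` slice rather than a copy. `ColumnMajorMatrix` and `ref_column` are invented names.

```python
from array import array
from typing import Tuple


class ColumnMajorMatrix:
    """Minimal column-major matrix: column j occupies buf[j*nrow:(j+1)*nrow]."""

    def __init__(self, nrow: int, ncol: int) -> None:
        self.nrow = nrow
        self.ncol = ncol
        self.buf = array("d", [0.0] * (nrow * ncol))

    def __getitem__(self, idx: Tuple[int, int]) -> float:
        i, j = idx
        return self.buf[i + j * self.nrow]

    def __setitem__(self, idx: Tuple[int, int], value: float) -> None:
        i, j = idx
        self.buf[i + j * self.nrow] = value

    def ref_column(self, j: int) -> memoryview:
        """Column j as a zero-copy view (a stand-in for refColumnSelect)."""
        start = j * self.nrow
        return memoryview(self.buf)[start:start + self.nrow]
```

Writes made through the returned view land in the matrix itself, which is what distinguishes selecting a column by reference from copying it.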
123 |
124 | ### Julia
125 |
126 | The Julia code uses the `@threads` macro to parallelise the loop and the `@views` macro to reference arrays. One confusing thing about Julia's arrays is their reference status. Sometimes, as in this case, arrays behave like value objects: slicing with `data[:, i]` generates a copy unless `@views` is used. At other times they behave like reference objects, for example when they are passed into a function. This can be tricky to deal with because it is not always obvious which operations will generate a copy, but where copying occurs `@views` provides a good solution.
127 |
128 | Julia also has a `Symmetric` matrix type (in the `LinearAlgebra` standard library), which means that assigning values to both triangles of the matrix is not necessary.
129 |
130 | ```jl
131 | function calculateKernelMatrix(Kernel::K, data::Array{T}) where {K <: AbstractKernel,T <: AbstractFloat}
132 |     n = size(data)[2]
133 |     mat = zeros(T, n, n)
134 |     @threads for j in 1:n  # needs `using Base.Threads`
135 |         @views for i in j:n
136 |             mat[i,j] = kernel(Kernel, data[:, i], data[:, j])
137 |         end
138 |     end
139 |     return Symmetric(mat, :L)  # needs `using LinearAlgebra`; only the lower triangle is read
140 | end
141 | ```
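`Symmetric(mat, :L)` wraps `mat` so that only its lower triangle is consulted. A minimal Python analogue of that idea (not Julia's implementation; `LowerSymmetric` is a hypothetical name) simply redirects reads above the diagonal to the mirrored cell:

```python
from typing import List, Tuple


class LowerSymmetric:
    """Read-only symmetric view over a square matrix whose lower triangle is filled.

    Entries above the diagonal are never read from storage:
    (i, j) with i < j is answered from (j, i).
    """

    def __init__(self, mat: List[List[float]]) -> None:
        self.mat = mat

    def __getitem__(self, idx: Tuple[int, int]) -> float:
        i, j = idx
        return self.mat[i][j] if i >= j else self.mat[j][i]


# Only the lower triangle holds real values; the 0.0 above the diagonal is ignored.
s = LowerSymmetric([[1.0, 0.0], [2.0, 3.0]])
print(s[0, 1], s[1, 0])  # 2.0 2.0
```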
142 |
143 | The `@inbounds` and `@simd` macros in the kernel functions were used to turn bounds checking off and apply SIMD optimization to the calculations:
144 |
145 | ```jl
146 | struct DotProduct <: AbstractKernel end
147 | @inline function kernel(K::DotProduct, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T,N}
148 |     ret = zero(T)
149 |     m = length(x)
150 |     @inbounds @simd for k in 1:m
151 |         ret += x[k] * y[k]
152 |     end
153 |     return ret
154 | end
155 | ```
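For reference, the nine kernels used in the benchmark can be written compactly. The following is a hedged Python transcription of the Chapel definitions in `chapel/kernel.chpl`. The function names and the zero-distance guard in `wave` are my own additions (the Chapel version has no such guard). It documents the formulas and is not meant to be fast.

```python
import math
from typing import Iterator, Sequence

Vec = Sequence[float]


def _abs_diffs(x: Vec, y: Vec) -> Iterator[float]:
    """Yield |x_k - y_k| for each element pair."""
    return (abs(a - b) for a, b in zip(x, y))


def dot_product(x: Vec, y: Vec) -> float:
    return sum(a * b for a, b in zip(x, y))


def gaussian(x: Vec, y: Vec, theta: float) -> float:
    dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-math.sqrt(dist) / theta)


def polynomial(x: Vec, y: Vec, d: float, offset: float) -> float:
    # With a negative base and non-integer d, Python's ** returns a complex number.
    return (dot_product(x, y) + offset) ** d


def exponential(x: Vec, y: Vec, theta: float) -> float:
    return math.exp(-sum(_abs_diffs(x, y)) / theta)


def log_kernel(x: Vec, y: Vec, beta: float) -> float:
    dist = sum(t ** beta for t in _abs_diffs(x, y)) ** (1 / beta)
    return -math.log(1 + dist)


def cauchy(x: Vec, y: Vec, theta: float) -> float:
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) / theta
    return 1 / (1 + dist)


def power(x: Vec, y: Vec, beta: float) -> float:
    return -(sum(t ** beta for t in _abs_diffs(x, y)) ** (1 / beta))


def wave(x: Vec, y: Vec, theta: float) -> float:
    dist = sum(_abs_diffs(x, y))
    if dist == 0:
        return 1.0  # limit of (theta/d) * sin(d/theta) as d -> 0; not in kernel.chpl
    tmp = theta / dist
    return tmp * math.sin(1 / tmp)


def sigmoid(x: Vec, y: Vec, beta0: float, beta1: float) -> float:
    return math.tanh(beta0 * dot_product(x, y) + beta1)
```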
156 |
157 | These optimizations are quite visible but easy to apply. Note that Julia has the `@fastmath` macro for applying fast math to individual code lines/blocks but only the command line option was used in this analysis.
158 |
159 | ## Memory Usage
160 |
161 | The total time and memory used for each benchmark were captured using the `/usr/bin/time -v` command. The output for each of the languages is given below:
162 |
163 | The complete calculation in Chapel took the longest to execute but consumed a moderate amount of memory (nearly 9 GB peak RAM):
164 | ```
165 | Command being timed: "./script --verbose=true --fastmath=false"
166 | User time (seconds): 114342.55
167 | System time (seconds): 17.96
168 | Percent of CPU this job got: 1192%
169 | Elapsed (wall clock) time (h:mm:ss or m:ss): 2:39:53
170 | Average shared text size (kbytes): 0
171 | Average unshared data size (kbytes): 0
172 | Average stack size (kbytes): 0
173 | Average total size (kbytes): 0
174 | Maximum resident set size (kbytes): 9266328
175 | Average resident set size (kbytes): 0
176 | Major (requiring I/O) page faults: 0
177 | Minor (reclaiming a frame) page faults: 2315637
178 | Voluntary context switches: 625
179 | Involuntary context switches: 3419118
180 | Swaps: 0
181 | File system inputs: 0
182 | File system outputs: 8
183 | Socket messages sent: 0
184 | Socket messages received: 0
185 | Signals delivered: 0
186 | Page size (bytes): 4096
187 | Exit status: 0
188 | ```
189 |
190 | D consumed the most memory (around 20 GB peak RAM) but took the least time to execute:
191 |
192 | ```
193 | Command being timed: "./script"
194 | User time (seconds): 69089.75
195 | System time (seconds): 41.91
196 | Percent of CPU this job got: 1181%
197 | Elapsed (wall clock) time (h:mm:ss or m:ss): 1:37:29
198 | Average shared text size (kbytes): 0
199 | Average unshared data size (kbytes): 0
200 | Average stack size (kbytes): 0
201 | Average total size (kbytes): 0
202 | Maximum resident set size (kbytes): 20458972
203 | Average resident set size (kbytes): 0
204 | Major (requiring I/O) page faults: 0
205 | Minor (reclaiming a frame) page faults: 15393443
206 | Voluntary context switches: 4884
207 | Involuntary context switches: 2222841
208 | Swaps: 0
209 | File system inputs: 8
210 | File system outputs: 8
211 | Socket messages sent: 0
212 | Socket messages received: 0
213 | Signals delivered: 0
214 | Page size (bytes): 4096
215 | Exit status: 0
216 | ```
217 |
218 | Julia consumed the least memory (around 7.5 GB peak RAM) and was the second fastest in total:
219 |
220 | ```
221 | Command being timed: "julia script.jl data true"
222 | User time (seconds): 49163.19
223 | System time (seconds): 32.08
224 | Percent of CPU this job got: 717%
225 | Elapsed (wall clock) time (h:mm:ss or m:ss): 1:54:18
226 | Average shared text size (kbytes): 0
227 | Average unshared data size (kbytes): 0
228 | Average stack size (kbytes): 0
229 | Average total size (kbytes): 0
230 | Maximum resident set size (kbytes): 7501592
231 | Average resident set size (kbytes): 0
232 | Major (requiring I/O) page faults: 804
233 | Minor (reclaiming a frame) page faults: 38021184
234 | Voluntary context switches: 2657
235 | Involuntary context switches: 477363
236 | Swaps: 0
237 | File system inputs: 368240
238 | File system outputs: 8
239 | Socket messages sent: 0
240 | Socket messages received: 0
241 | Signals delivered: 0
242 | Page size (bytes): 4096
243 | Exit status: 0
244 | ```
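To compare runs like the three above, the peak memory and wall-clock time can be pulled out of each `/usr/bin/time -v` report. The Python sketch below (`parse_time_v` is a hypothetical helper) assumes the field labels shown above and treats "kbytes" as 1024-byte units, which is how Linux reports the maximum resident set size.

```python
import re
from typing import Dict

_RSS = re.compile(r"Maximum resident set size \(kbytes\): (\d+)")
_WALL = re.compile(r"Elapsed \(wall clock\) time \(h:mm:ss or m:ss\): ([\d:.]+)")


def parse_time_v(report: str) -> Dict[str, float]:
    """Extract peak RSS (GiB) and wall-clock seconds from `/usr/bin/time -v` output."""
    rss = _RSS.search(report)
    wall = _WALL.search(report)
    if rss is None or wall is None:
        raise ValueError("report is missing the expected fields")
    seconds = 0.0
    for part in wall.group(1).split(":"):  # handles both h:mm:ss and m:ss.ss
        seconds = seconds * 60 + float(part)
    return {
        "peak_rss_gib": int(rss.group(1)) / 1024 ** 2,  # kbytes -> GiB
        "wall_seconds": seconds,
    }
```

Applied to the Chapel report above, this gives a peak of about 8.8 GiB and 9,593 seconds of wall-clock time.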
245 |
246 | ## Performance optimization
247 |
248 | The process of performance optimization in all three languages was very different and all three communities were very helpful in the process. But there were some common themes.
249 |
250 | * Static dispatch of kernel functions instead of runtime polymorphism. This means that when passing the kernel function, use parametric (static, compile time) polymorphism rather than runtime (dynamic) polymorphism, where dispatch with virtual functions carries a performance penalty.
251 | * Using views/references rather than copying data over multiple threads – makes a big difference.
252 | * Parallelising the calculations makes a huge difference.
253 | * Knowing if your array is row/column major and using that in your calculation makes a huge difference.
254 | * Turning off bounds checks and enabling compiler optimizations make a huge difference, especially in Chapel and D.
255 | * Enabling SIMD in D and Julia made a contribution to the performance. In D this was done using the `-mcpu=native` flag and in Julia this was done using the `@simd` macro.
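Two of the themes above, parallelising over columns and sharing data by reference across threads, can be sketched in Python with `concurrent.futures`. This is a hedged illustration of the work decomposition only: the task for column `j` writes cells `(i, j)` and `(j, i)` for `i >= j`, so no two tasks write the same cell, and `items` is shared rather than copied. Because of CPython's global interpreter lock, this pure-Python version will not run faster than a serial loop. `fill_column` and `parallel_kernel_matrix` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vec = Sequence[float]
Kernel = Callable[[Vec, Vec], float]


def fill_column(kernel: Kernel, items: Sequence[Vec], mat: List[List[float]], j: int) -> None:
    """Task for column j: writes only cells (i, j) and (j, i) with i >= j."""
    for i in range(j, len(items)):
        value = kernel(items[i], items[j])
        mat[i][j] = value
        mat[j][i] = value


def parallel_kernel_matrix(kernel: Kernel, items: Sequence[Vec], workers: int = 4) -> List[List[float]]:
    """Fill a symmetric kernel matrix with one task per column."""
    n = len(items)
    mat = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # items and mat are shared by reference with every task; nothing is copied.
        futures = [pool.submit(fill_column, kernel, items, mat, j) for j in range(n)]
        for future in futures:
            future.result()  # re-raise any exception raised inside a task
    return mat
```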
256 |
257 | In terms of language specific issues, getting to performant code in Chapel was the most challenging, and the Chapel code changed the most, moving from easy-to-read array operations to pointers and guided iteration. On the compiler side, however, it was relatively easy to add `--fast` and get a large performance boost.
258 |
259 | In D the code changed very little, and most of the performance was gained from the compiler used and its optimization flags. D’s LDC compiler is rich in options for performance optimization. It has 8 `-O` optimization levels, but some are repetitions of others (for instance `-O`, `-O3`, and `-O5` are identical), and there are a myriad of other flags that affect performance in various ways. In this case the flags used were `-O5 --boundscheck=off -ffast-math`, which enable aggressive compiler optimizations, turn off bounds checking, and enable LLVM’s fast math, plus `-mcpu=native` to enable CPU vectorization instructions.
260 |
261 | In Julia, the macro changes discussed previously markedly improved performance without being too intrusive.
262 |
263 | ## Quality of life
264 |
265 | This section examines the relative pros and cons of each language in terms of convenience and ease of use. People underestimate the effort it takes to use a language day to day; the support and infrastructure required are substantial, so it is worth comparing various facets of each language. Readers who just want the TL;DR should scroll to the end of this section for the table comparing the language features discussed here. Every effort has been made to be as objective as possible, but comparing programming languages is difficult, bias prone, and contentious, so read this section with that in mind. Some of the elements examined, such as arrays, are considered from the “data science”/technical/scientific computing point of view, while others are more general.
266 |
267 | ### Interactivity
268 |
269 | Programmers want a fast code/compile/result loop during development so that they can quickly observe results and outputs and make progress or necessary changes. Julia’s interactive REPL is hands down the best for this and offers a smooth and feature-rich development experience, with D a close second. With compilers, this loop can be slow even for small amounts of code. D has three compilers: the standard DMD compiler, the LLVM-based LDC compiler, and the GCC-based GDC. In this development process, the DMD and LDC compilers were used. DMD has **very** fast compilation times, which is great for development, and the LDC compiler is great at creating **fast** code. Chapel's compiler is very slow in comparison. For example, running Linux’s `time` command on DMD and on Chapel’s compiler for the kernel matrix code with no optimizations gives, for D:
270 |
271 | ```
272 | real 0m0.545s
273 | user 0m0.447s
274 | sys 0m0.101s
275 | ```
276 | Compared with Chapel:
277 |
278 | ```
279 | real 0m5.980s
280 | user 0m5.787s
281 | sys 0m0.206s
282 | ```
283 | That’s a large actual and *psychological* difference: it can make programmers reluctant to check their work and can slow the development loop if they have to wait for outputs, especially as source code grows in volume and compilation times become significant.
284 |
285 | It is worth mentioning, however, that compilation times can be very long when developing packages in Julia, and users have noticed that loading some packages can involve lengthy compilation. The experience of the development loop in Julia can therefore vary, but in this specific case the process was seamless.
286 |
287 | ### Documentation and examples
288 |
289 | One way of comparing documentation in the different languages is to compare each with Python’s official documentation, which is *the* gold standard for programming languages: it combines examples with formal definitions and tutorials in a seamless and user friendly way. Since many programmers are familiar with the Python documentation, this approach gives an idea of how they compare.
290 |
291 | Julia’s documentation is the closest to Python’s in quality and gives the user a smooth, detailed, and relatively painless transition into the language. It also has a rich ecosystem of blogs, and topics on many aspects of the language are easy to come by. D’s official documentation is not as good and can be challenging and frustrating; however, there is a *very* good free book, [“Programming in D”](https://wiki.dlang.org/Books), which is a great introduction to the language, but no single book can cover a programming language and there are not many sources for advanced topics. Chapel’s documentation is quite good for getting things done, though examples vary in presence and quality, and the programmer often needs a lot of knowledge to know where to look. A good topic for comparison is the file I/O libraries in Chapel, D, and Julia. Chapel’s I/O library has too few examples but is relatively clear and straightforward; D’s I/O is spread across a few modules and its documentation is more difficult to follow; Julia’s I/O documentation has lots of examples and is clear and easy to follow.
292 |
293 | Perhaps one factor affecting Chapel’s adoption is its lack of examples: since its arrays have a non-standard interface, the user has to work hard to become familiar with them. By contrast, even though D’s documentation may not be as good in places, the language has many similarities to C/C++ and so gets away with sparser documentation.
294 |
295 | ### Multi-dimensional Array support
296 |
297 | “Arrays” here do not refer to native C/C++ style arrays, which D has, but to mathematical arrays. Julia and Chapel ship with array support and D does not, but the [Mir](http://docs.algorithm.dlang.io/latest/mir_ndslice.html) library provides multidimensional arrays (ndslice). In the implementation of the kernel matrix, I wrote my own matrix object in D, which is not difficult if you understand the principle but is not something a user wants to do. However, D has a linear algebra library called [Lubeck](https://github.com/kaleidicassociates/lubeck), which has impressive performance characteristics and interfaces with all the usual BLAS implementations. Julia’s arrays are by far the easiest and most familiar. Chapel arrays are more difficult to get started with than Julia’s, but they are designed to run on single core, multicore, and computer clusters using the same or very similar code, which is a good unique selling point.
298 |
299 | ### Language power
300 |
301 | Since Julia is a dynamic programming language, some might say, “well, Julia is a dynamic language, which is far more permissive than static programming languages, therefore the debate is over”, but it’s more complicated than that. There is power in static type systems. Julia has a type system similar in nature to those of static languages, so you can write code as if you were using a static language while still doing things reserved for dynamic languages. It has highly developed generic and meta-programming syntax and powerful macros. It also has a highly flexible object system and multiple dispatch. This mix of features is what makes Julia the most powerful language of the three.
302 |
303 | D was intended to be a replacement for C++ and takes very much after C++ (also borrowing from Java), but it makes template programming and compile time evaluation much more user friendly than in C++. It is a single dispatch language (though multi-methods are available in a package), and instead of macros D has string and template “mixins”, which serve a similar purpose.
304 |
305 | Chapel has generic programming support and nascent support for single dispatch OOP, no macro support, and is not yet as mature as D or Julia in these terms.
306 |
307 | ### Concurrency & Parallel Programming
308 |
309 | Nowadays new languages tout support for concurrency and its popular subset, parallelism, but the details vary a lot between languages. Parallelism is more relevant in this example, and all three languages deliver: writing the parallel for loops required here is straightforward in all three.
310 |
311 | Chapel’s concurrency model has much more emphasis on data parallelism but has tools for task parallelism and ships with support for cluster-based concurrency.
312 |
313 | Julia has good support for both concurrency and parallelism.
314 |
315 | D has industry strength support for parallelism and concurrency, though its support for threading is much less well documented with examples.
316 |
317 | ### Standard Library
318 |
319 | How good is the standard library of each language in general? What range of tasks does it allow users to tend to easily? It’s a tough question because library quality and documentation both factor in. All three languages have very good standard libraries: D has the most comprehensive, Julia is a great second, followed by Chapel, but things are never that simple. For example, a user seeking to write binary I/O may find Julia the easiest to start with, since it has the most straightforward, clear interface and documentation, followed by Chapel and then D. Julia code is also easy to write for cases that are unavailable in the other two languages.
320 |
321 | ### Package Managers & Package Ecosystems
322 |
323 | In terms of documentation, usage, and features, D’s Dub package manager is the most comprehensive. D also has a rich package ecosystem on the [Dub website](https://code.dlang.org/). Julia’s package manager is tightly integrated with GitHub and is a good package system with good documentation. Chapel has a package manager but does not have a highly developed package ecosystem.
324 |
325 | ### C Integration
326 |
327 | C interop is easy in all three languages. Chapel’s C interop has good documentation but is not as well publicised as the others; D’s documentation is better, and Julia’s documentation is the most comprehensive. Oddly enough, though, none of the languages' documentation shows the commands required to compile your own C code and integrate it with the language, which is an oversight, especially for novices. It is, however, easy to search for and find examples of the compilation process in D and Julia.
328 |
329 | ### Community
330 |
331 | All three languages have convenient places where users can ask questions. For Chapel, the easiest place is Gitter, for Julia it’s Discourse (though there is a Julia Gitter) and for D it’s the official website forum. The Julia community is the most active, followed by D and then Chapel. I’ve found that you’ll get good responses from all three communities but you’ll probably get quicker answers from D and Julia communities.
332 |
333 | | | Chapel | D | Julia |
334 | | --------------------------- |:-------------:|:-----------------------------------:| --------:|
335 | | Compilation/Interactivity | Slow | Fast | Best |
336 | | Documentation & Examples | Detailed | Patchy | Best |
337 | | Multi-dimensional Arrays | Yes | Native only (library support) | Yes |
338 | | Language Power | Good | Great | Best |
339 | | Concurrency & Parallelism | Great | Great | Good |
340 | | Standard Library | Good | Great | Great |
341 | | Package Manager & Ecosystem | Nascent | Best | Great |
342 | | C Integration | Great | Great | Great |
343 | | Community | Small | Vibrant | Largest |
344 |
345 | Table for quality of life features in Chapel, D & Julia
346 |
347 | ## Summary
348 |
349 | If you are a novice programmer writing numerical algorithms and doing scientific computing, and you want a fast language that's easy to use, Julia is your best bet. If you are an experienced programmer working in the same space, Julia is still a great option. If you need an "industrial strength", statically compiled, high performance language with all the "bells and whistles" but want something more productive, safer, and less painful than C++, then D is your best bet. You can write "anything" in D and get great performance from its compilers. If you need to run array calculations on large clusters while avoiding the pain of writing MPI C++ code, then Chapel is probably the best place to go.
350 |
351 | In terms of performance on this task, for IEEE math Julia is the winner, performing better in 7 of the 9 kernels. For applications that use fast math, D and Chapel performed better than Julia in 7 of the 9 kernels, and D performed best in the other 2 kernels (`log` and `power`) in both IEEE and fast math modes. This exercise reveals that Julia's label as a high performance language is more than just hype: it has held its own against highly competitive languages. Chapel's and D's performance was very similar in both the IEEE and fast math modes of calculation.
352 |
--------------------------------------------------------------------------------
/chapel/fmtime.log:
--------------------------------------------------------------------------------
1 | Command being timed: "./script --verbose=true --fastmath=true"
2 | User time (seconds): 105423.83
3 | System time (seconds): 17.24
4 | Percent of CPU this job got: 1192%
5 | Elapsed (wall clock) time (h:mm:ss or m:ss): 2:27:24
6 | Average shared text size (kbytes): 0
7 | Average unshared data size (kbytes): 0
8 | Average stack size (kbytes): 0
9 | Average total size (kbytes): 0
10 | Maximum resident set size (kbytes): 9266032
11 | Average resident set size (kbytes): 0
12 | Major (requiring I/O) page faults: 0
13 | Minor (reclaiming a frame) page faults: 2315605
14 | Voluntary context switches: 460
15 | Involuntary context switches: 3139168
16 | Swaps: 0
17 | File system inputs: 0
18 | File system outputs: 8
19 | Socket messages sent: 0
20 | Socket messages received: 0
21 | Signals delivered: 0
22 | Page size (bytes): 4096
23 | Exit status: 0
24 |
25 |
--------------------------------------------------------------------------------
/chapel/kernel.chpl:
--------------------------------------------------------------------------------
1 | use CPtr;
2 | use Math;
3 |
4 | record DotProduct {
5 |   type T;
6 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
7 |   {
8 |     var dist: T = 0: T;
9 |     for i in 0..#p {
10 |       dist += xrow[i] * yrow[i];
11 |     }
12 |     return dist;
13 |   }
14 | }
15 |
16 | record Gaussian {
17 |   type T;
18 |   const theta: T;
19 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
20 |   {
21 |     var dist: T = 0: T;
22 |     for i in 0..#p {
23 |       var tmp = xrow[i] - yrow[i];
24 |       dist += tmp * tmp;
25 |     }
26 |     return exp(-sqrt(dist)/this.theta);
27 |   }
28 | }
29 |
30 | record Polynomial {
31 |   type T;
32 |   const d: T;
33 |   const offset: T;
34 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
35 |   {
36 |     var dist: T = 0: T;
37 |     for i in 0..#p {
38 |       dist += xrow[i]*yrow[i];
39 |     }
40 |     return (dist + this.offset)**this.d;
41 |   }
42 | }
43 |
44 | record Exponential {
45 |   type T;
46 |   const theta: T;
47 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
48 |   {
49 |     var dist: T = 0: T;
50 |     for i in 0..#p {
51 |       dist -= abs(xrow[i] - yrow[i]);
52 |     }
53 |     return exp(dist/this.theta);
54 |   }
55 | }
56 |
57 | record Log {
58 |   type T;
59 |   const beta: T;
60 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
61 |   {
62 |     var dist: T = 0: T;
63 |     for i in 0..#p {
64 |       dist += abs(xrow[i] - yrow[i])**this.beta;
65 |     }
66 |     dist = dist**(1/this.beta);
67 |     return -log(1 + dist);
68 |   }
69 | }
70 |
71 | record Cauchy {
72 |   type T;
73 |   const theta: T;
74 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
75 |   {
76 |     var dist: T = 0: T;
77 |     for i in 0..#p {
78 |       var tmp = xrow[i] - yrow[i];
79 |       dist += tmp * tmp;
80 |     }
81 |     dist = sqrt(dist)/this.theta;
82 |     return 1/(1 + dist);
83 |   }
84 | }
85 |
86 | record Power {
87 |   type T;
88 |   const beta: T;
89 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
90 |   {
91 |     var dist: T = 0: T;
92 |     for i in 0..#p {
93 |       dist += abs(xrow[i] - yrow[i])**this.beta;
94 |     }
95 |     return -dist**(1/this.beta);
96 |   }
97 | }
98 |
99 | record Wave {
100 |   type T;
101 |   const theta: T;
102 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
103 |   {
104 |     var dist: T = 0: T;
105 |     for i in 0..#p {
106 |       dist += abs(xrow[i] - yrow[i]);
107 |     }
108 |     var tmp = this.theta/dist;
109 |     return tmp*sin(1/tmp);
110 |   }
111 | }
112 |
113 | record Sigmoid {
114 |   type T;
115 |   const beta0: T;
116 |   const beta1: T;
117 |   proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
118 |   {
119 |     var dist: T = 0: T;
120 |     for i in 0..#p {
121 |       dist += xrow[i] * yrow[i];
122 |     }
123 |     return tanh(this.beta0 * dist + this.beta1);
124 |   }
125 | }
126 |
127 | /***************************************************************************/
128 | use DynamicIters;
129 | proc calculateKernelMatrix(K, data: [?D] ?T) /* : [?E] T */
130 | {
131 |   var n = D.dim(0).last;
132 |   var p = D.dim(1).last;
133 |   var E: domain(2) = {D.dim(0), D.dim(0)};
134 |   var mat: [E] T;
135 |   // code below assumes data starts at 1,1
136 |   var rowPointers: [1..n] c_ptr(T) =
137 |     forall i in 1..n do c_ptrTo(data[i, 1]);
138 |
139 |   forall j in guided(1..n by -1) {
140 |     for i in j..n {
141 | mat[i, j] = K.kernel(rowPointers[i], rowPointers[j], p);
142 | mat[j, i] = mat[i, j];
143 | }
144 | }
145 | return mat;
146 | }
147 |
148 |
--------------------------------------------------------------------------------
/chapel/kernelmatrix.chpl:
--------------------------------------------------------------------------------
1 | use CPtr;
2 | use Math;
3 |
4 | record DotProduct {
5 | type T;
6 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
7 | {
8 | var dist: T = 0: T;
9 | for i in 0..#p {
10 | dist += xrow[i] * yrow[i];
11 | }
12 | return dist;
13 | }
14 | }
15 |
16 | record Gaussian {
17 | type T;
18 | const theta: T;
19 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
20 | {
21 | var dist: T = 0: T;
22 | for i in 0..#p {
23 | var tmp = xrow[i] - yrow[i];
24 | dist += tmp * tmp;
25 | }
26 | return exp(-sqrt(dist)/this.theta);
27 | }
28 | }
29 |
30 | record Polynomial {
31 | type T;
32 | const d: T;
33 | const offset: T;
34 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
35 | {
36 | var dist: T = 0: T;
37 | for i in 0..#p {
38 | dist += xrow[i]*yrow[i];
39 | }
40 | return (dist + this.offset)**this.d;
41 | }
42 | }
43 |
44 | record Exponential {
45 | type T;
46 | const theta: T;
47 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
48 | {
49 | var dist: T = 0: T;
50 | for i in 0..#p {
51 | dist -= abs(xrow[i] - yrow[i]);
52 | }
53 | return exp(dist/this.theta);
54 | }
55 | }
56 |
57 | record Log {
58 | type T;
59 | const beta: T;
60 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
61 | {
62 | var dist: T = 0: T;
63 | for i in 0..#p {
64 | dist += abs(xrow[i] - yrow[i])**this.beta;
65 | }
66 | dist = dist**(1/this.beta);
67 | return -log(1 + dist);
68 | }
69 | }
70 |
71 | record Cauchy {
72 | type T;
73 | const theta: T;
74 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
75 | {
76 | var dist: T = 0: T;
77 | for i in 0..#p {
78 | var tmp = xrow[i] - yrow[i];
79 | dist += tmp * tmp;
80 | }
81 | dist = sqrt(dist)/this.theta;
82 | return 1/(1 + dist);
83 | }
84 | }
85 |
86 | record Power {
87 | type T;
88 | const beta: T;
89 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
90 | {
91 | var dist: T = 0: T;
92 | for i in 0..#p {
93 | dist += abs(xrow[i] - yrow[i])**this.beta;
94 | }
95 | return -dist**(1/this.beta);
96 | }
97 | }
98 |
99 | record Wave {
100 | type T;
101 | const theta: T;
102 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
103 | {
104 | var dist: T = 0: T;
105 | for i in 0..#p {
106 | dist += abs(xrow[i] - yrow[i]);
107 | }
108 | var tmp = this.theta/dist;
109 | return tmp*sin(1/tmp);
110 | }
111 | }
112 |
113 | record Sigmoid {
114 | type T;
115 | const beta0: T;
116 | const beta1: T;
117 | proc kernel(xrow:c_ptr, yrow:c_ptr(?T), p: int): T
118 | {
119 | var dist: T = 0: T;
120 | for i in 0..#p {
121 | dist += xrow[i] * yrow[i];
122 | }
123 | return tanh(this.beta0 * dist + this.beta1);
124 | }
125 | }
126 |
127 | /***************************************************************************/
128 | use DynamicIters;
129 | proc calculateKernelMatrix(K, data: [?D] ?T) /* : [?E] T */
130 | {
131 | var n = D.dim(0).last;
132 | var p = D.dim(1).last;
133 | var E: domain(2) = {D.dim(0), D.dim(0)};
134 | var mat: [E] T;
135 | // code below assumes data starts at 1,1
136 | var rowPointers: [1..n] c_ptr(T) =
137 | forall i in 1..n do c_ptrTo(data[i, 1]);
138 |
139 | forall j in guided(1..n by -1) {
140 | for i in j..n {
141 | mat[i, j] = K.kernel(rowPointers[i], rowPointers[j], p);
142 | mat[j, i] = mat[i, j];
143 | }
144 | }
145 | return mat;
146 | }
147 |
148 |
--------------------------------------------------------------------------------
/chapel/script.chpl:
--------------------------------------------------------------------------------
1 | use kernel;
2 |
3 | use IO;
4 | use Time;
5 | use Random;
6 |
7 | config const fastmath: bool = false;
8 | config const verbose: bool = true;
9 | const folder = if fastmath then "fmdata" else "data";
10 |
11 | record BenchRecord {
12 | var D: domain(1);
13 | var n: [D] int(64);
14 | var times: [D] real(64);
15 | proc init()
16 | {
17 | this.D = {0..1};
18 | this.n = [0, 1];
19 | this.times = [0.0, 1.0];
20 | }
21 | }
22 |
23 | proc bench(type T, Kernel, n: [?D] int(64))
24 | {
25 | var nitems: int(64) = D.dim(0).last: int(64);
26 | var times: [0..nitems] real(64);
27 |
28 | var result: BenchRecord;
29 | result.D = {0..nitems};
30 |
31 | for i in 0..nitems {
32 | var _times: [0..2] real(64);
33 | var data: [1..n[i], 1..784] T;
34 | fillRandom(data);
35 | for j in 0..2 {
36 | var sw = new Timer();
37 | sw.start();
38 | var mat = calculateKernelMatrix(Kernel, data);
39 | sw.stop();
40 | _times[j] = (sw.elapsed(TimeUnits.microseconds)/1000_000): real(64);
41 | }
42 | times[i] = (_times[0] + _times[1] + _times[2])/3;
43 | if verbose {
44 | writeln("Average time for n = ", n[i], ", ", times[i], " seconds.");
45 | writeln("Detailed times: ", _times);
46 | }
47 | }
48 | result.n = n;
49 | result.times = times;
50 | return result;
51 | }
52 |
53 | proc runKernelBenchmarks(type T, kernels, n: [?D] int(64))
54 | {
55 | var results: [0..#kernels.size] BenchRecord;
56 | for param i in 0..(kernels.size - 1) {
57 | const kernel = kernels(i);
58 | if verbose {
59 | writeln("\n\nRunning benchmarks for ", kernel.type: string, kernel: string);
60 | }
61 | results[i] = bench(T, kernel, n);
62 | }
63 | return results;
64 | }
65 |
66 | /**
67 | To compile:
68 | chpl script.chpl kernel.chpl --fast && ./script
69 | */
70 | proc runAllKernelBenchmarks(type T, folder: string)
71 | {
72 | //var n = [100, 500, 1000];
73 | var n = [1000, 5000, 10000, 20000, 30000];
74 |
75 | var kernels = (new DotProduct(T), new Gaussian(T, 1: T), new Polynomial(T, 2.5: T, 1: T),
76 | new Exponential(T, 1: T), new Log(T, 3: T), new Cauchy(T, 1: T),
77 | new Power(T, 2.5: T), new Wave(T, 1: T), new Sigmoid(T, 1: T, 1: T));
78 | var kernelNames = ["DotProduct", "Gaussian", "Polynomial",
79 | "Exponential", "Log", "Cauchy",
80 | "Power", "Wave", "Sigmoid"];
81 | var results = runKernelBenchmarks(T, kernels, n);
82 |
83 |
84 | var last: int(64) = n.domain.dim(0).last: int(64);
85 | var tabLen = n.size * kernels.size;
86 | var table: [0..tabLen, 0..3] string;
87 | table[0, ..] = ["language,", "kernel,", "nitems,", "time"];
88 | while (true)
89 | {
90 | var k = 1;
91 | for i in 0..#kernels.size {
92 | var tmp = ["Chapel,", kernelNames[i] + ",", "", ""];
93 | for j in 0..(n.size - 1) {
94 | tmp[2] = results[i].n[j]: string + ",";
95 | tmp[3] = results[i].times[j]: string;
96 | table[k, ..] = tmp;
97 | k += 1;
98 | }
99 | }
100 | if k > tabLen
101 | {
102 | break;
103 | }
104 | }
105 | var file = open("../" + folder + "/chapelBench.csv", iomode.cw);
106 | var _channel = file.writer();
107 | _channel.write(table);
108 | _channel.close();
109 | file.close();
110 | return;
111 | }
112 |
113 | proc main()
114 | {
115 | writeln("folder: ", folder);
116 | runAllKernelBenchmarks(real(32), folder);
117 | }
118 |
--------------------------------------------------------------------------------
/chapel/script.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | # Uses "regular" mathematical functions
3 | chpl --fast --ieee-float script.chpl kernel.chpl
4 | /usr/bin/time -v ./script --verbose=true --fastmath=false
5 | # Uses Fast Math
6 | chpl --fast --no-ieee-float script.chpl kernel.chpl
7 | /usr/bin/time -v ./script --verbose=true --fastmath=true
8 |
--------------------------------------------------------------------------------
/chapel/time.log:
--------------------------------------------------------------------------------
1 | Command being timed: "./script --verbose=true --fastmath=false"
2 | User time (seconds): 114342.55
3 | System time (seconds): 17.96
4 | Percent of CPU this job got: 1192%
5 | Elapsed (wall clock) time (h:mm:ss or m:ss): 2:39:53
6 | Average shared text size (kbytes): 0
7 | Average unshared data size (kbytes): 0
8 | Average stack size (kbytes): 0
9 | Average total size (kbytes): 0
10 | Maximum resident set size (kbytes): 9266328
11 | Average resident set size (kbytes): 0
12 | Major (requiring I/O) page faults: 0
13 | Minor (reclaiming a frame) page faults: 2315637
14 | Voluntary context switches: 625
15 | Involuntary context switches: 3419118
16 | Swaps: 0
17 | File system inputs: 0
18 | File system outputs: 8
19 | Socket messages sent: 0
20 | Socket messages received: 0
21 | Signals delivered: 0
22 | Page size (bytes): 4096
23 | Exit status: 0
24 |
25 |
--------------------------------------------------------------------------------
/charts/benchplot.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dataPulverizer/KernelMatrixBenchmark/28c40acb02497a3052f2d04689ab9c93fd090de2/charts/benchplot.jpg
--------------------------------------------------------------------------------
/charts/charts.r:
--------------------------------------------------------------------------------
1 | require(data.table)
2 | require(ggplot2)
3 | require(scales)
4 |
5 | # Plots for language benchmarks
6 | createJPGPlot = function(folder, filename)
7 | {
8 | results = Map(fread, c(paste0("../", folder,"/chapelBench.csv"),
9 | paste0("../", folder,"/dBench.csv"),
10 | paste0("../", folder,"/juliaBench.csv")))
11 | results = rbindlist(results)
12 | results[, kernel := gsub(" ", "", kernel)]
13 | p = ggplot(results, aes(x = nitems, y = time, color = language)) + geom_line() +
14 | geom_point() + scale_y_continuous(trans = "log10",
15 | labels = trans_format("log10", math_format(10^.x))) +
16 | theme(legend.position="top") + ylab("time (s)") +
17 | xlab("Number Of Items") + facet_wrap(~ kernel, scales = "free_y")
18 | jpeg(filename = filename, width = 9, height = 7, units = "in", res = 200)
19 | plot(p)
20 | dev.off()
21 | return(invisible(p))
22 | }
23 | createSVGPlot = function(folder, filename)
24 | {
25 | results = Map(fread, c(paste0("../", folder,"/chapelBench.csv"),
26 | paste0("../", folder,"/dBench.csv"),
27 | paste0("../", folder,"/juliaBench.csv")))
28 | results = rbindlist(results)
29 | results[, kernel := gsub(" ", "", kernel)]
30 | p = ggplot(results, aes(x = nitems, y = time, color = language)) + geom_line() +
31 | geom_point() + scale_y_continuous(trans = "log10",
32 | labels = trans_format("log10", math_format(10^.x))) +
33 | theme(legend.position="top") + ylab("time (s)") +
34 | xlab("Number Of Items") + facet_wrap(~ kernel, scales = "free_y")
35 | svg(filename = filename, width = 9, height = 7)
36 | plot(p)
37 | dev.off()
38 | return(invisible(p))
39 | }
40 |
41 | createSVGPlot("data", "benchplot.svg")
42 | createSVGPlot("fmdata", "fmbenchplot.svg")
43 |
44 | createJPGPlot("data", "benchplot.jpg")
45 | createJPGPlot("fmdata", "fmbenchplot.jpg")
46 |
47 |
48 | # Percentage time difference between NDSlice and my basic matrix implementation
49 | createJPGNDSlicePlot = function(filename)
50 | {
51 | results = Map(fread, c("../data/dNDSliceBench.csv", "../data/dBench.csv"))
52 | results[[1]][, language := "NDSlice"]
53 | results[[2]][, time := 100*(time - results[[1]][,time])/time]
54 | results = results[[2]]
55 |
56 | p = ggplot(results, aes(x = nitems, y = time, fill = kernel)) + geom_col() +
57 | theme(legend.position="none", plot.title = element_text(hjust = 0.5)) +
58 | ylab("% Difference\n(+ive = ndslice is faster)") +
59 | xlab("Number Of Items") + facet_wrap(~ kernel, scales = "free_y") +
60 | ggtitle("Matrix and NDSlice percentage time difference")
61 |
62 | jpeg(filename = filename, width = 7, height = 7, units = "in", res = 200)
63 | plot(p)
64 | dev.off()
65 | return(invisible(p))
66 | }
67 | # "ndsliceDiagnostic.jpg"
68 | createSVGNDSlicePlot = function(filename)
69 | {
70 | results = Map(fread, c("../data/dNDSliceBench.csv", "../data/dBench.csv"))
71 | results[[1]][, language := "NDSlice"]
72 | results[[2]][, time := 100*(time - results[[1]][,time])/time]
73 | results = results[[2]]
74 |
75 | p = ggplot(results, aes(x = nitems, y = time, fill = kernel)) + geom_col() +
76 | theme(legend.position="none", plot.title = element_text(hjust = 0.5)) +
77 | ylab("% Difference\n(+ive = ndslice is faster)") +
78 | xlab("Number Of Items") + facet_wrap(~ kernel, scales = "free_y") +
79 | ggtitle("Matrix and NDSlice percentage time difference")
80 |
81 | svg(filename = filename, width = 7, height = 7)
82 | plot(p)
83 | dev.off()
84 | return(invisible(p))
85 | }
86 |
87 | createJPGNDSlicePlot("ndsliceDiagnostic.jpg")
88 | createSVGNDSlicePlot("ndsliceDiagnostic.svg")
89 |
--------------------------------------------------------------------------------
/charts/fmbenchplot.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dataPulverizer/KernelMatrixBenchmark/28c40acb02497a3052f2d04689ab9c93fd090de2/charts/fmbenchplot.jpg
--------------------------------------------------------------------------------
/charts/ndsliceDiagnostic.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dataPulverizer/KernelMatrixBenchmark/28c40acb02497a3052f2d04689ab9c93fd090de2/charts/ndsliceDiagnostic.jpg
--------------------------------------------------------------------------------
/d/arrays.d:
--------------------------------------------------------------------------------
1 | /*
2 | This module contains implementations of vectors and matrices, adapted from my glmsolverd package
3 | */
4 |
5 | module arrays;
6 |
7 | import std.conv: to;
8 | import std.format: format;
9 | import std.traits: isFloatingPoint, isIntegral, isNumeric;
10 | import std.algorithm: min, max;
11 | import std.math: modf;
12 | import core.memory: GC;
13 | import core.stdc.stdlib: malloc, free;
14 | import std.stdio: writeln;
15 | import std.random;
16 | import std.parallelism;
17 | import std.range : iota;
18 |
19 | /********************************************* Printer Utility Functions *********************************************/
20 | auto getRange(T)(const(T[]) data)
21 | if(isFloatingPoint!T)
22 | {
23 | real[2] range = [cast(real)data[0], cast(real)data[0]];
24 | foreach(el; data)
25 | {
26 | range[0] = min(range[0], el);
27 | range[1] = max(range[1], el);
28 | }
29 | return range;
30 | }
31 | string getFormat(real[] range, long maxLength = 8, long gap = 2)
32 | {
33 | //writeln("range: ", range);
34 | string form = "";
35 | if((range[0] > 0.01) & (range[1] < 1000_000))
36 | {
37 | form = "%" ~ to!(string)(gap + 2 + maxLength) ~ "." ~ to!(string)(maxLength) ~ "g";
38 | }else
39 | {
40 | form = "%" ~ to!(string)(gap + 1 + maxLength) ~ "." ~ to!(string)(4) ~ "g";
41 | }
42 | return form;
43 | }
44 | /********************************************* Matrix Class *********************************************/
45 |
46 | /*
47 | Faster Array Creation
48 | */
49 | auto newArray(T)(long n)
50 | {
51 | auto data = (cast(T*)GC.malloc(T.sizeof*n, GC.BlkAttr.NO_SCAN))[0..n];
52 | //auto data = new T[](n);
53 | if(data.ptr is null)
54 | assert(0, "Array Allocation Failed!");
55 | return data;
56 | }
57 |
58 | /*
59 | Matrix will be column major
60 | */
61 | mixin template MatrixGubbings(T)
62 | {
63 | private:
64 | T[] data;
65 | long[] dim;
66 |
67 | public:
68 | this(T[] _data, long rows, long cols)
69 | {
70 | assert(rows*cols == _data.length,
71 | "dimension of matrix inconsistent with length of array");
72 | data = _data; dim = [rows, cols];
73 | }
74 | this(long n, long m)
75 | {
76 | long _len = n*m;
77 | data = newArray!(T)(_len);
78 | dim = [n, m];
79 | }
80 | this(T[] _data, long[] _dim)
81 | {
82 | long tlen = _dim[0]*_dim[1];
83 | assert(tlen == _data.length,
84 | "dimension of matrix inconsistent with length of array");
85 | data = _data; dim = _dim;
86 | }
87 | this(Matrix!(T) mat)
88 | {
89 | data = mat.data.dup;
90 | dim = mat.dim.dup;
91 | }
92 | @property Matrix!(T) dup() const
93 | {
94 | return Matrix!(T)(data.dup, dim.dup);
95 | }
96 | T opIndex(long i, long j) const
97 | {
98 | return data[dim[0]*j + i];
99 | }
100 | void opIndexAssign(T x, long i, long j)
101 | {
102 | data[dim[0]*j + i] = x;
103 | }
104 | T opIndexOpAssign(string op)(T x, long i, long j)
105 | {
106 | static if((op == "+") | (op == "-") | (op == "*") | (op == "/") | (op == "^^"))
107 | mixin("return data[dim[0]*j + i] " ~ op ~ "= x;");
108 | else static assert(0, "Operator \"" ~ op ~ "\" not implemented");
109 | }
110 | Matrix!(T) opBinary(string op)(Matrix!(T) x)
111 | {
112 | assert( data.length == x.array.length,
113 | "Number of rows and columns in matrices not equal.");
114 | long n = data.length;
115 | auto ret = Matrix!(T)(dim[0], dim[1]);
116 | static if((op == "+") | (op == "-") | (op == "*") | (op == "/") | (op == "^^"))
117 | {
118 | for(long i = 0; i < n; ++i)
119 | {
120 | mixin("ret.array[i] = " ~ "data[i] " ~ op ~ " x.array[i];");
121 | }
122 | }else static assert(0, "Operator \"" ~ op ~ "\" not implemented");
123 | return ret;
124 | }
125 | Matrix!(T) opBinary(string op)(T rhs)
126 | {
127 | ulong n = data.length;
128 | Matrix!(T) ret = Matrix!(T)(dim[0], dim[1]);
129 | static if((op == "+") | (op == "-") | (op == "*") | (op == "/") | (op == "^^"))
130 | {
131 | for(ulong i = 0; i < n; ++i)
132 | {
133 | mixin("ret.array[i] = " ~ "data[i] " ~ op ~ " rhs;");
134 | }
135 | }else static assert(0, "Operator \"" ~ op ~ "\" not implemented");
136 | return ret;
137 | }
138 | Matrix!(T) opBinaryRight(string op)(T lhs)
139 | {
140 | long n = data.length;
141 | Matrix!(T) ret = Matrix!(T)(dim[0], dim[1]);
142 | static if((op == "+") | (op == "-") | (op == "*") | (op == "/") | (op == "^^"))
143 | {
144 | for(long i = 0; i < n; ++i)
145 | {
146 | mixin("ret.array[i] = " ~ "lhs " ~ op ~ " data[i];");
147 | }
148 | }else static assert(0, "Operator \"" ~ op ~ "\" not implemented");
149 | return ret;
150 | }
151 | void opOpAssign(string op)(Matrix!(T) x)
152 | {
153 | assert( data.length == x.array.length,
154 | "Number of rows and columns in matrices not equal.");
155 | long n = data.length;
156 | static if((op == "+") | (op == "-") | (op == "*") | (op == "/") | (op == "^^"))
157 | {
158 | for(long i = 0; i < n; ++i)
159 | {
160 | mixin("data[i] " ~ op ~ "= x.array[i];");
161 | }
162 | }else static assert(0, "Operator \"" ~ op ~ "\" not implemented");
163 | }
164 | /* mat "op"= rhs */
165 | void opOpAssign(string op)(T rhs)
166 | {
167 | long n = data.length;
168 | static if((op == "+") | (op == "-") | (op == "*") | (op == "/") | (op == "^^"))
169 | {
170 | for(long i = 0; i < n; ++i)
171 | {
172 | mixin("data[i] " ~ op ~ "= rhs;");
173 | }
174 | }else static assert(0, "Operator \"" ~ op ~ "\" not implemented");
175 | }
176 | @property long nrow() const
177 | {
178 | return dim[0];
179 | }
180 | @property long ncol() const
181 | {
182 | return dim[1];
183 | }
184 | @property T[] array()
185 | {
186 | return data;
187 | }
188 | @property long len() const
189 | {
190 | return data.length;
191 | }
192 | @property long length() const
193 | {
194 | return data.length;
195 | }
196 | @property size() const
197 | {
198 | return dim.dup;
199 | }
200 | /* Returns transposed matrix (duplicated) */
201 | Matrix!(T) t() const
202 | {
203 | auto _data = data.dup;
204 | long[] _dim = new long[2];
205 | _dim[0] = dim[1]; _dim[1] = dim[0];
206 | if((dim[0] == 1) & (dim[1] == 1)){
207 | } else if(dim[0] != dim[1]) {
208 | for(long j = 0; j < dim[1]; ++j)
209 | {
210 | for(long i = 0; i < dim[0]; ++i)
211 | {
212 | _data[_dim[0]*i + j] = data[dim[0]*j + i];
213 | }
214 | }
215 | } else if(dim[0] == dim[1]) {
216 | for(long j = 0; j < dim[1]; ++j)
217 | {
218 | for(long i = 0; i < dim[0]; ++i)
219 | {
220 | if(i == j)
221 | continue;
222 | _data[_dim[0]*i + j] = data[dim[0]*j + i];
223 | }
224 | }
225 | }
226 | return Matrix!(T)(_data, _dim);
227 | }
228 |
229 | /* Appends Vector to the END of the matrix */
230 | void appendColumn(T[] rhs)
231 | {
232 | assert(rhs.length == nrow,
233 | "Vector is not of the same length as number of rows.");
234 | data ~= rhs;
235 | dim[1] += 1;
236 | return;
237 | }
238 | void appendColumn(Matrix!(T) rhs)
239 | {
240 | assert((rhs.nrow == 1) | (rhs.ncol == 1),
241 | "Matrix does not have 1 row or 1 column");
242 | appendColumn(rhs.array);
243 | }
244 | void appendColumn(T _rhs)
245 | {
246 | auto rhs = newArray!(T)(nrow);
247 | rhs[] = _rhs;
248 | appendColumn(rhs);
249 | }
250 | /* Prepends Column Vector to the START of the matrix */
251 | void prependColumn(T[] rhs)
252 | {
253 | assert(rhs.length == nrow,
254 | "Vector is not of the same length as number of rows.");
255 | data = rhs ~ data;
256 | dim[1] += 1;
257 | return;
258 | }
259 | void prependColumn(Matrix!(T) rhs)
260 | {
261 | assert((rhs.nrow == 1) | (rhs.ncol == 1),
262 | "Matrix does not have 1 row or 1 column");
263 | prependColumn(rhs.array);
264 | }
265 | void prependColumn(T _rhs)
266 | {
267 | auto rhs = newArray!(T)(nrow);
268 | rhs[] = _rhs;
269 | prependColumn(rhs);
270 | }
271 | /* Contiguous column select copies the column */
272 | auto columnSelect(long start, long end)
273 | {
274 | assert(end > start, "Starting column is not less than end column");
275 | long nCol = end - start;
276 | long _len = nrow * nCol;
277 | auto arr = newArray!(T)(_len);
278 | auto startIndex = start*nrow;
279 | long iStart = 0;
280 | for(long i = 0; i < nCol; ++i)
281 | {
282 | arr[iStart..((iStart + nrow))] = data[startIndex..(startIndex + nrow)];
283 | startIndex += nrow;
284 | iStart += nrow;
285 | }
286 | return Matrix!(T)(arr, [nrow, nCol]);
287 | }
288 | auto columnSelect(long index)
289 | {
290 | assert(index < ncol, "Selected index is not less than number of columns.");
291 | auto arr = newArray!(T)(nrow);
292 | auto startIndex = index*nrow;
293 | arr[] = data[startIndex..(startIndex + nrow)];
294 | return Matrix!(T)(arr, [nrow, 1]);
295 | }
296 | auto refColumnSelect(long index)
297 | {
298 | assert(index < ncol, "Selected index is not less than number of columns.");
299 | auto startIndex = index*nrow;
300 | return Matrix!(T)(data[startIndex..(startIndex + nrow)], [nrow, 1]);
301 | }
302 | auto refColumnSelectArr(long index)
303 | {
304 | //assert(index < ncol, "Selected index is not less than number of columns.");
305 | auto startIndex = index*nrow;
306 | return data[startIndex..(startIndex + nrow)];
307 | }
308 | /*
309 | Function to remove a column from the matrix.
310 | */
311 | Matrix!(T) refColumnRemove(long index)
312 | {
313 | /* Remove first column */
314 | if(index == 0)
315 | {
316 | data = data[nrow..$];
317 | dim[1] -= 1;
318 | return this;
319 | /* Remove last column */
320 | }else if(index == (ncol - 1))
321 | {
322 | data = data[0..($ - nrow)];
323 | dim[1]-= 1;
324 | return this;
325 | /* Remove any other column */
326 | }else{
327 | auto start = index*nrow;
328 | long _len = data.length - nrow;
329 | auto _data = newArray!(T)(_len);
330 | _data[0..start] = data[0..start];
331 | _data[start..$] = data[(start + nrow)..$];
332 | data = _data;
333 | dim[1] -= 1;
334 | return this;
335 | }
336 | }
337 | /* Assigns vector in-place to a specific column */
338 | /* Refactor these two methods */
339 | Matrix!(T) refColumnAssign(T[] col, long index)
340 | {
341 | assert(col.length == nrow, "Length of vector is not the same as number of rows");
342 | /* Replace first column */
343 | if(index == 0)
344 | {
345 | data[0..nrow] = col;
346 | return this;
347 | /* Replace last column */
348 | }else if(index == (ncol - 1))
349 | {
350 | data[($ - nrow)..$] = col;
351 | return this;
352 | /* Replace any other column */
353 | }else{
354 | auto start = index*nrow;
355 | data[start..(start + nrow)] = col;
356 | return this;
357 | }
358 | }
359 | Matrix!(T) refColumnAssign(T col, long index)
360 | {
361 | /* Replace first column */
362 | if(index == 0)
363 | {
364 | data[0..nrow] = col;
365 | return this;
366 | /* Replace last column */
367 | }else if(index == (ncol - 1))
368 | {
369 | data[($ - nrow)..$] = col;
370 | return this;
371 | /* Replace any other column */
372 | }else{
373 | auto start = index*nrow;
374 | data[start..(start + nrow)] = col;
375 | return this;
376 | }
377 | }
378 | }
379 | /* Assuming column major */
380 | struct Matrix(T)
381 | if(isFloatingPoint!T)
382 | {
383 | mixin MatrixGubbings!(T);
384 | string toString() const
385 | {
386 | string dform = getFormat(getRange(data));
387 | string repr = format(" Matrix(%d x %d)\n", dim[0], dim[1]);
388 | for(long i = 0; i < dim[0]; ++i)
389 | {
390 | for(long j = 0; j < dim[1]; ++j)
391 | {
392 | repr ~= format(dform, opIndex(i, j));
393 | }
394 | repr ~= "\n";
395 | }
396 | return repr;
397 | }
398 | }
399 |
400 | // Create random matrix
401 | /****************************************************************************/
402 | auto createRNG()
403 | {
404 | Mt19937_64 rng;
405 | rng.seed(unpredictableSeed);
406 | return rng;
407 | }
408 | Matrix!T createRandomMatrix(T)(ulong rows, ulong cols)
409 | {
410 | //Mt19937_64 gen;
411 | auto RNG = taskPool.workerLocalStorage(createRNG());
412 | //gen.seed(unpredictableSeed);
413 | ulong len = rows*cols;
414 | T[] data = newArray!(T)(len);
415 | foreach(i; taskPool.parallel(iota(len)))
416 | data[i] = uniform01!(T)(RNG.get);
417 | return Matrix!T(data, rows, cols);
418 | }
419 | Matrix!T createRandomMatrix(T)(ulong m)
420 | {
421 | return createRandomMatrix!(T)(m, m);
422 | }
423 | Matrix!T createRandomMatrix(T)(ulong[] dim)
424 | {
425 | return createRandomMatrix!(T)(dim[0], dim[1]);
426 | }
427 |
428 |
--------------------------------------------------------------------------------
/d/fmtime.log:
--------------------------------------------------------------------------------
1 | Command being timed: "./script"
2 | User time (seconds): 61656.04
3 | System time (seconds): 54.86
4 | Percent of CPU this job got: 1182%
5 | Elapsed (wall clock) time (h:mm:ss or m:ss): 1:26:59
6 | Average shared text size (kbytes): 0
7 | Average unshared data size (kbytes): 0
8 | Average stack size (kbytes): 0
9 | Average total size (kbytes): 0
10 | Maximum resident set size (kbytes): 20458860
11 | Average resident set size (kbytes): 0
12 | Major (requiring I/O) page faults: 0
13 | Minor (reclaiming a frame) page faults: 15898753
14 | Voluntary context switches: 4863
15 | Involuntary context switches: 1999947
16 | Swaps: 0
17 | File system inputs: 0
18 | File system outputs: 8
19 | Socket messages sent: 0
20 | Socket messages received: 0
21 | Signals delivered: 0
22 | Page size (bytes): 4096
23 | Exit status: 0
24 |
25 |
--------------------------------------------------------------------------------
/d/kernel.d:
--------------------------------------------------------------------------------
1 | import arrays;
2 | import std.parallelism;
3 | import std.range : iota;
4 | import std.stdio: writeln;
5 | import std.datetime.stopwatch: AutoStart, StopWatch;
6 |
7 | import core.stdc.math: exp, exp = expf, exp = expl,
8 | fabs, fabs = fabsf, fabs = fabsl,
9 | log, log = logf, log = logl,
10 | pow, pow = powf, pow = powl,
11 | sin, sin = sinf, sin = sinl,
12 | sqrt, sqrt = sqrtf, sqrt = sqrtl,
13 | tanh, tanh = tanhf, tanh = tanhl;
14 |
15 | /**
16 | Kernel Function Types:
17 | */
18 | struct DotProduct(T)
19 | {
20 | public:
21 | this(T _nothing)
22 | {}
23 | T opCall(T[] x, T[] y) const
24 | {
25 | T dist = 0;
26 | auto m = x.length;
27 | for(size_t i = 0; i < m; ++i)
28 | {
29 | dist += x[i] * y[i];
30 | }
31 | return dist;
32 | }
33 | }
34 |
35 | struct Gaussian(T)
36 | {
37 | private:
38 | T theta;
39 | public:
40 | this(T _theta)
41 | {
42 | theta = _theta;
43 | }
44 | T opCall(T[] x, T[] y) const
45 | {
46 | T dist = 0;
47 | auto m = x.length;
48 | for(size_t i = 0; i < m; ++i)
49 | {
50 | auto tmp = x[i] - y[i];
51 | dist += tmp * tmp;
52 | }
53 | return exp(-sqrt(dist)/theta);
54 | }
55 | }
56 |
57 | struct Polynomial(T)
58 | {
59 | private:
60 | T d;
61 | T offset;
62 | public:
63 | this(T _d, T _offset)
64 | {
65 | d = _d;
66 | offset = _offset;
67 | }
68 | T opCall(T[] x, T[] y) const
69 | {
70 | T dist = 0;
71 | auto m = x.length;
72 | for(size_t i = 0; i < m; ++i)
73 | {
74 | dist += x[i] * y[i];
75 | }
76 | return pow(dist + offset, d);
77 | }
78 | }
79 |
80 | struct Exponential(T)
81 | {
82 | private:
83 | T theta;
84 | public:
85 | this(T _theta)
86 | {
87 | theta = _theta;
88 | }
89 | T opCall(T[] x, T[] y) const
90 | {
91 | T dist = 0;
92 | auto m = x.length;
93 | for(size_t i = 0; i < m; ++i)
94 | {
95 | dist -= fabs(x[i] - y[i]);
96 | }
97 | return exp(dist/theta);
98 | }
99 | }
100 |
101 | struct Log(T)
102 | {
103 | private:
104 | T beta;
105 | public:
106 | this(T _beta)
107 | {
108 | beta = _beta;
109 | }
110 | T opCall(T[] x, T[] y) const
111 | {
112 | T dist = 0;
113 | auto m = x.length;
114 | for(size_t i = 0; i < m; ++i)
115 | {
116 | dist += pow(fabs(x[i] - y[i]), beta);
117 | }
118 | dist = pow(dist, 1/beta);
119 | return -log(1 + dist);
120 | }
121 | }
122 |
123 | struct Cauchy(T)
124 | {
125 | private:
126 | T theta;
127 | public:
128 | this(T _theta)
129 | {
130 | theta = _theta;
131 | }
132 | T opCall(T[] x, T[] y) const
133 | {
134 | T dist = 0;
135 | auto m = x.length;
136 | for(size_t i = 0; i < m; ++i)
137 | {
138 | auto tmp = x[i] - y[i];
139 | dist += tmp * tmp;
140 | }
141 | dist = sqrt(dist)/theta;
142 | return 1/(1 + dist);
143 | }
144 | }
145 |
146 | struct Power(T)
147 | {
148 | private:
149 | T beta;
150 | public:
151 | this(T _beta)
152 | {
153 | beta = _beta;
154 | }
155 | T opCall(T[] x, T[] y) const
156 | {
157 | T dist = 0;
158 | auto m = x.length;
159 | for(size_t i = 0; i < m; ++i)
160 | {
161 | dist += pow(fabs(x[i] - y[i]), beta);
162 | }
163 | return -pow(dist, 1/beta);
164 | }
165 | }
166 |
167 | struct Wave(T)
168 | {
169 | private:
170 | T theta;
171 | public:
172 | this(T _theta)
173 | {
174 | theta = _theta;
175 | }
176 | T opCall(T[] x, T[] y) const
177 | {
178 | T dist = 0;
179 | auto m = x.length;
180 | for(size_t i = 0; i < m; ++i)
181 | {
182 | dist += fabs(x[i] - y[i]);
183 | }
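184 | auto tmp = theta/dist; // on the diagonal (i == j) dist == 0, so tmp is infinite and tmp*sin(1/tmp) gives NaN under IEEE rules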
185 | return tmp*sin(1/tmp);
186 | }
187 | }
188 |
189 | struct Sigmoid(T)
190 | {
191 | private:
192 | T beta0;
193 | T beta1;
194 | public:
195 | this(T _beta0, T _beta1)
196 | {
197 | beta0 = _beta0;
198 | beta1 = _beta1;
199 | }
200 | T opCall(T[] x, T[] y) const
201 | {
202 | T dist = 0;
203 | auto m = x.length;
204 | for(size_t i = 0; i < m; ++i)
205 | {
206 | dist += x[i] * y[i];
207 | }
208 | return tanh(beta0 * dist + beta1);
209 | }
210 | }
211 |
212 | /************************************************************************************/
213 |
214 | auto calculateKernelMatrix(alias K, T)(K!(T) kernel, Matrix!(T) data)
215 | {
216 | size_t n = data.ncol;
217 | auto mat = Matrix!(T)(n, n);
218 |
219 | foreach(j; taskPool.parallel(iota(n)))
220 | {
221 | auto arrj = data.refColumnSelect(j).array;
222 | foreach(size_t i; j..n)
223 | {
224 | mat[i, j] = kernel(data.refColumnSelect(i).array, arrj);
225 | mat[j, i] = mat[i, j];
226 | }
227 | }
228 | return mat;
229 | }
230 |
--------------------------------------------------------------------------------
/d/mathdemo.d:
--------------------------------------------------------------------------------
1 | import std.math: dlog = log;
2 | import std.stdio: writeln;
3 | import std.random: uniform01;
4 | import core.stdc.math: logf, log, logl;
5 | import std.datetime.stopwatch: AutoStart, StopWatch;
6 |
7 | T log(T)(T x)
8 | if(is(T == float))
9 | {
10 | return logf(x);
11 | }
12 | T log(T)(T x)
13 | if(is(T == double))
14 | {
15 | return log(x);
16 | }
17 | T log(T)(T x)
18 | if(is(T == real))
19 | {
20 | return logl(x);
21 | }
22 |
23 | auto makeRandomArray(T)(size_t n)
24 | {
25 | T[] arr = new T[n];
26 | foreach(ref el; arr)
27 | {
28 | el = uniform01!(T)();
29 | }
30 | return arr;
31 | }
32 |
33 | auto apply(alias fun, T)(T[] arr)
34 | {
35 | foreach(ref el; arr)
36 | {
37 | el = fun(el);
38 | }
39 | return;
40 | }
41 |
42 | /**
43 | ldc2 -O --boundscheck=off --ffast-math --mcpu=native mathdemo.d && ./mathdemo
44 | Time taken for c log: 0.324789 seconds.
45 | Time taken for d log: 2.30737 seconds.
46 | */
47 | void main()
48 | {
49 | auto sw = StopWatch(AutoStart.no);
50 |
51 | /* For C's log function */
52 | auto arr = makeRandomArray!(float)(100_000_000);
53 | sw.start();
54 | apply!(log)(arr);
55 | sw.stop();
56 | writeln("Time taken for c log: ", sw.peek.total!"nsecs"/1000_000_000.0, " seconds.");
57 | sw.reset();
58 |
59 | /* For D's log function */
60 | arr = makeRandomArray!(float)(100_000_000);
61 | sw.start();
62 | apply!(dlog)(arr);
63 | sw.stop();
64 | writeln("Time taken for d log: ", sw.peek.total!"nsecs"/1000_000_000.0, " seconds.");
65 | sw.reset();
66 | }
67 |
--------------------------------------------------------------------------------
/d/script.d:
--------------------------------------------------------------------------------
1 | import arrays;
2 | import kernel;
3 |
4 | import std.conv: to;
5 | import std.meta: AliasSeq;
6 | import std.algorithm : sum;
7 | import std.stdio: File, writeln;
8 | import std.typecons: tuple, Tuple;
9 | import std.datetime.stopwatch: AutoStart, StopWatch;
10 |
11 | /**
12 | To compile:
13 | ldc2 script.d kernel.d math.d arrays.d -O --boundscheck=off --ffast-math -mcpu=native
14 | /usr/bin/time -v ./script
15 |
16 | ldc2 --mcpu=help
17 | ldc2 script.d kernel.d math.d arrays.d -O --boundscheck=off --ffast-math --mcpu=core-avx2 -mattr=+avx2,+sse4.1,+sse4.2
18 | /usr/bin/time -v ./script
19 | */
20 |
21 | auto bench(alias K, T)(K!T kernel, long[] n)
22 | {
23 | auto times = new double[n.length];
24 | auto sw = StopWatch(AutoStart.no);
25 | foreach(i; 0..n.length)
26 | {
27 | double[3] _times;
28 | auto data = createRandomMatrix!T(784L, n[i]);
29 | foreach(ref t; _times[])
30 | {
31 | sw.start();
32 | auto mat = calculateKernelMatrix!(K!T, T)(kernel, data);
33 | sw.stop();
34 | t = sw.peek.total!"nsecs"/1000_000_000.0;
35 | sw.reset();
36 | }
37 | times[i] = sum(_times[])/3.0;
38 | version(verbose)
39 | {
40 | writeln("Average time for n = ", n[i], ", ", times[i], " seconds.");
41 | writeln("Detailed times: ", _times, "\n");
42 | }
43 | }
44 | return tuple(n, times);
45 | }
46 |
47 | auto runKernelBenchmark(KS)(KS kernels, long[] n)
48 | {
49 | auto tmp = bench(kernels[0], n);
50 | alias R = typeof(tmp);
51 | R[kernels.length] results;
52 | results[0] = tmp;
53 | static foreach(i; 1..kernels.length)
54 | {
55 | version(verbose)
56 | {
57 | writeln("Running benchmarks for ", kernels[i]);
58 | }
59 | results[i] = bench(kernels[i], n);
60 | }
61 | return results;
62 | }
63 |
64 | void writeRow(File file, string[] row)
65 | {
66 | string line = "";
67 | foreach(i; 0..(row.length - 1))
68 | line ~= row[i] ~ ",";
69 | line ~= row[row.length - 1] ~ "\n";
70 | file.write(line);
71 | return;
72 | }
73 |
74 | void runAllKernelBenchmarks(T = float)()
75 | {
76 | auto kernels = tuple(DotProduct!(T)(), Gaussian!(T)(1), Polynomial!(T)(2.5f, 1),
77 | Exponential!(T)(1), Log!(T)(3), Cauchy!(T)(1),
78 | Power!(T)(2.5f), Wave!(T)(1), Sigmoid!(T)(1, 1));
79 | auto kernelNames = ["DotProduct", "Gaussian", "Polynomial",
80 | "Exponential", "Log", "Cauchy",
81 | "Power", "Wave", "Sigmoid"];
82 | //long[] n = [100L, 500L, 1000L];
83 | long[] n = [1000L, 5000L, 10_000L, 20_000L, 30_000L];
84 |
85 | auto results = runKernelBenchmark(kernels, n);
86 |
87 | auto table = new string[][] (n.length * kernels.length + 1, 4);
88 | table[0][] = ["language", "kernel", "nitems", "time"];
89 | auto tmp = ["D", "", "", ""];
90 | while(true)
91 | {
92 | auto k = 1;
93 | foreach(i; 0..kernels.length)
94 | {
95 | tmp = ["D", kernelNames[i], "", ""];
96 | foreach(j; 0..n.length)
97 | {
98 | tmp[2] = to!(string)(results[i][0][j]);
99 | tmp[3] = to!(string)(results[i][1][j]);
100 | table[k][] = tmp.dup;
101 | k += 1;
102 | }
103 | }
104 | if(k > (table.length - 1))
105 | {
106 | break;
107 | }
108 | }
109 | version(fastmath)
110 | {
111 | auto file = File("../fmdata/dBench.csv", "w");
112 | }else{
113 | auto file = File("../data/dBench.csv", "w");
114 | }
115 | foreach(row; table)
116 | file.writeRow(row);
117 | }
118 |
119 | void main()
120 | {
121 | runAllKernelBenchmarks();
122 | }
123 |
--------------------------------------------------------------------------------
/d/script.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 | # Uses "regular" mathematical functions
3 | ldc2 script.d kernel.d math.d arrays.d --release -O --d-version=verbose --boundscheck=off --mcpu=native
4 | /usr/bin/time -v ./script
5 | # Uses Fast Math
6 | ldc2 script.d kernel.d math.d arrays.d --release -O --d-version=verbose --d-version=fastmath --ffast-math --boundscheck=off --mcpu=native
7 | /usr/bin/time -v ./script
8 |
--------------------------------------------------------------------------------
/d/time.log:
--------------------------------------------------------------------------------
1 | Command being timed: "./script"
2 | User time (seconds): 69089.75
3 | System time (seconds): 41.91
4 | Percent of CPU this job got: 1181%
5 | Elapsed (wall clock) time (h:mm:ss or m:ss): 1:37:29
6 | Average shared text size (kbytes): 0
7 | Average unshared data size (kbytes): 0
8 | Average stack size (kbytes): 0
9 | Average total size (kbytes): 0
10 | Maximum resident set size (kbytes): 20458972
11 | Average resident set size (kbytes): 0
12 | Major (requiring I/O) page faults: 0
13 | Minor (reclaiming a frame) page faults: 15393443
14 | Voluntary context switches: 4884
15 | Involuntary context switches: 2222841
16 | Swaps: 0
17 | File system inputs: 8
18 | File system outputs: 8
19 | Socket messages sent: 0
20 | Socket messages received: 0
21 | Signals delivered: 0
22 | Page size (bytes): 4096
23 | Exit status: 0
24 |
25 |
--------------------------------------------------------------------------------
/data/chapelBench.csv:
--------------------------------------------------------------------------------
1 | language, kernel, nitems, time
2 | Chapel, DotProduct, 1000, 0.0327997
3 | Chapel, DotProduct, 5000, 0.828653
4 | Chapel, DotProduct, 10000, 3.46129
5 | Chapel, DotProduct, 20000, 15.6243
6 | Chapel, DotProduct, 30000, 37.36
7 | Chapel, Gaussian, 1000, 0.0378753
8 | Chapel, Gaussian, 5000, 0.980372
9 | Chapel, Gaussian, 10000, 3.95018
10 | Chapel, Gaussian, 20000, 16.7415
11 | Chapel, Gaussian, 30000, 39.4446
12 | Chapel, Polynomial, 1000, 0.03782
13 | Chapel, Polynomial, 5000, 0.967143
14 | Chapel, Polynomial, 10000, 4.00965
15 | Chapel, Polynomial, 20000, 16.9653
16 | Chapel, Polynomial, 30000, 39.7892
17 | Chapel, Exponential, 1000, 0.037031
18 | Chapel, Exponential, 5000, 0.959345
19 | Chapel, Exponential, 10000, 3.90207
20 | Chapel, Exponential, 20000, 16.5751
21 | Chapel, Exponential, 30000, 39.0656
22 | Chapel, Log, 1000, 0.985141
23 | Chapel, Log, 5000, 24.4549
24 | Chapel, Log, 10000, 98.1639
25 | Chapel, Log, 20000, 394.476
26 | Chapel, Log, 30000, 878.653
27 | Chapel, Cauchy, 1000, 0.0356847
28 | Chapel, Cauchy, 5000, 0.929519
29 | Chapel, Cauchy, 10000, 3.83651
30 | Chapel, Cauchy, 20000, 16.2932
31 | Chapel, Cauchy, 30000, 38.3297
32 | Chapel, Power, 1000, 0.975878
33 | Chapel, Power, 5000, 24.2018
34 | Chapel, Power, 10000, 96.7752
35 | Chapel, Power, 20000, 387.347
36 | Chapel, Power, 30000, 871.812
37 | Chapel, Wave, 1000, 0.0342877
38 | Chapel, Wave, 5000, 0.950036
39 | Chapel, Wave, 10000, 3.92775
40 | Chapel, Wave, 20000, 16.5319
41 | Chapel, Wave, 30000, 38.7444
42 | Chapel, Sigmoid, 1000, 0.035638
43 | Chapel, Sigmoid, 5000, 0.928199
44 | Chapel, Sigmoid, 10000, 3.82996
45 | Chapel, Sigmoid, 20000, 16.1571
46 | Chapel, Sigmoid, 30000, 38.0366
--------------------------------------------------------------------------------
/data/dBench.csv:
--------------------------------------------------------------------------------
1 | language,kernel,nitems,time
2 | D,DotProduct,1000,0.0331662
3 | D,DotProduct,5000,0.863256
4 | D,DotProduct,10000,3.92915
5 | D,DotProduct,20000,17.1248
6 | D,DotProduct,30000,40.4689
7 | D,Gaussian,1000,0.0364082
8 | D,Gaussian,5000,1.02408
9 | D,Gaussian,10000,4.2395
10 | D,Gaussian,20000,18.0227
11 | D,Gaussian,30000,41.9824
12 | D,Polynomial,1000,0.0336679
13 | D,Polynomial,5000,0.992885
14 | D,Polynomial,10000,4.03298
15 | D,Polynomial,20000,17.307
16 | D,Polynomial,30000,40.8675
17 | D,Exponential,1000,0.0348232
18 | D,Exponential,5000,1.00611
19 | D,Exponential,10000,4.20423
20 | D,Exponential,20000,17.814
21 | D,Exponential,30000,41.7958
22 | D,Log,1000,0.960558
23 | D,Log,5000,24.7134
24 | D,Log,10000,94.2969
25 | D,Log,20000,378.366
26 | D,Log,30000,852.784
27 | D,Cauchy,1000,0.0338152
28 | D,Cauchy,5000,1.01084
29 | D,Cauchy,10000,4.08566
30 | D,Cauchy,20000,17.6364
31 | D,Cauchy,30000,41.2217
32 | D,Power,1000,0.961671
33 | D,Power,5000,24.1544
34 | D,Power,10000,96.2633
35 | D,Power,20000,383.808
36 | D,Power,30000,863.304
37 | D,Wave,1000,0.034339
38 | D,Wave,5000,0.960398
39 | D,Wave,10000,4.14267
40 | D,Wave,20000,17.9007
41 | D,Wave,30000,41.9656
42 | D,Sigmoid,1000,0.0337945
43 | D,Sigmoid,5000,0.901038
44 | D,Sigmoid,10000,4.0042
45 | D,Sigmoid,20000,16.9423
46 | D,Sigmoid,30000,39.9905
47 |
--------------------------------------------------------------------------------
/data/dNDSliceBench.csv:
--------------------------------------------------------------------------------
1 | language,kernel,nitems,time
2 | D,DotProduct,1000,0.0347107
3 | D,DotProduct,5000,0.858201
4 | D,DotProduct,10000,4.01725
5 | D,DotProduct,20000,17.5919
6 | D,DotProduct,30000,41.5271
7 | D,Gaussian,1000,0.0344868
8 | D,Gaussian,5000,0.969041
9 | D,Gaussian,10000,4.25197
10 | D,Gaussian,20000,18.2775
11 | D,Gaussian,30000,42.3535
12 | D,Polynomial,1000,0.0328877
13 | D,Polynomial,5000,0.902159
14 | D,Polynomial,10000,3.99975
15 | D,Polynomial,20000,17.4276
16 | D,Polynomial,30000,41.2833
17 | D,Exponential,1000,0.0341596
18 | D,Exponential,5000,0.954447
19 | D,Exponential,10000,4.09684
20 | D,Exponential,20000,17.6142
21 | D,Exponential,30000,42.0758
22 | D,Log,1000,0.518967
23 | D,Log,5000,13.3401
24 | D,Log,10000,53.1602
25 | D,Log,20000,212.737
26 | D,Log,30000,480.446
27 | D,Cauchy,1000,0.0328838
28 | D,Cauchy,5000,0.959192
29 | D,Cauchy,10000,4.08631
30 | D,Cauchy,20000,17.912
31 | D,Cauchy,30000,42.1615
32 | D,Power,1000,0.507569
33 | D,Power,5000,12.902
34 | D,Power,10000,51.4925
35 | D,Power,20000,205.891
36 | D,Power,30000,464.023
37 | D,Wave,1000,0.0335611
38 | D,Wave,5000,0.906904
39 | D,Wave,10000,4.11685
40 | D,Wave,20000,17.9911
41 | D,Wave,30000,42.3834
42 | D,Sigmoid,1000,0.0326368
43 | D,Sigmoid,5000,0.898398
44 | D,Sigmoid,10000,3.90918
45 | D,Sigmoid,20000,17.2436
46 | D,Sigmoid,30000,41.0899
47 |
--------------------------------------------------------------------------------
/data/juliaBench.csv:
--------------------------------------------------------------------------------
1 | language,kernel,nitems,time
2 | Julia,DotProduct,1000,0.028618017832438152
3 | Julia,DotProduct,5000,0.335107962290446
4 | Julia,DotProduct,10000,2.273205359776815
5 | Julia,DotProduct,20000,12.081494728724161
6 | Julia,DotProduct,30000,29.30661424001058
7 | Julia,Gaussian,1000,0.02318231264750163
8 | Julia,Gaussian,5000,0.36138232549031574
9 | Julia,Gaussian,10000,2.553627332051595
10 | Julia,Gaussian,20000,13.242238680521647
11 | Julia,Gaussian,30000,32.783209005991615
12 | Julia,Polynomial,1000,0.022588332494099934
13 | Julia,Polynomial,5000,0.37053394317626953
14 | Julia,Polynomial,10000,2.513085683186849
15 | Julia,Polynomial,20000,13.059705018997192
16 | Julia,Polynomial,30000,31.536365350087483
17 | Julia,Exponential,1000,0.02099498112996419
18 | Julia,Exponential,5000,0.3473227024078369
19 | Julia,Exponential,10000,2.622964064280192
20 | Julia,Exponential,20000,13.915411392847696
21 | Julia,Exponential,30000,32.7377610206604
22 | Julia,Log,1000,0.6652313073476156
23 | Julia,Log,5000,17.037853320439655
24 | Julia,Log,10000,67.28145058949788
25 | Julia,Log,20000,284.03041768074036
26 | Julia,Log,30000,640.7916380564371
27 | Julia,Cauchy,1000,0.020720958709716797
28 | Julia,Cauchy,5000,0.3785683314005534
29 | Julia,Cauchy,10000,2.387653350830078
30 | Julia,Cauchy,20000,12.377931674321493
31 | Julia,Cauchy,30000,31.607904354731243
32 | Julia,Power,1000,0.6242409547170004
33 | Julia,Power,5000,17.334346691767376
34 | Julia,Power,10000,69.57900404930115
35 | Julia,Power,20000,265.5375280380249
36 | Julia,Power,30000,588.9969596862793
37 | Julia,Wave,1000,0.0548706849416097
38 | Julia,Wave,5000,0.4465626080830892
39 | Julia,Wave,10000,2.4781373341878257
40 | Julia,Wave,20000,13.298714955647787
41 | Julia,Wave,30000,33.43664868672689
42 | Julia,Sigmoid,1000,0.023600339889526367
43 | Julia,Sigmoid,5000,0.35213859875996906
44 | Julia,Sigmoid,10000,2.459144671758016
45 | Julia,Sigmoid,20000,12.380852301915487
46 | Julia,Sigmoid,30000,30.208860715230305
47 |
--------------------------------------------------------------------------------
/data/ndsliceTime.log:
--------------------------------------------------------------------------------
1 | Command being timed: "./ndslice"
2 | User time (seconds): 68998.50
3 | System time (seconds): 36.82
4 | Percent of CPU this job got: 1181%
5 | Elapsed (wall clock) time (h:mm:ss or m:ss): 1:37:24
6 | Average shared text size (kbytes): 0
7 | Average unshared data size (kbytes): 0
8 | Average stack size (kbytes): 0
9 | Average total size (kbytes): 0
10 | Maximum resident set size (kbytes): 20461888
11 | Average resident set size (kbytes): 0
12 | Major (requiring I/O) page faults: 0
13 | Minor (reclaiming a frame) page faults: 15345124
14 | Voluntary context switches: 3961
15 | Involuntary context switches: 1315598
16 | Swaps: 0
17 | File system inputs: 0
18 | File system outputs: 8
19 | Socket messages sent: 0
20 | Socket messages received: 0
21 | Signals delivered: 0
22 | Page size (bytes): 4096
23 | Exit status: 0
24 |
--------------------------------------------------------------------------------
/docs/kernel.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dataPulverizer/KernelMatrixBenchmark/28c40acb02497a3052f2d04689ab9c93fd090de2/docs/kernel.pdf
--------------------------------------------------------------------------------
/fmdata/chapelBench.csv:
--------------------------------------------------------------------------------
1 | language, kernel, nitems, time
2 | Chapel, DotProduct, 1000, 0.00924367
3 | Chapel, DotProduct, 5000, 0.221611
4 | Chapel, DotProduct, 10000, 1.42578
5 | Chapel, DotProduct, 20000, 8.38982
6 | Chapel, DotProduct, 30000, 22.7254
7 | Chapel, Gaussian, 1000, 0.00718267
8 | Chapel, Gaussian, 5000, 0.276599
9 | Chapel, Gaussian, 10000, 1.50735
10 | Chapel, Gaussian, 20000, 8.65949
11 | Chapel, Gaussian, 30000, 24.1533
12 | Chapel, Polynomial, 1000, 0.00828233
13 | Chapel, Polynomial, 5000, 0.261301
14 | Chapel, Polynomial, 10000, 1.52145
15 | Chapel, Polynomial, 20000, 8.84253
16 | Chapel, Polynomial, 30000, 24.1945
17 | Chapel, Exponential, 1000, 0.0122707
18 | Chapel, Exponential, 5000, 0.252101
19 | Chapel, Exponential, 10000, 1.53333
20 | Chapel, Exponential, 20000, 8.68634
21 | Chapel, Exponential, 30000, 23.6449
22 | Chapel, Log, 1000, 0.957172
23 | Chapel, Log, 5000, 23.7193
24 | Chapel, Log, 10000, 94.9719
25 | Chapel, Log, 20000, 380.074
26 | Chapel, Log, 30000, 855.543
27 | Chapel, Cauchy, 1000, 0.008833
28 | Chapel, Cauchy, 5000, 0.255776
29 | Chapel, Cauchy, 10000, 1.51067
30 | Chapel, Cauchy, 20000, 8.53653
31 | Chapel, Cauchy, 30000, 23.9485
32 | Chapel, Power, 1000, 0.959575
33 | Chapel, Power, 5000, 23.7316
34 | Chapel, Power, 10000, 94.9118
35 | Chapel, Power, 20000, 379.592
36 | Chapel, Power, 30000, 854.344
37 | Chapel, Wave, 1000, 0.00928133
38 | Chapel, Wave, 5000, 0.271076
39 | Chapel, Wave, 10000, 1.54933
40 | Chapel, Wave, 20000, 8.69982
41 | Chapel, Wave, 30000, 24.2359
42 | Chapel, Sigmoid, 1000, 0.009228
43 | Chapel, Sigmoid, 5000, 0.249445
44 | Chapel, Sigmoid, 10000, 1.47198
45 | Chapel, Sigmoid, 20000, 8.46401
46 | Chapel, Sigmoid, 30000, 23.3408
--------------------------------------------------------------------------------
/fmdata/dBench.csv:
--------------------------------------------------------------------------------
1 | language,kernel,nitems,time
2 | D,DotProduct,1000,0.00758157
3 | D,DotProduct,5000,0.227808
4 | D,DotProduct,10000,1.28944
5 | D,DotProduct,20000,9.12727
6 | D,DotProduct,30000,26.5607
7 | D,Gaussian,1000,0.0090272
8 | D,Gaussian,5000,0.265671
9 | D,Gaussian,10000,1.42508
10 | D,Gaussian,20000,9.2337
11 | D,Gaussian,30000,27.3075
12 | D,Polynomial,1000,0.0068673
13 | D,Polynomial,5000,0.23578
14 | D,Polynomial,10000,1.31602
15 | D,Polynomial,20000,8.99742
16 | D,Polynomial,30000,26.939
17 | D,Exponential,1000,0.008833
18 | D,Exponential,5000,0.27474
19 | D,Exponential,10000,1.4347
20 | D,Exponential,20000,9.3266
21 | D,Exponential,30000,27.2397
22 | D,Log,1000,0.966266
23 | D,Log,5000,24.1955
24 | D,Log,10000,96.2422
25 | D,Log,20000,384.62
26 | D,Log,30000,864.477
27 | D,Cauchy,1000,0.00608963
28 | D,Cauchy,5000,0.212442
29 | D,Cauchy,10000,1.25913
30 | D,Cauchy,20000,8.69336
31 | D,Cauchy,30000,26.7323
32 | D,Power,1000,0.959982
33 | D,Power,5000,24.1416
34 | D,Power,10000,96.0607
35 | D,Power,20000,384.133
36 | D,Power,30000,854.033
37 | D,Wave,1000,0.0075854
38 | D,Wave,5000,0.263024
39 | D,Wave,10000,1.57122
40 | D,Wave,20000,10.0207
41 | D,Wave,30000,29.9925
42 | D,Sigmoid,1000,0.0056798
43 | D,Sigmoid,5000,0.251254
44 | D,Sigmoid,10000,1.47337
45 | D,Sigmoid,20000,8.76444
46 | D,Sigmoid,30000,28.8495
47 |
--------------------------------------------------------------------------------
/fmdata/juliaBench.csv:
--------------------------------------------------------------------------------
1 | language,kernel,nitems,time
2 | Julia,DotProduct,1000,0.028934717178344727
3 | Julia,DotProduct,5000,0.3384213447570801
4 | Julia,DotProduct,10000,2.285879373550415
5 | Julia,DotProduct,20000,12.012665033340454
6 | Julia,DotProduct,30000,29.216094970703125
7 | Julia,Gaussian,1000,0.022055943806966145
8 | Julia,Gaussian,5000,0.3577253818511963
9 | Julia,Gaussian,10000,2.520039637883504
10 | Julia,Gaussian,20000,13.040017366409302
11 | Julia,Gaussian,30000,32.2067592938741
12 | Julia,Polynomial,1000,0.02389367421468099
13 | Julia,Polynomial,5000,0.3974326451619466
14 | Julia,Polynomial,10000,2.538082679112752
15 | Julia,Polynomial,20000,13.033354997634888
16 | Julia,Polynomial,30000,31.670559326807656
17 | Julia,Exponential,1000,0.02152903874715169
18 | Julia,Exponential,5000,0.36496035257975257
19 | Julia,Exponential,10000,2.6344366868336992
20 | Julia,Exponential,20000,13.753678719202677
21 | Julia,Exponential,30000,33.27915596961975
22 | Julia,Log,1000,0.6249533494313557
23 | Julia,Log,5000,15.74955932299296
24 | Julia,Log,10000,61.271198670069374
25 | Julia,Log,20000,247.89898459116617
26 | Julia,Log,30000,565.0516930421193
27 | Julia,Cauchy,1000,0.020629008611043293
28 | Julia,Cauchy,5000,0.3662620385487874
29 | Julia,Cauchy,10000,2.3178850015004473
30 | Julia,Cauchy,20000,12.884659051895142
31 | Julia,Cauchy,30000,31.82168634732564
32 | Julia,Power,1000,0.6112387180328369
33 | Julia,Power,5000,16.487519025802612
34 | Julia,Power,10000,61.17924292882283
35 | Julia,Power,20000,247.75332935651141
36 | Julia,Power,30000,559.8640073140461
37 | Julia,Wave,1000,0.05431739489237467
38 | Julia,Wave,5000,0.4737757047017415
39 | Julia,Wave,10000,2.6081150372823076
40 | Julia,Wave,20000,13.91323169072469
41 | Julia,Wave,30000,33.91305796305338
42 | Julia,Sigmoid,1000,0.021110375722249348
43 | Julia,Sigmoid,5000,0.32453298568725586
44 | Julia,Sigmoid,10000,2.39143697420756
45 | Julia,Sigmoid,20000,12.46296731630961
46 | Julia,Sigmoid,30000,30.015175342559814
47 |
--------------------------------------------------------------------------------
/julia/KernelMatrix.jl:
--------------------------------------------------------------------------------
1 | using Base.Threads: @threads, @spawn
2 | using Random: shuffle!
3 | using LinearAlgebra: Symmetric
4 |
5 | # Kernel Function Types
6 | #======================#
7 | abstract type AbstractKernel{T <: AbstractFloat} end
8 |
9 | struct DotProduct{T} <: AbstractKernel{T} end
10 | @inline function kernel(K::DotProduct{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T,N}
11 | dist = T(0)
12 | m = length(x)
13 | @inbounds @simd for i in 1:m
14 | dist += x[i] * y[i]
15 | end
16 | return dist
17 | end
18 |
19 | struct Gaussian{T} <: AbstractKernel{T}
20 | theta::T
21 | end
22 | @inline function kernel(K::Gaussian{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T,N}
23 | dist::T = T(0)
24 | tmp::T = T(0)
25 | m = length(x)
26 | @inbounds @simd for i in 1:m
27 | tmp = x[i] - y[i]
28 | dist += tmp * tmp
29 | end
30 | return exp(-sqrt(dist)/K.theta)
31 | end
32 |
33 | struct Polynomial{T} <: AbstractKernel{T}
34 | d::T
35 | offset::T
36 | end
37 | @inline function kernel(K::Polynomial{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T, N}
38 | dist::T = T(0)
39 | m = length(x)
40 | @inbounds @simd for i = 1:m
41 | dist += x[i] * y[i]
42 | end
43 | return (dist + K.offset)^K.d
44 | end
45 |
46 | struct Exponential{T} <: AbstractKernel{T}
47 | theta::T
48 | end
49 | @inline function kernel(K::Exponential{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T, N}
50 | dist::T = T(0)
51 | m = length(x)
52 | @inbounds @simd for i in 1:m
53 | dist -= abs(x[i] - y[i])
54 | end
55 | return exp(dist/K.theta)
56 | end
57 |
58 | struct Log{T} <: AbstractKernel{T}
59 | beta::T
60 | end
61 | @inline function kernel(K::Log{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T, N}
62 | dist::T = T(0)
63 | m = length(x)
64 | @inbounds @simd for i in 1:m
65 | dist += abs(x[i] - y[i])^K.beta
66 | end
67 | dist ^= (1/K.beta)
68 | return -log(1 + dist)
69 | end
70 |
71 | struct Cauchy{T} <: AbstractKernel{T}
72 | theta::T
73 | end
74 | @inline function kernel(K::Cauchy{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T, N}
75 | dist::T = T(0)
76 | tmp::T = T(0)
77 | m = length(x)
78 | @inbounds @simd for i in 1:m
79 | tmp = x[i] - y[i]
80 | dist += tmp*tmp
81 | end
82 | dist = sqrt(dist)/K.theta
83 | return 1/(1 + dist)
84 | end
85 |
86 | struct Power{T} <: AbstractKernel{T}
87 | beta::T
88 | end
89 | @inline function kernel(K::Power{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T, N}
90 | dist::T = T(0)
91 | m = length(x)
92 | @inbounds @simd for i in 1:m
93 | dist += abs(x[i] - y[i])^K.beta
94 | end
95 | return -dist^(1/K.beta)
96 | end
97 |
98 | struct Wave{T} <: AbstractKernel{T}
99 | theta::T
100 | end
101 | @inline function kernel(K::Wave{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T, N}
102 | dist::T = T(0)
103 | m = length(x)
104 | @inbounds @simd for i in 1:m
105 | dist += abs(x[i] - y[i])
106 | end
107 | tmp = K.theta/dist; # on the diagonal (i == j) dist == 0, so tmp is Inf and tmp*sin(1/tmp) gives NaN under IEEE rules
108 | return tmp*sin(1/tmp);
109 | end
110 |
111 | struct Sigmoid{T} <: AbstractKernel{T}
112 | beta0::T
113 | beta1::T
114 | end
115 | @inline function kernel(K::Sigmoid{T}, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T, N}
116 | dist::T = T(0)
117 | m = length(x)
118 | @inbounds @simd for i = 1:m
119 | dist += x[i] * y[i]
120 | end
121 | return tanh(K.beta0 * dist + K.beta1)
122 | end
123 |
124 | #=======================================================================================#
125 |
126 | function calculateKernelMatrix(Kernel::AbstractKernel{T}, data::AbstractArray{T, N}) where {T, N}
127 | n = size(data)[2]
128 | mat::Array{T, 2} = zeros(T, n, n)
129 | @threads for j in 1:n
130 | @views for i in j:n
131 | mat[i,j] = kernel(Kernel, data[:, i], data[:, j])
132 | end
133 | end
134 | return Symmetric(mat, :L)
135 | end
136 |
--------------------------------------------------------------------------------
/julia/fmtime.log:
--------------------------------------------------------------------------------
1 | Command being timed: "julia --math-mode=fast script.jl fmdata true"
2 | User time (seconds): 45388.59
3 | System time (seconds): 31.50
4 | Percent of CPU this job got: 716%
5 | Elapsed (wall clock) time (h:mm:ss or m:ss): 1:45:36
6 | Average shared text size (kbytes): 0
7 | Average unshared data size (kbytes): 0
8 | Average stack size (kbytes): 0
9 | Average total size (kbytes): 0
10 | Maximum resident set size (kbytes): 7489840
11 | Average resident set size (kbytes): 0
12 | Major (requiring I/O) page faults: 1
13 | Minor (reclaiming a frame) page faults: 38016102
14 | Voluntary context switches: 1583
15 | Involuntary context switches: 437027
16 | Swaps: 0
17 | File system inputs: 96
18 | File system outputs: 8
19 | Socket messages sent: 0
20 | Socket messages received: 0
21 | Signals delivered: 0
22 | Page size (bytes): 4096
23 | Exit status: 0
24 |
25 |
--------------------------------------------------------------------------------
/julia/script.jl:
--------------------------------------------------------------------------------
1 | include("KernelMatrix.jl")
2 | using DelimitedFiles: writedlm;
3 | using InteractiveUtils: @code_warntype;
4 |
5 | const folder, _verbose = ARGS
6 | const verbose = _verbose == "true";
7 |
8 | function bench(Kernel::AbstractKernel{T}, n::Array{Int64, 1}) where {T}
9 | times::Array{Float64, 1} = zeros(Float64, length(n))
10 | for i in 1:length(n)
11 | _times::Array{Float64, 1} = zeros(Float64, 3)
12 | data = rand(T, (784, n[i]))
13 | for j in 1:3
14 | t1 = time()
15 | mat = calculateKernelMatrix(Kernel, data);
16 | t2 = time()
17 | _times[j] = t2 - t1
18 | end
19 | times[i] = (_times[1] + _times[2] + _times[3])/3
20 | if verbose
21 | println("Average time for n = ", n[i], ", ", times[i], " seconds.")
22 | println("Detailed times: ", _times);
23 | end
24 | end
25 | return times
26 | end
27 |
28 | function precompileKernel(Kernel::AbstractKernel{T}, n::Array{Int64, 1}) where {T}
29 | precompile(kernel, (typeof(Kernel), Array{T, 1}, Array{T, 1}))
30 | precompile(calculateKernelMatrix, (typeof(Kernel), Array{T, 2}))
31 | precompile(bench, (typeof(Kernel), Array{Int64, 1}))
32 |
33 | times = bench(Kernel, n)
34 | if verbose
35 | println("\n\nBenchmark for kernel: ", repr(Kernel), "\ntimes: ", times)
36 | end
37 |
38 | return (n, times)
39 | end
40 |
41 | function runKernelBenchmarks(kernels::NTuple{N, AbstractKernel{T}}, n::Array{Int64, 1}) where {N, T}
42 | results = Array{Tuple{Array{Int64, 1}, Array{Float64, 1}}, 1}(undef, length(kernels))
43 | for i in 1:length(results)
44 | if verbose # to check types are known at compilation
45 | @code_warntype precompileKernel(kernels[i], n)
46 | end
47 | results[i] = precompileKernel(kernels[i], n)
48 | end
49 | return results
50 | end
51 |
52 | function main(::Type{T}) where {T}
53 | # n = [100, 500, 1000]
54 | n = [1000, 5000, 10_000, 20_000, 30_000];
55 | kernels = (DotProduct{T}(), Gaussian{T}(1), Polynomial{T}(2.5, 1),
56 | Exponential{T}(1), Log{T}(3), Cauchy{T}(1),
57 | Power{T}(2.5), Wave{T}(1), Sigmoid{T}(1, 1));
58 | kernelNames = ["DotProduct", "Gaussian", "Polynomial",
59 | "Exponential", "Log", "Cauchy",
60 | "Power", "Wave", "Sigmoid"];
61 | outputs = runKernelBenchmarks(kernels, n)
62 |
63 | table = Array{String, 2}(undef, (length(n)*length(kernels) + 1, 4))
64 | table[1, :] = ["language", "kernel", "nitems", "time"]
65 | while true
66 | k = 2
67 | for i in 1:length(kernelNames)
68 | tmp = ["Julia", kernelNames[i], "", ""]
69 | for j in 1:length(n)
70 | tmp[3] = repr(outputs[i][1][j])
71 | tmp[4] = repr(outputs[i][2][j])
72 | table[k, :] = tmp
73 | k += 1
74 | end
75 | end
76 | if k > size(table)[1]
77 | break
78 | end
79 | end
80 |
81 | writedlm("../" * folder * "/juliaBench.csv", table, ',')
82 |
83 | return
84 | end
85 |
86 | #=
87 | To run:
88 | /usr/bin/time -v julia script.jl data true
89 | =#
90 | main(Float32)
91 |
--------------------------------------------------------------------------------
/julia/script.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 | # Uses "regular" mathematical functions
3 | /usr/bin/time -v julia script.jl data true
4 | # Uses Fast Math
5 | /usr/bin/time -v julia --math-mode=fast script.jl fmdata true
6 |
--------------------------------------------------------------------------------
/julia/time.log:
--------------------------------------------------------------------------------
1 | Command being timed: "julia script.jl data true"
2 | User time (seconds): 49163.19
3 | System time (seconds): 32.08
4 | Percent of CPU this job got: 717%
5 | Elapsed (wall clock) time (h:mm:ss or m:ss): 1:54:18
6 | Average shared text size (kbytes): 0
7 | Average unshared data size (kbytes): 0
8 | Average stack size (kbytes): 0
9 | Average total size (kbytes): 0
10 | Maximum resident set size (kbytes): 7501592
11 | Average resident set size (kbytes): 0
12 | Major (requiring I/O) page faults: 804
13 | Minor (reclaiming a frame) page faults: 38021184
14 | Voluntary context switches: 2657
15 | Involuntary context switches: 477363
16 | Swaps: 0
17 | File system inputs: 368240
18 | File system outputs: 8
19 | Socket messages sent: 0
20 | Socket messages received: 0
21 | Signals delivered: 0
22 | Page size (bytes): 4096
23 | Exit status: 0
24 |
25 |
26 |
--------------------------------------------------------------------------------
/ndslice/dub.json:
--------------------------------------------------------------------------------
1 | {
2 | "authors": [
3 | "Dr Chibisi Chima-Okereke"
4 | ],
5 | "copyright": "Copyright © 2020, Dr Chibisi Chima-Okereke",
6 | "dependencies": {
7 | "mir-algorithm": "~>3.8.12",
8 | "mir-random": "~>2.2.14"
9 | },
10 | "description": "Kernel Matrix Calculations using D's Mir Algorithm Package",
11 | "license": "MIT",
12 | "name": "ndslice",
13 | "dflags": ["--boundscheck=off", "-mcpu=native"],
14 | "toolchainRequirements": {
15 | "dmd": ">=2.090.1",
16 | "gdc": "no",
17 | "ldc2": ">=1.18.0"
18 | },
19 | "targetType": "executable"
20 | }
--------------------------------------------------------------------------------
/ndslice/source/app.d:
--------------------------------------------------------------------------------
1 | import ndslice.kernels;
2 |
3 | import mir.math.sum;
4 | import mir.ndslice;
5 | import mir.random.algorithm: randomSlice;
6 | import mir.random.variable: UniformVariable;
7 |
8 | import std.conv: to;
9 | import std.datetime.stopwatch: AutoStart, StopWatch;
10 | import std.meta: AliasSeq;
11 | import std.stdio: File, writeln;
12 | import std.typecons: tuple, Tuple;
13 |
14 | /**
15 | To compile and run:
16 | dub run --compiler=ldc2 --build=release
17 | */
18 |
19 | auto bench(alias K, T)(K!(T) kernel, long[] n, bool verbose = true)
20 | {
21 | auto times = new double[n.length];
22 | auto sw = StopWatch(AutoStart.no);
23 | foreach(i; 0..n.length)
24 | {
25 | double[3] _times;
26 | auto data = UniformVariable!T(0, 1).randomSlice(n[i], 784L);
27 | foreach(ref t; _times[])
28 | {
29 | sw.start();
30 | auto mat = calculateKernelMatrix!(K, T)(kernel, data);
31 | sw.stop();
32 | t = sw.peek.total!"nsecs" / 1_000_000_000.0; // nanoseconds to seconds
33 | sw.reset();
34 | }
35 | times[i] = sum!"naive"(_times[]) / _times.length;
36 | if(verbose)
37 | {
38 | writeln("Average time for n = ", n[i], ", ", times[i], " seconds.");
39 | writeln("Detailed times: ", _times, "\n");
40 | }
41 | }
42 | return tuple(n, times);
43 | }
44 |
45 | auto runKernelBenchmark(KS)(KS kernels, long[] n, bool verbose = true)
46 | {
47 | // typeof() does not evaluate its argument, so this only fixes the result type.
48 | alias R = typeof(bench(kernels[0], n, false));
49 | R[kernels.length] results;
50 | // Benchmark every kernel (including the first) so each one is announced when verbose.
51 | static foreach(i; 0..kernels.length)
52 | {
53 | if(verbose)
54 | {
55 | writeln("Running benchmarks for ", kernels[i]);
56 | }
57 | results[i] = bench(kernels[i], n, verbose);
58 | }
59 | return results;
60 | }
61 |
62 | void writeRow(File file, string[] row)
63 | {
64 | string line = "";
65 | foreach(i; 0..(row.length - 1))
66 | line ~= row[i] ~ ",";
67 | line ~= row[row.length - 1] ~ "\n";
68 | file.write(line);
69 | return;
70 | }
71 |
72 |
73 | void runAllKernelBenchmarks(T = float)(bool verbose = true)
74 | {
75 | auto kernels = tuple(DotProduct!(T)(), Gaussian!(T)(1), Polynomial!(T)(2.5f, 1),
76 | Exponential!(T)(1), Log!(T)(3), Cauchy!(T)(1),
77 | Power!(T)(2.5f), Wave!(T)(1), Sigmoid!(T)(1, 1));
78 | auto kernelNames = ["DotProduct", "Gaussian", "Polynomial",
79 | "Exponential", "Log", "Cauchy",
80 | "Power", "Wave", "Sigmoid"];
81 | //long[] n = [100L, 500L, 1000L];
82 | long[] n = [1000L, 5000L, 10_000L, 20_000L, 30_000L];
83 | auto results = runKernelBenchmark(kernels, n, verbose);
84 |
85 | auto table = new string[][] (n.length * kernels.length + 1, 4);
86 | table[0][] = ["language", "kernel", "nitems", "time"];
87 | // One row per (kernel, n) pair, after the header row.
88 | size_t k = 1;
89 | foreach(i; 0..kernels.length)
90 | {
91 | foreach(j; 0..n.length)
92 | {
93 | table[k][] = ["D", kernelNames[i],
94 | to!(string)(results[i][0][j]),
95 | to!(string)(results[i][1][j])];
96 | k += 1;
97 | }
98 | }
99 | auto file = File("../data/dNDSliceBench.csv", "w");
100 | foreach(row; table)
101 | file.writeRow(row);
102 |
103 | writeln("table: ", table);
104 | }
105 |
106 | void main()
107 | {
108 | runAllKernelBenchmarks();
109 | }
110 |
--------------------------------------------------------------------------------
/ndslice/source/ndslice/kernels.d:
--------------------------------------------------------------------------------
1 | module ndslice.kernels;
2 | import core.stdc.tgmath: tanh;
3 | import mir.algorithm.iteration;
4 | import mir.math.common;
5 | import mir.ndslice;
6 |
7 | import std.parallelism;
8 |
9 | /**
10 | Kernel Function Types:
11 | */
12 | struct DotProduct(T)
13 | {
14 | public:
15 | this(T _nothing)
16 | {}
17 | T opCall(Slice!(T*) x, Slice!(T*) y) const
18 | {
19 | T dist = 0;
20 | auto m = x.length;
21 | for(size_t i = 0; i < m; ++i)
22 | {
23 | dist += x[i] * y[i];
24 | }
25 | return dist;
26 | }
27 | }
28 |
29 | struct Gaussian(T)
30 | {
31 | private:
32 | T theta;
33 | public:
34 | this(T _theta)
35 | {
36 | theta = _theta;
37 | }
38 | T opCall(Slice!(T*) x, Slice!(T*) y) const
39 | {
40 | T dist = 0;
41 | auto m = x.length;
42 | for(size_t i = 0; i < m; ++i)
43 | {
44 | auto tmp = x[i] - y[i];
45 | dist += tmp * tmp;
46 | }
47 | return exp(-sqrt(dist)/theta);
48 | }
49 | }
50 |
51 | struct Polynomial(T)
52 | {
53 | private:
54 | T d;
55 | T offset;
56 | public:
57 | this(T _d, T _offset)
58 | {
59 | d = _d;
60 | offset = _offset;
61 | }
62 | T opCall(Slice!(T*) x, Slice!(T*) y) const
63 | {
64 | T dist = 0;
65 | auto m = x.length;
66 | for(size_t i = 0; i < m; ++i)
67 | {
68 | dist += x[i] * y[i];
69 | }
70 | return pow(dist + offset, d);
71 | }
72 | }
73 |
74 | struct Exponential(T)
75 | {
76 | private:
77 | T theta;
78 | public:
79 | this(T _theta)
80 | {
81 | theta = _theta;
82 | }
83 | T opCall(Slice!(T*) x, Slice!(T*) y) const
84 | {
85 | T dist = 0;
86 | auto m = x.length;
87 | for(size_t i = 0; i < m; ++i)
88 | {
89 | dist -= fabs(x[i] - y[i]);
90 | }
91 | return exp(dist/theta);
92 | }
93 | }
94 |
95 | struct Log(T)
96 | {
97 | private:
98 | T beta;
99 | public:
100 | this(T _beta)
101 | {
102 | beta = _beta;
103 | }
104 | T opCall(Slice!(T*) x, Slice!(T*) y) const
105 | {
106 | T dist = 0;
107 | auto m = x.length;
108 | for(size_t i = 0; i < m; ++i)
109 | {
110 | dist += pow(fabs(x[i] - y[i]), beta);
111 | }
112 | dist = pow(dist, 1/beta);
113 | return -log(1 + dist);
114 | }
115 | }
116 |
117 | struct Cauchy(T)
118 | {
119 | private:
120 | T theta;
121 | public:
122 | this(T _theta)
123 | {
124 | theta = _theta;
125 | }
126 | T opCall(Slice!(T*) x, Slice!(T*) y) const
127 | {
128 | T dist = 0;
129 | auto m = x.length;
130 | for(size_t i = 0; i < m; ++i)
131 | {
132 | auto tmp = x[i] - y[i];
133 | dist += tmp * tmp;
134 | }
135 | dist = sqrt(dist)/theta;
136 | return 1/(1 + dist);
137 | }
138 | }
139 |
140 | struct Power(T)
141 | {
142 | private:
143 | T beta;
144 | public:
145 | this(T _beta)
146 | {
147 | beta = _beta;
148 | }
149 | T opCall(Slice!(T*) x, Slice!(T*) y) const
150 | {
151 | T dist = 0;
152 | auto m = x.length;
153 | for(size_t i = 0; i < m; ++i)
154 | {
155 | dist += pow(fabs(x[i] - y[i]), beta);
156 | }
157 | return -pow(dist, 1/beta);
158 | }
159 | }
160 |
161 | struct Wave(T)
162 | {
163 | private:
164 | T theta;
165 | public:
166 | this(T _theta)
167 | {
168 | theta = _theta;
169 | }
170 | T opCall(Slice!(T*) x, Slice!(T*) y) const
171 | {
172 | T dist = 0;
173 | auto m = x.length;
174 | for(size_t i = 0; i < m; ++i)
175 | {
176 | dist += fabs(x[i] - y[i]);
177 | }
178 | auto tmp = theta/dist;
179 | return tmp*sin(1/tmp);
180 | }
181 | }
182 |
183 | struct Sigmoid(T)
184 | {
185 | private:
186 | T beta0;
187 | T beta1;
188 | public:
189 | this(T _beta0, T _beta1)
190 | {
191 | beta0 = _beta0;
192 | beta1 = _beta1;
193 | }
194 | T opCall(Slice!(T*) x, Slice!(T*) y) const
195 | {
196 | T dist = 0;
197 | auto m = x.length;
198 | for(size_t i = 0; i < m; ++i)
199 | {
200 | dist += x[i] * y[i];
201 | }
202 | return tanh(beta0 * dist + beta1);
203 | }
204 | }
205 |
206 | /************************************************************************************/
207 |
208 | auto calculateKernelMatrix(alias K, T)(K!(T) kernel, Slice!(T*, 2) data)
209 | {
210 | size_t n = data.length!0;
211 | auto mat = uninitSlice!(T)(n, n);
212 | foreach(j, arrj; taskPool.parallel(data))
213 | foreach (i; j .. n)
214 | mat[j, i] = mat[i, j] = kernel(data[i], arrj);
215 | return mat;
216 | }
217 |
--------------------------------------------------------------------------------
/script.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 | cd d
3 | printf "\n#============== Running D Benchmark ==============#\n"
4 | ./script.sh
5 | cd ../chapel
6 | printf "\n#============== Running Chapel Benchmark ==============#\n"
7 | ./script.sh
8 | cd ../julia
9 | printf "\n#============== Running Julia Benchmark ==============#\n"
10 | ./script.sh
11 | printf "\n#============== Running Mir NDSlice Benchmark ==============#\n"
12 | cd ../ndslice
13 | dub build --compiler=ldc2 --build=release --force
14 | /usr/bin/time -v ./ndslice
15 |
--------------------------------------------------------------------------------