├── ArrayFire Tutorial.ipynb
├── BlackScholes.ipynb
├── ComputeFramework - logistic regression.ipynb
├── ComputeFramework - out-of-core (1).ipynb
├── ComputeFramework - out-of-core.ipynb
├── JuliaCon2016_Basics.ipynb
├── JuliaParallelGlossary.pdf
├── LICENSE
├── README.md
├── WorkshopDArray.ipynb
├── WorkshopMPI.ipynb
└── src
    └── blackscholes.jl
/ArrayFire Tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Introduction to `ArrayFire.jl`\n",
8 | "\n",
9 | "`ArrayFire.jl` is a library for easy GPU computing in Julia. It wraps the library `arrayfire` for Julia. \n",
10 | "\n",
11 | "## What's GPU computing?\n",
12 | "GPU computing is a new frontier of scientific computing. Scientists and engineers can accelerate their code by using special pieces of hardware on their systems called accelerators. `ArrayFire.jl` lets you harness the power of the GPU on your system.\n",
13 | "\n",
14 | "It has several advantages:\n",
15 | "\n",
16 | "* Versatile library with accelerated kernels\n",
17 | "* Easy Julian interface \n",
18 | "* Applications can easily be accelerated with little or no code changes\n",
19 | "\n",
20 | "This is a basic tutorial on how to use the package, and a gentle introduction to the API. \n",
21 | "\n",
22 | "First let's load the library."
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": 1,
28 | "metadata": {
29 | "collapsed": true
30 | },
31 | "outputs": [],
32 | "source": [
33 | "using ArrayFire"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "Get some basic information about the device hardware you're using and the `ArrayFire` version."
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 2,
46 | "metadata": {
47 | "collapsed": false
48 | },
49 | "outputs": [
50 | {
51 | "name": "stdout",
52 | "output_type": "stream",
53 | "text": [
54 | "ArrayFire v3.3.2 (CUDA, 64-bit Linux, build f65dd97)\n",
55 | "Platform: CUDA Toolkit 7.5, Driver: 352.93\n",
56 | "[0] GRID K520, 4096 MB, CUDA Compute 3.0\n"
57 | ]
58 | }
59 | ],
60 | "source": [
61 | "AFInfo()"
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "## Creating Arrays on the GPU"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "Create an array in Julia. This array lives in host (CPU) memory."
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": 3,
81 | "metadata": {
82 | "collapsed": false
83 | },
84 | "outputs": [
85 | {
86 | "data": {
87 | "text/plain": [
88 | "10x10 Array{Float64,2}:\n",
89 | " 0.850648 0.509599 0.551742 … 0.142463 0.262445 0.997387 \n",
90 | " 0.404588 0.959807 0.825402 0.668739 0.526881 0.00230886\n",
91 | " 0.740777 0.679432 0.632687 0.668384 0.943771 0.702737 \n",
92 | " 0.936116 0.145503 0.391174 0.892435 0.368432 0.690633 \n",
93 | " 0.463405 0.835053 0.934286 0.810642 0.272273 0.690738 \n",
94 | " 0.0427588 0.120828 0.59832 … 0.543716 0.691998 0.572092 \n",
95 | " 0.860077 0.646616 0.405247 0.152738 0.400603 0.580005 \n",
96 | " 0.347056 0.324015 0.977277 0.91631 0.389837 0.951371 \n",
97 | " 0.0446907 0.727383 0.119053 0.0804144 0.275027 0.784798 \n",
98 | " 0.779985 0.912716 0.0238783 0.099454 0.32698 0.351855 "
99 | ]
100 | },
101 | "execution_count": 3,
102 | "metadata": {},
103 | "output_type": "execute_result"
104 | }
105 | ],
106 | "source": [
107 | "a = rand(10,10)"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "Let us now transfer this to the GPU. The interface to arrays on the GPU is `AFArray`. Call the constructor on this Array. "
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": 4,
120 | "metadata": {
121 | "collapsed": false
122 | },
123 | "outputs": [
124 | {
125 | "data": {
126 | "text/plain": [
127 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
128 | " 0.850648 0.509599 0.551742 … 0.142463 0.262445 0.997387 \n",
129 | " 0.404588 0.959807 0.825402 0.668739 0.526881 0.00230886\n",
130 | " 0.740777 0.679432 0.632687 0.668384 0.943771 0.702737 \n",
131 | " 0.936116 0.145503 0.391174 0.892435 0.368432 0.690633 \n",
132 | " 0.463405 0.835053 0.934286 0.810642 0.272273 0.690738 \n",
133 | " 0.0427588 0.120828 0.59832 … 0.543716 0.691998 0.572092 \n",
134 | " 0.860077 0.646616 0.405247 0.152738 0.400603 0.580005 \n",
135 | " 0.347056 0.324015 0.977277 0.91631 0.389837 0.951371 \n",
136 | " 0.0446907 0.727383 0.119053 0.0804144 0.275027 0.784798 \n",
137 | " 0.779985 0.912716 0.0238783 0.099454 0.32698 0.351855 "
138 | ]
139 | },
140 | "execution_count": 4,
141 | "metadata": {},
142 | "output_type": "execute_result"
143 | }
144 | ],
145 | "source": [
146 | "ad = AFArray(a)"
147 | ]
148 | },
149 | {
150 | "cell_type": "markdown",
151 | "metadata": {},
152 | "source": [
153 | "_**Note**: You can see the contents of the GPU array here because the notebook performs an implicit memory transfer from device to host in order to display it. This is only for interactivity and won't happen in a script, so real applications don't perform these unnecessary transfers._"
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {},
159 | "source": [
160 | "You can also generate random numbers directly on the GPU. "
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": 5,
166 | "metadata": {
167 | "collapsed": false
168 | },
169 | "outputs": [
170 | {
171 | "data": {
172 | "text/plain": [
173 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
174 | " 0.438451 0.508414 0.655754 0.139302 … 0.109688 0.794316 0.762549\n",
175 | " 0.460365 0.65455 0.0453718 0.350453 0.818407 0.293226 0.81892 \n",
176 | " 0.250215 0.512604 0.41461 0.138603 0.897313 0.194653 0.568026\n",
177 | " 0.494744 0.2643 0.0572878 0.745912 0.771221 0.905755 0.181485\n",
178 | " 0.0530111 0.0519806 0.081616 0.0774803 0.652292 0.111517 0.184462\n",
179 | " 0.337699 0.578997 0.105665 0.0283375 … 0.85391 0.977373 0.194457\n",
180 | " 0.396763 0.385556 0.800571 0.450337 0.423468 0.0126922 0.786332\n",
181 | " 0.874419 0.908215 0.691934 0.684978 0.955083 0.0922648 0.265172\n",
182 | " 0.482167 0.64162 0.93146 0.378957 0.336171 0.924614 0.281818\n",
183 | " 0.0428398 0.283399 0.952571 0.24779 0.137669 0.419587 0.353466"
184 | ]
185 | },
186 | "execution_count": 5,
187 | "metadata": {},
188 | "output_type": "execute_result"
189 | }
190 | ],
191 | "source": [
192 | "bd = rand(AFArray{Float64}, 10, 10)"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Let us now transfer this back to the CPU. You can call the `Array` constructor."
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 6,
205 | "metadata": {
206 | "collapsed": false
207 | },
208 | "outputs": [
209 | {
210 | "data": {
211 | "text/plain": [
212 | "10x10 Array{Float64,2}:\n",
213 | " 0.438451 0.508414 0.655754 0.139302 … 0.109688 0.794316 0.762549\n",
214 | " 0.460365 0.65455 0.0453718 0.350453 0.818407 0.293226 0.81892 \n",
215 | " 0.250215 0.512604 0.41461 0.138603 0.897313 0.194653 0.568026\n",
216 | " 0.494744 0.2643 0.0572878 0.745912 0.771221 0.905755 0.181485\n",
217 | " 0.0530111 0.0519806 0.081616 0.0774803 0.652292 0.111517 0.184462\n",
218 | " 0.337699 0.578997 0.105665 0.0283375 … 0.85391 0.977373 0.194457\n",
219 | " 0.396763 0.385556 0.800571 0.450337 0.423468 0.0126922 0.786332\n",
220 | " 0.874419 0.908215 0.691934 0.684978 0.955083 0.0922648 0.265172\n",
221 | " 0.482167 0.64162 0.93146 0.378957 0.336171 0.924614 0.281818\n",
222 | " 0.0428398 0.283399 0.952571 0.24779 0.137669 0.419587 0.353466"
223 | ]
224 | },
225 | "execution_count": 6,
226 | "metadata": {},
227 | "output_type": "execute_result"
228 | }
229 | ],
230 | "source": [
231 | "b = Array(bd)"
232 | ]
233 | },
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "## Simple Operations"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {},
244 | "source": [
245 | "`ArrayFire.jl` is designed to mimic Base Julia, so if you're familiar with Julia's function interfaces, the API should feel comfortable. Feel free to step through the following functions. For a list of supported functions, check the [README](https://github.com/JuliaComputing/ArrayFire.jl)."
246 | ]
247 | },
248 | {
249 | "cell_type": "markdown",
250 | "metadata": {},
251 | "source": [
252 | "### Arithmetic Operations"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": 7,
258 | "metadata": {
259 | "collapsed": false
260 | },
261 | "outputs": [
262 | {
263 | "data": {
264 | "text/plain": [
265 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
266 | " 1.85065 1.5096 1.55174 1.50719 … 1.19936 1.14246 1.26245 1.99739\n",
267 | " 1.40459 1.95981 1.8254 1.09119 1.96278 1.66874 1.52688 1.00231\n",
268 | " 1.74078 1.67943 1.63269 1.8859 1.73551 1.66838 1.94377 1.70274\n",
269 | " 1.93612 1.1455 1.39117 1.22593 1.14665 1.89244 1.36843 1.69063\n",
270 | " 1.46341 1.83505 1.93429 1.26774 1.24486 1.81064 1.27227 1.69074\n",
271 | " 1.04276 1.12083 1.59832 1.92261 … 1.77788 1.54372 1.692 1.57209\n",
272 | " 1.86008 1.64662 1.40525 1.09917 1.25007 1.15274 1.4006 1.58 \n",
273 | " 1.34706 1.32402 1.97728 1.98376 1.05109 1.91631 1.38984 1.95137\n",
274 | " 1.04469 1.72738 1.11905 1.36728 1.39915 1.08041 1.27503 1.7848 \n",
275 | " 1.77999 1.91272 1.02388 1.17785 1.95372 1.09945 1.32698 1.35186"
276 | ]
277 | },
278 | "execution_count": 7,
279 | "metadata": {},
280 | "output_type": "execute_result"
281 | }
282 | ],
283 | "source": [
284 | "ad + 1"
285 | ]
286 | },
287 | {
288 | "cell_type": "code",
289 | "execution_count": 8,
290 | "metadata": {
291 | "collapsed": false
292 | },
293 | "outputs": [
294 | {
295 | "data": {
296 | "text/plain": [
297 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
298 | " 0.425324 0.254799 0.275871 … 0.0712314 0.131223 0.498694 \n",
299 | " 0.202294 0.479904 0.412701 0.334369 0.263441 0.00115443\n",
300 | " 0.370388 0.339716 0.316344 0.334192 0.471885 0.351369 \n",
301 | " 0.468058 0.0727516 0.195587 0.446218 0.184216 0.345316 \n",
302 | " 0.231703 0.417526 0.467143 0.405321 0.136136 0.345369 \n",
303 | " 0.0213794 0.0604139 0.29916 … 0.271858 0.345999 0.286046 \n",
304 | " 0.430039 0.323308 0.202623 0.0763691 0.200301 0.290002 \n",
305 | " 0.173528 0.162008 0.488638 0.458155 0.194919 0.475686 \n",
306 | " 0.0223454 0.363692 0.0595263 0.0402072 0.137513 0.392399 \n",
307 | " 0.389993 0.456358 0.0119391 0.049727 0.16349 0.175928 "
308 | ]
309 | },
310 | "execution_count": 8,
311 | "metadata": {},
312 | "output_type": "execute_result"
313 | }
314 | ],
315 | "source": [
316 | "(ad * 5) / 10 "
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": 9,
322 | "metadata": {
323 | "collapsed": false
324 | },
325 | "outputs": [
326 | {
327 | "data": {
328 | "text/plain": [
329 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
330 | " 0.751708 0.487827 0.524171 0.48572 … 0.141981 0.259443 0.840057 \n",
331 | " 0.39364 0.819081 0.73482 0.0910638 0.619997 0.50284 0.00230886\n",
332 | " 0.674861 0.628351 0.591314 0.774485 0.619719 0.809776 0.646309 \n",
333 | " 0.805261 0.14499 0.381274 0.224012 0.778602 0.360153 0.637025 \n",
334 | " 0.446997 0.741332 0.804175 0.264556 0.724729 0.268921 0.637106 \n",
335 | " 0.0427458 0.120534 0.563255 0.797181 … 0.51732 0.638077 0.541392 \n",
336 | " 0.757893 0.602489 0.394246 0.0990069 0.152145 0.389973 0.548028 \n",
337 | " 0.340131 0.318376 0.828978 0.832583 0.79336 0.380038 0.814212 \n",
338 | " 0.0446759 0.664918 0.118772 0.359078 0.0803278 0.271573 0.706683 \n",
339 | " 0.703269 0.791168 0.023876 0.176915 0.0992901 0.321185 0.34464 "
340 | ]
341 | },
342 | "execution_count": 9,
343 | "metadata": {},
344 | "output_type": "execute_result"
345 | }
346 | ],
347 | "source": [
348 | "sin(ad)"
349 | ]
350 | },
351 | {
352 | "cell_type": "markdown",
353 | "metadata": {},
354 | "source": [
355 | "### Logical Operations"
356 | ]
357 | },
358 | {
359 | "cell_type": "code",
360 | "execution_count": 10,
361 | "metadata": {
362 | "collapsed": false
363 | },
364 | "outputs": [
365 | {
366 | "data": {
367 | "text/plain": [
368 | "10x10 ArrayFire.AFArray{Bool,2}:\n",
369 | " true true false true true true false true false true\n",
370 | " false true true false false false true false true false\n",
371 | " true true true true false true false false true true\n",
372 | " true false true false true false false true false true\n",
373 | " true true true true false true false true true true\n",
374 | " false false true true true true false false false true\n",
375 | " true true false false false false false false true false\n",
376 | " false false true true true true false false true true\n",
377 | " false true false false false false false false false true\n",
378 | " true true false false false true true false false false"
379 | ]
380 | },
381 | "execution_count": 10,
382 | "metadata": {},
383 | "output_type": "execute_result"
384 | }
385 | ],
386 | "source": [
387 | "cd = ad .> bd\n"
388 | ]
389 | },
390 | {
391 | "cell_type": "code",
392 | "execution_count": 11,
393 | "metadata": {
394 | "collapsed": false
395 | },
396 | "outputs": [
397 | {
398 | "data": {
399 | "text/plain": [
400 | "true"
401 | ]
402 | },
403 | "execution_count": 11,
404 | "metadata": {},
405 | "output_type": "execute_result"
406 | }
407 | ],
408 | "source": [
409 | "any_trues = any(cd)"
410 | ]
411 | },
412 | {
413 | "cell_type": "markdown",
414 | "metadata": {},
415 | "source": [
416 | "### Indexing\n",
417 | "\n"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": 12,
423 | "metadata": {
424 | "collapsed": false
425 | },
426 | "outputs": [
427 | {
428 | "data": {
429 | "text/plain": [
430 | "10-element ArrayFire.AFArray{Float64,1}:\n",
431 | " 0.850648 \n",
432 | " 0.404588 \n",
433 | " 0.740777 \n",
434 | " 0.936116 \n",
435 | " 0.463405 \n",
436 | " 0.0427588\n",
437 | " 0.860077 \n",
438 | " 0.347056 \n",
439 | " 0.0446907\n",
440 | " 0.779985 "
441 | ]
442 | },
443 | "execution_count": 12,
444 | "metadata": {},
445 | "output_type": "execute_result"
446 | }
447 | ],
448 | "source": [
449 | "ad[:,1]"
450 | ]
451 | },
452 | {
453 | "cell_type": "code",
454 | "execution_count": 13,
455 | "metadata": {
456 | "collapsed": false
457 | },
458 | "outputs": [
459 | {
460 | "data": {
461 | "text/plain": [
462 | "1x10 ArrayFire.AFArray{Float64,2}:\n",
463 | " 0.850648 0.509599 0.551742 0.507187 … 0.142463 0.262445 0.997387"
464 | ]
465 | },
466 | "execution_count": 13,
467 | "metadata": {},
468 | "output_type": "execute_result"
469 | }
470 | ],
471 | "source": [
472 | "ad[1,:]"
473 | ]
474 | },
475 | {
476 | "cell_type": "code",
477 | "execution_count": 14,
478 | "metadata": {
479 | "collapsed": false
480 | },
481 | "outputs": [
482 | {
483 | "data": {
484 | "text/plain": [
485 | "10-element ArrayFire.AFArray{Float64,1}:\n",
486 | " 0.850648 \n",
487 | " 0.404588 \n",
488 | " 0.740777 \n",
489 | " 0.936116 \n",
490 | " 0.463405 \n",
491 | " 0.0427588\n",
492 | " 0.860077 \n",
493 | " 0.347056 \n",
494 | " 0.0446907\n",
495 | " 0.779985 "
496 | ]
497 | },
498 | "execution_count": 14,
499 | "metadata": {},
500 | "output_type": "execute_result"
501 | }
502 | ],
503 | "source": [
504 | "ad[:,1]"
505 | ]
506 | },
507 | {
508 | "cell_type": "code",
509 | "execution_count": 15,
510 | "metadata": {
511 | "collapsed": false
512 | },
513 | "outputs": [
514 | {
515 | "data": {
516 | "text/plain": [
517 | "5x2 ArrayFire.AFArray{Float64,2}:\n",
518 | " 0.509599 0.551742\n",
519 | " 0.959807 0.825402\n",
520 | " 0.679432 0.632687\n",
521 | " 0.145503 0.391174\n",
522 | " 0.835053 0.934286"
523 | ]
524 | },
525 | "execution_count": 15,
526 | "metadata": {},
527 | "output_type": "execute_result"
528 | }
529 | ],
530 | "source": [
531 | "ad[1:5, 2:3]"
532 | ]
533 | },
534 | {
535 | "cell_type": "markdown",
536 | "metadata": {},
537 | "source": [
538 | "### Reduction Operations"
539 | ]
540 | },
541 | {
542 | "cell_type": "code",
543 | "execution_count": 16,
544 | "metadata": {
545 | "collapsed": false
546 | },
547 | "outputs": [
548 | {
549 | "data": {
550 | "text/plain": [
551 | "0.9973873758193104"
552 | ]
553 | },
554 | "execution_count": 16,
555 | "metadata": {},
556 | "output_type": "execute_result"
557 | }
558 | ],
559 | "source": [
560 | "total_max = maximum(ad)\n"
561 | ]
562 | },
563 | {
564 | "cell_type": "code",
565 | "execution_count": 17,
566 | "metadata": {
567 | "collapsed": false
568 | },
569 | "outputs": [
570 | {
571 | "data": {
572 | "text/plain": [
573 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
574 | " 0.850648 0.509599 0.551742 … 0.142463 0.262445 0.997387 \n",
575 | " 0.404588 0.959807 0.825402 0.668739 0.526881 0.00230886\n",
576 | " 0.740777 0.679432 0.632687 0.668384 0.943771 0.702737 \n",
577 | " 0.936116 0.145503 0.391174 0.892435 0.368432 0.690633 \n",
578 | " 0.463405 0.835053 0.934286 0.810642 0.272273 0.690738 \n",
579 | " 0.0427588 0.120828 0.59832 … 0.543716 0.691998 0.572092 \n",
580 | " 0.860077 0.646616 0.405247 0.152738 0.400603 0.580005 \n",
581 | " 0.347056 0.324015 0.977277 0.91631 0.389837 0.951371 \n",
582 | " 0.0446907 0.727383 0.119053 0.0804144 0.275027 0.784798 \n",
583 | " 0.779985 0.912716 0.0238783 0.099454 0.32698 0.351855 "
584 | ]
585 | },
586 | "execution_count": 17,
587 | "metadata": {},
588 | "output_type": "execute_result"
589 | }
590 | ],
591 | "source": [
592 | "colwise_min = min(ad,2)"
593 | ]
594 | },
595 | {
596 | "cell_type": "markdown",
597 | "metadata": {},
598 | "source": [
599 | "### Matrix Operations and Linear Algebra"
600 | ]
601 | },
602 | {
603 | "cell_type": "code",
604 | "execution_count": 18,
605 | "metadata": {
606 | "collapsed": false
607 | },
608 | "outputs": [
609 | {
610 | "data": {
611 | "text/plain": [
612 | "0.0063807428606410185"
613 | ]
614 | },
615 | "execution_count": 18,
616 | "metadata": {},
617 | "output_type": "execute_result"
618 | }
619 | ],
620 | "source": [
621 | "det(ad)"
622 | ]
623 | },
624 | {
625 | "cell_type": "code",
626 | "execution_count": 19,
627 | "metadata": {
628 | "collapsed": false
629 | },
630 | "outputs": [
631 | {
632 | "data": {
633 | "text/plain": [
634 | "(\n",
635 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
636 | " -0.313169 -0.0728446 -0.39406 … -0.109904 0.428708 -0.472757 \n",
637 | " -0.302509 0.41031 0.26691 -0.250162 0.435094 -0.116101 \n",
638 | " -0.445768 0.0535002 0.233114 -0.419856 -0.503528 -0.34567 \n",
639 | " -0.264462 -0.22278 -0.438104 0.290968 0.13541 -0.0881808\n",
640 | " -0.344179 -0.107288 -0.162701 0.378457 -0.512499 -0.133523 \n",
641 | " -0.31535 -0.254988 0.540663 … 0.201097 0.0920345 0.16576 \n",
642 | " -0.240185 0.157973 -0.441264 -0.467096 -0.15401 0.612364 \n",
643 | " -0.381838 -0.505288 0.11674 -0.0393874 0.241498 0.394804 \n",
644 | " -0.194417 0.123034 0.00833061 0.0384355 0.0269871 -0.0596087\n",
645 | " -0.287353 0.633962 -0.000342207 0.509433 0.0336435 0.236728 ,\n",
646 | "\n",
647 | "10-element ArrayFire.AFArray{Float64,1}:\n",
648 | " 5.16542 \n",
649 | " 1.61888 \n",
650 | " 1.30156 \n",
651 | " 1.16527 \n",
652 | " 0.872052\n",
653 | " 0.815467\n",
654 | " 0.498861\n",
655 | " 0.407482\n",
656 | " 0.231483\n",
657 | " 0.015035,\n",
658 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
659 | " -0.331332 -0.348378 -0.352093 … -0.321779 -0.283452 -0.38397 \n",
660 | " 0.206926 0.523063 -0.251793 -0.303041 0.0415419 -0.294912\n",
661 | " -0.657539 -0.124683 0.066637 0.0687384 0.22785 -0.363175\n",
662 | " 0.0174905 -0.112311 -0.432082 -0.518946 -0.0116748 0.397134\n",
663 | " 0.392447 -0.4981 -0.34586 -0.0142238 0.371496 -0.102179\n",
664 | " -0.386488 0.311179 0.0206515 … -0.347224 0.0254439 0.417919\n",
665 | " 0.108687 -0.111411 0.431758 -0.489495 -0.229781 -0.28958 \n",
666 | " -0.161245 -0.0505046 -0.383022 0.389094 -0.571367 0.179618\n",
667 | " 0.171555 -0.320445 0.362516 -0.11145 -0.468937 0.174345\n",
668 | " 0.210014 0.338074 -0.192122 0.0800494 -0.357853 -0.37996 )"
669 | ]
670 | },
671 | "execution_count": 19,
672 | "metadata": {},
673 | "output_type": "execute_result"
674 | }
675 | ],
676 | "source": [
677 | "svd(ad)"
678 | ]
679 | },
680 | {
681 | "cell_type": "code",
682 | "execution_count": 20,
683 | "metadata": {
684 | "collapsed": false
685 | },
686 | "outputs": [
687 | {
688 | "data": {
689 | "text/plain": [
690 | "(\n",
691 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
692 | " 1.0 0.0 0.0 … 0.0 0.0 0.0 0.0\n",
693 | " 0.432199 1.0 0.0 0.0 0.0 0.0 0.0\n",
694 | " 0.833214 0.882442 1.0 0.0 0.0 0.0 0.0\n",
695 | " 0.0456768 0.127304 -0.563867 0.0 0.0 0.0 0.0\n",
696 | " 0.37074 0.301109 -0.720155 0.0 0.0 0.0 0.0\n",
697 | " 0.0477406 0.803234 0.484338 … 0.0 0.0 0.0 0.0\n",
698 | " 0.49503 0.850715 -0.206856 1.0 0.0 0.0 0.0\n",
699 | " 0.908699 0.420751 0.0906361 0.269301 1.0 0.0 0.0\n",
700 | " 0.79133 0.629142 0.101891 -0.198653 0.144041 1.0 0.0\n",
701 | " 0.918772 0.571881 0.373907 0.305605 0.859156 0.517231 1.0,\n",
702 | "\n",
703 | "10x10 ArrayFire.AFArray{Float64,2}:\n",
704 | " 0.936116 0.145503 0.391174 … 0.892435 0.368432 0.690633 \n",
705 | " 0.0 0.896921 0.656337 0.283029 0.367645 -0.296182 \n",
706 | " 0.0 0.0 -0.881232 -0.893893 -0.304428 0.037773 \n",
707 | " 0.0 0.0 0.0 -0.0371149 0.45671 0.599551 \n",
708 | " 0.0 0.0 0.0 -0.106887 -0.527428 0.220003 \n",
709 | " 0.0 0.0 0.0 … 0.277916 0.024035 0.691294 \n",
710 | " 0.0 0.0 0.0 0.037266 -0.559184 0.908713 \n",
711 | " 0.0 0.0 0.0 -0.696586 -0.0995607 -0.0310477\n",
712 | " 0.0 0.0 0.0 0.0 0.336167 -0.29248 \n",
713 | " 0.0 0.0 0.0 0.0 0.0 -0.0637559,\n",
714 | "\n",
715 | "10-element ArrayFire.AFArray{Int64,1}:\n",
716 | " 4\n",
717 | " 2\n",
718 | " 10\n",
719 | " 6\n",
720 | " 8\n",
721 | " 9\n",
722 | " 5\n",
723 | " 1\n",
724 | " 3\n",
725 | " 7)"
726 | ]
727 | },
728 | "execution_count": 20,
729 | "metadata": {},
730 | "output_type": "execute_result"
731 | }
732 | ],
733 | "source": [
734 | "lu(ad)"
735 | ]
736 | },
737 | {
738 | "cell_type": "markdown",
739 | "metadata": {
740 | "collapsed": true
741 | },
742 | "source": [
743 | "### FFTs"
744 | ]
745 | },
746 | {
747 | "cell_type": "code",
748 | "execution_count": 21,
749 | "metadata": {
750 | "collapsed": false
751 | },
752 | "outputs": [
753 | {
754 | "data": {
755 | "text/plain": [
756 | "10x10 ArrayFire.AFArray{Complex{Float64},2}:\n",
757 | " 49.9215+0.0im 4.19014+0.43302im … 4.19014-0.43302im \n",
758 | " -0.168803-3.47083im 1.37611+0.526002im -0.541282-0.169453im\n",
759 | " -0.487556-2.20675im -1.13887-1.06387im -0.346743+1.21406im \n",
760 | " -0.512639-0.396899im 1.70543-0.020714im 4.39249+1.59508im \n",
761 | " 1.37913+6.65045im -0.209985-2.32303im -2.19831-1.79236im \n",
762 | " -0.679374+5.55112e-17im 3.73888-1.4337im … 3.73888+1.4337im \n",
763 | " 1.37913-6.65045im -2.19831+1.79236im -0.209985+2.32303im \n",
764 | " -0.512639+0.396899im 4.39249-1.59508im 1.70543+0.020714im\n",
765 | " -0.487556+2.20675im -0.346743-1.21406im -1.13887+1.06387im \n",
766 | " -0.168803+3.47083im -0.541282+0.169453im 1.37611-0.526002im"
767 | ]
768 | },
769 | "execution_count": 21,
770 | "metadata": {},
771 | "output_type": "execute_result"
772 | }
773 | ],
774 | "source": [
775 | "fast_fourier = fft(ad)"
776 | ]
777 | },
778 | {
779 | "cell_type": "markdown",
780 | "metadata": {},
781 | "source": [
782 | "## Backends\n",
783 | "\n",
784 | "ArrayFire allows you to change backends at runtime. This gives ArrayFire tremendous versatility: it can run on a variety of backends and thereby support a number of devices. \n",
785 | "\n",
786 | "Run the following command to see which backend you're currently using:"
787 | ]
788 | },
789 | {
790 | "cell_type": "code",
791 | "execution_count": 22,
792 | "metadata": {
793 | "collapsed": false
794 | },
795 | "outputs": [
796 | {
797 | "name": "stdout",
798 | "output_type": "stream",
799 | "text": [
800 | "CUDA Backend\n"
801 | ]
802 | }
803 | ],
804 | "source": [
805 | "getActiveBackend()"
806 | ]
807 | },
808 | {
809 | "cell_type": "markdown",
810 | "metadata": {},
811 | "source": [
812 | "What are the available backends on this system?"
813 | ]
814 | },
815 | {
816 | "cell_type": "code",
817 | "execution_count": 23,
818 | "metadata": {
819 | "collapsed": false
820 | },
821 | "outputs": [
822 | {
823 | "name": "stdout",
824 | "output_type": "stream",
825 | "text": [
826 | "CPU, CUDA and OpenCL\n"
827 | ]
828 | }
829 | ],
830 | "source": [
831 | "getAvailableBackends()"
832 | ]
833 | },
834 | {
835 | "cell_type": "markdown",
836 | "metadata": {},
837 | "source": [
838 | "This build of ArrayFire includes all three backends. Let us now try switching backends. "
839 | ]
840 | },
841 | {
842 | "cell_type": "code",
843 | "execution_count": 24,
844 | "metadata": {
845 | "collapsed": false
846 | },
847 | "outputs": [
848 | {
849 | "data": {
850 | "text/plain": [
851 | "true"
852 | ]
853 | },
854 | "execution_count": 24,
855 | "metadata": {},
856 | "output_type": "execute_result"
857 | }
858 | ],
859 | "source": [
860 | "setBackend(AF_BACKEND_CPU)"
861 | ]
862 | },
863 | {
864 | "cell_type": "code",
865 | "execution_count": 25,
866 | "metadata": {
867 | "collapsed": false
868 | },
869 | "outputs": [
870 | {
871 | "name": "stdout",
872 | "output_type": "stream",
873 | "text": [
874 | "CPU Backend\n"
875 | ]
876 | }
877 | ],
878 | "source": [
879 | "getActiveBackend()"
880 | ]
881 | },
882 | {
883 | "cell_type": "code",
884 | "execution_count": null,
885 | "metadata": {
886 | "collapsed": true
887 | },
888 | "outputs": [],
889 | "source": []
890 | },
891 | {
892 | "cell_type": "code",
893 | "execution_count": null,
894 | "metadata": {
895 | "collapsed": true
896 | },
897 | "outputs": [],
898 | "source": []
899 | }
900 | ],
901 | "metadata": {
902 | "kernelspec": {
903 | "display_name": "Julia 0.4.5",
904 | "language": "julia",
905 | "name": "julia-0.4"
906 | },
907 | "language_info": {
908 | "file_extension": ".jl",
909 | "mimetype": "application/julia",
910 | "name": "julia",
911 | "version": "0.4.5"
912 | }
913 | },
914 | "nbformat": 4,
915 | "nbformat_minor": 0
916 | }
917 |
--------------------------------------------------------------------------------
/BlackScholes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# ArrayFire Example : BlackScholes"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "Now let us work on a simple example, to show you what ArrayFire can do. \n",
15 | "\n",
16 | "### BlackScholes\n",
17 | "\n",
18 | "This is the powerhouse of modern financial simulations. From this model, we can estimate the theoretical price of a European-style option. The model is widely used (with some modifications and tuning) by options market participants. \n",
19 | "\n",
20 | "Now we're going to write a simple Black-Scholes kernel, and accelerate it using ArrayFire."
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 1,
26 | "metadata": {
27 | "collapsed": true
28 | },
29 | "outputs": [],
30 | "source": [
31 | "using ArrayFire"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 2,
37 | "metadata": {
38 | "collapsed": false
39 | },
40 | "outputs": [
41 | {
42 | "data": {
43 | "text/plain": [
44 | "Float32"
45 | ]
46 | },
47 | "execution_count": 2,
48 | "metadata": {},
49 | "output_type": "execute_result"
50 | }
51 | ],
52 | "source": [
53 | "T = Float32"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": [
60 | "Write a simple blackscholes kernel."
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 3,
66 | "metadata": {
67 | "collapsed": false
68 | },
69 | "outputs": [
70 | {
71 | "data": {
72 | "text/plain": [
73 | "cndf2 (generic function with 1 method)"
74 | ]
75 | },
76 | "execution_count": 3,
77 | "metadata": {},
78 | "output_type": "execute_result"
79 | }
80 | ],
81 | "source": [
82 | "function blackscholes_serial(sptprice::T,\n",
83 | " strike::AbstractArray,\n",
84 | " rate::T,\n",
85 | " volatility::T,\n",
86 | " time::T)\n",
87 | " logterm = log10( sptprice ./ strike)\n",
88 | " powterm = .5f0 .* volatility .* volatility\n",
89 | " den = volatility .* sqrt(time)\n",
90 | " d1 = (((rate .+ powterm) .* time) .+ logterm) ./ den\n",
91 | " d2 = d1 .- den\n",
92 | " NofXd1 = cndf2(d1)\n",
93 | " NofXd2 = cndf2(d2)\n",
94 | " futureValue = strike .* exp(- rate .* time)\n",
95 | " c1 = futureValue .* NofXd2\n",
96 | " call_ = sptprice .* NofXd1 .- c1\n",
97 | " put = call_ .- futureValue .+ sptprice\n",
98 | "end\n",
99 | "\n",
100 | "@inline function cndf2(in::AbstractArray)\n",
101 | " out = 1/2 + erf(1/√2 .* in) ./ 2\n",
102 | " return out\n",
103 | "end"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "Set up the inputs: a spot price, an array of initial strike prices, a rate, a volatility, and a time period. We have chosen ten million (`10^7`) iterations to simulate. "
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 4,
116 | "metadata": {
117 | "collapsed": false
118 | },
119 | "outputs": [
120 | {
121 | "data": {
122 | "text/plain": [
123 | "0.5f0"
124 | ]
125 | },
126 | "execution_count": 4,
127 | "metadata": {},
128 | "output_type": "execute_result"
129 | }
130 | ],
131 | "source": [
132 | "iterations = 10^7\n",
133 | "sptprice = T(42.0)\n",
134 | "initStrike = T[ 40.0 + (i / iterations) for i = 1:iterations ]\n",
135 | "rate = T(0.5) \n",
136 | "volatility = T(0.2) \n",
137 | "time = T(0.5) "
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "metadata": {},
143 | "source": [
144 | "Let's now convert the array of strike prices to a GPU array, so that it resides on the GPU. The remaining parameters are scalars and are passed as they are. "
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 5,
150 | "metadata": {
151 | "collapsed": false
152 | },
153 | "outputs": [
154 | {
155 | "data": {
156 | "text/plain": [
157 | "10000000-element ArrayFire.AFArray{Float32,1}:\n",
158 | " 40.0\n",
159 | " 40.0\n",
160 | " 40.0\n",
161 | " 40.0\n",
162 | " 40.0\n",
163 | " 40.0\n",
164 | " 40.0\n",
165 | " 40.0\n",
166 | " 40.0\n",
167 | " 40.0\n",
168 | " 40.0\n",
169 | " 40.0\n",
170 | " 40.0\n",
171 | " ⋮ \n",
172 | " 41.0\n",
173 | " 41.0\n",
174 | " 41.0\n",
175 | " 41.0\n",
176 | " 41.0\n",
177 | " 41.0\n",
178 | " 41.0\n",
179 | " 41.0\n",
180 | " 41.0\n",
181 | " 41.0\n",
182 | " 41.0\n",
183 | " 41.0"
184 | ]
185 | },
186 | "execution_count": 5,
187 | "metadata": {},
188 | "output_type": "execute_result"
189 | }
190 | ],
191 | "source": [
192 | "initStriked = AFArray(initStrike)"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "Let's first run the Black-Scholes simulation on the CPU array, and time the execution."
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 6,
205 | "metadata": {
206 | "collapsed": false
207 | },
208 | "outputs": [
209 | {
210 | "name": "stdout",
211 | "output_type": "stream",
212 | "text": [
213 | "sum(put1) = 2.0954822f8\n",
214 | " 1.848766 seconds (360.94 k allocations: 741.225 MB, 6.26% gc time)\n"
215 | ]
216 | },
217 | {
218 | "data": {
219 | "text/plain": [
220 | "2.0954822f8"
221 | ]
222 | },
223 | "execution_count": 6,
224 | "metadata": {},
225 | "output_type": "execute_result"
226 | }
227 | ],
228 | "source": [
229 | "@time begin \n",
230 | " put1 = blackscholes_serial(sptprice, initStrike, rate, volatility, time)\n",
231 | " @show sum(put1)\n",
232 | "end"
233 | ]
234 | },
235 | {
236 | "cell_type": "markdown",
237 | "metadata": {},
238 | "source": [
239 | "Now let's run the accelerated blackscholes simulation. Notice that we don't need to change any code. All we need to do is feed in different inputs. \n",
240 | "\n",
241 | "This is **multiple dispatch** at work."
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": 7,
247 | "metadata": {
248 | "collapsed": false
249 | },
250 | "outputs": [
251 | {
252 | "name": "stdout",
253 | "output_type": "stream",
254 | "text": [
255 | "sum(put2) = 2.0954820959255382e8\n",
256 | " 0.581040 seconds (418.75 k allocations: 18.451 MB)\n"
257 | ]
258 | },
259 | {
260 | "data": {
261 | "text/plain": [
262 | "2.0954820959255382e8"
263 | ]
264 | },
265 | "execution_count": 7,
266 | "metadata": {},
267 | "output_type": "execute_result"
268 | }
269 | ],
270 | "source": [
271 | "@time begin \n",
272 | " put2 = blackscholes_serial(sptprice, initStriked, rate, volatility, time)\n",
273 | " @show sum(put2)\n",
274 | "end"
275 | ]
276 | },
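  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The speed-up above comes without editing `blackscholes_serial` because Julia chooses the method for each operation from the types of its arguments. The toy sketch below shows the same effect with plain Base arrays. `scale_and_shift` is a hypothetical helper invented for this illustration; it is not part of `ArrayFire.jl` or of this notebook's source.\n",
    "\n",
    "```julia\n",
    "# Hypothetical helper: generic code, written once with no type annotations.\n",
    "scale_and_shift(x, a, b) = a .* x .+ b\n",
    "\n",
    "# The same definition runs on different input types; Julia picks the\n",
    "# matching `.*` and `.+` methods from the argument types.\n",
    "scale_and_shift([1.0, 2.0], 2.0, 1.0)    # Float64 vector in, Float64 vector out\n",
    "scale_and_shift([1, 2], 2, 1)            # Int vector in, Int vector out\n",
    "scale_and_shift(Float32[1, 2], 2, 1)     # Float32 vector in, Float32 vector out\n",
    "```\n",
    "\n",
    "The function body never mentions an element or container type, so passing an `AFArray` to generic code should work the same way, provided the package defines methods for every operation the code uses."
   ]
  },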
277 | {
278 | "cell_type": "code",
279 | "execution_count": null,
280 | "metadata": {
281 | "collapsed": true
282 | },
283 | "outputs": [],
284 | "source": []
285 | }
286 | ],
287 | "metadata": {
288 | "kernelspec": {
289 | "display_name": "Julia 0.4.5",
290 | "language": "julia",
291 | "name": "julia-0.4"
292 | },
293 | "language_info": {
294 | "file_extension": ".jl",
295 | "mimetype": "application/julia",
296 | "name": "julia",
297 | "version": "0.4.5"
298 | }
299 | },
300 | "nbformat": 4,
301 | "nbformat_minor": 0
302 | }
303 |
--------------------------------------------------------------------------------
/ComputeFramework - logistic regression.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [
10 | {
11 | "data": {
12 | "text/plain": [
13 | "8-element Array{Int64,1}:\n",
14 | " 2\n",
15 | " 3\n",
16 | " 4\n",
17 | " 5\n",
18 | " 6\n",
19 | " 7\n",
20 | " 8\n",
21 | " 9"
22 | ]
23 | },
24 | "execution_count": 1,
25 | "metadata": {},
26 | "output_type": "execute_result"
27 | }
28 | ],
29 | "source": [
30 | "addprocs(8)"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "metadata": {
37 | "collapsed": false
38 | },
39 | "outputs": [
40 | {
41 | "data": {
42 | "text/plain": [
43 | "_gather (generic function with 2 methods)"
44 | ]
45 | },
46 | "execution_count": 2,
47 | "metadata": {},
48 | "output_type": "execute_result"
49 | }
50 | ],
51 | "source": [
52 | "using ComputeFramework\n",
53 | "using Distributions\n",
54 | "\n",
55 | "# helper function to perform gather only on CF objects\n",
56 | "import ComputeFramework.Computation\n",
57 | "\n",
58 | "global ctx=Context()\n",
59 | "\n",
60 | "_gather{C<:Computation, D<:Computation}(x::Tuple{C,D}) = gather(ctx, x)\n",
61 | "_gather(x::Tuple) = x"
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": 3,
67 | "metadata": {
68 | "collapsed": true
69 | },
70 | "outputs": [],
71 | "source": [
72 | "@everywhere logistic(x) = 1 / (1 + exp(-x))\n",
73 | "\n",
74 | "function initialize(X,y,λ)\n",
75 | " (XtX, Xty) = _gather((X'X, X'y))\n",
76 | " cholfact!(XtX + λ*I)\\(Xty)\n",
77 | "end\n",
78 | "\n",
79 | "@everywhere function regression(X, y; tol = 1e-12, maxIter = 30, λ = 0.0, init = initialize(X,y,λ))\n",
80 | "    β = init\n",
81 | "    μ = X*vec(β)\n",
82 | "    k = 0\n",
83 | "    for k = 1:maxIter\n",
84 | "        η = map(logistic, μ)\n",
85 | "        w = η.*(1-η)\n",
86 | "        r = y - η\n",
87 | "\n",
88 | "        Xw = scale(w, X)\n",
89 | "\n",
90 | "        XtX, Xtr = _gather((Xw'X, X'r))\n",
91 | "        Δβ = cholfact!(XtX + λ*I) \\ (Xtr .- λ*β)\n",
92 | "\n",
93 | "        β += Δβ\n",
94 | "\n",
95 | "        if (@show norm(Δβ)) < tol\n",
96 | "            break\n",
97 | "        end\n",
98 | "        μ = X*β\n",
99 | "    end\n",
100 | "    if k == maxIter\n",
101 | "        error(\"no convergence\")\n",
102 | "    end\n",
103 | "    return β, k\n",
104 | "end"
105 | ]
106 | },
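  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each pass of the loop in `regression` can be read as one Newton step for L2-penalised logistic regression. The formulas below are a reading of the code above, written with the code's own names, and assume that `scale(w, X)` multiplies row `i` of `X` by `w[i]` (its behaviour for plain arrays in Julia 0.4):\n",
    "\n",
    "$$\\mu = X\\beta, \\qquad \\eta = \\mathrm{logistic}(\\mu), \\qquad W = \\mathrm{diag}\\big(\\eta \\odot (1 - \\eta)\\big)$$\n",
    "\n",
    "$$\\Delta\\beta = \\big(X^\\top W X + \\lambda I\\big)^{-1} \\big(X^\\top (y - \\eta) - \\lambda \\beta\\big), \\qquad \\beta \\leftarrow \\beta + \\Delta\\beta$$\n",
    "\n",
    "Here `Xw'X` corresponds to $X^\\top W X$ and `X'r` to $X^\\top (y - \\eta)$. With `λ = 0` the update reduces to the usual iteratively reweighted least squares step, and the loop stops once `norm(Δβ) < tol`."
   ]
  },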
107 | {
108 | "cell_type": "code",
109 | "execution_count": 4,
110 | "metadata": {
111 | "collapsed": false
112 | },
113 | "outputs": [
114 | {
115 | "data": {
116 | "text/plain": [
117 | "1666667"
118 | ]
119 | },
120 | "execution_count": 4,
121 | "metadata": {},
122 | "output_type": "execute_result"
123 | }
124 | ],
125 | "source": [
126 | "# Number of rows\n",
127 | "N = 10^7\n",
128 | "N_part = 1666667"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": 5,
134 | "metadata": {
135 | "collapsed": false
136 | },
137 | "outputs": [
138 | {
139 | "name": "stdout",
140 | "output_type": "stream",
141 | "text": [
142 | " 3.513864 seconds (2.86 M allocations: 129.427 MB, 0.90% gc time)\n",
143 | " 2.759677 seconds (2.09 M allocations: 172.691 MB, 2.65% gc time)\n"
144 | ]
145 | },
146 | {
147 | "data": {
148 | "text/plain": [
149 | "ComputeFramework.Computed(10000000 BitArray{1} in 6 parts)"
150 | ]
151 | },
152 | "execution_count": 5,
153 | "metadata": {},
154 | "output_type": "execute_result"
155 | }
156 | ],
157 | "source": [
158 | "\n",
159 | "x = rand(BlockPartition(N_part,10), N, 10)\n",
160 | "@time X = compute(x)\n",
161 | "\n",
162 | "y = (X * [9:-1:0.;]) .> Distribute(BlockPartition(N_part), rand(Logistic(), N))\n",
163 | "@time Y = compute(y)"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": 28,
169 | "metadata": {
170 | "collapsed": false
171 | },
172 | "outputs": [
173 | {
174 | "name": "stdout",
175 | "output_type": "stream",
176 | "text": [
177 | "norm(Δβ) = 0.8634392139496269\n",
178 | "norm(Δβ) = 0.7304707586177425\n",
179 | "norm(Δβ) = 0.7189065313479128\n",
180 | "norm(Δβ) = 0.7336529085838923\n",
181 | "norm(Δβ) = 0.7589039220139997\n",
182 | "norm(Δβ) = 0.7898295368295862\n",
183 | "norm(Δβ) = 0.8241002660802208\n",
184 | "norm(Δβ) = 0.8589573676539586\n",
185 | "norm(Δβ) = 0.8879357061572806\n",
186 | "norm(Δβ) = 0.8947902021568425\n",
187 | "norm(Δβ) = 0.849715147713261\n",
188 | "norm(Δβ) = 0.7182723270852963\n",
189 | "norm(Δβ) = 0.4372006082705449\n",
190 | "norm(Δβ) = 0.11264266790268349\n",
191 | "norm(Δβ) = 0.005567291277316119\n",
192 | "norm(Δβ) = 1.2684039469830244e-5\n",
193 | "norm(Δβ) = 6.848122067357623e-11\n",
194 | "norm(Δβ) = 2.464326323531238e-12\n",
195 | "norm(Δβ) = 9.16125330601169e-13\n",
196 | " 0.010783 seconds (79.44 k allocations: 5.733 MB, 40.43% gc time)\n"
197 | ]
198 | },
199 | {
200 | "data": {
201 | "text/plain": [
202 | "([3.233866893389513,3.310758396834973,3.5603050548681554,3.4228170715408903,3.3475120256747575,3.433055426019274,3.492285096042701,3.618546208372272,3.348732338015894,2.944417951482114],19)"
203 | ]
204 | },
205 | "execution_count": 28,
206 | "metadata": {},
207 | "output_type": "execute_result"
208 | }
209 | ],
210 | "source": [
211 | "N_small = 2*10^3\n",
212 | "\n",
213 | "X1 = gather(X[1:N_small, :])\n",
214 | "Y1 = gather(Y[1:N_small])\n",
215 | "\n",
216 | "# run the sequential version on a subset to find an initial guess\n",
217 | "init, k = @time regression(X1,Y1,tol=1e-12, λ=1/N_small)"
218 | ]
219 | },
220 | {
221 | "cell_type": "code",
222 | "execution_count": 31,
223 | "metadata": {
224 | "collapsed": false
225 | },
226 | "outputs": [
227 | {
228 | "name": "stdout",
229 | "output_type": "stream",
230 | "text": [
231 | "norm(Δβ) = 0.8555009854429065\n",
232 | "norm(Δβ) = 0.7234525658620435\n",
233 | "norm(Δβ) = 0.7109977977108463\n",
234 | "norm(Δβ) = 0.7244025208076879\n",
235 | "norm(Δβ) = 0.7482238363049689\n",
236 | "norm(Δβ) = 0.7780454572866816\n",
237 | "norm(Δβ) = 0.8123921414785201\n",
238 | "norm(Δβ) = 0.8506244591274544\n",
239 | "norm(Δβ) = 0.8922157016532679\n",
240 | "norm(Δβ) = 0.9381939785896993\n",
241 | "norm(Δβ) = 1.0070055416983437\n",
242 | "norm(Δβ) = 1.2150317609659707\n",
243 | "norm(Δβ) = 1.8509823663255252\n",
244 | "norm(Δβ) = 2.763348991500668\n",
245 | "norm(Δβ) = 2.9163254913186822\n",
246 | "norm(Δβ) = 2.088794184329865\n",
247 | "norm(Δβ) = 0.794583838606624\n",
248 | "norm(Δβ) = 0.08829764010809388\n",
249 | "norm(Δβ) = 0.0010138276589816589\n",
250 | "norm(Δβ) = 1.3929833795526373e-7\n",
251 | "norm(Δβ) = 5.228848055849768e-13\n",
252 | " 29.467062 seconds (2.10 M allocations: 133.861 MB, 0.35% gc time)\n"
253 | ]
254 | },
255 | {
256 | "data": {
257 | "text/plain": [
258 | "([9.10830145833094,10.236706198033996,8.815618895321798,4.093497607192385,3.839092840603249,4.641276635042943,3.501394497014302,0.8002262859609115,0.9588986871748054,1.4295594225438053],21)"
259 | ]
260 | },
261 | "execution_count": 31,
262 | "metadata": {},
263 | "output_type": "execute_result"
264 | }
265 | ],
266 | "source": [
267 | "@time regression(X,Y,tol=1e-12, λ=1/N)"
268 | ]
269 | },
270 | {
271 | "cell_type": "code",
272 | "execution_count": 32,
273 | "metadata": {
274 | "collapsed": false
275 | },
276 | "outputs": [
277 | {
278 | "name": "stdout",
279 | "output_type": "stream",
280 | "text": [
281 | "norm(Δβ) = 0.8555009854429105\n",
282 | "norm(Δβ) = 0.7234525658620439\n",
283 | "norm(Δβ) = 0.7109977977108463\n",
284 | "norm(Δβ) = 0.7244025208076847\n",
285 | "norm(Δβ) = 0.7482238363049737\n",
286 | "norm(Δβ) = 0.7780454572866833\n",
287 | "norm(Δβ) = 0.8123921414785209\n",
288 | "norm(Δβ) = 0.8506244591274488\n",
289 | "norm(Δβ) = 0.8922157016532651\n",
290 | "norm(Δβ) = 0.9381939785896897\n",
291 | "norm(Δβ) = 1.007005541698348\n",
292 | "norm(Δβ) = 1.2150317609660173\n",
293 | "norm(Δβ) = 1.8509823663254972\n",
294 | "norm(Δβ) = 2.763348991500565\n",
295 | "norm(Δβ) = 2.916325491318795\n",
296 | "norm(Δβ) = 2.0887941843297857\n",
297 | "norm(Δβ) = 0.794583838606666\n",
298 | "norm(Δβ) = 0.08829764010806286\n",
299 | "norm(Δβ) = 0.001013827658798716\n",
300 | "norm(Δβ) = 1.3929856521853447e-7\n",
301 | "norm(Δβ) = 6.273004866991971e-13\n",
302 | " 37.897343 seconds (420.01 M allocations: 29.803 GB, 9.39% gc time)\n"
303 | ]
304 | },
305 | {
306 | "data": {
307 | "text/plain": [
308 | "([9.10830145833111,10.236706198033634,8.815618895321794,4.09349760719224,3.8390928406031364,4.64127663504292,3.501394497014381,0.8002262859610104,0.9588986871747942,1.4295594225437944],21)"
309 | ]
310 | },
311 | "execution_count": 32,
312 | "metadata": {},
313 | "output_type": "execute_result"
314 | }
315 | ],
316 | "source": [
317 | "a,b= gather((X,Y))\n",
318 | "\n",
319 | "@time regression(a,b,tol=1e-12, λ=1/N)"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": null,
325 | "metadata": {
326 | "collapsed": true
327 | },
328 | "outputs": [],
329 | "source": []
330 | }
331 | ],
332 | "metadata": {
333 | "kernelspec": {
334 | "display_name": "Julia 0.4.5",
335 | "language": "julia",
336 | "name": "julia-0.4"
337 | },
338 | "language_info": {
339 | "file_extension": ".jl",
340 | "mimetype": "application/julia",
341 | "name": "julia",
342 | "version": "0.4.5"
343 | }
344 | },
345 | "nbformat": 4,
346 | "nbformat_minor": 0
347 | }
348 |
--------------------------------------------------------------------------------
/ComputeFramework - out-of-core.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# ComputeFramework\n",
8 | "\n",
9 | "**One big idea: break up large data into pieces, run computations on the pieces without filling up memory**\n",
10 | "\n",
11 | "Low-level view:\n",
12 | "\n",
13 | "ComputeFramework *reorganizes* your computation so that it can be:\n",
14 | " - out-of-core\n",
15 | " - parallel\n",
16 | " \n",
17 | " \n",
18 | "High-level view:\n",
19 | "\n",
20 | "- Array library\n",
21 | "- Other data-parallel libraries"
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
28 | "## Out-of-core\n",
29 | "Out-of-core computation means working with data that does not fit in RAM. Let's find out how much RAM we have left; `Sys.free_memory()` can tell you this."
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 1,
35 | "metadata": {
36 | "collapsed": false
37 | },
38 | "outputs": [
39 | {
40 | "data": {
41 | "text/plain": [
42 | "123.205656576"
43 | ]
44 | },
45 | "execution_count": 1,
46 | "metadata": {},
47 | "output_type": "execute_result"
48 | }
49 | ],
50 | "source": [
51 | "Int(Sys.free_memory())/10^9"
52 | ]
53 | },
54 | {
55 | "cell_type": "code",
56 | "execution_count": null,
57 | "metadata": {
58 | "collapsed": false
59 | },
60 | "outputs": [],
61 | "source": [
62 | "rand(5*10^8) # Docker kills my Julia because I allocated too much"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 2,
68 | "metadata": {
69 | "collapsed": false
70 | },
71 | "outputs": [],
72 | "source": [
73 | "io = open(\"/scratch/mybigarray\", \"w+\")\n",
74 | "X = Mmap.mmap(io, Vector{Float64}, (5*10^8,))\n",
75 | "close(io)"
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": 3,
81 | "metadata": {
82 | "collapsed": false
83 | },
84 | "outputs": [
85 | {
86 | "name": "stdout",
87 | "output_type": "stream",
88 | "text": [
89 | " 5.446527 seconds (27.67 k allocations: 1.211 MB, 0.07% gc time)\n"
90 | ]
91 | },
92 | {
93 | "data": {
94 | "text/plain": [
95 | "500000000-element Array{Float64,1}:\n",
96 | " 0.699571 \n",
97 | " 0.831891 \n",
98 | " 0.508932 \n",
99 | " 0.707173 \n",
100 | " 0.7831 \n",
101 | " 0.444995 \n",
102 | " 0.777119 \n",
103 | " 0.674943 \n",
104 | " 0.157073 \n",
105 | " 0.441637 \n",
106 | " 0.666546 \n",
107 | " 0.656926 \n",
108 | " 0.338891 \n",
109 | " ⋮ \n",
110 | " 0.603209 \n",
111 | " 0.556922 \n",
112 | " 0.786834 \n",
113 | " 0.235923 \n",
114 | " 0.856844 \n",
115 | " 0.357216 \n",
116 | " 0.0710475\n",
117 | " 0.401356 \n",
118 | " 0.785052 \n",
119 | " 0.347791 \n",
120 | " 0.57127 \n",
121 | " 0.682185 "
122 | ]
123 | },
124 | "execution_count": 3,
125 | "metadata": {},
126 | "output_type": "execute_result"
127 | }
128 | ],
129 | "source": [
130 | "@time rand!(X)"
131 | ]
132 | },
133 | {
134 | "cell_type": "markdown",
135 | "metadata": {},
136 | "source": [
137 | "Let's try to do something with it."
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": 4,
143 | "metadata": {
144 | "collapsed": false
145 | },
146 | "outputs": [
147 | {
148 | "name": "stdout",
149 | "output_type": "stream",
150 | "text": [
151 | " 8.432462 seconds (24.76 k allocations: 1.053 MB)\n"
152 | ]
153 | },
154 | {
155 | "data": {
156 | "text/plain": [
157 | "2.4999423858350474e8"
158 | ]
159 | },
160 | "execution_count": 4,
161 | "metadata": {},
162 | "output_type": "execute_result"
163 | }
164 | ],
165 | "source": [
166 | "@time sum(X)"
167 | ]
168 | },
169 | {
170 | "cell_type": "markdown",
171 | "metadata": {},
172 | "source": [
173 | "But `mmap` quickly stops working..."
174 | ]
175 | },
176 | {
177 | "cell_type": "code",
178 | "execution_count": null,
179 | "metadata": {
180 | "collapsed": false
181 | },
182 | "outputs": [],
183 | "source": [
184 | "sum(sin(X).^2 + cos(X).^2)\n",
185 | "\n",
186 | "# results in 8 GB allocation at any given time."
187 | ]
188 | },
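  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The expression above is expensive because `sin(X)`, `cos(X)` and every intermediate result are full-size temporary arrays. Processing the data one block at a time avoids this, and that is the idea ComputeFramework automates. A minimal hand-written sketch follows, assuming a hypothetical helper `blockwise_sum` (not a ComputeFramework function) and a small in-memory vector standing in for the memory-mapped `X`:\n",
    "\n",
    "```julia\n",
    "# Hypothetical helper, not part of ComputeFramework: reduce `f` over `x`\n",
    "# one block at a time so that no full-size temporary array is allocated.\n",
    "function blockwise_sum(f, x, blocksize)\n",
    "    total = 0.0\n",
    "    for lo in 1:blocksize:length(x)\n",
    "        hi = min(lo + blocksize - 1, length(x))\n",
    "        block = x[lo:hi]                # copies at most `blocksize` elements\n",
    "        total += sum(map(f, block))     # temporary is also at most `blocksize` long\n",
    "    end\n",
    "    return total\n",
    "end\n",
    "\n",
    "# Same computation as above, on a vector small enough to try anywhere.\n",
    "blockwise_sum(v -> sin(v)^2 + cos(v)^2, rand(10^6), 10^5)\n",
    "```\n",
    "\n",
    "Peak extra memory is then proportional to `blocksize` rather than to `length(x)`, at the cost of writing the blocking logic by hand for every expression."
   ]
  },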
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {
192 | "collapsed": true
193 | },
194 | "source": [
195 | "### Working with big arrays in ComputeFramework"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 1,
201 | "metadata": {
202 | "collapsed": false
203 | },
204 | "outputs": [],
205 | "source": [
206 | "#Pkg.clone(\"git://github.com/shashi/ComputeFramework.jl.git\")\n",
207 | "using ComputeFramework"
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": 4,
213 | "metadata": {
214 | "collapsed": false
215 | },
216 | "outputs": [],
217 | "source": [
218 | "x_node = rand(BlockPartition(5*10^7), 5*10^8);"
219 | ]
220 | },
221 | {
222 | "cell_type": "markdown",
223 | "metadata": {},
224 | "source": [
225 | "We have not actually created the array `x_node` yet. `x_node` represents a *contract* to create the array.\n",
226 | "\n",
227 | "We can create the array and write it directly to disk using `compute(save(x_node, \"/scratch/X\"))`. Again, you can think of `save` as returning a *contract* to save its input.\n",
228 | "\n",
229 | "`compute` is the ComputeFramework primitive that actually realizes a contract."
230 | ]
231 | },
232 | {
233 | "cell_type": "code",
234 | "execution_count": 8,
235 | "metadata": {
236 | "collapsed": false
237 | },
238 | "outputs": [
239 | {
240 | "name": "stdout",
241 | "output_type": "stream",
242 | "text": [
243 | " 13.802681 seconds (5.75 k allocations: 3.726 GB, 2.04% gc time)\n"
244 | ]
245 | },
246 | {
247 | "data": {
248 | "text/plain": [
249 | "ComputeFramework.Computed(500000000 Array{Float64,1} in 10 parts)"
250 | ]
251 | },
252 | "execution_count": 8,
253 | "metadata": {},
254 | "output_type": "execute_result"
255 | }
256 | ],
257 | "source": [
258 | "@time X = compute(save(x_node, \"/scratch/X\"))"
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": 9,
264 | "metadata": {
265 | "collapsed": false
266 | },
267 | "outputs": [
268 | {
269 | "data": {
270 | "text/plain": [
271 | "ComputeFramework.Computed(500000000 Array{Float64,1} in 10 parts)"
272 | ]
273 | },
274 | "execution_count": 9,
275 | "metadata": {},
276 | "output_type": "execute_result"
277 | }
278 | ],
279 | "source": [
280 | "X = load(Context(), \"/scratch/X\")"
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": 11,
286 | "metadata": {
287 | "collapsed": false
288 | },
289 | "outputs": [],
290 | "source": [
291 | "result = sum(sin(X).^2 + cos(X).^2);"
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 12,
297 | "metadata": {
298 | "collapsed": false
299 | },
300 | "outputs": [
301 | {
302 | "name": "stdout",
303 | "output_type": "stream",
304 | "text": [
305 | " 47.491239 seconds (705.96 k allocations: 26.109 GB, 1.82% gc time)\n"
306 | ]
307 | },
308 | {
309 | "data": {
310 | "text/plain": [
311 | "5.0e8"
312 | ]
313 | },
314 | "execution_count": 12,
315 | "metadata": {},
316 | "output_type": "execute_result"
317 | }
318 | ],
319 | "source": [
320 | "@time compute(result)"
321 | ]
322 | },
323 | {
324 | "cell_type": "code",
325 | "execution_count": 14,
326 | "metadata": {
327 | "collapsed": false
328 | },
329 | "outputs": [
330 | {
331 | "data": {
332 | "image/svg+xml": [
333 | "\n",
334 | "\n",
336 | "\n",
338 | "\n",
339 | "\n"
1101 | ],
1102 | "text/plain": [
1103 | "GraphViz.Graph(Ptr{Void} @0x0000000003967dc0,false)"
1104 | ]
1105 | },
1106 | "execution_count": 14,
1107 | "metadata": {},
1108 | "output_type": "execute_result"
1109 | }
1110 | ],
1111 | "source": [
1112 | "using GraphViz\n",
1113 | "Graph(show_plan(sum(sin(X).^2 + cos(X).^2)))"
1114 | ]
1115 | },
1116 | {
1117 | "cell_type": "code",
1118 | "execution_count": 15,
1119 | "metadata": {
1120 | "collapsed": false
1121 | },
1122 | "outputs": [
1123 | {
1124 | "data": {
1125 | "text/plain": [
1126 | "ComputeFramework.AllocateArray(Float64,rand,ComputeFramework.DenseDomain{1}((1:300,)),ComputeFramework.BlockPartition{1}((100,)))"
1127 | ]
1128 | },
1129 | "execution_count": 15,
1130 | "metadata": {},
1131 | "output_type": "execute_result"
1132 | }
1133 | ],
1134 | "source": [
1135 | "X = rand(BlockPartition(100, 100), 300, 200)\n",
1136 | "y = rand(BlockPartition(100), 300)"
1137 | ]
1138 | },
1139 | {
1140 | "cell_type": "code",
1141 | "execution_count": 16,
1142 | "metadata": {
1143 | "collapsed": false
1144 | },
1145 | "outputs": [],
1146 | "source": [
1147 | "using Colors"
1148 | ]
1149 | },
1150 | {
1151 | "cell_type": "code",
1152 | "execution_count": 20,
1153 | "metadata": {
1154 | "collapsed": false
1155 | },
1156 | "outputs": [
1157 | {
1158 | "data": {
1159 | "image/svg+xml": [
1160 | "\n",
1161 | "\n",
1163 | ""
1185 | ],
1186 | "text/plain": [
1187 | "3x2 Array{ColorTypes.RGB{T<:Union{AbstractFloat,FixedPointNumbers.FixedPoint{T<:Integer,f}}},2}:\n",
1188 | " RGB{U8}(1.0,0.0,0.0) RGB{U8}(1.0,0.0,0.0)\n",
1189 | " RGB{U8}(1.0,0.0,0.0) RGB{U8}(1.0,0.0,0.0)\n",
1190 | " RGB{U8}(1.0,0.0,0.0) RGB{U8}(1.0,0.0,0.0)"
1191 | ]
1192 | },
1193 | "execution_count": 20,
1194 | "metadata": {},
1195 | "output_type": "execute_result"
1196 | }
1197 | ],
1198 | "source": [
1199 | "red = RGB(1,0,0)\n",
1200 | "blue = RGB(0,0,1)\n",
1201 | "white = RGB(1,1,1)\n",
1202 | "\n",
1203 | "X_blocks = RGB[red for j=1:3, i=1:2]"
1204 | ]
1205 | },
1206 | {
1207 | "cell_type": "code",
1208 | "execution_count": 21,
1209 | "metadata": {
1210 | "collapsed": false
1211 | },
1212 | "outputs": [
1213 | {
1214 | "data": {
1215 | "image/svg+xml": [
1216 | "\n",
1217 | "\n",
1219 | ""
1241 | ],
1242 | "text/plain": [
1243 | "2x3 Array{ColorTypes.RGB{T<:Union{AbstractFloat,FixedPointNumbers.FixedPoint{T<:Integer,f}}},2}:\n",
1244 | " RGB{U8}(1.0,0.0,0.0) RGB{U8}(1.0,0.0,0.0) RGB{U8}(1.0,0.0,0.0)\n",
1245 | " RGB{U8}(1.0,0.0,0.0) RGB{U8}(1.0,0.0,0.0) RGB{U8}(1.0,0.0,0.0)"
1246 | ]
1247 | },
1248 | "execution_count": 21,
1249 | "metadata": {},
1250 | "output_type": "execute_result"
1251 | }
1252 | ],
1253 | "source": [
1254 | "X_blocks'"
1255 | ]
1256 | },
1257 | {
1258 | "cell_type": "code",
1259 | "execution_count": 22,
1260 | "metadata": {
1261 | "collapsed": false
1262 | },
1263 | "outputs": [
1264 | {
1265 | "data": {
1266 | "image/svg+xml": [
1267 | "\n",
1268 | "\n",
1270 | ""
1292 | ],
1293 | "text/plain": [
1294 | "3x2 Array{ColorTypes.RGB{T<:Union{AbstractFloat,FixedPointNumbers.FixedPoint{T<:Integer,f}}},2}:\n",
1295 | " RGB{U8}(0.0,0.0,1.0) RGB{U8}(1.0,1.0,1.0)\n",
1296 | " RGB{U8}(0.0,0.0,1.0) RGB{U8}(1.0,1.0,1.0)\n",
1297 | " RGB{U8}(0.0,0.0,1.0) RGB{U8}(1.0,1.0,1.0)"
1298 | ]
1299 | },
1300 | "execution_count": 22,
1301 | "metadata": {},
1302 | "output_type": "execute_result"
1303 | }
1304 | ],
1305 | "source": [
1306 | "Y_blocks = hcat(RGB[blue for i=1:3], RGB[white for i=1:3])"
1307 | ]
1308 | },
1309 | {
1310 | "cell_type": "code",
1311 | "execution_count": 26,
1312 | "metadata": {
1313 | "collapsed": false
1314 | },
1315 | "outputs": [
1316 | {
1317 | "data": {
1318 | "image/svg+xml": [
1319 | "\n",
1320 | "\n",
1322 | "\n",
1324 | "\n",
1325 | "\n"
1462 | ],
1463 | "text/plain": [
1464 | "GraphViz.Graph(Ptr{Void} @0x00000000050b5750,false)"
1465 | ]
1466 | },
1467 | "execution_count": 26,
1468 | "metadata": {},
1469 | "output_type": "execute_result"
1470 | }
1471 | ],
1472 | "source": [
1473 | "Graph(show_plan(X'))"
1474 | ]
1475 | },
1476 | {
1477 | "cell_type": "code",
1478 | "execution_count": null,
1479 | "metadata": {
1480 | "collapsed": true
1481 | },
1482 | "outputs": [],
1483 | "source": []
1484 | }
1485 | ],
1486 | "metadata": {
1487 | "kernelspec": {
1488 | "display_name": "Julia 0.4.5",
1489 | "language": "julia",
1490 | "name": "julia-0.4"
1491 | },
1492 | "language_info": {
1493 | "file_extension": ".jl",
1494 | "mimetype": "application/julia",
1495 | "name": "julia",
1496 | "version": "0.4.5"
1497 | }
1498 | },
1499 | "nbformat": 4,
1500 | "nbformat_minor": 0
1501 | }
1502 |
--------------------------------------------------------------------------------
/JuliaCon2016_Basics.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": false,
7 | "slideshow": {
8 | "slide_type": "slide"
9 | }
10 | },
11 | "source": [
12 | "## Parallel programming constructs in Base Julia\n",
13 | "\n",
14 | "- Example used\n",
15 | "    - calculate pi using random numbers (circle circumscribed by a square)\n",
16 | "    - Area of circle / Area of square = pi / 4\n",
17 | "\n",
18 | "- Processes vs tasks\n",
19 | "    - Process\n",
20 | "        - Single thread of execution scheduled by the OS\n",
21 | "    - Master process\n",
22 | "        - Driver, orchestrates work\n",
23 | "        - Hosts the REPL in interactive mode, or the main script in non-interactive mode \n",
24 | "    - Workers\n",
25 | "        - Different OS processes, typically one per core\n",
26 | "        - Identified by a numeric process id, not related to the OS pid\n",
27 | "    - Tasks are lightweight co-routines"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {
34 | "collapsed": false,
35 | "slideshow": {
36 | "slide_type": "slide"
37 | }
38 | },
39 | "outputs": [],
40 | "source": [
41 | "# First the serial version\n",
42 | "function serial_π(n)\n",
43 | " in_circle = 0 \n",
44 | " for i in 1:n\n",
45 | " x = rand()\n",
46 | " y = rand()\n",
47 | " in_circle += Int((x^2 + y^2) < 1.0)\n",
48 | " end\n",
49 | " return (in_circle/n) * 4.0\n",
50 | "end"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "metadata": {
57 | "collapsed": false
58 | },
59 | "outputs": [],
60 | "source": [
61 | "println(\"π = \", @time serial_π(10^8) )"
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "#### Compute in parallel\n",
69 | "- First set up a local cluster\n",
70 | "- Master process + Worker processes \n",
71 | "- Workers may be on the same host, or different hosts\n",
72 | "- Workers can be launched on cluster managers like SGE, SLURM, etc\n"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": null,
78 | "metadata": {
79 | "collapsed": false
80 | },
81 | "outputs": [],
82 | "source": [
83 | "if nprocs() == 1\n",
84 | " addprocs(4)\n",
85 | "end\n",
86 | "nprocs()"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": null,
92 | "metadata": {
93 | "collapsed": false
94 | },
95 | "outputs": [],
96 | "source": [
97 | "procs()"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": null,
103 | "metadata": {
104 | "collapsed": false
105 | },
106 | "outputs": [],
107 | "source": [
108 | "workers()"
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "metadata": {},
114 | "source": [
115 | "#### Change the serial version to execute the for loop in parallel\n",
116 | " - Use @parallel for\n",
117 | " - Partitions a \"for\" loop\n",
118 | " - Equally partitioned among available workers\n",
119 | " - Can specify a reduction operator "
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": null,
125 | "metadata": {
126 | "collapsed": false,
127 | "slideshow": {
128 | "slide_type": "slide"
129 | }
130 | },
131 | "outputs": [],
132 | "source": [
133 | "#Parallel version\n",
134 | "function parallel_π(n)\n",
135 | " in_circle = @parallel (+) for i in 1:n # <----- partition work\n",
136 | " x = rand()\n",
137 | " y = rand()\n",
138 | " Int((x^2 + y^2) < 1.0)\n",
139 | " end\n",
140 | " return (in_circle/n) * 4.0\n",
141 | "end"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": null,
147 | "metadata": {
148 | "collapsed": false
149 | },
150 | "outputs": [],
151 | "source": [
152 | "println(\"π = \", @time parallel_π(10^8) )"
153 | ]
154 | },
155 | {
156 | "cell_type": "markdown",
157 | "metadata": {
158 | "collapsed": false,
159 | "slideshow": {
160 | "slide_type": "slide"
161 | }
162 | },
163 | "source": [
164 | "### Julia Tasks\n",
165 | "\n",
166 | "- What is a Julia Task?\n",
167 | " - very lightweight coroutines\n",
168 | " - Not threads!\n",
169 | " - Internal to and scheduled by a Julia Process\n",
170 | " - Runs till it performs an I/O operation or explicitly yields (calls sleep() or yield())\n",
171 | " - A non-yielding task in a process prevents any other code from execution (including I/O operations)\n",
172 | " - Julia process driving external services in parallel\n",
173 | " - Julia master driving worker processes in a Julia cluster \n",
174 | "\n",
175 | "\n",
176 | "Simple example of a single Julia process driving a few external resources\n",
177 | " - Calculate pi using all the machines available at JuliaCon \n"
178 | ]
179 | },
180 | {
181 | "cell_type": "markdown",
182 | "metadata": {},
183 | "source": [
184 | "#### pseudo-code (driver)\n",
185 | "\n",
186 | "```\n",
187 | "schedule a background task to\n",
188 | "    listen on a known port\n",
189 | "    while true\n",
190 | "        accept and store incoming connections from machines at JuliaCon\n",
191 | "    end\n",
192 | "    \n",
193 | "    \n",
194 | "function calculate_pi_in_parallel\n",
195 | "    send out computation requests to all connected machines\n",
196 | "    add each response to a queue as it arrives\n",
197 | "    process responses as they arrive till all responses have been received or a timeout occurs\n",
198 | "    \n",
199 | "``` "
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {},
205 | "source": [
206 | "#### pseudo-code (calculation service)\n",
207 | "\n",
208 | "```\n",
209 | "connect to orchestrator\n",
210 | "while true\n",
211 | " wait for a request\n",
212 | " compute request in parallel locally\n",
213 | " send back the response\n",
214 | "end\n",
215 | "```"
216 | ]
217 | },
218 | {
219 | "cell_type": "markdown",
220 | "metadata": {
221 | "slideshow": {
222 | "slide_type": "slide"
223 | }
224 | },
225 | "source": [
226 | "#### Driver code for reference\n",
227 | "\n",
228 | "```\n",
229 | "# Calculate pi using all instances of the users at JuliaCon\n",
230 | "\n",
231 | "const connections=Set()\n",
232 | "\n",
233 | "@schedule begin\n",
234 | " srvr = listen(8000)\n",
235 | " while true\n",
236 | " sock = accept(srvr)\n",
237 | " push!(connections, sock)\n",
238 | " end\n",
239 | "end\n",
240 | "\n",
241 | "function calc_π(n_each)\n",
242 | " println(\"Processing remotely on possibly $(length(connections)) processes\")\n",
243 | "\n",
244 | " # This function will wait for a maximum of 10.0 seconds for remote workers to return\n",
245 | " tc = Condition()\n",
246 | " @schedule (sleep(10.0); notify(tc)) # <---- notify when 10.0 seconds are up\n",
247 | " \n",
248 | " response_channel = Channel()\n",
249 | " \n",
250 | " nconn = 0\n",
251 | " conn2 = copy(connections) \n",
252 | " \n",
253 | " for c in conn2\n",
254 | " nconn += 1 \n",
255 | " @async try # <---- start all remote requests\n",
256 | " serialize(c, n_each)\n",
257 | " put!(response_channel, deserialize(c))\n",
258 | " catch e\n",
259 | " put!(response_channel, :ERROR)\n",
260 | " delete!(connections, c)\n",
261 | " finally\n",
262 | " notify(tc)\n",
263 | " end\n",
264 | " end\n",
265 | " \n",
266 | " incircle = 0\n",
267 | " total = 0\n",
268 | " \n",
269 | " # wait for all responses or the timeout\n",
270 | " for i in 1:nconn\n",
271 | " !isready(response_channel) && wait(tc) # Block wait for a pending response or a timeout\n",
272 | " !isready(response_channel) && break # Still not ready, indicates a timeout\n",
273 | " \n",
274 | " resp = take!(response_channel)\n",
275 | " if resp != :ERROR\n",
276 | " incircle += resp\n",
277 | " total += n_each\n",
278 | " println(\"pi calculated from $nconn workers = \", 4*incircle/total)\n",
279 | " end\n",
280 | " end\n",
281 | " \n",
282 | " return 4*incircle/total\n",
283 | " \n",
284 | "end\n",
285 | "\n",
286 | "calc_π(10^6)\n",
287 | "\n",
288 | "```"
289 | ]
290 | },
291 | {
292 | "cell_type": "code",
293 | "execution_count": null,
294 | "metadata": {
295 | "collapsed": false
296 | },
297 | "outputs": [],
298 | "source": [
299 | "\n",
300 | "##################################################\n",
301 | "# Make available your local computation resources\n",
302 | "##################################################\n",
303 | "@schedule begin\n",
304 | " c = connect(\"107.23.255.102\", 8000)\n",
305 | " while true\n",
306 | " num_points = deserialize(c) # <--- Block wait for a request \n",
307 | " \n",
308 | " in_circle = @parallel (+) for i in 1:num_points # <--- Use all available local cores\n",
309 | " Int(rand()^2 + rand()^2 < 1)\n",
310 | " end\n",
311 | " \n",
312 | " println(\"Received request for $num_points points. Response $in_circle\")\n",
313 | " serialize(c, in_circle) # <--- send back response\n",
314 | " end\n",
315 | "end\n"
316 | ]
317 | },
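  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both the driver and the calculation service rely on tasks giving up control whenever they block. A small self-contained sketch of that behaviour, using only `@sync`, `@async` and `sleep` from Base (the variable name `order` is made up for this example):\n",
    "\n",
    "```julia\n",
    "order = Int[]\n",
    "\n",
    "@sync begin\n",
    "    # Each @async block becomes a task; sleep() yields to the scheduler.\n",
    "    @async (sleep(0.2); push!(order, 1))\n",
    "    @async (sleep(0.1); push!(order, 2))\n",
    "end   # @sync returns once both tasks have finished\n",
    "\n",
    "println(order)\n",
    "```\n",
    "\n",
    "Both tasks run inside the same process. The second task finishes first because its sleep is shorter, and since the two sleeps overlap the block takes about 0.2 seconds rather than 0.3."
   ]
  },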
318 | {
319 | "cell_type": "markdown",
320 | "metadata": {},
321 | "source": [
322 | "#### Using Julia Tasks and Workers together\n",
323 | "\n",
324 | "Let us build a simple distributed vector\n",
325 | "- architecturally similar to DistributedArrays.jl\n",
326 | "- create a distributed vector of random floats and implement a map function"
327 | ]
328 | },
329 | {
330 | "cell_type": "code",
331 | "execution_count": null,
332 | "metadata": {
333 | "collapsed": false,
334 | "slideshow": {
335 | "slide_type": "slide"
336 | }
337 | },
338 | "outputs": [],
339 | "source": [
340 | "nprocs() == 1 && addprocs(4)\n",
341 | "\n",
342 | "type DVector\n",
343 | " refs::Array{RemoteRef} # references to localparts\n",
344 | " cuts::Array{UnitRange{Int}} # cut of vector on ith worker\n",
345 | " pids::Array{Int} # participating workers, refs[i] is on pids[i]\n",
346 | " \n",
347 | " function DVector(N)\n",
348 | " refs=[]\n",
349 | " cuts=[]\n",
350 | " pids=workers()\n",
351 | " localpart_len = div(N, nworkers())\n",
352 | " ncut_start = 1\n",
353 | " for p in pids\n",
354 | " if p == pids[end]\n",
355 | " localpart_len = localpart_len + rem(N, nworkers())\n",
356 | " end\n",
357 | " push!(refs, remotecall(p, rand, localpart_len)) # create the localpart on each worker\n",
358 | " # and hold a reference to it\n",
359 | " \n",
360 | " push!(cuts, ncut_start:ncut_start+localpart_len-1) # Which worker has which part\n",
361 | " ncut_start += localpart_len\n",
362 | " end\n",
363 | " return new(refs, cuts, workers())\n",
364 | " end\n",
365 | "end\n",
366 | "\n",
367 | "function Base.convert(::Type{Array}, d::DVector)\n",
368 | " A = Array(Float64, last(d.cuts[end]))\n",
369 | " @sync for (i,r) in enumerate(d.refs) # wait for all enclosed requests to finish\n",
370 | " @async A[d.cuts[i]] = fetch(r) # perform the \"fetching\" in parallel\n",
371 | " end\n",
372 | " A\n",
373 | "end\n",
374 | "\n",
375 | "function Base.getindex(d::DVector, i)\n",
376 | " idx = findfirst(x -> i in x, d.cuts) # Locate which ref has the index we need\n",
377 | "\n",
378 | " # fetch only the single element. fetch localpart on correct worker and index locally. \n",
379 | " remotecall_fetch(d.pids[idx], (li, r) -> fetch(r)[li], i-first(d.cuts[idx])+1, d.refs[idx])\n",
380 | "end\n"
381 | ]
382 | },
383 | {
384 | "cell_type": "code",
385 | "execution_count": null,
386 | "metadata": {
387 | "collapsed": false,
388 | "slideshow": {
389 | "slide_type": "slide"
390 | }
391 | },
392 | "outputs": [],
393 | "source": [
394 | "d=DVector(12) # As you can see, the local structure only holds references to the distributed parts"
395 | ]
396 | },
397 | {
398 | "cell_type": "code",
399 | "execution_count": null,
400 | "metadata": {
401 | "collapsed": false,
402 | "slideshow": {
403 | "slide_type": "slide"
404 | }
405 | },
406 | "outputs": [],
407 | "source": [
408 | "# gather distributed parts\n",
409 | "Array(d)"
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": null,
415 | "metadata": {
416 | "collapsed": false,
417 | "slideshow": {
418 | "slide_type": "slide"
419 | }
420 | },
421 | "outputs": [],
422 | "source": [
423 | "d[9]"
424 | ]
425 | },
426 | {
427 | "cell_type": "code",
428 | "execution_count": null,
429 | "metadata": {
430 | "collapsed": false,
431 | "slideshow": {
432 | "slide_type": "slide"
433 | }
434 | },
435 | "outputs": [],
436 | "source": [
437 | "# Implement a distributed map\n",
438 | "function Base.map!(f, d::DVector)\n",
439 | " @sync for (i, p) in enumerate(d.pids)\n",
440 | " @async remotecall_wait(p, (f,r)->(map!(f, fetch(r)); nothing), f, d.refs[i])\n",
441 | " end\n",
442 | " d\n",
443 | "end"
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": null,
449 | "metadata": {
450 | "collapsed": false,
451 | "slideshow": {
452 | "slide_type": "slide"
453 | }
454 | },
455 | "outputs": [],
456 | "source": [
457 | "# Let's try it out\n",
458 | "map!(x->1.0, d)"
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": null,
464 | "metadata": {
465 | "collapsed": false
466 | },
467 | "outputs": [],
468 | "source": [
469 | "# gather parts and display\n",
470 | "Array(d)"
471 | ]
472 | },
473 | {
474 | "cell_type": "markdown",
475 | "metadata": {
476 | "slideshow": {
477 | "slide_type": "slide"
478 | }
479 | },
480 | "source": [
481 | "The DistributedArrays.jl package provides a complete implementation of distributed arrays."
482 | ]
483 | }
484 | ],
485 | "metadata": {
486 | "kernelspec": {
487 | "display_name": "Julia 0.4.5",
488 | "language": "julia",
489 | "name": "julia-0.4"
490 | },
491 | "language_info": {
492 | "file_extension": ".jl",
493 | "mimetype": "application/julia",
494 | "name": "julia",
495 | "version": "0.4.5"
496 | }
497 | },
498 | "nbformat": 4,
499 | "nbformat_minor": 0
500 | }
501 |
--------------------------------------------------------------------------------
/JuliaParallelGlossary.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JuliaParallel/ParallelWorkshopJuliaCon2016/b478cbad86b7441842ae99f6c99d38212f56c635/JuliaParallelGlossary.pdf
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2016 Andreas Noack
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ParallelWorkshop
2 | Repo for the parallel workshop at JuliaCon 2016
3 |
--------------------------------------------------------------------------------
/WorkshopDArray.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Using DArrays for reading and manipulating data\n",
8 | "## Parallel Workshop JuliaCon 2016\n",
9 | "### DArrays basics"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "collapsed": false
17 | },
18 | "outputs": [],
19 | "source": [
20 | "# Add processes\n",
21 | "addprocs(8)\n",
22 | "# Use package for distributed arrays\n",
23 | "using DistributedArrays"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 12,
29 | "metadata": {
30 | "collapsed": false
31 | },
32 | "outputs": [
33 | {
34 | "data": {
35 | "text/plain": [
36 | "9-element Array{Int64,1}:\n",
37 | " 1\n",
38 | " 2\n",
39 | " 3\n",
40 | " 4\n",
41 | " 5\n",
42 | " 6\n",
43 | " 7\n",
44 | " 8\n",
45 | " 9"
46 | ]
47 | },
48 | "execution_count": 12,
49 | "metadata": {},
50 | "output_type": "execute_result"
51 | }
52 | ],
53 | "source": [
54 | "# Create vector of process IDs\n",
55 | "C = procs()"
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": 13,
61 | "metadata": {
62 | "collapsed": false
63 | },
64 | "outputs": [
65 | {
66 | "data": {
67 | "text/plain": [
68 | "9-element Array{Int64,1}:\n",
69 | " 1\n",
70 | " 4\n",
71 | " 9\n",
72 | " 16\n",
73 | " 25\n",
74 | " 36\n",
75 | " 49\n",
76 | " 64\n",
77 | " 81"
78 | ]
79 | },
80 | "execution_count": 13,
81 | "metadata": {},
82 | "output_type": "execute_result"
83 | }
84 | ],
85 | "source": [
86 | "# Apply a map to the vector\n",
87 | "map(t -> t*t, C)"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": 14,
93 | "metadata": {
94 | "collapsed": false
95 | },
96 | "outputs": [
97 | {
98 | "data": {
99 | "text/plain": [
100 | "9-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:\n",
101 | " 1\n",
102 | " 2\n",
103 | " 3\n",
104 | " 4\n",
105 | " 5\n",
106 | " 6\n",
107 | " 7\n",
108 | " 8\n",
109 | " 9"
110 | ]
111 | },
112 | "execution_count": 14,
113 | "metadata": {},
114 | "output_type": "execute_result"
115 | }
116 | ],
117 | "source": [
118 | "# Make the vector distributed\n",
119 | "D = distribute(C)"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 16,
125 | "metadata": {
126 | "collapsed": false
127 | },
128 | "outputs": [
129 | {
130 | "data": {
131 | "text/plain": [
132 | "8-element Array{Tuple{UnitRange{Int64}},1}:\n",
133 | " (1:1,)\n",
134 | " (2:2,)\n",
135 | " (3:3,)\n",
136 | " (4:5,)\n",
137 | " (6:6,)\n",
138 | " (7:7,)\n",
139 | " (8:8,)\n",
140 | " (9:9,)"
141 | ]
142 | },
143 | "execution_count": 16,
144 | "metadata": {},
145 | "output_type": "execute_result"
146 | }
147 | ],
148 | "source": [
149 | "# Show how the vector is distributed across the workers\n",
150 | "D.indexes"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": 19,
156 | "metadata": {
157 | "collapsed": false
158 | },
159 | "outputs": [
160 | {
161 | "data": {
162 | "text/plain": [
163 | "9-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:\n",
164 | " 1\n",
165 | " 4\n",
166 | " 9\n",
167 | " 16\n",
168 | " 25\n",
169 | " 36\n",
170 | " 49\n",
171 | " 64\n",
172 | " 81"
173 | ]
174 | },
175 | "execution_count": 19,
176 | "metadata": {},
177 | "output_type": "execute_result"
178 | }
179 | ],
180 | "source": [
181 | "# apply map to distributed vector (looks identical to non-distributed case)\n",
182 | "map(t -> t*t, D)"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": 20,
188 | "metadata": {
189 | "collapsed": false
190 | },
191 | "outputs": [
192 | {
193 | "data": {
194 | "text/plain": [
195 | "9-element DistributedArrays.DArray{UTF8String,1,Array{UTF8String,1}}:\n",
196 | " \"January\" \n",
197 | " \"February\" \n",
198 | " \"March\" \n",
199 | " \"April\" \n",
200 | " \"May\" \n",
201 | " \"June\" \n",
202 | " \"July\" \n",
203 | " \"August\" \n",
204 | " \"September\""
205 | ]
206 | },
207 | "execution_count": 20,
208 | "metadata": {},
209 | "output_type": "execute_result"
210 | }
211 | ],
212 | "source": [
213 | "# Distributed vectors are not restricted to numerical types\n",
214 | "map(t -> Dates.monthname((t - 1) % 12 + 1), D)"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 27,
220 | "metadata": {
221 | "collapsed": false
222 | },
223 | "outputs": [
224 | {
225 | "name": "stdout",
226 | "output_type": "stream",
227 | "text": [
228 | "January is my favorite month.\n",
229 | "February is my favorite month.\n",
230 | "March is my favorite month.\n",
231 | "April is my favorite month.\n",
232 | "May is my favorite month.\n",
233 | "June is my favorite month.\n",
234 | "July is my favorite month.\n",
235 | "August is my favorite month.\n",
236 | "September is my favorite month.\n",
237 | "\n"
238 | ]
239 | }
240 | ],
241 | "source": [
242 | "# A slightly more complicated example of map and reduce\n",
243 | "monthString = map(t -> Dates.monthname((t - 1) % 12 + 1) |> s -> s*\" is my favorite month.\\n\", D) |>\n",
244 | " t -> reduce(*, Array(t))\n",
245 | "println(monthString)"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 30,
251 | "metadata": {
252 | "collapsed": false
253 | },
254 | "outputs": [
255 | {
256 | "data": {
257 | "text/plain": [
258 | "32-element DistributedArrays.DArray{Array{Float64,2},1,Array{Array{Float64,2},1}}:\n",
259 | " 5x5 Array{Float64,2}:\n",
260 | " -1.06675 -0.862483 0.068994 -0.954383 0.612892 \n",
261 | " 1.23371 0.470195 -0.569822 1.97442 -0.357816 \n",
262 | " -0.678979 0.283267 -0.719494 -0.321645 0.30929 \n",
263 | " 0.334916 -1.22503 0.745611 -2.33004 0.0618629\n",
264 | " 0.63521 -1.19402 -1.89202 -1.41228 -0.399258 \n",
265 | " 5x5 Array{Float64,2}:\n",
266 | " -0.263565 0.792395 1.4206 -1.04345 -1.3819 \n",
267 | " -0.774798 -2.12411 0.233554 0.480621 -1.07603 \n",
268 | " 0.106657 -0.635091 1.4687 1.32517 -0.53843 \n",
269 | " -0.109394 -0.351786 -1.76389 -1.26364 0.923438\n",
270 | " -1.03854 0.719064 -0.0939638 -0.795515 -1.6328 \n",
271 | " 5x5 Array{Float64,2}:\n",
272 | " 0.552984 0.771221 -1.13863 -0.508559 -0.0113824\n",
273 | " -0.933323 0.108645 -0.156966 0.245402 -1.24307 \n",
274 | " -1.71768 -0.63347 -1.1948 -0.513568 0.102846 \n",
275 | " 1.00582 0.5017 0.254344 -0.459188 -1.24642 \n",
276 | " 0.578157 -0.955268 -0.486167 0.69969 -1.20993 \n",
277 | " 5x5 Array{Float64,2}:\n",
278 | " -0.789508 0.618741 -0.453641 -0.822818 -0.557663\n",
279 | " 0.974117 -1.83974 -0.277178 0.512668 1.27104 \n",
280 | " -0.709733 0.612776 1.49285 0.317977 0.139118\n",
281 | " 0.391671 -1.88709 -0.646619 1.6739 -0.567971\n",
282 | " 2.12636 -0.540806 0.127479 1.23999 0.331377 \n",
283 | " 5x5 Array{Float64,2}:\n",
284 | " -1.39956 -1.46911 -0.107517 -1.02874 0.135895 \n",
285 | " -0.899033 -0.782373 0.916431 0.543763 -1.67456 \n",
286 | " 1.88291 -2.25786 1.17409 -1.72009 0.753339 \n",
287 | " 0.845102 -0.374647 0.6679 0.603797 0.748118 \n",
288 | " 1.04906 -0.371608 0.802411 1.1088 -0.0750549 \n",
289 | " 5x5 Array{Float64,2}:\n",
290 | " 0.961773 2.66996 -0.267378 -1.78509 0.571654\n",
291 | " -0.0500044 0.0573963 -1.02788 0.200939 0.783379\n",
292 | " -1.07686 -0.0784243 0.251457 1.63682 0.356293\n",
293 | " 1.01147 0.294872 -1.22372 -0.889095 1.66064 \n",
294 | " 0.734298 0.88554 0.0373447 -0.754012 -0.198345 \n",
295 | " 5x5 Array{Float64,2}:\n",
296 | " -0.0692339 -1.31255 0.919369 -0.427185 0.39729 \n",
297 | " -0.205595 -0.691149 -0.286952 1.00361 -0.833616\n",
298 | " -0.2555 0.660427 2.39717 0.091816 -0.182907\n",
299 | " 0.0337576 0.484808 -0.00777874 -1.20292 0.204658\n",
300 | " 1.4161 1.21996 0.338959 -0.196538 0.418386 \n",
301 | " 5x5 Array{Float64,2}:\n",
302 | " -1.81399 1.53538 0.218797 -0.764974 1.1978 \n",
303 | " 0.507625 0.521923 -1.19269 0.887064 0.0146956\n",
304 | " 1.1279 0.679558 1.43987 -0.145497 0.794897 \n",
305 | " 0.243419 -0.999 0.628626 -0.899615 0.56604 \n",
306 | " 0.38032 0.265209 -0.123539 0.822136 0.103397 \n",
307 | " 5x5 Array{Float64,2}:\n",
308 | " -1.26677 1.30111 -1.19482 0.944662 1.27032 \n",
309 | " 0.645873 -1.69587 0.790426 1.57385 -1.04872 \n",
310 | " 0.895864 -0.904526 0.211829 0.432422 0.908886 \n",
311 | " 0.909365 -0.0247251 -1.3992 1.35324 0.00605672\n",
312 | " 0.556188 2.0113 0.218212 0.269224 0.483188 \n",
313 | " 5x5 Array{Float64,2}:\n",
314 | " 0.492425 0.0207142 -1.74526 0.269466 0.437831\n",
315 | " -1.09313 0.18195 0.790814 -0.396866 -0.820381\n",
316 | " 0.0961135 0.772636 0.40537 0.527496 -0.598072\n",
317 | " 2.67332 -0.265071 -0.957505 0.561186 -0.257624\n",
318 | " -0.844598 -0.202553 -0.374446 1.264 0.692316 \n",
319 | " 5x5 Array{Float64,2}:\n",
320 | " -1.9382 1.13558 0.266919 -2.37164 0.862658\n",
321 | " 0.399522 1.10717 -1.47292 -0.296714 -0.680509\n",
322 | " 0.579192 -0.488133 -0.720453 2.66568 1.54534 \n",
323 | " 1.3137 2.04789 -0.858939 -0.320695 0.435022\n",
324 | " -0.699034 -0.0083402 -1.41206 -0.879052 -1.57793 \n",
325 | " 5x5 Array{Float64,2}:\n",
326 | " -0.768303 0.875975 -0.826773 -0.164796 -1.61447 \n",
327 | " -0.766573 -0.315926 1.11558 -1.29352 -1.37432 \n",
328 | " 0.727713 0.865269 0.981275 -1.25524 -0.0959191\n",
329 | " 1.44931 0.51309 -0.833846 -0.124609 -0.524423 \n",
330 | " -1.61305 -0.988117 -0.811481 2.05141 -0.315667 \n",
331 | " 5x5 Array{Float64,2}:\n",
332 | " -1.36786 0.0340207 -0.237405 -0.274716 0.8919 \n",
333 | " -0.254547 0.811721 -0.503693 1.28031 0.0555199\n",
334 | " -0.105329 1.38879 0.542973 0.0597622 0.378819 \n",
335 | " -0.308847 -0.620924 0.0457921 0.86464 0.22702 \n",
336 | " -0.140736 -2.23197 0.866094 0.0456088 0.326839 \n",
337 | " ⋮ \n",
338 | " 5x5 Array{Float64,2}:\n",
339 | " -0.336494 0.0951319 0.617808 0.554639 -0.540642 \n",
340 | " 0.169 1.56658 -0.322309 -0.190795 0.768436 \n",
341 | " -0.0130848 -1.35391 -0.620316 0.525973 -0.69227 \n",
342 | " 1.34151 0.421974 -1.01115 0.208916 0.00993669\n",
343 | " 0.170991 -1.23985 0.53718 0.273387 -1.54653 \n",
344 | " 5x5 Array{Float64,2}:\n",
345 | " 0.817697 0.0477398 0.484062 0.933636 -0.215703 \n",
346 | " -0.292451 0.400147 -0.520761 -0.0251304 0.249522 \n",
347 | " -1.39157 0.781195 0.122834 -2.35152 1.29159 \n",
348 | " -1.0079 -0.537778 0.305858 -0.852588 0.165614 \n",
349 | " 0.418937 1.22986 1.68048 0.215558 -0.0718327 \n",
350 | " 5x5 Array{Float64,2}:\n",
351 | " -2.37222 -1.20011 -0.624817 -1.19058 0.403215 \n",
352 | " 0.404301 0.299517 -0.21798 -0.38811 -0.15543 \n",
353 | " -1.37567 1.57264 1.30093 -0.129152 1.07456 \n",
354 | " 0.372478 -0.736678 -0.546062 1.14016 -0.645153 \n",
355 | " -1.73503 -0.678465 0.749759 0.224002 0.0763105 \n",
356 | " 5x5 Array{Float64,2}:\n",
357 | " 0.298955 0.539073 0.196446 0.663063 0.205821 \n",
358 | " -0.460247 0.183238 -0.312501 1.6534 0.0639343\n",
359 | " 0.687594 -0.334807 0.63398 0.483715 0.851791 \n",
360 | " -0.42124 0.4964 0.121562 0.260949 0.793393 \n",
361 | " 0.215331 1.35275 0.471742 -1.36546 0.213337 \n",
362 | " 5x5 Array{Float64,2}:\n",
363 | " 0.35473 0.901614 0.418195 -1.45422 -0.75786 \n",
364 | " 0.836451 1.57328 -1.0498 0.22519 -0.121084 \n",
365 | " 1.97187 -0.556832 1.49936 -0.800662 -1.58326 \n",
366 | " 1.39208 0.556914 -0.312284 -0.446597 0.492288 \n",
367 | " 0.698257 0.32361 -0.728789 -0.253362 0.00586006 \n",
368 | " 5x5 Array{Float64,2}:\n",
369 | " -0.606105 0.518211 -0.20917 -0.18962 2.06473 \n",
370 | " 0.953195 -0.508041 -0.721713 3.23058 -0.909371 \n",
371 | " -0.546238 1.2879 0.668295 -1.20234 -0.461488 \n",
372 | " -2.07535 0.41961 0.314493 0.660369 -0.0927285\n",
373 | " -0.597925 -0.0303698 -1.25935 -0.208039 0.690813 \n",
374 | " 5x5 Array{Float64,2}:\n",
375 | " -1.2314 -1.38403 -0.382574 -0.730484 -0.116011 \n",
376 | " 0.377068 0.224846 -0.461811 0.117082 0.0833577\n",
377 | " 0.182967 -1.20327 -0.874324 -0.358284 1.53178 \n",
378 | " 2.01423 0.871982 -1.01922 -0.283891 -1.1196 \n",
379 | " -0.19053 -0.56969 0.412514 -0.488731 1.02411 \n",
380 | " 5x5 Array{Float64,2}:\n",
381 | " -0.302492 -0.341963 -0.666326 1.49061 0.0832542\n",
382 | " 1.10361 -1.27165 0.684021 0.25974 0.779812 \n",
383 | " -0.598083 0.297564 -0.99761 1.14976 1.35534 \n",
384 | " -0.0810693 -0.339584 -1.28786 -1.71457 0.684911 \n",
385 | " -0.0570092 -0.608203 -0.127248 -0.253749 0.403802 \n",
386 | " 5x5 Array{Float64,2}:\n",
387 | " -1.12751 -1.49364 0.316947 -0.636222 1.20663 \n",
388 | " -0.932772 0.328505 -1.51835 0.494562 0.757595\n",
389 | " -0.520126 -1.26671 1.35733 1.39144 -1.23887 \n",
390 | " 0.508748 -0.395086 1.1448 0.143046 -0.145262\n",
391 | " -0.701212 -0.692988 1.1467 1.14904 0.714268 \n",
392 | " 5x5 Array{Float64,2}:\n",
393 | " 0.570958 0.616406 -1.19313 0.493156 0.403437\n",
394 | " -0.717995 -0.299192 0.872123 0.894911 -2.12749 \n",
395 | " -1.3683 0.611856 0.210335 0.0987628 0.438487\n",
396 | " 1.52265 -0.732908 1.03063 -0.37912 1.15023 \n",
397 | " -2.10254 0.0765609 1.13586 1.53325 1.00984 \n",
398 | " 5x5 Array{Float64,2}:\n",
399 | " 0.328605 -0.838537 -0.0106646 0.10132 -0.768491\n",
400 | " 1.59261 1.37791 -0.599187 0.454541 0.680724\n",
401 | " -1.50817 1.11634 1.8366 0.029672 0.267361\n",
402 | " 0.0385608 -0.934858 -0.913016 -1.03821 1.55526 \n",
403 | " 0.27456 -1.27207 0.452561 -0.565003 1.21654 \n",
404 | " 5x5 Array{Float64,2}:\n",
405 | " -1.44782 0.857923 0.651314 -0.656111 -1.32793 \n",
406 | " 2.0212 -0.572611 -0.607868 -0.819101 1.24945 \n",
407 | " -0.494674 2.44147 0.309794 0.344964 2.2493 \n",
408 | " -2.75853 0.215022 0.990857 1.2092 -0.254345\n",
409 | " -0.809851 -0.669484 -1.05596 -1.28696 -0.749188 "
410 | ]
411 | },
412 | "execution_count": 30,
413 | "metadata": {},
414 | "output_type": "execute_result"
415 | }
416 | ],
417 | "source": [
418 | "# Distributed comprehension\n",
419 | "D55 = @DArray [randn(5,5) for i = 1:32]"
420 | ]
421 | },
422 | {
423 | "cell_type": "code",
424 | "execution_count": 31,
425 | "metadata": {
426 | "collapsed": false
427 | },
428 | "outputs": [
429 | {
430 | "data": {
431 | "text/plain": [
432 | "32-element DistributedArrays.DArray{Array{Float64,1},1,Array{Array{Float64,1},1}}:\n",
433 | " [3.999836455760208,2.5044582011232213,1.7051737717267952,0.6770905297265457,0.2298652678344431] \n",
434 | " [3.5269810373142625,2.9504161034893985,2.3501373295253316,0.904445051394586,0.036825704100338044] \n",
435 | " [2.6238799978968435,2.1866508742328703,1.707233356369122,1.2765590827198168,0.5839922950173981] \n",
436 | " [4.105123344980141,1.9854356889225975,1.6245519564067503,1.4201918512438185,0.03971714564736037] \n",
437 | " [3.8691762964769665,2.8147930897207667,2.352397425888243,0.9912135906748117,0.2145582563848968] \n",
438 | " [4.2015580368111705,2.2515617032836204,1.5737011557308977,0.44782935379067657,0.28057510281268183] \n",
439 | " [2.7386924486715265,2.328167427456983,1.674341982152823,1.0797402982138045,0.4057750025903677] \n",
440 | " [2.8585407690602858,2.4226851529761357,1.9197908080706862,0.7444017726046709,0.3663540362476281] \n",
441 | " [3.6701118732695526,2.6341723275159223,1.7506859088988276,1.362974720086374,1.1825383948594033] \n",
442 | " [3.4742905262877843,2.0860653342424276,1.2977934260109951,0.9036045962584492,0.45504202929757265] \n",
443 | " [4.409249113869339,3.141197668106675,2.614351728124699,1.6507454230231815,0.07365050889281405] \n",
444 | " [3.6380792785314315,2.656852465287582,2.184367082319934,0.8367046570960576,0.35079944154310505] \n",
445 | " [2.8988803141221187,1.7753367960466189,1.5474403938726755,0.9968554331002822,0.0003115798651073548]\n",
446 | " ⋮ \n",
447 | " [3.103766003758832,1.844156685844372,1.0756401108095648,0.6446976080949658,0.22320307740767642] \n",
448 | " [3.5396554348818055,2.2375502876355653,1.0636124233929034,0.40475534037574207,0.22040135741398684] \n",
449 | " [3.6273446423394344,2.746717243430661,1.583698444235153,0.5182607438014766,0.2210675428240503] \n",
450 | " [2.4548275769880328,1.6575290583700433,1.3401965530312108,0.7081018687860655,0.04149479884174369] \n",
451 | " [3.3918290013577304,2.6550622794033014,1.4544014879848008,0.9197587449504111,0.44624377584759845] \n",
452 | " [4.03202000276583,2.504409527496782,2.2198124312443506,1.1313042329785816,0.8643293790208498] \n",
453 | " [3.337318895608361,2.107451249950189,1.4975644285354333,0.6650198563671659,0.13250566730732805] \n",
454 | " [2.6201299166478145,2.348963447675852,2.032674793821867,0.8063576985389266,0.3187403196623055] \n",
455 | " [3.392183884411519,2.598590681744475,1.7900231821952424,1.0723948198528697,0.15572848706878345] \n",
456 | " [3.596724033083425,2.6751636179665605,2.1780852993813893,1.1218736309579183,0.31214325604846566] \n",
457 | " [3.1547804874542456,2.6722641067572632,2.1236844746594716,1.0645872892244217,0.09306040656686795] \n",
458 | " [4.431059139087309,3.6470477650975877,1.915227377095197,1.3063810531689377,0.48562771787623377] "
459 | ]
460 | },
461 | "execution_count": 31,
462 | "metadata": {},
463 | "output_type": "execute_result"
464 | }
465 | ],
466 | "source": [
467 | "# Compute the singular values of each matrix in the distributed vector\n",
468 | "Dsvd = map(svdvals, D55)"
469 | ]
470 | },
471 | {
472 | "cell_type": "markdown",
473 | "metadata": {},
474 | "source": [
475 | "### Reading data in parallel"
476 | ]
477 | },
478 | {
479 | "cell_type": "code",
480 | "execution_count": 32,
481 | "metadata": {
482 | "collapsed": false
483 | },
484 | "outputs": [
485 | {
486 | "ename": "LoadError",
487 | "evalue": "LoadError: SystemError: unable to read directory /data/MIMICII: No such file or directory\nwhile loading In[32], in expression starting on line 3",
488 | "output_type": "error",
489 | "traceback": [
490 | "LoadError: SystemError: unable to read directory /data/MIMICII: No such file or directory\nwhile loading In[32], in expression starting on line 3",
491 | "",
492 | " in _setindex! at ./dict.jl:625"
493 | ]
494 | }
495 | ],
496 | "source": [
497 | "# Save the path to the data directory and load a list of subdirectories with the data\n",
498 | "pth = \"/data/MIMICII\"\n",
499 | "dirs = filter(isdir, map(t -> joinpath(pth, t), readdir(pth)))"
500 | ]
501 | },
502 | {
503 | "cell_type": "code",
504 | "execution_count": null,
505 | "metadata": {
506 | "collapsed": false
507 | },
508 | "outputs": [],
509 | "source": [
510 | "# Extract the data files\n",
511 | "fls = mapreduce(t -> map(s -> joinpath(t, s), readdir(t)), vcat, dirs)"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": null,
517 | "metadata": {
518 | "collapsed": false
519 | },
520 | "outputs": [],
521 | "source": [
522 | "# Size in GB\n",
523 | "@time sum(map(filesize, fls))/1024^3"
524 | ]
525 | },
526 | {
527 | "cell_type": "code",
528 | "execution_count": 33,
529 | "metadata": {
530 | "collapsed": false
531 | },
532 | "outputs": [
533 | {
534 | "ename": "LoadError",
535 | "evalue": "LoadError: UndefVarError: fls not defined\nwhile loading In[33], in expression starting on line 2",
536 | "output_type": "error",
537 | "traceback": [
538 | "LoadError: UndefVarError: fls not defined\nwhile loading In[33], in expression starting on line 2",
539 | ""
540 | ]
541 | }
542 | ],
543 | "source": [
544 | "# Create smaller subset of the files to avoid waiting\n",
545 | "flsSmall = fls[1:div(length(fls), 10)]\n",
546 | "@time sum(map(filesize, flsSmall))/1024^3"
547 | ]
548 | },
549 | {
550 | "cell_type": "code",
551 | "execution_count": null,
552 | "metadata": {
553 | "collapsed": false
554 | },
555 | "outputs": [],
556 | "source": [
557 | "# Use the MAT package (written in Julia) for reading MATLAB .mat files\n",
558 | "using MAT"
559 | ]
560 | },
561 | {
562 | "cell_type": "code",
563 | "execution_count": null,
564 | "metadata": {
565 | "collapsed": false
566 | },
567 | "outputs": [],
568 | "source": [
569 | "# Map matread to all the file paths to read in all the files in parallel\n",
570 | "@time dt = map(matread, distribute(flsSmall));"
571 | ]
572 | },
573 | {
574 | "cell_type": "code",
575 | "execution_count": null,
576 | "metadata": {
577 | "collapsed": false
578 | },
579 | "outputs": [],
580 | "source": [
581 | "# The result is a distributed vector of dictionaries of vectors (and a dictionary)\n",
582 | "@show typeof(dt)\n",
583 | "dt[1]"
584 | ]
585 | },
586 | {
587 | "cell_type": "code",
588 | "execution_count": null,
589 | "metadata": {
590 | "collapsed": false
591 | },
592 | "outputs": [],
593 | "source": [
594 | "# Plot a signal\n",
595 | "using PlotlyJS\n",
596 | "plot(PlotlyJS.scattergl(;y = dt[1][\"signal\"][:]))"
597 | ]
598 | },
599 | {
600 | "cell_type": "code",
601 | "execution_count": null,
602 | "metadata": {
603 | "collapsed": false
604 | },
605 | "outputs": [],
606 | "source": [
607 | "# custom cleaner functions are fast in Julia\n",
608 | "# in this example, replace NaNs with zeros for a single signal\n",
609 | "x = map(t -> (isnan(t) ? 0 : t), dt[1][\"signal\"][:]);\n",
610 | "\n",
611 | "# fft\n",
612 | "xfft = fft(x)"
613 | ]
614 | },
615 | {
616 | "cell_type": "code",
617 | "execution_count": null,
618 | "metadata": {
619 | "collapsed": false
620 | },
621 | "outputs": [],
622 | "source": [
623 | "# size in GB\n",
624 | "@time mapreduce(Base.summarysize, +, dt)/1024^3"
625 | ]
626 | },
627 | {
628 | "cell_type": "code",
629 | "execution_count": null,
630 | "metadata": {
631 | "collapsed": false
632 | },
633 | "outputs": [],
634 | "source": [
635 | "# Compute the lengths for the signals\n",
636 | "@time lngs = map(t -> length(t[\"signal\"]), dt)"
637 | ]
638 | },
639 | {
640 | "cell_type": "code",
641 | "execution_count": null,
642 | "metadata": {
643 | "collapsed": false
644 | },
645 | "outputs": [],
646 | "source": [
647 | "# Plot the distribution of the lengths of the signals\n",
648 | "plot(histogram(x = lngs), Layout(yaxis=Dict(\"type\" => \"log\")))"
649 | ]
650 | },
651 | {
652 | "cell_type": "code",
653 | "execution_count": null,
654 | "metadata": {
655 | "collapsed": false
656 | },
657 | "outputs": [],
658 | "source": [
659 | "# My own small package for iterative SVD and a few convenience methods\n",
660 | "\n",
661 | "# Pkg.clone(\"https://github.com/andreasnoack/TSVD.jl\")\n",
662 | "using TSVD\n",
663 | "\n",
664 | "# Define method for converting distributed vector of vectors to distributed matrix\n",
665 | "function Base.hcat{T}(A::DistributedArrays.DArray{Vector{T}})\n",
666 | " n = length(A[1])\n",
667 | " D = DArray((n, length(A)), A.pids, [1, length(A.pids)]) do I\n",
668 | " mB, nB = map(length, I)\n",
669 | " # Create new DArray from the distributed vector of vectors.\n",
670 | " # For now, we assume that each vector is only located on a single\n",
671 | " # worker. Eventually, we'd like to find a more flexible solution.\n",
672 | " B = Array(eltype(A[1]), mB, nB)\n",
673 | " for i = 1:nB\n",
674 | " B[:,i] = A[I[2][i]][I[1]]\n",
675 | " end\n",
676 | " B\n",
677 | " end\n",
678 | " return D\n",
679 | "end\n",
680 | "\n",
681 | "# convenience function for concatenating distributed vectors collected in a vector\n",
682 | "@everywhere function Base.hcat{T<:DistributedArrays.DVector}(x::Vector{T})\n",
683 | " l = length(x)\n",
684 | " if l == 0\n",
685 | " throw(ArgumentError(\"cannot flatten empty vector\"))\n",
686 | " else\n",
687 | " x1 = x[1]\n",
688 | " m, n = size(x1, 1), size(x1, 2)\n",
689 | " B = DArray((m, l*n)) do I\n",
690 | " B_local = Array(eltype(x1), map(length, I))\n",
691 | " for j = 1:length(I[2])\n",
692 | " B_local[:, j] = x[I[2][j]][I[1]]\n",
693 | " end\n",
694 | " return B_local\n",
695 | " end\n",
696 | " return B\n",
697 | " end\n",
698 | "end\n",
699 | "\n",
700 | "@everywhere Base.procs(A::Array) = fill(myid(), 1, 1)\n",
701 | "\n",
702 | "Base.convert{S,T,N,D<:DArray}(::Type{Array{S,N}}, s::SubArray{T,N,D}) = begin\n",
703 | " I = s.indexes\n",
704 | " d = s.parent\n",
705 | "# println(\"Hej\", isa(I,Tuple{Vararg{UnitRange{Int}}}))\n",
706 | "# if isa(I,Tuple{Vararg{UnitRange{Int}}}) && S<:T && T<:S\n",
707 | " l = DistributedArrays.locate(d, map(first, I)...)\n",
708 | " if isequal(d.indexes[l...], I)\n",
709 | " # SubDArray corresponds to a chunk\n",
710 | " return DistributedArrays.chunk(d, l...)\n",
711 | " end\n",
712 | "# end\n",
713 | "# a = Array(S, size(s))\n",
714 | "# a[[1:size(a,i) for i=1:N]...] = s\n",
715 | "# return a\n",
716 | "end"
717 | ]
718 | },
719 | {
720 | "cell_type": "code",
721 | "execution_count": null,
722 | "metadata": {
723 | "collapsed": false
724 | },
725 | "outputs": [],
726 | "source": [
727 | "# Necessary to define our own rep function because Julia's repeat is a bit slow\n",
728 | "@everywhere function rep(x::Vector, l)\n",
729 | " y = similar(x, l)\n",
730 | " cx = cycle(x)\n",
731 | " s = start(cx)\n",
732 | " @inbounds @simd for i = 1:l\n",
733 | " (yi, s) = next(cx, s)\n",
734 | " y[i] = ifelse(isnan(yi), zero(yi), yi)\n",
735 | " end\n",
736 | " return y\n",
737 | "end"
738 | ]
739 | },
740 | {
741 | "cell_type": "code",
742 | "execution_count": null,
743 | "metadata": {
744 | "collapsed": false
745 | },
746 | "outputs": [],
747 | "source": [
748 | "# Compute a distributed vector of vectors of equal lengths\n",
749 | "@time dt1 = let n = 50000\n",
750 | " map(t -> rep(vec(t[\"signal\"]), n), dt);\n",
751 | "end;"
752 | ]
753 | },
754 | {
755 | "cell_type": "code",
756 | "execution_count": null,
757 | "metadata": {
758 | "collapsed": false
759 | },
760 | "outputs": [],
761 | "source": [
762 | "# Convert to a distributed matrix\n",
763 | "@time A = hcat(dt1);"
764 | ]
765 | },
766 | {
767 | "cell_type": "code",
768 | "execution_count": null,
769 | "metadata": {
770 | "collapsed": false
771 | },
772 | "outputs": [],
773 | "source": [
774 | "# Apply the fft along the first dimension of the matrix\n",
775 | "@time B = mapslices(fft, A, 1)"
776 | ]
777 | },
778 | {
779 | "cell_type": "code",
780 | "execution_count": null,
781 | "metadata": {
782 | "collapsed": true
783 | },
784 | "outputs": [],
785 | "source": [
786 | "# Similar to \n",
787 | "map(fft, dt1)"
788 | ]
789 | },
790 | {
791 | "cell_type": "code",
792 | "execution_count": null,
793 | "metadata": {
794 | "collapsed": false
795 | },
796 | "outputs": [],
797 | "source": [
798 | "# Create the initial vector. It has to be distributed.\n",
799 | "v0 = DArray(I -> rand(Complex64, length(I[1])), (size(A, 1),), A.pids[:,1])\n",
800 | "\n",
801 | "# Compute the SVD\n",
802 | "@time U, s, V = TSVD.tsvd(A, 5, initVec = v0, stepSize = 5, debug = true);"
803 | ]
804 | }
805 | ],
806 | "metadata": {
807 | "kernelspec": {
808 | "display_name": "Julia 0.4.5-pre",
809 | "language": "julia",
810 | "name": "julia-0.4"
811 | },
812 | "language_info": {
813 | "file_extension": ".jl",
814 | "mimetype": "application/julia",
815 | "name": "julia",
816 | "version": "0.4.5"
817 | }
818 | },
819 | "nbformat": 4,
820 | "nbformat_minor": 0
821 | }
822 |
--------------------------------------------------------------------------------
/WorkshopMPI.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# MPI and Elemental in Julia\n",
8 | "## Parallel Workshop JuliaCon 2016\n",
9 | "### `MPI.jl`\n",
10 | "- MPI.jl provides\n",
11 | " - Julia wrappers for many MPI functions, but not yet all the newer ones. They can be used in normal script execution, e.g.\n",
12 | " ```\n",
13 | " mpirun -np 100000 julia mpiprogram.jl\n",
14 | " ```\n",
15 | " - An MPI Cluster manager for interactive execution of MPI jobs. See below"
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 1,
21 | "metadata": {
22 | "collapsed": false
23 | },
24 | "outputs": [],
25 | "source": [
26 | "using MPI"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "Create an `MPIManager` and use `addprocs` to launch the workers. This will automatically initialize MPI."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 2,
39 | "metadata": {
40 | "collapsed": false
41 | },
42 | "outputs": [
43 | {
44 | "data": {
45 | "text/plain": [
46 | "8-element Array{Int64,1}:\n",
47 | " 2\n",
48 | " 3\n",
49 | " 4\n",
50 | " 5\n",
51 | " 6\n",
52 | " 7\n",
53 | " 8\n",
54 | " 9"
55 | ]
56 | },
57 | "execution_count": 2,
58 | "metadata": {},
59 | "output_type": "execute_result"
60 | }
61 | ],
62 | "source": [
63 | "man = MPIManager(np = 8)\n",
64 | "addprocs(man)"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "To run a command on the MPI workers, use the `@mpi_do` macro. Here, we store the MPI rank in a variable and `@show` it."
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": 3,
77 | "metadata": {
78 | "collapsed": false
79 | },
80 | "outputs": [
81 | {
82 | "name": "stdout",
83 | "output_type": "stream",
84 | "text": [
85 | "\tFrom worker 2:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 0\n",
86 | "\tFrom worker 9:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 7\n",
87 | "\tFrom worker 3:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 1\n",
88 | "\tFrom worker 7:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 5\n",
89 | "\tFrom worker 5:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 3\n",
90 | "\tFrom worker 8:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 6\n",
91 | "\tFrom worker 4:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 2\n",
92 | "\tFrom worker 6:\tmyrank = MPI.Comm_rank(MPI.COMM_WORLD) = 4\n"
93 | ]
94 | }
95 | ],
96 | "source": [
97 | "@mpi_do man @show myrank = MPI.Comm_rank(MPI.COMM_WORLD);"
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": [
104 | "Allocate the vector `x` on all workers and show its values. Notice that the RNG is not synchronized across workers."
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 4,
110 | "metadata": {
111 | "collapsed": false
112 | },
113 | "outputs": [
114 | {
115 | "name": "stdout",
116 | "output_type": "stream",
117 | "text": [
118 | "\tFrom worker 3:\tx = randn(2) = [1.561807228850006,-1.8991612031066882]\n",
119 | "\tFrom worker 4:\tx = randn(2) = [1.7827074625350516,1.1432842639087033]\n",
120 | "\tFrom worker 5:\tx = randn(2) = [0.7812067994558566,1.3515494345331944]\n",
121 | "\tFrom worker 9:\tx = randn(2) = [-1.0100482065707683,0.46991613460582093]\n",
122 | "\tFrom worker 7:\tx = randn(2) = [-0.4051027120044051,-0.8893470928293499]\n",
123 | "\tFrom worker 6:\tx = randn(2) = [-0.7893776602217285,-2.3168249675165273]\n",
124 | "\tFrom worker 2:\tx = randn(2) = [-2.7318205254023775,0.6965055849677959]\n",
125 | "\tFrom worker 8:\tx = randn(2) = [0.4709601667537731,-1.1963960727081422]\n"
126 | ]
127 | }
128 | ],
129 | "source": [
130 | "@mpi_do man @show x = randn(2)"
131 | ]
132 | },
133 | {
134 | "cell_type": "markdown",
135 | "metadata": {},
136 | "source": [
137 | "Send and receive: rank 3 sends the value `3.0` to rank 2, which receives it into the first element of `x`."
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": 7,
143 | "metadata": {
144 | "collapsed": false
145 | },
146 | "outputs": [
147 | {
148 | "name": "stdout",
149 | "output_type": "stream",
150 | "text": [
151 | "\tFrom worker 7:\tx = [-2.7318205254023775,0.6965055849677959]\n",
152 | "\tFrom worker 4:\tx = [3.0,0.6965055849677959]\n",
153 | "\tFrom worker 3:\tx = [-2.7318205254023775,0.6965055849677959]\n",
154 | "\tFrom worker 2:\tx = [-2.7318205254023775,0.6965055849677959]\n",
155 | "\tFrom worker 5:\tx = [-2.7318205254023775,0.6965055849677959]\n",
156 | "\tFrom worker 8:\tx = [-2.7318205254023775,0.6965055849677959]\n",
157 | "\tFrom worker 6:\tx = [-2.7318205254023775,0.6965055849677959]\n",
158 | "\tFrom worker 9:\tx = [-2.7318205254023775,0.6965055849677959]\n"
159 | ]
160 | }
161 | ],
162 | "source": [
163 | "@mpi_do man begin\n",
164 | " if myrank == 3\n",
165 | " MPI.Send(3.0, 2, 0, MPI.COMM_WORLD)\n",
166 | " elseif myrank == 2\n",
167 | " MPI.Recv!(x, 1, 3, 0, MPI.COMM_WORLD)\n",
168 | " end\n",
169 | "end\n",
170 | "@mpi_do man @show x"
171 | ]
172 | },
173 | {
174 | "cell_type": "markdown",
175 | "metadata": {},
176 | "source": [
177 | "Below, we show an example of `Bcast!`, the Julia wrapper for the collective MPI operation `MPI_Bcast`. The function is overloaded for several input types. The type of the broadcast buffer is always determined from the Julia type, and if the input argument is a Julia vector, the size is determined automatically as well. Alternatively, if the argument is either a vector or a pointer, the length can be specified as an integer argument."
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 8,
183 | "metadata": {
184 | "collapsed": false
185 | },
186 | "outputs": [
187 | {
188 | "name": "stdout",
189 | "output_type": "stream",
190 | "text": [
191 | "\tFrom worker 7:\tx = [-2.7318205254023775,0.6965055849677959]\n",
192 | "\tFrom worker 4:\tx = [-2.7318205254023775,0.6965055849677959]\n",
193 | "\tFrom worker 9:\tx = [-2.7318205254023775,0.6965055849677959]\n",
194 | "\tFrom worker 2:\tx = [-2.7318205254023775,0.6965055849677959]\n",
195 | "\tFrom worker 3:\tx = [-2.7318205254023775,0.6965055849677959]\n",
196 | "\tFrom worker 5:\tx = [-2.7318205254023775,0.6965055849677959]\n",
197 | "\tFrom worker 8:\tx = [-2.7318205254023775,0.6965055849677959]\n",
198 | "\tFrom worker 6:\tx = [-2.7318205254023775,0.6965055849677959]\n"
199 | ]
200 | }
201 | ],
202 | "source": [
203 | "@mpi_do man MPI.Bcast!(x, 0, MPI.COMM_WORLD)\n",
204 | "@mpi_do man @show x"
205 | ]
206 | },
207 | {
208 | "cell_type": "markdown",
209 | "metadata": {
210 | "collapsed": false
211 | },
212 | "source": [
213 | "Estimate $\\pi$ by Monte Carlo sampling on each rank and average the local estimates with `MPI_Reduce` (`MPI_Allreduce` is available in the development version of `MPI.jl`)."
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": 29,
219 | "metadata": {
220 | "collapsed": false
221 | },
222 | "outputs": [
223 | {
224 | "name": "stdout",
225 | "output_type": "stream",
226 | "text": [
227 | "\tFrom worker 2:\tpi_final / MPI.Comm_size(MPI.COMM_WORLD) = 3.1414827\n"
228 | ]
229 | }
230 | ],
231 | "source": [
232 | "@mpi_do man begin\n",
233 | " n = 10^7\n",
234 | " \n",
235 | " # compute pi locally\n",
236 | " pi_local = mapreduce(i -> rand()^2 + rand()^2 < 1, +, 1:n)/n*4\n",
237 | " \n",
238 | " # combine the results with MPI_Reduce\n",
239 | " pi_final = MPI.Reduce(pi_local, MPI.SUM, 0, MPI.COMM_WORLD)\n",
240 | "\n",
241 | " # show the result\n",
242 | " if myrank == 0\n",
243 | " @show pi_final / MPI.Comm_size(MPI.COMM_WORLD)\n",
244 | " end\n",
245 | "end"
246 | ]
247 | },
248 | {
249 | "cell_type": "markdown",
250 | "metadata": {},
251 | "source": [
252 | "## `Elemental.jl`\n",
253 | "\n",
254 | "- [Elemental](http://github.com/Elemental/elemental) is a C++ library for distributed dense linear algebra (lately also some sparse and optimization functions)\n",
255 | "- `Elemental.jl` provides Julia wrappers for `Elemental`.\n",
256 | "- Still alpha stage\n",
257 | "- Two APIs\n",
258 | " - Thin layer on top of C++ library\n",
259 | " - Higher level with `DArray` interoperability"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": 30,
265 | "metadata": {
266 | "collapsed": false
267 | },
268 | "outputs": [],
269 | "source": [
270 | "using Elemental"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": 31,
276 | "metadata": {
277 | "collapsed": false
278 | },
279 | "outputs": [],
280 | "source": [
281 | "# Define a 2000x2000 Elemental distributed matrix of Gaussian variates\n",
282 | "@mpi_do man n = 2000\n",
283 | "@mpi_do man A = Elemental.DistMatrix(Float64)\n",
284 | "@mpi_do man Elemental.gaussian!(A, n, n)"
285 | ]
286 | },
287 | {
288 | "cell_type": "code",
289 | "execution_count": 32,
290 | "metadata": {
291 | "collapsed": false
292 | },
293 | "outputs": [
294 | {
295 | "name": "stdout",
296 | "output_type": "stream",
297 | "text": [
298 | "\tFrom worker 2:\tA[1,1] = -1.0290752082028511\n",
299 | "\tFrom worker 4:\tA[1,1] = -1.0290752082028511\n",
300 | "\tFrom worker 3:\tA[1,1] = -1.0290752082028511\n",
301 | "\tFrom worker 6:\tA[1,1] = -1.0290752082028511\n",
302 | "\tFrom worker 5:\tA[1,1] = -1.0290752082028511\n",
303 | "\tFrom worker 9:\tA[1,1] = -1.0290752082028511\n",
304 | "\tFrom worker 7:\tA[1,1] = -1.0290752082028511\n",
305 | "\tFrom worker 8:\tA[1,1] = -1.0290752082028511\n"
306 | ]
307 | }
308 | ],
309 | "source": [
310 | "# Print the first element\n",
311 | "@mpi_do man @show A[1,1]"
312 | ]
313 | },
314 | {
315 | "cell_type": "code",
316 | "execution_count": 33,
317 | "metadata": {
318 | "collapsed": false
319 | },
320 | "outputs": [
321 | {
322 | "name": "stdout",
323 | "output_type": "stream",
324 | "text": [
325 | " 6.251383 seconds (5.46 k allocations: 413.798 KB)\n"
326 | ]
327 | }
328 | ],
329 | "source": [
330 | "# Compute the singular values with Elemental's singular value solver. This computes all values.\n",
331 | "@time @mpi_do man vals = svdvals(A)"
332 | ]
333 | },
334 | {
335 | "cell_type": "code",
336 | "execution_count": 34,
337 | "metadata": {
338 | "collapsed": false
339 | },
340 | "outputs": [
341 | {
342 | "name": "stdout",
343 | "output_type": "stream",
344 | "text": [
345 | "\tFrom worker 6:\tvals[1] = 89.16743025999959\n",
346 | "\tFrom worker 4:\tvals[1] = 89.16743025999959\n",
347 | "\tFrom worker 3:\tvals[1] = 89.16743025999959\n",
348 | "\tFrom worker 2:\tvals[1] = 89.16743025999959\n",
349 | "\tFrom worker 7:\tvals[1] = 89.16743025999959\n",
350 | "\tFrom worker 5:\tvals[1] = 89.16743025999959\n",
351 | "\tFrom worker 8:\tvals[1] = 89.16743025999959\n",
352 | "\tFrom worker 9:\tvals[1] = 89.16743025999959\n"
353 | ]
354 | }
355 | ],
356 | "source": [
357 | "# Show the largest singular value\n",
358 | "@mpi_do man @show vals[1]"
359 | ]
360 | },
361 | {
362 | "cell_type": "code",
363 | "execution_count": 35,
364 | "metadata": {
365 | "collapsed": false
366 | },
367 | "outputs": [
368 | {
369 | "name": "stdout",
370 | "output_type": "stream",
371 | "text": [
372 | "(svdvals(A))[1] = 88.76880541870597\n",
373 | " 5.691730 seconds (90.51 k allocations: 35.761 MB, 0.06% gc time)\n"
374 | ]
375 | }
376 | ],
377 | "source": [
378 | "# Now compare to a local computation with Julia's `svdvals` based on LAPACK\n",
379 | "n = 2000\n",
380 | "A = randn(n, n);\n",
381 | "@time @show svdvals(A)[1];"
382 | ]
383 | },
384 | {
385 | "cell_type": "code",
386 | "execution_count": 38,
387 | "metadata": {
388 | "collapsed": false
389 | },
390 | "outputs": [],
391 | "source": [
392 | "# Now try out my TSVD package on an Elemental array\n",
393 | "\n",
394 | "# Pkg.clone(\"https://github.com/andreasnoack/TSVD.jl\")\n",
395 | "using TSVD"
396 | ]
397 | },
398 | {
399 | "cell_type": "code",
400 | "execution_count": 41,
401 | "metadata": {
402 | "collapsed": false
403 | },
404 | "outputs": [
405 | {
406 | "name": "stdout",
407 | "output_type": "stream",
408 | "text": [
409 | " 1.757477 seconds (5.88 k allocations: 444.171 KB)\n"
410 | ]
411 | }
412 | ],
413 | "source": [
414 | "# PROBLEM: The tsvd function uses a random seed on every worker, which we saw differs between workers.\n",
415 | "# SOLUTION: Use Bcast! (or just set the seed on all workers)\n",
416 | "@time @mpi_do man begin\n",
417 | " v0 = randn(n)\n",
418 | " MPI.Bcast!(v0, 0, MPI.COMM_WORLD)\n",
419 | " \n",
420 | " # compute the SVD\n",
421 | " vals = TSVD.tsvd(A, 10, initVec = v0)\n",
422 | "end"
423 | ]
424 | }
425 | ],
426 | "metadata": {
427 | "kernelspec": {
428 | "display_name": "Julia 0.4.5-pre",
429 | "language": "julia",
430 | "name": "julia-0.4"
431 | },
432 | "language_info": {
433 | "file_extension": ".jl",
434 | "mimetype": "application/julia",
435 | "name": "julia",
436 | "version": "0.4.5"
437 | }
438 | },
439 | "nbformat": 4,
440 | "nbformat_minor": 0
441 | }
442 |
--------------------------------------------------------------------------------
/src/blackscholes.jl:
--------------------------------------------------------------------------------
1 | # Multi-threading in Julia
2 | # Multi-threading will be native to Julia in the next release (v0.5.x). It is currently under active development.
3 | # This tutorial gives you a quick taste of how to use multi-threading in Julia.
4 |
5 | # First load the threading code that's in the Threads module.
6 |
7 | using Base.Threads
8 |
9 | @inline function cndf2(in::Array{Float64,1})
10 | out = 0.5 .+ 0.5 .* erf(0.707106781 .* in)
11 | return out
12 | end
13 |
14 | function blackscholes_serial(sptprice::Float64,
15 | strike::Array{Float64,1},
16 | rate::Float64,
17 | volatility::Float64,
18 | time::Float64)
19 | logterm = log10(sptprice ./ strike)
20 | powterm = .5 .* volatility .* volatility
21 | den = volatility .* sqrt(time)
22 | d1 = (((rate .+ powterm) .* time) .+ logterm) ./ den
23 | d2 = d1 .- den
24 | NofXd1 = cndf2(d1)
25 | NofXd2 = cndf2(d2)
26 | futureValue = strike .* exp(- rate .* time)
27 | c1 = futureValue .* NofXd2
28 | call = sptprice .* NofXd1 .- c1
29 | put = call .- futureValue .+ sptprice
30 | end
31 |
32 | # This next parallel rewrite does two simple things:
33 | # First, it devectorizes the computation. We now loop over the array elements explicitly instead of operating on whole arrays at once, as the vectorized serial version above does.
34 |
35 | function blackscholes_devec(sptprice::Float64,
36 | strike::Vector{Float64},
37 | rate::Float64,
38 | volatility::Float64,
39 | time::Float64)
40 | sqt = sqrt(time)
41 | put = similar(strike)
42 | for i = 1:size(strike, 1)
43 | logterm = log10(sptprice / strike[i])
44 | powterm = 0.5 * volatility * volatility
45 | den = volatility * sqt
46 | d1 = (((rate + powterm) * time) + logterm) / den
47 | d2 = d1 - den
48 | NofXd1 = 0.5 + 0.5 * erf(0.707106781 * d1)
49 | NofXd2 = 0.5 + 0.5 * erf(0.707106781 * d2)
50 | futureValue = strike[i] * exp(-rate * time)
51 | c1 = futureValue * NofXd2
52 | call_ = sptprice * NofXd1 - c1
53 | put[i] = call_ - futureValue + sptprice
54 | end
55 | put
56 | end
57 |
58 | # Second, the `@threads` macro is affixed in front of the `for` loop to tell Julia that this is a multi-threaded block.
59 |
60 | function blackscholes_parallel(sptprice::Float64,
61 | strike::Vector{Float64},
62 | rate::Float64,
63 | volatility::Float64,
64 | time::Float64)
65 | sqt = sqrt(time)
66 | put = similar(strike)
67 | @threads for i = 1:size(strike, 1)
68 | logterm = log10(sptprice / strike[i])
69 | powterm = 0.5 * volatility * volatility
70 | den = volatility * sqt
71 | d1 = (((rate + powterm) * time) + logterm) / den
72 | d2 = d1 - den
73 | NofXd1 = 0.5 + 0.5 * erf(0.707106781 * d1)
74 | NofXd2 = 0.5 + 0.5 * erf(0.707106781 * d2)
75 | futureValue = strike[i] * exp(-rate * time)
76 | c1 = futureValue * NofXd2
77 | call_ = sptprice * NofXd1 - c1
78 | put[i] = call_ - futureValue + sptprice
79 | end
80 | put
81 | end
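
# A minimal sketch, not part of the original script: the `@threads` loop above only
# runs in parallel if Julia was started with more than one thread. We assume here that
# the thread count is set through the JULIA_NUM_THREADS environment variable, e.g.
#     JULIA_NUM_THREADS=4 julia src/blackscholes.jl
# With a single thread, the parallel timing is expected to be close to the devec timing.
println("Number of threads = ", Threads.nthreads())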
82 |
83 |
84 | function run(iterations)
85 | sptprice = 42.0
86 | initStrike = Float64[ 40.0 + (i / iterations) for i = 1:iterations ]
87 | rate = 0.5
88 | volatility = 0.2
89 | time = 0.5
90 |
91 | tic()
92 | put1 = blackscholes_serial(sptprice, initStrike, rate, volatility, time)
93 | t1 = toq()
94 | println("Serial checksum: ", sum(put1))
95 | tic()
96 | put2 = blackscholes_devec(sptprice, initStrike, rate, volatility, time)
97 | t2 = toq()
98 | println("Devec checksum: ", sum(put2))
99 | tic()
100 | put3 = blackscholes_parallel(sptprice, initStrike, rate, volatility, time)
101 | t3 = toq()
102 | println("Parallel checksum: ", sum(put3))
103 | return t1, t2, t3
104 | end
105 |
106 | function driver()
107 | srand(0)
108 | tic()
109 | iterations = 10^6
110 | blackscholes_serial(0., Float64[], 0., 0., 0.)
111 | blackscholes_devec(0., Float64[], 0., 0., 0.)
112 | blackscholes_parallel(0., Float64[], 0., 0., 0.)
113 | println("SELFPRIMED ", toq())
114 | tserial, tdevec, tparallel = run(iterations)
115 | println("Time taken for serial = $tserial")
116 | println("Time taken for devec = $tdevec")
117 | println("Time taken for parallel = $tparallel")
118 | println("Serial rate = ", iterations / tserial, " opts/sec")
119 | println("Devec rate = ", iterations / tdevec, " opts/sec")
120 | println("Parallel rate = ", iterations / tparallel, " opts/sec")
121 | end
122 | driver()
123 |
124 |
--------------------------------------------------------------------------------