├── ocaml2020-workshop-parallel ├── .gitignore ├── examples │ ├── dune-project │ ├── README.md │ ├── dune │ ├── .merlin │ ├── nbody_domain.ml │ ├── nbody_serial.ml │ ├── nbody_task.ml │ ├── nbody_task_write_optim.ml │ └── nbody_task_triangle.ml ├── slides.pdf ├── multicore-ocaml20.pdf ├── slides-with-speaker-notes.pdf └── README.md ├── ocaml2023-eio ├── .gitignore ├── arch.pdf ├── eio.pdf ├── parsing.png ├── slides.pdf ├── lock-free.png ├── lock-free.csv ├── Dockerfile ├── Makefile ├── eio.tex └── slides.tex ├── ocaml2021-workshop-effects ├── .gitignore ├── output.ods ├── trace.png ├── rps-graph.png ├── Dockerfile ├── Makefile ├── eio.tex └── slides.tex └── wasm-wg2022-stack-switching ├── .gitignore ├── output.ods ├── slides.pdf ├── trace.png ├── uring.png ├── rps-graph.png ├── Dockerfile ├── Makefile ├── eio.tex └── slides.tex /ocaml2020-workshop-parallel/.gitignore: -------------------------------------------------------------------------------- 1 | examples/_build 2 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/dune-project: -------------------------------------------------------------------------------- 1 | (lang dune 2.5) 2 | -------------------------------------------------------------------------------- /ocaml2023-eio/.gitignore: -------------------------------------------------------------------------------- 1 | *.aux 2 | *.log 3 | *.out 4 | *.nav 5 | *.snm 6 | *.toc 7 | *.vrb 8 | -------------------------------------------------------------------------------- /ocaml2023-eio/arch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2023-eio/arch.pdf -------------------------------------------------------------------------------- /ocaml2023-eio/eio.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2023-eio/eio.pdf -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/.gitignore: -------------------------------------------------------------------------------- 1 | *.aux 2 | *.log 3 | *.out 4 | *.pdf 5 | *.nav 6 | *.snm 7 | *.toc 8 | *.vrb 9 | -------------------------------------------------------------------------------- /ocaml2023-eio/parsing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2023-eio/parsing.png -------------------------------------------------------------------------------- /ocaml2023-eio/slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2023-eio/slides.pdf -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/.gitignore: -------------------------------------------------------------------------------- 1 | *.aux 2 | *.log 3 | *.out 4 | *.pdf 5 | *.nav 6 | *.snm 7 | *.toc 8 | *.vrb 9 | -------------------------------------------------------------------------------- /ocaml2023-eio/lock-free.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2023-eio/lock-free.png -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2020-workshop-parallel/slides.pdf -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/output.ods: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2021-workshop-effects/output.ods -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/trace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2021-workshop-effects/trace.png -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/output.ods: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/wasm-wg2022-stack-switching/output.ods -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/wasm-wg2022-stack-switching/slides.pdf -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/trace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/wasm-wg2022-stack-switching/trace.png -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/uring.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/wasm-wg2022-stack-switching/uring.png -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/rps-graph.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2021-workshop-effects/rps-graph.png -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/rps-graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/wasm-wg2022-stack-switching/rps-graph.png -------------------------------------------------------------------------------- /ocaml2023-eio/lock-free.csv: -------------------------------------------------------------------------------- 1 | n_send_domains, mutex, lock-free 2 | 0, 278.1, 323.00 3 | 1, 1667.9, 532.41 4 | 2, 1218.16, 352.23 5 | 4, 1275.13, 301.99 6 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/multicore-ocaml20.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2020-workshop-parallel/multicore-ocaml20.pdf -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/slides-with-speaker-notes.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ocaml-multicore/multicore-talks/HEAD/ocaml2020-workshop-parallel/slides-with-speaker-notes.pdf -------------------------------------------------------------------------------- /ocaml2023-eio/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM debian:10 2 | RUN apt-get update && apt-get install -y texlive-latex-base texlive-latex-recommended texlive-pictures make --no-install-recommends 3 | -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM 
debian:10 2 | RUN apt-get update && apt-get install -y texlive-latex-base texlive-latex-recommended texlive-pictures make --no-install-recommends 3 | -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM debian:10 2 | RUN apt-get update && apt-get install -y texlive-latex-base texlive-latex-recommended texlive-pictures make --no-install-recommends 3 | -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/Makefile: -------------------------------------------------------------------------------- 1 | all: eio.pdf slides.pdf 2 | 3 | %.pdf: %.tex 4 | pdflatex -halt-on-error $< 5 | 6 | docker: 7 | docker build -t latex . 8 | docker run --rm -v "${PWD}:/mnt" latex make -C /mnt 9 | -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/Makefile: -------------------------------------------------------------------------------- 1 | all: eio.pdf slides.pdf 2 | 3 | %.pdf: %.tex 4 | pdflatex -halt-on-error $< 5 | 6 | docker: 7 | docker build -t latex . 8 | docker run --rm -v "${PWD}:/mnt" latex make -C /mnt 9 | -------------------------------------------------------------------------------- /ocaml2023-eio/Makefile: -------------------------------------------------------------------------------- 1 | all: eio.pdf slides.pdf 2 | 3 | %.pdf: %.tex 4 | pdflatex -halt-on-error $< 5 | 6 | docker: 7 | docker build -t latex . 8 | docker run --rm -v "${PWD}:/mnt" latex make -C /mnt 9 | 10 | #%.svg: %.txt 11 | # goat -i $< -o $@ 12 | 13 | slides.pdf: arch.pdf 14 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/README.md: -------------------------------------------------------------------------------- 1 | These are the examples used in the talk. 
2 | 3 | To set up Multicore OCaml, Dune and Domainslib, use the instructions from the [multicore-opam repo](https://github.com/ocaml-multicore/multicore-opam), as these are kept in sync. 4 | 5 | Building these examples should then just be a matter of running: 6 | 7 | ``` 8 | dune build 9 | ``` 10 | 11 | which will put executables into the _build/default directory. 12 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/dune: -------------------------------------------------------------------------------- 1 | (executable 2 | (name nbody_serial) 3 | (modules nbody_serial) 4 | (libraries unix) 5 | ) 6 | 7 | (executable 8 | (name nbody_domain) 9 | (modules nbody_domain) 10 | (libraries unix) 11 | ) 12 | 13 | (executable 14 | (name nbody_task) 15 | (modules nbody_task) 16 | (libraries unix domainslib) 17 | ) 18 | 19 | (executable 20 | (name nbody_task_write_optim) 21 | (modules nbody_task_write_optim) 22 | (libraries unix domainslib) 23 | ) 24 | 25 | (executable 26 | (name nbody_task_triangle) 27 | (modules nbody_task_triangle) 28 | (libraries unix domainslib) 29 | ) -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/.merlin: -------------------------------------------------------------------------------- 1 | EXCLUDE_QUERY_DIR 2 | B /home/sadiq/.opam/4.10.0+multicore/lib/domainslib 3 | B /home/sadiq/.opam/4.10.0+multicore/lib/ocaml 4 | B _build/default/.nbody_domain.eobjs/byte 5 | B _build/default/.nbody_serial.eobjs/byte 6 | B _build/default/.nbody_task.eobjs/byte 7 | B _build/default/.nbody_task_triangle.eobjs/byte 8 | B _build/default/.nbody_task_write_optim.eobjs/byte 9 | S /home/sadiq/.opam/4.10.0+multicore/lib/domainslib 10 | S /home/sadiq/.opam/4.10.0+multicore/lib/ocaml 11 | S . 
12 | FLG -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs 13 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/README.md: -------------------------------------------------------------------------------- 1 | # Parallelising your OCaml Code with Multicore OCaml 2 | 3 | This repository contains the [abstract](multicore-ocaml20.pdf), [slides](slides.pdf), [speaker notes](slides-with-speaker-notes.pdf), [runnable examples](examples) and other resources for the OCaml Workshop 2020 talk "Parallelising your OCaml Code with Multicore OCaml". The talk video is available on [YouTube](https://www.youtube.com/watch?v=Z7YZR1q8wzI). 4 | 5 | ## Other resources 6 | 7 | [Parallel Programming in Multicore OCaml](https://github.com/ocaml-multicore/parallel-programming-in-multicore-ocaml) is an in-progress book chapter that covers some of the same material as this talk but does so in more depth. 8 | 9 | [Multicore OCaml's opam repo](https://github.com/ocaml-multicore/multicore-opam) has instructions for installing Multicore OCaml along with domainslib. 10 | 11 | The [OCaml Multicore Github Wiki](https://github.com/ocaml-multicore/ocaml-multicore/wiki) contains links to many more articles on Multicore OCaml as well as notes on the memory model, garbage collector and other lower-level topics. Note: some of the older articles and wiki pages may not describe the current state of multicore. 
12 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/nbody_domain.ml: -------------------------------------------------------------------------------- 1 | type planet = { mutable x : float; mutable y : float; mutable z : float; 2 | mutable vx: float; mutable vy: float; mutable vz: float; 3 | mass : float } 4 | 5 | let advance bodies n_bodies dt = 6 | let ds = 7 | Array.init n_bodies (fun i -> Domain.spawn (fun _ -> 8 | let b = bodies.(i) in 9 | for j = 0 to n_bodies - 1 do 10 | let b' = bodies.(j) in 11 | if (i!=j) then begin 12 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in 13 | let dist2 = dx *. dx +. dy *. dy +. dz *. dz in 14 | let mag = dt /. (dist2 *. sqrt(dist2)) in 15 | b.vx <- b.vx -. dx *. b'.mass *. mag; 16 | b.vy <- b.vy -. dy *. b'.mass *. mag; 17 | b.vz <- b.vz -. dz *. b'.mass *. mag; 18 | end 19 | done 20 | )) in 21 | Array.iter (Domain.join) ds; (* barrier *) 22 | for i = 0 to n_bodies - 1 do 23 | let b = bodies.(i) in 24 | b.x <- b.x +. dt *. b.vx; 25 | b.y <- b.y +. dt *. b.vy; 26 | b.z <- b.z +. dt *. b.vz; 27 | done 28 | 29 | let energy bodies = 30 | let e = ref 0. in 31 | for i = 0 to Array.length bodies - 1 do 32 | let b = bodies.(i) in 33 | e := !e +. 0.5 *. b.mass *. (b.vx *. b.vx +. b.vy *. b.vy +. b.vz *. b.vz); 34 | for j = i+1 to Array.length bodies - 1 do 35 | let b' = bodies.(j) in 36 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in 37 | let distance = sqrt(dx *. dx +. dy *. dy +. dz *. dz) in 38 | e := !e -. (b.mass *. b'.mass) /. distance 39 | done 40 | done; 41 | !e 42 | 43 | let pi = 3.141592653589793 44 | let solar_mass = 4. *. pi *. pi 45 | let days_per_year = 365.24 46 | 47 | let offset_momentum bodies = 48 | let px = ref 0. and py = ref 0. and pz = ref 0. in 49 | for i = 0 to Array.length bodies - 1 do 50 | px := !px +. bodies.(i).vx *. bodies.(i).mass; 51 | py := !py +. bodies.(i).vy *. 
bodies.(i).mass; 52 | pz := !pz +. bodies.(i).vz *. bodies.(i).mass; 53 | done; 54 | bodies.(0).vx <- -. !px /. solar_mass; 55 | bodies.(0).vy <- -. !py /. solar_mass; 56 | bodies.(0).vz <- -. !pz /. solar_mass 57 | 58 | let initialize_bodies num_bodies = 59 | Array.init num_bodies (fun _ -> 60 | { x = (Random.float 10.); 61 | y = (Random.float 10.); 62 | z = (Random.float 10.); 63 | vx= (Random.float 5.) *. days_per_year; 64 | vy= (Random.float 4.) *. days_per_year; 65 | vz= (Random.float 5.) *. days_per_year; 66 | mass=(Random.float 10.) *. solar_mass; }) 67 | 68 | let () = 69 | let n = int_of_string(Sys.argv.(1)) in 70 | let num_bodies = int_of_string(Sys.argv.(2)) in 71 | let bodies = initialize_bodies num_bodies in 72 | offset_momentum bodies; 73 | Printf.printf "%.9f\n" (energy bodies); 74 | for _i = 1 to n do advance bodies num_bodies 0.01 done; 75 | Printf.printf "%.9f\n" (energy bodies) 76 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/nbody_serial.ml: -------------------------------------------------------------------------------- 1 | type planet = { mutable x : float; mutable y : float; mutable z : float; 2 | mutable vx: float; mutable vy: float; mutable vz: float; 3 | mass : float } 4 | 5 | let advance bodies n_bodies dt = 6 | for i = 0 to n_bodies - 1 do 7 | let b = bodies.(i) in 8 | for j = i+1 to n_bodies - 1 do 9 | let b' = bodies.(j) in 10 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in 11 | let dist2 = dx *. dx +. dy *. dy +. dz *. dz in 12 | let mag = dt /. (dist2 *. sqrt(dist2)) in 13 | 14 | b.vx <- b.vx -. dx *. b'.mass *. mag; 15 | b.vy <- b.vy -. dy *. b'.mass *. mag; 16 | b.vz <- b.vz -. dz *. b'.mass *. mag; 17 | 18 | b'.vx <- b'.vx +. dx *. b.mass *. mag; 19 | b'.vy <- b'.vy +. dy *. b.mass *. mag; 20 | b'.vz <- b'.vz +. dz *. b.mass *. 
mag; 21 | done 22 | done 23 | 24 | let update bodies dt = 25 | let n_bodies = Array.length bodies in 26 | for i = 0 to n_bodies - 1 do 27 | let b = bodies.(i) in 28 | b.x <- b.x +. dt *. b.vx; 29 | b.y <- b.y +. dt *. b.vy; 30 | b.z <- b.z +. dt *. b.vz; 31 | done 32 | 33 | let energy bodies = 34 | let e = ref 0. in 35 | for i = 0 to Array.length bodies - 1 do 36 | let b = bodies.(i) in 37 | e := !e +. 0.5 *. b.mass *. (b.vx *. b.vx +. b.vy *. b.vy +. b.vz *. b.vz); 38 | for j = i+1 to Array.length bodies - 1 do 39 | let b' = bodies.(j) in 40 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in 41 | let distance = sqrt(dx *. dx +. dy *. dy +. dz *. dz) in 42 | e := !e -. (b.mass *. b'.mass) /. distance 43 | done 44 | done; 45 | !e 46 | 47 | let pi = 3.141592653589793 48 | let solar_mass = 4. *. pi *. pi 49 | let days_per_year = 365.24 50 | 51 | let offset_momentum bodies = 52 | let px = ref 0. and py = ref 0. and pz = ref 0. in 53 | for i = 0 to Array.length bodies - 1 do 54 | px := !px +. bodies.(i).vx *. bodies.(i).mass; 55 | py := !py +. bodies.(i).vy *. bodies.(i).mass; 56 | pz := !pz +. bodies.(i).vz *. bodies.(i).mass; 57 | done; 58 | bodies.(0).vx <- -. !px /. solar_mass; 59 | bodies.(0).vy <- -. !py /. solar_mass; 60 | bodies.(0).vz <- -. !pz /. solar_mass 61 | 62 | let initialize_bodies num_bodies = 63 | Array.init num_bodies (fun _ -> 64 | { x = (Random.float 10.); 65 | y = (Random.float 10.); 66 | z = (Random.float 10.); 67 | vx= (Random.float 5.) *. days_per_year; 68 | vy= (Random.float 4.) *. days_per_year; 69 | vz= (Random.float 5.) *. days_per_year; 70 | mass=(Random.float 10.) *. 
solar_mass; }) 71 | 72 | let () = 73 | let n = int_of_string(Sys.argv.(1)) in 74 | let num_bodies = int_of_string(Sys.argv.(2)) in 75 | let bodies = initialize_bodies num_bodies in 76 | let dt = 0.01 in 77 | offset_momentum bodies; 78 | Printf.printf "%.9f\n" (energy bodies); 79 | for _i = 1 to n do 80 | advance bodies num_bodies dt; 81 | update bodies dt; 82 | done; 83 | Printf.printf "%.9f\n" (energy bodies) 84 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/nbody_task.ml: -------------------------------------------------------------------------------- 1 | 2 | module T = Domainslib.Task 3 | 4 | type planet = { mutable x : float; mutable y : float; mutable z : float; 5 | mutable vx: float; mutable vy: float; mutable vz: float; 6 | mass : float } 7 | 8 | let advance pool n_bodies n_domains bodies dt = 9 | T.parallel_for pool 10 | ~chunk_size:(n_bodies/n_domains) 11 | ~start:0 12 | ~finish:(n_bodies - 1) 13 | ~body:(fun i -> 14 | let b = bodies.(i) in 15 | for j = 0 to n_bodies - 1 do 16 | let b' = bodies.(j) in 17 | if (i!=j) then begin 18 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in 19 | let dist2 = dx *. dx +. dy *. dy +. dz *. dz in 20 | let mag = dt /. (dist2 *. sqrt(dist2)) in 21 | b.vx <- b.vx -. dx *. b'.mass *. mag; 22 | b.vy <- b.vy -. dy *. b'.mass *. mag; 23 | b.vz <- b.vz -. dz *. b'.mass *. mag; 24 | end 25 | done 26 | ); 27 | for i = 0 to Array.length bodies - 1 do 28 | let b = bodies.(i) in 29 | b.x <- b.x +. dt *. b.vx; 30 | b.y <- b.y +. dt *. b.vy; 31 | b.z <- b.z +. dt *. b.vz; 32 | done 33 | 34 | let energy bodies = 35 | let e = ref 0. in 36 | for i = 0 to Array.length bodies - 1 do 37 | let b = bodies.(i) in 38 | e := !e +. 0.5 *. b.mass *. (b.vx *. b.vx +. b.vy *. b.vy +. b.vz *. b.vz); 39 | for j = i+1 to Array.length bodies - 1 do 40 | let b' = bodies.(j) in 41 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. 
b'.z in 42 | let distance = sqrt(dx *. dx +. dy *. dy +. dz *. dz) in 43 | e := !e -. (b.mass *. b'.mass) /. distance 44 | done 45 | done; 46 | !e 47 | 48 | let pi = 3.141592653589793 49 | let solar_mass = 4. *. pi *. pi 50 | let days_per_year = 365.24 51 | 52 | let offset_momentum bodies = 53 | let px = ref 0. and py = ref 0. and pz = ref 0. in 54 | for i = 0 to Array.length bodies - 1 do 55 | px := !px +. bodies.(i).vx *. bodies.(i).mass; 56 | py := !py +. bodies.(i).vy *. bodies.(i).mass; 57 | pz := !pz +. bodies.(i).vz *. bodies.(i).mass; 58 | done; 59 | bodies.(0).vx <- -. !px /. solar_mass; 60 | bodies.(0).vy <- -. !py /. solar_mass; 61 | bodies.(0).vz <- -. !pz /. solar_mass 62 | 63 | let initialize_bodies num_bodies = 64 | Array.init num_bodies (fun _ -> 65 | { x = (Random.float 10.); 66 | y = (Random.float 10.); 67 | z = (Random.float 10.); 68 | vx= (Random.float 5.) *. days_per_year; 69 | vy= (Random.float 4.) *. days_per_year; 70 | vz= (Random.float 5.) *. days_per_year; 71 | mass=(Random.float 10.) *. 
solar_mass; }) 72 | 73 | let () = 74 | let n = int_of_string(Sys.argv.(1)) in 75 | let n_bodies = int_of_string(Sys.argv.(2)) in 76 | let n_domains = int_of_string(Sys.argv.(3)) in 77 | let pool = T.setup_pool ~num_domains:(n_domains - 1) in 78 | let bodies = initialize_bodies n_bodies in 79 | offset_momentum bodies; 80 | Printf.printf "%.9f\n" (energy bodies); 81 | for _i = 1 to n do advance pool n_bodies n_domains bodies 0.01 done; 82 | Printf.printf "%.9f\n" (energy bodies); 83 | T.teardown_pool pool 84 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/nbody_task_write_optim.ml: -------------------------------------------------------------------------------- 1 | 2 | module T = Domainslib.Task 3 | 4 | type planet_pos = { mutable x : float; mutable y : float; mutable z : float; mass : float; } 5 | 6 | type planet_vec = { mutable vx: float; mutable vy: float; mutable vz: float; } 7 | 8 | let advance pool n_domains n_bodies bodies_pos bodies_vec dt = 9 | T.parallel_for pool 10 | ~chunk_size:(n_bodies/n_domains) 11 | ~start:0 12 | ~finish:(n_bodies - 1) 13 | ~body:(fun i -> 14 | let bp = bodies_pos.(i) in 15 | let bv = bodies_vec.(i) in 16 | let vx, vy, vz = ref bv.vx, ref bv.vy, ref bv.vz in 17 | for j = 0 to n_bodies - 1 do 18 | let bp' = bodies_pos.(j) in 19 | if (i!=j) then begin 20 | let dx = bp.x -. bp'.x and dy = bp.y -. bp'.y and dz = bp.z -. bp'.z in 21 | let dist2 = dx *. dx +. dy *. dy +. dz *. dz in 22 | let mag = dt /. (dist2 *. sqrt(dist2)) in 23 | let mass = bp'.mass in 24 | vx := !vx -. dx *. mass *. mag; 25 | vy := !vy -. dy *. mass *. mag; 26 | vz := !vz -. dz *. mass *. mag; 27 | end 28 | done; 29 | bv.vx <- !vx; 30 | bv.vy <- !vy; 31 | bv.vz <- !vz); 32 | for i = 0 to n_bodies - 1 do 33 | let bp = bodies_pos.(i) in 34 | let bv = bodies_vec.(i) in 35 | bp.x <- bp.x +. dt *. bv.vx; 36 | bp.y <- bp.y +. dt *. bv.vy; 37 | bp.z <- bp.z +. dt *. 
bv.vz; 38 | done 39 | 40 | let energy bodies_pos bodies_vec = 41 | let e = ref 0. in 42 | for i = 0 to Array.length bodies_pos - 1 do 43 | let bp = bodies_pos.(i) in 44 | let bv = bodies_vec.(i) in 45 | e := !e +. 0.5 *. bp.mass *. (bv.vx *. bv.vx +. bv.vy *. bv.vy +. bv.vz *. bv.vz); 46 | for j = i+1 to Array.length bodies_pos - 1 do 47 | let bp' = bodies_pos.(j) in 48 | let dx = bp.x -. bp'.x and dy = bp.y -. bp'.y and dz = bp.z -. bp'.z in 49 | let distance = sqrt(dx *. dx +. dy *. dy +. dz *. dz) in 50 | e := !e -. (bp.mass *. bp'.mass) /. distance 51 | done 52 | done; 53 | !e 54 | 55 | let pi = 3.141592653589793 56 | let solar_mass = 4. *. pi *. pi 57 | let days_per_year = 365.24 58 | 59 | let offset_momentum bodies_pos bodies_vec = 60 | let px = ref 0. and py = ref 0. and pz = ref 0. in 61 | for i = 0 to Array.length bodies_pos - 1 do 62 | px := !px +. bodies_vec.(i).vx *. bodies_pos.(i).mass; 63 | py := !py +. bodies_vec.(i).vy *. bodies_pos.(i).mass; 64 | pz := !pz +. bodies_vec.(i).vz *. bodies_pos.(i).mass; 65 | done; 66 | bodies_vec.(0).vx <- -. !px /. solar_mass; 67 | bodies_vec.(0).vy <- -. !py /. solar_mass; 68 | bodies_vec.(0).vz <- -. !pz /. solar_mass 69 | 70 | let initialize_bodies num_bodies = 71 | (Array.init num_bodies (fun _ -> 72 | { x = (Random.float 10.); 73 | y = (Random.float 10.); 74 | z = (Random.float 10.); 75 | mass=(Random.float 10.) *. solar_mass; }), 76 | Array.init num_bodies (fun _ -> 77 | { 78 | vx= (Random.float 5.) *. days_per_year; 79 | vy= (Random.float 4.) *. days_per_year; 80 | vz= (Random.float 5.) *. 
days_per_year; 81 | })) 82 | 83 | let () = 84 | let n = int_of_string(Sys.argv.(1)) in 85 | let n_bodies = int_of_string(Sys.argv.(2)) in 86 | let n_domains = int_of_string(Sys.argv.(3)) in 87 | let pool = T.setup_pool ~num_domains:(n_domains - 1) in 88 | let bodies_pos, bodies_vec = initialize_bodies n_bodies in 89 | offset_momentum bodies_pos bodies_vec; 90 | Printf.printf "%.9f\n" (energy bodies_pos bodies_vec); 91 | for _i = 1 to n do advance pool n_domains n_bodies bodies_pos bodies_vec 0.01 done; 92 | Printf.printf "%.9f\n" (energy bodies_pos bodies_vec); 93 | T.teardown_pool pool 94 | -------------------------------------------------------------------------------- /ocaml2020-workshop-parallel/examples/nbody_task_triangle.ml: -------------------------------------------------------------------------------- 1 | 2 | module T = Domainslib.Task 3 | 4 | type planet = { mutable x : float; mutable y : float; mutable z : float; 5 | mutable vx: float; mutable vy: float; mutable vz: float; 6 | mass : float } 7 | 8 | (* Compact index into flat array (of size) for 9 | matrix of upper triangle (including diagonal) 10 | 0th row contains n elements, 11 | 1st row contains n-1 elements, 12 | kth row contains n-k elements 13 | *) 14 | let index n i j = (i * ( (2*n) - 1 - i) / 2) + j 15 | 16 | let initialize_velocities n_bodies = 17 | Array.init (index n_bodies n_bodies n_bodies) (fun _ -> {x=0.; y=0.; z=0.; vx=0.; vy=0.; vz=0.; mass=0.}) 18 | 19 | let advance pool bodies velocities dt = 20 | let n_bodies = Array.length bodies in 21 | (* calculate velocity increments *) 22 | T.parallel_for pool 23 | ~chunk_size:1 24 | ~start:0 25 | ~finish:(n_bodies - 1) 26 | ~body:(fun i -> 27 | let b = bodies.(i) in 28 | let vx = ref 0. and vy = ref 0. and vz = ref 0. in 29 | for j = i+1 to n_bodies - 1 do 30 | let b' = bodies.(j) in 31 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in 32 | let dist2 = dx *. dx +. dy *. dy +. dz *. dz in 33 | let mag = dt /. (dist2 *. 
sqrt(dist2)) in 34 | 35 | vx := !vx -. dx *. b'.mass *. mag; 36 | vy := !vy -. dy *. b'.mass *. mag; 37 | vz := !vz -. dz *. b'.mass *. mag; 38 | 39 | let vb' = velocities.(index n_bodies i j) in 40 | vb'.vx <- dx *. b.mass *. mag; 41 | vb'.vy <- dy *. b.mass *. mag; 42 | vb'.vz <- dz *. b.mass *. mag 43 | done; 44 | let vb = velocities.(index n_bodies i i) in 45 | vb.vx <- !vx; vb.vy <- !vy; vb.vz <- !vz 46 | ); 47 | let k = ref 0 in 48 | (* accumulate velocities *) 49 | for i = 0 to n_bodies - 1 do 50 | for j = i to n_bodies - 1 do 51 | let b = bodies.(j) and v = velocities.(!k) in 52 | b.vx <- b.vx +. v.vx; 53 | b.vy <- b.vy +. v.vy; 54 | b.vz <- b.vz +. v.vz; 55 | incr k 56 | done 57 | done; 58 | (* advance positions *) 59 | for i = 0 to n_bodies - 1 do 60 | let b = bodies.(i) in 61 | b.x <- b.x +. dt *. b.vx; 62 | b.y <- b.y +. dt *. b.vy; 63 | b.z <- b.z +. dt *. b.vz; 64 | done 65 | 66 | let energy bodies = 67 | let e = ref 0. in 68 | for i = 0 to Array.length bodies - 1 do 69 | let b = bodies.(i) in 70 | e := !e +. 0.5 *. b.mass *. (b.vx *. b.vx +. b.vy *. b.vy +. b.vz *. b.vz); 71 | for j = i+1 to Array.length bodies - 1 do 72 | let b' = bodies.(j) in 73 | let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in 74 | let distance = sqrt(dx *. dx +. dy *. dy +. dz *. dz) in 75 | e := !e -. (b.mass *. b'.mass) /. distance 76 | done 77 | done; 78 | !e 79 | 80 | let pi = 3.141592653589793 81 | let solar_mass = 4. *. pi *. pi 82 | let days_per_year = 365.24 83 | 84 | let offset_momentum bodies = 85 | let px = ref 0. and py = ref 0. and pz = ref 0. in 86 | for i = 0 to Array.length bodies - 1 do 87 | px := !px +. bodies.(i).vx *. bodies.(i).mass; 88 | py := !py +. bodies.(i).vy *. bodies.(i).mass; 89 | pz := !pz +. bodies.(i).vz *. bodies.(i).mass; 90 | done; 91 | bodies.(0).vx <- -. !px /. solar_mass; 92 | bodies.(0).vy <- -. !py /. solar_mass; 93 | bodies.(0).vz <- -. !pz /. 
solar_mass 94 | 95 | let initialize_bodies n_bodies = 96 | Array.init n_bodies (fun _ -> 97 | { x = (Random.float 10.); 98 | y = (Random.float 10.); 99 | z = (Random.float 10.); 100 | vx= (Random.float 5.) *. days_per_year; 101 | vy= (Random.float 4.) *. days_per_year; 102 | vz= (Random.float 5.) *. days_per_year; 103 | mass=(Random.float 10.) *. solar_mass; }) 104 | 105 | let () = 106 | let n = int_of_string(Sys.argv.(1)) in 107 | let n_bodies = int_of_string(Sys.argv.(2)) in 108 | let n_domains = int_of_string(Sys.argv.(3)) in 109 | let pool = T.setup_pool ~num_domains:(n_domains - 1) in 110 | let bodies = initialize_bodies n_bodies in 111 | offset_momentum bodies; 112 | Printf.printf "%.9f\n" (energy bodies); 113 | let velocities = initialize_velocities n_bodies in 114 | for _i = 1 to n do advance pool bodies velocities 0.01 done; 115 | Printf.printf "%.9f\n" (energy bodies); 116 | T.teardown_pool pool 117 | -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/eio.tex: -------------------------------------------------------------------------------- 1 | \documentclass[a4paper,twocolumn]{article} 2 | \usepackage[colorlinks=true]{hyperref} 3 | \usepackage{graphicx} 4 | % From libnqsbtls.tex 5 | \usepackage{xcolor,listings} 6 | 7 | \newcommand\inputml[1]{\lstinputlisting[language={[Objective]Caml}]{#1}} 8 | 9 | \lstdefinelanguage{OCaml}{ 10 | keywords={ 11 | and,as,assert,asr,begin,class,constraint,do,done,downto,effect,else,end,exception, 12 | external,false,for,fun,function,functor,if,implicit,in,include,inherit,initializer, 13 | land,lazy,let,lor,lsl,lsr,lxor,macro,match,method,mod,module,mutable,new,object, 14 | of,open,or,private,rec,sig,struct,then,to,true,try,type,val,virtual,when, 15 | with,while}, 16 | comment=[s]{(*\ }{\ *)}, 17 | } 18 | 19 | \definecolor{darkgreen}{rgb}{0,0.2,0} 20 | \definecolor{darkblue}{rgb}{0.1,0.1,0.8} 21 | \definecolor{darkbrown}{rgb}{0.5,0.3,0.0} 22 | 
\definecolor{grey}{rgb}{0.5,0.5,0.5} 23 | \definecolor{darkgrey}{rgb}{0.2,0.2,0.2} 24 | 25 | \lstdefinestyle{ocaml}{ 26 | basicstyle=\ttfamily, % \small 27 | basewidth=0.5em, 28 | commentstyle=\color{darkgreen}, 29 | escapeinside={(**}{)}, 30 | keywordstyle=\color{darkblue}, 31 | language=OCaml, 32 | morekeywords={macro}, 33 | stringstyle=\color{blue}, 34 | showstringspaces=false, 35 | mathescape=true, 36 | moredelim=**[is][]{?}{?}, 37 | moredelim=**[is][]{&}{&}, 38 | } 39 | 40 | \lstset{literate=% 41 | {->}{{$\to$}}2 42 | {...}{{$\ldots$}}2 43 | } 44 | 45 | \begin{document} 46 | 47 | \title{Experiences with Effects} 48 | \author{Thomas Leonard\and 49 | Craig Ferguson\and 50 | Patrick Ferris\and 51 | Sadiq Jaffer\and 52 | Tom Kelly\and 53 | KC Sivaramakrishnan\and 54 | Anil Madhavapeddy} 55 | \maketitle 56 | 57 | \begin{abstract} 58 | The multicore branch of OCaml adds support for \emph{effect handlers}. 59 | In this talk, we report our experiences with effects, 60 | both from converting existing code, and from writing new code. 61 | Converting the Angstrom parser from a callback style to effects 62 | greatly simplified the code, while also improving performance and reducing allocations. 63 | Our experimental Eio library uses effects to allow writing concurrent code in direct style, 64 | without the need for monads (as found in Lwt or Async). 65 | 66 | \end{abstract} 67 | 68 | \section*{Effects} 69 | 70 | The multicore branch of OCaml adds support for \emph{effect handlers}\footnote{Retrofitting Effect Handlers onto OCaml, accepted to PLDI 2021}. 71 | Using effects brings several advantages over using callbacks or monadic style: 72 | 73 | \begin{itemize} 74 | \item It is faster, because no heap allocations are needed to simulate a stack. 75 | \item Concurrent code can be written in the same style as plain non-concurrent code. 76 | \item Because a real stack is used, exception backtraces and stack-based profiling work as expected. 
77 | \item Other features of the language (such as {\tt try}/{\tt with}, {\tt match}, {\tt while}, etc) 78 | can be used in concurrent code. 79 | \end{itemize} 80 | 81 | Installing an effect handler executes a function in a new stack, called a \emph{fibre}. 82 | The function can \emph{perform} an effect (similar to raising an exception), transferring control to the handler. 83 | Unlike an exception handler, an effect handler also receives a \emph{continuation}, 84 | which can be used to resume the suspended fibre when the handler is ready. 85 | 86 | \section*{Angstrom with effects} 87 | 88 | A natural implementation for a parser is a function that takes an input stream and returns the parsed result. 89 | This works well if the complete input is present at the start, or if the application can block while waiting for more data. 90 | 91 | However, if the parser needs to run concurrently with other code (as is typical in a network service), then this API needs to change so that when it requires more input the parser returns a callback to the application. 92 | Angstrom\footnote{\url{https://github.com/inhabitedtype/angstrom/}} is a parser-combinator library written in this way. 93 | It is intended for high-performance applications, such as network protocols. 
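The callback-returning API described above can be illustrated with a small sketch. This is not Angstrom's actual interface: the names `step`, `take` and `feed` are hypothetical, and the "parser" only collects a fixed number of characters. It shows the shape of the idea, namely that a parser which runs out of input hands a callback back to the application instead of blocking.

```ocaml
(* Hypothetical names throughout; this is NOT Angstrom's API. *)
type 'a step =
  | Done of 'a                      (* parsing finished with a result *)
  | Partial of (string -> 'a step)  (* callback: supply more input to resume *)

(* Read exactly [n] characters, returning a callback to the application
   whenever the data received so far is not enough. *)
let take n =
  let rec go acc =
    if String.length acc >= n then Done (String.sub acc 0 n)
    else Partial (fun chunk -> go (acc ^ chunk))
  in
  go ""

(* The application drives the parser and is free to do other work
   between chunks. Returns [None] if the input ends too early. *)
let rec feed step chunks =
  match step, chunks with
  | Done v, _ -> Some v
  | Partial _, [] -> None
  | Partial k, chunk :: rest -> feed (k chunk) rest

let () =
  match feed (take 5) ["he"; "ll"; "o!"] with
  | Some s -> print_endline s
  | None -> print_endline "need more input"
```

Every combinator in such a design has to thread the `Partial` case through, which is the cost that the direct-style version avoids.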
94 | 95 | To give a quick idea of the difference between the callback style and the direct style, here is Angstrom's implementation of the \verb|*>| combinator (which uses a pair of parsers \verb|a| and \verb|b| to parse a pair of items, discarding the first result): 96 | \begin{lstlisting}[style=ocaml] 97 | let (*>) a b = 98 | { run = fun input pos more fail succ -> 99 | let succ' input' pos' more' _ = 100 | b.run input' pos' more' fail succ in 101 | a.run input pos more fail succ' 102 | } 103 | \end{lstlisting} 104 | 105 | Here is the same combinator written in direct style (without support for asynchronous reads): 106 | \begin{lstlisting}[style=ocaml] 107 | let (*>) a b state = 108 | let _ = a state in 109 | b state 110 | \end{lstlisting} 111 | 112 | % Mention use of exceptions? 113 | 114 | But now, thanks to effects, the simpler direct-style version \emph{does} support asynchronous reads. 115 | If the \verb|a| or \verb|b| parser needs more input, it can perform an effect to get it. 116 | 117 | Interestingly, our ``effects'' version of Angstrom doesn't actually perform or handle any effects. 118 | Instead, it allows the user to provide a function for reading more data; 119 | if that function happens to perform an effect that suspends the parsing operation during the read, 120 | then other threads will be able to run while the parser is waiting for the read to complete. 121 | 122 | An initial benchmark (parsing an HTTP request) shows that the simpler direct-style version of Angstrom is also slightly faster, and performs considerably fewer allocations: 123 | 124 | \begin{figure}[h] 125 | \begin{tabular}{l|rrrrr} 126 | & Time & MinWrds & MajWrds \\ 127 | \hline 128 | Callbacks & 11.18ms & 4640k & 50471 \\ 129 | Effects & 10.46ms & 1066k & 285 \\ 130 | \end{tabular} 131 | \end{figure} 132 | 133 | We can also implement the old (callback-based) API on top of the new one, for compatibility. 
134 | The refill-buffer effect is only performed rarely, and so we only allocate a callback occasionally, 135 | when more data is actually needed, not for every parsing operation. 136 | 137 | 138 | \section*{Effects-based IO} 139 | 140 | It is easy to use effects to implement a cooperative scheduler, 141 | by running each thread in its own fibre. 142 | Threads perform effects when they want to block (e.g. for IO). 143 | The scheduler handles the effect by saving the continuation in the IO operation and resuming the next runnable thread. 144 | 145 | Our experimental new IO library\footnote{\url{https://github.com/ocaml-multicore/eio}} 146 | does this to provide direct-style IO, without the need for monads. 147 | The library aims to support multiple platforms using optimised platform-specific backends, 148 | such as {\tt io\_uring}\footnote{\url{https://kernel.dk/io_uring.pdf}} on Linux and 149 | Grand Central Dispatch\footnote{\url{https://developer.apple.com/documentation/DISPATCH}} on macOS. 150 | 151 | In the talk we will demonstrate the current state of the library, and provide comparisons between Lwt and Eio. 152 | 153 | \section*{HTTP benchmarks} 154 | 155 | Results from our preliminary benchmarking of HTTP servers indicate that an effect-based IO library is competitive not only with callback-based OCaml implementations but also with commonly used frameworks in other languages, such as Go's \emph{net/http}. There remains a performance gap between the OCaml implementations and high-performing Rust ones; we intend to report further progress on closing it in the talk. 
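The cooperative scheduler outlined in the Effects-based IO section can be sketched as follows. This is a minimal illustration, not Eio's implementation. It assumes the `Effect` module of the released OCaml 5 standard library (the multicore branch discussed here used a dedicated `effect` syntax instead), and it uses a hypothetical `Yield` effect in place of a real blocking IO operation. The handler saves the continuation in a run queue and resumes the next runnable thread.

```ocaml
(* Requires OCaml >= 5.0. [Yield] and [Fork] are illustrative effects. *)
open Effect
open Effect.Deep

type _ Effect.t +=
  | Yield : unit Effect.t
  | Fork : (unit -> unit) -> unit Effect.t

let yield () = perform Yield
let fork f = perform (Fork f)

let run main =
  (* Queue of suspended threads, each stored as a thunk that resumes it. *)
  let run_q : (unit -> unit) Queue.t = Queue.create () in
  let enqueue k = Queue.push (fun () -> continue k ()) run_q in
  let dequeue () =
    match Queue.take_opt run_q with
    | Some resume -> resume ()
    | None -> ()  (* nothing left to run *)
  in
  (* Run [f] in its own fibre, under a handler for the scheduler effects. *)
  let rec spawn f =
    match_with f ()
      { retc = (fun () -> dequeue ());
        exnc = raise;
        effc = fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield -> Some (fun (k : (a, unit) continuation) ->
              enqueue k; dequeue ())
          | Fork f' -> Some (fun (k : (a, unit) continuation) ->
              enqueue k; spawn f')
          | _ -> None }
  in
  spawn main

(* Usage: two workers that interleave by yielding to each other. *)
let trace = Buffer.create 16

let worker name () =
  for i = 1 to 2 do
    Buffer.add_string trace (Printf.sprintf "%s%d " name i);
    yield ()
  done

let () =
  run (fun () -> fork (worker "a"); fork (worker "b"));
  print_endline (Buffer.contents trace)
```

In a real IO library the continuation would be stored with the pending IO operation rather than placed straight back on the run queue, and it would be resumed once the operation completes.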
156 | 157 | \begin{figure}[hbtp] 158 | \caption{HTTP throughput comparison} 159 | \centering 160 | \includegraphics[width=0.45\textwidth]{rps-graph.png} 161 | \end{figure} 162 | 163 | Figure 1 shows a throughput comparison of several HTTP server implementations: 164 | \begin{itemize} 165 | \item OCaml 4.12 with cohttp 4.0 and Lwt 5.4.0 (cohttp\_lwt\_unix) 166 | \item OCaml 4.12 with httpaf 0.7.1 and Lwt 5.4.0 (httpaf\_lwt) 167 | \item OCaml 4.12+domains+effects with httpaf 0.7.1 and aeio 0.2.0 (httpaf\_effects) 168 | \item Go 1.15.4 with net/http (nethttp\_go) 169 | \item Rust 1.47.0 with hyper 0.12 and tokio 0.1.11 (rust\_hyper) 170 | \end{itemize} 171 | 172 | All benchmarks were restricted to one core. The results above were from an Intel(R) Xeon(R) Silver 4108 CPU with turbo disabled running Ubuntu 18.04.3 LTS and Linux 4.15.0-65-generic. Code for the specific run used for benchmarking can be found at \url{https://github.com/ocaml-multicore/retro-httpaf-bench/tree/ocamlworkshop2021}. 173 | 174 | \end{document} 175 | -------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/eio.tex: -------------------------------------------------------------------------------- 1 | \documentclass[a4paper,twocolumn]{article} 2 | \usepackage[colorlinks=true]{hyperref} 3 | \usepackage{graphicx} 4 | % From libnqsbtls.tex 5 | \usepackage{xcolor,listings} 6 | 7 | \newcommand\inputml[1]{\lstinputlisting[language={[Objective]Caml}]{#1}} 8 | 9 | \lstdefinelanguage{OCaml}{ 10 | keywords={ 11 | and,as,assert,asr,begin,class,constraint,do,done,downto,effect,else,end,exception, 12 | external,false,for,fun,function,functor,if,implicit,in,include,inherit,initializer, 13 | land,lazy,let,lor,lsl,lsr,lxor,macro,match,method,mod,module,mutable,new,object, 14 | of,open,or,private,rec,sig,struct,then,to,true,try,type,val,virtual,when, 15 | with,while}, 16 | comment=[s]{(*\ }{\ *)}, 17 | } 18 | 19 | \definecolor{darkgreen}{rgb}{0,0.2,0} 20 | 
\definecolor{darkblue}{rgb}{0.1,0.1,0.8} 21 | \definecolor{darkbrown}{rgb}{0.5,0.3,0.0} 22 | \definecolor{grey}{rgb}{0.5,0.5,0.5} 23 | \definecolor{darkgrey}{rgb}{0.2,0.2,0.2} 24 | 25 | \lstdefinestyle{ocaml}{ 26 | basicstyle=\ttfamily, % \small 27 | basewidth=0.5em, 28 | commentstyle=\color{darkgreen}, 29 | escapeinside={(**}{)}, 30 | keywordstyle=\color{darkblue}, 31 | language=OCaml, 32 | morekeywords={macro}, 33 | stringstyle=\color{blue}, 34 | showstringspaces=false, 35 | mathescape=true, 36 | moredelim=**[is][]{?}{?}, 37 | moredelim=**[is][]{&}{&}, 38 | } 39 | 40 | \lstset{literate=% 41 | {->}{{$\to$}}2 42 | {...}{{$\ldots$}}2 43 | } 44 | 45 | \begin{document} 46 | 47 | \title{Experiences with Effects} 48 | \author{Thomas Leonard\and 49 | Craig Ferguson\and 50 | Patrick Ferris\and 51 | Sadiq Jaffer\and 52 | Tom Kelly\and 53 | KC Sivaramakrishnan\and 54 | Anil Madhavapeddy} 55 | \maketitle 56 | 57 | \begin{abstract} 58 | The multicore branch of OCaml adds support for \emph{effect handlers}. 59 | In this talk, we report our experiences with effects, 60 | both from converting existing code, and from writing new code. 61 | Converting the Angstrom parser from a callback style to effects 62 | greatly simplified the code, while also improving performance and reducing allocations. 63 | Our experimental Eio library uses effects to allow writing concurrent code in direct style, 64 | without the need for monads (as found in Lwt or Async). 65 | 66 | \end{abstract} 67 | 68 | \section*{Effects} 69 | 70 | The multicore branch of OCaml adds support for \emph{effect handlers}\footnote{Retrofitting Effect Handlers onto OCaml, accepted to PLDI 2021}. 71 | Using effects brings several advantages over using callbacks or monadic style: 72 | 73 | \begin{itemize} 74 | \item It is faster, because no heap allocations are needed to simulate a stack. 75 | \item Concurrent code can be written in the same style as plain non-concurrent code. 
76 | \item Because a real stack is used, exception backtraces and stack-based profiling work as expected. 77 | \item Other features of the language (such as {\tt try}/{\tt with}, {\tt match}, {\tt while}, etc) 78 | can be used in concurrent code. 79 | \end{itemize} 80 | 81 | Installing an effect handler executes a function in a new stack, called a \emph{fibre}. 82 | The function can \emph{perform} an effect (similar to raising an exception), transferring control to the handler. 83 | Unlike an exception handler, an effect handler also receives a \emph{continuation}, 84 | which can be used to resume the suspended fibre when the handler is ready. 85 | 86 | \section*{Angstrom with effects} 87 | 88 | A natural implementation for a parser is a function that takes an input stream and returns the parsed result. 89 | This works well if the complete input is present at the start, or if the application can block while waiting for more data. 90 | 91 | However, if the parser needs to run concurrently with other code (as is typical in a network service), then this API needs to change so that when it requires more input the parser returns a callback to the application. 92 | Angstrom\footnote{\url{https://github.com/inhabitedtype/angstrom/}} is a parser-combinator library written in this way. 93 | It is intended for high-performance applications, such as network protocols. 
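The perform/continue mechanism described in the Effects section can be sketched with a small handler. This is an illustrative sketch under assumptions: it uses the `Effect` module of the released OCaml 5 standard library rather than the multicore branch's `effect` syntax, and the `Ask` effect and the `log` helper are hypothetical names introduced only for this example.

```ocaml
(* Requires OCaml >= 5.0. [Ask] is a made-up effect for illustration. *)
open Effect
open Effect.Deep

type _ Effect.t += Ask : int -> int Effect.t

(* Record the order in which the steps run. *)
let log = ref []
let say s = log := s :: !log

(* Runs in its own fibre; [perform] transfers control to the handler. *)
let computation () =
  say "step 1";
  let x = perform (Ask 2) in
  say (Printf.sprintf "step %d" x)

let run () =
  try_with computation ()
    { effc = fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask n ->
          (* Unlike an exception handler, we also receive the
             continuation [k] and can resume the suspended fibre. *)
          Some (fun (k : (a, _) continuation) ->
            say (Printf.sprintf "step %d" n);
            continue k (n + 1))
        | _ -> None }

let () =
  run ();
  List.iter print_endline (List.rev !log)
```

The handler runs between the two halves of `computation`, and the value passed to `continue` becomes the result of `perform` inside the fibre.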
94 | 95 | To give a quick idea of the difference between the callback style and the direct style, here is Angstrom's implementation of the \verb|*>| combinator (which uses a pair of parsers \verb|a| and \verb|b| to parse a pair of items, discarding the first result): 96 | \begin{lstlisting}[style=ocaml] 97 | let (*>) a b = 98 | { run = fun input pos more fail succ -> 99 | let succ' input' pos' more' _ = 100 | b.run input' pos' more' fail succ in 101 | a.run input pos more fail succ' 102 | } 103 | \end{lstlisting} 104 | 105 | Here is the same combinator written in direct style (without support for asynchronous reads): 106 | \begin{lstlisting}[style=ocaml] 107 | let (*>) a b state = 108 | let _ = a state in 109 | b state 110 | \end{lstlisting} 111 | 112 | % Mention use of exceptions? 113 | 114 | But now, thanks to effects, the simpler direct-style version \emph{does} support asynchronous reads. 115 | If the \verb|a| or \verb|b| parser needs more input, it can perform an effect to get it. 116 | 117 | Interestingly, our ``effects'' version of Angstrom doesn't actually perform or handle any effects. 118 | Instead, it allows the user to provide a function for reading more data; 119 | if that function happens to perform an effect that suspends the parsing operation during the read, 120 | then other threads will be able to run while the parser is waiting for the read to complete. 121 | 122 | An initial benchmark (parsing an HTTP request) shows that the simpler direct-style version of Angstrom is also slightly faster, and performs considerably fewer allocations: 123 | 124 | \begin{figure}[h] 125 | \begin{tabular}{l|rrrrr} 126 | & Time & MinWrds & MajWrds \\ 127 | \hline 128 | Callbacks & 11.18ms & 4640k & 50471 \\ 129 | Effects & 10.46ms & 1066k & 285 \\ 130 | \end{tabular} 131 | \end{figure} 132 | 133 | We can also implement the old (callback-based) API on top of the new one, for compatibility. 
134 | The refill-buffer effect is only performed rarely, and so we only allocate a callback occasionally, 135 | when more data is actually needed, not for every parsing operation. 136 | 137 | 138 | \section*{Effects-based IO} 139 | 140 | It is easy to use effects to implement a cooperative scheduler, 141 | by running each thread in its own fibre. 142 | Threads perform effects when they want to block (e.g. for IO). 143 | The scheduler handles the effect by saving the continuation in the IO operation and resuming the next runnable thread. 144 | 145 | Our experimental new IO library\footnote{\url{https://github.com/ocaml-multicore/eio}} 146 | does this to provide direct-style IO, without the need for monads. 147 | The library aims to support multiple platforms using optimised platform-specific backends, 148 | such as {\tt io\_uring}\footnote{\url{https://kernel.dk/io_uring.pdf}} on Linux and 149 | Grand Central Dispatch\footnote{\url{https://developer.apple.com/documentation/DISPATCH}} on macOS. 150 | 151 | In the talk we will demonstrate the current state of the library, and provide comparisons between Lwt and Eio. 152 | 153 | \section*{HTTP benchmarks} 154 | 155 | Results from our preliminary benchmarking of HTTP servers indicate that an effect-based IO library is competitive not only with callback-based OCaml implementations but also with commonly used frameworks in other languages, such as Go's \emph{net/http}. There remains a performance gap between the OCaml implementations and high-performing Rust ones; we intend to report further progress on closing it in the talk. 
156 | 157 | \begin{figure}[hbtp] 158 | \caption{HTTP throughput comparison} 159 | \centering 160 | \includegraphics[width=0.45\textwidth]{rps-graph.png} 161 | \end{figure} 162 | 163 | Figure 1 shows a throughput comparison of several HTTP server implementations: 164 | \begin{itemize} 165 | \item OCaml 4.12 with cohttp 4.0 and Lwt 5.4.0 (cohttp\_lwt\_unix) 166 | \item OCaml 4.12 with httpaf 0.7.1 and Lwt 5.4.0 (httpaf\_lwt) 167 | \item OCaml 4.12+domains+effects with httpaf 0.7.1 and aeio 0.2.0 (httpaf\_effects) 168 | \item Go 1.15.4 with net/http (nethttp\_go) 169 | \item Rust 1.47.0 with hyper 0.12 and tokio 0.1.11 (rust\_hyper) 170 | \end{itemize} 171 | 172 | All benchmarks were restricted to one core. The results above were from an Intel(R) Xeon(R) Silver 4108 CPU with turbo disabled running Ubuntu 18.04.3 LTS and Linux 4.15.0-65-generic. Code for the specific run used for benchmarking can be found at \url{https://github.com/ocaml-multicore/retro-httpaf-bench/tree/ocamlworkshop2021}. 173 | 174 | \end{document} 175 | -------------------------------------------------------------------------------- /ocaml2023-eio/eio.tex: -------------------------------------------------------------------------------- 1 | \documentclass[a4paper,twocolumn]{article} 2 | \usepackage[colorlinks=true]{hyperref} 3 | \usepackage{graphicx} 4 | % From libnqsbtls.tex 5 | \usepackage{xcolor,listings} 6 | 7 | \newcommand\inputml[1]{\lstinputlisting[language={[Objective]Caml}]{#1}} 8 | 9 | \lstdefinelanguage{OCaml}{ 10 | keywords={ 11 | and,as,assert,asr,begin,class,constraint,do,done,downto,effect,else,end,exception, 12 | external,false,for,fun,function,functor,if,implicit,in,include,inherit,initializer, 13 | land,lazy,let,lor,lsl,lsr,lxor,macro,match,method,mod,module,mutable,new,object, 14 | of,open,or,private,rec,sig,struct,then,to,true,try,type,val,virtual,when, 15 | with,while}, 16 | comment=[s]{(*\ }{\ *)}, 17 | } 18 | 19 | \definecolor{darkgreen}{rgb}{0,0.2,0} 20 | 
\definecolor{darkblue}{rgb}{0.1,0.1,0.8} 21 | \definecolor{darkbrown}{rgb}{0.5,0.3,0.0} 22 | \definecolor{grey}{rgb}{0.5,0.5,0.5} 23 | \definecolor{darkgrey}{rgb}{0.2,0.2,0.2} 24 | 25 | \lstdefinestyle{ocaml}{ 26 | basicstyle=\ttfamily, % \small 27 | basewidth=0.5em, 28 | commentstyle=\color{darkgreen}, 29 | escapeinside={(**}{)}, 30 | keywordstyle=\color{darkblue}, 31 | language=OCaml, 32 | morekeywords={macro}, 33 | stringstyle=\color{blue}, 34 | showstringspaces=false, 35 | mathescape=true, 36 | moredelim=**[is][]{?}{?}, 37 | moredelim=**[is][]{&}{&}, 38 | } 39 | 40 | \lstset{literate=% 41 | {->}{{$\to$}}2 42 | {...}{{$\ldots$}}2 43 | } 44 | 45 | \begin{document} 46 | 47 | \title{Eio 1.0 -- Effects-based IO for OCaml 5} 48 | \author{Thomas Leonard\and 49 | Patrick Ferris\and 50 | Christiano Haesbaert\and 51 | Lucas Pluvinage\and 52 | Vesa Karvonen\and 53 | Sudha Parimala\and 54 | KC Sivaramakrishnan\and 55 | Vincent Balat\and 56 | Anil Madhavapeddy} 57 | \maketitle 58 | 59 | \begin{abstract} 60 | 61 | Eio\footnote{\url{https://github.com/ocaml-multicore/eio}} provides an effects-based direct-style IO stack for OCaml 5. This talk introduces Eio's main features, such as use of effects, multi-core support and lock-free data-structures, support for modular programming, interoperability with other concurrency libraries such as Lwt, Async and Domainslib, and interactive monitoring support enabled by the custom runtime events in OCaml 5.1. 62 | We will report on our experiences porting existing applications to Eio. 
63 | 64 | \end{abstract} 65 | 66 | \section*{Motivation} 67 | 68 | OCaml 5 added support for programming with \emph{effects}, which has many advantages over using callbacks or monadic style: it is faster, because no heap allocations are needed to simulate a stack; concurrent code can be written in the same style as plain non-concurrent code; exception backtraces work; and other features of the language (such as {\tt try}/{\tt with}, {\tt match}, {\tt while}, etc) can be used in concurrent code. 69 | OCaml 5 also added support for running on multiple cores, allowing much improved performance. 70 | 71 | Given the benefits of these new features, there is a lot of interest in moving existing OCaml code to a new IO library. 72 | This is a good opportunity to bring the community together around a single IO API, as well as upgrading our IO support with modern features, such as optimised backends (e.g. \verb|io_uring|), structured concurrency, improved security, testing and tracing. 73 | 74 | \section*{Structure of Eio} 75 | 76 | Eio is made up of several packages. 77 | The \verb|eio| package itself is similar in scope to \verb|lwt|: 78 | it provides primitives for spawning and coordinating fibers, cancelling them, and managing resource lifetimes. 79 | 80 | To use Eio, you also require a \emph{backend} to run a suitable event loop for your platform 81 | (\verb|lwt.unix| is roughly equivalent to an Eio backend). 82 | The event loop must implement the three effects defined by the \verb|eio| package: 83 | 84 | \begin{description} 85 | \item{\verb|Suspend|} suspends the calling fiber, switching to the scheduler's context and providing access to the fiber context. 86 | \item{\verb|Fork|} runs a new fiber (with its own stack). 87 | \item{\verb|Get_context|} gets the fiber context (used for cancellation and fiber-local storage). 88 | \end{description} 89 | 90 | The \verb|eio_mock| backend performs no IO and is around 50 lines of code. 
91 | It is intended for running tests that don't interact with the outside world, 92 | but it also provides a good starting point to learn how to write a backend. 93 | 94 | Other backends include \verb|eio_posix| (which uses the \verb|poll| system call to wait for IO), 95 | \verb|eio_linux| (using Linux's \verb|io_uring| system\footnote{\url{https://github.com/axboe/liburing}}), 96 | \verb|eio_windows|, 97 | \verb|eio_js| (running inside a browser with \verb|js_of_ocaml|\footnote{\url{https://ocsigen.org/js_of_ocaml/}}), 98 | and \verb|eio_solo5| (for Mirage unikernels\footnote{\url{https://mirage.io/}}). 99 | 100 | Each backend provides a ``low-level'' API that mimics the platform's native API, but uses effects so that operations don't block the whole domain. For example, \verb|Eio_posix.Low_level| provides: 101 | 102 | \begin{lstlisting}[style=ocaml,basicstyle=\small] 103 | val read : fd -> bytes -> int -> int -> int 104 | val write : fd -> bytes -> int -> int -> int 105 | \end{lstlisting} 106 | 107 | These two functions have the same signatures as their counterparts in OCaml's Unix module 108 | (except that \verb|fd| wraps \verb|Unix.file_descr| to prevent use-after-close bugs). 109 | Internally, these calls use backend-specific effects to switch to the next runnable fiber while they wait. 110 | 111 | Eio then defines a cross-platform API, and each backend implements some or all of this API using its low-level functions. 112 | It is expected that users will normally program against this cross-platform API, for portability. 113 | The \verb|eio_main| package selects an appropriate backend for the current platform automatically. 114 | 115 | \section*{Modularity} 116 | 117 | Eio has a number of design features intended to support modularity: 118 | 119 | Every OS resource (e.g. an open file handle) must be attached to an active \emph{switch}, 120 | and will be closed when the switch is turned off. 
121 | This helps to prevent resource leaks, especially when errors occur. 122 | 123 | It uses \emph{structured concurrency}, so that fibers have well defined lifetimes. 124 | This also uses the switch mechanism, treating fibers as resources. 125 | 126 | Eio has built-in support for cancellation. This is essential when using structured concurrency, 127 | because if one fiber fails then the others must finish before the error can be reported to the parent context. 128 | 129 | Eio wraps file descriptors using a (lock-free) ref-counting scheme. 130 | This ensures that one module in a program cannot corrupt another module's resources 131 | by using a file descriptor after it has been closed. 132 | 133 | Finally, instead of representing the initially-available OS resources (such as the filesystem and network) as globals, 134 | Eio passes them as arguments to the application when the main event loop is started. 135 | This makes it easy to get a bound on how the program, or any part of it, can interact with the outside world. 136 | 137 | \section*{Integrations} 138 | 139 | The \verb|Lwt_eio|\footnote{\url{https://github.com/ocaml-multicore/lwt_eio}} package provides a Lwt engine that simply delegates to Eio's event loop. 140 | The \verb|run_lwt| function runs a Lwt function, blocking the Eio fiber until the result is ready, 141 | while \verb|run_eio| allows Lwt code to run Eio code, getting a promise for its result. 142 | This allows Lwt and Eio code to be mixed freely, which allows existing code to be migrated to Eio in stages. 143 | For example, \verb|tls-eio| was created by starting from \verb|tls-lwt| and converting the code line by line, 144 | testing it along the way. 145 | 146 | Similarly, \verb|Async_eio|\footnote{\url{https://github.com/talex5/async_eio}} allows Async and Eio to be used together in a single domain. 147 | It is even possible to use Async, Eio and Lwt all at the same time! 
148 | 149 | Lwt and Async code can only run in a single domain, 150 | and their tasks are scheduled cooperatively with any Eio fibers running in the same domain. 151 | Integration with Domainslib\footnote{\url{https://github.com/ocaml-multicore/domainslib}} is slightly different, 152 | as it manages a set of domains. 153 | Here, we provide a bridge allowing Domainslib jobs to be run from Eio and the results collected. 154 | This bridge is possible because \verb|Domainslib.Task.async| is able to run from an Eio domain, 155 | and \verb|Eio.Promise.resolve| is able to run from a Domainslib one. 156 | 157 | Finally, kcas\footnote{\url{https://github.com/ocaml-multicore/kcas}} provides software transactional memory based on an atomic lock-free multi-word compare-and-set (MCAS) algorithm. 158 | Eio and Domainslib both implement the \verb|domain-local-await| interface\footnote{\url{https://github.com/ocaml-multicore/domain-local-await}}, 159 | allowing kcas operations to span domains controlled by both systems. 160 | 161 | \section*{Tracing} 162 | 163 | Eio can output trace data to a ring buffer, which can be viewed using mirage-trace-viewer. 164 | With OCaml 5.1, this has been updated to work with the new custom events support, so that e.g. GC events are included too. 165 | The Meio\footnote{\url{https://github.com/tarides/meio}} (Monitoring for Eio) project provides a console-based tool for inspecting a running Eio process, 166 | showing the tree of fibers along with profiling information. 
167 | 168 | \end{document} 169 | -------------------------------------------------------------------------------- /ocaml2021-workshop-effects/slides.tex: -------------------------------------------------------------------------------- 1 | \documentclass{beamer} 2 | \usepackage{graphicx} 3 | \usepackage{verbatim} 4 | \usepackage{alltt} 5 | \usepackage{xcolor} 6 | \usepackage{listings} 7 | \usepackage{hyperref} 8 | \setbeamertemplate{navigation symbols}{}%remove navigation symbols 9 | 10 | \renewcommand\UrlFont{\color{blue}} 11 | 12 | \newcommand\inputml[1]{\lstinputlisting[language={[Objective]Caml}]{#1}} 13 | 14 | \lstdefinelanguage{OCaml}{ 15 | keywords={ 16 | and,as,assert,asr,begin,class,constraint,do,done,downto,effect,else,end,exception, 17 | external,false,for,fun,function,functor,if,implicit,in,include,inherit,initializer, 18 | land,lazy,let,lor,lsl,lsr,lxor,macro,match,method,mod,module,mutable,new,object, 19 | of,open,or,private,rec,sig,struct,then,to,true,try,type,val,virtual,when, 20 | with,while}, 21 | comment=[s]{(*\ }{\ *)}, 22 | } 23 | 24 | \definecolor{darkgreen}{rgb}{0,0.2,0} 25 | \definecolor{darkblue}{rgb}{0.1,0.1,0.8} 26 | \definecolor{darkbrown}{rgb}{0.5,0.3,0.0} 27 | \definecolor{grey}{rgb}{0.5,0.5,0.5} 28 | \definecolor{darkgrey}{rgb}{0.2,0.2,0.2} 29 | 30 | \lstdefinestyle{ocaml}{ 31 | basicstyle=\ttfamily, % \small 32 | basewidth=0.5em, 33 | commentstyle=\color{darkgreen}, 34 | escapeinside={(**}{)}, 35 | keywordstyle=\color{darkblue}, 36 | language=OCaml, 37 | morekeywords={macro}, 38 | stringstyle=\color{blue}, 39 | showstringspaces=false, 40 | mathescape=true, 41 | moredelim=**[is][]{?}{?}, 42 | moredelim=**[is][]{&}{&}, 43 | } 44 | 45 | \lstdefinestyle{output}{ 46 | basicstyle=\ttfamily\small, 47 | basewidth=0.5em, 48 | } 49 | 50 | \lstset{literate=% 51 | {->}{{$\to$}}2 52 | {...}{{$\ldots$}}2 53 | } 54 | 55 | \newcommand\mlkeyword[1]{{\ttfamily\color{darkblue} #1}} 56 | 57 | \title[Effects]{Experiences with Effects} 58 | \author[Thomas 
Leonard] 59 | {Thomas Leonard\and Craig Ferguson\and Patrick Ferris\and Sadiq Jaffer\and Tom Kelly\and KC Sivaramakrishnan\and Anil Madhavapeddy} 60 | \institute{OCaml Labs} 61 | \date[OCaml 2021]{The OCaml Users and Developers Workshop, Aug 2021} 62 | 63 | \begin{document} 64 | 65 | \definecolor{grey}{gray}{0.6} 66 | 67 | \frame{\titlepage} 68 | 69 | % \begin{frame} 70 | % \frametitle{Table of Contents} 71 | % \tableofcontents 72 | % \end{frame} 73 | 74 | \begin{frame} 75 | \frametitle{Overview} 76 | \begin{itemize} 77 | \item {\color{grey}Domains / }effects {\color{grey}/ typed effects} 78 | \item Introduction to effects 79 | \item Case study: Converting the Angstrom parser 80 | \item Eio concurrency library 81 | \end{itemize} 82 | \end{frame} 83 | 84 | \begin{frame}[fragile] 85 | \frametitle{Introduction to effects} 86 | \begin{columns}[t] 87 | \begin{column}{4.5cm} 88 | \begin{itemize} 89 | \item Resumable exceptions 90 | \item Multiple stacks 91 | \end{itemize} 92 | \end{column} 93 | \begin{column}{5cm} 94 | \begin{lstlisting}[style=ocaml] 95 | effect Foo : int -> int 96 | 97 | try 98 | println "step 1"; 99 | let x = perform (Foo 2) in 100 | println "step %d" x 101 | with effect (Foo n) k -> 102 | println "step %d" n; 103 | continue k (n + 1) 104 | \end{lstlisting} 105 | \end{column} 106 | \end{columns} 107 | \end{frame} 108 | 109 | \begin{frame} 110 | \frametitle{Advantages of effects} 111 | \begin{itemize} 112 | \item No difference between sequential and concurrent code. 113 | \begin{itemize} 114 | \item No special monad syntax. 115 | \item Can use \mlkeyword{try}, \mlkeyword{match}, \mlkeyword{while}, etc. 116 | \item No separate lwt or async versions of code. 117 | \end{itemize} 118 | \item No heap allocations needed to simulate a stack. 119 | \item A real stack means backtraces and profiling tools work. 
120 | \end{itemize} 121 | \end{frame} 122 | 123 | \begin{frame} 124 | \frametitle{Case study: Angstrom} 125 | 126 | \url{https://github.com/inhabitedtype/angstrom/} 127 | \bigskip 128 | \begin{itemize} 129 | \item A library for writing parsers 130 | \item Designed for network protocols 131 | \item Strong focus on performance 132 | \end{itemize} 133 | \end{frame} 134 | 135 | \begin{frame}[fragile] 136 | \frametitle{A toy parser} 137 | \begin{lstlisting}[style=ocaml] 138 | type 'a parser = state -> 'a 139 | 140 | let any_char state = 141 | ensure 1 state; 142 | let c = Input.unsafe_get_char state.input state.pos in 143 | state.pos <- state.pos + 1; 144 | c 145 | 146 | let (*>) a b state = 147 | let _ = a state in 148 | b state 149 | \end{lstlisting} 150 | \end{frame} 151 | 152 | \begin{frame}[fragile] 153 | \frametitle{The Angstrom parser type} 154 | \begin{lstlisting}[style=ocaml,basicstyle=\ttfamily\small] 155 | module State = struct 156 | type 'a t = 157 | | Partial of 'a partial 158 | | Lazy of 'a t Lazy.t 159 | | Done of int * 'a 160 | | Fail of int * string list * string 161 | and 'a partial = 162 | { committed : int; 163 | continue : Bigstringaf.t -> 164 | off:int -> len:int -> More.t -> 'a t } 165 | end 166 | type 'a with_state = Input.t -> int -> More.t -> 'a 167 | type 'a failure = 168 | (string list -> string -> 'a State.t) with_state 169 | type ('a, 'r) success = ('a -> 'r State.t) with_state 170 | type 'a parser = { run : 'r. 
171 | ('r failure -> ('a, 'r) success -> 'r State.t) with_state 172 | } 173 | \end{lstlisting} 174 | \end{frame} 175 | 176 | \begin{frame}[fragile] 177 | \frametitle{Angstrom parsers} 178 | \begin{lstlisting}[style=ocaml,basicstyle=\ttfamily\small] 179 | let any_char = 180 | ensure 1 { run = fun input pos more _fail succ -> 181 | succ input (pos + 1) more 182 | (Input.unsafe_get_char input pos) 183 | } 184 | 185 | let (*>) a b = 186 | { run = fun input pos more fail succ -> 187 | let succ' input' pos' more' _ = 188 | b.run input' pos' more' fail succ in 189 | a.run input pos more fail succ' 190 | } 191 | \end{lstlisting} 192 | \end{frame} 193 | 194 | \begin{frame}[fragile] 195 | \frametitle{Angstrom : effects branch} 196 | \url{https://github.com/talex5/angstrom/tree/effects} 197 | \bigskip 198 | \begin{lstlisting}[style=ocaml] 199 | type 'a parser = state -> 'a 200 | 201 | let any_char state = 202 | ensure 1 state; 203 | let c = Input.unsafe_get_char state.input state.pos in 204 | state.pos <- state.pos + 1; 205 | c 206 | 207 | let (*>) a b state = 208 | let _ = a state in 209 | b state 210 | \end{lstlisting} 211 | \end{frame} 212 | 213 | \begin{frame}[fragile] 214 | \frametitle{Parser micro-benchmark} 215 | \begin{lstlisting}[style=ocaml] 216 | let parser = skip_many any_char 217 | \end{lstlisting} 218 | \bigskip 219 | \begin{table} 220 | \begin{tabular}{l|rrrrr} 221 | & Time & MinWrds & MajWrds \\ 222 | \hline 223 | Callbacks & 750.63ms & 160.04Mw & 8,9944.00kw \\ 224 | \uncover<2>{Callbacks'} & \uncover<2>{180.73ms} & \uncover<2>{220.01Mw} & \uncover<2>{9,659.00w} \\ 225 | Effects & 57.81ms & - & - 226 | \end{tabular} 227 | \end{table} 228 | \bigskip 229 | \uncover<1>{1}3 times faster! 
230 | \pause 231 | \end{frame} 232 | 233 | \begin{frame}[fragile] 234 | \frametitle{Realistic parser benchmark} 235 | Parsing an HTTP request shows smaller gains: 236 | \bigskip 237 | \begin{table} 238 | \begin{tabular}{l|rrrrr} 239 | & Time & MinWrds & MajWrds \\ 240 | \hline 241 | Callbacks & 60.30ms & 9.28Mw & 102.08kw \\ 242 | Effects & 50.71ms & 2.13Mw & 606.30w 243 | \end{tabular} 244 | \end{table} 245 | \end{frame} 246 | 247 | \begin{frame}[fragile] 248 | \frametitle{Using effects for backwards compatibility} 249 | \begin{lstlisting}[style=ocaml] 250 | effect Read : int -> state 251 | let read c = perform (Read c) 252 | 253 | let parse p = 254 | let buffering = Buffering.create () in 255 | try Unbuffered.parse ~read p 256 | with effect (Read committed) k -> 257 | Buffering.shift buffering committed; 258 | Partial (fun input -> 259 | Buffering.feed_input buffering input; 260 | continue k (Buffering.for_reading buffering) 261 | ) 262 | \end{lstlisting} 263 | (simplified) 264 | \end{frame} 265 | 266 | \begin{frame} 267 | \frametitle{Angstrom summary} 268 | \begin{itemize} 269 | \item Slightly faster 270 | \item Much simpler code 271 | \item No effects in interface 272 | \item Can convert between callbacks and effects easily 273 | \end{itemize} 274 | \end{frame} 275 | 276 | \begin{frame} 277 | \frametitle{Eio : an IO library using effects for concurrency} 278 | \begin{itemize} 279 | \item Alternative to Lwt and Async 280 | \item Generic API that performs effects 281 | \item Cross-platform libuv effect handler 282 | \item High-performance io-uring handler for Linux 283 | \end{itemize} 284 | \end{frame} 285 | 286 | \begin{frame}[fragile] 287 | \frametitle{Eio example} 288 | \begin{lstlisting}[style=ocaml] 289 | let handle_connection = 290 | Httpaf_eio.Server.create_connection_handler 291 | ~config 292 | ~request_handler 293 | ~error_handler 294 | 295 | let main ~net = 296 | Switch.top @@ fun sw -> 297 | let socket = Eio.Net.listen ~sw net (`Tcp (host, port)) 298 | 
~reuse_addr:true 299 | ~backlog:1000 300 | in 301 | while true do 302 | Eio.Net.accept_sub ~sw socket handle_connection 303 | ~on_error:log_connection_error 304 | done 305 | \end{lstlisting} 306 | \end{frame} 307 | 308 | \begin{frame} 309 | \frametitle{HTTP benchmark} 310 | \includegraphics[width=\textwidth]{rps-graph.png} 311 | 100 concurrent connections. Servers limited to 1 core. 312 | \end{frame} 313 | 314 | \begin{frame} 315 | \frametitle{Eio : other features} 316 | \begin{itemize} 317 | \item Structured concurrency 318 | \item OCaps security model 319 | \item Tracing support 320 | \item Supports multiple cores 321 | \item Still experimental 322 | \end{itemize} 323 | \includegraphics[width=\textwidth]{trace.png} 324 | \end{frame} 325 | 326 | \begin{frame} 327 | \frametitle{Summary} 328 | \begin{itemize} 329 | \item Concurrency with effects works very well 330 | \item Effects have very good performance 331 | \item No bugs found in effects system during testing 332 | \end{itemize} 333 | \bigskip 334 | \url{https://github.com/ocaml-multicore/eio} documentation shows how to try out OCaml effects. 335 | \end{frame} 336 | 337 | % Backup slides 338 | 339 | \begin{frame}[fragile] 340 | \frametitle{Lwt example} 341 | \begin{lstlisting}[style=ocaml] 342 | let foo ~stdin total = 343 | let* n = Lwt_io.read_line stdin in 344 | Lwt_io.printlf "n/total = %d" 345 | (int_of_string n / total) 346 | \end{lstlisting} 347 | \begin{lstlisting}[style=output] 348 | Fatal error: exception Division_by_zero 349 | Raised at Lwt_example.foo in file "lwt_example.ml", line 6 350 | Called from Lwt.[...].callback in file "src/core/lwt.ml", ... 
351 | \end{lstlisting} 352 | \begin{itemize} 353 | \item Backtrace doesn't say what called \verb|foo| 354 | \item Closure with \verb|total| allocated on the heap 355 | \end{itemize} 356 | \end{frame} 357 | 358 | \begin{frame}[fragile] 359 | \frametitle{Eio example} 360 | \begin{lstlisting}[style=ocaml] 361 | let foo ~stdin total = 362 | let n = read_line stdin in 363 | traceln "n/total = %d" 364 | (int_of_string n / total) 365 | \end{lstlisting} 366 | \begin{lstlisting}[style=output] 367 | Fatal error: exception Division_by_zero 368 | Raised at Eio_example.foo in file "eio_example.ml", line 11 369 | Called from Eio_example.bar in file "eio_example.ml", line 15 370 | ... 371 | \end{lstlisting} 372 | \end{frame} 373 | 374 | 375 | \end{document} 376 | -------------------------------------------------------------------------------- /ocaml2023-eio/slides.tex: -------------------------------------------------------------------------------- 1 | \documentclass{beamer} 2 | \usepackage{graphicx} 3 | \usepackage{verbatim} 4 | \usepackage{alltt} 5 | \usepackage{xcolor} 6 | \usepackage{listings} 7 | \usepackage{hyperref} 8 | \setbeamertemplate{navigation symbols}{}%remove navigation symbols 9 | 10 | \renewcommand\UrlFont{\color{blue}} 11 | 12 | \newcommand\inputml[1]{\lstinputlisting[language={[Objective]Caml}]{#1}} 13 | 14 | \lstdefinelanguage{OCaml}{ 15 | keywords={ 16 | and,as,assert,asr,begin,class,constraint,do,done,downto,effect,else,end,exception, 17 | external,false,for,fun,function,functor,if,implicit,in,include,inherit,initializer, 18 | land,lazy,let,lor,lsl,lsr,lxor,macro,match,method,mod,module,mutable,new,object, 19 | of,open,or,private,rec,sig,struct,then,to,true,try,type,val,virtual,when, 20 | with,while}, 21 | comment=[s]{(*\ }{\ *)}, 22 | } 23 | 24 | \definecolor{darkgreen}{rgb}{0,0.2,0} 25 | \definecolor{darkblue}{rgb}{0.1,0.1,0.8} 26 | \definecolor{darkbrown}{rgb}{0.5,0.3,0.0} 27 | \definecolor{grey}{rgb}{0.5,0.5,0.5} 28 | 
\definecolor{darkgrey}{rgb}{0.2,0.2,0.2} 29 | \definecolor{highlight}{rgb}{1.0,1.0,0.4} 30 | 31 | \lstdefinestyle{ocaml}{ 32 | basicstyle=\ttfamily\scriptsize, 33 | basewidth=0.5em, 34 | commentstyle=\color{darkgreen}, 35 | escapeinside={(**}{)}, 36 | keywordstyle=\color{darkblue}, 37 | language=OCaml, 38 | morekeywords={macro}, 39 | stringstyle=\color{blue}, 40 | showstringspaces=false, 41 | mathescape=true, 42 | moredelim=**[is][]{?}{?}, 43 | moredelim=**[is][]{&}{&}, 44 | } 45 | 46 | \lstdefinestyle{output}{ 47 | basicstyle=\ttfamily\small, 48 | basewidth=0.5em, 49 | } 50 | 51 | \lstset{literate=% 52 | {->}{{$\to$}}2 53 | {...}{{$\ldots$}}2 54 | } 55 | 56 | \newcommand\mlkeyword[1]{{\ttfamily\color{darkblue} #1}} 57 | 58 | \title[Eio]{Eio 1.0 – Effects-based IO for OCaml 5} 59 | \author[Thomas Leonard] 60 | {Thomas Leonard\and Patrick Ferris\and Christiano Haesbaert\and Lucas Pluvinage\and Vesa Karvonen\and Sudha Parimala\and KC Sivaramakrishnan\and Vincent Balat\and Anil Madhavapeddy} 61 | \institute{Tarides} 62 | \date[OCaml 2023]{The OCaml Users and Developers Workshop, Sep 2023} 63 | 64 | \begin{document} 65 | 66 | \definecolor{grey}{gray}{0.6} 67 | 68 | \frame{\titlepage} 69 | 70 | \begin{frame} 71 | \frametitle{Overview} 72 | \begin{itemize} 73 | \item Motivation and design 74 | \item Interoperability (Lwt, Async, Kcas, Domainslib) 75 | \item Comparison with Lwt 76 | \item Experiences porting software 77 | \end{itemize} 78 | \end{frame} 79 | 80 | \begin{frame}[fragile] 81 | \frametitle{Motivation} 82 | \begin{itemize} 83 | \item Support effects 84 | \begin{itemize} 85 | \item No special monad syntax 86 | \item Can use \mlkeyword{try}, \mlkeyword{match}, \mlkeyword{while}, etc 87 | \item No separate Lwt or Async versions of code 88 | \item No heap allocations needed to simulate a stack 89 | \item A real stack means backtraces and profiling tools work 90 | \end{itemize} 91 | \item Support multiple cores 92 | \item Fix some annoyances with Lwt 93 | 
\end{itemize} 94 | \end{frame} 95 | 96 | \begin{frame}[fragile] 97 | \frametitle{Eio packages} 98 | \begin{itemize} 99 | \item Eio defines: 100 | \begin{itemize} 101 | \item 3 effects (\verb|Suspend|, \verb|Fork|, \verb|Get_context|) 102 | \item Generic cross-platform APIs 103 | \end{itemize} 104 | \item Backends for various platforms 105 | \item \verb|eio_main| chooses the best backend 106 | \end{itemize} 107 | \begin{figure} 108 | \includegraphics[width=0.6\textwidth]{arch.pdf} 109 | \end{figure} 110 | \end{frame} 111 | 112 | \begin{frame}[fragile] 113 | \frametitle{Performance : single core} 114 | \bigskip 115 | Eio (0.38 s): 116 | \begin{lstlisting}[style=ocaml] 117 | let parse r = 118 | for _ = 1 to n_bytes do 119 | let r = Eio.Buf_read.any_char r in 120 | ignore (r : char) 121 | done 122 | \end{lstlisting} 123 | \bigskip 124 | Lwt (1.49 s): 125 | \begin{lstlisting}[style=ocaml] 126 | let parse r = 127 | let rec aux = function 128 | | 0 -> Lwt.return_unit 129 | | i -> 130 | let* r = Lwt_io.read_char r in 131 | ignore (r : char); 132 | aux (i - 1) 133 | in 134 | aux n_bytes 135 | \end{lstlisting} 136 | \end{frame} 137 | 138 | \begin{frame} 139 | \frametitle{Performance : multi-core} 140 | \bigskip 141 | \begin{itemize} 142 | \item Many data-structures are now lock-free 143 | \item Better performance with multiple domains 144 | \end{itemize} 145 | \bigskip 146 | \centering 147 | Synchronous streams\\ 148 | \includegraphics[width=0.8\textwidth]{lock-free.png} 149 | \end{frame} 150 | 151 | \begin{frame}[fragile] 152 | \frametitle{Interoperability : Lwt} 153 | To run Lwt programs under Eio, replace \verb|Lwt_main.run| with: 154 | \begin{lstlisting}[style=ocaml] 155 | Eio_main.run @@ fun env -> 156 | Lwt_eio.with_event_loop ~clock:env#clock @@ fun _ -> 157 | Lwt_eio.run_lwt @@ fun () -> 158 | ... 
159 | \end{lstlisting} 160 | \verb|run_lwt| and \verb|run_eio| switch between Lwt and Eio code: 161 | \begin{lstlisting}[style=ocaml] 162 | val run_lwt : (unit -> 'a Lwt.t) -> 'a 163 | val run_eio : (unit -> 'a) -> 'a Lwt.t 164 | \end{lstlisting} 165 | \bigskip 166 | \url{https://github.com/ocaml-multicore/lwt_eio} 167 | \end{frame} 168 | 169 | \begin{frame}[fragile] 170 | \frametitle{Interoperability : Async} 171 | \verb|Async_eio| does the same for Async: 172 | \bigskip 173 | \begin{lstlisting}[style=ocaml] 174 | val run_eio : 175 | (unit -> 'a) -> 'a Async_kernel.Deferred.t 176 | 177 | val run_async : 178 | (unit -> 'a Async_kernel.Deferred.t) -> 'a 179 | \end{lstlisting} 180 | \bigskip 181 | \url{https://github.com/talex5/async_eio} 182 | \end{frame} 183 | 184 | \begin{frame}[fragile] 185 | \frametitle{Interoperability : Async, Eio and Lwt} 186 | You can even use all three libraries together in a single domain! 187 | \bigskip 188 | \begin{lstlisting}[style=ocaml] 189 | Eio_main.run @@ fun env -> 190 | Lwt_eio.with_event_loop ~clock:env#clock @@ fun _ -> 191 | Async_eio.with_event_loop @@ fun _ -> 192 | ... 193 | \end{lstlisting} 194 | \bigskip 195 | \url{https://github.com/talex5/async-eio-lwt-chimera} 196 | \end{frame} 197 | 198 | \begin{frame}[fragile] 199 | \frametitle{Interoperability : Domainslib and Kcas} 200 | Eio, Domainslib and Kcas all use \verb|domain-local-await|, 201 | allowing e.g. Domainslib to add items to a Kcas queue, 202 | which is being read from an Eio domain. 203 | \end{frame} 204 | 205 | 206 | \begin{frame}[fragile] 207 | \frametitle{Resource leaks} 208 | \begin{itemize} 209 | \item Resources are attached to switches 210 | \item When the switch finishes, the resource is freed 211 | \end{itemize} 212 | Eio: 213 | \begin{lstlisting}[style=ocaml] 214 | let accept socket = 215 | Switch.run @@ fun sw -> 216 | let conn, _addr = Eio.Net.accept ~sw socket in 217 | ... 
218 | Eio.Net.close conn (* Optional *) 219 | \end{lstlisting} 220 | Lwt (leaks \verb|conn| if cancelled): 221 | \begin{lstlisting}[style=ocaml] 222 | let accept socket = 223 | let* conn, _addr = Lwt_unix.accept socket in 224 | ... 225 | Lwt_unix.close conn 226 | \end{lstlisting} 227 | \end{frame} 228 | 229 | \begin{frame}[fragile] 230 | \frametitle{Bounds on behaviour : Lwt} 231 | 232 | \begin{lstlisting}[style=ocaml] 233 | let () = 234 | Lwt_main.run (main ()) 235 | \end{lstlisting} 236 | 237 | \begin{itemize} 238 | \item What does this program do? 239 | \item What firewall rules should we set? 240 | \item Global state is hard to reason about 241 | \end{itemize} 242 | \end{frame} 243 | 244 | \begin{frame}[fragile] 245 | \frametitle{Bounds on behaviour : Eio} 246 | 247 | \setlength\fboxsep{1.2pt} 248 | \begin{lstlisting}[style=ocaml,escapechar=!] 249 | let () = 250 | Eio_main.run @@ fun !\colorbox{highlight}{env}! -> 251 | Switch.run @@ fun sw -> 252 | let addr = `Tcp (Eio.Net.Ipaddr.V4.any, 8080) in 253 | let socket = 254 | Eio.Net.listen ~sw !\colorbox{highlight}{env}!#net addr 255 | ~backlog:5 256 | ~reuse_addr:true 257 | in 258 | let dir = Eio.Path.open_dir ~sw (!\colorbox{highlight}{env}!#fs / "/srv/htdocs") in 259 | main ~socket dir 260 | \end{lstlisting} 261 | \begin{itemize} 262 | \item Listens on port 8080 (no other network use) 263 | \item Uses \verb|/srv/htdocs| (no other file-system use) 264 | \end{itemize} 265 | \bigskip 266 | \url{https://roscidus.com/blog/blog/2023/04/26/lambda-capabilities/} 267 | \end{frame} 268 | 269 | \begin{frame} 270 | \frametitle{Experiences porting software} 271 | \begin{itemize} 272 | \item Solver service (cache-dir bug) 273 | \item Wayland proxy 274 | \item Libraries: ocaml-tls, cohttp, dream, capnp-rpc, ... 
275 | \end{itemize} 276 | \bigskip 277 | \url{https://github.com/ocaml-multicore/awesome-multicore-ocaml} 278 | \end{frame} 279 | 280 | \begin{frame}[fragile] 281 | \frametitle{Future} 282 | Eio 1.0: 283 | \begin{itemize} 284 | \item Finish file-system APIs 285 | \item OCaml 5.1 events 286 | \end{itemize} 287 | \bigskip 288 | Get involved: 289 | \begin{itemize} 290 | \item Chat on \verb|#eio| (\url{https://matrix.to/#/#eio:roscidus.com}) 291 | \item Developer video call every two weeks 292 | \end{itemize} 293 | \bigskip 294 | \url{https://github.com/ocaml-multicore/eio} 295 | \end{frame} 296 | 297 | \begin{frame} 298 | \frametitle{Questions} 299 | \end{frame} 300 | 301 | % Backup slides 302 | 303 | \begin{frame}[fragile] 304 | \frametitle{perf : test code} 305 | \begin{tabular}{ll} 306 | Eio & Lwt \\ 307 | \begin{lstlisting}[style=ocaml,boxpos=t] 308 | let run_task1 () = 309 | for _ = 1 to 2000 do 310 | do_work () 311 | done 312 | 313 | let run_task2 () = 314 | for _ = 1 to 2000 do 315 | do_work () 316 | done 317 | 318 | let run () = 319 | Fiber.both run_task1 run_task2 320 | \end{lstlisting}& 321 | \begin{lstlisting}[style=ocaml,boxpos=t] 322 | let run_task1 () = 323 | let rec outer = function 324 | | 0 -> Lwt.return_unit 325 | | i -> 326 | let* () = do_work () in 327 | outer (i - 1) 328 | in 329 | outer 2000 330 | 331 | let run_task2 () = ... 
332 | 333 | let run () = 334 | Lwt.join [ 335 | run_task1 (); 336 | run_task2 (); 337 | ] 338 | \end{lstlisting} 339 | \end{tabular} 340 | \end{frame} 341 | 342 | \begin{frame}[fragile] 343 | \frametitle{perf : results} 344 | \verb|perf| shows \verb|task1| vs \verb|task2| for Eio part: 345 | \scriptsize 346 | \begin{verbatim} 347 | - 49.94% Lwt_main.run_495 348 | - Lwt_main.run_loop_435 349 | - 49.83% Lwt_sequence.loop_346 350 | - Lwt.callback_1373 351 | - 49.77% Dune.exe.Perf.fun_967 352 | + 49.77% Dune.exe.Perf.use_cpu_273 353 | - 49.90% Eio_linux.Sched.with_sched_inner_3088 354 | - 49.89% Eio_linux.Sched.with_eventfd_1738 355 | - Stdlib.Fun.protect_320 356 | - 49.86% caml_runstack 357 | - Eio.core.Fiber.fun_1369 358 | - 25.07% Dune.exe.Perf.run_task2_425 359 | + Dune.exe.Perf.use_cpu_273 360 | - 24.78% Dune.exe.Perf.run_task1_421 361 | + 24.77% Dune.exe.Perf.use_cpu_273 362 | \end{verbatim} 363 | \end{frame} 364 | 365 | 366 | \begin{frame}[fragile] 367 | \frametitle{Error reporting} 368 | \begin{itemize} 369 | \item Eio takes care to preserve stack-traces 370 | \item \verb|Lwt.join| waits for all threads before reporting errors; 371 | errors may never be seen 372 | \item \verb|Eio.Fiber.both| cancels the other fiber 373 | \end{itemize} 374 | \bigskip 375 | \begin{lstlisting}[style=ocaml] 376 | Fiber.both 377 | (fun () -> 378 | for x = 1 to 1000 do 379 | traceln "x = %d" x; 380 | Fiber.yield () 381 | done 382 | ) 383 | (fun () -> failwith "Simulated error") 384 | 385 | +x = 1 386 | Exception: Failure "Simulated error" 387 | \end{lstlisting} 388 | \end{frame} 389 | 390 | \begin{frame} 391 | \frametitle{Parsing benchmark} 392 | \includegraphics[width=\textwidth]{parsing.png} 393 | Parsing 100,000,000 bytes, one at a time: 394 | \begin{itemize} 395 | \item With a 4096-byte buffer (3.7x faster) 396 | \item With a 512-byte buffer (7.1x faster) 397 | \item With four runs in parallel (7.4x faster) 398 | \end{itemize} 399 | \end{frame} 400 | 401 | \end{document} 402 | 
-------------------------------------------------------------------------------- /wasm-wg2022-stack-switching/slides.tex: -------------------------------------------------------------------------------- 1 | \documentclass{beamer} 2 | \usepackage{graphicx} 3 | \usepackage{verbatim} 4 | \usepackage{alltt} 5 | \usepackage{xcolor} 6 | \usepackage{listings} 7 | \usepackage{hyperref} 8 | \setbeamertemplate{navigation symbols}{}%remove navigation symbols 9 | 10 | \renewcommand\UrlFont{\color{blue}} 11 | 12 | \newcommand\inputml[1]{\lstinputlisting[language={[Objective]Caml}]{#1}} 13 | 14 | \lstdefinelanguage{OCaml}{ 15 | keywords={ 16 | and,as,assert,asr,begin,class,constraint,do,done,downto,effect,else,end,exception, 17 | external,false,for,fun,function,functor,if,implicit,in,include,inherit,initializer, 18 | land,lazy,let,lor,lsl,lsr,lxor,macro,match,method,mod,module,mutable,new,object, 19 | of,open,or,private,rec,sig,struct,then,to,true,try,type,val,virtual,when, 20 | with,while}, 21 | comment=[s]{(*\ }{\ *)}, 22 | } 23 | 24 | \definecolor{darkgreen}{rgb}{0,0.2,0} 25 | \definecolor{darkblue}{rgb}{0.1,0.1,0.8} 26 | \definecolor{darkbrown}{rgb}{0.5,0.3,0.0} 27 | \definecolor{grey}{rgb}{0.5,0.5,0.5} 28 | \definecolor{darkgrey}{rgb}{0.2,0.2,0.2} 29 | 30 | \lstdefinestyle{ocaml}{ 31 | basicstyle=\ttfamily, % \small 32 | basewidth=0.5em, 33 | commentstyle=\color{darkgreen}, 34 | escapeinside={(**}{)}, 35 | keywordstyle=\color{darkblue}, 36 | language=OCaml, 37 | morekeywords={macro}, 38 | stringstyle=\color{blue}, 39 | showstringspaces=false, 40 | mathescape=true, 41 | moredelim=**[is][]{?}{?}, 42 | moredelim=**[is][]{&}{&}, 43 | } 44 | 45 | \lstdefinestyle{output}{ 46 | basicstyle=\ttfamily\small, 47 | basewidth=0.5em, 48 | } 49 | 50 | \lstset{literate=% 51 | {->}{{$\to$}}2 52 | {...}{{$\ldots$}}2 53 | } 54 | 55 | \newcommand\mlkeyword[1]{{\ttfamily\color{darkblue} #1}} 56 | 57 | \title[Effects]{Experiences with Effects in OCaml 5.0} 58 | \author[Anil Madhavapeddy and 
Thomas Leonard] 59 | {Anil Madhavapeddy (speaker) \and Thomas Leonard (speaker) \and Craig Ferguson\and Patrick Ferris\and Sadiq Jaffer\and Tom Kelly \and KC Sivaramakrishnan} 60 | \institute{University of Cambridge and Tarides} 61 | \date[Wasm Stack Switching WG]{WASM Stack Switching WG, Feb 2022} 62 | 63 | \begin{document} 64 | 65 | \definecolor{grey}{gray}{0.6} 66 | 67 | \frame{\titlepage} 68 | 69 | % \begin{frame} 70 | % \frametitle{Table of Contents} 71 | % \tableofcontents 72 | % \end{frame} 73 | 74 | \begin{frame} 75 | \frametitle{Overview} 76 | \begin{itemize} 77 | \item {Background: OCaml 5.0} 78 | \item Introduction to effects 79 | \item Case study: Converting the Angstrom parser 80 | \item Eio concurrency library 81 | \end{itemize} 82 | \end{frame} 83 | 84 | \begin{frame}[fragile] 85 | \frametitle{Background: OCaml and the road to 5.0} 86 | \begin{itemize} 87 | \item Industrial-grade functional programming language, first released in 1996 and continuously developed since then. 88 | \item Compiles native code binaries for x86/arm/ppc/riscv in 32- and 64-bit. 89 | \item Also has a portable bytecode compiler that just needs a C compiler, and can 90 | be compiled to JavaScript (js\_of\_ocaml). 91 | \item OCaml 5.0 will feature multicore parallelism, and also untyped effects. 92 | \begin{itemize} 93 | \item \textbf{Pros:} High-performance direct-style code with a GC (this talk) 94 | \item \textbf{Cons:} How do we retain portability to JavaScript and Wasm? 95 | \end{itemize} 96 | \end{itemize} 97 | \end{frame} 98 | 99 | \begin{frame}[fragile] 100 | \frametitle{Background: asynchrony in OCaml 4 and earlier} 101 | \begin{itemize} 102 | \item OCaml 4 is single-threaded with no first-class support for concurrency. 103 | \item IO concurrency has been expressed for years via user-level libraries that 104 | allow for futures to be expressed succinctly. 
105 | \item Two widely adopted libraries are: 106 | \begin{itemize} 107 | \item Lwt (usually for web programming) 108 | \item Async (used by Jane Street in their production usage of OCaml) 109 | \end{itemize} 110 | \end{itemize} 111 | \end{frame} 112 | 113 | 114 | \begin{frame}[fragile] 115 | \frametitle{OCaml 4.0 Lwt example} 116 | \begin{lstlisting}[style=ocaml] 117 | let foo ~stdin total = 118 | Lwt_io.read_line stdin >>= fun n -> 119 | Lwt_io.printlf "n/total = %d" 120 | (int_of_string n / total) 121 | \end{lstlisting} 122 | \begin{lstlisting}[style=output] 123 | Fatal error: exception Division_by_zero 124 | Raised at Lwt_example.foo in file "lwt_example.ml", line 6 125 | Called from Lwt.[...].callback in file "src/core/lwt.ml", ... 126 | \end{lstlisting} 127 | \begin{itemize} 128 | \item Backtrace doesn't say what called \verb|foo| 129 | \item Closure with \verb|total| allocated on the heap 130 | \item Type of function \verb|foo| appends an \verb|Lwt.t| 131 | \end{itemize} 132 | \end{frame} 133 | 134 | \begin{frame}[fragile] 135 | \frametitle{OCaml 5.0 effects-based example} 136 | \begin{lstlisting}[style=ocaml] 137 | let foo ~stdin total = 138 | let n = read_line stdin in 139 | traceln "n/total = %d" 140 | (int_of_string n / total) 141 | \end{lstlisting} 142 | \begin{lstlisting}[style=output] 143 | Fatal error: exception Division_by_zero 144 | Raised at Eio_example.foo in file "eio_example.ml", line 11 145 | Called from Eio_example.bar in file "eio_example.ml", line 15 146 | ... 
147 | \end{lstlisting} 148 | \begin{itemize} 149 | \item Backtrace is entirely accurate now 150 | \item Only stack allocation needed for the blocking I/O 151 | \item Type of function is no longer affected by use of IO 152 | \end{itemize} 153 | 154 | \end{frame} 155 | 156 | \begin{frame}[fragile] 157 | \frametitle{Introduction to effects} 158 | \begin{columns}[t] 159 | \begin{column}{4.5cm} 160 | \begin{itemize} 161 | \item Resumable exceptions 162 | \item Multiple stacks 163 | \end{itemize} 164 | \end{column} 165 | \begin{column}{5cm} 166 | \begin{lstlisting}[style=ocaml] 167 | effect Foo : int -> int 168 | 169 | try 170 | println "step 1"; 171 | let x = perform (Foo 2) in 172 | println "step %d" x 173 | with effect (Foo n) k -> 174 | println "step %d" n; 175 | continue k (n + 1) 176 | \end{lstlisting} 177 | \end{column} 178 | \end{columns} 179 | \end{frame} 180 | 181 | \begin{frame} 182 | \frametitle{Advantages of effects} 183 | \begin{itemize} 184 | \item No difference between sequential and concurrent code. 185 | \begin{itemize} 186 | \item No special monad syntax. 187 | \item Can use \mlkeyword{try}, \mlkeyword{match}, \mlkeyword{while}, etc. 188 | \item No separate lwt or async versions of code. 189 | \end{itemize} 190 | \item No heap allocations needed to simulate a stack. 191 | \item A real stack means backtraces and profiling tools work. 
192 | \end{itemize} 193 | \end{frame} 194 | 195 | \begin{frame} 196 | \frametitle{Case study: Angstrom} 197 | 198 | \url{https://github.com/inhabitedtype/angstrom/} 199 | \bigskip 200 | \begin{itemize} 201 | \item A library for writing parsers 202 | \item Designed for network protocols 203 | \item Strong focus on performance 204 | \end{itemize} 205 | \end{frame} 206 | 207 | \begin{frame}[fragile] 208 | \frametitle{A toy parser} 209 | \begin{lstlisting}[style=ocaml] 210 | type 'a parser = state -> 'a 211 | 212 | let any_char state = 213 | ensure 1 state; 214 | let c = Input.unsafe_get_char state.input state.pos in 215 | state.pos <- state.pos + 1; 216 | c 217 | 218 | let (*>) a b state = 219 | let _ = a state in 220 | b state 221 | \end{lstlisting} 222 | \end{frame} 223 | 224 | \begin{frame}[fragile] 225 | \frametitle{The Angstrom parser type} 226 | \begin{lstlisting}[style=ocaml,basicstyle=\ttfamily\small] 227 | module State = struct 228 | type 'a t = 229 | | Partial of 'a partial 230 | | Lazy of 'a t Lazy.t 231 | | Done of int * 'a 232 | | Fail of int * string list * string 233 | and 'a partial = 234 | { committed : int; 235 | continue : Bigstringaf.t -> 236 | off:int -> len:int -> More.t -> 'a t } 237 | end 238 | type 'a with_state = Input.t -> int -> More.t -> 'a 239 | type 'a failure = 240 | (string list -> string -> 'a State.t) with_state 241 | type ('a, 'r) success = ('a -> 'r State.t) with_state 242 | type 'a parser = { run : 'r. 
243 | ('r failure -> ('a, 'r) success -> 'r State.t) with_state 244 | } 245 | \end{lstlisting} 246 | \end{frame} 247 | 248 | \begin{frame}[fragile] 249 | \frametitle{Angstrom parsers} 250 | \begin{lstlisting}[style=ocaml,basicstyle=\ttfamily\small] 251 | let any_char = 252 | ensure 1 { run = fun input pos more _fail succ -> 253 | succ input (pos + 1) more 254 | (Input.unsafe_get_char input pos) 255 | } 256 | 257 | let (*>) a b = 258 | { run = fun input pos more fail succ -> 259 | let succ' input' pos' more' _ = 260 | b.run input' pos' more' fail succ in 261 | a.run input pos more fail succ' 262 | } 263 | \end{lstlisting} 264 | \end{frame} 265 | 266 | \begin{frame}[fragile] 267 | \frametitle{Angstrom : effects branch} 268 | \url{https://github.com/talex5/angstrom/tree/effects} 269 | \bigskip 270 | \begin{lstlisting}[style=ocaml] 271 | type 'a parser = state -> 'a 272 | 273 | let any_char state = 274 | ensure 1 state; 275 | let c = Input.unsafe_get_char state.input state.pos in 276 | state.pos <- state.pos + 1; 277 | c 278 | 279 | let (*>) a b state = 280 | let _ = a state in 281 | b state 282 | \end{lstlisting} 283 | \end{frame} 284 | 285 | \begin{frame}[fragile] 286 | \frametitle{Parser micro-benchmark} 287 | \begin{lstlisting}[style=ocaml] 288 | let parser = skip_many any_char 289 | \end{lstlisting} 290 | \bigskip 291 | \begin{table} 292 | \begin{tabular}{l|rrrrr} 293 | & Time & MinWrds & MajWrds \\ 294 | \hline 295 | Callbacks & 750.63ms & 160.04Mw & 8,9944.00kw \\ 296 | \uncover<2>{Callbacks'} & \uncover<2>{180.73ms} & \uncover<2>{220.01Mw} & \uncover<2>{9,659.00w} \\ 297 | Effects & 57.81ms & - & - 298 | \end{tabular} 299 | \end{table} 300 | \bigskip 301 | \uncover<1>{1}3 times faster! 
302 | \pause 303 | \end{frame} 304 | 305 | \begin{frame}[fragile] 306 | \frametitle{Realistic parser benchmark} 307 | Parsing an HTTP request shows smaller gains: 308 | \bigskip 309 | \begin{table} 310 | \begin{tabular}{l|rrrrr} 311 | & Time & MinWrds & MajWrds \\ 312 | \hline 313 | Callbacks & 60.30ms & 9.28Mw & 102.08kw \\ 314 | Effects & 50.71ms & 2.13Mw & 606.30w 315 | \end{tabular} 316 | \end{table} 317 | \end{frame} 318 | 319 | \begin{frame}[fragile] 320 | \frametitle{Using effects for backwards compatibility} 321 | \begin{lstlisting}[style=ocaml] 322 | effect Read : int -> state 323 | let read c = perform (Read c) 324 | 325 | let parse p = 326 | let buffering = Buffering.create () in 327 | try Unbuffered.parse ~read p 328 | with effect (Read committed) k -> 329 | Buffering.shift buffering committed; 330 | Partial (fun input -> 331 | Buffering.feed_input buffering input; 332 | continue k (Buffering.for_reading buffering) 333 | ) 334 | \end{lstlisting} 335 | (simplified) 336 | \end{frame} 337 | 338 | \begin{frame} 339 | \frametitle{Angstrom summary} 340 | \begin{itemize} 341 | \item Slightly faster 342 | \item Much simpler code 343 | \item No effects in interface 344 | \item Can convert between callbacks and effects easily 345 | \end{itemize} 346 | \end{frame} 347 | 348 | \begin{frame} 349 | \frametitle{Eio : an IO library using effects for concurrency} 350 | \begin{itemize} 351 | \item Alternative to Lwt and Async 352 | \item Generic API that performs effects 353 | \item Cross-platform libuv effect handler 354 | \item High-performance io-uring handler for Linux 355 | \end{itemize} 356 | \end{frame} 357 | 358 | \begin{frame}[fragile] 359 | \frametitle{Eio example} 360 | \begin{lstlisting}[style=ocaml] 361 | let handle_connection = 362 | Httpaf_eio.Server.create_connection_handler 363 | ~config 364 | ~request_handler 365 | ~error_handler 366 | 367 | let main ~net = 368 | Switch.top @@ fun sw -> 369 | let socket = Eio.Net.listen ~sw net (`Tcp (host, port)) 370 | 
~reuse_addr:true 371 | ~backlog:1000 372 | in 373 | while true do 374 | Eio.Net.accept_sub ~sw socket handle_connection 375 | ~on_error:log_connection_error 376 | done 377 | \end{lstlisting} 378 | \end{frame} 379 | 380 | \begin{frame} 381 | \frametitle{HTTP benchmark} 382 | \includegraphics[width=\textwidth]{rps-graph.png} 383 | 100 concurrent connections. Servers limited to 1 core. 384 | \end{frame} 385 | 386 | \begin{frame} 387 | \frametitle{Eio : other features} 388 | \begin{itemize} 389 | \item Structured concurrency 390 | \item OCaps security model 391 | \item Tracing support 392 | \item Supports multiple cores 393 | \item Still experimental 394 | \end{itemize} 395 | \includegraphics[width=\textwidth]{trace.png} 396 | \end{frame} 397 | 398 | \begin{frame} 399 | \frametitle{Eio : migrating from old-style code} 400 | Switching to effects in OCaml 5.0 turns out to be great timing in the bigger picture. 401 | \begin{itemize} 402 | \item There has been a slow but steady shift to better system interfaces for async 403 | \item Io\_uring (Linux), Grand Central Dispatch (macOS), IOCP (Windows) 404 | \item These all make effect-based IO incredibly straightforward and elegant. 405 | \end{itemize} 406 | \end{frame} 407 | 408 | \begin{frame} 409 | \frametitle{Eio : post-POSIX, with io\_uring} 410 | \includegraphics[width=\textwidth]{uring.png} 411 | \end{frame} 412 | 413 | \begin{frame} 414 | \frametitle{Eio : post-POSIX} 415 | \begin{itemize} 416 | \item No more fd-set management and scalability bottlenecks, so great time for new post-POSIX interfaces 417 | \item \textbf{Concurrency-friendly:} Just stash a single-shot continuation and call it when IO is ready, or raise exception if IO is cancelled. 418 | \item \textbf{Parallel-friendly:} Push batch onto a shared memory ring and get responses back with one syscall. 419 | \item \textbf{Hardware-friendly:} Very similar to hypervisor-level interfaces, but from userspace. 
420 | \end{itemize} 421 | \end{frame} 422 | 423 | \begin{frame} 424 | \frametitle{Summary of OCaml 5.0 and our use of effects} 425 | \begin{itemize} 426 | \item Concurrency with effects works very well and is ergonomic to program with 427 | \item Effects have very good performance (stack vs heap) 428 | \item The use of separate effect schedulers is still emerging, but there are dozens 429 | of networking/storage OCaml libraries being ported currently, with little drama. 430 | \item Key open blocker for our community is Js/Wasm compilation support: 431 | \textbf{effects are here to stay in OCaml 5.0, so what's the best path forward?} 432 | \end{itemize} 433 | \bigskip 434 | \url{https://github.com/ocaml-multicore/eio} documentation shows how to try out OCaml effects. 435 | \url{https://github.com/patricoferris/awesome-multicore-ocaml} lists community libraries. 436 | \end{frame} 437 | 438 | % Backup slides 439 | 440 | 441 | 442 | \end{document} 443 | --------------------------------------------------------------------------------