├── .github └── ISSUE_TEMPLATE │ ├── bug_report.md │ └── feature_request.md ├── .gitignore ├── 00a_livebook_for_python_jupyter.livemd ├── 01a_matmul_using_CPU.livemd ├── 01g_matmul_EXLA_gpu.livemd ├── 01h_matmul_Torchx_gpu.livemd ├── 01i_matmul_Exla_cpu.livemd ├── ElixirFashionML_Challenge ├── fashion_mnist_challenge.livemd └── fashion_mnist_sean_m.livemd └── README.md /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Smartphone (please complete the following information):** 32 | - Device: [e.g. iPhone6] 33 | - OS: [e.g. iOS8.1] 34 | - Browser [e.g. stock browser, safari] 35 | - Version [e.g. 22] 36 | 37 | **Additional context** 38 | Add any other context about the problem here. 39 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /_build 2 | /cover 3 | /deps 4 | /doc 5 | /.fetch 6 | erl_crash.dump 7 | *.ez 8 | *.beam 9 | /config/*.secret.exs 10 | .elixir_ls/ 11 | -------------------------------------------------------------------------------- /00a_livebook_for_python_jupyter.livemd: -------------------------------------------------------------------------------- 1 | # Elixir/Livebook for Python/Jupyter Developers 2 | 3 | ## Quick Overview 4 | 5 | Python/Jupyter focused developers that are casually looking at Elixir and Livebook notebooks will see concepts that kind of look the same but are different. Knowing the key differences could help ease understanding about key Elixir concepts in the notebook. The goal of this guide is to help Python focused people look at a Livebook notebook and grasp what is happening in the notebook. 6 | 7 | ## Installing Livebook 8 | 9 | We've had good success with installing Livebook.dev, https://livebook.dev/#install. There are native applications for Windows and Mac. 
On a Linux system, we go to the Github site, https://github.com/livebook-dev/livebook, and install via Escript. Don't forget to set the shims. Also, when running on a local Linux server, the firewall ports for Kino and other interactive cells are different from the Livebook server port. Be sure to pay attention to the environment variable options in the Readme.md.
10 | 
11 | Let's discuss how Livebook/Elixir is a little different from Jupyter/Python.
12 | 
13 | ## Function vs Object Oriented
14 | 
15 | Elixir is a functional language. State exists outside of a function, with values passed into a function. Most Elixir functions will probably transform the inputs and then return an output. You could think of them as procedures that have everything passed in, don't update any object state, and return the result.
16 | 
17 | However, there are a few situations where state is held after a function call. In Elixir, we think of these functions as having a side effect. Common side effects are storing data in a database, writing to a file, or using other operating system resources. The database "write" and file "write" functions result in changing a resource that can later be retrieved. There are several other examples of side-effect situations. We'll even see a few examples in Elixir machine learning libraries.
18 | 
19 | Elixir has modules that hold function definitions and may define a data structure. Python has class definitions that hold state and function definitions. Where a variable can have a method invocation in Python, e.g. list_a.sum(), Elixir values must be passed as arguments into a module's function, e.g. Enum.sum(list_b).
20 | 
21 | ## Immutable state in Elixir
22 | 
23 | For machine learning notebooks, state is referenced in variable names specific to the notebook. This is very similar to how variable state is held in a Jupyter notebook. The variable values are held by the notebook until the Elixir notebook is closed.
24 | 
25 | One pretty big difference in Elixir is that all state is immutable. A function can receive state as an argument; however, the value is immutable, so it can't be changed. The function may transform the information, but any transformation must be returned as a newly created value. One convenient approach in Elixir is to assign the result of the function call back to the same variable name, but there are better conventions that we'll see below.
26 | 
27 | 
28 | 
29 | ```elixir
30 | list_b = [1,2,3]
31 | list_b = Enum.map(list_b, fn(value) -> value * value end)
32 | ```
33 | 
34 | Like Jupyter notebooks, shift-*return key* will execute the current cell. The other keyboard shortcuts can be found in the keypad-like icon on the left bar. There is also a mouse approach with the > Execute button that appears above the active cell; clicking the Execute button will also work. If you've already installed Livebook, try executing the following code cell.
35 | 
36 | ```elixir
37 | list_b = [1, 2, 3]
38 | list_b = Enum.map(list_b, fn value -> value * value end)
39 | ```
40 | 
41 | ## Chaining function calls
42 | 
43 | In many object-oriented languages, method calls can be chained sequentially, e.g. array_a.square().sum()
44 | 
45 | Elixir has a special notation for chaining function calls together.
46 | 
47 | 
48 | 
49 | ```elixir
50 | list_b
51 | |> Enum.map(fn(value) -> value * value end)
52 | |> Enum.sum()
53 | ```
54 | 
55 | The |> pipe operator takes the result of the previous expression and passes it as the first argument to the following function call.
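To make the rewriting explicit, here is the same computation written with nested calls and with the pipe; the two forms are equivalent:

```elixir
# Nested calls read inside-out...
Enum.sum(Enum.map(list_b, fn value -> value * value end))

# ...while the pipe reads top-to-bottom, passing each result
# as the first argument of the next call
list_b
|> Enum.map(fn value -> value * value end)
|> Enum.sum()
```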
Note that the first argument, a list or enumerable, for Enum.map and Enum.sum isn't shown because the pipe operator represents the output from the previous line of code.
56 | 
57 | ```elixir
58 | list_b
59 | |> Enum.map(fn value -> value * value end)
60 | |> Enum.sum()
61 | ```
62 | 
63 | ```elixir
64 | # All in one line also works
65 | list_b |> Enum.map(fn value -> value * value end) |> Enum.sum()
66 | ```
67 | 
68 | ## Elixir functions in modules
69 | 
70 | As long as the code in a Livebook is calling existing functions, variable assignment works pretty much like it does in Python/Jupyter. However, Python supports the definition of standalone functions in notebooks.
71 | 
72 | ```python
73 | def chunks(x, sz):
74 |     for i in range(0, len(x), sz): yield x[i:i+sz]
75 | ```
76 | 
77 | All Elixir function definitions must be inside a module definition.
78 | 
79 | 
80 | 
81 | ```elixir
82 | defmodule ModA do
83 |   def funct_a() do
84 |   end
85 | end
86 | ```
87 | 
88 | 
89 | 
90 | Elixir has an anonymous function capability. In the above Enum function call, the fn(something) -> transform(something) end is creating an anonymous function, like Python's lambda. Anonymous functions can be assigned to variable names and called. Note the .(args) when the named anonymous function is called.
91 | 
92 | 
93 | 
94 | ```elixir
95 | sum_of_squares = fn(value) ->
96 |   Enum.map(value, fn(v) -> v * v end)
97 |   |> Enum.sum()
98 | end
99 | 
100 | sum_of_squares.(list_b)
101 | ```
102 | 
103 | ```elixir
104 | sum_of_squares = fn value ->
105 |   Enum.map(value, fn v -> v * v end)
106 |   |> Enum.sum()
107 | end
108 | 
109 | sum_of_squares.(list_b)
110 | ```
111 | 
112 | ## Livebook module version management
113 | 
114 | One item to note about Livebook: modules are installed with a version. Rather than a requirements.txt for an entire folder of Jupyter notebooks, the module dependencies are defined within each notebook. The Livebook convention is to use the first cell to define any module dependencies. In this notebook, the basic Elixir language capabilities were sufficient, so no modules were Mix.installed. Watch for the contents of the first cell to explore the modules used in notebooks. Pinning the specific module versions helps with the repeatability challenges of notebooks. However, you'll note that the Elixir and Erlang versions are not defined in the notebook. Neither is the version of Livebook the notebook was run under. Operating system dependencies, like CUDA, cuDNN, cmake, make, etc., are not defined in notebooks either.
115 | 
116 | ## Livebook file format
117 | 
118 | Livebook's file format is a Markdown file. The use of a well-defined standard format allows support for understandable Git pull requests against the .livemd file.
119 | 
120 | ## Left sidebar and hints
121 | 
122 | We've already mentioned the keyboard shortcuts. Other icons represent the table of Section labels and the connected users. The lock captures secrets that you don't want stored in your Livebook. Secrets can be things like a database login, etc. The runtime settings are a more advanced area; we suggest finding the documentation or blog posts on how to use those settings. They don't have a strong mapping to Jupyter. Finally, a big hint: if you accidentally delete a cell, you can retrieve it from the bin/trash.
123 | 
124 | 
125 | 
126 | For the active cell, there are some icons above the cell to the right. The up and down arrows move the active cell up or down in your notebook.
We just noticed, in Livebook 0.7.2, that there is an icon to insert an image into a markdown cell. We'll need to try it out.
127 | 
128 | 
129 | 
130 | Another big hint: Livebook knows which cells are stale. If you go to the bottom of the notebook, or someplace in the middle of the notebook, and execute that cell, any cells that are out of date with your edits are executed down to your cell. This is one technique for executing all of the cells in a notebook. However, it doesn't force the re-execution of all cells; only stale cells are run. If you re-execute the first cell and then execute the last cell, all cells will be executed.
131 | 
132 | 
133 | 
134 | A Livebook notebook opened from a web resource will not be saved locally unless you instruct Livebook to save the notebook. Click on the floppy disk icon in the lower right and choose someplace you want to store the notebook.
135 | 
136 | ## Try out some Livebook notebooks
137 | 
138 | This hasn't been a complete guide to Livebook, but hopefully it provides some context for your exploration of Elixir and Livebook. Have fun!
139 | 
--------------------------------------------------------------------------------
/01g_matmul_EXLA_gpu.livemd:
--------------------------------------------------------------------------------
1 | # Matrix multiplication on GPU - XLA
2 | 
3 | ```elixir
4 | Mix.install(
5 |   [
6 |     {:nx, "~> 0.4.0"},
7 |     {:scidata, "~> 0.1.9"},
8 |     {:axon, "~> 0.3.0"},
9 |     {:exla, "~> 0.4"}
10 |   ],
11 |   system_env: %{"XLA_TARGET" => "cuda111"}
12 | )
13 | ```
14 | 
15 | ## Before running notebook
16 | 
17 | This notebook has a dependency on EXLA. XLA supports systems with direct access to an NVidia GPU, AMD ROCm, or a Google TPU. According to the documentation, https://github.com/elixir-nx/nx/tree/main/exla#readme, EXLA will try to find a precompiled version that matches your system. If it doesn't find a match, you will need to install CUDA and CuDNN for your system.
18 | 
19 | The notebook is currently configured for an Nvidia GPU via
20 | 
21 | ```
22 | system_env: %{"XLA_TARGET" => "cuda111"}
23 | ```
24 | 
25 | Review the configuration documentation for more options. https://hexdocs.pm/exla/EXLA.html#module-configuration
26 | 
27 | We had to install CUDA and CuDNN, but that was several months ago. Your experience may vary from ours.
28 | 
29 | ## Context
30 | 
31 | This Livebook is a transformation of a Python Jupyter Notebook from Fast.ai's From Deep Learning Foundations to Stable Diffusion, Practical Deep Learning for Coders part 2, 2022. Specifically, it mimics the CUDA portion of https://github.com/fastai/course22p2/blob/master/nbs/01_matmul.ipynb
32 | 
33 | The purpose of the transformation is to bring the Fast.ai concepts to Elixir focused developers. The object-oriented Python/PyTorch implementation is transformed into a functional programming implementation using Nx and Axon.
34 | 
35 | ## Experimenting with backend control
36 | 
37 | In this notebook, we are going to experiment with swapping out backends in the same notebook. One of the strengths of Elixir's numerical processing approach is the concept of a backend. The same Nx code can run on several different backends. This allows Nx to adapt to changes in numerical libraries and technology. Currently, Nx has support for Tensorflow's XLA and PyTorch's TorchScript. Theoretically, backends for SOC type devices should be possible.
38 | 
39 | We chose not to set the backend globally throughout the notebook. At the beginning of the notebook, we'll repeat the approach we used in 01a_matmul_using_CPU.
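As an aside, if you did want a single backend for the whole notebook, Nx can set a global default. A minimal sketch (assuming EXLA is installed; this is not what this notebook does):

```elixir
# Set the default backend for the whole runtime
# (intentionally avoided in this notebook so we can compare backends)
Nx.global_default_backend(EXLA.Backend)

# Or set it only for the current process, optionally picking a client
Nx.default_backend({EXLA.Backend, client: :host})
```

Instead, the cells below switch backends explicitly so the same computation can be timed on each one.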
We begin with the Elixir Binary backend. You'll see that it isn't quick multiplying 10,000 rows of MNIST data by some arbitrary weights. We'll then repeat the same multiplication using an NVidia 1080Ti GPU. The 1080 Ti is not the fastest GPU, but it is tremendously faster than a "large" set of data on the BinaryBackend. 40 | 41 | * 31649.26 milliseconds using BinaryBackend with a CPU only. 42 | * 0.14 milliseconds using XLA with a warmed up GPU 43 | 44 | *226,000 times faster on an old GPU* 45 | 46 | ## Default - BinaryBackend 47 | 48 | ```elixir 49 | # Without choosing a backend, Nx defaults to Nx.BinaryBackend 50 | Nx.default_backend() 51 | ``` 52 | 53 | ```elixir 54 | # Just in case you rerun the notebook, let's make sure the default backend is BinaryBackend 55 | # Setting to the Nx default backend 56 | Nx.default_backend(Nx.BinaryBackend) 57 | Nx.default_backend() 58 | ``` 59 | 60 | We'll pull down the MNIST data 61 | 62 | ```elixir 63 | {train_images, train_labels} = Scidata.MNIST.download() 64 | ``` 65 | 66 | ```elixir 67 | {train_images_binary, train_tensor_type, train_shape} = train_images 68 | ``` 69 | 70 | ```elixir 71 | train_tensor_type 72 | ``` 73 | 74 | Convert into Tensors and normalize to between 0 and 1 75 | 76 | ```elixir 77 | train_tensors = 78 | train_images_binary 79 | |> Nx.from_binary(train_tensor_type) 80 | |> Nx.reshape({60000, 28 * 28}) 81 | |> Nx.divide(255) 82 | ``` 83 | 84 | We'll separate the data into 50,000 train images and 10,000 validation images. 85 | 86 | ```elixir 87 | x_train_cpu = train_tensors[0..49_999] 88 | x_valid_cpu = train_tensors[50_000..59_999] 89 | {x_train_cpu.shape, x_valid_cpu.shape} 90 | ``` 91 | 92 | Training is more stable when random numbers are initialized with a mean of 0.0 and a variance of 1.0 93 | 94 | ```elixir 95 | mean = 0.0 96 | variance = 1.0 97 | weights_cpu = Nx.random_normal({784, 10}, mean, variance, type: {:f, 32}) 98 | ``` 99 | 100 | In order to simplify timing the performance of the Nx.dot/2 function, we'll use an 0 parameter anonymous function. Invoking the anonymous function will always use the two parameters, x_valid_cpu and weights_cpu. 101 | 102 | ```elixir 103 | large_nx_mult_fn = fn -> Nx.dot(x_valid_cpu, weights_cpu) end 104 | ``` 105 | 106 | The following anonymous function takes function and the number of times to make the call to the function. 107 | 108 | ```elixir 109 | repeat = fn timed_fn, times -> Enum.each(1..times, fn _x -> timed_fn.() end) end 110 | ``` 111 | 112 | Timing the average duration of the dot multiply function to run. The cell will output the average and total elapsed time 113 | 114 | ```elixir 115 | repeat_times = 5 116 | {elapsed_time_micro, _} = :timer.tc(repeat, [large_nx_mult_fn, repeat_times]) 117 | avg_elapsed_time_ms = elapsed_time_micro / 1000 / repeat_times 118 | 119 | {backend, _device} = Nx.default_backend() 120 | 121 | "#{backend} CPU avg time in #{avg_elapsed_time_ms} milliseconds, total_time #{elapsed_time_micro / 1000} milliseconds" 122 | ``` 123 | 124 | ## XLA using GPU 125 | 126 | We'll switch to the XLA backend and use the cuda device. If you have a different device, replace all the :cuda specifications with your device. 127 | 128 | ```elixir 129 | Nx.default_backend({EXLA.Backend, device: :cuda}) 130 | Nx.default_backend() 131 | ``` 132 | 133 | In the following cell, we transfer the target data onto the GPU. 
134 | 135 | ```elixir 136 | x_valid_cuda = Nx.backend_transfer(x_valid_cpu, {EXLA.Backend, client: :cuda}) 137 | weights_cuda = Nx.backend_transfer(weights_cpu, {EXLA.Backend, client: :cuda}) 138 | ``` 139 | 140 | An anonymous function that calls Nx.dot/2 with data on the GPU 141 | 142 | ```elixir 143 | exla_gpu_mult_fn = fn -> Nx.dot(x_valid_cuda, weights_cuda) end 144 | ``` 145 | 146 | We'll warm up the GPU by looping through 5 function calls and then timing the next 5 147 | function calls. 148 | 149 | ```elixir 150 | repeat_times = 5 151 | # Warm up one epoch 152 | {elapsed_time_micro, _} = :timer.tc(repeat, [exla_gpu_mult_fn, repeat_times]) 153 | # The real timing starts here 154 | {elapsed_time_micro, _} = :timer.tc(repeat, [exla_gpu_mult_fn, repeat_times]) 155 | avg_elapsed_time_ms = elapsed_time_micro / 1000 / repeat_times 156 | 157 | {backend, [device: device]} = Nx.default_backend() 158 | 159 | "#{backend} #{device} avg time in #{avg_elapsed_time_ms} milliseconds total_time #{elapsed_time_micro / 1000} milliseconds" 160 | ``` 161 | 162 | ```elixir 163 | x_valid_cpu = Nx.backend_transfer(x_valid_cuda, Nx.BinaryBackend) 164 | weights_cpu = Nx.backend_transfer(weights_cuda, Nx.BinaryBackend) 165 | ``` 166 | -------------------------------------------------------------------------------- /01h_matmul_Torchx_gpu.livemd: -------------------------------------------------------------------------------- 1 | # Matrix multiplication on GPU - TorchScript 2 | 3 | ```elixir 4 | Mix.install( 5 | [ 6 | {:nx, "~> 0.4.0"}, 7 | {:scidata, "~> 0.1.9"}, 8 | {:torchx, "~> 0.3"} 9 | ], 10 | system_env: %{"LIBTORCH_TARGET" => "cu116"} 11 | ) 12 | ``` 13 | 14 | 15 | 16 | ``` 17 | :ok 18 | ``` 19 | 20 | ## Before running notebook 21 | 22 | This notebook has a dependency on TorchScript. Torchx can use your CPU or GPU. If you have direct access to an NVidia GPU, the notebook has a section on running matrix multiplication on a GPU. If you only have a CPU, you can comment out the last GPU section and just run on your CPU. CPU is still pretty fast for this simple notebook. 23 | 24 | According to the documentation, https://github.com/elixir-nx/nx/tree/main/torchx#readme Torchx will need to compile the TorchScript binding. Before you run the above cell, you will need make/nmake, cmake (3.12+) and a C++ compiler. The Windows binding to TorchScript is also supported and more information can be found at the Torchx readme. At this time, the MacOS binding doesn't support access to a GPU. 25 | 26 | **Running the first cell downloads and compiles the binding to TorchScript. The download of TorchScript took about 9 minutes and compilation took about 1 minute on our system.** In the future, it is likely that the downloaded TorchScript file will be cached locally, however, right now each notebook that uses torchx will download the file. 27 | 28 | The notebook is currently set up for an Nvidia GPU on Linux. 29 | 30 | ``` 31 | system_env: %{"LIBTORCH_TARGET" => "cu111"} 32 | ``` 33 | 34 | Feel free to read the Torchx documentation and modify to fit your needs. 35 | 36 | ## Context 37 | 38 | The notebook is a transformation of a Python Jupyter Notebook from Fast.ai's [From Deep Learning Foundations to Stable Diffusion](https://www.fast.ai/posts/part2-2022.html), Practical Deep Learning for Coders part 2, 2022. Specifically, it mimics the CUDA portion of https://github.com/fastai/course22p2/blob/master/nbs/01_matmul.ipynb 39 | 40 | The purpose of the transformation is to bring the Fast.ai concepts to Elixir focused developers. 
The object-oriented Python/PyTorch implementation is transformed into a functional programming implementation using [Nx](https://github.com/elixir-nx/nx) and [Axon](https://github.com/elixir-nx/axon) 41 | 42 | ## Experimenting with backend control 43 | 44 | In this notebook, we are going to experiment with swapping out backends in the same notebook. One of the strengths of Elixir's numerical processing approach is the concept of a backend. The same Nx code can run on several different backends. This allows Nx to adapt to changes in numerical libaries and technology. Currently, Nx has support for Tensorflow's XLA and PyTorch's TorchScript. Theoretically, backends for SOC type devices should be possible. 45 | 46 | We chose not to set the backend globally in this notebook. At the beginning of the notebook, we'll repeat the approach we used in 01a_matmul_using_CPU. We begin with the Elixir Binary backend. You'll see that it isn't quick multiplying 10,000 rows of MNIST data by some arbitrary weights. 47 | 48 | We'll then repeat the same multiplication using TorchScript on the CPU. Followed again by TorchScript using an NVidia 1080Ti GPU. The 1080 Ti is not the fastest GPU, but it is tremendously faster than a "large" set of data on the BinaryBackend but only a little faster than just the CPU 49 | 50 | * About 32 seconds using BinaryBackend with only a CPU. 51 | * 1.8 milliseconds using TorchScript with only a CPU 52 | 53 | 17,778 times faster than Binary backend 54 | 55 | * 70 microseconds using TorchScript with a warmed up, but old, GPU 56 | 57 | 111 times faster on the GPU vs the CPU 58 | 59 | ## Default - BinaryBackend 60 | 61 | ```elixir 62 | # Without choosing a backend, Nx defaults to Nx.BinaryBackend 63 | Nx.default_backend() 64 | ``` 65 | 66 | 67 | 68 | ``` 69 | {Nx.BinaryBackend, []} 70 | ``` 71 | 72 | ```elixir 73 | # Just in case you rerun the notebook, let's make sure the default backend is BinaryBackend 74 | # Setting to the Nx default backend 75 | Nx.default_backend(Nx.BinaryBackend) 76 | Nx.default_backend() 77 | ``` 78 | 79 | 80 | 81 | ``` 82 | {Nx.BinaryBackend, []} 83 | ``` 84 | 85 | We'll pull down the MNIST data 86 | 87 | ```elixir 88 | {train_images, train_labels} = Scidata.MNIST.download() 89 | {test_images, test_labels} = Scidata.MNIST.download_test() 90 | ``` 91 | 92 | 93 | 94 | ``` 95 | {{<<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 96 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...>>, {:u, 8}, {10000, 1, 28, 28}}, 97 | {<<7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4, 9, 6, 6, 5, 4, 0, 7, 4, 0, 1, 3, 1, 98 | 3, 4, 7, 2, 7, 1, 2, 1, 1, 7, 4, 2, 3, 5, 1, ...>>, {:u, 8}, {10000}}} 99 | ``` 100 | 101 | ```elixir 102 | {train_images_binary, train_tensor_type, train_shape} = train_images 103 | {test_images_binary, test_tensor_type, test_shape} = test_images 104 | ``` 105 | 106 | 107 | 108 | ``` 109 | {<<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 110 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...>>, {:u, 8}, {10000, 1, 28, 28}} 111 | ``` 112 | 113 | ```elixir 114 | {train_tensor_type, test_tensor_type} 115 | ``` 116 | 117 | 118 | 119 | ``` 120 | {{:u, 8}, {:u, 8}} 121 | ``` 122 | 123 | Convert into Tensors and normalize to between 0 and 1 124 | 125 | ```elixir 126 | train_tensors = 127 | train_images_binary 128 | |> Nx.from_binary(train_tensor_type) 129 | |> Nx.reshape({60000, 28 * 28}) 130 | |> Nx.divide(255) 131 | ``` 132 | 133 | 134 | 135 | 
``` 136 | #Nx.Tensor< 137 | f32[60000][784] 138 | [ 139 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...], 140 | ... 141 | ] 142 | > 143 | ``` 144 | 145 | We'll separate the data into 50,000 train images and 10,000 validation images. 146 | 147 | ```elixir 148 | x_train = train_tensors[0..49_999] 149 | x_valid = train_tensors[50_000..59_999] 150 | {x_train.shape, x_valid.shape} 151 | ``` 152 | 153 | 154 | 155 | ``` 156 | {{50000, 784}, {10000, 784}} 157 | ``` 158 | 159 | Training is more stable when random numbers are initialized with a mean of 0.0 and a variance of 1.0 160 | 161 | ```elixir 162 | mean = 0.0 163 | variance = 1.0 164 | weights = Nx.random_normal({784, 10}, mean, variance, type: {:f, 32}) 165 | ``` 166 | 167 | 168 | 169 | ``` 170 | #Nx.Tensor< 171 | f32[784][10] 172 | [ 173 | [1.182692050933838, 1.6625404357910156, -0.598689079284668, -0.6435468196868896, 0.25204139947891235, -1.1432150602340698, -0.9701210260391235, 1.9566036462783813, -0.6923237442970276, -1.0753910541534424], 174 | [0.17891690135002136, 0.42717286944389343, -0.9910821914672852, -2.649228096008301, 0.13641099631786346, 0.48691749572753906, -1.0575640201568604, 0.40385302901268005, 0.5131964683532715, 0.41488444805145264], 175 | [2.100423574447632, -1.2787413597106934, -1.8883213996887207, -0.49423742294311523, 0.5708040595054626, -0.48230457305908203, -0.19617703557014465, 0.7797456979751587, 0.7876895070075989, -0.33916765451431274], 176 | [-0.4369395673274994, 0.4421914517879486, 0.18007169663906097, 0.7891340255737305, 0.28369951248168945, -1.2312926054000854, -0.17864377796649933, -1.2232452630996704, 0.6976354718208313, 1.300831913948059], 177 | [-1.9821809530258179, 1.426361083984375, -2.2645328044891357, 0.26135173439979553, -0.36276111006736755, 2.7461342811584473, 0.007044021971523762, -0.18955571949481964, 0.6062670946121216, -0.4373891055583954], 178 | ... 179 | ] 180 | > 181 | ``` 182 | 183 | In order to simplify timing the performance of the Nx.dot/2 function, we'll use an 0 parameter anonymous function. Invoking the anonymous function will always use the two parameters, x_valid_cpu and weights_cpu. 184 | 185 | ```elixir 186 | large_nx_mult_fn = fn -> Nx.dot(x_valid, weights) end 187 | ``` 188 | 189 | 190 | 191 | ``` 192 | #Function<43.3316493/0 in :erl_eval.expr/6> 193 | ``` 194 | 195 | The following anonymous function take a function and the number of times to make the call to the function. 196 | 197 | ```elixir 198 | repeat = fn timed_fn, times -> Enum.each(1..times, fn _x -> timed_fn.() end) end 199 | ``` 200 | 201 | 202 | 203 | ``` 204 | #Function<41.3316493/2 in :erl_eval.expr/6> 205 | ``` 206 | 207 | Timing the average duration of the dot multiply function to run. 
The cell will output the average and total elapsed time 208 | 209 | ```elixir 210 | repeat_times = 5 211 | {elapsed_time_micro, _} = :timer.tc(repeat, [large_nx_mult_fn, repeat_times]) 212 | avg_elapsed_time_ms = elapsed_time_micro / 1000 / repeat_times 213 | 214 | {backend, _device} = Nx.default_backend() 215 | 216 | "#{backend} CPU avg time in #{avg_elapsed_time_ms} milliseconds total_time #{elapsed_time_micro / 1000} milliseconds" 217 | ``` 218 | 219 | 220 | 221 | ``` 222 | "Elixir.Nx.BinaryBackend CPU avg time in 31846.6328 milliseconds total_time 159233.164 milliseconds" 223 | ``` 224 | 225 | ## TorchScript CPU only 226 | 227 | We'll switch to the TorchScript backend but we'll stick with using the CPU. 228 | 229 | ```elixir 230 | Nx.default_backend({Torchx.Backend, device: :cpu}) 231 | Nx.default_backend() 232 | ``` 233 | 234 | 235 | 236 | ``` 237 | {Torchx.Backend, [device: :cpu]} 238 | ``` 239 | 240 | In the following cell, we transfer the target data from BinaryBackend to Torchx cpu backend. 241 | 242 | ```elixir 243 | x_valid_torchx_cpu = Nx.backend_transfer(x_valid, {Torchx.Backend, device: :cpu}) 244 | weights_torchx_cpu = Nx.backend_transfer(weights, {Torchx.Backend, device: :cpu}) 245 | ``` 246 | 247 | 248 | 249 | ``` 250 | #Nx.Tensor< 251 | f32[784][10] 252 | Torchx.Backend(cpu) 253 | [ 254 | [1.182692050933838, 1.6625404357910156, -0.598689079284668, -0.6435468196868896, 0.25204139947891235, -1.1432150602340698, -0.9701210260391235, 1.9566036462783813, -0.6923237442970276, -1.0753910541534424], 255 | [0.17891690135002136, 0.42717286944389343, -0.9910821914672852, -2.649228096008301, 0.13641099631786346, 0.48691749572753906, -1.0575640201568604, 0.40385302901268005, 0.5131964683532715, 0.41488444805145264], 256 | [2.100423574447632, -1.2787413597106934, -1.8883213996887207, -0.49423742294311523, 0.5708040595054626, -0.48230457305908203, -0.19617703557014465, 0.7797456979751587, 0.7876895070075989, -0.33916765451431274], 257 | [-0.4369395673274994, 0.4421914517879486, 0.18007169663906097, 0.7891340255737305, 0.28369951248168945, -1.2312926054000854, -0.17864377796649933, -1.2232452630996704, 0.6976354718208313, 1.300831913948059], 258 | [-1.9821809530258179, 1.426361083984375, -2.2645328044891357, 0.26135173439979553, -0.36276111006736755, 2.7461342811584473, 0.007044021971523762, -0.18955571949481964, 0.6062670946121216, -0.4373891055583954], 259 | ... 260 | ] 261 | > 262 | ``` 263 | 264 | An anonymous function that calls Nx.dot/2 with data on the Torchx cpu backend. 265 | 266 | ```elixir 267 | torchx_cpu_mult_fn = fn -> Nx.dot(x_valid_torchx_cpu, weights_torchx_cpu) end 268 | ``` 269 | 270 | 271 | 272 | ``` 273 | #Function<43.3316493/0 in :erl_eval.expr/6> 274 | ``` 275 | 276 | We'll time using Torchx on the CPU. Notice the significant performance improvement over BinaryBackend while still using just the CPU. 277 | 278 | ```elixir 279 | repeat_times = 5 280 | {elapsed_time_micro, _} = :timer.tc(repeat, [torchx_cpu_mult_fn, repeat_times]) 281 | avg_elapsed_time_ms = elapsed_time_micro / 1000 / repeat_times 282 | 283 | {backend, [device: device]} = Nx.default_backend() 284 | 285 | "#{backend} #{device} avg time in milliseconds #{avg_elapsed_time_ms} total_time #{elapsed_time_micro / 1000}" 286 | ``` 287 | 288 | 289 | 290 | ``` 291 | "Elixir.Torchx.Backend cpu avg time in milliseconds 1.7149999999999999 total_time 8.575" 292 | ``` 293 | 294 | ## TorchScript using GPU 295 | 296 | We'll switch to using the cuda device. 
If you have a different device, replace all the :cuda specifications with your device. 297 | 298 | ```elixir 299 | Nx.default_backend({Torchx.Backend, device: :cuda}) 300 | Nx.default_backend() 301 | ``` 302 | 303 | 304 | 305 | ``` 306 | {Torchx.Backend, [device: :cuda]} 307 | ``` 308 | 309 | In the following cell, we transfer the target data onto the GPU. 310 | 311 | ```elixir 312 | x_valid_cuda = Nx.backend_transfer(x_valid, {Torchx.Backend, client: :cuda}) 313 | weights_cuda = Nx.backend_transfer(weights, {Torchx.Backend, client: :cuda}) 314 | ``` 315 | 316 | 317 | 318 | ``` 319 | #Nx.Tensor< 320 | f32[784][10] 321 | Torchx.Backend(cuda) 322 | [ 323 | [1.182692050933838, 1.6625404357910156, -0.598689079284668, -0.6435468196868896, 0.25204139947891235, -1.1432150602340698, -0.9701210260391235, 1.9566036462783813, -0.6923237442970276, -1.0753910541534424], 324 | [0.17891690135002136, 0.42717286944389343, -0.9910821914672852, -2.649228096008301, 0.13641099631786346, 0.48691749572753906, -1.0575640201568604, 0.40385302901268005, 0.5131964683532715, 0.41488444805145264], 325 | [2.100423574447632, -1.2787413597106934, -1.8883213996887207, -0.49423742294311523, 0.5708040595054626, -0.48230457305908203, -0.19617703557014465, 0.7797456979751587, 0.7876895070075989, -0.33916765451431274], 326 | [-0.4369395673274994, 0.4421914517879486, 0.18007169663906097, 0.7891340255737305, 0.28369951248168945, -1.2312926054000854, -0.17864377796649933, -1.2232452630996704, 0.6976354718208313, 1.300831913948059], 327 | [-1.9821809530258179, 1.426361083984375, -2.2645328044891357, 0.26135173439979553, -0.36276111006736755, 2.7461342811584473, 0.007044021971523762, -0.18955571949481964, 0.6062670946121216, -0.4373891055583954], 328 | ... 329 | ] 330 | > 331 | ``` 332 | 333 | An anonymous function that calls Nx.dot/2 with data on the GPU 334 | 335 | ```elixir 336 | torchx_gpu_mult_fn = fn -> Nx.dot(x_valid_cuda, weights_cuda) end 337 | ``` 338 | 339 | 340 | 341 | ``` 342 | #Function<43.3316493/0 in :erl_eval.expr/6> 343 | ``` 344 | 345 | We'll warm up the GPU by looping through 5 function calls and then timing the next 5 function calls. 
346 | 347 | ```elixir 348 | repeat_times = 5 349 | # Warmup 350 | {elapsed_time_micro, _} = :timer.tc(repeat, [torchx_gpu_mult_fn, repeat_times]) 351 | {elapsed_time_micro, _} = :timer.tc(repeat, [torchx_gpu_mult_fn, repeat_times]) 352 | avg_elapsed_time_ms = elapsed_time_micro / 1000 / repeat_times 353 | 354 | {backend, [device: device]} = Nx.default_backend() 355 | 356 | "#{backend} #{device} avg time in milliseconds #{avg_elapsed_time_ms} total_time #{elapsed_time_micro / 1000}" 357 | ``` 358 | 359 | 360 | 361 | ``` 362 | "Elixir.Torchx.Backend cuda avg time in milliseconds 0.0718 total_time 0.359" 363 | ``` 364 | 365 | ```elixir 366 | x_valid = Nx.backend_transfer(x_valid_cuda, Nx.BinaryBackend) 367 | weights = Nx.backend_transfer(weights_cuda, Nx.BinaryBackend) 368 | ``` 369 | 370 | 371 | 372 | ``` 373 | #Nx.Tensor< 374 | f32[784][10] 375 | [ 376 | [1.182692050933838, 1.6625404357910156, -0.598689079284668, -0.6435468196868896, 0.25204139947891235, -1.1432150602340698, -0.9701210260391235, 1.9566036462783813, -0.6923237442970276, -1.0753910541534424], 377 | [0.17891690135002136, 0.42717286944389343, -0.9910821914672852, -2.649228096008301, 0.13641099631786346, 0.48691749572753906, -1.0575640201568604, 0.40385302901268005, 0.5131964683532715, 0.41488444805145264], 378 | [2.100423574447632, -1.2787413597106934, -1.8883213996887207, -0.49423742294311523, 0.5708040595054626, -0.48230457305908203, -0.19617703557014465, 0.7797456979751587, 0.7876895070075989, -0.33916765451431274], 379 | [-0.4369395673274994, 0.4421914517879486, 0.18007169663906097, 0.7891340255737305, 0.28369951248168945, -1.2312926054000854, -0.17864377796649933, -1.2232452630996704, 0.6976354718208313, 1.300831913948059], 380 | [-1.9821809530258179, 1.426361083984375, -2.2645328044891357, 0.26135173439979553, -0.36276111006736755, 2.7461342811584473, 0.007044021971523762, -0.18955571949481964, 0.6062670946121216, -0.4373891055583954], 381 | ... 382 | ] 383 | > 384 | ``` 385 | -------------------------------------------------------------------------------- /01i_matmul_Exla_cpu.livemd: -------------------------------------------------------------------------------- 1 | # Matrix multiplication on CPU- XLA 2 | 3 | ```elixir 4 | Mix.install( 5 | [ 6 | {:nx, "~> 0.4.0"}, 7 | {:scidata, "~> 0.1.9"}, 8 | {:axon, "~> 0.3.0"}, 9 | {:exla, "~> 0.4"} 10 | ] 11 | ) 12 | ``` 13 | 14 | 15 | 16 | ``` 17 | Resolving Hex dependencies... 
18 | Dependency resolution completed: 19 | New: 20 | axon 0.3.0 21 | castore 0.1.18 22 | complex 0.4.2 23 | elixir_make 0.6.3 24 | exla 0.4.0 25 | jason 1.4.0 26 | nimble_csv 1.2.0 27 | nx 0.4.0 28 | scidata 0.1.9 29 | xla 0.3.0 30 | * Getting nx (Hex package) 31 | * Getting scidata (Hex package) 32 | * Getting axon (Hex package) 33 | * Getting exla (Hex package) 34 | * Getting elixir_make (Hex package) 35 | * Getting xla (Hex package) 36 | * Getting castore (Hex package) 37 | * Getting jason (Hex package) 38 | * Getting nimble_csv (Hex package) 39 | * Getting complex (Hex package) 40 | ==> jason 41 | Compiling 10 files (.ex) 42 | Generated jason app 43 | ==> nimble_csv 44 | Compiling 1 file (.ex) 45 | Generated nimble_csv app 46 | ==> complex 47 | Compiling 2 files (.ex) 48 | Generated complex app 49 | ==> nx 50 | Compiling 27 files (.ex) 51 | Generated nx app 52 | ==> axon 53 | Compiling 24 files (.ex) 54 | Generated axon app 55 | ==> elixir_make 56 | Compiling 1 file (.ex) 57 | Generated elixir_make app 58 | ==> xla 59 | Compiling 2 files (.ex) 60 | Generated xla app 61 | ==> exla 62 | Unpacking /home/ml3/.cache/xla/0.3.0/cache/download/xla_extension-x86_64-linux-cpu.tar.gz into /home/ml3/.cache/mix/installs/elixir-1.14.1-erts-13.1/45e4038ac8aacd103fe2688496702add/deps/exla/cache 63 | g++ -fPIC -I/home/ml3/.asdf/installs/erlang/25.1/erts-13.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib' 64 | Compiling 21 files (.ex) 65 | Generated exla app 66 | ==> castore 67 | Compiling 1 file (.ex) 68 | Generated castore app 69 | ==> scidata 70 | Compiling 13 files (.ex) 71 | Generated scidata app 72 | ``` 73 | 74 | 75 | 76 | ``` 77 | :ok 78 | ``` 79 | 80 | ## Before running notebook 81 | 82 | This notebook has a dependency on EXLA. XLA supports systems with direct access to an NVidia GPU, AMD ROCm or a Google TPU. According to the documentation, https://github.com/elixir-nx/nx/tree/main/exla#readme EXLA will try to find a precompiled version that matches your system. If it doesn't find a match. you will need to install CUDA and CuDNN for your system. 83 | 84 | The notebook is currently configured for Nvidia GPU via 85 | 86 | ``` 87 | system_env: %{"XLA_TARGET" => "cuda111"} 88 | ``` 89 | 90 | Review the configuration documentation for more options. https://hexdocs.pm/exla/EXLA.html#module-configuration 91 | 92 | We had to install CUDA and CuDNN but that was several months ago. Your experience may vary from ours. 93 | 94 | ## Context 95 | 96 | This Livebook is a transformation of a Python Jupyter Notebook from Fast.ai's From Deep Learning Foundations to Stable Diffusion, Practical Deep Learning for Coders part 2, 2022. Specifically, it mimics the CUDA portion of https://github.com/fastai/course22p2/blob/master/nbs/01_matmul.ipynb 97 | 98 | The purpose of the transformation is to bring the Fast.ai concepts to Elixir focused developers. The object-oriented Python/PyTorch implementation is transformed into a functional programming implementation using Nx and Axon 99 | 100 | ## Experimenting with backend control 101 | 102 | In this notebook, we are going to experiment with swapping out backends in the same notebook. One of the strengths of Elixir's numerical processing approach is the concept of a backend. 
The same Nx code can run on several different backends. This allows Nx to adapt to changes in numerical libaries and technology. Currently, Nx has support for Tensorflow's XLA and PyTorch's TorchScript. Theoretically, backends for SOC type devices should be possible. 103 | 104 | We chose not to set the backend globally throughout the notebook. At the beginning of the notebook we'll repeat the approach we used in 01a_matmul_using_CPU. We begin with the Elixir Binary backend. You'll see that it isn't quick multiplying 10,000 rows of MNIST data by some arbitrary weights. We'll then repeat the same multiplication using an NVidia 1080Ti GPU. The 1080 Ti is not the fastest GPU, but it is tremendously faster than a "large" set of data on the BinaryBackend. 105 | 106 | * 31649.26 milliseconds using BinaryBackend with a CPU only. 107 | * 0.14 milliseconds using XLA with a warmed up GPU 108 | 109 | *226,000 times faster on an old GPU* 110 | 111 | ## Backends 112 | 113 | ```elixir 114 | # Without choosing a backend, Nx defaults to Nx.BinaryBackend 115 | Nx.default_backend() 116 | ``` 117 | 118 | 119 | 120 | ``` 121 | {Nx.BinaryBackend, []} 122 | ``` 123 | 124 | Let's change to EXLA with CPU 125 | 126 | ```elixir 127 | Nx.default_backend({EXLA.Backend, device: :host}) 128 | Nx.default_backend() 129 | ``` 130 | 131 | 132 | 133 | ``` 134 | {EXLA.Backend, [device: :host]} 135 | ``` 136 | 137 | We'll pull down the MNIST data 138 | 139 | ```elixir 140 | {train_images, train_labels} = Scidata.MNIST.download() 141 | ``` 142 | 143 | 144 | 145 | ``` 146 | {{<<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 147 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...>>, {:u, 8}, {60000, 1, 28, 28}}, 148 | {<<5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9, 4, 0, 9, 1, 1, 2, 4, 3, 2, 7, 3, 8, 149 | 6, 9, 0, 5, 6, 0, 7, 6, 1, 8, 7, 9, 3, 9, 8, ...>>, {:u, 8}, {60000}}} 150 | ``` 151 | 152 | ```elixir 153 | {train_images_binary, train_tensor_type, train_shape} = train_images 154 | ``` 155 | 156 | 157 | 158 | ``` 159 | {<<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...>>, {:u, 8}, {60000, 1, 28, 28}} 161 | ``` 162 | 163 | ```elixir 164 | train_tensor_type 165 | ``` 166 | 167 | 168 | 169 | ``` 170 | {:u, 8} 171 | ``` 172 | 173 | Convert into Tensors and normalize to between 0 and 1 174 | 175 | ```elixir 176 | train_tensors = 177 | train_images_binary 178 | |> Nx.from_binary(train_tensor_type) 179 | |> Nx.reshape({60000, 28 * 28}) 180 | |> Nx.divide(255) 181 | ``` 182 | 183 | 184 | 185 | ``` 186 | 187 | 18:50:30.293 [info] XLA service 0x7fe6d40e2330 initialized for platform Host (this does not guarantee that XLA will be used). Devices: 188 | 189 | 18:50:30.295 [info] StreamExecutor device (0): Host, Default Version 190 | 191 | ``` 192 | 193 | 194 | 195 | ``` 196 | #Nx.Tensor< 197 | f32[60000][784] 198 | EXLA.Backend 199 | [ 200 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...], 201 | ... 202 | ] 203 | > 204 | ``` 205 | 206 | We'll separate the data into 50,000 train images and 10,000 validation images. 
207 | 208 | ```elixir 209 | x_train_cpu = train_tensors[0..49_999] 210 | x_valid_cpu = train_tensors[50_000..59_999] 211 | {x_train_cpu.shape, x_valid_cpu.shape} 212 | ``` 213 | 214 | 215 | 216 | ``` 217 | {{50000, 784}, {10000, 784}} 218 | ``` 219 | 220 | Training is more stable when random numbers are initialized with a mean of 0.0 and a variance of 1.0 221 | 222 | ```elixir 223 | mean = 0.0 224 | variance = 1.0 225 | weights_cpu = Nx.random_normal({784, 10}, mean, variance, type: {:f, 32}) 226 | ``` 227 | 228 | 229 | 230 | ``` 231 | #Nx.Tensor< 232 | f32[784][10] 233 | EXLA.Backend 234 | [ 235 | [-0.973583996295929, 1.3404284715652466, 0.5889155268669128, -0.06439179182052612, -2.2255215644836426, -0.3939111828804016, -1.5497547388076782, -1.1714494228363037, 1.0855729579925537, -0.4689534306526184], 236 | [-0.31778475642204285, 0.07520100474357605, 0.053238045424222946, 0.42360711097717285, -2.253004312515259, -0.3818463981151581, -0.5468025803565979, 1.3460612297058105, 1.509813904762268, 0.10178464651107788], 237 | [2.7212319374084473, -0.6341637969017029, 1.9983967542648315, 0.4862823486328125, 0.951216459274292, -0.8570582270622253, 1.7834625244140625, -0.1596108078956604, -0.369051992893219, 0.7038326263427734], 238 | [-1.321571946144104, -0.573075532913208, -0.5281657576560974, -1.528030276298523, 0.5641341209411621, -0.13296610116958618, -0.20917919278144836, -0.5405102372169495, 0.13647650182247162, 1.0692965984344482], 239 | [1.1940683126449585, -1.0889204740524292, 0.26889121532440186, -0.8505605459213257, 0.31284958124160767, 0.8289848566055298, 0.23549814522266388, 0.5921769738197327, 0.506867527961731, 0.6787563562393188], 240 | ... 241 | ] 242 | > 243 | ``` 244 | 245 | In order to simplify timing the performance of the Nx.dot/2 function, we'll use an 0 parameter anonymous function. Invoking the anonymous function will always use the two parameters, x_valid_cpu and weights_cpu. 246 | 247 | ```elixir 248 | large_nx_mult_fn = fn -> Nx.dot(x_valid_cpu, weights_cpu) end 249 | ``` 250 | 251 | 252 | 253 | ``` 254 | #Function<43.3316493/0 in :erl_eval.expr/6> 255 | ``` 256 | 257 | The following anonymous function takes function and the number of times to make the call to the function. 258 | 259 | ```elixir 260 | repeat = fn timed_fn, times -> Enum.each(1..times, fn _x -> timed_fn.() end) end 261 | ``` 262 | 263 | 264 | 265 | ``` 266 | #Function<41.3316493/2 in :erl_eval.expr/6> 267 | ``` 268 | 269 | Timing the average duration of the dot multiply function to run. 
The cell will output the average and total elapsed time 270 | 271 | ```elixir 272 | repeat_times = 5 273 | {elapsed_time_micro, _} = :timer.tc(repeat, [large_nx_mult_fn, repeat_times]) 274 | avg_elapsed_time_ms = elapsed_time_micro / 1000 / repeat_times 275 | 276 | {backend, device} = Nx.default_backend() 277 | 278 | "#{backend} CPU avg time in #{avg_elapsed_time_ms} milliseconds, total_time #{elapsed_time_micro / 1000} milliseconds" 279 | ``` 280 | 281 | 282 | 283 | ``` 284 | "Elixir.EXLA.Backend CPU avg time in 1.2837999999999998 milliseconds, total_time 6.419 milliseconds" 285 | ``` 286 | -------------------------------------------------------------------------------- /ElixirFashionML_Challenge/fashion_mnist_challenge.livemd: -------------------------------------------------------------------------------- 1 | # Classifying Simple Fashion Types - State of the Art (SOTA) Challenge 2 | 3 | ```elixir 4 | Mix.install([ 5 | {:axon, "~> 0.5.1"}, 6 | {:exla, "~> 0.5.3"}, 7 | {:req, "~> 0.3.10"}, 8 | {:scidata, "~> 0.1.10"} 9 | ]) 10 | ``` 11 | 12 | ## Introduction 13 | 14 | This livebook is inspired by the Classifying handwritten digits notebook in the Axon documentation. FashionMNIST was designed as a drop-in replacment for the MNIST dataset. Instead of digits, there a grey scale images of clothing types. Like MNIST, there are 10 kinds of images. FashionMNIST was designed as a harder problem than the digits dataset. You can check the difficulty by running this notebook for 10 epochs. Notice the training accuracy will be lower than the corresponding MNIST notebook when using the exact same model and epochs. 15 | 16 | ## State of the Art 17 | 18 | In a December tweet, Jeremy Howard created a challenge for the machine learning community. Can anyone beat his accuracy in 5, 20 or 50 epochs. The challenge's epoch accuracy approach is open to the community and inclusive because the compute requirements are broader. It doesn't matter whether you are running on an NVidia 1060, 4080, or some GPU in the cloud. In fact, because the problem is small enough, you can even use your CPU and patience. A CPU cloud resource can be used on a free Huggingface Space or Fly.io. If you only have a CPU, be sure to use the EXLA or TorchX backends because they are faster than the pure Elixir default. 19 | 20 | 21 | 22 | ![](images/fashion-MNIST_Challenge.png) 23 | 24 | 25 | 26 | One implied rule that isn't written in Jeremy's challenge, the model must be trained using only the original FashionMNIST training dataset. Participants can't add any extra images to the training set. For example, you can't use generative AI to create new fashion training data images. 27 | 28 | 29 | 30 | Leaderboard (Accuracy) on 12/15/2022 31 | 32 | * 5 Epochs - 92.7% 33 | * 20 Epochs - 93.2% 34 | 35 | 36 | 37 | Using Axon, we should be able to match those mid December numbers. The techniques that Jeremy used can be built in the Nx family of libraries. The foundations for the necessary tools and techniques are in the Axon, Nx, Kino, and NxImage libraries. Going through training resources, and hints I'll provide, should allow participants to improve the score. Try implementing one techique and share your results. If you improve the accuracy, I'll add you to the leaderboard. I'll also keep track of everyone who has been on the leaderboard. 38 | 39 | By competing with each other and sharing, we'll all learn the best techniques for building a State of the Art model in Elixir. 
Also, I strongly recommend sharing techniques that you try that don't improve the leaderboard. If you try something, you learn something. When you share, everyone learns something. 40 | 41 | If we can match the numbers, then we might be able to get close to the current [leaderboard](https://forums.fast.ai/t/a-challenge-for-you-all/102656). But let's try the 12/15 leaderboard first. 42 | 43 | ## Hyperparameters 44 | 45 | Hyperparameters in machine learning are choices the developer makes that shape the training of a model. However, what model to use is one of those choices but it isn't a simple hyperparameter. Let's create a Map with our simple parameter choices. It should make it easier to see some key choices that we are making. We can then reference the choices later in our notebook. When you add a new technique, you are probably going to make some hyperparameter choices. Please add your choices to this datastructure. When we get further along, I plan upon sharing the reasoning for a separate hyperparameter data structure. 46 | 47 | ```elixir 48 | hyperparams = %{ 49 | epochs: 5, 50 | batch_size: 32 51 | } 52 | ``` 53 | 54 | ## Retrieving and exploring the dataset 55 | 56 | The Fashion MNIST dataset is available for free online. The Elixir SciData library provides an easy technique to access the training and test datasets. 57 | 58 | ```elixir 59 | {train_images, train_labels} = Scidata.FashionMNIST.download() 60 | ``` 61 | 62 | ```elixir 63 | # Normalize and batch images 64 | {images_binary, images_type, images_shape} = train_images 65 | 66 | batched_images = 67 | images_binary 68 | |> Nx.from_binary(images_type) 69 | |> Nx.reshape(images_shape) 70 | |> Nx.divide(255) 71 | |> Nx.to_batched(hyperparams[:batch_size]) 72 | ``` 73 | 74 | ```elixir 75 | # One-hot-encode and batch labels 76 | {labels_binary, labels_type, _shape} = train_labels 77 | 78 | batched_labels = 79 | labels_binary 80 | |> Nx.from_binary(labels_type) 81 | |> Nx.new_axis(-1) 82 | |> Nx.equal(Nx.tensor(Enum.to_list(0..9))) 83 | |> Nx.to_batched(hyperparams[:batch_size]) 84 | ``` 85 | 86 | ## Defining the model 87 | 88 | We'll use the same model from the MNIST example. By starting with an extremely simple model, I've left room for challenge participants to try different models. Remember, the models have to start with random weights. Pre-trained models can't be used on the leaderboard. However, you can learn from trying a pre-trained model. Check out Sean's Machine Learning for Elixir book for an example. 89 | 90 | ```elixir 91 | model = 92 | Axon.input("input", shape: {nil, 1, 28, 28}) 93 | |> Axon.flatten() 94 | |> Axon.dense(128, activation: :relu) 95 | |> Axon.dense(10, activation: :softmax) 96 | ``` 97 | 98 | All `Axon` models start with an input layer to tell subsequent layers what shapes to expect. We then use `Axon.flatten/2` which flattens the previous layer by squeezing all dimensions but the first dimension into a single dimension. Our model consists of 2 fully connected layers with 128 and 10 units respectively. The first layer uses `:relu` activation which returns `max(0, input)` element-wise. The final layer uses `:softmax` activation to return a probability distribution over the 10 labels. 99 | 100 | ## Training 101 | 102 | In Axon we express the task of training using a declarative loop API. First, we need to specify a loss function and optimizer, there are many built-in variants to choose from. In this example, we'll use *categorical cross-entropy* and the *Adam* optimizer. 
We will also keep track of the *accuracy* metric. Finally, we run training loop passing our batched images and labels. We'll train for 10 epochs using the `EXLA` compiler. 103 | 104 | 105 | 106 | Based upon the results of PyTorch challenge from last winter, every leaderboard change overtook the others for all 3 epoch levels. Five epochs is enough to experiment with different model and training approaches. If 5 epochs is more accurate than the current leaderboard, then try the 20 and 50 epochs for completeness 107 | 108 | ```elixir 109 | trained_model_params = 110 | model 111 | |> Axon.Loop.trainer(:categorical_cross_entropy, :adam) 112 | |> Axon.Loop.metric(:accuracy, "Accuracy") 113 | |> Axon.Loop.run(Stream.zip(batched_images, batched_labels), %{}, 114 | epochs: hyperparams[:epochs], 115 | compiler: EXLA 116 | ) 117 | ``` 118 | 119 | ## Comparison with the test data leaderboard 120 | 121 | Now that we have the trained model parameters from the training effort, we can use them for calculating test data accuracy. 122 | 123 | Let's get the test data. 124 | 125 | ```elixir 126 | {test_images, test_labels} = Scidata.FashionMNIST.download_test() 127 | ``` 128 | 129 | ```elixir 130 | {test_images_binary, _test_images_type, test_images_shape} = test_images 131 | 132 | test_batched_images = 133 | test_images_binary 134 | |> Nx.from_binary(images_type) 135 | |> Nx.reshape(test_images_shape) 136 | |> Nx.divide(255) 137 | |> Nx.to_batched(hyperparams[:batch_size]) 138 | ``` 139 | 140 | ```elixir 141 | # One-hot-encode and batch labels 142 | {test_labels_binary, _test_labels_type, _shape} = test_labels 143 | 144 | test_batched_labels = 145 | test_labels_binary 146 | |> Nx.from_binary(labels_type) 147 | |> Nx.new_axis(-1) 148 | |> Nx.equal(Nx.tensor(Enum.to_list(0..9))) 149 | |> Nx.to_batched(hyperparams[:batch_size]) 150 | ``` 151 | 152 | Instead of Axon.predict, we'll use Axon.loop.evaluator with an accuracy metric. 153 | 154 | ```elixir 155 | Axon.Loop.evaluator(model) 156 | |> Axon.Loop.metric(:accuracy, "Accuracy") 157 | |> Axon.Loop.run( 158 | Stream.zip(test_batched_images, test_batched_labels), 159 | trained_model_params, 160 | compiler: EXLA 161 | ) 162 | ``` 163 | 164 | ## Challenge: #ElixirFashionML 165 | 166 | '#ElixirFashionMLChallenge Leaderboard (Accuracy) on 7/30/2023 167 | 168 | * 5 Epochs - 87.4% 169 | * 20 Epochs - 87.7% 170 | * 50 Epochs - 87.8% 171 | 172 | 173 | 174 | We have an 5 epoch accuracy of 87.4% vs Jeremy's 12/15 accuracy of 92.7%. That should plenty of opportunities for the community to leap to the top of the leaderboard 175 | 176 | ## How can you beat this initial result? 177 | 178 | I'll provide a quick set of resources and expand upon important resources at a later time. For now, start reading, try various Livebook notebooks, and watch some videos. 179 | 180 | ## Resources 181 | 182 | We highly recommend purchasing Sean Moriarity's book, [Machine Learning in Elixir](https://pragprog.com/titles/smelixir/machine-learning-in-elixir/). He and Jose' started the Elixir numerical compute capability. The book explains many important concepts about training models in Elixir. 183 | 184 | 185 | 186 | Nicolo` G created a batch of Livebook notebooks that translated Python book examples into Nx. The notebooks can be found on his [Github account](https://github.com/nickgnd/programming-machine-learning-livebooks) 187 | 188 | 189 | 190 | The techniques to achieve the SOTA are taught in the Fast.ai [Part 2 course](https://course.fast.ai/Lessons/part2.html). 
There are three parts of the course: Stable Diffusion, Deep Learning Foundations, and Stable Diffusion from scratch. Deep Learning Foundations focuses on the skills for this challenge and is covered in the second half of Lesson 10 through the first half of Lesson 19, about 18 hours of video on the PyTorch implementation needed to reach the SOTA numbers. I have some Livebook notebooks for [Lesson 10-11](https://github.com/meanderingstream/dl_foundations_in_elixir).
191 | 
192 | I struggled for a while with translating the object-oriented concepts into a similar approach in Elixir before I decided that the object-oriented abstractions probably weren't worth translating. The calculations and tools are important though. Axon and Kino have elements that can provide some of the same ease of use as Fast.ai. Kino can be used, with Axon, to create the visualizations and tools that are created in the course. Axon and NxImage have elements that can be combined to create other capabilities taught in the course. I'll have more thoughts and hints to share soon.
193 | 
194 | ## Why is it important for Elixir folks to try to beat Jeremy's 12/15 SOTA values
195 | 
196 | By implementing the techniques from Fast.ai's Lessons 10-18, we will be learning how to train a very accurate model at lower compute cost. When a business is trying to use a model in production, it normally wants the best-performing model that fits the problem constraints. By learning techniques to improve model performance while also reducing the compute training requirements, we help reduce costs and have a better chance of meeting business goals.
197 | 
198 | 
199 | 
200 | While it may seem that FashionMNIST is a simple problem, all of the techniques used to reach SOTA were originally combined in the [2018 DawnBench competition](https://www.fast.ai/posts/2018-04-30-dawnbench-fastai.html). Fast.ai students in a study group teamed up to compete against well-funded companies and came in second place (ImageNet) and first place (CIFAR). Unlike this challenge, DawnBench was a time-based competition.
201 | 
--------------------------------------------------------------------------------
/ElixirFashionML_Challenge/fashion_mnist_sean_m.livemd:
--------------------------------------------------------------------------------
1 | # Classifying Simple Fashion Types - Sean Moriarity
2 | 
3 | ```elixir
4 | Mix.install([
5 |   {:axon, "~> 0.5.1"},
6 |   {:exla, "~> 0.5.3"},
7 |   {:req, "~> 0.3.10"},
8 |   {:scidata, "~> 0.1.10"}
9 | ])
10 | ```
11 | 
12 | ## Elixir FashionMNIST Challenge
13 | 
14 | I challenged the Elixir community to an Elixir Fashion MNIST challenge, https://alongtheaxon.com/blog/fashion_mnist_challenge. The idea was derived from a Twitter post by Jeremy Howard. Jeremy was teaching his Deep Learning from the Foundations 2022 course. He used the Fashion MNIST dataset as an accessible deep learning problem. By using accuracy, anyone with patience and a CPU could try to compete for a better accuracy measure. Having a GPU would be faster, but using the CPU works pretty well.
15 | 
16 | ## Sean Moriarity's Blog Post
17 | 
18 | Sean created an excellent Dockyard blog post inviting Elixir folks to join the [Elixir FashionMNIST Challenge](https://dockyard.com/blog/2023/08/08/join-the-elixir-fashionmnist-challenge). In his post, he presented his initial approach to improving upon my initial baseline. He achieved a 90.7% accuracy on the test set. However, he didn't link to a Livebook implementation.
This notebook recreates his blog post as a Livebook notebook that you can use to repeat his results. I also tried the other two epoch counts and added Sean to the leaderboard.

## Hyperparameters

Hyperparameters in machine learning are choices the developer makes that shape the training of a model. Which model to use is also one of those choices, but it isn't a simple hyperparameter. Let's create a map with our simple parameter choices so the key training decisions are easy to see. We can then reference the choices later in the notebook.

```elixir
hyperparams = %{
  epochs: 5,
  batch_size: 32
}
```

## Retrieving and exploring the dataset

The Fashion MNIST dataset is available for free online. The Elixir Scidata library provides an easy way to access the training and test datasets.

```elixir
{train_images, train_labels} = Scidata.FashionMNIST.download()
```

```elixir
# Normalize training images
{images_binary, images_type, images_shape} = train_images

train_images =
  images_binary
  |> Nx.from_binary(images_type)
  |> Nx.reshape({:auto, 28, 28, 1})
  |> Nx.divide(255)
```

```elixir
# One-hot-encode labels
{labels_binary, labels_type, _shape} = train_labels

train_labels =
  labels_binary
  |> Nx.from_binary(labels_type)
  |> Nx.new_axis(-1)
  |> Nx.equal(Nx.tensor(Enum.to_list(0..9)))
```

## Pipeline

The next cell was one of the key improvements Sean made. The pipeline concept provides a place to dynamically modify the original dataset. Augmenting the input data improves the effective quality of the dataset without having to capture more examples. When working on a machine learning problem, the business team often asks how much data is required. There isn't a magic number or rule of thumb for how much data is needed. However, useful augmentation can improve the data at a low acquisition cost.

```elixir
seed = 42

{batched_images, _} =
  train_images
  |> Nx.to_batched(hyperparams[:batch_size])
  |> Enum.map_reduce(Nx.Random.key(seed), fn batch, key ->
    # Roughly half of the batches are replaced with a copy flipped along axis 1.
    fun =
      Nx.Defn.jit(
        fn regular, key ->
          {mask, key} = Nx.Random.uniform(key)
          flipped = Nx.reverse(regular, axes: [1])
          augmented = Nx.select(Nx.greater(mask, 0.5), flipped, regular)
          {augmented, key}
        end,
        compiler: EXLA
      )

    fun.(batch, key)
  end)

batched_labels =
  train_labels
  |> Nx.to_batched(hyperparams[:batch_size])
  |> Enum.to_list()
```

## Defining the model

Sean created a small, custom convolutional neural network (CNN): two convolutional blocks, each followed by max pooling, then two dense layers. This is an incremental improvement over the initial model. You can experiment with other model approaches to improve the score.

```elixir
model =
  Axon.input("features")
  |> Axon.conv(32, kernel_size: {3, 3}, padding: :same, activation: :relu)
  |> Axon.max_pool(kernel_size: 2)
  |> Axon.conv(64, kernel_size: {3, 3}, padding: :same, activation: :relu)
  |> Axon.max_pool(kernel_size: 2)
  |> Axon.flatten()
  |> Axon.dense(128, activation: :relu)
  |> Axon.dense(10, activation: :softmax)
```
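
If you're curious how small this network actually is, the sketch below (not part of Sean's post) initializes the model from a template input with `Axon.build/1` and sums the sizes of the parameter tensors. It assumes the Axon 0.5 plain-map parameter format and the `model` variable defined above.

```elixir
# Sketch only: initialize parameters from a template input and count them.
{init_fn, _predict_fn} = Axon.build(model)

params = init_fn.(%{"features" => Nx.template({1, 28, 28, 1}, :f32)}, %{})

# Sum the number of elements in every parameter tensor of every layer.
params
|> Enum.flat_map(fn {_layer_name, layer_params} -> Map.values(layer_params) end)
|> Enum.map(&Nx.size/1)
|> Enum.sum()
```

The total should be on the order of a few hundred thousand parameters, which is part of what makes this challenge practical to train on a CPU.
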
## Training

Sean also improved the training process. He added a learning rate scheduler instead of the constant default learning rate from the Adam optimizer. Learning rate schedules are a fantastic area for experimentation. Sean used 1,850 transition steps, which is just under one epoch at a batch size of 32 (60,000 / 32 = 1,875 batches per epoch).

What happens when the decay rate is changed? What is the best nominal learning rate? What happens when the transition steps are increased or decreased? Is decay optimal, or should the learning rate increase instead? What about a one-cycle approach? These are all hyperparameter decisions that can be varied to improve the model's test set accuracy.

Based upon the results of the PyTorch challenge from last winter, every leaderboard improvement overtook the previous entries at all three epoch levels. Five epochs is enough to experiment with different model and training approaches. If your 5 epoch result is more accurate than the current leaderboard, then try 20 and 50 epochs for completeness. Sean only provided a 5 epoch accuracy, and his 90.7% improves upon my initial notebook. So let's also run the 20 epoch and 50 epoch levels and credit Sean with all three numbers.

```elixir
training_seed = 42
learning_rate = 1.0e-3

# Start at 5.0e-3 and halve the learning rate every 1,850 steps, roughly once per epoch.
schedule =
  Axon.Schedules.exponential_decay(
    5.0e-3,
    transition_steps: 1850,
    decay_rate: 0.5
  )

optimizer = Axon.Optimizers.adam(schedule)

trained_model_state =
  model
  |> Axon.Loop.trainer(:categorical_cross_entropy, optimizer)
  |> Axon.Loop.metric(:accuracy)
  |> Axon.Loop.run(Stream.zip(batched_images, batched_labels), %{},
    epochs: hyperparams[:epochs],
    compiler: EXLA,
    seed: training_seed
  )
```

## Comparison with the test data leaderboard

Now that we have the trained model parameters from the training effort, we can use them to calculate the test data accuracy.

Let's get the test data.

```elixir
{test_images, test_labels} = Scidata.FashionMNIST.download_test()
```

```elixir
{test_images_binary, _, _} = test_images

test_images =
  test_images_binary
  |> Nx.from_binary(images_type)
  |> Nx.reshape({:auto, 28, 28, 1})
  |> Nx.divide(255)

{test_labels_binary, _, _} = test_labels

test_labels =
  test_labels_binary
  |> Nx.from_binary(labels_type)
  |> Nx.new_axis(-1)
  |> Nx.equal(Nx.tensor(Enum.to_list(0..9)))
```

Instead of `Axon.predict`, we'll use `Axon.Loop.evaluator` with an accuracy metric.

```elixir
test_batched_images = Nx.to_batched(test_images, hyperparams[:batch_size])
test_batched_labels = Nx.to_batched(test_labels, hyperparams[:batch_size])

model
|> Axon.Loop.evaluator()
|> Axon.Loop.metric(:accuracy)
|> Axon.Loop.run(Stream.zip(test_batched_images, test_batched_labels), trained_model_state,
  compiler: EXLA
)
```

## Challenge: #ElixirFashionML

Sean's #ElixirFashionMLChallenge Leaderboard (Accuracy) on 8/13/2023

* 5 Epochs - 90.7%
* 20 Epochs - 91.1%
* 50 Epochs - 90.9%

We have a 5 epoch accuracy of 90.7% vs Jeremy's 12/15 accuracy of 92.7%. That still leaves plenty of opportunities for the community to leap to the top of the leaderboard.
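
If you want to hunt for the remaining percentage points, one starting point (a sketch, not part of Sean's post) is to look at individual predictions with `Axon.predict/3` and see which test images the trained model still gets wrong. It reuses the `model`, `trained_model_state`, and batched test tensors defined above.

```elixir
# Sketch only: run the trained model on a single test batch and flag the
# images it classifies incorrectly.
image_batch = Enum.at(test_batched_images, 0)
label_batch = Enum.at(test_batched_labels, 0)

predicted_classes =
  model
  |> Axon.predict(trained_model_state, %{"features" => image_batch})
  |> Nx.argmax(axis: -1)

actual_classes = Nx.argmax(label_batch, axis: -1)

# A 1 marks an image in this batch that the model got wrong.
Nx.not_equal(predicted_classes, actual_classes)
```

From there you could slice out the misclassified images, render them with Kino, or tally the errors per class to decide where augmentation or model changes might help most.
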
## Resources

We highly recommend purchasing Sean Moriarity's book, [Machine Learning in Elixir](https://pragprog.com/titles/smelixir/machine-learning-in-elixir/). He and José Valim started the Elixir numerical computing capability. The book explains many important concepts about training models in Elixir.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# Deep Learning from the Foundations with Elixir

This repository is a transformation of the 2022 version of Fast.ai's Deep Learning for Coders Part 2. These code notebooks follow the notebooks from the [first portion](https://github.com/fastai/course22p2) of the course. Please watch Jeremy Howard's course videos for the complete context. In the course, Jeremy starts with some constraints. His approach is to first implement a concept in standard Python code. Once he has introduced a concept, the notebooks start to bring in the PyTorch library code that also implements it. He tries to incrementally demystify the PyTorch and Fast.ai library code.

Similar to the course, these notebooks start from standard Elixir code and then bring in the Nx and Axon libraries. We'll use Elixir and [Livebook.dev interactive & collaborative code notebooks](https://livebook.dev/). A requirement for these notebooks is a running Livebook application or server. Livebook runs on Windows and Mac desktops and on Linux. Please see the Livebook web site for instructions on installing the basic Livebook application. For our purposes, we run Livebook on a local Linux server using escript. For more information on running Livebook with escript, please see the README.md at https://github.com/livebook-dev/livebook.

We'll be building a Livebook notebook for every Fast.ai Foundations Jupyter notebook. We welcome pull requests that improve our notebooks.

--------------------------------------------------------------------------------