├── .gitignore ├── .vscode └── settings.json ├── LICENSE ├── Makefile ├── README.md ├── examples └── mlp.c ├── img ├── comp_graph.drawio.svg ├── neuron.drawio.svg ├── simple_expr.drawio.svg └── tape.drawio.svg └── src ├── autodiff.c ├── autodiff.h └── main.c /.gitignore: -------------------------------------------------------------------------------- 1 | *.exe -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "files.associations": { 3 | "autodiff.h": "c", 4 | "nn.h": "c", 5 | "mlp.h": "c" 6 | } 7 | } -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Jan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | CC = gcc 2 | CFLAGS = -Wall -Wextra -std=c11 -pedantic -g -lm 3 | SRC = src/ 4 | EXAMPLES = examples/ 5 | IN = $(SRC)autodiff.c $(SRC)main.c 6 | OUT = autodiff 7 | 8 | make: $(IN) 9 | $(CC) $(IN) -o $(OUT) $(CFLAGS) 10 | 11 | mlp_example: $(SRC)autodiff.c $(EXAMPLES)mlp.c 12 | $(CC) $(SRC)autodiff.c $(EXAMPLES)mlp.c -o mlp_demo $(CFLAGS) -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Simple Automatic Differentiation library 2 | 3 | This repository contains the implementation of scalar-valued reverse mode `autodiff` written in the C language. I implemented autodiff in C for educational and recreational purposes. Languages that offer operator overloading and a garbage collector would be ideal for an autodiff implementation. Nonetheless, implementing autodiff in C allowed for many interesting implementation details, such as storing the computation graph as a flattened directed acyclic graph addressed through indices, an approach sometimes referred to as `relative pointers`. 4 | 5 | ## What is `autodiff`? 6 | Automatic differentiation (or simply `autodiff`) is the key method used by sophisticated deep learning libraries (e.g. Pytorch and Tensorflow) to compute the gradients of arbitrary computations.
The gradients are then used to perform backpropagation through an artificial neural network model. The key difference between this implementation and that of Pytorch or Tensorflow is that this implementation considers gradients of scalar values, while Pytorch/Tensorflow operate on multi-dimensional arrays and thus consider gradients of tensor values. Compared to other methods for computing gradients, the strength of autodiff lies in its accuracy and its simplicity of implementation. We can distinguish two variants of autodiff: forward mode and reverse mode. This repository is concerned with reverse mode autodiff, which means that the derivative of a result with respect to its inputs is obtained by starting at the result and propagating the gradient backwards through the previous computations. 7 | 8 | Generally, we are used to computing gradients by differentiating the symbolic expression of a function $f(x)$ into another expression that is the derivative of $f$. For instance, $f(x) = x^2$ can be differentiated, which results in $f'(x) = 2x$. Symbolically parsing and transforming a function into its derivative does not scale as the function grows in size. Take for instance a moderately sized neural network, which consists of possibly hundreds of function compositions and therefore corresponds to a massive symbolic expression, whose derivative is not feasible to compute symbolically. Thus, this approach is rejected for computing gradients. 9 | 10 | Another method that we can use is the limit definition of the derivative, i.e., the following limit. 11 | 12 | $$ 13 | \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} 14 | $$ 15 | 16 | This is very easy to compute: we just choose a small value for `h` and plug it into the formula. However, the result will not be accurate, because we are limited by the finite precision of floating point values. As an example, let $f(x) = x^2$ and let $x = 3$. The following table depicts derivative calculations with decreasing values for $h$ using the limit definition of the derivative. 17 | 18 | |$h$ | $df / dx$ | 19 | |---|---------| 20 | |$0.1$ |$6.100000000000012$| 21 | |$0.01$ |$6.009999999999849$| 22 | |$0.0001$ |$6.000100000012054$| 23 | |$0.0000001$ |$6.000000087880153$| 24 | |$0.000000001$ |$6.000000087880153$| 25 | 26 | We know that the derivative of $f$ is $f'(x) = 2x$ and $f'(3) = 6$. The table shows that the approximation approaches 6 as $h$ decreases, but with 32 or 64 bit floats it never converges exactly; shrinking $h$ further eventually makes the rounding error dominate (note that the last two rows yield the same value). Thus, this method is also rejected. 27 |
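The table above can be reproduced with a few lines of plain C. The following is a minimal sketch using a forward difference in `double` precision; the exact digits may differ slightly between platforms.

```C
#include <stdio.h>

// The function we differentiate numerically: f(x) = x^2
static double f(double x) { return x * x; }

int main(void) {
    const double x = 3.0;
    const double hs[] = {0.1, 0.01, 0.0001, 0.0000001, 0.000000001};
    for (size_t i = 0; i < sizeof(hs) / sizeof(hs[0]); ++i) {
        // forward difference approximation of f'(x)
        double approx = (f(x + hs[i]) - f(x)) / hs[i];
        printf("h = %g, df/dx = %.15f\n", hs[i], approx);
    }
    return 0;
}
```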
28 | At last, consider automatic differentiation, which neither manipulates symbolic expressions nor uses the limit definition. Rather, it takes the elementary building blocks of simple mathematical expressions and chains them together using the **chain rule** of calculus. Consider the simple expression $x = (2 + 3) * 4$, which can be decomposed into two separate computations, i.e., $a = 2+3$ and $x = a * 4$. We can construct a binary tree for the computation associated with $x$. 29 | 30 | ![comp_graph](img/simple_expr.drawio.svg) 31 | 32 | Reverse mode automatic differentiation starts at the result, in this case $20$, and computes the local derivative of that result with respect to each of its child nodes, which hold $5$ and $4$. For a multiplication, the derivative with respect to one operand is simply the other operand; local derivatives like these are very easy to compute, and the same holds for addition. Thereafter, we use the chain rule: multiplying the local derivatives along a path (and summing over paths) yields the global derivative with respect to some input. We set the gradients of the children using this computation and perform the same procedure recursively until we have walked the entire graph. The result is a **computation graph** in which every node stores the gradient of the output with respect to that node. 33 | 34 | ![comp_graph](img/comp_graph.drawio.svg) 35 | 36 | We can encode custom functions to differentiate simple additions, multiplications, and any other computation (this is not limited to mathematical operations only) and use the chain rule to derive the derivative of the output with respect to every node in the graph. The reverse mode method starts at the end of the computation, i.e., at the result, and propagates the gradients backwards towards the beginning. 37 | 38 | In the above example, it is easy to compute the local derivatives of the computations. Starting at node `e`, the derivative of `e` with respect to (w.r.t.) `e` is just 1. Looking at the children of `e`, we have `d` and `c`. The derivative of `e` w.r.t. `c` is `d`, since `e = d * c`. And the derivative of `e` w.r.t. `d` is `c`, since `e = d * c`. Now `d` also has children, so let's compute their derivatives. The derivative of `d` w.r.t. `a` and w.r.t. `b` is 1, since `d = a + b`. Up until now, we have computed the local derivatives. Using the chain rule we can multiply the local derivatives to obtain the global derivatives with respect to the inputs `a` and `b`. Mathematics has a nice way to formulate this notion using the partial form ($\partial$). 39 | 40 | $$ 41 | \frac{\partial e}{\partial a} = \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial a} \qquad \text{and} \qquad \frac{\partial e}{\partial b} = \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial b} 42 | $$ 43 | For the concrete values above this gives $\frac{\partial e}{\partial a} = c \cdot 1 = 4$ and likewise $\frac{\partial e}{\partial b} = 4$, while $\frac{\partial e}{\partial c} = d = 5$. 44 | ## Implementation details 45 | 46 | The library consists of two files, `autodiff.c` and `autodiff.h`. The computation graph is a **graph** and not a tree structure, even though it might look like a tree: a node can appear as the child of more than one other node, so subexpressions can be shared. Therefore, simply wrapping floating point values in a struct and linking these wrappers with raw pointers, as in a standard linked list, is not enough. Furthermore, each value and each result of a computation would need its own allocation in virtual memory. This can cause severe memory fragmentation, as the allocations are always small, i.e., a couple of bytes in size. To counteract this issue, the library uses a simple dynamic array and relative pointers that index into this array. The structure that acts as a dynamic array and maintains all value nodes is the `Tape` structure. Sophisticated machine learning libraries such as Pytorch and Tensorflow also use a tape object, although I am not sure whether their approach also uses relative pointers. Consider the following schematic depicting the tape for the computation of $(2+3) * 4$. 47 | 48 | ![comp_graph](img/tape.drawio.svg) 49 | 50 | The first element is the **nil** element. It is analogous to the `NULL` value for a child pointer of a standard linked list. Here, the elements at indices 1, 2, and 4 all have 0 as the left (L) and right (R) pointers, which sends them back to the **nil** element at index 0. The elements at indices 3 and 5 are the result of a computation. Their value depends on the operation used (Op) and on the left and right relative pointers. Furthermore, observe that the whole graph is encoded into a flat dynamic array. Memory management also becomes simple: a single tape owns every value created during a computation, and instead of many small, fragmented allocations, only the tape's backing array has to be reallocated when it grows.
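Concretely, every slot on the tape is a `Value` record and the tape itself is a growable array of such records. The two definitions below are taken from `src/autodiff.h` (comments added here for explanation); note that the children are not memory pointers but `size_t` indices into the same array, with index 0 reserved for the nil element.

```C
// A node of the computation graph, stored inside the tape.
typedef struct {
    float data;         // value computed for this node
    float grad;         // gradient of the final output w.r.t. this node
    OpType op;          // operator that produced this node (leaves use the nil sentinel)
    size_t left_child;  // tape index of the left operand, 0 = nil
    size_t right_child; // tape index of the right operand, 0 = nil
} Value;

// The tape: a dynamic array of Value nodes acting as an arena for the graph.
typedef struct {
    Value* val_buf; // backing buffer
    size_t count;   // number of nodes currently on the tape (including nil)
    size_t cap;     // capacity of the backing buffer
} Tape;
```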
51 | 52 | To achieve this in code, consider the following snippet. 53 | 54 | ```C 55 | int main(void){ 56 | 57 | // initialise the tape 58 | Tape tp = {0}; 59 | ad_init_tape(&tp); 60 | 61 | // perform computations 62 | size_t a = ad_create(&tp, 2.0f); 63 | size_t b = ad_create(&tp, 3.0f); 64 | size_t c = ad_add(&tp, a, b); 65 | size_t d = ad_create(&tp, 4.0f); 66 | 67 | size_t result = ad_mul(&tp, c, d); 68 | 69 | // print tape content 70 | ad_print_tape(&tp); 71 | 72 | // free tape 73 | ad_destroy_tape(&tp); 74 | return 0; 75 | } 76 | ``` 77 | 78 | The `ad_print_tape(&tp)` call prints the following to the console. 79 | 80 | ```C 81 | val: 0, index: 0, left: 0, right: 0, op: add 82 | val: 2, index: 1, left: 0, right: 0, op: nil 83 | val: 3, index: 2, left: 0, right: 0, op: nil 84 | val: 5, index: 3, left: 1, right: 2, op: add 85 | val: 4, index: 4, left: 0, right: 0, op: nil 86 | val: 20, index: 5, left: 3, right: 4, op: mul 87 | ``` 88 | 89 | If we want to compute the gradients of the expression `(2+3) * 4`, we add a call to the `ad_reverse(&tp, result)` function, which computes the gradients using reverse mode autodiff. Consider the following snippet. 90 | 91 | ```C 92 | int main(void){ 93 | 94 | // initialise the tape 95 | Tape tp = {0}; 96 | ad_init_tape(&tp); 97 | 98 | // perform computations 99 | size_t a = ad_create(&tp, 2.0f); 100 | size_t b = ad_create(&tp, 3.0f); 101 | size_t c = ad_add(&tp, a, b); 102 | size_t d = ad_create(&tp, 4.0f); 103 | 104 | size_t result = ad_mul(&tp, c, d); 105 | 106 | // differentiate result w.r.t. a, b, c, and d 107 | ad_reverse(&tp, result); 108 | // print computation tree 109 | ad_print_tree(&tp, result); 110 | 111 | // free tape 112 | ad_destroy_tape(&tp); 113 | return 0; 114 | } 115 | 116 | ``` 117 | ```C 118 | ------------- Computation graph ------------- 119 | [mul ] node (data: 20, grad: 1) 120 | [add ] node (data: 5, grad: 4) 121 | [nil] node (data: 2, grad: 4) 122 | [nil] node (data: 3, grad: 4) 123 | [nil] node (data: 4, grad: 5) 124 | -------------------------------------------- 125 | ``` 126 | 127 | The API of the `autodiff` library consists of the following functions; a short usage sketch follows the list. 128 | Consider the memory management functions and the reverse mode functions: 129 | - `ad_init_tape(Tape* tape)`: initialises the dynamic array (tape) for use. 130 | - `ad_destroy_tape(Tape* tape)`: destroys the dynamic array (tape). 131 | - `ad_reverse(Tape* tp, size_t y)`: computes, for every value connected to the graph of `y`, the gradient of `y` with respect to that value, propagating the gradients using the chain rule. 132 | - `ad_reverse_toposort(Tape* tp, size_t y)`: before computing the gradients, the nodes of the computation graph are first topologically sorted, so that the children of a node are always considered before their parent. Traversing the computation graph in reverse topological order ensures that every node is visited exactly once. This is only needed when the internal structure of the tape has been manipulated; otherwise, use the `ad_reverse(Tape* tp, size_t y)` function. 133 | 134 | Consider the arithmetic functions: 135 | - `ad_create(Tape* tp, float value)`: creates a value on the tape. 136 | - `ad_add(Tape* tp, size_t a, size_t b)`: creates a value on the tape based on the addition of two values on the tape pointed to by index. 137 | - `ad_sub(Tape* tp, size_t a, size_t b)`: creates a value on the tape based on the subtraction of two values on the tape pointed to by index. 138 | - `ad_mul(Tape* tp, size_t a, size_t b)`: creates a value on the tape based on the multiplication of two values on the tape pointed to by index. 139 | - `ad_pow(Tape* tp, size_t a, size_t b)`: creates a value on the tape based on the exponentiation of two values on the tape pointed to by index. 140 | 141 | Consider some common activation functions: 142 | - `ad_tanh(Tape* tp, size_t a)`: creates a value on the tape based on the `tanh(x)` function. 143 | - `ad_relu(Tape* tp, size_t a)`: creates a value on the tape based on the `ReLu(x)` function. 144 | - `ad_sigm(Tape* tp, size_t a)`: creates a value on the tape based on the `sigmoid(x)` function. 145 | 146 | Consider the debugging functions: 147 | - `ad_print_tape(Tape* tp)`: prints the contents of the tape array. 148 | - `ad_print_tree(Tape* tp, size_t y)`: prints the computation graph of `y`. 149 |
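To give a feel for the remaining operators, here is a small sketch in the same style as the snippets above: it differentiates $f(x) = x^3 - 2x$ at $x = 3$ using `ad_pow`, `ad_mul`, and `ad_sub`. Mathematically the gradient is $3x^2 - 2 = 25$; the gradient is read back from the tape the same way `src/main.c` does it. (The constants created for the exponent and the factor also receive gradients, which can simply be ignored.)

```C
int main(void){

    Tape tp = {0};
    ad_init_tape(&tp);

    // f(x) = x^3 - 2*x, evaluated at x = 3
    size_t x    = ad_create(&tp, 3.0f);
    size_t x3   = ad_pow(&tp, x, ad_create(&tp, 3.0f));
    size_t twox = ad_mul(&tp, ad_create(&tp, 2.0f), x);
    size_t f    = ad_sub(&tp, x3, twox);

    // reverse pass: expect df/dx = 3*x^2 - 2 = 25
    ad_reverse(&tp, f);
    printf("f(3) = %g, df/dx = %g\n", tp.val_buf[f].data, tp.val_buf[x].grad);

    ad_destroy_tape(&tp);
    return 0;
}
```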
150 | ## Usage 151 | The build system is `make` and there are no external dependencies. The implementation of autodiff is contained in `autodiff.c` and `autodiff.h`. Run the following for a demo example. 152 | ``` 153 | $ make 154 | $ ./autodiff 155 | ``` 156 | A more sophisticated example is also included, in which a simple multi-layer perceptron is used to solve the `XOR` problem. The code for this example can be found in `examples/mlp.c`; the sketch below the output summarises the calls involved. 157 | ``` 158 | $ make mlp_example 159 | $ ./mlp_demo 160 | . 161 | . 162 | . 163 | Average loss: 0.000727057 164 | Average loss: 0.000726102 165 | Average loss: 0.000725149 166 | ...Training end 167 | Prediction for input {0, 0} is 0.016746 168 | Prediction for input {1, 0} is 0.976343 169 | Prediction for input {0, 1} is 0.972189 170 | Prediction for input {1, 1} is 0.035648 171 | ``` 172 | 173 |
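For orientation, the core of `examples/mlp.c` boils down to the calls below (abridged from the example's `main`; `X` and `Y` are the XOR dataset defined in that file, and the layer sizes, epoch count, and learning rate are simply the values the example happens to use).

```C
MLP nn = {0};
mlp_init(&nn, 1.5f);               // set the learning rate and seed the RNG
mlp_add_layer(&nn, 2, 4, "sigm");  // 2 inputs -> 4 hidden neurons
mlp_add_layer(&nn, 4, 1, "sigm");  // 4 hidden -> 1 output neuron

// one gradient step per sample; mlp_fit returns the loss of that sample
for (size_t epoch = 0; epoch < 1000; ++epoch)
    for (size_t i = 0; i < TRAINING_SIZE; ++i)
        mlp_fit(&nn, X[i], 2, Y + i, 1);

float prediction;
mlp_predict(&nn, X[0], 2, &prediction, 1);   // forward pass only

mlp_destroy(&nn);
```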
174 | ## Example 175 | 176 | As an additional example, consider the following. To build an artificial `Neuron` with 2 inputs ($x_1, x_2$), 2 weights ($w_1, w_2$), and a bias ($b$), which uses the `tanh(x)` activation function, we can construct the following mathematical function. 177 | 178 | $$ f(x_1, x_2) = \tanh((w_1 x_1 + w_2 x_2) + b) $$ 179 | 180 | Now, consider the following computation graph for this function. The goal is to find the derivatives of `y` w.r.t. the parameters of the neuron. These are `w1`, `w2`, and `b`. 181 | 182 | ![neuron](img/neuron.drawio.svg) 183 | 184 | The following code snippet computes this expression for some arbitrary values, and thereafter, computes and prints the gradients. 185 | ```C 186 | int main(void){ 187 | 188 | Tape tp = {0}; 189 | ad_init_tape(&tp); 190 | 191 | // The inputs x1, x2 192 | size_t x1 = ad_create(&tp, -1.0f); 193 | size_t x2 = ad_create(&tp, 2.0f); 194 | 195 | // The params w1, w2, b 196 | size_t w1 = ad_create(&tp, 4.0f); 197 | size_t w2 = ad_create(&tp, -2.0f); 198 | size_t b = ad_create(&tp, .5f); 199 | 200 | // Intermediate computations 201 | size_t xw1 = ad_mul(&tp, x1, w1); 202 | size_t xw2 = ad_mul(&tp, x2, w2); 203 | size_t xw = ad_add(&tp, xw1, xw2); 204 | size_t xwb = ad_add(&tp, xw, b); 205 | 206 | // The result 207 | size_t y = ad_tanh(&tp, xwb); 208 | 209 | ad_reverse(&tp, y); 210 | ad_print_tree(&tp, y); 211 | 212 | ad_destroy_tape(&tp); 213 | return 0; 214 | } 215 | ``` 216 | ``` 217 | ------------- Computation graph ------------- 218 | [tanh] node (data: -0.999999, grad: 1) 219 | [add ] node (data: -7.5, grad: 1.19209e-006) 220 | [add ] node (data: -8, grad: 1.19209e-006) 221 | [mul ] node (data: -4, grad: 1.19209e-006) 222 | [nil] node (data: -1, grad: 4.76837e-006) 223 | [nil] node (data: 4, grad: -1.19209e-006) 224 | [mul ] node (data: -4, grad: 1.19209e-006) 225 | [nil] node (data: 2, grad: -2.38419e-006) 226 | [nil] node (data: -2, grad: 2.38419e-006) 227 | [nil] node (data: 0.5, grad: 1.19209e-006) 228 | -------------------------------------------- 229 | ``` 230 | 231 | ## References 232 | - Introduction to autodiff: https://arxiv.org/pdf/2110.06209.pdf 233 | - Tensorflow autodiff API: https://www.tensorflow.org/guide/autodiff 234 | - Video detailing the intuition of autodiff: https://www.youtube.com/watch?v=VMj-3S1tku0&ab_channel=AndrejKarpathy -------------------------------------------------------------------------------- /examples/mlp.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <string.h> 3 | #include <time.h> 4 | #include "../src/autodiff.h" 5 | 6 | // Vector and matrix structs that have a size_t ptr 7 | // that points to a value in a tape structure. 8 | typedef struct { 9 | size_t ptr; 10 | size_t rows; 11 | } Vector; 12 | 13 | typedef struct { 14 | size_t ptr; 15 | size_t rows; 16 | size_t cols; 17 | } Matrix; 18 | 19 | // A layer consists of weights, biases, and an activation function. 20 | // The activation function can currently be one of the following: 21 | // - ReLu (ad_relu()) 22 | // - Tanh (ad_tanh()) 23 | // - Sigmoid (ad_sigm()) 24 | typedef struct { 25 | Matrix weights; 26 | Vector biases; 27 | size_t (*activation)(Tape* tp, size_t a); 28 | } Layer; 29 | 30 | // Multi-Layer Perceptron struct. 31 | // It manages its own tape of parameters, 32 | // which gets copied into a fresh tape at the start of every call to the fitness function (mlp_fit). 33 | typedef struct { 34 | Tape params; 35 | Layer* layers; 36 | size_t num_layers; 37 | size_t max_layers; 38 | float learning_rate; 39 | } MLP; 40 | 41 | // Returns a floating point number between -1 and 1 42 | float mlp_rand(){ 43 | return ((float)rand() / (float)RAND_MAX) * 2.0 - 1.0; 44 | } 45 | 46 | // A Vector is created by consecutively creating leaf nodes in the computation graph. 47 | Vector mlp_create_vector(Tape* tp, size_t rows) { 48 | 49 | size_t ptr = ad_create(tp, mlp_rand()); 50 | for (size_t i = 1; i < rows; ++i){ 51 | ad_create(tp, mlp_rand()); 52 | } 53 | 54 | return (Vector){ 55 | .rows = rows, 56 | .ptr = ptr 57 | }; 58 | } 59 | 60 | // A 2D Matrix is created by flattening the matrix into a 1D array and 61 | // consecutively creating leaf nodes in the computation graph.
62 | Matrix mlp_create_matrix(Tape* tp, size_t rows, size_t cols) { 63 | 64 | size_t ptr = ad_create(tp, mlp_rand()); 65 | for (size_t i = 1; i < rows*cols; ++i){ 66 | ad_create(tp, mlp_rand()); 67 | } 68 | return (Matrix){ 69 | .rows = rows, 70 | .cols = cols, 71 | .ptr = ptr 72 | }; 73 | } 74 | 75 | void mlp_print_mat(Tape* tp, Matrix mat){ 76 | printf("shape (%zu, %zu)\n", mat.rows, mat.cols); 77 | for (size_t i = 0; i < mat.rows; ++i){ 78 | for (size_t j = 0; j < mat.cols; ++j){ 79 | printf("[%f] ", GET(mat.ptr + i*mat.cols + j).data); 80 | } 81 | printf("\n"); 82 | } 83 | } 84 | 85 | void mlp_print_vec(Tape* tp, Vector vec){ 86 | printf("shape (%zu, 1)\n", vec.rows); 87 | for (size_t i = 0; i < vec.rows; ++i) 88 | printf("[%f]\n", GET(vec.ptr+i).data); 89 | } 90 | 91 | // Initialise MLP struct by providing the learning rate 92 | void mlp_init(MLP* nn, float learning_rate){ 93 | srand(time(NULL)); 94 | nn->learning_rate = learning_rate; 95 | ad_init_tape(&nn->params); 96 | nn->num_layers = 0; 97 | nn->max_layers = 0; 98 | nn->layers = NULL; 99 | } 100 | 101 | void mlp_destroy(MLP* nn){ 102 | ad_destroy_tape(&nn->params); 103 | free(nn->layers); 104 | } 105 | 106 | void mlp_init_layer(Layer* layer, Tape* tp, size_t num_inputs, size_t num_neurons, const char* activation_function){ 107 | 108 | if (strcmp("relu", activation_function) == 0) { 109 | layer->activation = ad_relu; 110 | } else if (strcmp("tanh", activation_function) == 0){ 111 | layer->activation = ad_tanh; 112 | } else if (strcmp("sigm", activation_function) == 0){ 113 | layer->activation = ad_sigm; 114 | } else { 115 | fprintf(stderr, "The provided activation function is not supported.\nChoose 'relu', 'tanh', or 'sigm'\n"); 116 | exit(1); 117 | } 118 | 119 | layer->weights = mlp_create_matrix(tp, num_neurons, num_inputs); 120 | layer->biases = mlp_create_vector(tp, num_neurons); 121 | } 122 | 123 | // Add a dense layer to the neural network by providing 124 | // - the number of input nodes, 125 | // - the number of neurons in the layer, and 126 | // - the activation function ("relu", "tanh", "sigm") 127 | void mlp_add_layer(MLP* nn, size_t num_inputs, size_t num_neurons, const char* activation_function){ 128 | if (nn->num_layers >= nn->max_layers){ 129 | nn->max_layers = Extend(nn->max_layers); 130 | nn->layers = realloc(nn->layers, sizeof(Layer) * nn->max_layers); 131 | if (!nn->layers) { 132 | fprintf(stderr, "Not enough memory, buy more ram!\n"); 133 | exit(1); 134 | } 135 | } 136 | mlp_init_layer(nn->layers + nn->num_layers, &nn->params, num_inputs, num_neurons, activation_function); 137 | nn->num_layers++; 138 | } 139 | 140 | // Forward pass through the components of a layer, 141 | // i.e., the input vector, the weight matrix, the bias vector, and the activation function 142 | Vector mlp_forward_pass_layer( 143 | Tape* tp, 144 | Matrix mat, 145 | Vector vec, 146 | Vector bias, 147 | size_t (*a_fun)(Tape*, size_t)) 148 | { 149 | 150 | if (mat.cols != vec.rows || mat.rows != bias.rows) { 151 | fprintf(stderr, "Matrix, vector, and bias shapes do not match\n"); 152 | exit(1); 153 | } 154 | size_t* out_ptr = malloc(sizeof(size_t) * mat.rows); 155 | 156 | for (size_t i = 0; i < mat.rows; ++i){ 157 | size_t res = ad_create(tp, 0.0f); 158 | for (size_t j = 0; j < mat.cols; ++j){ 159 | res = ad_add(tp, 160 | res, 161 | ad_mul(tp, 162 | mat.ptr + i*mat.cols + j, 163 | vec.ptr + j) 164 | ); 165 | } 166 | res = ad_add(tp, res, bias.ptr + i); 167 | res = a_fun(tp, res); 168 | out_ptr[i] = res; 169 | } 170 | 171 | Vector out = 
mlp_create_vector(tp, mat.rows); 172 | for (size_t i = 0; i < mat.rows; ++i){ 173 | GET(out.ptr + i).data = GET(out_ptr[i]).data; 174 | GET(out.ptr + i).left_child = GET(out_ptr[i]).left_child; 175 | GET(out.ptr + i).right_child = GET(out_ptr[i]).right_child; 176 | GET(out.ptr + i).op = GET(out_ptr[i]).op; 177 | } 178 | free(out_ptr); 179 | 180 | return out; 181 | } 182 | 183 | // Pass through all layers 184 | Vector mlp_forward_pass(MLP* nn, Tape* tp, Vector xs){ 185 | 186 | Vector out = xs; 187 | for (size_t i = 0; i < nn->num_layers; ++i){ 188 | out = mlp_forward_pass_layer(tp, 189 | nn->layers[i].weights, 190 | out, 191 | nn->layers[i].biases, 192 | nn->layers[i].activation 193 | ); 194 | } 195 | 196 | return out; 197 | } 198 | 199 | Vector _predict(MLP* nn, Tape* tp, float* xs, size_t xs_size){ 200 | 201 | // Copy over model params into new tape 202 | for (size_t i = 1; i < nn->params.count; ++i){ 203 | ad_create(tp, nn->params.val_buf[i].data); 204 | } 205 | 206 | // Create and fill input vector 207 | Vector xs_vec = mlp_create_vector(tp, xs_size); 208 | for (size_t i = 0; i < xs_size; ++i){ 209 | tp->val_buf[xs_vec.ptr + i].data = xs[i]; 210 | } 211 | 212 | // Forward pass 213 | Vector out = mlp_forward_pass(nn, tp, xs_vec); 214 | return out; 215 | } 216 | 217 | float mlp_fit(MLP* nn, float* X, size_t X_size, float* Y, size_t Y_size){ 218 | 219 | Tape tp = {0}; 220 | ad_init_tape(&tp); 221 | 222 | Vector out = _predict(nn, &tp, X, X_size); 223 | 224 | // Create and fill ground truth vector 225 | Vector ys = mlp_create_vector(&tp, Y_size); 226 | for (size_t i = 0; i < Y_size; ++i){ 227 | tp.val_buf[ys.ptr + i].data = Y[i]; 228 | } 229 | 230 | // Compute mean squared error 231 | size_t loss = ad_create(&tp, 0.0f); 232 | for (size_t i = 0; i < out.rows; ++i){ 233 | loss = ad_add(&tp, 234 | loss, 235 | ad_pow(&tp, 236 | ad_sub(&tp, out.ptr + i, ys.ptr + i), 237 | ad_create(&tp, 2.0f) 238 | ) 239 | ); 240 | } 241 | 242 | loss = ad_mul(&tp, 243 | loss, 244 | ad_create(&tp, 1.0f/(float)out.rows) 245 | ); 246 | 247 | // Backpropagation with autodiff 248 | ad_reverse(&tp, loss); 249 | 250 | // Update rule 251 | for (size_t i = 1; i < nn->params.count; ++i){ 252 | nn->params.val_buf[i].data -= nn->learning_rate * tp.val_buf[i].grad; 253 | } 254 | 255 | // Save loss value 256 | float ret_loss = tp.val_buf[loss].data; 257 | 258 | // Destroy computation graph 259 | ad_destroy_tape(&tp); 260 | 261 | return ret_loss; 262 | } 263 | 264 | void mlp_predict(MLP* nn, float* xs, size_t xs_size, float* out, size_t out_size){ 265 | 266 | Tape tp = {0}; 267 | ad_init_tape(&tp); 268 | 269 | Vector out_vec = _predict(nn, &tp, xs, xs_size); 270 | 271 | for (size_t i = 0; i < out_size; ++i){ 272 | out[i] = tp.val_buf[out_vec.ptr + i].data; 273 | } 274 | 275 | // Clean tape 276 | ad_destroy_tape(&tp); 277 | } 278 | 279 | void mlp_print(MLP* nn){ 280 | printf("------------- MLP model -------------\nlearning_rate = %g\n", nn->learning_rate); 281 | printf("Input layer, (in: %3d): ", nn->layers[0].weights.cols); 282 | for (size_t j = 0; j < nn->layers[0].weights.cols; ++j){ 283 | printf("[n] "); 284 | } 285 | printf("\n"); 286 | for (size_t i = 0; i < nn->num_layers; ++i){ 287 | printf("Layer %d, shape (in: %3d, out: %3d): ", i+1, 288 | nn->layers[i].weights.cols, 289 | nn->layers[i].weights.rows); 290 | for (size_t j = 0; j < nn->layers[i].weights.rows; ++j){ 291 | printf("[n] "); 292 | } 293 | printf("\n"); 294 | } 295 | printf("-------------------------------------\n"); 296 | } 297 | 298 | #define TRAINING_SIZE 
4 299 | 300 | // Input dataset for the XOR problem 301 | float X[TRAINING_SIZE][2] = { 302 | {0.0f, 0.0f}, 303 | {1.0f, 0.0f}, 304 | {0.0f, 1.0f}, 305 | {1.0f, 1.0f}, 306 | }; 307 | 308 | // Ground truth dataset for the XOR problem 309 | float Y[TRAINING_SIZE] = { 310 | 0.0f, 311 | 1.0f, 312 | 1.0f, 313 | 0.0f, 314 | }; 315 | 316 | int main(void){ 317 | 318 | // Initialise multi-layer perceptron 319 | MLP nn = {0}; 320 | float learning_rate = 1.5f; 321 | mlp_init(&nn, learning_rate); 322 | 323 | // Add layers of neurons 324 | mlp_add_layer(&nn, 2, 4, "sigm"); 325 | mlp_add_layer(&nn, 4, 1, "sigm"); 326 | 327 | mlp_print(&nn); 328 | 329 | // Train model and print average loss 330 | printf("Training start...\n"); 331 | #define BATCH_SIZE 1000 332 | float loss; 333 | for (size_t n = 0; n < BATCH_SIZE; ++n){ 334 | loss = 0.0f; 335 | for (size_t i = 0; i < TRAINING_SIZE; ++i){ 336 | loss += mlp_fit(&nn, X[i], 2, Y+i, 1); 337 | } 338 | printf("Average loss: %g\n", loss/TRAINING_SIZE); 339 | } 340 | printf("...Training end\n"); 341 | 342 | // Prediction 343 | float out1, out2, out3, out4; 344 | mlp_predict(&nn, X[0], 2, &out1, 1); 345 | mlp_predict(&nn, X[1], 2, &out2, 1); 346 | mlp_predict(&nn, X[2], 2, &out3, 1); 347 | mlp_predict(&nn, X[3], 2, &out4, 1); 348 | 349 | printf("Prediction for input {0, 0} is %f\n", out1); 350 | printf("Prediction for input {1, 0} is %f\n", out2); 351 | printf("Prediction for input {0, 1} is %f\n", out3); 352 | printf("Prediction for input {1, 1} is %f\n", out4); 353 | 354 | // Destroy model 355 | mlp_destroy(&nn); 356 | 357 | return 0; 358 | } 359 | -------------------------------------------------------------------------------- /img/comp_graph.drawio.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
(drawio SVG markup omitted — the image shows the computation graph with leaf nodes a, b, c, the intermediate node d produced by +, and the result node e produced by *.)
-------------------------------------------------------------------------------- /img/neuron.drawio.svg: --------------------------------------------------------------------------------
(drawio SVG markup omitted — the image shows the neuron computation graph with inputs x1, x2, weights w1, w2, bias b, intermediate nodes xw1, xw2, xw, xwb, and output y = tanh(xwb).)
-------------------------------------------------------------------------------- /img/simple_expr.drawio.svg: --------------------------------------------------------------------------------
(drawio SVG markup omitted — the image shows the expression tree for (2 + 3) * 4 with leaves 2, 3, 4 and intermediate results 5 and 20.)
-------------------------------------------------------------------------------- /img/tape.drawio.svg: --------------------------------------------------------------------------------
(drawio SVG markup omitted — the image shows the tape as a flat array: index 0 holds the nil element and indices 1-5 hold the Val/Op/L/R entries for the computation (2 + 3) * 4.)
-------------------------------------------------------------------------------- /src/autodiff.c: -------------------------------------------------------------------------------- 1 | #include "autodiff.h" 2 | 3 | // For debugging purposes (such as printing the computation tree) 4 | // note that the order matters. 5 | const char* operators[] = { 6 | "add ", "sub ", "mul ", "pow ", "tanh", "relu", "sigm", "nil" 7 | }; 8 | 9 | // Initialise tape by providing pointer to Tape. 10 | // Tape should be destroyed after its use by calling `void ad_destroy_tape(Tape* tp)` 11 | void ad_init_tape(Tape* tp) { 12 | tp->val_buf = calloc(INIT_TAPE_SIZE, sizeof(Value)); 13 | tp->cap = INIT_TAPE_SIZE; 14 | tp->count = 1; 15 | } 16 | 17 | void ad_destroy_tape(Tape* tp) { 18 | free(tp->val_buf); 19 | } 20 | 21 | // Create new floating point (f32) variable value, 22 | // which is a leaf node of the computation graph. 23 | size_t ad_create(Tape* tp, float value){ 24 | if (tp->count >= tp->cap){ 25 | 26 | tp->cap = Extend(tp->cap); 27 | tp->val_buf = realloc(tp->val_buf, sizeof(Value) * tp->cap); 28 | if (!tp->val_buf) { 29 | fprintf(stderr, "Not enough memory, buy more ram!\n"); 30 | exit(1); 31 | } 32 | } 33 | 34 | Value* res = tp->val_buf + tp->count; 35 | res->data = value; 36 | res->grad = 0.0f; 37 | res->left_child = 0; 38 | res->right_child = 0; 39 | res->op = COUNT; 40 | tp->count++; 41 | return tp->count-1; 42 | } 43 | 44 | // Macro to define addition, subtraction, and multiplication functions. 45 | #define AD_OPERATOR_FUNC_BINARY(op_symbol, op_type, op_name) \ 46 | size_t ad_##op_name(Tape* tp, size_t a, size_t b) { \ 47 | float data = GET(a).data op_symbol GET(b).data; \ 48 | size_t out = ad_create(tp, data); \ 49 | GET(out).left_child = a; \ 50 | GET(out).right_child = b; \ 51 | GET(out).op = op_type; \ 52 | return out; \ 53 | }\ 54 | 55 | AD_OPERATOR_FUNC_BINARY(+, ADD, add) 56 | AD_OPERATOR_FUNC_BINARY(-, SUB, sub) 57 | AD_OPERATOR_FUNC_BINARY(*, MUL, mul) 58 | 59 | size_t ad_pow(Tape* tp, size_t a, size_t b){ 60 | float data = powf(GET(a).data, GET(b).data); 61 | size_t out = ad_create(tp, data); 62 | GET(out).left_child = a; 63 | GET(out).right_child = b; 64 | GET(out).op = POW; 65 | return out; 66 | } 67 | 68 | size_t ad_tanh(Tape* tp, size_t a){ 69 | size_t out = ad_create(tp, tanh(GET(a).data)); 70 | GET(out).left_child = a; 71 | GET(out).right_child = 0; 72 | GET(out).op = TANH; 73 | return out; 74 | } 75 | 76 | size_t ad_relu(Tape* tp, size_t a){ 77 | size_t out = ad_create(tp, GET(a).data > 0 ? GET(a).data : 0); 78 | GET(out).left_child = a; 79 | GET(out).right_child = 0; 80 | GET(out).op = RELU; 81 | return out; 82 | } 83 | 84 | float sigmoid(float x) { 85 | return 1/(1 + exp(-x)); 86 | } 87 | 88 | size_t ad_sigm(Tape* tp, size_t a){ 89 | size_t out = ad_create(tp, sigmoid(GET(a).data)); 90 | GET(out).left_child = a; 91 | GET(out).right_child = 0; 92 | GET(out).op = SIGM; 93 | return out; 94 | } 95 | 96 | // For each value on the tape, the gradient of its parents 97 | // is updated, i.e., the local gradients are flowing from the root 98 | // of the computation graph towards the leaves in a topological order. 
99 | void _ad_reverse(Tape* tp, size_t y){ 100 | Value y_deref = GET(y); 101 | switch (y_deref.op){ 102 | case SUB: { 103 | GET(y_deref.left_child).grad += y_deref.grad * 1.0f; 104 | GET(y_deref.right_child).grad += y_deref.grad * -1.0f; 105 | } break; 106 | case ADD: { 107 | GET(y_deref.left_child).grad += y_deref.grad * 1.0f; 108 | GET(y_deref.right_child).grad += y_deref.grad * 1.0f; 109 | } break; 110 | case MUL: { 111 | GET(y_deref.left_child).grad += y_deref.grad * GET(y_deref.right_child).data; 112 | GET(y_deref.right_child).grad += y_deref.grad * GET(y_deref.left_child).data; 113 | } break; 114 | case POW: { 115 | float l_data = GET(y_deref.left_child).data; 116 | float r_data = GET(y_deref.right_child).data; 117 | GET(y_deref.left_child).grad += y_deref.grad * r_data * powf(l_data, r_data - 1); 118 | GET(y_deref.right_child).grad += y_deref.grad * log(l_data) * powf(l_data, r_data); 119 | } break; 120 | case TANH: { 121 | GET(y_deref.left_child).grad += y_deref.grad * (1 - y_deref.data*y_deref.data); 122 | } break; 123 | case RELU: { 124 | if (y_deref.data > 0) 125 | GET(y_deref.left_child).grad += y_deref.grad * 1.0f; 126 | } break; 127 | case SIGM: { 128 | GET(y_deref.left_child).grad += y_deref.grad * y_deref.data * (1 - y_deref.data); 129 | } break; 130 | default: break; 131 | } 132 | } 133 | 134 | void ad_reverse(Tape* tp, size_t y){ 135 | // Initial gradient is always 1 136 | // because the derivative of x w.r.t. x = 1 137 | GET(y).grad = 1.0; 138 | // We traverse the computation graph topologically in reverse order, 139 | // so that every node is visited exactly once. Note that this only works 140 | // when adhering to the provided ad_ api for constructing the computation graph. 141 | // 142 | // When the order of the tape is not topologically consistent, i.e., is not sorted, 143 | // then consider using `void ad_reverse_toposort(Tape* tp, size_t y)` to first sort the nodes 144 | // and only then traverse the graph. 
145 | for (size_t i = tp->count-1; i >= 1; --i){ 146 | _ad_reverse(tp, i); 147 | } 148 | } 149 | 150 | // Depth first search through the computation graph to recursively obtain the topologically sorted nodes 151 | void _ad_topo(Tape *tp, size_t* topo_nodes, bool* visited, size_t y, size_t* count) { 152 | visited[y] = true; 153 | if(GET(y).left_child != 0 && !visited[GET(y).left_child]) { 154 | _ad_topo(tp, topo_nodes, visited, GET(y).left_child, count); 155 | } 156 | if (GET(y).right_child != 0 && !visited[GET(y).right_child]) { 157 | _ad_topo(tp, topo_nodes, visited, GET(y).right_child, count); 158 | } 159 | topo_nodes[*count] = y; 160 | (*count)++; 161 | } 162 | 163 | void ad_reverse_toposort(Tape* tp, size_t y) { 164 | GET(y).grad = 1.0f; 165 | size_t count = 0; 166 | 167 | // List of sorted graph indices 168 | size_t *sorted_nodes = malloc(tp->count * sizeof(size_t)); 169 | bool *visited = calloc(tp->count, sizeof(bool)); 170 | 171 | _ad_topo(tp, sorted_nodes, visited, y, &count); 172 | 173 | // traverse the topologically sorted nodes in reverse order 174 | for (size_t i = count-1; i > 0; --i) { 175 | _ad_reverse(tp, sorted_nodes[i]); 176 | } 177 | 178 | free(sorted_nodes); 179 | free(visited); 180 | } 181 | 182 | // Printing utility 183 | void _ad_print_tree(Tape* tp, size_t y, size_t indent){ 184 | if (y == 0) return; 185 | Value y_deref = GET(y); 186 | for (size_t i = 0; i < indent; ++i) printf(" "); 187 | printf("[idx: %zu, %s] node (data: %g, grad: %g)\n", y, operators[y_deref.op], y_deref.data, y_deref.grad); 188 | _ad_print_tree(tp, y_deref.left_child, indent+4); 189 | _ad_print_tree(tp, y_deref.right_child, indent+4); 190 | } 191 | 192 | void ad_print_tree(Tape* tp, size_t y){ 193 | printf("------------- Computation graph -------------\n"); 194 | _ad_print_tree(tp, y, 0); 195 | printf("--------------------------------------------\n"); 196 | } 197 | 198 | void ad_print_tape(Tape* tp){ 199 | for (size_t i = 0; i < tp->count; ++i){ 200 | printf("val: %2g, index: %3zu, left: %3zu, right: %3zu, op: %s\n", 201 | tp->val_buf[i].data, i, tp->val_buf[i].left_child, tp->val_buf[i].right_child, operators[tp->val_buf[i].op]); 202 | } 203 | } -------------------------------------------------------------------------------- /src/autodiff.h: -------------------------------------------------------------------------------- 1 | #ifndef _AUTODIFF_H 2 | #define _AUTODIFF_H 3 | 4 | #include <stdio.h> 5 | #include <stdlib.h> 6 | #include <math.h> 7 | #include <stdbool.h> 8 | 9 | // Macro for extending the dynamic array 10 | #define Extend(n) (n == 0 ? 8 : (n*2)) 11 | 12 | // Current set of operators. 13 | // Adding more operators implies adding the corresponding functions. 14 | typedef enum { 15 | ADD, 16 | SUB, 17 | MUL, 18 | POW, 19 | TANH, 20 | RELU, 21 | SIGM, 22 | COUNT, 23 | } OpType; 24 | 25 | // Value struct behaving like a node in a graph. 26 | // It is aware of its operator type and holds 27 | // references (indices) to its operands in the array-backed linked list. 28 | typedef struct { 29 | float data; 30 | float grad; 31 | OpType op; 32 | size_t left_child; 33 | size_t right_child; 34 | } Value; 35 | 36 | // The tape struct is a dynamic array of values 37 | // which functions as an array-backed linked list 38 | typedef struct { 39 | Value* val_buf; 40 | size_t count; 41 | size_t cap; 42 | } Tape; 43 | 44 | // Most functions take a Tape pointer named 'tp', 45 | // which makes this lookup macro convenient 46 | #define GET(v) tp->val_buf[(v)] 47 | 48 | // The initial tape size.
49 | // recommended to be a multiple of 2 50 | #define INIT_TAPE_SIZE 8 51 | 52 | // Gradient tape functions 53 | void ad_init_tape(Tape* tape); 54 | void ad_destroy_tape(Tape* tape); 55 | void ad_print_tape(Tape* tp); 56 | 57 | // Create a differentiable value 58 | // that gets added to the provided tape 59 | size_t ad_create(Tape* tp, float value); 60 | 61 | // autodiff API for common operations 62 | // note that the operands are 'size_t' and serve as array pointers for the tape 63 | size_t ad_add(Tape* tp, size_t a, size_t b); 64 | size_t ad_sub(Tape* tp, size_t a, size_t b); 65 | size_t ad_mul(Tape* tp, size_t a, size_t b); 66 | size_t ad_pow(Tape* tp, size_t a, size_t b); 67 | 68 | // Common activation functions 69 | size_t ad_tanh(Tape* tp, size_t a); 70 | size_t ad_relu(Tape* tp, size_t a); 71 | size_t ad_sigm(Tape* tp, size_t a); 72 | 73 | // Compute gradients of value in reverse mode 74 | void ad_reverse(Tape* tp, size_t y); 75 | void ad_reverse_toposort(Tape* tp, size_t y); 76 | 77 | // Print computation tree 78 | void ad_print_tree(Tape* tp, size_t y); 79 | 80 | #endif //_AUTODIFF_H 81 | 82 | -------------------------------------------------------------------------------- /src/main.c: -------------------------------------------------------------------------------- 1 | #include "autodiff.h" 2 | 3 | float mlp_rand(){ 4 | return ((float)rand() / (float)RAND_MAX) * 2.0 - 1.0; 5 | } 6 | 7 | int main(void){ 8 | 9 | Tape tp = {0}; 10 | ad_init_tape(&tp); 11 | 12 | // f(a, b) = (a + b) + ((a + b) + a) -> 3a + 2b 13 | // f' w.r.t. a is 3 14 | // f' w.r.t. b is 2 15 | size_t a = ad_create(&tp, 5); 16 | size_t b = ad_create(&tp, 10); 17 | size_t c = ad_add(&tp, a, b); 18 | c = ad_add(&tp, c, ad_add(&tp, c, a)); 19 | 20 | // ad_reverse(&tp, c); 21 | ad_reverse_toposort(&tp, c); 22 | 23 | ad_print_tape(&tp); 24 | ad_print_tree(&tp, c); 25 | 26 | printf("grad of a: %g\n", tp.val_buf[a].grad); 27 | printf("grad of b: %g\n", tp.val_buf[b].grad); 28 | 29 | ad_destroy_tape(&tp); 30 | return 0; 31 | } 32 | --------------------------------------------------------------------------------