├── .gitignore
├── README.md
├── docs
│   ├── KMeans_Documentation.MD
│   ├── MSE_Score_Documentation.md
│   ├── PCA_Documentation.MD
│   ├── R2_Score_Documentation.md
│   ├── Ridge_Regression_Documentation.md
│   ├── Simple_Imputer_Documentation.md
│   ├── Standard_Scaler_Documentation.md
│   ├── benchmarking_cols.png
│   └── benchmarking_rows.png
├── python
│   ├── env
│   │   ├── environment.yml
│   │   └── req.txt
│   ├── logs
│   │   ├── rustkit_benchmarking.csv
│   │   ├── sklearn_benchmarking.csv
│   │   ├── timing_log.csv
│   │   └── unit_tests_output.txt
│   ├── presentation.ipynb
│   ├── rustkit_benchmarking.py
│   ├── sklearn_benchmarking.py
│   ├── test.py
│   └── unit_tests.py
└── rustkit
    ├── Cargo.lock
    ├── Cargo.toml
    ├── pyproject.toml
    └── src
        ├── benchmarking.rs
        ├── converters.rs
        ├── lib.rs
        ├── main.rs
        ├── preprocessing
        │   ├── mod.rs
        │   ├── simple_imputer.rs
        │   └── standard_scaler.rs
        ├── supervised
        │   ├── mod.rs
        │   └── ridge_regression.rs
        ├── testing
        │   ├── mod.rs
        │   └── regression_metrics.rs
        └── unsupervised
            ├── kmeans.rs
            ├── mod.rs
            └── pca.rs

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
/rustkit/target
/rustkit/rustkit
/python/__pycache__
/python/timing_log.csv

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# rustkit

---

## Overview

`rustkit` is a data science library written in Rust, callable from Python, and inspired by `Scikit-Learn`. The underlying Rust implementation relies on the `nalgebra` crate for fast linear algebra and matrix/vector data structures. As in Scikit-Learn, models are defined as classes (structs) and follow a fit-transform or fit-predict approach (for our supervised, unsupervised, and preprocessing models). We also provide structs for evaluation metrics, such as the $R^2$ score and MSE.

See this [presentation](python/presentation.ipynb) for a more detailed overview of benchmarking and accuracy results, as well as examples of calling `rustkit` from Python.

This project includes Python bindings built with `maturin` and `PyO3`, so these methods and classes can be used as a Python library, also called `rustkit`. To do so, we implemented converter functions that convert `numpy` matrices and vectors into `nalgebra` matrices and vectors, handling generic types and null values. More information on building the library for Python can be found below.

For now, the methods only accept floats (represented as `f64` in Rust). So far, we have implemented the classes below, grouped by type:

- Preprocessing:
  - Scaler
  - Imputer
- Supervised:
  - Ridge Regression
  - With the following Regression Metrics:
    - $R^2$
    - MSE
- Unsupervised:
  - KMeans
  - PCA

**_Note_**
NumPy arrays and pandas DataFrames in Python can contain `None` or `NaN` entries. In Rust, we can represent null entries by storing our data as a matrix of `Option` values, but most matrix operations are not defined on optional values. Thus all of our methods expect **_non-null_** entries, with the exception of `SimpleImputer`, which provides imputation methods to ensure that input data is completely non-null.

## Project Structure

This repo contains a folder defining the rustkit [library](rustkit/), as well as a folder containing unit tests (which compare our output with scikit-learn) and benchmarking in [Python](python/).

### **_rustkit/_**

Our Rust project, `rustkit`, has the following directory structure:

```
rustkit
├───rustkit
│   ├───__init__.py
│   └───**compiled package**
├───src
│   ├───preprocessing
│   ├───supervised
│   ├───testing
│   ├───unsupervised
│   ├───lib.rs
│   └───main.rs
├───Cargo.toml
├───pyproject.toml
└───target
```

- `src` contains all of the Rust code. The core algorithms are organized by type into modules (e.g. preprocessing, supervised, etc.). Documentation for each class can be found below.
- `src/main.rs` prints example uses of all of these algorithms directly in Rust (use `cargo run` from the [rustkit/](rustkit/) folder).
- `src/lib.rs` contains the `pyo3` bindings that expose classes, methods, and functions to the Python package when built.
- `src/benchmarking.rs` contains a wrapper function that times and logs the runtime of functions.
- `src/converters.rs` contains wrapper functions that convert between Python objects and Rust `nalgebra` objects.
- `Cargo.toml` and `pyproject.toml` configure `cargo` and `maturin` to compile and build the crate as a binary, a library, and a Python package.

### **_python/_**

- `python/env/` contains files for easily creating a Python environment with the necessary packages installed.
- `python/logs/` contains benchmarking runtime logs and unit test outputs.
- `python/presentation.ipynb` is a Jupyter notebook that demonstrates the package's functionality and analyzes the benchmarking data.
- `python/rustkit_benchmarking.py` and `python/sklearn_benchmarking.py` contain all benchmarking functions that generate the log data.
- `python/test.py` is the Python equivalent of `rustkit/src/main.rs`.
- `python/unit_tests.py` contains unit tests of `rustkit` methods.

## Quickstart

Create a Python environment with at least Python 3.10, and run `pip install maturin`. To create a conda environment with the necessary requirements, use either of the following files:

- `python/env/environment.yml` - Use the command `conda env create -n <env_name> -f environment.yml python=3.10` to install the necessary requirements.
- `python/env/req.txt` - Use the command `conda create -n <env_name> -f req.txt python=3.10`.

### Building the Library

1. If you don't have the `rustkit/rustkit/` folder, create it.
2. To build the package, run `maturin develop` from the outermost `rustkit/` directory.
   - This command compiles the local Rust crate into a Python module and installs it into your local Python environment, in the `rustkit/rustkit/` folder.
   - Run `maturin develop --release` if you want an optimized build.
3. Within the `rustkit/rustkit/` folder, create `__init__.py` with the following contents:
   ```python
   from .rustkit import *
   from .rustkit import __all__
   ```

Now you should be able to call methods from `rustkit` in Python using `from rustkit import StandardScaler` or `import rustkit`.

## Benchmarking Results

Our benchmarks show `rustkit` performing favorably against `sklearn` for most methods, with `KMeans` as the main exception. We measured the following runtimes:

- `sklearn` method runtime: wallclock time from the function call until it returns in Python. See `python/logs/sklearn_benchmarking.csv` for raw data.
- `rustkit` method runtime: wallclock time from the function call until it returns in Python, including the full conversion to and from Rust/Python objects. See `python/logs/rustkit_benchmarking.csv` for raw data.
- `rustkit` Rust internal runtime: wallclock time from the function call until the function returns in Rust, excluding all Python interoperability overhead. See `python/logs/timing_log.csv` for raw data.

All Python benchmarks were run for 50 iterations. Input matrices ranged from 10 to 1000 rows and from 2 to 50 columns. We ran two tests: one where we fixed the number of features (at 10) and varied the number of examples, and one where we fixed the number of examples (at 1000) and varied the number of features.

Our results show that `rustkit` scales well relative to `sklearn` as the number of examples grows with the number of features held constant at 10. The exception is `KMeans`, where `rustkit` scales poorly. This is because `sklearn` parallelizes `KMeans` across multiple CPU cores, while the current `rustkit` implementation of `KMeans` is not parallelized.
![benchmarking results, rows](docs/benchmarking_rows.png)

We see similar results when we vary the number of features with the number of examples fixed at 1000. For `StandardScaler`, the gap between the `rustkit` runtime measured in Python and the `rustkit` runtime measured in Rust grows significantly. This may be due to unoptimized conversion from `numpy` to `nalgebra`.
![benchmarking results, columns](docs/benchmarking_cols.png)

### Running the Benchmarks

All benchmarking and unit test code is in the `python/` directory.

- `python/rustkit_benchmarking.py` runs all benchmarking tests for `rustkit`. All benchmark output is written to `python/logs/rustkit_benchmarking.csv`.
  If you want to start fresh, you must manually delete the entries in the CSV file.
- All Rust-internal runtimes are written to `timing_log.csv` in the directory from which you run the Python script. This keeps the runtime logs separate, since Rust benchmarking is done directly in the Rust source code.
- `python/sklearn_benchmarking.py` runs all benchmarking tests for `sklearn`. All benchmark output is written to `python/logs/sklearn_benchmarking.csv`. If you want to start fresh, you must manually delete the entries in the CSV file.

The benchmark analysis shown above is produced from the raw runtime logs in `python/presentation.ipynb`.

## Documentation

The underlying data science algorithms are implemented in Rust and rely heavily on the `nalgebra` crate. Below is the documentation for the classes currently available in this project. Each documentation page separates the methods into external methods (those intended for Python integration) and internal methods (the Rust code that actually runs the algorithms and updates parameters). Classes are grouped by type.
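
As a rough illustration of the external/internal split described above, here is a minimal pure-Python sketch. The class `ColumnMeans` and its methods are hypothetical and not part of `rustkit`. The external `fit` validates and converts the input, rejecting `NaN` to mirror the non-null requirement and the `Err(PyValueError)` returns. It then hands the numeric work to an internal `_fit_helper`.

```python
import math


class ColumnMeans:
    """Hypothetical estimator illustrating the external/internal method split."""

    def __init__(self):
        self.means = None

    # External method: validate and convert Python input, then delegate.
    def fit(self, data):
        rows = [[float(v) for v in row] for row in data]
        if not rows or any(len(r) != len(rows[0]) for r in rows):
            raise ValueError("data must be a non-empty rectangular 2D array")
        if any(math.isnan(v) for r in rows for v in r):
            raise ValueError("input contains NaN; impute missing values first")
        self._fit_helper(rows)

    # Internal method: numeric work on data that is already validated.
    def _fit_helper(self, rows):
        n = len(rows)
        self.means = [sum(col) / n for col in zip(*rows)]
```

Keeping validation at the boundary lets the internal helper assume clean, rectangular, non-null input.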

### **Preprocessing**

- [SimpleImputer](docs/Simple_Imputer_Documentation.md)
- [StandardScaler](docs/Standard_Scaler_Documentation.md)

### **Unsupervised**

- [KMeans](docs/KMeans_Documentation.MD)
- [PCA](docs/PCA_Documentation.MD)

### **Supervised**

- [RidgeRegression](docs/Ridge_Regression_Documentation.md)

### **Testing**

- [R2Score](docs/R2_Score_Documentation.md)
- [MSE](docs/MSE_Score_Documentation.md)

--------------------------------------------------------------------------------
/docs/KMeans_Documentation.MD:
--------------------------------------------------------------------------------
# KMeans

`KMeans` is a Rust implementation of the K-Means clustering algorithm, supporting both K-Means++ and random centroid initialization. This implementation integrates with Python via PyO3 and Maturin.

---

## Class Definition

### `KMeans`
- **Fields**:
  - `k`: Number of clusters.
  - `max_iter`: Maximum number of iterations per run.
  - `n_init`: Number of runs; the best clustering result is kept.
  - `init_method`: Initialization method (`KMeansPlusPlus` or `Random`).
  - `centroids`: Centroids of the clusters after fitting (optional).
  - `labels`: Cluster label for each data point after fitting (optional).

---

## Methods (callable from Python)

### `new(k: usize, init_method_str: &str, max_iter: Option, n_init: Option) -> Self`
- **Description**: Creates a new `KMeans` instance with the specified parameters.
- **Parameters**:
  - `k`: Number of clusters.
  - `init_method_str`: Initialization method (`"kmeans++"` or `"random"`).
  - `max_iter`: Maximum number of iterations (default: 200).
  - `n_init`: Number of runs, each with a fresh centroid initialization. The default depends on `init_method` (10 for random initialization, 1 for KMeans++).
- **Returns**: A new `KMeans` instance.

---

### `fit(data: PyReadonlyArray2) -> PyResult<()>`
- **Description**: Fits the KMeans model to the input data by computing centroids and assigning cluster labels.
- **Parameters**:
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
- **Returns**: `Ok(())` on success, `Err(PyValueError)` if an error occurs.

---

### `fit_predict(py: Python, data: PyReadonlyArray2) -> PyResult>>`
- **Description**: Fits the model and returns cluster labels for the input data.
- **Parameters**:
  - `py`: Python GIL token.
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
- **Returns**: Cluster labels as a 1D NumPy array.

---

### `compute_inertia(py: Python, data: PyReadonlyArray2, labels: PyReadonlyArray1) -> PyResult>`
- **Description**: Computes the inertia (sum of squared distances from each point to the centroid of its assigned cluster).
- **Parameters**:
  - `py`: Python GIL token.
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
  - `labels`: Cluster labels as a 1D NumPy array.
- **Returns**: Inertia as a floating-point value.

---

### `predict(py: Python, data: PyReadonlyArray2) -> PyResult>>`
- **Description**: Predicts cluster labels for new data points using the fitted centroids.
- **Parameters**:
  - `py`: Python GIL token.
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
- **Returns**: Predicted labels as a 1D NumPy array.

---

### `centroids(py: Python) -> PyResult>>`
- **Description**: Retrieves the computed centroids of the clusters.
- **Returns**: Centroids as a 2D NumPy array of shape `(n_features, k)`.
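
For reference, the inertia returned by `compute_inertia` corresponds to the standard definition, assuming Euclidean distance. Take data points $x_1, \dots, x_n$, labels $\ell_i \in \{1, \dots, k\}$, and centroids $c_1, \dots, c_k$ (the columns of the centroids matrix). Then:

$$
\text{inertia} = \sum_{i=1}^{n} \left\lVert x_i - c_{\ell_i} \right\rVert_2^2
$$

Lower inertia means tighter clusters. Scikit-learn uses this quantity to choose among its `n_init` runs.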

---

## Internal Methods (Rust Implementation)

### `fit_helper(data: &DMatrix)`
- **Description**: Fits the KMeans model to the input data `n_init` times and keeps the best result.
- **Parameters**:
  - `data`: Dynamic matrix representation of the dataset.

---

### `fit_predict_helper(data: &DMatrix) -> DVector`
- **Description**: Combines `fit_helper` and `predict_helper` to fit the model and predict cluster labels in one step.
- **Parameters**:
  - `data`: Dynamic matrix representation of the dataset.
- **Returns**: Cluster labels as a dynamic vector.

---

### `compute_inertia_helper(data: &DMatrix, labels: &DVector, centroids: &DMatrix) -> f64`
- **Description**: Computes the inertia for the given data, labels, and centroids.
- **Parameters**:
  - `data`: Dynamic matrix representation of the dataset.
  - `labels`: Dynamic vector of cluster labels.
  - `centroids`: Dynamic matrix of centroids.
- **Returns**: Inertia as a floating-point value.

---

### `predict_helper(data: &DMatrix) -> Option>`
- **Description**: Predicts cluster labels for input data using the fitted centroids.
- **Parameters**:
  - `data`: Dynamic matrix representation of the dataset.
- **Returns**: Cluster labels as an optional dynamic vector.

---

### `get_centroids_helper() -> Option<&DMatrix>`
- **Description**: Retrieves the centroids of the clusters.
- **Returns**: Centroids as an optional dynamic matrix.

---

### `run_single(data: &DMatrix) -> (DMatrix, DVector, f64)`
- **Description**: Performs a single run of the KMeans algorithm (one centroid initialization followed by up to `max_iter` iterations).
- **Returns**: A tuple containing centroids, labels, and inertia.
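
The run structure described above can be sketched in pure Python. The sketch uses lists of rows rather than `nalgebra` matrices, so its centroids are rows, not columns. It is a minimal illustration of standard Lloyd iterations with random initialization and best-of-`n_init` selection, not the `rustkit` source. Several details are assumptions: the convergence test (stop once labels stop changing), keeping an empty cluster's old centroid, choosing the best run by lowest inertia, and the `seed` parameter.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_labels(data, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in data]


def update_centroids(data, labels, centroids):
    # Mean of the points assigned to each cluster; an empty cluster keeps its old centroid.
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if members:
            new.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new.append(list(old))
    return new


def run_single(data, k, max_iter, rng):
    centroids = [list(p) for p in rng.sample(data, k)]  # random initialization
    labels = assign_labels(data, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(data, labels, centroids)
        new_labels = assign_labels(data, centroids)
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, inertia


def fit(data, k, max_iter=200, n_init=10, seed=0):
    rng = random.Random(seed)
    # Keep the run with the lowest inertia (assumed meaning of "best result").
    return min((run_single(data, k, max_iter, rng) for _ in range(n_init)),
               key=lambda result: result[2])
```

Each run is independent, so the `n_init` loop is also the natural place to add parallelism.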

---

### `random_init(data: &DMatrix) -> DMatrix`
- **Description**: Initializes centroids by randomly selecting data points.
- **Parameters**:
  - `data`: Dynamic matrix representation of the dataset.
- **Returns**: Centroids as a dynamic matrix.

---

### `kmeans_plus_plus(data: &DMatrix) -> DMatrix`
- **Description**: Initializes centroids using the KMeans++ method.
- **Parameters**:
  - `data`: Dynamic matrix representation of the dataset.
- **Returns**: Centroids as a dynamic matrix.

---

### `assign_labels(data: &DMatrix, centroids: &DMatrix) -> DVector`
- **Description**: Assigns each data point to the nearest centroid.
- **Returns**: Cluster labels as a dynamic vector.

---

### `update_centroids(data: &DMatrix, labels: &DVector) -> DMatrix`
- **Description**: Updates each centroid to the mean of the points assigned to it.
- **Returns**: Updated centroids as a dynamic matrix.

---

## Example Usage (Python)

```python
import numpy as np
from rustkit import KMeans

# Create KMeans instance
kmeans = KMeans(k=3, init_method_str="kmeans++", max_iter=100, n_init=5)

# Fit model
data = np.random.rand(100, 2)
kmeans.fit(data)

# Predict cluster labels
labels = kmeans.predict(data)

# Fit and predict in one step
labels = kmeans.fit_predict(data)

# Compute inertia
inertia = kmeans.compute_inertia(data, labels)

# Access centroids, shape (n_features, k)
centroids = kmeans.centroids()
```

---

## **Notes**

- Centroids are stored as columns in the centroids matrix.
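
K-Means++ seeding conventionally works in two steps. It picks the first centroid uniformly at random. It then picks each subsequent centroid with probability proportional to the point's squared distance from the nearest centroid already chosen. The sketch below is a minimal pure-Python version of that standard scheme, not taken from the `rustkit` source. The `as_columns` helper is hypothetical; it only shows how a row-per-centroid list maps onto the column layout noted above.

```python
import random


def kmeans_plus_plus(data, k, rng):
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # First centroid: a uniformly random data point.
    centroids = [list(rng.choice(data))]
    while len(centroids) < k:
        # D^2 weighting: distance to the nearest already-chosen centroid, squared.
        weights = [min(sq_dist(p, c) for c in centroids) for p in data]
        if sum(weights) == 0:  # every point coincides with an existing centroid
            centroids.append(list(rng.choice(data)))
        else:
            centroids.append(list(rng.choices(data, weights=weights, k=1)[0]))
    return centroids


def as_columns(centroids):
    # Transpose k rows of length n_features into an (n_features, k) layout.
    return [list(col) for col in zip(*centroids)]
```

Points already chosen have zero weight, so with distinct data the seeds are distinct.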

--------------------------------------------------------------------------------
/docs/MSE_Score_Documentation.md:
--------------------------------------------------------------------------------
# MSE

`MSE` is a Rust implementation of the mean squared error (MSE) calculation. It quantifies the average squared difference between true and predicted values, serving as a measure of model accuracy. This implementation integrates with Python via PyO3 and Maturin.

---

## Class Definition

### `MSE`

`MSE` provides methods to compute the mean squared error between true and predicted values.

---

## Methods (callable from Python)

### `compute(py: Python, y_true: PyReadonlyArray1, y_pred: PyReadonlyArray1) -> PyResult>`
- **Description**: Computes the MSE between the true values (`y_true`) and the predicted values (`y_pred`).
- **Parameters**:
  - `py`: Python GIL token.
  - `y_true`: 1D NumPy array of true values.
  - `y_pred`: 1D NumPy array of predicted values.
- **Returns**: MSE as a Python float.
- **Panics**: Raises an error if `y_true` and `y_pred` have different lengths.

---

## Internal Methods (Rust implementation)

### `compute_helper(y_true: &DVector, y_pred: &DVector) -> f64`
- **Description**: Calculates the MSE given two dynamic vectors.
- **Parameters**:
  - `y_true`: Dynamic vector of true values.
  - `y_pred`: Dynamic vector of predicted values.
- **Returns**: MSE.
- **Panics**: Panics if `y_true` and `y_pred` have different lengths.
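
The quantity computed above is $\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$. A minimal pure-Python sketch, not the `rustkit` code, using the same values as the usage example in this page. It raises `ValueError` on mismatched or empty inputs; that error type is this sketch's choice.

```python
def mean_squared_error(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Average of the squared residuals.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_squared_error([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # 0.375
```

The squared residuals are 0.25, 0.25, 0, and 1, so the mean is 1.5 / 4 = 0.375.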

---

## Example Usage (Python)

```python
import numpy as np
from rustkit import MSE

# True and predicted values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Compute MSE
mse = MSE.compute(y_true, y_pred)
print(f"MSE: {mse}")
```

--------------------------------------------------------------------------------
/docs/PCA_Documentation.MD:
--------------------------------------------------------------------------------
# PCA

`PCA` (Principal Component Analysis) is a Rust implementation of a dimensionality reduction algorithm. It reduces the dimensionality of the data while preserving as much of its variance as possible. This implementation integrates with Python via PyO3 and Maturin.

---

## Class Definition

### `PCA`
- **Fields**:
  - `components`: Matrix of principal components.
  - `explained_variance`: Vector of explained variances corresponding to each principal component.
  - `mean`: Row vector of feature means.

---

## Methods (callable from Python)

### `new()`
- **Description**: Creates a new instance of `PCA`.
- **Returns**: `PCA` object.

---

### `fit(data: PyReadonlyArray2, n_components: i64) -> PyResult<()>`
- **Description**: Fits the PCA model to the data, computing principal components and explained variance.
- **Parameters**:
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
  - `n_components`: Number of principal components to compute.
- **Returns**: `Ok(())` on success, `Err(PyValueError)` if an error occurs.

---

### `transform(py: Python, data: PyReadonlyArray2) -> PyResult>>`
- **Description**: Transforms the data into the principal component space.
- **Parameters**:
  - `py`: Python GIL token.
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
- **Returns**: Transformed data as a 2D NumPy array of shape `(n_samples, n_components)`.

---

### `fit_transform(py: Python, data: PyReadonlyArray2, n_components: i64) -> PyResult>>`
- **Description**: Fits the PCA model and transforms the data in one step.
- **Parameters**:
  - `py`: Python GIL token.
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
  - `n_components`: Number of principal components to compute.
- **Returns**: Transformed data as a 2D NumPy array.

---

### `inverse_transform(py: Python, data: PyReadonlyArray2) -> PyResult>>`
- **Description**: Reconstructs the original data from its projection in the principal component space.
- **Parameters**:
  - `py`: Python GIL token.
  - `data`: 2D NumPy array of shape `(n_samples, n_components)`.
- **Returns**: Reconstructed data as a 2D NumPy array.

---

### Getters

#### `components(py: Python) -> PyResult>>`
- **Description**: Returns the principal components.
- **Returns**: 2D NumPy array of shape `(n_features, n_components)`.

#### `explained_variance(py: Python) -> PyResult>>`
- **Description**: Returns the explained variance for each principal component.
- **Returns**: 1D NumPy array.

#### `mean(py: Python) -> PyResult>>`
- **Description**: Returns the mean of each feature.
- **Returns**: 1D NumPy array.

---

### Setters

#### `set_components(components: PyReadonlyArray2) -> PyResult<()>`
- **Description**: Sets the principal components.
- **Parameters**:
  - `components`: 2D NumPy array of principal components.
- **Returns**: `Ok(())`.

#### `set_explained_variance(explained_variance: PyReadonlyArray1) -> PyResult<()>`
- **Description**: Sets the explained variance for each principal component.
- **Parameters**:
  - `explained_variance`: 1D NumPy array.
- **Returns**: `Ok(())`.

#### `set_mean(mean: PyReadonlyArray1) -> PyResult<()>`
- **Description**: Sets the mean of each feature.
- **Parameters**:
  - `mean`: 1D NumPy array.
- **Returns**: `Ok(())`.

---

## Internal Methods (Rust Implementation)

### `fit_helper(data: &DMatrix, n_components: usize)`
- **Description**: Computes the principal components and explained variance.
- **Parameters**:
  - `data`: Dynamic matrix of shape `(n_samples, n_features)`.
  - `n_components`: Number of principal components to compute.

---

### `transform_helper(data: &DMatrix) -> DMatrix`
- **Description**: Transforms the data into the principal component space.
- **Parameters**:
  - `data`: Dynamic matrix of shape `(n_samples, n_features)`.
- **Returns**: Transformed data as a dynamic matrix.

---

### `fit_transform_helper(data: &DMatrix, n_components: usize) -> DMatrix`
- **Description**: Combines fitting the PCA model and transforming the data.
- **Parameters**:
  - `data`: Dynamic matrix of shape `(n_samples, n_features)`.
  - `n_components`: Number of principal components to compute.
- **Returns**: Transformed data as a dynamic matrix.

---

### `inverse_transform_helper(data: &DMatrix) -> DMatrix`
- **Description**: Reconstructs the original data from its projection in the principal component space.
- **Parameters**:
  - `data`: Dynamic matrix of shape `(n_samples, n_components)`.
- **Returns**: Reconstructed data as a dynamic matrix.
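
Once the model is fitted, the transform helpers reduce to centering plus a matrix product. The sketch below assumes the conventional formulas $Z = (X - \mu)W$ and $\hat{X} = Z W^\top + \mu$. Here $W$ is the `(n_features, n_components)` components matrix returned by the getter, with orthonormal columns, and $\mu$ is the feature mean. The SVD step that produces $W$ is omitted. This is a pure-Python illustration, not the `rustkit` implementation.

```python
import math


def transform(data, mean, components):
    # Center each row, then project onto the columns of `components`.
    n_comp = len(components[0])
    return [[sum((x - m) * components[f][c] for f, (x, m) in enumerate(zip(row, mean)))
             for c in range(n_comp)] for row in data]


def inverse_transform(scores, mean, components):
    # Map back to feature space: x_hat = z @ components^T + mean.
    return [[m + sum(z[c] * components[f][c] for c in range(len(z)))
             for f, m in enumerate(mean)] for z in scores]


s = 1 / math.sqrt(2)
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean = [sum(col) / len(data) for col in zip(*data)]
components = [[s], [s]]  # one unit-length component along the line y = x
scores = transform(data, mean, components)
restored = inverse_transform(scores, mean, components)
```

Because these points lie exactly on the first component, the one-component reconstruction recovers them up to floating-point error. In general, `inverse_transform` only recovers the part of the data that the kept components span.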

---

## Example Usage (Python)

```python
import numpy as np
from rustkit import PCA

# Create and fit PCA
pca = PCA()
data = np.random.rand(100, 5)
pca.fit(data, n_components=2)

# Transform data
transformed = pca.transform(data)

# Fit and transform in one step
fit_transformed = pca.fit_transform(data, n_components=2)

# Reconstruct data
reconstructed = pca.inverse_transform(transformed)
```

---

## **Notes**

- The implementation currently uses full Singular Value Decomposition (SVD), which may be inefficient for large datasets. Future versions may use a partial (truncated) SVD.

--------------------------------------------------------------------------------
/docs/R2_Score_Documentation.md:
--------------------------------------------------------------------------------
# R2Score

`R2Score` is a Rust implementation of the R² score (coefficient of determination). It measures the proportion of variance in the dependent variable that is predictable from the independent variable(s). This implementation integrates with Python via PyO3 and Maturin.

---

## Class Definition

### `R2Score`

`R2Score` provides methods to compute the R² score between true and predicted values.

---

## Methods (callable from Python)

### `compute(py: Python, y_true: PyReadonlyArray1, y_pred: PyReadonlyArray1) -> PyResult>`
- **Description**: Computes the R² score between the true values (`y_true`) and the predicted values (`y_pred`).
- **Parameters**:
  - `py`: Python GIL token.
  - `y_true`: 1D NumPy array of true values.
  - `y_pred`: 1D NumPy array of predicted values.
- **Returns**: R² score as a Python float.
- **Panics**: Raises an error if `y_true` and `y_pred` have different lengths.
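
The R² score is $1 - SS_{res}/SS_{tot}$. Here $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares and $SS_{tot} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares around the mean of `y_true`. A minimal pure-Python sketch, not the `rustkit` code, using the values from this page's usage example. Raising on constant `y_true`, where R² is undefined, is this sketch's choice; `rustkit`'s behavior in that case is not documented here.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


print(round(r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]), 4))  # 0.9486
```

With a mean of 2.875, $SS_{tot} = 29.1875$ and $SS_{res} = 1.5$, giving $1 - 1.5/29.1875 \approx 0.9486$.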

---

## Internal Methods (Rust implementation)

### `compute_helper(y_true: &DVector, y_pred: &DVector) -> f64`
- **Description**: Calculates the R² score given two dynamic vectors.
- **Parameters**:
  - `y_true`: Dynamic vector of true values.
  - `y_pred`: Dynamic vector of predicted values.
- **Returns**: R² score.
- **Panics**: Panics if `y_true` and `y_pred` have different lengths.

---

## Example Usage (Python)

```python
import numpy as np
from rustkit import R2Score

# True and predicted values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Compute R² score
r2_score = R2Score.compute(y_true, y_pred)
print(f"R² Score: {r2_score}")
```

--------------------------------------------------------------------------------
/docs/Ridge_Regression_Documentation.md:
--------------------------------------------------------------------------------
# RidgeRegression

`RidgeRegression` is a Rust implementation of Ridge Regression. The model minimizes the residual sum of squares plus an L2 penalty on the coefficients. Penalizing large coefficients can improve generalization. The implementation supports Python interop using PyO3 and Maturin.

---

## Class Definition

### `RidgeRegression`
- **Fields**:
  - `weights`: Vector of model weights (coefficients).
  - `intercept`: Optional bias term (intercept) for the model.
  - `regularization`: Regularization strength (λ). Set to `0.0` for standard linear regression.
  - `with_bias`: Boolean flag indicating whether a bias term is included.

---

## Methods (callable from Python)

### `new(regularization: f64, with_bias: bool) -> Self`
- **Description**: Creates a new instance of `RidgeRegression`.
- **Parameters**:
  - `regularization`: L2 regularization strength. Set to `0.0` for standard linear regression.
  - `with_bias`: Whether to include a bias term in the model.
- **Returns**: `RidgeRegression` object.

---

### `weights(py: Python) -> PyResult>>`
- **Description**: Retrieves the model weights (coefficients).
- **Parameters**:
  - `py`: Python GIL token.
- **Returns**: Weights as a 1D NumPy array.

---

### `intercept(py: Python) -> PyResult>`
- **Description**: Retrieves the model intercept (bias term), if applicable.
- **Parameters**:
  - `py`: Python GIL token.
- **Returns**: Intercept as a Python float, or `None`.

---

### `set_regularization(regularization: f64) -> PyResult<()>`
- **Description**: Updates the regularization strength.
- **Parameters**:
  - `regularization`: New L2 regularization strength.
- **Returns**: `Ok(())` on success.

---

### `set_with_bias(with_bias: bool) -> PyResult<()>`
- **Description**: Sets whether a bias term is included in the model.
- **Parameters**:
  - `with_bias`: Whether to include a bias term in the model.
- **Returns**: `Ok(())` on success.

---

### `set_weights(weights: PyReadonlyArray1) -> PyResult<()>`
- **Description**: Sets custom model weights.
- **Parameters**:
  - `weights`: 1D NumPy array of weights.
- **Returns**: `Ok(())` on success.

---

### `set_intercept(intercept: Option) -> PyResult<()>`
- **Description**: Sets a custom model intercept.
- **Parameters**:
  - `intercept`: Optional intercept value.
- **Returns**: `Ok(())` on success.

---

### `fit(data: PyReadonlyArray2, target: PyReadonlyArray1) -> PyResult<()>`
- **Description**: Fits the Ridge Regression model to the provided data.
- **Parameters**:
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
  - `target`: 1D NumPy array of target values of shape `(n_samples,)`.
- **Returns**: `Ok(())` on success, `Err(PyValueError)` if an error occurs.

---

### `predict(py: Python, data: PyReadonlyArray2) -> PyResult>>`
- **Description**: Predicts target values for the provided input data.
- **Parameters**:
  - `py`: Python GIL token.
  - `data`: 2D NumPy array of shape `(n_samples, n_features)`.
- **Returns**: Predictions as a 1D NumPy array.

---

## Internal Methods (Rust Implementation)

### `fit_helper(x: &DMatrix, y: &DVector)`
- **Description**: Performs Ridge Regression fitting using LU decomposition to solve for the weights and intercept.
- **Parameters**:
  - `x`: Dynamic matrix of input data.
  - `y`: Dynamic vector of target values.

---

### `predict_helper(x: &DMatrix) -> DVector`
- **Description**: Computes predictions using the fitted model weights and intercept.
- **Parameters**:
  - `x`: Dynamic matrix of input data.
- **Returns**: Dynamic vector of predictions.

---

### `weights_helper() -> &DVector`
- **Description**: Retrieves the model weights (coefficients).
- **Returns**: Reference to the weights vector.

---

### `intercept_helper() -> Option`
- **Description**: Retrieves the model intercept (bias term), if available.
- **Returns**: Optional intercept value.
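
Solving for the weights with a factorization instead of an explicit inverse, as `fit_helper` is described to do, corresponds to solving the regularized normal equations $(X^\top X + \lambda I)\,w = X^\top y$. The minimal pure-Python sketch below does this with Gaussian elimination with partial pivoting. That method computes the same factorization as LU, but it is a stand-in, not `nalgebra`'s LU routine. These docs do not state how `rustkit` handles the intercept. Centering the data here, so the intercept is not penalized (as scikit-learn's `Ridge` does), is an assumption.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(x, y, regularization, with_bias=True):
    n, d = len(x), len(x[0])
    if with_bias:
        # Center features and target so the intercept is not penalized (assumption).
        x_mean = [sum(col) / n for col in zip(*x)]
        y_mean = sum(y) / n
        xc = [[v - mu for v, mu in zip(row, x_mean)] for row in x]
        yc = [v - y_mean for v in y]
    else:
        xc, yc = x, y
    # Normal equations with the L2 penalty on the diagonal: (X^T X + lambda I) w = X^T y
    a = [[sum(r[i] * r[j] for r in xc) + (regularization if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(xc, yc)) for i in range(d)]
    w = solve(a, b)
    intercept = y_mean - sum(wi * mu for wi, mu in zip(w, x_mean)) if with_bias else None
    return w, intercept
```

With `regularization=0.0` and singular $X^\top X$ (for example, duplicated features), `solve` divides by zero. A positive λ keeps the system invertible, which is one practical benefit of the penalty.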
122 | 123 | --- 124 | 125 | ## Example Usage (Python) 126 | 127 | ```python 128 | import numpy as np 129 | from rustkit import RidgeRegression 130 | 131 | # Create Ridge Regression model 132 | ridge = RidgeRegression(regularization=1.0, with_bias=True) 133 | 134 | # Example data 135 | X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]) 136 | y = np.array([1.0, 2.0, 3.0]) 137 | 138 | # Fit the model 139 | ridge.fit(X, y) 140 | 141 | # Make predictions 142 | predictions = ridge.predict(X) 143 | 144 | # Access weights and intercept 145 | weights = ridge.weights 146 | intercept = ridge.intercept 147 | 148 | ``` 149 | 150 | --- 151 | 152 | ## **Notes** 153 | 154 | - This implementation assumes that input data is normalized. If not normalized, set regularization = 0.0 for correct results. 155 | - Uses LU decomposition for efficient computation instead of directly inverting the matrix. -------------------------------------------------------------------------------- /docs/Simple_Imputer_Documentation.md: -------------------------------------------------------------------------------- 1 | # Imputer 2 | 3 | `Imputer` is a Rust implementation of a data imputation utility inspired by Scikit-learn. It allows replacing missing values in a dataset using a specified imputation strategy. This implementation integrates with Python via PyO3 and Maturin. 4 | 5 | --- 6 | 7 | ## Class Definition 8 | 9 | ### `Imputer` 10 | - **Fields**: 11 | - `strategy`: Imputation strategy, either `Mean` or `Constant(f64)`. 12 | - `impute_values`: Optional vector of computed imputation values for each column. 13 | 14 | --- 15 | 16 | ## Methods (callable from Python) 17 | 18 | ### `new(strategy: &str, value: Option) -> Self` 19 | - **Description**: Creates a new instance of `Imputer` with the specified imputation strategy. 20 | - **Parameters**: 21 | - `strategy`: The imputation strategy. Accepted values: 22 | - `"mean"`: Replace missing values with the mean of the column.
23 | - `"constant"`: Replace missing values with a constant value. 24 | - `value`: The constant value for the `"constant"` strategy. Ignored if the strategy is `"mean"`. 25 | - **Returns**: `Imputer` object. 26 | 27 | --- 28 | 29 | ### `fit(data: PyReadonlyArray2) -> PyResult<()>` 30 | - **Description**: Computes the imputation values for each column in the dataset. 31 | - **Parameters**: 32 | - `data`: 2D NumPy array of shape `(n_samples, n_features)`. Missing values should be represented as `NaN`. 33 | - **Returns**: `Ok(())` on success, `Err(PyValueError)` if an error occurs. 34 | 35 | --- 36 | 37 | ### `transform(py: Python, data: PyReadonlyArray2) -> PyResult>>` 38 | - **Description**: Imputes missing values in the input dataset using the precomputed values (assumes fit has already been called). 39 | - **Parameters**: 40 | - `py`: Python GIL token. 41 | - `data`: 2D NumPy array of shape `(n_samples, n_features)`. 42 | - **Returns**: Imputed data as a 2D NumPy array. 43 | 44 | --- 45 | 46 | ### `fit_transform(py: Python, data: PyReadonlyArray2) -> PyResult>>` 47 | - **Description**: Computes the imputation values and imputes missing values in the input dataset. 48 | - **Parameters**: 49 | - `py`: Python GIL token. 50 | - `data`: 2D NumPy array of shape `(n_samples, n_features)`. 51 | - **Returns**: Imputed data as a 2D NumPy array. 52 | 53 | --- 54 | 55 | ## Internal Methods (Rust Implementation) 56 | 57 | ### `fit_helper(data: &DMatrix>) -> Result<(), ImputerError>` 58 | - **Description**: Computes the imputation values for each column based on the specified strategy. 59 | - **Parameters**: 60 | - `data`: Dynamic matrix of optional values representing the dataset. 61 | - **Returns**: `Ok(())` on success, `Err(ImputerError)` if a column contains only missing values. 62 | 63 | --- 64 | 65 | ### `transform_helper(data: &DMatrix>) -> DMatrix` 66 | - **Description**: Imputes missing values in the input data using precomputed imputation values. 
67 | - **Parameters**: 68 | - `data`: Dynamic matrix of optional values representing the dataset. 69 | - **Returns**: Imputed data as a dynamic matrix. 70 | 71 | --- 72 | 73 | ### `fit_transform_helper(data: &DMatrix>) -> Result, ImputerError>` 74 | - **Description**: Combines `fit_helper` and `transform_helper` to compute and impute missing values. 75 | - **Parameters**: 76 | - `data`: Dynamic matrix of optional values representing the dataset. 77 | - **Returns**: Imputed data as a dynamic matrix, or an error if a column contains only missing values. 78 | 79 | --- 80 | 81 | ## Example Usage (Python) 82 | 83 | ```python 84 | import numpy as np 85 | from rustkit import Imputer 86 | 87 | # Create and fit an imputer 88 | imputer = Imputer("mean", None) 89 | data = np.array([[1.0, 2.0, np.nan], [3.0, np.nan, 6.0], [7.0, 8.0, 9.0]]) 90 | imputer.fit(data) 91 | 92 | # Transform data 93 | transformed = imputer.transform(data) 94 | 95 | # Fit and transform in one step 96 | fit_transformed = imputer.fit_transform(data) 97 | ``` 98 | 99 | ## **Notes** 100 | 101 | - The "mean" strategy computes the mean of non-missing values in each column. 102 | - The "constant" strategy replaces missing values with a specified constant value. 103 | - Missing values in the input data must be represented as `NaN` for compatibility with NumPy. 104 | - Columns with all missing values will raise an `ImputerError` during fitting with the "mean" strategy. 105 | -------------------------------------------------------------------------------- /docs/Standard_Scaler_Documentation.md: -------------------------------------------------------------------------------- 1 | # StandardScaler 2 | 3 | `StandardScaler` is a Rust implementation of a data preprocessing utility inspired by Scikit-learn. It standardizes features by removing the mean and scaling to unit variance. This implementation integrates with Python via PyO3 and Maturin. 
Each of the methods callable from Python makes use of a corresponding `_helper` which actually implements that method in Rust. 4 | 5 | --- 6 | 7 | ## Class Definition 8 | 9 | ### `StandardScaler` 10 | - **Fields**: 11 | - `means`: Optional vector of column-wise means. 12 | - `std_devs`: Optional vector of column-wise standard deviations. 13 | 14 | --- 15 | 16 | ## Methods (callable from Python) 17 | 18 | ### `new()` 19 | - **Description**: Creates a new instance of `StandardScaler`. 20 | - **Returns**: `StandardScaler` object. 21 | 22 | --- 23 | 24 | ### `fit(data: PyReadonlyArray2) -> PyResult<()>` 25 | - **Description**: Computes the mean and standard deviation for each feature in the input dataset. 26 | - **Parameters**: 27 | - `data`: 2D NumPy array of shape `(n_samples, n_features)`. 28 | - **Returns**: `Ok(())` on success, `Err(PyValueError)` if an error occurs. 29 | 30 | --- 31 | 32 | ### `transform(py: Python, data: PyReadonlyArray2) -> PyResult>>` 33 | - **Description**: Standardizes the input data using the precomputed means and standard deviations. 34 | - **Parameters**: 35 | - `py`: Python GIL token. 36 | - `data`: 2D NumPy array of shape `(n_samples, n_features)`. 37 | - **Returns**: Transformed data as a 2D NumPy array. 38 | 39 | --- 40 | 41 | ### `fit_transform(py: Python, data: PyReadonlyArray2) -> PyResult>>` 42 | - **Description**: Computes the means and standard deviations, then standardizes the input data. 43 | - **Parameters**: 44 | - `py`: Python GIL token. 45 | - `data`: 2D NumPy array of shape `(n_samples, n_features)`. 46 | - **Returns**: Transformed data as a 2D NumPy array. 47 | 48 | --- 49 | 50 | ### `inverse_transform(py: Python, scaled_data: PyReadonlyArray2) -> PyResult>>` 51 | - **Description**: Reverts standardized data back to its original scale. 52 | - **Parameters**: 53 | - `py`: Python GIL token. 54 | - `scaled_data`: 2D NumPy array of standardized data `(n_samples, n_features)`. 
55 | - **Returns**: Original data as a 2D NumPy array. 56 | 57 | --- 58 | 59 | ## Internal Methods (Rust implementation) 60 | 61 | ### `fit_helper(data: &DMatrix)` 62 | - **Description**: Computes column-wise means and standard deviations from a dynamic matrix. 63 | - **Parameters**: 64 | - `data`: Dynamic matrix representation of the dataset. 65 | 66 | --- 67 | 68 | ### `transform_helper(data: &DMatrix) -> DMatrix` 69 | - **Description**: Standardizes the input data using precomputed means and standard deviations. 70 | - **Parameters**: 71 | - `data`: Dynamic matrix representation of the dataset. 72 | - **Returns**: Standardized data as a dynamic matrix. 73 | 74 | --- 75 | 76 | ### `fit_transform_helper(data: &DMatrix) -> DMatrix` 77 | - **Description**: Combines `fit_helper` and `transform_helper` to compute and standardize data. 78 | - **Parameters**: 79 | - `data`: Dynamic matrix representation of the dataset. 80 | - **Returns**: Standardized data as a dynamic matrix. 81 | 82 | --- 83 | 84 | ### `inverse_transform_helper(scaled_data: &DMatrix) -> DMatrix` 85 | - **Description**: Reverts standardized data back to its original scale. 86 | - **Parameters**: 87 | - `scaled_data`: Dynamic matrix of standardized data. 88 | - **Returns**: Original data as a dynamic matrix. 89 | 90 | --- 91 | 92 | ## Example Usage (Python) 93 | 94 | ```python 95 | import numpy as np 96 | from rustkit import StandardScaler 97 | 98 | # Create and fit scaler 99 | scaler = StandardScaler() 100 | data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]) 101 | scaler.fit(data) 102 | 103 | # Transform data 104 | transformed = scaler.transform(data) 105 | 106 | # Fit and transform in one step 107 | fit_transformed = scaler.fit_transform(data) 108 | 109 | # Revert transformed data 110 | original = scaler.inverse_transform(transformed) 111 | ``` 112 | 113 | --- 114 | 115 | ## **Notes** 116 | 117 | - This implementation currently computes standard deviation using n (population standard deviation). 
Future versions may add an option for n-1 (sample standard deviation). 118 | -------------------------------------------------------------------------------- /docs/benchmarking_cols.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rosewang01/rustkit-learn/f0ba147980c4f820e2fc10b2abf9439112d1417d/docs/benchmarking_cols.png -------------------------------------------------------------------------------- /docs/benchmarking_rows.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rosewang01/rustkit-learn/f0ba147980c4f820e2fc10b2abf9439112d1417d/docs/benchmarking_rows.png -------------------------------------------------------------------------------- /python/env/environment.yml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rosewang01/rustkit-learn/f0ba147980c4f820e2fc10b2abf9439112d1417d/python/env/environment.yml -------------------------------------------------------------------------------- /python/env/req.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rosewang01/rustkit-learn/f0ba147980c4f820e2fc10b2abf9439112d1417d/python/env/req.txt -------------------------------------------------------------------------------- /python/logs/rustkit_benchmarking.csv: -------------------------------------------------------------------------------- 1 | PCA::fit_transform,10,10,0.0005188608169555665 2 | StandardScaler::fit_transform,10,10,0.00038103103637695314 3 | RidgeRegression::fit,10,10,0.0004150247573852539 4 | R2Score::compute,1,10,0.0003993511199951172 5 | MSE::compute,1,10,0.00040621280670166015 6 | KMeans::fit,10,10,0.0006931591033935547 7 | KMeans(Random)::fit,10,10,0.0006873273849487305 8 | PCA::fit_transform,50,10,0.0005540895462036133 9 | 
StandardScaler::fit_transform,50,10,0.0004749298095703125 10 | RidgeRegression::fit,50,10,0.00048160552978515625 11 | R2Score::compute,1,50,0.00044877052307128906 12 | MSE::compute,1,50,0.000391693115234375 13 | KMeans::fit,50,10,0.001981062889099121 14 | KMeans(Random)::fit,50,10,0.0018183040618896485 15 | PCA::fit_transform,100,10,0.0005147695541381836 16 | StandardScaler::fit_transform,100,10,0.00043550491333007815 17 | RidgeRegression::fit,100,10,0.0004017496109008789 18 | R2Score::compute,1,100,0.0004363250732421875 19 | MSE::compute,1,100,0.0004455232620239258 20 | KMeans::fit,100,10,0.004171614646911621 21 | KMeans(Random)::fit,100,10,0.0035326814651489256 22 | PCA::fit_transform,250,10,0.0005501174926757813 23 | StandardScaler::fit_transform,250,10,0.00038709640502929685 24 | RidgeRegression::fit,250,10,0.00044628620147705076 25 | R2Score::compute,1,250,0.0004374408721923828 26 | MSE::compute,1,250,0.00040935039520263673 27 | KMeans::fit,250,10,0.010909204483032226 28 | KMeans(Random)::fit,250,10,0.013869271278381348 29 | PCA::fit_transform,500,10,0.0005493545532226562 30 | StandardScaler::fit_transform,500,10,0.00045832157135009764 31 | RidgeRegression::fit,500,10,0.00047684669494628905 32 | R2Score::compute,1,500,0.00045699119567871094 33 | MSE::compute,1,500,0.000428466796875 34 | KMeans::fit,500,10,0.016373496055603027 35 | KMeans(Random)::fit,500,10,0.03104447841644287 36 | PCA::fit_transform,750,10,0.0005922412872314453 37 | StandardScaler::fit_transform,750,10,0.00047014236450195314 38 | RidgeRegression::fit,750,10,0.00045459747314453126 39 | R2Score::compute,1,750,0.0004418659210205078 40 | MSE::compute,1,750,0.0004158592224121094 41 | KMeans::fit,750,10,0.029011554718017578 42 | KMeans(Random)::fit,750,10,0.046846356391906735 43 | PCA::fit_transform,1000,10,0.0008000564575195312 44 | StandardScaler::fit_transform,1000,10,0.00048412322998046874 45 | RidgeRegression::fit,1000,10,0.000489511489868164 46 | R2Score::compute,1,1000,0.0004176950454711914 
47 | MSE::compute,1,1000,0.00044550895690917967 48 | KMeans::fit,1000,10,0.03484732627868652 49 | KMeans(Random)::fit,1000,10,0.06919992446899415 50 | PCA::fit_transform,1000,2,0.00046601295471191404 51 | StandardScaler::fit_transform,1000,2,0.0005391645431518555 52 | RidgeRegression::fit,1000,2,0.00048605918884277345 53 | R2Score::compute,1,1000,0.0005372047424316406 54 | MSE::compute,1,1000,0.00037721633911132814 55 | KMeans::fit,1000,2,0.030015969276428224 56 | KMeans(Random)::fit,1000,2,0.057054877281188965 57 | PCA::fit_transform,1000,5,0.0004772806167602539 58 | StandardScaler::fit_transform,1000,5,0.0005111885070800781 59 | RidgeRegression::fit,1000,5,0.0004412364959716797 60 | R2Score::compute,1,1000,0.00043361663818359376 61 | MSE::compute,1,1000,0.0004312229156494141 62 | KMeans::fit,1000,5,0.03914988040924072 63 | KMeans(Random)::fit,1000,5,0.06047675132751465 64 | PCA::fit_transform,1000,10,0.0008242845535278321 65 | StandardScaler::fit_transform,1000,10,0.0004401206970214844 66 | RidgeRegression::fit,1000,10,0.0005350351333618164 67 | R2Score::compute,1,1000,0.00040287017822265625 68 | MSE::compute,1,1000,0.0004132556915283203 69 | KMeans::fit,1000,10,0.047134113311767575 70 | KMeans(Random)::fit,1000,10,0.06979289054870605 71 | PCA::fit_transform,1000,25,0.0010766363143920898 72 | StandardScaler::fit_transform,1000,25,0.000578618049621582 73 | RidgeRegression::fit,1000,25,0.0005658864974975586 74 | R2Score::compute,1,1000,0.000423884391784668 75 | MSE::compute,1,1000,0.00042859077453613284 76 | KMeans::fit,1000,25,0.042478199005126956 77 | KMeans(Random)::fit,1000,25,0.10140793800354003 78 | PCA::fit_transform,1000,50,0.0023641443252563478 79 | StandardScaler::fit_transform,1000,50,0.002023940086364746 80 | RidgeRegression::fit,1000,50,0.00125823974609375 81 | R2Score::compute,1,1000,0.00042165279388427734 82 | MSE::compute,1,1000,0.0004620504379272461 83 | KMeans::fit,1000,50,0.07120592594146728 84 | KMeans(Random)::fit,1000,50,0.1354771089553833 85 
| -------------------------------------------------------------------------------- /python/logs/sklearn_benchmarking.csv: -------------------------------------------------------------------------------- 1 | PCA::fit_transform,10,10,0.0010920143127441406 2 | StandardScaler::fit_transform,10,10,0.0009880733489990234 3 | RidgeRegression::fit,10,10,0.0013889074325561523 4 | R2Score::compute,1,10,0.0008965110778808593 5 | MSE::compute,1,10,2.0623207092285156e-05 6 | KMeans::fit,10,10,0.008584160804748536 7 | KMeans(Random)::fit,10,10,0.03399446487426758 8 | PCA::fit_transform,50,10,0.0012941503524780274 9 | StandardScaler::fit_transform,50,10,0.0008196830749511719 10 | RidgeRegression::fit,50,10,0.0012430524826049804 11 | R2Score::compute,1,50,0.0007856178283691406 12 | MSE::compute,1,50,1.9984245300292967e-05 13 | KMeans::fit,50,10,0.004307894706726074 14 | KMeans(Random)::fit,50,10,0.034214291572570804 15 | PCA::fit_transform,100,10,0.0013422393798828126 16 | StandardScaler::fit_transform,100,10,0.0011043453216552734 17 | RidgeRegression::fit,100,10,0.0014276599884033203 18 | R2Score::compute,1,100,0.0008810806274414062 19 | MSE::compute,1,100,1.9979476928710937e-05 20 | KMeans::fit,100,10,0.004336023330688476 21 | KMeans(Random)::fit,100,10,0.0328058385848999 22 | PCA::fit_transform,250,10,0.0010349464416503907 23 | StandardScaler::fit_transform,250,10,0.0007710123062133789 24 | RidgeRegression::fit,250,10,0.002090301513671875 25 | R2Score::compute,1,250,0.001070575714111328 26 | MSE::compute,1,250,2.1524429321289063e-05 27 | KMeans::fit,250,10,0.005083417892456055 28 | KMeans(Random)::fit,250,10,0.03514935493469238 29 | PCA::fit_transform,500,10,0.0010472917556762695 30 | StandardScaler::fit_transform,500,10,0.0012758588790893555 31 | RidgeRegression::fit,500,10,0.0015604877471923828 32 | R2Score::compute,1,500,0.0013840866088867187 33 | MSE::compute,1,500,3.302574157714844e-05 34 | KMeans::fit,500,10,0.00586606502532959 35 | 
KMeans(Random)::fit,500,10,0.04117615222930908 36 | PCA::fit_transform,750,10,0.0014310359954833984 37 | StandardScaler::fit_transform,750,10,0.0014940738677978516 38 | RidgeRegression::fit,750,10,0.0016718673706054687 39 | R2Score::compute,1,750,0.0007764577865600586 40 | MSE::compute,1,750,1.9922256469726563e-05 41 | KMeans::fit,750,10,0.005638890266418457 42 | KMeans(Random)::fit,750,10,0.05279683589935303 43 | PCA::fit_transform,1000,10,0.0012772417068481446 44 | StandardScaler::fit_transform,1000,10,0.0009987497329711915 45 | RidgeRegression::fit,1000,10,0.0017050647735595704 46 | R2Score::compute,1,1000,0.0006479644775390626 47 | MSE::compute,1,1000,2.005577087402344e-05 48 | KMeans::fit,1000,10,0.004474177360534668 49 | KMeans(Random)::fit,1000,10,0.03486660480499268 50 | PCA::fit_transform,1000,2,0.000982050895690918 51 | StandardScaler::fit_transform,1000,2,0.0011439180374145507 52 | RidgeRegression::fit,1000,2,0.0013374805450439454 53 | R2Score::compute,1,1000,0.0007796525955200195 54 | MSE::compute,1,1000,2.078533172607422e-05 55 | KMeans::fit,1000,2,0.004116382598876953 56 | KMeans(Random)::fit,1000,2,0.03314480304718018 57 | PCA::fit_transform,1000,5,0.0014568710327148438 58 | StandardScaler::fit_transform,1000,5,0.0016419076919555664 59 | RidgeRegression::fit,1000,5,0.0015299558639526368 60 | R2Score::compute,1,1000,0.0008126688003540039 61 | MSE::compute,1,1000,4.001617431640625e-05 62 | KMeans::fit,1000,5,0.005040826797485351 63 | KMeans(Random)::fit,1000,5,0.036921024322509766 64 | PCA::fit_transform,1000,10,0.001308903694152832 65 | StandardScaler::fit_transform,1000,10,0.0014638996124267578 66 | RidgeRegression::fit,1000,10,0.0016265583038330078 67 | R2Score::compute,1,1000,0.0008108186721801758 68 | MSE::compute,1,1000,2.0251274108886718e-05 69 | KMeans::fit,1000,10,0.005084161758422852 70 | KMeans(Random)::fit,1000,10,0.044522757530212405 71 | PCA::fit_transform,1000,25,0.0022488927841186526 72 | 
StandardScaler::fit_transform,1000,25,0.001184096336364746 73 | RidgeRegression::fit,1000,25,0.0019571685791015623 74 | R2Score::compute,1,1000,0.0009643125534057617 75 | MSE::compute,1,1000,1.025676727294922e-05 76 | KMeans::fit,1000,25,0.0057989549636840824 77 | KMeans(Random)::fit,1000,25,0.04559636116027832 78 | PCA::fit_transform,1000,50,0.0025264930725097654 79 | StandardScaler::fit_transform,1000,50,0.0016193151473999023 80 | RidgeRegression::fit,1000,50,0.002396669387817383 81 | R2Score::compute,1,1000,0.0009355354309082031 82 | MSE::compute,1,1000,3.075122833251953e-05 83 | KMeans::fit,1000,50,0.013863744735717774 84 | KMeans(Random)::fit,1000,50,0.07263703346252441 85 | -------------------------------------------------------------------------------- /python/logs/unit_tests_output.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rosewang01/rustkit-learn/f0ba147980c4f820e2fc10b2abf9439112d1417d/python/logs/unit_tests_output.txt -------------------------------------------------------------------------------- /python/rustkit_benchmarking.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import time 3 | from tqdm import tqdm 4 | import rustkit 5 | from sklearn.datasets import make_blobs 6 | 7 | def benchmark_pca(X, n_components=2, n_iterations=10): 8 | def fit_pca(X): 9 | pca = rustkit.PCA() 10 | return pca.fit_transform(X, n_components) 11 | 12 | total_time = 0 13 | 14 | for _ in tqdm(range(n_iterations)): 15 | start_time = time.time() # Start timing 16 | fit_pca(X) # Call the PCA fitting function 17 | fit_time = time.time() - start_time # Calculate the time taken 18 | total_time += fit_time # Add to the total time 19 | 20 | average_time = total_time / n_iterations 21 | print(f"PCA Average Time: {average_time:.4f}s") 22 | return average_time 23 | 24 | 25 | def benchmark_standard_scaler(X, n_iterations=10): 26 | def 
fit_standard_scaler(X): 27 | scaler = rustkit.StandardScaler() 28 | return scaler.fit_transform(X) 29 | 30 | total_time = 0 31 | 32 | for _ in tqdm(range(n_iterations)): 33 | start_time = time.time() 34 | fit_standard_scaler(X) 35 | fit_time = time.time() - start_time 36 | total_time += fit_time 37 | 38 | average_time = total_time / n_iterations 39 | print(f"Standard Scaler Average Time: {average_time:.4f}s") 40 | return average_time 41 | 42 | 43 | def benchmark_ridge(X, y, alpha=1.0, n_iterations=10): 44 | def fit_ridge(X, y): 45 | ridge = rustkit.RidgeRegression(alpha, True) 46 | ridge.fit(X, y) 47 | return ridge 48 | 49 | total_time = 0 50 | 51 | for _ in tqdm(range(n_iterations)): 52 | start_time = time.time() 53 | fit_ridge(X, y) 54 | fit_time = time.time() - start_time 55 | total_time += fit_time 56 | 57 | average_time = total_time / n_iterations 58 | print(f"Ridge Regression Average Time: {average_time:.4f}s") 59 | return average_time 60 | 61 | 62 | def benchmark_r2(y_true, y_pred, n_iterations=10): 63 | def compute_r2(y_true, y_pred): 64 | return rustkit.R2Score.compute(y_true, y_pred) 65 | 66 | total_time = 0 67 | 68 | for _ in tqdm(range(n_iterations)): 69 | start_time = time.time() 70 | compute_r2(y_true, y_pred) 71 | fit_time = time.time() - start_time 72 | total_time += fit_time 73 | 74 | average_time = total_time / n_iterations 75 | print(f"R² Score Average Time: {average_time:.4f}s") 76 | return average_time 77 | 78 | def benchmark_mse(y_true, y_pred, n_iterations=10): 79 | def compute_mse(y_true, y_pred): 80 | return rustkit.MSE.compute(y_true, y_pred) 81 | 82 | total_time = 0 83 | 84 | for _ in tqdm(range(n_iterations)): 85 | start_time = time.time() 86 | compute_mse(y_true, y_pred) 87 | fit_time = time.time() - start_time 88 | total_time += fit_time 89 | 90 | average_time = total_time / n_iterations 91 | print(f"MSE Average Time: {average_time:.4f}s") 92 | return average_time 93 | 94 | 95 | def benchmark_kmeans_random(X, n_clusters=3, 
n_iterations=10): 96 | def fit_kmeans(X): 97 | kmeans = rustkit.KMeans(n_clusters, "random", 200, 10) 98 | kmeans.fit(X) 99 | return kmeans 100 | 101 | total_time = 0 102 | 103 | for _ in tqdm(range(n_iterations)): 104 | start_time = time.time() 105 | fit_kmeans(X) 106 | fit_time = time.time() - start_time 107 | total_time += fit_time 108 | 109 | average_time = total_time / n_iterations 110 | print(f"KMeans - Random Init Average Time: {average_time:.4f}s") 111 | return average_time 112 | 113 | def benchmark_kmeans(X, n_clusters=3, n_iterations=10): 114 | def fit_kmeans(X): 115 | kmeans = rustkit.KMeans(n_clusters, "kmeans++", 200, 10) 116 | kmeans.fit(X) 117 | return kmeans 118 | 119 | total_time = 0 120 | 121 | for _ in tqdm(range(n_iterations)): 122 | start_time = time.time() 123 | fit_kmeans(X) 124 | fit_time = time.time() - start_time 125 | total_time += fit_time 126 | 127 | average_time = total_time / n_iterations 128 | print(f"KMeans Average Time: {average_time:.4f}s") 129 | return average_time 130 | 131 | def run_benchmark(nrows, ncols, filename): 132 | X = np.random.rand(nrows, ncols) 133 | X_clustered = make_blobs(n_samples=nrows, n_features=ncols, centers=3, random_state=42)[0] 134 | y = np.random.rand(nrows) 135 | y_true = np.random.rand(nrows) 136 | y_pred = np.random.rand(nrows) 137 | 138 | pca_time = benchmark_pca(X, n_iterations=50) 139 | standard_scaler_time = benchmark_standard_scaler(X, n_iterations=50) 140 | ridge_time = benchmark_ridge(X, y, n_iterations=50) 141 | r2_time = benchmark_r2(y_true, y_pred, n_iterations=50) 142 | mse_time = benchmark_mse(y_true, y_pred, n_iterations=50) 143 | kmeans_time = benchmark_kmeans(X_clustered, n_iterations=50) 144 | kmeans_random_time = benchmark_kmeans_random(X_clustered, n_iterations=50) 145 | 146 | with open(filename, "a") as f: 147 | f.write(f"PCA::fit_transform,{nrows},{ncols},{pca_time}\n") 148 | f.write(f"StandardScaler::fit_transform,{nrows},{ncols},{standard_scaler_time}\n") 149 | 
f.write(f"RidgeRegression::fit,{nrows},{ncols},{ridge_time}\n") 150 | f.write(f"R2Score::compute,{1},{nrows},{r2_time}\n") 151 | f.write(f"MSE::compute,{1},{nrows},{mse_time}\n") 152 | f.write(f"KMeans::fit,{nrows},{ncols},{kmeans_time}\n") 153 | f.write(f"KMeans(Random)::fit,{nrows},{ncols},{kmeans_random_time}\n") 154 | 155 | 156 | def main(): 157 | nrows = [10, 50, 100, 250, 500, 750, 1000] 158 | ncols = [2, 5, 10, 25, 50] 159 | filename = "logs/rustkit_benchmarking.csv" 160 | for nrow in nrows: 161 | run_benchmark(nrow, 10, filename) 162 | 163 | for ncol in ncols: 164 | run_benchmark(1000, ncol, filename) 165 | 166 | 167 | if __name__ == "__main__": 168 | main() 169 | -------------------------------------------------------------------------------- /python/sklearn_benchmarking.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import time 3 | from tqdm import tqdm 4 | from sklearn.decomposition import PCA as SklearnPCA 5 | from sklearn.metrics import r2_score 6 | from sklearn.preprocessing import StandardScaler as SklearnStandardScaler 7 | from sklearn.linear_model import Ridge as SklearnRidgeRegression 8 | from sklearn.cluster import KMeans as SklearnKMeans 9 | from sklearn.datasets import make_blobs 10 | 11 | def benchmark_pca(X, n_components=2, n_iterations=10): 12 | def fit_pca(X): 13 | pca = SklearnPCA(n_components) 14 | return pca.fit_transform(X) 15 | 16 | total_time = 0 17 | 18 | for _ in tqdm(range(n_iterations)): 19 | start_time = time.time() # Start timing 20 | fit_pca(X) # Call the PCA fitting function 21 | fit_time = time.time() - start_time # Calculate the time taken 22 | total_time += fit_time # Add to the total time 23 | 24 | average_time = total_time / n_iterations 25 | print(f"PCA Average Time: {average_time:.4f}s") 26 | return average_time 27 | 28 | 29 | def benchmark_standard_scaler(X, n_iterations=10): 30 | def fit_standard_scaler(X): 31 | scaler = SklearnStandardScaler() 32 | 
return scaler.fit_transform(X) 33 | 34 | total_time = 0 35 | 36 | for _ in tqdm(range(n_iterations)): 37 | start_time = time.time() 38 | fit_standard_scaler(X) 39 | fit_time = time.time() - start_time 40 | total_time += fit_time 41 | 42 | average_time = total_time / n_iterations 43 | print(f"Standard Scaler Average Time: {average_time:.4f}s") 44 | return average_time 45 | 46 | 47 | def benchmark_ridge(X, y, alpha=1.0, n_iterations=10): 48 | def fit_ridge(X, y): 49 | ridge = SklearnRidgeRegression(alpha=alpha, fit_intercept=True) 50 | ridge.fit(X, y) 51 | return ridge 52 | 53 | total_time = 0 54 | 55 | for _ in tqdm(range(n_iterations)): 56 | start_time = time.time() 57 | fit_ridge(X, y) 58 | fit_time = time.time() - start_time 59 | total_time += fit_time 60 | 61 | average_time = total_time / n_iterations 62 | print(f"Ridge Regression Average Time: {average_time:.4f}s") 63 | return average_time 64 | 65 | 66 | def benchmark_r2(y_true, y_pred, n_iterations=10): 67 | def compute_r2(y_true, y_pred): 68 | return r2_score(y_true, y_pred) 69 | 70 | total_time = 0 71 | 72 | for _ in tqdm(range(n_iterations)): 73 | start_time = time.time() 74 | compute_r2(y_true, y_pred) 75 | fit_time = time.time() - start_time 76 | total_time += fit_time 77 | 78 | average_time = total_time / n_iterations 79 | print(f"R² Score Average Time: {average_time:.4f}s") 80 | return average_time 81 | 82 | def benchmark_mse(y_true, y_pred, n_iterations=10): 83 | def compute_mse(y_true, y_pred): 84 | return np.mean((y_true - y_pred)**2) 85 | 86 | total_time = 0 87 | 88 | for _ in tqdm(range(n_iterations)): 89 | start_time = time.time() 90 | compute_mse(y_true, y_pred) 91 | fit_time = time.time() - start_time 92 | total_time += fit_time 93 | 94 | average_time = total_time / n_iterations 95 | print(f"MSE Average Time: {average_time:.4f}s") 96 | return average_time 97 | 98 | 99 | def benchmark_kmeans_random(X, n_clusters=10, n_iterations=10): 100 | def fit_kmeans(X): 101 | kmeans =
SklearnKMeans(n_clusters=n_clusters, init="random") 102 | kmeans.fit(X) 103 | return kmeans 104 | 105 | total_time = 0 106 | 107 | for _ in tqdm(range(n_iterations)): 108 | start_time = time.time() 109 | fit_kmeans(X) 110 | fit_time = time.time() - start_time 111 | total_time += fit_time 112 | 113 | average_time = total_time / n_iterations 114 | print(f"KMeans - Random Init Average Time: {average_time:.4f}s") 115 | return average_time 116 | 117 | def benchmark_kmeans(X, n_clusters=10, n_iterations=10): 118 | def fit_kmeans(X): 119 | kmeans = SklearnKMeans(n_clusters) 120 | kmeans.fit(X) 121 | return kmeans 122 | 123 | total_time = 0 124 | 125 | for _ in tqdm(range(n_iterations)): 126 | start_time = time.time() 127 | fit_kmeans(X) 128 | fit_time = time.time() - start_time 129 | total_time += fit_time 130 | 131 | average_time = total_time / n_iterations 132 | print(f"KMeans Average Time: {average_time:.4f}s") 133 | return average_time 134 | 135 | def run_benchmark(nrows, ncols, filename): 136 | X = np.random.rand(nrows, ncols) 137 | X_clustered = make_blobs(n_samples=nrows, n_features=ncols, centers=3, random_state=42)[0] 138 | y = np.random.rand(nrows) 139 | y_true = np.random.rand(nrows) 140 | y_pred = np.random.rand(nrows) 141 | 142 | pca_time = benchmark_pca(X, n_iterations=50) 143 | standard_scaler_time = benchmark_standard_scaler(X, n_iterations=50) 144 | ridge_time = benchmark_ridge(X, y, n_iterations=50) 145 | r2_time = benchmark_r2(y_true, y_pred, n_iterations=50) 146 | mse_time = benchmark_mse(y_true, y_pred, n_iterations=50) 147 | kmeans_time = benchmark_kmeans(X_clustered, n_clusters=3, n_iterations=50) 148 | kmeans_random_time = benchmark_kmeans_random(X_clustered, n_clusters=3, n_iterations=50) 149 | 150 | with open(filename, "a") as f: 151 | f.write(f"PCA::fit_transform,{nrows},{ncols},{pca_time}\n") 152 | f.write(f"StandardScaler::fit_transform,{nrows},{ncols},{standard_scaler_time}\n") 153 | 
f.write(f"RidgeRegression::fit,{nrows},{ncols},{ridge_time}\n") 154 | f.write(f"R2Score::compute,{1},{nrows},{r2_time}\n") 155 | f.write(f"MSE::compute,{1},{nrows},{mse_time}\n") 156 | f.write(f"KMeans::fit,{nrows},{ncols},{kmeans_time}\n") 157 | f.write(f"KMeans(Random)::fit,{nrows},{ncols},{kmeans_random_time}\n") 158 | 159 | 160 | def main(): 161 | nrows = [10, 50, 100, 250, 500, 750, 1000] 162 | ncols = [2, 5, 10, 25, 50] 163 | filename = "logs/sklearn_benchmarking.csv" 164 | for nrow in nrows: 165 | run_benchmark(nrow, 10, filename) 166 | 167 | for ncol in ncols: 168 | run_benchmark(1000, ncol, filename) 169 | 170 | 171 | if __name__ == "__main__": 172 | main() 173 | -------------------------------------------------------------------------------- /python/test.py: -------------------------------------------------------------------------------- 1 | import rustkit 2 | import numpy as np 3 | 4 | def test_converter_vector(): 5 | print("=" * 77) 6 | print("VECTOR TEST") 7 | print("=" * 77) 8 | input_vector = np.array([1.0, 2.0, 3.0, 4.0]) 9 | result = rustkit.converter_vector_test(input_vector) 10 | 11 | result_vector = np.array(result) 12 | 13 | print("Vector test") 14 | print("Input vector:") 15 | print(input_vector) 16 | print("Result vector:") 17 | print(result_vector) 18 | assert np.array_equal(input_vector, result_vector), "Test failed! Input and output vectors are not equal." 19 | print("Vector test passed!") 20 | 21 | def test_converter_matrix(): 22 | print("=" * 77) 23 | print("MATRIX TEST") 24 | print("=" * 77) 25 | input_matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) 26 | 27 | result = rustkit.converter_matrix_test(input_matrix) 28 | 29 | result_matrix = np.array(result) 30 | 31 | print("Input matrix:") 32 | print(input_matrix) 33 | print("Result matrix:") 34 | print(result_matrix) 35 | assert np.array_equal(input_matrix, result_matrix), "Test failed! Input and output matrices are not equal." 
36 | print("Matrix test passed!") 37 | 38 | def test_converter_opt_matrix(): 39 | print("=" * 77) 40 | print("NULL VAL MATRIX TEST") 41 | print("=" * 77) 42 | input_matrix = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, np.nan]]) 43 | 44 | result = rustkit.converter_matrix_opt_test(input_matrix) 45 | 46 | result_matrix = np.array(result) 47 | 48 | print("Input matrix:") 49 | print(input_matrix) 50 | print("Result matrix:") 51 | print(result_matrix) 52 | 53 | for i in range(input_matrix.shape[0]): 54 | for j in range(input_matrix.shape[1]): 55 | if np.isnan(input_matrix[i][j]): 56 | assert np.isnan(result_matrix[i][j]), "Test failed! NaN values are not equal." 57 | else: 58 | assert input_matrix[i][j] == result_matrix[i][j], "Test failed! Values are not equal." 59 | print("Null val matrix test passed!") 60 | 61 | def sample_scaler(): 62 | print("=" * 77) 63 | print("STANDARD SCALER EXAMPLE") 64 | print("=" * 77) 65 | 66 | data = np.array([ 67 | [1.0, 2.0, 3.0], 68 | [4.0, 5.0, 6.0], 69 | [7.0, 8.0, 9.0], 70 | [10.0, 11.0, 12.0] 71 | ]) 72 | 73 | scaler = rustkit.StandardScaler() 74 | standardized_data = scaler.fit_transform(data) 75 | print("Standardized Data:") 76 | print(standardized_data) 77 | 78 | original_data = scaler.inverse_transform(standardized_data) 79 | print("Original Data (after inverse transform):") 80 | print(original_data) 81 | 82 | def sample_pca(): 83 | print("=" * 77) 84 | print("PCA EXAMPLE") 85 | print("=" * 77) 86 | 87 | data = np.array([ 88 | [1.0, 2.0, 3.0], 89 | [4.0, 5.0, 6.0], 90 | [7.0, 8.0, 9.0], 91 | [10.0, 11.0, 12.0] 92 | ]) 93 | 94 | pca = rustkit.PCA() 95 | transformed_data = pca.fit_transform(data, 2) 96 | print("Original Data:") 97 | print(data) 98 | print("Transformed Data:") 99 | print(transformed_data) 100 | print("Principal Components:") 101 | print(pca.components) 102 | print("Explained Variance:") 103 | print(pca.explained_variance) 104 | 105 | original_data = pca.inverse_transform(transformed_data) 106 | print("Reconstructed 
Data (after inverse transform):") 107 | print(original_data) 108 | 109 | def sample_ridge(): 110 | print("=" * 77) 111 | print("RIDGE REGRESSION EXAMPLE") 112 | print("=" * 77) 113 | 114 | x = np.array([ 115 | [1.0, 2.0], 116 | [3.0, 4.0], 117 | [5.0, 6.0], 118 | [7.0, 8.0] 119 | ]) 120 | y = np.array([1.0, 2.0, 3.0, 4.0]) 121 | 122 | ridge_with_bias = rustkit.RidgeRegression(1.0, True) 123 | ridge_with_bias.fit(x, y) 124 | print("With Bias - Weights:") 125 | print(ridge_with_bias.weights) 126 | print("With Bias - Intercept:", ridge_with_bias.intercept) 127 | print("With Bias - Predictions:") 128 | print(ridge_with_bias.predict(x)) 129 | 130 | ridge_no_bias = rustkit.RidgeRegression(1.0, False) 131 | ridge_no_bias.fit(x, y) 132 | print("No Bias - Weights:") 133 | print(ridge_no_bias.weights) 134 | print("No Bias - Intercept:", ridge_no_bias.intercept) 135 | print("No Bias - Predictions:") 136 | print(ridge_no_bias.predict(x)) 137 | 138 | def sample_r2(): 139 | print("=" * 77) 140 | print("R2-SCORE & MSE EXAMPLE") 141 | print("=" * 77) 142 | 143 | y_true = np.array([3.0, -0.5, 2.0, 7.0]) 144 | y_pred = np.array([2.5, 0.0, 2.0, 8.0]) 145 | 146 | r2_score = rustkit.R2Score.compute(y_true, y_pred) 147 | mse = rustkit.MSE.compute(y_true, y_pred) 148 | print("R² Score:", r2_score) 149 | print("MSE:", mse) 150 | 151 | def sample_kmeans(): 152 | print("=" * 77) 153 | print("KMEANS EXAMPLE") 154 | print("=" * 77) 155 | 156 | data = np.array([ 157 | [1.0, 2.0, 3.0], 158 | [1.1, 2.1, 3.1], 159 | [0.9, 1.9, 2.9], 160 | [8.0, 9.0, 10.0], 161 | [8.1, 9.1, 10.1], 162 | [7.9, 8.9, 9.9], 163 | [4.0, 5.0, 6.0], 164 | [4.1, 5.1, 6.1], 165 | [3.9, 4.9, 5.9], 166 | [4.0, 5.0, 6.0] 167 | ]) 168 | 169 | kmeans = rustkit.KMeans(3, "random", 200, 10) 170 | kmeans.fit(data) 171 | labels = kmeans.predict(data) 172 | inertia = kmeans.compute_inertia(data, labels) 173 | 174 | print("Results with Random Initialization:") 175 | print("Labels:") 176 | print(labels) 177 | print("Centroids:") 178 | 
print(kmeans.centroids) 179 | print("Total Inertia:", inertia) 180 | 181 | if __name__ == "__main__": 182 | test_converter_vector() 183 | print("\n\n") 184 | test_converter_matrix() 185 | print("\n\n") 186 | test_converter_opt_matrix() 187 | print("\n\n") 188 | sample_scaler() 189 | print("\n\n") 190 | sample_pca() 191 | print("\n\n") 192 | sample_ridge() 193 | print("\n\n") 194 | sample_r2() 195 | print("\n\n") 196 | sample_kmeans() -------------------------------------------------------------------------------- /python/unit_tests.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import rustkit 3 | from sklearn.decomposition import PCA as SklearnPCA 4 | from sklearn.metrics import r2_score 5 | from sklearn.preprocessing import StandardScaler as SklearnStandardScaler 6 | from sklearn.linear_model import Ridge as SklearnRidgeRegression 7 | from sklearn.cluster import KMeans as SklearnKMeans 8 | from sklearn.datasets import make_blobs 9 | from sklearn.impute import SimpleImputer 10 | 11 | def get_pca_equality(sklearn_result, rustkit_result): 12 | # check col wise if values are equal or -1 * the other col 13 | for i in range(sklearn_result.shape[1]): 14 | if (np.allclose(sklearn_result[:, i], rustkit_result[:, i], atol=1e-5)): 15 | continue 16 | elif (np.allclose(sklearn_result[:, i], -rustkit_result[:, i], atol=1e-5)): 17 | continue 18 | else: 19 | return False 20 | return True 21 | 22 | def test_pca_correctness(X): 23 | n_components = np.min(X.shape) 24 | sklearn_pca = SklearnPCA(n_components) 25 | rustkit_pca = rustkit.PCA() 26 | 27 | sklearn_result = sklearn_pca.fit_transform(X) 28 | rustkit_result = rustkit_pca.fit_transform(X, n_components) 29 | 30 | if (get_pca_equality(sklearn_result, rustkit_result)): 31 | print("PCA correctness test passed!") 32 | else: 33 | print("PCA correctness test failed!") 34 | print("Sklearn PCA result:") 35 | print(sklearn_result) 36 | print("Rustkit PCA result:") 37 | 
print(rustkit_result) 38 | 39 | # assert np.allclose(sklearn_result, rustkit_result, atol=1e-5), "PCA results differ!" 40 | # print("PCA correctness test passed!") 41 | 42 | def get_kmeans_equality(sklearn_result, rustkit_result, n_clusters): 43 | sklearn_dict = { 44 | i: np.where(sklearn_result == i)[0] for i in range(n_clusters) 45 | } 46 | rustkit_dict = { 47 | i: np.where(rustkit_result == i)[0] for i in range(n_clusters) 48 | } 49 | 50 | for i in sklearn_dict.keys(): 51 | for j in rustkit_dict.keys(): 52 | if (len(sklearn_dict[i]) != len(rustkit_dict[j])): 53 | continue 54 | elif (np.allclose(sklearn_dict[i], rustkit_dict[j])): 55 | rustkit_dict.pop(j) 56 | break 57 | else: 58 | continue 59 | if (len(rustkit_dict) > 0): 60 | return False 61 | return True 62 | 63 | def test_kmeans_correctness(X): 64 | n_clusters = min(3, X.shape[0]) 65 | sklearn_kmeans_random = SklearnKMeans(n_clusters=n_clusters, init="random") 66 | sklearn_kmeans = SklearnKMeans(n_clusters) 67 | rustkit_kmeans_random = rustkit.KMeans(n_clusters, "random", 200, 10) 68 | rustkit_kmeans = rustkit.KMeans(n_clusters, "kmeans++", 200, 10) 69 | 70 | sklearn_result_random = sklearn_kmeans_random.fit_predict(X) 71 | sklearn_random_inertia = sklearn_kmeans_random.inertia_ 72 | sklearn_result = sklearn_kmeans.fit_predict(X) 73 | sklearn_inertia = sklearn_kmeans.inertia_ 74 | rustkit_result_random = rustkit_kmeans_random.fit_predict(X) 75 | rustkit_random_inertia = rustkit_kmeans_random.compute_inertia(X, rustkit_result_random) 76 | rustkit_result = rustkit_kmeans.fit_predict(X) 77 | rustkit_inertia = rustkit_kmeans.compute_inertia(X, rustkit_result) 78 | 79 | if (get_kmeans_equality(sklearn_result, rustkit_result, n_clusters)): 80 | print("KMeans - KMeans++ correctness test passed!") 81 | else: 82 | print("KMeans - KMeans++ correctness test failed!") 83 | print("Sklearn KMeans result:") 84 | print(sklearn_result) 85 | print("Rustkit KMeans result:") 86 | print(rustkit_result) 87 | 88 | if 
(abs(sklearn_inertia - rustkit_inertia) < 1e-5): 89 | print("KMeans - KMeans++ inertia correctness test passed!") 90 | else: 91 | print("KMeans - KMeans++ inertia correctness test failed!") 92 | print("Sklearn KMeans inertia:", sklearn_inertia) 93 | print("Rustkit KMeans inertia:", rustkit_inertia) 94 | 95 | if (get_kmeans_equality(sklearn_result_random, rustkit_result_random, n_clusters)): 96 | print("KMeans - Random correctness test passed!") 97 | else: 98 | print("KMeans - Random correctness test failed!") 99 | print("Sklearn KMeans result:") 100 | print(sklearn_result_random) 101 | print("Rustkit KMeans result:") 102 | print(rustkit_result_random) 103 | 104 | if (abs(sklearn_random_inertia - rustkit_random_inertia) < 1e-5): 105 | print("KMeans - Random inertia correctness test passed!") 106 | else: 107 | print("KMeans - Random inertia correctness test failed!") 108 | print("Sklearn KMeans inertia:", sklearn_random_inertia) 109 | print("Rustkit KMeans inertia:", rustkit_random_inertia) 110 | # assert np.allclose(sklearn_result, rustkit_result), "KMeans results differ!" 111 | # print("KMeans correctness test passed!") 112 | 113 | def test_standard_scaler_correctness(X): 114 | sklearn_scaler = SklearnStandardScaler() 115 | rustkit_scaler = rustkit.StandardScaler() 116 | 117 | sklearn_result = sklearn_scaler.fit_transform(X) 118 | rustkit_result = rustkit_scaler.fit_transform(X) 119 | 120 | if (np.allclose(sklearn_result, rustkit_result, atol=1e-5)): 121 | print("Standard Scaler correctness test passed!") 122 | else: 123 | print("Standard Scaler correctness test failed!") 124 | print("Sklearn Standard Scaler result:") 125 | print(sklearn_result) 126 | print("Rustkit Standard Scaler result:") 127 | print(rustkit_result) 128 | 129 | # assert np.allclose(sklearn_result, rustkit_result, atol=1e-5), "Standard Scaler results differ!"
130 | # print("Standard Scaler correctness test passed!") 131 | 132 | def test_ridge_correctness(X, y): 133 | sklearn_ridge = SklearnRidgeRegression(alpha=1.0, fit_intercept=True) 134 | rustkit_ridge = rustkit.RidgeRegression(1.0, True) 135 | 136 | sklearn_ridge.fit(X, y) 137 | rustkit_ridge.fit(X, y) 138 | 139 | if (np.allclose(sklearn_ridge.coef_, rustkit_ridge.weights, atol=1e-5)): 140 | print("Ridge Regression correctness test passed!") 141 | else: 142 | print("Ridge Regression correctness test failed!") 143 | print("Sklearn Ridge Regression result:") 144 | print(sklearn_ridge.coef_) 145 | print("Rustkit Ridge Regression result:") 146 | print(rustkit_ridge.weights) 147 | 148 | # assert np.allclose(sklearn_ridge.coef_, rustkit_ridge.weights, atol=1e-5), "Ridge Regression results differ!" 149 | # print("Ridge Regression correctness test passed!") 150 | 151 | def test_r2_correctness(y_true, y_pred): 152 | sklearn_r2 = r2_score(y_true, y_pred) 153 | rustkit_r2 = rustkit.R2Score.compute(y_true, y_pred) 154 | 155 | if (abs(sklearn_r2 - rustkit_r2) < 1e-5): 156 | print("R² correctness test passed!") 157 | else: 158 | print("R² correctness test failed!") 159 | print("Sklearn R² Score:", sklearn_r2) 160 | print("Rustkit R² Score:", rustkit_r2) 161 | 162 | # assert abs(sklearn_r2 - rustkit_r2) < 1e-5, "R² results differ!" 163 | # print("R² correctness test passed!") 164 | 165 | def test_mse_correctness(y_true, y_pred): 166 | sklearn_mse = np.mean((y_true - y_pred)**2) 167 | rustkit_mse = rustkit.MSE.compute(y_true, y_pred) 168 | 169 | if (abs(sklearn_mse - rustkit_mse) < 1e-5): 170 | print("MSE correctness test passed!") 171 | else: 172 | print("MSE correctness test failed!") 173 | print("Sklearn MSE:", sklearn_mse) 174 | print("Rustkit MSE:", rustkit_mse) 175 | 176 | # assert abs(sklearn_mse - rustkit_mse) < 1e-5, "MSE results differ!" 
177 | # print("MSE correctness test passed!") 178 | 179 | def test_imputer(X): 180 | np.random.seed(42) 181 | total_entries = X.size 182 | 183 | n_nan = int(total_entries * 0.1) 184 | nan_indices = np.random.choice(total_entries, n_nan, replace=False) 185 | 186 | X_flattened = X.flatten() 187 | X_flattened[nan_indices] = np.nan 188 | X_with_missing = X_flattened.reshape(X.shape) 189 | 190 | imputer = SimpleImputer(strategy='mean') 191 | sk_imputed = imputer.fit_transform(X_with_missing) 192 | 193 | imputer = rustkit.Imputer("mean") 194 | rustkit_imputed = imputer.fit_transform(X_with_missing) 195 | 196 | if (np.allclose(sk_imputed, rustkit_imputed, atol=1e-5)): 197 | print("Imputer correctness test passed!") 198 | else: 199 | print("Imputer correctness test failed!") 200 | print("Sklearn Imputer result:") 201 | print(sk_imputed) 202 | print("Rustkit Imputer result:") 203 | print(rustkit_imputed) 204 | 205 | 206 | def test_empty_input(): 207 | print("EMPTY INPUT") 208 | # Test empty input 209 | X = np.array([[]]) 210 | y = np.array([]) 211 | y_true = np.array([]) 212 | y_pred = np.array([]) 213 | 214 | test_standard_scaler_correctness(X) 215 | test_ridge_correctness(X, y) 216 | test_r2_correctness(y_true, y_pred) 217 | test_mse_correctness(y_true, y_pred) 218 | test_pca_correctness(X) 219 | test_kmeans_correctness(X) 220 | 221 | def test_square_input(): 222 | print("SQUARE INPUT") 223 | # Test square input 224 | X = np.random.rand(10, 10) 225 | y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]) 226 | X_clustered, _ = make_blobs(n_samples=10, centers=3, n_features=10, random_state=0) 227 | 228 | test_standard_scaler_correctness(X) 229 | test_ridge_correctness(X, y) 230 | test_pca_correctness(X) 231 | test_kmeans_correctness(X_clustered) 232 | 233 | def test_single_input(): 234 | print("SINGLE INPUT") 235 | # Test single input 236 | X = np.random.rand(1, 1) 237 | y = np.array([1.0]) 238 | y_pred = np.array([2.0]) 239 | 240 | # not well defined with
single input 241 | # test_standard_scaler_correctness(X) 242 | # test_r2_correctness(y, y_pred) 243 | # test_mse_correctness(y, y_pred) 244 | # test_pca_correctness(X) 245 | # test_imputer(X) 246 | test_ridge_correctness(X, y) 247 | test_kmeans_correctness(X) 248 | 249 | def test_large_input(): 250 | print("LARGE INPUT") 251 | # Test large input 252 | X = np.random.rand(1000, 100) 253 | y = np.random.rand(1000) 254 | y_true = np.random.rand(1000) 255 | y_pred = np.random.rand(1000) 256 | X_clustered, _ = make_blobs(n_samples=1000, centers=3, n_features=100, random_state=0) 257 | 258 | test_standard_scaler_correctness(X) 259 | test_ridge_correctness(X, y) 260 | test_r2_correctness(y_true, y_pred) 261 | test_mse_correctness(y_true, y_pred) 262 | test_pca_correctness(X) 263 | test_kmeans_correctness(X_clustered) 264 | test_imputer(X) 265 | 266 | def test_negative_input(): 267 | print("NEGATIVE INPUT") 268 | # Test negative input 269 | X = np.random.rand(10, 10) - 0.5 270 | X_clustered = make_blobs(n_samples=10, centers=3, n_features=10, random_state=0)[0] - 0.5 271 | y = np.array([1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0, 9.0, -10.0]) 272 | y_true = np.array([1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0, 9.0, -10.0]) 273 | y_pred = np.array([2.0, -3.0, 4.0, -5.0, 6.0, -7.0, 8.0, -9.0, 10.0, -11.0]) 274 | 275 | test_standard_scaler_correctness(X) 276 | test_ridge_correctness(X, y) 277 | test_r2_correctness(y_true, y_pred) 278 | test_mse_correctness(y_true, y_pred) 279 | test_pca_correctness(X) 280 | test_kmeans_correctness(X_clustered) 281 | test_imputer(X) 282 | 283 | def test_mixed_input(): 284 | print("MIXED INPUT") 285 | # Test mixed input 286 | X = np.random.rand(10, 10) 287 | X_clustered_1 = make_blobs(n_samples=10, centers=3, n_features=10, random_state=0)[0] - 0.5 288 | X_clustered_2 = make_blobs(n_samples=10, centers=3, n_features=10, random_state=0)[0] 289 | X_clustered = np.concatenate((X_clustered_1, X_clustered_2), axis=0) 290 | y = np.array([1.0, -2.0,
3.0, -4.0, 5.0, -6.0, 7.0, -8.0, 9.0, -10.0]) 291 | y_true = np.array([1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0, 9.0, -10.0]) 292 | y_pred = np.array([2.0, -3.0, 4.0, -5.0, 6.0, -7.0, 8.0, -9.0, 10.0, -11.0]) 293 | 294 | test_standard_scaler_correctness(X) 295 | test_ridge_correctness(X, y) 296 | test_r2_correctness(y_true, y_pred) 297 | test_mse_correctness(y_true, y_pred) 298 | test_pca_correctness(X) 299 | test_kmeans_correctness(X_clustered) 300 | test_imputer(X) 301 | 302 | def main(): 303 | # test_empty_input() 304 | # print("\n\n") 305 | test_single_input() 306 | print("\n\n") 307 | test_square_input() 308 | print("\n\n") 309 | test_large_input() 310 | print("\n\n") 311 | test_negative_input() 312 | print("\n\n") 313 | test_mixed_input() 314 | 315 | if __name__ == "__main__": 316 | main() 317 | -------------------------------------------------------------------------------- /rustkit/Cargo.lock: -------------------------------------------------------------------------------- 1 | # This file is automatically @generated by Cargo. 2 | # It is not intended for manual editing.
3 | version = 3 4 | 5 | [[package]] 6 | name = "approx" 7 | version = "0.5.1" 8 | source = "registry+https://github.com/rust-lang/crates.io-index" 9 | checksum = "cab112f0a86d568ea0e627cc1d6be74a1e9cd55214684db5561995f6dad897c6" 10 | dependencies = [ 11 | "num-traits", 12 | ] 13 | 14 | [[package]] 15 | name = "autocfg" 16 | version = "1.4.0" 17 | source = "registry+https://github.com/rust-lang/crates.io-index" 18 | checksum = "ace50bade8e6234aa140d9a2f552bbee1db4d353f69b8217bc503490fc1a9f26" 19 | 20 | [[package]] 21 | name = "bitflags" 22 | version = "2.6.0" 23 | source = "registry+https://github.com/rust-lang/crates.io-index" 24 | checksum = "b048fb63fd8b5923fc5aa7b340d8e156aec7ec02f0c78fa8a6ddc2613f6f71de" 25 | 26 | [[package]] 27 | name = "bytemuck" 28 | version = "1.20.0" 29 | source = "registry+https://github.com/rust-lang/crates.io-index" 30 | checksum = "8b37c88a63ffd85d15b406896cc343916d7cf57838a847b3a6f2ca5d39a5695a" 31 | 32 | [[package]] 33 | name = "byteorder" 34 | version = "1.5.0" 35 | source = "registry+https://github.com/rust-lang/crates.io-index" 36 | checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b" 37 | 38 | [[package]] 39 | name = "cfg-if" 40 | version = "1.0.0" 41 | source = "registry+https://github.com/rust-lang/crates.io-index" 42 | checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd" 43 | 44 | [[package]] 45 | name = "getrandom" 46 | version = "0.2.15" 47 | source = "registry+https://github.com/rust-lang/crates.io-index" 48 | checksum = "c4567c8db10ae91089c99af84c68c38da3ec2f087c3f82960bcdbf3656b6f4d7" 49 | dependencies = [ 50 | "cfg-if", 51 | "libc", 52 | "wasi", 53 | ] 54 | 55 | [[package]] 56 | name = "indoc" 57 | version = "1.0.9" 58 | source = "registry+https://github.com/rust-lang/crates.io-index" 59 | checksum = "bfa799dd5ed20a7e349f3b4639aa80d74549c81716d9ec4f994c9b5815598306" 60 | 61 | [[package]] 62 | name = "libc" 63 | version = "0.2.167" 64 | source = 
"registry+https://github.com/rust-lang/crates.io-index" 65 | checksum = "09d6582e104315a817dff97f75133544b2e094ee22447d2acf4a74e189ba06fc" 66 | 67 | [[package]] 68 | name = "lock_api" 69 | version = "0.4.12" 70 | source = "registry+https://github.com/rust-lang/crates.io-index" 71 | checksum = "07af8b9cdd281b7915f413fa73f29ebd5d55d0d3f0155584dade1ff18cea1b17" 72 | dependencies = [ 73 | "autocfg", 74 | "scopeguard", 75 | ] 76 | 77 | [[package]] 78 | name = "matrixmultiply" 79 | version = "0.3.9" 80 | source = "registry+https://github.com/rust-lang/crates.io-index" 81 | checksum = "9380b911e3e96d10c1f415da0876389aaf1b56759054eeb0de7df940c456ba1a" 82 | dependencies = [ 83 | "autocfg", 84 | "rawpointer", 85 | ] 86 | 87 | [[package]] 88 | name = "memoffset" 89 | version = "0.8.0" 90 | source = "registry+https://github.com/rust-lang/crates.io-index" 91 | checksum = "d61c719bcfbcf5d62b3a09efa6088de8c54bc0bfcd3ea7ae39fcc186108b8de1" 92 | dependencies = [ 93 | "autocfg", 94 | ] 95 | 96 | [[package]] 97 | name = "nalgebra" 98 | version = "0.33.2" 99 | source = "registry+https://github.com/rust-lang/crates.io-index" 100 | checksum = "26aecdf64b707efd1310e3544d709c5c0ac61c13756046aaaba41be5c4f66a3b" 101 | dependencies = [ 102 | "approx", 103 | "matrixmultiply", 104 | "nalgebra-macros", 105 | "num-complex", 106 | "num-rational", 107 | "num-traits", 108 | "simba", 109 | "typenum", 110 | ] 111 | 112 | [[package]] 113 | name = "nalgebra-macros" 114 | version = "0.2.2" 115 | source = "registry+https://github.com/rust-lang/crates.io-index" 116 | checksum = "254a5372af8fc138e36684761d3c0cdb758a4410e938babcff1c860ce14ddbfc" 117 | dependencies = [ 118 | "proc-macro2", 119 | "quote", 120 | "syn 2.0.90", 121 | ] 122 | 123 | [[package]] 124 | name = "ndarray" 125 | version = "0.15.6" 126 | source = "registry+https://github.com/rust-lang/crates.io-index" 127 | checksum = "adb12d4e967ec485a5f71c6311fe28158e9d6f4bc4a447b474184d0f91a8fa32" 128 | dependencies = [ 129 | "matrixmultiply", 130 | 
"num-complex", 131 | "num-integer", 132 | "num-traits", 133 | "rawpointer", 134 | ] 135 | 136 | [[package]] 137 | name = "num-bigint" 138 | version = "0.4.6" 139 | source = "registry+https://github.com/rust-lang/crates.io-index" 140 | checksum = "a5e44f723f1133c9deac646763579fdb3ac745e418f2a7af9cd0c431da1f20b9" 141 | dependencies = [ 142 | "num-integer", 143 | "num-traits", 144 | ] 145 | 146 | [[package]] 147 | name = "num-complex" 148 | version = "0.4.6" 149 | source = "registry+https://github.com/rust-lang/crates.io-index" 150 | checksum = "73f88a1307638156682bada9d7604135552957b7818057dcef22705b4d509495" 151 | dependencies = [ 152 | "num-traits", 153 | ] 154 | 155 | [[package]] 156 | name = "num-integer" 157 | version = "0.1.46" 158 | source = "registry+https://github.com/rust-lang/crates.io-index" 159 | checksum = "7969661fd2958a5cb096e56c8e1ad0444ac2bbcd0061bd28660485a44879858f" 160 | dependencies = [ 161 | "num-traits", 162 | ] 163 | 164 | [[package]] 165 | name = "num-rational" 166 | version = "0.4.2" 167 | source = "registry+https://github.com/rust-lang/crates.io-index" 168 | checksum = "f83d14da390562dca69fc84082e73e548e1ad308d24accdedd2720017cb37824" 169 | dependencies = [ 170 | "num-bigint", 171 | "num-integer", 172 | "num-traits", 173 | ] 174 | 175 | [[package]] 176 | name = "num-traits" 177 | version = "0.2.19" 178 | source = "registry+https://github.com/rust-lang/crates.io-index" 179 | checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841" 180 | dependencies = [ 181 | "autocfg", 182 | ] 183 | 184 | [[package]] 185 | name = "numpy" 186 | version = "0.18.0" 187 | source = "registry+https://github.com/rust-lang/crates.io-index" 188 | checksum = "96b0fee4571867d318651c24f4a570c3f18408cf95f16ccb576b3ce85496a46e" 189 | dependencies = [ 190 | "libc", 191 | "ndarray", 192 | "num-complex", 193 | "num-integer", 194 | "num-traits", 195 | "pyo3", 196 | "rustc-hash", 197 | ] 198 | 199 | [[package]] 200 | name = "once_cell" 201 | version = 
"1.20.2" 202 | source = "registry+https://github.com/rust-lang/crates.io-index" 203 | checksum = "1261fe7e33c73b354eab43b1273a57c8f967d0391e80353e51f764ac02cf6775" 204 | 205 | [[package]] 206 | name = "parking_lot" 207 | version = "0.12.3" 208 | source = "registry+https://github.com/rust-lang/crates.io-index" 209 | checksum = "f1bf18183cf54e8d6059647fc3063646a1801cf30896933ec2311622cc4b9a27" 210 | dependencies = [ 211 | "lock_api", 212 | "parking_lot_core", 213 | ] 214 | 215 | [[package]] 216 | name = "parking_lot_core" 217 | version = "0.9.10" 218 | source = "registry+https://github.com/rust-lang/crates.io-index" 219 | checksum = "1e401f977ab385c9e4e3ab30627d6f26d00e2c73eef317493c4ec6d468726cf8" 220 | dependencies = [ 221 | "cfg-if", 222 | "libc", 223 | "redox_syscall", 224 | "smallvec", 225 | "windows-targets", 226 | ] 227 | 228 | [[package]] 229 | name = "paste" 230 | version = "1.0.15" 231 | source = "registry+https://github.com/rust-lang/crates.io-index" 232 | checksum = "57c0d7b74b563b49d38dae00a0c37d4d6de9b432382b2892f0574ddcae73fd0a" 233 | 234 | [[package]] 235 | name = "ppv-lite86" 236 | version = "0.2.20" 237 | source = "registry+https://github.com/rust-lang/crates.io-index" 238 | checksum = "77957b295656769bb8ad2b6a6b09d897d94f05c41b069aede1fcdaa675eaea04" 239 | dependencies = [ 240 | "zerocopy", 241 | ] 242 | 243 | [[package]] 244 | name = "proc-macro2" 245 | version = "1.0.92" 246 | source = "registry+https://github.com/rust-lang/crates.io-index" 247 | checksum = "37d3544b3f2748c54e147655edb5025752e2303145b5aefb3c3ea2c78b973bb0" 248 | dependencies = [ 249 | "unicode-ident", 250 | ] 251 | 252 | [[package]] 253 | name = "pyo3" 254 | version = "0.18.3" 255 | source = "registry+https://github.com/rust-lang/crates.io-index" 256 | checksum = "e3b1ac5b3731ba34fdaa9785f8d74d17448cd18f30cf19e0c7e7b1fdb5272109" 257 | dependencies = [ 258 | "cfg-if", 259 | "indoc", 260 | "libc", 261 | "memoffset", 262 | "parking_lot", 263 | "pyo3-build-config", 264 | "pyo3-ffi", 
265 | "pyo3-macros", 266 | "unindent", 267 | ] 268 | 269 | [[package]] 270 | name = "pyo3-build-config" 271 | version = "0.18.3" 272 | source = "registry+https://github.com/rust-lang/crates.io-index" 273 | checksum = "9cb946f5ac61bb61a5014924910d936ebd2b23b705f7a4a3c40b05c720b079a3" 274 | dependencies = [ 275 | "once_cell", 276 | "target-lexicon", 277 | ] 278 | 279 | [[package]] 280 | name = "pyo3-ffi" 281 | version = "0.18.3" 282 | source = "registry+https://github.com/rust-lang/crates.io-index" 283 | checksum = "fd4d7c5337821916ea2a1d21d1092e8443cf34879e53a0ac653fbb98f44ff65c" 284 | dependencies = [ 285 | "libc", 286 | "pyo3-build-config", 287 | ] 288 | 289 | [[package]] 290 | name = "pyo3-macros" 291 | version = "0.18.3" 292 | source = "registry+https://github.com/rust-lang/crates.io-index" 293 | checksum = "a9d39c55dab3fc5a4b25bbd1ac10a2da452c4aca13bb450f22818a002e29648d" 294 | dependencies = [ 295 | "proc-macro2", 296 | "pyo3-macros-backend", 297 | "quote", 298 | "syn 1.0.109", 299 | ] 300 | 301 | [[package]] 302 | name = "pyo3-macros-backend" 303 | version = "0.18.3" 304 | source = "registry+https://github.com/rust-lang/crates.io-index" 305 | checksum = "97daff08a4c48320587b5224cc98d609e3c27b6d437315bd40b605c98eeb5918" 306 | dependencies = [ 307 | "proc-macro2", 308 | "quote", 309 | "syn 1.0.109", 310 | ] 311 | 312 | [[package]] 313 | name = "quote" 314 | version = "1.0.37" 315 | source = "registry+https://github.com/rust-lang/crates.io-index" 316 | checksum = "b5b9d34b8991d19d98081b46eacdd8eb58c6f2b201139f7c5f643cc155a633af" 317 | dependencies = [ 318 | "proc-macro2", 319 | ] 320 | 321 | [[package]] 322 | name = "rand" 323 | version = "0.8.5" 324 | source = "registry+https://github.com/rust-lang/crates.io-index" 325 | checksum = "34af8d1a0e25924bc5b7c43c079c942339d8f0a8b57c39049bef581b46327404" 326 | dependencies = [ 327 | "libc", 328 | "rand_chacha", 329 | "rand_core", 330 | ] 331 | 332 | [[package]] 333 | name = "rand_chacha" 334 | version = "0.3.1" 335 | 
source = "registry+https://github.com/rust-lang/crates.io-index" 336 | checksum = "e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88" 337 | dependencies = [ 338 | "ppv-lite86", 339 | "rand_core", 340 | ] 341 | 342 | [[package]] 343 | name = "rand_core" 344 | version = "0.6.4" 345 | source = "registry+https://github.com/rust-lang/crates.io-index" 346 | checksum = "ec0be4795e2f6a28069bec0b5ff3e2ac9bafc99e6a9a7dc3547996c5c816922c" 347 | dependencies = [ 348 | "getrandom", 349 | ] 350 | 351 | [[package]] 352 | name = "rawpointer" 353 | version = "0.2.1" 354 | source = "registry+https://github.com/rust-lang/crates.io-index" 355 | checksum = "60a357793950651c4ed0f3f52338f53b2f809f32d83a07f72909fa13e4c6c1e3" 356 | 357 | [[package]] 358 | name = "redox_syscall" 359 | version = "0.5.7" 360 | source = "registry+https://github.com/rust-lang/crates.io-index" 361 | checksum = "9b6dfecf2c74bce2466cabf93f6664d6998a69eb21e39f4207930065b27b771f" 362 | dependencies = [ 363 | "bitflags", 364 | ] 365 | 366 | [[package]] 367 | name = "rustc-hash" 368 | version = "1.1.0" 369 | source = "registry+https://github.com/rust-lang/crates.io-index" 370 | checksum = "08d43f7aa6b08d49f382cde6a7982047c3426db949b1424bc4b7ec9ae12c6ce2" 371 | 372 | [[package]] 373 | name = "rustkit" 374 | version = "0.1.0" 375 | dependencies = [ 376 | "nalgebra", 377 | "numpy", 378 | "pyo3", 379 | "rand", 380 | ] 381 | 382 | [[package]] 383 | name = "safe_arch" 384 | version = "0.7.2" 385 | source = "registry+https://github.com/rust-lang/crates.io-index" 386 | checksum = "c3460605018fdc9612bce72735cba0d27efbcd9904780d44c7e3a9948f96148a" 387 | dependencies = [ 388 | "bytemuck", 389 | ] 390 | 391 | [[package]] 392 | name = "scopeguard" 393 | version = "1.2.0" 394 | source = "registry+https://github.com/rust-lang/crates.io-index" 395 | checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49" 396 | 397 | [[package]] 398 | name = "simba" 399 | version = "0.9.0" 400 | source = 
"registry+https://github.com/rust-lang/crates.io-index" 401 | checksum = "b3a386a501cd104797982c15ae17aafe8b9261315b5d07e3ec803f2ea26be0fa" 402 | dependencies = [ 403 | "approx", 404 | "num-complex", 405 | "num-traits", 406 | "paste", 407 | "wide", 408 | ] 409 | 410 | [[package]] 411 | name = "smallvec" 412 | version = "1.13.2" 413 | source = "registry+https://github.com/rust-lang/crates.io-index" 414 | checksum = "3c5e1a9a646d36c3599cd173a41282daf47c44583ad367b8e6837255952e5c67" 415 | 416 | [[package]] 417 | name = "syn" 418 | version = "1.0.109" 419 | source = "registry+https://github.com/rust-lang/crates.io-index" 420 | checksum = "72b64191b275b66ffe2469e8af2c1cfe3bafa67b529ead792a6d0160888b4237" 421 | dependencies = [ 422 | "proc-macro2", 423 | "quote", 424 | "unicode-ident", 425 | ] 426 | 427 | [[package]] 428 | name = "syn" 429 | version = "2.0.90" 430 | source = "registry+https://github.com/rust-lang/crates.io-index" 431 | checksum = "919d3b74a5dd0ccd15aeb8f93e7006bd9e14c295087c9896a110f490752bcf31" 432 | dependencies = [ 433 | "proc-macro2", 434 | "quote", 435 | "unicode-ident", 436 | ] 437 | 438 | [[package]] 439 | name = "target-lexicon" 440 | version = "0.12.16" 441 | source = "registry+https://github.com/rust-lang/crates.io-index" 442 | checksum = "61c41af27dd6d1e27b1b16b489db798443478cef1f06a660c96db617ba5de3b1" 443 | 444 | [[package]] 445 | name = "typenum" 446 | version = "1.17.0" 447 | source = "registry+https://github.com/rust-lang/crates.io-index" 448 | checksum = "42ff0bf0c66b8238c6f3b578df37d0b7848e55df8577b3f74f92a69acceeb825" 449 | 450 | [[package]] 451 | name = "unicode-ident" 452 | version = "1.0.14" 453 | source = "registry+https://github.com/rust-lang/crates.io-index" 454 | checksum = "adb9e6ca4f869e1180728b7950e35922a7fc6397f7b641499e8f3ef06e50dc83" 455 | 456 | [[package]] 457 | name = "unindent" 458 | version = "0.1.11" 459 | source = "registry+https://github.com/rust-lang/crates.io-index" 460 | checksum = 
"e1766d682d402817b5ac4490b3c3002d91dfa0d22812f341609f97b08757359c" 461 | 462 | [[package]] 463 | name = "wasi" 464 | version = "0.11.0+wasi-snapshot-preview1" 465 | source = "registry+https://github.com/rust-lang/crates.io-index" 466 | checksum = "9c8d87e72b64a3b4db28d11ce29237c246188f4f51057d65a7eab63b7987e423" 467 | 468 | [[package]] 469 | name = "wide" 470 | version = "0.7.30" 471 | source = "registry+https://github.com/rust-lang/crates.io-index" 472 | checksum = "58e6db2670d2be78525979e9a5f9c69d296fd7d670549fe9ebf70f8708cb5019" 473 | dependencies = [ 474 | "bytemuck", 475 | "safe_arch", 476 | ] 477 | 478 | [[package]] 479 | name = "windows-targets" 480 | version = "0.52.6" 481 | source = "registry+https://github.com/rust-lang/crates.io-index" 482 | checksum = "9b724f72796e036ab90c1021d4780d4d3d648aca59e491e6b98e725b84e99973" 483 | dependencies = [ 484 | "windows_aarch64_gnullvm", 485 | "windows_aarch64_msvc", 486 | "windows_i686_gnu", 487 | "windows_i686_gnullvm", 488 | "windows_i686_msvc", 489 | "windows_x86_64_gnu", 490 | "windows_x86_64_gnullvm", 491 | "windows_x86_64_msvc", 492 | ] 493 | 494 | [[package]] 495 | name = "windows_aarch64_gnullvm" 496 | version = "0.52.6" 497 | source = "registry+https://github.com/rust-lang/crates.io-index" 498 | checksum = "32a4622180e7a0ec044bb555404c800bc9fd9ec262ec147edd5989ccd0c02cd3" 499 | 500 | [[package]] 501 | name = "windows_aarch64_msvc" 502 | version = "0.52.6" 503 | source = "registry+https://github.com/rust-lang/crates.io-index" 504 | checksum = "09ec2a7bb152e2252b53fa7803150007879548bc709c039df7627cabbd05d469" 505 | 506 | [[package]] 507 | name = "windows_i686_gnu" 508 | version = "0.52.6" 509 | source = "registry+https://github.com/rust-lang/crates.io-index" 510 | checksum = "8e9b5ad5ab802e97eb8e295ac6720e509ee4c243f69d781394014ebfe8bbfa0b" 511 | 512 | [[package]] 513 | name = "windows_i686_gnullvm" 514 | version = "0.52.6" 515 | source = "registry+https://github.com/rust-lang/crates.io-index" 516 | checksum = 
"0eee52d38c090b3caa76c563b86c3a4bd71ef1a819287c19d586d7334ae8ed66" 517 | 518 | [[package]] 519 | name = "windows_i686_msvc" 520 | version = "0.52.6" 521 | source = "registry+https://github.com/rust-lang/crates.io-index" 522 | checksum = "240948bc05c5e7c6dabba28bf89d89ffce3e303022809e73deaefe4f6ec56c66" 523 | 524 | [[package]] 525 | name = "windows_x86_64_gnu" 526 | version = "0.52.6" 527 | source = "registry+https://github.com/rust-lang/crates.io-index" 528 | checksum = "147a5c80aabfbf0c7d901cb5895d1de30ef2907eb21fbbab29ca94c5b08b1a78" 529 | 530 | [[package]] 531 | name = "windows_x86_64_gnullvm" 532 | version = "0.52.6" 533 | source = "registry+https://github.com/rust-lang/crates.io-index" 534 | checksum = "24d5b23dc417412679681396f2b49f3de8c1473deb516bd34410872eff51ed0d" 535 | 536 | [[package]] 537 | name = "windows_x86_64_msvc" 538 | version = "0.52.6" 539 | source = "registry+https://github.com/rust-lang/crates.io-index" 540 | checksum = "589f6da84c646204747d1270a2a5661ea66ed1cced2631d546fdfb155959f9ec" 541 | 542 | [[package]] 543 | name = "zerocopy" 544 | version = "0.7.35" 545 | source = "registry+https://github.com/rust-lang/crates.io-index" 546 | checksum = "1b9b4fd18abc82b8136838da5d50bae7bdea537c574d8dc1a34ed098d6c166f0" 547 | dependencies = [ 548 | "byteorder", 549 | "zerocopy-derive", 550 | ] 551 | 552 | [[package]] 553 | name = "zerocopy-derive" 554 | version = "0.7.35" 555 | source = "registry+https://github.com/rust-lang/crates.io-index" 556 | checksum = "fa4f8080344d4671fb4e831a13ad1e68092748387dfc4f55e356242fae12ce3e" 557 | dependencies = [ 558 | "proc-macro2", 559 | "quote", 560 | "syn 2.0.90", 561 | ] 562 | -------------------------------------------------------------------------------- /rustkit/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "rustkit" 3 | version = "0.1.0" 4 | edition = "2021" 5 | 6 | [dependencies] 7 | nalgebra = "0.33.2" 8 | rand = "0.8" 9 | pyo3 = { version 
= "0.18", features = ["extension-module"] } 10 | numpy = "0.18" 11 | 12 | [lib] 13 | name = "rustkit" 14 | crate-type = ["cdylib"] 15 | 16 | [[bin]] 17 | name = "rustkit-cli" # Renames the binary target 18 | path = "src/main.rs" -------------------------------------------------------------------------------- /rustkit/pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["maturin>=1.0,<2.0"] 3 | build-backend = "maturin" 4 | 5 | [tool.maturin] 6 | module-name = "rustkit" -------------------------------------------------------------------------------- /rustkit/src/benchmarking.rs: -------------------------------------------------------------------------------- 1 | use std::fs::OpenOptions; 2 | use std::io::{self, Write}; 3 | use std::time::Instant; 4 | 5 | pub fn log_function_time<F, T>( 6 | mut func: F, 7 | func_name: &str, 8 | input_rows: usize, 9 | input_cols: usize, 10 | ) -> io::Result<T> 11 | where 12 | F: FnMut() -> T, 13 | { 14 | let start = Instant::now(); 15 | let resp = func(); 16 | let duration = start.elapsed(); 17 | 18 | let runtime = duration.as_secs_f64(); 19 | 20 | let mut file = OpenOptions::new() 21 | .create(true) 22 | .append(true) 23 | .open("timing_log.csv")?; 24 | 25 | writeln!( 26 | file, 27 | "{},{},{},{}", 28 | func_name, input_rows, input_cols, runtime 29 | )?; 30 | 31 | Ok(resp) 32 | } 33 | -------------------------------------------------------------------------------- /rustkit/src/converters.rs: -------------------------------------------------------------------------------- 1 | use nalgebra::{DMatrix, DVector}; 2 | use numpy::ndarray::{Array1, Array2}; 3 | use numpy::{PyArray1, PyArray2, PyReadonlyArray1, PyReadonlyArray2}; 4 | use pyo3::prelude::*; 5 | use pyo3::types::PyFloat; 6 | 7 | pub fn python_to_rust_opt_float(value: f64) -> Option<f64> { 8 | if value.is_nan() { 9 | None 10 | } else { 11 | Some(value) 12 | } 13 | } 14 | 15 | pub fn python_to_rust_dynamic_vector< 16
| T: Clone + Copy + numpy::Element + std::fmt::Debug + std::cmp::PartialEq + 'static, 17 | >( 18 | array: &PyReadonlyArray1<T>, 19 | ) -> DVector<T> { 20 | let vector = array.as_array(); 21 | let elements = vector.iter().cloned().collect::<Vec<_>>(); 22 | DVector::from_row_slice(&elements) 23 | } 24 | 25 | pub fn python_to_rust_dynamic_matrix< 26 | T: Clone + Copy + numpy::Element + std::fmt::Debug + std::cmp::PartialEq + 'static, 27 | >( 28 | array: &PyReadonlyArray2<T>, 29 | ) -> DMatrix<T> { 30 | let matrix = array.as_array(); 31 | let shape = matrix.shape(); 32 | let rows = shape[0]; 33 | let cols = shape[1]; 34 | let elements = matrix.iter().cloned().collect::<Vec<_>>(); 35 | DMatrix::from_row_slice(rows, cols, &elements) 36 | } 37 | 38 | pub fn python_to_rust_opt_dynamic_vector(array: &PyReadonlyArray1<f64>) -> DVector<Option<f64>> { 39 | // Convert a NumPy float array to nalgebra::DVector<Option<f64>>, mapping NaN to None 40 | let vector = array.as_array(); 41 | let elements = vector.iter().cloned().collect::<Vec<_>>(); 42 | DVector::from_iterator( 43 | elements.len(), 44 | elements.into_iter().map(python_to_rust_opt_float), 45 | ) 46 | } 47 | 48 | pub fn python_to_rust_opt_dynamic_matrix(array: &PyReadonlyArray2<f64>) -> DMatrix<Option<f64>> { 49 | // Convert a NumPy float matrix to nalgebra::DMatrix<Option<f64>>, mapping NaN to None 50 | let matrix = array.as_array(); 51 | let shape = matrix.shape(); 52 | let rows = shape[0]; 53 | let cols = shape[1]; 54 | let matrix_transposed = matrix.t(); 55 | let elements = matrix_transposed.iter().cloned().collect::<Vec<_>>(); 56 | DMatrix::from_iterator( 57 | rows, 58 | cols, 59 | elements.into_iter().map(python_to_rust_opt_float), 60 | ) 61 | } 62 | 63 | pub fn rust_to_python_opt_float(py: Python, value: Option<f64>) -> PyResult<Py<PyFloat>> { 64 | match value { 65 | Some(v) => Ok(PyFloat::new(py, v).into()), 66 | None => Ok(PyFloat::new(py, f64::NAN).into()), 67 | } 68 | } 69 | 70 | pub fn rust_to_python_dynamic_vector( 71 | py: Python, 72 | vector: DVector<f64>, 73 | ) -> PyResult<Py<PyArray1<f64>>> { 74 | let array = Array1::from_vec(vector.data.into()); 75 | Ok(PyArray1::from_array(py, &array).into())
76 | } 77 | 78 | pub fn rust_to_python_dynamic_matrix< 79 | T: ToPyObject + Clone + std::fmt::Debug + std::cmp::PartialEq + numpy::Element + 'static, 80 | >( 81 | py: Python, 82 | matrix: DMatrix<T>, 83 | ) -> PyResult<Py<PyArray2<T>>> { 84 | let shape = matrix.shape(); 85 | let rows = shape.0; 86 | let cols = shape.1; 87 | let transposed_matrix = matrix.transpose(); 88 | let array = Array2::from_shape_vec((rows, cols), transposed_matrix.data.into()); 89 | match array { 90 | Ok(arr) => Ok(PyArray2::from_array(py, &arr).into()), 91 | Err(e) => Err(pyo3::exceptions::PyValueError::new_err(format!( 92 | "Array creation failed: {}", 93 | e 94 | ))), 95 | } 96 | } 97 | 98 | pub fn rust_to_python_opt_dynamic_vector( 99 | py: Python, 100 | vector: DVector<Option<f64>>, 101 | ) -> PyResult<Py<PyArray1<f64>>> { 102 | // Convert a nalgebra::DVector<Option<f64>> to a NumPy array, mapping None to NaN 103 | let array = Array1::from_vec( 104 | vector 105 | .data 106 | .as_slice() 107 | .iter() 108 | .map(|v| match v { 109 | Some(val) => *val, 110 | None => f64::NAN, 111 | }) 112 | .collect(), 113 | ); 114 | Ok(PyArray1::from_array(py, &array).into()) 115 | } 116 | 117 | pub fn rust_to_python_opt_dynamic_matrix( 118 | py: Python, 119 | matrix: DMatrix<Option<f64>>, 120 | ) -> PyResult<Py<PyArray2<f64>>> { 121 | // Convert a nalgebra::DMatrix<Option<f64>> to a NumPy array, mapping None to NaN (transposed first so the flat buffer is row-major) 122 | let shape = matrix.shape(); 123 | let rows = shape.0; 124 | let cols = shape.1; 125 | 126 | let transposed_matrix = matrix.transpose(); 127 | 128 | let array_data: Vec<f64> = transposed_matrix 129 | .data 130 | .as_slice() 131 | .iter() 132 | .map(|v| match v { 133 | Some(val) => *val, 134 | None => f64::NAN, 135 | }) 136 | .collect(); 137 | 138 | let array = Array2::from_shape_vec((rows, cols), array_data).map_err(|e| { 139 | pyo3::exceptions::PyValueError::new_err(format!("Array creation failed: {}", e)) 140 | })?; 141 | let numpy_array = PyArray2::from_array(py, &array); 142 | 143 | Ok(numpy_array.to_owned()) 144 | } 145 | 146 | #[pyfunction] 147 | pub fn
converter_vector_test( 148 | py: Python, 149 | vector: PyReadonlyArray1<f64>, 150 | ) -> PyResult<Py<PyArray1<f64>>> { 151 | let rust_vector = python_to_rust_dynamic_vector(&vector); 152 | rust_to_python_dynamic_vector(py, rust_vector) 153 | } 154 | 155 | #[pyfunction] 156 | pub fn converter_matrix_test( 157 | py: Python, 158 | matrix: PyReadonlyArray2<f64>, 159 | ) -> PyResult<Py<PyArray2<f64>>> { 160 | let rust_matrix = python_to_rust_dynamic_matrix(&matrix); 161 | println!("{:?}", rust_matrix); 162 | rust_to_python_dynamic_matrix(py, rust_matrix) 163 | } 164 | 165 | #[pyfunction] 166 | pub fn converter_matrix_opt_test( 167 | py: Python, 168 | matrix: PyReadonlyArray2<f64>, 169 | ) -> PyResult<Py<PyArray2<f64>>> { 170 | let rust_matrix = python_to_rust_opt_dynamic_matrix(&matrix); 171 | println!("{:?}", rust_matrix); 172 | rust_to_python_opt_dynamic_matrix(py, rust_matrix) 173 | } 174 | -------------------------------------------------------------------------------- /rustkit/src/lib.rs: -------------------------------------------------------------------------------- 1 | use pyo3::prelude::*; 2 | 3 | mod preprocessing; 4 | use preprocessing::simple_imputer::Imputer; 5 | use preprocessing::standard_scaler::StandardScaler; 6 | 7 | mod supervised; 8 | use supervised::ridge_regression::RidgeRegression; 9 | 10 | mod testing; 11 | use testing::regression_metrics::R2Score; 12 | use testing::regression_metrics::MSE; 13 | 14 | mod unsupervised; 15 | use unsupervised::kmeans::KMeans; 16 | use unsupervised::pca::PCA; 17 | 18 | pub mod converters; 19 | use converters::{converter_matrix_opt_test, converter_matrix_test, converter_vector_test}; 20 | 21 | pub mod benchmarking; 22 | 23 | #[pymodule] 24 | fn rustkit(_py: Python, m: &PyModule) -> PyResult<()> { 25 | m.add_class::<Imputer>()?; 26 | m.add_class::<StandardScaler>()?; 27 | m.add_class::<RidgeRegression>()?; 28 | m.add_class::<R2Score>()?; 29 | m.add_class::<MSE>()?; 30 | m.add_class::<KMeans>()?; 31 | m.add_class::<PCA>()?; 32 | 33 | m.add_function(wrap_pyfunction!(converter_vector_test, m)?)?; 34 |
m.add_function(wrap_pyfunction!(converter_matrix_test, m)?)?; 35 | m.add_function(wrap_pyfunction!(converter_matrix_opt_test, m)?)?; 36 | 37 | Ok(()) 38 | } 39 | -------------------------------------------------------------------------------- /rustkit/src/main.rs: -------------------------------------------------------------------------------- 1 | use nalgebra::{DMatrix, DVector}; 2 | pub mod benchmarking; 3 | pub mod converters; 4 | 5 | mod preprocessing; 6 | mod supervised; 7 | mod testing; 8 | mod unsupervised; 9 | 10 | use preprocessing::simple_imputer::Imputer; 11 | use preprocessing::standard_scaler::StandardScaler; 12 | use supervised::ridge_regression::RidgeRegression; 13 | use testing::regression_metrics::{R2Score, MSE}; 14 | use unsupervised::kmeans::KMeans; 15 | use unsupervised::pca::PCA; 16 | 17 | fn main() { 18 | sample_scaler(); 19 | print!("\n \n"); 20 | sample_pca(); 21 | print!("\n \n"); 22 | sample_ridge(); 23 | print!("\n \n"); 24 | sample_r2(); 25 | 26 | print!("\n \n"); 27 | sample_kmeans(); 28 | 29 | print!("\n \n"); 30 | sample_imputer(); 31 | } 32 | 33 | fn sample_ridge() { 34 | println!("============================================================================="); 35 | println!("RIDGE REGRESSION EXAMPLE"); 36 | println!("============================================================================="); 37 | // Training data: 4 samples, 2 features 38 | let x = DMatrix::from_row_slice(4, 2, &[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]); 39 | let y = DVector::from_row_slice(&[1.0, 2.0, 3.0, 4.0]); 40 | 41 | // Ridge regression with bias (default behavior) 42 | let mut ridge_with_bias = RidgeRegression::new(1.0, true); 43 | ridge_with_bias.fit_helper(&x, &y); 44 | 45 | println!("With Bias - Weights: {}", ridge_with_bias.weights_helper()); 46 | println!( 47 | "With Bias - Intercept: {:?}", 48 | ridge_with_bias.intercept_helper() 49 | ); 50 | println!( 51 | "With Bias - Predictions: {}", 52 | ridge_with_bias.predict_helper(&x) 53 | ); 54 | 55 | // 
Ridge regression without bias 56 | let mut ridge_no_bias = RidgeRegression::new(1.0, false); 57 | ridge_no_bias.fit_helper(&x, &y); 58 | 59 | println!("No Bias - Weights: {}", ridge_no_bias.weights_helper()); 60 | println!( 61 | "No Bias - Intercept: {:?}", 62 | ridge_no_bias.intercept_helper() 63 | ); 64 | println!( 65 | "No Bias - Predictions: {}", 66 | ridge_no_bias.predict_helper(&x) 67 | ); 68 | } 69 | 70 | fn sample_pca() { 71 | println!("============================================================================="); 72 | println!("PCA EXAMPLE"); 73 | println!("============================================================================="); 74 | // Sample data: 4 samples, 3 features 75 | let data = DMatrix::from_row_slice( 76 | 4, 77 | 3, 78 | &[ 79 | 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 80 | ], 81 | ); 82 | 83 | let mut pca = PCA::new(); 84 | let transformed_data = pca.fit_transform_helper(&data, 2); 85 | println!("Original Data:\n{}", data); 86 | println!("Transformed Data:\n{}", transformed_data); 87 | println!("Principal Components:\n{}", pca.components_helper()); 88 | println!("Explained Variance:\n{}", pca.explained_variance_helper()); 89 | 90 | let original_data = pca.inverse_transform_helper(&transformed_data); 91 | println!( 92 | "Reconstructed Data (after inverse transform):\n{}", 93 | original_data 94 | ); 95 | } 96 | 97 | fn sample_scaler() { 98 | println!("============================================================================="); 99 | println!("STANDARD SCALER EXAMPLE"); 100 | println!("============================================================================="); 101 | 102 | let data = DMatrix::from_row_slice( 103 | 4, 104 | 3, 105 | &[ 106 | 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 107 | ], 108 | ); 109 | 110 | let mut scaler = StandardScaler::new(); 111 | let standardized_data = scaler.fit_transform_helper(&data); 112 | 113 | println!("Standardized Data:\n{}", standardized_data); 114 | 
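// Sanity check (a sketch added for illustration; not part of the original demo).
// `fit_helper` divides the squared deviations by n (population variance), which is
// also what scikit-learn's StandardScaler does (ddof = 0), so every standardized
// column should have mean ~0 and standard deviation ~1. This assumes no column is
// constant: a zero standard deviation would produce NaN and trip the assertion.
for j in 0..standardized_data.ncols() {
    let col = standardized_data.column(j);
    let n = col.len() as f64;
    let mean = col.iter().sum::<f64>() / n;
    let std_dev = (col.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n).sqrt();
    assert!(mean.abs() < 1e-10, "column {} has mean {}", j, mean);
    assert!((std_dev - 1.0).abs() < 1e-10, "column {} has std {}", j, std_dev);
}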
115 | let original_data = scaler.inverse_transform_helper(&standardized_data); 116 | println!( 117 | "Original Data (after inverse transform):\n{}", 118 | original_data 119 | ); 120 | } 121 | 122 | fn sample_r2() { 123 | // True values 124 | println!("============================================================================="); 125 | println!("R2-SCORE & MSE EXAMPLE"); 126 | println!("============================================================================="); 127 | 128 | let y_true = DVector::from_row_slice(&[3.0, -0.5, 2.0, 7.0]); 129 | 130 | // Predicted values 131 | let y_pred = DVector::from_row_slice(&[2.5, 0.0, 2.0, 8.0]); 132 | 133 | // Compute the scores 134 | let r2_score = R2Score::compute_helper(&y_true, &y_pred); 135 | let mse = MSE::compute_helper(&y_true, &y_pred); 136 | 137 | println!("R² Score: {}", r2_score); 138 | println!("MSE: {}", mse); 139 | 140 | // Test case for a constant true vector 141 | let y_true_constant = DVector::from_row_slice(&[1.0, 1.0, 1.0, 1.0]); 142 | let y_pred_constant = DVector::from_row_slice(&[1.0, 1.0, 1.0, 1.0]); 143 | 144 | let r2_score_constant = R2Score::compute_helper(&y_true_constant, &y_pred_constant); 145 | let mse_constant = MSE::compute_helper(&y_true_constant, &y_pred_constant); 146 | println!("R² Score (constant): {}", r2_score_constant); 147 | println!("MSE (constant): {}", mse_constant); 148 | 149 | // Example where R² is negative (bad model) 150 | let y_pred_bad = DVector::from_row_slice(&[10.0, 10.0, 10.0, 10.0]); 151 | let r2_score_bad = R2Score::compute_helper(&y_true, &y_pred_bad); 152 | let mse_bad = MSE::compute_helper(&y_true, &y_pred_bad); 153 | println!("R² Score (bad model): {}", r2_score_bad); 154 | println!("MSE (bad model): {}", mse_bad); 155 | } 156 | 157 | fn sample_imputer() { 158 | println!("============================================================================="); 159 | println!("IMPUTER EXAMPLE"); 160 | 
println!("============================================================================="); 161 | let data = DMatrix::from_row_slice( 162 | 3, 163 | 2, 164 | &[ 165 | Some(1.0), 166 | None, 167 | None, 168 | Some(4.0), 169 | Some(5.0), 170 | None, 171 | None, 172 | Some(8.0), 173 | None, 174 | ], 175 | ); 176 | 177 | let mut imputer_mean = Imputer::new("mean", None); 178 | let mut imputer_cons = Imputer::new("constant", Some(-1.0)); 179 | match imputer_mean.fit_helper(&data) { 180 | Ok(()) => { 181 | println!("Original data:\n{:?}", data); 182 | println!( 183 | "Mean imputed data:\n{}", 184 | imputer_mean.transform_helper(&data) 185 | ); 186 | } 187 | Err(e) => eprintln!("Mean imputation error: {}", e), 188 | } 189 | match imputer_cons.fit_transform_helper(&data) { 190 | Ok(imputed_data) => { 191 | println!("Cons imputed data:\n{}", imputed_data); 192 | } 193 | Err(e) => eprintln!("Cons imputation error: {}", e), 194 | } 195 | let test_data = DMatrix::from_row_slice( 196 | 5, 197 | 2, 198 | &[ 199 | None, 200 | Some(1.0), 201 | Some(1.0), 202 | None, 203 | None, 204 | Some(1.0), 205 | Some(1.0), 206 | None, 207 | None, 208 | Some(1.0), 209 | ], 210 | ); 211 | println!("Original test data:\n{:?}", test_data); 212 | println!( 213 | "Mean imputed test data (fit on original data above):\n{}", 214 | imputer_mean.transform_helper(&test_data) 215 | ); 216 | } 217 | 218 | fn sample_kmeans() { 219 | println!("============================================================================="); 220 | println!("KMEANS EXAMPLE"); 221 | println!("============================================================================="); 222 | let data = DMatrix::from_row_slice( 223 | 10, 224 | 3, 225 | &[ 226 | 1.0, 2.0, 3.0, // Point 1 227 | 1.1, 2.1, 3.1, // Point 2 228 | 0.9, 1.9, 2.9, // Point 3 229 | 8.0, 9.0, 10.0, // Point 4 230 | 8.1, 9.1, 10.1, // Point 5 231 | 7.9, 8.9, 9.9, // Point 6 232 | 4.0, 5.0, 6.0, // Point 7 233 | 4.1, 5.1, 6.1, // Point 8 234 | 3.9, 4.9, 5.9, //
Point 9 235 | 4.0, 5.0, 6.0, // Point 10 236 | ], 237 | ); 238 | 239 | // Number of clusters 240 | let k = 2; 241 | println!("{}", data.row(1)); 242 | 243 | // Run KMeans with Random initialization 244 | let mut kmeans_random = KMeans::new(k, "random", Some(200), Some(10)); 245 | kmeans_random.fit_helper(&data); 246 | let labels_random = kmeans_random.predict_helper(&data).unwrap(); 247 | let inertia_random = kmeans_random.compute_inertia_helper( 248 | &data, 249 | &labels_random, 250 | kmeans_random.get_centroids_helper().unwrap(), 251 | ); 252 | 253 | // Print results for Random initialization 254 | println!("Results with Random Initialization:"); 255 | println!("Labels: {}", labels_random.transpose()); 256 | println!( 257 | "Centroids: {}", 258 | kmeans_random.get_centroids_helper().unwrap() 259 | ); 260 | println!("Total Inertia: {:.4}", inertia_random); 261 | 262 | // Run KMeans with KMeans++ initialization 263 | let mut kmeans_plus_plus = KMeans::new(k, "kmeans++", None, None); 264 | let labels_plus_plus = kmeans_plus_plus.fit_predict_helper(&data); 265 | let inertia_plus_plus = kmeans_plus_plus.compute_inertia_helper( 266 | &data, 267 | &labels_plus_plus, 268 | kmeans_plus_plus.get_centroids_helper().unwrap(), 269 | ); 270 | 271 | // Print results for KMeans++ initialization 272 | println!("\nResults with KMeans++ Initialization:"); 273 | println!("Labels: {}", labels_plus_plus.transpose()); 274 | println!( 275 | "Centroids: {}", 276 | kmeans_plus_plus.get_centroids_helper().unwrap() 277 | ); 278 | println!("Total Inertia: {:.4}", inertia_plus_plus); 279 | } 280 | -------------------------------------------------------------------------------- /rustkit/src/preprocessing/mod.rs: -------------------------------------------------------------------------------- 1 | pub mod standard_scaler; 2 | pub mod simple_imputer; 3 | -------------------------------------------------------------------------------- /rustkit/src/preprocessing/simple_imputer.rs: 
-------- 1 | use crate::benchmarking::log_function_time; 2 | use crate::converters::{python_to_rust_opt_dynamic_matrix, rust_to_python_dynamic_matrix}; 3 | use nalgebra::DMatrix; 4 | use numpy::{PyArray2, PyReadonlyArray2}; 5 | use pyo3::exceptions::PyValueError; 6 | use pyo3::prelude::*; 7 | use pyo3::Python; 8 | use std::fmt; 9 | 10 | #[derive(Debug)] 11 | pub struct ImputerError { 12 | column_index: usize, 13 | } 14 | 15 | impl fmt::Display for ImputerError { 16 | fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { 17 | write!( 18 | f, 19 | "Column {} has no non-missing values to compute the mean.", 20 | self.column_index 21 | ) 22 | } 23 | } 24 | 25 | impl std::error::Error for ImputerError {} 26 | 27 | #[derive(Debug, Clone)] 28 | pub enum ImputationType { 29 | Mean, 30 | Constant(f64), 31 | } 32 | 33 | #[pyclass] 34 | pub struct Imputer { 35 | strategy: ImputationType, 36 | impute_values: Option<Vec<f64>>, 37 | } 38 | 39 | // Performs imputation on a matrix of Option<f64>. Necessary when importing datasets with null entries (e.g.
Python) 40 | #[pymethods] 41 | impl Imputer { 42 | #[new] 43 | pub fn new(strategy: &str, value: Option<f64>) -> Self { 44 | match strategy { 45 | "mean" => Imputer { 46 | strategy: ImputationType::Mean, 47 | impute_values: None, 48 | }, 49 | "constant" => Imputer { 50 | strategy: ImputationType::Constant(value.unwrap()), 51 | impute_values: None, 52 | }, 53 | _ => panic!("Invalid strategy"), 54 | } 55 | } 56 | 57 | pub fn fit(&mut self, data: PyReadonlyArray2<f64>) -> PyResult<()> { 58 | let rust_data = python_to_rust_opt_dynamic_matrix(&data); 59 | let result = log_function_time( 60 | || self.fit_helper(&rust_data), 61 | "Imputer::fit", 62 | rust_data.shape().0, 63 | rust_data.shape().1, 64 | ); 65 | match result { 66 | Ok(fit_result) => fit_result.map_err(|e| PyValueError::new_err(e.to_string())), // propagate ImputerError instead of discarding it 67 | Err(e) => Err(PyValueError::new_err(e.to_string())), 68 | } 69 | } 70 | 71 | pub fn transform( 72 | &self, 73 | py: Python, 74 | data: PyReadonlyArray2<f64>, 75 | ) -> PyResult<Py<PyArray2<f64>>> { 76 | let rust_data = python_to_rust_opt_dynamic_matrix(&data); 77 | let shape = rust_data.shape(); 78 | let transformed_data = log_function_time( 79 | || self.transform_helper(&rust_data), 80 | "Imputer::transform", 81 | shape.0, 82 | shape.1, 83 | ) 84 | .unwrap(); 85 | rust_to_python_dynamic_matrix(py, transformed_data) 86 | } 87 | 88 | pub fn fit_transform( 89 | &mut self, 90 | py: Python, 91 | data: PyReadonlyArray2<f64>, 92 | ) -> PyResult<Py<PyArray2<f64>>> { 93 | let rust_data = python_to_rust_opt_dynamic_matrix(&data); 94 | let shape = rust_data.shape(); 95 | let transformed_data = log_function_time( 96 | || self.fit_transform_helper(&rust_data), 97 | "Imputer::fit_transform", 98 | shape.0, 99 | shape.1, 100 | ) 101 | .unwrap(); 102 | match transformed_data { 103 | Ok(data) => rust_to_python_dynamic_matrix(py, data), 104 | Err(e) => Err(PyValueError::new_err(format!( 105 | "Column {} has no non-missing values to compute the mean.", 106 | e.column_index, 107 | ))), 108 | } 109 | } 110 | } 111 | 112 | impl Imputer { 113 | pub fn fit_transform_helper( 114 | &mut self, 115 | data:
&DMatrix>, 116 | ) -> Result, ImputerError> { 117 | // Call `fit` and propagate errors if any 118 | let result = self.fit_helper(data); 119 | 120 | match result { 121 | Ok(_) => Ok(self.transform_helper(data)), 122 | Err(e) => Err(e), 123 | } 124 | } 125 | 126 | /// Fits the imputer to the data, computing imputation values for each column. 127 | pub fn fit_helper(&mut self, data: &DMatrix>) -> Result<(), ImputerError> { 128 | let (_, ncols) = data.shape(); 129 | let mut impute_values = Vec::new(); 130 | 131 | for j in 0..ncols { 132 | let column = data.column(j); 133 | 134 | // Compute the imputation value based on the strategy 135 | let impute_value = match &self.strategy { 136 | ImputationType::Mean => { 137 | let mut non_missing_values = Vec::new(); 138 | 139 | for i in 0..data.nrows() { 140 | if let Some(value) = column[i] { 141 | non_missing_values.push(value); 142 | } 143 | } 144 | 145 | let mean = self.mean_safe(&non_missing_values); 146 | match mean { 147 | None => return Err(ImputerError { column_index: j }), 148 | Some(value) => value, 149 | } 150 | } 151 | ImputationType::Constant(val) => *val, 152 | }; 153 | 154 | impute_values.push(impute_value); 155 | } 156 | 157 | self.impute_values = Some(impute_values); 158 | Ok(()) 159 | } 160 | 161 | /// Transforms the data using the computed imputation values. Panics if the imputer has not been fitted. 162 | pub fn transform_helper(&self, data: &DMatrix>) -> DMatrix { 163 | if self.impute_values.is_none() { 164 | panic!("Imputer has not been fitted yet. 
Please call `fit` before `transform`."); 165 | } 166 | 167 | let impute_values = self.impute_values.as_ref().unwrap(); 168 | let (nrows, ncols) = data.shape(); 169 | let mut result = DMatrix::zeros(nrows, ncols); 170 | 171 | for j in 0..ncols { 172 | let column = data.column(j); 173 | 174 | // Use the pre-computed imputation value for the column 175 | let impute_value = impute_values[j]; 176 | 177 | for i in 0..nrows { 178 | result[(i, j)] = column[i].unwrap_or(impute_value); 179 | } 180 | } 181 | 182 | result 183 | } 184 | 185 | // Helper function to calculate the mean of a Vec without running into overflow errors 186 | fn mean_safe(&self, vec: &Vec) -> Option { 187 | if vec.is_empty() { 188 | None // Handle empty vector 189 | } else { 190 | let mut mean = 0.0; 191 | for (i, &value) in vec.iter().enumerate() { 192 | mean += (value - mean) / (i + 1) as f64; // Update mean iteratively 193 | } 194 | Some(mean) 195 | } 196 | } 197 | } 198 | -------------------------------------------------------------------------------- /rustkit/src/preprocessing/standard_scaler.rs: -------------------------------------------------------------------------------- 1 | use crate::benchmarking::log_function_time; 2 | use crate::converters::{python_to_rust_dynamic_matrix, rust_to_python_dynamic_matrix}; 3 | use nalgebra::{DMatrix, DVector}; 4 | use numpy::{PyArray2, PyReadonlyArray2}; 5 | use pyo3::exceptions::PyValueError; 6 | use pyo3::prelude::*; 7 | 8 | #[pyclass] 9 | pub struct StandardScaler { 10 | means: Option>, 11 | std_devs: Option>, 12 | } 13 | 14 | //TODO: Add a flag to decide whether feature std is calculated 15 | // by dividing by n or n-1 16 | #[pymethods] 17 | impl StandardScaler { 18 | /// Creates a new `StandardScaler` instance. 
19 | #[new] 20 | pub fn new() -> Self { 21 | StandardScaler { 22 | means: None, 23 | std_devs: None, 24 | } 25 | } 26 | 27 | // Wrapper for the `fit` method 28 | pub fn fit(&mut self, data: PyReadonlyArray2) -> PyResult<()> { 29 | let rust_data = python_to_rust_dynamic_matrix(&data); 30 | let shape = rust_data.shape(); 31 | let result = log_function_time( 32 | || self.fit_helper(&rust_data), 33 | "StandardScaler::fit", 34 | shape.0, 35 | shape.1, 36 | ); 37 | match result { 38 | Ok(_) => Ok(()), 39 | Err(e) => Err(PyValueError::new_err(e.to_string())), 40 | } 41 | } 42 | 43 | // Wrapper for the `transform` method 44 | pub fn transform( 45 | &self, 46 | py: Python, 47 | data: PyReadonlyArray2, 48 | ) -> PyResult>> { 49 | let rust_data = python_to_rust_dynamic_matrix(&data); 50 | let shape = rust_data.shape(); 51 | let transformed_data = log_function_time( 52 | || self.transform_helper(&rust_data), 53 | "StandardScaler::transform", 54 | shape.0, 55 | shape.1, 56 | ) 57 | .unwrap(); 58 | rust_to_python_dynamic_matrix(py, transformed_data) 59 | } 60 | 61 | // Wrapper for the `fit_transform` method 62 | pub fn fit_transform( 63 | &mut self, 64 | py: Python, 65 | data: PyReadonlyArray2, 66 | ) -> PyResult>> { 67 | let rust_data = python_to_rust_dynamic_matrix(&data); 68 | let shape = rust_data.shape(); 69 | let transformed_data = log_function_time( 70 | || self.fit_transform_helper(&rust_data), 71 | "StandardScaler::fit_transform", 72 | shape.0, 73 | shape.1, 74 | ) 75 | .unwrap(); 76 | rust_to_python_dynamic_matrix(py, transformed_data) 77 | } 78 | 79 | // Wrapper for the `inverse_transform` method 80 | pub fn inverse_transform( 81 | &self, 82 | py: Python, 83 | scaled_data: PyReadonlyArray2, 84 | ) -> PyResult>> { 85 | let rust_scaled_data = python_to_rust_dynamic_matrix(&scaled_data); 86 | let shape = rust_scaled_data.shape(); 87 | let original_data = log_function_time( 88 | || self.inverse_transform_helper(&rust_scaled_data), 89 | "StandardScaler::inverse_transform", 
90 | shape.0, 91 | shape.1, 92 | ) 93 | .unwrap(); 94 | rust_to_python_dynamic_matrix(py, original_data) 95 | } 96 | } 97 | 98 | impl StandardScaler { 99 | /// Fits the scaler to the data by calculating the means and standard deviations for each column. 100 | fn fit_helper(&mut self, data: &DMatrix) { 101 | let (n_rows, n_cols) = data.shape(); 102 | 103 | let mut means = DVector::zeros(n_cols); 104 | let mut std_devs = DVector::zeros(n_cols); 105 | 106 | for col in 0..n_cols { 107 | let column = data.column(col); 108 | let sum: f64 = column.iter().sum(); 109 | let mean = sum / n_rows as f64; 110 | 111 | means[col] = mean; 112 | 113 | let variance: f64 = 114 | column.iter().map(|&x| (x - mean).powi(2)).sum::() / (n_rows) as f64; 115 | std_devs[col] = variance.sqrt(); 116 | } 117 | 118 | self.means = Some(means); 119 | self.std_devs = Some(std_devs); 120 | } 121 | 122 | /// Transforms the input data using the fitted means and standard deviations. 123 | fn transform_helper(&self, data: &DMatrix) -> DMatrix { 124 | let means = self 125 | .means 126 | .as_ref() 127 | .expect("Scaler has not been fitted yet."); 128 | let std_devs = self 129 | .std_devs 130 | .as_ref() 131 | .expect("Scaler has not been fitted yet."); 132 | 133 | let (n_rows, n_cols) = data.shape(); 134 | let mut scaled_data = DMatrix::zeros(n_rows, n_cols); 135 | 136 | for col in 0..n_cols { 137 | let mean = means[col]; 138 | let std_dev = std_devs[col]; 139 | 140 | for row in 0..n_rows { 141 | scaled_data[(row, col)] = (data[(row, col)] - mean) / std_dev; 142 | } 143 | } 144 | 145 | scaled_data 146 | } 147 | 148 | /// Fits the scaler and then transforms the input data. 
149 | pub fn fit_transform_helper(&mut self, data: &DMatrix) -> DMatrix { 150 | self.fit_helper(data); 151 | self.transform_helper(data) 152 | } 153 | 154 | pub fn inverse_transform_helper(&self, scaled_data: &DMatrix) -> DMatrix { 155 | let means = self 156 | .means 157 | .as_ref() 158 | .expect("Scaler has not been fitted yet."); 159 | let std_devs = self 160 | .std_devs 161 | .as_ref() 162 | .expect("Scaler has not been fitted yet."); 163 | 164 | let (n_rows, n_cols) = scaled_data.shape(); 165 | let mut original_data = DMatrix::zeros(n_rows, n_cols); 166 | 167 | for col in 0..n_cols { 168 | let mean = means[col]; 169 | let std_dev = std_devs[col]; 170 | 171 | for row in 0..n_rows { 172 | original_data[(row, col)] = scaled_data[(row, col)] * std_dev + mean; 173 | } 174 | } 175 | 176 | original_data 177 | } 178 | } 179 | -------------------------------------------------------------------------------- /rustkit/src/supervised/mod.rs: -------------------------------------------------------------------------------- 1 | pub mod ridge_regression; 2 | -------------------------------------------------------------------------------- /rustkit/src/supervised/ridge_regression.rs: -------------------------------------------------------------------------------- 1 | use crate::benchmarking::log_function_time; 2 | use crate::converters::{ 3 | python_to_rust_dynamic_matrix, python_to_rust_dynamic_vector, rust_to_python_dynamic_vector, 4 | rust_to_python_opt_float, 5 | }; 6 | use nalgebra::{DMatrix, DVector}; 7 | use numpy::{PyArray1, PyReadonlyArray1, PyReadonlyArray2}; 8 | use pyo3::exceptions::PyValueError; 9 | use pyo3::prelude::*; 10 | use pyo3::types::PyFloat; 11 | 12 | #[pyclass] 13 | pub struct RidgeRegression { 14 | weights: DVector, 15 | intercept: Option, // Optional because it may not be used 16 | regularization: f64, 17 | with_bias: bool, 18 | } 19 | 20 | #[pymethods] 21 | impl RidgeRegression { 22 | /// Creates a new RidgeRegression instance. 
23 | /// Set `regularization` to 0.0 for standard linear regression. 24 | /// The `with_bias` parameter specifies whether to fit an intercept (bias) term; it has no default and must always be passed. 25 | /// We use LU decomposition and solve, rather than explicitly computing a (computationally expensive) matrix inverse. 26 | /// IMPORTANT: the L2 penalty is scale-dependent, so standardize the features (e.g. with `StandardScaler`) before fitting with `regularization > 0`; on unscaled data, use `regularization = 0.0`. 27 | 28 | #[new] 29 | pub fn new(regularization: f64, with_bias: bool) -> Self { 30 | RidgeRegression { 31 | weights: DVector::zeros(0), 32 | intercept: if with_bias { Some(0.0) } else { None }, 33 | regularization, 34 | with_bias, 35 | } 36 | } 37 | 38 | #[getter] 39 | pub fn weights(&self, py: Python) -> PyResult<Py<PyArray1<f64>>> { 40 | rust_to_python_dynamic_vector(py, self.weights.clone()) 41 | } 42 | 43 | #[getter] 44 | pub fn intercept(&self, py: Python) -> PyResult<Py<PyFloat>> { 45 | rust_to_python_opt_float(py, self.intercept) 46 | } 47 | 48 | #[setter] 49 | pub fn set_regularization(&mut self, regularization: f64) -> PyResult<()> { 50 | self.regularization = regularization; 51 | Ok(()) 52 | } 53 | 54 | #[setter] 55 | pub fn set_with_bias(&mut self, with_bias: bool) -> PyResult<()> { 56 | self.with_bias = with_bias; 57 | if with_bias { 58 | self.intercept = Some(0.0); 59 | } else { 60 | self.intercept = None; 61 | } 62 | Ok(()) 63 | } 64 | 65 | #[setter] 66 | pub fn set_weights(&mut self, weights: PyReadonlyArray1<f64>) -> PyResult<()> { 67 | self.weights = python_to_rust_dynamic_vector(&weights); 68 | Ok(()) 69 | } 70 | 71 | #[setter] 72 | pub fn set_intercept(&mut self, intercept: Option<f64>) -> PyResult<()> { 73 | self.intercept = intercept; 74 | Ok(()) 75 | } 76 | 77 | /// Fits the Ridge Regression model to the data.
    pub fn fit(
        &mut self,
        data: PyReadonlyArray2<f64>,
        target: PyReadonlyArray1<f64>,
    ) -> PyResult<()> {
        let x = python_to_rust_dynamic_matrix(&data);
        let y = python_to_rust_dynamic_vector(&target);
        let result = log_function_time(
            || self.fit_helper(&x, &y),
            "RidgeRegression::fit",
            x.nrows(),
            x.ncols(),
        );
        match result {
            Ok(_) => Ok(()),
            Err(e) => Err(PyValueError::new_err(e)),
        }
    }

    /// Predicts target values for the given input data.
    pub fn predict(&self, py: Python, data: PyReadonlyArray2<f64>) -> PyResult<Py<PyArray1<f64>>> {
        let x = python_to_rust_dynamic_matrix(&data);
        let (nrows, ncols) = x.shape();
        let predictions = log_function_time(
            || self.predict_helper(&x),
            "RidgeRegression::predict",
            nrows,
            ncols,
        )
        .unwrap();
        rust_to_python_dynamic_vector(py, predictions)
    }
}

impl RidgeRegression {
    /// Fits the Ridge Regression model to the data.
    pub fn fit_helper(&mut self, x: &DMatrix<f64>, y: &DVector<f64>) {
        let n_samples = x.nrows();
        let n_features = x.ncols();

        assert_eq!(y.len(), n_samples, "Mismatched input dimensions.");

        if self.with_bias {
            // Add a column of ones to X for the intercept term
            let x_with_bias = {
                let mut extended = DMatrix::zeros(n_samples, n_features + 1);
                extended.index_mut((.., ..n_features)).copy_from(x);
                extended.column_mut(n_features).fill(1.0);
                extended
            };

            // Compute the regularization matrix
            let regularization_matrix = {
                let mut reg_matrix = DMatrix::identity(n_features + 1, n_features + 1);
                reg_matrix[(n_features, n_features)] = 0.0; // Don't regularize the intercept
                reg_matrix * self.regularization
            };

            // Solve (X'X + λD)θ = X'y, where D is the identity with the intercept entry zeroed
            let xt = x_with_bias.transpose();
            let xtx = &xt * &x_with_bias;
            let xtx_reg = &xtx + regularization_matrix;
            let xty = xt * y;

            let solution = xtx_reg.lu().solve(&xty).expect("Matrix inversion failed.");

            // Extract weights and intercept
            self.weights = solution.rows(0, n_features).into();
            self.intercept = Some(solution[n_features]);
        } else {
            // Compute the regularization matrix for weights only
            let regularization_matrix =
                DMatrix::identity(n_features, n_features) * self.regularization;

            // Solve for weights: (X'X + λI)^-1 X'y
            let xt = x.transpose();
            let xtx = &xt * x;
            let xtx_reg = &xtx + regularization_matrix;
            let xty = xt * y;

            self.weights = xtx_reg.lu().solve(&xty).expect("Matrix inversion failed.");
            self.intercept = None;
        }
    }

    /// Predicts target values for the given input data.
    pub fn predict_helper(&self, x: &DMatrix<f64>) -> DVector<f64> {
        let predictions = if self.with_bias {
            let intercept = self
                .intercept
                .expect("Bias term is not available but required for prediction.");
            x * &self.weights + DVector::from_element(x.nrows(), intercept)
        } else {
            x * &self.weights
        };
        predictions
    }

    /// Returns the model weights (coefficients).
    pub fn weights_helper(&self) -> &DVector<f64> {
        &self.weights
    }

    /// Returns the model intercept (if available).
    pub fn intercept_helper(&self) -> Option<f64> {
        self.intercept
    }
}

--------------------------------------------------------------------------------
/rustkit/src/testing/mod.rs:
--------------------------------------------------------------------------------
pub mod regression_metrics;

--------------------------------------------------------------------------------
/rustkit/src/testing/regression_metrics.rs:
--------------------------------------------------------------------------------
use crate::benchmarking::log_function_time;
use crate::converters::python_to_rust_dynamic_vector;
use nalgebra::DVector;
use numpy::PyReadonlyArray1;
use pyo3::prelude::*;
use pyo3::types::PyFloat;

/// A class to compute the R² score between two vectors.
#[pyclass]
pub struct R2Score;

#[pymethods]
impl R2Score {
    #[staticmethod]
    pub fn compute(
        py: Python,
        y_true: PyReadonlyArray1<f64>,
        y_pred: PyReadonlyArray1<f64>,
    ) -> PyResult<Py<PyFloat>> {
        let rust_y_true = python_to_rust_dynamic_vector(&y_true);
        let rust_y_pred = python_to_rust_dynamic_vector(&y_pred);
        let result = log_function_time(
            || R2Score::compute_helper(&rust_y_true, &rust_y_pred),
            "R2Score::compute",
            1,
            rust_y_true.len(),
        )
        .unwrap();
        Ok(PyFloat::new(py, result).into())
    }
}

impl R2Score {
    /// Computes the R² score between the true values and predictions.
    ///
    /// # Arguments
    /// - `y_true`: The vector of true values.
    /// - `y_pred`: The vector of predicted values.
    ///
    /// # Returns
    /// - The R² score (coefficient of determination), `1 - RSS / TSS`.
    /// - If `y_true` is constant (TSS = 0): `1.0` for a perfect prediction, `f64::NEG_INFINITY` otherwise.
    ///
    /// # Panics
    /// - If `y_true` and `y_pred` have different lengths.
    pub fn compute_helper(y_true: &DVector<f64>, y_pred: &DVector<f64>) -> f64 {
        assert_eq!(
            y_true.len(),
            y_pred.len(),
            "y_true and y_pred must have the same length."
        );

        let mean_y_true = y_true.mean();
        let total_variance: f64 = y_true.iter().map(|&y| (y - mean_y_true).powi(2)).sum();

        let residual_sum_of_squares: f64 = y_true
            .iter()
            .zip(y_pred.iter())
            .map(|(&true_val, &pred_val)| (true_val - pred_val).powi(2))
            .sum();

        if total_variance == 0.0 {
            if residual_sum_of_squares == 0.0 {
                return 1.0; // Perfect prediction for a constant true vector
            } else {
                return f64::NEG_INFINITY; // R² is undefined for a constant true vector; report -inf for non-zero residuals
            }
        }

        1.0 - residual_sum_of_squares / total_variance
    }
}

/// A class to compute the mean squared error (MSE) between two vectors.
#[pyclass]
pub struct MSE;

#[pymethods]
impl MSE {
    #[staticmethod]
    pub fn compute(
        py: Python,
        y_true: PyReadonlyArray1<f64>,
        y_pred: PyReadonlyArray1<f64>,
    ) -> PyResult<Py<PyFloat>> {
        let rust_y_true = python_to_rust_dynamic_vector(&y_true);
        let rust_y_pred = python_to_rust_dynamic_vector(&y_pred);
        let result = log_function_time(
            || MSE::compute_helper(&rust_y_true, &rust_y_pred),
            "MSE::compute",
            1,
            rust_y_true.len(),
        )
        .unwrap();
        Ok(PyFloat::new(py, result).into())
    }
}

impl MSE {
    /// Computes the MSE between the true values and predictions.
    ///
    /// # Arguments
    /// - `y_true`: The vector of true values.
    /// - `y_pred`: The vector of predicted values.
    ///
    /// # Returns
    /// - The MSE (mean squared error).
    ///
    /// # Panics
    /// - If `y_true` and `y_pred` have different lengths.
    pub fn compute_helper(y_true: &DVector<f64>, y_pred: &DVector<f64>) -> f64 {
        assert_eq!(
            y_true.len(),
            y_pred.len(),
            "y_true and y_pred must have the same length."
        );

        let residual_sum_of_squares: f64 = y_true
            .iter()
            .zip(y_pred.iter())
            .map(|(&true_val, &pred_val)| (true_val - pred_val).powi(2))
            .sum();

        residual_sum_of_squares / (y_pred.len() as f64)
    }
}

--------------------------------------------------------------------------------
/rustkit/src/unsupervised/kmeans.rs:
--------------------------------------------------------------------------------
use crate::benchmarking::log_function_time;
use crate::converters::{
    python_to_rust_dynamic_matrix, python_to_rust_dynamic_vector, rust_to_python_dynamic_matrix,
    rust_to_python_dynamic_vector,
};
use nalgebra::{DMatrix, DVector};
use numpy::{PyArray1, PyArray2, PyReadonlyArray1, PyReadonlyArray2};
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;
use pyo3::types::PyFloat;
use pyo3::Python;
use rand::seq::SliceRandom;
use rand::Rng;
use std::f64;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InitMethod {
    KMeansPlusPlus,
    Random,
}

#[pyclass]
pub struct KMeans {
    k: usize, // Number of clusters
    max_iter: usize, // Maximum number of iterations for a single run
    n_init: usize, // Number of runs of the algorithm (the best run is kept; particularly relevant for random initialization)
    init_method: InitMethod, // Initialization method (KMeans++ or Random)
    centroids: Option<DMatrix<f64>>, // Centroids of the clusters after fitting
    labels: Option<DVector<usize>>, // Cluster labels for each data point
}

// Implementation of Lloyd's KMeans clustering algorithm with KMeans++ or random initialization
// IMPORTANT: Centroids are stored as a (d x k) matrix where d is the dimension of your data.
// In particular, this means that the centroids are stored as the *columns* of the centroids matrix
#[pymethods]
impl KMeans {
    // Creates a new KMeans instance with specified parameters
    #[new]
    pub fn new(
        k: usize,
        init_method_str: &str,
        max_iter: Option<usize>,
        n_init: Option<usize>,
    ) -> Self {
        let init_method = match init_method_str {
            "kmeans++" => InitMethod::KMeansPlusPlus,
            "random" => InitMethod::Random,
            _ => panic!("Invalid initialization method"),
        };
        let max_iter = max_iter.unwrap_or(200);
        let n_init = n_init.unwrap_or_else(|| match init_method {
            InitMethod::KMeansPlusPlus => 1,
            InitMethod::Random => 10,
        });

        KMeans {
            k,
            max_iter,
            n_init,
            init_method,
            centroids: None,
            labels: None,
        }
    }

    pub fn fit(&mut self, data: PyReadonlyArray2<f64>) -> PyResult<()> {
        let data = python_to_rust_dynamic_matrix(&data);
        let fn_name = if self.init_method == InitMethod::KMeansPlusPlus {
            "KMeans::fit"
        } else {
            "KMeans(Random)::fit"
        };
        let result = log_function_time(
            || self.fit_helper(&data),
            fn_name,
            data.nrows(),
            data.ncols(),
        );
        match result {
            Ok(_) => Ok(()),
            Err(e) => Err(PyValueError::new_err(e.to_string())),
        }
    }

    pub fn fit_predict(
        &mut self,
        py: Python,
        data: PyReadonlyArray2<f64>,
    ) -> PyResult<Py<PyArray1<usize>>> {
        let data = python_to_rust_dynamic_matrix(&data);
        let labels = log_function_time(
            || self.fit_predict_helper(&data),
            "KMeans::fit_predict",
            data.nrows(),
            data.ncols(),
        )
        .unwrap();
        rust_to_python_dynamic_vector(py, labels)
    }

    pub fn compute_inertia(
        &self,
        py: Python,
        data: PyReadonlyArray2<f64>,
        labels: PyReadonlyArray1<usize>,
    ) -> PyResult<Py<PyFloat>> {
        let data = python_to_rust_dynamic_matrix(&data);
        let labels = python_to_rust_dynamic_vector(&labels);
        let float = log_function_time(
            || self.compute_inertia_helper(&data, &labels, self.centroids.as_ref().unwrap()),
            "KMeans::compute_inertia",
            data.nrows(),
            data.ncols(),
        )
        .unwrap();
        Ok(PyFloat::new(py, float).into())
    }

    pub fn predict(
        &self,
        py: Python,
        data: PyReadonlyArray2<f64>,
    ) -> PyResult<Py<PyArray1<usize>>> {
        let data = python_to_rust_dynamic_matrix(&data);
        let labels = log_function_time(
            || self.predict_helper(&data),
            "KMeans::predict",
            data.nrows(),
            data.ncols(),
        )
        .unwrap();
        match labels {
            Some(labels) => rust_to_python_dynamic_vector(py, labels),
            None => Err(PyValueError::new_err("Model has not been fitted")),
        }
    }

    #[getter]
    pub fn centroids(&self, py: Python) -> PyResult<Py<PyArray2<f64>>> {
        let centroids_opt = self.get_centroids_helper();
        match centroids_opt {
            Some(centroids) => rust_to_python_dynamic_matrix(py, centroids.clone()),
            None => Err(PyValueError::new_err("Centroids have not been computed")),
        }
    }
}

impl KMeans {
    // Fits the KMeans model to the data
    pub fn fit_helper(&mut self, data: &DMatrix<f64>) {
        let mut best_inertia = f64::MAX;
        let mut best_centroids = None;
        let mut best_labels = None;

        // Run the algorithm n_init times and keep the best result
        for _ in 0..self.n_init {
            let (centroids, labels, inertia) = self.run_single(data);
            if inertia < best_inertia {
                best_inertia = inertia;
                best_centroids = Some(centroids);
                best_labels = Some(labels);
            }
        }

        self.centroids = best_centroids;
        self.labels = best_labels;
    }

    // Combines fit and predict into a single function
    pub fn fit_predict_helper(&mut self, data: &DMatrix<f64>) -> DVector<usize> {
        self.fit_helper(data);
        self.predict_helper(data).unwrap()
    }

    // Performs a single KMeans run: initializes centroids, then alternates label assignment and centroid updates
    fn run_single(&self, data: &DMatrix<f64>) -> (DMatrix<f64>, DVector<usize>, f64) {
        let mut centroids = match self.init_method {
            InitMethod::KMeansPlusPlus => self.kmeans_plus_plus(data),
            InitMethod::Random => self.random_init(data),
        };

        let mut labels = DVector::from_element(data.nrows(), 0);
        let mut inertia = f64::MAX;

        // Iterate to refine centroids and labels
        for _ in 0..self.max_iter {
            labels = self.assign_labels(data, &centroids);
            let new_centroids = self.update_centroids(data, &labels);

            let new_inertia = self.compute_inertia_helper(data, &labels, &new_centroids);
            if (inertia - new_inertia).abs() < 1e-4 {
                break; // Stop if the improvement in inertia is negligible
            }

            centroids = new_centroids;
            inertia = new_inertia;
        }

        (centroids, labels, inertia)
    }

    // Randomly selects k data points as initial centroids
    fn random_init(&self, data: &DMatrix<f64>) -> DMatrix<f64> {
        let mut rng = rand::thread_rng();
        let mut indices: Vec<usize> = (0..data.nrows()).collect();
        indices.shuffle(&mut rng);
        let selected = indices.iter().take(self.k).copied().collect::<Vec<_>>();
        DMatrix::from_columns(
            &selected
                .iter()
                .map(|&i| data.row(i).transpose())
                .collect::<Vec<_>>(),
        )
    }

    // Implements KMeans++ initialization to choose centroids
    fn kmeans_plus_plus(&self, data: &DMatrix<f64>) -> DMatrix<f64> {
        let mut rng = rand::thread_rng();
        let mut centroids = Vec::new();
        centroids.push(data.row(rng.gen_range(0..data.nrows())).transpose());

        for _ in 1..self.k {
            // Squared distance from each point to its nearest already-chosen centroid
            let distances: Vec<f64> = data
                .row_iter()
                .map(|row| {
                    centroids
                        .iter()
                        .map(|centroid| (row - centroid.transpose()).norm_squared())
                        .fold(f64::MAX, f64::min)
                })
                .collect();

            let cumulative_distances: Vec<f64> = distances
                .iter()
                .scan(0.0, |acc, &dist| {
                    *acc += dist;
                    Some(*acc)
                })
                .collect();

            // Note: `gen_range` panics on an empty range, i.e. if every point already coincides with a chosen centroid
            let total_distance = *cumulative_distances.last().unwrap();
            let rand_distance = rng.gen_range(0.0..total_distance);

            let next_idx = cumulative_distances
                .iter()
                .position(|&d| d >= rand_distance)
                .unwrap();

            centroids.push(data.row(next_idx).transpose());
        }

        DMatrix::from_columns(&centroids)
    }

    // Assigns each data point to the nearest centroid
    fn assign_labels(&self, data: &DMatrix<f64>, centroids: &DMatrix<f64>) -> DVector<usize> {
        // Map each data point to the index of its closest centroid
        DVector::from_iterator(
            data.nrows(),
            data.row_iter().map(|data_point| {
                // For the current data point, calculate the distance to each centroid
                let closest_centroid = centroids
                    .column_iter()
                    .enumerate() // Keep track of the centroid index
                    .map(|(centroid_idx, centroid)| {
                        // Compute the squared Euclidean distance to the centroid
                        let distance = (data_point - centroid.transpose()).norm_squared();
                        (centroid_idx, distance)
                    })
                    // Find the centroid with the smallest distance
                    .min_by(|(_, dist_a), (_, dist_b)| dist_a.partial_cmp(dist_b).unwrap())
                    .unwrap(); // Unwrap is safe because centroids is non-empty

                closest_centroid.0 // Return the index of the closest centroid
            }),
        )
    }

    // Updates centroids based on the mean of assigned points
    fn update_centroids(&self, data: &DMatrix<f64>, labels: &DVector<usize>) -> DMatrix<f64> {
        let mut centroids = vec![DVector::zeros(data.ncols()); self.k];
        let mut counts = vec![0; self.k];

        for (i, label) in labels.iter().enumerate() {
            centroids[*label] += data.row(i).transpose();
            counts[*label] += 1;
        }

        // Clusters with no assigned points are left at the zero vector
        for (centroid, &count) in centroids.iter_mut().zip(&counts) {
            if count > 0 {
                *centroid /= count as f64;
            }
        }

        DMatrix::from_columns(&centroids)
    }

    // Computes the inertia (sum of squared distances to nearest centroids)
    pub fn compute_inertia_helper(
        &self,
        data: &DMatrix<f64>,
        labels: &DVector<usize>,
        centroids: &DMatrix<f64>,
    ) -> f64 {
        data.row_iter()
            .enumerate()
            .map(|(i, row)| (row - centroids.column(labels[i]).transpose()).norm_squared())
            .sum()
    }

    // Predicts cluster labels for new data points
    pub fn predict_helper(&self, data: &DMatrix<f64>) -> Option<DVector<usize>> {
        self.centroids
            .as_ref()
            .map(|centroids| self.assign_labels(data, centroids))
    }

    // Returns the fitted centroids (stored as columns), or None if the model has not been fitted
    pub fn get_centroids_helper(&self) -> Option<&DMatrix<f64>> {
        self.centroids.as_ref()
    }
}

--------------------------------------------------------------------------------
/rustkit/src/unsupervised/mod.rs:
--------------------------------------------------------------------------------
pub mod pca;
pub mod kmeans;

--------------------------------------------------------------------------------
/rustkit/src/unsupervised/pca.rs:
--------------------------------------------------------------------------------
use crate::benchmarking::log_function_time;
use crate::converters::{
    python_to_rust_dynamic_matrix, python_to_rust_dynamic_vector, rust_to_python_dynamic_matrix,
    rust_to_python_dynamic_vector,
};
use nalgebra::{linalg::SVD, DMatrix, DVector, RowDVector};
use numpy::{PyArray1, PyArray2, PyReadonlyArray1, PyReadonlyArray2};
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;

#[pyclass]
pub struct PCA {
    components: DMatrix<f64>,
    explained_variance: DVector<f64>,
    mean: RowDVector<f64>,
}

// TODO: use a partial (truncated) SVD for efficiency
#[pymethods]
impl PCA {
    /// Creates a new PCA instance.
    #[new]
    pub fn new() -> Self {
        PCA {
            components: DMatrix::zeros(0, 0),
            explained_variance: DVector::zeros(0),
            mean: RowDVector::zeros(0),
        }
    }

    pub fn fit(&mut self, data: PyReadonlyArray2<f64>, n_components: i64) -> PyResult<()> {
        let x = python_to_rust_dynamic_matrix(&data);
        let result = log_function_time(
            || self.fit_helper(&x, n_components.abs() as usize),
            "PCA::fit",
            x.nrows(),
            x.ncols(),
        );
        match result {
            Ok(_) => Ok(()),
            Err(e) => Err(PyValueError::new_err(e.to_string())),
        }
    }

    pub fn transform(
        &self,
        py: Python,
        data: PyReadonlyArray2<f64>,
    ) -> PyResult<Py<PyArray2<f64>>> {
        let x = python_to_rust_dynamic_matrix(&data);
        let transformed_data = log_function_time(
            || self.transform_helper(&x),
            "PCA::transform",
            x.nrows(),
            x.ncols(),
        )
        .unwrap();
        rust_to_python_dynamic_matrix(py, transformed_data)
    }

    pub fn fit_transform(
        &mut self,
        py: Python,
        data: PyReadonlyArray2<f64>,
        n_components: i64,
    ) -> PyResult<Py<PyArray2<f64>>> {
        let x = python_to_rust_dynamic_matrix(&data);
        let transformed_data = log_function_time(
            || self.fit_transform_helper(&x, n_components.abs() as usize),
            "PCA::fit_transform",
            x.nrows(),
            x.ncols(),
        )
        .unwrap();
        rust_to_python_dynamic_matrix(py, transformed_data)
    }

    pub fn inverse_transform(
        &self,
        py: Python,
        data: PyReadonlyArray2<f64>,
    ) -> PyResult<Py<PyArray2<f64>>> {
        let x = python_to_rust_dynamic_matrix(&data);
        let original_data = log_function_time(
            || self.inverse_transform_helper(&x),
            "PCA::inverse_transform",
            x.nrows(),
            x.ncols(),
        )
        .unwrap();
        rust_to_python_dynamic_matrix(py, original_data)
    }

    #[getter]
    pub fn components(&self, py: Python) -> PyResult<Py<PyArray2<f64>>> {
        rust_to_python_dynamic_matrix(py, self.components.clone())
    }

    #[getter]
    pub fn explained_variance(&self, py: Python) -> PyResult<Py<PyArray1<f64>>> {
        rust_to_python_dynamic_vector(py, self.explained_variance.clone())
    }

    #[getter]
    pub fn mean(&self, py: Python) -> PyResult<Py<PyArray1<f64>>> {
        let mean = self.mean.transpose();
        rust_to_python_dynamic_vector(py, mean)
    }

    #[setter]
    pub fn set_components(&mut self, components: PyReadonlyArray2<f64>) -> PyResult<()> {
        self.components = python_to_rust_dynamic_matrix(&components);
        Ok(())
    }

    #[setter]
    pub fn set_explained_variance(
        &mut self,
        explained_variance: PyReadonlyArray1<f64>,
    ) -> PyResult<()> {
        self.explained_variance = python_to_rust_dynamic_vector(&explained_variance);
        Ok(())
    }

    #[setter]
    pub fn set_mean(&mut self, mean: PyReadonlyArray1<f64>) -> PyResult<()> {
        self.mean = python_to_rust_dynamic_vector(&mean).transpose();
        Ok(())
    }
}

impl PCA {
    /// Fits the PCA model to the data and computes the principal components.
    pub fn fit_helper(&mut self, data: &DMatrix<f64>, n_components: usize) {
        let (n_samples, n_features) = data.shape();
        assert!(
            n_components <= n_features,
            "n_components must be <= number of features"
        );

        // Compute the mean of each feature
        self.mean = data.row_mean();

        // Center the data by subtracting the mean
        let centered_data = data - DMatrix::from_rows(&vec![self.mean.clone(); n_samples]);

        // Compute the sample covariance matrix
        let covariance_matrix =
            &centered_data.transpose() * &centered_data / (n_samples as f64 - 1.0);

        // Perform Singular Value Decomposition of the (symmetric positive semidefinite) covariance matrix
        let svd = SVD::new(covariance_matrix, true, true);

        // Extract the top n_components (principal axes stored as columns)
        self.components = svd.v_t.unwrap().rows(0, n_components).transpose();
        // The singular values of the covariance matrix are its eigenvalues, i.e. the variance
        // along each principal axis, so they must not be squared or rescaled again
        self.explained_variance = svd.singular_values.rows(0, n_components).into_owned();
    }

    /// Transforms the data to the principal component space.
    pub fn transform_helper(&self, data: &DMatrix<f64>) -> DMatrix<f64> {
        let n_samples = data.nrows();
        let centered_data = data - DMatrix::from_rows(&vec![self.mean.clone(); n_samples]);
        &centered_data * &self.components
    }

    /// Fits the PCA model and transforms the data.
    pub fn fit_transform_helper(
        &mut self,
        data: &DMatrix<f64>,
        n_components: usize,
    ) -> DMatrix<f64> {
        self.fit_helper(data, n_components);
        self.transform_helper(data)
    }

    /// Maps data from the principal component space back to the original feature space.
    pub fn inverse_transform_helper(&self, data: &DMatrix<f64>) -> DMatrix<f64> {
        let n_samples = data.nrows();
        let reconstructed_data = data * self.components.transpose();
        reconstructed_data + DMatrix::from_rows(&vec![self.mean.clone(); n_samples])
    }

    /// Returns the principal components.
    pub fn components_helper(&self) -> &DMatrix<f64> {
        &self.components
    }

    /// Returns the amount of variance explained by each of the selected components.
    pub fn explained_variance_helper(&self) -> &DVector<f64> {
        &self.explained_variance
    }
}

--------------------------------------------------------------------------------
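The closed-form fit in `RidgeRegression::fit_helper` can be sanity-checked without `nalgebra`. The following is a minimal, std-only sketch that is not part of `rustkit`. The names `solve` and `ridge_fit` are hypothetical. It assumes dense row-major `Vec<Vec<f64>>` input. The sketch builds $X^\top X + \lambda D$ and $X^\top y$ for the bias-augmented design matrix, where $D$ is the identity with the intercept entry zeroed. It then solves the system with Gaussian elimination and partial pivoting, standing in for `nalgebra`'s `lu().solve()`.

```rust
/// Hypothetical helper (not part of rustkit): solves `a * x = b` for a small dense
/// square system via Gaussian elimination with partial pivoting.
/// Returns `None` if the matrix is (numerically) singular.
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Option<Vec<f64>> {
    let n = b.len();
    for col in 0..n {
        // Choose the row with the largest pivot magnitude for numerical stability.
        let pivot = (col..n)
            .max_by(|&i, &j| a[i][col].abs().partial_cmp(&a[j][col].abs()).unwrap())?;
        if a[pivot][col].abs() < 1e-12 {
            return None;
        }
        a.swap(col, pivot);
        b.swap(col, pivot);
        // Eliminate the entries below the pivot.
        let pivot_row = a[col].clone();
        let b_col = b[col];
        for row in (col + 1)..n {
            let factor = a[row][col] / pivot_row[col];
            for k in col..n {
                a[row][k] -= factor * pivot_row[k];
            }
            b[row] -= factor * b_col;
        }
    }
    // Back substitution on the upper-triangular system.
    let mut x = vec![0.0; n];
    for row in (0..n).rev() {
        let tail: f64 = ((row + 1)..n).map(|k| a[row][k] * x[k]).sum();
        x[row] = (b[row] - tail) / a[row][row];
    }
    Some(x)
}

/// Hypothetical sketch of the ridge fit with an unpenalized intercept:
/// solves (X'X + λD)θ = X'y on [X | 1] and returns (weights, intercept).
fn ridge_fit(x: &[Vec<f64>], y: &[f64], lambda: f64) -> Option<(Vec<f64>, f64)> {
    let n_features = x.first()?.len();
    let p = n_features + 1;
    let mut xtx = vec![vec![0.0; p]; p];
    let mut xty = vec![0.0; p];
    for (row, &target) in x.iter().zip(y) {
        // Augmented row [x_1, ..., x_d, 1]; the trailing 1 multiplies the intercept.
        let aug: Vec<f64> = row.iter().copied().chain(std::iter::once(1.0)).collect();
        for i in 0..p {
            xty[i] += aug[i] * target;
            for j in 0..p {
                xtx[i][j] += aug[i] * aug[j];
            }
        }
    }
    // Penalize the feature weights only (D has a zero in the intercept slot).
    for i in 0..n_features {
        xtx[i][i] += lambda;
    }
    let theta = solve(xtx, xty)?;
    Some((theta[..n_features].to_vec(), theta[n_features]))
}
```

On the points $x = 0, 1, 2, 3$ with $y = 2x + 1$, $\lambda = 0$ recovers weight 2 and intercept 1 exactly. With $\lambda = 1$, the weight shrinks to $5/3$ and the unpenalized intercept rises to $1.5$ to compensate. This compensation is the reason the intercept entry of the regularization matrix is zeroed.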
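`R2Score::compute_helper` and `MSE::compute_helper` reduce to a few sums over the residuals. The std-only sketch below works over plain slices instead of `nalgebra` vectors. The functions `mse` and `r2` are hypothetical names, not part of `rustkit`. The sketch mirrors the logic of those helpers, including the convention for a constant `y_true` (TSS = 0).

```rust
/// Hypothetical slice-based mirror of `MSE::compute_helper`: mean of squared residuals.
fn mse(y_true: &[f64], y_pred: &[f64]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len(), "y_true and y_pred must have the same length.");
    let rss: f64 = y_true.iter().zip(y_pred).map(|(t, p)| (t - p).powi(2)).sum();
    rss / y_true.len() as f64
}

/// Hypothetical slice-based mirror of `R2Score::compute_helper`: 1 - RSS / TSS,
/// returning 1.0 or -inf when y_true is constant (TSS = 0).
fn r2(y_true: &[f64], y_pred: &[f64]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len(), "y_true and y_pred must have the same length.");
    let mean = y_true.iter().sum::<f64>() / y_true.len() as f64;
    // Total sum of squares around the mean of y_true.
    let tss: f64 = y_true.iter().map(|t| (t - mean).powi(2)).sum();
    // Residual sum of squares.
    let rss: f64 = y_true.iter().zip(y_pred).map(|(t, p)| (t - p).powi(2)).sum();
    if tss == 0.0 {
        return if rss == 0.0 { 1.0 } else { f64::NEG_INFINITY };
    }
    1.0 - rss / tss
}
```

For `y_true = [3, -0.5, 2, 7]` and `y_pred = [2.5, 0, 2, 8]`, the residual sum of squares is 1.5 and the total sum of squares is 29.1875. That gives an MSE of 0.375 and $R^2 = 1 - 1.5 / 29.1875 \approx 0.9486$. One caveat concerns the constant-`y_true` edge case. Scikit-learn's `r2_score` with its default `force_finite=True` (available in recent versions) returns 0.0 rather than negative infinity for an imperfect prediction there, so unit tests comparing against it may differ on that case.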
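The loop in `KMeans::kmeans_plus_plus` picks each new centroid with probability proportional to its squared distance to the nearest centroid chosen so far (D² sampling). The std-only sketch below isolates one such draw. The function names `sq_dist` and `next_seed` are hypothetical. The uniform random number is passed in as the hypothetical parameter `u`, so the draw is deterministic and testable.

The sketch also returns `None` when every point already coincides with a chosen centroid. In that situation the total distance is zero, and the source's `rng.gen_range(0.0..total_distance)` receives an empty range, which `rand`'s `gen_range` rejects with a panic.

```rust
/// Hypothetical helper: squared Euclidean distance between two equal-length points.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Hypothetical sketch of one KMeans++ (D² sampling) draw.
/// `centroids` must be non-empty and `u` is a uniform draw in [0, 1).
/// Returns the index of the next seed, or `None` if all distances are zero.
fn next_seed(data: &[Vec<f64>], centroids: &[Vec<f64>], u: f64) -> Option<usize> {
    // Squared distance from each point to its nearest already-chosen centroid.
    let d2: Vec<f64> = data
        .iter()
        .map(|p| {
            centroids
                .iter()
                .map(|c| sq_dist(p, c))
                .fold(f64::MAX, f64::min)
        })
        .collect();
    let total: f64 = d2.iter().sum();
    if total <= 0.0 {
        return None;
    }
    // Walk the cumulative distribution until it exceeds u * total.
    // The strict `>` means zero-weight points (existing centroids) are never chosen.
    let target = u * total;
    let mut acc = 0.0;
    for (i, &d) in d2.iter().enumerate() {
        acc += d;
        if acc > target {
            return Some(i);
        }
    }
    // Rounding fallback: the last point with non-zero weight.
    d2.iter().rposition(|&d| d > 0.0)
}
```

Take points 0, 1 and 10 on a line with a first centroid at 0. The weights are 0, 1 and 100, so the far point is chosen for most draws of `u`. The point already used as a centroid can never be chosen again.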
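`PCA::fit_helper` decomposes the sample covariance matrix itself rather than the centered data. For a symmetric positive semidefinite matrix, singular values and eigenvalues coincide. The singular values it returns are therefore already the variances along the principal axes. The std-only 2-D sketch below checks this numerically. The names `covariance_2d` and `sym_eigenvalues_2x2` are hypothetical and not part of `rustkit`. The sketch computes the covariance of points on the line y = x and takes its eigenvalues in closed form. That top eigenvalue can be compared with the sample variance of the data projected onto the direction (1, 1)/√2.

```rust
/// Hypothetical helper: sample covariance (divides by n - 1) of 2-D points.
fn covariance_2d(points: &[[f64; 2]]) -> [[f64; 2]; 2] {
    let n = points.len() as f64;
    let mx = points.iter().map(|p| p[0]).sum::<f64>() / n;
    let my = points.iter().map(|p| p[1]).sum::<f64>() / n;
    let (mut sxx, mut sxy, mut syy) = (0.0, 0.0, 0.0);
    for p in points {
        let (dx, dy) = (p[0] - mx, p[1] - my);
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    let d = n - 1.0;
    [[sxx / d, sxy / d], [sxy / d, syy / d]]
}

/// Hypothetical helper: eigenvalues (largest first) of a symmetric 2x2 matrix,
/// using lambda = tr/2 +/- sqrt(((a - c)/2)^2 + b^2).
fn sym_eigenvalues_2x2(m: [[f64; 2]; 2]) -> (f64, f64) {
    let half_trace = (m[0][0] + m[1][1]) / 2.0;
    let radius = (((m[0][0] - m[1][1]) / 2.0).powi(2) + m[0][1].powi(2)).sqrt();
    (half_trace + radius, half_trace - radius)
}
```

For the points (0, 0), (1, 1), (2, 2), (3, 3), every covariance entry is 5/3. The eigenvalues are therefore 10/3 and 0, and 10/3 is exactly the sample variance of the projections onto (1, 1)/√2. Squaring the top value and dividing by n - 1 would instead give 100/27, which overstates that variance.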