├── .gitignore ├── CHANGELOG.md ├── LICENSE.md ├── README.md ├── TODO.md ├── doc ├── eulerflow.lyx ├── eulerflow.pdf ├── wlsqm.lyx ├── wlsqm.pdf ├── wlsqm_gen.lyx └── wlsqm_gen.pdf ├── example.png ├── examples ├── expertsolver_example.py ├── lapackdrivers_example.py ├── sudoku_lhs.py └── wlsqm_example.py ├── setup.py └── wlsqm ├── __init__.py ├── fitter ├── __init__.py ├── defs.pxd ├── defs.pyx ├── expert.pyx ├── impl.pxd ├── impl.pyx ├── infra.pxd ├── infra.pyx ├── interp.pxd ├── interp.pyx ├── polyeval.pxd ├── polyeval.pyx ├── popcount.h ├── simple.pxd └── simple.pyx └── utils ├── __init__.py ├── lapackdrivers.pxd ├── lapackdrivers.pyx ├── ptrwrap.pxd └── ptrwrap.pyx /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | *.pyc 3 | *.c 4 | build 5 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | ## Changelog 2 | 3 | ### [v0.1.5] 4 | - support both Python 3.4 and 2.7 5 | 6 | ### [v0.1.4] 7 | - actually use the shorter short description (oops) 8 | 9 | ### [v0.1.3] 10 | - setup.py is now Python 3 compatible (but wlsqm itself is not yet!) 11 | - fixed sdist: package also CHANGELOG.md 12 | 13 | ### [v0.1.2] 14 | - set zip_safe to False to better work with Cython (important for libs that depend on this one) 15 | 16 | ### [v0.1.1] 17 | - change distribution system from distutils to setuptools 18 | 19 | ### [v0.1.0] 20 | - initial version 21 | 22 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Copyright (c) 2016-2017, Juha Jeronen and University of Jyväskylä. 2 | All rights reserved. 
3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | * Redistributions of source code must retain the above copyright 7 | notice, this list of conditions and the following disclaimer. 8 | * Redistributions in binary form must reproduce the above copyright 9 | notice, this list of conditions and the following disclaimer in the 10 | documentation and/or other materials provided with the distribution. 11 | 12 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 13 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 14 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 15 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY 16 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 17 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 18 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 19 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 20 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 21 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 22 | 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # wlsqm 2 | 3 | Weighted least squares meshless interpolator 4 | 5 | ![2D example](example.png) 6 | 7 | 8 | ## Introduction 9 | 10 | WLSQM (Weighted Least SQuares Meshless) is a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 
11 | 12 | Use cases include response surface modeling, and computing space derivatives of data known only as values at discrete points in space (this has applications in explicit algorithms for solving IBVPs). No grid or mesh is needed. No restriction is imposed on geometry other than "not degenerate", e.g. points in 2D should not all fall onto the same 1D line. 13 | 14 | This is an independent implementation of the weighted least squares meshless algorithm described (in the 2nd order 2D case) in section 2.2.1 of Hong Wang (2012), Evolutionary Design Optimization with Nash Games and Hybridized Mesh/Meshless Methods in Computational Fluid Dynamics, Jyväskylä Studies in Computing 162, University of Jyväskylä. [ISBN 978-951-39-5007-1 (PDF)](http://urn.fi/URN:ISBN:978-951-39-5007-1) 15 | 16 | This implementation is targeted for high performance in a single-node environment, such as a laptop. Cython is used to accelerate the low-level routines. The main target is the `x86_64` architecture, but any 64-bit architecture should be fine with the appropriate compiler option changes to [setup.py](setup.py). 17 | 18 | Currently automated unit tests are missing; this is an area that is likely to be improved. Otherwise the code is already rather stable; any major new features are unlikely to be added, and the API is considered stable. 19 | 20 | 21 | ## Features 22 | 23 | - Given scalar data values on a set of points in 1D, 2D or 3D, construct a piecewise polynomial global surrogate model (a.k.a. response surface), using up to 4th order polynomials. 24 | 25 | - Sliced arrays are supported for input, both for the geometry (points) and data (function values). 26 | 27 | - Obtain any derivative of the model, up to the order of the polynomial. Derivatives at each "local model reference point" xi are directly available as DOFs of the solution. Derivatives at any other point can be automatically interpolated from the model. 
Differentiation of polynomials has been hardcoded to obtain high performance. 28 | 29 | - Knowns. At the model reference point xi, the function value and/or any of the derivatives can be specified as knowns. The knowns are internally automatically eliminated (making the equation system smaller) and only the unknowns are fitted. The function value itself may also be unknown, which is useful for implementing Neumann BCs in a PDE (IBVP) solving context. 30 | 31 | - Selectable weighting method for the fitting error, to support different use cases: 32 | - uniform (`wlsqm.fitter.defs.WEIGHT_UNIFORM`), for best overall fit for function values 33 | - emphasize points closer to xi (`wlsqm.fitter.defs.WEIGHT_CENTER`), to improve derivatives at the reference point xi by reducing the influence of points far away from the reference point. 34 | 35 | - Sensitivity data of solution DOFs (on the data values at points other than the reference in the local neighborhood) can be optionally computed. 36 | 37 | - Expert mode with separate prepare and solve stages, for faster fitting of many data sets using the same geometry. Also performs global model patching, using the set of local models fitted. 38 | 39 | **CAVEAT**: `wlsqm.fitter.expert.ExpertSolver` instances are not currently pickleable or copyable. This is a known limitation that may (or may not) change in the future. 40 | 41 | It is nevertheless recommended to use ExpertSolver, since this allows for easy simultaneous solving of many local models (in parallel), automatic global model patching, and reuse of problem matrices when the geometry of the point cloud does not change. 42 | 43 | - Speed: 44 | - Performance-critical parts are implemented in Cython, and the GIL is released during computation. 45 | - LAPACK is used directly via [SciPy's Cython-level bindings](https://docs.scipy.org/doc/scipy/reference/linalg.cython_lapack.html) (see the `ntasks` parameter in various API functions in `wlsqm`). 
This is especially useful when many (1e4 or more) local models are being fitted, as the solver loop does not require holding the GIL. 46 | - OpenMP is used for parallelization over the independent local problems (also in the linear solver step). 47 | - The polynomial evaluation code has been manually optimized to reduce the number of FLOPS required. 48 | 49 | In 1D, the Horner form is used. The 2D and 3D cases use a symmetric form that extends the 1D Horner form into multiple dimensions (see [wlsqm/fitter/polyeval.pyx](wlsqm/fitter/polyeval.pyx) for details). The native FMA (fused multiply-add) instruction of the CPU is used in the evaluation to further reduce FLOPS required, and to improve accuracy (utilizing the fact it rounds only once). 50 | 51 | - Accuracy: 52 | - Problem matrices are preconditioned by a symmetry-preserving scaling algorithm (D. Ruiz 2001; exact reference given in [wlsqm/utils/lapackdrivers.pyx](wlsqm/utils/lapackdrivers.pyx)) to obtain best possible accuracy from the direct linear solver. This is critical especially for high-order fits. 53 | - The fitting procedure optionally includes an internal iterative refinement loop to mitigate the effect of roundoff. 54 | - FMA, as mentioned above. 
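To make the above concrete, here is what a single local fit (2D, 2nd order) looks like in plain NumPy. This is an illustrative sketch of the algorithm only, not the library's optimized Cython code path, and the inverse-distance weight below is a hypothetical stand-in for the actual `WEIGHT_CENTER` formula:

```python
import numpy as np

def local_fit_2d(points, f, i, neighbors, weighting="center"):
    """Sketch of one local fit: estimate (fx, fy, fxx, fxy, fyy) at points[i].

    Solves the weighted normal equations  A a = b  with
    A = C^T W C,  b = C^T W (f_k - f_i),  rows of C = (h, l, h^2/2, h*l, l^2/2).
    """
    d = points[neighbors] - points[i]          # offsets (h_k, l_k) from xi
    h, l = d[:, 0], d[:, 1]
    C = np.column_stack([h, l, h**2 / 2, h * l, l**2 / 2])
    if weighting == "center":                  # emphasize points near xi
        w = 1.0 / np.sum(d**2, axis=1)         # illustrative formula only
    else:                                      # uniform weighting
        w = np.ones(len(neighbors))
    A = C.T @ (w[:, None] * C)
    b = C.T @ (w * (f[neighbors] - f[i]))
    return np.linalg.solve(A, b)

# For a quadratic test function, a 2nd-order fit is exact up to roundoff:
rng = np.random.default_rng(42)
pts = rng.uniform(-1.0, 1.0, size=(30, 2))
f = pts[:, 0]**2 + 2.0*pts[:, 0]*pts[:, 1] + 3.0*pts[:, 1]**2
a = local_fit_2d(pts, f, 0, np.arange(1, 30))
print(np.allclose(a[2:], [2.0, 2.0, 6.0]))     # fxx, fxy, fyy -> True
```

Because the test function lies in the model space, the fitted DOFs reproduce its derivatives exactly regardless of the weighting; the weighting only matters once the data contains higher-order content or noise.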
55 | 56 | 57 | ## Installation 58 | 59 | ### From PyPI (wlsqm v0.1.1+) 60 | 61 | Install as user: 62 | 63 | ```bash 64 | pip install wlsqm --user 65 | ``` 66 | 67 | Install as admin: 68 | 69 | ```bash 70 | sudo pip install wlsqm 71 | ``` 72 | 73 | ### From GitHub 74 | 75 | As user: 76 | 77 | ```bash 78 | git clone https://github.com/Technologicat/python-wlsqm.git 79 | cd python-wlsqm 80 | python setup.py install --user 81 | ``` 82 | 83 | As admin, change the last command to 84 | 85 | ```bash 86 | sudo python setup.py install 87 | ``` 88 | 89 | 90 | ## Documentation 91 | 92 | For usage examples, see [examples/wlsqm_example.py](examples/wlsqm_example.py) for a tour, and [examples/expertsolver_example.py](examples/expertsolver_example.py) for a minimal example concentrating specifically on `ExpertSolver`. 93 | 94 | For the technical details, see the docstrings and comments in the code itself. 95 | 96 | Mathematics documented at: 97 | 98 | - [https://yousource.it.jyu.fi/jjrandom2/freya/trees/master/docs](https://yousource.it.jyu.fi/jjrandom2/freya/trees/master/docs) [dead link, relevant files mirrored below] 99 | 100 | where the relevant files are [mirrored locally on GitHub]: 101 | 102 | - [wlsqm.pdf](doc/wlsqm.pdf) (old documentation for the old pure-Python version of WLSQM included in FREYA, plus the sensitivity calculation) 103 | - [eulerflow.pdf](doc/eulerflow.pdf) (clearer presentation of the original version, but without the sensitivity calculation) 104 | - [wlsqm_gen.pdf](doc/wlsqm_gen.pdf) (theory diff on how to make a version that handles also missing function values; also why WLSQM works and some analysis of its accuracy) 105 | 106 | The documentation is slightly out of date; see [TODO](TODO.md) for details on what needs updating and how. 107 | 108 | 109 | ## Experiencing crashes? 
110 | 111 | Check that you are loading the same BLAS your LAPACK and SciPy link against: 112 | 113 | ```bash 114 | shopt -s globstar 115 | ldd /usr/lib/**/*lapack*.so | grep blas 116 | ldd $(dirname $(python -c "import scipy; print(scipy.__file__)"))/linalg/cython_lapack.so | grep blas 117 | ``` 118 | 119 | In Debian-based Linux, you can change the active BLAS implementation by: 120 | 121 | ```bash 122 | sudo update-alternatives --config libblas.so 123 | sudo update-alternatives --config libblas.so.3 124 | ``` 125 | 126 | This may (or may not) be different from what NumPy links against: 127 | 128 | ```bash 129 | ldd $(dirname $(python -c "import numpy; print(numpy.__file__)"))/core/multiarray.so | grep blas 130 | ``` 131 | 132 | WLSQM itself does not link against LAPACK or BLAS; it utilizes the `cython_lapack` module of SciPy. 133 | 134 | 135 | ## Dependencies 136 | 137 | - [NumPy](http://www.numpy.org) 138 | - [SciPy](http://www.scipy.org) 139 | - [Cython](http://www.cython.org) 140 | - [Matplotlib](http://matplotlib.org/) (for usage examples) 141 | 142 | 143 | ## License 144 | 145 | [BSD](LICENSE.md). Copyright 2016-2017 Juha Jeronen and University of Jyväskylä. 146 | 147 | 148 | #### Acknowledgement 149 | 150 | This work was financially supported by the Jenny and Antti Wihuri Foundation. 151 | -------------------------------------------------------------------------------- /TODO.md: -------------------------------------------------------------------------------- 1 | High priority 2 | ============= 3 | 4 | - create unit tests 5 | 6 | ------------------------------------------------------------------------------- 7 | 8 | General 9 | ======= 10 | 11 | - figure out a way to automatically add function signatures to docstrings for functions defined in Cython modules 12 | - in the current docstrings, the "def" has been simply manually copy'n'pasted into the docstring, 13 | causing unnecessary duplication and introducing a potential source of errors in documentation. 
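One possible approach for the item above (an assumption, not something the project currently does): Cython's `embedsignature` compiler directive makes Cython prepend each function's signature to its `__doc__` automatically, which would remove the manual copy'n'paste entirely. A hypothetical `setup.py` fragment, to be merged with the project's real `setup.py`:

```python
# Sketch: enable embedded signatures for all Cython modules.
# (Hypothetical fragment -- the real setup.py also sets compiler options.)
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="wlsqm",
    ext_modules=cythonize(
        ["wlsqm/**/*.pyx"],
        # Cython prepends "funcname(arg1, arg2, ...)" to each __doc__,
        # so signatures no longer need to be pasted in by hand.
        compiler_directives={"embedsignature": True},
    ),
)
```

The same directive can instead be enabled per file with a `# cython: embedsignature=True` comment at the top of each `.pyx` module.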
14 | 15 | - move examples/sudoku_lhs into a separate proper library. 16 | 17 | Documentation 18 | ============= 19 | 20 | - Update the documentation: 21 | 1. Emphasize surrogate models / response surface modeling (the Taylor series based intuition is misleading, as it severely overestimates the error). 22 | 2. Introduce the weighting factors `w[k]`. Change the definition of the total squared error G to be the weighted total squared error, where each neighbor point x[k] has its own weight w[k]. The end result is basically just to weight, in each sum over k, each term by `w[k]`. See `wlsqm/fitter/impl.pyx`. 23 | 3. Add a comment about matrix scaling, which drastically improves the condition number. Include a short comment on how to use the row and column scaling arrays. Cite the algorithm papers (see `wlsqm/utils/lapackdrivers.pyx`). 24 | 4. Add a comment about iterative refinement to reduce effects of roundoff (this technique is rather standard in least-squares fitting). The use of FMA in `wlsqm/fitter/polyeval.pyx`, used internally to compute `error = (data - model)`, may also mitigate roundoff, since it computes `op1*op2 + op3`, rounding only the end result. 25 | 5. Combine the pieces into a single document (see README.md for a listing of the pieces). 
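For reference, the iterative refinement of item 4 is the standard residual-correction loop: solve `A x = b`, form `r = b - A x`, solve `A d = r`, and update `x <- x + d`. A minimal NumPy sketch (not wlsqm's actual implementation, which works at the Cython level and reuses the factorization):

```python
import numpy as np

def solve_refined(A, b, steps=2):
    """Direct solve plus a few iterative-refinement steps.

    Each step solves A d = r for the current residual r = b - A x and
    applies the correction, mitigating roundoff in the initial solve.
    (A real implementation reuses the LU factors instead of re-solving.)
    """
    x = np.linalg.solve(A, b)
    for _ in range(steps):
        r = b - A @ x                  # residual of the current solution
        x = x + np.linalg.solve(A, r)  # correction step
    return x

# Ill-conditioned test problem (Hilbert matrix, n = 8):
n = 8
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
x_true = np.ones(n)
x = solve_refined(A, A @ x_true)
print(np.linalg.norm(A @ x - A @ x_true))  # residual is tiny after refinement
```

In working precision this mainly drives down the residual; computing the residual in higher precision (as the classical algorithm does) also improves the forward error.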
26 | 27 | utils.lapackdrivers 28 | =================== 29 | 30 | - add option to return also the orthogonal matrices U and V in `svd()` (currently this routine is only useful to compute the 2-norm condition number) 31 | 32 | fitter 33 | ====== 34 | 35 | - fix TODOs in `setup.py` 36 | 37 | - API professionalism: 38 | - make `wlsqm.fitter.expert.ExpertSolver` instances copyable 39 | - needs a copy() method that deep-copies also the C-level stuff (re-running the memory allocation fandango) 40 | - `wlsqm.fitter.infra` needs a `Case_copy()` method, because `wlsqm.fitter.infra.Case` contains pointers 41 | - make `wlsqm.fitter.expert.ExpertSolver` instances pickleable (need to save/load the C-level stuff) 42 | - use `DTYPE` and `DTYPE_t` aliases instead of `double`/`np.float64` directly, to allow compiling a version with complex number support 43 | 44 | - test the 3D support more thoroughly 45 | - `wlsqm/fitter/polyeval.pyx`: make really, really sure `taylor_3D()`, `general_3D()` are bug-free 46 | - `wlsqm/fitter/interp.pyx`: make really, really sure `interpolate_3D()` is bug-free 47 | - write a unit test: generate random `sympy` functions (from a preset seed to make the test repeatable), differentiate them symbolically, fit models of orders 0, 1, 2, 3, 4 and compare all up to 34 derivatives with the exact result (the worst case should be within approx. `100*machine_epsilon` at least for the function value itself). 
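The unit test described above could start from something like the following sketch (2D, order 2). A plain `numpy.linalg.lstsq` fit stands in for the wlsqm call, since this is only an outline of the proposed test, not existing code:

```python
import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
rng = np.random.default_rng(1234)      # preset seed -> repeatable test

# Random quadratic "unknown function" and its exact symbolic derivatives.
c = rng.standard_normal(6)
f_expr = c[0] + c[1]*x + c[2]*y + c[3]*x**2 + c[4]*x*y + c[5]*y**2
derivs = [sp.diff(f_expr, *v) for v in [(x,), (y,), (x, x), (x, y), (y, y)]]

# Sample the function on a random point cloud; reference point = origin.
f_num = sp.lambdify((x, y), f_expr, 'numpy')
pts = rng.uniform(-1.0, 1.0, size=(30, 2))
h, l = pts[:, 0], pts[:, 1]

# 2nd-order model with unknown function value (stand-in for the wlsqm fit).
C = np.column_stack([np.ones(30), h, l, h**2/2, h*l, l**2/2])
a, *_ = np.linalg.lstsq(C, f_num(h, l), rcond=None)

# Compare fitted DOFs against the exact symbolic values at the origin.
exact = [float(e.subs({x: 0, y: 0})) for e in [f_expr] + derivs]
assert np.allclose(a, exact)           # exact recovery for a quadratic
```

The real test would loop over orders 0-4 and dimensions 1-3, call the actual wlsqm fit, and use non-polynomial `sympy` functions with tolerances scaled by the expected truncation error.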
48 | 49 | - profile performance, see [http://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line](http://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line) 50 | 51 | - fix various small TODOs and FIXMEs in the code (low priority) 52 | 53 | - maybe: ExpertSolver: fix the silly slicing requirement in model interpolation: make it possible to interpolate the model to a single point without a memoryview 54 | - but profile the performance first to check whether this actually causes a problem 55 | - multiple points require the memoryview, because in the general case the input is non-contiguous (a sliced array) 56 | 57 | - maybe: reduce code duplication between driver and expert mode 58 | - split `generic_fit_basic_many()` (and its friends) into prepare and solve stages, implement the driver in terms of calling these stages 59 | - re-use the same stages in ExpertSolver 60 | 61 | -------------------------------------------------------------------------------- /doc/eulerflow.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/doc/eulerflow.pdf -------------------------------------------------------------------------------- /doc/wlsqm.lyx: -------------------------------------------------------------------------------- 1 | #LyX 1.6.7 created this file. 
For more info see http://www.lyx.org/ 2 | \lyxformat 345 3 | \begin_document 4 | \begin_header 5 | \textclass article 6 | \use_default_options true 7 | \language english 8 | \inputencoding auto 9 | \font_roman palatino 10 | \font_sans default 11 | \font_typewriter default 12 | \font_default_family default 13 | \font_sc false 14 | \font_osf false 15 | \font_sf_scale 100 16 | \font_tt_scale 100 17 | 18 | \graphics default 19 | \paperfontsize default 20 | \spacing single 21 | \use_hyperref true 22 | \pdf_bookmarks true 23 | \pdf_bookmarksnumbered false 24 | \pdf_bookmarksopen false 25 | \pdf_bookmarksopenlevel 1 26 | \pdf_breaklinks false 27 | \pdf_pdfborder true 28 | \pdf_colorlinks false 29 | \pdf_backref false 30 | \pdf_pdfusetitle true 31 | \papersize default 32 | \use_geometry true 33 | \use_amsmath 1 34 | \use_esint 1 35 | \cite_engine natbib_authoryear 36 | \use_bibtopic false 37 | \paperorientation portrait 38 | \leftmargin 1in 39 | \topmargin 1in 40 | \rightmargin 1in 41 | \bottommargin 1in 42 | \secnumdepth 3 43 | \tocdepth 3 44 | \paragraph_separation skip 45 | \defskip medskip 46 | \quotes_language english 47 | \papercolumns 1 48 | \papersides 1 49 | \paperpagestyle default 50 | \tracking_changes false 51 | \output_changes false 52 | \author "" 53 | \author "" 54 | \end_header 55 | 56 | \begin_body 57 | 58 | \begin_layout Title 59 | Notes based on H. 60 | Wang's meshless method presentation for the FSI team 61 | \end_layout 62 | 63 | \begin_layout Author 64 | Juha Jeronen 65 | \end_layout 66 | 67 | \begin_layout Abstract 68 | This technical note explains how to approximate derivatives of a known function, 69 | defined as a set of values on a point cloud, where each point may have 70 | arbitrary Cartesian coordinates. 71 | This is a meshless method based on Taylor series expansion in a local set 72 | of nearest neighbors. 73 | It can be used for, e.g., integration of initial boundary value problems 74 | using explicit methods (e.g. 75 | RK4). 
76 | \end_layout 77 | 78 | \begin_layout Abstract 79 | Also, a simple 80 | \begin_inset Formula $O(d\, N\,\log N)$ 81 | \end_inset 82 | 83 | time algorithm for finding the nearest neighbors in 84 | \begin_inset Formula $d$ 85 | \end_inset 86 | 87 | dimensions is presented for the sake of completeness. 88 | \end_layout 89 | 90 | \begin_layout Subsubsection* 91 | Derivative approximation --- the weighted least squares meshless method 92 | \end_layout 93 | 94 | \begin_layout Standard 95 | We will present the 96 | \emph on 97 | weighted least squares 98 | \emph default 99 | 100 | \emph on 101 | meshless method 102 | \emph default 103 | (WLSQ). 104 | It belongs to the class of finite point methods (collocation methods), 105 | so in spirit it is similar to finite differences. 106 | Because the method only differentiates known quantities, it is best suited 107 | for time evolution problems (initial boundary value problems; IBVP), which 108 | are solved with explicit time integration methods such as RK4. 109 | Dirichlet boundary conditions are very easy to enforce; Neumann and Robin 110 | are much harder. 111 | \end_layout 112 | 113 | \begin_layout Standard 114 | To start with, consider a point cloud of 115 | \begin_inset Formula $N$ 116 | \end_inset 117 | 118 | points in 119 | \begin_inset Formula $\mathbb{R}^{d}$ 120 | \end_inset 121 | 122 | . 123 | Let 124 | \begin_inset Formula $i$ 125 | \end_inset 126 | 127 | denote the index of the current node under consideration, and 128 | \begin_inset Formula $k$ 129 | \end_inset 130 | 131 | the index of one of its nearest neighbors. 132 | (For finding the 133 | \begin_inset Formula $m$ 134 | \end_inset 135 | 136 | nearest neighbors of a point in a point cloud, refer to the final section 137 | of this document.) 138 | \end_layout 139 | 140 | \begin_layout Standard 141 | Let 142 | \begin_inset Formula $f=f(x_{k}),\; k=1,\dots,N$ 143 | \end_inset 144 | 145 | be a function defined on the point cloud. 
146 | Here we will only consider the two-dimensional case ( 147 | \begin_inset Formula $d=2$ 148 | \end_inset 149 | 150 | ) for simplicity. 151 | Let us shorten the notation by defining 152 | \begin_inset Formula $f_{k}:=f(x_{k})$ 153 | \end_inset 154 | 155 | . 156 | \end_layout 157 | 158 | \begin_layout Standard 159 | We would like to be able to approximate the derivatives of 160 | \begin_inset Formula $f$ 161 | \end_inset 162 | 163 | at the point 164 | \begin_inset Formula $x_{i}$ 165 | \end_inset 166 | 167 | , using only the point cloud data. 168 | This has applications in e.g. 169 | explicit time integration of PDEs with given initial data. 170 | \end_layout 171 | 172 | \begin_layout Standard 173 | Below, we will only consider the problem for one node 174 | \begin_inset Formula $x_{i}$ 175 | \end_inset 176 | 177 | . 178 | Trivially, the same procedure can be repeated for each node. 179 | \end_layout 180 | 181 | \begin_layout Standard 182 | Using multivariate Taylor expansion up to the second order, we can write 183 | 184 | \begin_inset Formula $f_{k}$ 185 | \end_inset 186 | 187 | (value of 188 | \begin_inset Formula $f$ 189 | \end_inset 190 | 191 | at one of the nearest neighbors) in terms of 192 | \begin_inset Formula $f_{i}$ 193 | \end_inset 194 | 195 | as 196 | \begin_inset Formula \begin{equation} 197 | f_{k}=f_{i}+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}+O(h_{k}^{3},\ell_{k}^{3})\;,\label{eq:Tay}\end{equation} 198 | 199 | \end_inset 200 | 201 | where 202 | \begin_inset Formula $h_{k}=(x_{k})_{1}-(x_{i})_{1}$ 203 | \end_inset 204 | 205 | (i.e. 
206 | the 207 | \begin_inset Formula $x$ 208 | \end_inset 209 | 210 | component of the vector from 211 | \begin_inset Formula $x_{i}$ 212 | \end_inset 213 | 214 | to 215 | \begin_inset Formula $x_{k}$ 216 | \end_inset 217 | 218 | ) and 219 | \begin_inset Formula $\ell_{k}=(x_{k})_{2}-(x_{i})_{2}$ 220 | \end_inset 221 | 222 | (respectively, the 223 | \begin_inset Formula $y$ 224 | \end_inset 225 | 226 | component). 227 | \end_layout 228 | 229 | \begin_layout Standard 230 | Note that generally, we must expand up to as many orders as is the highest 231 | derivative we wish to approximate. 232 | We will assume here for simplicity that we are building the approximation 233 | for a second-order problem. 234 | \end_layout 235 | 236 | \begin_layout Standard 237 | If we drop the asymptotic term, we get the approximation 238 | \begin_inset Formula \begin{equation} 239 | \overline{f}_{k}=f_{i}+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}\;.\label{eq:approx}\end{equation} 240 | 241 | \end_inset 242 | 243 | By the Taylor expansion, we would expect to have 244 | \begin_inset Formula \begin{align} 245 | a_{1} & =\frac{\partial f_{k}}{\partial x}\vert_{x=x_{i}}\nonumber \\ 246 | a_{2} & =\frac{\partial f_{k}}{\partial y}\vert_{x=x_{i}}\nonumber \\ 247 | a_{3} & =\frac{\partial^{2}f_{k}}{\partial x^{2}}\vert_{x=x_{i}}\nonumber \\ 248 | a_{4} & =\frac{\partial^{2}f_{k}}{\partial x\partial y}\vert_{x=x_{i}}\nonumber \\ 249 | a_{5} & =\frac{\partial^{2}f_{k}}{\partial y^{2}}\vert_{x=x_{i}}\;,\label{eq:aj}\end{align} 250 | 251 | \end_inset 252 | 253 | if 254 | \begin_inset Formula $f$ 255 | \end_inset 256 | 257 | was defined on all of 258 | \begin_inset Formula $\mathbb{R}^{2}$ 259 | \end_inset 260 | 261 | . 262 | Our problem is thus to find a good approximation for the values of the 263 | 264 | \begin_inset Formula $a_{j}$ 265 | \end_inset 266 | 267 | . 
268 | \end_layout 269 | 270 | \begin_layout Standard 271 | Let us denote 272 | \begin_inset Formula \begin{align} 273 | c_{k}^{(1)} & :=h_{k}\nonumber \\ 274 | c_{k}^{(2)} & :=\ell_{k}\nonumber \\ 275 | c_{k}^{(3)} & :=\frac{h_{k}^{2}}{2}\nonumber \\ 276 | c_{k}^{(4)} & :=h_{k}\ell_{k}\nonumber \\ 277 | c_{k}^{(5)} & :=\frac{\ell_{k}^{2}}{2}\;.\label{eq:ck}\end{align} 278 | 279 | \end_inset 280 | 281 | We would like to minimize the approximation error. 282 | Let us denote the error as 283 | \begin_inset Formula \begin{equation} 284 | e_{k}:=f_{k}-\overline{f}_{k}\;.\label{eq:ek}\end{equation} 285 | 286 | \end_inset 287 | 288 | We proceed by making a least squares approximation. 289 | Let 290 | \begin_inset Formula \begin{equation} 291 | G:=\frac{1}{2}\underset{k}{\sum}e_{k}^{2}\label{eq:G}\end{equation} 292 | 293 | \end_inset 294 | 295 | where the sum is taken over the nearest-neighbor set of 296 | \begin_inset Formula $x_{i}$ 297 | \end_inset 298 | 299 | . 300 | The least-squares approximation is given by the minimum 301 | \begin_inset Formula \[ 302 | \underset{a_{j}}{\min}\, G\;,\] 303 | 304 | \end_inset 305 | 306 | i.e. 307 | such values for the 308 | \begin_inset Formula $a_{j}$ 309 | \end_inset 310 | 311 | that they minimize the squared error 312 | \begin_inset Formula $G$ 313 | \end_inset 314 | 315 | . 316 | \end_layout 317 | 318 | \begin_layout Standard 319 | The minimum of the function 320 | \begin_inset Formula $G=G(a_{1},\dots,a_{5})$ 321 | \end_inset 322 | 323 | is necessarily at an extremum point. 
324 | Thus, we set all its partial derivatives to zero (w.r.t the 325 | \begin_inset Formula $a_{j}$ 326 | \end_inset 327 | 328 | ): 329 | \begin_inset Formula \begin{equation} 330 | \frac{\partial G}{\partial a_{j}}=0\quad\forall\; j=1,\dots,5\;.\label{eq:minG}\end{equation} 331 | 332 | \end_inset 333 | 334 | Because 335 | \begin_inset Formula $G\ge0$ 336 | \end_inset 337 | 338 | for any values of the 339 | \begin_inset Formula $a_{j}$ 340 | \end_inset 341 | 342 | and it is a quadratic function, this point is also necessarily the minimum. 343 | Thus, solving equation 344 | \begin_inset CommandInset ref 345 | LatexCommand eqref 346 | reference "eq:minG" 347 | 348 | \end_inset 349 | 350 | gives us the optimal 351 | \begin_inset Formula $a_{j}$ 352 | \end_inset 353 | 354 | . 355 | \end_layout 356 | 357 | \begin_layout Standard 358 | One important thing to notice here is that we of course do not have the 359 | value of the asymptotic term 360 | \family roman 361 | \series medium 362 | \shape up 363 | \size normal 364 | \emph off 365 | \bar no 366 | \noun off 367 | \color none 368 | 369 | \begin_inset Formula $O(h_{k}^{3},\ell_{k}^{3})$ 370 | \end_inset 371 | 372 | in 373 | \begin_inset CommandInset ref 374 | LatexCommand eqref 375 | reference "eq:Tay" 376 | 377 | \end_inset 378 | 379 | . 380 | However, we do not need equation 381 | \begin_inset CommandInset ref 382 | LatexCommand eqref 383 | reference "eq:Tay" 384 | 385 | \end_inset 386 | 387 | for computing the error 388 | \begin_inset CommandInset ref 389 | LatexCommand eqref 390 | reference "eq:ek" 391 | 392 | \end_inset 393 | 394 | . 395 | This is because we already have the value of 396 | \begin_inset Formula $f_{k}$ 397 | \end_inset 398 | 399 | directly, since it is one of the points in the data! 
Thus, for any set 400 | of values for the 401 | \begin_inset Formula $a_{j}$ 402 | \end_inset 403 | 404 | , the error 405 | \begin_inset CommandInset ref 406 | LatexCommand eqref 407 | reference "eq:ek" 408 | 409 | \end_inset 410 | 411 | can be computed (by replacing 412 | \begin_inset Formula $f_{k}$ 413 | \end_inset 414 | 415 | with the data point in question and computing 416 | \begin_inset Formula $\overline{f}_{k}$ 417 | \end_inset 418 | 419 | from 420 | \begin_inset CommandInset ref 421 | LatexCommand eqref 422 | reference "eq:approx" 423 | 424 | \end_inset 425 | 426 | ). 427 | \end_layout 428 | 429 | \begin_layout Standard 430 | Let us write out 431 | \begin_inset CommandInset ref 432 | LatexCommand eqref 433 | reference "eq:minG" 434 | 435 | \end_inset 436 | 437 | . 438 | We have 439 | \begin_inset Formula \begin{align} 440 | \frac{\partial G}{\partial a_{j}} & =\underset{k}{\sum}e_{k}\frac{\partial e_{k}}{\partial a_{j}}\nonumber \\ 441 | & =\underset{k}{\sum}[f_{k}-\overline{f}_{k}(a_{1},\dots,a_{5})]\left[-\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]=0\quad\forall\; j=1,\dots5\;,\label{eq:minG2}\end{align} 442 | 443 | \end_inset 444 | 445 | where we have replaced 446 | \begin_inset Formula $e_{k}$ 447 | \end_inset 448 | 449 | by the difference of data 450 | \begin_inset Formula $f_{k}$ 451 | \end_inset 452 | 453 | and the interpolate 454 | \begin_inset Formula $\overline{f}_{k}$ 455 | \end_inset 456 | 457 | , as noted above. 458 | \end_layout 459 | 460 | \begin_layout Standard 461 | Now the rest is essentially technique. 
462 | Expanding the first 463 | \begin_inset Formula $\overline{f}_{k}$ 464 | \end_inset 465 | 466 | in 467 | \begin_inset CommandInset ref 468 | LatexCommand eqref 469 | reference "eq:minG2" 470 | 471 | \end_inset 472 | 473 | and taking the minus sign in front, we have 474 | \begin_inset Formula \[ 475 | -\underset{k}{\sum}\left(\left[f_{k}-f_{i}-c_{k}^{(1)}a_{1}-c_{k}^{(2)}a_{2}-c_{k}^{(3)}a_{3}-c_{k}^{(4)}a_{4}-c_{k}^{(5)}a_{5}\right]\left[\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]\right)=0\quad\forall j\;.\] 476 | 477 | \end_inset 478 | 479 | This can be rewritten as a standard linear equation system 480 | \begin_inset Formula \begin{equation} 481 | A\mathbf{a}=\mathbf{b}\;,\label{eq:lineq}\end{equation} 482 | 483 | \end_inset 484 | 485 | where 486 | \begin_inset Formula \[ 487 | \mathbf{a}=(a_{1},\dots,a_{5})^{T}\] 488 | 489 | \end_inset 490 | 491 | are the unknowns, and the 492 | \begin_inset Formula $j$ 493 | \end_inset 494 | 495 | th component of the load vector 496 | \begin_inset Formula $\mathbf{b}$ 497 | \end_inset 498 | 499 | is 500 | \begin_inset Formula \begin{equation} 501 | b_{j}=\underset{k}{\sum}[f_{k}-f_{i}]\left[\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]=\underset{k}{\sum}[f_{k}-f_{i}]c_{k}^{(j)}\;,\label{eq:bj}\end{equation} 502 | 503 | \end_inset 504 | 505 | where in the last form we have used 506 | \begin_inset CommandInset ref 507 | LatexCommand eqref 508 | reference "eq:approx" 509 | 510 | \end_inset 511 | 512 | and the definition 513 | \begin_inset CommandInset ref 514 | LatexCommand eqref 515 | reference "eq:ck" 516 | 517 | \end_inset 518 | 519 | . 520 | The sum, like above, is taken over the set of nearest neighbors. 521 | Especially note that, as required, all the quantities on the right-hand 522 | side of 523 | \begin_inset CommandInset ref 524 | LatexCommand eqref 525 | reference "eq:bj" 526 | 527 | \end_inset 528 | 529 | are known. 
530 | \end_layout 531 | 532 | \begin_layout Standard 533 | The element 534 | \begin_inset Formula $A_{jn}$ 535 | \end_inset 536 | 537 | of the coefficient matrix 538 | \begin_inset Formula $A$ 539 | \end_inset 540 | 541 | is 542 | \begin_inset Formula \begin{equation} 543 | A_{jn}=\underset{k}{\sum}c_{k}^{(n)}c_{k}^{(j)}\;.\label{eq:Ajn}\end{equation} 544 | 545 | \end_inset 546 | 547 | This sum, too, is taken over the set of nearest neighbors. 548 | The matrix 549 | \begin_inset Formula $A$ 550 | \end_inset 551 | 552 | is symmetric, 553 | \begin_inset Formula $A=A^{T}$ 554 | \end_inset 555 | 556 | . 557 | \end_layout 558 | 559 | \begin_layout Standard 560 | Solving 561 | \begin_inset CommandInset ref 562 | LatexCommand eqref 563 | reference "eq:lineq" 564 | 565 | \end_inset 566 | 567 | , by e.g. 568 | pivoted Gaussian elimination (routine DGESV in LAPACK, operator 569 | \backslash 570 | in MATLAB, scipy.linalg.solve() in Python, ...), produces the derivative approximati 571 | ons 572 | \begin_inset Formula $a_{j}$ 573 | \end_inset 574 | 575 | , up to the second order. 576 | \end_layout 577 | 578 | \begin_layout Standard 579 | Note that both 580 | \begin_inset Formula $A$ 581 | \end_inset 582 | 583 | and 584 | \begin_inset Formula $\mathbf{b}$ 585 | \end_inset 586 | 587 | depend on the node index 588 | \begin_inset Formula $i$ 589 | \end_inset 590 | 591 | ! That is, each node comes with its own 592 | \begin_inset Formula $A$ 593 | \end_inset 594 | 595 | and 596 | \begin_inset Formula $\mathbf{b}$ 597 | \end_inset 598 | 599 | , and thus 600 | \begin_inset CommandInset ref 601 | LatexCommand eqref 602 | reference "eq:bj" 603 | 604 | \end_inset 605 | 606 | and 607 | \begin_inset CommandInset ref 608 | LatexCommand eqref 609 | reference "eq:Ajn" 610 | 611 | \end_inset 612 | 613 | must be re-evaluated for each node where we wish to obtain the derivative 614 | approximation. 
615 | \end_layout 616 | 617 | \begin_layout Subsubsection* 618 | Sensitivity of the solution 619 | \end_layout 620 | 621 | \begin_layout Standard 622 | It is also possible to obtain the sensitivity of the solution 623 | \begin_inset Formula $\mathbf{a}$ 624 | \end_inset 625 | 630 | in terms of small changes in the values of the data points 631 | \begin_inset Formula $f_{k}$ 632 | \end_inset 633 | 634 | . 635 | Consider, formally, manipulating 636 | \begin_inset CommandInset ref 637 | LatexCommand eqref 638 | reference "eq:lineq" 639 | 640 | \end_inset 641 | 642 | into 643 | \begin_inset Formula \[ 644 | \mathbf{a}(f_{k})=A^{-1}\cdot\mathbf{b}(f_{k})\;.\] 645 | 646 | \end_inset 647 | 648 | Differentiating both sides, and writing the equation in component form, 649 | gives (the matrix 650 | \begin_inset Formula $A$ 651 | \end_inset 652 | 653 | is constant w.r.t. 654 | 655 | \begin_inset Formula $f_{k}$ 656 | \end_inset 657 | 658 | ) 659 | \begin_inset Formula \begin{align*} 660 | \frac{\partial a_{j}}{\partial f_{k}} & =\underset{n}{\sum}(A^{-1})_{jn}\frac{\partial b_{n}}{\partial f_{k}}\\ 661 | & =\underset{n}{\sum}(A^{-1})_{jn}c_{k}^{(n)}\;,\quad\forall\; j=1,\dots,5\;,\end{align*} 662 | 663 | \end_inset 664 | 665 | which can be rewritten as 666 | \begin_inset Formula \begin{equation} 667 | A\frac{\partial\mathbf{a}}{\partial f_{k}}=(c_{k}^{(1)},c_{k}^{(2)},c_{k}^{(3)},c_{k}^{(4)},c_{k}^{(5)})^{T}\;.\label{eq:sens}\end{equation} 668 | 669 | \end_inset 670 | 671 | Thus we have a linear equation system, from which the sensitivities of each 672 | of the 673 | \begin_inset Formula $a_{j}$ 674 | \end_inset 675 | 676 | in terms of the node value 677 | \begin_inset Formula $f_{k}$ 678 | \end_inset 679 | 680 | can be solved.
681 | By changing 682 | \begin_inset Formula $k$ 683 | \end_inset 684 | 685 | on the right-hand side and solving again for each 686 | \begin_inset Formula $k$ 687 | \end_inset 688 | 689 | , we obtain the sensitivity with respect to each of the neighbors. 690 | (Note that there is 691 | \series bold 692 | no 693 | \series default 694 | sum over 695 | \begin_inset Formula $k$ 696 | \end_inset 697 | 698 | , except inside the matrix 699 | \begin_inset Formula $A$ 700 | \end_inset 701 | 702 | .) 703 | \end_layout 704 | 705 | \begin_layout Standard 706 | This sensitivity result may be useful for forcing Neumann boundary conditions 707 | to hold during IBVP integration (at each timestep, changing the values 708 | at the nodes belonging to the boundary until the BC is satisfied). 709 | \end_layout 710 | 711 | \begin_layout Standard 712 | Again, it should be noted that equation 713 | \begin_inset CommandInset ref 714 | LatexCommand eqref 715 | reference "eq:sens" 716 | 717 | \end_inset 718 | 719 | is valid for the node 720 | \begin_inset Formula $i$ 721 | \end_inset 722 | 723 | , and in principle must be solved separately for each node. 724 | \end_layout 725 | 726 | \begin_layout Standard 727 | However, we observe that the sensitivities depend on the (local) geometry 728 | of the point cloud only. 729 | Recall the definitions of 730 | \begin_inset Formula $A$ 731 | \end_inset 732 | 733 | and 734 | \begin_inset Formula $c_{k}^{(n)}$ 735 | \end_inset 736 | 737 | , equations 738 | \begin_inset CommandInset ref 739 | LatexCommand eqref 740 | reference "eq:Ajn" 741 | 742 | \end_inset 743 | 744 | and 745 | \begin_inset CommandInset ref 746 | LatexCommand eqref 747 | reference "eq:ck" 748 | 749 | \end_inset 750 | 751 | ; the only quantities that appear are the pairwise node distances. 752 | This observation holds for any point cloud. 
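Because the matrix appearing on the left-hand side does not depend on the neighbor index, the solves for all neighbors can share a single factorization, with only a cheap back-substitution repeated per right-hand side. A minimal NumPy sketch (names are illustrative, not the library API):

```python
import numpy as np

def sensitivities(C):
    """Solve the sensitivity systems for every neighbor k at once (a sketch).

    C is the M x 5 matrix whose k-th row is (c_k^(1), ..., c_k^(5)).
    Returns the 5 x M matrix S with S[j, k] = d a_j / d f_k.
    Passing all M right-hand sides to one solve means A is factorized
    once, and only the O(n^2) back-substitution is repeated per neighbor.
    """
    A = C.T @ C                      # same A as in the fitting step
    return np.linalg.solve(A, C.T)   # one LU factorization, M back-substitutions
```

Column k of the result is the sensitivity vector with respect to neighbor k; as a consistency check, S applied to the data vector (f_k - f_i) reproduces the fitted coefficients, since the fit is linear in the data.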
753 | \end_layout 754 | 755 | \begin_layout Standard 756 | If there is some regularity in the geometry, it may be possible to reuse 757 | (some of) the results. 758 | As a special case, if we have a regular Cartesian grid, the 759 | \begin_inset Formula $c_{k}^{(n)}$ 760 | \end_inset 761 | 762 | are constant with respect to 763 | \begin_inset Formula $k$ 764 | \end_inset 765 | 766 | , and thus in this special case only, the sensitivities at each node follow 767 | the same pattern. 768 | This extends easily to other regular geometries; e.g. 769 | for a grid based on the nodes of a hexagonal tiling, there will be only 770 | two kinds of nodes with regard to the sensitivity. 771 | The strength of the method, however, lies in being able to handle irregular 772 | geometries: in the general case, one does not need to assume anything about 773 | the distribution of the points. 774 | \end_layout 775 | 776 | \begin_layout Subsubsection* 777 | Finding nearest neighbors --- a simple algorithm 778 | \end_layout 779 | 780 | \begin_layout Standard 781 | In this section, we look into the problem of searching a given point cloud 782 | for nearest neighbors. 783 | We consider finding the neighbors within a given distance 784 | \begin_inset Formula $R$ 785 | \end_inset 786 | 787 | from a given point, and finding the 788 | \begin_inset Formula $m$ 789 | \end_inset 790 | 791 | nearest neighbors of a given point, with 792 | \begin_inset Formula $m$ 793 | \end_inset 794 | 795 | given. 796 | \end_layout 797 | 798 | \begin_layout Standard 799 | An example MATLAB/Octave implementation of the ideas presented in this section 800 | is provided in 801 | \family typewriter 802 | 803 | \begin_inset Newline newline 804 | \end_inset 805 | 806 | find_neighbors.m 807 | \family default 808 | (in the SAVU project git repository). 
809 | \end_layout 810 | 811 | \begin_layout Paragraph* 812 | Finding all neighbors within distance R 813 | \end_layout 814 | 815 | \begin_layout Standard 816 | For a static point cloud (in the sense of not changing during the simulation), 817 | the nearest neighbor search problem can be solved in 818 | \begin_inset Formula $O(d\, N\,\log\, N)$ 819 | \end_inset 820 | 821 | time (where 822 | \begin_inset Formula $N$ 823 | \end_inset 824 | 825 | is the number of points in the whole cloud, and 826 | \begin_inset Formula $d$ 827 | \end_inset 828 | 829 | is the dimensionality of the space 830 | \begin_inset Formula $\mathbb{R}^{d}$ 831 | \end_inset 832 | 833 | in which the points live) using an indexed search procedure. 834 | For a moving point cloud, the 835 | \begin_inset Quotes eld 836 | \end_inset 837 | 838 | expensive 839 | \begin_inset Quotes erd 840 | \end_inset 841 | 842 | 843 | \begin_inset Formula $O(d\, N\,\log\, N)$ 844 | \end_inset 845 | 846 | step must be re-performed at each timestep. 847 | \end_layout 848 | 849 | \begin_layout Standard 850 | Initially, we create a sorted index of the data based on the coordinates 851 | on each axis. 852 | This gives us 853 | \begin_inset Formula $d$ 854 | \end_inset 855 | 856 | sorted vectors of 857 | \begin_inset Formula $(\text{coordinate along }j\text{th axis},\,\text{point ID})$ 858 | \end_inset 859 | 860 | pairs. 861 | This enables us to search for the set of points that belong to a given 862 | interval on, say, the 863 | \begin_inset Formula $x$ 864 | \end_inset 865 | 866 | axis ( 867 | \begin_inset Formula $j=1$ 868 | \end_inset 869 | 870 | ; correspondingly for the other axes). 871 | Each sort finishes in 872 | \begin_inset Formula $O(N\,\log\, N)$ 873 | \end_inset 874 | 875 | time, and only needs to be done once (or until the point cloud changes; 876 | then we must re-index).
877 | Then, indexed search on this data can be done using the binary search procedure 878 | in 879 | \begin_inset Formula $O(\log\, N)$ 880 | \end_inset 881 | 882 | time for each dimension. 883 | \end_layout 884 | 885 | \begin_layout Standard 886 | To find the neighbors within distance 887 | \begin_inset Formula $R$ 888 | \end_inset 889 | 890 | of a point with given coordinates in 891 | \begin_inset Formula $\mathbb{R}^{d}$ 892 | \end_inset 893 | 894 | (allowed to be a point belonging to the cloud, but not necessarily), we first 895 | search along each axis, producing 896 | \begin_inset Formula $d$ 897 | \end_inset 898 | 899 | filtered index sets in each of which the coordinates on the 900 | \begin_inset Formula $j$ 901 | \end_inset 902 | 903 | th axis match the desired interval 904 | \begin_inset Formula $[(x_{0})_{j}-R,\;(x_{0})_{j}+R]$ 905 | \end_inset 906 | 907 | . 908 | Taking the set intersection of the result sets gives us the neighbor set 909 | within distance 910 | \begin_inset Formula $R$ 911 | \end_inset 912 | 913 | in the sense of the 914 | \begin_inset Formula $\ell^{\infty}$ 915 | \end_inset 916 | 917 | metric. 918 | The next step is to filter the result further. 919 | \end_layout 920 | 921 | \begin_layout Standard 922 | An important property here is that because 923 | \begin_inset Formula $\Vert x\Vert_{\ell^{\infty}}\le\Vert x\Vert_{\ell^{p}}$ 924 | \end_inset 925 | 926 | for all 927 | \begin_inset Formula $1\le p<\infty$ 928 | \end_inset 929 | 930 | , the 931 | \begin_inset Formula $\ell^{\infty}$ 932 | \end_inset 933 | 934 | neighbor set encloses all other 935 | \begin_inset Formula $\ell^{p}$ 936 | \end_inset 937 | 938 | neighbor sets, including the Euclidean neighbor set (with 939 | \begin_inset Formula $p=2$ 940 | \end_inset 941 | 942 | ). 943 | Thus, all these other neighbor sets can be produced by filtering the 944 | \begin_inset Formula $\ell^{\infty}$ 945 | \end_inset 946 | 947 | neighbor set.
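The index build and the two-stage filtering can be sketched as follows. These are hypothetical helper functions, not the library API; note that each per-axis query interval needs half-width R (not R/2), so that the ℓ∞ box really encloses the Euclidean ball of radius R:

```python
import numpy as np

def build_index(points):
    """Per-axis sorted index: d arrays of point IDs, built once in O(d N log N)."""
    return [np.argsort(points[:, j]) for j in range(points.shape[1])]

def neighbors_within(points, index, x0, R):
    """IDs of all cloud points within Euclidean distance R of x0 (a sketch).

    First intersect the d per-axis interval queries (binary search,
    O(log N) each) to get the l-infinity candidate set, then keep only
    the candidates that also pass the l2 distance test.
    """
    cand = None
    for j in range(points.shape[1]):
        order = index[j]
        coords = points[order, j]                       # coordinates in sorted order
        lo = np.searchsorted(coords, x0[j] - R, side="left")
        hi = np.searchsorted(coords, x0[j] + R, side="right")
        ids = set(order[lo:hi].tolist())
        cand = ids if cand is None else cand & ids      # set intersection
    cand = np.fromiter(cand, dtype=int)                 # l-infinity neighbor set
    keep = np.linalg.norm(points[cand] - x0, axis=1) <= R
    return np.sort(cand[keep])                          # l2 (Euclidean) neighbor set
```

The linear ℓ2 filter at the end touches only the M candidates, which, as noted above, is cheap since M ≪ N in practice.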
948 | \end_layout 949 | 950 | \begin_layout Standard 951 | The 952 | \begin_inset Formula $\ell^{\infty}$ 953 | \end_inset 954 | 955 | neighbor set, with 956 | \begin_inset Formula $M$ 957 | \end_inset 958 | 959 | points, is for any practically interesting 960 | \begin_inset Formula $R$ 961 | \end_inset 962 | 963 | much smaller than the whole cloud ( 964 | \begin_inset Formula $M\ll N$ 965 | \end_inset 966 | 967 | ). 968 | Thus, linear filtering of the result set, which takes 969 | \begin_inset Formula $O(M)$ 970 | \end_inset 971 | 972 | time, is not a major cost. 973 | \end_layout 974 | 975 | \begin_layout Standard 976 | To find the 977 | \begin_inset Formula $\ell^{2}$ 978 | \end_inset 979 | 980 | (Euclidean) neighbor set, we simply construct a new result set, including 981 | in it only those points in the 982 | \begin_inset Formula $\ell^{\infty}$ 983 | \end_inset 984 | 985 | neighbor set that also satisfy the 986 | \begin_inset Formula $\ell^{2}$ 987 | \end_inset 988 | 989 | distance requirement 990 | \begin_inset Formula $\Vert x_{j}-x_{0}\Vert_{\ell^{2}}\le R$ 991 | \end_inset 992 | 993 | . 994 | \end_layout 995 | 996 | \begin_layout Paragraph* 997 | Finding the m nearest neighbors 998 | \end_layout 999 | 1000 | \begin_layout Standard 1001 | Finally, consider the question of finding 1002 | \begin_inset Formula $R$ 1003 | \end_inset 1004 | 1005 | such that within this radius, there are exactly 1006 | \begin_inset Formula $m$ 1007 | \end_inset 1008 | 1009 | neighbors (where 1010 | \begin_inset Formula $m$ 1011 | \end_inset 1012 | 1013 | is user-specified). 1014 | This provides us a nearest-neighbor search procedure for user-definable 1015 | 1016 | \begin_inset Formula $m$ 1017 | \end_inset 1018 | 1019 | , which is what we need in the meshless method. 1020 | \end_layout 1021 | 1022 | \begin_layout Standard 1023 | We start from some 1024 | \begin_inset Formula $R=r_{0}$ 1025 | \end_inset 1026 | 1027 | (this can be e.g. 
1028 | some function of the size of the bounding box of the data, which can be 1029 | trivially found in 1030 | \begin_inset Formula $O(N)$ 1031 | \end_inset 1032 | 1033 | time, and the number of points in the data (e.g. 1034 | assuming them to have uniform density and estimating average 1035 | \begin_inset Formula $R$ 1036 | \end_inset 1037 | 1038 | from that)). 1039 | We then do a logarithmic search, counting the neighbors within radius 1040 | \begin_inset Formula $R$ 1041 | \end_inset 1042 | 1043 | and, based on the result, we either double or halve 1044 | \begin_inset Formula $R$ 1045 | \end_inset 1046 | 1047 | at each step. 1048 | \end_layout 1049 | 1050 | \begin_layout Standard 1051 | By this logarithmic search, we may get lucky and hit an 1052 | \begin_inset Formula $R$ 1053 | \end_inset 1054 | 1055 | where there are exactly 1056 | \begin_inset Formula $m$ 1057 | \end_inset 1058 | 1059 | neighbors. 1060 | In this case, we stop and return the current neighbor set. 1061 | \end_layout 1062 | 1063 | \begin_layout Standard 1064 | But most often, we will find an interval 1065 | \begin_inset Formula $R\in[R_{1},R_{2}]$ 1066 | \end_inset 1067 | 1068 | where 1069 | \begin_inset Formula $R_{1}$ 1070 | \end_inset 1071 | 1072 | has fewer than 1073 | \begin_inset Formula $m$ 1074 | \end_inset 1075 | 1076 | neighbors, and 1077 | \begin_inset Formula $R_{2}=2R_{1}$ 1078 | \end_inset 1079 | 1080 | has more than 1081 | \begin_inset Formula $m$ 1082 | \end_inset 1083 | 1084 | . 1085 | This interval can be refined using binary search on the variable 1086 | \begin_inset Formula $R$ 1087 | \end_inset 1088 | 1089 | . 1090 | This produces a sequence of shrinking intervals 1091 | \begin_inset Formula $[R_{a},R_{b}]$ 1092 | \end_inset 1093 | 1094 | , which converges onto (some) correct 1095 | \begin_inset Formula $R$ 1096 | \end_inset 1097 | 1098 | .
1099 | This works, because the number of neighbors as a function of distance is 1100 | a monotonic (although discontinuous and piecewise constant) function. 1101 | We stop the search once we find an 1102 | \begin_inset Formula $R$ 1103 | \end_inset 1104 | 1105 | which has exactly 1106 | \begin_inset Formula $m$ 1107 | \end_inset 1108 | 1109 | neighbors. 1110 | \end_layout 1111 | 1112 | \begin_layout Standard 1113 | The final pitfall is that in an arbitrary point cloud, for any given point, 1114 | the cloud may contain exactly two (or more) points at the exact same distance 1115 | from it. 1116 | In these cases, there might not exist a distance with exactly 1117 | \begin_inset Formula $m$ 1118 | \end_inset 1119 | 1120 | neighbors for the given point! To protect against this possibility, we 1121 | set a tolerance 1122 | \begin_inset Formula $\varepsilon>0$ 1123 | \end_inset 1124 | 1125 | for the length of the search interval 1126 | \begin_inset Formula $[R_{a},R_{b}]$ 1127 | \end_inset 1128 | 1129 | in the above procedure. 1130 | If no matching 1131 | \begin_inset Formula $R$ 1132 | \end_inset 1133 | 1134 | has been found, and 1135 | \begin_inset Formula $R_{b}-R_{a}<\varepsilon$ 1136 | \end_inset 1137 | 1138 | , we stop the search and return the neighbor set at 1139 | \begin_inset Formula $R_{b}$ 1140 | \end_inset 1141 | 1142 | (along with e.g. 1143 | an error code or some other signal, so that the calling end knows that 1144 | extra neighbors have been returned). 
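A sketch of this radius search, including the tolerance fallback for tied distances. The helper name and the flag convention are invented for illustration; it also assumes the cloud contains at least m points (so the growth phase terminates) and that the query point itself is not counted:

```python
def radius_for_m(count_within, r0, m, eps=1e-12):
    """Find a radius R with exactly m neighbors inside it (a sketch).

    count_within(R) -> number of neighbors within radius R; this is a
    monotone, piecewise-constant step function of R.  Starting from the
    initial guess r0, double R until at least m neighbors are inside,
    then bisect.  If ties in the distances make an exact count-m radius
    impossible, stop once the bracket is shorter than eps and return
    its upper end, with a flag telling the caller that extra (tied)
    neighbors were included.
    """
    R = r0
    while count_within(R) < m:      # logarithmic growth phase
        R *= 2.0
    Ra, Rb = 0.0, R                 # invariant: count(Ra) < m <= count(Rb)
    while Rb - Ra > eps:
        mid = 0.5 * (Ra + Rb)
        n = count_within(mid)
        if n == m:
            return mid, True        # exact hit: exactly m neighbors
        if n < m:
            Ra = mid
        else:
            Rb = mid
    return Rb, False                # tie: >= m neighbors at Rb
```

The boolean result plays the role of the error signal mentioned above: `False` means the caller received more than m neighbors due to equidistant points.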
1145 | \end_layout 1146 | 1147 | \end_body 1148 | \end_document 1149 | -------------------------------------------------------------------------------- /doc/wlsqm.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/doc/wlsqm.pdf -------------------------------------------------------------------------------- /doc/wlsqm_gen.lyx: -------------------------------------------------------------------------------- 1 | #LyX 2.2 created this file. For more info see http://www.lyx.org/ 2 | \lyxformat 508 3 | \begin_document 4 | \begin_header 5 | \save_transient_properties true 6 | \origin unavailable 7 | \textclass article 8 | \use_default_options true 9 | \maintain_unincluded_children false 10 | \language english 11 | \language_package default 12 | \inputencoding auto 13 | \fontencoding global 14 | \font_roman "palatino" "default" 15 | \font_sans "default" "default" 16 | \font_typewriter "default" "default" 17 | \font_math "auto" "auto" 18 | \font_default_family default 19 | \use_non_tex_fonts false 20 | \font_sc false 21 | \font_osf false 22 | \font_sf_scale 100 100 23 | \font_tt_scale 100 100 24 | \graphics default 25 | \default_output_format default 26 | \output_sync 0 27 | \bibtex_command default 28 | \index_command default 29 | \paperfontsize default 30 | \spacing single 31 | \use_hyperref true 32 | \pdf_bookmarks true 33 | \pdf_bookmarksnumbered false 34 | \pdf_bookmarksopen false 35 | \pdf_bookmarksopenlevel 1 36 | \pdf_breaklinks false 37 | \pdf_pdfborder true 38 | \pdf_colorlinks false 39 | \pdf_backref false 40 | \pdf_pdfusetitle true 41 | \papersize default 42 | \use_geometry true 43 | \use_package amsmath 1 44 | \use_package amssymb 1 45 | \use_package cancel 1 46 | \use_package esint 1 47 | \use_package mathdots 1 48 | \use_package mathtools 1 49 | \use_package mhchem 1 50 | \use_package stackrel 1 51 | \use_package stmaryrd 1 
52 | \use_package undertilde 1 53 | \cite_engine natbib 54 | \cite_engine_type authoryear 55 | \biblio_style plain 56 | \use_bibtopic false 57 | \use_indices false 58 | \paperorientation portrait 59 | \suppress_date false 60 | \justification true 61 | \use_refstyle 1 62 | \index Index 63 | \shortcut idx 64 | \color #008000 65 | \end_index 66 | \leftmargin 2cm 67 | \topmargin 2cm 68 | \rightmargin 2cm 69 | \bottommargin 2cm 70 | \secnumdepth 3 71 | \tocdepth 3 72 | \paragraph_separation indent 73 | \paragraph_indentation default 74 | \quotes_language english 75 | \papercolumns 1 76 | \papersides 1 77 | \paperpagestyle default 78 | \tracking_changes false 79 | \output_changes false 80 | \html_math_output 0 81 | \html_css_as_file 0 82 | \html_be_strict false 83 | \end_header 84 | 85 | \begin_body 86 | 87 | \begin_layout Section 88 | Extended WLSQM: dealing with missing function values 89 | \end_layout 90 | 91 | \begin_layout Standard 92 | Can we extend WLSQM to the case where the function value 93 | \begin_inset Formula $\widehat{f}_{i}(x_{i})$ 94 | \end_inset 95 | 96 | is unknown, provided that 97 | \begin_inset Formula $\widehat{f}_{i}(x_{k})$ 98 | \end_inset 99 | 100 | is known for all neighbor points 101 | \begin_inset Formula $x_{k}$ 102 | \end_inset 103 | 104 | ? 105 | \end_layout 106 | 107 | \begin_layout Itemize 108 | The primary use case is handling boundary conditions, which may prescribe 109 | a derivative, leaving the function value free. 110 | In these cases, we eliminate the appropriate 111 | \begin_inset Formula $a_{j}$ 112 | \end_inset 113 | 114 | , either by algebraic elimination of the corresponding row and column; or 115 | by replacing its row in the equation system with 116 | \begin_inset Formula $1\cdot a_{j}=C$ 117 | \end_inset 118 | 119 | (maybe appropriately scaled), where 120 | \begin_inset Formula $C$ 121 | \end_inset 122 | 123 | is its known value. 
124 | \end_layout 125 | 126 | \begin_layout Itemize 127 | This can also be used for interpolation, to obtain an approximation to the 128 | function value and its derivatives at an arbitrary point 129 | \begin_inset Formula $x$ 130 | \end_inset 131 | 132 | that does not belong to the point cloud. 133 | (But here a cheaper alternative is to compute the approximation from the 134 | obtained quadratic fit. 135 | This also gives the derivatives, since the analytical expression of the 136 | fit is known.) 137 | \end_layout 138 | 139 | \begin_layout Itemize 140 | Another use case may be as an error indicator (compare the interpolated 141 | 142 | \begin_inset Formula $\widehat{f}_{i}(x_{i})$ 143 | \end_inset 144 | 145 | , computed by omitting 146 | \begin_inset Formula $f_{i}$ 147 | \end_inset 148 | 149 | , and the actual data 150 | \begin_inset Formula $f_{i}$ 151 | \end_inset 152 | 153 | ). 154 | \end_layout 155 | 156 | \begin_layout Itemize 157 | Also as a smoother? Replace each 158 | \begin_inset Formula $f_{i}$ 159 | \end_inset 160 | 161 | by its interpolant, then iterate until convergence. 162 | \end_layout 163 | 164 | \begin_layout Standard 165 | The answer turns out to be yes. 166 | Let us denote the local representation of our scalar field 167 | \begin_inset Formula $f(x)$ 168 | \end_inset 169 | 170 | , in a neighborhood of the point 171 | \begin_inset Formula $x_{i}$ 172 | \end_inset 173 | 174 | , by 175 | \begin_inset Formula $\widehat{f}_{i}(x)$ 176 | \end_inset 177 | 178 | . 
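The row-replacement trick from the boundary-condition bullet above, for forcing a prescribed coefficient (e.g. a derivative fixed by a Neumann condition), can be sketched as follows. This is a hypothetical helper; the scaling choice shown is one reasonable option, not the only one:

```python
import numpy as np

def prescribe(A, b, j, value, scale=None):
    """Force unknown a_j to a known value by row replacement (a sketch).

    Overwrites row j of the system with  s * a_j = s * value,  where the
    scale s (here defaulting to the largest diagonal entry of A) keeps
    the new row's magnitude comparable to the rest of the matrix.
    Returns modified copies of A and b; note the result is no longer
    symmetric, unlike full algebraic elimination of row and column j.
    """
    A = A.copy()
    b = b.copy()
    s = scale if scale is not None else np.abs(np.diag(A)).max()
    A[j, :] = 0.0
    A[j, j] = s
    b[j] = s * value
    return A, b
```

The remaining equations still contain a_j with their original coefficients, so the other unknowns adjust to the prescribed value when the modified system is solved.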
179 | \end_layout 180 | 181 | \begin_layout Standard 182 | Let us Taylor expand 183 | \begin_inset Formula $\widehat{f}_{i}$ 184 | \end_inset 185 | 186 | around the point 187 | \begin_inset Formula $x_{i}$ 188 | \end_inset 189 | 190 | , and evaluate the Taylor series at a neighbor point 191 | \begin_inset Formula $x_{k}$ 192 | \end_inset 193 | 194 | (a point distinct from 195 | \begin_inset Formula $x_{i}$ 196 | \end_inset 197 | 198 | , also belonging to the point cloud): 199 | \begin_inset Formula 200 | \begin{equation} 201 | \widehat{f}_{i}(x_{k})=\widehat{f}_{i}(x_{i})+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}+O(h_{k}^{3}\,,\ell_{k}^{3})\;,\label{eq:Tay} 202 | \end{equation} 203 | 204 | \end_inset 205 | 206 | where 207 | \begin_inset Formula 208 | \begin{align} 209 | h_{k} & :=(x_{k})_{1}-(x_{i})_{1}\;,\label{eq:hk}\\ 210 | \ell_{k} & :=(x_{k})_{2}-(x_{i})_{2}\;,\label{eq:ellk} 211 | \end{align} 212 | 213 | \end_inset 214 | 215 | and the function value and the derivatives are denoted by (note the numbering) 216 | \begin_inset Formula 217 | \begin{align} 218 | a_{1} & =\frac{\partial\widehat{f}_{i}}{\partial x}\vert_{x=x_{i}}\;, & a_{2} & =\frac{\partial\widehat{f}_{i}}{\partial y}\vert_{x=x_{i}}\;,\nonumber \\ 219 | a_{3} & =\frac{\partial^{2}\widehat{f}_{i}}{\partial x^{2}}\vert_{x=x_{i}}\;, & a_{5} & =\frac{\partial^{2}\widehat{f}_{i}}{\partial y^{2}}\vert_{x=x_{i}}\;,\nonumber \\ 220 | a_{4} & =\frac{\partial^{2}\widehat{f}_{i}}{\partial x\partial y}\vert_{x=x_{i}}\;, & a_{0} & =\widehat{f}_{i}\vert_{x=x_{i}}\;.\label{eq:aj} 221 | \end{align} 222 | 223 | \end_inset 224 | 225 | Truncating the error term, we have the Taylor approximation: 226 | \begin_inset Formula 227 | \begin{equation} 228 | \widehat{f}_{i}(x_{k})\approx a_{0}+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}=:\overline{f}_{k}\;,\label{eq:approx} 229 | \end{equation} 230 | 231 | \end_inset 
232 | 233 | Now, let us define the coefficients 234 | \begin_inset Formula 235 | \begin{align} 236 | c_{k}^{(1)} & :=h_{k}\;, & c_{k}^{(2)} & :=\ell_{k}\;,\nonumber \\ 237 | c_{k}^{(3)} & :=\frac{h_{k}^{2}}{2}\;, & c_{k}^{(5)} & :=\frac{\ell_{k}^{2}}{2}\;,\nonumber \\ 238 | c_{k}^{(4)} & :=h_{k}\ell_{k}\;, & c_{k}^{(0)} & :=1\;.\label{eq:ck} 239 | \end{align} 240 | 241 | \end_inset 242 | 243 | Observe that 244 | \begin_inset Formula 245 | \begin{equation} 246 | \frac{\partial\overline{f}_{k}}{\partial a_{j}}=c_{k}^{(j)}\;,\label{eq:dfkdaj} 247 | \end{equation} 248 | 249 | \end_inset 250 | 251 | At the neighbor points 252 | \begin_inset Formula $x_{k}$ 253 | \end_inset 254 | 255 | (belonging to the point cloud), by assumption we have the function values 256 | available as data. 257 | The error made at any such point 258 | \begin_inset Formula $x_{k}$ 259 | \end_inset 260 | 261 | , when we replace 262 | \begin_inset Formula $\widehat{f}_{i}(x_{k})$ 263 | \end_inset 264 | 265 | with its Taylor approximation, is 266 | \begin_inset Formula 267 | \begin{equation} 268 | e_{k}:=f_{k}-\overline{f}_{k}\;,\label{eq:ek} 269 | \end{equation} 270 | 271 | \end_inset 272 | 273 | One-half of the total squared error across all the neighbor points 274 | \begin_inset Formula $k$ 275 | \end_inset 276 | 277 | is simply 278 | \begin_inset Formula 279 | \begin{equation} 280 | G(a_{0},\dots,a_{5}):=\frac{1}{2}\;\underset{k\in I_{i}}{\sum}\,e_{k}^{2}\;,\label{eq:G} 281 | \end{equation} 282 | 283 | \end_inset 284 | 285 | where 286 | \begin_inset Formula $I_{i}$ 287 | \end_inset 288 | 289 | is the index set of the point 290 | \begin_inset Formula $i$ 291 | \end_inset 292 | 293 | 's neighbors. 
294 | \end_layout 295 | 296 | \begin_layout Standard 297 | Minimizing the error leads, in the least-squares sense, to the best possible 298 | values for the 299 | \begin_inset Formula $a_{j}$ 300 | \end_inset 301 | 302 | : 303 | \begin_inset Formula 304 | \[ 305 | \{a_{0},\dots,a_{5}\}_{\mathrm{optimal}}=\underset{a_{0},\dots,a_{5}}{\arg\min}\,G(a_{0},\dots,a_{5})\;. 306 | \] 307 | 308 | \end_inset 309 | 310 | Because 311 | \begin_inset Formula $G\ge0$ 312 | \end_inset 313 | 314 | for any values of the 315 | \begin_inset Formula $a_{j}$ 316 | \end_inset 317 | 318 | , and 319 | \begin_inset Formula $G$ 320 | \end_inset 321 | 322 | is a quadratic function of the 323 | \begin_inset Formula $a_{j}$ 324 | \end_inset 325 | 326 | , it has a unique extremal point, which is a minimum. 327 | The least-squares fit is given by this unique minimum of 328 | \begin_inset Formula $G$ 329 | \end_inset 330 | 331 | : 332 | \begin_inset Note Note 333 | status open 334 | 335 | \begin_layout Plain Layout 336 | The error 337 | \begin_inset Formula $G$ 338 | \end_inset 339 | 340 | obviously has no finite maximum in terms of the 341 | \begin_inset Formula $a_{j}$ 342 | \end_inset 343 | 344 | ; hence its only critical point must be a minimum. 
345 | Thus the problem becomes to find values for 346 | \begin_inset Formula $a_{j}$ 347 | \end_inset 348 | 349 | such that 350 | \end_layout 351 | 352 | \end_inset 353 | 354 | 355 | \begin_inset Formula 356 | \begin{equation} 357 | \frac{\partial G}{\partial a_{j}}=0\;,\quad j=0,\dots,5\;.\label{eq:minG} 358 | \end{equation} 359 | 360 | \end_inset 361 | 362 | Using first 363 | \begin_inset CommandInset ref 364 | LatexCommand eqref 365 | reference "eq:G" 366 | 367 | \end_inset 368 | 369 | , and then on the second line 370 | \begin_inset CommandInset ref 371 | LatexCommand eqref 372 | reference "eq:ek" 373 | 374 | \end_inset 375 | 376 | , we can write 377 | \begin_inset Formula 378 | \begin{align} 379 | \frac{\partial G}{\partial a_{j}} & =\underset{k\in I_{i}}{\sum}e_{k}\frac{\partial e_{k}}{\partial a_{j}}\nonumber \\ 380 | & =\underset{k\in I_{i}}{\sum}[f_{k}-\overline{f}_{k}(a_{0},\dots,a_{5})]\left[-\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]=0\;,\quad j=0,\dots5\;,\label{eq:minG2} 381 | \end{align} 382 | 383 | \end_inset 384 | 385 | which, using 386 | \begin_inset CommandInset ref 387 | LatexCommand eqref 388 | reference "eq:approx" 389 | 390 | \end_inset 391 | 392 | – 393 | \begin_inset CommandInset ref 394 | LatexCommand eqref 395 | reference "eq:ck" 396 | 397 | \end_inset 398 | 399 | and 400 | \begin_inset CommandInset ref 401 | LatexCommand eqref 402 | reference "eq:dfkdaj" 403 | 404 | \end_inset 405 | 406 | , leads to 407 | \begin_inset Formula 408 | \[ 409 | \underset{k\in I_{i}}{\sum}\left(\left[-f_{k}+c_{k}^{(0)}a_{0}+c_{k}^{(1)}a_{1}+c_{k}^{(2)}a_{2}+c_{k}^{(3)}a_{3}+c_{k}^{(4)}a_{4}+c_{k}^{(5)}a_{5}\right]c_{k}^{(j)}\right)=0\;,\quad j=0,\dots,5\;. 
410 | \] 411 | 412 | \end_inset 413 | 414 | This can be written as a standard linear equation system 415 | \begin_inset Formula 416 | \begin{equation} 417 | \underset{n=0}{\overset{5}{\sum}}A_{jn}a_{n}=b_{j}\;,\quad j=0,\dots,5\;,\label{eq:lineq} 418 | \end{equation} 419 | 420 | \end_inset 421 | 422 | where 423 | \begin_inset Formula 424 | \begin{equation} 425 | A_{jn}=\underset{k\in I_{i}}{\sum}\;\,c_{k}^{(n)}c_{k}^{(j)}\;,\label{eq:Ajn} 426 | \end{equation} 427 | 428 | \end_inset 429 | 430 | 431 | \begin_inset Formula 432 | \begin{equation} 433 | b_{j}=\underset{k\in I_{i}}{\sum}\,f_{k}\,c_{k}^{(j)}\;.\label{eq:bj} 434 | \end{equation} 435 | 436 | \end_inset 437 | 438 | Considering the magnitudes of the expressions 439 | \begin_inset CommandInset ref 440 | LatexCommand eqref 441 | reference "eq:ck" 442 | 443 | \end_inset 444 | 445 | , which contribute quadratically to 446 | \begin_inset Formula $A_{jn}$ 447 | \end_inset 448 | 449 | , we see that the condition number of 450 | \begin_inset Formula $A$ 451 | \end_inset 452 | 453 | will likely deteriorate, when compared to the previous case where 454 | \begin_inset Formula $f_{i}$ 455 | \end_inset 456 | 457 | is known. 458 | This is as expected; we are now dealing with not only the first and second 459 | derivatives, but also with the function value. 460 | Preconditioning (even by row normalization, although this destroys the 461 | symmetry) may help with floating-point roundoff issues. 462 | \end_layout 463 | 464 | \begin_layout Standard 465 | \begin_inset Newpage pagebreak 466 | \end_inset 467 | 468 | 469 | \end_layout 470 | 471 | \begin_layout Section 472 | Accuracy? 
473 | \end_layout 474 | 475 | \begin_layout Standard 476 | At first glance, WLSQM is not even consistent, if we treat 477 | \begin_inset Formula $\overline{f}_{k}$ 478 | \end_inset 479 | 480 | as an 481 | \begin_inset Formula $O(h_{k}^{3}\,,\ell_{k}^{3})$ 482 | \end_inset 483 | 484 | approximation of 485 | \begin_inset Formula $\widehat{f}_{i}(x_{k})$ 486 | \end_inset 487 | 488 | , and track the error term through the calculation (details left as an exercise). 489 | \end_layout 490 | 491 | \begin_layout Standard 492 | Obviously, consistent expansion of the matrix 493 | \begin_inset Formula $A$ 494 | \end_inset 495 | 496 | to the order 497 | \begin_inset Formula $O(h_{k}^{3}\,,\ell_{k}^{3})$ 498 | \end_inset 499 | 500 | gives 501 | \begin_inset Formula 502 | \[ 503 | A=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 504 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 505 | h_{k} & h_{k}^{2} & h_{k}\ell_{k} & \sim0 & \sim0 & \sim0\\ 506 | \ell_{k} & h_{k}\ell_{k} & \ell_{k}^{2} & \sim0 & \sim0 & \sim0\\ 507 | \frac{h_{k}^{2}}{2} & \sim0 & \sim0 & \sim0 & \sim0 & \sim0\\ 508 | h_{k}\ell_{k} & \sim0 & \sim0 & \sim0 & \sim0 & \sim0\\ 509 | \frac{\ell_{k}^{2}}{2} & \sim0 & \sim0 & \sim0 & \sim0 & \sim0 510 | \end{array}\right]=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 511 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 512 | & h_{k}^{2} & h_{k}\ell_{k} & \sim0 & \sim0 & \sim0\\ 513 | & & \ell_{k}^{2} & \sim0 & \sim0 & \sim0\\ 514 | & & & \sim0 & \sim0 & \sim0\\ 515 | & \mathrm{symm.} & & & \sim0 & \sim0\\ 516 | & & & & & \sim0 517 | \end{array}\right] 518 | \] 519 | 520 | \end_inset 521 | 522 | which has at most rank 3 (rank 2 in the classical case, where the first 523 | row and column, corresponding to unknown 524 | \begin_inset Formula $f_{i}$ 525 | \end_inset 526 | 527 | , are removed). 
528 | Of course it can be of full rank if the almost zeros are retained, but 529 | the truncation error (of the Taylor approximation) dominates those, so 530 | consistency requires that they be dropped. 531 | \end_layout 532 | 533 | \begin_layout Standard 534 | (As an aside, we note that if there is only one neighbor point, the equations 535 | corresponding to the UL 3x3 block become scalar multiples of each other 536 | due to 537 | \begin_inset Formula $b$ 538 | \end_inset 539 | 540 | also having a factor of 541 | \begin_inset Formula $c_{k}^{(j)}$ 542 | \end_inset 543 | 544 | . 545 | This is of course as expected; one can hardly expect to obtain two independent 546 | derivatives from just one neighbor. 547 | The same occurs if the neighbors are collinear (as is obvious geometrically, 548 | and quite simple to see algebraically, writing e.g. 549 | for two points 550 | \begin_inset Formula $h_{2}=Ch_{1}$ 551 | \end_inset 552 | 553 | , 554 | \begin_inset Formula $\ell_{2}=C\ell_{1}$ 555 | \end_inset 556 | 557 | ...).) 558 | \end_layout 559 | 560 | \begin_layout Standard 561 | However, WLSQM (at least the classical version with 562 | \begin_inset Formula $f_{i}$ 563 | \end_inset 564 | 565 | known) has been observed to actually work, with some reasonable amount 566 | of numerical error, so this analysis must be wrong. 567 | What is going on? 568 | \end_layout 569 | 570 | \begin_layout Subsection 571 | Accuracy, correctly 572 | \end_layout 573 | 574 | \begin_layout Standard 575 | Let us take a page from finite element methods, where the weak form is — 576 | after the fact — taken as the new 577 | \emph on 578 | definition 579 | \emph default 580 | of the problem (which just so happens to lead to the classical strong form 581 | in cases where both can be written). 
582 | \end_layout 583 | 584 | \begin_layout Standard 585 | To apply this philosophy here: after we define 586 | \begin_inset Formula $\overline{f}_{k}$ 587 | \end_inset 588 | 589 | , we can 590 | \begin_inset Quotes eld 591 | \end_inset 592 | 593 | forget 594 | \begin_inset Quotes erd 595 | \end_inset 596 | 597 | that it comes from a truncated Taylor series, and 598 | \emph on 599 | take the definition as a new starting point 600 | \emph default 601 | : in principle, 602 | \begin_inset Formula $\overline{f}_{k}$ 603 | \end_inset 604 | 605 | is just a function of the 606 | \begin_inset Formula $a_{j}$ 607 | \end_inset 608 | 609 | , to be least-squares fitted to known data points 610 | \begin_inset Formula $f_{k}$ 611 | \end_inset 612 | 613 | (and optionally known 614 | \begin_inset Formula $f_{i}$ 615 | \end_inset 616 | 617 | , as per classical WLSQM). 618 | \end_layout 619 | 620 | \begin_layout Standard 621 | Then we just perform standard least-squares fitting. 622 | The math is exact (given unrealistic, exact arithmetic — this is a separate 623 | issue); no truncation error term appears. 
624 | The full matrix should be retained: 625 | \begin_inset Formula 626 | \[ 627 | A=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 628 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 629 | h_{k} & h_{k}^{2} & h_{k}\ell_{k} & \frac{h_{k}^{3}}{2} & h_{k}^{2}\ell_{k} & h_{k}\frac{\ell_{k}^{2}}{2}\\ 630 | \ell_{k} & h_{k}\ell_{k} & \ell_{k}^{2} & \ell_{k}\frac{h_{k}^{2}}{2} & h_{k}\ell_{k}^{2} & \frac{\ell_{k}^{3}}{2}\\ 631 | \frac{h_{k}^{2}}{2} & \frac{h_{k}^{3}}{2} & \ell_{k}\frac{h_{k}^{2}}{2} & \frac{h_{k}^{4}}{4} & \frac{h_{k}^{3}}{2}\ell_{k} & \frac{h_{k}^{2}\ell_{k}^{2}}{4}\\ 632 | h_{k}\ell_{k} & h_{k}^{2}\ell_{k} & h_{k}\ell_{k}^{2} & \frac{h_{k}^{3}}{2}\ell_{k} & h_{k}^{2}\ell_{k}^{2} & h_{k}\frac{\ell_{k}^{3}}{2}\\ 633 | \frac{\ell_{k}^{2}}{2} & h_{k}\frac{\ell_{k}^{2}}{2} & \frac{\ell_{k}^{3}}{2} & \frac{h_{k}^{2}\ell_{k}^{2}}{4} & h_{k}\frac{\ell_{k}^{3}}{2} & \frac{\ell_{k}^{4}}{4} 634 | \end{array}\right]=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 635 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 636 | & h_{k}^{2} & h_{k}\ell_{k} & \frac{h_{k}^{3}}{2} & h_{k}^{2}\ell_{k} & h_{k}\frac{\ell_{k}^{2}}{2}\\ 637 | & & \ell_{k}^{2} & \ell_{k}\frac{h_{k}^{2}}{2} & h_{k}\ell_{k}^{2} & \frac{\ell_{k}^{3}}{2}\\ 638 | & & & \frac{h_{k}^{4}}{4} & \frac{h_{k}^{3}}{2}\ell_{k} & \frac{h_{k}^{2}\ell_{k}^{2}}{4}\\ 639 | & \mathrm{symm.} & & & h_{k}^{2}\ell_{k}^{2} & h_{k}\frac{\ell_{k}^{3}}{2}\\ 640 | & & & & & \frac{\ell_{k}^{4}}{4} 641 | \end{array}\right] 642 | \] 643 | 644 | \end_inset 645 | 646 | This is now of full rank, provided that enough neighbor points 647 | \begin_inset Formula $x_{k}$ 648 | \end_inset 649 | 650 | are used in the calculation (considering that we are least-squares fitting 651 | a general quadratic polynomial in the plane; see below). 
652 | \end_layout 653 | 654 | \begin_layout Standard 655 | At this point the only error — considering only 656 | \begin_inset Formula $\overline{f}_{k}$ 657 | \end_inset 658 | 659 | and the data 660 | \begin_inset Formula $f_{k}$ 661 | \end_inset 662 | 663 | — is the 664 | \emph on 665 | RMS (root mean square) error 666 | \emph default 667 | of the least-squares fit, 668 | \begin_inset Formula $\min\sqrt{2G}$ 669 | \end_inset 670 | 671 | (where the minimum occurs at the solution point). 672 | The RMS error measures how well the model adheres to each data point, on 673 | average. 674 | The obtained coefficients are optimal: out of all functions of the form 675 | 676 | \begin_inset CommandInset ref 677 | LatexCommand eqref 678 | reference "eq:approx" 679 | 680 | \end_inset 681 | 682 | with 683 | \begin_inset Formula $a_{j}$ 684 | \end_inset 685 | 686 | as parameters, the solution of 687 | \begin_inset CommandInset ref 688 | LatexCommand eqref 689 | reference "eq:lineq" 690 | 691 | \end_inset 692 | 693 | – 694 | \begin_inset CommandInset ref 695 | LatexCommand eqref 696 | reference "eq:bj" 697 | 698 | \end_inset 699 | 700 | gives the smallest possible RMS error for the fit. 701 | \end_layout 702 | 703 | \begin_layout Standard 704 | Then — again after the fact — we observe that these optimal 705 | \begin_inset Formula $a_{j}$ 706 | \end_inset 707 | 708 | are pretty good also for use in a Taylor approximation 709 | \begin_inset Note Note 710 | status open 711 | 712 | \begin_layout Plain Layout 713 | , precisely because they minimize the RMS error of the fit 714 | \end_layout 715 | 716 | \end_inset 717 | 718 | . 719 | The solution is, in the least-squares sense, the best quadratic polynomial 720 | of 721 | \begin_inset Formula $(x,y)$ 722 | \end_inset 723 | 724 | for locally approximating 725 | \begin_inset Formula $f(x)$ 726 | \end_inset 727 | 728 | around 729 | \begin_inset Formula $x_{i}$ 730 | \end_inset 731 | 732 | . 
733 | (The fit 734 | \begin_inset CommandInset ref 735 | LatexCommand eqref 736 | reference "eq:approx" 737 | 738 | \end_inset 739 | 740 | is linear in 741 | \begin_inset Formula $a_{j}$ 742 | \end_inset 743 | 744 | , but quadratic in 745 | \begin_inset Formula $(x,y)$ 746 | \end_inset 747 | 748 | .) Also the Taylor approximation, truncated after the second-order terms, 749 | is a quadratic polynomial approximating 750 | \begin_inset Formula $f(x)$ 751 | \end_inset 752 | 753 | around 754 | \begin_inset Formula $x_{i}$ 755 | \end_inset 756 | 757 | . 758 | Thus 759 | \emph on 760 | we 761 | \emph default 762 | 763 | \emph on 764 | interpret the quadratic fit as a (response surface) model 765 | \emph default 766 | for 767 | \begin_inset Formula $f(x)$ 768 | \end_inset 769 | 770 | near 771 | \begin_inset Formula $x_{i}$ 772 | \end_inset 773 | 774 | , and thus the 775 | \begin_inset Formula $a_{j}$ 776 | \end_inset 777 | 778 | as approximations to the Taylor coefficients of 779 | \begin_inset Formula $f$ 780 | \end_inset 781 | 782 | (whence also as the numerical approximations to the derivatives). 783 | \end_layout 784 | 785 | \begin_layout Standard 786 | However, it must be emphasized that this gives rise to 787 | \emph on 788 | modeling error 789 | \emph default 790 | , because the 791 | \begin_inset Formula $a_{j}$ 792 | \end_inset 793 | 794 | are 795 | \emph on 796 | not 797 | \emph default 798 | the exact coefficients of the true Taylor expansion of 799 | \begin_inset Formula $f(x)$ 800 | \end_inset 801 | 802 | around 803 | \begin_inset Formula $x_{i}$ 804 | \end_inset 805 | 806 | . 807 | Indeed, strictly speaking, the data may not even describe a function admitting 808 | such an expansion! 
Even if the data admits an underlying function, and 809 | it happens to be in 810 | \begin_inset space ~ 811 | \end_inset 812 | 813 | 814 | \begin_inset Formula $C^{2}$ 815 | \end_inset 816 | 817 | , there may be numerical and/or experimental noise in the data points, depending 818 | on the data source. 819 | (This 820 | \emph on 821 | inexact data 822 | \emph default 823 | is another separate error source.) Also, in the general case the fit will 824 | not be exact, i.e. 825 | the RMS error will be nonzero. 826 | \end_layout 827 | 828 | \begin_layout Standard 829 | From this viewpoint, WLSQM would be more accurately advertised as a method 830 | for response surface modeling (RSM), for computing a local quadratic response 831 | surface in arbitrary geometries, instead of as a method for numerical different 832 | iation. 833 | \end_layout 834 | 835 | \begin_layout Standard 836 | Regarding numerical differentiation, the natural follow-up question is, 837 | what is the total error arising from approximating the function 838 | \begin_inset Formula $f(x)$ 839 | \end_inset 840 | 841 | locally as the quadratic polynomial fit? 
The (original, not truncated) 842 | Taylor series, at a general point 843 | \begin_inset Formula $x$ 844 | \end_inset 845 | 846 | in the neighborhood of 847 | \begin_inset Formula $x_{i}$ 848 | \end_inset 849 | 850 | , is 851 | \begin_inset Formula 852 | \begin{equation} 853 | \widehat{f}_{i}(x)=\widehat{f}_{i}+h\frac{\partial\widehat{f}_{i}}{\partial x}+\ell\frac{\partial\widehat{f}_{i}}{\partial y}+\frac{h^{2}}{2}\frac{\partial^{2}\widehat{f}_{i}}{\partial x^{2}}+h\ell\frac{\partial^{2}\widehat{f}_{i}}{\partial x\partial y}+\frac{\ell^{2}}{2}\frac{\partial^{2}\widehat{f}_{i}}{\partial y^{2}}+O(h^{3}\,,\ell^{3})\;,\label{eq:Taygeneral} 854 | \end{equation} 855 | 856 | \end_inset 857 | 858 | where on the right-hand side, the function and the derivatives are evaluated 859 | at 860 | \begin_inset Formula $x=x_{i}$ 861 | \end_inset 862 | 863 | , and 864 | \begin_inset Formula $x-x_{i}=(h,\ell)$ 865 | \end_inset 866 | 867 | . 868 | The quadratic polynomial fit is 869 | \begin_inset Formula 870 | \begin{equation} 871 | Q(x):=a_{0}+ha_{1}+\ell a_{2}+\frac{h^{2}}{2}a_{3}+h\ell a_{4}+\frac{\ell^{2}}{2}a_{5}\;,\label{eq:Qx} 872 | \end{equation} 873 | 874 | \end_inset 875 | 876 | where the 877 | \begin_inset Formula $a_{j}$ 878 | \end_inset 879 | 880 | are obtained from the least-squares optimization. 
881 | The total error in the function value, at a point 882 | \begin_inset Formula $x$ 883 | \end_inset 884 | 885 | , is their difference 886 | \begin_inset Formula 887 | \begin{align} 888 | \mathrm{err}(x) & :=f(x)-Q(x)\overset{\text{near }x_{i}}{=}\widehat{f}_{i}(x)-Q(x)\nonumber \\ 889 | & =(\widehat{f}_{i}-a_{0})+h(\frac{\partial\widehat{f}_{i}}{\partial x}-a_{1})+\ell(\frac{\partial\widehat{f}_{i}}{\partial y}-a_{2})+\frac{h^{2}}{2}(\frac{\partial^{2}\widehat{f}_{i}}{\partial x^{2}}-a_{3})+h\ell(\frac{\partial^{2}\widehat{f}_{i}}{\partial x\partial y}-a_{4})+\frac{\ell^{2}}{2}(\frac{\partial^{2}\widehat{f}_{i}}{\partial y^{2}}-a_{5})+O(h^{3}\,,\ell^{3})\;.\label{eq:errx} 890 | \end{align} 891 | 892 | \end_inset 893 | 894 | When the Taylor series is truncated after the quadratic terms, the asymptotic 895 | term gives the 896 | \emph on 897 | truncation error 898 | \emph default 899 | . 900 | The rest of the error is due to 901 | \emph on 902 | modeling error 903 | \emph default 904 | in the coefficients 905 | \begin_inset Formula $a_{j}$ 906 | \end_inset 907 | 908 | , i.e. 909 | the parenthetical expressions in 910 | \begin_inset CommandInset ref 911 | LatexCommand eqref 912 | reference "eq:errx" 913 | 914 | \end_inset 915 | 916 | . 917 | \end_layout 918 | 919 | \begin_layout Standard 920 | It is obvious that in the general case, the modeling error will be nonzero 921 | (even if we assume the data to be exact): the function 922 | \begin_inset Formula $f$ 923 | \end_inset 924 | 925 | is generally not a quadratic polynomial, and hence no quadratic polynomial 926 | can represent it exactly. 927 | To reiterate: the coefficients 928 | \begin_inset Formula $a_{j}$ 929 | \end_inset 930 | 931 | are the coefficients of the quadratic fit 932 | \begin_inset Formula $Q(x)$ 933 | \end_inset 934 | 935 | — they are 936 | \emph on 937 | not 938 | \emph default 939 | the Taylor coefficients of 940 | \begin_inset Formula $f$ 941 | \end_inset 942 | 943 | ! 
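To make the distinction concrete, here is a standalone NumPy sketch (not wlsqm's own code; the point layout and test functions are arbitrary choices): with exactly quadratic data the least-squares fit recovers the Taylor coefficients, but with cubic data the fitted coefficient a_1 differs from the true Taylor coefficient df/dx = 0 at the expansion point.

```python
import numpy as np

# Ten scattered sample offsets (h, l) around the expansion point (arbitrary layout).
rng = np.random.default_rng(0)
h, l = (rng.random((10, 2)) - 0.5).T
A = np.column_stack([np.ones_like(h), h, l, h**2 / 2, h * l, l**2 / 2])

# Exactly quadratic data: the least-squares fit recovers the Taylor
# coefficients (a_j) = (1, 2, 3, 4, 5, 6) up to rounding error.
a_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
a_fit, *_ = np.linalg.lstsq(A, A @ a_true, rcond=None)

# Cubic data f = h^3: the true Taylor coefficient of the first x-derivative
# at the expansion point is 0, but the fitted a_1 is generally nonzero;
# this is precisely the modeling error discussed in the text.
b_fit, *_ = np.linalg.lstsq(A, h**3, rcond=None)
```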
944 | \end_layout 945 | 946 | \begin_layout Standard 947 | However, they are a computable, close relative of the Taylor coefficients 948 | of the unknown function 949 | \begin_inset Formula $f$ 950 | \end_inset 951 | 952 | , since the Taylor series of 953 | \begin_inset Formula $Q(x)$ 954 | \end_inset 955 | 956 | expanded at 957 | \begin_inset Formula $x_{i}$ 958 | \end_inset 959 | 960 | is, quite simply, 961 | \begin_inset Formula $Q(x)$ 962 | \end_inset 963 | 964 | itself. 965 | (Because 966 | \begin_inset Formula $Q(x)$ 967 | \end_inset 968 | 969 | is a polynomial, no asymptotic error term appears.) 970 | \end_layout 971 | 972 | \begin_layout Standard 973 | Thus the magnitude of the total error depends on how well the coefficients 974 | 975 | \begin_inset Formula $a_{j}$ 976 | \end_inset 977 | 978 | approximate the Taylor coefficients of 979 | \begin_inset Formula $f$ 980 | \end_inset 981 | 982 | ; or in other words, how close 983 | \begin_inset Formula $f$ 984 | \end_inset 985 | 986 | is (locally) to a quadratic polynomial (which — given exact data and exact 987 | arithmetic — can be fitted exactly; note that both assumptions are required, 988 | as inexact data will give rise to nonzero RMS error in the fit, i.e. 989 | then the fit will not be exact). 990 | \end_layout 991 | 992 | \begin_layout Standard 993 | This obviously depends on the neighborhood size, due to the asymptotic term 994 | describing the truncation error in 995 | \begin_inset CommandInset ref 996 | LatexCommand eqref 997 | reference "eq:Taygeneral" 998 | 999 | \end_inset 1000 | 1001 | . 1002 | The asymptotic term of the Taylor series says that if the neighborhood 1003 | is small enough, any 1004 | \begin_inset Formula $f\in C^{2}$ 1005 | \end_inset 1006 | 1007 | is locally close to a quadratic polynomial. 
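This asymptotic closeness is easy to probe numerically. In the following standalone sketch (the sampling layout and the test function exp(x+y) are arbitrary choices, not part of wlsqm), the maximum error of the best quadratic fit is measured for a neighborhood of size h and again for h/2; for an O(h^3) discrepancy the error should shrink by roughly a factor of 8.

```python
import numpy as np

def quadratic_fit(pts, fvals):
    # Least-squares fit of Q = a0 + a1*h + a2*l + a3*h^2/2 + a4*h*l + a5*l^2/2,
    # where (h, l) are offsets from the expansion point (here the origin).
    h, l = pts[:, 0], pts[:, 1]
    A = np.column_stack([np.ones_like(h), h, l, h**2 / 2, h * l, l**2 / 2])
    a, *_ = np.linalg.lstsq(A, fvals, rcond=None)
    return a

def fit_error(scale):
    # 30 sample points in a square neighborhood of half-width scale/2 around the
    # origin; the same base layout is reused at every scale (self-similar geometry).
    rng = np.random.default_rng(42)
    pts = scale * (rng.random((30, 2)) - 0.5)
    f = lambda x, y: np.exp(x + y)  # smooth, but not a quadratic polynomial
    a = quadratic_fit(pts, f(pts[:, 0], pts[:, 1]))
    # evaluate model vs. function on a ring inside the neighborhood
    t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    h, l = 0.25 * scale * np.cos(t), 0.25 * scale * np.sin(t)
    Q = a[0] + a[1] * h + a[2] * l + a[3] * h**2 / 2 + a[4] * h * l + a[5] * l**2 / 2
    return np.max(np.abs(f(h, l) - Q))

e1, e2 = fit_error(0.2), fit_error(0.1)  # halve the neighborhood size
ratio = e1 / e2  # should be roughly 2^3 = 8 for an O(h^3) error
```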
1008 | This — for sufficiently small neighborhoods — should make the modeling 1009 | error (and thus the error in the numerical derivatives) comparable to 1010 | \begin_inset Formula $O(h^{3}\,,\ell^{3})$ 1011 | \end_inset 1012 | 1013 | . 1014 | \end_layout 1015 | 1016 | \begin_layout Standard 1017 | This suggests that 1018 | \begin_inset Formula $\mathrm{err}(x)$ 1019 | \end_inset 1020 | 1021 | — with exact data and exact arithmetic — should also be comparable to 1022 | \begin_inset Formula $O(h^{3}\,,\ell^{3})$ 1023 | \end_inset 1024 | 1025 | . 1026 | (With inexact data, one needs to take into account that 1027 | \begin_inset Formula $f_{k}=f(x_{k})+\delta_{k}$ 1028 | \end_inset 1029 | 1030 | and work from that.) 1031 | \end_layout 1032 | 1033 | \begin_layout Standard 1034 | Observe also that there are six 1035 | \begin_inset Formula $a_{j}$ 1036 | \end_inset 1037 | 1038 | ( 1039 | \begin_inset Formula $j=0,\dots,5$ 1040 | \end_inset 1041 | 1042 | ) in 1043 | \begin_inset CommandInset ref 1044 | LatexCommand eqref 1045 | reference "eq:Qx" 1046 | 1047 | \end_inset 1048 | 1049 | . 1050 | Hence, with exact arithmetic, six data values for 1051 | \begin_inset CommandInset ref 1052 | LatexCommand eqref 1053 | reference "eq:approx" 1054 | 1055 | \end_inset 1056 | 1057 | , i.e. 1058 | six neighbors 1059 | \begin_inset Formula $x_{k}$ 1060 | \end_inset 1061 | 1062 | (five if 1063 | \begin_inset Formula $f_{i}$ 1064 | \end_inset 1065 | 1066 | is known, eliminating 1067 | \begin_inset Formula $a_{0}$ 1068 | \end_inset 1069 | 1070 | ), uniquely determine the quadratic function 1071 | \begin_inset Formula $Q(x)$ 1072 | \end_inset 1073 | 1074 | . 1075 | (Fewer data values lead to an underdetermined system, which has an infinite 1076 | family of solutions.) 
More data values lead to an overdetermined system, 1077 | which is then taken care of by least-squares fitting: picking the quadratic 1078 | polynomial that best approximates the data (which generally did not come 1079 | from a quadratic polynomial). 1080 | \end_layout 1081 | 1082 | \begin_layout Standard 1083 | This explains why the classical WLSQM takes 1084 | \begin_inset Formula $6$ 1085 | \end_inset 1086 | 1087 | neighbors (here 1088 | \begin_inset Formula $7$ 1089 | \end_inset 1090 | 1091 | if 1092 | \begin_inset Formula $f_{i}$ 1093 | \end_inset 1094 | 1095 | is not known) to perform the fitting; it is the smallest number of (nondegenera 1096 | te!) neighbors 1097 | \begin_inset Formula $x_{k}$ 1098 | \end_inset 1099 | 1100 | that makes the quadratic fitting problem overdetermined (hence actually 1101 | needing the least-squares procedure). 1102 | (The overdeterminedness also slightly protects against inexact data, so 1103 | that one data point that is slightly off will not completely change the 1104 | fit.) 1105 | \end_layout 1106 | 1107 | \begin_layout Standard 1108 | \begin_inset Newpage pagebreak 1109 | \end_inset 1110 | 1111 | 1112 | \end_layout 1113 | 1114 | \begin_layout Standard 1115 | But why is the result not exact, i.e. 1116 | why is there modeling error in 1117 | \begin_inset CommandInset ref 1118 | LatexCommand eqref 1119 | reference "eq:errx" 1120 | 1121 | \end_inset 1122 | 1123 | ? After all, the truncated Taylor expansion 1124 | \emph on 1125 | is 1126 | \emph default 1127 | the best local polynomial representation of 1128 | \begin_inset Formula $f$ 1129 | \end_inset 1130 | 1131 | , up to the given degree. 1132 | With exact data and arithmetic, how can the least-squares fit be anything 1133 | but the truncated Taylor expansion? 
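The counting and degeneracy arguments above can be checked via the rank of the 6-column matrix of quadratic monomials (a standalone sketch with arbitrarily chosen points, not wlsqm code): six scattered points generically give full rank 6, so Q is uniquely determined, while for six collinear points only the monomials 1, t, t^2 remain independent along the line, giving rank 3.

```python
import numpy as np

def design(pts):
    # rows of quadratic monomials [1, h, l, h^2/2, h*l, l^2/2]
    h, l = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(h), h, l, h**2 / 2, h * l, l**2 / 2])

rng = np.random.default_rng(1)
scattered = rng.random((6, 2)) - 0.5                        # six generic (nondegenerate) neighbors
collinear = np.outer(np.linspace(0.1, 0.6, 6), [1.0, 2.0])  # six points on the line l = 2h

rank_scattered = np.linalg.matrix_rank(design(scattered))   # 6: Q uniquely determined
rank_collinear = np.linalg.matrix_rank(design(collinear))   # 3: only 1, t, t^2 survive on a line
```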
1134 | \end_layout 1135 | 1136 | \begin_layout Standard 1137 | The key is in the definition of 1138 | \begin_inset Quotes eld 1139 | \end_inset 1140 | 1141 | best 1142 | \begin_inset Quotes erd 1143 | \end_inset 1144 | 1145 | . 1146 | In a Taylor series, as the truncation order is increased, with each added 1147 | term (given sufficient continuity of 1148 | \begin_inset Formula $f$ 1149 | \end_inset 1150 | 1151 | ) the asymptotic accuracy increases, without requiring changes to the already 1152 | computed coefficients. 1153 | The Taylor series, being the polynomial series expansion of 1154 | \begin_inset Formula $f$ 1155 | \end_inset 1156 | 1157 | , is optimal in the class of polynomial representations where the coefficients 1158 | are 1159 | \begin_inset Quotes eld 1160 | \end_inset 1161 | 1162 | final 1163 | \begin_inset Quotes erd 1164 | \end_inset 1165 | 1166 | in this sense. 1167 | This is indeed what leads to the common-sense notion of the Taylor series 1168 | being 1169 | \begin_inset Quotes eld 1170 | \end_inset 1171 | 1172 | the best polynomial representation 1173 | \begin_inset Quotes erd 1174 | \end_inset 1175 | 1176 | of 1177 | \begin_inset space ~ 1178 | \end_inset 1179 | 1180 | 1181 | \begin_inset Formula $f$ 1182 | \end_inset 1183 | 1184 | . 1185 | \end_layout 1186 | 1187 | \begin_layout Standard 1188 | However, nothing requires the truncated Taylor series to satisfy the least-squar 1189 | es property. 1190 | In the least-squares sense, 1191 | \emph on 1192 | there may exist better polynomials of the same degree 1193 | \emph default 1194 | to locally approximate 1195 | \begin_inset Formula $f$ 1196 | \end_inset 1197 | 1198 | . 
1199 | For a trivial 1D example: to represent 1200 | \begin_inset Formula $f(x)=x^{2}$ 1201 | \end_inset 1202 | 1203 | in an interval 1204 | \begin_inset Formula $x\in[-a,a]$ 1205 | \end_inset 1206 | 1207 | around the origin using a constant approximation, the Taylor series produces 1208 | 1209 | \begin_inset Formula $f\approx0$ 1210 | \end_inset 1211 | 1212 | . 1213 | However, the mean value across the interval is a 1214 | \begin_inset Quotes eld 1215 | \end_inset 1216 | 1217 | better 1218 | \begin_inset Quotes erd 1219 | \end_inset 1220 | 1221 | constant approximation in an integral least-squares sense. 1222 | \end_layout 1223 | 1224 | \begin_layout Standard 1225 | Indeed, a least-squares fit, as its order is increased, will change 1226 | \emph on 1227 | all 1228 | \emph default 1229 | of its coefficients; and it will do this to minimize the RMS error of the 1230 | fit. 1231 | (Be very careful: this is different from the modeling error in equation 1232 | 1233 | \begin_inset CommandInset ref 1234 | LatexCommand eqref 1235 | reference "eq:errx" 1236 | 1237 | \end_inset 1238 | 1239 | . 1240 | The RMS error only measures how well the model adheres to each data point, 1241 | on average; it does not see what the model is used for.) In the least-squares 1242 | fit, there is no asymptotic error term — the data points 1243 | \begin_inset space ~ 1244 | \end_inset 1245 | 1246 | 1247 | \begin_inset Formula $f_{k}$ 1248 | \end_inset 1249 | 1250 | , used in the fitting, implicitly contain the information also from all 1251 | the higher-order terms in the polynomial series expansion of 1252 | \begin_inset space ~ 1253 | \end_inset 1254 | 1255 | 1256 | \begin_inset Formula $f$ 1257 | \end_inset 1258 | 1259 | . 1260 | The fit then eliminates as much of the difference between the chosen model 1261 | and the data as is possible. 
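The trivial 1D example above can be verified directly (a standalone sketch; the interval discretization is an arbitrary choice): for f(x) = x^2 on [-a, a], the mean value a^2/3 beats the degree-0 Taylor approximation 0 in the RMS sense.

```python
import numpy as np

# f(x) = x^2 on [-a, a], approximated by a single constant c.
a = 1.0
x = np.linspace(-a, a, 10001)
f = x**2

def rms(c):
    # discrete stand-in for the integral RMS error of the constant model c
    return np.sqrt(np.mean((f - c)**2))

rms_taylor = rms(0.0)       # degree-0 Taylor polynomial: c = f(0) = 0
rms_lsq = rms(np.mean(f))   # least-squares constant: the mean value, c = a^2/3
```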
1262 | \end_layout 1263 | 1264 | \begin_layout Standard 1265 | It is not surprising that the price that must be paid for this increase 1266 | of accuracy in interpolation is giving up the 1267 | \begin_inset Quotes eld 1268 | \end_inset 1269 | 1270 | finality 1271 | \begin_inset Quotes erd 1272 | \end_inset 1273 | 1274 | of the coefficients in the above sense, since the Taylor series is already 1275 | optimal in its class. 1276 | \end_layout 1277 | 1278 | \begin_layout Standard 1279 | We conclude that in the general case the result cannot be exact, because 1280 | we are dealing with two very different entities, which coincide only under 1281 | very restrictive assumptions. 1282 | \end_layout 1283 | 1284 | \begin_layout Standard 1285 | Note also that 1286 | \begin_inset Quotes eld 1287 | \end_inset 1288 | 1289 | best 1290 | \begin_inset Quotes erd 1291 | \end_inset 1292 | 1293 | obviously depends on context. 1294 | For response surface modeling, the WLSQM quadratic polynomial fit is optimal. 1295 | However, for numerical differentiation, the fact that the obtained coefficients 1296 | do not exactly coincide with the Taylor series coefficients of 1297 | \begin_inset Formula $f$ 1298 | \end_inset 1299 | 1300 | produces an undesirable source of numerical error (modeling error). 
1301 | \end_layout 1302 | 1303 | \end_body 1304 | \end_document 1305 | -------------------------------------------------------------------------------- /doc/wlsqm_gen.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/doc/wlsqm_gen.pdf -------------------------------------------------------------------------------- /example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/example.png -------------------------------------------------------------------------------- /examples/expertsolver_example.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """A minimal usage example for ExpertSolver. 3 | 4 | JJ 2017-03-28 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | 9 | import numpy as np 10 | 11 | import scipy.spatial.ckdtree 12 | 13 | import matplotlib.pyplot as plt 14 | import mpl_toolkits.mplot3d.axes3d 15 | 16 | import wlsqm 17 | 18 | 19 | def project_onto_regular_grid_2D(x, F, nvis=101, fit_order=1, nk=10): 20 | """Project scalar data from a 2D point cloud onto a regular grid. 21 | 22 | Useful for plotting. Uses the WLSQM meshless method. 23 | 24 | The bounding box of the x data is automatically used as the bounds of the generated regular grid. 25 | 26 | Parameters: 27 | x : rank-2 array, dtype np.float64 28 | Point cloud, one point per row. x[i,:] = (xi,yi) 29 | 30 | F : rank-1 array, dtype np.float64 31 | The corresponding function values. F[i] = F( x[i,:] ) 32 | 33 | nvis : int 34 | Number of points per axis in the generated regular grid. 35 | 36 | fit_order : int 37 | Order of the surrogate polynomial, one of [0,1,2,3,4]. 
38 | 39 | nk : int 40 | Number of nearest neighbors to use for fitting the model. 41 | 42 | Return value: 43 | tuple (X,Y,Z) 44 | where 45 | X,Y are rank-2 meshgrid arrays representing the generated regular grid, and 46 | Z is an array of the same shape, containing the corresponding data values. 47 | 48 | """ 49 | # Form the neighborhoods. 50 | 51 | # index the input points for fast searching 52 | tree = scipy.spatial.cKDTree( data=x ) 53 | 54 | # Find the nk nearest neighbors of each input point. 55 | # 56 | # The +1 is for the point itself, since it is always the nearest to itself. 57 | # 58 | # (cKDTree.query() supports querying for arbitrary x; here we simply query the same points that are already in the tree.) 59 | # 60 | dd,ii = tree.query( x, 1 + nk ) 61 | 62 | # Take only the neighbors of points[i], excluding the point itself. 63 | # 64 | ii = ii[:,1:] # points[ ii[i,k] ] is the kth nearest neighbor of points[i]. Shape of ii is (npoints, nk). 65 | 66 | # neighbor point indices (pointing to rows in x[]); typecast to int32 67 | hoods = np.array( ii, dtype=np.int32 ) 68 | 69 | npoints = x.shape[0] 70 | nk_array = nk * np.ones( (npoints,), dtype=np.int32 ) # number of neighbors, i.e. nk_array[i] is the number of actually used columns in hoods[i,:] 71 | 72 | # Construct the model by least-squares fitting 73 | # 74 | fit_order_array = fit_order * np.ones( (npoints,), dtype=np.int32 ) 75 | knowns_array = wlsqm.b2_F * np.ones( (npoints,), dtype=np.int64 ) # bitmask! 
wlsqm.b* 76 | wm_array = wlsqm.WEIGHT_UNIFORM * np.ones( (npoints,), dtype=np.int32 ) 77 | solver = wlsqm.ExpertSolver( dimension=2, 78 | nk=nk_array, 79 | order=fit_order_array, 80 | knowns=knowns_array, 81 | weighting_method=wm_array, 82 | algorithm=wlsqm.ALGO_BASIC, 83 | do_sens=False, 84 | max_iter=10, # must be an int even though this parameter is not used in ALGO_BASIC mode 85 | ntasks=8, 86 | debug=False ) 87 | 88 | no = wlsqm.number_of_dofs( dimension=2, order=fit_order ) 89 | fi = np.empty( (npoints,no), dtype=np.float64 ) 90 | fi[:,0] = F # fi[i,0] contains the function value at point x[i,:] 91 | 92 | solver.prepare( xi=x, xk=x[hoods] ) # generate problem matrices from the geometry of the point cloud 93 | solver.solve( fk=fi[hoods,0], fi=fi, sens=None ) # compute least-squares fit to data 94 | 95 | 96 | # generate the regular grid for output 97 | # 98 | xx = np.linspace( np.min(x[:,0]), np.max(x[:,0]), nvis ) 99 | yy = np.linspace( np.min(x[:,1]), np.max(x[:,1]), nvis ) 100 | X,Y = np.meshgrid(xx,yy) 101 | 102 | # make a flat list of grid points (rank-2 array, one point per row) 103 | # 104 | Xlin = np.reshape(X, -1) 105 | Ylin = np.reshape(Y, -1) 106 | xout = np.empty( (len(Xlin), 2), dtype=np.float64 ) 107 | xout[:,0] = Xlin 108 | xout[:,1] = Ylin 109 | 110 | # Using the model, interpolate onto the regular grid 111 | # 112 | solver.prep_interpolate() # prepare global model 113 | Z,mi = solver.interpolate( xout, mode='nearest' ) # use the nearest local model; fast, surprisingly accurate 114 | # if a reasonable number of points (and continuous-looking 115 | # although technically has jumps over Voronoi cell boundaries) 116 | # when mode="nearest", "mi" is an array containing the index of the local model (which belongs to x[mi,:]) used for each evaluation 117 | 118 | return (X, Y, np.reshape( Z, X.shape )) 119 | 120 | 121 | def plot_wireframe( data, figno=None ): 122 | """Make and label a wireframe plot. 
123 | 124 | Parameters: 125 | data : dict 126 | key : "x","y","z" 127 | value : tuple (rank-2 array in meshgrid format, axis label) 128 | 129 | Return value: 130 | ax 131 | The Axes3D object that was used for plotting. 132 | """ 133 | # http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html 134 | fig = plt.figure(figno) 135 | 136 | # Axes3D has a tendency to underestimate how much space it needs; it draws its labels 137 | # outside the window area in certain orientations. 138 | # 139 | # This causes the labels to be clipped, which looks bad. We prevent this by creating the axes 140 | # in a slightly smaller rect (leaving a margin). This way the labels will show - outside the Axes3D, 141 | # but still inside the figure window. 142 | # 143 | # The final touch is to set the window background to a matching white, so that the 144 | # background of the figure appears uniform. 145 | # 146 | fig.patch.set_color( (1,1,1) ) 147 | fig.patch.set_alpha( 1.0 ) 148 | x0y0wh = [ 0.02, 0.02, 0.96, 0.96 ] # left, bottom, width, height (here as fraction of subplot area) 149 | 150 | ax = mpl_toolkits.mplot3d.axes3d.Axes3D(fig, rect=x0y0wh) 151 | 152 | X,xlabel = data["x"] 153 | Y,ylabel = data["y"] 154 | Z,zlabel = data["z"] 155 | ax.plot_wireframe( X, Y, Z ) 156 | 157 | ax.view_init(34, -40) 158 | ax.axis('tight') 159 | plt.xlabel(xlabel) 160 | plt.ylabel(ylabel) 161 | ax.set_title(zlabel) 162 | 163 | return ax 164 | 165 | 166 | def main(): 167 | x = np.random.random( (1000, 2) ) # point cloud (no mesh topology!) 
168 | F = np.sin(np.pi*x[:,0]) * np.cos(np.pi*x[:,1]) # function values on the point cloud 169 | X,Y,Z = project_onto_regular_grid_2D(x, F, fit_order=2, nk=30) 170 | plot_wireframe( {"x" : (X, r"$x$"), 171 | "y" : (Y, r"$y$"), 172 | "z" : (Z, r"$f(x,y)$")} ) 173 | 174 | if __name__ == '__main__': 175 | main() 176 | plt.show() 177 | -------------------------------------------------------------------------------- /examples/lapackdrivers_example.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | """Performance benchmarking and usage examples for the wlsqm.utils.lapackdrivers module. 4 | 5 | JJ 2016-11-02 6 | """ 7 | 8 | from __future__ import division, print_function, absolute_import 9 | 10 | import time 11 | 12 | import numpy as np 13 | from numpy.linalg import solve as numpy_solve # for comparison purposes 14 | 15 | import matplotlib.pyplot as plt 16 | 17 | try: 18 | import wlsqm.utils.lapackdrivers as drivers 19 | except ImportError: 20 | import sys 21 | sys.exit( "WLSQM not found; is it installed?" ) 22 | 23 | # from find_neighbors2.py 24 | class SimpleTimer: 25 | def __init__(self, label="", n=None): 26 | self.label = label 27 | self.n = n # number of repetitions done inside the "with..." section (for averaging in timing info) 28 | 29 | def __enter__(self): 30 | self.t0 = time.time() 31 | return self 32 | 33 | def __exit__(self, errtype, errvalue, traceback): 34 | dt = time.time() - self.t0 35 | identifier = ("%s" % self.label) if len(self.label) else "time taken: " 36 | avg = (", avg. %gs per run" % (dt/self.n)) if self.n is not None else "" 37 | print( "%s%gs%s" % (identifier, dt, avg) ) 38 | 39 | # from util.py 40 | def f5(seq, idfun=None): 41 | """Uniqify a list (remove duplicates). 42 | 43 | This is the fast order-preserving uniqifier "f5" from 44 | http://www.peterbe.com/plog/uniqifiers-benchmark 45 | 46 | The list does not need to be sorted. 
47 | 48 | The return value is the uniqified list. 49 | 50 | """ 51 | # order preserving 52 | if idfun is None: 53 | def idfun(x): return x 54 | seen = {} 55 | result = [] 56 | for item in seq: 57 | marker = idfun(item) 58 | # in old Python versions: 59 | # if seen.has_key(marker) 60 | # but in new ones: 61 | if marker in seen: continue 62 | seen[marker] = 1 63 | result.append(item) 64 | return result 65 | 66 | 67 | 68 | def main(): 69 | # # exact solution is (3/10, 2/5, 0) 70 | # A = np.array( ( (2., 1., 3.), 71 | # (2., 6., 8.), 72 | # (6., 8., 18.) ), dtype=np.float64, order='F' ) 73 | # b = np.array( (1., 3., 5.), dtype=np.float64 ) 74 | 75 | # # symmetric matrix for testing symmetric solver 76 | # A = np.array( ( (2., 1., 3.), 77 | # (1., 6., 8.), 78 | # (3., 8., 18.) ), dtype=np.float64, order='F' ) 79 | # b = np.array( (1., 3., 5.), dtype=np.float64 ) 80 | 81 | # random matrix 82 | n = 5 83 | A = np.random.sample( (n,n) ) 84 | A = np.array( A, dtype=np.float64, order='F' ) 85 | drivers.symmetrize( A ) # fast Cython implementation of A = 0.5 * (A + A.T) 86 | b = np.random.sample( (n,) ) 87 | 88 | # test that it works 89 | 90 | x = numpy_solve(A, b) 91 | print( "NumPy:", x ) 92 | 93 | A2 = A.copy(order='F') 94 | x2 = b.copy() 95 | drivers.symmetric(A2, x2) 96 | print( "dsysv:", x2 ) 97 | 98 | A3 = A.copy(order='F') 99 | x3 = b.copy() 100 | drivers.general(A3, x3) 101 | print( "dgesv:", x3 ) 102 | 103 | assert (np.abs(x - x3) < 1e-10).all(), "Something went wrong, solutions do not match" # check general solver first 104 | assert (np.abs(x - x2) < 1e-10).all(), "Something went wrong, solutions do not match" # then check symmetric solver 105 | 106 | 107 | # test performance 108 | 109 | # for verification only - very slow (serial only!) 
110 | use_numpy = True 111 | 112 | # parallel processing 113 | ntasks = 8 114 | 115 | # # overview, somewhat fast but not very accurate 116 | # sizes = f5( map( lambda x: int(x), np.ceil(3*np.logspace(0, 3, 21, dtype=int)) ) ) 117 | # reps = map( lambda x: int(x), 10.**(4 - np.log10(sizes)) ) 118 | 119 | # # "large n" 120 | # sizes = f5( map( lambda x: int(x), np.ceil(3*np.logspace(2, 3, 21, dtype=int)) ) ) 121 | # reps = map( lambda x: int(x), 10.**(5 - np.log10(sizes)) ) 122 | 123 | # "small n" (needs more repetitions to eliminate noise from other running processes since a single solve is very fast) 124 | sizes = f5( map( lambda x: int(x), np.ceil(3*np.logspace(0, 2, 21, dtype=int)) ) ) 125 | reps = map( lambda x: int(x), 10.**(6 - np.log10(sizes)) ) 126 | 127 | print( "performance test: %d tasks, sizes %s" % (ntasks, sizes) ) 128 | 129 | results1 = np.empty( (len(sizes),), dtype=np.float64 ) 130 | results2 = np.empty( (len(sizes),), dtype=np.float64 ) 131 | results3 = np.empty( (len(sizes),), dtype=np.float64 ) 132 | results4 = np.empty( (len(sizes),), dtype=np.float64 ) 133 | results5 = np.empty( (len(sizes),), dtype=np.float64 ) 134 | results6 = np.empty( (len(sizes),), dtype=np.float64 ) 135 | results7 = np.empty( (len(sizes),), dtype=np.float64 ) 136 | 137 | # # many LHS (completely independent problems) 138 | # n = 5 139 | # reps=int(1e5) 140 | # A = np.random.sample( (n,n,reps) ) 141 | # A = 0.5 * (A + A.transpose(1,0,2)) # symmetrize 142 | # A = np.array( A, dtype=np.float64, order='F' ) 143 | # b = np.random.sample( (n,reps) ) 144 | # b = np.array( b, dtype=np.float64, order='F' ) 145 | # with SimpleTimer(label="msymmetric ", n=reps) as s: 146 | # drivers.msymmetricp(A, b, ntasks) 147 | # with SimpleTimer(label="mgeneral ", n=reps) as s: 148 | # drivers.mgeneralp(A, b, ntasks) 149 | 150 | 151 | for j,item in enumerate(zip(sizes,reps)): 152 | n,r = item 153 | print( "testing size %d, reps = %d" % (n, r) ) 154 | 155 | # same LHS, many different RHS 156 | 
157 | print( " prep same LHS, many RHS..." ) 158 | 159 | A = np.random.sample( (n,n) ) 160 | # symmetrize 161 | # A *= 0.5 162 | # A += A.T # not sure if this works 163 | A = np.array( A, dtype=np.float64, order='F' ) 164 | # A = 0.5 * (A + A.T) # symmetrize 165 | drivers.symmetrize(A) 166 | b = np.random.sample( (n,r) ) 167 | b = np.array( b, dtype=np.float64, order='F' ) 168 | 169 | print( " solve:" ) 170 | 171 | # # for verification only - very slow (Python loop, serial!) 172 | # if use_numpy: 173 | # t0 = time.time() 174 | # x = np.empty( (n,r), dtype=np.float64 ) 175 | # for k in range(r): 176 | # x[:,k] = numpy_solve(A, b[:,k]) 177 | # results1[j] = (time.time() - t0) / r 178 | 179 | print( " symmetricsp" ) 180 | t0 = time.time() 181 | A2 = A.copy(order='F') 182 | x2 = b.copy(order='F') 183 | drivers.symmetricsp(A2, x2, ntasks) 184 | results2[j] = (time.time() - t0) / r 185 | 186 | print( " generalsp" ) 187 | t0 = time.time() 188 | A3 = A.copy(order='F') 189 | x3 = b.copy(order='F') 190 | drivers.generalsp(A3, x3, ntasks) 191 | results3[j] = (time.time() - t0) / r 192 | 193 | # different LHS for each problem 194 | 195 | print( " prep independent problems..." ) 196 | 197 | A = np.random.sample( (n,n,r) ) 198 | # symmetrize 199 | # A *= 0.5 200 | # A += A.transpose(1,0,2) # this doesn't work 201 | A = np.array( A, dtype=np.float64, order='F' ) 202 | # A = 0.5 * (A + A.transpose(1,0,2)) 203 | drivers.msymmetrizep(A, ntasks) 204 | b = np.random.sample( (n,r) ) 205 | b = np.array( b, dtype=np.float64, order='F' ) 206 | 207 | print( " solve:" ) 208 | 209 | # for verification only - very slow (Python loop, serial!) 
210 | if use_numpy: 211 | print( " NumPy" ) 212 | t0 = time.time() 213 | x = np.empty( (n,r), dtype=np.float64, order='F' ) 214 | for k in range(r): 215 | x[:,k] = numpy_solve(A[:,:,k], b[:,k]) 216 | results1[j] = (time.time() - t0) / r 217 | 218 | print( " msymmetricp" ) 219 | t0 = time.time() 220 | A2 = A.copy(order='F') 221 | x2 = b.copy(order='F') 222 | drivers.msymmetricp(A2, x2, ntasks) 223 | results4[j] = (time.time() - t0) / r 224 | 225 | print( " mgeneralp" ) 226 | t0 = time.time() 227 | A3 = A.copy(order='F') 228 | x3 = b.copy(order='F') 229 | drivers.mgeneralp(A3, x3, ntasks) 230 | results5[j] = (time.time() - t0) / r 231 | 232 | print( " msymmetricfactorp & msymmetricfactoredp" ) # factor once, then it is possible to solve multiple times (although we now test only once) 233 | t0 = time.time() 234 | ipiv = np.empty( (n,r), dtype=np.intc, order='F' ) 235 | fact = A.copy(order='F') 236 | x4 = b.copy(order='F') 237 | drivers.msymmetricfactorp( fact, ipiv, ntasks ) 238 | drivers.msymmetricfactoredp( fact, ipiv, x4, ntasks ) 239 | results6[j] = (time.time() - t0) / r 240 | 241 | print( " mgeneralfactorp & mgeneralfactoredp" ) # factor once, then it is possible to solve multiple times (although we now test only once) 242 | t0 = time.time() 243 | ipiv = np.empty( (n,r), dtype=np.intc, order='F' ) 244 | fact = A.copy(order='F') 245 | x5 = b.copy(order='F') 246 | drivers.mgeneralfactorp( fact, ipiv, ntasks ) 247 | drivers.mgeneralfactoredp( fact, ipiv, x5, ntasks ) 248 | results7[j] = (time.time() - t0) / r 249 | 250 | if use_numpy: 251 | # print( np.max(np.abs(x - x3)) ) # DEBUG 252 | # print( np.max(np.abs(x - x5)) ) # DEBUG 253 | print( np.max(np.abs(x2 - x4)) ) # DEBUG 254 | assert (np.abs(x - x5) < 1e-10).all(), "Something went wrong, solutions do not match" # check general solver first 255 | assert (np.abs(x - x3) < 1e-10).all(), "Something went wrong, solutions do not match" # check general solver 256 | # assert (np.abs(x - x2) < 1e-5).all(), "Something 
went wrong, solutions do not match" # doesn't make sense to compare, DSYSV is more accurate for badly conditioned symmetric matrices 257 | assert (np.abs(x2 - x4) < 1e-7).all(), "Something went wrong, solutions do not match" # check symmetric solvers against each other 258 | # (not exactly the same algorithm (DSYTRS2 vs. DSYTRS), so there may be slight deviation) 259 | 260 | 261 | # old, serial only 262 | # 263 | # for j,item in enumerate(zip(sizes,reps)): 264 | # n,r = item 265 | # print( "testing size %d, reps = %d" % (n, r) ) 266 | # 267 | # A = np.random.sample( (n,n) ) 268 | # A = 0.5 * (A + A.T) # symmetrize 269 | # A = np.array( A, dtype=np.float64, order='F' ) 270 | # b = np.random.sample( (n,) ) 271 | # 272 | # t0 = time.time() 273 | # for k in range(r): 274 | # x = numpy_solve(A, b) 275 | # results1[j] = (time.time() - t0) / r 276 | # 277 | # t0 = time.time() 278 | # for k in range(r): 279 | # A2 = A.copy(order='F') 280 | # x2 = b.copy() 281 | # drivers.symmetric(A2, x2) 282 | # results2[j] = (time.time() - t0) / r 283 | # 284 | # t0 = time.time() 285 | # for k in range(r): 286 | # A3 = A.copy(order='F') 287 | # x3 = b.copy() 288 | # drivers.general(A3, x3) 289 | # results3[j] = (time.time() - t0) / r 290 | 291 | 292 | # visualize 293 | 294 | plt.figure(1) 295 | plt.clf() 296 | if use_numpy: 297 | plt.loglog(sizes, results1, 'k-', label='NumPy') 298 | plt.loglog(sizes, results2, 'b--', label='dsysv, same LHS, many RHS') 299 | plt.loglog(sizes, results3, 'b-', label='dgesv, same LHS, many RHS') 300 | plt.loglog(sizes, results4, 'r--', label='dsysv, independent problems') 301 | plt.loglog(sizes, results5, 'r-', label='dgesv, independent problems') 302 | plt.loglog(sizes, results6, 'g--', label='dsytrf+dsytrs, independent problems') 303 | plt.loglog(sizes, results7, 'g-', label='dgetrf+dgetrs, independent problems') 304 | plt.xlabel('n') 305 | plt.ylabel('t') 306 | plt.title('Average time per problem instance, %d parallel tasks' % (ntasks)) 307 | 
plt.axis('tight') 308 | plt.grid(b=True, which='both') 309 | plt.legend(loc='best') 310 | 311 | plt.savefig('figure1_latest.pdf') 312 | 313 | 314 | if __name__ == '__main__': 315 | main() 316 | plt.show() 317 | -------------------------------------------------------------------------------- /examples/sudoku_lhs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Latin hypercube sampler with a sudoku-like constraint. 3 | 4 | Tested on Python 2.7 and 3.4. 5 | 6 | License: 2-clause BSD; copyright 2010-2017 Juha Jeronen and University of Jyväskylä. 7 | 8 | The sudoku LHS algorithm is a bit like the first stage in the design of an N-dimensional 9 | sudoku puzzle, hence the name: each "box" must have exactly the same number of samples, 10 | and no two samples may occur on the same hyperplane. The algorithm runs in linear time 11 | (w.r.t. number of samples) and consumes a linear amount of memory. 12 | 13 | The Sudoku LHS sampling method is inspired by, but not related to, orthogonal sampling. 14 | The latter refers to LHS sampling using orthogonal arrays; about that, see the articles 15 | B. Tang 1993: Orthogonal Array-Based Latin Hypercubes, 16 | A. B. Owen 1992: Orthogonal arrays for computer experiments, integration and 17 | visualization, 18 | K. Q. Ye 1998: Orthogonal column Latin hypercubes and their application in computer 19 | experiments 20 | for more information. 21 | 22 | Latin hypercube sampling is very classical (original paper: M. D. McKay, R. J. Beckman, 23 | W. J. Conover 1979: A Comparison of Three Methods for Selecting Values of Input Variables 24 | in the Analysis of Output from a Computer Code). 25 | See e.g. 
Wikipedia for a description: 26 | http://en.wikipedia.org/wiki/Latin_hypercube_sampling 27 | 28 | Historical note: 29 | The variant of sudoku sampling implemented here was originally developed as part of 30 | the SAVU project in 2010, and briefly mentioned in the paper 31 | Jeronen, J. SAVU: A Statistical Approach for Uncertain Data in Dynamics 32 | of Axially Moving Materials. In: A. Cangiani, R. Davidchack, E. Georgoulis, 33 | A. Gorban, J. Levesley, M. Tretyakov (eds.) Numerical Mathematics and 34 | Advanced Applications 2011: Proceedings of ENUMATH 2011, the 9th European Conference 35 | on Numerical Mathematics and Advanced Applications, Leicester, September 2011, 36 | 831-839, Springer, 2013. 37 | 38 | Later, Shields and Zhang independently developed and published a variant of sudoku sampling 39 | in the paper 40 | Michael D. Shields and Jiaxin Zhang. The generalization of Latin hypercube sampling. 41 | Reliability Engineering & System Safety 148:96-108, 2016. 42 | http://doi.org/10.1016/j.ress.2015.12.002 43 | 44 | JJ 2010-09-23 (MATLAB version) 45 | JJ 2012-03-12 (Python version) 46 | JJ 2017-04-25 (Python3 compatibility, NumPyDoc style docstring, define __version__) 47 | """ 48 | 49 | from __future__ import division, print_function, absolute_import 50 | 51 | import numpy as np 52 | 53 | __version__ = '1.0.0' 54 | 55 | def sample(N,k,n, visualize=False, showdiag=False, verbose=False): 56 | """Create a coarsely `N`-dimensionally stratified latin hypercube sample (LHS) of range(`k` * `m`) in `N` dimensions. 57 | 58 | Parameters: 59 | N : int, >= 1 60 | number of dimensions 61 | k : int, >= 1 62 | number of large subdivisions (sudoku boxes, "subspaces") per dimension 63 | n : int, >= 1 64 | number of samples to place in each subspace 65 | visualize : bool (optional) 66 | If True, the results (projected into two dimensions pairwise) 67 | are plotted using Matplotlib when the sampling is finished. 
68 | showdiag : bool (optional) 69 | If True, and `N` >= 3, show also a one-dimensional projection 70 | of the result onto each axis. 71 | 72 | Implies "visualize". 73 | 74 | This should produce a straight line with no holes onto 75 | each subplot that is on the diagonal of the plot array; 76 | mainly intended for debug. 77 | verbose : bool (optional) 78 | If true, progress messages and warnings 79 | (for non-integer input) are printed. 80 | 81 | Return value: 82 | tuple (`S`, `m`), where: 83 | S : (`k` * `m`)-by-`N` rank-2 np.array 84 | where each row is an `N`-tuple of integers in range(`k` * `m`). 85 | 86 | m : int, >= 1 87 | number of bins per parameter in one subspace (i.e. sample slots 88 | per axis in one box). 89 | 90 | `m` = `n` * (`k` ** (`N` - 1)), but is provided as output for convenience. 91 | 92 | **Examples:** 93 | 94 | `N` = 2 dimensions, `k` = 3 subspaces per axis, `n` = 1 sample per subspace. 95 | `m` will be `n` * (`k` ** (`N` - 1)) = 1 * 3**(2-1) = 3. Plot the result and show progress messages:: 96 | 97 | S,m = sample(2, 3, 1, visualize=True, verbose=True) 98 | 99 | For comparison with the previous example, try this classical Latin hypercube 100 | that has 9 samples in total, plotting the result. We choose 9, because in 101 | the previous example, `k` * `m` = 3*3 = 9:: 102 | 103 | S,m = sample(2, 1, 9, visualize=True) 104 | 105 | **Notes:** 106 | 107 | If `k` = 1, the algorithm reduces to classical Latin hypercube sampling. 108 | 109 | If `N` = 1, the algorithm simply produces a random permutation of range(`k` * `m`). 110 | 111 | Let `m` = `n` * (`k` ** (`N` - 1)) denote the number of bins for one variable 112 | in one subspace. The total number of samples is always exactly `k` * `m`. 113 | Each component of a sample can take on values 0, 1, ..., (`k` * `m` - 1).
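The `k` = 1 reduction mentioned in the Notes can be sketched directly as classical LHS. The following standalone snippet (hypothetical helper name, independent of this module) demonstrates the Latin hypercube property that `sample` also guarantees globally:

```python
# Hypothetical sketch of classical LHS (the k = 1 special case): one
# independent random permutation per dimension places exactly one sample
# in each of the m bins along every axis.
import numpy as np

def classical_lhs(N, m):
    # column d holds the bin index of each of the m samples along dimension d
    return np.stack([np.random.permutation(m) for _ in range(N)], axis=1)

S = classical_lhs(2, 9)
for d in range(2):
    # Latin hypercube property: every bin index occurs exactly once per axis
    assert np.array_equal(np.sort(S[:, d]), np.arange(9))
```

The sudoku variant additionally constrains each of the `k` ** `N` boxes to receive the same number of samples; the sketch above enforces only the global per-axis constraint.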
114 | """ 115 | # sanity check input 116 | if not isinstance(N, int) or N < 1: 117 | raise ValueError("N must be int >= 1, got %g" % (N)) 118 | if not isinstance(k, int) or k < 1: 119 | raise ValueError("k must be int >= 1, got %g" % (k)) 120 | if not isinstance(n, int) or n < 1: 121 | raise ValueError("n must be int >= 1, got %g" % (n)) 122 | 123 | # showing the diagonal implies visualization 124 | if showdiag: 125 | visualize = True 126 | 127 | # Discussion. 128 | 129 | # Proof that the following algorithm implements a Sudoku-like LHS method: 130 | # 131 | # * We desire two properties: Latin hypercube sampling globally, and equal density 132 | # in each subspace. 133 | # * The independent index vector generation for each parameter guarantees the Latin 134 | # hypercube property: some numbers will have been used, and removed from the index 135 | # vectors, when the next subspace along the same hyperplane is reached. Thus, the same 136 | # indices cannot be used again for any such subspace. This process continues until each 137 | # index has been used exactly once. 138 | # * The equal density property is enforced by the fact that each subspace gets exactly one 139 | # sample generated in one run of the loop. The total number of samples is, by design, 140 | # divisible by the number of these subspaces. Therefore, each subspace will have the 141 | # same sample density. 142 | # 143 | # Run time and memory cost: 144 | # 145 | # * Exactly k*m samples will be generated. This can be seen from the fact that there are 146 | # k*m bins per parameter, and they all get filled by exactly one sample. 147 | # * Thus, runtime is in O(k*m) = O( k * n*k^(N-1) ) = O( n*k^N ). (This isn't as bad as it 148 | # looks. All it's saying is that a linear number of bins gets filled. This is much less 149 | # than the total number of bins (k*m)^N - which is why LHS is needed in the first place. 150 | # We get a reduction in sample count by the factor (k*m)^(N-1).) 
151 | # * Required memory for the final result is (k*m)*N reals (plus some overhead), where the 152 | # N comes from the fact that each N-tuple generated has N elements. Note that the index 153 | # vectors also use up k*m*N reals in total (k*N vectors, each with m elements). Thus the 154 | # memory cost is 2*k*m*N reals plus overhead. 155 | # * Note that using a more complicated implementation that frees the elements of the index 156 | # vectors as they are used up probably wouldn't help with the memory usage, because many 157 | # vector implementations never decrease their storage space even if elements are deleted. 158 | # * In other programming languages, one might work around this by using linked lists 159 | # instead of vectors, and arranging the memory allocations for the elements in a very 160 | # special way (i.e. such that the last ones in memory always get deleted first). By 161 | # using a linked list for the results, too, and allocating them over the deleted 162 | # elements of the index vectors (since they shrink at exactly the same rate the results 163 | # grow), one might be able to bring down the memory usage to k*m*N plus overhead. 164 | # * Finally, note that in practical situations N, k and m are usually small, so the factor 165 | # of 2 doesn't really matter. 166 | 167 | # Algorithm. 168 | 169 | # Find necessary number of bins per subspace so that equal nonzero density is possible. 170 | # A brief analysis shows that in order to exactly fill up all k*m bins for one variable, 171 | # we must have k*m = n*k^N, i.e... 172 | m = n * k**(N-1) 173 | 174 | # Create index vectors for each subspace for each parameter. (There are k*N of these.) 175 | if verbose: 176 | print('Allocating %d elements for solution...' 
% (N*k*m)) 177 | 178 | I = np.empty( [N,k,m], dtype=int, order="C" ) # index vectors 179 | Iidx = np.zeros( [N,k], dtype=int, order="C" ) # index of first "not yet used" element in each index vector 180 | 181 | # Create random permutations of range(m) so that in the sampling loop 182 | # we may simply pick the first element from each index vector. 183 | # 184 | for i in range(N): 185 | for j in range(k): 186 | tmp = np.array( range(m), dtype=int ) 187 | np.random.shuffle(tmp) 188 | I[i,j,:] = tmp 189 | 190 | if verbose: 191 | print('Generating sample...') 192 | print('Looping through %d subspaces.' % (k**N)) 193 | 194 | L = k*m # number of samples still waiting for placement 195 | Ns = k**N # number of subspaces in total (cartesian product 196 | # of k subspaces per axis in N dimensions) 197 | 198 | # Start with an empty result set. We will place the generated samples here. 199 | S = np.empty( [L,N], dtype=int, order="C" ) 200 | out_idx = 0 # index of current output sample in S 201 | 202 | # create views for linear indexing 203 | I_lin = np.reshape(I, -1) 204 | Iidx_lin = np.reshape(Iidx, -1) 205 | 206 | # we will need an array of range(N) several times in the loop... 207 | rgN = np.arange(N, dtype=int) 208 | 209 | while L > 0: 210 | # Loop over all subspaces, placing one sample in each. 211 | for j in range(Ns): # index subspaces linearly 212 | # Find, in each dimension, which subspace we are in. 213 | # Compute the multi-index (vector containing an index in each dimension) 214 | # for this subspace. 215 | # 216 | # Simple example: (N,k,n) = (2,3,1) 217 | # => pj = 0 0, 1 0, 2 0, 0 1, 1 1, 2 1, 0 2, 1 2, 2 2 218 | # when j = 0, 1, 2, 3, 4, 5, 6, 7, 8 219 | # 220 | pj = np.array( ( j // (k**rgN) ) % k, dtype=int ) 221 | 222 | # Construct one sample point. 223 | # 224 | # To do this, we grab the first "not yet used" element in all index vectors 225 | # (one for each dimension) corresponding to this subspace. 
226 | # 227 | # Along the dth dimension, we are in the pj[d]th subspace. 228 | # Hence, in the dth dimension, we want to refer to the vector whose index is pj[d]. 229 | # 230 | # Hence, we should take 231 | # row = d (effectively, range(N)) 232 | # col = pj[d] 233 | # 234 | # The array Iidx is of the shape [N,k]. NumPy uses C-contiguous ordering 235 | # by default; last index varies fastest. Hence, the element [row,col] is at 236 | # k*row + col. 237 | # 238 | # This gets us a vector of linear indices into Iidx, where the dth element 239 | # corresponds to the linear index of the pj[d]th vector. 240 | # 241 | i = np.array( k*rgN + pj, dtype=int ) 242 | 243 | # Extract the "first unused element" data from Iidx for each of the vectors, 244 | # to get the actual sample slot numbers (random permutations) stored in I. 245 | # 246 | indices = Iidx_lin[i] 247 | 248 | # Indexing: the array I is of shape [N,k,m] and has C storage order. 249 | # 250 | idx_first = np.array( k*m*rgN + m*pj + indices, dtype=int ) 251 | 252 | s = I_lin[idx_first] # this is our new sample point (vector of length N) 253 | Iidx_lin[i] += 1 # move to the next element in the selected index vectors 254 | 255 | # Now s contains a sample from (range(m), range(m), ..., range(m)) (N elements). 256 | # By its construction, the sample conforms globally to the Latin hypercube 257 | # requirement. 258 | 259 | # Compute the base index along each dimension. In the global numbering 260 | # which goes 0, 1, ..., (k*m-1) along each axis, the first element 261 | # of the current subspace is at this multi-index: 262 | # 263 | a = pj*m 264 | 265 | # Add the new sample to the result set. 266 | S[out_idx,:] = a+s 267 | out_idx += 1 268 | 269 | # We placed exactly Ns samples during the for loop. 
270 | L -= Ns 271 | 272 | # Result visualization (for debug and illustrative purposes) 273 | # 274 | if visualize and N > 1: 275 | if verbose: 276 | print('Plotting...') 277 | 278 | import itertools 279 | import matplotlib.pyplot as plt 280 | 281 | # if the grid would show more lines than this, the lines are hidden. 282 | max_major_lines = 5 283 | max_minor_lines = 15 284 | 285 | if k*m > 100: 286 | style = 'b.' 287 | else: 288 | style = 'bo' # use circles when a small number of bins 289 | 290 | plt.figure(1) 291 | plt.clf() 292 | 293 | if N >= 3: 294 | # We'll make a "pairs" plot (like the pairs() function of the "R" 295 | # statistics software). 296 | 297 | # generate all pairs of dimensions, make explicit list 298 | pair_list = list(itertools.combinations(range(N), 2)) 299 | 300 | # make final list. 301 | # 302 | # We want to populate both sides of the diagonal in the plot, 303 | # so we need pair_list, plus another copy of it 304 | # with the first and second components switched in each pair. 305 | # 306 | pairs = list(pair_list) # copy 307 | pairs.extend( tuple(reversed(pair)) for pair in pair_list ) 308 | 309 | # Show also the diagonal if requested. 310 | # 311 | # This should produce a straight line with no holes onto 312 | # each subplot that is on the diagonal of the plot array. 313 | # 314 | if showdiag: 315 | pairs.extend( [ (j,j) for j in range(N) ] ) 316 | else: # N == 2: 317 | pairs = [ (0, 1) ] 318 | 319 | Np = len(pairs) 320 | for i in range(Np): 321 | if N >= 3: 322 | if verbose: 323 | print('Subplot %d of %d...' % ((i+1), Np)) 324 | plt.subplot( N,N, N*pairs[i][1] + (pairs[i][0] + 1) ) 325 | 326 | # off-diagonal projection? (i.e. 
a true 2D projection) 327 | if pairs[i][0] != pairs[i][1]: 328 | # Plot the points picked by the sample 329 | plt.plot( S[:,pairs[i][0]], S[:,pairs[i][1]], style) 330 | axmax = k*m 331 | 332 | # Mark the subspaces onto the figure 333 | # (if few enough to fit reasonably on screen) 334 | # 335 | if k <= max_major_lines: 336 | for j in range(k): 337 | xy = -0.5 + j*m 338 | plt.plot( [xy, xy], [-0.5, axmax - 0.5], 'k', linewidth=2.0 ) 339 | plt.plot( [-0.5, axmax - 0.5], [xy, xy], 'k', linewidth=2.0 ) 340 | 341 | # Mark bins (if few enough to fit reasonably on screen) 342 | # 343 | if k*m <= max_minor_lines: 344 | for j in range(k*m): 345 | xy = -0.5 + j 346 | plt.plot( [xy, xy], [-0.5, axmax - 0.5], 'k') 347 | plt.plot( [-0.5, axmax - 0.5], [xy, xy], 'k') 348 | 349 | # Make a box around the area 350 | plt.plot( [-0.5, axmax - 0.5], [-0.5, -0.5], 'k', \ 351 | linewidth=2.0 ) 352 | plt.plot( [-0.5, axmax - 0.5], [axmax - 0.5, axmax - 0.5], 'k', \ 353 | linewidth=2.0 ) 354 | plt.plot( [-0.5, -0.5], [-0.5, axmax - 0.5], 'k', \ 355 | linewidth=2.0 ) 356 | plt.plot( [axmax - 0.5, axmax - 0.5], [-0.5, axmax - 0.5], 'k', \ 357 | linewidth=2.0 ) 358 | 359 | # Set the axes so that the extreme indices just fit into the view 360 | plt.axis("equal") 361 | plt.axis( [-0.5, axmax-0.5, -0.5, axmax-0.5 ] ) 362 | else: # 1D projection 363 | plt.plot( S[:,pairs[i][0]], np.zeros( [k*m] ), style) 364 | plt.axis( [-0.5, axmax-0.5, -0.5, 0.5] ) 365 | 366 | # Label the variables. 367 | # 368 | # We only do this if the diagonal subplots are blank. 
369 | # 370 | if N >= 3: 371 | if not showdiag: 372 | for i in range(N): 373 | plt.subplot(N,N, N*i+(i+1)) 374 | my_label = 'Row: x = var %d' % i 375 | plt.text(0.5,0.6, my_label, horizontalalignment="center", fontweight="bold") 376 | my_label = 'Col: y = var %d' % i 377 | plt.text(0.5,0.4, my_label, horizontalalignment="center", fontweight="bold") 378 | plt.axis("off") 379 | else: 380 | plt.xlabel('Var 0', fontweight="bold") 381 | plt.ylabel('Var 1', fontweight="bold") 382 | 383 | if verbose: 384 | print('Plotting done. Showing figure...') 385 | 386 | # show figures and enter gtk mainloop 387 | plt.show() 388 | 389 | return (S,m) 390 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | """Setuptools-based setup script for WLSQM.""" 4 | 5 | from __future__ import division, print_function, absolute_import 6 | 7 | ######################################################### 8 | # Config 9 | ######################################################### 10 | 11 | # choose build type here 12 | # 13 | build_type="optimized" 14 | #build_type="debug" 15 | 16 | 17 | ######################################################### 18 | # Init 19 | ######################################################### 20 | 21 | # check for Python 2.7 or later 22 | # http://stackoverflow.com/questions/19534896/enforcing-python-version-in-setup-py 23 | import sys 24 | if sys.version_info < (2,7): 25 | sys.exit('Sorry, Python < 2.7 is not supported') 26 | 27 | import os 28 | import platform 29 | 30 | from setuptools import setup 31 | from setuptools.extension import Extension 32 | 33 | try: 34 | from Cython.Build import cythonize 35 | except ImportError: 36 | sys.exit("Cython not found. 
Cython is needed to build the extension modules for WLSQM.") 37 | 38 | 39 | ######################################################### 40 | # Definitions 41 | ######################################################### 42 | 43 | system = platform.system() 44 | 45 | if system == "Windows": 46 | my_extra_compile_args_math = ["/openmp"] 47 | my_extra_compile_args_nonmath = [] 48 | my_extra_link_args = [] 49 | debug = False 50 | else: 51 | extra_compile_args_math_optimized = ['-fopenmp', '-march=native', '-O2', '-msse', '-msse2', '-mfma', '-mfpmath=sse'] 52 | extra_compile_args_math_debug = ['-fopenmp', '-march=native', '-O0', '-g'] 53 | 54 | extra_compile_args_nonmath_optimized = ['-O2'] 55 | extra_compile_args_nonmath_debug = ['-O0', '-g'] 56 | 57 | extra_link_args_optimized = ['-fopenmp'] 58 | extra_link_args_debug = ['-fopenmp'] 59 | 60 | 61 | if build_type == 'optimized': 62 | my_extra_compile_args_math = extra_compile_args_math_optimized 63 | my_extra_compile_args_nonmath = extra_compile_args_nonmath_optimized 64 | my_extra_link_args = extra_link_args_optimized 65 | debug = False 66 | print( "build configuration selected: optimized" ) 67 | else: # build_type == 'debug': 68 | my_extra_compile_args_math = extra_compile_args_math_debug 69 | my_extra_compile_args_nonmath = extra_compile_args_nonmath_debug 70 | my_extra_link_args = extra_link_args_debug 71 | debug = True 72 | print( "build configuration selected: debug" ) 73 | 74 | 75 | ######################################################### 76 | # Long description 77 | ######################################################### 78 | 79 | DESC="""WLSQM (Weighted Least SQuares Meshless) is a fast and accurate meshless least-squares interpolator for Python, implemented in Cython. 80 | 81 | Given scalar data values on a set of points in 1D, 2D or 3D, WLSQM constructs a piecewise polynomial global surrogate model (a.k.a. response surface), using up to 4th order polynomials. 
82 | 83 | Use cases include response surface modeling, and computing space derivatives of data known only as values at discrete points in space. No grid or mesh is needed. 84 | 85 | Any derivative of the model function (e.g. d2f/dxdy) can be easily evaluated, up to the order of the polynomial. 86 | 87 | Sensitivity data of solution DOFs (on the data values at points other than the reference in the local neighborhood) can be optionally computed. 88 | 89 | Performance-critical parts are implemented in Cython. LAPACK is used via SciPy's Cython-level bindings. OpenMP is used for parallelization over the independent local problems (also in the linear solver step). 90 | 91 | This implementation is targeted for high performance in a single-node environment, such as a laptop. The main target is x86_64. 92 | """ 93 | 94 | ######################################################### 95 | # Helpers 96 | ######################################################### 97 | 98 | my_include_dirs = ["."] # IMPORTANT, see https://github.com/cython/cython/wiki/PackageHierarchy 99 | 100 | def ext(extName): 101 | extPath = extName.replace(".", os.path.sep)+".pyx" 102 | return Extension( extName, 103 | [extPath], 104 | extra_compile_args=my_extra_compile_args_nonmath 105 | ) 106 | def ext_math(extName): 107 | if system == "Windows": 108 | libraries = [] 109 | else: 110 | libraries = ["m"] # "m" links libm, the math library on unix-likes; see http://docs.cython.org/src/tutorial/external.html 111 | extPath = extName.replace(".", os.path.sep)+".pyx" 112 | return Extension( extName, 113 | [extPath], 114 | extra_compile_args=my_extra_compile_args_math, 115 | extra_link_args=my_extra_link_args, 116 | libraries=libraries 117 | ) 118 | 119 | # http://stackoverflow.com/questions/13628979/setuptools-how-to-make-package-contain-extra-data-folder-and-all-folders-inside 120 | datadirs = ("examples",) 121 | dataexts = (".py", ".pyx", ".pxd", ".c", ".sh", ".lyx", ".pdf") 122 | datafiles = [] 123 | getext = 
lambda filename: os.path.splitext(filename)[1] 124 | for datadir in datadirs: 125 | datafiles.extend( [(root, [os.path.join(root, f) for f in files if getext(f) in dataexts]) 126 | for root, dirs, files in os.walk(datadir)] ) 127 | 128 | datafiles.append( ('.', ["README.md", "LICENSE.md", "TODO.md", "CHANGELOG.md"]) ) 129 | datafiles.append( ('.', ["example.png"]) ) 130 | 131 | 132 | ######################################################### 133 | # Utility modules 134 | ######################################################### 135 | 136 | ext_module_ptrwrap = ext( "wlsqm.utils.ptrwrap") # Pointer wrapper for Cython/Python integration 137 | ext_module_lapackdrivers = ext_math("wlsqm.utils.lapackdrivers") # Simple Python interface to LAPACK for solving many independent linear equation systems efficiently in parallel. Built on top of scipy.linalg.cython_lapack. 138 | 139 | ######################################################### 140 | # WLSQM (Weighted Least SQuares Meshless method) 141 | ######################################################### 142 | 143 | ext_module_defs = ext( "wlsqm.fitter.defs") # definitions (named constants) 144 | ext_module_infra = ext( "wlsqm.fitter.infra") # memory allocation infrastructure 145 | ext_module_impl = ext_math("wlsqm.fitter.impl") # low-level routines (implementation) 146 | ext_module_polyeval = ext_math("wlsqm.fitter.polyeval") # evaluation of Taylor expansions and general polynomials 147 | ext_module_interp = ext_math("wlsqm.fitter.interp") # interpolation of fitted model 148 | ext_module_simple = ext_math("wlsqm.fitter.simple") # simple API 149 | ext_module_expert = ext_math("wlsqm.fitter.expert") # advanced API 150 | 151 | ######################################################### 152 | 153 | # Extract __version__ from the package __init__.py 154 | # (since it's not a good idea to actually run __init__.py during the build process). 
155 | # 156 | # http://stackoverflow.com/questions/2058802/how-can-i-get-the-version-defined-in-setup-py-setuptools-in-my-package 157 | # 158 | import ast 159 | with open('wlsqm/__init__.py', 'r') as f: 160 | for line in f: 161 | if line.startswith('__version__'): 162 | version = ast.parse(line).body[0].value.s 163 | break 164 | else: 165 | version = '0.0.unknown' 166 | print( "WARNING: Version information not found, using placeholder '%s'" % (version) ) 167 | 168 | 169 | setup( 170 | name = "wlsqm", 171 | version = version, 172 | author = "Juha Jeronen", 173 | author_email = "juha.jeronen@jyu.fi", 174 | url = "https://github.com/Technologicat/python-wlsqm", 175 | 176 | description = "Weighted least squares meshless interpolator", 177 | long_description = DESC, 178 | 179 | license = "BSD", 180 | platforms = ["Linux"], # free-form text field; http://stackoverflow.com/questions/34994130/what-platforms-argument-to-setup-in-setup-py-does 181 | 182 | classifiers = [ "Development Status :: 4 - Beta", 183 | "Environment :: Console", 184 | "Intended Audience :: Developers", 185 | "Intended Audience :: Science/Research", 186 | "License :: OSI Approved :: BSD License", 187 | "Operating System :: POSIX :: Linux", 188 | "Programming Language :: Cython", 189 | "Programming Language :: Python", 190 | "Programming Language :: Python :: 2", 191 | "Programming Language :: Python :: 2.7", 192 | "Programming Language :: Python :: 3", 193 | "Programming Language :: Python :: 3.4", 194 | "Topic :: Scientific/Engineering", 195 | "Topic :: Scientific/Engineering :: Mathematics", 196 | "Topic :: Software Development :: Libraries", 197 | "Topic :: Software Development :: Libraries :: Python Modules" 198 | ], 199 | 200 | # 0.16 seems to be the first SciPy version that has cython_lapack.pxd. 
( https://github.com/scipy/scipy/commit/ba438eab99ce8f55220a6ff652500f07dd6a547a ) 201 | setup_requires = ["cython", "scipy (>=0.16)"], 202 | install_requires = ["numpy", "scipy (>=0.16)"], 203 | provides = ["wlsqm"], 204 | 205 | # same keywords as used as topics on GitHub 206 | keywords = ["numerical interpolation differentiation curve-fitting least-squares meshless numpy cython"], 207 | 208 | ext_modules = cythonize( [ ext_module_lapackdrivers, 209 | ext_module_ptrwrap, 210 | ext_module_defs, 211 | ext_module_infra, 212 | ext_module_impl, 213 | ext_module_polyeval, 214 | ext_module_interp, 215 | ext_module_simple, 216 | ext_module_expert ], 217 | 218 | include_path = my_include_dirs, 219 | 220 | gdb_debug = debug ), 221 | 222 | # Declare packages so that python -m setup build will copy .py files (especially __init__.py). 223 | packages = ["wlsqm", "wlsqm.utils", "wlsqm.fitter"], 224 | 225 | # Install also Cython headers so that other Cython modules can cimport ours 226 | # FIXME: force sdist, but sdist only, to keep the .pyx files (this puts them also in the bdist) 227 | package_data={'wlsqm.utils': ['*.pxd', '*.pyx'], # note: paths relative to each package 228 | 'wlsqm.fitter': ['*.pxd', '*.pyx']}, 229 | 230 | # Disable zip_safe, because: 231 | # - Cython won't find .pxd files inside installed .egg, hard to compile libs depending on this one 232 | # - dynamic loader may need to have the library unzipped to a temporary folder anyway (at import time) 233 | zip_safe = False, 234 | 235 | # Usage examples; not in a package 236 | data_files = datafiles 237 | ) 238 | -------------------------------------------------------------------------------- /wlsqm/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | """WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 
4 | 5 | A general overview can be found in the README. 6 | 7 | For the API, refer to wlsqm.fitter.simple and wlsqm.fitter.expert. 8 | 9 | When imported, this module imports all symbols from the following modules to the local namespace: 10 | 11 | wlsqm.fitter.defs # definitions (constants) (common) 12 | wlsqm.fitter.simple # simple API 13 | wlsqm.fitter.interp # interpolation of fitted model (for simple API) 14 | wlsqm.fitter.expert # advanced API 15 | 16 | This makes the names available as wlsqm.fit_2D(), wlsqm.ExpertSolver, etc. 17 | 18 | JJ 2017-02-22 19 | """ 20 | 21 | # absolute_import: https://www.python.org/dev/peps/pep-0328/ 22 | from __future__ import division, print_function, absolute_import 23 | 24 | __version__ = '0.1.6' 25 | 26 | from .fitter.defs import * # definitions (constants) (common) 27 | from .fitter.simple import * # simple API 28 | from .fitter.interp import * # interpolation of fitted model (for simple API) 29 | from .fitter.expert import * # advanced API 30 | 31 | -------------------------------------------------------------------------------- /wlsqm/fitter/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/wlsqm/fitter/__init__.py -------------------------------------------------------------------------------- /wlsqm/fitter/defs.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # C-level definitions for Cython. 6 | # 7 | # This file contains only the declarations; the actual values are set in the .pyx source for wlsqm.fitter.defs. 
8 | # 9 | # The suffix of _c means "visible at the C level in Cython"; it is used to distinguish 10 | # the typed C constants from the Python-accessible objects also defined in the .pyx source. 11 | # 12 | # JJ 2016-11-30 13 | 14 | from __future__ import absolute_import 15 | 16 | # Algorithms for the solve step (expert mode). 17 | # 18 | cdef int ALGO_BASIC_c # fit just once 19 | cdef int ALGO_ITERATIVE_c # fit with iterative refinement to mitigate roundoff 20 | 21 | # Weighting methods. 22 | # 23 | cdef int WEIGHT_UNIFORM_c 24 | cdef int WEIGHT_CENTER_c 25 | 26 | # DOF index in the array f. 27 | # 28 | # These are ordered in increasing order of number of differentiations, so that if only first derivatives 29 | # are required, the DOF array can be simply truncated after the first derivatives. 30 | # 31 | # To avoid gaps in the numbering, this requires separate DOF orderings for the 1D, 2D and 3D cases. 32 | # 33 | # (The other logical possibility would be function value first, then x-related, then y-related, then mixed, 34 | # but then the case of "first derivatives only" requires changes to the ordering to avoid gaps. 35 | # Specifying different orderings for different numbers of space dimensions is the conceptually cleaner 36 | # of the two possibilities.)
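The truncation property described in the comment block above can be sketched in plain Python. The index values are copied from the assignments in wlsqm/fitter/defs.pyx further below; the coefficient array `fi` is purely hypothetical illustration data, not output of the library:

```python
# 2D DOF indices, values as assigned in wlsqm/fitter/defs.pyx.
i2_F, i2_X, i2_Y = 0, 1, 2
i2_1st_end = 3  # one-past-end of the first-order DOFs in 2D

# Hypothetical fitted DOF array for one 2D local model, up to 2nd order:
# [F, df/dx, df/dy, d2f/dx2, d2f/dxdy, d2f/dy2]
fi = [1.0, 0.5, -0.25, 0.1, 0.0, 0.2]

# Because DOFs are ordered by the number of differentiations, keeping only
# the function value and the first derivatives is a plain truncation:
first_order_part = fi[:i2_1st_end]
print(first_order_part)  # [1.0, 0.5, -0.25]
```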
37 | 38 | # 1D case 39 | # 40 | cdef int i1_F_c 41 | cdef int i1_X_c 42 | cdef int i1_X2_c 43 | cdef int i1_X3_c 44 | cdef int i1_X4_c 45 | 46 | cdef int i1_0th_end_c # one-past end of zeroth-order case 47 | cdef int i1_1st_end_c # one-past end of first-order case 48 | cdef int i1_2nd_end_c # one-past-end of second-order case 49 | cdef int i1_3rd_end_c # one-past-end of third-order case 50 | cdef int i1_4th_end_c # one-past-end of fourth-order case 51 | 52 | cdef int SIZE1_c # maximum possible number of DOFs, 1D case 53 | 54 | # 2D case 55 | # 56 | cdef int i2_F_c 57 | 58 | cdef int i2_X_c 59 | cdef int i2_Y_c 60 | 61 | cdef int i2_X2_c 62 | cdef int i2_XY_c 63 | cdef int i2_Y2_c 64 | 65 | cdef int i2_X3_c 66 | cdef int i2_X2Y_c 67 | cdef int i2_XY2_c 68 | cdef int i2_Y3_c 69 | 70 | cdef int i2_X4_c 71 | cdef int i2_X3Y_c 72 | cdef int i2_X2Y2_c 73 | cdef int i2_XY3_c 74 | cdef int i2_Y4_c 75 | 76 | cdef int i2_0th_end_c # one-past end of zeroth-order case 77 | cdef int i2_1st_end_c # one-past end of first-order case 78 | cdef int i2_2nd_end_c # one-past-end of second-order case 79 | cdef int i2_3rd_end_c # one-past-end of third-order case 80 | cdef int i2_4th_end_c # one-past-end of fourth-order case 81 | 82 | cdef int SIZE2_c # maximum possible number of DOFs, 2D case 83 | 84 | # 3D case 85 | # 86 | cdef int i3_F_c 87 | 88 | cdef int i3_X_c 89 | cdef int i3_Y_c 90 | cdef int i3_Z_c 91 | 92 | cdef int i3_X2_c 93 | cdef int i3_XY_c 94 | cdef int i3_Y2_c 95 | cdef int i3_YZ_c 96 | cdef int i3_Z2_c 97 | cdef int i3_XZ_c 98 | 99 | cdef int i3_X3_c 100 | cdef int i3_X2Y_c 101 | cdef int i3_XY2_c 102 | cdef int i3_Y3_c 103 | cdef int i3_Y2Z_c 104 | cdef int i3_YZ2_c 105 | cdef int i3_Z3_c 106 | cdef int i3_XZ2_c 107 | cdef int i3_X2Z_c 108 | cdef int i3_XYZ_c 109 | 110 | cdef int i3_X4_c 111 | cdef int i3_X3Y_c 112 | cdef int i3_X2Y2_c 113 | cdef int i3_XY3_c 114 | cdef int i3_Y4_c 115 | cdef int i3_Y3Z_c 116 | cdef int i3_Y2Z2_c 117 | cdef int i3_YZ3_c 118 | cdef int 
i3_Z4_c 119 | cdef int i3_XZ3_c 120 | cdef int i3_X2Z2_c 121 | cdef int i3_X3Z_c 122 | cdef int i3_X2YZ_c 123 | cdef int i3_XY2Z_c 124 | cdef int i3_XYZ2_c 125 | 126 | cdef int i3_0th_end_c # one-past-end of zeroth-order case 127 | cdef int i3_1st_end_c # one-past-end of first-order case 128 | cdef int i3_2nd_end_c # one-past-end of second-order case 129 | cdef int i3_3rd_end_c # one-past-end of third-order case 130 | cdef int i3_4th_end_c # one-past-end of fourth-order case 131 | 132 | cdef int SIZE3_c # maximum possible number of DOFs, 3D case 133 | 134 | 135 | # bitmask constants for knowns. 136 | # 137 | # Knowns are eliminated algebraically from the equation system; if any knowns are specified, 138 | # the system to be solved (for a point x_i) will be smaller than the full system (e.g. "full" is 6x6 for 2nd order in 2D). 139 | # 140 | # The sensible default is to consider the function value F known, with all the 141 | # derivatives unknown. 142 | # 143 | # Note that here "known" means "known at point x_i" (the point at which we wish to compute the derivatives). 144 | # 145 | # Function values (F) are always assumed known at all *neighbor* points x_k, since they are used 146 | # for determining the local least-squares polynomial fit to the data. This fit is then used 147 | # as a local surrogate model for the unknown function f; in WLSQM, the derivatives are actually computed 148 | # from the surrogate. 149 | # 150 | # The option to have the function value (F) as an unknown is useful with Neumann BCs, if the neighborhoods 151 | # of the Neumann boundary points are chosen so that each Neumann boundary point only uses neighbors from 152 | # the interior of the domain. This gives the possibility to leave F free at all Neumann boundary points, 153 | # while prescribing only a derivative. 154 | # 155 | # (In practice, at slanted (i.e.
not coordinate axis aligned) boundaries, local (tangent, normal) 156 | # coordinates must be used; i.e., the coordinate system in which the derivatives are to be computed 157 | # must be rotated to match the orientation of the boundary. This makes Y the normal derivative, 158 | # which can then be prescribed using this mechanism, while leaving the function value F free.) 159 | 160 | # 1D case 161 | # 162 | cdef long long b1_F_c 163 | cdef long long b1_X_c 164 | cdef long long b1_X2_c 165 | cdef long long b1_X3_c 166 | cdef long long b1_X4_c 167 | 168 | # 2D case 169 | # 170 | cdef long long b2_F_c 171 | 172 | cdef long long b2_X_c 173 | cdef long long b2_Y_c 174 | 175 | cdef long long b2_X2_c 176 | cdef long long b2_XY_c 177 | cdef long long b2_Y2_c 178 | 179 | cdef long long b2_X3_c 180 | cdef long long b2_X2Y_c 181 | cdef long long b2_XY2_c 182 | cdef long long b2_Y3_c 183 | 184 | cdef long long b2_X4_c 185 | cdef long long b2_X3Y_c 186 | cdef long long b2_X2Y2_c 187 | cdef long long b2_XY3_c 188 | cdef long long b2_Y4_c 189 | 190 | # 3D case 191 | # 192 | cdef long long b3_F_c 193 | 194 | cdef long long b3_X_c 195 | cdef long long b3_Y_c 196 | cdef long long b3_Z_c 197 | 198 | cdef long long b3_X2_c 199 | cdef long long b3_XY_c 200 | cdef long long b3_Y2_c 201 | cdef long long b3_YZ_c 202 | cdef long long b3_Z2_c 203 | cdef long long b3_XZ_c 204 | 205 | cdef long long b3_X3_c 206 | cdef long long b3_X2Y_c 207 | cdef long long b3_XY2_c 208 | cdef long long b3_Y3_c 209 | cdef long long b3_Y2Z_c 210 | cdef long long b3_YZ2_c 211 | cdef long long b3_Z3_c 212 | cdef long long b3_XZ2_c 213 | cdef long long b3_X2Z_c 214 | cdef long long b3_XYZ_c 215 | 216 | cdef long long b3_X4_c 217 | cdef long long b3_X3Y_c 218 | cdef long long b3_X2Y2_c 219 | cdef long long b3_XY3_c 220 | cdef long long b3_Y4_c 221 | cdef long long b3_Y3Z_c 222 | cdef long long b3_Y2Z2_c 223 | cdef long long b3_YZ3_c 224 | cdef long long b3_Z4_c 225 | cdef long long b3_XZ3_c 226 | cdef long long 
b3_X2Z2_c 227 | cdef long long b3_X3Z_c 228 | cdef long long b3_X2YZ_c 229 | cdef long long b3_XY2Z_c 230 | cdef long long b3_XYZ2_c 231 | 232 | -------------------------------------------------------------------------------- /wlsqm/fitter/defs.pyx: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Set Cython compiler directives. This section must appear before any code! 4 | # 5 | # For available directives, see: 6 | # 7 | # http://docs.cython.org/en/latest/src/reference/compilation.html 8 | # 9 | # cython: wraparound = False 10 | # cython: boundscheck = False 11 | # cython: cdivision = True 12 | # 13 | """WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 14 | 15 | This module contains C-level and Python-level definitions of constants. The constants are made visible to Python by creating Python objects, with their values copied from the corresponding C constants. 16 | 17 | In the source code, the suffix of _c means "visible at the C level in Cython"; it is used to distinguish the typed C constants from the corresponding Python objects. 18 | 19 | Naming scheme: 20 | 21 | ALGO_* = algorithms for the solve step (for advanced API in wlsqm.fitter.expert). 22 | WEIGHT_* = weighting methods for the error (data - predicted) in least-squares fitting. 23 | 24 | i1_* = integer, 1D case 25 | i2_* = integer, 2D case 26 | i3_* = integer, 3D case 27 | 28 | The i?_* constants are the human-readable names for the DOF indices in the "fi" array (see docstrings in wlsqm.fitter.simple). 29 | 30 | b1_* = bitmask, 1D case 31 | b2_* = bitmask, 2D case 32 | b3_* = bitmask, 3D case 33 | 34 | *_end = one-past-end index of this case. The ordinal (0th, 1st, 2nd, 3rd, 4th) refers to the degree of the fit. 35 | E.g. i2_3rd_end = one-past-end for 2D with 3rd order fit. 
36 | 37 | SIZE1 = maximum possible number of DOFs (degrees of freedom), 1D case 38 | SIZE2 = maximum possible number of DOFs (degrees of freedom), 2D case 39 | SIZE3 = maximum possible number of DOFs (degrees of freedom), 3D case 40 | 41 | ("maximum possible" because if order < 4, then only lower-degree DOFs will exist.) 42 | 43 | F = function value 44 | X = "times x" (coefficients) or "differentiate by x" (see wlsqm.fitter.interp) 45 | X2 = "times x**2" or "differentiate twice by x" 46 | Y, Z respectively 47 | 48 | Examples: 49 | 50 | i2_F = 2D case, function value 51 | i2_X2Y = 2D case, coefficient of the X**2 * Y term in the polynomial; or request differentiation twice by x and once by y to compute d3f/dx2dy (see wlsqm.fitter.interp) 52 | 53 | IMPORTANT: the DOF values returned by the fitter are "partially baked" such that the DOF value directly corresponds to the value of the corresponding derivative. 54 | This is for convenience of evaluating derivatives at the model reference point. 55 | 56 | E.g. fi[:,i2_X2] is the coefficient of d2f/dx2 in a Taylor series expansion of f around the reference point xi. 57 | (The ":" is here meant to refer to the reference point xi for all local models; see wlsqm.fitter.simple.fit_2D_many() for a description of the "fi" array.) 58 | 59 | JJ 2016-11-30 60 | """ 61 | 62 | from __future__ import division, print_function, absolute_import 63 | 64 | ################################################# 65 | # C definitions (Cython level) 66 | ################################################# 67 | 68 | # Algorithms for the solve step (expert mode). 69 | # 70 | cdef int ALGO_BASIC_c = 1 # just fit once 71 | cdef int ALGO_ITERATIVE_c = 2 # fit with iterative refinement to mitigate roundoff 72 | 73 | # Weighting methods. 74 | # 75 | cdef int WEIGHT_UNIFORM_c = 1 76 | cdef int WEIGHT_CENTER_c = 2 77 | 78 | # DOF index in the array f. 
79 | # 80 | # These are ordered in increasing order of number of differentiations, so that if only first derivatives 81 | # are required, the DOF array can be simply truncated after the first derivatives. 82 | # 83 | # To avoid gaps in the numbering, this requires separate DOF orderings for the 1D, 2D and 3D cases. 84 | # 85 | # (The other logical possibility would be function value first, then x-related, then y-related, then mixed, 86 | # but then the case of "first derivatives only" requires changes to the ordering to avoid gaps. 87 | # Specifying different orderings for different numbers of space dimensions is conceptually cleaner 88 | # of the two possibilities.) 89 | 90 | # 1D case 91 | # 92 | cdef int i1_F_c = 0 93 | cdef int i1_X_c = 1 94 | cdef int i1_X2_c = 2 95 | cdef int i1_X3_c = 3 96 | cdef int i1_X4_c = 4 97 | 98 | cdef int i1_0th_end_c = 1 # one-past end of zeroth-order case 99 | cdef int i1_1st_end_c = 2 # one-past end of first-order case 100 | cdef int i1_2nd_end_c = 3 # one-past-end of second-order case 101 | cdef int i1_3rd_end_c = 4 # one-past-end of third-order case 102 | cdef int i1_4th_end_c = 5 # one-past-end of fourth-order case 103 | 104 | cdef int SIZE1_c = i1_4th_end_c # maximum possible number of DOFs, 1D case 105 | 106 | # 2D case 107 | # 108 | cdef int i2_F_c = 0 109 | 110 | cdef int i2_X_c = 1 111 | cdef int i2_Y_c = 2 112 | 113 | cdef int i2_X2_c = 3 114 | cdef int i2_XY_c = 4 115 | cdef int i2_Y2_c = 5 116 | 117 | cdef int i2_X3_c = 6 118 | cdef int i2_X2Y_c = 7 119 | cdef int i2_XY2_c = 8 120 | cdef int i2_Y3_c = 9 121 | 122 | cdef int i2_X4_c = 10 123 | cdef int i2_X3Y_c = 11 124 | cdef int i2_X2Y2_c = 12 125 | cdef int i2_XY3_c = 13 126 | cdef int i2_Y4_c = 14 127 | 128 | cdef int i2_0th_end_c = 1 # one-past end of zeroth-order case 129 | cdef int i2_1st_end_c = 3 # one-past end of first-order case 130 | cdef int i2_2nd_end_c = 6 # one-past-end of second-order case 131 | cdef int i2_3rd_end_c = 10 # one-past-end of third-order 
case 132 | cdef int i2_4th_end_c = 15 # one-past-end of fourth-order case 133 | 134 | cdef int SIZE2_c = i2_4th_end_c # maximum possible number of DOFs, 2D case 135 | 136 | # 3D case 137 | # 138 | cdef int i3_F_c = 0 139 | 140 | cdef int i3_X_c = 1 141 | cdef int i3_Y_c = 2 142 | cdef int i3_Z_c = 3 143 | 144 | cdef int i3_X2_c = 4 145 | cdef int i3_XY_c = 5 146 | cdef int i3_Y2_c = 6 147 | cdef int i3_YZ_c = 7 148 | cdef int i3_Z2_c = 8 149 | cdef int i3_XZ_c = 9 150 | 151 | cdef int i3_X3_c = 10 152 | cdef int i3_X2Y_c = 11 153 | cdef int i3_XY2_c = 12 154 | cdef int i3_Y3_c = 13 155 | cdef int i3_Y2Z_c = 14 156 | cdef int i3_YZ2_c = 15 157 | cdef int i3_Z3_c = 16 158 | cdef int i3_XZ2_c = 17 159 | cdef int i3_X2Z_c = 18 160 | cdef int i3_XYZ_c = 19 161 | 162 | cdef int i3_X4_c = 20 163 | cdef int i3_X3Y_c = 21 164 | cdef int i3_X2Y2_c = 22 165 | cdef int i3_XY3_c = 23 166 | cdef int i3_Y4_c = 24 167 | cdef int i3_Y3Z_c = 25 168 | cdef int i3_Y2Z2_c = 26 169 | cdef int i3_YZ3_c = 27 170 | cdef int i3_Z4_c = 28 171 | cdef int i3_XZ3_c = 29 172 | cdef int i3_X2Z2_c = 30 173 | cdef int i3_X3Z_c = 31 174 | cdef int i3_X2YZ_c = 32 175 | cdef int i3_XY2Z_c = 33 176 | cdef int i3_XYZ2_c = 34 177 | 178 | cdef int i3_0th_end_c = 1 # one-past end of zeroth-order case 179 | cdef int i3_1st_end_c = 4 # one-past end of first-order case 180 | cdef int i3_2nd_end_c = 10 # one-past-end of second-order case 181 | cdef int i3_3rd_end_c = 20 # one-past-end of third-order case 182 | cdef int i3_4th_end_c = 35 # one-past-end of fourth-order case 183 | 184 | cdef int SIZE3_c = i3_4th_end_c # maximum possible number of DOFs, 3D case 185 | 186 | 187 | # bitmask constants for knowns. 188 | # 189 | # Knowns are eliminated algebraically from the equation system; if any knowns are specified, 190 | # the system to be solved (for a point x_i) will be smaller than the full system (e.g. "full" is 6x6 for 2nd order in 2D). 
191 | # 192 | # The sensible default is to consider the function value F known, with all the derivatives unknown. 193 | # 194 | # Note that here "known" means "known at point xi" (the reference point of the model). 195 | # 196 | # Function values (F) are always assumed known at all *neighbor* points xk, since they are used 197 | # for determining the local least-squares polynomial fit to the data. This fit is then used 198 | # as a local surrogate model representing the unknown function f. 199 | # 200 | # In the application context of solving IBVPs with explicit time integration, the option to have the function value (F) as an unknown 201 | # is useful with Neumann BCs. The neighborhoods of the Neumann boundary points can be chosen such that each Neumann boundary point 202 | # only uses neighbors from the interior of the domain. This gives the possibility to leave F free at all Neumann boundary points, 203 | # while prescribing only a derivative (the normal-direction derivative). 204 | # 205 | # (In practice, at slanted (i.e. not coordinate axis aligned) boundaries, local (tangent, normal) 206 | # coordinates must be used; i.e., the coordinate system in which the derivatives are to be computed 207 | # must be rotated to match the orientation of the boundary. This makes Y the normal derivative, 208 | # which can then be prescribed using this mechanism, while leaving the function value F free.) 
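As a sketch of how these bitmasks are meant to be composed: the constant values mirror the `b2_* = (1LL << i2_*)` assignments below, and the two `knowns` masks correspond to the interior-point default and the Neumann-boundary scenario just described. The popcount arithmetic at the end only illustrates the idea that eliminating each known algebraically removes one unknown; the actual reduced-system bookkeeping lives in wlsqm.fitter.infra:

```python
# Bitmask constants mirror defs.pyx: b2_<dof> = 1 << i2_<dof>.
i2_F, i2_X, i2_Y = 0, 1, 2
b2_F = 1 << i2_F   # function value known
b2_Y = 1 << i2_Y   # df/dy known

# Default interior point: function value known, all derivatives unknown.
knowns_interior = b2_F

# Neumann boundary point in rotated (tangent, normal) coordinates:
# prescribe the normal derivative Y, but leave the function value F free.
knowns_neumann = b2_Y

# For a 2nd-order fit in 2D the full system has 6 DOFs; eliminating each
# known leaves one fewer unknown to solve for.
n_dofs = 6
n_unknowns_interior = n_dofs - bin(knowns_interior).count("1")
print(n_unknowns_interior)  # 5
```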
209 | 210 | # 1D case 211 | # 212 | cdef long long b1_F_c = (1LL << i1_F_c) 213 | cdef long long b1_X_c = (1LL << i1_X_c) 214 | cdef long long b1_X2_c = (1LL << i1_X2_c) 215 | cdef long long b1_X3_c = (1LL << i1_X3_c) 216 | cdef long long b1_X4_c = (1LL << i1_X4_c) 217 | 218 | # 2D case 219 | # 220 | cdef long long b2_F_c = (1LL << i2_F_c) 221 | 222 | cdef long long b2_X_c = (1LL << i2_X_c) 223 | cdef long long b2_Y_c = (1LL << i2_Y_c) 224 | 225 | cdef long long b2_X2_c = (1LL << i2_X2_c) 226 | cdef long long b2_XY_c = (1LL << i2_XY_c) 227 | cdef long long b2_Y2_c = (1LL << i2_Y2_c) 228 | 229 | cdef long long b2_X3_c = (1LL << i2_X3_c) 230 | cdef long long b2_X2Y_c = (1LL << i2_X2Y_c) 231 | cdef long long b2_XY2_c = (1LL << i2_XY2_c) 232 | cdef long long b2_Y3_c = (1LL << i2_Y3_c) 233 | 234 | cdef long long b2_X4_c = (1LL << i2_X4_c) 235 | cdef long long b2_X3Y_c = (1LL << i2_X3Y_c) 236 | cdef long long b2_X2Y2_c = (1LL << i2_X2Y2_c) 237 | cdef long long b2_XY3_c = (1LL << i2_XY3_c) 238 | cdef long long b2_Y4_c = (1LL << i2_Y4_c) 239 | 240 | # 3D case 241 | # 242 | cdef long long b3_F_c = (1LL << i3_F_c) 243 | 244 | cdef long long b3_X_c = (1LL << i3_X_c) 245 | cdef long long b3_Y_c = (1LL << i3_Y_c) 246 | cdef long long b3_Z_c = (1LL << i3_Z_c) 247 | 248 | cdef long long b3_X2_c = (1LL << i3_X2_c) 249 | cdef long long b3_XY_c = (1LL << i3_XY_c) 250 | cdef long long b3_Y2_c = (1LL << i3_Y2_c) 251 | cdef long long b3_YZ_c = (1LL << i3_YZ_c) 252 | cdef long long b3_Z2_c = (1LL << i3_Z2_c) 253 | cdef long long b3_XZ_c = (1LL << i3_XZ_c) 254 | 255 | cdef long long b3_X3_c = (1LL << i3_X3_c) 256 | cdef long long b3_X2Y_c = (1LL << i3_X2Y_c) 257 | cdef long long b3_XY2_c = (1LL << i3_XY2_c) 258 | cdef long long b3_Y3_c = (1LL << i3_Y3_c) 259 | cdef long long b3_Y2Z_c = (1LL << i3_Y2Z_c) 260 | cdef long long b3_YZ2_c = (1LL << i3_YZ2_c) 261 | cdef long long b3_Z3_c = (1LL << i3_Z3_c) 262 | cdef long long b3_XZ2_c = (1LL << i3_XZ2_c) 263 | cdef long long b3_X2Z_c = (1LL << 
i3_X2Z_c) 264 | cdef long long b3_XYZ_c = (1LL << i3_XYZ_c) 265 | 266 | cdef long long b3_X4_c = (1LL << i3_X4_c) 267 | cdef long long b3_X3Y_c = (1LL << i3_X3Y_c) 268 | cdef long long b3_X2Y2_c = (1LL << i3_X2Y2_c) 269 | cdef long long b3_XY3_c = (1LL << i3_XY3_c) 270 | cdef long long b3_Y4_c = (1LL << i3_Y4_c) 271 | cdef long long b3_Y3Z_c = (1LL << i3_Y3Z_c) 272 | cdef long long b3_Y2Z2_c = (1LL << i3_Y2Z2_c) 273 | cdef long long b3_YZ3_c = (1LL << i3_YZ3_c) 274 | cdef long long b3_Z4_c = (1LL << i3_Z4_c) 275 | cdef long long b3_XZ3_c = (1LL << i3_XZ3_c) 276 | cdef long long b3_X2Z2_c = (1LL << i3_X2Z2_c) 277 | cdef long long b3_X3Z_c = (1LL << i3_X3Z_c) 278 | cdef long long b3_X2YZ_c = (1LL << i3_X2YZ_c) 279 | cdef long long b3_XY2Z_c = (1LL << i3_XY2Z_c) 280 | cdef long long b3_XYZ2_c = (1LL << i3_XYZ2_c) 281 | 282 | 283 | ################################################# 284 | # Python wrapper 285 | ################################################# 286 | 287 | # Algorithms for the solve step (expert mode). 288 | # 289 | ALGO_BASIC = ALGO_BASIC_c 290 | ALGO_ITERATIVE = ALGO_ITERATIVE_c 291 | 292 | # Weighting methods. 293 | # 294 | WEIGHT_UNIFORM = WEIGHT_UNIFORM_c 295 | WEIGHT_CENTER = WEIGHT_CENTER_c 296 | 297 | # DOF index in the array f. 298 | # 299 | # These are ordered in increasing order of number of differentiations, so that if only first derivatives 300 | # are required, the DOF array can be simply truncated after the first derivatives. 301 | # 302 | # To avoid gaps in the numbering, this requires separate DOF orderings for the 1D, 2D and 3D cases. 303 | # 304 | # (The other logical possibility would be function value first, then x-related, then y-related, then mixed, 305 | # but then the case of "first derivatives only" requires changes to the ordering to avoid gaps. 306 | # Specifying different orderings for different numbers of space dimensions is conceptually cleaner 307 | # of the two possibilities.) 
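The one-past-end values assigned above are not arbitrary: each equals the number of monomials of total degree at most k in d variables, C(d + k, d). A quick self-contained check, with the expected values copied from the assignments in this module:

```python
from math import comb

# Number of monomials of total degree <= order in `dimension` variables.
def n_dofs(dimension, order):
    return comb(dimension + order, dimension)

# One-past-end indices from defs.pyx, per dimension, for orders 0..4:
ends = {1: [1, 2, 3, 4, 5],
        2: [1, 3, 6, 10, 15],
        3: [1, 4, 10, 20, 35]}

for d, expected in ends.items():
    assert [n_dofs(d, k) for k in range(5)] == expected

print(n_dofs(2, 2))  # 6 DOFs for a 2nd-order fit in 2D
```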
308 | 309 | # 1D case 310 | # 311 | i1_F = i1_F_c 312 | i1_X = i1_X_c 313 | i1_X2 = i1_X2_c 314 | i1_X3 = i1_X3_c 315 | i1_X4 = i1_X4_c 316 | 317 | i1_1st_end = i1_1st_end_c 318 | i1_2nd_end = i1_2nd_end_c 319 | i1_3rd_end = i1_3rd_end_c 320 | i1_4th_end = i1_4th_end_c 321 | 322 | SIZE1 = SIZE1_c 323 | 324 | 325 | # 2D case 326 | # 327 | i2_F = i2_F_c 328 | 329 | i2_X = i2_X_c 330 | i2_Y = i2_Y_c 331 | 332 | i2_X2 = i2_X2_c 333 | i2_XY = i2_XY_c 334 | i2_Y2 = i2_Y2_c 335 | 336 | i2_X3 = i2_X3_c 337 | i2_X2Y = i2_X2Y_c 338 | i2_XY2 = i2_XY2_c 339 | i2_Y3 = i2_Y3_c 340 | 341 | i2_X4 = i2_X4_c 342 | i2_X3Y = i2_X3Y_c 343 | i2_X2Y2 = i2_X2Y2_c 344 | i2_XY3 = i2_XY3_c 345 | i2_Y4 = i2_Y4_c 346 | 347 | i2_1st_end = i2_1st_end_c 348 | i2_2nd_end = i2_2nd_end_c 349 | i2_3rd_end = i2_3rd_end_c 350 | i2_4th_end = i2_4th_end_c 351 | 352 | SIZE2 = SIZE2_c 353 | 354 | 355 | # 3D case 356 | # 357 | i3_F = i3_F_c 358 | 359 | i3_X = i3_X_c 360 | i3_Y = i3_Y_c 361 | i3_Z = i3_Z_c 362 | 363 | i3_X2 = i3_X2_c 364 | i3_XY = i3_XY_c 365 | i3_Y2 = i3_Y2_c 366 | i3_YZ = i3_YZ_c 367 | i3_Z2 = i3_Z2_c 368 | i3_XZ = i3_XZ_c 369 | 370 | i3_X3 = i3_X3_c 371 | i3_X2Y = i3_X2Y_c 372 | i3_XY2 = i3_XY2_c 373 | i3_Y3 = i3_Y3_c 374 | i3_Y2Z = i3_Y2Z_c 375 | i3_YZ2 = i3_YZ2_c 376 | i3_Z3 = i3_Z3_c 377 | i3_XZ2 = i3_XZ2_c 378 | i3_X2Z = i3_X2Z_c 379 | i3_XYZ = i3_XYZ_c 380 | 381 | i3_X4 = i3_X4_c 382 | i3_X3Y = i3_X3Y_c 383 | i3_X2Y2 = i3_X2Y2_c 384 | i3_XY3 = i3_XY3_c 385 | i3_Y4 = i3_Y4_c 386 | i3_Y3Z = i3_Y3Z_c 387 | i3_Y2Z2 = i3_Y2Z2_c 388 | i3_YZ3 = i3_YZ3_c 389 | i3_Z4 = i3_Z4_c 390 | i3_XZ3 = i3_XZ3_c 391 | i3_X2Z2 = i3_X2Z2_c 392 | i3_X3Z = i3_X3Z_c 393 | i3_X2YZ = i3_X2YZ_c 394 | i3_XY2Z = i3_XY2Z_c 395 | i3_XYZ2 = i3_XYZ2_c 396 | 397 | i3_0th_end = i3_0th_end_c 398 | i3_1st_end = i3_1st_end_c 399 | i3_2nd_end = i3_2nd_end_c 400 | i3_3rd_end = i3_3rd_end_c 401 | i3_4th_end = i3_4th_end_c 402 | 403 | SIZE3 = SIZE3_c 404 | 405 | 406 | # bitmask constants for knowns. 
407 | # 408 | # Knowns are eliminated algebraically from the equation system; if any knowns are specified, 409 | # the system to be solved (for a point x_i) will be smaller than the full system (e.g. "full" is 6x6 for 2nd order in 2D). 410 | # 411 | # The sensible default is to consider the function value F known, with all the 412 | # derivatives unknown. 413 | # 414 | # Note that here "known" means "known at point x_i" (the point at which we wish to compute the derivatives). 415 | # 416 | # Function values (F) are always assumed known at all *neighbor* points x_k, since they are used 417 | # for determining the local least-squares polynomial fit to the data. This fit is then used 418 | # as a local surrogate model for the unknown function f; in WLSQM, the derivatives are actually computed 419 | # from the surrogate. 420 | # 421 | # The option to have the function value (F) as an unknown is useful with Neumann BCs, if the neighborhoods 422 | # of the Neumann boundary points are chosen so that each Neumann boundary point only uses neighbors from 423 | # the interior of the domain. This gives the possibility to leave F free at all Neumann boundary points, 424 | # while prescribing only a derivative. 425 | # 426 | # (In practice, at slanted (i.e. not coordinate axis aligned) boundaries, local (tangent, normal) 427 | # coordinates must be used; i.e., the coordinate system in which the derivatives are to be computed 428 | # must be rotated to match the orientation of the boundary. This makes Y the normal derivative, 429 | # which can then be prescribed using this mechanism, while leaving the function value F free.)
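Eliminating knowns implies a renumbering between original and reduced DOF indices. Below is a hypothetical plain-Python mirror of such a mapping; the real mapping is built by wlsqm.fitter.infra (its `o2r`/`r2o` arrays and `remap()` helper), so this sketch only illustrates the idea under the assumption that unknowns keep their relative order, and is not the actual implementation:

```python
# Given a knowns bitmask, unknown DOFs are renumbered compactly, keeping
# their relative order (an assumption for this sketch). o2r maps
# original -> reduced (-1 marks a known DOF), r2o maps reduced -> original.
def build_maps(n, knowns_mask):
    o2r = [-1] * n
    r2o = []
    for j in range(n):
        if not (knowns_mask >> j) & 1:  # DOF j is unknown
            o2r[j] = len(r2o)
            r2o.append(j)
    return o2r, r2o

# 2nd-order fit in 2D: 6 DOFs; F (bit 0) and Y (bit 2) known.
o2r, r2o = build_maps(6, (1 << 0) | (1 << 2))
print(o2r)  # [-1, 0, -1, 1, 2, 3]
print(r2o)  # [1, 3, 4, 5]
```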
430 | 431 | # 1D case 432 | # 433 | b1_F = b1_F_c 434 | b1_X = b1_X_c 435 | b1_X2 = b1_X2_c 436 | b1_X3 = b1_X3_c 437 | b1_X4 = b1_X4_c 438 | 439 | 440 | # 2D case 441 | # 442 | b2_F = b2_F_c 443 | 444 | b2_X = b2_X_c 445 | b2_Y = b2_Y_c 446 | 447 | b2_X2 = b2_X2_c 448 | b2_XY = b2_XY_c 449 | b2_Y2 = b2_Y2_c 450 | 451 | b2_X3 = b2_X3_c 452 | b2_X2Y = b2_X2Y_c 453 | b2_XY2 = b2_XY2_c 454 | b2_Y3 = b2_Y3_c 455 | 456 | b2_X4 = b2_X4_c 457 | b2_X3Y = b2_X3Y_c 458 | b2_X2Y2 = b2_X2Y2_c 459 | b2_XY3 = b2_XY3_c 460 | b2_Y4 = b2_Y4_c 461 | 462 | 463 | # 3D case 464 | # 465 | b3_F = b3_F_c 466 | 467 | b3_X = b3_X_c 468 | b3_Y = b3_Y_c 469 | b3_Z = b3_Z_c 470 | 471 | b3_X2 = b3_X2_c 472 | b3_XY = b3_XY_c 473 | b3_Y2 = b3_Y2_c 474 | b3_YZ = b3_YZ_c 475 | b3_Z2 = b3_Z2_c 476 | b3_XZ = b3_XZ_c 477 | 478 | b3_X3 = b3_X3_c 479 | b3_X2Y = b3_X2Y_c 480 | b3_XY2 = b3_XY2_c 481 | b3_Y3 = b3_Y3_c 482 | b3_Y2Z = b3_Y2Z_c 483 | b3_YZ2 = b3_YZ2_c 484 | b3_Z3 = b3_Z3_c 485 | b3_XZ2 = b3_XZ2_c 486 | b3_X2Z = b3_X2Z_c 487 | b3_XYZ = b3_XYZ_c 488 | 489 | b3_X4 = b3_X4_c 490 | b3_X3Y = b3_X3Y_c 491 | b3_X2Y2 = b3_X2Y2_c 492 | b3_XY3 = b3_XY3_c 493 | b3_Y4 = b3_Y4_c 494 | b3_Y3Z = b3_Y3Z_c 495 | b3_Y2Z2 = b3_Y2Z2_c 496 | b3_YZ3 = b3_YZ3_c 497 | b3_Z4 = b3_Z4_c 498 | b3_XZ3 = b3_XZ3_c 499 | b3_X2Z2 = b3_X2Z2_c 500 | b3_X3Z = b3_X3Z_c 501 | b3_X2YZ = b3_X2YZ_c 502 | b3_XY2Z = b3_XY2Z_c 503 | b3_XYZ2 = b3_XYZ2_c 504 | 505 | -------------------------------------------------------------------------------- /wlsqm/fitter/impl.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Low-level routines: distance matrix generation, problem matrix generation, solver. 6 | # 7 | # JJ 2016-11-30 8 | 9 | # Set Cython compiler directives. 
This section must appear before any code! 10 | # 11 | # For available directives, see: 12 | # 13 | # http://docs.cython.org/en/latest/src/reference/compilation.html 14 | # 15 | # cython: wraparound = False 16 | # cython: boundscheck = False 17 | # cython: cdivision = True 18 | 19 | from __future__ import absolute_import 20 | 21 | from cython cimport view 22 | 23 | # See the infrastructure module for the definition of Case. 24 | cimport wlsqm.fitter.infra as infra 25 | 26 | #################################################### 27 | # Distance matrix (c) generation 28 | #################################################### 29 | 30 | cdef void make_c_nD( infra.Case* case, double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D ) nogil 31 | cdef void make_c_3D( infra.Case* case, double[::view.generic,::view.contiguous] xk ) nogil 32 | cdef void make_c_2D( infra.Case* case, double[::view.generic,::view.contiguous] xk ) nogil 33 | cdef void make_c_1D( infra.Case* case, double[::view.generic] xk ) nogil 34 | 35 | #################################################### 36 | # Problem matrix (A) generation 37 | #################################################### 38 | 39 | cdef void make_A( infra.Case* case ) nogil 40 | cdef void preprocess_A( infra.Case* case, int debug ) nogil 41 | 42 | #################################################### 43 | # RHS handling and solving 44 | #################################################### 45 | 46 | cdef void solve( infra.Case* case, double[::view.generic] fk, double[::view.generic,::view.contiguous] sens, int do_sens, int taskid ) nogil 47 | cdef int solve_iterative( infra.Case* case, double[::view.generic] fk, double[::view.generic,::view.contiguous] sens, int do_sens, int taskid, int max_iter, 48 | double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D ) nogil 49 | 50 | -------------------------------------------------------------------------------- /wlsqm/fitter/infra.pxd:
-------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Centralized memory allocation infrastructure. 6 | # 7 | # The implementation uses C-style object-oriented programming, with structs and 8 | # class name prefixed methods using an explicit self pointer argument. 9 | # 10 | # JJ 2016-11-30 11 | 12 | from __future__ import absolute_import 13 | 14 | ################################################# 15 | # Helper functions 16 | ################################################# 17 | 18 | cdef int number_of_dofs( int dimension, int order ) nogil 19 | cdef int number_of_reduced_dofs( int n, long long mask ) nogil 20 | cdef int remap( int* o2r, int* r2o, int n, long long mask ) nogil 21 | 22 | ################################################# 23 | # class Allocator: 24 | ################################################# 25 | 26 | cdef int ALLOC_MODE_PASSTHROUGH # pass each call through to C malloc/free 27 | cdef int ALLOC_MODE_ONEBIGBUFFER # pre-allocate one big buffer to fit everything in 28 | 29 | cdef struct Allocator: 30 | int mode # operation mode, see constants ALLOC_MODE_* 31 | void* buffer # start address of all storage 32 | int size_total # buffer size, bytes 33 | void* p # first currently unused address in buffer 34 | int size_used # bytes used up to now 35 | 36 | cdef Allocator* Allocator_new( int mode, int total_size_bytes ) nogil except 0 37 | cdef void* Allocator_malloc( Allocator* self, int size_bytes ) nogil 38 | cdef void Allocator_free( Allocator* self, void* p ) nogil 39 | cdef int Allocator_size_remaining( Allocator* self ) nogil 40 | cdef void Allocator_del( Allocator* self ) nogil 41 | 42 | ################################################# 43 | # class CaseManager: 44 | 
################################################# 45 | 46 | # Sizes needed for the various arrays in Case, as bytes. 47 | # 48 | # This is really just a struct; no methods. 49 | # 50 | cdef struct BufferSizes: 51 | int o2r # DOF mapping original --> reduced 52 | int r2o # DOF mapping reduced --> original 53 | 54 | int c # distance matrix 55 | int w # weights 56 | 57 | int A # problem matrix / its packed LU factorization 58 | int row_scale # row scaling factors for A (needed by solver to scale RHS) 59 | int col_scale # column scaling factors for A (needed by solver to scale solution) 60 | int ipiv # pivot information of LU factored A (needed by solver) 61 | 62 | int fi # coefficients of the fit; essentially, the function value and derivatives at the origin of the fit 63 | int fi2 # work space for coefficients for interpolating derivatives of the model (wlsqm.fitter.interp) to a general point 64 | 65 | int wrk # solver work space for RHS (or zero in managed mode) 66 | 67 | int fk_tmp # work space for iterative refinement (or zero in managed mode), remaining error at each point xk 68 | int fi_tmp # work space for iterative refinement (or zero in managed mode), coefficients of error reduction fit 69 | 70 | int total # sum of all the above 71 | 72 | # Infra class for multiple problem instances having the same dimension, order, knowns mask and flags (do_sens, iterative). 73 | # 74 | # This centralizes the memory allocation (to avoid unnecessary fragmentation) 75 | # when multiple problem instances are solved at one go. 76 | # 77 | # This class is only intended to be used from a Python thread. 78 | # 79 | cdef struct CaseManager: 80 | # parallel processing 81 | # 82 | # "per-task arrays": one work space per task, independent of the number of problem instances (cases). 
83 | # 84 | int ntasks 85 | double** wrks # array of work spaces for RHS 86 | double** fk_tmps # array of work spaces for iterative refinement, remaining error at each point xk 87 | double** fi_tmps # array of work spaces for iterative refinement, coefficients of error reduction fit 88 | 89 | # managed cases 90 | # 91 | Case** cases # array to store the Case pointers 92 | int max_cases # array capacity 93 | int ncases # currently used capacity 94 | 95 | int bytes_needed # total memory required for storing the work spaces and the arrays allocated by the Case objects 96 | 97 | # data common to all managed cases 98 | # 99 | Allocator* mal 100 | int dimension 101 | int do_sens 102 | int iterative 103 | 104 | cdef CaseManager* CaseManager_new( int dimension, int do_sens, int iterative, int max_cases, int ntasks ) nogil except 0 105 | cdef int CaseManager_add( CaseManager* self, Case* case ) nogil except -1 106 | cdef int CaseManager_commit( CaseManager* self ) nogil except -1 107 | #cdef int CaseManager_allocate( CaseManager* self ) nogil except -1 # private (not exported from the module) 108 | #cdef void CaseManager_deallocate( CaseManager* self ) nogil # private 109 | cdef void CaseManager_del( CaseManager* self ) nogil 110 | 111 | ################################################# 112 | # class Case: 113 | ################################################# 114 | 115 | # This class gathers some metadata and centralizes memory management for one problem instance. 116 | # 117 | # We use the above custom allocator to allocate all needed memory in one big block. 118 | # 119 | # A centralized mode is available (see the optional constructor parameter "cases") 120 | # to centralize memory allocation for a set of cases. 121 | # 122 | # TODO: refactor: make_c_nD(), make_A(), preprocess_A(), solve() now look a lot like methods of Case (the first parameter is a Case*, and its member variables are used extensively). 
123 | # TODO: could also store the point xi and other relevant stuff. (OTOH, currently no actual use case that needs them) 124 | # 125 | cdef struct Case: 126 | # infra 127 | int have_manager # 1 = has a CaseManager, using its allocator 128 | # 0 = no CaseManager, create an allocator locally 129 | CaseManager* manager 130 | Allocator* mal # custom memory allocator 131 | 132 | # case metadata 133 | int dimension # number of space dimensions 134 | int order # degree of polynomial to be fitted 135 | long long knowns # knowns bitmask 136 | int weighting_method # weighting: uniform or emphasize center region (see wlsqm.fitter.defs) 137 | int no # number of DOFs in original (unreduced) system 138 | int nr # number of DOFs in reduced system 139 | int nk # number of neighbor points used in fit 140 | int do_sens # flag: do sensitivity analysis? (affects memory usage) 141 | int iterative # flag: iterative refinement? (affects memory usage) 142 | 143 | # the origin point of the model (needed by certain routines) 144 | double xi # 1D, 2D, 3D 145 | double yi # 2D, 3D 146 | double zi # 3D 147 | 148 | # data pointers 149 | 150 | int geometry_owned # guest mode support: possible to use o2r,r2o,c,w,A,row_scale,col_scale,ipiv off another Case instance 151 | 152 | # DOF mappings 153 | int* o2r 154 | int* r2o 155 | 156 | # low level stuff: "c" matrix, weights 157 | double* c 158 | double* w 159 | 160 | # higher-level stuff: "A" matrix 161 | double* A 162 | double* row_scale 163 | double* col_scale 164 | int* ipiv 165 | 166 | # condition number (for wlsqm.fitter.impl.preprocess_A() debug mode) 167 | double cond_orig 168 | double cond_scaled 169 | 170 | # coefficients of the fit 171 | # 172 | # This name was chosen because fi[0] is the function value ("f") at the point (xi,yi), hence "i", 173 | # and the other elements store derivatives at the same point. 
174 | # 175 | double* fi 176 | double* fi2 # work space for coefficients for evaluating derivatives of the model (wlsqm.fitter.interp) at a general point 177 | 178 | # RHS work space for solver 179 | double* wrk 180 | 181 | # work space for iterative fitting algorithm 182 | double* fk_tmp 183 | double* fi_tmp 184 | 185 | cdef Case* Case_new( int dimension, int order, double xi, double yi, double zi, int nk, long long knowns, int weighting_method, int do_sens, int iterative, CaseManager* manager, Case* host ) nogil except 0 186 | cdef double* Case_get_wrk( Case* self, int taskid ) nogil 187 | cdef double* Case_get_fk_tmp( Case* self, int taskid ) nogil 188 | cdef double* Case_get_fi_tmp( Case* self, int taskid ) nogil 189 | cdef void Case_make_weights( Case* self, double max_d2 ) nogil # mainly for use by wlsqm.fitter.impl.make_c_nd() 190 | cdef void Case_set_fi( Case* self, double* fi ) nogil 191 | cdef void Case_get_fi( Case* self, double* out ) nogil 192 | #cdef void Case_determine_sizes( Case* self, BufferSizes* sizes ) nogil # private 193 | #cdef int Case_allocate( Case* self ) nogil except -1 # private 194 | #cdef void Case_deallocate( Case* self ) nogil # private 195 | cdef void Case_del( Case* self ) nogil 196 | 197 | -------------------------------------------------------------------------------- /wlsqm/fitter/infra.pyx: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Centralized memory allocation infrastructure. 6 | # 7 | # JJ 2016-11-30 8 | 9 | # Set Cython compiler directives. This section must appear before any code! 
10 | # 11 | # For available directives, see: 12 | # 13 | # http://docs.cython.org/en/latest/src/reference/compilation.html 14 | # 15 | # cython: wraparound = False 16 | # cython: boundscheck = False 17 | # cython: cdivision = True 18 | 19 | # Total memory needed for arrays: 20 | # - no*sizeof(int) bytes for o2r; one shared copy is enough (bypass custom alloc, since only remap(), which needs this, computes "nr") 21 | # - no*sizeof(int) bytes for r2o; one shared copy is enough ( --''-- ) 22 | # - nprob*nk*no*sizeof(double) bytes for c (actually, sum(nk_j, j in problems)*no*sizeof(double)) 23 | # - nprob*nr*sizeof(double) bytes for row_scale 24 | # - nprob*nr*sizeof(double) bytes for column_scale 25 | # - nprob*nr*nr*sizeof(double) bytes for A 26 | # - nprob*nr*sizeof(int) bytes for ipiv 27 | # - solve: 28 | # - if do_sens, ntasks*nr*(nk+1)*sizeof(double) bytes for wrk 29 | # - use max(nk_j) here to fit the largest problem instance (since any thread may run it) 30 | # - if not do_sens, ntasks*nr*sizeof(double) bytes for wrk 31 | # - iterative refinement: 32 | # - ntasks*nk*sizeof(double) bytes for fk_tmp 33 | # - max(nk_j) here too, same reason 34 | # - ntasks*no*sizeof(double) bytes for fi_tmp 35 | 36 | from __future__ import division, print_function, absolute_import 37 | 38 | from libc.stdlib cimport malloc, free 39 | from libc.math cimport sqrt 40 | 41 | cimport wlsqm.fitter.defs as defs # C constants 42 | 43 | # use GCC's intrinsics for counting the number of set bits in an int 44 | # 45 | # See 46 | # http://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer (algorithms, suggestions) 47 | # https://gist.github.com/craffel/e470421958cad33df550 (Cython defs; on popcounting a NumPy array) 48 | # 49 | cdef extern from "popcount.h": 50 | int __builtin_popcount(unsigned int) nogil 51 | int __builtin_popcountll(unsigned long long) nogil 52 | 53 | ##################################### 54 | # Helper functions 55 | 
##################################### 56 | 57 | # Return number of DOFs in the original (unreduced) system. 58 | # 59 | # dimension : in, number of space dimensions (1, 2 or 3) 60 | # order : in, the order of the polynomial to be fitted 61 | # 62 | cdef int number_of_dofs( int dimension, int order ) nogil: 63 | if dimension not in [1,2,3]: 64 | return -1 65 | # with gil: 66 | # raise ValueError( "dimension must be 1, 2 or 3; got %d" % dimension ) 67 | if order not in [0,1,2,3,4]: 68 | return -2 69 | # with gil: 70 | # raise ValueError( "order must be 0, 1, 2, 3 or 4; got %d" % order ) 71 | 72 | cdef int no 73 | if dimension == 3: 74 | if order == 4: 75 | no = defs.i3_4th_end_c 76 | elif order == 3: 77 | no = defs.i3_3rd_end_c 78 | elif order == 2: 79 | no = defs.i3_2nd_end_c 80 | elif order == 1: 81 | no = defs.i3_1st_end_c 82 | else: # order == 0: 83 | no = defs.i3_0th_end_c 84 | elif dimension == 2: 85 | if order == 4: 86 | no = defs.i2_4th_end_c 87 | elif order == 3: 88 | no = defs.i2_3rd_end_c 89 | elif order == 2: 90 | no = defs.i2_2nd_end_c 91 | elif order == 1: 92 | no = defs.i2_1st_end_c 93 | else: # order == 0: 94 | no = defs.i2_0th_end_c 95 | else: # dimension == 1: 96 | if order == 4: 97 | no = defs.i1_4th_end_c 98 | elif order == 3: 99 | no = defs.i1_3rd_end_c 100 | elif order == 2: 101 | no = defs.i1_2nd_end_c 102 | elif order == 1: 103 | no = defs.i1_1st_end_c 104 | else: # order == 0: 105 | no = defs.i1_0th_end_c 106 | 107 | return no 108 | 109 | # Return the number of DOFs in the reduced system, corresponding to an original (unreduced) number of DOFs n and a knowns mask. 
110 | # 111 | # n : in, number of DOFs in the original (unreduced) system 112 | # mask : in, bitmask of knowns 113 | # 114 | cdef int number_of_reduced_dofs( int n, long long mask ) nogil: 115 | cdef int ne = __builtin_popcountll(mask) # number of eliminated DOFs = number of bits set in mask 116 | return n - ne # remaining DOFs 117 | 118 | # Reduce the system size by removing the rows/columns for knowns from the DOF numbering. 119 | # 120 | # Specifically: 121 | # 122 | # Given a bitmask of DOFs to eliminate, construct DOF number mappings 123 | # between the original full equation system and the reduced equation system. 124 | # 125 | # o2r : out, mapping original --> reduced; size (n,), must be allocated by caller 126 | # r2o : out, mapping reduced --> original; size (n,), must be allocated by caller 127 | # n : in, number of DOFs in the original (unreduced) system 128 | # mask : in, bitmask of knowns 129 | # 130 | # return value: the number of DOFs in the reduced system. 131 | # 132 | # In the arrays, non-existent DOFs will be represented by the special value -1. 133 | # 134 | # In o2r (original->reduced), non-existent DOFs are those that were eliminated 135 | # (hence have no index in the reduced system). 136 | # 137 | # In r2o (reduced->original), non-existent DOFs are those with index >= n_reduced, 138 | # where n_reduced = (n - n_eliminated), since the reduced system has only n_reduced DOFs in total. 139 | # 140 | cdef int remap( int* o2r, int* r2o, int n, long long mask ) nogil: # o = original, r = reduced 141 | # We always start the elimination with a full range(n) of DOFs. 
142 | # 143 | # For example, if we have 4 DOFs, and we would like to eliminate the DOF "1", 144 | # we construct the following mappings: 145 | # 146 | # orig -> reduced 147 | # 148 | # 0 -> 0 149 | # 1 -> -1 (original DOF "1" does not exist in reduced system) 150 | # 2 -> 1 151 | # 3 -> 2 152 | # 153 | # reduced -> orig 154 | # 155 | # 0 -> 0 156 | # 1 -> 2 157 | # 2 -> 3 158 | # 3 -> -1 (in the reduced system, there is no DOF "3") 159 | # 160 | # The right-hand sides can be expressed in array form, using the left-hand side as the array index: 161 | # 162 | # orig->reduced: [0, -1, 1, 2] 163 | # reduced->orig: [0, 2, 3, -1] 164 | # 165 | # These arrays are the output format. 166 | # 167 | # This says that e.g. the DOF "2" of orig maps to DOF "1" of reduced (array orig->reduced, index 2). 168 | # The DOF "1" of reduced maps to DOF "2" of orig (array reduced->orig, index 1). 169 | 170 | # We first generate orig -> reduced. 171 | # 172 | cdef int j, k=0 # k = first currently available DOF number (0-based) in the reduced system 173 | for j in range(n): 174 | if mask & (1LL << j): # eliminate this DOF? 175 | o2r[j] = -1 176 | else: 177 | o2r[j] = k 178 | k += 1 # a DOF was introduced into the reduced system 179 | 180 | # k is now the number of DOFs in the reduced system 181 | 182 | # Construct the inverse to obtain reduced -> orig. See the example above. 183 | # 184 | for j in range(n): 185 | if o2r[j] == -1: 186 | continue 187 | r2o[ o2r[j] ] = j 188 | 189 | # In the reduced -> orig mapping, set the rest to -1, since the reduced system 190 | # has fewer DOFs than the original one. 191 | # 192 | for j in range(k, n): 193 | r2o[j] = -1 194 | 195 | return k 196 | 197 | 198 | ################################################# 199 | # class Allocator: 200 | ################################################# 201 | 202 | # To avoid memory fragmentation in the case with many instances of the model being fitted at once, 203 | # we use a custom memory allocator. 
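The o2r/r2o construction implemented by remap() above can be modeled in plain Python. This standalone sketch (hypothetical, for illustration only; not part of the library) reproduces the worked example from the comments:

```python
def remap(n, mask):
    """Build DOF mappings original->reduced (o2r) and reduced->original (r2o).

    Mirrors the logic of the C-level remap(): DOFs whose bit is set in
    `mask` are eliminated; nonexistent DOFs are marked with -1.
    """
    o2r = [-1] * n
    r2o = [-1] * n
    k = 0  # next free DOF number in the reduced system
    for j in range(n):
        if not (mask & (1 << j)):  # keep this DOF
            o2r[j] = k
            k += 1
    for j in range(n):
        if o2r[j] != -1:
            r2o[o2r[j]] = j  # invert the mapping
    return o2r, r2o, k  # k = number of DOFs in the reduced system

# The example from the comments: 4 DOFs, eliminate DOF "1" (bit 1 set in the mask).
o2r, r2o, nr = remap(4, 0b0010)
print(o2r)  # [0, -1, 1, 2]
print(r2o)  # [0, 2, 3, -1]
print(nr)   # 3
```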
204 | # 205 | # This is very simplistic; we do not need to support the re-use of already allocated blocks. 206 | # 207 | # Example: 208 | # int total_size_bytes = 1000000 # 1 MB 209 | # Allocator* a = Allocator_new( ALLOC_MODE_ONEBIGBUFFER, total_size_bytes ) 210 | # int* my_block_1 = Allocator_malloc( a, 100*sizeof(int) ) 211 | # # ...other Allocator_malloc()'s... 212 | # # ... 213 | # Allocator_free( a, my_block_1 ) 214 | # # ...other Allocator_free()'s... 215 | # Allocator_del( a ) 216 | 217 | # Object-oriented programming, C style. 218 | cdef int ALLOC_MODE_PASSTHROUGH = 1 # pass each call through to C malloc/free 219 | cdef int ALLOC_MODE_ONEBIGBUFFER = 2 # pre-allocate one big buffer to fit everything in 220 | 221 | # Constructor. 222 | # 223 | # This class is only intended to be used from a Python thread. 224 | # 225 | # Note that ALLOC_MODE_PASSTHROUGH doesn't use total_size_bytes, but .pxd files do not support default values for function arguments, 226 | # and Cython's "int total_size_bytes=*" syntax (in the .pxd file) does not support nogil functions. 
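The bump-pointer bookkeeping that Allocator_malloc() performs in ALLOC_MODE_ONEBIGBUFFER mode can be sketched in pure Python (a hypothetical illustration, not part of the library; integer offsets into an imaginary buffer stand in for pointers):

```python
class BumpAllocator:
    """Pure-Python model of the one-big-buffer mode (illustrative only)."""
    def __init__(self, size_total):
        self.size_total = size_total  # capacity of the big buffer, bytes
        self.size_used = 0            # bytes handed out so far

    def malloc(self, size_bytes):
        # Return the start offset of a fresh block, or None if the buffer is full.
        if size_bytes > self.size_total - self.size_used:
            return None  # the Cython version returns NULL here
        p = self.size_used
        self.size_used += size_bytes  # bump the pointer past the new block
        return p

    def free(self, p):
        pass  # no-op: this simplistic allocator never reuses blocks

    def size_remaining(self):
        return self.size_total - self.size_used

a = BumpAllocator(1000)
print(a.malloc(400))       # 0
print(a.malloc(400))       # 400
print(a.malloc(400))       # None (only 200 bytes left)
print(a.size_remaining())  # 200
```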
227 | # 228 | cdef Allocator* Allocator_new( int mode, int total_size_bytes ) nogil except 0: 229 | cdef Allocator* self = <Allocator*>malloc( sizeof(Allocator) ) 230 | if self == 0: # we promised Cython not to return NULL, so we must raise if the malloc fails 231 | with gil: 232 | raise MemoryError("Out of memory trying to allocate an Allocator object") 233 | 234 | if mode == ALLOC_MODE_ONEBIGBUFFER and total_size_bytes > 0: 235 | self.buffer = malloc( total_size_bytes ) 236 | if self.buffer == 0: 237 | with gil: 238 | raise MemoryError("Out of memory trying to allocate a buffer of %d bytes" % (total_size_bytes)) 239 | else: 240 | self.buffer = 0 241 | 242 | self.mode = mode 243 | self.size_total = total_size_bytes 244 | self.p = self.buffer 245 | self.size_used = 0 246 | 247 | return self 248 | 249 | cdef void* Allocator_malloc( Allocator* self, int size_bytes ) nogil: 250 | if self.mode == ALLOC_MODE_PASSTHROUGH: 251 | # with gil: 252 | # print( "directly allocating %d bytes" % (size_bytes) ) # DEBUG 253 | return malloc( size_bytes ) 254 | 255 | # else... 256 | 257 | # pathological case: no buffer, can't allocate 258 | if self.buffer == 0: 259 | return 0 260 | 261 | # check that there is enough space remaining in the buffer 262 | cdef int size_remaining = self.size_total - self.size_used 263 | if size_bytes > size_remaining: 264 | # with gil: 265 | # print( "buffer full, cannot allocate %d bytes" % (size_bytes) ) # DEBUG 266 | return 0 267 | 268 | cdef void* p 269 | with gil: # since we are called from Python threads only, we can use the GIL to make the operation thread-safe. (TODO/FIXME: well, not exactly, see e.g. 
http://www.slideshare.net/dabeaz/an-introduction-to-python-concurrency ) 270 | # print( "reserving %d bytes from buffer of size %d; after alloc, %d bytes remaining" % (size_bytes, self.size_total, size_remaining - size_bytes) ) # DEBUG 271 | p = self.p 272 | self.p = <void*>( <char*>p + size_bytes ) 273 | self.size_used += size_bytes 274 | 275 | return p 276 | 277 | cdef void Allocator_free( Allocator* self, void* p ) nogil: 278 | if self.mode == ALLOC_MODE_PASSTHROUGH: 279 | free( p ) 280 | # else do nothing; this simplistic allocator doesn't reuse blocks once they are allocated 281 | 282 | cdef int Allocator_size_remaining( Allocator* self ) nogil: 283 | return self.size_total - self.size_used 284 | 285 | # Destructor. 286 | cdef void Allocator_del( Allocator* self ) nogil: 287 | if self != 0: 288 | free( self.buffer ) 289 | free( self ) 290 | 291 | 292 | ################################################# 293 | # class CaseManager: 294 | ################################################# 295 | 296 | # Constructor. 297 | # 298 | # max_cases is mandatory (with an invalid default value since no sane default can exist). 299 | # 300 | # ntasks is for parallel processing at solve time; effectively, it specifies how many per-task arrays to allocate. 301 | # When processing serially, use the value 1. 302 | # 303 | cdef CaseManager* CaseManager_new( int dimension, int do_sens, int iterative, int max_cases, int ntasks ) nogil except 0: 304 | # Generally speaking, fixing the array size at instantiation time is stupid (an automatically expanding buffer would be better), 305 | # but considering that this class has only one actual user, where we do know max_cases in advance, it is fine for our purposes. 
306 | if max_cases < 1: 307 | with gil: 308 | raise ValueError("Must specify max_cases > 0 when creating a CaseManager.") 309 | 310 | cdef CaseManager* self = <CaseManager*>malloc( sizeof(CaseManager) ) 311 | if self == 0: # we promised Cython not to return NULL, so we must raise if the malloc fails 312 | with gil: 313 | raise MemoryError("Out of memory trying to allocate a CaseManager object") 314 | 315 | # init parallel proc 316 | # 317 | self.ntasks = ntasks 318 | self.wrks = <double**>malloc( ntasks*sizeof(double*) ) 319 | self.fk_tmps = <double**>malloc( ntasks*sizeof(double*) ) 320 | self.fi_tmps = <double**>malloc( ntasks*sizeof(double*) ) 321 | for j in range(ntasks): # init to NULL needed to gracefully handle errors in CaseManager_allocate() 322 | self.wrks[j] = 0 323 | self.fk_tmps[j] = 0 324 | self.fi_tmps[j] = 0 325 | 326 | # init storage for Case pointers 327 | # 328 | self.cases = <Case**>malloc( max_cases*sizeof(Case*) ) 329 | self.max_cases = max_cases 330 | self.ncases = 0 331 | 332 | # save metadata 333 | # 334 | self.dimension = dimension 335 | self.do_sens = do_sens 336 | self.iterative = iterative 337 | 338 | # these will be set up at allocate time 339 | # 340 | self.mal = 0 341 | self.bytes_needed = 0 342 | 343 | return self 344 | 345 | # Add a case to this manager. 346 | # 347 | # This means the manager will manage the memory for the Case, and at destruction time, 348 | # will also destroy the managed Case. 349 | # 350 | # Up to max_cases Case objects can be added to the manager (see CaseManager_new()). 351 | # 352 | # Case_new() will call this automatically, if a manager is specified. 
353 | # 354 | cdef int CaseManager_add( CaseManager* self, Case* case ) nogil except -1: 355 | # sanity check remaining space 356 | if self.ncases == self.max_cases: 357 | with gil: 358 | raise MemoryError("Case pointer buffer full, max_cases = %d reached" % self.max_cases) 359 | 360 | # sanity check Case metadata for compatibility with this CaseManager instance 361 | if case.dimension != self.dimension: 362 | with gil: 363 | raise ValueError("Cannot add case with different dimension = %d; this manager has dimension = %d" % (case.dimension, self.dimension)) 364 | if case.do_sens != self.do_sens: 365 | with gil: 366 | raise ValueError("Cannot add case with different setting for do_sens = %d; this manager has do_sens = %d" % (case.do_sens, self.do_sens)) 367 | if case.iterative != self.iterative: 368 | with gil: 369 | raise ValueError("Cannot add case with different setting for iterative = %d; this manager has iterative = %d" % (case.iterative, self.iterative)) 370 | 371 | # add the Case to the managed cases. 372 | self.cases[self.ncases] = case 373 | self.ncases += 1 374 | 375 | return 0 376 | 377 | # Finish adding Case objects. Prepare the manaager for solving. 378 | # 379 | # This should be called exactly once (per instance of CaseManager), after all cases have been CaseManager_add()'d. 380 | # 381 | cdef int CaseManager_commit( CaseManager* self ) nogil except -1: 382 | return CaseManager_allocate( self ) 383 | 384 | # Create the memory buffer that will contain the data arrays for all cases. 
385 | # 386 | cdef int CaseManager_allocate( CaseManager* self ) nogil except -1: 387 | cdef int j, problem_instance_bytes=0, task_bytes=0 388 | cdef int max_nk=0, max_no=0, max_nr=0 389 | cdef int size_wrk=0, size_fk_tmp=0, size_fi_tmp=0 390 | cdef BufferSizes sizes 391 | cdef Case* case 392 | with gil: 393 | try: 394 | if self.ncases == 0: 395 | raise ValueError("No cases; add some before allocating") 396 | 397 | # Determine total amount of memory needed by the per-problem-instance arrays. 398 | # 399 | # Also, find max_nk for allocation of per-task arrays. (nk may vary across cases) 400 | # 401 | for j in range(self.ncases): 402 | case = self.cases[j] 403 | Case_determine_sizes( case, &sizes ) 404 | 405 | problem_instance_bytes += sizes.total 406 | 407 | if case.nk > max_nk: 408 | max_nk = case.nk 409 | if case.no > max_no: 410 | max_no = case.no 411 | if case.nr > max_nr: 412 | max_nr = case.nr 413 | 414 | # Determine maximum needed size for one instance of the per-task arrays, 415 | # when working on this set of cases. 416 | # 417 | if self.do_sens: 418 | size_wrk = max_nr*(max_nk + 1)*sizeof(double) 419 | else: 420 | size_wrk = max_nr*sizeof(double) 421 | 422 | if self.iterative: 423 | size_fk_tmp = max_nk*sizeof(double) 424 | size_fi_tmp = max_no*sizeof(double) 425 | else: 426 | size_fk_tmp = 0 427 | size_fi_tmp = 0 428 | 429 | # The total for the per-task arrays is then just ntasks copies: 430 | # 431 | task_bytes = self.ntasks*(size_wrk + size_fk_tmp + size_fi_tmp) 432 | 433 | # Final total of memory needed is thus: 434 | # 435 | self.bytes_needed = task_bytes + problem_instance_bytes 436 | 437 | # NOTE: this may be big (e.g. 
~5.5kB per problem instance for dimension=2, order=4, nk=25, 438 | # so for a moderate number of 1e4 problem instances, this is already 55MB) 439 | # 440 | # The good news is that even if multiple fits (against different data) are performed with the same points, 441 | # we can simply let the buffer be - there is no need to re-create it for each run. 442 | # 443 | self.mal = Allocator_new( mode=ALLOC_MODE_ONEBIGBUFFER, total_size_bytes=self.bytes_needed ) 444 | 445 | # Allocate the per-task arrays. 446 | # 447 | for j in range(self.ntasks): 448 | self.wrks[j] = <double*>Allocator_malloc( self.mal, size_wrk ) 449 | self.fk_tmps[j] = <double*>Allocator_malloc( self.mal, size_fk_tmp ) 450 | self.fi_tmps[j] = <double*>Allocator_malloc( self.mal, size_fi_tmp ) 451 | 452 | # Finally, tell the Case objects to allocate their memory. 453 | # 454 | # They will automatically grab our allocator, since they are in managed mode. 455 | # 456 | for j in range(self.ncases): 457 | Case_allocate( self.cases[j] ) 458 | 459 | except: 460 | # on error, leave the CaseManager in the state it was in before this method was called. 461 | CaseManager_deallocate( self ) 462 | raise 463 | 464 | return 0 465 | 466 | # The opposite of CaseManager_allocate(). 467 | # 468 | cdef void CaseManager_deallocate( CaseManager* self ) nogil: 469 | if self != 0: 470 | self.bytes_needed = 0 471 | 472 | if self.mal != 0: # the allocator instantiation may have failed, so make sure we have an allocator before attempting this 473 | # the managed Case objects also use our allocator 474 | for j in range(self.ncases): 475 | Case_deallocate( self.cases[j] ) # it is safe to Case_deallocate() also a Case that has not yet been Case_allocate()'d. 
476 | 477 | for j in range(self.ntasks): 478 | Allocator_free( self.mal, self.fi_tmps[j] ) 479 | self.fi_tmps[j] = 0 480 | Allocator_free( self.mal, self.fk_tmps[j] ) 481 | self.fk_tmps[j] = 0 482 | Allocator_free( self.mal, self.wrks[j] ) 483 | self.wrks[j] = 0 484 | 485 | Allocator_del( self.mal ) 486 | self.mal = 0 487 | 488 | # Destructor. Destroys also the managed Case objects. 489 | # 490 | cdef void CaseManager_del( CaseManager* self ) nogil: 491 | cdef int j 492 | if self != 0: 493 | CaseManager_deallocate( self ) 494 | 495 | # destroy the managed Case objects 496 | for j in range(self.ncases): 497 | Case_del( self.cases[j] ) 498 | 499 | # free manually allocated storage 500 | free( self.cases ) 501 | free( self.fi_tmps ) 502 | free( self.fk_tmps ) 503 | free( self.wrks ) 504 | 505 | free( self ) 506 | 507 | 508 | ################################################# 509 | # class Case: 510 | ################################################# 511 | 512 | # Constructor. 513 | # 514 | # This class is only intended to be instantiated from a Python thread. 515 | # 516 | # manager: an already existing CaseManager object to use, to share the memory allocator among a set of cases. 517 | # The cases must have the same dimension, do_sens, iterative. 518 | # 519 | # If null, an Allocator will be created locally. 520 | # 521 | # host: for guest mode, an existing Case object to use. The geometry data (o2r,r2o,c,w,A,row_scale,col_scale,ipiv) will be borrowed off the host, 522 | # and no local copies will be created. 523 | # 524 | # This can be used to save both memory and time when different fields (in an IBVP problem) live on the exact same geometry. 525 | # "Geometry" includes both xi,yi,zi (point "xi") and the neighbor set (points "xk"; see wlsqm.fitter.impl.make_c_?D()). 526 | # 527 | # Thus, the host Case instance must have the exact same parameters (and geometry!) as the Case instance being created. 
528 | "Parameters" include dimension, order, nk, knowns, weighting_method. 529 | # 530 | # The parameter match is not checked! See wlsqm.fitter.expert.ExpertSolver for correct usage. 531 | # (It does some rudimentary checking, but does not check the geometry.) 532 | # 533 | # When using guest mode, the calling code must make sure the host instance stays alive at least as long as its guest instances, 534 | # or hope for a crash. 535 | # 536 | # If null, the geometry data will be allocated locally. 537 | # 538 | cdef Case* Case_new( int dimension, int order, double xi, double yi, double zi, int nk, long long knowns, int weighting_method, int do_sens, int iterative, CaseManager* manager, Case* host ) nogil except 0: 539 | cdef Case* self = <Case*>malloc( sizeof(Case) ) 540 | if self == 0: # we promised Cython not to return NULL, so we must raise if the malloc fails 541 | with gil: 542 | raise MemoryError("Out of memory trying to allocate a Case object") 543 | 544 | # tag unused components as NaN 545 | cdef double zero = 0 546 | cdef double nan = zero/zero # NaN as per IEEE-754 547 | self.xi = xi 548 | self.yi = yi if dimension >= 2 else nan 549 | self.zi = zi if dimension == 3 else nan 550 | 551 | # init data pointers to NULL to make it safe to dealloc partially initialized Case (when something goes wrong) 552 | self.o2r = 0 553 | self.r2o = 0 554 | self.c = 0 555 | self.w = 0 556 | self.A = 0 557 | self.row_scale = 0 558 | self.col_scale = 0 559 | self.ipiv = 0 560 | self.fi = 0 561 | self.fi2 = 0 562 | self.wrk = 0 563 | self.fk_tmp = 0 564 | self.fi_tmp = 0 565 | 566 | if host == 0: 567 | self.geometry_owned = 1 568 | 569 | # set condition numbers to nan until computed (only computed if wlsqm.fitter.impl.prepare() is called with the debug flag set!) 
570 | self.cond_orig = nan 571 | self.cond_scaled = nan 572 | 573 | else: 574 | self.geometry_owned = 0 575 | self.o2r = host.o2r 576 | self.r2o = host.r2o 577 | self.c = host.c 578 | self.w = host.w 579 | self.A = host.A 580 | self.row_scale = host.row_scale 581 | self.col_scale = host.col_scale 582 | self.ipiv = host.ipiv 583 | 584 | # these may have been computed by host (this only works if the host has had preprocess_A() called on it already, but this is the best we can do) 585 | self.cond_orig = host.cond_orig 586 | self.cond_scaled = host.cond_scaled 587 | 588 | # Use the data from CaseManager if given 589 | self.have_manager = (manager != 0) 590 | self.manager = manager # (copies the pointer also if NULL) 591 | self.mal = 0 # this will be filled at allocate time 592 | 593 | # determine number of DOFs in the original (unreduced) and reduced systems (needed to determine array sizes) 594 | cdef int no = number_of_dofs( dimension, order ) 595 | cdef int nr = number_of_reduced_dofs( no, knowns ) 596 | 597 | # save metadata 598 | self.dimension = dimension 599 | self.order = order 600 | self.knowns = knowns 601 | self.weighting_method = weighting_method 602 | self.no = no 603 | self.nr = nr 604 | self.nk = nk 605 | self.do_sens = do_sens 606 | self.iterative = iterative 607 | 608 | # Now the Case is in a half-initialized state, with metadata available, but no memory allocated yet. 609 | 610 | if self.have_manager: 611 | # In managed mode, cases automatically add themselves to the manager. 612 | with gil: 613 | try: 614 | CaseManager_add( self.manager, self ) # this may raise if the buffer is full 615 | except: 616 | free( self ) 617 | raise 618 | else: 619 | # In unmanaged mode, for caller convenience, cases fully initialize themselves, since there is no separate allocate step 620 | # that depends on having all the cases available (to compute the final buffer size). 
621 | Case_allocate( self ) 622 | 623 | return self 624 | 625 | # Getters for work space pointers for parallel task "taskid" (0, 1, ..., ntasks-1). 626 | # 627 | # In managed mode, the work spaces live in the manager (there are ntasks copies). 628 | # 629 | # In unmanaged mode, each Case has its own work space. 630 | # 631 | cdef double* Case_get_wrk( Case* self, int taskid ) nogil: 632 | if self.have_manager: 633 | return self.manager.wrks[taskid] 634 | else: 635 | return self.wrk 636 | 637 | cdef double* Case_get_fk_tmp( Case* self, int taskid ) nogil: 638 | if self.have_manager: 639 | return self.manager.fk_tmps[taskid] 640 | else: 641 | return self.fk_tmp 642 | 643 | cdef double* Case_get_fi_tmp( Case* self, int taskid ) nogil: 644 | if self.have_manager: 645 | return self.manager.fi_tmps[taskid] 646 | else: 647 | return self.fi_tmp 648 | 649 | # Helper: convert an (nk,) array of squared distances to corresponding weights. 650 | # 651 | # w : in/out. On entry, squared distances from xi to each xk. 652 | # On exit, weight factors for each xk. 653 | # nk : in, number of neighbor points (i.e. points xk) 654 | # max_d2 : in, the largest squared distance seen (i.e. the max element of input w). 655 | # This is used for normalization. 656 | # weighting_method : in, one of the constants WEIGHT_*. Specifies the type of weighting to use; 657 | # different weightings are good for different use cases of WLSQM. 658 | # 659 | cdef void Case_make_weights( Case* self, double max_d2 ) nogil: 660 | cdef double* w = self.w 661 | cdef int nk = self.nk 662 | cdef int weighting_method = self.weighting_method 663 | 664 | # no-op in guest mode (weights already computed in the host Case instance) 665 | if not self.geometry_owned: 666 | return 667 | 668 | cdef int k 669 | cdef double d2, tmp 670 | if weighting_method == defs.WEIGHT_UNIFORM_c: 671 | # Trivial weighting. Don't use distance information, treat all points as equally important. 
672 | # 673 | # This gives the best overall fit of function values across all points xk, 674 | # at the cost of accuracy of derivatives at the point xi. 675 | # 676 | # (Essentially, this cost is because derivatives are local, so the information 677 | # from far-away points corrupts them.) 678 | # 679 | for k in range(nk): 680 | w[k] = 1. 681 | 682 | else: # weighting_method == defs.WEIGHT_CENTER_c: 683 | # Emphasize points close to xi. 684 | # 685 | # Improves the fit of derivatives at the point xi, at the cost of the overall fit 686 | # of function values at points xk that are (relatively speaking) distant from xi. 687 | # 688 | for k in range(nk): 689 | d2 = w[k] # the array w originally contains squared distances (without normalization) 690 | 691 | # distance squared, flipped on the distance axis (fast falloff near origin) 692 | DEF alpha = 1e-4 # weight remaining at maximum distance 693 | DEF beta = 1. - alpha 694 | tmp = 1. - sqrt(d2 / max_d2) 695 | w[k] = alpha + beta * tmp*tmp 696 | 697 | # Determine how many bytes of memory this Case will need for storing its arrays. 698 | # 699 | # Write the result into the given BufferSizes struct. 700 | # 701 | cdef void Case_determine_sizes( Case* self, BufferSizes* sizes ) nogil: 702 | cdef int no = self.no 703 | cdef int nr = self.nr 704 | cdef int nk = self.nk 705 | cdef int do_sens = self.do_sens 706 | cdef int iterative = self.iterative 707 | 708 | if self.geometry_owned: 709 | sizes.o2r = no*sizeof(int) # (no,) 710 | sizes.r2o = no*sizeof(int) # (no,) 711 | sizes.c = nk*no*sizeof(double) # (nk,no), C-contiguous 712 | sizes.w = nk*sizeof(double) # (nk,) 713 | sizes.A = nr*nr*sizeof(double) # (nr, nr), Fortran-contiguous 714 | sizes.row_scale = nr*sizeof(double) # (nr,) 715 | sizes.col_scale = nr*sizeof(double) # (nr,) 716 | sizes.ipiv = nr*sizeof(int) # (nr,) 717 | else: 718 | # This function computes only bytes needed, so in guest mode we can put zeroes here. 
719 | # This function is not used for determining the number of elements in anything. 720 | sizes.o2r = 0 721 | sizes.r2o = 0 722 | sizes.c = 0 723 | sizes.w = 0 724 | sizes.A = 0 725 | sizes.row_scale = 0 726 | sizes.col_scale = 0 727 | sizes.ipiv = 0 728 | 729 | # The coefficient array is always needed. 730 | # 731 | sizes.fi = no*sizeof(double) # (no,) 732 | 733 | # For any polynomial of degree d >= 1, its (non-zero) derivatives are a polynomial of degree d - 1. 734 | # In the zeroth order case, the derivative is everywhere zero. 735 | cdef int no2 = number_of_dofs( self.dimension, self.order - 1 ) if self.order >= 1 else 0 736 | sizes.fi2 = no2*sizeof(double) # (no2,) 737 | 738 | # per-task work space arrays 739 | # 740 | if self.have_manager: 741 | # in managed mode, CaseManager will allocate one copy of the per-task (not per-problem-instance) arrays 742 | sizes.wrk = 0 743 | sizes.fk_tmp = 0 744 | sizes.fi_tmp = 0 745 | else: 746 | # unmanaged mode - allocate also the per-task arrays locally. 747 | # see the header comment of solve() for the solver work space sizes 748 | if do_sens: 749 | sizes.wrk = nr*(nk + 1)*sizeof(double) 750 | else: 751 | sizes.wrk = nr*sizeof(double) 752 | 753 | if iterative: 754 | sizes.fk_tmp = nk*sizeof(double) 755 | sizes.fi_tmp = no*sizeof(double) 756 | else: 757 | sizes.fk_tmp = 0 758 | sizes.fi_tmp = 0 759 | 760 | sizes.total = sizes.o2r + sizes.r2o \ 761 | + sizes.c + sizes.w \ 762 | + sizes.A + sizes.row_scale + sizes.col_scale + sizes.ipiv \ 763 | + sizes.fi + sizes.fi2 \ 764 | + sizes.wrk \ 765 | + sizes.fk_tmp + sizes.fi_tmp 766 | 767 | # Load user-given data into the coefficients fi. 768 | # 769 | # The length of the input is assumed to be self.no. 770 | # 771 | # This can be used to populate knowns. 
772 | # 773 | cdef void Case_set_fi( Case* self, double* fi ) nogil: 774 | cdef double* my_fi = self.fi 775 | cdef int no = self.no 776 | cdef int om 777 | for om in range(no): 778 | my_fi[om] = fi[om] 779 | 780 | # Populate user-given array of length self.no 781 | # with the solution data (coefficients self.fi). 782 | # 783 | cdef void Case_get_fi( Case* self, double* out ) nogil: 784 | cdef double* my_fi = self.fi 785 | cdef int no = self.no 786 | cdef int om 787 | for om in range(no): 788 | out[om] = my_fi[om] 789 | 790 | # Perform memory allocation. 791 | # 792 | cdef int Case_allocate( Case* self ) nogil except -1: 793 | # At this point, the constructor has finished, so we have no, nr, nk and the flags (do_sens, iterative). 794 | # 795 | # Calculate the (space-)optimal buffer size for ONEBIGBUFFER mode. 796 | # 797 | # (This also calculates the various individual sizes, which we will use to actually allocate the memory.) 798 | # 799 | cdef BufferSizes sizes 800 | Case_determine_sizes( self, &sizes ) 801 | 802 | # Acquire or create the custom memory allocator to allocate storage for actual data. 803 | # 804 | cdef int size_remaining=-1 805 | cdef Allocator* mal 806 | with gil: 807 | try: 808 | if self.have_manager: # managed mode - external allocator given; check that it has enough space for us. 809 | mal = self.manager.mal 810 | size_remaining = Allocator_size_remaining( mal ) 811 | if size_remaining < sizes.total: 812 | raise MemoryError("%d bytes of memory needed, but the given allocator has only %d bytes remaining." % (sizes.total, size_remaining)) 813 | 814 | else: # unmanaged mode - instantiate our own Allocator. The Allocator constructor will raise MemoryError if it runs out of memory. 815 | mal = Allocator_new( mode=ALLOC_MODE_ONEBIGBUFFER, total_size_bytes=sizes.total ) 816 | except: 817 | free( self ) 818 | raise 819 | self.mal = mal 820 | 821 | # Allocate the storage, using the custom allocator. 
822 |     #
823 |     if self.geometry_owned:
824 |         self.o2r = <int*>Allocator_malloc( mal, sizes.o2r )
825 |         self.r2o = <int*>Allocator_malloc( mal, sizes.r2o )
826 | 
827 |         self.c = <double*>Allocator_malloc( mal, sizes.c )
828 |         self.w = <double*>Allocator_malloc( mal, sizes.w )
829 | 
830 |         self.A = <double*>Allocator_malloc( mal, sizes.A )
831 |         self.row_scale = <double*>Allocator_malloc( mal, sizes.row_scale )
832 |         self.col_scale = <double*>Allocator_malloc( mal, sizes.col_scale )
833 |         self.ipiv = <int*>Allocator_malloc( mal, sizes.ipiv )
834 | 
835 |     self.fi = <double*>Allocator_malloc( mal, sizes.fi )
836 | 
837 |     if sizes.fi2:
838 |         self.fi2 = <double*>Allocator_malloc( mal, sizes.fi2 )
839 |     else:
840 |         self.fi2 = 0
841 | 
842 |     if self.have_manager:
843 |         # in managed mode, CaseManager will allocate one copy of the per-task (not per-problem-instance) arrays
844 |         self.wrk = 0
845 |         self.fk_tmp = 0
846 |         self.fi_tmp = 0
847 | 
848 |     else:
849 |         # unmanaged mode - allocate the per-task arrays locally.
850 |         self.wrk = <double*>Allocator_malloc( mal, sizes.wrk )
851 | 
852 |         if self.iterative:
853 |             self.fk_tmp = <double*>Allocator_malloc( mal, sizes.fk_tmp )
854 |             self.fi_tmp = <double*>Allocator_malloc( mal, sizes.fi_tmp )
855 |         else:
856 |             self.fk_tmp = 0
857 |             self.fi_tmp = 0
858 | 
859 |     # memory allocated; populate o2r and r2o
860 |     #
861 |     # (in guest mode, this has already been done in the host Case instance)
862 |     #
863 |     if self.geometry_owned:
864 |         remap( self.o2r, self.r2o, self.no, self.knowns )
865 | 
866 |     return 0
867 | 
868 | # The opposite of Case_allocate().
869 | #
870 | cdef void Case_deallocate( Case* self ) nogil:
871 |     if self != 0:
872 |         # No guarantee that we'll be called only from the destructor;
873 |         # we must set any deallocated pointers to NULL.
874 | Allocator_free( self.mal, self.fi_tmp ) 875 | self.fi_tmp = 0 876 | Allocator_free( self.mal, self.fk_tmp ) 877 | self.fk_tmp = 0 878 | 879 | Allocator_free( self.mal, self.wrk ) 880 | self.wrk = 0 881 | 882 | Allocator_free( self.mal, self.fi2 ) 883 | self.fi2 = 0 884 | Allocator_free( self.mal, self.fi ) 885 | self.fi = 0 886 | 887 | if self.geometry_owned: 888 | Allocator_free( self.mal, self.ipiv ) 889 | self.ipiv = 0 890 | Allocator_free( self.mal, self.col_scale ) 891 | self.col_scale = 0 892 | Allocator_free( self.mal, self.row_scale ) 893 | self.row_scale = 0 894 | Allocator_free( self.mal, self.A ) 895 | self.A = 0 896 | 897 | Allocator_free( self.mal, self.w ) 898 | self.w = 0 899 | Allocator_free( self.mal, self.c ) 900 | self.c = 0 901 | 902 | Allocator_free( self.mal, self.r2o ) 903 | self.r2o = 0 904 | Allocator_free( self.mal, self.o2r ) 905 | self.o2r = 0 906 | else: 907 | # In guest mode, the allocation of these arrays is managed by the host Case instance. 908 | self.ipiv = 0 909 | self.col_scale = 0 910 | self.row_scale = 0 911 | self.A = 0 912 | self.w = 0 913 | self.c = 0 914 | self.r2o = 0 915 | self.o2r = 0 916 | 917 | # Destructor. 918 | # 919 | cdef void Case_del( Case* self ) nogil: 920 | if self != 0: 921 | Case_deallocate( self ) 922 | 923 | if not self.have_manager: 924 | Allocator_del( self.mal ) 925 | 926 | free( self ) 927 | 928 | -------------------------------------------------------------------------------- /wlsqm/fitter/interp.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Interpolation of fitted surrogate model. 6 | # 7 | # C API definitions. 
8 | # 9 | # JJ 2016-11-30 10 | 11 | from __future__ import absolute_import 12 | 13 | from cython cimport view 14 | 15 | cimport wlsqm.fitter.infra as infra 16 | 17 | cdef int interpolate_nD( infra.Case* case, double[::view.generic,::view.contiguous] xManyD, double[::view.generic] x1D, double* out, int diff ) nogil 18 | 19 | cdef int interpolate_3D( infra.Case* case, double[::view.generic,::view.contiguous] x, double* out, int diff ) nogil 20 | cdef int interpolate_2D( infra.Case* case, double[::view.generic,::view.contiguous] x, double* out, int diff ) nogil 21 | cdef int interpolate_1D( infra.Case* case, double[::view.generic] x, double* out, int diff ) nogil 22 | 23 | -------------------------------------------------------------------------------- /wlsqm/fitter/polyeval.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Evaluation of Taylor expansions and general polynomials up to 4th order in 1D, 2D and 3D. 6 | # 7 | # C API definitions. 
8 | #
9 | # JJ 2016-12-09
10 | 
11 | from __future__ import absolute_import
12 | 
13 | from cython cimport view
14 | 
15 | cdef int taylor_3D( int order, double* fi, double xi, double yi, double zi, double[::view.generic,::view.contiguous] x, double* out ) nogil
16 | cdef int general_3D( int order, double* fi, double xi, double yi, double zi, double[::view.generic,::view.contiguous] x, double* out ) nogil
17 | 
18 | cdef int taylor_2D( int order, double* fi, double xi, double yi, double[::view.generic,::view.contiguous] x, double* out ) nogil
19 | cdef int general_2D( int order, double* fi, double xi, double yi, double[::view.generic,::view.contiguous] x, double* out ) nogil
20 | 
21 | cdef int taylor_1D( int order, double* fi, double xi, double[::view.generic] x, double* out ) nogil
22 | cdef int general_1D( int order, double* fi, double xi, double[::view.generic] x, double* out ) nogil
23 | 
24 | 
--------------------------------------------------------------------------------
/wlsqm/fitter/popcount.h:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | #ifdef _MSC_VER
4 | #include <intrin.h>
5 | #define __builtin_popcount __popcnt
6 | #define __builtin_popcountll __popcnt64
7 | #endif
--------------------------------------------------------------------------------
/wlsqm/fitter/simple.pxd:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds.
4 | #
5 | # Cython declarations for the main module. See the .pyx source for wlsqm.fitter.simple for documentation.
6 | #
7 | # JJ 2016-11-07
8 | 
9 | # Set Cython compiler directives. This section must appear before any code!
10 | # 11 | # For available directives, see: 12 | # 13 | # http://docs.cython.org/en/latest/src/reference/compilation.html 14 | # 15 | # cython: wraparound = False 16 | # cython: boundscheck = False 17 | # cython: cdivision = True 18 | 19 | # This module contains "driver" routines in the LAPACK sense. 20 | # The low-level C routines are contained in wlsqm.fitter.impl. 21 | 22 | from __future__ import absolute_import 23 | 24 | from cython cimport view # for usage, see http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#specifying-more-general-memory-layouts 25 | 26 | #################################################### 27 | # Single case (one neighborhood), single-threaded 28 | #################################################### 29 | 30 | cdef int generic_fit_basic( int dimension, double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D, double[::view.generic] fk, double[::1] xiManyD, double xi1D, double[::1] fi, 31 | double[::view.generic,::view.contiguous] sens, int do_sens, int order, long long knowns, int weighting_method, int debug ) nogil except -1 32 | 33 | cdef int generic_fit_iterative( int dimension, double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D, double[::view.generic] fk, double[::1] xiManyD, double xi1D, double[::1] fi, 34 | double[::view.generic,::view.contiguous] sens, int do_sens, int order, long long knowns, int weighting_method, int max_iter, int debug ) nogil except -1 35 | 36 | #################################################### 37 | # Many cases, single-threaded 38 | #################################################### 39 | 40 | # single-threaded 41 | cdef int generic_fit_basic_many( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 42 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 43 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, 
double[::view.generic,::view.contiguous] fi, 44 | double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 45 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int debug ) nogil except -1 46 | 47 | cdef int generic_fit_iterative_many( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 48 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 49 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, double[::view.generic,::view.contiguous] fi, 50 | double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 51 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int max_iter, int debug ) nogil except -1 52 | 53 | #################################################### 54 | # Many cases, multithreaded 55 | #################################################### 56 | 57 | cdef int generic_fit_basic_many_parallel( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 58 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 59 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, double[::view.generic,::view.contiguous] fi, 60 | double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 61 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int ntasks, int debug ) nogil except -1 62 | 63 | cdef int generic_fit_iterative_many_parallel( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 64 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 65 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, double[::view.generic,::view.contiguous] fi, 66 | 
double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 67 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int max_iter, int ntasks, int debug ) nogil except -1 68 | 69 | -------------------------------------------------------------------------------- /wlsqm/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/wlsqm/utils/__init__.py -------------------------------------------------------------------------------- /wlsqm/utils/lapackdrivers.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Cython interface for lapackdrivers.pyx. 4 | # 5 | # Naming scheme (in shellglob notation): 6 | # *s = multiple RHS (but with the same LHS for all). 7 | # These are one-shot deals that reuse the matrix factorization internally. 8 | # However, the pivot information is not returned, so the matrix A 9 | # is destroyed (overwritten) during the call. 10 | # 11 | # m* = multiple LHS (a separate single RHS for each) 12 | # These simply loop over the problem instances. 13 | # 14 | # *p = parallel (multi-threaded using OpenMP) 15 | # These introduce parallel looping over problem instances. 16 | # 17 | # *factor* = the routine that factors the matrix and generates pivot information. 18 | # *factored* = the solver routine that uses the factored matrix and pivot information. 19 | # These are useful for solving with many RHSs, when all the RHSs 20 | # are not available at once (e.g. in PDE solvers, timestepping 21 | # with a mass matrix that remains constant in time). 22 | # 23 | # *_c = C version without memoryviews (only visible from Cython). Can be more convenient 24 | # for use in nogil blocks, in cases where the arrays need to be allocated dynamically (with malloc). 
25 | # The purpose of the C version is to avoid the need to acquire the GIL to create a memoryview 26 | # into a malloc()'d array. 27 | # 28 | # This .pxd file offers access to only the C versions. To import the corresponding Python versions 29 | # of the routines (same name, without the _c suffix), import the module normally in Python. 30 | # 31 | # Note that the Python routines operate on memoryview slices (compatible with np.arrays), 32 | # so they have slightly different parameters and return values when compared to the C routines. 33 | # 34 | # Generally, the Python versions will allocate arrays for you, while the C versions expect you 35 | # to provide pointers to already malloc()'d memory (and explicit sizes). 36 | # 37 | # See the function docstrings and comments in the .pyx source for details. 38 | # 39 | # JJ 2016-11-07 40 | 41 | from __future__ import absolute_import 42 | 43 | ############################################################################################################## 44 | # Helpers 45 | ############################################################################################################## 46 | 47 | cdef void distribute_items_c( int nitems, int ntasks, int* blocksizes, int* baseidxs ) nogil # distribute work items across tasks, assuming equal load per item. 
48 | 49 | cdef void copygeneral_c( double* O, double* I, int nrows, int ncols ) nogil # copy general square array 50 | cdef void copysymmu_c( double* O, double* I, int nrows, int ncols ) nogil # copy symmetric square array, upper triangle only 51 | 52 | cdef void symmetrize_c( double* A, int nrows, int ncols ) nogil 53 | cdef void msymmetrize_c( double* A, int nrows, int ncols, int nlhs ) nogil 54 | cdef void msymmetrizep_c( double* A, int nrows, int ncols, int nlhs, int ntasks ) nogil 55 | 56 | ############################################################################################################## 57 | # Preconditioning (scaling) 58 | ############################################################################################################## 59 | 60 | # Scaling reduces the condition number of A, helping DGESV (general()) to give more correct digits. 61 | # 62 | # The return value is the number of iterations taken; always 1 for non-iterative algorithms. 63 | 64 | # helpers 65 | cdef void init_scaling_c( int nrows, int ncols, double* row_scale, double* col_scale ) nogil # init all scaling factors to 1.0 66 | cdef void apply_scaling_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil # freeze the scaling by applying it in-place 67 | 68 | # simple, fast methods; these destroy the possible symmetry of A 69 | cdef int rescale_columns_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 70 | cdef int rescale_rows_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 71 | cdef int rescale_twopass_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil # scale columns, then rows 72 | cdef int rescale_dgeequ_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 73 | 74 | # symmetry-preserving methods (iterative) 75 | cdef int rescale_ruiz2001_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 76 | cdef 
int rescale_scalgm_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 77 | 78 | ############################################################################################################## 79 | # Tridiagonal matrices 80 | ############################################################################################################## 81 | 82 | cpdef int tridiag( double[::1] a, double[::1] b, double[::1] c, double[::1] x ) nogil except -1 83 | 84 | ############################################################################################################## 85 | # Symmetric matrices 86 | ############################################################################################################## 87 | 88 | cdef int symmetric2x2_c( double* A, double* b ) nogil except -1 89 | 90 | cdef int symmetric_c( double* A, double* b, int n ) nogil except -1 91 | cdef int symmetricfactor_c( double* A, int* ipiv, int n ) nogil except -1 92 | cdef int symmetricfactored_c( double* A, int* ipiv, double* b, int n ) nogil except -1 93 | 94 | cdef int symmetrics_c( double* A, double* b, int n, int nrhs ) nogil except -1 95 | cdef int symmetricsp_c( double* A, double* b, int n, int nrhs, int ntasks ) nogil except -1 96 | 97 | cdef int msymmetric_c( double* A, double* b, int n, int nlhs ) nogil except -1 98 | cdef int msymmetricp_c( double* A, double* b, int n, int nlhs, int ntasks ) nogil except -1 99 | 100 | cdef int msymmetricfactor_c( double* A, int* ipiv, int n, int nlhs ) nogil except -1 101 | cdef int msymmetricfactored_c( double* A, int* ipiv, double* b, int n, int nlhs ) nogil except -1 102 | cdef int msymmetricfactorp_c( double* A, int* ipiv, int n, int nlhs, int ntasks ) nogil except -1 103 | cdef int msymmetricfactoredp_c( double* A, int* ipiv, double* b, int n, int nlhs, int ntasks ) nogil except -1 104 | 105 | ############################################################################################################## 106 | # General matrices 
107 | ############################################################################################################## 108 | 109 | cdef int general2x2_c( double* A, double* b ) nogil except -1 110 | 111 | cdef int general_c( double* A, double* b, int n ) nogil except -1 112 | cdef int generalfactor_c( double* A, int* ipiv, int n ) nogil except -1 113 | cdef int generalfactored_c( double* A, int* ipiv, double* b, int n ) nogil except -1 114 | 115 | cdef int generals_c( double* A, double* b, int n, int nrhs ) nogil except -1 116 | cdef int generalsp_c( double* A, double* b, int n, int nrhs, int ntasks ) nogil except -1 117 | 118 | cdef int mgeneral_c( double* A, double* b, int n, int nlhs ) nogil except -1 119 | cdef int mgeneralp_c( double* A, double* b, int n, int nlhs, int ntasks ) nogil except -1 120 | 121 | cdef int mgeneralfactor_c( double* A, int* ipiv, int n, int nlhs ) nogil except -1 122 | cdef int mgeneralfactored_c( double* A, int* ipiv, double* b, int n, int nlhs ) nogil except -1 123 | cdef int mgeneralfactorp_c( double* A, int* ipiv, int n, int nlhs, int ntasks ) nogil except -1 124 | cdef int mgeneralfactoredp_c( double* A, int* ipiv, double* b, int n, int nlhs, int ntasks ) nogil except -1 125 | 126 | ############################################################################################################## 127 | # Other stuff 128 | ############################################################################################################## 129 | 130 | cdef int svd_c( double* A, int m, int n, double* S ) nogil except -1 131 | 132 | -------------------------------------------------------------------------------- /wlsqm/utils/ptrwrap.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Hack around the limitation that C pointers cannot be passed to Python functions. 
4 | # 5 | # http://grokbase.com/t/gg/cython-users/134b21rga8/passing-callback-pointers-to-python-and-back 6 | # 7 | # JJ 2016-02-29 8 | 9 | from __future__ import absolute_import 10 | 11 | cdef class PointerWrapper: 12 | cdef void* ptr 13 | cdef set_ptr(self, void * input) 14 | 15 | -------------------------------------------------------------------------------- /wlsqm/utils/ptrwrap.pyx: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Hack around the limitation that C pointers cannot be passed to Python functions. 4 | # 5 | # http://grokbase.com/t/gg/cython-users/134b21rga8/passing-callback-pointers-to-python-and-back 6 | # 7 | # This is a Cython module, to be used by other .pyx modules, with no access from Python. 8 | # 9 | # JJ 2016-02-29 10 | 11 | from __future__ import division, print_function, absolute_import 12 | 13 | cdef class PointerWrapper: 14 | cdef set_ptr(self, void * input): 15 | self.ptr = input 16 | 17 | --------------------------------------------------------------------------------
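The WEIGHT_CENTER branch of `Case_make_weights` (in `wlsqm/fitter/infra.pyx` above) computes `w = alpha + (1 - alpha) * (1 - sqrt(d2/max_d2))**2` with `alpha = 1e-4`. A pure-Python transcription of that formula — the function name `center_weights` is illustrative only, not part of the wlsqm API:

```python
import math

def center_weights(d2, max_d2, alpha=1e-4):
    """Pure-Python sketch of the WEIGHT_CENTER branch of Case_make_weights.

    d2     : squared distances from the fit point xi to each neighbor xk
    max_d2 : the largest squared distance (used for normalization)
    alpha  : weight remaining at the maximum distance

    The weight falls off quickly near xi and levels out at alpha for the
    most distant neighbor, emphasizing points close to xi.
    """
    beta = 1.0 - alpha
    out = []
    for d2k in d2:
        tmp = 1.0 - math.sqrt(d2k / max_d2)  # flipped normalized distance
        out.append(alpha + beta * tmp * tmp)
    return out
```

For example, `center_weights([0.0, 1.0, 4.0], 4.0)` yields monotonically decreasing weights, from 1.0 at xi itself down to exactly `alpha` at the farthest neighbor.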
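The helper `distribute_items_c` declared in `wlsqm/utils/lapackdrivers.pxd` above splits work items across parallel tasks, assuming equal load per item. A pure-Python sketch of one standard way to do this (hypothetical — the actual C routine may assign leftover items differently):

```python
def distribute_items(nitems, ntasks):
    """Sketch of distribute_items_c: split nitems work items across
    ntasks tasks as evenly as possible.

    Returns (blocksizes, baseidxs): the number of items for each task,
    and the index of each task's first item. Leftover items go to the
    first tasks here; the C implementation may place them differently.
    """
    base, rem = divmod(nitems, ntasks)
    blocksizes = [base + (1 if t < rem else 0) for t in range(ntasks)]
    baseidxs = [0] * ntasks
    for t in range(1, ntasks):
        baseidxs[t] = baseidxs[t - 1] + blocksizes[t - 1]
    return blocksizes, baseidxs
```

Every item is assigned exactly once, and no two tasks differ by more than one item in load.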