├── .gitignore ├── CHANGELOG.md ├── LICENSE.md ├── README.md ├── TODO.md ├── doc ├── eulerflow.lyx ├── eulerflow.pdf ├── wlsqm.lyx ├── wlsqm.pdf ├── wlsqm_gen.lyx └── wlsqm_gen.pdf ├── example.png ├── examples ├── expertsolver_example.py ├── lapackdrivers_example.py ├── sudoku_lhs.py └── wlsqm_example.py ├── setup.py └── wlsqm ├── __init__.py ├── fitter ├── __init__.py ├── defs.pxd ├── defs.pyx ├── expert.pyx ├── impl.pxd ├── impl.pyx ├── infra.pxd ├── infra.pyx ├── interp.pxd ├── interp.pyx ├── polyeval.pxd ├── polyeval.pyx ├── popcount.h ├── simple.pxd └── simple.pyx └── utils ├── __init__.py ├── lapackdrivers.pxd ├── lapackdrivers.pyx ├── ptrwrap.pxd └── ptrwrap.pyx /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | *.pyc 3 | *.c 4 | build 5 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | ## Changelog 2 | 3 | ### [v0.1.5] 4 | - support both Python 3.4 and 2.7 5 | 6 | ### [v0.1.4] 7 | - actually use the shorter short description (oops) 8 | 9 | ### [v0.1.3] 10 | - setup.py is now Python 3 compatible (but wlsqm itself is not yet!) 11 | - fixed sdist: package also CHANGELOG.md 12 | 13 | ### [v0.1.2] 14 | - set zip_safe to False to better work with Cython (important for libs that depend on this one) 15 | 16 | ### [v0.1.1] 17 | - change distribution system from distutils to setuptools 18 | 19 | ### [v0.1.0] 20 | - initial version 21 | 22 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Copyright (c) 2016-2017, Juha Jeronen and University of Jyväskylä. 2 | All rights reserved. 
3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | * Redistributions of source code must retain the above copyright 7 | notice, this list of conditions and the following disclaimer. 8 | * Redistributions in binary form must reproduce the above copyright 9 | notice, this list of conditions and the following disclaimer in the 10 | documentation and/or other materials provided with the distribution. 11 | 12 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 13 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 14 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 15 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY 16 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 17 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 18 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 19 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 20 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 21 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 22 | 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # wlsqm 2 | 3 | Weighted least squares meshless interpolator 4 | 5 | ![2D example](example.png) 6 | 7 | 8 | ## Introduction 9 | 10 | WLSQM (Weighted Least SQuares Meshless) is a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 
11 | 12 | Use cases include response surface modeling, and computing space derivatives of data known only as values at discrete points in space (this has applications in explicit algorithms for solving IBVPs). No grid or mesh is needed. No restriction is imposed on geometry other than "not degenerate", e.g. points in 2D should not all fall onto the same 1D line. 13 | 14 | This is an independent implementation of the weighted least squares meshless algorithm described (in the 2nd order 2D case) in section 2.2.1 of Hong Wang (2012), Evolutionary Design Optimization with Nash Games and Hybridized Mesh/Meshless Methods in Computational Fluid Dynamics, Jyväskylä Studies in Computing 162, University of Jyväskylä. [ISBN 978-951-39-5007-1 (PDF)](http://urn.fi/URN:ISBN:978-951-39-5007-1) 15 | 16 | This implementation is targeted for high performance in a single-node environment, such as a laptop. Cython is used to accelerate the low-level routines. The main target is the `x86_64` architecture, but any 64-bit architecture should be fine with the appropriate compiler option changes to [setup.py](setup.py). 17 | 18 | Currently automated unit tests are missing; this is an area that is likely to be improved. Otherwise the code is already rather stable; any major new features are unlikely to be added, and the API is considered stable. 19 | 20 | 21 | ## Features 22 | 23 | - Given scalar data values on a set of points in 1D, 2D or 3D, construct a piecewise polynomial global surrogate model (a.k.a. response surface), using up to 4th order polynomials. 24 | 25 | - Sliced arrays are supported for input, both for the geometry (points) and data (function values). 26 | 27 | - Obtain any derivative of the model, up to the order of the polynomial. Derivatives at each "local model reference point" xi are directly available as DOFs of the solution. Derivatives at any other point can be automatically interpolated from the model. 
Differentiation of polynomials has been hardcoded to obtain high performance. 28 | 29 | - Knowns. At the model reference point xi, the function value and/or any of the derivatives can be specified as knowns. The knowns are internally automatically eliminated (making the equation system smaller) and only the unknowns are fitted. The function value itself may also be unknown, which is useful for implementing Neumann BCs in a PDE (IBVP) solving context. 30 | 31 | - Selectable weighting method for the fitting error, to support different use cases: 32 | - uniform (`wlsqm.fitter.defs.WEIGHT_UNIFORM`), for best overall fit for function values 33 | - emphasize points closer to xi (`wlsqm.fitter.defs.WEIGHT_CENTER`), to improve derivatives at the reference point xi by reducing the influence of points far away from the reference point. 34 | 35 | - Sensitivity data of solution DOFs (on the data values at points other than the reference in the local neighborhood) can be optionally computed. 36 | 37 | - Expert mode with separate prepare and solve stages, for faster fitting of many data sets using the same geometry. Also performs global model patching, using the set of local models fitted. 38 | 39 | **CAVEAT**: `wlsqm.fitter.expert.ExpertSolver` instances are not currently pickleable or copyable. This is a known limitation that may (or may not) change in the future. 40 | 41 | It is nevertheless recommended to use ExpertSolver, since this allows for easy simultaneous solving of many local models (in parallel), automatic global model patching, and reuse of problem matrices when the geometry of the point cloud does not change. 42 | 43 | - Speed: 44 | - Performance-critical parts are implemented in Cython, and the GIL is released during computation. 45 | - LAPACK is used directly via [SciPy's Cython-level bindings](https://docs.scipy.org/doc/scipy/reference/linalg.cython_lapack.html) (see the `ntasks` parameter in various API functions in `wlsqm`). 
This is especially useful when many (1e4 or more) local models are being fitted, as the solver loop does not require holding the GIL. 46 | - OpenMP is used for parallelization over the independent local problems (also in the linear solver step). 47 | - The polynomial evaluation code has been manually optimized to reduce the number of FLOPS required. 48 | 49 | In 1D, the Horner form is used. The 2D and 3D cases use a symmetric form that extends the 1D Horner form into multiple dimensions (see [wlsqm/fitter/polyeval.pyx](wlsqm/fitter/polyeval.pyx) for details). The native FMA (fused multiply-add) instruction of the CPU is used in the evaluation to further reduce FLOPS required, and to improve accuracy (utilizing the fact it rounds only once). 50 | 51 | - Accuracy: 52 | - Problem matrices are preconditioned by a symmetry-preserving scaling algorithm (D. Ruiz 2001; exact reference given in [wlsqm/utils/lapackdrivers.pyx](wlsqm/utils/lapackdrivers.pyx)) to obtain best possible accuracy from the direct linear solver. This is critical especially for high-order fits. 53 | - The fitting procedure optionally includes an internal iterative refinement loop to mitigate the effect of roundoff. 54 | - FMA, as mentioned above. 
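To make the above concrete, here is what a single local fit (2D, 2nd order) looks like in plain NumPy. This is an illustrative sketch of the algorithm only, not the library's optimized Cython code path, and the inverse-distance weight below is a hypothetical stand-in for the actual `WEIGHT_CENTER` formula:

```python
import numpy as np

def local_fit_2d(points, f, i, neighbors, weighting="center"):
    """Sketch of one local fit: estimate (fx, fy, fxx, fxy, fyy) at points[i].

    Solves the weighted normal equations  A a = b  with
    A = C^T W C,  b = C^T W (f_k - f_i),  rows of C = (h, l, h^2/2, h*l, l^2/2).
    """
    d = points[neighbors] - points[i]          # offsets (h_k, l_k) from xi
    h, l = d[:, 0], d[:, 1]
    C = np.column_stack([h, l, h**2 / 2, h * l, l**2 / 2])
    if weighting == "center":                  # emphasize points near xi
        w = 1.0 / np.sum(d**2, axis=1)         # illustrative formula only
    else:                                      # uniform weighting
        w = np.ones(len(neighbors))
    A = C.T @ (w[:, None] * C)
    b = C.T @ (w * (f[neighbors] - f[i]))
    return np.linalg.solve(A, b)

# For a quadratic test function, a 2nd-order fit is exact up to roundoff:
rng = np.random.default_rng(42)
pts = rng.uniform(-1.0, 1.0, size=(30, 2))
f = pts[:, 0]**2 + 2.0*pts[:, 0]*pts[:, 1] + 3.0*pts[:, 1]**2
a = local_fit_2d(pts, f, 0, np.arange(1, 30))
print(np.allclose(a[2:], [2.0, 2.0, 6.0]))     # fxx, fxy, fyy -> True
```

Because the test function lies in the model space, the fitted DOFs reproduce its derivatives exactly regardless of the weighting; the weighting only matters once the data contains higher-order content or noise.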
55 | 56 | 57 | ## Installation 58 | 59 | ### From PyPI (wlsqm v0.1.1+) 60 | 61 | Install as user: 62 | 63 | ```bash 64 | pip install wlsqm --user 65 | ``` 66 | 67 | Install as admin: 68 | 69 | ```bash 70 | sudo pip install wlsqm 71 | ``` 72 | 73 | ### From GitHub 74 | 75 | As user: 76 | 77 | ```bash 78 | git clone https://github.com/Technologicat/python-wlsqm.git 79 | cd python-wlsqm 80 | python setup.py install --user 81 | ``` 82 | 83 | As admin, change the last command to 84 | 85 | ```bash 86 | sudo python setup.py install 87 | ``` 88 | 89 | 90 | ## Documentation 91 | 92 | For usage examples, see [examples/wlsqm_example.py](examples/wlsqm_example.py) for a tour, and [examples/expertsolver_example.py](examples/expertsolver_example.py) for a minimal example concentrating specifically on `ExpertSolver`. 93 | 94 | For the technical details, see the docstrings and comments in the code itself. 95 | 96 | Mathematics documented at: 97 | 98 | - [https://yousource.it.jyu.fi/jjrandom2/freya/trees/master/docs](https://yousource.it.jyu.fi/jjrandom2/freya/trees/master/docs) [dead link, relevant files mirrored below] 99 | 100 | where the relevant files are [mirrored locally on GitHub]: 101 | 102 | - [wlsqm.pdf](doc/wlsqm.pdf) (old documentation for the old pure-Python version of WLSQM included in FREYA, plus the sensitivity calculation) 103 | - [eulerflow.pdf](doc/eulerflow.pdf) (clearer presentation of the original version, but without the sensitivity calculation) 104 | - [wlsqm_gen.pdf](doc/wlsqm_gen.pdf) (theory diff on how to make a version that handles also missing function values; also why WLSQM works and some analysis of its accuracy) 105 | 106 | The documentation is slightly out of date; see [TODO](TODO.md) for details on what needs updating and how. 107 | 108 | 109 | ## Experiencing crashes? 
110 | 111 | Check that you are loading the same BLAS your LAPACK and SciPy link against: 112 | 113 | ```bash 114 | shopt -s globstar 115 | ldd /usr/lib/**/*lapack*.so | grep blas 116 | ldd $(dirname $(python -c "import scipy; print(scipy.__file__)"))/linalg/cython_lapack.so | grep blas 117 | ``` 118 | 119 | In Debian-based Linux, you can change the active BLAS implementation by: 120 | 121 | ```bash 122 | sudo update-alternatives --config libblas.so 123 | sudo update-alternatives --config libblas.so.3 124 | ``` 125 | 126 | This may (or may not) be different from what NumPy links against: 127 | 128 | ```bash 129 | ldd $(dirname $(python -c "import numpy; print(numpy.__file__)"))/core/multiarray.so | grep blas 130 | ``` 131 | 132 | WLSQM itself does not link against LAPACK or BLAS; it utilizes the `cython_lapack` module of SciPy. 133 | 134 | 135 | ## Dependencies 136 | 137 | - [NumPy](http://www.numpy.org) 138 | - [SciPy](http://www.scipy.org) 139 | - [Cython](http://www.cython.org) 140 | - [Matplotlib](http://matplotlib.org/) (for usage examples) 141 | 142 | 143 | ## License 144 | 145 | [BSD](LICENSE.md). Copyright 2016-2017 Juha Jeronen and University of Jyväskylä. 146 | 147 | 148 | #### Acknowledgement 149 | 150 | This work was financially supported by the Jenny and Antti Wihuri Foundation. 151 | -------------------------------------------------------------------------------- /TODO.md: -------------------------------------------------------------------------------- 1 | High priority 2 | ============= 3 | 4 | - create unit tests 5 | 6 | ------------------------------------------------------------------------------- 7 | 8 | General 9 | ======= 10 | 11 | - figure out a way to automatically add function signatures to docstrings for functions defined in Cython modules 12 | - in the current docstrings, the "def" has been simply manually copy'n'pasted into the docstring, 13 | causing unnecessary duplication and introducing a potential source of errors in documentation. 
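One possible approach for the item above (an assumption, not something the project currently does): Cython's `embedsignature` compiler directive makes Cython prepend each function's signature to its `__doc__` automatically, which would remove the manual copy'n'paste entirely. A hypothetical `setup.py` fragment, to be merged with the project's real `setup.py`:

```python
# Sketch: enable embedded signatures for all Cython modules.
# (Hypothetical fragment -- the real setup.py also sets compiler options.)
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="wlsqm",
    ext_modules=cythonize(
        ["wlsqm/**/*.pyx"],
        # Cython prepends "funcname(arg1, arg2, ...)" to each __doc__,
        # so signatures no longer need to be pasted in by hand.
        compiler_directives={"embedsignature": True},
    ),
)
```

The same directive can instead be enabled per file with a `# cython: embedsignature=True` comment at the top of each `.pyx` module.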
14 | 15 | - move examples/sudoku_lhs into a separate proper library. 16 | 17 | Documentation 18 | ============= 19 | 20 | - Update the documentation: 21 | 1. Emphasize surrogate models / response surface modeling (the Taylor series based intuition is misleading, as it severely overestimates the error). 22 | 2. Introduce the weighting factors `w[k]`. Change the definition of the total squared error G to be the weighted total squared error, where each neighbor point x[k] has its own weight w[k]. The end result is basically just to weight, in each sum over k, each term by `w[k]`. See `wlsqm/fitter/impl.pyx`. 23 | 3. Add a comment about matrix scaling, which drastically improves the condition number. Include a short comment on how to use the row and column scaling arrays. Cite the algorithm papers (see `wlsqm/utils/lapackdrivers.pyx`). 24 | 4. Add a comment about iterative refinement to reduce effects of roundoff (this technique is rather standard in least-squares fitting). The use of FMA in `wlsqm/fitter/polyeval.pyx`, used internally to compute `error = (data - model)`, may also mitigate roundoff, since it computes `op1*op2 + op3`, rounding only the end result. 25 | 5. Combine the pieces into a single document (see README.md for a listing of the pieces). 
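For reference, the iterative refinement of item 4 is the standard residual-correction loop: solve `A x = b`, form `r = b - A x`, solve `A d = r`, and update `x <- x + d`. A minimal NumPy sketch (not wlsqm's actual implementation, which works at the Cython level and reuses the factorization):

```python
import numpy as np

def solve_refined(A, b, steps=2):
    """Direct solve plus a few iterative-refinement steps.

    Each step solves A d = r for the current residual r = b - A x and
    applies the correction, mitigating roundoff in the initial solve.
    (A real implementation reuses the LU factors instead of re-solving.)
    """
    x = np.linalg.solve(A, b)
    for _ in range(steps):
        r = b - A @ x                  # residual of the current solution
        x = x + np.linalg.solve(A, r)  # correction step
    return x

# Ill-conditioned test problem (Hilbert matrix, n = 8):
n = 8
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
x_true = np.ones(n)
x = solve_refined(A, A @ x_true)
print(np.linalg.norm(A @ x - A @ x_true))  # residual is tiny after refinement
```

In working precision this mainly drives down the residual; computing the residual in higher precision (as the classical algorithm does) also improves the forward error.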
26 | 27 | utils.lapackdrivers 28 | =================== 29 | 30 | - add option to return also the orthogonal matrices U and V in `svd()` (currently this routine is only useful to compute the 2-norm condition number) 31 | 32 | fitter 33 | ====== 34 | 35 | - fix TODOs in `setup.py` 36 | 37 | - API professionalism: 38 | - make `wlsqm.fitter.expert.ExpertSolver` instances copyable 39 | - needs a copy() method that deep-copies also the C-level stuff (re-running the memory allocation fandango) 40 | - `wlsqm.fitter.infra` needs a `Case_copy()` method, because `wlsqm.fitter.infra.Case` contains pointers 41 | - make `wlsqm.fitter.expert.ExpertSolver` instances pickleable (need to save/load the C-level stuff) 42 | - use `DTYPE` and `DTYPE_t` aliases instead of `double`/`np.float64` directly, to allow compiling a version with complex number support 43 | 44 | - test the 3D support more thoroughly 45 | - `wlsqm/fitter/polyeval.pyx`: make really, really sure `taylor_3D()`, `general_3D()` are bug-free 46 | - `wlsqm/fitter/interp.pyx`: make really, really sure `interpolate_3D()` is bug-free 47 | - write a unit test: generate random `sympy` functions (from a preset seed to make the test repeatable), differentiate them symbolically, fit models of orders 0, 1, 2, 3, 4 and compare all up to 34 derivatives with the exact result (the worst case should be within approx. `100*machine_epsilon` at least for the function value itself). 
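The unit test described above could start from something like the following sketch (2D, order 2). A plain `numpy.linalg.lstsq` fit stands in for the wlsqm call, since this is only an outline of the proposed test, not existing code:

```python
import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
rng = np.random.default_rng(1234)      # preset seed -> repeatable test

# Random quadratic "unknown function" and its exact symbolic derivatives.
c = rng.standard_normal(6)
f_expr = c[0] + c[1]*x + c[2]*y + c[3]*x**2 + c[4]*x*y + c[5]*y**2
derivs = [sp.diff(f_expr, *v) for v in [(x,), (y,), (x, x), (x, y), (y, y)]]

# Sample the function on a random point cloud; reference point = origin.
f_num = sp.lambdify((x, y), f_expr, 'numpy')
pts = rng.uniform(-1.0, 1.0, size=(30, 2))
h, l = pts[:, 0], pts[:, 1]

# 2nd-order model with unknown function value (stand-in for the wlsqm fit).
C = np.column_stack([np.ones(30), h, l, h**2/2, h*l, l**2/2])
a, *_ = np.linalg.lstsq(C, f_num(h, l), rcond=None)

# Compare fitted DOFs against the exact symbolic values at the origin.
exact = [float(e.subs({x: 0, y: 0})) for e in [f_expr] + derivs]
assert np.allclose(a, exact)           # exact recovery for a quadratic
```

The real test would loop over orders 0-4 and dimensions 1-3, call the actual wlsqm fit, and use non-polynomial `sympy` functions with tolerances scaled by the expected truncation error.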
48 | 49 | - profile performance, see [http://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line](http://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line) 50 | 51 | - fix various small TODOs and FIXMEs in the code (low priority) 52 | 53 | - maybe: ExpertSolver: fix the silly slicing requirement in model interpolation: make it possible to interpolate the model to a single point without a memoryview 54 | - but profile the performance first to check whether this actually causes a problem 55 | - multiple points require the memoryview, because in the general case the input is non-contiguous (a sliced array) 56 | 57 | - maybe: reduce code duplication between driver and expert mode 58 | - split `generic_fit_basic_many()` (and its friends) into prepare and solve stages, implement the driver in terms of calling these stages 59 | - re-use the same stages in ExpertSolver 60 | 61 | -------------------------------------------------------------------------------- /doc/eulerflow.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/doc/eulerflow.pdf -------------------------------------------------------------------------------- /doc/wlsqm.lyx: -------------------------------------------------------------------------------- 1 | #LyX 1.6.7 created this file. 
For more info see http://www.lyx.org/ 2 | \lyxformat 345 3 | \begin_document 4 | \begin_header 5 | \textclass article 6 | \use_default_options true 7 | \language english 8 | \inputencoding auto 9 | \font_roman palatino 10 | \font_sans default 11 | \font_typewriter default 12 | \font_default_family default 13 | \font_sc false 14 | \font_osf false 15 | \font_sf_scale 100 16 | \font_tt_scale 100 17 | 18 | \graphics default 19 | \paperfontsize default 20 | \spacing single 21 | \use_hyperref true 22 | \pdf_bookmarks true 23 | \pdf_bookmarksnumbered false 24 | \pdf_bookmarksopen false 25 | \pdf_bookmarksopenlevel 1 26 | \pdf_breaklinks false 27 | \pdf_pdfborder true 28 | \pdf_colorlinks false 29 | \pdf_backref false 30 | \pdf_pdfusetitle true 31 | \papersize default 32 | \use_geometry true 33 | \use_amsmath 1 34 | \use_esint 1 35 | \cite_engine natbib_authoryear 36 | \use_bibtopic false 37 | \paperorientation portrait 38 | \leftmargin 1in 39 | \topmargin 1in 40 | \rightmargin 1in 41 | \bottommargin 1in 42 | \secnumdepth 3 43 | \tocdepth 3 44 | \paragraph_separation skip 45 | \defskip medskip 46 | \quotes_language english 47 | \papercolumns 1 48 | \papersides 1 49 | \paperpagestyle default 50 | \tracking_changes false 51 | \output_changes false 52 | \author "" 53 | \author "" 54 | \end_header 55 | 56 | \begin_body 57 | 58 | \begin_layout Title 59 | Notes based on H. 60 | Wang's meshless method presentation for the FSI team 61 | \end_layout 62 | 63 | \begin_layout Author 64 | Juha Jeronen 65 | \end_layout 66 | 67 | \begin_layout Abstract 68 | This technical note explains how to approximate derivatives of a known function, 69 | defined as a set of values on a point cloud, where each point may have 70 | arbitrary Cartesian coordinates. 71 | This is a meshless method based on Taylor series expansion in a local set 72 | of nearest neighbors. 73 | It can be used for, e.g., integration of initial boundary value problems 74 | using explicit methods (e.g. 75 | RK4). 
76 | \end_layout 77 | 78 | \begin_layout Abstract 79 | Also, a simple 80 | \begin_inset Formula $O(d\, N\,\log N)$ 81 | \end_inset 82 | 83 | time algorithm for finding the nearest neighbors in 84 | \begin_inset Formula $d$ 85 | \end_inset 86 | 87 | dimensions is presented for the sake of completeness. 88 | \end_layout 89 | 90 | \begin_layout Subsubsection* 91 | Derivative approximation --- the weighted least squares meshless method 92 | \end_layout 93 | 94 | \begin_layout Standard 95 | We will present the 96 | \emph on 97 | weighted least squares 98 | \emph default 99 | 100 | \emph on 101 | meshless method 102 | \emph default 103 | (WLSQ). 104 | It belongs to the class of finite point methods (collocation methods), 105 | so in spirit it is similar to finite differences. 106 | Because the method only differentiates known quantities, it is best suited 107 | for time evolution problems (initial boundary value problems; IBVP), which 108 | are solved with explicit time integration methods such as RK4. 109 | Dirichlet boundary conditions are very easy to enforce; Neumann and Robin 110 | are much harder. 111 | \end_layout 112 | 113 | \begin_layout Standard 114 | To start with, consider a point cloud of 115 | \begin_inset Formula $N$ 116 | \end_inset 117 | 118 | points in 119 | \begin_inset Formula $\mathbb{R}^{d}$ 120 | \end_inset 121 | 122 | . 123 | Let 124 | \begin_inset Formula $i$ 125 | \end_inset 126 | 127 | denote the index of the current node under consideration, and 128 | \begin_inset Formula $k$ 129 | \end_inset 130 | 131 | the index of one of its nearest neighbors. 132 | (For finding the 133 | \begin_inset Formula $m$ 134 | \end_inset 135 | 136 | nearest neighbors of a point in a point cloud, refer to the final section 137 | of this document.) 138 | \end_layout 139 | 140 | \begin_layout Standard 141 | Let 142 | \begin_inset Formula $f=f(x_{k}),\; k=1,\dots,N$ 143 | \end_inset 144 | 145 | be a function defined on the point cloud. 
146 | Here we will only consider the two-dimensional case ( 147 | \begin_inset Formula $d=2$ 148 | \end_inset 149 | 150 | ) for simplicity. 151 | Let us shorten the notation by defining 152 | \begin_inset Formula $f_{k}:=f(x_{k})$ 153 | \end_inset 154 | 155 | . 156 | \end_layout 157 | 158 | \begin_layout Standard 159 | We would like to be able to approximate the derivatives of 160 | \begin_inset Formula $f$ 161 | \end_inset 162 | 163 | at the point 164 | \begin_inset Formula $x_{i}$ 165 | \end_inset 166 | 167 | , using only the point cloud data. 168 | This has applications in e.g. 169 | explicit time integration of PDEs with given initial data. 170 | \end_layout 171 | 172 | \begin_layout Standard 173 | Below, we will only consider the problem for one node 174 | \begin_inset Formula $x_{i}$ 175 | \end_inset 176 | 177 | . 178 | Trivially, the same procedure can be repeated for each node. 179 | \end_layout 180 | 181 | \begin_layout Standard 182 | Using multivariate Taylor expansion up to the second order, we can write 183 | 184 | \begin_inset Formula $f_{k}$ 185 | \end_inset 186 | 187 | (value of 188 | \begin_inset Formula $f$ 189 | \end_inset 190 | 191 | at one of the nearest neighbors) in terms of 192 | \begin_inset Formula $f_{i}$ 193 | \end_inset 194 | 195 | as 196 | \begin_inset Formula \begin{equation} 197 | f_{k}=f_{i}+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}+O(h_{k}^{3},\ell_{k}^{3})\;,\label{eq:Tay}\end{equation} 198 | 199 | \end_inset 200 | 201 | where 202 | \begin_inset Formula $h_{k}=(x_{k})_{1}-(x_{i})_{1}$ 203 | \end_inset 204 | 205 | (i.e. 
206 | the 207 | \begin_inset Formula $x$ 208 | \end_inset 209 | 210 | component of the vector from 211 | \begin_inset Formula $x_{i}$ 212 | \end_inset 213 | 214 | to 215 | \begin_inset Formula $x_{k}$ 216 | \end_inset 217 | 218 | ) and 219 | \begin_inset Formula $\ell_{k}=(x_{k})_{2}-(x_{i})_{2}$ 220 | \end_inset 221 | 222 | (respectively, the 223 | \begin_inset Formula $y$ 224 | \end_inset 225 | 226 | component). 227 | \end_layout 228 | 229 | \begin_layout Standard 230 | Note that generally, we must expand up to as many orders as is the highest 231 | derivative we wish to approximate. 232 | We will assume here for simplicity that we are building the approximation 233 | for a second-order problem. 234 | \end_layout 235 | 236 | \begin_layout Standard 237 | If we drop the asymptotic term, we get the approximation 238 | \begin_inset Formula \begin{equation} 239 | \overline{f}_{k}=f_{i}+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}\;.\label{eq:approx}\end{equation} 240 | 241 | \end_inset 242 | 243 | By the Taylor expansion, we would expect to have 244 | \begin_inset Formula \begin{align} 245 | a_{1} & =\frac{\partial f_{k}}{\partial x}\vert_{x=x_{i}}\nonumber \\ 246 | a_{2} & =\frac{\partial f_{k}}{\partial y}\vert_{x=x_{i}}\nonumber \\ 247 | a_{3} & =\frac{\partial^{2}f_{k}}{\partial x^{2}}\vert_{x=x_{i}}\nonumber \\ 248 | a_{4} & =\frac{\partial^{2}f_{k}}{\partial x\partial y}\vert_{x=x_{i}}\nonumber \\ 249 | a_{5} & =\frac{\partial^{2}f_{k}}{\partial y^{2}}\vert_{x=x_{i}}\;,\label{eq:aj}\end{align} 250 | 251 | \end_inset 252 | 253 | if 254 | \begin_inset Formula $f$ 255 | \end_inset 256 | 257 | was defined on all of 258 | \begin_inset Formula $\mathbb{R}^{2}$ 259 | \end_inset 260 | 261 | . 262 | Our problem is thus to find a good approximation for the values of the 263 | 264 | \begin_inset Formula $a_{j}$ 265 | \end_inset 266 | 267 | . 
268 | \end_layout 269 | 270 | \begin_layout Standard 271 | Let us denote 272 | \begin_inset Formula \begin{align} 273 | c_{k}^{(1)} & :=h_{k}\nonumber \\ 274 | c_{k}^{(2)} & :=\ell_{k}\nonumber \\ 275 | c_{k}^{(3)} & :=\frac{h_{k}^{2}}{2}\nonumber \\ 276 | c_{k}^{(4)} & :=h_{k}\ell_{k}\nonumber \\ 277 | c_{k}^{(5)} & :=\frac{\ell_{k}^{2}}{2}\;.\label{eq:ck}\end{align} 278 | 279 | \end_inset 280 | 281 | We would like to minimize the approximation error. 282 | Let us denote the error as 283 | \begin_inset Formula \begin{equation} 284 | e_{k}:=f_{k}-\overline{f}_{k}\;.\label{eq:ek}\end{equation} 285 | 286 | \end_inset 287 | 288 | We proceed by making a least squares approximation. 289 | Let 290 | \begin_inset Formula \begin{equation} 291 | G:=\frac{1}{2}\underset{k}{\sum}e_{k}^{2}\label{eq:G}\end{equation} 292 | 293 | \end_inset 294 | 295 | where the sum is taken over the nearest-neighbor set of 296 | \begin_inset Formula $x_{i}$ 297 | \end_inset 298 | 299 | . 300 | The least-squares approximation is given by the minimum 301 | \begin_inset Formula \[ 302 | \underset{a_{j}}{\min}\, G\;,\] 303 | 304 | \end_inset 305 | 306 | i.e. 307 | such values for the 308 | \begin_inset Formula $a_{j}$ 309 | \end_inset 310 | 311 | that they minimize the squared error 312 | \begin_inset Formula $G$ 313 | \end_inset 314 | 315 | . 316 | \end_layout 317 | 318 | \begin_layout Standard 319 | The minimum of the function 320 | \begin_inset Formula $G=G(a_{1},\dots,a_{5})$ 321 | \end_inset 322 | 323 | is necessarily at an extremum point. 
324 | Thus, we set all its partial derivatives to zero (w.r.t the 325 | \begin_inset Formula $a_{j}$ 326 | \end_inset 327 | 328 | ): 329 | \begin_inset Formula \begin{equation} 330 | \frac{\partial G}{\partial a_{j}}=0\quad\forall\; j=1,\dots,5\;.\label{eq:minG}\end{equation} 331 | 332 | \end_inset 333 | 334 | Because 335 | \begin_inset Formula $G\ge0$ 336 | \end_inset 337 | 338 | for any values of the 339 | \begin_inset Formula $a_{j}$ 340 | \end_inset 341 | 342 | and it is a quadratic function, this point is also necessarily the minimum. 343 | Thus, solving equation 344 | \begin_inset CommandInset ref 345 | LatexCommand eqref 346 | reference "eq:minG" 347 | 348 | \end_inset 349 | 350 | gives us the optimal 351 | \begin_inset Formula $a_{j}$ 352 | \end_inset 353 | 354 | . 355 | \end_layout 356 | 357 | \begin_layout Standard 358 | One important thing to notice here is that we of course do not have the 359 | value of the asymptotic term 360 | \family roman 361 | \series medium 362 | \shape up 363 | \size normal 364 | \emph off 365 | \bar no 366 | \noun off 367 | \color none 368 | 369 | \begin_inset Formula $O(h_{k}^{3},\ell_{k}^{3})$ 370 | \end_inset 371 | 372 | in 373 | \begin_inset CommandInset ref 374 | LatexCommand eqref 375 | reference "eq:Tay" 376 | 377 | \end_inset 378 | 379 | . 380 | However, we do not need equation 381 | \begin_inset CommandInset ref 382 | LatexCommand eqref 383 | reference "eq:Tay" 384 | 385 | \end_inset 386 | 387 | for computing the error 388 | \begin_inset CommandInset ref 389 | LatexCommand eqref 390 | reference "eq:ek" 391 | 392 | \end_inset 393 | 394 | . 395 | This is because we already have the value of 396 | \begin_inset Formula $f_{k}$ 397 | \end_inset 398 | 399 | directly, since it is one of the points in the data! 
Thus, for any set 400 | of values for the 401 | \begin_inset Formula $a_{j}$ 402 | \end_inset 403 | 404 | , the error 405 | \begin_inset CommandInset ref 406 | LatexCommand eqref 407 | reference "eq:ek" 408 | 409 | \end_inset 410 | 411 | can be computed (by replacing 412 | \begin_inset Formula $f_{k}$ 413 | \end_inset 414 | 415 | with the data point in question and computing 416 | \begin_inset Formula $\overline{f}_{k}$ 417 | \end_inset 418 | 419 | from 420 | \begin_inset CommandInset ref 421 | LatexCommand eqref 422 | reference "eq:approx" 423 | 424 | \end_inset 425 | 426 | ). 427 | \end_layout 428 | 429 | \begin_layout Standard 430 | Let us write out 431 | \begin_inset CommandInset ref 432 | LatexCommand eqref 433 | reference "eq:minG" 434 | 435 | \end_inset 436 | 437 | . 438 | We have 439 | \begin_inset Formula \begin{align} 440 | \frac{\partial G}{\partial a_{j}} & =\underset{k}{\sum}e_{k}\frac{\partial e_{k}}{\partial a_{j}}\nonumber \\ 441 | & =\underset{k}{\sum}[f_{k}-\overline{f}_{k}(a_{1},\dots,a_{5})]\left[-\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]=0\quad\forall\; j=1,\dots5\;,\label{eq:minG2}\end{align} 442 | 443 | \end_inset 444 | 445 | where we have replaced 446 | \begin_inset Formula $e_{k}$ 447 | \end_inset 448 | 449 | by the difference of data 450 | \begin_inset Formula $f_{k}$ 451 | \end_inset 452 | 453 | and the interpolate 454 | \begin_inset Formula $\overline{f}_{k}$ 455 | \end_inset 456 | 457 | , as noted above. 458 | \end_layout 459 | 460 | \begin_layout Standard 461 | Now the rest is essentially technique. 
462 | Expanding the first 463 | \begin_inset Formula $\overline{f}_{k}$ 464 | \end_inset 465 | 466 | in 467 | \begin_inset CommandInset ref 468 | LatexCommand eqref 469 | reference "eq:minG2" 470 | 471 | \end_inset 472 | 473 | and taking the minus sign in front, we have 474 | \begin_inset Formula \[ 475 | -\underset{k}{\sum}\left(\left[f_{k}-f_{i}-c_{k}^{(1)}a_{1}-c_{k}^{(2)}a_{2}-c_{k}^{(3)}a_{3}-c_{k}^{(4)}a_{4}-c_{k}^{(5)}a_{5}\right]\left[\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]\right)=0\quad\forall j\;.\] 476 | 477 | \end_inset 478 | 479 | This can be rewritten as a standard linear equation system 480 | \begin_inset Formula \begin{equation} 481 | A\mathbf{a}=\mathbf{b}\;,\label{eq:lineq}\end{equation} 482 | 483 | \end_inset 484 | 485 | where 486 | \begin_inset Formula \[ 487 | \mathbf{a}=(a_{1},\dots,a_{5})^{T}\] 488 | 489 | \end_inset 490 | 491 | are the unknowns, and the 492 | \begin_inset Formula $j$ 493 | \end_inset 494 | 495 | th component of the load vector 496 | \begin_inset Formula $\mathbf{b}$ 497 | \end_inset 498 | 499 | is 500 | \begin_inset Formula \begin{equation} 501 | b_{j}=\underset{k}{\sum}[f_{k}-f_{i}]\left[\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]=\underset{k}{\sum}[f_{k}-f_{i}]c_{k}^{(j)}\;,\label{eq:bj}\end{equation} 502 | 503 | \end_inset 504 | 505 | where in the last form we have used 506 | \begin_inset CommandInset ref 507 | LatexCommand eqref 508 | reference "eq:approx" 509 | 510 | \end_inset 511 | 512 | and the definition 513 | \begin_inset CommandInset ref 514 | LatexCommand eqref 515 | reference "eq:ck" 516 | 517 | \end_inset 518 | 519 | . 520 | The sum, like above, is taken over the set of nearest neighbors. 521 | Especially note that, as required, all the quantities on the right-hand 522 | side of 523 | \begin_inset CommandInset ref 524 | LatexCommand eqref 525 | reference "eq:bj" 526 | 527 | \end_inset 528 | 529 | are known. 
530 | \end_layout 531 | 532 | \begin_layout Standard 533 | The element 534 | \begin_inset Formula $A_{jn}$ 535 | \end_inset 536 | 537 | of the coefficient matrix 538 | \begin_inset Formula $A$ 539 | \end_inset 540 | 541 | is 542 | \begin_inset Formula \begin{equation} 543 | A_{jn}=\underset{k}{\sum}c_{k}^{(n)}c_{k}^{(j)}\;.\label{eq:Ajn}\end{equation} 544 | 545 | \end_inset 546 | 547 | This sum, too, is taken over the set of nearest neighbors. 548 | The matrix 549 | \begin_inset Formula $A$ 550 | \end_inset 551 | 552 | is symmetric, 553 | \begin_inset Formula $A=A^{T}$ 554 | \end_inset 555 | 556 | . 557 | \end_layout 558 | 559 | \begin_layout Standard 560 | Solving 561 | \begin_inset CommandInset ref 562 | LatexCommand eqref 563 | reference "eq:lineq" 564 | 565 | \end_inset 566 | 567 | , by e.g. 568 | pivoted Gaussian elimination (routine DGESV in LAPACK, operator 569 | \backslash 570 | in MATLAB, scipy.linalg.solve() in Python, ...), produces the derivative approximati 571 | ons 572 | \begin_inset Formula $a_{j}$ 573 | \end_inset 574 | 575 | , up to the second order. 576 | \end_layout 577 | 578 | \begin_layout Standard 579 | Note that both 580 | \begin_inset Formula $A$ 581 | \end_inset 582 | 583 | and 584 | \begin_inset Formula $\mathbf{b}$ 585 | \end_inset 586 | 587 | depend on the node index 588 | \begin_inset Formula $i$ 589 | \end_inset 590 | 591 | ! That is, each node comes with its own 592 | \begin_inset Formula $A$ 593 | \end_inset 594 | 595 | and 596 | \begin_inset Formula $\mathbf{b}$ 597 | \end_inset 598 | 599 | , and thus 600 | \begin_inset CommandInset ref 601 | LatexCommand eqref 602 | reference "eq:bj" 603 | 604 | \end_inset 605 | 606 | and 607 | \begin_inset CommandInset ref 608 | LatexCommand eqref 609 | reference "eq:Ajn" 610 | 611 | \end_inset 612 | 613 | must be re-evaluated for each node where we wish to obtain the derivative 614 | approximation. 
615 | \end_layout 616 | 617 | \begin_layout Subsubsection* 618 | Sensitivity of the solution 619 | \end_layout 620 | 621 | \begin_layout Standard 622 | It is also possible to obtain the sensitivity of the solution 623 | \begin_inset Formula $\mathbf{a}$ 624 | \end_inset 625 | 630 | in terms of small changes in the values of the data points 631 | \begin_inset Formula $f_{k}$ 632 | \end_inset 633 | 634 | . 635 | Consider, formally, manipulating 636 | \begin_inset CommandInset ref 637 | LatexCommand eqref 638 | reference "eq:lineq" 639 | 640 | \end_inset 641 | 642 | into 643 | \begin_inset Formula \[ 644 | \mathbf{a}(f_{k})=A^{-1}\cdot\mathbf{b}(f_{k})\;.\] 645 | 646 | \end_inset 647 | 648 | Differentiating both sides, and writing the equation in component form, 649 | gives (the matrix 650 | \begin_inset Formula $A$ 651 | \end_inset 652 | 653 | is constant w.r.t. 654 | 655 | \begin_inset Formula $f_{k}$ 656 | \end_inset 657 | 658 | ) 659 | \begin_inset Formula \begin{align*} 660 | \frac{\partial a_{j}}{\partial f_{k}} & =\underset{n}{\sum}(A^{-1})_{jn}\frac{\partial b_{n}}{\partial f_{k}}\\ 661 | & =\underset{n}{\sum}(A^{-1})_{jn}c_{k}^{(n)}\;,\quad\forall\; j=1,\dots,5\;,\end{align*} 662 | 663 | \end_inset 664 | 665 | which can be rewritten as 666 | \begin_inset Formula \begin{equation} 667 | A\frac{\partial\mathbf{a}}{\partial f_{k}}=(c_{k}^{(1)},c_{k}^{(2)},c_{k}^{(3)},c_{k}^{(4)},c_{k}^{(5)})^{T}\;.\label{eq:sens}\end{equation} 668 | 669 | \end_inset 670 | 671 | Thus we have a linear equation system, from which the sensitivities of each 672 | of the 673 | \begin_inset Formula $a_{j}$ 674 | \end_inset 675 | 676 | in terms of the node value 677 | \begin_inset Formula $f_{k}$ 678 | \end_inset 679 | 680 | can be solved.
681 | By changing 682 | \begin_inset Formula $k$ 683 | \end_inset 684 | 685 | on the right-hand side and solving again for each 686 | \begin_inset Formula $k$ 687 | \end_inset 688 | 689 | , we obtain the sensitivity with respect to each of the neighbors. 690 | (Note that there is 691 | \series bold 692 | no 693 | \series default 694 | sum over 695 | \begin_inset Formula $k$ 696 | \end_inset 697 | 698 | , except inside the matrix 699 | \begin_inset Formula $A$ 700 | \end_inset 701 | 702 | .) 703 | \end_layout 704 | 705 | \begin_layout Standard 706 | This sensitivity result may be useful for forcing Neumann boundary conditions 707 | to hold during IBVP integration (at each timestep, changing the values 708 | at the nodes belonging to the boundary until the BC is satisfied). 709 | \end_layout 710 | 711 | \begin_layout Standard 712 | Again, it should be noted that equation 713 | \begin_inset CommandInset ref 714 | LatexCommand eqref 715 | reference "eq:sens" 716 | 717 | \end_inset 718 | 719 | is valid for the node 720 | \begin_inset Formula $i$ 721 | \end_inset 722 | 723 | , and in principle must be solved separately for each node. 724 | \end_layout 725 | 726 | \begin_layout Standard 727 | However, we observe that the sensitivities depend on the (local) geometry 728 | of the point cloud only. 729 | Recall the definitions of 730 | \begin_inset Formula $A$ 731 | \end_inset 732 | 733 | and 734 | \begin_inset Formula $c_{k}^{(n)}$ 735 | \end_inset 736 | 737 | , equations 738 | \begin_inset CommandInset ref 739 | LatexCommand eqref 740 | reference "eq:Ajn" 741 | 742 | \end_inset 743 | 744 | and 745 | \begin_inset CommandInset ref 746 | LatexCommand eqref 747 | reference "eq:ck" 748 | 749 | \end_inset 750 | 751 | ; the only quantities that appear are the pairwise node distances. 752 | This observation holds for any point cloud. 
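Because the matrix appearing on the left-hand side does not depend on the neighbor index, the solves for all neighbors can share a single factorization, with only a cheap back-substitution repeated per right-hand side. A minimal NumPy sketch (names are illustrative, not the library API):

```python
import numpy as np

def sensitivities(C):
    """Solve the sensitivity systems for every neighbor k at once (a sketch).

    C is the M x 5 matrix whose k-th row is (c_k^(1), ..., c_k^(5)).
    Returns the 5 x M matrix S with S[j, k] = d a_j / d f_k.
    Passing all M right-hand sides to one solve means A is factorized
    once, and only the O(n^2) back-substitution is repeated per neighbor.
    """
    A = C.T @ C                      # same A as in the fitting step
    return np.linalg.solve(A, C.T)   # one LU factorization, M back-substitutions
```

Column k of the result is the sensitivity vector with respect to neighbor k; as a consistency check, S applied to the data vector (f_k - f_i) reproduces the fitted coefficients, since the fit is linear in the data.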
753 | \end_layout 754 | 755 | \begin_layout Standard 756 | If there is some regularity in the geometry, it may be possible to reuse 757 | (some of) the results. 758 | As a special case, if we have a regular Cartesian grid, the 759 | \begin_inset Formula $c_{k}^{(n)}$ 760 | \end_inset 761 | 762 | are constant with respect to 763 | \begin_inset Formula $k$ 764 | \end_inset 765 | 766 | , and thus in this special case only, the sensitivities at each node follow 767 | the same pattern. 768 | This extends easily to other regular geometries; e.g. 769 | for a grid based on the nodes of a hexagonal tiling, there will be only 770 | two kinds of nodes with regard to the sensitivity. 771 | The strength of the method, however, lies in being able to handle irregular 772 | geometries: in the general case, one does not need to assume anything about 773 | the distribution of the points. 774 | \end_layout 775 | 776 | \begin_layout Subsubsection* 777 | Finding nearest neighbors --- a simple algorithm 778 | \end_layout 779 | 780 | \begin_layout Standard 781 | In this section, we look into the problem of searching a given point cloud 782 | for nearest neighbors. 783 | We consider finding the neighbors within a given distance 784 | \begin_inset Formula $R$ 785 | \end_inset 786 | 787 | from a given point, and finding the 788 | \begin_inset Formula $m$ 789 | \end_inset 790 | 791 | nearest neighbors of a given point, with 792 | \begin_inset Formula $m$ 793 | \end_inset 794 | 795 | given. 796 | \end_layout 797 | 798 | \begin_layout Standard 799 | An example MATLAB/Octave implementation of the ideas presented in this section 800 | is provided in 801 | \family typewriter 802 | 803 | \begin_inset Newline newline 804 | \end_inset 805 | 806 | find_neighbors.m 807 | \family default 808 | (in the SAVU project git repository). 
809 | \end_layout 810 | 811 | \begin_layout Paragraph* 812 | Finding all neighbors within distance R 813 | \end_layout 814 | 815 | \begin_layout Standard 816 | For a static point cloud (in the sense of not changing during the simulation), 817 | the nearest neighbor search problem can be solved in 818 | \begin_inset Formula $O(d\, N\,\log\, N)$ 819 | \end_inset 820 | 821 | time (where 822 | \begin_inset Formula $N$ 823 | \end_inset 824 | 825 | is the number of points in the whole cloud, and 826 | \begin_inset Formula $d$ 827 | \end_inset 828 | 829 | is the dimensionality of the space 830 | \begin_inset Formula $\mathbb{R}^{d}$ 831 | \end_inset 832 | 833 | in which the points live) using an indexed search procedure. 834 | For a moving point cloud, the 835 | \begin_inset Quotes eld 836 | \end_inset 837 | 838 | expensive 839 | \begin_inset Quotes erd 840 | \end_inset 841 | 842 | 843 | \begin_inset Formula $O(d\, N\,\log\, N)$ 844 | \end_inset 845 | 846 | step must be re-performed at each timestep. 847 | \end_layout 848 | 849 | \begin_layout Standard 850 | Initially, we create a sorted index of the data based on the coordinates 851 | on each axis. 852 | This gives us 853 | \begin_inset Formula $d$ 854 | \end_inset 855 | 856 | sorted vectors of 857 | \begin_inset Formula $(\text{coordinate along }j\text{th axis},\,\text{point ID})$ 858 | \end_inset 859 | 860 | pairs. 861 | This enables us to search for the set of points that belong to a given 862 | interval on, say, the 863 | \begin_inset Formula $x$ 864 | \end_inset 865 | 866 | axis ( 867 | \begin_inset Formula $j=1$ 868 | \end_inset 869 | 870 | ; correspondingly for the other axes). 871 | Each sort finishes in 872 | \begin_inset Formula $O(N\,\log\, N)$ 873 | \end_inset 874 | 875 | time, and only needs to be done once (or until the point cloud changes; 876 | then we must re-index).
877 | Then, indexed search on this data can be done using the binary search procedure 878 | in 879 | \begin_inset Formula $O(\log\, N)$ 880 | \end_inset 881 | 882 | time for each dimension. 883 | \end_layout 884 | 885 | \begin_layout Standard 886 | To find the neighbors within distance 887 | \begin_inset Formula $R$ 888 | \end_inset 889 | 890 | of a point with given coordinates in 891 | \begin_inset Formula $\mathbb{R}^{d}$ 892 | \end_inset 893 | 894 | (allowed to be a point belonging to the cloud, but not necessarily), we first 895 | search along each axis, producing 896 | \begin_inset Formula $d$ 897 | \end_inset 898 | 899 | filtered index sets in each of which the coordinates on the 900 | \begin_inset Formula $j$ 901 | \end_inset 902 | 903 | th axis match the desired interval 904 | \begin_inset Formula $[(x_{0})_{j}-R,\;(x_{0})_{j}+R]$ 905 | \end_inset 906 | 907 | . 908 | Taking the set intersection of the result sets gives us the neighbor set 909 | within distance 910 | \begin_inset Formula $R$ 911 | \end_inset 912 | 913 | in the sense of the 914 | \begin_inset Formula $\ell^{\infty}$ 915 | \end_inset 916 | 917 | metric. 918 | The next step is to filter the result further. 919 | \end_layout 920 | 921 | \begin_layout Standard 922 | An important property here is that because 923 | \begin_inset Formula $\Vert x\Vert_{\ell^{\infty}}\le\Vert x\Vert_{\ell^{p}}$ 924 | \end_inset 925 | 926 | for all 927 | \begin_inset Formula $1\le p<\infty$ 928 | \end_inset 929 | 930 | , the 931 | \begin_inset Formula $\ell^{\infty}$ 932 | \end_inset 933 | 934 | neighbor set encloses all other 935 | \begin_inset Formula $\ell^{p}$ 936 | \end_inset 937 | 938 | neighbor sets, including the Euclidean neighbor set (with 939 | \begin_inset Formula $p=2$ 940 | \end_inset 941 | 942 | ). 943 | Thus, all these other neighbor sets can be produced by filtering the 944 | \begin_inset Formula $\ell^{\infty}$ 945 | \end_inset 946 | 947 | neighbor set.
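The index build and the two-stage filtering can be sketched as follows. These are hypothetical helper functions, not the library API; note that each per-axis query interval needs half-width R (not R/2), so that the ℓ∞ box really encloses the Euclidean ball of radius R:

```python
import numpy as np

def build_index(points):
    """Per-axis sorted index: d arrays of point IDs, built once in O(d N log N)."""
    return [np.argsort(points[:, j]) for j in range(points.shape[1])]

def neighbors_within(points, index, x0, R):
    """IDs of all cloud points within Euclidean distance R of x0 (a sketch).

    First intersect the d per-axis interval queries (binary search,
    O(log N) each) to get the l-infinity candidate set, then keep only
    the candidates that also pass the l2 distance test.
    """
    cand = None
    for j in range(points.shape[1]):
        order = index[j]
        coords = points[order, j]                       # coordinates in sorted order
        lo = np.searchsorted(coords, x0[j] - R, side="left")
        hi = np.searchsorted(coords, x0[j] + R, side="right")
        ids = set(order[lo:hi].tolist())
        cand = ids if cand is None else cand & ids      # set intersection
    cand = np.fromiter(cand, dtype=int)                 # l-infinity neighbor set
    keep = np.linalg.norm(points[cand] - x0, axis=1) <= R
    return np.sort(cand[keep])                          # l2 (Euclidean) neighbor set
```

The linear ℓ2 filter at the end touches only the M candidates, which, as noted above, is cheap since M ≪ N in practice.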
948 | \end_layout 949 | 950 | \begin_layout Standard 951 | The 952 | \begin_inset Formula $\ell^{\infty}$ 953 | \end_inset 954 | 955 | neighbor set, with 956 | \begin_inset Formula $M$ 957 | \end_inset 958 | 959 | points, is for any practically interesting 960 | \begin_inset Formula $R$ 961 | \end_inset 962 | 963 | much smaller than the whole cloud ( 964 | \begin_inset Formula $M\ll N$ 965 | \end_inset 966 | 967 | ). 968 | Thus, linear filtering of the result set, which takes 969 | \begin_inset Formula $O(M)$ 970 | \end_inset 971 | 972 | time, is not a major cost. 973 | \end_layout 974 | 975 | \begin_layout Standard 976 | To find the 977 | \begin_inset Formula $\ell^{2}$ 978 | \end_inset 979 | 980 | (Euclidean) neighbor set, we simply construct a new result set, including 981 | in it only those points in the 982 | \begin_inset Formula $\ell^{\infty}$ 983 | \end_inset 984 | 985 | neighbor set that also satisfy the 986 | \begin_inset Formula $\ell^{2}$ 987 | \end_inset 988 | 989 | distance requirement 990 | \begin_inset Formula $\Vert x_{j}-x_{0}\Vert_{\ell^{2}}\le R$ 991 | \end_inset 992 | 993 | . 994 | \end_layout 995 | 996 | \begin_layout Paragraph* 997 | Finding the m nearest neighbors 998 | \end_layout 999 | 1000 | \begin_layout Standard 1001 | Finally, consider the question of finding 1002 | \begin_inset Formula $R$ 1003 | \end_inset 1004 | 1005 | such that within this radius, there are exactly 1006 | \begin_inset Formula $m$ 1007 | \end_inset 1008 | 1009 | neighbors (where 1010 | \begin_inset Formula $m$ 1011 | \end_inset 1012 | 1013 | is user-specified). 1014 | This provides us a nearest-neighbor search procedure for user-definable 1015 | 1016 | \begin_inset Formula $m$ 1017 | \end_inset 1018 | 1019 | , which is what we need in the meshless method. 1020 | \end_layout 1021 | 1022 | \begin_layout Standard 1023 | We start from some 1024 | \begin_inset Formula $R=r_{0}$ 1025 | \end_inset 1026 | 1027 | (this can be e.g. 
1028 | some function of the size of the bounding box of the data, which can be 1029 | trivially found in 1030 | \begin_inset Formula $O(N)$ 1031 | \end_inset 1032 | 1033 | time, and the number of points in the data (e.g. 1034 | assuming them to have uniform density and estimating average 1035 | \begin_inset Formula $R$ 1036 | \end_inset 1037 | 1038 | from that)). 1039 | We then do a logarithmic search, counting the neighbors within radius 1040 | \begin_inset Formula $R$ 1041 | \end_inset 1042 | 1043 | and, based on the result, we either double or halve 1044 | \begin_inset Formula $R$ 1045 | \end_inset 1046 | 1047 | at each step. 1048 | \end_layout 1049 | 1050 | \begin_layout Standard 1051 | By this logarithmic search, we may get lucky and hit an 1052 | \begin_inset Formula $R$ 1053 | \end_inset 1054 | 1055 | where there are exactly 1056 | \begin_inset Formula $m$ 1057 | \end_inset 1058 | 1059 | neighbors. 1060 | In this case, we stop and return the current neighbor set. 1061 | \end_layout 1062 | 1063 | \begin_layout Standard 1064 | But most often, we will find an interval 1065 | \begin_inset Formula $R\in[R_{1},R_{2}]$ 1066 | \end_inset 1067 | 1068 | where 1069 | \begin_inset Formula $R_{1}$ 1070 | \end_inset 1071 | 1072 | has fewer than 1073 | \begin_inset Formula $m$ 1074 | \end_inset 1075 | 1076 | neighbors, and 1077 | \begin_inset Formula $R_{2}=2R_{1}$ 1078 | \end_inset 1079 | 1080 | has more than 1081 | \begin_inset Formula $m$ 1082 | \end_inset 1083 | 1084 | . 1085 | This interval can be refined using binary search on the variable 1086 | \begin_inset Formula $R$ 1087 | \end_inset 1088 | 1089 | . 1090 | This produces a sequence of shrinking intervals 1091 | \begin_inset Formula $[R_{a},R_{b}]$ 1092 | \end_inset 1093 | 1094 | , which converges onto (some) correct 1095 | \begin_inset Formula $R$ 1096 | \end_inset 1097 | 1098 | .
1099 | This works, because the number of neighbors as a function of distance is 1100 | a monotonic (although discontinuous and piecewise constant) function. 1101 | We stop the search once we find an 1102 | \begin_inset Formula $R$ 1103 | \end_inset 1104 | 1105 | which has exactly 1106 | \begin_inset Formula $m$ 1107 | \end_inset 1108 | 1109 | neighbors. 1110 | \end_layout 1111 | 1112 | \begin_layout Standard 1113 | The final pitfall is that in an arbitrary point cloud, for any given point, 1114 | the cloud may contain exactly two (or more) points at the exact same distance 1115 | from it. 1116 | In these cases, there might not exist a distance with exactly 1117 | \begin_inset Formula $m$ 1118 | \end_inset 1119 | 1120 | neighbors for the given point! To protect against this possibility, we 1121 | set a tolerance 1122 | \begin_inset Formula $\varepsilon>0$ 1123 | \end_inset 1124 | 1125 | for the length of the search interval 1126 | \begin_inset Formula $[R_{a},R_{b}]$ 1127 | \end_inset 1128 | 1129 | in the above procedure. 1130 | If no matching 1131 | \begin_inset Formula $R$ 1132 | \end_inset 1133 | 1134 | has been found, and 1135 | \begin_inset Formula $R_{b}-R_{a}<\varepsilon$ 1136 | \end_inset 1137 | 1138 | , we stop the search and return the neighbor set at 1139 | \begin_inset Formula $R_{b}$ 1140 | \end_inset 1141 | 1142 | (along with e.g. 1143 | an error code or some other signal, so that the calling end knows that 1144 | extra neighbors have been returned). 
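A sketch of this radius search, including the tolerance fallback for tied distances. The helper name and the flag convention are invented for illustration; it also assumes the cloud contains at least m points (so the growth phase terminates) and that the query point itself is not counted:

```python
def radius_for_m(count_within, r0, m, eps=1e-12):
    """Find a radius R with exactly m neighbors inside it (a sketch).

    count_within(R) -> number of neighbors within radius R; this is a
    monotone, piecewise-constant step function of R.  Starting from the
    initial guess r0, double R until at least m neighbors are inside,
    then bisect.  If ties in the distances make an exact count-m radius
    impossible, stop once the bracket is shorter than eps and return
    its upper end, with a flag telling the caller that extra (tied)
    neighbors were included.
    """
    R = r0
    while count_within(R) < m:      # logarithmic growth phase
        R *= 2.0
    Ra, Rb = 0.0, R                 # invariant: count(Ra) < m <= count(Rb)
    while Rb - Ra > eps:
        mid = 0.5 * (Ra + Rb)
        n = count_within(mid)
        if n == m:
            return mid, True        # exact hit: exactly m neighbors
        if n < m:
            Ra = mid
        else:
            Rb = mid
    return Rb, False                # tie: >= m neighbors at Rb
```

The boolean result plays the role of the error signal mentioned above: `False` means the caller received more than m neighbors due to equidistant points.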
1145 | \end_layout 1146 | 1147 | \end_body 1148 | \end_document 1149 | -------------------------------------------------------------------------------- /doc/wlsqm.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/doc/wlsqm.pdf -------------------------------------------------------------------------------- /doc/wlsqm_gen.lyx: -------------------------------------------------------------------------------- 1 | #LyX 2.2 created this file. For more info see http://www.lyx.org/ 2 | \lyxformat 508 3 | \begin_document 4 | \begin_header 5 | \save_transient_properties true 6 | \origin unavailable 7 | \textclass article 8 | \use_default_options true 9 | \maintain_unincluded_children false 10 | \language english 11 | \language_package default 12 | \inputencoding auto 13 | \fontencoding global 14 | \font_roman "palatino" "default" 15 | \font_sans "default" "default" 16 | \font_typewriter "default" "default" 17 | \font_math "auto" "auto" 18 | \font_default_family default 19 | \use_non_tex_fonts false 20 | \font_sc false 21 | \font_osf false 22 | \font_sf_scale 100 100 23 | \font_tt_scale 100 100 24 | \graphics default 25 | \default_output_format default 26 | \output_sync 0 27 | \bibtex_command default 28 | \index_command default 29 | \paperfontsize default 30 | \spacing single 31 | \use_hyperref true 32 | \pdf_bookmarks true 33 | \pdf_bookmarksnumbered false 34 | \pdf_bookmarksopen false 35 | \pdf_bookmarksopenlevel 1 36 | \pdf_breaklinks false 37 | \pdf_pdfborder true 38 | \pdf_colorlinks false 39 | \pdf_backref false 40 | \pdf_pdfusetitle true 41 | \papersize default 42 | \use_geometry true 43 | \use_package amsmath 1 44 | \use_package amssymb 1 45 | \use_package cancel 1 46 | \use_package esint 1 47 | \use_package mathdots 1 48 | \use_package mathtools 1 49 | \use_package mhchem 1 50 | \use_package stackrel 1 51 | \use_package stmaryrd 1 
52 | \use_package undertilde 1 53 | \cite_engine natbib 54 | \cite_engine_type authoryear 55 | \biblio_style plain 56 | \use_bibtopic false 57 | \use_indices false 58 | \paperorientation portrait 59 | \suppress_date false 60 | \justification true 61 | \use_refstyle 1 62 | \index Index 63 | \shortcut idx 64 | \color #008000 65 | \end_index 66 | \leftmargin 2cm 67 | \topmargin 2cm 68 | \rightmargin 2cm 69 | \bottommargin 2cm 70 | \secnumdepth 3 71 | \tocdepth 3 72 | \paragraph_separation indent 73 | \paragraph_indentation default 74 | \quotes_language english 75 | \papercolumns 1 76 | \papersides 1 77 | \paperpagestyle default 78 | \tracking_changes false 79 | \output_changes false 80 | \html_math_output 0 81 | \html_css_as_file 0 82 | \html_be_strict false 83 | \end_header 84 | 85 | \begin_body 86 | 87 | \begin_layout Section 88 | Extended WLSQM: dealing with missing function values 89 | \end_layout 90 | 91 | \begin_layout Standard 92 | Can we extend WLSQM to the case where the function value 93 | \begin_inset Formula $\widehat{f}_{i}(x_{i})$ 94 | \end_inset 95 | 96 | is unknown, provided that 97 | \begin_inset Formula $\widehat{f}_{i}(x_{k})$ 98 | \end_inset 99 | 100 | is known for all neighbor points 101 | \begin_inset Formula $x_{k}$ 102 | \end_inset 103 | 104 | ? 105 | \end_layout 106 | 107 | \begin_layout Itemize 108 | The primary use case is handling boundary conditions, which may prescribe 109 | a derivative, leaving the function value free. 110 | In these cases, we eliminate the appropriate 111 | \begin_inset Formula $a_{j}$ 112 | \end_inset 113 | 114 | , either by algebraic elimination of the corresponding row and column; or 115 | by replacing its row in the equation system with 116 | \begin_inset Formula $1\cdot a_{j}=C$ 117 | \end_inset 118 | 119 | (maybe appropriately scaled), where 120 | \begin_inset Formula $C$ 121 | \end_inset 122 | 123 | is its known value. 
124 | \end_layout 125 | 126 | \begin_layout Itemize 127 | This can also be used for interpolation, to obtain an approximation to the 128 | function value and its derivatives at an arbitrary point 129 | \begin_inset Formula $x$ 130 | \end_inset 131 | 132 | that does not belong to the point cloud. 133 | (But here a cheaper alternative is to compute the approximation from the 134 | obtained quadratic fit. 135 | This also gives the derivatives, since the analytical expression of the 136 | fit is known.) 137 | \end_layout 138 | 139 | \begin_layout Itemize 140 | Another use case may be as an error indicator (compare the interpolated 141 | 142 | \begin_inset Formula $\widehat{f}_{i}(x_{i})$ 143 | \end_inset 144 | 145 | , computed by omitting 146 | \begin_inset Formula $f_{i}$ 147 | \end_inset 148 | 149 | , and the actual data 150 | \begin_inset Formula $f_{i}$ 151 | \end_inset 152 | 153 | ). 154 | \end_layout 155 | 156 | \begin_layout Itemize 157 | Also as a smoother? Replace each 158 | \begin_inset Formula $f_{i}$ 159 | \end_inset 160 | 161 | by its interpolant, then iterate until convergence. 162 | \end_layout 163 | 164 | \begin_layout Standard 165 | The answer turns out to be yes. 166 | Let us denote the local representation of our scalar field 167 | \begin_inset Formula $f(x)$ 168 | \end_inset 169 | 170 | , in a neighborhood of the point 171 | \begin_inset Formula $x_{i}$ 172 | \end_inset 173 | 174 | , by 175 | \begin_inset Formula $\widehat{f}_{i}(x)$ 176 | \end_inset 177 | 178 | . 
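The row-replacement trick from the boundary-condition bullet above, for forcing a prescribed coefficient (e.g. a derivative fixed by a Neumann condition), can be sketched as follows. This is a hypothetical helper; the scaling choice shown is one reasonable option, not the only one:

```python
import numpy as np

def prescribe(A, b, j, value, scale=None):
    """Force unknown a_j to a known value by row replacement (a sketch).

    Overwrites row j of the system with  s * a_j = s * value,  where the
    scale s (here defaulting to the largest diagonal entry of A) keeps
    the new row's magnitude comparable to the rest of the matrix.
    Returns modified copies of A and b; note the result is no longer
    symmetric, unlike full algebraic elimination of row and column j.
    """
    A = A.copy()
    b = b.copy()
    s = scale if scale is not None else np.abs(np.diag(A)).max()
    A[j, :] = 0.0
    A[j, j] = s
    b[j] = s * value
    return A, b
```

The remaining equations still contain a_j with their original coefficients, so the other unknowns adjust to the prescribed value when the modified system is solved.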
179 | \end_layout 180 | 181 | \begin_layout Standard 182 | Let us Taylor expand 183 | \begin_inset Formula $\widehat{f}_{i}$ 184 | \end_inset 185 | 186 | around the point 187 | \begin_inset Formula $x_{i}$ 188 | \end_inset 189 | 190 | , and evaluate the Taylor series at a neighbor point 191 | \begin_inset Formula $x_{k}$ 192 | \end_inset 193 | 194 | (a point distinct from 195 | \begin_inset Formula $x_{i}$ 196 | \end_inset 197 | 198 | , also belonging to the point cloud): 199 | \begin_inset Formula 200 | \begin{equation} 201 | \widehat{f}_{i}(x_{k})=\widehat{f}_{i}(x_{i})+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}+O(h_{k}^{3}\,,\ell_{k}^{3})\;,\label{eq:Tay} 202 | \end{equation} 203 | 204 | \end_inset 205 | 206 | where 207 | \begin_inset Formula 208 | \begin{align} 209 | h_{k} & :=(x_{k})_{1}-(x_{i})_{1}\;,\label{eq:hk}\\ 210 | \ell_{k} & :=(x_{k})_{2}-(x_{i})_{2}\;,\label{eq:ellk} 211 | \end{align} 212 | 213 | \end_inset 214 | 215 | and the function value and the derivatives are denoted by (note the numbering) 216 | \begin_inset Formula 217 | \begin{align} 218 | a_{1} & =\frac{\partial\widehat{f}_{i}}{\partial x}\vert_{x=x_{i}}\;, & a_{2} & =\frac{\partial\widehat{f}_{i}}{\partial y}\vert_{x=x_{i}}\;,\nonumber \\ 219 | a_{3} & =\frac{\partial^{2}\widehat{f}_{i}}{\partial x^{2}}\vert_{x=x_{i}}\;, & a_{5} & =\frac{\partial^{2}\widehat{f}_{i}}{\partial y^{2}}\vert_{x=x_{i}}\;,\nonumber \\ 220 | a_{4} & =\frac{\partial^{2}\widehat{f}_{i}}{\partial x\partial y}\vert_{x=x_{i}}\;, & a_{0} & =\widehat{f}_{i}\vert_{x=x_{i}}\;.\label{eq:aj} 221 | \end{align} 222 | 223 | \end_inset 224 | 225 | Truncating the error term, we have the Taylor approximation: 226 | \begin_inset Formula 227 | \begin{equation} 228 | \widehat{f}_{i}(x_{k})\approx a_{0}+h_{k}a_{1}+\ell_{k}a_{2}+\frac{h_{k}^{2}}{2}a_{3}+h_{k}\ell_{k}a_{4}+\frac{\ell_{k}^{2}}{2}a_{5}=:\overline{f}_{k}\;,\label{eq:approx} 229 | \end{equation} 230 | 231 | \end_inset 
232 | 233 | Now, let us define the coefficients 234 | \begin_inset Formula 235 | \begin{align} 236 | c_{k}^{(1)} & :=h_{k}\;, & c_{k}^{(2)} & :=\ell_{k}\;,\nonumber \\ 237 | c_{k}^{(3)} & :=\frac{h_{k}^{2}}{2}\;, & c_{k}^{(5)} & :=\frac{\ell_{k}^{2}}{2}\;,\nonumber \\ 238 | c_{k}^{(4)} & :=h_{k}\ell_{k}\;, & c_{k}^{(0)} & :=1\;.\label{eq:ck} 239 | \end{align} 240 | 241 | \end_inset 242 | 243 | Observe that 244 | \begin_inset Formula 245 | \begin{equation} 246 | \frac{\partial\overline{f}_{k}}{\partial a_{j}}=c_{k}^{(j)}\;,\label{eq:dfkdaj} 247 | \end{equation} 248 | 249 | \end_inset 250 | 251 | At the neighbor points 252 | \begin_inset Formula $x_{k}$ 253 | \end_inset 254 | 255 | (belonging to the point cloud), by assumption we have the function values 256 | available as data. 257 | The error made at any such point 258 | \begin_inset Formula $x_{k}$ 259 | \end_inset 260 | 261 | , when we replace 262 | \begin_inset Formula $\widehat{f}_{i}(x_{k})$ 263 | \end_inset 264 | 265 | with its Taylor approximation, is 266 | \begin_inset Formula 267 | \begin{equation} 268 | e_{k}:=f_{k}-\overline{f}_{k}\;,\label{eq:ek} 269 | \end{equation} 270 | 271 | \end_inset 272 | 273 | One-half of the total squared error across all the neighbor points 274 | \begin_inset Formula $k$ 275 | \end_inset 276 | 277 | is simply 278 | \begin_inset Formula 279 | \begin{equation} 280 | G(a_{0},\dots,a_{5}):=\frac{1}{2}\;\underset{k\in I_{i}}{\sum}\,e_{k}^{2}\;,\label{eq:G} 281 | \end{equation} 282 | 283 | \end_inset 284 | 285 | where 286 | \begin_inset Formula $I_{i}$ 287 | \end_inset 288 | 289 | is the index set of the point 290 | \begin_inset Formula $i$ 291 | \end_inset 292 | 293 | 's neighbors. 
294 | \end_layout 295 | 296 | \begin_layout Standard 297 | Minimizing the error leads, in the least-squares sense, to the best possible 298 | values for the 299 | \begin_inset Formula $a_{j}$ 300 | \end_inset 301 | 302 | : 303 | \begin_inset Formula 304 | \[ 305 | \{a_{0},\dots,a_{5}\}_{\mathrm{optimal}}=\underset{a_{0},\dots,a_{5}}{\arg\min}\,G(a_{0},\dots,a_{5})\;. 306 | \] 307 | 308 | \end_inset 309 | 310 | Because 311 | \begin_inset Formula $G\ge0$ 312 | \end_inset 313 | 314 | for any values of the 315 | \begin_inset Formula $a_{j}$ 316 | \end_inset 317 | 318 | , and 319 | \begin_inset Formula $G$ 320 | \end_inset 321 | 322 | is a quadratic function of the 323 | \begin_inset Formula $a_{j}$ 324 | \end_inset 325 | 326 | , it has a unique extremal point, which is a minimum. 327 | The least-squares fit is given by this unique minimum of 328 | \begin_inset Formula $G$ 329 | \end_inset 330 | 331 | : 332 | \begin_inset Note Note 333 | status open 334 | 335 | \begin_layout Plain Layout 336 | The error 337 | \begin_inset Formula $G$ 338 | \end_inset 339 | 340 | obviously has no finite maximum in terms of the 341 | \begin_inset Formula $a_{j}$ 342 | \end_inset 343 | 344 | ; hence its only critical point must be a minimum. 
345 | Thus the problem becomes to find values for 346 | \begin_inset Formula $a_{j}$ 347 | \end_inset 348 | 349 | such that 350 | \end_layout 351 | 352 | \end_inset 353 | 354 | 355 | \begin_inset Formula 356 | \begin{equation} 357 | \frac{\partial G}{\partial a_{j}}=0\;,\quad j=0,\dots,5\;.\label{eq:minG} 358 | \end{equation} 359 | 360 | \end_inset 361 | 362 | Using first 363 | \begin_inset CommandInset ref 364 | LatexCommand eqref 365 | reference "eq:G" 366 | 367 | \end_inset 368 | 369 | , and then on the second line 370 | \begin_inset CommandInset ref 371 | LatexCommand eqref 372 | reference "eq:ek" 373 | 374 | \end_inset 375 | 376 | , we can write 377 | \begin_inset Formula 378 | \begin{align} 379 | \frac{\partial G}{\partial a_{j}} & =\underset{k\in I_{i}}{\sum}e_{k}\frac{\partial e_{k}}{\partial a_{j}}\nonumber \\ 380 | & =\underset{k\in I_{i}}{\sum}[f_{k}-\overline{f}_{k}(a_{0},\dots,a_{5})]\left[-\frac{\partial\overline{f}_{k}}{\partial a_{j}}\right]=0\;,\quad j=0,\dots5\;,\label{eq:minG2} 381 | \end{align} 382 | 383 | \end_inset 384 | 385 | which, using 386 | \begin_inset CommandInset ref 387 | LatexCommand eqref 388 | reference "eq:approx" 389 | 390 | \end_inset 391 | 392 | – 393 | \begin_inset CommandInset ref 394 | LatexCommand eqref 395 | reference "eq:ck" 396 | 397 | \end_inset 398 | 399 | and 400 | \begin_inset CommandInset ref 401 | LatexCommand eqref 402 | reference "eq:dfkdaj" 403 | 404 | \end_inset 405 | 406 | , leads to 407 | \begin_inset Formula 408 | \[ 409 | \underset{k\in I_{i}}{\sum}\left(\left[-f_{k}+c_{k}^{(0)}a_{0}+c_{k}^{(1)}a_{1}+c_{k}^{(2)}a_{2}+c_{k}^{(3)}a_{3}+c_{k}^{(4)}a_{4}+c_{k}^{(5)}a_{5}\right]c_{k}^{(j)}\right)=0\;,\quad j=0,\dots,5\;. 
410 | \] 411 | 412 | \end_inset 413 | 414 | This can be written as a standard linear equation system 415 | \begin_inset Formula 416 | \begin{equation} 417 | \underset{n=0}{\overset{5}{\sum}}A_{jn}a_{n}=b_{j}\;,\quad j=0,\dots,5\;,\label{eq:lineq} 418 | \end{equation} 419 | 420 | \end_inset 421 | 422 | where 423 | \begin_inset Formula 424 | \begin{equation} 425 | A_{jn}=\underset{k\in I_{i}}{\sum}\;\,c_{k}^{(n)}c_{k}^{(j)}\;,\label{eq:Ajn} 426 | \end{equation} 427 | 428 | \end_inset 429 | 430 | 431 | \begin_inset Formula 432 | \begin{equation} 433 | b_{j}=\underset{k\in I_{i}}{\sum}\,f_{k}\,c_{k}^{(j)}\;.\label{eq:bj} 434 | \end{equation} 435 | 436 | \end_inset 437 | 438 | Considering the magnitudes of the expressions 439 | \begin_inset CommandInset ref 440 | LatexCommand eqref 441 | reference "eq:ck" 442 | 443 | \end_inset 444 | 445 | , which contribute quadratically to 446 | \begin_inset Formula $A_{jn}$ 447 | \end_inset 448 | 449 | , we see that the condition number of 450 | \begin_inset Formula $A$ 451 | \end_inset 452 | 453 | will likely deteriorate, when compared to the previous case where 454 | \begin_inset Formula $f_{i}$ 455 | \end_inset 456 | 457 | is known. 458 | This is as expected; we are now dealing with not only the first and second 459 | derivatives, but also with the function value. 460 | Preconditioning (even by row normalization, although this destroys the 461 | symmetry) may help with floating-point roundoff issues. 462 | \end_layout 463 | 464 | \begin_layout Standard 465 | \begin_inset Newpage pagebreak 466 | \end_inset 467 | 468 | 469 | \end_layout 470 | 471 | \begin_layout Section 472 | Accuracy? 
473 | \end_layout 474 | 475 | \begin_layout Standard 476 | At first glance, WLSQM is not even consistent, if we treat 477 | \begin_inset Formula $\overline{f}_{k}$ 478 | \end_inset 479 | 480 | as an 481 | \begin_inset Formula $O(h_{k}^{3}\,,\ell_{k}^{3})$ 482 | \end_inset 483 | 484 | approximation of 485 | \begin_inset Formula $\widehat{f}_{i}(x_{k})$ 486 | \end_inset 487 | 488 | , and track the error term through the calculation (details left as an exercise). 489 | \end_layout 490 | 491 | \begin_layout Standard 492 | Obviously, consistent expansion of the matrix 493 | \begin_inset Formula $A$ 494 | \end_inset 495 | 496 | to the order 497 | \begin_inset Formula $O(h_{k}^{3}\,,\ell_{k}^{3})$ 498 | \end_inset 499 | 500 | gives 501 | \begin_inset Formula 502 | \[ 503 | A=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 504 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 505 | h_{k} & h_{k}^{2} & h_{k}\ell_{k} & \sim0 & \sim0 & \sim0\\ 506 | \ell_{k} & h_{k}\ell_{k} & \ell_{k}^{2} & \sim0 & \sim0 & \sim0\\ 507 | \frac{h_{k}^{2}}{2} & \sim0 & \sim0 & \sim0 & \sim0 & \sim0\\ 508 | h_{k}\ell_{k} & \sim0 & \sim0 & \sim0 & \sim0 & \sim0\\ 509 | \frac{\ell_{k}^{2}}{2} & \sim0 & \sim0 & \sim0 & \sim0 & \sim0 510 | \end{array}\right]=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 511 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 512 | & h_{k}^{2} & h_{k}\ell_{k} & \sim0 & \sim0 & \sim0\\ 513 | & & \ell_{k}^{2} & \sim0 & \sim0 & \sim0\\ 514 | & & & \sim0 & \sim0 & \sim0\\ 515 | & \mathrm{symm.} & & & \sim0 & \sim0\\ 516 | & & & & & \sim0 517 | \end{array}\right] 518 | \] 519 | 520 | \end_inset 521 | 522 | which has at most rank 3 (rank 2 in the classical case, where the first 523 | row and column, corresponding to unknown 524 | \begin_inset Formula $f_{i}$ 525 | \end_inset 526 | 527 | , are removed). 
528 | Of course it can be of full rank if the almost zeros are retained, but 529 | the truncation error (of the Taylor approximation) dominates those, so 530 | consistency requires that they be dropped. 531 | \end_layout 532 | 533 | \begin_layout Standard 534 | (As an aside, we note that if there is only one neighbor point, the equations 535 | corresponding to the UL 3x3 block become scalar multiples of each other 536 | due to 537 | \begin_inset Formula $b$ 538 | \end_inset 539 | 540 | also having a factor of 541 | \begin_inset Formula $c_{k}^{(j)}$ 542 | \end_inset 543 | 544 | . 545 | This is of course as expected; one can hardly expect to obtain two independent 546 | derivatives from just one neighbor. 547 | The same occurs if the neighbors are collinear (as is obvious geometrically, 548 | and quite simple to see algebraically, writing e.g. 549 | for two points 550 | \begin_inset Formula $h_{2}=Ch_{1}$ 551 | \end_inset 552 | 553 | , 554 | \begin_inset Formula $\ell_{2}=C\ell_{1}$ 555 | \end_inset 556 | 557 | ...).) 558 | \end_layout 559 | 560 | \begin_layout Standard 561 | However, WLSQM (at least the classical version with 562 | \begin_inset Formula $f_{i}$ 563 | \end_inset 564 | 565 | known) has been observed to actually work, with some reasonable amount 566 | of numerical error, so this analysis must be wrong. 567 | What is going on? 568 | \end_layout 569 | 570 | \begin_layout Subsection 571 | Accuracy, correctly 572 | \end_layout 573 | 574 | \begin_layout Standard 575 | Let us take a page from finite element methods, where the weak form is — 576 | after the fact — taken as the new 577 | \emph on 578 | definition 579 | \emph default 580 | of the problem (which just so happens to lead to the classical strong form 581 | in cases where both can be written). 
582 | \end_layout 583 | 584 | \begin_layout Standard 585 | To apply this philosophy here: after we define 586 | \begin_inset Formula $\overline{f}_{k}$ 587 | \end_inset 588 | 589 | , we can 590 | \begin_inset Quotes eld 591 | \end_inset 592 | 593 | forget 594 | \begin_inset Quotes erd 595 | \end_inset 596 | 597 | that it comes from a truncated Taylor series, and 598 | \emph on 599 | take the definition as a new starting point 600 | \emph default 601 | : in principle, 602 | \begin_inset Formula $\overline{f}_{k}$ 603 | \end_inset 604 | 605 | is just a function of the 606 | \begin_inset Formula $a_{j}$ 607 | \end_inset 608 | 609 | , to be least-squares fitted to known data points 610 | \begin_inset Formula $f_{k}$ 611 | \end_inset 612 | 613 | (and optionally known 614 | \begin_inset Formula $f_{i}$ 615 | \end_inset 616 | 617 | , as per classical WLSQM). 618 | \end_layout 619 | 620 | \begin_layout Standard 621 | Then we just perform standard least-squares fitting. 622 | The math is exact (given unrealistic, exact arithmetic — this is a separate 623 | issue); no truncation error term appears. 
624 | The full matrix should be retained: 625 | \begin_inset Formula 626 | \[ 627 | A=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 628 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 629 | h_{k} & h_{k}^{2} & h_{k}\ell_{k} & \frac{h_{k}^{3}}{2} & h_{k}^{2}\ell_{k} & h_{k}\frac{\ell_{k}^{2}}{2}\\ 630 | \ell_{k} & h_{k}\ell_{k} & \ell_{k}^{2} & \ell_{k}\frac{h_{k}^{2}}{2} & h_{k}\ell_{k}^{2} & \frac{\ell_{k}^{3}}{2}\\ 631 | \frac{h_{k}^{2}}{2} & \frac{h_{k}^{3}}{2} & \ell_{k}\frac{h_{k}^{2}}{2} & \frac{h_{k}^{4}}{4} & \frac{h_{k}^{3}}{2}\ell_{k} & \frac{h_{k}^{2}\ell_{k}^{2}}{4}\\ 632 | h_{k}\ell_{k} & h_{k}^{2}\ell_{k} & h_{k}\ell_{k}^{2} & \frac{h_{k}^{3}}{2}\ell_{k} & h_{k}^{2}\ell_{k}^{2} & h_{k}\frac{\ell_{k}^{3}}{2}\\ 633 | \frac{\ell_{k}^{2}}{2} & h_{k}\frac{\ell_{k}^{2}}{2} & \frac{\ell_{k}^{3}}{2} & \frac{h_{k}^{2}\ell_{k}^{2}}{4} & h_{k}\frac{\ell_{k}^{3}}{2} & \frac{\ell_{k}^{4}}{4} 634 | \end{array}\right]=\underset{k\in I_{i}}{\sum}\left[\begin{array}{cccccc} 635 | 1 & h_{k} & \ell_{k} & \frac{h_{k}^{2}}{2} & h_{k}\ell_{k} & \frac{\ell_{k}^{2}}{2}\\ 636 | & h_{k}^{2} & h_{k}\ell_{k} & \frac{h_{k}^{3}}{2} & h_{k}^{2}\ell_{k} & h_{k}\frac{\ell_{k}^{2}}{2}\\ 637 | & & \ell_{k}^{2} & \ell_{k}\frac{h_{k}^{2}}{2} & h_{k}\ell_{k}^{2} & \frac{\ell_{k}^{3}}{2}\\ 638 | & & & \frac{h_{k}^{4}}{4} & \frac{h_{k}^{3}}{2}\ell_{k} & \frac{h_{k}^{2}\ell_{k}^{2}}{4}\\ 639 | & \mathrm{symm.} & & & h_{k}^{2}\ell_{k}^{2} & h_{k}\frac{\ell_{k}^{3}}{2}\\ 640 | & & & & & \frac{\ell_{k}^{4}}{4} 641 | \end{array}\right] 642 | \] 643 | 644 | \end_inset 645 | 646 | This is now of full rank, provided that enough neighbor points 647 | \begin_inset Formula $x_{k}$ 648 | \end_inset 649 | 650 | are used in the calculation (considering that we are least-squares fitting 651 | a general quadratic polynomial in the plane; see below). 
652 | \end_layout 653 | 654 | \begin_layout Standard 655 | At this point the only error — considering only 656 | \begin_inset Formula $\overline{f}_{k}$ 657 | \end_inset 658 | 659 | and the data 660 | \begin_inset Formula $f_{k}$ 661 | \end_inset 662 | 663 | — is the 664 | \emph on 665 | RMS (root mean square) error 666 | \emph default 667 | of the least-squares fit, 668 | \begin_inset Formula $\min\sqrt{2G}$ 669 | \end_inset 670 | 671 | (where the minimum occurs at the solution point). 672 | The RMS error measures how well the model adheres to each data point, on 673 | average. 674 | The obtained coefficients are optimal: out of all functions of the form 675 | 676 | \begin_inset CommandInset ref 677 | LatexCommand eqref 678 | reference "eq:approx" 679 | 680 | \end_inset 681 | 682 | with 683 | \begin_inset Formula $a_{j}$ 684 | \end_inset 685 | 686 | as parameters, the solution of 687 | \begin_inset CommandInset ref 688 | LatexCommand eqref 689 | reference "eq:lineq" 690 | 691 | \end_inset 692 | 693 | – 694 | \begin_inset CommandInset ref 695 | LatexCommand eqref 696 | reference "eq:bj" 697 | 698 | \end_inset 699 | 700 | gives the smallest possible RMS error for the fit. 701 | \end_layout 702 | 703 | \begin_layout Standard 704 | Then — again after the fact — we observe that these optimal 705 | \begin_inset Formula $a_{j}$ 706 | \end_inset 707 | 708 | are pretty good also for use in a Taylor approximation 709 | \begin_inset Note Note 710 | status open 711 | 712 | \begin_layout Plain Layout 713 | , precisely because they minimize the RMS error of the fit 714 | \end_layout 715 | 716 | \end_inset 717 | 718 | . 719 | The solution is, in the least-squares sense, the best quadratic polynomial 720 | of 721 | \begin_inset Formula $(x,y)$ 722 | \end_inset 723 | 724 | for locally approximating 725 | \begin_inset Formula $f(x)$ 726 | \end_inset 727 | 728 | around 729 | \begin_inset Formula $x_{i}$ 730 | \end_inset 731 | 732 | . 
733 | (The fit 734 | \begin_inset CommandInset ref 735 | LatexCommand eqref 736 | reference "eq:approx" 737 | 738 | \end_inset 739 | 740 | is linear in 741 | \begin_inset Formula $a_{j}$ 742 | \end_inset 743 | 744 | , but quadratic in 745 | \begin_inset Formula $(x,y)$ 746 | \end_inset 747 | 748 | .) Also the Taylor approximation, truncated after the second-order terms, 749 | is a quadratic polynomial approximating 750 | \begin_inset Formula $f(x)$ 751 | \end_inset 752 | 753 | around 754 | \begin_inset Formula $x_{i}$ 755 | \end_inset 756 | 757 | . 758 | Thus 759 | \emph on 760 | we 761 | \emph default 762 | 763 | \emph on 764 | interpret the quadratic fit as a (response surface) model 765 | \emph default 766 | for 767 | \begin_inset Formula $f(x)$ 768 | \end_inset 769 | 770 | near 771 | \begin_inset Formula $x_{i}$ 772 | \end_inset 773 | 774 | , and thus the 775 | \begin_inset Formula $a_{j}$ 776 | \end_inset 777 | 778 | as approximations to the Taylor coefficients of 779 | \begin_inset Formula $f$ 780 | \end_inset 781 | 782 | (whence also as the numerical approximations to the derivatives). 783 | \end_layout 784 | 785 | \begin_layout Standard 786 | However, it must be emphasized that this gives rise to 787 | \emph on 788 | modeling error 789 | \emph default 790 | , because the 791 | \begin_inset Formula $a_{j}$ 792 | \end_inset 793 | 794 | are 795 | \emph on 796 | not 797 | \emph default 798 | the exact coefficients of the true Taylor expansion of 799 | \begin_inset Formula $f(x)$ 800 | \end_inset 801 | 802 | around 803 | \begin_inset Formula $x_{i}$ 804 | \end_inset 805 | 806 | . 807 | Indeed, strictly speaking, the data may not even describe a function admitting 808 | such an expansion! 
Even if the data admits an underlying function, and 809 | it happens to be in 810 | \begin_inset space ~ 811 | \end_inset 812 | 813 | 814 | \begin_inset Formula $C^{2}$ 815 | \end_inset 816 | 817 | , there may be numerical and/or experimental noise in the data points, depending 818 | on the data source. 819 | (This 820 | \emph on 821 | inexact data 822 | \emph default 823 | is another separate error source.) Also, in the general case the fit will 824 | not be exact, i.e. 825 | the RMS error will be nonzero. 826 | \end_layout 827 | 828 | \begin_layout Standard 829 | From this viewpoint, WLSQM would be more accurately advertised as a method 830 | for response surface modeling (RSM), for computing a local quadratic response 831 | surface in arbitrary geometries, instead of as a method for numerical different 832 | iation. 833 | \end_layout 834 | 835 | \begin_layout Standard 836 | Regarding numerical differentiation, the natural follow-up question is, 837 | what is the total error arising from approximating the function 838 | \begin_inset Formula $f(x)$ 839 | \end_inset 840 | 841 | locally as the quadratic polynomial fit? 
The (original, not truncated) 842 | Taylor series, at a general point 843 | \begin_inset Formula $x$ 844 | \end_inset 845 | 846 | in the neighborhood of 847 | \begin_inset Formula $x_{i}$ 848 | \end_inset 849 | 850 | , is 851 | \begin_inset Formula 852 | \begin{equation} 853 | \widehat{f}_{i}(x)=\widehat{f}_{i}+h\frac{\partial\widehat{f}_{i}}{\partial x}+\ell\frac{\partial\widehat{f}_{i}}{\partial y}+\frac{h^{2}}{2}\frac{\partial^{2}\widehat{f}_{i}}{\partial x^{2}}+h\ell\frac{\partial^{2}\widehat{f}_{i}}{\partial x\partial y}+\frac{\ell^{2}}{2}\frac{\partial^{2}\widehat{f}_{i}}{\partial y^{2}}+O(h^{3}\,,\ell^{3})\;,\label{eq:Taygeneral} 854 | \end{equation} 855 | 856 | \end_inset 857 | 858 | where on the right-hand side, the function and the derivatives are evaluated 859 | at 860 | \begin_inset Formula $x=x_{i}$ 861 | \end_inset 862 | 863 | , and 864 | \begin_inset Formula $x-x_{i}=(h,\ell)$ 865 | \end_inset 866 | 867 | . 868 | The quadratic polynomial fit is 869 | \begin_inset Formula 870 | \begin{equation} 871 | Q(x):=a_{0}+ha_{1}+\ell a_{2}+\frac{h^{2}}{2}a_{3}+h\ell a_{4}+\frac{\ell^{2}}{2}a_{5}\;,\label{eq:Qx} 872 | \end{equation} 873 | 874 | \end_inset 875 | 876 | where the 877 | \begin_inset Formula $a_{j}$ 878 | \end_inset 879 | 880 | are obtained from the least-squares optimization. 
881 | The total error in the function value, at a point 882 | \begin_inset Formula $x$ 883 | \end_inset 884 | 885 | , is their difference 886 | \begin_inset Formula 887 | \begin{align} 888 | \mathrm{err}(x) & :=f(x)-Q(x)\overset{\text{near }x_{i}}{=}\widehat{f}_{i}(x)-Q(x)\nonumber \\ 889 | & =(\widehat{f}_{i}-a_{0})+h(\frac{\partial\widehat{f}_{i}}{\partial x}-a_{1})+\ell(\frac{\partial\widehat{f}_{i}}{\partial y}-a_{2})+\frac{h^{2}}{2}(\frac{\partial^{2}\widehat{f}_{i}}{\partial x^{2}}-a_{3})+h\ell(\frac{\partial^{2}\widehat{f}_{i}}{\partial x\partial y}-a_{4})+\frac{\ell^{2}}{2}(\frac{\partial^{2}\widehat{f}_{i}}{\partial y^{2}}-a_{5})+O(h^{3}\,,\ell^{3})\;.\label{eq:errx} 890 | \end{align} 891 | 892 | \end_inset 893 | 894 | When the Taylor series is truncated after the quadratic terms, the asymptotic 895 | term gives the 896 | \emph on 897 | truncation error 898 | \emph default 899 | . 900 | The rest of the error is due to 901 | \emph on 902 | modeling error 903 | \emph default 904 | in the coefficients 905 | \begin_inset Formula $a_{j}$ 906 | \end_inset 907 | 908 | , i.e. 909 | the parenthetical expressions in 910 | \begin_inset CommandInset ref 911 | LatexCommand eqref 912 | reference "eq:errx" 913 | 914 | \end_inset 915 | 916 | . 917 | \end_layout 918 | 919 | \begin_layout Standard 920 | It is obvious that in the general case, the modeling error will be nonzero 921 | (even if we assume the data to be exact): the function 922 | \begin_inset Formula $f$ 923 | \end_inset 924 | 925 | is generally not a quadratic polynomial, and hence no quadratic polynomial 926 | can represent it exactly. 927 | To reiterate: the coefficients 928 | \begin_inset Formula $a_{j}$ 929 | \end_inset 930 | 931 | are the coefficients of the quadratic fit 932 | \begin_inset Formula $Q(x)$ 933 | \end_inset 934 | 935 | — they are 936 | \emph on 937 | not 938 | \emph default 939 | the Taylor coefficients of 940 | \begin_inset Formula $f$ 941 | \end_inset 942 | 943 | ! 
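To make the distinction concrete, here is a standalone NumPy sketch (not wlsqm's own code; the point layout and test functions are arbitrary choices): with exactly quadratic data the least-squares fit recovers the Taylor coefficients, but with cubic data the fitted coefficient a_1 differs from the true Taylor coefficient df/dx = 0 at the expansion point.

```python
import numpy as np

# Ten scattered sample offsets (h, l) around the expansion point (arbitrary layout).
rng = np.random.default_rng(0)
h, l = (rng.random((10, 2)) - 0.5).T
A = np.column_stack([np.ones_like(h), h, l, h**2 / 2, h * l, l**2 / 2])

# Exactly quadratic data: the least-squares fit recovers the Taylor
# coefficients (a_j) = (1, 2, 3, 4, 5, 6) up to rounding error.
a_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
a_fit, *_ = np.linalg.lstsq(A, A @ a_true, rcond=None)

# Cubic data f = h^3: the true Taylor coefficient of the first x-derivative
# at the expansion point is 0, but the fitted a_1 is generally nonzero;
# this is precisely the modeling error discussed in the text.
b_fit, *_ = np.linalg.lstsq(A, h**3, rcond=None)
```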
944 | \end_layout 945 | 946 | \begin_layout Standard 947 | However, they are a computable, close relative of the Taylor coefficients 948 | of the unknown function 949 | \begin_inset Formula $f$ 950 | \end_inset 951 | 952 | , since the Taylor series of 953 | \begin_inset Formula $Q(x)$ 954 | \end_inset 955 | 956 | expanded at 957 | \begin_inset Formula $x_{i}$ 958 | \end_inset 959 | 960 | is, quite simply, 961 | \begin_inset Formula $Q(x)$ 962 | \end_inset 963 | 964 | itself. 965 | (Because 966 | \begin_inset Formula $Q(x)$ 967 | \end_inset 968 | 969 | is a polynomial, no asymptotic error term appears.) 970 | \end_layout 971 | 972 | \begin_layout Standard 973 | Thus the magnitude of the total error depends on how well the coefficients 974 | 975 | \begin_inset Formula $a_{j}$ 976 | \end_inset 977 | 978 | approximate the Taylor coefficients of 979 | \begin_inset Formula $f$ 980 | \end_inset 981 | 982 | ; or in other words, how close 983 | \begin_inset Formula $f$ 984 | \end_inset 985 | 986 | is (locally) to a quadratic polynomial (which — given exact data and exact 987 | arithmetic — can be fitted exactly; note that both assumptions are required, 988 | as inexact data will give rise to nonzero RMS error in the fit, i.e. 989 | then the fit will not be exact). 990 | \end_layout 991 | 992 | \begin_layout Standard 993 | This obviously depends on the neighborhood size, due to the asymptotic term 994 | describing the truncation error in 995 | \begin_inset CommandInset ref 996 | LatexCommand eqref 997 | reference "eq:Taygeneral" 998 | 999 | \end_inset 1000 | 1001 | . 1002 | The asymptotic term of the Taylor series says that if the neighborhood 1003 | is small enough, any 1004 | \begin_inset Formula $f\in C^{2}$ 1005 | \end_inset 1006 | 1007 | is locally close to a quadratic polynomial. 
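This asymptotic closeness is easy to probe numerically. In the following standalone sketch (the sampling layout and the test function exp(x+y) are arbitrary choices, not part of wlsqm), the maximum error of the best quadratic fit is measured for a neighborhood of size h and again for h/2; for an O(h^3) discrepancy the error should shrink by roughly a factor of 8.

```python
import numpy as np

def quadratic_fit(pts, fvals):
    # Least-squares fit of Q = a0 + a1*h + a2*l + a3*h^2/2 + a4*h*l + a5*l^2/2,
    # where (h, l) are offsets from the expansion point (here the origin).
    h, l = pts[:, 0], pts[:, 1]
    A = np.column_stack([np.ones_like(h), h, l, h**2 / 2, h * l, l**2 / 2])
    a, *_ = np.linalg.lstsq(A, fvals, rcond=None)
    return a

def fit_error(scale):
    # 30 sample points in a square neighborhood of half-width scale/2 around the
    # origin; the same base layout is reused at every scale (self-similar geometry).
    rng = np.random.default_rng(42)
    pts = scale * (rng.random((30, 2)) - 0.5)
    f = lambda x, y: np.exp(x + y)  # smooth, but not a quadratic polynomial
    a = quadratic_fit(pts, f(pts[:, 0], pts[:, 1]))
    # evaluate model vs. function on a ring inside the neighborhood
    t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    h, l = 0.25 * scale * np.cos(t), 0.25 * scale * np.sin(t)
    Q = a[0] + a[1] * h + a[2] * l + a[3] * h**2 / 2 + a[4] * h * l + a[5] * l**2 / 2
    return np.max(np.abs(f(h, l) - Q))

e1, e2 = fit_error(0.2), fit_error(0.1)  # halve the neighborhood size
ratio = e1 / e2  # should be roughly 2^3 = 8 for an O(h^3) error
```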
1008 | This — for sufficiently small neighborhoods — should make the modeling 1009 | error (and thus the error in the numerical derivatives) comparable to 1010 | \begin_inset Formula $O(h^{3}\,,\ell^{3})$ 1011 | \end_inset 1012 | 1013 | . 1014 | \end_layout 1015 | 1016 | \begin_layout Standard 1017 | This suggests that 1018 | \begin_inset Formula $\mathrm{err}(x)$ 1019 | \end_inset 1020 | 1021 | — with exact data and exact arithmetic — should also be comparable to 1022 | \begin_inset Formula $O(h^{3}\,,\ell^{3})$ 1023 | \end_inset 1024 | 1025 | . 1026 | (With inexact data, one needs to take into account that 1027 | \begin_inset Formula $f_{k}=f(x_{k})+\delta_{k}$ 1028 | \end_inset 1029 | 1030 | and work from that.) 1031 | \end_layout 1032 | 1033 | \begin_layout Standard 1034 | Observe also that there are six 1035 | \begin_inset Formula $a_{j}$ 1036 | \end_inset 1037 | 1038 | ( 1039 | \begin_inset Formula $j=0,\dots,5$ 1040 | \end_inset 1041 | 1042 | ) in 1043 | \begin_inset CommandInset ref 1044 | LatexCommand eqref 1045 | reference "eq:Qx" 1046 | 1047 | \end_inset 1048 | 1049 | . 1050 | Hence, with exact arithmetic, six data values for 1051 | \begin_inset CommandInset ref 1052 | LatexCommand eqref 1053 | reference "eq:approx" 1054 | 1055 | \end_inset 1056 | 1057 | , i.e. 1058 | six neighbors 1059 | \begin_inset Formula $x_{k}$ 1060 | \end_inset 1061 | 1062 | (five if 1063 | \begin_inset Formula $f_{i}$ 1064 | \end_inset 1065 | 1066 | is known, eliminating 1067 | \begin_inset Formula $a_{0}$ 1068 | \end_inset 1069 | 1070 | ), uniquely determine the quadratic function 1071 | \begin_inset Formula $Q(x)$ 1072 | \end_inset 1073 | 1074 | . 1075 | (Fewer data values lead to an underdetermined system, which has an infinite 1076 | family of solutions.) 
More data values lead to an overdetermined system, 1077 | which is then taken care of by least-squares fitting: picking the quadratic 1078 | polynomial that best approximates the data (which generally did not come 1079 | from a quadratic polynomial). 1080 | \end_layout 1081 | 1082 | \begin_layout Standard 1083 | This explains why the classical WLSQM takes 1084 | \begin_inset Formula $6$ 1085 | \end_inset 1086 | 1087 | neighbors (here 1088 | \begin_inset Formula $7$ 1089 | \end_inset 1090 | 1091 | if 1092 | \begin_inset Formula $f_{i}$ 1093 | \end_inset 1094 | 1095 | is not known) to perform the fitting; it is the smallest number of (nondegenera 1096 | te!) neighbors 1097 | \begin_inset Formula $x_{k}$ 1098 | \end_inset 1099 | 1100 | that makes the quadratic fitting problem overdetermined (hence actually 1101 | needing the least-squares procedure). 1102 | (The overdeterminedness also slightly protects against inexact data, so 1103 | that one data point that is slightly off will not completely change the 1104 | fit.) 1105 | \end_layout 1106 | 1107 | \begin_layout Standard 1108 | \begin_inset Newpage pagebreak 1109 | \end_inset 1110 | 1111 | 1112 | \end_layout 1113 | 1114 | \begin_layout Standard 1115 | But why is the result not exact, i.e. 1116 | why is there modeling error in 1117 | \begin_inset CommandInset ref 1118 | LatexCommand eqref 1119 | reference "eq:errx" 1120 | 1121 | \end_inset 1122 | 1123 | ? After all, the truncated Taylor expansion 1124 | \emph on 1125 | is 1126 | \emph default 1127 | the best local polynomial representation of 1128 | \begin_inset Formula $f$ 1129 | \end_inset 1130 | 1131 | , up to the given degree. 1132 | With exact data and arithmetic, how can the least-squares fit be anything 1133 | but the truncated Taylor expansion? 
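The counting and degeneracy arguments above can be checked via the rank of the 6-column matrix of quadratic monomials (a standalone sketch with arbitrarily chosen points, not wlsqm code): six scattered points generically give full rank 6, so Q is uniquely determined, while for six collinear points only the monomials 1, t, t^2 remain independent along the line, giving rank 3.

```python
import numpy as np

def design(pts):
    # rows of quadratic monomials [1, h, l, h^2/2, h*l, l^2/2]
    h, l = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(h), h, l, h**2 / 2, h * l, l**2 / 2])

rng = np.random.default_rng(1)
scattered = rng.random((6, 2)) - 0.5                        # six generic (nondegenerate) neighbors
collinear = np.outer(np.linspace(0.1, 0.6, 6), [1.0, 2.0])  # six points on the line l = 2h

rank_scattered = np.linalg.matrix_rank(design(scattered))   # 6: Q uniquely determined
rank_collinear = np.linalg.matrix_rank(design(collinear))   # 3: only 1, t, t^2 survive on a line
```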
1134 | \end_layout 1135 | 1136 | \begin_layout Standard 1137 | The key is in the definition of 1138 | \begin_inset Quotes eld 1139 | \end_inset 1140 | 1141 | best 1142 | \begin_inset Quotes erd 1143 | \end_inset 1144 | 1145 | . 1146 | In a Taylor series, as the truncation order is increased, with each added 1147 | term (given sufficient continuity of 1148 | \begin_inset Formula $f$ 1149 | \end_inset 1150 | 1151 | ) the asymptotic accuracy increases, without requiring changes to the already 1152 | computed coefficients. 1153 | The Taylor series, being the polynomial series expansion of 1154 | \begin_inset Formula $f$ 1155 | \end_inset 1156 | 1157 | , is optimal in the class of polynomial representations where the coefficients 1158 | are 1159 | \begin_inset Quotes eld 1160 | \end_inset 1161 | 1162 | final 1163 | \begin_inset Quotes erd 1164 | \end_inset 1165 | 1166 | in this sense. 1167 | This is indeed what leads to the common-sense notion of the Taylor series 1168 | being 1169 | \begin_inset Quotes eld 1170 | \end_inset 1171 | 1172 | the best polynomial representation 1173 | \begin_inset Quotes erd 1174 | \end_inset 1175 | 1176 | of 1177 | \begin_inset space ~ 1178 | \end_inset 1179 | 1180 | 1181 | \begin_inset Formula $f$ 1182 | \end_inset 1183 | 1184 | . 1185 | \end_layout 1186 | 1187 | \begin_layout Standard 1188 | However, nothing requires the truncated Taylor series to satisfy the least-squar 1189 | es property. 1190 | In the least-squares sense, 1191 | \emph on 1192 | there may exist better polynomials of the same degree 1193 | \emph default 1194 | to locally approximate 1195 | \begin_inset Formula $f$ 1196 | \end_inset 1197 | 1198 | . 
1199 | For a trivial 1D example: to represent 1200 | \begin_inset Formula $f(x)=x^{2}$ 1201 | \end_inset 1202 | 1203 | in an interval 1204 | \begin_inset Formula $x\in[-a,a]$ 1205 | \end_inset 1206 | 1207 | around the origin using a constant approximation, the Taylor series produces 1208 | 1209 | \begin_inset Formula $f\approx0$ 1210 | \end_inset 1211 | 1212 | . 1213 | However, the mean value across the interval is a 1214 | \begin_inset Quotes eld 1215 | \end_inset 1216 | 1217 | better 1218 | \begin_inset Quotes erd 1219 | \end_inset 1220 | 1221 | constant approximation in an integral least-squares sense. 1222 | \end_layout 1223 | 1224 | \begin_layout Standard 1225 | Indeed, a least-squares fit, as its order is increased, will change 1226 | \emph on 1227 | all 1228 | \emph default 1229 | of its coefficients; and it will do this to minimize the RMS error of the 1230 | fit. 1231 | (Be very careful: this is different from the modeling error in equation 1232 | 1233 | \begin_inset CommandInset ref 1234 | LatexCommand eqref 1235 | reference "eq:errx" 1236 | 1237 | \end_inset 1238 | 1239 | . 1240 | The RMS error only measures how well the model adheres to each data point, 1241 | on average; it does not see what the model is used for.) In the least-squares 1242 | fit, there is no asymptotic error term — the data points 1243 | \begin_inset space ~ 1244 | \end_inset 1245 | 1246 | 1247 | \begin_inset Formula $f_{k}$ 1248 | \end_inset 1249 | 1250 | , used in the fitting, implicitly contain the information also from all 1251 | the higher-order terms in the polynomial series expansion of 1252 | \begin_inset space ~ 1253 | \end_inset 1254 | 1255 | 1256 | \begin_inset Formula $f$ 1257 | \end_inset 1258 | 1259 | . 1260 | The fit then eliminates as much of the difference between the chosen model 1261 | and the data as is possible. 
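The trivial 1D example above can be verified directly (a standalone sketch; the interval discretization is an arbitrary choice): for f(x) = x^2 on [-a, a], the mean value a^2/3 beats the degree-0 Taylor approximation 0 in the RMS sense.

```python
import numpy as np

# f(x) = x^2 on [-a, a], approximated by a single constant c.
a = 1.0
x = np.linspace(-a, a, 10001)
f = x**2

def rms(c):
    # discrete stand-in for the integral RMS error of the constant model c
    return np.sqrt(np.mean((f - c)**2))

rms_taylor = rms(0.0)       # degree-0 Taylor polynomial: c = f(0) = 0
rms_lsq = rms(np.mean(f))   # least-squares constant: the mean value, c = a^2/3
```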
1262 | \end_layout 1263 | 1264 | \begin_layout Standard 1265 | It is not surprising that the price that must be paid for this increase 1266 | of accuracy in interpolation is giving up the 1267 | \begin_inset Quotes eld 1268 | \end_inset 1269 | 1270 | finality 1271 | \begin_inset Quotes erd 1272 | \end_inset 1273 | 1274 | of the coefficients in the above sense, since the Taylor series is already 1275 | optimal in its class. 1276 | \end_layout 1277 | 1278 | \begin_layout Standard 1279 | We conclude that in the general case the result cannot be exact, because 1280 | we are dealing with two very different entities, which coincide only under 1281 | very restrictive assumptions. 1282 | \end_layout 1283 | 1284 | \begin_layout Standard 1285 | Note also that 1286 | \begin_inset Quotes eld 1287 | \end_inset 1288 | 1289 | best 1290 | \begin_inset Quotes erd 1291 | \end_inset 1292 | 1293 | obviously depends on context. 1294 | For response surface modeling, the WLSQM quadratic polynomial fit is optimal. 1295 | However, for numerical differentiation, the fact that the obtained coefficients 1296 | do not exactly coincide with the Taylor series coefficients of 1297 | \begin_inset Formula $f$ 1298 | \end_inset 1299 | 1300 | produces an undesirable source of numerical error (modeling error). 
1301 | \end_layout 1302 | 1303 | \end_body 1304 | \end_document 1305 | -------------------------------------------------------------------------------- /doc/wlsqm_gen.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/doc/wlsqm_gen.pdf -------------------------------------------------------------------------------- /example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/example.png -------------------------------------------------------------------------------- /examples/expertsolver_example.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """A minimal usage example for ExpertSolver. 3 | 4 | JJ 2017-03-28 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | 9 | import numpy as np 10 | 11 | import scipy.spatial.ckdtree 12 | 13 | import matplotlib.pyplot as plt 14 | import mpl_toolkits.mplot3d.axes3d 15 | 16 | import wlsqm 17 | 18 | 19 | def project_onto_regular_grid_2D(x, F, nvis=101, fit_order=1, nk=10): 20 | """Project scalar data from a 2D point cloud onto a regular grid. 21 | 22 | Useful for plotting. Uses the WLSQM meshless method. 23 | 24 | The bounding box of the x data is automatically used as the bounds of the generated regular grid. 25 | 26 | Parameters: 27 | x : rank-2 array, dtype np.float64 28 | Point cloud, one point per row. x[i,:] = (xi,yi) 29 | 30 | F : rank-1 array, dtype np.float64 31 | The corresponding function values. F[i] = F( x[i,:] ) 32 | 33 | nvis : int 34 | Number of points per axis in the generated regular grid. 35 | 36 | fit_order : int 37 | Order of the surrogate polynomial, one of [0,1,2,3,4]. 
38 | 39 | nk : int 40 | Number of nearest neighbors to use for fitting the model. 41 | 42 | Return value: 43 | tuple (X,Y,Z) 44 | where 45 | X,Y are rank-2 meshgrid arrays representing the generated regular grid, and 46 | Z is an array of the same shape, containing the corresponding data values. 47 | 48 | """ 49 | # Form the neighborhoods. 50 | 51 | # index the input points for fast searching 52 | tree = scipy.spatial.cKDTree( data=x ) 53 | 54 | # Find the nk nearest neighbors of each input point. 55 | # 56 | # The +1 is for the point itself, since it is always the nearest to itself. 57 | # 58 | # (cKDTree.query() supports querying for arbitrary x; here we simply query the same points that are already in the tree.) 59 | # 60 | dd,ii = tree.query( x, 1 + nk ) 61 | 62 | # Take only the neighbors of points[i], excluding the point itself. 63 | # 64 | ii = ii[:,1:] # points[ ii[i,k] ] is the kth nearest neighbor of points[i]. Shape of ii is (npoints, nk). 65 | 66 | # neighbor point indices (pointing to rows in x[]); typecast to int32 67 | hoods = np.array( ii, dtype=np.int32 ) 68 | 69 | npoints = x.shape[0] 70 | nk_array = nk * np.ones( (npoints,), dtype=np.int32 ) # number of neighbors, i.e. nk_array[i] is the number of actually used columns in hoods[i,:] 71 | 72 | # Construct the model by least-squares fitting 73 | # 74 | fit_order_array = fit_order * np.ones( (npoints,), dtype=np.int32 ) 75 | knowns_array = wlsqm.b2_F * np.ones( (npoints,), dtype=np.int64 ) # bitmask! 
wlsqm.b* 76 | wm_array = wlsqm.WEIGHT_UNIFORM * np.ones( (npoints,), dtype=np.int32 ) 77 | solver = wlsqm.ExpertSolver( dimension=2, 78 | nk=nk_array, 79 | order=fit_order_array, 80 | knowns=knowns_array, 81 | weighting_method=wm_array, 82 | algorithm=wlsqm.ALGO_BASIC, 83 | do_sens=False, 84 | max_iter=10, # must be an int even though this parameter is not used in ALGO_BASIC mode 85 | ntasks=8, 86 | debug=False ) 87 | 88 | no = wlsqm.number_of_dofs( dimension=2, order=fit_order ) 89 | fi = np.empty( (npoints,no), dtype=np.float64 ) 90 | fi[:,0] = F # fi[i,0] contains the function value at point x[i,:] 91 | 92 | solver.prepare( xi=x, xk=x[hoods] ) # generate problem matrices from the geometry of the point cloud 93 | solver.solve( fk=fi[hoods,0], fi=fi, sens=None ) # compute least-squares fit to data 94 | 95 | 96 | # generate the regular grid for output 97 | # 98 | xx = np.linspace( np.min(x[:,0]), np.max(x[:,0]), nvis ) 99 | yy = np.linspace( np.min(x[:,1]), np.max(x[:,1]), nvis ) 100 | X,Y = np.meshgrid(xx,yy) 101 | 102 | # make a flat list of grid points (rank-2 array, one point per row) 103 | # 104 | Xlin = np.reshape(X, -1) 105 | Ylin = np.reshape(Y, -1) 106 | xout = np.empty( (len(Xlin), 2), dtype=np.float64 ) 107 | xout[:,0] = Xlin 108 | xout[:,1] = Ylin 109 | 110 | # Using the model, interpolate onto the regular grid 111 | # 112 | solver.prep_interpolate() # prepare global model 113 | Z,mi = solver.interpolate( xout, mode='nearest' ) # use the nearest local model; fast, surprisingly accurate 114 | # if a reasonable number of points (and continuous-looking 115 | # although technically has jumps over Voronoi cell boundaries) 116 | # when mode="nearest", "mi" is an array containing the index of the local model (which belongs to x[mi,:]) used for each evaluation 117 | 118 | return (X, Y, np.reshape( Z, X.shape )) 119 | 120 | 121 | def plot_wireframe( data, figno=None ): 122 | """Make and label a wireframe plot. 
123 | 124 | Parameters: 125 | data : dict 126 | key : "x","y","z" 127 | value : tuple (rank-2 array in meshgrid format, axis label) 128 | 129 | Return value: 130 | ax 131 | The Axes3D object that was used for plotting. 132 | """ 133 | # http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html 134 | fig = plt.figure(figno) 135 | 136 | # Axes3D has a tendency to underestimate how much space it needs; it draws its labels 137 | # outside the window area in certain orientations. 138 | # 139 | # This causes the labels to be clipped, which looks bad. We prevent this by creating the axes 140 | # in a slightly smaller rect (leaving a margin). This way the labels will show - outside the Axes3D, 141 | # but still inside the figure window. 142 | # 143 | # The final touch is to set the window background to a matching white, so that the 144 | # background of the figure appears uniform. 145 | # 146 | fig.patch.set_color( (1,1,1) ) 147 | fig.patch.set_alpha( 1.0 ) 148 | x0y0wh = [ 0.02, 0.02, 0.96, 0.96 ] # left, bottom, width, height (here as fraction of subplot area) 149 | 150 | ax = mpl_toolkits.mplot3d.axes3d.Axes3D(fig, rect=x0y0wh) 151 | 152 | X,xlabel = data["x"] 153 | Y,ylabel = data["y"] 154 | Z,zlabel = data["z"] 155 | ax.plot_wireframe( X, Y, Z ) 156 | 157 | ax.view_init(34, -40) 158 | ax.axis('tight') 159 | plt.xlabel(xlabel) 160 | plt.ylabel(ylabel) 161 | ax.set_title(zlabel) 162 | 163 | return ax 164 | 165 | 166 | def main(): 167 | x = np.random.random( (1000, 2) ) # point cloud (no mesh topology!) 
168 | F = np.sin(np.pi*x[:,0]) * np.cos(np.pi*x[:,1]) # function values on the point cloud 169 | X,Y,Z = project_onto_regular_grid_2D(x, F, fit_order=2, nk=30) 170 | plot_wireframe( {"x" : (X, r"$x$"), 171 | "y" : (Y, r"$y$"), 172 | "z" : (Z, r"$f(x,y)$")} ) 173 | 174 | if __name__ == '__main__': 175 | main() 176 | plt.show() 177 | -------------------------------------------------------------------------------- /examples/lapackdrivers_example.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | """Performance benchmarking and usage examples for the wlsqm.utils.lapackdrivers module. 4 | 5 | JJ 2016-11-02 6 | """ 7 | 8 | from __future__ import division, print_function, absolute_import 9 | 10 | import time 11 | 12 | import numpy as np 13 | from numpy.linalg import solve as numpy_solve # for comparison purposes 14 | 15 | import matplotlib.pyplot as plt 16 | 17 | try: 18 | import wlsqm.utils.lapackdrivers as drivers 19 | except ImportError: 20 | import sys 21 | sys.exit( "WLSQM not found; is it installed?" ) 22 | 23 | # from find_neighbors2.py 24 | class SimpleTimer: 25 | def __init__(self, label="", n=None): 26 | self.label = label 27 | self.n = n # number of repetitions done inside the "with..." section (for averaging in timing info) 28 | 29 | def __enter__(self): 30 | self.t0 = time.time() 31 | return self 32 | 33 | def __exit__(self, errtype, errvalue, traceback): 34 | dt = time.time() - self.t0 35 | identifier = ("%s" % self.label) if len(self.label) else "time taken: " 36 | avg = (", avg. %gs per run" % (dt/self.n)) if self.n is not None else "" 37 | print( "%s%gs%s" % (identifier, dt, avg) ) 38 | 39 | # from util.py 40 | def f5(seq, idfun=None): 41 | """Uniqify a list (remove duplicates). 42 | 43 | This is the fast order-preserving uniqifier "f5" from 44 | http://www.peterbe.com/plog/uniqifiers-benchmark 45 | 46 | The list does not need to be sorted. 
47 | 48 | The return value is the uniqified list. 49 | 50 | """ 51 | # order preserving 52 | if idfun is None: 53 | def idfun(x): return x 54 | seen = {} 55 | result = [] 56 | for item in seq: 57 | marker = idfun(item) 58 | # in old Python versions: 59 | # if seen.has_key(marker) 60 | # but in new ones: 61 | if marker in seen: continue 62 | seen[marker] = 1 63 | result.append(item) 64 | return result 65 | 66 | 67 | 68 | def main(): 69 | # # exact solution is (3/10, 2/5, 0) 70 | # A = np.array( ( (2., 1., 3.), 71 | # (2., 6., 8.), 72 | # (6., 8., 18.) ), dtype=np.float64, order='F' ) 73 | # b = np.array( (1., 3., 5.), dtype=np.float64 ) 74 | 75 | # # symmetric matrix for testing symmetric solver 76 | # A = np.array( ( (2., 1., 3.), 77 | # (1., 6., 8.), 78 | # (3., 8., 18.) ), dtype=np.float64, order='F' ) 79 | # b = np.array( (1., 3., 5.), dtype=np.float64 ) 80 | 81 | # random matrix 82 | n = 5 83 | A = np.random.sample( (n,n) ) 84 | A = np.array( A, dtype=np.float64, order='F' ) 85 | drivers.symmetrize( A ) # fast Cython implementation of A = 0.5 * (A + A.T) 86 | b = np.random.sample( (n,) ) 87 | 88 | # test that it works 89 | 90 | x = numpy_solve(A, b) 91 | print( "NumPy:", x ) 92 | 93 | A2 = A.copy(order='F') 94 | x2 = b.copy() 95 | drivers.symmetric(A2, x2) 96 | print( "dsysv:", x2 ) 97 | 98 | A3 = A.copy(order='F') 99 | x3 = b.copy() 100 | drivers.general(A3, x3) 101 | print( "dgesv:", x3 ) 102 | 103 | assert (np.abs(x - x3) < 1e-10).all(), "Something went wrong, solutions do not match" # check general solver first 104 | assert (np.abs(x - x2) < 1e-10).all(), "Something went wrong, solutions do not match" # then check symmetric solver 105 | 106 | 107 | # test performance 108 | 109 | # for verification only - very slow (serial only!) 
110 | use_numpy = True 111 | 112 | # parallel processing 113 | ntasks = 8 114 | 115 | # # overview, somewhat fast but not very accurate 116 | # sizes = f5( map( lambda x: int(x), np.ceil(3*np.logspace(0, 3, 21, dtype=int)) ) ) 117 | # reps = map( lambda x: int(x), 10.**(4 - np.log10(sizes)) ) 118 | 119 | # # "large n" 120 | # sizes = f5( map( lambda x: int(x), np.ceil(3*np.logspace(2, 3, 21, dtype=int)) ) ) 121 | # reps = map( lambda x: int(x), 10.**(5 - np.log10(sizes)) ) 122 | 123 | # "small n" (needs more repetitions to eliminate noise from other running processes since a single solve is very fast) 124 | sizes = f5( map( lambda x: int(x), np.ceil(3*np.logspace(0, 2, 21, dtype=int)) ) ) 125 | reps = map( lambda x: int(x), 10.**(6 - np.log10(sizes)) ) 126 | 127 | print( "performance test: %d tasks, sizes %s" % (ntasks, sizes) ) 128 | 129 | results1 = np.empty( (len(sizes),), dtype=np.float64 ) 130 | results2 = np.empty( (len(sizes),), dtype=np.float64 ) 131 | results3 = np.empty( (len(sizes),), dtype=np.float64 ) 132 | results4 = np.empty( (len(sizes),), dtype=np.float64 ) 133 | results5 = np.empty( (len(sizes),), dtype=np.float64 ) 134 | results6 = np.empty( (len(sizes),), dtype=np.float64 ) 135 | results7 = np.empty( (len(sizes),), dtype=np.float64 ) 136 | 137 | # # many LHS (completely independent problems) 138 | # n = 5 139 | # reps=int(1e5) 140 | # A = np.random.sample( (n,n,reps) ) 141 | # A = 0.5 * (A + A.transpose(1,0,2)) # symmetrize 142 | # A = np.array( A, dtype=np.float64, order='F' ) 143 | # b = np.random.sample( (n,reps) ) 144 | # b = np.array( b, dtype=np.float64, order='F' ) 145 | # with SimpleTimer(label="msymmetric ", n=reps) as s: 146 | # drivers.msymmetricp(A, b, ntasks) 147 | # with SimpleTimer(label="mgeneral ", n=reps) as s: 148 | # drivers.mgeneralp(A, b, ntasks) 149 | 150 | 151 | for j,item in enumerate(zip(sizes,reps)): 152 | n,r = item 153 | print( "testing size %d, reps = %d" % (n, r) ) 154 | 155 | # same LHS, many different RHS 156 | 
157 | print( " prep same LHS, many RHS..." ) 158 | 159 | A = np.random.sample( (n,n) ) 160 | # symmetrize 161 | # A *= 0.5 162 | # A += A.T # not sure if this works 163 | A = np.array( A, dtype=np.float64, order='F' ) 164 | # A = 0.5 * (A + A.T) # symmetrize 165 | drivers.symmetrize(A) 166 | b = np.random.sample( (n,r) ) 167 | b = np.array( b, dtype=np.float64, order='F' ) 168 | 169 | print( " solve:" ) 170 | 171 | # # for verification only - very slow (Python loop, serial!) 172 | # if use_numpy: 173 | # t0 = time.time() 174 | # x = np.empty( (n,r), dtype=np.float64 ) 175 | # for k in range(r): 176 | # x[:,k] = numpy_solve(A, b[:,k]) 177 | # results1[j] = (time.time() - t0) / r 178 | 179 | print( " symmetricsp" ) 180 | t0 = time.time() 181 | A2 = A.copy(order='F') 182 | x2 = b.copy(order='F') 183 | drivers.symmetricsp(A2, x2, ntasks) 184 | results2[j] = (time.time() - t0) / r 185 | 186 | print( " generalsp" ) 187 | t0 = time.time() 188 | A3 = A.copy(order='F') 189 | x3 = b.copy(order='F') 190 | drivers.generalsp(A3, x3, ntasks) 191 | results3[j] = (time.time() - t0) / r 192 | 193 | # different LHS for each problem 194 | 195 | print( " prep independent problems..." ) 196 | 197 | A = np.random.sample( (n,n,r) ) 198 | # symmetrize 199 | # A *= 0.5 200 | # A += A.transpose(1,0,2) # this doesn't work 201 | A = np.array( A, dtype=np.float64, order='F' ) 202 | # A = 0.5 * (A + A.transpose(1,0,2)) 203 | drivers.msymmetrizep(A, ntasks) 204 | b = np.random.sample( (n,r) ) 205 | b = np.array( b, dtype=np.float64, order='F' ) 206 | 207 | print( " solve:" ) 208 | 209 | # for verification only - very slow (Python loop, serial!) 
210 | if use_numpy: 211 | print( " NumPy" ) 212 | t0 = time.time() 213 | x = np.empty( (n,r), dtype=np.float64, order='F' ) 214 | for k in range(r): 215 | x[:,k] = numpy_solve(A[:,:,k], b[:,k]) 216 | results1[j] = (time.time() - t0) / r 217 | 218 | print( " msymmetricp" ) 219 | t0 = time.time() 220 | A2 = A.copy(order='F') 221 | x2 = b.copy(order='F') 222 | drivers.msymmetricp(A2, x2, ntasks) 223 | results4[j] = (time.time() - t0) / r 224 | 225 | print( " mgeneralp" ) 226 | t0 = time.time() 227 | A3 = A.copy(order='F') 228 | x3 = b.copy(order='F') 229 | drivers.mgeneralp(A3, x3, ntasks) 230 | results5[j] = (time.time() - t0) / r 231 | 232 | print( " msymmetricfactorp & msymmetricfactoredp" ) # factor once, then it is possible to solve multiple times (although we now test only once) 233 | t0 = time.time() 234 | ipiv = np.empty( (n,r), dtype=np.intc, order='F' ) 235 | fact = A.copy(order='F') 236 | x4 = b.copy(order='F') 237 | drivers.msymmetricfactorp( fact, ipiv, ntasks ) 238 | drivers.msymmetricfactoredp( fact, ipiv, x4, ntasks ) 239 | results6[j] = (time.time() - t0) / r 240 | 241 | print( " mgeneralfactorp & mgeneralfactoredp" ) # factor once, then it is possible to solve multiple times (although we now test only once) 242 | t0 = time.time() 243 | ipiv = np.empty( (n,r), dtype=np.intc, order='F' ) 244 | fact = A.copy(order='F') 245 | x5 = b.copy(order='F') 246 | drivers.mgeneralfactorp( fact, ipiv, ntasks ) 247 | drivers.mgeneralfactoredp( fact, ipiv, x5, ntasks ) 248 | results7[j] = (time.time() - t0) / r 249 | 250 | if use_numpy: 251 | # print( np.max(np.abs(x - x3)) ) # DEBUG 252 | # print( np.max(np.abs(x - x5)) ) # DEBUG 253 | print( np.max(np.abs(x2 - x4)) ) # DEBUG 254 | assert (np.abs(x - x5) < 1e-10).all(), "Something went wrong, solutions do not match" # check general solver first 255 | assert (np.abs(x - x3) < 1e-10).all(), "Something went wrong, solutions do not match" # check general solver 256 | # assert (np.abs(x - x2) < 1e-5).all(), "Something 
went wrong, solutions do not match" # doesn't make sense to compare, DSYSV is more accurate for badly conditioned symmetric matrices 257 | assert (np.abs(x2 - x4) < 1e-7).all(), "Something went wrong, solutions do not match" # check symmetric solvers against each other 258 | # (not exactly the same algorithm (DSYTRS2 vs. DSYTRS), so there may be slight deviation) 259 | 260 | 261 | # old, serial only 262 | # 263 | # for j,item in enumerate(zip(sizes,reps)): 264 | # n,r = item 265 | # print( "testing size %d, reps = %d" % (n, r) ) 266 | # 267 | # A = np.random.sample( (n,n) ) 268 | # A = 0.5 * (A + A.T) # symmetrize 269 | # A = np.array( A, dtype=np.float64, order='F' ) 270 | # b = np.random.sample( (n,) ) 271 | # 272 | # t0 = time.time() 273 | # for k in range(r): 274 | # x = numpy_solve(A, b) 275 | # results1[j] = (time.time() - t0) / r 276 | # 277 | # t0 = time.time() 278 | # for k in range(r): 279 | # A2 = A.copy(order='F') 280 | # x2 = b.copy() 281 | # drivers.symmetric(A2, x2) 282 | # results2[j] = (time.time() - t0) / r 283 | # 284 | # t0 = time.time() 285 | # for k in range(r): 286 | # A3 = A.copy(order='F') 287 | # x3 = b.copy() 288 | # drivers.general(A3, x3) 289 | # results3[j] = (time.time() - t0) / r 290 | 291 | 292 | # visualize 293 | 294 | plt.figure(1) 295 | plt.clf() 296 | if use_numpy: 297 | plt.loglog(sizes, results1, 'k-', label='NumPy') 298 | plt.loglog(sizes, results2, 'b--', label='dsysv, same LHS, many RHS') 299 | plt.loglog(sizes, results3, 'b-', label='dgesv, same LHS, many RHS') 300 | plt.loglog(sizes, results4, 'r--', label='dsysv, independent problems') 301 | plt.loglog(sizes, results5, 'r-', label='dgesv, independent problems') 302 | plt.loglog(sizes, results6, 'g--', label='dsytrf+dsytrs, independent problems') 303 | plt.loglog(sizes, results7, 'g-', label='dgetrf+dgetrs, independent problems') 304 | plt.xlabel('n') 305 | plt.ylabel('t') 306 | plt.title('Average time per problem instance, %d parallel tasks' % (ntasks)) 307 | 
plt.axis('tight') 308 | plt.grid(b=True, which='both') 309 | plt.legend(loc='best') 310 | 311 | plt.savefig('figure1_latest.pdf') 312 | 313 | 314 | if __name__ == '__main__': 315 | main() 316 | plt.show() 317 | -------------------------------------------------------------------------------- /examples/sudoku_lhs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Latin hypercube sampler with a sudoku-like constraint. 3 | 4 | Tested on Python 2.7 and 3.4. 5 | 6 | License: 2-clause BSD; copyright 2010-2017 Juha Jeronen and University of Jyväskylä. 7 | 8 | The sudoku LHS algorithm is a bit like the first stage in the design of an N-dimensional 9 | sudoku puzzle, hence the name: each "box" must have exactly the same number of samples, 10 | and no two samples may occur on the same hyperplane. The algorithm runs in linear time 11 | (w.r.t. number of samples) and consumes a linear amount of memory. 12 | 13 | The Sudoku LHS sampling method is inspired by, but not related to, orthogonal sampling. 14 | The latter refers to LHS sampling using orthogonal arrays; about that, see the articles 15 | B. Tang 1993: Orthogonal Array-Based Latin Hypercubes, 16 | A. B. Owen 1992: Orthogonal arrays for computer experiments, integration and 17 | visualization, 18 | K. Q. Ye 1998: Orthogonal column Latin hypercubes and their application in computer 19 | experiments 20 | for more information. 21 | 22 | Latin hypercube sampling is very classical (original paper: M. D. McKay, R. J. Beckman, 23 | W. J. Conover 1979: A Comparison of Three Methods for Selecting Values of Input Variables 24 | in the Analysis of Output from a Computer Code). 25 | See e.g. 
Wikipedia for a description: 26 | http://en.wikipedia.org/wiki/Latin_hypercube_sampling 27 | 28 | Historical note: 29 | The variant of sudoku sampling implemented here was originally developed as part of 30 | the SAVU project in 2010, and briefly mentioned in the paper 31 | Jeronen, J. SAVU: A Statistical Approach for Uncertain Data in Dynamics 32 | of Axially Moving Materials. In: A. Cangiani, R. Davidchack, E. Georgoulis, 33 | A. Gorban, J. Levesley, M. Tretyakov (eds.) Numerical Mathematics and 34 | Advanced Applications 2011: Proceedings of ENUMATH 2011, the 9th European Conference 35 | on Numerical Mathematics and Advanced Applications, Leicester, September 2011, 36 | 831-839, Springer, 2013. 37 | 38 | Later, Shields and Zhang independently developed and published a variant of sudoku sampling 39 | in the paper 40 | Michael D. Shields and Jiaxin Zhang. The generalization of Latin hypercube sampling. 41 | Reliability Engineering & System Safety 148:96-108, 2016. 42 | http://doi.org/10.1016/j.ress.2015.12.002 43 | 44 | JJ 2010-09-23 (MATLAB version) 45 | JJ 2012-03-12 (Python version) 46 | JJ 2017-04-25 (Python3 compatibility, NumPyDoc style docstring, define __version__) 47 | """ 48 | 49 | from __future__ import division, print_function, absolute_import 50 | 51 | import numpy as np 52 | 53 | __version__ = '1.0.0' 54 | 55 | def sample(N,k,n, visualize=False, showdiag=False, verbose=False): 56 | """Create a coarsely `N`-dimensionally stratified latin hypercube sample (LHS) of range(`k` * `m`) in `N` dimensions. 57 | 58 | Parameters: 59 | N : int, >= 1 60 | number of dimensions 61 | k : int, >= 1 62 | number of large subdivisions (sudoku boxes, "subspaces") per dimension 63 | n : int, >= 1 64 | number of samples to place in each subspace 65 | visualize : bool (optional) 66 | If True, the results (projected into two dimensions pairwise) 67 | are plotted using Matplotlib when the sampling is finished. 
68 | showdiag : bool (optional) 69 | If True, and `N` >= 3, show also a one-dimensional projection 70 | of the result onto each axis. 71 | 72 | Implies "visualize". 73 | 74 | This should produce a straight line with no holes onto 75 | each subplot that is on the diagonal of the plot array; 76 | mainly intended for debug. 77 | verbose : bool (optional) 78 | If true, progress messages and warnings 79 | (for non-integer input) are printed. 80 | 81 | Return value: 82 | tuple (`S`, `m`), where: 83 | S : (`k` * `m`)-by-`N` rank-2 np.array 84 | where each row is an `N`-tuple of integers in range(`k` * `m`). 85 | 86 | m : int, >= 1 87 | number of bins per parameter in one subspace (i.e. sample slots 88 | per axis in one box). 89 | 90 | `m` = `n` * (`k` ** (`N` - 1)), but is provided as output for convenience. 91 | 92 | **Examples:** 93 | 94 | `N` = 2 dimensions, `k` = 3 subspaces per axis, `n` = 1 sample per subspace. 95 | `m` will be `n` * (`k` ** (`N` - 1)) = 1 * 3**(2-1) = 3. Plot the result and show progress messages:: 96 | 97 | S,m = sample(2, 3, 1, visualize=True, verbose=True) 98 | 99 | For comparison with the previous example, try this classical Latin hypercube 100 | that has 9 samples in total, plotting the result. We choose 9, because in 101 | the previous example, `k` * `m` = 3*3 = 9:: 102 | 103 | S,m = sample(2, 1, 9, visualize=True) 104 | 105 | **Notes:** 106 | 107 | If `k` = 1, the algorithm reduces to classical Latin hypercube sampling. 108 | 109 | If `N` = 1, the algorithm simply produces a random permutation of range(`k` * `m`). 110 | 111 | Let `m` = `n` * (`k` ** (`N` - 1)) denote the number of bins for one variable 112 | in one subspace. The total number of samples is always exactly `k` * `m`. 113 | Each component of a sample can take on values 0, 1, ..., (`k` * `m` - 1).
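The `k` = 1 reduction mentioned in the Notes can be sketched directly as classical LHS. The following standalone snippet (hypothetical helper name, independent of this module) demonstrates the Latin hypercube property that `sample` also guarantees globally:

```python
# Hypothetical sketch of classical LHS (the k = 1 special case): one
# independent random permutation per dimension places exactly one sample
# in each of the m bins along every axis.
import numpy as np

def classical_lhs(N, m):
    # column d holds the bin index of each of the m samples along dimension d
    return np.stack([np.random.permutation(m) for _ in range(N)], axis=1)

S = classical_lhs(2, 9)
for d in range(2):
    # Latin hypercube property: every bin index occurs exactly once per axis
    assert np.array_equal(np.sort(S[:, d]), np.arange(9))
```

The sudoku variant additionally constrains each of the `k` ** `N` boxes to receive the same number of samples; the sketch above enforces only the global per-axis constraint.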
114 | """ 115 | # sanity check input 116 | if not isinstance(N, int) or N < 1: 117 | raise ValueError("N must be int >= 1, got %g" % (N)) 118 | if not isinstance(k, int) or k < 1: 119 | raise ValueError("k must be int >= 1, got %g" % (k)) 120 | if not isinstance(n, int) or n < 1: 121 | raise ValueError("n must be int >= 1, got %g" % (n)) 122 | 123 | # showing the diagonal implies visualization 124 | if showdiag: 125 | visualize = True 126 | 127 | # Discussion. 128 | 129 | # Proof that the following algorithm implements a Sudoku-like LHS method: 130 | # 131 | # * We desire two properties: Latin hypercube sampling globally, and equal density 132 | # in each subspace. 133 | # * The independent index vector generation for each parameter guarantees the Latin 134 | # hypercube property: some numbers will have been used, and removed from the index 135 | # vectors, when the next subspace along the same hyperplane is reached. Thus, the same 136 | # indices cannot be used again for any such subspace. This process continues until each 137 | # index has been used exactly once. 138 | # * The equal density property is enforced by the fact that each subspace gets exactly one 139 | # sample generated in one run of the loop. The total number of samples is, by design, 140 | # divisible by the number of these subspaces. Therefore, each subspace will have the 141 | # same sample density. 142 | # 143 | # Run time and memory cost: 144 | # 145 | # * Exactly k*m samples will be generated. This can be seen from the fact that there are 146 | # k*m bins per parameter, and they all get filled by exactly one sample. 147 | # * Thus, runtime is in O(k*m) = O( k * n*k^(N-1) ) = O( n*k^N ). (This isn't as bad as it 148 | # looks. All it's saying is that a linear number of bins gets filled. This is much less 149 | # than the total number of bins (k*m)^N - which is why LHS is needed in the first place. 150 | # We get a reduction in sample count by the factor (k*m)^(N-1).) 
151 | # * Required memory for the final result is (k*m)*N reals (plus some overhead), where the 152 | # N comes from the fact that each N-tuple generated has N elements. Note that the index 153 | # vectors also use up k*m*N reals in total (k*N vectors, each with m elements). Thus the 154 | # memory cost is 2*k*m*N reals plus overhead. 155 | # * Note that using a more complicated implementation that frees the elements of the index 156 | # vectors as they are used up probably wouldn't help with the memory usage, because many 157 | # vector implementations never decrease their storage space even if elements are deleted. 158 | # * In other programming languages, one might work around this by using linked lists 159 | # instead of vectors, and arranging the memory allocations for the elements in a very 160 | # special way (i.e. such that the last ones in memory always get deleted first). By 161 | # using a linked list for the results, too, and allocating them over the deleted 162 | # elements of the index vectors (since they shrink at exactly the same rate the results 163 | # grow), one might be able to bring down the memory usage to k*m*N plus overhead. 164 | # * Finally, note that in practical situations N, k and m are usually small, so the factor 165 | # of 2 doesn't really matter. 166 | 167 | # Algorithm. 168 | 169 | # Find necessary number of bins per subspace so that equal nonzero density is possible. 170 | # A brief analysis shows that in order to exactly fill up all k*m bins for one variable, 171 | # we must have k*m = n*k^N, i.e... 172 | m = n * k**(N-1) 173 | 174 | # Create index vectors for each subspace for each parameter. (There are k*N of these.) 175 | if verbose: 176 | print('Allocating %d elements for solution...' 
% (N*k*m)) 177 | 178 | I = np.empty( [N,k,m], dtype=int, order="C" ) # index vectors 179 | Iidx = np.zeros( [N,k], dtype=int, order="C" ) # index of first "not yet used" element in each index vector 180 | 181 | # Create random permutations of range(m) so that in the sampling loop 182 | # we may simply pick the first element from each index vector. 183 | # 184 | for i in range(N): 185 | for j in range(k): 186 | tmp = np.array( range(m), dtype=int ) 187 | np.random.shuffle(tmp) 188 | I[i,j,:] = tmp 189 | 190 | if verbose: 191 | print('Generating sample...') 192 | print('Looping through %d subspaces.' % (k**N)) 193 | 194 | L = k*m # number of samples still waiting for placement 195 | Ns = k**N # number of subspaces in total (cartesian product 196 | # of k subspaces per axis in N dimensions) 197 | 198 | # Start with an empty result set. We will place the generated samples here. 199 | S = np.empty( [L,N], dtype=int, order="C" ) 200 | out_idx = 0 # index of current output sample in S 201 | 202 | # create views for linear indexing 203 | I_lin = np.reshape(I, -1) 204 | Iidx_lin = np.reshape(Iidx, -1) 205 | 206 | # we will need an array of range(N) several times in the loop... 207 | rgN = np.arange(N, dtype=int) 208 | 209 | while L > 0: 210 | # Loop over all subspaces, placing one sample in each. 211 | for j in range(Ns): # index subspaces linearly 212 | # Find, in each dimension, which subspace we are in. 213 | # Compute the multi-index (vector containing an index in each dimension) 214 | # for this subspace. 215 | # 216 | # Simple example: (N,k,n) = (2,3,1) 217 | # => pj = 0 0, 1 0, 2 0, 0 1, 1 1, 2 1, 0 2, 1 2, 2 2 218 | # when j = 0, 1, 2, 3, 4, 5, 6, 7, 8 219 | # 220 | pj = np.array( ( j // (k**rgN) ) % k, dtype=int ) 221 | 222 | # Construct one sample point. 223 | # 224 | # To do this, we grab the first "not yet used" element in all index vectors 225 | # (one for each dimension) corresponding to this subspace. 
226 | # 227 | # Along the dth dimension, we are in the pj[d]th subspace. 228 | # Hence, in the dth dimension, we want to refer to the vector whose index is pj[d]. 229 | # 230 | # Hence, we should take 231 | # row = d (effectively, range(N)) 232 | # col = pj[d] 233 | # 234 | # The array Iidx is of the shape [N,k]. NumPy uses C-contiguous ordering 235 | # by default; last index varies fastest. Hence, the element [row,col] is at 236 | # k*row + col. 237 | # 238 | # This gets us a vector of linear indices into Iidx, where the dth element 239 | # corresponds to the linear index of the pj[d]th vector. 240 | # 241 | i = np.array( k*rgN + pj, dtype=int ) 242 | 243 | # Extract the "first unused element" data from Iidx for each of the vectors, 244 | # to get the actual sample slot numbers (random permutations) stored in I. 245 | # 246 | indices = Iidx_lin[i] 247 | 248 | # Indexing: the array I is of shape [N,k,m] and has C storage order. 249 | # 250 | idx_first = np.array( k*m*rgN + m*pj + indices, dtype=int ) 251 | 252 | s = I_lin[idx_first] # this is our new sample point (vector of length N) 253 | Iidx_lin[i] += 1 # move to the next element in the selected index vectors 254 | 255 | # Now s contains a sample from (range(m), range(m), ..., range(m)) (N elements). 256 | # By its construction, the sample conforms globally to the Latin hypercube 257 | # requirement. 258 | 259 | # Compute the base index along each dimension. In the global numbering 260 | # which goes 0, 1, ..., (k*m-1) along each axis, the first element 261 | # of the current subspace is at this multi-index: 262 | # 263 | a = pj*m 264 | 265 | # Add the new sample to the result set. 266 | S[out_idx,:] = a+s 267 | out_idx += 1 268 | 269 | # We placed exactly Ns samples during the for loop. 
270 | L -= Ns 271 | 272 | # Result visualization (for debug and illustrative purposes) 273 | # 274 | if visualize and N > 1: 275 | if verbose: 276 | print('Plotting...') 277 | 278 | import itertools 279 | import matplotlib.pyplot as plt 280 | 281 | # if the grid would show more lines than this, the lines are hidden. 282 | max_major_lines = 5 283 | max_minor_lines = 15 284 | 285 | if k*m > 100: 286 | style = 'b.' 287 | else: 288 | style = 'bo' # use circles when a small number of bins 289 | 290 | plt.figure(1) 291 | plt.clf() 292 | 293 | if N >= 3: 294 | # We'll make a "pairs" plot (like the pairs() function of the "R" 295 | # statistics software). 296 | 297 | # generate all pairs of dimensions, make explicit list 298 | pair_list = list(itertools.combinations(range(N), 2)) 299 | 300 | # make final list. 301 | # 302 | # We want to populate both sides of the diagonal in the plot, 303 | # so we need pair_list, plus another copy of it 304 | # with the first and second components switched in each pair. 305 | # 306 | pairs = list(pair_list) # copy 307 | pairs.extend( tuple(reversed(pair)) for pair in pair_list ) 308 | 309 | # Show also the diagonal if requested. 310 | # 311 | # This should produce a straight line with no holes onto 312 | # each subplot that is on the diagonal of the plot array. 313 | # 314 | if showdiag: 315 | pairs.extend( [ (j,j) for j in range(N) ] ) 316 | else: # N == 2: 317 | pairs = [ (0, 1) ] 318 | 319 | Np = len(pairs) 320 | for i in range(Np): 321 | if N >= 3: 322 | if verbose: 323 | print('Subplot %d of %d...' % ((i+1), Np)) 324 | plt.subplot( N,N, N*pairs[i][1] + (pairs[i][0] + 1) ) 325 | 326 | # off-diagonal projection? (i.e. 
a true 2D projection) 327 | if pairs[i][0] != pairs[i][1]: 328 | # Plot the points picked by the sample 329 | plt.plot( S[:,pairs[i][0]], S[:,pairs[i][1]], style) 330 | axmax = k*m 331 | 332 | # Mark the subspaces onto the figure 333 | # (if few enough to fit reasonably on screen) 334 | # 335 | if k <= max_major_lines: 336 | for j in range(k): 337 | xy = -0.5 + j*m 338 | plt.plot( [xy, xy], [-0.5, axmax - 0.5], 'k', linewidth=2.0 ) 339 | plt.plot( [-0.5, axmax - 0.5], [xy, xy], 'k', linewidth=2.0 ) 340 | 341 | # Mark bins (if few enough to fit reasonably on screen) 342 | # 343 | if k*m <= max_minor_lines: 344 | for j in range(k*m): 345 | xy = -0.5 + j 346 | plt.plot( [xy, xy], [-0.5, axmax - 0.5], 'k') 347 | plt.plot( [-0.5, axmax - 0.5], [xy, xy], 'k') 348 | 349 | # Make a box around the area 350 | plt.plot( [-0.5, axmax - 0.5], [-0.5, -0.5], 'k', \ 351 | linewidth=2.0 ) 352 | plt.plot( [-0.5, axmax - 0.5], [axmax - 0.5, axmax - 0.5], 'k', \ 353 | linewidth=2.0 ) 354 | plt.plot( [-0.5, -0.5], [-0.5, axmax - 0.5], 'k', \ 355 | linewidth=2.0 ) 356 | plt.plot( [axmax - 0.5, axmax - 0.5], [-0.5, axmax - 0.5], 'k', \ 357 | linewidth=2.0 ) 358 | 359 | # Set the axes so that the extreme indices just fit into the view 360 | plt.axis("equal") 361 | plt.axis( [-0.5, axmax-0.5, -0.5, axmax-0.5 ] ) 362 | else: # 1D projection 363 | plt.plot( S[:,pairs[i][0]], np.zeros( [k*m] ), style) 364 | plt.axis( [-0.5, axmax-0.5, -0.5, 0.5] ) 365 | 366 | # Label the variables. 367 | # 368 | # We only do this if the diagonal subplots are blank. 
369 | # 370 | if N >= 3: 371 | if not showdiag: 372 | for i in range(N): 373 | plt.subplot(N,N, N*i+(i+1)) 374 | my_label = 'Row: x = var %d' % i 375 | plt.text(0.5,0.6, my_label, horizontalalignment="center", fontweight="bold") 376 | my_label = 'Col: y = var %d' % i 377 | plt.text(0.5,0.4, my_label, horizontalalignment="center", fontweight="bold") 378 | plt.axis("off") 379 | else: 380 | plt.xlabel('Var 0', fontweight="bold") 381 | plt.ylabel('Var 1', fontweight="bold") 382 | 383 | if verbose: 384 | print('Plotting done. Showing figure...') 385 | 386 | # show figures and enter gtk mainloop 387 | plt.show() 388 | 389 | return (S,m) 390 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | """Setuptools-based setup script for WLSQM.""" 4 | 5 | from __future__ import division, print_function, absolute_import 6 | 7 | ######################################################### 8 | # Config 9 | ######################################################### 10 | 11 | # choose build type here 12 | # 13 | build_type="optimized" 14 | #build_type="debug" 15 | 16 | 17 | ######################################################### 18 | # Init 19 | ######################################################### 20 | 21 | # check for Python 2.7 or later 22 | # http://stackoverflow.com/questions/19534896/enforcing-python-version-in-setup-py 23 | import sys 24 | if sys.version_info < (2,7): 25 | sys.exit('Sorry, Python < 2.7 is not supported') 26 | 27 | import os 28 | import platform 29 | 30 | from setuptools import setup 31 | from setuptools.extension import Extension 32 | 33 | try: 34 | from Cython.Build import cythonize 35 | except ImportError: 36 | sys.exit("Cython not found. 
Cython is needed to build the extension modules for WLSQM.") 37 | 38 | 39 | ######################################################### 40 | # Definitions 41 | ######################################################### 42 | 43 | system = platform.system() 44 | 45 | if system == "Windows": 46 | my_extra_compile_args_math = ["/openmp"] 47 | my_extra_compile_args_nonmath = [] 48 | my_extra_link_args = [] 49 | debug = False 50 | else: 51 | extra_compile_args_math_optimized = ['-fopenmp', '-march=native', '-O2', '-msse', '-msse2', '-mfma', '-mfpmath=sse'] 52 | extra_compile_args_math_debug = ['-fopenmp', '-march=native', '-O0', '-g'] 53 | 54 | extra_compile_args_nonmath_optimized = ['-O2'] 55 | extra_compile_args_nonmath_debug = ['-O0', '-g'] 56 | 57 | extra_link_args_optimized = ['-fopenmp'] 58 | extra_link_args_debug = ['-fopenmp'] 59 | 60 | 61 | if build_type == 'optimized': 62 | my_extra_compile_args_math = extra_compile_args_math_optimized 63 | my_extra_compile_args_nonmath = extra_compile_args_nonmath_optimized 64 | my_extra_link_args = extra_link_args_optimized 65 | debug = False 66 | print( "build configuration selected: optimized" ) 67 | else: # build_type == 'debug': 68 | my_extra_compile_args_math = extra_compile_args_math_debug 69 | my_extra_compile_args_nonmath = extra_compile_args_nonmath_debug 70 | my_extra_link_args = extra_link_args_debug 71 | debug = True 72 | print( "build configuration selected: debug" ) 73 | 74 | 75 | ######################################################### 76 | # Long description 77 | ######################################################### 78 | 79 | DESC="""WLSQM (Weighted Least SQuares Meshless) is a fast and accurate meshless least-squares interpolator for Python, implemented in Cython. 80 | 81 | Given scalar data values on a set of points in 1D, 2D or 3D, WLSQM constructs a piecewise polynomial global surrogate model (a.k.a. response surface), using up to 4th order polynomials. 
82 | 83 | Use cases include response surface modeling, and computing space derivatives of data known only as values at discrete points in space. No grid or mesh is needed. 84 | 85 | Any derivative of the model function (e.g. d2f/dxdy) can be easily evaluated, up to the order of the polynomial. 86 | 87 | Sensitivity data of solution DOFs (on the data values at points other than the reference in the local neighborhood) can be optionally computed. 88 | 89 | Performance-critical parts are implemented in Cython. LAPACK is used via SciPy's Cython-level bindings. OpenMP is used for parallelization over the independent local problems (also in the linear solver step). 90 | 91 | This implementation is targeted for high performance in a single-node environment, such as a laptop. The main target is x86_64. 92 | """ 93 | 94 | ######################################################### 95 | # Helpers 96 | ######################################################### 97 | 98 | my_include_dirs = ["."] # IMPORTANT, see https://github.com/cython/cython/wiki/PackageHierarchy 99 | 100 | def ext(extName): 101 | extPath = extName.replace(".", os.path.sep)+".pyx" 102 | return Extension( extName, 103 | [extPath], 104 | extra_compile_args=my_extra_compile_args_nonmath 105 | ) 106 | def ext_math(extName): 107 | if system == "Windows": 108 | libraries = [] 109 | else: 110 | libraries = ["m"] # "m" links libm, the math library on unix-likes; see http://docs.cython.org/src/tutorial/external.html 111 | extPath = extName.replace(".", os.path.sep)+".pyx" 112 | return Extension( extName, 113 | [extPath], 114 | extra_compile_args=my_extra_compile_args_math, 115 | extra_link_args=my_extra_link_args, 116 | libraries=libraries 117 | ) 118 | 119 | # http://stackoverflow.com/questions/13628979/setuptools-how-to-make-package-contain-extra-data-folder-and-all-folders-inside 120 | datadirs = ("examples",) 121 | dataexts = (".py", ".pyx", ".pxd", ".c", ".sh", ".lyx", ".pdf") 122 | datafiles = [] 123 | getext = 
lambda filename: os.path.splitext(filename)[1] 124 | for datadir in datadirs: 125 | datafiles.extend( [(root, [os.path.join(root, f) for f in files if getext(f) in dataexts]) 126 | for root, dirs, files in os.walk(datadir)] ) 127 | 128 | datafiles.append( ('.', ["README.md", "LICENSE.md", "TODO.md", "CHANGELOG.md"]) ) 129 | datafiles.append( ('.', ["example.png"]) ) 130 | 131 | 132 | ######################################################### 133 | # Utility modules 134 | ######################################################### 135 | 136 | ext_module_ptrwrap = ext( "wlsqm.utils.ptrwrap") # Pointer wrapper for Cython/Python integration 137 | ext_module_lapackdrivers = ext_math("wlsqm.utils.lapackdrivers") # Simple Python interface to LAPACK for solving many independent linear equation systems efficiently in parallel. Built on top of scipy.linalg.cython_lapack. 138 | 139 | ######################################################### 140 | # WLSQM (Weighted Least SQuares Meshless method) 141 | ######################################################### 142 | 143 | ext_module_defs = ext( "wlsqm.fitter.defs") # definitions (named constants) 144 | ext_module_infra = ext( "wlsqm.fitter.infra") # memory allocation infrastructure 145 | ext_module_impl = ext_math("wlsqm.fitter.impl") # low-level routines (implementation) 146 | ext_module_polyeval = ext_math("wlsqm.fitter.polyeval") # evaluation of Taylor expansions and general polynomials 147 | ext_module_interp = ext_math("wlsqm.fitter.interp") # interpolation of fitted model 148 | ext_module_simple = ext_math("wlsqm.fitter.simple") # simple API 149 | ext_module_expert = ext_math("wlsqm.fitter.expert") # advanced API 150 | 151 | ######################################################### 152 | 153 | # Extract __version__ from the package __init__.py 154 | # (since it's not a good idea to actually run __init__.py during the build process). 
155 | # 156 | # http://stackoverflow.com/questions/2058802/how-can-i-get-the-version-defined-in-setup-py-setuptools-in-my-package 157 | # 158 | import ast 159 | with open('wlsqm/__init__.py', 'r') as f: 160 | for line in f: 161 | if line.startswith('__version__'): 162 | version = ast.parse(line).body[0].value.s 163 | break 164 | else: 165 | version = '0.0.unknown' 166 | print( "WARNING: Version information not found, using placeholder '%s'" % (version) ) 167 | 168 | 169 | setup( 170 | name = "wlsqm", 171 | version = version, 172 | author = "Juha Jeronen", 173 | author_email = "juha.jeronen@jyu.fi", 174 | url = "https://github.com/Technologicat/python-wlsqm", 175 | 176 | description = "Weighted least squares meshless interpolator", 177 | long_description = DESC, 178 | 179 | license = "BSD", 180 | platforms = ["Linux"], # free-form text field; http://stackoverflow.com/questions/34994130/what-platforms-argument-to-setup-in-setup-py-does 181 | 182 | classifiers = [ "Development Status :: 4 - Beta", 183 | "Environment :: Console", 184 | "Intended Audience :: Developers", 185 | "Intended Audience :: Science/Research", 186 | "License :: OSI Approved :: BSD License", 187 | "Operating System :: POSIX :: Linux", 188 | "Programming Language :: Cython", 189 | "Programming Language :: Python", 190 | "Programming Language :: Python :: 2", 191 | "Programming Language :: Python :: 2.7", 192 | "Programming Language :: Python :: 3", 193 | "Programming Language :: Python :: 3.4", 194 | "Topic :: Scientific/Engineering", 195 | "Topic :: Scientific/Engineering :: Mathematics", 196 | "Topic :: Software Development :: Libraries", 197 | "Topic :: Software Development :: Libraries :: Python Modules" 198 | ], 199 | 200 | # 0.16 seems to be the first SciPy version that has cython_lapack.pxd. 
( https://github.com/scipy/scipy/commit/ba438eab99ce8f55220a6ff652500f07dd6a547a ) 201 | setup_requires = ["cython", "scipy (>=0.16)"], 202 | install_requires = ["numpy", "scipy (>=0.16)"], 203 | provides = ["wlsqm"], 204 | 205 | # same keywords as used as topics on GitHub 206 | keywords = ["numerical interpolation differentiation curve-fitting least-squares meshless numpy cython"], 207 | 208 | ext_modules = cythonize( [ ext_module_lapackdrivers, 209 | ext_module_ptrwrap, 210 | ext_module_defs, 211 | ext_module_infra, 212 | ext_module_impl, 213 | ext_module_polyeval, 214 | ext_module_interp, 215 | ext_module_simple, 216 | ext_module_expert ], 217 | 218 | include_path = my_include_dirs, 219 | 220 | gdb_debug = debug ), 221 | 222 | # Declare packages so that python -m setup build will copy .py files (especially __init__.py). 223 | packages = ["wlsqm", "wlsqm.utils", "wlsqm.fitter"], 224 | 225 | # Install also Cython headers so that other Cython modules can cimport ours 226 | # FIXME: force sdist, but sdist only, to keep the .pyx files (this puts them also in the bdist) 227 | package_data={'wlsqm.utils': ['*.pxd', '*.pyx'], # note: paths relative to each package 228 | 'wlsqm.fitter': ['*.pxd', '*.pyx']}, 229 | 230 | # Disable zip_safe, because: 231 | # - Cython won't find .pxd files inside installed .egg, hard to compile libs depending on this one 232 | # - dynamic loader may need to have the library unzipped to a temporary folder anyway (at import time) 233 | zip_safe = False, 234 | 235 | # Usage examples; not in a package 236 | data_files = datafiles 237 | ) 238 | -------------------------------------------------------------------------------- /wlsqm/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | """WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 
4 | 5 | A general overview can be found in the README. 6 | 7 | For the API, refer to wlsqm.fitter.simple and wlsqm.fitter.expert. 8 | 9 | When imported, this module imports all symbols from the following modules to the local namespace: 10 | 11 | wlsqm.fitter.defs # definitions (constants) (common) 12 | wlsqm.fitter.simple # simple API 13 | wlsqm.fitter.interp # interpolation of fitted model (for simple API) 14 | wlsqm.fitter.expert # advanced API 15 | 16 | This makes the names available as wlsqm.fit_2D(), wlsqm.ExpertSolver, etc. 17 | 18 | JJ 2017-02-22 19 | """ 20 | 21 | # absolute_import: https://www.python.org/dev/peps/pep-0328/ 22 | from __future__ import division, print_function, absolute_import 23 | 24 | __version__ = '0.1.6' 25 | 26 | from .fitter.defs import * # definitions (constants) (common) 27 | from .fitter.simple import * # simple API 28 | from .fitter.interp import * # interpolation of fitted model (for simple API) 29 | from .fitter.expert import * # advanced API 30 | 31 | -------------------------------------------------------------------------------- /wlsqm/fitter/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/wlsqm/fitter/__init__.py -------------------------------------------------------------------------------- /wlsqm/fitter/defs.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # C-level definitions for Cython. 6 | # 7 | # This file contains only the declarations; the actual values are set in the .pyx source for wlsqm.fitter.defs. 
8 | # 9 | # The suffix of _c means "visible at the C level in Cython"; it is used to distinguish 10 | # the typed C constants from the Python-accessible objects also defined in the .pyx source. 11 | # 12 | # JJ 2016-11-30 13 | 14 | from __future__ import absolute_import 15 | 16 | # Algorithms for the solve step (expert mode). 17 | # 18 | cdef int ALGO_BASIC_c # fit just once 19 | cdef int ALGO_ITERATIVE_c # fit with iterative refinement to mitigate roundoff 20 | 21 | # Weighting methods. 22 | # 23 | cdef int WEIGHT_UNIFORM_c 24 | cdef int WEIGHT_CENTER_c 25 | 26 | # DOF index in the array f. 27 | # 28 | # These are ordered in increasing order of number of differentiations, so that if only first derivatives 29 | # are required, the DOF array can be simply truncated after the first derivatives. 30 | # 31 | # To avoid gaps in the numbering, this requires separate DOF orderings for the 1D, 2D and 3D cases. 32 | # 33 | # (The other logical possibility would be function value first, then x-related, then y-related, then mixed, 34 | # but then the case of "first derivatives only" requires changes to the ordering to avoid gaps. 35 | # Specifying different orderings for different numbers of space dimensions is the conceptually cleaner 36 | # of the two possibilities.)
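The truncation property described in the comment block above can be sketched in plain Python. The index values are copied from the assignments in wlsqm/fitter/defs.pyx further below; the coefficient array `fi` is purely hypothetical illustration data, not output of the library:

```python
# 2D DOF indices, values as assigned in wlsqm/fitter/defs.pyx.
i2_F, i2_X, i2_Y = 0, 1, 2
i2_1st_end = 3  # one-past-end of the first-order DOFs in 2D

# Hypothetical fitted DOF array for one 2D local model, up to 2nd order:
# [F, df/dx, df/dy, d2f/dx2, d2f/dxdy, d2f/dy2]
fi = [1.0, 0.5, -0.25, 0.1, 0.0, 0.2]

# Because DOFs are ordered by the number of differentiations, keeping only
# the function value and the first derivatives is a plain truncation:
first_order_part = fi[:i2_1st_end]
print(first_order_part)  # [1.0, 0.5, -0.25]
```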
37 | 38 | # 1D case 39 | # 40 | cdef int i1_F_c 41 | cdef int i1_X_c 42 | cdef int i1_X2_c 43 | cdef int i1_X3_c 44 | cdef int i1_X4_c 45 | 46 | cdef int i1_0th_end_c # one-past end of zeroth-order case 47 | cdef int i1_1st_end_c # one-past end of first-order case 48 | cdef int i1_2nd_end_c # one-past-end of second-order case 49 | cdef int i1_3rd_end_c # one-past-end of third-order case 50 | cdef int i1_4th_end_c # one-past-end of fourth-order case 51 | 52 | cdef int SIZE1_c # maximum possible number of DOFs, 1D case 53 | 54 | # 2D case 55 | # 56 | cdef int i2_F_c 57 | 58 | cdef int i2_X_c 59 | cdef int i2_Y_c 60 | 61 | cdef int i2_X2_c 62 | cdef int i2_XY_c 63 | cdef int i2_Y2_c 64 | 65 | cdef int i2_X3_c 66 | cdef int i2_X2Y_c 67 | cdef int i2_XY2_c 68 | cdef int i2_Y3_c 69 | 70 | cdef int i2_X4_c 71 | cdef int i2_X3Y_c 72 | cdef int i2_X2Y2_c 73 | cdef int i2_XY3_c 74 | cdef int i2_Y4_c 75 | 76 | cdef int i2_0th_end_c # one-past end of zeroth-order case 77 | cdef int i2_1st_end_c # one-past end of first-order case 78 | cdef int i2_2nd_end_c # one-past-end of second-order case 79 | cdef int i2_3rd_end_c # one-past-end of third-order case 80 | cdef int i2_4th_end_c # one-past-end of fourth-order case 81 | 82 | cdef int SIZE2_c # maximum possible number of DOFs, 2D case 83 | 84 | # 3D case 85 | # 86 | cdef int i3_F_c 87 | 88 | cdef int i3_X_c 89 | cdef int i3_Y_c 90 | cdef int i3_Z_c 91 | 92 | cdef int i3_X2_c 93 | cdef int i3_XY_c 94 | cdef int i3_Y2_c 95 | cdef int i3_YZ_c 96 | cdef int i3_Z2_c 97 | cdef int i3_XZ_c 98 | 99 | cdef int i3_X3_c 100 | cdef int i3_X2Y_c 101 | cdef int i3_XY2_c 102 | cdef int i3_Y3_c 103 | cdef int i3_Y2Z_c 104 | cdef int i3_YZ2_c 105 | cdef int i3_Z3_c 106 | cdef int i3_XZ2_c 107 | cdef int i3_X2Z_c 108 | cdef int i3_XYZ_c 109 | 110 | cdef int i3_X4_c 111 | cdef int i3_X3Y_c 112 | cdef int i3_X2Y2_c 113 | cdef int i3_XY3_c 114 | cdef int i3_Y4_c 115 | cdef int i3_Y3Z_c 116 | cdef int i3_Y2Z2_c 117 | cdef int i3_YZ3_c 118 | cdef int 
i3_Z4_c 119 | cdef int i3_XZ3_c 120 | cdef int i3_X2Z2_c 121 | cdef int i3_X3Z_c 122 | cdef int i3_X2YZ_c 123 | cdef int i3_XY2Z_c 124 | cdef int i3_XYZ2_c 125 | 126 | cdef int i3_0th_end_c # one-past-end of zeroth-order case 127 | cdef int i3_1st_end_c # one-past-end of first-order case 128 | cdef int i3_2nd_end_c # one-past-end of second-order case 129 | cdef int i3_3rd_end_c # one-past-end of third-order case 130 | cdef int i3_4th_end_c # one-past-end of fourth-order case 131 | 132 | cdef int SIZE3_c # maximum possible number of DOFs, 3D case 133 | 134 | 135 | # bitmask constants for knowns. 136 | # 137 | # Knowns are eliminated algebraically from the equation system; if any knowns are specified, 138 | # the system to be solved (for a point x_i) will be smaller than the full system (e.g. "full" is 6x6 for 2nd order in 2D). 139 | # 140 | # The sensible default is to consider the function value F known, with all the 141 | # derivatives unknown. 142 | # 143 | # Note that here "known" means "known at point x_i" (the point at which we wish to compute the derivatives). 144 | # 145 | # Function values (F) are always assumed known at all *neighbor* points x_k, since they are used 146 | # for determining the local least-squares polynomial fit to the data. This fit is then used 147 | # as a local surrogate model for the unknown function f; in WLSQM, the derivatives are actually computed 148 | # from the surrogate. 149 | # 150 | # The option to have the function value (F) as an unknown is useful with Neumann BCs, if the neighborhoods 151 | # of the Neumann boundary points are chosen so that each Neumann boundary point only uses neighbors from 152 | # the interior of the domain. This gives the possibility to leave F free at all Neumann boundary points, 153 | # while prescribing only a derivative. 154 | # 155 | # (In practice, at slanted (i.e.
not coordinate axis aligned) boundaries, local (tangent, normal) 156 | # coordinates must be used; i.e., the coordinate system in which the derivatives are to be computed 157 | # must be rotated to match the orientation of the boundary. This makes Y the normal derivative, 158 | # which can then be prescribed using this mechanism, while leaving the function value F free.) 159 | 160 | # 1D case 161 | # 162 | cdef long long b1_F_c 163 | cdef long long b1_X_c 164 | cdef long long b1_X2_c 165 | cdef long long b1_X3_c 166 | cdef long long b1_X4_c 167 | 168 | # 2D case 169 | # 170 | cdef long long b2_F_c 171 | 172 | cdef long long b2_X_c 173 | cdef long long b2_Y_c 174 | 175 | cdef long long b2_X2_c 176 | cdef long long b2_XY_c 177 | cdef long long b2_Y2_c 178 | 179 | cdef long long b2_X3_c 180 | cdef long long b2_X2Y_c 181 | cdef long long b2_XY2_c 182 | cdef long long b2_Y3_c 183 | 184 | cdef long long b2_X4_c 185 | cdef long long b2_X3Y_c 186 | cdef long long b2_X2Y2_c 187 | cdef long long b2_XY3_c 188 | cdef long long b2_Y4_c 189 | 190 | # 3D case 191 | # 192 | cdef long long b3_F_c 193 | 194 | cdef long long b3_X_c 195 | cdef long long b3_Y_c 196 | cdef long long b3_Z_c 197 | 198 | cdef long long b3_X2_c 199 | cdef long long b3_XY_c 200 | cdef long long b3_Y2_c 201 | cdef long long b3_YZ_c 202 | cdef long long b3_Z2_c 203 | cdef long long b3_XZ_c 204 | 205 | cdef long long b3_X3_c 206 | cdef long long b3_X2Y_c 207 | cdef long long b3_XY2_c 208 | cdef long long b3_Y3_c 209 | cdef long long b3_Y2Z_c 210 | cdef long long b3_YZ2_c 211 | cdef long long b3_Z3_c 212 | cdef long long b3_XZ2_c 213 | cdef long long b3_X2Z_c 214 | cdef long long b3_XYZ_c 215 | 216 | cdef long long b3_X4_c 217 | cdef long long b3_X3Y_c 218 | cdef long long b3_X2Y2_c 219 | cdef long long b3_XY3_c 220 | cdef long long b3_Y4_c 221 | cdef long long b3_Y3Z_c 222 | cdef long long b3_Y2Z2_c 223 | cdef long long b3_YZ3_c 224 | cdef long long b3_Z4_c 225 | cdef long long b3_XZ3_c 226 | cdef long long 
b3_X2Z2_c 227 | cdef long long b3_X3Z_c 228 | cdef long long b3_X2YZ_c 229 | cdef long long b3_XY2Z_c 230 | cdef long long b3_XYZ2_c 231 | 232 | -------------------------------------------------------------------------------- /wlsqm/fitter/defs.pyx: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Set Cython compiler directives. This section must appear before any code! 4 | # 5 | # For available directives, see: 6 | # 7 | # http://docs.cython.org/en/latest/src/reference/compilation.html 8 | # 9 | # cython: wraparound = False 10 | # cython: boundscheck = False 11 | # cython: cdivision = True 12 | # 13 | """WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 14 | 15 | This module contains C-level and Python-level definitions of constants. The constants are made visible to Python by creating Python objects, with their values copied from the corresponding C constants. 16 | 17 | In the source code, the suffix of _c means "visible at the C level in Cython"; it is used to distinguish the typed C constants from the corresponding Python objects. 18 | 19 | Naming scheme: 20 | 21 | ALGO_* = algorithms for the solve step (for advanced API in wlsqm.fitter.expert). 22 | WEIGHT_* = weighting methods for the error (data - predicted) in least-squares fitting. 23 | 24 | i1_* = integer, 1D case 25 | i2_* = integer, 2D case 26 | i3_* = integer, 3D case 27 | 28 | The i?_* constants are the human-readable names for the DOF indices in the "fi" array (see docstrings in wlsqm.fitter.simple). 29 | 30 | b1_* = bitmask, 1D case 31 | b2_* = bitmask, 2D case 32 | b3_* = bitmask, 3D case 33 | 34 | *_end = one-past-end index of this case. The ordinal (0th, 1st, 2nd, 3rd, 4th) refers to the degree of the fit. 35 | E.g. i2_3rd_end = one-past-end for 2D with 3rd order fit. 
36 | 37 | SIZE1 = maximum possible number of DOFs (degrees of freedom), 1D case 38 | SIZE2 = maximum possible number of DOFs (degrees of freedom), 2D case 39 | SIZE3 = maximum possible number of DOFs (degrees of freedom), 3D case 40 | 41 | ("maximum possible" because if order < 4, then only lower-degree DOFs will exist.) 42 | 43 | F = function value 44 | X = "times x" (coefficients) or "differentiate by x" (see wlsqm.fitter.interp) 45 | X2 = "times x**2" or "differentiate twice by x" 46 | Y, Z respectively 47 | 48 | Examples: 49 | 50 | i2_F = 2D case, function value 51 | i2_X2Y = 2D case, coefficient of the X**2 * Y term in the polynomial; or request differentiation twice by x and once by y to compute d3f/dx2dy (see wlsqm.fitter.interp) 52 | 53 | IMPORTANT: the DOF values returned by the fitter are "partially baked" such that the DOF value directly corresponds to the value of the corresponding derivative. 54 | This is for convenience of evaluating derivatives at the model reference point. 55 | 56 | E.g. fi[:,i2_X2] is the coefficient of d2f/dx2 in a Taylor series expansion of f around the reference point xi. 57 | (The ":" is here meant to refer to the reference point xi for all local models; see wlsqm.fitter.simple.fit_2D_many() for a description of the "fi" array.) 58 | 59 | JJ 2016-11-30 60 | """ 61 | 62 | from __future__ import division, print_function, absolute_import 63 | 64 | ################################################# 65 | # C definitions (Cython level) 66 | ################################################# 67 | 68 | # Algorithms for the solve step (expert mode). 69 | # 70 | cdef int ALGO_BASIC_c = 1 # just fit once 71 | cdef int ALGO_ITERATIVE_c = 2 # fit with iterative refinement to mitigate roundoff 72 | 73 | # Weighting methods. 74 | # 75 | cdef int WEIGHT_UNIFORM_c = 1 76 | cdef int WEIGHT_CENTER_c = 2 77 | 78 | # DOF index in the array f. 
79 | # 80 | # These are ordered in increasing order of number of differentiations, so that if only first derivatives 81 | # are required, the DOF array can be simply truncated after the first derivatives. 82 | # 83 | # To avoid gaps in the numbering, this requires separate DOF orderings for the 1D, 2D and 3D cases. 84 | # 85 | # (The other logical possibility would be function value first, then x-related, then y-related, then mixed, 86 | # but then the case of "first derivatives only" requires changes to the ordering to avoid gaps. 87 | # Specifying different orderings for different numbers of space dimensions is conceptually cleaner 88 | # of the two possibilities.) 89 | 90 | # 1D case 91 | # 92 | cdef int i1_F_c = 0 93 | cdef int i1_X_c = 1 94 | cdef int i1_X2_c = 2 95 | cdef int i1_X3_c = 3 96 | cdef int i1_X4_c = 4 97 | 98 | cdef int i1_0th_end_c = 1 # one-past end of zeroth-order case 99 | cdef int i1_1st_end_c = 2 # one-past end of first-order case 100 | cdef int i1_2nd_end_c = 3 # one-past-end of second-order case 101 | cdef int i1_3rd_end_c = 4 # one-past-end of third-order case 102 | cdef int i1_4th_end_c = 5 # one-past-end of fourth-order case 103 | 104 | cdef int SIZE1_c = i1_4th_end_c # maximum possible number of DOFs, 1D case 105 | 106 | # 2D case 107 | # 108 | cdef int i2_F_c = 0 109 | 110 | cdef int i2_X_c = 1 111 | cdef int i2_Y_c = 2 112 | 113 | cdef int i2_X2_c = 3 114 | cdef int i2_XY_c = 4 115 | cdef int i2_Y2_c = 5 116 | 117 | cdef int i2_X3_c = 6 118 | cdef int i2_X2Y_c = 7 119 | cdef int i2_XY2_c = 8 120 | cdef int i2_Y3_c = 9 121 | 122 | cdef int i2_X4_c = 10 123 | cdef int i2_X3Y_c = 11 124 | cdef int i2_X2Y2_c = 12 125 | cdef int i2_XY3_c = 13 126 | cdef int i2_Y4_c = 14 127 | 128 | cdef int i2_0th_end_c = 1 # one-past end of zeroth-order case 129 | cdef int i2_1st_end_c = 3 # one-past end of first-order case 130 | cdef int i2_2nd_end_c = 6 # one-past-end of second-order case 131 | cdef int i2_3rd_end_c = 10 # one-past-end of third-order 
case 132 | cdef int i2_4th_end_c = 15 # one-past-end of fourth-order case 133 | 134 | cdef int SIZE2_c = i2_4th_end_c # maximum possible number of DOFs, 2D case 135 | 136 | # 3D case 137 | # 138 | cdef int i3_F_c = 0 139 | 140 | cdef int i3_X_c = 1 141 | cdef int i3_Y_c = 2 142 | cdef int i3_Z_c = 3 143 | 144 | cdef int i3_X2_c = 4 145 | cdef int i3_XY_c = 5 146 | cdef int i3_Y2_c = 6 147 | cdef int i3_YZ_c = 7 148 | cdef int i3_Z2_c = 8 149 | cdef int i3_XZ_c = 9 150 | 151 | cdef int i3_X3_c = 10 152 | cdef int i3_X2Y_c = 11 153 | cdef int i3_XY2_c = 12 154 | cdef int i3_Y3_c = 13 155 | cdef int i3_Y2Z_c = 14 156 | cdef int i3_YZ2_c = 15 157 | cdef int i3_Z3_c = 16 158 | cdef int i3_XZ2_c = 17 159 | cdef int i3_X2Z_c = 18 160 | cdef int i3_XYZ_c = 19 161 | 162 | cdef int i3_X4_c = 20 163 | cdef int i3_X3Y_c = 21 164 | cdef int i3_X2Y2_c = 22 165 | cdef int i3_XY3_c = 23 166 | cdef int i3_Y4_c = 24 167 | cdef int i3_Y3Z_c = 25 168 | cdef int i3_Y2Z2_c = 26 169 | cdef int i3_YZ3_c = 27 170 | cdef int i3_Z4_c = 28 171 | cdef int i3_XZ3_c = 29 172 | cdef int i3_X2Z2_c = 30 173 | cdef int i3_X3Z_c = 31 174 | cdef int i3_X2YZ_c = 32 175 | cdef int i3_XY2Z_c = 33 176 | cdef int i3_XYZ2_c = 34 177 | 178 | cdef int i3_0th_end_c = 1 # one-past end of zeroth-order case 179 | cdef int i3_1st_end_c = 4 # one-past end of first-order case 180 | cdef int i3_2nd_end_c = 10 # one-past-end of second-order case 181 | cdef int i3_3rd_end_c = 20 # one-past-end of third-order case 182 | cdef int i3_4th_end_c = 35 # one-past-end of fourth-order case 183 | 184 | cdef int SIZE3_c = i3_4th_end_c # maximum possible number of DOFs, 3D case 185 | 186 | 187 | # bitmask constants for knowns. 188 | # 189 | # Knowns are eliminated algebraically from the equation system; if any knowns are specified, 190 | # the system to be solved (for a point x_i) will be smaller than the full system (e.g. "full" is 6x6 for 2nd order in 2D). 
191 | # 192 | # The sensible default is to consider the function value F known, with all the derivatives unknown. 193 | # 194 | # Note that here "known" means "known at point xi" (the reference point of the model). 195 | # 196 | # Function values (F) are always assumed known at all *neighbor* points xk, since they are used 197 | # for determining the local least-squares polynomial fit to the data. This fit is then used 198 | # as a local surrogate model representing the unknown function f. 199 | # 200 | # In the application context of solving IBVPs with explicit time integration, the option to have the function value (F) as an unknown 201 | # is useful with Neumann BCs. The neighborhoods of the Neumann boundary points can be chosen such that each Neumann boundary point 202 | # only uses neighbors from the interior of the domain. This gives the possibility to leave F free at all Neumann boundary points, 203 | # while prescribing only a derivative (the normal-direction derivative). 204 | # 205 | # (In practice, at slanted (i.e. not coordinate axis aligned) boundaries, local (tangent, normal) 206 | # coordinates must be used; i.e., the coordinate system in which the derivatives are to be computed 207 | # must be rotated to match the orientation of the boundary. This makes Y the normal derivative, 208 | # which can then be prescribed using this mechanism, while leaving the function value F free.) 
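As a sketch of how these bitmasks are meant to be composed: the constant values mirror the `b2_* = (1LL << i2_*)` assignments below, and the two `knowns` masks correspond to the interior-point default and the Neumann-boundary scenario just described. The popcount arithmetic at the end only illustrates the idea that eliminating each known algebraically removes one unknown; the actual reduced-system bookkeeping lives in wlsqm.fitter.infra:

```python
# Bitmask constants mirror defs.pyx: b2_<dof> = 1 << i2_<dof>.
i2_F, i2_X, i2_Y = 0, 1, 2
b2_F = 1 << i2_F   # function value known
b2_Y = 1 << i2_Y   # df/dy known

# Default interior point: function value known, all derivatives unknown.
knowns_interior = b2_F

# Neumann boundary point in rotated (tangent, normal) coordinates:
# prescribe the normal derivative Y, but leave the function value F free.
knowns_neumann = b2_Y

# For a 2nd-order fit in 2D the full system has 6 DOFs; eliminating each
# known leaves one fewer unknown to solve for.
n_dofs = 6
n_unknowns_interior = n_dofs - bin(knowns_interior).count("1")
print(n_unknowns_interior)  # 5
```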
209 | 210 | # 1D case 211 | # 212 | cdef long long b1_F_c = (1LL << i1_F_c) 213 | cdef long long b1_X_c = (1LL << i1_X_c) 214 | cdef long long b1_X2_c = (1LL << i1_X2_c) 215 | cdef long long b1_X3_c = (1LL << i1_X3_c) 216 | cdef long long b1_X4_c = (1LL << i1_X4_c) 217 | 218 | # 2D case 219 | # 220 | cdef long long b2_F_c = (1LL << i2_F_c) 221 | 222 | cdef long long b2_X_c = (1LL << i2_X_c) 223 | cdef long long b2_Y_c = (1LL << i2_Y_c) 224 | 225 | cdef long long b2_X2_c = (1LL << i2_X2_c) 226 | cdef long long b2_XY_c = (1LL << i2_XY_c) 227 | cdef long long b2_Y2_c = (1LL << i2_Y2_c) 228 | 229 | cdef long long b2_X3_c = (1LL << i2_X3_c) 230 | cdef long long b2_X2Y_c = (1LL << i2_X2Y_c) 231 | cdef long long b2_XY2_c = (1LL << i2_XY2_c) 232 | cdef long long b2_Y3_c = (1LL << i2_Y3_c) 233 | 234 | cdef long long b2_X4_c = (1LL << i2_X4_c) 235 | cdef long long b2_X3Y_c = (1LL << i2_X3Y_c) 236 | cdef long long b2_X2Y2_c = (1LL << i2_X2Y2_c) 237 | cdef long long b2_XY3_c = (1LL << i2_XY3_c) 238 | cdef long long b2_Y4_c = (1LL << i2_Y4_c) 239 | 240 | # 3D case 241 | # 242 | cdef long long b3_F_c = (1LL << i3_F_c) 243 | 244 | cdef long long b3_X_c = (1LL << i3_X_c) 245 | cdef long long b3_Y_c = (1LL << i3_Y_c) 246 | cdef long long b3_Z_c = (1LL << i3_Z_c) 247 | 248 | cdef long long b3_X2_c = (1LL << i3_X2_c) 249 | cdef long long b3_XY_c = (1LL << i3_XY_c) 250 | cdef long long b3_Y2_c = (1LL << i3_Y2_c) 251 | cdef long long b3_YZ_c = (1LL << i3_YZ_c) 252 | cdef long long b3_Z2_c = (1LL << i3_Z2_c) 253 | cdef long long b3_XZ_c = (1LL << i3_XZ_c) 254 | 255 | cdef long long b3_X3_c = (1LL << i3_X3_c) 256 | cdef long long b3_X2Y_c = (1LL << i3_X2Y_c) 257 | cdef long long b3_XY2_c = (1LL << i3_XY2_c) 258 | cdef long long b3_Y3_c = (1LL << i3_Y3_c) 259 | cdef long long b3_Y2Z_c = (1LL << i3_Y2Z_c) 260 | cdef long long b3_YZ2_c = (1LL << i3_YZ2_c) 261 | cdef long long b3_Z3_c = (1LL << i3_Z3_c) 262 | cdef long long b3_XZ2_c = (1LL << i3_XZ2_c) 263 | cdef long long b3_X2Z_c = (1LL << 
i3_X2Z_c) 264 | cdef long long b3_XYZ_c = (1LL << i3_XYZ_c) 265 | 266 | cdef long long b3_X4_c = (1LL << i3_X4_c) 267 | cdef long long b3_X3Y_c = (1LL << i3_X3Y_c) 268 | cdef long long b3_X2Y2_c = (1LL << i3_X2Y2_c) 269 | cdef long long b3_XY3_c = (1LL << i3_XY3_c) 270 | cdef long long b3_Y4_c = (1LL << i3_Y4_c) 271 | cdef long long b3_Y3Z_c = (1LL << i3_Y3Z_c) 272 | cdef long long b3_Y2Z2_c = (1LL << i3_Y2Z2_c) 273 | cdef long long b3_YZ3_c = (1LL << i3_YZ3_c) 274 | cdef long long b3_Z4_c = (1LL << i3_Z4_c) 275 | cdef long long b3_XZ3_c = (1LL << i3_XZ3_c) 276 | cdef long long b3_X2Z2_c = (1LL << i3_X2Z2_c) 277 | cdef long long b3_X3Z_c = (1LL << i3_X3Z_c) 278 | cdef long long b3_X2YZ_c = (1LL << i3_X2YZ_c) 279 | cdef long long b3_XY2Z_c = (1LL << i3_XY2Z_c) 280 | cdef long long b3_XYZ2_c = (1LL << i3_XYZ2_c) 281 | 282 | 283 | ################################################# 284 | # Python wrapper 285 | ################################################# 286 | 287 | # Algorithms for the solve step (expert mode). 288 | # 289 | ALGO_BASIC = ALGO_BASIC_c 290 | ALGO_ITERATIVE = ALGO_ITERATIVE_c 291 | 292 | # Weighting methods. 293 | # 294 | WEIGHT_UNIFORM = WEIGHT_UNIFORM_c 295 | WEIGHT_CENTER = WEIGHT_CENTER_c 296 | 297 | # DOF index in the array f. 298 | # 299 | # These are ordered in increasing order of number of differentiations, so that if only first derivatives 300 | # are required, the DOF array can be simply truncated after the first derivatives. 301 | # 302 | # To avoid gaps in the numbering, this requires separate DOF orderings for the 1D, 2D and 3D cases. 303 | # 304 | # (The other logical possibility would be function value first, then x-related, then y-related, then mixed, 305 | # but then the case of "first derivatives only" requires changes to the ordering to avoid gaps. 306 | # Specifying different orderings for different numbers of space dimensions is conceptually cleaner 307 | # of the two possibilities.) 
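The one-past-end values assigned above are not arbitrary: each equals the number of monomials of total degree at most k in d variables, C(d + k, d). A quick self-contained check, with the expected values copied from the assignments in this module:

```python
from math import comb

# Number of monomials of total degree <= order in `dimension` variables.
def n_dofs(dimension, order):
    return comb(dimension + order, dimension)

# One-past-end indices from defs.pyx, per dimension, for orders 0..4:
ends = {1: [1, 2, 3, 4, 5],
        2: [1, 3, 6, 10, 15],
        3: [1, 4, 10, 20, 35]}

for d, expected in ends.items():
    assert [n_dofs(d, k) for k in range(5)] == expected

print(n_dofs(2, 2))  # 6 DOFs for a 2nd-order fit in 2D
```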
308 | 309 | # 1D case 310 | # 311 | i1_F = i1_F_c 312 | i1_X = i1_X_c 313 | i1_X2 = i1_X2_c 314 | i1_X3 = i1_X3_c 315 | i1_X4 = i1_X4_c 316 | 317 | i1_1st_end = i1_1st_end_c 318 | i1_2nd_end = i1_2nd_end_c 319 | i1_3rd_end = i1_3rd_end_c 320 | i1_4th_end = i1_4th_end_c 321 | 322 | SIZE1 = SIZE1_c 323 | 324 | 325 | # 2D case 326 | # 327 | i2_F = i2_F_c 328 | 329 | i2_X = i2_X_c 330 | i2_Y = i2_Y_c 331 | 332 | i2_X2 = i2_X2_c 333 | i2_XY = i2_XY_c 334 | i2_Y2 = i2_Y2_c 335 | 336 | i2_X3 = i2_X3_c 337 | i2_X2Y = i2_X2Y_c 338 | i2_XY2 = i2_XY2_c 339 | i2_Y3 = i2_Y3_c 340 | 341 | i2_X4 = i2_X4_c 342 | i2_X3Y = i2_X3Y_c 343 | i2_X2Y2 = i2_X2Y2_c 344 | i2_XY3 = i2_XY3_c 345 | i2_Y4 = i2_Y4_c 346 | 347 | i2_1st_end = i2_1st_end_c 348 | i2_2nd_end = i2_2nd_end_c 349 | i2_3rd_end = i2_3rd_end_c 350 | i2_4th_end = i2_4th_end_c 351 | 352 | SIZE2 = SIZE2_c 353 | 354 | 355 | # 3D case 356 | # 357 | i3_F = i3_F_c 358 | 359 | i3_X = i3_X_c 360 | i3_Y = i3_Y_c 361 | i3_Z = i3_Z_c 362 | 363 | i3_X2 = i3_X2_c 364 | i3_XY = i3_XY_c 365 | i3_Y2 = i3_Y2_c 366 | i3_YZ = i3_YZ_c 367 | i3_Z2 = i3_Z2_c 368 | i3_XZ = i3_XZ_c 369 | 370 | i3_X3 = i3_X3_c 371 | i3_X2Y = i3_X2Y_c 372 | i3_XY2 = i3_XY2_c 373 | i3_Y3 = i3_Y3_c 374 | i3_Y2Z = i3_Y2Z_c 375 | i3_YZ2 = i3_YZ2_c 376 | i3_Z3 = i3_Z3_c 377 | i3_XZ2 = i3_XZ2_c 378 | i3_X2Z = i3_X2Z_c 379 | i3_XYZ = i3_XYZ_c 380 | 381 | i3_X4 = i3_X4_c 382 | i3_X3Y = i3_X3Y_c 383 | i3_X2Y2 = i3_X2Y2_c 384 | i3_XY3 = i3_XY3_c 385 | i3_Y4 = i3_Y4_c 386 | i3_Y3Z = i3_Y3Z_c 387 | i3_Y2Z2 = i3_Y2Z2_c 388 | i3_YZ3 = i3_YZ3_c 389 | i3_Z4 = i3_Z4_c 390 | i3_XZ3 = i3_XZ3_c 391 | i3_X2Z2 = i3_X2Z2_c 392 | i3_X3Z = i3_X3Z_c 393 | i3_X2YZ = i3_X2YZ_c 394 | i3_XY2Z = i3_XY2Z_c 395 | i3_XYZ2 = i3_XYZ2_c 396 | 397 | i3_0th_end = i3_0th_end_c 398 | i3_1st_end = i3_1st_end_c 399 | i3_2nd_end = i3_2nd_end_c 400 | i3_3rd_end = i3_3rd_end_c 401 | i3_4th_end = i3_4th_end_c 402 | 403 | SIZE3 = SIZE3_c 404 | 405 | 406 | # bitmask constants for knowns. 
407 | # 408 | # Knowns are eliminated algebraically from the equation system; if any knowns are specified, 409 | # the system to be solved (for a point x_i) will be smaller than the full system (e.g. "full" is 6x6 for 2nd order in 2D). 410 | # 411 | # The sensible default is to consider the function value F known, with all the 412 | # derivatives unknown. 413 | # 414 | # Note that here "known" means "known at point x_i" (the point at which we wish to compute the derivatives). 415 | # 416 | # Function values (F) are always assumed known at all *neighbor* points x_k, since they are used 417 | # for determining the local least-squares polynomial fit to the data. This fit is then used 418 | # as a local surrogate model for the unknown function f; in WLSQM, the derivatives are actually computed 419 | # from the surrogate. 420 | # 421 | # The option to have the function value (F) as an unknown is useful with Neumann BCs, if the neighborhoods 422 | # of the Neumann boundary points are chosen so that each Neumann boundary point only uses neighbors from 423 | # the interior of the domain. This gives the possibility to leave F free at all Neumann boundary points, 424 | # while prescribing only a derivative. 425 | # 426 | # (In practice, at slanted (i.e. not coordinate axis aligned) boundaries, local (tangent, normal) 427 | # coordinates must be used; i.e., the coordinate system in which the derivatives are to be computed 428 | # must be rotated to match the orientation of the boundary. This makes Y the normal derivative, 429 | # which can then be prescribed using this mechanism, while leaving the function value F free.)
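Eliminating knowns implies a renumbering between original and reduced DOF indices. Below is a hypothetical plain-Python mirror of such a mapping; the real mapping is built by wlsqm.fitter.infra (its `o2r`/`r2o` arrays and `remap()` helper), so this sketch only illustrates the idea under the assumption that unknowns keep their relative order, and is not the actual implementation:

```python
# Given a knowns bitmask, unknown DOFs are renumbered compactly, keeping
# their relative order (an assumption for this sketch). o2r maps
# original -> reduced (-1 marks a known DOF), r2o maps reduced -> original.
def build_maps(n, knowns_mask):
    o2r = [-1] * n
    r2o = []
    for j in range(n):
        if not (knowns_mask >> j) & 1:  # DOF j is unknown
            o2r[j] = len(r2o)
            r2o.append(j)
    return o2r, r2o

# 2nd-order fit in 2D: 6 DOFs; F (bit 0) and Y (bit 2) known.
o2r, r2o = build_maps(6, (1 << 0) | (1 << 2))
print(o2r)  # [-1, 0, -1, 1, 2, 3]
print(r2o)  # [1, 3, 4, 5]
```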
430 | 431 | # 1D case 432 | # 433 | b1_F = b1_F_c 434 | b1_X = b1_X_c 435 | b1_X2 = b1_X2_c 436 | b1_X3 = b1_X3_c 437 | b1_X4 = b1_X4_c 438 | 439 | 440 | # 2D case 441 | # 442 | b2_F = b2_F_c 443 | 444 | b2_X = b2_X_c 445 | b2_Y = b2_Y_c 446 | 447 | b2_X2 = b2_X2_c 448 | b2_XY = b2_XY_c 449 | b2_Y2 = b2_Y2_c 450 | 451 | b2_X3 = b2_X3_c 452 | b2_X2Y = b2_X2Y_c 453 | b2_XY2 = b2_XY2_c 454 | b2_Y3 = b2_Y3_c 455 | 456 | b2_X4 = b2_X4_c 457 | b2_X3Y = b2_X3Y_c 458 | b2_X2Y2 = b2_X2Y2_c 459 | b2_XY3 = b2_XY3_c 460 | b2_Y4 = b2_Y4_c 461 | 462 | 463 | # 3D case 464 | # 465 | b3_F = b3_F_c 466 | 467 | b3_X = b3_X_c 468 | b3_Y = b3_Y_c 469 | b3_Z = b3_Z_c 470 | 471 | b3_X2 = b3_X2_c 472 | b3_XY = b3_XY_c 473 | b3_Y2 = b3_Y2_c 474 | b3_YZ = b3_YZ_c 475 | b3_Z2 = b3_Z2_c 476 | b3_XZ = b3_XZ_c 477 | 478 | b3_X3 = b3_X3_c 479 | b3_X2Y = b3_X2Y_c 480 | b3_XY2 = b3_XY2_c 481 | b3_Y3 = b3_Y3_c 482 | b3_Y2Z = b3_Y2Z_c 483 | b3_YZ2 = b3_YZ2_c 484 | b3_Z3 = b3_Z3_c 485 | b3_XZ2 = b3_XZ2_c 486 | b3_X2Z = b3_X2Z_c 487 | b3_XYZ = b3_XYZ_c 488 | 489 | b3_X4 = b3_X4_c 490 | b3_X3Y = b3_X3Y_c 491 | b3_X2Y2 = b3_X2Y2_c 492 | b3_XY3 = b3_XY3_c 493 | b3_Y4 = b3_Y4_c 494 | b3_Y3Z = b3_Y3Z_c 495 | b3_Y2Z2 = b3_Y2Z2_c 496 | b3_YZ3 = b3_YZ3_c 497 | b3_Z4 = b3_Z4_c 498 | b3_XZ3 = b3_XZ3_c 499 | b3_X2Z2 = b3_X2Z2_c 500 | b3_X3Z = b3_X3Z_c 501 | b3_X2YZ = b3_X2YZ_c 502 | b3_XY2Z = b3_XY2Z_c 503 | b3_XYZ2 = b3_XYZ2_c 504 | 505 | -------------------------------------------------------------------------------- /wlsqm/fitter/impl.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Low-level routines: distance matrix generation, problem matrix generation, solver. 6 | # 7 | # JJ 2016-11-30 8 | 9 | # Set Cython compiler directives. 
This section must appear before any code! 10 | # 11 | # For available directives, see: 12 | # 13 | # http://docs.cython.org/en/latest/src/reference/compilation.html 14 | # 15 | # cython: wraparound = False 16 | # cython: boundscheck = False 17 | # cython: cdivision = True 18 | 19 | from __future__ import absolute_import 20 | 21 | from cython cimport view 22 | 23 | # See the infrastructure module for the definition of Case. 24 | cimport wlsqm.fitter.infra as infra 25 | 26 | #################################################### 27 | # Distance matrix (c) generation 28 | #################################################### 29 | 30 | cdef void make_c_nD( infra.Case* case, double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D ) nogil 31 | cdef void make_c_3D( infra.Case* case, double[::view.generic,::view.contiguous] xk ) nogil 32 | cdef void make_c_2D( infra.Case* case, double[::view.generic,::view.contiguous] xk ) nogil 33 | cdef void make_c_1D( infra.Case* case, double[::view.generic] xk ) nogil 34 | 35 | #################################################### 36 | # Problem matrix (A) generation 37 | #################################################### 38 | 39 | cdef void make_A( infra.Case* case ) nogil 40 | cdef void preprocess_A( infra.Case* case, int debug ) nogil 41 | 42 | #################################################### 43 | # RHS handling and solving 44 | #################################################### 45 | 46 | cdef void solve( infra.Case* case, double[::view.generic] fk, double[::view.generic,::view.contiguous] sens, int do_sens, int taskid ) nogil 47 | cdef int solve_iterative( infra.Case* case, double[::view.generic] fk, double[::view.generic,::view.contiguous] sens, int do_sens, int taskid, int max_iter, 48 | double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D ) nogil 49 | 50 | -------------------------------------------------------------------------------- /wlsqm/fitter/infra.pxd:
-------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Centralized memory allocation infrastructure. 6 | # 7 | # The implementation uses C-style object-oriented programming, with structs and 8 | # class name prefixed methods using an explicit self pointer argument. 9 | # 10 | # JJ 2016-11-30 11 | 12 | from __future__ import absolute_import 13 | 14 | ################################################# 15 | # Helper functions 16 | ################################################# 17 | 18 | cdef int number_of_dofs( int dimension, int order ) nogil 19 | cdef int number_of_reduced_dofs( int n, long long mask ) nogil 20 | cdef int remap( int* o2r, int* r2o, int n, long long mask ) nogil 21 | 22 | ################################################# 23 | # class Allocator: 24 | ################################################# 25 | 26 | cdef int ALLOC_MODE_PASSTHROUGH # pass each call through to C malloc/free 27 | cdef int ALLOC_MODE_ONEBIGBUFFER # pre-allocate one big buffer to fit everything in 28 | 29 | cdef struct Allocator: 30 | int mode # operation mode, see constants ALLOC_MODE_* 31 | void* buffer # start address of all storage 32 | int size_total # buffer size, bytes 33 | void* p # first currently unused address in buffer 34 | int size_used # bytes used up to now 35 | 36 | cdef Allocator* Allocator_new( int mode, int total_size_bytes ) nogil except 0 37 | cdef void* Allocator_malloc( Allocator* self, int size_bytes ) nogil 38 | cdef void Allocator_free( Allocator* self, void* p ) nogil 39 | cdef int Allocator_size_remaining( Allocator* self ) nogil 40 | cdef void Allocator_del( Allocator* self ) nogil 41 | 42 | ################################################# 43 | # class CaseManager: 44 | 
################################################# 45 | 46 | # Sizes needed for the various arrays in Case, as bytes. 47 | # 48 | # This is really just a struct; no methods. 49 | # 50 | cdef struct BufferSizes: 51 | int o2r # DOF mapping original --> reduced 52 | int r2o # DOF mapping reduced --> original 53 | 54 | int c # distance matrix 55 | int w # weights 56 | 57 | int A # problem matrix / its packed LU factorization 58 | int row_scale # row scaling factors for A (needed by solver to scale RHS) 59 | int col_scale # column scaling factors for A (needed by solver to scale solution) 60 | int ipiv # pivot information of LU factored A (needed by solver) 61 | 62 | int fi # coefficients of the fit; essentially, the function value and derivatives at the origin of the fit 63 | int fi2 # work space for coefficients for interpolating derivatives of the model (wlsqm.fitter.interp) to a general point 64 | 65 | int wrk # solver work space for RHS (or zero in managed mode) 66 | 67 | int fk_tmp # work space for iterative refinement (or zero in managed mode), remaining error at each point xk 68 | int fi_tmp # work space for iterative refinement (or zero in managed mode), coefficients of error reduction fit 69 | 70 | int total # sum of all the above 71 | 72 | # Infra class for multiple problem instances having the same dimension, order, knowns mask and flags (do_sens, iterative). 73 | # 74 | # This centralizes the memory allocation (to avoid unnecessary fragmentation) 75 | # when multiple problem instances are solved at one go. 76 | # 77 | # This class is only intended to be used from a Python thread. 78 | # 79 | cdef struct CaseManager: 80 | # parallel processing 81 | # 82 | # "per-task arrays": one work space per task, independent of the number of problem instances (cases). 
83 | # 84 | int ntasks 85 | double** wrks # array of work spaces for RHS 86 | double** fk_tmps # array of work spaces for iterative refinement, remaining error at each point xk 87 | double** fi_tmps # array of work spaces for iterative refinement, coefficients of error reduction fit 88 | 89 | # managed cases 90 | # 91 | Case** cases # array to store the Case pointers 92 | int max_cases # array capacity 93 | int ncases # currently used capacity 94 | 95 | int bytes_needed # total memory required for storing the work spaces and the arrays allocated by the Case objects 96 | 97 | # data common to all managed cases 98 | # 99 | Allocator* mal 100 | int dimension 101 | int do_sens 102 | int iterative 103 | 104 | cdef CaseManager* CaseManager_new( int dimension, int do_sens, int iterative, int max_cases, int ntasks ) nogil except 0 105 | cdef int CaseManager_add( CaseManager* self, Case* case ) nogil except -1 106 | cdef int CaseManager_commit( CaseManager* self ) nogil except -1 107 | #cdef int CaseManager_allocate( CaseManager* self ) nogil except -1 # private (not exported from the module) 108 | #cdef void CaseManager_deallocate( CaseManager* self ) nogil # private 109 | cdef void CaseManager_del( CaseManager* self ) nogil 110 | 111 | ################################################# 112 | # class Case: 113 | ################################################# 114 | 115 | # This class gathers some metadata and centralizes memory management for one problem instance. 116 | # 117 | # We use the above custom allocator to allocate all needed memory in one big block. 118 | # 119 | # A centralized mode is available (see the optional constructor parameter "cases") 120 | # to centralize memory allocation for a set of cases. 121 | # 122 | # TODO: refactor: make_c_nD(), make_A(), preprocess_A(), solve() now look a lot like methods of Case (the first parameter is a Case*, and its member variables are used extensively). 
123 | # TODO: could also store the point xi and other relevant stuff. (OTOH, currently no actual use case that needs them) 124 | # 125 | cdef struct Case: 126 | # infra 127 | int have_manager # 1 = has a CaseManager, using its allocator 128 | # 0 = no CaseManager, create an allocator locally 129 | CaseManager* manager 130 | Allocator* mal # custom memory allocator 131 | 132 | # case metadata 133 | int dimension # number of space dimensions 134 | int order # degree of polynomial to be fitted 135 | long long knowns # knowns bitmask 136 | int weighting_method # weighting: uniform or emphasize center region (see wlsqm.fitter.defs) 137 | int no # number of DOFs in original (unreduced) system 138 | int nr # number of DOFs in reduced system 139 | int nk # number of neighbor points used in fit 140 | int do_sens # flag: do sensitivity analysis? (affects memory usage) 141 | int iterative # flag: iterative refinement? (affects memory usage) 142 | 143 | # the origin point of the model (needed by certain routines) 144 | double xi # 1D, 2D, 3D 145 | double yi # 2D, 3D 146 | double zi # 3D 147 | 148 | # data pointers 149 | 150 | int geometry_owned # guest mode support: possible to use o2r,r2o,c,w,A,row_scale,col_scale,ipiv off another Case instance 151 | 152 | # DOF mappings 153 | int* o2r 154 | int* r2o 155 | 156 | # low level stuff: "c" matrix, weights 157 | double* c 158 | double* w 159 | 160 | # higher-level stuff: "A" matrix 161 | double* A 162 | double* row_scale 163 | double* col_scale 164 | int* ipiv 165 | 166 | # condition number (for wlsqm.fitter.impl.preprocess_A() debug mode) 167 | double cond_orig 168 | double cond_scaled 169 | 170 | # coefficients of the fit 171 | # 172 | # This name was chosen because fi[0] is the function value ("f") at the point (xi,yi), hence "i", 173 | # and the other elements store derivatives at the same point. 
174 | # 175 | double* fi 176 | double* fi2 # work space for coefficients for evaluating derivatives of the model (wlsqm.fitter.interp) at a general point 177 | 178 | # RHS work space for solver 179 | double* wrk 180 | 181 | # work space for iterative fitting algorithm 182 | double* fk_tmp 183 | double* fi_tmp 184 | 185 | cdef Case* Case_new( int dimension, int order, double xi, double yi, double zi, int nk, long long knowns, int weighting_method, int do_sens, int iterative, CaseManager* manager, Case* host ) nogil except 0 186 | cdef double* Case_get_wrk( Case* self, int taskid ) nogil 187 | cdef double* Case_get_fk_tmp( Case* self, int taskid ) nogil 188 | cdef double* Case_get_fi_tmp( Case* self, int taskid ) nogil 189 | cdef void Case_make_weights( Case* self, double max_d2 ) nogil # mainly for use by wlsqm.fitter.impl.make_c_nd() 190 | cdef void Case_set_fi( Case* self, double* fi ) nogil 191 | cdef void Case_get_fi( Case* self, double* out ) nogil 192 | #cdef void Case_determine_sizes( Case* self, BufferSizes* sizes ) nogil # private 193 | #cdef int Case_allocate( Case* self ) nogil except -1 # private 194 | #cdef void Case_deallocate( Case* self ) nogil # private 195 | cdef void Case_del( Case* self ) nogil 196 | 197 | -------------------------------------------------------------------------------- /wlsqm/fitter/infra.pyx: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Centralized memory allocation infrastructure. 6 | # 7 | # JJ 2016-11-30 8 | 9 | # Set Cython compiler directives. This section must appear before any code! 
10 | # 11 | # For available directives, see: 12 | # 13 | # http://docs.cython.org/en/latest/src/reference/compilation.html 14 | # 15 | # cython: wraparound = False 16 | # cython: boundscheck = False 17 | # cython: cdivision = True 18 | 19 | # Total memory needed for arrays: 20 | # - no*sizeof(int) bytes for o2r; one shared copy is enough (bypass custom alloc, since only remap(), which needs this, computes "nr") 21 | # - no*sizeof(int) bytes for r2o; one shared copy is enough ( --''-- ) 22 | # - nprob*nk*no*sizeof(double) bytes for c (actually, sum(nk_j, j in problems)*no*sizeof(double)) 23 | # - nprob*nr*sizeof(double) bytes for row_scale 24 | # - nprob*nr*sizeof(double) bytes for column_scale 25 | # - nprob*nr*nr*sizeof(double) bytes for A 26 | # - nprob*nr*sizeof(int) bytes for ipiv 27 | # - solve: 28 | # - if do_sens, ntasks*nr*(nk+1)*sizeof(double) bytes for wrk 29 | # - use max(nk_j) here to fit the largest problem instance (since any thread may run it) 30 | # - if not do_sens, ntasks*nr*sizeof(double) bytes for wrk 31 | # - iterative refinement: 32 | # - ntasks*nk*sizeof(double) bytes for fk_tmp 33 | # - max(nk_j) here too, same reason 34 | # - ntasks*no*sizeof(double) bytes for fi_tmp 35 | 36 | from __future__ import division, print_function, absolute_import 37 | 38 | from libc.stdlib cimport malloc, free 39 | from libc.math cimport sqrt 40 | 41 | cimport wlsqm.fitter.defs as defs # C constants 42 | 43 | # use GCC's intrinsics for counting the number of set bits in an int 44 | # 45 | # See 46 | # http://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer (algorithms, suggestions) 47 | # https://gist.github.com/craffel/e470421958cad33df550 (Cython defs; on popcounting a NumPy array) 48 | # 49 | cdef extern from "popcount.h": 50 | int __builtin_popcount(unsigned int) nogil 51 | int __builtin_popcountll(unsigned long long) nogil 52 | 53 | ##################################### 54 | # Helper functions 55 | 
##################################### 56 | 57 | # Return number of DOFs in the original (unreduced) system. 58 | # 59 | # dimension : in, number of space dimensions (1, 2 or 3) 60 | # order : in, the order of the polynomial to be fitted 61 | # 62 | cdef int number_of_dofs( int dimension, int order ) nogil: 63 | if dimension not in [1,2,3]: 64 | return -1 65 | # with gil: 66 | # raise ValueError( "dimension must be 1, 2 or 3; got %d" % dimension ) 67 | if order not in [0,1,2,3,4]: 68 | return -2 69 | # with gil: 70 | # raise ValueError( "order must be 0, 1, 2, 3 or 4; got %d" % order ) 71 | 72 | cdef int no 73 | if dimension == 3: 74 | if order == 4: 75 | no = defs.i3_4th_end_c 76 | elif order == 3: 77 | no = defs.i3_3rd_end_c 78 | elif order == 2: 79 | no = defs.i3_2nd_end_c 80 | elif order == 1: 81 | no = defs.i3_1st_end_c 82 | else: # order == 0: 83 | no = defs.i3_0th_end_c 84 | elif dimension == 2: 85 | if order == 4: 86 | no = defs.i2_4th_end_c 87 | elif order == 3: 88 | no = defs.i2_3rd_end_c 89 | elif order == 2: 90 | no = defs.i2_2nd_end_c 91 | elif order == 1: 92 | no = defs.i2_1st_end_c 93 | else: # order == 0: 94 | no = defs.i2_0th_end_c 95 | else: # dimension == 1: 96 | if order == 4: 97 | no = defs.i1_4th_end_c 98 | elif order == 3: 99 | no = defs.i1_3rd_end_c 100 | elif order == 2: 101 | no = defs.i1_2nd_end_c 102 | elif order == 1: 103 | no = defs.i1_1st_end_c 104 | else: # order == 0: 105 | no = defs.i1_0th_end_c 106 | 107 | return no 108 | 109 | # Return the number of DOFs in the reduced system, corresponding to an original (unreduced) number of DOFs n and a knowns mask. 
110 | # 111 | # n : in, number of DOFs in the original (unreduced) system 112 | # mask : in, bitmask of knowns 113 | # 114 | cdef int number_of_reduced_dofs( int n, long long mask ) nogil: 115 | cdef int ne = __builtin_popcountll(mask) # number of eliminated DOFs = number of bits set in mask 116 | return n - ne # remaining DOFs 117 | 118 | # Reduce the system size by removing the rows/columns for knowns from the DOF numbering. 119 | # 120 | # Specifically: 121 | # 122 | # Given a bitmask of DOFs to eliminate, construct DOF number mappings 123 | # between the original full equation system and the reduced equation system. 124 | # 125 | # o2r : out, mapping original --> reduced; size (n,), must be allocated by caller 126 | # r2o : out, mapping reduced --> original; size (n,), must be allocated by caller 127 | # n : in, number of DOFs in the original (unreduced) system 128 | # mask : in, bitmask of knowns 129 | # 130 | # return value: the number of DOFs in the reduced system. 131 | # 132 | # In the arrays, non-existent DOFs will be represented by the special value -1. 133 | # 134 | # In o2r (original->reduced), non-existent DOFs are those that were eliminated 135 | # (hence have no index in the reduced system). 136 | # 137 | # In r2o (reduced->original), non-existent DOFs are those with index >= n_reduced, 138 | # where n_reduced = (n - n_eliminated), since the reduced system has only n_reduced DOFs in total. 139 | # 140 | cdef int remap( int* o2r, int* r2o, int n, long long mask ) nogil: # o = original, r = reduced 141 | # We always start the elimination with a full range(n) of DOFs. 
142 | # 143 | # For example, if we have 4 DOFs, and we would like to eliminate the DOF "1", 144 | # we construct the following mappings: 145 | # 146 | # orig -> reduced 147 | # 148 | # 0 -> 0 149 | # 1 -> -1 (original DOF "1" does not exist in reduced system) 150 | # 2 -> 1 151 | # 3 -> 2 152 | # 153 | # reduced -> orig 154 | # 155 | # 0 -> 0 156 | # 1 -> 2 157 | # 2 -> 3 158 | # 3 -> -1 (in the reduced system, there is no DOF "3") 159 | # 160 | # The right-hand sides can be expressed in array form, using the left-hand side as the array index: 161 | # 162 | # orig->reduced: [0, -1, 1, 2] 163 | # reduced->orig: [0, 2, 3, -1] 164 | # 165 | # These arrays are the output format. 166 | # 167 | # This says that e.g. the DOF "2" of orig maps to DOF "1" of reduced (array orig->reduced, index 2). 168 | # The DOF "1" of reduced maps to DOF "2" of orig (array reduced->orig, index 1). 169 | 170 | # We first generate orig -> reduced. 171 | # 172 | cdef int j, k=0 # k = first currently available DOF number (0-based) in the reduced system 173 | for j in range(n): 174 | if mask & (1LL << j): # eliminate this DOF? 175 | o2r[j] = -1 176 | else: 177 | o2r[j] = k 178 | k += 1 # a DOF was introduced into the reduced system 179 | 180 | # k is now the number of DOFs in the reduced system 181 | 182 | # Construct the inverse to obtain reduced -> orig. See the example above. 183 | # 184 | for j in range(n): 185 | if o2r[j] == -1: 186 | continue 187 | r2o[ o2r[j] ] = j 188 | 189 | # In the reduced -> orig mapping, set the rest to -1, since the reduced system 190 | # has fewer DOFs than the original one. 191 | # 192 | for j in range(k, n): 193 | r2o[j] = -1 194 | 195 | return k 196 | 197 | 198 | ################################################# 199 | # class Allocator: 200 | ################################################# 201 | 202 | # To avoid memory fragmentation in the case with many instances of the model being fitted at once, 203 | # we use a custom memory allocator. 
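The o2r/r2o construction implemented by remap() above can be modeled in plain Python. This standalone sketch (hypothetical, for illustration only; not part of the library) reproduces the worked example from the comments:

```python
def remap(n, mask):
    """Build DOF mappings original->reduced (o2r) and reduced->original (r2o).

    Mirrors the logic of the C-level remap(): DOFs whose bit is set in
    `mask` are eliminated; nonexistent DOFs are marked with -1.
    """
    o2r = [-1] * n
    r2o = [-1] * n
    k = 0  # next free DOF number in the reduced system
    for j in range(n):
        if not (mask & (1 << j)):  # keep this DOF
            o2r[j] = k
            k += 1
    for j in range(n):
        if o2r[j] != -1:
            r2o[o2r[j]] = j  # invert the mapping
    return o2r, r2o, k  # k = number of DOFs in the reduced system

# The example from the comments: 4 DOFs, eliminate DOF "1" (bit 1 set in the mask).
o2r, r2o, nr = remap(4, 0b0010)
print(o2r)  # [0, -1, 1, 2]
print(r2o)  # [0, 2, 3, -1]
print(nr)   # 3
```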
204 | # 205 | # This is very simplistic; we do not need to support the re-use of already allocated blocks. 206 | # 207 | # Example: 208 | # int total_size_bytes = 1000000 # 1 MB 209 | # Allocator* a = Allocator_new( ALLOC_MODE_ONEBIGBUFFER, total_size_bytes ) 210 | # int* my_block_1 = Allocator_malloc( a, 100*sizeof(int) ) 211 | # # ...other Allocator_malloc()'s... 212 | # # ... 213 | # Allocator_free( a, my_block_1 ) 214 | # # ...other Allocator_free()'s... 215 | # Allocator_del( a ) 216 | 217 | # Object-oriented programming, C style. 218 | cdef int ALLOC_MODE_PASSTHROUGH = 1 # pass each call through to C malloc/free 219 | cdef int ALLOC_MODE_ONEBIGBUFFER = 2 # pre-allocate one big buffer to fit everything in 220 | 221 | # Constructor. 222 | # 223 | # This class is only intended to be used from a Python thread. 224 | # 225 | # Note that ALLOC_MODE_PASSTHROUGH doesn't use total_size_bytes, but .pxd files do not support default values for function arguments, 226 | # and Cython's "int total_size_bytes=*" syntax (in the .pxd file) does not support nogil functions. 
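The bump-pointer bookkeeping that Allocator_malloc() performs in ALLOC_MODE_ONEBIGBUFFER mode can be sketched in pure Python (a hypothetical illustration, not part of the library; integer offsets into an imaginary buffer stand in for pointers):

```python
class BumpAllocator:
    """Pure-Python model of the one-big-buffer mode (illustrative only)."""
    def __init__(self, size_total):
        self.size_total = size_total  # capacity of the big buffer, bytes
        self.size_used = 0            # bytes handed out so far

    def malloc(self, size_bytes):
        # Return the start offset of a fresh block, or None if the buffer is full.
        if size_bytes > self.size_total - self.size_used:
            return None  # the Cython version returns NULL here
        p = self.size_used
        self.size_used += size_bytes  # bump the pointer past the new block
        return p

    def free(self, p):
        pass  # no-op: this simplistic allocator never reuses blocks

    def size_remaining(self):
        return self.size_total - self.size_used

a = BumpAllocator(1000)
print(a.malloc(400))       # 0
print(a.malloc(400))       # 400
print(a.malloc(400))       # None (only 200 bytes left)
print(a.size_remaining())  # 200
```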
227 | # 228 | cdef Allocator* Allocator_new( int mode, int total_size_bytes ) nogil except 0: 229 | cdef Allocator* self = <Allocator*>malloc( sizeof(Allocator) ) 230 | if self == 0: # we promised Cython not to return NULL, so we must raise if the malloc fails 231 | with gil: 232 | raise MemoryError("Out of memory trying to allocate an Allocator object") 233 | 234 | if mode == ALLOC_MODE_ONEBIGBUFFER and total_size_bytes > 0: 235 | self.buffer = malloc( total_size_bytes ) 236 | if self.buffer == 0: 237 | with gil: 238 | raise MemoryError("Out of memory trying to allocate a buffer of %d bytes" % (total_size_bytes)) 239 | else: 240 | self.buffer = 0 241 | 242 | self.mode = mode 243 | self.size_total = total_size_bytes 244 | self.p = self.buffer 245 | self.size_used = 0 246 | 247 | return self 248 | 249 | cdef void* Allocator_malloc( Allocator* self, int size_bytes ) nogil: 250 | if self.mode == ALLOC_MODE_PASSTHROUGH: 251 | # with gil: 252 | # print( "directly allocating %d bytes" % (size_bytes) ) # DEBUG 253 | return malloc( size_bytes ) 254 | 255 | # else... 256 | 257 | # pathological case: no buffer, can't allocate 258 | if self.buffer == 0: 259 | return 0 260 | 261 | # check that there is enough space remaining in the buffer 262 | cdef int size_remaining = self.size_total - self.size_used 263 | if size_bytes > size_remaining: 264 | # with gil: 265 | # print( "buffer full, cannot allocate %d bytes" % (size_bytes) ) # DEBUG 266 | return 0 267 | 268 | cdef void* p 269 | with gil: # since we are called from Python threads only, we can use the GIL to make the operation thread-safe. (TODO/FIXME: well, not exactly, see e.g. 
http://www.slideshare.net/dabeaz/an-introduction-to-python-concurrency ) 270 | # print( "reserving %d bytes from buffer of size %d; after alloc, %d bytes remaining" % (size_bytes, self.size_total, size_remaining - size_bytes) ) # DEBUG 271 | p = self.p 272 | self.p = <void*>( <char*>p + size_bytes ) 273 | self.size_used += size_bytes 274 | 275 | return p 276 | 277 | cdef void Allocator_free( Allocator* self, void* p ) nogil: 278 | if self.mode == ALLOC_MODE_PASSTHROUGH: 279 | free( p ) 280 | # else do nothing; this simplistic allocator doesn't reuse blocks once they are allocated 281 | 282 | cdef int Allocator_size_remaining( Allocator* self ) nogil: 283 | return self.size_total - self.size_used 284 | 285 | # Destructor. 286 | cdef void Allocator_del( Allocator* self ) nogil: 287 | if self != 0: 288 | free( self.buffer ) 289 | free( self ) 290 | 291 | 292 | ################################################# 293 | # class CaseManager: 294 | ################################################# 295 | 296 | # Constructor. 297 | # 298 | # max_cases is mandatory (with an invalid default value since no sane default can exist). 299 | # 300 | # ntasks is for parallel processing at solve time; effectively, it specifies how many per-task arrays to allocate. 301 | # When processing serially, use the value 1. 302 | # 303 | cdef CaseManager* CaseManager_new( int dimension, int do_sens, int iterative, int max_cases, int ntasks ) nogil except 0: 304 | # Generally speaking, fixing the array size at instantiation time is stupid (an automatically expanding buffer would be better), 305 | # but considering that this class has only one actual user, where we do know max_cases in advance, it is fine for our purposes. 
306 | if max_cases < 1: 307 | with gil: 308 | raise ValueError("Must specify max_cases > 0 when creating a CaseManager.") 309 | 310 | cdef CaseManager* self = <CaseManager*>malloc( sizeof(CaseManager) ) 311 | if self == 0: # we promised Cython not to return NULL, so we must raise if the malloc fails 312 | with gil: 313 | raise MemoryError("Out of memory trying to allocate a CaseManager object") 314 | 315 | # init parallel proc 316 | # 317 | self.ntasks = ntasks 318 | self.wrks = <double**>malloc( ntasks*sizeof(double*) ) 319 | self.fk_tmps = <double**>malloc( ntasks*sizeof(double*) ) 320 | self.fi_tmps = <double**>malloc( ntasks*sizeof(double*) ) 321 | for j in range(ntasks): # init to NULL needed to gracefully handle errors in CaseManager_allocate() 322 | self.wrks[j] = 0 323 | self.fk_tmps[j] = 0 324 | self.fi_tmps[j] = 0 325 | 326 | # init storage for Case pointers 327 | # 328 | self.cases = <Case**>malloc( max_cases*sizeof(Case*) ) 329 | self.max_cases = max_cases 330 | self.ncases = 0 331 | 332 | # save metadata 333 | # 334 | self.dimension = dimension 335 | self.do_sens = do_sens 336 | self.iterative = iterative 337 | 338 | # these will be set up at allocate time 339 | # 340 | self.mal = 0 341 | self.bytes_needed = 0 342 | 343 | return self 344 | 345 | # Add a case to this manager. 346 | # 347 | # This means the manager will manage the memory for the Case, and at destruction time, 348 | # will also destroy the managed Case. 349 | # 350 | # Up to max_cases Case objects can be added to the manager (see CaseManager_new()). 351 | # 352 | # Case_new() will call this automatically, if a manager is specified. 
353 | # 354 | cdef int CaseManager_add( CaseManager* self, Case* case ) nogil except -1: 355 | # sanity check remaining space 356 | if self.ncases == self.max_cases: 357 | with gil: 358 | raise MemoryError("Case pointer buffer full, max_cases = %d reached" % self.max_cases) 359 | 360 | # sanity check Case metadata for compatibility with this CaseManager instance 361 | if case.dimension != self.dimension: 362 | with gil: 363 | raise ValueError("Cannot add case with different dimension = %d; this manager has dimension = %d" % (case.dimension, self.dimension)) 364 | if case.do_sens != self.do_sens: 365 | with gil: 366 | raise ValueError("Cannot add case with different setting for do_sens = %d; this manager has do_sens = %d" % (case.do_sens, self.do_sens)) 367 | if case.iterative != self.iterative: 368 | with gil: 369 | raise ValueError("Cannot add case with different setting for iterative = %d; this manager has iterative = %d" % (case.iterative, self.iterative)) 370 | 371 | # add the Case to the managed cases. 372 | self.cases[self.ncases] = case 373 | self.ncases += 1 374 | 375 | return 0 376 | 377 | # Finish adding Case objects. Prepare the manaager for solving. 378 | # 379 | # This should be called exactly once (per instance of CaseManager), after all cases have been CaseManager_add()'d. 380 | # 381 | cdef int CaseManager_commit( CaseManager* self ) nogil except -1: 382 | return CaseManager_allocate( self ) 383 | 384 | # Create the memory buffer that will contain the data arrays for all cases. 
385 | # 386 | cdef int CaseManager_allocate( CaseManager* self ) nogil except -1: 387 | cdef int j, problem_instance_bytes=0, task_bytes=0 388 | cdef int max_nk=0, max_no=0, max_nr=0 389 | cdef int size_wrk=0, size_fk_tmp=0, size_fi_tmp=0 390 | cdef BufferSizes sizes 391 | cdef Case* case 392 | with gil: 393 | try: 394 | if self.ncases == 0: 395 | raise ValueError("No cases; add some before allocating") 396 | 397 | # Determine total amount of memory needed by the per-problem-instance arrays. 398 | # 399 | # Also, find max_nk for allocation of per-task arrays. (nk may vary across cases) 400 | # 401 | for j in range(self.ncases): 402 | case = self.cases[j] 403 | Case_determine_sizes( case, &sizes ) 404 | 405 | problem_instance_bytes += sizes.total 406 | 407 | if case.nk > max_nk: 408 | max_nk = case.nk 409 | if case.no > max_no: 410 | max_no = case.no 411 | if case.nr > max_nr: 412 | max_nr = case.nr 413 | 414 | # Determine maximum needed size for one instance of the per-task arrays, 415 | # when working on this set of cases. 416 | # 417 | if self.do_sens: 418 | size_wrk = max_nr*(max_nk + 1)*sizeof(double) 419 | else: 420 | size_wrk = max_nr*sizeof(double) 421 | 422 | if self.iterative: 423 | size_fk_tmp = max_nk*sizeof(double) 424 | size_fi_tmp = max_no*sizeof(double) 425 | else: 426 | size_fk_tmp = 0 427 | size_fi_tmp = 0 428 | 429 | # The total for the per-task arrays is then just ntasks copies: 430 | # 431 | task_bytes = self.ntasks*(size_wrk + size_fk_tmp + size_fi_tmp) 432 | 433 | # Final total of memory needed is thus: 434 | # 435 | self.bytes_needed = task_bytes + problem_instance_bytes 436 | 437 | # NOTE: this may be big (e.g. 
~5.5kB per problem instance for dimension=2, order=4, nk=25, 438 | # so for a moderate number of 1e4 problem instances, this is already 55MB) 439 | # 440 | # The good news is that even if multiple fits (against different data) are performed with the same points, 441 | # we can simply let the buffer be - there is no need to re-create it for each run. 442 | # 443 | self.mal = Allocator_new( mode=ALLOC_MODE_ONEBIGBUFFER, total_size_bytes=self.bytes_needed ) 444 | 445 | # Allocate the per-task arrays. 446 | # 447 | for j in range(self.ntasks): 448 | self.wrks[j] = <double*>Allocator_malloc( self.mal, size_wrk ) 449 | self.fk_tmps[j] = <double*>Allocator_malloc( self.mal, size_fk_tmp ) 450 | self.fi_tmps[j] = <double*>Allocator_malloc( self.mal, size_fi_tmp ) 451 | 452 | # Finally, tell the Case objects to allocate their memory. 453 | # 454 | # They will automatically grab our allocator, since they are in managed mode. 455 | # 456 | for j in range(self.ncases): 457 | Case_allocate( self.cases[j] ) 458 | 459 | except: 460 | # on error, leave the CaseManager in the state it was in before this method was called. 461 | CaseManager_deallocate( self ) 462 | raise 463 | 464 | return 0 465 | 466 | # The opposite of CaseManager_allocate(). 467 | # 468 | cdef void CaseManager_deallocate( CaseManager* self ) nogil: 469 | if self != 0: 470 | self.bytes_needed = 0 471 | 472 | if self.mal != 0: # the allocator instantiation may have failed, so make sure we have an allocator before attempting this 473 | # the managed Case objects also use our allocator 474 | for j in range(self.ncases): 475 | Case_deallocate( self.cases[j] ) # it is safe to Case_deallocate() also a Case that has not yet been Case_allocate()'d. 
476 | 477 | for j in range(self.ntasks): 478 | Allocator_free( self.mal, self.fi_tmps[j] ) 479 | self.fi_tmps[j] = 0 480 | Allocator_free( self.mal, self.fk_tmps[j] ) 481 | self.fk_tmps[j] = 0 482 | Allocator_free( self.mal, self.wrks[j] ) 483 | self.wrks[j] = 0 484 | 485 | Allocator_del( self.mal ) 486 | self.mal = 0 487 | 488 | # Destructor. Destroys also the managed Case objects. 489 | # 490 | cdef void CaseManager_del( CaseManager* self ) nogil: 491 | cdef int j 492 | if self != 0: 493 | CaseManager_deallocate( self ) 494 | 495 | # destroy the managed Case objects 496 | for j in range(self.ncases): 497 | Case_del( self.cases[j] ) 498 | 499 | # free manually allocated storage 500 | free( self.cases ) 501 | free( self.fi_tmps ) 502 | free( self.fk_tmps ) 503 | free( self.wrks ) 504 | 505 | free( self ) 506 | 507 | 508 | ################################################# 509 | # class Case: 510 | ################################################# 511 | 512 | # Constructor. 513 | # 514 | # This class is only intended to be instantiated from a Python thread. 515 | # 516 | # manager: an already existing CaseManager object to use, to share the memory allocator among a set of cases. 517 | # The cases must have the same dimension, do_sens, iterative. 518 | # 519 | # If null, an Allocator will be created locally. 520 | # 521 | # host: for guest mode, an existing Case object to use. The geometry data (o2r,r2o,c,w,A,row_scale,col_scale,ipiv) will be borrowed off the host, 522 | # and no local copies will be created. 523 | # 524 | # This can be used to save both memory and time when different fields (in an IBVP problem) live on the exact same geometry. 525 | # "Geometry" includes both xi,yi,zi (point "xi") and the neighbor set (points "xk"; see wlsqm.fitter.impl.make_c_?D()). 526 | # 527 | # Thus, the host Case instance must have the exact same parameters (and geometry!) as the Case instance being created. 
528 | "Parameters" include dimension, order, nk, knowns, weighting_method. 529 | # 530 | # The parameter match is not checked! See wlsqm.fitter.expert.ExpertSolver for correct usage. 531 | # (It does some rudimentary checking, but does not check the geometry.) 532 | # 533 | # When using guest mode, the calling code must make sure the host instance stays alive at least as long as its guest instances, 534 | # or hope for a crash. 535 | # 536 | # If null, the geometry data will be allocated locally. 537 | # 538 | cdef Case* Case_new( int dimension, int order, double xi, double yi, double zi, int nk, long long knowns, int weighting_method, int do_sens, int iterative, CaseManager* manager, Case* host ) nogil except 0: 539 | cdef Case* self = <Case*>malloc( sizeof(Case) ) 540 | if self == 0: # we promised Cython not to return NULL, so we must raise if the malloc fails 541 | with gil: 542 | raise MemoryError("Out of memory trying to allocate a Case object") 543 | 544 | # tag unused components as NaN 545 | cdef double zero = 0 546 | cdef double nan = zero/zero # NaN as per IEEE-754 547 | self.xi = xi 548 | self.yi = yi if dimension >= 2 else nan 549 | self.zi = zi if dimension == 3 else nan 550 | 551 | # init data pointers to NULL to make it safe to dealloc partially initialized Case (when something goes wrong) 552 | self.o2r = 0 553 | self.r2o = 0 554 | self.c = 0 555 | self.w = 0 556 | self.A = 0 557 | self.row_scale = 0 558 | self.col_scale = 0 559 | self.ipiv = 0 560 | self.fi = 0 561 | self.fi2 = 0 562 | self.wrk = 0 563 | self.fk_tmp = 0 564 | self.fi_tmp = 0 565 | 566 | if host == 0: 567 | self.geometry_owned = 1 568 | 569 | # set condition numbers to nan until computed (only computed if wlsqm.fitter.impl.prepare() is called with the debug flag set!) 
570 | self.cond_orig = nan 571 | self.cond_scaled = nan 572 | 573 | else: 574 | self.geometry_owned = 0 575 | self.o2r = host.o2r 576 | self.r2o = host.r2o 577 | self.c = host.c 578 | self.w = host.w 579 | self.A = host.A 580 | self.row_scale = host.row_scale 581 | self.col_scale = host.col_scale 582 | self.ipiv = host.ipiv 583 | 584 | # these may have been computed by host (this only works if the host has had preprocess_A() called on it already, but this is the best we can do) 585 | self.cond_orig = host.cond_orig 586 | self.cond_scaled = host.cond_scaled 587 | 588 | # Use the data from CaseManager if given 589 | self.have_manager = (manager != 0) 590 | self.manager = manager # (copies the pointer also if NULL) 591 | self.mal = 0 # this will be filled at allocate time 592 | 593 | # determine number of DOFs in the original (unreduced) and reduced systems (needed to determine array sizes) 594 | cdef int no = number_of_dofs( dimension, order ) 595 | cdef int nr = number_of_reduced_dofs( no, knowns ) 596 | 597 | # save metadata 598 | self.dimension = dimension 599 | self.order = order 600 | self.knowns = knowns 601 | self.weighting_method = weighting_method 602 | self.no = no 603 | self.nr = nr 604 | self.nk = nk 605 | self.do_sens = do_sens 606 | self.iterative = iterative 607 | 608 | # Now the Case is in a half-initialized state, with metadata available, but no memory allocated yet. 609 | 610 | if self.have_manager: 611 | # In managed mode, cases automatically add themselves to the manager. 612 | with gil: 613 | try: 614 | CaseManager_add( self.manager, self ) # this may raise if the buffer is full 615 | except: 616 | free( self ) 617 | raise 618 | else: 619 | # In unmanaged mode, for caller convenience, cases fully initialize themselves, since there is no separate allocate step 620 | # that depends on having all the cases available (to compute the final buffer size). 
621 | Case_allocate( self ) 622 | 623 | return self 624 | 625 | # Getters for work space pointers for parallel task "taskid" (0, 1, ..., ntasks-1). 626 | # 627 | # In managed mode, the work spaces live in the manager (there are ntasks copies). 628 | # 629 | # In unmanaged mode, each Case has its own work space. 630 | # 631 | cdef double* Case_get_wrk( Case* self, int taskid ) nogil: 632 | if self.have_manager: 633 | return self.manager.wrks[taskid] 634 | else: 635 | return self.wrk 636 | 637 | cdef double* Case_get_fk_tmp( Case* self, int taskid ) nogil: 638 | if self.have_manager: 639 | return self.manager.fk_tmps[taskid] 640 | else: 641 | return self.fk_tmp 642 | 643 | cdef double* Case_get_fi_tmp( Case* self, int taskid ) nogil: 644 | if self.have_manager: 645 | return self.manager.fi_tmps[taskid] 646 | else: 647 | return self.fi_tmp 648 | 649 | # Helper: convert an (nk,) array of squared distances to corresponding weights. 650 | # 651 | # w : in/out. On entry, squared distances from xi to each xk. 652 | # On exit, weight factors for each xk. 653 | # nk : in, number of neighbor points (i.e. points xk) 654 | # max_d2 : in, the largest squared distance seen (i.e. the max element of input w). 655 | # This is used for normalization. 656 | # weighting_method : in, one of the constants WEIGHT_*. Specifies the type of weighting to use; 657 | # different weightings are good for different use cases of WLSQM. 658 | # 659 | cdef void Case_make_weights( Case* self, double max_d2 ) nogil: 660 | cdef double* w = self.w 661 | cdef int nk = self.nk 662 | cdef int weighting_method = self.weighting_method 663 | 664 | # no-op in guest mode (weights already computed in the host Case instance) 665 | if not self.geometry_owned: 666 | return 667 | 668 | cdef int k 669 | cdef double d2, tmp 670 | if weighting_method == defs.WEIGHT_UNIFORM_c: 671 | # Trivial weighting. Don't use distance information, treat all points as equally important. 
672 | # 673 | # This gives the best overall fit of function values across all points xk, 674 | # at the cost of accuracy of derivatives at the point xi. 675 | # 676 | # (Essentially, this cost is because derivatives are local, so the information 677 | # from far-away points corrupts them.) 678 | # 679 | for k in range(nk): 680 | w[k] = 1. 681 | 682 | else: # weighting_method == defs.WEIGHT_CENTER_c: 683 | # Emphasize points close to xi. 684 | # 685 | # Improves the fit of derivatives at the point xi, at the cost of the overall fit 686 | # of function values at points xk that are (relatively speaking) distant from xi. 687 | # 688 | for k in range(nk): 689 | d2 = w[k] # the array w originally contains squared distances (without normalization) 690 | 691 | # distance squared, flipped on the distance axis (fast falloff near origin) 692 | DEF alpha = 1e-4 # weight remaining at maximum distance 693 | DEF beta = 1. - alpha 694 | tmp = 1. - sqrt(d2 / max_d2) 695 | w[k] = alpha + beta * tmp*tmp 696 | 697 | # Determine how many bytes of memory this Case will need for storing its arrays. 698 | # 699 | # Write the result into the given BufferSizes struct. 700 | # 701 | cdef void Case_determine_sizes( Case* self, BufferSizes* sizes ) nogil: 702 | cdef int no = self.no 703 | cdef int nr = self.nr 704 | cdef int nk = self.nk 705 | cdef int do_sens = self.do_sens 706 | cdef int iterative = self.iterative 707 | 708 | if self.geometry_owned: 709 | sizes.o2r = no*sizeof(int) # (no,) 710 | sizes.r2o = no*sizeof(int) # (no,) 711 | sizes.c = nk*no*sizeof(double) # (nk,no), C-contiguous 712 | sizes.w = nk*sizeof(double) # (nk,) 713 | sizes.A = nr*nr*sizeof(double) # (nr, nr), Fortran-contiguous 714 | sizes.row_scale = nr*sizeof(double) # (nr,) 715 | sizes.col_scale = nr*sizeof(double) # (nr,) 716 | sizes.ipiv = nr*sizeof(int) # (nr,) 717 | else: 718 | # This function computes only bytes needed, so in guest mode we can put zeroes here. 
719 | # This function is not used for determining the number of elements in anything. 720 | sizes.o2r = 0 721 | sizes.r2o = 0 722 | sizes.c = 0 723 | sizes.w = 0 724 | sizes.A = 0 725 | sizes.row_scale = 0 726 | sizes.col_scale = 0 727 | sizes.ipiv = 0 728 | 729 | # The coefficient array is always needed. 730 | # 731 | sizes.fi = no*sizeof(double) # (no,) 732 | 733 | # For any polynomial of degree d >= 1, its (non-zero) derivatives are a polynomial of degree d - 1. 734 | # In the zeroth order case, the derivative is everywhere zero. 735 | cdef int no2 = number_of_dofs( self.dimension, self.order - 1 ) if self.order >= 1 else 0 736 | sizes.fi2 = no2*sizeof(double) # (no2,) 737 | 738 | # per-task work space arrays 739 | # 740 | if self.have_manager: 741 | # in managed mode, CaseManager will allocate one copy of the per-task (not per-problem-instance) arrays 742 | sizes.wrk = 0 743 | sizes.fk_tmp = 0 744 | sizes.fi_tmp = 0 745 | else: 746 | # unmanaged mode - allocate also the per-task arrays locally. 747 | # see the header comment of solve() for the solver work space sizes 748 | if do_sens: 749 | sizes.wrk = nr*(nk + 1)*sizeof(double) 750 | else: 751 | sizes.wrk = nr*sizeof(double) 752 | 753 | if iterative: 754 | sizes.fk_tmp = nk*sizeof(double) 755 | sizes.fi_tmp = no*sizeof(double) 756 | else: 757 | sizes.fk_tmp = 0 758 | sizes.fi_tmp = 0 759 | 760 | sizes.total = sizes.o2r + sizes.r2o \ 761 | + sizes.c + sizes.w \ 762 | + sizes.A + sizes.row_scale + sizes.col_scale + sizes.ipiv \ 763 | + sizes.fi + sizes.fi2 \ 764 | + sizes.wrk \ 765 | + sizes.fk_tmp + sizes.fi_tmp 766 | 767 | # Load user-given data into the coefficients fi. 768 | # 769 | # The length of the input is assumed to be self.no. 770 | # 771 | # This can be used to populate knowns. 
772 | # 773 | cdef void Case_set_fi( Case* self, double* fi ) nogil: 774 | cdef double* my_fi = self.fi 775 | cdef int no = self.no 776 | cdef int om 777 | for om in range(no): 778 | my_fi[om] = fi[om] 779 | 780 | # Populate user-given array of length self.no 781 | # with the solution data (coefficients self.fi). 782 | # 783 | cdef void Case_get_fi( Case* self, double* out ) nogil: 784 | cdef double* my_fi = self.fi 785 | cdef int no = self.no 786 | cdef int om 787 | for om in range(no): 788 | out[om] = my_fi[om] 789 | 790 | # Perform memory allocation. 791 | # 792 | cdef int Case_allocate( Case* self ) nogil except -1: 793 | # At this point, the constructor has finished, so we have no, nr, nk and the flags (do_sens, iterative). 794 | # 795 | # Calculate the (space-)optimal buffer size for ONEBIGBUFFER mode. 796 | # 797 | # (This also calculates the various individual sizes, which we will use to actually allocate the memory.) 798 | # 799 | cdef BufferSizes sizes 800 | Case_determine_sizes( self, &sizes ) 801 | 802 | # Acquire or create the custom memory allocator to allocate storage for actual data. 803 | # 804 | cdef int size_remaining=-1 805 | cdef Allocator* mal 806 | with gil: 807 | try: 808 | if self.have_manager: # managed mode - external allocator given; check that it has enough space for us. 809 | mal = self.manager.mal 810 | size_remaining = Allocator_size_remaining( mal ) 811 | if size_remaining < sizes.total: 812 | raise MemoryError("%d bytes of memory needed, but the given allocator has only %d bytes remaining." % (sizes.total, size_remaining)) 813 | 814 | else: # unmanaged mode - instantiate our own Allocator. The Allocator constructor will raise MemoryError if it runs out of memory. 815 | mal = Allocator_new( mode=ALLOC_MODE_ONEBIGBUFFER, total_size_bytes=sizes.total ) 816 | except: 817 | free( self ) 818 | raise 819 | self.mal = mal 820 | 821 | # Allocate the storage, using the custom allocator. 
822 |     #
823 |     if self.geometry_owned:
824 |         self.o2r = <int*>Allocator_malloc( mal, sizes.o2r )
825 |         self.r2o = <int*>Allocator_malloc( mal, sizes.r2o )
826 | 
827 |         self.c = <double*>Allocator_malloc( mal, sizes.c )
828 |         self.w = <double*>Allocator_malloc( mal, sizes.w )
829 | 
830 |         self.A = <double*>Allocator_malloc( mal, sizes.A )
831 |         self.row_scale = <double*>Allocator_malloc( mal, sizes.row_scale )
832 |         self.col_scale = <double*>Allocator_malloc( mal, sizes.col_scale )
833 |         self.ipiv = <int*>Allocator_malloc( mal, sizes.ipiv )
834 | 
835 |     self.fi = <double*>Allocator_malloc( mal, sizes.fi )
836 | 
837 |     if sizes.fi2:
838 |         self.fi2 = <double*>Allocator_malloc( mal, sizes.fi2 )
839 |     else:
840 |         self.fi2 = 0
841 | 
842 |     if self.have_manager:
843 |         # in managed mode, CaseManager will allocate one copy of the per-task (not per-problem-instance) arrays
844 |         self.wrk = 0
845 |         self.fk_tmp = 0
846 |         self.fi_tmp = 0
847 | 
848 |     else:
849 |         # unmanaged mode - allocate the per-task arrays locally.
850 |         self.wrk = <double*>Allocator_malloc( mal, sizes.wrk )
851 | 
852 |         if self.iterative:
853 |             self.fk_tmp = <double*>Allocator_malloc( mal, sizes.fk_tmp )
854 |             self.fi_tmp = <double*>Allocator_malloc( mal, sizes.fi_tmp )
855 |         else:
856 |             self.fk_tmp = 0
857 |             self.fi_tmp = 0
858 | 
859 |     # memory allocated; populate o2r and r2o
860 |     #
861 |     # (in guest mode, this has already been done in the host Case instance)
862 |     #
863 |     if self.geometry_owned:
864 |         remap( self.o2r, self.r2o, self.no, self.knowns )
865 | 
866 |     return 0
867 | 
868 | # The opposite of Case_allocate().
869 | #
870 | cdef void Case_deallocate( Case* self ) nogil:
871 |     if self != 0:
872 |         # No guarantee that we'll be called only from the destructor;
873 |         # we must set any deallocated pointers to NULL.
874 | Allocator_free( self.mal, self.fi_tmp ) 875 | self.fi_tmp = 0 876 | Allocator_free( self.mal, self.fk_tmp ) 877 | self.fk_tmp = 0 878 | 879 | Allocator_free( self.mal, self.wrk ) 880 | self.wrk = 0 881 | 882 | Allocator_free( self.mal, self.fi2 ) 883 | self.fi2 = 0 884 | Allocator_free( self.mal, self.fi ) 885 | self.fi = 0 886 | 887 | if self.geometry_owned: 888 | Allocator_free( self.mal, self.ipiv ) 889 | self.ipiv = 0 890 | Allocator_free( self.mal, self.col_scale ) 891 | self.col_scale = 0 892 | Allocator_free( self.mal, self.row_scale ) 893 | self.row_scale = 0 894 | Allocator_free( self.mal, self.A ) 895 | self.A = 0 896 | 897 | Allocator_free( self.mal, self.w ) 898 | self.w = 0 899 | Allocator_free( self.mal, self.c ) 900 | self.c = 0 901 | 902 | Allocator_free( self.mal, self.r2o ) 903 | self.r2o = 0 904 | Allocator_free( self.mal, self.o2r ) 905 | self.o2r = 0 906 | else: 907 | # In guest mode, the allocation of these arrays is managed by the host Case instance. 908 | self.ipiv = 0 909 | self.col_scale = 0 910 | self.row_scale = 0 911 | self.A = 0 912 | self.w = 0 913 | self.c = 0 914 | self.r2o = 0 915 | self.o2r = 0 916 | 917 | # Destructor. 918 | # 919 | cdef void Case_del( Case* self ) nogil: 920 | if self != 0: 921 | Case_deallocate( self ) 922 | 923 | if not self.have_manager: 924 | Allocator_del( self.mal ) 925 | 926 | free( self ) 927 | 928 | -------------------------------------------------------------------------------- /wlsqm/fitter/interp.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Interpolation of fitted surrogate model. 6 | # 7 | # C API definitions. 
8 | # 9 | # JJ 2016-11-30 10 | 11 | from __future__ import absolute_import 12 | 13 | from cython cimport view 14 | 15 | cimport wlsqm.fitter.infra as infra 16 | 17 | cdef int interpolate_nD( infra.Case* case, double[::view.generic,::view.contiguous] xManyD, double[::view.generic] x1D, double* out, int diff ) nogil 18 | 19 | cdef int interpolate_3D( infra.Case* case, double[::view.generic,::view.contiguous] x, double* out, int diff ) nogil 20 | cdef int interpolate_2D( infra.Case* case, double[::view.generic,::view.contiguous] x, double* out, int diff ) nogil 21 | cdef int interpolate_1D( infra.Case* case, double[::view.generic] x, double* out, int diff ) nogil 22 | 23 | -------------------------------------------------------------------------------- /wlsqm/fitter/polyeval.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds. 4 | # 5 | # Evaluation of Taylor expansions and general polynomials up to 4th order in 1D, 2D and 3D. 6 | # 7 | # C API definitions. 
8 | #
9 | # JJ 2016-12-09
10 | 
11 | from __future__ import absolute_import
12 | 
13 | from cython cimport view
14 | 
15 | cdef int taylor_3D( int order, double* fi, double xi, double yi, double zi, double[::view.generic,::view.contiguous] x, double* out ) nogil
16 | cdef int general_3D( int order, double* fi, double xi, double yi, double zi, double[::view.generic,::view.contiguous] x, double* out ) nogil
17 | 
18 | cdef int taylor_2D( int order, double* fi, double xi, double yi, double[::view.generic,::view.contiguous] x, double* out ) nogil
19 | cdef int general_2D( int order, double* fi, double xi, double yi, double[::view.generic,::view.contiguous] x, double* out ) nogil
20 | 
21 | cdef int taylor_1D( int order, double* fi, double xi, double[::view.generic] x, double* out ) nogil
22 | cdef int general_1D( int order, double* fi, double xi, double[::view.generic] x, double* out ) nogil
23 | 
24 | 
--------------------------------------------------------------------------------
/wlsqm/fitter/popcount.h:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | #ifdef _MSC_VER
4 | #include <intrin.h>
5 | #define __builtin_popcount __popcnt
6 | #define __builtin_popcountll __popcnt64
7 | #endif
--------------------------------------------------------------------------------
/wlsqm/fitter/simple.pxd:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # WLSQM (Weighted Least SQuares Meshless): a fast and accurate meshless least-squares interpolator for Python, for scalar-valued data defined as point values on 1D, 2D and 3D point clouds.
4 | #
5 | # Cython declarations for the main module. See the .pyx source for wlsqm.fitter.simple for documentation.
6 | #
7 | # JJ 2016-11-07
8 | 
9 | # Set Cython compiler directives. This section must appear before any code!
10 | # 11 | # For available directives, see: 12 | # 13 | # http://docs.cython.org/en/latest/src/reference/compilation.html 14 | # 15 | # cython: wraparound = False 16 | # cython: boundscheck = False 17 | # cython: cdivision = True 18 | 19 | # This module contains "driver" routines in the LAPACK sense. 20 | # The low-level C routines are contained in wlsqm.fitter.impl. 21 | 22 | from __future__ import absolute_import 23 | 24 | from cython cimport view # for usage, see http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#specifying-more-general-memory-layouts 25 | 26 | #################################################### 27 | # Single case (one neighborhood), single-threaded 28 | #################################################### 29 | 30 | cdef int generic_fit_basic( int dimension, double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D, double[::view.generic] fk, double[::1] xiManyD, double xi1D, double[::1] fi, 31 | double[::view.generic,::view.contiguous] sens, int do_sens, int order, long long knowns, int weighting_method, int debug ) nogil except -1 32 | 33 | cdef int generic_fit_iterative( int dimension, double[::view.generic,::view.contiguous] xkManyD, double[::view.generic] xk1D, double[::view.generic] fk, double[::1] xiManyD, double xi1D, double[::1] fi, 34 | double[::view.generic,::view.contiguous] sens, int do_sens, int order, long long knowns, int weighting_method, int max_iter, int debug ) nogil except -1 35 | 36 | #################################################### 37 | # Many cases, single-threaded 38 | #################################################### 39 | 40 | # single-threaded 41 | cdef int generic_fit_basic_many( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 42 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 43 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, 
double[::view.generic,::view.contiguous] fi, 44 | double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 45 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int debug ) nogil except -1 46 | 47 | cdef int generic_fit_iterative_many( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 48 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 49 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, double[::view.generic,::view.contiguous] fi, 50 | double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 51 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int max_iter, int debug ) nogil except -1 52 | 53 | #################################################### 54 | # Many cases, multithreaded 55 | #################################################### 56 | 57 | cdef int generic_fit_basic_many_parallel( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 58 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 59 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, double[::view.generic,::view.contiguous] fi, 60 | double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 61 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int ntasks, int debug ) nogil except -1 62 | 63 | cdef int generic_fit_iterative_many_parallel( int dimension, double[::view.generic,::view.generic,::view.contiguous] xkManyD, double[::view.generic,::view.generic] xk1D, 64 | double[::view.generic,::view.generic] fk, int[::view.generic] nk, 65 | double[::view.generic,::view.contiguous] xiManyD, double[::view.generic] xi1D, double[::view.generic,::view.contiguous] fi, 66 | 
double[::view.generic,::view.generic,::view.contiguous] sens, int do_sens, 67 | int[::view.generic] order, long long[::view.generic] knowns, int[::view.generic] weighting_method, int max_iter, int ntasks, int debug ) nogil except -1 68 | 69 | -------------------------------------------------------------------------------- /wlsqm/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Technologicat/python-wlsqm/b697d163c2d2bec46b4d9696467abaebb9d4cbb3/wlsqm/utils/__init__.py -------------------------------------------------------------------------------- /wlsqm/utils/lapackdrivers.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Cython interface for lapackdrivers.pyx. 4 | # 5 | # Naming scheme (in shellglob notation): 6 | # *s = multiple RHS (but with the same LHS for all). 7 | # These are one-shot deals that reuse the matrix factorization internally. 8 | # However, the pivot information is not returned, so the matrix A 9 | # is destroyed (overwritten) during the call. 10 | # 11 | # m* = multiple LHS (a separate single RHS for each) 12 | # These simply loop over the problem instances. 13 | # 14 | # *p = parallel (multi-threaded using OpenMP) 15 | # These introduce parallel looping over problem instances. 16 | # 17 | # *factor* = the routine that factors the matrix and generates pivot information. 18 | # *factored* = the solver routine that uses the factored matrix and pivot information. 19 | # These are useful for solving with many RHSs, when all the RHSs 20 | # are not available at once (e.g. in PDE solvers, timestepping 21 | # with a mass matrix that remains constant in time). 22 | # 23 | # *_c = C version without memoryviews (only visible from Cython). Can be more convenient 24 | # for use in nogil blocks, in cases where the arrays need to be allocated dynamically (with malloc). 
25 | # The purpose of the C version is to avoid the need to acquire the GIL to create a memoryview 26 | # into a malloc()'d array. 27 | # 28 | # This .pxd file offers access to only the C versions. To import the corresponding Python versions 29 | # of the routines (same name, without the _c suffix), import the module normally in Python. 30 | # 31 | # Note that the Python routines operate on memoryview slices (compatible with np.arrays), 32 | # so they have slightly different parameters and return values when compared to the C routines. 33 | # 34 | # Generally, the Python versions will allocate arrays for you, while the C versions expect you 35 | # to provide pointers to already malloc()'d memory (and explicit sizes). 36 | # 37 | # See the function docstrings and comments in the .pyx source for details. 38 | # 39 | # JJ 2016-11-07 40 | 41 | from __future__ import absolute_import 42 | 43 | ############################################################################################################## 44 | # Helpers 45 | ############################################################################################################## 46 | 47 | cdef void distribute_items_c( int nitems, int ntasks, int* blocksizes, int* baseidxs ) nogil # distribute work items across tasks, assuming equal load per item. 
48 | 49 | cdef void copygeneral_c( double* O, double* I, int nrows, int ncols ) nogil # copy general square array 50 | cdef void copysymmu_c( double* O, double* I, int nrows, int ncols ) nogil # copy symmetric square array, upper triangle only 51 | 52 | cdef void symmetrize_c( double* A, int nrows, int ncols ) nogil 53 | cdef void msymmetrize_c( double* A, int nrows, int ncols, int nlhs ) nogil 54 | cdef void msymmetrizep_c( double* A, int nrows, int ncols, int nlhs, int ntasks ) nogil 55 | 56 | ############################################################################################################## 57 | # Preconditioning (scaling) 58 | ############################################################################################################## 59 | 60 | # Scaling reduces the condition number of A, helping DGESV (general()) to give more correct digits. 61 | # 62 | # The return value is the number of iterations taken; always 1 for non-iterative algorithms. 63 | 64 | # helpers 65 | cdef void init_scaling_c( int nrows, int ncols, double* row_scale, double* col_scale ) nogil # init all scaling factors to 1.0 66 | cdef void apply_scaling_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil # freeze the scaling by applying it in-place 67 | 68 | # simple, fast methods; these destroy the possible symmetry of A 69 | cdef int rescale_columns_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 70 | cdef int rescale_rows_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 71 | cdef int rescale_twopass_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil # scale columns, then rows 72 | cdef int rescale_dgeequ_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 73 | 74 | # symmetry-preserving methods (iterative) 75 | cdef int rescale_ruiz2001_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 76 | cdef 
int rescale_scalgm_c( double* A, int nrows, int ncols, double* row_scale, double* col_scale ) nogil 77 | 78 | ############################################################################################################## 79 | # Tridiagonal matrices 80 | ############################################################################################################## 81 | 82 | cpdef int tridiag( double[::1] a, double[::1] b, double[::1] c, double[::1] x ) nogil except -1 83 | 84 | ############################################################################################################## 85 | # Symmetric matrices 86 | ############################################################################################################## 87 | 88 | cdef int symmetric2x2_c( double* A, double* b ) nogil except -1 89 | 90 | cdef int symmetric_c( double* A, double* b, int n ) nogil except -1 91 | cdef int symmetricfactor_c( double* A, int* ipiv, int n ) nogil except -1 92 | cdef int symmetricfactored_c( double* A, int* ipiv, double* b, int n ) nogil except -1 93 | 94 | cdef int symmetrics_c( double* A, double* b, int n, int nrhs ) nogil except -1 95 | cdef int symmetricsp_c( double* A, double* b, int n, int nrhs, int ntasks ) nogil except -1 96 | 97 | cdef int msymmetric_c( double* A, double* b, int n, int nlhs ) nogil except -1 98 | cdef int msymmetricp_c( double* A, double* b, int n, int nlhs, int ntasks ) nogil except -1 99 | 100 | cdef int msymmetricfactor_c( double* A, int* ipiv, int n, int nlhs ) nogil except -1 101 | cdef int msymmetricfactored_c( double* A, int* ipiv, double* b, int n, int nlhs ) nogil except -1 102 | cdef int msymmetricfactorp_c( double* A, int* ipiv, int n, int nlhs, int ntasks ) nogil except -1 103 | cdef int msymmetricfactoredp_c( double* A, int* ipiv, double* b, int n, int nlhs, int ntasks ) nogil except -1 104 | 105 | ############################################################################################################## 106 | # General matrices 
107 | ############################################################################################################## 108 | 109 | cdef int general2x2_c( double* A, double* b ) nogil except -1 110 | 111 | cdef int general_c( double* A, double* b, int n ) nogil except -1 112 | cdef int generalfactor_c( double* A, int* ipiv, int n ) nogil except -1 113 | cdef int generalfactored_c( double* A, int* ipiv, double* b, int n ) nogil except -1 114 | 115 | cdef int generals_c( double* A, double* b, int n, int nrhs ) nogil except -1 116 | cdef int generalsp_c( double* A, double* b, int n, int nrhs, int ntasks ) nogil except -1 117 | 118 | cdef int mgeneral_c( double* A, double* b, int n, int nlhs ) nogil except -1 119 | cdef int mgeneralp_c( double* A, double* b, int n, int nlhs, int ntasks ) nogil except -1 120 | 121 | cdef int mgeneralfactor_c( double* A, int* ipiv, int n, int nlhs ) nogil except -1 122 | cdef int mgeneralfactored_c( double* A, int* ipiv, double* b, int n, int nlhs ) nogil except -1 123 | cdef int mgeneralfactorp_c( double* A, int* ipiv, int n, int nlhs, int ntasks ) nogil except -1 124 | cdef int mgeneralfactoredp_c( double* A, int* ipiv, double* b, int n, int nlhs, int ntasks ) nogil except -1 125 | 126 | ############################################################################################################## 127 | # Other stuff 128 | ############################################################################################################## 129 | 130 | cdef int svd_c( double* A, int m, int n, double* S ) nogil except -1 131 | 132 | -------------------------------------------------------------------------------- /wlsqm/utils/ptrwrap.pxd: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Hack around the limitation that C pointers cannot be passed to Python functions. 
4 | # 5 | # http://grokbase.com/t/gg/cython-users/134b21rga8/passing-callback-pointers-to-python-and-back 6 | # 7 | # JJ 2016-02-29 8 | 9 | from __future__ import absolute_import 10 | 11 | cdef class PointerWrapper: 12 | cdef void* ptr 13 | cdef set_ptr(self, void * input) 14 | 15 | -------------------------------------------------------------------------------- /wlsqm/utils/ptrwrap.pyx: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Hack around the limitation that C pointers cannot be passed to Python functions. 4 | # 5 | # http://grokbase.com/t/gg/cython-users/134b21rga8/passing-callback-pointers-to-python-and-back 6 | # 7 | # This is a Cython module, to be used by other .pyx modules, with no access from Python. 8 | # 9 | # JJ 2016-02-29 10 | 11 | from __future__ import division, print_function, absolute_import 12 | 13 | cdef class PointerWrapper: 14 | cdef set_ptr(self, void * input): 15 | self.ptr = input 16 | 17 | --------------------------------------------------------------------------------
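The WEIGHT_CENTER branch of `Case_make_weights` (in `wlsqm/fitter/infra.pyx` above) computes `w = alpha + (1 - alpha) * (1 - sqrt(d2/max_d2))**2` with `alpha = 1e-4`. A pure-Python transcription of that formula — the function name `center_weights` is illustrative only, not part of the wlsqm API:

```python
import math

def center_weights(d2, max_d2, alpha=1e-4):
    """Pure-Python sketch of the WEIGHT_CENTER branch of Case_make_weights.

    d2     : squared distances from the fit point xi to each neighbor xk
    max_d2 : the largest squared distance (used for normalization)
    alpha  : weight remaining at the maximum distance

    The weight falls off quickly near xi and levels out at alpha for the
    most distant neighbor, emphasizing points close to xi.
    """
    beta = 1.0 - alpha
    out = []
    for d2k in d2:
        tmp = 1.0 - math.sqrt(d2k / max_d2)  # flipped normalized distance
        out.append(alpha + beta * tmp * tmp)
    return out
```

For example, `center_weights([0.0, 1.0, 4.0], 4.0)` yields monotonically decreasing weights, from 1.0 at xi itself down to exactly `alpha` at the farthest neighbor.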
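The helper `distribute_items_c` declared in `wlsqm/utils/lapackdrivers.pxd` above splits work items across parallel tasks, assuming equal load per item. A pure-Python sketch of one standard way to do this (hypothetical — the actual C routine may assign leftover items differently):

```python
def distribute_items(nitems, ntasks):
    """Sketch of distribute_items_c: split nitems work items across
    ntasks tasks as evenly as possible.

    Returns (blocksizes, baseidxs): the number of items for each task,
    and the index of each task's first item. Leftover items go to the
    first tasks here; the C implementation may place them differently.
    """
    base, rem = divmod(nitems, ntasks)
    blocksizes = [base + (1 if t < rem else 0) for t in range(ntasks)]
    baseidxs = [0] * ntasks
    for t in range(1, ntasks):
        baseidxs[t] = baseidxs[t - 1] + blocksizes[t - 1]
    return blocksizes, baseidxs
```

Every item is assigned exactly once, and no two tasks differ by more than one item in load.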