├── LICENSE
├── README.md
├── kalman_filters_tests.ipynb
├── kalmanfilters
│   ├── cov_loc.py
│   ├── ensrf.py
│   ├── ensrf_direct.py
│   ├── ensrf_direct_loc.py
│   ├── ensrf_serial.py
│   ├── estkf.py
│   ├── etkf.py
│   ├── etkf_livings.py
│   ├── senkf.py
│   └── senkf_loc.py
└── testdata
    ├── HXf.npy
    ├── R.npy
    ├── Xf.npz
    └── Y.npy


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022 mathivierpunktnull
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # A collection of ensemble square root Kalman filters implemented in Python
 2 | 
 3 | This repository offers Python code for a variety of Ensemble Kalman Filters as presented in the comprehensive paper by Vetra-Carvalho et al. (2018) [1]. The authors present a variety of data-assimilation methods using a unified mathematical notation. I find the paper pleasant to read, and it makes the math more understandable than the separate papers on the individual methods. You can find the derivation of the methods in my master's thesis about Paleoclimate Data Assimilation: https://mchoblet.github.io/post/master/.
 4 | 
 5 | I also added the possibility of localization: the function covariance_loc in cov_loc.py computes the distance decorrelation matrices.
 6 | 
 7 | For the implementation of the algorithms I followed the Fortran-like pseudocode given by the authors in the appendix and indicated in the comments where I deviated from it (unfortunately there are some errors in the pseudocode, but they helped me understand the algorithms better). The Jupyter notebook shows that the output (posterior mean + covariance) from all functions is equal for my test data, but of course, strictly speaking, this is not a proof.
 8 | 
 9 | I hope to have time to implement other methods mentioned in the Vetra-Carvalho paper one day.
10 | 
11 | ## Content of repository:
12 | * Folder kalmanfilters: Separate file for each Kalman Filter
13 | * Folder testdata: data from a general circulation climate model which can be assimilated with the functions (you could also just generate random vectors)
14 | * kalman_filters_tests-notebook: Simple script to check that the output of the different functions is equal (posterior mean and covariance matrix)
15 | 
16 | ## Dependencies
17 | The functions work on pure numpy arrays.
18 | 
19 | * numpy 
20 | * scipy (only the EnSRF_direct function needs it for matrix square root calculation)
21 | 
22 | ## Input variables and dimension conventions
23 | * Note that the observation operator H is only implemented implicitly in these functions: the observations from the model, Hx, need to be precalculated. The observation uncertainties are assumed to be uncorrelated, hence the matrix R is diagonal (the algorithms are written for diagonal R).
24 | 
25 | **Variables**
26 | * Xf: Prior ensemble (Nx * Ne)
27 | * HXf: Observations from model (Ny * Ne)
28 | * Y: Observations (Ny * 1), passed as a 1-D array of length Ny
29 | * R: Observation error variances (uncorrelated, R is assumed diagonal) (Ny * 1), passed as a 1-D array holding the diagonal of R
30 | 
31 | **Dimensions**
32 | *  Ne:  Ensemble Size 
33 | *  Nx:  State Vector length
34 | *  Ny:  Number of measurements
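
The conventions above can be illustrated with randomly generated inputs. This is a minimal sketch, not part of the repository: the sizes, the random seed and the observation operator (picking `Ny` state entries via the hypothetical index array `obs_idx`) are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

Nx, Ny, Ne = 500, 20, 30   # state size, number of observations, ensemble size

# Prior ensemble: one column per ensemble member
Xf = rng.normal(loc=280.0, scale=2.0, size=(Nx, Ne))

# Hypothetical observation operator: observe Ny randomly chosen state entries.
# The filters never see H itself, only the precomputed HXf.
obs_idx = rng.choice(Nx, size=Ny, replace=False)
HXf = Xf[obs_idx, :]

# Diagonal of R (observation error variances), stored as a 1-D vector
R = np.full(Ny, 0.5)

# Synthetic observations: a "truth" drawn like a prior member plus observation noise
truth = rng.normal(loc=280.0, scale=2.0, size=Nx)
Y = truth[obs_idx] + rng.normal(scale=np.sqrt(R))

print(Xf.shape, HXf.shape, Y.shape, R.shape)
```

Arrays built this way should be usable as the four positional arguments `(Xf, HXf, Y, R)` of the filter functions. `Y` and `R` are kept one-dimensional because the functions build the full matrix with `np.diag(R)`.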
35 | 
36 | I usually work with climate fields as [xarrays](https://docs.xarray.dev/en/stable/), which you can easily bring into the right shape using methods like '.stack(z=("lat","lon"))', 'swap_dims' to get the dimensions in the right order, and '.values' to convert to numpy arrays. Although the algorithms here work on pure numpy arrays, using xarray for the pre- and postprocessing is really an asset.
37 | 
38 | ## Ensemble Kalman Filters implemented
39 | 
40 | * EnSRF: Ensemble Square Root Filter
41 |     * simultaneous solver
42 |     * serialized solver
43 |     * direct solving of square root filter
44 |     * direct solving with covariance localization (requires prior/measurement with latitudes/longitudes, see cov_loc.py)
45 | * ETKF: Ensemble Transform Kalman Filter:
46 |     * Square Root Formulation by Hunt
47 |     * Adaptation by David Livings 
48 | * ESTKF: Error-subspace transform Kalman Filter 
49 | * Stochastic EnKF (the Burgers 1998 update), also with localization.
50 | 
51 | 
52 | ## Test Data
53 | As I work on a paleoclimate Data Assimilation project, the test data is from a past-millennium climate simulation. Of course you can also easily generate some random test data.
54 | 
55 | * Y: Measurements (293 * 1) (Actually synthesized observations generated from model with additional noise from the prior)
56 | * R: Measurement errors (293 * 1)
57 | * Xf: Forecast from model (55296 * 100) (The number of rows is given by the number of gridpoints of the climate model. Prior contains temperature values (K))
58 | * HXf: Observations from model (293 * 100)
59 | 
60 | For this type of test data, the speed is dominated by the last operation (multiplication of perturbation matrix with weight matrix). You will see how much faster an optimized variant like the ETKF or ESTKF is in comparison to the serialized EnSRF. In my discipline people have also simply used the direct solving for K and K-tilde, which doesn't require much fancy mathematics. When the matrices are multiplied efficiently, it is only a factor 2-3 slower than the optimized variants.
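
Why the multiplication order matters can be seen in a small sketch. The sizes and the matrix `M` (a stand-in for the small Ny x Ny factors of the direct solver) are assumptions for illustration; the multiplication counts are rough estimates, not measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
Nx, Ny, Ne = 2000, 50, 20

Xfp = rng.normal(size=(Nx, Ne))   # state perturbations
HXp = rng.normal(size=(Ny, Ne))   # observation-space perturbations
M = rng.normal(size=(Ny, Ny))     # stands in for the small Ny x Ny factors

# Left to right: the first product already has shape (Nx, Ny)
slow = (Xfp @ HXp.T) @ M @ HXp
# Right to left: all intermediate products stay small (at most Ny x Ne)
fast = Xfp @ (HXp.T @ (M @ HXp))

# Rough multiplication counts for both orderings
flops_slow = Nx*Ne*Ny + Nx*Ny*Ny + Nx*Ny*Ne
flops_fast = Ny*Ny*Ne + Ne*Ny*Ne + Nx*Ne*Ne

print(np.allclose(slow, fast), flops_slow, flops_fast)
```

Both orderings give the same matrix, but only the right-to-left one keeps the large dimension Nx out of every product except the last, which is the operation that dominates the runtime for this kind of data.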
61 | 
62 | # Contact
63 | If you find errors, ways to optimize the code etc.  feel free to open an issue or contact me via mchoblet -AT- iup.uni-heidelberg.de
64 | 
65 | # Literature
66 | [1] Sanita Vetra-Carvalho et al. State-of-the-art stochastic data assimilation methods for high-dimensional non-Gaussian problems. Tellus A: Dynamic Meteorology and Oceanography, 70(1):1445364, 2018. https://doi.org/10.1080/16000870.2018.1445364
67 | The authors have implemented most of the functions for the sangoma project in Fortran and in the Julia language. I have not checked their code in detail; you can find it here: https://sourceforge.net/projects/sangoma/, https://github.com/Alexander-Barth/DataAssim.jl
68 | 
69 | 


--------------------------------------------------------------------------------
/kalman_filters_tests.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "## Simple notebook that loads test data and applies the ensemble Kalman Filters.\n",
  8 |     "\n",
  9 |     "We compare the mean of the analysis ensembles and the covariance matrices.\n",
 10 |     "Only the covariance of the first 1000 state variables is computed, because the prior ensemble is large. The perturbations themselves don't have to be equal."
 11 |    ]
 12 |   },
 13 |   {
 14 |    "cell_type": "code",
 15 |    "execution_count": 3,
 16 |    "metadata": {
 17 |     "scrolled": true
 18 |    },
 19 |    "outputs": [
 20 |     {
 21 |      "name": "stdout",
 22 |      "output_type": "stream",
 23 |      "text": [
 24 |       "[Errno 2] No such file or directory: 'kalmanfilters'\n",
 25 |       "/home/ldap-server/draco/notebooks/kalmanfilters\n"
 26 |      ]
 27 |     }
 28 |    ],
 29 |    "source": [
 30 |     "%cd kalmanfilters"
 31 |    ]
 32 |   },
 33 |   {
 34 |    "cell_type": "code",
 35 |    "execution_count": 37,
 36 |    "metadata": {
 37 |     "scrolled": true
 38 |    },
 39 |    "outputs": [],
 40 |    "source": [
 41 |     "from ensrf import *\n",
 42 |     "from ensrf_direct import *\n",
 43 |     "from estkf import *\n",
 44 |     "from etkf import *\n",
 45 |     "from etkf_livings import *\n",
 46 |     "from ensrf_serial import *\n",
 47 |     "import numpy as np\n",
 48 |     "import scipy\n",
 49 |     "from time import time"
 50 |    ]
 51 |   },
 52 |   {
 53 |    "cell_type": "code",
 54 |    "execution_count": 33,
 55 |    "metadata": {},
 56 |    "outputs": [
 57 |     {
 58 |      "name": "stdout",
 59 |      "output_type": "stream",
 60 |      "text": [
 61 |       "The autoreload extension is already loaded. To reload it, use:\n",
 62 |       "  %reload_ext autoreload\n"
 63 |      ]
 64 |     }
 65 |    ],
 66 |    "source": [
 67 |     "%load_ext autoreload\n",
 68 |     "%autoreload 2"
 69 |    ]
 70 |   },
 71 |   {
 72 |    "cell_type": "code",
 73 |    "execution_count": 39,
 74 |    "metadata": {},
 75 |    "outputs": [],
 76 |    "source": [
 77 |     "#You can look into the imported files using\n",
 78 |     "\n",
 79 |     "%less estkf.py"
 80 |    ]
 81 |   },
 82 |   {
 83 |    "cell_type": "code",
 84 |    "execution_count": 63,
 85 |    "metadata": {
 86 |     "scrolled": true
 87 |    },
 88 |    "outputs": [
 89 |     {
 90 |      "name": "stdout",
 91 |      "output_type": "stream",
 92 |      "text": [
 93 |       "Y shape: (293,)\n",
 94 |       "R shape: (293,)\n",
 95 |       "Xf shape: (55296, 100)\n",
 96 |       "HXf shape: (293, 100)\n"
 97 |      ]
 98 |     }
 99 |    ],
100 |    "source": [
101 |     "###LOAD TEST DATA\n",
102 |     "\n",
103 |     "Y=np.load('../testdata/Y.npy',allow_pickle=True)\n",
104 |     "R=np.load('../testdata/R.npy',allow_pickle=True)\n",
105 |     "Xf=np.load('../testdata/Xf.npz',allow_pickle=True)['arr_0']\n",
106 |     "HXf=np.load('../testdata/HXf.npy',allow_pickle=True)\n",
107 |     "\n",
108 |     "print('Y shape:',np.shape(Y))\n",
109 |     "print('R shape:',np.shape(R))\n",
110 |     "print('Xf shape:',np.shape(Xf))\n",
111 |     "print('HXf shape:',np.shape(HXf))"
112 |    ]
113 |   },
114 |   {
115 |    "cell_type": "code",
116 |    "execution_count": 58,
117 |    "metadata": {
118 |     "scrolled": true
119 |    },
120 |    "outputs": [
121 |     {
122 |      "name": "stdout",
123 |      "output_type": "stream",
124 |      "text": [
125 |       "-------------------------------\n",
126 |       "ESTKF  executed in  0.056397438049316406 seconds\n",
127 |       "-------------------------------\n",
128 |       "-------------------------------\n",
129 |       "EnSRF  executed in  0.1246793270111084 seconds\n",
130 |       "-------------------------------\n",
131 |       "-------------------------------\n",
132 |       "EnSRF_serial  executed in  10.726128101348877 seconds\n",
133 |       "-------------------------------\n",
134 |       "-------------------------------\n",
135 |       "ENSRF_direct  executed in  0.41527724266052246 seconds\n",
136 |       "-------------------------------\n",
137 |       "-------------------------------\n",
138 |       "ETKF  executed in  0.05635523796081543 seconds\n",
139 |       "-------------------------------\n",
140 |       "-------------------------------\n",
141 |       "ETKF_livings  executed in  0.06243419647216797 seconds\n",
142 |       "-------------------------------\n"
143 |      ]
144 |     }
145 |    ],
146 |    "source": [
147 |     "import numpy as np\n",
148 |     "\n",
149 |     "variables=[Xf, HXf, Y, R]\n",
150 |     "\n",
151 |     "mean={}\n",
152 |     "cov={}\n",
153 |     "\n",
154 |     "funcs=[ESTKF, EnSRF, EnSRF_serial,ENSRF_direct, ETKF, ETKF_livings]\n",
155 |     "\n",
156 |     "for i,f in enumerate(funcs):\n",
157 |     "    name=str(f.__name__)\n",
158 |     "    begin = time()\n",
159 |     "    full=f(*variables)\n",
160 |     "    end=time ()\n",
161 |     "    print('-------------------------------')\n",
162 |     "    print(name,' executed in ',end-begin, 'seconds')\n",
163 |     "    print('-------------------------------')\n",
164 |     "    mean[name]=np.mean(full,axis=1)\n",
165 |     "    cov[name]=np.cov(full[:1000,:],ddof=1)"
166 |    ]
167 |   },
168 |   {
169 |    "cell_type": "code",
170 |    "execution_count": 55,
171 |    "metadata": {},
172 |    "outputs": [
173 |     {
174 |      "data": {
175 |       "text/plain": [
176 |        "{'ESTKF': array([225.77951468, 225.75510173, 225.75508779, ..., 252.08298075,\n",
177 |        "        252.08490147, 252.08658989]),\n",
178 |        " 'EnSRF': array([225.77951468+0.j, 225.75510173+0.j, 225.75508779+0.j, ...,\n",
179 |        "        252.08298075+0.j, 252.08490147+0.j, 252.08658989+0.j]),\n",
180 |        " 'EnSRF_serial': array([225.77951468, 225.75510173, 225.75508779, ..., 252.08298075,\n",
181 |        "        252.08490147, 252.08658989]),\n",
182 |        " 'ENSRF_direct': array([225.77951468+1.02921696e-32j, 225.75510173+9.65516829e-33j,\n",
183 |        "        225.75508779+1.08391337e-32j, ..., 252.08298075+5.84616035e-33j,\n",
184 |        "        252.08490147+1.06922816e-32j, 252.08658989+1.66809607e-32j]),\n",
185 |        " 'ETKF': array([225.77951468, 225.75510173, 225.75508779, ..., 252.08298075,\n",
186 |        "        252.08490147, 252.08658989])}"
187 |       ]
188 |      },
189 |      "execution_count": 55,
190 |      "metadata": {},
191 |      "output_type": "execute_result"
192 |     }
193 |    ],
194 |    "source": [
195 |     "mean"
196 |    ]
197 |   },
198 |   {
199 |    "cell_type": "code",
200 |    "execution_count": 62,
201 |    "metadata": {
202 |     "scrolled": true
203 |    },
204 |    "outputs": [
205 |     {
206 |      "name": "stdout",
207 |      "output_type": "stream",
208 |      "text": [
209 |       "True\n",
210 |       "True\n",
211 |       "True\n",
212 |       "True\n",
213 |       "True\n"
214 |      ]
215 |     }
216 |    ],
217 |    "source": [
218 |     "#check equality of means\n",
219 |     "m=list(mean.values()) \n",
220 |     "for i,me in enumerate(m):\n",
221 |     "    if i<len(m)-1:\n",
222 |     "        print(np.allclose(m[i],m[i+1]))"
223 |    ]
224 |   },
225 |   {
226 |    "cell_type": "code",
227 |    "execution_count": 61,
228 |    "metadata": {},
229 |    "outputs": [
230 |     {
231 |      "name": "stdout",
232 |      "output_type": "stream",
233 |      "text": [
234 |       "True\n",
235 |       "True\n",
236 |       "True\n",
237 |       "True\n",
238 |       "True\n"
239 |      ]
240 |     }
241 |    ],
242 |    "source": [
243 |     "#check equality of covariance matrix entries\n",
244 |     "c=list(cov.values()) \n",
245 |     "for i,me in enumerate(c):\n",
246 |     "    if i<len(c)-1:\n",
247 |     "        print(np.allclose(c[i],c[i+1]))"
248 |    ]
249 |   }
250 |  ],
251 |  "metadata": {
252 |   "kernelspec": {
253 |    "display_name": "Python 3 (ipykernel)",
254 |    "language": "python",
255 |    "name": "python3"
256 |   },
257 |   "language_info": {
258 |    "codemirror_mode": {
259 |     "name": "ipython",
260 |     "version": 3
261 |    },
262 |    "file_extension": ".py",
263 |    "mimetype": "text/x-python",
264 |    "name": "python",
265 |    "nbconvert_exporter": "python",
266 |    "pygments_lexer": "ipython3",
267 |    "version": "3.10.2"
268 |   }
269 |  },
270 |  "nbformat": 4,
271 |  "nbformat_minor": 4
272 | }
273 | 


--------------------------------------------------------------------------------
/kalmanfilters/cov_loc.py:
--------------------------------------------------------------------------------
 1 | def covariance_loc(model_data,proxy_lat,proxy_lon, cov_len):    
 2 |     """
 3 |     Function that returns the matrices needed for the Covariance Localization in the direct EnSRF solver by Hadamard (element-wise) product.
 4 |     These are the terms called W_loc and Y_loc here: https://www.nature.com/articles/s41586-020-2617-x#Sec7 (Data Assimilation section).
 5 |     The idea is to compute these matrices once in the beginning for all available proxy locations, and later in the DA loop one only selects
 6 |     the relevant columns of W_loc / rows and columns of Y_loc for the localized simultaneous Kalman Filter Solver.
 7 |     
 8 |     Input:
 9 |        - model_data from which the grid point locations are extracted. Here I use the stack function, which I also use when constructing the
10 |        prior vector. It brings all gridpoints into vector form (xarray DataArray such that stack can be applied, N_x grid points)
11 |        - proxy_lat, proxy_lon are the latitudes and longitudes of the proxy locations (np.arrays, length = N_y). Make sure they have the same ordering as
12 |        the entries of your Observations-from-Model (HXf) in the Kalman Filter.
13 |        - cov_len: Length scale of the Gaspari Cohn function [float, in km]; the weights reach zero at a distance of 2*cov_len
14 |        
15 |     Output:
16 |         - PH_loc: Matrix for localization of PH^T (N_x * N_y)
17 |         - HPH_loc: Matrix for localization of HPH^T (N_y * N_y)
18 |     """
19 |     import numpy as np
20 |     from haversine import haversine_vector, Unit
21 |     #bring coordinates of model (field) and proxy (individual locations) into the right form
22 |     #the methods differ because of the different structures (field vs. individual locations)
23 |     loc=np.array([[lat,lon] for lat,lon in zip(proxy_lat,proxy_lon)])
24 |     stacked=model_data.stack(z=('lat','lon')).transpose('z','time')
25 |     coords=[list(z) for z in stacked.z.values]
26 |     
27 |     #model-proxy distances
28 |     dists_mp=haversine_vector(loc,coords, Unit.KILOMETERS,comb=True)
29 |     dists_mp_shape=dists_mp.shape
30 |     
31 |     #proxy-proxy distances
32 |     dists_pp=haversine_vector(loc,loc, Unit.KILOMETERS,comb=True)
33 |     dists_pp_shape=dists_pp.shape
34 |     
35 |     def gaspari_cohn(dists,cov_len):
36 |         """
37 |         Gaspari Cohn decorrelation function https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/qj.49712555417 page 26
38 |         dists: needs to be a 1-D array with all the distances (reshape to your needs afterwards)
39 |         cov_len: radius given in km
40 | 
41 |         """
42 |         dists = np.abs(dists)
43 |         array = np.zeros_like(dists)
44 |         r = dists/cov_len
45 |         #first the short distances
46 |         i=np.where(r<=1.)[0]
47 |         array[i]=-0.25*(r[i])**5+0.5*r[i]**4+0.625*r[i]**3-5./3.*r[i]**2+1.
48 |         #then the long ones
49 |         i=np.where((r>1) & (r<=2))[0]
50 |         array[i]=1./12.*r[i]**5-0.5*r[i]**4+0.625*r[i]**3+5./3.*r[i]**2.-5.*r[i]+4.-2./(3.*r[i])
51 | 
52 |         array[array < 0.0] = 0.0
53 |         return array
54 |     
55 |     #flatten distances, apply to Gaspari Cohn and reshape
56 |     PH_loc=gaspari_cohn(dists_mp.reshape(-1),cov_len).reshape(dists_mp_shape)
57 |     HPH_loc=gaspari_cohn(dists_pp.reshape(-1),cov_len).reshape(dists_pp_shape)
58 |     
59 |     return PH_loc, HPH_loc
60 | 
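
The Gaspari Cohn weights can also be tried out in isolation. The following is a minimal standalone sketch and not the repository function: it uses boolean masks instead of `np.where(...)[0]`, so that it accepts distance arrays of any shape without flattening, and the example distances are made up.

```python
import numpy as np

def gaspari_cohn(dists, cov_len):
    """Gaspari Cohn weights for a distance array of any shape (same unit as cov_len)."""
    r = np.abs(np.asarray(dists, dtype=float)) / cov_len
    w = np.zeros_like(r)

    near = r <= 1.0
    far = (r > 1.0) & (r <= 2.0)

    rn = r[near]
    w[near] = -0.25*rn**5 + 0.5*rn**4 + 0.625*rn**3 - 5.0/3.0*rn**2 + 1.0
    rf = r[far]
    w[far] = (rf**5/12.0 - 0.5*rf**4 + 0.625*rf**3 + 5.0/3.0*rf**2
              - 5.0*rf + 4.0 - 2.0/(3.0*rf))

    # guard against tiny negative round-off near r = 2
    return np.clip(w, 0.0, None)

# made-up distances in km, length scale 1000 km
weights = gaspari_cohn(np.array([[0.0, 500.0], [1000.0, 2500.0]]), cov_len=1000.0)
print(weights)
```

The weight is 1 at zero distance, decays smoothly, and vanishes for distances beyond `2*cov_len`, which is what makes the element-wise product with the covariance matrices a localization.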


--------------------------------------------------------------------------------
/kalmanfilters/ensrf.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | def EnSRF(Xf, HXf, Y, R):
 4 |     """
 5 |     Implementation adapted from pseudocode description in
 6 |     "State-of-the-art stochastic data assimilation methods" by Vetra-Carvalho et al. (2018),
 7 |     algorithm 9, see section 5.6. Pseudocode has some errors, e.g. in step 7 it should be sqrt(Lambda).
 8 |     
 9 |     Dimensions: N_e: ensemble size, N_y: Number of observations, N_x: State vector size (Gridboxes x assimilated variables)
10 |     
11 |     Input:
12 |     - Xf:  the prior ensemble (N_x x N_e) 
13 |     - R: Measurement Error (Variance of pseudoproxy time series) (N_y x 1) -> converted to Ny x Ny matrix
14 |     - HX^f: Model value projected into observation space/at proxy locations (N_y x N_e)
15 |     - Y: Observation vector (N_y x 1)
16 | 
17 |     Output:
18 |     - Analysis ensemble (N_x, N_e)
19 |     """
20 |     #Obs error matrix
21 |     Rmat=np.diag(R)
22 |     #Mean of prior ensemble for each gridbox   
23 |     mX = np.mean(Xf, axis=1)
24 |     #Perturbations from ensemble mean
25 |     Xfp=Xf-mX[:,None]
26 |     #Mean and perturbations for model values in observation space
27 |     mY = np.mean(HXf, axis=1)
28 |     HXp = HXf-mY[:,None]
29 | 
30 |     #Gram matrix of perturbations
31 |     I1=HXp @ HXp.T
32 |     Ny=np.shape(Y)[0]
33 |     Ne=np.shape(Xf)[1]
34 | 
35 |     I2=I1+(Ne-1)*Rmat
36 |     #compute eigenvalues and eigenvectors (use that matrix is symmetric and real)
37 |     eigs, ev = np.linalg.eigh(I2) 
38 | 
39 |     #Error in Pseudocode: Square Root + multiplication order (important!)
40 |     G1=ev @ np.diag(np.sqrt(1/eigs)) 
41 |     G2=HXp.T @ G1
42 | 
43 |     U,s,Vh=np.linalg.svd(G2)
44 |     #sqrt(I - S^2): the singular values are <= 1 in exact arithmetic, so clip round-off negatives instead of casting to complex
45 |     rad=np.ones(Ne)
46 |     rad[:s.size]-=np.square(s)
47 |     A=np.diag(np.sqrt(np.clip(rad,0,None)))
48 | 
49 |     W1p=U @ A
50 |     W2p=W1p@U.T
51 | 
52 |     d=Y-mY
53 | 
54 |     w1=ev.T @ d
55 |     w2=np.diag(1/eigs).T @ w1
56 |     w3=ev @ w2
57 |     w4=HXp.T @ w3
58 |     W=W2p+w4[:,None]
59 |     Xa=mX[:,None]+Xfp @ W
60 | 
61 |     return Xa
62 | 
63 | 
64 | 
65 | 


--------------------------------------------------------------------------------
/kalmanfilters/ensrf_direct.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scipy.linalg
 3 | 
 4 | def ENSRF_direct(Xf, HXf, Y, R):
 5 |     """
 6 |     direct calculation of Ensemble Square Root Filter from Whitaker and Hamill
 7 |     As for instance done in Steiger 2018: "A reconstruction of global hydroclimate and dynamical variables over the Common Era".
 8 |     
 9 |     In comparison to the code for that paper [1], the matrix multiplications are performed consistently from right to left and
10 |     the Kalman gain is not explicitly computed, because this would be inefficient when we are just interested in the posterior ensemble.
11 |     One could also avoid computing the matrix inverses and solve linear systems instead (one could even use Cholesky decomposition
12 |     because the covariance matrices are positive definite), but as the number of observations is small the speed up is insignificant.
13 |     When using many observations (>1000) one should consider doing it. Here, the main computation effort comes from the matrix square root
14 |     (potentially numerically unstable) and unavoidable matrix - matrix multiplications.
15 |     
16 |     Dimensions: N_e: ensemble size, N_y: Number of observations, N_x: State vector size (Gridboxes x assimilated variables)
17 |     
18 |     Input:
19 |     - Xf:  the prior ensemble (N_x x N_e) 
20 |     - R: Measurement Error (Variance of pseudoproxy time series) (N_y x 1) -> converted to Ny x Ny matrix
21 |     - HX^f: Model value projected into observation space/at proxy locations (N_y x N_e)
22 |     - Y: Observation vector (N_y x 1)
23 |     Output:
24 |     - Analysis ensemble (N_x, N_e)
25 |     
26 |     [1] https://github.com/njsteiger/PHYDA-v1/blob/master/M_update.m
27 |     """
28 |     Ne=np.shape(Xf)[1]
29 | 
30 |     #Obs error matrix, assumption that it's diagonal
31 |     Rmat=np.diag(R)
32 |     Rsqr=np.diag(np.sqrt(R)) 
33 | 
34 |     #Mean of prior ensemble for each gridbox   
35 |     mX = np.mean(Xf, axis=1)
36 |     #Perturbations from ensemble mean
37 |     Xfp=Xf-mX[:,None]
38 |     #Mean and perturbations for model values in observation space
39 |     mY = np.mean(HXf, axis=1)
40 |     HXp = HXf-mY[:,None]
41 |     #innovation
42 |     d=Y-mY
43 | 
44 |     #compute matrix products directly
45 |     #BHT=(Xfp @ HXp.T)/(Ne-1) #avoid this, it's inefficient to compute it here
46 |     HPHT=(HXp @ HXp.T)/(Ne-1)
47 | 
48 |     #second Kalman gain factor
49 |     HPHTR=HPHT+Rmat
50 |     #inverse of term
51 |     HPHTR_inv=np.linalg.inv(HPHTR)
52 |     #matrix square root of denominator
53 |     HPHTR_sqr=scipy.linalg.sqrtm(HPHTR)
54 | 
55 |     #Kalman gain for mean
56 |     xa_m=mX + (Xfp @ (HXp.T /(Ne-1) @ (HPHTR_inv @ d)))
57 | 
58 |     #Perturbation Kalman gain
59 |     #inverse of square root calculated via previous inverse: sqrt(A)^(-1)=sqrt(A) @ A^(-1)
60 |     HPHTR_sqr_inv=HPHTR_sqr @ HPHTR_inv
61 |     fac2=HPHTR_sqr + Rsqr
62 |     factor=np.linalg.inv(fac2)
63 | 
64 |     #right to left multiplication!
65 |     pert = (Xfp @ (HXp.T/(Ne-1) @ (HPHTR_sqr_inv.T @ (factor @ HXp))))
66 |     Xap=Xfp-pert
67 |     
68 |     return Xap+xa_m[:,None]
69 | 
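
The docstring of ENSRF_direct mentions that the explicit inverse could be replaced by solving linear systems. A minimal sketch of that idea for the mean update, using only numpy and made-up random inputs (the sizes and seed are assumptions), could look like this:

```python
import numpy as np

rng = np.random.default_rng(2)
Ny, Ne = 40, 25

HXp = rng.normal(size=(Ny, Ne))
HXp -= HXp.mean(axis=1, keepdims=True)   # remove the ensemble mean
R = rng.uniform(0.5, 1.5, size=Ny)       # diagonal of R
d = rng.normal(size=Ny)                  # innovation Y - mean(HXf)

HPHTR = HXp @ HXp.T / (Ne - 1) + np.diag(R)

# Variant used in ENSRF_direct: explicit inverse
z_inv = np.linalg.inv(HPHTR) @ d
# Alternative: solve the linear system directly
z_solve = np.linalg.solve(HPHTR, d)

# Cholesky variant (HPHTR is symmetric positive definite because R > 0)
L = np.linalg.cholesky(HPHTR)            # HPHTR = L @ L.T
z_chol = np.linalg.solve(L.T, np.linalg.solve(L, d))

print(np.allclose(z_inv, z_solve), np.allclose(z_inv, z_chol))
```

All three variants give the vector that is then multiplied from the left by `HXp.T/(Ne-1)` and `Xfp`. Note that `np.linalg.solve` does not exploit the triangular structure of `L`; routines such as `scipy.linalg.cho_factor` and `scipy.linalg.cho_solve` do, which is where a speed-up for many observations would come from.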


--------------------------------------------------------------------------------
/kalmanfilters/ensrf_direct_loc.py:
--------------------------------------------------------------------------------
 1 | #localized version of the direct kalman solver
 2 | import numpy as np
 3 | import scipy.linalg
 4 | 
 5 | def ENSRF_direct_loc(Xf, HXf, Y, R,PH_loc, HPH_loc):
 6 |     """
 7 |     direct calculation of Ensemble Square Root Filter from Whitaker and Hamill
 8 |     applying localization matrices to PH^T and HPH^T as in Tierney 2020: 
 9 |     https://www.nature.com/articles/s41586-020-2617-x#Sec7 (Data Assimilation section).
10 |     This is less efficient than without localization, because PH needs to be explicitly calculated for the entry-wise Hadamard product
11 |     (https://en.wikipedia.org/wiki/Hadamard_product_(matrices)). 
12 |     However, this is still better than using the serial EnSRF formulation (at least an order of magnitude faster).
13 |     It is important to not compute the Kalman gains explicitly.
14 |     As commented in the docstring ENSRF_direct, avoiding inverting matrices could be done, but the speed up is insignificant
15 |     in comparison to the rest.
16 |     
17 |     I propose to compute PH_loc and HPH_loc once for all possible proxy locations, and here only select the 
18 |     relevant columns (for PH_loc) and the relevant rows and columns for HPH_loc using fancy indexing:
19 |     PH_loc -> PH_loc[:,[column_indices]]
20 |     HPH_loc -> HPH_loc[[row_indices]][:,[column_indices]],
21 |     given which proxies are available at one timestep.
22 | 
23 |     Input:
24 |     - Xf:  the prior ensemble (N_x x N_e) 
25 |     - R: Measurement Error (Variance of pseudoproxy time series) (N_y) -> converted to Ny x Ny matrix
26 |     - HX^f: Model value projected into observation space/at proxy locations (N_y x N_e)
27 |     - Y: Observation vector (N_y)
28 |     - PH_loc: Matrix for localization of PH^T (N_x * N_y)
29 |     - HPH_loc: Matrix for localization of HPH^T (N_y * N_y)
30 |     
31 |     Output:
32 |     - Analysis ensemble (N_x, N_e)
33 |     """
34 |     
35 |     Ne=np.shape(Xf)[1]
36 | 
37 |     #Obs error matrix, assumption that it's diagonal
38 |     Rmat=np.diag(R)
39 |     Rsqr=np.diag(np.sqrt(R)) 
40 | 
41 |     #Mean of prior ensemble for each gridbox   
42 |     mX = np.mean(Xf, axis=1)
43 |     #Perturbations from ensemble mean
44 |     Xfp=Xf-mX[:,None]
45 |     #Mean and perturbations for model values in observation space
46 |     mY = np.mean(HXf, axis=1)
47 |     HXp = HXf-mY[:,None]
48 |     #innovation
49 |     d=Y-mY
50 | 
51 |     #compute matrix products directly
52 |     #entry wise product of covariance localization matrices
53 |     PHT= PH_loc * (Xfp @ HXp.T/(Ne-1))
54 |     HPHT= HPH_loc * (HXp @ HXp.T/(Ne-1))
55 |     
56 |     #second Kalman gain factor
57 |     HPHTR=HPHT+Rmat
58 |     #inverse of factor
59 |     HPHTR_inv=np.linalg.inv(HPHTR)
60 |     #matrix square root of denominator
61 |     HPHTR_sqr=scipy.linalg.sqrtm(HPHTR)
62 | 
63 |     #Kalman gain for mean
64 |     xa_m=mX + PHT @ (HPHTR_inv @ d)
65 | 
66 |     #Perturbation Kalman gain
67 |     #inverse of square root calculated via previous inverse: sqrt(A)^(-1)=sqrt(A) @ A^(-1)
68 |     HPHTR_sqr_inv=HPHTR_sqr @ HPHTR_inv
69 |     fac2=HPHTR_sqr + Rsqr
70 |     factor=np.linalg.inv(fac2)
71 | 
72 |     # right to left multiplication!
73 |     pert = PHT @ (HPHTR_sqr_inv.T @ (factor @ HXp))
74 |     Xap=Xfp-pert
75 |     
76 |     return Xap+xa_m[:,None]
77 | 
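
The selection of rows and columns proposed in the docstring of ENSRF_direct_loc can be sketched as follows. The matrices are random stand-ins and the index array `avail` is hypothetical; `np.ix_` is shown as an equivalent to the chained indexing from the docstring.

```python
import numpy as np

rng = np.random.default_rng(3)
Nx, Ny_all = 6, 5

# Stand-ins for the localization matrices precomputed for all proxy locations
PH_loc_all = rng.uniform(size=(Nx, Ny_all))
HPH_loc_all = rng.uniform(size=(Ny_all, Ny_all))

# Hypothetical indices of the proxies that are available at this time step
avail = np.array([0, 2, 3])

PH_loc = PH_loc_all[:, avail]                  # shape (Nx, 3)
HPH_loc = HPH_loc_all[np.ix_(avail, avail)]    # shape (3, 3)

print(PH_loc.shape, HPH_loc.shape)
```

The rows of `HXf`, `Y` and `R` passed to the filter have to be selected with the same index array so that the ordering stays consistent.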


--------------------------------------------------------------------------------
/kalmanfilters/ensrf_serial.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | def EnSRF_serial(Xf, HXf, Y, R):
 3 |     """
 4 |     Implementation adapted from pseudocode description in
 5 |     "State-of-the-art stochastic data assimilation methods" by Vetra-Carvalho et al. (2018),
 6 |     algorithm 10, see section 5.7.
 7 |     Errors: Line 1 must be inside of loop, in HPH^T the divisor Ne-1 is missing.
 8 |     This version uses the appended state vector approach, which also updates the precalculated observations from the model.
 9 |     
10 |     
11 |     Dimensions: N_e: ensemble size, N_y: Number of observations, N_x: State vector size (Gridboxes x assimilated variables)
12 |     
13 |     Input:
14 |     - Xf:  the prior ensemble (N_x x N_e) 
15 |     - R: Measurement Error (Variance of pseudoproxy time series) (N_y x 1) -> converted to Ny x Ny matrix
16 |     - HX^f: Model value projected into observation space/at proxy locations (N_y x N_e)
17 |     - Y: Observation vector (N_y x 1)
18 | 
19 |     Output:
20 |     - Analysis ensemble (N_x, N_e)
21 |     """
22 | 
23 |     # augmented state vector with Ye appended
24 |     Xfn = np.append(Xf, HXf, axis=0)
25 |     
26 |     # number of state variables
27 |     Nx= np.shape(Xf)[0]
28 |     # number of ensemble members
29 |     Ne=np.shape(Xf)[1]
30 |     #Number of measurements
31 |     Ny=np.shape(Y)[0]
32 |     for i in range(Ny):
33 |         #ensemble mean and perturbations
34 |         mX = np.mean(Xfn, axis=1)
35 |         Xfp=np.subtract(Xfn,mX[:,None])
36 |         
37 |         #get obs from model
38 |         HX=Xfn[Nx+i,:]
39 |         #ensemble mean for obs
40 |         mY=np.mean(HX)
41 |         #remove mean
42 |         HXp=(HX-mY)[None]
43 | 
44 |         HP=HXp @ Xfp.T /(Ne-1)
45 |         
46 |         #Variance at location (here divisor is missing in reference!)
47 |         HPHT=HXp @ HXp.T/(Ne-1)
48 | 
49 |         ##Localize HP ?
50 |         
51 |         #compute scalar
52 |         sig=R[i]
53 |         F=HPHT + sig
54 |         K=(HP/F)
55 | 
56 |         #compute factors for final calc
57 |         d=Y[i]-mY
58 |         a1=1+np.sqrt(sig/F)
59 |         a2=1/a1
60 |         
61 |         #final calcs
62 |         mXa=mX+np.squeeze((K*d))
63 |         Xfp=Xfp-a2*K.T @ HXp
64 |         Xfn=Xfp+mXa[:,None]
65 |         
66 |     return Xfn[:Nx,:]
67 |     
68 | 
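
The role of the scalar factor `a2` in the serial update can be checked with a one-observation example. This is a minimal sketch with made-up numbers: it applies the same formulas as the loop body of EnSRF_serial to the observed quantity itself and compares the resulting variance with the Kalman posterior variance.

```python
import numpy as np

rng = np.random.default_rng(4)
Ne = 50

hx = rng.normal(loc=1.0, scale=2.0, size=Ne)   # model values at one observation site
hxp = hx - hx.mean()                            # perturbations
sig = 0.7                                       # observation error variance R[i]

hpht = hxp @ hxp / (Ne - 1)                     # prior variance at the site
F = hpht + sig
K = hpht / F                                    # gain for the observed quantity itself
alpha = 1.0 / (1.0 + np.sqrt(sig / F))          # a2 in EnSRF_serial

hxp_a = hxp - alpha * K * hxp                   # updated perturbations
var_a = hxp_a @ hxp_a / (Ne - 1)
var_kalman = hpht * sig / F                     # (1 - K) * prior variance

print(np.isclose(var_a, var_kalman))
```

Without the factor `alpha` the deterministic update would shrink the perturbations too strongly; with it, the analysis variance matches the value expected from the Kalman filter equations.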


--------------------------------------------------------------------------------
/kalmanfilters/estkf.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | def ESTKF(Xf, HXf, Y, R):
 4 |     """
 5 |     Error-subspace transform Kalman Filter
 6 |     
 7 |     Implementation adapted from pseudocode description in
 8 |     "State-of-the-art stochastic data assimilation methods" by Vetra-Carvalho et al. (2018),
 9 |     algorithm 12, see section 5.9
10 |     Errors: 
11 |         5th line: A instead of L (A needs to be created)
12 |         Last line: W_A instead of W'
13 |     
14 |     Dimensions: N_e: ensemble size, N_y: Number of observations, N_x: State vector size (Gridboxes x assimilated variables)
15 |     
16 |     Input:
17 |     - Xf:  the prior ensemble (N_x x N_e) 
18 |     - R: Measurement Error (assumed uncorrelated) (N_y x 1) -> converted to Ny x Ny matrix
19 |     - HX^f: Model value projected into observation space/at proxy locations (N_y x N_e)
20 |     - Y: Observation vector (N_y x 1)
21 | 
22 |     Output:
23 |     - Analysis ensemble (N_x, N_e)
24 |     """
25 |     
26 |     # number of ensemble members
27 |     Ne=np.shape(Xf)[1]
28 |     
29 |     #Obs error matrix
30 |     Rmat=np.diag(R)
31 |     Rmat_inv=np.diag(1/R)
32 |     #Mean of prior ensemble for each state vector variable 
33 |     mX = np.mean(Xf, axis=1)
34 |     #Perturbations from ensemble mean
35 |     Xfp=Xf-mX[:,None]
36 |     
37 |     #Mean of model values in observation space
38 |     mY = np.mean(HXf, axis=1)
39 |     d=Y-mY
40 | 
41 |     """
42 |     Create projection matrix:
43 |     - create matrix of shape Ne x Ne-1 filled with off diagonal values
44 |     - fill diagonal with diagonal values
45 |     - replace values of last row
46 |     """
47 | 
48 |     sqr_ne=-1/np.sqrt(Ne)
49 |     off_diag=-1/(Ne*(-sqr_ne+1))
50 |     diag=1+off_diag
51 | 
52 |     A=np.ones((Ne,Ne-1))*off_diag
53 |     np.fill_diagonal(A,diag)
54 |     A[-1,:]=sqr_ne
55 | 
56 |     #error in pseudocode, replace L by A
57 |     HL=HXf @ A
58 |     B1=Rmat_inv @ HL
59 |     C1=(Ne-1)*np.identity(Ne-1)
60 |     C2=C1+HL.T @ B1
61 |     
62 |     #EVD of C2, assumed symmetric
63 |     eigs,U=np.linalg.eigh(C2)
64 |     
65 |     d1=B1.T @ d
66 |     d2=U.T @ d1
67 |     d3=d2/eigs
68 |     T=U @ np.diag(1/np.sqrt(eigs)) @ U.T
69 |     
70 |     #mean weight
71 |     wm=U @ d3
72 |     #perturbation weight
73 |     Wp=T @ A.T*np.sqrt((Ne-1))
74 |     #total weight matrix + projection matrix transform
75 |     W=wm[:,None]+Wp
76 |     Wa = A @ W
77 | 
78 |     #Analysis ensemble
79 |     Xa = mX[:,None] + Xfp @ Wa
80 | 
81 |     return Xa
82 | 
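
The projection matrix `A` built in `ESTKF` has two properties the rest of the function relies on: its columns sum to zero, so `HXf @ A` gives the same result as applying `A` to the mean-removed ensemble, and its columns are orthonormal. The sketch below rebuilds `A` with the same formulas and checks both properties numerically. The function name `estkf_projection` and the toy ensemble are assumptions made for illustration.

```python
import numpy as np

def estkf_projection(Ne):
    """Build the Ne x (Ne-1) projection matrix with the formulas used in ESTKF."""
    sqr_ne = -1 / np.sqrt(Ne)
    off_diag = -1 / (Ne * (-sqr_ne + 1))
    diag = 1 + off_diag
    A = np.ones((Ne, Ne - 1)) * off_diag
    np.fill_diagonal(A, diag)   # fills the Ne-1 main-diagonal entries
    A[-1, :] = sqr_ne           # last row is constant
    return A

Ne = 6
A = estkf_projection(Ne)

col_sums = A.sum(axis=0)        # expected: zeros, so A removes the ensemble mean
gram = A.T @ A                  # expected: identity of size Ne-1

# Toy ensemble (assumed data): projecting with or without the mean is the same
rng = np.random.default_rng(0)
Xf = rng.normal(size=(4, Ne))
Xfp = Xf - Xf.mean(axis=1)[:, None]
same_projection = np.allclose(Xf @ A, Xfp @ A)
print(np.allclose(col_sums, 0), np.allclose(gram, np.eye(Ne - 1)), same_projection)
```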


--------------------------------------------------------------------------------
/kalmanfilters/etkf.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | def ETKF(Xf, HXf, Y, R):
 4 |     """
 5 |     Implementation adapted from pseudocode description in
 6 |     "State-of-the-art stochastic data assimilation methods" by Vetra-Carvalho et al. (2018),
 7 |     algorithm 7, see section 5.4.
 8 |     Errors: in the calculation of W1 prime, divide by the square root of the eigenvalues. The mathematical formula in the paper already contains this error.
 9 |     
10 |     Dimensions: N_e: ensemble size, N_y: number of observations, N_x: state vector size (gridboxes x assimilated variables)
11 |     
12 |     Input:
13 |     - Xf:  the prior ensemble (N_x x N_e) 
14 |     - R: Measurement Error (Variance of pseudoproxy time series) ($N_y$ x 1) -> converted to Ny x Ny matrix
15 |     - HX^f: Model value projected into observation space/at proxy locations ($N_y$ x $N_e$)
16 |     - Y: Observation vector ($N_y$ x 1)
17 | 
18 |     Output:
19 |     - Analysis ensemble (N_x, N_e)
20 |     """
21 |     # number of ensemble members
22 |     Ne=np.shape(Xf)[1]
23 | 
24 |     #Obs error matrix
25 |     #Rmat=np.diag(R)
26 |     Rmat_inv=np.diag(1/R)
27 |     #Mean of prior ensemble for each gridbox   
28 |     mX = np.mean(Xf, axis=1)
29 |     #Perturbations from ensemble mean
30 |     Xfp=Xf-mX[:,None]
31 |     #Mean and perturbations for model values in observation space
32 |     mY = np.mean(HXf, axis=1)
33 |     HXp = HXf-mY[:,None]
34 | 
35 |     C=Rmat_inv @ HXp
36 |     A1=(Ne-1)*np.identity(Ne)
37 |     A2=A1 + (HXp.T @ C)
38 | 
39 |     #eigenvalue decomposition of A2, A2 is symmetric
40 |     eigs, ev = np.linalg.eigh(A2) 
41 | 
42 |     #compute perturbations
43 |     Wp1 = np.diag(np.sqrt(1/eigs)) @ ev.T
44 |     Wp = ev @ Wp1 * np.sqrt(Ne-1)
45 | 
46 |     #differing from pseudocode
47 |     d=Y-mY
48 |     D1 = Rmat_inv @ d
49 |     D2 = HXp.T @ D1
50 |     wm=ev @ np.diag(1/eigs) @ ev.T @ D2  #/ np.sqrt(Ne-1) 
51 | 
52 |     #adding pert and mean (!row-major formulation in Python!)
53 |     W=Wp + wm[:,None]
54 | 
55 |     #final adding up (most costly operation)
56 |     Xa=mX[:,None] + Xfp @ W
57 | 
58 |     return Xa
59 |     
60 |     
61 | 
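
As a sanity check on the mean weight `wm` in `ETKF`, the analysis mean computed in ensemble space should agree with the textbook Kalman update that uses the explicit gain `K = PH^T (HPH^T + R)^{-1}`. The sketch below compares the two on random toy data. The sizes, the linear operator `H`, and all data are assumptions for illustration only; the repository functions take `HXf` directly and do not need `H`.

```python
import numpy as np

rng = np.random.default_rng(1)
Nx, Ny, Ne = 8, 5, 10                 # toy sizes (assumed)
Xf = rng.normal(size=(Nx, Ne))        # prior ensemble
H = rng.normal(size=(Ny, Nx))         # hypothetical linear observation operator
HXf = H @ Xf
Y = rng.normal(size=Ny)               # observations
R = rng.uniform(0.5, 1.5, size=Ny)    # observation error variances

mX = Xf.mean(axis=1)
Xfp = Xf - mX[:, None]
mY = HXf.mean(axis=1)
HXp = HXf - mY[:, None]
d = Y - mY

# Mean weight in ensemble space, following the steps in ETKF above
A2 = (Ne - 1) * np.identity(Ne) + HXp.T @ (HXp / R[:, None])
eigs, ev = np.linalg.eigh(A2)
wm = ev @ ((ev.T @ (HXp.T @ (d / R))) / eigs)
mean_etkf = mX + Xfp @ wm

# Textbook Kalman update of the mean with the explicit gain
PHT = Xfp @ HXp.T / (Ne - 1)
HPHT = HXp @ HXp.T / (Ne - 1)
K = PHT @ np.linalg.inv(HPHT + np.diag(R))
mean_kalman = mX + K @ d

print(np.allclose(mean_etkf, mean_kalman))
```

The perturbation weights `Wp` do not enter this comparison: the vector of ones is an eigenvector of `A2` with eigenvalue `Ne-1`, so `Wp` maps it to itself and the mean-free columns of `Xfp` contribute nothing to the analysis mean.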


--------------------------------------------------------------------------------
/kalmanfilters/etkf_livings.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | def ETKF_livings(Xf, HXf, Y, R):
 4 |     """
 5 |     Adaptation of the ETKF proposed by David Livings (2005)
 6 |     
 7 |     Implementation adapted from
 8 |     "State-of-the-art stochastic data assimilation methods" by Vetra-Carvalho et al. (2018).
 9 |     
10 |     Dimensions: N_e: ensemble size, N_y: number of observations, N_x: state vector size (gridboxes x assimilated variables)
11 |     
12 |     Input:
13 |     - Xf:  the prior ensemble (N_x x N_e) 
14 |     - R: Measurement Error (Variance of pseudoproxy time series) ($N_y$ x 1) -> converted to Ny x Ny matrix
15 |     - HX^f: Model value projected into observation space/at proxy locations ($N_y$ x $N_e$)
16 |     - Y: Observation vector ($N_y$ x 1)
17 | 
18 |     Output:
19 |     - Analysis ensemble (N_x, N_e)
20 |     """
21 |     # number of ensemble members
22 |     Ne=np.shape(Xf)[1]
23 |     Ny=np.shape(Y)[0]
24 | 
25 |     #Obs error matrix
26 |     Rmat=np.diag(R)
27 |     Rmat_inv=np.diag(1/R)
28 |     #Mean of prior ensemble for each gridbox   
29 |     mX = np.mean(Xf, axis=1)
30 |     #Perturbations from ensemble mean
31 |     Xfp=Xf-mX[:,None]
32 |     #Mean and perturbations for model values in observation space
33 |     mY = np.mean(HXf, axis=1)
34 |     HXp = HXf-mY[:,None]
35 |     
36 |     #Scaling of perturbations proposed by Livings (2005), numerical stability
37 |     S_hat=np.diag(1/np.sqrt(R)) @ HXp/np.sqrt(Ne-1)
38 |     
39 |     #svd of S_hat transposed
40 |     U,s,Vh=np.linalg.svd(S_hat.T)
41 |     
42 |     C=Rmat_inv @ HXp
43 |     #recreate singular value matrix
44 |     Sig=np.zeros((Ne,Ny))
45 |     np.fill_diagonal(Sig,s)
46 |     
47 |     #perturbation weight (the diagonal matrix has len(s)=min(Ne,Ny) entries, so this step assumes Ne <= Ny)
48 |     mat=np.diag(1/np.sqrt(1+np.square(s)))
49 |     Wp1=mat @ U.T
50 |     Wp=U @ Wp1
51 |     
52 |     #innovation
53 |     d=Y-mY
54 |     #mean weight
55 |     D = np.diag(1/np.sqrt(R)) @ d
56 |     D2= Vh @ D
57 |     D3 = np.diag(1/(1+np.square(s))) @ Sig @ D2
58 |     wm= U @ D3 / np.sqrt(Ne-1)
59 | 
60 |     #adding pert and mean (!row-major formulation in Python!)
61 |     W=Wp + wm[:,None]
62 | 
63 |     #final adding up (most costly operation)
64 |     Xa=mX[:,None] + Xfp @ W
65 |     
66 |     return Xa
67 | 


--------------------------------------------------------------------------------
/kalmanfilters/senkf.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | def SEnKF(Xf, HXf, Y, R):
 4 |     """
 5 |     Stochastic Ensemble Kalman Filter
 6 |     Implementation adapted from pseudocode description in
 7 |     "State-of-the-art stochastic data assimilation methods" by Vetra-Carvalho et al. (2018).
 8 |     
 9 |     Changes: The pseudocode is not consistent with the description in 5.1, where the obs-from-model are perturbed; in the pseudocode it is the other way round.
10 |     Hence the 8th line D= ... is confusing if we generate Y as described in the text.
11 |     Last line needs to have 1/(Ne-1) (+always better to do that on the smaller matrix)
12 |     
13 |     Input:
14 |     - Xf:  the prior ensemble (N_x x N_e) 
15 |     - R: Measurement Error (Variance of pseudoproxy time series) (N_y x 1) -> converted to Ny x Ny matrix
16 |     - HX^f: Model value projected into observation space/at proxy locations (N_y x N_e)
17 |     - Y: Observation vector (N_y x 1)
18 | 
19 |     Output:
20 |     - Analysis ensemble (N_x, N_e)
21 |     """
22 |     # number of ensemble members
23 |     Ne=np.shape(Xf)[1]
24 |     Ny=np.shape(R)[0]
25 |     #Obs error matrix
26 |     Rmat=np.diag(R)
27 |     #Mean of prior ensemble for each gridbox   
28 |     mX = np.mean(Xf, axis=1)
29 |     #Perturbations from ensemble mean
30 |     Xfp=Xf-mX[:,None]
31 |     #Mean and perturbations for model values in observation space
32 |     mY = np.mean(HXf, axis=1)
33 |     HXp = HXf-mY[:,None]
34 | 
35 |     HPH=HXp@HXp.T /(Ne-1)
36 | 
37 |     A=HPH + Rmat
38 | 
39 |     rng = np.random.default_rng(seed=42)
40 |     Y_p=rng.standard_normal((Ny, Ne))*np.sqrt(R)[:,None]
41 | 
42 |     D= Y[:,None]+Y_p - HXf
43 |     
44 |     #solve linear system for getting inverse
45 |     C=np.linalg.solve(A,D)
46 |     
47 |     E=HXp.T @ C
48 |     
49 |     Xa=Xf+Xfp@(E/(Ne-1))
50 |     
51 |     return Xa
52 | 


--------------------------------------------------------------------------------
/kalmanfilters/senkf_loc.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | def SEnKF_loc(Xf, HXf, Y, R, PH_loc, HPH_loc):
 4 |     """
 5 |     Stochastic Ensemble Kalman Filter that can do localisation. Changed the order of calculations.
 6 |     Implementation adapted from pseudocode description in "State-of-the-art stochastic data assimilation methods" by Vetra-Carvalho et al. (2018).
 7 |     For the calculation of PH_loc/HPH_loc, see the function in ensrf_direct_loc.py.
 8 |     
 9 |     Changes: The pseudocode is not consistent with the description in 5.1, where the obs-from-model are perturbed; in the pseudocode it is the other way round.
10 |     Hence the 8th line D= ... is confusing if we generate Y as described in the text.
11 |     Last line needs to have 1/(Ne-1)
12 |     """
13 |     # number of ensemble members
14 |     Ne=np.shape(Xf)[1]
15 |     Ny=np.shape(R)[0]
16 |     #Obs error matrix
17 |     Rmat=np.diag(R)
18 |     #Mean of prior ensemble for each gridbox   
19 |     mX = np.mean(Xf, axis=1)
20 |     #Perturbations from ensemble mean
21 |     Xfp=Xf-mX[:,None]
22 |     #Mean and perturbations for model values in observation space
23 |     mY = np.mean(HXf, axis=1)
24 |     HXp = HXf-mY[:,None]
25 | 
26 |     #Hadamard product for localisation
27 |     HPH=HPH_loc * (HXp@HXp.T /(Ne-1))
28 | 
29 |     A=HPH + Rmat
30 | 
31 |     rng = np.random.default_rng(seed=42)
32 |     Y_p=rng.standard_normal((Ny, Ne))*np.sqrt(R)[:,None]
33 | 
34 |     D= Y[:,None]+Y_p - HXf
35 |     
36 |     #solve linear system for getting inverse
37 |     C=np.linalg.solve(A,D)
38 |     
39 |     Pb=PH_loc*(Xfp @ HXp.T/(Ne-1)) 
40 |     
41 |     Xa=Xf + Pb @ C
42 |     
43 |     return Xa
44 | 


--------------------------------------------------------------------------------
/testdata/HXf.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mchoblet/ensemblefilters/788d7b1a38f6d4e040b9579443bc204c83030505/testdata/HXf.npy


--------------------------------------------------------------------------------
/testdata/R.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mchoblet/ensemblefilters/788d7b1a38f6d4e040b9579443bc204c83030505/testdata/R.npy


--------------------------------------------------------------------------------
/testdata/Xf.npz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mchoblet/ensemblefilters/788d7b1a38f6d4e040b9579443bc204c83030505/testdata/Xf.npz


--------------------------------------------------------------------------------
/testdata/Y.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mchoblet/ensemblefilters/788d7b1a38f6d4e040b9579443bc204c83030505/testdata/Y.npy


--------------------------------------------------------------------------------