├── .gitignore ├── LICENSE ├── README.md ├── independence_testing ├── CorrTestObject.py ├── ExampleData.txt ├── ExampleHSIC.py ├── ExperimentsHSICBlock.py ├── ExperimentsHSICPermutation.py ├── ExperimentsHSICSpectral.py ├── HSICBlockTestObject.py ├── HSICPermutationTestObject.py ├── HSICSpectralTestObject.py ├── HSICTestObject.py ├── SimDataGen.py ├── SubHSICTestObject.py ├── TestExperiment.py ├── TestObject.py └── __init__.py ├── kerpy ├── BagKernel.py ├── BrownianKernel.py ├── GaussianBagKernel.py ├── GaussianKernel.py ├── HypercubeKernel.py ├── Kernel.py ├── LinearBagKernel.py ├── LinearKernel.py ├── MaternKernel.py ├── PolynomialKernel.py ├── ProductKernel.py ├── SumKernel.py └── __init__.py ├── setup.py ├── tools ├── GenericTests.py ├── ProcessingObject.py ├── UnitTests.py ├── __init__.py ├── read_and_plot_test_results.py └── read_test_results.py └── weak_conditional_independence_testing ├── BH_prewhiten.csv ├── Ozone_prewhiten.csv ├── PCalg_twostep_flags.py ├── SimDataGen.py ├── SyntheticDim_KRESIT.py ├── Synthetic_DAGexample.csv ├── TwoStepCondTestObject.py ├── WHO_KRESITvsRESIT.py ├── WHO_dataset.csv └── __init__.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.project 3 | *.pydevproject 4 | *~ 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2016 oxmlcs 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | * 7 | 1. Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 2. Redistributions in binary form must reproduce the above copyright notice, 10 | this list of conditions and the following disclaimer in the documentation 11 | and/or other materials provided with the distribution. 12 | * 13 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 14 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 15 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 16 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 17 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 18 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 19 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 20 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 21 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 22 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 23 | * 24 | The views and conclusions contained in the software and documentation are those 25 | of the authors and should not be interpreted as representing official policies, 26 | either expressed or implied, of the author. 27 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # kerpy 2 | python code framework for kernel methods in hypothesis testing. 
3 | Some of the code for kernel computation was adapted from https://github.com/karlnapf/kameleon-mcmc 4 | 5 | To set it up as a package, run in a terminal: 6 | 7 | ``python setup.py develop`` 8 | 9 | 10 | ### independence_testing 11 | 12 | Code for HSIC-based large-scale independence tests. The methods are described in: 13 | 14 | Q. Zhang, S. Filippi, A. Gretton, and D. Sejdinovic, __Large-Scale Kernel Methods for Independence Testing__, _Statistics and Computing_, to appear, 2017. [url](http://link.springer.com/article/10.1007%2Fs11222-016-9721-7) 15 | 16 | For an example use of the code, demonstrating how to run an HSIC-based large-scale independence test on either simulated data or data loaded from a file, see ExampleHSIC.py. 17 | 18 | To reproduce results from the paper, see ExperimentsHSICPermutation.py, ExperimentsHSICSpectral.py and ExperimentsHSICBlock.py. 19 | 20 | 21 | 22 | ### weak_conditional_independence_testing 23 | 24 | Code for feature-to-feature regression for a two-step conditional independence test (i.e. testing for weak conditional independence). The methods are described in: 25 | 26 | Q. Zhang, S. Filippi, S. Flaxman, and D. Sejdinovic, __Feature-to-Feature Regression for a Two-Step Conditional Independence Test__, UAI, 2017. 27 | 28 | 29 | To reproduce results from the paper, see WHO_KRESITvsRESIT.py, SyntheticDim_KRESIT.py and PCalg_twostep_flags.py (the node names correspond to the variables in the Synthetic_DAGexample.csv file; to run it on the Boston Housing data (BH_prewhiten.csv) or the Ozone data (Ozone_prewhiten.csv), simply change the label names in the script). -------------------------------------------------------------------------------- /independence_testing/CorrTestObject.py: -------------------------------------------------------------------------------- 1 | from TestObject import TestObject 2 | from numpy import shape, zeros 3 | from scipy.stats import pearsonr 4 | import time 5 | from numpy.random import permutation 6 | 7 | 8 | class CorrTestObject(TestObject): 9 | def __init__(self, num_samples, data_generator, streaming=False, freeze_data=False,num_shuffles=1000): 10 | TestObject.__init__(self,self.__class__.__name__,streaming=streaming, freeze_data=freeze_data) 11 | self.num_samples = num_samples #We have same number of samples from X and Y in independence testing 12 | self.data_generator = data_generator 13 | self.num_shuffles = num_shuffles 14 | 15 | 16 | def generate_data(self): 17 | self.data_x, self.data_y = self.data_generator(self.num_samples) 18 | return self.data_x, self.data_y 19 | 20 | 21 | def SubCorr_statistic(self,data_x=None,data_y=None): 22 | if data_x is None: 23 | data_x=self.data_x 24 | if data_y is None: 25 | data_y=self.data_y 26 | dx = shape(data_x)[1] 27 | stats_value = zeros(dx) 28 | for dd in range(dx): 29 | stats_value[dd] = pearsonr(data_x[:,[dd]],data_y)[0]**2 30 | SubCorr = sum(stats_value)/float(dx) 31 | return SubCorr 32 | 33 | 34 | def compute_pvalue_with_time_tracking(self,data_x = None, data_y = None): 35 | if data_x is None and data_y is None: 36 | if not self.streaming and not self.freeze_data: 37 | start = time.clock() 38 | self.generate_data() 39 | data_generating_time = time.clock()-start 40 | data_x = self.data_x 41 | data_y = self.data_y 42 | else: 43 | data_generating_time = 0. 44 | else: 45 | data_generating_time = 0.
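# The permutation test below: the rows of Y are shuffled num_shuffles times and
# the SubCorr statistic is recomputed on each shuffle; the p-value is the
# fraction of permuted statistics exceeding the observed one, i.e. a Monte
# Carlo approximation of the null distribution under independence.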
46 | print 'data generating time passed: ', data_generating_time 47 | SubCorr_statistic = self.SubCorr_statistic(data_x=data_x,data_y=data_y) 48 | null_samples=zeros(self.num_shuffles) 49 | for jj in range(self.num_shuffles): 50 | pp = permutation(self.num_samples) 51 | yy = data_y[pp,:] #permute the local data_y (self.data_y may be unset when data is passed in) 52 | null_samples[jj]=self.SubCorr_statistic(data_x = data_x, data_y = yy) 53 | pvalue = ( sum( null_samples > SubCorr_statistic ) ) / float( self.num_shuffles ) 54 | return pvalue, data_generating_time 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /independence_testing/ExampleHSIC.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Example script for running large-scale independence tests with HSIC 3 | https://github.com/oxmlcs/kerpy 4 | ''' 5 | 6 | 7 | #adding relevant folder to your pythonpath 8 | import os, sys 9 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' ) 10 | sys.path.append(BASE_DIR) 11 | 12 | 13 | from kerpy.GaussianKernel import GaussianKernel 14 | from SimDataGen import SimDataGen 15 | from HSICTestObject import HSICTestObject 16 | from numpy import shape,savetxt,loadtxt,transpose,reshape,concatenate 17 | from independence_testing.HSICSpectralTestObject import HSICSpectralTestObject 18 | from independence_testing.HSICBlockTestObject import HSICBlockTestObject 19 | 20 | ''' 21 | Given a data set data_x and data_y of paired observations, 22 | we wish to test the hypothesis of independence between the two. 23 | If dealing with vectorial data, data_x and data_y should be 2d-numpy arrays of shape (n,dim), 24 | where n is the number of observations and dim is the dimension of these observations 25 | --- note: one-dimensional observations should also be in a 2d-numpy array format (n,1) 26 | ''' 27 | 28 | 29 | 30 | #here we simulate a dataset of size 'num_samples' in the correct format 31 | num_samples = 10000 32 | data_x, data_y = SimDataGen.LargeScale(num_samples, dimension=20) 33 | #SimDataGen.py contains more examples of data generating functions 34 | 35 | 36 | ''' 37 | # Alternatively, we can load a dataset from a file as follows: 38 | #-- here file is assumed to be a num_samples by (dimx+dimy) table 39 | data = loadtxt('ExampleData.txt') 40 | num_samples,D = shape(data) 41 | #assume that x corresponds to all but the last column in the file 42 | data_x = data[:,:(D-1)] 43 | #and that y is just the last column 44 | data_y = data[:,D-1] 45 | #need to ensure data_y is a 2d array 46 | data_y=reshape(data_y,(num_samples,1)) 47 | ''' 48 | 49 | 50 | print "shape of data_x:", shape(data_x) 51 | print "shape of data_y:", shape(data_y) 52 | 53 | ''' 54 | First, we need to specify the kernels for X and Y. We will use Gaussian kernels -- the default value of the width parameter is 1.0; 55 | the widths can either be kept fixed or set by the median heuristic based on the data when running a test. 56 | ''' 57 | kernelX=GaussianKernel() 58 | kernelY=GaussianKernel() 59 | 60 | 61 | 62 | 63 | ''' 64 | HSICSpectralTestObject/HSICPermutationTestObject: 65 | ================================================= 66 | num_samples: Integer value -- the number of data samples 67 | data_generator: If we use simulated data, which function to use to generate data for repeated tests to investigate power; 68 | Examples are given in SimDataGen.py, e.g. data_generator = SimDataGen.LargeScale; 69 | Default value is None (if only a single test will be run).
70 | 71 | kernelX, kernelY: The kernel functions to use for X and Y respectively. (Examples are included in kerpy folder) 72 | E.g. kernelX = GaussianKernel(); alternatively, for a kernel with fixed width: kernelY = GaussianKernel(float(1.5)) 73 | kernelX_use_median, 74 | kernelY_use_median: "True" or "False" -- if median heuristic should be used to select the kernel bandwidth. 75 | 76 | rff: "True" or "False" -- if random Fourier Features should be used. 77 | num_rfx, num_rfy: Even integer values -- the number of random features for X and Y respectively. 78 | 79 | induce_set: "True" or "False" -- if Nystrom method should be used. 80 | num_inducex, num_inducey: Integer values -- the number of inducing variables for X and Y respectively. 81 | 82 | num_nullsims: An integer value -- the number of simulations from the null distribution for spectral approach. 83 | num_shuffles: An integer value -- the number of shuffles for permutation approach. 84 | unbiased: "True" or "False" -- if unbiased HSIC test statistics is preferred. 85 | 86 | 87 | HSICBlockTestObject: 88 | ==================== 89 | blocksize: Integer value -- the size of each block. 90 | nullvarmethod: "permutation", "direct" or "across" -- the method of estimating the null variance. 91 | Refer to the paper for more details of each. 92 | ''' 93 | 94 | 95 | #example usage of HSIC spectral test with random Fourier feature approximation 96 | myspectralobject = HSICSpectralTestObject(num_samples, kernelX=kernelX, kernelY=kernelY, 97 | kernelX_use_median=True, kernelY_use_median=True, 98 | rff=True, num_rfx=20, num_rfy=20, num_nullsims=1000) 99 | pvalue = myspectralobject.compute_pvalue(data_x, data_y) 100 | 101 | print "Spectral test p-value:", pvalue 102 | 103 | #example usage of HSIC block test: 104 | myblockobject = HSICBlockTestObject(num_samples, kernelX=kernelX, kernelY=kernelY, 105 | kernelX_use_median=True, kernelY_use_median=True, 106 | blocksize=50, nullvarmethod='permutation') 107 | pvalue = myblockobject.compute_pvalue(data_x, data_y) 108 | 109 | print "Block test p-value:", pvalue -------------------------------------------------------------------------------- /independence_testing/ExperimentsHSICBlock.py: -------------------------------------------------------------------------------- 1 | ''' 2 | adding relevant folder to your pythonpath 3 | ''' 4 | import os, sys 5 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' ) 6 | sys.path.append(BASE_DIR) 7 | 8 | from kerpy.GaussianKernel import GaussianKernel 9 | from HSICTestObject import HSICTestObject 10 | from HSICBlockTestObject import HSICBlockTestObject 11 | from TestExperiment import TestExperiment 12 | from SimDataGen import SimDataGen 13 | from tools.ProcessingObject import ProcessingObject 14 | 15 | #example use: python ExperimentsHSICBlock.py 500 --dimX 3 --kernelX_use_median --kernelY_use_median --blocksize 10 16 | 17 | data_generating_function = SimDataGen.LargeScale 18 | data_generating_function_null = SimDataGen.turn_into_null(SimDataGen.LargeScale) 19 | args = ProcessingObject.parse_arguments() 20 | 21 | '''unpack the arguments needed:''' 22 | num_samples=args.num_samples 23 | hypothesis=args.hypothesis 24 | dimX = args.dimX 25 | kernelX_use_median = args.kernelX_use_median 26 | kernelY_use_median = args.kernelY_use_median 27 | blocksize = args.blocksize 28 | #currently, we are using the same blocksize for both X and Y 29 | 30 | # A temporary set up for the kernels: 31 | kernelX = GaussianKernel(1.) 32 | kernelY = GaussianKernel(1.) 
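# Note: the width 1.0 above is only a placeholder -- when the --kernelX_use_median /
# --kernelY_use_median flags are passed, HSICBlockTestObject re-estimates the
# bandwidths from the data before testing, along the lines of:
#   sigmax = kernelX.get_sigma_median_heuristic(data_x)
#   kernelX.set_width(float(sigmax))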
33 | 34 | if hypothesis=="alter": 35 | data_generator=lambda num_samples: data_generating_function(num_samples,dimension=dimX) 36 | elif hypothesis=="null": 37 | data_generator=lambda num_samples: data_generating_function_null(num_samples,dimension=dimX) 38 | else: 39 | raise NotImplementedError() 40 | 41 | 42 | test_object=HSICBlockTestObject(num_samples, data_generator, kernelX, kernelY, 43 | kernelX_use_median=kernelX_use_median,kernelY_use_median=kernelY_use_median, 44 | nullvarmethod='permutation', 45 | blocksize=blocksize) 46 | 47 | name = os.path.basename(__file__).rstrip('.py')+'_LS_'+hypothesis+'_d_'+str(dimX)+'_B_'+str(blocksize)+'_n_'+str(num_samples) 48 | 49 | param={'name': name,\ 50 | 'kernelX': kernelX,\ 51 | 'kernelY': kernelY,\ 52 | 'blocksize':blocksize,\ 53 | 'data_generator': data_generator.__name__,\ 54 | 'hypothesis':hypothesis,\ 55 | 'num_samples': num_samples} 56 | 57 | 58 | experiment=TestExperiment(name, param, test_object) 59 | 60 | numTrials = 100 61 | alpha=0.05 62 | experiment.run_test_trials(numTrials, alpha=alpha) 63 | -------------------------------------------------------------------------------- /independence_testing/ExperimentsHSICPermutation.py: -------------------------------------------------------------------------------- 1 | ''' 2 | adding relevant folder to your pythonpath 3 | ''' 4 | import os, sys 5 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' ) 6 | sys.path.append(BASE_DIR) 7 | 8 | from kerpy.GaussianKernel import GaussianKernel 9 | from HSICTestObject import HSICTestObject 10 | from TestExperiment import TestExperiment 11 | from SimDataGen import SimDataGen 12 | from tools.ProcessingObject import ProcessingObject 13 | from HSICPermutationTestObject import HSICPermutationTestObject 14 | 15 | #example use: python ExperimentsHSICPermutation.py 500 --dimX 3 --hypothesis null --rff --num_rfx 50 --num_rfy 50 16 | 17 | data_generating_function = SimDataGen.VaryDimension 18 | data_generating_function_null = SimDataGen.turn_into_null(SimDataGen.VaryDimension) 19 | args = ProcessingObject.parse_arguments() 20 | 21 | '''unpack the arguments needed:''' 22 | num_samples=args.num_samples 23 | hypothesis=args.hypothesis 24 | dimX = args.dimX 25 | kernelX_use_median = args.kernelX_use_median 26 | kernelY_use_median = args.kernelY_use_median 27 | rff=args.rff 28 | num_rfx = args.num_rfx 29 | num_rfy = args.num_rfy 30 | induce_set = args.induce_set 31 | num_inducex = args.num_inducex 32 | num_inducey = args.num_inducey 33 | num_shuffles = args.num_shuffles 34 | 35 | 36 | # A temporary set up for the kernels: 37 | kernelX = GaussianKernel(1.) 38 | kernelY = GaussianKernel(1.) 
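# Under --hypothesis null (handled below), turn_into_null wraps the data
# generator so that the returned Y sample is randomly permuted relative to X;
# this destroys any dependence while preserving both marginals, i.e. the data
# are drawn from a distribution satisfying H0.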
39 | 40 | 41 | if hypothesis=="alter": 42 | data_generator=lambda num_samples: data_generating_function(num_samples,dimension=dimX) 43 | elif hypothesis=="null": 44 | data_generator=lambda num_samples: data_generating_function_null(num_samples,dimension=dimX) 45 | else: 46 | raise NotImplementedError() 47 | 48 | 49 | 50 | 51 | test_object=HSICPermutationTestObject(num_samples, data_generator, kernelX, kernelY, 52 | kernelX_use_median=kernelX_use_median,kernelY_use_median=kernelY_use_median, 53 | num_rfx=num_rfx,num_rfy=num_rfy, unbiased=False, rff=rff, 54 | induce_set=induce_set,num_inducex=num_inducex,num_inducey=num_inducey, 55 | num_shuffles=num_shuffles) 56 | 57 | 58 | if rff: 59 | name = os.path.basename(__file__).rstrip('.py')+'_VD_'+hypothesis+'_d_'+str(dimX)+\ 60 | '_shuffles_'+str(num_shuffles)+'_rff_'+str(num_rfx)+str(num_rfy)+'_n_'+str(num_samples) 61 | elif induce_set: 62 | name = os.path.basename(__file__).rstrip('.py')+'_VD_'+hypothesis+'_d_'+str(dimX)+\ 63 | '_shuffles_'+str(num_shuffles)+'_induce_'+str(num_inducex)+str(num_inducey)+'_n_'+str(num_samples) 64 | else: 65 | name = os.path.basename(__file__).rstrip('.py')+'_VD_'+hypothesis+'_d_'+str(dimX)+\ 66 | '_shuffles_'+str(num_shuffles)+'_n_'+str(num_samples) 67 | 68 | 69 | 70 | param={'name': name,\ 71 | 'dim': dimX,\ 72 | 'kernelX': kernelX,\ 73 | 'kernelY': kernelY,\ 74 | 'num_rfx': num_rfx,\ 75 | 'num_rfy': num_rfy,\ 76 | 'num_inducex': num_inducex,\ 77 | 'num_inducey': num_inducey,\ 78 | 'data_generator': data_generator.__name__,\ 79 | 'hypothesis':hypothesis,\ 80 | 'num_samples': num_samples} 81 | 82 | 83 | experiment=TestExperiment(name, param, test_object) 84 | 85 | numTrials = 100 86 | alpha=0.05 87 | experiment.run_test_trials(numTrials, alpha=alpha) -------------------------------------------------------------------------------- /independence_testing/ExperimentsHSICSpectral.py: -------------------------------------------------------------------------------- 1 | ''' 2 | adding relevant folder to your pythonpath 3 | ''' 4 | import os, sys 5 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' ) 6 | #print BASE_DIR 7 | sys.path.append(BASE_DIR) 8 | #print sys.path.append(BASE_DIR) 9 | 10 | from kerpy.GaussianKernel import GaussianKernel 11 | from kerpy.BrownianKernel import BrownianKernel 12 | from HSICTestObject import HSICTestObject 13 | from HSICSpectralTestObject import HSICSpectralTestObject 14 | from TestExperiment import TestExperiment 15 | from SimDataGen import SimDataGen 16 | from tools.ProcessingObject import ProcessingObject 17 | 18 | #example usage: python ExperimentsHSICSpectral.py 500 --dimX 4 --hypothesis null --rff --num_rfx 50 --num_rfy 50 19 | # the above says that 500 samples; null hypothesis; rff True; 50 random Fourier features for X and Y. 
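# With --rff, the n x n Gram matrices are replaced by random Fourier feature
# expansions (see Kernel.rff_expand), reducing memory from O(n^2) to O(n*m) for
# m random features; e.g. phix = kernelX.rff_expand(data_x) has shape (n, num_rfx)
# for even num_rfx, since cosine and sine features are concatenated.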
20 | 21 | 22 | data_generating_function = SimDataGen.VaryDimension 23 | data_generating_function_null = SimDataGen.turn_into_null(SimDataGen.VaryDimension) 24 | #data_generating_function = SimDataGen.LargeScale 25 | #data_generating_function_null = SimDataGen.turn_into_null(SimDataGen.LargeScale) 26 | args = ProcessingObject.parse_arguments() 27 | 28 | '''unpack the arguments needed:''' 29 | num_samples=args.num_samples 30 | hypothesis=args.hypothesis 31 | dimX = args.dimX 32 | kernelX_use_median = args.kernelX_use_median 33 | kernelY_use_median = args.kernelY_use_median 34 | rff=args.rff 35 | num_rfx = args.num_rfx 36 | num_rfy = args.num_rfy 37 | induce_set = args.induce_set 38 | num_inducex = args.num_inducex 39 | num_inducey = args.num_inducey 40 | 41 | 42 | # This will only be a temporary set up for the kernels when we use median heuristics. 43 | #kernelX = GaussianKernel(1.) 44 | #kernelY = GaussianKernel(1.) 45 | 46 | # Brownian kernel with H = 0.5 equivalently alpha = 1.0 47 | kernelX = BrownianKernel(1.) 48 | kernelY = BrownianKernel(1.) 49 | 50 | if hypothesis=="alter": 51 | data_generator=lambda num_samples: data_generating_function(num_samples,dimension=dimX) 52 | elif hypothesis=="null": 53 | data_generator=lambda num_samples: data_generating_function_null(num_samples,dimension=dimX) 54 | else: 55 | raise NotImplementedError() 56 | 57 | 58 | test_object=HSICSpectralTestObject(num_samples, data_generator, kernelX, kernelY, 59 | kernelX_use_median=kernelX_use_median,kernelY_use_median=kernelY_use_median, 60 | rff = rff, num_rfx=num_rfx,num_rfy=num_rfy, unbiased=False, 61 | induce_set = induce_set, num_inducex = num_inducex, num_inducey = num_inducey) 62 | 63 | 64 | # file name of the results 65 | if rff: 66 | name = os.path.basename(__file__).rstrip('.py')+'Bsine'+'_'+hypothesis+'_rff_'+str(num_rfx)+\ 67 | str(num_rfy)+'_d_'+str(dimX)+'_n_'+str(num_samples) 68 | elif induce_set: 69 | name = os.path.basename(__file__).rstrip('.py')+'Bsine'+'_'+hypothesis+'_induce_'+str(num_inducex)+\ 70 | str(num_inducey)+'_d_'+str(dimX)+'_n_'+str(num_samples) 71 | else: 72 | name = os.path.basename(__file__).rstrip('.py')+'Bsine'+'_'+hypothesis+'_d_'+str(dimX)+'_n_'+str(num_samples) 73 | 74 | param={'name': name,\ 75 | 'dim': dimX,\ 76 | 'kernelX': kernelX,\ 77 | 'kernelY': kernelY,\ 78 | 'num_rfx': num_rfx,\ 79 | 'num_rfy': num_rfy,\ 80 | 'num_inducex': num_inducex,\ 81 | 'num_inducey': num_inducey,\ 82 | 'data_generator': data_generator.__name__,\ 83 | 'hypothesis':hypothesis,\ 84 | 'num_samples': num_samples} 85 | 86 | experiment=TestExperiment(name, param, test_object) 87 | 88 | numTrials = 100 89 | alpha=0.05 90 | experiment.run_test_trials(numTrials, alpha=alpha) 91 | 92 | -------------------------------------------------------------------------------- /independence_testing/HSICBlockTestObject.py: -------------------------------------------------------------------------------- 1 | from TestObject import TestObject 2 | from HSICTestObject import HSICTestObject 3 | from numpy import mean, sum, zeros, var, sqrt 4 | from scipy.stats import norm 5 | import time 6 | 7 | class HSICBlockTestObject(HSICTestObject): 8 | def __init__(self,num_samples, data_generator=None, kernelX=None, kernelY=None, 9 | kernelX_use_median=False,kernelY_use_median=False, 10 | rff=False, num_rfx=None, num_rfy=None, 11 | blocksize=50, streaming=False, nullvarmethod='permutation', freeze_data=False): 12 | HSICTestObject.__init__(self, num_samples, data_generator=data_generator, kernelX=kernelX, kernelY=kernelY, 13 | 
kernelX_use_median=kernelX_use_median,kernelY_use_median=kernelY_use_median, 14 | rff=rff, streaming=streaming, num_rfx=num_rfx, num_rfy=num_rfy, 15 | freeze_data=freeze_data) 16 | self.blocksize = blocksize 17 | #self.blocksizeY = blocksizeY 18 | self.nullvarmethod = nullvarmethod 19 | 20 | def compute_pvalue_with_time_tracking(self,data_x=None,data_y=None): 21 | if data_x is None and data_y is None: 22 | if not self.streaming and not self.freeze_data: 23 | start = time.clock() 24 | self.generate_data() 25 | data_generating_time = time.clock()-start 26 | data_x = self.data_x 27 | data_y = self.data_y 28 | else: 29 | data_generating_time = 0. 30 | else: 31 | data_generating_time = 0. 32 | #print 'Total block data generating time passed: ', data_generating_time 33 | if self.kernelX_use_median: 34 | sigmax = self.kernelX.get_sigma_median_heuristic(data_x) 35 | self.kernelX.set_width(float(sigmax)) 36 | if self.kernelY_use_median: 37 | sigmay = self.kernelY.get_sigma_median_heuristic(data_y) 38 | self.kernelY.set_width(float(sigmay)) 39 | num_blocks = int(( self.num_samples ) // self.blocksize) 40 | block_statistics = zeros(num_blocks) 41 | null_samples = zeros(num_blocks) 42 | null_varx = zeros(num_blocks) 43 | null_vary = zeros(num_blocks) 44 | for bb in range(num_blocks): 45 | if self.streaming: 46 | data_xb, data_yb = self.data_generator(self.blocksize, self.blocksize) 47 | else: 48 | data_xb = data_x[(bb*self.blocksize):((bb+1)*self.blocksize)] 49 | data_yb = data_y[(bb*self.blocksize):((bb+1)*self.blocksize)] 50 | if self.nullvarmethod == 'permutation': 51 | block_statistics[bb], null_samples[bb], _, _, _, _, _ = \ 52 | self.HSICmethod(data_x=data_xb, data_y=data_yb, unbiased=True, num_shuffles=1, estimate_nullvar=False,isBlockHSIC=True) 53 | elif self.nullvarmethod == 'direct': 54 | block_statistics[bb], _, null_varx[bb], null_vary[bb], _, _, _ = \ 55 | self.HSICmethod(data_x=data_xb, data_y=data_yb, unbiased=True, num_shuffles=0, estimate_nullvar=True,isBlockHSIC=True) 56 | elif self.nullvarmethod == 'across': 57 | block_statistics[bb], _, _, _, _, _, _ = \ 58 | self.HSICmethod(data_x=data_xb, data_y=data_yb, unbiased=True, num_shuffles=0, estimate_nullvar=False,isBlockHSIC=True) 59 | else: 60 | raise NotImplementedError() 61 | BTest_Statistic = sum(block_statistics) / float(num_blocks) 62 | #print BTest_Statistic 63 | if self.nullvarmethod == 'permutation': 64 | BTest_NullVar = self.blocksize**2*var(null_samples) 65 | elif self.nullvarmethod == 'direct': 66 | overall_varx = mean(null_varx) 67 | overall_vary = mean(null_vary) 68 | BTest_NullVar = 2.*overall_varx*overall_vary 69 | elif self.nullvarmethod == 'across': 70 | BTest_NullVar = var(block_statistics) 71 | #print BTest_NullVar 72 | Z_score = sqrt(self.num_samples*self.blocksize)*BTest_Statistic / sqrt(BTest_NullVar) 73 | #print Z_score 74 | pvalue = norm.sf(Z_score) 75 | return pvalue, data_generating_time 76 | 77 | 78 | -------------------------------------------------------------------------------- /independence_testing/HSICPermutationTestObject.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Created on 17 Nov 2015 3 | 4 | @author: qinyi 5 | ''' 6 | from HSICTestObject import HSICTestObject 7 | import time 8 | 9 | 10 | class HSICPermutationTestObject(HSICTestObject): 11 | 12 | def __init__(self, num_samples, data_generator=None, kernelX=None, kernelY=None, kernelX_use_median=False, 13 | kernelY_use_median=False, num_rfx=None, num_rfy=None, rff=False, 14 | induce_set=False, 
num_inducex = None, num_inducey = None, num_shuffles=1000, unbiased=True): 15 | HSICTestObject.__init__(self, num_samples, data_generator=data_generator, kernelX=kernelX, kernelY=kernelY, 16 | kernelX_use_median=kernelX_use_median,kernelY_use_median=kernelY_use_median, 17 | num_rfx=num_rfx, num_rfy=num_rfy, rff=rff,induce_set=induce_set, 18 | num_inducex = num_inducex, num_inducey = num_inducey) 19 | self.num_shuffles = num_shuffles 20 | self.unbiased = unbiased 21 | 22 | 23 | def compute_pvalue_with_time_tracking(self,data_x=None,data_y=None): 24 | if data_x is None and data_y is None: 25 | if not self.streaming and not self.freeze_data: 26 | start = time.clock() 27 | self.generate_data() 28 | data_generating_time = time.clock()-start 29 | data_x = self.data_x 30 | data_y = self.data_y 31 | else: 32 | data_generating_time = 0. 33 | else: 34 | data_generating_time = 0. 35 | print 'Permutation data generating time passed: ', data_generating_time 36 | hsic_statistic, null_samples, _, _, _, _, _ = self.HSICmethod(unbiased=self.unbiased,num_shuffles=self.num_shuffles, 37 | data_x = data_x, data_y = data_y) 38 | pvalue = ( 1 + sum( null_samples > hsic_statistic ) ) / float( 1 + self.num_shuffles ) 39 | 40 | return pvalue, data_generating_time -------------------------------------------------------------------------------- /independence_testing/HSICSpectralTestObject.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Created on 15 Nov 2015 3 | 4 | @author: qinyi 5 | ''' 6 | from HSICTestObject import HSICTestObject 7 | import numpy as np 8 | import time 9 | 10 | class HSICSpectralTestObject(HSICTestObject): 11 | 12 | def __init__(self, num_samples, data_generator=None, 13 | kernelX=None, kernelY=None, kernelX_use_median=False,kernelY_use_median=False, 14 | rff=False,num_rfx=None,num_rfy=None,induce_set=False, num_inducex = None, num_inducey = None, 15 | num_nullsims=1000, unbiased=False): 16 | HSICTestObject.__init__(self, num_samples, data_generator=data_generator, kernelX=kernelX, kernelY=kernelY, 17 | kernelX_use_median=kernelX_use_median,kernelY_use_median=kernelY_use_median, 18 | num_rfx=num_rfx, num_rfy=num_rfy, rff=rff, 19 | induce_set=induce_set, num_inducex = num_inducex, num_inducey = num_inducey) 20 | self.num_nullsims = num_nullsims 21 | self.unbiased = unbiased 22 | 23 | 24 | def get_null_samples_with_spectral_approach(self,Mx,My): 25 | lambdax, lambday = self.get_spectrum_on_data(Mx,My) 26 | Dx=len(lambdax) 27 | Dy=len(lambday) 28 | null_samples=np.zeros(self.num_nullsims) 29 | for jj in range(self.num_nullsims): 30 | zz=np.random.randn(Dx,Dy)**2 31 | if self.unbiased: 32 | zz = zz - 1 33 | null_samples[jj]=np.dot(lambdax.T,np.dot(zz,lambday)) 34 | return null_samples 35 | 36 | def compute_pvalue_with_time_tracking(self,data_x=None,data_y=None): 37 | if data_x is None and data_y is None: 38 | if not self.streaming and not self.freeze_data: 39 | start = time.clock() 40 | self.generate_data() 41 | data_generating_time = time.clock()-start 42 | data_x = self.data_x 43 | data_y = self.data_y 44 | else: 45 | data_generating_time = 0. 46 | else: 47 | data_generating_time = 0. 
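# Spectral approach: get_null_samples_with_spectral_approach (above) simulates
# the null as sum_{p,q} lambdax_p * lambday_q * z_pq^2 with z_pq ~ N(0,1)
# (z_pq^2 - 1 in the unbiased case), where lambdax, lambday are the spectra of
# the centred Gram matrices (or of the feature covariances under rff/Nystrom);
# the observed statistic is rescaled by num_samples below to match this scale.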
48 | #print 'data generating time passed: ', data_generating_time 49 | hsic_statistic, _, _, _, Mx, My, _ = self.HSICmethod(unbiased=self.unbiased,data_x = data_x, data_y = data_y) 50 | null_samples = self.get_null_samples_with_spectral_approach(Mx, My) 51 | pvalue = ( 1+ sum( null_samples > self.num_samples*hsic_statistic ) ) / float( 1 + self.num_nullsims ) 52 | return pvalue, data_generating_time 53 | -------------------------------------------------------------------------------- /independence_testing/HSICTestObject.py: -------------------------------------------------------------------------------- 1 | from numpy import shape, fill_diagonal, zeros, mean, sqrt,identity,dot,diag 2 | from numpy.random import permutation, randn 3 | from independence_testing.TestObject import TestObject 4 | import numpy as np 5 | from abc import abstractmethod 6 | from kerpy.Kernel import Kernel 7 | import time 8 | from scipy.linalg import sqrtm,inv 9 | from numpy.linalg import eigh,svd 10 | 11 | 12 | 13 | class HSICTestObject(TestObject): 14 | def __init__(self, num_samples, data_generator=None, kernelX=None, kernelY=None, kernelZ = None, 15 | kernelX_use_median=False,kernelY_use_median=False,kernelZ_use_median=False, 16 | rff=False, num_rfx=None, num_rfy=None, induce_set=False, 17 | num_inducex = None, num_inducey = None, 18 | streaming=False, freeze_data=False): 19 | TestObject.__init__(self,self.__class__.__name__,streaming=streaming, freeze_data=freeze_data) 20 | self.num_samples = num_samples #We have same number of samples from X and Y in independence testing 21 | self.data_generator = data_generator 22 | self.kernelX = kernelX 23 | self.kernelY = kernelY 24 | self.kernelZ = kernelZ 25 | self.kernelX_use_median = kernelX_use_median #indicate if median heuristic for Gaussian Kernel should be used 26 | self.kernelY_use_median = kernelY_use_median 27 | self.kernelZ_use_median = kernelZ_use_median 28 | self.rff = rff 29 | self.num_rfx = num_rfx 30 | self.num_rfy = num_rfy 31 | self.induce_set = induce_set 32 | self.num_inducex = num_inducex 33 | self.num_inducey = num_inducey 34 | if self.rff|self.induce_set: 35 | self.HSICmethod = self.HSIC_with_shuffles_rff 36 | else: 37 | self.HSICmethod = self.HSIC_with_shuffles 38 | 39 | def generate_data(self,isConditionalTesting = False): 40 | if not isConditionalTesting: 41 | self.data_x, self.data_y = self.data_generator(self.num_samples) 42 | return self.data_x, self.data_y 43 | else: 44 | self.data_x, self.data_y, self.data_z = self.data_generator(self.num_samples) 45 | return self.data_x, self.data_y, self.data_z 46 | ''' for our SimDataGen examples, one argument suffice''' 47 | 48 | 49 | @staticmethod 50 | def HSIC_U_statistic(Kx,Ky): 51 | m = shape(Kx)[0] 52 | fill_diagonal(Kx,0.) 53 | fill_diagonal(Ky,0.) 
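# Unbiased HSIC U-statistic (cf. Song et al., 2012), computed from the
# diagonal-zeroed matrices (done above):
#   HSIC_u = [ tr(KxKy) + (1'Kx1)(1'Ky1)/((m-1)(m-2)) - (2/(m-2)) 1'KxKy1 ] / (m(m-3))
# which is exactly what the three terms below implement.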
54 | K = np.dot(Kx,Ky) 55 | first_term = np.trace(K)/float(m*(m-3.)) 56 | second_term = np.sum(Kx)*np.sum(Ky)/float(m*(m-3.)*(m-1.)*(m-2.)) 57 | third_term = 2.*np.sum(K)/float(m*(m-3.)*(m-2.)) 58 | return first_term+second_term-third_term 59 | 60 | 61 | @staticmethod 62 | def HSIC_V_statistic(Kx,Ky): 63 | Kxc=Kernel.center_kernel_matrix(Kx) 64 | Kyc=Kernel.center_kernel_matrix(Ky) 65 | return np.sum(Kxc*Kyc) 66 | 67 | 68 | @staticmethod 69 | def HSIC_V_statistic_rff(phix,phiy): 70 | m=shape(phix)[0] 71 | phix_c=phix-mean(phix,axis=0) 72 | phiy_c=phiy-mean(phiy,axis=0) 73 | featCov=(phix_c.T).dot(phiy_c)/float(m) 74 | return np.linalg.norm(featCov)**2 75 | 76 | 77 | # generalise distance correlation ---- a kernel interpretation 78 | @staticmethod 79 | def dCor_HSIC_statistic(Kx,Ky,unbiased=False): 80 | if unbiased: 81 | first_term = HSICTestObject.HSIC_U_statistic(Kx,Ky) 82 | second_term = HSICTestObject.HSIC_U_statistic(Kx,Kx)*HSICTestObject.HSIC_U_statistic(Ky,Ky) 83 | dCor = first_term/float(sqrt(second_term)) 84 | else: 85 | first_term = HSICTestObject.HSIC_V_statistic(Kx,Ky) 86 | second_term = HSICTestObject.HSIC_V_statistic(Kx,Kx)*HSICTestObject.HSIC_V_statistic(Ky,Ky) 87 | dCor = first_term/float(sqrt(second_term)) 88 | return dCor 89 | 90 | 91 | # approximated dCor using rff/Nystrom 92 | @staticmethod 93 | def dCor_HSIC_statistic_rff(phix,phiy): 94 | first_term = HSICTestObject.HSIC_V_statistic_rff(phix,phiy) 95 | second_term = HSICTestObject.HSIC_V_statistic_rff(phix,phix)*HSICTestObject.HSIC_V_statistic_rff(phiy,phiy) 96 | approx_dCor = first_term/float(sqrt(second_term)) 97 | return approx_dCor 98 | 99 | 100 | def SubdCor_HSIC_statistic(self,data_x=None,data_y=None,unbiased=True): 101 | if data_x is None: 102 | data_x=self.data_x 103 | if data_y is None: 104 | data_y=self.data_y 105 | dx = shape(data_x)[1] 106 | stats_value = zeros(dx) 107 | for dd in range(dx): 108 | Kx, Ky = self.compute_kernel_matrix_on_data(data_x[:,[dd]], data_y) 109 | stats_value[dd] = HSICTestObject.dCor_HSIC_statistic(Kx, Ky, unbiased) 110 | SubdCor = sum(stats_value)/float(dx) 111 | return SubdCor 112 | 113 | 114 | def SubHSIC_statistic(self,data_x=None,data_y=None,unbiased=True): 115 | if data_x is None: 116 | data_x=self.data_x 117 | if data_y is None: 118 | data_y=self.data_y 119 | dx = shape(data_x)[1] 120 | stats_value = zeros(dx) 121 | for dd in range(dx): 122 | Kx, Ky = self.compute_kernel_matrix_on_data(data_x[:,[dd]], data_y) 123 | if unbiased: 124 | stats_value[dd] = HSICTestObject.HSIC_U_statistic(Kx, Ky) 125 | else: 126 | stats_value[dd] = HSICTestObject.HSIC_V_statistic(Kx, Ky) 127 | SubHSIC = sum(stats_value)/float(dx) 128 | return SubHSIC 129 | 130 | 131 | def HSIC_with_shuffles(self,data_x=None,data_y=None,unbiased=True,num_shuffles=0, 132 | estimate_nullvar=False,isBlockHSIC=False): 133 | start = time.clock() 134 | if data_x is None: 135 | data_x=self.data_x 136 | if data_y is None: 137 | data_y=self.data_y 138 | time_passed = time.clock()-start 139 | if isBlockHSIC: 140 | Kx, Ky = self.compute_kernel_matrix_on_dataB(data_x,data_y) 141 | else: 142 | Kx, Ky = self.compute_kernel_matrix_on_data(data_x,data_y) 143 | ny=shape(data_y)[0] 144 | if unbiased: 145 | test_statistic = HSICTestObject.HSIC_U_statistic(Kx,Ky) 146 | else: 147 | test_statistic = HSICTestObject.HSIC_V_statistic(Kx,Ky) 148 | null_samples=zeros(num_shuffles) 149 | for jj in range(num_shuffles): 150 | pp = permutation(ny) 151 | Kpp = Ky[pp,:][:,pp] 152 | if unbiased: 153 | 
null_samples[jj]=HSICTestObject.HSIC_U_statistic(Kx,Kpp) 154 | else: 155 | null_samples[jj]=HSICTestObject.HSIC_V_statistic(Kx,Kpp) 156 | if estimate_nullvar: 157 | nullvarx, nullvary = self.unbiased_HSnorm_estimate_of_centred_operator(Kx,Ky) 158 | nullvarx = 2.* nullvarx 159 | nullvary = 2.* nullvary 160 | else: 161 | nullvarx, nullvary = None, None 162 | return test_statistic,null_samples,nullvarx,nullvary,Kx, Ky, time_passed 163 | 164 | 165 | 166 | def HSIC_with_shuffles_rff(self,data_x=None,data_y=None, 167 | unbiased=True,num_shuffles=0,estimate_nullvar=False): 168 | start = time.clock() 169 | if data_x is None: 170 | data_x=self.data_x 171 | if data_y is None: 172 | data_y=self.data_y 173 | time_passed = time.clock()-start 174 | if self.rff: 175 | phix, phiy = self.compute_rff_on_data(data_x,data_y) 176 | else: 177 | phix, phiy = self.compute_induced_kernel_matrix_on_data(data_x,data_y) 178 | ny=shape(data_y)[0] 179 | if unbiased: 180 | test_statistic = HSICTestObject.HSIC_U_statistic_rff(phix,phiy) 181 | else: 182 | test_statistic = HSICTestObject.HSIC_V_statistic_rff(phix,phiy) 183 | null_samples=zeros(num_shuffles) 184 | for jj in range(num_shuffles): 185 | pp = permutation(ny) 186 | if unbiased: 187 | null_samples[jj]=HSICTestObject.HSIC_U_statistic_rff(phix,phiy[pp]) 188 | else: 189 | null_samples[jj]=HSICTestObject.HSIC_V_statistic_rff(phix,phiy[pp]) 190 | if estimate_nullvar: 191 | raise NotImplementedError() 192 | else: 193 | nullvarx, nullvary = None, None 194 | return test_statistic, null_samples, nullvarx, nullvary,phix, phiy, time_passed 195 | 196 | 197 | def get_spectrum_on_data(self, Mx, My): 198 | '''Mx and My are Kx Ky when rff =False 199 | Mx and My are phix, phiy when rff =True''' 200 | if self.rff|self.induce_set: 201 | Cx = np.cov(Mx.T) 202 | Cy = np.cov(My.T) 203 | lambdax=np.linalg.eigvalsh(Cx) 204 | lambday=np.linalg.eigvalsh(Cy) 205 | else: 206 | Kxc = Kernel.center_kernel_matrix(Mx) 207 | Kyc = Kernel.center_kernel_matrix(My) 208 | lambdax=np.linalg.eigvalsh(Kxc) 209 | lambday=np.linalg.eigvalsh(Kyc) 210 | return lambdax,lambday 211 | 212 | 213 | @abstractmethod 214 | def compute_kernel_matrix_on_data(self,data_x,data_y): 215 | if self.kernelX_use_median: 216 | sigmax = self.kernelX.get_sigma_median_heuristic(data_x) 217 | self.kernelX.set_width(float(sigmax)) 218 | if self.kernelY_use_median: 219 | sigmay = self.kernelY.get_sigma_median_heuristic(data_y) 220 | self.kernelY.set_width(float(sigmay)) 221 | Kx=self.kernelX.kernel(data_x) 222 | Ky=self.kernelY.kernel(data_y) 223 | return Kx, Ky 224 | 225 | 226 | @abstractmethod 227 | def compute_kernel_matrix_on_dataB(self,data_x,data_y): 228 | Kx=self.kernelX.kernel(data_x) 229 | Ky=self.kernelY.kernel(data_y) 230 | return Kx, Ky 231 | 232 | 233 | 234 | @abstractmethod 235 | def compute_kernel_matrix_on_data_CI(self,data_x,data_y,data_z): 236 | if self.kernelX_use_median: 237 | sigmax = self.kernelX.get_sigma_median_heuristic(data_x) 238 | self.kernelX.set_width(float(sigmax)) 239 | if self.kernelY_use_median: 240 | sigmay = self.kernelY.get_sigma_median_heuristic(data_y) 241 | self.kernelY.set_width(float(sigmay)) 242 | if self.kernelZ_use_median: 243 | sigmaz = self.kernelZ.get_sigma_median_heuristic(data_z) 244 | self.kernelZ.set_width(float(sigmaz)) 245 | Kx=self.kernelX.kernel(data_x) 246 | Ky=self.kernelY.kernel(data_y) 247 | Kz=self.kernelZ.kernel(data_z) 248 | return Kx, Ky,Kz 249 | 250 | 251 | 252 | 253 | def unbiased_HSnorm_estimate_of_centred_operator(self,Kx,Ky): 254 | '''returns an unbiased estimate 
of 2*Sum_p Sum_q lambda^2_p theta^2_q 255 | where lambda and theta are the eigenvalues of the centered matrices for X and Y respectively''' 256 | varx = HSICTestObject.HSIC_U_statistic(Kx,Kx) 257 | vary = HSICTestObject.HSIC_U_statistic(Ky,Ky) 258 | return varx,vary 259 | 260 | 261 | @abstractmethod 262 | def compute_rff_on_data(self,data_x,data_y): 263 | self.kernelX.rff_generate(self.num_rfx,dim=shape(data_x)[1]) 264 | self.kernelY.rff_generate(self.num_rfy,dim=shape(data_y)[1]) 265 | if self.kernelX_use_median: 266 | sigmax = self.kernelX.get_sigma_median_heuristic(data_x) 267 | self.kernelX.set_width(float(sigmax)) 268 | if self.kernelY_use_median: 269 | sigmay = self.kernelY.get_sigma_median_heuristic(data_y) 270 | self.kernelY.set_width(float(sigmay)) 271 | phix = self.kernelX.rff_expand(data_x) 272 | phiy = self.kernelY.rff_expand(data_y) 273 | return phix, phiy 274 | 275 | 276 | @abstractmethod 277 | def compute_induced_kernel_matrix_on_data(self,data_x,data_y): 278 | '''Z follows the same distribution as X; W follows that of Y. 279 | The current data generating methods we use 280 | generate X and Y at the same time. ''' 281 | size_induced_set = max(self.num_inducex,self.num_inducey) 282 | #print "size_induce_set", size_induced_set 283 | if self.data_generator is None: 284 | subsample_idx = np.random.randint(self.num_samples, size=size_induced_set) 285 | self.data_z = data_x[subsample_idx,:] 286 | self.data_w = data_y[subsample_idx,:] 287 | else: 288 | self.data_z, self.data_w = self.data_generator(size_induced_set) 289 | self.data_z[[range(self.num_inducex)],:] 290 | self.data_w[[range(self.num_inducey)],:] 291 | #print 'Induce Set' 292 | if self.kernelX_use_median: 293 | sigmax = self.kernelX.get_sigma_median_heuristic(data_x) 294 | self.kernelX.set_width(float(sigmax)) 295 | if self.kernelY_use_median: 296 | sigmay = self.kernelY.get_sigma_median_heuristic(data_y) 297 | self.kernelY.set_width(float(sigmay)) 298 | Kxz = self.kernelX.kernel(data_x,self.data_z) 299 | Kzz = self.kernelX.kernel(self.data_z) 300 | #R = inv(sqrtm(Kzz)) 301 | R = inv(sqrtm(Kzz + np.eye(np.shape(Kzz)[0])*10**(-6))) 302 | phix = Kxz.dot(R) 303 | Kyw = self.kernelY.kernel(data_y,self.data_w) 304 | Kww = self.kernelY.kernel(self.data_w) 305 | #S = inv(sqrtm(Kww)) 306 | S = inv(sqrtm(Kww + np.eye(np.shape(Kww)[0])*10**(-6))) 307 | phiy = Kyw.dot(S) 308 | return phix, phiy 309 | 310 | 311 | def compute_pvalue(self,data_x=None,data_y=None): 312 | pvalue,_=self.compute_pvalue_with_time_tracking(data_x,data_y) 313 | return pvalue 314 | 315 | 316 | 317 | 318 | 319 | 320 | 321 | 322 | -------------------------------------------------------------------------------- /independence_testing/SimDataGen.py: -------------------------------------------------------------------------------- 1 | from numpy.random import uniform, permutation, multivariate_normal,normal 2 | from numpy import pi, prod, empty, sin, cos, asscalar, shape,zeros, identity,arange,sign,sum,sqrt,transpose, tanh,sinh 3 | import numpy as np 4 | 5 | 6 | 7 | class SimDataGen(object): 8 | def __init__(self): 9 | pass 10 | 11 | 12 | @staticmethod 13 | def LargeScale(num_samples, dimension=4): 14 | ''' dimension takes large even numbers, e.g. 
50, 100 ''' 15 | Xmean = zeros(dimension) 16 | Xcov = identity(dimension) 17 | data_x = multivariate_normal(Xmean, Xcov, num_samples) 18 | dd = dimension/2 19 | Zmean = zeros(dd+1) 20 | Zcov = identity(dd+1) 21 | Z = multivariate_normal(Zmean, Zcov, num_samples) 22 | first_term = sqrt(2./dimension)*sum(sign(data_x[:,arange(0,dimension,2)]* data_x[:,arange(1,dimension,2)])*abs(Z[:,range(dd)]),axis=1,keepdims=True) 23 | second_term = Z[:,[dd]] #take the last dimension of Z 24 | data_y = first_term + second_term 25 | return data_x, data_y 26 | 27 | 28 | @staticmethod 29 | def VaryDimension(num_samples, dimension = 5): 30 | Xmean = zeros(dimension) 31 | Xcov = identity(dimension) 32 | data_x = multivariate_normal(Xmean, Xcov, num_samples) 33 | data_z = transpose([normal(0,1,num_samples)]) 34 | data_y = 20*sin(4*pi*(data_x[:,[0]]**2 + data_x[:,[1]]**2)) + data_z 35 | return data_x,data_y 36 | 37 | 38 | @staticmethod 39 | def SimpleLn(num_samples, dimension = 5): 40 | Xmean = zeros(dimension) 41 | Xcov = identity(dimension) 42 | data_x = multivariate_normal(Xmean, Xcov, num_samples) 43 | data_z = transpose([normal(0,1,num_samples)]) 44 | data_y = data_x[:,[0]] + data_z 45 | return data_x, data_y 46 | 47 | 48 | @staticmethod 49 | def turn_into_null(fn): 50 | def null_fn(*args, **kwargs): 51 | dataX,dataY=fn(*args, **kwargs) 52 | num_samples=shape(dataX)[0] 53 | pp = permutation(num_samples) 54 | return dataX,dataY[pp] 55 | return null_fn 56 | -------------------------------------------------------------------------------- /independence_testing/SubHSICTestObject.py: -------------------------------------------------------------------------------- 1 | from HSICTestObject import HSICTestObject 2 | from numpy import zeros 3 | import time 4 | from numpy.random import permutation 5 | 6 | class SubHSICTestObject(HSICTestObject): 7 | 8 | def __init__(self, num_samples, data_generator=None, kernelX=None, kernelY=None, kernelX_use_median=False, 9 | kernelY_use_median=False, num_rfx=None, num_rfy=None, rff=False, num_shuffles=1000, unbiased=True): 10 | HSICTestObject.__init__(self, num_samples, data_generator=data_generator, kernelX=kernelX, kernelY=kernelY, 11 | kernelX_use_median=kernelX_use_median,kernelY_use_median=kernelY_use_median, 12 | num_rfx=num_rfx, num_rfy=num_rfy, rff=rff) 13 | self.num_samples = num_samples 14 | self.num_shuffles = num_shuffles 15 | self.unbiased = unbiased 16 | 17 | 18 | def compute_pvalue_with_time_tracking(self,data_x=None,data_y=None): 19 | if data_x is None and data_y is None: 20 | if not self.streaming and not self.freeze_data: 21 | start = time.clock() 22 | self.generate_data() 23 | data_generating_time = time.clock()-start 24 | data_x = self.data_x 25 | data_y = self.data_y 26 | else: 27 | data_generating_time = 0. 28 | else: 29 | data_generating_time = 0. 
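# SubHSIC (see SubHSIC_statistic in HSICTestObject) averages the HSIC statistic
# between each single coordinate of X and the full Y; the loop below builds its
# permutation null by shuffling the rows of Y.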
30 | #print 'data generating time passed: ', data_generating_time 31 | SubHSIC_statistic = self.SubHSIC_statistic(unbiased=self.unbiased,data_x=data_x, data_y = data_y) 32 | null_samples=zeros(self.num_shuffles) 33 | for jj in range(self.num_shuffles): 34 | pp = permutation(self.num_samples) 35 | yy = data_y[pp,:] #permute the local data_y (self.data_y may be unset when data is passed in) 36 | null_samples[jj]=self.SubHSIC_statistic(data_x = data_x, data_y = yy, unbiased = self.unbiased) 37 | pvalue = ( sum( null_samples > SubHSIC_statistic ) ) / float( self.num_shuffles ) 38 | return pvalue, data_generating_time -------------------------------------------------------------------------------- /independence_testing/TestExperiment.py: -------------------------------------------------------------------------------- 1 | """ 2 | This program is free software; you can redistribute it and/or modify 3 | it under the terms of the GNU General Public License as published by 4 | the Free Software Foundation; either version 3 of the License, or 5 | (at your option) any later version. 6 | 7 | Written (W) 2013 Dino Sejdinovic 8 | """ 9 | from numpy import arange,zeros,mean 10 | import os 11 | from pickle import load, dump 12 | import time 13 | 14 | class TestExperiment(object): 15 | def __init__(self, name, param, test_object): 16 | self.name=name 17 | self.param=param 18 | self.test_object=test_object 19 | self.folder="results/res_"+self.name 20 | 21 | def compute_pvalue(self): 22 | return self.test_object.compute_pvalue() 23 | 24 | def compute_pvalue_with_time_tracking(self): 25 | return self.test_object.compute_pvalue_with_time_tracking() 26 | 27 | def perform_test(self, alpha): 28 | return self.test_object.perform_test(alpha) 29 | 30 | def run_test_trials(self, numTrials, alpha=0.05): 31 | completedTrials = 0 32 | counter_init = 0 33 | save_filename = self.folder+"/results.bin" 34 | average_time_init=0.0 35 | pvalues_init=list() 36 | if not os.path.exists(self.folder): 37 | os.mkdir(self.folder) 38 | elif os.path.exists(save_filename): 39 | load_f = open(save_filename,"r") 40 | [counter_init, completedTrials, _, average_time_init,pvalues_init] = load(load_f) 41 | load_f.close() 42 | print "Found %d completed trials" % completedTrials 43 | if completedTrials >= numTrials: 44 | print "Exiting" 45 | return 0 46 | else: 47 | print "Continuing" 48 | counter = counter_init 49 | pvalues = pvalues_init 50 | times_passed=zeros(numTrials-completedTrials) 51 | for trial in arange(completedTrials,numTrials): 52 | start=time.clock() 53 | print "Trial %d" % trial 54 | pvalue, data_generating_time = self.compute_pvalue_with_time_tracking() 55 | counter += pvalue<alpha 24 | if (kerpar<0) or (kerpar>=2): 25 | raise ValueError("incorrect parameter value") 26 | self.alpha=kerpar 27 | 28 | 29 | def kernel(self, X, Y=None): 30 | 31 | GenericTests.check_type(X,'X',np.ndarray,2) 32 | # if X=Y, use more efficient pdist call which exploits symmetry 33 | normX=reshape(np.linalg.norm(X,axis=1),(len(X),1)) 34 | if Y is None: 35 | dists = squareform(pdist(X, 'euclidean')) 36 | normY=normX.T 37 | else: 38 | GenericTests.check_type(Y,'Y',np.ndarray,2) 39 | assert(shape(X)[1]==shape(Y)[1]) 40 | normY=reshape(np.linalg.norm(Y,axis=1),(1,len(Y))) 41 | dists = cdist(X, Y, 'euclidean') 42 | K=0.5*(normX**self.alpha+normY**self.alpha-dists**self.alpha) 43 | return K 44 | 45 | def gradient(self, x, Y): 46 | raise NotImplementedError() 47 | 48 | if __name__ == '__main__': 49 | from tools.UnitTests import UnitTests 50 | UnitTests.UnitTestDefaultKernel(BrownianKernel) 51 |
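A side note on the Brownian kernel above: with alpha = 1 (equivalently H = 0.5), HSIC computed with this kernel coincides, up to scaling, with distance covariance (Sejdinovic et al., 2013), which is why ExperimentsHSICSpectral.py pairs it with that setting. A minimal usage sketch of the kernel with the spectral test follows -- the toy data and parameter values are illustrative assumptions, not part of the repository:

```python
# minimal sketch (Python 2, matching the codebase); toy values for illustration only
from numpy.random import randn
from kerpy.BrownianKernel import BrownianKernel
from independence_testing.HSICSpectralTestObject import HSICSpectralTestObject

num_samples = 500
data_x = randn(num_samples, 3)
data_y = data_x[:, [0]] + 0.5 * randn(num_samples, 1)  # Y depends on the first coordinate of X

test = HSICSpectralTestObject(num_samples, kernelX=BrownianKernel(1.), kernelY=BrownianKernel(1.),
                              num_nullsims=1000)
print "p-value:", test.compute_pvalue(data_x, data_y)  # should typically be small here
```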
-------------------------------------------------------------------------------- /kerpy/GaussianBagKernel.py: -------------------------------------------------------------------------------- 1 | from kerpy.BagKernel import BagKernel 2 | from abc import abstractmethod 3 | from numpy import exp, zeros, dot, cos, sin, concatenate, sqrt, mean, median 4 | from numpy.random.mtrand import randn 5 | import numpy as np 6 | 7 | class GaussianBagKernel(BagKernel): 8 | def __init__(self,data_kernel,sigma=1.0): 9 | BagKernel.__init__(self,data_kernel) 10 | self.width=sigma 11 | 12 | def __str__(self): 13 | s=self.__class__.__name__+ "[" 14 | s += "width="+ str(self.width) 15 | s += ", " + BagKernel.__str__(self) 16 | s += "]" 17 | return s 18 | 19 | def rff_generate(self,mbags=20,mdata=20,dim=1): 20 | ''' 21 | mbags:: number of random features for bag kernel 22 | mdata:: number of random features for data kernel 23 | dim:: data dimensionality 24 | ''' 25 | self.data_kernel.rff_generate(mdata,dim=dim) 26 | self.rff_num=mbags 27 | self.unit_rff_freq=randn(mbags/2,mdata) 28 | self.rff_freq=self.unit_rff_freq/self.width 29 | 30 | def rff_expand(self,bagX): 31 | if self.rff_freq is None: 32 | raise ValueError("rff_freq has not been set. use rff_generate first") 33 | nx=len(bagX) 34 | featuremeans=zeros((nx,self.data_kernel.rff_num)) 35 | for ii in range(nx): 36 | featuremeans[ii]=mean(self.data_kernel.rff_expand(bagX[ii]),axis=0) 37 | xdotw=dot(featuremeans,(self.rff_freq).T) 38 | return sqrt(2./self.rff_num)*concatenate( ( cos(xdotw),sin(xdotw) ) , axis=1 ) 39 | 40 | 41 | def compute_BagKernel_value(self,bag1,bag2): 42 | return exp(-0.5 * self.data_kernel.estimateMMD(bag1,bag2) / self.width ** 2) 43 | 44 | def get_sigma_median_heuristic(self,X): 45 | nx=np.shape(X)[0] 46 | if nx>200: 47 | X=X[np.random.permutation(nx)[:200]] 48 | n=min(nx,200) 49 | D=zeros((n,n)) 50 | for ii in range(n): 51 | zi = X[ii] 52 | for jj in range(ii+1,n): 53 | zj = X[jj] 54 | D[ii,jj]=sqrt(self.data_kernel.estimateMMD(zi,zj)) 55 | D=self.symmetrize(D) 56 | median_dist=median(D[D>0]) 57 | sigma=median_dist/sqrt(2.) 
58 | return sigma 59 | 60 | if __name__ == '__main__': 61 | from tools.UnitTests import UnitTests 62 | UnitTests.UnitTestBagKernel(GaussianBagKernel) -------------------------------------------------------------------------------- /kerpy/GaussianKernel.py: -------------------------------------------------------------------------------- 1 | from kerpy.Kernel import Kernel 2 | from numpy import exp, shape, reshape, sqrt, median 3 | from numpy.random import permutation,randn 4 | from scipy.spatial.distance import squareform, pdist, cdist 5 | import warnings 6 | from tools.GenericTests import GenericTests 7 | import numpy as np 8 | 9 | class GaussianKernel(Kernel): 10 | def __init__(self, sigma=1.0, is_sparse = False): 11 | Kernel.__init__(self) 12 | self.width = sigma 13 | self.is_sparse = is_sparse 14 | 15 | def __str__(self): 16 | s=self.__class__.__name__+ "[" 17 | s += "width="+ str(self.width) 18 | s += "]" 19 | return s 20 | 21 | def kernel(self, X, Y=None): 22 | """ 23 | Computes the standard Gaussian kernel k(x,y)=exp(-0.5* ||x-y||**2 / sigma**2) 24 | 25 | X - 2d numpy.ndarray, first set of samples: 26 | number of rows: number of samples 27 | number of columns: dimensionality 28 | Y - 2d numpy.ndarray, second set of samples, can be None in which case its replaced by X 29 | """ 30 | if self.is_sparse: 31 | X = X.todense() 32 | Y = Y.todense() 33 | GenericTests.check_type(X, 'X',np.ndarray) 34 | assert(len(shape(X))==2) 35 | 36 | # if X=Y, use more efficient pdist call which exploits symmetry 37 | if Y is None: 38 | sq_dists = squareform(pdist(X, 'sqeuclidean')) 39 | else: 40 | GenericTests.check_type(Y, 'Y',np.ndarray) 41 | assert(len(shape(Y))==2) 42 | assert(shape(X)[1]==shape(Y)[1]) 43 | sq_dists = cdist(X, Y, 'sqeuclidean') 44 | 45 | K = exp(-0.5 * (sq_dists) / self.width ** 2) 46 | return K 47 | 48 | 49 | def gradient(self, x, Y): 50 | """ 51 | Computes the gradient of the Gaussian kernel wrt. to the left argument, i.e. 52 | k(x,y)=exp(-0.5* ||x-y||**2 / sigma**2), which is 53 | \nabla_x k(x,y)=1.0/sigma**2 k(x,y)(y-x) 54 | Given a set of row vectors Y, this computes the 55 | gradient for every pair (x,y) for y in Y. 56 | """ 57 | if self.is_sparse: 58 | x = x.todense() 59 | Y = Y.todense() 60 | assert(len(shape(x))==1) 61 | assert(len(shape(Y))==2) 62 | assert(len(x)==shape(Y)[1]) 63 | 64 | x_2d=reshape(x, (1, len(x))) 65 | k = self.kernel(x_2d, Y) 66 | differences = Y - x 67 | G = (1.0 / self.width ** 2) * (k.T * differences) 68 | return G 69 | 70 | 71 | def rff_generate(self,m,dim=1): 72 | self.rff_num=m 73 | self.unit_rff_freq=randn(int(m/2),dim) 74 | self.rff_freq=self.unit_rff_freq/self.width 75 | 76 | @staticmethod 77 | def get_sigma_median_heuristic(X, is_sparse = False): 78 | if is_sparse: 79 | X = X.todense() 80 | n=shape(X)[0] 81 | if n>1000: 82 | X=X[permutation(n)[:1000],:] 83 | dists=squareform(pdist(X, 'euclidean')) 84 | median_dist=median(dists[dists>0]) 85 | sigma=median_dist/sqrt(2.) 
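# with sigma = median_dist/sqrt(2.), the exponent -0.5*d^2/sigma^2 equals
# -(d/median_dist)^2, so the kernel value at the median inter-point distance is exp(-1)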
86 | return sigma 87 | -------------------------------------------------------------------------------- /kerpy/HypercubeKernel.py: -------------------------------------------------------------------------------- 1 | from numpy import tanh 2 | import numpy 3 | from scipy.spatial.distance import squareform, pdist, cdist 4 | 5 | from kerpy.Kernel import Kernel 6 | 7 | 8 | class HypercubeKernel(Kernel): 9 | def __init__(self, gamma): 10 | Kernel.__init__(self) 11 | 12 | if type(gamma) is not float: 13 | raise TypeError("Gamma must be float") 14 | 15 | self.gamma = gamma 16 | 17 | def __str__(self): 18 | s = self.__class__.__name__ + "=[" 19 | s += "gamma=" + str(self.gamma) 20 | s += ", " + Kernel.__str__(self) 21 | s += "]" 22 | return s 23 | 24 | def kernel(self, X, Y=None): 25 | """ 26 | Computes the hypercube kernel k(x,y)=tanh(gamma)^d(x,y), where d is the 27 | Hamming distance between x and y 28 | 29 | X - 2d numpy.bool8 array, samples on right hand side 30 | Y - 2d numpy.bool8 array, samples on left hand side. 31 | Can be None in which case it is replaced by X 32 | """ 33 | 34 | if not type(X) is numpy.ndarray: 35 | raise TypeError("X must be numpy array") 36 | 37 | if not len(X.shape) == 2: 38 | raise ValueError("X must be 2D numpy array") 39 | 40 | if not X.dtype == numpy.bool8: 41 | raise ValueError("X must be boolean numpy array") 42 | 43 | if not Y is None: 44 | if not type(Y) is numpy.ndarray: 45 | raise TypeError("Y must be None or numpy array") 46 | 47 | if not len(Y.shape) == 2: 48 | raise ValueError("Y must be None or 2D numpy array") 49 | 50 | if not Y.dtype == numpy.bool8: 51 | raise ValueError("Y must be boolean numpy array") 52 | 53 | if not X.shape[1] == Y.shape[1]: 54 | raise ValueError("X and Y must have same dimension if Y is not None") 55 | 56 | # un-normalise normalised hamming distance in both cases 57 | if Y is None: 58 | K = tanh(self.gamma) ** squareform(pdist(X, 'hamming') * X.shape[1]) 59 | else: 60 | K = tanh(self.gamma) ** (cdist(X, Y, 'hamming') * X.shape[1]) 61 | 62 | return K 63 | 64 | def gradient(self, x, Y): 65 | """ 66 | Computes the gradient of the hypercube kernel wrt.
to the left argument 67 | 68 | x - single sample on right hand side (1D vector) 69 | Y - samples on left hand side (2D matrix) 70 | """ 71 | pass 72 | 73 | -------------------------------------------------------------------------------- /kerpy/Kernel.py: -------------------------------------------------------------------------------- 1 | from abc import abstractmethod 2 | from numpy import eye, concatenate, zeros, shape, mean, reshape, arange, exp, outer,\ 3 | linalg, dot, cos, sin, sqrt, inf 4 | from numpy.random import permutation 5 | from numpy.lib.index_tricks import fill_diagonal 6 | from matplotlib.pyplot import imshow,show 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import matplotlib.cm as cm 10 | import warnings 11 | from tools.GenericTests import GenericTests 12 | 13 | 14 | 15 | 16 | class Kernel(object): 17 | def __init__(self): 18 | self.rff_num=None 19 | self.rff_freq=None 20 | pass 21 | 22 | def __str__(self): 23 | s="" 24 | return s 25 | 26 | @abstractmethod 27 | def kernel(self, X, Y=None): 28 | raise NotImplementedError() 29 | 30 | @abstractmethod 31 | def set_kerpar(self,kerpar): 32 | self.set_width(kerpar) 33 | 34 | @abstractmethod 35 | def set_width(self, width): 36 | if hasattr(self, 'width'): 37 | warnmsg="\nChanging kernel width from "+str(self.width)+" to "+str(width) 38 | #warnings.warn(warnmsg) ---need to add verbose argument to show these warning messages 39 | if self.rff_freq is not None: 40 | warnmsg="\nrff frequencies found. rescaling to width " +str(width) 41 | #warnings.warn(warnmsg) 42 | self.rff_freq=self.unit_rff_freq/width 43 | self.width=width 44 | else: 45 | raise ValueError("Senseless: kernel has no 'width' attribute!") 46 | 47 | @abstractmethod 48 | def rff_generate(self,m,dim=1): 49 | raise NotImplementedError() 50 | 51 | @abstractmethod 52 | def rff_expand(self,X): 53 | if self.rff_freq is None: 54 | raise ValueError("rff_freq has not been set. use rff_generate first") 55 | """ 56 | Computes the random Fourier features for the input dataset X 57 | for a set of frequencies in rff_freq. 
58 | This set of frequencies has to be precomputed 59 | X - 2d numpy.ndarray, first set of samples: 60 | number of rows: number of samples 61 | number of columns: dimensionality 62 | """ 63 | GenericTests.check_type(X, 'X',np.ndarray) 64 | xdotw=dot(X,(self.rff_freq).T) 65 | return sqrt(2./self.rff_num)*np.concatenate( ( cos(xdotw),sin(xdotw) ) , axis=1 ) 66 | 67 | @abstractmethod 68 | def gradient(self, x, Y): 69 | 70 | # ensure this in every implementation 71 | assert(len(shape(x))==1) 72 | assert(len(shape(Y))==2) 73 | assert(len(x)==shape(Y)[1]) 74 | 75 | raise NotImplementedError() 76 | 77 | @staticmethod 78 | def centering_matrix(n): 79 | """ 80 | Returns the centering matrix eye(n) - 1.0 / n 81 | """ 82 | return eye(n) - 1.0 / n 83 | 84 | @staticmethod 85 | def center_kernel_matrix(K): 86 | """ 87 | Centers the kernel matrix via a centering matrix H=I-1/n and returns HKH 88 | """ 89 | n = shape(K)[0] 90 | H = eye(n) - 1.0 / n 91 | return 1.0 / n * H.dot(K.dot(H)) 92 | 93 | 94 | @abstractmethod 95 | def show_kernel_matrix(self,X,Y=None): 96 | K=self.kernel(X,Y) 97 | imshow(K, interpolation="nearest") 98 | show() 99 | 100 | @abstractmethod 101 | def svc(self,X,y,lmbda=1.0,Xtst=None,ytst=None): 102 | from sklearn import svm 103 | svc=svm.SVC(kernel=self.kernel,C=lmbda) 104 | svc.fit(X,y) 105 | if Xtst is None: 106 | return svc 107 | else: 108 | ypre=svc.predict(Xtst) 109 | if ytst is None: 110 | return svc,ypre 111 | else: 112 | return svc,ypre,1-svc.score(Xtst,ytst) 113 | 114 | @abstractmethod 115 | def svc_rff(self,X,y,lmbda=1.0,Xtst=None,ytst=None): 116 | from sklearn import svm 117 | phi=self.rff_expand(X) 118 | svc=svm.LinearSVC(C=lmbda,dual=True) 119 | svc.fit(phi,y) 120 | if Xtst is None: 121 | return svc 122 | else: 123 | phitst=self.rff_expand(Xtst) 124 | ypre=svc.predict(phitst) 125 | if ytst is None: 126 | return svc,ypre 127 | else: 128 | return svc,ypre,1-svc.score(phitst,ytst) 129 | 130 | @abstractmethod 131 | def ridge_regress(self,X,y,lmbda=0.01,Xtst=None,ytst=None): 132 | K=self.kernel(X) 133 | n=shape(K)[0] 134 | aa=linalg.solve(K+lmbda*eye(n),y) 135 | if Xtst is None: 136 | return aa 137 | else: 138 | ypre=dot(aa.T,self.kernel(X,Xtst)).T 139 | if ytst is None: 140 | return aa,ypre 141 | else: 142 | return aa,ypre,(linalg.norm(ytst-ypre)**2)/np.shape(ytst)[0] 143 | 144 | @abstractmethod 145 | def ridge_regress_rff(self,X,y,lmbda=0.01,Xtst=None,ytst=None): 146 | # if self.rff_freq is None: 147 | # warnings.warn("\nrff_freq has not been set!\nGenerating new random frequencies (m=100 by default)") 148 | # self.rff_generate(100,dim=shape(X)[1]) 149 | # print shape(X)[1] 150 | phi=self.rff_expand(X) 151 | bb=linalg.solve(dot(phi.T,phi)+lmbda*eye(self.rff_num),dot(phi.T,y)) 152 | if Xtst is None: 153 | return bb 154 | else: 155 | phitst=self.rff_expand(Xtst) 156 | ypre=dot(phitst,bb) 157 | if ytst is None: 158 | return bb,ypre 159 | else: 160 | return bb,ypre,(linalg.norm(ytst-ypre)**2)/np.shape(ytst)[0] 161 | 162 | @abstractmethod 163 | def xvalidate( self,X,y, method = 'ridge_regress', \ 164 | regpar_grid=(1+arange(25))/200.0, \ 165 | kerpar_grid=exp(-13+arange(25)), \ 166 | numFolds = 10, verbose = False, visualise = False): 167 | from sklearn import cross_validation 168 | which_method = getattr(self,method) 169 | n=len(X) 170 | kf=cross_validation.KFold(n,n_folds=numFolds) 171 | xvalerr=zeros((len(regpar_grid),len(kerpar_grid))) 172 | width_idx=0 173 | for width in kerpar_grid: 174 | try: 175 | self.set_kerpar(width) 176 | except ValueError: 177 | 
xvalerr[:,width_idx]=inf 178 | warnings.warn("...invalid kernel parameter value in cross-validation. ignoring\n") 179 | width_idx+=1 180 | continue 181 | else: 182 | lmbda_idx=0 183 | for lmbda in regpar_grid: 184 | fold = 0 185 | prederr = zeros(numFolds) 186 | for train_index, test_index in kf: 187 | if type(X)==list: 188 | #could use slicing to speed this up when X is a list 189 | #currently uses sklearn cross_validation framework which returns indices as arrays 190 | #so simple list comprehension below 191 | X_train = [X[i] for i in train_index] 192 | X_test = [X[i] for i in test_index] 193 | else: 194 | X_train, X_test = X[train_index], X[test_index] 195 | if type(y)==list: 196 | y_train = [y[i] for i in train_index] 197 | y_test = [y[i] for i in test_index] 198 | else: 199 | y_train, y_test = y[train_index], y[test_index] 200 | _,_,prederr[fold]=which_method(X_train,y_train,lmbda=lmbda,Xtst=X_test,ytst=y_test) 201 | fold+=1 202 | xvalerr[lmbda_idx,width_idx]=mean(prederr) 203 | if verbose: 204 | print("kerpar:"+str(width)+", regpar:"+str(lmbda)) 205 | print(" cross-validated loss:"+str(xvalerr[lmbda_idx,width_idx])) 206 | lmbda_idx+=1 207 | width_idx+=1 208 | min_idx = np.unravel_index(np.argmin(xvalerr),shape(xvalerr)) 209 | if visualise: 210 | plt.imshow(xvalerr, interpolation='none', 211 | origin='lower', 212 | cmap=cm.pink) 213 | #extent=(regpar_grid[0],regpar_grid[-1],kerpar_grid[0],kerpar_grid[-1])) 214 | plt.colorbar() 215 | plt.title("cross-validated loss") 216 | plt.ylabel("regularisation parameter") 217 | plt.xlabel("kernel parameter") 218 | show() 219 | return regpar_grid[min_idx[0]],kerpar_grid[min_idx[1]] 220 | 221 | @abstractmethod 222 | def estimateMMD(self,sample1,sample2,unbiased=False): 223 | """ 224 | Compute the MMD between two samples 225 | """ 226 | K11 = self.kernel(sample1) 227 | K22 = self.kernel(sample2) 228 | K12 = self.kernel(sample1,sample2) 229 | if unbiased: 230 | fill_diagonal(K11,0.0) 231 | fill_diagonal(K22,0.0) 232 | n=float(shape(K11)[0]) 233 | m=float(shape(K22)[0]) 234 | return sum(sum(K11))/(pow(n,2)-n) + sum(sum(K22))/(pow(m,2)-m) - 2*mean(K12[:]) 235 | else: 236 | return mean(K11[:])+mean(K22[:])-2*mean(K12[:]) 237 | 238 | 239 | 240 | @abstractmethod 241 | def estimateMMD_rff(self,sample1,sample2,unbiased=False): 242 | # if self.rff_freq is None: 243 | # warnings.warn("\nrff_freq has not been set!\nGenerating new random frequencies (m=100 by default)") 244 | # self.rff_generate(100,dim=shape(sample1)[1]) 245 | phi1=self.rff_expand(sample1) 246 | phi2=self.rff_expand(sample2) 247 | featuremean1=mean(phi1,axis=0) 248 | featuremean2=mean(phi2,axis=0) 249 | if unbiased: 250 | nx=shape(phi1)[0] 251 | ny=shape(phi2)[0] 252 | first_term=nx/(nx-1.0)*( dot(featuremean1,featuremean1) \ 253 | -mean(linalg.norm(phi1,axis=1)**2)/nx ) 254 | second_term=ny/(ny-1.0)*( dot(featuremean2,featuremean2) \ 255 | -mean(linalg.norm(phi2,axis=1)**2)/ny ) 256 | third_term=-2*dot(featuremean1,featuremean2) 257 | return first_term+second_term+third_term 258 | else: 259 | return linalg.norm(featuremean1-featuremean2)**2 260 | -------------------------------------------------------------------------------- /kerpy/LinearBagKernel.py: -------------------------------------------------------------------------------- 1 | from kerpy.BagKernel import BagKernel 2 | import numpy as np 3 | from tools.GenericTests import GenericTests 4 | from kerpy.GaussianKernel import GaussianKernel 5 | from abc import abstractmethod 6 | 7 | class LinearBagKernel(BagKernel): 8 | def 
__init__(self,data_kernel): 9 | BagKernel.__init__(self,data_kernel) 10 | 11 | def __str__(self): 12 | s=self.__class__.__name__+ "[" 13 | s += "" + BagKernel.__str__(self) 14 | s += "]" 15 | return s 16 | 17 | def rff_generate(self,mdata=20,dim=1): 18 | ''' 19 | mdata:: number of random features for data kernel 20 | dim:: data dimensionality 21 | ''' 22 | self.data_kernel.rff_generate(mdata,dim=dim) 23 | self.rff_num=mdata 24 | 25 | def rff_expand(self,bagX): 26 | nx=len(bagX) 27 | featuremeans=np.zeros((nx,self.data_kernel.rff_num)) 28 | for ii in range(nx): 29 | featuremeans[ii]=np.mean(self.data_kernel.rff_expand(bagX[ii]),axis=0) 30 | return featuremeans 31 | 32 | def compute_BagKernel_value(self,bag1,bag2): 33 | innerK=self.data_kernel.kernel(bag1,bag2) 34 | return np.mean(innerK[:]) 35 | 36 | 37 | if __name__ == '__main__': 38 | from tools.UnitTests import UnitTests 39 | UnitTests.UnitTestBagKernel(LinearBagKernel) -------------------------------------------------------------------------------- /kerpy/LinearKernel.py: -------------------------------------------------------------------------------- 1 | from kerpy.Kernel import Kernel 2 | 3 | class LinearKernel(Kernel): 4 | def __init__(self, is_sparse = False): 5 | Kernel.__init__(self) 6 | self.is_sparse = is_sparse 7 | 8 | def __str__(self): 9 | s=self.__class__.__name__+ "=[" 10 | s += "" + Kernel.__str__(self) 11 | s += "]" 12 | return s 13 | 14 | def kernel(self, X, Y=None): 15 | """ 16 | Computes the linear kernel k(x,y)=x^T y for the given data 17 | X - samples on right hand side 18 | Y - samples on left hand side, can be None in which case it's replaced by X 19 | """ 20 | 21 | if Y is None: 22 | Y = X 23 | if self.is_sparse: 24 | return X.dot(Y.T).todense() 25 | else: 26 | return X.dot(Y.T) 27 | 28 | def gradient(self, x, Y, args_equal=False): 29 | """ 30 | Computes the gradient of the linear kernel k(x,y)=x^T y wrt. the left argument, i.e. \nabla_x k(x,y)=y 31 | x - single sample on right hand side 32 | Y - samples on left hand side 33 | """ 34 | return Y 35 | -------------------------------------------------------------------------------- /kerpy/MaternKernel.py: -------------------------------------------------------------------------------- 1 | from matplotlib.pyplot import show, imshow 2 | from numpy import exp, shape, sqrt, reshape 3 | import numpy as np 4 | from scipy.spatial.distance import squareform, pdist, cdist 5 | 6 | 7 | from kerpy.Kernel import Kernel 8 | from tools.GenericTests import GenericTests 9 | 10 | 11 | class MaternKernel(Kernel): 12 | def __init__(self, width=1.0, nu=1.5, sigma=1.0): 13 | Kernel.__init__(self) 14 | GenericTests.check_type(width,'width',float) 15 | GenericTests.check_type(nu,'nu',float) 16 | GenericTests.check_type(sigma,'sigma',float) 17 | 18 | self.width = width 19 | self.nu = nu 20 | self.sigma = sigma 21 | 22 | def __str__(self): 23 | s=self.__class__.__name__+ "[" 24 | s += "width="+ str(self.width) 25 | s += ", nu="+ str(self.nu) 26 | s += ", sigma="+ str(self.sigma) 27 | s += "]" 28 | return s 29 | 30 | def kernel(self, X, Y=None): 31 | 32 | GenericTests.check_type(X,'X',np.ndarray,2) 33 | # if X=Y, use more efficient pdist call which exploits symmetry 34 | if Y is None: 35 | dists = squareform(pdist(X, 'euclidean')) 36 | else: 37 | GenericTests.check_type(Y,'Y',np.ndarray,2) 38 | assert(shape(X)[1]==shape(Y)[1]) 39 | dists = cdist(X, Y, 'euclidean') 40 | if self.nu==0.5: 41 | #for nu=1/2, Matern class corresponds to Ornstein-Uhlenbeck Process 42 | K = (self.sigma**2.)
* exp( -dists / self.width ) 43 | elif self.nu==1.5: 44 | K = (self.sigma**2.) * (1+ sqrt(3.)*dists / self.width) * exp( -sqrt(3.)*dists / self.width ) 45 | elif self.nu==2.5: 46 | K = (self.sigma**2.) * (1+ sqrt(5.)*dists / self.width + 5.0*(dists**2.) / (3.0*self.width**2.) ) * exp( -sqrt(5.)*dists / self.width ) 47 | else: 48 | raise NotImplementedError() 49 | return K 50 | 51 | def rff_generate(self,m,dim=1): 52 | self.rff_num=m 53 | assert(dim==1) 54 | ##currently works only for dim=1 55 | ##need to check how student spectral density generalizes to multivariate case 56 | assert(self.sigma==1.0) 57 | ##the scale parameter should be one 58 | if self.nu==0.5 or self.nu==1.5 or self.nu==2.5: 59 | df = self.nu*2 60 | self.unit_rff_freq=np.random.standard_t(df,size=(int(m/2),dim)) 61 | self.rff_freq=self.unit_rff_freq/self.width 62 | else: 63 | raise NotImplementedError() 64 | 65 | def gradient(self, x, Y): 66 | assert(len(shape(x))==1) 67 | assert(len(shape(Y))==2) 68 | assert(len(x)==shape(Y)[1]) 69 | 70 | if self.nu==1.5 or self.nu==2.5: 71 | x_2d=reshape(x, (1, len(x))) 72 | lower_order_width = self.width * sqrt(2*(self.nu-1)) / sqrt(2*self.nu) 73 | lower_order_kernel = MaternKernel(lower_order_width,self.nu-1,self.sigma) 74 | k = lower_order_kernel.kernel(x_2d, Y) 75 | differences = Y - x 76 | G = ( 1.0 / lower_order_width ** 2 ) * (k.T * differences) 77 | return G 78 | else: 79 | raise NotImplementedError() 80 | 81 | if __name__ == '__main__': 82 | from tools.UnitTests import UnitTests 83 | UnitTests.UnitTestDefaultKernel(MaternKernel) 84 | kernel=MaternKernel(width=2.0) 85 | x=np.random.rand(10,1) 86 | y=np.random.rand(15,1) 87 | K=kernel.kernel(x,y) 88 | kernel.rff_generate(50000) 89 | phix=kernel.rff_expand(x) 90 | phiy=kernel.rff_expand(y) 91 | Khat=phix.dot(phiy.T) 92 | print(np.linalg.norm(K-Khat)) 93 | -------------------------------------------------------------------------------- /kerpy/PolynomialKernel.py: -------------------------------------------------------------------------------- 1 | from numpy import array 2 | 3 | from kerpy.Kernel import Kernel 4 | 5 | 6 | class PolynomialKernel(Kernel): 7 | def __init__(self, degree,theta=1.0): 8 | Kernel.__init__(self) 9 | self.degree = degree 10 | self.theta = theta 11 | 12 | def __str__(self): 13 | s=self.__class__.__name__+ "=[" 14 | s += "degree="+ str(self.degree) 15 | s += ", " + Kernel.__str__(self) 16 | s += "]" 17 | return s 18 | 19 | def kernel(self, X, Y=None): 20 | """ 21 | Computes the polynomial kernel k(x,y)=(theta + x^T y)^degree for the given data 22 | X - samples on right hand side 23 | Y - samples on left hand side, can be None in which case it's replaced by X 24 | """ 25 | if Y is None: 26 | Y = X 27 | 28 | return pow(self.theta+X.dot(Y.T), self.degree) 29 | 30 | def gradient(self, x, Y): 31 | """ 32 | Computes the gradient of the Polynomial kernel wrt. the left argument, i.e.
\nabla_x k(x,y)=\nabla_x (theta+x^Ty)^d = d(theta+x^Ty)^(d-1) y 34 | 35 | x - single sample on right hand side (1D vector) 36 | Y - samples on left hand side (2D matrix) 37 | """ 38 | assert(len(x.shape)==1) 39 | assert(len(Y.shape)==2) 40 | assert(len(x)==Y.shape[1]) 41 | 42 | return self.degree*pow(self.theta+x.dot(Y.T), self.degree-1)*Y 43 | -------------------------------------------------------------------------------- /kerpy/ProductKernel.py: -------------------------------------------------------------------------------- 1 | from kerpy.Kernel import Kernel 2 | import numpy as np 3 | 4 | 5 | class ProductKernel(Kernel): 6 | def __init__(self, list_of_kernels): 7 | Kernel.__init__(self) 8 | self.list_of_kernels = list_of_kernels 9 | 10 | def __str__(self): 11 | s=self.__class__.__name__+ "=[" 12 | s += ", " + Kernel.__str__(self) 13 | s += "]" 14 | return s 15 | 16 | def kernel(self, X, Y=None): 17 | return np.prod([individual_kernel.kernel(X,Y) for individual_kernel in self.list_of_kernels],0) -------------------------------------------------------------------------------- /kerpy/SumKernel.py: -------------------------------------------------------------------------------- 1 | from kerpy.Kernel import Kernel 2 | import numpy as np 3 | 4 | 5 | class SumKernel(Kernel): 6 | def __init__(self, list_of_kernels): 7 | Kernel.__init__(self) 8 | self.list_of_kernels = list_of_kernels 9 | 10 | def __str__(self): 11 | s=self.__class__.__name__+ "=[" 12 | s += ", " + Kernel.__str__(self) 13 | s += "]" 14 | return s 15 | 16 | def kernel(self, X, Y=None): 17 | return np.sum([individual_kernel.kernel(X,Y) for individual_kernel in self.list_of_kernels],0) -------------------------------------------------------------------------------- /kerpy/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxcsml/kerpy/50b175961d13e0e1f625aa987ae41cb98bfe4d84/kerpy/__init__.py -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | setup( 4 | name='kerpy', 5 | version='0.1.0', 6 | url='https://github.com/oxmlcs/kerpy', 7 | packages=find_packages(), 8 | description='Code for kernel methods', 9 | license='MIT' 10 | ) 11 | -------------------------------------------------------------------------------- /tools/GenericTests.py: -------------------------------------------------------------------------------- 1 | class GenericTests(): 2 | @staticmethod 3 | def check_type(varvalue, varname, vartype, required_shapelen=None): 4 | if not type(varvalue) is vartype: 5 | raise TypeError("Variable " + varname + " must be of type " + vartype.__name__ + \ 6 | ".
Given is " + str(type(varvalue))) 7 | if not required_shapelen is None: 8 | if not len(varvalue.shape) is required_shapelen: 9 | raise ValueError("Variable " + varname + " must be " + str(required_shapelen) + "-dimensional") 10 | return 0 11 | -------------------------------------------------------------------------------- /tools/ProcessingObject.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | ''' 3 | Class containing some helper functions, i.e., argument parsing 4 | ''' 5 | 6 | from kerpy.LinearKernel import LinearKernel 7 | from kerpy.GaussianKernel import GaussianKernel 8 | 9 | class ProcessingObject(object): 10 | def __init__(self): 11 | ''' 12 | Constructor 13 | ''' 14 | @staticmethod 15 | def parse_arguments(): 16 | parser = argparse.ArgumentParser() 17 | parser.add_argument("num_samples", type=int,\ 18 | help="total # of samples") 19 | parser.add_argument("--num_rfx", type=int,\ 20 | help="number of random features of the data X", 21 | default=30) 22 | parser.add_argument("--num_rfy", type=int,\ 23 | help="number of random features of the data Y", 24 | default=30) 25 | parser.add_argument("--num_inducex", type=int,\ 26 | help="number of inducing variables of the data X", 27 | default=30) 28 | parser.add_argument("--num_inducey", type=int,\ 29 | help="number of inducing variables of the data Y", 30 | default=30) 31 | parser.add_argument("--num_shuffles", type=int,\ 32 | help="number of shuffles", 33 | default=800) 34 | parser.add_argument("--blocksize", type=int,\ 35 | help="# of samples per block (includes X and Y) when using a block-based test", 36 | default=20) 37 | parser.add_argument("--dimX", type=int,\ 38 | help="dimensionality of the data X", 39 | default=3) 40 | parser.add_argument("--dimZ", type=int,\ 41 | help="dimensionality of the data Z (i.e. the conditioning variable)", 42 | default=7) 43 | parser.add_argument("--kernelX", const = LinearKernel(), default = GaussianKernel(1.), \ 44 | action='store_const', \ 45 | help="Linear kernel (Default GaussianKernel(1.))?") 46 | parser.add_argument("--kernelY", const = LinearKernel(), default = GaussianKernel(1.), \ 47 | action='store_const', \ 48 | help="Linear kernel (Default GaussianKernel(1.))?") 49 | parser.add_argument("--kernelX_use_median", action="store_true",\ 50 | help="should median heuristic be used for X?", 51 | default=False) 52 | parser.add_argument("--kernelY_use_median", action="store_true",\ 53 | help="should median heuristic be used for Y?", 54 | default=False) 55 | 56 | parser.add_argument("--kernelRxz", const = GaussianKernel(1.), default = LinearKernel(), \ 57 | action='store_const', \ 58 | help="Gaussian kernel(1.) 
(Default LinearKernel)?") 59 | parser.add_argument("--kernelRyz", const = GaussianKernel(1.), default = LinearKernel(), \ 60 | action='store_const', \ 61 | help="Gaussian kernel(1.) (Default LinearKernel)?") 62 | parser.add_argument("--kernelRxz_use_median", action="store_true",\ 63 | help="should median heuristic be used for residuals Rxz?", 64 | default=False) 65 | parser.add_argument("--kernelRyz_use_median", action="store_true",\ 66 | help="should median heuristic be used for residuals Ryz?", 67 | default=False) 68 | 69 | parser.add_argument("--RESIT_type", action="store_true",\ 70 | help="Conditional Testing using RESIT?",\ 71 | default=False) 72 | parser.add_argument("--optimise_lambda_only", action="store_false",\ 73 | help="Optimise lambdas only?",\ 74 | default=True) 75 | parser.add_argument("--grid_search", action="store_false",\ 76 | help="Optimise hyperparameters through grid search?",\ 77 | default=True) 78 | parser.add_argument("--GD_optimise", action="store_true",\ 79 | help="Optimise hyperparameters through gradient descent?",\ 80 | default=False) 81 | 82 | parser.add_argument("--results_filename",type=str,\ 83 | help = "name of the file to save results?",\ 84 | default = "testing") 85 | parser.add_argument("--figure_filename",type=str,\ 86 | help = "name of the file to save the causal graph?",\ 87 | default = "testing") 88 | parser.add_argument("--data_filename",type=str,\ 89 | help = "name of the file to load data from?",\ 90 | default = "testing") 91 | 92 | #parser.add_argument("--dimY", type=int,\ 93 | # help="dimensionality of the data Y", 94 | # default=3) 95 | parser.add_argument("--hypothesis", type=str,\ 96 | help="is null or alternative true in this experiment? [null, alter]",\ 97 | default="alter") 98 | parser.add_argument("--nullvarmethod", type=str,\ 99 | help="how to estimate asymptotic variance under null? [direct, permutation, across]?",\ 100 | default="direct") 101 | parser.add_argument("--streaming", action="store_true",\ 102 | help="should data be streamed (rather than all loaded into memory)?",\ 103 | default=False) 104 | parser.add_argument("--rff", action="store_true",\ 105 | help="should random features be used?",\ 106 | default=False) 107 | parser.add_argument("--induce_set", action="store_true",\ 108 | help="should inducing variables be used?",\ 109 | default=False) 110 | args = parser.parse_args() 111 | return args -------------------------------------------------------------------------------- /tools/UnitTests.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from kerpy.GaussianKernel import GaussianKernel 3 | 4 | class UnitTests(): 5 | 6 | @staticmethod 7 | def UnitTestDefaultKernel(which_kernel): 8 | dim = 5 9 | nx = 20 10 | X=np.random.randn(nx,dim) 11 | kernel = which_kernel() 12 | kernel.show_kernel_matrix(X) 13 | print '...successfully visualised kernel matrix.' 14 | response_y=X[:,1]**2+np.random.randn(nx) 15 | kernel.ridge_regress(X,response_y) 16 | print '...successfully ran ridge regression.'
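# Usage note for UnitTestDefaultKernel: any Kernel subclass whose constructor has default arguments can be smoke-tested this way, as the kernel modules do from their own __main__ blocks, e.g. in MaternKernel.py:
#   from tools.UnitTests import UnitTests
#   UnitTests.UnitTestDefaultKernel(MaternKernel)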
17 | 18 | @staticmethod 19 | def UnitTestBagKernel(which_bag_kernel): 20 | num_bagsX = 20 21 | num_bagsY = 30 22 | shift = 2.0 23 | dim = 3 24 | bagsize = 50 25 | qvar = 0.6 26 | baglistx = list() 27 | baglisty = list() 28 | for _ in range(num_bagsX): 29 | muX = np.sqrt(qvar) * np.random.randn(1, dim) 30 | baglistx.append(muX + np.sqrt(1 - qvar) * np.random.randn(bagsize, dim)) 31 | for _ in range(num_bagsY): 32 | muY = np.sqrt(qvar) * np.random.randn(1, dim) 33 | muY[:, 0] = muY[:, 0] + shift 34 | baglisty.append(muY + np.sqrt(1 - qvar) * np.random.randn(bagsize, dim)) 35 | data_kernel = GaussianKernel(1.0) 36 | bag_kernel = which_bag_kernel(data_kernel) 37 | bag_kernel.show_kernel_matrix(baglistx + baglisty) 38 | print '...successfully visualised kernel matrix on bags.' 39 | bag_kernel.rff_generate(dim=dim) 40 | bagmmd = bag_kernel.estimateMMD_rff(baglistx, baglisty) 41 | print '...successfully computed rff mmd on bags; value: ', bagmmd 42 | response_y=np.random.randn(num_bagsX) 43 | bag_kernel.ridge_regress_rff(baglistx,response_y) 44 | print '...successfully ran rff ridge regression on bags.' 45 | print 'unit test ran for ', bag_kernel.__str__() 46 | -------------------------------------------------------------------------------- /tools/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxcsml/kerpy/50b175961d13e0e1f625aa987ae41cb98bfe4d84/tools/__init__.py -------------------------------------------------------------------------------- /tools/read_and_plot_test_results.py: -------------------------------------------------------------------------------- 1 | from matplotlib.pyplot import figure, gca, errorbar, grid, show, legend, xlabel, \ 2 | ylabel, ylim 3 | from numpy import sqrt, argsort, asarray 4 | import os 5 | from pickle import load 6 | import sys 7 | 8 | import matplotlib as mpl 9 | 10 | 11 | mpl.rcParams['text.usetex']=True 12 | mpl.rcParams['text.latex.unicode']=True 13 | 14 | 15 | load_filename = "results.bin" 16 | #print sys.argv[1:] 17 | folders_greps = [x.replace('results/','') for x in sys.argv[1:]] 18 | #print folders_greps 19 | #if len(sys.argv)==1: 20 | # sys.argv[1:] = raw_input('Read null or alter: ').split() 21 | which_case = 'alter' 22 | 23 | lsdirs = os.listdir('results/') 24 | print lsdirs 25 | figure() 26 | ax=gca() 27 | legend_str=list() 28 | for folders_grep in folders_greps: 29 | counter=list() 30 | numTrials=list() 31 | rate=list() 32 | stder=list() 33 | num_samples=list() 34 | ii=0 35 | for lsdir in lsdirs: 36 | if os.path.isdir('results/'+lsdir) and lsdir.startswith(folders_grep): 37 | os.chdir('results/'+lsdir) 38 | load_f = open(load_filename,"r") 39 | [counter_current, numTrials_current, param,_,_] = load(load_f) 40 | print 'reading ' + lsdir + ' -found ' + str(numTrials_current) + ' trials' 41 | num_samples.append( param['num_samplesX'] ) 42 | numTrials.append(numTrials_current) 43 | counter.append(counter_current) 44 | rate.append( counter[ii]/float(numTrials[ii]) ) 45 | stder.append( 1.96*sqrt( rate[ii]*(1-rate[ii]) / float(numTrials[ii]) ) ) 46 | print "Rejection rate: %.3f +- %.3f (%d / %d)" % (rate[ii], stder[ii], counter[ii], numTrials[ii]) 47 | os.chdir('../..') 48 | ii+=1 49 | #stat_test sizes may not be ordered 50 | legend_str.append(param['name'].split('_')[0]) 51 | #legend_str.append(param['name']) 52 | order = argsort(num_samples) 53 | 54 | 55 | errorbar(asarray(num_samples)[order],\ 56 | asarray(rate)[order],\ 57 | yerr=asarray(stder)[order]) 58 | 59 | 
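# Note on the error bars drawn above: stder is the half-width of a 95% normal-approximation (Wald) confidence interval for the rejection rate, i.e. 1.96 * sqrt( rate*(1-rate) / numTrials ).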
legend(legend_str,loc=4) 60 | xlabel("number of samples",fontsize=12) 61 | if which_case=='null': 62 | ylim([0,0.12]) 63 | ax.set_yticks([0.01, 0.03, 0.05, 0.07, 0.09, 0.11]) 64 | ylabel("rejection rate (Type I error)",fontsize=12) 65 | elif which_case=='alter': 66 | #ylabel("rejection rate (1-Type II error)",fontsize=12) 67 | ylabel("rejection rate",fontsize=12) 68 | ax.set_xscale("log",basex=2) 69 | grid() 70 | show() 71 | -------------------------------------------------------------------------------- /tools/read_test_results.py: -------------------------------------------------------------------------------- 1 | import os,sys 2 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' ) 3 | sys.path.append(BASE_DIR) 4 | 5 | from numpy import sqrt 6 | import os 7 | from pickle import load 8 | import sys 9 | 10 | 11 | os.chdir(sys.argv[1]) 12 | load_filename = "results.bin" 13 | load_f = open(load_filename,"r") 14 | [counter, numTrials, param, average_time, pvalues] = load(load_f) 15 | load_f.close() 16 | 17 | rate = counter/float(numTrials) 18 | stder = 1.96*sqrt( rate*(1-rate) / float(numTrials) ) 19 | '''this stder is symmetrical in terms of rate''' 20 | 21 | print "Parameters:" 22 | for keys,values in param.items(): 23 | print(keys) 24 | print(values) 25 | print "Rejection rate: %.3f +- %.3f (%d / %d)" % (rate, stder, counter, numTrials) 26 | print "Average test time: %.5f sec" % average_time 27 | os.chdir('..') 28 | #Minor: need to change the above for Gaussian Kernel Median Heuristic 29 | 30 | -------------------------------------------------------------------------------- /weak_conditional_independence_testing/Ozone_prewhiten.csv: -------------------------------------------------------------------------------- 1 | "","Ozone","Temp","InvHt","Pres","Vis","Hgt","Hum","InvTmp","Wind" 2 | "1",-0.093145,-0.097772,0.11934,-0.054445,0.79649,-0.003748,-0.10658,-0.05296,-0.1364 3 | "2",0.017124,-0.019049,-0.20739,-0.13161,-0.26912,-0.070705,-0.043278,0.025266,-0.60218 4 | "3",-0.016349,0.26796,-0.0054871,0.16532,-0.33646,0.094017,-0.00088725,0.026799,-0.61121 5 | "4",0.045921,-0.29085,-0.06006,0.10718,-0.32905,-0.10617,0.28116,-0.021376,-0.14299 6 | "5",-0.049687,-0.0092208,0.16081,-0.16119,-0.23511,0.081014,-0.26058,-0.047845,0.79053 7 | "6",-0.053726,0.24389,-0.24056,-0.13393,0.91119,0.10203,-0.20197,0.11335,-0.63392 8 | "7",0.017516,-0.17027,0.01662,0.20012,-0.22723,-0.10208,0.26964,-0.037665,-0.63734 9 | "8",0.065979,-0.088185,-0.056292,-0.01744,-0.081539,-0.079742,0.083908,-0.032508,-0.65003 10 | "9",-0.092817,0.12984,0.44579,-0.11491,-0.058096,0.11828,-0.21664,-0.074408,1.7013 11 | "10",0.040506,-0.020616,-0.35255,0.20127,0.2584,-0.088947,0.095655,0.066674,-0.67176 12 | "11",-0.0058906,-0.041487,-0.2795,-0.14027,-0.81997,0.0086807,-0.00039355,0.046488,0.73755 13 | "12",-0.039191,-0.017412,0.4783,-0.00093938,0.33832,-0.030532,-0.086534,-0.13117,0.73096 14 | "13",-0.063422,0.009324,-0.40386,-0.012208,0.52485,0.020466,-0.033456,0.14501,-0.6847 15 | "14",0.073353,0.0066984,0.26671,-0.13545,0.080308,0.091766,-0.041759,-0.032971,-1.1585 16 | "15",-0.092215,0.031637,0.31154,-0.03473,0.26975,0.051543,-0.19031,-0.027275,0.24467 17 | "16",0.15615,0.18368,-0.29293,0.28441,0.13441,0.033471,0.12352,6.2296e-05,-0.23528 18 | "17",-0.074148,-0.1671,-0.33887,-0.19295,-0.96298,-0.16004,0.29601,0.095371,0.22343 19 | "18",-0.061865,-0.014602,0.3029,-0.06044,-0.0019414,-0.010194,-0.24038,-0.060201,-0.25578 20 | "19",-0.00168,-0.055533,0.098344,-0.091167,0.67863,0.081056,-0.085277,-0.020209,-0.74305 21 
| "20",-0.030806,-0.040905,0.16597,-0.038903,0.21668,0.010897,-0.1314,-0.0058436,-0.28215 22 | "21",-0.01705,0.29397,-0.10401,0.11788,-0.024424,0.011319,0.093732,0.023869,0.16825 23 | "22",-0.035197,-0.18858,0.2303,0.116,0.15788,-0.099767,0.096,-0.087188,0.15311 24 | "23",0.085781,-0.046949,-0.29105,-0.040372,-0.44213,-0.057153,0.076567,0.029869,-0.80677 25 | "24",-0.050139,0.0060219,-0.11092,-0.14669,-0.22533,0.09815,-0.14795,0.049668,0.099892 26 | "25",-0.10213,0.044801,0.060901,0.041121,0.29486,0.021649,-0.080151,-0.021589,0.059365 27 | "26",0.12932,0.0027747,-0.07598,0.042492,0.11341,-0.037602,0.015296,-0.043092,-0.90687 28 | "27",0.13751,0.040503,-0.031541,-0.20589,-0.51025,-0.051945,0.20176,0.062748,-1.3958 29 | "28",-0.23697,-0.028659,-0.09363,0.046094,0.49778,0.13725,-0.45266,0.049707,0.021279 30 | "29",0.13907,0.12582,-0.18156,-0.11365,-0.50027,0.074295,0.24876,0.041351,1.4025 31 | "30",0.076123,0.098252,0.1105,0.18489,-0.36071,-0.0042622,-0.02582,-0.032836,-0.98719 32 | "31",-0.16406,-0.14923,0.1686,0.16194,-0.06017,0.092023,0.087884,-0.042206,-1.0162 33 | "32",-0.014058,-0.089128,0.24406,-0.070679,0.56292,-0.1635,-0.037193,-0.10009,0.36646 34 | "33",0.0021554,-0.097518,-0.41413,0.013574,-1.0666,-0.24585,-0.050606,0.093285,0.80758 35 | "34",-0.050037,-0.00040655,0.29045,-0.019683,1.1317,0.0058602,0.034025,-0.06687,1.7184 36 | "35",-0.0011941,0.1087,-0.07885,-0.17815,-0.15424,0.23978,0.048918,0.067657,0.75704 37 | "36",-0.034264,-0.091675,0.11895,0.21793,0.62888,-0.19389,0.0086469,-0.03978,2.628 38 | "37",-0.032957,-0.028242,0.067776,-0.0084405,0.15234,-0.087069,0.16099,-0.049816,2.1571 39 | "38",0.0025582,0.00082486,0.074259,-0.15714,0.26035,0.171,-0.26202,0.036384,-1.1395 40 | "39",0.079801,0.041585,0.1839,-0.039508,0.49467,-0.017501,-0.082929,-0.036024,-0.18771 41 | "40",8.718e-05,0.18752,-0.27632,0.0059564,-0.032088,0.074521,0.26786,0.12188,-0.65691 42 | "41",-0.047864,-0.14743,0.23937,0.12019,-0.73364,-0.02637,-0.090461,-0.1152,-0.18185 43 | "42",-0.12172,-0.10099,0.17588,0.035541,0.58835,-0.181,-0.017993,-0.042117,-0.18697 44 | "43",-0.014571,0.1002,-0.27851,0.020932,0.1202,0.077506,0.0092277,0.041196,-0.1794 45 | "44",0.059791,-0.11546,0.18181,-0.12482,0.34948,0.062502,-0.028147,0.0078783,-0.1899 46 | "45",0.12983,0.1778,-0.3087,0.13924,-1.0052,0.053434,0.20653,0.05401,-1.1327 47 | "46",-0.055625,-0.13049,0.19026,-0.03122,0.57716,-0.062229,-0.14956,-0.055632,1.2089 48 | "47",-0.022346,-0.038294,0.07451,-0.12161,0.31238,-0.022082,-0.048036,-0.023586,-1.1542 49 | "48",-0.04233,0.045813,0.06368,-0.12719,0.25664,0.078002,-0.045707,0.025562,2.6133 50 | "49",-0.065902,0.044751,0.081771,0.086045,0.32132,0.0022497,-0.22112,0.027664,0.7319 51 | "50",-0.063293,-0.11601,0.25574,0.075749,0.60925,-0.14511,0.22617,-0.12263,-0.2026 52 | "51",-0.034857,0.15814,-0.222,0.038138,-0.3117,0.054804,0.038702,0.053264,0.2793 53 | "52",-0.012506,-0.032088,-0.029836,-0.036051,0.11993,0.031241,-0.01442,0.020946,-1.1312 54 | "53",0.31635,-0.002521,-0.058239,-0.060752,-0.18149,0.013613,-0.076768,0.022002,-1.1307 55 | "54",0.11234,0.090624,-0.25568,-0.16512,-0.3205,0.081899,0.010071,0.089579,-1.1298 56 | "55",-0.16637,-0.066545,0.1638,0.21111,-0.32659,-0.0030047,0.0051735,-0.065605,0.74801 57 | "56",-0.10841,0.07696,0.12813,0.1074,-0.55055,0.024292,0.15949,0.021917,2.1544 58 | "57",-0.010117,-0.075732,0.069652,-0.046715,1.0779,0.039787,-0.11294,-0.05303,0.74337 59 | "58",-0.02972,-0.1729,0.082002,0.053623,0.1007,-0.31841,-0.019465,-0.038831,2.6468 60 | 
"59",-0.045912,0.013326,0.081975,-0.11551,-0.085929,0.01908,0.059133,-0.020284,-1.079 61 | "60",0.011591,0.0065198,0.069409,-0.067275,0.0021883,0.030953,-0.25321,0.0093028,-1.0599 62 | "61",0.022422,0.10419,0.12596,0.098027,0.068421,0.16254,0.19234,0.02869,0.84518 63 | "62",0.024822,0.0076408,-0.12315,-0.046011,-0.28591,-0.0096443,-0.095046,0.014651,-0.080528 64 | "63",0.028466,-0.050638,0.12864,-0.0065388,-0.046696,-0.12146,-0.021722,-0.0509,-0.065879 65 | "64",-0.071334,0.040425,0.093503,0.056515,0.3449,-0.098401,0.049572,0.0032188,-0.051475 66 | "65",-0.16142,-0.11173,0.29657,0.27503,-0.6028,0.010187,0.019179,-0.1353,1.8629 67 | "66",0.024503,-0.015429,-0.26908,-0.2965,0.77435,0.012226,0.10182,0.10086,-2.3648 68 | "67",0.080117,0.094225,-0.10823,-0.031723,0.82721,0.099077,-0.27023,0.043306,-0.46013 69 | "68",-0.041704,-0.10259,-0.037483,0.14036,0.01783,-0.053029,0.19051,-0.026712,-0.9181 70 | "69",0.16638,0.050003,-0.057319,-0.10713,-0.77975,0.035471,-0.0070737,-0.013039,0.034951 71 | "70",-0.35066,-0.0092071,-0.052745,-0.073539,-0.14683,0.055523,-0.13755,0.058131,-1.3758 72 | "71",0.47948,0.2394,-0.10328,-0.089471,0.021991,0.035277,-0.0090086,0.031265,-0.90199 73 | "72",-0.068766,0.071209,0.079,0.29574,-0.60461,-0.051623,0.22307,-0.052136,-0.43742 74 | "73",-0.15203,-0.42815,-0.072298,0.10745,0.20125,-0.072412,0.051643,-0.069018,-0.44597 75 | "74",0.045964,0.04773,-0.060592,-0.28393,-0.24805,0.036297,-0.17559,0.028971,0.01664 76 | "75",-0.060333,0.030715,-0.069216,-0.036279,0.89885,-0.0039374,-0.11585,0.028432,0.005654 77 | "76",-0.014923,0.20888,-0.078439,-0.1095,-0.088595,0.030597,-0.14499,0.092448,-0.47649 78 | "77",-0.014596,-0.013589,-0.052194,0.25603,0.54936,-0.026025,0.23072,-0.064101,-0.016075 79 | "78",0.13876,-0.21229,-0.22521,-0.024601,-0.53925,0.028651,0.21482,0.01793,-0.5153 80 | "79",-0.11017,0.25857,0.27456,-0.20265,-0.15744,0.098819,-0.3388,-0.020753,-1.0053 81 | "80",0.085521,-0.15495,0.27692,0.14405,0.1402,-0.15868,0.072422,-0.048075,0.39087 82 | "81",-0.11865,-0.21702,-0.24543,0.14007,-0.010956,-0.052428,-0.1016,0.021031,-0.57365 83 | "82",-0.026367,0.167,-0.22674,-0.10969,0.40872,0.030826,0.21113,0.007197,0.81857 84 | "83",0.099509,0.054983,0.48251,-0.18133,-0.53541,0.12493,-0.11581,-0.034897,0.33472 85 | "84",0.25097,0.37542,-0.40587,-0.03867,1.4893,0.42857,-0.1479,0.33351,-1.1017 86 | "85",-0.24833,-0.38537,0.11079,0.18531,-0.77878,-0.51743,0.053463,-0.27943,0.31543 87 | "86",0.048214,0.31249,-0.26587,-0.090829,-0.58256,0.10126,0.10305,0.086241,-0.17599 88 | "87",-0.069842,-0.281,0.2757,0.19774,0.23944,0.12812,0.0031289,-0.085679,0.27173 89 | "88",-0.05822,-0.062762,0.0856,-0.12041,0.058442,-0.35607,0.076165,-0.049382,0.72384 90 | "89",-0.11426,-0.15402,0.078237,0.10504,-0.15337,0.030235,-0.054349,-0.040479,-0.23653 91 | "90",0.035341,0.043391,0.20195,-0.0080218,0.52716,-0.026813,-0.069955,-0.010474,0.21558 92 | "91",0.21773,0.28573,-0.33044,-0.091896,0.49828,0.24704,0.051797,0.13183,0.19971 93 | "92",-0.22643,-0.31745,0.22521,0.13559,0.3441,-0.20982,0.029553,-0.1318,0.65866 94 | "93",0.075865,0.2237,0.28803,-0.14259,-0.47663,0.049918,-0.045934,0.014562,0.64718 95 | "94",0.069631,0.014134,-0.52716,0.083878,-0.15791,0.070876,-0.04692,0.11623,0.16651 96 | "95",-0.020627,0.0059018,0.2738,0.0089682,0.38834,-0.018585,0.14204,-0.06389,0.63082 97 | "96",-0.043266,-0.03921,0.096835,0.1064,-0.083609,0.086126,0.039782,-0.017706,-0.78996 98 | "97",-0.078703,-0.16264,0.18146,-0.11855,-0.7139,-0.19301,-0.13234,-0.086961,-0.3232 99 | 
"98",0.11007,0.17518,-0.26995,-0.17498,1.1216,-0.10793,-0.059387,0.14697,0.60787 100 | "99",-0.10509,-0.013195,0.18142,0.40487,-0.83912,0.29492,0.2513,-0.092083,-0.34273 101 | "100",-0.072208,-0.24302,0.096515,-0.13686,0.94605,-0.38796,-0.16891,-0.034112,2.482 102 | "101",-0.03857,0.079007,0.27095,-0.032518,0.16794,0.16518,-0.12899,-0.048616,0.13208 103 | "102",-0.009673,-0.1007,-0.20877,0.07521,-0.21482,-0.042125,0.14613,0.0043484,0.13648 104 | "103",-0.015564,0.14738,-0.10084,-0.11126,-0.17577,0.067726,-0.072171,0.060275,-0.80363 105 | "104",0.27272,0.023141,-0.095037,-0.086661,0.12798,0.12277,-0.01659,0.052855,-0.7968 106 | "105",0.10727,0.12882,-0.20073,0.020061,-0.36533,-0.013395,0.090172,0.031087,-0.33077 107 | "106",-0.27811,-0.14606,0.19235,0.20702,0.17821,-0.13339,-0.06837,-0.095174,0.13745 108 | "107",-0.1363,0.025627,0.18962,-0.065819,0.004597,-0.025737,-0.019695,-0.031304,-0.33541 109 | "108",0.21446,0.23089,-0.13172,-0.041417,0.13053,0.11772,0.098928,0.072549,0.60519 110 | "109",0.14252,-0.22157,-0.081383,0.17454,-0.41553,0.040126,0.12091,0.035412,0.60739 111 | "110",-0.012659,-0.044224,-0.27672,-0.25266,-0.37885,0.022023,-0.16527,0.058801,-0.33858 112 | "111",-0.22011,0.028115,0.30199,0.092061,0.29017,-0.13587,0.0038974,-0.10976,1.5556 113 | "112",-0.16034,-0.1283,0.2919,0.085114,0.26126,-0.075465,-0.069364,-0.12937,0.14209 114 | "113",0.14171,0.028544,-0.22311,0.024762,-0.14993,0.0054149,0.083447,0.04066,-0.31612 115 | "114",-0.022637,0.091403,-0.084924,-0.11547,-0.1293,0.12291,-0.047067,0.096333,-0.78215 116 | "115",0.42051,0.10346,-0.11233,-0.055349,-0.022063,0.047044,-0.050401,0.043787,-0.77824 117 | "116",-0.1556,-0.025769,0.086764,0.072457,0.18352,-0.027061,0.1097,-0.034489,1.1081 118 | "117",6.3438e-05,0.11728,-0.25179,0.029124,-0.046895,-0.029639,-0.0054905,0.030866,-0.77092 119 | "118",-0.32598,-0.38485,0.40178,0.18662,-0.039672,0.011423,0.072927,-0.12761,0.18018 120 | "119",0.070156,0.098291,-0.15512,-0.14732,0.16113,-0.11842,-0.070327,0.02839,1.1208 121 | "120",-0.00096457,0.08132,-0.19229,-0.029754,0.088827,0.073647,0.023315,0.014157,0.18457 122 | "121",0.1507,-0.024158,0.053973,0.0099844,-0.21279,-0.03171,0.033068,0.022262,-1.2199 123 | "122",-0.14412,0.045205,-0.030515,0.07492,-0.0073608,-0.046674,0.025365,-0.018576,0.67379 124 | "123",0.058549,-0.054412,-0.03565,0.021603,0.0073847,0.026153,-0.022628,-0.037163,0.20337 125 | "124",0.2897,0.033797,-0.10011,-0.096226,-0.1287,0.059913,0.0066319,0.073549,-1.2011 126 | "125",-0.17385,0.11107,-0.055595,-0.10602,-0.20483,0.067093,0.1066,0.040909,0.21655 127 | "126",0.33715,-0.064654,-0.053199,0.12789,-0.22108,-0.003913,0.01663,0.024045,-0.72868 128 | "127",-0.2685,0.042126,0.038965,0.0017252,-0.084989,-0.019464,-0.08282,-0.047408,0.21338 129 | "128",0.23153,0.016357,-0.028403,0.034063,-0.15146,0.05728,0.068069,-0.0077306,0.68258 130 | "129",-0.049287,0.058082,-0.15434,-0.015044,-0.06034,-0.034135,0.042461,0.058663,-1.204 131 | "130",-0.052534,-0.08612,0.11421,0.035174,-0.10175,-0.051681,-0.036196,-0.041309,0.68527 132 | "131",-0.14403,-0.02851,0.031018,-0.015835,0.14606,-0.0049731,0.012834,-0.017133,1.1589 133 | "132",0.1608,0.055533,-0.082125,0.022227,-0.11462,0.039258,-0.046572,0.022777,-0.24776 134 | "133",-0.16049,-0.11446,0.057265,0.062626,-0.12326,-0.080834,-0.015444,-0.070646,1.6503 135 | "134",-0.32425,-0.076972,0.20349,0.071753,0.56977,-0.015292,0.054639,-0.02537,-0.22408 136 | "135",0.59117,0.19889,-0.13427,-0.19257,0.029151,0.053167,-0.044696,0.086025,-1.1554 137 | 
"136",0.17697,0.09588,-0.11125,0.064482,-0.45767,0.068228,0.071516,0.050011,0.74215 138 | "137",-0.31726,-0.11,0.22428,0.095345,-0.14771,0.027109,0.025512,-0.049299,-1.1459 139 | "138",-0.18165,-0.0028762,-0.24712,0.0042936,0.016767,-0.12977,0.004065,0.010222,0.2793 140 | "139",0.0072111,-0.047464,0.22005,0.097062,0.12214,-0.039559,-0.0631,-0.089615,-1.131 141 | "140",-0.39475,-0.21603,0.44358,0.24456,-0.074809,-0.095287,0.22561,-0.18593,-0.65813 142 | "141",0.47902,0.19895,-0.55482,-0.39939,0.1772,0.18851,-0.22745,0.24444,-0.65813 143 | "142",0.020357,0.028639,-0.10355,0.0019812,-0.21775,0.042304,0.17532,0.038137,1.2265 144 | "143",0.11467,0.040319,0.091645,0.098576,-0.11356,-0.035415,-0.042529,-0.015461,0.28174 145 | "144",0.0079812,-0.064984,0.13813,0.0028263,0.0090637,-0.047188,-0.043902,-0.0096151,0.27905 146 | "145",0.035329,0.048074,-0.14307,-0.046537,-0.065672,0.030135,0.052995,-0.0010438,0.28052 147 | "146",-0.056598,0.075065,-0.073427,0.045831,0.047391,0.10355,-0.06582,0.0085755,0.75826 148 | "147",0.0073665,-0.01288,0.071091,0.089519,-0.0087504,-0.035791,0.040256,0.016631,1.2392 149 | "148",-0.0086907,0.053241,-0.13084,-0.10824,-0.071374,-0.0094341,0.01752,0.04414,0.30811 150 | "149",-0.033821,-0.079305,0.18589,0.115,0.13794,-0.026484,-0.091725,-0.067564,-1.091 151 | "150",-0.28331,-0.18248,0.097457,-0.0021163,0.020842,-0.18659,0.16747,-0.069927,1.7511 152 | "151",0.00058725,0.019091,0.27498,0.017701,0.037535,0.037797,-0.045377,-0.014449,-0.11471 153 | "152",0.28564,0.092049,-0.26046,-0.068938,0.088558,0.032435,-0.066728,0.046582,-1.0385 154 | "153",-0.20543,-0.086951,-0.03666,0.20309,0.25506,0.037413,0.034052,-0.03411,-0.084922 155 | "154",0.19424,0.001652,-0.052603,-0.25642,-0.20619,-0.018514,0.041227,0.039084,-0.54192 156 | "155",0.30124,0.15963,-0.077759,-0.15517,-0.2595,0.08274,0.0032884,0.084968,0.41504 157 | "156",-0.20731,-0.014599,0.0060314,0.23262,-0.35628,0.015054,-0.011195,-0.037504,-0.9911 158 | "157",-0.3157,-0.11831,-0.052501,0.066179,0.11428,-0.070359,-0.019451,-0.043833,0.4336 159 | "158",0.1794,0.12113,-0.020284,-0.031237,-0.2062,0.058582,0.00077908,0.02289,-0.50261 160 | "159",0.21346,0.010033,-0.055179,-0.04213,0.0072498,0.059599,0.1082,0.033519,-0.96937 161 | "160",0.032367,-0.0014093,-0.017387,0.040014,0.020451,-0.059513,-0.088387,-0.01872,-0.96839 162 | "161",-0.19319,0.11062,-0.035166,0.11706,0.19962,0.012826,0.06483,-0.02988,-0.50237 163 | "162",0.15771,-0.30618,-0.047085,0.010454,-0.10153,-0.12658,0.10819,-0.027589,-0.036094 164 | "163",-0.14837,0.10855,-0.042375,-0.1126,0.073293,0.081992,-0.16807,0.043933,-0.51555 165 | "164",-0.071706,0.054188,-0.084002,-0.12586,0.08731,0.025385,-0.063798,0.049141,0.89205 166 | "165",0.30607,-0.068869,-0.027968,0.12281,-0.037187,0.013563,0.12013,-0.0095146,-0.52776 167 | "166",-0.031635,0.07083,-0.07233,0.013688,0.038615,-0.026717,-0.088374,-0.025996,0.40845 168 | "167",-0.41137,-0.0013381,-0.044983,-0.073828,-0.32075,0.034871,-0.054491,0.015691,-0.075401 169 | "168",0.4143,0.11359,-0.053327,-0.0037349,0.14283,0.0093257,-0.036331,0.043399,0.86544 170 | "169",0.19027,0.11927,-0.040888,0.023225,0.052194,0.023666,0.16065,0.031205,-0.55412 171 | "170",-0.2176,-0.16605,-0.11601,0.052377,0.1452,-0.020955,-0.052353,-0.025254,-0.084434 172 | "171",-0.022593,-0.023557,0.014519,-0.012698,-0.076637,-0.018117,0.010903,0.0047052,-0.090293 173 | "172",-0.066493,-0.12035,0.083354,0.049014,-0.034916,-0.043339,-0.018307,-0.060135,1.3188 174 | "173",-0.027031,-0.017964,0.071901,-0.010153,0.21072,0.030909,-0.0081673,-0.0152,0.3772 175 | 
"174",-0.011037,0.072128,-0.10612,0.0042904,0.010093,-0.0035332,0.061084,0.037326,-1.0358 176 | "175",0.024231,0.095528,-0.00025697,0.017931,-0.26025,0.016839,-0.060971,-0.011784,0.3772 177 | "176",0.1938,-0.062474,-0.044421,-0.050006,-0.31825,0.026055,0.16031,0.030888,0.3811 178 | "177",0.15025,0.066093,-0.11998,-0.018544,0.13826,-0.014884,-0.090513,0.023929,-0.089073 179 | "178",0.086337,0.053309,-0.096829,0.030305,0.11259,0.033362,0.045715,0.010694,-0.55901 180 | "179",-0.017983,2.8592e-05,0.062208,0.023027,-0.15842,0.040078,0.013099,0.01489,-0.093711 181 | "180",-0.22323,-0.064395,-0.0017049,0.033358,-0.19753,-0.011075,-0.016659,-0.032875,-1.0328 182 | "181",0.054028,0.016408,-0.06186,0.024102,-0.050534,-0.052642,-0.041754,0.015243,-1.0402 183 | "182",-0.087054,-0.052874,0.10839,-0.016792,-0.020483,0.040722,0.0080406,-0.025067,1.7807 184 | "183",0.038272,0.070319,-0.0071286,0.0207,-0.075977,-0.0069046,0.093667,0.022892,0.35816 185 | "184",0.12874,-0.041203,-0.035232,-0.010326,-0.10686,-0.032969,-0.032857,-0.0029486,1.2948 186 | "185",-0.067764,0.035266,0.041625,0.046302,0.10423,0.019299,0.047898,-0.015122,-0.58342 187 | "186",-0.077525,-0.099673,0.045383,0.028594,0.11053,-0.0099649,-0.023772,0.0085014,-0.10714 188 | "187",0.054655,0.04579,-0.01333,-0.02519,0.048041,-0.018276,0.010251,-0.0090937,0.83737 189 | "188",0.085671,0.020403,-0.23623,-0.014336,-0.14982,0.035972,-0.04992,0.032006,0.36987 190 | "189",0.12428,0.065966,0.083817,0.038744,-0.2281,0.011406,0.12375,8.2936e-05,-0.56682 191 | "190",-0.0564,-0.036311,-0.035842,-0.018556,-0.22656,-0.044047,0.033695,-0.018667,-1.0199 192 | "191",0.27659,-0.042435,-0.049934,0.027903,-0.018264,0.055986,-0.013108,0.0083337,-1.0285 193 | "192",-0.22475,0.046335,-0.053801,-0.015718,-0.0015916,0.0075134,-0.030156,0.030427,-0.5573 194 | "193",-0.27146,0.03457,-0.059619,-0.034781,-0.040421,-0.015296,0.035086,0.018341,-0.56438 195 | "194",0.32806,-0.056279,-0.022788,0.017369,-0.070502,0.03693,0.051696,-0.036952,0.36426 196 | "195",0.051115,0.098583,-0.093222,0.028848,-0.13967,0.013283,0.013317,0.037988,1.2997 197 | "196",0.074378,0.039033,-0.022785,-0.14262,-0.23613,0.19145,0.15183,0.062211,0.34473 198 | "197",-0.050446,-0.03013,-0.0050457,0.13735,0.14189,-0.18695,-0.15021,-0.055161,0.34473 199 | "198",-0.14989,-0.027823,0.17111,0.039204,0.42063,-0.055798,-0.06285,-0.0050313,-1.0687 200 | "199",0.035069,-0.072495,-0.0029292,0.036206,-0.015441,0.010512,0.052112,-0.030674,-0.60613 201 | "200",-0.086632,0.027597,-0.08112,-0.002847,-0.044476,-0.024216,-0.0014006,0.0055538,0.76242 202 | "201",0.23726,0.041765,-0.057578,-0.081019,-0.089543,0.073064,0.0051224,0.020204,-0.19454 203 | "202",0.023645,0.06501,-0.095912,0.04753,-0.15896,0.021164,0.11056,0.025038,-0.19332 204 | "203",0.11574,0.13473,-0.11571,0.062571,-0.39175,0.079501,-0.078711,0.027457,0.26953 205 | "204",-0.097359,-0.15583,0.10227,0.0024972,0.11829,-0.049679,-0.034179,-0.011632,0.73873 206 | "205",-0.1778,-0.17774,0.13514,0.096419,0.27067,-0.11269,0.093745,-0.042014,-0.2109 207 | "206",-0.048281,0.12057,0.10525,-0.067999,0.099759,0.075175,0.032007,-0.031335,0.73409 208 | "207",-0.071175,0.0064823,-0.040806,-0.0084468,-0.036263,-0.032402,-0.055188,0.05731,0.73556 209 | "208",0.14962,-0.082511,0.11819,0.073376,0.2763,0.037734,-0.061654,-0.074048,-0.20431 210 | "209",-0.15086,-0.022102,0.25183,-0.080558,0.82253,-0.038157,0.051877,-0.0078553,0.74728 211 | "210",0.078846,0.084221,-0.18055,0.026746,-1.139,-0.063594,0.11422,0.024767,-0.18526 212 | 
"211",0.10967,0.041458,-0.093058,-0.0073796,0.97372,0.087611,-0.12063,0.029649,-0.6508 213 | "212",-0.19463,-0.11632,-0.008644,0.13276,-0.44089,-0.17027,0.01626,-0.034964,-1.1217 214 | "213",0.082612,0.0106,-0.079266,-0.14145,1.1817,0.12512,0.048123,0.015658,-1.119 215 | "214",0.024147,0.10327,-0.039438,0.0054777,-0.8964,0.041171,-0.013409,0.015985,0.75826 216 | "215",0.0074531,0.050156,-0.055253,0.05789,0.039068,0.0022203,0.021957,0.0023085,-0.65642 217 | "216",-0.045667,-0.22473,-0.040122,0.045965,0.19488,-0.042365,0.017425,0.0055079,0.74752 218 | "217",0.139,0.10463,-0.031845,-0.059959,-0.34498,0.011066,-0.019713,-0.029912,0.27466 219 | "218",0.15007,0.014956,-0.026255,0.055928,-0.17352,0.016071,0.074291,0.0077853,-0.66985 220 | "219",-0.22287,0.040657,-0.086995,-0.040602,-0.346,0.00067163,-0.0047946,0.045073,-1.1491 221 | "220",0.54465,0.0077165,-0.066304,-0.10281,0.16216,0.057155,-0.043073,0.03082,-0.21603 222 | "221",0.1205,0.059203,-0.21975,-0.0252,-0.31799,0.24869,-0.26414,0.032091,1.1891 223 | "222",-0.3788,0.059203,0.21486,0.14279,0.18603,-0.22431,0.23926,-0.033115,-0.22799 224 | "223",-0.016749,0.020547,-0.018772,0.034367,0.0048496,-0.0025987,0.084331,-0.032014,1.1757 225 | "224",-0.26083,-0.15512,-0.10404,-0.11607,-0.18879,-0.0070812,0.0026974,0.039645,0.70309 226 | "225",0.54776,0.070047,-0.16875,0.0238,-0.16603,0.022764,-0.036964,0.013599,-0.23605 227 | "226",-0.39254,0.015774,0.12024,0.072981,-0.15909,-0.0089479,0.067887,-0.006697,0.23511 228 | "227",-0.15999,-0.12147,0.1791,-0.014624,0.31428,-0.10351,-0.09186,-0.080389,0.71285 229 | "228",0.4139,0.072698,-0.16298,-0.037084,-0.12392,0.010528,0.13968,0.046802,0.2478 230 | "229",-0.041799,0.13703,-0.15241,-0.042697,0.037053,0.086294,-0.069327,0.068827,-0.69182 231 | "230",-0.38618,-0.25762,0.32607,0.11497,0.36616,-0.14072,0.12402,-0.10165,0.7402 232 | "231",-0.051293,-0.0016189,-0.16424,-0.056127,-0.36193,0.13217,-0.075597,-0.006942,-0.67082 233 | "232",0.56395,0.10883,-0.081997,-0.02039,-0.031338,-0.022246,0.029877,0.057172,-0.1943 234 | "233",0.11425,0.24021,-0.11803,-0.1332,-0.29118,0.052002,0.043799,0.1035,-0.19112 235 | "234",-0.40785,-0.25206,0.2598,0.17248,-0.11945,-0.037953,-0.0088329,-0.11192,0.75509 236 | "235",-0.13313,-0.091614,0.098877,0.006484,0.1829,-0.048092,0.0077638,-0.040673,-1.1159 237 | "236",0.36583,0.10196,-0.11675,-0.035816,0.17556,0.065007,-0.010511,0.032106,-0.17135 238 | "237",-0.32058,0.012495,0.08044,0.014666,-0.081935,0.0013745,0.038746,0.012557,-0.17159 239 | "238",0.36167,-0.01087,0.084809,-0.0032732,-0.097133,-0.038988,0.030003,0.0057152,0.29712 240 | "239",-0.093667,0.098273,-0.012228,0.038095,-0.33382,0.022492,-0.01036,0.012675,0.29419 241 | "240",-0.018504,-0.011049,-0.13692,-0.081914,-0.017178,0.023952,0.061792,0.047031,-0.16939 242 | "241",-0.15086,-0.045086,0.20862,0.069319,0.059315,-0.027023,-0.042001,-0.042879,1.7199 243 | "242",-0.0949,-0.054072,0.3069,0.031686,-0.01443,0.033953,0.06258,-0.11746,1.2563 244 | "243",0.11452,0.064404,-0.46015,-0.011818,0.14832,0.015584,0.014199,0.14159,-0.14693 245 | "244",0.16582,-0.041268,0.017138,-0.050532,0.017573,0.090848,-0.11053,-0.00065559,-0.601 246 | "245",-0.114,0.031921,0.18058,0.04107,-0.4558,-0.12233,0.26109,-0.030915,-0.11788 247 | "246",-0.067724,0.02641,0.074603,-0.040155,0.073397,-0.036299,-0.18657,0.0055985,0.84469 248 | "247",-0.0026152,-0.010453,0.074192,0.050778,0.29955,-0.064035,0.067748,-0.026156,-1.0275 249 | "248",-0.16085,-0.088249,0.17647,0.059634,0.24833,-0.00047829,-0.018329,-0.066726,0.39819 250 | 
"249",-0.051674,-0.10589,0.02929,-0.022063,0.46829,-0.028424,-0.021482,-0.00062112,0.41406 251 | "250",-0.016394,0.14011,-0.16981,-0.018909,-0.32881,0.049897,0.021791,0.047075,-0.51799 252 | "251",0.17401,-0.019399,-0.084646,-0.022836,-0.29937,0.04557,0.023592,0.0059248,-0.97205 253 | "252",0.36467,0.083335,-0.019526,0.062976,-0.10341,0.020401,0.075497,0.0034605,-0.02169 254 | "253",-0.070413,-0.036305,-0.31464,-0.087458,-0.49151,0.0061481,0.19781,0.10709,-0.4821 255 | "254",-0.074349,-0.031422,0.32412,-0.047205,-0.27531,0.0090986,-0.2644,-0.087694,-0.0094827 256 | "255",-0.2545,-0.020537,0.33088,-0.083388,0.92431,0.047534,-0.26326,-0.039133,0.46582 257 | "256",0.18776,0.10859,-0.32115,0.010519,0.61612,-0.018806,0.21201,0.071759,-0.94495 258 | "257",-0.0033965,-0.022145,-0.017056,0.19592,-0.70082,-0.027367,0.03629,-0.056718,0.0080954 259 | "258",-0.45912,-0.041158,-0.097938,-0.15465,-0.20969,-0.0282,-0.012497,0.036435,1.4267 260 | "259",0.50511,0.044673,-0.10411,-0.0182,-0.27125,0.068133,0.099088,0.050503,0.026162 261 | "260",0.047726,0.029053,-0.068453,0.008342,-0.11783,0.0026184,0.012789,0.020535,0.032265 262 | "261",-0.023153,-0.081133,0.16105,0.080755,-0.034568,-0.058579,-0.069121,-0.06912,0.98629 263 | "262",-0.22125,0.030715,-0.054802,-0.063769,0.11595,0.077463,0.046978,0.0010916,-0.41521 264 | "263",0.42389,0.055792,-0.16059,0.025503,-0.37953,0.019352,-0.048425,0.084764,0.070595 265 | "264",-0.28386,-0.042529,0.35851,0.086497,0.15938,-0.11397,0.11731,-0.14339,-0.38835 266 | "265",0.12019,0.10047,-0.14058,-0.072838,-0.14703,0.090205,0.070023,0.070922,0.096474 267 | "266",-0.12658,-0.18079,-0.10444,-0.094339,-0.06102,-0.0025377,-0.22158,0.057634,1.0522 268 | "267",0.12991,0.12297,0.0051709,0.058201,-0.092228,-0.014833,0.11649,-0.0022981,1.5424 269 | "268",-0.089515,-0.04297,0.16391,0.0096216,0.087725,-0.018372,0.041012,-0.058355,0.62329 270 | "269",-0.019826,-0.034307,0.19455,-0.03894,-0.14979,-0.12668,-0.050941,-0.040651,-0.28972 271 | "270",-0.12894,0.11546,-0.092026,0.06311,-0.12528,0.032733,0.070217,0.013045,-0.25627 272 | "271",0.272,-0.16368,-0.23964,0.14789,-0.3038,0.10974,0.14371,0.07504,0.23808 273 | "272",-0.10921,0.0072871,0.22079,-0.2254,-0.70086,0.004859,-0.11297,-0.039428,-0.67396 274 | "273",-0.14503,0.0044299,0.093568,0.014065,1.2022,-0.07583,-0.082957,-0.031797,0.29179 275 | "274",0.03095,0.022057,0.20501,-0.070907,0.19325,0.034417,-0.208,-0.016119,0.79444 276 | "275",-0.016758,-0.0082656,-0.11252,0.10253,-0.0020724,0.066699,0.16264,0.017587,1.2978 277 | "276",0.013314,-0.0025452,0.014567,0.0098332,-0.24246,-0.15887,0.10918,-0.024553,-0.084152 278 | "277",0.10674,-0.007806,-0.14628,-0.002126,-0.19819,0.050277,-0.077257,0.015087,-1.9307 279 | "278",-0.080942,0.050111,-0.22336,-0.08408,-0.045152,0.10074,-0.081081,0.085518,-0.46204 280 | "279",-0.19157,0.013011,0.34057,-0.018478,1.0307,-0.013762,-0.11434,-0.079808,1.4748 281 | "280",0.24859,-0.0046347,-0.19416,-0.035305,-0.61395,0.011963,0.027938,0.040835,-1.7637 282 | "281",-0.043136,0.13414,-0.16129,-0.042349,-0.20394,0.039056,-0.11003,0.050516,-0.77426 283 | "282",0.23736,0.01591,-0.15654,0.045237,-0.66499,0.043845,0.14208,0.061601,-0.75644 284 | "283",-0.17065,-0.03634,0.25062,0.0423,0.63389,0.12817,0.03545,-0.10998,0.20467 285 | "284",-0.10657,-0.2292,0.081017,-0.057086,-0.45896,-0.40631,-0.04428,-0.025772,2.5797 286 | "285",0.074756,0.11295,0.061995,-0.016791,1.0018,0.07949,-0.07758,0.015085,-0.22743 287 | "286",-0.15455,-0.11805,0.060069,0.2933,0.23998,-0.063857,0.13575,-0.067081,-0.19618 288 | 
"287",0.050112,0.010499,0.061719,-0.18855,-0.64197,0.066902,0.094277,0.0033993,0.7725 289 | "288",-0.13041,0.047412,0.07853,-0.068586,0.064618,0.035666,-0.16487,0.06552,-1.5542 290 | "289",0.20403,0.00087583,0.22847,-0.058083,0.092581,0.11442,-0.11177,-0.061231,-1.5303 291 | "290",-0.13241,0.026923,-0.091586,-0.021352,0.73004,0.028285,0.0094657,0.048629,0.84647 292 | "291",-0.097641,0.022108,-0.17756,0.052396,-0.69491,-0.086691,-0.069864,0.027722,-0.080697 293 | "292",0.063215,0.035532,-0.048418,0.03851,-0.10408,0.013582,0.096259,-0.022045,-1.4749 294 | "293",0.21725,-0.016689,-0.021008,-0.0086342,-0.11976,0.0028875,0.052811,-0.006761,-0.52719 295 | "294",-0.034757,0.0010778,0.0085438,-0.023232,-0.044877,0.0056652,0.0021977,-0.0056995,-0.51621 296 | "295",-0.053666,-0.046077,-0.11463,-0.049031,-0.23914,-0.053978,0.074306,0.02976,-0.50766 297 | "296",-0.14907,0.066652,-0.085055,-0.057671,-0.28518,0.025008,0.021725,0.033393,0.90385 298 | "297",0.3038,0.034041,-0.16484,-0.046054,-0.17734,0.065326,0.0080677,0.065943,0.91215 299 | "298",-0.15234,0.20957,0.24776,0.45885,-0.1965,0.03148,0.12015,-0.082162,1.3994 300 | "299",-0.057099,-0.28464,0.096772,-0.34477,0.33367,-0.18564,-0.18615,-0.059934,2.3605 301 | "300",-0.019564,-0.12024,0.20693,-0.014824,-0.4113,0.067257,-0.040944,-0.060996,-1.3814 302 | "301",-0.013198,-0.014805,-0.15658,-0.043344,0.74991,-0.021347,-0.028246,0.036733,0.52693 303 | "302",-0.012325,0.10916,-0.1898,0.075107,-0.25209,0.038662,-0.027443,0.075619,-0.38877 304 | "303",-0.013425,-0.011177,0.30181,-0.14,0.1933,-0.049092,-0.030905,-0.05822,-0.3624 305 | "304",-0.020935,-0.03335,0.14684,-0.051743,0.19232,0.11767,-0.071514,-0.015952,0.1383 306 | "305",-0.065139,0.090524,-0.35549,-0.058154,0.37343,0.049607,-0.13227,0.10915,-0.3143 307 | "306",0.05018,0.030188,0.16027,0.27579,-0.48735,-0.0068029,0.21441,-0.087373,-1.2354 308 | "307",0.024792,-0.042476,-0.19246,-0.078913,-0.79014,-0.20139,0.092538,0.026952,-1.2151 309 | "308",0.0059682,-0.073672,0.10088,-0.093357,1.0324,0.069603,-0.16733,-0.020836,0.68416 310 | "309",-0.076763,-0.05053,0.12652,-0.046089,-0.075128,0.10523,-0.098951,0.01127,0.22423 311 | "310",-0.046847,0.3267,-0.35787,0.0051628,1.0942,0.027106,-0.059636,0.10873,-1.1775 312 | "311",0.086092,-0.25373,0.50174,0.18451,-0.69412,-0.1683,0.23886,-0.22634,-1.1707 313 | "312",-0.031945,-0.053342,-0.55981,-0.16232,-0.44007,0.023112,-0.11267,0.18176,-1.1682 314 | "313",0.0076973,0.0092993,0.29462,-0.048858,-0.24324,0.073991,-0.065383,-0.061372,1.1892 315 | "314",-0.009595,0.047249,0.088352,-0.010746,0.28619,0.0097922,-0.030359,0.020547,0.23815 316 | "315",-0.023292,0.0047937,0.084135,0.038604,0.22097,0.0033434,-0.027298,-0.031705,-1.1751 317 | "316",-0.08319,-0.013582,0.25238,0.022875,-0.5426,-0.076918,-0.027123,-0.088257,-1.1795 318 | "317",0.073905,0.055926,-0.47253,-0.06175,0.5664,0.033011,-0.028175,0.19265,0.23131 319 | "318",-0.058242,-0.059163,0.25896,-0.099935,-0.19364,0.15685,-0.040447,-0.079392,-1.2041 320 | "319",-0.003054,0.07566,0.14351,0.029178,0.13182,-0.076301,-0.1805,-0.011347,0.67268 321 | "320",-0.042718,-0.10104,-0.11066,0.011411,0.24715,-0.10646,0.18459,0.00086902,-0.28452 322 | "321",0.037291,0.080563,0.1225,0.076622,-0.42025,-0.018403,-0.011781,-0.023858,-1.2405 323 | "322",0.049444,-0.028949,0.069449,-0.031985,-0.39013,0.066776,0.12938,0.003615,0.15124 324 | "323",-0.08208,-0.04161,0.085806,-0.049641,-0.0090254,-0.085952,-0.17188,-0.02249,1.0796 325 | "324",0.002423,0.060926,0.28635,-0.044334,1.198,0.095215,-0.18815,-0.034615,0.58821 326 | 
"325",-0.11066,-0.067903,-0.4118,0.020679,-0.3876,0.020985,0.17771,0.089724,0.58064 327 | "326",0.12667,0.019861,0.14487,0.018152,-0.40734,-0.10187,-0.026358,-0.079644,0.57478 328 | "327",-0.083937,0.0083425,-0.40077,0.024828,0.63676,0.088036,0.10813,0.15111,0.0819 329 | "328",-0.027496,0.026971,0.37383,-0.20551,-0.31644,0.015756,-0.30385,-0.078564,0.076529 330 | "329",0.090614,0.091512,-0.07486,0.15776,-0.42342,0.050158,0.051346,0.0018103,0.076529 331 | "330",-0.13797,-0.15824,0.19323,-0.021542,0.06887,-0.15722,0.17069,-0.060901,0.54915 332 | -------------------------------------------------------------------------------- /weak_conditional_independence_testing/PCalg_twostep_flags.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Modified PC algorithm code (original code: https://github.com/keiichishima/pcalg) 3 | by incorporating KRESIT as the conditional independence test. 4 | 5 | Original code: A graph generator based on the PC algorithm [Kalisch2007]. 6 | [Kalisch2007] Markus Kalisch and Peter Bhlmann. Estimating 7 | high-dimensional directed acyclic graphs with the pc-algorithm. In The 8 | Journal of Machine Learning Research, Vol. 8, pp. 613-636, 2007. 9 | License: BSD 10 | 11 | Example run in terminal: 12 | 1) KRESIT: 13 | $ python PCalg_twostep_flags.py 400 --dimZ 4 --data_filename "Synthetic_DAGexample" 14 | --kernelX_use_median --kernelY_use_median 15 | --results_filename "test_run_KRESIT" --figure_filename "test_graph_KRESIT" 16 | 17 | (It takes the first 400 samples and the first 4 dimensions from the Synthetic_DAGexample.csv 18 | and run KRESIT with Gaussian kernel median Heuristic on the variables X and Y. The kernel on Z 19 | is set by default to be Gaussian median Heuristic. The regularisation parameters is 20 | set by default to use grid search. The resulting CPDAG is saved "test_graph_KRESIT.pdf".) 21 | 22 | 2) RESIT: 23 | $ python PCalg_twostep_flags.py 400 --dimZ 4 --data_filename "Synthetic_DAGexample" 24 | --kernelX --kernelY 25 | --kernelRxz --kernelRyz 26 | --kernelRxz_use_median --kernelRyz_use_median 27 | --RESIT_type 28 | --result_filename "test_run_RESIT" --figure_filename "test_graph_RESIT" 29 | 30 | (It takes the first 400 samples and the first 4 dimensions from the Synthetic_DAGexample.csv 31 | and run RESIT. The kernels on X and Y are set to be linear. The kernels on the residuals Rxz and 32 | Ryz are Gaussian with median Heuristic bandwidth.The regularisation parameters is 33 | set by default to use grid search. The resulting CPDAG is saved "test_graph_RESIT.pdf".) 34 | 35 | ''' 36 | from __future__ import print_function 37 | 38 | # Remote_running 39 | #import matplotlib 40 | #matplotlib.use('Agg') 41 | 42 | import os, sys 43 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' ) 44 | sys.path.append(BASE_DIR) 45 | 46 | 47 | from itertools import combinations, permutations 48 | import logging 49 | 50 | import networkx as nx 51 | import cPickle as pickle 52 | from pickle import load, dump 53 | 54 | from kerpy.GaussianKernel import GaussianKernel 55 | from kerpy.LinearKernel import LinearKernel 56 | from TwoStepCondTestObject import TwoStepCondTestObject 57 | from independence_testing.HSICSpectralTestObject import HSICSpectralTestObject 58 | import numpy as np 59 | from numpy import arange 60 | 61 | _logger = logging.getLogger(__name__) 62 | 63 | 64 | 65 | 66 | 67 | 68 | def _create_complete_graph(node_ids): 69 | """Create a complete graph from the list of node ids. 
70 | Args: 71 | node_ids: a list of node ids 72 | Returns: 73 | An undirected graph (as a networkx.Graph) 74 | """ 75 | g = nx.Graph() 76 | g.add_nodes_from(node_ids) 77 | for (i, j) in combinations(node_ids, 2): 78 | g.add_edge(i, j) 79 | pass 80 | return g 81 | 82 | 83 | 84 | 85 | def estimate_skeleton( data_matrix, alpha, **kwargs): 86 | # originally first argument is indep_test_func 87 | # now this version uses HSIC Spectral Test for independence 88 | # and KRESIT for conditional independence. 89 | """Estimate a skeleton graph from the statistical information. 90 | Args: 91 | indep_test_func: the function name for a conditional 92 | independence test. 93 | data_matrix: data (as a numpy array). 94 | alpha: the significance level. 95 | kwargs: 96 | 'max_reach': maximum value of l (see the code). The 97 | value depends on the underlying distribution. 98 | 'method': if 'stable' given, use stable-PC algorithm 99 | (see [Colombo2014]). 100 | other parameters may be passed depending on the 101 | indep_test_func()s. 102 | Returns: 103 | g: a skeleton graph (as a networkx.Graph). 104 | sep_set: a separation set (as a 2D-array of set()). 105 | [Colombo2014] Diego Colombo and Marloes H Maathuis. Order-independent 106 | constraint-based causal structure learning. In The Journal of Machine 107 | Learning Research, Vol. 15, pp. 3741-3782, 2014. 108 | """ 109 | 110 | def method_stable(kwargs): 111 | return ('method' in kwargs) and kwargs['method'] == "stable" 112 | 113 | node_ids = range(data_matrix.shape[1]) 114 | g = _create_complete_graph(node_ids) 115 | 116 | node_size = data_matrix.shape[1] 117 | sep_set = [[set() for i in range(node_size)] for j in range(node_size)] 118 | 119 | 120 | X_idx_list_init = [] 121 | Y_idx_list_init = [] 122 | Z_idx_list_init = [] 123 | pval_list_init = [] 124 | 125 | l = 0 126 | completed_xy_idx_init = 0 127 | completed_z_idx_init = 0 128 | remove_edges_current = [] 129 | 130 | 131 | results_filename = kwargs['results_filename'] 132 | myfolder = "pcalg_results/" 133 | save_filename = myfolder + results_filename + ".bin" 134 | #print("save_filename:", save_filename) 135 | #sys.exit(1) 136 | if not os.path.exists(myfolder): 137 | os.mkdir(myfolder) 138 | elif os.path.exists(save_filename): 139 | load_f = open(save_filename,"r") 140 | [X_idx_list_init, Y_idx_list_init, Z_idx_list_init, pval_list_init, \ 141 | l, completed_xy_idx_init, completed_z_idx_init,\ 142 | remove_edges_current, g] = load(load_f) 143 | load_f.close() 144 | print("Found existing results") 145 | 146 | X_idx_list = X_idx_list_init 147 | Y_idx_list = Y_idx_list_init 148 | Z_idx_list = Z_idx_list_init 149 | pval_list = pval_list_init 150 | completed_xy_idx = completed_xy_idx_init 151 | completed_z_idx = completed_z_idx_init 152 | 153 | 154 | 155 | while True: 156 | cont = False 157 | remove_edges = remove_edges_current 158 | perm_iteration_list = list(permutations(node_ids,2)) 159 | length_iteration_list = len(perm_iteration_list) 160 | 161 | for ij in arange(completed_xy_idx, length_iteration_list): 162 | (i,j) = perm_iteration_list[ij] 163 | adj_i = g.neighbors(i) 164 | if j not in adj_i: 165 | continue 166 | else: 167 | adj_i.remove(j) 168 | pass 169 | if len(adj_i) >= l: 170 | _logger.debug('testing %s and %s' % (i,j)) 171 | _logger.debug('neighbors of %s are %s' % (i, str(adj_i))) 172 | if len(adj_i) < l: 173 | continue 174 | 175 | cc = list(combinations(adj_i, l)) 176 | length_cc = len(cc) 177 | 178 | 179 | for kk in arange(completed_z_idx, length_cc): 180 | k = cc[kk] 181 | _logger.debug('indep
prob of %s and %s with subset %s' 182 | % (i, j, str(k))) 183 | if l == 0: # independence testing 184 | print("independence testing", (i,j)) 185 | data_x = data_matrix[:,[i]] 186 | data_y = data_matrix[:,[j]] 187 | 188 | num_samples = np.shape(data_matrix)[0] 189 | kernelX_hsic = GaussianKernel(1.) 190 | kernelY_hsic = GaussianKernel(1.) 191 | kernelX_use_median_hsic = True 192 | kernelY_use_median_hsic = True 193 | 194 | myspectraltestobj = HSICSpectralTestObject(num_samples, None, kernelX_hsic, kernelY_hsic, 195 | kernelX_use_median = kernelX_use_median_hsic, 196 | kernelY_use_median = kernelY_use_median_hsic, 197 | num_nullsims=1000, unbiased=False) 198 | 199 | p_val, _ = myspectraltestobj.compute_pvalue_with_time_tracking(data_x,data_y) 200 | 201 | 202 | X_idx_list.append((i)) 203 | Y_idx_list.append((j)) 204 | Z_idx_list.append((0)) 205 | pval_list.append((p_val)) 206 | 207 | 208 | else: # conditional independence testing 209 | print("conditional independence testing",(i,j,k)) 210 | data_x = data_matrix[:,[i]] 211 | data_y = data_matrix[:,[j]] 212 | data_z = data_matrix[:,k] 213 | 214 | num_samples = np.shape(data_matrix)[0] 215 | #kernelX = GaussianKernel(1.) 216 | #kernelY = GaussianKernel(1.) 217 | #kernelX_use_median = True 218 | #kernelY_use_median = True 219 | #kernelX = LinearKernel() 220 | #kernelY = LinearKernel() 221 | kernelX = kwargs['kernelX'] 222 | kernelY = kwargs['kernelY'] 223 | kernelZ = GaussianKernel(1.) 224 | kernelX_use_median = kwargs['kernelX_use_median'] 225 | kernelY_use_median = kwargs['kernelY_use_median'] 226 | kernelRxz = kwargs['kernelRxz'] 227 | kernelRyz = kwargs['kernelRyz'] 228 | kernelRxz_use_median = kwargs['kernelRxz_use_median'] 229 | kernelRyz_use_median = kwargs['kernelRyz_use_median'] 230 | RESIT_type = kwargs['RESIT_type'] 231 | optimise_lambda_only = kwargs['optimise_lambda_only'] 232 | grid_search = kwargs['grid_search'] 233 | GD_optimise = kwargs['GD_optimise'] 234 | 235 | 236 | 237 | num_lambdaval = 30 238 | lambda_val = 10**np.linspace(-6,1, num=num_lambdaval) 239 | z_bandwidth = None 240 | #num_bandwidth = 20 241 | #z_bandwidth = 10**np.linspace(-5,1,num = num_bandwidth) 242 | 243 | mytestobj = TwoStepCondTestObject(num_samples, None, 244 | kernelX, kernelY, kernelZ, 245 | kernelX_use_median=kernelX_use_median, 246 | kernelY_use_median=kernelY_use_median, 247 | kernelZ_use_median=True, 248 | kernelRxz = kernelRxz, kernelRyz = kernelRyz, 249 | kernelRxz_use_median = kernelRxz_use_median, 250 | kernelRyz_use_median = kernelRyz_use_median, 251 | RESIT_type = RESIT_type, 252 | num_shuffles=800, 253 | lambda_val=lambda_val,lambda_X = None, lambda_Y = None, 254 | optimise_lambda_only = optimise_lambda_only, 255 | sigmasq_vals = z_bandwidth ,sigmasq_xz = 1., sigmasq_yz = 1., 256 | K_folds=5, grid_search = grid_search, 257 | GD_optimise=GD_optimise, learning_rate=0.1,max_iter=300, 258 | initial_lambda_x=0.5,initial_lambda_y=0.5, initial_sigmasq = 1) 259 | 260 | 261 | 262 | p_val, _ = mytestobj.compute_pvalue(data_x, data_y, data_z) 263 | 264 | X_idx_list.append((i)) 265 | Y_idx_list.append((j)) 266 | Z_idx_list.append(k) 267 | pval_list.append((p_val)) 268 | 269 | 270 | 271 | 272 | completed_z_idx = kk + 1 273 | 274 | save_f = open(save_filename,"w") 275 | dump([X_idx_list, Y_idx_list, Z_idx_list, pval_list, l, completed_xy_idx, completed_z_idx,\ 276 | remove_edges, g], save_f) 277 | save_f.close() 278 | 279 | 280 | _logger.debug('p_val is %s' % str(p_val)) 281 | if p_val > alpha: 282 | if g.has_edge(i, j): 283 | _logger.debug('p: remove edge 
(%s, %s)' % (i, j)) 284 | if method_stable(kwargs): 285 | remove_edges.append((i, j)) 286 | else: 287 | g.remove_edge(i, j) 288 | pass 289 | sep_set[i][j] |= set(k) 290 | sep_set[j][i] |= set(k) 291 | break 292 | pass 293 | completed_z_idx = 0 294 | completed_xy_idx = ij + 1 295 | cont = True 296 | pass 297 | pass 298 | 299 | 300 | 301 | l += 1 302 | completed_xy_idx = 0 303 | if method_stable(kwargs): 304 | g.remove_edges_from(remove_edges) 305 | if cont is False: 306 | break 307 | if ('max_reach' in kwargs) and (l > kwargs['max_reach']): 308 | break 309 | 310 | 311 | 312 | save_f = open(save_filename,"w") 313 | dump([X_idx_list, Y_idx_list, Z_idx_list, pval_list, l, completed_xy_idx, completed_z_idx,\ 314 | remove_edges, g], save_f) 315 | save_f.close() 316 | 317 | pass 318 | 319 | return (g, sep_set) 320 | 321 | 322 | 323 | 324 | def estimate_cpdag(skel_graph, sep_set): 325 | """Estimate a CPDAG from the skeleton graph and separation sets 326 | returned by the estimate_skeleton() function. 327 | Args: 328 | skel_graph: A skeleton graph (an undirected networkx.Graph). 329 | sep_set: An 2D-array of separation set. 330 | The contents look like something like below. 331 | sep_set[i][j] = set([k, l, m]) 332 | Returns: 333 | An estimated DAG. 334 | """ 335 | dag = skel_graph.to_directed() 336 | node_ids = skel_graph.nodes() 337 | for (i, j) in combinations(node_ids, 2): 338 | adj_i = set(dag.successors(i)) 339 | if j in adj_i: 340 | continue 341 | adj_j = set(dag.successors(j)) 342 | if i in adj_j: 343 | continue 344 | common_k = adj_i & adj_j 345 | for k in common_k: 346 | if k not in sep_set[i][j]: 347 | if dag.has_edge(k, i): 348 | _logger.debug('S: remove edge (%s, %s)' % (k, i)) 349 | dag.remove_edge(k, i) 350 | pass 351 | if dag.has_edge(k, j): 352 | _logger.debug('S: remove edge (%s, %s)' % (k, j)) 353 | dag.remove_edge(k, j) 354 | pass 355 | pass 356 | pass 357 | pass 358 | 359 | def _has_both_edges(dag, i, j): 360 | return dag.has_edge(i, j) and dag.has_edge(j, i) 361 | 362 | def _has_any_edge(dag, i, j): 363 | return dag.has_edge(i, j) or dag.has_edge(j, i) 364 | 365 | def _has_one_edge(dag, i, j): 366 | return ((dag.has_edge(i, j) and (not dag.has_edge(j, i))) or 367 | (not dag.has_edge(i, j)) and dag.has_edge(j, i)) 368 | 369 | def _has_no_edge(dag, i, j): 370 | return (not dag.has_edge(i, j)) and (not dag.has_edge(j, i)) 371 | 372 | # For all the combination of nodes i and j, apply the following 373 | # rules. 374 | for (i, j) in combinations(node_ids, 2): 375 | # Rule 1: Orient i-j into i->j whenever there is an arrow k->i 376 | # such that k and j are nonadjacent. 377 | # 378 | # Check if i-j. 379 | if _has_both_edges(dag, i, j): 380 | # Look all the predecessors of i. 381 | for k in dag.predecessors(i): 382 | # Skip if there is an arrow i->k. 383 | if dag.has_edge(i, k): 384 | continue 385 | # Skip if k and j are adjacent. 386 | if _has_any_edge(dag, k, j): 387 | continue 388 | # Make i-j into i->j 389 | _logger.debug('R1: remove edge (%s, %s)' % (j, i)) 390 | dag.remove_edge(j, i) 391 | break 392 | pass 393 | 394 | # Rule 2: Orient i-j into i->j whenever there is a chain 395 | # i->k->j. 396 | # 397 | # Check if i-j. 398 | if _has_both_edges(dag, i, j): 399 | # Find nodes k where k is i->k. 400 | succs_i = set() 401 | for k in dag.successors(i): 402 | if not dag.has_edge(k, i): 403 | succs_i.add(k) 404 | pass 405 | pass 406 | # Find nodes j where j is k->j. 
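            # (i.e. collect the predecessors k of j whose edge k->j is strictly
            #  directed, skipping undirected k-j edges; each such k is a
            #  candidate middle node of a chain i->k->j)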
407 | preds_j = set() 408 | for k in dag.predecessors(j): 409 | if not dag.has_edge(j, k): 410 | preds_j.add(k) 411 | pass 412 | pass 413 | # Check if there is any node k where i->k->j. 414 | if len(succs_i & preds_j) > 0: 415 | # Make i-j into i->j 416 | _logger.debug('R2: remove edge (%s, %s)' % (j, i)) 417 | dag.remove_edge(j, i) 418 | pass 419 | pass 420 | 421 | # Rule 3: Orient i-j into i->j whenever there are two chains 422 | # i-k->j and i-l->j such that k and l are nonadjacent. 423 | # 424 | # Check if i-j. 425 | if _has_both_edges(dag, i, j): 426 | # Find nodes k where i-k. 427 | adj_i = set() 428 | for k in dag.successors(i): 429 | if dag.has_edge(k, i): 430 | adj_i.add(k) 431 | pass 432 | pass 433 | # For all the pairs of nodes in adj_i, 434 | for (k, l) in combinations(adj_i, 2): 435 | # Skip if k and l are adjacent. 436 | if _has_any_edge(dag, k, l): 437 | continue 438 | # Skip if not k->j. 439 | if dag.has_edge(j, k) or (not dag.has_edge(k, j)): 440 | continue 441 | # Skip if not l->j. 442 | if dag.has_edge(j, l) or (not dag.has_edge(l, j)): 443 | continue 444 | # Make i-j into i->j. 445 | _logger.debug('R3: remove edge (%s, %s)' % (j, i)) 446 | dag.remove_edge(j, i) 447 | break 448 | pass 449 | 450 | # Rule 4: Orient i-j into i->j whenever there are two chains 451 | # i-k->l and k->l->j such that k and j are nonadjacent. 452 | # 453 | # However, this rule is not necessary when the PC-algorithm 454 | # is used to estimate a DAG. 455 | pass 456 | 457 | return dag 458 | 459 | 460 | 461 | 462 | def run():#if __name__ == '__main__': 463 | import networkx as nx 464 | import pandas as pd 465 | import matplotlib.pyplot as plt 466 | from SimDataGen import SimDataGen 467 | 468 | from tools.ProcessingObject import ProcessingObject 469 | args = ProcessingObject.parse_arguments() 470 | num_samples = args.num_samples 471 | kernelX = args.kernelX #Default: GaussianKernel(1.) 472 | kernelY = args.kernelY #Default: GaussianKernel(1.)
473 | kernelX_use_median = args.kernelX_use_median #Default: False 474 | kernelY_use_median = args.kernelY_use_median #Default: False 475 | kernelRxz = args.kernelRxz #Default: LinearKernel 476 | kernelRyz = args.kernelRyz #Default: LinearKernel 477 | kernelRxz_use_median = args.kernelRxz_use_median #Default: False 478 | kernelRyz_use_median = args.kernelRyz_use_median #Default: False 479 | RESIT_type = args.RESIT_type #Default: False 480 | optimise_lambda_only = args.optimise_lambda_only #Default: True 481 | grid_search = args.grid_search #Default: True 482 | GD_optimise = args.GD_optimise #Default: False 483 | results_filename = args.results_filename 484 | figure_filename = args.figure_filename 485 | data_filename = args.data_filename 486 | num_var = args.dimZ 487 | 488 | 489 | 490 | datafile = data_filename + ".csv" 491 | data = np.loadtxt(datafile,delimiter = ',') 492 | dm = data[range(num_samples),:][:, range(num_var)] 493 | 494 | 495 | (g, sep_set) = estimate_skeleton(data_matrix=dm, alpha=0.05, 496 | kernelX = kernelX, kernelY = kernelY, 497 | kernelX_use_median = kernelX_use_median, 498 | kernelY_use_median = kernelY_use_median, 499 | kernelRxz = kernelRxz, kernelRyz = kernelRyz, 500 | kernelRxz_use_median = kernelRxz_use_median, 501 | kernelRyz_use_median = kernelRyz_use_median, 502 | RESIT_type = RESIT_type, 503 | results_filename = results_filename, 504 | optimise_lambda_only = optimise_lambda_only, 505 | grid_search = grid_search, 506 | GD_optimise = GD_optimise) 507 | 508 | 509 | g = estimate_cpdag(skel_graph=g, sep_set=sep_set) 510 | 511 | 512 | if num_var == 7: 513 | labels={} 514 | labels[0]=r'$X$' 515 | labels[1]=r'$Y$' 516 | labels[2]=r'$Z$' 517 | labels[3]=r'$A$' 518 | labels[4]=r'$B$' 519 | labels[5]=r'$Cx$' 520 | labels[6]=r'$Cy$' 521 | elif num_var == 6: 522 | labels={} 523 | labels[0] = r'$X$' 524 | labels[1] = r'$Y$' 525 | labels[2] = r'$Z$' 526 | labels[3] = r'$A$' 527 | labels[4] = r'$Cx$' 528 | labels[5] = r'$Cy$' 529 | elif num_var == 5: 530 | labels={} 531 | labels[0] = r'$X$' 532 | labels[1] = r'$Y$' 533 | labels[2] = r'$Z$' 534 | labels[3] = r'$A$' 535 | labels[4] = r'$B$' 536 | elif num_var == 4: 537 | labels={} 538 | labels[0] = r'$X$' 539 | labels[1] = r'$Y$' 540 | labels[2] = r'$Z$' 541 | labels[3] = r'$A$' 542 | else: 543 | raise NotImplementedError 544 | 545 | nx.draw_networkx(g, pos=nx.spring_layout(g), labels=labels, with_labels=True) 546 | figure_name = figure_filename + ".pdf" 547 | plt.savefig(figure_name) 548 | #plt.show() 549 | 550 | 551 | if __name__ == '__main__': 552 | run() -------------------------------------------------------------------------------- /weak_conditional_independence_testing/SimDataGen.py: -------------------------------------------------------------------------------- 1 | from numpy.random import uniform, permutation, multivariate_normal,normal 2 | from numpy import pi, prod, empty, sin, cos, asscalar, shape,zeros, identity,arange,sign,sum,sqrt,transpose, tanh,sinh 3 | import numpy as np 4 | 5 | 6 | 7 | class SimDataGen(object): 8 | def __init__(self): 9 | pass 10 | 11 | 12 | @staticmethod 13 | def null_model(num_samples, dimension = 1, rho=0): 14 | data_z = np.reshape(uniform(0,5,num_samples*dimension),(num_samples,dimension)) 15 | coin_flip_x = np.random.choice([0,1],replace=True,size=num_samples) 16 | coin_flip_y = np.random.choice([0,1],replace=True,size=num_samples) 17 | mean_noise = [0,0] 18 | cov_noise = [[1,0],[0,1]] 19 | noise_x, noise_y = multivariate_normal(mean_noise, cov_noise, num_samples).T 20 | data_x = 
zeros(num_samples) 21 | data_x[coin_flip_x == 0,] = 1.7*data_z[coin_flip_x == 0,0] 22 | data_x[coin_flip_x == 1,] = -1.7*data_z[coin_flip_x == 1,0] 23 | data_x = data_x + noise_x 24 | data_y = zeros(num_samples) 25 | data_y[coin_flip_y == 0,] = (data_z[coin_flip_y == 0,0]-2.7)**2 26 | data_y[coin_flip_y == 1,] = -(data_z[coin_flip_y == 1,0]-2.7)**2+13 27 | data_y = data_y + noise_y 28 | data_x = np.reshape(data_x, (num_samples,1)) 29 | data_y = np.reshape(data_y, (num_samples,1)) 30 | return data_x, data_y, data_z 31 | 32 | 33 | @staticmethod 34 | def alternative_model(num_samples,dimension = 1, rho=0.15): 35 | data_z = np.reshape(uniform(0,5,num_samples*dimension),(num_samples,dimension)) 36 | rr = uniform(0,1, num_samples) 37 | idx_rr = np.where(rr < rho) 38 | coin_flip_x = np.random.choice([0,1],replace=True,size=num_samples) 39 | coin_flip_y = np.random.choice([0,1],replace=True,size=num_samples) 40 | coin_flip_y[idx_rr] = coin_flip_x[idx_rr] 41 | mean_noise = [0,0] 42 | cov_noise = [[1,0],[0,1]] 43 | noise_x, noise_y = multivariate_normal(mean_noise, cov_noise, num_samples).T 44 | data_x = zeros(num_samples) 45 | data_x[coin_flip_x == 0] = 1.7*data_z[coin_flip_x == 0,0] 46 | data_x[coin_flip_x == 1] = -1.7*data_z[coin_flip_x == 1,0] 47 | data_x = data_x + noise_x 48 | data_y = zeros(num_samples) 49 | data_y[coin_flip_y == 0] = (data_z[coin_flip_y == 0,0]-2.7)**2 50 | data_y[coin_flip_y == 1] = -(data_z[coin_flip_y == 1,0]-2.7)**2+13 51 | data_y = data_y + noise_y 52 | data_x = np.reshape(data_x, (num_samples,1)) 53 | data_y = np.reshape(data_y, (num_samples,1)) 54 | return data_x, data_y, data_z 55 | 56 | 57 | 58 | @staticmethod 59 | def DAG_simulation_version1(num_samples): 60 | dimension = 1 61 | rho = 0 62 | data_z = np.reshape(uniform(0,5,num_samples*dimension),(num_samples,dimension)) 63 | rr = uniform(0,1, num_samples) 64 | idx_rr = np.where(rr < rho) 65 | coin_flip_x = np.random.choice([0,1],replace=True,size=num_samples) 66 | coin_flip_y = np.random.choice([0,1],replace=True,size=num_samples) 67 | coin_flip_y[idx_rr] = coin_flip_x[idx_rr] 68 | mean_noise = [0,0] 69 | cov_noise = [[1,0],[0,1]] 70 | noise_x, noise_y = multivariate_normal(mean_noise, cov_noise, num_samples).T 71 | data_x = zeros(num_samples) 72 | data_x[coin_flip_x == 0] = 1.7*data_z[coin_flip_x == 0,0] 73 | data_x[coin_flip_x == 1] = -1.7*data_z[coin_flip_x == 1,0] 74 | data_x = data_x + noise_x 75 | data_y = zeros(num_samples) 76 | data_y[coin_flip_y == 0] = (data_z[coin_flip_y == 0,0]-2.7)**2 77 | data_y[coin_flip_y == 1] = -(data_z[coin_flip_y == 1,0]-2.7)**2+13 78 | data_y = data_y + noise_y 79 | data_x = np.reshape(data_x, (num_samples,1)) 80 | data_y = np.reshape(data_y, (num_samples,1)) 81 | coin_x = np.reshape(coin_flip_x, (num_samples,1)) 82 | coin_y = np.reshape(coin_flip_y, (num_samples,1)) 83 | 84 | noise_A, noise_B = multivariate_normal(mean_noise, cov_noise, num_samples).T 85 | noise_A = np.reshape(noise_A, (num_samples,1)) 86 | noise_B = np.reshape(noise_B, (num_samples,1)) 87 | 88 | data_A = (data_y-5)**2/float(3) + 5 + noise_A 89 | #data_A = (data_y-5)**2/float(11)+ 5 +noise_A 90 | #data_B = 5*np.tanh(data_y) + noise_B # tanh version 91 | #data_B = 5*np.sin(data_y) + noise_B # sine version 92 | data_B = 5.5*np.tanh(data_y) + noise_B 93 | 94 | data_matrix = np.concatenate((data_x,data_y,data_z,data_A,data_B,coin_x, coin_y),axis=1) 95 | return data_matrix 96 | # data_x, data_y, data_z, data_A, data_B, coin_x, coin_y, 97 | 
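# A minimal usage sketch of the generators above (assuming this module is
# importable as SimDataGen, as the scripts in this folder do): each model
# returns data_x and data_y of shape (num_samples, 1) and data_z of shape
# (num_samples, dimension), while DAG_simulation_version1 returns a single
# (num_samples, 7) data matrix with columns (X, Y, Z, A, B, coin_x, coin_y).
#
#   from SimDataGen import SimDataGen
#   data_x, data_y, data_z = SimDataGen.null_model(400, dimension=4)
#   dm = SimDataGen.DAG_simulation_version1(400)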
-------------------------------------------------------------------------------- /weak_conditional_independence_testing/SyntheticDim_KRESIT.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Example run in terminal: 3 | 1) KRESIT: 4 | $ python SyntheticDim_KRESIT.py 40 --dimZ 1 5 | --kernelX_use_median --kernelY_use_median 6 | 7 | (Simulates 100 sets of 40 samples with a 1-dimensional conditioning set from the null_model 8 | and runs KRESIT with the Gaussian kernel median heuristic on the variables X and Y. The kernel on Z 9 | is set by default to be Gaussian with the median heuristic. The regularisation parameters are 10 | set by default to use grid search.) 11 | 12 | 2) RESIT: 13 | $ python SyntheticDim_KRESIT.py 40 --dimZ 1 14 | --kernelX --kernelY 15 | --kernelRxz --kernelRyz 16 | --kernelRxz_use_median --kernelRyz_use_median 17 | --RESIT_type 18 | 19 | (Simulates 100 sets of 40 samples with a 1-dimensional conditioning set from the null_model 20 | and runs RESIT. The kernels on X and Y are set to be linear. The kernels on the residuals Rxz and 21 | Ryz are Gaussian with median heuristic bandwidths. The regularisation parameters are 22 | set by default to use grid search.) 23 | 24 | ''' 25 | 26 | import os, sys 27 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' ) 28 | sys.path.append(BASE_DIR) 29 | 30 | from kerpy.GaussianKernel import GaussianKernel 31 | from kerpy.LinearKernel import LinearKernel 32 | from TwoStepCondTestObject import TwoStepCondTestObject 33 | from independence_testing.TestExperiment import TestExperiment 34 | from SimDataGen import SimDataGen 35 | import numpy as np 36 | from tools.ProcessingObject import ProcessingObject 37 | 38 | 39 | 40 | data_generating_function = SimDataGen.null_model 41 | #data_generating_function = SimDataGen.alternative_model 42 | args = ProcessingObject.parse_arguments() 43 | 44 | '''unpack the arguments needed:''' 45 | num_samples = args.num_samples 46 | dimZ = args.dimZ # Integer dimension of the conditioning set. 47 | kernelX = args.kernelX #Default: GaussianKernel(1.) 48 | kernelY = args.kernelY #Default: GaussianKernel(1.) 49 | kernelX_use_median = args.kernelX_use_median #Default: False 50 | kernelY_use_median = args.kernelY_use_median #Default: False 51 | kernelRxz = args.kernelRxz #Default: LinearKernel 52 | kernelRyz = args.kernelRyz #Default: LinearKernel 53 | kernelRxz_use_median = args.kernelRxz_use_median #Default: False 54 | kernelRyz_use_median = args.kernelRyz_use_median #Default: False 55 | RESIT_type = args.RESIT_type #Default: False 56 | optimise_lambda_only = args.optimise_lambda_only #Default: True 57 | grid_search = args.grid_search #Default: True 58 | GD_optimise = args.GD_optimise #Default: False 59 | 60 | 61 | data_generator=lambda num_samples: data_generating_function(num_samples,dimension=dimZ) 62 | 63 | 64 | num_lambdaval = 30 65 | lambda_val = 10**np.linspace(-6,-1, num=num_lambdaval) 66 | #num_bandwidth = 20 67 | #z_bandwidth = 10**np.linspace(-5,1,num = num_bandwidth) 68 | z_bandwidth = None 69 | kernelZ = GaussianKernel(1.)
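# (Note on the settings above: lambda_val is a log-spaced grid of 30
#  regularisation values from 1e-6 to 1e-1, and leaving z_bandwidth = None
#  while passing kernelZ_use_median=True to the test object below means the
#  Gaussian bandwidth on Z comes from the median heuristic rather than a
#  grid search.)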
70 | 71 | 72 | test_object = TwoStepCondTestObject(num_samples, data_generator, 73 | kernelX, kernelY, kernelZ, 74 | kernelX_use_median=kernelX_use_median, 75 | kernelY_use_median=kernelY_use_median, 76 | kernelZ_use_median=True, 77 | kernelRxz = kernelRxz, kernelRyz = kernelRyz, 78 | kernelRxz_use_median = kernelRxz_use_median, 79 | kernelRyz_use_median = kernelRyz_use_median, 80 | RESIT_type = RESIT_type, 81 | num_shuffles=800, 82 | lambda_val=lambda_val,lambda_X = None, lambda_Y = None, 83 | optimise_lambda_only = optimise_lambda_only, 84 | sigmasq_vals = z_bandwidth ,sigmasq_xz = 1., sigmasq_yz = 1., 85 | K_folds=5, grid_search = grid_search, 86 | GD_optimise=GD_optimise, learning_rate=0.1,max_iter=300, 87 | initial_lambda_x=0.5,initial_lambda_y=0.5, initial_sigmasq = 1) 88 | 89 | 90 | # file name of the results 91 | name = os.path.basename(__file__).rstrip('.py')+'_d_'+str(dimZ)+'_n_'+str(num_samples) 92 | 93 | param={'name': name,\ 94 | 'dim_conditioning_set': dimZ,\ 95 | 'kernelX': kernelX,\ 96 | 'kernelY': kernelY,\ 97 | 'kernelZ': kernelZ,\ 98 | 'RESIT_type': RESIT_type,\ 99 | 'optimise_lambda_only': optimise_lambda_only,\ 100 | 'grid_search': grid_search, \ 101 | 'data_generator': str(data_generating_function),\ 102 | 'num_samples': num_samples} 103 | 104 | experiment=TestExperiment(name, param, test_object) 105 | 106 | numTrials = 100 107 | alpha=0.05 108 | experiment.run_test_trials(numTrials, alpha=alpha) 109 | 110 | 111 | -------------------------------------------------------------------------------- /weak_conditional_independence_testing/TwoStepCondTestObject.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Created on 21 Jun 2017 3 | 4 | 5 | Combine HSICConditionalTestObject.py (KRESIT) 6 | and RESITCondTestObject.py (RESIT & LRESIT) 7 | in one framework with the following parameter optimisation options: 8 | - Fix step gradient descent on lambda 9 | - Fix step gradient descent on sigmasq and lambda 10 | - grid search on lambda 11 | - grid search on sigmasq and lambda 12 | 13 | Such parameter optimisation is only for the regression part. 14 | 15 | ''' 16 | 17 | import os, sys 18 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' 
) 19 | sys.path.append(BASE_DIR) 20 | 21 | 22 | from independence_testing.HSICTestObject import HSICTestObject 23 | from kerpy.Kernel import Kernel 24 | from kerpy.LinearKernel import LinearKernel 25 | from kerpy.GaussianKernel import GaussianKernel 26 | from scipy.spatial.distance import squareform, pdist, cdist 27 | from sklearn.model_selection import KFold 28 | from scipy.linalg import solve 29 | 30 | 31 | import numpy as np 32 | from numpy.random import permutation 33 | from numpy import trace,eye, sqrt, median,exp,log 34 | from scipy.linalg import cholesky, cho_solve 35 | import time 36 | 37 | 38 | class TwoStepCondTestObject(HSICTestObject): 39 | 40 | def __init__(self, num_samples, data_generator, 41 | kernelX, kernelY, kernelZ, kernelX_use_median=False, 42 | kernelY_use_median=False, kernelZ_use_median=False, 43 | kernelRxz = LinearKernel(), kernelRyz = LinearKernel(), 44 | kernelRxz_use_median=False, kernelRyz_use_median=False, 45 | RESIT_type = False, 46 | num_shuffles=800, 47 | lambda_val=[0.5,1,5,10],lambda_X = None, lambda_Y = None, 48 | optimise_lambda_only = False, 49 | sigmasq_vals = [1,2,3] ,sigmasq_xz = 1., sigmasq_yz = 1., 50 | K_folds=5, grid_search = False, 51 | GD_optimise=True, learning_rate=0.001,max_iter=3000, 52 | initial_lambda_x=0.5,initial_lambda_y=0.5, initial_sigmasq = 1): 53 | HSICTestObject.__init__(self, num_samples, data_generator, kernelX, kernelY, kernelZ, 54 | kernelX_use_median=kernelX_use_median, kernelY_use_median=kernelY_use_median, 55 | kernelZ_use_median=kernelZ_use_median) 56 | 57 | self.kernelRxz = kernelRxz 58 | self.kernelRyz = kernelRyz 59 | self.kernelRxz_use_median = kernelRxz_use_median 60 | self.kernelRyz_use_median = kernelRyz_use_median 61 | self.RESIT_type = RESIT_type 62 | self.num_shuffles = num_shuffles 63 | self.lambda_val = lambda_val 64 | self.lambda_X = lambda_X 65 | self.lambda_Y = lambda_Y 66 | self.optimise_lambda_only = optimise_lambda_only 67 | self.sigmasq_vals = sigmasq_vals 68 | self.sigmasq_xz = sigmasq_xz 69 | self.sigmasq_yz = sigmasq_yz 70 | self.K_folds = K_folds 71 | self.GD_optimise = GD_optimise 72 | self.learning_rate = learning_rate 73 | self.grid_search = grid_search 74 | self.initial_lambda_x = initial_lambda_x 75 | self.initial_lambda_y = initial_lambda_y 76 | self.initial_sigmasq = initial_sigmasq 77 | self.max_iter = max_iter 78 | 79 | 80 | 81 | 82 | # Pre-compute the kernel matrices needed for the total cv error and its gradient 83 | def compute_matrices_for_gradient_totalcverr(self, train_x, train_y, train_z): 84 | if self.kernelX_use_median: 85 | sigmax = self.kernelX.get_sigma_median_heuristic(train_x) 86 | self.kernelX.set_width(float(sigmax)) 87 | if self.kernelY_use_median: 88 | sigmay = self.kernelY.get_sigma_median_heuristic(train_y) 89 | self.kernelY.set_width(float(sigmay)) 90 | kf = KFold( n_splits=self.K_folds) 91 | matrix_results = [[[None] for _ in range(self.K_folds)]for _ in range(8)] 92 | # xx=[[None]*10]*6 will give the same id to xx[0][0] and xx[1][0] etc. as 93 | # this command simply copied [None] many times. But the above gives different ids. 
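        # For instance, the difference can be checked directly:
        #   xx = [[None]*10]*6               ->  xx[0] is xx[1]  is True  (one shared inner list)
        #   yy = [[None] for _ in range(6)]  ->  yy[0] is yy[1]  is False (distinct inner lists)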
94 | count = 0 95 | for train_index, test_index in kf.split(np.ones((self.num_samples,1))): 96 | X_tr, X_tst = train_x[train_index], train_x[test_index] 97 | Y_tr, Y_tst = train_y[train_index], train_y[test_index] 98 | Z_tr, Z_tst = train_z[train_index], train_z[test_index] 99 | matrix_results[0][count] = self.kernelX.kernel(X_tst, X_tr) #Kx_tst_tr 100 | matrix_results[1][count] = self.kernelX.kernel(X_tr, X_tr) #Kx_tr_tr 101 | matrix_results[2][count] = self.kernelX.kernel(X_tst, X_tst) #Kx_tst_tst 102 | matrix_results[3][count] = self.kernelY.kernel(Y_tst, Y_tr) #Ky_tst_tr 103 | matrix_results[4][count] = self.kernelY.kernel(Y_tr, Y_tr) #Ky_tr_tr 104 | matrix_results[5][count] = self.kernelY.kernel(Y_tst,Y_tst) #Ky_tst_tst 105 | matrix_results[6][count] = cdist(Z_tst, Z_tr, 'sqeuclidean') #D_tst_tr: square distance matrix 106 | matrix_results[7][count] = cdist(Z_tr, Z_tr, 'sqeuclidean') #D_tr_tr: square distance matrix 107 | count = count + 1 108 | return matrix_results 109 | 110 | 111 | 112 | 113 | 114 | # compute the gradient of the total cverror with respect to lambda 115 | def compute_gradient_totalcverr_wrt_lambda(self,matrix_results,lambda_val,sigmasq_z): 116 | # 0: K_tst_tr; 1: K_tr_tr; 2: D_tst_tr; 3: D_tr_tr 117 | num_sample_cv = self.num_samples 118 | ttl_num_folds = np.shape(matrix_results)[1] 119 | gradient_cverr_per_fold = np.zeros(ttl_num_folds) 120 | for jj in range(ttl_num_folds): 121 | uu = np.shape(matrix_results[3][jj])[0] # number of training samples 122 | M_tst_tr = exp(matrix_results[2][jj]*(-0.5)*sigmasq_z**(-1)) 123 | M_tr_tr = exp(matrix_results[3][jj]*(-0.5)*sigmasq_z**(-1)) 124 | lower_ZZ = cholesky(M_tr_tr+ lambda_val*eye(uu), lower=True) 125 | ZZ = cho_solve((lower_ZZ,True),eye(uu)) 126 | first_term = matrix_results[0][jj].dot(ZZ.dot(ZZ.dot(M_tst_tr.T))) 127 | second_term = M_tst_tr.dot(ZZ.dot(ZZ.dot( 128 | matrix_results[1][jj].dot(ZZ.dot(M_tst_tr.T))))) 129 | gradient_cverr_per_fold[jj] = trace(first_term-second_term) 130 | return 2*sum(gradient_cverr_per_fold)/float(num_sample_cv) 131 | 132 | 133 | # lambda = exp(eta) 134 | def compute_gradient_totalcverr_wrt_eta(self,matrix_results,lambda_val,sigmasq_z): 135 | # 0: K_tst_tr; 1: K_tr_tr; 2: D_tst_tr; 3: D_tr_tr 136 | #eta = log(lambda_val) 137 | #gamma = log(sigmasq_z) 138 | num_sample_cv = self.num_samples 139 | ttl_num_folds = np.shape(matrix_results)[1] 140 | gradient_cverr_per_fold = np.zeros(ttl_num_folds) 141 | for jj in range(ttl_num_folds): 142 | uu = np.shape(matrix_results[3][jj])[0] # number of training samples 143 | M_tst_tr = exp(matrix_results[2][jj]*(-0.5)*sigmasq_z**(-1)) 144 | M_tr_tr = exp(matrix_results[3][jj]*(-0.5)*sigmasq_z**(-1)) 145 | lower_ZZ = cholesky(M_tr_tr+ lambda_val*eye(uu), lower=True) 146 | ZZ = cho_solve((lower_ZZ,True),eye(uu)) 147 | EE = lambda_val*eye(uu) 148 | first_term = matrix_results[0][jj].dot(ZZ.dot(EE.dot(ZZ.dot(M_tst_tr.T)))) 149 | second_term = first_term.T 150 | third_term = -M_tst_tr.dot(ZZ.dot(EE.dot(ZZ.dot( 151 | matrix_results[1][jj].dot(ZZ.dot(M_tst_tr.T)))))) 152 | fourth_term = -M_tst_tr.dot(ZZ.dot( 153 | matrix_results[1][jj].dot(ZZ.dot(EE.dot(ZZ.dot(M_tst_tr.T)))))) 154 | gradient_cverr_per_fold[jj] = trace(first_term + second_term + third_term + fourth_term) 155 | return sum(gradient_cverr_per_fold)/float(num_sample_cv) 156 | 157 | 158 | 159 | 160 | 161 | # compute the gradient of the total cverror with respect to sigma_z squared 162 | def compute_gradient_totalcverr_wrt_sqsigma(self,matrix_results,lambda_val,sigmasq_z): 163 | # 0: K_tst_tr; 1: K_tr_tr; 2: D_tst_tr; 3: D_tr_tr 164 | num_sample_cv = self.num_samples 165 | ttl_num_folds = np.shape(matrix_results)[1] 166 | gradient_cverr_per_fold = np.zeros(ttl_num_folds) 167 | for jj in range(ttl_num_folds): 168 | uu = np.shape(matrix_results[3][jj])[0] 169 | log_M_tr_tst = matrix_results[2][jj].T*(-0.5)*sigmasq_z**(-1) 170 | M_tr_tst = exp(log_M_tr_tst) 171 | log_M_tr_tr = matrix_results[3][jj]*(-0.5)*sigmasq_z**(-1) 172 | M_tr_tr = exp(log_M_tr_tr) 173 | lower_ZZ = cholesky(M_tr_tr+ lambda_val*eye(uu), lower=True) 174 | ZZ = cho_solve((lower_ZZ,True),eye(uu)) 175 | term_1 = matrix_results[0][jj].dot(ZZ.dot((M_tr_tr*sigmasq_z**(-1)*(-log_M_tr_tr)).dot(ZZ.dot(M_tr_tst)))) 176 | term_2 = -matrix_results[0][jj].dot(ZZ.dot(M_tr_tst*(-log_M_tr_tst*sigmasq_z**(-1)))) 177 | term_3 = (sigmasq_z**(-1)*(M_tr_tst.T)*(-log_M_tr_tst.T)).dot(ZZ.dot(matrix_results[1][jj].dot(ZZ.dot(M_tr_tst)))) 178 | term_4 = -(M_tr_tst.T).dot(ZZ.dot((M_tr_tr*sigmasq_z**(-1)*(-log_M_tr_tr)).dot(ZZ.dot(matrix_results[1][jj].dot( 179 | ZZ.dot(M_tr_tst)))))) 180 | term_5 = -(M_tr_tst.T).dot(ZZ.dot(matrix_results[1][jj].dot(ZZ.dot((M_tr_tr*sigmasq_z**(-1)*(-log_M_tr_tr)).dot( 181 | ZZ.dot(M_tr_tst)))))) 182 | term_6 = (M_tr_tst.T).dot(ZZ.dot(matrix_results[1][jj].dot(ZZ.dot(M_tr_tst*sigmasq_z**(-1)*(-log_M_tr_tst))))) 183 | gradient_cverr_per_fold[jj] = trace(2*term_1 + 2*term_2 + term_3 + term_4 + term_5 + term_6) 184 | return sum(gradient_cverr_per_fold)/float(num_sample_cv) 185 | 186 | 187 | 188 | 189 | def compute_gradient_totalcverr_wrt_gamma(self,matrix_results,lambda_val,sigmasq_z): 190 | # 0: K_tst_tr; 1: K_tr_tr; 2: D_tst_tr; 3: D_tr_tr 191 | #eta = log(lambda_val) 192 | #gamma = log(sigmasq_z) 193 | num_sample_cv = self.num_samples 194 | ttl_num_folds = np.shape(matrix_results)[1] 195 | gradient_cverr_per_fold = np.zeros(ttl_num_folds) 196 | for jj in range(ttl_num_folds): 197 | uu = np.shape(matrix_results[3][jj])[0] 198 | log_M_tr_tst = matrix_results[2][jj].T*(-0.5)*sigmasq_z**(-1) 199 | M_tr_tst = exp(log_M_tr_tst) 200 | log_M_tr_tr = matrix_results[3][jj]*(-0.5)*sigmasq_z**(-1) 201 | M_tr_tr = exp(log_M_tr_tr) 202 | lower_ZZ = cholesky(M_tr_tr+ lambda_val*eye(uu), lower=True) 203 | ZZ = cho_solve((lower_ZZ,True),eye(uu)) 204 | term_1 = matrix_results[0][jj].dot(ZZ.dot((M_tr_tr*(-log_M_tr_tr)).dot(ZZ.dot(M_tr_tst)))) 205 | term_2 = -matrix_results[0][jj].dot(ZZ.dot(M_tr_tst*(-log_M_tr_tst))) 206 | term_3 = (M_tr_tst.T*(-log_M_tr_tst).T).dot(ZZ.dot(matrix_results[1][jj].dot(ZZ.dot(M_tr_tst)))) 207 | term_4 = -(M_tr_tst.T).dot(ZZ.dot((M_tr_tr*(-log_M_tr_tr)).dot(ZZ.dot(matrix_results[1][jj].dot( 208 | ZZ.dot(M_tr_tst)))))) 209 | term_5 = -(M_tr_tst.T).dot(ZZ.dot(matrix_results[1][jj].dot(ZZ.dot((M_tr_tr*(-log_M_tr_tr)).dot( 210 | ZZ.dot(M_tr_tst)))))) 211 | term_6 = (M_tr_tst.T).dot(ZZ.dot(matrix_results[1][jj].dot(ZZ.dot(M_tr_tst*(-log_M_tr_tst))))) 212 | gradient_cverr_per_fold[jj] = trace(2*term_1 + 2*term_2 + term_3 + term_4 + term_5 + term_6) 213 | return sum(gradient_cverr_per_fold)/float(num_sample_cv) 214 | 215 | 216 | 217 | # compute the total cverror 218 | def compute_totalcverr(self,matrix_results,lambda_val,sigmasq_z): 219 | # 0: K_tst_tr; 1: K_tr_tr; 2: K_tst_tst; 3: D_tst_tr; 4: D_tr_tr 220 | num_sample_cv = self.num_samples 221 | ttl_num_folds = np.shape(matrix_results)[1] 222 | cverr_per_fold = np.zeros(ttl_num_folds) 223 | for jj in range(ttl_num_folds): 224 | uu = np.shape(matrix_results[4][jj])[0] # number of training samples 225 | M_tst_tr = exp(matrix_results[3][jj]*(-0.5)*sigmasq_z**(-1)) 226 | M_tr_tr = exp(matrix_results[4][jj]*(-0.5)*sigmasq_z**(-1)) 227 | lower_ZZ = cholesky(M_tr_tr+ lambda_val*eye(uu), lower=True) 228 | ZZ = cho_solve((lower_ZZ,True),eye(uu)) 229 | first_term = matrix_results[2][jj] 230 | second_term = - matrix_results[0][jj].dot(ZZ.dot(M_tst_tr.T)) 231 | third_term = np.transpose(second_term) 232 | fourth_term = M_tst_tr.dot(ZZ.dot( 233 | matrix_results[1][jj].dot(ZZ.dot(M_tst_tr.T)))) 234 | cverr_per_fold[jj] = trace(first_term + second_term + third_term + fourth_term) 235 | return sum(cverr_per_fold)/float(num_sample_cv) 236 | 237 | 238 | 239 | 240 | def compute_GD_lambda_sigmasq_for_TotalCVerr_with_fix_step_logspace(self, matrix_results,initial_lambda, initial_sigmasq): 241 | EE = log(initial_lambda) # initialisation of the lambda value 242 | GG = log(initial_sigmasq) # initialisation of the sigma square value for z 243 | count = 0 244 | log_lambda_path = [EE] 245 | log_sigma_square_path = [GG] 246 | Error_path = [self.compute_totalcverr(matrix_results,lambda_val = exp(EE),sigmasq_z=exp(GG))] 247 | d_part_matrix_results = [matrix_results[ii] for ii in [0,1,3,4]] 248 | Grad_EE = self.compute_gradient_totalcverr_wrt_eta(d_part_matrix_results, exp(EE), exp(GG)) 249 | Grad_GG = self.compute_gradient_totalcverr_wrt_gamma(d_part_matrix_results, exp(EE), exp(GG)) 250 | while (sum(np.array([abs(Grad_EE),abs(Grad_GG)]) >= 0.00001) == 2 and count < self.max_iter): 251 | Grad_EE_old = self.compute_gradient_totalcverr_wrt_eta(d_part_matrix_results, exp(EE), exp(GG)) 252 | EE = EE - self.learning_rate*Grad_EE_old 253 | Grad_EE = self.compute_gradient_totalcverr_wrt_eta(d_part_matrix_results, exp(EE), exp(GG)) 254 | log_lambda_path = np.concatenate((log_lambda_path,[EE])) 255 | Error_path = np.concatenate((Error_path,[self.compute_totalcverr(matrix_results,lambda_val = exp(EE), sigmasq_z=exp(GG))])) 256 | 257 | if sum(np.array([abs(Grad_EE),abs(Grad_GG)]) >= 0.00001) == 2 and count < self.max_iter: 258 | Grad_GG_old = self.compute_gradient_totalcverr_wrt_gamma(d_part_matrix_results, exp(EE), exp(GG)) 259 | GG = GG - self.learning_rate*Grad_GG_old 260 | Grad_GG = self.compute_gradient_totalcverr_wrt_gamma(d_part_matrix_results, exp(EE), exp(GG)) 261 | log_sigma_square_path = np.concatenate((log_sigma_square_path,[GG])) 262 | Error_path = np.concatenate((Error_path,[self.compute_totalcverr(matrix_results,lambda_val = exp(EE), sigmasq_z=exp(GG))])) 263 | 264 | else: 265 | break 266 | count = count+1 267 | return log_lambda_path[count], log_lambda_path, log_sigma_square_path[count], log_sigma_square_path,Error_path 268 | 269 | 270 | 271 | def compute_GD_lambda_for_TotalCVerr_with_fix_step_logspace(self, matrix_results,initial_lambda, sigmasq_z): 272 | EE = log(initial_lambda) # initialisation of the lambda value 273 | count = 0 274 | log_lambda_path = [EE] 275 | Error_path = [self.compute_totalcverr(matrix_results,lambda_val = exp(EE),sigmasq_z=sigmasq_z)] 276 | d_part_matrix_results = [matrix_results[ii] for ii in [0,1,3,4]] 277 | Grad_EE = self.compute_gradient_totalcverr_wrt_eta(d_part_matrix_results, exp(EE), sigmasq_z) 278 | while (abs(Grad_EE) >= 0.00001 and count < self.max_iter): 279 | Grad_EE_old = Grad_EE 280 | EE = EE - self.learning_rate*Grad_EE_old 281 | Grad_EE = self.compute_gradient_totalcverr_wrt_eta(d_part_matrix_results, exp(EE), sigmasq_z) 282 | log_lambda_path = np.concatenate((log_lambda_path,[EE])) 283 | Error_path =
np.concatenate((Error_path,[self.compute_totalcverr(matrix_results,lambda_val = exp(EE), sigmasq_z=sigmasq_z)])) 284 | count = count+1 285 | 286 | return log_lambda_path[count], log_lambda_path, Error_path 287 | 288 | 289 | 290 | def compute_lambda_sigmasq_through_grid_search(self, matrix_results_x, matrix_results_y, lambda_vals, sigmasq_vals): 291 | # 0: K_tst_tr; 1: K_tr_tr; 2: K_tst_tst; 3: D_tst_tr; 4: D_tr_tr 292 | #print "parameter opt via grid search" 293 | num_of_lambdas = np.shape(lambda_vals)[0] 294 | num_of_sigmasq = np.shape(sigmasq_vals)[0] 295 | total_cverr_matrix_x = np.reshape(np.zeros(num_of_sigmasq*num_of_lambdas), (num_of_sigmasq,num_of_lambdas)) 296 | total_cverr_matrix_y = np.reshape(np.zeros(num_of_sigmasq*num_of_lambdas), (num_of_sigmasq,num_of_lambdas)) 297 | for ss in range(num_of_sigmasq): 298 | for ll in range(num_of_lambdas): 299 | #print "Bandwidth numb; Lambdaval numb:", (ss,ll) 300 | total_cverr_matrix_x[ss,ll] = self.compute_totalcverr(matrix_results_x, lambda_vals[ll], sigmasq_vals[ss]) 301 | total_cverr_matrix_y[ss,ll] = self.compute_totalcverr(matrix_results_y, lambda_vals[ll], sigmasq_vals[ss]) 302 | x_sigmasq_idx, x_lambda_idx = np.where(total_cverr_matrix_x == np.min(total_cverr_matrix_x)) 303 | y_sigmasq_idx, y_lambda_idx = np.where(total_cverr_matrix_y == np.min(total_cverr_matrix_y)) 304 | if np.shape(x_sigmasq_idx)[0] > 1: 305 | x_sigmasq = sigmasq_vals[x_sigmasq_idx[0]] 306 | x_lambda = lambda_vals[x_lambda_idx[0]] 307 | else: 308 | x_sigmasq = sigmasq_vals[x_sigmasq_idx] 309 | x_lambda = lambda_vals[x_lambda_idx] 310 | if np.shape(y_sigmasq_idx[0]) > 1: 311 | y_sigmasq = sigmasq_vals[y_sigmasq_idx[0]] 312 | y_lambda = lambda_vals[y_lambda_idx[0]] 313 | else: 314 | y_sigmasq = sigmasq_vals[y_sigmasq_idx] 315 | y_lambda = lambda_vals[y_lambda_idx] 316 | return x_sigmasq, x_lambda, y_sigmasq, y_lambda, total_cverr_matrix_x,total_cverr_matrix_y 317 | 318 | 319 | 320 | 321 | 322 | def compute_lambda_through_grid_search(self, matrix_results_x, lambda_vals, sigmasq_xz): 323 | # 0: K_tst_tr; 1: K_tr_tr; 2: K_tst_tst; 3: D_tst_tr; 4: D_tr_tr 324 | #print "lambda parameter opt via grid search" 325 | num_of_lambdas = np.shape(lambda_vals)[0] 326 | total_cverr_matrix_x = np.reshape(np.zeros(num_of_lambdas), (num_of_lambdas,1)) 327 | for ll in range(num_of_lambdas): 328 | total_cverr_matrix_x[ll,0] = self.compute_totalcverr(matrix_results_x, lambda_vals[ll], sigmasq_xz) 329 | 330 | x_lambda_idx = np.where(total_cverr_matrix_x == np.min(total_cverr_matrix_x)) 331 | if np.shape(x_lambda_idx)[0] > 1: 332 | x_lambda = lambda_vals[x_lambda_idx[0]] 333 | else: 334 | x_lambda = lambda_vals[x_lambda_idx] 335 | return x_lambda, total_cverr_matrix_x 336 | 337 | 338 | 339 | 340 | 341 | 342 | def compute_test_statistics_and_others(self, data_x, data_y, data_z): 343 | if self.grid_search or self.GD_optimise: 344 | matrix_results = self.compute_matrices_for_gradient_totalcverr(data_x,data_y,data_z) 345 | matrix_results_x = [matrix_results[ii] for ii in [0,1,2,6,7]] 346 | matrix_results_y = [matrix_results[ii] for ii in [3,4,5,6,7]] 347 | if self.GD_optimise: # Gradient descent with fixed learning rate 348 | if self.optimise_lambda_only: 349 | if self.kernelZ_use_median: 350 | sigmaz = self.kernelZ.get_sigma_median_heuristic(data_z) 351 | self.kernelZ.set_width(float(sigmaz)) 352 | self.sigmasq_xz = self.sigmasq_yz = sigmaz**2 353 | #print "Gradient Descent Optimisation in log space, fixed step for lambda X" 354 | log_lambda_X, log_lambda_pathx, X_CVerror = 
self.compute_GD_lambda_for_TotalCVerr_with_fix_step_logspace(matrix_results_x, 355 | self.initial_lambda_x, self.sigmasq_xz) 356 | self.lambda_X = exp(log_lambda_X) 357 | #print X_CVerror 358 | #print "Gradient Descent Optimisation in log space, fixed step for lambda Y" 359 | log_lambda_Y, log_lambda_pathy, Y_CVerror = self.compute_GD_lambda_for_TotalCVerr_with_fix_step_logspace(matrix_results_y, 360 | self.initial_lambda_y, self.sigmasq_yz) 361 | self.lambda_Y = exp(log_lambda_Y) 362 | #print Y_CVerror 363 | else: 364 | #print "Gradient Descent Optimisation in log space, fixed step for lambda X and sigma XZ" 365 | log_lambda_X, _, log_sigmasq_xz, _, X_CVerror = self.compute_GD_lambda_sigmasq_for_TotalCVerr_with_fix_step_logspace(matrix_results_x, 366 | initial_lambda=self.initial_lambda_x, initial_sigmasq=self.initial_sigmasq) 367 | self.lambda_X = exp(log_lambda_X) 368 | self.sigmasq_xz = exp(log_sigmasq_xz) 369 | #print X_CVerror 370 | #print "Gradient Descent Optimisation in log space, fixed step for lambda Y and sigma YZ" 371 | log_lambda_Y, _, log_sigmasq_yz, _, Y_CVerror = self.compute_GD_lambda_sigmasq_for_TotalCVerr_with_fix_step_logspace(matrix_results_y, 372 | initial_lambda=self.initial_lambda_y, initial_sigmasq=self.initial_sigmasq) 373 | self.lambda_Y = exp(log_lambda_Y) 374 | self.sigmasq_yz = exp(log_sigmasq_yz) 375 | #print Y_CVerror 376 | 377 | elif self.grid_search: 378 | if self.optimise_lambda_only: 379 | if self.kernelZ_use_median: 380 | sigmaz = self.kernelZ.get_sigma_median_heuristic(data_z) 381 | self.kernelZ.set_width(float(sigmaz)) 382 | self.sigmasq_xz = self.sigmasq_yz = sigmaz**2 383 | #print "Grid Search Optimisation in log space for lambda X" 384 | self.lambda_X, X_CVerror = self.compute_lambda_through_grid_search(matrix_results_x, self.lambda_val,self.sigmasq_xz) 385 | #print X_CVerror 386 | #print "Grid Search Optimisation in log space for lambda Y" 387 | self.lambda_Y, Y_CVerror = self.compute_lambda_through_grid_search(matrix_results_y, self.lambda_val,self.sigmasq_yz) 388 | #print Y_CVerror 389 | else: 390 | self.sigmasq_xz, self.lambda_X, self.sigmasq_yz, self.lambda_Y, X_CVerror, Y_CVerror = \ 391 | self.compute_lambda_sigmasq_through_grid_search(matrix_results_x, matrix_results_y, self.lambda_val, self.sigmasq_vals) 392 | #print X_CVerror 393 | #print Y_CVerror 394 | else: 395 | raise NotImplementedError 396 | 397 | else: 398 | if self.lambda_X == None: 399 | self.lambda_X = self.lambda_val[0] 400 | if self.lambda_Y == None: 401 | self.lambda_Y = self.lambda_val[0] 402 | if self.sigmasq_xz == None: 403 | sigmaz = self.kernelZ.get_sigma_median_heuristic(data_z) 404 | self.kernelZ.set_width(float(sigmaz)) 405 | self.sigmasq_xz = sigmaz**2 406 | if self.sigmasq_yz == None: 407 | sigmaz = self.kernelZ.get_sigma_median_heuristic(data_z) 408 | self.kernelZ.set_width(float(sigmaz)) 409 | self.sigmasq_yz = sigmaz**2 410 | X_CVerror = 0 411 | Y_CVerror = 0 412 | 413 | #print "lambda value for (X,Y)", (self.lambda_X,self.lambda_Y) 414 | #print "sigma squared for (XZ, YZ)", (self.sigmasq_xz, self.sigmasq_yz) 415 | test_size = self.num_samples 416 | if not self.RESIT_type: 417 | test_Kx, test_Ky, _ = self.compute_kernel_matrix_on_data_CI(data_x, data_y, data_z) 418 | D_z = cdist(data_z, data_z, 'sqeuclidean') 419 | test_Kzx = exp(D_z*(-0.5)*self.sigmasq_xz**(-1)) 420 | test_Kzy = exp(D_z*(-0.5)*self.sigmasq_yz**(-1)) 421 | weight_xz = solve(test_Kzx/float(self.lambda_X)+np.identity(test_size),np.identity(test_size)) 422 | weight_yz = 
solve(test_Kzy/float(self.lambda_Y)+np.identity(test_size),np.identity(test_size)) 423 | K_epsilon_x = weight_xz.dot(test_Kx.dot(weight_xz)) 424 | K_epsilon_y = weight_yz.dot(test_Ky.dot(weight_yz)) 425 | else: 426 | #print "RESIT Computation" 427 | if self.kernelZ_use_median: 428 | sigmaz = self.kernelZ.get_sigma_median_heuristic(data_z) 429 | self.kernelZ.set_width(float(sigmaz)) 430 | test_Kz = self.kernelZ.kernel(data_z) 431 | weight_xz = solve(test_Kz/float(self.lambda_X)+np.identity(test_size),np.identity(test_size)) 432 | weight_yz = solve(test_Kz/float(self.lambda_Y)+np.identity(test_size),np.identity(test_size)) 433 | residual_xz = weight_xz.dot(data_x) 434 | residual_yz = weight_yz.dot(data_y) 435 | if self.kernelRxz_use_median: 436 | sigmaRxz = self.kernelRxz.get_sigma_median_heuristic(residual_xz) 437 | self.kernelRxz.set_width(float(sigmaRxz)) 438 | if self.kernelRyz_use_median: 439 | sigmaRyz = self.kernelRyz.get_sigma_median_heuristic(residual_yz) 440 | self.kernelRyz.set_width(float(sigmaRyz)) 441 | K_epsilon_x = self.kernelRxz.kernel(residual_xz) 442 | K_epsilon_y = self.kernelRyz.kernel(residual_yz) 443 | 444 | hsic_statistic = self.HSIC_V_statistic(K_epsilon_x, K_epsilon_y) 445 | #print "HSIC Statistics", hsic_statistic 446 | return hsic_statistic, K_epsilon_x, K_epsilon_y, X_CVerror, Y_CVerror 447 | 448 | 449 | 450 | 451 | 452 | def compute_null_samples_and_pvalue(self,data_x=None,data_y=None,data_z=None): 453 | ''' data_x,data_y, data_z are the given data that we wish to test 454 | the conditional independence given data_z. 455 | > each data set has the number of samples = number of rows 456 | > the bandwidth for training set and test set will be different (as we will calculate as soon as data comes in) 457 | ''' 458 | if data_x is None and data_y is None and data_z is None: 459 | if not self.streaming and not self.freeze_data: 460 | start = time.clock() 461 | self.generate_data(isConditionalTesting=True) 462 | data_generating_time = time.clock()-start 463 | data_x = self.data_x 464 | data_y = self.data_y 465 | data_z = self.data_z 466 | #print "dimension of data:", np.shape(data_x) 467 | else: 468 | data_generating_time = 0. 469 | 470 | else: 471 | data_generating_time = 0. 
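        # (The block below is the permutation null: each shuffle applies one
        #  random permutation to both the rows and the columns of K_epsilon_y,
        #  and the p-value uses the standard +1 correction,
        #  p = (1 + #{null samples > observed statistic}) / (1 + num_shuffles).)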
472 | #print 'Data generating time passed: ', data_generating_time 473 | hsic_statistic, K_epsilon_x, K_epsilon_y, X_CVerror, Y_CVerror = self.compute_test_statistics_and_others(data_x, data_y, data_z) 474 | if self.num_shuffles != 0: 475 | ny = np.shape(K_epsilon_y)[0] 476 | null_samples = np.zeros(self.num_shuffles) 477 | for jj in range(self.num_shuffles): 478 | pp = permutation(ny) 479 | Kp = K_epsilon_y[pp,:][:,pp] 480 | null_samples[jj] = self.HSIC_V_statistic(K_epsilon_x, Kp) 481 | pvalue = ( sum( null_samples > hsic_statistic ) + 1) / float( self.num_shuffles + 1) 482 | #print "P-value:", pvalue 483 | else: 484 | pvalue = None 485 | null_samples = 0 486 | #print "Not interested in P-value" 487 | return null_samples, hsic_statistic, pvalue, X_CVerror, Y_CVerror,data_generating_time 488 | 489 | 490 | 491 | def compute_pvalue_with_time_tracking(self, data_x = None, data_y = None, data_z = None): 492 | if self.lambda_X is not None and self.lambda_Y is not None: 493 | self.GD_optimise = False 494 | self.grid_search = False 495 | self.lambda_val = [1] 496 | _, _, pvalue, _, _, data_generating_time = self.compute_null_samples_and_pvalue(data_x = data_x, 497 | data_y = data_y, data_z = data_z) 498 | return pvalue, data_generating_time 499 | 500 | 501 | 502 | 503 | -------------------------------------------------------------------------------- /weak_conditional_independence_testing/WHO_KRESITvsRESIT.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Comparing RESIT and KRESIT on WHO data. 3 | To run the script in terminal: 4 | $ python WHO_KRESITvsRESIT.py 5 | The p-values will be printed and a data plot will be shown. 6 | 7 | ''' 8 | 9 | import os, sys 10 | BASE_DIR = os.path.join( os.path.dirname( __file__ ), '..' 
) 11 | sys.path.append(BASE_DIR) 12 | 13 | import os 14 | import pandas as pd 15 | import numpy as np 16 | from numpy.random import normal 17 | from scipy.stats import t 18 | from scipy import stats, linalg 19 | import matplotlib.pyplot as plt 20 | from pylab import rcParams 21 | 22 | from kerpy.GaussianKernel import GaussianKernel 23 | from kerpy.LinearKernel import LinearKernel 24 | from TwoStepCondTestObject import TwoStepCondTestObject 25 | # Import data 26 | data = pd.DataFrame.from_csv("WHO_dataset.csv") #WHO_dataset.csv is the same as WHO1 original.csv 27 | Gross_national_income = np.reshape(data.iloc[:,1],(202,1)) 28 | Expenditure_on_health = np.reshape(data.iloc[:,5],(202,1)) 29 | 30 | # Remove missing values 31 | data_yz = np.concatenate((Gross_national_income,Expenditure_on_health),axis=1) 32 | data_yz = data_yz[~np.isnan(data_yz).any(axis=1)] 33 | data_z = data_yz[:,[0]] #(178,1) 34 | data_y = data_yz[:,[1]] 35 | 36 | # log transform data z to make the concentrated data more spread out 37 | data_z = np.log(data_z) 38 | 39 | 40 | # range of values for grid search 41 | num_lambdaval = 30 42 | lambda_val = 10**np.linspace(-6,-1, num=num_lambdaval) 43 | z_bandwidth = None 44 | #num_bandwidth = 20 45 | #z_bandwidth = 10**np.linspace(-5,1,num = num_bandwidth) 46 | 47 | # some parameter settings 48 | num_samples = np.shape(data_z)[0] 49 | data_generator=None 50 | num_trials = 1 51 | pvals_KRESIT = np.reshape(np.zeros(num_trials),(num_trials,1)) 52 | pvals_RESIT = np.reshape(np.zeros(num_trials),(num_trials,1)) 53 | 54 | 55 | 56 | # computing Type I error (Null model is true) 57 | for jj in xrange(num_trials): 58 | #print "number of trial:", jj 59 | 60 | data_x = np.reshape(np.zeros(num_samples),(num_samples,1)) 61 | noise_x = np.reshape(normal(0,1,np.shape(data_z)[0]),(np.shape(data_z)[0],1)) 62 | coin_flip_x = np.random.choice([0,1],replace=True,size=num_samples) 63 | data_x[coin_flip_x == 0] = (data_z[coin_flip_x == 0]-10)**2 64 | data_x[coin_flip_x == 1] = -(data_z[coin_flip_x == 1]-10)**2+35 65 | data_x = data_x + noise_x 66 | 67 | 68 | # KRESIT: 69 | kernelX = GaussianKernel(1.) 70 | kernelY = GaussianKernel(1.) 71 | kernelZ = GaussianKernel(1.) 
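# (The widths of 1. above are placeholders: since kernelX/Y/Z_use_median are
#  all set to True in the KRESIT test object below, each Gaussian bandwidth is
#  replaced by its median heuristic value computed from the data before the
#  test is run.)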
72 | mytestobject = TwoStepCondTestObject(num_samples, None, 73 | kernelX, kernelY, kernelZ, 74 | kernelX_use_median=True, 75 | kernelY_use_median=True, 76 | kernelZ_use_median=True, 77 | kernelRxz = LinearKernel(), kernelRyz = LinearKernel(), 78 | kernelRxz_use_median = False, 79 | kernelRyz_use_median = False, 80 | RESIT_type = False, 81 | num_shuffles=800, 82 | lambda_val=lambda_val,lambda_X = None, lambda_Y = None, 83 | optimise_lambda_only = True, 84 | sigmasq_vals = z_bandwidth ,sigmasq_xz = 1., sigmasq_yz = 1., 85 | K_folds=5, grid_search = True, 86 | GD_optimise=False) 87 | 88 | pvals_KRESIT[jj,], _ = mytestobject.compute_pvalue(data_x,data_y,data_z) 89 | 90 | 91 | # RESIT: 92 | kernelX = LinearKernel() 93 | kernelY = LinearKernel() 94 | mytestobject_RESIT = TwoStepCondTestObject(num_samples, None, 95 | kernelX, kernelY, kernelZ, 96 | kernelX_use_median=False, 97 | kernelY_use_median=False, 98 | kernelZ_use_median=True, 99 | kernelRxz = GaussianKernel(1.), kernelRyz = GaussianKernel(1.), 100 | kernelRxz_use_median = True, 101 | kernelRyz_use_median = True, 102 | RESIT_type = True, 103 | num_shuffles=800, 104 | lambda_val=lambda_val,lambda_X = None, lambda_Y = None, 105 | optimise_lambda_only = True, 106 | sigmasq_vals = z_bandwidth ,sigmasq_xz = 1., sigmasq_yz = 1., 107 | K_folds=5, grid_search = True, 108 | GD_optimise=False) 109 | 110 | pvals_RESIT[jj,], _ = mytestobject_RESIT.compute_pvalue(data_x,data_y,data_z) 111 | 112 | 113 | 114 | #np.savetxt("WHO_KRESIT_rejection_rate.csv", pvals_KRESIT, delimiter=",") 115 | #np.savetxt("WHO_RESIT_rejection_rate.csv", pvals_RESIT, delimiter=",") 116 | 117 | 118 | if num_trials > 1: 119 | print "Type I error (KRESIT):", np.shape(filter(lambda x: x<0.05 ,pvals_KRESIT))[0]/float(num_trials) 120 | print "Type I error (RESIT):", np.shape(filter(lambda x: x<0.05 ,pvals_RESIT))[0]/float(num_trials) 121 | elif num_trials == 1: 122 | print "pval our approach:", pvals_KRESIT 123 | print "pval RESIT:", pvals_RESIT 124 | 125 | 126 | # Plot of the data 127 | rcParams['figure.figsize'] = 9, 4.7 128 | plt.figure(1) 129 | plt.subplot(121) 130 | plt.plot(data_z, data_y,'.') 131 | plt.ylabel('Y = Expenditure on health per cap') 132 | plt.xlabel('Z = log(Gross national income per cap)') 133 | 134 | plt.subplot(122) 135 | plt.plot(data_z,data_x,'.') 136 | plt.ylabel('X ') 137 | plt.xlabel('Z') 138 | plt.show() 139 | 140 | #plot_name = "WHO_" + "logZ_"+ "quadraticX" + ".pdf" 141 | #plt.savefig(plot_name, format='pdf') 142 | #plt.show() 143 | 144 | 145 | 146 | -------------------------------------------------------------------------------- /weak_conditional_independence_testing/WHO_dataset.csv: -------------------------------------------------------------------------------- 1 | Country,CountryID,Gross national income per cap,Government expenditure on health per cap,Adult female obesity (%),Income per person,Per capita total expenditure on health (US$) Afghanistan,1,,8,,874,23 Albania,2,6000,127,,5369,174 Algeria,3,5940,146,,6011,123 Andorra,4,,2054,,,2815 Angola,5,3890,61,,3533,71 Antigua and Barbuda,6,15130,439,,14579,517 Argentina,7,11670,758,,11063,551 Armenia,8,4950,112,15.5,3903,99 Australia,9,33940,2097,,32798,3316 Austria,10,36040,2729,,34108,3864 Azerbaijan,11,5430,67,,4648,86 Bahamas,12,,775,,23021,1311 Bahrain,13,34310,669,,27236,810 Bangladesh,14,1230,26,,1268,13 Barbados,15,15150,722,,15837,785 Belarus,16,9700,428,,8541,244 Belgium,17,33860,2264,13.4,32077,3565 Belize,18,7080,254,,7290,229 Benin,19,1250,25,6.1,1390,29 Bermuda,20,,,,69916.79, 
Bhutan,21,4000,73,,3694,65 Bolivia,22,3810,128,15.1,3618,79 Bosnia and Herzegovina,23,6780,454,25.2,6506,258 Botswana,24,11730,487,,12057,378 Brazil,25,8700,367,13.1,8596,426 Brunei Darussalam,26,49900,314,,47465,543 Bulgaria,27,10270,443,,9353,283 Burkina Faso,28,1130,50,2.4,1140,27 Burundi,29,320,4,,420.08,4 Cambodia,30,1550,43,1.2,1453,30 Cameroon,31,2060,23,8.2,1995,51 Canada,32,36280,2585,13.9,35078,3912 Cape Verde,33,2590,227,,2831,129 Central African Republic,34,690,20,,675,14 Chad,35,1170,14,1.5,1749,22 Chile,36,11300,367,25,12262,473 China,37,4660,144,3.4,4091,90 Colombia,38,6130,534,16.6,6306,217 Comoros,39,1140,19,,1063,16 "Congo, Dem. Rep.",40,270,7,,264,6 "Congo, Rep.",41,2420,13,7.5,3621,42 Cook Islands,42,,518,65.7,9000,427 Costa Rica,43,9220,565,,8661,353 Cote d'Ivoire,44,1580,15,,1575,35 Croatia,45,13850,869,22.7,13232,722 Cuba,46,,329,,7407.24,355 Cyprus,47,25060,759,11.8,24473,1483 Czech Republic,48,20920,1309,16.3,20281,943 Denmark,49,36190,2812,9.1,33626,4828 Djibouti,50,2180,75,,1964,62 Dominica,51,7870,311,,8576,297 Dominican Republic,52,5550,140,,5173,223 Ecuador,53,6810,130,,6533,166 Egypt,54,4940,129,46.6,5049,93 El Salvador,55,5610,227,,5403,191 Equatorial Guinea,56,16620,219,,11999,274 Eritrea,57,680,10,1.6,685,10 Estonia,58,18090,734,14.9,16654,620 Ethiopia,59,630,13,0.7,591,7 Fiji,60,4450,199,26.4,4209,149 Finland,61,33170,1940,13.5,30469,2994 France,62,32240,2833,,29644,4056 French Polynesia,63,,,,26016.11, Gabon,64,11180,198,8.2,12742,267 Gambia,65,1110,33,,726,13 Georgia,66,3880,76,,3505,147 Germany,67,32680,2548,12.3,30496,3669 Ghana,68,1240,36,8.1,1225,35 Greece,69,30870,1317,18.2,25520,2733 Grenada,70,8770,387,,9128,346 Guatemala,71,5120,98,,4897,144 Guinea,72,1130,14,3,946,20 Guinea-Bissau,73,460,10,,569,13 Guyana,74,3410,223,,3232,67 Haiti,75,1070,65,6.3,1175,42 Honduras,76,3420,116,18.8,3266,99 "Hong Kong, China",77,,,,35680, Hungary,78,16970,978,18.2,17014,853 Iceland,79,33740,2758,12.3,35630,4962 India,80,2460,21,2.8,2126,39 Indonesia,81,3310,44,3.6,3234,34 Iran (Islamic Republic of),82,9800,406,19.2,10692,247 Iraq,83,,90,38.2,3200,67 Ireland,84,34730,2413,12,38058,3888 Israel,85,23840,1477,,23846,1618 Italy,86,28970,2022,8.9,27750,2845 Jamaica,87,7050,127,,7132,180 Japan,88,32840,2067,3.3,30290,2690 Jordan,89,4820,257,16.2,4294,246 Kazakhstan,90,8700,214,,8699,189 Kenya,91,1470,51,6.3,1359,29 Kiribati,92,6230,268,,3377,117 "Korea, Dem. 
Rep.",93,,42,,1596.68,0 "Korea, Rep.",94,22990,819,,21342,1187 Kuwait,95,48310,422,,44947,796 Kyrgyzstan,96,1790,55,,1728,34 Lao People's Democratic Republic,97,1740,18,1.6,1811,22 Latvia,98,14840,615,19.5,13218,533 Lebanon,99,9600,285,,10212,468 Lesotho,100,1810,88,16.1,1415,49 Liberia,101,260,25,,383,10 Libyan Arab Jamahiriya,102,11630,189,,10804,255 Lithuania,103,14550,728,19.2,14085,545 Luxembourg,104,60870,5233,,70014,6610 "Macao, China",105,,,,37256, Macedonia,106,7850,446,,7393,245 Madagascar,107,870,21,1,988,9 Malawi,108,690,51,2.4,691,20 Malaysia,109,12160,226,18.8,11466,255 Maldives,110,4740,742,,4017,306 Mali,111,1000,34,3.7,1027,30 Malta,112,20990,1419,21.3,20410,1295 Marshall Islands,113,8040,589,,,298 Mauritania,114,1970,31,16.7,1691,19 Mauritius,115,10640,292,,10155,223 Mexico,116,11990,327,28.1,11317,500 Micronesia (Federated States of),117,6070,444,,,266 Moldova,118,2660,107,18.2,2362,68 Monaco,119,,5309,,,6343 Mongolia,120,2810,124,12.5,2643,53 Montenegro,121,8930,93,,,306 Morocco,122,3860,98,11,3547,95 Mozambique,123,660,39,3.9,743,17 Myanmar,124,510,7,,831,4 Namibia,125,4770,218,,4547,167 Nauru,126,,444,60.5,2500,605 Nepal,127,1010,24,0.9,1081,17 Netherlands,128,37940,2768,,34724,3784 Netherlands Antilles,129,,,,22700.41, New Caledonia,130,,,,31942.83, New Zealand,131,25750,1905,23.2,24554,2420 Nicaragua,132,2720,137,18,2611,76 Niger,133,630,14,3.2,613,10 Nigeria,134,1410,15,5.8,1892,32 Niue,135,,294,,,1045 Norway,136,50070,3780,5.9,47551,6267 Oman,137,19740,321,,20334,325 Pakistan,138,2410,8,,2396,16 Palau,139,14340,1003,,,835 Panama,140,8690,495,,8399,380 Papua New Guinea,141,1630,111,,1747,29 Paraguay,142,4040,131,,3900,117 Peru,143,6490,171,13.1,6466,145 Philippines,144,3430,88,,2932,45 Poland,145,14250,636,19.9,13573,556 Portugal,146,19960,1494,,20006,1830 Puerto Rico,147,,,,19725.45, Qatar,148,,1115,,68696,2753 Romania,149,10150,433,9.5,9374,315 Russia,150,12740,404,,11861,369 Rwanda,151,730,134,1.3,813,32 Saint Kitts and Nevis,152,12440,403,,13677,569 Saint Lucia,153,8500,237,,9279,339 Saint Vincent and the Grenadines,154,6220,289,,6752,233 Samoa,155,5090,188,66.3,4872,120 San Marino,156,,2765,,,3591 Sao Tome and Principe,157,1490,120,,1460,58 Saudi Arabia,158,22300,468,,21220,491 Senegal,159,1560,23,7.2,1676,40 Serbia,160,9320,373,,,247 Seychelles,161,14360,602,35.2,14202,573 Sierra Leone,162,610,20,,790,9 Singapore,163,43300,413,7.3,41479,1035 Slovakia,164,17060,913,15,15881,718 Slovenia,165,23970,1507,13.8,23004,1599 Solomon Islands,166,1850,99,,1712,34 Somalia,167,,8,,932.96,8 South Africa,168,8900,364,,8477,456 Spain,169,28200,1732,13.5,27270,2263 Sri Lanka,170,3730,105,,3481,60 Sudan,171,1780,23,,2249,38 Suriname,172,7720,151,,7234,254 Swaziland,173,4700,219,,4384,138 Sweden,174,34310,2533,9.5,31995,3870 Switzerland,175,40840,2598,7.5,35520,5878 Syria,176,4110,52,,4059,66 Taiwan,177,,,,26069, Tajikistan,178,1560,16,,1413,21 Tanzania,179,980,27,4.4,1018,18 Thailand,180,7440,223,,6869,113 Timor-Leste,181,5100,150,,,52 Togo,182,770,20,,888,19 Tonga,183,5470,218,74.9,5135,121 Trinidad and Tobago,184,16800,438,,15352,568 Tunisia,185,6490,214,,6461,159 Turkey,186,8410,461,22.7,7786,406 Turkmenistan,187,3990,172,10.3,4247,161 Tuvalu,188,,189,,,281 Uganda,189,880,39,4.1,991,25 Ukraine,190,6110,298,11.3,5583,159 United Arab Emirates,191,31190,491,,33487,982 United Kingdom,192,33650,2434,23,31580,3361 United States of America,193,44070,3074,33.2,41674,6714 Uruguay,194,9940,430,,9266,476 Uzbekistan,195,2190,89,7.1,1975,30 Vanuatu,196,3480,90,25.2,3477,68 
Venezuela,197,10970,196,,9876,332 Vietnam,198,2310,86,,2142,46 West Bank and Gaza,199,,,,3542, Yemen,200,2090,38,,2276,40 Zambia,201,1140,29,3,1175,49 Zimbabwe,202,,77,19.4,538,36 -------------------------------------------------------------------------------- /weak_conditional_independence_testing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxcsml/kerpy/50b175961d13e0e1f625aa987ae41cb98bfe4d84/weak_conditional_independence_testing/__init__.py --------------------------------------------------------------------------------