├── README.md
├── data
│   ├── example.csv
│   └── gwr.csv
├── jupyter notebook
│   └── example.ipynb
└── mgtwr
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-39.pyc
    │   ├── diagnosis.cpython-39.pyc
    │   ├── function.cpython-39.pyc
    │   ├── function_.cpython-39.pyc
    │   ├── kernel.cpython-39.pyc
    │   ├── kernelt.cpython-39.pyc
    │   ├── model.cpython-39.pyc
    │   ├── modelt.cpython-39.pyc
    │   ├── obj.cpython-39.pyc
    │   ├── objt.cpython-39.pyc
    │   ├── sel.cpython-39.pyc
    │   ├── selt.cpython-39.pyc
    │   └── setup.cpython-39.pyc
    ├── diagnosis.py
    ├── function.py
    ├── kernel.py
    ├── model.py
    ├── obj.py
    ├── sel.py
    └── setup.py


/README.md:
--------------------------------------------------------------------------------
 1 | # mgtwr
 2 | 
 3 | Fits geographically weighted regression (GWR), geographically and temporally weighted regression (GTWR), and multiscale geographically and temporally weighted regression (MGTWR) models. See
 4 | example.ipynb for usage.
 5 | 
 6 | # model.py improvements
 7 | When **parallel processing** is enabled, the code in **model.py** is what gets called. The original code used multiprocessing. Profiling with cProfile and the pstats library showed that waiting on **thread.lock** was the main time sink, which made the parallel run slower than the non-parallel one. I then compared ThreadPoolExecutor (from concurrent.futures) with joblib, and joblib turned out to be the fastest. All results were obtained on the provided **example.csv**.
 8 | As the results below show, joblib greatly reduces the parallel processing time (220.6 s versus 1469.7 s for the original code with 15 threads) and is also about three times faster than the original single-thread run (663.7 s). Note, however, that **{method 'acquire' of '_thread.lock' objects}** is still the most time-consuming entry, and I do not know how to eliminate it.
 9 | 
10 | Machine configuration:
11 | 12th Gen Intel(R) Core(TM) i5-12600KF (6+4 cores, 16 threads)
12 | Crucial 16GB DDR5-4800 UDIMM
13 | (This machine is not low-end, so it is disappointing that the run takes 11 minutes here when the example notebook reports only 6.)
14 | 
15 | # result comparison:
16 | 1. Original(**thread=1**)
17 | **time cost: 0:11:3.748**
18 |          636805417 function calls (613137518 primitive calls) in 663.749 seconds
19 | 
20 |    Ordered by: internal time
21 | 
22 |    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
23 |   2151360  169.201    0.000  225.785    0.000 d:\anaconda\Lib\site-packages\scipy\linalg\_basic.py:40(solve)
24 | 19328497/15022891   35.541    0.000   81.668    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
25 | 
26 | 2. Original(**thread=15**)
27 | **time cost: 0:24:29.669**
28 |          6022657 function calls (6019718 primitive calls) in 1469.671 seconds
29 | 
30 |    Ordered by: internal time
31 | 
32 |    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
33 |     23663 1397.118    0.059 1397.118    0.059 {method 'acquire' of '_thread.lock' objects}
34 | 
35 | 3. ThreadPoolExecutor(**thread=15**)
36 | **time cost: 0:07:44.877**
37 |          99635100 function calls (99632161 primitive calls) in 464.878 seconds
38 | 
39 |    Ordered by: internal time
40 | 
41 |    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
42 |  10310899  409.240    0.000  409.240    0.000 {method 'acquire' of '_thread.lock' objects}
43 | 
44 | 4. joblib(**thread=15**)
45 | **time cost: 0:03:40.609**
46 |          12395230 function calls (12381086 primitive calls) in 220.610 seconds
47 | 
48 |    Ordered by: internal time
49 | 
50 |    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
51 |    902982  187.731    0.000  187.731    0.000 {method 'acquire' of '_thread.lock' objects}
52 |      1245   22.471    0.018  215.907    0.173 d:\anaconda\Lib\site-packages\joblib\parallel.py:960(retrieve)
53 |    403539    2.014    0.000  192.678    0.000 d:\anaconda\Lib\concurrent\futures\_base.py:428(result)
54 |      1220    1.293    0.001  215.684    0.177 C:\Users\34456\AppData\Roaming\Python\Python311\site-packages\mgtwr\model.py:450(cal_aic)
55 | 
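The tables above are cProfile reports sorted by internal time (`tottime`). As a minimal sketch of how such a report can be produced with the standard library, the snippet below profiles a stand-in function. `busy_work` and `profile_call` are hypothetical names used only for illustration; they are not part of mgtwr, and the real runs profiled the model search instead.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical CPU-bound stand-in for a model fit."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=5):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" sorts by time spent inside each function itself,
    # which pstats labels "Ordered by: internal time".
    stats.sort_stats("tottime").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(busy_work, 100_000)
print(report)
```

In a threaded run, a large `tottime` for `{method 'acquire' of '_thread.lock' objects}` generally means the profiled thread spent that time waiting on workers rather than computing.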


--------------------------------------------------------------------------------
/data/gwr.csv:
--------------------------------------------------------------------------------
  1 | longitude,latitude,x1,x2,x3,y
  2 | 941396.6,3521764.0,75.6,19.9,20.76,8.2
  3 | 895553.0,3471916.0,100.0,26.0,26.86,6.4
  4 | 930946.4,3502787.0,61.7,24.1,15.42,6.6
  5 | 745398.6,3474765.0,100.0,24.8,51.67,9.4
  6 | 849431.3,3665553.0,42.7,17.5,42.39,13.3
  7 | 819317.3,3807616.0,100.0,15.1,3.49,6.4
  8 | 803747.1,3769623.0,64.6,14.7,11.44,9.2
  9 | 699011.5,3793408.0,75.2,10.7,9.21,9.0
 10 | 863020.8,3520432.0,47.0,22.0,31.33,7.6
 11 | 859915.8,3466377.0,66.2,19.3,11.62,7.5
 12 | 809736.9,3636468.0,16.1,19.2,41.68,17.0
 13 | 844270.1,3595691.0,57.9,18.3,22.36,10.3
 14 | 979288.9,3463849.0,100.0,18.2,4.58,5.8
 15 | 827822.0,3421638.0,65.6,25.9,41.47,9.1
 16 | 1023145.0,3554982.0,80.6,13.2,14.85,11.8
 17 | 994903.4,3600493.0,63.2,27.5,25.95,19.9
 18 | 971593.8,3671394.0,72.3,30.3,52.19,9.6
 19 | 782448.2,3684504.0,73.4,15.6,35.48,7.2
 20 | 724741.2,3492653.0,100.0,31.8,58.89,10.1
 21 | 1008480.0,3437933.0,47.1,11.5,20.19,13.5
 22 | 964264.9,3598842.0,52.1,24.1,30.94,9.9
 23 | 678778.6,3713250.0,68.5,14.4,15.46,12.0
 24 | 670055.9,3862318.0,43.6,12.0,0.91,8.1
 25 | 962612.3,3432769.0,100.0,18.3,27.05,6.4
 26 | 1059706.0,3556747.0,5.1,17.2,38.02,18.6
 27 | 704959.2,3577608.0,13.7,10.4,30.94,20.2
 28 | 653026.6,3813760.0,77.4,14.6,8.61,5.9
 29 | 734240.9,3794110.0,57.8,6.1,1.77,18.4
 30 | 832508.6,3762905.0,17.6,27.0,26.23,37.5
 31 | 695793.9,3495219.0,100.0,35.7,60.76,11.2
 32 | 745538.8,3711726.0,4.4,8.6,23.82,14.7
 33 | 908046.1,3428340.0,58.6,26.4,27.29,6.7
 34 | 724646.8,3757187.0,5.8,5.6,9.84,33.0
 35 | 894463.9,3492465.0,64.6,22.5,25.46,11.1
 36 | 808691.8,3455994.0,59.4,22.8,24.16,10.0
 37 | 942527.9,3722100.0,30.6,6.6,10.93,23.9
 38 | 839816.1,3449007.0,62.0,22.4,29.94,6.5
 39 | 705457.9,3694344.0,76.1,11.4,22.59,13.3
 40 | 783416.5,3623343.0,100.0,14.0,30.66,5.7
 41 | 805648.4,3537103.0,48.4,29.0,40.66,10.0
 42 | 635964.3,3854592.0,96.5,14.6,0.35,8.0
 43 | 764386.1,3812502.0,100.0,12.8,0.29,8.6
 44 | 732628.4,3421800.0,58.0,23.3,39.47,11.7
 45 | 759231.9,3735253.0,2.5,9.9,42.23,32.7
 46 | 860451.4,3569933.0,70.7,21.8,27.64,8.0
 47 | 800031.3,3564188.0,72.6,32.9,48.98,9.5
 48 | 764116.9,3494367.0,10.0,24.4,50.15,17.0
 49 | 707288.7,3731361.0,26.7,6.6,7.63,12.0
 50 | 703495.1,3467152.0,52.8,31.4,44.09,9.4
 51 | 896654.0,3401148.0,100.0,14.6,11.48,4.7
 52 | 1031899.0,3596117.0,89.1,12.7,14.03,7.6
 53 | 879541.2,3785425.0,70.0,19.7,29.99,8.0
 54 | 943066.2,3616602.0,64.2,25.7,32.58,9.1
 55 | 981727.8,3571315.0,100.0,25.4,33.88,8.6
 56 | 739255.8,3866604.0,100.0,17.2,0.03,7.8
 57 | 731468.7,3700612.0,53.9,2.6,5.13,25.8
 58 | 662257.4,3789664.0,36.1,13.6,13.56,13.7
 59 | 765397.3,3789005.0,93.7,6.8,0.0,15.6
 60 | 845701.3,3813323.0,87.2,16.5,9.89,9.5
 61 | 733728.4,3733248.0,4.2,18.4,49.92,31.6
 62 | 732702.3,3844809.0,100.0,16.6,0.26,8.6
 63 | 908386.8,3685752.0,100.0,16.8,12.69,5.3
 64 | 1023411.0,3471063.0,20.3,14.3,25.57,19.9
 65 | 695325.1,3822135.0,79.7,11.1,3.78,9.2
 66 | 765058.1,3421817.0,55.4,22.3,31.5,7.7
 67 | 855577.3,3722330.0,75.7,25.1,49.89,8.8
 68 | 772634.6,3764306.0,13.6,4.0,5.11,29.6
 69 | 818917.1,3839931.0,88.5,11.6,5.42,12.0
 70 | 794419.5,3803344.0,81.1,10.6,8.48,15.4
 71 | 873518.8,3689861.0,100.0,30.1,79.64,6.8
 72 | 665933.8,3740622.0,67.8,14.4,6.47,7.5
 73 | 695500.6,3624790.0,95.8,13.7,25.49,13.6
 74 | 870749.9,3810303.0,73.8,14.2,20.41,9.1
 75 | 675280.4,3685569.0,100.0,19.1,13.38,5.7
 76 | 763488.4,3699716.0,76.0,6.1,10.24,10.7
 77 | 814118.9,3590553.0,20.9,10.6,21.8,16.0
 78 | 855461.8,3506293.0,63.4,27.2,30.5,8.3
 79 | 815753.1,3783949.0,78.0,14.1,9.58,9.0
 80 | 807249.1,3695092.0,100.0,17.4,34.8,10.8
 81 | 915741.9,3530869.0,65.1,18.8,15.36,8.3
 82 | 924108.1,3668080.0,100.0,31.3,55.92,6.2
 83 | 970465.7,3640263.0,53.8,27.8,41.51,7.7
 84 | 908636.7,3624562.0,100.0,22.2,33.89,4.9
 85 | 821367.1,3660143.0,81.9,10.8,25.6,12.0
 86 | 766461.7,3663959.0,63.6,16.3,34.03,10.0
 87 | 873804.3,3439981.0,100.0,25.9,26.58,5.4
 88 | 884830.4,3599291.0,52.9,20.5,33.32,12.0
 89 | 770455.5,3520161.0,78.2,12.6,19.22,13.7
 90 | 1014742.0,3537225.0,32.9,17.2,39.15,13.4
 91 | 919396.5,3752562.0,100.0,17.8,38.19,8.2
 92 | 1004544.0,3517834.0,100.0,23.7,21.75,5.2
 93 | 864781.1,3419313.0,47.6,19.9,31.88,16.3
 94 | 772600.0,3832429.0,78.6,15.3,1.41,11.1
 95 | 917730.9,3716368.0,65.9,21.6,36.38,10.4
 96 | 1030500.0,3500535.0,100.0,22.3,43.34,8.7
 97 | 777055.3,3584821.0,65.6,29.2,58.72,10.1
 98 | 848638.8,3785405.0,100.0,15.7,8.32,9.7
 99 | 732876.8,3584393.0,100.0,28.2,41.32,4.6
100 | 715359.8,3660275.0,82.3,22.4,44.62,6.7
101 | 716369.8,3451034.0,100.0,22.1,27.48,8.2
102 | 766238.6,3453930.0,56.2,28.7,47.91,7.8
103 | 790338.7,3660608.0,75.1,13.8,31.78,12.9
104 | 920887.4,3568473.0,98.6,24.5,28.27,10.1
105 | 825920.1,3717990.0,73.0,15.0,34.74,11.0
106 | 707834.3,3854188.0,89.0,11.3,0.26,5.5
107 | 700833.7,3598228.0,3.2,18.6,37.95,16.6
108 | 793263.9,3719734.0,76.0,14.4,22.35,9.5
109 | 830735.9,3750903.0,95.2,7.9,7.37,28.4
110 | 863291.8,3756777.0,100.0,16.2,24.74,12.8
111 | 695329.2,3758093.0,93.7,8.8,3.94,7.6
112 | 798061.4,3609091.0,61.3,24.0,47.53,15.2
113 | 733846.7,3812828.0,100.0,12.8,1.48,9.0
114 | 953533.8,3482044.0,74.4,21.3,11.69,6.3
115 | 744180.8,3665561.0,100.0,13.4,20.04,9.3
116 | 668031.4,3764766.0,66.5,16.3,14.3,6.8
117 | 833819.6,3567447.0,56.5,24.3,32.46,10.7
118 | 840169.1,3695254.0,66.5,16.4,32.79,11.7
119 | 686875.4,3524124.0,100.0,33.0,49.93,7.3
120 | 824645.5,3864805.0,100.0,13.6,0.35,11.6
121 | 712437.1,3519627.0,53.5,35.9,58.17,6.0
122 | 954272.3,3697862.0,9.9,18.2,41.96,17.3
123 | 777759.0,3729605.0,59.2,6.2,8.03,18.1
124 | 752973.1,3570222.0,100.0,19.9,34.09,8.0
125 | 1004028.0,3641918.0,79.3,22.9,44.69,8.6
126 | 704495.6,3422002.0,69.4,29.1,32.74,7.8
127 | 754916.2,3685029.0,53.6,15.6,29.08,11.1
128 | 842085.9,3827075.0,64.5,17.0,11.81,13.1
129 | 703256.8,3552857.0,100.0,31.4,63.46,8.0
130 | 763457.1,3551752.0,45.4,24.8,46.53,15.9
131 | 734217.9,3623162.0,97.9,24.9,62.34,7.1
132 | 884376.9,3717493.0,100.0,31.9,61.36,5.6
133 | 963427.8,3560039.0,79.3,21.9,29.19,6.5
134 | 759410.8,3608179.0,100.0,29.5,43.21,7.1
135 | 882069.4,3534470.0,72.6,27.3,34.45,8.6
136 | 743031.8,3522636.0,50.3,29.1,59.9,9.2
137 | 795506.2,3421725.0,55.2,22.6,37.93,13.4
138 | 831682.3,3487715.0,51.1,22.9,26.68,14.0
139 | 941734.4,3567586.0,35.7,24.0,23.38,11.4
140 | 797981.7,3872640.0,100.0,14.0,0.0,11.4
141 | 919077.6,3595170.0,53.3,27.1,33.1,6.3
142 | 682616.8,3660254.0,44.0,16.3,30.03,13.6
143 | 819399.6,3514927.0,44.5,31.3,40.66,7.2
144 | 832935.0,3623868.0,100.0,26.0,45.93,4.8
145 | 777040.1,3858779.0,100.0,18.3,0.1,10.1
146 | 752165.2,3639192.0,65.3,14.7,27.78,9.0
147 | 658870.4,3842167.0,44.8,12.8,3.73,8.4
148 | 800384.3,3742691.0,61.2,13.2,18.37,9.4
149 | 938349.6,3446675.0,54.2,21.1,25.88,10.4
150 | 902471.1,3699878.0,100.0,32.6,60.23,4.2
151 | 894704.3,3648583.0,67.1,21.6,51.86,9.8
152 | 986832.8,3494323.0,59.9,21.2,19.45,9.6
153 | 731576.3,3544716.0,100.0,22.5,50.2,5.5
154 | 898776.3,3563384.0,100.0,30.3,30.06,8.6
155 | 796905.6,3841086.0,100.0,12.5,2.59,13.6
156 | 686891.4,3855274.0,70.0,11.1,4.06,12.0
157 | 838551.5,3538547.0,100.0,28.6,31.76,7.6
158 | 891228.5,3749769.0,59.6,22.6,45.94,10.4
159 | 858796.9,3637891.0,100.0,15.3,41.99,8.8
160 | 801018.1,3487328.0,71.1,26.2,30.71,6.3
161 | 


--------------------------------------------------------------------------------
/jupyter notebook/example.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "id": "2d479b4a",
  6 |    "metadata": {
  7 |     "ExecuteTime": {
  8 |      "end_time": "2022-09-19T07:47:51.787237Z",
  9 |      "start_time": "2022-09-19T07:47:51.778516Z"
 10 |     }
 11 |    },
 12 |    "source": [
 13 |     "Read data"
 14 |    ]
 15 |   },
 16 |   {
 17 |    "cell_type": "code",
 18 |    "execution_count": 1,
 19 |    "id": "4b06eeb4",
 20 |    "metadata": {
 21 |     "ExecuteTime": {
 22 |      "end_time": "2022-09-19T08:09:42.796617Z",
 23 |      "start_time": "2022-09-19T08:09:42.434654Z"
 24 |     }
 25 |    },
 26 |    "outputs": [],
 27 |    "source": [
 28 |     "import pandas as pd"
 29 |    ]
 30 |   },
 31 |   {
 32 |    "cell_type": "code",
 33 |    "execution_count": 2,
 34 |    "id": "7d65c606",
 35 |    "metadata": {
 36 |     "ExecuteTime": {
 37 |      "end_time": "2022-09-19T08:09:43.559791Z",
 38 |      "start_time": "2022-09-19T08:09:43.539778Z"
 39 |     }
 40 |    },
 41 |    "outputs": [],
 42 |    "source": [
 43 |     "data = pd.read_csv(r'\\data\\example.csv')"
 44 |    ]
 45 |   },
 46 |   {
 47 |    "cell_type": "code",
 48 |    "execution_count": 3,
 49 |    "id": "796fe3e5",
 50 |    "metadata": {
 51 |     "ExecuteTime": {
 52 |      "end_time": "2022-09-19T08:09:44.798946Z",
 53 |      "start_time": "2022-09-19T08:09:44.787275Z"
 54 |     }
 55 |    },
 56 |    "outputs": [],
 57 |    "source": [
 58 |     "coords = data[['longitude', 'latitude']]\n",
 59 |     "t = data[['t']]\n",
 60 |     "X = data[['x1', 'x2']]\n",
 61 |     "y = data[['y']]"
 62 |    ]
 63 |   },
 64 |   {
 65 |    "cell_type": "markdown",
 66 |    "id": "905c6002",
 67 |    "metadata": {},
 68 |    "source": [
 69 |     "GWR model"
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "code",
 74 |    "execution_count": 4,
 75 |    "id": "532ffaf3",
 76 |    "metadata": {
 77 |     "ExecuteTime": {
 78 |      "end_time": "2022-09-19T08:09:50.207629Z",
 79 |      "start_time": "2022-09-19T08:09:50.131373Z"
 80 |     }
 81 |    },
 82 |    "outputs": [],
 83 |    "source": [
 84 |     "from mgtwr.sel import SearchGWRParameter\n",
 85 |     "from mgtwr.model import GWR"
 86 |    ]
 87 |   },
 88 |   {
 89 |    "cell_type": "code",
 90 |    "execution_count": 5,
 91 |    "id": "e0aa3e40",
 92 |    "metadata": {
 93 |     "ExecuteTime": {
 94 |      "end_time": "2022-09-19T08:09:54.705874Z",
 95 |      "start_time": "2022-09-19T08:09:53.355340Z"
 96 |     }
 97 |    },
 98 |    "outputs": [
 99 |     {
100 |      "name": "stdout",
101 |      "output_type": "stream",
102 |      "text": [
103 |       "bw: 15.0 , score: 18778.49\n",
104 |       "bw: 10.0 , score: 18764.75\n",
105 |       "bw: 6.0 , score: 18699.21\n",
106 |       "bw: 4.0 , score: 18506.22\n",
107 |       "bw: 2.0 , score: 17786.86\n",
108 |       "bw: 2.0 , score: 17786.86\n",
109 |       "time cost: 0:00:1.934\n"
110 |      ]
111 |     }
112 |    ],
113 |    "source": [
114 |     "sel = SearchGWRParameter(coords, X, y, kernel='gaussian', fixed=True)\n",
115 |     "bw = sel.search(bw_max=40, verbose=True, time_cost=True)"
116 |    ]
117 |   },
118 |   {
119 |    "cell_type": "code",
120 |    "execution_count": 6,
121 |    "id": "cb3be837",
122 |    "metadata": {
123 |     "ExecuteTime": {
124 |      "end_time": "2022-09-19T08:10:32.986328Z",
125 |      "start_time": "2022-09-19T08:10:32.709532Z"
126 |     }
127 |    },
128 |    "outputs": [
129 |     {
130 |      "name": "stdout",
131 |      "output_type": "stream",
132 |      "text": [
133 |       "0.5935790327518\n"
134 |      ]
135 |     }
136 |    ],
137 |    "source": [
138 |     "gwr = GWR(coords, X, y, bw, kernel='gaussian', fixed=True).fit()\n",
139 |     "print(gwr.R2)"
140 |    ]
141 |   },
142 |   {
143 |    "cell_type": "markdown",
144 |    "id": "ac6e9d39",
145 |    "metadata": {},
146 |    "source": [
147 |     "MGWR model"
148 |    ]
149 |   },
150 |   {
151 |    "cell_type": "code",
152 |    "execution_count": 7,
153 |    "id": "20f580b3",
154 |    "metadata": {},
155 |    "outputs": [],
156 |    "source": [
157 |     "from mgtwr.sel import SearchMGWRParameter\n",
158 |     "from mgtwr.model import MGWR"
159 |    ]
160 |   },
161 |   {
162 |    "cell_type": "code",
163 |    "execution_count": 8,
164 |    "id": "08bf65d5",
165 |    "metadata": {},
166 |    "outputs": [
167 |     {
168 |      "name": "stdout",
169 |      "output_type": "stream",
170 |      "text": [
171 |       "Current iteration: 1 ,SOC: 0.0033171\n",
172 |       "Bandwidths: 986.8, 965.5, 0.7\n",
173 |       "Current iteration: 2 ,SOC: 5.64e-05\n",
174 |       "Bandwidths: 986.8, 986.8, 0.7\n",
175 |       "Current iteration: 3 ,SOC: 4.27e-05\n",
176 |       "Bandwidths: 986.8, 986.8, 0.7\n",
177 |       "Current iteration: 4 ,SOC: 3.22e-05\n",
178 |       "Bandwidths: 986.8, 986.8, 0.7\n",
179 |       "Current iteration: 5 ,SOC: 2.43e-05\n",
180 |       "Bandwidths: 986.8, 986.8, 0.7\n",
181 |       "time cost: 0:00:35.14\n"
182 |      ]
183 |     }
184 |    ],
185 |    "source": [
186 |     "sel_multi = SearchMGWRParameter(coords, X, y, kernel='gaussian', fixed=True)\n",
187 |     "bws = sel_multi.search(multi_bw_max=[1000], verbose=True, time_cost=True, tol_multi=3.0e-5)"
188 |    ]
189 |   },
190 |   {
191 |    "cell_type": "code",
192 |    "execution_count": 9,
193 |    "id": "e7dbf9fb",
194 |    "metadata": {},
195 |    "outputs": [
196 |     {
197 |      "name": "stdout",
198 |      "output_type": "stream",
199 |      "text": [
200 |       "0.7045779853867871\n"
201 |      ]
202 |     }
203 |    ],
204 |    "source": [
205 |     "mgwr = MGWR(coords, X, y, sel_multi, kernel='gaussian', fixed=True).fit()\n",
206 |     "print(mgwr.R2)"
207 |    ]
208 |   },
209 |   {
210 |    "cell_type": "markdown",
211 |    "id": "68c915f1",
212 |    "metadata": {},
213 |    "source": [
214 |     "If you already know bws, you can also do the following"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "code",
219 |    "execution_count": 10,
220 |    "id": "56555609",
221 |    "metadata": {},
222 |    "outputs": [
223 |     {
224 |      "name": "stdout",
225 |      "output_type": "stream",
226 |      "text": [
227 |       "0.7045779853867871\n"
228 |      ]
229 |     }
230 |    ],
231 |    "source": [
232 |     "class sel_multi:\n",
233 |     "    def __init__(self, bws):\n",
234 |     "        self.bws = bws\n",
235 |     "\n",
236 |     "        \n",
237 |     "selector = sel_multi(bws)\n",
238 |     "mgwr = MGWR(coords, X, y, selector, kernel='gaussian', fixed=True).fit()\n",
239 |     "print(mgwr.R2)"
240 |    ]
241 |   },
242 |   {
243 |    "cell_type": "markdown",
244 |    "id": "6aea4108",
245 |    "metadata": {
246 |     "ExecuteTime": {
247 |      "end_time": "2022-09-19T08:11:21.337967Z",
248 |      "start_time": "2022-09-19T08:11:21.326547Z"
249 |     }
250 |    },
251 |    "source": [
252 |     "GTWR model"
253 |    ]
254 |   },
255 |   {
256 |    "cell_type": "code",
257 |    "execution_count": 11,
258 |    "id": "462da66a",
259 |    "metadata": {
260 |     "ExecuteTime": {
261 |      "end_time": "2022-09-19T08:11:53.026336Z",
262 |      "start_time": "2022-09-19T08:11:53.021405Z"
263 |     }
264 |    },
265 |    "outputs": [],
266 |    "source": [
267 |     "from mgtwr.sel import SearchGTWRParameter\n",
268 |     "from mgtwr.model import GTWR"
269 |    ]
270 |   },
271 |   {
272 |    "cell_type": "code",
273 |    "execution_count": 12,
274 |    "id": "4f9cc821",
275 |    "metadata": {
276 |     "ExecuteTime": {
277 |      "end_time": "2022-09-19T08:14:07.489058Z",
278 |      "start_time": "2022-09-19T08:13:28.866324Z"
279 |     }
280 |    },
281 |    "outputs": [
282 |     {
283 |      "name": "stdout",
284 |      "output_type": "stream",
285 |      "text": [
286 |       "bw:  5.9 , tau:  19.9 , score:  18095.04059255282\n",
287 |       "bw:  3.7 , tau:  19.9 , score:  17608.38596885707\n",
288 |       "bw:  2.3 , tau:  10.1 , score:  16461.58709937909\n",
289 |       "bw:  1.4 , tau:  3.8 , score:  14817.811620052908\n",
290 |       "bw:  0.9 , tau:  1.4 , score:  13780.792562049754\n",
291 |       "bw:  0.9 , tau:  1.4 , score:  13780.792562049754\n",
292 |       "bw:  0.9 , tau:  1.4 , score:  13780.792562049754\n",
293 |       "bw:  0.9 , tau:  1.4 , score:  13780.792562049754\n",
294 |       "bw:  0.9 , tau:  1.4 , score:  13780.792562049754\n",
295 |       "time cost: 0:00:40.776\n"
296 |      ]
297 |     }
298 |    ],
299 |    "source": [
300 |     "sel = SearchGTWRParameter(coords, t, X, y, kernel='gaussian', fixed=True)\n",
301 |     "bw, tau = sel.search(tau_max=20, verbose=True, time_cost=True)"
302 |    ]
303 |   },
304 |   {
305 |    "cell_type": "code",
306 |    "execution_count": 13,
307 |    "id": "4bbf93f8",
308 |    "metadata": {
309 |     "ExecuteTime": {
310 |      "end_time": "2022-09-19T08:14:17.776587Z",
311 |      "start_time": "2022-09-19T08:14:17.313360Z"
312 |     }
313 |    },
314 |    "outputs": [
315 |     {
316 |      "name": "stdout",
317 |      "output_type": "stream",
318 |      "text": [
319 |       "0.9829884630503501\n"
320 |      ]
321 |     }
322 |    ],
323 |    "source": [
324 |     "gtwr = GTWR(coords, t, X, y, bw, tau, kernel='gaussian', fixed=True).fit()\n",
325 |     "print(gtwr.R2)"
326 |    ]
327 |   },
328 |   {
329 |    "cell_type": "markdown",
330 |    "id": "2ad9399f",
331 |    "metadata": {},
332 |    "source": [
333 |     "MGTWR model"
334 |    ]
335 |   },
336 |   {
337 |    "cell_type": "code",
338 |    "execution_count": 14,
339 |    "id": "7d015f1a",
340 |    "metadata": {
341 |     "ExecuteTime": {
342 |      "end_time": "2022-09-19T08:15:02.313810Z",
343 |      "start_time": "2022-09-19T08:15:02.303789Z"
344 |     }
345 |    },
346 |    "outputs": [],
347 |    "source": [
348 |     "from mgtwr.sel import SearchMGTWRParameter\n",
349 |     "from mgtwr.model import MGTWR"
350 |    ]
351 |   },
352 |   {
353 |    "cell_type": "code",
354 |    "execution_count": 15,
355 |    "id": "94d738b5",
356 |    "metadata": {
357 |     "ExecuteTime": {
358 |      "end_time": "2022-09-19T08:23:08.330524Z",
359 |      "start_time": "2022-09-19T08:15:42.813827Z"
360 |     }
361 |    },
362 |    "outputs": [
363 |     {
364 |      "name": "stdout",
365 |      "output_type": "stream",
366 |      "text": [
367 |       "Current iteration: 1 ,SOC: 0.0025274\n",
368 |       "Bandwidths: 0.7, 0.7, 0.5\n",
369 |       "taus: 1.3,0.8,0.8\n",
370 |       "Current iteration: 2 ,SOC: 0.0011033\n",
371 |       "Bandwidths: 0.9, 0.7, 0.5\n",
372 |       "taus: 3.0,0.4,0.8\n",
373 |       "Current iteration: 3 ,SOC: 0.0005365\n",
374 |       "Bandwidths: 0.9, 0.7, 0.5\n",
375 |       "taus: 3.4,0.2,0.8\n",
376 |       "Current iteration: 4 ,SOC: 0.0003\n",
377 |       "Bandwidths: 0.9, 0.7, 0.5\n",
378 |       "taus: 3.4,0.2,0.8\n",
379 |       "Current iteration: 5 ,SOC: 0.0001986\n",
380 |       "Bandwidths: 0.9, 0.7, 0.5\n",
381 |       "taus: 3.6,0.2,0.8\n",
382 |       "Current iteration: 6 ,SOC: 0.0001415\n",
383 |       "Bandwidths: 0.9, 0.7, 0.5\n",
384 |       "taus: 3.6,0.2,0.8\n",
385 |       "Current iteration: 7 ,SOC: 0.0001052\n",
386 |       "Bandwidths: 0.9, 0.7, 0.5\n",
387 |       "taus: 3.6,0.2,0.8\n",
388 |       "Current iteration: 8 ,SOC: 7.99e-05\n",
389 |       "Bandwidths: 0.9, 0.7, 0.5\n",
390 |       "taus: 3.6,0.2,0.8\n",
391 |       "time cost: 0:06:2.651\n"
392 |      ]
393 |     }
394 |    ],
395 |    "source": [
396 |     "sel_multi = SearchMGTWRParameter(coords, t, X, y, kernel='gaussian', fixed=True)\n",
397 |     "bws = sel_multi.search(multi_bw_min=[0.1], verbose=True, tol_multi=1.0e-4, time_cost=True)"
398 |    ]
399 |   },
400 |   {
401 |    "cell_type": "code",
402 |    "execution_count": 16,
403 |    "id": "51401611",
404 |    "metadata": {
405 |     "ExecuteTime": {
406 |      "end_time": "2022-09-19T08:24:31.131209Z",
407 |      "start_time": "2022-09-19T08:24:16.718379Z"
408 |     }
409 |    },
410 |    "outputs": [
411 |     {
412 |      "name": "stdout",
413 |      "output_type": "stream",
414 |      "text": [
415 |       "0.9972924820674222\n"
416 |      ]
417 |     }
418 |    ],
419 |    "source": [
420 |     "mgtwr = MGTWR(coords, t, X, y, sel_multi, kernel='gaussian', fixed=True).fit()\n",
421 |     "print(mgtwr.R2)"
422 |    ]
423 |   },
424 |   {
425 |    "cell_type": "markdown",
426 |    "id": "541bdcce",
427 |    "metadata": {},
428 |    "source": [
429 |     "If you already know bws, you can also do the following"
430 |    ]
431 |   },
432 |   {
433 |    "cell_type": "code",
434 |    "execution_count": 17,
435 |    "id": "bcfc1992",
436 |    "metadata": {
437 |     "ExecuteTime": {
438 |      "end_time": "2022-09-19T08:25:21.934146Z",
439 |      "start_time": "2022-09-19T08:25:08.333204Z"
440 |     }
441 |    },
442 |    "outputs": [
443 |     {
444 |      "name": "stdout",
445 |      "output_type": "stream",
446 |      "text": [
447 |       "0.9972924820674222\n"
448 |      ]
449 |     }
450 |    ],
451 |    "source": [
452 |     "class sel_multi:\n",
453 |     "    def __init__(self, bws):\n",
454 |     "        self.bws = bws\n",
455 |     "\n",
456 |     "        \n",
457 |     "selector = sel_multi(bws)\n",
458 |     "mgtwr = MGTWR(coords, t, X, y, selector, kernel='gaussian', fixed=True).fit()\n",
459 |     "print(mgtwr.R2)"
460 |    ]
461 |   },
462 |   {
463 |    "cell_type": "code",
464 |    "execution_count": null,
465 |    "id": "a0534878",
466 |    "metadata": {},
467 |    "outputs": [],
468 |    "source": []
469 |   }
470 |  ],
471 |  "metadata": {
472 |   "kernelspec": {
473 |    "display_name": "Python 3 (ipykernel)",
474 |    "language": "python",
475 |    "name": "python3"
476 |   },
477 |   "language_info": {
478 |    "codemirror_mode": {
479 |     "name": "ipython",
480 |     "version": 3
481 |    },
482 |    "file_extension": ".py",
483 |    "mimetype": "text/x-python",
484 |    "name": "python",
485 |    "nbconvert_exporter": "python",
486 |    "pygments_lexer": "ipython3",
487 |    "version": "3.9.12"
488 |   }
489 |  },
490 |  "nbformat": 4,
491 |  "nbformat_minor": 5
492 | }
493 | 
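The two "If you already know bws" cells define a small class whose only job is to carry a `bws` attribute. A possible alternative, sketched here under the assumption (suggested by those cells, not verified against model.py) that the model reads nothing but `bws` from the selector, is `types.SimpleNamespace`. It also avoids reusing the name `sel_multi`, which the notebook first binds to the search object. The bandwidth values are the rounded ones printed by the MGWR search and are illustrative only.

```python
from types import SimpleNamespace

# Rounded bandwidths as printed by the MGWR search above (illustrative values).
known_bws = [986.8, 986.8, 0.7]

# A plain attribute container standing in for the search object.
selector = SimpleNamespace(bws=known_bws)
print(selector.bws)

# With mgtwr installed and the data loaded as in the notebook, the fit would
# then follow the notebook's own call:
# mgwr = MGWR(coords, X, y, selector, kernel='gaussian', fixed=True).fit()
```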


--------------------------------------------------------------------------------
/mgtwr/__init__.py:
--------------------------------------------------------------------------------
1 | __version__ = '2.0.5'
2 | 


--------------------------------------------------------------------------------
/mgtwr/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/__init__.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/diagnosis.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/diagnosis.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/function.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/function.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/function_.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/function_.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/kernel.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/kernel.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/kernelt.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/kernelt.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/model.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/model.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/modelt.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/modelt.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/obj.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/obj.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/objt.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/objt.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/sel.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/sel.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/selt.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/selt.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/__pycache__/setup.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunkun1997/mgtwr/5680ee85d6f010a9f34c3d611c51d95a9938c4db/mgtwr/__pycache__/setup.cpython-39.pyc


--------------------------------------------------------------------------------
/mgtwr/diagnosis.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | 
 4 | def get_AICc(gtwr):
 5 |     """
 6 |     Get AICc value
 7 | 
 8 |     Gaussian: p61, (2.33), Fotheringham, Brunsdon and Charlton (2002)
 9 | 
10 |     GWGLM: AICc=AIC+2k(k+1)/(n-k-1), Nakaya et al. (2005): p2704, (36)
11 | 
12 |     """
13 |     n = gtwr.n
14 |     k = gtwr.tr_S
15 | 
16 |     aicc = get_AIC(gtwr) + 2.0 * k * (k + 1.0) / (n - k - 1.0)
17 |     return aicc
18 | 
19 | 
20 | def get_AIC(gtwr):
21 |     """
22 |     Get AIC value
23 | 
24 |     Gaussian: p96, (4.22), Fotheringham, Brunsdon and Charlton (2002)
25 | 
26 |     GWGLM:  AIC(G)=D(G) + 2K(G), where D and K denote the deviance and the effective
27 |     number of parameters in the model with bandwidth G, respectively.
28 | 
29 |     """
30 | 
31 |     k = gtwr.tr_S
32 | 
33 |     aic = -2.0 * gtwr.llf + 2.0 * (k + 1)
34 | 
35 |     return aic
36 | 
37 | 
38 | def get_BIC(gtwr):
39 |     """
40 |     Get BIC value
41 | 
42 |     Gaussian: p61 (2.34), Fotheringham, Brunsdon and Charlton (2002)
43 |     BIC = -2log(L)+k*log(n)
44 | 
45 |     GWGLM: BIC = dev + tr_S * log(n)
46 | 
47 |     """
48 |     n = gtwr.n  # (scalar) number of observations
49 |     k = gtwr.tr_S
50 | 
51 |     bic = -2.0 * gtwr.llf + (k + 1) * np.log(n)
52 |     return bic
53 | 
54 | 
55 | def get_CV(gtwr):
56 |     """
57 |     Get CV value
58 | 
59 |     Gaussian only
60 | 
61 |     Methods: p60, (2.31) or p212 (9.4)
 62 |     Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002).
63 |     Geographically weighted regression: the analysis of spatially varying relationships.
64 |     Modification: sum of residual squared is divided by n according to GWR4 results
65 | 
66 |     """
67 | 
68 |     cv = gtwr.aa / gtwr.n
69 |     return cv
70 | 
71 | 
72 | def corr(cov):
 73 |     """Convert a covariance matrix into a correlation matrix."""
74 |     invsd = np.diag(1 / np.sqrt(np.diag(cov)))
75 |     cors = np.dot(np.dot(invsd, cov), invsd)
76 |     return cors
77 | 
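
A minimal sketch of how the criteria in diagnosis.py relate to one another. It assumes a stand-in object rather than a fitted model: the `SimpleNamespace` and its `n`, `tr_S` and `llf` values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from types import SimpleNamespace

# Hypothetical stand-in for a fitted model; the numbers are illustrative only.
fake = SimpleNamespace(n=100, tr_S=4.0, llf=-150.0)


def aic(m):
    # Same form as get_AIC: -2 * log-likelihood + 2 * (tr_S + 1)
    return -2.0 * m.llf + 2.0 * (m.tr_S + 1)


def aicc(m):
    # Same form as get_AICc: AIC plus a small-sample correction term
    return aic(m) + 2.0 * m.tr_S * (m.tr_S + 1.0) / (m.n - m.tr_S - 1.0)


def bic(m):
    # Same form as get_BIC: the penalty grows with log(n) instead of a constant 2
    return -2.0 * m.llf + (m.tr_S + 1) * math.log(m.n)


aic_value = aic(fake)
aicc_value = aicc(fake)
bic_value = bic(fake)
print(aic_value, aicc_value, bic_value)
```

The correction term in AICc shrinks as `n` grows relative to `tr_S`, so AICc and AIC converge for large samples.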


--------------------------------------------------------------------------------
/mgtwr/function.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | from scipy import linalg
  3 | import time
  4 | from typing import Callable
  5 | from copy import deepcopy
  6 | 
  7 | 
  8 | def print_time(func: Callable):
  9 |     def inner(*args, **kwargs):
 10 |         start = time.time()
 11 |         res = func(*args, **kwargs)
 12 |         end = time.time()
 13 |         m, s = divmod(end - start, 60)
 14 |         h, m = divmod(m, 60)
 15 |         if 'time_cost' in kwargs and kwargs['time_cost']:
 16 |             print("time cost: %d:%02d:%s" % (h, m, round(s, 3)))
 17 |         return res
 18 |     return inner
 19 | 
 20 | 
 21 | def _compute_betas_gwr(y, x, wi):
 22 |     """
 23 |     compute MLE coefficients using iwls routine
 24 | 
 25 |     Methods: p189, Iteratively (Re)weighted Least Squares (IWLS),
 26 |     Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002).
 27 |     Geographically weighted regression: the analysis of spatially varying relationships.
 28 |     """
 29 |     xt = (x * wi).T
 30 |     xtx = np.dot(xt, x)
 31 |     xtx_inv_xt = linalg.solve(xtx, xt)
 32 |     betas = np.dot(xtx_inv_xt, y)
 33 |     return betas, xtx_inv_xt
 34 | 
 35 | 
 36 | def surface_to_plane(
 37 |         longitude: np.ndarray,
 38 |         latitude: np.ndarray,
 39 |         central_longitude: int = 114
 40 | ):
 41 | 
 42 |     r"""
 43 |     Based on the Gauss-Krüger projection
 44 | 
 45 |     equatorial radius: a = 6378136.49m
 46 | 
 47 |     polar radius: b = 6356755m
 48 | 
 49 |     so that
 50 | 
 51 |     first eccentricity :math:`e = \sqrt{a^2-b^2}/a`
 52 | 
 53 |     second eccentricity :math:`e' = \sqrt{a^2-b^2}/b`
 54 | 
 55 |     so that
 56 | 
 57 |     .. math::
 58 |         \begin{aligned}
 59 |             Y_{b0}=a^2B\beta_0/b +
 60 |             sin(B)\left(\beta_2cos(B)+\beta_4cos^3(B)+\beta_6cos^5(B)+\beta_8cos^7(B)\right)
 61 |         \end{aligned}
 62 |     where B is the latitude converted from degrees to radians and
 63 | 
 64 |     .. math::
 65 |         \begin{aligned}
 66 |             \beta_0 &= 1-\frac{3}{4}e'^2+\frac{45}{64}e'^4-\frac{175}{256}e'^6+
 67 |                        \frac{11025}{16384}e'^8 \\
 68 |             \beta_2 &= \beta_0 - 1 \\
 69 |             \beta_4 &= \frac{15}{32}e'^4-\frac{175}{384}e'^6+\frac{3675}{8192}e'^8 \\
 70 |             \beta_6 &= -\frac{35}{96}e'^6 + \frac{735}{2048}e'^8 \\
 71 |             \beta_8 &= \frac{315}{1024}e'^8 \\
 72 |         \end{aligned}
 73 | 
 74 |     so that the Y-axis is
 75 | 
 76 |     .. math::
 77 |         \begin{aligned}
 78 |             Y &= Y_{b0}+\frac{1}{2}Ntan(B)m^2+\frac{1}{24}\left(5-tan^2(B)+9\eta^2+4\eta^4
 79 |                 \right)Ntan(B)m^4 \\
 80 |               &+ \frac{1}{720}\left(61-58tan^2(B)\right)Ntan(B)m^6
 81 |         \end{aligned}
 82 |     where L is the longitude minus the central longitude, converted to radians, and
 83 | 
 84 |     .. math::
 85 |         \begin{aligned}
 86 |             N &= a/\sqrt{1-(esin(B))^2} \\
 87 |             \eta &= e'cos(B) \\
 88 |             m &= Lcos(B) \\
 89 |         \end{aligned}
 90 |     so that the X-axis is
 91 | 
 92 |     .. math::
 93 |         \begin{aligned}
 94 |             X &= Nm+\frac{1}{6}\left(1-tan^2(B)+\eta^2\right)Nm^3 \\
 95 |               &+ \frac{1}{120}\left(5-18tan^2(B)+tan^4(B)+14\eta^2-58tan^2(B)\eta\right)Nm^5+500000
 96 |         \end{aligned}
 97 |     """
 98 |     a = 6378136.49
 99 |     b = 6356755
100 | 
101 |     e1 = np.sqrt(a ** 2 - b ** 2) / a
102 |     e2 = np.sqrt(a ** 2 - b ** 2) / b
103 |     beta0 = 1 - (3 / 4) * e2 ** 2 + (45 / 64) * e2 ** 4 - (175 / 256) * e2 ** 6 \
104 |         + (11025 / 16384) * e2 ** 8
105 |     beta2 = beta0 - 1
106 |     beta4 = (15 / 32) * e2 ** 4 - (175 / 384) * e2 ** 6 + (3675 / 8192) * e2 ** 8
107 |     beta6 = -(35 / 96) * e2 ** 6 + (735 / 2048) * e2 ** 8
108 |     beta8 = (315 / 1024) * e2 ** 8
109 | 
110 |     L = np.radians(longitude - central_longitude)
111 |     B = np.radians(latitude)
112 |     cosB = np.cos(B)
113 |     sinB = np.sin(B)
114 |     tanB = np.tan(B)
115 |     N = a / np.sqrt(1 - (e1 * sinB) ** 2)
116 |     eta = e2 * cosB
117 |     m = L * cosB
118 |     Yb0 = a ** 2 * B * beta0 / b + sinB * \
119 |         (beta2 * cosB + beta4 * cosB ** 3 + beta6 * cosB ** 5 + beta8 * cosB ** 7)
120 |     Y = Yb0 + (1 / 2) * N * tanB * m ** 2 + (1 / 24) * (5 - tanB ** 2 + 9 * eta ** 2 + 4 * eta ** 4) * N * tanB * \
121 |         m ** 4 + (1 / 720) * (61 - 58 * tanB ** 2) * N * tanB * m ** 6
122 |     X = N * m + (1 / 6) * (1 - tanB ** 2 + eta ** 2) * N * m ** 3 + \
123 |         (1 / 120) * (5 - 18 * tanB ** 2 + tanB ** 4 + 14 * eta ** 2 - 58 * tanB ** 2 * eta) * N * m ** 5 + 500000
124 |     X = X.reshape(-1, 1)
125 |     Y = Y.reshape(-1, 1)
126 |     return X, Y
127 | 
128 | 
129 | def golden_section(a, c, delta, decimal, function, tol, max_iter, verbose=False):
130 |     b = a + delta * np.abs(c - a)
131 |     d = c - delta * np.abs(c - a)
132 |     diff = 1.0e9
133 |     iter_num = 0
134 |     score_dict = {}
135 |     opt_val = None
136 |     while np.abs(diff) > tol and iter_num < max_iter:
137 |         iter_num += 1
138 |         b = np.round(b, decimal)
139 |         d = np.round(d, decimal)
140 | 
141 |         if b in score_dict:
142 |             score_b = score_dict[b]
143 |         else:
144 |             score_b = function(b)
145 |             score_dict[b] = score_b
146 | 
147 |         if d in score_dict:
148 |             score_d = score_dict[d]
149 |         else:
150 |             score_d = function(d)
151 |             score_dict[d] = score_d
152 | 
153 |         if score_b <= score_d:
154 |             opt_val = b
155 |             opt_score = score_b
156 |             c = d
157 |             d = b
158 |             b = a + delta * np.abs(c - a)
159 | 
160 |         else:
161 |             opt_val = d
162 |             opt_score = score_d
163 |             a = b
164 |             b = d
165 |             d = c - delta * np.abs(c - a)
166 | 
167 |         opt_val = np.round(opt_val, decimal)
168 |         diff = score_b - score_d
169 |         if verbose:
170 |             print('bw:', opt_val, ', score:', np.round(opt_score, 2))
171 | 
172 |     return opt_val
173 | 
174 | 
175 | def onestep_golden_section(A, C, x, delta, tau_decimal, function, tol):
176 |     iter_num = 0
177 |     score_dict = {}
178 |     diff = 1e9
179 |     opt_score = None
180 |     opt_tau = None
181 |     B = A + delta * np.abs(C - A)
182 |     D = C - delta * np.abs(C - A)
183 |     while np.abs(diff) > tol and iter_num < 200:
184 |         iter_num += 1
185 |         B = np.round(B, tau_decimal)
186 |         D = np.round(D, tau_decimal)
187 |         if B in score_dict:
188 |             score_B = score_dict[B]
189 |         else:
190 |             score_B = function(x, B)
191 |             score_dict[B] = score_B
192 | 
193 |         if D in score_dict:
194 |             score_D = score_dict[D]
195 |         else:
196 |             score_D = function(x, D)
197 |             score_dict[D] = score_D
198 |         if score_B <= score_D:
199 |             opt_score = score_B
200 |             opt_tau = B
201 |             C = D
202 |             D = B
203 |             B = A + delta * np.abs(C - A)
204 |         else:
205 |             opt_score = score_D
206 |             opt_tau = D
207 |             A = B
208 |             B = D
209 |             D = C - delta * np.abs(C - A)
210 |         diff = score_B - score_D
211 |     return opt_tau, opt_score
212 | 
213 | 
214 | def twostep_golden_section(
215 |         a, c, A, C, delta, function,
216 |         tol, max_iter, bw_decimal, tau_decimal, verbose=False):
217 |     b = a + delta * np.abs(c - a)
218 |     d = c - delta * np.abs(c - a)
219 |     opt_bw = None
220 |     opt_tau = None
221 |     diff = 1e9
222 |     score_dict = {}
223 |     iter_num = 0
224 |     while np.abs(diff) > tol and iter_num < max_iter:
225 |         iter_num += 1
226 |         b = np.round(b, bw_decimal)
227 |         d = np.round(d, bw_decimal)
228 |         if b in score_dict:
229 |             tau_b, score_b = score_dict[b]
230 |         else:
231 |             tau_b, score_b = onestep_golden_section(A, C, b, delta, tau_decimal, function, tol)
232 |             score_dict[b] = [tau_b, score_b]
233 |         if d in score_dict:
234 |             tau_d, score_d = score_dict[d]
235 |         else:
236 |             tau_d, score_d = onestep_golden_section(A, C, d, delta, tau_decimal, function, tol)
237 |             score_dict[d] = [tau_d, score_d]
238 | 
239 |         if score_b <= score_d:
240 |             opt_score = score_b
241 |             opt_bw = b
242 |             opt_tau = tau_b
243 |             c = d
244 |             d = b
245 |             b = a + delta * np.abs(c - a)
246 |         else:
247 |             opt_score = score_d
248 |             opt_bw = d
249 |             opt_tau = tau_d
250 |             a = b
251 |             b = d
252 |             d = c - delta * np.abs(c - a)
253 |         diff = score_b - score_d
254 |         if verbose:
255 |             print('bw: ', opt_bw, ', tau: ', opt_tau, ', score: ', opt_score)
256 |     return opt_bw, opt_tau
257 | 
258 | 
259 | def multi_bw(init, X, y, n, k, tol, rss_score, gwr_func,
260 |              bw_func, sel_func, multi_bw_min, multi_bw_max, bws_same_times,
261 |              verbose=False):
262 |     """
263 |     Multiscale GWR bandwidth search procedure using iterative GAM backfitting
264 |     """
265 |     if init is None:
266 |         bw = sel_func(bw_func(X, y))
267 |         optim_model = gwr_func(X, y, bw)
268 |     else:
269 |         bw = init
270 |         optim_model = gwr_func(X, y, init)
271 |     bw_gwr = bw
272 |     err = optim_model.reside
273 |     betas = optim_model.betas
274 |     XB = np.multiply(betas, X)
275 |     rss = np.sum(err ** 2) if rss_score else None
276 |     scores = []
277 |     BWs = []
278 |     bw_stable_counter = 0
279 |     bws = np.empty(k)
280 |     Betas = None
281 | 
282 |     for iters in range(1, 201):
283 |         new_XB = np.zeros_like(X)
284 |         Betas = np.zeros_like(X)
285 | 
286 |         for j in range(k):
287 |             temp_y = XB[:, j].reshape((-1, 1))
288 |             temp_y = temp_y + err
289 |             temp_X = X[:, j].reshape((-1, 1))
290 |             bw_class = bw_func(temp_X, temp_y)
291 | 
292 |             if bw_stable_counter >= bws_same_times:
293 |                 # If no bandwidth has changed for bws_same_times (default 5) backfitting iterations, reuse it
294 |                 bw = bws[j]
295 |             else:
296 |                 bw = sel_func(bw_class, multi_bw_min[j], multi_bw_max[j])
297 | 
298 |             optim_model = gwr_func(temp_X, temp_y, bw)
299 |             err = optim_model.reside
300 |             betas = optim_model.betas
301 |             new_XB[:, j] = optim_model.pre.reshape(-1)
302 |             Betas[:, j] = betas.reshape(-1)
303 |             bws[j] = bw
304 | 
305 |         # If bws remain the same as from previous iteration
306 |         if (iters > 1) and np.all(BWs[-1] == bws):
307 |             bw_stable_counter += 1
308 |         else:
309 |             bw_stable_counter = 0
310 | 
311 |         num = np.sum((new_XB - XB) ** 2) / n
312 |         den = np.sum(np.sum(new_XB, axis=1) ** 2)
313 |         score = (num / den) ** 0.5
314 |         XB = new_XB
315 | 
316 |         if rss_score:
317 |             predy = np.sum(np.multiply(Betas, X), axis=1).reshape((-1, 1))
318 |             new_rss = np.sum((y - predy) ** 2)
319 |             score = np.abs((new_rss - rss) / new_rss)
320 |             rss = new_rss
321 |         scores.append(deepcopy(score))
322 |         delta = score
323 |         BWs.append(deepcopy(bws))
324 | 
325 |         if verbose:
326 |             print("Current iteration:", iters, ",SOC:", np.round(score, 7))
327 |             print("Bandwidths:", ', '.join([str(bw) for bw in bws]))
328 | 
329 |         if delta < tol:
330 |             break
331 | 
332 |     opt_bw = BWs[-1]
333 |     return opt_bw, np.array(BWs), np.array(scores), Betas, err, bw_gwr
334 | 
335 | 
336 | def multi_bws(init_bw, init_tau, X, y, n, k, tol, rss_score,
337 |               gtwr_func, bw_func, sel_func, multi_bw_min, multi_bw_max,
338 |               multi_tau_min, multi_tau_max, verbose=False):
339 |     """
340 |     Multiscale GTWR bandwidth search procedure using iterative GAM back fitting
341 |     """
342 |     if (init_bw is None) or (init_tau is None):
343 |         bw, tau = sel_func(bw_func(X, y))
344 |     else:
345 |         bw, tau = init_bw, init_tau
346 |     opt_model = gtwr_func(X, y, bw, tau)
347 |     bw_gtwr = bw
348 |     tau_gtwr = tau
349 |     err = opt_model.reside
350 |     betas = opt_model.betas
351 | 
352 |     XB = np.multiply(betas, X)
353 |     rss = np.sum(err ** 2) if rss_score else None
354 |     scores = []
355 |     bws = np.empty(k)
356 |     taus = np.empty(k)
357 |     BWs = []
358 |     Taus = []
359 |     Betas = None
360 | 
361 |     for iter_num in range(1, 201):
362 |         new_XB = np.zeros_like(X)
363 |         Betas = np.zeros_like(X)
364 | 
365 |         for j in range(k):
366 |             temp_y = XB[:, j].reshape((-1, 1))
367 |             temp_y = temp_y + err
368 |             temp_X = X[:, j].reshape((-1, 1))
369 |             bw_class = bw_func(temp_X, temp_y)
370 | 
371 |             bw, tau = sel_func(bw_class, multi_bw_min[j], multi_bw_max[j],
372 |                                multi_tau_min[j], multi_tau_max[j])
373 | 
374 |             opt_model = gtwr_func(temp_X, temp_y, bw, tau)
375 |             err = opt_model.reside
376 |             betas = opt_model.betas
377 |             new_XB[:, j] = (betas * temp_X).reshape(-1)
378 |             Betas[:, j] = betas.reshape(-1)
379 |             bws[j] = bw
380 |             taus[j] = tau
381 | 
382 |         num = np.sum((new_XB - XB) ** 2) / n
383 |         den = np.sum(np.sum(new_XB, axis=1) ** 2)
384 |         score = (num / den) ** 0.5
385 |         XB = new_XB
386 | 
387 |         if rss_score:
388 |             predy = np.sum(np.multiply(Betas, X), axis=1).reshape((-1, 1))
389 |             new_rss = np.sum((y - predy) ** 2)
390 |             score = np.abs((new_rss - rss) / new_rss)
391 |             rss = new_rss
392 |         scores.append(deepcopy(score))
393 |         delta = score
394 |         BWs.append(deepcopy(bws))
395 |         Taus.append(deepcopy(taus))
396 | 
397 |         if verbose:
398 |             print("Current iteration:", iter_num, ",SOC:", np.round(score, 7))
399 |             print("Bandwidths:", ', '.join([str(bw) for bw in bws]))
400 |             print("taus:", ','.join([str(tau) for tau in taus]))
401 | 
402 |         if delta < tol:
403 |             break
404 |     opt_bws = BWs[-1]
405 |     opt_tau = Taus[-1]
406 |     return (opt_bws, opt_tau, np.array(BWs), np.array(Taus), np.array(scores),
407 |             Betas, err, bw_gtwr, tau_gtwr)
408 | 
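
The bracketing logic of `golden_section` can be illustrated on a simple one-dimensional function. The sketch below is a simplified standalone variant, not the package's routine: it skips the rounding and score caching, and it stops on the interval width rather than on the score difference. The name `golden_min`, the test function and the tolerance are assumptions made for illustration.

```python
import math


def golden_min(f, a, c, tol=1e-6, max_iter=200):
    """Minimise a unimodal function f on [a, c] by golden-section search."""
    delta = (3 - math.sqrt(5)) / 2  # about 0.38197, the golden-section fraction
    b = a + delta * (c - a)
    d = c - delta * (c - a)
    for _ in range(max_iter):
        if abs(c - a) < tol:
            break
        if f(b) <= f(d):
            # The minimum lies in [a, d]: drop the right part, reuse b as the new d
            c, d = d, b
            b = a + delta * (c - a)
        else:
            # The minimum lies in [b, c]: drop the left part, reuse d as the new b
            a, b = b, d
            d = c - delta * (c - a)
    return (a + c) / 2


best = golden_min(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
print(round(best, 4))
```

Because `delta` satisfies delta^2 - 3*delta + 1 = 0, one of the two interior points can be reused after each shrink, which is why the package version caches scores in `score_dict`.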


--------------------------------------------------------------------------------
/mgtwr/kernel.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | from scipy.spatial.distance import cdist
 3 | 
 4 | 
 5 | class GWRKernel:
 6 | 
 7 |     def __init__(
 8 |             self,
 9 |             coords: np.ndarray,
10 |             bw: float = None,
11 |             fixed: bool = True,
12 |             function: str = 'triangular',
13 |             eps: float = 1.0000001):
14 | 
15 |         self.coords = coords
16 |         self.function = function
17 |         self.bw = bw
18 |         self.fixed = fixed
19 |         self.function = function
20 |         self.eps = eps
21 |         self.bandwidth = None
22 |         self.kernel = None
23 | 
24 |     def cal_distance(
25 |             self,
26 |             i: int):
27 |         distance = cdist([self.coords[i]], self.coords).reshape(-1)
28 |         return distance
29 | 
30 |     def cal_kernel(
31 |             self,
32 |             distance
33 |     ):
34 | 
35 |         if self.fixed:
36 |             self.bandwidth = float(self.bw)
37 |         else:
38 |             self.bandwidth = np.partition(
39 |                 distance,
40 |                 int(self.bw) - 1)[int(self.bw) - 1] * self.eps  # partial sort in O(n) Time
41 | 
42 |         self.kernel = self._kernel_funcs(distance / self.bandwidth)
43 | 
44 |         if self.function == "bisquare":  # Truncate for bisquare
45 |             self.kernel[(distance >= self.bandwidth)] = 0
46 |         return self.kernel
47 | 
48 |     def _kernel_funcs(self, zs):
49 |         # functions follow Anselin and Rey (2010) table 5.4
50 |         if self.function == 'triangular':
51 |             return 1 - zs
52 |         elif self.function == 'uniform':
53 |             return np.ones(zs.shape) * 0.5
54 |         elif self.function == 'quadratic':
55 |             return (3. / 4) * (1 - zs ** 2)
56 |         elif self.function == 'quartic':
57 |             return (15. / 16) * (1 - zs ** 2) ** 2
58 |         elif self.function == 'gaussian':
59 |             return np.exp(-0.5 * zs ** 2)
60 |         elif self.function == 'bisquare':
61 |             return (1 - zs ** 2) ** 2
62 |         elif self.function == 'exponential':
63 |             return np.exp(-zs)
64 |         else:
65 |             raise ValueError('Unsupported kernel function: %s' % self.function)
66 | 
67 | 
68 | class GTWRKernel(GWRKernel):
69 | 
70 |     def __init__(
71 |             self,
72 |             coords: np.ndarray,
73 |             t: np.ndarray,
74 |             bw: float = None,
75 |             tau: float = None,
76 |             fixed: bool = True,
77 |             function: str = 'triangular',
78 |             eps: float = 1.0000001):
79 | 
80 |         super(GTWRKernel, self).__init__(coords, bw, fixed=fixed, function=function, eps=eps)
81 | 
82 |         self.t = t
83 |         self.tau = tau
84 |         self.coords_new = None
85 | 
86 |     def cal_distance(
87 |             self,
88 |             i: int):
89 | 
90 |         if self.tau == 0:
91 |             self.coords_new = self.coords
92 |         else:
93 |             self.coords_new = np.hstack([self.coords, (np.sqrt(self.tau) * self.t)])
94 |         distance = cdist([self.coords_new[i]], self.coords_new).reshape(-1)
95 |         return distance
96 | 
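
To illustrate the fixed versus adaptive bandwidth logic in `cal_kernel`, here is a small standalone sketch that uses only the standard library. The coordinates, bandwidth values and the helper name `bisquare_weights` are made up for illustration, and only the bisquare branch is mirrored.

```python
import math

coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]  # illustrative points
EPS = 1.0000001  # same inflation factor as the default `eps` in the kernel classes


def bisquare_weights(i, bw, fixed=True):
    """Weights of every point relative to point i under a bisquare kernel."""
    dist = [math.dist(coords[i], p) for p in coords]
    if fixed:
        # Fixed kernel: bw is a distance
        bandwidth = float(bw)
    else:
        # Adaptive kernel: bw is a neighbour count, and the bandwidth is the
        # distance to the bw-th nearest point (point i itself counts as the first)
        bandwidth = sorted(dist)[int(bw) - 1] * EPS
    # Bisquare weights are truncated to zero at and beyond the bandwidth
    return [(1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0 for d in dist]


fixed_w = bisquare_weights(0, bw=2.5, fixed=True)
adaptive_w = bisquare_weights(0, bw=3, fixed=False)
print(fixed_w)
print(adaptive_w)
```

The small `EPS` factor keeps the bw-th neighbour just inside the bandwidth, so it receives a tiny positive weight instead of being truncated to zero.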


--------------------------------------------------------------------------------
/mgtwr/model.py:
--------------------------------------------------------------------------------
  1 | from typing import Union
  2 | import numpy as np
  3 | import pandas as pd
  4 | import multiprocessing as mp
  5 | from .kernel import GWRKernel, GTWRKernel
  6 | from .function import _compute_betas_gwr, surface_to_plane
  7 | from .obj import CalAicObj, CalMultiObj, BaseModel, GWRResults, GTWRResults, MGWRResults, MGTWRResults
  8 | from joblib import Parallel, delayed
  9 | 
 10 | 
 11 | class GWR(BaseModel):
 12 |     """
 13 |     Geographically Weighted Regression
 14 |     """
 15 |     def __init__(
 16 |             self,
 17 |             coords: Union[np.ndarray, pd.DataFrame],
 18 |             X: Union[np.ndarray, pd.DataFrame],
 19 |             y: Union[np.ndarray, pd.DataFrame, pd.Series],
 20 |             bw: float,
 21 |             kernel: str = 'bisquare',
 22 |             fixed: bool = True,
 23 |             constant: bool = True,
 24 |             thread: int = 1,
 25 |             convert: bool = False,
 26 |     ):
 27 |         """
 28 |         Parameters
 29 |         ----------
 30 |         coords        : array-like
 31 |                         n*2, spatial coordinates of the observations, if it's latitude and longitude,
 32 |                         the first column should be longitude
 33 | 
 34 |         X             : array-like
 35 |                         n*k, independent variable, excluding the constant
 36 | 
 37 |         y             : array-like
 38 |                         n*1, dependent variable
 39 | 
 40 |         bw            : scalar
 41 |                         bandwidth value consisting of either a distance or N
 42 |                         nearest neighbors; user specified or obtained using
 43 |                         sel
 44 | 
 45 |         kernel        : string
 46 |                         type of kernel function used to weight observations;
 47 |                         available options:
 48 |                         'gaussian'
 49 |                         'bisquare'
 50 |                         'exponential'
 51 | 
 52 |         fixed         : bool
 53 |                         True for distance based kernel function (default) and
 54 |                         False for adaptive (nearest neighbor) kernel function
 55 | 
 56 |         constant      : bool
 57 |                         True to include intercept (default) in model and False to exclude
 58 |                         intercept.
 59 | 
 60 |         thread        : int
 61 |                         number of parallel workers used for the local fits (default 1);
 62 |                         consider values greater than 1 for large datasets
 63 | 
 64 |         convert       : bool
 65 |                         Whether to convert latitude and longitude to plane coordinates.
 66 |         Examples
 67 |         --------
 68 |         import numpy as np
 69 |         from mgtwr.model import GWR
 70 |         np.random.seed(10)
 71 |         u = np.array([(i-1) % 12 for i in range(1, 1729)]).reshape(-1, 1)
 72 |         v = np.array([((i-1) % 144) // 12 for i in range(1, 1729)]).reshape(-1, 1)
 73 |         t = np.array([(i-1) // 144 for i in range(1, 1729)]).reshape(-1, 1)
 74 |         x1 = np.random.uniform(0, 1, (1728, 1))
 75 |         x2 = np.random.uniform(0, 1, (1728, 1))
 76 |         epsilon = np.random.randn(1728, 1)
 77 |         beta0 = 5
 78 |         beta1 = 3 + (u + v + t)/6
 79 |         beta2 = 3 + ((36-(6-u)**2)*(36-(6-v)**2)*(36-(6-t)**2)) / 128
 80 |         y = beta0 + beta1 * x1 + beta2 * x2 + epsilon
 81 |         coords = np.hstack([u, v])
 82 |         X = np.hstack([x1, x2])
 83 |         gwr = GWR(coords, X, y, 0.8, kernel='gaussian', fixed=True).fit()
 84 |         print(gwr.R2)
 85 |         0.7128737240047688
 86 |         """
 87 |         super(GWR, self).__init__(X, y, kernel, fixed, constant)
 88 |         if not isinstance(thread, int) or thread < 1:
 89 |             raise ValueError('thread should be an integer greater than or equal to 1')
 90 |         if isinstance(coords, pd.DataFrame):
 91 |             coords = coords.values
 92 |         self.coords = coords
 93 |         if convert:
 94 |             longitude = coords[:, 0]
 95 |             latitude = coords[:, 1]
 96 |             longitude, latitude = surface_to_plane(longitude, latitude)
 97 |             self.coords = np.hstack([longitude, latitude])
 98 |         self.bw = bw
 99 |         self.thread = thread
100 | 
101 |     def _build_wi(self, i, bw):
102 |         """
103 |         Calculate the weights of all observations relative to observation i
104 |         """
105 |         try:
106 |             gwr_kernel = GWRKernel(self.coords, bw, fixed=self.fixed, function=self.kernel)
107 |             distance = gwr_kernel.cal_distance(i)
108 |             wi = gwr_kernel.cal_kernel(distance)
109 |         except BaseException:
110 |             raise  # TypeError('Unsupported kernel function  ', kernel)
111 | 
112 |         return wi
113 | 
114 |     def cal_aic(self):
115 |         """
116 |         Used for calculating AICc, BIC, CV and so on.
117 |         """
118 |         if self.thread > 1:
119 |             result = list(zip(*Parallel(n_jobs=self.thread, pre_dispatch='all', batch_size=max(1, self.n // self.thread))(delayed(self._search_local_fit)(i) for i in range(self.n))))
120 |         else:
121 |             result = list(zip(*map(self._search_local_fit, range(self.n))))
122 |         err2 = np.array(result[0]).reshape(-1, 1)
123 |         hat = np.array(result[1]).reshape(-1, 1)
124 |         aa = np.sum(err2 / ((1 - hat) ** 2))
125 |         RSS = np.sum(err2)
126 |         tr_S = np.sum(hat)
127 |         llf = -np.log(RSS) * self.n / 2 - (1 + np.log(np.pi / self.n * 2)) * self.n / 2
128 | 
129 |         return CalAicObj(tr_S, float(llf), float(aa), self.n)
130 | 
131 |     def _search_local_fit(self, i):
132 |         wi = self._build_wi(i, self.bw).reshape(-1, 1)
133 |         betas, inv_xtx_xt = _compute_betas_gwr(self.y, self.X, wi)
134 |         predict = np.dot(self.X[i], betas)[0]
135 |         reside = self.y[i] - predict
136 |         influx = np.dot(self.X[i], inv_xtx_xt[:, i])
137 |         return reside * reside, influx
138 | 
139 |     def _local_fit(self, i):
140 |         wi = self._build_wi(i, self.bw).reshape(-1, 1)
141 |         betas, inv_xtx_xt = _compute_betas_gwr(self.y, self.X, wi)
142 |         predict = np.dot(self.X[i], betas)[0]
143 |         reside = self.y[i] - predict
144 |         influx = np.dot(self.X[i], inv_xtx_xt[:, i])
145 |         Si = np.dot(self.X[i], inv_xtx_xt).reshape(-1)
146 |         CCT = np.diag(np.dot(inv_xtx_xt, inv_xtx_xt.T)).reshape(-1)
147 |         Si2 = np.sum(Si ** 2)
148 |         return influx, reside, predict, betas.reshape(-1), CCT, Si2
149 | 
150 |     def _multi_fit(self, i):
151 |         wi = self._build_wi(i, self.bw).reshape(-1, 1)
152 |         betas, inv_xtx_xt = _compute_betas_gwr(self.y, self.X, wi)
153 |         pre = np.dot(self.X[i], betas)[0]
154 |         reside = self.y[i] - pre
155 |         return betas.reshape(-1), pre, reside
156 | 
157 |     def cal_multi(self):
158 |         """
159 |         Calculate betas, predicted values and residuals; used when searching for the best bandwidths in the MGWR model by backfitting.
160 |         """
161 |         if self.thread > 1:
162 |             result = list(zip(*Parallel(n_jobs=self.thread, pre_dispatch='all', batch_size=max(1, self.n // self.thread))(delayed(self._multi_fit)(i) for i in range(self.n))))
163 |         else:
164 |             result = list(zip(*map(self._multi_fit, range(self.n))))
165 |         betas = np.array(result[0])
166 |         pre = np.array(result[1]).reshape(-1, 1)
167 |         reside = np.array(result[2]).reshape(-1, 1)
168 |         return CalMultiObj(betas, pre, reside)
169 | 
170 |     def fit(self):
171 |         """
172 |         Fit the GWR model
173 |         """
174 |         if self.thread > 1:
175 |             result = list(zip(*Parallel(n_jobs=self.thread, pre_dispatch='all', batch_size=max(1, self.n // self.thread))(delayed(self._local_fit)(i) for i in range(self.n))))
176 |         else:
177 |             result = list(zip(*map(self._local_fit, range(self.n))))
178 |         influ = np.array(result[0]).reshape(-1, 1)
179 |         reside = np.array(result[1]).reshape(-1, 1)
180 |         predict_value = np.array(result[2]).reshape(-1, 1)
181 |         betas = np.array(result[3])
182 |         CCT = np.array(result[4])
183 |         tr_STS = np.array(result[5])
184 |         return GWRResults(self.coords, self.X, self.y, self.bw, self.kernel, self.fixed,
185 |                           influ, reside, predict_value, betas, CCT, tr_STS)
186 | 
187 | 
188 | class MGWR(GWR):
189 |     """
190 |     Multiscale Geographically Weighted Regression
191 |     """
192 |     def __init__(
193 |             self,
194 |             coords: np.ndarray,
195 |             X: np.ndarray,
196 |             y: np.ndarray,
197 |             selector,
198 |             kernel: str = 'bisquare',
199 |             fixed: bool = False,
200 |             constant: bool = True,
201 |             thread: int = 1,
202 |             convert: bool = False
203 |     ):
204 |         """
205 |         Parameters
206 |         ----------
207 |         coords        : array-like
208 |                         n*2, spatial coordinates of the observations, if it's latitude and longitude,
209 |                         the first column should be longitude
210 | 
211 |         X             : array-like
212 |                         n*k, independent variable, excluding the constant
213 | 
214 |         y             : array-like
215 |                         n*1, dependent variable
216 | 
217 |         selector      : SearchMGWRParameter object
218 |                         valid SearchMGWRParameter that has successfully called
219 |                         the "search" method. This parameter passes on
220 |                         information from GAM model estimation including optimal
221 |                         bandwidths.
222 | 
223 |         kernel        : string
224 |                         type of kernel function used to weight observations;
225 |                         available options:
226 |                         'gaussian'
227 |                         'bisquare'
228 |                         'exponential'
229 | 
230 |         fixed         : bool
231 |                         True for distance based kernel function and False for
232 |                         adaptive (nearest neighbor) kernel function (default)
233 | 
234 |         constant      : bool
235 |                         True to include intercept (default) in model and False to exclude
236 |                         intercept.
237 | 
238 |         thread        : int
239 |                         number of parallel workers used for the local fits (default 1);
240 |                         consider values greater than 1 for large datasets
241 | 
242 |         convert       : bool
243 |                         Whether to convert latitude and longitude to plane coordinates.
244 |         Examples
245 |         --------
246 |         import numpy as np
247 |         from mgtwr.sel import SearchMGWRParameter
248 |         from mgtwr.model import MGWR
249 |         np.random.seed(10)
250 |         u = np.array([(i-1) % 12 for i in range(1, 1729)]).reshape(-1, 1)
251 |         v = np.array([((i-1) % 144) // 12 for i in range(1, 1729)]).reshape(-1, 1)
252 |         t = np.array([(i-1) // 144 for i in range(1, 1729)]).reshape(-1, 1)
253 |         x1 = np.random.uniform(0, 1, (1728, 1))
254 |         x2 = np.random.uniform(0, 1, (1728, 1))
255 |         epsilon = np.random.randn(1728, 1)
256 |         beta0 = 5
257 |         beta1 = 3 + (u + v + t)/6
258 |         beta2 = 3 + ((36-(6-u)**2)*(36-(6-v)**2)*(36-(6-t)**2)) / 128
259 |         y = beta0 + beta1 * x1 + beta2 * x2 + epsilon
260 |         coords = np.hstack([u, v])
261 |         X = np.hstack([x1, x2])
262 |         sel_multi = SearchMGWRParameter(coords, X, y, kernel='gaussian', fixed=True)
263 |         bws = sel_multi.search(multi_bw_max=[40], verbose=True)
264 |         mgwr = MGWR(coords, X, y, sel_multi, kernel='gaussian', fixed=True).fit()
265 |         print(mgwr.R2)
266 |         0.7045642214972343
267 |         """
268 |         self.selector = selector
269 |         self.bws = self.selector.bws[0]  # final set of bandwidth
270 |         self.bws_history = selector.bws[1]  # bws history in back_fitting
271 |         self.betas = selector.bws[3]
272 |         bw_init = self.selector.bws[5]  # initialization bandwidth
273 |         super().__init__(
274 |             coords, X, y, bw_init, kernel=kernel, fixed=fixed, constant=constant, thread=thread, convert=convert)
275 |         self.n_chunks = None
276 |         self.ENP_j = None
277 | 
278 |     def _chunk_compute(self, chunk_id=0):
279 |         n = self.n
280 |         k = self.k
281 |         n_chunks = self.n_chunks
282 |         chunk_size = int(np.ceil(float(n / n_chunks)))
283 |         ENP_j = np.zeros(self.k)
284 |         CCT = np.zeros((self.n, self.k))
285 | 
286 |         chunk_index = np.arange(n)[chunk_id * chunk_size:(chunk_id + 1) * chunk_size]
287 |         init_pR = np.zeros((n, len(chunk_index)))
288 |         init_pR[chunk_index, :] = np.eye(len(chunk_index))
289 |         pR = np.zeros((n, len(chunk_index),
290 |                        k))  # partial R: n by chunk_size by k
291 | 
292 |         for i in range(n):
293 |             wi = self._build_wi(i, self.bw).reshape(-1, 1)
294 |             xT = (self.X * wi).T
295 |             P = np.linalg.solve(xT.dot(self.X), xT).dot(init_pR).T
296 |             pR[i, :, :] = P * self.X[i]
297 | 
298 |         err = init_pR - np.sum(pR, axis=2)  # n by chunk_size
299 | 
300 |         for iter_i in range(self.bws_history.shape[0]):
301 |             for j in range(k):
302 |                 pRj_old = pR[:, :, j] + err
303 |                 Xj = self.X[:, j]
304 |                 n_chunks_Aj = n_chunks
305 |                 chunk_size_Aj = int(np.ceil(float(n / n_chunks_Aj)))
306 |                 for chunk_Aj in range(n_chunks_Aj):
307 |                     chunk_index_Aj = np.arange(n)[chunk_Aj * chunk_size_Aj:(
308 |                                                                                    chunk_Aj + 1) * chunk_size_Aj]
309 |                     pAj = np.empty((len(chunk_index_Aj), n))
310 |                     for i in range(len(chunk_index_Aj)):
311 |                         index = chunk_index_Aj[i]
312 |                         wi = self._build_wi(index, self.bws_history[iter_i, j])
313 |                         xw = Xj * wi
314 |                         pAj[i, :] = Xj[index] / np.sum(xw * Xj) * xw
315 |                     pR[chunk_index_Aj, :, j] = pAj.dot(pRj_old)
316 |                 err = pRj_old - pR[:, :, j]
317 | 
318 |         for j in range(k):
319 |             CCT[:, j] += ((pR[:, :, j] / self.X[:, j].reshape(-1, 1)) ** 2).sum(
320 |                 axis=1)
321 |         for i in range(len(chunk_index)):
322 |             ENP_j += pR[chunk_index[i], i, :]
323 | 
324 |         return ENP_j, CCT
325 | 
326 |     def fit(self, n_chunks: int = 1, skip_calculate: bool = False):
327 |         """
328 |         Compute MGWR inference by chunk to reduce memory footprint.
329 |         Parameters
330 |         ----------
331 |         n_chunks       : int
332 |                          number of chunks the computation is split into to reduce memory consumption
333 |         skip_calculate : bool
334 |                          if True, skip calculating CCT, ENP and the other variables derived from them
335 |         """
336 |         pre = np.sum(self.X * self.betas, axis=1).reshape(-1, 1)
337 |         ENP_j = None
338 |         CCT = None
339 |         if not skip_calculate:
340 |             self.n_chunks = n_chunks
341 |             result = map(self._chunk_compute, (range(n_chunks)))
342 |             result_list = list(zip(*result))
343 |             ENP_j = np.sum(np.array(result_list[0]), axis=0)
344 |             CCT = np.sum(np.array(result_list[1]), axis=0)
345 |         return MGWRResults(
346 |             self.coords, self.X, self.y, self.bws, self.kernel, self.fixed,
347 |             self.bws_history, self.betas, pre, ENP_j, CCT)
348 | 
349 | 
350 | class GTWR(BaseModel):
351 |     """
352 |     Geographically and Temporally Weighted Regression
353 | 
354 |     Parameters
355 |     ----------
356 |     coords        : array-like
357 |                     n*2, collection of n sets of (x,y) coordinates of
358 |                     observations
359 | 
360 |     t             : array-like
361 |                     n*1, time location
362 | 
363 |     X             : array-like
364 |                     n*k, independent variable, excluding the constant
365 | 
366 |     y             : array-like
367 |                     n*1, dependent variable
368 | 
369 |     bw            : scalar
370 |                     bandwidth value consisting of either a distance or N
371 |                     nearest neighbors; user specified or obtained using
372 |                     sel
373 | 
374 |     tau           : scalar
375 |                     spatio-temporal scale
376 | 
377 |     kernel        : string
378 |                     type of kernel function used to weight observations;
379 |                     available options:
380 |                     'gaussian'
381 |                     'bisquare'
382 |                     'exponential'
383 | 
384 |     fixed         : bool
385 |                     True for distance based kernel function and
386 |                     False for adaptive (nearest neighbor) kernel function (default)
387 | 
388 |     constant      : bool
389 |                     True to include intercept (default) in model and False to exclude
390 |                     intercept.
391 | 
392 |     Examples
393 |     --------
394 |     import numpy as np
395 |     from mgtwr.model import GTWR
396 |     np.random.seed(10)
397 |     u = np.array([(i-1) % 12 for i in range(1, 1729)]).reshape(-1, 1)
398 |     v = np.array([((i-1) % 144) // 12 for i in range(1, 1729)]).reshape(-1, 1)
399 |     t = np.array([(i-1) // 144 for i in range(1, 1729)]).reshape(-1, 1)
400 |     x1 = np.random.uniform(0, 1, (1728, 1))
401 |     x2 = np.random.uniform(0, 1, (1728, 1))
402 |     epsilon = np.random.randn(1728, 1)
403 |     beta0 = 5
404 |     beta1 = 3 + (u + v + t)/6
405 |     beta2 = 3 + ((36-(6-u)**2)*(36-(6-v)**2)*(36-(6-t)**2)) / 128
406 |     y = beta0 + beta1 * x1 + beta2 * x2 + epsilon
407 |     coords = np.hstack([u, v])
408 |     X = np.hstack([x1, x2])
409 |     gtwr = GTWR(coords, t, X, y, 0.8, 1.9, kernel='gaussian', fixed=True).fit()
410 |     print(gtwr.R2)
411 |     0.9899869616636376
412 |     """
413 | 
414 |     def __init__(
415 |             self,
416 |             coords: Union[np.ndarray, pd.DataFrame],
417 |             t: Union[np.ndarray, pd.DataFrame],
418 |             X: Union[np.ndarray, pd.DataFrame],
419 |             y: Union[np.ndarray, pd.DataFrame],
420 |             bw: float,
421 |             tau: float,
422 |             kernel: str = 'gaussian',
423 |             fixed: bool = False,
424 |             constant: bool = True,
425 |             thread: int = 1,
426 |             convert: bool = False
427 |     ):
428 |         super(GTWR, self).__init__(X, y, kernel, fixed, constant)
429 |         if not isinstance(thread, int) or thread < 1:
430 |             raise ValueError('thread should be an integer greater than or equal to 1')
431 |         if isinstance(coords, pd.DataFrame):
432 |             coords = coords.values
433 |         self.coords = coords
434 |         if convert:
435 |             longitude = coords[:, 0]
436 |             latitude = coords[:, 1]
437 |             longitude, latitude = surface_to_plane(longitude, latitude)
438 |             self.coords = np.hstack([longitude, latitude])
439 |         self.t = t.values if isinstance(t, pd.DataFrame) else t
440 |         self.bw = bw
441 |         self.tau = tau
442 |         self.bw_s = self.bw
443 |         self.bw_t = np.sqrt(self.bw ** 2 / self.tau)
444 |         self.thread = thread
445 | 
446 |     def _build_wi(self, i, bw, tau):
447 |         """
448 |         Calculate the kernel weights of all observations relative to observation i.
449 |         """
450 |         try:
451 |             gtwr_kernel = GTWRKernel(self.coords, self.t, bw, tau, fixed=self.fixed, function=self.kernel)
452 |             distance = gtwr_kernel.cal_distance(i)
453 |             wi = gtwr_kernel.cal_kernel(distance)
454 |         except BaseException:
455 |             raise  # TypeError('Unsupported kernel function  ', kernel)
456 | 
457 |         return wi
458 | 
459 |     def cal_aic(self):
460 | 
461 |         """
462 |         Used for calculating AICc, BIC, CV and so on.
463 |         """
464 |         if self.thread > 1:
465 |             result = list(zip(*Parallel(n_jobs=self.thread, pre_dispatch='all', batch_size=max(1, self.n // self.thread))(delayed(self._search_local_fit)(i) for i in range(self.n))))
466 |         else:
467 |             result = list(zip(*map(self._search_local_fit, range(self.n))))
468 |         err2 = np.array(result[0]).reshape(-1, 1)
469 |         hat = np.array(result[1]).reshape(-1, 1)
470 |         aa = np.sum(err2 / ((1 - hat) ** 2))
471 |         RSS = np.sum(err2)
472 |         tr_S = np.sum(hat)
473 |         llf = -np.log(RSS) * self.n / 2 - (1 + np.log(np.pi / self.n * 2)) * self.n / 2
474 | 
475 |         return CalAicObj(tr_S, float(llf), float(aa), self.n)
476 | 
477 |     def _search_local_fit(self, i):
478 |         wi = self._build_wi(i, self.bw, self.tau).reshape(-1, 1)
479 |         betas, xtx_inv_xt = _compute_betas_gwr(self.y, self.X, wi)
480 |         predict = np.dot(self.X[i], betas)[0]
481 |         reside = self.y[i] - predict
482 |         influ = np.dot(self.X[i], xtx_inv_xt[:, i])
483 |         return reside * reside, influ
484 | 
485 |     def _local_fit(self, i):
486 |         wi = self._build_wi(i, self.bw, self.tau).reshape(-1, 1)
487 |         betas, xtx_inv_xt = _compute_betas_gwr(self.y, self.X, wi)
488 |         predict = np.dot(self.X[i], betas)[0]
489 |         reside = self.y[i] - predict
490 |         influ = np.dot(self.X[i], xtx_inv_xt[:, i])
491 |         Si = np.dot(self.X[i], xtx_inv_xt).reshape(-1)
492 |         CCT = np.diag(np.dot(xtx_inv_xt, xtx_inv_xt.T)).reshape(-1)
493 |         Si2 = np.sum(Si ** 2)
494 |         return influ, reside, predict, betas.reshape(-1), CCT, Si2
495 | 
496 |     def _multi_fit(self, i):
497 |         wi = self._build_wi(i, self.bw, self.tau).reshape(-1, 1)
498 |         betas, inv_xtx_xt = _compute_betas_gwr(self.y, self.X, wi)
499 |         pre = np.dot(self.X[i], betas)[0]
500 |         reside = self.y[i] - pre
501 |         return betas.reshape(-1), pre, reside
502 | 
503 |     def cal_multi(self):
504 |         """
505 |         Calculate betas, predicted values and residuals; used when searching for the best bandwidths of multiscale models by backfitting.
506 |         """
507 |         if self.thread > 1:
508 |             result = list(zip(*Parallel(n_jobs=self.thread, pre_dispatch='all', batch_size=max(1, self.n // self.thread))(delayed(self._multi_fit)(i) for i in range(self.n))))
509 |         else:
510 |             result = list(zip(*map(self._multi_fit, range(self.n))))
511 |         betas = np.array(result[0])
512 |         pre = np.array(result[1]).reshape(-1, 1)
513 |         reside = np.array(result[2]).reshape(-1, 1)
514 |         return CalMultiObj(betas, pre, reside)
515 | 
516 |     def fit(self):
517 |         """
518 |         Fit the GTWR model.
519 | 
520 |         """
521 |         if self.thread > 1:
522 |             result = list(zip(*Parallel(n_jobs=self.thread, pre_dispatch='all', batch_size=max(1, self.n // self.thread))(delayed(self._local_fit)(i) for i in range(self.n))))
523 |         else:
524 |             result = list(zip(*map(self._local_fit, range(self.n))))
525 |         influ = np.array(result[0]).reshape(-1, 1)
526 |         reside = np.array(result[1]).reshape(-1, 1)
527 |         predict_value = np.array(result[2]).reshape(-1, 1)
528 |         betas = np.array(result[3])
529 |         CCT = np.array(result[4])
530 |         tr_STS = np.array(result[5])
531 |         return GTWRResults(
532 |             self.coords, self.t, self.X, self.y, self.bw, self.tau, self.kernel, self.fixed,
533 |             influ, reside, predict_value, betas, CCT, tr_STS
534 |         )
535 | 
536 | 
537 | class MGTWR(GTWR):
538 |     """
539 |     Multiscale GTWR estimation and inference.
540 | 
541 |     Parameters
542 |     ----------
543 |     coords        : array-like
544 |                     n*2, collection of n sets of (x,y) coordinates of
545 |                     observations
546 | 
547 |     t             : array
548 |                     n*1, time location
549 | 
550 |     X             : array-like
551 |                     n*k, independent variable, excluding the constant
552 | 
553 |     y             : array-like
554 |                     n*1, dependent variable
555 | 
556 |     selector      : SearchMGTWRParameter object
557 |                     valid SearchMGTWRParameter object that has successfully called
558 |                     the "search" method. This parameter passes on
559 |                     information from GAM model estimation including optimal
560 |                     bandwidths.
561 | 
562 |     kernel        : string
563 |                     type of kernel function used to weight observations;
564 |                     available options:
565 |                     'gaussian'
566 |                     'bisquare'
567 |                     'exponential'
568 | 
569 |     fixed         : bool
570 |                     True for distance based kernel function and False for
571 |                     adaptive (nearest neighbor) kernel function (default)
572 | 
573 |     constant      : bool
574 |                     True to include intercept (default) in model and False to exclude
575 |                     intercept.
576 |     Examples
577 |     --------
578 |     import numpy as np
579 |     from mgtwr.sel import SearchMGTWRParameter
580 |     from mgtwr.model import MGTWR
581 |     np.random.seed(10)
582 |     u = np.array([(i-1) % 12 for i in range(1, 1729)]).reshape(-1, 1)
583 |     v = np.array([((i-1) % 144)//12 for i in range(1, 1729)]).reshape(-1, 1)
584 |     t = np.array([(i-1) // 144 for i in range(1, 1729)]).reshape(-1, 1)
585 |     x1 = np.random.uniform(0, 1, (1728, 1))
586 |     x2 = np.random.uniform(0, 1, (1728, 1))
587 |     epsilon = np.random.randn(1728, 1)
588 |     beta0 = 5
589 |     beta1 = 3 + (u + v + t)/6
590 |     beta2 = 3 + ((36-(6-u)**2)*(36-(6-v)**2)*(36-(6-t)**2)) / 128
591 |     y = beta0 + beta1 * x1 + beta2 * x2 + epsilon
592 |     coords = np.hstack([u, v])
593 |     X = np.hstack([x1, x2])
594 |     sel_multi = SearchMGTWRParameter(coords, t, X, y, kernel='gaussian', fixed=True)
595 |     bws = sel_multi.search(multi_bw_min=[0.1], verbose=True, tol_multi=1.0e-4)
596 |     mgtwr = MGTWR(coords, t, X, y, sel_multi, kernel='gaussian', fixed=True).fit()
597 |     print(mgtwr.R2)
598 |     0.9972924820674222
599 |     """
600 | 
601 |     def __init__(
602 |             self,
603 |             coords: np.ndarray,
604 |             t: np.ndarray,
605 |             X: np.ndarray,
606 |             y: np.ndarray,
607 |             selector,
608 |             kernel: str = 'bisquare',
609 |             fixed: bool = False,
610 |             constant: bool = True,
611 |             thread: int = 1,
612 |             convert: bool = False
613 |     ):
614 |         self.selector = selector
615 |         self.bws = self.selector.bws[0]  # final set of bandwidth
616 |         self.taus = self.selector.bws[1]
617 |         self.bw_ts = np.sqrt(self.bws ** 2 / self.taus)
618 |         self.bws_history = selector.bws[2]  # bws history in back_fitting
619 |         self.taus_history = selector.bws[3]
620 |         self.betas = selector.bws[5]
621 |         bw_init = self.selector.bws[7]  # initialization bandwidth
622 |         tau_init = self.selector.bws[8]
623 |         super().__init__(coords, t, X, y, bw_init, tau_init,
624 |                          kernel=kernel, fixed=fixed, constant=constant, thread=thread, convert=convert)
625 |         self.n_chunks = None
626 |         self.ENP_j = None
627 | 
628 |     def _chunk_compute(self, chunk_id=0):
629 |         n = self.n
630 |         k = self.k
631 |         n_chunks = self.n_chunks
632 |         chunk_size = int(np.ceil(float(n / n_chunks)))
633 |         ENP_j = np.zeros(self.k)
634 |         CCT = np.zeros((self.n, self.k))
635 | 
636 |         chunk_index = np.arange(n)[chunk_id * chunk_size:(chunk_id + 1) * chunk_size]
637 |         init_pR = np.zeros((n, len(chunk_index)))
638 |         init_pR[chunk_index, :] = np.eye(len(chunk_index))
639 |         pR = np.zeros((n, len(chunk_index),
640 |                        k))  # partial R: n by chunk_size by k
641 | 
642 |         for i in range(n):
643 |             wi = self._build_wi(i, self.bw, self.tau).reshape(-1, 1)
644 |             xT = (self.X * wi).T
645 |             P = np.linalg.solve(xT.dot(self.X), xT).dot(init_pR).T
646 |             pR[i, :, :] = P * self.X[i]
647 | 
648 |         err = init_pR - np.sum(pR, axis=2)  # n by chunk_size
649 | 
650 |         for iter_i in range(self.bws_history.shape[0]):
651 |             for j in range(k):
652 |                 pRj_old = pR[:, :, j] + err
653 |                 Xj = self.X[:, j]
654 |                 n_chunks_Aj = n_chunks
655 |                 chunk_size_Aj = int(np.ceil(float(n / n_chunks_Aj)))
656 |                 for chunk_Aj in range(n_chunks_Aj):
657 |                     chunk_index_Aj = np.arange(n)[chunk_Aj * chunk_size_Aj:(
658 |                                                                                    chunk_Aj + 1) * chunk_size_Aj]
659 |                     pAj = np.empty((len(chunk_index_Aj), n))
660 |                     for i in range(len(chunk_index_Aj)):
661 |                         index = chunk_index_Aj[i]
662 |                         wi = self._build_wi(index, self.bws_history[iter_i, j],
663 |                                             self.taus_history[iter_i, j])
664 |                         xw = Xj * wi
665 |                         pAj[i, :] = Xj[index] / np.sum(xw * Xj) * xw
666 |                     pR[chunk_index_Aj, :, j] = pAj.dot(pRj_old)
667 |                 err = pRj_old - pR[:, :, j]
668 | 
669 |         for j in range(k):
670 |             CCT[:, j] += ((pR[:, :, j] / self.X[:, j].reshape(-1, 1)) ** 2).sum(
671 |                 axis=1)
672 |         for i in range(len(chunk_index)):
673 |             ENP_j += pR[chunk_index[i], i, :]
674 | 
675 |         return ENP_j, CCT
676 | 
677 |     def fit(self, n_chunks: int = 1, skip_calculate: bool = False):
678 |         """
679 |         Compute MGTWR inference by chunk to reduce memory footprint.
680 |         Parameters
681 |         ----------
682 |         n_chunks       : int
683 |                          number of chunks the computation is split into to reduce memory consumption
684 |         skip_calculate : bool
685 |                          if True, skip calculating CCT, ENP and the other variables derived from them
686 |         """
687 |         pre = np.sum(self.X * self.betas, axis=1).reshape(-1, 1)
688 |         ENP_j = None
689 |         CCT = None
690 |         if not skip_calculate:
691 |             self.n_chunks = n_chunks
692 |             result = map(self._chunk_compute, (range(n_chunks)))
693 |             result_list = list(zip(*result))
694 |             ENP_j = np.sum(np.array(result_list[0]), axis=0)
695 |             CCT = np.sum(np.array(result_list[1]), axis=0)
696 |         return MGTWRResults(
697 |             self.coords, self.t, self.X, self.y, self.bws, self.taus, self.kernel, self.fixed, self.bw_ts,
698 |             self.bws_history, self.taus_history, self.betas, pre, ENP_j, CCT)
699 | 


--------------------------------------------------------------------------------
/mgtwr/obj.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import pandas as pd
  3 | from typing import Union
  4 | 
  5 | 
  6 | class CalAicObj:
  7 | 
  8 |     def __init__(self, tr_S, llf, aa, n):
  9 |         self.tr_S = tr_S
 10 |         self.llf = llf
 11 |         self.aa = aa
 12 |         self.n = n
 13 | 
 14 | 
 15 | class CalMultiObj:
 16 | 
 17 |     def __init__(self, betas, pre, reside):
 18 |         self.betas = betas
 19 |         self.pre = pre
 20 |         self.reside = reside
 21 | 
 22 | 
 23 | class BaseModel:
 24 |     """
 25 |     Is the parent class of most models
 26 |     """
 27 |     def __init__(
 28 |             self,
 29 |             X: Union[np.ndarray, pd.DataFrame, pd.Series],
 30 |             y: Union[np.ndarray, pd.DataFrame, pd.Series],
 31 |             kernel: str,
 32 |             fixed: bool,
 33 |             constant: bool,
 34 |     ):
 35 |         self.X = X.values if isinstance(X, (pd.DataFrame, pd.Series)) else X
 36 |         self.y = y.values if isinstance(y, (pd.DataFrame, pd.Series)) else y
 37 |         if len(y.shape) > 1 and y.shape[1] != 1:
 38 |             raise ValueError('Label should be one-dimensional arrays')
 39 |         if len(y.shape) == 1:
 40 |             self.y = self.y.reshape(-1, 1)
 41 |         self.kernel = kernel
 42 |         self.fixed = fixed
 43 |         self.constant = constant
 44 |         self.n = X.shape[0]
 45 |         if self.constant:
 46 |             if len(self.X.shape) == 1 and np.all(self.X == 1):
 47 |                 raise ValueError("You've already passed in a constant sequence, use constant=False instead")
 48 |             for j in range(self.X.shape[1]):
 49 |                 if np.all(self.X[:, j] == 1):
 50 |                     raise ValueError("You've already passed in a constant sequence, use constant=False instead")
 51 |             self.X = np.hstack([np.ones((self.n, 1)), X])
 52 |         self.k = self.X.shape[1]
 53 | 
 54 | 
 55 | class Results(BaseModel):
 56 |     """
 57 |     Is the result parent class of all models
 58 |     """
 59 | 
 60 |     def __init__(
 61 |             self,
 62 |             X: Union[np.ndarray, pd.DataFrame],
 63 |             y: Union[np.ndarray, pd.Series],
 64 |             kernel: str,
 65 |             fixed: bool,
 66 |             influ: np.ndarray,
 67 |             reside,
 68 |             predict_value: np.ndarray,
 69 |             betas: np.ndarray,
 70 |             tr_STS: float
 71 |     ):
 72 |         super(Results, self).__init__(X, y, kernel, fixed, constant=False)
 73 |         self.influ = influ
 74 |         self.reside = reside
 75 |         self.predict_value = predict_value
 76 |         self.betas = betas
 77 |         self.tr_S = np.sum(influ)
 78 |         self.ENP = self.tr_S
 79 |         self.tr_STS = tr_STS
 80 |         self.TSS = np.sum((self.y - np.mean(self.y)) ** 2)
 81 |         self.RSS = np.sum(reside ** 2)
 82 |         self.sigma2 = self.RSS / (self.n - self.tr_S)
 83 |         self.std_res = self.reside / (np.sqrt(self.sigma2 * (1.0 - self.influ)))
 84 |         self.cooksD = self.std_res ** 2 * self.influ / (self.tr_S * (1.0 - self.influ))
 85 |         self.df_model = self.n - self.tr_S
 86 |         self.df_reside = self.n - 2.0 * self.tr_S + self.tr_STS
 87 |         self.R2 = 1 - self.RSS / self.TSS
 88 |         self.adj_R2 = 1 - (1 - self.R2) * (self.n - 1) / (self.n - self.ENP - 1)
 89 |         self.llf = -np.log(self.RSS) * self.n / 2 - (1 + np.log(np.pi / self.n * 2)) * self.n / 2
 90 |         self.aic = -2.0 * self.llf + 2.0 * (self.tr_S + 1)
 91 |         self.aicc = self.aic + 2.0 * self.tr_S * (self.tr_S + 1.0) / (self.n - self.tr_S - 1.0)
 92 |         self.bic = -2.0 * self.llf + (self.k + 1) * np.log(self.n)
 93 | 
 94 | 
 95 | class GWRResults(Results):
 96 | 
 97 |     def __init__(
 98 |             self, coords, X, y, bw, kernel, fixed, influ, reside, predict_value, betas, CCT, tr_STS
 99 |     ):
100 |         """
101 |         betas               : array
102 |                               n*k, estimated coefficients
103 | 
104 |         predict_value       : array
105 |                               n*1, predicted y values
106 | 
107 |         CCT                 : array
108 |                               n*k, scaled variance-covariance matrix
109 | 
110 |         df_model            : integer
111 |                               model degrees of freedom
112 | 
113 |         df_reside           : integer
114 |                               residual degrees of freedom
115 | 
116 |         reside              : array
117 |                               n*1, residuals of the response
118 | 
119 |         RSS                 : scalar
120 |                               residual sum of squares
121 | 
125 |         ENP                 : scalar
126 |                               effective number of parameters, which depends on
127 |                               sigma2
128 | 
129 |         tr_S                : float
130 |                               trace of S (hat) matrix
131 | 
132 |         tr_STS              : float
133 |                               trace of STS matrix
134 | 
135 |         R2                  : float
136 |                               R-squared for the entire model (1- RSS/TSS)
137 | 
138 |         adj_R2              : float
139 |                               adjusted R-squared for the entire model
140 | 
141 |         aic                 : float
142 |                               Akaike information criterion
143 | 
144 |         aicc                : float
145 |                               corrected Akaike information criterion
146 |                               to account for model complexity (smaller
147 |                               bandwidths)
148 | 
149 |         bic                 : float
150 |                               Bayesian information criterion
151 | 
152 |         sigma2              : float
153 |                               sigma squared (residual variance) that has been
154 |                               corrected to account for the ENP
155 | 
156 |         std_res             : array
157 |                               n*1, standardised residuals
158 | 
159 |         bse                 : array
160 |                               n*k, standard errors of parameters (betas)
161 | 
162 |         influ               : array
163 |                               n*1, leading diagonal of S matrix
164 | 
165 |         cooksD              : array
166 |                               n*1, Cook's D
167 | 
168 |         tvalues             : array
169 |                               n*k, local t-statistics
170 | 
171 |         llf                 : scalar
172 |                               log-likelihood of the full model; see
173 |                               pysal.contrib.glm.family for family-specific
174 |                               log-likelihoods
175 |         """
176 | 
177 |         super(GWRResults, self).__init__(
178 |             X, y, kernel, fixed, influ, reside, predict_value, betas, tr_STS)
179 |         self.coords = coords
180 |         self.bw = bw
181 |         self.CCT = CCT * self.sigma2
182 |         self.bse = np.sqrt(self.CCT)
183 |         self.tvalues = self.betas / self.bse
184 | 
185 | 
186 | class GTWRResults(Results):
187 | 
188 |     def __init__(
189 |             self, coords, t, X, y, bw, tau, kernel, fixed, influ, reside, predict_value, betas, CCT, tr_STS
190 |     ):
191 |         """
192 |         tau         : scalar
193 |                       spatio-temporal scale
194 |         bw_s        : scalar
195 |                       spatial bandwidth
196 |         bw_t        : scalar
197 |                       temporal bandwidth
198 |         See Also GWRResults
199 |         """
200 | 
201 |         super(GTWRResults, self).__init__(X, y, kernel, fixed, influ, reside, predict_value, betas, tr_STS)
202 |         self.coords = coords
203 |         self.t = t
204 |         self.bw = bw
205 |         self.tau = tau
206 |         self.bw_s = self.bw
207 |         self.bw_t = np.sqrt(self.bw ** 2 / self.tau)
208 |         self.CCT = CCT * self.sigma2
209 |         self.bse = np.sqrt(self.CCT)
210 |         self.tvalues = self.betas / self.bse
211 | 
212 | 
213 | class MGWRResults(BaseModel):
214 | 
215 |     def __init__(self, coords, X, y, bws, kernel, fixed, bws_history, betas,
216 |                  predict_value, ENP_j, CCT):
217 |         """
218 |         bws         : array-like
219 |                       corresponding spatial bandwidth of all variables
220 |         ENP_j       : array-like
221 |                       effective number of parameters, which depends on
222 |                       sigma2, for each covariate in the model
223 | 
224 |         See Also GWRResults
225 |         """
226 |         super(MGWRResults, self).__init__(X, y, kernel, fixed, constant=False)
227 |         self.coords = coords
228 |         self.bws = bws
229 |         self.bws_history = bws_history
230 |         self.predict_value = predict_value
231 |         self.betas = betas
232 |         self.reside = self.y - self.predict_value
233 |         self.TSS = np.sum((self.y - np.mean(self.y)) ** 2)
234 |         self.RSS = np.sum(self.reside ** 2)
235 |         self.R2 = 1 - self.RSS / self.TSS
236 |         self.llf = -np.log(self.RSS) * self.n / 2 - (1 + np.log(np.pi / self.n * 2)) * self.n / 2
237 |         self.bic = -2.0 * self.llf + (self.k + 1) * np.log(self.n)
238 |         if ENP_j is not None:
239 |             self.ENP_j = ENP_j
240 |             self.tr_S = np.sum(self.ENP_j)
241 |             self.ENP = self.tr_S
242 |             self.sigma2 = self.RSS / (self.n - self.tr_S)
243 |             self.CCT = CCT * self.sigma2
244 |             self.bse = np.sqrt(self.CCT)
245 |             self.t_values = self.betas / self.bse
246 |             self.df_model = self.n - self.tr_S
247 |             self.adj_R2 = 1 - (1 - self.R2) * (self.n - 1) / (self.n - self.ENP - 1)
248 |             self.aic = -2.0 * self.llf + 2.0 * (self.tr_S + 1)
249 |             self.aic_c = self.aic + 2.0 * self.tr_S * (self.tr_S + 1.0) / (self.n - self.tr_S - 1.0)
250 | 
251 | 
252 | class MGTWRResults(MGWRResults):
253 | 
254 |     def __init__(self, coords, t, X, y, bws, taus, kernel, fixed, bw_ts, bws_history, taus_history, betas,
255 |                  predict_value, ENP_j, CCT):
256 |         """
257 |         taus        : array-like
258 |                      corresponding spatio-temporal scale of all variables
259 |         bws         : array-like
260 |                      corresponding spatial bandwidth of all variables
261 |         bw_ts       : array-like
262 |                      corresponding temporal bandwidth of all variables
263 |         See Also
264 |         -------------
265 |         MGWRResults
266 |         GWRResults
267 |         """
268 |         super(MGTWRResults, self).__init__(
269 |             coords, X, y, bws, kernel, fixed, bws_history, betas, predict_value, ENP_j, CCT)
270 |         self.t = t
271 |         self.taus = taus
272 |         self.bw_ts = bw_ts
273 |         self.taus_history = taus_history
274 | 
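
The goodness-of-fit attributes set in `Results.__init__` are simple functions of `RSS`, `TSS`, `n` and `tr_S`. As a rough standalone sketch (the function name `diagnostics` and the input numbers are made up for illustration and are not part of the package), the same formulas can be evaluated with the standard library alone:

```python
import math


def diagnostics(rss, tss, n, tr_s):
    """Evaluate the R2, adjusted R2, log-likelihood, AIC and AICc formulas
    used in Results.__init__ from scalar inputs (illustrative helper)."""
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - tr_s - 1)
    # Gaussian log-likelihood with the error variance estimated as RSS / n.
    llf = -math.log(rss) * n / 2 - (1 + math.log(math.pi / n * 2)) * n / 2
    aic = -2.0 * llf + 2.0 * (tr_s + 1)
    # Small-sample correction added on top of the AIC.
    aicc = aic + 2.0 * tr_s * (tr_s + 1.0) / (n - tr_s - 1.0)
    return {"R2": r2, "adj_R2": adj_r2, "llf": llf, "aic": aic, "aicc": aicc}


# Made-up inputs: RSS = 25, TSS = 100, 100 observations, trace of S = 9.
result = diagnostics(25.0, 100.0, 100, 9.0)
for name, value in result.items():
    print(name, value)
```

In the package itself `tr_S` is the sum of the leverages `influ`, so it is generally not an integer. The correction term in `aicc` grows as `tr_S` approaches `n`.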


--------------------------------------------------------------------------------
/mgtwr/sel.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | from typing import Union
  3 | import pandas as pd
  4 | from .diagnosis import get_AICc, get_AIC, get_BIC, get_CV
  5 | from .obj import BaseModel
  6 | from scipy.spatial.distance import pdist
  7 | from .model import GWR, GTWR
  8 | from .function import golden_section, surface_to_plane, print_time, twostep_golden_section, multi_bw, multi_bws
  9 | 
 10 | getDiag = {'AICc': get_AICc, 'AIC': get_AIC, 'BIC': get_BIC, 'CV': get_CV}
 11 | 
 12 | delta = 0.38197
 13 | 
 14 | 
 15 | class SearchGWRParameter(BaseModel):
 16 |     """
 17 |     Select bandwidth for GWR model
 18 | 
 19 |     Parameters
 20 |     ----------
 21 |     coords        : array-like
 22 |                     n*2, collection of n sets of (x,y) coordinates of
 23 |                     observations
 24 | 
 25 |     X             : array-like
 26 |                     n*k, independent variable, excluding the constant
 27 | 
 28 |     y             : array-like
 29 |                     n*1, dependent variable
 30 | 
 31 |     kernel        : string
 32 |                     type of kernel function used to weight observations;
 33 |                     available options:
 34 |                     'gaussian'
 35 |                     'bisquare'
 36 |                     'exponential'
 37 | 
 38 |     fixed         : boolean
 39 |                     True for distance-based kernel function and False for
 40 |                     adaptive (nearest neighbor) kernel function (default)
 41 | 
 42 |     constant      : boolean
 43 |                     True to include intercept (default) in model and False to exclude
 44 |                     intercept.
 45 | 
 46 |     Examples
 47 |     --------
 48 |     import numpy as np
 49 |     from mgtwr.sel import SearchGWRParameter
 50 |     np.random.seed(1)
 51 |     u = np.array([(i-1) % 12 for i in range(1, 1729)]).reshape(-1, 1)
 52 |     v = np.array([((i-1) % 144) // 12 for i in range(1, 1729)]).reshape(-1, 1)
 53 |     t = np.array([(i-1) // 144 for i in range(1, 1729)]).reshape(-1, 1)
 54 |     x1 = np.random.uniform(0, 1, (1728, 1))
 55 |     x2 = np.random.uniform(0, 1, (1728, 1))
 56 |     epsilon = np.random.randn(1728, 1)
 57 |     beta0 = 5
 58 |     beta1 = 3 + (u + v + t)/6
 59 |     beta2 = 3 + ((36-(6-u)**2)*(36-(6-v)**2)*(36-(6-t)**2)) / 128
 60 |     y = beta0 + beta1 * x1 + beta2 * x2 + epsilon
 61 |     coords = np.hstack([u, v])
 62 |     X = np.hstack([x1, x2])
 63 |     sel = SearchGWRParameter(coords, X, y, kernel='gaussian', fixed=True)
 64 |     bw = sel.search(bw_max=40, verbose=True)
 65 |     2.0
 66 |     """
 67 | 
 68 |     def __init__(
 69 |             self,
 70 |             coords: Union[np.ndarray, pd.DataFrame],
 71 |             X: Union[np.ndarray, pd.DataFrame],
 72 |             y: Union[np.ndarray, pd.DataFrame],
 73 |             kernel: str = 'exponential',
 74 |             fixed: bool = False,
 75 |             constant: bool = True,
 76 |             convert: bool = False,
 77 |             thread: int = 1
 78 |     ):
 79 | 
 80 |         super(SearchGWRParameter, self).__init__(X, y, kernel, fixed, constant)
 81 |         if isinstance(coords, pd.DataFrame):
 82 |             coords = coords.values
 83 |         self.coords = coords
 84 |         if convert:
 85 |             longitude = coords[:, 0]
 86 |             latitude = coords[:, 1]
 87 |             longitude, latitude = surface_to_plane(longitude, latitude)
 88 |             self.coords = np.column_stack([longitude, latitude])
 89 |         self.int_score = not self.fixed
 90 |         self.thread = thread
 91 | 
 92 |     @print_time
 93 |     def search(self,
 94 |                criterion: str = 'AICc',
 95 |                bw_min: float = None,
 96 |                bw_max: float = None,
 97 |                tol: float = 1.0e-6,
 98 |                bw_decimal: int = 0,
 99 |                max_iter: int = 200,
100 |                verbose: bool = True,
101 |                time_cost: bool = False
102 |                ):
103 |         """
104 |         Method to select one unique bandwidth for a GWR model.
105 | 
106 |         Parameters
107 |         ----------
108 |         criterion      : string
109 |                          bw selection criterion: 'AICc', 'AIC', 'BIC', 'CV'
110 |         bw_min         : float
111 |                          min value used in bandwidth search
112 |         bw_max         : float
113 |                          max value used in bandwidth search
114 |         tol            : float
115 |                          tolerance used to determine convergence
116 |         max_iter       : integer
117 |                          max iterations if no convergence to tol
118 | 
119 |         bw_decimal     : int
120 |                          number of decimal places kept for the bandwidth during the search
121 | 
122 |         verbose        : bool
123 |                          If True, the bandwidth search history is printed; default is True.
124 |         time_cost      : bool
125 |                          If true, print run time
126 |         """
127 | 
128 |         def gwr_func(bw):
129 |             return getDiag[criterion](GWR(
130 |                 self.coords, self.X, self.y, bw, kernel=self.kernel,
131 |                 fixed=self.fixed, constant=False, thread=self.thread).cal_aic())
132 | 
133 |         bw_min, bw_max = self._init_section(bw_min, bw_max)
134 |         bw = golden_section(bw_min, bw_max, delta, bw_decimal, gwr_func, tol, max_iter, verbose)
135 |         return bw
136 | 
137 |     def _init_section(self, bw_min, bw_max):
138 |         if bw_min is not None and bw_max is not None:
139 |             return bw_min, bw_max
140 | 
141 |         if len(self.X) > 0:
142 |             n_glob = self.X.shape[1]
143 |         else:
144 |             n_glob = 0
145 |         if self.constant:
146 |             n_vars = n_glob + 1
147 |         else:
148 |             n_vars = n_glob
149 |         n = np.array(self.coords).shape[0]
150 | 
151 |         if self.int_score:
152 |             a = 40 + 2 * n_vars
153 |             c = n
154 |         else:
155 |             try:
156 |                 coords = np.unique(self.coords, axis=0)
157 |                 dists = pdist(coords)  # Euclidean (not squared) pairwise distances
158 |                 a = np.min(dists) / 2.0
159 |                 c = np.max(dists)
160 |             except MemoryError:
161 |                 # Note that the value obtained in this way is not the maximum distance of all points,
162 |                 # but the upper bound of the search has little effect on the results of the model
163 |                 coords = sorted(self.coords, key=lambda x: x[0] ** 2 + x[1] ** 2)
164 |                 a = pdist(coords[:2])[0]
165 |                 c = pdist([coords[0], coords[-1]])[0]
166 |         if bw_min is None:
167 |             bw_min = a
168 |         if bw_max is None:
169 |             bw_max = c
170 | 
171 |         return bw_min, bw_max
172 | 
173 | 
174 | class SearchMGWRParameter(BaseModel):
175 |     """Select one bandwidth per covariate for an MGWR model."""
176 |     def __init__(
177 |             self,
178 |             coords: Union[np.ndarray, pd.DataFrame],
179 |             X: Union[np.ndarray, pd.DataFrame],
180 |             y: Union[np.ndarray, pd.DataFrame],
181 |             kernel: str = 'exponential',
182 |             fixed: bool = False,
183 |             constant: bool = True,
184 |             convert: bool = False,
185 |             thread: int = 1
186 |     ):
187 | 
188 |         super(SearchMGWRParameter, self).__init__(X, y, kernel, fixed, constant)
189 |         if isinstance(coords, pd.DataFrame):
190 |             coords = coords.values
191 |         self.coords = coords
192 |         if convert:
193 |             longitude = coords[:, 0]
194 |             latitude = coords[:, 1]
195 |             longitude, latitude = surface_to_plane(longitude, latitude)
196 |             self.coords = np.column_stack([longitude, latitude])
197 |         self.int_score = not self.fixed
198 |         self.thread = thread
199 |         self.criterion = None
200 |         self.bws = None
201 |         self.tol = None
202 |         self.bw_decimal = None
203 | 
204 |     @print_time
205 |     def search(
206 |             self,
207 |             criterion: str = 'AICc',
208 |             bw_min: float = None,
209 |             bw_max: float = None,
210 |             tol: float = 1.0e-6,
211 |             bw_decimal: int = 1,
212 |             init_bw: float = None,
213 |             multi_bw_min: list = None,
214 |             multi_bw_max: list = None,
215 |             tol_multi: float = 1.0e-5,
216 |             bws_same_times: int = 5,
217 |             verbose: bool = False,
218 |             rss_score: bool = False,
219 |             time_cost: bool = False
220 |             ):
221 |         """
222 |         Method to select a bandwidth vector, one bandwidth per covariate, for
223 |         an MGWR model using the back-fitting algorithm.
224 | 
225 |         Parameters
226 |         ----------
227 |         criterion      : string
228 |                          bw selection criterion: 'AICc', 'AIC', 'BIC', 'CV'
229 |         bw_min         : float
230 |                          min value used in bandwidth search
231 |         bw_max         : float
232 |                          max value used in bandwidth search
233 |         multi_bw_min   : list
234 |                          min values used for each covariate in mgwr bandwidth search.
235 |                          Must be either a single value or have one value for
236 |                          each covariate including the intercept
237 |         multi_bw_max   : list
238 |                          max values used for each covariate in mgwr bandwidth
239 |                          search. Must be either a single value or have one value
240 |                          for each covariate including the intercept
241 |         tol            : float
242 |                          tolerance used to determine convergence
243 |         bw_decimal     : int
244 |                         The number of bw decimal places reserved
245 |         init_bw        : float
246 |                          None (default) to initialize MGWR with a bandwidth
247 |                          derived from GWR. Otherwise this value is used as the
248 |                          bandwidth to initialize MGWR with.
249 |         tol_multi      : convergence tolerance for the multiple bandwidth
250 |                          back fitting algorithm; a larger tolerance may stop the
251 |                          algorithm faster though it may result in a less optimal
252 |                          model
253 |         bws_same_times : If bandwidths keep the same between iterations for
254 |                          bws_same_times (default 5) in backfitting, then use the
255 |                          current set of bandwidths as final bandwidths.
256 |         rss_score      : True to use the residual sum of squares to evaluate
257 |                          each iteration of the multiple bandwidth back fitting
258 |                          routine and False to use a smooth function; default is
259 |                          False
260 |         verbose        : Boolean
261 |                          If true, bandwidth searching history is printed out; default is False.
262 |         time_cost      : bool
263 |                         If true, print run time
264 |         """
265 |         self.criterion = criterion
266 |         self.tol = tol
267 |         self.bw_decimal = bw_decimal
268 |         if multi_bw_min is not None:
269 |             if len(multi_bw_min) == self.k:
270 |                 multi_bw_min = multi_bw_min
271 |             elif len(multi_bw_min) == 1:
272 |                 multi_bw_min = multi_bw_min * self.k
273 |             else:
274 |                 raise AttributeError(
275 |                     "multi_bw_min must be either a list containing"
276 |                     " a single entry or a list containing an entry for each of k"
277 |                     " covariates including the intercept")
278 |         else:
279 |             a = self._init_section(bw_min, bw_max)[0]
280 |             multi_bw_min = [a] * self.k
281 | 
282 |         if multi_bw_max is not None:
283 |             if len(multi_bw_max) == self.k:
284 |                 multi_bw_max = multi_bw_max
285 |             elif len(multi_bw_max) == 1:
286 |                 multi_bw_max = multi_bw_max * self.k
287 |             else:
288 |                 raise AttributeError(
289 |                     "multi_bw_max must be either a list containing"
290 |                     " a single entry or a list containing an entry for each of k"
291 |                     " covariates including the intercept")
292 |         else:
293 |             c = self._init_section(bw_min, bw_max)[1]
294 |             multi_bw_max = [c] * self.k
295 | 
296 |         self.bws = multi_bw(init_bw, self.X, self.y, self.n, self.k, tol_multi,
297 |                             rss_score, self.gwr_func, self.bw_func, self.sel_func, multi_bw_min, multi_bw_max,
298 |                             bws_same_times, verbose=verbose)
299 |         return self.bws
300 | 
301 |     def gwr_func(self, X, y, bw):
302 |         res = GWR(self.coords, X, y, bw, kernel=self.kernel,
303 |                   fixed=self.fixed, constant=False, thread=self.thread).cal_multi()
304 |         return res
305 | 
306 |     def bw_func(self, X, y):
307 |         selector = SearchGWRParameter(self.coords, X, y, kernel=self.kernel, fixed=self.fixed,
308 |                                       constant=False, thread=self.thread)
309 |         return selector
310 | 
311 |     def sel_func(self, bw_func, bw_min=None, bw_max=None):
312 |         return bw_func.search(criterion=self.criterion, bw_min=bw_min, bw_max=bw_max,
313 |                               tol=self.tol, bw_decimal=self.bw_decimal, verbose=False)
314 | 
315 |     def _init_section(self, bw_min, bw_max):
316 | 
317 |         a = bw_min if bw_min is not None else 0
318 |         if bw_max is not None:
319 |             c = bw_max
320 |         else:
321 |             c = max(np.max(self.coords[:, 0]) - np.min(self.coords[:, 0]),
322 |                     np.max(self.coords[:, 1]) - np.min(self.coords[:, 1]))
323 |         return a, c
324 | 
325 | 
326 | class SearchGTWRParameter(BaseModel):
327 |     """
328 |     Select bandwidth for GTWR model
329 | 
330 |     Parameters
331 |     ----------
332 |     coords        : array-like
333 |                     n*2, collection of n sets of (x,y) coordinates of
334 |                     observations
335 | 
336 |     t             : array-like
337 |                     n*1, time location
338 | 
339 |     X             : array-like
340 |                     n*k, independent variable, excluding the constant
341 | 
342 |     y             : array-like
343 |                     n*1, dependent variable
344 | 
345 |     kernel        : string
346 |                     type of kernel function used to weight observations;
347 |                     available options:
348 |                     'gaussian'
349 |                     'bisquare'
350 |                     'exponential'
351 | 
352 |     fixed         : boolean
353 |                     True for distance-based kernel function and False for
354 |                     adaptive (nearest neighbor) kernel function (default)
355 | 
356 |     constant      : boolean
357 |                     True to include intercept (default) in model and False to exclude
358 |                     intercept.
359 | 
360 |     Examples
361 |     --------
362 |     import numpy as np
363 |     from mgtwr.sel import SearchGTWRParameter
364 |     np.random.seed(1)
365 |     u = np.array([(i-1) % 12 for i in range(1, 1729)]).reshape(-1, 1)
366 |     v = np.array([((i-1) % 144)//12 for i in range(1, 1729)]).reshape(-1, 1)
367 |     t = np.array([(i-1) // 144 for i in range(1, 1729)]).reshape(-1, 1)
368 |     x1 = np.random.uniform(0, 1, (1728, 1))
369 |     x2 = np.random.uniform(0, 1, (1728, 1))
370 |     epsilon = np.random.randn(1728, 1)
371 |     beta0 = 5
372 |     beta1 = 3 + (u + v + t)/6
373 |     beta2 = 3 + ((36-(6-u)**2)*(36-(6-v)**2)*(36-(6-t)**2)) / 128
374 |     y = beta0 + beta1 * x1 + beta2 * x2 + epsilon
375 |     coords = np.hstack([u, v])
376 |     X = np.hstack([x1, x2])
377 |     sel = SearchGTWRParameter(coords, t, X, y, kernel='gaussian', fixed=True)
378 |     bw, tau = sel.search(tau_max=20, verbose=True)
379 |     0.9, 1.5
380 |     """
381 | 
382 |     def __init__(
383 |             self,
384 |             coords: Union[np.ndarray, pd.DataFrame],
385 |             t: np.ndarray,
386 |             X: np.ndarray,
387 |             y: np.ndarray,
388 |             kernel: str = 'exponential',
389 |             fixed: bool = False,
390 |             constant: bool = True,
391 |             convert: bool = False,
392 |             thread: int = 1
393 |     ):
394 | 
395 |         super(SearchGTWRParameter, self).__init__(X, y, kernel, fixed, constant)
396 |         if isinstance(coords, pd.DataFrame):
397 |             coords = coords.values
398 |         self.coords = coords
399 |         if convert:
400 |             longitude = coords[:, 0]
401 |             latitude = coords[:, 1]
402 |             longitude, latitude = surface_to_plane(longitude, latitude)
403 |             self.coords = np.column_stack([longitude, latitude])
404 |         self.t = t
405 |         self.int_score = not self.fixed
406 |         self.thread = thread
407 | 
408 |     @print_time
409 |     def search(
410 |             self,
411 |             criterion: str = 'AICc',
412 |             bw_min: float = None,
413 |             bw_max: float = None,
414 |             tau_min: float = None,
415 |             tau_max: float = None,
416 |             tol: float = 1.0e-6,
417 |             bw_decimal: int = 1,
418 |             tau_decimal: int = 1,
419 |             max_iter: int = 200,
420 |             verbose: bool = False,
421 |             time_cost: bool = False
422 |             ):
423 |         """
424 |         Method to select one unique bandwidth and spatio-temporal scale for a GTWR model.
425 | 
426 |         Parameters
427 |         ----------
428 |         criterion      : string
429 |                          bw selection criterion: 'AICc', 'AIC', 'BIC', 'CV'
430 |         bw_min         : float
431 |                          min value used in bandwidth search
432 |         bw_max         : float
433 |                          max value used in bandwidth search
434 |         tau_min        : float
435 |                          min value used in spatio-temporal scale search
436 |         tau_max        : float
437 |                          max value used in spatio-temporal scale search
438 |         tol            : float
439 |                          tolerance used to determine convergence
440 |         max_iter       : integer
441 |                          max iterations if no convergence to tol
442 |         bw_decimal     : int
443 |                          number of decimal places kept for the bandwidth during the search
444 |         tau_decimal    : int
445 |                          number of decimal places kept for the spatio-temporal scale during the search
446 |         verbose        : Boolean
447 |                          If true, bandwidth searching history is printed out; default is False.
448 |         time_cost      : bool
449 |                         If true, print run time
450 |         """
451 | 
452 |         def gtwr_func(bw, tau):
453 |             return getDiag[criterion](GTWR(
454 |                 self.coords, self.t, self.X, self.y, bw, tau, kernel=self.kernel,
455 |                 fixed=self.fixed, constant=False, thread=self.thread).cal_aic())
456 | 
457 |         bw_min, bw_max, tau_min, tau_max = self._init_section(bw_min, bw_max, tau_min, tau_max)
458 |         bw, tau = twostep_golden_section(bw_min, bw_max, tau_min, tau_max, delta, gtwr_func, tol, max_iter, bw_decimal,
459 |                                          tau_decimal, verbose)
460 | 
461 |         return bw, tau
462 | 
463 |     def _init_section(self, bw_min, bw_max, tau_min, tau_max):
464 |         if (bw_min is not None) and (bw_max is not None) and (tau_min is not None) and (tau_max is not None):
465 |             return bw_min, bw_max, tau_min, tau_max
466 |         if len(self.X) > 0:
467 |             n_glob = self.X.shape[1]
468 |         else:
469 |             n_glob = 0
470 |         if self.constant:
471 |             n_vars = n_glob + 1
472 |         else:
473 |             n_vars = n_glob
474 |         n = np.array(self.coords).shape[0]
475 | 
476 |         if self.int_score:
477 |             a = 40 + 2 * n_vars
478 |             c = n
479 |         else:
480 |             try:
481 |                 coords = np.unique(self.coords, axis=0)
482 |                 dists = pdist(coords)  # Euclidean (not squared) pairwise distances
483 |                 a = np.min(dists) / 2.0
484 |                 c = np.max(dists)
485 |             except MemoryError:
486 |                 # Note that the value obtained in this way is not the maximum distance of all points,
487 |                 # but the upper bound of the search has little effect on the results of the model
488 |                 coords = sorted(self.coords, key=lambda x: x[0] ** 2 + x[1] ** 2)
489 |                 a = pdist(coords[:2])[0]
490 |                 c = pdist([coords[0], coords[-1]])[0]
491 | 
492 |         if bw_min is None:
493 |             bw_min = a
494 |         if bw_max is None:
495 |             bw_max = c
496 | 
497 |         if tau_min is None:
498 |             tau_min = 0
499 |         if tau_max is None:
500 |             tau_max = 4
501 |         return bw_min, bw_max, tau_min, tau_max
502 | 
503 | 
504 | class SearchMGTWRParameter(BaseModel):
505 |     """
506 |     Select bandwidths and spatio-temporal scales for MGTWR model
507 | 
508 |     Parameters
509 |     ----------
510 |     coords        : array-like
511 |                     n*2, collection of n sets of (x,y) coordinates of
512 |                     observations
513 | 
514 |     t             : array-like
515 |                     n*1, time location
516 | 
517 |     X             : array-like
518 |                     n*k, independent variable, excluding the constant
519 | 
520 |     y             : array-like
521 |                     n*1, dependent variable
522 | 
523 |     kernel        : string
524 |                     type of kernel function used to weight observations;
525 |                     available options:
526 |                     'gaussian'
527 |                     'bisquare'
528 |                     'exponential'
529 | 
530 |     fixed         : bool
531 |                     True for distance-based kernel function and False for
532 |                     adaptive (nearest neighbor) kernel function (default)
533 | 
534 |     constant      : bool
535 |                     True to include intercept (default) in model and False to exclude
536 |                     intercept.
537 | 
538 |     Examples
539 |     --------
540 |     import numpy as np
541 |     from mgtwr.sel import SearchMGTWRParameter
542 |     from mgtwr.model import MGTWR
543 |     np.random.seed(10)
544 |     u = np.array([(i-1) % 12 for i in range(1, 1729)]).reshape(-1, 1)
545 |     v = np.array([((i-1) % 144)//12 for i in range(1, 1729)]).reshape(-1, 1)
546 |     t = np.array([(i-1) // 144 for i in range(1, 1729)]).reshape(-1, 1)
547 |     x1 = np.random.uniform(0, 1, (1728, 1))
548 |     x2 = np.random.uniform(0, 1, (1728, 1))
549 |     epsilon = np.random.randn(1728, 1)
550 |     beta0 = 5
551 |     beta1 = 3 + (u + v + t)/6
552 |     beta2 = 3 + ((36-(6-u)**2)*(36-(6-v)**2)*(36-(6-t)**2)) / 128
553 |     y = beta0 + beta1 * x1 + beta2 * x2 + epsilon
554 |     coords = np.hstack([u, v])
555 |     X = np.hstack([x1, x2])
556 |     sel_multi = SearchMGTWRParameter(coords, t, X, y, kernel='gaussian', fixed=True)
557 |     bws = sel_multi.search(multi_bw_min=[0.1], verbose=True, tol_multi=1.0e-4)
558 |     mgtwr = MGTWR(coords, t, X, y, sel_multi, kernel='gaussian', fixed=True).fit()
559 |     print(mgtwr.R2)
560 |     0.9972924820674222
561 |     """
562 |     def __init__(
563 |             self,
564 |             coords: Union[np.ndarray, pd.DataFrame],
565 |             t: np.ndarray,
566 |             X: np.ndarray,
567 |             y: np.ndarray,
568 |             kernel: str = 'exponential',
569 |             fixed: bool = False,
570 |             constant: bool = True,
571 |             convert: bool = False,
572 |             thread: int = 1
573 |     ):
574 | 
575 |         super(SearchMGTWRParameter, self).__init__(X, y, kernel, fixed, constant)
576 |         if isinstance(coords, pd.DataFrame):
577 |             coords = coords.values
578 |         self.coords = coords
579 |         if convert:
580 |             longitude = coords[:, 0]
581 |             latitude = coords[:, 1]
582 |             longitude, latitude = surface_to_plane(longitude, latitude)
583 |             self.coords = np.column_stack([longitude, latitude])
584 |         self.t = t
585 |         self.int_score = not self.fixed
586 |         self.thread = thread
587 |         self.criterion = None
588 |         self.bws = None
589 |         self.tol = None
590 |         self.bw_decimal = None
591 |         self.tau_decimal = None
592 | 
593 |     @print_time
594 |     def search(
595 |             self,
596 |             criterion: str = 'AICc',
597 |             bw_min: float = None,
598 |             bw_max: float = None,
599 |             tau_min: float = None,
600 |             tau_max: float = None,
601 |             tol: float = 1.0e-6,
602 |             bw_decimal: int = 1,
603 |             tau_decimal: int = 1,
604 |             init_bw: float = None,
605 |             init_tau: float = None,
606 |             multi_bw_min: list = None,
607 |             multi_bw_max: list = None,
608 |             multi_tau_min: list = None,
609 |             multi_tau_max: list = None,
610 |             tol_multi: float = 1.0e-5,
611 |             verbose: bool = False,
612 |             rss_score: bool = False,
613 |             time_cost: bool = False
614 |             ):
615 |         """
616 |         Method to select a bandwidth vector and a spatio-temporal scale vector,
617 |         one entry per covariate, for an MGTWR model.
618 | 
619 |         Parameters
620 |         ----------
621 |         criterion      : string
622 |                          bw selection criterion: 'AICc', 'AIC', 'BIC', 'CV'
623 |         bw_min         : float
624 |                          min value used in bandwidth search
625 |         bw_max         : float
626 |                          max value used in bandwidth search
627 |         tau_min        : float
628 |                          min value used in spatio-temporal scale search
629 |         tau_max        : float
630 |                          max value used in spatio-temporal scale search
631 |         multi_bw_min   : list
632 |                          min values used for each covariate in mgtwr bandwidth search.
633 |                          Must be either a single value or have one value for
634 |                          each covariate including the intercept
635 |         multi_bw_max   : list
636 |                          max values used for each covariate in mgtwr bandwidth
637 |                          search. Must be either a single value or have one value
638 |                          for each covariate including the intercept
639 |         multi_tau_min  : list
640 |                          min values used for each covariate in mgtwr spatio-temporal scale
641 |                          search. Must be either a single value or have one value
642 |                          for each covariate including the intercept
643 |         multi_tau_max  : list
644 |                          max values used for each covariate in mgtwr spatio-temporal scale search.
645 |                          Must be either a single value or one value per covariate including the intercept
646 |         tol            : float
647 |                          tolerance used to determine convergence
648 |         bw_decimal     : int
649 |                         The number of bw decimal places reserved
650 |         tau_decimal    : int
651 |                         The number of tau decimal places reserved
652 |         init_bw        : float
653 |                          None (default) to initialize MGTWR with a bandwidth
654 |                          derived from GTWR. Otherwise this value is used as the
655 |                          bandwidth to initialize MGTWR with.
656 |         init_tau       : float
657 |                          None (default) to initialize MGTWR with a spatio-temporal scale
658 |                          derived from GTWR. Otherwise this value is used as the
659 |                          spatio-temporal scale to initialize MGTWR with.
660 |         tol_multi      : convergence tolerance for the multiple bandwidth
661 |                          back fitting algorithm; a larger tolerance may stop the
662 |                          algorithm faster though it may result in a less optimal
663 |                          model
664 |         rss_score      : True to use the residual sum of squares to evaluate
665 |                          each iteration of the multiple bandwidth back fitting
666 |                          routine and False to use a smooth function; default is
667 |                          False
668 |         verbose        : Boolean
669 |                          If true, bandwidth searching history is printed out; default is False.
670 |         time_cost      : bool
671 |                         If true, print run time
672 |         """
673 |         self.criterion = criterion
674 |         self.tol = tol
675 |         self.bw_decimal = bw_decimal
676 |         self.tau_decimal = tau_decimal
677 |         if multi_bw_min is not None:
678 |             if len(multi_bw_min) == self.k:
679 |                 multi_bw_min = multi_bw_min
680 |             elif len(multi_bw_min) == 1:
681 |                 multi_bw_min = multi_bw_min * self.k
682 |             else:
683 |                 raise AttributeError(
684 |                     "multi_bw_min must be either a list containing"
685 |                     " a single entry or a list containing an entry for each of k"
686 |                     " covariates including the intercept")
687 |         else:
688 |             a = self._init_section(bw_min, bw_max, tau_min, tau_max)[0]
689 |             multi_bw_min = [a] * self.k
690 | 
691 |         if multi_bw_max is not None:
692 |             if len(multi_bw_max) == self.k:
693 |                 multi_bw_max = multi_bw_max
694 |             elif len(multi_bw_max) == 1:
695 |                 multi_bw_max = multi_bw_max * self.k
696 |             else:
697 |                 raise AttributeError(
698 |                     "multi_bw_max must be either a list containing"
699 |                     " a single entry or a list containing an entry for each of k"
700 |                     " covariates including the intercept")
701 |         else:
702 |             c = self._init_section(bw_min, bw_max, tau_min, tau_max)[1]
703 |             multi_bw_max = [c] * self.k
704 | 
705 |         if multi_tau_min is not None:
706 |             if len(multi_tau_min) == self.k:
707 |                 multi_tau_min = multi_tau_min
708 |             elif len(multi_tau_min) == 1:
709 |                 multi_tau_min = multi_tau_min * self.k
710 |             else:
711 |                 raise AttributeError(
712 |                     "multi_tau_min must be either a list containing"
713 |                     " a single entry or a list containing an entry for each of k"
714 |                     " covariates including the intercept")
715 |         else:
716 |             A = self._init_section(bw_min, bw_max, tau_min, tau_max)[2]
717 |             multi_tau_min = [A] * self.k
718 | 
719 |         if multi_tau_max is not None:
720 |             if len(multi_tau_max) == self.k:
721 |                 multi_tau_max = multi_tau_max
722 |             elif len(multi_tau_max) == 1:
723 |                 multi_tau_max = multi_tau_max * self.k
724 |             else:
725 |                 raise AttributeError(
726 |                     "multi_tau_max must be either a list containing"
727 |                     " a single entry or a list containing an entry for each of k"
728 |                     " covariates including the intercept")
729 |         else:
730 |             C = self._init_section(bw_min, bw_max, tau_min, tau_max)[3]
731 |             multi_tau_max = [C] * self.k
732 | 
733 |         self.bws = multi_bws(init_bw, init_tau, self.X, self.y, self.n, self.k, tol_multi,
734 |                              rss_score, self.gtwr_func, self.bw_func, self.sel_func, multi_bw_min, multi_bw_max,
735 |                              multi_tau_min, multi_tau_max, verbose=verbose)
736 |         return self.bws
737 | 
738 |     def gtwr_func(self, X, y, bw, tau):
739 |         return GTWR(self.coords, self.t, X, y, bw, tau, kernel=self.kernel,
740 |                     fixed=self.fixed, constant=False, thread=self.thread).cal_multi()
741 | 
742 |     def bw_func(self, X, y):
743 |         selector = SearchGTWRParameter(self.coords, self.t, X, y, kernel=self.kernel, fixed=self.fixed,
744 |                                        constant=False, thread=self.thread)
745 |         return selector
746 | 
747 |     def sel_func(self, bw_func, bw_min=None, bw_max=None, tau_min=None, tau_max=None):
748 |         return bw_func.search(criterion=self.criterion, bw_min=bw_min, bw_max=bw_max, tau_min=tau_min, tau_max=tau_max,
749 |                               tol=self.tol, bw_decimal=self.bw_decimal, tau_decimal=self.tau_decimal, verbose=False)
750 | 
751 |     def _init_section(self, bw_min, bw_max, tau_min, tau_max):
752 |         """Return the default search bounds as (bw_min, bw_max, tau_min, tau_max)."""
753 |         a = bw_min if bw_min is not None else 0
754 |         if bw_max is not None:
755 |             c = bw_max
756 |         else:
757 |             c = max(np.max(self.coords[:, 0]) - np.min(self.coords[:, 0]),
758 |                     np.max(self.coords[:, 1]) - np.min(self.coords[:, 1]))
759 | 
760 |         A = tau_min if tau_min is not None else 0
761 |         C = tau_max if tau_max is not None else 4
762 | 
763 |         return a, c, A, C
764 | 


--------------------------------------------------------------------------------
/mgtwr/setup.py:
--------------------------------------------------------------------------------
 1 | import setuptools
 2 | 
 3 | setuptools.setup(
 4 |     name="mgtwr",
 5 |     version="2.0.5",
 6 |     long_description="To fit geographically weighted model, "
 7 |                      "multiscale geographically weighted regression model, "
 8 |                      "geographically and temporally weighted regression model and "
 9 |                      "multiscale geographically and temporally weighted regression model.",
10 |     author="Kun Sun",
11 |     author_email="849024477@qq.com",
12 |     packages=['mgtwr'],
13 |     url="https://github.com/sunkun1997/mgtwr",
14 | )
15 | 


--------------------------------------------------------------------------------