├── .gitignore ├── .ipynb_checkpoints └── Time_Series_Generation_Examples-checkpoint.ipynb ├── License.txt ├── README.md ├── Time_Series_Generation_Examples.ipynb ├── setup.py ├── tsBNgen ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-38.pyc │ └── tsBNgen.cpython-38.pyc └── tsBNgen.py └── tsbngen.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | tsBNgen.egg-info 2 | dist 3 | build 4 | .git -------------------------------------------------------------------------------- /.ipynb_checkpoints/Time_Series_Generation_Examples-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Examples Companion with Documentation\n", 8 | "\n", 9 | "**Description**\n", 10 | "> #### tsBNgen is a Python library to generate time series data based on an arbitrary dynamic Bayesian network. The intention behind tsBNgen is to let researchers generate time series data according to any model they choose. \n", 11 | "\n", 12 | "> #### tsBNgen is released under the MIT license. \n", 13 | "\n", 14 | "**Instruction** \n", 15 | "\n", 16 | "> #### 1. 
Either clone this repository https://github.com/manitadayon/tsBNgen or install the package using pip install tsBNgen.\n", 17 | "> #### Then import the necessary libraries using the following commands:\n", 18 | "```python\n", 19 | "from tsBNgen import *\n", 20 | "from tsBNgen.tsBNgen import * \n", 21 | "```\n", 22 | "> #### In general, there are two functions to run when generating data:\n", 23 | "\n", 24 | "> - BN_data_gen(): Use this function under the following conditions:\n", 25 | " the custom_time variable is not specified and the loopback value for every variable is at most 1.\n", 26 | " \n", 27 | "---- \n", 28 | "\n", 29 | "**Note**: condition 1 describes the classical dynamic Bayesian network in which some nodes at time t-1 are connected to themselves at time t. \n", 30 | "\n", 31 | "> - BN_sample_gen_loopback(): \n", 32 | " 1. custom_time is not specified and you want the loopback value for some nodes to be at most 2.\n", 33 | " 2. custom_time is specified and it is at least equal to the maximum loopback value in loopbacks2.\n", 34 | "\n", 35 | "> #### The following explains some of the variables and parameters in tsBNgen:\n", 36 | "> - **T** : Length of each time series.\n", 37 | "> - **N** : Number of samples.\n", 38 | "> - **N_level** : list. Number of possible levels for discrete nodes.\n", 39 | "> - **Mat** : data-frame. Adjacency matrix for each time point.\n", 40 | "> - **Node_Type** : list. Type of each variable in Bayesian Network.\n", 41 | "> - **CPD** : dict. Conditional Probability Distribution for initial time point.\n", 42 | "> - **Parent** : dict. Identifying parent of each node in Bayesian network at initial time.\n", 43 | "> - **CPD2** : dict. Conditional Probability Distribution.\n", 44 | "> - **Parent2** : dict. Identifying parent of each node in Bayesian network.\n", 45 | "> - **loopbacks** : dict. Describing the temporal interconnection between nodes.\n", 46 | "> - **CPD3** : dict. 
Conditional Probability Distribution. Use this entry when BN_sample_gen_loopback() is called.\n", 47 | "> - **Parent3** : dict. Identifying parent of each node in Bayesian network. Use this entry when \n", 48 | "> BN_sample_gen_loopback() is called.\n", 49 | "> - **loopbacks2** : dict. Describing the temporal interconnection between nodes. Use this entry when \n", 50 | "> BN_sample_gen_loopback() is called.\n", 51 | " \n", 52 | " \n" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "### Import Necessary Files" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 2, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "from tsBNgen import *\n", 69 | "from tsBNgen.tsBNgen import * " 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "## Architecture 1\n", 77 | "\n", 78 | "### Two Discrete and One Continuous Nodes." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "T=20\n", 88 | "N=2000\n", 89 | "N_level=[2,4]\n", 90 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n", 91 | "Node_Type=['D','D','C']\n", 92 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 93 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 94 | "}}\n", 95 | "Parent={'0':[],'1':[0],'2':[0,1]}\n", 96 | "\n", 97 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n", 98 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 99 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 100 | "}}\n", 101 | "\n", 102 | 
"Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n", 103 | "loopbacks={'00':[1],'11':[1]}\n", 104 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n", 105 | "Time_series1=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 106 | "Time_series1.BN_data_gen()" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 4, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "name": "stdout", 116 | "output_type": "stream", 117 | "text": [ 118 | "[1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1]\n", 119 | "[1, 1, 2, 2, 4, 4, 4, 3, 2, 3, 4, 2, 2, 4, 4, 3, 4, 3, 2, 1]\n", 120 | "[7.808894461133653, 11.32725466660192, 45.60586730086184, 49.417565048817615, 91.96224489111819, 90.60949019682866, 90.56676813839593, 63.41098040448917, 34.83786809684297, 69.03588167756895, 95.48196749276904, 22.059789834337842, 52.4413443699694, 89.17064709084299, 90.74716862100209, 75.25321342376877, 93.21277916279126, 53.755213101836375, 21.726853444296204, 11.167892171240036]\n" 121 | ] 122 | } 123 | ], 124 | "source": [ 125 | "print(Time_series1.BN_Nodes[0][3])\n", 126 | "print(Time_series1.BN_Nodes[1][3])\n", 127 | "print(Time_series1.BN_Nodes[2][3])" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "## Architecture 2\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 5, 140 | "metadata": { 141 | "scrolled": true 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "T=10\n", 146 | "N=1000\n", 147 | "N_level=[2,4]\n", 148 | "Mat=pd.DataFrame(np.array(([0,1,0],[0,0,1],[0,0,0])))\n", 149 | "Node_Type=['D','D','C']\n", 150 | "CPD={'0':[0.5,0.5],'01':[[0.6,0.3,0.05,0.05],[0.1,0.2,0.3,0.4]],'12':{'mu0':10,'sigma0':5,'mu1':30,'sigma1':5,\n", 151 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n", 152 | "Parent={'0':[],'1':[0],'2':[1]}\n", 153 | "\n", 154 | "CPD2={'00':[[0.7,0.3],[0.3,0.7]],'0011':[[0.7,0.2,0.1,0],[0.5,0.4,0.1,0],[0.45,0.45,0.1,0],\n", 155 | 
"[0.3,0.4,0.2,0.1],[0.4,0.4,0.1,0.1],[0.2,0.3,0.3,0.2],[0.2,0.3,0.3,0.2],[0.1,0.2,0.3,0.4],[0.3,0.4,0.2,0.1],[0.2,0.2,0.4,0.2],\n", 156 | " [0.2,0.1,0.4,0.3],[0.05,0.15,0.3,0.5],[0.1,0.3,0.3,0.3],[0,0.1,0.3,0.6],[0,0.1,0.2,0.7],[0,0,0.3,0.7]],'112':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':2,\n", 157 | " 'mu2':50,'sigma2':2,'mu3':60,'sigma3':5,'mu4':20,'sigma4':2,'mu5':25,'sigma5':5,'mu6':50,'sigma6':5,'mu7':60,'sigma7':5,\n", 158 | " 'mu8':40,'sigma8':5,'mu9':50,'sigma9':5,'mu10':70,'sigma10':5,'mu11':85,'sigma11':2,'mu12':60,'sigma12':5, \n", 159 | " 'mu13':60,'sigma13':5,'mu14':80,'sigma14':3,'mu15':90,'sigma15':3}}\n", 160 | "\n", 161 | "Parent2={'0':[0],'1':[0,0,1],'2':[1,1]}\n", 162 | "loopbacks={'00':[1], '01':[1],'11':[1],'12':[1]}\n", 163 | "\n", 164 | "Time_series2=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 165 | "Time_series2.BN_data_gen()" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 6, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "name": "stdout", 175 | "output_type": "stream", 176 | "text": [ 177 | "[2, 2, 2, 2, 2, 1, 1, 2, 2, 2]\n", 178 | "[3, 4, 3, 4, 3, 3, 2, 1, 4, 4]\n", 179 | "[49.95557580237343, 81.91935377141675, 80.02776603979912, 82.7514263833196, 78.09782664417831, 62.4090858819203, 43.79221621872338, 18.10747482296797, 52.72102411795976, 91.61301738516033]\n" 180 | ] 181 | } 182 | ], 183 | "source": [ 184 | "print(Time_series2.BN_Nodes[0][3])\n", 185 | "print(Time_series2.BN_Nodes[1][3])\n", 186 | "print(Time_series2.BN_Nodes[2][3])" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "## Architecture 3\n", 194 | "\n", 195 | "### Similar to Architecture 1 but with loopback 2 for the middle node." 
196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 8, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "T=10\n", 205 | "N=1000\n", 206 | "N_level=[2,4]\n", 207 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n", 208 | "Node_Type=['D','D','C']\n", 209 | "\n", 210 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 211 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 212 | "}}\n", 213 | "Parent={'0':[],'1':[0],'2':[0,1]}\n", 214 | "\n", 215 | "\n", 216 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n", 217 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 218 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 219 | "}}\n", 220 | "\n", 221 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n", 222 | "\n", 223 | "loopbacks={'00':[1],'11':[1]}\n", 224 | "\n", 225 | "CPD3={'00':[[0.7,0.3],[0.2,0.8]],'0111':[[0.7,0.2,0.1,0],[0.6,0.3,0.1,0],[0.3,0.5,0.2,0],\n", 226 | "[0.3,0.4,0.15,0.15],[0.5,0.4,0.05,0.05],[0.5,0.4,0.05,0.05],[0.25,0.45,0.15,0.15],[0.2,0.4,0.3,0.1],[0.3,0.5,0.2,0],[0.25,0.45,0.15,0.15],\n", 227 | "[0.1,0.45,0.3,0.15],[0.05,0.45,0.3,0.2],[0.3,0.4,0.15,0.15],[0.2,0.4,0.3,0.1],[0.05,0.45,0.3,0.2],[0.1,0.3,0.4,0.2],[0.35,0.35,0.2,0.1],[0.25,0.45,0.2,0.1],[0.1,0.2,0.5,0.2],[0.05,0.25,0.5,0.2],\n", 228 | "[0.25,0.45,0.2,0.1],[0.05,0.35,0.5,0.1],[0.05,0.25,0.45,0.25],[0.05,0.2,0.35,0.4],[0.1,0.2,0.5,0],[0.05,0.25,0.45,0.25],[0.05,0.15,0.3,0.5],\n", 229 | " [0.05,0.1,0.3,0.55],[0.05,0.25,0.5,0.2],[0.05,0.2,0.35,0.4],[0.05,0.1,0.3,0.55],[0,0,0.2,0.8]],'012':{'mu0':10,'sigma0':2,'mu1':20,'sigma1':3,\n", 230 | " 
'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':75,'sigma6':3,'mu7':90,'sigma7':3\n", 231 | "}}\n", 232 | "\n", 233 | "Parent3={'0':[0],'1':[0,1,1],'2':[0,1]}\n", 234 | "\n", 235 | "loopbacks2={'00':[1],'11':[2,1]}\n", 236 | "\n", 237 | "\n", 238 | "Time_series3=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks,CPD3,Parent3,loopbacks2)\n", 239 | "\n", 240 | "Time_series3.BN_sample_gen_loopback()" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 9, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "[2, 1, 1, 2, 2, 2, 1, 1, 2, 2]\n", 253 | "[2, 1, 1, 1, 2, 3, 4, 3, 4, 2]\n", 254 | "[53.87472154213914, 6.495627168596188, 9.418392120518126, 19.646516517998936, 44.525380117118665, 75.87802769068614, 75.68517757174538, 56.29952829580641, 93.33191414647266, 45.862343348068705]\n" 255 | ] 256 | } 257 | ], 258 | "source": [ 259 | "print(Time_series3.BN_Nodes[0][1])\n", 260 | "print(Time_series3.BN_Nodes[1][1])\n", 261 | "print(Time_series3.BN_Nodes[2][1])" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "## Architecture 4\n", 269 | "\n", 270 | "### 3 Discrete and 2 Continuous Nodes. 
Each Node is Connected to Itself Across Time" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 13, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "T=10\n", 280 | "N=1000\n", 281 | "N_level=[2,2,2]\n", 282 | "Mat=pd.DataFrame(np.array(([0,1,1,1,1],[0,0,1,1,1],[0,0,0,1,1],[0,0,0,0,0],[0,0,0,0,0]))) \n", 283 | "Node_Type=['D','D','D','C','C']\n", 284 | "\n", 285 | "CPD={'0':[0.6,0.4],'01':[[0.7,0.3],[0.3,0.7]],'012':[[0.9,0.1],[0.4,0.6],[0.6,0.4],[0.1,0.9]],\n", 286 | " '0123':{'mu0':5,'sigma0':2,'mu1':10,'sigma1':3,'mu2':20,'sigma2':2,'mu3':50,'sigma3':3,'mu4':20,'sigma4':2,'mu5':40,'sigma5':3,'mu6':50,'sigma6':5,'mu7':80,'sigma7':3,\n", 287 | " },'0124':{'mu0':500,'sigma0':10,'mu1':480,'sigma1':13,'mu2':450,'sigma2':10,'mu3':400,'sigma3':13,'mu4':400,'sigma4':10,'mu5':300,'sigma5':10,'mu6':250,'sigma6':10,'mu7':100,'sigma7':5}}\n", 288 | "\n", 289 | "Parent={'0':[],'1':[0],'2':[0,1],'3':[0,1,2],'4':[0,1,2]}\n", 290 | "\n", 291 | "CPD2={'00':[[0.6,0.4],[0.2,0.8]],'011':[[0.8,0.2],[0.6,0.4],[0.7,0.3],[0.2,0.8]],\n", 292 | " '0122':[[0.9,0.1],[0.7,0.3],[0.7,0.3],[0.28,0.72],[0.7,0.3],[0.28,0.72],[0.28,0.72],[0.1,0.9]],'01233':{'33':{'coefficient':[np.linspace(0.6,0.8,8).tolist()]},'sigma_intercept':np.linspace(0.6,3,8).tolist(),'sigma':np.linspace(3,4,8).tolist()},'01244':{'44':{'coefficient':[np.linspace(0.6,1.3,8).tolist()]},\n", 293 | " 'sigma_intercept':np.linspace(2,5,8).tolist(),'sigma':np.linspace(3,4,8).tolist()}}\n", 294 | "\n", 295 | "\n", 296 | "loopbacks={'00':[1],'11':[1],'22':[1],'33':[1],'44':[1]} \n", 297 | "\n", 298 | "Parent2={'0':[0],'1':[0,1],'2':[0,1,2],'3':[0,1,2,3],'4':[0,1,2,4]}\n", 299 | "\n", 300 | "\n", 301 | "Time_series4=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 302 | "\n", 303 | "Time_series4.BN_data_gen() " 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "## Architecture 5\n", 311 | "\n", 312 | "### HMM with 
loopback 1 " 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 18, 318 | "metadata": {}, 319 | "outputs": [ 320 | { 321 | "name": "stdout", 322 | "output_type": "stream", 323 | "text": [ 324 | "Total Time is 2.0737037658691406\n" 325 | ] 326 | } 327 | ], 328 | "source": [ 329 | "import time\n", 330 | "START=time.time()\n", 331 | "T=20\n", 332 | "N=1000\n", 333 | "N_level=[4]\n", 334 | "Mat=pd.DataFrame(np.array(([0,1],[0,0]))) # HMM\n", 335 | "Node_Type=['D','C']\n", 336 | "\n", 337 | "CPD={'0':[0.25,0.25,0.25,0.25],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n", 338 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n", 339 | "\n", 340 | "Parent={'0':[],'1':[0]}\n", 341 | "\n", 342 | "\n", 343 | "CPD2={'00':[[0.6,0.3,0.05,0.05],[0.25,0.4,0.25,0.1],[0.1,0.3,0.4,0.2],[0.05,0.05,0.4,0.5]],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n", 344 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5\n", 345 | "}}\n", 346 | "\n", 347 | "loopbacks={'00':[1]}\n", 348 | "\n", 349 | "Parent2={'0':[0],'1':[0]}\n", 350 | "\n", 351 | "\n", 352 | "Time_series5=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 353 | "\n", 354 | "Time_series5.BN_data_gen() \n", 355 | "FINISH=time.time()\n", 356 | "print('Total Time is',FINISH-START)" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 19, 362 | "metadata": { 363 | "scrolled": true 364 | }, 365 | "outputs": [ 366 | { 367 | "name": "stdout", 368 | "output_type": "stream", 369 | "text": [ 370 | "[3, 4, 3, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3]\n", 371 | "[60.072920827425, 82.74836461790208, 57.3652355700002, 15.808715408345048, 42.11400427111718, 33.940916747529904, 39.38294267545808, 56.77532334095553, 55.225363073651565, 80.31120583537712, 72.83731788645105, 78.75540628414659, 83.74799152447356, 80.85359737561909, 83.89681465845214, 73.10191951112242, 83.32808287493899, 52.70302656588948, 74.34544473307433, 61.67002934637476]\n" 372 | ] 373 | } 374 | ], 375 | 
"source": [ 376 | "print(Time_series5.BN_Nodes[0][1])\n", 377 | "print(Time_series5.BN_Nodes[1][1])" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": {}, 384 | "outputs": [], 385 | "source": [] 386 | } 387 | ], 388 | "metadata": { 389 | "kernelspec": { 390 | "display_name": "Python 3", 391 | "language": "python", 392 | "name": "python3" 393 | }, 394 | "language_info": { 395 | "codemirror_mode": { 396 | "name": "ipython", 397 | "version": 3 398 | }, 399 | "file_extension": ".py", 400 | "mimetype": "text/x-python", 401 | "name": "python", 402 | "nbconvert_exporter": "python", 403 | "pygments_lexer": "ipython3", 404 | "version": "3.8.2" 405 | } 406 | }, 407 | "nbformat": 4, 408 | "nbformat_minor": 4 409 | } 410 | -------------------------------------------------------------------------------- /License.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Manie Tadayon 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![GitHub](https://img.shields.io/github/license/manitadayon/tsBNgen) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/tsBNgen) ![GitHub User's stars](https://img.shields.io/github/stars/manitadayon?style=flat-square) ![GitHub forks](https://img.shields.io/github/forks/manitadayon/tsBNgen?logo=GitHub) 2 | ![PyPI](https://img.shields.io/pypi/v/tsBNgen) 3 | 4 | ## If you would like to buy me a coffee 5 | 6 | Buy Me A Coffee 7 | 8 | 9 | **tsBNgen: A Python Library to Generate Time Series Data Based on an Arbitrary Bayesian Network Structure** 10 | 11 | [Description](#Description) 12 | 13 | [Citation](#Citation) 14 | 15 | [Features](#Features) 16 | 17 | [Instruction](#Instruction) 18 | 19 | [License](#License) 20 | 21 | ---- 22 | 23 | ### **Description** 24 | 25 | #### tsBNgen is a Python package to generate time series data based on an arbitrary Bayesian Network Structure. 26 | --- 27 | ### **Citation** 28 | 29 | #### If you find this package useful or if you use it in your research or work, please consider citing it as follows: 30 | ``` 31 | @article{tadayon2020tsbngen, 32 | title={tsBNgen: A Python Library to Generate Time Series Data from an Arbitrary Dynamic Bayesian Network Structure}, 33 | author={Tadayon, Manie and Pottie, Greg}, 34 | journal={arXiv preprint arXiv:2009.04595}, 35 | year={2020} 36 | } 37 | ``` 38 | ---- 39 | ### **Features** 40 | 41 | - It handles discrete nodes, continuous nodes, and hybrid (mixture of discrete and continuous) networks. 
42 | 43 | - It uses a multinomial distribution for the discrete nodes and a Gaussian distribution for the continuous nodes. 44 | 45 | - It handles arbitrary Bayesian network structure. 46 | 47 | - It supports arbitrary loopback values. 48 | 49 | - The code can be modified easily to handle arbitrary static and temporal structures. 50 | --- 51 | 52 | ### **Instruction** 53 | 54 | To run this code, either clone this repo or install the package from PyPI using the following command: 55 | 56 | ```bash 57 | pip install tsBNgen 58 | ``` 59 | 60 | Then run through the set of examples in 61 | 62 | > **Time_Series_Generation_Examples.ipynb** 63 | 64 | For more information on how to use the package please visit the following: 65 | 66 | 1. Watch my YouTube tutorial (I go over the package) 67 | Watch the videos 68 | 3. Original paper 69 | 4. Documentation in PDF available in this repository. 70 | 71 | ### **License** 72 | 73 | This software is released under the MIT license. 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | -------------------------------------------------------------------------------- /Time_Series_Generation_Examples.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Examples Companion with Documentation\n", 8 | "\n", 9 | "**Description**\n", 10 | "> #### tsBNgen is a Python library to generate time series data based on an arbitrary dynamic Bayesian network. The intention behind tsBNgen is to let researchers generate time series data according to any model they choose. \n", 11 | "\n", 12 | "> #### tsBNgen is released under the MIT license. \n", 13 | "\n", 14 | "**Instruction** \n", 15 | "\n", 16 | "> #### 1. 
Either clone this repository https://github.com/manitadayon/tsBNgen or install the package using pip install tsBNgen.\n", 17 | "> #### Then import the necessary libraries using the following commands:\n", 18 | "```python\n", 19 | "from tsBNgen import *\n", 20 | "from tsBNgen.tsBNgen import * \n", 21 | "```\n", 22 | "> #### In general, there are two functions to run when generating data:\n", 23 | "\n", 24 | "> - BN_data_gen(): Use this function under the following conditions:\n", 25 | " the custom_time variable is not specified and the loopback value for every variable is at most 1.\n", 26 | " \n", 27 | "---- \n", 28 | "\n", 29 | "**Note**: condition 1 describes the classical dynamic Bayesian network in which some nodes at time t-1 are connected to themselves at time t. \n", 30 | "\n", 31 | "> - BN_sample_gen_loopback(): \n", 32 | " 1. custom_time is not specified and you want the loopback value for some nodes to be at most 2.\n", 33 | " 2. custom_time is specified and it is at least equal to the maximum loopback value in loopbacks2.\n", 34 | "\n", 35 | "> #### The following explains some of the variables and parameters in tsBNgen:\n", 36 | "> - **T** : Length of each time series.\n", 37 | "> - **N** : Number of samples.\n", 38 | "> - **N_level** : list. Number of possible levels for discrete nodes.\n", 39 | "> - **Mat** : data-frame. Adjacency matrix for each time point.\n", 40 | "> - **Node_Type** : list. Type of each variable in Bayesian Network.\n", 41 | "> - **CPD** : dict. Conditional Probability Distribution for initial time point.\n", 42 | "> - **Parent** : dict. Identifying parent of each node in Bayesian network at initial time.\n", 43 | "> - **CPD2** : dict. Conditional Probability Distribution.\n", 44 | "> - **Parent2** : dict. Identifying parent of each node in Bayesian network.\n", 45 | "> - **loopbacks** : dict. Describing the temporal interconnection between nodes.\n", 46 | "> - **CPD3** : dict. 
Conditional Probability Distribution. Use this entry when BN_sample_gen_loopback() is called.\n", 47 | "> - **Parent3** : dict. Identifying parent of each node in Bayesian network. Use this entry when \n", 48 | "> BN_sample_gen_loopback() is called.\n", 49 | "> - **loopbacks2** : dict. Describing the temporal interconnection between nodes. Use this entry when \n", 50 | "> BN_sample_gen_loopback() is called.\n", 51 | " \n", 52 | " \n" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "### Import Necessary Files" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 2, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "from tsBNgen import *\n", 69 | "from tsBNgen.tsBNgen import * " 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "## Architecture 1\n", 77 | "\n", 78 | "### Two Discrete and One Continuous Nodes." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "T=20\n", 88 | "N=2000\n", 89 | "N_level=[2,4]\n", 90 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n", 91 | "Node_Type=['D','D','C']\n", 92 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 93 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 94 | "}}\n", 95 | "Parent={'0':[],'1':[0],'2':[0,1]}\n", 96 | "\n", 97 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n", 98 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 99 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 100 | "}}\n", 101 | "\n", 102 | 
"Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n", 103 | "loopbacks={'00':[1],'11':[1]}\n", 104 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n", 105 | "Time_series1=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 106 | "Time_series1.BN_data_gen()" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 4, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "name": "stdout", 116 | "output_type": "stream", 117 | "text": [ 118 | "[1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1]\n", 119 | "[1, 1, 2, 2, 4, 4, 4, 3, 2, 3, 4, 2, 2, 4, 4, 3, 4, 3, 2, 1]\n", 120 | "[7.808894461133653, 11.32725466660192, 45.60586730086184, 49.417565048817615, 91.96224489111819, 90.60949019682866, 90.56676813839593, 63.41098040448917, 34.83786809684297, 69.03588167756895, 95.48196749276904, 22.059789834337842, 52.4413443699694, 89.17064709084299, 90.74716862100209, 75.25321342376877, 93.21277916279126, 53.755213101836375, 21.726853444296204, 11.167892171240036]\n" 121 | ] 122 | } 123 | ], 124 | "source": [ 125 | "print(Time_series1.BN_Nodes[0][3])\n", 126 | "print(Time_series1.BN_Nodes[1][3])\n", 127 | "print(Time_series1.BN_Nodes[2][3])" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "## Architecture 2\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 5, 140 | "metadata": { 141 | "scrolled": true 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "T=10\n", 146 | "N=1000\n", 147 | "N_level=[2,4]\n", 148 | "Mat=pd.DataFrame(np.array(([0,1,0],[0,0,1],[0,0,0])))\n", 149 | "Node_Type=['D','D','C']\n", 150 | "CPD={'0':[0.5,0.5],'01':[[0.6,0.3,0.05,0.05],[0.1,0.2,0.3,0.4]],'12':{'mu0':10,'sigma0':5,'mu1':30,'sigma1':5,\n", 151 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n", 152 | "Parent={'0':[],'1':[0],'2':[1]}\n", 153 | "\n", 154 | "CPD2={'00':[[0.7,0.3],[0.3,0.7]],'0011':[[0.7,0.2,0.1,0],[0.5,0.4,0.1,0],[0.45,0.45,0.1,0],\n", 155 | 
"[0.3,0.4,0.2,0.1],[0.4,0.4,0.1,0.1],[0.2,0.3,0.3,0.2],[0.2,0.3,0.3,0.2],[0.1,0.2,0.3,0.4],[0.3,0.4,0.2,0.1],[0.2,0.2,0.4,0.2],\n", 156 | " [0.2,0.1,0.4,0.3],[0.05,0.15,0.3,0.5],[0.1,0.3,0.3,0.3],[0,0.1,0.3,0.6],[0,0.1,0.2,0.7],[0,0,0.3,0.7]],'112':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':2,\n", 157 | " 'mu2':50,'sigma2':2,'mu3':60,'sigma3':5,'mu4':20,'sigma4':2,'mu5':25,'sigma5':5,'mu6':50,'sigma6':5,'mu7':60,'sigma7':5,\n", 158 | " 'mu8':40,'sigma8':5,'mu9':50,'sigma9':5,'mu10':70,'sigma10':5,'mu11':85,'sigma11':2,'mu12':60,'sigma12':5, \n", 159 | " 'mu13':60,'sigma13':5,'mu14':80,'sigma14':3,'mu15':90,'sigma15':3}}\n", 160 | "\n", 161 | "Parent2={'0':[0],'1':[0,0,1],'2':[1,1]}\n", 162 | "loopbacks={'00':[1], '01':[1],'11':[1],'12':[1]}\n", 163 | "\n", 164 | "Time_series2=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 165 | "Time_series2.BN_data_gen()" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 6, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "name": "stdout", 175 | "output_type": "stream", 176 | "text": [ 177 | "[2, 2, 2, 2, 2, 1, 1, 2, 2, 2]\n", 178 | "[3, 4, 3, 4, 3, 3, 2, 1, 4, 4]\n", 179 | "[49.95557580237343, 81.91935377141675, 80.02776603979912, 82.7514263833196, 78.09782664417831, 62.4090858819203, 43.79221621872338, 18.10747482296797, 52.72102411795976, 91.61301738516033]\n" 180 | ] 181 | } 182 | ], 183 | "source": [ 184 | "print(Time_series2.BN_Nodes[0][3])\n", 185 | "print(Time_series2.BN_Nodes[1][3])\n", 186 | "print(Time_series2.BN_Nodes[2][3])" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "## Architecture 3\n", 194 | "\n", 195 | "### Similar to Architecture 1 but with loopback 2 for the middle node." 
196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 8, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "T=10\n", 205 | "N=1000\n", 206 | "N_level=[2,4]\n", 207 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n", 208 | "Node_Type=['D','D','C']\n", 209 | "\n", 210 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 211 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 212 | "}}\n", 213 | "Parent={'0':[],'1':[0],'2':[0,1]}\n", 214 | "\n", 215 | "\n", 216 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n", 217 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n", 218 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n", 219 | "}}\n", 220 | "\n", 221 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n", 222 | "\n", 223 | "loopbacks={'00':[1],'11':[1]}\n", 224 | "\n", 225 | "CPD3={'00':[[0.7,0.3],[0.2,0.8]],'0111':[[0.7,0.2,0.1,0],[0.6,0.3,0.1,0],[0.3,0.5,0.2,0],\n", 226 | "[0.3,0.4,0.15,0.15],[0.5,0.4,0.05,0.05],[0.5,0.4,0.05,0.05],[0.25,0.45,0.15,0.15],[0.2,0.4,0.3,0.1],[0.3,0.5,0.2,0],[0.25,0.45,0.15,0.15],\n", 227 | "[0.1,0.45,0.3,0.15],[0.05,0.45,0.3,0.2],[0.3,0.4,0.15,0.15],[0.2,0.4,0.3,0.1],[0.05,0.45,0.3,0.2],[0.1,0.3,0.4,0.2],[0.35,0.35,0.2,0.1],[0.25,0.45,0.2,0.1],[0.1,0.2,0.5,0.2],[0.05,0.25,0.5,0.2],\n", 228 | "[0.25,0.45,0.2,0.1],[0.05,0.35,0.5,0.1],[0.05,0.25,0.45,0.25],[0.05,0.2,0.35,0.4],[0.1,0.2,0.5,0],[0.05,0.25,0.45,0.25],[0.05,0.15,0.3,0.5],\n", 229 | " [0.05,0.1,0.3,0.55],[0.05,0.25,0.5,0.2],[0.05,0.2,0.35,0.4],[0.05,0.1,0.3,0.55],[0,0,0.2,0.8]],'012':{'mu0':10,'sigma0':2,'mu1':20,'sigma1':3,\n", 230 | " 
'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':75,'sigma6':3,'mu7':90,'sigma7':3\n", 231 | "}}\n", 232 | "\n", 233 | "Parent3={'0':[0],'1':[0,1,1],'2':[0,1]}\n", 234 | "\n", 235 | "loopbacks2={'00':[1],'11':[2,1]}\n", 236 | "\n", 237 | "\n", 238 | "Time_series3=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks,CPD3,Parent3,loopbacks2)\n", 239 | "\n", 240 | "Time_series3.BN_sample_gen_loopback()" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 9, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "[2, 1, 1, 2, 2, 2, 1, 1, 2, 2]\n", 253 | "[2, 1, 1, 1, 2, 3, 4, 3, 4, 2]\n", 254 | "[53.87472154213914, 6.495627168596188, 9.418392120518126, 19.646516517998936, 44.525380117118665, 75.87802769068614, 75.68517757174538, 56.29952829580641, 93.33191414647266, 45.862343348068705]\n" 255 | ] 256 | } 257 | ], 258 | "source": [ 259 | "print(Time_series3.BN_Nodes[0][1])\n", 260 | "print(Time_series3.BN_Nodes[1][1])\n", 261 | "print(Time_series3.BN_Nodes[2][1])" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "## Architecture 4\n", 269 | "\n", 270 | "### 3 Discrete and 2 Continuous Nodes. 
Each Node is Connected to Itself Across Time" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 13, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "T=10\n", 280 | "N=1000\n", 281 | "N_level=[2,2,2]\n", 282 | "Mat=pd.DataFrame(np.array(([0,1,1,1,1],[0,0,1,1,1],[0,0,0,1,1],[0,0,0,0,0],[0,0,0,0,0]))) \n", 283 | "Node_Type=['D','D','D','C','C']\n", 284 | "\n", 285 | "CPD={'0':[0.6,0.4],'01':[[0.7,0.3],[0.3,0.7]],'012':[[0.9,0.1],[0.4,0.6],[0.6,0.4],[0.1,0.9]],\n", 286 | " '0123':{'mu0':5,'sigma0':2,'mu1':10,'sigma1':3,'mu2':20,'sigma2':2,'mu3':50,'sigma3':3,'mu4':20,'sigma4':2,'mu5':40,'sigma5':3,'mu6':50,'sigma6':5,'mu7':80,'sigma7':3,\n", 287 | " },'0124':{'mu0':500,'sigma0':10,'mu1':480,'sigma1':13,'mu2':450,'sigma2':10,'mu3':400,'sigma3':13,'mu4':400,'sigma4':10,'mu5':300,'sigma5':10,'mu6':250,'sigma6':10,'mu7':100,'sigma7':5}}\n", 288 | "\n", 289 | "Parent={'0':[],'1':[0],'2':[0,1],'3':[0,1,2],'4':[0,1,2]}\n", 290 | "\n", 291 | "CPD2={'00':[[0.6,0.4],[0.2,0.8]],'011':[[0.8,0.2],[0.6,0.4],[0.7,0.3],[0.2,0.8]],\n", 292 | " '0122':[[0.9,0.1],[0.7,0.3],[0.7,0.3],[0.28,0.72],[0.7,0.3],[0.28,0.72],[0.28,0.72],[0.1,0.9]],'01233':{'33':{'coefficient':[np.linspace(0.6,0.8,8).tolist()]},'sigma_intercept':np.linspace(0.6,3,8).tolist(),'sigma':np.linspace(3,4,8).tolist()},'01244':{'44':{'coefficient':[np.linspace(0.6,1.3,8).tolist()]},\n", 293 | " 'sigma_intercept':np.linspace(2,5,8).tolist(),'sigma':np.linspace(3,4,8).tolist()}}\n", 294 | "\n", 295 | "\n", 296 | "loopbacks={'00':[1],'11':[1],'22':[1],'33':[1],'44':[1]} \n", 297 | "\n", 298 | "Parent2={'0':[0],'1':[0,1],'2':[0,1,2],'3':[0,1,2,3],'4':[0,1,2,4]}\n", 299 | "\n", 300 | "\n", 301 | "Time_series4=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 302 | "\n", 303 | "Time_series4.BN_data_gen() " 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "## Architecture 5\n", 311 | "\n", 312 | "### HMM with 
loopback 1 " 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 18, 318 | "metadata": {}, 319 | "outputs": [ 320 | { 321 | "name": "stdout", 322 | "output_type": "stream", 323 | "text": [ 324 | "Total Time is 2.0737037658691406\n" 325 | ] 326 | } 327 | ], 328 | "source": [ 329 | "import time\n", 330 | "START=time.time()\n", 331 | "T=20\n", 332 | "N=1000\n", 333 | "N_level=[4]\n", 334 | "Mat=pd.DataFrame(np.array(([0,1],[0,0]))) # HMM\n", 335 | "Node_Type=['D','C']\n", 336 | "\n", 337 | "CPD={'0':[0.25,0.25,0.25,0.25],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n", 338 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n", 339 | "\n", 340 | "Parent={'0':[],'1':[0]}\n", 341 | "\n", 342 | "\n", 343 | "CPD2={'00':[[0.6,0.3,0.05,0.05],[0.25,0.4,0.25,0.1],[0.1,0.3,0.4,0.2],[0.05,0.05,0.4,0.5]],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n", 344 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5\n", 345 | "}}\n", 346 | "\n", 347 | "loopbacks={'00':[1]}\n", 348 | "\n", 349 | "Parent2={'0':[0],'1':[0]}\n", 350 | "\n", 351 | "\n", 352 | "Time_series5=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n", 353 | "\n", 354 | "Time_series5.BN_data_gen() \n", 355 | "FINISH=time.time()\n", 356 | "print('Total Time is',FINISH-START)" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 19, 362 | "metadata": { 363 | "scrolled": true 364 | }, 365 | "outputs": [ 366 | { 367 | "name": "stdout", 368 | "output_type": "stream", 369 | "text": [ 370 | "[3, 4, 3, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3]\n", 371 | "[60.072920827425, 82.74836461790208, 57.3652355700002, 15.808715408345048, 42.11400427111718, 33.940916747529904, 39.38294267545808, 56.77532334095553, 55.225363073651565, 80.31120583537712, 72.83731788645105, 78.75540628414659, 83.74799152447356, 80.85359737561909, 83.89681465845214, 73.10191951112242, 83.32808287493899, 52.70302656588948, 74.34544473307433, 61.67002934637476]\n" 372 | ] 373 | } 374 | ], 375 | 
"source": [ 376 | "print(Time_series5.BN_Nodes[0][1])\n", 377 | "print(Time_series5.BN_Nodes[1][1])" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": {}, 384 | "outputs": [], 385 | "source": [] 386 | } 387 | ], 388 | "metadata": { 389 | "kernelspec": { 390 | "display_name": "Python 3", 391 | "language": "python", 392 | "name": "python3" 393 | }, 394 | "language_info": { 395 | "codemirror_mode": { 396 | "name": "ipython", 397 | "version": 3 398 | }, 399 | "file_extension": ".py", 400 | "mimetype": "text/x-python", 401 | "name": "python", 402 | "nbconvert_exporter": "python", 403 | "pygments_lexer": "ipython3", 404 | "version": "3.8.2" 405 | } 406 | }, 407 | "nbformat": 4, 408 | "nbformat_minor": 4 409 | } 410 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | with open("README.md", "r") as fh: 4 | long_description = fh.read() 5 | 6 | setup( 7 | name='tsBNgen', 8 | version='1.0.0', 9 | author='Manie Tadayon', 10 | author_email='manitadayon@ucla.edu', 11 | description='Generate time series data from an arbitrary Bayesian network', 12 | packages=['tsBNgen'], 13 | license='MIT', 14 | long_description=long_description, 15 | long_description_content_type="text/markdown", 16 | url='https://github.com/manitadayon/tsBNgen', 17 | classifiers=[ 18 | "Programming Language :: Python :: 3", 19 | "License :: OSI Approved :: MIT License", 20 | "Operating System :: OS Independent", 21 | ], 22 | ) -------------------------------------------------------------------------------- /tsBNgen/__init__.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | from functools import reduce 4 | 5 | -------------------------------------------------------------------------------- 
/tsBNgen/__pycache__/__init__.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/manitadayon/tsBNgen/1d54de78e9a4405e7a0049ccab89093cd7f0d094/tsBNgen/__pycache__/__init__.cpython-38.pyc -------------------------------------------------------------------------------- /tsBNgen/__pycache__/tsBNgen.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/manitadayon/tsBNgen/1d54de78e9a4405e7a0049ccab89093cd7f0d094/tsBNgen/__pycache__/tsBNgen.cpython-38.pyc -------------------------------------------------------------------------------- /tsBNgen/tsBNgen.py: -------------------------------------------------------------------------------- 1 | from tsBNgen import * 2 | class tsBNgen: 3 | def __init__(self,T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks,CPD3=None,Parent3=None,loopbacks2=None,custom_time=0): 4 | ''' 5 | A class to generate time series according to an arbitrary dynamic Bayesian network structure. 6 | 7 | Attributes 8 | ----------- 9 | T : int 10 | Length of each time series. 11 | 12 | N : int 13 | Number of time series. 14 | 15 | N_level : list 16 | Number of levels for the discrete nodes. Ignore this for the continuous nodes. 17 | 18 | Mat : data-frame 19 | Adjacency matrix corresponding to the Bayesian network at initial time. 20 | 21 | Node_Type : list 22 | Identifying nodes as either discrete "D" or continuous "C". 23 | 24 | CPD : dict 25 | Probability distribution of the nodes at initial time point. 26 | 27 | Parent : dict 28 | Parents of each node at initial time point. 29 | 30 | CPD2 : dict 31 | Probability distribution of the nodes after initial time. 32 | 33 | Parent2 : dict 34 | Parents of each node for time points after the initial time point. 35 | 36 | loopbacks : dict 37 | Determining the temporal connection between the nodes. 
38 | 39 | CPD3 : dict 40 | Probability distribution for the nodes. Use this entry when BN_sample_gen_loopback() is called. 41 | It defaults to empty. 42 | 43 | Parent3 : dict 44 | Parents of each node after the initial time point. It defaults to empty. 45 | Use this entry when BN_sample_gen_loopback() is called. 46 | 47 | loopbacks2 : dict 48 | Determining the temporal connection between nodes. It defaults to empty. 49 | Use this entry when BN_sample_gen_loopback() is called. 50 | 51 | custom_time: int 52 | Determines the time point at which the new BN (CPD3, Parent3) is used. The default is 0, which means 53 | the program infers it from the maximum value in the loopbacks2 entry. 54 | 55 | Methods 56 | ------------ 57 | BFS(Row) 58 | Perform Breadth-first search for the given node(row). 59 | 60 | zero_loc(List) 61 | Find the index of zero values in a list. 62 | 63 | Role_Assignment() 64 | Identify the root nodes. 65 | 66 | DAG_ordering() 67 | Find the topological ordering of the graph. 68 | 69 | Child(Row) 70 | Finds the children of the node specified by the row of the adjacency matrix. 71 | 72 | Multinomial_Select(index1,index2,ii=0) 73 | Generate a sample according to the Multinomial distribution. 74 | 75 | Roots_length() 76 | Identifying the number of root nodes. 77 | 78 | int_to_str(List) 79 | Concatenates list elements to a string. 80 | 81 | Valid_BN(parent) 82 | Verify whether the parent-child relationships between nodes are valid. 83 | 84 | Initial_sample() 85 | Generate samples for all the nodes at initial time (t=0). 86 | 87 | Gaussian_select(index1,index2,ii=0) 88 | Generate a sample according to the Gaussian distribution. 89 | 90 | continous_cpd() 91 | Identify which CPD entry to sample from. 92 | 93 | BN_sample() 94 | Generate samples for all the nodes after the initial time. 
95 | 96 | BN_data_gen() 97 | Use this function under the following conditions: 98 | custom_time variable is not specified and the value of the loopback for all the variables is at most 1 99 | 100 | BN_sample_loopback() 101 | Generate samples for all the nodes for time t=k. 102 | 103 | BN_sample_gen_loopback() 104 | custom_time is not specified and you want the loopback value for some nodes to be at most 2. 105 | custom_time is specified and it is at least equal to the maximum loopback value of the loopbacks2. 106 | 107 | ''' 108 | self.T=T 109 | self.N=N 110 | self.Mat=Mat 111 | self.Node_Type=Node_Type 112 | self.CPD=CPD 113 | self.Node=[[] for ii in range(self.Mat.shape[0])] 114 | self.Parent=Parent 115 | self.N_level=N_level 116 | self.CPD2=CPD2 117 | self.Parent2=Parent2 118 | self.flag=0 119 | if CPD3 is None: 120 | CPD3={} 121 | self.CPD3=CPD3 122 | if Parent3 is None: 123 | Parent3={} 124 | self.Parent3=Parent3 125 | self.loopbacks=loopbacks 126 | if loopbacks2 is None: 127 | loopbacks2={} 128 | self.loopbacks2=loopbacks2 129 | self.custom_time=custom_time 130 | 131 | def BFS(self,Row): 132 | ''' 133 | Perform Breadth-first search for the given node(row). 134 | 135 | Parameters 136 | -------------- 137 | 138 | Row : int 139 | Corresponds to the row (node) in adjacency matrix. 140 | 141 | Returns 142 | -------------- 143 | list 144 | The node and all its children. 145 | ''' 146 | child = [Row] 147 | queue = [] 148 | queue.extend(np.nonzero(self.Mat.iloc[Row, :].values)[0].tolist()) 149 | while queue: 150 | vertex = queue.pop(0) 151 | if vertex not in child: 152 | child.append(vertex) 153 | queue.extend(np.nonzero(self.Mat.iloc[vertex, :].values)[0].tolist()) 154 | return child 155 | 156 | @staticmethod 157 | def zero_loc(List): 158 | ''' 159 | Find the index of zero values in a list. 160 | 161 | Parameters 162 | ------------ 163 | List: list 164 | 165 | Returns 166 | ----------- 167 | list 168 | indices of zero values in a list. 
169 | ''' 170 | ind=[count for count,ii in enumerate(List) if ii==0] 171 | return ind 172 | 173 | def __repr__(self): 174 | 175 | return f''' length of each time series is {self.T} 176 | Number of time series samples is {self.N} 177 | The adjacency matrix is {self.Mat} 178 | Node Types are {self.Node_Type} 179 | Conditional Probability Table for initial time is {self.CPD} 180 | Parents of each node at initial time are {self.Parent} 181 | The number of levels for each discrete variable is {self.N_level} 182 | The roles are {self.Role} 183 | Conditional Probability Table for t2 to t_loopback is {self.CPD2} 184 | BN parents for time t2 ...t_loopback for each node are {self.Parent2} 185 | Conditional Probability Table for t_loopback to tn is {self.CPD3} 186 | BN parents for time t_loopback ... tn for each node are {self.Parent3} 187 | ''' 188 | 189 | def Role_Assignment(self): 190 | ''' 191 | Identify the root node. 192 | 193 | Parameters 194 | ------------ 195 | None 196 | 197 | Returns 198 | ------------ 199 | None 200 | 201 | ''' 202 | self.Role=[0]*len(self.top_order) 203 | for count,ii in enumerate(self.top_order): 204 | if(sum(self.Mat.iloc[:,ii]) ==0): 205 | self.Role[ii]=1 206 | 207 | def DAG_ordering(self): 208 | ''' 209 | Find the topological ordering of the graph 210 | 211 | Parameters 212 | ------------- 213 | None 214 | 215 | Returns 216 | ------------ 217 | None 218 | ''' 219 | queue = [] 220 | in_degree = np.count_nonzero(self.Mat, axis=0).tolist() 221 | index = tsBNgen.zero_loc(in_degree) 222 | queue.extend(index) 223 | visited_count = 0 224 | self.top_order = [] 225 | while queue: 226 | P = queue.pop(0) 227 | Neighbor = self.Child(P) 228 | self.top_order.append(P) 229 | for count, ii in enumerate(Neighbor): 230 | in_degree[ii] = in_degree[ii] - 1 231 | if (in_degree[ii] == 0): 232 | queue.append(ii) 233 | 234 | visited_count = visited_count + 1 235 | if (visited_count != self.Mat.shape[0]): 236 | print('DAG has a cycle') 237 | 238 | def Child(self, 
Row): 239 | ''' 240 | Finds the children of the node specified by the row of the adjacency matrix. 241 | 242 | Parameters 243 | ----------- 244 | Row : int 245 | The row in the adjacency matrix, corresponding to the same node in a Bayesian network. 246 | 247 | Returns 248 | ---------- 249 | list 250 | All the children of the given node. 251 | ''' 252 | child=[] 253 | child.extend(np.nonzero(self.Mat.iloc[Row, :].values)[0].tolist()) 254 | return child 255 | 256 | 257 | def Multinomial_Select(self,index1,index2,ii=0): 258 | ''' 259 | Generate sample according to the Multinomial distribution. 260 | 261 | Parameters 262 | ----------- 263 | index1: string 264 | key values of dictionary in CPD/CPD2/CPD3 265 | index2: int 266 | Determine which CPD entry to select. 267 | ii : int 268 | The node to generate the sample for. It defaults to 0. 269 | 270 | Returns 271 | ------------ 272 | int 273 | The new generated sample. 274 | ''' 275 | if(self.flag==0): 276 | if(self.Role[ii]==1): 277 | Num=np.random.multinomial(1, self.CPD[str(index1)], size=1) 278 | Pos=Num.tolist()[0].index(1) 279 | return Pos+1 280 | else: 281 | Num=np.random.multinomial(1, self.CPD[str(index1)][index2], size=1) 282 | Pos=Num.tolist()[0].index(1) 283 | return Pos+1 284 | elif (self.flag==1): 285 | if(len(self.Parent2[str(ii)])!=0): 286 | Num=np.random.multinomial(1, self.CPD2[str(index1)][index2], size=1) 287 | Pos=Num.tolist()[0].index(1) 288 | else: 289 | Num=np.random.multinomial(1, self.CPD2[str(index1)], size=1) 290 | Pos=Num.tolist()[0].index(1) 291 | return Pos+1 292 | 293 | elif (self.flag==2): 294 | if(len(self.Parent3[str(ii)])!=0): 295 | Num=np.random.multinomial(1, self.CPD3[str(index1)][index2], size=1) 296 | else: 297 | Num=np.random.multinomial(1, self.CPD3[str(index1)], size=1) 298 | Pos=Num.tolist()[0].index(1) 299 | return Pos+1 300 | 301 | 302 | 303 | def parents_len(self,Node): 304 | return np.count_nonzero(self.Mat.iloc[:,Node], axis=0).tolist() 305 | 306 | def Roots_length(self): 
307 | ''' 308 | Identifying the number of root nodes. 309 | 310 | Parameters 311 | ------------ 312 | None 313 | 314 | Returns 315 | ------------ 316 | int 317 | Number of root nodes. 318 | ''' 319 | parent=np.count_nonzero(self.Mat, axis=0).tolist() 320 | return len([idx for idx,ii in enumerate(parent) if ii==0]) 321 | 322 | @staticmethod 323 | def int_to_str(List): 324 | ''' 325 | Concatenates list elements to a string. 326 | 327 | Parameters 328 | ------------- 329 | List : list 330 | 331 | Returns 332 | ------------- 333 | string 334 | concatenated list elements as a string. 335 | 336 | ''' 337 | if not isinstance(List, list): 338 | List=list(map(int, str(List))) 339 | List=[str(ii) for ii in List] 340 | return ''.join(List) 341 | 342 | def Valid_BN(self,parent): 343 | ''' 344 | Verify whether the parent-child relationships between nodes are valid. 345 | 346 | Parameters 347 | ------------ 348 | parent : dict 349 | dictionary where the keys are the nodes and the values are the list of parents. 
350 | 351 | Returns 352 | ----------- 353 | None 354 | 355 | Raises 356 | ----------- 357 | Exception 358 | "Parent of a discrete node cannot be continuous" 359 | ''' 360 | for count,ii in enumerate(self.top_order): 361 | if(self.Node_Type[ii]=='D' and any(self.Node_Type[jj]=='C' for jj in parent[str(ii)])): 362 | raise Exception("Parent of a discrete node cannot be continuous") 363 | 364 | 365 | def Initial_sample(self): 366 | ''' 367 | Generate samples for all the nodes at initial time (t=0) 368 | 369 | Parameters 370 | -------------- 371 | None 372 | 373 | Returns 374 | ------------- 375 | None 376 | 377 | Raises 378 | ------------ 379 | Exception 380 | Parent of a discrete node cannot be continuous 381 | ''' 382 | self.DAG_ordering() 383 | self.Role_Assignment() 384 | self.flag=0 385 | self.Valid_BN(self.Parent) 386 | 387 | for count,ii in enumerate(self.top_order): 388 | parent=tsBNgen.int_to_str(self.Parent[str(ii)])+str(ii) 389 | if(self.Role[ii]==1 and self.Node_Type[ii]=='D'): 390 | self.Node[ii].append(self.Multinomial_Select(parent,0,ii)) 391 | elif(self.Role[ii]==1 and self.Node_Type[ii]=='C'): 392 | self.Node[ii].extend(self.Gaussian_select(ii,0)) 393 | elif(all(self.Node_Type[jj]=='D' for jj in self.Parent[str(ii)])): 394 | all_parents=[] 395 | parent_N_level=[] 396 | for count2,jj in enumerate(self.Parent[str(ii)]): 397 | all_parents.append(self.Node[jj][-1]) 398 | parent_N_level.append(self.N_level[jj]) 399 | 400 | self.all_parents=all_parents 401 | self.parent_N_level=parent_N_level 402 | CPD_entry=self.continous_cpd() 403 | 404 | if(self.Node_Type[ii]=='D'): 405 | self.Node[ii].append(self.Multinomial_Select(parent,CPD_entry,ii)) 406 | elif(self.Node_Type[ii]=='C'): 407 | self.Node[ii].extend(self.Gaussian_select(parent,CPD_entry)) 408 | 409 | elif(any(self.Node_Type[jj]=='D' for jj in self.Parent[str(ii)])): 410 | D_Parent=[kk for kk in self.Parent[str(ii)] if self.Node_Type[kk]=='D'] 411 | all_parents=[] 412 | parent_N_level=[] 413 | for 
count2,jj in enumerate(D_Parent): 414 | all_parents.append(self.Node[jj][-1]) 415 | parent_N_level.append(self.N_level[jj]) 416 | 417 | self.all_parents=all_parents 418 | self.parent_N_level=parent_N_level 419 | CPD_entry=self.continous_cpd() 420 | 421 | C_Parent=[kk for kk in self.Parent[str(ii)] if self.Node_Type[kk]=='C'] 422 | temp=0 423 | for count3,kk in enumerate(C_Parent): 424 | cont_SUM=self.CPD[str(parent)][str(kk)+str(ii)]['coefficient'][0][CPD_entry]*self.Node[kk][-1] 425 | temp=temp+cont_SUM 426 | intercept=np.random.normal(0,self.CPD[str(parent)]['sigma_intercept'][CPD_entry],1) 427 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD[str(parent)]['sigma'][CPD_entry],ii)) 428 | 429 | elif(all(self.Node_Type[jj]=='C' for jj in self.Parent[str(ii)])): 430 | temp=0 431 | for count3,kk in enumerate(self.Parent[str(ii)]): 432 | cont_SUM=self.CPD[str(parent)][str(kk)+str(ii)]['coefficient'][0][0]*self.Node[kk][-1] 433 | temp=temp+cont_SUM 434 | intercept=np.random.normal(0,self.CPD[str(parent)]['sigma_intercept'][0],1) 435 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD[str(parent)]['sigma'][0],ii)) 436 | 437 | 438 | def Gaussian_select(self,index1,index2,ii=0): 439 | if(self.flag==0): 440 | C_Parent=[kk for kk in self.Parent[str(ii)] if self.Node_Type[kk]=='C'] 441 | if(len(C_Parent)!=0): 442 | return (np.random.normal(index1,index2, 1).tolist()) 443 | else: 444 | return (np.random.normal(self.CPD[str(index1)]['mu'+str(index2)],self.CPD[str(index1)]['sigma'+str(index2)], 1).tolist()) 445 | 446 | elif(self.flag==1): 447 | C_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='C'] 448 | if(len(C_Parent)!=0): 449 | return (np.random.normal(index1,index2, 1).tolist()) 450 | else: 451 | return (np.random.normal(self.CPD2[str(index1)]['mu'+str(index2)],self.CPD2[str(index1)]['sigma'+str(index2)], 1).tolist()) 452 | elif(self.flag==2): 453 | C_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='C'] 
454 | if(len(C_Parent)!=0): 455 | return (np.random.normal(index1,index2, 1).tolist()) 456 | else: 457 | return (np.random.normal(self.CPD3[str(index1)]['mu'+str(index2)],self.CPD3[str(index1)]['sigma'+str(index2)], 1).tolist()) 458 | 459 | 460 | def Level_multiplied(self): 461 | self.level_multiply=[] 462 | Temp=self.parent_N_level 463 | for ii in range(len(Temp)): 464 | A1=reduce(lambda x, y: x*y,Temp) 465 | self.level_multiply.append(A1) 466 | Temp=Temp[1:] 467 | self.level_multiply=self.level_multiply[1:] 468 | self.level_multiply.insert(len(self.parent_N_level ),1) 469 | 470 | def continous_cpd(self): 471 | self.Level_multiplied() 472 | List=[(ii-1)*jj for ii, jj in zip(self.all_parents,self.level_multiply)] 473 | return sum(List) 474 | 475 | def BN_sample(self): 476 | ''' 477 | Generate samples for all the nodes after the initial time. 478 | 479 | Parameters 480 | -------------- 481 | None 482 | 483 | Returns 484 | ------------- 485 | None 486 | 487 | Notes 488 | ------------ 489 | Use this function to generate samples if the loopback values are at most one. 490 | Loopback=1 means that a node at time t is connected to the node at t-1. 
491 | 492 | Raises 493 | ------------ 494 | Exception 495 | Parent of a discrete node cannot be continuous 496 | ''' 497 | self.flag=1 498 | loopbacks_temp=self.loopbacks.copy() 499 | self.Valid_BN(self.Parent2) 500 | for count,ii in enumerate(self.top_order): 501 | parent=tsBNgen.int_to_str(self.Parent2[str(ii)])+str(ii) 502 | if(all(self.Node_Type[jj]=='D' for jj in self.Parent2[str(ii)])): 503 | temp=0 504 | all_parents=[] 505 | parent_N_level=[] 506 | for count2,jj in enumerate(self.Parent2[str(ii)]): 507 | loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii) 508 | if(loopbacks_keys in self.loopbacks.keys()): 509 | if(self.top_order.index(jj) < count): 510 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 511 | all_parents.append(self.Node[jj][-1-mm]) 512 | else: 513 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 514 | LP=mm-1 515 | all_parents.append(self.Node[jj][-1-LP]) 516 | del self.loopbacks[loopbacks_keys] 517 | elif(jj != ii): 518 | all_parents.append(self.Node[jj][-1]) 519 | parent_N_level.append(self.N_level[jj]) 520 | 521 | self.loopbacks=loopbacks_temp.copy() 522 | 523 | self.all_parents=all_parents 524 | self.parent_N_level=parent_N_level 525 | CPD_entry=self.continous_cpd() 526 | 527 | if(self.Node_Type[ii]=='D'): 528 | self.Node[ii].append(self.Multinomial_Select(parent,CPD_entry,ii)) 529 | elif(self.Node_Type[ii]=='C'): 530 | self.Node[ii].extend(self.Gaussian_select(parent,CPD_entry)) 531 | 532 | elif(any(self.Node_Type[jj]=='D' for jj in self.Parent2[str(ii)])): 533 | loopbacks_temp=self.loopbacks.copy() 534 | D_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='D'] 535 | all_parents=[] 536 | parent_N_level=[] 537 | for count2,jj in enumerate(D_Parent): 538 | loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii) 539 | if(loopbacks_keys in self.loopbacks.keys()): 540 | if(self.top_order.index(jj) < count): 541 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 542 | all_parents.append(self.Node[jj][-1-mm]) 
543 | else: 544 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 545 | LP=mm-1 546 | all_parents.append(self.Node[jj][-1-LP]) 547 | del self.loopbacks[loopbacks_keys] 548 | elif(jj != ii): 549 | 550 | all_parents.append(self.Node[jj][-1]) 551 | parent_N_level.append(self.N_level[jj]) 552 | 553 | self.loopbacks=loopbacks_temp.copy() 554 | self.all_parents=all_parents 555 | self.parent_N_level=parent_N_level 556 | CPD_entry=self.continous_cpd() 557 | 558 | C_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='C'] 559 | temp=0 560 | loopbacks_temp=self.loopbacks.copy() 561 | for count3,kk in enumerate(C_Parent): 562 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii) 563 | if(loopbacks_keys in self.loopbacks.keys()): 564 | if(self.top_order.index(kk) < count): 565 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 566 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][mm][CPD_entry]*self.Node[kk][-1-mm] 567 | temp=temp+cont_SUM 568 | else: 569 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 570 | LP=mm-1 571 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][LP][CPD_entry]*self.Node[kk][-1-LP] 572 | temp=temp+cont_SUM 573 | del self.loopbacks[loopbacks_keys] 574 | elif(kk != ii): 575 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][0][CPD_entry]*self.Node[kk][-1] 576 | temp=temp+cont_SUM 577 | intercept=np.random.normal(0,self.CPD2[str(parent)]['sigma_intercept'][CPD_entry],1) 578 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD2[str(parent)]['sigma'][CPD_entry],ii)) 579 | self.loopbacks=loopbacks_temp.copy() 580 | 581 | elif(all(self.Node_Type[jj]=='C' for jj in self.Parent2[str(ii)])): 582 | C_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='C'] 583 | temp=0 584 | loopbacks_temp=self.loopbacks.copy() 585 | for count3,kk in enumerate(C_Parent): 586 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii) 587 | if(loopbacks_keys in 
self.loopbacks.keys()): 588 | if(self.top_order.index(kk) < count): 589 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 590 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][mm][0]*self.Node[kk][-1-mm] 591 | temp=temp+cont_SUM 592 | else: 593 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]): 594 | LP=mm-1 595 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][LP][0]*self.Node[kk][-1-LP] 596 | temp=temp+cont_SUM 597 | del self.loopbacks[loopbacks_keys] 598 | elif(kk != ii): 599 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][0][0]*self.Node[kk][-1] 600 | temp=temp+cont_SUM 601 | intercept=np.random.normal(0,self.CPD2[str(parent)]['sigma_intercept'][0],1) 602 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD2[str(parent)]['sigma'][0],ii)) 603 | self.loopbacks=loopbacks_temp.copy() 604 | 605 | 606 | 607 | def BN_data_gen(self): 608 | ''' 609 | It uses Initial_sample for the initial time (t=0) and BN_sample for time points t=1 up to t=T-1, where T is the length of each time series. 610 | 611 | Parameters 612 | ------------- 613 | None 614 | 615 | Returns 616 | ------------- 617 | None 618 | 619 | 620 | Raises 621 | ------------ 622 | Exception 623 | Parent of a discrete node cannot be continuous 624 | 625 | Notes 626 | ------------- 627 | Use this function under the following conditions: custom_time variable is not specified 628 | and the value of the loopback for all the variables is at most 1 629 | ''' 630 | keys = range(len(self.Node)) 631 | self.BN_Nodes= dict(zip(keys, ([[] for ii in range(self.N)] for _ in keys ))) 632 | for ii in range(self.N): 633 | 634 | self.Initial_sample() 635 | 636 | for kk in range(1,self.T): 637 | self.BN_sample() 638 | for jj in range(len(self.Node)): 639 | self.BN_Nodes[jj][ii]=self.Node[jj] 640 | self.Node[jj]=[] 641 | 642 | 643 | 644 | def BN_sample_loopback(self): 645 | ''' 646 | Generate samples for all the nodes using CPD3 and Parent3. 
647 | 648 | Parameters 649 | -------------- 650 | None 651 | 652 | Returns 653 | ------------- 654 | None 655 | 656 | Raises 657 | ------------ 658 | Exception 659 | Parent of a discrete node cannot be continuous 660 | 661 | Notes 662 | ------------ 663 | Use this function when you want to incorporate three BNs. 664 | ''' 665 | self.flag=2 666 | self.Valid_BN(self.Parent3) 667 | self.DAG_ordering() 668 | loopbacks_temp=self.loopbacks2.copy() 669 | for count,ii in enumerate(self.top_order): 670 | parent=tsBNgen.int_to_str(self.Parent3[str(ii)])+str(ii) 671 | if(all(self.Node_Type[jj]=='D' for jj in self.Parent3[str(ii)])): 672 | temp=0 673 | all_parents=[] 674 | parent_N_level=[] 675 | for count2,jj in enumerate(self.Parent3[str(ii)]): 676 | loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii) 677 | if(loopbacks_keys in self.loopbacks2.keys()): 678 | if(self.top_order.index(jj) < count): 679 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 680 | all_parents.append(self.Node[jj][-1-mm]) 681 | else: 682 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 683 | LP=mm-1 684 | all_parents.append(self.Node[jj][-1-LP]) 685 | del self.loopbacks2[loopbacks_keys] 686 | 687 | 688 | elif(jj != ii): 689 | all_parents.append(self.Node[jj][-1]) 690 | parent_N_level.append(self.N_level[jj]) 691 | 692 | self.loopbacks2=loopbacks_temp.copy() 693 | self.all_parents=all_parents 694 | self.parent_N_level=parent_N_level 695 | CPD_entry=self.continous_cpd() 696 | if(self.Node_Type[ii]=='D'): 697 | self.Node[ii].append(self.Multinomial_Select(parent,CPD_entry,ii)) 698 | elif(self.Node_Type[ii]=='C'): 699 | self.Node[ii].extend(self.Gaussian_select(parent,CPD_entry)) 700 | elif(any(self.Node_Type[jj]=='D' for jj in self.Parent3[str(ii)])): 701 | loopbacks_temp=self.loopbacks2.copy() 702 | D_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='D'] 703 | temp=0 704 | all_parents=[] 705 | parent_N_level=[] 706 | for count2,jj in enumerate(D_Parent): 707 | 
loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii) 708 | if(loopbacks_keys in self.loopbacks2.keys()): 709 | if(self.top_order.index(jj) < count): 710 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 711 | all_parents.append(self.Node[jj][-1-mm]) 712 | else: 713 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 714 | LP=mm-1 715 | all_parents.append(self.Node[jj][-1-LP]) 716 | del self.loopbacks2[loopbacks_keys] 717 | 718 | 719 | elif(jj != ii): 720 | all_parents.append(self.Node[jj][-1]) 721 | parent_N_level.append(self.N_level[jj]) 722 | 723 | self.loopbacks2=loopbacks_temp.copy() 724 | self.all_parents=all_parents 725 | self.parent_N_level=parent_N_level 726 | CPD_entry=self.continous_cpd() 727 | 728 | C_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='C'] 729 | temp=0 730 | loopbacks_temp=self.loopbacks2.copy() 731 | for count3,kk in enumerate(C_Parent): 732 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii) 733 | if(loopbacks_keys in self.loopbacks2.keys()): 734 | if(self.top_order.index(kk) < count): 735 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 736 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][mm][CPD_entry]*self.Node[kk][-1-mm] 737 | temp=temp+cont_SUM 738 | else: 739 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 740 | LP=mm-1 741 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][LP][CPD_entry]*self.Node[kk][-1-LP] 742 | temp=temp+cont_SUM 743 | del self.loopbacks2[loopbacks_keys] 744 | elif(kk != ii): 745 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][0][CPD_entry]*self.Node[kk][-1] 746 | temp=temp+cont_SUM 747 | intercept=np.random.normal(0,self.CPD3[str(parent)]['sigma_intercept'][CPD_entry],1) 748 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD3[str(parent)]['sigma'][CPD_entry],ii)) 749 | self.loopbacks2=loopbacks_temp.copy() 750 | 751 | elif(all(self.Node_Type[jj]=='C' for jj in self.Parent3[str(ii)])): 
752 | C_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='C'] 753 | temp=0 754 | loopbacks_temp=self.loopbacks2.copy() 755 | for count3,kk in enumerate(C_Parent): 756 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii) 757 | if(loopbacks_keys in self.loopbacks2.keys()): 758 | if(self.top_order.index(kk) < count): 759 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 760 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][mm][0]*self.Node[kk][-1-mm] 761 | temp=temp+cont_SUM 762 | else: 763 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]): 764 | LP=mm-1 765 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][LP][0]*self.Node[kk][-1-LP] 766 | temp=temp+cont_SUM 767 | del self.loopbacks2[loopbacks_keys] 768 | elif(kk != ii): 769 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][0][0]*self.Node[kk][-1] 770 | temp=temp+cont_SUM 771 | intercept=np.random.normal(0,self.CPD3[str(parent)]['sigma_intercept'][0],1) 772 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD3[str(parent)]['sigma'][0],ii)) 773 | self.loopbacks2=loopbacks_temp.copy() 774 | 775 | 776 | def BN_sample_gen_loopback(self): 777 | ''' 778 | Generate time series data for all the nodes for all time points. See Notes for more information. 779 | 780 | Parameters 781 | ------------- 782 | None 783 | 784 | Returns 785 | ------------- 786 | None 787 | 788 | Raises 789 | ------------ 790 | Exception 791 | Parent of a discrete node cannot be continuous 792 | 793 | Notes 794 | ------------ 795 | This is a more general form of BN_data_gen(), which itself supports only two different BN structures 796 | or a loopback value of at most one for all the nodes. 
797 | ''' 798 | keys = range(len(self.Node)) 799 | self.BN_Nodes= dict(zip(keys, ([[] for ii in range(self.N)] for _ in keys ))) 800 | Max_loopback=max(sum(self.loopbacks2.values(),[])) 801 | 802 | if (self.custom_time == 0): 803 | for ii in range(self.N): 804 | self.Initial_sample() 805 | for kk in range(1,Max_loopback): 806 | self.BN_sample() 807 | for mm in range(Max_loopback,self.T): 808 | self.BN_sample_loopback() 809 | for jj in range(len(self.Node)): 810 | self.BN_Nodes[jj][ii]=self.Node[jj] 811 | self.Node[jj]=[] 812 | else: 813 | for ii in range(self.N): 814 | self.Initial_sample() 815 | for kk in range(1,self.custom_time): 816 | self.BN_sample() 817 | for mm in range(self.custom_time,self.T): 818 | self.BN_sample_loopback() 819 | for jj in range(len(self.Node)): 820 | self.BN_Nodes[jj][ii]=self.Node[jj] 821 | self.Node[jj]=[] 822 | 823 | 824 | 825 | 826 | 827 | 828 | 829 | 830 | 831 | 832 | 833 | 834 | -------------------------------------------------------------------------------- /tsbngen.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/manitadayon/tsBNgen/1d54de78e9a4405e7a0049ccab89093cd7f0d094/tsbngen.pdf --------------------------------------------------------------------------------
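The `Level_multiplied()` and `continous_cpd()` methods in tsBNgen.py have no docstrings, so the following is a minimal sketch of what they appear to compute: a row-major (mixed-radix) index that maps the current levels of a node's discrete parents to one row of a CPD list. It assumes parent levels are 1-based, as returned by `Multinomial_Select` (`Pos+1`). The name `cpd_row_index` is hypothetical and not part of the package.

```python
def cpd_row_index(parent_values, parent_levels):
    """Map 1-based discrete parent values to a 0-based CPD row index.

    parent_values: current level of each discrete parent (1-based).
    parent_levels: number of levels of each parent, in the same order.
    The stride of parent k is the product of the level counts of all
    parents after it, so the last parent varies fastest.
    """
    strides = []
    for k in range(len(parent_levels)):
        stride = 1
        for n_levels in parent_levels[k + 1:]:
            stride *= n_levels
        strides.append(stride)
    return sum((value - 1) * stride
               for value, stride in zip(parent_values, strides))


# Two parents with 2 and 4 levels give 8 CPD rows, indexed 0..7.
first_row = cpd_row_index([1, 1], [2, 4])
last_row = cpd_row_index([2, 4], [2, 4])
```

Under these assumptions, a node whose parents have 2 and 4 levels needs a CPD with 2*4 = 8 rows, which matches the eight rows of the `'011'` entry in the Architecture 3 `CPD2` example.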