├── .gitignore
├── .ipynb_checkpoints
│   └── Time_Series_Generation_Examples-checkpoint.ipynb
├── License.txt
├── README.md
├── Time_Series_Generation_Examples.ipynb
├── setup.py
├── tsBNgen
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-38.pyc
│   │   └── tsBNgen.cpython-38.pyc
│   └── tsBNgen.py
└── tsbngen.pdf
/.gitignore:
--------------------------------------------------------------------------------
1 | tsBNgen.egg-info
2 | dist
3 | build
4 | .git
--------------------------------------------------------------------------------
/.ipynb_checkpoints/Time_Series_Generation_Examples-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Examples Companion with Documentation\n",
8 | "\n",
9 | "**Description**\n",
10 | "> #### tsBNgen is a Python library to generate time series data based on an arbitrary dynamic Bayesian network. The intention behind writing tsBNgen is to let researchers generate time series data according to any model they want. \n",
11 | "\n",
12 | "> #### tsBNgen is released under the MIT license. \n",
13 | "\n",
14 | "**Instruction** \n",
15 | "\n",
16 | "> #### 1. Either clone this repository https://github.com/manitadayon/tsBNgen or install the package using pip install tsBNgen.\n",
17 | "> #### Then import necessary libraries using the following commands:\n",
18 | "```python\n",
19 | "from tsBNgen import *\n",
20 | "from tsBNgen.tsBNgen import * \n",
21 | "```\n",
22 | "> #### There are in general two functions you should be running if you want to generate data:\n",
23 | "\n",
24 | "> - BN_data_gen(): Use this function under the following conditions:\n",
25 | " custom_time variable is not specified and the value of the loopback for all the variables is at most 1.\n",
26 | " \n",
27 | "---- \n",
28 | "\n",
29 | "**Note**: condition 1 describes the classical dynamic Bayesian network in which some nodes at time t-1 are connected to themselves at time t. \n",
30 | "\n",
31 | "> - BN_sample_gen_loopback(): \n",
32 | " 1. custom_time is not specified and you want the loopback value for some nodes to be at most 2.\n",
33 | " 2. custom_time is specified and it is at least equal to the maximum loopback value of the loopbacks2.\n",
34 | "\n",
35 | "> #### The following explains some of the variables and parameters in tsBNgen:\n",
36 | "> - **T** : Length of each time series.\n",
37 | "> - **N** : Number of samples.\n",
38 | "> - **N_level** : list. Number of possible levels for discrete nodes.\n",
39 | "> - **Mat** : data-frame. Adjacency matrix for each time point.\n",
40 | "> - **Node_Type** : list. Type of each variable in Bayesian Network.\n",
41 | "> - **CPD** : dict. Conditional Probability Distribution for initial time point.\n",
42 | "> - **Parent** : dict. Identifying parent of each node in Bayesian network at initial time.\n",
43 | "> - **CPD2** : dict. Conditional Probability Distribution.\n",
44 | "> - **Parent2** : dict. Identifying parent of each node in Bayesian network.\n",
45 | "> - **loopbacks** : dict. Describing the temporal interconnection between nodes.\n",
46 | "> - **CPD3** : dict. Conditional Probability Distribution. Use this entry when BN_sample_gen_loopback() is called.\n",
47 | "> - **Parent3** : dict. Identifying parent of each node in Bayesian network. Use this entry when \n",
48 | "> BN_sample_gen_loopback() is called.\n",
49 | "> - **loopbacks2** : dict. Describing the temporal interconnection between nodes. Use this entry when \n",
50 | "> BN_sample_gen_loopback() is called.\n",
51 | " \n",
52 | " \n"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "### Import Necessary Files"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 2,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "from tsBNgen import *\n",
69 | "from tsBNgen.tsBNgen import * "
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "## Architecture 1\n",
77 | "\n",
78 | "### Two Discrete and One Continuous Nodes."
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 3,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": [
87 | "T=20\n",
88 | "N=2000\n",
89 | "N_level=[2,4]\n",
90 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n",
91 | "Node_Type=['D','D','C']\n",
92 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
93 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
94 | "}}\n",
95 | "Parent={'0':[],'1':[0],'2':[0,1]}\n",
96 | "\n",
97 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n",
98 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
99 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
100 | "}}\n",
101 | "\n",
102 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n",
103 | "loopbacks={'00':[1],'11':[1]}\n",
104 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n",
105 | "Time_series1=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
106 | "Time_series1.BN_data_gen()"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": 4,
112 | "metadata": {},
113 | "outputs": [
114 | {
115 | "name": "stdout",
116 | "output_type": "stream",
117 | "text": [
118 | "[1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1]\n",
119 | "[1, 1, 2, 2, 4, 4, 4, 3, 2, 3, 4, 2, 2, 4, 4, 3, 4, 3, 2, 1]\n",
120 | "[7.808894461133653, 11.32725466660192, 45.60586730086184, 49.417565048817615, 91.96224489111819, 90.60949019682866, 90.56676813839593, 63.41098040448917, 34.83786809684297, 69.03588167756895, 95.48196749276904, 22.059789834337842, 52.4413443699694, 89.17064709084299, 90.74716862100209, 75.25321342376877, 93.21277916279126, 53.755213101836375, 21.726853444296204, 11.167892171240036]\n"
121 | ]
122 | }
123 | ],
124 | "source": [
125 | "print(Time_series1.BN_Nodes[0][3])\n",
126 | "print(Time_series1.BN_Nodes[1][3])\n",
127 | "print(Time_series1.BN_Nodes[2][3])"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## Architecture 2\n"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 5,
140 | "metadata": {
141 | "scrolled": true
142 | },
143 | "outputs": [],
144 | "source": [
145 | "T=10\n",
146 | "N=1000\n",
147 | "N_level=[2,4]\n",
148 | "Mat=pd.DataFrame(np.array(([0,1,0],[0,0,1],[0,0,0])))\n",
149 | "Node_Type=['D','D','C']\n",
150 | "CPD={'0':[0.5,0.5],'01':[[0.6,0.3,0.05,0.05],[0.1,0.2,0.3,0.4]],'12':{'mu0':10,'sigma0':5,'mu1':30,'sigma1':5,\n",
151 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n",
152 | "Parent={'0':[],'1':[0],'2':[1]}\n",
153 | "\n",
154 | "CPD2={'00':[[0.7,0.3],[0.3,0.7]],'0011':[[0.7,0.2,0.1,0],[0.5,0.4,0.1,0],[0.45,0.45,0.1,0],\n",
155 | "[0.3,0.4,0.2,0.1],[0.4,0.4,0.1,0.1],[0.2,0.3,0.3,0.2],[0.2,0.3,0.3,0.2],[0.1,0.2,0.3,0.4],[0.3,0.4,0.2,0.1],[0.2,0.2,0.4,0.2],\n",
156 | " [0.2,0.1,0.4,0.3],[0.05,0.15,0.3,0.5],[0.1,0.3,0.3,0.3],[0,0.1,0.3,0.6],[0,0.1,0.2,0.7],[0,0,0.3,0.7]],'112':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':2,\n",
157 | " 'mu2':50,'sigma2':2,'mu3':60,'sigma3':5,'mu4':20,'sigma4':2,'mu5':25,'sigma5':5,'mu6':50,'sigma6':5,'mu7':60,'sigma7':5,\n",
158 | " 'mu8':40,'sigma8':5,'mu9':50,'sigma9':5,'mu10':70,'sigma10':5,'mu11':85,'sigma11':2,'mu12':60,'sigma12':5, \n",
159 | " 'mu13':60,'sigma13':5,'mu14':80,'sigma14':3,'mu15':90,'sigma15':3}}\n",
160 | "\n",
161 | "Parent2={'0':[0],'1':[0,0,1],'2':[1,1]}\n",
162 | "loopbacks={'00':[1], '01':[1],'11':[1],'12':[1]}\n",
163 | "\n",
164 | "Time_series2=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
165 | "Time_series2.BN_data_gen()"
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": 6,
171 | "metadata": {},
172 | "outputs": [
173 | {
174 | "name": "stdout",
175 | "output_type": "stream",
176 | "text": [
177 | "[2, 2, 2, 2, 2, 1, 1, 2, 2, 2]\n",
178 | "[3, 4, 3, 4, 3, 3, 2, 1, 4, 4]\n",
179 | "[49.95557580237343, 81.91935377141675, 80.02776603979912, 82.7514263833196, 78.09782664417831, 62.4090858819203, 43.79221621872338, 18.10747482296797, 52.72102411795976, 91.61301738516033]\n"
180 | ]
181 | }
182 | ],
183 | "source": [
184 | "print(Time_series2.BN_Nodes[0][3])\n",
185 | "print(Time_series2.BN_Nodes[1][3])\n",
186 | "print(Time_series2.BN_Nodes[2][3])"
187 | ]
188 | },
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "## Architecture 3\n",
194 | "\n",
195 | "### Similar to Architecture 1 but with loopback 2 for the middle node."
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 8,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": [
204 | "T=10\n",
205 | "N=1000\n",
206 | "N_level=[2,4]\n",
207 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n",
208 | "Node_Type=['D','D','C']\n",
209 | "\n",
210 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
211 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
212 | "}}\n",
213 | "Parent={'0':[],'1':[0],'2':[0,1]}\n",
214 | "\n",
215 | "\n",
216 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n",
217 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
218 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
219 | "}}\n",
220 | "\n",
221 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n",
222 | "\n",
223 | "loopbacks={'00':[1],'11':[1]}\n",
224 | "\n",
225 | "CPD3={'00':[[0.7,0.3],[0.2,0.8]],'0111':[[0.7,0.2,0.1,0],[0.6,0.3,0.1,0],[0.3,0.5,0.2,0],\n",
226 | "[0.3,0.4,0.15,0.15],[0.5,0.4,0.05,0.05],[0.5,0.4,0.05,0.05],[0.25,0.45,0.15,0.15],[0.2,0.4,0.3,0.1],[0.3,0.5,0.2,0],[0.25,0.45,0.15,0.15],\n",
227 | "[0.1,0.45,0.3,0.15],[0.05,0.45,0.3,0.2],[0.3,0.4,0.15,0.15],[0.2,0.4,0.3,0.1],[0.05,0.45,0.3,0.2],[0.1,0.3,0.4,0.2],[0.35,0.35,0.2,0.1],[0.25,0.45,0.2,0.1],[0.1,0.2,0.5,0.2],[0.05,0.25,0.5,0.2],\n",
228 | "[0.25,0.45,0.2,0.1],[0.05,0.35,0.5,0.1],[0.05,0.25,0.45,0.25],[0.05,0.2,0.35,0.4],[0.1,0.2,0.5,0.2],[0.05,0.25,0.45,0.25],[0.05,0.15,0.3,0.5],\n",
229 | " [0.05,0.1,0.3,0.55],[0.05,0.25,0.5,0.2],[0.05,0.2,0.35,0.4],[0.05,0.1,0.3,0.55],[0,0,0.2,0.8]],'012':{'mu0':10,'sigma0':2,'mu1':20,'sigma1':3,\n",
230 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':75,'sigma6':3,'mu7':90,'sigma7':3\n",
231 | "}}\n",
232 | "\n",
233 | "Parent3={'0':[0],'1':[0,1,1],'2':[0,1]}\n",
234 | "\n",
235 | "loopbacks2={'00':[1],'11':[2,1]}\n",
236 | "\n",
237 | "\n",
238 | "Time_series3=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks,CPD3,Parent3,loopbacks2)\n",
239 | "\n",
240 | "Time_series3.BN_sample_gen_loopback()"
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": 9,
246 | "metadata": {},
247 | "outputs": [
248 | {
249 | "name": "stdout",
250 | "output_type": "stream",
251 | "text": [
252 | "[2, 1, 1, 2, 2, 2, 1, 1, 2, 2]\n",
253 | "[2, 1, 1, 1, 2, 3, 4, 3, 4, 2]\n",
254 | "[53.87472154213914, 6.495627168596188, 9.418392120518126, 19.646516517998936, 44.525380117118665, 75.87802769068614, 75.68517757174538, 56.29952829580641, 93.33191414647266, 45.862343348068705]\n"
255 | ]
256 | }
257 | ],
258 | "source": [
259 | "print((Time_series3.BN_Nodes[0][1]))\n",
260 | "print(Time_series3.BN_Nodes[1][1])\n",
261 | "print(Time_series3.BN_Nodes[2][1])"
262 | ]
263 | },
264 | {
265 | "cell_type": "markdown",
266 | "metadata": {},
267 | "source": [
268 | "## Architecture 4\n",
269 | "\n",
270 | "### 3 Discrete and 2 Continuous Nodes. Each Node is Connected to Itself Across Time"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": 13,
276 | "metadata": {},
277 | "outputs": [],
278 | "source": [
279 | "T=10\n",
280 | "N=1000\n",
281 | "N_level=[2,2,2]\n",
282 | "Mat=pd.DataFrame(np.array(([0,1,1,1,1],[0,0,1,1,1],[0,0,0,1,1],[0,0,0,0,0],[0,0,0,0,0]))) \n",
283 | "Node_Type=['D','D','D','C','C']\n",
284 | "\n",
285 | "CPD={'0':[0.6,0.4],'01':[[0.7,0.3],[0.3,0.7]],'012':[[0.9,0.1],[0.4,0.6],[0.6,0.4],[0.1,0.9]],\n",
286 | " '0123':{'mu0':5,'sigma0':2,'mu1':10,'sigma1':3,'mu2':20,'sigma2':2,'mu3':50,'sigma3':3,'mu4':20,'sigma4':2,'mu5':40,'sigma5':3,'mu6':50,'sigma6':5,'mu7':80,'sigma7':3,\n",
287 | " },'0124':{'mu0':500,'sigma0':10,'mu1':480,'sigma1':13,'mu2':450,'sigma2':10,'mu3':400,'sigma3':13,'mu4':400,'sigma4':10,'mu5':300,'sigma5':10,'mu6':250,'sigma6':10,'mu7':100,'sigma7':5}}\n",
288 | "\n",
289 | "Parent={'0':[],'1':[0],'2':[0,1],'3':[0,1,2],'4':[0,1,2]}\n",
290 | "\n",
291 | "CPD2={'00':[[0.6,0.4],[0.2,0.8]],'011':[[0.8,0.2],[0.6,0.4],[0.7,0.3],[0.2,0.8]],\n",
292 | " '0122':[[0.9,0.1],[0.7,0.3],[0.7,0.3],[0.28,0.72],[0.7,0.3],[0.28,0.72],[0.28,0.72],[0.1,0.9]],'01233':{'33':{'coefficient':[np.linspace(0.6,0.8,8).tolist()]},'sigma_intercept':np.linspace(0.6,3,8).tolist(),'sigma':np.linspace(3,4,8).tolist()},'01244':{'44':{'coefficient':[np.linspace(0.6,1.3,8).tolist()]},\n",
293 | " 'sigma_intercept':np.linspace(2,5,8).tolist(),'sigma':np.linspace(3,4,8).tolist()}}\n",
294 | "\n",
295 | "\n",
296 | "loopbacks={'00':[1],'11':[1],'22':[1],'33':[1],'44':[1]} \n",
297 | "\n",
298 | "Parent2={'0':[0],'1':[0,1],'2':[0,1,2],'3':[0,1,2,3],'4':[0,1,2,4]}\n",
299 | "\n",
300 | "\n",
301 | "Time_series4=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
302 | "\n",
303 | "Time_series4.BN_data_gen() "
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {},
309 | "source": [
310 | "## Architecture 5\n",
311 | "\n",
312 | "### HMM with loopback 1 "
313 | ]
314 | },
315 | {
316 | "cell_type": "code",
317 | "execution_count": 18,
318 | "metadata": {},
319 | "outputs": [
320 | {
321 | "name": "stdout",
322 | "output_type": "stream",
323 | "text": [
324 | "Total Time is 2.0737037658691406\n"
325 | ]
326 | }
327 | ],
328 | "source": [
329 | "import time\n",
330 | "START=time.time()\n",
331 | "T=20\n",
332 | "N=1000\n",
333 | "N_level=[4]\n",
334 | "Mat=pd.DataFrame(np.array(([0,1],[0,0]))) # HMM\n",
335 | "Node_Type=['D','C']\n",
336 | "\n",
337 | "CPD={'0':[0.25,0.25,0.25,0.25],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n",
338 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n",
339 | "\n",
340 | "Parent={'0':[],'1':[0]}\n",
341 | "\n",
342 | "\n",
343 | "CPD2={'00':[[0.6,0.3,0.05,0.05],[0.25,0.4,0.25,0.1],[0.1,0.3,0.4,0.2],[0.05,0.05,0.4,0.5]],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n",
344 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5\n",
345 | "}}\n",
346 | "\n",
347 | "loopbacks={'00':[1]}\n",
348 | "\n",
349 | "Parent2={'0':[0],'1':[0]}\n",
350 | "\n",
351 | "\n",
352 | "Time_series5=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
353 | "\n",
354 | "Time_series5.BN_data_gen() \n",
355 | "FINISH=time.time()\n",
356 | "print('Total Time is',FINISH-START)"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 19,
362 | "metadata": {
363 | "scrolled": true
364 | },
365 | "outputs": [
366 | {
367 | "name": "stdout",
368 | "output_type": "stream",
369 | "text": [
370 | "[3, 4, 3, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3]\n",
371 | "[60.072920827425, 82.74836461790208, 57.3652355700002, 15.808715408345048, 42.11400427111718, 33.940916747529904, 39.38294267545808, 56.77532334095553, 55.225363073651565, 80.31120583537712, 72.83731788645105, 78.75540628414659, 83.74799152447356, 80.85359737561909, 83.89681465845214, 73.10191951112242, 83.32808287493899, 52.70302656588948, 74.34544473307433, 61.67002934637476]\n"
372 | ]
373 | }
374 | ],
375 | "source": [
376 | "print(Time_series5.BN_Nodes[0][1])\n",
377 | "print(Time_series5.BN_Nodes[1][1])"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": null,
383 | "metadata": {},
384 | "outputs": [],
385 | "source": []
386 | }
387 | ],
388 | "metadata": {
389 | "kernelspec": {
390 | "display_name": "Python 3",
391 | "language": "python",
392 | "name": "python3"
393 | },
394 | "language_info": {
395 | "codemirror_mode": {
396 | "name": "ipython",
397 | "version": 3
398 | },
399 | "file_extension": ".py",
400 | "mimetype": "text/x-python",
401 | "name": "python",
402 | "nbconvert_exporter": "python",
403 | "pygments_lexer": "ipython3",
404 | "version": "3.8.2"
405 | }
406 | },
407 | "nbformat": 4,
408 | "nbformat_minor": 4
409 | }
410 |
--------------------------------------------------------------------------------
/License.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Manie Tadayon
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |    
2 | 
3 |
4 |
5 |
6 |
7 |
8 |
9 | **tsBNgen: A Python Library to Generate Time Series Data Based on an Arbitrary Bayesian Network Structure**
10 |
11 | [Description](#Description)
12 |
13 | [Citation](#Citation)
14 |
15 | [Features](#Features)
16 |
17 | [Instruction](#Instruction)
18 |
19 | [License](#License)
20 |
21 | ----
22 |
23 | ### **Description**
24 |
25 | #### tsBNgen is a Python package to generate time series data based on an arbitrary Bayesian network structure.
26 | ---
27 | ### **Citation**
28 |
29 | #### If you find this package useful or if you use it in your research or work please consider citing it as follows:
30 | ```
31 | @article{tadayon2020tsbngen,
32 | title={tsBNgen: A Python Library to Generate Time Series Data from an Arbitrary Dynamic Bayesian Network Structure},
33 | author={Tadayon, Manie and Pottie, Greg},
34 | journal={arXiv preprint arXiv:2009.04595},
35 | year={2020}
36 | }
37 | ```
38 | ----
39 | ### **Features**
40 |
41 | - It handles discrete nodes, continuous nodes, and hybrid (mixture of discrete and continuous) networks.
42 |
43 | - It uses a multinomial distribution for the discrete nodes and a Gaussian distribution for the continuous nodes.
44 |
45 | - It handles arbitrary Bayesian network structure.
46 |
47 | - It supports arbitrary loopback values.
48 |
49 | - The code can be modified easily to handle arbitrary static and temporal structures.
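
As a rough illustration of the second feature above, the following NumPy sketch draws a discrete node from a multinomial distribution and a continuous child from a Gaussian whose parameters depend on the parent's level. It does not call tsBNgen or reproduce its internals; the names `level_probs`, `child_mu`, `child_sigma`, and `sample_pair` are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameters (not read from tsBNgen): a 4-level discrete parent
# and a Gaussian child whose mean and standard deviation depend on that level.
level_probs = [0.25, 0.25, 0.25, 0.25]
child_mu = [20.0, 40.0, 60.0, 80.0]
child_sigma = [5.0, 5.0, 5.0, 5.0]


def sample_pair(rng):
    """Draw one (discrete parent, continuous child) pair."""
    # A multinomial draw with a single trial is a one-hot vector;
    # argmax recovers the index of the sampled level.
    level = int(np.argmax(rng.multinomial(1, level_probs)))
    value = float(rng.normal(child_mu[level], child_sigma[level]))
    # Report levels as 1-based, matching the printed output in the example notebook.
    return level + 1, value


pairs = [sample_pair(rng) for _ in range(1000)]
```

Each element of `pairs` is a `(level, value)` tuple, so grouping the values by level should give sample means close to the corresponding entry of `child_mu`.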
50 | ---
51 |
52 | ### **Instruction**
53 |
54 | To run this code, either clone this repo or install the package from PyPI with the following command:
55 |
56 | ```shell
57 | pip install tsBNgen
58 | ```
59 |
60 | Then run through the set of examples in
61 |
62 | > **Time_Series_Generation_Examples.ipynb**
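
To make the idea of a loopback of 1 more concrete before opening the notebook, here is a plain NumPy sketch of a hidden Markov model that borrows its numbers from the "Architecture 5" cell. It is not tsBNgen's implementation: the function `sample_hmm` is hypothetical, and the sketch assumes that row `i` of the transition table is the distribution of the state at time t given state `i` at time t-1.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Numbers borrowed from the "Architecture 5" cell of the example notebook.
initial = [0.25, 0.25, 0.25, 0.25]        # distribution of the state at t = 0
transition = [                            # assumption: row i = P(state at t | state i at t-1)
    [0.6, 0.3, 0.05, 0.05],
    [0.25, 0.4, 0.25, 0.1],
    [0.1, 0.3, 0.4, 0.2],
    [0.05, 0.05, 0.4, 0.5],
]
mu = [20.0, 40.0, 60.0, 80.0]             # Gaussian mean of the observation per state
sigma = [5.0, 5.0, 5.0, 5.0]              # Gaussian standard deviation per state


def sample_hmm(T, rng):
    """Generate one series of T hidden states and T continuous observations."""
    states, observations = [], []
    for t in range(T):
        # Loopback of 1: the state at time t depends only on the state at t-1.
        probs = initial if t == 0 else transition[states[-1]]
        state = int(rng.choice(4, p=probs))
        states.append(state)
        observations.append(float(rng.normal(mu[state], sigma[state])))
    # Report states as 1-based, matching the printed output in the notebook.
    return [s + 1 for s in states], observations


states, observations = sample_hmm(20, rng)
```

Calling `sample_hmm` repeatedly yields independent series, which is roughly what the `N` parameter controls in the notebook examples.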
63 |
64 | For more information on how to use the package please visit the following:
65 |
66 | 1. Watch my YouTube tutorial (I go over the package).
67 | 2. Watch the videos.
68 | 3. Original paper.
69 | 4. Documentation in PDF available in this repository.
70 |
71 | ### **License**
72 |
73 | This software is released under the MIT license.
74 |
75 |
76 |
77 |
78 |
79 |
80 |
81 |
82 |
83 |
84 |
85 |
86 |
87 |
88 |
--------------------------------------------------------------------------------
/Time_Series_Generation_Examples.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Examples Companion with Documentation\n",
8 | "\n",
9 | "**Description**\n",
10 | "> #### tsBNgen is a Python library to generate time series data based on an arbitrary dynamic Bayesian network. The intention behind writing tsBNgen is to let researchers generate time series data according to any model they want. \n",
11 | "\n",
12 | "> #### tsBNgen is released under the MIT license. \n",
13 | "\n",
14 | "**Instruction** \n",
15 | "\n",
16 | "> #### 1. Either clone this repository https://github.com/manitadayon/tsBNgen or install the package using pip install tsBNgen.\n",
17 | "> #### Then import necessary libraries using the following commands:\n",
18 | "```python\n",
19 | "from tsBNgen import *\n",
20 | "from tsBNgen.tsBNgen import * \n",
21 | "```\n",
22 | "> #### There are in general two functions you should be running if you want to generate data:\n",
23 | "\n",
24 | "> - BN_data_gen(): Use this function under the following conditions:\n",
25 | " custom_time variable is not specified and the value of the loopback for all the variables is at most 1.\n",
26 | " \n",
27 | "---- \n",
28 | "\n",
29 | "**Note**: condition 1 describes the classical dynamic Bayesian network in which some nodes at time t-1 are connected to themselves at time t. \n",
30 | "\n",
31 | "> - BN_sample_gen_loopback(): \n",
32 | " 1. custom_time is not specified and you want the loopback value for some nodes to be at most 2.\n",
33 | " 2. custom_time is specified and it is at least equal to the maximum loopback value of the loopbacks2.\n",
34 | "\n",
35 | "> #### The following explains some of the variables and parameters in tsBNgen:\n",
36 | "> - **T** : Length of each time series.\n",
37 | "> - **N** : Number of samples.\n",
38 | "> - **N_level** : list. Number of possible levels for discrete nodes.\n",
39 | "> - **Mat** : data-frame. Adjacency matrix for each time point.\n",
40 | "> - **Node_Type** : list. Type of each variable in Bayesian Network.\n",
41 | "> - **CPD** : dict. Conditional Probability Distribution for initial time point.\n",
42 | "> - **Parent** : dict. Identifying parent of each node in Bayesian network at initial time.\n",
43 | "> - **CPD2** : dict. Conditional Probability Distribution.\n",
44 | "> - **Parent2** : dict. Identifying parent of each node in Bayesian network.\n",
45 | "> - **loopbacks** : dict. Describing the temporal interconnection between nodes.\n",
46 | "> - **CPD3** : dict. Conditional Probability Distribution. Use this entry when BN_sample_gen_loopback() is called.\n",
47 | "> - **Parent3** : dict. Identifying parent of each node in Bayesian network. Use this entry when \n",
48 | "> BN_sample_gen_loopback() is called.\n",
49 | "> - **loopbacks2** : dict. Describing the temporal interconnection between nodes. Use this entry when \n",
50 | "> BN_sample_gen_loopback() is called.\n",
51 | " \n",
52 | " \n"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "### Import Necessary Files"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 2,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "from tsBNgen import *\n",
69 | "from tsBNgen.tsBNgen import * "
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "## Architecture 1\n",
77 | "\n",
78 | "### Two Discrete and One Continuous Nodes."
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 3,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": [
87 | "T=20\n",
88 | "N=2000\n",
89 | "N_level=[2,4]\n",
90 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n",
91 | "Node_Type=['D','D','C']\n",
92 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
93 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
94 | "}}\n",
95 | "Parent={'0':[],'1':[0],'2':[0,1]}\n",
96 | "\n",
97 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n",
98 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
99 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
100 | "}}\n",
101 | "\n",
102 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n",
103 | "loopbacks={'00':[1],'11':[1]}\n",
104 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n",
105 | "Time_series1=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
106 | "Time_series1.BN_data_gen()"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": 4,
112 | "metadata": {},
113 | "outputs": [
114 | {
115 | "name": "stdout",
116 | "output_type": "stream",
117 | "text": [
118 | "[1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1]\n",
119 | "[1, 1, 2, 2, 4, 4, 4, 3, 2, 3, 4, 2, 2, 4, 4, 3, 4, 3, 2, 1]\n",
120 | "[7.808894461133653, 11.32725466660192, 45.60586730086184, 49.417565048817615, 91.96224489111819, 90.60949019682866, 90.56676813839593, 63.41098040448917, 34.83786809684297, 69.03588167756895, 95.48196749276904, 22.059789834337842, 52.4413443699694, 89.17064709084299, 90.74716862100209, 75.25321342376877, 93.21277916279126, 53.755213101836375, 21.726853444296204, 11.167892171240036]\n"
121 | ]
122 | }
123 | ],
124 | "source": [
125 | "print(Time_series1.BN_Nodes[0][3])\n",
126 | "print(Time_series1.BN_Nodes[1][3])\n",
127 | "print(Time_series1.BN_Nodes[2][3])"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## Architecture 2\n"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 5,
140 | "metadata": {
141 | "scrolled": true
142 | },
143 | "outputs": [],
144 | "source": [
145 | "T=10\n",
146 | "N=1000\n",
147 | "N_level=[2,4]\n",
148 | "Mat=pd.DataFrame(np.array(([0,1,0],[0,0,1],[0,0,0])))\n",
149 | "Node_Type=['D','D','C']\n",
150 | "CPD={'0':[0.5,0.5],'01':[[0.6,0.3,0.05,0.05],[0.1,0.2,0.3,0.4]],'12':{'mu0':10,'sigma0':5,'mu1':30,'sigma1':5,\n",
151 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n",
152 | "Parent={'0':[],'1':[0],'2':[1]}\n",
153 | "\n",
154 | "CPD2={'00':[[0.7,0.3],[0.3,0.7]],'0011':[[0.7,0.2,0.1,0],[0.5,0.4,0.1,0],[0.45,0.45,0.1,0],\n",
155 | "[0.3,0.4,0.2,0.1],[0.4,0.4,0.1,0.1],[0.2,0.3,0.3,0.2],[0.2,0.3,0.3,0.2],[0.1,0.2,0.3,0.4],[0.3,0.4,0.2,0.1],[0.2,0.2,0.4,0.2],\n",
156 | " [0.2,0.1,0.4,0.3],[0.05,0.15,0.3,0.5],[0.1,0.3,0.3,0.3],[0,0.1,0.3,0.6],[0,0.1,0.2,0.7],[0,0,0.3,0.7]],'112':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':2,\n",
157 | " 'mu2':50,'sigma2':2,'mu3':60,'sigma3':5,'mu4':20,'sigma4':2,'mu5':25,'sigma5':5,'mu6':50,'sigma6':5,'mu7':60,'sigma7':5,\n",
158 | " 'mu8':40,'sigma8':5,'mu9':50,'sigma9':5,'mu10':70,'sigma10':5,'mu11':85,'sigma11':2,'mu12':60,'sigma12':5, \n",
159 | " 'mu13':60,'sigma13':5,'mu14':80,'sigma14':3,'mu15':90,'sigma15':3}}\n",
160 | "\n",
161 | "Parent2={'0':[0],'1':[0,0,1],'2':[1,1]}\n",
162 | "loopbacks={'00':[1], '01':[1],'11':[1],'12':[1]}\n",
163 | "\n",
164 | "Time_series2=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
165 | "Time_series2.BN_data_gen()"
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": 6,
171 | "metadata": {},
172 | "outputs": [
173 | {
174 | "name": "stdout",
175 | "output_type": "stream",
176 | "text": [
177 | "[2, 2, 2, 2, 2, 1, 1, 2, 2, 2]\n",
178 | "[3, 4, 3, 4, 3, 3, 2, 1, 4, 4]\n",
179 | "[49.95557580237343, 81.91935377141675, 80.02776603979912, 82.7514263833196, 78.09782664417831, 62.4090858819203, 43.79221621872338, 18.10747482296797, 52.72102411795976, 91.61301738516033]\n"
180 | ]
181 | }
182 | ],
183 | "source": [
184 | "print(Time_series2.BN_Nodes[0][3])\n",
185 | "print(Time_series2.BN_Nodes[1][3])\n",
186 | "print(Time_series2.BN_Nodes[2][3])"
187 | ]
188 | },
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "## Architecture 3\n",
194 | "\n",
195 | "### Similar to Architecture 1 but with loopback 2 for the middle node."
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 8,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": [
204 | "T=10\n",
205 | "N=1000\n",
206 | "N_level=[2,4]\n",
207 | "Mat=pd.DataFrame(np.array(([0,1,1],[0,0,1],[0,0,0])))\n",
208 | "Node_Type=['D','D','C']\n",
209 | "\n",
210 | "CPD={'0':[0.6,0.4],'01':[[0.5,0.3,0.15,0.05],[0.1,0.15,0.3,0.45]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
211 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
212 | "}}\n",
213 | "Parent={'0':[],'1':[0],'2':[0,1]}\n",
214 | "\n",
215 | "\n",
216 | "CPD2={'00':[[0.7,0.3],[0.2,0.8]],'011':[[0.7,0.2,0.1,0],[0.6,0.3,0.05,0.05],[0.35,0.5,0.15,0],\n",
217 | "[0.2,0.3,0.4,0.1],[0.3,0.3,0.2,0.2],[0.1,0.2,0.3,0.4],[0.05,0.15,0.3,0.5],[0,0.05,0.25,0.7]],'012':{'mu0':10,'sigma0':2,'mu1':30,'sigma1':5,\n",
218 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':70,'sigma6':5,'mu7':90,'sigma7':3\n",
219 | "}}\n",
220 | "\n",
221 | "Parent2={'0':[0],'1':[0,1],'2':[0,1]}\n",
222 | "\n",
223 | "loopbacks={'00':[1],'11':[1]}\n",
224 | "\n",
225 | "CPD3={'00':[[0.7,0.3],[0.2,0.8]],'0111':[[0.7,0.2,0.1,0],[0.6,0.3,0.1,0],[0.3,0.5,0.2,0],\n",
226 | "[0.3,0.4,0.15,0.15],[0.5,0.4,0.05,0.05],[0.5,0.4,0.05,0.05],[0.25,0.45,0.15,0.15],[0.2,0.4,0.3,0.1],[0.3,0.5,0.2,0],[0.25,0.45,0.15,0.15],\n",
227 | "[0.1,0.45,0.3,0.15],[0.05,0.45,0.3,0.2],[0.3,0.4,0.15,0.15],[0.2,0.4,0.3,0.1],[0.05,0.45,0.3,0.2],[0.1,0.3,0.4,0.2],[0.35,0.35,0.2,0.1],[0.25,0.45,0.2,0.1],[0.1,0.2,0.5,0.2],[0.05,0.25,0.5,0.2],\n",
228 | "[0.25,0.45,0.2,0.1],[0.05,0.35,0.5,0.1],[0.05,0.25,0.45,0.25],[0.05,0.2,0.35,0.4],[0.1,0.2,0.5,0.2],[0.05,0.25,0.45,0.25],[0.05,0.15,0.3,0.5],\n",
229 | " [0.05,0.1,0.3,0.55],[0.05,0.25,0.5,0.2],[0.05,0.2,0.35,0.4],[0.05,0.1,0.3,0.55],[0,0,0.2,0.8]],'012':{'mu0':10,'sigma0':2,'mu1':20,'sigma1':3,\n",
230 | " 'mu2':50,'sigma2':5,'mu3':70,'sigma3':5,'mu4':15,'sigma4':5,'mu5':50,'sigma5':5,'mu6':75,'sigma6':3,'mu7':90,'sigma7':3\n",
231 | "}}\n",
232 | "\n",
233 | "Parent3={'0':[0],'1':[0,1,1],'2':[0,1]}\n",
234 | "\n",
235 | "loopbacks2={'00':[1],'11':[2,1]}\n",
236 | "\n",
237 | "\n",
238 | "Time_series3=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks,CPD3,Parent3,loopbacks2)\n",
239 | "\n",
240 | "Time_series3.BN_sample_gen_loopback()"
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": 9,
246 | "metadata": {},
247 | "outputs": [
248 | {
249 | "name": "stdout",
250 | "output_type": "stream",
251 | "text": [
252 | "[2, 1, 1, 2, 2, 2, 1, 1, 2, 2]\n",
253 | "[2, 1, 1, 1, 2, 3, 4, 3, 4, 2]\n",
254 | "[53.87472154213914, 6.495627168596188, 9.418392120518126, 19.646516517998936, 44.525380117118665, 75.87802769068614, 75.68517757174538, 56.29952829580641, 93.33191414647266, 45.862343348068705]\n"
255 | ]
256 | }
257 | ],
258 | "source": [
259 | "print((Time_series3.BN_Nodes[0][1]))\n",
260 | "print(Time_series3.BN_Nodes[1][1])\n",
261 | "print(Time_series3.BN_Nodes[2][1])"
262 | ]
263 | },
264 | {
265 | "cell_type": "markdown",
266 | "metadata": {},
267 | "source": [
268 | "## Architecture 4\n",
269 | "\n",
270 | "### 3 Discrete and 2 Continuous Nodes. Each Node is Connected to Itself Across Time"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": 13,
276 | "metadata": {},
277 | "outputs": [],
278 | "source": [
279 | "T=10\n",
280 | "N=1000\n",
281 | "N_level=[2,2,2]\n",
282 | "Mat=pd.DataFrame(np.array(([0,1,1,1,1],[0,0,1,1,1],[0,0,0,1,1],[0,0,0,0,0],[0,0,0,0,0]))) \n",
283 | "Node_Type=['D','D','D','C','C']\n",
284 | "\n",
285 | "CPD={'0':[0.6,0.4],'01':[[0.7,0.3],[0.3,0.7]],'012':[[0.9,0.1],[0.4,0.6],[0.6,0.4],[0.1,0.9]],\n",
286 | " '0123':{'mu0':5,'sigma0':2,'mu1':10,'sigma1':3,'mu2':20,'sigma2':2,'mu3':50,'sigma3':3,'mu4':20,'sigma4':2,'mu5':40,'sigma5':3,'mu6':50,'sigma6':5,'mu7':80,'sigma7':3,\n",
287 | " },'0124':{'mu0':500,'sigma0':10,'mu1':480,'sigma1':13,'mu2':450,'sigma2':10,'mu3':400,'sigma3':13,'mu4':400,'sigma4':10,'mu5':300,'sigma5':10,'mu6':250,'sigma6':10,'mu7':100,'sigma7':5}}\n",
288 | "\n",
289 | "Parent={'0':[],'1':[0],'2':[0,1],'3':[0,1,2],'4':[0,1,2]}\n",
290 | "\n",
291 | "CPD2={'00':[[0.6,0.4],[0.2,0.8]],'011':[[0.8,0.2],[0.6,0.4],[0.7,0.3],[0.2,0.8]],\n",
292 | " '0122':[[0.9,0.1],[0.7,0.3],[0.7,0.3],[0.28,0.72],[0.7,0.3],[0.28,0.72],[0.28,0.72],[0.1,0.9]],'01233':{'33':{'coefficient':[np.linspace(0.6,0.8,8).tolist()]},'sigma_intercept':np.linspace(0.6,3,8).tolist(),'sigma':np.linspace(3,4,8).tolist()},'01244':{'44':{'coefficient':[np.linspace(0.6,1.3,8).tolist()]},\n",
293 | " 'sigma_intercept':np.linspace(2,5,8).tolist(),'sigma':np.linspace(3,4,8).tolist()}}\n",
294 | "\n",
295 | "\n",
296 | "loopbacks={'00':[1],'11':[1],'22':[1],'33':[1],'44':[1]} \n",
297 | "\n",
298 | "Parent2={'0':[0],'1':[0,1],'2':[0,1,2],'3':[0,1,2,3],'4':[0,1,2,4]}\n",
299 | "\n",
300 | "\n",
301 | "Time_series4=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
302 | "\n",
303 | "Time_series4.BN_data_gen() "
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {},
309 | "source": [
310 | "## Architecture 5\n",
311 | "\n",
312 | "### HMM with loopback 1 "
313 | ]
314 | },
315 | {
316 | "cell_type": "code",
317 | "execution_count": 18,
318 | "metadata": {},
319 | "outputs": [
320 | {
321 | "name": "stdout",
322 | "output_type": "stream",
323 | "text": [
324 | "Total Time is 2.0737037658691406\n"
325 | ]
326 | }
327 | ],
328 | "source": [
329 | "import time\n",
330 | "START=time.time()\n",
331 | "T=20\n",
332 | "N=1000\n",
333 | "N_level=[4]\n",
334 | "Mat=pd.DataFrame(np.array(([0,1],[0,0]))) # HMM\n",
335 | "Node_Type=['D','C']\n",
336 | "\n",
337 | "CPD={'0':[0.25,0.25,0.25,0.25],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n",
338 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5}}\n",
339 | "\n",
340 | "Parent={'0':[],'1':[0]}\n",
341 | "\n",
342 | "\n",
343 | "CPD2={'00':[[0.6,0.3,0.05,0.05],[0.25,0.4,0.25,0.1],[0.1,0.3,0.4,0.2],[0.05,0.05,0.4,0.5]],'01':{'mu0':20,'sigma0':5,'mu1':40,'sigma1':5,\n",
344 | " 'mu2':60,'sigma2':5,'mu3':80,'sigma3':5\n",
345 | "}}\n",
346 | "\n",
347 | "loopbacks={'00':[1]}\n",
348 | "\n",
349 | "Parent2={'0':[0],'1':[0]}\n",
350 | "\n",
351 | "\n",
352 | "Time_series5=tsBNgen(T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks)\n",
353 | "\n",
354 | "Time_series5.BN_data_gen() \n",
355 | "FINISH=time.time()\n",
356 | "print('Total Time is',FINISH-START)"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 19,
362 | "metadata": {
363 | "scrolled": true
364 | },
365 | "outputs": [
366 | {
367 | "name": "stdout",
368 | "output_type": "stream",
369 | "text": [
370 | "[3, 4, 3, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3]\n",
371 | "[60.072920827425, 82.74836461790208, 57.3652355700002, 15.808715408345048, 42.11400427111718, 33.940916747529904, 39.38294267545808, 56.77532334095553, 55.225363073651565, 80.31120583537712, 72.83731788645105, 78.75540628414659, 83.74799152447356, 80.85359737561909, 83.89681465845214, 73.10191951112242, 83.32808287493899, 52.70302656588948, 74.34544473307433, 61.67002934637476]\n"
372 | ]
373 | }
374 | ],
375 | "source": [
376 | "print(Time_series5.BN_Nodes[0][1])\n",
377 | "print(Time_series5.BN_Nodes[1][1])"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": null,
383 | "metadata": {},
384 | "outputs": [],
385 | "source": []
386 | }
387 | ],
388 | "metadata": {
389 | "kernelspec": {
390 | "display_name": "Python 3",
391 | "language": "python",
392 | "name": "python3"
393 | },
394 | "language_info": {
395 | "codemirror_mode": {
396 | "name": "ipython",
397 | "version": 3
398 | },
399 | "file_extension": ".py",
400 | "mimetype": "text/x-python",
401 | "name": "python",
402 | "nbconvert_exporter": "python",
403 | "pygments_lexer": "ipython3",
404 | "version": "3.8.2"
405 | }
406 | },
407 | "nbformat": 4,
408 | "nbformat_minor": 4
409 | }
410 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 |
3 | with open("README.md", "r") as fh:
4 | long_description = fh.read()
5 |
6 | setup(
7 | name='tsBNgen',
8 | version='1.0.0',
9 | author='Manie Tadayon',
10 | author_email='manitadayon@ucla.edu',
11 | description='Generate time series data from an arbitrary Bayesian network',
12 | packages=['tsBNgen'],
13 | license='MIT',
14 | long_description=long_description,
15 | long_description_content_type="text/markdown",
16 | url='https://github.com/manitadayon/tsBNgen',
17 | classifiers=[
18 | "Programming Language :: Python :: 3",
19 | "License :: OSI Approved :: MIT License",
20 | "Operating System :: OS Independent",
21 | ],
22 | )
--------------------------------------------------------------------------------
/tsBNgen/__init__.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | from functools import reduce
4 |
5 |
--------------------------------------------------------------------------------
/tsBNgen/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/manitadayon/tsBNgen/1d54de78e9a4405e7a0049ccab89093cd7f0d094/tsBNgen/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/tsBNgen/__pycache__/tsBNgen.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/manitadayon/tsBNgen/1d54de78e9a4405e7a0049ccab89093cd7f0d094/tsBNgen/__pycache__/tsBNgen.cpython-38.pyc
--------------------------------------------------------------------------------
/tsBNgen/tsBNgen.py:
--------------------------------------------------------------------------------
1 | from tsBNgen import *
2 | class tsBNgen:
3 | def __init__(self,T,N,N_level,Mat,Node_Type,CPD,Parent,CPD2,Parent2,loopbacks,CPD3=None,Parent3=None,loopbacks2=None,custom_time=0):
4 | '''
5 | A class to generate time series according to arbitrary dynamic Bayesian network structure.
6 |
7 | Attributes
8 | -----------
9 | T : int
10 | Length of each time series.
11 |
12 | N : int
13 | Number of time series.
14 |
15 | N_level : list
16 | Number of levels for the discrete nodes. Ignore this for the continuous nodes.
17 |
18 | Mat : data-frame
19 | Adjacency matrix corresponding to the Bayesian network at initial time.
20 |
21 | Node_Type : list
22 | Identifying nodes as either discrete "D" or continuous "C".
23 |
24 | CPD : dict
25 | Probability distribution of the nodes at initial time point.
26 |
27 | Parent : dict
28 | Parents of each node at initial time point.
29 |
30 | CPD2 : dict
31 | Probability distribution of the nodes after initial time.
32 |
33 | Parent2 : dict
34 | Parents of each node for time points after the initial time point.
35 |
36 | loopbacks : dict
37 | Determines the temporal connection between the nodes.
38 |
39 | CPD3 : dict
40 | Probability distribution for the nodes. Use this entry when BN_sample_gen_loopback() is called.
41 | It defaults to empty.
42 |
43 | Parent3 : dict
44 | Parents of each node after the initial time point. It defaults to empty.
45 | Use this entry when BN_sample_gen_loopback() is called.
46 |
47 | loopbacks2 : dict
48 | Determines the temporal connection between nodes. It defaults to empty.
49 | Use this entry when BN_sample_gen_loopback() is called.
50 |
51 | custom_time: int
52 | Determines at which time point, the new BN is used. The default is 0, which means
53 | the program learns it automatically from the loopbacks entry.
54 |
55 | Methods
56 | ------------
57 | BFS(Row)
58 | Perform Breadth-first search for the given node(row).
59 |
60 | zero_loc(List)
61 | Find the index of zero values in a list.
62 |
63 | Role_Assignment()
64 | Identify the root node.
65 |
66 | DAG_ordering()
67 | Find the topological ordering of the graph.
68 |
69 | Child(Row)
70 | Finds the children of the node specified by the row of the adjacency matrix.
71 |
72 | Multinomial_Select(index1,index2,ii)
73 | Generate sample according to the Multinomial distribution.
74 |
75 | Roots_length()
76 | Identifying the number of root nodes.
77 |
78 | int_to_str(List)
79 | Concatenates list elements to a string.
80 |
81 | Valid_BN(parent)
82 | Verify whether the parent-child relationships between nodes are valid.
83 |
84 | Initial_sample()
85 | Generate samples for all the nodes at initial time (t=0).
86 |
87 | Gaussian_select(index1,index2,ii=0)
88 | Generate sample according to the Gaussian distribution.
89 |
90 | continous_cpd()
91 | Identify which CPD entry to sample from.
92 |
93 | BN_sample()
94 | Generate samples for all the nodes after the initial time.
95 |
96 | BN_data_gen()
97 | Use this function under the following conditions:
98 | custom_time variable is not specified and the value of the loopback for all the variables is at most 1
99 |
100 | BN_sample_loopback()
101 | Generate samples for all the nodes for time t=k.
102 |
103 | BN_sample_gen_loopback()
104 | custom_time is not specified and you want the loopback value for some nodes to be at most 2.
105 | custom_time is specified and it is at least equal to the maximum loopback value of the loopbacks2.
106 |
107 | '''
108 | self.T=T
109 | self.N=N
110 | self.Mat=Mat
111 | self.Node_Type=Node_Type
112 | self.CPD=CPD
113 | self.Node=[[] for ii in range(self.Mat.shape[0])]
114 | self.Parent=Parent
115 | self.N_level=N_level
116 | self.CPD2=CPD2
117 | self.Parent2=Parent2
118 | self.flag=0
119 | if CPD3 is None:
120 | CPD3={}
121 | self.CPD3=CPD3
122 | if Parent3 is None:
123 | Parent3={}
124 | self.Parent3=Parent3
125 | self.loopbacks=loopbacks
126 | if loopbacks2 is None:
127 | loopbacks2={}
128 | self.loopbacks2=loopbacks2
129 | self.custom_time=custom_time
130 |
131 | def BFS(self,Row):
132 | '''
133 | Perform Breadth-first search for the given node(row).
134 |
135 | Parameters
136 | --------------
137 |
138 | Row : int
139 | Corresponds to the row (node) in adjacency matrix.
140 |
141 | Returns
142 | --------------
143 | list
144 | The node and all its children.
145 | '''
146 | child = [Row]
147 | queue = []
148 | queue.extend(np.nonzero(self.Mat.iloc[Row, :].values)[0].tolist())
149 | while queue:
150 | vertex = queue.pop(0)
151 | if vertex not in child:
152 | child.append(vertex)
153 | queue.extend(np.nonzero(self.Mat.iloc[vertex, :].values)[0].tolist())
154 | return child
155 |
156 | @staticmethod
157 | def zero_loc(List):
158 | '''
159 | Find the index of zero values in a list.
160 |
161 | Parameters
162 | ------------
163 | List: list
164 |
165 | Returns
166 | -----------
167 | list
168 | indices of zero values in a list.
169 | '''
170 | ind=[count for count,ii in enumerate(List) if ii==0]
171 | return ind
172 |
173 | def __repr__(self):
174 |
175 | return f''' length of each time series is {self.T}
176 | Number of time series samples is {self.N}
177 | The adjacency matrix is {self.Mat}
178 | Node Types are {self.Node_Type}
179 | Conditional Probability Table for initial time is {self.CPD}
180 | Parents of each node at initial time are {self.Parent}
181 | The number of levels for each discrete variable is {self.N_level}
182 | The roles are {self.Role}
183 | Conditional Probability Table for t2 to t_loopback is {self.CPD2}
184 | BN parents for time t2 ...t_loopback for each node are {self.Parent2}
185 | Conditional Probability Table for t_loopback to tn is {self.CPD3}
186 | BN parents for time t_loopback ... tn for each node are {self.Parent3}
187 | '''
188 |
189 | def Role_Assignment(self):
190 | '''
191 | Identify the root node.
192 |
193 | Parameters
194 | ------------
195 | None
196 |
197 | Returns
198 | ------------
199 | None
200 |
201 | '''
202 | self.Role=[0]*len(self.top_order)
203 | for count,ii in enumerate(self.top_order):
204 | if(sum(self.Mat.iloc[:,ii]) ==0):
205 | self.Role[ii]=1
206 |
207 | def DAG_ordering(self):
208 | '''
209 | Find the topological ordering of the graph
210 |
211 | Parameters
212 | -------------
213 | None
214 |
215 | Returns
216 | ------------
217 | None
218 | '''
219 | queue = []
220 | in_degree = np.count_nonzero(self.Mat, axis=0).tolist()
221 | index = tsBNgen.zero_loc(in_degree)
222 | queue.extend(index)
223 | visited_count = 0
224 | self.top_order = []
225 | while queue:
226 | P = queue.pop(0)
227 | Neighbor = self.Child(P)
228 | self.top_order.append(P)
229 | for count, ii in enumerate(Neighbor):
230 | in_degree[ii] = in_degree[ii] - 1
231 | if (in_degree[ii] == 0):
232 | queue.append(ii)
233 |
234 | visited_count = visited_count + 1
235 | if (visited_count != self.Mat.shape[0]):
236 | print('DAG has a cycle')
237 |
238 | def Child(self, Row):
239 | '''
240 | Finds the children of the node specified by the row of the adjacency matrix.
241 |
242 | Parameters
243 | -----------
244 | Row : int
245 | The row in the adjacency matrix, corresponding to the same node in a Bayesian network.
246 |
247 | Returns
248 | ----------
249 | list
250 | All the children of the given node.
251 | '''
252 | child=[]
253 | child.extend(np.nonzero(self.Mat.iloc[Row, :].values)[0].tolist())
254 | return child
255 |
256 |
257 | def Multinomial_Select(self,index1,index2,ii=0):
258 | '''
259 | Generate sample according to the Multinomial distribution.
260 |
261 | Parameters
262 | -----------
263 | index1: string
264 | key values of dictionary in CPD/CPD2/CPD3
265 | index2: int
266 | Determine which CPD entry to select.
267 | ii : int
268 | The node to generate the sample for. It defaults to 0.
269 |
270 | Returns
271 | ------------
272 | int
273 | The new generated sample.
274 | '''
275 | if(self.flag==0):
276 | if(self.Role[ii]==1):
277 | Num=np.random.multinomial(1, self.CPD[str(index1)], size=1)
278 | Pos=Num.tolist()[0].index(1)
279 | return Pos+1
280 | else:
281 | Num=np.random.multinomial(1, self.CPD[str(index1)][index2], size=1)
282 | Pos=Num.tolist()[0].index(1)
283 | return Pos+1
284 | elif (self.flag==1):
285 | if(len(self.Parent2[str(ii)])!=0):
286 | Num=np.random.multinomial(1, self.CPD2[str(index1)][index2], size=1)
287 | Pos=Num.tolist()[0].index(1)
288 | else:
289 | Num=np.random.multinomial(1, self.CPD2[str(index1)], size=1)
290 | Pos=Num.tolist()[0].index(1)
291 | return Pos+1
292 |
293 | elif (self.flag==2):
294 | if(len(self.Parent3[str(ii)])!=0):
295 | Num=np.random.multinomial(1, self.CPD3[str(index1)][index2], size=1)
296 | else:
297 | Num=np.random.multinomial(1, self.CPD3[str(index1)], size=1)
298 | Pos=Num.tolist()[0].index(1)
299 | return Pos+1
300 |
301 |
302 |
303 | def parents_len(self,Node):
304 | return np.count_nonzero(self.Mat.iloc[:,Node], axis=0).tolist()
305 |
306 | def Roots_length(self):
307 | '''
308 | Identifying the number of root nodes.
309 |
310 | Parameters
311 | ------------
312 | None
313 |
314 | Returns
315 | ------------
316 | int
317 | Number of root nodes.
318 | '''
319 | parent=np.count_nonzero(self.Mat, axis=0).tolist()
320 | return len([idx for idx,ii in enumerate(parent) if ii==0])
321 |
322 | @staticmethod
323 | def int_to_str(List):
324 | '''
325 | Concatenates list elements to a string.
326 |
327 | Parameters
328 | -------------
329 | List : list
330 |
331 | Returns
332 | -------------
333 | string
334 | concatenated list elements as a string.
335 |
336 | '''
337 | if not isinstance(List, list):
338 | List=list(map(int, str(List)))
339 | List=[str(ii) for ii in List]
340 | return ''.join(List)
341 |
342 | def Valid_BN(self,parent):
343 | '''
344 | Verify whether the parent-child relationships between nodes are valid.
345 |
346 | Parameters
347 | ------------
348 | parent : dict
349 | dictionary where the keys are the nodes and the values are the list of parents.
350 |
351 | Returns
352 | -----------
353 | None
354 |
355 | Raises
356 | -----------
357 | Exception
358 | "Parent of a discrete node cannot be continuous"
359 | '''
360 | for count,ii in enumerate(self.top_order):
361 | if(self.Node_Type[ii]=='D' and any(self.Node_Type[jj]=='C' for jj in parent[str(ii)])):
362 | raise Exception("Parent of a discrete node cannot be continuous")
363 |
364 |
365 | def Initial_sample(self):
366 | '''
367 | Generate samples for all the nodes at initial time (t=0)
368 |
369 | Parameters
370 | --------------
371 | None
372 |
373 | Returns
374 | -------------
375 | None
376 |
377 | Raises
378 | ------------
379 | Exception
380 | Parent of a discrete node cannot be continuous
381 | '''
382 | self.DAG_ordering()
383 | self.Role_Assignment()
384 | self.flag=0
385 | self.Valid_BN(self.Parent)
386 |
387 | for count,ii in enumerate(self.top_order):
388 | parent=tsBNgen.int_to_str(self.Parent[str(ii)])+str(ii)
389 | if(self.Role[ii]==1 and self.Node_Type[ii]=='D'):
390 | self.Node[ii].append(self.Multinomial_Select(parent,0,ii))
391 | elif(self.Role[ii]==1 and self.Node_Type[ii]=='C'):
392 | self.Node[ii].extend(self.Gaussian_select(ii,0))
393 | elif(all(self.Node_Type[jj]=='D' for jj in self.Parent[str(ii)])):
394 | all_parents=[]
395 | parent_N_level=[]
396 | for count2,jj in enumerate(self.Parent[str(ii)]):
397 | all_parents.append(self.Node[jj][-1])
398 | parent_N_level.append(self.N_level[jj])
399 |
400 | self.all_parents=all_parents
401 | self.parent_N_level=parent_N_level
402 | CPD_entry=self.continous_cpd()
403 |
404 | if(self.Node_Type[ii]=='D'):
405 | self.Node[ii].append(self.Multinomial_Select(parent,CPD_entry,ii))
406 | elif(self.Node_Type[ii]=='C'):
407 | self.Node[ii].extend(self.Gaussian_select(parent,CPD_entry))
408 |
409 | elif(any(self.Node_Type[jj]=='D' for jj in self.Parent[str(ii)])):
410 | D_Parent=[kk for kk in self.Parent[str(ii)] if self.Node_Type[kk]=='D']
411 | all_parents=[]
412 | parent_N_level=[]
413 | for count2,jj in enumerate(D_Parent):
414 | all_parents.append(self.Node[jj][-1])
415 | parent_N_level.append(self.N_level[jj])
416 |
417 | self.all_parents=all_parents
418 | self.parent_N_level=parent_N_level
419 | CPD_entry=self.continous_cpd()
420 |
421 | C_Parent=[kk for kk in self.Parent[str(ii)] if self.Node_Type[kk]=='C']
422 | temp=0
423 | for count3,kk in enumerate(C_Parent):
424 | cont_SUM=self.CPD[str(parent)][str(kk)+str(ii)]['coefficient'][0][CPD_entry]*self.Node[kk][-1]
425 | temp=temp+cont_SUM
426 | intercept=np.random.normal(0,self.CPD[str(parent)]['sigma_intercept'][CPD_entry],1)
427 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD[str(parent)]['sigma'][CPD_entry],ii))
428 |
429 | elif(all(self.Node_Type[jj]=='C' for jj in self.Parent[str(ii)])):
430 | temp=0
431 | for count3,kk in enumerate(self.Parent[str(ii)]):
432 | cont_SUM=self.CPD[str(parent)][str(kk)+str(ii)]['coefficient'][0][0]*self.Node[kk][-1]
433 | temp=temp+cont_SUM
434 | intercept=np.random.normal(0,self.CPD[str(parent)]['sigma_intercept'][0],1)
435 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD[str(parent)]['sigma'][0],ii))
436 |
437 |
438 | def Gaussian_select(self,index1,index2,ii=0):
439 | if(self.flag==0):
440 | C_Parent=[kk for kk in self.Parent[str(ii)] if self.Node_Type[kk]=='C']
441 | if(len(C_Parent)!=0):
442 | return (np.random.normal(index1,index2, 1).tolist())
443 | else:
444 | return (np.random.normal(self.CPD[str(index1)]['mu'+str(index2)],self.CPD[str(index1)]['sigma'+str(index2)], 1).tolist())
445 |
446 | elif(self.flag==1):
447 | C_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='C']
448 | if(len(C_Parent)!=0):
449 | return (np.random.normal(index1,index2, 1).tolist())
450 | else:
451 | return (np.random.normal(self.CPD2[str(index1)]['mu'+str(index2)],self.CPD2[str(index1)]['sigma'+str(index2)], 1).tolist())
452 | elif(self.flag==2):
453 | C_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='C']
454 | if(len(C_Parent)!=0):
455 | return (np.random.normal(index1,index2, 1).tolist())
456 | else:
457 | return (np.random.normal(self.CPD3[str(index1)]['mu'+str(index2)],self.CPD3[str(index1)]['sigma'+str(index2)], 1).tolist())
458 |
459 |
460 | def Level_multiplied(self):
461 | self.level_multiply=[]
462 | Temp=self.parent_N_level
463 | for ii in range(len(Temp)):
464 | A1=reduce(lambda x, y: x*y,Temp)
465 | self.level_multiply.append(A1)
466 | Temp=Temp[1:]
467 | self.level_multiply=self.level_multiply[1:]
468 | self.level_multiply.insert(len(self.parent_N_level ),1)
469 |
470 | def continous_cpd(self):
471 | self.Level_multiplied()
472 | List=[(ii-1)*jj for ii, jj in zip(self.all_parents,self.level_multiply)]
473 | return sum(List)
474 |
475 | def BN_sample(self):
476 | '''
477 | Generate samples for all the nodes after the initial time.
478 |
479 | Parameters
480 | --------------
481 | None
482 |
483 | Returns
484 | -------------
485 | None
486 |
487 | Notes
488 | ------------
489 | Use this function to generate samples if the loopback values are at most one.
490 | Loopback=1 means that a node at time t is connected to the node at t-1.
491 |
492 | Raises
493 | ------------
494 | Exception
495 | Parent of a discrete node cannot be continuous
496 | '''
497 | self.flag=1
498 | loopbacks_temp=self.loopbacks.copy()
499 | self.Valid_BN(self.Parent2)
500 | for count,ii in enumerate(self.top_order):
501 | parent=tsBNgen.int_to_str(self.Parent2[str(ii)])+str(ii)
502 | if(all(self.Node_Type[jj]=='D' for jj in self.Parent2[str(ii)])):
503 | temp=0
504 | all_parents=[]
505 | parent_N_level=[]
506 | for count2,jj in enumerate(self.Parent2[str(ii)]):
507 | loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii)
508 | if(loopbacks_keys in self.loopbacks.keys()):
509 | if(self.top_order.index(jj) < count):
510 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
511 | all_parents.append(self.Node[jj][-1-mm])
512 | else:
513 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
514 | LP=mm-1
515 | all_parents.append(self.Node[jj][-1-LP])
516 | del self.loopbacks[loopbacks_keys]
517 | elif(jj != ii):
518 | all_parents.append(self.Node[jj][-1])
519 | parent_N_level.append(self.N_level[jj])
520 |
521 | self.loopbacks=loopbacks_temp.copy()
522 |
523 | self.all_parents=all_parents
524 | self.parent_N_level=parent_N_level
525 | CPD_entry=self.continous_cpd()
526 |
527 | if(self.Node_Type[ii]=='D'):
528 | self.Node[ii].append(self.Multinomial_Select(parent,CPD_entry,ii))
529 | elif(self.Node_Type[ii]=='C'):
530 | self.Node[ii].extend(self.Gaussian_select(parent,CPD_entry))
531 |
532 | elif(any(self.Node_Type[jj]=='D' for jj in self.Parent2[str(ii)])):
533 | loopbacks_temp=self.loopbacks.copy()
534 | D_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='D']
535 | all_parents=[]
536 | parent_N_level=[]
537 | for count2,jj in enumerate(D_Parent):
538 | loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii)
539 | if(loopbacks_keys in self.loopbacks.keys()):
540 | if(self.top_order.index(jj) < count):
541 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
542 | all_parents.append(self.Node[jj][-1-mm])
543 | else:
544 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
545 | LP=mm-1
546 | all_parents.append(self.Node[jj][-1-LP])
547 | del self.loopbacks[loopbacks_keys]
548 | elif(jj != ii):
549 |
550 | all_parents.append(self.Node[jj][-1])
551 | parent_N_level.append(self.N_level[jj])
552 |
553 | self.loopbacks=loopbacks_temp.copy()
554 | self.all_parents=all_parents
555 | self.parent_N_level=parent_N_level
556 | CPD_entry=self.continous_cpd()
557 |
558 | C_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='C']
559 | temp=0
560 | loopbacks_temp=self.loopbacks.copy()
561 | for count3,kk in enumerate(C_Parent):
562 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii)
563 | if(loopbacks_keys in self.loopbacks.keys()):
564 | if(self.top_order.index(kk) < count):
565 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
566 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][mm][CPD_entry]*self.Node[kk][-1-mm]
567 | temp=temp+cont_SUM
568 | else:
569 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
570 | LP=mm-1
571 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][LP][CPD_entry]*self.Node[kk][-1-LP]
572 | temp=temp+cont_SUM
573 | del self.loopbacks[loopbacks_keys]
574 | elif(kk != ii):
575 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][0][CPD_entry]*self.Node[kk][-1]
576 | temp=temp+cont_SUM
577 | intercept=np.random.normal(0,self.CPD2[str(parent)]['sigma_intercept'][CPD_entry],1)
578 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD2[str(parent)]['sigma'][CPD_entry],ii))
579 | self.loopbacks=loopbacks_temp.copy()
580 |
581 | elif(all(self.Node_Type[jj]=='C' for jj in self.Parent2[str(ii)])):
582 | C_Parent=[kk for kk in self.Parent2[str(ii)] if self.Node_Type[kk]=='C']
583 | temp=0
584 | loopbacks_temp=self.loopbacks.copy()
585 | for count3,kk in enumerate(C_Parent):
586 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii)
587 | if(loopbacks_keys in self.loopbacks.keys()):
588 | if(self.top_order.index(kk) < count):
589 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
590 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][mm][0]*self.Node[kk][-1-mm]
591 | temp=temp+cont_SUM
592 | else:
593 | for count3,mm in enumerate(self.loopbacks[loopbacks_keys]):
594 | LP=mm-1
595 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][LP][0]*self.Node[kk][-1-LP]
596 | temp=temp+cont_SUM
597 | del self.loopbacks[loopbacks_keys]
598 | elif(kk != ii):
599 | cont_SUM=self.CPD2[str(parent)][str(kk)+str(ii)]['coefficient'][0][0]*self.Node[kk][-1]
600 | temp=temp+cont_SUM
601 | intercept=np.random.normal(0,self.CPD2[str(parent)]['sigma_intercept'][0],1)
602 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD2[str(parent)]['sigma'][0],ii))
603 | self.loopbacks=loopbacks_temp.copy()
604 |
605 |
606 |
607 | def BN_data_gen(self):
608 | '''
609 | It uses Initial_sample for the initial time (t=0) and BN_sample for time points t=1 up to t=T-1, where T is the length of the time series.
610 |
611 | Parameters
612 | -------------
613 | None
614 |
615 | Returns
616 | -------------
617 | None
619 |
620 | Raises
621 | ------------
622 | Exception
623 | Parent of a discrete node cannot be continuous
624 |
625 | Notes
626 | -------------
627 | Use this function under the following conditions: custom_time variable is not specified
628 | and the value of the loopback for all the variables is at most 1
629 | '''
630 | keys = range(len(self.Node))
631 | self.BN_Nodes= dict(zip(keys, ([[] for ii in range(self.N)] for _ in keys )))
632 | for ii in range(self.N):
633 |
634 | self.Initial_sample()
635 |
636 | for kk in range(1,self.T):
637 | self.BN_sample()
638 | for jj in range(len(self.Node)):
639 | self.BN_Nodes[jj][ii]=self.Node[jj]
640 | self.Node[jj]=[]
641 |
642 |
643 |
644 | def BN_sample_loopback(self):
645 | '''
646 | Generate samples for all the nodes using CPD3 and Parent3.
647 |
648 | Parameters
649 | --------------
650 | None
651 |
652 | Returns
653 | -------------
654 | None
655 |
656 | Raises
657 | ------------
658 | Exception
659 | Parent of a discrete node cannot be continuous
660 |
661 | Notes
662 | ------------
663 | Use this function when you want to incorporate three BNs.
664 | '''
665 | self.flag=2
666 | self.Valid_BN(self.Parent3)
667 | self.DAG_ordering()
668 | loopbacks_temp=self.loopbacks2.copy()
669 | for count,ii in enumerate(self.top_order):
670 | parent=tsBNgen.int_to_str(self.Parent3[str(ii)])+str(ii)
671 | if(all(self.Node_Type[jj]=='D' for jj in self.Parent3[str(ii)])):
672 | temp=0
673 | all_parents=[]
674 | parent_N_level=[]
675 | for count2,jj in enumerate(self.Parent3[str(ii)]):
676 | loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii)
677 | if(loopbacks_keys in self.loopbacks2.keys()):
678 | if(self.top_order.index(jj) < count):
679 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
680 | all_parents.append(self.Node[jj][-1-mm])
681 | else:
682 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
683 | LP=mm-1
684 | all_parents.append(self.Node[jj][-1-LP])
685 | del self.loopbacks2[loopbacks_keys]
686 |
687 |
688 | elif(jj != ii):
689 | all_parents.append(self.Node[jj][-1])
690 | parent_N_level.append(self.N_level[jj])
691 |
692 | self.loopbacks2=loopbacks_temp.copy()
693 | self.all_parents=all_parents
694 | self.parent_N_level=parent_N_level
695 | CPD_entry=self.continous_cpd()
696 | if(self.Node_Type[ii]=='D'):
697 | self.Node[ii].append(self.Multinomial_Select(parent,CPD_entry,ii))
698 | elif(self.Node_Type[ii]=='C'):
699 | self.Node[ii].extend(self.Gaussian_select(parent,CPD_entry))
700 | elif(any(self.Node_Type[jj]=='D' for jj in self.Parent3[str(ii)])):
701 | loopbacks_temp=self.loopbacks2.copy()
702 | D_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='D']
703 | temp=0
704 | all_parents=[]
705 | parent_N_level=[]
706 | for count2,jj in enumerate(D_Parent):
707 | loopbacks_keys=tsBNgen.int_to_str(jj)+str(ii)
708 | if(loopbacks_keys in self.loopbacks2.keys()):
709 | if(self.top_order.index(jj) < count):
710 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
711 | all_parents.append(self.Node[jj][-1-mm])
712 | else:
713 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
714 | LP=mm-1
715 | all_parents.append(self.Node[jj][-1-LP])
716 | del self.loopbacks2[loopbacks_keys]
717 |
718 |
719 | elif(jj != ii):
720 | all_parents.append(self.Node[jj][-1])
721 | parent_N_level.append(self.N_level[jj])
722 |
723 | self.loopbacks2=loopbacks_temp.copy()
724 | self.all_parents=all_parents
725 | self.parent_N_level=parent_N_level
726 | CPD_entry=self.continous_cpd()
727 |
728 | C_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='C']
729 | temp=0
730 | loopbacks_temp=self.loopbacks2.copy()
731 | for count3,kk in enumerate(C_Parent):
732 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii)
733 | if(loopbacks_keys in self.loopbacks2.keys()):
734 | if(self.top_order.index(kk) < count):
735 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
736 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][mm][CPD_entry]*self.Node[kk][-1-mm]
737 | temp=temp+cont_SUM
738 | else:
739 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
740 | LP=mm-1
741 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][LP][CPD_entry]*self.Node[kk][-1-LP]
742 | temp=temp+cont_SUM
743 | del self.loopbacks2[loopbacks_keys]
744 | elif(kk != ii):
745 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][0][CPD_entry]*self.Node[kk][-1]
746 | temp=temp+cont_SUM
747 | intercept=np.random.normal(0,self.CPD3[str(parent)]['sigma_intercept'][CPD_entry],1)
748 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD3[str(parent)]['sigma'][CPD_entry],ii))
749 | self.loopbacks2=loopbacks_temp.copy()
750 |
751 | elif(all(self.Node_Type[jj]=='C' for jj in self.Parent3[str(ii)])):
752 | C_Parent=[kk for kk in self.Parent3[str(ii)] if self.Node_Type[kk]=='C']
753 | temp=0
754 | loopbacks_temp=self.loopbacks2.copy()
755 | for count3,kk in enumerate(C_Parent):
756 | loopbacks_keys=tsBNgen.int_to_str(kk)+str(ii)
757 | if(loopbacks_keys in self.loopbacks2.keys()):
758 | if(self.top_order.index(kk) < count):
759 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
760 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][mm][0]*self.Node[kk][-1-mm]
761 | temp=temp+cont_SUM
762 | else:
763 | for count3,mm in enumerate(self.loopbacks2[loopbacks_keys]):
764 | LP=mm-1
765 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][LP][0]*self.Node[kk][-1-LP]
766 | temp=temp+cont_SUM
767 | del self.loopbacks2[loopbacks_keys]
768 | elif(kk != ii):
769 | cont_SUM=self.CPD3[str(parent)][str(kk)+str(ii)]['coefficient'][0][0]*self.Node[kk][-1]
770 | temp=temp+cont_SUM
771 | intercept=np.random.normal(0,self.CPD3[str(parent)]['sigma_intercept'][0],1)
772 | self.Node[ii].extend(self.Gaussian_select(temp+intercept,self.CPD3[str(parent)]['sigma'][0],ii))
773 | self.loopbacks2=loopbacks_temp.copy()
774 |
775 |
776 | def BN_sample_gen_loopback(self):
777 | '''
778 | Generate time series data for all nodes at all time steps. See Notes for more information.
779 |
780 | Parameters
781 | -------------
782 | None
783 |
784 | Returns
785 | -------------
786 | None
787 |
788 | Raises
789 | ------------
790 | Exception
791 | Parent of a discrete node cannot be continuous
792 |
793 | Notes
794 | ------------
795 | This is a more general form of BN_data_gen, which supports only two different BN structures
796 | or a maximum loopback value of one for all nodes. This method also handles larger loopbacks and custom_time.
797 | '''
798 | keys = range(len(self.Node))
799 | self.BN_Nodes= dict(zip(keys, ([[] for ii in range(self.N)] for _ in keys )))
800 | Max_loopback=max(sum(self.loopbacks2.values(),[]))
801 |
802 | if (self.custom_time == 0):
803 | for ii in range(self.N):
804 | self.Initial_sample()
805 | for kk in range(1,Max_loopback):
806 | self.BN_sample()
807 | for mm in range(Max_loopback,self.T):
808 | self.BN_sample_loopback()
809 | for jj in range(len(self.Node)):
810 | self.BN_Nodes[jj][ii]=self.Node[jj]
811 | self.Node[jj]=[]
812 | else:
813 | for ii in range(self.N):
814 | self.Initial_sample()
815 | for kk in range(1,self.custom_time):
816 | self.BN_sample()
817 | for mm in range(self.custom_time,self.T):
818 | self.BN_sample_loopback()
819 | for jj in range(len(self.Node)):
820 | self.BN_Nodes[jj][ii]=self.Node[jj]
821 | self.Node[jj]=[]
822 |
823 |
824 |
825 |
826 |
827 |
828 |
829 |
830 |
831 |
832 |
833 |
834 |
--------------------------------------------------------------------------------
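The continuous-node branches and the generation loop in tsBNgen.py above can be illustrated with a small standalone sketch. It is not the library's code: the function names `lagged_linear_gaussian` and `phase_schedule`, their arguments, and the dictionary layouts are hypothetical, the standard-library `random` module stands in for `np.random.normal`, and `Gaussian_select` (not shown in this section) is assumed to draw from a normal distribution. In the sketch a lag is counted back from the most recently stored parent value, whereas the library shifts the index by one when the parent has not yet been sampled at the current time step.

```python
import random


def lagged_linear_gaussian(history, coefficients, lags, sigma, sigma_intercept, rng):
    """Draw one value for a continuous node with lagged continuous parents.

    history:      dict parent -> list of past values (most recent last)
    coefficients: dict parent -> {lag: weight}
    lags:         dict parent -> list of lags (0 = most recent stored value)
    """
    mean = 0.0
    for parent, parent_lags in lags.items():
        for lag in parent_lags:
            # history[parent][-1 - lag] is the value `lag` steps before the latest one
            mean += coefficients[parent][lag] * history[parent][-1 - lag]
    intercept = rng.gauss(0.0, sigma_intercept)  # zero-mean random intercept
    return rng.gauss(mean + intercept, sigma)


def phase_schedule(T, warmup):
    """Label each time step by the sampler the generation loop would call.

    warmup plays the role of Max_loopback (or custom_time when it is set).
    """
    phases = []
    for t in range(T):
        if t == 0:
            phases.append("initial")      # Initial_sample()
        elif t < warmup:
            phases.append("first-order")  # BN_sample()
        else:
            phases.append("loopback")     # BN_sample_loopback()
    return phases


# With both standard deviations set to 0 the draw is deterministic:
# 0.5 * 3.0 (lag 0) + 2.0 * 1.0 (lag 2) = 3.5
value = lagged_linear_gaussian(
    history={"x": [1.0, 2.0, 3.0]},
    coefficients={"x": {0: 0.5, 2: 2.0}},
    lags={"x": [0, 2]},
    sigma=0.0,
    sigma_intercept=0.0,
    rng=random.Random(0),
)
schedule = phase_schedule(T=5, warmup=2)
```

Passing the random generator in explicitly keeps the sketch reproducible; the library itself draws from NumPy's global random state.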
/tsbngen.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/manitadayon/tsBNgen/1d54de78e9a4405e7a0049ccab89093cd7f0d094/tsbngen.pdf
--------------------------------------------------------------------------------