├── README.md ├── chap02 ├── Bayes net-Causal reasoning.ipynb ├── D-separation.html ├── D-separation.ipynb ├── Naive Bayes.ipynb ├── job_interview.txt ├── printJointDistribution.html └── printJointDistribution.ipynb ├── chap04 ├── alarm-bnfinder.ipynb ├── alarm.csv ├── alarm.txt ├── alarm_1.txt ├── alarm_2.txt ├── alarm_net.bif ├── alarm_net_cpd.txt ├── alarm_network.ipynb ├── alarm_transpose.txt ├── bnfinder.ipynb ├── constraint-based.ipynb ├── job_interview.bif ├── job_interview.sif ├── job_interview.txt ├── job_interview_cpd.txt ├── job_interview_samples.ipynb ├── job_interview_samples.txt ├── job_interview_samples_preamble1.txt ├── job_interview_samples_preamble2.txt ├── job_interview_samples_preamble3.txt ├── job_interview_sif.txt ├── job_net.png ├── nursery.data ├── nursery_net.bif ├── nursery_net_cpd.txt ├── nursery_transpose.txt ├── parent-child.txt ├── sachs.bif ├── sachs.inp ├── sachs_cpd.sif ├── sachs_cpd.txt └── sachs_network.png ├── chap05 ├── Chap5-thumbtack-MLE.ipynb ├── data_segmentation.ipynb ├── job_interview.txt ├── job_interview_libpgm.ipynb ├── learn_cpd_Bayesian.ipynb ├── small_network.txt └── thumbtack_Bayesian.ipynb ├── chap06 ├── JunctionTreeAlgorithm.ipynb ├── VarElim_asia.ipynb ├── alarm.txt ├── asia.bif ├── asia.txt ├── asia1.txt ├── asia_bn.py ├── bif_parser.py └── unittestdict.txt └── chap07 ├── Comparing gibbs and random sampling.ipynb ├── LBP_image_segmentation.ipynb ├── cow_image.jpg └── job_interview.txt /README.md: -------------------------------------------------------------------------------- 1 | # Building Probabilistic Graphical Models in Python 2 | 3 | This is the repository for the source code of the book [Building Probabilistic Graphical Models in Python](https://www.packtpub.com/big-data-and-business-intelligence/building-probabilistic-graphical-models-python), published by Packt Publishers. 
4 | -------------------------------------------------------------------------------- /chap02/Bayes net-Causal reasoning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this section we shall look at different kinds of reasoning used in a Bayes net. We shall use the libpgm library to create a Bayes net. Libpgm uses a JSON-formatted file with a specific format where the edges, vertices and the CPDs at each vertex are annotated. This JSON file is read into the NodeData and GraphSkeleton objects to create a DiscreteBayesianNetwork (which, as the name suggests, is a Bayes net where the CPDs take on discrete values). The TableCPDFactorization is an object that wraps the DiscreteBayesianNetwork and allows us to query the CPDs in the network. Please copy the JSON file for this example, 'job_interview.txt', to a local folder and change the relevant line in the getTableCPD() function to refer to your local path to job_interview.txt. \n", 15 | "\n", 16 | "The following discussion uses integers 0, 1, 2 for discrete outcomes of each random variable, where 0 is the worst outcome. For example, Interview=0 indicates the worst outcome of the interview and Interview=2 is the best outcome. \n", 17 | "\n", 18 | "The first kind of reasoning we shall explore is called 'Causal Reasoning'. Initially we observe the prior probability of an event unconditioned by any evidence (for this example, we shall focus on the 'Offer' random variable). We then introduce observations of (one of) the parent variables. 
Consistent with our logical reasoning, we note that if one of the parents (equivalent to causes) of an event is observed, then we have stronger beliefs about the Offer random variable.\n", 19 | "\n", 20 | "We start off by defining a function that reads the JSON data file and creates an object we can use to run probability queries on." 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "collapsed": false, 26 | "input": [ 27 | "from libpgm.graphskeleton import GraphSkeleton\n", 28 | "from libpgm.nodedata import NodeData\n", 29 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 30 | "from libpgm.tablecpdfactorization import TableCPDFactorization\n", 31 | "\n", 32 | "def getTableCPD():\n", 33 | " nd = NodeData()\n", 34 | " skel = GraphSkeleton()\n", 35 | " jsonpath=\"job_interview.txt\"\n", 36 | " nd.load(jsonpath)\n", 37 | " skel.load(jsonpath)\n", 38 | " bn = DiscreteBayesianNetwork(skel, nd)\n", 39 | " tablecpd=TableCPDFactorization(bn)\n", 40 | " return tablecpd\n" 41 | ], 42 | "language": "python", 43 | "metadata": {}, 44 | "outputs": [], 45 | "prompt_number": 1 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "What is the prior probability of getting a job offer, $ P(Offer=1) $? Note that the probability query takes two dictionary arguments, the first one being the query and the second being the evidence set, which is empty right now." 
52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "collapsed": false, 57 | "input": [ 58 | "tcpd=getTableCPD()\n", 59 | "tcpd.specificquery(dict(Offer='1'),{})" 60 | ], 61 | "language": "python", 62 | "metadata": {}, 63 | "outputs": [ 64 | { 65 | "metadata": {}, 66 | "output_type": "pyout", 67 | "prompt_number": 13, 68 | "text": [ 69 | "0.432816" 70 | ] 71 | } 72 | ], 73 | "prompt_number": 13 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "It is about 43%, and if we now introduce evidence that the candidate has poor grades, how does it change the probability of getting an offer? Evaluating for $ P(Offer=1 | Grades=0) $" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "collapsed": false, 85 | "input": [ 86 | "tcpd=getTableCPD()\n", 87 | "tcpd.specificquery(dict(Offer='1'),dict(Grades='0'))" 88 | ], 89 | "language": "python", 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "metadata": {}, 94 | "output_type": "pyout", 95 | "prompt_number": 11, 96 | "text": [ 97 | "0.35148" 98 | ] 99 | } 100 | ], 101 | "prompt_number": 11 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "As expected, it decreases the probability of getting an offer. 
Adding further evidence that the candidate's experience is low as well, we evaluate $ P(Offer=1 | Grades=0, Experience=0) $" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "collapsed": false, 113 | "input": [ 114 | "tcpd=getTableCPD()\n", 115 | "tcpd.specificquery(dict(Offer='1'),dict(Grades='0',Experience='0'))" 116 | ], 117 | "language": "python", 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "metadata": {}, 122 | "output_type": "pyout", 123 | "prompt_number": 12, 124 | "text": [ 125 | "0.2078" 126 | ] 127 | } 128 | ], 129 | "prompt_number": 12 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "As expected, it drops even lower, to about 21%.\n", 136 | "\n", 137 | "What we have seen is that introducing an observed parent random variable strengthens our beliefs, which leads to the name 'Causal Reasoning'. \n", 138 | "\n", 139 | "## Evidential Reasoning\n", 140 | "\n" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "The second kind of reasoning is when we observe the value of a child variable, and we wish to reason about how it strengthens our beliefs about its parents. 
Evaluating for the prior probability of high Experience $ P(Experience=1) $" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "collapsed": false, 153 | "input": [ 154 | "tcpd=getTableCPD()\n", 155 | "tcpd.specificquery(dict(Experience='1'),{})" 156 | ], 157 | "language": "python", 158 | "metadata": {}, 159 | "outputs": [ 160 | { 161 | "output_type": "stream", 162 | "stream": "stdout", 163 | "text": [ 164 | "0.4\n" 165 | ] 166 | } 167 | ], 168 | "prompt_number": 14 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "If we now introduce evidence that the candidate's interview was good, and evaluate for $ P(Experience=1\\mid Interview=2) $" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "collapsed": false, 180 | "input": [ 181 | "tcpd=getTableCPD()\n", 182 | "tcpd.specificquery(dict(Experience='1'),dict(Interview='2'))" 183 | ], 184 | "language": "python", 185 | "metadata": {}, 186 | "outputs": [ 187 | { 188 | "metadata": {}, 189 | "output_type": "pyout", 190 | "prompt_number": 15, 191 | "text": [ 192 | "0.8641975308641975" 193 | ] 194 | } 195 | ], 196 | "prompt_number": 15 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "we see that the probability that the candidate was highly experienced increases, which follows the reasoning that the candidate must have good experience or education, or both. In Evidential reasoning, we reason from effect to cause. " 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "## Intercausal reasoning\n", 210 | "\n", 211 | "Intercausal reasoning, as the name suggests, is a type of reasoning where multiple causes of a single effect interact. We first determine what is the prior probability of having high relevant experience. 
On evaluating $ P(Experience=1) $" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "collapsed": false, 217 | "input": [ 218 | "tcpd=getTableCPD()\n", 219 | "tcpd.specificquery(dict(Experience='1'),{})" 220 | ], 221 | "language": "python", 222 | "metadata": {}, 223 | "outputs": [ 224 | { 225 | "metadata": {}, 226 | "output_type": "pyout", 227 | "prompt_number": 16, 228 | "text": [ 229 | "0.4000000000000001" 230 | ] 231 | } 232 | ], 233 | "prompt_number": 16 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "Introducing evidence that the interview went extremely well, we think that the candidate must be quite experienced. Evaluating for $ P(Experience=1 \\mid Interview=2) $" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "collapsed": false, 245 | "input": [ 246 | "tcpd=getTableCPD()\n", 247 | "tcpd.specificquery(dict(Experience='1'),dict(Interview='2'))" 248 | ], 249 | "language": "python", 250 | "metadata": {}, 251 | "outputs": [ 252 | { 253 | "metadata": {}, 254 | "output_type": "pyout", 255 | "prompt_number": 17, 256 | "text": [ 257 | "0.8641975308641975" 258 | ] 259 | } 260 | ], 261 | "prompt_number": 17 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "The Bayes net confirms what we think is true, and the probability of high experience goes up from 0.4 to 0.86. Now if we introduce evidence that the candidate didn't have good grades and still managed to get a good score in the interview, we may conclude that the candidate must be so experienced that his grades didn't matter at all. 
Evaluating for $ P(Experience=1 \\mid Interview=2, Grades=0) $" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "collapsed": false, 273 | "input": [ 274 | "tcpd=getTableCPD()\n", 275 | "tcpd.specificquery(dict(Experience='1'),dict(Interview='2',Grades='0'))" 276 | ], 277 | "language": "python", 278 | "metadata": {}, 279 | "outputs": [ 280 | { 281 | "metadata": {}, 282 | "output_type": "pyout", 283 | "prompt_number": 18, 284 | "text": [ 285 | "0.9090909090909091" 286 | ] 287 | } 288 | ], 289 | "prompt_number": 18 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "This confirms what we were thinking: even though the probability of high experience went up only a little, it strengthens our belief about the candidate's high experience. This example shows the interplay between the two parents of Interview, which are Experience and Grades, and shows us that if we know one of the causes behind an effect, it reduces the importance of the other cause. In other words, we have explained away the poor grades on observing the experience of the candidate. This phenomenon is commonly called 'Explaining Away'. " 296 | ] 297 | } 298 | ], 299 | "metadata": {} 300 | } 301 | ] 302 | } -------------------------------------------------------------------------------- /chap02/D-separation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this section we shall look at using the job candidate example to understand d-separation.\n", 15 | "In the process of doing Causal Reasoning, we will query for Job Offer, and we shall introduce observed variables in the parents of Job Offer to verify the concepts of Active Trails that we have seen in the previous section." 
16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "collapsed": false, 21 | "input": [ 22 | "from libpgm.graphskeleton import GraphSkeleton\n", 23 | "from libpgm.nodedata import NodeData\n", 24 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 25 | "from libpgm.tablecpdfactorization import TableCPDFactorization\n", 26 | "\n", 27 | "def getTableCPD():\n", 28 | " nd = NodeData()\n", 29 | " skel = GraphSkeleton()\n", 30 | " jsonpath=\"job_interview.txt\"\n", 31 | " nd.load(jsonpath)\n", 32 | " skel.load(jsonpath)\n", 33 | " # load bayesian network\n", 34 | " bn = DiscreteBayesianNetwork(skel, nd)\n", 35 | " tablecpd=TableCPDFactorization(bn)\n", 36 | " return tablecpd" 37 | ], 38 | "language": "python", 39 | "metadata": {}, 40 | "outputs": [], 41 | "prompt_number": 2 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "We first query Job Offer with no other observed variables." 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "collapsed": false, 53 | "input": [ 54 | "getTableCPD().specificquery(dict(Offer='1'),dict())" 55 | ], 56 | "language": "python", 57 | "metadata": {}, 58 | "outputs": [ 59 | { 60 | "output_type": "stream", 61 | "stream": "stdout", 62 | "text": [ 63 | "0.432816\n" 64 | ] 65 | } 66 | ], 67 | "prompt_number": 3 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "We know from the active trail rules that observing Experience should change the probability of Job Offer." 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "collapsed": false, 79 | "input": [ 80 | "getTableCPD().specificquery(dict(Offer='1'),dict(Experience='1'))" 81 | ], 82 | "language": "python", 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "output_type": "stream", 87 | "stream": "stdout", 88 | "text": [ 89 | "0.6438\n" 90 | ] 91 | } 92 | ], 93 | "prompt_number": 4 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "And it does. 
Now let us observe the Job Interview variable instead. " 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "collapsed": false, 105 | "input": [ 106 | "getTableCPD().specificquery(dict(Offer='1'),dict(Interview='1'))" 107 | ], 108 | "language": "python", 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "output_type": "stream", 113 | "stream": "stdout", 114 | "text": [ 115 | "0.6\n" 116 | ] 117 | } 118 | ], 119 | "prompt_number": 5 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "We get a slightly different probability for Job Offer. We know from the D-separation rules that observing Job Interview should block the active trail from Experience to Job Offer." 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "collapsed": false, 131 | "input": [ 132 | "\n", 133 | "getTableCPD().specificquery(dict(Offer='1'),dict(Interview='1',Experience='1'))" 134 | ], 135 | "language": "python", 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "output_type": "stream", 140 | "stream": "stdout", 141 | "text": [ 142 | "0.6\n" 143 | ] 144 | } 145 | ], 146 | "prompt_number": 6 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "Observe that the probability of Job Offer does not change from 0.6, despite the addition of the Experience variable being observed. 
We can add other values of Job Interview's parent variables." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "collapsed": false, 158 | "input": [ 159 | "query=dict(Offer='1')\n", 160 | "results=[getTableCPD().specificquery(query,e) for e in [dict(Interview='1',Experience='0'),dict(Interview='1',Experience='1'),\n", 161 | " dict(Interview='1',Grades='1'),dict(Interview='1',Grades='0')]]\n", 162 | "print results" 163 | ], 164 | "language": "python", 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "output_type": "stream", 169 | "stream": "stdout", 170 | "text": [ 171 | "[0.6, 0.6, 0.6, 0.6]\n" 172 | ] 173 | } 174 | ], 175 | "prompt_number": 9 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "The above code shows that once the Job Interview variable is observed, the active trail between Experience and Job Offer is blocked. Observing values of its parents, Experience and Grades, does not change the probability of Job Offer.\n", 182 | "\n", 183 | "## Blocking and unblocking a V-structure\n" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "collapsed": false, 189 | "input": [ 190 | "print getTableCPD().specificquery(dict(Grades='1'),dict(Experience='0'))\n", 191 | "print getTableCPD().specificquery(dict(Grades='1'),dict())" 192 | ], 193 | "language": "python", 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "output_type": "stream", 198 | "stream": "stdout", 199 | "text": [ 200 | "0.3\n", 201 | "0.3\n" 202 | ] 203 | } 204 | ], 205 | "prompt_number": 24 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "According to the rules of D-separation, the Job Interview node forms a V-structure with Experience and Grades, and while it is unobserved it blocks the trail between them. The above code shows that the introduction of the observed variable Experience has no effect on the probability of Grades." 
212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "collapsed": false, 217 | "input": [ 218 | "print getTableCPD().specificquery(dict(Grades='1'),dict(Interview='1'))" 219 | ], 220 | "language": "python", 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "output_type": "stream", 225 | "stream": "stdout", 226 | "text": [ 227 | "0.413016270338\n" 228 | ] 229 | } 230 | ], 231 | "prompt_number": 25 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "Observing Interview should activate the trail between Experience and Grades." 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "collapsed": false, 243 | "input": [ 244 | "print getTableCPD().specificquery(dict(Grades='1'),dict(Interview='1',Experience='0'))\n", 245 | "print getTableCPD().specificquery(dict(Grades='1'),dict(Interview='1',Experience='1'))" 246 | ], 247 | "language": "python", 248 | "metadata": {}, 249 | "outputs": [ 250 | { 251 | "output_type": "stream", 252 | "stream": "stdout", 253 | "text": [ 254 | "0.588235294118\n", 255 | "0.176470588235\n" 256 | ] 257 | } 258 | ], 259 | "prompt_number": 26 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "The above code now shows the existence of an active trail between Experience and Grades, where changing the observed Experience does change the probability of Grades." 266 | ] 267 | } 268 | ], 269 | "metadata": {} 270 | } 271 | ] 272 | } -------------------------------------------------------------------------------- /chap02/Naive Bayes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this example, we will use the Naive Bayes implementation from the Scikit-learn machine learning library to classify newsgroup postings. 
We have chosen two newsgroups from the datasets provided by Scikit-learn (alt.atheism and sci.med) and we shall use Naive Bayes to predict which newsgroup a particular posting is from." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "collapsed": false, 20 | "input": [ 21 | "from sklearn.datasets import fetch_20newsgroups\n", 22 | "import numpy as np\n", 23 | "from sklearn.naive_bayes import MultinomialNB\n", 24 | "from sklearn import metrics,cross_validation\n", 25 | "from sklearn.feature_extraction.text import TfidfVectorizer\n", 26 | "\n", 27 | "cats = ['alt.atheism', 'sci.med']\n", 28 | "newsgroups= fetch_20newsgroups(subset='all',remove=('headers', 'footers', 'quotes'), categories=cats)" 29 | ], 30 | "language": "python", 31 | "metadata": {}, 32 | "outputs": [], 33 | "prompt_number": 9 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "We first load the newsgroup data using the utility function provided by scikit-learn (this downloads the dataset from the internet and may take some time). The newsgroups object is a map: the newsgroup postings are stored under 'data', and the target variables are in newsgroups.target. " 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "collapsed": false, 45 | "input": [ 46 | "newsgroups.target" 47 | ], 48 | "language": "python", 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "metadata": {}, 53 | "output_type": "pyout", 54 | "prompt_number": 10, 55 | "text": [ 56 | "array([1, 0, 0, ..., 0, 0, 0], dtype=int64)" 57 | ] 58 | } 59 | ], 60 | "prompt_number": 10 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "Since the features are words, we transform them to another representation using Term Frequency-Inverse Document Frequency (Tfidf). 
The purpose of tfidf is to de-emphasize words that occur in all postings (such as 'the', 'by', 'for', etc.) and instead emphasize words that are unique to a particular class (such as 'religion' and 'creationism', which are from the alt.atheism newsgroup).\n", 67 | "We do this by creating a TfidfVectorizer and then transforming all the newsgroup data to a vector representation." 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "collapsed": false, 73 | "input": [ 74 | "vectorizer = TfidfVectorizer()\n", 75 | "vectors = vectorizer.fit_transform(newsgroups.data)" 76 | ], 77 | "language": "python", 78 | "metadata": {}, 79 | "outputs": [], 80 | "prompt_number": 11 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "Vectors now contains features that we can use as input data to the Naive Bayes classifier. A shape query reveals that it contains 1789 instances, and each instance contains about 24k features. However, many of those features can be 0, indicating words that do not appear in that particular posting." 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "collapsed": false, 92 | "input": [ 93 | "vectors.shape" 94 | ], 95 | "language": "python", 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "metadata": {}, 100 | "output_type": "pyout", 101 | "prompt_number": 12, 102 | "text": [ 103 | "(1789, 24202)" 104 | ] 105 | } 106 | ], 107 | "prompt_number": 12 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "Scikit-learn provides a few versions of the Naive Bayes classifier; the one we use is called MultinomialNB. 
Since using a classifier typically involves splitting the dataset into train, test and validation sets, then training on the 'train' set and testing the efficacy on the 'validation' set, we can use the scikit-learn provided utility to do the same for us.\n", 114 | "The cross_validation.cross_val_score function automatically splits the data into multiple sets and returns the F1 score for each split (F1 is the harmonic mean of precision and recall)." 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "collapsed": false, 120 | "input": [ 121 | "clf = MultinomialNB(alpha=.01)\n", 122 | "print \"CrossValidation Score: \", np.mean(cross_validation.cross_val_score(clf,vectors, newsgroups.target, scoring='f1'))" 123 | ], 124 | "language": "python", 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "output_type": "stream", 129 | "stream": "stdout", 130 | "text": [ 131 | "CrossValidation Score: " 132 | ] 133 | }, 134 | { 135 | "output_type": "stream", 136 | "stream": "stdout", 137 | "text": [ 138 | "0.954618416381\n" 139 | ] 140 | } 141 | ], 142 | "prompt_number": 13 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "We can see that despite the assumption that all features are conditionally independent given the class, the classifier maintains a decent F1 score of about 95%. 
" 149 | ] 150 | } 151 | ], 152 | "metadata": {} 153 | } 154 | ] 155 | } -------------------------------------------------------------------------------- /chap02/job_interview.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["Offer", "Interview", "Grades", "Admission", "Experience"], 3 | "E": [["Grades", "Interview"], 4 | ["Experience", "Interview"], 5 | ["Grades", "Admission"], 6 | ["Interview", "Offer"]], 7 | "Vdata": { 8 | "Offer": { 9 | "ord": 4, 10 | "numoutcomes": 2, 11 | "vals": ["0", "1"], 12 | "parents": ["Interview"], 13 | "children": None, 14 | "cprob": { 15 | "['0']": [.9, .1], 16 | "['1']": [.4, .6], 17 | "['2']": [.01, .99] 18 | } 19 | }, 20 | 21 | "Admission": { 22 | "ord": 3, 23 | "numoutcomes": 2, 24 | "vals": ["0", "1"], 25 | "parents": ["Grades"], 26 | "children": None, 27 | "cprob": { 28 | "['0']": [.7, .3], 29 | "['1']": [.2, .8] 30 | } 31 | }, 32 | 33 | "Interview": { 34 | "ord": 2, 35 | "numoutcomes": 3, 36 | "vals": ["0", "1", "2"], 37 | "parents": ["Experience", "Grades"], 38 | "children": ["Offer"], 39 | "cprob": { 40 | "['0', '0']": [.8, .18, .02], 41 | "['0', '1']": [.3, .6, .1], 42 | "['1', '0']": [.3, .4, .3], 43 | "['1', '1']": [.1, .2, .7] 44 | } 45 | }, 46 | 47 | "Grades": { 48 | "ord": 1, 49 | "numoutcomes": 2, 50 | "vals": ["0", "1"], 51 | "parents": None, 52 | "children": ["Admission", "Interview"], 53 | "cprob": [.7, .3] 54 | }, 55 | 56 | "Experience": { 57 | "ord": 0, 58 | "numoutcomes": 2, 59 | "vals": ["0", "1"], 60 | "parents": None, 61 | "children": ["Interview"], 62 | "cprob": [.6, .4] 63 | } 64 | } 65 | } 66 | -------------------------------------------------------------------------------- /chap02/printJointDistribution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 
12 | "metadata": {}, 13 | "source": [ 14 | "The goal of this program is to print out the joint distribution for the job interview example (where the Admission node is absent), that is, the joint distribution over the variables Experience, Grades, Interview and Offer.\n", 15 | "\n", 16 | "We first load the JSON file containing the CPDs and graph structure in the getJointDist() method, and then take the product of the Interview and Offer CPD factors. This gives us an object that contains the un-normalized values of the joint distribution using those four variables.\n", 17 | "\n", 18 | "The printdist method prints every combination of values of the variables involved, along with the corresponding value of the factor product." 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "collapsed": false, 24 | "input": [ 25 | "from libpgm.graphskeleton import GraphSkeleton\n", 26 | "from libpgm.nodedata import NodeData\n", 27 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 28 | "from libpgm.tablecpdfactor import TableCPDFactor\n", 29 | "import itertools\n", 30 | "import pandas as pd \n", 31 | "\n", 32 | "def getJointDist():\n", 33 | " nd = NodeData()\n", 34 | " skel = GraphSkeleton()\n", 35 | " jsonpath=\"job_interview.txt\"\n", 36 | " nd.load(jsonpath)\n", 37 | " skel.load(jsonpath)\n", 38 | " skel.toporder()\n", 39 | " # load the Bayes network\n", 40 | " bn = DiscreteBayesianNetwork(skel, nd)\n", 41 | " inter_fac=TableCPDFactor(\"Interview\",bn)\n", 42 | " offer_fac=TableCPDFactor(\"Offer\",bn)\n", 43 | " offer_fac.multiplyfactor(inter_fac)\n", 44 | " return offer_fac,bn\n", 45 | "\n", 46 | "# a method that returns the distribution as a table\n", 47 | "def printdist(jd,bn):\n", 48 | " x=[bn.Vdata[i][\"vals\"] for i in jd.scope]\n", 49 | " # creates the cartesian product\n", 50 | " k=[a + [b] for a,b in zip([list(i) for i in itertools.product(*x[::-1])],jd.vals)]\n", 51 | " df=pd.DataFrame.from_records(k,columns=[i for i in reversed(jd.scope)]+['probability'])\n", 52 | " 
return df\n", 53 | " \n", 54 | "jd,bn=getJointDist()\n", 55 | "printdist(jd,bn)" 56 | ], 57 | "language": "python", 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "html": [ 62 | "
" 270 | ], 271 | "metadata": {}, 272 | "output_type": "pyout", 273 | "prompt_number": 2, 274 | "text": [ 275 | " Experience Grades Interview Offer probability\n", 276 | "0 0 0 0 0 0.7200\n", 277 | "1 0 0 0 1 0.0800\n", 278 | "2 0 0 1 0 0.0720\n", 279 | "3 0 0 1 1 0.1080\n", 280 | "4 0 0 2 0 0.0002\n", 281 | "5 0 0 2 1 0.0198\n", 282 | "6 0 1 0 0 0.2700\n", 283 | "7 0 1 0 1 0.0300\n", 284 | "8 0 1 1 0 0.2400\n", 285 | "9 0 1 1 1 0.3600\n", 286 | "10 0 1 2 0 0.0010\n", 287 | "11 0 1 2 1 0.0990\n", 288 | "12 1 0 0 0 0.2700\n", 289 | "13 1 0 0 1 0.0300\n", 290 | "14 1 0 1 0 0.1600\n", 291 | "15 1 0 1 1 0.2400\n", 292 | "16 1 0 2 0 0.0030\n", 293 | "17 1 0 2 1 0.2970\n", 294 | "18 1 1 0 0 0.0900\n", 295 | "19 1 1 0 1 0.0100\n", 296 | "20 1 1 1 0 0.0800\n", 297 | "21 1 1 1 1 0.1200\n", 298 | "22 1 1 2 0 0.0070\n", 299 | "23 1 1 2 1 0.6930" 300 | ] 301 | } 302 | ], 303 | "prompt_number": 2 304 | } 305 | ], 306 | "metadata": {} 307 | } 308 | ] 309 | } -------------------------------------------------------------------------------- /chap04/alarm-bnfinder.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "code", 12 | "collapsed": false, 13 | "input": [ 14 | "import pandas as pd\n", 15 | "\n", 16 | "'''\n", 17 | "df=pd.read_csv(\"nursery.data\")\n", 18 | "df.transpose().to_csv(\"nursery_transpose.txt\",sep=\"\\t\")\n", 19 | "s,g,sp=learn_structure(\"nursery_transpose.txt\",\"nursery_net\")\n", 20 | "'''\n", 21 | "from BNfinder.BDE import BDE\n", 22 | "from BNfinder.data import dataset\n", 23 | "\n", 24 | "score=eval(\"BDE\")(data_factor=1.0,chi_alpha=.9999,sloops=False)\n", 25 | "\n", 26 | "def learn_structure(sample_data,dataset_name):\n", 27 | " d = dataset(dataset_name).fromNewFile(open(sample_data))\n", 28 | " score2,g,subpars = d.learn(score=score,data_factor=1.0)\n", 29 | " 
d.write_bif(g,dataset_name+\".bif\")\n", 30 | " d.write_cpd(g,file(dataset_name+\"_cpd.txt\",\"w\"))\n", 31 | " return score2,g,subpars" 32 | ], 33 | "language": "python", 34 | "metadata": {}, 35 | "outputs": [], 36 | "prompt_number": 1 37 | }, 38 | { 39 | "cell_type": "code", 40 | "collapsed": false, 41 | "input": [ 42 | "import pandas as pd\n", 43 | "\n", 44 | "df=pd.read_csv(\"alarm.csv\")\n", 45 | "df.transpose().to_csv(\"alarm_transpose.txt\",sep=\"\\t\")" 46 | ], 47 | "language": "python", 48 | "metadata": {}, 49 | "outputs": [], 50 | "prompt_number": 3 51 | }, 52 | { 53 | "cell_type": "code", 54 | "collapsed": false, 55 | "input": [], 56 | "language": "python", 57 | "metadata": {}, 58 | "outputs": [] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "collapsed": false, 63 | "input": [ 64 | "s,g,sp=learn_structure(\"alarm_1.txt\",\"alarm_net\")\n" 65 | ], 66 | "language": "python", 67 | "metadata": {}, 68 | "outputs": [], 69 | "prompt_number": 2 70 | }, 71 | { 72 | "cell_type": "code", 73 | "collapsed": false, 74 | "input": [ 75 | "g" 76 | ], 77 | "language": "python", 78 | "metadata": {}, 79 | "outputs": [ 80 | { 81 | "metadata": {}, 82 | "output_type": "pyout", 83 | "prompt_number": 3, 84 | "text": [ 85 | "Graph: \n", 86 | "\tHypovolemia(Hypovolemia) => LVEDVolume(+), StrokeVolume(+), CVP(+), PCWP(+), CO(+), BP(+), \n", 87 | "\tLVFailure(LVFailure) => LVEDVolume(-), StrokeVolume(+), CVP(-), PCWP(-), CO(+), History(+), \n", 88 | "\tLVEDVolume(LVEDVolume) => \n", 89 | "\tStrokeVolume(StrokeVolume) => \n", 90 | "\tCVP(CVP) => \n", 91 | "\tPCWP(PCWP) => \n", 92 | "\tInsuffAnesth(InsuffAnesth) => Catechol(-), HR(-), CO(-), BP(-), HREKG(-), HRSat(-), HRBP(-), \n", 93 | "\tPulmEmbolus(PulmEmbolus) => Shunt(-), VentLung(-), VentAlv(-), PVSat(-), ArtCOb(+), ExpCOb(-), PAP(-), MinVol(-), \n", 94 | "\tIntubation(Intubation) => Shunt(+), VentTube(+), VentLung(-), VentAlv(-), PVSat(-), ArtCOb(+), ExpCOb(+), Press(+), MinVol(-), \n", 95 | "\tShunt(Shunt) => \n", 96 | 
"\tKinkedTube(KinkedTube) => Shunt(-), VentLung(+), VentAlv(+), PVSat(+), ArtCOb(-), ExpCOb(+), Press(-), MinVol(+), \n", 97 | "\tMinVolSet(MinVolSet) => VentMach(+), VentTube(+), VentLung(+), VentAlv(+), ArtCOb(-), ExpCOb(-), Press(+), MinVol(+), \n", 98 | "\tVentMach(VentMach) => \n", 99 | "\tDisconnect(Disconnect) => Shunt(-), VentTube(+), VentLung(+), VentAlv(+), PVSat(+), ArtCOb(-), ExpCOb(+), Press(+), MinVol(+), \n", 100 | "\tVentTube(VentTube) => \n", 101 | "\tVentLung(VentLung) => \n", 102 | "\tVentAlv(VentAlv) => \n", 103 | "\tFiOb(FiOb) => VentLung(-), VentAlv(-), PVSat(+), ArtCOb(+), ExpCOb(-), \n", 104 | "\tPVSat(PVSat) => \n", 105 | "\tSaOb(SaOb) => Shunt(-), VentMach(+), VentTube(+), VentLung(+), VentAlv(+), PVSat(+), ArtCOb(-), Catechol(-), HR(-), CO(-), BP(-), HREKG(-), HRSat(-), HRBP(-), ExpCOb(+), Press(+), MinVol(+), \n", 106 | "\tAnaphylaxis(Anaphylaxis) => TPR(+), Catechol(-), HR(-), BP(+), HREKG(-), HRSat(-), HRBP(-), \n", 107 | "\tTPR(TPR) => \n", 108 | "\tArtCOb(ArtCOb) => \n", 109 | "\tCatechol(Catechol) => \n", 110 | "\tHR(HR) => \n", 111 | "\tCO(CO) => \n", 112 | "\tHistory(History) => \n", 113 | "\tBP(BP) => \n", 114 | "\tErrCauter(ErrCauter) => HREKG(+), HRSat(+), \n", 115 | "\tHREKG(HREKG) => \n", 116 | "\tHRSat(HRSat) => \n", 117 | "\tErrLowOutput(ErrLowOutput) => HRBP(+), \n", 118 | "\tHRBP(HRBP) => \n", 119 | "\tExpCOb(ExpCOb) => \n", 120 | "\tPAP(PAP) => \n", 121 | "\tPress(Press) => \n", 122 | "\tMinVol(MinVol) => \n" 123 | ] 124 | } 125 | ], 126 | "prompt_number": 3 127 | }, 128 | { 129 | "cell_type": "code", 130 | "collapsed": false, 131 | "input": [ 132 | "s,g,sp=learn_structure(\"alarm_2.txt\",\"alarm_net\")" 133 | ], 134 | "language": "python", 135 | "metadata": {}, 136 | "outputs": [], 137 | "prompt_number": 8 138 | }, 139 | { 140 | "cell_type": "code", 141 | "collapsed": false, 142 | "input": [ 143 | "g" 144 | ], 145 | "language": "python", 146 | "metadata": {}, 147 | "outputs": [ 148 | { 149 | "metadata": {}, 150 | 
"output_type": "pyout", 151 | "prompt_number": 20, 152 | "text": [ 153 | "Graph: \n", 154 | "\tHypovolemia(Hypovolemia) => LVEDVolume(+), StrokeVolume(+), CVP(+), PCWP(+), BP(+), \n", 155 | "\tLVFailure(LVFailure) => LVEDVolume(-), StrokeVolume(+), CVP(-), PCWP(-), History(+), \n", 156 | "\tLVEDVolume(LVEDVolume) => \n", 157 | "\tStrokeVolume(StrokeVolume) => CO(+), \n", 158 | "\tCVP(CVP) => \n", 159 | "\tPCWP(PCWP) => \n", 160 | "\tInsuffAnesth(InsuffAnesth) => Catechol(-), BP(-), HREKG(-), HRSat(-), HRBP(-), \n", 161 | "\tPulmEmbolus(PulmEmbolus) => Shunt(-), VentAlv(-), PVSat(-), ArtCOb(+), ExpCOb(-), PAP(-), MinVol(-), \n", 162 | "\tIntubation(Intubation) => Shunt(+), VentLung(-), VentAlv(-), PVSat(-), ArtCOb(+), ExpCOb(+), Press(+), MinVol(-), \n", 163 | "\tShunt(Shunt) => SaOb(-), \n", 164 | "\tKinkedTube(KinkedTube) => Shunt(-), VentLung(+), VentAlv(+), PVSat(+), ArtCOb(-), ExpCOb(+), Press(-), MinVol(+), \n", 165 | "\tMinVolSet(MinVolSet) => VentMach(+), VentAlv(+), ArtCOb(-), ExpCOb(-), Press(+), MinVol(+), \n", 166 | "\tVentMach(VentMach) => VentTube(+), \n", 167 | "\tDisconnect(Disconnect) => Shunt(-), VentTube(+), VentAlv(+), PVSat(+), ArtCOb(-), ExpCOb(+), Press(+), MinVol(+), \n", 168 | "\tVentTube(VentTube) => VentLung(+), \n", 169 | "\tVentLung(VentLung) => \n", 170 | "\tVentAlv(VentAlv) => \n", 171 | "\tFiOb(FiOb) => VentAlv(-), PVSat(+), ArtCOb(+), ExpCOb(-), \n", 172 | "\tPVSat(PVSat) => SaOb(+), \n", 173 | "\tSaOb(SaOb) => Shunt(-), VentMach(+), VentAlv(+), PVSat(+), ArtCOb(-), Catechol(-), BP(-), HREKG(-), HRSat(-), HRBP(-), ExpCOb(+), Press(+), MinVol(+), \n", 174 | "\tAnaphylaxis(Anaphylaxis) => TPR(+), BP(+), HREKG(-), HRSat(-), HRBP(-), \n", 175 | "\tTPR(TPR) => Catechol(-), \n", 176 | "\tArtCOb(ArtCOb) => Catechol(+), \n", 177 | "\tCatechol(Catechol) => HR(+), \n", 178 | "\tHR(HR) => CO(+), \n", 179 | "\tCO(CO) => \n", 180 | "\tHistory(History) => \n", 181 | "\tBP(BP) => \n", 182 | "\tErrCauter(ErrCauter) => HREKG(+), HRSat(+), \n", 183 | 
"\tHREKG(HREKG) => \n", 184 | "\tHRSat(HRSat) => \n", 185 | "\tErrLowOutput(ErrLowOutput) => HRBP(+), \n", 186 | "\tHRBP(HRBP) => \n", 187 | "\tExpCOb(ExpCOb) => \n", 188 | "\tPAP(PAP) => \n", 189 | "\tPress(Press) => \n", 190 | "\tMinVol(MinVol) => \n" 191 | ] 192 | } 193 | ], 194 | "prompt_number": 20 195 | } 196 | ], 197 | "metadata": {} 198 | } 199 | ] 200 | } -------------------------------------------------------------------------------- /chap04/alarm_network.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Let us attempt to use constraint based approaches to a larger dataset. The \"Logical Alarm Reduction Mechanism\" network is a Bayesian network designed to provide an alarm message system for patient monitoring. This network has 37 vertices and 46 edges, considerably larger than the toy job interview network we have been using so far that had 5 vertices and 4 edges.\n", 15 | "\n", 16 | "The dataset can be found here (http://www.cs.ru.nl/~peterl/BN/) and is commonly referred to as the Alarm network. More information on the dataset(such as column descriptions) can be found here (http://www.bnlearn.com/documentation/man/alarm.html).\n", 17 | "\n", 18 | "Let us load the alarm.csv file using the Pandas library." 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "collapsed": false, 24 | "input": [ 25 | "import pandas as pd\n", 26 | "import numpy as np\n", 27 | "df=pd.read_csv(\"alarm.csv\")" 28 | ], 29 | "language": "python", 30 | "metadata": {}, 31 | "outputs": [], 32 | "prompt_number": 1 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "The alarm.csv file has records that we should convert into a format that libpgm can consume. 
Each instance should be a dictionary, where keys are column names and values are column values. The following function does the same, and returns a list of dictionaries that are sampled without replacement from the original dataset." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "collapsed": false, 44 | "input": [ 45 | "from random import randint,sample\n", 46 | "\n", 47 | "def rand_index(dframe,n_samples=100):\n", 48 | " rindex = np.array(sample(xrange(len(dframe)) ,n_samples if n_samples <=len(dframe) else len(dframe)))\n", 49 | " return [{i:j.values()[0] for i,j in dframe.iloc[[k]].to_dict().items()} for k in rindex ]\n", 50 | "\n", 51 | "#Lets examine a single sample:\n", 52 | "rand_index(df,n_samples=1)" 53 | ], 54 | "language": "python", 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "metadata": {}, 59 | "output_type": "pyout", 60 | "prompt_number": 2, 61 | "text": [ 62 | "[{'Anaphylaxis': 'b',\n", 63 | " 'ArtCOb': 'b',\n", 64 | " 'BP': 'c',\n", 65 | " 'CO': 'b',\n", 66 | " 'CVP': 'b',\n", 67 | " 'Catechol': 'a',\n", 68 | " 'Disconnect': 'b',\n", 69 | " 'ErrCauter': 'b',\n", 70 | " 'ErrLowOutput': 'b',\n", 71 | " 'ExpCOb': 'c',\n", 72 | " 'FiOb': 'b',\n", 73 | " 'HR': 'b',\n", 74 | " 'HRBP': 'b',\n", 75 | " 'HREKG': 'b',\n", 76 | " 'HRSat': 'b',\n", 77 | " 'History': 'b',\n", 78 | " 'Hypovolemia': 'b',\n", 79 | " 'InsuffAnesth': 'b',\n", 80 | " 'Intubation': 'a',\n", 81 | " 'KinkedTube': 'b',\n", 82 | " 'LVEDVolume': 'b',\n", 83 | " 'LVFailure': 'b',\n", 84 | " 'MinVol': 'c',\n", 85 | " 'MinVolSet': 'b',\n", 86 | " 'PAP': 'b',\n", 87 | " 'PCWP': 'b',\n", 88 | " 'PVSat': 'b',\n", 89 | " 'Press': 'c',\n", 90 | " 'PulmEmbolus': 'b',\n", 91 | " 'SaOb': 'b',\n", 92 | " 'Shunt': 'a',\n", 93 | " 'StrokeVolume': 'b',\n", 94 | " 'TPR': 'b',\n", 95 | " 'VentAlv': 'c',\n", 96 | " 'VentLung': 'c',\n", 97 | " 'VentMach': 'c',\n", 98 | " 'VentTube': 'c'}]" 99 | ] 100 | } 101 | ], 102 | "prompt_number": 2 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | 
"metadata": {}, 107 | "source": [ 108 | "Lets load the data, create an instance of the learner object and estimate the structure with a small number of samples (100). " 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "collapsed": false, 114 | "input": [ 115 | "import json\n", 116 | "\n", 117 | "from libpgm.nodedata import NodeData\n", 118 | "from libpgm.graphskeleton import GraphSkeleton\n", 119 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 120 | "from libpgm.pgmlearner import PGMLearner\n", 121 | "\n", 122 | "data=rand_index(df,n_samples=100)\n", 123 | "learner = PGMLearner()\n", 124 | "result=learner.discrete_constraint_estimatestruct(data)\n", 125 | "print result.E" 126 | ], 127 | "language": "python", 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "output_type": "stream", 132 | "stream": "stdout", 133 | "text": [ 134 | "[['HR', 'HRSat'], ['CVP', 'LVEDVolume'], ['ExpCOb', 'ArtCOb'], ['StrokeVolume', 'CO'], ['TPR', 'BP'], ['CO', 'BP']]\n" 135 | ] 136 | } 137 | ], 138 | "prompt_number": 3 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "To compare the performance of the structure found to the correct alarm network, let's load the alarm network. 
The file parent-child.txt contains, on each line, the parent vertex followed by its child vertices (some vertices are leaf nodes and have no children)." 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "collapsed": false, 150 | "input": [ 151 | "file = open('parent-child.txt', 'r')\n", 152 | "\n", 153 | "def edges(line):\n", 154 | " st=line.strip('\\n').strip(' ').split(' ')\n", 155 | " #print st\n", 156 | " return [[st[0],i] for i in st[1:] ]\n", 157 | "\n", 158 | "all_edges=[l for line in file for l in edges(line)]\n", 159 | "#a set containing the correct edges\n", 160 | "ground_truth=set([tuple(i) for i in all_edges])\n", 161 | "print all_edges[:5]\n" 162 | ], 163 | "language": "python", 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "output_type": "stream", 168 | "stream": "stdout", 169 | "text": [ 170 | "[['HISTORY', 'LVFAILURE'], ['CVP', 'LVEDVOLUME'], ['PCWP', 'LVEDVOLUME'], ['LVEDVOLUME', 'HYPOVOLEMIA'], ['LVEDVOLUME', 'LVFAILURE']]\n" 171 | ] 172 | } 173 | ], 174 | "prompt_number": 4 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "Let's define a diagnostic function that compares the found network with the correct network. \n", 181 | "\n", 182 | "The learned structure tries to find edges between nodes, and its performance can be categorized as follows:\n", 183 | " 1. It finds edges that exist in the correct network. \n", 184 | " 2. Some of these have wrong directionality.\n", 185 | " 3. It also finds spurious edges that don't exist in the correct network.\n", 186 | " \n", 187 | "We define a function that can quantify the above errors in the learned network." 
188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "collapsed": false, 193 | "input": [ 194 | "def printdiag(result):\n", 195 | " found=set([tuple([j.upper() for j in i]) for i in result.E])\n", 196 | " correct=ground_truth.intersection(found)\n", 197 | " undirected_common_edges=[(i,j) for i,j in found if (j,i) in ground_truth]\n", 198 | " print \"Number of edges in learnt network \",len(found)\n", 199 | " print \"Total number of edges in true network \",len(ground_truth)\n", 200 | " print \"Number of edges with correct directionality \",len(correct)\n", 201 | " print \"Number of edges with incorrect directionality \",len(undirected_common_edges)" 202 | ], 203 | "language": "python", 204 | "metadata": {}, 205 | "outputs": [], 206 | "prompt_number": 12 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "Let's define a function that learns the structure from a given number of samples and prints the resulting statistics." 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "collapsed": false, 218 | "input": [ 219 | "def learn_structure(n_samples):\n", 220 | " data=rand_index(df,n_samples)\n", 221 | " learner = PGMLearner()\n", 222 | " result1=learner.discrete_constraint_estimatestruct(data)\n", 223 | " printdiag(result1)\n", 224 | " \n", 225 | "learn_structure(1000)" 226 | ], 227 | "language": "python", 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "output_type": "stream", 232 | "stream": "stdout", 233 | "text": [ 234 | "Number of edges in learnt 36\n", 235 | "Total number of edges in true network 46\n", 236 | "Number of edges with correct directionality 6\n", 237 | "Number of edges with incorrect directionality 7\n" 238 | ] 239 | } 240 | ], 241 | "prompt_number": 10 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "Let's examine the performance of learn_structure when provided with 1000 samples. 
It finds fewer than half of the edges, and only a few edges correctly connect nodes. Does increasing the sample size help? (Caution: this may take a few minutes to run)" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "collapsed": false, 253 | "input": [ 254 | "learn_structure(10000)" 255 | ], 256 | "language": "python", 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "output_type": "stream", 261 | "stream": "stdout", 262 | "text": [ 263 | "Number of edges in learnt network 30\n", 264 | "Total number of edges in true network 46\n", 265 | "Number of edges with correct directionality 2\n", 266 | "Number of edges with incorrect directionality 3\n" 267 | ] 268 | } 269 | ], 270 | "prompt_number": 13 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "We can see that the larger sample is still not enough to increase the number of edges that are correctly identified. The algorithm has a running time of O(n^(d+2)), where n is the number of vertices and d is the upper bound on the size of the witness set. 
Various versions of this algorithm try to constrain the witness set to improve its performance (described in Chapter 2 of this paper (http://arxiv.org/pdf/1111.6925.pdf))" 277 | ] 278 | } 279 | ], 280 | "metadata": {} 281 | } 282 | ] 283 | } -------------------------------------------------------------------------------- /chap04/constraint-based.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this notebook we shall attempt to learn the structure of a Bayes net using constraint-based approaches. We shall first load the network." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "collapsed": false, 20 | "input": [ 21 | "import json\n", 22 | "\n", 23 | "from libpgm.nodedata import NodeData\n", 24 | "from libpgm.graphskeleton import GraphSkeleton\n", 25 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 26 | "from libpgm.pgmlearner import PGMLearner\n", 27 | "\n", 28 | "nd = NodeData()\n", 29 | "skel = GraphSkeleton()\n", 30 | "\n", 31 | "fpath=\"job_interview.txt\"\n", 32 | "nd.load(fpath)\n", 33 | "skel.load(fpath)\n", 34 | "skel.toporder()\n", 35 | "\n", 36 | "bn = DiscreteBayesianNetwork(skel, nd)" 37 | ], 38 | "language": "python", 39 | "metadata": {}, 40 | "outputs": [], 41 | "prompt_number": 24 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "It may seem strange that we are loading a network with existing structure and parameters (which are defined in the file job_interview.txt). For this example, we shall be using synthetic data, that is, samples drawn from an existing network. This helps us compare our results with the known network that we started with. 
To start with, we shall draw two random samples from the job_interview network, which we have seen in previous chapters." 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "collapsed": false, 53 | "input": [ 54 | "bn.randomsample(2)" 55 | ], 56 | "language": "python", 57 | "metadata": {}, 58 | "outputs": [ 59 | { 60 | "metadata": {}, 61 | "output_type": "pyout", 62 | "prompt_number": 25, 63 | "text": [ 64 | "[{u'Admission': u'admitted',\n", 65 | " u'Experience': u'high',\n", 66 | " u'Grades': u'poor',\n", 67 | " u'Interview': u'good',\n", 68 | " u'Offer': u'no'},\n", 69 | " {u'Admission': u'admitted',\n", 70 | " u'Experience': u'low',\n", 71 | " u'Grades': u'poor',\n", 72 | " u'Interview': u'poor',\n", 73 | " u'Offer': u'yes'}]" 74 | ] 75 | } 76 | ], 77 | "prompt_number": 25 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "We can see that each random sample is one specific assignment to every random variable, drawn from the joint distribution. It can also be thought of as a random assignment to all the nodes in the network.\n", 84 | "\n", 85 | "The algorithm goes thus: we first wish to inquire about the conditional independence of all pairs of nodes. This is achieved by running the chi-squared test. The \"null hypothesis\" states that nodes X and Y are conditionally independent, given Z. \n", 86 | "\n", 87 | "The method discrete_condind returns the chi-squared statistic as well as the p-value, which is the probability of observing a statistic at least this extreme if the null hypothesis (independence) were true. We fix a cut-off for the p-value in advance: any p-value > 0.05 means that we cannot reject the null hypothesis, and therefore we treat X and Y as independent. 
" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "collapsed": false, 93 | "input": [ 94 | "learner = PGMLearner()\n", 95 | "data = bn.randomsample(200)\n", 96 | "\n", 97 | "X,Y='Grades','Offer'\n", 98 | "c,p,w=learner.discrete_condind(data,X,Y,[])\n", 99 | "print \"independence between X and Y: \",c,\" p-value\",p,\" witness node: \",w" 100 | ], 101 | "language": "python", 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "output_type": "stream", 106 | "stream": "stdout", 107 | "text": [ 108 | "correlation between X and Y: 8184619.56996 p-value 0.0 witness node: []\n" 109 | ] 110 | } 111 | ], 112 | "prompt_number": 33 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "We can see that with a p-value < 0.05, Grades and Offer are not independent. Since the D-separation rules state that given the job interview network, Grades and Offer have an active trail between them, which gets blocked if the Interview variable is observed. What happens if we introduce the witness (read: observed) variable Interview" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "collapsed": false, 124 | "input": [ 125 | "X,Y='Grades','Offer'\n", 126 | "c,p,w=learner.discrete_condind(data,X,Y,['Interview'])\n", 127 | "print \"independence between X and Y: \",c,\" p-value\",p,\" witness node \",w" 128 | ], 129 | "language": "python", 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "output_type": "stream", 134 | "stream": "stdout", 135 | "text": [ 136 | "correlation between X and Y: 2.79444519518 p-value 0.993172910586 witness node ['Interview']\n" 137 | ] 138 | } 139 | ], 140 | "prompt_number": 32 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "Now Grades and Offer are conditionally independent (because the p-value is much > than 0.05). 
\n", 147 | "The first stage of the algorithm essentially tries to determine the conditional independence for all pairs of nodes in the network, given other witness variables. We are then left with a set of undirected dependencies between nodes.\n", 148 | "\n", 149 | "The second and third stages of the algorithm are essentially contained in the discrete_constraint_estimatestruct method, where the set of dependencies is converted into an undirected graph, and then the edge directions are resolved.\n" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "collapsed": false, 155 | "input": [ 156 | "result=learner.discrete_constraint_estimatestruct(data)\n", 157 | "print result.E" 158 | ], 159 | "language": "python", 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "output_type": "stream", 164 | "stream": "stdout", 165 | "text": [ 166 | "[[u'Grades', u'Admission'], [u'Experience', u'Interview'], [u'Grades', u'Interview'], [u'Interview', u'Offer']]\n" 167 | ] 168 | } 169 | ], 170 | "prompt_number": 34 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "We can see that the four original edges in the job interview network have been found. But this is a small network; will this algorithm scale to a larger network?" 
177 | ] 178 | } 179 | ], 180 | "metadata": {} 181 | } 182 | ] 183 | } -------------------------------------------------------------------------------- /chap04/job_interview.bif: -------------------------------------------------------------------------------- 1 | \\ File generated by BNfinder 2 | \\ 05/05/14 10:47:51 3 | \\ Conditional probability distributions generated with total pseudocounts number 1.000000 4 | 5 | network "name" {} 6 | 7 | variable "Admission" { 8 | type discrete[2] { "admitted" "rejected" } 9 | } 10 | probability ( "Admission" | "Grades" ) { 11 | default 0.5 0.5 ; 12 | ( "0" ) 0.00032637075718 0.00032637075718 ; 13 | ( "1" ) 0.00014409221902 0.00014409221902 ; 14 | } 15 | 16 | variable "Experience" { 17 | type discrete[2] { "high" "low" } 18 | } 19 | probability ( "Experience" ) { 20 | table 9.99800039992e-05 9.99800039992e-05 ; 21 | } 22 | 23 | variable "Grades" { 24 | type discrete[2] { "good" "poor" } 25 | } 26 | probability ( "Grades" ) { 27 | table 9.99800039992e-05 9.99800039992e-05 ; 28 | } 29 | 30 | variable "Interview" { 31 | type discrete[3] { "average" "good" "poor" } 32 | } 33 | probability ( "Interview" | "Experience" "Grades" ) { 34 | default 0.333333333333 0.333333333333 0.333333333333 ; 35 | ( "0" "1" ) 0.00036443148688 0.00036443148688 0.00036443148688 ; 36 | ( "1" "0" ) 0.000553097345133 0.000553097345133 0.000553097345133 ; 37 | ( "0" "0" ) 0.000793650793651 0.000793650793651 0.000793650793651 ; 38 | ( "1" "1" ) 0.000238095238095 0.000238095238095 0.000238095238095 ; 39 | } 40 | 41 | variable "Offer" { 42 | type discrete[2] { "no" "yes" } 43 | } 44 | probability ( "Offer" | "Interview" ) { 45 | default 0.5 0.5 ; 46 | ( "2" ) 0.000205761316872 0.000205761316872 ; 47 | ( "0" ) 0.000309310238169 0.000309310238169 ; 48 | ( "1" ) 0.000522739153163 0.000522739153163 ; 49 | } 50 | 51 | -------------------------------------------------------------------------------- /chap04/job_interview.sif: 
-------------------------------------------------------------------------------- 1 | Experience + Interview 2 | Grades - Admission 3 | Grades + Interview 4 | Interview + Offer 5 | -------------------------------------------------------------------------------- /chap04/job_interview.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["Offer", "Interview", "Grades", "Admission", "Experience"], 3 | "E": [["Grades", "Interview"], 4 | ["Experience", "Interview"], 5 | ["Grades", "Admission"], 6 | ["Interview", "Offer"]], 7 | "Vdata": { 8 | "Offer": { 9 | "ord": 4, 10 | "numoutcomes": 2, 11 | "vals": ["yes", "no"], 12 | "parents": ["Interview"], 13 | "children": None, 14 | "cprob": { 15 | "['poor']": [.9, .1], 16 | "['average']": [.4, .6], 17 | "['good']": [.01, .99] 18 | } 19 | }, 20 | 21 | "Admission": { 22 | "ord": 3, 23 | "numoutcomes": 2, 24 | "vals": ["admitted", "rejected"], 25 | "parents": ["Grades"], 26 | "children": None, 27 | "cprob": { 28 | "['poor']": [.7, .3], 29 | "['good']": [.2, .8] 30 | } 31 | }, 32 | 33 | "Interview": { 34 | "ord": 2, 35 | "numoutcomes": 3, 36 | "vals": ["poor", "average", "good"], 37 | "parents": ["Experience", "Grades"], 38 | "children": ["Offer"], 39 | "cprob": { 40 | "['low', 'poor']": [.8, .18, .02], 41 | "['low', 'good']": [.3, .6, .1], 42 | "['high', 'poor']": [.3, .4, .3], 43 | "['high', 'good']": [.1, .2, .7] 44 | } 45 | }, 46 | 47 | "Grades": { 48 | "ord": 1, 49 | "numoutcomes": 2, 50 | "vals": ["poor", "good"], 51 | "parents": None, 52 | "children": ["Admission", "Interview"], 53 | "cprob": [.7, .3] 54 | }, 55 | 56 | "Experience": { 57 | "ord": 0, 58 | "numoutcomes": 2, 59 | "vals": ["low", "high"], 60 | "parents": None, 61 | "children": ["Interview"], 62 | "cprob": [.6, .4] 63 | } 64 | } 65 | } 66 | -------------------------------------------------------------------------------- /chap04/job_interview_cpd.txt: 
-------------------------------------------------------------------------------- 1 | { 2 | 'Admission' : { 3 | 'vals' : ['admitted', 'rejected'] , 4 | 'pars' : ['Grades'] , 5 | 'cpds' : { 6 | (0,) : { 1 : 0.809399477807 , 0 : 0.190600522193 , None : 0.00032637075718 } , 7 | (1,) : { 0 : 0.695677233429 , 1 : 0.304322766571 , None : 0.00014409221902 } , 8 | None : 0.5 } } , 9 | 'Experience' : { 10 | 'vals' : ['high', 'low'] , 11 | 'pars' : [] , 12 | 'cpds' : { 13 | () : { 0 : 0.399820035993 , 1 : 0.600179964007 , None : 9.99800039992e-05 } , 14 | None : 0.5 } } , 15 | 'Grades' : { 16 | 'vals' : ['good', 'poor'] , 17 | 'pars' : [] , 18 | 'cpds' : { 19 | () : { 0 : 0.30623875225 , 1 : 0.69376124775 , None : 9.99800039992e-05 } , 20 | None : 0.5 } } , 21 | 'Interview' : { 22 | 'vals' : ['average', 'good', 'poor'] , 23 | 'pars' : ['Experience', 'Grades'] , 24 | 'cpds' : { 25 | (0, 1) : { 1 : 0.29555393586 , 2 : 0.306851311953 , 0 : 0.397594752187 , None : 0.00036443148688 } , 26 | (1, 0) : { 0 : 0.599004424779 , 1 : 0.0945796460177 , 2 : 0.306415929204 , None : 0.000553097345133 } , 27 | (0, 0) : { 2 : 0.102380952381 , 1 : 0.677777777778 , 0 : 0.219841269841 , None : 0.000793650793651 } , 28 | (1, 1) : { 0 : 0.186666666667 , 2 : 0.794523809524 , 1 : 0.0188095238095 , None : 0.000238095238095 } , 29 | None : 0.333333333333 } } , 30 | 'Offer' : { 31 | 'vals' : ['no', 'yes'] , 32 | 'pars' : ['Interview'] , 33 | 'cpds' : { 34 | (2,) : { 1 : 0.900205761317 , 0 : 0.0997942386831 , None : 0.000205761316872 } , 35 | (0,) : { 1 : 0.399938137952 , 0 : 0.600061862048 , None : 0.000309310238169 } , 36 | (1,) : { 0 : 0.990590695243 , 1 : 0.00940930475693 , None : 0.000522739153163 } , 37 | None : 0.5 } } , 38 | } 39 | -------------------------------------------------------------------------------- /chap04/job_interview_samples.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | 
"nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "code", 12 | "collapsed": false, 13 | "input": [ 14 | "import json\n", 15 | "\n", 16 | "from libpgm.nodedata import NodeData\n", 17 | "from libpgm.graphskeleton import GraphSkeleton\n", 18 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 19 | "from libpgm.pgmlearner import PGMLearner\n", 20 | "\n", 21 | "nd = NodeData()\n", 22 | "skel = GraphSkeleton()\n", 23 | "\n", 24 | "fpath=\"job_interview.txt\"\n", 25 | "nd.load(fpath)\n", 26 | "skel.load(fpath)\n", 27 | "skel.toporder()\n", 28 | "\n", 29 | "bn = DiscreteBayesianNetwork(skel, nd)" 30 | ], 31 | "language": "python", 32 | "metadata": {}, 33 | "outputs": [], 34 | "prompt_number": 1 35 | }, 36 | { 37 | "cell_type": "code", 38 | "collapsed": false, 39 | "input": [ 40 | "def get_samples(n):\n", 41 | " s=bn.randomsample(n)\n", 42 | " k=s[0].keys()\n", 43 | " lst_of_cols=dict(zip(k,[[] for i in range(len(k))]))\n", 44 | " [lst_of_cols[i].append(row[i]) for row in s for i in k ]\n", 45 | " return lst_of_cols" 46 | ], 47 | "language": "python", 48 | "metadata": {}, 49 | "outputs": [], 50 | "prompt_number": 2 51 | }, 52 | { 53 | "cell_type": "code", 54 | "collapsed": false, 55 | "input": [ 56 | "import pandas as pd\n", 57 | "samples=get_samples(10000)\n", 58 | "df=pd.DataFrame(samples)\n", 59 | "df.transpose().to_csv(\"job_interview_samples.txt\",sep=\"\\t\")" 60 | ], 61 | "language": "python", 62 | "metadata": {}, 63 | "outputs": [], 64 | "prompt_number": 3 65 | } 66 | ], 67 | "metadata": {} 68 | } 69 | ] 70 | } -------------------------------------------------------------------------------- /chap04/job_interview_sif.txt: -------------------------------------------------------------------------------- 1 | Experience + Interview 2 | Grades - Admission 3 | Grades + Interview 4 | Interview + Offer 5 | -------------------------------------------------------------------------------- /chap04/job_net.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/shark8me/Building_Probabilistic_Graphical_Models_in_Python/c1f7ad013e1d20759eb396c866fa95ac4a9e8885/chap04/job_net.png -------------------------------------------------------------------------------- /chap04/nursery_net.bif: -------------------------------------------------------------------------------- 1 | \\ File generated by BNfinder 2 | \\ 02/28/14 18:14:18 3 | \\ Conditional probability distributions generated with total pseudocounts number 1.000000 4 | 5 | network "0" {} 6 | 7 | variable "parents" { 8 | type discrete[3] { "great_pret" "pretentious" "usual" } 9 | } 10 | probability ( "parents" | "has_nurs" "health" "class" ) { 11 | default 0.333333333333 0.333333333333 0.333333333333 ; 12 | ( "0" "1" "1" ) 0.00925925925926 0.00925925925926 0.00925925925926 ; 13 | ( "1" "0" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 14 | ( "2" "2" "4" ) 0.00740740740741 0.00740740740741 0.00740740740741 ; 15 | ( "0" "1" "3" ) 0.00131233595801 0.00131233595801 0.00131233595801 ; 16 | ( "0" "2" "1" ) 0.00276243093923 0.00276243093923 0.00276243093923 ; 17 | ( "2" "2" "3" ) 0.0151515151515 0.0151515151515 0.0151515151515 ; 18 | ( "4" "0" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 19 | ( "0" "2" "3" ) 0.00196850393701 0.00196850393701 0.00196850393701 ; 20 | ( "3" "2" "2" ) 0.2 0.2 0.2 ; 21 | ( "3" "1" "3" ) 0.00520833333333 0.00520833333333 0.00520833333333 ; 22 | ( "3" "2" "3" ) 0.0151515151515 0.0151515151515 0.0151515151515 ; 23 | ( "2" "0" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 24 | ( "1" "2" "1" ) 0.00193423597679 0.00193423597679 0.00193423597679 ; 25 | ( "3" "1" "1" ) 0.00147492625369 0.00147492625369 0.00147492625369 ; 26 | ( "3" "2" "1" ) 0.0014880952381 0.0014880952381 0.0014880952381 ; 27 | ( "1" "2" "3" ) 0.00348432055749 0.00348432055749 0.00348432055749 ; 28 | ( "0" "0" "0" ) 0.00115340253749 
0.00115340253749 0.00115340253749 ; 29 | ( "4" "1" "1" ) 0.0833333333333 0.0833333333333 0.0833333333333 ; 30 | ( "1" "2" "4" ) 0.0144927536232 0.0144927536232 0.0144927536232 ; 31 | ( "1" "1" "1" ) 0.00254452926209 0.00254452926209 0.00254452926209 ; 32 | ( "3" "2" "4" ) 0.00751879699248 0.00751879699248 0.00751879699248 ; 33 | ( "2" "1" "3" ) 0.00520833333333 0.00520833333333 0.00520833333333 ; 34 | ( "4" "1" "3" ) 0.0011655011655 0.0011655011655 0.0011655011655 ; 35 | ( "2" "2" "1" ) 0.0014880952381 0.0014880952381 0.0014880952381 ; 36 | ( "2" "1" "1" ) 0.00147492625369 0.00147492625369 0.00147492625369 ; 37 | ( "4" "2" "1" ) 0.00490196078431 0.00490196078431 0.00490196078431 ; 38 | ( "1" "1" "3" ) 0.0020964360587 0.0020964360587 0.0020964360587 ; 39 | ( "3" "0" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 40 | ( "4" "2" "3" ) 0.0015015015015 0.0015015015015 0.0015015015015 ; 41 | } 42 | 43 | variable "has_nurs" { 44 | type discrete[5] { "critical" "improper" "less_proper" "proper" "very_crit" } 45 | } 46 | probability ( "has_nurs" | "parents" "social" "health" "class" ) { 47 | default 0.2 0.2 0.2 0.2 0.2 ; 48 | ( "2" "2" "1" "1" ) 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 ; 49 | ( "0" "2" "2" "1" ) 0.00337837837838 0.00337837837838 0.00337837837838 0.00337837837838 0.00337837837838 ; 50 | ( "2" "0" "1" "3" ) 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 ; 51 | ( "1" "1" "1" "1" ) 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 ; 52 | ( "1" "2" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 53 | ( "0" "0" "2" "3" ) 0.00515463917526 0.00515463917526 0.00515463917526 0.00515463917526 0.00515463917526 ; 54 | ( "2" "0" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 55 | ( "0" "0" "1" "1" ) 0.0135135135135 0.0135135135135 0.0135135135135 
0.0135135135135 0.0135135135135 ; 56 | ( "1" "2" "1" "3" ) 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 ; 57 | ( "2" "1" "1" "3" ) 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 ; 58 | ( "0" "2" "1" "3" ) 0.00240384615385 0.00240384615385 0.00240384615385 0.00240384615385 0.00240384615385 ; 59 | ( "1" "0" "1" "1" ) 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 ; 60 | ( "2" "1" "2" "1" ) 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 ; 61 | ( "2" "0" "2" "3" ) 0.0147058823529 0.0147058823529 0.0147058823529 0.0147058823529 0.0147058823529 ; 62 | ( "1" "0" "2" "3" ) 0.00763358778626 0.00763358778626 0.00763358778626 0.00763358778626 0.00763358778626 ; 63 | ( "2" "2" "2" "2" ) 0.166666666667 0.166666666667 0.166666666667 0.166666666667 0.166666666667 ; 64 | ( "1" "2" "2" "4" ) 0.0140845070423 0.0140845070423 0.0140845070423 0.0140845070423 0.0140845070423 ; 65 | ( "0" "2" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 66 | ( "0" "1" "1" "1" ) 0.0135135135135 0.0135135135135 0.0135135135135 0.0135135135135 0.0135135135135 ; 67 | ( "2" "2" "2" "1" ) 0.0030959752322 0.0030959752322 0.0030959752322 0.0030959752322 0.0030959752322 ; 68 | ( "1" "2" "2" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 69 | ( "2" "2" "1" "3" ) 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 ; 70 | ( "0" "2" "2" "3" ) 0.00515463917526 0.00515463917526 0.00515463917526 0.00515463917526 0.00515463917526 ; 71 | ( "2" "0" "1" "1" ) 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 ; 72 | ( "1" "1" "1" "3" ) 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 ; 73 | ( "0" "1" "2" "3" ) 0.00240384615385 0.00240384615385 0.00240384615385 0.00240384615385 
0.00240384615385 ; 74 | ( "0" "0" "2" "1" ) 0.00337837837838 0.00337837837838 0.00337837837838 0.00337837837838 0.00337837837838 ; 75 | ( "2" "0" "2" "4" ) 0.00970873786408 0.00970873786408 0.00970873786408 0.00970873786408 0.00970873786408 ; 76 | ( "1" "0" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 77 | ( "0" "0" "1" "3" ) 0.00240384615385 0.00240384615385 0.00240384615385 0.00240384615385 0.00240384615385 ; 78 | ( "1" "1" "2" "1" ) 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 ; 79 | ( "2" "1" "1" "1" ) 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 0.00305810397554 ; 80 | ( "2" "0" "2" "2" ) 0.166666666667 0.166666666667 0.166666666667 0.166666666667 0.166666666667 ; 81 | ( "0" "1" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 82 | ( "0" "2" "1" "1" ) 0.0135135135135 0.0135135135135 0.0135135135135 0.0135135135135 0.0135135135135 ; 83 | ( "2" "1" "2" "3" ) 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 0.00613496932515 ; 84 | ( "0" "1" "1" "3" ) 0.00240384615385 0.00240384615385 0.00240384615385 0.00240384615385 0.00240384615385 ; 85 | ( "2" "2" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 86 | ( "1" "1" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 87 | ( "1" "2" "1" "1" ) 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 0.00431034482759 ; 88 | ( "0" "1" "2" "1" ) 0.0135135135135 0.0135135135135 0.0135135135135 0.0135135135135 0.0135135135135 ; 89 | ( "2" "1" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 90 | ( "1" "0" "2" "4" ) 0.0140845070423 0.0140845070423 0.0140845070423 0.0140845070423 0.0140845070423 ; 91 | ( "0" "0" "0" "0" ) 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 0.0020618556701 ; 92 | ( "2" "0" "2" 
"1" ) 0.0030959752322 0.0030959752322 0.0030959752322 0.0030959752322 0.0030959752322 ; 93 | ( "1" "0" "2" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 94 | ( "1" "1" "2" "3" ) 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 ; 95 | ( "2" "2" "2" "4" ) 0.00970873786408 0.00970873786408 0.00970873786408 0.00970873786408 0.00970873786408 ; 96 | ( "1" "0" "1" "3" ) 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 0.00387596899225 ; 97 | ( "2" "2" "2" "3" ) 0.0147058823529 0.0147058823529 0.0147058823529 0.0147058823529 0.0147058823529 ; 98 | ( "1" "2" "2" "3" ) 0.00763358778626 0.00763358778626 0.00763358778626 0.00763358778626 0.00763358778626 ; 99 | } 100 | 101 | variable "form" { 102 | type discrete[4] { "complete" "completed" "foster" "incomplete" } 103 | } 104 | probability ( "form" ) { 105 | table 7.7136686208e-05 7.7136686208e-05 7.7136686208e-05 7.7136686208e-05 ; 106 | } 107 | 108 | variable "children" { 109 | type discrete[4] { "1" "2" "3" "more" } 110 | } 111 | probability ( "children" | "class" ) { 112 | default 0.25 0.25 0.25 0.25 ; 113 | ( "2" ) 0.166666666667 0.166666666667 0.166666666667 0.166666666667 ; 114 | ( "0" ) 0.000231267345051 0.000231267345051 0.000231267345051 0.000231267345051 ; 115 | ( "3" ) 0.000247035573123 0.000247035573123 0.000247035573123 0.000247035573123 ; 116 | ( "1" ) 0.000234192037471 0.000234192037471 0.000234192037471 0.000234192037471 ; 117 | ( "4" ) 0.00301204819277 0.00301204819277 0.00301204819277 0.00301204819277 ; 118 | } 119 | 120 | variable "housing" { 121 | type discrete[3] { "convenient" "critical" "less_conv" } 122 | } 123 | probability ( "housing" | "has_nurs" "finance" "class" ) { 124 | default 0.333333333333 0.333333333333 0.333333333333 ; 125 | ( "3" "0" "2" ) 0.2 0.2 0.2 ; 126 | ( "0" "1" "1" ) 0.00534759358289 0.00534759358289 0.00534759358289 ; 127 | ( "1" "0" "3" ) 0.0028818443804 0.0028818443804 
0.0028818443804 ; 128 | ( "1" "0" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 129 | ( "3" "0" "4" ) 0.0112359550562 0.0112359550562 0.0112359550562 ; 130 | ( "1" "0" "1" ) 0.00208768267223 0.00208768267223 0.00208768267223 ; 131 | ( "4" "0" "3" ) 0.00138312586445 0.00138312586445 0.00138312586445 ; 132 | ( "2" "0" "3" ) 0.00934579439252 0.00934579439252 0.00934579439252 ; 133 | ( "4" "0" "1" ) 0.00680272108844 0.00680272108844 0.00680272108844 ; 134 | ( "2" "0" "4" ) 0.010989010989 0.010989010989 0.010989010989 ; 135 | ( "1" "0" "4" ) 0.0212765957447 0.0212765957447 0.0212765957447 ; 136 | ( "4" "0" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 137 | ( "3" "0" "3" ) 0.00934579439252 0.00934579439252 0.00934579439252 ; 138 | ( "3" "1" "4" ) 0.0212765957447 0.0212765957447 0.0212765957447 ; 139 | ( "2" "0" "1" ) 0.00148148148148 0.00148148148148 0.00148148148148 ; 140 | ( "2" "0" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 141 | ( "0" "0" "3" ) 0.00170357751278 0.00170357751278 0.00170357751278 ; 142 | ( "3" "1" "1" ) 0.00148148148148 0.00148148148148 0.00148148148148 ; 143 | ( "3" "1" "3" ) 0.00662251655629 0.00662251655629 0.00662251655629 ; 144 | ( "2" "1" "4" ) 0.0212765957447 0.0212765957447 0.0212765957447 ; 145 | ( "3" "1" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 146 | ( "0" "0" "1" ) 0.00353356890459 0.00353356890459 0.00353356890459 ; 147 | ( "4" "1" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 148 | ( "0" "0" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 149 | ( "4" "1" "1" ) 0.0144927536232 0.0144927536232 0.0144927536232 ; 150 | ( "1" "1" "1" ) 0.00232018561485 0.00232018561485 0.00232018561485 ; 151 | ( "0" "1" "3" ) 0.00146412884334 0.00146412884334 0.00146412884334 ; 152 | ( "2" "1" "3" ) 0.00662251655629 0.00662251655629 0.00662251655629 ; 153 | ( "1" "1" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 154 | ( "4" "1" "3" ) 0.00124843945069 
0.00124843945069 0.00124843945069 ; 155 | ( "2" "1" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 156 | ( "1" "1" "3" ) 0.00239808153477 0.00239808153477 0.00239808153477 ; 157 | ( "2" "1" "1" ) 0.00148148148148 0.00148148148148 0.00148148148148 ; 158 | ( "3" "0" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 159 | ( "3" "0" "1" ) 0.00148148148148 0.00148148148148 0.00148148148148 ; 160 | ( "0" "1" "0" ) 0.00229885057471 0.00229885057471 0.00229885057471 ; 161 | ( "1" "1" "4" ) 0.04 0.04 0.04 ; 162 | } 163 | 164 | variable "finance" { 165 | type discrete[2] { "convenient" "inconv" } 166 | } 167 | probability ( "finance" | "housing" "class" ) { 168 | default 0.5 0.5 ; 169 | ( "0" "1" ) 0.000617283950617 0.000617283950617 ; 170 | ( "0" "0" ) 0.000693481276006 0.000693481276006 ; 171 | ( "1" "1" ) 0.000797448165869 0.000797448165869 ; 172 | ( "2" "1" ) 0.00071530758226 0.00071530758226 ; 173 | ( "0" "2" ) 0.25 0.25 ; 174 | ( "2" "0" ) 0.000693481276006 0.000693481276006 ; 175 | ( "1" "3" ) 0.000621118012422 0.000621118012422 ; 176 | ( "2" "3" ) 0.000721500721501 0.000721500721501 ; 177 | ( "1" "4" ) 0.0454545454545 0.0454545454545 ; 178 | ( "0" "4" ) 0.0047619047619 0.0047619047619 ; 179 | ( "1" "0" ) 0.000693481276006 0.000693481276006 ; 180 | ( "0" "3" ) 0.000948766603416 0.000948766603416 ; 181 | ( "2" "4" ) 0.00980392156863 0.00980392156863 ; 182 | } 183 | 184 | variable "social" { 185 | type discrete[3] { "nonprob" "problematic" "slightly_prob" } 186 | } 187 | probability ( "social" | "parents" "has_nurs" "health" "class" ) { 188 | default 0.333333333333 0.333333333333 0.333333333333 ; 189 | ( "1" "3" "2" "1" ) 0.00444444444444 0.00444444444444 0.00444444444444 ; 190 | ( "2" "3" "2" "2" ) 0.2 0.2 0.2 ; 191 | ( "2" "2" "1" "1" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 192 | ( "0" "3" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 193 | ( "1" "3" "2" "4" ) 0.0144927536232 0.0144927536232 0.0144927536232 ; 194 | 
( "0" "4" "2" "1" ) 0.0142857142857 0.0142857142857 0.0142857142857 ; 195 | ( "2" "4" "2" "1" ) 0.0142857142857 0.0142857142857 0.0142857142857 ; 196 | ( "2" "0" "1" "3" ) 0.00520833333333 0.00520833333333 0.00520833333333 ; 197 | ( "2" "3" "2" "4" ) 0.0149253731343 0.0149253731343 0.0149253731343 ; 198 | ( "1" "1" "1" "1" ) 0.00980392156863 0.00980392156863 0.00980392156863 ; 199 | ( "1" "2" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 200 | ( "1" "4" "2" "3" ) 0.00446428571429 0.00446428571429 0.00446428571429 ; 201 | ( "0" "0" "2" "3" ) 0.00446428571429 0.00446428571429 0.00446428571429 ; 202 | ( "0" "4" "1" "1" ) 0.166666666667 0.166666666667 0.166666666667 ; 203 | ( "0" "2" "1" "3" ) 0.00520833333333 0.00520833333333 0.00520833333333 ; 204 | ( "2" "0" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 205 | ( "2" "3" "1" "1" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 206 | ( "0" "0" "1" "1" ) 0.166666666667 0.166666666667 0.166666666667 ; 207 | ( "2" "4" "1" "3" ) 0.00347222222222 0.00347222222222 0.00347222222222 ; 208 | ( "1" "0" "1" "1" ) 0.166666666667 0.166666666667 0.166666666667 ; 209 | ( "2" "1" "2" "1" ) 0.00444444444444 0.00444444444444 0.00444444444444 ; 210 | ( "2" "0" "2" "3" ) 0.0151515151515 0.0151515151515 0.0151515151515 ; 211 | ( "1" "0" "2" "3" ) 0.00446428571429 0.00446428571429 0.00446428571429 ; 212 | ( "1" "2" "2" "4" ) 0.0144927536232 0.0144927536232 0.0144927536232 ; 213 | ( "2" "4" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 214 | ( "0" "1" "1" "1" ) 0.166666666667 0.166666666667 0.166666666667 ; 215 | ( "0" "4" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 216 | ( "2" "2" "2" "1" ) 0.00444444444444 0.00444444444444 0.00444444444444 ; 217 | ( "1" "2" "2" "1" ) 0.00444444444444 0.00444444444444 0.00444444444444 ; 218 | ( "0" "2" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 219 | ( "1" "4" "0" "0" ) 0.00343642611684 0.00343642611684 
0.00343642611684 ; 220 | ( "0" "4" "1" "3" ) 0.00347222222222 0.00347222222222 0.00347222222222 ; 221 | ( "0" "3" "1" "3" ) 0.00520833333333 0.00520833333333 0.00520833333333 ; 222 | ( "1" "3" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 223 | ( "0" "3" "2" "1" ) 0.00438596491228 0.00438596491228 0.00438596491228 ; 224 | ( "2" "4" "2" "3" ) 0.00446428571429 0.00446428571429 0.00446428571429 ; 225 | ( "2" "0" "1" "1" ) 0.00980392156863 0.00980392156863 0.00980392156863 ; 226 | ( "2" "3" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 227 | ( "1" "1" "1" "3" ) 0.00520833333333 0.00520833333333 0.00520833333333 ; 228 | ( "0" "2" "2" "1" ) 0.00438596491228 0.00438596491228 0.00438596491228 ; 229 | ( "0" "1" "2" "3" ) 0.00446428571429 0.00446428571429 0.00446428571429 ; 230 | ( "0" "0" "2" "1" ) 0.0142857142857 0.0142857142857 0.0142857142857 ; 231 | ( "1" "0" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 232 | ( "0" "0" "1" "3" ) 0.00347222222222 0.00347222222222 0.00347222222222 ; 233 | ( "1" "4" "1" "1" ) 0.166666666667 0.166666666667 0.166666666667 ; 234 | ( "1" "1" "2" "1" ) 0.00438596491228 0.00438596491228 0.00438596491228 ; 235 | ( "2" "1" "1" "1" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 236 | ( "2" "4" "1" "1" ) 0.166666666667 0.166666666667 0.166666666667 ; 237 | ( "0" "1" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 238 | ( "0" "2" "1" "1" ) 0.00980392156863 0.00980392156863 0.00980392156863 ; 239 | ( "0" "2" "2" "3" ) 0.0151515151515 0.0151515151515 0.0151515151515 ; 240 | ( "2" "1" "2" "4" ) 0.0144927536232 0.0144927536232 0.0144927536232 ; 241 | ( "1" "3" "1" "1" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 242 | ( "0" "1" "1" "3" ) 0.00347222222222 0.00347222222222 0.00347222222222 ; 243 | ( "2" "3" "2" "1" ) 0.00444444444444 0.00444444444444 0.00444444444444 ; 244 | ( "1" "4" "1" "3" ) 0.00347222222222 0.00347222222222 0.00347222222222 ; 245 | ( "2" "2" "0" "0" ) 
0.00343642611684 0.00343642611684 0.00343642611684 ; 246 | ( "0" "3" "1" "1" ) 0.00980392156863 0.00980392156863 0.00980392156863 ; 247 | ( "1" "4" "2" "1" ) 0.0142857142857 0.0142857142857 0.0142857142857 ; 248 | ( "0" "3" "2" "3" ) 0.0151515151515 0.0151515151515 0.0151515151515 ; 249 | ( "1" "1" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 250 | ( "1" "2" "1" "1" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 251 | ( "0" "1" "2" "1" ) 0.0142857142857 0.0142857142857 0.0142857142857 ; 252 | ( "2" "1" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 253 | ( "1" "0" "1" "3" ) 0.00347222222222 0.00347222222222 0.00347222222222 ; 254 | ( "0" "0" "0" "0" ) 0.00343642611684 0.00343642611684 0.00343642611684 ; 255 | ( "2" "0" "2" "1" ) 0.00438596491228 0.00438596491228 0.00438596491228 ; 256 | ( "1" "0" "2" "1" ) 0.0142857142857 0.0142857142857 0.0142857142857 ; 257 | ( "1" "1" "2" "3" ) 0.0151515151515 0.0151515151515 0.0151515151515 ; 258 | ( "2" "2" "2" "4" ) 0.0144927536232 0.0144927536232 0.0144927536232 ; 259 | ( "0" "4" "2" "3" ) 0.00446428571429 0.00446428571429 0.00446428571429 ; 260 | } 261 | 262 | variable "health" { 263 | type discrete[3] { "not_recom" "priority" "recommended" } 264 | } 265 | probability ( "health" | "has_nurs" "class" ) { 266 | default 0.333333333333 0.333333333333 0.333333333333 ; 267 | ( "0" "1" ) 0.00214132762313 0.00214132762313 0.00214132762313 ; 268 | ( "3" "2" ) 0.2 0.2 0.2 ; 269 | ( "0" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 270 | ( "0" "3" ) 0.000789265982636 0.000789265982636 0.000789265982636 ; 271 | ( "3" "3" ) 0.00392156862745 0.00392156862745 0.00392156862745 ; 272 | ( "3" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 273 | ( "4" "1" ) 0.00469483568075 0.00469483568075 0.00469483568075 ; 274 | ( "3" "1" ) 0.000742390497402 0.000742390497402 0.000742390497402 ; 275 | ( "2" "1" ) 0.000742390497402 0.000742390497402 0.000742390497402 ; 276 | ( "1" "1" ) 
0.00110253583241 0.00110253583241 0.00110253583241 ; 277 | ( "2" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 278 | ( "1" "3" ) 0.00131406044678 0.00131406044678 0.00131406044678 ; 279 | ( "2" "3" ) 0.00392156862745 0.00392156862745 0.00392156862745 ; 280 | ( "1" "4" ) 0.0144927536232 0.0144927536232 0.0144927536232 ; 281 | ( "4" "3" ) 0.000657462195924 0.000657462195924 0.000657462195924 ; 282 | ( "1" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 283 | ( "3" "4" ) 0.00751879699248 0.00751879699248 0.00751879699248 ; 284 | ( "2" "4" ) 0.00740740740741 0.00740740740741 0.00740740740741 ; 285 | ( "4" "0" ) 0.00115340253749 0.00115340253749 0.00115340253749 ; 286 | } 287 | 288 | variable "class" { 289 | type discrete[5] { "not_recom" "priority" "recommend" "spec_prior" "very_recom" } 290 | } 291 | probability ( "class" | "parents" "has_nurs" "health" ) { 292 | default 0.2 0.2 0.2 0.2 0.2 ; 293 | ( "0" "1" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 294 | ( "0" "2" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 295 | ( "1" "3" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 296 | ( "1" "0" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 297 | ( "0" "1" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 298 | ( "1" "3" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 299 | ( "1" "0" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 300 | ( "0" "2" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 301 | ( "1" "3" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 302 | ( "2" "3" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 
0.00341296928328 ; 303 | ( "2" "4" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 304 | ( "2" "2" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 305 | ( "2" "3" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 306 | ( "2" "4" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 307 | ( "2" "3" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 308 | ( "0" "4" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 309 | ( "1" "4" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 310 | ( "0" "2" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 311 | ( "2" "4" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 312 | ( "0" "3" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 313 | ( "2" "0" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 314 | ( "1" "2" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 315 | ( "2" "0" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 316 | ( "1" "2" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 317 | ( "0" "0" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 318 | ( "1" "1" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 319 | ( "1" "2" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 320 | ( "2" "0" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 321 | ( "0" "0" 
"1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 322 | ( "0" "3" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 323 | ( "0" "0" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 324 | ( "0" "1" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 325 | ( "2" "1" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 326 | ( "1" "4" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 327 | ( "1" "1" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 328 | ( "0" "4" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 329 | ( "0" "3" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 330 | ( "2" "2" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 331 | ( "2" "1" "0" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 332 | ( "2" "2" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 333 | ( "1" "4" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 334 | ( "2" "1" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 335 | ( "1" "1" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 336 | ( "0" "4" "1" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 337 | ( "1" "0" "2" ) 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 0.00341296928328 ; 338 | } 339 | 340 | -------------------------------------------------------------------------------- /chap04/nursery_net_cpd.txt: 
-------------------------------------------------------------------------------- 1 | { 2 | 'parents' : { 3 | 'vals' : ['great_pret', 'pretentious', 'usual'] , 4 | 'pars' : ['has_nurs', 'health', 'class'] , 5 | 'cpds' : { 6 | (0, 1, 1) : { 2 : 0.925925925926 , 1 : 0.037037037037 , 0 : 0.037037037037 , None : 0.00925925925926 } , 7 | (1, 0, 0) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00115340253749 } , 8 | (2, 2, 4) : { 2 : 0.496296296296 , 1 : 0.496296296296 , None : 0.00740740740741 } , 9 | (0, 1, 3) : { 2 : 0.249343832021 , 0 : 0.37532808399 , 1 : 0.37532808399 , None : 0.00131233595801 } , 10 | (0, 2, 1) : { 0 : 0.187845303867 , 1 : 0.187845303867 , 2 : 0.624309392265 , None : 0.00276243093923 } , 11 | (2, 2, 3) : { 0 : 0.969696969697 , None : 0.0151515151515 } , 12 | (4, 0, 0) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00115340253749 } , 13 | (0, 2, 3) : { 0 : 0.437007874016 , 1 : 0.437007874016 , 2 : 0.125984251969 , None : 0.00196850393701 } , 14 | (3, 2, 2) : { 2 : 0.6 , None : 0.2 } , 15 | (3, 1, 3) : { 0 : 0.989583333333 , None : 0.00520833333333 } , 16 | (3, 2, 3) : { 0 : 0.969696969697 , None : 0.0151515151515 } , 17 | (2, 0, 0) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00115340253749 } , 18 | (1, 2, 1) : { 0 : 0.131528046422 , 2 : 0.431334622824 , 1 : 0.437137330754 , None : 0.00193423597679 } , 19 | (3, 1, 1) : { 0 : 0.147492625369 , 1 : 0.426253687316 , 2 : 0.426253687316 , None : 0.00147492625369 } , 20 | (3, 2, 1) : { 0 : 0.33630952381 , 2 : 0.331845238095 , 1 : 0.331845238095 , None : 0.0014880952381 } , 21 | (1, 2, 3) : { 0 : 0.773519163763 , 1 : 0.222996515679 , None : 0.00348432055749 } , 22 | (0, 0, 0) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00115340253749 } , 23 | (4, 1, 1) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.0833333333333 } , 24 | (1, 2, 4) : { 2 : 0.971014492754 , 
None : 0.0144927536232 } , 25 | (1, 1, 1) : { 1 : 0.254452926209 , 2 : 0.735368956743 , 0 : 0.0101781170483 , None : 0.00254452926209 } , 26 | (3, 2, 4) : { 1 : 0.503759398496 , 2 : 0.488721804511 , None : 0.00751879699248 } , 27 | (2, 1, 3) : { 0 : 0.989583333333 , None : 0.00520833333333 } , 28 | (4, 1, 3) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.0011655011655 } , 29 | (2, 2, 1) : { 1 : 0.331845238095 , 2 : 0.331845238095 , 0 : 0.33630952381 , None : 0.0014880952381 } , 30 | (2, 1, 1) : { 2 : 0.426253687316 , 1 : 0.426253687316 , 0 : 0.147492625369 , None : 0.00147492625369 } , 31 | (4, 2, 1) : { 1 : 0.333333333333 , 2 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00490196078431 } , 32 | (1, 1, 3) : { 1 : 0.398322851153 , 0 : 0.599580712788 , None : 0.0020964360587 } , 33 | (3, 0, 0) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00115340253749 } , 34 | (4, 2, 3) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.0015015015015 } , 35 | None : 0.333333333333 } } , 36 | 'has_nurs' : { 37 | 'vals' : ['critical', 'improper', 'less_proper', 'proper', 'very_crit'] , 38 | 'pars' : ['parents', 'social', 'health', 'class'] , 39 | 'cpds' : { 40 | (2, 2, 1, 1) : { 2 : 0.296636085627 , 3 : 0.296636085627 , 4 : 0.00611620795107 , 1 : 0.296636085627 , 0 : 0.103975535168 , None : 0.00305810397554 } , 41 | (0, 2, 2, 1) : { 2 : 0.327702702703 , 3 : 0.327702702703 , 0 : 0.114864864865 , 4 : 0.114864864865 , 1 : 0.114864864865 , None : 0.00337837837838 } , 42 | (2, 0, 1, 3) : { 4 : 0.588957055215 , 0 : 0.39263803681 , None : 0.00613496932515 } , 43 | (1, 1, 1, 1) : { 1 : 0.146551724138 , 0 : 0.00862068965517 , 3 : 0.418103448276 , 2 : 0.418103448276 , 4 : 0.00862068965517 , None : 0.00431034482759 } , 44 | (1, 2, 0, 0) : { 0 : 0.2 , 1 : 0.2 , 2 : 0.2 , 3 : 0.2 , 4 : 0.2 , None : 0.0020618556701 } , 45 | (0, 0, 2, 3) : { 4 : 0.329896907216 , 1 : 0.329896907216 , 0 : 0.329896907216 , None : 
0.00515463917526 } , 46 | (2, 0, 0, 0) : { 4 : 0.2 , 1 : 0.2 , 0 : 0.2 , 3 : 0.2 , 2 : 0.2 , None : 0.0020618556701 } , 47 | (0, 0, 1, 1) : { 3 : 0.459459459459 , 4 : 0.027027027027 , 0 : 0.027027027027 , 1 : 0.027027027027 , 2 : 0.459459459459 , None : 0.0135135135135 } , 48 | (1, 2, 1, 3) : { 0 : 0.372093023256 , 1 : 0.248062015504 , 4 : 0.372093023256 , None : 0.00387596899225 } , 49 | (2, 1, 1, 3) : { 0 : 0.39263803681 , 4 : 0.588957055215 , None : 0.00613496932515 } , 50 | (0, 2, 1, 3) : { 4 : 0.230769230769 , 3 : 0.153846153846 , 2 : 0.153846153846 , 0 : 0.230769230769 , 1 : 0.230769230769 , None : 0.00240384615385 } , 51 | (1, 0, 1, 1) : { 0 : 0.00862068965517 , 1 : 0.146551724138 , 4 : 0.00862068965517 , 2 : 0.418103448276 , 3 : 0.418103448276 , None : 0.00431034482759 } , 52 | (2, 1, 2, 1) : { 3 : 0.296636085627 , 4 : 0.00611620795107 , 1 : 0.296636085627 , 0 : 0.103975535168 , 2 : 0.296636085627 , None : 0.00305810397554 } , 53 | (2, 0, 2, 3) : { 4 : 0.941176470588 , None : 0.0147058823529 } , 54 | (1, 0, 2, 3) : { 0 : 0.488549618321 , 4 : 0.488549618321 , None : 0.00763358778626 } , 55 | (2, 2, 2, 2) : { 3 : 0.333333333333 , None : 0.166666666667 } , 56 | (1, 2, 2, 4) : { 2 : 0.478873239437 , 3 : 0.478873239437 , None : 0.0140845070423 } , 57 | (0, 2, 0, 0) : { 2 : 0.2 , 4 : 0.2 , 1 : 0.2 , 0 : 0.2 , 3 : 0.2 , None : 0.0020618556701 } , 58 | (0, 1, 1, 1) : { 3 : 0.459459459459 , 4 : 0.027027027027 , 0 : 0.027027027027 , 1 : 0.027027027027 , 2 : 0.459459459459 , None : 0.0135135135135 } , 59 | (2, 2, 2, 1) : { 0 : 0.300309597523 , 1 : 0.198142414861 , 2 : 0.198142414861 , 3 : 0.198142414861 , 4 : 0.105263157895 , None : 0.0030959752322 } , 60 | (1, 2, 2, 1) : { 3 : 0.21843003413 , 2 : 0.21843003413 , 1 : 0.331058020478 , 0 : 0.116040955631 , 4 : 0.116040955631 , None : 0.00341296928328 } , 61 | (2, 2, 1, 3) : { 0 : 0.39263803681 , 4 : 0.588957055215 , None : 0.00613496932515 } , 62 | (0, 2, 2, 3) : { 0 : 0.329896907216 , 4 : 0.329896907216 , 1 : 
0.329896907216 , None : 0.00515463917526 } , 63 | (2, 0, 1, 1) : { 1 : 0.296636085627 , 0 : 0.103975535168 , 3 : 0.296636085627 , 2 : 0.296636085627 , 4 : 0.00611620795107 , None : 0.00305810397554 } , 64 | (1, 1, 1, 3) : { 1 : 0.248062015504 , 4 : 0.372093023256 , 0 : 0.372093023256 , None : 0.00387596899225 } , 65 | (0, 1, 2, 3) : { 3 : 0.153846153846 , 2 : 0.153846153846 , 4 : 0.230769230769 , 1 : 0.230769230769 , 0 : 0.230769230769 , None : 0.00240384615385 } , 66 | (0, 0, 2, 1) : { 0 : 0.114864864865 , 2 : 0.327702702703 , 3 : 0.327702702703 , 4 : 0.114864864865 , 1 : 0.114864864865 , None : 0.00337837837838 } , 67 | (2, 0, 2, 4) : { 3 : 0.320388349515 , 2 : 0.330097087379 , 1 : 0.330097087379 , None : 0.00970873786408 } , 68 | (1, 0, 0, 0) : { 4 : 0.2 , 2 : 0.2 , 3 : 0.2 , 0 : 0.2 , 1 : 0.2 , None : 0.0020618556701 } , 69 | (0, 0, 1, 3) : { 0 : 0.230769230769 , 3 : 0.153846153846 , 1 : 0.230769230769 , 2 : 0.153846153846 , 4 : 0.230769230769 , None : 0.00240384615385 } , 70 | (1, 1, 2, 1) : { 4 : 0.00862068965517 , 2 : 0.418103448276 , 3 : 0.418103448276 , 0 : 0.00862068965517 , 1 : 0.146551724138 , None : 0.00431034482759 } , 71 | (2, 1, 1, 1) : { 4 : 0.00611620795107 , 2 : 0.296636085627 , 3 : 0.296636085627 , 0 : 0.103975535168 , 1 : 0.296636085627 , None : 0.00305810397554 } , 72 | (2, 0, 2, 2) : { 3 : 0.333333333333 , None : 0.166666666667 } , 73 | (0, 1, 0, 0) : { 0 : 0.2 , 1 : 0.2 , 2 : 0.2 , 3 : 0.2 , 4 : 0.2 , None : 0.0020618556701 } , 74 | (0, 2, 1, 1) : { 2 : 0.459459459459 , 3 : 0.459459459459 , 0 : 0.027027027027 , 1 : 0.027027027027 , 4 : 0.027027027027 , None : 0.0135135135135 } , 75 | (2, 1, 2, 3) : { 0 : 0.39263803681 , 4 : 0.588957055215 , None : 0.00613496932515 } , 76 | (0, 1, 1, 3) : { 0 : 0.230769230769 , 1 : 0.230769230769 , 4 : 0.230769230769 , 2 : 0.153846153846 , 3 : 0.153846153846 , None : 0.00240384615385 } , 77 | (2, 2, 0, 0) : { 2 : 0.2 , 1 : 0.2 , 0 : 0.2 , 3 : 0.2 , 4 : 0.2 , None : 0.0020618556701 } , 78 | (1, 1, 0, 0) : { 3 
: 0.2 , 2 : 0.2 , 4 : 0.2 , 1 : 0.2 , 0 : 0.2 , None : 0.0020618556701 } , 79 | (1, 2, 1, 1) : { 2 : 0.418103448276 , 3 : 0.418103448276 , 4 : 0.00862068965517 , 0 : 0.00862068965517 , 1 : 0.146551724138 , None : 0.00431034482759 } , 80 | (0, 1, 2, 1) : { 1 : 0.027027027027 , 0 : 0.027027027027 , 2 : 0.459459459459 , 4 : 0.027027027027 , 3 : 0.459459459459 , None : 0.0135135135135 } , 81 | (2, 1, 0, 0) : { 2 : 0.2 , 3 : 0.2 , 0 : 0.2 , 1 : 0.2 , 4 : 0.2 , None : 0.0020618556701 } , 82 | (1, 0, 2, 4) : { 2 : 0.478873239437 , 3 : 0.478873239437 , None : 0.0140845070423 } , 83 | (0, 0, 0, 0) : { 3 : 0.2 , 1 : 0.2 , 4 : 0.2 , 0 : 0.2 , 2 : 0.2 , None : 0.0020618556701 } , 84 | (2, 0, 2, 1) : { 2 : 0.198142414861 , 3 : 0.198142414861 , 0 : 0.300309597523 , 1 : 0.198142414861 , 4 : 0.105263157895 , None : 0.0030959752322 } , 85 | (1, 0, 2, 1) : { 2 : 0.21843003413 , 4 : 0.116040955631 , 1 : 0.331058020478 , 0 : 0.116040955631 , 3 : 0.21843003413 , None : 0.00341296928328 } , 86 | (1, 1, 2, 3) : { 0 : 0.372093023256 , 1 : 0.248062015504 , 4 : 0.372093023256 , None : 0.00387596899225 } , 87 | (2, 2, 2, 4) : { 2 : 0.330097087379 , 1 : 0.330097087379 , 3 : 0.320388349515 , None : 0.00970873786408 } , 88 | (1, 0, 1, 3) : { 4 : 0.372093023256 , 0 : 0.372093023256 , 1 : 0.248062015504 , None : 0.00387596899225 } , 89 | (2, 2, 2, 3) : { 4 : 0.941176470588 , None : 0.0147058823529 } , 90 | (1, 2, 2, 3) : { 4 : 0.488549618321 , 0 : 0.488549618321 , None : 0.00763358778626 } , 91 | None : 0.2 } } , 92 | 'form' : { 93 | 'vals' : ['complete', 'completed', 'foster', 'incomplete'] , 94 | 'pars' : [] , 95 | 'cpds' : { 96 | () : { 2 : 0.25 , 0 : 0.25 , 3 : 0.25 , 1 : 0.25 , None : 7.7136686208e-05 } , 97 | None : 0.25 } } , 98 | 'children' : { 99 | 'vals' : ['1', '2', '3', 'more'] , 100 | 'pars' : ['class'] , 101 | 'cpds' : { 102 | (2,) : { 0 : 0.5 , None : 0.166666666667 } , 103 | (0,) : { 1 : 0.25 , 0 : 0.25 , 3 : 0.25 , 2 : 0.25 , None : 0.000231267345051 } , 104 | (3,) : { 2 : 
0.28087944664 , 3 : 0.28087944664 , 0 : 0.198863636364 , 1 : 0.239377470356 , None : 0.000247035573123 } , 105 | (1,) : { 2 : 0.230679156909 , 3 : 0.230679156909 , 0 : 0.282669789227 , 1 : 0.255971896956 , None : 0.000234192037471 } , 106 | (4,) : { 3 : 0.123493975904 , 2 : 0.123493975904 , 1 : 0.30421686747 , 0 : 0.448795180723 , None : 0.00301204819277 } , 107 | None : 0.25 } } , 108 | 'housing' : { 109 | 'vals' : ['convenient', 'critical', 'less_conv'] , 110 | 'pars' : ['has_nurs', 'finance', 'class'] , 111 | 'cpds' : { 112 | (3, 0, 2) : { 0 : 0.6 , None : 0.2 } , 113 | (0, 1, 1) : { 2 : 0.390374331551 , 1 : 0.219251336898 , 0 : 0.390374331551 , None : 0.00534759358289 } , 114 | (1, 0, 3) : { 0 : 0.175792507205 , 2 : 0.377521613833 , 1 : 0.446685878963 , None : 0.0028818443804 } , 115 | (1, 0, 0) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00229885057471 } , 116 | (3, 0, 4) : { 1 : 0.0561797752809 , 2 : 0.23595505618 , 0 : 0.707865168539 , None : 0.0112359550562 } , 117 | (1, 0, 1) : { 0 : 0.41127348643 , 2 : 0.311064718163 , 1 : 0.277661795407 , None : 0.00208768267223 } , 118 | (4, 0, 3) : { 2 : 0.358229598893 , 0 : 0.250345781466 , 1 : 0.39142461964 , None : 0.00138312586445 } , 119 | (2, 0, 3) : { 1 : 0.570093457944 , 2 : 0.420560747664 , None : 0.00934579439252 } , 120 | (4, 0, 1) : { 0 : 0.741496598639 , 1 : 0.047619047619 , 2 : 0.210884353741 , None : 0.00680272108844 } , 121 | (2, 0, 4) : { 2 : 0.230769230769 , 1 : 0.0549450549451 , 0 : 0.714285714286 , None : 0.010989010989 } , 122 | (1, 0, 4) : { 2 : 0.234042553191 , 0 : 0.702127659574 , 1 : 0.063829787234 , None : 0.0212765957447 } , 123 | (4, 0, 0) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00229885057471 } , 124 | (3, 0, 3) : { 2 : 0.420560747664 , 1 : 0.570093457944 , None : 0.00934579439252 } , 125 | (3, 1, 4) : { 2 : 0.446808510638 , 1 : 0.106382978723 , 0 : 0.446808510638 , None : 0.0212765957447 } , 126 | (2, 0, 1) : { 0 : 
0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00148148148148 } , 127 | (2, 0, 0) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00229885057471 } , 128 | (0, 0, 3) : { 1 : 0.424190800681 , 2 : 0.369676320273 , 0 : 0.206132879046 , None : 0.00170357751278 } , 129 | (3, 1, 1) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00148148148148 } , 130 | (3, 1, 3) : { 2 : 0.298013245033 , 0 : 0.298013245033 , 1 : 0.403973509934 , None : 0.00662251655629 } , 131 | (2, 1, 4) : { 1 : 0.106382978723 , 2 : 0.446808510638 , 0 : 0.446808510638 , None : 0.0212765957447 } , 132 | (3, 1, 0) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00229885057471 } , 133 | (0, 0, 1) : { 1 : 0.144876325088 , 2 : 0.257950530035 , 0 : 0.597173144876 , None : 0.00353356890459 } , 134 | (4, 1, 0) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00229885057471 } , 135 | (0, 0, 0) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00229885057471 } , 136 | (4, 1, 1) : { 1 : 0.101449275362 , 0 : 0.449275362319 , 2 : 0.449275362319 , None : 0.0144927536232 } , 137 | (1, 1, 1) : { 0 : 0.345707656613 , 1 : 0.308584686775 , 2 : 0.345707656613 , None : 0.00232018561485 } , 138 | (0, 1, 3) : { 2 : 0.317715959004 , 1 : 0.364568081991 , 0 : 0.317715959004 , None : 0.00146412884334 } , 139 | (2, 1, 3) : { 0 : 0.298013245033 , 2 : 0.298013245033 , 1 : 0.403973509934 , None : 0.00662251655629 } , 140 | (1, 1, 0) : { 1 : 0.333333333333 , 2 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00229885057471 } , 141 | (4, 1, 3) : { 0 : 0.323345817728 , 1 : 0.353308364544 , 2 : 0.323345817728 , None : 0.00124843945069 } , 142 | (2, 1, 0) : { 1 : 0.333333333333 , 2 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00229885057471 } , 143 | (1, 1, 3) : { 1 : 0.37170263789 , 0 : 0.314148681055 , 2 : 0.314148681055 , None : 0.00239808153477 } , 144 | (2, 1, 1) : { 0 : 
0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00148148148148 } , 145 | (3, 0, 0) : { 1 : 0.333333333333 , 2 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00229885057471 } , 146 | (3, 0, 1) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00148148148148 } , 147 | (0, 1, 0) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00229885057471 } , 148 | (1, 1, 4) : { 2 : 0.44 , 1 : 0.12 , 0 : 0.44 , None : 0.04 } , 149 | None : 0.333333333333 } } , 150 | 'finance' : { 151 | 'vals' : ['convenient', 'inconv'] , 152 | 'pars' : ['housing', 'class'] , 153 | 'cpds' : { 154 | (0, 1) : { 1 : 0.431481481481 , 0 : 0.568518518519 , None : 0.000617283950617 } , 155 | (0, 0) : { 1 : 0.5 , 0 : 0.5 , None : 0.000693481276006 } , 156 | (1, 1) : { 1 : 0.5 , 0 : 0.5 , None : 0.000797448165869 } , 157 | (2, 1) : { 0 : 0.5 , 1 : 0.5 , None : 0.00071530758226 } , 158 | (0, 2) : { 0 : 0.75 , None : 0.25 } , 159 | (2, 0) : { 1 : 0.5 , 0 : 0.5 , None : 0.000693481276006 } , 160 | (1, 3) : { 1 : 0.5 , 0 : 0.5 , None : 0.000621118012422 } , 161 | (2, 3) : { 0 : 0.5 , 1 : 0.5 , None : 0.000721500721501 } , 162 | (1, 4) : { 0 : 0.5 , 1 : 0.5 , None : 0.0454545454545 } , 163 | (0, 4) : { 1 : 0.242857142857 , 0 : 0.757142857143 , None : 0.0047619047619 } , 164 | (1, 0) : { 0 : 0.5 , 1 : 0.5 , None : 0.000693481276006 } , 165 | (0, 3) : { 0 : 0.342504743833 , 1 : 0.657495256167 , None : 0.000948766603416 } , 166 | (2, 4) : { 1 : 0.5 , 0 : 0.5 , None : 0.00980392156863 } , 167 | None : 0.5 } } , 168 | 'social' : { 169 | 'vals' : ['nonprob', 'problematic', 'slightly_prob'] , 170 | 'pars' : ['parents', 'has_nurs', 'health', 'class'] , 171 | 'cpds' : { 172 | (1, 3, 2, 1) : { 0 : 0.284444444444 , 1 : 0.431111111111 , 2 : 0.284444444444 , None : 0.00444444444444 } , 173 | (2, 3, 2, 2) : { 2 : 0.4 , 0 : 0.4 , None : 0.2 } , 174 | (2, 2, 1, 1) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } 
, 175 | (0, 3, 0, 0) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 176 | (1, 3, 2, 4) : { 0 : 0.492753623188 , 2 : 0.492753623188 , None : 0.0144927536232 } , 177 | (0, 4, 2, 1) : { 2 : 0.485714285714 , 0 : 0.485714285714 , 1 : 0.0285714285714 , None : 0.0142857142857 } , 178 | (2, 4, 2, 1) : { 2 : 0.485714285714 , 0 : 0.485714285714 , 1 : 0.0285714285714 , None : 0.0142857142857 } , 179 | (2, 0, 1, 3) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00520833333333 } , 180 | (2, 3, 2, 4) : { 2 : 0.492537313433 , 0 : 0.492537313433 , None : 0.0149253731343 } , 181 | (1, 1, 1, 1) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00980392156863 } , 182 | (1, 2, 0, 0) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00343642611684 } , 183 | (1, 4, 2, 3) : { 2 : 0.285714285714 , 1 : 0.428571428571 , 0 : 0.285714285714 , None : 0.00446428571429 } , 184 | (0, 0, 2, 3) : { 1 : 0.428571428571 , 0 : 0.285714285714 , 2 : 0.285714285714 , None : 0.00446428571429 } , 185 | (0, 4, 1, 1) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.166666666667 } , 186 | (0, 2, 1, 3) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00520833333333 } , 187 | (2, 0, 0, 0) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00343642611684 } , 188 | (2, 3, 1, 1) : { 1 : 0.333333333333 , 2 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } , 189 | (0, 0, 1, 1) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.166666666667 } , 190 | (2, 4, 1, 3) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00347222222222 } , 191 | (1, 0, 1, 1) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.166666666667 } , 192 | (2, 1, 2, 1) : { 1 : 0.431111111111 , 0 : 0.284444444444 , 2 : 0.284444444444 , None : 0.00444444444444 
} , 193 | (2, 0, 2, 3) : { 1 : 0.969696969697 , None : 0.0151515151515 } , 194 | (1, 0, 2, 3) : { 2 : 0.285714285714 , 1 : 0.428571428571 , 0 : 0.285714285714 , None : 0.00446428571429 } , 195 | (1, 2, 2, 4) : { 2 : 0.492753623188 , 0 : 0.492753623188 , None : 0.0144927536232 } , 196 | (2, 4, 0, 0) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 197 | (0, 1, 1, 1) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.166666666667 } , 198 | (0, 4, 0, 0) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 199 | (2, 2, 2, 1) : { 0 : 0.284444444444 , 1 : 0.431111111111 , 2 : 0.284444444444 , None : 0.00444444444444 } , 200 | (1, 2, 2, 1) : { 2 : 0.284444444444 , 1 : 0.431111111111 , 0 : 0.284444444444 , None : 0.00444444444444 } , 201 | (0, 2, 0, 0) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } , 202 | (1, 4, 0, 0) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00343642611684 } , 203 | (0, 4, 1, 3) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00347222222222 } , 204 | (0, 3, 1, 3) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00520833333333 } , 205 | (1, 3, 0, 0) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } , 206 | (0, 3, 2, 1) : { 1 : 0.149122807018 , 2 : 0.425438596491 , 0 : 0.425438596491 , None : 0.00438596491228 } , 207 | (2, 4, 2, 3) : { 0 : 0.285714285714 , 1 : 0.428571428571 , 2 : 0.285714285714 , None : 0.00446428571429 } , 208 | (2, 0, 1, 1) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00980392156863 } , 209 | (2, 3, 0, 0) : { 1 : 0.333333333333 , 2 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } , 210 | (1, 1, 1, 3) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00520833333333 } , 211 | (0, 
2, 2, 1) : { 0 : 0.425438596491 , 2 : 0.425438596491 , 1 : 0.149122807018 , None : 0.00438596491228 } , 212 | (0, 1, 2, 3) : { 2 : 0.285714285714 , 1 : 0.428571428571 , 0 : 0.285714285714 , None : 0.00446428571429 } , 213 | (0, 0, 2, 1) : { 0 : 0.485714285714 , 1 : 0.0285714285714 , 2 : 0.485714285714 , None : 0.0142857142857 } , 214 | (1, 0, 0, 0) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 215 | (0, 0, 1, 3) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00347222222222 } , 216 | (1, 4, 1, 1) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.166666666667 } , 217 | (1, 1, 2, 1) : { 2 : 0.425438596491 , 0 : 0.425438596491 , 1 : 0.149122807018 , None : 0.00438596491228 } , 218 | (2, 1, 1, 1) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 219 | (2, 4, 1, 1) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.166666666667 } , 220 | (0, 1, 0, 0) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00343642611684 } , 221 | (0, 2, 1, 1) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00980392156863 } , 222 | (0, 2, 2, 3) : { 1 : 0.969696969697 , None : 0.0151515151515 } , 223 | (2, 1, 2, 4) : { 2 : 0.492753623188 , 0 : 0.492753623188 , None : 0.0144927536232 } , 224 | (1, 3, 1, 1) : { 0 : 0.333333333333 , 2 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 225 | (0, 1, 1, 3) : { 1 : 0.333333333333 , 0 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00347222222222 } , 226 | (2, 3, 2, 1) : { 1 : 0.431111111111 , 0 : 0.284444444444 , 2 : 0.284444444444 , None : 0.00444444444444 } , 227 | (1, 4, 1, 3) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00347222222222 } , 228 | (2, 2, 0, 0) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } , 229 | (0, 3, 1, 1) : { 1 : 
0.333333333333 , 2 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00980392156863 } , 230 | (1, 4, 2, 1) : { 2 : 0.485714285714 , 1 : 0.0285714285714 , 0 : 0.485714285714 , None : 0.0142857142857 } , 231 | (0, 3, 2, 3) : { 1 : 0.969696969697 , None : 0.0151515151515 } , 232 | (1, 1, 0, 0) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } , 233 | (1, 2, 1, 1) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 234 | (0, 1, 2, 1) : { 1 : 0.0285714285714 , 0 : 0.485714285714 , 2 : 0.485714285714 , None : 0.0142857142857 } , 235 | (2, 1, 0, 0) : { 2 : 0.333333333333 , 0 : 0.333333333333 , 1 : 0.333333333333 , None : 0.00343642611684 } , 236 | (1, 0, 1, 3) : { 0 : 0.333333333333 , 1 : 0.333333333333 , 2 : 0.333333333333 , None : 0.00347222222222 } , 237 | (0, 0, 0, 0) : { 2 : 0.333333333333 , 1 : 0.333333333333 , 0 : 0.333333333333 , None : 0.00343642611684 } , 238 | (2, 0, 2, 1) : { 2 : 0.425438596491 , 0 : 0.425438596491 , 1 : 0.149122807018 , None : 0.00438596491228 } , 239 | (1, 0, 2, 1) : { 2 : 0.485714285714 , 1 : 0.0285714285714 , 0 : 0.485714285714 , None : 0.0142857142857 } , 240 | (1, 1, 2, 3) : { 1 : 0.969696969697 , None : 0.0151515151515 } , 241 | (2, 2, 2, 4) : { 2 : 0.492753623188 , 0 : 0.492753623188 , None : 0.0144927536232 } , 242 | (0, 4, 2, 3) : { 2 : 0.285714285714 , 0 : 0.285714285714 , 1 : 0.428571428571 , None : 0.00446428571429 } , 243 | None : 0.333333333333 } } , 244 | 'health' : { 245 | 'vals' : ['not_recom', 'priority', 'recommended'] , 246 | 'pars' : ['has_nurs', 'class'] , 247 | 'cpds' : { 248 | (0, 1) : { 1 : 0.226980728051 , 2 : 0.770877944325 , None : 0.00214132762313 } , 249 | (3, 2) : { 2 : 0.6 , None : 0.2 } , 250 | (0, 0) : { 0 : 0.997693194925 , None : 0.00115340253749 } , 251 | (0, 3) : { 2 : 0.399368587214 , 1 : 0.599842146803 , None : 0.000789265982636 } , 252 | (3, 3) : { 1 : 0.745098039216 , 2 : 0.250980392157 , None : 0.00392156862745 } , 
253 | (3, 0) : { 0 : 0.997693194925 , None : 0.00115340253749 } , 254 | (4, 1) : { 1 : 0.0469483568075 , 2 : 0.948356807512 , None : 0.00469483568075 } , 255 | (3, 1) : { 2 : 0.497401633259 , 1 : 0.501855976244 , None : 0.000742390497402 } , 256 | (2, 1) : { 2 : 0.497401633259 , 1 : 0.501855976244 , None : 0.000742390497402 } , 257 | (1, 1) : { 1 : 0.431091510474 , 2 : 0.567805953693 , None : 0.00110253583241 } , 258 | (2, 0) : { 0 : 0.997693194925 , None : 0.00115340253749 } , 259 | (1, 3) : { 2 : 0.374507227332 , 1 : 0.624178712221 , None : 0.00131406044678 } , 260 | (2, 3) : { 1 : 0.745098039216 , 2 : 0.250980392157 , None : 0.00392156862745 } , 261 | (1, 4) : { 2 : 0.971014492754 , None : 0.0144927536232 } , 262 | (4, 3) : { 2 : 0.436554898093 , 1 : 0.562787639711 , None : 0.000657462195924 } , 263 | (1, 0) : { 0 : 0.997693194925 , None : 0.00115340253749 } , 264 | (3, 4) : { 2 : 0.984962406015 , None : 0.00751879699248 } , 265 | (2, 4) : { 2 : 0.985185185185 , None : 0.00740740740741 } , 266 | (4, 0) : { 0 : 0.997693194925 , None : 0.00115340253749 } , 267 | None : 0.333333333333 } } , 268 | 'class' : { 269 | 'vals' : ['not_recom', 'priority', 'recommend', 'spec_prior', 'very_recom'] , 270 | 'pars' : ['parents', 'has_nurs', 'health'] , 271 | 'cpds' : { 272 | (0, 1, 1) : { 1 : 0.0136518771331 , 3 : 0.976109215017 , None : 0.00341296928328 } , 273 | (0, 2, 2) : { 1 : 0.77133105802 , 3 : 0.21843003413 , None : 0.00341296928328 } , 274 | (1, 3, 2) : { 1 : 0.761092150171 , 4 : 0.22866894198 , None : 0.00341296928328 } , 275 | (1, 0, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 276 | (0, 1, 2) : { 3 : 0.757679180887 , 1 : 0.232081911263 , None : 0.00341296928328 } , 277 | (1, 3, 1) : { 1 : 0.986348122867 , None : 0.00341296928328 } , 278 | (1, 0, 1) : { 1 : 0.0136518771331 , 3 : 0.976109215017 , None : 0.00341296928328 } , 279 | (0, 2, 1) : { 3 : 0.648464163823 , 1 : 0.341296928328 , None : 0.00341296928328 } , 280 | (1, 3, 0) : { 0 : 0.986348122867 , 
None : 0.00341296928328 } , 281 | (2, 3, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 282 | (2, 4, 1) : { 3 : 0.976109215017 , 1 : 0.0136518771331 , None : 0.00341296928328 } , 283 | (2, 2, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 284 | (2, 3, 1) : { 1 : 0.986348122867 , None : 0.00341296928328 } , 285 | (2, 4, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 286 | (2, 3, 2) : { 2 : 0.0102389078498 , 4 : 0.221843003413 , 1 : 0.761092150171 , None : 0.00341296928328 } , 287 | (0, 4, 2) : { 1 : 0.232081911263 , 3 : 0.757679180887 , None : 0.00341296928328 } , 288 | (1, 4, 1) : { 1 : 0.0136518771331 , 3 : 0.976109215017 , None : 0.00341296928328 } , 289 | (0, 2, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 290 | (2, 4, 2) : { 1 : 0.232081911263 , 3 : 0.757679180887 , None : 0.00341296928328 } , 291 | (0, 3, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 292 | (2, 0, 1) : { 3 : 0.648464163823 , 1 : 0.341296928328 , None : 0.00341296928328 } , 293 | (1, 2, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 294 | (2, 0, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 295 | (1, 2, 1) : { 1 : 0.986348122867 , None : 0.00341296928328 } , 296 | (0, 0, 2) : { 3 : 0.757679180887 , 1 : 0.232081911263 , None : 0.00341296928328 } , 297 | (1, 1, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 298 | (1, 2, 2) : { 4 : 0.22866894198 , 1 : 0.761092150171 , None : 0.00341296928328 } , 299 | (2, 0, 2) : { 3 : 0.21843003413 , 1 : 0.77133105802 , None : 0.00341296928328 } , 300 | (0, 0, 1) : { 1 : 0.0136518771331 , 3 : 0.976109215017 , None : 0.00341296928328 } , 301 | (0, 3, 1) : { 3 : 0.648464163823 , 1 : 0.341296928328 , None : 0.00341296928328 } , 302 | (0, 0, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 303 | (0, 1, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 304 | (2, 1, 2) : { 1 : 0.761092150171 , 4 : 0.22866894198 , None : 0.00341296928328 } , 305 | (1, 4, 0) : { 0 : 0.986348122867 , 
None : 0.00341296928328 } , 306 | (1, 1, 1) : { 1 : 0.341296928328 , 3 : 0.648464163823 , None : 0.00341296928328 } , 307 | (0, 4, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 308 | (0, 3, 2) : { 1 : 0.77133105802 , 3 : 0.21843003413 , None : 0.00341296928328 } , 309 | (2, 2, 2) : { 1 : 0.761092150171 , 4 : 0.22866894198 , None : 0.00341296928328 } , 310 | (2, 1, 0) : { 0 : 0.986348122867 , None : 0.00341296928328 } , 311 | (2, 2, 1) : { 1 : 0.986348122867 , None : 0.00341296928328 } , 312 | (1, 4, 2) : { 3 : 0.757679180887 , 1 : 0.232081911263 , None : 0.00341296928328 } , 313 | (2, 1, 1) : { 1 : 0.986348122867 , None : 0.00341296928328 } , 314 | (1, 1, 2) : { 1 : 0.77133105802 , 3 : 0.21843003413 , None : 0.00341296928328 } , 315 | (0, 4, 1) : { 1 : 0.0136518771331 , 3 : 0.976109215017 , None : 0.00341296928328 } , 316 | (1, 0, 2) : { 3 : 0.757679180887 , 1 : 0.232081911263 , None : 0.00341296928328 } , 317 | None : 0.2 } } , 318 | } 319 | -------------------------------------------------------------------------------- /chap04/parent-child.txt: -------------------------------------------------------------------------------- 1 | HISTORY LVFAILURE 2 | CVP LVEDVOLUME 3 | PCWP LVEDVOLUME 4 | HYPOVOLEMIA 5 | LVEDVOLUME HYPOVOLEMIA LVFAILURE 6 | LVFAILURE 7 | STROKEVOLUME HYPOVOLEMIA LVFAILURE 8 | ERRLOWOUTPUT 9 | HRBP ERRLOWOUTPUT HR 10 | HREKG ERRCAUTER HR 11 | ERRCAUTER 12 | HRSAT ERRCAUTER HR 13 | INSUFFANESTH 14 | ANAPHYLAXIS 15 | TPR ANAPHYLAXIS 16 | EXPCOB ARTCOB VENTLUNG 17 | KINKEDTUBE 18 | MINVOL INTUBATION VENTLUNG 19 | FIOB 20 | PVSAT FIOB VENTALV 21 | SAOB PVSAT SHUNT 22 | PAP PULMEMBOLUS 23 | PULMEMBOLUS 24 | SHUNT INTUBATION PULMEMBOLUS 25 | INTUBATION 26 | PRESS INTUBATION KINKEDTUBE VENTTUBE 27 | DISCONNECT 28 | MINVOLSET 29 | VENTMACH MINVOLSET 30 | VENTTUBE DISCONNECT VENTMACH 31 | VENTLUNG INTUBATION KINKEDTUBE VENTTUBE 32 | VENTALV INTUBATION VENTLUNG 33 | ARTCOB VENTALV 34 | CATECHOL ARTCOB INSUFFANESTH SAOB TPR 35 | HR CATECHOL 36 | CO 
HR STROKEVOLUME 37 | BP CO TPR 38 | -------------------------------------------------------------------------------- /chap04/sachs.bif: -------------------------------------------------------------------------------- 1 | \\ File generated by BNfinder 2 | \\ 05/05/14 11:09:09 3 | \\ Conditional probability distributions generated with total pseudocounts number 1.000000 4 | 5 | network "sachs" {} 6 | 7 | variable "praf" { 8 | type discrete[2] { "0" "1" } 9 | } 10 | probability ( "praf" | "PIP2" "PKC" ) { 11 | default 0.5 0.5 ; 12 | ( "0" "1" ) 0.883411529532 0.116588470468 ; 13 | ( "1" "0" ) 0.843907252714 0.156092747286 ; 14 | ( "0" "0" ) 0.97375463383 0.0262453661701 ; 15 | ( "1" "1" ) 0.749388041392 0.250611958608 ; 16 | } 17 | 18 | variable "pmek" { 19 | type discrete[2] { "0" "1" } 20 | } 21 | probability ( "pmek" | "praf" "PIP2" ) { 22 | default 0.5 0.5 ; 23 | ( "0" "1" ) 0.938564960427 0.0614350395727 ; 24 | ( "1" "0" ) 0.00733886077814 0.992661139222 ; 25 | ( "0" "0" ) 0.992748183808 0.00725181619217 ; 26 | ( "1" "1" ) 0.0412287644613 0.958771235539 ; 27 | } 28 | 29 | variable "plcg" { 30 | type discrete[2] { "0" "1" } 31 | } 32 | probability ( "plcg" ) { 33 | table 0.982465879906 0.0175341200942 ; 34 | } 35 | 36 | variable "PIP2" { 37 | type discrete[2] { "0" "1" } 38 | } 39 | probability ( "PIP2" | "plcg" ) { 40 | default 0.5 0.5 ; 41 | ( "0" ) 0.996171593893 0.00382840610689 ; 42 | ( "1" ) 0.0131055663148 0.986894433685 ; 43 | } 44 | 45 | variable "PIP3" { 46 | type discrete[2] { "0" "1" } 47 | } 48 | probability ( "PIP3" ) { 49 | table 0.998259239423 0.00174076057728 ; 50 | } 51 | 52 | variable "p44/42" { 53 | type discrete[2] { "0" "1" } 54 | } 55 | probability ( "p44/42" | "plcg" "PKA" "pjnk" ) { 56 | default 0.5 0.5 ; 57 | ( "0" "1" "1" ) 0.500164252165 0.499835747835 ; 58 | ( "1" "1" "0" ) 0.500054997485 0.499945002515 ; 59 | ( "1" "0" "0" ) 0.784397684566 0.215602315434 ; 60 | ( "0" "0" "1" ) 0.820508289157 0.179491710843 ; 61 | ( "1" "0" "1" ) 
0.670034697352 0.329965302648 ; 62 | ( "0" "0" "0" ) 0.956853227591 0.0431467724095 ; 63 | ( "0" "1" "0" ) 0.822414604184 0.177585395816 ; 64 | ( "1" "1" "1" ) 0.500008857135 0.499991142865 ; 65 | } 66 | 67 | variable "pakts473" { 68 | type discrete[2] { "0" "1" } 69 | } 70 | probability ( "pakts473" | "pmek" "PIP2" "p44/42" "PKA" "P38" "pjnk" ) { 71 | default 0.5 0.5 ; 72 | ( "1" "0" "0" "1" "1" "1" ) 0.50000294785 0.49999705215 ; 73 | ( "0" "1" "0" "1" "1" "0" ) 0.500000718319 0.499999281681 ; 74 | ( "0" "1" "1" "1" "0" "1" ) 0.499995996598 0.500004003402 ; 75 | ( "1" "1" "0" "0" "1" "0" ) 0.445876790005 0.554123209995 ; 76 | ( "0" "0" "1" "0" "1" "0" ) 0.321271597571 0.678728402429 ; 77 | ( "0" "1" "1" "0" "0" "0" ) 0.151910630054 0.848089369946 ; 78 | ( "1" "0" "0" "0" "0" "0" ) 0.846952928465 0.153047071535 ; 79 | ( "0" "1" "1" "1" "1" "1" ) 0.499993116757 0.500006883243 ; 80 | ( "1" "0" "0" "1" "0" "0" ) 0.951591761451 0.0484082385495 ; 81 | ( "1" "1" "1" "1" "1" "0" ) 0.499999924982 0.500000075018 ; 82 | ( "0" "0" "1" "0" "0" "1" ) 0.178010398042 0.821989601958 ; 83 | ( "0" "1" "1" "0" "1" "1" ) 0.138381238659 0.861618761341 ; 84 | ( "0" "0" "1" "1" "0" "0" ) 0.864564524136 0.135435475864 ; 85 | ( "0" "1" "1" "1" "0" "0" ) 0.501428970877 0.498571029123 ; 86 | ( "1" "0" "1" "1" "0" "1" ) 0.499998341558 0.500001658442 ; 87 | ( "0" "0" "1" "0" "1" "1" ) 0.231703682119 0.768296317881 ; 88 | ( "1" "1" "1" "0" "1" "0" ) 0.479974609613 0.520025390387 ; 89 | ( "0" "1" "1" "0" "0" "1" ) 0.258399170701 0.741600829299 ; 90 | ( "1" "0" "1" "0" "0" "0" ) 0.187926631648 0.812073368352 ; 91 | ( "0" "0" "1" "1" "1" "1" ) 0.49999399157 0.50000600843 ; 92 | ( "1" "1" "1" "1" "0" "0" ) 0.499996515396 0.500003484604 ; 93 | ( "1" "0" "1" "1" "1" "1" ) 0.499998960859 0.500001039141 ; 94 | ( "1" "0" "0" "1" "1" "0" ) 0.49999928455 0.50000071545 ; 95 | ( "0" "0" "1" "0" "0" "0" ) 0.719423229837 0.280576770163 ; 96 | ( "1" "1" "1" "0" "0" "1" ) 0.534350331916 0.465649668084 ; 97 | ( 
"0" "1" "1" "0" "1" "0" ) 0.506722754091 0.493277245909 ; 98 | ( "1" "0" "1" "0" "1" "1" ) 0.368098424988 0.631901575012 ; 99 | ( "0" "0" "1" "1" "0" "1" ) 0.500026599472 0.499973400528 ; 100 | ( "0" "0" "0" "1" "1" "0" ) 0.523780410964 0.476219589036 ; 101 | ( "1" "0" "1" "1" "0" "0" ) 0.553548778823 0.446451221177 ; 102 | ( "1" "1" "1" "0" "1" "1" ) 0.296074781294 0.703925218706 ; 103 | ( "1" "0" "1" "0" "0" "1" ) 0.288424613884 0.711575386116 ; 104 | ( "1" "1" "1" "1" "0" "1" ) 0.500000137762 0.499999862238 ; 105 | ( "0" "0" "1" "1" "1" "0" ) 0.501855233959 0.498144766041 ; 106 | ( "0" "0" "0" "1" "0" "1" ) 0.500108908151 0.499891091849 ; 107 | ( "0" "0" "0" "0" "0" "0" ) 0.981275524832 0.0187244751677 ; 108 | ( "1" "1" "1" "0" "0" "0" ) 0.252396700971 0.747603299029 ; 109 | ( "1" "0" "1" "0" "1" "0" ) 0.394468299105 0.605531700895 ; 110 | ( "0" "1" "0" "0" "1" "0" ) 0.627398392229 0.372601607771 ; 111 | ( "0" "0" "0" "1" "1" "1" ) 0.500023429945 0.499976570055 ; 112 | ( "1" "1" "0" "1" "1" "0" ) 0.49999978173 0.50000021827 ; 113 | ( "0" "0" "0" "0" "1" "1" ) 0.703227060067 0.296772939933 ; 114 | ( "1" "0" "0" "1" "0" "1" ) 0.499998702026 0.500001297974 ; 115 | ( "0" "1" "0" "0" "0" "1" ) 0.795903252496 0.204096747504 ; 116 | ( "0" "0" "0" "1" "0" "0" ) 0.992499174555 0.00750082544459 ; 117 | ( "1" "1" "0" "1" "0" "1" ) 0.500001707129 0.499998292871 ; 118 | ( "0" "1" "0" "1" "0" "0" ) 0.517904160209 0.482095839791 ; 119 | ( "0" "0" "0" "0" "0" "1" ) 0.471454070008 0.528545929992 ; 120 | ( "1" "1" "0" "0" "0" "0" ) 0.581330507845 0.418669492155 ; 121 | ( "0" "1" "0" "0" "1" "1" ) 0.43624023887 0.56375976113 ; 122 | ( "1" "0" "0" "0" "1" "0" ) 0.36224267246 0.63775732754 ; 123 | ( "1" "1" "0" "1" "1" "1" ) 0.499999814268 0.500000185732 ; 124 | ( "0" "1" "0" "1" "1" "1" ) 0.499998258751 0.500001741249 ; 125 | ( "0" "0" "0" "0" "1" "0" ) 0.773846245443 0.226153754557 ; 126 | ( "1" "1" "0" "0" "1" "1" ) 0.492584892042 0.507415107958 ; 127 | ( "0" "1" "0" "0" "0" "0" 
) 0.643582342433 0.356417657567 ; 128 | ( "1" "0" "0" "0" "0" "1" ) 0.444307312729 0.555692687271 ; 129 | ( "1" "1" "0" "1" "0" "0" ) 0.500002800988 0.499997199012 ; 130 | ( "0" "1" "0" "1" "0" "1" ) 0.500008893694 0.499991106306 ; 131 | ( "0" "1" "1" "1" "1" "0" ) 0.500000027372 0.499999972628 ; 132 | ( "1" "1" "0" "0" "0" "1" ) 0.736798151454 0.263201848546 ; 133 | ( "1" "0" "1" "1" "1" "0" ) 0.499999487142 0.500000512858 ; 134 | ( "1" "1" "1" "1" "1" "1" ) 0.499997912627 0.500002087373 ; 135 | ( "1" "0" "0" "0" "1" "1" ) 0.618960188489 0.381039811511 ; 136 | } 137 | 138 | variable "PKA" { 139 | type discrete[2] { "0" "1" } 140 | } 141 | probability ( "PKA" | "PIP2" ) { 142 | default 0.5 0.5 ; 143 | ( "0" ) 0.893199615174 0.106800384826 ; 144 | ( "1" ) 0.993173981621 0.00682601837946 ; 145 | } 146 | 147 | variable "PKC" { 148 | type discrete[2] { "0" "1" } 149 | } 150 | probability ( "PKC" | "PIP2" ) { 151 | default 0.5 0.5 ; 152 | ( "0" ) 0.990273952072 0.00972604792819 ; 153 | ( "1" ) 0.854999323966 0.145000676034 ; 154 | } 155 | 156 | variable "P38" { 157 | type discrete[2] { "0" "1" } 158 | } 159 | probability ( "P38" | "PKC" ) { 160 | default 0.5 0.5 ; 161 | ( "0" ) 0.997336886711 0.00266311328887 ; 162 | ( "1" ) 0.012452177968 0.987547822032 ; 163 | } 164 | 165 | variable "pjnk" { 166 | type discrete[2] { "0" "1" } 167 | } 168 | probability ( "pjnk" | "PIP2" "PKC" ) { 169 | default 0.5 0.5 ; 170 | ( "0" "1" ) 0.121376850167 0.878623149833 ; 171 | ( "1" "0" ) 0.870625026606 0.129374973394 ; 172 | ( "0" "0" ) 0.991497808048 0.00850219195186 ; 173 | ( "1" "1" ) 0.0792796299772 0.920720370023 ; 174 | } 175 | 176 | -------------------------------------------------------------------------------- /chap04/sachs_cpd.sif: -------------------------------------------------------------------------------- 1 | praf + pmek 2 | pmek + pakts473 3 | plcg + PIP2 4 | plcg + p44/42 5 | PIP2 + praf 6 | PIP2 + pmek 7 | PIP2 + pakts473 8 | PIP2 - PKA 9 | PIP2 + PKC 10 | PIP2 + pjnk 
11 | p44/42 + pakts473 12 | PKA + p44/42 13 | PKA - pakts473 14 | PKC + praf 15 | PKC + P38 16 | PKC + pjnk 17 | P38 + pakts473 18 | pjnk + p44/42 19 | pjnk + pakts473 20 | -------------------------------------------------------------------------------- /chap04/sachs_cpd.txt: -------------------------------------------------------------------------------- 1 | { 2 | 'praf' : { 3 | 'vals' : [] , 4 | 'pars' : ['PIP2', 'PKC'] , 5 | 'cpds' : { 6 | (0, 1) : { 1 : 0.116588470468 , 0 : 0.883411529532 , None : 0.0138669597404 } , 7 | (1, 0) : { 0 : 0.843907252714 , 1 : 0.156092747286 , None : 0.00733386283011 } , 8 | (0, 0) : { 1 : 0.0262453661701 , 0 : 0.97375463383 , None : 0.000138091458915 } , 9 | (1, 1) : { 0 : 0.749388041392 , 1 : 0.250611958608 , None : 0.0417451794914 } , 10 | None : 0.5 } } , 11 | 'pmek' : { 12 | 'vals' : [] , 13 | 'pars' : ['praf', 'PIP2'] , 14 | 'cpds' : { 15 | (0, 1) : { 1 : 0.0614350395727 , 0 : 0.938564960427 , None : 0.00751758513077 } , 16 | (1, 0) : { 0 : 0.00733886077814 , 1 : 0.992661139222 , None : 0.00503865917595 } , 17 | (0, 0) : { 1 : 0.00725181619217 , 0 : 0.992748183808 , None : 0.000140543676492 } , 18 | (1, 1) : { 0 : 0.0412287644613 , 1 : 0.958771235539 , None : 0.0366472022197 } , 19 | None : 0.5 } } , 20 | 'plcg' : { 21 | 'vals' : [] , 22 | 'pars' : [] , 23 | 'cpds' : { 24 | () : { 0 : 0.982465879906 , 1 : 0.0175341200942 , None : 0.000133904659882 } , 25 | None : 0.5 } } , 26 | 'PIP2' : { 27 | 'vals' : [] , 28 | 'pars' : ['plcg'] , 29 | 'cpds' : { 30 | (0,) : { 1 : 0.00382840610689 , 0 : 0.996171593893 , None : 0.000136275889722 } , 31 | (1,) : { 0 : 0.0131055663148 , 1 : 0.986894433685 , None : 0.00757892643608 } , 32 | None : 0.5 } } , 33 | 'PIP3' : { 34 | 'vals' : [] , 35 | 'pars' : [] , 36 | 'cpds' : { 37 | () : { 0 : 0.998259239423 , 1 : 0.00174076057728 , None : 0.000133904659882 } , 38 | None : 0.5 } } , 39 | 'p44/42' : { 40 | 'vals' : [] , 41 | 'pars' : ['plcg', 'PKA', 'pjnk'] , 42 | 'cpds' : { 43 | (0, 1, 1) : { 0 : 
0.500164252165 , 1 : 0.499835747835 , None : 0.499723414153 } , 44 | (1, 1, 0) : { 1 : 0.499945002515 , 0 : 0.500054997485 , None : 0.499902775817 } , 45 | (1, 0, 0) : { 0 : 0.784397684566 , 1 : 0.215602315434 , None : 0.0100250752583 } , 46 | (0, 0, 1) : { 1 : 0.179491710843 , 0 : 0.820508289157 , None : 0.00766688700656 } , 47 | (1, 0, 1) : { 1 : 0.329965302648 , 0 : 0.670034697352 , None : 0.0292445507574 } , 48 | (0, 0, 0) : { 1 : 0.0431467724095 , 0 : 0.956853227591 , None : 0.000155529397806 } , 49 | (0, 1, 0) : { 0 : 0.822414604184 , 1 : 0.177585395816 , None : 0.00127882073372 } , 50 | (1, 1, 1) : { 1 : 0.499991142865 , 0 : 0.500008857135 , None : 0.499964824705 } , 51 | None : 0.5 } } , 52 | 'pakts473' : { 53 | 'vals' : [] , 54 | 'pars' : ['pmek', 'PIP2', 'p44/42', 'PKA', 'P38', 'pjnk'] , 55 | 'cpds' : { 56 | (1, 0, 0, 1, 1, 1) : { 0 : 0.50000294785 , 1 : 0.49999705215 , None : 0.499990359737 } , 57 | (0, 1, 0, 1, 1, 0) : { 1 : 0.499999281681 , 0 : 0.500000718319 , None : 0.499999103998 } , 58 | (0, 1, 1, 1, 0, 1) : { 0 : 0.499995996598 , 1 : 0.500004003402 , None : 0.499994928602 } , 59 | (1, 1, 0, 0, 1, 0) : { 0 : 0.445876790005 , 1 : 0.554123209995 , None : 0.445853725021 } , 60 | (0, 0, 1, 0, 1, 0) : { 0 : 0.321271597571 , 1 : 0.678728402429 , None : 0.222948466533 } , 61 | (0, 1, 1, 0, 0, 0) : { 0 : 0.151910630054 , 1 : 0.848089369946 , None : 0.0507905833895 } , 62 | (1, 0, 0, 0, 0, 0) : { 1 : 0.153047071535 , 0 : 0.846952928465 , None : 0.00516571718808 } , 63 | (0, 1, 1, 1, 1, 1) : { 1 : 0.500006883243 , 0 : 0.499993116757 , None : 0.499992873421 } , 64 | (1, 0, 0, 1, 0, 0) : { 1 : 0.0484082385495 , 0 : 0.951591761451 , None : 0.0483987222828 } , 65 | (1, 1, 1, 1, 1, 0) : { 1 : 0.500000075018 , 0 : 0.499999924982 , None : 0.49999992498 } , 66 | (0, 0, 1, 0, 0, 1) : { 1 : 0.821989601958 , 0 : 0.178010398042 , None : 0.0966689282817 } , 67 | (0, 1, 1, 0, 1, 1) : { 0 : 0.138381238659 , 1 : 0.861618761341 , None : 0.122161835562 } , 68 | (0, 0, 1, 1, 
0, 0) : { 0 : 0.864564524136 , 1 : 0.135435475864 , None : 0.00716264530707 } , 69 | (0, 1, 1, 1, 0, 0) : { 1 : 0.498571029123 , 0 : 0.501428970877 , None : 0.498538695914 } , 70 | (1, 0, 1, 1, 0, 1) : { 0 : 0.499998341558 , 1 : 0.500001658442 , None : 0.499998157657 } , 71 | (0, 0, 1, 0, 1, 1) : { 0 : 0.231703682119 , 1 : 0.768296317881 , None : 0.0854641188459 } , 72 | (1, 1, 1, 0, 1, 0) : { 1 : 0.520025390387 , 0 : 0.479974609613 , None : 0.479974117797 } , 73 | (0, 1, 1, 0, 0, 1) : { 0 : 0.258399170701 , 1 : 0.741600829299 , None : 0.172240973218 } , 74 | (1, 0, 1, 0, 0, 0) : { 0 : 0.187926631648 , 1 : 0.812073368352 , None : 0.049171755917 } , 75 | (0, 0, 1, 1, 1, 1) : { 0 : 0.49999399157 , 1 : 0.50000600843 , None : 0.49999033941 } , 76 | (1, 1, 1, 1, 0, 0) : { 1 : 0.500003484604 , 0 : 0.499996515396 , None : 0.499995118111 } , 77 | (1, 0, 1, 1, 1, 1) : { 1 : 0.500001039141 , 0 : 0.499998960859 , None : 0.499998614732 } , 78 | (1, 0, 0, 1, 1, 0) : { 1 : 0.50000071545 , 0 : 0.49999928455 , None : 0.499999254783 } , 79 | (0, 0, 1, 0, 0, 0) : { 1 : 0.280576770163 , 0 : 0.719423229837 , None : 0.00392378735231 } , 80 | (1, 1, 1, 0, 0, 1) : { 0 : 0.534350331916 , 1 : 0.465649668084 , None : 0.46487955384 } , 81 | (0, 1, 1, 0, 1, 0) : { 1 : 0.493277245909 , 0 : 0.506722754091 , None : 0.482887988535 } , 82 | (1, 0, 1, 0, 1, 1) : { 0 : 0.368098424988 , 1 : 0.631901575012 , None : 0.312568832712 } , 83 | (0, 0, 1, 1, 0, 1) : { 1 : 0.499973400528 , 0 : 0.500026599472 , None : 0.49995806306 } , 84 | (0, 0, 0, 1, 1, 0) : { 1 : 0.476219589036 , 0 : 0.523780410964 , None : 0.476214440953 } , 85 | (1, 0, 1, 1, 0, 0) : { 1 : 0.446451221177 , 0 : 0.553548778823 , None : 0.446420488785 } , 86 | (1, 1, 1, 0, 1, 1) : { 0 : 0.296074781294 , 1 : 0.703925218706 , None : 0.280654298558 } , 87 | (1, 0, 1, 0, 0, 1) : { 1 : 0.711575386116 , 0 : 0.288424613884 , None : 0.267116882868 } , 88 | (1, 1, 1, 1, 0, 1) : { 1 : 0.499999862238 , 0 : 0.500000137762 , None : 0.499999858607 } , 89 
| (0, 0, 1, 1, 1, 0) : { 0 : 0.501855233959 , 1 : 0.498144766041 , None : 0.498140840926 } , 90 | (0, 0, 0, 1, 0, 1) : { 0 : 0.500108908151 , 1 : 0.499891091849 , None : 0.499850256783 } , 91 | (0, 0, 0, 0, 0, 0) : { 0 : 0.981275524832 , 1 : 0.0187244751677 , None : 0.00016872167145 } , 92 | (1, 1, 1, 0, 0, 0) : { 1 : 0.747603299029 , 0 : 0.252396700971 , None : 0.14561277852 } , 93 | (1, 0, 1, 0, 1, 0) : { 0 : 0.394468299105 , 1 : 0.605531700895 , None : 0.394244611087 } , 94 | (0, 1, 0, 0, 1, 0) : { 0 : 0.627398392229 , 1 : 0.372601607771 , None : 0.339301493354 } , 95 | (0, 0, 0, 1, 1, 1) : { 1 : 0.499976570055 , 0 : 0.500023429945 , None : 0.499948776305 } , 96 | (1, 1, 0, 1, 1, 0) : { 0 : 0.49999978173 , 1 : 0.50000021827 , None : 0.499999781636 } , 97 | (0, 0, 0, 0, 1, 1) : { 0 : 0.703227060067 , 1 : 0.296772939933 , None : 0.0194634350747 } , 98 | (1, 0, 0, 1, 0, 1) : { 1 : 0.500001297974 , 0 : 0.499998702026 , None : 0.499994606058 } , 99 | (0, 1, 0, 0, 0, 1) : { 1 : 0.204096747504 , 0 : 0.795903252496 , None : 0.0784893268074 } , 100 | (0, 0, 0, 1, 0, 0) : { 1 : 0.00750082544459 , 0 : 0.992499174555 , None : 0.00159931412242 } , 101 | (1, 1, 0, 1, 0, 1) : { 0 : 0.500001707129 , 1 : 0.499998292871 , None : 0.499998262227 } , 102 | (0, 1, 0, 1, 0, 0) : { 0 : 0.517904160209 , 1 : 0.482095839791 , None : 0.48204604687 } , 103 | (0, 0, 0, 0, 0, 1) : { 0 : 0.471454070008 , 1 : 0.528545929992 , None : 0.0243524936626 } , 104 | (1, 1, 0, 0, 0, 0) : { 0 : 0.581330507845 , 1 : 0.418669492155 , None : 0.0468626686177 } , 105 | (0, 1, 0, 0, 1, 1) : { 0 : 0.43624023887 , 1 : 0.56375976113 , None : 0.0895846064421 } , 106 | (1, 0, 0, 0, 1, 0) : { 1 : 0.63775732754 , 0 : 0.36224267246 , None : 0.356330303012 } , 107 | (1, 1, 0, 1, 1, 1) : { 1 : 0.500000185732 , 0 : 0.499999814268 , None : 0.499996010217 } , 108 | (0, 1, 0, 1, 1, 1) : { 0 : 0.499998258751 , 1 : 0.500001741249 , None : 0.499991100432 } , 109 | (0, 0, 0, 0, 1, 0) : { 1 : 0.226153754557 , 0 : 0.773846245443 
, None : 0.0576534188595 } , 110 | (1, 1, 0, 0, 1, 1) : { 0 : 0.492584892042 , 1 : 0.507415107958 , None : 0.160629335316 } , 111 | (0, 1, 0, 0, 0, 0) : { 1 : 0.356417657567 , 0 : 0.643582342433 , None : 0.0129379331738 } , 112 | (1, 0, 0, 0, 0, 1) : { 0 : 0.444307312729 , 1 : 0.555692687271 , None : 0.142866868837 } , 113 | (1, 1, 0, 1, 0, 0) : { 1 : 0.499997199012 , 0 : 0.500002800988 , None : 0.49998097251 } , 114 | (0, 1, 0, 1, 0, 1) : { 1 : 0.499991106306 , 0 : 0.500008893694 , None : 0.499988120305 } , 115 | (0, 1, 1, 1, 1, 0) : { 1 : 0.499999972628 , 0 : 0.500000027372 , None : 0.499999933682 } , 116 | (1, 1, 0, 0, 0, 1) : { 1 : 0.263201848546 , 0 : 0.736798151454 , None : 0.258951767039 } , 117 | (1, 0, 1, 1, 1, 0) : { 0 : 0.499999487142 , 1 : 0.500000512858 , None : 0.499999486124 } , 118 | (1, 1, 1, 1, 1, 1) : { 0 : 0.499997912627 , 1 : 0.500002087373 , None : 0.499997810456 } , 119 | (1, 0, 0, 0, 1, 1) : { 1 : 0.381039811511 , 0 : 0.618960188489 , None : 0.0952448421986 } , 120 | None : 0.5 } } , 121 | 'PKA' : { 122 | 'vals' : [] , 123 | 'pars' : ['PIP2'] , 124 | 'cpds' : { 125 | (0,) : { 1 : 0.106800384826 , 0 : 0.893199615174 , None : 0.000136767261158 } , 126 | (1,) : { 0 : 0.993173981621 , 1 : 0.00682601837946 , None : 0.0063167741167 } , 127 | None : 0.5 } } , 128 | 'PKC' : { 129 | 'vals' : [] , 130 | 'pars' : ['PIP2'] , 131 | 'cpds' : { 132 | (0,) : { 1 : 0.00972604792819 , 0 : 0.990273952072 , None : 0.000136767261158 } , 133 | (1,) : { 0 : 0.854999323966 , 1 : 0.145000676034 , None : 0.0063167741167 } , 134 | None : 0.5 } } , 135 | 'P38' : { 136 | 'vals' : [] , 137 | 'pars' : ['PKC'] , 138 | 'cpds' : { 139 | (0,) : { 1 : 0.00266311328887 , 0 : 0.997336886711 , None : 0.000135576100472 } , 140 | (1,) : { 0 : 0.012452177968 , 1 : 0.987547822032 , None : 0.0106305260364 } , 141 | None : 0.5 } } , 142 | 'pjnk' : { 143 | 'vals' : [] , 144 | 'pars' : ['PIP2', 'PKC'] , 145 | 'cpds' : { 146 | (0, 1) : { 1 : 0.878623149833 , 0 : 0.121376850167 , None : 
0.0138669597404 } , 147 | (1, 0) : { 0 : 0.870625026606 , 1 : 0.129374973394 , None : 0.00733386283011 } , 148 | (0, 0) : { 1 : 0.00850219195186 , 0 : 0.991497808048 , None : 0.000138091458915 } , 149 | (1, 1) : { 0 : 0.0792796299772 , 1 : 0.920720370023 , None : 0.0417451794914 } , 150 | None : 0.5 } } , 151 | } 152 | -------------------------------------------------------------------------------- /chap04/sachs_network.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shark8me/Building_Probabilistic_Graphical_Models_in_Python/c1f7ad013e1d20759eb396c866fa95ac4a9e8885/chap04/sachs_network.png -------------------------------------------------------------------------------- /chap05/data_segmentation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this notebook, we shall see what effect data segmentation has on parameter estimation using maximum likelihood. We have a small network defined in \"small_network.txt\", which has 2 random variables, X and Y, connected by an arc X->Y. The parent X takes 5 values and the child Y takes 2 values.\n", 15 | "\n", 16 | "We first load the network from the file and create a DiscreteBayesianNetwork." 
17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "collapsed": false, 22 | "input": [ 23 | "from libpgm.graphskeleton import GraphSkeleton\n", 24 | "from libpgm.nodedata import NodeData\n", 25 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 26 | "from libpgm.tablecpdfactor import TableCPDFactor\n", 27 | "from libpgm.pgmlearner import PGMLearner\n", 28 | "\n", 29 | "nd = NodeData()\n", 30 | "skel = GraphSkeleton()\n", 31 | "jsonpath=\"small_network.txt\"\n", 32 | "nd.load(jsonpath)\n", 33 | "skel.load(jsonpath)\n", 34 | "skel.toporder()\n", 35 | "\n", 36 | "bn = DiscreteBayesianNetwork(skel, nd)" 37 | ], 38 | "language": "python", 39 | "metadata": {}, 40 | "outputs": [], 41 | "prompt_number": 97 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "We write a function that learns the parameters of the network from data sampled from it. We print the estimated parameter for the assignment X=3, Y=0, that is, P(Y=0 | X=3), after drawing 50 samples. We run the same function a few times to compare the results. Since sampling is random, you might get different results when you run it yourself." 
48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "collapsed": false, 53 | "input": [ 54 | "def learn_param(num_samp=50):\n", 55 | " data = bn.randomsample(num_samp)\n", 56 | " # instantiate learner \n", 57 | " learner = PGMLearner()\n", 58 | "\n", 59 | " # estimate parameters from data and skeleton\n", 60 | " result = learner.discrete_mle_estimateparams(skel, data)\n", 61 | " numer=len([1 for m in data if m[\"X\"]=='3' and m[\"Y\"]=='0'])\n", 62 | " denom=len([1 for m in data if m[\"X\"]=='3'])\n", 63 | "\n", 64 | " print \"numerator:\",numer,\" denominator:\",denom,\" result=\",numer/float(denom)\n", 65 | "\n", 66 | "[learn_param() for _ in range(5)]\n" 67 | ], 68 | "language": "python", 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "output_type": "stream", 73 | "stream": "stdout", 74 | "text": [ 75 | "numerator: 1 denominator: 6 result= 0.166666666667\n", 76 | "numerator: 2 denominator: 6 result= 0.333333333333\n", 77 | "numerator: 2 denominator: 10 result= 0.2\n", 78 | "numerator: 1 denominator: 4 result= 0.25\n", 79 | "numerator: 2 denominator: 12 result= 0.166666666667\n" 80 | ] 81 | }, 82 | { 83 | "metadata": {}, 84 | "output_type": "pyout", 85 | "prompt_number": 113, 86 | "text": [ 87 | "[None, None, None, None, None]" 88 | ] 89 | } 90 | ], 91 | "prompt_number": 113 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "We can see that the result varies a lot, because only a few of the sampled rows have X=3, and even fewer have both X=3 and Y=0. The actual value that we've set in the file is 0.2." 
98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "collapsed": false, 103 | "input": [ 104 | "nd.Vdata[\"Y\"][\"cprob\"][\"['3']\"][0]" 105 | ], 106 | "language": "python", 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "metadata": {}, 111 | "output_type": "pyout", 112 | "prompt_number": 104, 113 | "text": [ 114 | "0.2" 115 | ] 116 | } 117 | ], 118 | "prompt_number": 104 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "It is only when we increase the number of samples that the estimates get close to the true value. " 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "collapsed": false, 130 | "input": [ 131 | "[learn_param(5000) for _ in range(3)]" 132 | ], 133 | "language": "python", 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "output_type": "stream", 138 | "stream": "stdout", 139 | "text": [ 140 | "numerator: 160 denominator: 720 result= 0.222222222222\n", 141 | "numerator:" 142 | ] 143 | }, 144 | { 145 | "output_type": "stream", 146 | "stream": "stdout", 147 | "text": [ 148 | " 156 denominator: 763 result= 0.204456094364\n", 149 | "numerator:" 150 | ] 151 | }, 152 | { 153 | "output_type": "stream", 154 | "stream": "stdout", 155 | "text": [ 156 | " 156 denominator: 757 result= 0.20607661823\n" 157 | ] 158 | }, 159 | { 160 | "metadata": {}, 161 | "output_type": "pyout", 162 | "prompt_number": 108, 163 | "text": [ 164 | "[None, None, None]" 165 | ] 166 | } 167 | ], 168 | "prompt_number": 108 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "Although this is a small network with a single parent, the number of discrete values the parent takes on gives rise to the data fragmentation problem, which causes poor maximum likelihood estimates." 
175 | ] 176 | } 177 | ], 178 | "metadata": {} 179 | } 180 | ] 181 | } -------------------------------------------------------------------------------- /chap05/job_interview.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["Offer", "Interview", "Grades", "Admission", "Experience"], 3 | "E": [["Grades", "Interview"], 4 | ["Experience", "Interview"], 5 | ["Grades", "Admission"], 6 | ["Interview", "Offer"]], 7 | "Vdata": { 8 | "Offer": { 9 | "ord": 4, 10 | "numoutcomes": 2, 11 | "vals": ["0", "1"], 12 | "parents": ["Interview"], 13 | "children": None, 14 | "cprob": { 15 | "['0']": [.9, .1], 16 | "['1']": [.4, .6], 17 | "['2']": [.01, .99] 18 | } 19 | }, 20 | 21 | "Admission": { 22 | "ord": 3, 23 | "numoutcomes": 2, 24 | "vals": ["0", "1"], 25 | "parents": ["Grades"], 26 | "children": None, 27 | "cprob": { 28 | "['0']": [.7, .3], 29 | "['1']": [.2, .8] 30 | } 31 | }, 32 | 33 | "Interview": { 34 | "ord": 2, 35 | "numoutcomes": 3, 36 | "vals": ["0", "1", "2"], 37 | "parents": ["Experience", "Grades"], 38 | "children": ["Offer"], 39 | "cprob": { 40 | "['0', '0']": [.8, .18, .02], 41 | "['0', '1']": [.3, .6, .1], 42 | "['1', '0']": [.3, .4, .3], 43 | "['1', '1']": [.1, .2, .7] 44 | } 45 | }, 46 | 47 | "Grades": { 48 | "ord": 1, 49 | "numoutcomes": 2, 50 | "vals": ["0", "1"], 51 | "parents": None, 52 | "children": ["Admission", "Interview"], 53 | "cprob": [.7, .3] 54 | }, 55 | 56 | "Experience": { 57 | "ord": 0, 58 | "numoutcomes": 2, 59 | "vals": ["0", "1"], 60 | "parents": None, 61 | "children": ["Interview"], 62 | "cprob": [.6, .4] 63 | } 64 | } 65 | } 66 | -------------------------------------------------------------------------------- /chap05/job_interview_libpgm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 
| "metadata": {}, 13 | "source": [ 14 | "In this notebook we shall use the libpgm implementation of maximum likelihood estimation to learn the parameters of the CPDs in the job interview example that we've already seen in the previous chapters.\n", 15 | "\n", 16 | "We've already seen code from libpgm that loads the CPDs; here it is again." 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "collapsed": false, 22 | "input": [ 23 | "from libpgm.graphskeleton import GraphSkeleton\n", 24 | "from libpgm.nodedata import NodeData\n", 25 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 26 | "from libpgm.tablecpdfactor import TableCPDFactor\n", 27 | "from libpgm.pgmlearner import PGMLearner\n", 28 | "import pandas as pd\n", 29 | "\n", 30 | "nd = NodeData()\n", 31 | "skel = GraphSkeleton()\n", 32 | "jsonpath=\"job_interview.txt\"\n", 33 | "nd.load(jsonpath)\n", 34 | "skel.load(jsonpath)\n", 35 | "skel.toporder()" 36 | ], 37 | "language": "python", 38 | "metadata": {}, 39 | "outputs": [], 40 | "prompt_number": 1 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "We create the Bayes network and draw some random samples." 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "collapsed": false, 52 | "input": [ 53 | "bn = DiscreteBayesianNetwork(skel, nd)\n", 54 | "samples=bn.randomsample(2000)" 55 | ], 56 | "language": "python", 57 | "metadata": {}, 58 | "outputs": [], 59 | "prompt_number": 2 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "We instantiate the PGMLearner class. The method discrete_mle_estimateparams is given the structure of the network (the graph skeleton) along with the samples. As discussed in the earlier section, the estimate for each CPD only needs information from the node and its parents, and this decomposition makes it possible to learn the parameters of each CPD independently." 
66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "collapsed": false, 71 | "input": [ 72 | "learner = PGMLearner()\n", 73 | "result = learner.discrete_mle_estimateparams(skel, samples)" 74 | ], 75 | "language": "python", 76 | "metadata": {}, 77 | "outputs": [], 78 | "prompt_number": 3 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "The following two tables show the CPD parameters of the learnt network and of the original network. It can be seen that they are reasonably close approximations of each other." 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "collapsed": false, 90 | "input": [ 91 | "pd.DataFrame(result.Vdata['Interview']['cprob']).transpose()" 92 | ], 93 | "language": "python", 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "html": [ 98 | "
\n", 99 | "\n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | "
0 1 2
['0', '0'] 0.809582 0.165848 0.024570
['0', '1'] 0.321678 0.396853 0.281469
['1', '0'] 0.323204 0.591160 0.085635
['1', '1'] 0.115079 0.182540 0.702381
\n", 135 | "
" 136 | ], 137 | "metadata": {}, 138 | "output_type": "pyout", 139 | "prompt_number": 4, 140 | "text": [ 141 | " 0 1 2\n", 142 | "['0', '0'] 0.809582 0.165848 0.024570\n", 143 | "['0', '1'] 0.321678 0.396853 0.281469\n", 144 | "['1', '0'] 0.323204 0.591160 0.085635\n", 145 | "['1', '1'] 0.115079 0.182540 0.702381" 146 | ] 147 | } 148 | ], 149 | "prompt_number": 4 150 | }, 151 | { 152 | "cell_type": "code", 153 | "collapsed": false, 154 | "input": [ 155 | "pd.DataFrame(bn.Vdata['Interview']['cprob']).transpose()" 156 | ], 157 | "language": "python", 158 | "metadata": {}, 159 | "outputs": [ 160 | { 161 | "html": [ 162 | "
\n", 163 | "\n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | "
0 1 2
['0', '0'] 0.8 0.18 0.02
['0', '1'] 0.3 0.60 0.10
['1', '0'] 0.3 0.40 0.30
['1', '1'] 0.1 0.20 0.70
\n", 199 | "
" 200 | ], 201 | "metadata": {}, 202 | "output_type": "pyout", 203 | "prompt_number": 5, 204 | "text": [ 205 | " 0 1 2\n", 206 | "['0', '0'] 0.8 0.18 0.02\n", 207 | "['0', '1'] 0.3 0.60 0.10\n", 208 | "['1', '0'] 0.3 0.40 0.30\n", 209 | "['1', '1'] 0.1 0.20 0.70" 210 | ] 211 | } 212 | ], 213 | "prompt_number": 5 214 | } 215 | ], 216 | "metadata": {} 217 | } 218 | ] 219 | } -------------------------------------------------------------------------------- /chap05/small_network.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["X", "Y"], 3 | "E": [["X", "Y"]], 4 | "Vdata": { 5 | "Y": { 6 | "ord": 0, 7 | "numoutcomes": 2, 8 | "vals": ["0", "1"], 9 | "parents": ["X"], 10 | "children": None, 11 | "cprob": { 12 | "['0']": [.9, .1], 13 | "['1']": [.4, .6], 14 | "['2']": [.01, .99], 15 | "['3']": [.2, .8], 16 | "['4']": [.25, .75] 17 | } 18 | }, 19 | 20 | "X": { 21 | "ord": 1, 22 | "numoutcomes": 5, 23 | "vals": ["0", "1","2","3","4"], 24 | "parents": None, 25 | "children": ["Y"], 26 | "cprob": [.2, .3,.2,.15,.25] 27 | } 28 | } 29 | } 30 | -------------------------------------------------------------------------------- /chap06/alarm.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["marycalls", "johncalls", "alarm", "earthquake", "burglary"], 3 | "E": [["earthquake", "alarm"], 4 | ["burglary", "alarm"], 5 | ["alarm", "marycalls"], 6 | ["alarm", "johncalls"]], 7 | "Vdata": { 8 | "marycalls": { 9 | "ord": 4, 10 | "numoutcomes": 2, 11 | "vals": ["T", "F"], 12 | "parents": ["alarm"], 13 | "children": None, 14 | "cprob": { 15 | "['T']": [.7,.3], 16 | "['F']": [.01, .99] 17 | } 18 | }, 19 | 20 | "johncalls": { 21 | "ord": 3, 22 | "numoutcomes": 2, 23 | "vals": ["T", "F"], 24 | "parents": ["alarm"], 25 | "children": None, 26 | "cprob": { 27 | "['T']": [.9,.1], 28 | "['F']": [.05, .95] 29 | } 30 | }, 31 | 32 | "alarm": { 33 | "ord": 2, 34 | "numoutcomes": 2, 35 | "vals": ["T", 
"F"], 36 | "parents": ["earthquake", "burglary"], 37 | "children": ["johncalls","marycalls"], 38 | "cprob": { 39 | "['T', 'T']": [.95, .05], 40 | "['T', 'F']": [.29, .71], 41 | "['F', 'T']": [.94, .06], 42 | "['F', 'F']": [.001, .999] 43 | } 44 | }, 45 | 46 | "burglary": { 47 | "ord": 1, 48 | "numoutcomes": 2, 49 | "vals": ["T", "F"], 50 | "parents": None, 51 | "children": ["alarm"], 52 | "cprob": [.001, .999] 53 | }, 54 | 55 | "earthquake": { 56 | "ord": 0, 57 | "numoutcomes": 2, 58 | "vals": ["T", "F"], 59 | "parents": None, 60 | "children": ["alarm"], 61 | "cprob": [.002, .998] 62 | } 63 | } 64 | } 65 | -------------------------------------------------------------------------------- /chap06/asia.bif: -------------------------------------------------------------------------------- 1 | network unknown { 2 | } 3 | variable asia { 4 | type discrete [ 2 ] { yes, no }; 5 | } 6 | variable tub { 7 | type discrete [ 2 ] { yes, no }; 8 | } 9 | variable smoke { 10 | type discrete [ 2 ] { yes, no }; 11 | } 12 | variable lung { 13 | type discrete [ 2 ] { yes, no }; 14 | } 15 | variable bronc { 16 | type discrete [ 2 ] { yes, no }; 17 | } 18 | variable either { 19 | type discrete [ 2 ] { yes, no }; 20 | } 21 | variable xray { 22 | type discrete [ 2 ] { yes, no }; 23 | } 24 | variable dysp { 25 | type discrete [ 2 ] { yes, no }; 26 | } 27 | probability ( asia ) { 28 | table 0.01, 0.99; 29 | } 30 | probability ( tub | asia ) { 31 | (yes) 0.05, 0.95; 32 | (no) 0.01, 0.99; 33 | } 34 | probability ( smoke ) { 35 | table 0.5, 0.5; 36 | } 37 | probability ( lung | smoke ) { 38 | (yes) 0.1, 0.9; 39 | (no) 0.01, 0.99; 40 | } 41 | probability ( bronc | smoke ) { 42 | (yes) 0.6, 0.4; 43 | (no) 0.3, 0.7; 44 | } 45 | probability ( either | lung, tub ) { 46 | (yes, yes) 1.0, 0.0; 47 | (no, yes) 1.0, 0.0; 48 | (yes, no) 1.0, 0.0; 49 | (no, no) 0.0, 1.0; 50 | } 51 | probability ( xray | either ) { 52 | (yes) 0.98, 0.02; 53 | (no) 0.05, 0.95; 54 | } 55 | probability ( dysp | bronc, either ) { 
56 | (yes, yes) 0.9, 0.1; 57 | (no, yes) 0.7, 0.3; 58 | (yes, no) 0.8, 0.2; 59 | (no, no) 0.1, 0.9; 60 | } 61 | -------------------------------------------------------------------------------- /chap06/asia.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["asia","tub","smoke","lung","bronc","either","xray","dysp"], 3 | "E": [["asia", "tub"], 4 | ["tub", "either"], 5 | ["either", "xray"], 6 | ["either", "dysp"], 7 | ["bronc", "dysp"], 8 | ["smoke","bronc"], 9 | ["smoke","lung"], 10 | ["lung","either"]], 11 | "Vdata": { 12 | "dysp": { 13 | "ord": 7, 14 | "numoutcomes": 2, 15 | "vals": ["yes", "no"], 16 | "parents": ["bronc", "either"], 17 | "children": None, 18 | "cprob": { 19 | "['yes', 'yes']": [.9, 0.1], 20 | "['no', 'yes']": [.7, 0.3], 21 | "['yes', 'no']": [.8, 0.2], 22 | "['no', 'no']": [0.1, 0.9] 23 | } 24 | }, 25 | "xray": { 26 | "ord": 6, 27 | "numoutcomes": 2, 28 | "vals": ["yes", "no"], 29 | "parents": ["either"], 30 | "children": None, 31 | "cprob": { 32 | "['yes']": [.98, .02], 33 | "['no']": [.05, .95] 34 | } 35 | }, 36 | 37 | "either": { 38 | "ord": 5, 39 | "numoutcomes": 2, 40 | "vals": ["yes", "no"], 41 | "parents": ["tub", "lung"], 42 | "children": ["xray","dysp"], 43 | "cprob": { 44 | "['yes', 'yes']": [1, 0], 45 | "['no', 'yes']": [1, 0], 46 | "['yes', 'no']": [1, 0], 47 | "['no', 'no']": [0, 1] 48 | } 49 | }, 50 | "bronc": { 51 | "ord": 4, 52 | "numoutcomes": 2, 53 | "vals": ["yes", "no"], 54 | "parents": ["smoke"], 55 | "children": ["dysp"], 56 | "cprob": { 57 | "['yes']": [.6, .4], 58 | "['no']": [.3, .7] 59 | } 60 | }, 61 | 62 | "lung": { 63 | "ord": 3, 64 | "numoutcomes": 2, 65 | "vals": ["yes", "no"], 66 | "parents": ["smoke"], 67 | "children": ["either","bronc"], 68 | "cprob": { 69 | "['yes']": [.1, .9], 70 | "['no']": [.01, .99] 71 | } 72 | }, 73 | 74 | "tub": { 75 | "ord":2, 76 | "numoutcomes": 2, 77 | "vals": ["yes", "no"], 78 | "parents": ["asia"], 79 | "children": ["either"], 80 | 
"cprob": { 81 | "['yes']": [.05, .95], 82 | "['no']": [.01, .99] 83 | } 84 | }, 85 | "smoke": { 86 | "ord": 1, 87 | "numoutcomes": 2, 88 | "vals": ["yes", "no"], 89 | "parents": None, 90 | "children": ["lung","bronc"], 91 | "cprob": [.5, .5] 92 | }, 93 | 94 | "asia": { 95 | "ord": 0, 96 | "numoutcomes": 2, 97 | "vals": ["yes", "no"], 98 | "parents": None, 99 | "children": ["tub"], 100 | "cprob": [.01, .99] 101 | } 102 | } 103 | } 104 | -------------------------------------------------------------------------------- /chap06/asia1.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["asia","tub","smoke","lung","bronc","either","xray","dysp"], 3 | "E": [["asia", "tub"], 4 | ["tub", "either"], 5 | ["either", "xray"], 6 | ["either", "dysp"], 7 | ["bronc", "dysp"], 8 | ["smoke","bronc"], 9 | ["smoke","lung"], 10 | ["lung","either"]], 11 | "Vdata": { 12 | "dysp": { 13 | "ord": 7, 14 | "numoutcomes": 2, 15 | "vals": ["yes", "no"], 16 | "parents": ["bronc", "either"], 17 | "children": None, 18 | "cprob": { 19 | "['yes', 'yes']": [.9,.1], 20 | "['no', 'yes']": [.7, .3], 21 | "['yes', 'no']": [.8, .2], 22 | "['no', 'no']": [.1, .9] 23 | } 24 | }, 25 | "xray": { 26 | "ord": 6, 27 | "numoutcomes": 2, 28 | "vals": ["yes", "no"], 29 | "parents": ["either"], 30 | "children": None, 31 | "cprob": { 32 | "['yes']": [.98, .02], 33 | "['no']": [.05, .95] 34 | } 35 | }, 36 | "either": { 37 | "ord": 5, 38 | "numoutcomes": 2, 39 | "vals": ["yes", "no"], 40 | "parents": ["tub", "lung"], 41 | "children": ["xray","dysp"], 42 | "cprob": { 43 | "['yes', 'yes']": [1, 0], 44 | "['no', 'yes']": [1, 0], 45 | "['yes', 'no']": [1, 0], 46 | "['no', 'no']": [0, 1] 47 | } 48 | }, 49 | "bronc": { 50 | "ord": 4, 51 | "numoutcomes": 2, 52 | "vals": ["yes", "no"], 53 | "parents": ["smoke"], 54 | "children": ["dysp"], 55 | "cprob": { 56 | "['yes']": [.6, .4], 57 | "['no']": [.3, .7] 58 | } 59 | }, 60 | 61 | "lung": { 62 | "ord": 3, 63 | "numoutcomes": 2, 64 | 
"vals": ["yes", "no"], 65 | "parents": ["smoke"], 66 | "children": ["either","bronc"], 67 | "cprob": { 68 | "['yes']": [.1, .9], 69 | "['no']": [.01, .99] 70 | } 71 | }, 72 | 73 | "tub": { 74 | "ord":2, 75 | "numoutcomes": 2, 76 | "vals": ["yes", "no"], 77 | "parents": ["asia"], 78 | "children": ["either"], 79 | "cprob": { 80 | "['yes']": [.05, .95], 81 | "['no']": [.01, .99] 82 | } 83 | }, 84 | "smoke": { 85 | "ord": 1, 86 | "numoutcomes": 2, 87 | "vals": ["yes", "no"], 88 | "parents": None, 89 | "children": ["lung","bronc"], 90 | "cprob": [.5, .5] 91 | }, 92 | 93 | "asia": { 94 | "ord": 0, 95 | "numoutcomes": 2, 96 | "vals": ["yes", "no"], 97 | "parents": None, 98 | "children": ["tub"], 99 | "cprob": [.01, .99] 100 | } 101 | } 102 | } 103 | -------------------------------------------------------------------------------- /chap06/asia_bn.py: -------------------------------------------------------------------------------- 1 | from bayesian.factor_graph import * 2 | from bayesian.bbn import * 3 | 4 | dictionary_asia = {'yes': 0.01, 'no': 0.99} 5 | 6 | def f_asia(asia): 7 | return dictionary_asia[asia] 8 | 9 | dictionary_tub = {('yes', 'no'): 0.95, ('no', 'no'): 0.99, ('no', 'yes'): 0.01, ('yes', 'yes'): 0.05} 10 | def f_tub(asia, tub): 11 | return dictionary_tub[(asia, tub)] 12 | 13 | dictionary_smoke = {'yes': 0.5, 'no': 0.5} 14 | 15 | def f_smoke(smoke): 16 | return dictionary_smoke[smoke] 17 | 18 | dictionary_lung = {('yes', 'no'): 0.9, ('no', 'no'): 0.99, ('no', 'yes'): 0.01, ('yes', 'yes'): 0.1} 19 | def f_lung(smoke, lung): 20 | return dictionary_lung[(smoke, lung)] 21 | 22 | dictionary_bronc = {('yes', 'no'): 0.4, ('no', 'no'): 0.7, ('no', 'yes'): 0.3, ('yes', 'yes'): 0.6} 23 | def f_bronc(smoke, bronc): 24 | return dictionary_bronc[(smoke, bronc)] 25 | 26 | dictionary_either = {('yes', 'yes', 'no'): 0.0, ('no', 'yes', 'yes'): 1.0, ('yes', 'yes', 'yes'): 1.0, ('no', 'no', 'yes'): 0.0, ('yes', 'no', 'no'): 0.0, ('no', 'no', 'no'): 1.0, ('no', 'yes', 'no'): 0.0, 
('yes', 'no', 'yes'): 1.0} 27 | def f_either(lung, tub, either): 28 | return dictionary_either[(lung, tub, either)] 29 | 30 | dictionary_xray = {('yes', 'no'): 0.02, ('no', 'no'): 0.95, ('no', 'yes'): 0.05, ('yes', 'yes'): 0.98} 31 | def f_xray(either, xray): 32 | return dictionary_xray[(either, xray)] 33 | 34 | dictionary_dysp = {('yes', 'yes', 'no'): 0.1, ('no', 'yes', 'yes'): 0.7, ('yes', 'yes', 'yes'): 0.9, ('no', 'no', 'yes'): 0.1, ('yes', 'no', 'no'): 0.2, ('no', 'no', 'no'): 0.9, ('no', 'yes', 'no'): 0.3, ('yes', 'no', 'yes'): 0.8} 35 | def f_dysp(bronc, either, dysp): 36 | return dictionary_dysp[(bronc, either, dysp)] 37 | 38 | functions = [f_asia, f_tub, f_smoke, f_lung, f_bronc, f_either, f_xray, f_dysp] 39 | domains_dict = {'dysp': ['yes', 'no'], 'bronc': ['yes', 'no'], 'asia': ['yes', 'no'], 'xray': ['yes', 'no'], 'lung': ['yes', 'no'], 'either': ['yes', 'no'], 'smoke': ['yes', 'no'], 'tub': ['yes', 'no']} 40 | 41 | def create_graph(): 42 | g = build_graph( 43 | *functions, 44 | domains = domains_dict) 45 | g.name = 'asia' 46 | return g 47 | 48 | def create_bbn(): 49 | g = build_bbn( 50 | *functions, 51 | domains = domains_dict) 52 | g.name = 'asia' 53 | return g 54 | 55 | -------------------------------------------------------------------------------- /chap06/bif_parser.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | 4 | def parse(filename): 5 | """Parses the .bif file with the 6 | given name (exclude the extension from the argument) 7 | and produces a python file with create_graph() and create_bbn() functions 8 | to return the network. The name of the module is returned. 
9 | The bbn/factor_graph objects will have the filename as their model name.""" 10 | 11 | # Setting up I/O 12 | module_name = filename+'_bn' 13 | outfile = open(module_name + '.py', 'w') 14 | 15 | def write(s): 16 | outfile.write(s+"\n") 17 | infile = open(filename+'.bif') 18 | infile.readline() 19 | infile.readline() 20 | 21 | # Import statements in the produced module 22 | write("""from bayesian.factor_graph import * 23 | from bayesian.bbn import * 24 | """) 25 | 26 | # Regex patterns for parsing 27 | variable_pattern = re.compile(r" type discrete \[ \d+ \] \{ (.+) \};\s*") 28 | prior_probability_pattern_1 = re.compile( 29 | r"probability \( ([^|]+) \) \{\s*") 30 | prior_probability_pattern_2 = re.compile(r" table (.+);\s*") 31 | conditional_probability_pattern_1 = ( 32 | re.compile(r"probability \( (.+) \| (.+) \) \{\s*")) 33 | conditional_probability_pattern_2 = re.compile(r" \((.+)\) (.+);\s*") 34 | 35 | variables = {} # domains 36 | functions = [] # function names (nodes/variables) 37 | 38 | # For every line in the file 39 | while True: 40 | line = infile.readline() 41 | 42 | # End of file 43 | if not line: 44 | break 45 | 46 | # Variable declaration 47 | if line.startswith("variable"): 48 | match = variable_pattern.match(infile.readline()) 49 | 50 | # Extract domain and place into dictionary 51 | if match: 52 | variables[line[9:-3]] = match.group(1).split(", ") 53 | else: 54 | raise Exception("Unrecognised variable declaration:\n" + line) 55 | infile.readline() 56 | 57 | # Probability distribution 58 | elif line.startswith("probability"): 59 | 60 | match = prior_probability_pattern_1.match(line) 61 | if match: 62 | 63 | # Prior probabilities 64 | variable = match.group(1) 65 | function_name = "f_" + variable 66 | functions.append(function_name) 67 | line = infile.readline() 68 | match = prior_probability_pattern_2.match(line) 69 | write("""dictionary_%(var)s = %(dict)s 70 | 71 | def %(function)s(%(var)s): 72 | return dictionary_%(var)s[%(var)s] 73 | """ 74 | 
% { 75 | 'function': function_name, 76 | 'var': variable, 77 | 'dict': str(dict( 78 | zip(variables[variable], 79 | map(float, match.group(1).split(", "))))) 80 | } 81 | ) 82 | infile.readline() # } 83 | 84 | else: 85 | match = conditional_probability_pattern_1.match(line) 86 | if match: 87 | 88 | # Conditional probabilities 89 | variable = match.group(1) 90 | function_name = "f_" + variable 91 | functions.append(function_name) 92 | given = match.group(2) 93 | dictionary = {} 94 | 95 | # Iterate through the conditional probability table 96 | while True: 97 | line = infile.readline() # line of the CPT 98 | if line == '}\n': 99 | break 100 | match = conditional_probability_pattern_2.match(line) 101 | given_values = match.group(1).split(", ") 102 | for value, prob in zip( 103 | variables[variable], 104 | map(float, match.group(2).split(", "))): 105 | dictionary[tuple(given_values + [value])] = prob 106 | write("""dictionary_%(var)s = %(dict)s 107 | def %(function)s(%(given)s, %(var)s): 108 | return dictionary_%(var)s[(%(given)s, %(var)s)] 109 | """ 110 | % {'function': function_name, 111 | 'given': given, 112 | 'var': variable, 113 | 'dict': str(dictionary)}) 114 | else: 115 | raise Exception( 116 | "Unrecognised probability declaration:\n" + line) 117 | 118 | write("""functions = %(funcs)s 119 | domains_dict = %(vars)s 120 | 121 | def create_graph(): 122 | g = build_graph( 123 | *functions, 124 | domains = domains_dict) 125 | g.name = '%(name)s' 126 | return g 127 | 128 | def create_bbn(): 129 | g = build_bbn( 130 | *functions, 131 | domains = domains_dict) 132 | g.name = '%(name)s' 133 | return g 134 | """ 135 | % { 136 | 'funcs': ''.join(c for c in str(functions) if c not in "'\""), 137 | 'vars': str(variables), 'name': filename}) 138 | outfile.close() 139 | return module_name 140 | -------------------------------------------------------------------------------- /chap06/unittestdict.txt: 
-------------------------------------------------------------------------------- 1 | { 2 | "V": ["Letter", "Grade", "Intelligence", "SAT", "Difficulty"], 3 | "E": [["Intelligence", "Grade"], 4 | ["Difficulty", "Grade"], 5 | ["Intelligence", "SAT"], 6 | ["Grade", "Letter"]], 7 | "Vdata": { 8 | "Letter": { 9 | "ord": 4, 10 | "numoutcomes": 2, 11 | "vals": ["weak", "strong"], 12 | "parents": ["Grade"], 13 | "children": None, 14 | "cprob": { 15 | "['A']": [.1, .9], 16 | "['B']": [.4, .6], 17 | "['C']": [.99, .01] 18 | } 19 | }, 20 | 21 | "SAT": { 22 | "ord": 3, 23 | "numoutcomes": 2, 24 | "vals": ["lowscore", "highscore"], 25 | "parents": ["Intelligence"], 26 | "children": None, 27 | "cprob": { 28 | "['low']": [.95, .05], 29 | "['high']": [.2, .8] 30 | } 31 | }, 32 | 33 | "Grade": { 34 | "ord": 2, 35 | "numoutcomes": 3, 36 | "vals": ["A", "B", "C"], 37 | "parents": ["Difficulty", "Intelligence"], 38 | "children": ["Letter"], 39 | "cprob": { 40 | "['easy', 'low']": [.3, .4, .3], 41 | "['easy', 'high']": [.9, .08, .02], 42 | "['hard', 'low']": [.05, .25, .7], 43 | "['hard', 'high']": [.5, .3, .2] 44 | } 45 | }, 46 | 47 | "Intelligence": { 48 | "ord": 1, 49 | "numoutcomes": 2, 50 | "vals": ["low", "high"], 51 | "parents": None, 52 | "children": ["SAT", "Grade"], 53 | "cprob": [.7, .3] 54 | }, 55 | 56 | "Difficulty": { 57 | "ord": 0, 58 | "numoutcomes": 2, 59 | "vals": ["easy", "hard"], 60 | "parents": None, 61 | "children": ["Grade"], 62 | "cprob": [.6, .4] 63 | } 64 | } 65 | } 66 | -------------------------------------------------------------------------------- /chap07/Comparing gibbs and random sampling.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "name": "" 4 | }, 5 | "nbformat": 3, 6 | "nbformat_minor": 0, 7 | "worksheets": [ 8 | { 9 | "cells": [ 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this notebook, we shall look at the sampling process. 
We'll first understand what it means to 'arrive at the stationary distribution' for a discrete distribution, and what methods we can use to get there faster.\n", 15 | "\n", 16 | "We'll use the familiar job interview example to anchor the discussion.\n", 17 | "\n", 18 | "The job interview network has 5 discrete variables: four are binary (Offer, Grades, Admission, Experience) and one takes three values (Interview). The joint distribution therefore has 48 rows (2x3x2x2x2, the product of the number of values each variable takes). We are interested in the distribution over a subset of the variables, given some observed evidence. " 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "collapsed": false, 24 | "input": [ 25 | "from libpgm.graphskeleton import GraphSkeleton\n", 26 | "from libpgm.nodedata import NodeData\n", 27 | "from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork\n", 28 | "from libpgm.tablecpdfactorization import TableCPDFactorization\n", 29 | "from libpgm.sampleaggregator import SampleAggregator\n", 30 | "from libpgm.pgmlearner import PGMLearner\n", 31 | "import itertools\n", 32 | "import pandas as pd \n", 33 | "import json\n", 34 | "\n", 35 | "def getTableCPD():\n", 36 | " nd = NodeData()\n", 37 | " skel = GraphSkeleton()\n", 38 | " jsonpath=\"job_interview.txt\"\n", 39 | " nd.load(jsonpath)\n", 40 | " skel.load(jsonpath)\n", 41 | " skel.toporder()\n", 42 | " # load bayesian network\n", 43 | " bn = DiscreteBayesianNetwork(skel, nd)\n", 44 | " tablecpd=TableCPDFactorization(bn)\n", 45 | " return tablecpd,bn,skel\n", 46 | "\n", 47 | "#a method that prints the distribution as a table.\n", 48 | "def printdist(jd,bn,normalize=False):\n", 49 | " x=[bn.Vdata[i][\"vals\"] for i in jd.scope]\n", 50 | " zipover=[i/sum(jd.vals) for i in jd.vals] if normalize else jd.vals\n", 51 | " #creates the cartesian product\n", 52 | " k=[a + [b] for a,b in zip([list(i) for i in itertools.product(*x[::-1])],zipover)]\n", 53 | " df=pd.DataFrame.from_records(k,columns=[i for i in reversed(jd.scope)]+['probability'])\n", 54 | " return df" 55 | ], 56 | 
"language": "python", 57 | "metadata": {}, 58 | "outputs": [], 59 | "prompt_number": 1 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "We are interested in the marginal probability $ P(Offer,Grades,Interview \\mid Admission,Experience) $, where we have observed the value of Admission and Experience.\n", 66 | "\n", 67 | "In the snippet below, we use exact inference (Variable Elimination) to determine the conditional probability, and then print the CPD for the same. " 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "collapsed": false, 73 | "input": [ 74 | "tcpd,bn,skel=getTableCPD()\n", 75 | "query={'Offer':'0','Grades':'0','Interview':'0'}\n", 76 | "evidence={'Admission':'0','Experience':'0'}\n", 77 | "fac=tcpd.condprobve(query,evidence)\n", 78 | "df=printdist(fac,bn)\n", 79 | "df" 80 | ], 81 | "language": "python", 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "html": [ 86 | "
" 185 | ], 186 | "metadata": {}, 187 | "output_type": "pyout", 188 | "prompt_number": 2, 189 | "text": [ 190 | " Offer Interview Grades probability\n", 191 | "0 0 0 0 0.641455\n", 192 | "1 0 0 1 0.029455\n", 193 | "2 0 1 0 0.064145\n", 194 | "3 0 1 1 0.026182\n", 195 | "4 0 2 0 0.000178\n", 196 | "5 0 2 1 0.000109\n", 197 | "6 1 0 0 0.071273\n", 198 | "7 1 0 1 0.003273\n", 199 | "8 1 1 0 0.096218\n", 200 | "9 1 1 1 0.039273\n", 201 | "10 1 2 0 0.017640\n", 202 | "11 1 2 1 0.010800" 203 | ] 204 | } 205 | ], 206 | "prompt_number": 2 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "To get the desired distribution $ P(Offer,Grades,Interview\u2223Admission=0,Experience=0) $, we first have to draw samples, reject those that do not satisfy the evidence. \n", 213 | "\n", 214 | "Libpgm allows us to draw samples using random sampling and gibbs sampling. In both cases, we can condition by evidence ( $ (Admission=0,Experience=0) $ ).\n", 215 | "\n", 216 | "In the code below, we draw 5000 samples using gibbs and random sampling, and compare the marginal probabilities that are learnt from the samples. 
" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "collapsed": false, 222 | "input": [ 223 | "def estimate_distrib(skel,samples):\n", 224 | " learner=PGMLearner()\n", 225 | " #learn the parameters of the network from the samples, given skeleton\n", 226 | " #returns a new bayes net.\n", 227 | " bayesnet=learner.discrete_mle_estimateparams(skel,samples)\n", 228 | " tablecpd=TableCPDFactorization(bayesnet)\n", 229 | " #run a conditional probability query for\n", 230 | " #P(Offer,Grades,Interview\u2223Admission=0,Experience=0)\n", 231 | " fac=tablecpd.condprobve(query,evidence)\n", 232 | " #create a dataframe listing the marginals \n", 233 | " df2=printdist(fac,bayesnet)\n", 234 | " return df2\n", 235 | "\n", 236 | "#learn the marginals from gibbs samples\n", 237 | "def gibbs_marginals(num_samples=5000):\n", 238 | " tcpd,bn,skel=getTableCPD()\n", 239 | " samples=tcpd.gibbssample(evidence,num_samples)\n", 240 | " df2=estimate_distrib(skel,samples)\n", 241 | " return df2['probability']\n", 242 | "\n", 243 | "#learn the marginals from random samples\n", 244 | "def random_sample_marginals(num_samples=5000):\n", 245 | " tcpd,bn,skel=getTableCPD()\n", 246 | " samples=bn.randomsample(num_samples,evidence)\n", 247 | " df2=estimate_distrib(skel,samples)\n", 248 | " return df2['probability']\n", 249 | "\n", 250 | "df['prob from gibbs']=gibbs_marginals()\n", 251 | "df['prob from random samples']=random_sample_marginals()\n", 252 | "df" 253 | ], 254 | "language": "python", 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "html": [ 259 | "
" 384 | ], 385 | "metadata": {}, 386 | "output_type": "pyout", 387 | "prompt_number": 3, 388 | "text": [ 389 | " Offer Interview Grades probability prob from gibbs \\\n", 390 | "0 0 0 0 0.641455 0.645444 \n", 391 | "1 0 0 1 0.029455 0.025156 \n", 392 | "2 0 1 0 0.064145 0.065997 \n", 393 | "3 0 1 1 0.026182 0.026203 \n", 394 | "4 0 2 0 0.000178 0.000000 \n", 395 | "5 0 2 1 0.000109 0.000000 \n", 396 | "6 1 0 0 0.071273 0.072956 \n", 397 | "7 1 0 1 0.003273 0.002844 \n", 398 | "8 1 1 0 0.096218 0.096203 \n", 399 | "9 1 1 1 0.039273 0.038197 \n", 400 | "10 1 2 0 0.017640 0.016400 \n", 401 | "11 1 2 1 0.010800 0.010600 \n", 402 | "\n", 403 | " prob from random samples \n", 404 | "0 0.078557 \n", 405 | "1 0.113443 \n", 406 | "2 0.058145 \n", 407 | "3 0.008655 \n", 408 | "4 0.013869 \n", 409 | "5 0.028531 \n", 410 | "6 0.048443 \n", 411 | "7 0.069957 \n", 412 | "8 0.504855 \n", 413 | "9 0.075145 \n", 414 | "10 0.000131 \n", 415 | "11 0.000269 " 416 | ] 417 | } 418 | ], 419 | "prompt_number": 3 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "metadata": {}, 424 | "source": [ 425 | "We can compare the true probability (obtained from exact inference), the probability from gibbs samples, and the probability from random samples. We can see that the probabilities from gibbs samples are reasonably close to the true marginals, and while the random samples differ quite a bit from the true probability. \n", 426 | "\n", 427 | "It is obvious that gibbs sampling is a much more efficient sampling process than random sampling. Yet, for larger dimensions, gibbs sampling too will struggle to obtain marginals that are 'close' to the true marginals." 
428 | ] 429 | } 430 | ], 431 | "metadata": {} 432 | } 433 | ] 434 | } -------------------------------------------------------------------------------- /chap07/cow_image.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shark8me/Building_Probabilistic_Graphical_Models_in_Python/c1f7ad013e1d20759eb396c866fa95ac4a9e8885/chap07/cow_image.jpg -------------------------------------------------------------------------------- /chap07/job_interview.txt: -------------------------------------------------------------------------------- 1 | { 2 | "V": ["Offer", "Interview", "Grades", "Admission", "Experience"], 3 | "E": [["Grades", "Interview"], 4 | ["Experience", "Interview"], 5 | ["Grades", "Admission"], 6 | ["Interview", "Offer"]], 7 | "Vdata": { 8 | "Offer": { 9 | "ord": 4, 10 | "numoutcomes": 2, 11 | "vals": ["0", "1"], 12 | "parents": ["Interview"], 13 | "children": None, 14 | "cprob": { 15 | "['0']": [.9, .1], 16 | "['1']": [.4, .6], 17 | "['2']": [.01, .99] 18 | } 19 | }, 20 | 21 | "Admission": { 22 | "ord": 3, 23 | "numoutcomes": 2, 24 | "vals": ["0", "1"], 25 | "parents": ["Grades"], 26 | "children": None, 27 | "cprob": { 28 | "['0']": [.7, .3], 29 | "['1']": [.2, .8] 30 | } 31 | }, 32 | 33 | "Interview": { 34 | "ord": 2, 35 | "numoutcomes": 3, 36 | "vals": ["0", "1", "2"], 37 | "parents": ["Experience", "Grades"], 38 | "children": ["Offer"], 39 | "cprob": { 40 | "['0', '0']": [.8, .18, .02], 41 | "['0', '1']": [.3, .6, .1], 42 | "['1', '0']": [.3, .4, .3], 43 | "['1', '1']": [.1, .2, .7] 44 | } 45 | }, 46 | 47 | "Grades": { 48 | "ord": 1, 49 | "numoutcomes": 2, 50 | "vals": ["0", "1"], 51 | "parents": None, 52 | "children": ["Admission", "Interview"], 53 | "cprob": [.7, .3] 54 | }, 55 | 56 | "Experience": { 57 | "ord": 0, 58 | "numoutcomes": 2, 59 | "vals": ["0", "1"], 60 | "parents": None, 61 | "children": ["Interview"], 62 | "cprob": [.6, .4] 63 | } 64 | } 65 | } 66 | 
--------------------------------------------------------------------------------
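The exact table printed in the notebook above can be cross-checked without libpgm. What follows is a minimal sketch, not the notebook's method: it hard-codes the CPDs from chap07/job_interview.txt, computes P(Offer, Interview, Grades | Admission=0, Experience=0) by brute-force enumeration of the joint, and then estimates the same quantity by plain rejection sampling (draw a full sample, discard it if it contradicts the evidence). The names `joint`, `exact_conditional`, `draw` and `rejection_estimate` are hypothetical helpers introduced here for illustration; none of them is part of libpgm, and rejection sampling stands in for libpgm's own samplers.

```python
import itertools
import random

# CPDs copied from chap07/job_interview.txt; outcomes are integers, 0 = worst.
P_EXPERIENCE = [0.6, 0.4]
P_GRADES = [0.7, 0.3]
P_ADMISSION = {0: [0.7, 0.3], 1: [0.2, 0.8]}      # keyed by Grades
P_INTERVIEW = {(0, 0): [0.8, 0.18, 0.02],         # keyed by (Experience, Grades)
               (0, 1): [0.3, 0.6, 0.1],
               (1, 0): [0.3, 0.4, 0.3],
               (1, 1): [0.1, 0.2, 0.7]}
P_OFFER = {0: [0.9, 0.1], 1: [0.4, 0.6], 2: [0.01, 0.99]}  # keyed by Interview


def joint(e, g, a, i, o):
    """Chain-rule product of the five CPDs for one full assignment."""
    return (P_EXPERIENCE[e] * P_GRADES[g] * P_ADMISSION[g][a]
            * P_INTERVIEW[(e, g)][i] * P_OFFER[i][o])


def exact_conditional(admission=0, experience=0):
    """P(Offer, Interview, Grades | Admission, Experience) by enumeration."""
    table = {(o, i, g): joint(experience, g, admission, i, o)
             for o, i, g in itertools.product(range(2), range(3), range(2))}
    z = sum(table.values())  # probability of the evidence
    return {key: value / z for key, value in table.items()}


def draw(weights, rng):
    """Sample an outcome index according to a list of probabilities."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def rejection_estimate(n, admission=0, experience=0, seed=0):
    """Forward-sample n full assignments and keep those matching the evidence."""
    rng = random.Random(seed)
    counts, kept = {}, 0
    for _ in range(n):
        e = draw(P_EXPERIENCE, rng)
        g = draw(P_GRADES, rng)
        a = draw(P_ADMISSION[g], rng)
        i = draw(P_INTERVIEW[(e, g)], rng)
        o = draw(P_OFFER[i], rng)
        if (a, e) != (admission, experience):
            continue  # sample contradicts the evidence: reject it
        counts[(o, i, g)] = counts.get((o, i, g), 0) + 1
        kept += 1
    return {key: c / kept for key, c in counts.items()}, kept


if __name__ == "__main__":
    exact = exact_conditional()
    estimate, kept = rejection_estimate(30000)
    print("samples kept:", kept)
    for key in sorted(exact):
        print(key, round(exact[key], 6), round(estimate.get(key, 0.0), 6))
```

Keys are ordered (Offer, Interview, Grades) to match the column order of the notebook's table. Enumeration is only feasible here because the full joint has just 48 rows. Rejection sampling discards every sample that disagrees with the evidence, so the number of useful samples shrinks as the evidence becomes less likely.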