├── ex 1.str
├── ex 2.str
├── ex 3.str
├── ex 4.str
├── ex 5.str
├── ex 6.str
├── ex 7.str
├── ex 8.str
├── ex 9.str
├── ex 10.str
├── PREDICTIVE MODELLING USING SPSS MODELER.pdf
└── README.md
# 1). COLLECT INITIAL DATA FOR THE TELECOM FIRM

AIM:
To execute a program that collects initial data for the telecom firm.

PROCEDURE:

IMPORT FROM MICROSOFT EXCEL:
STEP 1: From the Sources palette, place an Excel node on the stream canvas.
STEP 2: Edit the Excel node. Click the Data tab, if not already selected.
STEP 3: In the File type box, ensure that Excel 2007-2016 (*.xlsx) is selected.
STEP 4: In the Import file box, select telco x customer data.xlsx from the location where it is stored.
STEP 5: Ensure that the option First row has column names is enabled.
STEP 6: Click Preview.
STEP 7: Close the Preview output window.
STEP 8: Close the Excel dialog box.

IMPORT FROM A TAB-DELIMITED TEXT FILE:
STEP 1: From the Sources palette, add a Var. File node to the stream canvas.
STEP 2: Edit the Var. File node. Click the File tab, if not already selected.
STEP 3: In the File box, select telco x products.tab from the location where it is stored.
STEP 4: Ensure that the option Read field names from file is enabled.
STEP 5: In the Field delimiters section, click the Comma check box to disable it.
STEP 6: In the Field delimiters section, click the Tab check box to enable it.
STEP 7: Click Preview.
STEP 8: Close the Preview output window.
STEP 9: Close the Var. File dialog box.

IMPORT FROM IBM SPSS STATISTICS:
STEP 1: From the Sources palette, add a Statistics File node to the stream canvas.
STEP 2: Edit the Statistics File node. Click the Data tab, if not already selected.
STEP 3: In the File box, select telco x tariffs.sav from the location where it is stored.
STEP 4: Click the Use field format information to determine the storage check box to enable it.
STEP 5: Click Preview.
STEP 6: Close the Preview output window.
STEP 7: Close the Statistics File dialog box.

SET MEASUREMENT LEVELS:
STEP 1: From the Field Ops palette, add a Type node downstream from the Microsoft Excel node.
STEP 2: Edit the Type node.
STEP 3: Click Read Values.
STEP 4: Examine the results in the Values and Measurement columns.
STEP 5: Click the cell in the POSTAL_CODE row, Measurement column, and then click Categorical from the drop-down.
STEP 6: Click the cell in the REGION row, Measurement column, and then click Categorical from the drop-down.
STEP 7: Click Read Values.
STEP 8: Close the Type dialog box.
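The imports above are Modeler GUI steps, but the underlying idea — read a delimited file with field names in the first row, then set some fields to a categorical measurement level — can be sketched in pandas. This is illustrative only; the tiny dataset and the column names (CUSTOMER_ID, REGION, POSTAL_CODE) are hypothetical, not the exercise files.

```python
# A rough pandas analogue of the Modeler import steps (a sketch, not Modeler).
import io
import pandas as pd

# Tab-delimited text with field names in the first row, like the
# "Read field names from file" option on the Var. File node.
raw = "CUSTOMER_ID\tREGION\tPOSTAL_CODE\n1\tNorth\t12345\n2\tSouth\t67890\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Setting a measurement level to Categorical is roughly a dtype change:
df["REGION"] = df["REGION"].astype("category")
df["POSTAL_CODE"] = df["POSTAL_CODE"].astype("category")

print(df.dtypes)
```

The same `read_csv` call with `sep=","` would cover the comma-delimited case, which is why the Modeler dialog asks you to enable exactly one delimiter.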
OUTPUT:

RESULT:
Thus, the Collect initial data for the telecom firm program has been executed successfully.

# 2). UNDERSTAND TELECOMMUNICATIONS DATA

AIM:
To create a stream for collecting initial telecom firm data and understand the data properties using IBM SPSS Modeler.

ALGORITHM:
STEP 1: From the Sources palette, place an Excel node and import the input file telco x customer data.xlsx. Close the Preview output window and the Excel dialog box.
STEP 2: From the Field Ops palette, add a Type node and read the values in the Type node. Then, from the Output palette, add a Table node and run it. Close the Table output window.
STEP 3: From the Output palette, add a Data Audit node downstream from the Type node. Then run the Data Audit node. The minimum value for AGE is -1, which is clearly an invalid value.
STEP 4: Edit the Type node. Click the cell in the Values column, AGE row (where it reads [-1.0, 82.0]), and then click Specify. The AGE Values sub-dialog box opens. Click the Specify values and labels option, and then set Lower to 12 and Upper to 90. Close the AGE Values sub-dialog box.
STEP 5: Click the cell in the Check column, AGE row, and then click Warn from the drop-down.
STEP 6: Close the Type dialog box. You will rerun the Data Audit node to examine the effect of specifying a valid range. Run the Data Audit node. The minimum AGE value is still -1, so it seems as if nothing has changed. Close the Data Audit output window.
STEP 7: Edit the Type node, click the cell in the Check column in the END_DATE row, and then set the action to Discard. Close the Type dialog box.
STEP 8: Run the Table node that is downstream from the Type node. Scroll to the right so that you can view END_DATE, and then scroll down to verify that END_DATE is never $null$.
Only 14,698 records are retained. Close the Table output window.
STEP 9: Edit the Type node. Click the cell in the Missing column, AGE row, and then click Specify.
STEP 10: Click the Define blanks check box to enable it, and below Missing values, type -1. Close the AGE Values sub-dialog box, and close the Type dialog box. Run the Data Audit node. The minimum value for AGE is 12 instead of -1. There are no stream messages, so no out-of-range values were found.

OUTPUT:

RESULT:
Thus, the program for collecting and understanding the telecommunications data has been executed successfully.

# 3). SET THE UNIT OF ANALYSIS FOR THE TELECOMMUNICATIONS DATA

AIM:
To remove duplicate records in the customer dataset and transform a transactional dataset into a dataset that has one record per customer using IBM SPSS Modeler.

PROCEDURE TO IMPLEMENTATION:

1. TO REMOVE DUPLICATE RECORDS
STEP 1: Import the data file telco x customer data.xlsx using the Excel source node. Then add a Distinct node from the Record Ops palette.
STEP 2: Edit the Distinct node. On the Settings tab, click the Mode drop-down to view the options. From the Mode drop-down, click Include only the first record in each group. [The Include only the first record in each group option retains only the first record of each group. You will need this option to remove duplicate records.]
STEP 3: Click the Pick from the set of available fields button, click All, and then click OK. Connect a Table node and execute it. Check the results. Starting records: 31,781; current result records: 31,769.

2. AGGREGATE TRANSACTIONAL DATA
STEP 1: Import the data file telco x products.dat, choosing Tab and newline as the field delimiters, and add a Table node to it. Then run this Table node. Close the Table output window.
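The Distinct and Aggregate steps map naturally onto `drop_duplicates` and `groupby` in pandas. The sketch below uses a made-up transactional table (the values are hypothetical); it mirrors removing exact duplicate records and then computing the per-customer revenue sum plus a record count, like the NUMBER_OF_PRODUCTS field in this exercise.

```python
import pandas as pd

# Hypothetical transactional data: one row per product purchase.
tx = pd.DataFrame({
    "CUSTOMER_ID": [1, 1, 2, 2, 2],
    "REVENUES":    [10.0, 10.0, 7.0, 3.0, 2.0],
})

# Distinct node, "Include only the first record in each group",
# with all fields as the key: exact duplicate rows are dropped.
dedup = tx.drop_duplicates(keep="first")

# Aggregate node: sum of REVENUES per customer plus a record count.
agg = tx.groupby("CUSTOMER_ID").agg(
    REVENUES_SUM=("REVENUES", "sum"),
    NUMBER_OF_PRODUCTS=("REVENUES", "size"),
).reset_index()
print(agg)
```

Note the two operations answer different questions: deduplication cleans the record level, while aggregation changes the unit of analysis from transaction to customer.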
STEP 2: From the Record Ops palette, add an Aggregate node downstream from the source node telco x products.dat.
STEP 3: Edit the Aggregate node. Click the Settings tab. In the Key fields box, select CUSTOMER_ID.
STEP 4: In the Basic Aggregates section, in the Aggregate fields sub-section, select REVENUES. Two statistics will be computed by default, mean and sum. Only the sum is required in this exercise, so click the check box in the Mean column so that it is disabled.
STEP 5: Ensure that the Include record count in field check box is enabled and then type NUMBER_OF_PRODUCTS in the text box.
STEP 6: Click the Optimization tab. Click the Keys are contiguous check box to enable it.

3. CREATE FLAG FIELDS AND AGGREGATE THE DATA
STEP 1: From the Field Ops palette, add a Type node downstream from the Var. File node named telco x products.
STEP 2: Edit the Type node. Click the Types tab and then click Read Values.
STEP 3: From the Field Ops palette, add a SetToFlag node downstream from the Type node.
STEP 4: Edit the SetToFlag node. On the Settings tab, click the Set fields drop-down and then click gadget.
STEP 5: Select all values in the Available set values box and then move them into the Create flag fields box.
STEP 6: Click the Aggregate keys check box to enable it. In the Aggregate keys box, select CUSTOMER_ID. Click Preview.

OUTPUT [AGGREGATE]:
OUTPUT [SET TO FLAG]:

RESULT:
Thus, the program to set the unit of analysis for the data (remove duplicates, aggregate, and create flag fields) has been executed successfully.

# 4). IDENTIFY RELATIONSHIPS IN THE TELECOMMUNICATIONS DATA

AIM:
To identify relationships between the following:
a) Examine the relationship between categorical fields.
b) Examine the relationship between a categorical and a continuous field.
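Part (a) boils down to a row-percentage crosstab of HANDSET by CHURN. As a rough pandas sketch (illustrative only; the handset labels and records here are made up, not the exercise data):

```python
import pandas as pd

# Hypothetical customer records with a handset and a churn flag.
df = pd.DataFrame({
    "HANDSET": ["A", "A", "A", "B", "B", "B"],
    "CHURN":   ["No", "No", "Yes", "Yes", "Yes", "No"],
})

# Matrix node with "Percentage of row" enabled:
# each row shows the churn rate for one handset.
pct = pd.crosstab(df["HANDSET"], df["CHURN"], normalize="index") * 100
print(pct)
```

A large gap between row percentages (like the 4.627% vs 94.856% churn rates reported in step 3) is exactly what signals a relationship between the two categorical fields.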
PROCEDURE TO IMPLEMENTATION:

a) Examine the relationship between categorical fields
STEP 1: Import the file telco x data.txt. From the Output palette, add a Matrix node downstream from the Type node.
STEP 2: Edit the Matrix node. In the Rows box, select HANDSET. In the Columns box, select CHURN. Click the Include missing values check box to disable it.
STEP 3: On the Appearance tab, click the Percentage of row check box to enable it, and also click the Include row and column totals check box to enable it. Click Run. The churn rate for customers with handset ASAD170 is 4.627%, whereas it is 94.856% for those with handset ASAD90. Close the Matrix output window.
STEP 4: From the Graphs palette, add a Distribution node downstream from the Type node.
STEP 5: Edit the Distribution node. In the Field box, select HANDSET. In the Color box, select CHURN. Click the Normalize by color check box to enable it. Click Run.

b) Examine the relationship between a categorical and a continuous field
STEP 1: From the Output palette, add a Means node downstream from the Type node.
STEP 2: Edit the Means node. In the Grouping field box, select CHURN. Similarly, in the Test field(s) box, select DROPPED_CALLS. Click Run. Close the Means output window.
STEP 3: From the Graphs palette, add a Histogram node downstream from the Type node.
STEP 4: Edit the Histogram node. In the Field box, select DROPPED_CALLS. In the Color box, select CHURN. Click Run.

OUTPUT:

RESULT:
Thus, the program to identify relationships in the telecommunications data has been executed successfully.

# 5). PREDICT CUSTOMER CHURN IN THE TELECOM DATASET

AIM:
To write a program to predict customer churn in the telecom dataset.
a) Build a model using CHAID
b) Examine the CHAID model
c) Apply the model to new data

PROCEDURE TO IMPLEMENTATION:

IMPORT DATASET:
STEP 1: Import the Excel dataset telco x modeling data.
STEP 2: Insert a Select node that keeps only the valid records. You can insert a Table node and check the output.
STEP 3: From the Field Ops palette, add a Type node downstream from the Select node.
STEP 4: Edit the Type node.
STEP 5: Click the Types tab, if not already selected. Click the Read Values button.
STEP 6: Click the cell in the CHURN row, Role column, and then click Target from the drop-down.
STEP 7: Click the cell in the RETENTION row, Role column, and then click None from the drop-down.
STEP 8: Click the cell in the DATA_KNOWN row, Role column, and then click None from the drop-down.
STEP 9: Close the Type dialog box.

BUILD MODEL:
STEP 1: Click the Modeling tab. Add the CHAID node, located at the far right in the palette, downstream from the Type node.
STEP 2: Run the CHAID node (right-click it and then click Run).

EXAMINE THE MODEL:
STEP 1: Edit the CHAID model nugget (the yellow diamond).
STEP 2: Click the Model tab, if not already selected.
STEP 3: Click the Viewer tab. Navigate to the root of the tree.
STEP 4: Click Preview.
STEP 5: Scroll all the way to the right in the Table output window.
STEP 6: Close the CHAID model nugget; you will return to the stream.
STEP 7: You can also add an Analysis node from the Output palette in order to check accuracy.
STEP 8: Run the Analysis node.

OUTPUT:
Build a model using CHAID:
Examine the CHAID model:
Apply the model to new data:

RESULT:
Thus, the program to predict customer churn in the telecom data has been executed successfully.

# 6). CREATE HOMOGENEOUS GROUPS (CLUSTERS) OF CUSTOMERS BASED ON USAGE PATTERNS

AIM:
To create homogeneous groups of customers using a segmentation model in IBM SPSS Modeler.

PROCEDURE TO IMPLEMENTATION:
STEP 1: Insert a Type node after importing telco x modeling data.csv.
STEP 2: View the Type node. BILL_PEAK and BILL_OFFPEAK have the role Input, so the clusters will be based on these two fields. Records with similar values for BILL_PEAK and BILL_OFFPEAK will be put into the same cluster.
STEP 3: Click the Modeling palette, if not already selected. Click the Segmentation sub-palette at the left side.
STEP 4: Add a TwoStep node downstream from the Type node in the lower stream.
STEP 5: Run the TwoStep node.
STEP 6: Edit the TwoStep model nugget that was generated.

OUTPUT:

RESULT:
Thus, the program for creating homogeneous groups of customers using a segmentation model has been executed successfully.

# 7). USING FUNCTIONS IN IBM SPSS MODELER

AIM:
To derive new fields using date functions and string functions to cleanse and enrich a dataset that stores demographic and churn data on the company's customers in IBM SPSS Modeler.

ALGORITHM:

Use date functions to derive fields:
STEP 1: From the Sources palette, double-click the Var. File node to add it to the stream canvas.
STEP 2: Double-click the Var. File node to edit it.
STEP 3: To the right of the File field, click the Browse for file button, navigate to the relevant folder, click the telco x subset.csv file, and then click Open to import the data. Do not close the Var. File dialog box.
STEP 4: In the Var. File dialog box, click Preview, and then scroll to the last fields in the Preview output window.
STEP 5: Click OK to close the Preview output window and click OK to close the Var. File dialog box.
STEP 6: From the Field Ops palette, double-click the Derive node to add it downstream from the Var. File node.
STEP 7: Note: Placing node B downstream from node A means that the data flows from A to B.
STEP 8: Edit the Derive node. Under Derive field, type MONTHS_CUSTOMER.
STEP 9: Under Formula, enter date_months_difference(CONNECT_DATE, END_DATE).
STEP 10: Note: Type the expression or use the Expression Builder to construct the expression. In this course, "enter" refers to typing or using the Expression Builder, according to your preference. Here, when you use the Expression Builder, look for the date_months_difference function in the Date and Time function group.
STEP 11: Click Preview, and then scroll to the last fields in the Preview output window. The new field stores the number of months that elapsed between the two dates, as a real number. If you want the result as an integer, use a function such as round, intof, or to_integer.
STEP 12: Close the Preview output window. Close the Derive dialog box.
STEP 13: From the Field Ops palette, add a Derive node downstream from the Derive node named MONTHS_CUSTOMER.
STEP 14: Edit the Derive node, and then set the Mode to Multiple.
STEP 15: The Derive dialog box reflects the change: the Derive field box is replaced by a Derive from box where the source fields are selected.
STEP 16: Under Derive from, click the Pick from the set of available fields button, Ctrl+click CONNECT_DATE and END_DATE, and then click OK.
STEP 17: Beside Field name extension, replace the current extension with _MONTH.
STEP 18: Under Formula, enter datetime_month_name(datetime_month(@FIELD)). If you use the Expression Builder, locate the datetime_month and datetime_month_name functions in the Date and Time function group. Locate the @FIELD function in the @ Functions function group.
STEP 19: Click Preview, and then scroll to the last fields in the Preview output window.
STEP 20: Close the Preview output window. Close the Derive dialog box.

Use string functions to derive fields:
STEP 1: From the Output palette, add a Table node downstream from the Derive node named _MONTH.
STEP 2: Right-click the Table node and then click Run. Close the Table output window.
STEP 3: From the Field Ops palette, add a Derive node downstream from the Derive node named _MONTH (make room by repositioning the Table node, if preferred).
STEP 4: Edit the Derive node. Under Derive field, type E-MAIL ADDRESS OK.
STEP 5: Beside Derive as, click Flag from the list.
STEP 6: Under True when, enter count_substring('E-MAIL ADDRESS', "@") = 1. If you use the Expression Builder, locate the count_substring function in the String function group.
STEP 7: Click Preview, and then move E-MAIL ADDRESS OK next to E-MAIL ADDRESS in the Preview output window. (Note: move E-MAIL ADDRESS OK by dragging it to the left, until it is just to the right of E-MAIL ADDRESS.)
STEP 8: Close the Preview output window. Close the Derive dialog box.
STEP 9: From the Field Ops palette, add a Derive node downstream from the Derive node named E-MAIL ADDRESS OK.
STEP 10: Edit the Derive node. Under Derive field, type NO E-MAIL ADDRESS.
STEP 11: Beside Derive as, click Flag from the list.
STEP 12: Under True when, enter length('E-MAIL ADDRESS') = 0. If you use the Expression Builder, locate the length function in the String function group.
STEP 13: Click Preview, and then move NO E-MAIL ADDRESS next to E-MAIL ADDRESS.
STEP 14: Close the Preview output window.
STEP 15: Under True when, enter length(trim('E-MAIL ADDRESS')) = 0. If you use the Expression Builder, locate the length and trim functions in the String function group. Click Preview.
STEP 16: Close the Preview output window. Then close the Derive dialog box.
STEP 17: From the Field Ops palette, add a Derive node downstream from the Derive node named NO E-MAIL ADDRESS.
STEP 18: Edit the Derive node. Under Derive field, type POSITION PERIOD.
STEP 19: Under Formula, enter locchar_back(`.`, length('E-MAIL ADDRESS'), 'E-MAIL ADDRESS'). If you use the Expression Builder, locate the locchar_back and length functions in the String function group.
STEP 20: Click Preview, and then move POSITION PERIOD next to E-MAIL ADDRESS.
STEP 21: Close the Preview output window. Close the Derive dialog box.
STEP 22: From the Field Ops palette, add a Derive node downstream from the Derive node named POSITION PERIOD.
STEP 23: Edit the Derive node. Under Derive field, type DOMAIN NAME.
STEP 24: Beside Derive as, click Conditional from the list.
STEP 25: Under If, enter 'POSITION PERIOD' > 0.
STEP 26: Under Then, enter substring_between('POSITION PERIOD' + 1, length('E-MAIL ADDRESS'), 'E-MAIL ADDRESS'). If you use the Expression Builder, locate the substring_between and length functions in the String function group.
STEP 27: Under Else, type undef. Click Preview, and then scroll to the last fields in the Preview output window.
STEP 28: Close the Preview output window. Then close the Derive dialog box.

OUTPUT:
Use date functions to derive fields:
Use string functions to derive fields:

RESULT:
Thus, the program for deriving fields using date and string functions has been executed successfully.

# 8). ADD FIELDS TO THE TELECOMMUNICATIONS DATA

AIM:
To write a program to add fields to the telecommunications data.
a) Derive fields as formula
b) Derive fields as flag or nominal

ALGORITHM:

Derive fields as formula:
STEP 1: Import the dataset telco x data.txt.
STEP 2: From the Field Ops palette, add a Derive node downstream from the Type node.
STEP 3: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 4: In the Derive field box, type BILL_PEAK.
STEP 5: Click the Derive as drop-down. From the Derive as drop-down, click Formula, if not already selected.
STEP 6: Click the Field type drop-down. Click the Launch expression builder button.
STEP 7: In the Formula box, enter PEAK_MINS * PEAK_RATE/100, by typing it or by pasting the field names from the list of fields, whichever you feel comfortable with.
STEP 8: Click the Check button. Click OK to close the Expression Builder.
STEP 9: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_PEAK.
STEP 10: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 11: Derive field box: BILL_OFFPEAK
STEP 12: Derive as: Formula (the default)
STEP 13: Field type: leave the default; the field will then be auto-typed as Continuous
STEP 14: Expression: OFFPEAK_MINS * OFFPEAK_RATE/100 (if preferred, use the Expression Builder). Close the Derive dialog box.
STEP 15: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_OFFPEAK.
STEP 16: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 17: Derive field box: BILL_TOTAL
STEP 18: Derive as: Formula (the default)
STEP 19: Field type: leave the default; the field will then be auto-typed as Continuous
STEP 20: Expression: BILL_PEAK + BILL_OFFPEAK (if preferred, use the Expression Builder)
STEP 21: Click Preview. Then close the Preview output window. Close the Derive dialog box.
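The three Derive-as-Formula steps above amount to simple column arithmetic. A pandas sketch (the minute and rate values below are hypothetical, chosen only to show the calculation):

```python
import pandas as pd

df = pd.DataFrame({
    "PEAK_MINS":    [100, 0],
    "PEAK_RATE":    [50, 50],
    "OFFPEAK_MINS": [200, 0],
    "OFFPEAK_RATE": [20, 20],
})

# Each Derive node adds one computed column; later formulas can
# reference earlier derived fields, just as BILL_TOTAL does here.
df["BILL_PEAK"] = df["PEAK_MINS"] * df["PEAK_RATE"] / 100
df["BILL_OFFPEAK"] = df["OFFPEAK_MINS"] * df["OFFPEAK_RATE"] / 100
df["BILL_TOTAL"] = df["BILL_PEAK"] + df["BILL_OFFPEAK"]
print(df[["BILL_PEAK", "BILL_OFFPEAK", "BILL_TOTAL"]])
```

The division by 100 converts a rate stored in hundredths into currency units, which is why it appears in both formulas.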
Derive fields as flag or nominal:
STEP 1: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_TOTAL.
STEP 2: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 3: Derive field box: BILL_GT_0
STEP 4: Derive as: Flag
STEP 5: Field type: Flag (should be set automatically to Flag when you choose Derive as: Flag)
STEP 6: True value: T (the default) and False value: F (the default)
STEP 7: True when box: BILL_TOTAL > 0. Close the Derive dialog box.
STEP 8: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_GT_0.
STEP 9: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 10: Derive field box: SEGMENT
STEP 11: Derive as: Nominal and Field type: Ordinal
STEP 12: Click the cell in the Set field to column and then type 1.
STEP 13: Click the cell in the If this condition is true column and then type BILL_TOTAL <= 100.
STEP 14: Repeat the previous two steps for the following values and expressions:
STEP 15: Set field to: 2; If this condition is true: BILL_TOTAL <= 200
STEP 16: Set field to: 3; If this condition is true: BILL_TOTAL > 200
STEP 17: In the Default value box, type undef. Then close the Derive dialog box.

OUTPUT:
Derive fields as formula:
Derive fields as flag or nominal:

RESULT:
Thus, the program to add fields to the telecommunications data has been executed successfully.

# 9). CREATE A LINEAR REGRESSION MODEL TO PREDICT EMPLOYEE SALARIES

AIM:
To create a linear regression model to predict employee salaries.

ALGORITHM:

Import and examine the data:
STEP 1: From the Sources palette, add a Var. File node to a blank stream canvas, edit the node, point to employee_data.txt, and then close the Var. File dialog box.
STEP 2: From the Output palette, add a Table node downstream from the Var. File node, run it, and then examine the output. The dataset comprises 474 employees. Close the Table output window.
STEP 3: From the Output palette, add a Data Audit node downstream from the Var. File node, run it, and then examine the output.

Set measurement levels and roles:
STEP 1: From the Field Ops palette, add a Type node downstream from the Var. File node.
STEP 2: Edit the Type node. Click Read Values.
STEP 3: Set the Measurement for educational_level to Ordinal.
STEP 4: Ensure the Role from gender to months_previous_experience is set to Input.
STEP 5: Set the Role for current_salary to Target.

Create the linear regression model:
STEP 1: From the Modeling palette, add a Linear node downstream from the Type node.
STEP 2: Edit the Linear node. Click the Build Options tab.
STEP 3: Click the Basics item and clear the Automatically prepare data check box.
STEP 4: Click the Model Selection item and set the model selection method to Include all predictors.
STEP 5: Click Run.
STEP 6: Edit the generated model nugget, and then click the Model Summary item in the pane on the left.
STEP 7: Click the Predictor Importance item in the pane on the left.
STEP 8: The job_category field is by far the most important predictor. Gender is the second most important field. Region and age are least important.
STEP 9: Click the Predicted by Observed item in the pane on the left.
STEP 10: The points are not scattered around the diagonal, and the predicted values seem to break up into two categories.
STEP 11: Click the Coefficients by Observed item in the pane on the left, and then, from the Style list, select Table.
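For comparison, a minimal scikit-learn sketch of fitting a linear regression to salary-like data. This is not the Modeler Linear node and not the employee dataset: the data below is synthetic, generated so the true relationship (salary ≈ 30000 + 1500 × experience) is known and can be recovered.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic predictor and target with a known linear relationship plus noise.
experience = rng.uniform(0, 20, 100)                       # e.g. years of experience
salary = 30000 + 1500 * experience + rng.normal(0, 1000, 100)

# Fit with all predictors included (here there is only one).
model = LinearRegression().fit(experience.reshape(-1, 1), salary)
print(model.coef_[0], model.intercept_)
```

The fitted coefficient should land near 1500 and the intercept near 30000; the Predicted-by-Observed plot in the exercise is the graphical version of checking this fit quality.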
OUTPUT:

RESULT:
Thus, the program to create a linear regression model to predict employee salaries has been executed successfully.

# 10). USE LOGISTIC REGRESSION TO PREDICT RESPONSE TO A CHARITY PROMOTION CAMPAIGN

AIM:
To use logistic regression to predict the response to a charity promotion campaign.

ALGORITHM:

Import and examine the data:
STEP 1: From the Sources palette, double-click the Var. File node to add it to the stream. Import the dataset charity.csv.
STEP 2: From the Output palette, add a Data Audit node downstream from the Var. File node, and run the Data Audit node.
STEP 3: Double-click the Sample Graph for the response to campaign field.

Partition the data and set the roles:
STEP 1: From the Field Ops palette, add a Partition node downstream from the Var. File node.
STEP 2: Set the Training partition size to 70% and the Testing partition size to 30%. Ensure that the Repeatable partition assignment option is enabled, with seed value 1234567.
STEP 3: From the Field Ops palette, add a Type node downstream from the Partition node.
STEP 4: Edit the Type node, and then click the Read Values button. The Values column is populated with values from the data.
STEP 5: Set the Role for gender, age, mosaic bands, pre-campaign expenditure, and pre-campaign visits to Input.
STEP 6: Set the Role for response to campaign to Target.
STEP 7: Ensure that the Role for the Partition field is set to Partition.
STEP 8: Set the Role for all other fields to None.

Create the logistic regression models:
STEP 1: From the Modeling palette, add a Logistic node downstream from the Type node.
STEP 2: Edit the Logistic node and click the Model tab.
STEP 3: For Procedure, select the Binomial option.
STEP 4: Close the Logistic dialog box.
STEP 5: Add a second Logistic node downstream from the Type node.
STEP 6: Edit the second Logistic node, and then click the Model tab.
STEP 7: For Procedure, select the Binomial option.
STEP 8: Below Categorical inputs, select mosaic bands, and for Base Category, select First.
STEP 9: Click the Annotations tab, select the Custom option, type custom, and close the Logistic dialog box.
STEP 10: Select the two Logistic nodes, right-click one of them, and click Run Selection.
STEP 11: Edit the Logistic model nugget named response to campaign, click the Advanced tab, and scroll down to the Variables in the Equation table (the last table in the output).
STEP 12: Close the Logistic output window.
STEP 13: Edit the Logistic model nugget named custom, click the Advanced tab, and scroll to the Categorical Variables Codings table.
STEP 14: You can add an Analysis node at the end and check accuracy levels.

OUTPUT:

RESULT:
Thus, the program to use logistic regression to predict the response to a charity promotion campaign has been executed successfully.
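The partition-then-model flow of this last exercise can be sketched with scikit-learn. The 70/30 split and the seed 1234567 mirror the Partition node settings; everything else (the two predictors, the response rule, the sample size) is synthetic and hypothetical, not the charity.csv data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1234567)

# Two hypothetical predictors, e.g. pre-campaign expenditure and visits.
X = rng.uniform(0, 1, size=(300, 2))
# Response is more likely when the first predictor is high (plus noise).
y = (X[:, 0] + rng.normal(0, 0.3, 300) > 0.5).astype(int)

# 70% training / 30% testing with a fixed seed, like the Partition node
# with "Repeatable partition assignment" enabled.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1234567)

clf = LogisticRegression().fit(X_train, y_train)
print(clf.score(X_test, y_test))   # holdout accuracy, like the Analysis node
```

Scoring on the held-out 30% rather than the training data is the point of the partition: it estimates how the model will perform on customers it has never seen.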