├── ex 1.str
├── ex 2.str
├── ex 3.str
├── ex 4.str
├── ex 5.str
├── ex 6.str
├── ex 7.str
├── ex 8.str
├── ex 9.str
├── ex 10.str
├── PREDICTIVE MODELLING USING SPSS MODELER.pdf
└── README.md
# 1). COLLECT INITIAL DATA FOR THE TELECOM FIRM

AIM:
To execute a program that collects initial data for the telecom firm.

PROCEDURE:

IMPORT FROM MICROSOFT EXCEL:
STEP 1: From the Sources palette, place an Excel node on the stream canvas.
STEP 2: Edit the Excel node. Click the Data tab, if not already selected.
STEP 3: In the File type box, ensure that Excel 2007-2016 (*.xlsx) is selected.
STEP 4: In the Import file box, select telco x customer data.xlsx from the location where it is stored.
STEP 5: Ensure that the option First row has column names is enabled.
STEP 6: Click Preview.
STEP 7: Close the Preview output window.
STEP 8: Close the Excel dialog box.

IMPORT FROM A TAB-DELIMITED TEXT FILE:
STEP 1: From the Sources palette, add a Var. File node to the stream canvas.
STEP 2: Edit the Var. File node. Click the File tab, if not already selected.
STEP 3: In the File box, select telco x products.tab from the location where it is stored.
STEP 4: Ensure that the option Read field names from file is enabled.
STEP 5: In the Field delimiters section, click the Comma check box to disable it.
STEP 6: In the Field delimiters section, click the Tab check box to enable it.
STEP 7: Click Preview.
STEP 8: Close the Preview output window.
STEP 9: Close the Var. File dialog box.

IMPORT FROM IBM SPSS STATISTICS:
STEP 1: From the Sources palette, add a Statistics File node to the stream canvas.
STEP 2: Edit the Statistics File node. Click the Data tab, if not already selected.
STEP 3: In the File box, select telco x tariffs.sav from the location where it is stored.
STEP 4: Click the Use field format information to determine the storage check box to enable it.
STEP 5: Click Preview.
STEP 6: Close the Preview output window.
STEP 7: Close the Statistics File dialog box.

SET MEASUREMENT LEVELS:
STEP 1: From the Field Ops palette, add a Type node downstream from the Microsoft Excel node.
STEP 2: Edit the Type node.
STEP 3: Click Read Values.
STEP 4: Examine the results in the Values and Measurement columns.
STEP 5: Click the cell in the POSTAL_CODE row, Measurement column, and then click Categorical from the drop-down.
STEP 6: Click the cell in the REGION row, Measurement column, and then click Categorical from the drop-down.
STEP 7: Click Read Values.
STEP 8: Close the Type dialog box.
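The imports above are Modeler GUI steps, but the underlying idea — read a delimited file with field names in the first row, then set some fields to a categorical measurement level — can be sketched in pandas. This is illustrative only; the tiny dataset and the column names (CUSTOMER_ID, REGION, POSTAL_CODE) are hypothetical, not the exercise files.

```python
# A rough pandas analogue of the Modeler import steps (a sketch, not Modeler).
import io
import pandas as pd

# Tab-delimited text with field names in the first row, like the
# "Read field names from file" option on the Var. File node.
raw = "CUSTOMER_ID\tREGION\tPOSTAL_CODE\n1\tNorth\t12345\n2\tSouth\t67890\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Setting a measurement level to Categorical is roughly a dtype change:
df["REGION"] = df["REGION"].astype("category")
df["POSTAL_CODE"] = df["POSTAL_CODE"].astype("category")

print(df.dtypes)
```

The same `read_csv` call with `sep=","` would cover the comma-delimited case, which is why the Modeler dialog asks you to enable exactly one delimiter.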
OUTPUT:

RESULT:
Thus, the Collect initial data for the telecom firm program has been executed successfully.

# 2). UNDERSTAND TELECOMMUNICATIONS DATA

AIM:
To create a stream for collecting initial telecom firm data and understand the data properties using IBM SPSS Modeler.

ALGORITHM:
STEP 1: From the Sources palette, place an Excel node and import the input file telco x customer data.xlsx. Close the Preview output window and the Excel dialog box.
STEP 2: From the Field Ops palette, add a Type node and read the values in the Type node. Then, from the Output palette, add a Table node and run it. Close the Table output window.
STEP 3: From the Output palette, add a Data Audit node downstream from the Type node. Then run the Data Audit node. The minimum value for AGE is -1, which is clearly an invalid value.
STEP 4: Edit the Type node. Click the cell in the Values column, AGE row (where it reads [-1.0, 82.0]), and then click Specify. The AGE Values sub-dialog box opens. Click the Specify values and labels option, and then set Lower to 12 and Upper to 90. Close the AGE Values sub-dialog box.
STEP 5: Click the cell in the Check column, AGE row, and then click Warn from the drop-down.
STEP 6: Close the Type dialog box. You will rerun the Data Audit node to examine the effect of specifying a valid range. Run the Data Audit node. The minimum AGE value is still -1, so it seems as if nothing has changed. Close the Data Audit output window.
STEP 7: Edit the Type node, click the cell in the Check column in the END_DATE row, and then set the action to Discard. Close the Type dialog box.
STEP 8: Run the Table node that is downstream from the Type node. Scroll to the right so that you can view END_DATE, and then scroll down to verify that END_DATE is never $null$.
Only 14,698 records are retained. Close the Table output window.
STEP 9: Edit the Type node. Click the cell in the Missing column, AGE row, and then click Specify.
STEP 10: Click the Define blanks check box to enable it, and below Missing values, type -1. Close the AGE Values sub-dialog box, and close the Type dialog box. Run the Data Audit node. The minimum value for AGE is 12 instead of -1. There are no stream messages, so no out-of-range values were found.

OUTPUT:

RESULT:
Thus, the program for collecting and understanding the telecommunications data has been executed successfully.

# 3). SET THE UNIT OF ANALYSIS FOR THE TELECOMMUNICATIONS DATA

AIM:
To remove duplicate records in the customer dataset and transform a transactional dataset into a dataset that has one record per customer using IBM SPSS Modeler.

PROCEDURE TO IMPLEMENTATION:

1. TO REMOVE DUPLICATE RECORDS
STEP 1: Import the data file telco x customer data.xlsx using the Excel source node. Then add a Distinct node from the Record Ops palette.
STEP 2: Edit the Distinct node. On the Settings tab, click the Mode drop-down to view the options. From the Mode drop-down, click Include only the first record in each group. [The Include only the first record in each group option retains only the first record of each group. You will need this option to remove duplicate records.]
STEP 3: Click the Pick from the set of available fields button, click All, and then click OK. Connect a Table node and execute it. Check the results. Starting records: 31,781; current result records: 31,769.

2. AGGREGATE TRANSACTIONAL DATA
STEP 1: Import the data file telco x products.dat, choosing Tab and newline as the field delimiters, and add a Table node to it. Then run this Table node. Close the Table output window.
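The Distinct and Aggregate steps map naturally onto `drop_duplicates` and `groupby` in pandas. The sketch below uses a made-up transactional table (the values are hypothetical); it mirrors removing exact duplicate records and then computing the per-customer revenue sum plus a record count, like the NUMBER_OF_PRODUCTS field in this exercise.

```python
import pandas as pd

# Hypothetical transactional data: one row per product purchase.
tx = pd.DataFrame({
    "CUSTOMER_ID": [1, 1, 2, 2, 2],
    "REVENUES":    [10.0, 10.0, 7.0, 3.0, 2.0],
})

# Distinct node, "Include only the first record in each group",
# with all fields as the key: exact duplicate rows are dropped.
dedup = tx.drop_duplicates(keep="first")

# Aggregate node: sum of REVENUES per customer plus a record count.
agg = tx.groupby("CUSTOMER_ID").agg(
    REVENUES_SUM=("REVENUES", "sum"),
    NUMBER_OF_PRODUCTS=("REVENUES", "size"),
).reset_index()
print(agg)
```

Note the two operations answer different questions: deduplication cleans the record level, while aggregation changes the unit of analysis from transaction to customer.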
STEP 2: From the Record Ops palette, add an Aggregate node downstream from the source node telco x products.dat.
STEP 3: Edit the Aggregate node. Click the Settings tab. In the Key fields box, select CUSTOMER_ID.
STEP 4: In the Basic Aggregates section, in the Aggregate fields sub-section, select REVENUES. Two statistics will be computed by default, mean and sum. Only the sum is required in this exercise, so click the check box in the Mean column so that it is disabled.
STEP 5: Ensure that the Include record count in field check box is enabled and then type NUMBER_OF_PRODUCTS in the text box.
STEP 6: Click the Optimization tab. Click the Keys are contiguous check box to enable it.

3. CREATE FLAG FIELDS AND AGGREGATE THE DATA
STEP 1: From the Field Ops palette, add a Type node downstream from the Var. File node named telco x products.
STEP 2: Edit the Type node. Click the Types tab and then click Read Values.
STEP 3: From the Field Ops palette, add a SetToFlag node downstream from the Type node.
STEP 4: Edit the SetToFlag node. On the Settings tab, click the Set fields drop-down and then click gadget.
STEP 5: Select all values in the Available set values box and then move them into the Create flag fields box.
STEP 6: Click the Aggregate keys check box to enable it. In the Aggregate keys box, select CUSTOMER_ID. Click Preview.

OUTPUT [AGGREGATE]:
OUTPUT [SET TO FLAG]:

RESULT:
Thus, the program to set the unit of analysis for the data (remove duplicates, aggregate, and create flag fields) has been executed successfully.

# 4). IDENTIFY RELATIONSHIPS IN THE TELECOMMUNICATIONS DATA

AIM:
To identify relationships between the following:
a) Examine the relationship between categorical fields.
b) Examine the relationship between a categorical and a continuous field.
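Part (a) boils down to a row-percentage crosstab of HANDSET by CHURN. As a rough pandas sketch (illustrative only; the handset labels and records here are made up, not the exercise data):

```python
import pandas as pd

# Hypothetical customer records with a handset and a churn flag.
df = pd.DataFrame({
    "HANDSET": ["A", "A", "A", "B", "B", "B"],
    "CHURN":   ["No", "No", "Yes", "Yes", "Yes", "No"],
})

# Matrix node with "Percentage of row" enabled:
# each row shows the churn rate for one handset.
pct = pd.crosstab(df["HANDSET"], df["CHURN"], normalize="index") * 100
print(pct)
```

A large gap between row percentages (like the 4.627% vs 94.856% churn rates reported in step 3) is exactly what signals a relationship between the two categorical fields.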
PROCEDURE TO IMPLEMENTATION:

a) Examine the relationship between categorical fields
STEP 1: Import the file telco x data.txt. From the Output palette, add a Matrix node downstream from the Type node.
STEP 2: Edit the Matrix node. In the Rows box, select HANDSET. In the Columns box, select CHURN. Click the Include missing values check box to disable it.
STEP 3: On the Appearance tab, click the Percentage of row check box to enable it, and also click the Include row and column totals check box to enable it. Click Run. The churn rate for customers with handset ASAD170 is 4.627%, whereas it is 94.856% for those with handset ASAD90. Close the Matrix output window.
STEP 4: From the Graphs palette, add a Distribution node downstream from the Type node.
STEP 5: Edit the Distribution node. In the Field box, select HANDSET. In the Color box, select CHURN. Click the Normalize by color check box to enable it. Click Run.

b) Examine the relationship between a categorical and a continuous field
STEP 1: From the Output palette, add a Means node downstream from the Type node.
STEP 2: Edit the Means node. In the Grouping field box, select CHURN. Similarly, in the Test field(s) box, select DROPPED_CALLS. Click Run. Close the Means output window.
STEP 3: From the Graphs palette, add a Histogram node downstream from the Type node.
STEP 4: Edit the Histogram node. In the Field box, select DROPPED_CALLS. In the Color box, select CHURN. Click Run.

OUTPUT:

RESULT:
Thus, the program to identify relationships in the telecommunications data has been executed successfully.

# 5). PREDICT CUSTOMER CHURN IN THE TELECOM DATASET

AIM:
To write a program to predict customer churn in the telecom dataset.
a) Build a model using CHAID
b) Examine the CHAID model
c) Apply the model to new data

PROCEDURE TO IMPLEMENTATION:

IMPORT DATASET:
STEP 1: Import the Excel dataset telco x modeling data.
STEP 2: Insert a Select node that keeps only the valid records. You can insert a Table node and check the output.
STEP 3: From the Field Ops palette, add a Type node downstream from the Select node.
STEP 4: Edit the Type node.
STEP 5: Click the Types tab, if not already selected. Click the Read Values button.
STEP 6: Click the cell in the CHURN row, Role column, and then click Target from the drop-down.
STEP 7: Click the cell in the RETENTION row, Role column, and then click None from the drop-down.
STEP 8: Click the cell in the DATA_KNOWN row, Role column, and then click None from the drop-down.
STEP 9: Close the Type dialog box.

BUILD MODEL:
STEP 1: Click the Modeling tab. Add the CHAID node, located at the far right in the palette, downstream from the Type node.
STEP 2: Run the CHAID node (right-click it and then click Run).

EXAMINE THE MODEL:
STEP 1: Edit the CHAID model nugget (the yellow diamond).
STEP 2: Click the Model tab, if not already selected.
STEP 3: Click the Viewer tab. Navigate to the root of the tree.
STEP 4: Click Preview.
STEP 5: Scroll all the way to the right in the Table output window.
STEP 6: Close the CHAID model nugget; you will return to the stream.
STEP 7: You can also add an Analysis node from the Output palette in order to check accuracy.
STEP 8: Run the Analysis node.

OUTPUT:
Build a model using CHAID:
Examine the CHAID model:
Apply the model to new data:

RESULT:
Thus, the program to predict customer churn in the telecom data has been executed successfully.

# 6). CREATE HOMOGENEOUS GROUPS (CLUSTERS) OF CUSTOMERS BASED ON USAGE PATTERNS

AIM:
To create homogeneous groups of customers using a segmentation model in IBM SPSS Modeler.

PROCEDURE TO IMPLEMENTATION:
STEP 1: Insert a Type node after importing telco x modeling data.csv.
STEP 2: View the Type node. BILL_PEAK and BILL_OFFPEAK have the role Input, so the clusters will be based on these two fields. Records with similar values for BILL_PEAK and BILL_OFFPEAK will be put into the same cluster.
STEP 3: Click the Modeling palette, if not already selected. Click the Segmentation sub-palette at the left side.
STEP 4: Add a TwoStep node downstream from the Type node in the lower stream.
STEP 5: Run the TwoStep node.
STEP 6: Edit the TwoStep model nugget that was generated.

OUTPUT:

RESULT:
Thus, the program for creating homogeneous groups of customers using a segmentation model has been executed successfully.

# 7). USING FUNCTIONS IN IBM SPSS MODELER

AIM:
To derive new fields using date functions and string functions to cleanse and enrich a dataset that stores demographic and churn data on the company's customers in IBM SPSS Modeler.

ALGORITHM:

Use date functions to derive fields:
STEP 1: From the Sources palette, double-click the Var. File node to add it to the stream canvas.
STEP 2: Double-click the Var. File node to edit it.
STEP 3: To the right of the File field, click the Browse for file button, navigate to the relevant folder, click the telco x subset.csv file, and then click Open to import the data. Do not close the Var. File dialog box.
STEP 4: In the Var. File dialog box, click Preview, and then scroll to the last fields in the Preview output window.
STEP 5: Click OK to close the Preview output window and click OK to close the Var. File dialog box.
STEP 6: From the Field Ops palette, double-click the Derive node to add it downstream from the Var. File node.
STEP 7: Note: Placing node B downstream from node A means that the data flows from A to B.
STEP 8: Edit the Derive node. Under Derive field, type MONTHS_CUSTOMER.
STEP 9: Under Formula, enter date_months_difference(CONNECT_DATE, END_DATE).
STEP 10: Note: Type the expression or use the Expression Builder to construct the expression. In this course, "enter" refers to typing or using the Expression Builder, according to your preference. Here, when you use the Expression Builder, look for the date_months_difference function in the Date and Time function group.
STEP 11: Click Preview, and then scroll to the last fields in the Preview output window. The new field stores the number of months that elapsed between the two dates, as a real number. If you want the result as an integer, use a function such as round, intof, or to_integer.
STEP 12: Close the Preview output window. Close the Derive dialog box.
STEP 13: From the Field Ops palette, add a Derive node downstream from the Derive node named MONTHS_CUSTOMER.
STEP 14: Edit the Derive node, and then set the Mode to Multiple.
STEP 15: The Derive dialog box reflects the change: the Derive field box is replaced by a Derive from box where the source fields are selected.
STEP 16: Under Derive from, click the Pick from the set of available fields button, Ctrl+click CONNECT_DATE and END_DATE, and then click OK.
STEP 17: Beside Field name extension, replace the current extension with _MONTH.
STEP 18: Under Formula, enter datetime_month_name(datetime_month(@FIELD)). If you use the Expression Builder, locate the datetime_month and datetime_month_name functions in the Date and Time function group. Locate the @FIELD function in the @ Functions function group.
STEP 19: Click Preview, and then scroll to the last fields in the Preview output window.
STEP 20: Close the Preview output window. Close the Derive dialog box.

Use string functions to derive fields:
STEP 1: From the Output palette, add a Table node downstream from the Derive node named _MONTH.
STEP 2: Right-click the Table node and then click Run. Close the Table output window.
STEP 3: From the Field Ops palette, add a Derive node downstream from the Derive node named _MONTH (make room by repositioning the Table node, if preferred).
STEP 4: Edit the Derive node. Under Derive field, type E-MAIL ADDRESS OK.
STEP 5: Beside Derive as, click Flag from the list.
STEP 6: Under True when, enter count_substring('E-MAIL ADDRESS', "@") = 1. If you use the Expression Builder, locate the count_substring function in the String function group.
STEP 7: Click Preview, and then move E-MAIL ADDRESS OK next to E-MAIL ADDRESS in the Preview output window. (Note: move E-MAIL ADDRESS OK by dragging it to the left, until it is just to the right of E-MAIL ADDRESS.)
STEP 8: Close the Preview output window. Close the Derive dialog box.
STEP 9: From the Field Ops palette, add a Derive node downstream from the Derive node named E-MAIL ADDRESS OK.
STEP 10: Edit the Derive node. Under Derive field, type NO E-MAIL ADDRESS.
STEP 11: Beside Derive as, click Flag from the list.
STEP 12: Under True when, enter length('E-MAIL ADDRESS') = 0. If you use the Expression Builder, locate the length function in the String function group.
STEP 13: Click Preview, and then move NO E-MAIL ADDRESS next to E-MAIL ADDRESS.
STEP 14: Close the Preview output window.
STEP 15: Under True when, enter length(trim('E-MAIL ADDRESS')) = 0. If you use the Expression Builder, locate the length and trim functions in the String function group. Click Preview.
STEP 16: Close the Preview output window. Then close the Derive dialog box.
STEP 17: From the Field Ops palette, add a Derive node downstream from the Derive node named NO E-MAIL ADDRESS.
STEP 18: Edit the Derive node. Under Derive field, type POSITION PERIOD.
STEP 19: Under Formula, enter locchar_back(`.`, length('E-MAIL ADDRESS'), 'E-MAIL ADDRESS'). If you use the Expression Builder, locate the locchar_back and length functions in the String function group.
STEP 20: Click Preview, and then move POSITION PERIOD next to E-MAIL ADDRESS.
STEP 21: Close the Preview output window. Close the Derive dialog box.
STEP 22: From the Field Ops palette, add a Derive node downstream from the Derive node named POSITION PERIOD.
STEP 23: Edit the Derive node. Under Derive field, type DOMAIN NAME.
STEP 24: Beside Derive as, click Conditional from the list.
STEP 25: Under If, enter 'POSITION PERIOD' > 0.
STEP 26: Under Then, enter substring_between('POSITION PERIOD' + 1, length('E-MAIL ADDRESS'), 'E-MAIL ADDRESS'). If you use the Expression Builder, locate the substring_between and length functions in the String function group.
STEP 27: Under Else, type undef. Click Preview, and then scroll to the last fields in the Preview output window.
STEP 28: Close the Preview output window. Then close the Derive dialog box.

OUTPUT:
Use date functions to derive fields:
Use string functions to derive fields:

RESULT:
Thus, the program for deriving fields using date and string functions has been executed successfully.

# 8). ADD FIELDS TO THE TELECOMMUNICATIONS DATA

AIM:
To write a program to add fields to the telecommunications data.
a) Derive fields as formula
b) Derive fields as flag or nominal

ALGORITHM:

Derive fields as formula:
STEP 1: Import the dataset telco x data.txt.
STEP 2: From the Field Ops palette, add a Derive node downstream from the Type node.
STEP 3: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 4: In the Derive field box, type BILL_PEAK.
STEP 5: Click the Derive as drop-down. From the Derive as drop-down, click Formula, if not already selected.
STEP 6: Click the Field type drop-down. Click the Launch expression builder button.
STEP 7: In the Formula box, enter PEAK_MINS * PEAK_RATE/100, by typing it or by pasting the field names from the list of fields, whichever you feel comfortable with.
STEP 8: Click the Check button. Click OK to close the Expression Builder.
STEP 9: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_PEAK.
STEP 10: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 11: Derive field box: BILL_OFFPEAK
STEP 12: Derive as: Formula (the default)
STEP 13: Field type: leave the default; the field will then be auto-typed as Continuous
STEP 14: Expression: OFFPEAK_MINS * OFFPEAK_RATE/100 (if preferred, use the Expression Builder). Close the Derive dialog box.
STEP 15: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_OFFPEAK.
STEP 16: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 17: Derive field box: BILL_TOTAL
STEP 18: Derive as: Formula (the default)
STEP 19: Field type: leave the default; the field will then be auto-typed as Continuous
STEP 20: Expression: BILL_PEAK + BILL_OFFPEAK (if preferred, use the Expression Builder)
STEP 21: Click Preview. Then close the Preview output window. Close the Derive dialog box.
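The three Derive-as-Formula steps above amount to simple column arithmetic. A pandas sketch (the minute and rate values below are hypothetical, chosen only to show the calculation):

```python
import pandas as pd

df = pd.DataFrame({
    "PEAK_MINS":    [100, 0],
    "PEAK_RATE":    [50, 50],
    "OFFPEAK_MINS": [200, 0],
    "OFFPEAK_RATE": [20, 20],
})

# Each Derive node adds one computed column; later formulas can
# reference earlier derived fields, just as BILL_TOTAL does here.
df["BILL_PEAK"] = df["PEAK_MINS"] * df["PEAK_RATE"] / 100
df["BILL_OFFPEAK"] = df["OFFPEAK_MINS"] * df["OFFPEAK_RATE"] / 100
df["BILL_TOTAL"] = df["BILL_PEAK"] + df["BILL_OFFPEAK"]
print(df[["BILL_PEAK", "BILL_OFFPEAK", "BILL_TOTAL"]])
```

The division by 100 converts a rate stored in hundredths into currency units, which is why it appears in both formulas.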
Derive fields as flag or nominal:
STEP 1: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_TOTAL.
STEP 2: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 3: Derive field box: BILL_GT_0
STEP 4: Derive as: Flag
STEP 5: Field type: Flag (should be set automatically to Flag when you choose Derive as: Flag)
STEP 6: True value: T (the default) and False value: F (the default)
STEP 7: True when box: BILL_TOTAL > 0. Close the Derive dialog box.
STEP 8: From the Field Ops palette, add a Derive node downstream from the Derive node named BILL_GT_0.
STEP 9: Edit the Derive node. Click the Settings tab, if not already selected.
STEP 10: Derive field box: SEGMENT
STEP 11: Derive as: Nominal and Field type: Ordinal
STEP 12: Click the cell in the Set field to column and then type 1.
STEP 13: Click the cell in the If this condition is true column and then type BILL_TOTAL <= 100.
STEP 14: Repeat the previous two steps for the following values and expressions:
STEP 15: Set field to: 2; If this condition is true: BILL_TOTAL <= 200
STEP 16: Set field to: 3; If this condition is true: BILL_TOTAL > 200
STEP 17: In the Default value box, type undef. Then close the Derive dialog box.

OUTPUT:
Derive fields as formula:
Derive fields as flag or nominal:

RESULT:
Thus, the program to add fields to the telecommunications data has been executed successfully.

# 9). CREATE A LINEAR REGRESSION MODEL TO PREDICT EMPLOYEE SALARIES

AIM:
To create a linear regression model to predict employee salaries.

ALGORITHM:

Import and examine the data:
STEP 1: From the Sources palette, add a Var. File node to a blank stream canvas, edit the node, point to employee_data.txt, and then close the Var. File dialog box.
STEP 2: From the Output palette, add a Table node downstream from the Var. File node, run it, and then examine the output. The dataset comprises 474 employees. Close the Table output window.
STEP 3: From the Output palette, add a Data Audit node downstream from the Var. File node, run it, and then examine the output.

Set measurement levels and roles:
STEP 1: From the Field Ops palette, add a Type node downstream from the Var. File node.
STEP 2: Edit the Type node. Click Read Values.
STEP 3: Set the Measurement for educational_level to Ordinal.
STEP 4: Ensure the Role from gender to months_previous_experience is set to Input.
STEP 5: Set the Role for current_salary to Target.

Create the linear regression model:
STEP 1: From the Modeling palette, add a Linear node downstream from the Type node.
STEP 2: Edit the Linear node. Click the Build Options tab.
STEP 3: Click the Basics item and clear the Automatically prepare data check box.
STEP 4: Click the Model Selection item and set the model selection method to Include all predictors.
STEP 5: Click Run.
STEP 6: Edit the generated model nugget, and then click the Model Summary item in the pane on the left.
STEP 7: Click the Predictor Importance item in the pane on the left.
STEP 8: The job_category field is by far the most important predictor. Gender is the second most important field. Region and age are least important.
STEP 9: Click the Predicted by Observed item in the pane on the left.
STEP 10: The points are not scattered around the diagonal, and the predicted values seem to break up into two categories.
STEP 11: Click the Coefficients by Observed item in the pane on the left, and then, from the Style list, select Table.
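For comparison, a minimal scikit-learn sketch of fitting a linear regression to salary-like data. This is not the Modeler Linear node and not the employee dataset: the data below is synthetic, generated so the true relationship (salary ≈ 30000 + 1500 × experience) is known and can be recovered.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic predictor and target with a known linear relationship plus noise.
experience = rng.uniform(0, 20, 100)                       # e.g. years of experience
salary = 30000 + 1500 * experience + rng.normal(0, 1000, 100)

# Fit with all predictors included (here there is only one).
model = LinearRegression().fit(experience.reshape(-1, 1), salary)
print(model.coef_[0], model.intercept_)
```

The fitted coefficient should land near 1500 and the intercept near 30000; the Predicted-by-Observed plot in the exercise is the graphical version of checking this fit quality.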
OUTPUT:

RESULT:
Thus, the program to create a linear regression model to predict employee salaries has been executed successfully.

# 10). USE LOGISTIC REGRESSION TO PREDICT RESPONSE TO A CHARITY PROMOTION CAMPAIGN

AIM:
To use logistic regression to predict the response to a charity promotion campaign.

ALGORITHM:

Import and examine the data:
STEP 1: From the Sources palette, double-click the Var. File node to add it to the stream. Import the dataset charity.csv.
STEP 2: From the Output palette, add a Data Audit node downstream from the Var. File node, and run the Data Audit node.
STEP 3: Double-click the Sample Graph for the response to campaign field.

Partition the data and set the roles:
STEP 1: From the Field Ops palette, add a Partition node downstream from the Var. File node.
STEP 2: Set the Training partition size to 70% and the Testing partition size to 30%. Ensure that the Repeatable partition assignment option is enabled, with seed value 1234567.
STEP 3: From the Field Ops palette, add a Type node downstream from the Partition node.
STEP 4: Edit the Type node, and then click the Read Values button. The Values column is populated with values from the data.
STEP 5: Set the Role for gender, age, mosaic bands, pre-campaign expenditure, and pre-campaign visits to Input.
STEP 6: Set the Role for response to campaign to Target.
STEP 7: Ensure that the Role for the Partition field is set to Partition.
STEP 8: Set the Role for all other fields to None.

Create the logistic regression models:
STEP 1: From the Modeling palette, add a Logistic node downstream from the Type node.
STEP 2: Edit the Logistic node and click the Model tab.
STEP 3: For Procedure, select the Binomial option.
STEP 4: Close the Logistic dialog box.
STEP 5: Add a second Logistic node downstream from the Type node.
STEP 6: Edit the second Logistic node, and then click the Model tab.
STEP 7: For Procedure, select the Binomial option.
STEP 8: Below Categorical inputs, select mosaic bands, and for Base Category, select First.
STEP 9: Click the Annotations tab, select the Custom option, type custom, and close the Logistic dialog box.
STEP 10: Select the two Logistic nodes, right-click one of them, and click Run Selection.
STEP 11: Edit the Logistic model nugget named response to campaign, click the Advanced tab, and scroll down to the Variables in the Equation table (the last table in the output).
STEP 12: Close the Logistic output window.
STEP 13: Edit the Logistic model nugget named custom, click the Advanced tab, and scroll to the Categorical Variables Codings table.
STEP 14: You can add an Analysis node at the end and check accuracy levels.

OUTPUT:

RESULT:
Thus, the program to use logistic regression to predict the response to a charity promotion campaign has been executed successfully.
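The partition-then-model flow of this last exercise can be sketched with scikit-learn. The 70/30 split and the seed 1234567 mirror the Partition node settings; everything else (the two predictors, the response rule, the sample size) is synthetic and hypothetical, not the charity.csv data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1234567)

# Two hypothetical predictors, e.g. pre-campaign expenditure and visits.
X = rng.uniform(0, 1, size=(300, 2))
# Response is more likely when the first predictor is high (plus noise).
y = (X[:, 0] + rng.normal(0, 0.3, 300) > 0.5).astype(int)

# 70% training / 30% testing with a fixed seed, like the Partition node
# with "Repeatable partition assignment" enabled.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1234567)

clf = LogisticRegression().fit(X_train, y_train)
print(clf.score(X_test, y_test))   # holdout accuracy, like the Analysis node
```

Scoring on the held-out 30% rather than the training data is the point of the partition: it estimates how the model will perform on customers it has never seen.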