Build a single classification tree with the training data and Default as the target

The purpose of this assignment is to build classification trees, interpret the results, and analyze whether the information generated can be used to address a specific business problem.

For this assignment, you will use the “Credit Card Defaults” data set from the Topic Materials. Most data categories are self-explanatory. Clarifying notes are as follows.

1. Limit_Balance: Balance limit on the credit card

2. Sex: Gender

3. Education: Highest level of education completed

4. Pay_Status1-3: Pay status in the previous 1 to 3 months, respectively

5. Pay_Amt_Prev1-3: Amount paid in the previous 1 to 3 months, respectively

6. Bill_Amt_Prev1-3: Amount billed in the previous 1 to 3 months, respectively

7. Default: Whether or not the individual defaulted on their credit card payment

You are an analyst for a credit card company. Management wants to know if there are any early signs that indicate whether customers will default on their credit cards. If these indicators can be identified, then more scrutiny can be placed on customer transactions in an effort to avoid losses. The rules to detect the potential for default must be simple enough for management to understand and easy to implement as part of the early default flagging system. Your task is to determine the indicators and communicate your findings to management.

Question 1:  Partition the data to create a training data set (70%) and test data set (30%).
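The 70/30 partition in Question 1 can be sketched as follows. This is only an illustration that uses Python's standard library; the function name `partition`, the seed value, and the placeholder `records` list are assumptions made for the example and are not part of the assignment materials or of any particular modeling tool.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the rows of the data set
records = list(range(1000))
train, test = partition(records)
print(len(train), len(test))
```

Fixing the seed means the same rows land in the training and test sets on every run, so results from different models can be compared on identical data.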

Question 2:  Build a single classification tree with the training data and Default as the target. Include the “Default Tree Model” output when submitting the answer.

1. Which variable(s) were used in the tree model?

2. How would you use the model to predict whether or not the customer will default?

3. What is the accuracy of the model when using the training and test data? Include the “Misclassification Table” outputs when submitting the answer.

4. Consider the following individual: Limit_Balance=5000, Sex=Male, Education=High School, Marital_Status=Married, Age=30, Pay_Status1=On Time, Pay_Status2=On Time, Pay_Status3=2 Mths Late, Pay_Amt_Prev1=0, Pay_Amt_Prev2=0, Pay_Amt_Prev3=0, Bill_Amt_Prev1=5000, Bill_Amt_Prev2=2500, Bill_Amt_Prev3=100. Based on the classification model, what is the predicted default outcome? Explain your answer.
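The accuracy requested in Question 2 can be read directly off a misclassification table. The sketch below shows one way to tabulate actual versus predicted outcomes and compute accuracy; the label values and the two short lists are made up for illustration and do not come from the data set or from any model output.

```python
from collections import Counter

def misclassification_table(actual, predicted):
    """Count each (actual, predicted) pair."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Share of records whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Made-up labels, for illustration only
actual    = ["Yes", "No", "No", "Yes", "No", "No",  "Yes", "No"]
predicted = ["Yes", "No", "No", "No",  "No", "Yes", "No",  "No"]

table = misclassification_table(actual, predicted)
for (a, p), count in sorted(table.items()):
    print(f"actual={a:<3} predicted={p:<3} count={count}")
print("accuracy:", accuracy(actual, predicted))
```

Running the same two functions on the training predictions and on the test predictions gives the two accuracy figures to compare; a large gap between them would suggest the tree is overfitting the training data.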

Question 3:  Correctly predicting a default is more important than correctly predicting a nondefault. Therefore, the modeling process should be weighted toward predicting defaults accurately. One way to do this is to increase the cost of misclassifying a true default.

Rerun the model, but set the cost of misclassifying a true default to 5 and the cost of misclassifying a true nondefault to 1. Make sure to set the minimum change in impurity to 0.01. Include the “Default-weighted Tree Model” output when submitting the answer.

1. Which variable(s) were used in the tree model?

2. What is the accuracy of the model when using the training and test data? Include the “Misclassification Table” outputs when submitting the answer.

3. Consider the following individual: Limit_Balance=5000, Sex=Male, Education=High School, Marital_Status=Married, Age=30, Pay_Status1=On Time, Pay_Status2=On Time, Pay_Status3=2 Mths Late, Pay_Amt_Prev1=0, Pay_Amt_Prev2=0, Pay_Amt_Prev3=0, Bill_Amt_Prev1=5000, Bill_Amt_Prev2=2500, Bill_Amt_Prev3=100. Based on the classification model, what is the predicted default outcome? Explain your answer.
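One way to see what the 5-to-1 misclassification cost in Question 3 does is to look at how a single leaf of the tree would be labeled. The sketch below picks the label with the lower expected cost, given the share of defaulters in the leaf. It is a simplified illustration: the function name and label strings are assumptions, and actual tree software may also use the costs when choosing splits, which this example does not model.

```python
def min_cost_label(p_default, cost_missed_default=5.0, cost_false_alarm=1.0):
    """Label a leaf with the outcome that has the lower expected cost.

    p_default is the share of true defaulters in the leaf.
    """
    # Predicting "No Default" is wrong for the defaulters in the leaf
    cost_if_predict_nondefault = cost_missed_default * p_default
    # Predicting "Default" is wrong for the nondefaulters in the leaf
    cost_if_predict_default = cost_false_alarm * (1.0 - p_default)
    if cost_if_predict_default < cost_if_predict_nondefault:
        return "Default"
    return "No Default"

# With 5:1 costs, a leaf with 20% defaulters is flagged as Default
print(min_cost_label(0.20))
# With equal costs, the same leaf is labeled No Default
print(min_cost_label(0.20, cost_missed_default=1.0))
```

Under these assumptions, a leaf is flagged as a default once its share of defaulters exceeds 1 / (1 + 5), or about 17%, instead of 50% with equal costs. This is why a default-weighted tree typically catches more defaulters at the expense of more false alarms.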

Question 4:  Based on the two classification tree models, which one should be used if the goal is to more accurately predict defaulters?
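When the goal is to catch defaulters, overall accuracy can be misleading; a more direct measure is the share of true defaulters that each model flags (often called sensitivity or recall). The sketch below compares two sets of predictions on that measure. All labels are made up for illustration, and "model A" and "model B" are hypothetical names, not results from the assignment's models.

```python
def default_recall(actual, predicted, positive="Yes"):
    """Share of true defaulters that the model predicted as defaulters."""
    actual_pos = sum(a == positive for a in actual)
    true_pos = sum(a == positive and p == positive
                   for a, p in zip(actual, predicted))
    return true_pos / actual_pos if actual_pos else float("nan")

def accuracy(actual, predicted):
    """Share of all records predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Made-up labels, for illustration only
actual  = ["Yes", "No",  "No",  "Yes", "No", "No",  "Yes", "No"]
model_a = ["Yes", "No",  "No",  "No",  "No", "Yes", "No",  "No"]
model_b = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes"]

for name, preds in [("model A", model_a), ("model B", model_b)]:
    print(name,
          "accuracy:", accuracy(actual, preds),
          "default recall:", round(default_recall(actual, preds), 3))
```

In this made-up example, model B has lower overall accuracy but flags every true defaulter, which is the kind of trade-off to weigh when answering Question 4.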

Question 5:  Based on your analysis, what are the indicators of whether customers will default on their credit cards? Discuss how management can use this information as part of the early default flagging system. Present your findings in the form of a 250-word executive summary that includes relevant data, charts, and tables to validate the conclusions presented.