The Business Context

A Taiwan-based credit card issuer wants to better predict the likelihood of default for its customers, as well as identify the key drivers that determine this likelihood. This would inform the issuer’s decisions on who to give a credit card to and what credit limit to provide. It would also help the issuer better understand its current and potential customers, informing its future strategy, including plans to offer targeted credit products to its customers.


The Data

(Data source: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset. We acknowledge the following: Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.)

The credit card issuer has gathered information on 30000 customers. The dataset contains, for each customer, an ID, 23 variables covering demographic factors, credit data, history of payment, and bill statements of credit card customers from April 2005 to September 2005, as well as information on the outcome: did the customer default or not?

Name Description
ID ID of each client
LIMIT_BAL Amount of given credit in NT dollars (includes individual and family/supplementary credit)
SEX Gender (1=male, 2=female)
EDUCATION Education level (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
MARRIAGE Marital status (1=married, 2=single, 3=others)
AGE Age in years
PAY_0 Repayment status in September, 2005 (-2=no consumption, -1=pay duly, 0=the use of revolving credit, 1=payment delay for one month, 2=payment delay for two months, … 8=payment delay for eight months, 9=payment delay for nine months and above)
PAY_2 Repayment status in August, 2005 (scale same as above)
PAY_3 Repayment status in July, 2005 (scale same as above)
PAY_4 Repayment status in June, 2005 (scale same as above)
PAY_5 Repayment status in May, 2005 (scale same as above)
PAY_6 Repayment status in April, 2005 (scale same as above)
BILL_AMT1 Amount of bill statement in September, 2005 (NT dollar)
BILL_AMT2 Amount of bill statement in August, 2005 (NT dollar)
BILL_AMT3 Amount of bill statement in July, 2005 (NT dollar)
BILL_AMT4 Amount of bill statement in June, 2005 (NT dollar)
BILL_AMT5 Amount of bill statement in May, 2005 (NT dollar)
BILL_AMT6 Amount of bill statement in April, 2005 (NT dollar)
PAY_AMT1 Amount of previous payment in September, 2005 (NT dollar)
PAY_AMT2 Amount of previous payment in August, 2005 (NT dollar)
PAY_AMT3 Amount of previous payment in July, 2005 (NT dollar)
PAY_AMT4 Amount of previous payment in June, 2005 (NT dollar)
PAY_AMT5 Amount of previous payment in May, 2005 (NT dollar)
PAY_AMT6 Amount of previous payment in April, 2005 (NT dollar)
default.payment.next.month Default payment (1=yes, 0=no)

Let’s look at the data for a few customers. This is what the first 10 of the 30000 rows look like (transposed, for convenience). Note that, for the analysis, the monthly repayment-status, bill, and payment columns have been aggregated into derived variables (NoConsum, PayDuly, Delay1Month, DelayMoreMonths, MedianBill, and MedianPercentPay):

01 02 03 04 05 06 07 08 09 10
LIMIT_BAL 20000.00 120000.00 90000.00 50000.00 50000.00 50000.00 500000.00 100000.00 140000.00 20000.00
SEX 2.00 2.00 2.00 2.00 1.00 1.00 1.00 2.00 2.00 1.00
EDUCATION 2.00 2.00 2.00 2.00 2.00 1.00 1.00 2.00 3.00 3.00
MARRIAGE 1.00 2.00 2.00 1.00 1.00 2.00 2.00 2.00 1.00 2.00
NoConsum 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00
PayDuly 2.00 1.00 0.00 0.00 2.00 0.00 0.00 3.00 0.00 2.00
Delay1Month 0.00 3.00 6.00 6.00 4.00 6.00 6.00 3.00 5.00 0.00
DelayMoreMonths 2.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00
MedianBill 344.50 2971.50 14639.50 38268.50 19138.50 38546.50 459475.50 473.50 11950.50 0.00
MedianPercentPay 0.07 0.31 0.12 0.04 1.25 0.04 0.07 -0.61 0.13 0.04

A Process for Classification

It is important to remember that data analytics projects require a delicate balance between experimentation, intuition, and following a process. The value of following a process is that it helps us avoid being fooled by randomness in the data and finding “results and patterns” that are driven mainly by our own biases rather than by the facts/data themselves.

There is no single best process for classification. However, we have to start somewhere, so we will use the following process:

  1. Create an estimation sample and two validation samples by splitting the data into three groups. Steps 2-5 below will then be performed only on the estimation and the first validation data. You should only do step 6 once on the second validation data, also called test data, and only report/use the performance on that (second validation) data to make final business decisions.
  2. Set up the dependent variable (as a categorical 0-1 variable; multi-class classification is also feasible, and similar, but we do not explore it in this note).
  3. Make a preliminary assessment of the relative importance of the explanatory variables using visualization tools and simple descriptive statistics.
  4. Estimate the classification model using the estimation data, and interpret the results.
  5. Assess the accuracy of classification in the first validation sample, possibly repeating steps 2-5 a few times changing the classifier in different ways to increase performance.
  6. Finally, assess the accuracy of classification in the second validation sample. You should eventually use and report all relevant performance measures and plots on this second validation sample only.

Let’s follow these steps.

Step 1: Split the data

It is very important that you (or the data scientists working on the project) measure and report the final performance of the models on data that have not been used at all during the analysis (steps 2-5 above), called “out-of-sample” or test data. The idea is that in practice we want our models to be used for predicting the class of observations/data we have not seen yet (i.e., “the future data”): although the performance of a classification method may be high in the data used to estimate the model parameters, it may be significantly poorer on data not used for parameter estimation, such as the out-of-sample (future) data.

This is why we split the data into an estimation sample and two validation samples - using some kind of randomized splitting technique. The second validation data mimic out-of-sample data, and the performance on this validation set is a better approximation of the performance one should expect in practice from the selected classification method. The estimation data and the first validation data are used during steps 2-5 (with a few iterations of these steps), while the second validation data are only used once at the very end, before making final business decisions based on the analysis. The split can be, for example, 80% estimation, 10% validation, and 10% test data, depending on the number of observations - for example, when there is a lot of data, you may keep only a few hundred observations for the validation and test sets, and use the rest for estimation.

While setting up the estimation and validation samples, you should also check that the same proportion of data from each class (i.e., customers who default versus not) are maintained in each sample. That is, you should maintain the same balance of the dependent variable categories as in the overall dataset.

For simplicity, in this note we will not iterate steps 2-5. In practice, however, we should usually iterate steps 2-5 a number of times using the first validation sample each time, and at the end make our final assessment of the classification model using the test sample only once.

We typically refer to the three data samples as estimation data (80% of the data in our case), validation data (10% of the data) and test data (the remaining 10% of the data).

In our case we use 24000 observations in the estimation data, 3000 in the validation data, and 3000 in the test data.
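Here is a minimal sketch of how such a stratified 80/10/10 split could be done in R using the caret package. The data frame name credit_data and the 0/1 outcome column name default are assumptions made for this sketch, not necessarily the names used to produce the numbers in this note.

```r
# A sketch of a stratified 80/10/10 split (assumed names: credit_data, default).
library(caret)
set.seed(1)  # for reproducibility

# 80% estimation vs. 20% held out, stratified on the 0/1 outcome
est_idx <- createDataPartition(factor(credit_data$default), p = 0.8, list = FALSE)
estimation_data <- credit_data[est_idx, ]
holdout         <- credit_data[-est_idx, ]

# Split the held-out 20% in half: validation and test, again stratified
val_idx <- createDataPartition(factor(holdout$default), p = 0.5, list = FALSE)
validation_data <- holdout[val_idx, ]
test_data       <- holdout[-val_idx, ]
```

Stratifying on the outcome keeps the proportion of defaulters roughly the same in all three samples, as required above.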

Step 2: Set up the dependent variable

First, make sure the dependent variable is set up as a categorical 0-1 variable. In our illustrative example, we use the payment default (or no default) as the dependent variable.

The data, however, may not always come with a readily available categorical dependent variable. Suppose a retailer wants to understand what discriminates consumers who are loyal versus those who are not. If they have data on the amount that customers spend in their store or the frequency of their purchases, they can create a categorical variable (“loyal vs. not loyal”) by using a definition such as: “A loyal customer is one who spends more than X amount at the store and makes at least Y purchases a year”. They can then code these loyal customers as “1” and the others as “0”. They can choose the thresholds X and Y as they wish: a definition/decision that may have a big impact on the overall analysis. This decision can be the most crucial one of the whole data analysis: a wrong choice at this step may lead both to poor performance later and to no valuable insights. One should revisit the choice made at this step several times, iterating over steps 2-3 and steps 2-5.
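For illustration, here is a tiny sketch of how such a 0/1 variable could be created in R; the data frame customers, its columns, and the thresholds X and Y are all hypothetical.

```r
# Hypothetical retailer example: code customers as loyal (1) or not (0).
X <- 500  # illustrative threshold on annual spend
Y <- 12   # illustrative threshold on number of purchases per year

customers$loyal <- as.integer(customers$annual_spend > X &
                                customers$purchases_per_year >= Y)
```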

Carefully deciding what the dependent 0/1 variable is can be the most critical choice of a classification analysis. This decision typically depends on contextual knowledge and needs to be revisited multiple times throughout a data analytics project.

In our data the number of 0/1’s in our estimation sample is as follows:

Class 1 Class 0
# of Observations 5370 18630

while in the validation sample they are:

Class 1 Class 0
# of Observations 603 2397

Step 3: Simple Analysis

Good data analytics starts with good contextual knowledge as well as a simple statistical and visual exploration of the data. In the case of classification, one can explore “simple classifications” by assessing how the classes differ along each of the independent variables. For example, these are the summary statistics of our independent variables for each of the two classes in the estimation data. First, class 1 (“default”):

min 25 percent median mean 75 percent max std
LIMIT_BAL 10000.00 50000.00 90000.00 129725.82 200000.00 740000.00 115263.81
SEX 1.00 1.00 2.00 1.59 2.00 2.00 0.49
EDUCATION 1.00 1.00 2.00 1.89 2.00 6.00 0.72
MARRIAGE 0.00 1.00 2.00 1.53 2.00 3.00 0.53
NoConsum 0.00 0.00 0.00 0.67 0.00 6.00 1.61
PayDuly 0.00 0.00 0.00 0.87 1.00 6.00 1.72
Delay1Month 0.00 0.00 2.00 2.46 5.00 6.00 2.28
DelayMoreMonths 0.00 0.00 1.00 2.00 3.00 6.00 2.13
MedianBill -36035.00 2426.00 19424.50 42501.91 49946.62 541125.00 64001.75
MedianPercentPay -5904.73 0.04 0.05 -2.02 0.19 1055.74 87.11

and class 0 (“no default”):

min 25 percent median mean 75 percent max std
LIMIT_BAL 10000.00 60000.00 150000.00 175806.55 250000.00 1000000.00 131059.41
SEX 1.00 1.00 2.00 1.64 2.00 2.00 0.48
EDUCATION 0.00 1.00 2.00 1.84 2.00 6.00 0.80
MARRIAGE 0.00 1.00 2.00 1.56 2.00 3.00 0.52
NoConsum 0.00 0.00 0.00 0.83 0.00 6.00 1.80
PayDuly 0.00 0.00 0.00 1.26 2.00 6.00 1.99
Delay1Month 0.00 0.00 4.00 3.40 6.00 6.00 2.53
DelayMoreMonths 0.00 0.00 0.00 0.52 1.00 6.00 1.18
MedianBill -20320.00 2895.38 19232.25 43840.05 56125.38 944417.50 63559.58
MedianPercentPay -2878.89 0.04 0.08 -1.34 0.86 4444.13 62.58

The purpose of such an analysis by class is to get an initial idea about whether the classes are indeed separable as well as to understand which of the independent variables have most discriminatory power.
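A minimal sketch of how such by-class summary statistics could be computed in R, using the same assumed estimation_data data frame and default column as in the earlier sketch:

```r
# Summary statistics (min, quartiles, median, mean, max, std) by class.
summarize_variable <- function(x) {
  c(min = min(x), "25 percent" = unname(quantile(x, 0.25)), median = median(x),
    mean = mean(x), "75 percent" = unname(quantile(x, 0.75)), max = max(x),
    std = sd(x))
}

independent_vars <- setdiff(names(estimation_data), "default")
by(estimation_data[, independent_vars], estimation_data$default,
   function(df) round(t(sapply(df, summarize_variable)), 2))
```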

Notice, however, that even if no single independent variable differs much across the classes, classification may still be feasible: a (linear or nonlinear) combination of independent variables may still be discriminatory.

A simple visualization tool to assess the discriminatory power of the independent variables is the box plot. A box plot visually indicates simple summary statistics of an independent variable (e.g., median, top and bottom quartiles, min, max, etc.). For example, consider the box plots for our estimation data for the repayment-status variables, for class 1:

[Box plots of the repayment-status variables, estimation data, class 1]

and class 0:

[Box plots of the repayment-status variables, estimation data, class 0]
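A minimal sketch of how such box plots can be drawn with base R, again assuming the estimation_data data frame and default column from the earlier sketches:

```r
# Box plots of the repayment-status variables for class 1 (default).
status_vars <- c("NoConsum", "PayDuly", "Delay1Month", "DelayMoreMonths")
class1 <- estimation_data[estimation_data$default == 1, status_vars]

boxplot(class1, main = "Repayment-status variables, class 1 (default)")

# Repeat with estimation_data$default == 0 for class 0.
```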

Questions:

  1. Draw the box plots for class 1 and class 0 for another set of independent variables of your choice.
  2. Which independent variables appear to have the most discriminatory power?

Answers:

Step 4: Classification and Interpretation

Once we decide which dependent and independent variables to use (a choice that can be revisited in later iterations), we can use a number of classification methods to develop a model that discriminates between the classes.

Some of the widely used classification methods are: classification and regression trees (CART), boosted trees, support vector machines, neural networks, nearest neighbors, logistic regression, lasso, random forests, deep learning methods, etc.

In this report we consider only the following classification methods: Logistic Regression, CART, Regularized Logistic Regression, XGBoost. Understanding how these methods work is beyond the scope of this note - there are many references available online for all these classification methods.

Logistic Regression

These are the estimated parameters for logistic regression on our data:

Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.9209431 0.1091208 17.603824 0.0000000
LIMIT_BAL -0.0000022 0.0000002 -12.291202 0.0000000
SEX -0.1323413 0.0348065 -3.802205 0.0001434
EDUCATION -0.0407241 0.0232700 -1.750066 0.0801069
MARRIAGE -0.1665024 0.0332340 -5.010007 0.0000005
NoConsum -0.4113798 0.0139261 -29.540107 0.0000000
PayDuly -0.4525871 0.0134375 -33.680873 0.0000000
Delay1Month -0.5023519 0.0105978 -47.401619 0.0000000
MedianBill 0.0000015 0.0000004 4.145690 0.0000339
MedianPercentPay -0.0003048 0.0002247 -1.356866 0.1748238

The estimated probability that a validation observation belongs to class 1 (e.g., the estimated probability that the customer defaults) for the first few validation observations, using the logistic regression above, is shown in the table below. (When generating these predictions, R warns that the fit is rank-deficient and that the predictions may be misleading; the reason is that the four repayment-status counts always sum to the six observed months, so one of them, DelayMoreMonths, is redundant and is dropped from the estimated model above.)
Actual Class Predicted Class Probability of Class 1
Obs 1 0 0 0.34
Obs 2 1 0 0.16
Obs 3 0 0 0.07
Obs 4 0 1 0.78
Obs 5 0 0 0.18
Obs 6 0 0 0.15
Obs 7 0 0 0.17
Obs 8 0 0 0.11
Obs 9 0 0 0.13
Obs 10 0 0 0.24

The default decision is to classify each observation into the group with the highest probability - but one can change this choice.
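As a rough sketch, a logistic regression like the one above could be estimated and used for prediction in R as follows; the data frame names (estimation_data, validation_data) and the 0/1 column name default are assumptions carried over from the earlier sketches.

```r
# Fit a logistic regression on the estimation data (default is the 0/1 outcome).
logit_model <- glm(default ~ ., data = estimation_data,
                   family = binomial(link = "logit"))
summary(logit_model)  # estimates, standard errors, z values, p-values

# Predicted probability of class 1 (default) for the validation observations.
logit_probs <- predict(logit_model, newdata = validation_data, type = "response")

# Classify using a 50% probability threshold.
logit_pred <- ifelse(logit_probs > 0.5, 1, 0)
```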

Regularized Logistic Regression

These are the estimated parameters for regularized logistic regression on our data:

Estimate
(Intercept) -1.5971156
LIMIT_BAL -0.0000008
SEX 0.0000000
EDUCATION 0.0000000
MARRIAGE 0.0000000
NoConsum 0.0000000
PayDuly 0.0000000
Delay1Month -0.0022303
DelayMoreMonths 0.4458947
MedianBill 0.0000000
MedianPercentPay 0.0000000

The estimated probability that a validation observation belongs to class 1 (e.g., the estimated probability that the customer defaults) for the first few validation observations, using regularized logistic regression above, is:

Actual Class Predicted Class Probability of Class 1
Obs 1 0 0 0.32
Obs 2 1 0 0.16
Obs 3 0 0 0.13
Obs 4 0 1 0.74
Obs 5 0 0 0.16
Obs 6 0 0 0.15
Obs 7 0 0 0.16
Obs 8 0 0 0.14
Obs 9 0 0 0.16
Obs 10 0 0 0.23

The table above assumes that the probability threshold for considering an observation as “class 1” is 0.5.
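Here is a sketch of one way such a regularized (lasso) logistic regression could be fit with the glmnet package; the alpha and lambda choices below are illustrative assumptions, not necessarily those used to produce the coefficients above.

```r
# Lasso-penalized logistic regression with cross-validated lambda.
library(glmnet)

x_est <- as.matrix(estimation_data[, setdiff(names(estimation_data), "default")])
y_est <- estimation_data$default

cv_fit <- cv.glmnet(x_est, y_est, family = "binomial", alpha = 1)
coef(cv_fit, s = "lambda.min")  # many coefficients are shrunk exactly to zero

x_val <- as.matrix(validation_data[, setdiff(names(validation_data), "default")])
reg_probs <- predict(cv_fit, newx = x_val, s = "lambda.min", type = "response")
```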

CART

Running a basic CART model with complexity control 0.0025 leads to the following tree (NOTE: for better readability of the tree figures below, we rename the independent variables as IV1 to IV10 when using CART):

The leaves of the tree indicate the number of estimation data observations that “reach that leaf” that belong to each class. A perfect classification would only have data from one class in each of the tree leaves. However, such a perfect classification of the estimation data would most likely not be able to classify well out-of-sample data due to overfitting of the estimation data.

One can estimate larger trees by changing the tree’s complexity control parameter (in this case the rpart.control argument cp). For example, this is what the tree would look like if we set cp = 0.00068:
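A sketch of how these two trees could be estimated with the rpart package, again using the assumed estimation_data and validation_data data frames; all other rpart.control settings are left at their defaults here.

```r
# CART models with the two complexity-control (cp) values quoted above.
library(rpart)
library(rpart.plot)

est <- estimation_data
est$default <- factor(est$default)  # rpart classification needs a factor outcome

cart_small <- rpart(default ~ ., data = est, method = "class",
                    control = rpart.control(cp = 0.0025))
cart_large <- rpart(default ~ ., data = est, method = "class",
                    control = rpart.control(cp = 0.00068))  # smaller cp -> larger tree

rpart.plot(cart_small)  # draw the first (smaller) tree

# Probability of class 1 (default) for the validation data from the first tree.
cart_probs <- predict(cart_small, newdata = validation_data, type = "prob")[, "1"]
```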

In our case, the probability our validation data belong to class 1 (i.e., a customer’s likelihood of default) for the first few validation observations, using the first CART above, is:

Actual Class Predicted Class Probability of Class 1
Obs 1 0 0 0.38
Obs 2 1 0 0.15
Obs 3 0 0 0.15
Obs 4 0 1 0.69
Obs 5 0 0 0.15
Obs 6 0 0 0.15
Obs 7 0 0 0.15
Obs 8 0 0 0.15
Obs 9 0 0 0.15
Obs 10 0 0 0.15

The table above assumes that the probability threshold for considering an observation as “class 1” is 0.5.

XGBoost

The estimated probability that a validation observation belongs to class 1 (e.g., the estimated probability that the customer defaults) for the first few validation observations, using XGBoost, is:

Actual Class Predicted Class Probability of Class 1
Obs 1 0 0 0.24
Obs 2 1 0 0.18
Obs 3 0 0 0.05
Obs 4 0 0 0.37
Obs 5 0 0 0.30
Obs 6 0 0 0.12
Obs 7 0 0 0.14
Obs 8 0 0 0.07
Obs 9 0 0 0.16
Obs 10 0 0 0.30
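A sketch of how an XGBoost model could be fit with the xgboost package; the hyperparameters below (nrounds, max_depth, eta) are illustrative assumptions, not necessarily those used for the results reported here, and x_est, y_est, x_val are the matrices built in the regularized-regression sketch above.

```r
# Gradient-boosted trees with a binary logistic objective.
library(xgboost)

xgb_model <- xgboost(data = x_est, label = y_est,
                     objective = "binary:logistic",
                     nrounds = 100, max_depth = 3, eta = 0.1,
                     verbose = 0)

xgb_probs <- predict(xgb_model, newdata = x_val)  # probability of class 1 (default)
```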

Step 5: Validation accuracy

Using the predicted class probabilities of the validation data, as outlined above, we can generate some measures of classification performance. Before discussing them, note that given the probability an observation belongs to a class, a reasonable class prediction choice is to predict the class that has the highest probability. However, this does not need to be the only choice in practice.

Selecting the probability threshold based on which we predict the class of an observation is a decision the user needs to make. While in some cases a reasonable probability threshold is 50%, in other cases it may be 99.9% or 0.1%.

Question:

Can you think of such a scenario?

Answer:

For different choices of the probability threshold, one can measure a number of classification performance metrics, which are outlined next.

1. Hit ratio

This is the percentage of the observations that have been correctly classified (i.e., the predicted class and the actual class are the same). We can simply count the number of validation observations correctly classified and divide it by the total number of validation observations. For the classifiers above and a probability threshold of 50%, the hit ratios in the validation data are:

Hit Ratio
Logistic Regression 82.06667
First CART 82.20000
Second CART 82.26667
Regularized Logistic Regression 82.00000
XGBoost 81.66667

For the estimation data, the hit rates are:

Hit Ratio
Logistic Regression 79.95417
First CART 80.22500
Second CART 81.18750
Regularized Logistic Regression 79.90833
XGBoost 84.81667

A simple benchmark against which to compare the hit ratio of a classification model is the Maximum Chance Criterion. This measures the proportion of the class with the largest size. For our validation data the largest group is customers who do not default (2397 out of 3000 customers). Clearly, if we classified all individuals into the largest group, we could get a hit ratio of 79.9% without doing any work. One should have a hit rate at least as high as the Maximum Chance Criterion rate, although, as we discuss next, there are more performance criteria to consider.
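A minimal sketch of this calculation in R, assuming a vector actual of the true 0/1 classes and a vector probs of predicted class-1 probabilities for the validation data (hypothetical names, as before):

```r
# Hit ratio at a 50% probability threshold.
threshold <- 0.5
predicted <- ifelse(probs > threshold, 1, 0)

hit_ratio <- 100 * mean(predicted == actual)
hit_ratio
```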

2. Confusion matrix

The confusion matrix shows, for each class, the number (or percentage) of observations that are correctly classified for that class. For example, for the method above with the highest hit rate in the validation data (the second CART), and for probability threshold 50%, the confusion matrix for the validation data is:

Predicted 1 (default) Predicted 0 (no default)
Actual 1 (default) 27.53 72.47
Actual 0 (no default) 3.96 96.04
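Using the same hypothetical actual and predicted vectors as in the hit-ratio sketch, such a row-percentage confusion matrix could be computed as:

```r
# Confusion matrix as row percentages (each row sums to 100%).
conf_counts <- table(Actual = actual, Predicted = predicted)
round(100 * prop.table(conf_counts, margin = 1), 2)
```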

Questions:

  1. Note that the percentages add up to 100% for each row. Why?
  2. Moreover, a “good” confusion matrix should have large diagonal values and small off-diagonal ones. Why?

Answers:

3. ROC curve

Remember that each observation is classified by our model according to the probabilities Pr(0) and Pr(1) and a chosen probability threshold. Typically we set the probability threshold to 0.5 - so that observations for which Pr(1) > 0.5 are classified as 1’s. However, we can vary this threshold, for example if we are interested in correctly predicting all 1’s but do not mind missing some 0’s (and vice-versa).

When we change the probability threshold we get different values of hit rate, false positive and false negative rates, or any other performance metric. We can plot for example how the false positive versus true positive rates change as we alter the probability threshold, and generate the so called ROC curve.

The ROC curves for the validation data for the logistic regression as well as both the CARTs above are as follows:

What should a good ROC curve look like? A rule of thumb in assessing ROC curves is that the “higher” the curve (i.e., the closer it gets to the point with coordinates (0,1)), and hence the larger the area under the curve, the better. You may also select one point on the ROC curve (the “best one” for our purpose) and use the corresponding false positive/false negative performance (and the corresponding threshold for P(1)) to assess your model.
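One way to draw an ROC curve and compute the area under it is with the pROC package; this sketch again assumes the hypothetical actual and probs vectors for the validation data.

```r
# ROC curve and area under the curve (AUC).
library(pROC)

roc_obj <- roc(response = actual, predictor = probs)
plot(roc_obj)  # the diagonal corresponds to a random classifier
auc(roc_obj)
```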

Questions:

  1. Which point on the ROC would you select?
  2. What classifier does the dotted 45-degree line correspond to? How does the ROC plot above show that the classifiers used here are superior to such a classifier?

Answers:

4. Gains chart

The gains chart is a popular technique in certain applications, such as direct marketing or credit risk.

For a concrete example, consider the case of a direct marketing mailing campaign. Say we have a classifier that attempts to identify the likely responders by assigning each case a probability of response. We may want to select as few cases as possible and still capture the maximum number of responders possible.

We can measure the percentage of all responses the classifier captures if we only select, say, x% of cases: the top x% in terms of the probability of response assigned by our classifier. For each percentage of cases we select (x), we can plot the following point: the x-coordinate will be the percentage of all cases that were selected, while the y-coordinate will be the percentage of all class 1 cases that were captured within the selected cases (i.e., the ratio true positives/positives of the classifier, assuming the classifier predicts class 1 for all the selected cases, and predicts class 0 for all the remaining cases). If we plot these points while we change the percentage of cases we select (x) (i.e., while we change the probability threshold of the classifier), we get a chart that is called the gains chart.
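A sketch of how the points of such a gains chart could be computed by hand, again with the hypothetical actual and probs vectors: sort the cases by predicted probability and track the cumulative share of class-1 cases captured.

```r
# Gains chart: % of class-1 cases captured vs. % of cases selected.
ord <- order(probs, decreasing = TRUE)
pct_selected <- 100 * seq_along(ord) / length(ord)
pct_captured <- 100 * cumsum(actual[ord]) / sum(actual)

plot(pct_selected, pct_captured, type = "l",
     xlab = "% of cases selected", ylab = "% of class-1 cases captured")
abline(0, 1, lty = 2)  # random-selection benchmark (45-degree line)
```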

In the credit card default case we are studying, the gains charts for the validation data for our classifiers are the following:

Notice that if we were to examine cases selecting them at random, instead of selecting the “best” ones using an informed classifier, the “random prediction” gains chart would be a straight 45-degree line.

Question:

Why?

Answer:

So what should a good gains chart look like? The further above this 45-degree reference line our gains curve is, the better the “gains”. Moreover, much like for the ROC curve, one can choose the percentage of all cases examined so that any desired point of the gains curve is selected.

Question:

Which point on the gains curve should we select in practice?

Answer:

5. Profit curve

Finally, we can generate the so called profit curve, which we often use to make our final decisions. The intuition is as follows. Consider a direct marketing campaign, and suppose it costs $1 to send an advertisement, and the expected profit from a person who responds positively is $45. Suppose you have a database of 1 million people to whom you could potentially send the promotion. Typical response rates are 0.05%. What fraction of the 1 million people should you send the promotion to?

To answer this type of question, we need to create the profit curve. We can compute an estimate of the total profit if we only select the top cases in terms of the probability of response assigned by our classifier. We can plot the profit curve by changing, as we did for the gains chart, the percentage of cases we select, and calculating the corresponding total estimated profit (or loss) we would generate. This is simply equal to:

Total Estimated Profit = (% of 1’s correctly predicted) × (value of capturing a 1) + (% of 0’s correctly predicted) × (value of capturing a 0) + (% of 1’s incorrectly predicted as 0) × (cost of missing a 1) + (% of 0’s incorrectly predicted as 1) × (cost of missing a 0)

Calculating the expected profit requires we have an estimate of the four costs/values: the value of capturing a 1 or a 0, and the cost of misclassifying a 1 into a 0 or vice versa.

Given the values and costs of correct classifications and misclassifications, we can plot the total estimated profit (or loss) as we change the percentage of cases we select, i.e., the probability threshold of the classifier, like we did for the ROC and the gains chart.

In our credit card default case, we consider the following business profit and loss to the credit card issuer for the correctly classified and misclassified customers:

Predict 1 (default) Predict 0 (no default)
Actual 1 (default) 0 -100000
Actual 0 (no default) 0 20000
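A sketch of how a profit curve based on this payoff matrix could be computed, using the same hypothetical actual and probs vectors; the selected (riskiest) cases are treated as predicted defaults and the rest as predicted non-defaults.

```r
# Profit curve: total estimated profit vs. % of cases selected as "default".
value_when_predicted_default <- 0        # actual 1 or 0, predicted 1
cost_missed_default          <- -100000  # actual 1, predicted 0
value_good_customer          <- 20000    # actual 0, predicted 0

ord <- order(probs, decreasing = TRUE)   # riskiest cases first
sorted_actual <- actual[ord]
n <- length(sorted_actual)

profit <- sapply(0:n, function(k) {
  not_selected <- if (k < n) sorted_actual[(k + 1):n] else integer(0)
  sum(ifelse(not_selected == 1, cost_missed_default, value_good_customer))
})

plot(100 * (0:n) / n, profit, type = "l",
     xlab = "% of cases selected (predicted to default)",
     ylab = "Total estimated profit")
```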

Based on these profit and cost estimates, the profit curves for the validation data for the classifiers are:

We can then select the percentage of selected cases that corresponds to the maximum estimated profit (or minimum loss, if necessary).

Question:

Which point on the profit curve would you select in practice?

Answer:

Notice that to maximize estimated profit we need the cost/profit for each of the four cases! These can be difficult to assess, hence we typically want to perform a sensitivity analysis on our cost/profit assumptions. For example, we can generate different profit curves (i.e., worst case, best case, average case scenarios) and see how much the best achievable profit varies and, most importantly, how our selection of the classification model and of the probability threshold corresponding to the best profit varies - as the classifier and the percentage of cases are what we eventually need to decide on.

Step 6: Test Accuracy

Having iterated steps 2-5 until we are satisfied with the performance of our selected model on the validation data, in this step the performance analysis outlined in step 5 needs to be done with the test sample. This is the performance that best mimics what one should expect in practice upon deployment of the classification solution, assuming (as always) that the data used for this performance analysis are representative of the situation in which the solution will be deployed.

Let’s see how the hit ratio, confusion matrix, ROC curve, gains chart, and profit curve look for our test data. For the hit ratio and the confusion matrix we use 50% as the probability threshold for classification.

Hit Ratio
Logistic Regression 81.40000
First CART 80.86667
Second CART 80.53333
Regularized Logistic Regression 81.56667
XGBoost 72.86667

The confusion matrix for the model with the best test data hit ratio above:

Predicted 1 (default) Predicted 0 (no default)
Actual 1 (default) 26.85 73.15
Actual 0 (no default) 2.91 97.09

ROC curves for the test data:

Gains charts for the test data:

Finally the profit curves for the test data, using the same profit/cost estimates as we did above:

Questions:

  1. Is the performance in the test data similar to the performance in the validation data above? Should we expect the performance of our classification model to be close to that in our test data when we deploy the model in practice? Why or why not? What should we do if they are different?
  2. Make a final assessment about what classifier you would use (out of the ones considered here) for this credit card default classification business problem, with what percentage of cases/probability threshold, and why. What is the business profit the company can achieve (as measured with the test data) based on your solution?
  3. How does your assessment depend on the values and costs of correct classifications and misclassifications (0, -100000, 0, and 20000, respectively)?
  4. What business decisions can the credit card issuer make based on this analysis?

Answers: