Question 1

Use the three-class confusion matrix below to answer questions 1 through 3.

How many class 2 instances are in the dataset?

Accepted Answer

23
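The confusion matrix this question refers to is not reproduced here, so the following is only a minimal sketch of how such counts are usually read off. The matrix values are hypothetical, and the sketch assumes rows are actual classes and columns are computed classes. All function names are illustrative.

```python
# Hypothetical three-class confusion matrix (NOT the one from the question).
# Assumed convention: rows = actual class, columns = computed class.
matrix = [
    [12, 3, 1],   # actual class 1
    [4, 20, 2],   # actual class 2
    [2, 1, 5],    # actual class 3
]


def instances_in_class(m, k):
    """Number of instances whose actual class is k (1-based): the sum of row k."""
    return sum(m[k - 1])


def percent_correct(m):
    """Percentage of instances that lie on the main diagonal."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return 100.0 * correct / total


def wrongly_assigned_to(m, k):
    """Instances classified as class k whose actual class differs: column k minus its diagonal cell."""
    column_total = sum(row[k - 1] for row in m)
    return column_total - m[k - 1][k - 1]


print(instances_in_class(matrix, 2))   # row 2 total
print(percent_correct(matrix))         # diagonal share of all instances
print(wrongly_assigned_to(matrix, 3))  # off-diagonal entries in column 3
```

If the source matrix uses the opposite convention (columns as actual classes), the row and column sums swap roles.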

Question 2

The average positive difference between computed and desired outcome values.
A) root mean squared error
B) mean squared error
C) mean absolute error
D) mean positive error

Accepted Answer

The mean absolute error (MAE) is the average absolute difference between computed and desired outcome values. Because the sign of each error is ignored, every error contributes in proportion to its magnitude. The root mean squared error (RMSE) and mean squared error (MSE) are other commonly used error metrics, but they square the error terms, which gives more weight to larger errors and penalizes models more severely for large outliers. Therefore, the best choice is C) mean absolute error.
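To make the difference between the three metrics concrete, here is a small sketch that computes each one for the same predictions. The data values and function names are made up for illustration.

```python
import math


def mean_absolute_error(desired, computed):
    """Average of |desired - computed|; the sign of each error is ignored."""
    return sum(abs(d - c) for d, c in zip(desired, computed)) / len(desired)


def mean_squared_error(desired, computed):
    """Average of squared errors; large errors weigh more heavily."""
    return sum((d - c) ** 2 for d, c in zip(desired, computed)) / len(desired)


def root_mean_squared_error(desired, computed):
    """Square root of the MSE, expressed in the same units as the output."""
    return math.sqrt(mean_squared_error(desired, computed))


# Hypothetical desired and computed outputs; the errors are 1, 0, -2, and 4.
desired = [3.0, 5.0, 2.0, 8.0]
computed = [2.0, 5.0, 4.0, 4.0]

mae = mean_absolute_error(desired, computed)       # (1 + 0 + 2 + 4) / 4
mse = mean_squared_error(desired, computed)        # (1 + 0 + 4 + 16) / 4
rmse = root_mean_squared_error(desired, computed)  # sqrt(mse)
print(mae, mse, rmse)
```

In this example the single error of 4 accounts for 16 of the 21 squared-error units but only 4 of the 7 absolute-error units, which is why the squared metrics are more sensitive to outliers.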

Question 3

Another name for an output attribute.
A) predictive variable
B) independent variable
C) estimated variable
D) dependent variable

Accepted Answer

Output attributes are also known as dependent variables, as they depend on the input attributes or independent variables. Therefore, the best choice for another name for an output attribute is D) dependent variable.

Question 4

Classification problems are distinguished from estimation problems in that
A) classification problems require the output attribute to be numeric.
B) classification problems require the output attribute to be categorical.
C) classification problems do not allow an output attribute.
D) classification problems are designed to predict future outcome.

Accepted Answer

Classification problems require the output attribute to be categorical (nominal), while in estimation problems the output attribute is numeric, so the correct choice is B. Choice A is incorrect because a numeric output attribute describes estimation, not classification. Choice C is incorrect because classification needs an output attribute. Choice D is incorrect because both classification and estimation models can be used to predict future outcomes, so this does not distinguish them.

Question 5

Which statement about outliers is true?
A) Outliers should be identified and removed from a dataset.
B) Outliers should be part of the training dataset but should not be present in the test data.
C) Outliers should be part of the test dataset but should not be present in the training data.
D) The nature of the problem determines how outliers are used.
E) More than one of a, b, c, or d is true.

Accepted Answer

The correct choice is D: the nature of the problem determines how outliers should be treated, and there is no one-size-fits-all rule. In some cases, outliers are genuine data points that matter to the analysis, while in others they are errors or anomalies that need to be removed. It is therefore crucial to understand the context of the problem before deciding how to handle outliers.

Question 6

Use the confusion matrix for Model X and confusion matrix for Model Y to answer questions 4 through 6.

Compute the lift for Model Y.

Question 7

Use the confusion matrix for Model X and confusion matrix for Model Y to answer questions 4 through 6.

You will notice that the lift for both models is the same. Assume that the cost of a false reject is significantly higher than the cost of a false accept. Which model is the better choice?

Question 8

Assume that we have a dataset containing information about 200 individuals. One hundred of these individuals have purchased life insurance. A supervised data mining session has discovered the following rule:

IF age < 30 & credit card insurance = yes
THEN life insurance = yes
Rule Accuracy: 70%
Rule Coverage: 63%

How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old?

A) 63
B) 70
C) 30
D) 27

Accepted Answer

Rule coverage is the percentage of the class life insurance = yes that satisfies the IF condition. That class has 100 members, so 63 of them have credit card insurance and are less than 30 years old.
Rule accuracy is the percentage of all individuals satisfying the IF condition who also satisfy the THEN condition. If n individuals satisfy the IF condition, then 63 / n = 0.70, so n = 90.
The remaining 90 - 63 = 27 individuals have credit card insurance and are less than 30 years old but belong to the class life insurance = no. Thus, the answer is D) 27.
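The arithmetic behind answer D can be checked with a short sketch. It assumes that rule coverage is the share of the target class that satisfies the IF condition, and that rule accuracy is the share of IF-condition matches that belong to the target class. The function and parameter names are illustrative.

```python
def antecedent_counts(class_size, coverage_pct, accuracy_pct):
    """Return (matches inside the class, all matches, matches outside the class).

    Assumed definitions:
      coverage = matches inside the class / class size
      accuracy = matches inside the class / all matches
    """
    inside = class_size * coverage_pct / 100      # matches inside the target class
    all_matches = inside * 100 / accuracy_pct     # everyone satisfying the IF condition
    outside = all_matches - inside                # matches outside the target class
    return round(inside), round(all_matches), round(outside)


# Values from the question: 100 life insurance = yes instances,
# rule coverage 63%, rule accuracy 70%.
inside, all_matches, outside = antecedent_counts(100, 63, 70)
print(inside, all_matches, outside)
```

The third value is the number of individuals under 30 with credit card insurance who fall in the class life insurance = no.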

Question 9

Use the three-class confusion matrix below to answer questions 1 through 3.

What percent of the instances were correctly classified?

Question 10

Which statement is true about neural network and linear regression models?
A) Both models require input attributes to be numeric.
B) Both models require numeric attributes to range between 0 and 1.
C) The output of both models is a categorical attribute value.
D) Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
E) More than one of a, b, c, or d is true.

Question 11

Unlike traditional production rules, association rules
A) allow the same variable to be an input attribute in one rule and an output attribute in another rule.
B) allow more than one input attribute in a single rule.
C) require input attributes to take on numeric values.
D) require each rule to have exactly one categorical output attribute.

Question 12

Which statement is true about prediction problems?
A) The output attribute must be categorical.
B) The output attribute must be numeric.
C) The resultant model is designed to determine future outcomes.
D) The resultant model is designed to classify current behavior.

Question 13

Use the confusion matrix for Model X and confusion matrix for Model Y to answer questions 4 through 6.

How many instances were classified as an accept by Model X?

Question 14

Use the three-class confusion matrix below to answer questions 1 through 3.

How many instances were incorrectly classified with class 3?

Question 15

Which of the following is a common use of unsupervised clustering?
A) detect outliers
B) determine a best set of input attributes for supervised learning
C) evaluate the likely performance of a supervised learner model
D) determine if meaningful relationships can be found in a dataset
E) All of a, b, c, and d are common uses of unsupervised clustering.

Question 16

Given desired class C and population P, lift is defined as
A) the probability of class C given population P divided by the probability of C given a sample taken from the population.
B) the probability of population P given a sample taken from P.
C) the probability of class C given a sample taken from population P.
D) the probability of class C given a sample taken from population P divided by the probability of C within the entire population P.

Deck 2: Data Mining: a Closer Look