Question 1

The data might be normalized so that each variable is expressed on a common scale.

Accepted Answer

Normalization is a process of scaling the variables to a common scale to avoid bias towards variables with higher value ranges.
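As an illustration of the scaling described above, here is a minimal sketch of two common normalizations, z-scores and min-max rescaling. The function names and the sample income and age values are hypothetical and are not taken from the deck or its textbook.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Center on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical variables on very different scales (dollars vs. years).
income = [30000, 45000, 60000, 90000]
age = [25, 35, 45, 55]

# After rescaling, both variables lie in [0, 1], so neither dominates
# a distance calculation merely because of its units.
income_scaled = min_max(income)
age_scaled = min_max(age)
```

Either method puts the variables on a common scale; which one is appropriate depends on the technique that consumes the data.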

Question 2

Classification refers to a type of data mining problem that uses the information available in a set of independent variables to predict the value of a discrete, or categorical, dependent variable.

Accepted Answer

This statement correctly defines classification as a data mining problem that involves predicting the value of a categorical dependent variable based on independent variables.

Question 3

Exhibit 10.2
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2) or not-successful (Group 3) in their graduate studies. The officer has data on 20 current students, 7 successful (Group 1), 6 marginally successful (Group 2) and 7 not successful (Group 3).
Refer to Exhibit 10.2. What is the verbal test score value of the group centroid for group 3?
A) 697.71
B) 647.86
C) 587.67
D) 605.17

Accepted Answer

605.17

Question 4

Data mining tasks fall into three potential categories: Classification, Prediction and Association/Segmentation.

Accepted Answer

This statement is true. The three main categories of data mining tasks are classification, prediction, and association/segmentation.

Question 5

A classification tree is a graphical representation of a set of rules for classifying observations into one group.

Accepted Answer

The answer of A classification tree is a graphical representation...

Question 6

Affinity analysis is a data mining technique that attempts to discover
A) what goes with what
B) the relationship between independent variables
C) multicollinearity
D) causality

Accepted Answer

Affinity analysis is also known as market basket analysis, which attempts to discover what items are frequently purchased together. It is used in retail and e-commerce to identify associations between products and improve sales strategies.
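A minimal sketch of the "what goes with what" idea, counting how often pairs of items appear in the same basket. The transactions and the helper names `pair_support` and `confidence` are hypothetical; real market basket tools use more elaborate rule-mining algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Among baskets containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(baskets)
```

Here `support[("bread", "milk")]` is the share of baskets containing both items, and `confidence(baskets, "butter", "bread")` measures how often bread accompanies butter.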

Question 7

Suppose that an analyst classified a new record using the following sequential steps (i) find identical records in the training sample, (ii) determine a group, to which majority of these records belong, (iii) assign the new record to the group in step (ii). This technique is called

A) naive Bayes
B) Bayes
C) conditional probability estimation
D) posterior probability estimation

Accepted Answer

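The correct choice is B) Bayes. Finding the training records that match the new record exactly on every predictor, and assigning the new record to the majority group among them, is the exact (full) Bayes classifier. Naive Bayes differs: it assumes the predictors are conditionally independent within each group and multiplies per-predictor conditional probabilities, so it does not need identical records in the training sample.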
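The three steps listed in the question (find identical records, take the majority group, assign that group) can be sketched as follows. This matching-based procedure is usually called the exact, or full, Bayes classifier. The record layout, sample data, and function name are hypothetical.

```python
from collections import Counter


def exact_bayes_classify(training, new_record):
    """Classify by majority vote among training records identical to new_record.

    `training` is a list of (predictors, group) pairs, where `predictors`
    is a tuple of predictor values.
    """
    # Step (i): find training records with exactly the same predictor values.
    matches = [group for predictors, group in training if predictors == new_record]
    if not matches:
        # No identical record exists, which is the practical weakness of this method.
        return None
    # Steps (ii) and (iii): pick the majority group and assign it.
    return Counter(matches).most_common(1)[0][0]


# Hypothetical training sample with two categorical predictors.
training = [
    (("yes", "small"), 1),
    (("yes", "small"), 1),
    (("yes", "small"), 2),
    (("no", "large"), 2),
]

predicted = exact_bayes_classify(training, ("yes", "small"))
```

The `None` branch shows why the exact approach scales poorly: with many predictors, a new record rarely has an identical match in the training data.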

Question 8

Data mining is the process of finding and extracting useful information and insights from large data sets.

Accepted Answer

Data mining involves using statistical and computational techniques to analyze large data sets and extract useful information and insights. The goal is to find patterns, relationships, and trends that can be used to make informed decisions or predictions. Therefore, the statement that data mining is the process of finding and extracting useful information and insights from large data sets is true.

Question 9

The Fisher classification scores can be converted to
A) a linear function for each of the groups in the classification problem
B) probabilities of group membership
C) a uniform distribution
D) a half-space

Accepted Answer

The correct choice is B. The Fisher classification scores are the values of each group's linear classification function evaluated at an observation; the observation is assigned to the group with the highest score. The scores can be converted into probabilities of group membership by exponentiating each score and dividing by the sum of the exponentiated scores (the softmax function), which yields a probability distribution over the groups.
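A minimal sketch of the score-to-probability conversion mentioned above, assuming one classification score per group. The three score values are hypothetical and are not taken from Exhibit 10.2.

```python
import math


def scores_to_probabilities(scores):
    """Softmax: exponentiate each score and divide by the sum of exponentials."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical classification scores for groups 1, 2 and 3.
scores = [30.2, 31.5, 28.9]
probs = scores_to_probabilities(scores)

# The group with the highest score also has the highest probability.
predicted_group = probs.index(max(probs)) + 1
```

The probabilities always sum to 1, and the ranking of groups is the same as the ranking of the raw scores.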

Question 10

Exhibit 10.2
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2) or not-successful (Group 3) in their graduate studies. The officer has data on 20 current students, 7 successful (Group 1), 6 marginally successful (Group 2) and 7 not successful (Group 3).
Refer to Exhibit 10.2. What is the quantitative test score value of the group centroid for group 1?
A) 697.71
B) 647.86
C) 587.67
D) 650.43

Accepted Answer

The quantitative test score value of the group centroid for group 1 is the value under the "Q" column for Group 1, which is 697.71.

Question 11

Exhibit 10.2
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2) or not-successful (Group 3) in their graduate studies. The officer has data on 20 current students, 7 successful (Group 1), 6 marginally successful (Group 2) and 7 not successful (Group 3).
Refer to Exhibit 10.2. What percentage of observations is classified correctly?
A) 100%
B) 85.71%
C) 95%
D) 90%

Accepted Answer

The answer of Exhibit 10.2&#10;The following questions are based on...

Question 12

Exhibit 10.2
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2) or not-successful (Group 3) in their graduate studies. The officer has data on 20 current students, 7 successful (Group 1), 6 marginally successful (Group 2) and 7 not successful (Group 3).
Refer to Exhibit 10.2. What number of observations is classified correctly?
A) 19
B) 20
C) 7
D) 8

Accepted Answer

The answer of Exhibit 10.2&#10;The following questions are based on...

Question 13

Exhibit 10.2
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2) or not-successful (Group 3) in their graduate studies. The officer has data on 20 current students, 7 successful (Group 1), 6 marginally successful (Group 2) and 7 not successful (Group 3).
Refer to Exhibit 10.2. What is the verbal test score value of the group centroid for group 1?
A) 697.71
B) 647.86
C) 587.67
D) 650.43

Accepted Answer

The answer of Exhibit 10.2&#10;The following questions are based on...

Question 14

The Mahalanobis distance measure accounts for differences in the covariances between all possible pairings of the independent variables.

Accepted Answer

The answer of The Mahalanobis distance measure accounts for differences...
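To make the role of the covariances concrete, here is a minimal two-variable sketch of the Mahalanobis distance, d = sqrt((x - c)' S^-1 (x - c)), where c is a group centroid and S is the covariance matrix. The sample scores and function names are hypothetical; the two-variable restriction is only to keep the matrix inverse explicit.

```python
import math


def centroid_and_covariance(points):
    """Return the centroid and the 2x2 sample covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_2d(x, center, cov):
    """Mahalanobis distance from x to center for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # must be non-zero for the inverse to exist
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - center[0], x[1] - center[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)


# Hypothetical (verbal, quantitative) scores for one group.
group = [(600, 650), (620, 700), (640, 690), (660, 720)]
center, cov = centroid_and_covariance(group)
distance = mahalanobis_2d((650, 680), center, cov)
```

With an identity covariance matrix the measure reduces to ordinary Euclidean distance; the off-diagonal covariance term is what adjusts for correlated variables.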

Question 15

If using the regression tool for two-group discriminant analysis, in the regression dialog box, the Input X-Range entry corresponds to
A) the Group values.
B) the independent variable values.
C) the predicted variable values.
D) the fitted variable values.

Accepted Answer

The answer of If using the regression tool for two-group...

Question 16

Exhibit 10.1
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not-successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Refer to Exhibit 10.1. What percentage of the observations is classified incorrectly?
A) 90%
B) 80%
C) 85%
D) 15%

Accepted Answer

The answer of Exhibit 10.1&#10;The following questions are based on...

Question 17

In hierarchical clustering, the measure of similarity between clusters is/are
A) single linkage
B) complete linkage
C) Ward's method
D) all of the above

Accepted Answer

The answer of In hierarchical clustering, the measure of similarity...

Question 18

The Get Data command is part of the XLMiner Platform in Excel add-on.

Accepted Answer

The answer of &#8203;The Get Data command is part of...

Question 19

The k-nearest neighbor (k-NN) technique identifies the k observations in the training data that are most similar (or nearest) to a new observation we want to classify.

Accepted Answer

The answer of The k-nearest neighbor (k-NN) technique identifies the...
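The k-NN idea stated in the question can be sketched as follows: rank the training observations by distance to the new observation, keep the k closest, and take a majority vote. The training points, group labels, and function name are hypothetical, and Euclidean distance is an assumed choice of similarity measure.

```python
import math
from collections import Counter


def knn_classify(training, new_point, k):
    """Assign new_point to the majority group among its k nearest training points.

    `training` is a list of (point, group) pairs; distance is Euclidean.
    """
    by_distance = sorted(training, key=lambda rec: math.dist(rec[0], new_point))
    nearest_groups = [group for _, group in by_distance[:k]]
    return Counter(nearest_groups).most_common(1)[0][0]


# Hypothetical training data with two numeric predictors.
training = [
    ((1, 1), "A"),
    ((1, 2), "A"),
    ((2, 1), "A"),
    ((8, 8), "B"),
    ((9, 8), "B"),
]

label = knn_classify(training, (2, 2), k=3)
```

A small k makes the vote depend on very few neighbors, so the result tracks the quirks of the training sample; a larger k smooths the vote at the cost of blurring local structure.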

Question 20

When purity is perfect, the Gini index is equal to
A) 0
B) 0.25
C) 0.5
D) 1

Accepted Answer

The answer of When purity is perfect, the Gini index...
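The Gini index in this question can be computed directly from the class proportions in a node, using the standard definition 1 minus the sum of squared proportions. The function name and label lists below are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# A perfectly pure node: every observation belongs to the same group.
pure_node = [1, 1, 1, 1]

# A two-group node split evenly, the most impure case for two groups.
mixed_node = [1, 1, 2, 2]

pure_gini = gini_index(pure_node)
mixed_gini = gini_index(mixed_node)
```

For the pure node the single proportion is 1, so the index is 1 - 1 = 0; for the evenly mixed two-group node it is 1 - (0.25 + 0.25) = 0.5.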

Question 21

In discriminant analysis the averages for the independent variables for a group define the
A) centroid.
B) median.
C) mode.
D) central tendency.

Accepted Answer

The answer of In discriminant analysis the averages for the...

Question 22

One element in cleaning the data set in the mining process involves
A) removing unimportant variables
B) adding more variables to the data set
C) calculating the adjusted R²
D) calculating the coefficient of multiple correlation

Accepted Answer

The answer of One element in cleaning the data set...

Question 23

Discriminant analysis (DA) differs from most other predictive statistical methods because the dependent variable is
A) continuous
B) random
C) stochastic
D) discrete

Accepted Answer

The answer of Discriminant analysis (DA) differs from most other...

Question 24

Exhibit 10.1
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not-successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Refer to Exhibit 10.1. What is the quantitative test score value of the group centroid for group 2?
A) 683.8
B) 654.2
C) 610.7
D) 605.7

Accepted Answer

The answer of Exhibit 10.1&#10;The following questions are based on...

Question 25

Exhibit 10.1
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not-successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Refer to Exhibit 10.1. The university has received applications from several new students and would like to predict which group they would fall into. What is the discriminant score for a student with a Quantitative score of 686 and a Verbal score of 601? Use five (5) significant figures in your coefficients.
A) 1.29 ≤ discriminant score ≤ 1.30
B) 1.69 ≤ discriminant score ≤ 1.70
C) 2.69 ≤ discriminant score ≤ 2.70
D) 6.05 ≤ discriminant score ≤ 6.06

Accepted Answer

The answer of Exhibit 10.1&#10;The following questions are based on...

Question 26

The dependent variable   in the regression equation   represents
A) the estimated value of the dependent variable.
B) the estimated value of the Group variable.
C) the estimated ranking of the subject
D) all of these are true.

Accepted Answer

The answer of The dependent variable   in the...

Question 27

Useful data mining techniques can be found in Excel under the ___________ drop-down menu
A) Data/(Data Analysis)
B) Regression
C) Histogram
D) Insert/Chart

Accepted Answer

The answer of Useful data mining techniques can be found...

Question 28

In hierarchical clustering, the measure of similarity between clusters is/are
A) single linkage
B) average linkage
C) average group linkage
D) all of the above

Accepted Answer

The answer of In hierarchical clustering, the measure of similarity...

Question 29

The k-means clustering algorithm is available
A) in XLMiner Excel add-in
B) as a regression option in Excel
C) as ANOVA option in Excel
D) as one of the options under non-parametric statistics in Excel

Accepted Answer

The answer of The k-means clustering algorithm is available&#10;A) in...

Question 30

Overfitting refers to
A) placing too much emphasis on the sample-specific noise
B) fitting the model too tightly
C) fitting the model too loosely
D) underestimating model parameters

Accepted Answer

The answer of Overfitting refers to&#10;A) placing too much emphasis...

Question 31

Prediction step in data mining is an option available in
A) XLMiner Excel add-in
B) Excel
C) data mining software
D) nonlinear multivariate regression

Accepted Answer

The answer of Prediction step in data mining is an...

Question 32

Suppose that two variables are found to be significantly correlated. A researcher may
A) remove one variable from the data set
B) replace the two variables by their product
C) replace the two variables by their squared difference
D) remove both variables from the data set

Accepted Answer

The answer of Suppose that two variables are found to...

Question 33

Steps in the data mining process include the following (in sequence)
A) (identify opportunity), (collect data), (explore, understand and prepare data), (identify tasks and tools), (partition data), (build and evaluate models), (deploy models)
B) (identify opportunity), (collect data), (explore, understand and prepare data), (identify tasks and tools), (build and evaluate models), (deploy models)
C) (collect data), (explore, understand and prepare data), (identify tasks and tools), (partition data), (build and evaluate models), (deploy models)
D) (identify opportunity), (collect data), (identify tasks and tools), (partition data), (build and evaluate models), (deploy models)

Accepted Answer

The answer of Steps in the data mining process include...

Question 34

Suppose that the correlation coefficient between X₁ and X₂ is equal to -1. This means that
A) X₁ and X₂ are perfectly positively correlated
B) X₁ and X₂ are perfectly negatively correlated
C) X₁ and X₂ are weakly and positively correlated
D) X₁ and X₂ are weakly and negatively correlated

Accepted Answer

The answer of Suppose that the correlation coefficient between X₁...

Question 35

____________ is a classification technique that estimates the probability of an observation belonging to a particular group
A) logistic regression
B) binary regression
C) multivariate analysis
D) ANCOVA

Accepted Answer

The answer of ____________ is a classification technique that estimates...

Question 36

The discriminant score is denoted by
A)
B)
C) Y_i
D)

Accepted Answer

The answer of The discriminant score is denoted by&#10;A) ...

Question 37

The Fisher linear discriminant function
A) identifies a linear function for each of the groups in the classification problem
B) fits a nonlinear function for each of the groups in the classification problem
C) defines a hyperplane
D) defines a half-space

Accepted Answer

The answer of The Fisher linear discriminant function&#10;A) identifies a...

Question 38

In a two-group discriminant analysis problem using regression, why is the midpoint cut-off value used to determine group classification?
A) Because the value minimizes the absolute misclassification error.
B) Because the value minimizes the probability of misclassification error.
C) Because the value represents an equal division between the groups.
D) Because the value incorporates problem specific knowledge.

Accepted Answer

The answer of In a two-group discriminant analysis problem using...

Question 39

In the k nearest neighbor technique, a small value of k produces classifications that are
A) very sensitive to the sample-specific characteristics of the training data
B) not sensitive to the sample-specific characteristics of the training data
C) robust
D) reliable

Accepted Answer

The answer of In the k nearest neighbor technique, a...

Question 40

Suppose that there are 3 variables in a data set. Approximately how many data records are required using a rule of thumb discussed in the textbook?
A) 30 to 45
B) 20 to 30
C) 45 to 60
D) 50 to 100

Accepted Answer

The answer of Suppose that there are 3 variables in...

Question 41

The parameters of the logistic regression model
A) are derived through a nonlinear maximum likelihood estimation procedure
B) are negative
C) are positive
D) are negative fractions

Accepted Answer

The answer of The parameters of the logistic regression model&#10;A)...

Question 42

Exhibit 10.2
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2) or not-successful (Group 3) in their graduate studies. The officer has data on 20 current students, 7 successful (Group 1), 6 marginally successful (Group 2) and 7 not successful (Group 3).
Refer to Exhibit 10.2. What percentage of observations is classified incorrectly?
A) 5%
B) 15%
C) 95%
D) 90%

Accepted Answer

The answer of Exhibit 10.2&#10;The following questions are based on...

Question 43

Suppose that a data set contains a variable EDUCATION, which has 7 discrete levels. EDUCATION is an example of
A) a categorical variable
B) a classification variable
C) a continuous variable
D) an exponential variable

Accepted Answer

The answer of Suppose that a data set contains a...

Question 44

Logistic regression in XLMiner add-in can be used for ______ groups
A) 2
B) 3
C) 4
D) 5

Accepted Answer

The answer of Logistic regression in XLMiner add-in can be...

Question 45

Data mining tasks fall in the following categories
A) classification, prediction, association
B) categorization, segmentation
C) prediction, association, mining
D) observation, categorization, association

Accepted Answer

The answer of Data mining tasks fall in the following...

Question 46

Cluster analysis is a data mining technique used for
A) grouping together similar data
B) segmentation of records within a data set
C) designing effective marketing strategies
D) all of the above

Accepted Answer

The answer of Cluster analysis is a data mining technique...

Question 47

Neural networks classification methodology
A) is one of the options available in XLMiner Excel add-in
B) is superior to other classification schemes
C) is inferior to other classification schemes
D) produces results that are identical to the outcomes of the DA technique

Accepted Answer

The answer of Neural networks classification methodology&#10;A) is one of...

Question 48

Technique(s) used in classification step of data mining include
A) discriminant analysis
B) logistic regression
C) neural networks
D) all of the above

Accepted Answer

The answer of Technique(s) used in classification step of data...

Question 49

In the ________ step of data mining, a researcher attempts to form logical groupings of data in the set
A) classification
B) prediction
C) categorization
D) association/segmentation

Accepted Answer

The answer of In the ________ step of data mining,...

Question 50

Exhibit 10.1
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not-successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Refer to Exhibit 10.1. Based on the regression output, what is the discriminant score for a student with a quantitative score of 635 and a verbal score of 570?
A) 1.72 ≤ discriminant score ≤ 1.73
B) 2.02 ≤ discriminant score ≤ 2.03
C) 3.04 ≤ discriminant score ≤ 3.05
D) 6.12 ≤ discriminant score ≤ 6.14

Accepted Answer

The answer of Exhibit 10.1&#10;The following questions are based on...

Question 51

Logistic regression is a classification technique that
A) outperforms other techniques across a variety of data collections
B) is not reliable
C) is not robust
D) is not feasible for most data sets

Accepted Answer

The answer of Logistic regression is a classification technique that&#10;A)...

Question 52

___________ and _________ must be chosen each time a partition is subdivided
A) an independent variable and splitting value
B) a dependent variable and cutoff value
C) a significance level and an upper bound
D) a significance level and a lower bound

Accepted Answer

The answer of ___________ and _________ must be chosen each...

Question 53

A graphical representation of a set of rules for classifying observations into 2 or more groups is called
A) a classification tree
B) a binary tree
C) a Pareto diagram
D) a branch-and-bound tree

Accepted Answer

The answer of A graphical representation of a set of...

Question 54

Given the following confusion matrix   what is the correct classification rate?
A) 9/13 = 69%
B) 10/14 = 86%
C) 19/25 = 76%
D) 6/19 = 32%

Accepted Answer

The answer of Given the following confusion matrix  ...

Question 55

Plots useful in data mining analysis can be accessed in Excel using the _______ add-in
A) XLMiner
B) Charts
C) Data Analysis
D) Visual Basic

Accepted Answer

The answer of Plots useful in data mining analysis can...

Question 56

Neural networks are
A) a pattern recognition technique
B) a physical model representation of interrelationships
C) a non-directed graph
D) a binary network

Accepted Answer

The answer of Neural networks are&#10;A) a pattern recognition technique&#10;B)...

Question 57

Suppose that all observations belong to the same class. The entropy measure for this situation is equal to
A) 0
B) 0.25
C) 0.5
D) 1

Accepted Answer

The answer of Suppose that all observations belong to the...
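The entropy measure in this question can be checked numerically with the standard definition, the negative sum of p * log2(p) over the class proportions p. The function name and label lists are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over the class proportions."""
    n = len(labels)
    return sum(-(count / n) * math.log2(count / n) for count in Counter(labels).values())


# Every observation in the same class: there is no uncertainty.
single_class = ["yes", "yes", "yes", "yes"]

# Two classes in equal proportion: maximum uncertainty for two classes.
even_split = ["yes", "yes", "no", "no"]

single_entropy = entropy(single_class)
even_entropy = entropy(even_split)
```

When all observations share one class the only proportion is 1 and log2(1) = 0, so the entropy is 0; an even two-class split gives 1 bit.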

Question 58

Two approaches to clustering discussed in the text are
A) k-means clustering and hierarchical clustering
B) intuitive clustering and methodical clustering
C) MIPS clustering and pixel clustering
D) centroid clustering and VMI

Accepted Answer

The answer of Two approaches to clustering discussed in the...

Question 59

Exhibit 10.2
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2) or not-successful (Group 3) in their graduate studies. The officer has data on 20 current students, 7 successful (Group 1), 6 marginally successful (Group 2) and 7 not successful (Group 3).
Refer to Exhibit 10.2. What number of observations is classified incorrectly?
A) 19
B) 20
C) 7
D) 1

Accepted Answer

The answer of Exhibit 10.2&#10;The following questions are based on...

Question 60

Suppose that a data set contains a variable EDUCATION, which has 7 discrete levels. EDUCATION can be represented by ____ binary variables
A) 6
B) 7
C) 8
D) 9

Accepted Answer

The answer of Suppose that a data set contains a...

Question 61

Exhibit 10.1
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not-successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Refer to Exhibit 10.1. What is the verbal test score value of the group centroid for group 2?
A) 683.8
B) 654.2
C) 610.7
D) 605.7

Accepted Answer

The answer of Exhibit 10.1&#10;The following questions are based on...

Question 62

Exhibit 10.1
The following questions are based on the problem description and the output below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not-successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Refer to Exhibit 10.1. What is the quantitative test score value of the group centroid for group 1?
A) 683.8
B) 654.2
C) 610.7
D) 605.7

Accepted Answer

The answer of Exhibit 10.1&#10;The following questions are based on...

Question 63

Suppose that the correlation coefficient between X₁ and X₂ is equal to 1. This means that
A) X₁ and X₂ are perfectly positively correlated
B) X₁ and X₂ are perfectly negatively correlated
C) X₁ and X₂ are weakly and positively correlated
D) X₁ and X₂ are weakly and negatively correlated

Accepted Answer

The answer of Suppose that the correlation coefficient between X₁...

Question 64

A graphical representation of clustering outcomes showing which items should be classified to which clusters is called a(n)
A) dendrogram
B) hierarchical chart
C) horizontal multi layer chart
D) vertical bar chart

Accepted Answer

The answer of A graphical representation of clustering outcomes showing...

Deck 10: Data Mining