Deck 9: Predictive Data Mining
1
Misclassifying an actual __________ observation as a(n) __________ observation is known as a false positive.
A)Class 0, Class 1
B)Class 1, Class 0
C)error, accuracy
D)false, true
Class 0, Class 1
2
__________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.
A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment
Data partitioning
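A minimal sketch of a three-way partition, for illustration only. The function name `partition`, the 50/30/20 split, and the fixed shuffle seed are assumptions made here, not values given in the deck.

```python
import random

def partition(records, train_pct=50, valid_pct=30, seed=0):
    """Shuffle records and split them into training, validation, and test sets.

    The percentages and seed are illustrative assumptions; the test set
    receives whatever remains after the first two cuts.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_valid = len(shuffled) * valid_pct // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# 100 hypothetical record IDs split 50/30/20.
train, valid, test = partition(range(100))
```

Every record lands in exactly one of the three sets, so performance measured on the validation or test set is not contaminated by records used to build the model.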
3
A(n) __________ is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables.
A)record
B)data point
C)classification
D)location
record
4
The percent of misclassified records out of the total records in the validation data is known as the
A)overall error rate.
B)error.
C)accuracy.
D)class.
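The calculation described in this card can be sketched as follows. The label lists are invented for illustration, and the helper name `overall_error_rate` is an assumption.

```python
def overall_error_rate(actual, predicted):
    """Fraction of records whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    misclassified = sum(a != p for a, p in zip(actual, predicted))
    return misclassified / len(actual)

# Hypothetical validation labels, invented for illustration.
actual    = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]
error_rate = overall_error_rate(actual, predicted)  # 2 of 8 records differ
```

The function returns a fraction; multiply by 100 to express it as a percent.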
5
As we increase the cutoff value, _______ error will decrease and _________ error will rise.
A)Class 0, Class 1
B)Class 1, Class 0
C)false, true
D)None of these are correct.
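To see how the two class-specific error rates respond to the cutoff, here is a small sketch on made-up data. The probabilities, the labels, and the convention that a record is Class 1 when its probability is at least the cutoff are all assumptions.

```python
def class_error_rates(actual, probs, cutoff):
    """Return (Class 0 error rate, Class 1 error rate) at a given cutoff.

    A record is classified as Class 1 when its estimated probability
    is at least the cutoff (an assumed tie-breaking convention).
    """
    predicted = [1 if p >= cutoff else 0 for p in probs]
    n0 = sum(a == 0 for a in actual)
    n1 = sum(a == 1 for a in actual)
    false_pos = sum(a == 0 and y == 1 for a, y in zip(actual, predicted))
    false_neg = sum(a == 1 and y == 0 for a, y in zip(actual, predicted))
    return false_pos / n0, false_neg / n1

# Hypothetical labels and estimated Class 1 probabilities.
actual = [0, 0, 0, 0, 1, 1, 1, 1]
probs  = [0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.8, 0.9]

low = class_error_rates(actual, probs, cutoff=0.35)
high = class_error_rates(actual, probs, cutoff=0.75)
```

Comparing `low` and `high` shows the trade-off on this toy data: a higher cutoff labels fewer records as Class 1, which shifts errors from one class to the other.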
6
__________ involves descriptive statistics, data visualization, and clustering.
A)Data exploration
B)Data partitioning
C)Data preparation
D)Model assessment
7
__________ is one minus the Class 0 error rate.
A)Sensitivity
B)Specificity
C)Accuracy
D)Cutoff value
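Both "one minus a class error rate" measures can be computed side by side, as in this sketch. The labels are hypothetical and the helper name is an assumption; Class 1 is treated as the positive class.

```python
def sensitivity_specificity(actual, predicted):
    """Return (1 - Class 1 error rate, 1 - Class 0 error rate)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # Class 1 correctly identified
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # Class 1 missed
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # Class 0 correctly identified
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # Class 0 flagged as Class 1
    class1_error = fn / (tp + fn)
    class0_error = fp / (tn + fp)
    return 1 - class1_error, 1 - class0_error

# Hypothetical labels, invented for illustration.
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
sensitivity, specificity = sensitivity_specificity(actual, predicted)
```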
8
__________ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.
A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment
9
__________ is the step in data mining that includes addressing missing and erroneous data, reducing the number of variables, defining new variables, and data exploration.
A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment
10
__________ is NOT a step of the data mining process.
A)Data sampling
B)Data partitioning
C)Model construction
D)Supervised learning
11
Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of
A)classification of a categorical outcome.
B)estimation of a continuous outcome.
C)prediction of a categorical outcome.
D)unsupervised learning.
12
A characteristic or quantity of interest that can take on different values is a(n)
A)variable.
B)observation.
C)record.
D)quality.
13
__________ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.
A)Supervised learning
B)Unsupervised learning
C)Dimension reduction
D)Data sampling
14
Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of
A)data exploration.
B)data partitioning.
C)data preparation.
D)model assessment.
15
Data mining methods for classifying or estimating an outcome based on a set of input variables are referred to as
A)supervised learning.
B)unsupervised learning.
C)dimension reduction.
D)data sampling.
16
The set of recorded values of variables associated with a single entity is a(n)
A)observation.
B)data point.
C)classification.
D)location.
17
Estimation methods are also referred to as
A)prediction methods.
B)clustering methods.
C)association methods.
D)supervised methods.
18
Data used to build a data mining model is called
A)validation data.
B)training data.
C)test data.
D)exploration data.
19
Classifying a record as belonging to one class when it belongs to another class is referred to as a(n)
A)overall error rate.
B)error.
C)accuracy.
D)class.
20
__________ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data mining process.
A)Data sampling
B)Data partitioning
C)Model construction
D)Model assessment
21
A __________ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.
A)regression tree
B)scatter chart
C)classification tree
D)confusion matrix
22
The x-axis of a lift chart shows
A)the number of actual Class 1 records identified.
B)the ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.
23
A(n) __________ matrix displays a model's correct and incorrect classification.
A)cumulative lift
B)confusion
C)decile-wise lift chart
D)ROC curve
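A table of correct and incorrect classifications can be tallied directly from paired labels, as in this sketch. The data are hypothetical, and the row and column ordering (actual classes as rows, predicted classes as columns, Class 1 first) is an assumed layout.

```python
from collections import Counter

def classification_counts(actual, predicted):
    """Count records for each (actual class, predicted class) pair."""
    counts = Counter(zip(actual, predicted))
    # Rows are actual classes, columns are predicted classes, ordered 1 then 0.
    return [[counts[(1, 1)], counts[(1, 0)]],
            [counts[(0, 1)], counts[(0, 0)]]]

# Hypothetical labels, invented for illustration.
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
matrix = classification_counts(actual, predicted)
```

The diagonal entries hold the correctly classified records; the off-diagonal entries hold the two kinds of misclassification.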
24
_________ attempts to classify a categorical outcome as a linear function of explanatory variables.
A)Linear regression
B)Logistic regression
C)Classification model
D)Supervised learning
25
How many Class 1's are correctly classified as Class 1 in the table below?
A)221
B)100
C)30
D)3,000
26
Which of the following is a commonly used supervised learning method?
A)k-means clustering
B)k-nearest neighbors
C)hierarchical clustering
D)association rule development
27
The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for
A)regression trees.
B)time-series plots.
C)classification trees.
D)cumulative lift charts.
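Variance-based impurity can be illustrated with a short sketch. The outcome values are invented, and scoring a split by the size-weighted average of the two group variances is one common convention assumed here, not a rule stated in the deck.

```python
from statistics import pvariance

def split_impurity(left, right):
    """Size-weighted average of the outcome variances in two groups."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n

# Hypothetical continuous outcomes, ordered by some input variable.
outcomes = [10, 12, 11, 30, 32, 31]
before = pvariance(outcomes)                         # impurity with no split
after = split_impurity(outcomes[:3], outcomes[3:])   # impurity after one split
```

A split that separates the low outcomes from the high ones leaves each group far more homogeneous, so `after` is much smaller than `before`.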
28
How many Class 1's are incorrectly classified as Class 0?
A)221
B)100
C)30
D)3,000
29
An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed a(n)
A)false negative.
B)false positive.
C)residual.
D)outlier.
30
The y-axis of a decile chart shows
A)number of important class records identified.
B)ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.
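The quantity plotted for each decile can be computed as in the sketch below. The labels and probabilities are hypothetical, and the code assumes the number of records is a multiple of 10 so the deciles are equal in size.

```python
def decile_lift(actual, probs):
    """Ratio of each decile's mean outcome to the overall mean outcome.

    Records are sorted by decreasing estimated probability and cut into
    ten equal groups; assumes the record count is a multiple of 10.
    """
    ranked = [a for _, a in sorted(zip(probs, actual),
                                   key=lambda t: t[0], reverse=True)]
    size = len(ranked) // 10
    overall_mean = sum(ranked) / len(ranked)
    return [sum(ranked[i * size:(i + 1) * size]) / size / overall_mean
            for i in range(10)]

# 20 hypothetical records: distinct probabilities, 5 actual Class 1 records.
probs = [(20 - i) / 20 for i in range(20)]
actual = [1, 1, 1, 0, 1, 0, 1, 0] + [0] * 12
lifts = decile_lift(actual, probs)
```

A value above 1 for the top deciles means the model concentrates Class 1 records among its highest-probability predictions.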
31
__________ compares the number of actual Class 1 observations identified if considered in decreasing order of their estimated probability of being in Class 1 with the number identified if randomly classified.
A)Cumulative lift
B)Confusion
C)Decile-wise lift chart
D)ROC curve
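The comparison described in this card can be sketched as a running count. The data are hypothetical, and the random baseline is taken to be the number of records considered times the overall Class 1 rate, which is an assumption about how the reference line is drawn.

```python
def cumulative_counts(actual, probs):
    """For the top n records by estimated probability, return tuples of
    (n, Class 1 records found by the model, Class 1 records expected at random)."""
    ranked = [a for _, a in sorted(zip(probs, actual),
                                   key=lambda t: t[0], reverse=True)]
    base_rate = sum(ranked) / len(ranked)
    found = 0
    curve = []
    for n, a in enumerate(ranked, start=1):
        found += a
        curve.append((n, found, n * base_rate))
    return curve

# Hypothetical labels and estimated Class 1 probabilities.
actual = [1, 1, 0, 1, 0, 1, 0, 0]
probs  = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
curve = cumulative_counts(actual, probs)
```

Both counts meet at the last record, since considering every observation finds every Class 1 record regardless of ordering.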
32
__________ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.
A)Underfitting
B)Overfitting
C)Oversampling
D)Undersampling
33
A test set is the data set used to
A)build the data mining model.
B)estimate performance of candidate models on unseen data.
C)estimate performance of the final model on unseen data.
D)show counts of actual versus predicted class values.
34
One minus the overall error rate is often referred to as the __________ of the model.
A)sensitivity
B)accuracy
C)specificity
D)cutoff value
35
Separate error rates with respect to the false negative and false positive cases are computed to take into account the
A)asymmetric costs in misclassification.
B)symmetric weights of these two cases.
C)distortions due to outliers.
D)effect of sampling error.
36
__________ is a measure of the heterogeneity of observations in a classification tree.
A)Sensitivity
B)Specificity
C)Accuracy
D)Impurity
37
__________ is a generalization of linear regression for predicting a categorical outcome variable.
A)Multiple linear regression
B)Logistic regression
C)Discriminant analysis
D)Cluster analysis
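As a rough illustration of how a linear function of the inputs can be tied to a categorical outcome, the sketch below passes a linear combination through the logistic function to get a Class 1 probability. The coefficients and inputs are made up, and the helper name is an assumption.

```python
import math

def logistic_probability(intercept, coefs, x):
    """Estimated probability of Class 1: 1 / (1 + exp(-(b0 + b . x)))."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients and inputs chosen so the log odds equal zero.
p = logistic_probability(-1.0, [0.5, 0.25], [2.0, 0.0])
```

Log odds of zero correspond to a probability of 0.5; comparing the probability with a cutoff value then turns it into a class label.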
38
In the k-nearest neighbors method, when the value of k is set to 1
A)the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
B)the new observation's class is naïvely assigned to the most common class in the training set.
C)the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set.
D)the classification or prediction of a new observation is subject to the smallest possible classification error.
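A minimal sketch of the k = 1 case, assuming Euclidean distance as the similarity measure. The training records and the function name are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_neighbor_class(training, new_point):
    """Classify new_point with k = 1 by copying the class of the closest record.

    `training` is a list of (features, label) pairs; Euclidean distance is
    an assumed similarity measure.
    """
    _, label = min(training, key=lambda rec: math.dist(rec[0], new_point))
    return label

# Hypothetical training records: two per class.
training = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((5.0, 5.0), 1), ((6.0, 5.5), 1)]
predicted = nearest_neighbor_class(training, (5.2, 4.8))
```

With k = 1 the prediction depends entirely on one training record, so a single unusual or mislabeled neighbor can change the result.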