Deck 16: Introduction to Data Mining
1
According to the output below, which of the following statements is true about the correlation between stock price and earnings per share (EPS)?
A) The correlation is negative.
B) The correlation is not significantly different from zero.
C) The correlation is positive and significantly different from zero.
D) The correlation is positive but not significantly different from zero.
E) Cannot be determined from the information given.

The correlation is positive and significantly different from zero.
2
Data were collected for a sample of 12 pharmacists to determine if years of experience and salary are related. Based on the results below, the calculated t-statistic to test whether the regression slope is significant is
A) 10.99
B) 47.97
C) 31.2
D) 6.93
E) 5.58485
6.93
3
In a regression analysis predicting tourism revenue ($billion) using number of foreign visitors (million), the P-value for the calculated test statistic is 0.006. At the 0.05 level of significance we
A) reject the null hypothesis.
B) do not reject the null hypothesis.
C) conclude that the number of foreign visitors is significant in explaining tourism revenue.
D) Both A and C.
E) Both B and C.
Both A and C.
4
According to the results below, what is the correlation between stock price and EPS? 
A) -0.975
B) 0.906
C) 0.950
D) 0.975
E) Cannot be determined from the information given.

5
Based on the regression output and residual plot below, which of the following is true?
Regression Analysis: Technology Adoption versus Time
The regression equation is:
Technology Adoption = - 11.9 + 3.37 Time
S = 6.30783 R-Sq = 82.5%
Durbin-Watson statistic = 0.278634
A) The linear model explains 82.5% of the variability in technology adoption.
B) The linear model is appropriate.
C) The linear model is not appropriate.
D) Both A and B.
E) Both A and C.
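The output reports a Durbin-Watson statistic, which compares the squared changes between successive residuals with the sum of squared residuals. Values near 2 suggest little first-order autocorrelation; values well below 2 suggest positive autocorrelation. A minimal sketch in Python, using made-up residual series (the function name and both lists are illustrative assumptions, not data from this question):

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals given in time order."""
    # Numerator: squared changes between consecutive residuals.
    numerator = sum(
        (current - previous) ** 2
        for previous, current in zip(residuals, residuals[1:])
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, invented for illustration only.
drifting = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # neighbours resemble each other
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # neighbours flip sign

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(round(dw_drifting, 3), round(dw_alternating, 3))
```

The drifting series lands far below 2 and the alternating series far above it, which is the contrast the statistic is designed to expose.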
6
From the plots of residuals shown below, which assumption appears to be violated?

A) Equal Variance
B) Linearity
C) Normality
D) Independence
E) None; all appear to be satisfied.


7
According to the partial regression analysis output below, what is the t-statistic to test whether the regression slope is significant? 
A) 6.20
B) 13.88
C) 0.07917
D) 2.58307
E) 3.73

8
Regression analysis was performed to develop a model for predicting a firm's Price-Earnings Ratio (PE) based on Growth Rate, Profit Margin, and whether or not the firm is Green (1 = Yes, 0 = No). Based on the F-statistic of 26.48, which has a p-value of 0.000, we can conclude at α = .05 that
A) the regression equation is not significant.
B) all independent variables in the model are significant.
C) the regression equation is significant.
D) none of the independent variables in the model are significant.
E) both B and C.
9
Data were collected for a sample of 12 pharmacists to determine if years of experience and salary are related. A regression was run with the dependent variable Salary (thousands of dollars) and independent variable Experience (years). Suppose the P-value associated with the calculated t-statistic is < .001. At the .05 level of significance we
A) reject the null hypothesis.
B) do not reject the null hypothesis.
C) conclude that years of experience is significant in explaining pharmacists' salary.
D) Both A and C.
E) Both B and C.
10
Using this regression equation: Salary = 37.2 + 1.49 Years' Experience to predict salary for pharmacists with 10 years of experience gives the following results. Which of the following is true?
A) 95% of pharmacists with 10 years of experience earn between $38,960 and $65,130.
B) 95% of pharmacists with 10 years of experience earn between $48,010 and $56,080.
C) We are 95% confident that a particular pharmacist who has 10 years of experience earns between $38,960 and $65,130.
D) We are 95% confident that a particular pharmacist who has 10 years of experience earns between $48,010 and $56,080.
E) 95% of pharmacists with 10 years of experience on average earn between $48,010 and $56,080.
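The point prediction behind these intervals can be checked by substituting 10 years into the equation. The sketch below assumes salary is measured in thousands of dollars, as in the related pharmacist questions, and also computes the midpoint of each interval quoted in the options; the function and variable names are illustrative.

```python
def predict_salary(years_experience):
    """Predicted salary (assumed to be in thousands of dollars)."""
    return 37.2 + 1.49 * years_experience


point_prediction = predict_salary(10)

# Midpoints of the two intervals quoted in the options, in thousands of dollars.
wide_midpoint = (38.960 + 65.130) / 2
narrow_midpoint = (48.010 + 56.080) / 2

print(round(point_prediction, 1), round(wide_midpoint, 3), round(narrow_midpoint, 3))
```

Both intervals are centered close to the point prediction. They differ only in width: the interval for one individual's salary is wider than the interval for the mean salary at the same experience level.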
11
According to the regression analysis output below, how much of the variability in tourism revenue is accounted for by the number of foreign visitors? 
A) 63.4%
B) 13.8%
C) $2.58 billion
D) 21.464%
E) $3.73 billion

12
A regression model: PE = 8.04 + 0.747 Growth Rate + 0.0516 Profit Margin + 2.09 Green was developed to predict a firm's Price-Earnings Ratio (PE) using Growth Rate, Profit Margin, and whether the firm is Green (1 = Yes, 0 = No). Which of the following is the correct interpretation for the regression coefficient of Green?
A) The regression coefficient indicates that the PE ratio of a firm that is green will, on average, be 2.09 higher than a firm that is not green with the same growth rate and profit margin.
B) The regression coefficient indicates that the PE ratio of a firm that is green will, on average, be 2.09 lower than a firm that is not green with the same growth rate and profit margin.
C) The regression coefficient indicates that the PE ratio of a firm that is green will, on average, be 2.09 times higher than a firm that is not green with the same growth rate and profit margin.
D) The regression coefficient indicates that the PE ratio of a firm that is green will, on average, be 2.09 times lower than a firm that is not green with the same growth rate and profit margin.
E) The regression coefficient is not significantly different from zero.
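One way to see what an indicator (0/1) coefficient does is to evaluate the fitted equation twice with the other predictors held fixed. The sketch below uses the equation from the question; the growth rate and profit margin values are hypothetical and chosen only for illustration.

```python
def predict_pe(growth_rate, profit_margin, green):
    """Fitted PE ratio from the equation in the question (green is 1 or 0)."""
    return 8.04 + 0.747 * growth_rate + 0.0516 * profit_margin + 2.09 * green


# Hypothetical predictor values, held the same for both firms.
growth_rate, profit_margin = 10.0, 15.0

pe_green = predict_pe(growth_rate, profit_margin, 1)
pe_not_green = predict_pe(growth_rate, profit_margin, 0)
difference = pe_green - pe_not_green
print(round(difference, 2))
```

The difference is additive and does not depend on the values chosen for the other two predictors, because the model contains no interaction terms.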
13
If we were interested in using regression methods to predict the tourism revenue for a particular country that had 30 million foreign visitors we should
A) construct a confidence interval using the regression equation.
B) construct a prediction interval using the regression equation.
C) use the correlation.
D) use the standard error.
E) None of these.
14
A patient is injected with a drug and the concentration (units/cc) in the patient's blood is measured every hour for seven hours. Re-expressing these data results in the following model and residual plot. What is true about the predicted concentration level after 10 hours has elapsed?
Log(Concentration) = 1.79 - 0.169 Time Elapsed.
S = 0.00565191 R-Sq = 100.0%
A) The predicted value is 1.259 units/cc.
B) This value is considered an extrapolation.
C) This value is accurate because R² = 100%.
D) Both A and B.
E) All of the above.
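A prediction from a re-expressed model has to be back-transformed to the original units. The sketch below assumes Log means base 10, which reproduces the 1.259 figure quoted in option A, and takes the observation window as the seven hourly measurements described in the question; the names are illustrative.

```python
LAST_OBSERVED_HOUR = 7  # measurements were taken hourly for seven hours


def predicted_concentration(hours):
    """Back-transform the fitted log10 model to units/cc."""
    log10_concentration = 1.79 - 0.169 * hours
    return 10 ** log10_concentration


hours = 10
prediction = predicted_concentration(hours)
is_extrapolation = hours > LAST_OBSERVED_HOUR
print(round(prediction, 3), is_extrapolation)
```

A high R-Sq describes the fit within the observed range only, so it says nothing about how well the model holds at 10 hours.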
15
Data are collected on the number of foreign visitors to a country (million) and total tourism revenue ($billion) for a sample of 10 countries. According to the following output, what is the standard error of the slope for this estimated regression equation?
S = 2.58307 R-Sq = 63.4%
A) 2.58307
B) 3.462
C) 0.07917
D) 6.672
E) 0.29497
16
A patient is injected with a drug and the concentration (units/cc) in the patient's blood is measured every hour for seven hours. Based on the linear regression output below, which of the following is true?
Regression Analysis: Concentration versus Time Elapsed
The regression equation is
Concentration = 41.3 - 6.00 Time Elapsed
S = 4.72077 R-Sq = 90.0%
A) The linear model is appropriate given that it explains 90% of the variability in blood concentration levels of the drug.
B) If the observed pattern continues into the future, this model will underestimate the concentration level after 10 hours has elapsed because the linear model is not appropriate.
C) If the observed pattern continues into the future, this model will overestimate the concentration level after 10 hours has elapsed because the linear model is not appropriate.
D) Both A and B.
E) Both A and C.
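Substituting 10 hours into the fitted line shows what the linear model implies beyond the seven hours of observation. For comparison, the sketch below also evaluates the re-expressed model from question 14, assuming both questions describe the same data and that Log there means base 10; the function names are illustrative.

```python
def linear_prediction(hours):
    """Concentration (units/cc) from the fitted straight line."""
    return 41.3 - 6.00 * hours


def log_model_prediction(hours):
    """Concentration (units/cc) from the log10 model in question 14 (assumed same data)."""
    return 10 ** (1.79 - 0.169 * hours)


linear_at_10 = linear_prediction(10)
log_at_10 = log_model_prediction(10)
print(round(linear_at_10, 1), round(log_at_10, 3))
```

The straight line gives a negative concentration at 10 hours, which is not physically meaningful, while the re-expressed model stays positive.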
17
Data were collected for a sample of 12 pharmacists to determine if years of experience and salary are related. Based on the results below, the standard error of the slope for this estimated regression equation is
A) 3.381
B) 0.2149
C) 5.58485
D) 82.8
E) 1.4882
18
Based on the output below from regression analysis performed to develop a model for predicting a firm's Price-Earnings Ratio (PE) based on Growth Rate, Profit Margin, and whether or not the firm is Green (1 = Yes, 0 = No), we can conclude (α = .05) that
A) Growth Rate is not a significant variable in predicting a firm's PE ratio.
B) Profit Margin is a significant variable in predicting a firm's PE ratio.
C) The regression coefficient associated with Growth Rate is not significantly different from zero.
D) Whether or not a firm is Green is significant in predicting its PE ratio.
E) The regression coefficient associated with Profit Margin is significantly different from zero.
19
A least squares estimated regression line has been fitted to a set of data and the resulting residual plot is shown. Which is true? 
A) The linear model is appropriate.
B) The linear model is poor because some residuals are large.
C) The linear model is poor because the correlation is near 0.
D) A curved model would be better.
E) A transformation of the data is required.

20
Data were collected for a sample of 12 pharmacists to determine if years of experience and salary are related. Based on the output below, how much of the variability in pharmacists' salary is accounted for by years of experience?
A) 82.8%
B) 47.97%
C) 5.58485 thousand dollars
D) 10.99%
E) 98.9%
21
Use the following information
Suppose data mining is employed on a telecommunication company's data warehouse in order to answer the following questions. On the line to the right of each, indicate whether these involve a classification (C) or regression (R) problem.
How much does a customer spend on all household communication-related expenditures? ____
22
According to the multiple regression model to predict the job performance of new hires based on age, GPA and gender (female = 1 and male = 0) shown below, how much of the variability in Job Performance is explained by the model?
A) 30.33%
B) 77.7%
C) 5.56%
D) 60.76%
E) Cannot be determined.
23
From its plot of residuals versus fitted 
A) Equal Variance
B) Linearity
C) Normality
D) Independence
E) None; all appear to be satisfied.

24
Use the following information
Suppose data mining is employed on a telecommunication company's data warehouse in order to answer the following questions. On the line to the right of each, indicate whether these involve a classification (C) or regression (R) problem.
Whether or not a customer would be interested in flexible cable TV plans (subscribe to different channels on different days/times)? ____
25
The model
can be used to predict the breaking strength (pounds) of a rope from its diameter (inches). According to this model, how much force should a rope one-half inch in diameter withstand?
A) 484 pounds
B) 16 pounds
C) 22 pounds
D) 256 pounds
E) 4.7 pounds

26
Based on the regression output shown below, which of the following statements is true?
The regression equation is
Price (cents) = 128 + 1.08 Time
Predictor      Coef  SE Coef      T      P
Constant    128.112    2.092  61.25  0.000
Time         1.0782   0.1407   7.66  0.000
S = 5.07299 R-Sq = 71.9%
Durbin-Watson statistic = 0.244822
A) The regression slope is significantly different from zero.
B) The model explains 71.9% of the variability in heating oil prices.
C) The linear model is appropriate.
D) Both A and B.
E) All of the above.
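The T column in this output is each coefficient divided by its standard error, and in a simple linear regression the correlation is the square root of R-Sq carrying the sign of the slope. A quick check in Python using the Time row and R-Sq from the output above (variable names are illustrative):

```python
import math

# Values copied from the regression output in the question.
coef_time = 1.0782
se_coef_time = 0.1407
r_squared = 0.719  # R-Sq = 71.9%

# t-statistic for H0: slope = 0.
t_statistic = coef_time / se_coef_time

# Correlation: magnitude from R-Sq, sign from the slope.
correlation = math.copysign(math.sqrt(r_squared), coef_time)

print(round(t_statistic, 2), round(correlation, 3))
```

Neither number addresses whether a straight line is the right form for the data; that judgment rests on the residuals and, for time-ordered data, on the Durbin-Watson statistic.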
27
According to the following regression analysis, the correlation between average annual cash bonus and average annual pay using α = 0.05 is 
A) not significantly different from zero.
B) negative but not significantly different from zero.
C) positive and significantly different from zero.
D) negative and significantly different from zero.
E) Cannot be determined from the information given.

28
The regression equation to predict the job performance of new hires based on age, GPA and gender (female = 1 and male = 0) is Job Performance = -60.8 + 4.80 Age + 1.44 GPA + 9.06 Gender. Which of the following is the correct interpretation for the regression coefficient of Gender?
A) The regression coefficient indicates that the job performance score for a female will, on average, be 9.06 points higher than for males of the same age and GPA.
B) The regression coefficient indicates that the job performance score for a female will, on average, be 9.06 points lower than for males of the same age and GPA.
C) The regression coefficient indicates that the job performance score for a female will, on average, be 9.06 times higher than for males.
D) The regression coefficient indicates that the job performance score for a female will, on average, be 9.06 times lower than for males.
E) The regression coefficient is not significantly different from zero.
29
Which of the following statements about a residual plot is true?
A) A curved pattern indicates nonlinear association between the variables.
B) A pattern of increasing spread indicates the predicted values become less reliable as the explanatory variable increases.
C) If all of the residuals are very small, the model will predict accurately.
D) It should not be used if the regression results are not significant.
E) It cannot be used to analyze linear association.
30
Which statement about influential points is true?
A) Removal of an influential point changes the regression line.
B) A high leverage point is always influential.
C) Influential points have large residuals.
D) All outliers are influential.
E) None of these.
31
Disparate databases that include demographic and transactional variables merged together are referred to as
A) Data storage bins
B) Data mines
C) OLAP
D) CRISP
E) Data warehouses
32
Using the following regression analysis of the relationship between the size of cash bonuses and pay scale, find the correlation between average annual cash bonus and average annual pay.
A) -0.540
B) -0.223
C) 0.108
D) 0.472
E) Cannot be determined from the information given.
33
Use the following information
Suppose data mining is employed on a telecommunication company's data warehouse in order to answer the following questions. On the line to the right of each, indicate whether these involve a classification (C) or regression (R) problem.
Whether or not a customer would be interested in wireless internet capabilities? ____
34
The results of a multiple regression model to predict the job performance of new hires based on age, GPA and gender (female = 1 and male = 0) are shown below. At α = .05 we can conclude that
S = 5.56691 R-Sq = 77.7%
A) Age is not a significant variable in predicting job performance.
B) GPA is a significant variable in predicting job performance.
C) The regression coefficient associated with GPA is significantly different from zero.
D) Gender is a significant variable in predicting job performance.
E) The regression coefficient associated with Age is not significantly different from zero.
35
A linear model is fit to estimate the diameter of maple trees based on age. According to the scatterplot and residual plots shown below, which of the following is true? 
A) Assuming the pattern continues into the future, if we use this model to predict the diameter of a maple tree that is 50 years old it would be too low.
B) Assuming the pattern continues into the future, if we use this model to predict the diameter of a maple tree that is 50 years old it would be too high.
C) Re-expressing these data by taking the logarithm of age would improve this model.
D) Both A and B.
E) Both B and C.

36
A farmer has increased his wheat production by about the same amount each year. His most useful predictive model is most probably
A) exponential.
B) linear.
C) logarithmic.
D) power.
E) quadratic.
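A quantity that grows by the same amount each period has constant first differences, while one that grows by the same percentage each period has constant ratios. A small sketch in Python with hypothetical production figures (all numbers and names are invented for illustration):

```python
def differences(values):
    """Change from each value to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def ratios(values):
    """Growth factor from each value to the next."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


# Hypothetical yearly production figures.
same_amount_each_year = [100 + 5 * year for year in range(6)]        # +5 per year
same_percent_each_year = [100 * 1.05 ** year for year in range(6)]   # +5% per year

print(differences(same_amount_each_year))
print([round(r, 2) for r in ratios(same_percent_each_year)])
```

Checking which of the two lists is roughly constant for a real series is a quick way to choose between a straight-line fit and an exponential one.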
37
A multiple regression model to predict the job performance of new hires based on age, GPA and gender (female = 1 and male = 0) resulted in an F-statistic of 30.23 and an associated p-value of 0.000. We can conclude at α = .05 that
A) the regression equation is not significant.
B) all independent variables in the model are significant.
C) the regression equation is significant.
D) none of the independent variables in the model are significant.
E) both B and C.
38
Use the following information for questions 1 through 5:
Below is a list of a few variables for which data were collected from various
telecommunication companies. On the line to the right of each variable, identify whether
it is transactional (T) or a demographic (D).
39
According to the residual plot for a linear regression model shown to the right, the linear model is
A) okay because the same number of points is above the line as below it.
B) okay because the association between the two variables is fairly strong.
C) no good because the correlation is near 0.
D) no good because some residuals are large.
E) no good because of the curve in the residuals.

40
According to the residual plots shown below, which linear regression assumptions appear to be violated?

A) Linearity
B) Normality
C) Equal Variance
D) Both A and B
E) All of the above


41
In a data warehouse, which of the following variables is/are demographic?
A) Age.
B) Occupation.
C) Amount spent on organic food products.
D) Both A and B.
E) All of the above.
42
Which is not a phase of the data mining process?
A) Business understanding.
B) Data preparation.
C) Modeling.
D) Deployment.
E) None of the above.
43
In a data warehouse, which of the following variable(s) is/are demographic?
A) Number of magazine subscriptions.
B) Monthly expenditures on cleaning supplies.
C) Homeowner (Yes or No).
D) Both A and B.
E) All of the above.
44
Use the following information
Suppose data mining is employed on a supermarket chain and travel industry data warehouse in order to answer the following questions. On the line to the right, indicate whether these involve a classification (C) or regression (R) problem.
Whether or not a customer is interested in eco-friendly travel products? ____
45
Explain how data mining differs from statistical inference.
46
Describe the phases of the data mining process.
47
Suppose data mining is employed to answer the following questions. Which is considered a regression problem?
A) Whether or not a customer would be interested in wireless internet capabilities?
B) How much does a customer spend on all household communication-related expenditures?
C) Whether or not a customer would be interested in flexible cable TV plans (subscribe to different channels on different days/times)?
D) Both A and B.
E) All of the above.
48
In a data warehouse, which of the following variables is/are demographic?
A) Number of residents in household.
B) Gender of head of household.
C) Monthly electricity usage.
D) Both A and B.
E) All of the above.
49
In a data warehouse, which of the following variable(s) is/are transactional?
A) Annual expenditures on garden supplies.
B) Number of children in household.
C) Household income.
D) Both A and B.
E) All of the above.
50
Suppose the goal of data mining using this data warehouse was to predict whether a household's telecommunication needs will increase, decrease or stay the same over the next year. What technique might be most appropriate for achieving this goal?
51
Use the following information
Suppose data mining is employed on a supermarket chain and travel industry data warehouse in order to answer the following questions. On the line to the right, indicate whether these involve a classification (C) or regression (R) problem.
How much a customer spends annually on travel related products? ____
52
Use the following information
Suppose data mining is employed on a supermarket chain and travel industry data warehouse in order to answer the following questions. On the line to the right, indicate whether these involve a classification (C) or regression (R) problem.
How much a customer spends annually on international specialty food items? ____
53
Suppose the goal of data mining in a data warehouse was to predict whether a household's telecommunication needs will increase, decrease or stay the same over the next year. What technique is most appropriate for achieving this goal?
A) Neural network.
B) Supervised problem.
C) Tree model.
D) Nodal network.
E) None of the above.
54
Popular data mining tools inspired by models that tried to mimic the function of the brain are known as
A) Tree models.
B) Supervised problems.
C) Neural networks.
D) Nodal network.
E) None of the above.
55
Use the following information for questions
Scanner data gathered from various supermarket chains were merged with data from the
travel industry (e.g., airlines, hotels, etc.) into one data warehouse. Below is a list of a
few variables for which data were collected. On the line to the right, indicate whether the
variable is transactional (T) or demographic (D).
56
Which is considered a data mining classification problem?
A) Whether or not a customer is interested in eco-friendly travel products?
B) How much a customer spends annually on travel related products?
C) How much a customer spends annually on international specialty food items?
D) Both B and C.
E) All of the above.
57
In a data warehouse, which of the following variables is/are transactional?
A) Amount spent on organic food products.
B) Number of international flights taken annually.
C) Types of eco-friendly products purchased.
D) Both A and B.
E) All of the above.
58
Data not used in building the model but used to evaluate the performance of the model is known as
A) the terminal node.
B) the test set.
C) metadata.
D) the training set.
E) None of the above.
59
Suppose the goal of data mining using this data warehouse was to predict whether a customer's expenditures on international specialty food items would increase, decrease or stay the same in the next year. What technique might be most appropriate for achieving this goal?
60
In a data warehouse, which of the following variables is/are transactional?
A) Type of cell phone plan.
B) Zip code.
C) Household income.
D) Both A and B.
E) All of the above.
61
Suppose data mining is used to determine whether or not a household subscribes to magazines about home and garden. In data mining this is referred to as what type of problem?
A) Regression.
B) Transactional.
C) Unsupervised.
D) Classification.
E) None of the above.
62
Suppose the goal of data mining in a data warehouse is to predict whether a customer's expenditures on international specialty food items would increase, decrease or stay the same in the next year. What technique might be most appropriate for achieving this goal?
A) Neural network.
B) Unsupervised problem.
C) Tree model.
D) Nodal network.
E) None of the above.
63
Suppose data mining is used to determine how important it is for a customer to purchase a vehicle with a very low carbon footprint. In data mining this is referred to as what type of problem?
A) Regression.
B) Transactional.
C) Unsupervised.
D) Classification.
E) None of the above.
64
Information about variables, such as variable definitions as well as how and when data were collected, is collectively called
A) superdata.
B) metadata.
C) extradata.
D) cases.
E) none of the above.
65
In a data warehouse, which of the following variable(s) is/are transactional?
A) Gender.
B) Homeowner (Yes or No).
C) Purchase price of a vehicle.
D) Both B and C.
E) None of the above.
66
Suppose data mining is used to determine how much a customer spends annually on energy efficient products. In data mining this is referred to as what type of problem?
A) Regression.
B) Transactional.
C) Unsupervised.
D) Classification.
E) None of the above.
67
Data used in a supervised problem to build the predictive model is known as
A) the terminal node.
B) the test set.
C) metadata.
D) the training set.
E) none of the above.
68
Suppose data mining is used to determine whether or not a customer would purchase a hybrid vehicle. In data mining this is referred to as what type of problem?
A) Regression.
B) Transactional.
C) Unsupervised.
D) Classification.
E) None of the above.