Deck 5: Regression Analysis

1
Describe how the Ordinary least squares (OLS) method minimizes the sum of squared errors.
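As a study aid, here is a minimal sketch of how OLS works for a single predictor, assuming plain Python lists as input. Setting the partial derivatives of the sum of squared errors (SSE) with respect to the intercept and the slope to zero gives a closed-form solution. The helper names `ols_fit` and `sse` and the toy data are hypothetical.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Setting d(SSE)/d(b0) = 0 and d(SSE)/d(b1) = 0 gives:
    #   b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
    #   b0 = y_bar - b1 * x_bar
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared errors (residuals) for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: roughly y = 2x + 1 with some noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = ols_fit(xs, ys)
best = sse(xs, ys, b0, b1)
# Any other line, e.g. one with a nudged slope or intercept, has a larger SSE.
assert sse(xs, ys, b0, b1 + 0.1) > best
assert sse(xs, ys, b0 - 0.1, b1) > best
print(f"intercept={b0:.3f}, slope={b1:.3f}, SSE={best:.3f}")
```

Because the SSE is a convex bowl in the intercept and slope, the line returned here is the unique minimum whenever the x values are not all equal; moving either coefficient away from it can only raise the SSE.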
2
Which of the following metrics returns the average absolute percentage difference between the predictions and the actual target?

A) Mean Absolute Percentage Error
B) Mean Absolute Error
C) precipProbability
D) Root Mean Squared Error
Answer: A) Mean Absolute Percentage Error
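To make the three accuracy measures concrete, here is a minimal sketch, assuming the actual and predicted values are equal-length lists of numbers. MAE is the average absolute miss in the target's units, MAPE expresses each miss as a percentage of the actual value, and RMSE squares the misses before averaging, so large errors weigh more. For all three, lower values mean the predictions sit closer to the actual targets. The price values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average absolute residual, in the target's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error: average absolute error as a % of the actual value.

    MAPE is undefined when an actual value is zero, so this sketch raises in that case.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical ride prices and model predictions.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]
print(mae(actual, predicted))   # average miss in dollars
print(mape(actual, predicted))  # average miss as a percentage of the actual price
print(rmse(actual, predicted))  # penalizes the larger 4-dollar miss more heavily
```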
3
Which of the following is true of the hold-out method of model validation?

A) This procedure cannot be used in advanced analytics techniques due to its complexity.
B) Two-thirds of the data is randomly selected and removed to build the regression model.
C) This method uses the training dataset and is validated using the single selected validation set.
D) This method requires no training time and a minimum of computer processing power.
Answer: B) Two-thirds of the data is randomly selected and removed to build the regression model.
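The hold-out method described in the answer can be sketched as a single random split: two-thirds of the rows train the model and the remaining third is the one validation set. This is a minimal sketch assuming the data is a Python list of observations; the function name `holdout_split` and the fixed seed (used only so the split is reproducible) are hypothetical choices.

```python
import random


def holdout_split(rows, train_fraction=2 / 3, seed=42):
    """Randomly split rows into a training set and a single validation set."""
    shuffled = list(rows)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(30))                  # stand-in for 30 hypothetical observations
train, validation = holdout_split(rows)
print(len(train), len(validation))      # 20 10
```

Because the model is judged on one split only, a different seed can give a noticeably different validation score, which is the sensitivity that N-fold cross validation reduces.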
4
In the ridesharing case study, the variable distance refers to ________.

A) the number of miles a ride covered
B) the duration of the ride
C) the hour of day extracted from the datetime
D) how good the condition is overall
5
In the context of modeling categorical values, dummy coding is ________.

A) measuring the absolute difference between the predicted and actual values in a predictive model
B) representing the difference between the observed and predicted values of a dependent variable
C) typically dividing data into ten subsets called folds
D) creating a dichotomous value to represent a variable
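Dummy coding turns each level of a categorical variable into its own 0/1 (dichotomous) column. The sketch below uses the employment-status categories mentioned elsewhere in this deck as hypothetical data. It assumes one level is held out as the reference category, a common convention that keeps the dummies from being perfectly collinear with the regression intercept; some tools keep every level instead. The function name `dummy_code` is hypothetical.

```python
def dummy_code(values, reference=None):
    """Turn a categorical column into 0/1 (dichotomous) dummy columns.

    One level is left out as the reference category; a row of all zeros
    across the dummy columns means the observation is in that reference level.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [level for level in levels if level != reference]
    return {level: [1 if v == level else 0 for v in values] for level in kept}


status = ["employed", "student", "retired", "employed", "unemployed"]
dummies = dummy_code(status, reference="employed")
for level, column in dummies.items():
    print(level, column)
```

A four-level variable such as employment status therefore becomes three dichotomous columns; the variable itself is not dichotomous.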
6
Overfitting happens when sample characteristics are included in the regression model that can be generalized to new data.
7
Identify a difference between the model evaluation methods of hold-out variation and N-fold cross validation.

A) Unlike the N-fold cross validation method, the hold-out variation method requires high amounts of training time and computer processing power.
B) Unlike in the hold-out variation method, the N-fold cross validation method randomly selects one set of two-thirds of data to build the regression model.
C) Unlike the N-fold cross validation method, the hold-out variation method divides data into many smaller subsets.
D) Unlike the hold-out variation method, the N-fold cross validation method is less sensitive to variation in the selection of the training and validation datasets.
8
An essential practice before starting with any modeling process is to first ________.

A) review and clean the dataset
B) determine the accuracy of the dataset
C) determine the target variables
D) plot the data into graphs
9
Descriptive/explanatory modeling is used

A) when the focus is limited to a single, numeric dependent variable and a single independent variable.
B) to represent explanation and association between independent and dependent variables.
C) to determine whether two or more independent variables are good predictors of the single dependent variable.
D) to predict a new observation.
10
Mean Absolute Error

A) represents the difference between the observed and predicted value of the dependent variable.
B) is the percentage absolute difference the prediction is, on average, from the actual target.
C) measures the total difference between the predicted and actual values of the model.
D) indicates how different the residuals are from zero.
11
In the ridesharing case study data dictionary, the rideshare variable refers to the name of the ridesharing service (e.g., Lyft or Uber).
12
In feature selection, ________ starts with a regression model that includes all predictors under consideration.

A) backward elimination
B) forward selection
C) overfitting
D) stepwise selection
13
In feature selection, ________ builds on forward selection by adding a variable at each stage, but it also removes variables that no longer meet the threshold.

A) hold-out variation
B) stepwise selection
C) dummy coding
D) forward selection
14
Employment status (unemployed, employed, student, retired) is an example of a dichotomous value.
15
In the ridesharing case study, the windspeed variable refers to ________.

A) the likelihood of rain for a specific forecast period and location.
B) wind speed at the time and location of the ride.
C) wind gust measuring the increase in wind speed at the time and location of the ride.
D) air quality at the time and location of the ride.
16
A descriptive/explanatory model uses validation dataset metrics such as Mean Absolute Percentage Error and Root Mean Squared Error.
17
Identify a true statement about the N-fold cross validation method of model validation.

A) This procedure is highly sensitive to variation in the datasets.
B) This method requires minimal computer processing power and no training time.
C) This method uses randomly selected data.
D) This procedure typically uses ten data subsets.
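In N-fold (k-fold) cross validation the data is split into N non-overlapping subsets (folds), ten being the typical choice. Each fold serves once as the validation set while the model is trained on the remaining folds, and the validation metric is then averaged across the N rounds. This minimal sketch only generates the index splits; the function name `k_fold_indices` and the seed are hypothetical, and fitting and scoring a model on each split is left out.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, validation_idx) pairs for k-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # folds are built from randomly ordered data
    folds = [idx[i::k] for i in range(k)]   # k roughly equal, non-overlapping subsets
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


splits = list(k_fold_indices(25, k=10))
print(len(splits))                          # 10 train/validation rounds
print(sorted(len(v) for _, v in splits))    # fold sizes differ by at most one
```

Every observation lands in a validation fold exactly once, which is why the averaged score depends less on one lucky or unlucky split than a single hold-out set does; the price is N model fits instead of one.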
18
In the ridesharing case study, the variable source refers to the ________.

A) type of rideshare service
B) date and time of the ride
C) location of the ride pickup
D) ride unique id per observation
19
In feature selection, ________ begins by creating a separate regression model for each predictor.

A) stepwise selection
B) forward selection
C) hold-out variation
D) dummy coding
20
Which of the following is true of validation data?

A) It is used as a last check that the regression model is complete.
B) It is a portion of the data that is used to build a regression model.
C) It is the portion of the data used to assess the regression model developed from the training data.
D) It provides a final estimate of the regression model's performance after it has been trained and validated.
21
Describe the backward elimination, forward selection, and stepwise selection methods of feature selection.
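As the cards in this deck state, forward selection starts by fitting a separate model for each predictor and adds variables one at a time, backward elimination starts from a model with all predictors and removes them, and stepwise selection adds like forward selection but also drops variables that no longer meet the threshold. The sketch below implements the first two in plain Python. Two assumptions to flag: the "threshold" here is the Akaike information criterion (AIC), swapped in for whatever criterion (for example a p-value cutoff) a given course or tool uses, and the predictors are assumed not to be perfectly collinear, otherwise the solver divides by zero. All function names and the toy data are hypothetical; stepwise selection is not shown.

```python
import math


def fit_rss(columns, y):
    """Fit OLS with an intercept on the given predictor columns; return the RSS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations (X^T X) b = X^T y, solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
                rhs[r] -= f * rhs[c]
    beta = [rhs[i] / a[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def aic(data, names, y):
    """AIC up to an additive constant: n*ln(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    rss = fit_rss([data[name] for name in names], y)
    return n * math.log(rss / n) + 2 * (len(names) + 1)


def forward_selection(data, y):
    """Start with no predictors; add the one that lowers AIC most, until none does."""
    chosen, best = [], aic(data, [], y)
    while len(chosen) < len(data):
        score, name = min((aic(data, chosen + [c], y), c) for c in data if c not in chosen)
        if score >= best:
            break
        chosen, best = chosen + [name], score
    return chosen


def backward_elimination(data, y):
    """Start with every predictor; drop the one whose removal lowers AIC most."""
    chosen = list(data)
    best = aic(data, chosen, y)
    while chosen:
        score, name = min((aic(data, [c for c in chosen if c != d], y), d) for d in chosen)
        if score >= best:
            break
        chosen = [c for c in chosen if c != name]
        best = score
    return chosen


# Hypothetical toy data: y is built from x1 and x2; x3 is unrelated filler.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
x3 = [0.5, -1.2, 0.8, 0.1, -0.7, 1.1, -0.3, 0.9, -1.0, 0.4, 0.2, -0.6]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.2, -0.1, 0.1]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
data = {"x1": x1, "x2": x2, "x3": x3}
print(forward_selection(data, y))
print(backward_elimination(data, y))
```

A stepwise variant would, after each forward step, run one backward pass over the currently chosen predictors before trying the next addition.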
22
When high accuracy on the training dataset does not carry over to predictions on new data, the phenomenon is termed ________.

A) adjacency
B) dummy coding
C) overfitting
D) multicollinearity
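Overfitting can be illustrated with a deliberately extreme, hypothetical model that memorizes every training point. It scores perfectly on the training data but generalizes poorly, while a simple OLS line has some training error yet performs far better on the validation data. The data, the "lookup" model, and all function names below are assumptions made up for illustration.

```python
def mae(actual, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_line(xs, ys):
    """Simple OLS line; returns a prediction function."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_lookup(xs, ys):
    """Memorize every training point exactly; unseen x gets the training mean."""
    table = dict(zip(xs, ys))
    mean = sum(ys) / len(ys)
    return lambda x: table.get(x, mean)


# Hypothetical data: y = 2x plus a repeating +1, +1, -1, -1 "noise" pattern.
xs = list(range(20))
ys = [2 * x + (1 if x % 4 < 2 else -1) for x in xs]
train = [(x, y) for x, y in zip(xs, ys) if x % 2 == 0]
valid = [(x, y) for x, y in zip(xs, ys) if x % 2 == 1]
tx, ty = zip(*train)
vx, vy = zip(*valid)

for name, fit in [("line", fit_line), ("lookup", fit_lookup)]:
    model = fit(list(tx), list(ty))
    train_err = mae(ty, [model(x) for x in tx])
    valid_err = mae(vy, [model(x) for x in vx])
    print(f"{name}: training MAE={train_err:.2f}, validation MAE={valid_err:.2f}")
```

The lookup model has captured the sample's noise rather than the underlying pattern, so its perfect training score says nothing about new data; comparing training and validation metrics side by side is how this gap shows up in practice.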
23
A predictive model only uses coefficient sizes, goodness of fit, and overall model fit.
24
In the N-fold cross validation model evaluation method, it is typical to use 45 folds (data subsets).
25
Explain how the quality of a predictive model is determined using metrics.
26
Feature selection is a qualitative method used to reduce the impact of dummy coding.
27
In regression analysis, the variable being predicted is referred to as the ________.

A) independent variable
B) target variable
C) predictor
D) feature
28
________ is used to determine whether two or more independent variables are good predictors of the single target variable.

A) Explanatory modeling
B) Multiple regression
C) Predictive modeling
D) The Ordinary least squares method
29
Which of the following is an example of a categorical independent variable?

A) price
B) product amount
C) marital status
D) date of birth
30
The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are measures to determine the ________.

A) quality of the statistical model
B) significance
C) overall model fit
D) coefficients
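For a linear regression with normally distributed errors, one common form of both criteria is shown below, dropping additive constants that are the same for every model fitted to the same data. Here n is the number of observations, RSS is the residual sum of squares, and k is the number of estimated coefficients. Conventions differ (for example, on whether the error variance is counted in k), so treat this as a sketch rather than the exact value any particular tool reports.

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k,
\qquad
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n
```

Lower values indicate a better trade-off between fit and complexity. Because ln n exceeds 2 once n is 8 or more, BIC penalizes each extra coefficient more heavily than AIC for all but the smallest datasets and so tends to favour smaller models.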
31
Which of the following datasets is used as an optional dataset dedicated to model validation?

A) training data
B) validation data
C) baseline data
D) test data
32
In the ridesharing case study, the variable representing the likelihood of rain for a specific forecast period and location is ________.

A) percent_Rain
B) precipProbability
C) percent_humidity
D) rainPossibility
33
Compare and contrast simple bivariate linear regression with multiple linear regression.
34
With the accuracy measures of Mean Absolute Error, Mean Absolute Percentage Error, and Root Mean Squared Error, lower values indicate a better fit.
35
Which of the following is true of a predictive model?

A) The interpretability of the x and y association is critical for this model to work.
B) An entire dataset is used to build a predictive model.
C) It is prospective: its main focus is on forecasting new data records.
D) It is retrospective: its main focus is on interpreting coefficients.
36
KDnuggets identified ________ as one of the software tools most often used for data analysis.

A) RapidMiner
B) PowerBI
C) Crystal Reports
D) Tableau