Deck 9: Regression

Question
Prior to running a regression analysis, what is the first procedure that should be completed?

A) Visually inspect the data using a scatterplot.
B) Run a correlation analysis on the data.
C) Check the residuals.
D) Plot a histogram for each variable to determine its distribution.
Question
Having obtained a scatterplot to inspect the data, you suspect that the results for participants 4 and 17 are outliers. What should you do?

A) Remove the cases so that the model is more representative of the majority of observations.
B) Perform a series of diagnostic tests on the data such as Cook's distance or Mahalanobis distance.
C) Leave the data points in and incorporate the Fick reducer variable.
D) Rerun the analysis without the observations in to see how the regression model is affected.
Question
Typically, what is the minimum number of cases per predictor?

A) 10-15
B) 5-10
C) 0-5
D) 15-20
Question
Which of the following are assumptions of multiple linear regression?

A) A sample is representative of its population and the observations are independent.
B) Linearity exists within the data.
C) Normal distribution.
D) All of the above.
Question
The number of hours spent practising and the ability to successfully execute a specific skill (e.g. arm stand, high board dive) are related by a straight-line model: the more you practise, the better the performance of the dive. Which of the following equations best represents a 'straight line' model?

A) y = mx + c
B) y = mc + x
C) y = cm - x
D) y = cx + m
Question
What does the term 'residual sum of squares' (SSR) represent?

A) The extent to which the line of best fit reflects the data
B) The degree of covariance that exists between the data
C) The measure of sphericity regarding the variables of concern
D) All of the above are appropriate.
Question
What happens to the line of best fit if participant 4 scored 22, instead of 95, and participant 17 scored 20 as opposed to 90 in the diving competition?

A) The line of best fit has become flatter.
B) The line of best fit has become steeper.
C) The line of best fit has the same angle but is generally above the observations.
D) The line is relatively unaffected by the change in data.
Question
What purpose does the adjusted R² value serve?

A) The variance in the outcome variable if the model was drawn from the population of the sample.
B) The variance between the predictor and outcome variable.
C) This is used to correct for the covariance.
D) It more accurately reflects the sample data.
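As a rough illustration of how an adjusted R² shrinks the sample R², the sketch below uses the commonly quoted adjustment 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of cases and k the number of predictors. The function name and the example numbers are assumptions for illustration, not values from the deck.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n cases and k predictors (needs n > k + 1)."""
    if n <= k + 1:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R^2 = .50 obtained from 20 cases and 1 predictor.
example = adjusted_r2(0.50, n=20, k=1)
print(round(example, 3))
```

The adjusted value is always at or below the sample R², and the gap widens as the sample gets smaller or predictors are added.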
Question
Why is the sample size so important to statistical analysis, in this case during regression analysis?

A) It provides a more reliable regression model.
B) It provides additional assurance that predictors are from a normally distributed data set.
C) It increases the chances of detecting a difference if one exists.
D) All of the above.
Question
Typically, research studies in sports science tend to recruit fairly small sample groups (less than 20), but what magnitude of effect can random data have on a data set?

A) Strong effect
B) Weak effect
C) No measurable effect
D) Weak to medium effect
Question
What does c in a straight line equation represent?

A) The gradient of the straight line
B) The predictor variable
C) The intercept on the y-axis
D) The outcome variable
Question
Suppose the abdominal and subscapular skinfold sites had a high degree of collinearity. What are the implications of this when attempting to predict percentage body fat?

A) Increased standard errors of the b coefficient.
B) It limits the R value as the predictor variables account for the same variance in the outcome variable (i.e. percentage body fat).
C) It becomes difficult to decide which variable to include.
D) All of the above.
Question
The difference between each observation (i.e. hours of training and competition score) and the model fitted to the data (i.e. all observations) is known as

A) Residual
B) Difference
C) Offset
D) Disparity
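The idea of a gap between each observation and the fitted line can be sketched numerically. The data below are hypothetical (they are not the deck's diving data), and the fit uses NumPy's `polyfit` as one convenient way to obtain a least-squares straight line.

```python
import numpy as np

# Hypothetical data: hours of training (x) and competition score (y).
hours = np.array([50.0, 100.0, 150.0, 200.0, 250.0, 300.0])
score = np.array([30.0, 31.0, 40.0, 41.0, 50.0, 52.0])

# Fit y = m*x + c by ordinary least squares (highest power first).
m, c = np.polyfit(hours, score, deg=1)

# Each observation minus the value the fitted line predicts for it.
predicted = m * hours + c
gaps = score - predicted

# Squaring and summing the gaps gives a single measure of misfit.
sum_of_squares = float(np.sum(gaps ** 2))
print(gaps.round(2), round(sum_of_squares, 2))
```

With an intercept in the model, the gaps sum to approximately zero, which is why they are squared before being added up.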
Question
Percentage body fat cannot be measured directly, so it has to be predicted from a series of skinfold measures. Suppose seven sites were measured (triceps, chest, midaxillary, subscapular, suprailiac, abdominal and thigh). How many participants should be recruited for a multiple regression to be performed?

A) 105
B) 155
C) 85
D) 65
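The arithmetic behind this kind of question can be sketched as follows, assuming the deck's rule of thumb of 10-15 cases per predictor. The function name is hypothetical.

```python
def sample_size_range(n_predictors, per_predictor=(10, 15)):
    """Return the (lower, upper) number of cases for a cases-per-predictor rule."""
    low, high = per_predictor
    return n_predictors * low, n_predictors * high


# Seven skinfold sites used as predictors.
low, high = sample_size_range(7)
print(low, high)  # 70 105
```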
Question
What was the adjusted R² value for the diving data?

A) .329
B) .331
C) .292
D) .259
Question
You determine an effect size from a study to be .04. How would this effect be categorized?

A) Very weak effect
B) Medium effect
C) Medium to strong effect
D) Very strong effect
Question
What does hierarchical regression refer to?

A) Known predictors of the outcome variable are entered first.
B) Known predictors of the outcome variable are entered last.
C) Known predictors of the outcome variable are entered simultaneously.
D) Variables are entered based on a mathematical formula.
Question
What does m in a straight-line equation represent?

A) The gradient of the straight line
B) The predictor variable
C) The intercept on the y-axis
D) The outcome variable
Question
If there are key skinfold sites known to the researcher that enabled percentage body fat to be predicted, which regression method(s) should be chosen?

A) Hierarchical (blockwise entry)
B) Forced entry (Enter)
C) Stepwise
D) Backward
Question
Using the equation identified in Q2, calculate the potential score in a diving competition if they have amassed 250 hours of training in the 12 months leading up to the competition. Assume m = .1016 and c = 22.5.

A) 48 points
B) 52 points
C) 58 points
D) 60 points
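A minimal sketch of the calculation, assuming the straight-line form y = mx + c with the stated values m = .1016 and c = 22.5. The function name is hypothetical.

```python
def predict_score(hours, m=0.1016, c=22.5):
    """Predicted competition score from hours of training: y = m*x + c."""
    return m * hours + c


score_250 = predict_score(250)
print(round(score_250, 1))  # 47.9
```

Rounded to the nearest whole point, this gives 48.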
Question
Which procedure could be used to determine whether the residuals from the proposed model are independent?

A) Durbin-Watson
B) Cook's distance
C) Mahalanobis distance
D) None of the above are appropriate
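For reference, the Durbin-Watson statistic can be computed directly from a model's residuals as the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses two hypothetical residual sequences to show the two extremes; the function name is an assumption.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Hypothetical residuals that flip sign every case (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Hypothetical residuals that stay on one side for long runs (positive autocorrelation).
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

print(round(durbin_watson(alternating), 2), round(durbin_watson(trending), 2))
```

The statistic ranges from 0 to 4; values near 2 suggest adjacent residuals are uncorrelated, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.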
Question
Having developed a statistical model to predict percentage body fat, what is the purpose of conducting cross-validation?

A) To test the accuracy of the prediction.
B) To test the model on a different population.
C) To evaluate the covariance within the model.
D) To measure the degree of difference in the residuals.
Question
You run a correlation matrix of the seven different skinfold sites and obtain an R value of .87 for the abdominal and subscapular sites and a variance inflation factor (VIF) of 0.1. What would you deduce from these findings?

A) A high degree of collinearity is likely between these two skinfold sites.
B) A low degree of collinearity is likely between these two skinfold sites.
C) A moderate degree of collinearity is likely between these two skinfold sites.
D) These figures make it difficult to deduce firm conclusions.
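To relate a correlation between two predictors to collinearity diagnostics, the sketch below uses the standard definitions tolerance = 1 - R² and VIF = 1 / tolerance. It is a simplification: with only two predictors R² is just the squared correlation, whereas with seven skinfold sites each site would be regressed on all the others. Because a VIF defined this way is never below 1, a reported figure of 0.1 may correspond to the tolerance instead; that reading is an assumption, not something the deck states.

```python
def tolerance_and_vif(r):
    """Tolerance and VIF for one predictor, given its correlation r with a single other predictor."""
    r_squared = r ** 2
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Correlation between the abdominal and subscapular sites, as given in the question.
tol, vif = tolerance_and_vif(0.87)
print(round(tol, 3), round(vif, 2))
```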
Deck 9: Regression
1
Prior to running a regression analysis, what is the first procedure that should be completed?

A) Visually inspect the data using a scatterplot.
B) Run a correlation analysis on the data.
C) Check the residuals.
D) Plot a histogram for each variable to determine its distribution.
Visually inspect the data using a scatterplot.
2
Having obtained a scatterplot to inspect the data, you suspect that the results for participants 4 and 17 are outliers. What should you do?

A) Remove the cases so that the model is more representative of the majority of observations.
B) Perform a series of diagnostic tests on the data such as Cook's distance or Mahalanobis distance.
C) Leave the data points in and incorporate the Fick reducer variable.
D) Rerun the analysis without the observations in to see how the regression model is affected.
Perform a series of diagnostic tests on the data such as Cook's distance or Mahalanobis distance.
3
Typically, what is the minimum number of cases per predictor?

A) 10-15
B) 5-10
C) 0-5
D) 15-20
10-15