Question 1

When making an interpolation it is possible to use which of the following functional forms?

A) Linear
B) Quadratic
C) Cubic
D) All of these choices are correct.

Accepted Answer

All of these choices are correct. Linear interpolation is the simplest approach, where a straight line is used to connect two neighboring data points. Quadratic interpolation uses a parabolic curve to connect three neighboring data points. Cubic interpolation uses a cubic curve to connect four neighboring data points, providing a smoother and more natural-looking curve than linear and quadratic interpolation. Therefore, if the data can be well represented by any of these three functional forms, that form can be used for interpolation.
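The three functional forms can be compared on a single gap. Below is a minimal pure-Python sketch. It assumes hypothetical data generated by f(x) = x³ with a gap at x = 2.5. It uses Lagrange polynomial interpolation, which is one standard way to pass a polynomial through a set of points. `lagrange_interpolate` is a hypothetical helper, not a library function.

```python
def lagrange_interpolate(points, x):
    """Evaluate the unique polynomial passing through `points` at `x`.

    Two points give a line, three give a parabola, four give a cubic.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical observations of f(x) = x**3; x = 2.5 is the gap to fill.
data = [(1.0, 1.0), (2.0, 8.0), (3.0, 27.0), (4.0, 64.0)]

linear = lagrange_interpolate(data[1:3], 2.5)     # line through 2 neighbours
quadratic = lagrange_interpolate(data[0:3], 2.5)  # parabola through 3 points
cubic = lagrange_interpolate(data, 2.5)           # cubic through all 4 points

print(linear, quadratic, cubic)
```

The three forms give different values for the same gap: roughly 17.5, 16.0 and 15.625. Only the cubic recovers the true value here, and that is only because the data were generated by a cubic. With real data, the choice of form is an assumption.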

Question 2

For a parameter that is identified, what should happen to the 95% confidence interval as the sample size gets larger?

A) The right tail should get larger.
B) The left tail should get larger.
C) The width of the interval should get smaller.
D) The width of the interval should get wider.

Accepted Answer

C is correct. As the sample size gets larger, the sampling variability of the estimate decreases. This leads to a smaller standard error and a narrower confidence interval. For many estimators the standard error shrinks in proportion to 1/√n.
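A small simulation can illustrate the shrinking interval. This sketch rests on a few assumptions. The draws come from a normal distribution with a hypothetical mean of 10 and standard deviation of 2. The interval is a normal-approximation interval (mean ± z × standard error), not any particular textbook's procedure. `ci_width` is a hypothetical helper.

```python
import math
import random
import statistics


def ci_width(sample, z=1.96):
    """Width of a normal-approximation confidence interval for the mean.

    z = 1.96 gives roughly 95% coverage; z = 2.576 gives roughly 99%.
    """
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return 2 * z * se


random.seed(0)
for n in (25, 100, 400, 1600):
    sample = [random.gauss(10.0, 2.0) for _ in range(n)]
    # Print the 95% and 99% widths for the same sample.
    print(n, round(ci_width(sample), 3), round(ci_width(sample, z=2.576), 3))
```

Quadrupling the sample size roughly halves the width. The 99% interval is wider than the 95% interval for the same sample, but both narrow as n grows.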

Question 3

The key distinction between extrapolation and interpolation is that interpolation:

A) involves causal estimates, while extrapolation involves correlations.
B) involves linear functional forms, while extrapolation involves non-linear functional forms.
C) fills data gaps, while extrapolation fills beyond the extent of the data.
D) can be used with control variables, while extrapolation must be used in conjunction with instrumental variables.

Accepted Answer

Interpolation fills data gaps within the range of existing data, while extrapolation predicts values beyond the range of existing data. Therefore, option C accurately highlights the difference between the two. Option A is incorrect because both interpolation and extrapolation can involve causal estimates or correlations. Option B is incorrect because both methods can involve linear or non-linear functional forms. Option D is incorrect because neither interpolation nor extrapolation must be used with control or instrumental variables.

Question 4

Which of the following settings constitutes an interpolation?

A) Using past sales and price information to make a forecast for next year's sales growth.
B) Using past sales and price information to estimate last year's price elasticity of demand.
C) Using 2014 and 2016 data on GDP to estimate GDP in the third quarter of 2015, which is missing.
D) Using a randomized control trial to estimate the causal effect of the use of a cancer treatment on health outcomes.

Accepted Answer

Interpolation involves estimating a missing value within a known range of values. That is what happens in option C: GDP for the missing quarter is estimated from GDP data in the surrounding years. Option A is an extrapolation, because it forecasts beyond the extent of the data. Options B and D estimate a parameter from the data at hand (a price elasticity in B, a causal treatment effect in D) rather than filling a gap in the data.
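Whether a target is an interpolation or an extrapolation depends only on where it sits relative to the observed data. This sketch encodes time as year + (quarter − 1)/4, which is a hypothetical choice, so Q3 2015 becomes 2015.5. It assumes quarterly observations for 2014 and 2016 only. `classify_target` is a hypothetical helper.

```python
def classify_target(target, observed):
    """Label a target point relative to the range of the observed data."""
    if target in observed:
        return "observed"
    lo, hi = min(observed), max(observed)
    if lo < target < hi:
        return "interpolation"  # inside the range: filling a data gap
    return "extrapolation"      # outside the range: beyond the extent of the data


# Quarterly GDP observed for 2014 and 2016 only (time = year + (quarter - 1) / 4).
observed_quarters = [year + q / 4 for year in (2014, 2016) for q in range(4)]

print(classify_target(2015.5, observed_quarters))  # Q3 2015 -> interpolation
print(classify_target(2017.0, observed_quarters))  # forecast past the data -> extrapolation
```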

Question 5

When one conducts interpolation, one is:

A) imputing values that have been mismeasured.
B) correcting values that have been mismeasured.
C) drawing conclusions where there are "gaps" in the data.
D) drawing conclusions beyond the extent of the data.

Accepted Answer

Interpolation is a method of estimating values within an existing set of data points. It is used to fill gaps in the data by drawing conclusions based on the surrounding data. Therefore, option C is the correct answer. Options A and B refer to correcting errors, which is not the goal of interpolation. Option D refers to extrapolation, which is different from interpolation and involves drawing conclusions beyond the available data.

Question 6

Which of the following is a step that can remedy an interpolation/extrapolation identification problem?

A) Use a functional form assumption to interpolate/extrapolate.
B) Use an instrumental variable to interpolate/extrapolate.
C) Use two-stage least squares to interpolate/extrapolate.
D) Use a fixed effects model.

Accepted Answer

Using a functional form assumption to interpolate or extrapolate means assuming a shape for the relationship between variables, for example a straight line. That assumed shape then carries the estimate into the gaps in the data or beyond its extent. This can address the identification problem because it gives a structured way to estimate values where no data are observed. The estimates, however, are only as credible as the assumption. Instrumental variables, two-stage least squares and fixed effects target endogeneity, not data gaps.
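Here is a minimal sketch of the functional-form remedy. It assumes a linear form and hypothetical data with a gap between x = 3 and x = 6. `fit_line` is a hypothetical helper that computes ordinary least squares by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical observations: x spans 1..3 and 6..8, so x = 4.5 sits in a gap
# and x = 12 lies beyond the extent of the data.
xs = [1, 2, 3, 6, 7, 8]
ys = [5.1, 7.9, 11.2, 19.8, 23.1, 26.0]
a, b = fit_line(xs, ys)

print("interpolated y at x = 4.5:", a + b * 4.5)
print("extrapolated y at x = 12:", a + b * 12)
```

Both predictions come entirely from the assumed straight line. Nothing in the sample confirms that the line still holds at x = 12.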

Question 7

A parameter is identified in the event that it can be:

A) rejected to be zero at the 95 percent confidence level.
B) rejected to be zero at the 99 percent confidence level.
C) estimated with any level of precision given a large enough sample from the population.
D) safely assumed to not suffer from heteroscedasticity.

Accepted Answer

A parameter is considered identified if it can be estimated with any level of precision, given a sufficiently large sample (C). Identification asks whether the parameter's value can be determined from the population distribution of the data. If it can, more data drive the estimate arbitrarily close to that value. Statistical significance at a given confidence level and heteroscedasticity do not bear directly on whether a parameter is identified.
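A slope can illustrate the contrast between an identified and a non-identified parameter. Assume a hypothetical population in which Y = 1 + 2X + noise. When X varies, the slope is identified, and its estimate tightens around 2 as n grows. If X takes a single value for every unit in the population, no sample size recovers the slope. `ols_slope` is a hypothetical helper.

```python
import random


def ols_slope(xs, ys):
    """OLS slope of ys on xs, or None when xs has no variation."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return None  # no variation in X: the slope cannot be determined
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


random.seed(0)
for n in (100, 10_000):
    # Identified: X varies across the (hypothetical) population.
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]
    # Not identified: every unit in the population has X = 5.
    xs_flat = [5.0] * n
    ys_flat = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs_flat]
    print(n, round(ols_slope(xs, ys), 3), ols_slope(xs_flat, ys_flat))
```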

Question 8

Determining whether a parameter is identified exclusively involves an argument about the data-generating process of the:

A) observations in the sample.
B) observations in a representative sample.
C) total population.
D) control group.

Accepted Answer

The identification of a parameter depends on the population data-generating process, not just the data in the sample or a representative sample. Therefore, the parameter must be identified based on the total population. The control group is not necessarily relevant to determining identification of a parameter.

Question 9

In the event that there is not an acceptable model of the data-generating process within which the treatment effect is identified using samples from the population, the analyst should do what?

A) Collect more data.
B) Re-run the experiment.
C) Consider alternative data populations (e.g., additional variables) before attempting to estimate the effect.
D) Run a probit/logit model.

Accepted Answer

If there is no acceptable model of the data-generating process within which the effect is identified, it may be necessary to consider alternative data populations before attempting to estimate the effect. This could involve collecting additional variables or drawing on a different population in which the effect is identified. Collecting more data from the same population, or re-running the same experiment, does not address the lack of an acceptable model: identification is a property of the population, not of the sample size. A probit/logit model changes the functional form of the estimator but does not by itself resolve a lack of identification. Considering alternative data populations is therefore the most productive approach in this situation.

Question 10

A data gap is any place where:

A) data is measured with error.
B) the data is heteroscedastic.
C) there are missing data for a variable over an interval of values, but data are not missing for at least some values on both ends of the interval.
D) there is missing data.

Accepted Answer

A data gap specifically refers to an interval of values of a variable over which data are missing, while data are available for at least some values on both ends of that interval (C). This is narrower than general missing data (D). It is also distinct from measurement error (A) and from heteroscedasticity (B), which is a condition where the variance of the error term differs across values of the explanatory variables.
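A data gap in this sense can be detected mechanically. This sketch assumes values are recorded at a known resolution (`step`, here 1 inch) and uses hypothetical height data. `find_data_gaps` is a hypothetical helper. It never reports values below the minimum or above the maximum, because those lie beyond the extent of the data rather than in a gap.

```python
def find_data_gaps(values, step):
    """Return (lower, upper) pairs of consecutive observed values more than
    `step` apart: intervals with no data but observations on both ends."""
    observed = sorted(set(values))
    gaps = []
    for lower, upper in zip(observed, observed[1:]):
        if upper - lower > step:
            gaps.append((lower, upper))
    return gaps


# Hypothetical heights in inches; nobody in the sample is 66 inches (5'6") or 71 inches.
heights = [60, 61, 62, 63, 64, 65, 67, 68, 69, 70, 72]
print(find_data_gaps(heights, step=1))  # [(65, 67), (70, 72)]
```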

Question 11

Which of the following settings constitutes an extrapolation?

A) Using past sales and price information to make a forecast for next year's sales growth.
B) Using past sales and price information to estimate last year's price elasticity of demand.
C) Using 2014 and 2016 data on GDP to estimate GDP in the third quarter of 2015, which is missing.
D) Using a randomized control trial to estimate the causal effect of the use of a cancer treatment on health outcomes.

Accepted Answer

The answer of Which of the following settings constitutes an...

Question 12

In the event that there is an acceptable model of the data-generating process within which the treatment effect is identified using samples from the population, but the 95% confidence interval of the treatment effect contains both large negative and large positive values, what might be an appropriate step for the analyst to take?

A) Collect more data.
B) Re-run the experiment.
C) Consider alternative data populations (e.g., additional variables) before attempting to estimate the effect.
D) Run a probit/logit model.

Accepted Answer

The answer of In the event that there is an...

Question 13

For a parameter that is identified, what should happen to the 99% confidence interval as the sample size gets larger?

A) The right tail should get larger.
B) The left tail should get larger.
C) The width of the interval should get smaller.
D) The width of the interval should get wider.

Accepted Answer

The answer of For a parameter that is identified, what...

Question 14

For a parameter that is not identified, what should happen to the 95% confidence interval as the sample size gets larger?

A) The right tail should get larger.
B) The left tail should get larger.
C) The width of the interval should eventually capture the true parameter value.
D) None of the answers is correct.

Accepted Answer

The answer of For a parameter that is not identified,...

Question 15

If the data-generating process you are working with has a population parameter that is not identified, it will be the case that:

A) you will need more data to get a precise estimate.
B) you will always need an instrumental variable to get an unbiased estimate.
C) a fixed effect model will be required.
D) None of the answers is correct.

Accepted Answer

The answer of If the data-generating process you are working...

Question 16

You are collecting information on the prices and attributes of electric vehicles, such as the range in miles on a single charge, for a number of electric cars. Suppose your company has created an electric vehicle whose range will be twice as large as that of an existing electric vehicle on the market. Does the model you've estimated have an identification problem?

A) No, because this data gap does not exist in the population.
B) Yes, the extrapolation required exists in the population of current electric vehicles, not just your sample.
C) Yes, but you can still extrapolate the data gap.
D) No, because data gaps cannot cause identification problems.

Accepted Answer

The answer of You are collecting information on prices and...

Question 17

When one conducts extrapolation, one is:

A) imputing values that have been mismeasured.
B) correcting values that have been mismeasured.
C) drawing conclusions where there are "gaps" in the data.
D) drawing conclusions beyond the extent of the data.

Accepted Answer

The answer of When one conducts extrapolation, one is: A) imputing...

Question 18

In the event that you are collecting data on how heights relate to income in the U.S. population and you notice that in your sample of 1,500 individuals you have a data gap at 5'6", will this cause an identification problem?

A) No, because this data gap does not exist in the population.
B) Yes, but you can still interpolate the data gap.
C) Yes, but you can still extrapolate the data gap.
D) No, because data gaps cannot cause identification problems.

Accepted Answer

The answer of In the event that you are collecting...

Question 19

For a parameter that is identified, what is the role of having a larger sample size?

A) The estimate of the identified parameter will be more precise.
B) The estimate of the identified parameter will be distributed as a t-distribution.
C) The estimate of the identified parameter will be efficient.
D) The estimate of the identified parameter will be distributed as a chi-squared distribution.

Accepted Answer

The answer of For a parameter that is identified, what...

Question 20

Which of the following is a step that can remedy an interpolation/extrapolation identification problem?

A) Use a probit/logit model to alleviate the data gaps.
B) Collect more/alternative data to close the data gap, extent of the data.
C) Use two-stage least squares to interpolate/extrapolate.
D) Use a fixed effects model.

Accepted Answer

The answer of Which of the following is a step...

Question 21

All of the following will cause an identification challenge except for what?

A) A control variable is correlated with the error term.
B) The treatment variable is correlated with the error term.
C) Perfect multicollinearity
D) Imperfect multicollinearity

Accepted Answer

The answer of All of the following will cause an...

Question 22

In the event that your treatment variable suffers from perfect multicollinearity with one of your controls, a viable remedy is:

A) drop the control variable.
B) drop the treatment variable of interest.
C) change the population to one where the treatment and control are not exactly linearly related.
D) use the within estimator.

Accepted Answer

The answer of In the event that your treatment variable...

Question 23

How does making functional form choices put less burden on the data collection process of the analyst?

A) By making functional form choices, an analyst can interpolate/extrapolate measures of treatment effects within gaps/beyond the extent of the data sampled.
B) By making functional form choices, an analyst can avoid having to take random samples from the population.
C) By making functional form choices, an analyst can avoid having to ensure that the observations are carefully measured.
D) By making functional form choices, an analyst can avoid having to ensure the dataset is structured.

Accepted Answer

The answer of How does making functional form choices put...

Question 24

What condition best describes the endogeneity problem?

A) The variance of the errors (U_i) depends on X_i.
B) Some variables within X_i are perfectly correlated with other variables in X_i.
C) The distribution of the errors (U_i) is non-normal.
D) One of the X_i variables is correlated with the error term (U_i).

Accepted Answer

The answer of What condition best describes the endogeneity problem? A)...

Question 25

All else equal, the theoretical justifications for a particular functional form should be stronger in the cases in which you are:

A) attempting to identify correlations only.
B) running fixed effects regressions.
C) using an instrumental variable.
D) extrapolating further off the support of the data.

Accepted Answer

The answer of All else equal, the theoretical justifications for...

Question 26

Which of the following functional forms will be the most restrictive in terms of the shape of the resulting interpolation of data gaps in a sample data set?

A) Linear
B) Quadratic
C) Cubic
D) Piecewise linear

Accepted Answer

The answer of Which of the following functional forms will...

Question 27

A usual result from having a model that has imperfect multicollinearity is that:

A) coefficient estimates are large.
B) coefficient estimates are negative.
C) coefficient estimates are positive.
D) standard errors for coefficient estimates are large.

Accepted Answer

The answer of A usual result from having a model...

Question 28

Defining and omitting a base group amongst a set of fixed effects is an attempt to remedy which identification challenge?

A) A control variable is correlated with the error term.
B) The treatment variable is correlated with the error term.
C) Perfect multicollinearity
D) Imperfect multicollinearity

Accepted Answer

The answer of Defining and omitting a base group amongst...

Question 29

If one were to include in a regression a binary dummy variable for each of the four regions of the country (East, West, South, North), what identification challenge would be presented?

A) Endogeneity problem
B) Heteroscedasticity
C) Perfect multicollinearity
D) Variance inflation factor

Accepted Answer

The answer of If one was to include in a...
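Assume the regression also includes an intercept. In every row, the four region dummies then sum to the intercept column, so the design matrix loses full column rank. A pure-Python rank check makes this visible. `matrix_rank` and `design_row` are hypothetical helpers, and the region data are made up. Dropping one region as the base group restores full rank.

```python
REGIONS = ("East", "West", "South", "North")


def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix given as a list of rows, via Gaussian elimination."""
    m = [[float(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if abs(m[r][col]) > tol), None)
        if pivot is None:
            continue  # this column is a combination of earlier pivot columns
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and abs(m[r][col]) > tol:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_row(region, regions):
    """Intercept column followed by one 0/1 dummy per region in `regions`."""
    return [1.0] + [1.0 if region == r else 0.0 for r in regions]


# Hypothetical observations; every region appears at least once.
sample = ["East", "West", "South", "North", "East", "South"]

all_four = [design_row(reg, REGIONS) for reg in sample]
base_north = [design_row(reg, REGIONS[:3]) for reg in sample]  # North omitted as base group

print(len(all_four[0]), matrix_rank(all_four))      # 5 columns, rank 4
print(len(base_north[0]), matrix_rank(base_north))  # 4 columns, rank 4
```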

Question 30

Perfect multicollinearity is when:

A) two or more independent variables have an exact linear relationship.
B) two independent variables are orthogonal.
C) the r-squared coefficient is zero.
D) the coefficients on two independent variables are equal.

Accepted Answer

The answer of Perfect multicollinearity is when: A) two or more...

Question 31

When functional form choice is used to alleviate an interpolation/extrapolation identification challenge, the justification represents what sort of element in our data reasoning framework?

A) Empirically testable conclusion
B) Inductive reasoning
C) Assumption
D) Statistical reasoning

Accepted Answer

The answer of When functional form choice is used to...

Question 32

A useful diagnostic measure for imperfect multicollinearity is the:

A) z-score.
B) t-test.
C) variance inflation factor.
D) likelihood ratio test.

Accepted Answer

The answer of A useful diagnostic measure for imperfect multicollinearity...
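The variance inflation factor for regressor j is VIF_j = 1 / (1 − R_j²). Here R_j² comes from regressing x_j on the other regressors. This sketch assumes a model with exactly two regressors, in which case R_j² is simply the squared correlation between them. The regressor values and the helper names `correlation` and `vif_two_regressors` are hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """VIF of either regressor in a model with exactly two regressors.

    With only one other regressor, the auxiliary R^2 is the squared correlation.
    """
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_unrelated = [1.0, -1.0, -1.0, -1.0, -1.0, 1.0]  # uncorrelated with x1
x2_close = [1.1, 1.9, 3.05, 4.0, 4.95, 6.1]        # nearly a copy of x1

print(vif_two_regressors(x1, x2_unrelated))  # about 1
print(vif_two_regressors(x1, x2_close))      # several hundred
```

A VIF near 1 means the regressor's variance is not inflated by the other regressors. A very large VIF signals that the regressor is nearly a linear function of the others.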

Question 33

Imperfect multicollinearity is when:

A) two or more independent variables have nearly an exact linear relationship.
B) two independent variables are nearly orthogonal.
C) the r-squared coefficient is nearly zero.
D) the coefficients on two independent variables are nearly equal.

Accepted Answer

The answer of Imperfect multicollinearity is when: A) two or more...

Question 34

Consider the regression Sales_i = α₀ + α₁Price_i + α₂Promo_i + α₃Weekend_i + U_i. Promo_i and Weekend_i are binary variables indicating whether the observation was on promotion (1 if on promotion, 0 otherwise) and whether it comes from a weekend day (1 if a weekend day, 0 otherwise). A member of your team informs you that all the promotions run for the products in your population were run on weekends. What identification challenge might you worry about?

A) Heteroscedasticity
B) Endogeneity
C) Perfect multicollinearity
D) Imperfect multicollinearity

Accepted Answer

The answer of Consider the regression of Sales_i = α₀...

Question 35

When making functional form choices to alleviate an interpolation/extrapolation identification challenge, the justification will almost always take what form?

A) A calculation of difference of means across sub-samples of the data
B) Theoretical argument
C) A hypothesis test
D) A p-value from a test statistic

Accepted Answer

The answer of When making functional form choices to alleviate...

Question 36

A typical remedy for perfect multicollinearity amongst control variables is to:

A) use an instrumental variable.
B) use the within estimator.
C) drop one of the control variables that is exactly linear with other control variables.
D) use the t-test.

Accepted Answer

The answer of A typical remedy for perfect multicollinearity amongst...

Question 37

If one suffers from the endogeneity problem for identification, it will result in your coefficient estimates being:

A) estimated with limited precision.
B) inconsistent.
C) efficient.
D) too large.

Accepted Answer

The answer of If one suffers from the endogeneity problem...
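A short simulation can show what endogeneity does as the sample grows. This sketch assumes a hypothetical model Y = 1.0·X + U with X = V + U, where V and U are independent standard normals. Then Cov(X, U) = 1 and Var(X) = 2. The OLS slope therefore settles near 1 + 1/2 = 1.5 instead of the true value of 1, however large n gets. `ols_slope` and `endogenous_slope` are hypothetical helpers.

```python
import random


def ols_slope(xs, ys):
    """OLS slope from a regression of ys on xs with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def endogenous_slope(n, seed=0):
    """Simulate Y = 1.0 * X + U where X is built to be correlated with U."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)  # exogenous variation in X
        u = rng.gauss(0.0, 1.0)  # unobserved error term
        x = v + u                # X contains U, so Cov(X, U) = 1
        xs.append(x)
        ys.append(1.0 * x + u)   # true slope on X is 1
    return ols_slope(xs, ys)


for n in (100, 10_000, 100_000):
    print(n, round(endogenous_slope(n), 3))
```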

Question 38

Of the remedies for an interpolation/extrapolation identification problem, which requires the most trust in the assumptions surrounding the data generation process?

A) Use probit/logit to alleviate the data gaps.
B) Collect more/alternative data to close the data gap, extent of the data.
C) Use two-stage least squares to interpolate/extrapolate.
D) Use functional form choices to interpolate/extrapolate.

Accepted Answer

The answer of Of the remedies for an interpolation/extrapolation identification...

Question 39

Suppose you are estimating the following model: Y_i = β₀ + β₁X_i + U_i. You believe the variance of the unobserved factors (U) varies with X. If this is true, what is the consequence?

A) Your estimate for β₁ will be biased.
B) Your estimate for β₀ will be biased.
C) Your estimate for β₁ and β₀ will be biased.
D) None of the answers is correct.

Accepted Answer

The answer of Suppose you are estimating the following model:...
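A Monte Carlo sketch can check what heteroscedasticity does to the slope estimate. It assumes a hypothetical model Y = 1 + 2X + U, where the standard deviation of U is 0.5X and E[U | X] = 0. Averaged over many simulated samples, the OLS slope stays centred on the true value of 2. Heteroscedasticity of this kind affects the usual standard errors, not the unbiasedness of the estimate. `ols_slope` and `mean_slope` are hypothetical helpers.

```python
import random


def ols_slope(xs, ys):
    """OLS slope from a regression of ys on xs with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def mean_slope(reps=500, n=200, seed=0):
    """Average OLS slope across simulated samples with heteroscedastic errors."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]  # X held fixed across samples
    total = 0.0
    for _ in range(reps):
        # sd(U | X) = 0.5 * X grows with X, but E[U | X] = 0.
        ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5 * x) for x in xs]
        total += ols_slope(xs, ys)
    return total / reps


print(round(mean_slope(), 3))  # close to the true slope of 2
```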

Question 40

How does collecting more/alternative data put less burden on the modeling choices of the analyst?

A) By collecting more data, an analyst can completely avoid characterizing a determining function.
B) By collecting more data, an analyst can avoid heteroscedasticity.
C) By collecting more data, an analyst can avoid having to make functional form choices to fill in data gaps/extent of the data.
D) By collecting more data, an analyst can implement a fixed effects design.

Accepted Answer

The answer of How does collecting more/alternative data put less...

Question 41

In the event that your treatment variable is imperfectly multicollinear with one of your control variables, a possible remedy would be to:

A) re-orient your base group.
B) use the within estimator.
C) gather more data, in hopes of getting more independent variation of the treatment.
D) use data reduction methods.

Accepted Answer

The answer of In the event that your treatment variable...

Question 42

The endogeneity problem that results when a variable correlated with the treatment is not included as a control variable is known as the:

A) instrumental variable.
B) omitted variable bias.
C) weak instruments problem.
D) least efficient estimator.

Accepted Answer

The answer of The endogeneity problem that results from when...

Question 43

Suppose that you observed several key characteristics of a random sample of firms in your industry. You know that the semi-partial correlation of firm Productivity (Y) with R&D investment (Z), holding the amount of Labor (X) fixed, is positive. Furthermore, suppose you know that the covariance of R&D investment and the amount of labor is positive. How will the coefficient on Labor from a regression of Productivity on Labor and R&D investment relate to the coefficient on Labor from a regression of Productivity on just Labor?

A) It'll be equal to
B) It'll be greater than
C) It'll be less than
D) There is not enough information to determine.

Accepted Answer

The answer of Suppose that you observed several key characteristics...

Question 44

Fortunately, even in cases where we suffer from an omitted variable bias, it is often the case that with careful reasoning it is possible to:

A) sign the bias.
B) make the size of bias not statistically significant from zero.
C) report appropriately adjusted standard errors.
D) construct alternative p-values.

Accepted Answer

The answer of Fortunately, even in cases where we suffer...

Question 45

When might imperfect multicollinearity not require collecting more data to remedy the likely imprecise estimates of the affected variables?

A) When the model suffers from heteroscedasticity as well
B) When you are only conducting hypothesis tests
C) When the imperfect multicollinearity is confined to control variables only
D) When the treatment effect is positive

Accepted Answer

The answer of When might imperfect multicollinearity not require the...

Question 46

Potential remedies for when your model suffers from the endogeneity problem include all of the following except what?

A) Gather (additional) data on a possible instrumental variable.
B) Gather (additional) longitudinal data to allow for a fixed effect approach.
C) Gather (additional) control variables that would limit the endogeneity concern.
D) Check if the residuals from your regression are uncorrelated with your treatment.

Accepted Answer

The answer of Potential remedies for when your model suffers...

Question 47

The two critical elements required to sign the bias of an omitted variable include the sign of the:

A) effect of the omitted variable on the outcome and the sign of the effect of the treatment variable on the outcome.
B) effect of the omitted variable on the outcome and the sign of the correlation between the omitted variable and the outcome.
C) effect of the omitted variable on the outcome and the sign of the correlation between the omitted variable and the treatment.
D) correlation between the omitted variable and the outcome and the sign of the correlation between the treatment variable and the outcome.

Accepted Answer

The answer of The two critical elements required to sign...
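Signing the bias rests on an algebraic property of OLS that holds exactly in any sample:

b_short = b_long + b_z × δ

- b_short is the slope on the treatment when the other variable is omitted.
- b_long and b_z are the slopes on the treatment and on that variable when both are included.
- δ is the slope from regressing the omitted variable on the treatment.

δ has the sign of the covariance between treatment and omitted variable. The sign of the bias, b_z × δ, is therefore the sign of the omitted variable's effect times the sign of its correlation with the treatment. The sketch below checks the identity on hypothetical simulated data; `slope` and `ols_two` are hypothetical helpers.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((a - x_bar) ** 2 for a in xs)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return sxy / sxx


def ols_two(y, x, z):
    """OLS coefficients on x and z (intercept included) from the normal equations."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    det = sxx * szz - sxz ** 2
    beta_x = (szz * sxy - sxz * szy) / det
    beta_z = (sxx * szy - sxz * sxy) / det
    return beta_x, beta_z


rng = random.Random(0)
n = 1000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # treatment
z = [0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]             # omitted variable, Cov(x, z) > 0
y = [1.0 + 1.0 * xi + 2.0 * zi + rng.gauss(0.0, 1.0)         # z raises the outcome
     for xi, zi in zip(x, z)]

b_short = slope(x, y)           # regression of y on x alone
b_long, b_z = ols_two(y, x, z)  # regression of y on x and z
delta = slope(x, z)             # regression of z on x

# Both signs are positive here, so the short regression overstates the effect.
print(round(b_short, 3), round(b_long, 3), round(b_z * delta, 3))
```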

Question 48

Suppose that you have many observations of employee-level longevity and the wage each employee earned that year. You run the regression Longevity_i = β₀ + β₁Wage_i + U_i and get an estimate of β₁. Now, suppose that a member of the analytics team suggests that education is an omitted variable in your regression and is likely biasing your estimate of β₁. Suppose you knew two things. First, conditional on Wage, more educated employees tended to have shorter stints with the company (lower longevity). Second, the error η_i in the equation Longevity_i = β₀ + β₁Wage_i + β₂Education_i + η_i was uncorrelated with both Wage and Education. How would you sign the bias on your estimate of β₁?

A) Argue that education is positively correlated with wage and that your estimate of β₁ is an upper bound.
B) Argue that education is negatively correlated with wage and that your estimate of β₁ is a lower bound.
C) Argue that education is positively correlated with wage and that your estimate of β₁ is a lower bound.
D) Argue that education is positively correlated with longevity and that your estimate of β₁ is an upper bound.

Accepted Answer

The answer of Suppose that you have many observations of...

Question 49

Suppose you have many observations of the scores students got on an exam, along with other characteristics of the students, including how many hours they studied for the exam that week and their current major GPA. Now, suppose you ran the standard linear regression of the exam grade on the number of hours studied, Grade_i = β₀ + β₁Hours Studied_i + U_i, and got an estimate of β₁; call it b₁. Now, suppose a colleague tells you that the students with above-average GPAs were the students who studied more for the exam. Consider the regression Grade_i = β₀* + β₁*Hours Studied_i + β₂*Major GPA_i + U_i. How does your estimate of β₂* inform you of what your estimate of β₁* will be for that same regression?

A) If β₂* > 0, then β₁* > b₁
B) If β₂* > 0, then β₁* < b₁
C) If β₂* < 0, then β₁* > 0
D) If β₂* > 0, then β₁* > 0

Accepted Answer

The answer of Suppose you have many observations of the...

Question 50

Suppose you have many observations of the price of refrigerators, along with other characteristics including their energy cost and whether or not they are stainless steel. Now, suppose you ran the standard linear regression of price on energy cost, Price_i = β₀ + β₁Energy Cost_i + U_i, and got an estimate of β₁. Now, suppose you know that stainless steel refrigerators are more popular (i.e., sell for higher prices holding all other characteristics constant) and tend to be more energy-efficient (i.e., lower energy cost) models. What should you expect your estimate of β₁* in the following regression, Price_i = β₀* + β₁*Energy Cost_i + β₂*Stainless Steel_i + U_i, to be?

A) β₁* < β₁
B) β₁* > β₁
C) β₁* = β₁
D) You cannot tell from the information given.

Accepted Answer

The answer of Suppose you have many observations of the...

Deck 10: Identification and Data Assessment