What Is Omitted Variable Bias? | Definition & Examples
Omitted variable bias occurs when a statistical model fails to include one or more relevant variables. In other words, it means that you left out an important factor in your analysis.
As a result, the model mistakenly attributes the effect of the missing variable to the included variables. Exclusion of important variables can limit the validity of your study findings.
What is an omitted variable?
An omitted variable is a confounding variable related to both the supposed cause and the supposed effect of a study. In other words, it is related to both the independent and dependent variable.
While a variable can be omitted because you are not aware that it exists, it’s also possible to omit variables that you can’t measure, even though you are aware of their existence.
What is omitted variable bias?
Omitted variable bias occurs in linear regression analysis when one or more relevant independent variables are not included in your regression model.
A regression model describes the relationship between one or more independent variables (also called predictors, covariates, or explanatory variables) and a dependent variable (often called a response or target variable).
Because the omitted variable is hidden or unobserved, it’s not factored into your analysis, affecting your results.
This biases your coefficients only if the omitted variable is correlated with both:
- The dependent variable
- One or more other independent variables
Why is omitted variable bias a problem?
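An omitted variable is a source of endogeneity. Endogeneity occurs when an independent variable is correlated with the error term. Because the effect of an omitted variable ends up in the error term, any correlation between the omitted variable and an included independent variable creates this problem.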
When this happens, the causal effect from the omitted variable becomes tangled up in the coefficient on the variable with which it is correlated. This, in turn, undermines our ability to infer causality from that coefficient.
Omitting a variable might lead to an overestimation (upward bias) or underestimation (downward bias) of the coefficient of your independent variable(s). Since the coefficient becomes unreliable, the regression model also becomes unreliable.
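The upward bias described above can be reproduced with a small simulation. This is a minimal sketch on made-up data, not an analysis from any real study: the coefficients (1.0 for A, 2.0 for B), the 0.5 link between A and B, and the helper names `cov`, `slope_one`, and `slopes_two` are all assumptions chosen for illustration.

```python
import random

def cov(xs, ys):
    """Sample covariance; cov(xs, xs) is the sample variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def slope_one(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

def slopes_two(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 (with an intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Hypothetical data: B affects Y and is positively correlated with A.
B = [rng.gauss(0, 1) for _ in range(n)]
A = [0.5 * b + rng.gauss(0, 1) for b in B]
Y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(A, B)]

short_a = slope_one(A, Y)             # B omitted from the model
long_a, long_b = slopes_two(A, B, Y)  # B included in the model

print(f"coefficient on A, B omitted:  {short_a:.2f}")
print(f"coefficient on A, B included: {long_a:.2f}")
```

With these assumed values, the coefficient on A should land near its true value of 1.0 when B is included and near 1.8 when B is omitted, because the model credits A with part of B's effect.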
How to deal with omitted variable bias
In practice, no regression model includes every factor that influences the dependent variable, so almost every model has one or more omitted variables. While omitted variable bias can’t be avoided altogether, there are steps you can take to mitigate it.
- If you cannot measure the omitted variable itself, as in the case of ability, you can still include control variables that you are able to measure. In a study of salaries, for example, controls are variables that in theory affect salary, such as years of work experience.
- If you don’t have the data, use proxies for the omitted variables. These are variables that are similar enough to the omitted variable to give you an idea about its value, but that you are able to measure. For example, you might use an IQ test as a proxy for an individual’s ability.
- If you are not able to remove the bias, try to predict the direction in which your estimates are biased. This is called “signing” the bias. You can sign it as either positive or negative, which tells you whether your estimate is likely too high or too low.
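The proxy approach from the list above can be illustrated with simulated data. The sketch below assumes a noisy proxy (the omitted variable plus measurement error, in the spirit of an IQ test standing in for ability); the variable names, coefficients, and noise levels are hypothetical and not taken from any dataset.

```python
import random

def cov(xs, ys):
    """Sample covariance; cov(xs, xs) is the sample variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def slope_one(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

def slopes_two(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 (with an intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20_000

# Hypothetical setup: b is unobserved (e.g. ability), a is the included
# variable, and proxy is a noisy measurement of b (e.g. an IQ test score).
b = [rng.gauss(0, 1) for _ in range(n)]
a = [0.5 * bi + rng.gauss(0, 1) for bi in b]
proxy = [bi + rng.gauss(0, 1) for bi in b]
y = [1.0 * ai + 2.0 * bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]

no_proxy = slope_one(a, y)               # b omitted, no proxy
with_proxy, _ = slopes_two(a, proxy, y)  # b omitted, proxy included

print(f"true effect of a:          1.00")
print(f"estimate without a proxy:  {no_proxy:.2f}")
print(f"estimate with the proxy:   {with_proxy:.2f}")
```

In this assumed setup the proxy moves the estimate closer to the true effect but does not remove the bias entirely, since the proxy measures the omitted variable only imperfectly.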
Estimating omitted variable bias
Without getting too far into advanced algebra, we can use logical thinking to predict the direction of the omitted variable bias. In this way, we can establish whether we have overestimated or underestimated the effect of the variable we included in our regression model.
The table below summarizes the direction of the omitted variable bias. The sign of the bias is based on the sign of the relationships between the omitted variables and the variables in the model.
Let’s assume:
- Y is the dependent variable
- A is an independent variable
- B is another independent variable, the omitted variable
| | A and B are positively correlated | A and B are negatively correlated |
| B has a positive effect on Y | Positive bias | Negative bias |
| B has a negative effect on Y | Negative bias | Positive bias |
Note that with positive bias, we tend to overestimate, while with negative bias, we tend to underestimate.
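For readers who want the algebra behind the table, the standard omitted variable bias formula for the two-variable case can be written as follows. The symbols are introduced here for illustration and do not appear elsewhere in this article, and the result assumes the error term ε is uncorrelated with both A and B.

```latex
\begin{aligned}
\text{True model:}\quad & Y = \beta_0 + \beta_A A + \beta_B B + \varepsilon \\
\text{Estimated model (B omitted):}\quad & Y = \alpha_0 + \alpha_A A + u \\
\text{What the estimate converges to:}\quad & \alpha_A = \beta_A + \beta_B\,\delta,
  \qquad \delta = \frac{\operatorname{Cov}(A, B)}{\operatorname{Var}(A)} \\
\text{Bias:}\quad & \alpha_A - \beta_A = \beta_B\,\delta
\end{aligned}
```

Since Var(A) is always positive, the sign of the bias is the sign of B's effect on Y multiplied by the sign of the correlation between A and B, which gives the four cells of the table. The bias is zero when B has no effect on Y or when A and B are uncorrelated.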
Frequently asked questions
How do I prevent omitted variable bias from interfering with research?
Omitted variable bias is common in linear regression as it’s usually not possible to include all relevant variables in the model. You can mitigate the effects of omitted variable bias by:
- Introducing control variables
- Introducing proxy variables
- Using logic to predict whether you have overestimated or underestimated the effect of the variable(s) included in your regression model
What are the two requirements that must be fulfilled for omitted variable bias to occur?
Omitted variable bias occurs when two requirements are fulfilled:
- The omitted variable relates to the dependent variable.
- The omitted variable relates to one or more other independent variables.
Why does omitted variable bias matter?
Omitted variable bias matters because it can lead researchers to draw false conclusions by attributing the effects of a missing variable to those that are included in a statistical model.