Lecture 5:
- Outliers: data points that do not follow the general trend of the data (extreme values)
What happens if we run the regression anyway?
- The fit of the model can change
- The regression line may be tilted
- You can remove an outlier
How can we check whether we have outliers?
- Scatterplots
- Statistical test
- Easiest way: check whether a value falls outside the range of +/- 2 to 3 standard deviations around the mean (include the words "at least" in the conclusion!). This is based on the assumption that the distribution is normal.
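The standard deviation rule can be sketched in a few lines. This is only an illustration: the function name `flag_outliers`, the cutoff parameter `k` and the data are made up for the example, and the rule only makes sense if the variable is roughly normally distributed.

```python
from statistics import mean, stdev

def flag_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)
    return [v for v in values if abs(v - m) > k * s]

# Hypothetical data: ten ordinary observations and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 40]

print("beyond 2 SD:", flag_outliers(data, k=2))
print("beyond 3 SD:", flag_outliers(data, k=3))
```

Note that the extreme value itself inflates the mean and the standard deviation, so in a small sample a 3 SD cutoff can fail to flag a point that a 2 SD cutoff catches. Always combine the rule with a scatterplot.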
How can we solve the problem?
- First think: what is the reason for the outlier, can/could you do something?
- Throw the outlier out of the dataset, but only for a valid reason: mismeasurement, an error in the observation, or a data entry error. Not because it is convenient to do so.
- Be careful: some extreme values are to be expected and are indicative of the characteristics of the population. Therefore it is important to check how sensitive your results are to the presence of the outlier: what happens if we keep the outlier, and what happens if we omit it?
- If the outlier does not change the results, but does affect assumptions, you may drop the outlier
- If it affects both results and assumptions, you may not drop the outlier, but you have to run the regression both with and without the outlier and say so in the paper
- If a relationship is clearly created by the outlier, you may drop the outlier, because without it there would be no relationship between x and y. So the regression coefficient does not truly describe the effect of x on y
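A sensitivity check of this kind could look as follows. This is a minimal sketch with made-up data, in which the first six points show no relationship and one extreme observation creates one; `ols_slope` is a helper written for this example, not a library function.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: the last observation (20, 30) is the outlier.
x = [1, 2, 3, 4, 5, 6, 20]
y = [5, 3, 6, 4, 5, 4, 30]

slope_with = ols_slope(x, y)                # outlier kept
slope_without = ols_slope(x[:-1], y[:-1])   # outlier omitted

print(f"slope with outlier:    {slope_with:.2f}")
print(f"slope without outlier: {slope_without:.2f}")
```

If the two slopes differ as strongly as they do here, the relationship is driven by a single observation and both regressions should be reported.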
- Reverse causality: we assume that changes in the dependent variable are caused by changes in the independent variables. But we only find a statistical relationship, which says nothing about causality or the direction of causality. In some analyses it could be that y (also) causes x, which is called reverse causality. This causes an endogeneity problem.
How can we check whether there is a reverse causality problem?
- What does the theory say
- Timing of measurement: Theory says x causes y, but sometimes x is measured later than y
- Statistical tests (to check whether changes in x precede changes in y) and some more advanced techniques
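To get a feel for what "changes in x precede changes in y" means, one can compare lead and lag correlations. The sketch below is only a crude descriptive check on made-up data, not one of the formal tests (such as a Granger causality test); the helper `pearson` and the series are assumptions for the illustration.

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

# Hypothetical series in which y follows x with a one-period delay.
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
y = [0] + [2 * v for v in x[:-1]]   # y_t = 2 * x_(t-1); first value is a placeholder

x_leads_y = pearson(x[:-1], y[1:])  # x in period t versus y in period t+1
y_leads_x = pearson(y[:-1], x[1:])  # y in period t versus x in period t+1

print(f"x leads y: {x_leads_y:.2f}")
print(f"y leads x: {y_leads_x:.2f}")
```

A strong correlation in only one direction is consistent with x preceding y, but it does not prove causality on its own.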
What to do:
- Have a model that is well-grounded in theory
- Explain
- Acknowledge
- In general: advanced econometric techniques also exist to mitigate the problem of endogeneity
- Omitted variable bias: which variables to include as IVs and what happens if we omit relevant variables? You have omitted variables if:
- An excluded variable has some effect on your DV and
- It’s correlated with at least one of your IVs (endogeneity)
It is impossible to control for everything, so how do we solve the problem?
- Avoid simple regression models (with one IV)
- Include variables that are likely to be the most important theoretically in explaining the DV (what does the literature say)
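The two conditions for omitted variable bias can be illustrated with a small example. The data are made up and noise-free so that the arithmetic is exact: x2 affects y and is correlated with x1, so leaving x2 out pushes the coefficient of x1 away from its true value of 2. The helper `ols_slope` is written for this sketch.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: x2 is correlated with x1 and has its own effect on y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]   # true effect of x1 is 2

short_slope = ols_slope(x1, y)   # simple regression with x2 omitted
delta = ols_slope(x1, x2)        # how strongly x2 moves with x1

print(f"true effect of x1:          2.00")
print(f"estimate with x2 omitted:   {short_slope:.2f}")
print(f"2 + 3 * delta (bias check): {2 + 3 * delta:.2f}")
```

The estimate from the simple regression equals the true effect plus the effect of the omitted variable times its association with x1. If either condition fails (no effect on y, or no correlation with x1), the bias term is zero.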
Panel data or longitudinal data: data on many units collected at several points in time, whereby each unit is observed several times. Panel data therefore have both a cross-sectional and a time-series dimension.
Why panel data:
- Rich in information
- Potentially, an increase in sample size
- Possibility to control for time-invariant effects correlated with the regressors
- How? Intuition: include dummy variables for each cross-section unit and use fixed effects.
- Mitigate omitted variable bias
Fixed effects model: a statistical regression model in which the intercept is allowed to vary freely across individuals or groups. It is often applied to panel data in order to control for any individual-specific attributes that do not vary across time, which removes the omitted variable bias caused by those attributes. Assumption: the individual-specific effects are correlated with the IVs.
For the Grunfeld data we concluded that the OLS assumption that the investment behaviour of all firms in all years is the same is not realistic. The fixed effects model offers another way of relaxing that assumption, namely by assuming that each firm has a number of unique characteristics that influence the firm’s investment behaviour. These unique characteristics are captured in the model by including a separate dummy variable for each firm.
In the Grunfeld example we assume:
- Each firm has a unique characteristic which is stable over time
- The random error term is assumed to satisfy the usual OLS assumptions
- Hence each firm i gets a different intercept parameter, but the slope coefficients b2 and b3 are assumed to be the same for all firms
- An easy way to estimate the model is to create a dummy variable for each firm and add these dummies to the model
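The effect of giving each firm its own intercept can be sketched with a tiny made-up panel (not the Grunfeld data). Instead of adding dummies explicitly, the sketch subtracts each firm's own mean from x and y (the "within" transformation), which yields the same slope as the dummy variable regression. The helper names and the data are assumptions for the illustration.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope of a simple (pooled) OLS regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_slope(firm, x, y):
    """FE slope: demean x and y within each firm, then regress on the demeaned data."""
    groups = defaultdict(list)
    for f, a, b in zip(firm, x, y):
        groups[f].append((a, b))
    xd, yd = [], []
    for obs in groups.values():
        mx = sum(a for a, _ in obs) / len(obs)
        my = sum(b for _, b in obs) / len(obs)
        xd += [a - mx for a, _ in obs]
        yd += [b - my for _, b in obs]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Hypothetical panel: 3 firms, 3 years each, true slope 0.5,
# firm-specific intercepts that are correlated with the level of x.
firm = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
intercept = {"A": 10, "B": 20, "C": 30}
y = [intercept[f] + 0.5 * a for f, a in zip(firm, x)]

pooled = ols_slope(x, y)            # ignores the firm effects
fixed = within_slope(firm, x, y)    # controls for the firm effects

print(f"pooled OLS slope:    {pooled:.2f}")
print(f"fixed effects slope: {fixed:.2f}")
```

Pooled OLS attributes the differences between firms to x and overstates the slope; the fixed effects estimate uses only the variation within each firm over time.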
General equation FE model
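The equation itself is missing from these notes. A form that is consistent with the Grunfeld assumptions listed above (a firm-specific intercept, common slopes b2 and b3) would be the following; the symbols y, x2, x3 and e are assumed notation, with i indexing firms and t indexing years.

```latex
y_{it} = b_{1i} + b_2 \, x_{2it} + b_3 \, x_{3it} + e_{it}
```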

Restrictions of the FE model. The FE model is very powerful, but:
- We cannot include variables that do not vary over time: all stable characteristics are captured by the dummies, which leaves no variation for estimating the effects of variables that vary only between economic entities
- You can only include those that change over time
- However you can still examine the interaction between group dummies and time-varying variables in FE model
When should you use an FE model? If you are concerned about omitted factors that may be correlated with key predictors at the group level.
Interpretation of results: similar to OLS.
Logs in the regression equation. In general, don’t forget:
- Sign, size, significance
- Use the unit of measurement of y and x when given
- Ceteris paribus
4 situations:
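The four situations are not listed in these notes. Presumably they are the four standard combinations of levels and logs for y and x, in which case the usual approximate interpretations of the slope b2 are as follows (notation assumed, each statement ceteris paribus):

```latex
\begin{align*}
\text{level-level:} \quad & y = b_1 + b_2 x + e
  && \Delta y = b_2 \, \Delta x \\
\text{log-level:} \quad & \ln y = b_1 + b_2 x + e
  && \%\Delta y \approx 100 \, b_2 \, \Delta x \\
\text{level-log:} \quad & y = b_1 + b_2 \ln x + e
  && \Delta y \approx (b_2 / 100) \, \%\Delta x \\
\text{log-log:} \quad & \ln y = b_1 + b_2 \ln x + e
  && \%\Delta y \approx b_2 \, \%\Delta x
\end{align*}
```

In the log-log case b2 is an elasticity: a 1% increase in x goes together with a change of approximately b2 percent in y.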

Robustness/sensitivity analysis:
To what end? To determine how sensitive your results are to changes in the model.
Experiment with:
- Combinations of (other) control variables
- Datasets
- Time frames
Always rely on theory and literature
If your results remain the same, the results are robust.
Add new contribution