Blok AWV HC10+11: Regression analysis

Mean and standard deviation

Statistics consists of making statements about a population based on data observed from a sample. This is often done using means and standard deviations (σ). The bigger the standard deviation, the bigger the spread in the population.

For example, the lung function (FEV1 in L) of 40 children is measured:

Mean FEV1 = 3,16 L
σ = 0,41 L

This means that roughly 95% of the population has an FEV1 between 3,16 – 0,82 L and 3,16 + 0,82 L → approximately 95% of observations lie less than 2σ from the mean:

95% reference interval = (2,34 L, 3,98 L)
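
The 2σ rule can be checked with a few lines of Python. This is only a minimal sketch using the mean and standard deviation from the example; the variable names are illustrative.

```python
# Approximate 95% range: mean plus or minus 2 standard deviations.
mean_fev1 = 3.16  # L, sample mean from the example
sd_fev1 = 0.41    # L, standard deviation from the example

lower = mean_fev1 - 2 * sd_fev1
upper = mean_fev1 + 2 * sd_fev1
print(f"Roughly 95% of children have an FEV1 between {lower:.2f} L and {upper:.2f} L")
```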

However, lung function depends on many factors such as age and gender. These factors also need to be taken into account.

Linear regression

Simple linear regression is regression for continuous outcomes. Linear regression tries to predict or explain a variable → the outcome or dependent variable (y). This variable is explained by another variable → the explanatory variable (x). A regression line is fitted to a scatter plot and gives the mean value of “y” for a given value of “x”:

y = the dependent variable, outcome and response variable
x = the independent variable, covariate, risk factor, predictor and explanatory variable
Mean y = β₀ + β₁x
- β₀ = the intercept (the constant; Dutch: “constante”)
  - The predicted value of “y” if “x” is equal to 0
    - Not always clinically meaningful
- β₁ = the slope (Dutch: “richtingscoëfficiënt”)
  - The expected increase in the outcome when the exposure increases by 1 unit, if β₁ is positive
    - Or the expected decrease, in case β₁ is negative

For instance, a regression line can describe the mean FEV1 as a function of age:

Mean FEV1 = 2,281 + 0,119 x age

This means that for 2 children with an age difference of 1 year, the expected mean difference in the FEV1 is 0,119 L.
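
The interpretation of the slope can be illustrated in code. The sketch below simply wraps the example equation in a function (the function name is hypothetical) and compares two children who differ 1 year in age.

```python
def mean_fev1(age_years: float) -> float:
    """Mean FEV1 (L) predicted by the regression line from the example."""
    intercept = 2.281  # b0: predicted mean FEV1 at age 0
    slope = 0.119      # b1: change in mean FEV1 per year of age
    return intercept + slope * age_years

# Two children who differ 1 year in age differ by the slope in expected FEV1.
difference = mean_fev1(9) - mean_fev1(8)
print(f"Expected difference: {difference:.3f} L")
```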

Error/residual:

The data consist of pairs (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), where each pair holds the values of 1 person. Individual observations deviate from the regression line, which is captured by an error term:

y = β₀ + β₁x + e

The deviations of the observations from the regression line are called residuals and are captured by the error e. The error/residual is assumed to be normally distributed with mean 0 and standard deviation σ. σ indicates how much the observations vary around the regression line:

Small σ: all observations are close to the regression line
Large σ: some observations are far from the regression line

The residual is the distance from a single observation to the regression line → the difference between what is observed and what is predicted:

y_i – (β₀ + β₁x_i)

Least squares method:

The unknown true regression line in the population is y = β₀ + β₁x. Using the least squares method, the regression line can be estimated by y = b₀ + b₁x. The b₀ and b₁ that minimize the sum of squared residuals are selected:

∑(y_i – (b₀ + b₁x_i))²
- b₁ = ∑(x_i – x̄)(y_i – ȳ) / ∑(x_i – x̄)², where x̄ and ȳ are the sample means
- b₀ = ȳ – b₁x̄
- s = √( ∑(y_i – (b₀ + b₁x_i))² / (n – 2) )
  - s is an estimate for σ, the standard deviation around the regression line
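
The least squares estimates have standard closed-form expressions. The following is a minimal sketch of them in plain Python, assuming the usual textbook formulas (with n – 2 in the denominator of s). The ages and FEV1 values are invented for illustration and are not the data from the lecture.

```python
from math import sqrt

def least_squares(x, y):
    """Return (b0, b1, s) for the line y = b0 + b1*x fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # spread around the line
    return b0, b1, s

# Hypothetical ages (years) and FEV1 values (L), invented for illustration.
ages = [6, 7, 8, 9, 10, 11]
fev1 = [2.9, 3.2, 3.2, 3.4, 3.4, 3.7]
b0, b1, s = least_squares(ages, fev1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, s = {s:.3f}")
```

In practice, statistical software such as SPSS performs this calculation; the sketch only shows what happens under the hood.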

95% confidence interval:

Because research is usually based on a sample, b₀ and b₁ are not exact. The standard error expresses the uncertainty in the estimates b₀ and b₁ (se(b₀) and se(b₁)) and is used to make confidence intervals for the true unknown β₀ and β₁. The approximate 95% CI for β₁ can be calculated as follows:

(b₁ – 2 x se(b₁), b₁ + 2 x se(b₁)) → we can be 95% confident that the true β₁ lies in this interval

If 0 lies within the 95% CI, there is no statistically significant evidence of an association. The 95% CI for the slope of age in the FEV1 example is:

(0,119 – 2 x 0,011, 0,119 + 2 x 0,011) = (0,097, 0,141) → a slope of 0 (no association between age and FEV1) is very unlikely
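
As a sketch, the same calculation in Python, using the slope and standard error from the example. The helper name `approx_ci` is hypothetical, and the multiplier 2 is the approximation used in these notes rather than an exact t-quantile.

```python
def approx_ci(estimate, se):
    """Approximate 95% CI: estimate plus or minus 2 standard errors."""
    return estimate - 2 * se, estimate + 2 * se

b1, se_b1 = 0.119, 0.011  # slope of age and its standard error from the example
lower, upper = approx_ci(b1, se_b1)
excludes_zero = not (lower <= 0 <= upper)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f}); excludes 0: {excludes_zero}")
```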

The 95% confidence interval for the mean y = β₀ + β₁x at a given value of “x” can be calculated as follows:

(b₀ + b₁x – 2 x se(b₀ + b₁x), b₀ + b₁x + 2 x se(b₀ + b₁x))
- se(b₀ + b₁x) can be calculated in SPSS

If the 95% CI of a regression line is known, the true regression line is likely to lie between these bounds.

Standard deviation versus standard error:

The standard deviation is often mixed up with the standard error:

Standard deviation: a measure of variability in the population → indicates how much the FEV1 values in children vary
Standard error: a measure of precision of an estimate (sample mean or estimated slope of the regression line) → used to calculate the 95% CIs
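
The difference can be made concrete with the 40 children from the first example. The sketch below assumes the standard formula for the standard error of a sample mean, se = σ / √n, which is not stated in these notes.

```python
from math import sqrt

n = 40       # number of children in the example
mean = 3.16  # L, sample mean FEV1
sd = 0.41    # L, standard deviation: spread of individual FEV1 values

se_mean = sd / sqrt(n)  # standard error: precision of the sample mean
ci = (mean - 2 * se_mean, mean + 2 * se_mean)
print(f"SD = {sd:.2f} L, SE of the mean = {se_mean:.3f} L")
print(f"Approximate 95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f}) L")
```

The standard deviation stays roughly the same when more children are measured, whereas the standard error shrinks as n grows.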

Prediction:

The expected FEV1 of a 6-year-old child according to the formula is:

2,281 + 0,119 x 6 = 2,995 L

There are 2 sources of variation:

Imprecision in the estimated regression line: se(b₀+ b₁x)
Spread around regression line σ

Combining these gives the 95% reference or prediction interval for a new observation → the interval within which 95% of the values in the population fall. For a 6-year-old child, FEV1 values between 2,6 and 3,5 L are considered normal.
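
How the two sources of variation combine can be sketched as follows. The code assumes the standard textbook formulas for simple linear regression (se of the fitted mean, and the square root of the summed variances for a new observation) and uses invented data, so the numbers it prints do not reproduce the interval quoted above.

```python
from math import sqrt

def intervals_at(x, y, x_new):
    """Approximate 95% CI for the mean and 95% prediction interval at x_new."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    fit = b0 + b1 * x_new
    # Source 1: imprecision in the estimated regression line.
    se_fit = s * sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    # Source 2: spread around the regression line, added on the variance scale.
    se_pred = sqrt(se_fit ** 2 + s ** 2)
    ci = (fit - 2 * se_fit, fit + 2 * se_fit)
    pi = (fit - 2 * se_pred, fit + 2 * se_pred)
    return fit, ci, pi

# Hypothetical ages (years) and FEV1 values (L), invented for illustration.
ages = [6, 7, 8, 9, 10, 11]
fev1 = [2.9, 3.2, 3.2, 3.4, 3.4, 3.7]
fit, ci, pi = intervals_at(ages, fev1, 6)
print(f"Predicted mean: {fit:.2f} L")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f}) L")
print(f"95% prediction interval: ({pi[0]:.2f}, {pi[1]:.2f}) L")
```

The prediction interval is always wider than the confidence interval for the mean, because it also contains the spread of individual children around the line.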

Assumptions:

Simple linear regression relies on some assumptions:

Linearity
- The scatterplot needs to be checked
- It is assumed that the relation between “x” and “y” is linear
Nearly normal residuals
Constant variability: homoscedasticity
- σ is constant
- This often isn’t a problem if the sample size is large → the estimate se, 95% CI and p-value are still valid
- If the “y” variable is very skewed, it may be log transformed
Independent observations
- How the data was collected needs to be checked

Residual plot:

The residual plot is the plot of predicted values versus residuals. It is used to check whether the assumptions hold. A residual plot shouldn’t show a clear pattern; patterns reveal deviations from the model:

Dots fanning out (spread that changes with the predicted value) → no constant variability
Dots taking the shape of a parabola → no linear relation
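
The parabola pattern can be reproduced with a small sketch: a straight line is fitted to deliberately curved, invented data, and the predicted values and residuals that a residual plot would show are printed. The helper `fit_line` is a hypothetical name for the least squares fit.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data following a curve (y = x squared), so not linear.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pred for yi, pred in zip(y, predicted)]

# Residuals are positive at both ends and negative in the middle: a parabola.
for pred, res in zip(predicted, residuals):
    print(f"predicted = {pred:6.2f}, residual = {res:6.2f}")
```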

Categorical variables:

If x is a binary categorical variable, it is coded as 0 or 1, for example if x indicates asthma treatment:

x = 0 → no treatment
x = 1 → treatment

In this case, x can be taken as an independent variable in the regression model of the FEV1 of children. The FEV of treated children is on average 0,266 L larger with a p-value of 0,036 → there is a statistically significant difference between treated and untreated children.

The increase in the mean FEV between untreated (x = 0) and treated (x = 1) children is 0,226 → the slope of the regression line. Because the mean of the treated and untreated children is compared, this is equivalent to an unpaired t-test.
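
That the slope equals the difference in group means can be verified with a short sketch. The FEV1 values below are invented, and `fit_line` is a hypothetical helper for the least squares fit.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data: x = 0 (no treatment) or x = 1 (treatment), FEV1 in L.
treated = [0, 0, 0, 0, 1, 1, 1, 1]
fev1 = [2.8, 3.0, 3.1, 3.1, 3.1, 3.3, 3.4, 3.4]

b0, b1 = fit_line(treated, fev1)
untreated_values = [f for t, f in zip(treated, fev1) if t == 0]
treated_values = [f for t, f in zip(treated, fev1) if t == 1]
mean_untreated = sum(untreated_values) / len(untreated_values)
mean_treated = sum(treated_values) / len(treated_values)

print(f"Intercept b0 = {b0:.3f}, mean of untreated = {mean_untreated:.3f}")
print(f"Slope b1 = {b1:.3f}, difference in means = {mean_treated - mean_untreated:.3f}")
```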

Multiple regression

Multiple linear regression is linear regression with more than one explanatory variable. It describes the influence of several explanatory variables on the response and addresses questions such as:

How does the average “y” vary as a function of x1, x2, ..., xp?
Can “y” be predicted if x1, x2, ..., xp are known?
What is the influence of x1 on “y”, corrected for x2, ..., xp?
Which combination of x’s is related to “y”?

Multiple regression can be used to:

Control for confounders
Build a prediction model
- By adding extra information to the model to make a better guess
  - E.g. age
Increase the precision
- By adding more information, fewer patients are needed to obtain the same precision for the treatment effect

Calculations:

The mean FEV1 is obtained with the formula 2,281 + 0,119 x age. This formula changes if height is added as an explanatory variable to the model:

Mean (FEV1) = 1,711 + (0,058 x age) + (0,008 x height)
- If 2 children have the same height, a child who is 1 year older has on average 0,058 L more FEV
- If 2 children have the same age, a child who is 1 cm taller has on average 0,008 L more FEV
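
The two interpretations can be checked by evaluating the equation directly. This is a minimal sketch; the function name and the example ages and heights are illustrative.

```python
def mean_fev1(age_years: float, height_cm: float) -> float:
    """Mean FEV1 (L) from the example model with age and height."""
    return 1.711 + 0.058 * age_years + 0.008 * height_cm

# Same height, 1 year older: the difference equals the coefficient of age.
same_height = mean_fev1(9, 130) - mean_fev1(8, 130)
# Same age, 1 cm taller: the difference equals the coefficient of height.
same_age = mean_fev1(8, 131) - mean_fev1(8, 130)

print(f"1 year older at equal height: {same_height:.3f} L")
print(f"1 cm taller at equal age: {same_age:.3f} L")
```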

In short, in multiple regression:

“y” is a numerical outcome
The model has 2 independent variables (x₁ and x₂) with e ~ N(0, σ²)
The estimated regression equation is y = b₀ + b₁x₁ + b₂x₂
- If x₁ is increased by 1 unit while x₂ is kept fixed → y = b₀ + b₁(x₁ + 1) + b₂x₂
  - The difference is b₁
  - b₁ is the amount by which the mean of y increases if x₁ increases by 1 unit and all other x’s are kept fixed

Testing and estimation:

Testing and estimation are done in a similar way as in simple linear regression. Coefficients are estimated with the least squares method, and standard errors, confidence intervals and p-values can be calculated. In the FEV example, after correction for height, the relation between age and FEV isn’t significant anymore.

In short, if age is added to the FEV model with treatment, the following is visible:

The direction of the treatment effect changes
- The effect is very small and no longer statistically significant
Age is a confounder
- Young children have a lower FEV and are less often treated
- Adding age to the model adjusts for age
  - Differences between treated and untreated for fixed ages should be considered

Confounding:

One of the main functions of multiple regression is to control for confounding. Confounding should be considered if the regression coefficient for a variable (e.g. treatment) changes if another variable (e.g. age) is added.
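
A small sketch can show how a coefficient changes when a confounder is added. The data are invented so that FEV1 depends on age only, while older children are treated more often. The least squares fit is written out with plain Gaussian elimination so that no extra libraries are needed; the helper names are hypothetical.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef

def least_squares(columns, y):
    """Least squares coefficients [b0, b1, ...] of y on the given x columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: FEV1 depends on age only; older children are treated more often.
age = [6, 6, 7, 7, 8, 8, 9, 9, 10, 10]
treated = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
fev1 = [2.3 + 0.1 * a for a in age]

crude = least_squares([treated], fev1)[1]          # treatment only
adjusted = least_squares([treated, age], fev1)[1]  # treatment, adjusted for age
print(f"Crude treatment effect: {crude:.3f} L")
print(f"Age-adjusted treatment effect: {adjusted:.3f} L")
```

In this constructed example the crude effect is entirely due to age, so it disappears after adjustment. Real data would show a less clean picture.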

Functions of regression

Simple and multiple linear regression have several uses:

Linear regression
- To predict: e.g. what is the mean FEV for a 7-year-old child who is 1,30 m tall and doesn’t use any medication?
- To correct for confounders: e.g. what is the effect of treatment on FEV, after adjustment for age?
Multiple regression
- Increases precision of randomized trials → adjusting for important risk variables explains part of the variability → the σ around the regression line becomes smaller

Conclusions shouldn’t be drawn outside the range of the sample → with extrapolation, the regression line may be different.

Types of regression models

There are different types of regression models for different types of outcomes:

Numerical outcomes → linear or non-linear regression
Binary outcome → logistic regression
- A 0-1, success/failure outcome
Survival data → proportional hazard model (Cox regression)

Cox proportional hazards:

Cox proportional hazards (Cox PH) regression is a method for adjusted analysis of survival data. The assumption is that there is a baseline hazard h₀(t) in a reference group, and a hazard ratio (HR) which increases or decreases the hazard:

h₁(t) = h₀(t) x HR

In this case, the HR may depend on covariates, but not on the time (t) → proportional hazards do not change over time.
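
The proportional hazards assumption can be illustrated with a sketch. The baseline hazard below is an arbitrary invented function, and the form HR = exp(β x) is the usual way a Cox model links the HR to a covariate; neither is taken from these notes.

```python
from math import exp

def baseline_hazard(t: float) -> float:
    """Hypothetical baseline hazard h0(t); it is allowed to change over time."""
    return 0.01 + 0.002 * t

def hazard(t: float, x: float, beta: float) -> float:
    """Hazard for covariate value x, assuming HR = exp(beta * x)."""
    hazard_ratio = exp(beta * x)
    return baseline_hazard(t) * hazard_ratio

beta = 0.7  # hypothetical log hazard ratio for a binary covariate
ratios = [hazard(t, 1, beta) / hazard(t, 0, beta) for t in (1, 5, 10)]

# The hazards themselves change with t, but their ratio stays the same.
for t, ratio in zip((1, 5, 10), ratios):
    print(f"t = {t:2d}: hazard ratio = {ratio:.3f}")
```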

This content is used in:

Blok AWV2 2020/2021 UL

This bundle contains all lecture notes from the AWV block in the 2nd year of the bachelor's programme in Medicine at Leiden University. Notes from the working groups are also incorporated in the summaries.

Author: nathalievlangen
