What is logistic regression? – Chapter 15

15.1 What are the basics of logistic regression?

A logistic regression model is a model with a binary response variable (like 'agree' or 'don't agree'). Logistic regression models can also have ordinal or nominal response variables. With the outcomes coded 0 and 1, the mean of y is the proportion of responses that are 1, so it equals P(y=1). The linear probability model is P(y=1) = α + βx. This model is often too simple, because a straight line can give probabilities below 0 or above 1. A more realistic version is: log[P(y=1)/(1 – P(y=1))] = α + βx.

The logarithm can be calculated using software. The odds are: P(y=1)/[1 – P(y=1)]. The log of the odds is called the logistic transformation (abbreviated as logit), and it gives the logistic regression model: logit[P(y=1)] = α + βx.

To find the probability for a certain value of a predictor, the following formula is used: P(y=1) = e^(α + βx) / [1 + e^(α + βx)].

Here e raised to a certain power is the antilog of that number (for natural logarithms).
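The conversion from the logit scale back to a probability can be sketched in a few lines of Python. This is only an illustration: it assumes the standard inverse of the logit, P(y=1) = e^(α + βx) / [1 + e^(α + βx)], and the values of alpha and beta are hypothetical, not taken from any data set.

```python
import math


def logistic_probability(alpha: float, beta: float, x: float) -> float:
    """Return P(y=1) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    linear_predictor = alpha + beta * x
    return math.exp(linear_predictor) / (1.0 + math.exp(linear_predictor))


def logit(p: float) -> float:
    """Return the log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical parameter values, chosen only for illustration.
alpha, beta = -2.0, 0.5

# At x = 4 the linear predictor alpha + beta*x is 0, so the probability is one half.
p_at_4 = logistic_probability(alpha, beta, 4.0)

# Applying the logit to a probability recovers the linear predictor (here 1.0 at x = 6).
recovered = logit(logistic_probability(alpha, beta, 6.0))
```

Taking the logit of the computed probability gives back α + βx, which is a quick way to check that the two formulas are inverses of each other.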

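To interpret the S-shaped logistic curve, a straight line can be drawn tangent to it at a given point. The slope of that tangent is βP(y=1)[1 – P(y=1)], so the curve is steepest where P(y=1) = ½, where the slope equals β/4. Logistic regression uses the maximum likelihood method instead of the least squares method. The model expressed in odds is: P(y=1)/[1 – P(y=1)] = e^(α + βx) = e^α (e^β)^x.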

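The estimate is: estimated odds = e^(α̂ + β̂x), where α̂ and β̂ are the maximum likelihood estimates of α and β.

With this the odds ratio can be calculated: every one-unit increase in x multiplies the estimated odds by e^(β̂).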
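As a small illustration of the odds form of the model, the sketch below computes the odds at two values of x that are one unit apart and takes their ratio. It assumes odds = e^(α + βx); the parameter values are hypothetical.

```python
import math


def odds(alpha: float, beta: float, x: float) -> float:
    """Return the odds P(y=1) / [1 - P(y=1)] = e^(alpha + beta*x)."""
    return math.exp(alpha + beta * x)


# Hypothetical parameter values, chosen only for illustration.
alpha, beta = -2.0, 0.5

# Odds ratio for a one-unit increase in x (from 2 to 3).
odds_ratio = odds(alpha, beta, 3.0) / odds(alpha, beta, 2.0)

# The same ratio is obtained at any other starting value of x.
odds_ratio_elsewhere = odds(alpha, beta, 8.0) / odds(alpha, beta, 7.0)
```

Both ratios equal e^β, which is why e^β is reported as the odds ratio per unit of x regardless of where on the curve the comparison is made.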

There are two ways to enter the data. Ungrouped data have a separate row for every subject. Grouped data have one row for every cell (every combination of predictor values), for instance a single row with the number of subjects that agreed, followed by the total number of subjects.

An alternative to the logit is the probit. This link assumes a hidden, underlying continuous variable y*, such that the observed y is 1 when y* lies above a certain value T (threshold) and 0 when y* lies below T. Because y* is hidden, it's called a latent variable. Assuming y* is normally distributed leads to the probit model: probit[P(y=1)] = α + βx.
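A probit model can be sketched with the standard library alone. The sketch assumes the usual definition of the probit link as the inverse of the standard normal cumulative distribution function, so that P(y=1) is the standard normal probability below α + βx. The parameter values are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()


def probit_probability(alpha: float, beta: float, x: float) -> float:
    """Return P(y=1) when probit[P(y=1)] = alpha + beta*x."""
    return std_normal.cdf(alpha + beta * x)


def probit(p: float) -> float:
    """Return the probit of p: the z-score with cumulative probability p."""
    return std_normal.inv_cdf(p)


# Hypothetical parameter values, chosen only for illustration.
alpha, beta = -2.0, 0.5

p_at_4 = probit_probability(alpha, beta, 4.0)  # linear predictor is 0
z_for_p = probit(probit_probability(alpha, beta, 6.0))  # recovers the linear predictor
```

As with the logit, a linear predictor of 0 corresponds to a probability of one half, and applying the link to the probability returns the linear predictor.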

Logistic regression with repeated measures and random effects is analyzed with a generalized linear mixed model: logit[P(yij = 1)] = α + βxij + si, in which si is the random effect for subject i.

15.2 What does multiple logistic regression look like?

The multiple logistic regression model is: logit[P(y = 1)] = α + β1x1 + … + βpxp. The further βi is from 0, the stronger the effect of xi is and the further the odds ratio is from 1. If needed, cross-product terms and dummy variables can be added.

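Research results are often expressed in terms of odds instead of the log odds scale, because the odds are easier to interpret. Taking antilogs turns the additive effects on the log odds into multiplicative effects on the odds: e^βi is the factor by which the odds are multiplied for a one-unit increase in xi, controlling for the other variables. To present the results even more clearly, they're expressed in probabilities, for instance the estimated probability of the outcome at a certain value of one predictor, while controlling for the other variables. The estimated probability is: P̂(y = 1) = e^(α̂ + β̂1x1 + … + β̂pxp) / [1 + e^(α̂ + β̂1x1 + … + β̂pxp)].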

The standardized estimate makes it possible to compare the effects of explanatory variables that use different units of measurement: β̂j* = β̂j · sxj.

Here sxj is the standard deviation of the variable xj.
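The estimated probability and the standardized estimate can be illustrated as follows. The sketch assumes the estimated probability is the inverse logit of the fitted linear predictor and that the standardized estimate is the estimate multiplied by the standard deviation of its predictor. All coefficients and data values are hypothetical.

```python
import math
import statistics


def estimated_probability(alpha_hat, beta_hats, xs):
    """Inverse logit of alpha_hat + sum of beta_hat_j * x_j."""
    linear_predictor = alpha_hat + sum(b * x for b, x in zip(beta_hats, xs))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def standardized_estimate(beta_hat_j, x_values):
    """Estimate multiplied by the sample standard deviation of its predictor."""
    return beta_hat_j * statistics.stdev(x_values)


# Hypothetical estimates for two predictors measured in different units.
alpha_hat = -3.0
beta_hats = [0.04, 0.9]          # e.g. income in thousands, and a 0/1 dummy
income = [10, 20, 30, 40, 50]    # hypothetical observed incomes
dummy = [0, 1, 0, 1, 1]          # hypothetical observed dummy values

p_hat = estimated_probability(alpha_hat, beta_hats, [30, 1])
std_income = standardized_estimate(beta_hats[0], income)
std_dummy = standardized_estimate(beta_hats[1], dummy)
```

The raw estimate for income looks small only because income is measured in small units; the standardized estimates put both effects on the scale of one standard deviation of the predictor.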

To help prevent selection bias in observational studies, the propensity is used: the probability that a subject ends up in a certain group. By adjusting for it, researchers can compare groups of people who were about equally likely to find themselves in a certain situation. However, this only accounts for observed confounding variables. Variables unknown to the researchers remain hidden.

15.3 How does inference with logistic regression models work?

A logistic regression model assumes a binomial distribution for the response and has this form: logit[P(y = 1)] = α + β1x1 + … + βpxp. The general null hypothesis is H0 : β1 = … = βp = 0 and is tested by the likelihood-ratio test. This inferential test compares a complete model to a reduced model. The likelihood function (ℓ) is the probability of the observed data, viewed as a function of the parameter values. For instance, ℓ0 is the maximized likelihood when the null hypothesis is true and ℓ1 is the maximized likelihood for the complete model. The test statistic is: -2 log (ℓ0 /ℓ1 ) = (-2 log ℓ0 ) – (-2 log ℓ1 ). For large samples it has approximately a chi-squared distribution, with df equal to the number of parameters set to 0 in the null hypothesis.

Alternative test statistics for a single parameter are z = β̂ / se and z squared (called the Wald statistic), which has approximately a chi-squared distribution with df = 1.

But for small samples or extreme effects the likelihood-ratio test works better.
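Both test statistics can be computed by hand in the simplest case: one binary predictor with grouped data. In that case the fitted probabilities of the complete model equal the sample proportions, and the estimated slope is the log of the sample odds ratio, with standard error equal to the square root of the sum of the reciprocal cell counts. The sketch below relies on those facts; the counts are hypothetical.

```python
import math


def binomial_log_likelihood(successes, totals, probabilities):
    """Sum of s*log(p) + (n - s)*log(1 - p) over the groups.

    Binomial coefficients are omitted because they cancel in the likelihood ratio.
    """
    return sum(
        s * math.log(p) + (n - s) * math.log(1.0 - p)
        for s, n, p in zip(successes, totals, probabilities)
    )


def chi_squared_p_value_df1(statistic):
    """Right-tail probability of a chi-squared variable with df = 1."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical grouped data for x = 0 and x = 1.
successes = [20, 35]
totals = [50, 50]
failures = [n - s for s, n in zip(successes, totals)]

# Complete model: fitted probabilities are the sample proportions per group.
p_full = [s / n for s, n in zip(successes, totals)]
# Reduced model (beta = 0): one common probability for both groups.
p_null = [sum(successes) / sum(totals)] * len(totals)

log_l1 = binomial_log_likelihood(successes, totals, p_full)
log_l0 = binomial_log_likelihood(successes, totals, p_null)
lr_statistic = (-2.0 * log_l0) - (-2.0 * log_l1)

# Wald test: beta_hat is the log odds ratio of the 2x2 table.
beta_hat = math.log((successes[1] * failures[0]) / (failures[1] * successes[0]))
se = math.sqrt(sum(1.0 / count for count in successes + failures))
z = beta_hat / se
wald_statistic = z ** 2

lr_p_value = chi_squared_p_value_df1(lr_statistic)
wald_p_value = chi_squared_p_value_df1(wald_statistic)
```

With these counts the two statistics are close but not identical, which is typical: they agree for large samples and can diverge when samples are small or effects are extreme.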

15.4 How is logistic regression performed for ordinal variables?

Ordinal variables assume a certain order in the categories. The cumulative probability is the probability that a response falls in a certain category j or below: P(y ≤ j). Each cumulative probability can be transformed to odds, for instance the odds that a response falls in category j or below: P(y ≤ j) / P(y > j).

Cumulative logits are popular. Each one divides the responses into a binary scale (category j or below versus above j): logit[P(y ≤ j)] = αj – βx in which j = 1, 2, …, c – 1 and c is the number of categories. Beware, some software puts + instead of – in front of the slope.

A proportional odds model is a cumulative logit model in which the slope is the same for every cumulative probability, so β doesn't vary. The slope indicates the steepness of the graph, so in a proportional odds model the lines of the different categories are equally steep.
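The step from cumulative logits to the probability of each separate category can be sketched as follows. The sketch assumes the parameterization logit[P(y ≤ j)] = αj – βx with a single slope β (the proportional odds form) and increasing intercepts αj. All parameter values are hypothetical.

```python
import math


def cumulative_probabilities(alphas, beta, x):
    """Return P(y <= j) for j = 1, ..., c - 1 under logit[P(y <= j)] = alpha_j - beta*x."""
    return [1.0 / (1.0 + math.exp(-(a - beta * x))) for a in alphas]


def category_probabilities(alphas, beta, x):
    """Return P(y = j) for j = 1, ..., c as differences of cumulative probabilities."""
    cumulative = cumulative_probabilities(alphas, beta, x) + [1.0]  # P(y <= c) = 1
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


def cumulative_odds(alphas, beta, x):
    """Return the odds P(y <= j) / P(y > j) for each j."""
    return [p / (1.0 - p) for p in cumulative_probabilities(alphas, beta, x)]


# Hypothetical parameters for a response with c = 4 ordered categories.
alphas = [-1.0, 0.0, 1.5]  # must be increasing
beta = 0.8

probs = category_probabilities(alphas, beta, x=1.0)

# Ratio of cumulative odds for a one-unit increase in x, for each cut point j.
odds_ratios = [
    after / before
    for before, after in zip(cumulative_odds(alphas, beta, 1.0),
                             cumulative_odds(alphas, beta, 2.0))
]
```

Every entry of odds_ratios equals e^(–β): the effect of x on the cumulative odds is the same at each cut point, which is what the name proportional odds refers to.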

Cumulative logit models can have multiple explanatory variables. H0 : β = 0 tests whether the response is independent of an explanatory variable. An independence test for logistic regression with ordinal variables uses the order in the data, so it usually has more power than tests that ignore the order, like the chi-squared test. A confidence interval is also an option.

An advantage of the cumulative logit model is that it is invariant to the choice of response categories. If a researcher uses a different number of categories, he/she will still reach the same conclusions.

15.5 What do logistic models with nominal responses look like?

For nominal variables (without order) a model exists that specifies the probabilities that a certain outcome happens instead of another outcome. This model calculates these probabilities simultaneously and it presumes independent observations. This is the baseline-category logit model: log[P(y = j) / P(y = c)] = αj + βjx in which j = 1, 2, …, c – 1 and category c is the baseline.

It doesn't matter which category is used as the baseline. Inference works similarly to logistic regression, but to test the effect of an explanatory variable, all parameters of the comparisons are involved. The likelihood-ratio test examines whether the model fits the data better with or without a certain variable.
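The probabilities implied by a baseline-category logit model can be sketched as below. The sketch assumes the form log[P(y = j) / P(y = c)] = αj + βjx with the last category c as baseline, so that the baseline has a linear predictor of 0. The parameter values are hypothetical.

```python
import math


def baseline_category_probabilities(alphas, betas, x):
    """Return [P(y=1), ..., P(y=c)] with category c as the baseline.

    alphas and betas hold the c - 1 intercepts and slopes of the
    non-baseline categories.
    """
    exp_terms = [math.exp(a + b * x) for a, b in zip(alphas, betas)]
    denominator = 1.0 + sum(exp_terms)  # the 1.0 is e^0 for the baseline
    return [term / denominator for term in exp_terms] + [1.0 / denominator]


# Hypothetical parameters for a response with c = 3 categories.
alphas = [0.5, -0.2]
betas = [0.3, 0.1]

probs = baseline_category_probabilities(alphas, betas, x=2.0)

# The log of P(y=1) / P(y=3) reproduces the first linear predictor.
log_odds_1_vs_baseline = math.log(probs[0] / probs[2])
```

All category probabilities are computed from the same denominator, which is why the model is said to estimate them simultaneously and why they always add up to 1.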

15.6 How do loglinear models describe the associations between categorical variables?

Most models study the effect of an explanatory variable on a response variable. Loglinear models are different: they study the associations between (categorical) variables, for instance in a contingency table. In that sense these models are more like a correlation analysis.

A loglinear model treats the cell counts as non-negative discrete variables and assumes they follow the Poisson distribution or, when the total sample size is fixed, the multinomial distribution.

A contingency table can show multiple categorical response variables. A conditional association is an association between two variables, while a third variable is controlled for. When two variables are conditionally independent, they are independent within each category of the third variable. A hierarchy of dependence is the following (accompanied by symbols for the response variables x, y and z):

  1. All three are conditionally independent (x, y, z)

  2. Two pairs are conditionally independent (xy, z)

  3. One pair is conditionally independent (xy, yz)

  4. There is no conditional independence, but there is a homogeneous association, meaning the association for each possible pair is the same for each category of the third variable (xy, yz, xz)

  5. All pairs are associated and there is interaction, this is a saturated model (xyz)

Loglinear models can also be interpreted using the odds ratio.
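For a 2×2×K table, homogeneous association means that the conditional odds ratio between x and y is the same in every category of z. A rough way to inspect this in observed counts is sketched below. The counts are hypothetical and were chosen so that the conditional odds ratios are exactly equal.

```python
def odds_ratio(table_2x2):
    """Return (n11 * n22) / (n12 * n21) for a 2x2 table given as [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table_2x2
    return (n11 * n22) / (n12 * n21)


def conditional_odds_ratios(partial_tables):
    """Return the x-y odds ratio within each category of z."""
    return [odds_ratio(table) for table in partial_tables]


# Hypothetical counts: one 2x2 table of x by y for each of two categories of z.
partial_tables = [
    [[20, 10], [5, 10]],   # z = 1
    [[8, 2], [10, 10]],    # z = 2
]

ratios = conditional_odds_ratios(partial_tables)

# Marginal table: the same counts added over the categories of z.
marginal_table = [
    [sum(t[i][j] for t in partial_tables) for j in range(2)]
    for i in range(2)
]
marginal_ratio = odds_ratio(marginal_table)
```

The marginal odds ratio, which ignores z, generally differs from the conditional ones, which is why conditional associations have to be examined within the categories of the third variable.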

15.7 How do goodness-of-fit tests work for contingency tables?

A goodness-of-fit test investigates the null hypothesis that a model really fits a certain population. It measures whether the estimated (expected) frequencies fe are close to the observed frequencies fo. Larger test statistics are stronger evidence that the model is incorrect. This is measured by the Pearson chi-squared test: X² = Σ (fo – fe)² / fe.

Another version is the likelihood-ratio chi-squared test: G² = 2 Σ fo log(fo / fe).

When the model fits the data perfectly, both X² and G² are 0. The likelihood-ratio test is better in case of large samples. The Pearson test is better for frequencies that average between 1 and 10. Both tests only work well for contingency tables with categorical predictors and relatively big counts.
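Both statistics are easy to compute once the observed and estimated frequencies are available. The sketch below assumes the usual definitions X² = Σ (fo – fe)² / fe and G² = 2 Σ fo log(fo / fe), summed over all cells. The frequencies are hypothetical.

```python
import math


def pearson_chi_squared(observed, expected):
    """Return X^2 = sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def likelihood_ratio_chi_squared(observed, expected):
    """Return G^2 = 2 * sum of fo * log(fo / fe); cells with fo = 0 contribute 0."""
    return 2.0 * sum(
        fo * math.log(fo / fe) for fo, fe in zip(observed, expected) if fo > 0
    )


# Hypothetical cell counts of a 2x2 table, listed row by row, with the
# frequencies a model of independence would estimate for them.
observed = [30, 20, 20, 30]
expected = [25.0, 25.0, 25.0, 25.0]

x2 = pearson_chi_squared(observed, expected)
g2 = likelihood_ratio_chi_squared(observed, expected)

# A model that reproduces the observed counts exactly gives 0 for both.
x2_perfect = pearson_chi_squared(observed, observed)
g2_perfect = likelihood_ratio_chi_squared(observed, observed)
```

The two statistics are usually close to each other when the counts are reasonably large, as they are here.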

To see what exactly doesn't fit, the standardized residuals can be calculated per cell: (fo – fe) / (standard error of (fo – fe)). When a standardized residual exceeds about 3 in absolute value, the model doesn't fit the data for that cell.

Goodness-of-fit tests and standardized residuals can also be applied to loglinear models.

To see whether a complete or a reduced model fits better, the two models can be compared with a likelihood-ratio test. Its test statistic is the difference between the G² values of the two models.
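The standard error in the denominator depends on the model. As an illustration only, the sketch below uses the model of independence in a two-way table, for which the standard error of fo – fe is the square root of fe(1 – row proportion)(1 – column proportion). The counts are hypothetical.

```python
import math


def standardized_residuals(table):
    """Standardized residuals per cell under independence in a two-way table."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]

    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, fo in enumerate(row):
            fe = row_totals[i] * column_totals[j] / total
            row_proportion = row_totals[i] / total
            column_proportion = column_totals[j] / total
            se = math.sqrt(fe * (1.0 - row_proportion) * (1.0 - column_proportion))
            residual_row.append((fo - fe) / se)
        residuals.append(residual_row)
    return residuals


# Hypothetical observed counts in a 2x2 table.
table = [[30, 20], [20, 30]]

residuals = standardized_residuals(table)
```

In this example every residual is 2 in absolute value, so no single cell would be flagged by the rule of thumb of about 3, even though the cells jointly show some association.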
