Discovering statistics using IBM SPSS statistics by A. Field (5th edition) - a summary

Why is my evil lecturer forcing me to learn statistics? - summary of chapter 1 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 1
      Why is my evil lecturer forcing me to learn statistics?


      The research process

      Initial observation: finding something that needs explaining

To see whether an observation is true, you need to define one or more variables that quantify the thing you're trying to measure.

      Generating and testing theories and hypotheses

      A theory: an explanation or set of principles that is well substantiated by repeated testing and explains a broad phenomenon.

A hypothesis: a proposed explanation for a fairly narrow phenomenon or set of observations.
      An informed, theory-driven attempt to explain what has been observed.

      A theory explains a wide set of phenomena with a small set of well-established principles.
A hypothesis typically seeks to explain a narrower phenomenon and is, as yet, untested.
      Both theories and hypotheses exist in the conceptual domain, and you cannot observe them directly.

To test a hypothesis, we need to operationalize it in a way that enables us to collect and analyse data that have a bearing on it.
Predictions emerge from a hypothesis. A prediction tells us something about the hypothesis from which it was derived.

Falsification: the act of disproving a hypothesis or theory.

      Collecting data: measurement

      Independent and dependent variable

      Variables: things that can change

      Independent variable: a variable thought to be the cause of some effect.

      Dependent variable: a variable thought to be affected by changes in an independent variable.

      Predictor variable: a variable thought to predict an outcome variable. (independent)

      Outcome variable: a variable thought to change as a function of changes in a predictor variable (dependent)

      Levels of measurement

The level of measurement: the relationship between what is being measured and the numbers that represent what is being measured.

      Variables can be categorical or continuous, and can have different levels of measurement.

      A categorical variable is made up of categories.
      It names distinct entities.
      In its simplest form it names just two distinct types of things (like male or female).
      Binary variable: there are only two categories.
      Nominal variable: there are more than two categories.

      Ordinal variable: when categories are ordered.
Tell us not only that things have occurred, but also the order in which they occurred.
These data tell us nothing about the size of the differences between values or between points on the scale.

      Continuous variable: a variable that gives us a score for each person and can take on any value on the measurement scale that we are using.
Interval variable: to say that data are interval, we must be certain that equal intervals on the scale represent equal differences in the property being measured.
      The spine of statistics - summary of chapter 2 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 2
      The spine of statistics

      What is the spine of statistics?

The spine of statistics (SPINE is an acronym for):

      • Standard error
      • Parameters
      • Interval estimates (confidence intervals)
• Null hypothesis significance testing
      • Estimation


      Statistical models

      Testing hypotheses involves building statistical models of the phenomenon of interest.
Scientists build (statistical) models of real-world processes to predict how these processes operate under certain conditions. The models need to be as accurate as possible so that the predictions we make about the real world are accurate too.
      The degree to which a statistical model represents the data collected is known as the fit of the model.

      The data we observe can be predicted from the model we choose to fit plus some amount of error.

      Populations and samples

      Scientists are usually interested in finding results that apply to an entire population of entities.
      Populations can be very general or very narrow.
Usually, scientists strive to infer things about general populations rather than narrow ones.

      We collect data from a smaller subset of the population known as a sample, and use these data to infer things about the population as a whole.
      The bigger the sample, the more likely it is to reflect the whole population.

      P is for parameters

      Statistical models are made up of variables and parameters.
Parameters are not measured and are (usually) constants believed to represent some fundamental truth about the relations between variables in the model
(like the mean and the median).

      We can predict values of an outcome variable based on a model. The form of the model changes, but there will always be some error in prediction, and there will always be parameters that tell us about the shape or form of the model.

      To work out what the model looks like, we estimate the parameters.

      The mean as a statistical model

      The mean is a hypothetical value and not necessarily one that is observed in the data.

Estimates are denoted with a hat (^).

      Assessing the fit of a model: sums of squares and variance revisited.

      The error or deviance for a particular entity is the score predicted by the model for that entity subtracted from the corresponding observed score.

      Degrees of freedom (df): the number of scores used to compute the total adjusted for the fact that we’re trying to estimate the population value.
      The degrees of freedom relate to the number of observations that are free to vary.

      The beast of bias - summary of chapter 6 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 6
      The beast of bias


      What is bias?

Bias: when the summary information is at odds with the objective truth.

An unbiased estimator: an estimator whose expected value equals the thing it is trying to estimate.

We predict an outcome variable from a model described by one or more predictor variables and parameters that tell us about the relationship between the predictor and the outcome variable.
      The model will not predict the outcome perfectly, so for each observation there is some amount of error.

      Statistical bias enters the statistical process in three ways:

      • things that bias the parameter estimates (including effect sizes)
      • things that bias standard errors and confidence intervals
      • things that bias test statistics and p-values

      Outliers

      An outlier: a score very different from the rest of the data.

      Outliers have a dramatic effect on the sum of squared error.
      If the sum of squared errors is biased, the associated standard error, confidence interval and test statistic will be too.

      Overview of assumptions

      The second bias is ‘violation of assumptions’.

      An assumption: a condition that ensures that what you’re attempting to do works.
      If any of the assumptions are not true then the test statistic and p-value will be inaccurate and could lead us to the wrong conclusion.

      The main assumptions that we’ll look at are:

      • additivity and linearity
      • normality of something or other
      • homoscedasticity/ homogeneity of variance
      • independence

      Additivity and linearity

The assumption of additivity and linearity: the relationship between the outcome variable and the predictors is accurately described by the linear model equation.
      The scores on the outcome variable are, in reality, linearly related to any predictors. If you have several predictors then their combined effect is best described by adding their effects together.

      If the assumption is not true, even if all the other assumptions are met, your model is invalid because your description of the process you want to model is wrong.

      Normally distributed something or other

      The assumption of normality relates in different ways to things we want to do when fitting models and assessing them:

      • Parameter estimates.
        The mean is a parameter and extreme scores can bias it.
        Estimates of parameters are affected by non-normal distributions (such as those with outliers).
        Parameter estimates differ in how much they are biased in a non-normal distribution.
      • Confidence intervals
  We use values of the standard normal distribution to compute the confidence interval around a parameter estimate. Using those values makes sense only if the parameter estimate itself comes from a normal sampling distribution.
Non-parametric models - summary of chapter 7 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 7
      Non-parametric models


      When to use non-parametric tests

      Sometimes you can’t correct problems in your data.
      This is especially irksome if you have a small sample and can’t rely on the central limit theorem to get you out of trouble.

      • The historical solution is a small family of models called non-parametric tests or assumption-free tests that make fewer assumptions than the linear model.

      The four most common non-parametric procedures:

      • the Mann-Whitney test
      • the Wilcoxon signed-rank test
• Friedman’s test
      • the Kruskal-Wallis test

      All four tests overcome distributional problems by ranking the data.

Ranking the data: find the lowest score and give it a rank of 1, then find the next lowest score and give it a rank of 2, and so on.
      This process results in high scores being represented by large ranks, and low scores being represented by small ranks.
      The model is then fitted to the ranks and not to the raw scores.

      • By using ranks we eliminate the effect of outliers.
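A quick illustration of ranking (hypothetical scores; computed here in Python with scipy rather than SPSS):

    import numpy as np
    from scipy.stats import rankdata

    scores = np.array([8, 3, 5, 5, 100])   # 100 is an extreme outlier
    ranks = rankdata(scores)                # tied scores receive the average of their ranks
    print(ranks)                            # [4.  1.  2.5 2.5 5. ]

Note that the outlier (100) simply becomes the highest rank (5): how extreme it is no longer matters.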

      Comparing two independent conditions: the Wilcoxon rank-sum test and Mann-Whitney test

      There are two choices to compare the distributions in two conditions containing scores from different entities:

      • the Mann-Whitney test
      • the Wilcoxon rank-sum test

      Both tests are equivalent.
      There is also a second Wilcoxon test that does something different.

      Theory

If you were to rank the data ignoring the group to which a person belonged from lowest to highest, and there’s no difference between the groups, then you should find a similar number of high and low ranks in each group.

      • if you added up the ranks, then you’d expect the summed total of ranks in each group to be about the same.

If you were to rank the data ignoring the group to which a person belonged from lowest to highest, and there’s a difference between the groups, then you should not find a similar number of high and low ranks in each group.

      • if you added up the ranks, then you’d expect the summed total of ranks in each group to be different.

      The Mann-Whitney and Wilcoxon rank-sum test use the principles above.

• when the groups have unequal numbers of participants in them, the test statistic (Ws) for the Wilcoxon rank-sum test is simply the sum of ranks in the smaller group.
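A minimal sketch of the test in practice (hypothetical scores; scipy’s mannwhitneyu, which is equivalent to the Wilcoxon rank-sum test):

    import numpy as np
    from scipy.stats import mannwhitneyu

    group1 = np.array([2, 4, 5, 9, 10])     # scores from different entities
    group2 = np.array([6, 8, 11, 12, 13])
    u, p = mannwhitneyu(group1, group2, alternative='two-sided')
    print(u, p)                              # a small p suggests the two distributions differ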
      Correlation - summary of chapter 8 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 8
      Correlation


      Modeling relationships

      The data we observe can be predicted from the model we choose to fit the data plus some error in prediction.

      Outcomei = (model) + errori
      Thus
      outcomei = (b1Xi)+errori

      z(outcome)i = b1z(Xi)+errori

      z-scores are standardized scores.

      A detour into the murky world of covariance

      The simplest way to look at whether two variables are associated is to look whether they covary.
      If two variables are related, then changes in one variable should be met with similar changes in the other variable.

covariance(x, y) = Σni=1 (xi − x̄)(yi − ȳ) / (N − 1)

      The equation for covariance is the same as the equation for variance, except that instead of squaring the deviances, we multiply them by the corresponding deviance of the second variable.

A positive covariance indicates that as one variable deviates from the mean, the other variable deviates in the same direction.
      A negative covariance indicates that as one variable deviates from the mean, the other deviates from the mean in the opposite direction.

      The covariance depends upon the scales of measurement used: it is not a standardized measure.

      Standardization of the correlation coefficient

To overcome the problem of dependence on the measurement scale, we need to convert the covariance into a standard set of units → standardization.
      Standard deviation: a measure of the average deviation from the mean.
      If we divide any distance from the mean by the standard deviation, it gives us that distance in standard deviation units.
We can express the covariance in standard units of measurement by dividing it by the standard deviation. But there are two variables and hence two standard deviations.

      Correlation coefficient: the standardized covariance

      r = covxy/(sxsy)

      sx is the standard deviation for the first variable
      sy is the standard deviation for the second variable.

      By standardizing the covariance we end up with a value that has to lie between -1 and +1.
      A coefficient of +1 indicates that the two variables are perfectly positively correlated.
      A coefficient of -1 indicates a perfect negative relationship.
      A coefficient of 0 indicates no linear relationship at all.
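A worked sketch of both formulas on hypothetical paired scores (Python; the data are illustrative):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))   # standardize by both standard deviations
    print(cov_xy, r)                                # same r as np.corrcoef(x, y)[0, 1]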

      The significance of the correlation coefficient

      We can test the hypothesis that the correlation is different from zero.
      There are two ways of testing this hypothesis.

      We can adjust r so that its sampling distribution is normal:

      zr = ½ loge((1+r)/(1-r))

      The resulting zr has a standard error given by:

SEzr = 1 / √(N − 3)
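For example (hypothetical r and N; np.arctanh performs the same log transformation):

    import numpy as np

    r, n = 0.45, 30                           # hypothetical correlation and sample size
    z_r = 0.5 * np.log((1 + r) / (1 - r))     # z_r = arctanh(r)
    se = 1 / np.sqrt(n - 3)                   # standard error of z_r
    print(z_r / se)                           # z-statistic for testing r against zero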

Alternatively, we can convert r into a t-statistic with N − 2 degrees of freedom:

t = r √(N − 2) / √(1 − r²)
The linear model - summary of chapter 9 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 9
      The linear model (regression)


      An introduction to the linear model (regression)

      The linear model with one predictor

outcomei = (b0 + b1Xi) + errori

      This model uses an unstandardised measure of the relationship (b1) and consequently we include a parameter b0 that tells us the value of the outcome when the predictor is zero.

      Any straight line can be defined by two things:

      • the slope of the line (usually denoted by b1)
• the point at which the line crosses the vertical axis of the graph (the intercept of the line, b0)

      These parameters are regression coefficients.

      The linear model with several predictors

      The linear model expands to include as many predictor variables as you like.
      An additional predictor can be placed in the model given a b to estimate its relationship to the outcome:

Yi = (b0 + b1X1i + b2X2i + … + bnXni) + Ɛi

bn is the coefficient of the nth predictor (Xni)

Regression analysis is a term for fitting a linear model to data and using it to predict values of an outcome variable from one or more predictor variables.
      Simple regression: with one predictor variable
      Multiple regression: with several predictors

      Estimating the model

      No matter how many predictors there are, the model can be described entirely by a constant (b0) and by parameters associated with each predictor (bs).

      To estimate these parameters we use the method of least squares.
      We could assess the fit of a model by looking at the deviations between the model and the data collected.

      Residuals: the differences between what the model predicts and the observed values.

      To calculate the total error in a model we square the differences between the observed values of the outcome, and the predicted values that come from the model:

      total error: Σni=1(observedi-modeli)2

      Because we call these errors residuals, this is called the residual sum of squares (SSR).
      It is a gauge of how well a linear model fits the data.

      • if the SSR is large, the model is not representative
      • if the SSR is small, the model is representative for the data

      The least SSR gives us the best model.
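A small sketch of least squares and SSR on hypothetical data (np.polyfit finds the b values that minimize SSR):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical outcome

    b1, b0 = np.polyfit(x, y, 1)               # least-squares slope and intercept
    ss_r = np.sum((y - (b0 + b1 * x)) ** 2)    # residual sum of squares
    print(b0, b1, ss_r)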

      Assessing the goodness of fit, sums of squares R and R2

      Goodness of fit: how well the model fits the observed data

      Total sum of squares (SST): how good the mean is as a model of the observed outcome scores.

We can use the values of SST and SSR to calculate how much better the linear model is than the mean: the improvement is the model sum of squares, SSM = SST − SSR, and R² = SSM/SST expresses it as a proportion of the total variation in the outcome.
      Comparing two means - summary of chapter 10 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 10
      Comparing two means

      Categorical predictors in the linear model

      If we want to compare differences between the means of two groups, all we are doing is predicting an outcome based on membership of two groups.
      This is a linear model with one dichotomous predictor.


      The t-test

      Independent t-test: used when you want to compare two means that come from conditions consisting of different entities (this is sometimes called the independent-measures or independent-means t-test)
Paired-samples t-test: also known as the dependent t-test; used when you want to compare two means that come from conditions consisting of the same or related entities.

      Rationale for the t-test

      Both t-tests have a similar rationale:

      • two samples of data are collected and the sample means calculated. These might differ by either a little or a lot
      • If the samples come from the same population, then we expect their means to be roughly equal. Although it is possible for the means to differ because of sample variation, we would expect large differences between sample means to occur very infrequently. Under the null hypothesis we assume that the experimental manipulation has no effect on the participant’s behaviour: therefore, we expect means from two random samples to be very similar.
      • We compare the difference between the sample means that we collected to the difference between the sample means that we would expect to obtain (in the long run) if there were no effect. We use the standard error as a gauge of the variability between sample means. If the standard error is small, then we expect most samples to have very similar means. When the standard error is large, large differences in sample means are more likely. If the difference between the samples we have collected is larger than we would expect based on the standard error then one of two things has happened:
• There is no effect but sample means from our population fluctuate a lot and we happen to have collected two samples that produce very different means.
        • The two samples come from different populations, which is why they have different means, and this difference is indicative of a genuine difference between the samples.
      • the larger the observed difference between the sample means, the more likely it is that the second explanation is correct.

      Most test statistics have a signal-to-noise ratio: the ‘variance explained by the model’ divided by the ‘variance that the model can’t explain’.
      Effect divided by error.
When comparing two means, the model we fit is the difference between the two group means. Means vary from sample to sample (sampling variation) and we can use the standard error as a measure of how much means fluctuate. Therefore, we compare the difference between the sample means against the standard error of that difference.
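A minimal sketch with hypothetical scores (scipy’s ttest_ind implements the independent t-test):

    import numpy as np
    from scipy.stats import ttest_ind

    group1 = np.array([5.2, 6.1, 4.8, 5.9, 6.3])   # condition 1 (different entities)
    group2 = np.array([6.8, 7.4, 6.9, 7.8, 7.1])   # condition 2
    t, p = ttest_ind(group1, group2)                # assumes equal variances by default
    print(t, p)                                     # t = mean difference / standard error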
Moderation, mediation, and multi-category predictors - summary of chapter 11 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 11
      Moderation, mediation, and multi-category predictors


      Moderation: interactions in the linear model

      The conceptual model

Moderation: when a statistical model includes the combined effect of two or more predictor variables on an outcome.
      This is in statistical terms an interaction effect.

      A moderator variable: one variable that affects the relationship between two others.
      Can be continuous or categorical.
We can explore this by comparing the slope of the regression plane for X at low and high levels of Y.

      The statistical model

That is the conceptual picture; in the statistical model we predict the outcome from the predictor variable, the proposed moderator, and the interaction of the two.
It is the interaction effect that tells us whether moderation has occurred, but we must include the predictor and moderator for the interaction term to be valid.

      Outcomei = (model) + errori

      or

Yi = (b0 + b1X1i + b2X2i + … + bnXni) + Ɛi

      To add variables to a linear model we literally just add them in and assign them a parameter (b).
      Therefore, if we had two predictors labelled A and B, a model that tests for moderation would be expressed as:

      Yi = (b0 + b1Ai + b2Bi + b3ABi) + Ɛi

      The interaction is ABi

      Centring variables

      When an interaction term is included in the model the b parameters have a specific meaning: for the individual predictors they represent the regression of the outcome on that predictor when the other predictor is zero.

But there are situations where it makes no sense for a predictor to have a score of zero. So the interaction term makes the bs for the main predictors uninterpretable in many situations.
      For this reason, it is common to transform the predictors using grand mean centring.
      Centring: the process of transforming a variable into deviations around a fixed point.
This fixed point can be any value that you choose, but typically it’s the grand mean.
      The grand mean centring for a given variable is achieved by taking each score and subtracting from it the mean of all scores (for that variable).
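For example (hypothetical scores):

    import numpy as np

    predictor = np.array([12.0, 15.0, 9.0, 20.0, 14.0])   # raw scores
    centred = predictor - predictor.mean()                 # grand mean centring
    print(centred, centred.mean())                         # centred scores have mean 0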

Centring the predictors has no effect on the b for the highest-order predictor, but will affect the bs for the lower-order predictors.
      Order: how many variables are involved.
      When we centre variables, the bs represent the effect of the predictor when the other predictor is at its mean value.

      Centring is important when your model contains an interaction term because it makes the bs for lower-order effects interpretable.
There are good reasons for not caring about the lower-order effects when the higher-order interaction involving them is significant.
      Public
      Comparing several independent means - summary of chapter 12 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 12
      Comparing several independent means


      Using a linear model to compare several means

ANOVA (analysis of variance): the same thing as the linear model or regression.

      In designs in which the group sizes are unequal, it is important that the baseline category contains a large number of cases to ensure that the estimates of the b-values are reliable.

      When we are predicting an outcome from group membership, predicted values from the model are the group means.
      If the group means are meaningfully different, then using the group means should be an effective way to predict scores.

Outcomei = b0 + b1X1i + b2X2i + Ɛi

in which X1 and X2 are dummy variables coding group membership. For the baseline (control) group both dummies are zero, so the control group mean = b0.
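A sketch of this coding on hypothetical data (three groups, two dummy variables, fitted by least squares):

    import numpy as np

    group = np.array(['control', 'control', 'A', 'A', 'B', 'B'])   # hypothetical design
    y = np.array([3.0, 5.0, 7.0, 9.0, 10.0, 12.0])

    dummy_a = (group == 'A').astype(float)     # 1 for group A, else 0
    dummy_b = (group == 'B').astype(float)     # 1 for group B, else 0
    X = np.column_stack([np.ones_like(y), dummy_a, dummy_b])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b)   # b0 = control mean; b1, b2 = each group mean minus the control mean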

Dummy coding is only one of many ways to code categorical variables.

      • an alternative is contrast coding: in which you code the dummy variables in such a way that the b-values represent differences between groups that you specifically hypothesized before collecting data.

      The F-test is an overall test that doesn’t identify differences between specific means. But, the model parameters do.

      Logic of the F-statistic

      The F-statistic tests the overall fit of a linear model to a set of observed data.
      F is the ratio of how good the model is compared to how bad it is.
      When the model is based on group means, our predictions from the model are those means.

      • if the group means are the same then our ability to predict the observed data will be poor (F will be small)
      • if the means differ we will be able to better discriminate between cases from different groups (F will be large).

      F tells us whether the group means are significantly different.

      The same logic as for any linear model:

      • the model that represents ‘no effect’ or ‘no relationship between the predictor variable and the outcome’ is one where the predicted value of the outcome is always the grand mean
• we can fit a different model to the data that represents our alternative hypothesis. We compare the fit of this model to the fit of the null model
      • the intercept and one or more parameters (b) describe the model
      • the parameters determine the shape of the model that we have fitted.
      • in experimental research the parameters (b) represent the differences between group means. The bigger the differences between group means, the greater the difference between the model and the null model (grand mean)
• if the differences between group means are large enough, then the resulting model will be a better fit to the data than the null model (the grand mean).
      Analysis of covariance - summary of chapter 13 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 13
      Comparing means adjusted for other predictors (analysis of covariance)


      What is ANCOVA?

      The linear model to compare means can be extended to include one or more continuous variables that predict the outcome (or dependent variable).
      Covariates: the additional predictors.

      ANCOVA: analysis of covariance.

      Reasons to include covariates in ANOVA:

      • To reduce within-group error variance
      • Elimination of confounds

      ANCOVA and the general linear model

      For example:

      Happinessi = b0 + b1Longi + b2Shorti + b3Covariatei + Ɛi

      We can add a covariate as a predictor to the model to test the difference between group means adjusted for the covariate.

      With a covariate present, the b-values represent the differences between the means of each group and the control adjusted for the covariate(s).

      Assumptions and issues in ANCOVA

      Independence of the covariate and treatment effect

      When the covariate and the experimental effect are not independent, the treatment effect is obscured, spurious treatment effects can arise, and at the very least the interpretation of the ANCOVA is seriously compromised.

      When treatment groups differ on the covariate, putting the covariate into the analysis will not ‘control for’ or ‘balance out’ those differences.
      This problem can be avoided by randomizing participants to experimental groups, or by matching experimental groups on the covariate.

      We can see whether this problem is likely to be an issue by checking whether experimental groups differ on the covariate before fitting the model.
      If they do not significantly differ then we might consider it reasonable to use it as a covariate.

      Homogeneity of regression slopes

When a covariate is used we look at its overall relationship with the outcome variable; we ignore the group to which a person belongs.
      We assume that this relationship between covariate and outcome variable holds true for all groups of participants: homogeneity of regression slopes.

      There are situations where you might expect regression slopes to differ across groups and that variability may be interesting.

      What to do when assumptions are violated

      • bootstrap for the model parameters
      • post hoc tests

      But bootstrap won’t help for the F-tests.

      There is a robust variant of ANCOVA.

      Interpreting ANCOVA

      The main analysis

The format of the ANOVA table is largely the same as without the covariate, except that there is an additional row for the effect of the covariate.
      Factorial designs - summary of chapter 14 of statistics by A. Field (5th edition)


      Statistics
      Chapter 14
      Factorial designs


      Factorial designs

      Factorial design: when an experiment has two or more independent variables.
      There are several types of factorial designs:

      • Independent factorial design: there are several independent variables or predictors and each has been measured using different entities (between groups).
      • Repeated-measures (related) factorial design: several independent variables or predictors have been measured, but the same entities have been used in all conditions.
      • Mixed design: several independent variables or predictors have been measured: some have been measured with different entities, whereas others used the same entities.

      We can still fit a linear model to the design.
      Factorial ANOVA: the linear model with two or more categorical predictors that represent experimental independent variables.

      Independent factorial designs and the linear model

      The general linear model takes the following general form:

Yi = b0 + b1X1i + b2X2i + … + bnXni + Ɛi

We can code participants’ category membership with variables containing zeros and ones.

      For example:

Attractivenessi = b0 + b1Ai + b2Bi + b3ABi + Ɛi

ABi is the interaction variable: the A dummy variable multiplied by the B dummy variable.

      Behind the scenes of factorial designs

      Calculating the F-statistic with two categorical predictors is very similar to when we had only one.

      • We still find the total sum of squared errors (SST) and break this variance down into variance that can be explained by the model/experiment (SSM) and variance that cannot be explained (SSR)
      • The main difference is that with factorial designs, the variance explained by the model/experiment is made up of not one predictor, but two.

      Therefore, the sum of squares gets further subdivided into

      • variance explained by the first predictor/independent variable (SSA)
      • variance explained by the second predictor/independent variable (SSB)
      • variance explained by the interaction of these two predictors (SSAxB)

      Total sum of squares (SST)

We start off by calculating how much variability there is between scores when we ignore the experimental condition from which they came.

      The grand variance: the variance of all scores when we ignore the group to which they belong.
      We treat the data as one big group.
      The degrees of freedom are: N-1

      SST = s2Grand(N-1)
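For example (hypothetical scores, all groups pooled):

    import numpy as np

    scores = np.array([4.0, 6.0, 5.0, 8.0, 9.0, 7.0])    # all scores, groups ignored
    ss_t = scores.var(ddof=1) * (len(scores) - 1)         # SST = grand variance x (N - 1)
    print(ss_t, np.sum((scores - scores.mean()) ** 2))    # identical values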

      The model sum of squares (SSM)

      The model sum of squares is broken down into the variance attributable to the first independent variable, the variance attributable to the second independent variable, and the variance attributable to the interaction of those two.

      The model sum of squares: the difference between what the model predicts and the overall mean of the outcome variable.

      Repeated measures designs - summary of chapter 15 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 15
      Repeated measures designs


      Introduction to repeated-measures designs

      Repeated measures: when the same entities participate in all conditions of an experiment or provide data at multiple time points.

      Repeated measures and the linear model

      Repeated measures can also be considered as a variation of the general linear model.

      For example.

Ygi = b0i + b1iXgi + Ɛgi

      b0i = b0 + u0i

      b1i = b1 + u1i

Ygi is the outcome at treatment level g for person i, predicted from Xgi with error Ɛgi.

g indexes the treatment condition
i indexes the individuals

u0i is the deviation of individual i’s intercept from the group-level intercept

      The ANOVA approach to repeated-measures designs

      The way that people typically handle repeated measures in IBM SPSS is to use a repeated-measures ANOVA approach.

      The assumption of sphericity

      The assumption that permits us to use a simpler model to analyse repeated-measures data is sphericity.

      Sphericity: assuming that the relationship between scores in pairs of treatment conditions is similar.

It is related to compound symmetry, which holds true when both the variances across conditions are equal and the covariances between pairs of conditions are equal.
      We assume that the variation within conditions is similar and that no two conditions are any more dependent than any other two.
      Sphericity is a more general, less restrictive form of compound symmetry and refers to the equality of variances of the differences between treatment levels.

      For example:

      varianceA-B = varianceA-C = varianceB-C
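A quick check of this on hypothetical repeated-measures data (rows are participants, columns are conditions A, B, C):

    import numpy as np

    scores = np.array([[8., 7., 6.],
                       [9., 5., 4.],
                       [6., 6., 5.],
                       [7., 4., 3.]])
    a, b, c = scores.T
    for label, d in (('A-B', a - b), ('A-C', a - c), ('B-C', b - c)):
        print(label, d.var(ddof=1))   # sphericity holds if these variances are roughly equal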

      Assessing the severity of departures from sphericity

      Mauchly’s test: assesses the hypothesis that the variances of the differences between conditions are equal.
      If the test is statistically significant, it implies that there are significant differences between the variances of differences and, therefore, sphericity is not met.
      If it is not significant, the implication is that the variances of differences are roughly equal and sphericity is met.
      It depends upon sample size.

      What’s the effect of violating the assumption of sphericity?

      A lack of sphericity creates a loss of power and an F-statistic that doesn’t have the distribution that it’s supposed to have.
It also causes some complications for post hoc tests.
      Mixed designs - summary of chapter 16 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 16
      Mixed designs


      Mixed designs

      Situations where we combine repeated-measures and independent designs.

      Mixed designs: when a design includes some independent variables that were measured using different entities and others that used repeated measures.
      A mixed design requires at least two independent variables.

Because by adding independent variables we’re simply adding predictors to the linear model, you can have virtually any number of independent variables if your sample size is big enough.

      We’re still essentially using the linear model.
Because there are repeated measures involved, people typically use an ANOVA-style model: mixed ANOVA.

      Assumptions in mixed designs

      All the sources of potential bias in chapter 6 apply.

      • homogeneity of variance
      • sphericity

      You can apply the Greenhouse-Geisser correction and forget about sphericity.

      Mixed designs

• Mixed designs compare several means when there are two or more independent variables, and at least one of them has been measured using the same entities and at least one other has been measured using different entities.
      • Correct for deviations from sphericity for the repeated-measures variable(s) by routinely interpreting the Greenhouse-Geisser corrected effects.
• The table labelled Tests of Within-Subjects Effects shows the F-statistic(s) for any repeated-measures variables and all of the interaction effects. For each effect, read the row labelled Greenhouse-Geisser or Huynh-Feldt. If the value in the Sig. column is less than 0.05 then the means are significantly different
      • The table labelled Test of Between-Subjects Effects shows the F-statistic(s) for any between-group variables. If the value in the Sig column is less than 0.05 then the means of the groups are significantly different
• Break down the main effects and interaction terms using contrasts. These contrasts appear in the table labelled Tests of Within-Subjects Contrasts. Again, look at the column labelled Sig.
      • Look at the means, or draw graphs, to help you interpret contrasts.

      Calculating effect sizes

      Effect sizes are more useful when they summarize a focused effect.

      A straightforward approach is to calculate effect sizes for your contrasts.

      Multivariate analysis of variance (MANOVA) - summary of chapter 17 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 17
      Multivariate analysis of variance (MANOVA)


      Introducing MANOVA

      Multivariate analysis of variance (MANOVA) is used when we are interested in several outcomes.

      The principles of the linear model extend to MANOVA in that we can use MANOVA when there is one independent variable or several, we can look at interactions between outcome variables, and we can do contrasts to see which groups differ.

      Univariate: the model when we have only one outcome variable.
      Multivariate: the model when we include several outcome variables simultaneously.

      We shouldn’t fit separate linear models to each outcome variable.

Separate models can tell us only whether groups differ along a single dimension; MANOVA has the power to detect whether groups differ along a combination of dimensions.

      Choosing outcomes

      It is a bad idea to lump outcome measures together in a MANOVA unless you have a good theoretical or empirical basis for doing so.
Where there is a good theoretical basis for including some, but not all, of your outcome measures, fit separate models: one for the outcomes being tested on a heuristic and one for the theoretically meaningful outcomes.

      The point here is not to include lots of outcome variables in a MANOVA just because you measured them.

      Introducing matrices

      A matrix: a grid of numbers arranged in columns and rows.
      A matrix can have many columns and rows, and we specify its dimensions using numbers.
      For example: a 2 x 3 matrix is a matrix with two rows and three columns.

      The values within a matrix are components or elements.
      The rows and columns are vectors.

      A square matrix: a matrix with an equal number of columns and rows.

      An identity matrix: a square matrix in which the diagonal elements are 1 and the off-diagonal elements are 0.

      The matrix that represents the systematic variance (or the model sum of squares for all variables) is denoted by the letter H and is called the hypothesis sum of squares and cross-products matrix (or hypothesis SSCP).

      The matrix that represents the unsystematic variance (the residual sums of squares for all variables) is denoted by the letter E and called the error sum of squares and cross-products matrix (or error SSCP).

      The matrix that represents the total amount of variance present for each outcome variable is denoted by T and is called the total sum of squares and cross-products matrix (or total SSCP).

      Cross-products represent a total value for the combined error between two variables.
Whereas the sum of squares of a variable is the total squared deviance of that variable from its own mean, the cross-product is the total combined deviance of two variables from their respective means.
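A minimal sketch of a total SSCP matrix on hypothetical scores for two outcome variables:

    import numpy as np

    Y = np.array([[3., 5.],    # rows = cases, columns = outcome variables
                  [4., 6.],
                  [6., 9.],
                  [7., 8.]])
    Yc = Y - Y.mean(axis=0)    # deviations from each outcome's mean
    T = Yc.T @ Yc              # total SSCP matrix
    print(T)                   # diagonal: sums of squares; off-diagonal: cross-products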
      Exploratory factor analysis - summary of chapter 18 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 18
      Exploratory factor analysis

      In factor analysis, we take a lot of information (variables) and a computer effortlessly reduces this into a simple message (fewer variables).


      When to use factor analysis

      Latent variable: something that cannot be accessed directly.

We measure a latent variable indirectly, through observable measures that are driven by the same underlying variable.

      Factor analysis and principal component analysis (PCA) are techniques for identifying clusters of variables.
      Three main uses:

      • To understand the structure of a set of variables
      • to construct a questionnaire to measure an underlying variable
      • to reduce a data set to a more manageable size while retaining as much of the original information as possible.

      Factors and components

      If we measure several variables, or ask someone several questions about themselves, the correlation between each pair of variables can be arranged in a table.

      • this table is sometimes called the R-matrix.

      Factor analysis attempts to achieve parsimony by explaining the maximum amount of common variance in a correlation matrix using the smallest number of explanatory constructs.
      Explanatory constructs are known as latent variables (or factors) and they represent clusters of variables that correlate highly with each other.

      PCA differs in that it tries to explain the maximum amount of total variance in a correlation matrix by transforming the original variables into linear components.

      Factor analysis and PCA both aim to reduce the R matrix into a smaller set of dimensions.

• in factor analysis these dimensions, or factors, are estimated from the data and are believed to reflect constructs that can’t be measured directly.
      • PCA transforms the data into a set of linear components. It doesn’t estimate unmeasured variables, it just transforms measured ones.

      Graphical representation

Factors and components can be visualized as the axes of a graph along which we plot variables.
      The coordinates of variables along each axis represent the strength of relationship between that variable and each factor.
      In an ideal world a variable will have a large coordinate for one of the axes and small coordinates for any others.

      • this scenario indicates that this particular variable is related to only one factor.
• variables that have large coordinates on the same axis are assumed to measure different aspects of some common underlying dimension.

      Factor loading: the coordinate of a variable along a classification axis.

      If we square the factor loading for a variable we get a measure of its substantive importance to a factor.

      Categorical outcomes: chi-square and loglinear analysis - summary of chapter 19 of Statistics by A. Field


      Statistics
      Chapter 19
      Categorical outcomes: chi-square and loglinear analysis

      Analysing categorical data

      Sometimes we want to predict categorical outcome variables. We want to predict into which category an entity falls.


      Associations between two categorical variables

      With categorical variables we can’t use the mean or any similar statistic because the mean of a categorical variable is meaningless: the numeric values you attach to different categories are arbitrary, and the mean of those numeric values will depend on how many members each category has.

      When we’ve measured only categorical variables, we analyse the number of things that fall into each combination of categories (the frequencies).

      Pearson’s chi-square test

To see whether there’s a relationship between two categorical variables we can use Pearson’s chi-square test.
      This statistic is based on the simple idea of comparing the frequencies you observe in certain categories to the frequencies you might expect to get in those categories by chance.

      X2 = Σ(observedij-modelij)2 / modelij

      i represents the rows in the contingency table
      j represents the columns in the contingency table.

      As model we use ‘expected frequencies’.

To adjust for inequalities, we calculate expected frequencies for each cell in the table using the column and row totals for that cell.
      By doing so we factor in the total number of observations that could have contributed to that cell.

      Modelij = Eij = (row totali x column totalj) / n

      X2 has a distribution with known properties called the chi-square distribution. This has a shape determined by the degrees of freedom: (r-1)(c-1)

      r = the number of rows

      c = the number of columns
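A worked sketch on a hypothetical 2 x 2 contingency table:

    import numpy as np

    observed = np.array([[28, 48],
                         [35, 15]])
    expected = observed.sum(axis=1, keepdims=True) * observed.sum(axis=0, keepdims=True) / observed.sum()
    chi_sq = np.sum((observed - expected) ** 2 / expected)
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    print(chi_sq, df)   # scipy.stats.chi2_contingency(observed, correction=False) gives the same statistic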

Fisher’s exact test

      The chi-square statistic has a sampling distribution that is only approximately a chi-square distribution.
      The larger the sample is, the better this approximation becomes. In large samples the approximation is good enough not to worry about the fact that it is an approximation.
      In small samples, the approximation is not good enough, making significance tests of the chi-square statistic inaccurate.

Fisher’s exact test: a way to compute the exact probability of the chi-square statistic in small samples.

      The likelihood ratio

      An alternative to Pearson’s chi-square.
      Based on maximum-likelihood theory.

      General idea: you collect some data and create a model for which the probability of obtaining the observed set of data is maximized, then you compare this model to the probability of obtaining those data under the null hypothesis.
The resulting statistic is based on comparing observed frequencies with those predicted by the model.
WSRt using SPSS, manual for tests in the third block of the second year of psychology at the UvA


Here is a short explanation of how to do tests in SPSS. These are the tests needed for the third block of WSRt in the second year of psychology at the UvA.


      Correlation analysis (two continuous variables)

      1. Open the data
      2. Go to analyse, correlate, bivariate
      3. Place the variables of which you want to know the correlation under ‘variables’
      4. Click on ‘paste’ and run the syntax

      Partial correlation (three continuous variables and you want to know the correlation between two variables, corrected for a third variable)

      1. Open the data
      2. Go to analyse, correlate, partial
      3. Place the variable of which you want to know the correlation under ‘variables’
      4. Place the variable for which you want to control under ‘controlling for’
      5. Click on ‘options’
        Select ‘zero-order correlations’ (this is the correlation without controlling for one variable)
      6. Click on ‘continue’
      7. Click on ‘paste’ and run the syntax

      Multiple regression analysis

      1. Open the data
      2. Go to analyse, regression, linear
      3. Place the dependent variable under ‘dependent’
      4. Place the independent variables under ‘independent’
        If you want to run more models, you can put the first variable under ‘independent’, click on ‘next’ and put the next variable under ‘independent’ (this way you can compare the models)
      5. Click on ‘statistics’ and select:
        Model fit
        R squared change (if you have multiple models)
        Descriptives
        Part and partial correlations
        Collinearity diagnostics
      6. Click on ‘plots’
        Put ZPRED under Y
        Put ZRESID under X
        (This is for testing homoscedasticity)
      7. Click on ‘save’ and select:
        Unstandardised
        (for expected values)
        Mahalanobis
        Cook’s
        Leverage values
        (for outliers)
      8. Click on paste and run the syntax

      Principal component analysis

      1. Open the data
      2. Go to analyse, dimension-reduction, Factor
      3. Put the items which you want to analyse under ‘variables’
      4. Click on ‘descriptives’ and select:
        Univariate descriptives
        Initial solution
        Coefficients
        Significance levels
        Anti-image (for assumptions)
        KMO and Bartlett’s test of sphericity (also for assumptions)
5. Click on ‘extraction’
  Choose principal component analysis
  Select:
  Scree plot
  Choose eigenvalues greater than 1
Everything you need for the course WSRt of the second year of Psychology at the UvA


This magazine contains all the summaries you need for the course WSRt in the second year of psychology at the UvA.

      Categorical outcomes: logistic regression - summary of (part of) chapter 20 of Statistics by A. Field

      Categorical outcomes: logistic regression - summary of (part of) chapter 20 of Statistics by A. Field

      Image

      Discovering statistics using IBM SPSS statistics
      Chapter 20
      Categorical outcomes: logistic regression

      This summary contains the information from chapter 20.8 and forward, the rest of the chapter is not necessary for the course.


      What is logistic regression?

      Logistic regression is a model for predicting categorical outcomes from categorical and continuous predictors.

      A binary logistic regression is when we’re trying to predict membership of only two categories.
      Multinominal is when we want to predict membership of more than two categories.

      Theory of logistic regression

      The linear model can be expressed as: Yi = b0 + b1Xi + errori

      b0 is the value of the outcome when the predictors are zero (the intercept).
      The bs quantify the relationship between each predictor and outcome.
      X is the value of each predictor variable.

      One of the assumptions of the linear model is that the relationship between the predictors and outcome is linear.
      When the outcome variable is categorical, this assumption is violated.
      One way to solve this problem is to transform the data using the logarithmic transformation, where you can express a non-linear relationship in a linear way.

      In logistic regression, we predict the probability of Y occurring, P(Y) from known (logtransformed) values of X1 (or Xs).
      The logistic regression model with one predictor is:
      P(Y) = 1/(1+e –(b0 +b1X1i))
      The value of the model will lie between 1 and 0.

      Testing assumptions

      You need to test for

      • Linearity of the logit
        You need to check that each continuous variable is linearly related to the log of the outcome variable.
        If this is significant, it indicates that the main effect has violated the assumption of linearity of the logic.
      • Multicollinearity
        This has a biasing effect

      Predicting several categories: multinomial logistic regression

      Multinomial logistic regression predicts membership of more than two categories.
      The model breaks the outcome variable into a series of comparisons between two categories.
      In practice, you have to set a baseline outcome category.

      Access: 
      JoHo members

Evidence-based working in clinical practice

      Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care - summary of an article in American psychologist

      Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care - summary of an article in American psychologist

      Image

      Evidence-based treatment and practice: New opportunities to bridge clinical research and practice, enhance the knowledge base, and improve patient care
      American Psychologist, 63, 146-159.


      Introduction  

      A central issue is the extent to which findings from research can be applied to clinical practice.

Empirically supported or evidence-based treatment (EBT) refers to interventions or techniques that have produced therapeutic change in controlled trials. Evidence-based practice (EBP) is a broader term and refers to clinical practice that is informed by evidence about interventions, clinical expertise, and patient needs, values, and preferences, and their integration in decision making about individual care.

      Evidence-based treatments and clinical practice: illustrative concerns

      Concerns about evidence-based treatments

A concern about EBTs is that key conditions and characteristics of treatment research depart markedly from those of clinical practice, which calls into question how and whether the results generalize to practice.

Another concern about research in psychotherapy pertains to the focus on symptoms and disorders as the primary way of identifying participants and evaluating treatment outcomes. In clinical practice, much of psychotherapy is not about reaching a destination (eliminating symptoms) but about the ride (the process of coping with life). Psychotherapy research rarely addresses this broader focus of coping with multiple stressors and negotiating the difficult shoals of life, both of which are aided by speaking with a trained professional.

There are concerns about the methods of analysis and the results across studies, and whether these are a satisfactory basis for concluding that a treatment is effective or efficacious:

1) Conclusions about treatment that are based on studies showing statistical differences are difficult to translate into effects on the lives of the participants in the study, let alone to generalize to patients seen in practice.
2) The outcome measures in most psychotherapy studies raise fundamental concerns. Changes on rating scales are difficult to translate into changes in everyday life. Many valid and reliable measures of psychotherapy outcome are ‘arbitrary metrics’: we do not know how changes on standardized measures translate to functioning in everyday life.
3) Typically, in a single study, multiple measures are used to evaluate outcome, and only some of these show that the treatment and control conditions are statistically different. An EBT may have support for its effects, but within individual studies and across multiple studies, the results are often mixed.

There are also inherent limitations in the way EBTs are discussed: large segments of the literature are usually grouped together.

A central concern about EBTs involves the generalization of the results.
      A power primer - summary of an article by Cohen (1992)


      A power primer. 
      Cohen (1992)
      Psychological Bulletin

The tables of this article are missing from this summary.

      Abstract

      Effect-size indexes and conventional values for these are given for operationally defined small, medium, and large effects.

      Method

      Statistical power analysis exploits the relationships among the four variables involved in statistical inference.

      • Sample size (N)
• Significance criterion (α)
      • Population effect size (ES)
      • Statistical power

      Each is a function of the other three. It is most useful to determine the N necessary to have a specified power for given α and ES.

      The significance criterion α

α represents the maximum risk of mistakenly rejecting the null hypothesis (committing a Type I error). This is usually .05. The α risk may be defined as one- or two-sided.

      Power

The statistical power of a significance test is the long-term probability, given the population ES, α, and N, of rejecting H0. When the ES is not equal to zero, H0 is false, so failure to reject it also incurs an error (a Type II error). For any given ES, α, and N, its probability of occurring is β. Power is 1 − β, the probability of rejecting a false H0.

Taking the conventional α = .05 and power of .80 (so β = .20), the ratio β:α of the two kinds of risk is .20/.05 = 4:1.

      Sample size

In research planning, the investigator needs to know the N necessary to attain the desired power for the specified α and hypothesized ES. N increases with an increase in the power desired, and with a decrease in the ES and a decrease in α.

      For statistical tests involving two or more groups, N is the necessary size for each group.

      Effect size

      The effect size (ES) is the degree to which the H0 is believed to be false.

      In the Neyman-Pearson method of statistical inference, an alternative hypothesis H1 is counterpoised against H0. The degree to which H0 is false is indexed by the discrepancy between H0 and H1 and is called the ES. Each statistical test has its own ES index.   All the indexes are scale free and continuous, ranging upward from zero. For all, the H0 is that ES = 0.

      To convey the meaning of any given ES index, it is necessary to have some idea of its scale.

      The ES index for the t test of the difference between independent means is d, the difference expressed in units of the within-population standard deviation.
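For illustration, with hypothetical numbers: group means of 10 and 8 and a within-population standard deviation of 4 give d = (10 − 8)/4 = 0.50, a medium effect by Cohen’s conventions.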

      Statistical tests

      Measures of clinical significance - summary of an article by Kraemer et al. (2003)


Measures of clinical significance.
Kraemer, Morgan, Leech, Gliner, Vaske & Harmon (2003)
Journal of the American Academy of Child & Adolescent Psychiatry


      Introduction

Behavioural scientists are interested in answering three questions when examining the relationships between variables:

      • Statistical significance
         Is an observed result real or should it be attributed to chance?
      • Effect size
        If the result is real, how large is it?
      • Clinical or practical significance
        Is the result large enough to be meaningful and useful?

Researchers suggest that using one of three types of effect size measures assists in interpreting clinical significance:

      • r family effect size measures
        The strength of association between variables
      • d family effect size measures
        The magnitude of the difference between treatment and comparison groups
      • Measures of risk potency
        • Odds ratio
        • Risk ratio
        • Relative risk reduction
        • Risk difference
        • Number needed to treat

      Problems with statistical significance

A statistically significant outcome indicates that there is likely to be at least one relationship between the variables. p indicates the probability that an outcome this extreme could happen if the null hypothesis is true. It doesn’t provide information about the strength of the relationship or whether it is meaningful.

It is possible, with a large sample, to have a statistically significant result from a weak relationship between variables. Outcomes with lower p values are sometimes misinterpreted as having stronger effects than those with higher p values.

Non-statistically significant results do not ‘prove’ the null hypothesis. They might simply reflect low power.

The presence or absence of statistical significance does not give information about the size or importance of the outcome. This makes it critical to know the effect size.

      Effect size measures

      The r family

One method of expressing effect sizes is in terms of strength of association. This can be done with statistics such as the Pearson product moment correlation coefficient, r, used when both the independent and the dependent measures are ordered. Such effect sizes vary between −1.0 and +1.0, with 0 representing no effect.

      The d family

      Used when the independent variable is binary (dichotomous) and the dependent variable is ordered.

When comparing two groups, the effect size d can be computed by dividing the difference between the two group means by the pooled within-group standard deviation.
      Analysis of covariance - summary of chapter 13 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 13
      Comparing means adjusted for other predictors (analysis of covariance)


      What is ANCOVA?

      The linear model to compare means can be extended to include one or more continuous variables that predict the outcome (or dependent variable).
      Covariates: the additional predictors.

      ANCOVA: analysis of covariance.

      Reasons to include covariates in ANOVA:

      • To reduce within-group error variance
      • Elimination of confounds

      ANCOVA and the general linear model

      For example:

      Happinessi = b0 + b1Longi + b2Shorti + b3Covariatei + Ɛi

      We can add a covariate as a predictor to the model to test the difference between group means adjusted for the covariate.

      With a covariate present, the b-values represent the differences between the means of each group and the control adjusted for the covariate(s).
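As an illustration, a minimal sketch of syntax that fits such a model, assuming hypothetical variables happiness (outcome), group (factor) and covariate; the EMMEANS line requests the group means adjusted to the mean of the covariate:

UNIANOVA happiness BY group WITH covariate
  /METHOD=SSTYPE(3)
  /EMMEANS=TABLES(group) WITH(covariate=MEAN)
  /PRINT=PARAMETER
  /DESIGN=covariate group.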

      Assumptions and issues in ANCOVA

      Independence of the covariate and treatment effect

      When the covariate and the experimental effect are not independent, the treatment effect is obscured, spurious treatment effects can arise, and at the very least the interpretation of the ANCOVA is seriously compromised.

      When treatment groups differ on the covariate, putting the covariate into the analysis will not ‘control for’ or ‘balance out’ those differences.
      This problem can be avoided by randomizing participants to experimental groups, or by matching experimental groups on the covariate.

      We can see whether this problem is likely to be an issue by checking whether experimental groups differ on the covariate before fitting the model.
      If they do not significantly differ then we might consider it reasonable to use it as a covariate.

      Homogeneity of regression slopes

When a covariate is used, we look at its overall relationship with the outcome variable; we ignore the group to which a person belongs.
      We assume that this relationship between covariate and outcome variable holds true for all groups of participants: homogeneity of regression slopes.

      There are situations where you might expect regression slopes to differ across groups and that variability may be interesting.
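One common check of this assumption (a sketch with the same hypothetical variable names as above) is to refit the model with the group × covariate interaction; a significant interaction term suggests the slopes differ across groups:

UNIANOVA happiness BY group WITH covariate
  /METHOD=SSTYPE(3)
  /DESIGN=covariate group covariate*group.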

      What to do when assumptions are violated

      • bootstrap for the model parameters
      • post hoc tests

      But bootstrap won’t help for the F-tests.

      There is a robust variant of ANCOVA.

      Interpreting ANCOVA

      The main analysis

      The format of the ANOVA table is largely the same as without the covariate, except that there is an additional

      Mixed designs - summary of chapter 16 of Statistics by A. Field (5th edition)


      Statistics
      Chapter 16
      Mixed designs


      Mixed designs

      Situations where we combine repeated-measures and independent designs.

      Mixed designs: when a design includes some independent variables that were measured using different entities and others that used repeated measures.
      A mixed design requires at least two independent variables.

Because adding independent variables simply adds predictors to the linear model, you can have virtually any number of independent variables if your sample size is big enough.

      We’re still essentially using the linear model.
Because there are repeated measures involved, people typically use an ANOVA-style model: mixed ANOVA.

      Assumptions in mixed designs

      All the sources of potential bias in chapter 6 apply.

      • homogeneity of variance
      • sphericity

      You can apply the Greenhouse-Geisser correction and forget about sphericity.

      Mixed designs

• Mixed designs compare several means when there are two or more independent variables, and at least one of them has been measured using the same entities and at least one other has been measured using different entities.
• Correct for deviations from sphericity for the repeated-measures variable(s) by routinely interpreting the Greenhouse-Geisser corrected effects.
• The table labelled Tests of Within-Subjects Effects shows the F-statistic(s) for any repeated-measures variables and all of the interaction effects. For each effect, read the row labelled Greenhouse-Geisser or Huynh-Feldt. If the value in the Sig column is less than 0.05 then the means are significantly different.
• The table labelled Tests of Between-Subjects Effects shows the F-statistic(s) for any between-group variables. If the value in the Sig column is less than 0.05 then the means of the groups are significantly different.
• Break down the main effects and interaction terms using contrasts. These contrasts appear in the table labelled Tests of Within-Subjects Contrasts. Again, look at the column labelled Sig.
• Look at the means, or draw graphs, to help you interpret contrasts (a syntax sketch for such a design follows below).
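For a concrete (hypothetical) case with one three-level repeated-measures factor and one between-group factor, the pasted syntax would look roughly like this:

GLM time1 time2 time3 BY group
  /WSFACTOR=time 3 Polynomial
  /METHOD=SSTYPE(3)
  /PRINT=DESCRIPTIVE ETASQ
  /WSDESIGN=time
  /DESIGN=group.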

      Calculating effect sizes

      Effect sizes are more useful when they summarize a focused effect.

      A straightforward approach is to calculate effect sizes for your contrasts.

Moderation, mediation, and multi-category predictors - summary of chapter 11 of Statistics by A. Field (5th edition)

      Statistics
      Chapter 11
      Moderation, mediation, and multi-category predictors


      Moderation: interactions in the linear model

      The conceptual model

Moderation: when a statistical model includes the combined effect of two or more predictor variables on an outcome.
      This is in statistical terms an interaction effect.

      A moderator variable: one variable that affects the relationship between two others.
      Can be continuous or categorical.
We can explore this by comparing the slope of the regression plane for X at low and high levels of Y.

      The statistical model

Conceptually, moderation is represented in the statistical model by predicting the outcome from the predictor variable, the proposed moderator, and the interaction of the two.
It is the interaction effect that tells us whether moderation has occurred, but we must include the predictor and moderator for the interaction term to be valid.

      Outcomei = (model) + errori

      or

Yi = (b0 + b1X1i + b2X2i + … + bnXni) + Ɛi

      To add variables to a linear model we literally just add them in and assign them a parameter (b).
      Therefore, if we had two predictors labelled A and B, a model that tests for moderation would be expressed as:

      Yi = (b0 + b1Ai + b2Bi + b3ABi) + Ɛi

The interaction term is ABi, the two predictors multiplied together.

      Centring variables

      When an interaction term is included in the model the b parameters have a specific meaning: for the individual predictors they represent the regression of the outcome on that predictor when the other predictor is zero.

But there are situations where it makes no sense for a predictor to have a score of zero, and then the interaction term makes the bs for the main predictors uninterpretable.
For this reason, it is common to transform the predictors using grand mean centring.
Centring: the process of transforming a variable into deviations around a fixed point.
This fixed point can be any value that you choose, but typically it’s the grand mean.
Grand mean centring for a given variable is achieved by taking each score and subtracting from it the mean of all scores (for that variable), as in the sketch below.
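A minimal sketch in SPSS syntax, assuming two hypothetical predictors a and b whose grand means (say 4.25 and 2.10, read off from DESCRIPTIVES output) are filled in by hand:

COMPUTE a_c = a - 4.25.
COMPUTE b_c = b - 2.10.
COMPUTE ab_c = a_c * b_c.
EXECUTE.
REGRESSION
  /DEPENDENT outcome
  /METHOD=ENTER a_c b_c ab_c.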

Centring the predictors has no effect on the b for the highest-order predictor, but will affect the bs for the lower-order predictors.
Order refers to how many variables are involved.
When we centre variables, the bs represent the effect of the predictor when the other predictor is at its mean value.

      Centring is important when your model contains an interaction term because it makes the bs for lower-order effects interpretable.
      There are good reasons for not caring about the lower-order effects when the higher-order

Voorbij het oordeel van de dodo (Beyond the dodo's verdict) - summary of an article by Huibers (2015)

Voorbij het oordeel van de dodo
Huibers (2015)
Tijdschrift voor Psychotherapie
DOI: 10.1007/s12485-015-0027-6

Introduction

There are three routes to increasing the effectiveness of psychotherapy: 1) devising new therapies, 2) gaining insight into the underlying working mechanisms, 3) investigating the profile of patients who benefit from therapy.

The great psychotherapy debate

Little is known about the working mechanisms of psychotherapy.

Mechanisms and mediation in research

A mediator is the term for the working mechanism or change process by which we want to explain the effect of the therapy.

The temporal relation concerns which one precedes the other. To determine this, we often have to measure both the mediator and the outcome variable during treatment; that way you can discover which one comes first. If the mediator precedes the outcome variable, you still do not know for certain whether the mediator really is the mechanism of change: the change in the mediator may in turn be caused by another factor.

In research into working mechanisms, two elements are important: 1) statistical mediation, the effect of the treatment on the outcome runs through the mediator; 2) the temporal relation, the change in the mediator must occur before the change in the outcome variable.

Which therapy works for which patient?

A moderator points to different outcomes in different treatments. It is an interesting factor for determining for which people a particular kind of therapy works.

      An introduction to Meta-analysis - summary of chapter 1, 2, 3, 4, 8, 10, 11, 12, 13 and 20


      An introduction to Meta-analysis


      Chapter 1 How a meta-analysis works

      Individual studies

      Effect size

      The effect size is a value which reflects the magnitude of the treatment effect or the strength of a relationship between two variables. It can represent any relationship between two variables. It is the unit of currency in a meta-analysis.

      You compute the effect size for each study, and then work with the effect sizes to assess the consistency of the effect across studies and to compute a summary effect.

      In graphs, the effect size for each study is represented by a square, with the location of the square representing both the direction and magnitude of the effect.

      Precision

      In a schematic, the effect size for each study is bounded by a confidence interval. This reflects the precision with which the effect size has been estimated in that study.

      Study weights

In a schematic, the size of each square reflects the weight that is assigned to the corresponding study. There is a relationship between a study’s precision and that study’s weight in the analysis: studies with good precision are assigned more weight. Precision is primarily driven by sample size.

      Other elements can be used as well to assign weights.

      p-values

For each study, a p-value for a test of the null is shown. The p-value will fall under .05 only if the 95% confidence interval does not include the null value.

      The summary effect

      Typically, you report the effect size of a summary effect, as well as a measure of precision and a p-value.

      Effect size

In a plot, the summary effect is shown on the bottom line. It is the weighted mean of the individual effects. The mechanism used to assign the weights depends on our assumptions about the distribution of effect sizes from which the studies were sampled: 1) Fixed-effect model: all studies in the analysis are assumed to share the same true effect size, and the summary effect is the estimate of this common effect size. 2) Random-effects model: the true effect sizes are assumed to vary from study to study, and the summary effect is our estimate of the mean of the distribution of effect sizes.

      Precision

      The summary effect is represented by a diamond. The location

Meta-analysis in mental health research - summary of part of an article by Cuijpers (2016)

Meta-analysis in mental health research
Cuijpers, P. (2016)


      Advantages and problems of meta-analysis

Advantages of meta-analysis are: 1) The statistical power to detect effects is higher than for individual studies; this makes a more precise and accurate estimation of true effects possible. 2) It is possible to explore inconsistencies between studies and to examine whether the effects of the intervention differ among specific subgroups of studies. 3) It is possible to estimate the number of studies that were conducted but not published.

Problems with meta-analyses are: 1) Garbage in, garbage out: they can never be better than the studies they summarize. 2) They combine apples and oranges: there are always differences between studies. 3) The file drawer problem: not all relevant studies are published, and unpublished studies are often not included in meta-analyses. 4) Researcher allegiance: the agenda-driven bias of researchers who conduct the meta-analyses.

      Publication bias

Publication bias is the problem that not all the studies that are conducted in a certain area are actually published. Publication of studies which show significant and large effects of interventions is favoured. This can lead to an over-estimation of the effect.

There are several other types of reporting bias: 1) Time lag bias: some studies are published later than others, depending on the nature and direction of the results. 2) Outcome reporting bias. 3) Language bias: when studies in another language are not identified and these studies differ in terms of the nature and direction of the results.

      Testing for publication bias with indirect methods: the funnel plot

      In some cases it is possible to examine publication bias directly.

If it is not possible to compare published with unpublished trials, it is possible to get an indirect impression of whether there is publication bias. These estimates are based on the assumption that large studies make a more precise estimate of the effect size. Random variation in effect sizes is larger in studies with few participants, a difference that can be represented graphically in a ‘funnel plot’. In this plot the effect size is plotted on the horizontal axis and the size of the study on the vertical axis. If the effect sizes differ from the mean effect size only by chance, they should deviate in both directions, both positive and negative.

      There are several tests for the asymmetry of the funnel plot.

      There is also a method to impute the missing studies and estimate the effect size after imputation of these missing studies.

Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyse univariate data - summary of an article by Maric, de Haan, Hogendoorn, Wolters and Huizenga (2015)

Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyse univariate data
Maric, de Haan, Hogendoorn, Wolters and Huizenga (2015)


      Abstract

      Single-case experimental designs are useful to investigate individual client progress. This can help the clinician to investigate whether an intervention works as compared with a baseline period or another intervention, and whether symptom improvement is clinically significant.

      Introduction

In single-case experimental designs (SCEDs), a single participant is repeatedly assessed on one or multiple indices (symptoms) during various phases. Advantages of this are: 1) It can be used to test novel interventions. 2) In heterogeneous populations, it can be the only way to investigate treatment outcomes. 3) It offers the possibility to systematically document the knowledge of researchers and clinicians.

      Method

The AB design consists of two phases: a baseline and a treatment phase.

The AB method can be conceptualized as an interrupted time series. To analyse the differences between baseline and treatment, two requirements must be fulfilled: 1) The overall pattern in the time series has to be modelled adequately; an adequate model consists of two linear functions, one for the baseline and one for the treatment, each described by an intercept and a slope. 2) Potential correlations between residuals must be modelled adequately, that is, whatever remains after the overall pattern has been accounted for. Autocorrelation is the correlation between residuals of the observations: the residuals are not independent. If residuals are correlated, the correlations are likely to decrease with increasing separation between time points.

      Analyses investigating treatment efficacy

Y(i) = b0 + b1*phase(i) + b2*time_in_phase(i) + b3*phase(i)*time_in_phase(i) + error(i)

      Y(i) is the outcome variable at time point i.
Phase(i) denotes the phase in which time point i is contained (0 for baseline and 1 for treatment).
Time_in_phase(i) is the time point within each phase.
      Error (i) is the residual at point i.
      b0 is the baseline intercept
      b1 is the treatment-baseline difference in intercepts
      b2 is the baseline slope
      b3 is the treatment-baseline difference in slopes

      Intercept differences between phases can be assessed by testing whether b1 differs from 0. Slope differences can be assessed by testing b3.

b0 and b1 refer to symptom scores when time_in_phase is zero; what this means depends on the coding of time_in_phase.
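One way to fit such a model in SPSS is with MIXED, using an AR(1) structure for the autocorrelated residuals. This is a hedged sketch, not necessarily the article's exact procedure; the variables y, phase, time_in_phase, time (measurement index) and id are hypothetical:

MIXED y WITH phase time_in_phase
  /FIXED=phase time_in_phase phase*time_in_phase
  /REPEATED=time | SUBJECT(id) COVTYPE(AR1)
  /PRINT=SOLUTION TESTCOV.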

      It is best to both describe the general trend adequately and to account for remaining correlations. The

N=1 studies in onderzoek en praktijk (N=1 studies in research and practice) - summary of an article from De Psycholoog

N=1 studies in onderzoek en praktijk.
De Psycholoog, 3, 11-20


Introduction

By measuring an individual frequently, an N=1 study arises. The information can be used to set up the treatment optimally and to obtain maximum results.

Characteristics

N=1 studies are studies in which the severity of the complaints is measured in one client during the baseline and the treatment. This can yield information on whether and when the treatment works.

Characteristics of N=1 studies are: 1) It is a study of one participant. It is possible to carry out a series of N=1 studies in which several participants are investigated with the same research question; the results found are then combined. 2) Measurements are taken regularly and reliably. 3) There are clearly defined phases: phase A is a baseline phase, phase B is a phase in which treatment takes place. In an ABC design a client receives two different treatments (B and C).

Efficacy

N=1 studies can be used to test new treatment procedures and to test the effectiveness of existing treatments in practice.

Working mechanisms

In N=1 studies, change processes can be investigated. Working mechanisms are processes through which the treatment achieves its effect.

Research into working mechanisms has to meet a number of conditions: 1) the participant must have received the treatment, 2) improvement in the investigated complaints must have occurred during the treatment, 3) it must be shown that change occurred on the proposed working mechanism, 4) the change on the working mechanism must precede the treatment effect.

Bridging research and practice

The N=1 methodology fits in well with clinical practice.

A disadvantage of the N=1 methodology is that data obtained from an N=1 study have lower evidential value than data obtained from an RCT.

N=1 studies and RCT studies can complement each other.

Advantages of N=1 studies are: 1) The study setting has to meet fewer methodological requirements; the client is their own control, and it can be investigated in heterogeneous groups. 2) Analyses are carried out at the individual level, so no information is lost. 3) N=1 studies are carried out on a small scale.

The working mechanisms that emerge in N=1 studies can subsequently be investigated in RCTs.

Routine measurement

Routine Outcome Monitoring (ROM) makes repeated measurement possible. It can be used to collect data on the disorder, the client and treatment outcomes.
      The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting in controlled clinical trials - summary of an article in Psychological bulletin


      The empirical status of empirically supported psychotherapies: Assumptions, findings, and reporting in controlled clinical trials
      Psychological Bulletin, 130

      The assumptions underlying ESTs

      ESTs are empirically supported therapies.

      ESTs are typically designed for a single Axis I disorder, and patients are screened to maximize homogeneity of diagnosis and minimize co-occurring conditions that could increase variability of treatment response. Treatments are manualized and of brief and fixed duration to minimize within-group variability. Outcome assessment focuses primarily on the symptom that is the focus of the study.

RCT methodologies to validate ESTs require a set of additional assumptions that are neither well validated nor applicable to most disorders, namely: 1) psychopathology is highly malleable, 2) most patients can be treated for a single problem or disorder, 3) psychiatric disorders can be treated independently of personality factors, 4) experimental methods provide a gold standard for identifying useful psychotherapeutic packages.

      Psychological processes are highly malleable

      A substantial body of data shows that, with or without treatment, relapse rates for all but a handful of disorders are high. There is also a dose-response relationship. Longer treatments are more effective.

      Most psychopathological vulnerabilities are highly resistant to change. They tend to be rooted in personality and temperament. The modal patient treated with brief treatments for most disorders relapses or seeks additional treatment.

      Most patients have one primary problem or can be treated as if they do

      In RCTs, including patients with substantial comorbidities would vastly increase the sample size necessary to detect treatment differences if comorbidity bears any systematic relation to outcome.

      Three issues are: 1) The empirical and pragmatic limits imposed by reliance on DSM diagnoses 2) The problem of comorbidity 3) the way the different functions of assessing comorbidity in controlled trials and clinical practice may place limits on generalizability

      The pragmatics of DSM-IV diagnosis

      Three costs of the DSM are 1) the diagnoses are themselves created by committee consensus on the basis of the available evidence rather than by strictly empirical methods 2) The assumption that patients typically present with symptoms of a specific Axis I diagnosis and identify

Evidence-based working in clinical practice - UvA

      Selected contributions for Understanding logistic regression

      What is logistic regression? – Chapter 15



      15.1 What are the basics of logistic regression?

A logistic regression model is a model with a binary response variable (like 'agree' or 'don't agree'). It's also possible for logistic regression models to have ordinal or nominal response variables. The mean is the proportion of responses that are 1. The linear probability model is P(y=1) = α + βx. This model is often too simple; a more extended version is: P(y=1) = e^(α + βx) / (1 + e^(α + βx)).

The logarithm can be calculated using software. The odds are: P(y=1)/[1-P(y=1)]. The log of the odds, or logistic transformation (abbreviated as logit), gives the logistic regression model: logit[P(y=1)] = α + βx.

To find the outcome for a certain value of a predictor, the inverse formula is used: P(y=1) = e^(α + βx) / (1 + e^(α + βx)). Raising e to a certain power is taking the antilog of that number.

A straight line is drawn next to the curve of a logistic graph to analyze it. The slope of the curve is steepest, and equal to β/4, where P(y=1) = ½. For logistic regression the maximum likelihood method is used instead of the least squares method. The model expressed in odds is: P(y=1)/[1-P(y=1)] = e^(α + βx).

The estimated odds are e^(a + bx), with a and b the estimates of α and β. With this the odds ratio can be calculated: for each one-unit increase in x, the odds multiply by e^b.

There are two possibilities to present the data. For ungrouped data a normal contingency table suffices. For grouped data a row contains data for every count in a cell, like just one row with the number of subjects that agreed, followed by the total number of subjects.

An alternative to the logit is the probit. This link assumes a hidden, underlying continuous variable y* that is 1 above a certain value T (threshold) and 0 below T. Because y* is hidden, it's called a latent variable. However, it can be used to make a probit model: probit[P(y=1)] = α + βx.
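Such a probit model can be fitted in SPSS with GENLIN (a sketch, with hypothetical variables y and x):

GENLIN y (REFERENCE=FIRST) WITH x
  /MODEL x DISTRIBUTION=BINOMIAL LINK=PROBIT.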

      Logistic regression with repeated measures and random effects is analyzed with a linear mixed model: logit[P(yij = 1)] = α + βxij + si.

      15.2 What does multiple logistic regression look like?

      The multiple logistic regression model is: logit[P(y = 1)] = α + β1x1 + … + βpxp. The further βi is from 0, the stronger

      MVDA - logistic regression analysis


      Week 4: Logistic Regression Analysis (LRA)

LRA can be used when the dependent variable (Y) is binary and the predictors (X1, X2) are interval level (or binary).

The research question is: can Y be predicted from X1 and/or X2?

• Example: Can passing (1) or failing (0) the MVDA exam (Y) be predicted from the student’s grade on the psychometrics exam (X)?

Is there a significant association between grade and passing/failing the exam? (Report the test statistic, df, and p value.)

Here, we look in the Variables in the Equation table at the Wald statistic for grade. If it is significant, then yes, there is a significant association. An example of how this can be reported:

Yes, Wald χ2(1) = 7.090, p = .006
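The Variables in the Equation table comes from a binary logistic regression; a sketch of the syntax, assuming hypothetical variable names pass (0/1) and grade:

LOGISTIC REGRESSION VARIABLES pass
  /METHOD=ENTER grade
  /PRINT=CI(95).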

Write down the logistic regression equation

For example:

if the constant B is -4.200
and the grade B is 0.671,

then the equation looks like this:

P(pass) = 1 / (1 + e^-(-4.200 + 0.671·Grade))

For what grade is the probability of passing the MVDA exam equal to the probability of failing the MVDA exam?

Passing = 50%
Failing = 50%

P = 1/2 when e^-(-4.200 + 0.671·Grade) = 1. In order for that term to be 1, -4.200 + 0.671(Grade) has to be equal to 0. This is because e to the power of 0 is 1.

So, -4.200 + 0.671(g) = 0
0.671(g) = 4.200
g = 6.259

Therefore, the grade at which there is an equal chance of passing and failing is 6.259.

Calculate the probabilities and odds of passing for X = 0, 5, 10

X      P                                    Odds (rounded)
0      1/(1+e^4.200) = 0.0148               e^(-4.200) ≈ 0.015
5      1/(1+e^0.845) = 0.3005               e^(-0.845) ≈ 0.430
10     1/(1+e^(-2.510)) = 0.9248            e^(2.510) ≈ 12.3

       

How to calculate the odds ratio?

Example:

X      P         Odds       Odds ratio
1      .0285     .0293      .0574/.0293 = 1.958
2      .0543     .0574      1.958

Therefore, if X increases by 1 unit, the odds are multiplied by 1.958 (the odds ratio, e^b).

       

      What is the odds ratio of X of
