Call to action: Do you have statistical knowledge and skills and do you enjoy helping others while expanding your international network?

 

People who share their statistical knowledge and skills can contact WorldSupporter Statistics for more exposure to a larger audience. Relevant contributions to specific WorldSupporter Statistics Topics are highlighted per topic so that users who are interested in certain statistical topics can broaden their theoretical perspective and international network.

Do you have statistical knowledge and skills and do you enjoy helping others while expanding your international network? Would you like to cooperate with WorldSupporter Statistics? Please send us an e-mail with some basics (Where do you live? What's your (statistical) background? How are you helping others at the moment? And how do you see that in relation to WorldSupporter Statistics?) to info@joho.org - and we will most definitely be in touch.

This content is used in bundle:

Selected contributions for Introduction to Statistics

What are statistical methods? – Chapter 1


1.1 What is statistics and how can you learn it?

Statistics is used more and more often to study the behavior of people, not only in the social sciences but also by companies. Everyone can learn how to use statistics, even without much knowledge of mathematics and even with a fear of statistics. Most important are logical thinking and perseverance.

The first step in using statistical methods is collecting data. Data are collected observations of characteristics of interest, for instance the opinions of 1000 people on whether marijuana should be allowed. Data can be obtained through questionnaires, experiments, observations or existing databases.

But statistics aren't only numbers obtained from data. A broader definition of statistics entails all methods to obtain and analyze data.

1.2 What is the difference between descriptive and inferential statistics?

Before data can be analyzed, a design is made for how to obtain the data. Next, there are two sorts of statistical analyses: descriptive statistics and inferential statistics. Descriptive statistics summarizes the information obtained from a collection of data, so the data are easier to interpret. Inferential statistics makes predictions with the help of data. Which kind of statistics is used depends on the goal of the research (summarize or predict).

To understand the differences better, a number of basic terms are important. The subjects are the entities that are observed in a research study, most often people but sometimes families, schools, cities etc. The population is the whole of subjects that you want to study (for instance foreign students). The sample is a limited number of selected subjects on which you will collect data (for instance 100 foreign students from several universities). The ultimate goal is to learn about the population, but because it's impossible to research the entire population, a sample is made.

Descriptive statistics can be used both when data are available for the entire population and when they are available only for a sample. Inferential statistics applies only to samples, because it makes predictions about something that is still unknown. Hence inferential statistics is defined as making predictions about a population, based on data gathered from a sample.

The goal of statistics is to learn more about the parameter. The parameter is the numerical summary of the population, or the unknown value that can tell something about the ultimate conditions of the whole. So it's not about the sample but about the population. This is why an important part of

What are the main measures and graphs of descriptive statistics? - Chapter 3


3.1 Which tables and graphs display data?

Descriptive statistics serves to create an overview or summary of data. There are two kinds of data, quantitative and categorical, and each has its own descriptive statistics.

To create an overview of categorical data, it's easiest to list the categories together with the frequency for each category. To compare the categories, the relative frequencies are listed too. The relative frequency of a category shows how often a subject falls within this category compared to the whole sample. It can be calculated as a proportion or a percentage. The proportion is the number of observations within a certain category divided by the total number of observations. The percentage is the proportion multiplied by 100. The sum of all proportions should be 1.00; the sum of all percentages should be 100.

Frequencies can be shown using a frequency distribution, a list of all possible values of a variable and the number of observations for each value. A relative frequency distribution also shows the comparisons with the sample.

Example (relative) frequency distribution:

Gender    Frequency    Proportion    Percentage
Male      150          0.43          43%
Female    200          0.57          57%
Total     350 (= n)    1.00          100%
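The proportions and percentages in the example can be reproduced with a few lines of Python. This is only a minimal sketch: the counts come from the example, while the function name `relative_frequencies` and the print layout are illustrative choices.

```python
# Frequencies taken from the example (relative) frequency distribution.
frequencies = {"Male": 150, "Female": 200}

n = sum(frequencies.values())  # total number of observations


def relative_frequencies(counts):
    """Return {category: (proportion, percentage)} for a dict of counts."""
    total = sum(counts.values())
    return {
        category: (count / total, 100 * count / total)
        for category, count in counts.items()
    }


table = relative_frequencies(frequencies)

# Print one row per category: frequency, proportion, percentage.
for category, (proportion, percentage) in table.items():
    print(f"{category:<8}{frequencies[category]:>5}{proportion:>8.2f}{percentage:>7.0f}%")
```

The proportions add up to 1 and the percentages to 100, which is a quick check that no category was forgotten.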

Aside from tables, other visual displays are also used, such as bar graphs, pie charts, histograms and stem-and-leaf plots.

A bar graph is used for categorical variables and uses a bar for each category. The bars are separated to indicate that the graph doesn't display quantitative variables but categorical variables.

A pie chart is also used for categorical variables. Each slice represents a category. When the values are close together, bar graphs show the differences more clearly than pie charts.

Frequency distributions and other visual displays are also used for quantitative variables. In that case, the categories are replaced by intervals. Each interval has a frequency, a proportion and a percentage.

A histogram is a graph of the frequency distribution for a quantitative variable. Each value is represented by a bar, except when there are many values, then

Startmagazine: Introduction to Statistics

Introduction to Statistics: in short

  • Statistics comprises the arithmetic procedures to organize, sum up and interpret information. By means of statistics you can note information in a compact manner.
  • The aim of statistics is twofold: 1) organizing and summing up of information, in order to publish research results and 2) answering research questions, which are formed by

Selected contributions for Data: distributions, connections and gatherings

Which kinds of samples and variables are possible? – Chapter 2


2.1 Which kinds of variables can be measured?

All characteristics of a subject that can be measured are variables. These characteristics can vary between different subjects within a sample or within a population (like income, sex, opinion). Variables are used to capture the variability of a value, for example the number of beers consumed per week by students. The values of a variable constitute the measurement scale. Several measurement scales, or ways to distinguish variables, are possible.

The most important divide is that between quantitative and categorical variables. Quantitative variables are measured in numerical values, such as age, number of brothers and sisters, income. Categorical variables (also called qualitative variables) are measured in categories, such as sex, marital status, religion. The measurement scales are tied to statistical analyses: for quantitative variables it is possible to calculate the mean (e.g. the average age), but for categorical variables this isn't possible (e.g. there is no average sex).
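The link between the kind of variable and the possible analysis can be illustrated with Python's standard `statistics` module. The data below are hypothetical values made up for this sketch; the point is only that a mean fits a quantitative variable, while a categorical variable is summarized by its most frequent category (the mode).

```python
from statistics import mean, mode

# Hypothetical sample of five subjects (values invented for illustration).
ages = [19, 22, 22, 25, 31]  # quantitative variable
marital_status = ["single", "married", "single", "divorced", "single"]  # categorical

# A mean is meaningful for a quantitative variable.
average_age = mean(ages)

# For a categorical variable there is no mean; the mode is used instead.
most_common_status = mode(marital_status)

print(f"Average age: {average_age}")
print(f"Most common marital status: {most_common_status}")
```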

There are also four measurement scales: nominal, ordinal, interval and ratio. Categorical variables have nominal or ordinal scales.

The nominal scale is purely descriptive. For instance with sex as a variable, the possible values are man and woman. There is no order or hierarchy, one value isn't higher than the other.

The ordinal scale, on the other hand, assumes a certain order. Take happiness, for instance. If the possible values are unhappy, considerably unhappy, neutral, considerably happy and ecstatic, then there is a certain order. If a respondent indicates to be neutral, this is happier than considerably unhappy, which in turn is happier than unhappy. What matters is that the distances between the values cannot be measured; this is the difference between ordinal and interval.

Quantitative variables have an interval or ratio scale. Interval means that there are measurable differences between the values, for instance temperature in Celsius. There is an order (30 degrees is more than 20) and the difference is clearly measurable and consistent.

The difference between interval and ratio is that an interval scale has no true zero point (0 degrees Celsius does not mean that there is no temperature), whereas a ratio scale does. So the ratio scale has numerical values, with a certain order, with measurable differences and with a true zero. Examples are percentage or income.

Furthermore there are discrete and continuous variables. A variable is discrete when its possible values are a limited set of separate numbers. A variable is continuous when it can take any value within an interval. For instance the number of brothers and sisters is discrete, because it's not possible to have 2.43 brothers or sisters. And for instance

What role do probability distributions play in statistical inference? – Chapter 4


4.1 What are the basic rules of probability?

Randomization is important for collecting data: the possible observations are known in advance, but it is not yet known which one will occur. What happens depends on probability. The probability of an outcome is the proportion of times that outcome occurs in a long sequence of similar observations. The length of the sequence matters: the longer the sequence, the closer the observed proportion comes to the probability, so the sample proportion becomes more like the population proportion. Probabilities can also be expressed as percentages (such as 70%) instead of proportions (such as 0.7). A specific branch of statistics, called Bayesian statistics, deals with subjective probabilities. However, most of statistics is about regular probabilities.
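The idea of a probability as a long-run proportion can be made concrete with a small simulation. The sketch below is an illustration under assumptions: the true probability of 0.7, the sequence lengths and the function name `long_run_proportion` are all chosen for this example and do not come from the text.

```python
import random


def long_run_proportion(p_true, n_observations, seed=0):
    """Simulate n observations that succeed with probability p_true
    and return the observed proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = sum(rng.random() < p_true for _ in range(n_observations))
    return successes / n_observations


# The observed proportion tends to settle near 0.7 as the sequence grows.
for n in (10, 1_000, 100_000):
    print(n, long_run_proportion(0.7, n))
```

With short sequences the proportion can be far from 0.7; with long sequences it typically lies very close to it.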

A probability is written as P(A), where P stands for probability and A is an outcome. If only two outcomes, A and B, are possible and they exclude each other, then the probability of B is P(B) = 1 - P(A).

Imagine research about people's favorite colors, asking whether these are mostly red and blue. Again the assumption is made that the outcomes exclude each other without overlapping. The probability that someone's favorite color is red (A) or blue (B) is P(A or B) = P(A) + P(B).

Next, imagine research that encompasses multiple questions, for instance how many people are married and have kids. Then you multiply the probability that someone is married (A) by the probability that someone has kids given that they are married (B given A). The formula for this is: P(A and B) = P(A) * P(B given A). Because the probability of B depends on A, P(B given A) is called a conditional probability.

Now, imagine researching multiple outcomes that are not connected. The probability that one random person likes to wear sweaters (A) and that another random person likes to wear sweaters (B) is P(A and B) = P(A) * P(B). These are independent probabilities.
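The four rules above can be written as one-line functions. This is a minimal sketch; all probabilities used below (0.30 for red, 0.25 for blue, and so on) are hypothetical numbers chosen for illustration, not results from any study.

```python
def complement(p_a):
    """P(B) = 1 - P(A) when A and B are the only two, mutually exclusive outcomes."""
    return 1 - p_a


def either(p_a, p_b):
    """P(A or B) = P(A) + P(B) for mutually exclusive outcomes."""
    return p_a + p_b


def both(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B given A), using a conditional probability."""
    return p_a * p_b_given_a


def both_independent(p_a, p_b):
    """P(A and B) = P(A) * P(B) when A and B are independent."""
    return p_a * p_b


# Hypothetical values for illustration only.
p_red_or_blue = either(0.30, 0.25)                # favorite color is red or blue
p_married_with_kids = both(0.50, 0.60)            # married and has kids
p_two_like_sweaters = both_independent(0.40, 0.40)  # two unrelated people like sweaters

print(p_red_or_blue, p_married_with_kids, p_two_like_sweaters)
```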

4.2 What is the difference in probability distributions for discrete and continuous variables?

A random variable means that the outcome differs for each observation, but mostly this is just referred to as a variable. While a discrete variable has set possible values,

Understanding data: distributions, connections and gatherings

Selected contributions for Understanding logistic regression

What is logistic regression? – Chapter 15


15.1 What are the basics of logistic regression?

A logistic regression model is a model with a binary response variable (like 'agree' or 'don't agree'). It's also possible for logistic regression models to have ordinal or nominal response variables. The mean is the proportion of responses that are 1. The linear probability model is P(y=1) = α + βx. This model is often too simple; a more extended version is:

log[P(y=1)/(1 - P(y=1))] = α + βx

The logarithm can be calculated using software. The odds are: P(y=1)/[1 - P(y=1)]. The log of the odds, or logistic transformation (abbreviated as logit), gives the logistic regression model: logit[P(y=1)] = α + βx.

To find the outcome for a certain value of a predictor, the following formula is used:

P(y=1) = e^(α + βx) / (1 + e^(α + βx))

The e to a certain power is the antilog of that number.
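The relation between probability, odds, logit and antilog can be sketched in Python as follows. The values of `ALPHA` and `BETA` are hypothetical and only serve to show that taking the logit of the predicted probability returns the linear predictor α + βx.

```python
import math


def odds(p):
    """Odds = P(y=1) / [1 - P(y=1)]."""
    return p / (1 - p)


def logit(p):
    """Logit = natural log of the odds."""
    return math.log(odds(p))


def inverse_logit(z):
    """Antilog step: turns a logit z back into a probability."""
    return math.exp(z) / (1 + math.exp(z))


# Hypothetical parameters, chosen for illustration only.
ALPHA = -1.0
BETA = 0.5


def probability(x):
    """P(y=1) for predictor value x under logit[P(y=1)] = ALPHA + BETA * x."""
    return inverse_logit(ALPHA + BETA * x)


p = probability(3.0)
print(p, odds(p), logit(p))  # the logit equals ALPHA + BETA * 3.0
```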

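To interpret β, a straight line is drawn tangent to the curve of the logistic graph. The slope of that tangent is βP(y=1)[1 - P(y=1)], so the curve is steepest where P(y=1) = ½, where the slope equals β/4. For logistic regression the maximum likelihood method is used instead of the least squares method. The model expressed in odds is:

P(y=1)/[1 - P(y=1)] = e^(α + βx) = e^α (e^β)^x

The estimate is obtained by replacing α and β with their sample estimates, written here as a and b:

estimated odds = e^(a + bx) = e^a (e^b)^x

With this the odds ratio can be calculated.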

There are two ways to present the data. For ungrouped data a normal contingency table suffices. For grouped data, a row contains the counts for one cell, such as a single row with the number of subjects that agreed, followed by the total number of subjects.

An alternative to the logit is the probit. This link assumes a hidden, underlying continuous variable y*, such that y = 1 when y* is above a certain threshold T and y = 0 when y* is below T. Because y* is hidden, it's called a latent variable. It can be used to build a probit model: probit[P(y=1)] = α + βx.

Logistic regression with repeated measures and random effects is analyzed with a generalized linear mixed model: logit[P(yij = 1)] = α + βxij + si.

15.2 What does multiple logistic regression look like?

The multiple logistic regression model is: logit[P(y = 1)] = α + β1x1 + … + βpxp. The further βi is from 0, the stronger

Categorical outcomes: logistic regression - summary of (part of) chapter 20 of Statistics by A. Field

Discovering statistics using IBM SPSS statistics
Chapter 20
Categorical outcomes: logistic regression

This summary contains the information from chapter 20.8 and forward, the rest of the chapter is not necessary for the course.


What is logistic regression?

Logistic regression is a model for predicting categorical outcomes from categorical and continuous predictors.

A binary logistic regression is when we’re trying to predict membership of only two categories.
Multinomial logistic regression is when we want to predict membership of more than two categories.

Theory of logistic regression

The linear model can be expressed as: Yi = b0 + b1Xi + errori

b0 is the value of the outcome when the predictors are zero (the intercept).
The bs quantify the relationship between each predictor and outcome.
X is the value of each predictor variable.

One of the assumptions of the linear model is that the relationship between the predictors and outcome is linear.
When the outcome variable is categorical, this assumption is violated.
One way to solve this problem is to transform the data using the logarithmic transformation, which expresses a non-linear relationship in a linear way.

In logistic regression, we predict the probability of Y occurring, P(Y), from known (log-transformed) values of X1 (or Xs).
The logistic regression model with one predictor is:
P(Y) = 1/(1 + e^(-(b0 + b1X1i)))
The value of the model will lie between 0 and 1.

Testing assumptions

You need to test for

  • Linearity of the logit
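    You need to check that each continuous predictor is linearly related to the log odds (logit) of the outcome variable.
    This is tested with the interaction between each predictor and its own log transformation. If this interaction is significant, it indicates that the main effect has violated the assumption of linearity of the logit.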
  • Multicollinearity
    This has a biasing effect

Predicting several categories: multinomial logistic regression

Multinomial logistic regression predicts membership of more than two categories.
The model breaks the outcome variable into a series of comparisons between two categories.
In practice, you have to set a baseline outcome category.

MVDA - logistic regression analysis

Week 4: Logistic Regression Analysis (LRA)

LRA can be used when the dependent variable (Y) is binary and the predictors (X1, X2) interval level (or binary).

The research question is: Can Y be predicted from X1 and/or X2?

  • Example: Can the passing (1) or failing (0) the MVDA exam (Y) be predicted from the student’s grade on the psychometrics exam (X)?

Is there a significant association between grade and passing/failing the exam? (report test statistic, df, and p value)?

Here, we look at the Variables in the Equation table at the Wald of the grade. If it’s significant, then yes there is a significant association. An example of how this can be reported:

Yes, Wald χ2(1) = 7.090, p = .006

Write down the logistic regression equation

For example:

if the B of the constant is -4.200

and the B of the grade is .671,

then the equation looks like this:

P(passing) = 1 / (1 + e^-(-4.200 + .671(Grade)))

For what grade is the probability of passing the MVDA exam equal to the probability of failing the MVDA exam?

Passing = 50%

Failing = 50%

P = 1/2 = 1 / (1 + e^-(-4.200 + .671(Grade)))

For P to equal 1/2, e^-(-4.200 + .671(Grade)) has to be 1, so -4.200 + .671(Grade) has to be equal to 0. This is because e to the power of 0 is 1.

So, -4.200 + 0.671(g)=0

0.671(g)=4.200

g=6.259

Therefore, the grade where there is an equal chance for passing and failing is 6.259.
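The same answer can be checked numerically. The sketch below uses the two B values from the example; the names `B_CONSTANT`, `B_GRADE` and `probability_of_passing` are illustrative. Setting the linear part to zero gives grade = -constant / B.

```python
import math

B_CONSTANT = -4.200  # B of the constant, from the example
B_GRADE = 0.671      # B of the grade, from the example


def probability_of_passing(grade):
    """P(passing) = 1 / (1 + e^-(constant + B * grade))."""
    return 1 / (1 + math.exp(-(B_CONSTANT + B_GRADE * grade)))


# The probability is 1/2 where constant + B * grade = 0.
grade_at_half = -B_CONSTANT / B_GRADE

print(round(grade_at_half, 3))                # 6.259
print(probability_of_passing(grade_at_half))  # approximately 0.5
```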

Calculate the probabilities and odds of passing for X = 0, 5, 10

X     P         Odds (rounded)
0     0.0148    0.015
5     0.3005    0.430
10    0.9248    12.3
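These values can be recomputed directly from the equation. The sketch below assumes the B values from the example (-4.200 and .671); the function names are illustrative. The odds are computed as e^(constant + B * X) rather than from an already rounded P, which for X = 10 gives about 12.3.

```python
import math

B_CONSTANT = -4.200  # B of the constant, from the example
B_GRADE = 0.671      # B of the grade, from the example


def odds_of_passing(grade):
    """Odds = e^(constant + B * grade)."""
    return math.exp(B_CONSTANT + B_GRADE * grade)


def probability_of_passing(grade):
    """P = odds / (1 + odds), equivalent to 1 / (1 + e^-(constant + B * grade))."""
    o = odds_of_passing(grade)
    return o / (1 + o)


# One row per requested value of X: X, probability, odds.
rows = [(x, probability_of_passing(x), odds_of_passing(x)) for x in (0, 5, 10)]
for x, p, o in rows:
    print(f"{x:<4}{p:<10.4f}{o:.3f}")
```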

 

How to calculate the odds ratio?

Example:

X     P        Odds      Odds ratio
1     .0285    .02931
2     .0543    .0574     1.958 (= .0574 / .02931)

Therefore, if X increases by 1 unit, the odds are multiplied by 1.958.
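The odds ratio can be computed in two ways, sketched below. The first divides the two odds from the example. The second uses the fact that, in a logistic model, the odds ratio for a one-unit increase equals e^B. That the example comes from the model with B = .671 is an assumption here; if it does, the small gap between the two results would be consistent with rounding of the tabulated odds.

```python
import math

# Odds copied from the example (X = 1 and X = 2).
odds_x1 = 0.02931
odds_x2 = 0.0574

# Way 1: divide the odds at X = 2 by the odds at X = 1.
odds_ratio_from_table = odds_x2 / odds_x1

# Way 2: e^B. Assumption: the example uses B = .671 for X.
B_X = 0.671
odds_ratio_from_b = math.exp(B_X)

print(round(odds_ratio_from_table, 3))  # 1.958
print(round(odds_ratio_from_b, 3))      # 1.956
```

Because the odds ratio is e^B, it is the same for every one-unit step in X, whatever the starting value.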

 

What is the odds ratio of X of
