Which kinds of samples and variables are possible? – Chapter 2

2.1 Which kinds of variables can be measured?

All characteristics of a subject that can be measured are variables. These characteristics can vary between subjects within a sample or a population (for example income, sex, opinion). A variable thus captures how a characteristic varies from subject to subject, for example the number of beers that students consume per week. The possible values of a variable constitute its measurement scale. Variables can be distinguished by several measurement scales.

The most important distinction is that between quantitative and categorical variables. Quantitative variables are measured in numerical values, such as age, number of brothers and sisters, or income. Categorical variables (also called qualitative variables) are measured in categories, such as sex, marital status, or religion. The measurement scale determines which statistical analyses are possible: for quantitative variables it is possible to calculate the mean (e.g. the average age), but for categorical variables this isn't possible (e.g. there is no average sex).
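As a small sketch in R (the software this summary uses for drawing samples), the difference can be made concrete. The data and the variable names below are made up for illustration: the mean is computed for a quantitative variable, while a categorical variable is summarised by counting how often each category occurs.

```r
# Made-up data for five respondents (illustrative only)
age <- c(19, 22, 21, 35, 28)                        # quantitative variable
marital <- factor(c("single", "married", "single",
                    "single", "divorced"))          # categorical variable

mean_age <- mean(age)      # an average age is well defined
counts <- table(marital)   # categories are summarised by frequencies

print(mean_age)
print(counts)
```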

In addition, four measurement scales are distinguished: nominal, ordinal, interval and ratio. Categorical variables have nominal or ordinal scales.

The nominal scale is purely descriptive. With sex as a variable, for instance, the possible values are man and woman. There is no order or hierarchy: one value isn't higher than the other.

The ordinal scale on the other hand assumes a certain order. Take happiness as an example. If the possible values are unhappy, considerably unhappy, neutral, considerably happy and ecstatic, then there is a certain order. A respondent who answers neutral is happier than one who answers considerably unhappy, who in turn is happier than one who answers unhappy. It is important that the distances between the values cannot be measured; this is the difference between ordinal and interval.
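The ordering of the happiness categories can be sketched in R with an ordered factor. The answers below are made up, and the names `levels_happy`, `answers` and `happiness` are only chosen for this illustration.

```r
# The five happiness categories from the text, from lowest to highest
levels_happy <- c("unhappy", "considerably unhappy", "neutral",
                  "considerably happy", "ecstatic")

# Made-up answers of four respondents
answers <- c("neutral", "unhappy", "ecstatic", "considerably unhappy")
happiness <- factor(answers, levels = levels_happy, ordered = TRUE)

# Order comparisons are meaningful on an ordinal scale:
# is "neutral" happier than "considerably unhappy"?
is_happier <- happiness[1] > happiness[4]
print(is_happier)

# The integer codes only record the rank of each answer;
# differences between the codes are not measured distances
print(as.integer(happiness))
```

Because only the rank is known, statistics that rely on distances (such as the mean) are not appropriate for these codes.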

Quantitative variables have an interval or ratio scale. Interval means that there are measurable differences between the values. An example is temperature in Celsius. There is an order (30 degrees is more than 20) and the difference is clearly measurable and consistent.

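The difference between interval and ratio is that an interval scale has no true zero point, whereas a ratio scale does. Zero degrees Celsius, for instance, is an arbitrary point and does not mean that there is no temperature, so 20 degrees is not twice as warm as 10 degrees. On a ratio scale zero means that the quantity is absent, which makes ratios meaningful. So the ratio scale has numerical values, with a certain order, with measurable differences and with a true zero. Examples are percentage or income.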

Furthermore there are discrete and continuous variables. A variable is discrete when its possible values form a set of separate numbers, such as 0, 1, 2 and so on. A variable is continuous when it can take any value within an interval. The number of brothers and sisters is discrete, because it's not possible to have 2.43 brothers or sisters. Weight is continuous, because it's possible to weigh 70 kilos but also 70.52 kilos.

Categorical variables (nominal or ordinal) are always discrete because they have a limited number of categories. Quantitative variables can be either discrete or continuous. When a discrete quantitative variable has a very large number of possible values, it is in practice treated as continuous.

2.2 How does randomization work?

Randomization is the mechanism for obtaining a representative sample. In a simple random sample every subject of the population has an equal chance of becoming part of the sample (more precisely, every possible sample of the chosen size is equally likely). The randomness is important, because it guards against biased data. Biased data would make inferential statistics useless, because then it's impossible to say anything about the population.

For a random sample a sampling frame is necessary: a list of all subjects within the population. All subjects in the list are numbered, and then numbers are drawn at random. Drawing random numbers can be done using software, for instance R. In R the following command is used:

> sample(1:60, 4)   # draw 4 of the numbers 1 to 60 at random

[1] 22 47 38 44

The symbol > is R's prompt; the command typed after it is what the program is asked to execute. In this example the goal is to select four random subjects from a list of 60 subjects in total. The output shows which subjects are chosen: numbers 22, 47, 38 and 44.
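In practice the drawn numbers are looked up in the sampling frame. The sketch below assumes a hypothetical frame of 60 subjects stored in a data frame called `sampling_frame`; the seed value is arbitrary and only makes the draw repeatable, so the selected numbers will generally differ from those shown above.

```r
# Hypothetical sampling frame: 60 numbered subjects
sampling_frame <- data.frame(id = 1:60,
                             name = paste("subject", 1:60))

set.seed(123)   # arbitrary seed, only so that the draw can be repeated

# Draw 4 distinct ids; sample() draws without replacement by default
chosen_ids <- sample(sampling_frame$id, 4)

# Look up the selected subjects in the sampling frame
chosen <- sampling_frame[sampling_frame$id %in% chosen_ids, ]
print(chosen)
```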

Data can be collected using surveys, experiments and observational studies. All these methods can have a degree of randomization.

Different types of surveys are possible, such as online and offline surveys. Every way of gathering data has its own challenges in representing the population accurately.

Experiments are used to measure and compare the reactions of subjects under different conditions. These conditions, so-called treatments, are values of a variable that can influence the reaction. It is up to the researcher to decide which subjects receive which treatment. This is where randomization plays a part: the researcher needs to divide the subjects into groups at random. The experimental design specifies which subjects receive which treatments.
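Random assignment to treatments can be sketched in R as follows. The twelve participants, the two treatment labels and the equal group sizes are assumptions made for this example.

```r
# Twelve hypothetical participants and two treatments of equal size
participants <- paste0("P", 1:12)
treatments <- rep(c("treatment A", "treatment B"), each = 6)

set.seed(1)   # arbitrary seed, only for reproducibility

# sample() without a size argument returns a random permutation,
# so each participant receives a randomly ordered treatment label
assignment <- data.frame(participant = participants,
                         treatment = sample(treatments))

print(assignment)
print(table(assignment$treatment))
```

Shuffling a fixed set of labels keeps the groups equally large, while chance alone decides who ends up in which group.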

In observational studies the researcher measures the values of variables without influencing or manipulating the situation. Who will be observed is determined at random. The biggest risk of this method is that a variable that influences the results goes unmeasured.

2.3 How do you control variability and bias?

In theory, a measure must be valid, which means that it is clear what it's supposed to measure and that it accurately reflects this concept. A measure must also be reliable, meaning that it's consistent: a respondent would give the same answer when asked again. In reality however all kinds of factors can influence a study.

Even when several samples are drawn completely at random, each sample will differ somewhat from the population, and the samples will differ from each other. This difference is called the sampling error: how much the statistic calculated from the sample differs from the parameter it estimates in the population. If 66% of the population agrees with government policy, but 68% of the sample does, then the sampling error is 2 percentage points. For random samples of about 1,000 subjects or more, the sampling error for a percentage usually stays within about 3 percentage points. This bound is called the margin of error. It is often reported because it says something about the quality of a sample.
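The numbers in this example can be checked with a short R sketch. Two things below are assumptions that are not stated in the summary: the rule of thumb that the margin of error for a proportion is roughly 1 divided by the square root of the sample size, and the simulation settings (2000 repeated samples and an arbitrary seed).

```r
population_share <- 0.66   # share that agrees in the population (from the text)
sample_share     <- 0.68   # share that agrees in one sample (from the text)

# Sampling error: sample statistic minus population parameter
sampling_error <- sample_share - population_share   # about 0.02

# Rule of thumb (assumption): margin of error is roughly 1 / sqrt(n)
n <- 1000
margin_of_error <- 1 / sqrt(n)                      # about 0.03

# Simulate 2000 simple random samples of size n and their sampling errors
set.seed(42)
sim_shares <- rbinom(2000, size = n, prob = population_share) / n
sim_errors <- sim_shares - population_share

# Share of simulated samples whose error stays within the margin of error
share_within <- mean(abs(sim_errors) <= margin_of_error)
print(share_within)
```

In such a simulation the large majority of samples, but not all of them, stay within the margin of error.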

Apart from sampling error there are other factors that influence the results from random samples, such as sampling bias, response bias and non-response bias.

In probability sampling the chance of every possible sample is known. In nonprobability sampling this chance is not known, so the reliability of the results is unknown and sampling bias can occur. Sampling bias arises when it can't be guaranteed that all members of the population have an equal chance of becoming part of the sample. This happens for instance when only volunteers take part in a study. Volunteers can differ from people who choose not to participate. The distortion that this causes in certain variables is called selection bias.

When questions in a survey or interview are worded or ordered in a certain way, response bias can occur. Leading questions such as “Do you agree that...?” steer towards a particular answer: respondents prefer not to disagree with the interviewer and are more inclined to agree, even if they don't. The general inclination to give socially desirable answers, or answers that respondents think the interviewer favors, is also part of response bias.

Non-response bias happens when people drop out of a study or other factors result in missing data. Some people choose not to answer certain questions, for various reasons. People who drop out may have different values on important variables than the respondents who remain. This can influence the data, even in a random sample.

2.4 Which methods can be used for probability sampling?

Apart from simple random samples, other methods are possible. There are cases in which a simple random sample isn't feasible, and sometimes another design is more desirable or easier to carry out. These other methods still use probability sampling (so that the chance is known for every possible sample) and randomization (with a representative sample as the goal).

In a systematic random sample the subjects are chosen in a systematic manner, by consistently skipping a fixed number of subjects. An example is selecting every tenth house in a street. The formula for this method is: k = N/n. Here k is the skip number: after a random starting point among the first k subjects of the sampling frame, every k-th subject is selected. N is the population size and n is the sample size.
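The formula k = N/n can be sketched in R. The population size of 600 and the sample size of 60 are hypothetical values, and the random starting point among the first k subjects is a common convention that is assumed here.

```r
N <- 600     # population size (hypothetical)
n <- 60      # desired sample size (hypothetical)
k <- N / n   # skip number: here 10

set.seed(7)                # arbitrary seed, only for reproducibility
start <- sample(1:k, 1)    # random starting point among the first k subjects

# Select every k-th subject, beginning at the random start
selected <- seq(from = start, by = k, length.out = n)
print(selected)
```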

A stratified sample divides the population into groups, also called strata. From each stratum a number of subjects is chosen at random, and together they form the sample. This can be proportional or disproportional. In a proportional stratified sample the proportions of the strata in the sample are equal to their proportions in the population. If for instance 60% of the population is male and 40% is female, then this needs to be the same in the sample. Sometimes it may be better to use a disproportional stratified sample. If only 10% of the population is female, a proportional sample of 100 subjects contains just 10 women. A number like that is too small to draw conclusions about the women in the population. In that case it's better to choose a disproportional stratified sample, in which the small stratum is given a larger share of the sample.
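A proportional stratified sample with the 60%/40% split from the text can be sketched in R. The population of 1000 subjects, the sample size of 100 and the seed are assumptions made for this example.

```r
# Hypothetical population of 1000 subjects: 60% male, 40% female
population <- data.frame(id = 1:1000,
                         sex = rep(c("male", "female"), times = c(600, 400)))
n <- 100   # total sample size

set.seed(2024)   # arbitrary seed, only for reproducibility

# Split the population into strata and sample at random within each stratum
strata <- split(population, population$sex)
stratified <- do.call(rbind, lapply(strata, function(stratum) {
  # Proportional allocation: stratum share of the population times n
  n_stratum <- round(n * nrow(stratum) / nrow(population))
  stratum[sample(nrow(stratum), n_stratum), ]
}))

print(table(stratified$sex))
```

For a disproportional stratified sample, `n_stratum` would be set per stratum instead of following the population shares, for example a fixed number of subjects from each stratum.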

Most sampling methods require a sampling frame of the entire population, but in reality this may not be available. In that case cluster sampling may be an option. This requires dividing the population into clusters (for instance city districts), randomly choosing one or more clusters and including the subjects within the chosen clusters. The difference with stratified samples is that not every cluster is represented in the sample.
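Cluster sampling can be sketched in R as follows. The population of 8 city districts with 50 residents each is hypothetical, and here two districts are drawn; drawing a single district works the same way.

```r
# Hypothetical population: 8 city districts with 50 residents each
population <- data.frame(id = 1:400,
                         district = rep(paste("district", 1:8), each = 50))

set.seed(99)   # arbitrary seed, only for reproducibility

# Draw clusters (districts) at random, not individual residents
chosen_districts <- sample(unique(population$district), 2)

# The sample consists of all residents of the chosen districts
cluster_sample <- population[population$district %in% chosen_districts, ]

print(chosen_districts)
print(nrow(cluster_sample))
```

Only a list of districts is needed to draw the sample, not a list of all residents in the city.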

Another option is multistage sampling, which combines several sampling stages. For instance first provinces are selected, then cities within those provinces and then streets within those cities.
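The three stages of this example can be sketched in R. The frame of 4 provinces, 5 cities per province and 10 streets per city is hypothetical, as are the numbers drawn at each stage (2 provinces, 2 cities per province, 3 streets per city).

```r
# Hypothetical frame: 4 provinces x 5 cities x 10 streets
streets_frame <- expand.grid(street = 1:10, city = 1:5, province = 1:4)

set.seed(5)   # arbitrary seed, only for reproducibility

# Stage 1: draw 2 provinces at random
provinces <- sample(unique(streets_frame$province), 2)
stage1 <- streets_frame[streets_frame$province %in% provinces, ]

# Stage 2: draw 2 cities at random within each selected province
stage2 <- do.call(rbind, lapply(split(stage1, stage1$province), function(p) {
  p[p$city %in% sample(unique(p$city), 2), ]
}))

# Stage 3: draw 3 streets at random within each selected city
groups <- split(stage2, list(stage2$province, stage2$city), drop = TRUE)
stage3 <- do.call(rbind, lapply(groups, function(ct) {
  ct[ct$street %in% sample(ct$street, 3), ]
}))

print(stage3)
```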

 

Statistical methods for the social sciences - Agresti - 5th edition, 2018 - Summary (EN)
