Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 2 summary

DIFFERENT TYPES OF DATA
A variable is any characteristic observed in a study. The data values that we observe for a variable are called observations. A variable can be either categorical or quantitative.

Categorical variables are variables whose observations belong to one of a set of distinct categories (e.g: religion, favourite sport, bank account, area codes). A categorical variable can have numerical values, such as area codes, but those numbers do not represent different magnitudes.
Quantitative variables are variables that have numerical values and represent different magnitudes. (e.g: weight, height, hours spent watching TV every day)

Key features for describing a quantitative variable are the centre and the variability (spread) of the data (e.g: the average number of hours spent watching TV every day). The key feature for describing a categorical variable is the relative number of observations in the various categories (e.g: the percentage of days in a year that were sunny).

Quantitative variables can be either discrete or continuous. A quantitative variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3 (e.g: the number of pets in a household). A quantitative variable is continuous if its possible values form an interval, such as 0.16, 0.13, 2.32 (e.g: weight: 68.3 kg).

The distribution of a variable describes how the observations fall (are distributed) across the range of possible values. The modal category is the category with the largest frequency.

A frequency table is a listing of possible values for a variable, together with the number of observations for each value.

Category	A	B	C
Frequency	17	23	9
Proportion	0.347	0.469	0.184
Percentage	34.7%	46.9%	18.4%

*an example of a frequency table*

The proportion of observations falling in a certain category is the number of observations in that category divided by the total number of observations. The percentage is the proportion multiplied by 100. Proportions and percentages are also called relative frequencies.

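$$ \text{Proportion} = \frac{\text{number of observations in the category}}{\text{total number of observations}} $$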
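As a quick check of these definitions, the following Python sketch recomputes the proportions and percentages in the example frequency table from its frequencies (17, 23 and 9, so 49 observations in total). The function name `relative_frequencies` is a hypothetical helper chosen for illustration.

```python
def relative_frequencies(freq):
    """Map each category to (proportion, percentage), given {category: frequency}."""
    total = sum(freq.values())  # total number of observations
    return {
        category: (count / total, 100 * count / total)  # proportion, percentage
        for category, count in freq.items()
    }


# Frequencies taken from the example frequency table
frequencies = {"A": 17, "B": 23, "C": 9}

for category, (proportion, percentage) in relative_frequencies(frequencies).items():
    print(f"{category}: proportion {proportion:.3f}, percentage {percentage:.1f}%")
```

The proportions always add up to 1 (and the percentages to 100%), apart from rounding.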

GRAPHICAL SUMMARIES OF DATA
The two primary graphical displays for summarizing a categorical variable are the pie chart and the bar graph. A bar graph with categories ordered by their frequency is called a Pareto chart. The Pareto Principle states that a small subset of categories often contains most of the observations.
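A Pareto chart is an ordinary bar graph whose bars are sorted from the most to the least frequent category. The minimal Python sketch below shows only that ordering. It prints text bars instead of drawing a real chart, and the name `pareto_order` is hypothetical.

```python
def pareto_order(freq):
    """Return (category, frequency) pairs sorted from most to least frequent."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)


frequencies = {"A": 17, "B": 23, "C": 9}  # the example frequency table

for category, count in pareto_order(frequencies):
    print(f"{category} {'#' * count} {count}")  # one '#' per observation
```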

There are three common graphical displays for summarizing a quantitative variable and visualizing its distribution:

Dot plot
A dot plot shows a dot for each observation, placed just above the value on the number line for that observation.
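Python can produce a rough, text-only version of a dot plot in a few lines. This sketch assumes integer-valued data, such as the number of pets in a household. To fit on a text screen, it runs the number line down the page and places the dots to the right of each value rather than above it. The function name and the data are made up for illustration.

```python
from collections import Counter


def text_dot_plot(values):
    """One line per integer from min to max, with one '*' per observation."""
    counts = Counter(values)  # how many observations take each value
    return [
        f"{v:>3} | {'*' * counts[v]}"  # Counter returns 0 for values never observed
        for v in range(min(values), max(values) + 1)
    ]


pets = [0, 1, 1, 2, 2, 2, 4]  # hypothetical numbers of pets in 7 households
print("\n".join(text_dot_plot(pets)))
```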
Stem-and-leaf plot
A stem-and-leaf plot represents each observation by a stem and a leaf. The stem usually consists of all the digits except the final one, which is the leaf. The data values can also be truncated before plotting: the final digit is simply cut off instead of being rounded.
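The stem/leaf split described above can be sketched in Python for non-negative integers. The stem is `v // 10` (all digits but the last) and the leaf is `v % 10` (the final digit). Truncation is shown by dropping the last digit with integer division, which does not round. The helper name and the data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative integers:
    stem = all digits except the last, leaf = the final digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)  # e.g. 47 -> stem 4, leaf 7
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


scores = [12, 15, 21, 23, 23, 38, 41, 47]  # made-up data
print("\n".join(stem_and_leaf(scores)))

# Truncation: cut off the final digit of 3-digit values (no rounding) before plotting
raw = [123, 158, 214]
print("\n".join(stem_and_leaf([v // 10 for v in raw])))
```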
Histogram
A histogram is a graph that uses bars to portray the frequencies or relative frequencies of the possible outcomes for a quantitative variable. If the distribution has a single mound or peak, it is called unimodal; if it has two distinct mounds or peaks, it is called bimodal.
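The bar heights of a histogram are frequencies of observations within equal-width intervals. This Python sketch counts them and prints text bars with relative frequencies. The starting point, the interval width, the helper name `histogram_counts` and the TV-hours data are all assumptions made for illustration. It also assumes every value is at least `start`.

```python
def histogram_counts(values, start, width):
    """Frequencies in the intervals [start, start+width), [start+width, start+2*width), ...
    Assumes every value is >= start."""
    n_bins = int((max(values) - start) // width) + 1  # enough intervals to include the maximum
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1  # index of the interval containing v
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


tv_hours = [0, 1, 1, 2, 2, 2, 3, 3, 4, 7]  # hypothetical daily TV hours
n = len(tv_hours)
for low, high, count in histogram_counts(tv_hours, start=0, width=2):
    # text bar for the frequency, followed by the relative frequency
    print(f"[{low}, {high}) {'#' * count} {count / n:.2f}")
```

The interval width is a choice. Different widths can make the same data look somewhat different.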

It is wise to always plot a histogram when summarizing quantitative data. If the number of observations is small (fewer than about 50), the histogram should be supplemented with a stem-and-leaf plot or a dot plot, which show the actual numerical values of the observations.

A unimodal distribution can be symmetric or skewed. A distribution is skewed if one side of the distribution stretches out longer than the other side. If the longer tail is on the right (so the peak lies towards the left), the distribution is skewed to the right; if the longer tail is on the left, it is skewed to the left.

A data set collected over time is called a time series. A common pattern to look for is a trend over time, indicating a tendency of the data to either rise or fall. A time series can be displayed in a time plot or a bar graph.

MEASURING THE CENTRE OF QUANTITATIVE DATA
The mean is the sum of the observations divided by the number of observations. The median is the middle value of the observations when they are ordered from smallest to largest; for an even number of observations, it is the average of the two middle values. Here are some basic properties of the mean:

The mean is the balance point of data.
The mean is often not equal to any value that was observed in the sample.
For a skewed distribution, the mean is pulled in the direction of the longer tail, relative to the median.
The mean can be highly influenced by an outlier, which is an unusually small or unusually large observation.
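The definitions of the mean and the median translate directly into Python. The hand-written helpers below mirror the definitions; Python's standard `statistics` module also provides `statistics.mean` and `statistics.median`. A small made-up data set illustrates two of the properties listed: the mean need not be an observed value, and the deviations from the mean balance out.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 3, 4, 10]  # made-up observations
x_bar = mean(data)
print(x_bar, median(data))           # the mean is not one of the observed values
print(sum(x - x_bar for x in data))  # deviations balance out: their sum is 0
```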

The mean and the median can be compared. The shape of a distribution influences whether the mean is larger or smaller than the median.

If the distribution is perfectly symmetric, the mean equals the median.
If the distribution is skewed to the left, the mean is smaller than the median.
If the distribution is skewed to the right, the mean is larger than the median.

A numerical summary of the observations is called resistant if extreme observations have little, if any, influence on its value. The median is resistant; the mean is not. If a distribution is highly skewed, the median is usually preferred over the mean. If the distribution is close to symmetric or only mildly skewed, the mean is usually preferred over the median.
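This sketch shows resistance at work, using Python's standard `statistics` module and made-up data. Replacing the largest observation with an extreme value leaves the median unchanged. The same change pulls the mean towards the long right tail, so the mean ends up larger than the median, as expected for right-skewed data.

```python
from statistics import mean, median

values = [20, 22, 25, 27, 30]       # made-up, roughly symmetric data
with_outlier = values[:-1] + [300]  # replace the largest value by an extreme one

# The median ignores how extreme the largest value is; the mean does not.
print("original:    ", mean(values), median(values))
print("with outlier:", mean(with_outlier), median(with_outlier))
```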

The mode is the value that occurs most frequently. The mode is most often used with categorical variables. For a continuous variable, each observation may occur only once, in which case there is no meaningful mode.
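Python's standard library (version 3.8 and later) offers `statistics.multimode`, which returns every value that has the highest frequency. In the made-up data below, the categorical variable has a clear modal category. The continuous measurements never repeat, so every value ties and the mode says nothing useful.

```python
from statistics import multimode

sports = ["football", "tennis", "football", "hockey", "tennis", "football"]
weights = [68.3, 70.1, 72.4, 65.9]  # continuous measurements: no value repeats

print(multimode(sports))   # the modal category
print(multimode(weights))  # every value occurs once, so all of them tie
```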

MEASURING THE VARIABILITY OF QUANTITATIVE DATA
The deviation of an observation x from the mean is the difference between the observation and the sample mean, x − x̄. The sum of the deviations always equals zero. The variance, s², is roughly the average of the squared deviations: the sum of the squared deviations divided by n − 1 rather than by n. The square root of the variance is called the standard deviation, s. It represents a typical distance of an observation from the mean. The greater the standard deviation s, the greater the variability in the data. s is never negative, and it equals 0 only when all the observations take the same value.

The standard deviation:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

This means: the square root of (the sum of squared deviations divided by sample size – 1)
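The formula can be followed step by step in Python. The sketch computes the mean, the squared deviations, the variance (dividing by n − 1) and finally the square root. The helper name `sample_sd` and the data are hypothetical. For the same data, the result should match `statistics.stdev` from Python's standard library, which also uses the n − 1 divisor.

```python
import math


def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)  # divide by n - 1, not n
    return math.sqrt(variance)


print(sample_sd([1, 3, 5]))  # mean 3, squared deviations 4 + 0 + 4 = 8, 8 / 2 = 4, sqrt = 2
print(sample_sd([7, 7, 7]))  # all observations equal, so s is 0
```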

The mean and median describe the centre of the distribution. The standard deviation and the range (the largest observation minus the smallest observation) describe the variability of the distribution.

USING MEASURES OF POSITION TO DESCRIBE VARIABILITY
The median is a special case of a more general set of measures of position called percentiles. The pth percentile is a value such that p percent of the observations fall below or at that value. Three useful percentiles are the quartiles: the first quartile Q1 (p = 25), the second quartile Q2 (p = 50, the median) and the third quartile Q3 (p = 75).
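One common textbook way to compute quartiles is to split the ordered data at the median and take the median of each half. This Python sketch assumes that convention, with the median left out of both halves when n is odd. Other conventions exist. Statistical software, including `statistics.quantiles` in Python's standard library, may interpolate differently and give slightly different values for small data sets. The helper name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' rule.
    For an odd number of observations the median itself is left out of both halves.
    Assumes at least two observations."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # observations below the median position
    upper_half = ordered[(n + 1) // 2 :]  # observations above the median position
    return median(lower_half), median(ordered), median(upper_half)


print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # odd n: the middle value 4 is in neither half
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # even n: the halves are the first and last four
```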

The quartiles are also used to define a measure of variability that is more resistant than the range and the standard deviation: the interquartile range, IQR = Q3 − Q1, which is the distance from Q1 to Q3. The IQR can also be used to identify potential outliers. An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.
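The 1.5 × IQR rule can be sketched in Python as follows. It assumes the median-of-each-half convention for quartiles, so other software may give slightly different cut-offs. The cut-off values are called "fences" here only as a convenient name. The helper names and the data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """(Q1, median, Q3) via the median-of-each-half rule; assumes n >= 2."""
    ordered = sorted(values)
    n = len(ordered)
    return median(ordered[: n // 2]), median(ordered), median(ordered[(n + 1) // 2 :])


def potential_outliers(values):
    """Observations more than 1.5 * IQR below Q1 or more than 1.5 * IQR above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


data = [2, 4, 5, 6, 7, 8, 9, 30]  # made-up data
print(quartiles(data))            # Q1 = 4.5 and Q3 = 8.5, so IQR = 4
print(potential_outliers(data))   # fences at 4.5 - 6 = -1.5 and 8.5 + 6 = 14.5
```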

The five-number summary (minimum, first quartile, median, third quartile, maximum) is the basis of a graphical display called the box plot. The box of a box plot contains the central 50% of the distribution, from the first quartile to the third quartile.
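The five numbers behind a box plot can be collected with a short Python function. As before, the quartiles are assumed to follow the median-of-each-half convention. The function name `five_number_summary` and the data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum); quartiles via the median-of-each-half rule.
    Assumes at least two observations."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    return ordered[0], q1, median(ordered), q3, ordered[-1]


summary = five_number_summary([2, 4, 5, 6, 7, 8, 9, 30])  # made-up data
names = ["min", "Q1", "median", "Q3", "max"]
print(dict(zip(names, summary)))
# The box of the box plot runs from Q1 to Q3 and holds the central 50% of the data.
```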

A box plot does not portray certain features of a distribution, such as distinct mounds and possible gaps, as clearly as a histogram does. Box plots are useful for identifying potential outliers. Side-by-side box plots are useful for comparing groups, as they show differences in centre, variability and potential outliers.

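The z-score for an observation is the number of standard deviations that it falls from the mean: z = (x − x̄) / s. A positive z-score means the observation lies above the mean; a negative z-score means it lies below the mean.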
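This minimal Python sketch follows that definition, using the sample mean and sample standard deviation from the standard `statistics` module. The helper name `z_score` and the data are hypothetical.

```python
from statistics import mean, stdev


def z_score(x, values):
    """Number of (sample) standard deviations that x lies from the mean of values."""
    return (x - mean(values)) / stdev(values)


data = [1, 3, 5]  # made-up data: mean 3, standard deviation 2
print([z_score(x, data) for x in data])  # below, at and above the mean
print(z_score(9, data))                  # 9 lies 3 standard deviations above the mean
```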

RECOGNIZING AND AVOIDING MISUSES OF GRAPHICAL SUMMARIES
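The following guidelines are useful when constructing a graph:

Label both axes and provide a heading to make clear what the graph is intended to portray
The vertical axis should usually start at 0, so that relative sizes can be compared accurately
Make sure the graph represents the relative percentages correctly (this is easy to get wrong when pictures are used in place of ordinary bars or points)
When the values for different groups differ greatly, it can help to use separate graphs, or to plot relative sizes such as ratios or percentages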

Summary author: JesperN
