Summary with Conducting Research in Psychology: Measuring the Weight of Smoke - Pelham & Blanton - 4th edition

What is the nature of the scientific methods? - Chapter 1
How does the scientific process work? - Chapter 2
What scales and measurement exist in psychological research? - Chapter 3
How do you convert the subjective to the objective? - Chapter 4
What types of misinterpretation can occur during research? - Chapter 5
What is non experimental research in psychology? - Chapter 6
What is experimental research in psychology? - Chapter 7
What is quasi-experimental research in psychology? - Chapter 8
How do you determine the best research approach? - Chapter 9
Conducting Research in Psychology: Measuring the Weight of Smoke - Pelham & Blanton - 4th edition - BulletPoints
Conducting Research in Psychology: Measuring the Weight of Smoke - Pelham & Blanton - 4th edition - Glossary

What is the nature of the scientific methods? - Chapter 1

What are the origins of psychology?

Metaphysical (supernatural) explanations were the earliest ways to explain human behavior. While modern scientists would likely disagree with metaphysical explanations because they go against scientific laws, early philosophers such as Plato and Aristotle used the language of metaphysics to explain natural occurrences. In discussing the origins of psychology, we will discuss three metaphysical systems: animism, mythology/religion, and astrology.

What is the essence of the metaphysical system 'animism'?

Animism, the belief that objects in nature are alive, is one example of a metaphysical explanation. Prehistoric people thought that owning animal parts such as bones or feathers would give them the animal’s powers. People also believed that natural elements and events such as fire, wind, and the sun had personalities that could be angered. The early philosopher Plato thought the earth was living and possessed a soul. Another philosopher, Aristotle, analyzed human personalities in his work Physiognomics. In accordance with notions of animism, he argued that people who resembled specific animals also had the same character traits as those animals. For example, Aristotle claimed that a person whose thick neck resembled a bull’s might also share the bull’s quick temper.

What is the body of the metaphysical system mythology/religion?

Another type of metaphysical explanation is religion or mythology. These systems assume that spiritual beings are central to human behavior. Compared to animism, religious beliefs are more complex and sophisticated. However, both types of explanation state that forces without an actual physical presence control human behavior. Like science, religion is a system for analyzing and explaining the way people behave.

How is astrology the third type of metaphysical explanation?

Astrology is a third type of metaphysical explanation. First practiced by ancient Egyptians, astrology states that human behavior is determined by celestial bodies. One thing to note is that astrology does actually use some scientific methods to explain human behavior. For example, astrologers try to increase accuracy of their predictions by using the exact time, day, year and location of a person’s birth.

Philosophy is another system used to explain human behavior. Early Greek philosophers such as Plato and Aristotle focused on logic. Logic remained the dominant approach until the 17th century, when the philosopher Descartes shifted the focus to empirical observation. The British philosophers Locke, Hume, and Hartley followed in Descartes’s footsteps. Empirical observation reached its climax with Auguste Comte, who introduced positivism, which states that theories of human behavior must be based on proven observations. Learning about the world through observation became central to the scientific method. Around this time (the mid to late 1800s), psychology became its own field. Having grown out of philosophy, psychology emphasizes systematic observation.

Besides philosophy, psychology also had its origins in physiology and physical science. Unlike philosophy, which uses empirical observation, physiology uses an experimental method. Physiology is the study of how the brain and the body function and are connected. Important experiments by biologists (Galvani, Volta, and Hall) answered physical and social research questions and helped to develop and refine methods that later influenced experimental psychologists. When psychologists began to conduct experiments, psychology became a science.

Historians disagree on who invented experimental psychology, though they agree it was invented by a German scientist in the mid to late 19th century. The general consensus is that Wundt invented it, though Fechner, von Helmholtz, and Weber all made contributions. Wundt was the most focused on psychology. He wanted to dissect consciousness through experiments. In 1862, he published the Beiträge, where he championed the field of social psychology. Between 1900 and 1920 he produced another work, the ten-volume Folk Psychology.

Today, psychology is experimental and scientific. The definition has changed to become the scientific study of human behavior. This is not to say that scientists do not make assumptions based on fundamental principles or canons.

What are the four fundamental principles of science?

Determinism

The four canons are determinism, empiricism, parsimony, and testability. The first canon, determinism, states that there is order to the universe and that events have causes. Even astrologers believe that the position and movement of planets and other celestial bodies cause humans to behave in certain ways. Some psychologists believe it is a human tendency to think in terms of cause and effect. Whether or not this is true, there is evidence for the power of causal thinking.

Tversky and Kahneman conducted an experiment in which they gave a group of participants two pieces of information about a cab that was involved in a hit-and-run accident in a city with two cab companies, Green and Blue. The participants were told the percentage of cabs in the city that were Green versus Blue, and that a witness had identified the cab as Blue. They were then asked to judge the probability that the cab was in fact Blue. Without additional information, the participants based their judgements on the witness’s testimony and the witness’s rate of accuracy in identifying the color of a cab at night. However, when a second group of participants was told that the two cab companies were equal in size but that a higher percentage of accidents were caused by Green cabs, these participants began making judgements based on base-rate information, which is information about proportions. Because the base rate was framed in causal terms, these participants came closer to the correct answer. Additionally, people often see connections that don’t actually exist, which contributes to stereotypes. This may be because people are too eager to see the world in causal terms. Behaviorists have seen a similar judgement bias, or illusory correlation, in animals. When animals are fed at random intervals, they will try to determine a pattern in what caused the feeding and behave accordingly, in a false conditioning process called superstitious conditioning.
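
To make the idea of a "correct answer" in the cab problem concrete, the short Python sketch below applies Bayes' rule. The summary above does not give the study's numbers, so the figures used here (15% of cabs are Blue, and the witness is right 80% of the time) are assumptions taken from the way the problem is commonly presented, and the function name posterior_blue is made up for this illustration.

```python
def posterior_blue(prior_blue, witness_accuracy):
    """Probability that the cab was Blue, given that the witness said 'Blue'."""
    prior_green = 1.0 - prior_blue
    # The witness says "Blue" correctly for Blue cabs and by mistake for Green cabs.
    says_blue_and_is_blue = witness_accuracy * prior_blue
    says_blue_and_is_green = (1.0 - witness_accuracy) * prior_green
    return says_blue_and_is_blue / (says_blue_and_is_blue + says_blue_and_is_green)


# Assumed figures (not stated in the summary): 15% Blue cabs, 80% witness accuracy.
p = posterior_blue(prior_blue=0.15, witness_accuracy=0.80)
print(round(p, 2))  # 0.41
```

Under these assumed figures the probability is about 0.41, well below the witness's 80% accuracy, which shows how much the answer changes once the base rate is taken into account.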

The scientific idea of a theory, a statement about the cause and effect relationship between variables, is closely related to determinism. Theories identify abstract hypotheses that are meant to explain something about how the world works.

Empiricism

The second canon is empiricism, or observation. Scientists believe that making observations is the best method for determining scientific principles. This is the least debated canon of science. Galileo is famous for an experiment in which he is said to have dropped two cannonballs, one light and one heavy, from the Leaning Tower of Pisa. Both balls supposedly fell at the same rate and hit the ground at the same time, disproving Aristotle’s theory that the rate at which an object falls depends on the object’s weight. However, Galileo never conducted such an experiment. Instead, he used logic to reason through the experiment and invited others to challenge his theory by conducting the experiment themselves. It is useful to note that empiricism is important outside of science as well. It can also be seen in religion, for example when Thomas doubted Jesus’s resurrection until he could touch his wounds.

Parsimony

The third canon is parsimony, which states that if there are two equally plausible theories, preference should go to the simpler one, and that unnecessary concepts should be avoided when developing theories. This canon is sometimes called “Occam’s razor” after the English philosopher William of Occam, who said that you should not make more assumptions than you have to. C. Lloyd Morgan applied this principle to animal behavior. An exception to the canon of parsimony might arise when you are trying to explain why an animal such as a chimpanzee behaves in such humanlike ways. This shows that while scientists agree on these four canons, there is disagreement about how and when to use them.

Testability

The fourth and last canon is testability, which may be the most important canon. It states that theories should be testable, that is, capable of being supported or disproven through research and other empirical techniques. Testability is also related to the concept of falsifiability, or the idea that scientists should try to disprove their theories. In his book Unended Quest, Karl Popper discussed how he was converted from Marxism to logical positivism, the belief that science and philosophy should have their roots in proven observations. Popper was a strong believer in the importance of finding tests to disprove theories, citing Einstein as a role model.

Psychology measures concepts that are difficult to observe. Therefore it is necessary to use indirect observations to draw conclusions. Following the example of physicists, the behaviorists E.C. Tolman and Clark Hull espoused the idea of operational definitions, which define theoretical concepts in terms of very concrete procedures based on observation. For example, while it is difficult to observe abstract ideas like hunger, we can connect these ideas to other things that we can observe, such as the amount of time someone goes without food and how this affects their body weight. Through operational definitions we can begin to quantify measurements precisely.

It is important to remember the attitude of people for or against a theory. Consider the context: a politician will tend to care more about authority than a scientist would. To understand how scientific belief systems differ from other belief systems, we need to consider how a person views intuition, logic, authority, and observation. The amount of importance placed on each component depends on the belief system. Laypeople tend to see authority or expertise as important. Governments and religions will emphasize intuition. On the other hand, scientists and philosophers place greater importance on logic and observation. Keep in mind that no one of the four ways is better than the others. There are also exceptions: a scientist may try to use logic but, being human, may also use intuition.

How does the scientific process work? - Chapter 2

What is the role of laws, theories, and hypotheses in psychology?

Psychologists search for laws, or universal statements about the nature of things, in order to predict human behavior. Because establishing laws is so difficult, most psychologists develop and test theories instead. Theories differ from laws in their universality. Laws are comprehensive and general statements about the nature of reality, whereas theories are more specific and have limits. Most theories can only predict outcomes accurately in specific instances. This means that more than one theory can be true depending on the circumstances. A strong theory states not only the specific conditions and variables under which it applies but also its exceptions. The principle of equifinality states that the same behavior can be the result of a variety of causes. In order to test a theory, one needs to form a hypothesis, which is a prediction about events based on the theory. Hypotheses should follow logically from the theory. By testing hypotheses, one can determine whether a theory is true and under what circumstances it is true.

How is observation used in psychology?

In the early 17th century, Francis Bacon championed the method of induction. Induction is using specific observations to draw general conclusions or principles. For example, after observing that hundreds of fish can swim, we may induce that all fish can swim. Such a statement in the scientific realm could be called a theory, which would be tested through other observations. After extensive tests, the theory could still hold true, or if it fails it may be rejected or revised. If revised, the theory would again be tested and revised again if necessary. Through this process of systematic observation, the theory or statement would gain accuracy. Once the theory is well-supported and very precise or accurate it could become a law.

A century later, the British philosopher David Hume challenged Bacon’s view. Hume argued that the division between science and non-science could not be determined through observation. He introduced the problem of induction, which is that the next observation in a long series of observations could always disprove your theory. To balance the use of induction, it is necessary to use deduction. Deduction is when a general theory is used to create hypotheses that are tested through observations. Karl Popper was in favor of the deductive method. He believed you could never prove a theory true because it is impossible to know whether you have used all potential methods of testing. At the same time, Popper believed that it was as important to know what is not true as what is true. On this view, the aim of science is not to find truth, but to get closer to the truth.

How is a hypothesis tested?

Scientists test their hypotheses through validation, falsification, and qualification. Before discussing these three approaches, it is important to mention that people tend to be biased in their observations. Positive test bias refers to the tendency of people to try to confirm rather than disprove their hypotheses. For example, in behavioral confirmation people tend to notice behaviors that are consistent with their first impressions of a person. Studies on bias, stereotypes, and self-fulfilling prophecies support the existence of positive test bias.

Why is validation important?

Validation is the most common approach, in which scientists try to find evidence that supports their claims. In this way it is similar to the positive test bias. Research on cognitive dissonance illustrates this approach. Cognitive dissonance theory says that when a person holds two logically inconsistent beliefs, they will try to make the two beliefs compatible. In a study conducted by Festinger and Carlsmith, subjects participated in peg-turning, a very boring task. Some were paid 1 dollar to lie to the next subject by saying the task was exciting; others were paid 20 dollars to tell the same lie. To reduce the dissonance of lying for such a small sum, those who were paid 1 dollar decided that the task must have been enjoyable after all. The researchers framed the study to validate their hypothesis.

How does falsification contribute?

Falsification is another approach, in which researchers try to find evidence that disproves their claims. Researchers will often work to falsify a theory that opposes their own, and all theories are subject to other researchers’ attempts to invalidate them. In 1967, Daryl Bem set out to falsify cognitive dissonance theory. He criticized dissonance theory because he rejected the motivational processes behind it. He conducted a study in which he asked participants to judge the attitudes of a person who had taken part in the Festinger and Carlsmith study. Bem’s study showed that people judged the original participant the same way as he had judged himself. They thought that when he lied for twenty dollars, he must have found the task boring, and when he lied for one dollar, he must have enjoyed the task. A key assumption in dissonance theory is that aversive arousal, or personal discomfort that people want to avoid, is central to cognitive dissonance. Since people who had not even participated in the task responded the same way, and so could not have experienced that arousal, Bem’s results cast doubt on Festinger and Carlsmith’s study.

What is the added value of using qualification?

The third approach to testing a hypothesis is qualification. This approach has become more and more popular with psychologists. In this approach, researchers try to find the limits of a theory and the specific set of circumstances under which the theory is true or false. In 1977, Fazio, Zanna, and Cooper hypothesized that when people behave in ways that are slightly to moderately inconsistent with their own attitudes, they experience little aversive arousal and will engage in self-perception processes. However, if the behavior is very inconsistent with their own attitudes, they experience a lot of aversive arousal and will use cognitive dissonance as a defense. They tested this by having people write essays that ran counter to their actual beliefs. To separate self-perception from cognitive dissonance, they told some of the participants that the booths they were in were known to cause people to feel nervous, which gave those participants another explanation for any arousal they felt. As predicted, this eliminated the change in attitude due to dissonance but did not affect self-perception.

One concern with the qualification approach, as illustrated by the previous example, is that it tends to be a complicated process. A theory is usually fairly advanced before researchers will apply the qualification approach, since it is necessary to have a certain amount of data before being able to define the limits or boundaries of a theory. The benefit of the qualification approach is that it is a hybrid of validation and falsification. Due to the complexity of research based on the qualification approach, the results tend to be more specific. For this reason, this approach is likely the best for discovering laws.

Keep in mind that the qualification approach is not immune to issues of positive test bias. On the other hand, there are two checks on a researcher’s tendency to look for results that support their hypothesis. One such check is the huge range of theories and of researchers working on those theories. This results in opposing theories, so that while some researchers are validating their results, other researchers are inevitably falsifying those results in an effort to validate their own. Another check is the set of scientific standards in place to determine the accuracy of the research and the resulting conclusions.

How is a hypothesis formulated?

Before testing a hypothesis, it is necessary to create one. McGuire pointed out that the majority of coursework on methodology was about how to test hypotheses, while how to formulate a hypothesis was often ignored. There are two ways that people reason. Through induction, one reasons from the specific towards the general, while through deduction, one reasons from the general towards the specific. Cognitive psychologists call these “bottom-up” (induction) and “top-down” (deduction).

When are inductive approaches applied?

There are many ways to utilize the inductive approach. One such way is through case studies, which document detailed observations of a person or a group of people. Sigmund Freud came up with many of his theories based on observations of his patients. Another form of the inductive approach is attempting to explain paradoxical incidents or strange observations. A third inductive technique is to analyze the practitioner’s rule of thumb. This means evaluating the process experts follow to reach certain results. In this way, one can locate psychological principles by studying people with specific knowledge or skills. A fourth inductive approach is serendipity, or good luck. While this is often out of the researcher’s control, one needs to be open enough to recognize a serendipitous discovery.

When are deductive approaches best used?

There are also many ways to utilize the deductive approach. Reasoning by analogy is a technique where the researcher creates analogies for comparisons. For example, McGuire created an analogy between immunity to disease and immunity to persuasion. By inoculating people against persuasion in much the same way that people are typically inoculated against disease, that is, by exposing them to a controlled, less potent version, McGuire saw similar results.

Another deductive method is to apply a functional or adaptive analysis to a question. In this approach, researchers ask a set of questions about what a system has to do in order to learn about changes in their environments and control their environments. A third deductive approach is the hypothetico-deductive method. In this method, researchers begin with guidelines and formulate their hypotheses through rearranging the guidelines.

Another deductive approach is by accounting for conflicting results. In this approach, researchers juxtapose contradictory theories. The fifth deductive approach is by accounting for exceptions to established principles and further exploring the instances when such an exception could occur.

Which ethical responsibilities need to be met by researchers?

Conducting research in psychology can raise controversial ethical concerns. Some experiments require deceiving the participants, others involve working with children or other subjects who are unable to give their consent, and still others could involve causing an animal pain. When researchers first began working with human subjects, it was assumed that research would be conducted with good judgement and would consider the well-being of the participants. This proved to be false when, in a study on the results of untreated syphilis, researchers intentionally withheld penicillin from their subjects. Another example is Milgram’s study on obedience, in which participants were told to deliver what they believed to be painful shocks to another participant. These studies caused the American Psychological Association (APA) to publish a set of ethical standards in 1958. There have been subsequent revisions in 1972, 1992, and 2002. Research universities have review committees to make sure that research procedures are in line with the standards of the APA and other governing bodies.

The ethics of research studies are subjected to a risk-benefit analysis. In other words, there must be a benefit that justifies the risk that human participants undergo. This benefit can refer to the greater good and to the specific gains of the human participant or subject. It can be problematic for researchers to do a risk-benefit analysis on their own studies. Institutional review boards (IRBs) conduct the risk-benefit analysis so that all studies meet the guidelines of ethical behavior. A committee will include instructors and researchers of the university, staff members, and people from the community. The IRB reviews proposals from researchers who want to use human subjects. These proposals outline the goals of the project and give a detailed account of the anticipated procedures.

In conducting the risk-benefit analysis, IRBs keep two issues in mind: risks or costs, and benefits. The review committee aims to protect human subjects from extreme mental or physical harm. Researchers cannot knowingly put their subjects in dangerous situations. They must do their best to ensure that they lessen any potential risk. This could be in the form of providing anonymity or screening subjects before the study. Additionally, subjects must agree to participate. They must first be informed about the risks so they can give their informed consent. Participants cannot be pressured to participate. They cannot be bullied, made to feel guilty, or bribed. This is also called freedom from coercion. Participants also have the right to remain anonymous, or the right to confidentiality. They are assigned a number which identifies their responses. Researchers can collect necessary data such as age, gender, and race but should not list addresses or anything else that could identify the participant. Participants should also be debriefed, or informed of any false pretenses, as soon as possible. They should not leave the study in a more negative psychological state than when they arrived.

What scales and measurement exist in psychological research? - Chapter 3

Which types of measurement scales are relevant for psychological research?

Nominal scales

There are four kinds of variables or measurement scales that vary in complexity. The simplest kind is the nominal, or categorical, scale. Nominal scales use meaningful but possibly irrelevant categories such as a person’s sex. Practically speaking, this scale utilizes two or more possibilities that are mutually exclusive. Numbers in nominal scales are used only to tag or label different categories. Examples of nominally scaled variables are license plate numbers, birth dates, or identification numbers.

Ordinal scales

The next type of scale, the ordinal scale, uses ranking, or putting things in order. Ordinal scales give value to the position of people or things in relation to other people or things. However, ordinal scales do not give precise information on the differences between people or things. Interval scales, on the other hand, do provide information on these differences by using real numbers and amounts. Unlike the previously mentioned scales, interval scales can use negative values, for example when recording a below-freezing temperature.

Interval scales

In interval scales, the units relate to a specific amount of what is being measured. While this seems obvious, the specificity of what is being measured can be challenging when dealing with psychological constructs.

Ratio scales

The fourth type of scale is the ratio scale. Ratio scales are similar to interval scales but they also have a true zero point (the point where none of the quantity under consideration exists). For this reason, ratio scales do not use negative values. An example would be that someone has smoked 0 times in the past. Ratio scales allow us, as their name implies, to discuss ratios of values - a 4 year old is twice as old as a 2 year old, or participants in group 2 received twice as much money as participants in group 1. Keep in mind that some assumed ratios would be incorrect - a person with an IQ of 200 is not twice as smart as a person with an IQ of 100, nor were participants in group 2, who received twice as much money as those in group 1, necessarily twice as happy. In studying and utilizing the above scales, it is necessary to remember that you cannot determine the properties of a scale just by looking at the numbering system. Scales that may appear to be interval or ratio scales in other fields could lose the properties of interval or ratio scales when applied to psychology. Researchers should be especially careful about drawing the same kinds of conclusions from scales that assess psychological constructs as from similar-looking scales that assess physical constructs.
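
The difference between interval and ratio scales can be checked with a little arithmetic. The Python sketch below is only an illustration with made-up example values: it compares temperature in Celsius and Fahrenheit (interval scales with arbitrary zero points) against age in years and months (a ratio scale with a true zero). If a ratio were meaningful, it should not change when the same quantity is expressed in different units.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature; both scales have an arbitrary zero point."""
    return celsius * 9.0 / 5.0 + 32.0


def years_to_months(years):
    """Convert an age; zero years and zero months mean the same thing."""
    return years * 12.0


# Interval scale: the ratio depends on the unit, so "twice as hot" is not meaningful.
warm_c, cool_c = 20.0, 10.0
ratio_celsius = warm_c / cool_c  # 2.0
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)  # 68 / 50 = 1.36

# Ratio scale: the ratio is the same in any unit, so "twice as old" is meaningful.
older, younger = 4.0, 2.0
ratio_years = older / younger  # 2.0
ratio_months = years_to_months(older) / years_to_months(younger)  # 48 / 24 = 2.0

print(ratio_celsius, ratio_fahrenheit)
print(ratio_years, ratio_months)
```

The temperature ratio changes from 2.0 to 1.36 when the unit changes, while the age ratio stays at 2.0, which is why only the ratio scale supports statements such as "twice as much".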

Which types of validity are relevant when conducting research?

In conducting psychological research it is necessary to use measurement scales to determine the validity and reliability of the research. How valid and how reliable the research is can be assessed in a variety of ways. It is not always possible for an experiment to be both valid and reliable. In many cases, a balance needs to be achieved. In general, validity means how accurate a statement is. The exact meaning of the word validity can shift depending on how specific or general the context is in which it is used. For the purposes of building a foundation for research, we will discuss internal validity, external validity, construct validity, and conceptual validity. Keep in mind that validity is relative to each experiment.

Internal validity

Internal validity is a measure of how well research results establish causality. In other words, internal validity reflects the likelihood that variations in the independent variable (the variable being manipulated) caused the changes you observed in the dependent variable. Researchers conducting laboratory experiments try to control for individual differences and to separate the variables they are manipulating from other potential influences that could affect the results of their experiments. For this reason, laboratory experiments usually score high in internal validity. Since internal validity concerns causality, it is important for testing theories. Sometimes this presents a challenge. For example, if there are many independent variables, it can be difficult to link a specific dependent variable to a specific independent variable.

To address this issue, the majority of researchers follow the guidelines set by the philosopher John Stuart Mill. According to Mill, three criteria have to be met to conclude that there is a cause and effect relationship. The first requirement is that changes in one variable must coincide with changes in another. Mill calls this covariation. For example, high levels of stress may relate to insomnia. While this may indicate that stress can cause insomnia, it does not prove that to be a fact. In this scenario, it is also possible that the reverse is happening, that insomnia causes stress. This brings us to Mill’s second requirement, that changes in the first variable must happen before changes in the second variable. This is called temporal sequence. Since researchers often test many variables at the same time, temporal sequence can be challenging to establish. Measuring variables over time in a prospective or longitudinal study can help in determining temporal sequence. However, it can still be difficult to say with certainty that changes in the second variable had no other potential cause. Mill’s third condition, eliminating confounds, addresses how to rule out other potential causes. This third condition can be very difficult to meet if the researcher is not in full control of all variables. Two variables may seem related, but the relationship may not be causal. There may in fact be a third variable that actually causes changes in the first two variables. This is called the third-variable problem. Additionally, three variables could seem causally related when in fact there is a fourth variable that is actually driving the changes, and so on.
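
The third-variable problem described above can be illustrated with a small simulation. The Python sketch below uses invented, randomly generated data, not real results: a hypothetical third variable, here called workload, influences both stress and insomnia, while stress and insomnia have no direct effect on each other. The helper names pearson and partial_correlation are made up for this sketch, and the partial correlation is used as one possible way of statistically holding the third variable constant.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(a, b, control):
    """Correlation between a and b after removing what each shares with control."""
    r_ab = pearson(a, b)
    r_ac = pearson(a, control)
    r_bc = pearson(b, control)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


random.seed(0)  # fixed seed so the simulated data are reproducible
n = 5000

# Hypothetical confound: workload drives both stress and insomnia.
workload = [random.gauss(0, 1) for _ in range(n)]
stress = [w + random.gauss(0, 0.5) for w in workload]
insomnia = [w + random.gauss(0, 0.5) for w in workload]

print("stress vs insomnia:", round(pearson(stress, insomnia), 2))
print("controlling for workload:",
      round(partial_correlation(stress, insomnia, workload), 2))
```

In this simulated data set, stress and insomnia covary strongly even though neither causes the other, and the relationship largely disappears once the confound is controlled for. Covariation alone therefore satisfies only the first of Mill's three criteria.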

External validity

While laboratory experiments are good for establishing internal validity, they are not ideally suited to establishing external validity, or the extent to which research results accurately reflect the real world. This is also known as generalizability. If a researcher wants to be able to assert that the research results would apply in other situations or environments or to other people, the study needs to have high external validity. External validity has an inverse relationship with boundary conditions. Narrow boundary conditions would most likely result in low external validity. High external validity can be especially difficult to establish when the variables involve people. The boundary conditions of a study may be specific to people of a particular gender, age, social group, class, or culture. Once the boundary conditions are expanded to include other genders, ages, social groups, classes, or cultures, it becomes difficult to predict similar findings. This concern is called generalizability with respect to people. Research has been criticized because of the lack of diversity among the subjects that typically participate in studies.

Social and cognitive psychologists have often studied white college students, and most psychological research has been done in Western societies. Researchers are also concerned with generalizability with respect to situations, which refers to the extent to which research results are applicable in real-life scenarios. While laboratory experiments tend to score low in external validity, passive observational studies, especially if conducted in the real world on a diverse group of people, tend to be higher in external validity. Passive observational studies are studies in which researchers observe occurrences in the real world. While conducting a passive observational study, the researcher tries to remain unobtrusive. Passive observational studies tend to be low in internal validity, since it can be difficult for the researcher to establish temporal sequence and eliminate confounds.

Construct validity

Construct validity is a third type of validity and deals with how accurately the variables in a study represent the hypothetical variables that the researcher is focused on. To be high in construct validity, it is important that a researcher be able to establish operational definitions that will help transform abstract hypothetical variables into observable changes. Certain abstractions are easier to make tangible than others. While hours without food is a good way to gauge hunger, it is much more difficult to operationalize happiness, self-esteem, and other feelings that are challenging to define. In the case of self-esteem, how do we define self-esteem? Is it how positively or negatively people feel about themselves? Is it whether people feel like they have skills or talents or are popular? Is it a combination of how people think and feel about themselves? Additionally, self-esteem may be a temporary state and not a consistent personality trait.

Conceptual validity

The last type of validity we will be discussing is conceptual validity, or the extent to which a research hypothesis connects to the broader theory that it is testing. A hypothesis must be suitable to the theory it is testing. This goes back to what we discussed earlier when talking about the deductive method of deriving a hypothesis from a theory and testing the hypothesis through observation. There are times when there is a general consensus among researchers that a theory leads to a certain set of hypotheses. Other times, researchers disagree. This is often a result of variations in researchers’ views on human behavior and human nature.

A study by Murray, Holmes, and Griffin conducted in 1996 illustrates conceptual validity. In this study, the researchers had a theory on positive illusions, or unrealistically positive views of one’s partner, in intimate relationships. Murray, Holmes, and Griffin felt that positive illusions were necessary and healthy since they provided a way for a couple to weather the day-to-day problems they may face. Their study sought to measure positive illusions by comparing a person’s view of themselves with their partner’s view of them. The study did not actually measure the validity of a person’s opinion of themselves or their partner’s opinion of them; it compared the two opinions. From this comparison, we can reasonably assume that if a partner’s view of a person is much more favorable than the person’s view of themselves, the partner holds a positive illusion.

Murray et al. next had to show the crux of their theory, which was that these positive illusions are actually healthy and benefit the couple. To do this, they surveyed people in relationships and asked how satisfied they were with their relationship. By comparing the survey results to the data collected previously, they showed that the more idealized a person was by their partner, the more satisfied they were in their relationship. Murray et al. conducted these studies with couples that were dating and couples that were married and obtained the same results in both samples. Conceptual validity and construct validity are similar: both concern how closely what a researcher does relates to what the researcher intended to study. The key difference is that conceptual validity is broader in scope, dealing with the overall hypothesis, while construct validity involves the details of manipulating and measuring variables in specific studies.

Which types of reliability play a role in measuring research?

Reliability, the repeatability or consistency of a measure or observation, is another important component in measuring research. As an example, let’s say we have a theory that all people fall into two categories, introverts and extroverts. For our measure to be reliable, an introvert needs to be an introvert from week to week; they can’t move in and out of both groups. A measure does not need to be valid to be reliable. If we choose to categorize people who like blue as introverts and people who like red as extroverts, our measure is still reliable as long as people who like blue continue to like blue, and people who like red continue to like red. While the preference for blue or red as a way to measure introverts versus extroverts would not be high in validity, it can be high in reliability. Of course, reliability is most useful when the measure is high in validity. In such a scenario, researchers could assume that repeated testing would yield the same valid results.

Test-retest reliability

One way to measure reliability in a study is to assess test-retest reliability. You would do so by testing a group of people once, then inviting the same group of people back for a second round of tests and comparing the results. If the results are the same or very similar, you can assume the measure is reliable. A reliable measure is more impressive if a longer stretch of time passes between the first and second tests. Of course, there is a limit to how much time can pass, since people change as they are influenced by factors such as age. As a result, most assessments leave a 2 to 4 week gap between the first and second tests, though time frames will vary depending on the topic.
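As a rough illustration of comparing the two rounds of testing, the sketch below correlates first-session and second-session scores. The Pearson correlation is a common choice for this comparison but is an assumption here, since the text does not name a statistic; the function name pearson_r and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each session")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five participants, tested twice a few weeks apart.
session_1 = [12, 18, 25, 31, 40]
session_2 = [14, 17, 27, 30, 41]

# A value close to 1 suggests people kept roughly the same standing.
test_retest_r = pearson_r(session_1, session_2)
print(round(test_retest_r, 3))
```

A correlation near 1 means participants kept roughly the same rank order across sessions, which is what "the same or very similar" results would look like numerically.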

Internal consistency

Many assessments use the minimum amount of wait time between tests. Internal consistency is a way to assess reliability within a single session. For example, a subject can be asked 10 questions, with each question considered a separate test. The subject is essentially being asked the same question 10 different ways, and we can measure the consistency of their answers. A test that is high in internal consistency is especially useful for testing psychological theories because it can show whether variables covary (correlate) as the theory predicts.
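One widely used index of internal consistency is Cronbach's alpha. The text does not name a specific statistic, so treating alpha as the measure is an assumption of this sketch, and the function name and data are hypothetical. Each inner list holds one participant's answers to the same set of items.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participants' item answers.

    responses: one list per participant, all of the same length.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 participants")
    # Transpose so that each tuple holds every participant's answer to one item.
    items = list(zip(*responses))
    sum_item_variances = sum(variance(item) for item in items)
    # Variance of each participant's total score across all items.
    total_variance = variance([sum(person) for person in responses])
    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical answers of four participants to four similar questions.
answers = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [6, 5, 6, 6],
    [8, 9, 8, 9],
]
alpha = cronbach_alpha(answers)
print(round(alpha, 3))
```

Values near 1 indicate that the items rise and fall together across participants, which is the pattern expected when the questions tap the same construct.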

What does the interobserver agreement entail?

A third form of reliability is sometimes measured and involves human judgement. Interobserver agreement or interrater reliability is the degree to which the judgements of individual people agree with one another. This is easily illustrated in the real world of performance evaluation in the Olympics. Several trained judges gauge a performance and give it a score. All of these individual scores are then averaged. There is an inherent consistency in the process since the highest and lowest scores are often discarded, and judges aim for consistency themselves as the individual scores are generally consistent with one another.
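The Olympic scoring procedure described above can be sketched as a trimmed average. The function name and the judges' scores are hypothetical, and dropping exactly one highest and one lowest score is an assumption about how the discarding is done.

```python
def trimmed_mean_score(scores):
    """Average the judges' scores after dropping one highest and one lowest."""
    if len(scores) < 3:
        raise ValueError("need at least 3 scores to drop the extremes")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # discard the lowest and the highest score
    return sum(kept) / len(kept)


# Hypothetical scores from five judges; one judge is much harsher than the rest.
judge_scores = [9.1, 8.7, 9.0, 7.2, 9.2]

final_score = trimmed_mean_score(judge_scores)
plain_mean = sum(judge_scores) / len(judge_scores)
print(round(final_score, 2), round(plain_mean, 2))
```

Comparing the trimmed result with the plain mean shows how a single outlying judge has less influence once the extremes are discarded.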

Of course, as illustrated by the above example, the ratings of multiple judges are only useful if the judges are highly trained and act independently. It can take minutes, hours, or months for a judge to learn a coding scheme. During the initial learning or training process, judges should be free to discuss their ratings in detail. Once they begin making ratings during the study, they should stop discussing their ratings. Multiple judges discussing their ratings could lead to increased reliability but not increased validity. While it is necessary to have multiple judges for the purposes of interobserver agreement, it is more important to achieve a high level of reliability than to have many judges participate. The number of judges that should participate will vary depending on the research being conducted. In situations where interobserver agreement is high, researchers may ask two or three raters to judge a subset of observations. Researchers then measure the interrater reliability of their judgements and, if they are consistent, use the judgements of the two or three raters. In situations where interobserver agreement is low, researchers may use a larger team of raters. Behavioral ratings from multiple judges are key when subjects are unable or unmotivated to reflect and report on what they are experiencing.
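Checking agreement between two raters on a subset of observations could look like the sketch below. It reports simple percent agreement and Cohen's kappa, which corrects agreement for chance. Kappa is a standard statistic that the text does not mention, so its use here is an assumption, and the category labels and ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of observations that both raters put in the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.

    Undefined (division by zero) if chance agreement is exactly 1.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Expected agreement if each rater assigned categories independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes given independently by two trained raters to 8 observations.
rater_a = ["aggressive", "neutral", "neutral", "aggressive",
           "neutral", "aggressive", "neutral", "neutral"]
rater_b = ["aggressive", "neutral", "aggressive", "aggressive",
           "neutral", "aggressive", "neutral", "neutral"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, kappa)
```

If agreement on the subset is high, the researcher could reasonably proceed with the same small group of raters; if it is low, more training or more raters may be needed.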

All three forms of reliability are more likely to increase as the number of judges, observations, or sessions is increased because smaller samples are more likely to include chance or error. By using larger samples or conducting more tests, the errors would be more noticeable or average out as seen in the example of Olympic judges discarding the highest and lowest scores. Generally speaking, more reliable measures are also more valid measures. For a measure to be valid, it should be reliable, but reliability is not the only requirement to satisfy validity. Additionally, while reliability can be assessed through statistics, validity also requires logic and intuition.
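The idea that reliability tends to rise with more items, judges, or sessions can be illustrated with the Spearman-Brown formula from classical test theory. The text does not name this formula, so it is brought in here as an assumption, and it only holds if the added items or judges are comparable in quality to the existing ones.

```python
def spearman_brown(reliability, factor):
    """Predicted reliability after multiplying the number of items (or judges) by factor.

    Assumes the added items or judges are as good as the existing ones.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * reliability / (1 + (factor - 1) * reliability)


# Hypothetical starting reliability of 0.50 for a short measure.
doubled = spearman_brown(0.50, 2)      # twice as many items
quadrupled = spearman_brown(0.50, 4)   # four times as many items
print(round(doubled, 3), round(quadrupled, 3))
```

The predicted values climb as the measure is lengthened, with diminishing returns for each further increase.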

How do you convert the subjective to the objective? - Chapter 4

How to ask a question?

A key aspect of research involves the process of shifting hypothetical abstractions into an observable realm. On the level of human behavior, this often means that a person must shift their subjective thoughts into responses that can be evaluated through a measurement scale. In this chapter, we will discuss the judgement phase, where the researcher needs to confirm that people are considering what the researcher wants them to consider, and the response translation phase, where the researcher confirms that people convert their subjective thoughts reliably. We will further discuss guidelines that researchers should adhere to. A useful analogy is that as a researcher, you need to act as a guide who gives a traveler accurate directions to the final destination. Researchers should also be aware of the details along the way to better optimize their response scales, as well as consider the different modes of transportation (measurement) that exist. What works best for one study may not be applicable for another. Since such diverse topics exist, this chapter will focus on studies that ask subjects to report on their own experiences.

Main challenges

There are two main challenges when posing questions to a subject and asking them to self-report. Often, these challenges are not obvious until they occur. The first challenge is to make sure that people are thinking about the right question, or the specific question the researcher wants them to think about. This is the judgement phase. While researchers usually know what questions they want the research participant to answer, it is more difficult to determine the best way to ask or word such questions so that they make sense to the research participant. The questions should be worded in language that is familiar to the research participant, avoiding very technical lingo. Sometimes the language familiar to the research participant may be unfamiliar to the researcher. In such cases, it is helpful to look at past research questions or, if the researcher is breaking new ground, to consult experts who are knowledgeable about the people who will be participating in the study. These experts may in fact be people from that group. For example, college students know the most about themselves and other college students.

If your study is on college students, getting a group of them together to ask how they feel about or interpret a set of questions would be a way of conducting pilot testing. Pilot testing refers to using practice studies designed to help researchers refine their questions or other forms of measurement before conducting the intended study. Similar to marketing researchers, a researcher may talk with a focus group. A focus group is a small but representative sample of people from the larger group of people the researcher intends to study. These discussions are informal and not as structured as a study. The researcher may ask open-ended questions about a topic they want to study. This is very different from structured self-report questions because the participants are allowed to respond in their own words.

Pilot studies

In some pilot studies, a researcher may ask research participants a structured self-report question and also an open-ended question that asks the same thing. By comparing the responses, a researcher can determine how to improve the structured self-report question so that it makes more sense to research participants. This could include refining the language of the question, or potentially adding more questions to address issues the researcher may not have thought about before.

Open-ended questions

Some researchers rely only on open-ended questions and do not use structured self-report questions. A single rating scale will not give us all the information we would like to know about people; however, multiple structured self-report questions that ask specific questions and provide a specific and relevant response scale can yield detailed information. Additionally, structured self-report questions give researchers a way to measure such responses: numbers. Fans of open-ended questions may say that such questions can also provide numbers if they are analyzed and coded. The concern is that this is a slow and time-consuming process. For this reason, open-ended questions are used less frequently, usually when researchers know very little about the topic in question or want to refine the structured self-report questions. A more common form of pilot study is to go ahead and conduct the study on a smaller scale. In this case, a researcher would ask the participants to answer the structured self-report questions and then analyze the responses. For example, if all the participants gave answers on the extreme ends of the scale, the researcher may adjust the scale to register more subtle variations.

What are tips on how to word a question?

There are a few tips to keep in mind when thinking about how to word a question. The following tips will be discussed:

Keep the question simple
Avoid using unnecessary negatives
Avoid asking too many questions at once
Avoid questions where you force the respondent to choose between two or more options
Avoid questions that do not yield varied responses
Avoid asking loaded questions
Make sure that your questions are applicable to all respondents in the study
Ask more than one question on the same construct
Give specific instructions
Ask sensitive questions in a sensitive manner

Keep the question simple

The first and most important tip is to keep the question simple. Do not use words that are unfamiliar to people. It is better to use informal language and avoid catchphrases or jargon. Keep in mind what the average teenager would be able to comprehend. Even if your target group does not consist of teenagers, education levels differ, so it is best to keep it simple.

Avoid using unnecessary negatives

For example, instead of asking to what extent does a research participant not like something, it is better to ask to what extent does a research participant like something. When people process negative statements, there is a higher chance that they could misinterpret the question. It goes without saying that a question should avoid the double negative. If you must ask a question that uses negation, use words that express the negation clearly. For example, a better way to say “not clothed,” is “nude.” “Nude” is familiar enough for people to understand and avoids the “not” in “not clothed.”

Avoid asking too many questions at once

A question where a person is asked to evaluate two different things in a single answer is known as a double-barreled question. A question where a person is asked to circle on a scale of 0 to 10 how strongly they agree or disagree with the statement “I am a lovable and caring person” is an example of a double-barreled question. Lovable and caring may seem related, but a person may see themselves as one and not the other, so a single response could be misleading.

Avoid questions where you force the respondent to choose between two or more options

Additionally, avoid questions where you force the respondent to choose between two or more options, since this yields incomplete information. For example, you might ask the question, “Do you prefer blue or red?” The first respondent may love both colors, but blue slightly more, and the second respondent may hate both colors, but blue slightly less. Both responses would simply show that the respondents prefer blue to red. An improvement would be to ask two separate questions, “How much do you like blue?” and “How much do you like red?” and provide a rating scale.
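The information lost by a forced choice can be shown with a small sketch. The two respondents and their 0 to 10 ratings are hypothetical, as is the helper name forced_choice.

```python
def forced_choice(ratings):
    """Return only the preferred option, as a forced-choice question would."""
    return max(ratings, key=ratings.get)


# Hypothetical 0-10 liking ratings from two very different respondents.
loves_both = {"blue": 10, "red": 9}
hates_both = {"blue": 1, "red": 0}

# Both respondents give the same forced-choice answer...
print(forced_choice(loves_both), forced_choice(hates_both))

# ...even though their separate ratings differ by 9 points for each color.
gap_blue = loves_both["blue"] - hates_both["blue"]
gap_red = loves_both["red"] - hates_both["red"]
print(gap_blue, gap_red)
```

The separate ratings keep the difference between the two respondents visible, while the forced choice collapses them into the same answer.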

Avoid questions that do not yield varied responses

It is through the variations in the answers that a researcher can get useful information. If a researcher asks, “Are kittens cute?” chances are the vast majority of respondents would say yes. When thinking of a question, try to avoid floor effects and ceiling effects. Floor effects happen when the majority of respondents respond at the same low level on a question, while the ceiling effects happen when the majority of respondents respond at the same high level. These effects are examples of when responses have little or no variation, also known as the problem of restriction of range. To avoid this problem, it is important to know your sample group and adjust the question and/or response scale to account for the similarities in your sample group.
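A simple check for floor and ceiling effects in pilot data might look like the sketch below. The function name, the example ratings, and the cut-off of more than half of the responses sitting at an endpoint are all assumptions for illustration; the text only speaks of "the majority" of respondents.

```python
def range_restriction_report(responses, scale_min, scale_max, cutoff=0.5):
    """Flag floor and ceiling effects in a list of numeric responses.

    cutoff is a hypothetical threshold: the share of responses at an
    endpoint above which the effect is flagged.
    """
    n = len(responses)
    share_at_floor = sum(r == scale_min for r in responses) / n
    share_at_ceiling = sum(r == scale_max for r in responses) / n
    return {
        "share_at_floor": share_at_floor,
        "share_at_ceiling": share_at_ceiling,
        "floor_effect": share_at_floor > cutoff,
        "ceiling_effect": share_at_ceiling > cutoff,
    }


# Hypothetical answers to "How cute are kittens?" on a 0-10 scale.
kitten_ratings = [10, 10, 9, 10, 10, 10, 10, 10, 10, 10]
report = range_restriction_report(kitten_ratings, scale_min=0, scale_max=10)
print(report)
```

A flagged ceiling effect like this one would suggest rewording the question or extending the response scale before running the full study.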

Avoid asking loaded questions

It is also important to avoid asking loaded questions. Loaded questions are questions that are especially sensitive or controversial. When asked a loaded question, a respondent may not answer truthfully. A researcher can lessen the likelihood of this occurring by wording the question so that it is unclear what answer the researcher wants. For example, suppose a researcher asks, “On a scale of 0 to 10, with 0 being not at all and 10 being extremely, to what extent do you trust the firefighters who risk their lives to serve our city?” The phrase “who risk their lives to serve our city” suggests a bias in favor of the firefighters. It would be better for the researcher to ask, “On a scale of 0 to 10, with 0 being not at all and 10 being extremely, to what extent do you trust the firefighters?” While this example is fairly overt, there are times when subtle biases can appear in questions.

Make sure that your questions are applicable to all respondents

Another important tip is to make sure that your questions are applicable to all respondents in the study. To do this it is important to avoid making assumptions. If you are studying intimate relationships, do not assume that everyone in your study is heterosexual. Additionally, try to write questions that are culture and gender neutral. For example, use the word partner instead of spouse, use the word principal caregiver instead of mother.

Ask more than one question on the same construct

Another tip is to ask more than one question on the same construct. While a single question about someone’s gender is enough to get all the information you need, when assessing a construct like self-esteem, it is best to ask many questions and average the answers. This increases reliability, as people may respond to certain specific questions in idiosyncratic ways. For example, they may not like the connotations of a certain word in a question, which may affect their response.

By asking many other questions, the question that yielded that response would be averaged out. When asking many questions that measure the same construct, it is important to have some variation. To do this, researchers may ask similar questions in positively and negatively worded ways. This is helpful to avoid response bias. Some respondents can have a tendency to agree with things, while others may be more inclined to disagree. Negatively worded questions can be reverse coded, so that a 0 corresponds to a 10 on a positively worded question. As mentioned previously, it is important to write negatively worded questions without actually using negations. Instead of using the phrase “I feel like I am not a person of worth,” use the phrase “I feel useless at times.”
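Reverse coding and averaging can be sketched as follows, assuming a 0 to 10 response scale as in the example above. The function names, the item positions marked as negatively worded, and the answers are hypothetical.

```python
def reverse_code(rating, scale_min=0, scale_max=10):
    """Flip a rating so that the lowest value becomes the highest and vice versa."""
    return scale_min + scale_max - rating


def scale_score(answers, reversed_items, scale_min=0, scale_max=10):
    """Average the answers after reverse coding the negatively worded items.

    reversed_items: set of 0-based positions of negatively worded items.
    """
    recoded = [
        reverse_code(rating, scale_min, scale_max) if index in reversed_items else rating
        for index, rating in enumerate(answers)
    ]
    return sum(recoded) / len(recoded)


# Hypothetical answers to four self-esteem items; items 1 and 3 are
# negatively worded (for example, "I feel useless at times").
answers = [8, 2, 9, 1]
score = scale_score(answers, reversed_items={1, 3})
print(score)
```

After recoding, a high number means high self-esteem on every item, so the average is meaningful even for a respondent who tends to agree or disagree with everything.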

Give specific instructions on the mental judgement process

During the judgement phase, it is important that the researchers not only ask a specific question, but give specific instructions on the mental judgement process they want the subject to follow to arrive at their answer. Keep in mind that all judgements are in relation to something else. To address this, researchers can include instructions to establish the context that the response applies to. Additionally, a researcher can include practice questions at the beginning of the survey. These questions can help to establish the judgement context. Practice or warm-up questions can also be helpful in introducing sensitive issues. People handle sensitive survey questions better when the questions are introduced slowly.

Sensitive questions should be asked in a sensitive manner

Additionally, sensitive questions should be asked in a sensitive manner. Instead of asking if someone engages in casual sex, it is better to ask them the number of sexual partners they have had in the last month. Instead of asking if someone smokes marijuana, you may get a more truthful answer if you list many drugs (both soft drugs and hard drugs) and ask them to check what they have tried. The final tip is to keep the respondent’s identity secret. This guarantee of anonymity will aid you in drawing out more truthful answers. You can remind participants in your written and spoken instructions that their responses are anonymous.

How can a response conversion be facilitated?

Once the first challenge of how to ask a question is overcome, the second challenge is to make sure that the subject is able, and has the tools, to convert their response into a measurable value on a response scale. This is the response translation phase. When establishing the scale for measuring a research participant’s response, keep three issues in mind: how many numbers to use, what anchors to include in the scale, and which numbering system is best to use.

How many numbers to use?

Not providing enough numbers asks the respondent to make a judgement without enough choices. Providing too many numbers asks the respondent to make a judgement with too many choices. A good scale will provide enough numbers that the respondent can express a wide range of opinions without creating confusion. Since different questions require different ways to measure responses, there is no perfect scale. Most experts in psychological measurement, also called psychometricians, feel that in most cases a rating scale should have between 3 and 10 response choices. Of course, in some cases a researcher would need even fewer, for example when the question is a yes or no question.

What anchors to include?

The second consideration is what anchors to include in the scale. Anchors are descriptive words that give the numbers meaning. For example, a question could ask the respondent to circle a number from 1 to 6, with 1 being “strongly agree” and 6 being “strongly disagree,” to indicate to what extent the respondent agrees or disagrees with a statement. “Strongly agree” and “strongly disagree” are endpoint anchors.

The meaning of a response scale changes based on the chosen anchors. To gain more specific information, a researcher may choose to include middle anchors. Middle anchors make it even easier for respondents to assign a numerical response to their subjective judgements or feelings. If you ask the question, “how satisfied are you with your job right now?” between the endpoint anchors of “not at all” and “extremely,” you could include middle anchors of “slightly,” “quite,” “mostly,” and “very.” In including anchors, a researcher needs to make sure that each chosen anchor corresponds well to its number on the scale. It is important that anchors create intervals that appear equal. By posing the above question, the researcher is suggesting that the difference between “slightly” and “quite” is equal to the difference between “mostly” and “very”. However, people may believe that the interval between “slightly” and “quite” is much larger than the interval between “mostly” and “very”. That is to say, the psychological distance between the anchors does not match the numerical distance on the response scale.

Which numbering system should be applied?

Choosing anchors with equal psychological intervals can seem like an impossible task. Luckily, researchers who study the measurement of meaning have looked into what words like “very” or “extremely” suggest. By asking participants in a study to rank anchor words, researchers determined that the following four anchors have fairly equal psychological intervals: not at all satisfied (0), slightly satisfied (2), quite satisfied (4), and extremely satisfied (6). For the question “how satisfied are you with your job right now on a scale of 0 to 6?” this would be an appropriate response scale, especially since the anchors use the same word that is in the question, “satisfied.” While it is not always possible, including words from the question helps the respondent stay focused on the question being asked. Another benefit of this scale is that “not at all” corresponds to the number 0 rather than the number 1. For most people, this is a more intuitive correspondence. Notice that in this scale there are numbers (1, 3, 5) that do not have anchors. This allows the respondent to split the difference if they are unable to decide between two anchors.

Researchers agree that this scale can be adapted to many purposes. However, it still cannot be deemed an all-purpose scale. The scales we have discussed are unipolar scales. This means that these scales ask research participants to make ratings beginning from a low value such as zero and moving to a higher value that ends at the researcher’s discretion. In some cases, a researcher may want to know more extreme information. Is the participant so extremely dissatisfied or extremely satisfied that he or she would like to express this? Some psychological constructs are better served by bipolar scales. Bipolar scales ask research participants to make ratings where zero is the middle anchor. For example, the response scale for the question “How satisfied are you with your job right now?” may have -6 as extremely dissatisfied and 6 as extremely satisfied, with middle anchor points of quite negative, slightly negative, neutral, slightly positive, and quite positive. This type of scale leads respondents to consider negative as well as positive events or experiences.

When considering this scale with its 13 response choices, you may remember that researchers suggested that the optimal number of response choices is between 3 and 10. Bipolar scales have more response choices, but studies have shown that respondents first consider which side of the response scale applies to them in relation to the midpoint. For example, if the respondent is satisfied, they would consider the positive or right-hand side of the scale and ignore the negative or left-hand side. The unipolar and bipolar response scales we’ve discussed are examples of empirically grounded, well-anchored (EGWA) response scales. These scales can be adapted to many uses and be used to ask many questions.
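The unipolar and bipolar job-satisfaction scales described above can be written out as data, which makes the count of response choices easy to verify. The unipolar anchor positions come from the text; the positions of the bipolar middle anchors (-4, -2, 0, 2, 4) are an assumption made by analogy, and the helper name build_scale is hypothetical.

```python
# Anchor positions for the unipolar 0-6 scale, as given in the text.
UNIPOLAR_ANCHORS = {
    0: "not at all satisfied",
    2: "slightly satisfied",
    4: "quite satisfied",
    6: "extremely satisfied",
}

# Bipolar -6..6 scale; middle anchor positions are assumed, not from the text.
BIPOLAR_ANCHORS = {
    -6: "extremely dissatisfied",
    -4: "quite negative",
    -2: "slightly negative",
    0: "neutral",
    2: "slightly positive",
    4: "quite positive",
    6: "extremely satisfied",
}


def build_scale(low, high, anchors):
    """List every response choice with its anchor label ('' if unanchored)."""
    return [(value, anchors.get(value, "")) for value in range(low, high + 1)]


unipolar = build_scale(0, 6, UNIPOLAR_ANCHORS)
bipolar = build_scale(-6, 6, BIPOLAR_ANCHORS)

# A satisfied respondent effectively considers only the midpoint and above.
positive_side = [value for value, _ in bipolar if value >= 0]
print(len(unipolar), len(bipolar), len(positive_side))
```

Counting the choices this way shows that the bipolar scale has 13 options overall, while the half a respondent actually considers is no longer than the unipolar scale.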

There are cases that will require other scales. If research participants are less than twelve years old, a researcher may limit the number of response choices to four or conduct more pilot tests and focus groups to determine the appropriate language to use. If people are being surveyed over the telephone, a researcher often breaks up a response scale into several questions. For example, “Do you like x? Please choose yes, no, or indifferent.” If the response is yes, the researcher can follow up with “Do you like x slightly, quite a bit, or extremely?” Since the respondent is asked two questions instead of one, they are better able to process the question and are not overloaded with too many choices. In other cases, a respondent may need more choices than the prescribed 3 to 10 on a unipolar scale. For example, if the question is “How much do you trust your most-trusted friend?” respondents would likely choose 6 (extremely), which would lead to the problem of restriction of range. In this case, it may be necessary to add an even higher anchor point, “completely”. While there may seem to be too many anchor points, the respondents are likely to ignore the lower values of the scale anyway.

Why is it important to evaluate research?

Don’t forget that researchers want to pose multiple questions to gauge similar things. After coming up with a first set of questions, a researcher uses data to fine tune those questions. Through empirical evaluation, a researcher may find that a question they thought was great is not at all useful, while another question that they didn’t even seriously consider is in fact a very good question. To evaluate questions empirically, a researcher should follow three steps.

Step one: keep the larger goal in mind

The first step is to keep the larger goal in mind and measure the question against that goal. The question should be relevant to what is being measured. For example, if a researcher wants to measure how much people trust police, it would be helpful to compare that question to a more general question of how much people trust authority figures overall. Asking a friend or colleague for their input could help a researcher refine the questions. A fellow researcher may recall a relevant previous study (in this case, Lerner’s 1980 study that measured people’s belief in the existence of a just and fair world).

Step two: write a lot of questions to make strong final questions

Second, a researcher needs to write a lot of questions to come up with a strong list of final questions. A researcher may begin with fifty questions in order to have ten final questions. In another scenario, a researcher may come up with twenty questions and then a pilot group could help generate more questions through their comments and open-ended answers.

Step three: analyze the scale

The third step is to analyze the scale. After generating an initial list of questions, a researcher needs to give them to a large sample group. The sample group they choose should consist of the same type of people that the researcher plans to include in the actual study. At this point, a researcher may conduct a factor analysis to see how closely related the items in their scale are to each other. Through factor analysis, a researcher may find that they are really asking two groups of questions. The researcher then needs to make the important decision of either focusing on one thing to measure with one group of questions, or creating two scales to pose both groups of questions to the participants. In either line of research, the researcher needs to make sure the items in the scale are reliable, meaning the items have high inter-item reliability. The researcher also needs to determine a scale’s validity. To be valid, a scale should relate to established scales or behaviors that are connected to what it is measuring.

Evaluating questions can be a difficult and time-consuming task. As a researcher, you will find yourself moving back and forth between all three steps. In order to establish a new measure as both reliable and valid, a researcher may need to generate questions several times and conduct multiple pilot studies. Building a strong foundation will allow you to generate more questions and more things to measure. Constructing a good scale will tell you what measures you can and cannot predict, as well as what reasonable conclusions can be drawn. In other instances, for example when working with absolute scales where the participant is asked to report magnitude, the process of choosing questions and constructing a scale can be fairly straightforward.

What are alternative scales?

The types of scales we have discussed serve researchers in most instances. However, there are other times when alternative scales are useful. Three scales will be discussed: the semantic differential, the Thurstone scale, and the Guttman scale.

Semantic Differential (Osgood)

The Semantic Differential is one such scale, developed by psychologist Charles Osgood. Osgood was interested in how people mentally link their representations of concepts to everyday language. With the help of his students, Osgood made a list of every adjective in the English language using an unabridged dictionary. He asked research participants to group and sort these adjectives. Osgood saw a pattern of paired adjectives, for example, good versus bad. This held true when he conducted his research in other languages. He felt that these universal pairs would be useful as endpoints that a diverse group of people could relate to. Osgood further broke down all adjectives into three groups or categories: adjectives of evaluation such as good/bad and easy/difficult, adjectives of potency such as strong/weak, and adjectives of activity such as fast/slow or alive/dead. Osgood used these categories to present questions in a direct and easy-to-grasp way. There are times when this scale is not the best choice. For example, some objects or concepts cannot be evaluated on all three categories. Osgood also noticed other exceptions. For example, “hard/soft” was useful in the dimension of evaluation of people, but not in the dimension of potency when talking about war. Because the scale is not always suitable, it is necessary to conduct a factor analysis before using the Semantic Differential.

Thurstone scale

All the scales that we’ve discussed to this point were designed to yield a scale total, which is a single number that is either the sum or average of all the items in the scale. This number represents a participant’s standing on the psychological construct of interest. There are a few examples of scales that do not utilize such a scale total. In the 1930s, a psychometrician named L.L. Thurstone designed a scale to measure how people felt about war. Those strongly opposed to war favored items on the low end of the scale, statements such as “there is no conceivable justification for war.” Those with moderate attitudes on war favored items towards the middle of the scale, statements like “war brings out both good and bad qualities in men.” Those in favor of war favored statements towards the high end of the scale, statements like “war is glorious.” In Thurstone’s scale, it would be illogical to average or add all the items in the scale. Instead, a researcher should decide which items the research participant agrees with most strongly. Thurstone’s scale and scales like it are designed so that certain items imply different levels of evaluation. Researchers then give scores to show whether the participant most strongly endorsed items that had low, medium, or high evaluations.

Guttman scale

Louis Guttman, another psychometrician, designed a scale to measure people’s attitudes towards abortion. The format of the Guttman scale is similar to the Thurstone scale, except that Guttman goes on to assume a threshold. This means that since the scale moves from endorsing all items to rejecting all items (or vice-versa), there is a threshold where if a person answers “yes,” they are likely to have answered “yes” to all the questions before it. For example, if the items move from “Do you think that abortions should be available to a woman after rape?” to “Do you think that abortion should be available on demand?” a participant who says “yes” to the last question will most likely have said “yes” to all the preceding questions. To score a respondent, a researcher would identify the highest item the respondent responds “yes” to.
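The scoring rule just described can be sketched in a few lines of Python. This is only an illustration: the function names `guttman_score` and `fits_guttman_pattern` and the example answers are hypothetical, and the sketch assumes the items are already ordered from easiest to hardest to endorse.

```python
def guttman_score(answers):
    """Return the 1-based position of the highest item answered "yes".

    `answers` is a list of booleans ordered from the item that is easiest
    to endorse to the item that is hardest to endorse (an assumption of
    this sketch). Returns 0 if every answer is "no".
    """
    score = 0
    for position, said_yes in enumerate(answers, start=1):
        if said_yes:
            score = position
    return score


def fits_guttman_pattern(answers):
    """Check the threshold assumption: no "yes" may follow a "no"."""
    seen_no = False
    for said_yes in answers:
        if not said_yes:
            seen_no = True
        elif seen_no:
            return False
    return True


# Hypothetical respondent: endorses the first three of four items.
respondent = [True, True, True, False]
score = guttman_score(respondent)
consistent = fits_guttman_pattern(respondent)
```

A response pattern such as yes, no, yes, no would fail the second check, which hints at why scoring these scales can become complicated in practice.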

Both Thurstone and Guttman scales gather information on people’s views of controversial subjects as well as why people have these views. The downside to these types of scales is that they are very difficult to design. They also create complicated issues in scoring.

How to ask a question?

What types of misinterpretation can occur during research? - Chapter 5

Validity can be threatened by three main things: people are different, people change, and the process of studying people changes them.

How is validity threatened through selection bias and non-response bias when differentiating between people?

We’ll begin by discussing the first threat, people are different. A pseudo-experiment is a research experiment where someone tests a variable by exposing people to it and observing how people feel, think, and behave. These observations are then compared to predicted expectations. An example of a claim based on a pseudo-experiment is “Lotte got a perfect score on her test because she used her lucky pencil.” The main issue with such pseudo-experiments is that they lack a control group. Some people are naturally better at test-taking. Without the control group, it is not possible to tell what would have happened if Lotte had taken the same test without her lucky pencil. Beyond undermining an experiment, individual differences can undermine observations and internal validity.

In cases where the sampling technique is geared towards a specific kind of person, a researcher may draw incorrect conclusions about people in general. Drawing conclusions from this type of unrepresentative sample is known as selection bias. An example of selection bias is the incorrect prediction of the 1936 United States presidential election. The prediction was that Alf Landon, the Republican candidate, would win over Franklin D. Roosevelt, the Democratic candidate. The pollsters who conducted the Literary Digest poll found the names and addresses of people to poll from telephone books and car registrations. They mailed them postcards with the questions and asked them to send the response back. Since anyone who owned a telephone or car during the Depression was most likely wealthy, and wealthy people tended to favor the Republican party, the poll reflected this bias.

People who answer surveys are different from those who don’t. In the previous example, only twenty-four percent of those contacted responded. Those who did not respond may have had a number of reasons, such as not yet having decided whom to vote for. This further contributed to the incorrect prediction, and is called non-response bias. Nowadays, survey researchers have developed various solutions to address non-response. Gallup pollsters now ask most survey questions by phone rather than through the mail. This generally increases response rates from twenty-five percent to sixty-five percent. While this is a huge improvement, non-response can still be a problem. For this reason, researchers adjust their findings for non-response bias. While these biases affect external validity, they do not necessarily threaten internal validity.

How is validity threatened by people changing through history, maturation and regression toward the mean?

A second threat to validity is that people change. People can change based on time. For example, now versus five years ago. People can change based on their environment and the other people they are around. These differences in the same person can lead to misinterpreted research findings. This can be overcome with pretests and posttests.

History and maturation are threats to internal validity and can pose a problem when there is no control group. History refers to changes in a large number of people such as a country or culture. Maturation is more specific and refers to changes in a specific person over time. Both history and maturation can happen over the short term, within a few weeks or even a few minutes.

Another threat to internal validity is regression toward the mean. This is the tendency for people who score very high or very low on the first round of tests to score closer to the middle on the second round of tests. This can also cause researchers to misinterpret results. This is most likely to occur when researchers use pretest and posttest designs that can be affected by history and maturation. Regression toward the mean is everywhere. Consider the belief that rookie athletes and musicians experience “a sophomore slump” where their second season or their second album is not as successful as their first. An observed score (the actual performance) differs from a true score in that the true score is the underlying ability that the observed score reflects (being athletic or musically gifted). True scores are more stable than observed scores because they reflect the characteristics of a person in an ideal setting, meaning a setting without good or bad luck. Observed scores are affected by a person’s true score and by error or chance factors that influence performance. Errors are irregular and unpredictable. They include personal variables such as health and environmental variables such as wind speed and direction or the public’s musical taste. As previously mentioned, using a control group would help to minimize the effects of regression toward the mean.
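The idea that an observed score is a true score plus chance error can be illustrated with a small simulation. The sketch below is hypothetical: the function name `simulate_two_tests`, the score distributions, and the sample size are all assumed for illustration and do not come from any study described here.

```python
import random


def simulate_two_tests(n_people=10000, seed=0):
    """Simulate two test rounds where observed score = true score + chance error."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        true_score = rng.gauss(100, 10)         # stable underlying ability (assumed)
        first = true_score + rng.gauss(0, 10)   # round 1: ability plus luck
        second = true_score + rng.gauss(0, 10)  # round 2: same ability, fresh luck
        people.append((first, second))
    return people


def mean(values):
    return sum(values) / len(values)


people = simulate_two_tests()

# Pick the top 10% of scorers on the first test (the "rookie stars").
cutoff = sorted(first for first, _ in people)[int(0.9 * len(people))]
top = [(first, second) for first, second in people if first >= cutoff]

top_first_mean = mean([first for first, _ in top])
top_second_mean = mean([second for _, second in top])
overall_mean = mean([first for first, _ in people])
```

In this simulation nobody's ability changes between rounds, yet the top group's second-round average falls back toward the overall average, because part of their first-round success was good luck that does not repeat.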

Why does people changing when they are studied pose a threat to validity?

The third threat to validity is that the act of studying people can change the way they behave. One example is a study at the Hawthorne plant of the Western Electric Company in Chicago. Researchers found that any change in working conditions resulted in the workers working harder. This is now called the Hawthorne effect. The workers were being more productive because they knew they were being studied, not because of other changes to their environment. Another example is the mere measurement effect. This occurs when participants change their behavior because they are asked how they will act in the future. For example, a researcher asks a participant if they plan on exercising after the session, the participant says yes, and does in fact exercise after the study. However, it is possible that the participant would never have exercised without being asked the question. This would affect the external validity of research since the general population who did not participate in the study may not respond in the same way.

Another concern is testing effects which can be a problem to internal validity when there is no control group. Testing effects is the tendency that people have to do better on a test the second time around. This even happens in IQ tests or when people are given another version of the test the second time. There are several explanations for this. The test taker could have learned from taking the first test. They may have a better strategy for test taking or have looked up the answers that they didn’t know. In tests that test physical ability, the first test could have served as a practice in learning the task. In personality tests, the test taker may have learned more about what types of answers are socially desirable. Psychological tests may represent attitude polarization. Letting people think about their attitudes often leads to even more extreme or polarized attitudes.

To correct for testing effects, a researcher should do an experiment with a pretested control group. While this will not rule out testing effects, a researcher can use the results to separate them from the experimental treatment. A researcher may also choose to not do a pretest and separate the participants at random into a control group and an experimental group. If the sorting is random, a researcher can reasonably assume that differences between the two groups after the study are a result of the experimental intervention. If it is too difficult to have a control group (for ethical or other reasons), a researcher should wait as long as possible to give the posttest and try to give a different version of the test. By waiting, a researcher minimizes the threat of testing effects since the participant will have forgotten the test.

How do experimental mortality and attrition influence validity?

In some cases a participant will not complete a study; this is called experimental mortality or attrition. Depending on why this happens, attrition could be a threat to external validity, internal validity, or both. One reason a participant may not finish a study is that the experiment or longitudinal study spans a long period of time. People may become bored or move away. For example, a study on implicit learning, or learning without being aware of the learning, can be very boring since such a study may require people to process what appears to be meaningless information for long periods of time. Let’s say the study begins with forty people, evenly divided between a control and a variable group. The research study loses six people in each group. Since an equal level of attrition occurred in each condition, this is called simple or homogeneous attrition. In this study, the rate of attrition is a threat to external validity rather than internal validity. It is likely that the twenty-eight remaining participants with the patience and dedication to remain in the study are not representative of the larger population. If attrition rates are not equal across conditions but noticeably different, this is called differential or heterogeneous attrition. Heterogeneous attrition is a threat to internal validity since it undercuts the benefits of random assignment. Heterogeneous attrition leaves the two groups of people (in their respective conditions of the study) different from each other.
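The arithmetic of the forty-person example can be checked with a short Python sketch. The function names and the `tolerance` cutoff for "noticeably different" are hypothetical choices made for illustration; they are not a standard criterion.

```python
def attrition_rate(started, dropped):
    """Fraction of a group's participants who did not finish the study."""
    return dropped / started


def classify_attrition(control, treatment, tolerance=0.05):
    """Label attrition as homogeneous or differential.

    Each group is a (started, dropped) pair. `tolerance` is an arbitrary
    illustrative cutoff for "noticeably different", not a standard value.
    """
    gap = abs(attrition_rate(*control) - attrition_rate(*treatment))
    return "homogeneous" if gap <= tolerance else "differential"


# The example from the text: 40 people split evenly, 6 lost per group.
control = (20, 6)
treatment = (20, 6)
remaining = (control[0] - control[1]) + (treatment[0] - treatment[1])
label = classify_attrition(control, treatment)

# A made-up contrast: 2 of 20 leave one group, 10 of 20 leave the other.
uneven_label = classify_attrition((20, 2), (20, 10))
```

With six dropouts per group, each condition loses thirty percent of its participants and 28 people remain, so attrition is equal across conditions even though the remaining sample may no longer resemble the wider population.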

The first step to avoid these threats is to try to prevent as many people as possible from dropping out of the study. A researcher can communicate the importance of the study to participants as well as warn them that it will be boring. A researcher can try to make the participant as comfortable as possible by providing breaks. A researcher can also give people a large academic or financial reward. Another approach a researcher can take is to conduct the unpleasant or boring part of the study after participants have completed the initial part. The researcher can then delete the results from the people who drop out at this point; in other words, the researcher can equalize attrition in the two conditions of the experiment. Additionally, a researcher can compare those who dropped out with those who did not to see if attrition was in fact a threat. Equalizing attrition could lead to ethical issues that a researcher will need to consider and balance.

What are types of participant reaction bias?

A general term for a third category of threats is participant reaction bias. This bias happens when people behave in uncharacteristic ways when they know they are being observed. People typically react in three different ways: they do what they think the researcher expects, they do the opposite of what they think the researcher expects, or they do what will make them look good. These actions could threaten internal validity by hiding or copying the effects of the variables being studied.

The first type of participant reaction bias is participant expectancies. In this bias, the participant tries too hard to cooperate with the researcher. They may behave in a way that they think is consistent with the researcher’s hypothesis to please the researcher, to feel normal, or to satisfy what they may see as a social contract with the researcher. This of course assumes that a participant is aware of what the researcher hopes to prove. An example is the Weapons Effect study by Berkowitz and LePage in 1967. In this study, male participants were given the chance to shock a person they were told had previously shocked them. The study showed that if a gun was present, the participants administered a stronger shock. However, critics of the study said that the men administered a stronger shock because that is what they thought they were expected to do upon seeing the gun. After doing a subsequent study, in 1977 Berkowitz, Turner, Simons, and Frodi reported that making participants more or less suspicious or telling them more about the hypotheses did not lower the level of aggressiveness upon seeing weapons.

Participant reactance is a type of participant reaction bias where the participant tries to disprove the researcher’s hypothesis. One reason this happens may be a participant's desire to feel independent. Participant reactance could cause researchers to reject a valid hypothesis.

Evaluation Apprehension is a third type of participant reaction bias and refers to a person being worried about being judged by another person. Evaluation apprehension can threaten validity when people change their behavior so that they will be judged favorably. This could apply in the Weapons Effect experiment if a participant really wanted to shock a person but didn’t because they didn’t want to be seen negatively as being aggressive. A researcher can lessen the likelihood of evaluation apprehension by guaranteeing anonymity and protecting the privacy of the participant. A researcher can ask that participants not put identifying information on their surveys and to seal their surveys in plain envelopes.

What are ways to avoid participant reaction bias?

A researcher can also choose to provide the control group and the independent variable group with the same cover story, a false story about the study and its goals. There are ethical issues to consider when a researcher gives participants such elaborate misinformation. A researcher could also choose to keep certain key information secret, such as whether a participant is in a control group or in a variable group. In surreptitious or unobtrusive observations, participants may not realize they are being studied or what aspects of their behavior are being observed.

In 1971 Jones and Sigall created a technique to convince research participants that researchers could read their minds by hooking them up to a fake lie detector. They had measured the participant’s attitudes in a pretest and convinced the participants that the machine could read their minds by the way they hold the “attitude wheel” which looked like a steering wheel. Their study found that people were more willing to voice their prejudiced views on stereotyped groups when hooked up to the machine.

A researcher can also use indirect measures of people’s attitudes and opinions. A study done in 1998 by Vargas, von Hippel, and Petty sought to measure people’s attitudes on cheating. Such an attitude can be challenging to measure since most people are unlikely to admit that they cheat. Researchers instead asked the participants how they felt about a situation that happened to someone else. Did they feel it was alright for someone else to get a copy of an expensive and sought-after library book by pretending to lose it and paying the small fine? Unobtrusive measures are successful because the participant is unaware of what exactly is being measured.

Additionally, researchers can measure attitudes that the research participant doesn’t even know they have. Researchers came up with implicit or unconscious measures of a person’s attitude. Greenwald, McGhee and Schwartz use the Implicit Association Test (IAT) to gauge unconscious associations. Hetts and his colleagues use the terms implicit self-regard or implicit self-esteem to describe people’s unconscious attitudes about themselves. In 1999, Hetts, Sakuma, and Pelham conducted a study where they had participants make quick judgements about words that appeared on a screen. Participants were first shown, or primed with, the words “I” or “me”, then were shown words like “good” or “bad” in the same place the words “I” and “me” had been. Based on how easily the participant identified the words “good” or “bad”, researchers were able to gauge if a person had high (“good”) or low (“bad”) implicit self-regard. In recent years, work on indirect and implicit attitudes as unobtrusive behavioral measures has gained popularity. Researchers expect to see these measures continue to develop as researchers seek solutions to problems of participant reaction bias.

What is the experimenter bias about?

The researcher or person conducting the experiment can also pose a threat to validity if their expectations affect their observations. This is called experimenter bias. This bias can happen in two ways. The first occurs when the experimenter makes incorrect observations due to their bias. The second occurs when researchers behave differently with participants due to their bias. Both these forms of experimenter bias can be seen in a study conducted by Rosenthal and Fode in 1963. They asked a group of experimenters to test the performance of specially bred “maze-bright” and “maze-dull” rats. The rats went through a maze and the maze-bright rats learned their way faster than the maze-dull rats. This may not be surprising except that both groups of rats were the same. The experimenters unknowingly treated the maze-bright rats differently by petting them and encouraging them more, thus inadvertently affecting the outcome. Coding errors were also found that were made in the maze-bright rats’ favor. Keep in mind that the experimenters did not intentionally manipulate results to favor the maze-bright rats. But their expectations were able to inadvertently affect how they treated the rats.

In some cases, experimenter bias can lead to participant bias if the participants become aware of the experimenter's expectations. Experimenter bias can be avoided in several ways. Experimenters can prerecord instructions so they have less personal interaction with the research participants. Experimenters can also make sure they are blind to, or unaware of a participant’s treatment condition. They can do this by having one researcher interact with participants while another researcher is responsible for observing and recording the results. The identities of the participants would be kept anonymous since they would be assigned coded numbers. To really attempt to prevent both experimenter and participant bias, researchers can use a double-blind procedure. Such a procedure means that both the researcher and the research participants are unaware of or blind to the participant’s treatment condition.

How do confounds and artifacts pose a threat to validity?

Threats to validity can also be described in two other ways, as confounds or artifacts. In comparing confounds and artifacts, it is important to remember that confounds are variables that vary but should be constant, while artifacts are constant but should vary in a regulated manner. A confound is also referred to as a nuisance variable. A confound is a situation where an additional variable “z” varies with both the independent variable “x” and the dependent variable “y.” Researchers do not think of and do not plan for confounds, which can lead to a seemingly meaningful connection between the independent and dependent variables. For this reason, confounds threaten internal, not external, validity. Internal validity in passive observational/correlational studies is often threatened by confounds. It is important to remember that confounds can be a concern in any type of study.

One example is a study on how a person being in a good mood makes them more likely to try to help others. In this study, participants were given positive feedback on their personalities. This did in fact improve their tendency to be helpful. However, being complimented in such a way may have increased their sense of competence, which may have led to them being more helpful. The sense of competence would be the confound in this scenario, meaning that the link between mood and helpfulness may not actually exist. In 1972, Isen and Levin conducted a study that did not involve a participant’s sense of competence. Instead of giving compliments, they gave participants a cookie. The cookie also made participants more helpful. However, in being given a cookie, the participants had been given something nice by a stranger. This is called a pro-social model. This model is known to increase helpful behavior, thus acting as another confound. In 1987, Isen changed the experiment so that the participants found a dime. In finding a dime, the participant felt lucky rather than competent or on the receiving end of a gift from a stranger. This also increased the participants’ helpful behavior without the previous confounds. This third experiment is higher in internal validity than the previous two.

There are three lessons to be learned from the studies we’ve just discussed. First, research past studies and think through your experiment to try to eliminate as many confounds as you can. This will help you because you’ll be less likely to repeat the mistakes of past studies. Secondly, replicate your experiment. By researching the same idea in different ways, it is difficult for critics of the previous studies to say that a specific confound was responsible for the outcome. The last lesson is to control for confounds and measure them. Many statistical techniques have been developed for just this purpose: analysis of covariance (ANCOVA), multiple regression, and partial correlation. Isen and Levin could have surveyed their participants (both the control group that was not given compliments and the variable group that received the positive feedback) to gauge how competent they felt afterwards. Isen and Levin would then have measured the confound and could have removed its effects.
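Of the techniques listed, partial correlation is the simplest to sketch. The example below uses made-up data in which a confound z drives both x and y while x has no direct effect on y; the variable names, sample size, and distributions are assumptions for illustration only and are not data from any study mentioned here.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data: the confound z influences both x and y.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [value + rng.gauss(0, 1) for value in z]
y = [value + rng.gauss(0, 1) for value in z]

raw = pearson(x, y)                      # sizeable, but produced entirely by z
adjusted = partial_correlation(x, y, z)  # near zero once z is controlled
```

The raw correlation looks like a meaningful link between x and y, while the adjusted value shows that the link largely disappears once the measured confound is taken into account. This only works for confounds the researcher thought to measure.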

Unlike confounds, artifacts do not vary, but remain constant. Artifacts threaten external rather than internal validity. If an artifact occurs in a study, the dependent and independent variables may only have a relationship in the presence of said artifact. This makes it difficult to show high external validity since the experiment has special conditions or circumstances. For example, consider a study where the researcher has a hypothesis that if a person feels unique they will have higher self-esteem. Let’s also say that a researcher tests this in many different ways in order to eliminate potential confounds and the test results continue to confirm the researcher’s hypothesis. This experiment has high internal validity. However, the artifact in the experiment is culture. Non-western cultures like China do not place high importance on individuality, rather the opposite. In China, feeling unique would result in lower rather than higher self-esteem, leading to low external validity. Artifacts are more difficult to eliminate since it can be very difficult to have a diverse sample of people that reflects differences in culture, language, wealth, race, gender, and age.

For this reason, researchers try to focus on eliminating artifacts that closely relate to (and may affect) what they want to study or that are connected theoretically to a research finding. A researcher can address artifacts and the problems they may pose by varying anything that may be an artifact in later studies.

What is non experimental research in psychology? - Chapter 6

What are case studies?

By conducting case studies, researchers can study the behavior of one person. This person usually has a very specific or extraordinary experience or condition that would be difficult to recreate in a laboratory or find a large enough sample of similar people. Researchers conducting case studies try to find general psychological principles and confirm theories or develop new theories. Case studies may not involve a designed research experiment, instead relying on very detailed observations. Once a researcher comes up with a new insight or theory they then go on to conduct research involving many participants. Clinical psychologists and behavioral neuroscientists make the most use of this method.

Example: What is the case about Phineas Gage?

The case of Phineas Gage provides a good example of the usefulness of case studies. Gage was injured in a freak accident while laying railroad tracks. A tamping iron went through his skull and removed a portion of his brain. Afterwards, Gage was still able to function physically and mentally. The only noticeable change was his temperament. He changed from being a very likable and kind person to an abusive person who cursed frequently. This case study led researchers to believe that damage to certain brain areas caused certain psychological defects. This is still relevant today, as modern day researchers have conducted similar studies with people with damaged prefrontal cortexes (the area of damage Gage suffered) and confirmed similar results.

Example: What is the case about Stephen D's amphetamine addiction?

The case study of Stephen D. shows us what happens when normal human experiences are exaggerated or diminished. Stephen D. was a medical student and was addicted to amphetamines. He had a dream that he was a dog and woke up with a heightened sense of smell. He was able to navigate New York City with his nose and able to recognize people from their smell alone. During the three weeks that his condition existed, his sense of smell was so exaggerated that his ability to reason was diminished. Stephen D.’s three-week experience can be explained as a dopaminergic excitation caused by his use of amphetamines.

Example: What is the case about Peter Tripp?

In 1959, a DJ named Peter Tripp decided to stay awake for two hundred hours (eight days) straight. Unlike the subjects of the previous two case studies, Tripp chose to expose himself to extreme conditions. As predicted, after a few days without sleep Tripp was extremely tired and had trouble performing routine tasks. After one hundred hours without sleep, Tripp started having hallucinations. He saw clocks with human faces, and inanimate objects change into other things and burst into flames. After one hundred and seventy hours, Tripp was losing his sense of reality. He sometimes forgot who he was. After two hundred hours, Peter Tripp essentially lost his mind and had to be restrained by the scientists who were observing him. These scientists, Suter and Lindgren, concluded that extreme sleep deprivation was intolerable. However, Suter and Lindgren also noted another case study. Randy Gardner stayed awake for eleven days for his high school science project. In contrast to Tripp, Gardner appeared normal, though obviously exhausted, at the end of his eleven-day experiment.

This brings into question if Peter Tripp was already mentally unstable before beginning his experiment. This also shows that the results of sleep deprivation vary from person to person and can be unpredictable.

Example: What is the case about Sarah's abuse?

Unlike Peter Tripp, whose sleep deprivation was self-inflicted, others suffer abuse at the hands of other people. Psychologists want to understand how victims respond to abuse, how they deal with the stress it causes, how they cope, and how they view themselves. The case of Sarah shows how one such abused person could eventually lead a normal life. Sarah was neglected by her mother and sexually abused by her biological father and her stepfather. Despite all this, Sarah later married and had a loving family. This suggests that people have a universal desire for a sense of self-worth and to make sense of the world. However, the self-concept is obviously very complex since other people respond to abuse differently, becoming depressed or even abusing others.

Example: What is the case about the amnesia of K.C.?

In some cases, case studies can be helpful in testing a hypothesis. One case is a study on memory and self-concept conducted by Tulving in 1993. In this case study, a man he called K.C. suffered a head injury in a motorcycle accident. As a result, he developed an extreme case of amnesia. He was especially unable to access episodic memory, his memory of autobiographical events. He had no short-term memory of actual events, but he could still think abstractly. When K.C. and his mother were asked to rate his personality using terms often used to describe self-concept, their ratings agreed seventy-three percent of the time. K.C.’s personality had changed after the accident, and his ratings of his post-accident personality were more in tune with his mother’s than were the ratings of his personality before the accident. This means that people’s abstract views of themselves or their trait information (for example, he is an introvert) are separate from their autobiographical or episodic information (he eats lunch alone every day).

There are also other ways to study unusual people or situations besides case studies. As we mentioned earlier, researchers may use case studies since it can be challenging to find a large group of people with the same unusual condition. In 2001, Lieberman, Ochsner, Gilbert, and Schacter wanted to see if people with amnesia would be able to reduce cognitive dissonance in relation to events they couldn’t remember. Their study showed that dissonance reduction does happen even if a person can’t recall what caused the dissonance in the first place, which suggests that some part of dissonance reduction happens unconsciously.

It is important to state that noting the quirks and eccentricities of people, or relating their life events in a tabloid magazine, does not qualify as a case study. The purpose of case studies is to develop or refine theories on why people behave the way they do. Case studies are scientific because they seek to explain human behavior on scientific principles. In the right situations, these established scientific principles would be applicable to everyone. When identifying a case study, consider how an audience would respond to the study. In 1992, Abramson made the point that the scientific status of a case study should be governed by the same standards applied to determine if anything is scientific.

This is the same set of standards established by Popper that we’ve already discussed. Case studies help in developing, refining, and testing theories that are susceptible to falsification. The facts of case studies are not usually debated, but the conclusions drawn from case studies often are. The first case study we discussed was that of Phineas Gage. Scientists do not dispute the events of his accident. His situation also did not explain everything there is to explain about the prefrontal cortex, but by reviewing the case study, researchers were able to refine their questions in subsequent studies. Case studies also have a negative reputation because of Sigmund Freud, who used case studies frequently but did not do so in a scientifically acceptable way.

What is the function of doing research by using one variable?

A pitfall of case studies is that since they focus on one person or a small group (or a person with unusual conditions), it is challenging to apply what a researcher learns to the general population. There are times when it is most important to know if and how accurately a specific observation can be applied to the general population. This is where single-variable studies come in handy. They are designed for this purpose: to describe a specific property of a large group of people. Studies on how many people want gun control, or on the prevalence of clinical depression in American high schools, would be examples of single-variable studies. Keep in mind that these studies tend to be descriptive rather than theoretical because this type of research addresses a single question. For this reason, single-variable studies are not used to study what causes a certain human behavior. Of course, there are times when there is strong evidence that a large group of people behaves differently than expected. For the purposes of our discussion, we will focus on the types of single-variable studies researchers conduct in order to describe the world.

Two important examples of single-variable studies are the census and the survey. Both aim to provide information about the people in a population. A census has the additional goal of counting every person in that population. A census is expensive and difficult to perform because it aims to collect data from everyone. For example, the U.S. government conducts a census once a decade to count the people living in the U.S. Since this is such a huge undertaking, researchers more often use surveys given to a sample (a subset) of the population. The results of these surveys are used by researchers to estimate how the larger population would respond. In this situation, the researchers assume that the sample group is representative of the entire population. This is tricky when you consider the issues of selection bias that we discussed earlier. It is difficult to gauge exactly how representative and diverse the sample group is.

For this reason, researchers use a population survey. A population survey chooses the sample group randomly through a process called random sampling or random selection. A researcher would begin by obtaining a population list. They would then use a computer program with a random number generator, or a random numbers table, to pick entries from the list. This process is similar to conducting a lottery in which the winners make up the survey sample. The largest benefit of using a population survey is that it is easier, faster, and less expensive than conducting a census.
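The lottery-like selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population list, the identifier format, the sample size, and the function name `draw_random_sample` are all hypothetical and do not come from the book.

```python
import random


def draw_random_sample(population_list, sample_size, seed=None):
    """Pick sample_size entries without replacement; every entry is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population_list, sample_size)


# Hypothetical population list: 5,000 made-up identifiers standing in for names.
population = [f"person_{i:04d}" for i in range(1, 5001)]

# Draw 200 "lottery winners" who would receive the survey.
sample = draw_random_sample(population, sample_size=200, seed=42)
print(len(sample), "people selected, for example:", sample[:3])
```

Because the draw is made without replacement, no person can be selected twice, which mirrors the lottery analogy in the text.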

There are still challenges though. Finding a population list can be difficult and if the results that are generated are truly random, a researcher may have to travel far and wide to conduct the survey.

To address this challenge, researchers use a method called cluster sampling. This is a population survey that uses a different version of random selection. In cluster sampling, a researcher first makes a list of possible locations where they will find the people they are interested in studying. The researcher then shortens the list of locations by random selection. Finally, the researcher randomly chooses a specific number of individuals from each location to survey. By using cluster sampling, researchers are able to make predictions without conducting a census. However, chances are the results of cluster sampling would not exactly match the results of a census. Researchers can address this problem through statistical calculations. They can estimate the likely size of this discrepancy, called sampling error. Sampling error is commonly reported as a margin of error. For example, a three percent margin of error at the conventional ninety-five percent confidence level means that, if the survey were repeated many times, about ninety-five percent of the results would fall within three percentage points of the true population value. When the accuracy of cluster sampling is compared with taking a much larger sample, the differences are minimal. These slight discrepancies can be lessened even more by doing careful random selection and sampling. Population surveys are a powerful method for this very reason.
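The steps of cluster sampling and the idea of a margin of error can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the towns, the residents, the sample sizes, and the function names. The margin-of-error formula used is the standard one for a proportion from a simple random sample; real cluster designs usually need an additional adjustment, so treat the number as a rough guide only.

```python
import math
import random


def cluster_sample(people_by_location, n_locations, n_per_location, seed=None):
    """Two-stage selection: random locations first, then random people within each."""
    rng = random.Random(seed)
    # Stage 1: shorten the list of locations by random selection.
    chosen_locations = rng.sample(sorted(people_by_location), n_locations)
    # Stage 2: randomly choose individuals inside each chosen location.
    sample = []
    for location in chosen_locations:
        residents = people_by_location[location]
        k = min(n_per_location, len(residents))
        sample.extend(rng.sample(residents, k))
    return chosen_locations, sample


def margin_of_error(proportion, n, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(proportion * (1 - proportion) / n)


# Hypothetical data: 40 towns with 300 residents each.
towns = {
    f"town_{t:02d}": [f"town_{t:02d}_resident_{r}" for r in range(300)]
    for t in range(40)
}

chosen_towns, respondents = cluster_sample(towns, n_locations=5, n_per_location=20, seed=1)
print(chosen_towns, len(respondents))

# With roughly 1,000 respondents, the worst-case margin is close to three points.
print(round(margin_of_error(0.5, 1067), 3))
```

The researcher only has to travel to five towns instead of forty, which is the practical advantage the text describes.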

What is the difference between epidemiological research and research on public opinion?

Population surveys are especially applicable in epidemiological research and research on public opinion. In scientific terms, epidemiology is the study of what causes disease. In the terms of clinical psychology, epidemiological research studies the prevalence of certain psychological disorders within a specific population. Conducting a population survey is particularly challenging because the researcher would need to attain a population list of people to survey. They would need to determine who meets the criteria for the disorder they are studying. Additionally, researchers face the challenge of nonresponse bias. The mentally ill are often unwilling or unable to respond to interviews. A study called the Epidemiological Catchment Area (ECA) conducted by Regier and Burk in 1987 functioned this way. Eventually the researchers were able to gain valuable insight since the results of the study showed that anxiety disorders were more linked than researchers thought. The ECA studies can be seen as single-variable studies, however it should be noted that studies rarely focus on only one variable. Researchers can study much more with the addition of a few variables. For example, the ECA studies also showed interesting correlations between gender and different mental disorders.

A second type of single-variable research is research on public opinion. This method of research also makes use of population surveys and is designed to determine the attitudes of specific populations. This is especially useful when considering consumers or voters. Research on consumers is called marketing research. Marketing research aims to gauge what a consumer thinks about different products and which ones they prefer.

It is practical for businesses and politicians to find out what their audience wants before trying to sell them a product or sell them on a politician. One possible way to gauge public opinion is through intuition. An intuitive approach has the benefit of being faster and cheaper to conduct. However, there are obvious disadvantages. Ross, Greene, and House conducted research on the false consensus effect in 1977. Their research showed that people tend to overestimate the proportion of other people who have similar attitudes as themselves. Additionally, people tend to be conservative in their judgements in what we discussed previously as regression toward the midpoint. These tendencies of mis-judgement are heightened when the person is in the minority. People tend to have egocentric bias and are overly confident in their judgements. It is advisable for researchers to take the time to locate a random sample and survey them. Responses can be used to generate fairly accurate predictions for a larger population.

What are the disadvantages of population surveys?

Population surveys provide a good medium between surveying everyone and surveying no one. They can prove challenging because a researcher must develop an effective strategy and deal with nonresponse bias. The researchers directing the study may need to train others to conduct valid and reliable interviews. Additionally, language barriers may prove a challenge to reflecting diversity and minority groups. Since researchers aim to survey a random selection of the population, research questions are especially difficult to write. They must be applicable to a diverse group of people in terms of race, age, culture, gender, etc. Researchers also face the problem of public skepticism of surveys. As more and more researchers, telemarketers, and salespeople make use of telephone and mail surveys, more and more of the population have become weary of responding to surveys. Necessarily, researchers must make their surveys concise and to the point. This may mean they are not able to ask as many questions as they would like. An added concern is that since population surveys are conducted in the respondent’s home (even if via telephone), the researchers are not able to control the surroundings. All these challenges cause population surveys to be more complex and time-consuming than what may first appear to be the case.

What are exceptions to the importance of a representative sample group?

On occasion, a researcher can illustrate a case of extremely interesting or bizarre human behavior that would be unlikely to be found in any sample. When the research community’s established assumptions about human behavior are challenged by the behavior of normal and healthy people, the scientific community takes note. The obedience study using electric shock conducted by Milgram is a good example. As we’ve already discussed, he was able to show that sixty-five percent of normal adults were willing to shock complete strangers to the point of death when told to by an authority figure. This finding was so interesting and bizarre that a randomly selected sample was not necessary. A sample that is used because it is readily available, rather than randomly selected, is called a convenience sample. Milgram’s study essentially showed that people do not act on principle as much as previously thought. Recent research on judgement and decision making shows that people are also not as good at using principles as previously hoped.

A majority of this research is done using the convenience sample method to see what proportion of people make a correct judgement or decision. For example, consider the case of Simone. Simone is 28 years old, single, confident, and intelligent. She studied philosophy in university and was active in the community, dealing with issues of discrimination and protesting nuclear power. Is Simone an office manager, or is Simone an office manager who is involved in the feminist movement? In 1983, Tversky and Kahneman posed a similar question to a large number of college students. Eighty-five percent of the students responded that Simone was an office manager who is involved in the feminist movement. They formed this illogical judgement based on very little information. A key law in probability theory says that a compound event (event A and event B) can never be more probable than either of its component events (event A alone or event B alone). In the case of Simone, she cannot be more likely to be a feminist office manager than to be an office manager, since the category of office managers includes both feminist and non-feminist ones. The surveyed college students committed the conjunction fallacy. Researchers who study judgement and decision making try to determine the likelihood of people making suboptimal judgements. When a researcher wants to study why and on what occasions people deviate from the ideal judgement, a good place to start is a single-variable study that documents how often people deviate.
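The probability rule behind the conjunction fallacy can be checked by simple counting. The sketch below builds a made-up population in which each person either is or is not an office manager and either is or is not a feminist; the percentages and the function name `conjunction_check` are hypothetical, chosen only to illustrate the rule.

```python
import random


def conjunction_check(people):
    """Return (P(A), P(A and B)) for a list of (is_a, is_b) pairs."""
    n = len(people)
    p_a = sum(1 for is_a, _ in people if is_a) / n
    p_a_and_b = sum(1 for is_a, is_b in people if is_a and is_b) / n
    return p_a, p_a_and_b


# Hypothetical population: about 10% office managers, about 40% feminists.
rng = random.Random(0)
people = [(rng.random() < 0.10, rng.random() < 0.40) for _ in range(10_000)]

p_manager, p_manager_and_feminist = conjunction_check(people)
print("P(office manager) =", p_manager)
print("P(office manager and feminist) =", p_manager_and_feminist)

# Everyone counted in the second group is also counted in the first,
# so the compound event can never be the more probable one.
print(p_manager_and_feminist <= p_manager)
```

Whatever percentages are plugged in, the final comparison holds, because feminist office managers are a subset of office managers.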

How is using multiple variables conducted in doing research?

Researchers are interested in determining what causes their observations after describing the variables that interest them. To find causes and establish internal validity, a researcher typically wants to use the experimental method. However, many types of subject matter are not conducive to the experimental method. Researchers can then use correlational methods to test their theories by observing how variables are related in nature.

Correlational research is used to make observations about a group of people and to test predictions about how different variables are related. For example, does gender affect a person’s view on capital punishment? Do cities with cooler climates have a lower crime rate? While these questions can be answered fairly directly, it is difficult to draw a theoretical conclusion from the answer. In a previous chapter, we defined a theory as a statement about the cause and effect relationship between two or more variables. For this reason, correlational research can only test theories as long as the study results in reasonable conclusions about cause and effect. Also, as previously discussed, non experimental research is especially susceptible to confounds, as there may be a third variable that is actually the cause of the effect. A researcher should think through which confounds could exist before claiming a causal relationship between variables.

What is meant by confounds in correlational research?

Person confound

There are three specific types of confounds that can threaten correlational research. The first is called a person confound. This is when a variable appears to cause something because people who are high or low on this variable happen to be high or low on an individual difference variable (a demographic characteristic or personality trait). For example, people who suffer from depression also tend to suffer from anxiety. A finding that shows that depression affects self-view would be incorrect if it was in fact anxiety that affected self-view. A researcher can try to address this threat by measuring the confounding variable and adjusting for it statistically. In the example of depression and anxiety, a researcher could enter both variables into a regression analysis designed to predict self-views.

Environmental confound

The second type of confound is called an environmental confound. These confounds are similar to person confounds except they relate to environmental instead of personal nuisance variables. An example of an environmental confound could be a stressful life event that causes someone to be more depressed and have a lower self-view. It may look as if the depression is affecting self-view, but in fact it is the stressful life event that is causing both the depression and the low self-view. Both person and environmental confounds threaten internal validity. The two confounds differ because person confounds happen internal to a person, and environmental confounds happen externally.
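The stressful-life-event example can be simulated to show how a confound creates a correlation that shrinks once the confound is measured and adjusted for. The Python sketch below is built entirely on invented numbers: stress raises depression and lowers self-view, while depression has no direct effect on self-view. The adjustment uses the standard first-order partial correlation formula, which is one common form of statistical control and is an assumption here, not necessarily the method the book has in mind.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after statistically removing z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented data: stress drives both variables; depression does not affect self-view.
rng = random.Random(1)
stress = [rng.gauss(0, 1) for _ in range(5000)]
depression = [s + rng.gauss(0, 1) for s in stress]
self_view = [-s + rng.gauss(0, 1) for s in stress]

raw_r = pearson(depression, self_view)
adjusted_r = partial_correlation(depression, self_view, stress)
print("raw correlation:", round(raw_r, 2))
print("after adjusting for stress:", round(adjusted_r, 2))
```

In this made-up world the raw correlation is clearly negative, yet it falls to roughly zero once stress is taken into account, which is what one would expect if the stressful event were the real cause of both.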

Operational confound

The third type of confound is the operational confound. It happens when a measure used to gauge a specific topic, such as depression or memory, accidentally measures something else as well. One can identify an operational confound by two typical features. Firstly, what is inadvertently measured has nothing to do with the predictor variable the researcher actually wanted to study. Secondly, what is inadvertently measured is closely related to the outcome the researcher predicted. An example would be if a researcher decides to conduct a written survey of depression. Let’s say that the researcher wants to avoid participant bias by avoiding questions that obviously relate to depression such as “I recently considered suicide.” The researcher decides to pose questions that relate to the physical symptoms of depression such as feeling tired, or less in control, or less attractive. The operational confound in this scenario would be a person’s age. An older person could feel tired, less in control, and less attractive but not actually be depressed. Age is not correlated with depression but affected the researcher’s measurement of depression. An operational confound can be addressed if a researcher removes the confound, which essentially means starting over with a new measure.

Another way to address the problem of an operational confound is the same as with the previous two. A researcher could measure the construct that he thinks is the cause of the action (age in this example), and use statistics to determine what the actual cause is. Lastly, a researcher could conduct the same study and focus on a less diverse sample group (people closer in age).

Reverse causality

In addition to problems of confounds, researchers must also consider reverse causality. In the previous example of depression, a researcher who has carefully prepared for confounds may conclude that depression leads to negative self-views. The issue of reverse causality comes in because negative self-views may actually lead to depression instead of the other way around. Longitudinal designs or prospective designs can help address the problem of reverse causality. In these types of studies, researchers track and observe people over time to better understand the causal relationships between variables. This method is still not without potential error. To firmly establish a causal relationship, it may be necessary to conduct an experiment. It may not seem very productive for researchers to use a correlational study when so much is left to be determined. However, if such a study is conducted carefully, a researcher can rule out many confounds to help advance knowledge. While there may be a lack of internal validity, correlational studies can be high in external validity. Researchers can use a variety of methods to balance internal and external validity.

Which two correlational research methods can be used?

Archival research

Archival research is useful when the topic a researcher wants to investigate has ethical concerns, when it is especially challenging to manipulate a variable, or when a researcher wants high external validity. In archival research, researchers review already existing public records to test their hypothesis. There are countless sources of these records including police reports, court decisions, weather reports, hospital records, sports scores, marriage and divorce records, census data, and so on and so forth. In 1987, Peterson and Seligman researched self-evaluation and physical well-being by gathering newspaper records of interviews of baseball players who had been inducted into the Baseball Hall of Fame. The researchers wanted to see if there were any long term consequences if a player made self-serving or self-protective attributions for their success. They had judges rate the interviews based on if the players made such attributions. Peterson and Seligman then compared these ratings with how long the players lived. The results showed that the players that made more self-serving attributions had noticeably longer life-spans.

Another example is a study conducted by Phillips on emulative suicide, also called copycat suicide. Phillips reviewed public records of suicides and the media coverage of various suicides. He showed how suicide rates increased after there was a lot of media publicity on a particular suicide. Researchers who wish to study various forms of prejudice and stereotyping have done archival research on ethnicity and capital punishment. In 1983, Paternoster reviewed criminal records and court proceedings to see how often a prosecutor asked for the death penalty if a defendant was black versus if the defendant was white. His study showed that when a black defendant was accused of murdering a white victim, the defendant was forty times more likely to be tried for the death penalty than if the defendant had murdered a black victim. Even when researchers statistically control for potential confounds such as severity of crime, number of victims, and socioeconomic status of defendant, ethnicity correlated with seeking the death penalty.

Additional research shows that black defendants were not only more likely to face the death penalty at trial, they were also more likely to receive it. In 2000, Sidanius and Pratto reviewed over seven thousand records of executions that took place in the United States over a three-hundred-and-eighty-three-year period. They divided the executions into three different groups based on how humane versus how violent the executions were. Their study showed that sixty-one percent of the humanely executed were white while thirty-nine percent were black. Over eighty-eight percent of those violently executed were black, compared to a little over eleven percent who were white.

Archival research yields high external validity. Not only is the researcher studying actual occurrences in the real world, they also avoid issues of participant bias since the participant does not interact with the researcher and would not be concerned about being judged. On the downside, a researcher may need to go to great lengths to show high internal validity. In the previous example of increased suicide rates due to media coverage, there were many potential confounds. Phillips tried to eliminate many of them. For example, to eliminate the confound that grief caused people to commit suicide, Phillips showed that the suicide rates still increased even if it was a hated person who committed suicide. On the other hand, when a well-loved person such as John F. Kennedy was murdered, the suicide rates did not go up.

Another disadvantage of archival research is that the people who create the original records have their own goals and do not think of what the researcher is interested in. For this reason, a researcher may need to wait until public records exist that are of interest to their study. Researchers have less control over the data in public records and have to work with what they find. For example, it is possible that coroners are reluctant to declare a death a suicide, and when they see media coverage of suicides they may be more comfortable reporting the death as a suicide. Phillips addressed this confound by showing that hidden suicides also increased following media coverage. These are less obvious suicides, for example, when a person seems to lose control of their car in a single-car accident. Additionally, if coroners were simply reclassifying deaths as suicides after media coverage, the rate of other causes of death should have gone down as reported suicides went up. This was not the case.

Observational research

In observational research, a researcher has more control than in archival research. They are still observing the real world, in this case the subject in their natural environment, and they do not manipulate variables. However, in observational research, the researcher is the one observing. The best method of observational research is unobtrusive observational research, where the researcher does not interfere with the person’s behavior and the person doesn’t realize they are being studied. In 1974, Colett and Marsh placed a camera in a high traffic area to observe what people did when they had to squeeze past each other. They were specifically interested in the differences between men and women. Their study showed that seventy-five percent of men faced the person they were squeezing past, while only seventeen percent of women did so. The study also showed that in the instances where women faced the people they were passing, they covered their breasts with their arms.

In a different study on gender, McKelvie and Schamer observed whether drivers obeyed stop signs. They coded based on if the person stopped completely, performed a slow stop, or didn’t stop at all. They observed during the day and at night. The study showed that when it was night time, sixty-two percent of women stopped entirely even if no one was coming, while thirty-six percent of men stopped completely. This differed from daytime behavior because women were more inclined to stop at night, while men were more inclined to ignore the stop sign at night. In daily life, a mechanic uses unobtrusive measures when measuring the wear and tear on a person’s brakes. Like archival research, observational research is high in external validity. Showing high internal validity is also challenging.

How can you measure confounds?

We have already stated that one way to address confounds is to measure them statistically. If a researcher sees a relationship between two variables after all known confounds have been measured and removed, they can feel more confident in their findings. In 2005, Jaccard, Blanton, and Dodge did a study on whether close friendships influenced sexual activity in adolescents. Previous research showed that close friends had a lot of power to influence each other. For example, if Sam and John are best friends in seventh grade, and Sam has already had sex, previous studies showed that John would be more likely to have sex in eighth grade than he would have without Sam’s influence. However, previous studies did not explain why this happens because there were so many personal and environmental confounds that could have been a factor.

Jaccard, Blanton, and Dodge sought to measure as many of these confounds as they could. They controlled for and measured confounds ranging from parenting style, parent-child relationship, parental education, parental feelings on sex, child’s level of physical development and academic standing, dating status, gender, age, ethnicity etc. After removing all these confounds, their study showed that the “best friend” effect still remained. This still doesn’t prove the existence of the “best friend” effect since it is impossible to remove every single confound. But this does mean that the previous confounds were no longer the cause of the change. Due to these limitations in non experimental research methods, researchers utilize an experimental laboratory when possible.

What is experimental research in psychology? - Chapter 7

What are true experiments?

Experiments conducted in labs are powerful methods. They help researchers determine causal relationships since they are high in internal validity. Lab experiments use experimental manipulation and random assignment to conditions to help reduce person confounds. Because the research is conducted in a lab, the researcher also has greater control to reduce procedural confounds. Unlike the observational research we discussed previously, experimental research in labs can be disconnected from the real world since the procedures involved are unlikely to exist in the real world. Researchers address this by trying to create experimental studies that are as realistic as possible. Lab experiments are useful because they show researchers how the world changes when one variable is manipulated. This gives researchers information on causal relationships that would be difficult to gain by other means. As mentioned many times now, these causal relationships are important for establishing high internal validity and thus provide support for a researcher’s theory.

What is the function of manipulation in an experiment?

For a study to qualify as an experiment it must use manipulation. Manipulation is when an experimenter changes the levels of a variable, the independent variable, in a systematic way. An experimenter manipulates a variable to see if the changes in the independent variable cause corresponding changes in the dependent variable. If this happens in support of a known theory, the theory gains more credibility. If this contradicts a known theory, the theory loses credibility. The theory could be falsified, but most likely it will be revised. This process identifies the boundary conditions of a theory and is important since it is just as useful to know what a manipulation in an experiment cannot do as to know what it can do.

Prehistoric people manipulated objects from nature to create tools, and learned through direct manipulation that putting fire to meat allowed it to stay edible for longer. Despite these early examples, it took scientists a while to use manipulation as the main tool for figuring out the world around them. Aristotle chose to argue and reason rather than experiment or manipulate. Galileo Galilei made experimentation popular by being the first to perform such experiments. After the work of Sir Isaac Newton and John Locke, manipulation became a more established method.

What is a random assignment?

In addition to manipulation, true experiments also require random assignment. Random assignment is crucial to modern experimental psychological research. When other scientists wish to manipulate something in the world, they can use two substances that are identical. This may be in the form of identical samples of sodium, or two metal balls of the same weight. However, this proves a challenge when a researcher wishes to study humans since humans are different from each other. The history of true experimentation is how researchers bypassed the issue of individual differences. In earlier times, experimenters tried to match one person in the control group with a similar person in the variable group.

They ran into two major problems. The first problem was that it was hard to match two people together. Two people of similar intelligence may be very different in age. The second problem was that the method of matching didn’t work: even if two people were very well matched, it was still impossible to rule out that some unmatched difference was influencing the outcome. This is similar to the inductive problem we discussed earlier.

In the 1920s and 1930s, a scientist named R.A. Fisher wrote two books, Statistical Methods for Research Workers and The Design of Experiments. These two books were revolutionary in the history of psychology. Fisher was an agricultural biologist. He was interested in studying how different compositions of manure could optimize plant growth and crop yield. Plants are similar to people in that no two plants are alike. There are many external forces that can affect plant growth beyond manure composition, such as light and spacing. Fisher made use of random assignment to deal with this challenge of finding identical specimens to study. Random assignment is the opposite of matching since it is completely arbitrary, a type of ordered chaos. On average, it equalizes the groups being studied on all dimensions. Fisher’s contribution was that since he knew two seeds would not grow at exactly the same rate, he could use two groups of seeds and study their average rate of growth. This applies to people as well. Of course, random assignment does not always equalize all dimensions; however, it is overall very effective.
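How random assignment tends to equalize groups can be seen in a small Python sketch. The participants, their ages, and the function name `randomly_assign` are all hypothetical; the group labels follow the terms used in this summary (control group and variable group).

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_age(group):
    return sum(person["age"] for person in group) / len(group)


# Hypothetical participants who differ widely in age (18 to 80).
age_rng = random.Random(7)
participants = [{"id": i, "age": age_rng.randint(18, 80)} for i in range(1000)]

control_group, variable_group = randomly_assign(participants, seed=11)

# No one matched anyone by hand, yet the average ages come out close.
print("control group mean age:", round(mean_age(control_group), 1))
print("variable group mean age:", round(mean_age(variable_group), 1))
```

The same logic applies to characteristics the researcher never measured, which is what makes random assignment more practical than matching. With small groups the balance is less reliable, in line with the caveat that random assignment does not always equalize all dimensions.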

It is important to note that random assignment is different from random selection or random sampling, which we discussed in the previous chapter. Random assignment relates to experiments and internal validity, while random selection relates to population surveys and external validity. The similarity is that both random assignment and random selection aim to make two groups similar to each other. The difference is that random assignment aims to make a control group as similar as possible to the variable group, while random sampling aims to make the sample group as similar as possible to the larger population. Since neither random assignment nor random sampling is foolproof in creating similar groups, researchers utilize statistical tests to see if the observed differences after manipulation are larger than would be expected by chance. If researchers are unsure of the validity of their results, they can redo the experiment using the same random assignment or random selection process to see if they get the same results. We’ll discuss statistical analysis in more detail in a later chapter.
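One way to ask whether a difference between groups is larger than chance alone would produce is a permutation test, sketched below in Python. This particular test is chosen here purely for illustration and is not named in the book. The walking times are invented numbers, and the function name `permutation_test` is hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(control, treated, n_permutations=5000, seed=None):
    """Share of random re-groupings whose mean difference is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels were handed out by chance
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Invented walking times in seconds for two randomly assigned groups.
control_times = [7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9]
variable_times = [8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1, 9.2, 9.3, 9.4]

p_value = permutation_test(control_times, variable_times, seed=3)
print("share of chance re-groupings this extreme:", p_value)
```

A very small share suggests the observed difference would rarely arise from random assignment alone. A large share means chance is a plausible explanation.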

What are the advantages of true experiments?

As we’ve just discussed, researchers were able to get around the problem of individual difference by using true experiments, specifically random assignment. An example would be the lab experiment conducted by Bargh, Chen, and Burrows in 1996. In this experiment, the researchers gave the participants thirty sets of words that they needed to unscramble to form thirty sentences. Participants were told this was to measure language proficiency. In actuality, the scrambled words were a priming manipulation to get the variable group to think about stereotypes of old people. That group was given words like “Florida,” “wrinkled,” and “retired”. The researchers wanted to see if causing participants to think stereotypically about a group would cause them to behave more like the group, in this case the elderly.

They measured this by having a research assistant time how long it took the participants to walk down the hall after the study. The study showed that the participants in the variable group did in fact walk slower than the participants in the control group. The groups were randomly assigned and subsequent studies using random assignment yielded the same results. The same researchers also conducted other studies where they primed for different behaviors such as rudeness versus politeness. This confirmed the previous study. In this example, it is unlikely that there was an error in random assignment that caused the same results in all four studies.

While random assignment addresses the problems with individual differences, other confounds can still exist. Random assignment specifically addresses the person confound.

Another type of confound is the procedural confound. Procedural confounds are the same as environmental confounds. The only difference is that procedural confounds relate to experimental research while environmental confounds relate to non experimental research. These confounds happen when the experimenter unintentionally manipulates two or more variables at a time. An example would be a study where the experimenter wants to see if the ethnicity of a person asking for a favor affects how helpful strangers are. If the experimenter chooses a white man, a black woman, and a Latino man to ask for favors, they may discover that strangers were most helpful to the black woman. The issue in this scenario is that race and gender were confounded. The experimenter would need to conduct the experiment with a black man to avoid the gender confound. True experiments are good for avoiding procedural confounds since the experimenter has more control and can frame the study so that they only manipulate one variable at a time. Researchers can do this by making sure that everything else remains constant across the control group and the variable group.

Researchers still need to consider operational confounds (which we discussed earlier) when working with experimental research studies. When an experimenter manipulates a variable, they could be affecting more than one psychological construct, any of which could cause the dependent variable to change. In that case, it is not possible to be sure of the exact cause of the change. An example is the cookie study conducted by Isen and Levin that we’ve already discussed. It seemed that being given a cookie made a person more likely to help others. This seemed in line with their hypothesis that if a person was in a good mood they would be more likely to help others. However, there was another construct at work called the prosocial model. This is when a person copies the behavior of someone else. So, in Isen and Levin’s study, it is not possible to tell what made people more helpful. It could have been their mood, or it could have been the prosocial model at work. This is an operational confound.

What are laboratory experiments?

Another way for researchers to better eliminate confounds is to conduct their experiments in a laboratory rather than “in the field” or in the real world. By conducting the experiment in the lab, a researcher can avoid many of the confounds that typically occur in the real world. A research participant can be easily distracted outside of the lab. In a lab, a researcher can also control to what extent and how often a research participant is exposed to the conditions the researcher wants. Additionally, true experiments (especially those conducted in a laboratory), let researchers observe things they would not be able to otherwise. For example, researchers can measure if a person likes or dislikes something by measuring the electrical activity in the muscles of their face that control smiling or frowning.

Another example is a study conducted by Patricia Devine in 1989. She believed that prejudice existed on an automatic or unconscious level, and on a controlled or considered level. She conducted a laboratory experiment to test her hypothesis. In her study, Devine exposed two groups to two very different groups of words. The control group was exposed to general words like “people,” “water,” “thoughts.” The variable group was exposed to words used to stereotype African Americans like “ghetto,” “basketball,” “Negroes.” Both groups were then shown a picture of a man of nonspecific race and asked to rate how aggressive this man was. The variable group rated him as more aggressive than the control group did (aggressiveness is another common stereotype for African Americans). Devine’s study showed that a person’s unconscious beliefs can influence their social judgements.

Through conducting true experiments, researchers can not only isolate and manipulate variables, but also blend variables in a systematic way. Researchers can gain information on how multiple variables combine or interact to cause behavior. Interaction is a term used in statistics to mean that the way an independent variable affects the dependent variable relates to the condition of the other independent variable. Through true experiments, a researcher can determine if the effect of two variables working together is different than the sum of their effects if they were acting independently. If this is true, this would qualify as a statistical interaction. Interactions play a key role in establishing the limiting conditions of a theory. As we’ve discussed, theories have boundaries since no theory is true under every conceivable condition. The process researchers use to test the conditions under which a theory is or isn’t true is called qualification. We can use the example of semantic priming to illustrate qualification. The finding that people recognize words more quickly when they have just been exposed to words with a similar definition is called semantic priming. An example would be that a person would be able to recognize the word “money” more quickly if they had just been exposed to the word “bank.” Neely’s 1977 study showed that even if people were exposed to the word “bank” for such a short amount of time that they didn’t realize it, they still were able to recognize the word “money” faster. To conduct a qualification test on this study, a researcher could change the meaning of the word “bank” by having participants read something about a river “bank” to gauge if semantic priming still occurred.
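
A statistical interaction can be expressed as a "difference of differences" between cell means. The sketch below uses invented recognition times for the word "money" in a hypothetical two-by-two design (prime word crossed with whether participants first read a river story). The numbers and the helper names `priming_effect` and `interaction_contrast` are assumptions for illustration, not data from Neely's study.

```python
# Hypothetical mean recognition times (ms) for the word "money".
cell_means = {
    ("unrelated", "no story"): 600,
    ("bank", "no story"): 550,
    ("unrelated", "river story"): 600,
    ("bank", "river story"): 598,
}

def priming_effect(means, context):
    """Effect of the prime within one context (negative = faster)."""
    return means[("bank", context)] - means[("unrelated", context)]

def interaction_contrast(means):
    """Difference between the two priming effects; 0 means no interaction."""
    return priming_effect(means, "river story") - priming_effect(means, "no story")

print(priming_effect(cell_means, "no story"))     # -50
print(priming_effect(cell_means, "river story"))  # -2
print(interaction_contrast(cell_means))           # 48
```

In these made-up numbers the prime speeds recognition by 50 ms without the story but by only 2 ms with it. The nonzero contrast is what an interaction looks like: the effect of one independent variable depends on the level of the other, which is exactly the kind of boundary that a qualification test looks for.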

Lastly, it is important to note that by conducting experiments in a laboratory setting, researchers can lessen the noise. Noise should not be confused with confounds. Noise in an experiment refers to extraneous variables that affect the dependent variable. These variables affect the control group and the variable group equally. In a lab setting researchers can control for noise more easily than in the real world. For example, they can determine the brightness of the lights in the testing room and the size of the computer screen participants are using. Since researchers try their best to eliminate as much noise as possible, they can be prone to work with sample groups that lack diversity. For this reason, true experiments conducted in a laboratory setting have been criticized as not being applicable to the real world. While there is some truth to this critique, researchers are able to gather much data and make many observations when they have the ability to manipulate variables in a controlled way. These experiments are high in internal validity and give a great deal of information on causality since experimenters are making their observations under the most ideal conditions.
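
The difference between noise and a confound can be made concrete with a short simulation. This is a hedged sketch with invented numbers: `run_study`, the baseline score of 10, and the effect sizes are all hypothetical. Noise is modeled as random variation added to both groups alike, while a confound is modeled as an extra shift that reaches only the variable group.

```python
import random
from statistics import mean

def run_study(n, true_effect, noise_sd=0.0, confound_shift=0.0, seed=1):
    """Return the estimated effect (variable group mean minus control mean)."""
    rng = random.Random(seed)
    # Noise hits every participant in both groups.
    control = [10 + rng.gauss(0, noise_sd) for _ in range(n)]
    # The confound travels with the manipulation, so it hits one group only.
    variable = [10 + true_effect + confound_shift + rng.gauss(0, noise_sd)
                for _ in range(n)]
    return mean(variable) - mean(control)

clean = run_study(2000, true_effect=2.0)
noisy = run_study(2000, true_effect=2.0, noise_sd=5.0)
confounded = run_study(2000, true_effect=2.0, confound_shift=3.0)

print(f"No noise, no confound: {clean:.2f}")
print(f"Noise only:            {noisy:.2f}")
print(f"Confound only:         {confounded:.2f}")
```

Noise blurs the estimate but leaves it centered on the true effect, so it mainly makes effects harder to detect. The confound produces a precise but wrong estimate, which is why it threatens internal validity in a way that noise does not.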

What are pitfalls of true experiments?

Let's discuss in greater depth the criticism that true experiments are not applicable in the real world. This is referred to as artificiality. For example, in the study conducted by Isen and Levin, a person was given a cookie to gauge if it affected their helpfulness. This scenario is improbable in real life. Most people are in a good mood for other reasons, such as just having watched a nice movie or enjoyed a nice meal. Likewise, in the peg-turning study, it is unlikely that most people would ever be asked to perform such a task, much less be paid to tell someone else that the task is enjoyable. This separation from real world events could be an artifact in any laboratory study. Critics could say the act of studying a person in a lab is creating an artifact. Additionally, as we've mentioned, by choosing homogeneous participants to reduce noise, researchers open themselves to artifacts. In the 1960s and 1970s, concerns about artificiality in the social sciences led researchers to doubt that their experiments had any real world bearing. They worried that the way research participants behaved was specific to the laboratory and to the specific study and might not be replicated in the real world. You should recognize this as an issue of external validity. For a laboratory study to have high external validity, it needs to be applicable to many different types of people in many different types of situations.

What is the difference between mundane realism and experimental realism?

In 1968, two researchers by the names of Aronson and Carlsmith rejected the idea that a laboratory study was destined to be low in external validity. They suggested the use of mundane realism, or to create a study that appeared similar to the real world. For example, to study how people behave when gambling, a researcher could create a casino. Their feeling was that by recreating the real world in a controlled setting, the results of a study would translate better to the actual real world. While they felt this would increase external validity, they did not think the aim should be to replicate the real world in a lab setting, but rather to get at what was underneath the surface of reality. They encouraged experimental realism, also called psychological realism.

This type of realism would make the study feel real to the research participants rather than look real (as would be the case with mundane realism). A very good example of experimental realism is in a study conducted by Asch in 1955 and 1956 on conformity. If Asch had been interested in creating a study based on mundane realism, he could have invited teenagers to participate and recreated the peer pressure of trying a cigarette. Instead, Asch focused on creating a sense of experimental realism. He told participants that they were doing a study on visual perception. All participants were asked to compare different line lengths and state which line was equal to the first one they were shown. The participants did this in a group setting and the first to respond were not actual participants. They intentionally gave the same incorrect answer. The real participants were in disbelief yet seventy-five percent of them also gave the same incorrect answer at least some of the time. Additionally, Asch was able to manipulate variables. For example, when there was one other person who gave the actual correct answer, participants were twenty-five percent less likely to "follow the crowd." While judging the lengths of line segments obviously doesn't relate in any real world way to conformity, the participants experienced a high level of experimental realism. They felt the same feelings that a person would feel in a situation where they had to choose to do what they felt was the right thing (or give the correct answer), or do what their peers were doing.

Another example of experimental realism is a study on competition and performance conducted by Triplett in 1898. He noticed that cyclists had stronger performances (faster times) when they competed against actual people rather than against a clock. He tested this by asking children to participate in a game where they raced in pulling a flag around a small course. Interestingly enough, Triplett learned a lot from observing the children who did not do as he had predicted. Some children were so overstimulated by the presence of others that they physically had difficulty pulling the flag. The experiment was high in experimental realism because the children were obviously very invested in the task and cared a lot about what they were doing. Triplett's results are consistent with contemporary studies that show that the presence of others can facilitate or disrupt performance. If it is a task that a person is comfortable with (as the cyclists were with biking), the presence of others facilitates performance. However, if it is a task that a person is inexperienced in or uncomfortable with, the presence of others can disrupt performance.

A more recent study on experimental realism is a study conducted in 1995 by MacDonald, Zanna, and Fong on how alcohol affects judgement. In this study, researchers had to convince people who had consumed very little alcohol that they were intoxicated. They did this by misleading the research participants in smell, taste, and sight. They sprayed the room before the participants entered to smell like alcohol. They poured tonic water from a bottle labeled as alcohol and placed a small bit of actual alcohol (which the participants were told was lime juice) on top. The participants thus tasted the alcohol strongly at first. The participants obviously felt a strong sense of experimental realism because they actually believed they were intoxicated though their blood-alcohol level was only minutely if at all changed.

The study showed that this group of participants, when compared to participants who were actually given alcohol and were intoxicated, was less likely to drive under the influence of alcohol.

While these previous examples show how experimentation can be useful and necessary in a study, experimenters still need to consider how to best establish high external validity. As we've shown, experimental realism can go a long way in helping to establish external validity. However, this is only true if the research participants are actually experiencing psychologically (feeling) what the researchers intend. This comes down to establishing construct validity. It is difficult to design a study to be high in experimental realism. There is no recipe for creating it, and often it has more to do with art than with science. It boils down to the ability to recognize the basic and key aspects of a psychological experience and then replicate it in a laboratory setting without adding anything or leaving anything important out.

The good news is that there are empirical ways to measure experimental realism. One such way is through manipulation checks. A manipulation check is a way to gauge if the research participant does feel the psychological condition the researcher intended. For example, if a researcher wants to study if the physical attractiveness of a person affects how helpful others are to them, they can ask the research participants to report how attractive they think the person is. If the researcher doesn’t want to reveal too much about the goals of the study, they can ask another group of people who won’t be in the actual study to rate the person’s attractiveness.
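
A manipulation check of this kind comes down to comparing ratings across conditions. The sketch below is an assumption-laden illustration: the ratings are invented, and the standardized mean difference (Cohen's d) is one common way to summarize the gap, not a statistic prescribed by the source.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Invented attractiveness ratings (1 to 9) from raters outside the main study.
ratings_attractive_version = [7, 8, 6, 7, 8, 7, 6, 7]
ratings_plain_version = [3, 4, 2, 3, 4, 3, 2, 3]

d = cohens_d(ratings_attractive_version, ratings_plain_version)
print(f"Mean rating, attractive version: {mean(ratings_attractive_version)}")
print(f"Mean rating, plain version:      {mean(ratings_plain_version)}")
print(f"Standardized difference (d):     {d:.2f}")
```

A large positive value suggests that raters did perceive the two versions differently, so the manipulation produced the intended impression. A value near zero would signal that the manipulation needs rework before the main study is run.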

There does seem to be one key ingredient in establishing experimental realism, a certain amount of deception. In most cases (such as the study on conformity) a researcher will need to tell the participant that they are participating in a different study than they are actually taking part in. There are varying degrees of deception, and a lesser form is when researchers don’t give incorrect information, but withhold the purpose of the study. The reason for this is that the participant needs a believable illusion so they can experience the psychological conditions that the researcher intends. It is necessary to note that more recently, the trend has shifted away from such deceptions and towards mundane realism. This may be how researchers have responded to people losing trust in the science of psychology.

How can you balance internal and external validity?

From our discussion thus far, it may seem that non experimental research is always higher in external validity than experimental research, and that experimental research is always higher in internal validity than non experimental research. This is not always the case. This fact is worth mentioning because the results of laboratory studies and field studies can contradict one another. In case of such a contradiction, it would be incorrect for a researcher to assume that the laboratory experiment is necessarily more internally valid than the field study.

Two studies (when compared to each other) illustrate this well. In 1979, Alloy and Abramson conducted an influential study on depression and a person’s sense of control. The results were later referred to as the “sadder but wiser” effect. Depressed and non-depressed students were asked to judge their control over a series of lights. While it may be assumed that the depressed students would underestimate their level of control, in fact their predictions were more accurate than the predictions of the non-depressed students. The non-depressed students overestimated their control of the lights. From this study, the researchers came to the conclusion that depressed people have a more accurate view of the world than non-depressed people.

Their study inspired a later study conducted in 1991 by Dunning and Story. They took the experiment outside of the lab and asked people to predict what would happen to them in the near future. For example, the participant could state, “I think four of the ten bad things you asked me about will occur in the future.” When Dunning and Story compared the predictions of depressed and non-depressed people with what actually came to happen, they found that the depressed people were more optimistic. The depressed people had underestimated the negative events of the future, while non-depressed people predicted the events fairly accurately. This second study contradicts the Alloy and Abramson study, which seemed to establish (through systematic laboratory studies) a link between depression and control. However, their study isn’t higher in internal validity if the link is actually between depression and optimism. It would also be incorrect to say that their study did not contribute important information. For one thing, the Alloy and Abramson study led to the Dunning and Story study. Secondly, it is important to remember that scientific studies work to get closer to the truth. There are probably real-world judgements that would more closely resemble the results that Alloy and Abramson saw.

Another comparison is laboratory studies conducted on the connection between temperature and aggression and field research conducted on the same thing. Lab studies have shown irregular results. Some studies have shown that higher temperatures cause increased aggression when the participant has gotten positive evaluations from the potential target of the aggression. Other studies showed that increased temperature causes increased aggression in a curvilinear way (only to a point). However, the evidence is much more clear when looking at the results of field studies. It has been shown that in Phoenix, Arizona (a very hot city) the rates of aggressive car horn honking increased in a measurable way with the temperature. Additionally, the driver was more aggressive if their car window was down. Cities with higher temperatures have a higher violent crime rate. Additionally those rates were even higher on the hottest days of the year while non-violent crime rates did not vary.

Another example is that a study has shown that a baseball pitcher is more inclined to hit the batter with the ball when the game is played on a hotter day. These two studies illustrate that a field study can be just as high in internal validity as a laboratory study. The nature of this study (temperature and aggression) is not subject to additional variables that may be problematic. For example, it is logical to assume that aggression does not lead to a higher temperature. Temperature is also easy to measure and separate from confounds typically found in nature.

If a researcher wanted to study self-esteem and positive social feedback, it would be more difficult to establish high internal validity in a field study. The problem of reverse causality would exist: does high self-esteem cause positive feedback from one’s peers, or does receiving positive feedback lead to high self-esteem? Additionally, naturally occurring confounds could be more complex to differentiate from the variables of interest. It is important to note that just as lab experiments do not guarantee high internal validity in all studies, field studies do not guarantee high external validity. The field study is also working with variables and should be tested for generalizability. This means that multiple studies should be done to make sure the results are applicable to a variety of people in a variety of situations.

Inconsistencies and contradictions between lab and field studies contribute to the overall wealth of knowledge on a topic and should be reviewed together to draw the most useful conclusions. That being said, it is still important to reiterate that laboratory experiments are usually preferred for establishing cause and effect relationships between the dependent and independent variables. We’ve discussed some exceptions, but laboratory experiments can provide valuable insight on the motivations behind what a researcher observes in field studies or in the real world.

One example is a 1987 study by Rule, Taylor, and Dobbs. They asked people to complete a story. One group of participants completed the task under hot working conditions while the other group completed the task under cool and comfortable working conditions. The study found that the group with the hot conditions completed the stories in a more aggressive way. This suggests that high temperatures may make aggressive thoughts more easily accessible. This insight would be difficult to gain through passive observation. For this reason, researchers should make use of multiple methods of research. Ideally, all paths would lead to similar results.

Which laboratory study guidelines can you use?

Pilot tests and manipulation checking

Researchers should conduct pilot studies. Human beings suffer from an optimistic bias and planning fallacy. Optimistic bias means that people are overly optimistic about outcomes in life. Planning fallacy means that people assume things will go better than they actually do. By preparing for things to go wrong, a researcher can plan for errors, which will make their study stronger. Pilot testing is a good way to conduct a mini experiment and work out any problems before conducting the full experiment. Pilot testing is especially necessary in complex experiments to help a researcher determine what is working and what is not working. When conducting a pilot test, researchers should build in a manipulation check. As we discussed before, this is a good way to get direct feedback from research participants in all conditions to see if they experienced the different variables that were intended. Even if a researcher skips the pilot test, they should conduct a manipulation check. If a study fails to produce the predicted effect, a manipulation check is still valuable because it shows whether the manipulation itself worked, which helps the researcher interpret the result. Even a result that contradicts a researcher’s hypothesis can be a valuable contribution to future studies.

Sometimes participants are reluctant or unable to report their psychological state. In these instances, a researcher will need to conduct a manipulation check in an indirect way. To do this, a researcher could ask judges to gauge the psychological state of the participant. In other instances, it may be necessary for the researcher to conduct a pilot test in order to see if the manipulation is effective. This may be the case if it is important that the participant be unaware of what the researcher is measuring. Once the researcher has established that the manipulation creates the desired psychological state, they can conduct the actual experiment with a different sample group.

Tips for experimenter and participant interaction

As an experimenter, you should rehearse what you are going to say before giving instructions and directions to research participants. Behave professionally and dress professionally. This encourages research participants to take you more seriously and see you as an authority figure. You should be polite and friendly. There may be times where a participant is rude or difficult. In these instances, a researcher should be firm in maintaining the design of the experiment but give the participant the opportunity to leave the study. You should always make notes on any difficult participants or if they do not follow your instructions. This may prove relevant to the outcome of the study. Know your experiment very well. This means you should be aware of how you want the experiment to go so you can know when it is not going well. At the same time, you need to be aware of experimenter bias. If the experiment corroborates your hypothesis, you want to be sure it is for the right reasons. If a researcher is too aware of the results they hope for, they may inadvertently guide participants towards that goal. For this reason, it is a good idea to have a script that you read your instructions from so that you can be sure both the control and variable groups are treated the same. If you are conducting a study that requires deceiving the participant, you’ll need to learn to be good at lying. If the participant becomes aware of what is actually being observed or measured, it could affect the outcome of the study. It is important that you carefully observe the participants and take detailed notes. Small details that may not seem important (such as difficulty or rudeness) may turn out to provide insightful information.

Replicate the study

Once you’ve completed your study, you should try to replicate it. This is necessary to rule out the possibility that artifacts are affecting the outcome of the study. It is important to remember that even if you don’t get the exact results you hope for, it doesn’t mean the hypothesis was wrong. It could be that there was something wrong with the experiment. By replicating experiments, a researcher can determine this information.

What are some new trends?

While researchers do still aim to confirm their hypothesis, researchers are increasingly interested in establishing boundaries for their theories. This means that aside from predicting how two variables are linked, researchers are searching for the conditions under which such a link or effect does or doesn’t exist. Another trend is that journals tend to only publish papers where a hypothesis has been confirmed through multiple experiments and ideally multiple methods of experimentation. Researchers can use laboratory experiments to establish a causal relationship and then test this further through field studies to establish external validity and real world relevance to address problems of artifacts.

What is quasi-experimental research in psychology? - Chapter 8

In this chapter we will discuss quasi-experiments. These are research experiments where the researcher does not have full control over the variables. This specifically means that conditions are not assigned to participants through random assignment. We will discuss different kinds of quasi-experiments with their specific advantages and disadvantages. Quasi-experiments are helpful in addressing research problems by giving researchers the chance to study phenomena they wouldn't be able to study otherwise. Additionally, they provide information on the external validity of laboratory studies and can help a researcher determine if the experiment is in need of revision. As with the non experimental and experimental research designs that we have already discussed, quasi-experiments can yield a lot of useful information. Quasi-experiments should also be used in conjunction with other methods to provide a better picture of the topic being studied.

Quasi-experiments can lead to issues in establishing high internal validity. These issues can be addressed if the researcher is able to test for confounds. Since the researcher doesn't use random assignment in quasi-experimentation, these studies are especially susceptible to person confounds. When these confounds are addressed, quasi-experiments are very useful because (unlike true experiments), they do not cause much debate about external validity. Since this is the case, researchers like to use quasi-experiments if they want to have high internal and external validity.

Also, in some instances a true experiment may not be feasible to conduct. One example is that while we know gender and sex are very important to how people behave and why they behave a certain way, it is not feasible to assign a person a gender. While it is important to study gender, researchers are not able to manipulate gender. In order to study important things that cannot be tested through random assignment (such as gender, natural disasters, etc.) researchers use quasi-experiments.

In other instances, it may be possible to conduct a true experiment but the researcher will choose to conduct a quasi-experiment to avoid using unaffordable equipment. There are also times where random assignment should not be used in an experiment for ethical reasons. Researchers could make great leaps in cancer research if they exposed participants to cancer and then observed them. However, this obviously crosses ethical boundaries. Psychological studies may study variables that negatively influence a person's sense of well-being. Researchers are also unwilling to expose participants to extreme psychological stress. Researchers may also study sensitive issues of aggression or racial prejudice. However, no researcher would be willing to manipulate a control group to develop racist attitudes. Quasi-experiments should be treated as a hybrid of experimental and non experimental research designs, the best of both worlds. They achieve this by crossing the borders that typically separate experimental and non experimental research. A quasi-experiment can allow researchers to measure a variable instead of manipulating it in a laboratory setting. A quasi-experiment can also let researchers observe an event that occurs naturally in the real world that functions in a similar way to an experimental manipulation.

Which two types of quasi-experiments can be used?

Person-by-treatment quasi-experiment

The two most often used quasi-experiments are person by treatment quasi-experiments and natural experiments. Person by treatment quasi-experiments are most often used by researchers in the fields of social, personality, and clinical psychology. In this type of study, a researcher measures at least one independent variable while manipulating at least one other independent variable. Researchers conduct this experiment in the laboratory and use random assignment to have control over at least one independent variable. The notable point is that researchers also measure at least one other independent variable. Researchers conducting studies on social or personality psychology may use pencil and paper to measure attitudes or personality. They could use this data to create two experimental groups. This is another way to say that researchers prescreen participants based on their individual differences. They then use the pretest results to determine at least two extreme groups (the highest and lowest scoring people on an individual-difference measure). Researchers use extreme groups to establish two groups where the members of the group are similar to each other but extremely different from the members of the other group. These two groups can be said to be qualitatively different from each other.
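
The structure of a person-by-treatment design can be sketched as follows. This is a minimal illustration, not a procedure from the source: the participant labels, the group names, and the `assign_within_groups` helper are hypothetical. The measured variable (the prescreened group) is fixed by who the participants are, while the manipulated variable (the condition) is assigned at random within each group.

```python
import random
from collections import Counter

def assign_within_groups(groups, conditions, seed=0):
    """Randomly assign conditions separately inside each measured group,
    so that a person's prescreen score cannot influence their condition."""
    rng = random.Random(seed)
    design = {}
    for level, members in groups.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, person in enumerate(shuffled):
            # Alternate conditions down the shuffled list for equal cell sizes.
            design[person] = (level, conditions[i % len(conditions)])
    return design

# Hypothetical extreme groups taken from a prescreening measure.
groups = {
    "high scorers": ["P1", "P2", "P3", "P4"],
    "low scorers": ["P5", "P6", "P7", "P8"],
}
conditions = ["condition A", "condition B"]

design = assign_within_groups(groups, conditions)
cell_counts = Counter(design.values())
for cell, count in sorted(cell_counts.items()):
    print(cell, count)
```

The result is a two-by-two layout with the same number of participants in every cell. Only the condition factor supports a strictly causal reading, because only that factor was randomly assigned.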

Researchers can also use a procedure called a median split. This is not a preferred method. In a median split, the researcher would divide the scores into two groups: participants who scored above the median score, and those who scored below the median score. The problem with this method is that participants who scored only slightly above or slightly below the median score are more similar to each other than they are to participants who scored far above or far below the median score. The median split method does not allow researchers to study these subtleties.
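The contrast between a median split and the extreme group procedure can be sketched in a few lines of Python. This is only an illustration: the participant labels, the scores, the function names, and the choice to keep the top and bottom 25% as the extreme groups are all assumptions made for the example, not values from the text.

```python
from statistics import median

def median_split(scores):
    """Split participants into low and high groups at the median score.

    Participants who score exactly at the median are left out of both groups.
    """
    mid = median(scores.values())
    low = sorted(p for p, s in scores.items() if s < mid)
    high = sorted(p for p, s in scores.items() if s > mid)
    return low, high

def extreme_groups(scores, fraction=0.25):
    """Keep only the lowest and highest scorers (fraction is an assumed cutoff)."""
    ranked = sorted(scores, key=scores.get)      # participants ordered by score
    k = max(1, int(len(ranked) * fraction))      # size of each extreme group
    return ranked[:k], ranked[-k:]

# Hypothetical prescreening scores, invented for illustration only.
scores = {"P1": 12, "P2": 48, "P3": 50, "P4": 52,
          "P5": 55, "P6": 91, "P7": 15, "P8": 88}

split_low, split_high = median_split(scores)
extreme_low, extreme_high = extreme_groups(scores)
print(split_low, split_high)
print(extreme_low, extreme_high)
```

In this made-up data set, P3 (50) and P4 (52) land in opposite halves of the median split even though their scores are nearly identical, while the extreme group procedure simply leaves both of them out.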

A critique of the extreme group procedure is that a researcher wouldn't really be studying the people in the middle (those who did not have extreme scores) at all. However, in most cases, it is reasonable for a researcher to assume that those people in the middle would respond to treatments in "medium amounts" in comparison to the people in the extreme groups. Generally, researchers believe that if an effect exists it can be found through using extreme groups. Conversely, researchers generally also believe that if an effect is not found through using extreme groups, then it is unlikely to be found in any middle group either. By this logic, researchers using extreme groups are able to use a smaller sample size. For practical reasons, this can be very helpful if the study is expensive to conduct or very complex. We should also note that true experiments tend to follow the same logic as the extreme group procedure. If a researcher is manipulating temperature in a laboratory setting to see if it affects aggression, they will most likely use a high temperature and a low temperature. Critics of studies using extreme groups are welcome to replicate the experiment and include a middle group.

In 1988, Swann, Pelham, and Chidester conducted a laboratory study on attitude change. Their aim was to change the attitudes of people who were highly confident that their attitude was right. This appeared to be a huge challenge because previous studies showed that people who are firm in their attitudes would be particularly resistant to changing their attitude. The researchers conducted the laboratory experiment with female students from the University of Texas. All the students had traditional views on gender roles and were divided into extreme groups of those highly certain of their viewpoints and those highly uncertain of their viewpoints. In other words, researchers did a prescreen where they measured the students' certainty. These students were then exposed to two experimental conditions. They were assigned to these conditions using random assignment, meaning that where their scores fell on the attitude certainty measure was irrelevant to which condition they were assigned to. In the first experimental condition, the students were asked leading questions that encouraged them to have more liberal viewpoints. For example, "Why do you think women are better supervisors than men are?" In the second experimental condition, the students were asked leading questions that encouraged them to have more conservative viewpoints. For example, "Why do you think men always make better supervisors than women?" The researchers predicted that the students who were extremely certain of their viewpoint would need to make it clear that they did not agree with the extremely conservative questions, and that when they were surveyed they would report a more extreme attitude change from conservative to liberal.

The study did in fact corroborate the researchers' hypothesis. Among the women who were uncertain of their opinions, those exposed to the leading liberal questions became more liberal, and those exposed to the leading conservative questions didn't change. However, the women who were very certain of their opinions showed the opposite pattern. Those exposed to the leading liberal questions showed little change, while those exposed to the leading conservative questions became more liberal. This is a fairly common pattern in research design. Researchers expect different people with different attitudes and personalities to respond differently to different treatments. This is why this procedure is called a person-by-treatment quasi-experiment.

Another hypothetical study using the person-by-treatment quasi-experiment technique would be a study on how people's views of themselves affect who they choose to interact with. It seems obvious that people would prefer to interact with others who give them positive feedback. However, according to self-verification theory this is not always the case. Self-verification theory says that a person's preference for positive or negative feedback depends on the feedback and on how they view themselves. This is a self-consistency theory. Self-verification theory predicts that if a person has a negative self-view they would prefer to interact with other people who give them negative feedback. The theory essentially states that people prefer to interact with other people who view them the same way they view themselves, even if that view is negative. Less surprisingly, people with positive self-views prefer to interact with other people who view them positively.

Researchers do not have experimental control over the independent variable that they measured, so it is challenging to establish a causal relationship. For example, in the study on attitude change, the participants could have been stubborn (an individual-difference variable), which would have affected the results of the study. This is an issue of person confounds. Many topics that researchers are interested in studying present a challenge in that there may be limitless third variables at work. For example, a study on self-esteem would need to take into consideration variables like age, gender, socioeconomic status, health, depression, professional success etc. Researchers should consider these confounds to see if they may be affecting the research findings. Not only does the person confound pose a problem, operational confounds can also be found in person-by-treatment quasi-experiments. For example, a measure of self-esteem could actually be measuring age or another variable. Researchers will need to consider this as well. This goes back to the inductive problem that we discussed earlier.

Natural experiments

In natural experiments, a second type of quasi-experiment, researchers give up all experimental control over variables. This differs from the person-by-treatment quasi-experiment, in which researchers use random assignment for at least one variable and measure at least one other variable. In natural experiments, researchers do not use random assignment at all. Natural experiments focus on naturally occurring manipulations. Real life events often randomly assign themselves to people (being struck by lightning, winning the lottery). Rare occurrences would be difficult to convert to a quasi-experiment, but other random events, natural disasters for instance, do happen often enough that researchers have a large sample group to work with.

For example, tornados are common to the mid-western United States. However, tornados are also unpredictable in that they can flatten several houses and leave the one next door perfectly intact. If researchers wanted to study if there were higher rates of anxiety or depression in the people who had the intact houses, they would treat tornado exposure as the independent variable and anxiety and depression as dependent variables. Natural manipulations differ from true experiments since they are not absolutely random. Tornados are more likely to occur in certain states than in other states. Researchers would need to limit the geographical location they are studying since comparing tornado victims from one state with non victims from another would be problematic. Researchers can measure existing confounds (the differences between people who are exposed to a certain level of natural manipulation versus other people who are exposed to a different level of the natural manipulation). Researchers could then use statistical analysis to make groups equal that would normally be different.

Natural versus non experiments

The differences between natural experiments and non experiments can sometimes be difficult to recognize. This boils down to how arbitrary natural manipulations are. The boundaries between natural experiments and non experiments blur when researchers study naturally occurring events that can be categorized but are actually arbitrary. For example, in the judicial system there is great importance placed on a defendant’s age. If the defendant is less than eighteen years of age, they are considered a juvenile, but once they turn eighteen, they are considered an adult. Most researchers would agree that there is little difference between someone who is two days younger than their eighteenth birthday versus someone who is two days older than their eighteenth birthday. In this case, the categories of juvenile and adult are arbitrary. Another point of confusion is that archival studies of real-world events can qualify as natural experiments. For example, tornado records can be found in archival sources. For this reason some archival studies can qualify as quasi-experiments.

What are some other quasi-experiments that can be used?

Don’t forget that the two quasi-experiments that we’ve discussed in depth do not represent all potential quasi-experiments. Other settings will require a different quasi-experiment. One such quasi-experiment design is the opposite of the person-by-treatment design. This is called the natural groups with experimental treatment design. In this design, researchers take two naturally occurring groups of people (people who have sorted themselves into groups) and treat them differently in a systematic way. When using this quasi-experimental research design, researchers need to be sure that the two groups were very similar and experienced different levels of the experimental treatment. Otherwise, the findings would not be useful. This design follows the same ideas of experimental research, except the two groups were not randomly assigned. They were preexisting in nature or the real world.

An example of this would be a business that changed the health insurance for some employees while other employees kept the previous insurance. Researchers could study the differences in health between the two groups of employees. In this scenario, a researcher would not have the liberty to determine which employee receives which insurance. However, the researcher can still make careful observations and measurements. It is rare for a researcher to predict and control natural manipulations, such as the path of a tornado. However, certain natural manipulations are easier to predict and control. These can be called nature-by-treatment studies. In 1997, Pyszczynski, Greenberg, and Solomon designed a study to see how people deal with the notion of death. The researchers approached two groups of people, those who could see a funeral home nearby, and a control group who could not. The researchers then asked the participants to judge a person who either criticized or praised the American lifestyle. In this study, seeing or not seeing the funeral home was the natural manipulation. The study found that people were more critical of a person who criticized the American lifestyle when they could see the funeral home. Essentially, when a person is faced with their own mortality, they are more inclined to defend their beliefs.

Obviously, quasi-experiments come in many forms since they blend elements of true experiments and non experiments. There are still pitfalls to consider since there are constraints to conducting such quasi-experiments. Comparability poses a problem since researchers need to locate comparable and non-overlapping groups. Consider a hypothetical situation: a natural disaster such as a meteor striking a rural community and destroying the farm crops, thereby devastating the local economy. Researchers can agree that this is a freak occurrence and appears to be completely random. Unlike tornados, a meteor can strike anywhere. However, if researchers later visit the small rural community to study post-traumatic stress disorder, they would have a problem of establishing comparability since it would be difficult to find a comparison group. Comparisons are necessary because they give the new information more meaning, since numbers are useful relative to other numbers. However, in nature or the real world, researchers can have a hard time finding two groups to compare that are similar in every way except for the variables the researcher wants to study. That’s why random assignment is so useful in laboratory experiments. On the other hand, laboratory experiments can be problematic as well. If a researcher is interested in studying a very unique or unusual group of people such as people who have an extreme phobia of something, it can be equally difficult to find a comparison group.

Since it is so challenging to find comparison groups for many quasi-experiments, researchers have resorted to patched-up designs. The Latin phrase ceteris paribus (which means all else being equal) is good to keep in mind when thinking about the purpose of true experiments. In true experiments, researchers keep everything equal except the specific variable manipulations they want to study by using random assignment and procedural controls. These methods deal with issues of person confounds and procedural confounds. Since this is not possible in quasi-experiments, researchers developed a method called patching. In this method, researchers introduce conditions in a study to determine the size of the quasi-experimental effect and/or to test for confounds. This results in a patched-up design where researchers add control groups throughout the study to address specific issues that come up in the course of the study. Researchers in this research design have to think beyond the groups or conditions of interest to them. When a researcher’s hypothesis still holds true after the researcher has subjected it to many different patches, the researcher can have more confidence that their study has scientific credibility.

What is a one-group design?

As an example, consider the following fictional study on teaching methods. Imagine that two researchers (who are both also teachers) differ on the use of study guides. Professor A thinks that study guides will help students improve their test scores because students will be better able to prepare and plan for the midterm and final exams. On the other hand, Professor B thinks that study guides will only encourage students to not attend lectures because they will be overly reliant on the guides. Let’s say that in this hypothetical scenario, Professor A decides to run a study-guide efficacy study using a one-group design. Professor A posts study guides for his course online so that students can download and use them before the midterm and final exams. After the students take the exams, Professor A calculates that the class average score was 90 out of 100 points. This is called a one-group design because all of the participants in the study comprise only one group and all of them experienced the quasi-experimental treatment. As we’ve discussed before, an experiment such as this one without a control group is a pseudo-experiment. In this study, Professor A’s results are not useful because there are no other scores for him to compare them with.

What is a one-group pretest-posttest design?

To set up a comparison, Professor A can patch in a one-group, pretest-posttest design. In this study, Professor A could provide study guides for the final exam but not for the mid-term. He could then compare the students’ mid-term test scores with their final test scores to see if the study guides were effective. Let’s say that Professor A’s results show that the students’ mid-term test scores averaged 82, while their final test scores averaged 90. This seems to show that the study guides were effective. While the one-group, pretest-posttest design is more informative than the one-group design, it is still possible that there were other influences on the test scores. Maybe the students tested better after they had become more comfortable with Professor A’s teaching style. Maybe they learned from taking the midterm exam (testing effects). Maybe Professor A engaged in experimenter bias and treated the students differently after the midterm exam.

What is a posttest-only design with nonequivalent groups?

To address these alternative explanations, Professor A could then patch in a posttest-only design with nonequivalent groups. Professor A could compare the final exam scores of the current group of students with the final exam scores of a previous group of students who had not received study guides. In this scenario, we would need to assume that Professor A teaches the same course regularly in the same manner and that the tests ask the same questions. Let’s say that the results for this patched-in study show that the current group of students who did have access to study guides scored 90 out of 100, while the previous group of students without access scored 80 out of 100. This study gives more information than the previous two, but could still leave room for confounds.

What is a pretest-posttest design with nonequivalent groups?

Perhaps Professor A teaches the current class in the morning, and the previous class was taught in the afternoon. Professor A wonders if students who enroll for a morning class are by nature different from those who enroll for an afternoon class, meaning they are more studious to begin with. Professor A could address this treatment confound through matching. We have said that random assignment is preferable to matching. However, since the students sorted themselves (into morning class and afternoon class), matching can still provide Professor A some new insights. In a patched-in pretest-posttest design with nonequivalent groups, Professor A could test to see if the two classes were the same or not. To determine if the two classes are well-matched, Professor A can compare both sets of mid-term test scores with both sets of final test scores. Both classes should perform similarly on the mid-term. The current group of students should perform better on the final exam since they had access to the study guides. Although Professor A’s results reflect this hypothesis, the study could still be open to criticism. Through the use of patched-in designs, Professor A’s hypothesis has much more credibility than when he began with the one-group study. However, since people are different, and the two classes were not randomly assigned, there is a small chance that one class was predisposed to do better. This is a typical problem that researchers using quasi-experimental designs run into.
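The three comparisons Professor A has patched in so far can be written as simple arithmetic. The sketch below uses the averages given in the example (82 and 90 for the current class, 80 for the previous class's final exam). The previous class's mid-term average is not given in the example, so the value 81 is a hypothetical placeholder, and the function names are invented for illustration.

```python
def pretest_posttest_gain(pretest_mean, posttest_mean):
    """One-group pretest-posttest design: change within the same class."""
    return posttest_mean - pretest_mean

def posttest_only_difference(treated_posttest, comparison_posttest):
    """Posttest-only design with nonequivalent groups: gap between two classes."""
    return treated_posttest - comparison_posttest

def difference_in_gains(treated_pre, treated_post, comparison_pre, comparison_post):
    """Pretest-posttest design with nonequivalent groups.

    Compares how much each class changed from mid-term to final exam.
    """
    return (treated_post - treated_pre) - (comparison_post - comparison_pre)

# Averages taken from the example.
current_midterm, current_final = 82, 90
previous_final = 80
# HYPOTHETICAL value: the example does not report this average.
previous_midterm = 81

gain = pretest_posttest_gain(current_midterm, current_final)
gap = posttest_only_difference(current_final, previous_final)
extra_gain = difference_in_gains(current_midterm, current_final,
                                 previous_midterm, previous_final)
print(gain, gap, extra_gain)
```

Each function answers a slightly different question, which is why each added patch rules out a different alternative explanation. None of them can show that the two classes were equivalent to begin with.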

What are time-series designs?

Researchers address this problem through time-series designs. In these designs, researchers gather additional data to show that their groups are the same before the quasi-experimental treatment/manipulation happens. In the case of Professor A, he could give weekly exams. If his hypothesis holds true, the weekly exam scores of both classes before he posted the study guides should be similar and the current class should test better after they accessed the study guides. Let’s assume that the study results reflect this hypothesis. This is still not enough to conclude that study guides improve grades because they help students prepare for the test. It is still possible that study guides improve grades but for an entirely different reason. Maybe students are more engaged in a course because they think Professor A is an invested teacher since he took the time to post study guides. This could have led students to ask more questions in class and otherwise participate more. To address this concern, Professor A could use internal analysis to locate the exact reason behind his results.
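The logic of the time-series check can be sketched as follows: the average gap between the two classes should be near zero before the study guides are posted and clearly positive afterwards. All weekly scores below are invented purely to exercise the functions, and the function names and the treatment_week parameter are assumptions made for this illustration.

```python
from statistics import mean

def mean_gap(treated, comparison):
    """Average week-by-week difference between two classes' exam scores."""
    return mean([t - c for t, c in zip(treated, comparison)])

def time_series_check(treated, comparison, treatment_week):
    """Return the mean gap before and after the treatment.

    treatment_week is the index of the first weekly exam taken after the
    study guides were posted.
    """
    pre = mean_gap(treated[:treatment_week], comparison[:treatment_week])
    post = mean_gap(treated[treatment_week:], comparison[treatment_week:])
    return pre, post

# HYPOTHETICAL weekly class averages, not taken from the text.
current_class = [80, 82, 81, 90, 91, 89]    # study guides posted after week 3
previous_class = [81, 81, 81, 80, 82, 81]   # never received study guides

pre_gap, post_gap = time_series_check(current_class, previous_class, treatment_week=3)
print(pre_gap, post_gap)
```

A pre-treatment gap near zero supports the claim that the classes started out alike. A post-treatment gap that appears only after the guides were posted is what Professor A's hypothesis predicts.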

How can you do internal analysis?

In internal analysis, researchers divide groups into smaller subgroups to test for differences. In the case of Professor A, we can say that a student needs to view the study guide in order for his hypothesis to hold true. If all students need in order to do better on the exam is to know that the study guide exists, they don’t actually need to access it to improve their exam scores. This is an internal comparison between students in Professor A’s current class who did or did not access the study guides. By patching in an internal comparison study, Professor A again lends credibility to his hypothesis (if the results corroborate it). Of course this doesn’t mean that the hypothesis is without a doubt correct. There are still potential explanations that something other than the quasi-experimental treatment he intended caused the results. The purpose of this lengthy example is to show that patching in designs is often necessary to address confounds and other variables that may be affecting a research study. However, in most cases, it is impossible to conduct the full number of necessary patch-in studies. This may be because there is no pretest data. Sometimes a researcher can’t find a control group to compare to the group that experienced the quasi-experimental treatment. Researchers should aim to conduct patch-in studies to address the larger issues of their hypothesis.
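An internal comparison of this kind amounts to splitting one class into subgroups and comparing their averages. The sketch below assumes each student record is a pair of (accessed the guide, final exam score). The records themselves are invented for illustration and the function name is hypothetical.

```python
from statistics import mean

def internal_comparison(students):
    """Compare mean final scores of students who did or did not access the guide.

    students is a list of (accessed_guide, final_score) pairs.
    """
    accessed = [score for used, score in students if used]
    not_accessed = [score for used, score in students if not used]
    return mean(accessed), mean(not_accessed)

# HYPOTHETICAL records from the current class, not taken from the text.
students = [(True, 94), (True, 92), (True, 90),
            (False, 84), (False, 80), (False, 82)]

accessed_mean, not_accessed_mean = internal_comparison(students)
print(accessed_mean, not_accessed_mean)
```

If students who never opened the guide scored just as well as those who did, the improvement would more plausibly come from something other than studying the guide itself. Because students chose for themselves whether to access the guide, this comparison is still open to person confounds.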

What is meant by the name-letter effect?

In studies conducted in 1985 and 1987, Josef Nuttin documented the name-letter effect. This is the effect where people tend to like letters that exist in their name more than they like letters that do not exist in their name. This is especially the case with a person’s first and last initials. Nuttin believed that this happened because people like things that they associate with themselves. Nuttin saw the name-letter effect as a type of unconscious narcissism. In 2002, Pelham, Mirenberg, and Jones came up with the hypothesis that if people strongly prefer the letters in their names, they would make life decisions based on these preferences. For example, people would be attracted to other people with the same name. A person named Frank may be inclined towards the profession of firefighter. The researchers labeled this implicit egotism. They tested their theory of implicit egotism by coming up with pairs of names that strongly resembled the names of cities: Jack and Phillip (Jacksonville and Philadelphia) and Virginia and Mildred (Virginia Beach and Milwaukee). The researchers chose the names in a systematic way and made sure that they were equally popular names for a particular age group. The researchers then looked at death records to see how many people by these names had died in the respective cities. The archives showed that there were a disproportionate number of people named Jack who had passed away in Jacksonville and people named Phillip who had passed away in Philadelphia. The same was true for Virginia and Mildred. However, this doesn’t necessarily prove the hypothesis. There is also the issue of reverse causality. It is also possible that living in Philadelphia makes a parent more inclined to name their child Phillip. Essentially, it is not clear whether people chose specific cities because the city names resembled their own names or whether parents living in a city are more inclined to give their child a name that resembles the name of that city.
The researchers could conduct a patch-in study to see how many of these people had moved to the cities where they died. This would be a form of a pretest-posttest design. However, this data was not available because the records did not include birthplaces.
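What "disproportionate" means here can be made concrete by comparing an observed count with the count expected if names and cities were unrelated. The sketch below uses the standard expected-count formula for a two-way table (row total times column total divided by the grand total). The counts are invented for illustration and are not the figures from the study, and the function names are hypothetical.

```python
def expected_count(name_total, city_total, grand_total):
    """Count expected in one cell if name and city were independent."""
    return name_total * city_total / grand_total

def observed_to_expected_ratio(observed, name_total, city_total, grand_total):
    """Values above 1 mean the name is over-represented in that city."""
    return observed / expected_count(name_total, city_total, grand_total)

# HYPOTHETICAL death-record counts, invented for illustration only.
counts = {
    ("Jack", "Jacksonville"): 60, ("Jack", "Philadelphia"): 40,
    ("Phillip", "Jacksonville"): 40, ("Phillip", "Philadelphia"): 60,
}

grand_total = sum(counts.values())
jack_total = sum(n for (name, _), n in counts.items() if name == "Jack")
jacksonville_total = sum(n for (_, city), n in counts.items() if city == "Jacksonville")

expected = expected_count(jack_total, jacksonville_total, grand_total)
ratio = observed_to_expected_ratio(counts[("Jack", "Jacksonville")],
                                   jack_total, jacksonville_total, grand_total)
print(expected, ratio)
```

A ratio above 1 only describes the association. It cannot say whether people moved to cities that resemble their names or whether parents named their children after the city.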

Pelham, Mirenberg, and Jones conducted another patch-in study that addressed the concern in a different way. While it may be possible that a parent would name their child Jack if they live in Jacksonville, a parent would be unlikely to change their surname. The researchers replicated their study focusing on surnames instead of first names. The new study found that people whose surnames began with Cal were more likely to live in California and people whose surnames began with Tex were more likely to live in Texas. As an added benefit, both Texas and California had similar proportions of white and Latino residents. This helps avoid the concern of a racial confound. The researchers conducted other patch-in studies where they controlled for age and ethnicity through other means, all with similar results. However, the results of the study are still open to criticism since it is impossible for researchers to control for all confounds. This is especially problematic because confounds of age, gender, and ethnicity can pose serious problems to the credibility of a study. Pelham, Mirenberg, and Jones addressed this by conducting another patch-in study using birth dates instead of names. While it is debatable how random being given a certain name is, dates of birth more closely resemble random assignment. The researchers looked at people with birthdays such as 02/02 (February 2) or 03/03 and compared them to cities whose names began with Two or with Three etc. They were lucky to find cities named Two Harbors, Three Forks, and Five Points. When they looked at the death records for these cities, they also found that a disproportionately high number of people with those birthdays chose to live in those cities. Additionally, the researchers studied people’s names and compared them to their professions. The results also corroborated their hypothesis. For example, people whose names began with the letter “D” were more likely to be doctors or dentists.
In this study, reverse causality does not pose a threat. Obviously, while people are given names at birth, they choose their profession later in life.

What is the relation between internal analysis and the name-letter effect?

Pelham, Mirenberg, and Jones also applied internal analysis to their studies. They discovered that people with very unique or uncommon names exhibited a stronger degree of the name-letter effect. This result further supports their hypothesis because it stands to reason that if a person feels their name is special rather than common, they would have an even stronger preference for the letters of their name. Further internal analysis patch-in studies show that women more strongly prefer the letters in their first name as compared to men. This can be attributed to the fact that many women change their surnames when they marry. As with the fictional example of the study on teaching methods, patch-in studies will not make the findings of a quasi-experiment foolproof but can go a very long way in addressing confounds.

What is the relation between quasi-experiments and validity?

As illustrated by the earlier examples, it is difficult to identify causes through quasi-experiments. True experiments or laboratory experiments tend to be higher in internal validity. The earlier examples (teaching methods and the name-letter effect) illustrate situations where the use of patch-in studies helped to corroborate the researchers’ hypotheses. Of course there are also many cases where the findings of the patch-in studies contradict a researcher’s hypothesis. However, if researchers can use quasi-experimentation to address issues of internal validity, their studies have some advantages over true experiments. Since quasi-experiments focus on real-world outcomes they are obviously applicable to the real world, whereas laboratory experiments can be difficult to generalize to the real world. Additionally, quasi-experiments are not only useful in testing theories, but they can test theories in a way that is impossible to do in a lab. For example, researchers can’t conduct experiments on the effects of a tornado within the confines of a laboratory. There are times when quasi-experiments are the only way to properly test a theory. Researchers should use quasi-experiments and true experiments to test the same thing at once. When the results are similar, researchers can be confident that their lab experiment will generalize well to the real world and that their quasi-experiment has some degree of internal validity. When the results differ, researchers can learn a lot from figuring out why this happened.

What is the difference between experimental and quasi-experimental research on self-esteem?

An example of a situation where true experiments and quasi-experiments provide differing information would be a study on self-esteem. In the true experiment method of researching self-esteem, researchers would use random assignment to determine two groups and temporarily manipulate a person’s self-esteem. One group of participants may receive self-esteem enhancing positive feedback while the other group receives self-esteem diminishing feedback. This should result in high and low self-esteem respectively. The researcher would then judge the behavior of both groups.

To be more specific, let’s suppose the researcher told one group of participants that they scored very high on a test of social perceptiveness and told the other group that they scored very low. The researcher then asks the participants to take another test on creativity. After this test, all the participants (both groups) are told that they scored very low. The dependent measure would be how many excuses the participants make for their poor performance. The assumption would be that the group that was given positive feedback, and should therefore have high self-esteem, would make fewer excuses, since past research has shown that when people are given positive feedback they become less defensive. In this true experiment on self-esteem, the results would suggest that low self-esteem leads people to make excuses. However, this apparently reasonable conclusion would be incorrect. Quasi-experiments that have measured rather than manipulated (artificially created) self-esteem have shown the opposite, that people of high self-esteem make more excuses than people of low self-esteem.

The 1947 study conducted by Bettelheim can help explain this contradiction. Bettelheim conducted case studies of Jews who were imprisoned in Nazi concentration camps. His studies showed that while the prisoners initially resisted the insults and abuse of their captors, they eventually began siding with the guards and even developed anti-Semitic attitudes. He called this “identification with the aggressor”. Bettelheim’s studies suggest that the way people initially respond to criticism or abuse is very different from the way people respond when subjected to long-term criticism or abuse. This is a possible explanation for the contradictions we see in the two studies on self-esteem. The true experiment and the quasi-experiment were measuring two different things. The true experiment measured short-term exposure to criticism, while the quasi-experiments measured the results of long-term exposure. Other research on self-esteem has also shown that self-esteem manipulations are unlikely to actually create low self-esteem even though they can affect self-evaluation. This is because people usually have a sense of who they are and this sense doesn’t fluctuate from moment to moment. Swann’s 1987 study suggests that after people receive negative feedback, they try to restore their level of self-esteem to what it was before. Ultimately this is a question of construct validity and conceptual validity. In this case, the quasi-experiment gives more valid information because self-esteem is difficult if not impossible to manipulate in a true experiment. However, both experiments bring researchers closer to understanding the larger picture of self-esteem since the contradictions can point to areas that should be further studied.

What is the difference between experimental and quasi-experimental research on image appeals?

Another example, on image appeals, shows that experimental and quasi-experimental research can also produce the same results. In 2001, Blanton, Stuart, and VandenEijnden tested their theory on image appeals using both methods. An image appeal is a persuasion technique intended to change a person’s attitude by taking advantage of the fact that people want to avoid negative images and adopt positive ones. This technique is common in advertising: commercials depict a certain brand as “cool” so that people will buy that brand to appear “cool.” Blanton, Stuart, and VandenEijnden wanted to know whether image appeals that capitalize on negative images are as effective as those that capitalize on positive images. The researchers predicted that image appeals are more effective if they deal with the consequences of being different. In a true experiment, the researchers used random assignment to create two groups of participants. Each group read one of two studies. The first group read a study that said most sexually active students use condoms. The second group read a study that said most sexually active students do not use condoms. The experiment showed that positive image appeals (“Be responsible”) were most effective when participants believed that their peers were not using condoms, and negative image appeals (“Don’t be irresponsible”) were most effective when participants believed that their peers were using condoms. In the true experiment, the researchers manipulated norm perception. In the follow-up quasi-experiment, the researchers measured norm perception by asking participants to rate the prevalence of condom use among their sexually active peers.

The results were the same in measuring norm perception as in manipulating norm perception. The fact that these results are consistent gives the researchers more confidence that their hypothesis is correct.

How do you determine the best research approach? - Chapter 9

In this chapter we will discuss basic research designs often used in experimental and quasi-experimental research. To begin with, there are two basic types of designs: one-way designs and factorial designs. One-way designs have only one independent variable, while factorial designs have two or more independent variables. There are also two basic ways to approach designing a research study: between-subjects manipulations and within-subjects manipulations. These ideas are applicable to both non-experimental and experimental research methods.

What are single variable designs: one-way designs?

The most basic type of experimental design is the one-way design, which has only one independent variable. The simplest one-way design is the two-group design, in which the single independent variable has only two levels. This usually means that a two-group design has an experimental group, which receives the treatment, and a control group, which does not. This is the most basic form of research. It is important to note that researchers often do not firmly adhere to the use of an experimental group and a control group. A researcher may instead use two experimental groups that receive different levels of treatment. For example, a researcher who is interested in studying the effectiveness of a drug may give two different groups two different dosages and observe the results after they perform a task. If the drug is a pain reliever, the researcher may ask the participants to place their hand in cold water (this is called a cold-pressor task) and rate their pain. In this scenario, it is not necessary for the researcher to use a group that did not take the drug at all.

The one-way, multiple-group design is another type of one-way design. This design also has only one independent variable, but unlike the two-group design, the independent variable has three or more levels. Compared to two-group designs, this type of design can provide more detailed information. In the example of the drug effectiveness study, the researchers could include additional groups: a control group could receive a placebo, and additional groups could receive dosages that differ by smaller increments than in the two-group study. This would allow researchers to gauge the drug’s effectiveness with more sensitivity. Studies using more than two groups are especially useful when the researcher is studying person perception. For example, if a researcher wants to see whether people judge others by their appearance, the researcher may provide three different groups with exactly the same description of a person’s personality but vary how attractive the person is. Past studies have identified what is called the physical attractiveness stereotype.

In 1991, Eagly, Ashmore, Makhijani, and Longo conducted a study showing that people assume that physically attractive people are happier, better adjusted, and more sociable than other people. When researchers include more than two groups in person perception studies, they can detect patterns that a two-group study would miss. For example, rather than only studying the perception of attractive and unattractive people, they can also study the perception of moderately attractive people. We should note that using additional groups requires more planning, resources, and time. For this reason, researchers usually begin with two groups (people who score either high or low on the scale the researcher is interested in). If the two-group study results in a documented effect, the researcher will then incorporate additional groups to study more levels of the independent variable. This allows the researcher to save time and money. As we’ve discussed before, this also makes theoretical sense, because if an effect does not occur at the high and low ends of the spectrum, it may not occur in the middle either. Of course there are exceptions, and for this reason researchers will use additional groups when they deem it necessary.

What are two or more independent variables: factorial designs?

The basics of factorial designs

In factorial designs a researcher creates every possible combination of all the levels of all the independent variables. This design was made popular by R.A. Fisher and is useful since human behavior is usually a result of multiple variables working together. Unlike one-way designs, these designs contain two or more independent variables. Additionally, these variables are crossed completely. This means that every level of every independent variable appears in combination with every level of every other independent variable.

Factorial designs are labeled based on two criteria. The first criterion is the number of independent variables in the design. The second criterion is the number of levels each independent variable has. For example, a 2 X 3 factorial design has two independent variables (because there are two numbers), with the first independent variable having two levels and the second having three levels. When referring verbally to a 2 X 3 factorial design, the X is read as “by,” so this is a “two by three” factorial design. As another example, a 2 X 2 factorial design has two independent variables, each with two levels. By multiplying the numbers in the notation, a researcher can also tell how many cells, or unique conditions, are in a study. For example, a 2 X 3 factorial study has six cells, or six unique conditions, and a 2 X 2 X 3 factorial study has twelve cells. This system of notation is also helpful to researchers because it indicates the kind of statistical analysis that should be conducted. A 2 X 2 factorial design needs a two-way analysis of variance (ANOVA) since there are two independent variables in the study. A 2 X 2 X 2 factorial design needs a three-way ANOVA since there are three independent variables, and so on.
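
The counting rule above can be illustrated with a short Python sketch. It is only an illustration, not part of the original text: the function names and the example variables ("drug" and "time_of_day") are hypothetical and chosen purely to show how the cells of a fully crossed design follow from the levels of each independent variable.

```python
from itertools import product
from math import prod


def design_cells(factors):
    """Return every cell (unique condition) of a fully crossed factorial design.

    `factors` maps the name of each independent variable to its list of levels.
    """
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def describe_design(factors):
    """Return (notation, number of cells, ANOVA label) for a factorial design."""
    level_counts = [len(levels) for levels in factors.values()]
    notation = " X ".join(str(n) for n in level_counts)
    return notation, prod(level_counts), f"{len(level_counts)}-way ANOVA"


# Hypothetical 2 X 3 design (variable names are assumptions for illustration).
example = {
    "drug": ["placebo", "pain reliever"],
    "time_of_day": ["morning", "noon", "evening"],
}

cells = design_cells(example)
summary = describe_design(example)

print(summary)
for cell in cells:
    print(cell)
```

Each printed cell pairs one level of the first variable with one level of the second, which is what "completely crossed" means. Adding a third variable to the dictionary would multiply the number of cells by that variable’s number of levels.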

This may seem very complex, but for most purposes psychological researchers focus on the simplest form, the 2 X 2 factorial design. Researchers do this because a smaller study is more manageable and pragmatic. While it may seem that a large factorial study would yield the most specific and informative results, researchers usually do not have the resources to conduct such large studies. Additionally, it is often unnecessary to include such minute manipulations, and including them does not generally yield much more informative results. Researchers also prefer to refine their studies through qualification, in which they try to identify the conditions under which a theory holds. 2 X 2 factorial studies help in this endeavor because they make it easier to show that the effect of an independent variable on a dependent variable depends on the level of a second independent variable. When this dependence is shown, the two variables are said to interact. We’ll discuss interactions in more detail later. Researchers can test for interactions with any factorial design, but because it is difficult to track many different qualifications of an effect at once, they tend to use 2 X 2 factorial designs, which are the most efficient way of determining when an effect is (or isn’t) occurring.

In general, researchers tend to prefer simple designs over complex designs because they usually want to test predictions that are derived from more than one theory at a time. Since most theories have one or two main principles, researchers tend to make predictions using one or two variables. Researchers can conduct simple factorial studies whose findings will be useful to more than one theory because the same manipulation will have different effects on the outcome. Another reason is that most theories follow the “If X, then Y” format. Such cause-and-effect statements usually require only one manipulation to garner support for the theory and a second manipulation to qualify the first. Anything more complicated would normally amount to a different theory. In short, researchers typically have simple theories, so they prefer to use simple factorial designs to test them.

When to use factorial designs instead of one-way designs?

Since factorial designs have two or more independent variables, they can address questions that one-way designs can’t handle. Factorial designs let researchers study more than one independent variable at the same time. The easily detected effects of individual independent variables are called main effects. These are the effects that would normally also be seen if a researcher conducted a simple one-way design. In a factorial design, a main effect is the overall effect of one independent variable, averaged across all levels of the other independent variable(s). Main effects are straightforward because they only involve one variable. For example, in 1991, Greenwald, Spangenberg, Pratkanis, and Eskenazi conducted a 2 X 2 experiment to study how expectancies affect memory. The researchers gave each participant one of two tapes with subliminal messages to listen to. The first tape had messages intended to improve self-esteem. The second tape had messages intended to improve memory. This was the first manipulation. The second manipulation was how the tapes were labeled. Half the tapes of each type were labeled incorrectly (e.g., as memory boosting when they were actually self-esteem boosting).

Essentially, regardless of which tape they received, half the participants thought they were listening to messages that would boost self-esteem and the other half thought they were listening to messages that would boost memory. Greenwald et al.’s study yielded one effect: the main effect of the tape labels. There was no main effect of the actual content of the tapes. This meant that content had no effect on whether people reported improved memory. However, what people expected from the tapes because of how they were labeled dramatically affected whether they perceived an improvement in their memory. The researchers were able to show an expectancy effect in the very participants and context in which there was no supporting evidence for subliminal messages. This gives us more confidence in the results, because any person confounds or other idiosyncrasies of the participants would have affected both variables. If the researchers had conducted two separate one-way designs and found the same results, we might be more skeptical. Additionally, the 2 X 2 design allowed the researchers to show that the expectancy effect occurred regardless of whether the participants were in a condition whose content matched their expectation. This shows that expectation effects are independent of message content effects.

Interactions

Unlike one-way designs, factorial designs allow researchers to see if statistical interactions are present. An interaction is present when the effect of one independent variable on a dependent variable depends on the level of a second independent variable. In 1988, Gilbert, Pelham, and Krull conducted a study on causal attribution. The researchers wanted to study when people are most likely to consider the social context when they are judging someone else’s personality.

For example, let’s say we see someone acting very nervous and anxious while engaged in conversation with another person. Our judgements about the personality of the person would differ depending on what we are told of the context of the conversation. For example if the person is discussing an embarrassing event, we would not judge them to be typically anxious since it is understandable to feel uncomfortable in that situation. Now if we are told the person is discussing the weather, we would conclude that this must be an extremely anxious person if they are uncomfortable in that situation. To study this, a researcher could conduct a two-groups study where they manipulate the social context that the person’s nervous behavior took place in. They could show two groups of participants the exact same video of the person behaving nervously and provide them with differing information regarding the context of the nervous behavior.

This is essentially what Gilbert, Pelham, and Krull did, except that they also manipulated a second variable. Thus their study was a 2 X 2 factorial design. The second variable was the amount of cognitive load the participants experienced while making their judgments. Half of the participants were allowed to focus on making their judgments, while the other half were given an additional task involving memory. The study showed that when people were able to focus, they did a good job of taking the situation into account. However, when people had to divide their attention because of the second task, they were unable to consider the context in making their judgments. This pattern shows that the researchers observed an interaction between the context information and the cognitive load variable.

To relate this back to our definition of interactions, the effect of the context of the conversation on the participants’ judgments of the person’s anxiety level depended on their level of cognitive load. The researchers came to the conclusion that person perception happens in multiple steps. People begin by jumping to conclusions that a person is a certain “kind” of person. When given additional information providing more context, a person that is able to focus and access their higher-order cognitive resources is better able to correct their initial assumptions. Other studies on attitude change, stereotyping, and judgment under uncertainty have shown similar results. Essentially, if a person is distracted they are more likely to make simplistic judgements rather than use reasoning. Studies of this nature tend to be factorial studies because judgment is usually the result of more than one variable.

Differences between main effects and interactions

It is important to realize that interactions and main effects are different. In the previous study, a main effect of cognitive load would be at work if the cognitively loaded participants (as a group) judged the person to be more anxious than the unloaded participants did, independent of what they were told about the context of the person’s anxious behavior. Let’s use another example to illustrate the differences between main effects and interactions. A researcher wants to study how the growth rate of a bean plant is affected by two variables: sunlight and water. The researcher wants to conduct a 2 X 2 factorial study using forty healthy plants that he randomly assigns to four different conditions. The four conditions are low water and low light, low water and high light, high water and low light, and high water and high light. The researcher takes care to cross the two independent variables completely.

FIGURE 1: Plant Growth Results

Water Given to Plants	Sunlight Given to Plants
	Low	High	Mean (row)
Low	23	27	25
High	10	40	25
Mean (column)	16.5	33.5

An analysis of variance (ANOVA) shows that the study yielded three results. The first is a main effect of sunlight: plants that received high levels of sunlight grew significantly more than those that received low light. The second is the absence of a main effect of water: on average, the plants that received high water and the plants that received low water grew to the same height. The third result is a Sunlight X Water interaction. The presence of this interaction tells the researcher that the main effects do not give the complete picture. The interaction shows that how water levels affected plant growth depended on the amount of sunlight the plant received. Plants that got low sun did better with low water, while plants that got high sun did better with high water. The point of this example is that if the researcher had relied only on the main effects, he would have come to an incorrect conclusion regarding sunlight, water, and plant growth. Interactions always involve two or more variables.

In this case, the effects of water depended on the levels of sunlight, and the effects of sunlight depended on the levels of water. In a two-way interaction, the same interaction can always be seen from these two different vantage points.
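
As a rough illustration, the cell means from Figure 1 can be used to work out the two main effects and the interaction by hand. The Python sketch below is descriptive only: it assumes equal numbers of plants per cell (so marginal means are simple averages of cell means), the variable and function names are hypothetical, and it does not test statistical significance, which would require an ANOVA on the raw data.

```python
# Cell means from Figure 1, keyed by (water level, sunlight level).
cell_means = {
    ("low", "low"): 23,
    ("low", "high"): 27,
    ("high", "low"): 10,
    ("high", "high"): 40,
}


def marginal_mean(means, factor_index, level):
    """Average the cell means that share one level of one factor.

    factor_index 0 selects water, factor_index 1 selects sunlight.
    Assumes equal cell sizes, so an unweighted average is appropriate.
    """
    values = [m for key, m in means.items() if key[factor_index] == level]
    return sum(values) / len(values)


# Main effects: difference between the marginal means of each factor.
water_effect = marginal_mean(cell_means, 0, "high") - marginal_mean(cell_means, 0, "low")
sun_effect = marginal_mean(cell_means, 1, "high") - marginal_mean(cell_means, 1, "low")

# Simple effects of water, computed separately at each level of sunlight.
water_at_low_sun = cell_means[("high", "low")] - cell_means[("low", "low")]
water_at_high_sun = cell_means[("high", "high")] - cell_means[("low", "high")]

# Interaction contrast: how much the effect of water changes across sunlight levels.
interaction_contrast = water_at_high_sun - water_at_low_sun

print("Main effect of water:", water_effect)
print("Main effect of sunlight:", sun_effect)
print("Effect of water at low sun:", water_at_low_sun)
print("Effect of water at high sun:", water_at_high_sun)
print("Interaction contrast:", interaction_contrast)
```

The water main effect comes out as zero only because its two simple effects (minus 13 under low sun, plus 13 under high sun) cancel when averaged, which is exactly why the main effects alone would mislead the researcher here.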

Ordinal/spreading interaction

A wide range of data patterns can reflect interactions. Even for basic two-way interactions, researchers distinguish between patterns that reflect an interaction in different ways. One common data pattern is the ordinal, or spreading, interaction. This happens when an independent variable has an effect under some conditions but has no effect, or less of an effect, under other conditions. To illustrate this, consider the 1980 study on attitude change conducted by Wells and Petty. They hypothesized that people infer their attitudes from how they physically respond to a persuasive argument. The researchers asked participants to evaluate headphones and told them that the headphones were meant to be used during exercise. Through the headphones, participants listened to one of two persuasive arguments about a tuition increase. Some participants were asked to nod their heads while listening and others were told to shake their heads. Head nodding did in fact lead to attitude change, but only when participants were listening to a counterattitudinal argument.

Conducting Research in Psychology: Measuring the Weight of Smoke - Pelham & Blanton - 4th edition - BulletPoints

Chapter 1 – Introduction and history

Barnum description: a non-scientific, vague description of someone’s personality that is so general that it could apply to anyone. Nevertheless, people mistake such a description for statements that apply only to the person in question.
The first of the four scientific assumptions is parsimony, also called Occam’s Razor. Preference is given to the simplest scientific theory, one that everyone should be able to understand.
The second assumption is empiricism: making systematic observations in order to understand how the world works.
The third and most important assumption is testability: all scientific theories must be testable. Empirical techniques are related to falsifiability (Popper), the possibility of showing that a theory is incorrect. Popper was a logical positivist, a member of a school of thought holding that claims in science and philosophy may be based only on things that can be observed with absolute certainty.
Many psychological phenomena are not directly observable, but the use of operational definitions (Tolman and Hull) gets around this problem.
Operational definitions link unobservable phenomena to things that can be observed directly, which makes the phenomena visible.
The final assumption is determinism, the view that all things have meaningful and systematic causes. Examples of this are illusory correlations and base rates.
The earliest explanations of human behavior were supernatural, metaphysical explanations.
Various forms of metaphysical explanation can be found in mythology and religion, astrology, and animism.
Physiology is the study of the interrelations and functions of different parts of the body and the brain. Philosophy is the study of behavior, the nature of reality, and knowledge by means of intuition, empirical observation, and logic.
Wundt is regarded as the first psychologist to begin conducting experiments and therefore as the founder of experimental psychology.
There are several ways of acquiring knowledge: logic, intuition, observation, and authority. Each of these plays a role in a different belief system: religion (authority), science (observation), philosophy (logic), and government (authority).

Chapter 2 – Goals and procedures of science

Observations – empiricism distinguishes scientific from non-scientific methods.
Induction (Francis Bacon) is reasoning from a specific situation to a general principle. The problem of induction (David Hume) is the uncertainty about the minimum number of observations needed before one can speak of a theory or law.
Deduction (Karl Popper) is reasoning from a general principle to a specific situation. A hypothesis is a prediction about specific events that is derived from one or more theories.
A law is a statement about the universal nature of things that makes reliable predictions possible. A theory is a general statement about the relation between two or more variables in a specific situation. These statements must be parsimonious, empirical, testable, and deterministic.
Falsification takes place when a researcher tries to find evidence that refutes a theory or hypothesis. Validation takes place when a researcher tries to find evidence that supports a theory or hypothesis.
In general, people are inclined to confirm a hypothesis rather than reject it; this is a positive test bias.
Qualification – researchers try to find the boundary conditions within which a theory or hypothesis is or is not true. This often leads to an integration of two conflicting theories by specifying the circumstances under which each theory holds.
An experimental paradigm (Fisher) is a research method in which the ideal sequence of procedures for psychological research is followed. Three techniques used in the experimental paradigm are statistical testing, manipulation, and randomization.
We speak of an experiment when a researcher has manipulated an independent variable.
A statistical test examines whether the result of a study is significantly larger than would be obtained purely by chance. This makes use of the null hypothesis and the alternative hypothesis.
Manipulation is the systematic variation of the independent variable in order to measure its effect on the dependent variable. The experiment has at least two groups: the experimental group and the control group. The technique of randomization consists of random selection or random assignment.

Chapter 3 – Measurement scales, reliability, and validity

Four kinds of measurement scales are used. The first is the interval scale, in which the difference between two outcomes has an unambiguous meaning. The scale can take negative values but has no natural zero point.
If there is a natural zero point, we speak of a ratio scale. Here there are never negative values.
In a nominal/categorical scale, a variable measures a characteristic that cannot naturally be expressed as a number. Code numbers can be assigned to the values on an arbitrary basis.
When the values do have a logical order, we speak of an ordinal scale. Although this scale provides information about relative values, it gives no information about absolute differences. It is possible to convert a measurement from a higher level to a lower level, but not the other way around.
Reliability is the degree of consistency of a study in which the same outcomes are expected. To achieve the highest possible reliability, the more-is-better rule is applied: use as many observers, observations, and occasions as possible.
Forms of reliability include internal consistency, the degree to which the different items in a multiple-item measure behave in the same way, which can be measured with Cronbach’s alpha; test-retest reliability (temporal consistency); and interrater reliability.
Validity refers to the relative accuracy and correctness of a claim under investigation (is the study measuring what it is supposed to measure?).
One form of validity is external validity, which concerns whether the findings of the study give an accurate description of what happens in the real world. It is difficult, however, to conduct a study that is infinitely generalizable, because of boundary conditions.
Another form of validity is internal validity. This concerns whether the observed change in the dependent variable is caused by the manipulation of the independent variable. The three requirements for causality are the elimination of confounds, covariation, and temporal sequence.
Conceptual validity is the degree to which the study as a whole fits the broader theory from which the research hypothesis is derived. Construct validity is the degree to which the independent and dependent variables of the study really measure what the researcher intends to measure.

Chapter 4 – Psychological measurement

Self-report research faces two hurdles. First, the participant and the researcher must interpret the questions in the same way. Second, the participant must be able to translate his or her internal state into a value on the response scale.
A participant first goes through a judgment phase and then through a response translation phase. During the judgment phase of answering a self-report question, the participant determines which question is being answered and forms an initial response.
When drawing up a questionnaire for participants in a study, the hypotheses must be translated in such a way that they are measurable. One way to test whether this has succeeded is to conduct a pilot study, a practice study.
A distinction is made between open-ended questions (which give a rich and varied picture) and structured (restrictive) questions.
When writing questions for a self-report questionnaire, the following points, among others, should be taken into account: keep it simple, avoid (ambiguous) negations and questions, provide a context, and word questions on sensitive topics carefully. Also avoid questions that everyone or no one will endorse. Such questions lead to floor effects and ceiling effects.
Marketing research makes use of focus groups, a representative sample of participants in which experiences during the study are discussed.
When designing a numerical rating scale, the researcher considers how many numbers are used, the meaning of the numbers, and the number system (unipolar or bipolar).
Several steps are followed when constructing questionnaires. First think carefully about exactly what you want to measure, write many questions from which you can later make a selection, and analyze the scale to find the best-fitting items. A factor analysis can be used for this.
Some useful measures are the EGWA scale, the Semantic Differential, and Thurstone and Guttman scales.

Chapter 5 – Threats to validity

Regression toward the mean means that people who score very high or very low on a pretest tend to have a posttest score that lies closer to the mean.
Maturation refers to changes that are specific to one person or one age group as they grow older. History refers to general changes over time in a large group of people.
In non-response bias, the respondents themselves are the source of the bias. People who are interested in the topic of the study are more likely to take part than people who do not care about the topic.
The research method most susceptible to bias from individual differences is the pseudo-experiment. Selection bias occurs when a study draws on a non-representative sample.
Attrition occurs when participants drop out early; it is especially common in long-term studies. Attrition can be homogeneous or heterogeneous.
Participant reaction bias occurs when a participant realizes that he or she is being studied and therefore behaves differently. It can take three forms: participant expectancies, participant reactance, and evaluation apprehension. The Hawthorne effect means that people’s productivity increases when they know they are being studied.
To prevent participant reaction bias, the researcher can hint at a false expectation in the cover story, guarantee anonymity and privacy, observe participants without their being aware of it, use indirect measures of attitudes and opinions, or use a bogus pipeline.
Testing effects occur mainly in pretest-posttest studies and refer to people’s tendency to score better on the posttest than on the pretest because taking the pretest influences how they take the posttest.
Psychological tests often involve attitude polarization. This means that participants have thought about their answers in the meantime and become more extreme in them in order to appear more socially desirable.
Confound is a term that refers to any problem in the research design in which an additional variable varies systematically with the independent variable. Artifact refers to a variable that is held constant in a study and that creates a restricted context in which the effect can be found.
Experimenter bias occurs when the researcher’s expectations unintentionally bias him or her during the experimental observations. This can be prevented with a double-blind procedure.

Chapter 6 – Non-experimental research

A single-variable study aims to paint a clear picture of a population. It does so by describing a specific property of the group in question.
A census is a set of data obtained by questioning every member of the population under study. Because that is rarely practical, researchers often use population surveys instead.
In a cluster sample, the people in a population are already grouped. The sampling error, or margin of error, indicates the likely discrepancy between the results obtained from the sample and the results that would have been obtained had the entire population been studied.
Public opinion research studies the attitudes and preferences of a specific population. The false consensus effect is people's mistaken assumption that their attitudes and behaviors match those of others in the same group.
Epidemiological research is descriptive research that focuses primarily on the prevalence and incidence of psychological disorders within a population. Limitations of population surveys include their high cost and long duration, the skepticism they evoke, the brevity of the new information they provide, and their reliance on an uncontrollable research setting.
Correlational methods are used to understand cause-and-effect relationships among multiple variables. They involve collecting observations and testing hypotheses. To rule out reverse causality, researchers often use longitudinal or prospective designs.
A study can suffer from a person confound, an environmental confound, or an operational confound. There are two ways to reduce the influence of an operational confound: revise the measure to remove the confound, or add a separate measure to see how much influence the confound has.
One example of correlational research is archival research, which uses existing public records to test a theory or hypothesis. In (unobtrusive) observational research, people's behavior is recorded in their own natural environment.
In questionnaires and interviews, people are asked (in interviews, orally) about their behavior, feelings, and thoughts. A questionnaire may include a Likert scale. A case study is an analysis of the experiences of one person or one group. The person or group often shows exceptional behavior.
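The margin of error mentioned in this chapter can be made concrete with a small calculation. The sketch below is only an illustration: it assumes a simple random sample and the usual normal approximation for a proportion, and the function name `margin_of_error` and the poll numbers are hypothetical, not taken from the book.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 52% of 1,000 respondents favor a proposal.
moe = margin_of_error(0.52, 1000)
print(f"52% plus or minus {moe * 100:.1f} percentage points")
```

Under these assumptions, a larger sample gives a smaller margin of error, which is why a well-drawn survey can approximate a census at a fraction of the cost.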

Chapter 7 – Experimental research

Experiments often have high internal validity, which is necessary to demonstrate a causal relationship between an independent and a dependent variable. The ceteris paribus condition is one in which only a single aspect is manipulated and all others are held constant.
To count as an experiment, a study must use a manipulation. In psychology, this manipulation often has to do with individual differences.
Researchers used to rely on matching, in which each person in the control group is paired with a person in the experimental group. Today, random assignment is more common. To allow for the imperfection of random assignment, researchers run statistical tests.
An important criticism of experimental research is that an experiment is not always realistic. There are two answers to this: mundane realism and experimental realism. Mundane realism makes a laboratory experiment seem more realistic by making its physical setting resemble the real world.
Experimental realism, also called psychological realism, makes a laboratory study feel more realistic to participants so that it elicits natural and spontaneous behavior.
One way to gauge the degree of experimental realism is a manipulation check, which measures whether the manipulation actually induces in participants the psychological state the researcher intends.
Deception goes together with a high degree of experimental realism. It means that participants are not always told the true purpose of the study, because knowing it would bias them and make natural behavior less likely.
Strengths of experiments include minimizing noise, allowing researchers to observe the invisible, eliminating confounds, and yielding information about interactions between variables.
Qualification is the examination of the boundary conditions under which a theory is or is not true. It involves analyzing interactions. Laboratory experiments can also provide important explanations, in terms of psychological mechanisms, for results that emerged in field research. Another advantage of experimental research is that it offers some assurance of validity.
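Random assignment, which this chapter contrasts with matching, can be sketched in a few lines. This is a minimal illustration and not a procedure prescribed by the book: the function `randomly_assign`, the participant IDs, and the condition names are all hypothetical.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them into conditions in turn.

    Dealing after shuffling keeps group sizes within one of each other.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

# Hypothetical study with 20 participants and two conditions.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, ["treatment", "control"], seed=42)
for condition, members in groups.items():
    print(condition, len(members), members)
```

Because chance alone decides who lands in which group, individual differences should be spread roughly evenly across conditions. The split is never perfect, which is why statistical tests are still needed.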

Chapter 8 – Quasi-experimental research

In a quasi-experiment, the researcher has only partial control over the independent variables. This means that participants are assigned to the experimental conditions by some means other than random assignment.
Researchers may still choose a quasi-experiment because random assignment is not always possible, because an experiment is sometimes too expensive, or because they want to maximize both internal and external validity.
A natural experiment is one example of a quasi-experiment. All variables of interest are measured, and there is no random assignment.
There are two ways to gain control over an independent variable. The first is for the researcher to treat the third variable 'as if' it were an experimental manipulation. The second is for the researcher to measure each confound in order to control for it statistically.
In a person-by-treatment quasi-experiment, at least one independent variable is measured and at least one other independent variable is manipulated. Such studies often take place in the laboratory and rely on random assignment.
First the individual-difference variable is measured, and then participants with low and high scores are divided into extreme groups. A median split can be used for the division.
In a natural-groups-with-experimental-treatment design, the groups have formed naturally and cannot be taken apart. Each group is exposed to one condition.
The nature-by-treatment quasi-experiment is a rare design. It is used when a particular natural manipulation can be predicted in advance and can be controlled for.
A drawback of quasi-experimental research is the lack of comparability: it is an experiment in which the ceteris paribus condition does not hold.
In a patched-up design, new control groups are added one at a time, which makes the effect of the treatment easier to measure and allows the influence of confounds to be tested. Examples are the one-group design, the one-group pretest-posttest design, the posttest-only design with nonequivalent groups, the pretest-posttest design with nonequivalent groups, and the time-series design.
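The median split used in person-by-treatment designs can be illustrated as follows. This is a sketch under stated assumptions: the function `median_split`, the variable `self_esteem`, and the scores are hypothetical, and dropping scores that fall exactly on the median is only one possible convention.

```python
from statistics import median

def median_split(scores):
    """Split participants into 'low' and 'high' groups around the median.

    `scores` maps a participant ID to a measured score. Scores exactly at
    the median are dropped here; that is one common convention, not the
    only one.
    """
    cutoff = median(scores.values())
    low = [pid for pid, s in scores.items() if s < cutoff]
    high = [pid for pid, s in scores.items() if s > cutoff]
    return cutoff, low, high

# Hypothetical individual-difference scores for six participants.
self_esteem = {"P1": 12, "P2": 30, "P3": 18, "P4": 25, "P5": 21, "P6": 9}
cutoff, low, high = median_split(self_esteem)
print("median:", cutoff)
print("low group:", low)
print("high group:", high)
```

The measured variable is then treated as a two-level factor alongside the manipulated variable.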

Chapter 9 – Research designs

A factorial design is a research design with multiple independent variables that are completely crossed. This means that every level of each independent variable is combined with every level of every other independent variable.
One feature of a factorial design is its label, which specifies the number of independent variables and the number of levels of each (e.g., 2x3). The number of independent variables also indicates which statistical analysis is applied to the data. Another feature is the number of cells, that is, the unique combinations of levels, which is calculated by multiplying the numbers in the label.
Main effects are the effects of the independent variables in factorial studies. A main effect is the overall effect of one independent variable, averaged over all levels of the other independent variable(s).
An interaction exists when the effect of one independent variable on a dependent variable depends on the level of a second independent variable. Interactions can be demonstrated only in factorial studies. There are two forms of two-way interaction: the ordinal (spreading) interaction and the disordinal (crossover) interaction.
The results of a factorial study can be displayed graphically in a bar chart. A crossover interaction is easy to see in a line graph, where the lines cross near the middle of the graph.
A simple main effects test can be used to avoid being misled by a graph and to see which specific comparisons between means are significant in a factorial study.
One example of a design with a single independent variable (a one-way design) is the two-groups design. It is the simplest form: there is one independent variable with two levels. Another example is the multiple-groups design.
Besides these designs, two approaches are described. The first is the between-subjects design, in which each participant serves in only one condition. In the second, the within-subjects or repeated-measures design, each participant is exposed to several (sometimes all) conditions at different times and is compared with himself or herself.
Drawbacks of within-subjects designs are their high transparency, sequence effects, and carryover effects. Carryover effects come in three forms: interference effects, practice effects, and order effects.
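The label, the cells, the main effects, and the interaction of a factorial design can be worked through with a small made-up example. Everything in the sketch below is hypothetical (the factors `caffeine` and `task`, their levels, and the cell means), and it only describes the pattern of means; it does not test significance.

```python
from itertools import product
from statistics import mean

# Hypothetical 2x2 design; factor names, levels, and cell means are made up.
levels = {
    "caffeine": ["none", "dose"],
    "task": ["easy", "hard"],
}

# Label: the number of levels of each independent variable, e.g. "2x2".
label = "x".join(str(len(v)) for v in levels.values())

# Cells: every level of one variable crossed with every level of the other.
cells = list(product(*levels.values()))

cell_means = {
    ("none", "easy"): 10.0,
    ("none", "hard"): 10.0,
    ("dose", "easy"): 14.0,
    ("dose", "hard"): 10.0,
}

def main_effect(factor_index, level):
    """Mean of the cell means at one level, averaged over the other factor."""
    return mean(m for cell, m in cell_means.items() if cell[factor_index] == level)

# Effect of caffeine at each level of task.
effect_easy = cell_means[("dose", "easy")] - cell_means[("none", "easy")]
effect_hard = cell_means[("dose", "hard")] - cell_means[("none", "hard")]
# A nonzero difference between these effects is an interaction pattern.
interaction_contrast = effect_easy - effect_hard

print(label, "design with", len(cells), "cells")
print("caffeine marginal means:", main_effect(0, "none"), main_effect(0, "dose"))
print("caffeine effect for easy vs. hard task:", effect_easy, effect_hard)
print("interaction contrast:", interaction_contrast)
```

In these invented numbers caffeine helps on the easy task and does nothing on the hard task, which is the spreading (ordinal) pattern. In a crossover pattern the two effects would have opposite signs.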

Conducting Research in Psychology: Measuring the Weight of Smoke - Pelham & Blanton - 4th edition - Glossary

Glossary of Terms Chapter 1

Metaphysical	Supernatural explanations were the earliest ways to explain human behaviour
Animism	A belief that objects in nature are alive
Religion or mythology	These systems assume that spiritual beings are central to human behaviour
Astrology	States that human behaviour is determined by celestial bodies.
Physiology	The study of how the brain and the body function are connected
Psychology	The scientific study of human behaviour
Determinism	States that there is order to the universe and there is a cause for events
Empiricism	Knowledge comes from making observations
Parsimony	States that if there are two equally plausible theories of explanation, preference should go to the simpler theory and that in developing theories, unnecessary concepts should be avoided.
Testability	States that theories should be testable, basically proven or disproven through research and other empirical techniques

Glossary of Terms Chapter 2

Laws	Are comprehensive and general statements about the nature of reality
Induction	Using specific observations to draw general conclusions or principles. From the specific towards the general
Deduction	When a general theory is used to create hypotheses that are tested through observations. From general towards specific
Positive test bias	Refers to the tendency of people to try to confirm rather than disprove their hypothesis
Validation	The most common approach where scientists try to find evidence to prove their claims
Cognitive dissonance	The uncomfortable state that arises when a person believes two logically inconsistent beliefs to both be true.
Falsification	Where researchers try to find evidence that disproves their claims.
Qualification	Researchers try to find the limits to a theory and the specific set of circumstances under which the theory is true or false
Case studies	Document detailed observations of a person or a group of people
Practitioner's rule of thumb	Evaluating the process experts follow to reach certain results.
Reasoning by analogy	A technique where the researcher creates analogies for comparisons
Functional or adaptive analysis	Researchers ask a set of questions about what a system has to do in order to learn about changes in its environment and control that environment.
Hypothetico-deductive method	Researchers juxtapose contradictory theories
Accounting for exceptions	Establishing a general principle and then further exploring the instances in which an exception to it could occur

Glossary of Terms Chapter 3

Nominal scale	Uses meaningful but possibly arbitrary categories. The categories are mutually exclusive, like a person’s sex
Ordinal scale	Uses ranking or putting things in order. It gives value to the position of people or things in relation to other people or things.
Interval scale	Provides information on the size of the differences between values by using real numbers and amounts. Unlike the previously mentioned scales, these scales can use negative values.
Ratio scale	These are similar to interval scales but they also utilize a true zero point (the point where none of the quantity under consideration exists).
Validity	How accurate a statement is
Internal validity	A measure of how well research results establish causality. It reflects the likelihood that variations in the independent variable (the variable being manipulated) caused the changes observed in the dependent variable.
External validity	The extent to which research results accurately reflect the real world. This is also known as generalizability. If a researcher wants to be able to assert that the research results would apply in other situations or environments or to other people, the study needs to have high external validity.
Construct validity	Deals with how accurately the variables in a study represent the hypothetical variables that the researcher is focused on.
Conceptual validity	The extent to which a research hypothesis connects to the broader theory that it is testing.
Reliability	The repeatability or consistency of a measure or observation
Test-retest reliability	Testing a group of people once, then inviting the same group back for a second round of tests and comparing the results.
Internal consistency	A way to assess reliability within a single session
Interrater reliability	The degree to which the judgements of individual people agree with one another.
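Test-retest reliability from the list above is commonly summarized as the correlation between the two rounds of scores. The sketch below computes a Pearson correlation by hand. The function `pearson_r` and the scores are hypothetical, and the glossary itself does not prescribe this particular statistic.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return covariation / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people tested twice.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]
print(f"test-retest r = {pearson_r(time1, time2):.2f}")
```

A value near 1 means people keep roughly the same rank order across sessions. The same function could be applied to two raters' judgements as a rough index of interrater agreement.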

Glossary of Terms Chapter 4

Pilot testing	Refers to using practice studies designed to help researchers refine their questions or other forms of measurement before conducting the intended study.
Focus group	Small but representative sample of people from the larger group of people the researcher intends to study.
Anchors	Descriptive words that give the numbers meaning
Unipolar scales	These scales ask research participants to make ratings beginning from a low value such as zero and move to a higher value that ends at the researcher’s discretion.
Bipolar scales	These scales ask research participants to make ratings where zero is the middle anchor.
Empirical evaluation	A researcher may find that a question they thought was great is not at all useful, while another question that they didn't even seriously consider is in fact a very good question.
Factor analysis	A statistical technique used to see how closely related the items in a scale are to each other
Semantic differential	A type of a rating scale designed to measure the connotative meaning of objects, events, and concepts. The connotations are used to derive the attitude towards the given object, event or concept. The respondent is asked to choose where his or her position lies, on a scale between two bipolar adjectives

Glossary of Terms Chapter 5

Pseudo-experiment	A research experiment where someone tests a variable by exposing people to it and observing how people feel, think and behave.
Selection bias	A statistical bias in which there is an error in choosing the individuals or groups to take part in a scientific study.
Non-response bias	People who answer surveys are different from those who don’t.
History	Refers to changes in large numbers of people such as a country or culture
Maturation	Refers to changes in a specific person over time.
Regression towards the mean	The tendency that people will score high or low on the first round of tests and will score closer to the middle on the second round of tests.
Hawthorne effect	Workers were being more productive because they knew they were being studied
Mere measurement effect	Occurs when participants change their behaviour because they are asked how they will act in the future.
Testing effects	The tendency that people have to do better on a test the second time around
Attitude polarization	Letting people think about their attitudes often leads to even more extreme or polarized attitudes
Experimental mortality or attrition	In some cases a participant will not complete a study.
Simple or homogenous attrition	An equal level of attrition occurred in each condition
Heterogeneous attrition	Attrition does not occur at an equal level in each condition; the rates are noticeably different.
Participant reaction bias	Happens when people behave in uncharacteristic ways when they know they are being observed
Participant expectancies	The participant tries too hard to cooperate with the researcher. They may behave in a way that they think is consistent with the researcher’s hypothesis.
Participant reactance	The participant tries to disprove the researcher’s hypothesis
Evaluation apprehension	Refers to a person being worried about being judged by another person.
Experimenter bias	When the experimenter makes incorrect observations due to their bias or when researchers behave differently with participants due to their bias.
Confounds	Variables that vary but should be constant. A situation where an additional variable ‘z’ varies with both the independent variable ‘x’ and the dependent variable ‘y’
Artifacts	Are constant but should vary in a regulated manner
Pro social model	By giving participants something, they feel like they have to give something back and they will be more helpful.
Double blind procedure	Means that both the researcher and the research participants are unaware of or blind to the participant’s treatment condition

Glossary of Terms Chapter 6

Single variable studies	Are designed to describe a specific property of a large group of people.
Census	The procedure of systematically acquiring and recording information about the members of a given population
Population survey	Chooses the sample group randomly through a process called random sampling.
Cluster sampling	A researcher first makes a list of possible locations where they will find the people who they are interested in studying. Then he will shorten the list by random selection. Then he will randomly choose a specific number of individuals from each location to survey.
Epidemiological research	Studies the prevalence of certain psychological disorders within a specific population
Marketing research	Aims to gauge what a consumer thinks about different products and which ones they prefer.
False consensus effect	Shows that people tend to overestimate the proportion of other people who have similar attitudes as themselves.
Convenience sample	A sample chosen because it is readily available rather than randomly selected. It can be acceptable when a finding is so interesting and bizarre that a randomly selected sample is not necessary.
Correlational research	Is used to make observations about a group of people and to test predictions about how different variables are related.
Person confound	When a variable appears to cause something because people who are high or low on this variable happen to be high or low on an individual difference variable.
Environmental confound	Similar to person confounds, except that they relate to environmental instead of personal nuisance variables.
Operational confound	Happens when a measure used to gauge a specific topic such as depression or memory accidentally measures something else.
Longitudinal designs or prospective designs	Researchers track and observe people over time to better understand the causal relationships between variables.
Archival research	Researchers review already existing public records to test their hypothesis.
Observational research	Researchers observe the real world, in this case subjects in their natural environment, and do not manipulate variables.
Unobtrusive observational research	Where the researcher does not interfere with the person’s behaviour and the person doesn’t realize they are being studied.
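The cluster sampling procedure defined above (list the locations, randomly shorten the list, then randomly pick individuals within each chosen location) can be sketched as follows. The function `cluster_sample`, the school names, and the sample sizes are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample.

    Stage 1 randomly picks clusters (e.g. locations); stage 2 randomly
    picks individuals within each chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return {
        name: rng.sample(clusters[name], k=min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population: 10 schools with 30 students each.
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(10)}
sample = cluster_sample(schools, n_clusters=3, n_per_cluster=5, seed=1)
for school, students in sample.items():
    print(school, students)
```

Only the selected clusters need to be visited, which is what makes the approach practical for large, geographically spread populations.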

Glossary of Terms Chapter 7

Manipulation	When an experimenter changes the levels of a variable, in a systematic way.
Random assignment	Aims to make the control group as similar as possible to the experimental group
Random sampling	Aims to make the sample groups as similar as possible to the population
Procedural confounds	Are the same as environmental confounds. The only difference is that procedural confounds relate to experimental research while environmental confounds relate to non-experimental research.
Qualification	The process researchers use to test the conditions under which a theory is or isn’t true.
Artificiality	The criticism that true experiments are not applicable to the real world.
Psychological realism	Type of realism that makes the study feel real to the research participants rather than look real.
Mundane realism	Type of realism that makes the study appear similar to the real world (look real)
Manipulation check	A way to gauge if the research participant does feel the psychological condition the researcher intended.
Optimistic bias	People are overly optimistic about outcomes in life
Planning fallacy	People assume things will go better than they actually do

Glossary of Terms Chapter 8

Quasi-experiments	Research experiments where the researcher does not have full control over the variables.
Person by treatment quasi-experiment	A researcher measures at least one independent variable while manipulating at least one other independent variable.
Median split	The researcher would divide the scores into two groups: participants who scored above the median score and participants who scored below the median score.
Self-verification theory	Says that a person’s preference for positive or negative feedback depends on the feedback and on how they view themselves.
Natural experiments	Researchers give up all control over variables. They do not use random assignment at all. It focuses on naturally occurring manipulations.
Natural groups with experimental treatment design	Researchers take two naturally occurring groups of people (people who have sorted themselves into groups) and treat them differently in a systematic way.

Glossary of Terms Chapter 9

One-way designs	Have only one independent variable
Two-group designs	Have only one independent variable, this variable has only two levels. This usually means one experiment group and one control group
One-way multiple group design	It has one independent variable in three or more levels
Factorial design	Researcher creates every possible combination of all the levels of all the independent variables. Additionally these variables are crossed completely
Interaction	When the effect of one independent variable on a dependent variable depends on the level of a second independent variable.
Spreading or ordinal interaction	This happens when an independent variable has an effect under some conditions but has no effect or less of an effect under other conditions
Disordinal interaction	When there are no main effects and when the effects of each independent variable are opposite at different levels of the other independent variable.
Crossover interaction	The effects of both independent variables are opposite and fairly equal at the different levels of the other independent variable.
Simple effects test	These tests help a researcher recognize which specific mean comparisons are important
Between-subject designs	Each participant serves in only one condition of the experiment
Within subject designs	Each participant serves in more than one condition of a study.
Sequence effect	People’s psychological states tend to change as they spend time working on tasks. The passing of time begins to affect a participant’s response
Carry over effect	These effects happen when a participant’s response to one stimulus in a study influences their response to a second stimulus
Order effect	Happens when a question takes on a different meaning when it follows one question versus when it follows another.
Testing effect	Happens when people do better on a test the second time they take it.
Interference effect	Sometimes performing a task will cause a participant to do worse on the second task
Demand characteristics	A participant who has figured out the experimenter’s hypothesis may try to act in ways that are consistent with it.
Counterbalancing	A control technique that can address both sequence effects and carry over effects. The researcher varies the order in which the participants experience the different experimental conditions.
Complete counterbalancing	Researchers present the conditions in every possible order and make sure that different participants receive different orders through random assignment.
Reverse counterbalancing	Researchers generate a single order and then reverse it. The order can be generated through meaningful or random means. This produces two different orders and keeps the average serial position the same for every condition.
Partial counterbalancing	Researcher would choose a limited number of orders randomly from the list of all possible orders.
Structured debriefing	An interview where the participants are asked what they thought the researcher wanted to study or expected to find.
Mixed-model designs	Hybrid of between-subject and within-subject designs. At least one independent variable is manipulated on a between-subject basis and at least one other independent variable is manipulated on a within-subject basis.
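The three counterbalancing schemes in this list can be compared side by side. This is a sketch under stated assumptions: the condition labels are hypothetical, "complete" is taken to mean that every possible order is used, and the helper `average_position` is introduced only for illustration.

```python
import random
from itertools import permutations

conditions = ["A", "B", "C"]  # hypothetical condition labels

# Complete counterbalancing: every possible order is used.
complete_orders = list(permutations(conditions))

# Reverse counterbalancing: one order plus its mirror image.
forward = tuple(conditions)
reverse_orders = [forward, tuple(reversed(forward))]

# Partial counterbalancing: a random subset of all possible orders.
rng = random.Random(0)
partial_orders = rng.sample(complete_orders, k=3)

def average_position(orders, condition):
    """Mean serial position (1-based) of a condition across orders."""
    return sum(order.index(condition) + 1 for order in orders) / len(orders)

print(len(complete_orders), "possible orders for", len(conditions), "conditions")
for condition in conditions:
    print(condition, "average position under reverse counterbalancing:",
          average_position(reverse_orders, condition))
```

The number of possible orders grows factorially with the number of conditions, which is why partial or reverse counterbalancing becomes attractive once a study has more than a handful of conditions.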
