CHAPTER A - SAMPLING
The unit of analysis describes the level at which the research is performed and which objects are researched.
The essential application of sampling is that it allows drawing conclusions about the entire population, by studying some of the elements in a population.
A population element is the unit of study - the individual participant or object on which the measurement is taken. A population is the total collection of elements about which some conclusion is to be drawn.
A census is a count of all the elements in a population. The listing of all population elements from which the sample will be drawn is called the sampling frame.
There are several compelling reasons for sampling:
1) Lower cost - the difference between the sample costs and census costs is substantial.
2) Greater accuracy of results – some argue that the quality of a study is often better with sampling than with a census.
However, when the population is small, accessible, and highly variable, accuracy is expected to be greater with a census than with a sample. (Thus, a census study is feasible when the population is small and necessary when the elements are quite different from each other.)
3) Greater speed of data collection – the time between the recognition of a need for information and the availability of that information is reduced.
4) Availability of population elements – Some situations simply require sampling. This is the case where, e.g., the population is infinite.
The advantages of sampling over census studies are less compelling when the population is small and the variability within the population is high. Two conditions are appropriate for a census study; a census is:
- Feasible when the population is small
- Necessary when the elements are quite different from each other
However, when the population is small and variable, any sample we draw may not be representative of the population from which it is drawn. The resulting values we calculate from the sample are incorrect as estimates of the population values.
The ultimate test of a sample design is how well it represents the characteristics of the population it claims to represent. Thus, the sample must be valid.
Validity of a sample depends on two considerations: Accuracy and precision.
Accuracy: is the degree to which bias is absent from the sample. When the sample is drawn properly, the measure of behaviour, attitudes or knowledge of some sample elements will be less than the measure of those same variables drawn from the population. Also, the measure of the behaviour, attitudes, or knowledge of other sample elements will be more than the population values. Variations in these sample values offset each other, resulting in a sample value that is close to the population value.
Thus, an accurate (unbiased) sample is one in which the underestimators offset the overestimators.
Systematic variance has been defined as “the variation in measures due to some known or unknown influences that ‘cause’ the scores to lean in one direction more than another.” Systematic variance may be reduced by, e.g., increasing the sample size.
Precision: precision of estimate is the second criterion of a good sample design. In order to interpret the findings of research, a measurement of how closely the sample represents the population is needed. The numerical descriptors that describe samples may be expected to differ from those that describe populations because of random fluctuations natural to the sampling process. This is called sampling error (or random sampling error) and reflects the influence of chance in drawing the sample members.
Sampling error is what is left after all known sources of systematic variance have been accounted for. Precision is measured by the standard error of estimate, a type of standard deviation measurement; the smaller the standard error of estimate, the higher is the precision of the sample. The ideal sample design produces a small standard error of estimate.
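As a minimal sketch of how precision can be quantified, the snippet below estimates the standard error of a sample mean as the sample standard deviation divided by the square root of the sample size. The function name and the measurement values are hypothetical, invented purely for illustration.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean from a single sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical measurements, invented for illustration.
small_sample = [4, 6, 5, 7, 3, 5]
large_sample = small_sample * 4  # same values repeated, four times as many cases

print(standard_error_of_mean(small_sample))
print(standard_error_of_mean(large_sample))
```

With the same spread of values, the larger sample yields the smaller standard error, i.e. the higher precision.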
SAMPLE DESIGN - The different decisions a researcher has to make can be found in Exhibit 6.2 on page 172. The different types of sampling designs are described in Table 6.3 on page 174. Two approaches to sample design are as follows:
- Representation - The members of a sample are selected using probability or non-probability procedures.
Probability sampling is based on the concept of random selection – a controlled procedure which ensures that each population element is given a known non-zero chance of selection. Non-probability sampling is arbitrary and subjective; when elements are chosen subjectively, there is usually some pattern or scheme used. Thus, each member of the population does not have a known chance of being included.
- Element selection - Whether the elements are selected individually and directly from the population – viewed as a single pool – or additional controls are imposed, element selection may also classify samples. If each sample element is drawn individually from the population at large, it is an unrestricted sample. Restricted sampling covers all other forms of sampling.
- Probability sampling - is based on the concept of random selection – a controlled procedure that assures that each population element is given a known nonzero chance of selection. Only probability samples provide estimates of precision and offer the opportunity to generalize the findings to the population of interest from the sample population. The unrestricted, simple random sample is the simplest form of probability sampling. Since all probability samples must provide a known non-zero chance of selection for each population element, the simple random sample is considered a special case in which each population element has a known and equal chance of selection. In this section, we use the simple random sample to build a foundation for understanding sampling procedures and choosing probability samples.
STEPS IN SAMPLING DESIGN: There are several questions to be answered in securing a sample. Each requires unique information.
1) What is the target population – Good operational definitions are critical in choosing the relevant population.
2) What are the parameters of interest – Population parameters are summary descriptors (e.g., incidence proportion, mean, variance) of variables of interest in the population. Sample statistics are descriptors of those same relevant variables computed from sample data.
Sample statistics are used as estimators of population parameters. The sample statistics are the basis of conclusions about the population. Depending on how measurement questions are phrased, each may collect a different level of data. Each different level of data also generates different sample statistics. The population proportion of incidence “is equal to the number of elements in the population belonging to the category of interest, divided by the total number of elements in the population.” Proportion measures are necessary for nominal data and are widely used for other measures as well. The most frequent proportion measure is the percentage.
3) What is the sampling frame – The sampling frame is closely related to the population. It is the list of elements from which the sample is actually drawn. Ideally, it is a complete and correct list of population members only. A too inclusive frame is a frame that includes many elements other than the ones in which the researcher is interested.
4) What is the appropriate sampling method – A researcher must follow an appropriate method and make sure that interviewers (or others) cannot modify the selections made and only the selected elements from the original sampling are included.
5) What size sample is needed - Some principles that influence sample size include:
- The narrower or smaller the error range, the larger the sample must be.
- The greater the dispersion or variance within the population, the larger the sample must be to provide estimation precision.
- The higher the confidence level in the estimate, the larger the sample must be.
- The greater the desired precision of the estimate, the larger the sample must be.
- The greater the number of subgroups of interest within a sample, the greater the sample size must be, as each subgroup must meet minimum sample size requirements.
6) How much will it cost – The cost of every study also has to be taken into consideration, since money is often the factor that limits research the most.
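The sample-size principles listed under step 5 can be illustrated with a common textbook formula for estimating a mean from a simple random sample of a large population: n = (z × standard deviation / error range)². This formula is an assumption brought in for illustration and is not taken from the notes above; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(std_dev, margin_of_error, confidence=0.95):
    """Sketch: sample size needed to estimate a mean within +/- margin_of_error.

    Assumes simple random sampling from a large population and a
    normally distributed sample mean.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / margin_of_error) ** 2)


# Hypothetical inputs chosen only to show the direction of each effect.
baseline = sample_size_for_mean(std_dev=10, margin_of_error=2)
narrower_error = sample_size_for_mean(std_dev=10, margin_of_error=1)
more_dispersion = sample_size_for_mean(std_dev=20, margin_of_error=2)
higher_confidence = sample_size_for_mean(std_dev=10, margin_of_error=2, confidence=0.99)

print(baseline, narrower_error, more_dispersion, higher_confidence)
```

Each variation (narrower error range, greater dispersion, higher confidence level) requires a larger sample than the baseline, in line with the principles above.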
PROBABILITY SAMPLING:
1) Simple random sampling - Since all probability samples must provide a known nonzero probability of selection for each population element, the simple random sample is considered a special case in which each population element has a known and equal chance of selection.
Probability of selection = sample size / population size
However, simple random sampling is often impractical: it requires a population list (sampling frame) that is often not available, and it fails to use all the information about a population, resulting in a design that may be wasteful. It may also be expensive to implement. Therefore, alternative probability sampling approaches, such as systematic sampling, stratified sampling, cluster sampling and double sampling, will be considered.
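A minimal sketch of simple random sampling, assuming a complete sampling frame is available as a Python list. The frame contents and function names are hypothetical; the draw uses the standard library's `random.Random.sample`, which selects without replacement.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size elements from the frame, each with an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, sample_size)


def probability_of_selection(sample_size, population_size):
    """Probability of selection = sample size / population size."""
    return sample_size / population_size


# Hypothetical sampling frame of 200 elements.
frame = [f"element_{i}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=1)

print(len(sample))
print(probability_of_selection(len(sample), len(frame)))
```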
2) Systematic sampling - In this approach, every kth element in the population is sampled, beginning with a random start of an element in the range of 1 to k. The kth element, or skip interval, is determined by dividing the sample size into the population size to obtain the skip pattern applied to the sampling frame.
k = skip interval = population size / sample size
The major advantage of systematic sampling is its simplicity and flexibility. A concern with systematic sampling is the possible periodicity in the population that parallels the sampling ratio. Another difficulty may arise when there is a monotonic trend in the population elements. That is, the population list varies from the smallest to the largest element or vice versa.
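The procedure can be sketched as follows, assuming the sampling frame is an ordered Python list. The function name is hypothetical, and integer division is used for the skip interval so that the sketch also works when the population size is not an exact multiple of the sample size.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Take every kth element of the frame after a random start in 1..k."""
    k = len(frame) // sample_size  # skip interval
    if k < 1:
        raise ValueError("sample size exceeds the size of the frame")
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start, counted from 1
    # Convert the 1-based start to a 0-based index, then step by k.
    return frame[start - 1::k][:sample_size]


# Hypothetical frame: elements numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

Because the frame is walked in order, a frame sorted from smallest to largest element or one with a cycle of length k would show exactly the trend and periodicity problems described above.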
3) Stratified sampling - Most populations can be segregated into several mutually exclusive subpopulations, or strata. The process by which the sample is constrained to include elements from each of the segments is called stratified random sampling. After a population is divided into the appropriate strata, a simple random sample can be taken within each stratum. The results from the study can then be weighted (based on the proportion of the strata to the population) and combined into appropriate population estimates.
A stratified random sample is often chosen in order to:
- Increase a sample’s statistical efficiency;
- Provide adequate data for analysing the various subpopulations or strata;
- Enable different research methods and procedures to be used in different strata.
Stratification is usually more efficient statistically than simple random sampling and at worst is equal to it. With the ideal stratification, each stratum is homogeneous internally and heterogeneous with other strata. Also, the more strata used, the closer a researcher comes to maximizing interstrata differences (differences between strata) and minimizing intrastratum variances (differences within a given stratum).
4) In proportionate stratified sampling, each stratum is properly represented so that the sample size drawn from the stratum is proportionate to the stratum’s share of the total population. This approach has higher statistical efficiency than a simple random sample and it is much easier to carry out than other stratifying methods. It also provides a self-weighting sample; the population mean or proportion can be estimated simply by calculating the mean or proportion of all sample cases, eliminating the weighting of responses. On the other hand, proportionate stratified samples often gain little in statistical efficiency if the strata measures and their variances are similar for the major variables under study. Any stratification that departs from the proportionate relationship is disproportionate.
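A minimal sketch of proportionate allocation: each stratum receives a share of the total sample equal to its share of the population. The stratum names and sizes are hypothetical, and the handling of rounding leftovers (giving them to the largest fractional remainders) is one possible choice, not something prescribed by the notes.

```python
def proportionate_allocation(stratum_sizes, total_sample_size):
    """Split total_sample_size across strata in proportion to their size."""
    population = sum(stratum_sizes.values())
    exact = {
        name: total_sample_size * size / population
        for name, size in stratum_sizes.items()
    }
    allocation = {name: int(share) for name, share in exact.items()}
    # Rounding down may leave a few units unassigned; hand them to the
    # strata with the largest fractional remainders.
    shortfall = total_sample_size - sum(allocation.values())
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation


# Hypothetical population of 10,000 split into three strata.
strata = {"undergraduate": 6000, "graduate": 3000, "staff": 1000}
print(proportionate_allocation(strata, 200))
```

A simple random sample of the allocated size would then be drawn within each stratum.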
5) Cluster sampling – this is where the population is divided into groups of elements with some groups randomly selected for study. Two conditions foster the use of cluster sampling:
- The need for more economic efficiency than can be provided by simple random sampling;
- The frequent unavailability of a practical sampling frame for individual elements
Statistical efficiency for cluster samples is usually lower than for simple random samples mainly because clusters often don’t meet the need for heterogeneity and, instead, are homogeneous.
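As an illustration, a one-stage cluster sample can be sketched as: choose some clusters at random, then study every element in the chosen clusters. The cluster names and members below are hypothetical.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; all of their elements are studied."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: city blocks and the households in each.
blocks = {
    "block_1": ["h1", "h2", "h3"],
    "block_2": ["h4", "h5"],
    "block_3": ["h6", "h7", "h8"],
    "block_4": ["h9", "h10"],
}
selected = one_stage_cluster_sample(blocks, 2, seed=7)
print(selected)
```

Only a list of clusters is needed to draw the sample, not a list of individual elements, which is why the approach helps when no practical sampling frame for elements exists.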
Area sampling is the most important form of cluster sampling. It can be used when research involves populations that can be identified with some geographic area. This method overcomes the problems of both high sampling cost and the unavailability of a practical sampling frame for individual elements. In designing cluster samples, including area samples, the following questions should be answered:
- How homogeneous are the resulting clusters? – When clusters are homogeneous, this contributes to low statistical efficiency. Sometimes one can improve this efficiency by constructing clusters to increase intracluster variance.
- Shall equal-size or unequal-size clusters be sought? – A cluster sample may be composed of clusters of equal or unequal size.
The theory of clustering is that the means of sample clusters are unbiased estimates of the population mean. This is more often true when clusters are naturally equal, such as households in city blocks. While one can deal with clusters of unequal size, it may be desirable to reduce or counteract the effects of unequal size.
- How large a cluster should be taken? – Comparing the efficiency of differing cluster sizes requires that the different costs for each size are discovered and the different variances of the cluster means are estimated.
- Shall a single-stage or multistage cluster be used? – Concerning single-stage or multistage cluster design, for most large-scale area sampling, the tendency is to use multistage designs. Several situations justify drawing a sample within a cluster, in preference to the direct creation of smaller clusters and taking a census of that cluster using one-stage cluster sampling.
- How large a sample is needed? – It depends mainly on the specific cluster design.
6) Double sampling - It may be more convenient or economical to collect some information by sample and then use this information as the basis for selecting a subsample for further study. This procedure is called double sampling, sequential sampling, or multiphase sampling. It is usually found with stratified and/or cluster designs.
NONPROBABILITY SAMPLING
With a subjective approach like non-probability sampling, the probability of selecting population elements is unknown. There are a variety of ways to choose persons or cases to include in the sample. A greater opportunity for bias to enter the sample selection procedure and to distort the findings of the study exists. Any range within which to expect the population parameter cannot be estimated. There are some practical reasons for using the less precise methods.
METHODS:
1) Convenience – Non-probability samples that are unrestricted are called convenience samples. They are the least reliable design but normally the cheapest and easiest to conduct. Researchers or field workers have the freedom to choose whomever they find.
2) Purposive sampling - A non-probability sample that conforms to certain criteria is called purposive sampling. There are two major types – judgment sampling and quota sampling:
- Judgment sampling occurs when a researcher selects sample members to conform to some criterion. When used in the early stages of an exploratory study, a judgment sample is appropriate. When one wishes to select a biased group for screening purposes, this sampling method is also a good choice.
- Quota sampling is the second type of purposive sampling. It is used to improve representativeness. The logic behind quota sampling is that certain relevant characteristics describe the dimensions of the population. If a sample has the same distribution on these characteristics, then it is likely to be representative of the population regarding other variables on which the researcher has no control. In most quota samples, researchers specify more than one control dimension. Each should meet two tests: (1) It should have a distribution in the population that can be estimated, and (2) be pertinent to the topic studied.
3) Snowball - In the initial stage of snowball sampling, individuals are discovered and may or may not be selected through probability methods. This group is then used to refer the researcher to others who possess similar characteristics and who, in turn, identify others.
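The quota logic described under purposive sampling can be sketched as follows: candidates are accepted in the order they are encountered until each category of the control dimension has reached its quota. The candidate records, the single control dimension (gender) and the quota values are hypothetical.

```python
def quota_sample(candidates, quotas, group_of):
    """Accept candidates in encounter order until every quota is filled."""
    remaining = dict(quotas)
    selected = []
    for candidate in candidates:
        group = group_of(candidate)
        if remaining.get(group, 0) > 0:
            selected.append(candidate)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return selected


# Hypothetical passers-by, in the order the field worker meets them.
passers_by = [
    {"name": "A", "gender": "female"},
    {"name": "B", "gender": "male"},
    {"name": "C", "gender": "male"},
    {"name": "D", "gender": "male"},
    {"name": "E", "gender": "female"},
    {"name": "F", "gender": "female"},
]
quotas = {"female": 2, "male": 2}
sample = quota_sample(passers_by, quotas, lambda person: person["gender"])
print([person["name"] for person in sample])
```

Note that selection within each category is still left to encounter order rather than chance, which is why the design remains a non-probability sample.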
Finally, sampling on the Internet has increased significantly in the past decades, and almost every firm uses the Internet to conduct research.
CHAPTER B: SURVEY RESEARCH
There are 2 different types of data-collection methods: the observation approach and the communication approach.
The observation approach: involves observing conditions, behaviour, events, people or processes.
The communication approach: involves surveying/interviewing people and recording their responses for analysis. It means communicating with people about various topics, including participants’ attitudes, motivations, intentions and expectations.
The researcher determines the data-collection approach by identifying:
- the types of data needed
- the investigative questions the researcher must answer
- the desired data type (nominal/ordinal/interval/ratio)
- the characteristics of a sample unit
Survey: a measurement process used to collect information during a highly structured interview – sometimes with a human interviewer and other times without.
The goal of the survey: to obtain comparable data across subsets of the chosen sample so that similarities and differences can be found. A well-chosen question can yield information that would take much more time and effort to gather by observation.
To obtain comparable data:
- questions are carefully chosen or crafted
- questions are sequenced
- questions are precisely asked of each participant.
→ A survey that uses telephone, mail or the Internet as the medium of communication can expand geographic coverage at a fraction of the cost and time required by observation.
When combined with statistical probability sampling for selecting participants, survey findings and conclusions can be applied to large and diverse populations.
The strength of a survey as a primary data-collection approach is its versatility: it does not require there to be a visual or other objective perception of the information sought by the researcher.
There are three major sources of error in communication research:
- measurement questions and survey instruments,
- interviewers, and
- participants.
Research may become useless if the researcher:
- selects or crafts inappropriate questions;
- asks questions in an inappropriate order; or
- uses inappropriate transitions and instructions to obtain information.
(See Figure 7.1, p. 206; Figure 7.2, p. 207; and Figure 7.3, p. 208.)
For a survey to succeed, three conditions must be met by participants:
1. The participant must possess the information being targeted by the investigative questions.
2. The participant must understand his or her role in the interview as the provider of accurate information.
3. The participant must have adequate motivation to cooperate (= participant receptiveness).
→ To increase the participant’s motivation, it is important to establish a friendly relationship with the participant. This helps to avoid two kinds of error: whether participants respond (willingness to respond) and how they respond.
Three factors that will help with participant receptiveness:
1. the participant must believe that the participation experience will be pleasant and satisfying
2. the participant must believe that answering the survey is an important and worthwhile use of his/her time.
3. the participant must dismiss any mental reservations that he/she might have about participation.
Participants cause error in two ways: Whether they respond (willingness) and how they respond. To avoid participant-based errors:
Dealing with non-response errors - By failing to respond or refusing to respond, participants create a non-representative sample for the study overall or for a particular item or question in the study.
In surveys, non-response error occurs when the responses of participants differ in some systematic way from the responses of nonparticipants.
This occurs when:
- The researcher cannot locate the person (the pre-designated sample element) to be studied; or
- The researcher is unsuccessful in encouraging that person to participate.
Solutions to reduce errors of non-response:
- establishing and implementing callback procedures
- creating a non-response sample and weighting results from this sample
- substituting another individual for the missing participant
(Figure 7.5 p.213: comparison of communication approaches, including the disadvantages and advantages!)
Response errors: occur during the interview (created by either the interviewer or participant) or during the preparation of data for analysis.
Participant-initiated error: when the participant fails to answer fully and accurately – either by choice or because of inaccurate or incomplete knowledge
The interviewer can do little about the participant’s information level. The most appropriate applications for communication research are those where participants are uniquely qualified to provide the desired information.
Interviewer error: response bias caused by the interviewer.
Errors:
- Failure to secure full participant cooperation (sampling error). The sample is likely to be biased if interviewers do not obtain participant cooperation.
- Failure to record answers accurately and completely (data entry error). When the recording procedure forces the interviewer to summarize or interpret participant answers, or leaves insufficient space to record answers accurately, the data may be biased.
- Failure to consistently execute interview procedures. The precision of survey estimates will be reduced and there will be more error around estimates to the extent that interviewers are inconsistent in ways that influence the data.
- Failure to establish appropriate interview environment. Answers may be systematically inaccurate or biased when interviewers fail to appropriately train and motivate participants or fail to establish a suitable interpersonal setting.
- Falsification of individual answers or whole interviews. Surveying is difficult work, often done by part-time employees, usually with only limited training and under little direct supervision. At times, a falsification of an answer to an overlooked question happens, as an interviewer may put his own answer in the blank space.
- Inappropriate influencing behaviour. An interviewer can distort the results of any survey by inappropriate suggestions, directions, or verbal probes; by word emphasis and question rephrasing; by tone of voice; or by body language, facial reaction to an answer, or other nonverbal signals.
- Physical presence bias. Sometimes young people modify their responses during an interview conducted by an older person, who might be perceived as an authority.
Participants also cause error by responding in such a way as to unconsciously or consciously misrepresent their actual behaviour, attitudes, preferences, motivations, or intentions (= response bias).
→ Participants create response bias when they modify their responses to be socially acceptable or to save face or reputation with the interviewer (social desirability bias), and sometimes even in an attempt to appear rational and logical.
One major cause of response bias is acquiescence = the tendency to be agreeable.
In order to reduce non-response errors, a researcher can use follow-ups or reminders to increase the response rate.
In addition, there is evidence that advance notification, particularly by telephone, is effective in increasing response rates (preliminary notification).
Other concurrent techniques, such as appropriate questionnaire length, survey sponsorship, return envelopes and postage for mail surveys, personalization, cover letters and deadline dates also help to reduce the non-response error.
A researcher can conduct a semi-structured interview or survey by personal interview or telephone, or can distribute a self-administered survey by mail, fax, etc. Exhibit 7.5 provides an overview of the different communication approaches.
Telephone interviewing remains popular because of the dispersion of telephone service in households and the low cost of this method compared with personal interviewing. However, telephone interviewing also has some disadvantages.
The noncontact rate is the ratio of potential but unreached contacts (no answer, busy, answering machine, and disconnects, but not refusals) to all potential contacts.
The refusal rate refers to the ratio of contacted participants who decline the interview to all potential contacts. Moreover, the telephone interview is limited in length and in the use of visual or complex questions, the interviewee can easily hang up, and participants are less involved.
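A small worked example of the two rates, both expressed relative to all potential contacts. The call counts are hypothetical, invented only to show the arithmetic.

```python
def noncontact_rate(unreached_contacts, potential_contacts):
    """Unreached contacts (no answer, busy, etc.) relative to all potential contacts."""
    return unreached_contacts / potential_contacts


def refusal_rate(refusals, potential_contacts):
    """Contacted participants who decline, relative to all potential contacts."""
    return refusals / potential_contacts


# Hypothetical call log for a telephone survey.
potential = 1000
unreached = 250
refused = 150

print(noncontact_rate(unreached, potential))
print(refusal_rate(refused, potential))
```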
Disadvantages of telephone interviewing:
- Inaccessible households
- Inaccurate or non-functioning numbers
- Limitation on interview length
- Limitations on use of visual or complex questions
- Ease of interview termination
- Less participant involvement
- Distracting physical environment
Random dialling: requires choosing telephone exchanges or exchange blocks and then generating random numbers within these blocks for calling.
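A minimal sketch of this idea, assuming each exchange block is written as a prefix followed by a four-digit line number. The number format, the prefixes and the function name are hypothetical and chosen only for illustration.

```python
import random


def random_dial_numbers(exchanges, per_exchange, seed=None):
    """Generate random four-digit line numbers within each chosen exchange."""
    rng = random.Random(seed)
    numbers = []
    for exchange in exchanges:
        # Sampling without replacement avoids dialling the same number twice.
        for line_number in rng.sample(range(10000), per_exchange):
            numbers.append(f"{exchange}-{line_number:04d}")
    return numbers


# Hypothetical exchange prefixes.
numbers = random_dial_numbers(["555-010", "555-011"], per_exchange=3, seed=5)
print(numbers)
```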
Computer-assisted telephone interviewing (CATI)
The self-administered questionnaire is the most popular type of survey. Self-administered surveys can be delivered by mail or computer, or can be conducted as intercept studies.
Advantages of self–administered surveys include:
- They typically cost less than surveys via personal interviews;
- Mail surveys are typically perceived as more impersonal, providing more anonymity than the other communication modes, including other methods for distributing self-administered questionnaires.
Disadvantages of self-administered surveys include:
- Researchers cannot expect to obtain large amounts of information and cannot probe deeply into the issues of interest;
- Participants often refuse to cooperate with a long and/or complex mail or intercept questionnaire unless they perceive a personal benefit.
A survey via personal interview is a two-way conversation between a trained interviewer and a participant.
The main advantage lies in the depth of information and detail that can be secured. It far exceeds the information secured from telephone and self-administered studies. The interviewer can also do more things to improve the quality of the information received. However, this method is costly and time consuming, and its flexibility can result in excessive interviewer bias.
Computer-assisted personal interviewing (CAPI): special scoring devices and visual materials are used.
Intercept interview: targets participants in centralised locations, such as shoppers in retail malls. This reduces the costs associated with travel.
Lastly, outsourcing survey services offers special advantages to managers. A professionally trained research staff, centralized location interviewing, focus group facilities and computer assisted facilities are among them. Speciality firms offer software and computer based assistance for telephone and personal interviewing as well as for mail and mixed modes. Panel suppliers produce data for longitudinal studies of all varieties. However, it is very important to be careful in selecting interviewers and to give them appropriate training.
Web-based survey: Figure 7.7, p. 226
→ A web-based survey has the power of CATI systems, but without the expense of network administrators, specialised software or additional hardware.
CHAPTER C – EXPERIMENTS
Causal methods are research methods which answer questions such as “Why do events occur under some conditions and not under others?”
Ex post facto research designs, in which a researcher interviews respondents or observes what is or what has been, also have the potential for discovering causality. The distinction is that with ex post facto designs the researcher is required to accept the world as it is found, whereas an experiment allows the researcher to systematically alter the variables of interest and observe what changes follow.
Experiments are studies which involve intervention by the researcher beyond what is required for measurement. Usually this means manipulating some variable in a setting and observing how it affects the subjects being studied (e.g., physical entities or people). One manipulates the independent or explanatory variable and observes whether the hypothesized dependent variable is affected by this. In a causal relationship there is at least one independent variable (IV) and one dependent variable (DV). It is hypothesized that in some way the IV ‘causes’ the DV to occur.
The basis for the conclusion of the experiment is formed by three types of evidence:
1. There must be an agreement between independent and dependent variables. In other words, the presence or absence of one can be linked to the presence or absence of the other.
2. The time order of the occurrence of the variables has to be considered. The dependent variable should not precede the independent variable.
3. The researchers ought to be confident that other extraneous variables did not influence the dependent variable. To ensure that extraneous variables are not the source of influence, researchers control the ability of those variables to confound the planned comparison. Standardized conditions for control can be arranged under laboratory conditions. Such controls are important, but further precautions are needed so that the results achieved reflect only the influence of the independent variable on the dependent variable.
ADVANTAGES OF EXPERIMENTS
Causality cannot be proved with certainty, but the probability of one variable being related to another can be established credibly. An experiment comes closer than any other primary data collection method to accomplishing this.
1. The primary advantage is the researcher’s ability to manipulate the independent variable. Consequently, the probability increases that changes in the dependent variable are a function of that manipulation. Also, a control group serves as a comparison to assess the existence and potency of the manipulation.
2. The second advantage is that influence of extraneous variables can be controlled more effectively than in other designs. This helps the researcher isolate experimental variables and evaluate their impact over time.
3. Thirdly, the convenience and cost are superior to other methods. This allows the experimenter opportunistic scheduling of data collection and the flexibility to adjust variables and conditions that evoke extremes not observed under routine circumstances. Also, the experimenter can assemble combinations of variables for testing rather than having to search for their unexpected appearance in the study environment.
4. Fourth, replication (= repeating an experiment with different subject groups and conditions) leads to the discovery of an average effect of the independent variable across people, situations and times.
5. Fifth, researchers can use naturally occurring events and, to some extent, field experiments (a study of the dependent variable in actual environmental conditions) to reduce subjects’ perceptions of the researcher as a source of intervention or deviation in their everyday lives.
DISADVANTAGES OF EXPERIMENTS
1. It is argued that the primary disadvantage of the experimental method is the artificiality of the laboratory. However, many subjects’ perceptions of an unnatural environment can be improved by investment in the facility.
2. Second, despite random assignment, generalization from non-probability samples can pose problems. Additionally, when an experiment is not successfully disguised, volunteer subjects are often those with the most interest in the topic.
3. Despite the relatively low costs of experimentation, many applications of experimentation far outrun the budgets for other primary data collection methods.
4. Experimentation is most effectively targeted at problems of the present or immediate future, as the studies of the past are not feasible, and studies about intentions or predictions are difficult.
5. There are limits to the types of manipulation and controls that are ethical in the study of people.
CONDUCTING AN EXPERIMENT
Researchers, in a well-executed experiment, must complete a series of activities to carry out their craft successfully.
- The researcher has to start by selecting relevant variables and specifying treatment levels.
- Then the issue of control of the experimental environment has to be considered.
- Subsequently an experimental design has to be chosen and subjects have to be selected and assigned.
- The next step is pilot testing, revising and testing.
- In the end the data collected has to be analysed.
STEP 1 - SELECTING RELEVANT VARIABLES: The researcher’s task is to translate a vague problem into the question or hypothesis that best states the objectives of the research. Depending on the complexity of the problem, investigative questions and additional hypotheses can be created to address specific facets of the study or data that need to be gathered.
A hypothesis is a relational statement, as it describes a relationship between two or more variables. It must also be operationalized (that is, its concepts must be transformed into variables that are measurable and subject to testing). Once the researcher has formulated the research question and hypothesis, the researcher has to:
- Select variables that are the best operational representations of the original concepts.
- Determine how many variables to test.
- Select or design appropriate measures for them.
The number of variables in an experiment is constrained by the project budget, the time allocated, the availability of appropriate controls, and the number of subjects being tested. There must be more subjects than variables – for statistical reasons.
The selection of measures for testing requires a thorough review of the available literature and instruments. Also, measures must be adapted to the unique needs of the research situation without compromising their intended purpose or original meaning.
STEP 2 - SPECIFYING TREATMENT LEVELS: In an experiment, participants experience a manipulation of the independent variable, called the experimental treatment.
The treatment levels of the independent variable are the arbitrary or natural groups the researcher makes within the independent variable of an experiment. The levels assigned to an independent variable should be based on simplicity and common sense.
A control group could provide a base level for comparison. The control group is composed of subjects who are not exposed to the independent variable(s), in contrast to those who receive the experimental treatment.
STEP 3 - CONTROLLING THE EXPERIMENTAL ENVIRONMENT: Extraneous variables can appear as differences in age, gender, race, dress, communication, etc. These have the potential for distorting the effect of the treatment on the dependent variable and must be controlled or eliminated. However, at this stage, a researcher is mainly concerned with environmental control, holding constant the physical environment of the experiment. For consistency, the introduction of the experiment to the subjects and the instructions would likely be videotaped. The arrangement of the room, the time of administration, the experimenter’s contact with the subjects, and so forth, must all be consistent across each administration of the experiment. Other forms of control involve subjects and experimenters. When subjects do not know if they are receiving the experimental treatment, they are said to be blind.
When the experimenters do not know if they are giving the treatment to the experimental group or to the control group, the experiment is said to be double blind. Both approaches control unwanted complications such as subjects’ reactions to expected conditions or experimenter influence.
STEP 4 - CHOOSING THE EXPERIMENTAL DESIGN: Experimental designs are unique to the experimental method. They serve as positional and statistical plans to designate relationships between experimental treatments and the experimenter’s observations and measurement points in the temporal scheme of the study. The researchers apply their knowledge to select one design that is best suited to the goals of the research. Judicious selection of the design improves the probability that the observed change in the dependent variable was caused by the manipulation of the independent variable and not by any other factor. It simultaneously strengthens the generalizability of results beyond the experimental setting.
STEP 5 - SELECTING AND ASSIGNING PARTICIPANTS: The selected participants should be representative of the population to which the researcher wishes to generalize the results from the study. In principle, the procedure for random sampling of experimental subjects is similar to the selection of respondents for a survey. First the researcher prepares a sampling frame and then randomly assigns the subjects for the experiment to groups.
Systematic sampling may be used if the sampling frame is free from any form of periodicity that parallels the sampling ratio. Since the sampling frame is often small, experimental subjects are recruited; thus, they are a self-selecting sample.
However, if randomization is used, those assigned to the experimental group are likely to be similar to those assigned to the control group.
Random assignment to the groups is required to make the groups as comparable as possible with respect to the dependent variable. Randomization does not guarantee that if the groups were pretested they would be pronounced identical; but it is an assurance that those differences remaining are randomly distributed.
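The random assignment described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical frame of 20 recruited subjects and two groups; the function name, subject labels, and seed are assumptions made for the example, not part of the original text.

```python
import random

def randomly_assign(subjects, group_names, seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)  # seeded only so that the sketch is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for i, subject in enumerate(shuffled):
        groups[group_names[i % len(group_names)]].append(subject)
    return groups

# Hypothetical sampling frame of 20 recruited subjects.
frame = [f"S{n:02d}" for n in range(1, 21)]
groups = randomly_assign(frame, ["experimental", "control"], seed=42)
print({name: len(members) for name, members in groups.items()})
```

Because chance alone decides who lands in which group, any differences that remain between the groups are randomly distributed rather than systematic.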
Matching may be used when it is not possible to randomly assign subjects to groups. This employs a non-probability quota sampling approach. The object of matching is to have each experimental and control subject matched on every characteristic used in the research. Since the characteristics of concern are only those that are correlated with the treatment condition or the dependent variable, they are easier to identify, control, and match.
Some authorities suggest a quota matrix as the most efficient means of visualising the matching process. E.g. one-third of the subjects from each cell of the matrix would be assigned to each of the three groups (2 experimental + 1 control). If matching does not alleviate the assignment problem, a combination of matching, randomization, and increasing the sample size would be used.
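The quota matrix idea can be sketched as follows. The example assumes two hypothetical matching characteristics (experience and shift), six subjects per cell, and three groups (two experimental, one control); all names and values are illustrative assumptions. Within each cell the subjects are shuffled, which combines matching with randomization.

```python
import random
from collections import defaultdict
from itertools import product

def assign_from_quota_matrix(subjects, keys, group_names, seed=None):
    """Split every cell of the quota matrix evenly across the groups."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for subject in subjects:
        # A cell is one combination of the matching characteristics.
        cells[tuple(subject[k] for k in keys)].append(subject)
    groups = {name: [] for name in group_names}
    for cell in sorted(cells):
        members = cells[cell]
        rng.shuffle(members)  # randomize within the cell
        for i, subject in enumerate(members):
            groups[group_names[i % len(group_names)]].append(subject)
    return groups

# Hypothetical subjects: 2 x 2 matrix with six subjects in each cell.
subjects = []
for experience, shift in product(("junior", "senior"), ("day", "night")):
    for _ in range(6):
        subjects.append({"id": len(subjects), "experience": experience, "shift": shift})

group_names = ["experimental_1", "experimental_2", "control"]
matched_groups = assign_from_quota_matrix(
    subjects, ["experience", "shift"], group_names, seed=1
)
print({name: len(members) for name, members in matched_groups.items()})
```

Each group receives one-third of every cell, so the three groups have the same composition on the matched characteristics.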
STEP 6 - PILOT TESTING, REVISING, AND TESTING: The procedures for this stage are similar to those for other forms of primary data collection. Pilot testing is intended to reveal errors in the design and improper control of extraneous or environmental conditions. Pretesting the instruments permits refinement before the final test. This allows for revising scripts, determining control problems with laboratory conditions, and scanning of the environment for factors that might confound the results.
STEP 7 - ANALYZING THE DATA: If adequate planning and pretesting have occurred, the experimental data will take an order and structure uncommon to surveys and unstructured observational studies.
Researchers have several measurement and instrument options with experiments. Among them are:
- Observational techniques and coding schemes.
- Paper-and-pencil tests.
- Self-report instruments with open-ended or closed questions.
- Scaling techniques (e.g. Likert scales, semantic differentials, Q-sort).
- Physiological measures (e.g. galvanic skin response, EKG, voice pitch analysis, eye dilation).
VALIDITY IN EXPERIMENTATION
There is always a question about whether the results are true. Validity has been defined as whether a measure accomplishes its claims. There are several different types of validity, but here only the two major varieties are considered: internal validity – do the conclusions drawn about a demonstrated experimental relationship truly imply cause? – and external validity – does an observed causal relationship generalize across persons, settings, and times? Each type of validity has specific threats a researcher should guard against.
INTERNAL VALIDITY – following are some of the threats to the internal validity:
- History – in an experiment some events may occur that confuse the relationship being studied. In many experimental designs, a researcher takes a control measurement (O1) of the dependent variable before introducing the manipulation (X).
After the manipulation, a researcher takes an after-measurement (O2) of the dependent variable. The difference between O1 and O2 is then attributed to the manipulation, although events that occur between the two measurements can also contribute to it.
- Maturation – changes may also occur within the subject that are a function of the passage of time and are not specific to any particular event. These are of special concern when the study covers a long time, but they may also be factors in tests that are as short as an hour or two. A subject can become hungry, bored, or tired in a short time, which can affect response results.
- Testing – the process of taking a test can affect the scores of a second test. Taking the first test can have a learning effect that influences the results of the second test.
- Instrumentation – this threat to internal validity results from changes between observations in either the measuring instrument or the observer.
Using different questions at each measurement is an obvious source of potential trouble, but using different observers or interviewers also threatens validity. There can even be an instrumentation problem if the same observer is used for all measurements - experience, boredom, fatigue, and anticipation of results can all distort the results of separate observations.
- Selection – an important threat to internal validity is the differential selection of subjects for experimental and control groups. Validity considerations require that the groups be equivalent in every respect.
If subjects are randomly assigned to experimental and control groups, this selection problem can be largely overcome. Additionally, matching the members of the groups on key factors can enhance the equivalence of the groups.
- Statistical regression – this factor operates especially when groups have been selected by their extreme scores. No matter what is done between O1 and O2, there is a strong tendency for the average of the high scores at O1 to decline at O2 and for the low scores at O1 to increase. This tendency results from imperfect measurement that, in effect, records some persons abnormally high and abnormally low at O1. In the second measurement, members of both groups score more closely to their long-run mean scores.
- Experimental mortality – this occurs when the composition of the study groups changes during the test. Attrition is especially likely in the experimental group and with each dropout the group changes. Because members of the control group are not affected by the testing situation, they are less likely to withdraw.
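The statistical regression threat listed above can be illustrated with a small simulation. The sketch assumes a simple model in which each observed score is a stable long-run score plus independent measurement error, with no treatment at all between O1 and O2; the distributions, group cut-offs, and seed are assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Assumed model: observed score = long-run score + random measurement error.
long_run = [rng.gauss(50, 10) for _ in range(n)]
o1 = [score + rng.gauss(0, 10) for score in long_run]
o2 = [score + rng.gauss(0, 10) for score in long_run]  # nothing happens in between

# Select the groups by their extreme scores at O1 (bottom and top 10%).
ranked = sorted(range(n), key=lambda i: o1[i])
low_group = ranked[: n // 10]
high_group = ranked[-(n // 10):]

def group_mean(scores, members):
    """Average score of the given group members."""
    return mean(scores[i] for i in members)

print("high group:", round(group_mean(o1, high_group), 1), "->", round(group_mean(o2, high_group), 1))
print("low group: ", round(group_mean(o1, low_group), 1), "->", round(group_mean(o2, low_group), 1))
```

The high group's average falls and the low group's average rises at O2 even though no manipulation took place, which is why extreme-score selection can be mistaken for a treatment effect.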
In general, the threats mentioned are dealt with adequately in experiments by random assignment. However, five additional threats to internal validity are independent of whether or not one randomizes. The first three have the effect of equalizing experimental and control groups.
- Diffusion or imitation of treatment – if the control group learns of the treatment (by talking to people in the experimental group) it eliminates the difference between the groups.
- Compensatory equalization – where the experimental treatment is much more desirable, there may be an administrative reluctance to deprive the control group members. Compensatory actions for the control groups may confound the experiment.
- Compensatory rivalry – this may occur when members of the control group know they are in the control group. This may generate competitive pressures.
- Resentful demoralization of the disadvantaged – when the treatment is desirable and the experiment is obtrusive, control group members may become resentful of their deprivation and lower their cooperation and output.
- Local history – the regular history effect already mentioned impacts both experimental and control groups alike. However, when one assigns all experimental persons to one group session and all control people to another, there is a chance for some peculiar event to confound results.
This can be handled by administering treatments to individuals or small groups that are randomly assigned to experimental or control sessions.
EXTERNAL VALIDITY – internal validity factors cause confusion about whether the experimental treatment (X) or extraneous factors are the source of observation differences. External validity is concerned with the interaction of the experimental treatment with other factors and the resulting impact on the ability to generalize to (and across) times, settings, or persons. The following interactive possibilities are among the major threats to external validity:
- Reactivity of testing on X – the reactive effect refers to sensitising subjects via a pre-test so that they respond to the experimental stimulus (X) in a different way. This before-measurement effect can be particularly significant in experiments where the IV is a change in attitude.
- Interaction of selection and X – the process by which test subjects are selected for an experiment may be a threat to external validity. The population from which one selects subjects may not be the same as the population to which one wishes to generalize results.
- Other reactive factors – experimental settings may have a biasing effect on a subject’s response to X. An artificial setting can obviously produce results that are not representative of larger populations. If subjects know they are participating in an experiment, there may be a tendency to role-play in a way that distorts the effects of X. Another reactive effect is the possible interaction between X and subject characteristics.
Problems of internal validity can be solved by the careful design of experiments, but this is less true for problems of external validity.
EXPERIMENTAL RESEARCH DESIGNS: The many experimental designs differ greatly in their power to control contamination of the relationship between independent and dependent variables. The most widely accepted designs are based on this characteristic of control: (1) pre-experiments, (2) true experiments, and (3) field experiments.
PREEXPERIMENTAL DESIGNS – all three fail to adequately control the various threats to internal validity.
- After-only study: First, treatment or manipulation of independent variable is conducted and then observation or measurement of dependent variable takes place. The lack of a pre-test and control group makes this design inadequate for establishing causality.
- One group Pre-test-Post-test Design: In this case there is a pre-test, which takes place before the manipulation, and the post-test. Still a weak design – how well does it control for history? Maturation? Testing effect? The others?
- Static Group Comparison – the design provides for two groups, one of which receives the experimental stimulus while the other serves as a control.
The addition of a comparison group creates a substantial improvement over the other two designs. Its chief weakness is that there is no way to be certain that the two groups are equivalent.
TRUE EXPERIMENTAL DESIGNS – the major deficiency of the pre-experimental designs is that they fail to provide comparison groups that are truly equivalent. The way to achieve equivalence is through matching and random assignment. With randomly assigned groups, tests of statistical significance of the observed differences can be employed. It is common to show an X for the test stimulus and a blank for the existence of a control situation. This is an oversimplification of what really occurs. More precisely, there is an X1 and an X2 - sometimes more. X1 identifies one specific independent variable, while X2 is another independent variable that has been chosen, often randomly, as the control case. Different levels of the same independent variable may also be used, with one level serving as the control.
- Pre-test-Post-test Control Group Design – this design consists of adding a control group to the one-group pre-test-post-test design and assigning the subjects to either of the groups by a random procedure (R). The seven major internal validity problems are dealt with fairly well in this design, but there are still some difficulties. Local history may occur in one of the groups, and not in the other. Also, if communication exists between people in test and control groups, there can be competition and other internal validity problems.
Maturation, testing, and regression are handled well because one would expect them to be felt equally in experimental and control groups. Mortality can be a problem if there are different dropout rates in the study groups. Selection is adequately dealt with by random assignment. The record of this design is not as good on external validity - there is a chance for a reactive effect from testing. This might be a substantial influence in attitude change studies where pre-tests introduce unusual topics and content. This design also doesn’t ensure against reaction between selection and the experimental variable.
Even random selection may be defeated by a high decline rate by subjects, resulting in using a disproportionate share of people who are essentially volunteers and who may not be typical of the population.
If this occurs, the experiment will need to be replicated several times with other groups under other conditions, before a researcher can be confident of external validity.
- Post-test-Only Control Group Design – The pre-test measurements are omitted in this design. Pre-tests are well established in classical research design but are not really necessary when it is possible to randomize.
The simplicity of this design makes it more attractive than the pre-test-post-test control group design. Internal validity threats from history, maturation, selection, and statistical regression are adequately controlled by random assignment. Because participants are measured only once, the threats of testing and instrumentation are reduced, but different mortality rates between experimental and control groups continue to be a potential problem. The external validity problem of testing interaction effect is reduced.
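For the pre-test-post-test control group design described above, one common way to summarize the result is to subtract the change in the control group from the change in the experimental group. The sketch below assumes that convention; the labels O3 and O4 for the control group's two measurements, the function name, and all scores are hypothetical and do not come from the text.

```python
from statistics import mean

def treatment_effect(o1, o2, o3, o4):
    """Change in the experimental group minus change in the control group.

    o1, o2: pre-test and post-test scores of the experimental group.
    o3, o4: pre-test and post-test scores of the control group (assumed labels).
    """
    return (mean(o2) - mean(o1)) - (mean(o4) - mean(o3))

# Hypothetical scores for two randomly assigned groups of four subjects.
experimental_pre = [10, 12, 11, 13]
experimental_post = [15, 16, 14, 17]
control_pre = [11, 12, 10, 13]
control_post = [12, 13, 11, 14]

effect = treatment_effect(experimental_pre, experimental_post, control_pre, control_post)
print(effect)
```

Subtracting the control group's change removes influences such as maturation and testing that are expected to affect both groups equally. In the post-test-only design the same comparison reduces to the difference between the two post-test means.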
FIELD EXPERIMENTS: QUASI- OR SEMI-EXPERIMENTS – under field conditions, a researcher often cannot control enough of the extraneous variables or the experimental treatment to use a true experimental design. If the stimulus condition occurs in a natural environment, a field experiment is required. Some studies are not possible with a control group, a pre-test, or randomization of customers. Pre-experimental designs or quasi-experiments are used to deal with such conditions.
It is often unknown when or to whom to expose the experimental treatment in a quasi-experiment. Usually, though, it can be decided when and whom to measure. A quasi-experiment is inferior to a true experimental design but is usually superior to pre-experimental designs.
- Non-equivalent Control Group Design – this is a strong and widely used quasi-experimental design. It differs from the pre-test-post-test control group design - the test and control groups are not randomly assigned.
There are two varieties.
- Intact equivalent design, in which the membership of the experimental and control groups is naturally assembled. Ideally, the two groups are as alike as possible. This design is especially useful when any type of individual selection process would be reactive.
- The self-selected experimental group design is weaker because volunteers are recruited to form the experimental group, while no volunteer subjects are used for control. This design is likely when subjects believe it would be in their interest to be a subject in an experiment.
- Separate Sample Pre-test-Post-test Design – Most applicable when it is unknown when and to whom to introduce the treatment, but the researcher can decide when and whom to measure. This is a weaker design because several threats to internal validity are not handled adequately.
History can confound the results but can be overcome by repeating the study at other times in other settings. It is considered superior to true experiments in external validity. Its strength results from its being a field experiment in which the samples are usually drawn from the population to which a researcher wishes to generalize the findings. The design is more appropriate where the population is large, if a before-measurement was reactive, or if there was no way to restrict the application of the treatment.
CHAPTER D: MEASUREMENT SCALES
To measure: to discover the extent, dimensions, quantity, or capacity of something, especially by comparison with a standard.
Measurement in research consists of assigning numbers to empirical events, objects or properties, or activities in compliance with a set of rules.
This definition implies that measurement is a three-part process:
- Selecting observable empirical events
- Developing a set of mapping rules: a scheme for assigning numbers or symbols to represent aspects of the event being measured
- Applying the mapping rule(s) to each observation of that event.
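The three-part process above can be sketched with a small example. It assumes a hypothetical five-point agreement question; the rule, the answers, and the function name are illustrative assumptions.

```python
# Part 2: a mapping rule (hypothetical) that assigns numbers to verbal answers.
AGREEMENT_RULE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def apply_mapping_rule(rule, observations):
    """Part 3: apply the rule to every observation; unmapped answers raise an error."""
    unmapped = [obs for obs in observations if obs not in rule]
    if unmapped:
        raise ValueError(f"No mapping rule for: {unmapped}")
    return [rule[obs] for obs in observations]

# Part 1: the observable empirical events (hypothetical survey answers).
answers = ["agree", "neutral", "strongly agree", "disagree"]
scores = apply_mapping_rule(AGREEMENT_RULE, answers)
print(scores)
```

Raising an error for an unmapped answer reflects the requirement that the rule cover every observation it is applied to.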
Variables being studied in research may be classified as objects or as properties.
- Objects include the concepts of ordinary experience, such as touchable items like furniture. Objects also include things that are not as concrete, e.g. genes, attitudes and peer-group pressures.
- Properties are the characteristics of the object. A person’s physical properties may be stated in terms of weight, height.
- Psychological properties include attitudes and intelligence.
- Social properties include leadership ability, class affiliation, and status. In a literal sense, researchers do not measure either objects or properties.
They measure indicants of the properties or indicants of the properties of objects. Since each property cannot be measured directly, one must infer its presence or absence by observing some indicant or pointer measurement.
MEASUREMENT SCALES
In measuring, one devises some mapping rule and then translates the observation of property indicants using this rule.
Several types of measurement are possible; the appropriate choice depends on what is assumed about the mapping rules.
Each one has its own set of underlying assumptions about how the numerical symbols correspond to real-world observations.
Mapping rules have four assumptions:
1. Numbers are used to classify, group, or sort responses. No order exists.
2. Numbers are ordered. One number is greater than, less than, or equal to another number.
3. Differences between numbers are ordered. The difference between any pair of numbers is greater than, less than, or equal to the difference between any other pair of numbers.
4. The number series has a unique origin indicated by the number zero. This is an absolute and meaningful zero point.
Combinations of these characteristics of classification, order, distance, and origin provide four widely used classifications of measurement scales:
1) NOMINAL SCALES – with these scales, a researcher is collecting information on a variable that naturally (or by design) can be grouped into two or more categories that are mutually exclusive and collectively exhaustive.
The only possible arithmetic operation when a nominal scale is employed is the counting of members.
Nominal classifications can consist of any number of separate groups if the groups are mutually exclusive and collectively exhaustive. These scales are the least powerful of the four data types. They suggest no order or distance relationship and have no arithmetic origin.
Any information a sample element might share about varying degrees of the property being measured is wasted by this scale. The only quantification is the number count of cases in each category (the frequency distribution), so the researcher is restricted to the use of the mode as the measure of central tendency.
It can only be concluded which category has the most members. There is no generally used measure of dispersion for nominal scales.
Dispersion: describes how scores cluster or scatter in a distribution. Nominal data are statistically weak, but they can still be useful. One can almost always classify a set of properties into a set of equivalent classes. Nominal measures are especially valuable in exploratory work where the objective is to uncover relationships rather than secure precise measurements. Nominal scales are also widely used in survey and other research when data are classified by major subgroups of the population. Classifications such as respondents’ marital status, gender, political orientation, and exposure to a certain experience provide insight into important demographic data patterns.
2) ORDINAL SCALES – include the characteristics of the nominal scale plus an indicator of order. Ordinal data require conformity to a logical postulate: If a > b and b > c, then a > c. The use of an ordinal scale implies a statement of ‘greater than’ or ‘less than’ (or equal) without stating how much greater or less. Other descriptions can be used – ‘superior to’, ‘happier than’ etc. An ordinal concept can be extended beyond the three cases used in the simple illustration of a>b>c – any number of cases can be ranked.
Another extension of the ordinal concept occurs when there is more than one property of interest. Examples of ordinal data include attitude and preference scales.
Because the numbers used with ordinal scales have only a rank meaning, the appropriate measure of central tendency is the median. The median is the midpoint of a distribution. A percentile or quartile reveals the dispersion. Correlational analysis of ordinal data is restricted to various ordinal techniques.
Measures of statistical significance are technically confined to a body of statistics known as nonparametric methods, synonymous with distribution-free statistics.
3) INTERVAL SCALES – have the power of nominal and ordinal data plus one additional strength: they incorporate the concept of equality of interval (the scaled distance between 1 and 2 equals the distance between 2 and 3). Calendar time is such a scale. Centigrade and Fahrenheit temperature scales are other examples of classical interval scales. Both have an arbitrarily determined zero point, not a unique origin. Researchers treat many attitude scales as interval. When a scale is interval and the data are relatively symmetric with one mode, you use the arithmetic mean as the measure of central tendency. When the distribution of scores computed from interval data leans in one direction or the other (skewed right or left), we often use the median as the measure of central tendency and the interquartile range as the measure of dispersion.
4) RATIO SCALES – incorporate all of the powers of the previous scales plus the provision for absolute zero or origin. Ratio data represent the actual amounts of a variable. Measures of physical dimensions such as weight, height, and distance are examples. In business research, we find ratio scales in many areas – there are money values, population counts, return rates, and productivity rates. For statistical purposes the analyst would use the same statistical techniques as with interval data. All statistical techniques mentioned up to this point are usable with ratio scales. Other manipulations carried out with real numbers may be done with ratio-scale values. Thus, multiplication and division can be used with this scale but not with the others mentioned. Geometric and harmonic means are measures of central tendency, and coefficients of variation may also be calculated for describing variability. Higher levels of measurement generally yield more information.
Because of the measurement precision at higher levels, more powerful and sensitive statistical procedures can be used. When we collect information at higher levels, we can always convert, rescale, or reduce the data to arrive at a lower level.
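The link between the four scale types and the statistics they permit can be sketched with Python's standard library. All data below are hypothetical, and the income cut-offs used to reduce ratio data to a lower level are assumptions made for the example.

```python
from collections import Counter
from statistics import geometric_mean, mean, median, stdev

# Hypothetical responses measured at each of the four levels.
marital_status = ["married", "single", "married", "divorced", "single", "married"]  # nominal
satisfaction_rank = [1, 2, 2, 3, 4, 4, 5]       # ordinal (1 = lowest, 5 = highest)
temperature_c = [18.0, 21.5, 19.0, 22.5, 20.0]  # interval (arbitrary zero)
income = [20000, 35000, 50000, 80000, 120000]   # ratio (absolute zero)

# Nominal: only counting is meaningful, so the mode is the central tendency.
counts = Counter(marital_status)
modal_category, modal_count = counts.most_common(1)[0]

# Ordinal: rank order is meaningful, so the median applies.
median_rank = median(satisfaction_rank)

# Interval: equal distances make the arithmetic mean meaningful.
mean_temperature = mean(temperature_c)

# Ratio: an absolute zero also permits the geometric mean
# and the coefficient of variation.
geometric_mean_income = geometric_mean(income)
coefficient_of_variation = stdev(income) / mean(income)

def to_income_band(amount):
    """Reduce a ratio-level income to an ordinal band (cut-offs are hypothetical)."""
    if amount < 30000:
        return 1  # low
    if amount < 90000:
        return 2  # middle
    return 3      # high

income_band = [to_income_band(amount) for amount in income]  # ratio -> ordinal
is_high_income = [band == 3 for band in income_band]         # ordinal -> nominal

print(modal_category, median_rank, mean_temperature)
print(round(geometric_mean_income), round(coefficient_of_variation, 2))
print(income_band, is_high_income)
```

The conversion only works downward: the income bands cannot be turned back into the original amounts, which is why collecting data at the highest feasible level keeps more options open.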
SOURCES OF MEASUREMENT DIFFERENCES
Since complete control (of a study) is unattainable, error does occur. Much error is systematic (results from bias), while the remainder is random (occurs erratically). There are four major error sources which may contaminate the results:
- THE RESPONDENT – opinion differences that affect measurement come from relatively stable characteristics of the respondent. Typical of these are employee status, ethnic group membership, social class, etc.
The skilled researcher will anticipate many of these dimensions, adjusting the design to eliminate, neutralize, or otherwise deal with them. Respondents may be reluctant to express strong negative or positive feelings, may purposefully express attitudes that they perceive as different from those of others, or may have little knowledge about something but be reluctant to admit ignorance. This reluctance to admit ignorance of a topic can lead to an interview consisting of ‘guesses’ or assumptions, which, in turn, create erroneous data. Respondents may also suffer from temporary factors like fatigue, boredom, anxiety, hunger, etc.; these limit the ability to respond accurately and fully.
- SITUATIONAL FACTORS – any condition that places a strain on the interview or measurement session can have serious effects on the interviewer-respondent rapport. If another person is present, that person can distort responses by joining in, by distracting, or by merely being there. If the respondents believe anonymity is not ensured, they may be reluctant to express certain feelings.
- THE MEASURER – the interviewer can distort responses by rewording, paraphrasing, or reordering questions. Stereotypes in appearance and action introduce bias. Inflections of voice and conscious or unconscious prompting with smiles, nods, and so forth, may encourage or discourage certain replies. Checking of the wrong response or failure to record full replies will obviously distort findings.
In the data analysis stage, incorrect coding, careless tabulation, and faulty statistical calculation may introduce further errors.
- THE INSTRUMENT – a defective instrument can cause distortion in two major ways. First, it can be too confusing and ambiguous. The use of complex words and syntax beyond participant comprehension is typical. Leading questions, ambiguous meanings, mechanical defects (inadequate space for replies, response-choice omissions, and poor printing), and multiple questions suggest the range of problems.
Many of these problems are the direct result of operational definitions that are insufficient, resulting in an inappropriate scale being chosen or developed. A more elusive type of instrument deficiency is poor selection from the universe of content items. Seldom does the instrument explore all the potentially important issues.
Even if the general issues are studied, the questions may not cover enough aspects of each area of concern.
THE CHARACTERISTICS OF GOOD MEASUREMENT
The tool should be an accurate counter or indicator of what we are interested in measuring. In addition, it should be easy and efficient to use.
There are three major criteria for evaluating a measurement tool:
- VALIDITY – is the extent to which a test measures what we actually wish to measure. This text features two major forms: external and internal validity. The external validity of research findings is the data’s ability to be generalized across persons, settings, and times. Internal validity is further limited in this discussion to the ability of a research instrument to measure what it is purported to measure. One widely accepted classification of validity consists of three major forms:
- Content Validity – of a measuring instrument is the extent to which it provides adequate coverage of the investigative questions guiding the study. If the instrument contains a representative sample of the universe of subject matter of interest, then content validity is good. To evaluate the content validity of an instrument, one must first agree on what elements constitute adequate coverage. A determination of content validity involves judgment.
- Criterion-Related Validity – reflects the success of measures used for prediction or estimation. You may want to predict an outcome or estimate the existence of a current behaviour or time perspective. An attitude scale that correctly forecasts the outcome of a purchase decision has predictive validity. An observational method that correctly categorizes families by current income class has concurrent validity. Any criterion measure must be judged in terms of four qualities: relevance, freedom from bias, reliability, and availability. A criterion is relevant if it is defined and scored in the terms we judge to be the proper measures of someone’s success. An erratic criterion can hardly be considered a reliable standard by which to judge performance on a sales employment test. Finally, the information specified by the criterion must be available.
If it is not available, how much will it cost and how difficult will it be to secure? The amount of money and effort that should be spent on development of a criterion depends on the importance of the problem for which the test is used. Once there are test and criterion scores, they must be compared in some way.
- Construct validity – in attempting to evaluate construct validity, we consider both the theory and the measuring instrument being used. If we were interested in measuring the effect of trust in cross-functional teams, the way in which ‘trust’ was operationally defined would have to correspond to an empirically grounded theory. If a known measure of trust was available, we might correlate the results obtained using this measure with those derived from our new instrument.
Such an approach would provide us with preliminary indications of convergent validity (the degree to which scores on one scale correlate with scores on other scales designed to assess the same construct). Another method of validating the trust construct would be to separate it from other constructs in the theory or related theories. To the extent that trust could be separated from bonding, reciprocity, and empathy, we would have completed the first steps toward discriminant validity (the degree to which scores on a scale do not correlate with scores from scales designed to measure different constructs).
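The correlation checks described for convergent and discriminant validity can be sketched in a few lines of Python. This is only an illustration: the scores are invented, the variable names are hypothetical, and Pearson's r is assumed as the correlation measure (the text does not name one).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented scores for six respondents (illustration only).
new_trust = [12, 15, 9, 20, 17, 11]      # our new trust instrument
known_trust = [30, 36, 25, 44, 40, 28]   # an established measure of trust
empathy = [18, 13, 16, 17, 13, 13]       # a scale for a different construct

# High correlation with the known measure hints at convergent validity.
convergent_r = pearson(new_trust, known_trust)
# Near-zero correlation with a different construct hints at discriminant validity.
discriminant_r = pearson(new_trust, empathy)

print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

With these made-up numbers the new instrument tracks the known trust measure closely and is almost unrelated to the empathy scale, which is the pattern one hopes to see.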
- RELIABILITY – has to do with the accuracy and precision of a measurement procedure. A measure is reliable to the degree that it supplies consistent results. Reliability is a necessary contributor to validity but is not a sufficient condition for validity. If a measurement is not valid, it hardly matters if it is reliable – because it does not measure what the designer needs to measure in order to solve the research problem. In this context, reliability is not as valuable as validity, but it is much easier to assess.
Reliability is concerned with estimates of the degree to which a measurement is free of random or unstable error. Reliable instruments can be used with confidence that transient and situational factors are not interfering. Reliable instruments are robust; they work well at different times under different conditions. This distinction of time and condition is the basis of frequently used perspectives on reliability:
- Stability – a measure is said to possess stability if consistent results with repeated measurements of the same person with the same instrument can be secured. An observation procedure is stable if it gives the same reading on a particular person when repeated one or more times.
Some of the difficulties that can occur in the test-retest methodology and cause a downward bias in stability include:
- Time delay between measurements – leads to situational factor changes.
- Insufficient time between measurements – permits the respondent to remember previous answers and repeat them, resulting in biased reliability indicators.
- Respondent’s discernment of a study’s disguised purpose – may introduce bias if the respondent holds opinions related to the purpose but not assessed with current measuring questions.
- Topic sensitivity – occurs when the respondent seeks to learn more about the topic or form new and different opinions before the retest.
A suggested remedy is to extend the interval between test and retest (from two weeks to a month).
- Equivalence – a second perspective on reliability considers how much error may be introduced by different investigators (in observation) or different samples of items being studied (in questioning or scales). Thus, while stability is concerned with personal and situational fluctuations from one time to another, equivalence is concerned with variations at one point in time among observers and samples of items. A good way to test for the equivalence of measurements by different observers is to compare their scoring of the same event. In studies where a consensus among experts or observers is required, the similarity of the judges’ perceptions is sometimes questioned. One tests for item sample equivalence by using alternative or parallel forms of the same test administered to the same persons simultaneously. The results of the two tests are then correlated. Under this condition, the length of the testing process is likely to affect the subjects’ responses through fatigue, and the inferred reliability of the parallel form will be reduced accordingly. Some measurement theorists recommend an interval between the two tests to compensate for this problem. This approach, called delayed equivalent forms, is a composite of test-retest and the equivalence method. As in test-retest, one would administer form X followed by form Y to half of the examinees and form Y followed by form X to the other half to prevent ‘order of presentation’ effects.
- Internal Consistency – a third approach to reliability uses only one administration of an instrument or test to assess the internal consistency or homogeneity among the items.
The split-half technique can be used when the measuring tool has many similar questions or statements to which participants can respond. The instrument is administered and the results are separated by item into even and odd numbers or into randomly selected halves.
When the two halves are correlated, if the results of the correlation are high, the instrument is said to have high reliability in an internal consistency sense. The high correlation tells us there is similarity (or homogeneity) among the items. The potential for incorrect inferences about high internal consistency exists when the test contains many items – which inflate the correlation index.
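The split-half procedure above can be sketched as follows. The ratings are invented, the function names are hypothetical, and the odd/even split with a Pearson correlation is one possible reading of the technique rather than the only one.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_correlation(responses):
    """Correlate odd-item and even-item half scores across participants.

    responses: one list of item scores per participant, in questionnaire order.
    """
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson(odd_totals, even_totals)

# Invented 5-point ratings: five participants, six similar items.
responses = [
    [5, 4, 5, 5, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [4, 4, 4, 5, 4, 4],
    [1, 2, 1, 1, 2, 1],
]

r_halves = split_half_correlation(responses)
print(f"correlation between halves = {r_halves:.2f}")
```

A high value here would be read as high reliability in the internal consistency sense, subject to the caveat about long tests noted above.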
- PRACTICALITY – is concerned with a wide range of factors of economy, convenience, and interpretability. The scientific requirements of a project call for the measurement process to be reliable and valid, while the operational requirements call for it to be practical.
- Economy – some trade-off usually occurs between the ideal research project and the budget. Data are not free, and instrument length is one area where economic pressures dominate. The choice of data collection method is also often dictated by economic factors.
- Convenience – a measuring device passes the convenience test if it is easy to administer. A questionnaire or a measurement scale with a set of detailed but clear instructions, with examples, is easier to complete correctly than one that lacks these features. We can also make the instrument easier to administer by giving close attention to its design and layout.
- Interpretability – this aspect of practicality is relevant when persons other than the test designers must interpret the results. In such cases, the designer of the data collection instrument provides several key pieces of information to make interpretation possible.
Scaling is the ‘procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question.’ Procedurally, numbers are assigned to indicants of the properties of objects. Thus, one assigns a number scale to the various levels of heat and cold and calls it a thermometer.
SELECTING A MEASUREMENT SCALE – selecting and constructing a measurement scale requires the consideration of several factors that influence the reliability, validity, and practicality of the scale:
1) Research objectives – researchers face two general types of scaling objectives:
- To measure characteristics of the participants who participate in the study.
- To use participants as judges of the objects or indicants presented to them.
With the first study objective, the scale would measure the customers’ orientation as favourable or unfavourable. With the second objective, the same data may be used, but the focus is on how satisfied people are with different design options.
2) Response types – measurement scales fall into one of four general types: rating, ranking, categorization, and sorting. A rating scale is used when participants score an object or indicant without making a direct comparison to another object or attitude. Ranking scales constrain the study participant to making comparisons and determining order among two or more properties (or their indicants) or objects. A choice scale requires that participants choose one alternative over another. Categorization asks participants to put themselves or property indicants in groups or categories. Sorting requires that participants sort cards (representing concepts or constructs) into piles using criteria established by the researcher. The cards might contain photos or images or verbal statements of product features.
3) Data properties – decisions about the choice of measurement scales are often made with regard to the data properties generated by each scale. In increasing order of power, scales are classified as nominal, ordinal, interval, or ratio. Nominal scales classify data into categories without indicating order, distance, or unique origin. Ordinal data show relationships of more than and less than but have no distance or unique origin. Interval scales have both order and distance but no unique origin. Ratio scales possess all four properties: classification, order, distance, and unique origin. The assumptions underlying each level of scale determine how a particular measurement scale’s data will be analysed statistically.
4) Number of dimensions – measurement scales are either uni/one-dimensional or multidimensional. With a uni-dimensional scale, one seeks to measure only one attribute of the participant or object. A multidimensional scale recognizes that an object might be better described with several dimensions than on a uni-dimensional continuum.
5) Balanced or unbalanced – a balanced rating scale has an equal number of categories above and below the midpoint. Generally, rating scales should be balanced, with an equal number of favourable and unfavourable response choices. An unbalanced rating scale has an unequal number of favourable and unfavourable response choices.
6) Forced or unforced choices – an unforced-choice rating scale provides participants with an opportunity to express no opinion when they are unable to make a choice among the alternatives offered. A forced-choice scale requires that participants select one of the offered alternatives. Researchers often exclude the response choice ‘no opinion’, ‘don’t know’, or ‘neutral’ when they know that most participants have an attitude on the topic. However, when many participants are clearly undecided and the scale does not allow them to express their uncertainty, the forced-choice biases results.
7) Number of scale points – a scale should be appropriate for its purpose. For a scale to be useful, it should match the stimulus presented and extract information proportionate to the complexity of the attitude, object, concept, or construct. First, as the number of scale points increases, the reliability of the measure increases. Second, in some studies, scales with 11 points may produce more valid results than 3-, 5-, or 7-point scales. Third, some constructs require greater measurement sensitivity and the opportunity to extract more variance, which additional scale points provide.
Fourth, a larger number of scale points is needed to produce accuracy when using single-dimension versus multiple-dimension scales. Finally, in cross-cultural measurement, the cultural practices may condition participants to a standard metric.
8) Errors to avoid with rating scales – Before accepting participants’ ratings, their tendencies to make errors of central tendency and halo effect should be considered. Some raters are reluctant to give extreme judgments, and this fact accounts for the error of central tendency. Participants may also be ‘easy raters’ or ‘hard raters’, making what is called an error of leniency. These errors most often occur when the rater does not know the object or property being rated. To address these tendencies, researchers can:
- Adjust the strength of descriptive adjectives.
- Space the intermediate descriptive phrases farther apart.
- Provide smaller differences in meaning between the steps near the ends of the scale than between the steps near the centre.
- Use more points in the scale.
The halo effect is the systematic bias that the rater introduces by carrying over a generalized impression of the subject from one rating to another. Halo is especially difficult to avoid when the property being studied is not clearly defined, is not easily observed, is not frequently discussed, involves reactions with others, or is a trait of high moral importance. Ways of counteracting the halo effect include having the participant rate one trait at a time, revealing one trait per page, or periodically reversing the terms that anchor the endpoints of the scale, so positive attributes are not always on the same end of each scale.
RATING SCALES – rating scales are used to judge properties of objects without reference to other similar objects. These ratings may be in such form as ‘like-dislike’ or other classifications using even more categories.
1) Simple attitude scales – the simple category scale (also called a dichotomous scale) offers two mutually exclusive response choices. These may be ‘yes’ and ‘no’, ‘important’ and ‘unimportant’. This response strategy is particularly useful for demographic questions or where a dichotomous response is adequate. When there are multiple options for the rater but only one answer is sought, the multiple-choice, single-response scale is appropriate. Both the multiple-choice, single-response scale and the simple category scale produce nominal data. A variation, the multiple-choice, multiple-response scale (also called a checklist), allows the rater to select one or several alternatives. The cumulative feature of this scale can be beneficial when a complete picture of the participant’s choice is desired. This scale generates nominal data. Simple attitude scales are easy to develop, are inexpensive, and can be designed to be highly specific. The design approach is subjective. The researcher’s insight and ability offer the only assurance that the items chosen are a representative sample of the universe of attitudes about the attitude object. There is no evidence that each person will view all items with the same frame of reference as will other people.
2) Likert scales – the Likert scale is the most frequently used variation of the summated rating scale. Summated rating scales consist of statements that express either a favourable or an unfavourable attitude toward the object of interest. The participant is asked to agree or disagree with each statement. Each response is given a numerical score to reflect its degree of attitudinal favourableness, and the scores may be summed to measure the participant’s overall attitude.
The Likert scale is easy and quick to construct. Conscientious researchers ensure that each item meets an empirical test for discriminating ability between favourable and unfavourable attitudes. Likert scales are probably more reliable and provide a greater volume of data than many other scales. The scale produces interval data.
Originally, creating a Likert scale involved a procedure known as item analysis.
In the first step, a large number of statements were collected that met two criteria: (1) each statement was relevant to the attitude being studied; (2) each was believed to reflect a favourable or unfavourable position on that attitude. People similar to those who were going to be studied were asked to read each statement and to state the level of their agreement with it, using a 5-point scale.
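To ensure consistent results, the assigned numerical values are reversed if the statement is worded negatively. Each person’s responses are then summed to give a total score, and the groups with the highest and the lowest total scores are singled out. These two extreme groups represent people with the most favourable and least favourable attitudes toward the attitude being studied.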
These extremes are the two criterion groups by which individual items are evaluated.
Item analysis assesses each item based on how well it discriminates between those persons whose total score is high and those whose total score is low. The mean scores for the high-score and low-score groups are then tested for statistical significance by computing t values. Once the t value for each statement has been found, the statements are rank-ordered, and those with the highest t values are selected.
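The item analysis steps above (reverse scoring, total scores, extreme groups, t values, rank ordering) can be sketched as follows. Everything specific here is an assumption made for illustration: the pilot ratings are invented, the size of the extreme groups (`group_size`) is arbitrary, and the t value uses the unequal-variance form, which the text does not prescribe.

```python
from math import sqrt
from statistics import mean, variance

def reverse_score(score, points=5):
    """Flip a rating on a scale of the given length (5 becomes 1, 4 becomes 2, ...)."""
    return points + 1 - score

def t_value(high, low):
    """t statistic for the difference between two group means (unequal-variance form).

    Assumes at least one group shows some spread, so the standard error is not zero.
    """
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se

# Invented pilot data: eight respondents, three statements, 5-point agreement scale.
raw = [
    [5, 4, 1], [4, 5, 2], [5, 3, 1], [4, 4, 2],
    [2, 3, 4], [1, 4, 5], [2, 2, 4], [1, 3, 5],
]
negative_items = {2}   # statement 3 is worded negatively (hypothetical)
group_size = 4         # size of each extreme group (hypothetical choice)

# Step 1: reverse the scores of negatively worded statements.
scored = [
    [reverse_score(s) if i in negative_items else s for i, s in enumerate(row)]
    for row in raw
]
# Step 2: order respondents by total score and take the two extreme groups.
by_total = sorted(scored, key=sum, reverse=True)
high_group, low_group = by_total[:group_size], by_total[-group_size:]
# Step 3: compute a t value per statement and rank the statements by it.
t_values = {
    item: t_value([p[item] for p in high_group], [p[item] for p in low_group])
    for item in range(len(raw[0]))
}
ranked_items = sorted(t_values, key=t_values.get, reverse=True)
print(ranked_items, {k: round(v, 2) for k, v in t_values.items()})
```

In this made-up data the second statement separates the two groups least well, so it would be the first candidate for removal.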
3) Semantic differential scales – the semantic differential (SD) scale measures the psychological meanings of an attitude object using bipolar adjectives. Researchers use this scale for studies such as brand and institutional image. The method consists of a set of bipolar rating scales, usually with 7 points, by which one or more participants rate one or more concepts on each scale item. The SD scale is based on the proposition that an object can have several dimensions of connotative meaning. The meanings are located in multidimensional property space, called semantic space. Connotative meanings are suggested or implied meanings, in addition to the explicit meaning of an object. The semantic differential has several advantages. It is an efficient and easy way to secure attitudes from a large sample. These attitudes may be measured in both direction and intensity. The total set of responses provides a comprehensive picture of the meaning of an object and a measure of the person doing the rating. It is a standardized technique that is easily repeated but escapes many problems of response distortion found with more direct methods. It produces interval data.
4) Numerical/multiple rating list scales – numerical scales have equal intervals that separate their numeric scale points. The verbal anchors serve as the labels for the extreme points.
Numerical scales are often 5-point scales but may have 7 or 10 points. The participants write a number from the scale next to each item. The scale’s linearity, simplicity, and production of ordinal or interval data make it popular for managers and researchers. A multiple rating list scale is similar to the numerical scale but differs in two ways: (1) it accepts a circled response from the rater, and (2) the layout facilitates visualization of the results. The advantage is that a mental map of the participant’s evaluations is evident to both the rater and the researcher. This scale produces interval data.
5) Stapel scale – the Stapel scale is used as an alternative to the semantic differential, especially when it is difficult to find bipolar adjectives that match the investigative question. For example, there are three attributes of corporate image. The scale is composed of the word (or phrase) identifying the image dimension and a set of 10 response categories for each of the three attributes. Fewer response categories are sometimes used.
Participants select a plus number for the characteristic that describes the attitude object. The more accurate the description, the larger is the positive number. Similarly, the less accurate the description, the larger is the negative number chosen. Ratings range from +5 to -5, with participants selecting a number that describes the store very accurately to very inaccurately. Like the Likert, SD, and numerical scales, Stapel scales usually produce interval data.
6) Constant-sum scales – the constant-sum scale helps the researcher discover proportions. With a constant-sum scale, the participant allocates points to more than one attribute or property indicant, such that they total a constant sum, usually 100 or 10. Up to 10 categories may be used, but both participant precision and patience suffer when too many stimuli are proportioned and summed. The advantage of the scale is its compatibility with percentages and the fact that alternatives that are perceived to be equal can be so scored – unlike the case with most ranking scales. The scale is used to record attitudes, behaviour, and behavioural intent. The scale produces interval data.
7) Graphic rating scales – the scale was originally created to enable researchers to discern fine differences. Theoretically, an infinite number of ratings are possible if participants are sophisticated enough to differentiate and record them. They are instructed to mark their response at any point along a continuum. Usually, the score is a measure of length (millimetres) from either endpoint. The results are treated as interval data. The difficulty is in coding and analysis. This scale requires more time than scales with predetermined categories.
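Of the rating scales listed above, the constant-sum scale lends itself to a small worked sketch: the participant's points must add up to the constant sum, and each allocation can then be read as a proportion. The attribute names and point values below are invented, and `constant_sum_shares` is a hypothetical helper.

```python
def constant_sum_shares(allocation, total=100):
    """Check that the points add up to the constant sum and return each share."""
    allocated = sum(allocation.values())
    if allocated != total:
        raise ValueError(f"points add up to {allocated}, expected {total}")
    return {attribute: points / total for attribute, points in allocation.items()}

# Invented allocation of 100 points across three attributes.
allocation = {"price": 50, "quality": 30, "service": 20}
shares = constant_sum_shares(allocation)
print(shares)
```

Two attributes perceived as equal can simply receive the same number of points, which is the advantage over most ranking scales mentioned above.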
RANKING SCALES – in ranking scales, the participant directly compares two or more objects and makes choices among them.
Frequently, the participant is asked to select one as the ‘best’ or the ‘most preferred’. When there are only two choices, this approach is satisfactory, but it often results in ties when more than two choices are found. Using the paired-comparison scale, the participant can express attitudes unambiguously by choosing between two objects. The number of judgements required in a paired comparison is [(n)(n-1)/2], where n is the number of stimuli or objects to be judged. Reducing the number of comparisons per participant without reducing the number of objects can lighten this burden. Each participant can be presented with only a sample of the stimuli. In this way, each pair of objects must be compared an equal number of times. Another procedure is to choose a few objects that are believed to cover the range of attractiveness at equal intervals. All other stimuli are then compared to these few standard objects.
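To see how quickly the number of judgements grows, the formula n(n-1)/2 can be checked with a short sketch. The brand labels are placeholders, and `paired_comparisons` is a hypothetical helper built on the standard library's `itertools.combinations`.

```python
from itertools import combinations

def paired_comparisons(objects):
    """List every unordered pair of objects a participant would have to judge."""
    return list(combinations(objects, 2))

brands = ["A", "B", "C", "D", "E"]   # placeholder stimuli
pairs = paired_comparisons(brands)
n = len(brands)
print(len(pairs), n * (n - 1) // 2)  # both are 10 for five objects

# The burden rises steeply as objects are added.
for n_objects in (5, 10, 15):
    print(n_objects, n_objects * (n_objects - 1) // 2)  # 10, 45, 105
```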
Paired comparisons run the risk that participants will tire to the point that they give ill-considered answers or refuse to continue. A paired comparison provides ordinal data. The forced ranking scale lists attributes that are ranked relative to each other. This method is faster than paired comparisons and is usually easier and more motivating to the participant. A drawback to forced ranking is the number of stimuli that can be handled by this method. In addition, rank ordering produces ordinal data since the distance between preferences is unknown. Often the manager is interested in benchmarking. This calls for a standard by which other programs, processes, brands, or people can be compared. The comparative scale is ideal for such comparisons if the participants are familiar with the standard. Some researchers treat the data produced by comparative scales as interval data since the scoring reflects an interval between the standard and what is being compared. The rank or position of the item would be treated as ordinal data unless the linearity of the variables in question could be supported.
Arbitrary scales are designed by collecting several items that are unambiguous and appropriate to a given topic. These scales are not only easy to develop, but also inexpensive and can be designed to be highly specific. Moreover, arbitrary scales provide useful information and are adequate if developed skilfully.
Consensus scaling requires items to be selected by a panel of judges, who then evaluate them on:
- Their relevance to the topic area
- Their potential for ambiguity
- The level of attitude they represent
Thurstone’s equal-appearing interval scale is especially well known in this field.
Item analysis scaling is the procedure for evaluating an item based on how well it discriminates between those persons whose total score is high and those whose total score is low. The most popular scale using this approach is the Likert scale.
CUMULATIVE SCALES – total scores on cumulative scales have the same meaning. Given the person’s total score, it is possible to estimate which items were answered positively and negatively. A pioneering scale of this type was the scalogram.
Scalogram analysis is a procedure for determining whether a set of items forms a uni-dimensional scale. A scale is uni-dimensional if the responses fall into a pattern in which endorsement of the item reflecting the extreme position results in endorsing all items that are less extreme.
The scalogram and similar procedures for discovering underlying structure are useful for assessing attitudes and behaviours that are highly structured, such as social distance, organizational hierarchies, and evolutionary product stages.
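The cumulative pattern behind the scalogram can be illustrated with a short sketch. It assumes items are ordered from least to most extreme and coded 1 (endorsed) or 0 (not endorsed); both function names are hypothetical, and the sketch only checks the ideal pattern rather than performing a full scalogram analysis.

```python
def is_cumulative_pattern(responses):
    """True if no endorsed item follows a rejected one.

    responses: 1/0 answers ordered from the least to the most extreme item.
    """
    seen_rejection = False
    for endorsed in responses:
        if endorsed and seen_rejection:
            return False
        if not endorsed:
            seen_rejection = True
    return True

def estimate_endorsed_items(total_score, n_items):
    """On a perfect cumulative scale, the total score implies which items were endorsed."""
    return [1] * total_score + [0] * (n_items - total_score)

print(is_cumulative_pattern([1, 1, 1, 0]))   # True: endorses up to the third item
print(is_cumulative_pattern([1, 0, 1, 0]))   # False: skips a less extreme item
print(estimate_endorsed_items(2, 4))         # [1, 1, 0, 0]
```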
Factor scales include a variety of techniques that have been developed to address two problems:
- How to deal with a universe of content that is multidimensional
- How to uncover underlying dimensions that have not been identified by exploratory research
Factoring develops measurement questions through factor analysis or similar correlation techniques. It is particularly useful in uncovering latent attitude dimensions, and it approaches sampling through the concept of multidimensional attribute space. The semantic differential scale is an example.
Other developments in scaling include multidimensional scaling and conjoint analysis. Each represents a family of related techniques with a variety of applications for handling complex judgments. Magnitude estimation and Rasch models provide an avenue for reconceptualising traditional scaling techniques for greater efficiency and freedom from error.
CHAPTER E: QUESTIONNAIRES AND RESPONSES
There are 3 suggested phases of developing an instrument design.
PHASE 1: REVISITING THE RESEARCH QUESTION HIERARCHY
In general, once the researcher understands the connection between the investigative questions and the potential measurement questions, a strategy for the survey is the next step. The researcher then gets down to the particulars of instrument design. The following are important issues to be considered:
1) Type of scale for desired analysis – the analytical procedures available to the researcher are determined by the scale types used in the survey. It is important to plan the analysis before developing the measurement questions.
2) Communication approach – Communication-based research may be conducted by personal interview, telephone, mail, computer, or some combination of these (called hybrid studies). The different delivery mechanisms result in different introductions, instructions, instrument layout, and conclusions.
3) Disguising objectives and sponsors – it has to be decided whether the purpose of the study should be disguised. A disguised question is designed to conceal the question’s true purpose. The decision about when to use disguised questions within surveys may be made easier by identifying four situations where disguising the study objective is or is not an issue:
- Willingly shared, conscious-level information – in surveys requesting conscious-level information that should be willingly shared, either disguised or undisguised questions may be used, but the situation rarely requires disguised techniques.
- Reluctantly shared, conscious-level information – sometimes the participant knows the information which a researcher needs but is reluctant to share it for a variety of reasons. When the participant is asked for an opinion on some topic on which he may hold a socially unacceptable view, projective techniques are used. In this type of disguised question, the survey designer phrases the questions in a hypothetical way or asks how other people in the participant’s experience would answer the question. The assumption is that responses to these questions will indirectly reveal the participant’s opinions.
- Knowable, limited-conscious-level information – not all information is at the participant’s conscious level. Given some time – and motivation – the participant can express this information.
Asking about individual attitudes when participants know they hold the attitude but have not explored why they hold the attitude may encourage the use of disguised questions.
- Subconscious-level information – in assessing buying behaviour, it is accepted that some motivations are subconscious. This is true for attitudinal information as well. Seeking insight into the basic motivations underlying attitudes or consumption practices may or may not require disguised techniques.
4) Preliminary analysis plan – researchers are concerned with adequate coverage of the topic and with securing the information in its most usable form. A good way to test how well the study plan meets those needs is to develop ‘dummy’ tables that display the data one expects to secure. Each dummy table is a cross-tabulation between two or more variables. The preliminary analysis plan serves as a check on whether the planned measurement questions meet the data needs of the research question. This also helps the researcher determine the type of scale needed for each question – a preliminary step to developing measurement questions for investigative questions.
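A dummy table of the kind described in the preliminary analysis plan can be sketched as a simple cross-tabulation. The variable names, categories, and records below are invented placeholders standing in for the data one expects to secure, and `cross_tabulate` is a hypothetical helper.

```python
from collections import Counter

def cross_tabulate(records, row_var, col_var):
    """Count records for every combination of two categorical variables."""
    counts = Counter((r[row_var], r[col_var]) for r in records)
    rows = sorted({r[row_var] for r in records})
    cols = sorted({r[col_var] for r in records})
    # Counter returns 0 for combinations that never occur.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}

# Invented placeholder records (illustration only).
records = [
    {"age_group": "18-34", "satisfaction": "high"},
    {"age_group": "18-34", "satisfaction": "low"},
    {"age_group": "35+", "satisfaction": "high"},
    {"age_group": "35+", "satisfaction": "high"},
]

table = cross_tabulate(records, "age_group", "satisfaction")
for row, cells in table.items():
    print(row, cells)
```

Laying out the table before fieldwork shows which measurement questions, and which scale types, are needed to fill each cell.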
PHASE 2: CONSTRUCTING AND REFINING THE MEASUREMENT QUESTIONS
Drafting or selecting questions begins once a complete list of investigative questions is developed and a decision is made on the collection processes to be used. The order, type, and wording of the measurement questions, the introduction, the instructions, the transitions, and the closure in a quality questionnaire should accomplish the following:
- Encourage each participant to provide an adequate amount of information.
- Encourage each participant to provide accurate responses.
- Discourage each participant from early discontinuation of participation.
- Discourage each participant from refusing to answer specific questions.
- Leave the participant with a positive attitude about survey participation.
1) Question categories and structure - Questionnaires and interview schedules (an alternative term for the questionnaires used in personal interviews) can range from those that have a great deal of structure to those that are essentially unstructured. Questionnaires contain three categories of measurement questions:
- Administrative questions – identify the participant, interviewer, interview location, and conditions. These questions are rarely asked of the participant but are necessary for studying patterns within the data and identifying possible error sources.
- Classification questions – usually cover sociological-demographic variables that allow participants’ answers to be grouped so that patterns are revealed and can be studied.
These questions usually appear at the end of a survey (except for those used as filters or screens, questions that determine whether a participant has the requisite level of knowledge to participate).
- Target questions (structured or unstructured) – address the investigative questions of a specific study. These are grouped by topic in the survey. Target questions may be structured (they present the participants with a fixed set of choices, often called closed questions) or unstructured (they do not limit responses but do provide a frame of reference for participants’ answers; sometimes referred to as open-ended questions).
2) Question content - is first and foremost dictated by the investigative questions guiding the study. From these questions, questionnaire designers craft or borrow the target and classification questions that will be asked of participants.
Four questions, covering numerous issues, guide the instrument designer in selecting appropriate question content:
- Should this question be asked (does it match the study objective)?
- Is the question of proper scope and coverage?
- Can the participant adequately answer this question as asked?
- Will the participant willingly answer this question as asked?
3) Question wording - a dilemma arises from the requirements of question design (the need to be explicit, to present alternatives, and to explain meanings). All contribute to longer and more involved sentences. The difficulties caused by question wording exceed most other sources of distortion in surveys. The diligent question designer will put a survey question through many revisions. Leading questions can inject significant error by implying that one response should be favoured over another.
4) Response strategy - a third major area in question design is the degree and form of structure imposed on the participant.
The various response strategies offer options that include unstructured response (or open-ended response, the free choice of words) and structured response (or closed response, specified alternatives provided).
Free-response questions - also known as open-ended questions, ask the participant a question and either the interviewer pauses for the answer (which is unaided) or the participant records his or her ideas in his or her own words in the space provided on a questionnaire.
Dichotomous questions - suggest opposing responses (yes/no) and generate nominal data.
Multiple-choice questions - are appropriate when there are more than two alternatives or when a researcher seeks gradations of preference, interest, or agreement. Multiple-choice questions usually generate nominal data. When the choices are numeric alternatives, this response structure may produce at least interval and sometimes ratio data. When the choices represent ordered but unequal numerical ranges or a verbal rating scale, the multiple-choice question generates ordinal data.
Checklist – when multiple responses to a single question are required, the question should be asked in one of three ways: the checklist, rating, or ranking strategy. If relative order is not important, the checklist is the logical choice. Checklists are more efficient than asking for the same information with a series of dichotomous selection questions, one for each individual factor. Checklists generate nominal data.
Rating questions - ask the participant to position each factor on a companion scale, either verbal, numeric, or graphic. Generally, rating-scale structures generate ordinal data; some carefully crafted scales generate interval data. It is important to remember that the researcher should represent only one response dimension in rating-scale response options. Otherwise, the participant is presented with a double-barreled question with insufficient choices to reply to both aspects.
Ranking questions - ideal when relative order of the alternatives is important. The checklist strategy would provide the three factors of influence, but there is no way of knowing the importance the participant places on each factor. Ranking generates ordinal data.
PHASE 3: DRAFTING AND REFINING THE INSTRUMENT
Phase 3 of instrument design – drafting and refinement – is a multistep process:
1) Participant screening and introduction – the introduction must supply the sample unit with the motivation to participate in the study.
It must reveal enough about the forthcoming questions, usually by revealing some or all of the topics to be covered, for participants to judge their interest level and their ability to provide the desired information. In any communication study, the introduction also reveals the amount of time participation is likely to take. The introduction also reveals the researcher organization or sponsor (unless the study is disguised) and possibly the objective of the study. In personal or phone interviews the introduction usually contains one or more screen questions or filter questions to determine if the potential participant has the knowledge or experience necessary to participate in the study.
2) Measurement question sequencing - the design of survey questions is influenced by the need to relate each question to the others in the instrument. Often the content of one question (called a branch question) assumes other questions have been asked and answered.
The basic principle used to guide sequence decisions is this: the nature and needs of the participant must determine the sequence of questions and the organization of the interview schedule. Four guidelines are suggested to implement this principle:
- The question process must quickly awaken interest and motivate the participant to participate in the interview. Put the more interesting topical target questions early. Leave classification questions not used as filters or screens to the end of the survey.
- The participant should not be confronted by early requests for information that might be considered personal or ego-threatening. Put questions that might influence the participant to discontinue or terminate the questioning process near the end. Use buffer questions – neutral questions designed chiefly to establish rapport with the participant.
- The questioning process should begin with simple items and then move to the more complex, as well as move from general items to the more specific. Put taxing and challenging questions later in the questioning process. The procedure of moving from general to more specific questions is sometimes called the funnel approach. The objectives of this procedure are to learn the participant’s frame of reference and to extract the full range of desired information while limiting the distortion effect of earlier questions on later ones.
- Changes in the frame of reference should be small and should be clearly pointed out. Use transition statements between different topics of the target question set.
3) Instructions - to the interviewer or participant attempt to ensure that all participants are treated equally, thus avoiding building error into the results. Two principles form the foundation for good instructions: clarity and courtesy. Instruction topics include those for:
- Terminating an unqualified participant – defining for the interviewer how to terminate an interview when the participant does not correctly answer the screen or filter questions.
- Terminating a discontinued interview – defining for the interviewer how to conclude an interview when the participant decides to discontinue.
- Moving between questions on an instrument – defining for an interviewer or participant how to move between questions or topic sections of an instrument (skip directions) when movement is dependent on the specific answer to a question or when branched questions are used.
- Disposing of a completed questionnaire – defining for an interviewer or participant completing a self-administered instrument how to submit the completed questionnaire.
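Skip directions of the kind described above can be sketched in code. This is a minimal illustration only: the three-question instrument, the question ids, and the helpers `next_question` and `path` are hypothetical and do not come from any survey package.

```python
# Hypothetical instrument: each question states where to go next,
# either depending on the answer (skip direction) or unconditionally.
QUESTIONS = {
    "Q1": {"text": "Do you own a car?", "next": {"yes": "Q2", "no": "Q3"}},
    "Q2": {"text": "How old is your car (years)?", "next": "Q3"},  # branch question
    "Q3": {"text": "How do you usually commute?", "next": None},
}


def next_question(current_id, answer):
    """Return the id of the next question, or None when the instrument ends."""
    rule = QUESTIONS[current_id]["next"]
    if isinstance(rule, dict):      # movement depends on the specific answer
        return rule.get(answer)
    return rule                     # unconditional movement


def path(answers, start="Q1"):
    """Follow the skip directions for a dict of answers; return the route taken."""
    route, current = [], start
    while current is not None:
        route.append(current)
        current = next_question(current, answers.get(current))
    return route
```

A participant who answers "no" to Q1 is routed past the branch question Q2, which assumes car ownership.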
4) Conclusion - its role is to leave the participant with the impression that his or her involvement has been valuable. Subsequent researchers may need this individual to participate in new studies.
OVERCOMING INSTRUMENT PROBLEMS – there is no substitute for a thorough understanding of question wording, question content, and question sequencing issues.
However, the researcher can do several things to help improve survey results, among them:
- Build rapport with the participant – most information can be secured by direct undisguised questioning if rapport has been developed. Rapport is particularly useful in building participant interest in the project, and the more interest participants have, the more cooperation they will give.
- Redesign the questioning process – to improve the quality of answers by modifying the administrative process and the response strategy.
- Explore alternative response strategies – when drafting the original question, try developing positive, negative, and neutral versions of each type of question. This practice helps to select question wording that minimizes bias.
- Use methods other than surveying to secure the data.
- Pre-test all the survey elements – the assessment of questions and instruments before the start of a study.
There are abundant reasons for pretesting individual questions, questionnaires, and interview schedules:
- Discovering ways to increase participant interest;
- Increasing the likelihood that participants will remain engaged to the completion of the survey;
- Discovering question content, wording, and sequencing problems;
- Discovering target question groups where researcher training is needed;
- Exploring ways to improve the overall quality of survey data.
CHAPTER F: LIST OF MAIN CONCEPTS: RESEARCH METHODOLOGY
A population element = the unit of study - the individual participant or object on which the measurement is taken.
A population = the total collection of elements about which some conclusion is to be drawn.
A census = a count of all the elements in a population. The listing of all population elements from which the sample will be drawn is called the sample frame.
Accuracy = the degree to which bias is absent from the sample. When the sample is drawn properly, the measure of behaviour, attitudes or knowledge of some sample elements will be less than the measure of those same variables drawn from the population. Also, the measure of the behaviour, attitudes, or knowledge of other sample elements will be more than the population values. Variations in these sample values offset each other, resulting in a sample value that is close to the population value.
Systematic variance = “the variation in measures due to some known or unknown influences that ‘cause’ the scores to lean in one direction more than another.” Systematic variance may be reduced by, for example, increasing the sample size.
Precision: precision of estimate is the second criterion of a good sample design. In order to interpret the findings of research, a measurement of how closely the sample represents the population is needed.
Sampling error = The numerical descriptors that describe samples may be expected to differ from those that describe populations because of random fluctuations natural to the sampling process.
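The idea that some samples underestimate and others overestimate the population value can be illustrated with a small simulation. This is a sketch under stated assumptions: the population (the integers 1 to 1000), the sample size of 30, and the 200 repeated draws are all made up for the example.

```python
import random
import statistics


def sample_means(population, n, draws, seed=0):
    """Draw `draws` simple random samples of size n and return their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]


population = list(range(1, 1001))      # made-up population values
mu = statistics.mean(population)       # population parameter (500.5)
means = sample_means(population, n=30, draws=200)

below = sum(m < mu for m in means)     # samples that underestimate
above = sum(m > mu for m in means)     # samples that overestimate
print(f"population mean={mu}, below={below}, above={above}")
print(f"average of the sample means={statistics.mean(means):.1f}")
```

Individual sample means differ from the population mean because of random fluctuation, but the deviations fall on both sides and largely offset each other when averaged.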
Representation = The members of a sample are selected using probability or non-probability procedures.
Probability sampling is based on the concept of random selection – a controlled procedure which ensures that each population element is given a known non-zero chance of selection.
Non-probability sampling is arbitrary and subjective; when elements are chosen subjectively, there is usually some pattern or scheme used. Thus, each member of the population does not have a known chance of being included.
Element selection - Whether the elements are selected individually and directly from the population – viewed as a single pool – or additional controls are imposed, element selection may also classify samples.
Probability sampling - is based on the concept of random selection – a controlled procedure that assures that each population element is given a known nonzero chance of selection. Only probability samples provide estimates of precision and offer the opportunity to generalize the findings to the population of interest from the sample population.
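A simple random sample is one way to give every element a known nonzero chance of selection. The sketch below assumes a hypothetical sampling frame of 20 elements; the helper name `simple_random_sample` is invented for the illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; each element has chance n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


frame = [f"element_{i}" for i in range(1, 21)]   # hypothetical sampling frame
sample = simple_random_sample(frame, n=5, seed=42)
inclusion_probability = 5 / len(frame)           # known and nonzero: 0.25
```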
Population parameters = summary descriptors (e.g., incidence proportion, mean, variance) of variables of interest in the population.
Sample statistics = used as estimators of population parameters. The sample statistics are the basis of conclusions about the population. Depending on how measurement questions are phrased, each may collect a different level of data. Each different level of data also generates different sample statistics.
The population proportion of incidence “is equal to the number of elements in the population belonging to the category of interest, divided by the total number of elements in the population.”
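As a worked illustration of this definition, the following sketch computes the proportion for a made-up population of eight households. The function name and the data are assumptions for the example.

```python
def proportion_of_incidence(population, in_category):
    """Elements in the category of interest divided by all elements."""
    if not population:
        raise ValueError("population must not be empty")
    return sum(1 for e in population if in_category(e)) / len(population)


# Hypothetical population of eight households; True means "owns a car".
owns_car = [True, False, True, True, False, False, True, True]
p = proportion_of_incidence(owns_car, lambda e: e)   # 5 of 8 elements
print(p)  # 0.625
```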
The sampling frame = is closely related to the population. It is the list of elements from which the sample is actually drawn. Ideally, it is a complete and correct list of population members only.
Stratified random sampling = the process by which the sample is constrained to include elements from each of the segments (strata) of the population.
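Stratified random sampling can be sketched as follows. The example assumes proportionate allocation (the same sampling fraction in every stratum, with at least one element per stratum), which is only one possible choice; the frame of employees and departments is hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(elements, stratum_of, fraction, seed=None):
    """Draw the same fraction (at least one element) at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for e in elements:
        strata[stratum_of(e)].append(e)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame: (employee id, department) pairs.
frame = [(i, "sales") for i in range(60)] + [(i, "it") for i in range(60, 80)]
sample = stratified_sample(frame, stratum_of=lambda e: e[1], fraction=0.1, seed=1)
```

Because the draw happens within each stratum, both departments are guaranteed to appear in the sample, which a single unrestricted draw would not ensure.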
Cluster sampling = this is where the population is divided into groups of elements with some groups randomly selected for study.
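In contrast to stratified sampling, cluster sampling selects whole groups at random. A minimal sketch, assuming hypothetical city blocks as clusters and studying every household in a chosen block:

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical city blocks (clusters) and the households in each.
blocks = {
    "block_A": ["h1", "h2", "h3"],
    "block_B": ["h4", "h5"],
    "block_C": ["h6", "h7", "h8"],
    "block_D": ["h9", "h10"],
}
selected = cluster_sample(blocks, n_clusters=2, seed=7)
```

Only a list of blocks is needed to draw the sample, not a list of individual households.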
Area sampling = the most important form of cluster sampling. It can be used when research involves populations that can be identified with some geographic area. This method overcomes the problems of both high sampling cost and the unavailability of a practical sampling frame for individual elements.
The theory of clustering = the premise that the means of sample clusters are unbiased estimates of the population mean. This is more often true when clusters are naturally equal, such as households in city blocks. While one can deal with clusters of unequal size, it may be desirable to reduce or counteract the effects of unequal size.
Double sampling = It may be more convenient or economical to collect some information by sample and then use this information as the basis for selecting a subsample for further study. This procedure is called double sampling, sequential sampling, or multiphase sampling. It is usually found with stratified and/or cluster designs.
Convenience = Non-probability samples that are unrestricted are called convenience samples. They are the least reliable design but normally the cheapest and easiest to conduct. Researchers or field workers have the freedom to choose whomever they find.
Purposive sampling = A non-probability sample that conforms to certain criteria is called purposive sampling. There are two major types – judgment sampling and quota sampling:
Judgment sampling occurs when a researcher selects sample members to conform to some criterion. When used in the early stages of an exploratory study, a judgment sample is appropriate. When one wishes to select a biased group for screening purposes, this sampling method is also a good choice.
Quota sampling is the second type of purposive sampling. It is used to improve representativeness. The logic behind quota sampling is that certain relevant characteristics describe the dimensions of the population. If a sample has the same distribution on these characteristics, then it is likely to be representative of the population regarding other variables on which the researcher has no control. In most quota samples, researchers specify more than one control dimension. Each should meet two tests: (1) It should have a distribution in the population that can be estimated, and (2) be pertinent to the topic studied.
Snowball = In the initial stage of snowball sampling, individuals are discovered and may or may not be selected through probability methods. This group is then used to refer the researcher to others who possess similar characteristics and who, in turn, identify others.
The observation approach: involves observing conditions, behaviour, events, people or processes.
The communication approach: involves surveying/interviewing people and recording their response for analysis. Communicating with people about various topics, including participants, attitudes, motivations, intentions and expectations.
Survey: a measurement process used to collect information during a highly structured interview – sometimes with a human interviewer and other times without.
Participant receptiveness = the participant’s willingness to cooperate.
Dealing with non-response errors - By failing to respond or refusing to respond, participants create a non-representative sample for the study overall or for a particular item or question in the study.
In surveys, non-response error occurs when the responses of participants differ in some systematic way from the responses of nonparticipants.
Response errors: occur during the interview (created by either the interviewer or participant) or during the preparation of data for analysis.
Participant-initiated error: when the participant fails to answer fully and accurately – either by choice or because of inaccurate or incomplete knowledge.
Interviewer error: response bias caused by the interviewer.
Response bias = Participants also cause error by responding in such a way as to unconsciously or consciously misrepresent their actual behaviour, attitudes, preferences, motivations, or intentions.
Social desirability bias = Participants create response bias when they modify their responses to be socially acceptable or to save face or reputation with the interviewer.
Acquiescence = the tendency to be agreeable.
Noncontact rate = the ratio of potential but unreached contacts (no answer, busy, answering machine, and disconnects but not refusals) to all potential contacts.
The refusal rate refers to the ratio of contacted participants who decline the interview to all potential contacts.
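Both rates can be computed from a simple call log. The sketch below assumes that both use "all potential contacts" as the denominator; the counts are made up.

```python
def noncontact_rate(unreached, potential_contacts):
    """Unreached contacts (no answer, busy, machine, disconnect) / all potential contacts."""
    return unreached / potential_contacts


def refusal_rate(refusals, potential_contacts):
    """Contacted participants who decline / all potential contacts."""
    return refusals / potential_contacts


# Made-up call log for a telephone survey.
potential = 400
print(noncontact_rate(unreached=120, potential_contacts=potential))  # 0.3
print(refusal_rate(refusals=80, potential_contacts=potential))       # 0.2
```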
Random dialling: requires choosing telephone exchanges or exchange blocks and then generating random numbers within these blocks for calling.
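A rough sketch of the procedure, assuming hypothetical exchange blocks written as "area-exchange" prefixes and four-digit suffixes (real numbering plans differ by country):

```python
import random


def random_digit_dial(exchange_blocks, count, seed=None):
    """Pick a block at random, then a random 4-digit suffix within it."""
    rng = random.Random(seed)
    numbers = []
    for _ in range(count):
        block = rng.choice(exchange_blocks)
        suffix = rng.randrange(10_000)          # 0000-9999
        numbers.append(f"{block}-{suffix:04d}")
    return numbers


# Hypothetical exchange blocks; real ones would come from the study area.
calls = random_digit_dial(["555-201", "555-202"], count=5, seed=3)
```

Because the suffixes are generated rather than taken from a directory, unlisted numbers can also be reached.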
A survey via personal interview is a two-way conversation between a trained interviewer and a participant.
Computer-assisted personal interviewing (CAPI): special scoring devices and visual materials are used.
Intercept interview: targets participants in centralised locations, such as shoppers in retail malls. This reduces the costs associated with travel.
Outsourcing survey services offers special advantages to managers. A professionally trained research staff, centralized location interviewing, focus group facilities and computer assisted facilities are among them.
Causal methods are research methods which answer questions such as “Why do events occur under some conditions and not under others?”
Ex post facto research designs, in which a researcher interviews respondents or observes what is or what has been, have the potential for discovering causality. The distinction is that with ex post facto designs the researcher is required to accept the world as it is found, whereas an experiment allows the researcher to systematically alter the variables of interest and observe what changes follow.
Experiments are studies which involve intervention by the researcher beyond what is required for measurement.
Replication = repeating an experiment with different subject groups and conditions.
Field experiment = a study of the dependent variable in actual environmental conditions.
Hypothesis = a relational statement, as it describes a relationship between two or more variables.
In an experiment, participants experience a manipulation of the independent variable, called the experimental treatment.
The treatment levels of the independent variable are the arbitrary or natural groups the researcher makes within the independent variable of an experiment. The levels assigned to an independent variable should be based on simplicity and common sense.
A control group could provide a base level for comparison. The control group is composed of subjects who are not exposed to the independent variable(s), in contrast to those who receive the experimental treatment. When subjects do not know if they are receiving the experimental treatment, they are said to be blind. When the experimenters do not know if they are giving the treatment to the experimental group or to the control group, the experiment is said to be double blind.
Random assignment to the groups is required to make the groups as comparable as possible with respect to the dependent variable. Randomization does not guarantee that if the groups were pretested they would be pronounced identical; but it is an assurance that those differences remaining are randomly distributed.
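Random assignment can be sketched as a shuffle followed by a split. The subject ids and the equal group sizes are assumptions for the illustration.

```python
import random


def random_assignment(subjects, seed=None):
    """Shuffle the subjects and split them into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"S{i:02d}" for i in range(1, 21)]   # 20 hypothetical subjects
experimental, control = random_assignment(subjects, seed=11)
```

The two groups will generally not be identical, but any remaining differences are left to chance rather than to the researcher's judgment.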
Matching may be used when it is not possible to randomly assign subjects to groups. This employs a non-probability quota sampling approach. The object of matching is to have each experimental and control subject matched on every characteristic used in the research.
Validity = whether a measure accomplishes its claims.
Internal validity = do the conclusions drawn about a demonstrated experimental relationship truly imply cause?
External validity – does an observed causal relationship generalize across persons, settings, and times? Each type of validity has specific threats a researcher should guard against.
Statistical regression = this factor operates especially when groups have been selected by their extreme scores. No matter what is done between O1 and O2, there is a strong tendency for the average of the high scores at O1 to decline at O2 and for the low scores at O1 to increase. This tendency results from imperfect measurement that, in effect, records some persons abnormally high and abnormally low at O1. In the second measurement, members of both groups score more closely to their long-run mean scores.
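The tendency can be reproduced with a small simulation in which nothing at all happens between O1 and O2. All numbers are assumptions for the sketch: normally distributed long-run scores (mean 100, SD 10), measurement error with SD 10, and extreme groups defined as the top and bottom 10% at O1.

```python
import random
import statistics


def simulate_regression(n=2000, seed=0):
    """Return mean (O1, O2) scores for the groups that scored extreme at O1."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]   # long-run means
    o1 = [t + rng.gauss(0, 10) for t in true_scores]       # imperfect measurement
    o2 = [t + rng.gauss(0, 10) for t in true_scores]       # no treatment in between
    order = sorted(range(n), key=lambda i: o1[i])
    low, high = order[: n // 10], order[-(n // 10):]
    mean = statistics.mean
    return {
        "high": (mean(o1[i] for i in high), mean(o2[i] for i in high)),
        "low": (mean(o1[i] for i in low), mean(o2[i] for i in low)),
    }


result = simulate_regression()
```

In `result`, the high group's O2 mean lies below its O1 mean and the low group's O2 mean lies above its O1 mean, even though no treatment was applied.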
Experimental mortality – this occurs when the composition of the study groups changes during the test.
Attrition is especially likely in the experimental group and with each dropout the group changes. Because members of the control group are not affected by the testing situation, they are less likely to withdraw.
Diffusion or imitation of treatment = if the control group learns of the treatment (by talking to people in the experimental group) it eliminates the difference between the groups.
Compensatory equalization = where the experimental treatment is much more desirable, there may be an administrative reluctance to deprive the control group members. Compensatory actions for the control groups may confound the experiment.
Compensatory rivalry = this may occur when members of the control group know they are in the control group. This may generate competitive pressures.
Resentful demoralization of the disadvantaged = when the treatment is desirable and the experiment is obtrusive, control group members may become resentful of their deprivation and lower their cooperation and output.
Reactivity of testing on X – the reactive effect refers to sensitising subjects via a pre-test so that they respond to the experimental stimulus (X) in a different way. This before-measurement effect can be particularly significant in experiments where the IV is a change in attitude.
Interaction of selection and X = the process by which test subjects are selected for an experiment may be a threat to external validity. The population from which one selects subjects may not be the same as the population to which one wishes to generalize results.
Static Group Comparison – the design provides for two groups, one of which receives the experimental stimulus while the other serves as a control.
Pre-test-Post-test Control Group Design – this design consists of adding a control group to the one-group pre-test-post-test design and assigning the subjects to either of the groups by a random procedure (R).
Post-test-Only Control Group Design – The pre-test measurements are omitted in this design. Pre-tests are well established in classical research design but are not really necessary when it is possible to randomize.
Non-equivalent Control Group Design – this is a strong and widely used quasi-experimental design. It differs from the pre-test-post-test control group design in that the test and control groups are not randomly assigned.
There are two varieties.
- Intact equivalent design, in which the membership of the experimental and control groups is naturally assembled. Ideally, the two groups are as alike as possible. This design is especially useful when any type of individual selection process would be reactive.
- The self-selected experimental group design is weaker because volunteers are recruited to form the experimental group, while no volunteer subjects are used for control. This design is likely when subjects believe it would be in their interest to be a subject in an experiment.
Separate Sample Pre-test-Post-test Design = Most applicable when the researcher cannot know when and to whom to introduce the treatment but can decide when and whom to measure. This is a weaker design because several threats to internal validity are not handled adequately.
Measurement in research consists of assigning numbers to empirical events, objects or properties, or activities in compliance with a set of rules.
Mapping rules = a scheme for assigning numbers or symbols to represent aspects of the event being measured
Objects include the concepts of ordinary experience, such as touchable items like furniture. Objects also include things that are not as concrete, e.g. genes, attitudes and peer-group pressures.
Properties are the characteristics of the object. A person’s physical properties may be stated in terms of weight, height.
Psychological properties: include attitudes and intelligence.
Social properties include leadership ability, class affiliation, and status. In a literal sense, researchers do not measure either objects or properties.
Dispersion: describes how scores cluster or scatter in a distribution. Nominal data are statistically weak, but they can still be useful.
Nominal scales = with these scales, a researcher is collecting information on a variable that naturally (or by design) can be grouped into two or more categories that are mutually exclusive and collectively exhaustive.
Ordinal scales = include the characteristics of the nominal scale plus an indicator of order. Ordinal data require conformity to a logical postulate: If a > b and b > c, then a > c.
Interval scales = have the power of nominal and ordinal data plus one additional strength: they incorporate the concept of equality of interval (the scaled distance between 1 and 2 equals the distance between 2 and 3).
Ratio scales = incorporate all of the powers of the previous scales plus the provision for absolute zero or origin. Ratio data represent the actual amounts of a variable. Measures of
Content Validity – of a measuring instrument is the extent to which it provides adequate coverage of the investigative questions guiding the study. If the instrument contains a representative sample of the universe of subject matter of interest, then content validity is good. To evaluate the content validity of an instrument, one must first agree on what elements constitute adequate coverage. A determination of content validity involves judgment.
Criterion-Related Validity – reflects the success of measures used for prediction or estimation. You may want to predict an outcome or estimate the existence of a current behaviour or time perspective.
Construct validity – in attempting to evaluate construct validity, we consider both the theory and the measuring instrument being used. If we were interested in measuring the effect of trust in cross-functional teams, the way in which ‘trust’ was operationally defined would have to correspond to an empirically grounded theory. If a known measure of trust was available, we might correlate the results obtained using this measure with those derived from our new instrument.
Reliability – has to do with the accuracy and precision of a measurement procedure. A measure is reliable to the degree that it supplies consistent results. Reliability is a necessary contributor to validity but is not a sufficient condition for validity.
Stability – a measure is said to possess stability if consistent results with repeated measurements of the same person with the same instrument can be secured.
An observation procedure is stable if it gives the same reading on a particular person when repeated one or more times.
Equivalence – a second perspective on reliability considers how much error may be introduced by different investigators (in observation) or different samples of items being studied (in questioning or scales).
Internal Consistency – a third approach to reliability uses only one administration of an instrument or test to assess the internal consistency or homogeneity among the items.
The split-half technique can be used when the measuring tool has many similar questions or statements to which the participant can respond.
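One common way to carry out a split-half check is to correlate the scores on the odd-numbered items with those on the even-numbered items. The sketch below makes two assumptions that are not stated in these notes: the odd/even split, and the Spearman-Brown correction, a standard adjustment for the halved test length. The responses are made up.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores; apply Spearman-Brown."""
    odd = [sum(r[0::2]) for r in responses]    # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in responses]   # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return r_half, 2 * r_half / (1 + r_half)


# Made-up answers of five participants to six similar 5-point items.
responses = [
    [5, 4, 5, 5, 4, 5],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [4, 5, 4, 4, 4, 5],
    [1, 2, 1, 2, 2, 1],
]
r_half, r_full = split_half_reliability(responses)
```

A high correlation between the halves suggests that the items are homogeneous, i.e. that they measure the same thing.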
Practicality = concerned with a wide range of factors of economy, convenience, and interpretability. The scientific requirements of a project call for the measurement process to be reliable and valid, while the operational requirements call for it to be practical.
Scaling = the ‘procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question.’
Ranking scales constrain the study participant to making comparisons and determining order among two or more properties (or their indicants) or objects.
A choice scale requires that participants choose one alternative over another.
Categorization asks participants to put themselves or property indicants in groups or categories.
Sorting requires that participants sort cards (representing concepts or constructs) into piles using criteria established by the researcher. The cards might contain photos or images or verbal statements of product features.
Nominal scales classify data into categories without indicating order, distance, or unique origin.
Ordinal data show relationships of more than and less than but have no distance or unique origin.
Interval scales have both order and distance but no unique origin.
Ratio scales possess all four properties. The assumptions underlying each level of scale determine how a particular measurement scale’s data will be analysed statistically.
With a uni-dimensional scale, one seeks to measure only one attribute of the participant or object.
Multidimensional scale recognizes that an object might be better described with several dimensions than on a uni-dimensional continuum.
Balanced rating scale has an equal number of categories above and below the midpoint. Generally, rating scales should be balanced, with an equal number of favourable and unfavourable response choices.
Unbalanced rating scale has an unequal number of favourable and unfavourable response choices.
Unforced-choice rating scale provides participants with an opportunity to express no opinion when they are unable to make a choice among the alternatives offered.
Forced-choice scale requires that participants select one of the offered alternatives. Researchers often exclude the response choice ‘no opinion’, ‘don’t know’, or ‘neutral’ when they know that most participants have an attitude on the topic.
Halo effect = the systematic bias that the rater introduces by carrying over a generalized impression of the subject from one rating to another. Halo is especially difficult to avoid when the property being studied is not clearly defined, is not easily observed, is not frequently discussed, involves reactions with others, or is a trait of high moral importance.
Simple category scale (also called a dichotomous scale) offers two mutually exclusive response choices. These may be ‘yes’ and ‘no’, ‘important’ and ‘unimportant’.
When there are multiple options for the rater but only one answer is sought, the multiple-choice, single-response scale is appropriate.
Likert scale is the most frequently used variation of the summated rating scale. Summated rating scales consist of statements that express either a favourable or an unfavourable attitude toward the object of interest. The participant is asked to agree or disagree with each statement. Each response is given a numerical score to reflect its degree of attitudinal favourableness, and the scores may be summed to measure the participant’s overall attitude.
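Summing the scores can be sketched as follows. The example assumes a 5-point agreement scale and reverse-scores unfavourable statements so that a higher total always means a more favourable attitude; the item names and answers are hypothetical.

```python
SCALE_MAX = 5  # assumed 5-point scale: 1 = strongly disagree, 5 = strongly agree


def likert_total(responses, unfavourable_items=()):
    """Sum item scores, reverse-scoring statements worded unfavourably."""
    total = 0
    for item, score in responses.items():
        if not 1 <= score <= SCALE_MAX:
            raise ValueError(f"score for {item} is outside 1..{SCALE_MAX}")
        total += (SCALE_MAX + 1 - score) if item in unfavourable_items else score
    return total


# Hypothetical participant: items s2 and s4 are unfavourable statements.
answers = {"s1": 5, "s2": 1, "s3": 4, "s4": 2}
score = likert_total(answers, unfavourable_items={"s2", "s4"})   # 5 + 5 + 4 + 4
print(score)  # 18
```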
Item analysis assesses each item based on how well it discriminates between those persons whose total score is high and those whose total score is low. The mean scores for the high-score and low-score groups are then tested for statistical significance by computing t values. After finding the t values for each statement, they are rank-ordered, and those statements with the highest t values are selected.
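The item-analysis procedure can be sketched as below. Several details are assumptions for the illustration: the high and low groups are taken as the top and bottom 25% of total scores, the t value is computed in its unequal-variance (Welch) form, and the data are invented.

```python
import statistics


def welch_t(high, low):
    """t value for the difference between two group means (unequal variances)."""
    se = (statistics.variance(high) / len(high)
          + statistics.variance(low) / len(low)) ** 0.5
    return (statistics.mean(high) - statistics.mean(low)) / se


def rank_items(item_scores, totals, share=0.25):
    """Rank items by how well they separate high and low total scorers."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(2, int(len(order) * share))        # at least 2 per group for a variance
    low, high = order[:k], order[-k:]
    t_values = {
        item: welch_t([scores[i] for i in high], [scores[i] for i in low])
        for item, scores in item_scores.items()
    }
    return sorted(t_values.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores of eight participants on two statements.
item_scores = {"A": [1, 2, 2, 3, 3, 4, 4, 5], "B": [3, 2, 3, 3, 2, 3, 4, 3]}
totals = [a + b for a, b in zip(item_scores["A"], item_scores["B"])]
ranked = rank_items(item_scores, totals)
```

Statement A separates the high and low scorers more sharply than statement B, so it appears first in `ranked` and would be the one retained.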
The semantic differential (SD) scale measures the psychological meanings of an attitude object using bipolar adjectives. Researchers use this scale for studies such as brand and institutional image.
Numerical/multiple rating list scales have equal intervals that separate their numeric scale points. The verbal anchors serve as the labels for the extreme points.
Stapel scale = used as an alternative to the semantic differential, especially when it is difficult to find bipolar adjectives that match the investigative question.
Constant-sum scale = a scale on which participants divide a fixed number of points (for example, 100) among several attributes or objects, which helps the researcher discover proportions.
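As a small illustration of how constant-sum answers turn into proportions, the sketch below assumes that each participant divides a fixed total of 100 points among the attributes. The total, the attribute names and the function constant_sum_proportions are assumptions for the example.

```python
def constant_sum_proportions(allocation, total=100):
    """Convert one participant's point allocation into proportions.

    allocation: dict mapping attribute -> points given.
    total: the fixed number of points to be divided (assumed 100).
    """
    if sum(allocation.values()) != total:
        raise ValueError("points must add up to the fixed total")
    return {attr: points / total for attr, points in allocation.items()}


# Hypothetical answer: importance of three attributes of a product.
answer = {"price": 50, "quality": 30, "service": 20}
print(constant_sum_proportions(answer))
```

Checking that the points add up to the fixed total is the one validation step this scale type needs before the proportions can be used.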
Graphic rating scales – the scale was originally created to enable researchers to discern fine differences. Theoretically, an infinite number of ratings are possible if participants are sophisticated enough to differentiate and record them.
Participants are instructed to mark their response at any point along a continuum. Usually, the score is a measure of length (millimetres) from either endpoint. The results are treated as interval data. The difficulty is in coding and analysis. This scale requires more time than scales with predetermined categories.
Ranking scales – the participant directly compares two or more objects and makes choices among them.
Arbitrary scales are designed by collecting several items that are unambiguous and appropriate to a given topic. These scales are not only easy to develop, but also inexpensive and can be designed to be highly specific. Moreover, arbitrary scales provide useful information and are adequate if developed skilfully.
Consensus scaling requires items to be selected by a panel of judges, who then evaluate them on:
- Their relevance to the topic area
- Their potential for ambiguity
- The level of attitude they represent
Scalogram analysis = a procedure for determining whether a set of items forms a uni-dimensional scale.
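One way to make this concrete is a coefficient of reproducibility: the share of responses that match the ideal cumulative pattern. The sketch below assumes agree/disagree items coded 1/0 and counts errors against the ideal pattern implied by each participant's total score. This is one common counting convention, not the only one. The function name and the data are hypothetical.

```python
def reproducibility(responses):
    """Share of responses consistent with a cumulative (uni-dimensional) scale.

    responses: list of rows, one per participant, each a list of 0/1 answers.
    """
    n_items = len(responses[0])
    # Order item indices from most to least endorsed.
    order = sorted(range(n_items),
                   key=lambda j: -sum(row[j] for row in responses))
    errors = 0
    for row in responses:
        score = sum(row)
        for rank, j in enumerate(order):
            # Ideal pattern: agree with the `score` most endorsed items only.
            ideal = 1 if rank < score else 0
            if row[j] != ideal:
                errors += 1
    return 1 - errors / (len(responses) * n_items)


# Hypothetical data: four consistent participants and one deviant pattern.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],  # agrees only with the least endorsed item
]
print(round(reproducibility(answers), 3))
```

A value of 1.0 means every response pattern fits the cumulative ordering. The deviant last row in the example lowers the coefficient.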
Factor scales include a variety of techniques that have been developed to address two problems:
- How to deal with a universe of content that is multidimensional
- How to uncover underlying dimensions that have not been identified by exploratory research
A disguised question = designed to conceal the question’s true purpose.
Administrative questions – identify the participant, interviewer, interview location, and conditions. These questions are rarely asked of the participant but are necessary for studying patterns within the data and identifying possible error sources.
Classification questions – usually cover sociological-demographic variables that allow participants’ answers to be grouped so that patterns are revealed and can be studied.
Target questions (structured or unstructured) – address the investigative questions of a specific study. These are grouped by topic in the survey. Target questions may be structured (they present the participants with a fixed set of choices, often called closed questions) or unstructured (they do not limit responses but do provide a frame of reference for participants’ answers; sometimes referred to as open-ended questions).
Response strategy - a third major area in question design is the degree and form of structure imposed on the participant.
The various response strategies offer options that include unstructured response (or open-ended response, the free choice of words) and structured response (or closed response, specified alternatives provided).
Free-response questions - also known as open-ended questions, ask the participant a question and either the interviewer pauses for the answer (which is unaided) or the participant records his or her ideas in his or her own words in the space provided on a questionnaire.
Dichotomous question - suggests opposing responses (yes/no) and generates nominal data.
Checklist – when multiple responses to a single question are required, the question should be asked in one of three ways: the checklist, rating, or ranking strategy. If relative order is not important, the checklist is the logical choice. Checklists are more efficient than asking for the same information with a series of dichotomous selection questions, one for each individual factor. Checklists generate nominal data.
Source
- This summary is based on the 2013-2014 academic year.