- How do you design a study that uses SPSS? - Chapter 1
- How do you make a codebook for SPSS? - Chapter 2
- How do you start with IBM SPSS? - Chapter 3
- How do you create a file and enter your data in SPSS? - Chapter 4
- How do you screen and clean up data in SPSS? - Chapter 5
- How do you use SPSS for descriptive statistics? - Chapter 6
- Which graphs can you use to display data? - Chapter 7
- How do you manipulate data in SPSS? - Chapter 8
- How do you check the reliability of a scale? - Chapter 9
- Which method to use in SPSS? - Chapter 10
- When and how is a correlation analysis applied? - Chapter 11
- What is the difference between correlation and partial correlation? - Chapter 12
- How do you perform multiple regression in SPSS? - Chapter 13
- How do you perform a logistic regression analysis in SPSS? - Chapter 14
- How do you perform factor analysis in SPSS? - Chapter 15
- How do you use SPSS for non-parametric statistics? - Chapter 16
- Which t-tests can be used in SPSS? - Chapter 17
- How do you use one-way ANOVA in SPSS? - Chapter 18
- How do you use two-way ANOVA in SPSS? - Chapter 19
- SPSS Survival Manual by Pallant - 6th edition - BulletPoints
How do you design a study that uses SPSS? - Chapter 1
Introduction: What is SPSS?
SPSS is a statistical computer program used by scientists to collect, analyze and process data. It is mainly used to analyze research data. The abbreviation SPSS stands for Statistical Package for the Social Sciences. The program is therefore used particularly in the social sciences.
Three important situations in which you can use SPSS:
Checking the reliability of a sample. If you want to conduct research in the Netherlands, it is of course not feasible to test every resident. For this reason, we almost always use a sample (a selection of people from the population). It is important that this sample is as representative as possible of the entire population, so that the results can be generalized (after all, you want to say something about the entire population and not just about the sample).
Checking the reliability of your results. SPSS can provide information about whether the relationship you have found (for example, men vote for the SGP (a Dutch political party) more often than women) is due to chance or whether the difference is related to something else.
Visualize data. It can be useful to visualize your results through graphs and tables. You can do this by using SPSS.
General information related to research
SPSS can answer research questions by running analyses (tests). An example of a research question is: do men choose a technical profession more often than women? The first step is to collect your data (what percentage of men choose a technical profession and what percentage of women). In SPSS, this information is called data. When you have collected your data (for example by means of questionnaires) you can enter this data in SPSS. You can then have SPSS perform a test that examines whether there is actually a difference between the data of men and women.
How to plan the set up of a study?
A good research study depends heavily on detailed planning. The book gives the following tips when starting a study:
Choose the design of your research (for example, experiment, questionnaire, observational). Weigh all the pros and cons of each method.
If you opt for an experiment: decide whether you opt for a between-groups design (different test subjects in each experimental condition) or a repeated measures design (all test subjects in all conditions).
If you choose an experiment: make sure you have enough levels in your independent variable.
Always select more test subjects than necessary (given the high risk of dropping out).
If possible, randomly assign test subjects to each experimental condition. It is important that these groups do not differ in other matters (check this with a covariance analysis).
Choose reliable and valid dependent variables.
Anticipate possible confounding variables. These are variables other than the independent variable that can provide an alternative explanation for your result. If possible, control for these confounding variables.
If you choose a questionnaire study (survey), check in advance whether the instructions, questions and scales are clear. You do this through pilot testing.
How to choose the appropriate scales and methods?
When choosing the right scale and method, two concepts are important: reliability and validity. Both terms can influence the quality of your data.
Reliability
The reliability of a scale indicates to what extent the scale is free from random error. There are two types of reliability:
Test-retest reliability: this is measured by administering the relevant scale to the same people on two different occasions and then calculating the correlation between these two scores. The higher this correlation, the greater the test-retest reliability.
Internal consistency: the extent to which the items of a scale are interrelated. This can, for example, be calculated with Cronbach's coefficient alpha in SPSS. A Cronbach's alpha of .7 or greater indicates a reliable scale.
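For reference, this check can also be run as SPSS syntax (syntax is covered in later chapters). A minimal sketch, assuming four hypothetical scale items named item1 to item4:

RELIABILITY
/VARIABLES=item1 item2 item3 item4
/SCALE('Example scale') ALL
/MODEL=ALPHA.

The output then reports Cronbach's alpha for the four items together.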
Validity
The validity of a scale refers to the extent to which the methods measure what they are intended to measure. There are different forms of validity:
Content validity: the degree of accuracy with which the method or scale covers the intended domain or content.
Criterion Validity: the relationship between different scales and a specified measurement criterion.
Construct validity: the relationship with other constructs, both related constructs (convergent validity) and unrelated constructs (discriminant validity).
How to prepare a questionnaire?
When drawing up a questionnaire, it is important to keep in mind which statistical methods you need to analyze the data. Depending on the statistical technique you have to ask a specific question in a specific way.
Types of questions
Many questions can be classified into two groups: open questions and closed questions. A closed question gives respondents multiple answer options. Closed questions can be quickly converted into a numerical format in SPSS. For example, the answer 'yes' can be coded with the number 1 and the answer 'no' with the number 2. The answers to open questions can be divided into different categories, for example work or relationships. A combination of open and closed questions often works best in a study.
Format of answers
It is important to choose the right scale when drawing up answer formats. For example, if you want to calculate a correlation, you need to know the exact ages. In addition, it is often useful to use a Likert-type scale. People do not simply answer whether or not they agree with the question, but to what extent (for example, on a scale of 1 to 6).
How do you make a codebook for SPSS? - Chapter 2
How to prepare SPSS data?
Before you can enter all information from questionnaires and experiments in IBM SPSS it is necessary to make a codebook. This is a summary of the instructions that you will use to convert the information of each test subject into a format that IBM SPSS can understand. Preparing a codebook consists of (1) defining and labeling each variable, and (2) assigning numbers to all possible answers.
A codebook basically consists of four columns:
- The abbreviated name of the variable (for example 'ID' for 'identification number')
- The written name of the variable (for example 'identification number')
- An explanation of how the possible answers are coded (for example 1 = men, 2 = women)
- The measurement scale (for example nominal)
What is a variable?
A variable is an element that can assume a certain value. It is an element that you would like to measure and analyze. Examples of a variable are 'gender', 'age', 'education level' and 'IQ'. With SPSS you can investigate whether your variables are interdependent (for example education level and IQ) or whether a certain variable predicts another variable (for example: do men achieve higher IQ scores than women?).
The dependent variable
The dependent variable is the variable about which you make a prediction or the outcome of your measurement. An example is intelligence. You can then investigate which factors (independent variables) influence intelligence (the dependent variable). The outcome of the dependent variable therefore depends on other variables (hence the name).
The independent variable
The independent variable is a factor for which you will measure whether it causes a change in the dependent variable. For example, if one wants to do a research on the influence of drinking alcohol on exam results, the independent variable is the amount of alcohol and the dependent variable is the exam result.
What are measurement scales?
It is important to know the measurement level of your variable in order to choose the right statistical test (the method with which you want to investigate your research question). There are roughly four measurement scales: nominal, ordinal, interval and ratio. These scales are discussed below.
Discrete variables (nominal and ordinal)
A discrete variable can only assume a few fixed values. This includes the nominal scale and the ordinal scale. First, the nominal scale is a qualitative measurement scale with separate categories, for example gender (male/female). Second, measurements at ordinal level have a natural order. The order is clear, but the differences cannot be interpreted. An example is the selective Dutch secondary educational levels (i.e., VMBO-HAVO-VWO). The differences between these educational levels are not all the same. That is, the difference between VMBO and HAVO is not equal to the difference between HAVO and VWO.
Continuous variables (interval and ratio)
A continuous variable is a variable that can be measured in numbers, with the intervening values having meaning. This includes the interval scale and the ratio scale. With an interval scale, the differences between scores as opposed to an ordinal scale are the same. The difference between 10 and 11 on a test is just as large as the difference between 50 and 51. However, an interval scale does not have an absolute zero. That is why you cannot say how much higher a value is. A good example of this is the Fahrenheit scale: 30 degrees is not twice as hot as 15 degrees. A ratio scale has the same characteristics as an interval scale, but a ratio scale does have an absolute zero. After all, 50 centimeters is twice as long as 25 centimeters.
Categorical and dichotomous variables
A categorical variable is a variable that does not assume numbers, but is subdivided into categories. The most commonly used example is male / female.
A dichotomous variable is a variable that only has two options, such as right / wrong.
Which rules should be met for naming a variable?
Every question or item in your questionnaire must be given a unique variable name. There are a number of rules that a variable name must meet:
- Each variable name must be different and therefore unique.
- Each variable name must start with a letter (not with a number).
- A variable name cannot contain a symbol (for example ! or ?) or a space.
- A variable name cannot be a word that is used by IBM SPSS as a command (for example, all, ne, eq).
- A variable name cannot contain more than 64 characters.
How to code the response?
Each result is given a numerical code, for example 1 for women and 2 for men.
With open questions you make an inventory of the most common answers. For example with the question 'What makes you experience stress?' you can divide the answers into work = 1, relation = 2, etc. It is also useful to create a residual category for other answers (other = 99).
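Such a coding scheme can also be documented directly in the data file with value labels. A minimal syntax sketch, assuming a hypothetical variable named stress_source:

VALUE LABELS stress_source
1 'work'
2 'relation'
99 'other'.

This attaches the codebook labels to the numeric codes, so that output tables show the category names instead of bare numbers.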
How do you start with IBM SPSS? - Chapter 3
How to open IBM SPSS?
There are different ways to start IBM SPSS.
The simplest way is to click on the SPSS icon on your desktop. Place your cursor on the icon and click twice.
You can also open IBM SPSS by clicking Start, placing your cursor on All Programs, and then going to the list of all available programs. See if you can find a folder here called IBM SPSS Statistics, in this case IBM SPSS Statistics 24.
IBM SPSS will also start up if you double-click an IBM SPSS data file in Windows Explorer.
How to open an existing SPSS file?
If you want to open an existing SPSS data file, click on File in the IBM SPSS menu and then choose Open and Data. The Open file section allows you to search for the desired file. You can also always open a data file from the hard drive of your computer. If you have a data file on a USB stick, first copy it to your computer. You can then open the file by clicking the icon twice. The file will then open in the Data Editor.
How to work with SPSS files?
Save a data file
It is important to always save your data when you are working with it. Saving does not happen automatically in IBM SPSS. To save a file, go to the File menu, then choose Save. You can also click on the icon that looks like a floppy disk. You can see this at the top left of your screen. Always ensure that your file is saved on your computer and not on an external drive. When you save the file for the first time, you must create a name for the file and choose a folder where you want to save the file. IBM SPSS automatically ensures that your file is saved with .sav at the end.
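Saving can also be done with a single syntax command. A minimal sketch; the path and file name here are examples only:

SAVE OUTFILE='C:\data\survey.sav'.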
Open another data file
If you are working on a data file and you want to open a new file, click on File and then Open and Data. Find the folder where your file is stored. Click on the desired file and then click on the Open button. The second file will then be opened in a new screen.
Create a new data file
To create a new data file, click on File and then on New and Data. You can then define your variables and enter new data.
How to deal with different screens?
IBM SPSS is a program that consists of different screens or 'windows'. To open these screens you first have to open an existing data file or create your own data file. To open an existing dataset, click on 'File' in the menu and then on 'Open'. Then choose 'Data' and select your dataset. The most important screens in SPSS are the 'Data Editor', the 'Viewer', the 'Pivot Table Editor', the 'Chart Editor' and the 'Syntax Editor'.
The Data Editor
The Data Editor displays the content of your data file. In this screen you can create and/or save datasets, make changes to existing data and perform statistical analyses.
The Viewer Editor
When you perform analyses, the Viewer Editor (your output) starts automatically. This screen consists of two parts. On the left is a navigation pane showing all the analyses that you have performed. On the right side you can see the results of your analyses, for example tables and graphs.
When you save the output of IBM SPSS, this is done in a separate file ending in .spv. Data files always end with .sav. To save the results of your analyses it is important to have the Viewer screen open. Click on File and then on Save. Choose the folder where you want to save the output and create a new name. Then click Save.
It is important to know that an output file can only be opened in IBM SPSS. If you send your file to someone else who does not have the IBM SPSS program, he or she cannot open your file. To remedy this, you can export your output. Select File and then Export. You can now choose the type, for example PDF or Word. Then choose the Browse button to create a folder in which you want to save the file and choose a suitable name in the Save File line. Then click Save and OK.
You can use the navigation pane (left in the screen) to print certain sections of your output. Highlight the sections you want to print: click on the first section, then hold down the Ctrl key while clicking on any other sections. Then click on the File menu and on Print.
The Pivot Editor
You can adjust the tables that you can see in the Viewer screen (the output). This is possible through the Pivot Editor. To adjust a table, select the desired table and click twice on the table. You can then use the Pivot Editor to, for example, change the size, font or dimensions of the columns.
The Chart Editor screen
When you ask SPSS to make a graph, it first appears in the Viewer screen. If you want to adjust the graph, you must activate the Chart Editor screen. You do this by selecting the relevant graph (double click).
The Syntax Editor screen
In the Syntax Editor screen you can see the commands that SPSS uses to perform certain analyses. If you want to perform an analysis again, you can indicate this in your Syntax screen. You then select the desired command and then click on Run. If you would like the command of your analysis to appear in the Syntax screen, click on Paste instead of OK.
What are dialog boxes?
When you select a menu option, further information is often requested. This is done in a dialog box. For example, a dialog box appears when you use the Frequencies analysis.
To select the variable on which you want to perform the analysis, select the variable and then press the arrow key (arrow pointing to the right). If you want to select multiple variables, click them while holding down the Ctrl key. To find the right variables easily, select one of the variables and right-click. Then choose Sort Alphabetically. Now the variables are sorted alphabetically and you can easily find the desired variables. To remove a selected variable from the selection, select the variable in the dialog box and click the arrow key (arrow pointing to the left).
You will often find the same buttons in the dialog box.
OK: Click this button when you have selected your variables and you are ready to perform the analysis.
Paste: This button ensures that your analysis is transported to the Syntax Editor. This can be useful when you want to execute a certain command multiple times.
Reset: This button is used to clear the dialog.
Cancel: If you click on this button, all commands you have given for the technique or procedure will be deleted.
Help: If you click on this button, additional information about the technique or procedure you want to perform will appear.
How to close IBM SPSS?
If you want to close IBM SPSS, click on File and then on Exit. IBM SPSS will then show you a reminder to save your file before closing the program. It is important to save both your data file and your output.
How do you create a file and enter your data in SPSS? - Chapter 4
How to change the options?
You can use Options to set all kinds of preferences: how variables are displayed, the type of tables you want as output, and more. Options can be found under Edit. Make sure you first select what you want in all tabs, and then click OK.
General tab
Here you can choose to display variables alphabetically or in the order in which they appear in the file; the latter usually fits research best. To do this, click on File under Variable Lists. For a clear display of numbers, check No scientific notation for small numbers in tables under Output.
Data tab
Here you choose how data is displayed.
Output tab
This allows you to customize the name of variables and labels.
Pivot tables tab
Here you can choose the design of tables.
How to define the variables?
The Data Editor (the main SPSS screen) is subdivided into two tabs: Data View and Variable View (these tabs can be found at the bottom left of the screen). Before you can enter data, variables must first be created. In the Variable View tab you can define your variables. You then enter all your data in the Data View tab. When you have performed an analysis, the output screen appears.
Variable View
You can create the variables in this tab. Each row represents a variable. You can enter information about the variable in each column.
Name: The name of the variable
Type: The type of data. Often these are just numbers (numeric variables), but dates or letters are also possible, for example. If you want to change the type, select the cell and press the blue square with dots. You can then choose the type of variable in a new screen (for example numeric, dollar, or date).
Width: How many positions are available
Decimals: Number of decimals
Labels: Text with which you can explain the name of the variable
Values: Enter the values of the labels here. An example may be that your variable is gender and the code is 0 for men and 1 for women. To enter the values, select the cell and click on the blue square with the dots. Then you write '0' for value and 'man' for the label.
Missing: Here you can specify a value that you used to indicate 'no answer'. Here too, select the blue square to enter the values.
Columns: Width of the column in data view.
Align: Alignment
Measure: At which level the data was measured: nominal, ordinal or scale.
Role: The role that the variable plays in your data set. You can select here whether it is a dependent variable ('target') or an independent variable ('input').
There are four steps in determining variables:
- Create variables
- Assign labels to the answer categories and the missing values
- Enter data
- Clean up data
How to enter variables and data?
There are two ways to create a new variable. In the first way, a new variable is created by entering new data. In the second way, a variable is created that is based on existing data in the data set. For example, two variables are then combined to create a new, third variable.
Method 1: New variable, enter data manually.
Click on Variable View at the bottom left of the screen.
Then type the name of your variable in the first row. At Label you can indicate what exactly the variable measures. At Values you can indicate what each answer means. You do not have to enter this for open questions!
Under Measure, check that the correct measurement level is selected for the variable.
Method 2: New variable based on existing variables
Example in SPSS: averaging different variables.
Click on Transform → Compute Variable
Enter the name for the new variable under Target Variable.
You can then click on Statistical under Function group. All kinds of available operations appear under Functions and Special Variables. For example, if you click on Mean, you can combine the average of a few variables into a new variable. MEAN(?,?) appears in the Numeric Expression box.
Now you can drag the variables from the list on the left to the Numeric Expression block, so that the question marks are replaced by the names of the variables.
If you then click on OK, you can find the newly created variable with the associated values in the dataset.
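If you click on Paste instead of OK, SPSS records the calculation as syntax. A sketch of what this looks like, assuming three hypothetical items op1, op2 and op3:

COMPUTE mean_score = MEAN(op1, op2, op3).
EXECUTE.

Note that MEAN() averages the valid (non-missing) values; writing (op1 + op2 + op3)/3 instead would return a missing result as soon as one item is missing.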
How to edit existing data?
Remove a test subject from the data
You can adjust the entered data in the Data Editor. To delete a test subject, select the row of the test subject in question and press delete (on the keyboard). You can also use the menu: Edit → Clear.
Add a test subject among the other test subjects
Move your cursor to a cell in the row directly below the row where you want to add a new test subject. Click on Edit and then choose Insert Cases. An empty row will then appear where you can enter new data.
Delete a variable
Position your cursor on the variable name at the top of the column that you want to delete. Click once to select the entire column. Then press delete on your keyboard. You can also click on Edit in the menu and then on Clear.
Add a variable between another variable name
Position your cursor in a cell in the column to the right of the variable next to which you want to place a new variable. Click on the Edit menu and choose Insert variable. An empty column will appear in which you can enter data from the new variable.
Move a variable
Left click on the variable you want to move, hold and drag the variable to the new location.
How to insert Excel data?
It is also possible to import data from an existing Excel file. For example, you can prepare your data in Excel and then load it into IBM SPSS. To do this, complete the following steps. Open IBM SPSS. Then click on File, Open, Data. In the Files of type section you choose Excel. Excel files always end with .xls or .xlsx. Find the file of your choice. Click on the file so that it appears in the File name. Then click on the Open button. A screen called Opening Excel Data Source will open. Make sure Read variable names from the first row of data is checked. Then click OK. You can then save the new file as an IBM SPSS file. Choose File, then Save as. Type a new name. Note that the Save as type is set to SPSS Statistics (*.sav). Then click Save. In the Data Editor, Variable View, you will now have to add extra information regarding Labels, Values and Measure. You will also probably have to change the width of the columns.
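The same import can be expressed in syntax. A sketch, assuming a hypothetical file path and a worksheet named 'Sheet1':

GET DATA
/TYPE=XLSX
/FILE='C:\data\mydata.xlsx'
/SHEET=NAME 'Sheet1'
/READNAMES=ON.

For older .xls files, use /TYPE=XLS instead.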
What else can you do with the data?
Split data
Sometimes it is useful to create different groups within your data to compare these groups. This way you can, for example, compare the data of men and women with each other. To be able to do this, you must split your data file in SPSS. You then ensure that, for example, all men are in one group (group 1) and all women in another (group 2).
Procedure
Now follow the procedure for splitting your data file.
Go to Data and choose Split File.
Click on Compare groups and specify your group variable (in this case gender). Click OK.
You now see in your data file (Data View) that all test subjects are sorted by gender. First you see all men, then all women.
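The pasted syntax for this procedure looks roughly as follows:

SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.

When you are done with the group comparisons, run SPLIT FILE OFF. to return to the unsplit file.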
Select data
For some analyses you only need a part of your sample, for example only the men. You must then select this group in SPSS. You do this by using the Select Cases option. When you have selected the group of men, all women are temporarily filtered out in SPSS. All analyses that you subsequently do will only be done for the men.
Procedure
Now follow the procedure for selecting a part of your sample (in this case men).
Go to Data and choose Select Cases.
Click on If condition is satisfied.
Click the If button.
Choose the variable on which you want to select your group (in this case gender).
Click on the arrow button to move the variable to the expression box. Click the = key on the on-screen keypad.
Type in the value that corresponds to the value for men in your codebook. Look for this in your Variable View.
Click Continue and OK.
In the Data View you can now see that all women (gender = 1) have been filtered out: their row numbers are crossed out. Only the men (gender = 0) are selected.
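Clicking Paste instead of OK produces syntax along these lines (assuming, as above, that men are coded 0):

USE ALL.
COMPUTE filter_$ = (gender = 0).
FILTER BY filter_$.
EXECUTE.

Running FILTER OFF. afterwards restores the full sample.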
How to merge files?
Sometimes it is necessary to merge data files. If the files have the same variables and use the same variable names, you can merge the files by adding the data. However, it may be necessary to add new variables first.
Merge files by adding data
The following is the procedure for merging files by adding data.
Open the file that you want to add.
Go to Data and choose Merge Files and then Add Cases.
Click in the dialog on An external SPSS data file.
Click Continue, OK and File, save as to give the file its own name.
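A minimal syntax sketch of this merge; FILE=* refers to the currently open data file and the second path is an example:

ADD FILES /FILE=*
/FILE='C:\data\second_file.sav'.
EXECUTE.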
Merge files by adding variables
The following is the procedure for merging files by adding variables.
Sort both files in ascending order by ID: go to Data, click Sort Cases, select ID and then click OK.
Go to Data, click on Merge Files and then Add Variables.
Click in the dialog on An external SPSS data file.
Check in the Excluded variables box if you see the added variables. Make sure that each variable has a unique name, so that two different variables do not have the same name.
Click on the variable that you want to add, and then on the box Match cases on key variables. Move the variable to the Key variables box, and click OK.
Save the merged file under a new name with File, save as.
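A minimal syntax sketch, assuming both files contain the key variable ID and have been sorted by it:

SORT CASES BY ID.
MATCH FILES /FILE=*
/FILE='C:\data\extra_variables.sav'
/BY ID.
EXECUTE.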
How do you screen and clean up data in SPSS? - Chapter 5
Typing errors
It is always very important to run through your data to check for typing errors, for example. You can of course check all entered data again against the original data, but this takes a lot of time. An easier way is to request Frequencies. You do this by following these steps: Analyze → Descriptive Statistics → Frequencies.
Screening and cleaning up the data
Before you can analyze your data it is important to check your data file for possible errors. First, it is important to see if you have made typos (see above). In addition, it is essential to investigate whether there are other errors in your data. For this you follow these steps:
Step 1: Checking for errors. First it is necessary to check all scores of all variables. You then investigate whether there are certain scores that fall outside the normal range.
Step 2: Finding and correcting errors in the data file. It is then necessary to find out where the error is in the data file. This error must then be corrected or removed.
How to check for errors?
When you check your file for errors, you particularly check whether there are values that fall outside the normal range of possible scores. For example: when the variable 'gender' is coded with 0 or 1 (where 0 = male and 1 = female), it is not possible to find scores other than 0 or 1. Scores with any other number (for example 2 or 3) should therefore be removed or adjusted. There are different ways to find errors with IBM SPSS. These can roughly be divided into two methods: one for errors in categorical variables and one for errors in continuous variables.
Checking categorical variables
Use the following procedure to check for errors in categorical variables.
Click on Analyze and then on Descriptive Statistics and then on Frequencies.
Choose the variables that you want to check (for example, gender). To find a variable easily, you can sort your variable list by alphabet.
Click on the arrow button (pointing to the right) to move the desired variables to the variables window.
Then click Statistics. Check Minimum and Maximum in the Dispersion section.
Then click Continue and then OK (or Paste to save everything in the Syntax Editor).
The syntax is generated as follows:
FREQUENCIES VARIABLES = gender
/ STATISTICS = MINIMUM MAXIMUM
/ ORDER = ANALYSIS.
In this example you see that there is one error in the data file. There is one test subject whose gender is coded with the number 2 (instead of 0 or 1). Check whether this test subject is male or female and then correct the data of this test subject.
It can also happen that a test subject has forgotten to enter data for the relevant variable. You can find this in the table under Missing.
In this example it can be seen, for example, that the data for variable gender is missing in one test subject. Find this test subject and see if you can correct the data (see below).
How to find and correct errors in the data file?
What to do when you have found responses that fall outside the normal range? Then it is important to trace these test subjects. You can do this by taking the following steps:
Click on Data and then choose Sort Cases.
In the dialog you then choose the variable for which you knew that there was an error (in this case, 'gender'). Click on the arrow button (pointing to the right) and move the variable to the Sort By window. Then choose ascending (from low to high) or descending (from high to low). In our example we would like to find the test subject who had the answer option '2' at gender. In this case we opt for descending.
Then click OK.
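The equivalent syntax is a single command; (D) sorts descending and (A) ascending:

SORT CASES BY gender (D).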
Checking continuous variables
Use the following procedure to check for errors in continuous variables.
Click on Analyze and then on Descriptive Statistics and then on Descriptives.
Choose the variables that you want to check (for example, age). Click on the arrow button (pointing to the right) to move the desired variables to the variables window.
Click on Options. You can choose what you want to show: average, standard deviation, or minimum and maximum.
Then click Continue and then OK (or Paste to save everything in the Syntax Editor).
The syntax is generated as follows:
DESCRIPTIVES
VARIABLES = age
/ STATISTICS = MEAN STDDEV MIN MAX
View whether the minimum and maximum are logical, for example an age of 2 to 82. Also check whether the average is logical, or whether there are certain data that make the average deviate considerably.
What are case summaries?
Summarize Cases gives you a table with specific information for each test subject. You follow the following steps to obtain this summary:
Click on Analyze, go to Reports and then choose Case Summaries.
Choose the variables that you are interested in (in this case gender, province and age).
Click Statistics and remove Number of Case from the Cell Statistics window. Then click Continue.
Click on Options and remove Subheadings for totals.
Click Continue and then OK (or Paste if you want to save the analysis in the Syntax Editor).
The syntax is generated as follows:
SUMMARIZE
/ TABLES = gender province age
/ FORMAT = VALIDLIST NOCASENUM NOTOTAL LIMIT = 5
/ TITLE = 'Case Summaries'
/ MISSING = VARIABLE
/ CELLS = NONE.
In the example a summary is given of only the first five test subjects. You can set this under Display Cases by entering the number at Limit cases to first (in this case 5).
How do you use SPSS for descriptive statistics? - Chapter 6
When you are sure that there are no errors in your data file, you can start with the descriptive phase of your data analysis. This is called descriptive statistics. Its purposes are:
Describe the characteristics of your sample in the method section of your article
Checking your variables to investigate whether you meet certain assumptions associated with the statistical techniques you want to implement to answer your research questions
Addressing specific research questions
When it comes to research with human subjects, it is almost always necessary to collect general characteristics. Consider the number of people in the sample, the number or percentage of men and women, the ages, and the level of education.
Examples of descriptive statistics are the average, the standard deviation and the distribution of the scores.
Procedure for creating a codebook
If you only want a quick summary of the characteristics of your variables in your data file, you probably need a codebook. The following is the procedure for obtaining a codebook.
Click Analyze and go to Reports and choose Codebook.
Select the variables you want (for example, gender, age) and drag these variables to the Codebook Variables window.
Click on the Output sheet and uncheck all Options except Label, Value Labels and Missing Values.
Click on Statistics and make sure that all options in both sections are checked.
Click OK (or Paste to save everything in the Syntax Editor).
The syntax is then as follows:
DATASET ACTIVATE DataSet1.
CODEBOOK gender [n] age [s]
/ VARINFO LABEL VALUELABELS MISSING
/ OPTIONS VARORDER = VARLIST SORT = ASCENDING MAXCATS = 200
/ STATISTICS COUNT PERCENT MEAN STDDEV QUARTILES.
This output gives you a quick summary of the test subjects in your data file. If you want more detailed information, you can get it through Frequencies, Descriptives or Explore. You can use Frequencies to obtain information about categorical variables.
What is the procedure for obtaining descriptive statistics for categorical variables?
To get descriptive statistics of categorical variables you use the Frequencies function. You can find this by following these steps:
Go to Analyze and then to Descriptive Statistics and then to Frequencies.
Then choose the categorical variables that you are interested in. Move this to the variable box.
Then click OK (or Paste if you want to save it in the Syntax Editor).
The syntax associated with this procedure is:
FREQUENCIES
VARIABLES = gender
/ ORDER = ANALYSIS
What is the procedure for obtaining descriptive statistics for continuous variables?
For continuous variables (for example age) it is easier to use Descriptives. This analysis provides the basic 'summary' statistics such as the mean and the standard deviation. You can find the median and the confidence interval through Explore.
The procedure associated with obtaining descriptive statistics for continuous variables is:
Click on Analyze then select Descriptive Statistics and then Descriptives.
Click on all continuous variables for which you would like to obtain descriptive statistics. Then click on the arrow button (pointing to the right) to move these variables to the Variables section.
Click on Options. Make sure the following statistics are checked: mean, standard deviation, minimum, maximum and then also click on skewness and kurtosis.
Click Continue and then OK (or Paste to save the analysis in the Syntax Editor).
The syntax generated with this procedure is:
DESCRIPTIVES
VARIABLES = age
/ STATISTICS = MEAN STDDEV MIN MAX KURTOSIS SKEWNESS
The Skewness function provides information about the symmetry of the distribution of the scores. Kurtosis provides information about the peakedness of the distribution. If the distribution of the scores were perfectly normal, both the skewness and the kurtosis would be zero. A positive skewness value indicates that the scores are clustered on the left (at the low values); a negative value suggests that the scores are clustered on the right side of the mean. A positive kurtosis indicates a peaked distribution, while a kurtosis below zero indicates a relatively flat distribution (too many test subjects in the extreme scores).
How to discover missing data?
When conducting research, particularly with human subjects, you rarely get all the information from every case. That is why it is important to also look at the missing data. This is possible in SPSS using the Missing Value Analysis procedure (bottom option in the Analyze menu). You must also decide how to deal with missing data when performing statistical analyses. The Options button in many of the statistical procedures in SPSS offers various choices for dealing with missing data. It is important that you choose carefully, since it can have major consequences for your results. The different options for dealing with missing data are:
The Exclude cases listwise option includes all cases in the analyzes, provided there is no missing data. A case involving missing data is completely excluded from the analysis.
The Exclude cases pairwise option (sometimes also referred to as Exclude cases analysis by analysis) only excludes cases if the data required for a specific analysis is missing. They are included in an analysis for which they contain the required information.
The Replace with mean option calculates the average value for the variable and gives this value to every missing case. This option should never be used because it can seriously disrupt the results of your analysis.
It is strongly advised to use the Exclude cases pairwise option, unless there is a very urgent reason to do otherwise.
How to measure normality?
The following is the procedure for measuring normality through Explore.
Choose Analyze and select Descriptive statistics and then Explore.
Click on the variables in which you are interested. Click on the arrow button (pointing to the right) and drag these variables to the Dependent list.
Move your independent variable to the Label Cases by box.
In the Display section: make sure Both is selected.
Click on Statistics and click on Descriptives and Outliers. Then click Continue.
Then click on Plots and under Descriptives on: Histogram. Then uncheck Stem-and-leaf. Click on Normality plots with tests and then click on Continue.
Click on Options. In the Missing Values section you click on Exclude cases pairwise. Then click Continue and OK (or Paste to save the analysis in the Syntax Editor).
The syntax is generated as follows:
EXAMINE VARIABLES = age
/ ID = gender
/ PLOT BOXPLOT HISTOGRAM NPPLOT
/ COMPARE GROUPS
/ STATISTICS DESCRIPTIVES
/ CINTERVAL 95
/ MISSING PAIRWISE
/ NOTOTAL.
Interpretation of the output of normality
A lot of output comes from measuring normality. You can interpret the output as follows.
Trimmed mean
This function removes the top 5% and the bottom 5% of the data and calculates a new mean on which the strongly deviating data have less influence. If you compare this new mean with the original mean, you can see how much influence the most deviating data have. You can find the most deviating data under Extreme Values.
Skewness and kurtosis
The Skewness function provides information about the symmetry of the distribution of the scores. Kurtosis provides information about the peak of distribution. The skewness and kurtosis functions together provide information about the distribution of scores across the different groups.
Kolmogorov-Smirnov
The Kolmogorov-Smirnov test can be used to investigate whether the results are normally distributed. You perform this test with Explore. You then follow these steps: Analyze → Descriptive Statistics → Explore. Then choose your dependent variable. Then go to Plots. At Boxplots, check None. Then check Normality plots with tests. Under Descriptive, uncheck Stem-and-leaf and check Histogram. Click on Continue.
You then look at the Tests of Normality table in your output. A non-significant result (p > .05) indicates a normal distribution. A significant p-value means that the assumption of normality is violated; this is often the case with large samples.
Histograms
The form of the distribution per group can be seen with Histograms. This allows you to see if there is a normal distribution.
Box plot
The Boxplot represents 50% of the cases with a rectangle. The lines outside represent the smallest and largest value. Sometimes circles are displayed in a boxplot, these are the outliers.
How to check whether there are outliers?
Outliers are test subjects who have extremely high or extremely low values in comparison with the majority of the data set. Various techniques are possible to check for outliers, for example by means of a histogram, boxplot or the information in the descriptives table. When you have found outliers, you can create new variables that do not contain outliers.
You can first view the variable separately by means of Analyze → Descriptive Statistics → Frequencies.
You can now use the recode function if, for example, the data can only have the values 1 to 10 but also contains 100. If this is the case, you can create a new variable by means of 'Recode into Different Variables'. To inspect the outliers: click on Analyze → Descriptive Statistics → Explore
Click on the variable that you are interested in.
Then click on Statistics and click on Outliers → Continue → OK
Which graphs can you use to display data? - Chapter 7
In SPSS there are different types of graphs and charts that you can use to display data. The views discussed here are histograms, bar charts, line charts, scatter charts, and boxplots.
In the Graph menu in SPSS there are various options for creating graphs; the easiest method is to use the Chart Builder.
How to create a histogram?
You use a histogram in the case of a single continuous variable. You create a histogram as follows:
Select the Chart Builder in the Graph menu and click OK.
Select the Histogram option under Gallery.
Drag the Simple Histogram option to the Chart Preview location.
Choose your variables in the Variables list and drag it to Chart Preview, to the X-Axis so that the variable is projected on the x-axis.
You can create a histogram per group. Under Groups/Point ID, choose Columns panel variable (for graphs side by side) or Rows panel variable (for graphs below each other).
Drag the categorical variable for the entire group (for example, age) to Panel (at the Chart Preview spot).
Click OK, or Paste to save everything in the Syntax Editor.
A histogram shows the distribution of the scores as vertical bars.
How to make a bar chart?
You use a bar chart in case of continuous variables for different categories, or if you want to show the number of cases of a certain category. For a bar chart you need a categorical variable and a continuous variable. You make a bar chart as follows:
Select the Chart Builder in the Graph menu and click OK.
Under Gallery, select the Clustered Bar option and drag it to the Chart Preview.
Under Element Properties, check Display error bars and click Apply.
Drag the categorical variable for a group (for example, age) to Cluster on X: set color (at the Chart Preview location).
Drag the other categorical variable (for example, hair color) onto the X-Axis so that the variable is projected on the x-axis.
Drag the continuous variable (for example, weight loss) to the Y-Axis so that the variable is projected on the y-axis.
Click OK, or Paste to save everything in the Syntax Editor.
A bar chart displays a predetermined categorical variable on the x-axis and a continuous variable on the y-axis. The second categorical variable is displayed as differently colored, clustered bars.
How to make a line graph?
You use a line graph for the average of a continuous variable at different values of a categorical variable (for example trimester 1, trimester 2, trimester 3). You can also display the results of a one-way or two-way ANOVA with a line graph. Make a line graph as follows:
Select the Chart Builder in the Graph menu and click OK.
Under Gallery, select the Multiple Line option and drag it to the Chart Preview.
Drag the continuous variable (for example, weight loss) to the Y-Axis so that the variable is projected on the y-axis.
Drag one of the categorical variables (for example, age) to Set Color and the other categorical variable (for example, hair color) to X-Axis.
Click OK, or Paste to save everything in the Syntax Editor.
With a line graph you show the progress of the categorical variables on the x-axis in the form of lines. The continuous variable is displayed on the y-axis.
How to make a scatter diagram?
You use a scatter plot in the case of a relationship between two continuous variables. A scatter diagram provides the following information:
Whether the variables have a linear or curvilinear relationship.
Whether the variables have a positive or negative relationship.
How strong the connection is.
You make a scatter diagram as follows:
Select the Chart Builder in the Graph menu and click OK.
Select the Scatter / Dot option under Gallery. Select Grouped Scatter and drag it to the Chart Preview.
Drag the continuous independent variable (for example, weight loss) to the X-Axis so that the variable is projected on the x-axis.
Drag the dependent variable (for example, cholesterol level) to the Y-Axis so that the variable is projected on the y-axis.
You can display groups by dragging each categorical group variable (for example, age) to Set Color.
Click OK, or Paste to save everything in the Syntax Editor.
A scatter diagram displays each case as a dot in the graph.
You can create a scatter diagram for two variables, or a matrix of scatter diagrams for a whole group of variables. You create a matrix of multiple scatter diagrams within a chart with the Scatterplot Matrix option in Gallery.
How to make a boxplot?
You use a boxplot to compare the distributions of scores. One possibility is to look at the distribution of a continuous variable for the entire sample; another possibility is to break the scores down into different groups. You make a boxplot as follows:
Select the Chart Builder in the Graph menu and click OK.
Under Gallery, select the Simple Boxplot option and drag it to the Chart Preview.
Choose your categorical variables (for example, age) in the Variables list and drag them to Chart Preview, to the X-Axis so that the variable is projected on the x-axis.
Under Groups / Point ID, select the Point ID label option.
Click OK, or Paste to save everything in the Syntax Editor.
A boxplot shows the categorical variable on the x-axis, with for each group a box with lines (called whiskers) extending from it.
A boxplot provides the following information:
- The boxplot shows the distribution of the continuous variable and the influence that the categorical variable has.
- The box shows 50% of the cases.
- The horizontal line inside the box shows the median.
- The whiskers show the largest and smallest values.
- The outliers are shown in circles outside the whiskers.
- Extreme outliers, more than three times the length of the box outside the box, are shown with an asterisk (*).
- The boxplot shows variety within a group and the differences between groups.
How to adjust a graph or chart?
With the Chart Editor you can adjust graphs and charts. With this you can adjust the following:
The wording of labels
The position and the starting point of the axes
The design of text, lines, colors, patterns etc.
How to import charts and diagrams to Word or other word processors?
You can import the created charts and diagrams into Microsoft Word. In other word processors there is sometimes an option to import from SPSS; the procedure then works about the same. You place charts and diagrams in Word via the following procedure:
Open the file in which you want to display the graph in Word. Then switch to IBM SPSS via the IBM SPSS icon on the taskbar.
Open the Viewer screen in SPSS.
Click on the graph, a border appears around it.
Click Edit and then Copy so that the graph is copied for pasting.
Go to the Word document and click where you want the graph, click Paste.
Save the file.
How do you manipulate data in SPSS? - Chapter 8
If the raw data has been accurately entered into SPSS, the next step is to edit and prepare the data so that later analyzes can be performed and hypotheses can be tested.
Make sure that you also adjust the codebook for everything you adjust. An alternative is to use the Syntax option, this means that you keep track of all actions to be performed in the Syntax Editor, so that there is a list of what has been adjusted.
How to calculate the size of the scale?
There are two steps to calculate the total size of the scales:
Step 1: Reverse code negatively articulated items positively
Step 2: Add all the results
Step 1: Reverse code negatively articulated items positively
Questions that are negatively worded (for example 'I am usually bad at statistics') must be reverse-coded so that all items are given the same kind of interpretation. If, for example, a Likert scale is used where 1 means absolutely disagree and 5 absolutely agree, a score of 1 on a negatively worded item must become 5, a 2 must become 4, and so on. You can apply this automatically in SPSS with the following procedure:
Click on Transform and select Recode into different variables.
Then select the data that you want to recode and move it to Input Variable - Output Variable.
Click per variable on the variable and type a new name in Output Variable with Change.
Now recode the values. Type 1 in Old Value and 5 in New Value and click Add. Repeat this for all values (2 becomes 4, 3 stays 3, 4 becomes 2, 5 becomes 1).
After this you can click on Continue and then OK (or Paste to save everything in the Syntax Editor).
Check in Variable View whether the variables can all get the same kind of interpretation.
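Pasting this procedure produces syntax along these lines, shown here for three hypothetical negatively worded items on a 5-point scale:

RECODE op3 op5 op7 (1=5) (2=4) (3=3) (4=2) (5=1) INTO Rop3 Rop5 Rop7.
EXECUTE.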
Step 2: Add all the results
Use the following procedure to add the results to calculate the scale:
Click on Transform and then on Compute Variable.
Type a name for the total scale results in Target Variable. Make sure that you do not use a name that was previously used for another variable, because then you delete the previous results.
Go to Label via Type and Label, enter a description of the scale (for example weight gain) and click Continue.
Click on the first item on the variable list on the left. Move this to the Numeric Expression box.
Click on + in the calculator.
Repeat this until all possible scale results are in the box. Start with the results that are not reversed (for example op4, op6) and then continue with the reversed results (Rop3, Rop5, Rop7).
The numeric expression then becomes: op1 + op4 + op6 + Rop3 + Rop5 + Rop7.
Click OK (or Paste to paste it first into the Syntax Editor and then Run).
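The pasted syntax for this example would look roughly like this (the name total_scale is a placeholder):

COMPUTE total_scale = op1 + op4 + op6 + Rop3 + Rop5 + Rop7.
EXECUTE.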
Check the whole to make sure it logically matches the results of your research. Use Descriptives to check if there are no extreme values in the output. Also compare the average with the results from other studies and with your expectations. Check the distribution with skewness and kurtosis. By creating a histogram, you can immediately see if the results are normally distributed.
How to divide a continuous variable into groups?
With the following procedure you can divide a continuous variable (such as weight) into equal groups (for example 0 to 50 kilos, 51 to 100 kilos, and 101 to 150 kilos).
Click on Transform and then on Visual Binning.
Move the continuous variable to Variables to Bin and click Continue.
A histogram appears in Visual Binning.
Type the name for the new categorical variable that you are creating in Binned Variable, for example Weightgp for the weight groups.
Click on Make Cutpoints and then on Equal Percentiles Based on Scanned Cases. In Number of Cutpoints, enter a number that is 1 less than the number of groups you want to create. See if the percentages that appear in Width are correct. For three groups, this is 33.33% per group. Click Apply.
Click on Make Labels.
Click OK (or Paste to paste it first into the Syntax Editor and then Run).
How to divide a categorical variable into categories?
In some studies, it is more appropriate to divide the results into categories, for example, if only a few members of the population with a certain abnormal characteristic produce very different results. This may also be necessary with logistic regression. You can use the following procedure for this:
Click on Transform and select Recode into different variables.
Select the variable that you want to recode and type a new name for it in Name. In Label you can optionally enter an extended name. Then click on Change.
Click on Old and New Values .
Rename each Old Value to a New Value. You can put values in the same category by giving them the same value, for example as follows: 1 remains 1, 2 becomes 1, 3 becomes 2, 4 becomes 3, 5 becomes 4, 6 becomes 5 and so on.
After this you can click on Continue and then OK (or Paste to save everything in the Syntax Editor first).
Go to Data Editor and choose Variable View. Write appropriate labels for the new values.
Check with Frequencies if everything is still correct.
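A sketch of the corresponding syntax, assuming a hypothetical six-category variable educ that is collapsed as in the example above:

* Merge categories 1 and 2 of educ into a new variable educrec.
RECODE educ (1=1) (2=1) (3=2) (4=3) (5=4) (6=5) INTO educrec.
EXECUTE.
* Remember to add value labels for educrec in Variable View afterwards.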
How to convert text to numeric values?
Converting text to numeric values is especially important when using databases such as Microsoft Access. The procedure is:
Click on Transform and select Automatic Recode.
Select the variable that is expressed in text and move it to the Variable -> New Name box.
Type the new name that you want to use in New name and click Add New Name.
Click OK.
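The equivalent syntax is a single command; the variable names citystr and citynum are hypothetical:

* Convert a string variable into a numeric variable, keeping the old strings as value labels.
AUTORECODE VARIABLES=citystr /INTO citynum /PRINT.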
How to deal with dates and time periods?
With the Date and Time Wizard you can make clear periods between measurements, for example how many hours there are between two dates. This is possible with the following procedure:
Click on Transform and select Date and Time Wizard.
Click on Calculate with dates and times and on Next.
Select Calculate the number of time units between two dates and click Next.
Move the first date to Date1.
Move the second date to Date2 (the calculation is Date1 minus Date2).
Select the time unit in Unit and click Next.
In Result Variable, type a name for the variable (for example, NumberDaysNotStudied).
If required, paste the operation into the Syntax Editor first, or have it executed immediately, and click Finish.
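Behind the scenes the wizard builds a COMPUTE command with the DATEDIFF function. A minimal sketch, assuming two hypothetical date variables date1 and date2:

* Number of days between two dates (date2 minus date1).
COMPUTE NumberDaysNotStudied = DATEDIFF(date2, date1, "days").
EXECUTE.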
How to transform variables?
Transforming variables is a possibility that is often useful, for example if the results do not form a nice normal distribution. An alternative is to use non-parametric techniques, but it is easier to transform the variables. There is controversy about transforming, so think carefully about how you want to adjust the results. The procedure for transforming is as follows:
Click on Transform and select Compute Variable.
In Target Variable, type a new name for the variable.
Choose the correct operation in Functions. Which transformation is appropriate depends on the shape of the distribution:
Square root: for a distribution that rises steeply at the start of the x-axis and then declines only gradually. Formula: new variable = SQRT(old variable).
Logarithm: for a distribution with a steep peak at the start of the x-axis followed by a steep fall. Formula: new variable = LG10(old variable).
Inverse: for a distribution that starts high on the y-axis, then falls first steeply and then gradually. Formula: new variable = 1/(old variable).
Reflect and square root: for the mirror image of the square root pattern, where the scores only start after a flat stretch on the x-axis. Formula: new variable = SQRT(K - old variable), where K = the highest possible value + 1.
Reflect and logarithm: for the mirror image of the logarithm pattern, which also starts after a flat stretch on the x-axis. Formula: new variable = LG10(K - old variable), where K = the highest possible value + 1.
Reflect and inverse: for the mirror image of the inverse pattern, starting after a flat stretch on the x-axis with first a gradual and then a steep rise. Formula: new variable = 1/(K - old variable), where K = the highest possible value + 1.
View the final formula in Numeric Expression and write it down in your codebook next to the new variable name.
Click on Type and Label and write a short description of the new variable under Label. Ensure that the new variable has a unique, previously unused name.
Click OK (or Paste to paste it first into the Syntax Editor and then Run).
Check in Analyze, Frequencies whether the skewness and kurtosis have improved.
In Frequencies, click Charts and select Histogram to check whether the distribution has improved.
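The formulas above translate directly into COMPUTE commands. A sketch, assuming a hypothetical variable oldvar with possible scores 1 to 10 (so K = 11); note that SQRT and LG10 require positive values:

* Possible transformations of oldvar; keep only the one that fits your distribution.
COMPUTE sq_var = SQRT(oldvar).
COMPUTE lg_var = LG10(oldvar).
COMPUTE inv_var = 1/oldvar.
COMPUTE rsq_var = SQRT(11 - oldvar).
COMPUTE rlg_var = LG10(11 - oldvar).
COMPUTE rinv_var = 1/(11 - oldvar).
EXECUTE.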
How do you check the reliability of a scale? - Chapter 9
The value of a study largely depends on the reliability of the scale used. One aspect of reliability is internal consistency: the degree to which the items of a scale are associated with each other. This can, for example, be calculated with Cronbach's coefficient alpha in SPSS. A Cronbach's alpha of .7 or greater indicates a reliable scale. However, short scales with few items often yield low Cronbach values, which are then not very informative.
How to check the reliability of a scale?
The procedure for checking the reliability of a scale is as follows:
Check whether all negatively worded items have already been reverse-coded.
Click on Analyze, select Scale and then Reliability Analysis.
Move all parts of the scale to Items.
Select the Alpha option under Model.
Type the name of the scale in the Scale label.
Click on Statistics. In Descriptives for, select Item, Scale and Scale if item deleted. In Inter-Item, select Correlations. In Summaries, also select Correlations.
After this you can click on Continue and then OK (or Paste to save everything in the Syntax Editor first).
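Pasting this procedure produces syntax along these lines; the five item names and the scale label 'Optimism' are hypothetical:

RELIABILITY
  /VARIABLES=op1 op2 op3 op4 op5
  /SCALE('Optimism') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE CORR
  /SUMMARY=TOTAL CORR.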
What conclusions can you draw about reliability based on the output?
In the output you must check the following things to assess reliability:
Check the number of cases and the number of items.
Double-check that there are no negative values in the Inter-Item Correlation Matrix.
Check whether the Cronbach's alpha value is above .7.
In Corrected Item-Total Correlation, check the correlation between each item and the total score. Consider removing items with exceptionally low correlations (lower than .3).
Check the impact of each item in Alpha if Item Deleted. If removing an item would produce an alpha higher than the final alpha value, you can consider deleting that item.
Check the average inter-item correlation in Summary Item Statistics. Strong mutual coherence indicates high reliability. However, in many studies, certainly with few items, this coherence is not very strong.
How do you display reliability information?
An open and transparent study usually provides information about the reliability of the scales, in most cases in the chapter or section about the methods. State the internal consistency, describe the scale and provide a summary of the reliability information. This information helps readers to value the results of the sample and to interpret them better.
Which method to use in SPSS? - Chapter 10
Which statistical methods are there?
Some studies use a single method, but many studies use multiple methods. In any case, it is crucial to choose the right research method.
Below you will not yet find out how exactly you apply the research methods. The overview below is intended to give a brief introduction to research methods, so that you can make a choice based on which method you need.
What methods are there to investigate relationships between variables?
If you want to investigate the relationships between different variables, for example between age and drug use, different methods are possible. These methods are also useful for processing the results of most types of surveys.
Introduction correlation
A correlation analysis is used to describe the strength and direction of a linear relationship between two variables. There are various statistics available in IBM SPSS to measure a correlation, including the Pearson product-moment correlation coefficient (r) and the Spearman Rank Order Correlation (rho). Pearson r is used with variables at interval level, while Spearman rho is used with variables at ordinal level. A correlation indicates the extent to which two variables are related, for example being a man and wearing pink clothing.
Positive and negative correlations
Correlations are also often used to describe data and to check the data for assumptions. The correlation coefficient can be both negative and positive and always lies between -1 and 1. A correlation of -1 is a perfect negative correlation: as one variable goes up, the other goes down. Think of wearing a bikini and wearing gloves. A correlation of 1 is a perfect positive correlation: the two variables rise and fall together. For example: wearing a bikini and eating an ice cream. A correlation of 0 indicates that there is no linear relationship between the two variables.
Example of a research question with correlations
The following is an example of a research question about a correlational relationship.
Research question: Is there a connection between the amount of exam stress and the amount of alcohol consumption of students? Do people with more exam stress drink more alcohol or less alcohol?
What you need: two variables, both continuous, or one continuous and one dichotomous (two values).
What it does: correlation describes the relationship between two continuous variables, in terms of both the strength and the direction of the relationship.
Non-parametric alternative: Spearman Rank Order Correlation (rho).
Introduction of partial correlation
This form of correlation builds on the Pearson correlation. Partial correlation allows you to control for the effects of a confounding variable. For example, if a variable such as socially desirable responding influences your research results, you can remove its effect.
Introduction of multiple regression
The multiple regression analysis examines whether there is a (predictive) relationship based on the correlation of multiple independent variables with the dependent variable. The multiple regression analysis uses continuous or ordinal data, but can also include one or more categorical variables as independent variables.
In principle, factorial ANOVA and multiple regression can calculate the same. Factorial ANOVA is used more often in practice for experimental research and the multiple regression usually for non-experimental research.
There are three types of multiple regression: standard, hierarchical and stepwise (step-by-step).
Introduction to factor analysis
With factor analysis you can reduce a large number of variables or scale items to a manageable number of factors. Factor analysis allows you to search for patterns in the correlations and to find groups of similar items. This method is used to expose an underlying structure, develop scales and determine units of measurement.
What methods are there to investigate differences between groups?
If you want to investigate whether there is a significant difference between multiple groups, there are several methods that you can use. The parametric versions of these methods are only suitable if the data are normally distributed and measured at interval level. In the other cases there are non-parametric alternatives.
Introduction of t-tests
You use t-tests with two different groups or two different data sets and you want to compare the average score of a continuous variable. There are different types of t-tests. The two most common are the independent-samples t-test and the paired-samples t-test. The independent t-test is used when you want to compare the average scores of two different groups. You use the paired t-test when you want to compare the average scores of the same group of people at different times or when you have matched pairs.
The non-parametric alternatives to t-tests are the Mann-Whitney U Test and the Wilcoxon Signed Rank Test.
Introduction one-way ANOVA
ANOVA is the abbreviation for Analysis of Variance. A one-way analysis of variance has one independent variable (called the factor) that has different levels. These levels correspond to different groups or conditions. An example is the influence of the form of therapy on the degree of depression. The form of therapy (psychotherapy, pharmacotherapy, no therapy) is the independent variable, consisting of three levels. The dependent variable here is the degree of depression.
The one-way analysis of variance is called this because it compares the variance (variability in scores) between the different groups with the variance within each group (due to chance). The one-way analysis of variance then calculates an F ratio: the variance between the groups divided by the variance within the groups. A large F ratio indicates more variability between the groups (caused by the independent variable) than within the groups (the error). A significant F-test suggests that there is a difference between the groups. However, it does not tell us exactly what this difference is. To investigate this, a post-hoc test is required; with a post-hoc test you investigate which groups differ significantly from each other.
There are two types of one-way analysis of variance: repeated measures ANOVA (the same people measured at multiple points in time) and between-groups ANOVA (results for two or more different groups of people). The latter type is applied to independent samples.
The non-parametric alternatives to one-way ANOVA are the Kruskal-Wallis Test and the Friedman Test.
Introduction two-way ANOVA
With two-way analysis of variance (two-way ANOVA) you can examine the effects of two independent variables on a dependent variable, including whether the two independent variables interact.
There are two types of two-way analysis of variance: repeated measures ANOVA (the same people measured at multiple points in time) and between-groups ANOVA (results for two or more different groups of people). In some studies these methods are combined; this is called a Mixed Design or Split Plot design.
Introduction MANOVA
MANOVA is the abbreviation for Multivariate Analysis of Variance. With a MANOVA, unlike the other analyses, not one but several dependent variables are tested at once. A MANOVA compares groups and tells you whether there are differences between the groups with regard to the combination of dependent variables.
ANCOVA introduction
ANCOVA is the abbreviation for Analysis of Covariance. With an ANCOVA you can compare a variable in two or more groups and see whether other variables influence this relationship. These other variables are called covariates. The ANCOVA actually combines the ANOVA analysis and the regression analysis. With ANCOVA you can test whether the population means of the dependent variable are the same across all levels of the categorical independent variable, while controlling for the effects of other continuous variables. You can use ANCOVA if you want to remove the effects of a certain variable.
How do you decide which method to use?
The following step-by-step plan helps determine which method you use in SPSS:
Decide which questions you want to answer. Formulate the questions as specifically as possible.
Determine which survey components and scales you need.
Determine which type of variables are required (dependent / independent, categorical / ordinal / continuous).
Make a diagram per research question to visualize for yourself which results you want.
Decide whether you can use a parametric method or need a non-parametric alternative. Ask yourself whether there is a normal distribution, and whether the other assumptions for specific parametric methods are met.
Make the final decision as to which method you will use. Use the overview of supplies below.
What is needed for the most commonly used methods in SPSS?
The following overview shows, for each commonly used method in SPSS, which type of research question the method is usually used for, an example question, and what is required in terms of variables.
Chi square for independent variables
Research type: investigate relationships between variables
Example question: What is the relationship between the number of statistics subjects and failure rates within the psychology study?
Requirements: a categorically independent variable and a categorically dependent variable
Correlation
Research type: investigate relationships between variables
Example question: Is there a connection between age and empathy? Do people become more empathetic as they get older?
Requirements: two continuous variables
Partial correlation
Research type: investigate relationships between variables
Example question: If the effects are corrected for socially desirable answers, is there still a connection between empathy and having a large circle of friends?
Requirements: three continuous variables (one of which is socially desirable answers)
Multiple regression
Research type: investigate relationships between variables
Example question: How much variance in being social can be explained by: empathy, self-confidence and dominance? Which of these variables has the greatest influence on how social someone is?
Requirements: a continuously dependent variable and at least two continuous independent variables
Independent t-test
Research type: investigate differences between groups
Example question: Do men have a cold more often than women?
Requirements: a categorically independent variable with only two groups, and a continuously dependent variable
Paired t-test
Research type: investigate differences between groups
Example question: Does ten-week karate training help to reduce depression? Is there a difference between time recording 1 (before training) and time recording 2 (after training)?
Requirements: a categorically independent variable (time frame 1 and time frame 2) and a continuously dependent variable
One-way ANOVA between groups
Research type: investigate differences between groups
Example question: Is there a difference in empathy in people under 20, between 21 and 40, and 41 years and older?
Requirements: a categorically independent variable and a continuously dependent variable
Two-way ANOVA between groups
Research type: investigate differences between groups
Example question: What is the effect of age on empathy in men and women?
Requirements: two categorically independent variables and a continuously dependent variable
Mixed ANOVA
Research type: investigate differences between groups
Example question: What is more effective in stimulating a passion for statistics (a university study versus easier-to-use software in the future), measured at three moments (before the study, after the study, and ten years later when software is further developed)?
Requirements: an inter-group independent variable, an intra-group independent variable (the time recordings) and a continuously dependent variable (the passion for statistics)
MANOVA
Research type: investigate differences between groups
Example question: Do women have characteristics that make them experience more effects of falling in love than men (measured in terms of optimism, compliance and serotonin levels)?
Requirements: a categorically independent variable and at least two continuously dependent variables
ANCOVA
Research type: investigate differences between groups
Example question: Is there a significant difference in the results of a statistics exam between people who followed a university course and people who used further-developed future software (corrected for the results beforehand)?
Requirements: a categorically independent variable, a continuously dependent variable (the outcomes on time recording 2) and at least one continuous covariate (time recording 1).
When and how is a correlation analysis applied? - Chapter 11
Correlation analysis is applied to indicate the strength and direction of a linear relationship between two variables. Two correlation coefficients are mentioned in this chapter: (1) Pearson r for continuous variables (at interval level) and for cases with one continuous and one dichotomous variable, and (2) Spearman rho for variables at ordinal level and for cases where your data do not meet the criteria for the Pearson correlation. This text shows how to calculate a bivariate Pearson r and a non-parametric Spearman rho.
Which preparatory analyzes must be done?
Before you perform a correlation analysis, it is useful to first generate a scatter plot; on the basis of this you can see whether the assumption of linearity and homoscedasticity has been met. In addition, a scatter plot gives you a clearer picture of the nature of the relationship between your variables.
Procedure for generating a scatter plot
Click on Graphs in the menu at the top of the screen and then on Legacy Dialogs.
Click on Scatter/Dot and choose Simple Scatter. Now click on Define.
Click on the first variable (usually the dependent variable) and move it to the y-axis box.
Click on the second variable (usually the independent variable) and move it to the x-axis box.
You can put your ID variable in the Label Cases by box, so that outliers can be identified.
Click OK (or Paste to save the syntax to the Syntax Editor).
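The pasted syntax for a simple scatter plot is short; here with two hypothetical variables, age on the x-axis and stress on the y-axis:

GRAPH
  /SCATTERPLOT(BIVAR)=age WITH stress
  /MISSING=LISTWISE.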
Interpretation of the scatter plot output
The scatter plot can be used to check for a number of aspects of the distribution of two variables:
Check for outliers, or extreme data values that deviate from the cluster of data values. Try to find out why these are outliers (has the data been entered correctly?). When you have identified an outlier and want to retrieve the ID number, you can use the Data Label Mode icon in the Chart Editor. Double-click the chart to activate the Chart Editor. Then click on the icon that looks like a bullseye (or click on Data Label Mode in the Elements menu) and click on the point in the chart that you want to identify; a number will appear, which is the ID number.
Inspection of the distribution of data scores.
Determine the direction of the relationship (positive or negative) between the variables.
After you have examined the distribution of scores in the scatter plot and determined that there is a roughly linear relationship, you can start calculating the Pearson r or Spearman rho correlation coefficient. Before you begin the following procedure, click on Edit in the menu, select Options and then General, and make sure that the box No scientific notation for small numbers in tables is checked in the Output section.
Procedure for calculating Pearson r or Spearman rho
Click on Analyze in the menu at the top of the screen and then select Correlate. Then click on Bivariate .
Select your two variables and move them to the Variables box.
In the Correlation Coefficient section, the Pearson box is the standard option. If you want to calculate Spearman rho, check the Spearman box.
Click on Options. For missing values, click on Exclude Cases Pairwise. Under Options you can also request means and standard deviations.
Click Continue and then OK (or Paste to save the syntax editor).
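Pasting the two variants gives roughly the following syntax; the variable names stress and optim are hypothetical:

* Pearson r with means and standard deviations.
CORRELATIONS
  /VARIABLES=stress optim
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES
  /MISSING=PAIRWISE.
* Spearman rho as the non-parametric alternative.
NONPAR CORR
  /VARIABLES=stress optim
  /PRINT=SPEARMAN TWOTAIL NOSIG
  /MISSING=PAIRWISE.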
How to interpret the correlation output?
The results of Pearson r can be found in the top table (correlations) and those of Spearman rho in the bottom table (nonparametric correlations). You interpret the output of both tests in the same way.
Step 1: Check the sample information (N); is this number correct? If a lot of data is missing, find out why. For example, have you forgotten to check the Exclude cases pairwise box?
Step 2: Determine the direction of the relationship; is there a positive or negative correlation?
Step 3: Determine the strength of the relationship; you can see this from the value of the correlation coefficient. A correlation of 0 means that there is no correlation. A value of -1 means a perfect negative correlation and a value of +1 indicates a perfect positive correlation. To interpret the values in between, it is best to use Cohen's guidelines:
Small: r = .10 to .29 (or -.10 to -.29)
Average: r = .30 to .49 (or -.30 to -.49)
Large: r = .50 to 1.0 (or -.50 to -1.0)
Step 4: Calculate the determination coefficient. This gives you an idea of the shared variance of your two variables. You calculate the determination coefficient by squaring the r value. If you want to convert this to the percentage of shared variance, you only have to multiply the determination coefficient by 100.
Step 5: Determine the significance level ( Sig. 2 tailed ). The statistical significance level gives an indication of the extent to which we can rely on the results obtained.
How are correlation results displayed?
If you mention the correlation between two variables, you can do this in a running text (see p. 140 for an example). However, correlation is often used to investigate the relationship between groups of variables (instead of just two variables). In this case it is inconvenient to report this in a running text; in this case it is best to put the results in a table.
How do you calculate the correlation coefficients between groups of variables?
If you want to study the relationships between multiple variables, you can place all variables in the Variables box. However, this can result in a huge correlation matrix that is difficult to read and interpret. If you are only looking for a number of correlations, you can use the Syntax Editor .
Procedure for obtaining correlation coefficients between two groups of variables
Click on Analyze in the menu at the top of the screen and then select Correlate. Then click on Bivariate.
Move the variables that you are interested in to the Variables box. Select the first group of variables, followed by the second group of variables. In the output, the first group of variables will be presented as rows in the table, and the second group of variables as columns. So first place the variables with longer names, so that the table does not become too wide.
Click on Paste ; this opens the Syntax Editor.
Place your cursor between the first and second group of variables. Type the word with here.
To activate this new syntax, you must select the text from CORRELATIONS to the end.
Then click on the green arrow / triangle (>) or go to the menu and click on Run and then on Selection .
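The edited syntax then looks roughly like this, with two hypothetical groups of variables; the variables before with become the rows of the table and those after it the columns:

CORRELATIONS
  /VARIABLES=stress anxiety depression with optim selfest
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.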
How to compare the correlation coefficients of two groups?
You can also find out the strength of the correlation between two separate groups.
Procedure for comparing correlation coefficients of two groups
Step 1: split the sample.
In the menu at the top of the screen, click Data and then Split File.
Click on Compare Groups.
Move the grouping variable to the Groups Based on box. Click OK (or Paste to save the syntax to the Syntax Editor).
Step 2: Correlation.
Follow the steps in the earlier section of this chapter to get the correlation between the variables that you are interested in. The results are displayed separately for each group.
Important: don't forget to turn off the Split File option when you're done. You do this by clicking in the Data Editor window on Data, Split File and then on Analyze all cases, do not create groups.
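In syntax, the whole sequence might look as follows (hypothetical grouping variable sex and hypothetical analysis variables):

SORT CASES BY sex.
SPLIT FILE LAYERED BY sex.
CORRELATIONS
  /VARIABLES=stress optim
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
* Turn the split off again when you are done.
SPLIT FILE OFF.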
How to test the statistical significance of the difference between correlation coefficients?
This section describes the procedure that you can follow to find out whether the correlations between two groups differ significantly. First, the r values are converted to z scores. An equation is then used to calculate the observed value of z (the zobs value). This value is compared against a fixed decision rule to determine the probability that the difference in correlation between the two groups is due to chance.
First a check must be made for a number of assumptions. It is assumed that the r values of the two groups were obtained from random samples and that the two groups are independent (that is, the same participants were not tested twice). The score distribution for the two groups must be normal. Each group must also consist of at least 20 cases.
Step 1: Convert every r value to a z score.
Step 2: Enter these values into the equation to calculate zobs. You do this using the following formula: zobs = (z1 - z2) / √(1/(N1 - 3) + 1/(N2 - 3))
Step 3: Determine whether the zobs value is statistically significant. The following rule applies: if -1.96 < zobs < 1.96, then the correlation coefficients do not differ significantly. If zobs is less than or equal to -1.96, or greater than or equal to 1.96, then the coefficients differ significantly.
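A worked example with hypothetical numbers: with r1 = .50 (N1 = 35) and r2 = .40 (N2 = 45), the converted z scores are z1 = 0.549 and z2 = 0.424, so zobs = (0.549 - 0.424) / √(1/32 + 1/42) ≈ 0.126 / 0.235 ≈ 0.54. Because -1.96 < 0.54 < 1.96, the two correlation coefficients do not differ significantly.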
What is the difference between correlation and partial correlation? - Chapter 12
The partial correlation is similar to Pearson r, with the difference that the partial correlation lets you control for an additional (confounding) variable.
How to conduct the partial correlation procedure?
Click on Analyze in the menu at the top of the screen, then select Correlate and then Partial.
Click on the two continuous variables that you want to correlate. Click on the arrow to move these variables to the Variables box.
Click on the variable for which you want to control and move it to the Controlling for box.
Click on Options.
In the Missing Values section, click Exclude cases pairwise.
Click in the Statistics section on Zero order correlations.
Click Continue and then OK (or Paste to save the Syntax Editor).
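The pasted syntax, with hypothetical variables stress and optim and a hypothetical control variable socdes:

PARTIAL CORR
  /VARIABLES=stress optim BY socdes
  /SIGNIFICANCE=TWOTAIL
  /STATISTICS=CORR
  /MISSING=ANALYSIS.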
How to interpret partial correlation output?
In the output there is a table that consists of two parts. In the upper half you will find the normal Pearson product-moment correlation matrix, which does not control for the possible confounding variable. The same correlation analyses are repeated in the lower half of the table, but now the possible confounding variable is controlled for. By comparing the two correlation coefficients, you can find out whether taking the additional variable into account has influenced the relationship between your two variables.
How do you perform multiple regression in SPSS? - Chapter 13
This chapter explains how to use SPSS for multiple regression analyses. Multiple regression is not just one technique, but a collection of techniques that can be used to study the relationship between a continuous dependent variable and multiple independent variables or predictors (usually continuous). It is based on correlation, but offers a more refined analysis of the relationship between a series of variables. Multiple regression can be applied to various research questions, including:
How well a set of variables is able to predict a certain outcome.
Which variable within a series of variables is the best predictor of a certain outcome.
Whether a certain predictive variable can still predict the outcome when controlling for the influence of another variable.
What are the most important types of multiple regression?
There are different types of multiple regression analyzes that you can apply depending on your research question. The three most important multiple regression analyzes are:
Standard or simultaneous
Hierarchical or sequential
Step-by-step
Standard multiple regression
In the standard multiple regression, all independent (or predictive) variables are entered into the equation simultaneously. Each variable is evaluated in terms of its predictive value compared to that of the other independent variables. You use this analysis if you have a series of variables and want to know to what extent they can explain the variance in a dependent variable as a group.
Hierarchical multiple regression
In the hierarchical multiple regression (also called sequential regression), the independent variables are added to the equation in an order determined by the researcher on the basis of a theoretical framework. Variables or sets of variables are added in steps. Each variable is evaluated in terms of what it adds to the prediction of the dependent variable after controlling for the previously entered variables.
Step-by-step multiple regression
In stepwise regression, the researcher provides a list of independent variables and then lets the program select, based on a set of statistical criteria, which variables are added and in which order they are added to the equation. There are three different versions of this approach: (1) forward selection, (2) backward deletion, and (3) stepwise regression.
Which assumptions are made for a multiple regression?
Sample size
It is important that your sample is not too small, because otherwise the results cannot be (sufficiently) generalized. Tabachnick and Fidell give a formula to calculate the required sample size: N > 50 + 8m (where m = the number of independent variables). You need more cases if the dependent variable is skewed. For stepwise regression you need a ratio of 40 cases per independent variable.
Multicollinearity and singularity
This refers to the relationship between the independent variables. Multicollinearity occurs when the independent variables strongly correlate with each other ( r = .9 and higher). Singularity exists when an independent variable is actually a combination of other independent variables. Neither contributes to a good regression model.
Outliers
Multiple regression is very sensitive to outliers (extremely high or low scores). So check all variables (both dependent and independent) for outliers. Outliers can be removed from the dataset, or they can be given a score that is high or low but does not deviate too much from the other scores. Tabachnick and Fidell define outliers as scores with standardized residual values above 3.3 or below -3.3. You can find outliers in the standardized residual plot.
Normality, linearity, homoscedasticity and independence of residuals
All of these terms refer to different aspects of the distribution of scores and the nature of the underlying relationship between the variables. These assumptions can be read in the residual scatter plots. Residuals are the differences between the obtained and predicted dependent variable (AV) scores. You can check the following assumptions based on the residual scatter plots:
Normality: the residuals should be normally distributed around the predicted AV scores.
Linearity: the residuals must have a linear relationship with the predicted AV scores.
Homoscedasticity: the variance of the residuals over the predicted AV scores should be the same for all predicted scores.
What does the standard multiple regression look like?
In the case of the standard multiple regression, all independent variables are entered into the model simultaneously. The results provide an indication of how well this set of variables is able to predict the dependent variable. It also shows how much unique variance each of the independent variables can explain relative to the other independent variables.
Procedure for standard multiple regression
Before you begin the following procedure, click on Edit in the menu. Then select Options and make sure that the box No scientific notation for small numbers in tables is checked.
Click on Analyze in the menu at the top of the screen, then select Regression and then Linear.
Click on your continuously dependent variable and move it to the Dependent box.
Click on your independent variables and click on the arrow to move them to the Independent box.
Make sure that Enter is selected for the Method.
Click on the Statistics button. Select the following: Estimates, Confidence Intervals, Model fit, Descriptives, Part and partial correlations and Collinearity diagnostics. In the Residuals section, select Casewise diagnostics and Outliers outside 3 standard deviations. Then click Continue.
Click on Options and select Exclude cases pairwise in the Missing Values section. Click on Continue.
Click on the Plots button. Click on *ZRESID and the arrow to move it to the Y box. Click on *ZPRED and the arrow to move it to the X box. In the Standardized Residual Plots section, select the Normal probability plot option and click Continue.
Click Save. Check the Mahalanobis and Cook's boxes in the Distances section. Click Continue and then OK (or Paste to save the syntax to the Syntax Editor).
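Pasting all of the above produces one long REGRESSION command. A sketch with a hypothetical dependent variable stress and two hypothetical predictors optim and selfest:

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING PAIRWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT stress
  /METHOD=ENTER optim selfest
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS NORMPROB(ZRESID)
  /CASEWISE PLOT(ZRESID) OUTLIERS(3)
  /SAVE MAHAL COOK.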
How to interpret the standard multiple regression output?
Step 1: Check the assumptions
Multicollinearity
The correlations between the variables in your model are shown in the Correlations table. Check whether your independent variables have at least some relationship with your dependent variable (preferably > .3). Also check that the correlation between your independent variables is not too large (preferably a bivariate correlation < .7).
As part of the multiple regression procedure, SPSS also performs 'collinearity diagnostics' on your variables. This can overcome problems related to multicollinearity that are not visible in the correlation matrix. These results are in the Coefficients table. Two values are given here: Tolerance and VIF. Tolerance is an indicator of how much of the variability of the specified independent variable is not explained by the other independent variables in the model. If this value is very low (< .10), this indicates that the multiple correlation with other variables is high, possibly indicating multicollinearity. VIF (variance inflation factor) values above 10 are cause for concern, as this may indicate multicollinearity. Only use Tolerance and VIF as a warning sign and always check your correlation matrix.
Outliers, normality, linearity, homoscedasticity and independence of residuals
One way in which these assumptions can be checked is by inspecting the Normal Probability Plot (P-P) of the Regression Standardized Residual and the Scatter Plot. These are at the end of the output. In the Normal P-P Plot you hope that the points form a fairly straight diagonal line from bottom left to top right. This suggests that there are no major deviations from normality. In the Scatter plot of the standardized residuals (the second plot) you hope that the residuals are roughly rectangularly distributed, with most scores in the middle (around the zero point). You can also identify outliers on the basis of the Scatter plot. Outliers can also be identified by inspecting the Mahalanobis distances. These are not visible in the output, but are added as an extra variable at the end of the data file. To find out which scores are outliers, you need a critical chi-square value. Tabachnick and Fidell suggest using an alpha value of .001.
Step 2: Evaluate the model
Look in the Model Summary box and check the value under the R Square heading; this tells you how much of the variance in the dependent variable is explained by the model. You may notice that there is also an Adjusted R Square value in the output. When you have a small sample, the R Square value is often an optimistic overestimate of the real population value. The Adjusted R Square statistic "corrects" this value and provides a better estimate of the true population value. So if you have a small sample, you had better report this value. To find out the statistical significance of the results, you have to look in the ANOVA table; this tests the null hypothesis that multiple R in the population is equal to 0.
Step 3: Evaluate all independent variables
The next thing you want to know is which of the variables in the model contributes to the prediction of the dependent variable. You can find this information in the output box called Coefficients. Look in the Beta column under Standardized Coefficients. To compare the different variables with each other, it is important that you look at the standardized coefficients and not the unstandardized ones (B); you only use the latter if you want to draw up a regression equation.
For all independent variables, check the value in the Sig. column; this tells you whether the variable makes a significant unique contribution to the equation. This depends strongly on which variables are included in the equation and how much overlap there is between the independent variables. If the Sig. value is less than .05 (.01, .001, etc.), the variable makes a significant unique contribution to the prediction of the dependent variable.
Another potentially useful piece of information in the Coefficients table is the Part correlation coefficients (sometimes also called semipartial correlation coefficients). If you square this value, you get an indication of the contribution of that variable to the total R Square. In other words, it tells you how much of the total variance in the dependent variable is uniquely explained by that variable and how much R Square would drop if this variable were not included in your model.
What is hierarchical multiple regression?
In this form of multiple regression, the variables are added in steps in a predetermined order.
Hierarchical multiple regression procedure
Click on Analyze in the menu at the top of the screen, then select Regression and then Linear.
Choose your continuous dependent variable and move it to the Dependent box.
Move the variables for which you want to control to the Independent box; these form the first block that will be entered into the analysis.
Click on Next, this produces a second independent variables box in which you can add the second block of variables.
Choose your next block of independent variables.
Make sure the Method box is set to the default (Enter).
Click on Statistics and check the following options: Estimates, Model fit, R squared change, Descriptives, Part and partial correlations and Collinearity diagnostics. Click on Continue.
Click on Options. In the Missing Values section, click Exclude cases pairwise and click Continue.
Click on the Plots button.
Click on *ZRESID and the arrow to move it to the Y box.
Click on *ZPRED and the arrow to move it to the X box.
In the Standardized Residual Plots, select the Normal probability plot option and click Continue.
Click Save. Check the Mahalanobis and Cook's boxes in the Distances section. Click Continue and then OK (or Paste to save the syntax to the Syntax Editor).
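The hierarchical version differs from the standard one mainly in having one METHOD=ENTER line per block and the CHANGE statistic. A sketch with hypothetical control variables (age, sex) in block 1 and hypothetical predictors of interest (optim, selfest) in block 2:

REGRESSION
  /MISSING PAIRWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE COLLIN TOL ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT stress
  /METHOD=ENTER age sex
  /METHOD=ENTER optim selfest
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS NORMPROB(ZRESID)
  /SAVE MAHAL COOK.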
How to interpret the hierarchical multiple regression output?
The output of this regression analysis is similar to that of the standard multiple regression, with some extra information here and there. You will find two models in the Model Summary box. Model 1 refers to the first block of variables that has been added and Model 2 includes all variables that have been added in both blocks.
Step 1: Evaluate the model
Check the R Square values in the first Model Summary box. Pay attention: the second R Square value includes all variables from both blocks, not only those added during the second step. To find out how much of the total variance is explained by the variables that you are interested in, look in the R Square Change column and at the corresponding Sig. F Change value.
Step 2: Evaluate all independent variables
To find out how well all variables contribute to the final equation, look at the Coefficients table in the Model 2 row. This summarizes the results with all variables included in the equation. The Sig. column shows whether the variables make a unique statistically significant contribution.
How to present the results from a multiple regression?
Depending on the type of analysis you have performed and the nature of the research question, there are a number of different ways in which the results of multiple regression can be presented. You must provide at least the following information: (1) what type of analysis you have performed (standard or hierarchical), and (2) standardized (beta) values in the case of a theoretical investigation, or unstandardized (B) coefficients in the case of applied research. If you have performed a hierarchical multiple regression, you must also report the R Square change for each step, together with the corresponding probability values.
How do you perform a logistic regression analysis in SPSS? - Chapter 14
Using logistic regression you can test models with which you can predict categorical outcomes consisting of two or more categories. Using logistic regression you can measure how well your set of predictive variables is able to predict or explain your categorically dependent variable. It offers you an indication of the adequacy of your model by mapping the 'goodness of fit'. Your independent variables can be either categorical or continuous, or a combination of both. This chapter shows how to perform a binomial (also called binary) logistic regression with a dichotomous dependent variable (so with only two categories or values). If your dependent variable consists of several categories, you will have to perform a multinomial logistic regression. This is not covered here, but is of course available in SPSS (see the Help menu).
Which assumptions are made for a logistic regression analysis?
Sample size
As with all other analyses, it is important that your sample size is sufficient. Always run Descriptive Statistics on each of your independent variables and consider removing categories with too few cases.
Multicollinearity
Always check whether there are high intercorrelations between your independent variables. You can do this by requesting collinearity diagnostics under the Statistics button of the linear regression procedure. Ignore the rest of the output and focus only on the Coefficients table and the columns called Collinearity Statistics. Very low tolerance values (< .1) indicate that the variable is highly correlated with other variables. In that case, reconsider which variables you want to include in your model and remove one of the highly intercorrelating variables.
Outliers
It is important to check for outliers. This can be done by inspecting the residuals.
How to conduct a logistic regression?
To be able to interpret the results of logistic regression, it is important that you set up the coding of responses for each of your variables accurately. For the dichotomous dependent variable you have to code the responses as 0 and 1. You assign the value 0 to responses that show a lack or absence of the characteristic in which you are interested, and the value 1 to responses that show its presence. You carry out a similar procedure for your categorical independent variables. For continuous independent variables, higher values should indicate more of the characteristic in which you are interested (e.g. 0 hours of sleep gets value 0 and 10 hours of sleep value 10).
Logistic regression procedure
Before you start the procedure below, first go to Edit in the main menu. Select Options there and make sure that the box No scientific notation for small numbers in tables is checked.
Click on Analyze in the menu at the top of the screen , then select Regression and then Binary Logistic.
Move your categorically dependent variable to the Dependent box. Move your independent variables to the Covariates box. Ensure that the Enter option is displayed with Method.
If you have categorical (nominal or ordinal) independent variables, click on the Categorical button. Mark all categorical variables and move them to the Categorical covariates box. Mark all your categorical variables again and click on the First button in the Change contrast section. Click on Change and you will see the word (first) appear after the name of the variable. Repeat this for all categorical variables. Click on Continue.
Click on Options. Select the following options: Classification plots, Hosmer-Lemeshow goodness of fit, Casewise listing of residuals and CI for Exp(B).
Click Continue and then OK (or Paste to save the syntax to the Syntax Editor).
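The pasted syntax looks roughly like this; the dependent variable smoker (coded 0/1), the categorical predictor sex and the continuous predictors age and selfest are hypothetical:

LOGISTIC REGRESSION VARIABLES smoker
  /METHOD=ENTER sex age selfest
  /CONTRAST (sex)=Indicator(1)
  /CLASSPLOT
  /CASEWISE OUTLIER(2)
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).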
How to interpret the logistic regression output?
The first thing to look at in your output are the details regarding your sample size. You can find these in the Case Processing Summary table. Make sure that it contains the number of cases that you have entered. The next table, Dependent Variable Encoding, shows how SPSS has encoded your dependent variable. Check in the table that follows (Categorical Variables Coding) the coding of your independent variables. Also check the number of cases per category in the Frequency column; you don't want groups with very small numbers.
The following output part (Block 0) concerns the results of the analysis without any of the independent variables included in the model; this serves as a baseline to compare with the model in which the variables are included. Then go to the next section, Block 1. Here your model (with the independent variables in it) is tested. The Omnibus Tests of Model Coefficients table provides a general indication of how well the model performs compared to the results from Block 0, where none of the independent variables are included in the model. This is also referred to as the 'goodness of fit' test. Here you want a significant value (Sig. value < .05), because that means that your model with predictors is better than the baseline model. The results in the Hosmer and Lemeshow Test table provide further support for the goodness of fit of your model. Please note that this test is interpreted very differently than the omnibus test: for the Hosmer and Lemeshow Goodness of Fit Test, a poor fit is indicated by a significance value of less than .05, which means that here you want to see a non-significant value (greater than .05).
The Model Summary table also provides information about the usability of the model. The Cox & Snell R Square and the Nagelkerke R Square values provide an indication of the amount of variation in the dependent variable that is explained by the model (ranging from 0 to 1).
The Classification Table provides an indication of how well the model is able to predict the correct category for each case. You can compare this table with the Classification Table from Block 0 to find out how much improvement occurs in the model when the independent variables are included.
The sensitivity of the model is the percentage of the group that contains the characteristic that you are interested in and that are correctly determined by the model ('true positives'). The specificity of the model is the percentage of the group that does not contain the characteristic in which you are interested and has been correctly established ('true negatives'). The positive predictive value is the percentage of cases where the model states that they have the characteristic and that actually have this characteristic. The negative predictive value is the percentage of cases where the model states that they do not have the characteristic and that do not actually have this characteristic.
The Variables in the Equation table provides information about the contribution or importance of each of your independent variables based on the Wald test; you can read this in the Wald column. Now look in the Sig. column for values less than .05; these are the variables that contribute significantly to the predictive value of the model. Check whether the B values are positive or negative; this says something about the direction of the relationship. If you have coded all variables correctly, negative B values mean that an increase in the independent variable score will result in a reduced chance that the case will have a score of 1 on the dependent variable. The opposite applies to positive B values. Another useful piece of information in the Variables in the Equation table can be found in the Exp(B) column; these values are the odds ratios (OR) for each of your independent variables. According to Tabachnick and Fidell, the OR represents "the change in odds of being in one of the categories of outcome when the value of a predictor increases by one unit". For ease of interpretation, it is preferable to convert ORs smaller than 1 (divide 1 by the value) when reporting them.
A 95% confidence interval is given for each of the ORs (95% CI for EXP(B)); you should mention this in your results.
The last table in the output (Casewise List) provides information about cases in your sample for whom the model does not fit well. Cases with ZResid values above 2.5 or below -2.5 are outliers and must therefore be investigated more closely.
How do you perform factor analysis in SPSS? - Chapter 15
Factor analysis differs from many of the other techniques in SPSS. It is not designed to test hypotheses or to indicate whether one group differs significantly from the other. Instead it takes a large set of variables and looks for a way to 'reduce' or summarize the data by using a smaller set of factors or components. This is done by searching for clusters or groups among the intercorrelations of a set of variables. There are two main approaches to factor analysis: (1) exploratory factor analysis, often used during the early stages of research to collect information about the relationships between a set of variables, and (2) confirmatory factor analysis, applied later in the research process to test specific hypotheses or theories regarding the underlying structure of a set of variables.
The term "factor analysis" includes a variety of different related techniques. One of the most important distinctions is that between principal component analysis (PCA) and factor analysis (FA). These two techniques are similar in many respects; both attempt to produce a smaller number of linear combinations of the original variables in a manner that includes (or can explain) most of the variability in the correlation pattern. Of course there are also differences; in PCA the original variables are transformed into a smaller set of linear combinations using all variance in the variables, while in FA the factors are estimated using a mathematical model where only the shared variance is analyzed.
How to conduct a factor analysis?
Step 1: Assessment of the suitability of the data (assumptions)
There are two important issues to take into account when determining the suitability of your dataset for factor analysis: sample size and the strength of the relationships between your variables (or items). There are no really clear guidelines for sample size; in general, the bigger the better. If you have a small sample (<150) or many variables, look for more information about factor analysis before proceeding.
The second issue concerns the strength of the intercorrelations between the items. Tabachnick and Fidell recommend correlation coefficients with values greater than .3. SPSS offers two statistical measures that can help determine the factorability of the data: (1) Bartlett's test of sphericity, and (2) the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. Bartlett's test must be significant (p < .05) for factor analysis to be appropriate. The KMO index must have a minimum value of .6 for a good factor analysis.
Step 2: Factor extraction
Factor extraction involves determining the smallest number of factors that can best be used to represent the interrelationships between the set of variables. There are different approaches that can be applied to identify the number of underlying factors or dimensions, of which PCA is the most used. It is up to the researcher to determine the number of factors that he / she believes is the best representation of the underlying relationship between the variables. Techniques that can be applied to help determine the number of factors are:
Kaiser's criterion: also known as the eigenvalue rule. Based on this rule, only factors with an eigenvalue of 1.0 or more are used for further research.
Cattell's scree test: in this test all eigenvalues of the factors are plotted; you then search this plot for the point where the shape of the curve changes direction and becomes horizontal. Cattell advises keeping all factors above this point.
Horn's parallel analysis: this involves comparing the magnitude of the eigenvalues with eigenvalues obtained from a randomly generated data set of the same size. Only eigenvalues that exceed the corresponding values from the random data set are retained. This approach appears to be the most accurate (Kaiser's criterion and Cattell's scree test tend to overestimate the number of components).
Step 3: Factor rotation and interpretation
After the number of factors has been determined, they must be interpreted. To facilitate this process, the factors are 'rotated'. SPSS shows which variables clump together; it is up to you to provide possible interpretations here.
There are two general rotation approaches that result in orthogonal (non-correlated) or oblique (correlated) factor solutions. In practice, these two approaches often result in similar results, especially when the correlation pattern between the items is clear. Pallant advises starting with oblique rotation to investigate the degree of correlation between your factors.
Within the two broad categories of rotation approaches, a number of different techniques are available in SPSS. The most commonly used orthogonal technique is the Varimax method; it tries to minimize the number of variables with high loadings on each factor. The most commonly used oblique technique is Direct Oblimin.
What is the procedure for factor analysis?
Before you start the procedure below, first go to Edit in the main menu. Select Options there and make sure that the box No scientific notation for small numbers in tables is checked.
Click on Analyze in the menu at the top of the screen, then select Dimension Reduction and then Factor.
Select all required variables (or items) and move them to the Variables box.
Click on the Descriptives button. Make sure that in the Statistics section Initial Solution is checked. In the Correlation Matrix section, select the Coefficients and KMO and Bartlett's test of sphericity options. Click on Continue.
Click on the Extraction button.
Ensure that Principal components is shown in the Method section, or choose one of the other factor extraction techniques (for example Maximum likelihood). Select Correlation matrix in the Analyze section. In the Display section, Screeplot and Unrotated factor solution must be selected. In the Extraction section, select the Based on Eigenvalue option, or click on Fixed number of factors if you want to specify a specific number of factors and type in the desired number. Click on Continue. Then click on the Rotation button, choose Direct Oblimin and click Continue.
Click on the Options button and in the Missing Values section select the option Exclude cases pairwise. In the Coefficient Display Format section, select the options Sorted by size and Suppress small coefficients. Type the value .3 in the box next to Absolute value below. This means that only factor loadings with a value greater than .3 will be displayed, which makes the output easier to interpret.
Click Continue and then OK (or Paste to save the Syntax Editor).
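If you click Paste instead of OK, SPSS records the equivalent FACTOR syntax in the Syntax Editor. A minimal sketch of what that syntax might look like for the procedure above (the item names item1 to item10 are hypothetical placeholders for your own variables):

* PCA with Oblimin rotation; KMO, scree plot, pairwise missing data.
FACTOR
  /VARIABLES item1 TO item10
  /PRINT INITIAL CORRELATION KMO EXTRACTION ROTATION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1)
  /EXTRACTION PC
  /ROTATION OBLIMIN
  /FORMAT SORT BLANK(.3)
  /MISSING PAIRWISE.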
How to interpret the factor analysis output? (Part 1)
Step 1: assess the suitability of your data set
Check that the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) value is .6 or higher and that the value of Bartlett's test of sphericity is significant (.05 or smaller) to verify whether your dataset is suitable for factor analysis. Search the Correlation Matrix table for correlation coefficients of .3 or higher.
Step 2: Factor extraction using Kaiser's criterion
To determine how many components meet the eigenvalue criterion of 1 or higher, look at the Total Variance Explained table. Look at the values in the first set of columns (Initial Eigenvalues). In the Cumulative % column you can see what percentage of the variance the components explain.
Step 3: Factor extraction using Catell's scree test
Kaiser's criterion often yields too many components. That is why it is important to also look at the Screeplot. Look for the point where the shape of the curve changes direction and becomes horizontal. All factors above this point should be retained.
Step 4: Factor extraction using parallel analysis
For parallel analysis, the third way to determine the number of factors, you use the list of eigenvalues in the Total Variance Explained table together with additional information provided by another statistical program (available from the website of this book). Follow the link to the Additional Material site and download the zip file (parallel analysis.zip) to your computer. Unzip this file on your hard drive and click on the MonteCarloPA.exe file. The Monte Carlo PCA for Parallel Analysis program will now start, in which you enter the following information: (1) the number of variables you want to analyze, (2) the number of participants in your sample, and (3) the number of replications. Click on Calculate. You must then systematically compare the first eigenvalue that you obtained in SPSS with the first value from the results of the parallel analysis. If your value is greater than the criterion value from the parallel analysis, you retain this factor; if your value is smaller, you reject it.
Step 5: Inspect factor loads
In the Component Matrix table you will find the unrotated loadings of each of the items on the different components. SPSS uses Kaiser's criterion as the default extraction technique.
Step 6: Inspect the rotated factor solution
Before you make a final decision about the number of factors, you must look at the rotated factor solution in the Pattern Matrix table: this shows the item loadings on the various factors. Ideally, you want at least three items loading on each component. If this is not the case, you will have to find a solution with fewer factors. In that case, follow the procedure below:
Repeat all steps mentioned earlier in this chapter. Note: when you now click on the Extraction button, select Fixed number of factors . In the box next to Factors to extract, type in the number of factors that you want to extract.
Click Continue and then OK .
How to interpret the output of factor analysis? (Part 2)
The first thing to check is the percentage of variance explained by the new factor solution; this is stated in the Total Variance Explained table. After rotation of the new factor solution there are three new tables to look at at the end of your output. First, the Component Correlation Matrix: this shows the strength of the relationships between the factors. This information helps you decide whether it was reasonable to assume that the components were uncorrelated (in which case the Varimax rotation could be applied) or whether the Oblimin rotation solution should be applied and reported.
Oblimin rotation yields two tables of factor loadings. The Pattern Matrix displays the factor loadings of each of the variables. Look for the highest loading items on each component to identify and label the component. The Structure Matrix table provides information about the correlations between variables and factors. If you present the Oblimin rotation solution in your output, you must display both tables.
Earlier in the output you will find the table called Communalities; this table gives information about how much of the variance in each item is explained. Low values (< .3) may mean that the item does not fit well with the other items in its component. If you want to improve or refine the scale, you can use this information to remove items from the scale. Removing items with low communality values generally increases the total explained variance. Communality values can change drastically depending on the number of factors retained, so it is often better to interpret them after you have decided how many factors to keep on the basis of the scree plot and parallel analysis.
How are the results of a factor analysis reported?
The information that you present in your results depends on your field, the type of report you are writing and where your report will be presented. If you want to publish your research within the field of psychology and education, there are fairly strict requirements for what to report when you have used factor analysis: the details of the factor extraction method used, the criteria used to determine the number of factors, the type of rotation technique, the total variance explained, the initial eigenvalues, and the eigenvalues after rotation. Your report must include a table of factor loadings showing all values (not just values > .3). In the case of a Varimax rotated solution, the table should be labelled 'pattern/structure coefficients'. In the case of an Oblimin rotation, both the Pattern Matrix and the Structure Matrix coefficients must be presented in full, together with information about the correlations between the factors.
How do you use SPSS for non-parametric statistics? - Chapter 16
Non-parametric statistics are ideal when your data is measured on a nominal or ordinal scale. They are also useful when you have very small samples and when your data does not meet the assumptions of the parametric techniques.
IBM SPSS provides various non-parametric techniques for different situations. The most commonly used non-parametric techniques are explained below.
Which non-parametric techniques are available?
Non-parametric technique | Parametric alternative
Chi-square test for goodness of fit | None
Chi-square test for independence | None
McNemar's Test | None
Cochran's Q Test | None
Kappa Measure of Agreement | None
Mann-Whitney U Test | T-test for independent samples
Wilcoxon Signed Rank Test | T-test for paired samples
Kruskal-Wallis Test | One-way between-groups ANOVA
Friedman Test | One-way repeated measures ANOVA
Assumptions for non-parametric techniques
General assumptions of non-parametric techniques that require checking are:
Random samples
Independent observations (with the exception of techniques where repeated measurements are performed).
In addition, some techniques have additional assumptions; these will be discussed per technique.
How to perform the chi-square test for goodness of fit?
This test, also called the one-sample chi-square, is often used to compare the proportion of cases from a sample with hypothetical values or previously obtained values from comparable populations. The only thing you need in the data file is one categorical variable and a specific proportion against which you want to test the observed frequencies.
Procedure for the chi-square test for goodness of fit
Click on Analyze in the menu at the top of the screen, then select Non-parametric Tests, then Legacy Dialogs and then Chi-square.
Move the categorical variable to the Test Variable List box. In the Expected Values section, click on the Values option. In the Values box you enter the expected values for your variable: the first value corresponds to the expected proportion for the first coded value of the variable (e.g., 1 = yes). Click on Add. The second value is the expected proportion for the second coded value (e.g., 2 = no). Click on Add, and so on.
Click OK (or Paste to save the Syntax Editor).
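Clicking Paste instead of OK stores the corresponding legacy syntax. A minimal sketch, assuming a hypothetical variable smoker coded 1 = yes, 2 = no and expected proportions of 20% and 80%:

* One-sample chi-square against expected proportions 20/80.
NPAR TESTS
  /CHISQUARE=smoker
  /EXPECTED=20 80.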
Interpretation of the output
The first table shows the observed frequencies of the current data file. In the Test Statistics table the results of the Chi-Square Test - which compares the expected and observed values with each other - are reported.
Reporting the results
You must include the chi-square value, the degrees of freedom (df) and the p-value (Asymp. Sig.) in the results.
How to conduct the chi-square test for independence?
The Chi-square test for independence is used when you want to study the relationship between two categorical variables. Each of these variables can have two or more categories. The chi-square test for independence compares the observed frequencies or proportions of cases that occur in each of the categories with the values that are expected if there is no association between the measured variables. When SPSS encounters a 2x2 table (2 categories in each variable), the output includes an additional correction value (Yates' Correction for Continuity); this value is designed to compensate for what some researchers regard as an overestimate of the chi-square value when it is used in a 2x2 table.
Additional assumptions
The lowest expected frequency must be at least 5 for each cell. If you have a 2x2 table, it is recommended that you have a minimum expected frequency of 10. If you have a 2x2 table that violates this assumption, you should consider reporting Fisher's Exact Probability Test instead.
Procedure
Click on Analyze in the menu at the top of the screen, then select Descriptive Statistics and then Crosstabs.
Click on the variable (s) that you want to make your row variable (s) and move it to the Row (s) box.
Click on the other variable(s) that you want to make your column variable(s) and move it to the Column(s) box.
Click the Statistics button. Check Chi-square and Phi and Cramer's V and click Continue. Click on the Cells button. Make sure the option Observed is checked in the Counts box. Then go to the Percentage section and click on the Row box. In the Residuals section, click on Adjusted standardized.
Click Continue and then OK (or Paste to save the Syntax Editor).
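The Paste button stores the equivalent CROSSTABS syntax. A minimal sketch, assuming hypothetical variables sex (rows) and smoker (columns):

* Chi-square test for independence with phi/Cramer's V and adjusted residuals.
CROSSTABS
  /TABLES=sex BY smoker
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT ROW ASRESID.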
Interpretation of the output
To start with, it is important to see whether assumptions have been violated when it comes to the chi-square 'minimum expected cell frequency'. This must be 5 or larger.
The most important value that you are interested in is the Pearson Chi-Square value; you can find this in the Chi-Square Tests. If you have a 2x2 table, use the value from the second row (Continuity Correction). This is Yates's continuity correction.
For more detailed information, the cell counts and percentages in the Crosstabulation table can also be consulted.
Different types of statistics are available in the Crosstabs procedure to calculate the effect size. The two most common are:
Phi coefficient: this is a correlation coefficient that can vary between 0 and 1. The higher the value, the stronger the association between the two variables. The phi coefficient is often used with 2 by 2 tables.
Cramer's V: this statistic is used for tables larger than 2 by 2; it also takes the number of degrees of freedom (df) into account.
How to perform McNemar's test?
You cannot use a chi-square test in the case of matched or repeated measurements. In that case, the McNemar test can be used as an alternative. Your data also look different: with matched or repeated measurements you have two variables, the first measured at time 1 (before an intervention) and the second at time 2 (after an intervention). Both variables are categorical and capture the same information.
Procedure for McNemar's test
Click on Analyze in the menu at the top of the screen, then select Nonparametric tests and then Related samples.
At "What is your objective?" Click on Customize analysis.
Go to Fields. Select the two variables and move them to the Test Fields box.
Go to the Settings tab and under Choose Tests select Customize tests.
Click on the box to select McNemar's test (2 samples). Click on Define Success and then on OK.
Click Run (or Paste to save the Syntax Editor).
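The procedure above uses the newer nonparametric dialogs; the Legacy Dialogs route pastes simpler syntax. A minimal sketch, assuming two hypothetical categorical variables time1 and time2 measured before and after an intervention:

* McNemar test for two related categorical variables.
NPAR TESTS
  /MCNEMAR=time1 WITH time2.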
Interpretation of the output
If the p-value (Sig.) is less than .05, it means that there is a significant difference between your two variables.
The McNemar test itself is only applicable if you have two response categories (e.g., yes/no or present/absent). If you have three or more categories, SPSS automatically generates the results of the McNemar-Bowker test of symmetry instead.
How to perform Cochran's Q test?
Click on Analyze in the menu at the top of the screen, then on Nonparametric Tests, then on Related Samples.
At "What is your objective?" Click on Customize analysis.
Click on the variables that represent the different time moments and move them to the Test Fields box.
Go to the Settings tab and under Choose Tests select Customize tests.
Click on Cochran's Q (k samples). Make sure that the option All pairwise is selected for Multiple comparisons.
Click OK (or Paste to save the Syntax Editor).
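As with McNemar's test, a Legacy Dialogs equivalent exists that pastes compact syntax. A minimal sketch, assuming three hypothetical dichotomous variables representing the three time points:

* Cochran's Q test across three related dichotomous variables.
NPAR TESTS
  /COCHRAN=time1 time2 time3.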
Interpretation of the output
To determine if there is a significant difference between the variables / time moments, look at the p-value (Asymp. Sig.); it must be less than .05.
How to measure the correspondence between two tests with Kappa?
Kappa's measure of agreement is often applied when inter-rater reliability must be established. Kappa is an estimate of the degree of agreement between two raters or tests, taking into account the degree of agreement that could have occurred by chance.
The value obtained from Kappa's measure of agreement is influenced by the prevalence of the positive value. This means that in studies where the condition of interest is rare, kappa statistics can be very low despite high levels of overall agreement.
An additional assumption of this approach is that rater/test 1 and rater/test 2 use the same number of categories.
Procedure for Kappa's measure of agreement
Click on Analyze in the menu at the top of the screen, then on Descriptive Statistics and then on Crosstabs.
Click on the variable that you want in the output table in the row and move it to the Row(s) box.
Click on the other variable that you want in the output table in the column and move it to the Column(s) box.
Click the Statistics button, select Kappa, and click Continue.
Click on the Cells button.
Make sure the option Observed is checked in the Counts box.
Click in the Percentage section on Column. Click Continue and then OK (or Paste to save the Syntax Editor).
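Clicking Paste stores the equivalent CROSSTABS syntax. A minimal sketch, assuming hypothetical variables rater1 and rater2 containing the classifications of the two raters:

* Kappa measure of agreement between two raters.
CROSSTABS
  /TABLES=rater1 BY rater2
  /STATISTICS=KAPPA
  /CELLS=COUNT COLUMN.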
Interpretation of the output
The most important information can be found in the Symmetric Measures table. A Kappa value of .5 indicates moderate agreement, a value of .7 or higher good agreement, and a value of .8 or higher very good agreement.
Sensitivity and specificity
The frequencies and percentages as shown in the Crosstabulation table can also be used to calculate the sensitivity and specificity of a measurement or test.
Sensitivity indicates the proportion of cases with a disease or disorder that have been correctly diagnosed. Specificity indicates the proportion of cases without the disease or disorder that have been correctly classified.
How to perform the Mann-Whitney U test?
The Mann-Whitney U Test is used to test for differences between two independent groups on a continuous measure. This test is the non-parametric alternative to the t-test for independent samples. Instead of comparing means, the Mann-Whitney U Test compares the medians of the two groups. It converts the scores on the continuous variable to ranks across the two groups and then tests whether the ranks of the two groups differ significantly.
Procedure for Mann-Whitney U Test
In the menu at the top of the screen, click Analyze, then Nonparametric Tests, then Independent Samples.
At "What is your objective?" Click on Customize analysis.
Go to Fields.
Move the categorical (independent) variable to Groups.
Move the continuous (dependent) variable to the Test Fields box.
Go to the Settings tab and under Choose Tests select Customize tests. Click on Mann-Whitney U (2 samples).
Click Continue and then OK (or Paste to save the Syntax Editor).
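The same test is also available under Legacy Dialogs (2 Independent Samples), which pastes compact syntax. A minimal sketch, assuming a hypothetical continuous variable totself and a grouping variable sex coded 1 and 2:

* Mann-Whitney U test comparing two independent groups.
NPAR TESTS
  /M-W=totself BY sex(1 2).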
Interpretation of the output
The most important values in your output are the Z value and the significance level which is represented as Asymp. Sig. (2-tailed). If your sample is larger than 30, SPSS will give the value for a Z approximation test.
If there are statistically significant differences between the groups, the direction of the difference must be determined. In other words: which group is higher? You can read this from the Ranks table under the Mean Rank column. When reporting the results, it is important to display the median values of each group separately. To do this, follow these steps:
Click on Analyze in the menu at the top of the screen, then on Compare Means and then choose Means.
Move your continuous variable to the Dependent List box.
Move your categorical variable to the Independent List box.
Click on Options. In the Statistics section, click on Median and move it to the Cell Statistics box; click on Mean and Standard Deviation and remove them from the Cell Statistics box.
Click Continue and then OK (or Paste to save the Syntax Editor).
Effect size
SPSS does not provide a statistical representation of the effect size. Using the z value (as shown by the output) the estimated value of r can be calculated:
r = z / √N, where N is the number of cases.
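As an illustration with made-up numbers: a z value of -2.3 in a sample of N = 65 gives r = 2.3 / √65 ≈ .29, which by Cohen's criteria for r (.1 = small, .3 = medium, .5 = large) is close to a medium effect.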
How to perform the Wilcoxon Signed Rank test?
Wilcoxon's Signed Rank Test is designed for repeated measurements, for example when participants are measured at two different times or under two different conditions. It is the non-parametric alternative to the t-test for repeated measures (or t-test for paired samples). The Wilcoxon test converts the scores to ranks and compares them between time 1 and time 2.
Procedure for the Wilcoxon Signed Rank Test
In the menu at the top of the screen, click Analyze, then Nonparametric Tests, then Legacy Dialogs and then choose 2 Related Samples.
Click on the variables that represent the scores at times 1 and 2 and move them to the Test Pairs box.
Make sure the Wilcoxon option is checked in the Test Type section.
Click on Options and select Quartiles (this yields the median for every time).
Click Continue and then OK (or Paste to save the Syntax Editor).
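Clicking Paste stores the corresponding syntax. A minimal sketch, assuming two hypothetical variables fost1 and fost2 holding the scores at time 1 and time 2:

* Wilcoxon Signed Rank Test with medians (quartiles) requested.
NPAR TESTS
  /WILCOXON=fost1 WITH fost2 (PAIRED)
  /STATISTICS QUARTILES.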
Interpretation of the output
The two outcomes you are interested in are the Z value and the corresponding significance level (shown as Asymp. Sig. (2-tailed)). If the significance level is equal to or lower than .05, there is a statistically significant difference between the two scores.
Effect size
For the Wilcoxon ranking test, the effect size can be calculated in a similar way as for the Mann-Whitney U test; namely by dividing the z-value by the square root of N, where N in this case stands for the number of observations taken over the two times and therefore not the number of cases.
How to perform the Kruskal-Wallis test?
The non-parametric alternative to a one-way between-groups analysis of variance is the Kruskal-Wallis test (or Kruskal-Wallis H test). This test makes it possible to compare the scores on a continuous variable for three or more groups. Because the comparison is between different groups, each group must consist of different people.
Procedure for Kruskal-Wallis Test
In the menu at the top of the screen, click Analyze, then Nonparametric Tests, then Independent Samples.
At "What is your objective?" Click on Customize analysis.
Go to Fields.
Move the continuously dependent variable to the Test Fields box.
Move the categorical independent variable to Groups.
Go to the Settings tab and under Choose Tests select Customize tests. Click on Kruskal-Wallis 1-way ANOVA. Make sure that the option All pairwise is selected for Multiple comparisons.
Click Run (or Paste to save the Syntax Editor).
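The Legacy Dialogs route (K Independent Samples) pastes compact syntax. A minimal sketch, assuming a hypothetical continuous variable totself and a grouping variable agegp3 coded 1 to 3:

* Kruskal-Wallis test across three independent groups.
NPAR TESTS
  /K-W=totself BY agegp3(1 3).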
Interpretation of the output
The most important information you need from the output of the Kruskal-Wallis Test are the value of the Chi-square, the degrees of freedom (df) and the significance level (shown as Asymp. Sig.).
If the significance level is lower than .05, this indicates a statistically significant difference in your continuous variable between the three groups. For the three groups you can then look at the Mean Rank, which is shown in the first output table. From this you can see which group on average scores the highest.
Post-hoc tests and effect size
After performing the Kruskal-Wallis Test, you are not yet sure which groups differ statistically significantly from each other. This requires follow-up Mann-Whitney U tests between pairs of groups. To prevent Type 1 errors, a Bonferroni adjustment to the alpha values is needed if you want to compare the groups with each other.
The Bonferroni adjustment means that the alpha level of .05 is divided by the number of tests that you want to apply and that you will use the revised alpha level as a criterion to determine significance.
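For example, with three groups there are three possible pairwise comparisons, so the revised alpha level becomes .05 / 3 ≈ .017; only Mann-Whitney U results with p-values below .017 would then be treated as significant.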
How to perform the Friedman test?
The non-parametric alternative to a one-way repeated measures variance analysis is the Friedman Test. This test can be used when taking the same samples and measuring them at three or more times under three or more different conditions.
Procedure for Friedman Test
In the menu at the top of the screen, click Analyze, then Nonparametric Tests, then Legacy Dialogs and then K Related Samples.
Move the variables that represent the measurements to the Test Variables box.
Make sure the Friedman option is selected in the Test Type section.
Click Statistics and check the Quartiles option.
Click Continue and then OK (or Paste to save the Syntax Editor).
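Clicking Paste stores the corresponding syntax. A minimal sketch, assuming three hypothetical variables fost1, fost2 and fost3 for the three measurement moments:

* Friedman test for three repeated measurements, with quartiles.
NPAR TESTS
  /FRIEDMAN=fost1 fost2 fost3
  /STATISTICS QUARTILES.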
Post-hoc tests and effect size
If it is determined that there is a statistically significant difference somewhere in the three time points, the next step would be to perform a post-hoc test to compare the time points in which you are interested.
Which t-tests can be used in SPSS? - Chapter 17
There are different t-tests available in SPSS, the following two will be discussed here:
T-test for independent samples (independent-samples t-test): this test is used to compare the means of two different groups of people or conditions.
T-test for paired samples (paired-samples t-test): this test is used to compare the means of the same group of people at two different times or when there are equal (matched) pairs.
If there are more than two groups or conditions, these tests cannot be used; in that case a variance analysis should be conducted.
What does the T-test for independent samples look like?
An independent-samples t-test is used when you want to compare the mean score on a continuous variable for two groups of participants. With this test you can determine whether there is a statistically significant difference between the means of the two groups. In statistical terms: it tests the probability that the two sets of scores come from the same population.
The non-parametric alternative to this test is the Mann-Whitney U Test.
Procedure for the t-test for independent samples
Click on Analyze in the menu at the top of the screen, then on Compare Means and then on Independent Samples T Test.
Move the dependent (continuous) variable to the Test variable box.
Move the independent (categorical) variable to the Grouping variable part.
Click on Define Groups and enter the values used in the dataset to code each group. If you cannot remember which values were used for which group, right-click on the name of the variable and click on the Variable Information option. A pop-up box will appear showing the values and labels for this variable. After you have entered the values, close the pop-up box and click Continue.
Click OK (or Paste to save the Syntax Editor).
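Clicking Paste stores the corresponding syntax. A minimal sketch, assuming a hypothetical continuous variable totself and a grouping variable sex coded 1 and 2:

* Independent-samples t-test.
T-TEST GROUPS=sex(1 2)
  /VARIABLES=totself.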
Interpretation of the output
Step 1: Check the information about the groups.
Check the mean, the standard deviation and the group size (N) of both groups, found in the table of the Group Statistics.
Step 2: Check whether the assumptions are met.
Check the result of Levene's test (in the first part of the Independent Samples Test output box). This test determines whether the variance of the scores is the same for the two groups. The result determines which t-value generated by SPSS should be used: if the significance value (p) is greater than .05, read the first line of the table (Equal variances assumed); if the significance value is equal to or lower than .05, the data violate the assumption of equal variances and you must read the second line of the table (Equal variances not assumed).
Step 3: Map the differences between the groups.
Read the significance value under Sig. (2-tailed), which can be found under t-test for Equality of Means. Whether you should read this value for equal variances assumed or not assumed follows from step 2. With a value equal to or lower than .05, there is a significant difference between the mean scores on your dependent variable for the two groups. With a value higher than .05, there is no significant difference between the two groups.
Calculate the effect size
SPSS does not produce effect sizes. Eta squared can be calculated manually based on the information from the output, using the following formula:
η² = t² / (t² + (N1 + N2 - 2))
The effect size can be determined with the Cohen guidelines:
- 0.01 is a small effect
- 0.06 is an average / moderate effect
- 0.14 is a big effect.
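As an illustration with made-up numbers: t = 2.50 with group sizes N1 = N2 = 30 gives η² = 2.50² / (2.50² + 58) = 6.25 / 64.25 ≈ .10, a moderate to large effect by these guidelines.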
What does the T-test for paired samples look like?
A paired-samples t-test (also called repeated measures t-test) is used if you want to collect data from just one group of people at two different times or under two different conditions. This is, for example, an experimental design with a pre-test / post-test. This test is also used when there are matched pairs of participants (each participant is matched with another participant based on a specific criterion); One is exposed to Intervention 1 and the other to Intervention 2. The scores on a continuous measurement are then compared for each pair.
With this test you can determine whether there is a statistically significant difference between the average scores at time 1 and time 2. The basic assumptions for t-tests apply here. An addition to this is the assumption that the difference between the two scores of each participant is normally distributed. With a sample size greater than 30, it is unlikely that a violation of this assumption will cause serious problems.
The non-parametric alternative to this test is the Wilcoxon Signed Rank Test.
Procedure for the t-test for paired samples
Click on Analyze in the menu at the top of the screen, then on Compare Means and then on Paired Samples T Test.
Move the two variables that you want to compare for each participant to the Paired Variables box.
Click OK (or Paste to save the Syntax Editor).
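Clicking Paste stores the corresponding syntax. A minimal sketch, assuming two hypothetical variables fost1 and fost2 measured at time 1 and time 2:

* Paired-samples t-test.
T-TEST PAIRS=fost1 WITH fost2 (PAIRED).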
Interpretation of the output
Step 1: Determine the overall significance.
In the Paired Samples Test table, look at the p (probability) value in the last column (Sig. (2-tailed)). If this value is lower than .05, there is a significant difference between the two scores.
Step 2: Compare the averages.
After it has been determined whether there is a significant difference, it is necessary to find out which set of scores is higher. For this, look at the Mean scores in the Paired Samples Statistics table.
Calculate the effect size
SPSS does not produce effect sizes. Eta squared can be calculated by hand using the following formula:
η² = t² / (t² + (N - 1))
The effect size can be determined with the Cohen guidelines. These are the same guidelines as for the t-test for independent samples:
- 0.01 is a small effect
- 0.06 is an average / moderate effect
- 0.14 is a big effect.
How do you use one-way ANOVA in SPSS? - Chapter 18
T-tests are used to compare the scores of two different groups or conditions. In many research situations, however, we are interested in comparing average scores from more than two groups. In that case, analysis of variance (ANOVA) is used. Analysis of variance is so called because the variance (variability in scores) between the different groups (which is expected to be explained by the independent variable) is compared with the variability within each of the groups (which is expected to be caused by chance). The F ratio - the variance between the groups divided by the variance within the groups - is calculated. A significant F value means that the null hypothesis, namely that the population means are equal, can be rejected. Because it says nothing about which groups differ from each other, post-hoc tests still have to be performed. An alternative to post-hoc testing is to perform specific comparisons (or planned comparisons).
Two types of one-way ANOVAs are discussed below, namely:
- Between-groups ANOVA, which is used when dealing with different participants / cases in each of your groups (also called the independent groups design).
- Repeated measures ANOVA, which is used when you compare the same participants under different conditions / times (also called the within-subjects design).
When do you use post-hoc tests at ANOVA?
The one-way between-groups ANOVA is applied when you have one categorically independent (grouping) variable with at least three levels (groups) and one continuously dependent variable.
The non-parametric alternative of the one-way between-groups ANOVA is the Kruskal-Wallis Test.
Procedure for one-way between-groups ANOVA with post-hoc tests
Click on Analyze in the menu at the top of the screen, then on Compare Means and then on One-way ANOVA.
Move your dependent variable to the Dependent List box.
Move your independent variable to the Factor box.
Select Options and select Descriptive, Homogeneity of variance test, Brown-Forsythe, Welch and Means Plot.
In Missing values, make sure that the option Exclude cases analysis by analysis is checked and click on Continue.
Click on Post Hoc and select Tukey.
Click Continue and then OK (or Paste to save the Syntax Editor).
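Clicking Paste stores the corresponding ONEWAY syntax. A minimal sketch, assuming a hypothetical dependent variable totself and a factor agegp3:

* One-way between-groups ANOVA with Tukey post-hoc tests.
ONEWAY totself BY agegp3
  /STATISTICS DESCRIPTIVES HOMOGENEITY BROWNFORSYTHE WELCH
  /PLOT MEANS
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(0.05).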
Interpretation of the output
Always check the Descriptives table first; here you will find information about each group. Then look at the Test of Homogeneity of Variances table, which contains Levene's test for equality of variances. If the significance value (Sig.) is greater than .05, the assumption of equal variances is not violated. If it is .05 or smaller, the assumption is violated; in that case consult the Robust Tests of Equality of Means table, because the two tests listed there (Welch and Brown-Forsythe) are then the better ones to use.
In the ANOVA table you are mainly interested in the Sig. column; here you will find the p-value. If it is less than .05, there is a significant difference somewhere between the mean scores on your dependent variable for the different groups, but it does not tell you which groups differ from each other. The statistical significance of the differences between each pair of groups can be found in the Multiple Comparisons table, which contains the results of the post-hoc tests. The means of each group are in the Descriptives table. Only look at the Multiple Comparisons table when you have found a significant value in your overall ANOVA (see the ANOVA table). Look at the Mean Difference column and look for asterisks (*) next to the values. An asterisk means that the two groups being compared differ significantly with a p-value < .05. The exact significance value is stated in the Sig. column.
The Means plot is a simple way to compare the average scores of the different groups. These plots can be misleading, so always look carefully at the values on the y-axis.
Calculate the effect size
Although SPSS does not generate an effect size, it can be calculated using the information from the ANOVA table. Use the following formula:
η² = sum of squares between groups / total sum of squares
When to use planned comparisons at ANOVA?
Use planned comparisons when you are interested in comparisons between specific groups. This technique is more sensitive in detecting differences. Post-hoc testing, on the other hand, sets stricter significance levels to reduce the risk of Type 1 errors. You must decide whether you use post-hoc tests or planned comparisons before you begin your analysis.
Specifying coefficient values
First you classify your groups based on the different values of the independent variable (e.g., age categories). You must then decide which groups to compare and which to ignore. The sum of the coefficient values must always be 0. Coefficients with different values are compared with each other; if you want to ignore one of the groups, give it the value 0. For example, to compare group 1 with group 3 while ignoring group 2, use the coefficients -1, 0 and 1.
Procedure for one-way between-groups ANOVA with planned comparisons
Click on Analyze in the menu at the top of the screen, then on Compare Means and then on One-way ANOVA.
Move your dependent (continuous) variable to the Dependent List box.
Move your independent variable to the Factor box.
Select Options and select Descriptive, Homogeneity of variance test, Brown-Forsythe, Welch and Means Plot.
In Missing values, make sure that the option Exclude cases analysis by analysis is checked and click on Continue.
Click on Contrasts. Enter the coefficient for the first group in the Coefficients box and click Add. Enter the coefficient for the second group and click Add. Do this for all your groups. The Coefficient Total at the bottom of the table must be 0 if you have entered all the coefficients correctly.
Click Continue and then OK (or Paste to save the Syntax Editor).
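Clicking Paste stores ONEWAY syntax with a CONTRAST subcommand. A minimal sketch, assuming a hypothetical factor agegp3 with three groups, where group 1 is compared with group 3 and group 2 is ignored:

* One-way ANOVA with a planned comparison (coefficients sum to 0).
ONEWAY totself BY agegp3
  /CONTRAST=-1 0 1
  /STATISTICS DESCRIPTIVES HOMOGENEITY.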
Interpretation of the output
The Descriptives and Test of homogeneity of variances tables look the same as those of the one-way ANOVA with post-hoc tests. Only the output that is relevant for the planned comparisons is discussed here.
Step 1: The Contrast Coefficients table shows the coefficients that you have entered for each group. Check if this is correct.
Step 2: The most important results are in the Contrast Tests table. If Levene's test is not significant, the variances are equal and you should look at the first row in this table. If the significance level of the contrast that you specified is equal to or less than .05, there is a statistically significant difference between the relevant group(s) and the other group(s). As you will see, the planned comparison analysis yields a t value instead of an F value; to get the F value, simply square the t value. You also need the degrees of freedom to report the results. The first value (for all planned comparisons) is 1; the second is in the table next to the t value (under df).
What does the ANOVA One-way repeated measures look like?
In a one-way repeated measures ANOVA design, each participant is exposed to two or more conditions, or measured on the same continuous scale at three or more times. The technique can also be used to compare participants' responses to two or more different questions or items. It is important that the questions are measured on the same scale (e.g., 1 = completely disagree, to 5 = completely agree).
The non-parametric alternative to this test is the Friedman Test.
Procedure for one-way repeated measures ANOVA
Click on Analyze in the menu at the top of the screen, then on General Linear Model and then on Repeated Measures.
Enter a name in the Within Subject Factor Name box that represents your independent variable. This is not the actual variable name, but only a label that you link to your independent variable.
Enter the number of levels or groups in the Number of Levels box.
Click on Add.
Click on the Define button.
Select the variables that represent your repeated measures variable and move them to the Within-Subjects Variables box.
Click on Options.
Check the Descriptive Statistics and Estimates of effect size options in the Display section. If you want to request post-hoc tests, select the name of your independent variable in the Factor(s) and Factor Interactions section and move it to the Display Means for box. Check Compare main effects. In the Confidence interval adjustment section, click on the downward arrow and choose the Bonferroni option.
Click Continue and then OK (or Paste to save the Syntax Editor).
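Clicking Paste stores the corresponding GLM syntax. A minimal sketch, assuming three hypothetical variables fost1 to fost3 for the repeated measurements and the label time for the within-subject factor:

* One-way repeated measures ANOVA with Bonferroni-adjusted comparisons.
GLM fost1 fost2 fost3
  /WSFACTOR=time 3 Polynomial
  /PRINT=DESCRIPTIVE ETASQ
  /EMMEANS=TABLES(time) COMPARE ADJ(BONFERRONI)
  /WSDESIGN=time.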
Interpretation of the output
In the first output box you see the descriptive statistics of your collection of scores (Mean, Standard deviation, N). Check if this is correct.
In the following table (Multivariate Tests) you are interested in the value of Wilks' Lambda and the associated significance value (in the Sig. column). If the p-value is equal to or less than .05, you can conclude that there is a significant difference. To find the effect size, look at the Partial Eta Squared column in the Multivariate Tests table. See the table in Part 5 for the effect size guidelines.
When you have found a significant difference, it means that there is a difference somewhere between your groups. However, it does not tell you which groups or scores differ from each other; you can read this from the Pairwise Comparisons table, in which each pair of groups is compared and it is indicated whether the difference between the groups is significant (see the Sig. column).
How do you use two-way ANOVA in SPSS? - Chapter 19
Two-way means that there are two independent variables. Between-groups indicates that there are different participants in each of the groups. The two-way between-groups ANOVA can be used to look at the individual and collective influence of two independent variables on one dependent variable. You can therefore not only test the main effect for each independent variable, but also see whether there is possibly an interaction effect. The latter effect occurs when the influence of an independent variable on the dependent variable is dependent on a second independent variable.
How to perform a two-way ANOVA?
A two-way ANOVA can be conducted using the following steps:
Click on Analyze in the menu at the top of the screen, then on General Linear Model and then on Univariate.
Move your dependent (continuous) variable to the Dependent variable box.
Move your independent (categorical) variables to the Fixed Factors box.
Click on Options and select Descriptive Statistics, Estimates of effect size and Homogeneity tests. Then click Continue.
Click Post Hoc. Select the independent variables you are interested in from the Factors list on the left and move them to the Post Hoc Tests for section. Choose which test you want to use (for example Tukey) and click Continue.
Click on the Plots button. Move the independent variable with the most groups to the Horizontal box. Move the other independent variables to the Separate Lines box. Click on Add. You should now see two variables in the Plots section.
Click Continue and then OK (or Paste to save the Syntax Editor).
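Clicking Paste stores the corresponding UNIANOVA syntax. A minimal sketch, assuming hypothetical independent variables sex and agegp3 and a dependent variable totself:

* Two-way between-groups ANOVA with Tukey post-hoc tests and a profile plot.
UNIANOVA totself BY sex agegp3
  /POSTHOC=agegp3(TUKEY)
  /PLOT=PROFILE(agegp3*sex)
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /DESIGN=sex agegp3 sex*agegp3.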
How to interpret the results of two-way ANOVA?
Check in the Descriptive Statistics table if your descriptive statistics (average scores, standard deviations and number of participants (N)) are correct.
Check in the Levene's Test of Equality of Error Variances table whether the assumptions are met. You want the significance value to be greater than .05. A significant result suggests that the variance of your dependent variable is not the same across the different groups. If that is the case, it is recommended to set a stricter significance level (e.g., .01) for evaluating the results of your two-way ANOVA.
The most important output of the two-way ANOVA can be found in the table called Tests of Between-Subjects Effects. The first thing to check is whether there is an interaction effect, because if there is, the main effects become more difficult to interpret. Look in the output for the variable1*variable2 significance value (in the Sig. column); if it is less than or equal to .05 (or another chosen alpha level), there is a significant interaction effect. If you do not find an interaction effect, you can interpret the main effects without worry. In the left column you will find the variables you are interested in. Look at the Sig. column next to each variable to determine whether there is a main effect for each independent variable. When the significance value is less than or equal to .05 (or another chosen alpha level), there is a significant main effect for that independent variable.
The effect size can be found in the Partial Eta Squared column. Use the Cohen guidelines to interpret this.
If you have found significant differences, you must then look for where these differences are located; you do this with post-hoc tests. These tests are only relevant if your independent variable consists of more than two levels/groups. The results of the post-hoc tests are in the Multiple Comparisons table. The Tukey Honestly Significant Difference (HSD) test is one of the most commonly used tests. Search the Sig. column for values less than .05. Significant results are also indicated with a small asterisk in the Mean Difference column.
At the end of your output you will find a plot of the scores of the different groups. Based on this, you get a clear visual picture of the relationship between your variables. Always look carefully at the scale when you interpret the plots.
Which additional analyzes are required in the event of a significant interaction effect?
If you find a significant interaction effect, it is advisable to perform follow-up tests to examine this relationship more precisely (only if one of your variables consists of at least three levels). This can, for example, be done on the basis of a simple effect analysis. This means that you will view the results of each of the subgroups separately. To do this, you must split the sample into groups according to one of your independent variables and perform separate one-way ANOVAs to investigate the effect of the other variable. Follow the following procedure for this:
In the menu at the top of the screen, click Data and then Split File.
Click on Organize output by groups.
Move the grouping variable into the Groups Based on box. This splits the sample based on this variable and repeats all analyses that follow for the groups separately.
Click OK.
After you split your file, perform a one-way ANOVA. Note: When you have finished the analysis, do not forget to disable the Split File option. To do this, go to Data again and select Split File. Check the first option (Analyze all cases, do not create groups) and click OK.
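In syntax, this simple-effects detour can be expressed compactly. A minimal sketch, assuming the hypothetical variables from above (splitting by sex and testing agegp3):

* Split the file, run a one-way ANOVA per group, then switch the split off.
SORT CASES BY sex.
SPLIT FILE SEPARATE BY sex.
ONEWAY totself BY agegp3
  /STATISTICS DESCRIPTIVES.
SPLIT FILE OFF.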
SPSS Survival Manual by Pallant - 6th edition - BulletPoints
How to design a study that uses SPSS? - BulletPoints 1
Three important situations in which you can use SPSS: (1) checking the reliability of a sample; (2) checking the reliability of results; (3) visualize data.
The reliability of a scale indicates to what extent the scale is free from random error. There are two types of reliability: test-retest reliability and internal consistency.
The validity of a scale refers to the degree to which it measures what it is supposed to measure. Three main types of validity are distinguished: content validity, criterion validity, and construct validity.
How to make a codebook for SPSS? - BulletPoints 2
Before you can enter all information from questionnaires and experiments in IBM SPSS it is necessary to make a codebook. This is a summary of the instructions that you will use to convert the information of each test subject into a format that IBM SPSS can understand. Preparing a codebook consists of (1) defining and labeling each variable, and (2) assigning numbers to all possible answers.
A codebook basically consists of four columns:
(1) The abbreviated name of the variable (for example 'ID' for 'identification number');
(2) The written name of the variable (for example 'identification number');
(3) An explanation of how the possible answers are coded (for example 1 = men, 2 = women);
(4) The measurement scale (for example nominal).
How to start with IBM SPSS? - BulletPoints 3
It is important to always save your data when you are working with it. Saving does not happen automatically in IBM SPSS. To save a file, go to the File menu, then choose Save. You can also click on the icon that looks like a floppy disk. You can see this at the top left of your screen. Always ensure that your file is saved on your computer and not on an external drive. When you save the file for the first time, you must create a name for the file and choose a folder where you want to save the file. IBM SPSS automatically ensures that your file is saved with .sav at the end.
It is important to know that an output file can only be opened in IBM SPSS. If you send your file to someone else who does not have the IBM SPSS program, he or she cannot open your file. To remedy this, you can export your output. Select File and then Export. You can now choose the type, for example PDF or Word. Then choose the Browse button to create a folder in which you want to save the file and choose a suitable name in the Save File line. Then click Save and OK.
How to create a file and enter your data in SPSS? - BulletPoints 4
There are four steps in determining variables: (1) Create variables; (2) Assign labels to the answer categories and the missing values; (3) Entering data; (4) Clean up data.
There are two ways to create a new variable. In the first way, a new variable is created by entering new data. In the second way, a variable is created that is based on existing data in the data set. For example, two variables are then combined to create a new, third variable.
For some analyses you only need a part of your sample, for example only the men. You must then select this group in SPSS, which you do using the Select Cases option. When you have selected the group of men, SPSS filters out the women; all analyses that you subsequently run will be carried out for the men only.
How to screen and clean up data in SPSS? - BulletPoints 5
- Before you can analyze your data it is important to check your data file for possible errors, such as typing mistakes made during data entry. You do this in two steps. Step 1: checking for errors. Check the scores of all variables and investigate whether there are scores that fall outside the normal range. Step 2: finding and correcting the error in the data file. Locate where the error is in the data file, then correct or remove it.
How to use SPSS for descriptive statistics? - BulletPoints 6
When you are sure that there are no errors in your data file, you can start with the descriptive phase of your data analysis; this is called descriptive statistics. Descriptive statistics serve (1) to describe the characteristics of your sample in the method section of your article; (2) to check your variables to see whether they meet the assumptions associated with the statistical techniques you want to use to answer your research questions; and (3) to answer specific research questions.
The Skewness value provides information about the symmetry of the distribution of the scores. Kurtosis provides information about the peakedness of the distribution. If the distribution of the scores were perfectly normal, both the skewness and the kurtosis would be zero. A positive skewness value indicates that the scores are clustered on the left, at the low values; negative values suggest that the scores are clustered on the right side of the mean. Kurtosis values below zero indicate a distribution that is relatively flat (too many cases in the extreme scores).
When conducting research, in particular on people, you rarely get all the information from every case. That is why it is important that the research also looks at the missing data. This is possible in SPSS using the Missing Value Analysis procedure (bottom option in the Analyze menu). You must also decide how to deal with missing data when performing statistical analyzes. The Options button in many of the statistical procedures in SPSS offers various options regarding dealing with missing data. It is important that you choose carefully, since it can have major consequences for your results.
Which graphs can you use to display data? - BulletPoints 7
- In SPSS there are different types of graphs and charts that you can use to display data. The types discussed in this chapter are histograms, bar charts, line charts, scatterplots, and boxplots.
How to manipulate data in SPSS? - BulletPoints 8
If the raw data have been accurately entered into SPSS, the next step is to edit and prepare the data so that subsequent analyses can be performed and hypotheses can be tested.
Make sure that you also update the codebook for every change you make. An alternative is to use the Syntax option: keep track of all performed actions in the Syntax Editor, so that there is a record of exactly what has been changed (see the sketch below).
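For example, a reverse-scored item kept in the Syntax Editor documents itself (the item name and scale are hypothetical):

  * Reverse-score item3 (1-5 scale) into a new variable.
  RECODE item3 (1=5) (2=4) (3=3) (4=2) (5=1) INTO item3_rev.
  EXECUTE.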
How to check the reliability of a scale? - BulletPoints 9
- The value of a study largely depends on the reliability of the scale used. One aspect of reliability is internal consistency: the degree to which the items of a scale hang together. This can, for example, be calculated with Cronbach's coefficient alpha in SPSS. A Cronbach's alpha of .7 or greater indicates a reliable scale. Note, however, that short scales with few items often yield low Cronbach values, which are then not very informative.
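A minimal sketch of the corresponding RELIABILITY syntax, with hypothetical item names:

  RELIABILITY
    /VARIABLES=item1 item2 item3 item4
    /SCALE('Total scale') ALL
    /MODEL=ALPHA
    /SUMMARY=TOTAL.

The /SUMMARY=TOTAL line adds the item-total statistics, which show how alpha would change if an item were deleted.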
Which method to use in SPSS? - BulletPoints 10
- Some studies use a single method, but many use multiple methods. In any case, it is crucial to choose the right research method. In this chapter, several methods are discussed: chi-square, correlation, partial correlation, multiple regression analysis, independent t-test, paired t-test and various forms of analysis of (co)variance: one-way and two-way (M)AN(C)OVA.
When and how is a correlation analysis applied? - BulletPoints 11
- Correlation analysis is applied to indicate the strength and direction of a linear relationship between two variables. Two correlation coefficients are mentioned in this chapter: (1) Pearson r for continuous variables (at interval level) and for cases with one continuous and one dichotomous variable, and (2) Spearman rho for variables at ordinal level and for cases where your data do not meet the criteria for the Pearson correlation. This text shows how to calculate a bivariate Pearson r and a non-parametric Spearman rho.
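Minimal sketches of both, with hypothetical variable names:

  CORRELATIONS
    /VARIABLES=optimism stress
    /PRINT=TWOTAIL SIG.
  NONPAR CORR
    /VARIABLES=optimism stress
    /PRINT=SPEARMAN TWOTAIL.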
To interpret these values you can use Cohen's guidelines:
Small: r = .10 to .29 (or -.10 to -.29)
Medium: r = .30 to .49 (or -.30 to -.49)
Large: r = .50 to 1.0 (or -.50 to -1.0)
What is the difference between correlation and partial correlation? - BulletPoints 12
- The partial correlation is similar to Pearson's r, with the difference that partial correlation lets you control for an additional (confounding) variable.
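A minimal sketch with hypothetical variable names, where age is the variable being controlled for:

  PARTIAL CORR
    /VARIABLES=optimism stress BY age
    /SIGNIFICANCE=TWOTAIL.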
How to perform multiple regression in SPSS? - BulletPoints 13
Multiple regression is not just one technique, but a collection of techniques that can be used to study the relationship between a continuous dependent variable and multiple independent variables or predictors (usually continuous). It is based on correlation, but offers a more refined analysis of the relationship between a series of variables. Multiple regression can be applied to various research questions.
In standard multiple regression, all independent (or predictor) variables are entered into the equation simultaneously.
In hierarchical multiple regression (also called sequential regression), the independent variables are added to the equation in an order determined by the researcher on the basis of a theoretical framework. Variables or sets of variables are added in steps (blocks). Each variable is assessed in terms of what it adds to the prediction of the dependent variable after controlling for the variables already entered.
In stepwise regression, the researcher provides a list of independent variables and then lets the program select, based on a set of statistical criteria, which variables are entered and in which order they are added to the equation. There are three versions of this approach: (1) forward selection, (2) backward deletion, and (3) stepwise regression.
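A sketch of a hierarchical analysis with hypothetical variables: the control variables form block 1 and the predictors of interest block 2. For a standard regression you would use a single /METHOD=ENTER line; for stepwise, /METHOD=STEPWISE.

  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA CHANGE
    /DEPENDENT wellbeing
    /METHOD=ENTER age sex
    /METHOD=ENTER optimism stress.

The CHANGE keyword requests the R squared change per block, which shows what the second block adds to the prediction.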
How to perform a logistic regression analysis in SPSS? - BulletPoints 14
Using logistic regression you can test models that predict categorical outcomes consisting of two or more categories. It lets you measure how well your set of predictor variables predicts or explains your categorical dependent variable, and it offers an indication of the adequacy of your model by assessing its 'goodness of fit'. Your independent variables can be either categorical or continuous, or a combination of both.
For logistic regression, assumptions are made regarding the sample size, multicollinearity, and outliers.
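A minimal sketch with a hypothetical dichotomous outcome and predictors; the /CONTRAST line marks sex as categorical and /PRINT=GOODFIT requests the Hosmer-Lemeshow goodness-of-fit test:

  LOGISTIC REGRESSION VARIABLES diagnosis
    /METHOD=ENTER age sex stress
    /CONTRAST (sex)=Indicator
    /PRINT=GOODFIT CI(95).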
How to perform factor analysis in SPSS? - BulletPoints 15
Factor analysis differs from many of the other techniques in SPSS. It is not designed to test hypotheses or to indicate whether one group differs significantly from the other. Instead it takes a large set of variables and looks for a way to 'reduce' or summarize the data by using a smaller set of factors or components. It does this by searching for clusters or groups among the intercorrelations of a set of variables. There are two main approaches to factor analysis: (1) exploratory factor analysis, often used during the early stages of research to gather information about the relationships between a set of variables, and (2) confirmatory factor analysis, applied later in the research process to test specific hypotheses or theories regarding the underlying structure of a set of variables.
There are two important issues that you should take into account when determining the suitability of your dataset for factor analysis: sample size and the strength of the relationships between your variables (or items). There are no really clear guidelines for sample size; in general, the bigger the better. If you have a small sample (< 150) or many variables, look for more information about factor analysis first.
The second issue concerns the strength of the intercorrelations between the items. Tabachnick and Fidell recommend correlation coefficients greater than .3. SPSS offers two statistical measures that help determine the factorability of the data: (1) Bartlett's test of sphericity, and (2) the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. Bartlett's test must be significant (p < .05) for a factor analysis to be appropriate, and the KMO index must have a minimum value of .6 for a good factor analysis.
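A minimal sketch with hypothetical item names (the TO keyword assumes the items are consecutive in the data file); /PRINT=KMO produces both the KMO measure and Bartlett's test:

  FACTOR
    /VARIABLES item1 TO item10
    /PRINT=INITIAL KMO EXTRACTION ROTATION
    /CRITERIA=MINEIGEN(1)
    /EXTRACTION=PC
    /ROTATION=OBLIMIN.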
How to use SPSS for non-parametric statistics? - BulletPoints 16
- Non-parametric statistics are ideal when your data is measured on a nominal or ordinal scale. They are also useful when you have very small samples and when your data does not meet the assumptions of the parametric techniques.
General assumptions of non-parametric techniques that require checking are: (1) Random samples; (2) Independent observations (with the exception of techniques where repeated measurements are performed). In addition, some techniques have additional assumptions.
The Chi-square test for independence is used when you want to study the relationship between two categorical variables. Each of these variables can have two or more categories. The chi-square test for independence compares the observed frequencies or proportions of cases that occur in each of the categories with the values that are expected if there is no association between the measured variables. When SPSS encounters a 2x2 table (2 categories in each variable), the output includes an additional correction value (Yates' Correction for Continuity); this value is designed to compensate for what some researchers regard as an overestimate of the chi-square value when it is used in a 2x2 table.
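A minimal sketch with hypothetical variables; /CELLS=COUNT EXPECTED also prints the expected frequencies so you can check that assumption:

  CROSSTABS
    /TABLES=sex BY smoker
    /STATISTICS=CHISQ
    /CELLS=COUNT EXPECTED.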
The Kappa Measure of Agreement is often applied when inter-rater reliability must be established. Kappa is an estimate of the degree of agreement between two raters or tests, taking into account the degree of agreement that could have occurred by chance.
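Kappa also runs through the crosstabs procedure; a sketch with two hypothetical rater variables:

  CROSSTABS
    /TABLES=rater1 BY rater2
    /STATISTICS=KAPPA.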
Sensitivity indicates the proportion of cases with a disease or disorder that have been correctly diagnosed. Specificity indicates the proportion of cases without the disease or disorder that have been correctly classified.
Which t-tests can be used in SPSS? - BulletPoints 17
There are different t-tests available in SPSS; the following two are discussed here:
Independent-samples t-test: used to compare the mean scores of two different groups of people or conditions.
Paired-samples t-test: used to compare the mean scores of the same group of people at two different times, or of matched pairs.
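Minimal sketches of both, with hypothetical variables (sex coded 1 and 2):

  T-TEST GROUPS=sex(1 2)
    /VARIABLES=score.
  T-TEST PAIRS=score_time1 WITH score_time2 (PAIRED).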
The effect size (eta squared) can be interpreted with Cohen's guidelines:
0.01 is a small effect
0.06 is a moderate effect
0.14 is a large effect.
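SPSS does not print eta squared for t-tests; following Pallant's manual it can be computed by hand as eta squared = t² / (t² + N1 + N2 - 2) for an independent-samples t-test (with N1 and N2 the group sizes), and as eta squared = t² / (t² + N - 1) for a paired-samples t-test.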
How to use one-way ANOVA in SPSS? - BulletPoints 18
In this chapter two types of one-way ANOVA are discussed: the between-groups ANOVA, which is used when you are dealing with different participants/cases in each of your groups (also called the independent-groups design), and the repeated measures ANOVA, which is used when you compare the same participants under different conditions or at different times (also called the within-subjects design).
The one-way between-groups ANOVA is applied when you have one categorical independent (grouping) variable with at least three levels (groups) and one continuous dependent variable.
In a one-way repeated measures ANOVA design, each participant is exposed to two or more conditions, or measured on the same continuous scale on three or more occasions. The technique can also be used to compare participants' responses to two or more different questions or items. It is important that the questions are measured on the same scale (e.g. 1 = completely disagree to 5 = completely agree).
Use planned comparisons when you are interested in comparisons between specific groups. This technique is more sensitive in detecting differences. Post-hoc testing, on the other hand, sets stricter significance levels to reduce the risk of Type 1 errors. You must decide whether you use post-hoc tests or planned comparisons before you begin your analysis.
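A minimal sketch of a one-way between-groups ANOVA with Tukey post-hoc tests, using hypothetical variables:

  ONEWAY score BY agegroup
    /STATISTICS=DESCRIPTIVES HOMOGENEITY
    /POSTHOC=TUKEY ALPHA(0.05).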
How to use two-way ANOVA in SPSS? - BulletPoints 19
- Two-way means that there are two independent variables; between-groups indicates that there are different participants in each of the groups. The two-way between-groups ANOVA can be used to look at the individual and joint influence of two independent variables on one dependent variable. You can therefore not only test the main effect of each independent variable, but also see whether there is an interaction effect. The latter occurs when the influence of one independent variable on the dependent variable depends on the level of a second independent variable.
- The most important output of the two-way ANOVA can be found in the table called Tests of Between-Subjects Effects. First check whether there is an interaction effect, because if there is, the main effects become more difficult to interpret.
- If you find a significant interaction effect, it is advisable to perform follow-up tests to examine the relationship more precisely (only if one of your variables consists of at least three levels). This can, for example, be done with a simple effects analysis, in which you look at the results for each of the subgroups separately. To do this, split the sample into groups according to one of your independent variables and perform separate one-way ANOVAs to investigate the effect of the other variable.
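A minimal sketch with hypothetical variables; /PRINT=ETASQ adds partial eta squared as an effect-size estimate, and the /DESIGN line includes the interaction term:

  UNIANOVA score BY sex agegroup
    /PRINT=DESCRIPTIVE ETASQ
    /EMMEANS=TABLES(sex*agegroup)
    /DESIGN=sex agegroup sex*agegroup.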