Appendix for Practice Exam 2015/2016: Statistics II for IB – UG
Which of the following statements on Type I and Type II errors is correct?
Which kind of relation do we have between Type I and II errors?
Suppose you must choose among tests in order to achieve a given power level. In other words, you have a target power for your test. Which statement is correct?
A scatter plot of number of teachers (T) and number of people with University degrees in Dutch cities (P) shows a positive relation. Which is the most likely explanation for this positive association?
Which of the following statements regarding scatterplots are correct?
Consider missing data.
A researcher observed that in her survey study about travel expenditures individuals who did not provide their household income tended to be almost exclusively those in the higher income bracket. Which sentence is correct?
What is one of the distinctions between a population parameter and a sample statistic?
Which of the following properties would indicate that a dataset is not symmetric?
Which one of these statistics can be unaffected by outliers?
What is the effect of an outlier on the value of a correlation coefficient between a dependent and an independent variable?
A regression model with variable Y regressed on variable X is used to
Consider the following population model: Yj = β0 + β1X1,j + β2X2,j + ... + βkXk,j + εj
for any j = 1, ..., N. If all the values of the dependent variable are multiplied by the same constant, what happens to the norm of the residuals and to the R2 of the regression?
You collect data on the score Sf on the final exam of a course and on the scores S1, S2, S3 on the first, second and third assignment of the course, respectively. All the scores are expressed on an integer scale with points from 1 to 10. Some data show that there is a relation between these variables. The estimated linear regression model is: Ŝf = 6.8 + 0.0 * S1 + 0.25 * S2 + 0.0 * S3
One interpretation of the coefficients is
Pick the choice that best completes the following sentence. If a relationship between two variables is called statistically significant, it means the investigators think the variables are
Consider a dependent variable whose variance does not change for different values of an independent variable. With respect to this independent variable, the dependent variable is characterized by
Consider the following component matrix of a Principal Component Analysis.
Component: | 1 | 2 | 3 | 4 |
X6 Product Quality | .248 | -.501 | -.081 | .670 |
X7 E Commerce Activities | .307 | .713 | .306 | .284 |
X8 Technical Support | .292 | -.369 | .794 | -.202 |
X9 Complaint Resolution | .871 | .031 | -.274 | -.215 |
X10 Advertising | .340 | .581 | .115 | .331 |
X11 Product Line | .716 | -.455 | -.151 | .212 |
X12 Salesforce Image | .377 | .754 | .341 | .232 |
X13 Competitive Pricing | -.281 | .660 | -.069 | -.348 |
X14 Warranty & Claims | .394 | -.305 | .778 | -.193 |
X16 Order & Billing | .809 | .042 | -.220 | -.247 |
X18 Delivery Speed | .879 | .117 | -.302 | -.206 |
Sum of Squares (value) | 3.427 | 2.551 | 1.691 | 1.087 |
Percentage of trace | 31.15 | 23.19 | 15.37 | 9.88 |
What is the total percentage of variance explained by the four factors?
Which of the following is/are critical assumption(s) for factor analysis?
A researcher applies Multivariate Regression Analysis to important characteristics that can influence the number of customers of a company. For the study, the researcher has at her disposal data from 92 customers on 4 metric variables:
Each variable is measured on an integer scale with points from 1 to 10, with 1 being "Poor" and 10 being "Excellent". The researcher considers variable X19 as representative of customer satisfaction with the company's overall activity, while she considers variables X8, X11, and X15 as representative of customer satisfaction with just a specific part of the company's activities, as explained by the variable names. Therefore, the researcher tries to explain the variation in X19 by means of the variation in X8, X11, and X15. The appendix "PART 2 - Problem on Multivariate Regression Analysis" on pages 13-17 contains the SPSS output necessary to answer the questions.
Explain if Multivariate Regression Analysis is allowed for the given dataset.
Are there any problems with missing data and outliers?
Discuss the assumption of normality in this data set. Use a significance level of α = 0.05
Explain how to test for the presence of heteroscedasticity for the four variables in the dataset. Interpret the test statistics given in the tables. What do you conclude?
Provide the linear regression model, and explain what coefficients and variables represent.
Provide the regression equation for the linear regression model using the enter method.
Determine the percentage of variation in the dependent variable that is explained by the regression model. Is this percentage significant? Specify and explain the used test.
Explain which independent variables have a significant contribution in the prediction of the dependent variable in the regression model. Use a significance level of α = 0.05
Indicate and explain which independent variable has the highest influence on the dependent variable of the regression equation.
Does multicollinearity cause a problem in the regression? Explain your answer.
Provide the regression equations for the linear regression models using the sequential forward method.
Explain which independent variables have a unique, significant contribution to the prediction of the dependent variable. Indicate exactly which table you use in your explanation.
A researcher is studying the market segmentation of a company's customers and applies factor analysis to important characteristics that can influence this market segmentation. The researcher has at her disposal data from 92 customers on 12 metric variables measured on a 0-10 scale, with 10 being "Excellent" and 0 being "Poor". The variables are
Appendix B contains the SPSS output necessary to answer the questions.
D
C
D
B
A
D
B
B
C
B
B
D
A
C
A
B
B
X19 is the dependent variable, and X8, X11, and X15 are the independent variables. From the table "Descriptive Statistics", there are 92 cases with values for the four considered variables. So the ratio "sample size to independent variables" is 92:3 = 30.7:1.
This is in agreement with the adopted rule of thumb of having at least 10 times as many cases as independent variables. It is also in accordance with the minimum ratio 5:1 considered in the textbook.
All variables are metric. Therefore Multivariate Regression Analysis (MRA) is allowed.
From the table "Case Processing Summary" there is no missing data, so there is no problem.
There is one possible outlier in X15 - New Products. This observation can be seen in the X15 boxplot and histogram. There are no apparent outliers in the other plots.
There are no problems with outliers.
The assumption of normality can be checked in different ways:
As asked in the question, we consider a significance level of 0.05.
We say that a variable behaves as a normal variable at the 5% significance level if the value in the column Sig. of the table "Tests of Normality" is greater than 0.05. On this basis, all the variables behave as normal at the 5% significance level.
First, we can consider a Levene test. In this test the following hypothesis is tested:
The Levene test compares the variance of a metric variable across the levels of another variable (the factor). It can be based on different statistics, such as the mean or the median; the four tables "Test of Homogeneity of Variances" given in this exam do not state which statistic was chosen. Considering the Levene test with the output given in the tables and a significance level of 0.05, we observe the following significance values:
With X19 as factor: Technical Support (0.094), Product Line (0.000), New Products (0.034)
With X15 as factor: Technical Support (0.000), Product Line (0.027), Satisfaction (0.001)
With X11 as factor: Technical Support (0.000), New Products (0.001), Satisfaction (0.000)
With X8 as factor: Product Line (0.005), New Products (0.074), Satisfaction (0.000)
In two cases (Technical Support with X19 as factor, 0.094, and New Products with X8 as factor, 0.074) the significance value is higher than α = 0.05 and we fail to reject the null hypothesis. We reject H0 in all the other cases, and for those we say that the variables are statistically heteroscedastic at the 5% significance level.
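The decision rule applied to the Levene tables can be sketched in a few lines of Python. This is only an illustration: the significance values are copied from the answer above, while the dictionary layout and the names `levene_sig` and `classify` are assumptions made for this example.

```python
# Significance values (Sig.) as quoted from the four
# "Test of Homogeneity of Variances" tables in the answer above.
# Keys are (factor, tested variable); this layout is an illustrative assumption.
ALPHA = 0.05

levene_sig = {
    ("X19", "X8 Technical Support"): 0.094,
    ("X19", "X11 Product Line"): 0.000,
    ("X19", "X15 New Products"): 0.034,
    ("X15", "X8 Technical Support"): 0.000,
    ("X15", "X11 Product Line"): 0.027,
    ("X15", "X19 Satisfaction"): 0.001,
    ("X11", "X8 Technical Support"): 0.000,
    ("X11", "X15 New Products"): 0.001,
    ("X11", "X19 Satisfaction"): 0.000,
    ("X8", "X11 Product Line"): 0.005,
    ("X8", "X15 New Products"): 0.074,
    ("X8", "X19 Satisfaction"): 0.000,
}


def classify(sig, alpha=ALPHA):
    """Return the test decision for one Levene significance value."""
    return "fail to reject H0" if sig > alpha else "reject H0"


decisions = {pair: classify(sig) for pair, sig in levene_sig.items()}

# Pairs for which equal variances cannot be rejected at the 5% level.
not_rejected = {pair for pair, d in decisions.items() if d == "fail to reject H0"}

for (factor, variable), decision in decisions.items():
    print(f"factor {factor}, variable {variable}: {decision}")
```

Only the pairs collected in `not_rejected` are treated as homoscedastic; every other pair leads to rejecting H0.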
Second, we can perform a graphical analysis, considering the scatterplots of pairs of variables. We look for patterns whose overall shape differs from a rectangle (e.g., an overall triangular shape). For example, X11 seems to be heteroscedastic, showing this kind of pattern.
The considered model is "Model 1". The (theoretical) linear regression model in vector form (that is, with no label for the observation) is
X19 = a + b1*X8 + b2*X11 + b3*X15 + e
In this notation X19 is the regressand (dependent, explained variable) vector. Similarly, X8, X11 and X15 are the regressors (independent, explanatory variables) vectors.
The parameter a is the constant coefficient.
The parameters b1, b2, b3 are the coefficients of the various regressors, and they represent the impact of an increase/decrease of a regressor on the explained variable.
e is the vector of errors.
The regression equation is the estimated version of the (theoretical) regression model. From the table "Coefficients" we get
^X19 = 3.455 + 0.017*X8 + 0.506*X11 + 0.073*X15
where the symbol "^X19" indicates the vector of the estimated (or predicted) values of variable X19 for the values X8, X11 and X15 of the regressors.
You can also answer this question by providing the regression equation for each single observation (that is, with the index for the observation).
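As a minimal sketch, the per-observation form of the estimated equation, ^X19_j = 3.455 + 0.017*X8_j + 0.506*X11_j + 0.073*X15_j, can be written as a small Python function. The coefficients are those quoted from the "Coefficients" table; the function name `predict_x19` and the input scores in the usage line are hypothetical and do not come from the exam data.

```python
# Estimated coefficients of Model 1 (enter method), as quoted above.
COEF = {"const": 3.455, "X8": 0.017, "X11": 0.506, "X15": 0.073}


def predict_x19(x8, x11, x15):
    """Predicted satisfaction (X19) for one observation."""
    return (
        COEF["const"]
        + COEF["X8"] * x8
        + COEF["X11"] * x11
        + COEF["X15"] * x15
    )


# Hypothetical customer scores, chosen only to illustrate the call.
example_prediction = predict_x19(x8=5, x11=6, x15=5)
print(round(example_prediction, 3))
```

Raising X11 by one point while holding X8 and X15 fixed raises the prediction by 0.506, which is how the unstandardized coefficient is read.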
From the "Model Summary" table we get that R2 = 0.325. This means that 32.5% of the variation in the dependent variable X19 is explained by the independent variables, that is, the model explains about a third of the total variation of X19.
The test to determine whether this percentage is significant is the F-test reported in the ANOVA table. The hypothesis is
H0) R2 = 0 vs. H1) R2 > 0
The test is equivalently formulated as
H0) b1 = b2 = b3 = 0 vs. H1) bi ≠ 0 for at least one i
(we test if all the regression coefficients are equal to zero versus the hypothesis that there is at least one regression coefficient that is different from zero).
F-statistic= 14.108, p-value= .000 (close to 0). Interpretation of the values:
The p-value is < α = 0.05, so we reject H0. We conclude that the R2 is significantly different from zero.
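The link between R2 and the overall F-statistic can be checked with the standard formula F = (R2 / k) / ((1 - R2) / (n - k - 1)), where n = 92 cases and k = 3 regressors. The sketch below uses the rounded R2 = 0.325 from the "Model Summary" table, so it reproduces the reported F = 14.108 only approximately; the function name `f_from_r2` is an assumption for this example.

```python
def f_from_r2(r2, n, k):
    """Overall F-statistic of a regression with k regressors and n cases."""
    explained_per_df = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_df / unexplained_per_df


# Values quoted in the answer: R2 = 0.325, n = 92 cases, k = 3 regressors.
f_value = f_from_r2(r2=0.325, n=92, k=3)
print(round(f_value, 2))
```

The small gap to the value in the ANOVA table is consistent with R2 having been rounded to three decimals before being plugged in.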
The i-th independent variable has a significant contribution in the prediction of the dependent variable if its coefficient is statistically different from 0. We then test the hypotheses H0) bi=0 vs. H1) bi ≠ 0. The t-test is reported in the coefficients table.
X8: t-value = 0.233, p-value = 0.816; p > α, fail to reject H0
X11: t-value = 6.162, p-value = 0.000; p < α, reject H0
X15: t-value = 1.001, p-value = 0.320; p > α, fail to reject H0
Conclusion: only the coefficient of X11 is significantly different from 0 (i.e., this independent variable has a significant contribution in the prediction of the dependent variable).
From the "Coefficients" table we get that the independent variable with the highest influence on the dependent variable is X11, since it has the highest standardized regression coefficient (it also has the highest unstandardized coefficient).
No. All tolerances are above 0.10 and all VIFs are below 10.
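The two cut-offs are the same rule seen from two sides, because VIF = 1 / tolerance, so a tolerance of 0.10 corresponds to a VIF of 10. A minimal sketch of the check follows. The function names are assumptions, and the tolerance values in the usage lines are hypothetical, since the actual values are in the SPSS output and are not reproduced here.

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor implied by a tolerance value."""
    return 1.0 / tolerance


def multicollinearity_flag(tolerance, min_tolerance=0.10):
    """True if the tolerance is at or below the 0.10 rule-of-thumb cut-off."""
    return tolerance <= min_tolerance


# Hypothetical tolerances, used only to illustrate the rule.
for tol in (0.80, 0.05):
    print(tol, vif_from_tolerance(tol), multicollinearity_flag(tol))
```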
First of all, we must always check whether the estimation and test results we consider are appropriate! Sometimes we may accidentally have useless results at our disposal. On page 17-18 of this exam, there are two tables that do not refer to the models considered in this exam: the tables "ANOVA" and "Coefficients" on page 17. They refer to a different regression model, where the dependent variable is X22 - Purchase Level. We can understand this on the basis of 2 facts:
The regression equation for the regression model obtained by using the forward method is taken from the last table of the "Tables and graphs for MODEL 2 in PART 2".
Model 2 (forward method)
^X19 = 3.878 + 0.513 * X11
Again, the table to use is the last table of the "Tables and graphs for MODEL 2 in PART 2". From this table we understand that the only statistically significant variable to be taken as explanatory variable is X11. This is consistent with the findings in the answer to question 8 of this part of the exam. The regression equation is also quite similar.
Overall, the model with only X11 is the model to adopt. In previous questions we have seen that X8 and X15 do not add much information to explain the variation of X19.
Comparing the Adjusted R2 of Model 1 (0.302) with that of Model 2 (0.309), the value is slightly higher for Model 2, which supports the same conclusion: select the model obtained with the forward method.
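The Adjusted R2 of Model 1 can be reproduced from the figures quoted earlier with the standard formula Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1). The sketch below does this for Model 1 only (R2 = 0.325, n = 92, k = 3). The R2 of Model 2 is not quoted in the answers, so it is not recomputed here. The function name `adjusted_r2` is an assumption for this example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a regression with k regressors and n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Model 1 (enter method): R2 = 0.325 with n = 92 cases and k = 3 regressors.
adj_model1 = adjusted_r2(r2=0.325, n=92, k=3)
print(round(adj_model1, 3))
```

Because the adjustment penalizes each extra regressor, a model with one strong predictor can reach a higher Adjusted R2 than a model that adds two weak ones.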