Indicate whether each of the following variables is categorical or numeric. If the variable is categorical, specify the measurement level. If the variable is numeric, specify the measurement level and indicate whether the variable is discrete or continuous:
Upon visiting a newly opened H&M store, customers were given a brief survey. Is the answer to each of the following questions categorical or numerical? If categorical, give the level of measurement. If numerical, is it discrete or continuous?
Tourists visiting Croatia are asked to fill in a survey. The survey consists of various questions about how they experienced their holiday. Describe for each question the type of data obtained.
Question | Type of data |
Which of the following areas did you visit?
| |
Did you rent a sailing boat?
| |
What was the average amount of money you spent on food per day? | |
What would you recommend as the optimal number of days for tourists to spend in Croatia? | |
How often would you recommend visiting Croatia?
|
An administrator examines the travel expenses of faculty members that attended various professional meetings. He found that 36% of the travel expenses was spent for transportation costs, 17% was spent for accommodation, 13% was spent on food; 9% was spent on conference fees, 10% on registration costs, and the remainder was spent on miscellaneous costs.
Construct a pie chart for these data.
Construct a bar chart for these data.
A company has defined seven codes for possible defects for one of its products. Construct a Pareto diagram for the following frequencies:
Defect code | A | B | C | D | E | F | G |
Frequency | 10 | 70 | 15 | 90 | 8 | 4 | 3 |
Construct a time-series plot for the following data of customers shopping at a new mall during a particular week.
Day | Number of customers |
Monday | 516 |
Tuesday | 534 |
Wednesday | 451 |
Thursday | 487 |
Friday | 558 |
Saturday | 641 |
Sunday | 830 |
Determine an appropriate interval width for a random sample of 370 observations with scores that fall between 40 and 200.
Construct a stem-and-leaf display for the following data.
17 | 16 | 15 | 17 | 17 |
20 | 30 | 25 | 25 | 14 |
12 | 18 | 31 | 26 | 26 |
12 | 15 | 16 | 16 | 28 |
Construct a histogram for these data.
Is the distribution of these data symmetric, right-skewed, or left-skewed?
Prepare a scatter plot of the following data:
The following table shows the age of faculty members who have obtained a PhD degree from the largest university in the Netherlands.
Age | Percent |
26 - 28 | 18.00 |
29 - 32 | 23.50 |
33 - 40 | 30.51 |
41 - 55 | 12.99 |
56+ | 15.00 |
What percent of faculty members who obtained a PhD are 41 years or older?
What percent of faculty members who obtained a PhD are under the age of 33 years?
Construct a relative cumulative frequency distribution of the data.
Suppose, we have 200 observations. What are the cumulative frequencies for the data described?
Interpret the cumulative frequencies.
The following data are presented:
Age | 30 -40 | 40 -50 | 50 - 60 | 60 - 70 |
Number | 12 | 13 | 22 | 34 |
Describe possible errors in this table.
Suppose, the amount of money a person spends on movie tickets each month (in euros) is:
6.0, 5.3, 4.0, 5.7, 10.0, 8.4, 2.5, 10.0, 9.5, 0.0, 5.0, 10.0
What graph would you use to visually display these data?
In Germany, it was found that 32% of shoppers with incomes less than 50,000 shop online. Of the remaining 68%, half of the individuals never shop, and the other half shops by going to the actual store. Use a pie chart to plot this data.
Four types of checking accounts are offered by a bank. Suppose, a random sample of 300 customers were surveyed and asked some questions. It was found that 60% of the respondents preferred "Easy Checking", 12% preferred "Intelligent Checking", 18% preferred "Super Checking", and the remainder preferred "Ultimate Checking". Of the participants who selected Easy Checking, 100 were females. Of those who selected Intelligent Checking, a third was female. Of those who selected Super checking, half was female. Finally, of those who selected Ultimate Checking, 80% was female. Describe the data with a cross table.
How many females are there in total, and how many males?
What type of graph is appropriate for these data?
What type of graph is most appropriate for two numerical variables?
Question | Type of data |
Which of the following areas did you visit?
| Both categorical (nominal; each area is binary coded: yes/no) and numerical (discrete), when counting the number of areas visited. |
Did you rent a sailing boat?
| Categorical; nominal; binary coded. |
What was the average amount of money you spent on food per day? | Numerical; interval; continuous. |
What would you recommend as the optimal number of days for tourists to spend in Croatia? | Numerical; interval; discrete. |
How often would you recommend visiting Croatia?
| Categorical; ordinal. |
No answer indication available.
No answer indication available.
No answer indication available.
Note that the time points on the horizontal axis consist of numbers. These could of course be replaced by the days (Monday - Sunday).
According to the quick guide, a sample size of 370 can be approximated by eight to ten classes.
Using the formula for interval width yields:
w = (200 - 40) / 8 = 20; or
w = (200 - 40) / 10 = 16
Thus, an appropriate interval width lies somewhere between 16 and 20.
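The width calculation can be sketched in a few lines of Python. This is only an illustration: the function name `interval_width` is hypothetical, and the class counts 8 and 10 are the ones taken from the quick guide mentioned in the answer.

```python
def interval_width(largest, smallest, k):
    """Class width w = (largest - smallest) / k for k classes."""
    return (largest - smallest) / k

# Scores fall between 40 and 200; try both suggested class counts.
for k in (8, 10):
    print(k, "classes -> width", interval_width(200, 40, k))
```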
1 | 2, 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8.
2 | 0, 5, 5, 6, 6, 8.
3| 0, 1.
No answer indication available.
Right skewed (positively skewed); the tail is at the right side of the distribution.
No answer indication available.
12.99 + 15.00 = 27.99%
18.00 + 23.50 = 41.50%
Age | Percent |
26 - 28 | 18.00 |
29 - 32 | 41.50 |
33 - 40 | 72.01 |
41 - 55 | 85.00 |
56+ | 100.00 |
The cumulative frequencies for 200 observations are: 36, 83, 144, 170, 200 (each cumulative percentage multiplied by 200 and rounded to a whole number).
For sample size n = 200, there are 36 individuals that obtained a PhD between the age of 26 and 28. There are 83 individuals that obtained a PhD before the age of 33. There are 144 individuals that obtained a PhD before the age of 41, and so forth.
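As a rough check, the running totals can be computed with `itertools.accumulate`. The variable names are illustrative, and rounding the counts to whole persons is an assumption made here.

```python
from itertools import accumulate

percents = [18.00, 23.50, 30.51, 12.99, 15.00]  # percent per age class
n = 200                                          # assumed sample size from the question

# Running sum of the percentages, rounded to two decimals.
cum_percents = [round(p, 2) for p in accumulate(percents)]
# Convert each cumulative percentage into a whole number of observations.
cum_counts = [round(n * p / 100) for p in cum_percents]

print(cum_percents)
print(cum_counts)
```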
A possible error lies in the boundaries of the frequency classes. First, there is no open-ended lowest or highest class (such as "under 30" or "70 and over"), so some observations may be excluded. Second, adjacent classes share a boundary value, so it is unclear from this frequency distribution to which class observations such as 40 and 50 belong.
A time-series plot would be appropriate here. Data are given for t number of time points, with t = 12.
No answer indication available.
Type of checking account | Female | Male | Total |
Easy Checking | 100 | 80 | 180 |
Intelligent Checking | 12 | 24 | 36 |
Super checking | 27 | 27 | 54 |
Ultimate Checking | 24 | 6 | 30 |
Total | 163 | 137 | 300 |
There are 163 females and 137 males in the sample of 300 participants.
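The cells of the cross table follow directly from the percentages in the question. The sketch below rebuilds them; the dictionary names are illustrative, and "the remainder" is taken to be whatever is left of the 300 respondents.

```python
n = 300
totals = {
    name: round(n * share)
    for name, share in [
        ("Easy Checking", 0.60),
        ("Intelligent Checking", 0.12),
        ("Super Checking", 0.18),
    ]
}
totals["Ultimate Checking"] = n - sum(totals.values())  # the remainder

females = {
    "Easy Checking": 100,                                      # given directly
    "Intelligent Checking": totals["Intelligent Checking"] // 3,  # a third
    "Super Checking": totals["Super Checking"] // 2,              # half
    "Ultimate Checking": round(0.80 * totals["Ultimate Checking"]),
}
males = {name: totals[name] - females[name] for name in totals}

for name in totals:
    print(f"{name:22} {females[name]:4} {males[name]:4} {totals[name]:4}")
print(f"{'Total':22} {sum(females.values()):4} {sum(males.values()):4} {n:4}")
```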
D, a bar chart. The other graphs are appropriate in the event of numerical variables. Here, we have frequencies for two categorical variables. This is best displayed by a bar chart (or pie chart).
A scatter plot.
A random sample of five numbers was drawn:
18 71 80 80 84
Compute the mean, median, and mode.
The number of cars crossing the border between Israel and Jordan is recorded. Over a 6-day period, the following number of cars for each day is found:
16 21 12 19 1 2
Compute the mean, median, and mode.
The records of the university of Groningen over a 12-year period show the following percentage increase in the number of students enrolled:
4.1 3.2 3.5 4.5 5.1 3.8
2.1 2.2 3.1 5.1 1.5 1.0
Compute the mean increase in the number of students enrolled.
Compute the median increase in the number of students enrolled.
Find the mode.
The finances over the past decade are reviewed. The records are shown per year.
2.51 3.74 4.15 5.33 6.18
6.65 7.18 6.92 6.95 7.54
Calculate the mean.
Calculate the median.
During the past years, many countries faced depopulation. We collected the number of elementary schools that were closed for ten countries:
10 6 13 5 11 5 6 3 7 9
Find the mean, median, and mode of the number of schools closed.
Find the five-number summary.
A textile manufacturer obtains a sample of 50 bolts of cloth and carefully inspects each bolt. Based on this inspection, the manufacturer records the number of imperfections. The following frequency table is obtained:
Number of imperfections | 0 | 1 | 2 | 3 |
Number of bolts | 33 | 12 | 4 | 1 |
Calculate the mean, median, and mode for these sample data.
Compute the variance and standard deviation of the following sample data:
6 8 10 12 14 9 11 7 13 11
Compute the variance and standard deviation of the following sample data:
5 -3 0 2 -1 7 4
Consider two different investments, stock A and stock B. The mean closing price for stock A is 4.00 and the mean closing price for stock B is 80.00. The mean rate of return is the same for both stock A and stock B. We might think that stock B is more volatile than stock A. Now, suppose the standard deviations were found to be considerably different, with SA = 2.00 and SB = 8.00. Compute the coefficient of variation for these sample data and compare these competing investment opportunities.
Calculate the coefficient of variation for the following data:
13 15 12 14 11
A set of data is mounded (bell-shaped) with a mean of 300 and a variance of 144.
Approximately what proportion of observations is greater than 288?
Approximately what proportion of observations is less than 324?
Approximately what proportion of observations is greater than 336?
The number of cars that pass through a tunnel during a period of 35 are as follows:
60 70 74 56 84 54 50
47 80 71 50 95 121 90
75 84 70 61 110 64 80
85 85 43 76 60 91 90
60 87 110 85 44 94 69
What is the mean number of cars?
What is the standard deviation?
What is the coefficient of variation?
Construct a stem-and-leaf display of the number of cars that pass through the tunnel. Next, find the interquartile range.
Provide the five-number summary for the sample data.
The daily exchange rate from EUR to USD for six business days is:
1.14 1.14 1.13 1.13 1.12 1.11
Over the same period, the daily exchange rate from EUR to JPY is:
110 110 109 109 108 109
Compare the means of these two distributions.
Compare the standard deviations of these two distributions.
A company produces light bulbs with a mean lifetime of 1,200 hours and a standard deviation of 50 hours. Find the z-score for a light bulb that lasts only 1,120 hours.
Consider the z-score computed in question 14a. What percentage of light bulbs lasts longer than 1,120 hours?
Consider again the mean and standard deviation from question 14a. Find the z-score corresponding to a light bulb that lasts 1,300 hours.
What percentage of light bulbs lasts longer than 1300 hours?
Suppose that a student completed courses for 15 ECTS in total during his first semester of college. He received one A, one B, one C, and one D. Now, suppose that a value of 4 is assigned to an A, a value of 3 is assigned to a B, a value of 2 is assigned to a C, and a value of 1 is assigned to a D. Calculate the student's semester GPA.
Now, however, suppose that the courses are not all worth the same number of credit hours. The A was earned in a 3-credit English course, the B was earned in a 3-credit biology course, the C was earned in a 4-credit biology course, and the D was earned in a 5-credit Spanish course. Using these weights, calculate the student's weighted semester GPA.
Consider the following data:
xi | wi |
4.7 | 8 |
3.8 | 7 |
5.7 | 4 |
2.6 | 3 |
5.5 | 2 |
What is the arithmetic mean of the xi values?
What is the weighted mean of the xi values?
What is the sample variance?
What is the sample standard deviation?
Consider the following data:
(15,45) (6,18) (11,33) (12,36) (16,48), (14,42)
(5,15) (17,51) (4,12) (19,57), (7,21)
Compute the covariance.
Compute the correlation coefficient.
Draw a scatter plot to display the relationship between the two variables.
Consider the following data:
Quiz score (x) | 4 | 3.4 | 3 | 5 | 1.1 |
Exam score (y) | 100 | 66 | 78 | 80 | 30 |
Compute the covariance.
Compute the correlation coefficient.
Mean = (18+71+80+80+84)/5 = 333/5 = 66.6; median = 80; mode = 80.
Mean = (16+21+12+19+1+2)/6 = 11.8; median = (12+16)/2 = 14; there is no mode.
Mean = (4.1 + 3.2 + 3.5 + 4.5 + 5.1 + 3.8 + 2.1 + 2.2 + 3.1 + 5.1 + 1.5 + 1.0) / 12 = 3.3.
Median = (3.2 + 3.5) / 2 = 3.35 (approximately 3.4).
Mode = 5.1.
Mean = (2.51 + 3.74 + 4.15 + 5.33 + 6.18 + 6.65 + 7.18 + 6.92 + 6.95 + 7.54) / 10 = 5.7.
Median = 6.4.
Mean = 7.5; median = 6.5; the data are bimodal, with modes 5 and 6 (each value occurs twice).
For the five number summary, order the data in ascending order, that is:
3 5 5 6 6 7 9 10 11 13
Q1 is the value located in the 0.25(10+1)th position, that is the 2.75th position.
The second value is 5, the third value is also 5.
Q1 = 5 + 0.75*(5 - 5)
Q1 = 5 + 0
Q1 = 5
Q3 = the value located in the 0.75(10+1)th ordered position, that is the 8.25th position.
The eighth value is 10, the ninth value is 11.
Q3 = 10 + 0.25(11 - 10)
Q3 = 10 + 0.25
Q3 = 10.25
Thus, the five number summary is: 3 (minimum); 5 (Q1); 6.5 (median); 10.25 (Q3); 13 (maximum).
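The position rule used here, p(n+1) with linear interpolation between neighbouring ordered values, can be sketched as follows. The helper name `quantile_n_plus_1` is hypothetical. Other software may use a different quantile convention and return slightly different quartiles.

```python
def quantile_n_plus_1(sorted_data, p):
    """Value at the p*(n+1)th ordered position, interpolating linearly."""
    n = len(sorted_data)
    pos = p * (n + 1)      # 1-based, possibly fractional position
    k = int(pos)           # ordered position just below pos
    frac = pos - k         # fractional part used for interpolation
    if k < 1:
        return sorted_data[0]
    if k >= n:
        return sorted_data[-1]
    return sorted_data[k - 1] + frac * (sorted_data[k] - sorted_data[k - 1])

data = sorted([10, 6, 13, 5, 11, 5, 6, 3, 7, 9])
summary = (
    data[0],
    quantile_n_plus_1(data, 0.25),
    quantile_n_plus_1(data, 0.50),
    quantile_n_plus_1(data, 0.75),
    data[-1],
)
print(summary)
```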
Mean = (0*33 + 1*12 + 2*4 + 3*1) / 50 = 23/50 = 0.46.
Median = 0
Mode = 0
To calculate the sample variance and standard deviation, first compute the sample mean (here 101/10 = 10.1), then the squared deviation of each observation from that mean.
The squared deviations from the mean for all observations are: 16.81 4.41 0.01 3.61 15.21 1.21 0.81 9.61 8.41 and 0.81. The sum of these squared deviations equals 60.9. Next, s2 = 60.9 / (n - 1) = 60.9/9 = 6.77. Thus, the variance equals 6.77. The standard deviation is the square root of the variance. That is: s = √6.77 = 2.60.
Again, apply the same steps as in question 7. The sample mean is equal to 2. The squared deviation from the mean for each observation is: 9, 25, 4, 0, 9, 25, 4. The sum of these squared differences is equal to 76. The variance, s2 = 76/6 = 12.67. The standard deviation is the square root of the variance, that is: s = √12.67 = 3.56.
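Both results can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1, as the sample formulas do. This is a quick verification sketch, not part of the original answer.

```python
import statistics

samples = [
    [6, 8, 10, 12, 14, 9, 11, 7, 13, 11],
    [5, -3, 0, 2, -1, 7, 4],
]

results = []
for sample in samples:
    s2 = statistics.variance(sample)  # sample variance (divides by n - 1)
    s = statistics.stdev(sample)      # square root of the sample variance
    results.append((round(s2, 2), round(s, 2)))
    print(results[-1])
```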
CVA = 2.00 / 4.00 x 100% = 50%.
CVB = 8.00 / 80.00 x 100% = 10%.
The market value of stock A fluctuates more from period to period than does the market value of stock B. The coefficient of variation (CV) indicates that, for stock A, the sample standard deviation is 50% of the mean, whereas for stock B the sample standard deviation is only 10% of the mean.
Use the formula:
\[CV = \frac{s}{\bar{x}} \times 100\% \hspace{5mm} \text{if} \hspace{5mm} \bar{x} > 0 \]
The sample mean is 13 and the sample standard deviation is s = √2.5 = 1.581, so:
CV = (1.581 / 13) x 100% = 12.16%
Thus, the sample standard deviation is 12.16% of the mean.
Use the formula:
\[z = \frac{x_{i} - \mu}{\sigma} \]
The standard deviation, σ, is equal to the square root of the variance, σ2, that is: √144 = 12
z = (288 - 300) / 12 = -12/12 = -1
According to the empirical rule, approximately 68% of observations fall within 1 standard deviation above and below the mean. The remaining 32% is thus spread equally to the left and right of this interval. This means that 0.5*32 = 16% of the observations fall below z = -1. Conversely, 100 - 16 = 84% of scores are greater than 288.
z = (324 - 300) / 12 = 24/12 = 2
According to the empirical rule, approximately 95% of observations fall within 2 standard deviations above and below the mean. The remaining 5% is spread equally over the higher and lower end of the distribution. Thus, 97.5% of observations are less than 324.
z = (336 - 300) / 12 = 36/12 = 3. Approximately all observations are lower than 336. Thus, to answer the question, almost no (0.15%) observations are greater than 336.
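For comparison, the same three proportions can be computed under the assumption that the data are exactly normal, using `statistics.NormalDist` from the standard library. The empirical rule is a rounded version of these values, so small differences are expected.

```python
from statistics import NormalDist

# Assumption: a normal distribution with mean 300 and variance 144.
dist = NormalDist(mu=300, sigma=144 ** 0.5)

above_288 = 1 - dist.cdf(288)  # one standard deviation below the mean
below_324 = dist.cdf(324)      # two standard deviations above the mean
above_336 = 1 - dist.cdf(336)  # three standard deviations above the mean

print(round(above_288, 4), round(below_324, 4), round(above_336, 4))
```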
Mean = 75.
Standard deviation = 19.26.
CV = (19.26/75) x 100% = 25.67%.
4 | 3 4 7
5 | 0 0 4 6
6 | 0 0 0 1 4 9
7 | 0 0 1 4 5 6
8 | 0 0 4 4 5 5 5 7
9 | 0 0 1 4 5
10 |
11| 0 0
12| 1
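With the 0.25(n+1)th and 0.75(n+1)th position rule, Q1 is the 9th ordered value (60) and Q3 is the 27th ordered value (87). The interquartile range, IQR = 87 - 60 = 27.
Minimum = 43; Q1 = 60; Median = 75; Q3 = 87; Maximum = 121.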
The means are 1.13 (1.1283 before rounding) and 109.17.
The standard deviations are 0.0117 and 0.75.
CVA = (0.0117/1.1283) x 100% = 1.04%
CVB = (0.75/109.17) x 100% = 0.69%
The coefficients of variation tell us that the sample standard deviation for EUR to USD is 1.04% of the mean, whereas the sample standard deviation for EUR to JPY is 0.69% of the mean. Thus, relative to its mean, the exchange rate for EUR to USD fluctuates more from day to day than does that of EUR to JPY.
z = (1,120 - 1,200) / 50 = -1.6.
94.52% (you can find the probability corresponding to this z-score in the table of the standard normal distribution).
z = (1,300 - 1,200) / 50 = 2.
According to the empirical rule, approximately 2.5% of observations are more than two standard deviations above the mean.
\[ \bar{x} = \frac{4+3+2+1}{4} = 2.5\]
Use the formula for the weighted mean, that is:
\[\bar{x} = \frac{\Sigma w_{i}x_{i}}{n} \hspace{5mm} \text{with} \hspace{5mm} n = \Sigma w_{i} \]
\[\bar{x} = \frac{4*3 + 3*3 + 2*4 + 1*5}{15} = \frac{34}{15} = 2.267 \]
\[\bar{x} = \frac{4.7+3.8+5.7+2.6+5.5}{5} = \frac{22.3}{5} = 4.46\]
\[\bar{x} = \frac{4.7*8 + 3.8*7 + 5.7*4 + 2.6*3 + 5.5*2}{24} = \frac{105.8}{24} = 4.41 \]
The variance is 1.643.
The standard deviation is √1.643 = 1.282.
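The weighted means in the two preceding questions follow the same pattern: multiply each value by its weight, sum, and divide by the sum of the weights. A minimal sketch, with the hypothetical helper name `weighted_mean`:

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Grade points weighted by course credits (A=4, B=3, C=2, D=1).
gpa = weighted_mean([4, 3, 2, 1], [3, 3, 4, 5])

# The x_i values weighted by w_i from the table.
xw = weighted_mean([4.7, 3.8, 5.7, 2.6, 5.5], [8, 7, 4, 3, 2])

print(round(gpa, 3), round(xw, 2))
```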
The covariance = 82.42.
The correlation coefficient between x and y, that is r = 1.0 (perfect positive linear relationship).
Cov(x,y) = 30.8.
r = 0.83.
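The sample covariance and correlation for the quiz and exam scores can be reproduced with a short script. The function names are illustrative; both use the n - 1 divisor of the sample formulas.

```python
from math import sqrt

def sample_cov(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

quiz = [4, 3.4, 3, 5, 1.1]
exam = [100, 66, 78, 80, 30]

cov_xy = sample_cov(quiz, exam)
r = correlation(quiz, exam)
print(round(cov_xy, 1), round(r, 2))
```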
The sample space S = [E1, E2, E3, E4, E5, E6]. Given A = [E1, E2, E3] and B = [E3, E4, E5].
What is A intersection B?
What is the union of A and B?
Is the union of A and B collectively exhaustive?
Use the following sample space S: S = [E1, E2, E3, E4, E5, E6, E7, E8, E9, E10].
Given A = [E1, E2, E3, E4], what is Ā?
Given Ā = [E1, E4, E5, E7] and B̄ (complement B) = [E2, E3, E5, E8]. What is A intersection B̄ (complement B)?
What is A intersection B?
What is the union of A and B?
Is the union of A and B collectively exhaustive?
Suppose, two letters are to be selected from A, B, C, D, and E. Further, these two letters have to be arranged in order. How many permutations are possible?
Suppose, there are 8 candidates that applied for a particular job. Yet, there are only 4 positions available. Of these 8 candidates, 5 are men and 3 are women. If every combination of candidates is equally likely to occur, what is then the probability that no women will be hired?
Suppose, there are 10 Apple iPads, 5 Samsung tablets, and 5 Huawei tablets on offer in a store. A person enters the store and wants to buy 3 tablets. These tablets are selected purely by chance. What is the total number of outcomes in the sample space?
What is the probability that this person selects 2 Apple iPads and 1 Samsung tablet?
A sample space consists of 5 A's and 7 B's. Now, suppose we want to randomly draw two letters from this sample space. What is the total number of possible combinations?
What is the probability that a randomly selected set of 2 will include 1 A and 1 B?
In a family of 6 family members, there are three males and three females. What is the probability that a random sample of two family members consists of two males?
Suppose there are 12 employees who could be assigned to an editorial task. Of these 12 employees, 7 are women and 5 are men. Two of the men are brothers. The manager of the company has to assign the editorial task randomly to one employee. Let A be the event "chosen employee is a man". Let B be the event "chosen employee is one of the brothers". What is the probability of event A?
What is the probability of event B?
What is the probability of the intersection of A and B?
Suppose, P(A) = 0.75, P(B) = 0.80, and P(A ∩ B) = 0.65. What is P (A ∪ B)?
What is the conditional probability of event B, given that event A has occurred?
What is the joint probability of both event A and event B?
Suppose, within the Netherlands, 54% of all master's degrees are earned by women. Of all master's degrees that are obtained, 20% is obtained in psychology. In addition, 8% of all master's degrees are obtained by women in psychology. Are the events "the diploma holder is a woman" and the event "the diploma is in psychology" statistically independent?
Suppose, the odds in favor of winning are 3 to 2. What is then the probability of winning?
Suppose, we are interested in examining the effect of alcohol on highway crashes. Obviously, it is unethical to provide one group of drivers with alcohol and compare their crash involvement to that of a sober group. We know, however, that 10.3% of the nighttime drivers have been drinking, and that 32.4% of the single-vehicle-accident drivers had been drinking. In this example, single-vehicle accidents are chosen to ensure that any driving error could be assigned to the driver only.
Based on these data, what is the sample space?
What is the conditional probability that the driver had been drinking, given that he was not involved in a crash?
Do these numbers provide sufficient evidence to conclude that alcohol increases the probability of crashes?
For questions 26-30, the sample space is defined by events A1, A2, B1, and B2.
Given that P(A1) = 0.15, P(B1) = 0.20, and P(B1|A1) = 0.60. What is P(A1|B1)?
Given that P(A1 ∩ B1) = 0.09 and P(B1) = 0.18. What is P(A1|B1)?
Given that P(A2 ∩ B2) = 0.81 and P(B2) = 0.82. What is P(A2|B2)?
Given that P(A1) = 0.10, P(B1|A1) = 0.90. What is the probability of P(A1 ∩ B1)?
Given that P(A1) = 0.10, P(B1|A1) = 0.90, P(B2|A1) = 0.10. What is the probability of P(A2)?
A ∩ B = [E3].
A ∪ B = [E1, E2, E3, E4, E5].
No, A and B are not collectively exhaustive, because E6 is not covered in the union.
Ā = [E5, E6, E7, E8, E9, E10]
Since Ā = [E1, E4, E5, E7], we have A = [E2, E3, E6, E8, E9, E10]. Since the complement of B is [E2, E3, E5, E8], we have B = [E1, E4, E6, E7, E9, E10]. Thus, A ∩ complement B = [E2, E3, E8].
A ∩ B = [E6, E9, E10].
A ∪ B = [E1, E2, E3, E4, E6, E7, E8, E9, E10]
No, event E5 is not covered in the union of A and B.
There are five outcomes, that is n = 5, and two outcomes have to be selected, that is x = 2.
Using the formula for the number of permutations yields:
\[P^{5}_{2} = \frac{5!}{(5-2)!} = \frac{5!}{3!} = \frac{120}{6} = 20 \]
First, calculate the total number of possible combinations of four candidates selected from the eight possible candidates. That is:
\[ C^{8}_{4} = \frac{8!}{4!4!} = 70 \]
Then, if no woman is to be hired, this implies that the four successful candidates must come from the available five men. That means that the number of combinations is as follows:
\[ C^{5}_{4} = \frac{5!}{4!1!} = 5 \]
To conclude, if each of the 70 possible combinations is equally likely to be chosen, the probability that one of the 5 all-male combinations is selected is 5/70 = 1/14 = 0.07 (that is, 7%).
\[N = C^{20}_{3} = \frac{20!}{3!(20-3)!} = 1,140 \]
Thus, there are 1,140 outcomes in the sample space.
The number of ways that we can select 2 Apple iPads from the available 10 is as follows:
\[ C^{10}_{2} = \frac{10!}{2!(10-2)!} = 45 \]
Similarly, the number of ways that we can select 1 Samsung tablet from the available 5 is 5.
\[ C^{5}_{1} = \frac{5!}{1!(5-1)!} = \frac{5!}{1!4!} = 5 \]
Therefore, the number of outcomes that satisfy event A is as follows:
\[ N_{A} = C^{10}_{2} \times C^{5}_{1} = 45 \times 5 = 225 \]
Hence, the probability of A [i.e., 2 Apple iPads and 1 Samsung tablet] is:
\[ P_{A} = \frac{N_{A}}{N} = \frac{225}{1140} = 0.197 \]
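The counting argument can be verified with `math.comb` (available from Python 3.8). The variable names below are illustrative.

```python
from math import comb

n_total = comb(20, 3)                # any 3 tablets out of all 20
n_event = comb(10, 2) * comb(5, 1)   # 2 of the 10 iPads and 1 of the 5 Samsung tablets
p_event = n_event / n_total

print(n_total, n_event, round(p_event, 3))
```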
The total number of possible combinations of 2 letters selected from 12 is as follows:
\[ C^{12}_{2} = \frac{12!}{2!10!} = 66 \]
The number of ways that we can select 1 A from the 5 available A's is as follows:
\[ C^{5}_{1} = \frac{5!}{1!(5-1)!} = \frac{5!}{1!4!} = 5 \]
Similarly, the number of ways that we can select 1 B from the 7 available B's is as follows:
\[ C^{7}_{1} = \frac{7!}{1!(7-1)!} = \frac{7!}{1!6!} = 7 \]
Therefore, the number of ways that we can select one A and one B, that is the number of outcomes that satisfy event A, is as follows:
\[ N_{A} = C^{5}_{1} \times C^{7}_{1} = 5 \times 7 = 35 \]
Finally, the probability of event A (that is, one A and one B) is as follows:
\[ P_{A} = \frac{N_{A}}{N} = \frac{35}{66} = 0.53 \]
The total number of possible samples of two family members drawn from six is:
\[ N = C^{6}_{2} = \frac{6!}{2!4!} = \frac{720}{48} = 15 \]
Now, the number of combinations for two males is:
\[ C^{3}_{2} = \frac{3!}{2!1!} = \frac{6}{2} = 3 \]
Therefore, the probability of selecting two males is 3/15 = 0.20 (that is: 20%).
\[P_{A} = \frac{N_{A}}{N} = \frac{5}{12} = 0.42 \]
\[P_{B} = \frac{N_{B}}{N} = \frac{2}{12} = 0.17 \]
P(A ∩ B) = 2/12 = 0.17, because both brothers are men, so event B lies entirely within event A.
Use the addition rule of probabilities.
\[ P (A ∪ B) = P(A) + P(B) - P(A ∩ B) \]
Substituting the given values gives:
\[ P (A ∪ B) = 0.75 + 0.80 - 0.65 = 0.90 \]
\[ P(B|A) = \frac{P(A ∩ B)}{P(A)} = \frac{0.65}{0.75} = 0.8667 \]
To answer this question, use the multiplication rule of probabilities. That is:
\[ P(A ∩ B) = P(A|B) P(B) = (0.8125)(0.80) = 0.65 \]
\[ P(A) = 0.54, P(B) = 0.20, P(A ∩ B) = 0.08 \]
Since
\[ P(A)P(B) = (0.54)(0.20) = 0.108 \neq 0.08 = P(A ∩ B) \]
these events are not independent.
The dependence can be found from the conditional probability:
\[ P(A|B) = \frac{P(A ∩ B)}{P(B)} = \frac{0.08}{0.20} = 0.40 \neq 0.54 = P(A) \]
That means that, in the Netherlands, only 40% of psychology degrees go to women, whereas women constitute 54% of all degree recipients.
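The independence check amounts to comparing P(A)P(B) with P(A ∩ B). A small sketch, with illustrative variable names and `math.isclose` used to avoid spurious floating-point mismatches:

```python
from math import isclose

p_a = 0.54    # P(woman)
p_b = 0.20    # P(psychology)
p_ab = 0.08   # P(woman and psychology)

# Independent only if the joint probability equals the product of the marginals.
independent = isclose(p_a * p_b, p_ab)
p_a_given_b = p_ab / p_b

print(independent, round(p_a * p_b, 3), round(p_a_given_b, 2))
```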
\[ \frac{3}{2} = \frac{P(A)}{1-P(A)} \]
\[ 3(1-P(A)) = 2P(A) \]
\[ 5P(A) = 3 \]
\[ P(A) = \frac{3}{5} = 0.6 \]
A1: the driver had been drinking.
A2: the driver had not been drinking.
C1: the driver was involved in a single-vehicle crash.
C2: the driver was not involved in a single-vehicle crash.
P(A1|C1) = 0.324
P(A1|C2) = 0.103
To answer this question, use the overinvolvement ratio. That is:
\[ \frac{P(A_{1}|C_{1})}{P(A_{1}|C_{2})} = \frac{0.324}{0.103} = 3.15 \]
Based on this ratio of 3.15, we can conclude that there is evidence that alcohol increases the probability of car crashes.
Using Bayes' theorem, we find that P(A1|B1) = (0.60*0.15)/(0.20) = 0.45.
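Bayes' theorem as applied here, P(A1|B1) = P(B1|A1)P(A1) / P(B1), fits in a one-line function. The name `bayes` is a hypothetical helper for illustration.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_a1_given_b1 = bayes(p_b_given_a=0.60, p_a=0.15, p_b=0.20)
print(round(p_a1_given_b1, 2))
```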
\[ P(A_{1}|B_{1}) = \frac{P(A_{1} ∩ B_{1})}{P(B_{1})} = \frac{0.09}{0.18} = 0.50 \]
\[ P(A_{2}|B_{2}) = \frac{P(A_{2} ∩ B_{2})}{P(B_{2})} = \frac{0.81}{0.82} = 0.988 \]
P(A1 ∩ B1) = 0.90 * 0.10 = 0.09
Use both:
P(A1 ∩ B1) = 0.90 * 0.10 = 0.09
and:
P(A1 ∩ B2) = 0.10 * 0.10 = 0.01
to find that:
P(A1) = 0.09 + 0.01 = 0.10
A2 is the complement of A1, thus P(A2) = 1 - P(A1) = 1 - 0.10 = 0.90
A researcher is studying the number of owl eggs found in Denmark. Is the number of eggs a discrete or continuous random variable?
The weight of students is recorded as part of a national health study. Is the weight of students a discrete or continuous random variable?
Indicate for each of the following if a discrete or continuous random variable provides the best definition:
Give the probability distribution function of the face values of a single die when a fair die is rolled.
What is the probability of a value of 5 or higher, when rolling a single fair die once?
Use the following probability distribution:
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
P(x) | 0.03 | 0.15 | 0.11 | 0.19 | 0.22 | 0.26 | 0.04 |
P(3 < x < 6) = ?
P(x > 3) = ?
P(2 < x < 5) = ?
P(x < 4) = ?
What is the mean of this probability distribution?
Suppose, the probability distribution of the number of errors (X) on pages from a business textbook is as follows: P(0) = 0.81; P(1) = 0.17; P(2) = 0.02.
What is the mean number of errors per page?
Someone is interested in the total costs of a project on which he intends to bid. He estimates that the materials will cost €25,000,- and that the labor will cost €900,- per day. Suppose the project takes X days to complete. Provide the linear function for the total costs, denoted by C, of the project.
Now, assume that the following probability distribution is provided for the completion time of the project.
Completion time (x) | 10 | 11 | 12 | 13 | 14 |
P(x) | 0.1 | 0.3 | 0.3 | 0.2 | 0.1 |
What is the variance for completion time X?
What is the mean for the total costs, C?
What is the variance for the total costs, C?
Suppose that a real estate agent has five contacts and believes that for each contact the probability of making a sale is 0.40. What is the probability that the real estate agent makes at most 1 sale?
What is the probability that the real estate agent makes between 2 and 4 sales (inclusive)?
It is predicted that 3.5% of all small corporations will file for bankruptcy in 2020. For a random sample of 100 small corporations, estimate the probability that at least 3 will file for bankruptcy in 2020, assuming that this prediction is correct. To do so, use the Poisson distribution.
Now, do the same using the (actual) binomial distribution. Is the Poisson distribution a close estimate of the actual binomial distribution?
Consider the following joint probability distribution for two random variables X and Y. Find the marginal probabilities.
Y return | ||||
X return | 0% | 5% | 10% | 15% |
0% | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
5% | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
10% | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
15% | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
Are X and Y independent?
Find the mean of X.
Find the mean of Y.
What is the variance of X?
What is the standard deviation of X?
Consider the following probability distribution
X | |||
Y | 0 | 1 | |
0 | 0.25 | 0.35 | |
1 | 0.10 | 0.30 |
Compute the marginal probability distributions for X and Y.
Consider the following information for questions 28-30. An investor has €1000,- to invest and two investment opportunities, each requiring a minimum of €500,-. The profit per €100,- from the first investment (X) can be represented by the following probability distribution: P(X = -5) = 0.4 and P(X = 20) = 0.6. The profit per €100,- from the second investment (Y) is represented by the following probability distribution: P(Y = 0) = 0.6 and P(Y = 25) = 0.4. Random variables X and Y are independent. The investor has the following possible strategies:
Find the mean and variance for the first strategy.
Find the mean and variance for the second strategy.
Find the mean and variance for the third strategy.
It is a discrete random variable, because it can take on only a countable number of values (0, 1, 2, ... eggs).
The weight of students is a continuous random variable.
x | P(x) |
1 | 0.16667 |
2 | 0.16667 |
3 | 0.16667 |
4 | 0.16667 |
5 | 0.16667 |
6 | 0.16667 |
0.1667 + 0.1667 = 0.3333
P(3 < x < 6) = 0.22 + 0.26 = 0.48
P(x > 3) = 0.22 + 0.26 + 0.04 = 0.52
P(2 < x < 5) = 0.19 + 0.22 = 0.41
P(x < 4) = 0.03 + 0.15 + 0.11 + 0.19 = 0.48
\[ \mu_{X} = 0(0.03) + (1)(0.15) + (2)(0.11) + (3)(0.19) + (4)(0.22) + (5)(0.26) + (6)(0.04) = 3.36 \]
\[ \mu_{x} = E[X] = \sum_{x} xP(x) = (0)(0.81) + (1)(0.17) + (2)(0.02) = 0.21 \]
Thus, the mean number of errors per page is 0.21.
C = 25,000 + 900X.
\[ \mu_{X} = E[X] = \sum_{x}xP(x) = (10)(0.1) + (11)(0.3) + (12)(0.3) + (13)(0.2) + (14)(0.1) = 11.9 \]
So, the mean for completion time X is 11.9 days.
\[ \sigma^{2}_{X} = \sum_{x}(x - \mu_{X})^{2}P(x) \]
\[ (10 - 11.9)^{2}(0.1) + (11 - 11.9)^{2}(0.3) + ... + (14 - 11.9)^{2}(0.1) = 1.29 \]
So, the variance for completion time X is 1.29 (in squared days).
\[ \mu_{C} = E[25,000 + 900X] = 25,000 + 900\mu_{X} = 25,000 + (900)(11.9) = €35,710,- \]
\[ \sigma^{2}_{C} = Var(25,000 + 900X) = (900)^{2}\sigma^{2}_{X} = (810,000)(1.29) = 1,044,900 \text{ (in squared euros)} \]
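The rules E[a + bX] = a + bE[X] and Var(a + bX) = b²Var(X) can be checked numerically. This sketch uses the probabilities that appear in the worked answer for the mean (0.1, 0.3, 0.3, 0.2, 0.1); the variable names are illustrative.

```python
times = [10, 11, 12, 13, 14]
probs = [0.1, 0.3, 0.3, 0.2, 0.1]   # probabilities as used in the worked answer

mean_x = sum(x * p for x, p in zip(times, probs))
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(times, probs))

# Total cost C = 25,000 + 900 X: a fixed part plus a daily labor cost.
fixed, per_day = 25_000, 900
mean_c = fixed + per_day * mean_x
var_c = per_day ** 2 * var_x        # the constant term does not affect the variance

print(round(mean_x, 2), round(var_x, 2), round(mean_c, 2), round(var_c, 2))
```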
\[ P(0) = \frac{5!}{0!5!}(0.4)^{0}(0.6)^{5} = (0.6)^{5} = 0.078 \]
\[ P(1) = \frac{5!}{1!4!}(0.4)^{1}(0.6)^{4} = 5(0.4)(0.6)^{4} = 0.259 \]
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.078 + 0.259 = 0.337
\[ P(2) = \frac{5!}{2!3!}(0.4)^{2}(0.6)^{3} = 10(0.4)^{2}(0.6)^{3} = 0.346 \]
\[ P(3) = \frac{5!}{3!2!}(0.4)^{3}(0.6)^{2} = 10(0.4)^{3}(0.6)^{2} = 0.230 \]
\[ P(4) = \frac{5!}{4!1!}(0.4)^{4}(0.6)^{1} = 5(0.4)^{4}(0.6)^{1} = 0.077 \]
P(2 ≤ X ≤ 4) = P(2) + P(3) + P(4) = 0.346 + 0.230 + 0.077 = 0.653
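The binomial probabilities above can be reproduced with a small helper built on `math.comb`. The name `binom_pmf` is hypothetical. The sums are taken over the inclusive ranges asked for in the questions (at most 1 sale, and 2 to 4 sales).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 5, 0.40
at_most_one = sum(binom_pmf(k, n, p) for k in (0, 1))
two_to_four = sum(binom_pmf(k, n, p) for k in (2, 3, 4))

print(round(at_most_one, 3), round(two_to_four, 3))
```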
The distribution of X is binomial with n = 100 and P = 0.035, so that the mean of the distribution is equal to nP = 3.5. Next, using the Poisson distribution to approximate the probability of at least 3 bankruptcies, we find:
\[ P(X \geq 3) = 1 - P(X \leq 2) \]
\[ P(0) = \frac{e^{-3.5}(3.5)^{0}}{0!} = e^{-3.5} = 0.030197 \]
\[ P(1) = \frac{e^{-3.5}(3.5)^{1}}{1!} = (3.5)(0.030197) = 0.1056895 \]
\[ P(2) = \frac{e^{-3.5}(3.5)^{2}}{2!} = (6.125)(0.030197) = 0.1849566 \]
Hence,
\[ P(X \leq 2) = P(0) + P(1) + P(2) = 0.3208431 \]
\[ P(X \geq 3) = 1 - 0.3208431 = 0.6791569 \]
Using the binomial distribution, we compute the probability of X ≥ 3 as: P(X ≥ 3) = 0.684093.
Thus, the Poisson probability is a close approximation of the exact binomial probability.
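The comparison between the Poisson approximation and the exact binomial probability can be sketched in Python as follows. The sketch assumes n = 100 trials and a success probability of 0.035, which is the combination that gives nP = 3.5 and reproduces the binomial value quoted above.

```python
from math import comb, exp, factorial

n, p = 100, 0.035   # assumed values giving a mean of n*p = 3.5
lam = n * p

# P(X <= 2) under the Poisson approximation and under the exact binomial.
poisson_le2 = sum(exp(-lam) * lam**k / factorial(k) for k in range(3))
binom_le2 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))

# P(X >= 3) is the complement in both cases.
poisson_ge3 = 1 - poisson_le2
binom_ge3 = 1 - binom_le2

print(round(poisson_ge3, 4), round(binom_ge3, 4))
```

The two results differ only in the third decimal place, which is what makes the Poisson approximation attractive when n is large and P is small.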
\[ P(X = 0) = \sum_{y}P(0,y) = 0.0625 + 0.0625 + 0.0625 + 0.0625 = 0.25\]
Note that for every combination of values for X and Y, P(x,y) = 0.0625. Therefore, all the marginal probabilities of X are 25%. The same holds for the marginal probabilities of Y. Note that the sum of the marginal probabilities for a random variable is 1.
To test independence, we need to check if P(x,y) = P(x)P(y) for all possible pairs of values x and y.
P(x,y) = 0.0625 for all possible values of x and y.
P(x) = 0.25 and P(y) = 0.25 for all possible values of x and y.
P(x,y) = 0.0625 = (0.25)(0.25) = P(x)P(y)
Thus, X and Y are independent.
\[ \mu_{X} = E[X] = \sum_{x}xP(x) = 0(0.25) + 0.05(0.25) + 0.10(0.25) + 0.15(0.25) = 0.075 \]
The mean of Y is equal to the mean of X, that is 0.075.
\[ \sigma^{2}_{X} = \sum_{x}(x-\mu_{X})^{2}P(x) = (0.25)[(0 - 0.075)^{2} + (0.05 - 0.075)^{2} + (0.10 - 0.075)^{2} + (0.15 - 0.075)^{2}] = 0.003125 \]
The standard deviation of X is the square root of the variance, that is 0.0559016, or 5.59%.
\[ P(X = 0) = \sum_{y}P(0,y) = 0.25 + 0.10 = 0.35 \]
\[ P(Y = 0) = \sum_{x}P(x,0) = 0.35 + 0.20 = 0.55 \]
\[ \mu_{X} = E[X] = \sum_{x}xP(x) = (-5)(0.4) + (20)(0.6) = €10,- \]
\[ \sigma^{2}_{x} = E[(X - \mu_{X})^{2}] = \sum_{x}(x - \mu)^{2} P(x) = (-5 - 10)^{2}(0.4) + (20 - 10)^{2}(0.6) = 150 \]
Strategy a has a mean profit of E[10X] = €100,- and variance of Var(10X) = 100Var(X) = 15,000.
\[ \mu_{Y} = E[Y] = \sum_{y}yP(y) = (0)(0.6) + (25)(0.4) = €10,- \]
\[ \sigma^{2}_{Y} = E[(Y - \mu_{Y})^{2}] = \sum_{y}(y - \mu_{Y})^{2} P(y) = (0 - 10)^{2}(0.6) + (25 - 10)^{2}(0.4) = 150 \]
Strategy b has a mean profit of E[10Y] = €100,- and variance of Var(10Y) = 100Var(Y) = 15,000.
\[ E[5X + 5Y] = E[5X] + E[5Y] = 5E[X] + 5E[Y] = €100,- \]
\[ Var(5X + 5Y) = Var(5X) + Var(5Y) = 25Var(X) + 25Var(Y) = 7,500 \]
The variance of strategy c is smaller than that of strategies a and b, reflecting the decrease in risk that follows from diversification in an investment portfolio. Most investors would prefer strategy c, because it yields the same expected return as the other two strategies, but with lower risk.
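The comparison of the three strategies can be reproduced with a short Python sketch. It assumes, as the worked answer does, that X and Y are independent, so that the variance of the mixed strategy is the sum of the separate variances. The helper `mean_and_variance` is a name introduced here for illustration.

```python
def mean_and_variance(pmf):
    """Return (mean, variance) of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    var = sum((v - mean) ** 2 * p for v, p in pmf.items())
    return mean, var

# Profit per unit for the two investments, as used in the answers above.
x_pmf = {-5: 0.4, 20: 0.6}
y_pmf = {0: 0.6, 25: 0.4}
mean_x, var_x = mean_and_variance(x_pmf)
mean_y, var_y = mean_and_variance(y_pmf)

# (mean, variance) of total profit per strategy; independence of X and Y is assumed for c.
strategies = {
    "a": (10 * mean_x, 10**2 * var_x),
    "b": (10 * mean_y, 10**2 * var_y),
    "c": (5 * mean_x + 5 * mean_y, 5**2 * var_x + 5**2 * var_y),
}

for name, (mean, var) in strategies.items():
    print(name, round(mean, 2), round(var, 2))
```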
A researcher is studying the number of owl eggs found in Denmark. Is the number of eggs a discrete or continuous random variable?
Consider the uniform probability density function f(x) = 0.5 with a range of 0 to 2. What is the probability that a random variable X is between 1.4 and 1.8?
Consider the uniform probability density function f(x) = 0.5 with a range of 0 to 2. What is the probability that a random variable X is between 0.5 and 1.6?
Consider the uniform probability density function f(x) = 0.5 with a range of 0 to 2. What is the probability that a random variable X is less than 0.8?
Consider the uniform probability density function f(x) = 0.5 with a range of 0 to 2. What is the probability that a random variable X is greater than 1.3?
A homeowner estimates the heating bill based on the range of likely temperatures in January. He obtains the following linear equation: Y = 290 - 5T, in which T refers to the average temperature for the month in degrees Fahrenheit. If the average temperature in January has mean 24 and standard deviation 4, what are the mean and standard deviation of this homeowner's January heating bill?
The profit for a production process is equal to 6000 dollars minus three times the number of units produced. The mean and variance for the number of units produced are 1000 and 900 respectively. Find the mean and variance of the profit.
The profit of a particular production process is equal to €2000,- minus two times the number of units produced. The mean and variance for the number of units produced are 500 and 900 respectively. What are the mean and variance of the profit?
The profit of a particular production process is equal to €1000,- minus two times the number of units produced. The mean and variance for the number of units produced are 50 and 90 respectively. What are the mean and variance of the profit?
Consider for questions 9-15 the standard normal distribution.
P(Z < 1.16) = ?
P(Z > 1.73) = ?
P(Z > -2.29) = ?
P(Z > -1.35) = ?
P(1.16 < Z < 1.73) = ?
P(-2.29 < Z < 1.26) = ?
P(-2.29 < Z < -1.35) = ?
The probability is 0.70 that Z is less than what number?
The probability is 0.25 that Z is less than what number?
The probability is 0.2 that Z is greater than what number?
The probability is 0.6 that Z is greater than what number?
Let a continuous random variable X be normally distributed with X ~ N(30, 81). What is the probability that X is greater than 40?
The anticipated consumer demand at a restaurant can be modeled by a normal random variable with mean 1,500 pounds and standard deviation 110 pounds. What is the probability that the demand will exceed 1,300 pounds?
The scores on an achievement test are known to be normally distributed with a mean of 420 and a standard deviation of 80. What is the minimum test score needed in order to be in the top 10% of all people taking the test?
Given a random sample of size n = 900 from a binomial probability distribution with P = 0.30. Can the normal distribution be used to compute probabilities belonging to this distribution? If so, why?
Given a random sample size of n = 900 from a binomial probability distribution with P = 0.30. What is the probability that the number of successes is greater than 305?
Service times for customers at a library information desk can be modeled by an exponential distribution with a mean service of 5 minutes. What is the probability that a customer service time will take longer than 10 minutes?
A company in the Netherlands with 2000 employees has a mean number of lost-time accidents per week equal to λ = 0.4 and the number of accidents follows a Poisson distribution. What is the probability that the time between accidents is less than 2 weeks?
An investor has asked you for assistance in establishing a portfolio containing two stocks. The investor has €1000,- which can be allocated in any proportion to two alternative stocks. The returns per euro from these two investments are denoted by random variables X and Y. Both variables are normally distributed. Investment X has a mean of 25 and a variance of 81. Investment Y has a mean of 40 and a variance of 121. The two stock prices have a negative correlation, ρxy = -0.40. Define the linear equation of the value of the portfolio, denoted by W.
What is the mean value for the stock portfolio?
What is the standard deviation for the stock portfolio?
What is the probability that the portfolio value exceeds 2,000?
P(1.4 < X < 1.8) = F(1.8) - F(1.4) = (0.5)(1.8) - (0.5)(1.4) = 0.9 - 0.7 = 0.2.
P(0.5 < X < 1.6) = F(1.6) - F(0.5) = (0.5)(1.6) - (0.5)(0.5) = 0.8 - 0.25 = 0.55.
P(X < 0.8) = F(0.8) = (0.5)(0.8) = 0.40.
P(X > 1.3) = P(1.3 < X < 2.0) = F(2.0) - F(1.3) = (0.5)(2.0) - (0.5)(1.3) = 1.0 - 0.65 = 0.35.
\[ \mu_{Y} = 290 - 5\mu_{T} = 290 - (5)(24) = 170 \]
\[ \sigma_{Y} = |-5| \sigma_{T} = (5)(4) = 20 \]
\[ Y = 6000 - 3U \]
\[ \mu_{Y} = 6000 - 3\mu_{U} = 6000 - (3)(1000) = 3000 \]
\[ \sigma^{2}_{Y} = (-3)^{2}\sigma^{2}_{U} = (9)(900) = 8100 \]
Thus, the mean and variance of the profit are 3,000 dollars and 8,100 (dollars squared) respectively.
\[ Y = 2000 - 2U \]
\[ \mu_{Y} = 2000 - 2\mu_{U} = 2000 - (2)(500) = 1000 \]
\[ \sigma^{2}_{Y} = (-2)^{2}\sigma^{2}_{U} = (4)(900) = 3600 \]
Thus, the mean and variance of the profit are €1,000,- and 3,600 (euros squared) respectively.
\[ Y = 1000 - 2U \]
\[ \mu_{Y} = 1000 - 2\mu_{U} = 1000 - (2)(50) = 900 \]
\[ \sigma^{2}_{Y} = (-2)^{2}\sigma^{2}_{U} = (4)(90) = 360 \]
Thus, the mean and variance of the profit are €900,- and 360 (euros squared) respectively.
P(Z < 1.16) = 0.8770
P(Z > 1.73) = 1 - 0.9582 = 0.0418
P(Z > -2.29) = P(Z < 2.29) = 0.9890
P(Z > -1.35) = P(Z < 1.35) = 0.9115
P(1.16 < Z < 1.73) = 0.9582 - 0.8770 = 0.0812
P(-2.29 < Z < 1.26) = 0.8962 - (1 - 0.9890) = 0.8962 - 0.0110 = 0.8852
P(-2.29 < Z < -1.35) = 0.0885 - 0.0110 = 0.0775
z = 0.525
z = -0.675
z = 0.84
z = -0.253
\[ Z = \frac{X - \mu}{\sigma} = \frac{40 - 30}{\sqrt{81}} = \frac{10}{9} = 1.11 \]
P(Z > 1.11) = 1 - 0.8665 = 0.1335
\[ Z = \frac{(1300 - 1,500)}{110} = -1.82 \]
P(Z > -1.82) = 0.9656
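Table look-ups such as the one for the restaurant demand can be cross-checked with Python's standard library. The sketch below uses `statistics.NormalDist` with the mean of 1,500 and standard deviation of 110 given in the question. Small differences from the table value come from rounding z to two decimals.

```python
from statistics import NormalDist

# Demand modelled as a normal random variable (parameters from the question).
demand = NormalDist(mu=1500, sigma=110)

# Standardised value of 1,300 pounds and the probability of exceeding it.
z = (1300 - demand.mean) / demand.stdev
p_exceeds = 1 - demand.cdf(1300)

print(round(z, 2), round(p_exceeds, 4))
```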
Top 10% corresponds to z = 1.28 (the value whose cumulative probability in the Standard Normal Distribution Table is closest to 0.90).
\[ 1.28 = \frac{X - 420}{80} \]
\[ (1.28)(80) = X - 420 \]
\[ 102.4 + 420 = X\]
Thus, X = 522.4. One needs to score at least 523 to be in the top 10% of all people taking this test.
nP(1 - P) = 900*0.30(1 - 0.30) = 189 > 5, thus the binomial distribution can be approximated by the normal distribution.
\[ \mu = nP = 270 \]
\[ \sigma^{2} = 189 \]
\[ \sigma = \sqrt{189} = 13.75 \]
\[ z = \frac{305 - 270}{13.75} = 2.55 \]
P(Z > 2.55) = 1 - 0.9946 = 0.0054
\[ P(T > 10) = 1 - P(T < 10) = 1 - F(10) = 1 - (1 - e^{-(0.20)(10)}) = e^{-2.0} = 0.1353 \]
Thus, the probability that a service time exceeds 10 minutes is 0.1353.
\[ P(T < 2) = F(2) = 1 - e^{-(0.4)(2)} = 1 - e^{-0.8} = 1 - 0.4493 = 0.5507 \]
Thus, the probability of less than 2 weeks between accidents is about 55%.
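Both exponential-distribution answers follow from the cumulative distribution function F(t) = 1 - e^(-λt). A minimal Python sketch, with `expon_cdf` as an illustrative helper name:

```python
from math import exp

def expon_cdf(t, lam):
    """P(T <= t) for an exponential random variable with rate lam."""
    return 1 - exp(-lam * t)

# Library desk: mean service time of 5 minutes, so the rate is 1/5 = 0.20 per minute.
p_service_over_10 = 1 - expon_cdf(10, 1 / 5)

# Accidents: 0.4 per week on average, so the time between accidents has rate 0.4.
p_gap_under_2 = expon_cdf(2, 0.4)

print(round(p_service_over_10, 4), round(p_gap_under_2, 4))
```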
W = 20X + 30Y
\[ \mu_{W} = 20\mu_{X} + 30\mu_{Y} = 20*25 + 30*40 = 1,700 \]
\[ \sigma^{2}_{W} = 20^{2} \sigma^{2}_{X} + 30^{2} \sigma^{2}_{Y} + 2*20*30 \rho_{XY} \sigma_{X} \sigma_{Y} \]
\[ \sigma^{2}_{W} = 20^{2}*81 + 30^{2}*121 + 2*20*30*(-0.40)*9*11 = 93,780 \]
\[ \sigma = \sqrt{\sigma^{2}} = \sqrt{93,780} = 306.24 \]
\[ Z = \frac{2000 - 1700}{306.24} = 0.980 \]
P(Z > 0.980) = 0.1635
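The portfolio calculation can be reproduced as follows. This is a sketch under the assumptions of the worked answer: the portfolio is W = 20X + 30Y, and W is treated as normally distributed because it is a linear combination of jointly normal variables.

```python
from math import sqrt
from statistics import NormalDist

shares_x, shares_y = 20, 30          # weights from W = 20X + 30Y
mu_x, var_x = 25, 81
mu_y, var_y = 40, 121
rho = -0.40                          # correlation between X and Y

mu_w = shares_x * mu_x + shares_y * mu_y
var_w = (shares_x**2 * var_x
         + shares_y**2 * var_y
         + 2 * shares_x * shares_y * rho * sqrt(var_x) * sqrt(var_y))
sd_w = sqrt(var_w)

# Probability that the portfolio value exceeds 2,000.
p_exceeds = 1 - NormalDist(mu_w, sd_w).cdf(2000)

print(mu_w, round(var_w), round(sd_w, 2), round(p_exceeds, 4))
```

The negative correlation reduces the variance by 47,520 compared with the uncorrelated case, which is the diversification effect at work.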
Suppose that we know that the annual percentage salary increase is normally distributed with a mean of 12.2% and a standard deviation of 3.6%. A random sample of 9 observations is obtained from this population and the sample mean is computed. What is the standard error of the sample mean?
What is the probability that the sample mean exceeds 14.4%?
Question 2a
Given a population with a mean of 105 and a variance of 16, the central limit theorem applies when the sample size is n ≥ 25. A random sample of size 25 is obtained. What are the mean and variance of the sampling distribution for the sample means?
What is the probability that x̅ > 106?
What is the probability that 104 < x̅ < 106?
What is the probability that x̅ < 105.5?
Given a population with a mean of 150 and a variance of 1600, the central limit theorem applies when the sample size is n > 25. A random sample of size 36 is obtained. What are the mean and variance of the sampling distribution for the sample means?
What is the probability that x̅ > 155?
What is the probability that 145 < x̅ < 165?
What is the probability that x̅ > 165?
The lifetimes of light bulbs produced by a company have a mean of 1,200 hours and a standard deviation of 400 hours. The population is normally distributed. Suppose that you buy nine light bulbs, which can be regarded as a proper random sample from the population. What is the mean of the sample mean lifetime?
What is the variance of the sample mean?
What is the standard error of the sample mean?
What is the probability that, on average, those nine light bulbs have lifetimes of less than 1050 hours?
To get some feeling for possible magnitudes of the finite population correction factor, calculate it for samples of n = 20 observations from populations of members: 20, 100, 10,000.
Explain why the result found in the previous question is precisely what one should expect on intuitive grounds.
A random sample of 270 students was taken from a large population of students taking a statistics exam. If, in fact, 20% of the students fail the test, what is the probability that the sample proportion of students failing the test will be between 16 and 24%?
Now, compute the same probability for 16 to 24%, but this time use a sample of 400 students.
It has been estimated that 43% of the students drink alcohol. Find the probability that more than half of a random sample of 80 students drink alcohol.
Suppose that 50% of all adult Americans eat McDonald's once a week. What is the probability that more than 58% of a random sample of 250 adult Americans eat McDonald's once a week?
Suppose that 50% of all adult Americans eat McDonald's once a week. What is the probability that more than 55% of a random sample of 250 adult Americans eat McDonald's once a week?
Given is n = 6. Determine an upper limit for the sample variance such that the probability of exceeding this limit, given a population standard deviation of 3.6, is less than 0.05. Use the chi-square distribution to solve this problem.
Question 11a
There are six employees with the following years of experience:
2, 4, 6, 6, 7, 8
Two of these employees are to be chosen at random.
What is the mean number of years of experience for these six employees?
How many possible samples of two employees are there?
List all possible samples
Find the sampling distribution of the sample means.
What is the central limit theorem?
Suppose a population distribution is left-skewed with mean 100 and variance 15. From this population, we draw a random sample of n = 100. What is the expected mean of this sample?
What is the expected variance of this sample?
What shape is expected for the sampling distribution?
μ = 12.2; σ = 3.6; n = 9.
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.6}{\sqrt{9}} = 1.2 \]
\[ P(\bar{x} > 14.4) = P( \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} > \frac{14.4 - 12.2}{1.2} ) = P(z > 1.83) = 0.0336 \]
To conclude, the probability that the sample mean will exceed 14.4% is only 0.0336.
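The standard error and the tail probability for the sample mean can be verified with a short Python sketch using the values from the question (μ = 12.2, σ = 3.6, n = 9). The result differs slightly from the table-based 0.0336 because the table rounds z to 1.83.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.2, 3.6, 9

# Standard error of the sample mean.
se = sigma / sqrt(n)

# The sample mean is normal with mean mu and standard deviation se.
p_exceeds = 1 - NormalDist(mu, se).cdf(14.4)

print(round(se, 2), round(p_exceeds, 4))
```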
The central limit theorem applies, thus the sampling distribution has mean 105 and variance σ²/n = 16/25 = 0.64, so the standard error is √0.64 = 0.8.
\[ Z = \frac{\bar{X} - \mu_{X}}{\sigma_{\bar{X}}} = \frac{106 - 105}{0.8} = 1.25\]
P(Z > 1.25) = 1 - 0.8944 = 0.1056
\[ Z = \frac{\bar{X} - \mu_{X}}{\sigma_{\bar{X}}} = \frac{104 - 105}{0.8} = -1.25\]
P(104 < x̅ < 106) = P(-1.25 < z < 1.25) = 0.8944 - (1 - 0.8944) = 0.7888
\[ Z = \frac{\bar{X} - \mu_{X}}{\sigma_{\bar{X}}} = \frac{105.5 - 105}{0.8} = 0.625\]
P(Z < 0.625) ≈ 0.7340
The central limit theorem applies, thus the mean of the sampling distribution is 150 and the variance is σ²/n = 1600/36 = 44.44, so the standard error is 40/6 = 6.667.
\[ Z = \frac{\bar{X} - \mu_{X}}{\sigma_{\bar{X}}} = \frac{155 - 150}{6.667} = 0.75\]
P(Z > 0.75) = 1 - 0.7734 = 0.2266
\[ Z = \frac{\bar{X} - \mu_{X}}{\sigma_{\bar{X}}} = \frac{145 - 150}{6.667} = -0.75\]
\[ Z = \frac{\bar{X} - \mu_{X}}{\sigma_{\bar{X}}} = \frac{165 - 150}{6.667} = 2.25\]
P(145 < x̅ < 165) = P(-0.75 < z < 2.25) = 0.9878 - (1 - 0.7734) = 0.9878 - 0.2266 = 0.7612
P(x̅ > 165) = 1 - 0.9878 = 0.0122
The population is normally distributed. Therefore, the sampling distribution of the sample means is normal. Hence, the mean of the sampling distribution is 1,200.
The variance is σ²/n = (400)²/9 = 17,777.78.
The standard error is σ/√n = 400/√9 = 133.33.
\[ Z = \frac{\bar{X} - \mu_{X}}{\sigma_{\bar{X}}} = \frac{1050 - 1200}{133.33} = -1.1250\]
P(x̅ < 1050) = P(Z < -1.1250) = 1 - (0.8686 + 0.8708)/2 = 1 - 0.8697 = 0.1303
The finite population correction factor is calculated as follows: (N - n)/(N - 1).
The population correction factor for sample size n = 20 for a population with 20 members is: (20 - 20)/(20 - 1) = 0.
The population correction factor for sample size n = 20 for a population with 100 members is: (100 - 20)/(100 - 1) = 0.8081.
The population correction factor for sample size n = 20 for a population with 10,000 members is: (10,000 - 20)/(10,000 - 1) = 0.9981.
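The finite population correction factor (N - n)/(N - 1) is easy to tabulate for different population sizes. A minimal Python sketch, with `fpc` as an illustrative helper name:

```python
def fpc(population_size, sample_size):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (population_size - sample_size) / (population_size - 1)

n = 20
for N in (20, 100, 10_000):
    print(N, round(fpc(N, n), 4))
```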
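When n = N, the sample contains the whole population, so the sample mean equals the population mean and its variance must be zero, which is exactly what a correction factor of 0 gives. As the population grows relative to the sample, the factor approaches 1 and the correction becomes negligible: it is then the absolute sample size, not the fraction of the population in the sample, that determines the precision of the results from a random sample.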
P = 0.20 and n = 270.
\[ \sigma_{\hat{p}} = \sqrt{ \frac{P(1 - P)}{n} } = \sqrt{\frac{0.20(1 - 0.20)}{270} } = 0.024 \]
The required probability is:
\[ P(0.16 < \hat{p} < 0.24) = P( \frac{0.16 - 0.20}{0.024} < Z < \frac{0.24 - 0.20}{0.024} ) \]
P(-1.67 < Z < 1.67) = 0.9525 - (1 - 0.9525) = 0.9050
Thus, we see that the probability is 0.9050 that the sample proportion is within the interval [0.16 - 0.24] given P = 0.20 and sample size n = 270. This interval can be called a 90.50% acceptance interval. Note that, if the sample proportion was actually outside this interval, we may suspect that the population proportion P is not 0.20.
P = 0.20; n = 400.
\[ \sigma_{\hat{p}} = \sqrt{ \frac{P(1 - P)}{n} } = \sqrt{\frac{0.20(1 - 0.20)}{400} } = 0.0200 \]
The required probability is:
\[ P(0.16 < \hat{p} < 0.24) = P( \frac{0.16 - 0.20}{0.0200} < Z < \frac{0.24 - 0.20}{0.0200} ) \]
P(-2.00 < Z < 2.00) = 0.9772 - (1 - 0.9772) = 0.9544
This interval can thus be called a 95.44% acceptance interval (given P = 0.20 and sample size n = 400).
P = 0.43; n = 80.
\[ \sigma_{\hat{p}} = \sqrt{ \frac{P(1 - P)}{n} } = \sqrt{\frac{0.43(1 - 0.43)}{80} } = 0.055 \]
\[ P(\hat{p} > 0.50) = P(Z > \frac{0.50 - 0.43}{0.055}) \]
P (Z > 1.27) = 0.1020
P = 0.50; n = 250
\[ \sigma_{\hat{p}} = \sqrt{ \frac{P(1 - P)}{n} } = \sqrt{\frac{0.50(1 - 0.50)}{250} } = 0.0316 \]
\[ P(\hat{p} > 0.58) = P(Z > \frac{0.58 - 0.50}{0.0316}) = P(Z > 2.53) \]
P (Z > 2.53) = 1 - 0.9943 = 0.0057
\[ P(\hat{p} > 0.55) = P(Z > \frac{0.55 - 0.50}{0.0316}) = P(Z > 1.58) \]
P (Z > 1.58) = 1 - 0.9429 = 0.0571
n = 6; σ² = (3.6)² = 12.96.
Using the chi-square distribution, we can state that:
\[ P(s^{2} > K) = P ( \frac{(n - 1) s^{2}}{12.96} > 11.070) = 0.05 \]
where K is the desired upper limit and χ²(5, 0.05) = 11.070 is the upper 0.05 critical value of the chi-square distribution with n - 1 = 5 degrees of freedom. The required upper limit for s² is obtained by solving:
\[ \frac{(n - 1)K}{12.96} = 11.070 \]
\[ K = \frac{(11.070)(12.96)}{(6 - 1)} = 28.69 \]
Thus, if the sample variance s² from a random sample of size n = 6 exceeds 28.69, there is strong evidence to suspect that the population variance exceeds 12.96.
\[ \mu = \frac{2 + 4 + 6 + 6 + 7 + 8}{6} = 5.5 \]
Two of these employees are to be chosen randomly. We are sampling without replacement; thus, the first observation has a probability of 1/6 of being selected, while the second observation has a probability of 1/5 of being selected. Fifteen possible random samples of two employees could be selected. Note that some samples (such as 2, 6) occur twice because there are two employees with six years of experience in the population.
2 4
2 6 (2x)
2 7
2 8
4 6 (2x)
4 7
4 8
6 6
6 7 (2x)
6 8 (2x)
7 8
Sample mean | Probability of sample mean |
3.0 | 1/15 |
4.0 | 2/15 |
4.5 | 1/15 |
5.0 | 3/15 |
5.5 | 1/15 |
6.0 | 2/15 |
6.5 | 2/15 |
7.0 | 2/15 |
7.5 | 1/15 |
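The sampling distribution in the table above can be generated by enumerating all samples of two employees. The following Python sketch uses exact fractions so that the probabilities match the table. Note that `Fraction` prints values in lowest terms, so 3/15 appears as 1/5.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

experience = [2, 4, 6, 6, 7, 8]   # years of experience of the six employees

# All unordered samples of two distinct employees (sampling without replacement).
samples = list(combinations(experience, 2))

# Count how often each sample mean occurs and convert the counts to probabilities.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# The expected value of the sample mean equals the population mean.
expected_mean = sum(m * p for m, p in distribution.items())

for m, p in distribution.items():
    print(float(m), p)
print("E[sample mean] =", float(expected_mean))
```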
The central limit theorem states that, if the sample size is large enough, the mean of a random sample drawn from a population with any probability distribution will be approximately normally distributed with mean μ and variance σ²/n.
100
σ²/n = 15/100 = 0.15
According to the central limit theorem, we expect that, as n becomes large, the sampling distribution of the sample mean approaches a normal distribution, even though the population itself is left-skewed.
Let x1, x2, ..., xn be a random sample from a normally distributed population with mean μ and variance σ². Assuming that a population is normally distributed with a very large population size compared to the sample size, should the sample mean or the sample median be used to estimate the population mean?
Give one advantage of the median over the mean for estimating a population mean.
Give one disadvantage of the median in comparison to the mean for estimating a population mean.
Which two properties should an estimator possess?
Suppose that shopping times for customers at a local mall follow a normal distribution. The population standard deviation is equal to 20 minutes. A random sample of 64 shoppers in the local grocery store has a mean time of 75 minutes. What is the standard error?
What is the margin of error?
What is the 95% confidence interval for the population mean μ?
Give an interpretation of this confidence interval.
How can the margin of error be reduced?
What distribution is used when the population variance is known?
What distribution is used when the population variance is unknown?
Find the standard error for n = 17 and s = 16.
Find the upper critical value of Student's t distribution with v = 23 degrees of freedom for α = 0.05.
From a random sample of 344 employees, it was found that 261 were in favor of a modified bonus plan. What is the sample proportion?
What is the reliability factor for a 90% confidence interval?
What is the margin of error for a 90% confidence interval?
Provide the 90% confidence interval.
Interpret the 90% confidence level.
What is the number that is exceeded with probability 0.10 by a chi-square random variable with 4 degrees of freedom?
What is the number that is exceeded with probability 0.05 by a chi-square random variable with 18 degrees of freedom?
The following information is provided: n = 25, s2 = 100. What are the critical values for a 95% confidence interval with α = 0.05?
Use the information provided in the previous question. Find the 95% confidence interval for the population variance.
Suppose there are 1,395 secondary schools in the Netherlands. From a simple random sample of 400 of these schools, it was found that the sample mean enrollment during the past year in biology courses was 320.8 students, and the sample standard deviation was found to be 149.7 students. What is the point estimate for the population total, Nμ?
Find the corresponding 99% confidence interval for this population total.
From a simple random sample of 400 of the 1,395 schools in our population, it is found that biology was a two-semester course in 141 of the sampled schools. Estimate the proportion of all schools for which the biology course is two semesters long.
Provide the confidence interval for the proportion of all schools for which the biology course is two semesters long.
Suppose we have: ME = 0.50; σ = 1.8; and zα/2 = z0.005 = 2.576. What is the required sample size for a 99% confidence interval?
It is given that ME = 0.06 and zα/2 = z0.025 = 1.96. What is the required sample size?
Suppose that an opinion survey is conducted about the presidential election. The survey was said to have a 3% margin of error. The implication is that a 95% confidence interval for the population proportion holding a particular opinion is the sample proportion plus or minus 3%. How many citizens of voting age need to be sampled to obtain this 3% margin of error?
Suppose that a simple random sample of the 1,395 Dutch secondary schools is taken. Whatever the true proportion, a 95% confidence interval must extend no further than 0.04 on each side of the sample proportion. How many sample observations should be taken?
The sample mean should be used. For a normally distributed population, both the sample mean and the sample median are unbiased estimators of the population mean, but the sample mean has the smaller variance and is therefore the more efficient estimator.
The median gives less weight to extreme observations and, thus, is less sensitive to outliers.
The relative efficiency of the median is lower than that of the mean.
Unbiasedness and being the most efficient.
Standard error = σ/√n = 20/√64 = 2.5
The margin of error = zα/2 * (σ/√n) = 1.96*2.5 = 4.9
The 95% confidence interval runs from 75 - 4.9 to 75 + 4.9, that is: [70.1; 79.9].
In the long run, 95% of the intervals found in this manner contain the true value of the population mean.
Decrease the population standard deviation, or increase the sample size, or decrease the confidence level.
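The shopping-time interval can be reproduced with a short Python sketch using the values from the question (σ = 20, n = 64, sample mean 75). The critical value is computed with `statistics.NormalDist` rather than read from a table, so the margin of error differs from 4.9 only in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, x_bar = 20, 64, 75
confidence_level = 0.95

# Reliability factor z_{alpha/2} for the chosen confidence level.
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

standard_error = sigma / sqrt(n)
margin_of_error = z * standard_error
interval = (x_bar - margin_of_error, x_bar + margin_of_error)

print(standard_error, round(margin_of_error, 2), [round(v, 1) for v in interval])
```

Changing `n`, `sigma`, or `confidence_level` in this sketch shows directly how each of the three factors affects the margin of error.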
The standard normal distribution (z distribution).
Student's t distribution.
Standard error = s/√n = 16/√17 = 3.88
Use Table 8 (Appendix) to find that the upper critical value is 1.714.
\[ \hat{p} = 261/344 = 0.759 \]
\[ z_{\alpha/2} = z_{0.05} = 1.645\]
\[ ME = 1.645 \sqrt{\frac{(0.759)(0.241)}{344}} = 0.038 \]
0.759 +/- 0.038 = [0.721; 0.797]
Imagine taking a very large number of independent random samples of size n = 344 from this population, and, calculating a 90% confidence interval for each sample result. Then, the confidence level of the interval implies that in the long run 90% of the intervals found in this manner contain the true value of the population proportion.
7.779
28.869
\[ \chi^{2}_{n-1,1-\alpha/2} = \chi^{2}_{24,0.975} = 12.401 \]
\[ \chi^{2}_{n-1,\alpha/2} = \chi^{2}_{24,0.025} = 39.364 \]
\[ LCL = \frac{(n - 1) s^{2}}{\chi^{2}_{n - 1,\alpha/2} } = \frac{(24)(100)}{39.364} = 60.97 \]
\[ UCL = \frac{(n - 1) s^{2}}{\chi^{2}_{n - 1,1 - \alpha/2} } = \frac{(24)(100)}{12.401} = 193.53 \]
Hence, the 95% confidence interval is: [60.97; 193.53]
Nx̄ = (1,395)(320.8) = 447,516. Thus, we estimate a total of 447,516 students to be enrolled in biology courses.
\[ N\hat{\sigma}_{\bar{x}} = \frac{Ns}{\sqrt{n}} \sqrt{ \frac{N - n}{N - 1} } = \frac{(1,395)(149.7)}{\sqrt{400}} \sqrt{ \frac{995}{1,394} } = 8,821.6 \]
Because the sample size is large, we can use the central limit theorem with zα/2 = 2.58 for a 99% confidence interval. Hence:
\[ N\bar{x} \pm z_{\alpha/2} N \hat{\sigma}_{\bar{x}} \]
\[ 447,516 \pm 2.58(8,821.6) \]
\[ 447,516 \pm 22,760 \]
Thus, the 99% confidence interval runs from 424,756 to 470,276 students.
N = 1,395; n = 400.
\[ \hat{p} = \frac{141}{400} = 0.3525 \]
The point estimate of the population proportion P is simply equal to this sample proportion, that is: 0.3525.
\[ \hat{\sigma}^{2}_{\hat{p}} = \frac{\hat{p} (1 - \hat{p})}{n - 1} ( \frac{N - n}{N - 1} ) = \frac{(0.3525)(0.6475)}{399} ( \frac{995}{1,394} ) = 0.0004083 \]
so
\[ \hat{\sigma}_{\hat{p}} = \sqrt{0.0004083} = 0.0202 \]
For a 90% confidence interval: zα/2 = 1.645.
\[ ME = z_{\alpha/2} \hat{\sigma}_{\hat{p}} = 1.645(0.0202) ≅ 0.0332 \]
Thus, the 90% confidence interval is 0.3525 ± 0.0332, that is, from 31.93% to 38.57%.
\[ n = \frac{z^{2}_{\alpha/2} \sigma^{2}}{ME^{2}} = \frac{ (2.576)^{2} (1.8)^{2} }{(0.5)^{2}} ≈ 86 \]
\[ n = \frac{0.25 (z_{\alpha/2})^{2}}{(ME)^{2}} = \frac{0.25(1.96)^{2}}{(0.06)^{2}} = 266.78 \Rightarrow n = 267 \]
\[ n = \frac{0.25 (z_{\alpha/2})^{2}}{(ME)^{2}} = \frac{(0.25)(1.96)^{2}}{(0.03)^{2}} = 1067.11 \Rightarrow n = 1068 \]
\[ 1.96 \sigma_{\hat{p}} = 0.04 \]
\[ \sigma_{\hat{p}} = 0.020408 \]
\[ n_{max} = \frac{0.25N}{(N - 1) \sigma^{2}_{\hat{p}} + 0.25 } = \frac{(0.25)(1,395)}{(1,394)(0.020408)^{2} + 0.25} = 419.88 \Rightarrow n = 420 \]
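The four sample-size answers above all follow the same pattern: compute the formula and round up to the next whole observation. A minimal Python sketch, in which the three function names are introduced here purely for illustration:

```python
from math import ceil

def n_for_mean(z, sigma, me):
    """Sample size for estimating a mean with known sigma: z^2 sigma^2 / ME^2."""
    return ceil((z * sigma / me) ** 2)

def n_for_proportion(z, me):
    """Worst-case sample size for a proportion: 0.25 z^2 / ME^2."""
    return ceil(0.25 * z**2 / me**2)

def n_for_proportion_finite(z, me, population_size):
    """Worst-case sample size for a proportion from a finite population of size N."""
    sigma_p = me / z   # required standard error of the sample proportion
    return ceil(0.25 * population_size / ((population_size - 1) * sigma_p**2 + 0.25))

print(n_for_mean(2.576, 1.8, 0.50))
print(n_for_proportion(1.96, 0.06))
print(n_for_proportion(1.96, 0.03))
print(n_for_proportion_finite(1.96, 0.04, 1395))
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target.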
The following information is provided for a dependent random sample from two normally distributed populations:
\[ n = 11 \hspace{3mm} \bar{d} = 28.5 \hspace{3mm} s_{d} = 3.3 \]
Find the 98% confidence interval for the difference between the means of the two populations.
What is the margin of error for a 98% confidence interval for the difference between the means of the two populations?
What do you conclude based on the confidence interval found in question 1a?
Consider the following data:
Before | After |
6 | 8 |
12 | 14 |
8 | 9 |
10 | 13 |
6 | 7 |
What type of dependent sample is depicted here?
What is the sample mean of the differences?
It is given that the mean difference is equal to 7.7 with standard deviation sd = 43.68901. Compute the 95% confidence interval using the normal approximation.
An educational study is conducted to examine the effectiveness of a mathematics reading program for elementary-school-age children. Each child was given a pre- and posttest. Higher scores indicate improvement in mathematics. From a very large population, a random sample was drawn. The data obtained from this sample are provided in the table below. What is the mean difference score?
Child | Pretest Score | Posttest score |
1 2 3 4 5 6 7 | 40 36 32 38 33 | 48 42 36 |
What is the standard deviation of the difference scores?
Find the t value corresponding to a 95% confidence interval.
Compute a 95% confidence interval.
Can we conclude, based on this 95% confidence interval, that there is a significant improvement in mathematics?
Compute a 95% confidence interval using the normal approximation.
What do we conclude based on this interval?
A study regarding students' GPAs was conducted. From a very large university, independent random samples of 120 students majoring in economics and 90 students majoring in finance were selected. The mean GPA for the random sample of economics majors was found to be 3.08. The mean GPA for the random sample of finance majors was found to be 2.88. From similar past studies, the population standard deviation is 0.42 for the economics majors and 0.64 for the finance majors. Denote the population mean for economics by μx and the population mean for finance by μy. With which scenario are we dealing here?
Compute the 95% confidence interval for the difference score for the information provided in the previous question.
What do we conclude based on this 95% confidence interval (from question 4b)?
Consider the following data:
X | 100 | 125 | 135 | 128 | 140 | 142 | 128 | 137 | 156 | 142 |
Y | 95 | 87 | 100 | 75 | 110 | 105 | 85 | 95 |
Suppose these are independent samples with unknown variances, but the variances are assumed to be equal. Give nx, ny, x̄, ȳ, and the sample variances s²x and s²y.
Compute the pooled variance.
What are the degrees of freedom?
Find the t value corresponding to a 95% confidence interval.
Compute a 95% confidence interval.
Assuming equal population variances, determine the number of degrees of freedom for:
nx = 16; s²x = 30
ny = 9; s²y = 36
Compute the pooled sample variance for the information provided in the previous question.
Assuming equal population variances, determine the number of degrees of freedom for:
nx = 12; s²x = 30
ny = 14; s²y = 36
Compute the pooled sample variance for the information provided in the previous question.
Assuming equal population variances, determine the number of degrees of freedom for:
nx = 20; s²x = 16
ny = 8; s²y = 25
Compute the pooled sample variance for the information provided in the previous question.
The following information is provided:
\[ n_{x} = 120; \hat{p}_{x} = 0.892 \]
\[ n_{y} = 141; \hat{p}_{y} = 0.518 \]
Compute a 95% confidence interval for the population difference (Px - Py).
Calculate the margin of error for a 95% confidence interval with:
\[ n_{x} = 300; \hat{p}_{x} = 0.62 \]
\[ n_{y} = 350; \hat{p}_{y} = 0.72 \]
Calculate the margin of error for a 95% confidence interval with:
\[ n_{x} = 100; \hat{p}_{x} = 0.44 \]
\[ n_{y} = 150; \hat{p}_{y} = 0.55 \]
\[ \bar{d} \pm t_{n-1,\alpha/2} \frac{s_{d}}{\sqrt{n}} = 28.5 \pm 2.764 \frac{3.3}{\sqrt{11}} = 28.5 \pm 2.7502 \]
The 98% confidence interval is: [25.75; 31.25].
ME = 2.7502
Based on these sample data we conclude that there is sufficient evidence to suggest that there is a significant difference between the two populations.
Repeated measurements
\[ \bar{d} = \frac{2 + 2 + 1 + 3 + 1}{5} = 1.8 \]
Using the normal approximation (with n = 140) we have tn-1,α/2 = t139,0.025 ≅ 1.96.
\[ \bar{d} \pm t_{n-1,\alpha/2} \frac{s_{d}}{\sqrt{n}} \]
\[ 7.7 \pm 1.96 \frac{43.68901}{\sqrt{140}} \]
\[ 7.7 \pm 7.2 \]
This results in the following 95% confidence interval: [0.5; 14.9]
\[ \bar{d} = \frac{8 + 6 -2 + 5 + 10}{5} = 5.4 \]
sd ≅ 4.56
t4,0.025 = 2.776
\[ \bar{d} \pm t_{n-1,\alpha/2} \frac{s_{d}}{\sqrt{n}} \]
\[ 5.4 \pm 2.776 \frac{4.56}{\sqrt{5}} \]
\[ 5.4 \pm 5.6620 \]
The 95% confidence interval is: [-0.26; 11.06]
No, because the zero is within the range of the confidence interval. Thus, there is insufficient evidence to conclude that there is a significant difference.
Using the normal approximation, we replace t by z, that is: z = 1.96.
\[ 5.4 \pm 1.96 \frac{4.56}{\sqrt{5}} \]
\[ 5.4 \pm 3.9976 \]
The 95% confidence interval is: [1.40; 9.40]
Based on the 95% confidence interval computed by the normal approximation, we would conclude that there is a significant improvement in the mathematics scores. Note, however, that we are dealing with a dependent sample here (repeated measures). Therefore, the normal approximation is not a valid procedure. It is, however, important to see the difference the distribution can make on the statistical inferences.
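The comparison between the t-based and z-based intervals can be reproduced with a short Python sketch. It assumes the five paired differences listed above and takes the critical values (2.776 and 1.96) as given from the tables; the function name `paired_confidence_interval` is only illustrative.

```python
import math
import statistics


def paired_confidence_interval(differences, critical_value):
    """Return (lower, upper) bounds for the mean of the paired differences."""
    n = len(differences)
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)  # sample standard deviation (divides by n - 1)
    margin = critical_value * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


differences = [8, 6, -2, 5, 10]

# Critical values are copied from the tables, not computed here.
t_interval = paired_confidence_interval(differences, 2.776)  # t with 4 df, alpha/2 = 0.025
z_interval = paired_confidence_interval(differences, 1.96)   # normal approximation

print("t interval:", t_interval)
print("z interval:", z_interval)
```

With only five pairs, the t interval includes zero while the z interval does not, which is why the choice of distribution changes the conclusion.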
A. population variances known.
\[ (\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\frac{\sigma^{2}_{x}}{n_{x}} + \frac{\sigma^{2}_{y}}{n_{y}}} \]
\[ (3.08 - 2.88) \pm 1.96 \sqrt{ \frac{(0.42)^{2}}{120} + \frac{(0.64)^{2}}{90} } = 0.20 \pm 0.1521 \]
Thus, the 95% interval extends from 0.0479 to 0.3521
The confidence interval does not comprise the zero, thus we conclude that there is a significant difference in the mean GPA of students majoring in economics and students majoring in finance. More precisely, on average, the mean GPA of students majoring in economics is higher than the GPA of students majoring in finance.
nx = 10; x̄ = 133.30; s²x = 218.0111
ny = 8; ȳ = 94.00; s²y = 129.4286
\[ s^{2}_{p} = \frac{ (n_{x} - 1)s^{2}_{x} + (n_{y} - 1)s^{2}_{y} }{n_{x} + n_{y} - 2} = \frac{(10 - 1)(218.0111) + (8 - 1)(129.4286) }{10 + 8 - 2} = 179.2563 \]
The degrees of freedom are given by: nx + ny - 2 = 10 + 8 - 2 = 16
t16,0.025 = 2.12
\[ (\bar{x} - \bar{y}) \pm t_{n_{x} + n_{y} - 2, \alpha/2} \sqrt{\frac{s^{2}_{p}}{n_{x}} + \frac{s^{2}_{p}}{n_{y}}} \]
\[ 39.3 \pm (2.12) \sqrt{ \frac{179.2563}{10} + \frac{179.2563}{8} } \]
\[ 39.3 \pm 13.46 \]
Thus, the 95% confidence interval is: [25.84; 52.76]
df = nx + ny - 2 = 16 + 9 - 2 = 23
\[ s^{2}_{p} = \frac{ (n_{x} - 1)s^{2}_{x} + (n_{y} - 1)s^{2}_{y} }{n_{x} + n_{y} - 2} \]
\[ s^{2}_{p} = \frac{ (16-1)30 + (9 - 1)36}{16 + 9 - 2} = \frac{738}{23} = 32.09 \]
df = nx + ny - 2 = 12 + 14 - 2 = 24
\[ s^{2}_{p} = \frac{ (12-1)30 + (14 - 1)36}{12 + 14 - 2} = \frac{798}{24} = 33.25 \]
df = nx + ny - 2 = 20 + 8 - 2 = 26
\[ s^{2}_{p} = \frac{ (20-1)16 + (8 - 1)25}{20 + 8 - 2} = \frac{479}{26} = 18.42 \]
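The three pooled-variance answers above follow the same pattern, so they can be checked with one small helper. This is a minimal sketch; the function name `pooled_variance` is an assumption made for illustration.

```python
def pooled_variance(n_x, var_x, n_y, var_y):
    """Return (pooled sample variance, degrees of freedom) under equal population variances."""
    df = n_x + n_y - 2
    s2_p = ((n_x - 1) * var_x + (n_y - 1) * var_y) / df
    return s2_p, df


# (n_x, s2_x, n_y, s2_y) for the three exercises
exercises = [(16, 30, 9, 36), (12, 30, 14, 36), (20, 16, 8, 25)]

for n_x, var_x, n_y, var_y in exercises:
    s2_p, df = pooled_variance(n_x, var_x, n_y, var_y)
    print(f"df = {df}, pooled variance = {s2_p:.2f}")
```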
\[ (\hat{p}_{x} - \hat{p}_{y}) \pm z_{\alpha/2} \sqrt{ \frac{ \hat{p}_{x} (1 - \hat{p}_{x} ) }{n_{x}} + \frac{ \hat{p}_{y} (1 - \hat{p}_{y} ) }{n_{y}} } \]
\[ (0.892 - 0.518) \pm 1.96 \sqrt{ \frac{(0.892)(0.108)}{120} + \frac{(0.518)(0.482)}{141} } \]
From this, it follows that the 95% confidence interval runs from 0.274 to 0.473.
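As a check on the interval above, the following sketch computes the margin of error and the interval for a difference between two proportions. The function name and the default z = 1.96 are illustrative choices; the inputs are the sample values from the question.

```python
import math


def proportion_difference_interval(n_x, p_x, n_y, p_y, z=1.96):
    """Return (lower, upper, margin of error) for the difference p_x - p_y."""
    standard_error = math.sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    margin = z * standard_error
    difference = p_x - p_y
    return difference - margin, difference + margin, margin


lower, upper, margin = proportion_difference_interval(120, 0.892, 141, 0.518)
print(f"interval from {lower:.4f} to {upper:.4f}, margin of error {margin:.4f}")
```

The same function returns the margin of error on its own for the exercises that ask only for ME.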
\[ ME = z_{\alpha/2} \sqrt{ \frac{ \hat{p}_{x} (1 - \hat{p}_{x} ) }{n_{x}} + \frac{ \hat{p}_{y} (1 - \hat{p}_{y} ) }{n_{y}} } \]
\[ 1.96 \sqrt{ \frac{(0.62)(0.38)}{300} + \frac{(0.72)(0.28)}{350} } \]
ME = 0.0723
\[ ME = z_{\alpha/2} \sqrt{ \frac{ \hat{p}_{x} (1 - \hat{p}_{x} ) }{n_{x}} + \frac{ \hat{p}_{y} (1 - \hat{p}_{y} ) }{n_{y}} } \]
\[ 1.96 \sqrt{ \frac{(0.44)(0.56)}{100} + \frac{(0.55)(0.45)}{150} } \]
ME = 0.1257
The following information is provided for a dependent random sample from two normally distributed populations:
\[ n = 11 \hspace{3mm} \bar{d} = 28.5 \hspace{3mm} s_{d} = 3.3 \]
Find the 98% confidence interval for the difference between the means of the two populations.
Kees wants to use the results of a random sample market survey to seek strong evidence that his brand of cereal has more than 20% of the total market. Formulate the null hypothesis and alternative hypothesis using P as the population proportion.
Is the alternative hypothesis you formulated a one-sided or two-sided composite alternative hypothesis?
A car factory has proposed a process to monitor the diameter of pistons on a regular schedule. They want to test whether the diameter is equal to 3800. Formulate the null hypothesis and alternative hypothesis.
What is a type I error?
What is a type II error?
A random sample is obtained from a population with variance σ2 = 625. The sample mean is computed. Test the null hypothesis H0: μ = 100 versus the alternative hypothesis H1: μ > 100 with α = 0.05. Compute the critical value x̅c and state your decision rule regarding a sample size of n = 25.
Do the same for n = 16.
Do the same for n = 44.
Do the same for n = 32.
A random sample of n = 25 is obtained from a population with known variance. The sample mean is computed. Test the null hypothesis: H0: μ = 120 versus the alternative hypothesis H1: μ > 120 with α = 0.10. Compute the critical value x̅c and state your decision rule regarding the population variance σ2 = 196.
Do the same for σ2 = 625.
Do the same for σ2 = 900.
Do the same for σ2 = 500.
Test the hypotheses H0: μ = 100 and H1: μ > 100, using a random sample of n = 31, a probability of type I error equal to 0.05 and the following sample statistics: x̅ = 108; s = 20.
Test the hypotheses H0: μ = 100 and H1: μ > 100, using a random sample of n = 31, a probability of type I error equal to 0.05 and the following sample statistics: x̅ = 104; s = 10.
Test the hypotheses H0: μ = 100 and H1: μ > 100, using a random sample of n = 31, a probability of type I error equal to 0.05 and the following sample statistics: x̅ = 96; s = 10.
Mention four conditions that will raise the power function.
Suppose, we find the probability of a type II error involved in failing to reject the null hypothesis when the true proportion is 0.56 to be β = 0.31 using a significance level of α = 0.05. What is the power?
Suppose, we find the probability of a type II error involved in failing to reject the null hypothesis when the true proportion is 0.66 to be β = 0.25 using a significance level of α = 0.10. What is the power?
A random sample of 20 products is obtained, and the weight of each product is measured. The sample variance is computed to be 6.62. The hypothesis is tested that the variance of the product weights does not exceed the standard of σ20 = 4. Formulate the null hypothesis and alternative hypothesis.
What are the degrees of freedom?
What is the critical value?
What is the test statistic?
Based on these sample data, can we reject the null hypothesis?
Suppose we are testing the following hypotheses:
H0: μ ≤ 100
H1: μ > 100
using a random sample of n = 49, a probability of type I error equal to 0.05.
Suppose the population variances are unknown, what distribution should you use?
Test the hypotheses using the following sample statistics: x̅ = 108; s = 20
Test the hypotheses using the following sample statistics: x̅ = 104; s = 10
Test the hypotheses using the following sample statistics: x̅ = 96; s = 10
Test the hypotheses using the following sample statistics: x̅ = 95; s = 8
H0: P = 0.20
H1: P > 0.20
A one-sided composite alternative hypothesis.
H0: μ = 3800
H1: μ ≠ 3800
A type I error refers to rejecting the null hypothesis, while the null hypothesis is true.
A type II error refers to failing to reject the null hypothesis, while the null hypothesis is false.
For a one-sided hypothesis test with significance level α = 0.05, the value of zα = 1.645 from the standard normal table. The variance is 625, thus the standard deviation is √625 = 25.
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 100 + 1.645 \times (25 / \sqrt{25}) = 108.23 \]
The decision rule is: reject H0 if x̅ > 108.23
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 100 + 1.645 \times (25 / \sqrt{16}) = 110.28 \]
The decision rule is: reject H0 if x̅ > 110.28
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 100 + 1.645 \times (25 / \sqrt{44}) = 106.20 \]
The decision rule is: reject H0 if x̅ > 106.20
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 100 + 1.645 \times (25 / \sqrt{32}) = 107.27 \]
The decision rule is: reject H0 if x̅ > 107.27
For a one-sided hypothesis test with significance level α = 0.10, the value of zα = 1.282 from the standard normal table. The variance is 196, thus the standard deviation is √196 = 14.
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 120 + 1.282 \times (14 / \sqrt{25}) = 123.59 \]
The decision rule is: reject H0 if x̅ > 123.59
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 120 + 1.282 \times (\sqrt{625} / \sqrt{25}) = 126.41 \]
The decision rule is: reject H0 if x̅ > 126.41
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 120 + 1.282 \times (\sqrt{900} / \sqrt{25}) = 127.69 \]
The decision rule is: reject H0 if x̅ > 127.69
\[ x_{c} = \mu_{0} + z_{\alpha} \sigma/\sqrt{n} = 120 + 1.282 \times (\sqrt{500} / \sqrt{25}) = 125.73 \]
The decision rule is: reject H0 if x̅ > 125.73
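The critical sample mean for an upper-tail test with known variance can be computed directly from the formula used above. The sketch below is illustrative: the function name is an assumption, and z is obtained from `statistics.NormalDist` in the Python standard library instead of a printed table, so the last digit may differ slightly from table-based answers.

```python
import math
from statistics import NormalDist


def critical_sample_mean(mu_0, variance, n, alpha):
    """Upper-tail critical value: mu_0 + z_alpha * sigma / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # e.g. about 1.645 for alpha = 0.05
    return mu_0 + z_alpha * math.sqrt(variance) / math.sqrt(n)


# H0: mu = 120 against H1: mu > 120, n = 25, alpha = 0.10
for variance in (196, 625, 900, 500):
    x_c = critical_sample_mean(120, variance, 25, 0.10)
    print(f"variance = {variance}: reject H0 if the sample mean exceeds {x_c:.2f}")
```

A larger population variance pushes the critical value further from μ0, while a larger sample size pulls it closer.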
t30,0.05 = 1.697
\[ t = \frac{\bar{x} - \mu_{0}}{s / \sqrt{n}} = \frac{108 - 100}{20 / \sqrt{31}} = 2.23 \]
Thus, t > t30,0.05. Based on this result, we reject the null hypothesis in favor of the alternative hypothesis.
t30,0.05 = 1.697
\[ t = \frac{\bar{x} - \mu_{0}}{s / \sqrt{n}} = \frac{104 - 100}{10 / \sqrt{31}} = 2.23 \]
Thus, t > t30,0.05. Based on this result, we reject the null hypothesis in favor of the alternative hypothesis.
The t value is the same as in the previous question, because both the numerator and the denominator are half of their original values, which yields the same ratio.
t30,0.05 = 1.697
\[ t = \frac{\bar{x} - \mu_{0}}{s / \sqrt{n}} = \frac{96 - 100}{10 / \sqrt{31}} = -2.23 \]
Thus, t < t30,0.05. Because we are testing a one-sided alternative hypothesis with H1: μ > μ0, here, we cannot reject the null hypothesis (be aware that the sample mean is lower than the parameter of interest, rather than higher than the parameter).
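The three one-sample t tests above can be sketched in a few lines of Python. The critical value 1.697 is taken from the table as quoted in the answers; the function names are illustrative assumptions.

```python
import math


def one_sample_t(sample_mean, mu_0, s, n):
    """t statistic for H0: mu = mu_0 with unknown population variance."""
    return (sample_mean - mu_0) / (s / math.sqrt(n))


def reject_upper_tail(t_stat, t_critical):
    """Decision rule for H1: mu > mu_0."""
    return t_stat > t_critical


T_CRITICAL = 1.697  # t with 30 df, alpha = 0.05, copied from the table

for sample_mean, s in [(108, 20), (104, 10), (96, 10)]:
    t_stat = one_sample_t(sample_mean, 100, s, 31)
    decision = reject_upper_tail(t_stat, T_CRITICAL)
    print(f"mean = {sample_mean}, s = {s}: t = {t_stat:.2f}, reject H0: {decision}")
```

A negative t statistic can never exceed a positive critical value, so a sample mean below μ0 never leads to rejection in an upper-tail test.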
(1) the true mean is farther from the hypothesized mean μ0; (2) the significance level is higher; (3) the population variance is lower; (4) the sample size is larger.
Power = 1 - β = 1 - 0.31 = 0.69
Power = 1 - β = 1 - 0.25 = 0.75
H0: σ2 ≤ σ20 = 4
H1: σ2 > 4
df = n - 1 = 20 - 1 = 19
For this test with a significance level of α = 0.05 and 19 degrees of freedom, the critical value of the chi-square variable is 30.144 (see Appendix Table 7 of the book).
\[ \frac{(n - 1)s^{2}}{\sigma^{2}_{0}} = \frac{(20 - 1)(6.62)}{4} = 31.445 \]
31.445 > 30.144. Therefore, we can reject the null hypothesis and conclude that the variability of the weight of the products exceeds the standard.
Student's t distribution
The critical t value is: tc = 1.684
\[ t = \frac{108 - 100}{20 / \sqrt{49}} = 2.8 \]
t > tc, therefore we can reject the null hypothesis.
\[ t = \frac{104 - 100}{10 / \sqrt{49}} = 2.8 \]
t > tc, therefore we can reject the null hypothesis.
\[ t = \frac{96 - 100}{10 / \sqrt{49}} = -2.8 \]
t < tc, yet we are testing t > tc. Therefore we cannot reject the null hypothesis ("wrong side").
\[ t = \frac{95 - 100}{8 / \sqrt{49}} = -4.38 \]
t < tc, yet we are testing t > tc. Therefore we cannot reject the null hypothesis ("wrong side").
A researcher wants to determine whether two different production processes have different mean numbers of products produced per hour. The mean of production process 1 is defined as μ1 and the mean of production process 2 is defined as μ2. The null and alternative hypotheses are as follows: H0: μ1 – μ2 = 0 and H1: μ1 – μ2 > 0. From the populations, a random sample is drawn of 25 matched pairs. The sample means are respectively 50 and 60 for populations 1 and 2. Give the decision rule using a probability of type I error α = 0.05.
Can you reject the null hypothesis if the sample standard deviation of the difference is 20?
Can you reject the null hypothesis using a probability of type I error α = 0.05 if the sample standard deviation of the difference is 30?
Can you reject the null hypothesis using a probability of type I error α = 0.05 if the sample standard deviation of the difference is 15?
Can you reject the null hypothesis using a probability of type I error α = 0.05 if the sample standard deviation of the difference is 40?
A researcher wants to determine whether two different production processes have different mean numbers of products produced per hour. The mean of production process 1 is defined as μ1 and the mean of production process 2 is defined as μ2. The null and alternative hypotheses are as follows: H0: μ1 – μ2 = 0 and H1: μ1 – μ2 < 0. From the populations, a random sample is drawn of 25 matched pairs. The standard deviation of the difference between the sample means is found to be 25. Give the decision rule using a probability of type I error α = 0.05.
Can you reject the null hypothesis using a probability of type I error α = 0.05 if the sample means are respectively 56 and 50 for populations 1 and 2?
Can you reject the null hypothesis using a probability of type I error α = 0.05 if the sample means are respectively 59 and 50 for populations 1 and 2?
Can you reject the null hypothesis using a probability of type I error α = 0.05 if the sample means are respectively 56 and 48 for populations 1 and 2?
Can you reject the null hypothesis using a probability of type I error α = 0.05 if the sample means are respectively 54 and 50 for populations 1 and 2?
A researcher wants to conduct a hypothesis test for the difference in means between two populations with independent samples. The following information is provided:
nx = 25; x̄ = 115; σ2x = 625
ny = 25; ȳ = 100; σ2y = 400
Compute the test statistic.
The researcher decides to test at a significance level of α = 0.05. Determine the critical z value.
Compare the critical z value to the test statistic. Can the researcher reject the null hypothesis?
How large should the sample size be in order to obtain a good approximation if we replace the population variances with the sample variances?
Use the following information:
nx = 25; x̄ = 1078; sx = 633
ny = 25; ȳ = 908.2; sy = 469.8
We are interested in testing the difference in population means between X and Y. The alternative hypothesis states that the mean of population 2 is larger than the mean of population 1. For this hypothesis test, we are using a significance level of α = 0.05. Note that the population variances are unknown and that the sample variances are given.
Formulate the null hypothesis and alternative hypothesis.
Compute the pooled variance estimate.
What are the degrees of freedom?
What is the critical value of t?
Compute the test statistic.
Provide the decision rule for this hypothesis test.
Can the null hypothesis be rejected?
How large should the sample size be in order to be able to use the standard normal distribution for testing the equality of two population proportions?
Consider the following information:
nx = 270; p̂x = 0.185
ny = 203; p̂y = 0.399
Compute the estimate of the common proportion, P0, under the null hypothesis.
Compute the test statistic.
Suppose we are testing with the alternative hypothesis: H1: Px < Py. For this test, we are using a significance level of α = 0.05. What is the critical value?
Formulate the decision rule.
Can we reject the null hypothesis?
Consider the following information:
nx = 17; s²x = 123.35
ny = 11; s²y = 8.02
What are the degrees of freedom for the F distribution?
Given a significance level of α = 0.02, what is the critical value of F?
Compute the test statistic. Can the null hypothesis be rejected?
tn-1,a = t24,0.05 = 1.711
The general decision rule here is: reject H0 if t > t24,0.05 = 1.711.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{10}{20 / \sqrt{25}} = 2.5 \]
t > t24,0.05 and, thus, we can reject the null hypothesis.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{10}{30 / \sqrt{25}} = 1.67 \]
t < t24,0.05 and, thus, we cannot reject the null hypothesis.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{10}{15 / \sqrt{25}} = 3.33 \]
t > t24,0.05 and, thus, we can reject the null hypothesis.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{10}{40 / \sqrt{25}} = 1.25 \]
t < t24,0.05 and, thus, we cannot reject the null hypothesis.
-tn-1,α = -t24,0.05 = -1.711
The general decision rule here is: reject H0 if t < -t24,0.05 = -1.711.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{-6}{25 / \sqrt{25}} = -1.2 \]
t > -t24,0.05 and, thus, we cannot reject the null hypothesis.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{-9}{25 / \sqrt{25}} = -1.8 \]
t < -t24,0.05 and, thus, we can reject the null hypothesis.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{-8}{25 / \sqrt{25}} = -1.6 \]
t > -t24,0.05 and, thus, we cannot reject the null hypothesis.
\[ t = \frac{\bar{d}}{s_{d} / \sqrt{n} } = \frac{-4}{25 / \sqrt{25}} = -0.8 \]
t > -t24,0.05 and, thus, we cannot reject the null hypothesis.
\[ z = \frac{115 - 100}{\sqrt{\frac{625}{25} + \frac{400}{25}}} = 2.34 \]
z0.05 = 1.645
z > z0.05 thus the null hypothesis can be rejected.
The sample size should be larger than 100.
H0: μx – μy = 0
H1: μx – μy < 0
\[ s^{2}_{p} = \frac{ (25-1)(633)^{2} + (25 - 1)(469.8)^{2} }{25 + 25 - 2} = 310,700 \]
df = 25 + 25 – 2 = 48
t48,0.05 = 1.677
\[ t = \frac{1078 - 908.2}{ \sqrt{ \frac{310,700}{25} + \frac{310,700}{25}}} = 1.08 \]
Reject H0 if t < -t48,0.05 = -1.677
No, the test statistic (1.08) is not smaller than -1.677. Thus, there is not sufficient evidence to reject the null hypothesis.
nP0(1 – P0) > 5
\[ \hat{p}_{0} = \frac{n_{x} \hat{p}_{x} + n_{y} \hat{p}_{y}}{n_{x} + n_{y}} = \frac{(270)(0.185) + (203)(0.399)}{270 + 203} = 0.277 \]
\[ z = \frac{0.185 - 0.399}{ \sqrt{ \frac{ (0.277)(1 - 0.277) }{270} + \frac{ (0.277)(1 - 0.277) }{203} } } = -5.15 \]
–z0.05 = -1.645
Reject H0 if z < –z0.05 = -1.645
Yes, we can reject the null hypothesis that there is no difference in proportions between these two populations, because -5.15 < -1.645.
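The pooled proportion and the z statistic from the answers above can be reproduced as follows. This is a sketch under the same assumptions as the worked answer (large samples, equal proportions under H0); the function name is illustrative.

```python
import math


def two_proportion_z(n_x, p_x, n_y, p_y):
    """Return (pooled proportion, z statistic) for H0: P_x = P_y."""
    p_0 = (n_x * p_x + n_y * p_y) / (n_x + n_y)
    standard_error = math.sqrt(p_0 * (1 - p_0) / n_x + p_0 * (1 - p_0) / n_y)
    return p_0, (p_x - p_y) / standard_error


p_0, z = two_proportion_z(270, 0.185, 203, 0.399)
print(f"pooled proportion = {p_0:.3f}, z = {z:.2f}")

# Lower-tail rule for H1: P_x < P_y at alpha = 0.05 (critical value from the table)
print("reject H0:", z < -1.645)
```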
dfnumerator = (nx - 1) = 17 – 1 = 16 and dfdenominator = (ny - 1) = 11 – 1 = 10.
From Appendix Table 9 (in the book) it follows that: F16,10,0.01 = 4.520
\[ F = \frac{s^{2}_{x}}{s^{2}_{y}} = \frac{123.35}{8.02} = 15.380 \]
Obviously, the test statistic of F(15.380) exceeds the critical value (4.520). Hence, the null hypothesis can be rejected in favor of the alternative hypothesis.
Suppose we are interested in the relationship between the number of workers (denoted by X) and the number of tables produced per hour (Y). A sample of 10 workers is provided. The following descriptive statistics are obtained:
\[Cov(x,y) = 106.93 \hspace{5mm} s^{2}_{x} = 42.01 \hspace{5mm} \bar{y} = 41.2 \hspace{5mm} \bar{x} = 21.3 \]
Compute the slope of the sample regression.
Compute the y-intercept for the sample regression.
What is the equation of the regression line?
If management decides to employ 25 workers, how many tables would we expect to be produced?
The following regression equation is given: Y = 559 + 0.3815X.
What is the expected value of Y for X = 55,000?
Use the following regression equation:
Y = 100 + 21X
Interpret the slope of the regression line.
What is the change in Y when X changes by +5?
What is the change in Y when X changes by -7?
What is the predicted value of Y when X = 14?
What is the predicted value of Y when X = 27?
Does this equation prove that a change in X causes a change in Y?
Given the regression equation:
Y = 107 + 10X
What is the change in Y when X changes by +2?
What is the change in Y when X changes by -4?
What is the predicted value of Y when X = 15?
What is the predicted value of Y when X = 22?
Compute the coefficients for a least squares regression equation and write the equation, given the following sample statistics: x̅ = 10; ȳ = 50; sx = 80; sy = 75; rxy = 0.4; n = 60.
Compute the coefficients for a least squares regression equation and write the equation, given the following sample statistics: x̅ = 60; ȳ = 50; sx = 80; sy = 65; rxy = 0.7; n = 60.
Compute the coefficients for a least squares regression equation and write the equation, given the following sample statistics: x̅ = 90; ȳ = 100; sx = 60; sy = 70; rxy = 0.4; n = 60.
The following information is provided: SSE = 17.89 and SST = 68.22. What is the percent explained variability?
What absolute value of the Student's t statistic indicates a relationship between two variables when we use a two-tailed test with α= 0.05 and n > 60?
Given the simple regression model
\[ Y = \beta_{0} + \beta_{1}X \]
and the regression results that follow, test the null hypothesis that the slope coefficient is zero versus the alternative hypothesis that the slope coefficient differs from zero using probability of type I error rate equal to 0.005 and determine the two-sided 99% confidence interval. The following sample statistics are provided: n = 22; b1 = 0.3815; sb1 = 0.0253.
Consider your answer on the previous question. Based on this result, what do you conclude about the slope coefficient?
Which four factors result in narrower prediction intervals?
Suppose we want to test H0: ρ = 0 against H1: ρ > 0 using the sample information: n = 49 and r = 0.43.
What is the test statistic?
What is the critical value if we are testing at a 0.5% significance level?
What do we conclude about the population correlation?
Suppose we have the following information: n = 25. Using the rule of thumb for testing the hypothesis that the population correlation is zero, what should be the absolute value of the sample correlation that has to be exceeded in order to reject this null hypothesis?
Suppose we have the following information: n = 64. Using the rule of thumb for testing the hypothesis that the population correlation is zero, what should be the absolute value of the sample correlation that has to be exceeded in order to reject this null hypothesis?
Which two factors can influence the estimated regression equation?
Points with a high leverage will have a .... standard error of the residual.
\[ b_{1} = \frac{Cov(x,y)}{s^{2}_{x}} = r \frac{s_{y}}{s_{x}} = \frac{106.93}{42.01} = 2.545 \]
\[ b_{0} = \bar{y} - b_{1}\bar{x} = 41.2 - 2.545(21.3) = -13.02 \]
\[ \hat{y} = b_{0} + b_{1}x = -13.02 + 2.545x \]
\[ \hat{y} = -13.02 + 2.545(25) = 50.605 \]
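The slope, intercept and prediction above can be checked with a short sketch. The function name is an assumption for illustration. Because the code keeps full precision instead of rounding the coefficients first, the prediction may differ from the hand calculation in the second decimal.

```python
def least_squares_line(cov_xy, var_x, x_bar, y_bar):
    """Return (intercept b0, slope b1) of the sample regression line."""
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_line(cov_xy=106.93, var_x=42.01, x_bar=21.3, y_bar=41.2)
predicted_tables = b0 + b1 * 25  # expected tables per hour with 25 workers

print(f"b1 = {b1:.3f}, b0 = {b0:.2f}, prediction at x = 25: {predicted_tables:.1f}")
```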
Y = 559 + 0.3815*55,000 = 21,542
For every one-unit change in X, Y changes by 21.
If X changes by +5, Y changes by (21)(5) = 105
If X changes by -7, Y changes by (21)(-7) = -147
Y = 100 + (21)(14) = 394
Y = 100 + (21)(27) = 667
No, regression results summarize the information contained in the data. They do not prove causation.
If X changes by +2, Y changes by (10)(2) = 20
If X changes by -4, Y changes by (10)(-4) = -40
Y = 107 + (10)(15) = 257
Y = 107 + (10)(22) = 327
\[ b_{1} = r\frac{s_{Y}}{s_{X}} = 0.4 \frac{75}{80} = 0.375 \]
\[ b_{0} = \bar{y} - b_{1}\bar{x} = 50 - 0.375(10) = 46.25 \]
\[ \hat{y}_{i} = 46.25 + 0.375x_{i} \]
\[ b_{1} = r\frac{s_{Y}}{s_{X}} = 0.7 \frac{65}{80} = 0.5688 \]
\[ b_{0} = \bar{y} - b_{1}\bar{x} = 50 - 0.5688(60) = 15.87 \]
\[ \hat{y}_{i} = 15.87 + 0.5688x_{i} \]
\[ b_{1} = r\frac{s_{Y}}{s_{X}} = 0.4 \frac{70}{60} = 0.467 \]
\[ b_{0} = \bar{y} - b_{1}\bar{x} = 100 - 0.467(90) = 58 \]
\[ \hat{y}_{i} = 58 + 0.467x_{i} \]
\[ R^{2} = 1 - \frac{SSE}{SST} = 1 - \frac{17.89}{68.22} = 0.738 \]
Thus, 73.8% of the variability is explained by the regression model.
According to the rule of thumb, the absolute value of the Student's t statistic should be greater than 2.0 to indicate that there is a relationship.
For a 99% confidence interval we have 1 - α = 0.99, so α/2 = 0.005, and n - 2 = 22 - 2 = 20 degrees of freedom. Hence, from Appendix Table 8 (see book) it follows that:
\[ t_{n-2,\alpha/2} = t_{20,0.005} = 2.845 \]
Therefore, the 99% confidence interval is:
\[ 0.3815 - (2.845)(0.0253) < \beta_{1} < 0.3815 + (2.845)(0.0253) \]
\[ 0.3095 < \beta_{1} < 0.4535 \]
The confidence interval does not comprise the zero, therefore we can reject the null hypothesis and conclude that the slope coefficient is not equal to zero.
\[ t = \frac{0.43 \sqrt{(49 - 2)}}{\sqrt{1 - (0.43)^{2}}} = 3.265 \]
Since there are (n - 2) = 47 degrees of freedom, it follows from Appendix Table 8 that t47,0.005 = 2.704
t47,0.005 = 2.704 < t. Therefore, we can reject the null hypothesis. There is strong evidence of a positive linear relationship between the two variables. Note, however, that we cannot conclude from this result that one variable caused the other, but only that they are related.
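The test statistic for a population correlation of zero, and the rule-of-thumb threshold 2/√n used in the next answers, can both be sketched in Python. The function names are illustrative, and the value r = 0.43 is the one used in the worked answer above.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def rule_of_thumb_threshold(n):
    """Absolute sample correlation that must be exceeded: 2 / sqrt(n)."""
    return 2 / math.sqrt(n)


t_stat = correlation_t(0.43, 49)
print(f"t = {t_stat:.3f}")
print(rule_of_thumb_threshold(25), rule_of_thumb_threshold(64))
```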
\[ |r| > \frac{2}{\sqrt{n}} = \frac{2}{\sqrt{25}} = 0.4 \]
\[ |r| > \frac{2}{\sqrt{n}} = \frac{2}{\sqrt{64}} = 0.25 \]
Points with a high leverage and outliers.
Smaller.
\[ \hat{y} = 12 + 5x_{1} + 6x_{2} + 2x_{3} \]
Compute the expected value of y when x1 = 11, x2 = 24, and x3 = 27.
Compute the expected value of y when x1 = 31, x2 = 20, and x3 = 17.
Compute the expected value of y when x1 = 32, x2 = 29, and x3 = 13.
Compute the expected value of y when x1 = 30, x2 = 26, and x3 = 29.
\[ \hat{y} = 10 + 5x_{1} + 4x_{2} + 2x_{3} \]
Compute the expected value of y when x1 = 20, x2 = 11, and x3 = 10.
Compute the expected value of y when x1 = 15, x2 = 14, and x3 = 20.
Compute the expected value of y when x1 = 35, x2 = 19, and x3 = 25.
Compute the expected value of y when x1 = 10, x2 = 17, and x3 = 30.
\[ \hat{y} = 10 - 2x_{1} - 14x_{2} + 6x_{3} \]
What is the change in y when x1 increases by 4?
What is the change in y when x3 increases by 1?
What is the change in y when x2 increases by 2?
What is the fifth assumption of a multiple linear regression model?
Compute the coefficient b1 for the regression model
\[ \hat{y}_{i} = b_{0} + b_{1}x_{1i} + b_{2}x_{2i} \]
given the following summary statistics:
rx1y = 0.80, rx2y = 0.30, rx1x2 = 0.90, sx1 = 500, sx2 = 400, sy = 100
Compute the coefficient b2 for the regression model (using the regression model and summary statistics of the previous question).
The following data are available: n = 25; K = 2; SSE = 0.0625; SST = 0.4640.
Compute the adjusted coefficient of determination.
When is the adjusted coefficient of determination preferred over the standard coefficient of determination?
How is the coefficient of multiple correlation related to the multiple coefficient of determination?
b1 = 0.2372; sb1 = 0.0556; b2 = -0.000249; sb2 = 0.00003205.
What is the critical t statistic for a two-tailed hypothesis test with a 99% confidence interval?
Provide the 99% confidence interval for β1.
Provide the 99% confidence interval for β2.
A researcher is testing the influence of four independent variables on a certain dependent variable using multiple regression (n = 88). He finds that, for the complete model with four predictor variables, SSE = 1,149.14. For a multiple regression model with only two of the four predictor variables, SSE = 1,426.93. The variance estimator is s2e = 13.52. Compute the F statistic.
How many degrees of freedom does the F statistic have?
What is the critical value for F with a significance level of 0.01?
What is a dummy variable?
Formulate the null hypothesis and the alternative hypothesis for testing the slope coefficient in the event of dummy variables.
What is the model constant when the dummy variable equals 1 in the following equation, where x1 is a continuous variable and x2 is a dummy variable?
\[ \hat{y} = 9 + 6x_{1} + 9x_{2} \]
What is the model constant when the dummy variable equals 1 in the following equation, where x1 is a continuous variable and x2 is a dummy variable?
\[ \hat{y} = 7 + 4x_{1} + 2x_{2} \]
What is the model constant when the dummy variable equals 1 in the following equation, where x1 is a continuous variable and x2 is a dummy variable?
\[ \hat{y} = 4 + 4x_{1} + 8x_{2} + 9x_{1}x_{2} \]
Consider the following equation: yi = 2x1.4
Compute the value of yi when xi = 1
Consider the following equation: yi = 2x1.4
Compute the value of yi when xi = 1
\[ \hat{y} = 12 + (5)(11) + (6)(24) + (2)(27) = 265 \]
\[ \hat{y} = 12 + (5)(31) + (6)(20) + (2)(17) = 321 \]
\[ \hat{y} = 12 + (5)(32) + (6)(29) + (2)(13) = 372 \]
\[ \hat{y} = 12 + (5)(30) + (6)(26) + (2)(29) = 376 \]
\[ \hat{y} = 10 + (5)(20) + (4)(11) + (2)(10) = 174 \]
\[ \hat{y} = 10 + (5)(15) + (4)(14) + (2)(20) = 181 \]
\[ \hat{y} = 10 + (5)(35) + (4)(19) + (2)(25) = 311 \]
\[ \hat{y} = 10 + (5)(10) + (4)(17) + (2)(30) = 188 \]
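Each of the predictions above is the intercept plus a sum of coefficient times value. A minimal sketch (the function name `predict` is illustrative):

```python
def predict(intercept, coefficients, values):
    """Predicted y for a fitted multiple regression equation."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# y-hat = 12 + 5*x1 + 6*x2 + 2*x3
print(predict(12, (5, 6, 2), (11, 24, 27)))
print(predict(12, (5, 6, 2), (31, 20, 17)))

# y-hat = 10 + 5*x1 + 4*x2 + 2*x3
print(predict(10, (5, 4, 2), (20, 11, 10)))
print(predict(10, (5, 4, 2), (10, 17, 30)))
```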
The change in y when x1 increases by 4 is equal to (-2)(4) = -8.
The change in y when x3 increases by 1 is equal to (6)(1) = 6.
The change in y when x2 increases by 2 is equal to (-14)(2) = -28.
There is no direct linear relationship between the independent variables.
\[ b_{1} = \frac{ s_{y} (r_{x1y} - r_{x1x2}r_{x2y} ) }{s_{x1} (1 - r^{2}_{x1x2})} = \frac{100 (0.80 - 0.90 \times 0.30)}{500 (1 - 0.90^{2})} = 0.56 \]
\[ b_{2} = \frac{s_{y} (r_{x2y} - r_{x1x2} r_{x1y} ) }{s_{x2} (1 - r^{2}_{x1x2})} = \frac{100 (0.30 - 0.90 \times 0.80)}{400 (1 - 0.90^{2})} = -0.55 \]
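The two slope formulas share the same denominator term, which a small sketch makes explicit. The function name is an illustrative assumption; the inputs are the summary statistics from the question.

```python
def two_predictor_slopes(r_x1y, r_x2y, r_x1x2, s_x1, s_x2, s_y):
    """Return (b1, b2) for a regression on two predictors from summary statistics."""
    shared = 1 - r_x1x2 ** 2  # shrinks toward zero as the predictors become more correlated
    b1 = s_y * (r_x1y - r_x1x2 * r_x2y) / (s_x1 * shared)
    b2 = s_y * (r_x2y - r_x1x2 * r_x1y) / (s_x2 * shared)
    return b1, b2


b1, b2 = two_predictor_slopes(0.80, 0.30, 0.90, 500, 400, 100)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Note that b2 is negative even though x2 correlates positively with y, which reflects the strong correlation (0.90) between the two predictors.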
\[ \bar{R}^{2} = 1 - \frac{0.0625/22}{0.4640/24} = 0.853 \]
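The adjusted coefficient of determination divides each sum of squares by its degrees of freedom before taking the ratio. A minimal sketch, with an illustrative function name:

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


value = adjusted_r_squared(sse=0.0625, sst=0.4640, n=25, k=2)
print(f"adjusted R^2 = {value:.3f}")
```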
The adjusted coefficient of determination corrects for the fact that irrelevant independent variables still produce a (small) reduction in the error sum of squares (SSE). Consequently, it offers a better comparison between multiple regression models with different numbers of independent variables.
The coefficient of multiple correlation is equal to the square root of the multiple coefficient of determination.
tn-K-1,a/2 = t22,0.005 = 2.819
0.2372 - (2.819)(0.0556) < β1 < 0.2372 + (2.819)(0.0556)
0.080 < β1 < 0.394
-0.000249 - (2.819)(0.0000320) < β2 < -0.000249 + (2.819)(0.0000320)
-0.000339 < β2 < -0.000159
\[ F = \frac{(1426.93 - 1149.14)/2}{13.52} = 10.27 \]
The F statistic has 2 degrees of freedom (i.e., for the two variables tested simultaneously) for the numerator and 85 degrees of freedom for the denominator.
F* = 4.9 (see Appendix Table 9)
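The F statistic for testing a subset of coefficients compares the SSE of the restricted model with that of the full model. The sketch below follows the formula used in the answer; the function and parameter names are illustrative.

```python
def partial_f(sse_restricted, sse_full, num_tested, variance_estimate):
    """F statistic for testing num_tested coefficients simultaneously."""
    return ((sse_restricted - sse_full) / num_tested) / variance_estimate


f_stat = partial_f(sse_restricted=1426.93, sse_full=1149.14,
                   num_tested=2, variance_estimate=13.52)
print(f"F = {f_stat:.2f}")

# The critical value 4.9 is the table value quoted in the answer above.
print("reject H0:", f_stat > 4.9)
```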
A dummy variable is a variable with two possible outcomes: 0 and 1.
\[ H_{0}: \beta_{3} = 0 | \beta_{1} \neq 0, \beta_{2} \neq 0 \]
\[ H_{1}: \beta_{3} \neq 0 | \beta_{1} \neq 0, \beta_{2} \neq 0 \]
18
9
12
2.64
5.28
What are the four stages of model building?
If a model cannot be verified, what should you do?
In an experimental design, the experimental outcome (Y) is measured at specific combinations of levels for ... and ... variables.
If a blocking variable has 4 levels, how many dummy variables should be created?
What is a treatment variable?
What is a blocking variable?
What is a lagged value?
What is multicollinearity?
Suppose that all the coefficient Student's t statistics are small, indicating no individual effect, and yet the overall F statistic indicates a strong effect for the total regression model. What is this an indication of?
How can you correct for multicollinearity?
What is the danger of correcting multicollinearity by removing one or more of the highly correlated independent variables?
What are the four assumptions made in a simple linear regression analysis?
What is the fifth assumption that is added for multiple regression analysis?
What is heteroscedasticity?
Describe one procedure to check for heteroscedasticity.
From the regression of the squared residuals on the predicted values, we obtain the following estimated model (for n = 25):
\[ e^{2} = 0.00621 - 0.00550 \hat{y} \hspace{2mm} \text{with} \hspace{2mm} R^{2} = 0.066 \]
Compute the test statistic.
What is the critical value if we are testing with a 10% significance level?
Can we reject the null hypothesis that the regression model has uniform variance?
What is the meaning of ρ for (auto)correlated errors?
What does it imply if ρ = 0?
What does it imply if ρ = 0.3?
What does it imply if ρ = 0.9?
What is the most commonly used test to check possible autocorrelation of error terms?
Formulate the null hypothesis of this test.
Provide the decision rules for testing the null hypothesis against the alternative hypothesis: H1: ρ > 0.
Provide the decision rules for testing the null hypothesis against the alternative hypothesis: H1: ρ < 0.
Suppose we found d = 0.2015, indicating positive autocorrelation. Estimate the serial correlation.
Suppose we found d = 0.5213, indicating positive autocorrelation. Estimate the serial correlation.
In determining whether the errors in a regression model are positively correlated for the model
\[ y_{t} = \beta_{0} + \beta_{1}x_{1t} + \epsilon_{t} \]
we determine
\[ \sum^{30}_{t = 1}e^{2}_{t} = 7587.9154 \]
and
\[ \sum^{30}_{t = 2} (e_{t} - e_{t - 1})^{2} = 8195.2065 \]
Formulate the null and alternative hypothesis for the mentioned analysis.
Calculate the Durbin-Watson statistic.
Model building consists of four stages: (1) model specification; (2) coefficient estimation; (3) model verification; and (4) interpretation and inference.
Go back to the first stage: model specification.
In an experimental design, the experimental outcome (Y) is measured at specific combinations of levels for treatment and blocking variables.
3
A treatment variable is a variable whose effect we are interested in estimating with minimum variance. For instance, we may want to know which of five different production machines provides the highest productivity per hour. In this example, the treatment variable is the production machine, a five-level categorical variable that is represented in the regression by four dummy variables.
A blocking variable is a variable that is part of the environment. Therefore, the level of such a variable cannot be preselected.
When time series are analyzed (i.e., when measurements are taken over time) lagged values of the dependent variable are an important issue. Often in time series data, the dependent variable in time period t is related to the value taken by this dependent variable in an earlier time period, that is yt-1. The lagged value then is the value of the dependent variable in this previous time period.
Multicollinearity refers to a state of very high intercorrelations among the independent variables.
Multicollinearity
1. Remove one or more of the highly correlated independent variables.
2. Change the model specification, including possibly a new independent variable that is a function of several correlated independent variables.
3. Obtain additional data that do not have the same strong correlations between the independent variables.
This might lead to bias in the coefficient estimates.
1. The Y's are linear functions of X, plus a random error term.
2. The x values are fixed numbers that are independent of the error terms.
3. The error terms are assumed to be random variables with a mean of zero and a constant variance of σ².
4. The random error terms are not correlated with one another.
There is no direct linear relationship between the Xj independent variables.
Heteroscedasticity refers to the situation in which the error terms do not have uniform variance.
One possibility to check for heteroscedasticity is by examining a scatter plot of the residuals versus the independent variable. If the magnitude of the error terms tends to increase (or decrease) for increasing values of the independent variable, this indicates that the error variances are not constant.
\[ nR^{2} = (25)(0.066) = 1.65 \]
From Appendix Table 7, the chi-square critical value with 1 degree of freedom at the 10% significance level is 2.706.
The test statistic does not exceed the critical value, therefore the null hypothesis cannot be rejected.
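The nR² check for heteroscedasticity can be sketched in a few lines of Python. The critical value is the table value quoted above, not something the code computes.

```python
n = 25             # number of observations
r_squared = 0.066  # R^2 from regressing squared residuals on predicted values
CHI2_CRIT = 2.706  # chi-square, 1 degree of freedom, 10% level (table value quoted above)

test_stat = n * r_squared
reject_uniform_variance = test_stat > CHI2_CRIT

print(round(test_stat, 2), reject_uniform_variance)
```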
This ρ is the correlation coefficient (range -1 to +1) between the error in time t and the error in the previous time point, that is t - 1.
If ρ = 0, this means that there is no autocorrelation in the errors.
There is a relatively weak autocorrelation.
There is a quite strong autocorrelation.
Durbin-Watson test.
H0: ρ = 0.
Reject H0 if d < dL. Accept H0 if d > dU. The test is inconclusive if dL < d < dU.
Reject H0 if d > 4 - dL. Accept H0 if d < 4 - dU. The test is inconclusive if 4 - dU < d < 4 - dL.
\[ r = 1 - \frac{d}{2} = 1 - \frac{0.2015}{2} = 0.90 \]
\[ r = 1 - \frac{d}{2} = 1 - \frac{0.5213}{2} = 0.74 \]
H0: ρ = 0 and H1: ρ > 0.
\[ d = \frac{ \sum^{n}_{t = 2} (e_{t} - e_{t-1})^{2} }{\sum^{n}_{t=1} e^{2}_{t}} = \frac{8195.2065}{7587.9154} = 1.08 \]
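As an illustration, the Durbin-Watson statistic and the approximation r = 1 - d/2 can be sketched in Python. The function names are ours, and the short residual list is made-up data used only to show the function on raw residuals.

```python
def durbin_watson(residuals):
    """d = sum over t >= 2 of (e_t - e_{t-1})^2, divided by the sum of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def estimated_rho(d):
    """Approximate serial correlation from the Durbin-Watson statistic."""
    return 1 - d / 2

# Using the two sums given in the exercise directly.
d = 8195.2065 / 7587.9154
print(round(d, 2), round(estimated_rho(d), 2))

# Made-up residuals that alternate in sign (negative autocorrelation).
toy = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(toy))
```

Values of d well below 2 point toward positive autocorrelation, values well above 2 toward negative autocorrelation.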
Consider the following data:
Category | A | B | C | D | Total |
Observed number of objects | 43 | 53 | 60 | 44 | 200 |
Probability (under H0) | 1/4 | 1/4 | 1/4 | 1/4 | 1 |
Expected number of objects (under H0) | 50 | 50 | 50 | 50 | 200 |
Compute the chi-square test statistic.
What are the degrees of freedom for the critical test statistic?
Provide the range of the test statistic with probability .10 and .90 using Table 7a and 7b.
Can we reject the null hypothesis that there is no preference for any of the four categories?
Consider the following data:
Category | A | B | C | D | Total |
Observed number of objects | 50 | 93 | 45 | 12 | 200 |
Probability (under H0) | 0.30 | 0.50 | 0.15 | 0.05 | 1 |
Expected number of objects (under H0) | 200 |
Compute the expected values based on the null hypothesis that is specified in the table.
Compute the chi-square test statistic.
How many degrees of freedom are there?
From Appendix Table 7 with K - 1 degrees of freedom, it is found that the test statistic falls between .... and ....
Can the null hypothesis be rejected?
Consider the following data:
Category | A | B | C | D | Total |
Observed number of objects | 287 | 49 | 30 | 34 | 400 |
Probability (under H0) | 0.80 | 0.10 | 0.06 | 0.04 | 1 |
Expected number of objects (under H0) | 400 |
Compute the expected values based on the null hypothesis that is specified in the table.
Compute the chi-square test statistic.
How many degrees of freedom are there?
Find the critical value using a significance level of 0.001.
Can the null hypothesis be rejected?
It is tested whether the population distribution is Poisson. Consider the following data:
Number of occurrences | 0 | 1 | 2 | 3+ |
Observed frequency | 156 | 63 | 29 | 14 |
Expected frequency under H0 | 135.4 | 89.4 | 29.5 | 7.7 |
Compute the test statistic.
How many degrees of freedom are there?
Find the corresponding critical value using a 0.001 significance level.
Can the null hypothesis that the population distribution is Poisson be rejected?
Suppose we are interested in whether people prefer pineapple on their pizza. We sample 7 participants under the null hypothesis H0: P = 0.5. What is the probability of obtaining no more than 2 people with a preference for pineapple on their pizza?
If the test statistic for a sign test equals S = 2, can we reject the null hypothesis?
A random sample of 100 students was asked to compare two new ice cream flavors: grilled BBQ and bubblegum surprise. After testing both flavors, 56 students preferred grilled BBQ, 40 students preferred bubblegum surprise, and 4 expressed no preference. Use the normal approximation to determine the mean and standard deviation of the number of students preferring bubblegum surprise.
Compute the test statistic using the normal approximation and continuity correction.
Find the approximate p-value.
Can we reject the null hypothesis?
What will be the test statistic if the continuity correction is not used?
Given a random sample of n = 31 matched pairs, compute the mean and standard deviation for the Wilcoxon statistic under the null hypothesis.
Now, suppose we find that the observed value of the statistic is T = 189. If we test the null hypothesis against a lower-tail alternative hypothesis with significance level 0.05, what can we conclude about the null hypothesis?
Two independent samples are considered with n1 = 10, n2 = 12 and R1 = 93.5.
Compute the mean and variance for the Mann-Whitney statistic.
Compute the Mann-Whitney U statistic.
What can we conclude about the null hypothesis if we are testing with a significance level of 0.05?
χ² = 3.88
df = K - 1 = 4 - 1 = 3.
Lower critical value (Appendix Table 7b): χ²(3, 0.90) = 0.584
Upper critical value (Appendix Table 7a): χ²(3, 0.10) = 6.251
It is found that the test statistic of 3.88 falls between 0.584 and 6.251; from this it follows that 0.10 < p-value < 0.90. The null hypothesis can therefore not be rejected. However, this does not mean that we can conclude that all four categories are equally preferred. It only means that there is not enough evidence to support a preference.
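The test statistic for the first table can be reproduced with a short Python sketch (the function name `chi_square_stat` is ours):

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [43, 53, 60, 44]
expected = [50, 50, 50, 50]  # 200 objects spread evenly over 4 categories

stat = chi_square_stat(observed, expected)
df = len(observed) - 1

print(round(stat, 2), df)
```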
EA = nPA = 200(0.30) = 60
EB = nPB = 200(0.50) = 100
EC = nPC = 200(0.15) = 30
ED = nPD = 200(0.05) = 10
χ² = 10.06
df = K - 1 = 4 - 1 = 3.
From Appendix Table 7 with K - 1 = 3 degrees of freedom, it is found that the test statistic falls between 9.348 and 11.345.
0.01 < p-value < 0.025. Hence, the null hypothesis can be rejected at the 5% significance level, but not at the 1% level.
EA = nPA = 400(0.80) = 320
EB = nPB = 400(0.10) = 40
EC = nPC = 400(0.06) = 24
ED = nPD = 400(0.04) = 16
χ² = 27.178
df = K - 1 = 4 - 1 = 3.
From Appendix Table 7 with K - 1 = 3 degrees of freedom and significance level 0.001, the critical value is χ²(3, 0.001) = 16.266.
The test statistic is much larger than the critical value. Hence, the null hypothesis can be rejected.
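When the null hypothesis specifies unequal probabilities, the expected counts are computed first. A possible Python sketch for the third table, with the critical value taken from the table lookup above rather than computed (the function name is ours):

```python
def goodness_of_fit(observed, probs):
    """Return (expected counts, chi-square statistic) for hypothesized probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat

CHI2_CRIT = 16.266  # chi-square, 3 degrees of freedom, 0.001 level (table value quoted above)

expected, stat = goodness_of_fit([287, 49, 30, 34], [0.80, 0.10, 0.06, 0.04])
print([round(e, 1) for e in expected], round(stat, 3), stat > CHI2_CRIT)
```

Most of the statistic comes from category D, where 34 objects were observed against 16 expected.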
χ² = 16.08
df = K - m - 1 = 4 - 1 - 1 = 2, where m = 1 is the number of parameters estimated from the data (the Poisson mean).
χ²(2, 0.001) = 13.816
The test statistic exceeds the critical value, thus the null hypothesis that the population distribution is Poisson can be rejected at the 0.1% significance level.
p-value = P(X ≤ 2) = 0.227 (see Appendix Table 3)
No, with a p-value this large, the null hypothesis cannot be rejected.
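The table lookup for the sign test can be checked with the binomial distribution directly. A small Python sketch (the helper `binom_cdf` is ours):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# 7 participants, H0: P = 0.5, at most 2 prefer pineapple.
p_value = binom_cdf(2, 7, 0.5)
print(round(p_value, 3))
```

With n = 7 and P = 0.5 this is (1 + 7 + 21) / 128.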
Let P be the population proportion that prefers bubblegum surprise, given S = 40.
\[ \mu = np = 0.5n = 0.5(96) = 48 \]
\[ \sigma = 0.5 \sqrt{96} = 4.899 \]
Since 40 < 48, S* = 40.5
\[ z = \frac{S* - \mu}{\sigma} = \frac{40.5 - 48}{4.899} = -1.53 \]
From the standard normal distribution, it follows that the approximate p-value = 2(0.0630) = 0.126
The null hypothesis can be rejected at all significance levels greater than 12.6%.
If no continuity correction factor is used, the value for the test statistic becomes Z = -1.633, yielding a slightly smaller p-value of 0.1024.
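The effect of the continuity correction can be sketched in Python. The function name is ours, and the p-values come from the exact normal distribution, so they can differ from the table-based values in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def sign_test_z(s, n, continuity=True):
    """z statistic for the sign test under H0: P = 0.5."""
    mu = 0.5 * n
    sigma = 0.5 * sqrt(n)
    if continuity:
        # Move S half a unit toward the mean.
        if s < mu:
            s += 0.5
        elif s > mu:
            s -= 0.5
    return (s - mu) / sigma

n = 100 - 4  # the 4 students without a preference are dropped
z_cc = sign_test_z(40, n)
z_raw = sign_test_z(40, n, continuity=False)

# Two-sided p-values.
p_cc = 2 * NormalDist().cdf(-abs(z_cc))
p_raw = 2 * NormalDist().cdf(-abs(z_raw))

print(round(z_cc, 2), round(p_cc, 3))
print(round(z_raw, 3), round(p_raw, 3))
```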
\[ \mu_{T} = \frac{n(n + 1)}{4} = \frac{(31)(32)}{4} = 248 \]
\[ Var(T) = \sigma^{2}_{T} = \frac{n(n + 1)(2n + 1)}{24} = \frac{ (31)(32)(63) }{24} = 2604 \]
\[ \sigma_{T} = \sqrt{2604} = 51.03 \]
\[ Z = \frac{T - \mu_{T}}{\sigma_{T}} = \frac{189 - 248}{51.03} = \frac{-59}{51.03} = -1.16 \]
For α = 0.05, the lower-tail critical value is -zα = -1.645.
The test statistic (-1.16) does not fall below the critical value (-1.645), hence there is not enough evidence to reject the null hypothesis.
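The Wilcoxon calculation above can be sketched in Python as follows (the function name is ours; the critical value is the standard normal table value for a 5% lower-tail test):

```python
from math import sqrt

def wilcoxon_normal_approx(t_stat, n):
    """Mean, variance and z value of the Wilcoxon statistic under H0."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t_stat - mean) / sqrt(var)
    return mean, var, z

Z_CRIT = -1.645  # lower-tail critical value at alpha = 0.05

mean_t, var_t, z = wilcoxon_normal_approx(189, 31)
print(mean_t, var_t, round(z, 2), z < Z_CRIT)
```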
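\[ E(U) = \mu_{U} = \frac{n_{1}n_{2}}{2} = \frac{ (10)(12) }{2} = 60 \]
\[ Var(U) = \sigma^{2}_{U} = \frac{ n_{1}n_{2} (n_{1} + n_{2} + 1) }{12} = \frac{ (10)(12)(23) }{12} = 230 \]
\[ U = n_{1}n_{2} + \frac{n_{1}(n_{1} + 1)}{2} - R_{1} = (10)(12) + \frac{(10)(11)}{2} - 93.5 = 81.5 \]
\[ Z = \frac{U - \mu_{U}}{\sigma_{U}} = \frac{81.5 - 60}{ \sqrt{230} } = 1.42 \]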
The corresponding p-value = 0.1556. With a 0.05 significance level, this test result is not sufficient to conclude that the null hypothesis can be rejected.
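A Python sketch of the Mann-Whitney calculation, assuming the definition U = n1·n2 + n1(n1 + 1)/2 - R1, which reproduces the value 81.5 used above. The p-value is computed from the unrounded z, so it differs slightly from the table-based 0.1556.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney(n1, n2, r1):
    """Return U, its mean and variance under H0, and the z statistic."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean = n1 * n2 / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u - mean) / sqrt(var)
    return u, mean, var, z

u, mean_u, var_u, z = mann_whitney(10, 12, 93.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(u, mean_u, var_u, round(z, 2), round(p_value, 3))
```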
What is the null hypothesis of a one-way analysis of variance?
Suppose we found the following data: SSW = 12.18, n = 20, k = 3. Compute an estimate of the within-groups mean square.
Suppose we found the following data: SSG = 21.55, n = 20, k = 3. Compute an estimate of the between-groups mean square.
Compute the F ratio for the MSW and MSG calculated in the previous two questions.
What are the degrees of freedom corresponding to the information provided in questions 2 and 3?
What is the critical F value if we are testing with a 1% significance level?
What can we conclude about the population means based on this F ratio?
Consider the following analysis of variance table:
Source of variation | Sum of Squares | Degrees of freedom | Mean Squares | F ratio |
Between groups | 1728 | 4 | ||
Within groups | 624 | .. | ||
Total | 2352 | 17 |
How many degrees of freedom does the within-groups sum of squares have?
Compute the mean squares for between groups.
Compute the mean squares for within groups.
Compute the F ratio.
Find the critical F value corresponding to a significance level of 0.05.
What can be concluded about the null hypothesis?
Consider the following analysis of variance table:
Source of variation | Sum of Squares | Degrees of freedom | Mean Squares | F ratio |
Between groups | 879 | .. | ||
Within groups | 798 | 16 | ||
Total | 1677 | 19 |
How many degrees of freedom does the between-groups sum of squares have?
Compute the mean squares for between groups.
Compute the mean squares for within groups.
Compute the F ratio.
Find the critical F value corresponding to a significance level of 0.05.
What can be concluded about the null hypothesis?
For questions 20-28, consider a two-way analysis of variance with one observation per cell and randomized blocks, with the following results:
Source of variation | Sum of squares | Degrees of freedom | Mean squares | F ratio |
Between groups | 3636 | 33 | MSG = SSG / (K - 1) | |
Between blocks | 7575 | 66 | MSB = SSB / (H - 1) | |
Error | 9999 | 1818 | MSE = SSE / ((K - 1) (H - 1)) | |
Total | 210210 | 2727 |
Compute the mean squares for the between groups.
Compute the mean squares for the between blocks.
Compute the mean squares for the error.
Compute the F ratio MSG / MSE.
Find the critical value for the hypothesis test that the between group means are equal using a 5% significance level.
What do we conclude about the null hypothesis that the between group means are equal?
Compute the F ratio MSB / MSE.
Find the critical value for the hypothesis test that the between block means are equal using a 5% significance level.
What do we conclude about the null hypothesis that the between block means are equal?
Consider the following data:
Source of variation | Sum of squares | Degrees of freedom | Mean squares | F ratio |
Between groups | 62.04 | 1 | 62.04 | |
Between blocks | 0.06 | 1 | 0.06 | |
Interaction | 1.85 | ... | 1.85 | |
Error | 23.31 | 63 | 0.37 | |
Total | 87.26 | 66 |
Compute the degrees of freedom for the interaction term.
Compute the F ratio for the interaction term.
All population means are equal, that is: H0: μ1 = μ2 = ... = μk for K populations.
MSW = (12.18) / (20 - 3) = 0.72
MSG = (21.55) / (3 - 1) = 10.78
F = MSG / MSW = 10.775 / 0.71647 = 15.039 (computed with the unrounded mean squares)
df = (K - 1) = 3 - 1 = 2 for the numerator
df = (n - K) = 20 - 3 = 17 for the denominator
F(2, 17, 0.01) = 6.112 (Appendix Table 9)
The test value (15.039) exceeds the critical value (6.112), therefore we can reject the null hypothesis that the population mean is the same for all three groups.
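The one-way ANOVA arithmetic above can be sketched in Python. The function name is ours, and the critical value is the table value quoted above.

```python
def one_way_f(ssg, ssw, n, k):
    """Return (MSG, MSW, F) for a one-way analysis of variance."""
    msg = ssg / (k - 1)  # between-groups mean square
    msw = ssw / (n - k)  # within-groups mean square
    return msg, msw, msg / msw

F_CRIT = 6.112  # critical F with 2 and 17 degrees of freedom, 1% level (Appendix Table 9)

msg, msw, f_ratio = one_way_f(ssg=21.55, ssw=12.18, n=20, k=3)
print(round(msg, 2), round(msw, 2), round(f_ratio, 3), f_ratio > F_CRIT)
```

Working with the unrounded mean squares gives 15.039; dividing the rounded values 10.78 and 0.72 would give a slightly smaller ratio.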
The between-groups sum of squares has K - 1 = 4 degrees of freedom, thus K = 5. Further, the total sum of squares has n - 1 = 17 degrees of freedom, thus n = 18.
As a result, we obtain: df = n - K = 18 - 5 = 13.
MSG = SSG / (K - 1) = 1728 / 4 = 432
MSW = SSW / (n - K) = 624 / 13 = 48
F = MSG / MSW = 432 / 48 = 9
F(4, 13, 0.05) = 3.179
F = 9 exceeds the critical value of 3.179, therefore we can reject the null hypothesis that the population means are equal.
n - 1 = 19 --> n = 20
n - k = 16 --> 20 - k = 16 --> k = 4
df = k - 1 = 4 - 1 = 3
Thus, there are 3 degrees of freedom.
MSG = SSG / (K - 1) = 879 / 3 = 293
MSW = SSW / (n - K) = 798 / 16 = 49.875
F = MSG / MSW = 293 / 49.875 = 5.875
F(3, 16, 0.05) = 3.239
F = 5.875 exceeds the critical value of 3.239, therefore we can reject the null hypothesis that the population means are equal.
MSG = SSG / (K - 1) = 3636 / 33 = 110.18
MSB = SSB / (H - 1) = 7575 / 66 = 114.77
MSE = SSE / ((K - 1) (H - 1)) = 9999 / 1818 = 5.5
F = MSG / MSE = 110.18 / 5.5 = 20.03
F33,1818,0.05 = 1.676
The test statistic exceeds the critical value, therefore we can reject the null hypothesis that the between-groups means are equal.
F = MSB / MSE = 114.77 / 5.5 = 20.87
F66,1818,0.05 = 1.676
The test statistic exceeds the critical value, therefore we can reject the null hypothesis that the between-blocks means are equal.
df = 1
F = MSI / MSE = 1.85 / 0.37 = 5.
What is meant by a time series?
What are the four components of a time series?
Let the estimates of level and trend in year 5 be as follows:
\[ \hat{x}_{5} = 347 \]
\[ T_{5} = 13 \]
What is the forecast for the next year using the Holt-Winters method?
What is the forecast for year 7 using the Holt-Winters method for nonseasonal series?
What is the forecast for year 8 using the Holt-Winters method for nonseasonal series?
What is the forecast for year 9 using the Holt-Winters method for nonseasonal series?
Suppose we have 32 observations and a seasonal factor s = 4 indicating quarterly data. Write down the equation for the forecast of the next observation beyond the end of the series, using the Holt-Winters method for seasonal series.
What is the null hypothesis in an autoregressive model?
Provide the general equation that represents a series according to the autoregressive model.
What algorithm is used to obtain the parameters for the autoregressive model?
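A time series is a set of measurements, ordered over time, on a particular quantity of interest. In a time series, the sequence of observations is important.
The four components of a time series are the trend component, the seasonality component, the cyclical component, and the irregular component.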
\[ \hat{x}_{6} = 347 + 13 = 360 \]
\[ \hat{x}_{7} = 347 + (2)(13) = 373 \]
\[ \hat{x}_{8} = 347 + (3)(13) = 386 \]
\[ \hat{x}_{9} = 347 + (4)(13) = 399 \]
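The nonseasonal Holt-Winters forecasts are the last level estimate plus h times the last trend estimate. A minimal Python sketch with the year-5 estimates given in the exercise (the function name is ours):

```python
def holt_forecast(level, trend, h):
    """Forecast h periods ahead from the latest level and trend estimates."""
    return level + h * trend

LEVEL_5, TREND_5 = 347, 13  # estimates for year 5 from the exercise

forecasts = {5 + h: holt_forecast(LEVEL_5, TREND_5, h) for h in range(1, 5)}
print(forecasts)  # {6: 360, 7: 373, 8: 386, 9: 399}
```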
\[ \hat{x}_{n+h} = ( \hat{x}_{n} + hT_{n} ) F_{n+h-s} \quad \Rightarrow \quad \hat{x}_{33} = (\hat{x}_{32} + T_{32}) F_{29} \]
H0: φp = 0
\[ x_{t} = \gamma + \phi_{1}x_{t - 1} + \phi_{2}x_{t - 2} + ... + \phi_{p}x_{t - p} + \epsilon_{t} \]
The least squares algorithm.
Suppose we conducted a stratified sampling procedure. Use the following information:
N1 = 75; N2 = 30; N3 = 20 (N = 125).
n1 = 15; n2 = 8; n3 = 2 (n = 25).
x̄1 = 21.2; s1 = 12.8.
x̄2 = 13.3; s2 = 11.4.
x̄3 = 26.1; s3 = 9.2.
Compute the point estimate of the population mean.
Compute the point estimate of the variance for the first stratum.
Compute the point estimate of the variance for the second stratum.
Compute the point estimate of the variance for the third stratum.
Compute the point estimate of the variance for the population mean.
Compute the point estimate of the standard deviation for the population mean.
Compute a 95% confidence interval for the population mean.
Suppose we conducted a stratified sampling procedure. Use the following information:
N1 = 364; N2 = 1031.
n1 = 40; n2 = 60.
p(hat)1 = 7/40 = 0.175
p(hat)2 = 13/60 = 0.217
Compute the point estimate of the population proportion.
Compute the point estimate of the variance of the proportion for the first stratum.
Compute the point estimate of the variance of the proportion for the second stratum.
Compute the point estimate of the variance of the proportion for the population.
Compute the point estimate of the standard deviation of the proportion for the population.
Compute the 90% confidence interval for the population proportion from these stratified samples.
Suppose we have a total of N = 125 which is divided into three strata with N1 = 75, N2 = 30, and N3 = 20. Now, suppose we want to select a sample of size n = 25.
Compute the sample size for the first stratum using proportional allocation.
Compute the sample size for the second stratum using proportional allocation.
Compute the sample size for the third stratum using proportional allocation.
Suppose we have a total of N = 225 which is divided into three strata with N1 = 100, N2 = 75, and N3 = 50. Now, suppose we want to select a sample of size n = 50.
Compute the sample size for the first stratum using proportional allocation.
Compute the sample size for the second stratum using proportional allocation.
Compute the sample size for the third stratum using proportional allocation.
Suppose we have a total of N = 500 which is divided into three strata with N1 = 250, N2 = 100, and N3 = 150. Now, suppose we want to select a sample of size n = 50.
Compute the sample size for the first stratum using proportional allocation.
Compute the sample size for the second stratum using proportional allocation.
Compute the sample size for the third stratum using proportional allocation.
Suppose we have a total of N = 500 which is divided into three strata with N1 = 250, N2 = 100, and N3 = 150. Now, suppose we want to select a sample of size n = 100.
Compute the sample size for the first stratum using proportional allocation.
Compute the sample size for the second stratum using proportional allocation.
Compute the sample size for the third stratum using proportional allocation.
What is the difference between proportional allocation and optimal allocation in terms of sample effort?
What is the difference between proportional allocation and optimal allocation in terms of estimating the sample size for strata for population proportions?
What is the difference between stratified sampling and cluster sampling?
Mention one advantage and one disadvantage of cluster sampling.
Mention one advantage and one disadvantage of two-phase sampling
\[ \bar{x}_{st} = \frac{1}{N} \sum^{K}_{j = 1} N_{j}\bar{x}_{j} = \frac{ (75)(21.2) + (30)(13.3) + (20)(26.1) }{125} = 20.09 \]
\[ \hat{\sigma}^{2}_{\bar{x}_{1}} = \frac{ s^{2}_{1} }{n_{1}} \times \frac{ N_{1} - n_{1} }{N_{1} - 1 } = \frac{(12.8)^{2}}{15} \times \frac{60}{74} = 8.856 \]
\[ \hat{\sigma}^{2}_{\bar{x}_{2}} = \frac{ s^{2}_{2} }{n_{2}} \times \frac{ N_{2} - n_{2} }{N_{2} - 1 } = \frac{(11.4)^{2}}{8} \times \frac{22}{29} = 12.324 \]
\[ \hat{\sigma}^{2}_{\bar{x}_{3}} = \frac{ s^{2}_{3} }{n_{3}} \times \frac{ N_{3} - n_{3} }{N_{3} - 1 } = \frac{(9.2)^{2}}{2} \times \frac{18}{19} = 40.093 \]
\[ \hat{\sigma}^{2}_{\bar{x}_{st}} = \frac{1}{N^{2}} \sum^{K}_{j = 1} N^{2}_{j} \hat{\sigma}^{2}_{\bar{x}_{j}} = \frac{ (75)^{2}(8.856) + (30)^{2}(12.324) + (20)^{2}(40.093) }{125^{2}} = 4.924 \]
\[ \hat{\sigma}_{\bar{x}_{st}} = \sqrt{4.924} = 2.22 \]
20.09 +/- (1.96)(2.22) = [15.74; 24.44]
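The whole stratified-mean calculation can be sketched in Python. The function name is ours, and the inputs are the values used in the worked answer: stratum sizes 75, 30 and 20, and sample sizes 15, 8 and 2.

```python
from math import sqrt

def stratified_mean(N, n, xbar, s, z=1.96):
    """Point estimate, variance and confidence interval for a stratified mean."""
    total = sum(N)
    mean = sum(Nj * xj for Nj, xj in zip(N, xbar)) / total
    # Variance of each stratum mean, with finite population correction.
    var_j = [sj**2 / nj * (Nj - nj) / (Nj - 1) for Nj, nj, sj in zip(N, n, s)]
    var = sum(Nj**2 * vj for Nj, vj in zip(N, var_j)) / total**2
    se = sqrt(var)
    return mean, var_j, var, (mean - z * se, mean + z * se)

mean, var_j, var, ci = stratified_mean(
    N=[75, 30, 20], n=[15, 8, 2], xbar=[21.2, 13.3, 26.1], s=[12.8, 11.4, 9.2]
)
print(round(mean, 2), [round(v, 3) for v in var_j], round(var, 3))
print([round(b, 2) for b in ci])
```

The third stratum has by far the largest variance term because only 2 of its 20 members were sampled.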
\[ \hat{p}_{st} = \frac{1}{N} \sum^{K}_{j = 1} N_{j} \hat{p}_{j} = \frac{ (364)(0.175) + (1031)(0.217) }{1395} = 0.206 \]
\[ \hat{\sigma}^{2}_{\hat{p}_{1}} = \frac{ \hat{p}_{1} (1 - \hat{p}_{1}) }{n_{1} - 1} \times \frac{ N_{1} - n_{1} }{N_{1} - 1} = \frac{ (0.175)(0.825) }{39} \times \frac{324}{363} = 0.003304 \]
\[ \hat{\sigma}^{2}_{\hat{p}_{2}} = \frac{ \hat{p}_{2} (1 - \hat{p}_{2}) }{n_{2} - 1} \times \frac{ N_{2} - n_{2} }{N_{2} - 1} = \frac{ (0.217)(0.783) }{59} \times \frac{971}{1030} = 0.002715 \]
\[ \hat{\sigma}^{2}_{\hat{p}_{st}} = \frac{1}{N^{2}} \sum^{K}_{j = 1} N^{2}_{j} \hat{\sigma}^{2}_{\hat{p}_{j}} = \frac{ (364)^{2}(0.003304) + (1031)^{2}(0.002715) }{ (1395)^{2} } = 0.001708 \]
\[ \hat{\sigma}_{\hat{p}_{st}} = \sqrt{0.001708} = 0.0413 \]
0.206 +/- (1.645)(0.0413) = [0.138; 0.274]
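For proportions, the stratum variance uses p(1 - p)/(n - 1) instead of s²/n. A Python sketch under the same assumptions as the worked answer (the function name is ours; 0.217 is the rounded value of 13/60 used above):

```python
from math import sqrt

def stratified_proportion(N, n, p, z=1.645):
    """Point estimate, variance and confidence interval for a stratified proportion."""
    total = sum(N)
    p_st = sum(Nj * pj for Nj, pj in zip(N, p)) / total
    # Variance of each stratum proportion, with finite population correction.
    var_j = [pj * (1 - pj) / (nj - 1) * (Nj - nj) / (Nj - 1)
             for Nj, nj, pj in zip(N, n, p)]
    var = sum(Nj**2 * vj for Nj, vj in zip(N, var_j)) / total**2
    se = sqrt(var)
    return p_st, var, (p_st - z * se, p_st + z * se)

p_st, var_p, ci_p = stratified_proportion(N=[364, 1031], n=[40, 60], p=[0.175, 0.217])
print(round(p_st, 3), round(var_p, 6), [round(b, 3) for b in ci_p])
```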
\[ n_{1} = \frac{75}{125} \times 25 = 15 \]
\[ n_{2} = \frac{30}{125} \times 25 = 6 \]
\[ n_{3} = \frac{20}{125} \times 25 = 4 \]
\[ n_{1} = \frac{100}{225} \times 50 = 22.2 \approx 22 \]
\[ n_{2} = \frac{75}{225} \times 50 = 16.7 \approx 17 \]
\[ n_{3} = \frac{50}{225} \times 50 = 11.1 \approx 11 \]
\[ n_{1} = \frac{250}{500} \times 50 = 25 \]
\[ n_{2} = \frac{100}{500} \times 50 = 10 \]
\[ n_{3} = \frac{150}{500} \times 50 = 15 \]
\[ n_{1} = \frac{250}{500} \times 100 = 50 \]
\[ n_{2} = \frac{100}{500} \times 100 = 20 \]
\[ n_{3} = \frac{150}{500} \times 100 = 30 \]
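Proportional allocation gives each stratum a share of the sample equal to its share of the population. A minimal Python sketch (the function name is ours), shown for the N = 225 and N = 500 exercises:

```python
def proportional_allocation(strata_sizes, n):
    """Unrounded sample size per stratum: n_j = (N_j / N) * n."""
    total = sum(strata_sizes)
    return [n * Nj / total for Nj in strata_sizes]

raw = proportional_allocation([100, 75, 50], 50)
rounded = [round(x) for x in raw]
print([round(x, 2) for x in raw], rounded, sum(rounded))

exact = proportional_allocation([250, 100, 150], 100)
print(exact)
```

Simple rounding happens to add up to n here; in general the rounded sizes may need a small adjustment so that they sum to the intended sample size.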
Optimal allocation allocates relatively more sample effort to strata in which the population variance is highest.
Optimal allocation allocates more sample observations to strata in which the true population proportions are closest to 0.50.
In stratified random sampling, a sample is taken from every stratum of the population in an attempt to ensure that important segments of the population are given corresponding weight. In cluster sampling, a random sample of clusters is taken, such that some clusters will have no members in the sample.
Advantage: convenience. Disadvantage: the additional imprecision in the sample estimates.
Advantage: it enables the researcher, at a low cost, to try out the survey. Disadvantage: time consuming.