A teacher wanting to increase the active learning component of her course is concerned about student reactions to changes she is planning to make. She conducts a survey in her class, asking students whether they believe more active learning in the classroom (hands-on exercises) instead of traditional lecture will help improve their learning. She does this at the beginning and end of the semester and wants to evaluate whether students’ opinions have changed over the semester. Can she use the methods we learned in this chapter for this analysis? Explain your reasoning.

Solution.

No. The samples at the beginning and at the end of the semester are not independent since the survey is conducted on the same students.

2.Website experiment.

The OpenIntro website occasionally experiments with design and link placement. We conducted one experiment testing three different placements of a download link for this textbook on the book’s main page to see which location, if any, led to the most downloads. The number of site visitors included in the experiment was 701 and is captured in one of the response combinations in the following table:

| | Download | No Download |
|---|---|---|
| Position 1 | 13.8% | 18.3% |
| Position 2 | 14.6% | 18.5% |
| Position 3 | 12.1% | 22.7% |

Calculate the actual number of site visitors in each of the six response categories.

Each individual in the experiment had an equal chance of being in any of the three experiment groups. However, we see that there are slightly different totals for the groups. Is there any evidence that the groups were actually imbalanced? Make sure to clearly state hypotheses, check conditions, calculate the appropriate test statistic and the p-value, and make your conclusion in context of the data.

Complete an appropriate hypothesis test to check whether there is evidence that there is a higher rate of site visitors clicking on the textbook link in any of the three groups.

3.Shipping holiday gifts.

A local news survey asked 500 randomly sampled Los Angeles residents which shipping carrier they prefer to use for shipping holiday gifts. The table below shows the distribution of responses by age group as well as the expected counts for each cell (shown in parentheses).

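| Shipping Method | Age 18-34 | Age 35-54 | Age 55+ | Total |
|---|---|---|---|---|
| USPS | 72 (81) | 97 (102) | 76 (62) | 245 |
| UPS | 52 (53) | 76 (68) | 34 (41) | 162 |
| FedEx | 31 (21) | 24 (27) | 9 (16) | 64 |
| Something else | 7 (5) | 6 (7) | 3 (4) | 16 |
| Not sure | 3 (5) | 6 (5) | 4 (3) | 13 |
| Total | 165 | 209 | 126 | 500 |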

State the null and alternative hypotheses for testing for independence of age and preferred shipping method for holiday gifts among Los Angeles residents.

Are the conditions for inference using a chi-square test satisfied?

Solution.

\(H_{0}:\) The age of Los Angeles residents is independent of their shipping carrier preference. \(H_{A}:\) The age of Los Angeles residents is associated with their shipping carrier preference.

The conditions are not satisfied since some expected counts are below 5.
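The expected-count check can be sketched in a few lines of Python. This is a minimal illustration, not part of the original solution: it recomputes each expected count as (row total × column total) / grand total from the observed counts in the table, so the unrounded values may differ slightly from the rounded figures printed in parentheses. The variable names are chosen here for illustration.

```python
# Observed counts from the table (rows: carrier; columns: age 18-34, 35-54, 55+).
observed = {
    "USPS": [72, 97, 76],
    "UPS": [52, 76, 34],
    "FedEx": [31, 24, 9],
    "Something else": [7, 6, 3],
    "Not sure": [3, 6, 4],
}
ages = ["18-34", "35-54", "55+"]

col_totals = [sum(row[j] for row in observed.values()) for j in range(len(ages))]
grand_total = sum(col_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = {
    carrier: [sum(row) * c / grand_total for c in col_totals]
    for carrier, row in observed.items()
}

# Cells that fail the "expected count of at least 5" condition.
small_cells = [
    (carrier, ages[j], round(e, 2))
    for carrier, row in expected.items()
    for j, e in enumerate(row)
    if e < 5
]
print(small_cells)
```

Any cell listed in `small_cells` violates the condition, which is why the chi-square approximation is not appropriate for this table as it stands.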

4.The Civil War.

A national survey conducted among a simple random sample of 1,507 adults shows that 56% of Americans think the Civil War is still relevant to American politics and political life.^{ 1 }

Conduct a hypothesis test to determine if these data provide strong evidence that the majority of the Americans think the Civil War is still relevant.

Interpret the p-value in this context.

Calculate a 90% confidence interval for the proportion of Americans who think the Civil War is still relevant. Interpret the interval in this context, and comment on whether or not the confidence interval agrees with the conclusion of the hypothesis test.

5.College smokers.

We are interested in estimating the proportion of students at a university who smoke. Out of a random sample of 200 students from this university, 40 students smoke.

Calculate a 95% confidence interval for the proportion of students at this university who smoke, and interpret this interval in context. (Reminder: Check conditions.)

If we wanted the margin of error to be no larger than 2% at a 95% confidence level for the proportion of students who smoke, how big of a sample would we need?

Solution.

Independence is satisfied (random sample), as is the success-failure condition (40 smokers, 160 non-smokers). The 95% CI: \((0.145, 0.255)\text{.}\) We are 95% confident that 14.5% to 25.5% of all students at this university smoke.

We want \(z^{*}SE\) to be no larger than 0.02 for a 95% confidence level. We use \(z^{*} = 1.96\) and plug in the point estimate \(\hat{p} = 0.2\) within the SE formula: \(1.96 \sqrt{0.2(1- 0.2)/n} \lt 0.02\text{.}\) The sample size \(n\) should be at least 1,537.
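Both calculations above can be checked with a short Python sketch. It assumes the normal approximation used in the solution and takes the critical value from the standard library rather than rounding it to 1.96; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, smokers = 200, 40          # sample size and number of smokers from the exercise
p_hat = smokers / n           # point estimate, 0.2

# Critical value for 95% confidence (about 1.96).
z_star = NormalDist().inv_cdf(0.975)

# Confidence interval: p_hat +/- z* times the standard error.
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_star * se, p_hat + z_star * se)

# Smallest n with z* * sqrt(p_hat * (1 - p_hat) / n) <= 0.02,
# using p_hat as the planning value for the proportion.
target_margin = 0.02
n_needed = ceil(z_star**2 * p_hat * (1 - p_hat) / target_margin**2)

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Required sample size: {n_needed}")
```

Solving the inequality for \(n\) gives \(n \ge (z^{*})^2 \hat{p}(1-\hat{p}) / 0.02^2\), and rounding up is what yields 1,537.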

6.Acetaminophen and liver damage.

It is believed that large doses of acetaminophen (the active ingredient in over the counter pain relievers like Tylenol) may cause damage to the liver. A researcher wants to conduct a study to estimate the proportion of acetaminophen users who have liver damage. For participating in this study, he will pay each subject $20 and provide a free medical consultation if the patient has liver damage.

If he wants to limit the margin of error of his 98% confidence interval to 2%, what is the minimum amount of money he needs to set aside to pay his subjects?

The amount you calculated in part (a) is substantially over his budget so he decides to use fewer subjects. How will this affect the width of his confidence interval?

7.Life after college.

We are interested in estimating the proportion of graduates at a mid-sized university who found a job within one year of completing their undergraduate degree. Suppose we conduct a survey and find out that 348 of the 400 randomly sampled graduates found jobs. The graduating class under consideration included over 4500 students.

Describe the population parameter of interest. What is the value of the point estimate of this parameter?

Check if the conditions for constructing a confidence interval based on these data are met.

Calculate a 95% confidence interval for the proportion of graduates who found a job within one year of completing their undergraduate degree at this university, and interpret it in the context of the data.

What does “95% confidence” mean?

Now calculate a 99% confidence interval for the same parameter and interpret it in the context of the data.

Compare the widths of the 95% and 99% confidence intervals. Which one is wider? Explain.

Solution.

Proportion of graduates from this university who found a job within one year of graduating. \(\hat{p} = 348/400 = 0.87\text{.}\)

This is a random sample, so the observations are independent. Success-failure condition is satisfied: 348 successes, 52 failures, both well above 10.

\((0.8371, 0.9029)\text{.}\) We are 95% confident that approximately 84% to 90% of graduates from this university found a job within one year of completing their undergraduate degree.

95% of such random samples would produce a 95% confidence interval that includes the true proportion of students at this university who found a job within one year of graduating from college.

\((0.8267, 0.9133)\text{.}\) Similar interpretation as before.

99% CI is wider, as we are more confident that the true proportion is within the interval and so need to cover a wider range.
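To see the width comparison numerically, here is a small Python sketch. The helper `proportion_ci` is a hypothetical function written for this illustration (it is not from the text), and it uses the same normal-approximation interval as the solution. Results may differ from the printed intervals in the fourth decimal place because of rounding in the critical value.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% and 2.58 for 99%.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


ci_95 = proportion_ci(348, 400, 0.95)
ci_99 = proportion_ci(348, 400, 0.99)
width_95 = ci_95[1] - ci_95[0]
width_99 = ci_99[1] - ci_99[0]

print(f"95% CI: ({ci_95[0]:.4f}, {ci_95[1]:.4f}), width {width_95:.4f}")
print(f"99% CI: ({ci_99[0]:.4f}, {ci_99[1]:.4f}), width {width_99:.4f}")
```

The point estimate and standard error are identical in both intervals; only the critical value changes, so the 99% interval is wider by the ratio of the two critical values.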

8.Diabetes and unemployment.

A Gallup poll surveyed Americans about their employment status and whether or not they have diabetes. The survey results indicate that 1.5% of the 47,774 employed (full or part time) and 2.5% of the 5,855 unemployed 18-29 year olds have diabetes.^{ 2 }

Create a two-way table presenting the results of this study.

State appropriate hypotheses to test for difference in proportions of diabetes between employed and unemployed Americans.

The sample difference is about 1%. If we completed the hypothesis test, we would find that the p-value is very small (about 0), meaning the difference is statistically significant. Use this result to explain the difference between statistically significant and practically significant findings.

9.Rock-paper-scissors.

Rock-paper-scissors is a hand game played by two or more people where players choose to sign either rock, paper, or scissors with their hands. For your statistics class project, you want to evaluate whether players choose between these three options randomly, or if certain options are favored above others. You ask two friends to play rock-paper-scissors and count the times each option is played. The following table summarizes the data:

Rock

Paper

Scissors

43

21

35

Use these data to evaluate whether players choose between these three options randomly, or if certain options are favored above others. Make sure to clearly outline each step of your analysis, and interpret your results in context of the data and the research question.

Solution.

Use a chi-squared goodness of fit test. \(H_{0}:\) Each option is equally likely. \(H_{A}:\) Some options are preferred over others. Total sample size: 99. Expected counts: \((1/3) * 99 = 33\) for each option. These are all above 5, so conditions are satisfied. \(df =3- 1 = 2\) and \(\chi^2 = \frac{(43-33)^2}{33} + \frac{(21-33)^2}{33}+ \frac{(35-33)^2}{33} =7.52 \rightarrow \text{p-value }= 0.023\text{.}\) Since the p-value is less than 5%, we reject \(H_{0}\text{.}\) The data provide convincing evidence that some options are preferred over others.
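The test statistic and p-value can be reproduced with the Python standard library alone. This sketch relies on one fact specific to this problem: for 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form \(e^{-x/2}\). For other degrees of freedom a statistics library or table would be needed instead.

```python
from math import exp

# Observed counts from the exercise.
observed = {"Rock": 43, "Paper": 21, "Scissors": 35}

total = sum(observed.values())
expected = total / len(observed)   # equal probabilities under H0

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# Upper-tail area for a chi-square distribution with df = 2 only.
p_value = exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```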

10.2010 Healthcare Law.

On June 28, 2012 the U.S. Supreme Court upheld the much debated 2010 healthcare law, declaring it constitutional. A Gallup poll released the day after this decision indicates that 46% of 1,012 Americans agree with this decision. At a 95% confidence level, this sample has a 3% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning.^{ 3 }

We are 95% confident that between 43% and 49% of Americans in this sample support the decision of the U.S. Supreme Court on the 2010 healthcare law.

We are 95% confident that between 43% and 49% of Americans support the decision of the U.S. Supreme Court on the 2010 healthcare law.

If we considered many random samples of 1,012 Americans, and we calculated the sample proportions of those who support the decision of the U.S. Supreme Court, 95% of those sample proportions will be between 43% and 49%.

The margin of error at a 90% confidence level would be higher than 3%.

11.Browsing on the mobile device.

A survey of 2,254 American adults indicates that 17% of cell phone owners browse the internet exclusively on their phone rather than a computer or other device.^{ 4 }

According to an online article, a report from a mobile research company indicates that 38 percent of Chinese mobile web users only access the internet through their cell phones.^{ 5 }

S. Chang. “The Chinese Love to Use Feature Phone to Access the Internet”. In: M.I.C Gadget (2012).

Conduct a hypothesis test to determine if these data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%.

Interpret the p-value in this context.

Calculate a 95% confidence interval for the proportion of Americans who access the internet on their cell phones, and interpret the interval in this context.

Solution.

\(H_{0} : p = 0.38\text{.}\) \(H_{A} : p \ne 0.38\text{.}\) Independence (random sample) and the success-failure condition are satisfied. \(Z = -20.5 \rightarrow \text{p-value } \approx 0\text{.}\) Since the p-value is very small, we reject \(H_{0}\text{.}\) The data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%, and the data indicate that the proportion is lower in the US.

If in fact 38% of Americans used their cell phones as a primary access point to the internet, the probability of obtaining a random sample of 2,254 Americans where 17% or less or 59% or more use only their cell phones to access the internet would be approximately 0.

\((0.1545, 0.1855)\text{.}\) We are 95% confident that approximately 15.5% to 18.6% of all Americans primarily use their cell phones to browse the internet.
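The test statistic, p-value, and interval in this solution can be checked with the following Python sketch. It assumes the usual one-proportion z-test: the standard error for the test uses the null value 0.38, while the standard error for the confidence interval uses the sample proportion 0.17. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat, p_null = 2254, 0.17, 0.38

# Hypothesis test: the standard error is computed under the null value.
se_null = sqrt(p_null * (1 - p_null) / n)
z = (p_hat - p_null) / se_null
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided p-value

# Confidence interval: the standard error is computed from the sample proportion.
se_hat = sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(0.975)
ci = (p_hat - z_star * se_hat, p_hat + z_star * se_hat)

print(f"Z = {z:.1f}, p-value = {p_value:.3g}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

A Z-score this far from 0 gives a p-value that is 0 to any practical precision, which matches the "approximately 0" in the solution.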

12.Which chi-square test? Part 1.

Consider each of the following tables. Determine (i) if a goodness of fit test, test for homogeneity, or test for independence is more appropriate, and (ii) how many degrees of freedom should be used for the test.

| Favorite Animal | Count |
|---|---|
| Red Panda | 22 |
| Koala | 7 |
| Otter | 13 |
| Fennec Fox | 25 |
| Hedgehog | 38 |

| Favorite Kid Food | Count |
|---|---|
| Pizza | 167 |
| Tacos | 48 |
| Mac and Cheese | 171 |
| Chicken or Veggie Nuggets | 74 |
| Broccoli | 2 |

| | Rushing | Not |
|---|---|---|
| Freshman | 14 | 275 |
| Sophomore | 5 | 392 |
| Other | 7 | 725 |

| Commute Time | Count |
|---|---|
| \(\le 10\) minutes | 198 |
| 11-30 minutes | 130 |
| 31-60 minutes | 48 |
| \(\gt 60\) minutes | 29 |

13.Which chi-square test? Part 2.

Consider each of the following planned studies. Determine (i) if a goodness of fit test, test for homogeneity, or test for independence is more appropriate, and (ii) how many degrees of freedom should be used for the test.

A state is conducting a study to better understand pay for tradespeople in the state’s three largest cities. In each city, the state will take a random sample of tradespeople and estimate the proportion who made at least $100,000 in each of the cities. In their final report, they would also like to note whether that proportion varies across the three cities.

A particular gene has 3 variants that can be found in proportions \(p_{1}=0.15\text{,}\) \(p_{2}=0.60\text{,}\) and \(p_{3}=0.25\) in the general population. Scientists suspect different variants of this gene might indicate an elevated risk for a particular genetic disease, and one way to evaluate this is to see if the general population distribution is the same in patients with the disease. The scientists will sample 450 patients with the disease and identify which variant each patient has.

A candy company produces candy pieces in 5 different colors that are mixed into bags. The colors should be in the following proportions: 15% green, 22% orange, 20% yellow, 24% red, and 19% purple. As a quality control check, the company randomly samples 1500 candy pieces and wants to determine if the target proportions match those of the observed distribution.

Solution.

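Since there are 3 independent random samples here (one per city), we do a test for homogeneity. With 3 cities and 2 outcome categories (made at least $100,000 or not), \(df = (3-1)(2-1) = 2\text{.}\)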