Module F
Project Description
In this project, you will find a study or article online that makes some form of statistical claim, and discuss the strengths, weaknesses, biases, and results of the claim. You will then aim to answer a similar research question using your own research methods and analyses.
This project is intentionally more open-ended than previous projects. The direction of the analysis is largely left to you.
Your final submission may be in any format (or combination of formats) you like, as long as your instructor can read it. If you write any code, make sure to submit your IPython notebook.
Here are the steps you’ll need to complete. Under each step are sub-bullets detailing questions you need to answer in your report.
Form a group of no more than 4 people. You may want to form a group with people who are in a similar major or have similar interests, as this may make question selection easier.
Find a study or article online that makes a statistical claim. (For example, this article and this underlying study claim that reading Harry Potter can reduce prejudice. But is it true?)
Why did you choose this study? What drew you to it?
What did you initially expect to find? Was your original hypothesis correct?
Discuss, via a notebook, document, or slide presentation, the relative strengths and weaknesses of the study. Make sure to intersperse visualizations with numerical and with written analysis.
What are the null and alternative hypotheses?
Is a p-value mentioned?
What would you do differently if you ran this study?
Is the way this study is conducted ethical?
Is there any bias in the data collection? (For example, what if the survey was advertised within a Harry Potter fansite?)
Is there any bias in the data source? (For example, what if all respondents were teenagers?)
Are there any ethical implications of the results? (For example, what if the UN mandated everyone to read Harry Potter?)
Obtain data to answer the same (or similar) research question as in the study. You may use the following guidelines when choosing a dataset.
If the data is linked in the study, you may use it. You may also find alternative data sources that answer a similar question.
If you use a dataset you found online (either the same one as the study or elsewhere), you must justify why this dataset is acceptable to use. This includes a discussion on the ethics of how the dataset was obtained, and what data cleaning must be done.
If you use a dataset different from that of the study, include a discussion of the strengths and weaknesses of the two datasets. For example, your dataset may include more data points than the dataset used in the study, but it may include more bias.
If you don’t want to use an online dataset, you may generate your own dataset. You must include all code and an explanation as to how the data was simulated.
Do your own analysis to answer the same (or a similar) question as the study. Your analysis does not need to answer the same question as the study, but it should be closely related. For example, you may find a different dataset online to show that, in fact, reading Harry Potter does not statistically significantly reduce prejudice. Alternatively, you may find a more general dataset showing that reading fiction in general reduces prejudice (implying that it may not specifically be Harry Potter that reduces prejudice).
What are the null and alternative hypotheses?
What is the p-value?
Does your study address the strengths and weaknesses of the published study?
What are the ethical implications of the data and your study?
Submit your report by [Due Date].
Optional (faculty can decide whether to include or not): After completing and submitting your project, complete the group work self-assessment and group assessment.
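As a rough illustration of the simulation and analysis steps above, the sketch below generates a made-up dataset and computes a p-value with a one-sided permutation test. Everything in it is an assumption chosen for illustration: the group names, the means, spreads and sample sizes, the helper `permutation_p_value`, and the significance level of 0.05. None of these values come from the Harry Potter study, and a permutation test is only one of several tests that could reasonably be used.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the simulation is reproducible

# Hypothetical simulated data: prejudice scores (lower = less prejudice).
# The means, spread, and sample sizes are illustrative assumptions only.
readers = rng.normal(loc=48.0, scale=10.0, size=200)
non_readers = rng.normal(loc=50.0, scale=10.0, size=200)


def permutation_p_value(group_a, group_b, n_permutations=5_000, rng=None):
    """One-sided permutation test.

    H0: the two groups have the same mean score.
    H1: group_a has a lower mean score than group_b.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = group_a.mean() - group_b.mean()
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by permuting the pooled scores.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
        if diff <= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (count + 1) / (n_permutations + 1)


alpha = 0.05  # significance level, chosen before looking at the data
p_value = permutation_p_value(readers, non_readers, rng=rng)

print(f"observed difference in means: {readers.mean() - non_readers.mean():.2f}")
print(f"p-value: {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

In a report, the explanation of how the data was simulated (here, two normal distributions with assumed parameters) would accompany the code, along with the stated hypotheses and the reasoning behind the chosen significance level.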
Grading Rubric
Excellent 
Developing 
Beginning 
NA / Not Present 


Study Choice (6) 
The chosen study is interesting and relevant. There is a clear hypothesis test that can be formed from the study. 
The chosen study is interesting and/or relevant. There is a claim that can be hypothesis tested, but it requires a level of abstraction. 
The chosen study is either outdated or irrelevant. There is only a weak statistical claim in the study, and it is hard to form the hypothesis test. 
There is no study chosen, or it has absolutely no statistical claim. 
Study Discussion (12) 
The report contains a lengthy discussion on why the study was chosen. There is a discussion on the strengths and weaknesses of the study, including what could have been done differently. There is a clear statement of the null and alternative hypotheses, and of whether a p-value is mentioned. There is a discussion on the ethics of the study.
The report contains a discussion on why the study was chosen. There is some mention of the strengths and weaknesses of the study, including what could have been done differently. These ideas may not be fully developed. The null and alternative hypotheses are mentioned, as is whether a p-value is reported, but the study may be misinterpreted. There is a discussion on the ethics of the study.
The report contains a discussion on why the study was chosen. There is some mention of the strengths and weaknesses of the study, including what could have been done differently. These ideas may not be fully developed. There is no clear statement of the null and alternative hypotheses. The p-value is not mentioned, or is mentioned without elaboration. There is no or minimal discussion on the ethics of the study.
There is no discussion on why the study was chosen. 
Dataset (12) 
There is a sufficient discussion as to what dataset is chosen and why. If a dataset is chosen from online (either from the study or elsewhere), there is a discussion as to why it was chosen, the ethics of the gathering of the data, and any necessary data cleaning. If the dataset is simulated, a reasonable algorithm is used, all code is included, and there is an explanation as to how it was simulated. 
There is a discussion as to what dataset is chosen and why. If a dataset is chosen from online (either from the study or elsewhere), there is a discussion as to why it was chosen, but it may be lacking in details. If the dataset is simulated, either the algorithm does not make complete sense or the explanation lacks clarity. 
There is a discussion as to what dataset is chosen and why, but it is lacking clarity or substance. If a dataset is chosen from online (either from the study or elsewhere), there is a discussion as to why it was chosen, but it is lacking crucial details. If the dataset is simulated, either the algorithm does not make sense or there is minimal explanation as to how it was simulated. 
There is no dataset chosen, or it was chosen without any discussion at all. 
Analysis (16) 
There is an in-depth effort at conducting a study to test the hypotheses. The null and alternative hypotheses are stated. A significance level (alpha) is mentioned. The data is correctly used to calculate a p-value, and the correct conclusion for the hypothesis test is drawn. There is a discussion as to the strengths and weaknesses of the study, as well as the ethical implications.
There is an effort at conducting a study to test the hypotheses. The null and alternative hypotheses are stated. A significance level (alpha) is mentioned. The data is used to calculate a p-value, but it may not be completely correct. There may not be a mention as to how the p-value links to the hypotheses. There is a discussion as to the strengths and weaknesses of the study, but it may lack detail or nuance.
There is an effort at conducting a study to test the hypotheses, but it is limited. The null and alternative hypotheses or the significance level (alpha) are not stated. There is no mention of the p-value, or it is used incorrectly. There is no discussion as to the strengths and weaknesses of the study, or it is misguided.
There is no analysis, or it is incoherent. 
Readability (4) 
The report is structured well. There are descriptions where necessary. There are very few spelling/grammar errors. 
The report lacks structure and is hard to follow. There are several spelling/grammar errors.
There is no report, or it is unreadable. 

Total (50) 