I previously mentioned Christopher Madan’s article on open data in teaching. He points to the Open Stats Lab at Trinity College, which I hadn’t heard about before. They create statistics exercises, suitable for an undergrad statistics or psychological research methods course, based on open data from papers published in Psychological Science. Each exercise has the original paper, a dataset, and a brief (1-2 page) description of the analysis to be done. They have multiple examples for each topic, including significance testing, correlation, regression and ANOVA.
It’s very nicely done, and the contexts of the analyses (i.e., the research questions from the original papers) are interesting, which I think helps make the exercises engaging for students. There’s flexibility in the exercises for the students to think about which variables to use, and to formulate their own interpretation. My one quibble is that the writeups seem aimed at the student, and I think it might be useful to have a separate document with a few pointers (e.g., a “teaching note”) for instructors. In particular, if I were using one of the correlation exercises (especially the example on Ebola and voting), I would want to have a thorough discussion in class of what we can and cannot conclude about causation (including how the original paper tried to address causation) and how to think about alternative explanations for the observed correlation.
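For that class discussion, a small simulation can make the idea of an alternative explanation concrete. The sketch below is my own illustration, not part of the Open Stats Lab materials, and it uses made-up data rather than the Ebola dataset. The variables `z`, `x`, and `y` and the helper functions are all hypothetical. It assumes a single unobserved factor `z` that drives both `x` and `y`, with no causal link between `x` and `y` themselves. By construction the theoretical correlation between `x` and `y` is 0.5, and it should largely vanish once `z` is accounted for.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """Residuals from a simple least-squares regression of ys on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    intercept = mean_y - slope * mean_z
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


def simulate(n=5000, seed=0):
    """Simulated data: z influences both x and y; x does not influence y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]  # x = confounder + independent noise
    y = [zi + rng.gauss(0, 1) for zi in z]  # y = confounder + independent noise
    return z, x, y


z, x, y = simulate()

# Raw correlation: clearly positive, even though x has no effect on y.
r_raw = pearson_r(x, y)

# Correlation after removing the part of x and y explained by z
# (a partial correlation): close to zero.
r_partial = pearson_r(residuals(x, z), residuals(y, z))

print(f"raw correlation:     {r_raw:.2f}")
print(f"partial correlation: {r_partial:.2f}")
```

In a real study the confounder usually isn't measured so conveniently, which is exactly why it's worth discussing how the original authors tried to rule out alternatives.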
Psychological Science, the source of the data, has a voluntary data disclosure policy, offering “badges” to published articles that make their data public. This is a relatively gentle “nudge” towards open data, but it seems to be having an effect. Nearly a quarter of papers published after the policy was instituted made their data public, compared to 3% before. If the efforts of the Open Stats Lab (and others like it) are successful, they will give authors a nice additional incentive to make their data public. After all, who wouldn’t want their research to become a “textbook example” used in statistics classes?