Reproducible Quantitative Methods
Lesson 1
Topics and Resources
-
Course syllabus and expectations
This is a project-based learning course. The idea is to mentor students through a complete reproducible workflow, using a real dataset, with the intention of publishing their work as a manuscript, complete with data and code products. This project-based approach is meant to stimulate the natural scientific curiosity that we wouldn't get from using canned examples to complete the exercises, and to motivate us by offering authorship on conventional scientific products, all while giving us skills that will bring our work into compliance with federal funder guidelines (for example, NSF). This means we'll need to start with some data. The good news is, unused (or under-used) data is everywhere.
See Simon Leather’s post on unused data that needs love
I want us to have groups established, with appropriate data sets identified, by week 2 of the class. Let's use this etherpad to discuss the available projects. New to etherpad? Think of it as a collective notepad with chat functionality. We can, and will, use these for a lot of purposes in this class. Need a fresh etherpad for a new project? Go here.
How are we going to keep track of our progress? This is a component of the grade. In previous offerings, you'll see that we used a class blog, but I'm lukewarm on that now. Let's consider a few options that we can use to keep track of what we're doing and what we did, and I'll elect one of your classmates to set this up. Similarly, how are we going to communicate about class and our respective projects? Let's discuss options.
-
Open science, open data, & reproducibility
What is open science? What is open data? We'll use this time to talk, in very broad terms, about what reproducibility and open science are, how they fit together, and why they're important. For this topic, I'll ask the class to come up with a definition together, and then clarify or tweak. This is a good opportunity for me to gauge prior knowledge and attitudes. Here are some resources you can use to help form your position:
- What is open science? from The OpenScience Project (2009)
- What is open science? from F1000Research (2014)
- Challenges and Responses to Open Data
Every community inevitably produces its own terminology and jargon, and the open science and reproducible research community is no exception. Please review the Open Research Glossary. It is not only an excellent resource for definitions of terminology commonly used in open science, it's also an example of the community-driven products that are common in the open science community. Check out this article describing how this glossary came about.
-
Rules and regulations from funders and institutions
Here's a bit of the legal stuff: the rules and regulations surrounding reproducibility, sharing, and openness.
Data Management Plans: Data management plans are now a required part of most federal grant proposals. See the SPARC resource for Data Sharing Requirements by Federal Agency.
Enforcement: What sort of teeth do the rules and regulations around sharing and reproducibility have? See Today’s Data, Tomorrow’s Discoveries, NSF's public access plan.
Institutional Intellectual Property Policy: Become familiar with your own! It might be hard to find. For example, Michigan State University’s copyright policies are here.
Exercises
- Find Kent State's IP policy, and discuss
- Sign up for GitHub
Sometimes this information can be hard to find, and as we move on to other places, you're going to need to be able to find this information yourself. Search for:
“Intellectual property”
“IP rights”
“Copyright policy”
“Data sharing policy”
Look on Office of Research or institutional Technology Transfer websites or for an institution-wide policy directory. HINT: WHEN YOU GET LOST AT KENT STATE, LOOK TO THE LIBRARY.
Can you interpret the policy in terms of your own work? Here are some questions I'd like you to think about:
What are the rules or regulations around sharing your particular research products?
Does the IP policy support the funding mandates?
Who do you go to with questions about this policy?
Is the policy different for students vs paid employees?
If you don't already have a GitHub account, please sign up! This will be important as we're organizing our groups for our projects. We'll be learning a lot more about GitHub over the course of our projects, but we'll learn it in bits, so don't let the octocats intimidate you.
Discussion
Openness and reproducibility in research
So why bother with reproducibility? What's the big deal about openness? How do they fit together? In this, the first class discussion, let's think about your motivations for taking this class. We'll begin the discussion by watching the video together.
Video
Rethinking Research Data | Kristin Briney | TEDxUWMilwaukee (15:05)
Questions
Do you agree or disagree with Briney’s assertion that publication is advertising? What might make it “advertising” and not “science”?
What are your concerns about, or challenges to, the concept of open data? Why do you support open data or open science?