Reproducible Quantitative Methods
Spring Semester 2018
Course Schedule
Part 1: Data
Weeks 1-4
How to handle your data to make your work more efficient and reproducible, and how to address common problems with data coming from other sources.
Jan 16-18 - Introduction to reproducibility and open science frameworks
Jan 23-25 - Best practices for spreadsheets / Learning to use data produced by others
Jan 30-Feb 1 - Introduction to metadata / Data and scientific authorship
Feb 6-8 - Metadata, infrastructure, and data organization / Cleaning up messy data / Identifying 'grey' data sources
Part 2: Analysis
Weeks 5-8
Applying reproducibility principles to common statistical and visualization approaches.
Feb 13-15 - Intro to scripting in R / Version control in R with GitHub
Feb 20-22 - Programming in R / Licensing data and software for reuse
Feb 27-Mar 1 - Programming in R, continued / Authorship and citation practices for non-manuscript research products
Mar 6-8 - Student-directed processing and analysis of project data / Data and code sharing challenges
Part 3: Communication
Weeks 9-12
Using technology to make our work accessible to others and to work better together.
Mar 13-15 - Making better plots / Visualization for outreach and communication
Mar 20-22 - Project workshop time, GitHub for project management / Scientific publication and accessibility
April 3-5 - Project workshop time
April 10-12 - Project workshop time / Scientific collaboration
Part 4: Opening Your Work
Weeks 13-14
Inviting the world to contribute to the scientific enterprise.
April 17-19 - Project workshop time / Science and technology in a connected world
April 24-26 - Preparing a paper for publication / The future of open science and reproducible research
May 1-3 - Wrap up!
About
The RQM Course
Here's a bit more about the course. We can start with a talk I gave about the first offering of the course, my motivations, and our results.

Almost every graduate student has a “Now what?” moment during their thesis, and this moment often occurs after a student has collected data and now has to analyze it. Additionally, new (or newly enforced) requirements from federal funders hold our scientific outputs, including data and code, to more rigorous reproducibility standards, but offer little guidance on how individual labs and research projects should change their workflows.
Because of poor quality in many data sources, data scientists estimate they spend up to 80% of their time ‘data munging’ - that is, cleaning, quality checking, and documenting the data they’re trying to use for their insights. The reality is, most data producers (a group which includes most experimental scientists) have no specific training in data handling. This leads to decision paralysis, inefficiency, and the potential for incredible losses of information at the interface between observations and analysis, and it takes the joy out of data-driven discovery. Training initiatives that address these issues are in high demand: workshops from Software Carpentry and Data Carpentry, organizations that train scientists in efficient software and data science skills, are usually at capacity and waitlisted within days of initial advertising.
This course directly builds on the principles laid out in Software Carpentry and Data Carpentry workshops, but provides students with a more immersive, long-term experience in the form of a project-based learning approach. Project-based learning hybridizes a traditional lecture with a student-led working group, which allows the course to be effectively customized to directly apply the principles to real data and real problems. We provide the added incentive of including the students on a publication resulting from their work, giving them concrete training in applying these skills in a way that is relevant to their field. The course takes a two-pronged approach: approximately two-thirds of class time is given to applied tools training using a project data set, and the remaining one-third of class time is used to discuss the more philosophical aspects of modern, technologically enabled science (e.g., how do we handle authorship on manuscripts supported by data compiled from a variety of sources? Is software a research product?).
How to use this website
You: a student of ecology or environmental sciences, interested in becoming better at data and computational applications in your research! This is a living document which we will be collectively modifying as we work through the materials outlined. It is adapted directly from an instructor guide on this subject, available here. If you find typos, broken links, or want to suggest changes, please submit an issue or pull request to this repo.
About this guide
This website was adapted (by Christie Bahlai!) from the instructor guide created by Christie Bahlai, a quantitative ecologist at Michigan State University, while supported by a fellowship from the Mozilla Science Lab.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Help
Getting Help
Here we'll hopefully answer questions you might have. We aren't old enough to have a FAQ :)
Resources
A friendly introduction to GitHub
What the git? This course relies very heavily on GitHub as a collaboration platform. It's got a learning curve that most closely resembles a cliff, so here's a resource that you can go back to again and again if you get stuck. This workshop is the friendliest of friendly introductions. Git it.
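In the meantime, here's a minimal sketch of the handful of git commands that cover most day-to-day coursework. The repository URL and file name below are placeholders, so substitute your own:

# Copy a remote repository to your computer (URL is a placeholder)
git clone https://github.com/your-username/your-project.git
cd your-project

# See which files you've changed since your last commit
git status

# Stage a changed file and record it with a short message
git add analysis.R
git commit -m "describe what you changed and why"

# Fetch your collaborators' latest commits, then share your own
git pull
git push

That round trip (pull, edit, add, commit, push) is the core loop; everything else builds on it.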