Reproducible Quantitative Methods
Lesson 6
Topics and Resources
-
Licensing and Creative Commons: Special Guest Timothy Vollmer
As we've been discussing in class, it's important to make your data and code available for others to reuse and remix. But how do we manage intellectual property rights in these situations? And, importantly, how do we signal to future scientists how our research products can appropriately be used? On Tuesday, we have a special guest speaker: Timothy Vollmer, Senior Manager for Public Policy at Creative Commons. Timothy helps educate policymakers at all levels, and across disciplines such as education, data, science, culture, and government, about copyright licensing, the public domain, and the adoption of open policies. Before joining CC, Timothy worked on information policy issues for the American Library Association in Washington, D.C. He is a graduate of the University of Michigan School of Information and helped establish the Open.Michigan initiative. Here's a link to his slides.
-
The basics of programming in R
We'll be going over an example of this on Thursday, but if you have little or no programming experience, please give this section a once-over before class.
These three concepts (conditionals, loops, and functions) are the fundamentals of almost any programming approach.
Conditional- Working Definition: A logical statement that lets a computer follow a different set of instructions depending on whether a condition is true or false (true/false values are known as boolean values).
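A minimal sketch of a conditional in R, using made-up temperature readings (the variable names and the 30-degree threshold are illustrative assumptions, not from any course dataset). `if` tests a single TRUE/FALSE value, while `ifelse()` applies the same test to every element of a vector:

```r
# Hypothetical single measurement, in degrees Celsius
temperature <- 31

# if/else: the condition must evaluate to a single TRUE or FALSE
if (temperature > 30) {
  status <- "hot"
} else {
  status <- "not hot"
}
print(status)  # "hot"

# ifelse(): vectorized version that tests each element separately
temps <- c(12, 25, 33, NA)
labels <- ifelse(temps > 30, "hot", "not hot")
print(labels)  # a missing value in the test gives NA in the result
```

Note that `if (NA > 30)` would stop with the error "missing value where TRUE/FALSE needed", which is why missing data deserve attention before you write conditionals.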
Example 1
Example 2
Example 3
Loop- Working Definition: A set of rules or steps that are repeated over and over until a certain condition is reached. This condition (another true/false, or boolean, value) is evaluated each time the loop runs to determine whether the loop should keep going. Loops let you avoid repeating the same lines of code, which makes your program shorter and easier to maintain.
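A minimal sketch of two kinds of loop in R, with made-up values. The `for` loop runs once per element of a vector; the `while` loop re-checks its TRUE/FALSE condition before every pass, as in the working definition above:

```r
# Hypothetical plot-level counts
counts <- c(4, 8, 15, 16, 23, 42)

# for loop: visit each element once and keep a running total
total <- 0
for (i in seq_along(counts)) {
  total <- total + counts[i]
}
print(total)  # 108

# while loop: the condition is re-evaluated before each pass
n_doublings <- 0
value <- 1
while (value < 100) {
  value <- value * 2
  n_doublings <- n_doublings + 1
}
print(n_doublings)  # 7, because 2^7 = 128 is the first power of two >= 100
```

In practice `sum(counts)` gives the same total in one call; many R functions are vectorized, so explicit loops are most useful when each step depends on the result of the previous one.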
Example
Function- Working Definition: A discrete, reusable chunk of code that can be called to perform a specific task.
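A minimal sketch of a user-defined function in R. The name `celsius_to_fahrenheit` is hypothetical; the point is that the calculation is written once and then reused on any numeric input:

```r
# Convert temperatures from degrees Celsius to degrees Fahrenheit.
# Works on a single number or on a whole numeric vector.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

celsius_to_fahrenheit(100)            # 212
celsius_to_fahrenheit(c(-40, 0, 37))  # -40, 32, 98.6
```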
Example
If you don’t use many functions in your own work, consult Quick-R. Look for problems within your dataset that could use any of these approaches; they also work well in combination.
For example, if a calculation requires a conditional or a loop, write it inside a function, then experiment with applying that function to multiple data objects. Here is sample code from the first iteration of the course that uses a function to replace missing values with estimates:
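As a rough, hypothetical sketch of the same idea (not the course's original code), the function below loops over the columns of a data frame and uses a conditional to replace each missing value with that column's mean. Using the column mean as the "estimate" is an assumption made for illustration; a median or a model-based prediction may suit your data better. The function name and the example data are made up:

```r
# Hypothetical helper: replace NA values in each numeric column of a
# data frame with that column's mean (an assumed choice of estimate).
fill_missing_with_mean <- function(df) {
  for (col in names(df)) {                # loop over columns
    x <- df[[col]]
    if (is.numeric(x) && anyNA(x)) {      # conditional: only numeric columns with gaps
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      df[[col]] <- x
    }
  }
  df
}

# Made-up example data with a few missing values
plots <- data.frame(
  site    = c("A", "B", "C", "D"),
  height  = c(1.2, NA, 1.8, 1.6),
  biomass = c(10, 12, NA, 14)
)
filled <- fill_missing_with_mean(plots)
print(filled)

# Apply the same function to several data objects at once
datasets <- list(plots = plots, plots_copy = plots)
filled_all <- lapply(datasets, fill_missing_with_mean)
```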
Exercises
- Loops, conditionals, and functions
Brainstorm how your new knowledge will aid you in processing your data.
ProTip(s)
We've got lots of helpful hints for this section:
Plot early and plot often! When applying functions, loops, and conditionals to your data, check that each operation did what you expected. Simple x-y scatter plots in base R will often do the trick: look for impossible values and strange relationships between variables to make sure the data going into, and coming out of, your functions follow expected patterns.
Helpful commentary. Don't forget to comment your code while you work! This will help both you and future scientists understand what you did and why. Too many comments are better than too few!
Make it pretty. Autoformat your code in RStudio to make it easier to read and debug: use Code > Reformat Code (Cmd+Shift+A on Mac, Ctrl+Shift+A on Windows).
Learn from your mistakes. Learning to use R's help pages and to look up error messages is more useful than learning the syntax or commands for any specific package.
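A minimal sketch of the "plot early and plot often" tip, using made-up data. The vector names `raw` and `cleaned` and the rule that -999 means "missing" are hypothetical. Base R's `summary()`, `range()`, and `plot()` are often enough to spot impossible values after an operation, and `stopifnot()` turns an expectation into an explicit error message:

```r
# Made-up measurements: -999 is used here as a missing-value code
raw <- c(5.1, 4.8, -999, 5.5, 6.0, 5.2)

# Hypothetical clean-up step: treat the -999 code as NA
cleaned <- raw
cleaned[cleaned == -999] <- NA

# Quick numeric checks: nothing should fall outside a plausible range
summary(cleaned)
range(cleaned, na.rm = TRUE)
stopifnot(all(cleaned > 0, na.rm = TRUE))  # errors if any value is impossible

# Quick visual check: unchanged values fall on the dashed 1:1 line;
# values set to NA are dropped from the plot
plot(raw, cleaned, xlab = "Before", ylab = "After")
abline(0, 1, lty = 2)
```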
Discussion
Licensing
How do data and content licenses differ? What do you need to keep in mind when assigning a license more restrictive than CC0 to your work? Talk about what happens if you are integrating multiple data sets and one is licensed NonCommercial while the other is licensed ShareAlike. Think back to when we discussed Kent's IP policies and revisit which licenses you may or may not be allowed to apply to your work.
Readings
About Creative Commons Licences and Licensing for data reuse
Questions
How do data and content licenses differ?
What is license stacking and what do you need to keep in mind when assigning a license to your work?
What are some things you’d want to consider when selecting a data repository?