Reproducible Quantitative Methods

Lesson 6

Programming in R / Licensing data and software for reuse


Topics and Resources

  1. Licensing and Creative Commons

    As we've been talking about in class, it's important to make your data and code available for others to reuse and remix. But how do we manage intellectual property rights in these situations? And importantly, how do we signal to future scientists how our research products can be appropriately used?

  2. The basics of programming in R

    We'll be going over an example of this on Thursday, but if you have little or no programming experience, please give this section a once-over before class.

    These three concepts (conditionals, loops, and functions) are the fundamentals of any programming approach.

    Conditional- Working Definition: A logical statement that allows a computer to follow a different set of instructions depending on whether a condition is true or false (true/false values are known as Boolean values).
    Example 1
    Example 2
    Example 3
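
    As a minimal sketch of the working definition above, the R snippet below follows a different branch depending on whether a condition is true or false. The variable names and the 0 and 30 degree thresholds are made up for illustration and do not come from the course materials.

```r
# Hypothetical measurement; the name and thresholds are illustrative only
water_temp_c <- 31.5

# Each condition evaluates to TRUE or FALSE (a Boolean value),
# and only the first branch whose condition is TRUE is run
if (water_temp_c < 0) {
  status <- "impossible value"
} else if (water_temp_c > 30) {
  status <- "warm"
} else {
  status <- "typical"
}
print(status)

# ifelse() applies the same kind of test to every element of a vector
temps <- c(12.1, 31.5, 18.0)
temp_labels <- ifelse(temps > 30, "warm", "typical")
print(temp_labels)
```

    The if/else form suits a single value, while ifelse() is usually the more convenient choice for a whole column of data.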

    Loop- Working Definition: A set of rules or steps that are repeated over and over until a certain condition is reached. This condition (another true/false, or boolean) is evaluated each time the loop runs to determine if the loop should keep going. Loops help you avoid repeating lines of code and add tremendous efficiency to your program.
    Example
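
    The two loops below are a small sketch of the definition above, using hypothetical counts. The for loop repeats once per element, and the while loop re-checks its Boolean condition before every pass and stops as soon as the condition is false.

```r
# Hypothetical counts from four sites
site_counts <- c(4, 9, 2, 7)

# for loop: repeat the same step once for each element
running_total <- 0
for (i in seq_along(site_counts)) {
  running_total <- running_total + site_counts[i]
}
print(running_total)

# while loop: the condition is evaluated before each pass
# and the loop ends the first time it is FALSE
n <- 1
while (n^2 < 50) {
  n <- n + 1
}
print(n)  # smallest whole number whose square is at least 50
```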

    Function- Working Definition: A discrete reusable chunk of code that can be called to perform a specific task.
    Example
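
    A short sketch of a function in R, assuming a temperature conversion as the "specific task". The function name is hypothetical; the point is that the same chunk of code can be called again and again with different inputs.

```r
# Convert temperatures from degrees Fahrenheit to degrees Celsius
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# Call the function on a single value
boiling_c <- fahrenheit_to_celsius(212)
print(boiling_c)

# The same function works on a whole vector without any changes
other_c <- fahrenheit_to_celsius(c(32, 50))
print(other_c)
```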

    If you don’t use many functions in your own work, consult Quick-R. Look for problems within your dataset that can be solved with any of these approaches; combining them works well too.

    Example: if you have to make a calculation that requires a conditional or iterating through a loop, write this within a function, and experiment with applying the function you wrote to multiple data objects. Here is sample code from the first iteration of the course that uses a function to replace missing values with estimates:
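
    The sketch below is not the original course code. It is a minimal illustration of the same idea, under the assumption that each missing value is estimated by the mean of the observed values in its column. The function name fill_missing and the small data frame are hypothetical. It combines all three concepts: a function that contains a loop, which in turn contains a conditional.

```r
# Replace NA values in a numeric vector with the mean of the observed values.
# Mean imputation is an assumption made for this sketch; other estimates
# (median, regression, interpolation) may suit your data better.
fill_missing <- function(x) {
  # Conditional: leave non-numeric columns untouched
  if (!is.numeric(x)) {
    return(x)
  }
  estimate <- mean(x, na.rm = TRUE)
  # Loop: visit each element and fill it in only if it is missing
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- estimate
    }
  }
  x
}

# Hypothetical data frame with gaps in two numeric columns
plots <- data.frame(
  height_cm = c(10, NA, 14),
  mass_g = c(2.0, 3.0, NA),
  site = c("a", "b", "c")
)

# Apply the same function to every column of the data object
plots_filled <- plots
plots_filled[] <- lapply(plots, fill_missing)
print(plots_filled)
```

    Because the logic lives in one function, the same call can be reused on other vectors or data frames, which is the experiment the paragraph above suggests.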

Exercises

  1. Loops, conditionals, and functions
  2. Brainstorm how your new knowledge will aid you in processing your data.


    ProTip(s)


    We've got lots of helpful hints for this section

    Plot early and plot often! When applying functions, loops, and conditionals to your data, check that each operation has done what you expected. Simple x-y scatter plots in base R will often do the trick: look for things like impossible values or strange relationships between variables to confirm that the data going into, and coming out of, your functions follow expected patterns.

    Helpful commentary. Don't forget to comment your code while you work! This will help both you and future scientists understand what you did and why. Too many comments are better than too few!

    Make it pretty. Autoformat your code in RStudio to make it easier to read and debug. Cmd + I on Mac or Ctrl + I on Windows reindents the selected lines, and Cmd + Shift + A or Ctrl + Shift + A reformats the selected code.

    Learn from your mistakes. Learning to use the R help and look up error messages is more useful than learning the syntax or commands for any specific package.
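
    To make the "plot early and plot often" tip above concrete, here is a minimal sketch that uses base R to look for impossible values. The measurements are made up, and the rule that a length cannot be negative is an assumption chosen for illustration.

```r
# Hypothetical measurements; the third length is an impossible value
length_mm <- c(12, 15, -3, 20, 18)
mass_g <- c(1.1, 1.6, 0.9, 2.4, 2.0)

# A quick numeric summary often reveals out-of-range values
print(summary(length_mm))

# Flag values that cannot be real (lengths below zero)
impossible <- length_mm < 0
print(which(impossible))

# Simple base R x-y scatter plot, with flagged points drawn in red
plot(length_mm, mass_g,
     col = ifelse(impossible, "red", "black"), pch = 19,
     xlab = "Length (mm)", ylab = "Mass (g)")
```

    Running the same check before and after a function is applied makes it easier to see whether the function introduced or removed the odd values.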

Discussion

Licensing

How do data and content licenses differ? What do you need to keep in mind when assigning a license more restrictive than CC0 to your work? Talk about what happens if you are integrating multiple data sets and one is licensed as non-commercial while the other is licensed only as share-alike. Think back to when we discussed Kent's IP policies and revisit what licenses you may or may not be allowed to apply to your work.

Readings

About Creative Commons Licences and Licensing for data reuse

Questions

How do data and content licenses differ?

What is license stacking and what do you need to keep in mind when assigning a license to your work?

What are some things you’d want to consider when selecting a data repository?
