7  Background skills

This chapter describes the data science skills needed to complete the book’s activities (i.e., the prerequisites).

7.1 R/RStudio Installation

The R for Data Science book provides instructions for installing R and RStudio: https://r4ds.hadley.nz/intro#prerequisites.

Once RStudio is installed, change the following settings:

  1. Go to Tools -> Global Options... -> General. Uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”.
  2. In the Global Options window, go to “Code” and check “Use native pipe operator, |> (Requires R 4.1+)”.
  3. Click “Apply” and then “OK”.
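The native pipe setting requires R 4.1 or later. The base R sketch below checks your installed version from the RStudio console. The variable name `supports_native_pipe` is introduced here only for illustration.

```r
# Report the version of R that RStudio is using
print(R.version.string)

# The native pipe |> was added in R 4.1.0
supports_native_pipe <- getRversion() >= "4.1.0"

if (supports_native_pipe) {
  message("This R installation supports the native pipe |>")
} else {
  message("R ", getRversion(), " is older than 4.1.0; update R to use |>")
}
```

If the message says your R is too old, reinstall R using the R for Data Science instructions above, then restart RStudio.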

7.2 R skills

The book uses R as its primary programming language and follows the Tidyverse approach to working with and visualizing data. Commonly used functions include read_csv, write_csv, mutate, filter, group_by, summarize, ggplot, select, pipes (|> or %>%), pivot_wider, pivot_longer, arrange, and left_join. If you are new to R and the Tidyverse, there are many good learning materials online. The Data Carpentry lesson “Data Analysis and Visualization in R for Ecologists” is an excellent starting point. The R for Data Science book is an especially useful reference for Tidyverse commands. Finally, I have created an introductory Tidyverse module for my undergraduate Environmental Data Science class. You can use it as a “test” of your Tidyverse skills.
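As a quick self-check, the sketch below chains most of these functions together on a small made-up data set. If every line makes sense to you, you are likely ready for the book's activities. The site names, dates, temperatures, and habitat labels are hypothetical. The code assumes the readr, dplyr, tidyr, and ggplot2 packages (all part of the Tidyverse) are installed.

```r
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical example data: daily water temperature (degrees C) at two sites
obs <- tibble(
  site = c("A", "A", "B", "B"),
  date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02")),
  temperature = c(4, 6, 10, 12)
)

# write_csv() / read_csv(): round-trip the data through a temporary CSV file
path <- tempfile(fileext = ".csv")
write_csv(obs, path)
obs <- read_csv(path, show_col_types = FALSE)

# Hypothetical site metadata to attach with left_join()
sites <- tibble(site = c("A", "B"), habitat = c("lake", "river"))

site_means <- obs |>
  left_join(sites, by = "site") |>                    # add the habitat column
  filter(!is.na(temperature)) |>                      # drop missing observations
  mutate(temperature_f = temperature * 9 / 5 + 32) |> # convert to Fahrenheit
  group_by(site, habitat) |>
  summarize(mean_temp_c = mean(temperature),
            mean_temp_f = mean(temperature_f),
            .groups = "drop") |>
  arrange(desc(mean_temp_c))                          # warmest site first

# pivot_wider(): one column per site; pivot_longer() reverses the reshape
wide <- obs |>
  select(date, site, temperature) |>
  pivot_wider(names_from = site, values_from = temperature)
long <- wide |>
  pivot_longer(-date, names_to = "site", values_to = "temperature")

# ggplot(): build (but do not draw) a time-series plot; print(p) would draw it
p <- ggplot(obs, aes(x = date, y = temperature, color = site)) +
  geom_line()

print(site_means)
```

Each pipe step takes the previous result as its first argument, which is why the verbs can be read top to bottom as a recipe.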

7.3 Git Skills

You will need Git and GitHub to complete the assignments in this book. In particular, you will use them to generate and submit forecasts to the NEON Ecological Forecasting Challenge. Below are instructions for setting up Git and GitHub on your computer.

7.3.1 Setting up Git and GitHub

  1. Create a GitHub user account at https://github.com if you don’t already have one. Choose your user name carefully; here is advice about choosing a good one.

  2. Go to RStudio and install the usethis package.

install.packages("usethis")
  3. Run the following commands, replacing the user.name and user.email values with your GitHub user name and the email associated with your GitHub account. You can learn more about the command here.
library(usethis)
use_git_config(user.name = "Jane Doe", user.email = "jane@example.org")

If you get an error at this step, it is likely because Git is not installed on your computer. Follow the instructions here about installing Git.

  4. Set up your GitHub credentials on your computer. Follow the instructions here about using the usethis::create_github_token() and gitcreds::gitcreds_set() functions. Also, save your GitHub personal access token (PAT) in a password manager so you can find it later, for example if you need to interact with GitHub from a different computer.
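The credential step above can be run from the R console. In the sketch below, the two interactive calls are wrapped in `if (interactive())`, so nothing is opened when the code runs as a script. It then checks whether a token can be found. The name `has_pat` is illustrative. The check assumes the gitcreds package, which is normally installed as a dependency of usethis. Any error, such as Git missing or no stored credential, is treated as "no token".

```r
# Run the interactive steps only when working in the R console
if (interactive()) {
  # Opens GitHub in a browser with a form for creating a personal access token (PAT)
  usethis::create_github_token()
  # Prompts you to paste the PAT and stores it in the Git credential store
  gitcreds::gitcreds_set()
}

# Check whether a GitHub PAT can now be found; any error counts as "not found"
has_pat <- isTRUE(tryCatch(
  nzchar(gitcreds::gitcreds_get()$password),
  error = function(e) FALSE
))
message("GitHub PAT found: ", has_pat)
```

If `has_pat` is `FALSE` after you run `gitcreds_set()`, rerun it and paste the token again.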

If you are having issues (e.g., your computer does not seem to have Git installed), here is an excellent resource to help you debug Git and RStudio problems.
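When debugging, a useful first check is whether R can find Git at all. The base R sketch below uses `Sys.which()` to look for the `git` executable. If Git is found, the sketch prints the global user name and email that `use_git_config()` writes. The names `git_path` and `git_installed` are illustrative. In an interactive session, the sketch also calls `usethis::git_sitrep()`, which prints a fuller report of the Git and GitHub setup that usethis detects.

```r
# Sys.which() returns the full path to an executable, or "" if it is not on the PATH
git_path <- Sys.which("git")
git_installed <- nzchar(git_path)

if (git_installed) {
  message("Git found at: ", git_path)
  # Print the global user name and email set by usethis::use_git_config()
  system2("git", c("config", "--global", "user.name"))
  system2("git", c("config", "--global", "user.email"))
  # Fuller Git/GitHub situation report (interactive sessions only)
  if (interactive()) usethis::git_sitrep()
} else {
  message("Git not found; install Git, then restart RStudio")
}
```

If Git was installed while RStudio was open, restart RStudio before rerunning the check so it picks up the updated PATH.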

7.3.2 Working with GitHub: a Quarto example

This section provides instructions for working with Git and GitHub in RStudio to create, modify, and render a Quarto document.

  1. Go to https://github.com/frec3044/git-rmd-intro. Find the “Fork” button near the top right. Click “Fork”, then select your personal GitHub account.

  2. Go to the repo on your personal GitHub account. It will be something like https://github.com/[your-user-name]/git-rmd-intro.

  3. Click the green “Code” button, select the “Local” tab, and copy the URL.

  4. Open RStudio on your computer and create a new project: File -> New Project -> Version Control -> Git. Paste the URL of your repo in the first box, press Tab to fill in the repo name in the second box, and then use Browse to select where you want the project on your computer. I recommend keeping a single directory on your computer for all the repositories we use in the class. If you don’t see a Version Control option, you may not have Git installed on your computer (use the instructions here to install Git).

  5. Once your project loads, go to File -> New File -> Quarto Document...

  6. In the dialog, use Title = “Assignment 1” and Author = [Your name].

  7. Save file as “assignment1.qmd” in the assignment subdirectory of the Project.

  8. Commit your assignment1.qmd file using the Git tab in the top-right pane. Check the box next to each file you want to commit, then enter a useful commit message. A useful message helps you remember what you changed in the files included in the commit. The Git tab may not appear in the top-right pane if you have moved the panes around. If you don’t see a Git tab at all, you may not have created the project from version control correctly, or you may not have Git installed on your computer.

  9. Find the Source / Visual buttons right above the document. Select Source (the code view).

  10. Copy the code chunk on lines 21-24 and paste it at the end of the document. In the pasted chunk, change the option to echo: TRUE.

  11. Find the following line in the YAML header at the top of the document:

format: html

and change it to the following so that all the necessary files are embedded in a single HTML file:

format:
  html:
    embed-resources: true
  12. Click the Render button (above the document) to render the document to HTML. A file named “assignment1.html” will appear. The HTML file is a web-page version of your document, including your code and its output. If a directory called assignment1_files also appears, then you did not complete step 11 correctly.
  13. Click “assignment1.html” in the “Files” pane and select “View in Web Browser”. Confirm that it looks as expected.
  14. Commit the updated .qmd file and the new .html file to Git.
  15. Push your commits to your repository on GitHub using the Push button in the Git tab.
  16. Go to https://github.com/[your-user-name]/git-rmd-intro. You should see your two most recent commits.
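If you prefer code to the Git tab, the stage-and-commit steps above can be sketched with the gert package, which is normally installed alongside usethis. This is an alternative to the RStudio workflow, not a replacement for it. So that it runs offline, the sketch works in a throwaway repository under `tempdir()`. The repository name, file contents, author name, and email are all hypothetical. Pushing needs your real GitHub remote, so it appears only as a comment. So do `usethis::create_from_github()` (an alternative way to fork and clone) and `quarto::quarto_render()` (which requires the Quarto command-line tool).

```r
library(gert)

# Throwaway repository in a temporary directory (a hypothetical stand-in for
# the git-rmd-intro project you cloned)
repo <- file.path(tempdir(), "git-rmd-demo")
dir.create(file.path(repo, "assignment"), recursive = TRUE, showWarnings = FALSE)
git_init(repo)

# A minimal assignment1.qmd whose YAML header embeds all resources in the HTML
writeLines(c(
  "---",
  'title: "Assignment 1"',
  "format:",
  "  html:",
  "    embed-resources: true",
  "---",
  "",
  "Hello, Quarto."
), file.path(repo, "assignment", "assignment1.qmd"))

# Equivalent of checking the file's box in the Git tab (staging it) ...
git_add("assignment/assignment1.qmd", repo = repo)

# ... and clicking Commit with a useful message
me <- git_signature("Jane Doe", "jane@example.org")
git_commit("Add assignment1.qmd", author = me, committer = me, repo = repo)

# The commit history, newest first
history <- git_log(repo = repo)
print(history[, c("commit", "message")])

# In your real project (not run here):
# usethis::create_from_github("frec3044/git-rmd-intro", fork = TRUE)  # fork + clone
# quarto::quarto_render("assignment/assignment1.qmd")                # needs Quarto CLI
# git_push()                                                          # upload commits
```

The Git tab performs the same stage, commit, and push operations, so you can mix the two approaches within one project.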