10 Automated, iterative forecasting
In the third forecast generation assignment, you will be automating the submission of your forecast. By running your forecast code each day, you will be updating your model with the latest data, which is a simple form of data assimilation.
You will be generating forecasts of water temperature at NEON aquatics sites.
10.1 Key concepts
GitHub Actions is used to execute the forecast each day in the “cloud”. Here, the “cloud” is a set of small computers that run in a Microsoft data center (Microsoft owns GitHub). You create a YAML file in the .github/workflows directory of your GitHub repo that defines the time of day that your job runs, the environment (operating system and Docker container), and the steps in the job.
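As a minimal sketch of such a workflow file, assuming a hypothetical file name (.github/workflows/run_forecast.yaml), a hypothetical script name (forecast_code.R), and an arbitrary run time, the three parts described above (schedule, environment, steps) could look like this. The image name is taken from the docker run command later in this chapter; your course template may differ in its details.

```yaml
# Hypothetical workflow file: .github/workflows/run_forecast.yaml
name: run-forecast

on:
  schedule:
    # Time of day the job runs: 12:00 UTC every day (example time only)
    - cron: "0 12 * * *"
  # Also allow the job to be started by hand from the Actions tab
  workflow_dispatch:

jobs:
  forecast:
    # Environment: the operating system of the runner ...
    runs-on: ubuntu-latest
    # ... and the Docker container the steps execute in
    container:
      image: eco4cast/rocker-neon4cast
    steps:
      # Step 1: copy the repository contents into the container
      - name: Check out the repository
        uses: actions/checkout@v4
      # Step 2: run the forecast script (hypothetical file name)
      - name: Generate and submit the forecast
        run: Rscript forecast_code.R
```

Cron schedules in GitHub Actions are evaluated in UTC, so adjust the hour if you want the job to run at a particular local time. The workflow_dispatch trigger is optional but convenient for testing the job without waiting for the schedule.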
You will use a Docker container (eco4cast/rocker-neon4cast) to ensure a consistent computational environment each day. The Docker image is based on a rocker image that already has R, RStudio, and many common packages (e.g., tidyverse). It also contains packages that are specific to the NEON Forecasting Challenge.
Nothing persists in the container after it finishes in GitHub Actions, so each run must read its inputs from remote sources and send its outputs to a remote location. You will use neon4cast::stage2 and neon4cast::stage3 to read the meteorology data. You will use read_csv and a weblink to read the targets. You will export your forecasts using the neon4cast::submit function, which uploads your forecast to the NEON Ecological Forecasting Challenge S3 submission bucket.
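The read, forecast, and submit cycle described above can be sketched as a single function. This is an illustration under stated assumptions rather than the course template: run_daily_forecast, make_forecast, and submit_fn are hypothetical names, the targets weblink is supplied by the caller, and the meteorology step is left out because its arguments depend on the installed neon4cast version.

```r
library(readr)

# Hypothetical helper, not part of neon4cast: one complete daily run.
# Every input is re-read from its source because the container starts
# empty on each GitHub Actions execution.
run_daily_forecast <- function(targets_url, forecast_file, make_forecast,
                               submit_fn = neon4cast::submit) {
  # Read the targets straight from the weblink (or a local path when testing).
  targets <- read_csv(targets_url, show_col_types = FALSE)

  # make_forecast is your own modelling code: targets in, forecast data frame out.
  forecast <- make_forecast(targets)

  # Write the forecast to disk so that it can be uploaded.
  write_csv(forecast, forecast_file)

  # ask = FALSE skips the interactive prompt, which a scheduled job cannot answer.
  submit_fn(forecast_file = forecast_file, ask = FALSE)

  invisible(forecast_file)
}
```

Passing the submit function as an argument is a design choice for local dry runs: supplying a stand-in function lets you check that the forecast file is written correctly without uploading anything, while the default uses neon4cast::submit.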
10.2 Docker Skills
Docker is a tool that the activities in this book use to improve the reproducibility and automation of data analysis and forecasting workflows. Below are the instructions for setting up and interacting with a Docker container (instructions are from Freya Olsson’s workshop).
Go to https://docs.docker.com/get-docker/ to download the installer for your platform (available for PC, Mac, and Linux). Also see https://docs.docker.com/desktop/.
NOTE:
- If you’re running Windows, you will need WSL (Windows Subsystem for Linux).
- If you’re running a Linux distribution, you may have to enable virtualization on your computer.
10.3 Running a Docker container
- Launch Docker Desktop (either from the command line or by starting the GUI).
- At the command line, run the following command, which tells Docker to run the container named eco4cast/rocker-neon4cast, which has all the packages and libraries installed already. The -e PASSWORD=yourpassword option sets a simple password that you will use to open the container. The -ti option starts both a terminal and an interactive session. The -p 8787:8787 option makes port 8787 of the container available on the same port of your computer, and --rm removes the container when it exits.
docker run --rm -ti -e PASSWORD=yourpassword -p 8787:8787 eco4cast/rocker-neon4cast
This can take a few minutes to download and install. It will be quicker the next time you launch it.
- Open a web browser and navigate to http://localhost:8787/
- Enter the username rstudio and the password yourpassword
- You should see an RStudio interface with all the packages pre-installed and ready to go.
You can close this localhost window and come back to it later. However, if you stop the container from Docker (or turn off your computer, etc.), any changes will be lost unless you push them to GitHub or export them to your local environment.
10.4 Reading
Thomas, R. Q., Boettiger, C., Carey, C. C., Dietze, M. C., Johnson, L. R., Kenney, M. A., et al. (2023). The NEON Ecological Forecasting Challenge. Frontiers in Ecology and the Environment, 21(3), 112–113. https://doi.org/10.1002/fee.2616
10.5 Assignment
Pre-assignment set up: Complete Chapter 9
- Convert your template markdown code to a .R script
- Update your repository to include the GitHub Action files
- Commit updated files to GitHub
- Provide the instructor with a link to the GitHub repository that demonstrates multiple successful automated executions of forecasting code.
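For the first step, one possible approach, assuming your template is an R Markdown file, is knitr::purl(), which extracts the code chunks into a plain script. The wrapper name convert_template and any file names you pass to it are hypothetical; use the names in your own repository.

```r
library(knitr)

# Hypothetical wrapper around knitr::purl().
# documentation = 0 keeps only the code chunks and drops the prose,
# so the result is a plain .R script that Rscript can run.
convert_template <- function(rmd_file, script_file) {
  purl(rmd_file, output = script_file, documentation = 0, quiet = TRUE)
}
```

For example, convert_template("forecast_template.Rmd", "forecast_code.R") would write forecast_code.R next to the template. Review the generated script before committing it, since chunks meant only for interactive exploration (plots, View calls) are extracted as well.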
10.6 Module reference
Olsson, F., C. Boettiger, C.C. Carey, M.E. Lofton, and R.Q. Thomas. Can you predict the future? A tutorial for the National Ecological Observatory Network Ecological Forecasting Challenge. In review at Journal of Open Source Education.