24  Accessing weather drivers

24.1 “Historical weather”

Our goal is to generate genuine forecasts of the future. Therefore we need to use weather forecasts as driver inputs into our model.

This opens a critical issue when calibrating a model that is then used for forecasting. A model could be calibrated using observed weather from a weather station as inputs. As a result, the parameters are tuned to the weather from the station. However, the bias and statistics of the observed weather may be different from the forecasted weather since weather forecasts are for a larger grid (e.g., the weather forecast may be consistently warmer by 2 degrees C). This offset in the meteorology used to calibrate the forecast could result in the forecast being biased. The issue can be fixed by either adjusting the weather forecast to be more consistent with the observed weather or calibrating the model using weather inputs from the weather forecast model. We use the latter in this book.

The get_historical_met function gets the first day of all the weather forecasts generated since September 2020. The first day of each forecast is the closest to what was observed because it directly follows data assimilation. Combining the first days together gives a complete time series of weather until the present day.

The function is printed out here so that you can modify it if you want to add variables or process differently.

get_historical_met <- function(site, sim_dates, use_mean = TRUE){

  if(use_mean){
    groups <- c("datetime", "variable")
  }else{
    groups <- c("datetime", "variable","parameter")
  }
  site <- "TALL"
  met_s3 <- arrow::s3_bucket(paste0("bio230014-bucket01/neon4cast-drivers/noaa/gefs-v12/stage3/site_id=", site),
                             endpoint_override = "sdsc.osn.xsede.org",
                             anonymous = TRUE)

  inputs_all <- arrow::open_dataset(met_s3) |>
    filter(variable %in% c("air_temperature", "surface_downwelling_shortwave_flux_in_air")) |>
    mutate(datetime = as_date(datetime)) |>
    mutate(prediction = ifelse(variable == "surface_downwelling_shortwave_flux_in_air", prediction/0.486, prediction),
           variable = ifelse(variable == "surface_downwelling_shortwave_flux_in_air", "PAR", variable),
           prediction = ifelse(variable == "air_temperature", prediction - 273.15, prediction),
           variable = ifelse(variable == "air_temperature", "temp", variable)) |>
    summarise(prediction = mean(prediction, na.rm = TRUE), .by =  all_of(groups)) |>
    mutate(doy = yday(datetime)) |>
    filter(datetime %in% sim_dates) |>
    collect()

  if(use_mean){
    inputs_all <- inputs_all |>
      mutate(parameter = "mean")
  }

  return(inputs_all)
}

24.2 Forecasts

An individual forecast can be accessed using the get_forecast_met function. This provides the full horizon (35 days ahead) for a forecast that was generated on a specific day (reference_datetime).

The function is printed out here so that you can modify it if you want to add variables or process differently.

get_forecast_met <- function(site, sim_dates, use_mean = TRUE){

  if(use_mean){
    groups <- c("datetime", "variable")
  }else{
    groups <- c("datetime", "variable","parameter")
  }

  met_s3 <- arrow::s3_bucket(paste0("bio230014-bucket01/neon4cast-drivers/noaa/gefs-v12/stage2/reference_datetime=",sim_dates[1],"/site_id=",site),
                             endpoint_override = "sdsc.osn.xsede.org",
                             anonymous = TRUE)

  inputs_all <- arrow::open_dataset(met_s3) |>
    filter(variable %in% c("air_temperature", "surface_downwelling_shortwave_flux_in_air")) |>
    mutate(datetime = as_date(datetime)) |>
    mutate(prediction = ifelse(variable == "surface_downwelling_shortwave_flux_in_air", prediction/0.486, prediction),
           variable = ifelse(variable == "surface_downwelling_shortwave_flux_in_air", "PAR", variable),
           prediction = ifelse(variable == "air_temperature", prediction- 273.15, prediction),
           variable = ifelse(variable == "air_temperature", "temp", variable)) |>
    summarise(prediction = mean(prediction, na.rm = TRUE), .by = all_of(groups)) |>
    mutate(doy = yday(datetime)) |>
    filter(datetime %in% sim_dates) |>
    collect()

  if(use_mean){
    inputs_all <- inputs_all |>
      mutate(parameter = "mean")
  }

  return(inputs_all)
}