18  Data to constrain process model

library(tidyverse)
library(neonstore)

The goal of this chapter is to access and process data from the National Ecological Observatory Network (NEON) to calibrate parameters, estimate parameter uncertainty, assimilate data, and evaluate output from the forest carbon model in Chapter 16. The data prepared for use in the forest carbon model are used in subsequent chapters.

In this document, we will be calculating the carbon in tree wood, tree leaves, and soil for a NEON site (matching the model states in Chapter 16).

Much of the code in this chapter is specific to NEON and the particular carbon pools and fluxes analyzed. While the chapter focuses on a single NEON site, the code and concepts can be applied to developing carbon budgets at other forested NEON sites. Additional carbon stock collections are needed at grassland sites (e.g., non-woody vegetation samples need to be processed).

This chapter was developed in collaboration with John Smith at Montana State University.

NEON data is organized by data product ID in the NEON Data Portal: https://data.neonscience.org/static/browse.html

The chapter uses the neonstore package developed by Carl Boettiger to access NEON data. The neon_cloud function uses the NEON data product ID and the table within the product to download the data from NEON cloud storage. If you are new to a NEON data product, it is important to explore the data product on NEON’s Data Portal before using the neon_cloud functionality (otherwise, you won’t know which tables you need to download or how they link together).

18.1 NEON Project

18.2 NEON Terrestrial sites

18.3 Download data

First, we define the site ID. The four-letter site code denotes individual NEON sites. You can learn more about NEON sites here: https://www.neonscience.org/field-sites/explore-field-sites.

The elevation, latitude, and longitude are needed to convert the tree diameter measurements to biomass and are found on the NEON page describing the site.

site <- "OSBS"
elevation <- 46 
latitude <- 29.689282 
longitude <- -81.993431

18.4 Wood carbon

In this section, we will be calculating carbon in live and dead trees at a NEON site. The carbon in live trees represents the wood carbon stock in Figure 16.1, and the dead trees represent a component of the soil organic matter (SOM) stock in Figure 16.1. In the end, we will have a site-level mean carbon stock in live trees and dead trees for each year sampled from the plots representing the ecosystem under the flux tower (i.e., the tower plots). We use the tower plots so that they correspond to the same ecosystem as the NEON net ecosystem exchange (NEE) data.

We will select the key variables in each table (thus only downloading those variables).

The code below reads the data directly from NEON’s cloud storage.

## Mapping and tagging table

map_tag_table <- neon_cloud("mappingandtagging",
                            product = "DP1.10098.001",
                            site = site) |> 
  select(individualID, scientificName) |> 
  collect() |> 
  group_by(individualID) |> 
  slice(1) |> #This is needed because some individuals change species IDs
  ungroup()

## Individual table
ind_table <- neon_cloud("apparentindividual",
                        product = "DP1.10098.001",
                        site = site) |>
  select(individualID, eventID, plotID, date, stemDiameter,plantStatus, measurementHeight) |> 
  distinct() |> 
  collect()

## Plot table
plot_table <- neon_cloud("perplotperyear",
                        product = "DP1.10098.001",
                        site = site) |> 
  select(plotID,totalSampledAreaTrees,plotType) |>  
  distinct(plotID, .keep_all = TRUE) |> 
  collect() 

The equations that convert diameter at breast height (DBH), where breast height is defined as 130 cm above the base of the tree, to biomass differ by species and location. Therefore, the scientific name (both genus and species components) is needed. The species names in the mapping and tagging table need to be separated into genus and species so we can calculate biomass using an R function that expects them to be separate.

## str_split_fixed already returns a character matrix (one column per piece)
genus_species <- str_split_fixed(map_tag_table$scientificName, " ", 3)

map_tag_table <- map_tag_table |> 
  mutate(GENUS = genus_species[,1], 
         SPECIES = genus_species[,2]) 
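
As a small illustration of the name-splitting step above, the sketch below splits a few made-up scientific names into genus and species. It uses base R's strsplit rather than str_split_fixed so that it runs without extra packages; the example names and the object names (sci_names, parts) are hypothetical and are not taken from the NEON tables.

```r
# Toy names in the style of a scientificName column (made up for illustration)
sci_names <- c("Pinus palustris Mill.", "Quercus laevis Walter", "Pinus sp.")

# Split each name on spaces, then keep the first piece (genus) and second piece (species)
parts <- strsplit(sci_names, " ", fixed = TRUE)
genus <- vapply(parts, function(p) p[1], character(1))
species <- vapply(parts, function(p) p[2], character(1))

data.frame(scientificName = sci_names, GENUS = genus, SPECIES = species)
```

Note that a genus-only identification such as "Pinus sp." yields "sp." as its species, so it may be worth checking how many such records a site has before relying on species-level equations.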

Now we will join the tables by the key variables to build our dataset for the site.

combined_table <- left_join(ind_table, map_tag_table, by = "individualID") |> 
  arrange(plotID,individualID)

combined_table <- inner_join(combined_table, plot_table, by = "plotID") |> 
  arrange(individualID)

combined_table_dbh <- combined_table |> 
  filter(measurementHeight == 130,
         !is.na(stemDiameter))

18.4.1 Calculate carbon in live trees

Tidy up the individual tree data to include only live trees from the tower plots. Also, create a variable that is the year of the sample date. We will filter the data to only include records with diameter at breast height (dbh) measurements, using a measurement height of 130 cm.

combined_table_live_tower <- combined_table_dbh |>  
  filter(str_detect(plantStatus,"Live"),
         plotType == "tower",
         measurementHeight == 130) |> 
  mutate(stemDiameter = as.numeric(stemDiameter))

To calculate the biomass of each tree in the table, we will use the get_biomass function from the allodb package (Gonzalez-Akre et al., https://doi.org/10.1111/2041-210X.13756), which converts DBH measurements to biomass estimates. This function takes as arguments: dbh, genus, species, and coords. We have already extracted the genus and species and filtered the table to the DBH measurements. Note that allodb is not on CRAN, but it can be installed with remotes::install_github("ropensci/allodb").

In this next section, as well as in a future one where we calculate dead-tree carbon, we are going to make a simplifying assumption. We will assume that the below-ground biomass of a tree is some fixed proportion of its above-ground biomass. In our analysis, we will assume this value is 0.3 (ag_bg_prop), but it is a parameter that can be changed. We also assume that carbon is 0.5 of biomass.
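
To make the two assumptions concrete, here is a minimal worked example for a single tree. The above-ground biomass of 100 kg is a made-up value, and carbon_fraction is a hypothetical name for the 0.5 carbon-to-biomass ratio.

```r
# Hypothetical above-ground biomass for one tree (kg); not a NEON value
ag_tree_kg <- 100
ag_bg_prop <- 0.3        # below-ground biomass as a proportion of above-ground
carbon_fraction <- 0.5   # assumed carbon content of biomass

bg_tree_kg <- ag_tree_kg * ag_bg_prop                     # 30 kg below ground
tree_kgC <- (ag_tree_kg + bg_tree_kg) * carbon_fraction   # 65 kg C in total
tree_kgC
```

Because both factors are simple multipliers, changing either one rescales every tree's carbon by the same proportion.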

The get_biomass function is within the allodb package and returns the biomass of each tree in units of kg.

library(allodb)

ag_bg_prop <- 0.3

tree_live_carbon <- combined_table_live_tower |>
  mutate(ag_tree_kg = get_biomass(dbh = combined_table_live_tower$stemDiameter,
                                  genus = combined_table_live_tower$GENUS,
                                  species = combined_table_live_tower$SPECIES,
                                  coords = c(longitude, latitude)),
  bg_tree_kg = ag_tree_kg * ag_bg_prop, ## assumption about ag to bg biomass
  tree_kgC = (ag_tree_kg + bg_tree_kg) * 0.5) ## convert biomass to carbon

Calculate the plot-level carbon by dividing each tree's carbon by the sampled area of its plot and summing within each plot and measurement date.

measurement_dates <- tree_live_carbon |> 
  summarise(measure_date = max(date), .by = eventID)

 plot_live_carbon <-  tree_live_carbon |> 
   left_join(measurement_dates, by = "eventID") |> 
    mutate(treeC_kgCm2 = (tree_kgC)/(totalSampledAreaTrees)) |> 
    summarise(plot_kgCm2 = sum(treeC_kgCm2, na.rm = TRUE), .by = c("plotID", "measure_date"))
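
The per-area summation can be checked by hand on a toy table. The sketch below uses base R and made-up values; only the column names tree_kgC, totalSampledAreaTrees, plotID, and treeC_kgCm2 mirror the code above.

```r
# Toy per-tree carbon (kg C) and sampled plot area (m^2); values are made up
trees <- data.frame(
  plotID = c("P1", "P1", "P2"),
  tree_kgC = c(50, 30, 120),
  totalSampledAreaTrees = c(400, 400, 800)
)

# Carbon per unit area contributed by each tree, then summed within each plot
trees$treeC_kgCm2 <- trees$tree_kgC / trees$totalSampledAreaTrees
plot_carbon <- aggregate(treeC_kgCm2 ~ plotID, data = trees, FUN = sum)
plot_carbon
```

Plot P1 holds (50 + 30) / 400 = 0.2 kg C per m^2 and plot P2 holds 120 / 800 = 0.15 kg C per m^2.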

Figure 18.1 shows the plot-level carbon in living trees.

ggplot(plot_live_carbon, aes(x = measure_date, y = plot_kgCm2, color = plotID)) + 
  geom_point() +
  geom_line() +
  theme_bw()
Figure 18.1: Plot level carbon in living trees for the focal NEON site

Only a subset of plots (n = 5) is measured every year; all the other plots are measured every 5 years. We want only the annually measured plots, so the code below identifies them as the plots present on the most recent measurement date and keeps only the dates on which all of those plots were measured.

last_plots <- plot_live_carbon |> 
  filter(measure_date == max(measure_date)) |> 
  pull(plotID)

site_live_carbon <- plot_live_carbon |> 
  filter(plotID %in% last_plots) |> 
  pivot_wider(names_from = plotID, values_from = plot_kgCm2) |> 
  na.omit() |> 
  pivot_longer(-measure_date, names_to = "plotID", values_to = "plot_kgCm2") |> 
    group_by(measure_date) |>
    summarize(mean_kgCperm2 = mean(plot_kgCm2, na.rm = TRUE),
              sd_kgCperm2 = sd(plot_kgCm2))
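
The reshape to wide format, na.omit, and reshape back to long format is a way of dropping any date on which one of the annually measured plots is missing. The sketch below reproduces that idea in base R on made-up data, assuming one record per plot per date; the object names (plot_carbon, annual, complete_dates) are hypothetical.

```r
# Toy plot-level carbon; plot P3 was not measured in 2020
plot_carbon <- data.frame(
  plotID = c("P1", "P2", "P3", "P1", "P2", "P1", "P2", "P3"),
  measure_date = as.Date(c("2019-10-01", "2019-10-01", "2019-10-01",
                           "2020-10-01", "2020-10-01",
                           "2021-10-01", "2021-10-01", "2021-10-01")),
  plot_kgCm2 = c(1.0, 2.0, 3.0, 1.2, 2.2, 1.4, 2.4, 3.4)
)

# Plots present on the most recent date stand in for the annually measured set
is_last <- plot_carbon$measure_date == max(plot_carbon$measure_date)
last_plots <- plot_carbon$plotID[is_last]
annual <- plot_carbon[plot_carbon$plotID %in% last_plots, ]

# Keep only dates on which every one of those plots was measured
n_per_date <- table(annual$measure_date)
keep <- as.vector(n_per_date) == length(last_plots)
complete_dates <- as.Date(names(n_per_date)[keep])
complete <- annual[annual$measure_date %in% complete_dates, ]

# Site-level mean for the retained dates
site_mean <- aggregate(plot_kgCm2 ~ measure_date, data = complete, FUN = mean)
site_mean
```

In this toy case the 2020 date is dropped because P3 is missing, which keeps the site mean from shifting simply because a different set of plots was averaged.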

Figure 18.2 is the site-level carbon calculated from the mean of the plots measured each year.

ggplot(site_live_carbon, aes(x = measure_date, y = mean_kgCperm2)) + 
  geom_point() + 
  geom_errorbar(aes(ymin=mean_kgCperm2-sd_kgCperm2, ymax=mean_kgCperm2+sd_kgCperm2), width=.2,
                 position=position_dodge(0.05)) +
  theme_bw()
Figure 18.2: Site-level carbon in living trees for the focal NEON site

18.4.2 Calculate carbon in dead trees

We will now use the allodb package to extract the carbon in dead trees. This is exactly like the steps above, except for using the trees with a dead status.

combined_table_dead_tower <- combined_table_dbh |> 
  filter(!is.na(plantStatus)) |>
  dplyr::filter(str_detect(string = plantStatus, pattern = "Standing dead"), # string is the column, pattern is the text to find
         plotType == "tower") |> 
  mutate(stemDiameter = as.numeric(stemDiameter))

Calculate the biomass of each tree in the table. This assumes that standing dead trees have the same carbon content as live trees (which is incorrect).

tree_dead_carbon <- combined_table_dead_tower |>
    mutate(ag_tree_kg = get_biomass(
          dbh = combined_table_dead_tower$stemDiameter,
          genus = combined_table_dead_tower$GENUS,
          species = combined_table_dead_tower$SPECIES,
          coords = c(longitude, latitude)
          ),
         bg_tree_kg = ag_tree_kg * ag_bg_prop,
         tree_kgC = (ag_tree_kg + bg_tree_kg) * 0.5)

Calculate the plot-level carbon.

measurement_dates <- tree_dead_carbon |> 
  summarise(measure_date = max(date), .by = eventID)

 plot_dead_carbon <-  tree_dead_carbon |> 
   left_join(measurement_dates, by = "eventID") |> 
    mutate(treeC_kgCm2 = (tree_kgC)/(totalSampledAreaTrees)) |> 
    summarise(plot_kgCm2 = sum(treeC_kgCm2, na.rm = TRUE), .by = c("plotID", "measure_date"))

Figure 18.3 shows the plot-level carbon in dead trees.

ggplot(plot_dead_carbon, aes(x = measure_date, y = plot_kgCm2, color = plotID)) + 
  geom_point() +
  geom_line() +
  theme_bw()
Figure 18.3: Plot level carbon in dead trees for the focal NEON site

Calculate site-level carbon in dead trees from the plots measured each year.

 site_dead_carbon <- plot_dead_carbon |>
    filter(plotID %in% last_plots) |> 
    group_by(measure_date) |>
    summarize(mean_kgCperm2 = mean(plot_kgCm2, na.rm = TRUE),
              sd_kgCperm2 = sd(plot_kgCm2))

Figure 18.4 is the site-level carbon.

ggplot(site_dead_carbon, aes(x = measure_date, y = mean_kgCperm2)) + 
  geom_point() +
  geom_line() +
  theme_bw()
Figure 18.4: Site-level carbon in dead trees for the focal NEON site

18.5 Calculate carbon in trees on the ground (coarse woody debris)

While the code above calculates the carbon in standing dead trees, it misses the carbon in dead trees that are no longer standing (called coarse woody debris). The coarse woody debris is another component of SOM in our simple forest model.

The data needed to calculate carbon in trees lying on the ground are available in two NEON data products.

cdw_density <- neon_cloud("cdw_densitydisk",
                        product = "DP1.10014.001",
                        site = site) |> 
  collect()

log_table <- neon_cloud("cdw_densitylog",
                        product = "DP1.10014.001",
                        site = site,
                        unify_schemas = TRUE) |> 
  collect()
 
cdw_tally <- neon_cloud("cdw_fieldtally",
                        product = "DP1.10010.001",
                        site = site) |> 
  collect()

We will follow the same steps to calculate carbon in coarse woody debris.

## Filter by tower plot for log table
log_table_filter <- log_table |> 
  filter(plotType == "tower",
         plotID %in% last_plots)

## Filter by tower plot for cdw table
cdw_tally <- cdw_tally |>
  filter(plotType == 'tower',
         plotID %in% last_plots)

## Create an empty column to hold the density estimate (g/cm^3) for each log
log_table_filter$gcm3 <- rep(NA, nrow(log_table_filter))

## Set site-specific volume factor
site_volume_factor <- 8

for (i in 1:nrow(log_table_filter)){
  ## Match log table sampleID to cdw density table sample ID
  ind <- which(cdw_density$sampleID == log_table_filter$sampleID[i])
  ## Produce g/cm^3 by multiplying the bulk density of the disk by the site volume factor
  log_table_filter$gcm3[i] <- mean(cdw_density$bulkDensDisk[ind]) * site_volume_factor
}

year_measurement <- min(log_table_filter$yearBoutBegan)

## Table of coarse wood
site_cwd_carbon <- log_table_filter |>
  summarize(mean_kgCperm2 = .5 * sum(gcm3, na.rm = TRUE) * .1) |> 
  mutate(year = year_measurement)

18.6 Calculate carbon in fine roots

We lump fine root carbon into the wood stem stock in the simple forest model. Here, we will calculate the carbon stored in fine roots using the root chemistry data product. We will calculate the carbon in both live and dead roots. Although we are mostly interested in live roots, at the time of writing, the 2021 NEON data for our site do not have rootStatus data available. Thus, we will use historical data to estimate the ratio, so we don’t have to discard perfectly good information.

## root chemistry data product
bbc_percore <- neon_cloud("bbc_percore",
                          product = "DP1.10067.001",
                          site = site) |> 
  collect()

rootmass <- neon_cloud("bbc_rootmass",
                        product = "DP1.10067.001",
                        site = site) |> 
  collect()
rootmass$year <- year(rootmass$collectDate)

## set variables for liveDryMass, deadDryMass, unkDryMass, area
rootmass$liveDryMass <- rep(0, nrow(rootmass))
rootmass$deadDryMass <- rep(0, nrow(rootmass))
rootmass$unkDryMass <- rep(0, nrow(rootmass))
rootmass$area <- rep(NA, nrow(rootmass))

for (i in 1:nrow(rootmass)){
  ## match by sample ID
  ind <- which(bbc_percore$sampleID == rootmass$sampleID[i])
  ## extract core sample area
  rootmass$area[i] <- bbc_percore$rootSampleArea[ind]
  ## categorize mass as live, dead, or unknown
  if (is.na(rootmass$rootStatus[i])){
    rootmass$unkDryMass[i] <- rootmass$dryMass[i]
  } else if (rootmass$rootStatus[i] == 'live'){
    rootmass$liveDryMass[i] <- rootmass$dryMass[i]
  } else if (rootmass$rootStatus[i] == 'dead'){
    rootmass$deadDryMass[i] <- rootmass$dryMass[i]
  } else{
    rootmass$unkDryMass[i] <- rootmass$dryMass[i]
  }
}
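
The matching and categorizing done in the loop above can also be written without a loop. The following is a base R sketch on made-up stand-ins for the two tables, assuming each sampleID appears only once in the per-core table; percore and roots are hypothetical names.

```r
# Toy stand-ins for the per-core and root-mass tables (values are made up)
percore <- data.frame(sampleID = c("A", "B"), rootSampleArea = c(0.01, 0.02))
roots <- data.frame(
  sampleID = c("A", "A", "B", "B"),
  rootStatus = c("live", "dead", NA, "live"),
  dryMass = c(1.0, 0.5, 2.0, 0.8)
)

# Look up the core area for each root sample by sampleID
roots$area <- percore$rootSampleArea[match(roots$sampleID, percore$sampleID)]

# Assign each dry mass to live, dead, or unknown; a missing status counts as unknown
status <- ifelse(is.na(roots$rootStatus), "unknown", roots$rootStatus)
roots$liveDryMass <- ifelse(status == "live", roots$dryMass, 0)
roots$deadDryMass <- ifelse(status == "dead", roots$dryMass, 0)
roots$unkDryMass <- ifelse(!status %in% c("live", "dead"), roots$dryMass, 0)
roots
```

Each row's dry mass ends up in exactly one of the three columns, so the three columns always sum to dryMass.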

##
site_roots <- rootmass |>
  ## Filter plotID to only our plots of interest
  filter(plotID %in% last_plots) |>
  ## group by year
  group_by(year) |>
  ## sum live, dead, unknown root masses per unit area. multiply by .5 to
  ## convert biomass to carbon and divide by 1000 to convert g to kg (kgC/m^2)
  summarize(mean_kgCperm2_live = .5*sum(liveDryMass/area, na.rm = TRUE)/1000,
            mean_kgCperm2_dead = .5*sum(deadDryMass/area, na.rm = TRUE)/1000,
            mean_kgCperm2_unk = .5*sum(unkDryMass/area, na.rm = TRUE)/1000,
            year_total = sum(c(mean_kgCperm2_dead, mean_kgCperm2_live, mean_kgCperm2_unk)) / length(unique(plotID)),
            med_date = median(collectDate)) |> 
  rename(mean_kgCperm2 = year_total) |> 
  select(year, mean_kgCperm2)

18.7 Calculate carbon in soils

The video below provides an introduction to the science of soil carbon and methods for measuring it.

Soil carbon data are contained in two NEON data tables: one describing the soil’s physical characteristics (depth and density), and another describing the soil’s carbon concentration. In the code below, both tables are read from the same data product (DP1.00096.001). Ultimately, multiplying the mass of soil (from the horizon thickness and density) by the carbon concentration gives the total carbon.

#Download biogeochemistry soil data to get carbon concentration
#data_product1 <- "DP1.00097.001"
#Download physical soil data to get the bulk density

mgc_perbiogeosample <- neon_cloud("mgp_perbiogeosample",
                        product = "DP1.00096.001",
                        site = site) |> 
  collect()

mgp_perbulksample <- neon_cloud("mgp_perbulksample",
                        product = "DP1.00096.001",
                        site = site) |> 
  collect()

This code pulls out the relevant columns from the data that was read in above.

bulk_density <- mgp_perbulksample |> 
    filter(bulkDensSampleType == "Regular") |> 
    select(horizonName,bulkDensExclCoarseFrag) 

  #gramsPerCubicCentimeter
horizon_carbon <- mgc_perbiogeosample |> 
    filter(biogeoSampleType == "Regular") |> 
    select(horizonName,biogeoTopDepth,biogeoBottomDepth,carbonTot) 

year <- year(as_date(mgp_perbulksample$collectDate[1]))

The code below

  • Joins the bulk density table to the table with the carbon concentration.

  • Determines the thickness of each horizon (biogeoBottomDepth - biogeoTopDepth) and converts it to the mass of soil per unit area in the horizon using the bulk density.

  • Multiplies the carbon concentration (carbonTot) by the mass of soil (along with unit conversions) to get the soil carbon in kg C / m2.

  #Unit notes
  #bulkDensExclCoarseFrag = gramsPerCubicCentimeter
  #carbonTot = gramsPerKilogram
  
  #Combine and calculate the carbon of each horizon
horizon_combined <- inner_join(horizon_carbon,bulk_density, by = "horizonName") |>
    #Convert bulk density in g per cm3 to soil mass per area in g per cm2 by multiplying by layer thickness
    mutate(horizon_soil_g_per_cm2 = (biogeoBottomDepth - biogeoTopDepth) * bulkDensExclCoarseFrag) |> 
    #Carbon is in g per kg soil but soil mass is in g per cm2, so convert kg soil to g soil
    mutate(CTot_g_per_g_soil = carbonTot*(1/1000),  #Units are g C per g soil
           horizon_C_g_percm2 = CTot_g_per_g_soil*horizon_soil_g_per_cm2, #Units are g C per cm2
           horizon_C_kg_per_m2 = horizon_C_g_percm2 * 10000 / 1000) |> #Units are kg C per m2
    select(-CTot_g_per_g_soil,-horizon_C_g_percm2) |>
    arrange(biogeoTopDepth)
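
The chain of unit conversions is easier to follow with numbers. Below is a worked example for one hypothetical horizon; the thickness, bulk density, and carbon concentration are made-up values rather than megapit measurements, and depths are assumed to be in cm.

```r
# Hypothetical values for a single soil horizon (not NEON measurements)
top_cm <- 0
bottom_cm <- 10
bulk_density_g_cm3 <- 1.5   # g soil per cm^3
carbon_g_per_kg <- 20       # g C per kg soil

soil_g_per_cm2 <- (bottom_cm - top_cm) * bulk_density_g_cm3   # 15 g soil per cm^2
c_g_per_g <- carbon_g_per_kg / 1000                           # 0.02 g C per g soil
c_g_per_cm2 <- c_g_per_g * soil_g_per_cm2                     # 0.3 g C per cm^2
c_kg_per_m2 <- c_g_per_cm2 * 10000 / 1000                     # 3 kg C per m^2
c_kg_per_m2
```

The final step multiplies by 10000 (cm^2 per m^2) and divides by 1000 (g per kg), a net factor of 10.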

Soil carbon can be visualized by depth in Figure 18.5.

ggplot(horizon_combined, mapping = aes(-biogeoTopDepth,horizon_C_kg_per_m2)) +
  geom_line() +
  geom_point() +
  labs(y = "Carbon", x = "Depth", title = "Soil carbon by depth") +
  coord_flip()  +
  theme_bw()
Figure 18.5: Soil carbon by depth for the site from the megapit.

Total soil carbon is the sum across the depths.

#Note: the column is named soilC_gC_m2, but the summed values are in kg C per m2
site_soil_carbon <- horizon_combined |> 
    summarize(soilC_gC_m2 = sum(horizon_C_kg_per_m2))

18.8 Combine

Next, we will combine our estimates of carbon in each component for visualization and to aggregate into the wood and SOM stocks below.

site_live_carbon <- site_live_carbon |> 
  mutate(variable = "live_tree") |> 
  rename(datetime = measure_date) |> 
  select(datetime, variable, mean_kgCperm2)

site_dead_carbon <- site_dead_carbon |> 
  mutate(variable = "dead_trees") |> 
  rename(datetime = measure_date) |> 
  select(datetime, variable, mean_kgCperm2)

site_cwd_carbon <- site_cwd_carbon |> 
  mutate(variable = "down_wood") |> 
  mutate(datetime = as_date(paste(year, "01-01"))) |> 
  select(datetime, variable, mean_kgCperm2)

site_roots <- site_roots |> 
  mutate(variable = "fine_roots") |> 
  mutate(datetime = as_date(paste(year, "01-01"))) |> 
  select(datetime, variable, mean_kgCperm2)

site_soil_carbon <- site_soil_carbon |> 
  mutate(variable = "soil_carbon") |> 
  rename(mean_kgCperm2 = soilC_gC_m2) |> 
  mutate(datetime = as_date(paste(year, "01-01"))) |> 
  select(datetime, variable, mean_kgCperm2)

total_carbon_components <- bind_rows(site_live_carbon, site_dead_carbon, site_cwd_carbon, site_roots, site_soil_carbon)

The different pools of carbon can be plotted on the same figure to compare their magnitudes (Figure 18.6).

total_carbon_components |> 
  ggplot(aes(x = datetime, y = mean_kgCperm2, color = variable)) + 
  geom_point() +
  theme_bw()
Figure 18.6: Site-level carbon stocks at the focal NEON site

Combine carbon pools to match the stocks used in our simple process model.

  • wood = live trees (stem and coarse roots) + fine roots

  • som = dead trees + down wood + soil carbon

total_carbon_simple <- total_carbon_components |> 
  pivot_wider(names_from = variable, values_from = mean_kgCperm2) |> 
  mutate(wood = live_tree + mean(fine_roots, na.rm = TRUE),
         som = mean(dead_trees, na.rm = TRUE) + mean(down_wood, na.rm = TRUE) + mean(soil_carbon, na.rm = TRUE),
         som = ifelse(datetime != min(datetime), NA, som)) |> 
  select(datetime, wood, som) |> 
  pivot_longer(-datetime, names_to = "variable", values_to = "observation")
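
The aggregation logic can be illustrated with a few made-up numbers: wood varies by year because live-tree carbon is measured annually, while the less frequently sampled components enter as means. This is only a sketch with hypothetical values in kg C per m^2, not the site's data.

```r
# Toy site-level stocks in kg C per m^2 (values are made up)
live_tree <- c("2019" = 8.0, "2020" = 8.3, "2021" = 8.5)  # measured annually
fine_roots <- c(0.20, 0.30)      # two sampling years
dead_trees <- c(0.4, 0.5, 0.6)   # measured annually
down_wood <- 0.1                 # single survey
soil_carbon <- 5.0               # single soil sample

# wood: each year's live-tree carbon plus the mean fine-root carbon
wood <- live_tree + mean(fine_roots)

# som: sum of the component means, reported as a single value
som <- mean(dead_trees) + mean(down_wood) + mean(soil_carbon)

wood
som
```

In the chapter's code the single som value is attached to the earliest date only, which makes it suitable as an initial condition rather than a time series.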

18.9 MODIS LAI

Leaf area index (LAI) can serve as a proxy for leaf carbon. The forest model converts leaf carbon into LAI using a leaf mass-to-area ratio. As a result, we can use LAI from the MODIS satellite sensor to constrain and evaluate LAI predictions. The MODIS LAI product is an 8-day mean for a 500 m grid cell.

Download the leaf area index for the focal NEON site using the MODISTools package.

lai <- MODISTools::mt_subset(product = "MCD15A2H",
                  lat = latitude,
                  lon =  longitude,
                  band = c("Lai_500m", "FparLai_QC"),
                  start = as_date(min(total_carbon_simple$datetime)),
                  end = Sys.Date(),
                  site_name = site,
                  progress = FALSE)


lai_cleaned <- lai |> 
  mutate(scale = ifelse(band == "FparLai_QC", 1, scale),
         scale = as.numeric(scale),
         value = scale * value,
         datetime = lubridate::as_date(calendar_date)) |> 
  select(band, value, datetime) |> 
  pivot_wider(names_from = band, values_from = value) |> 
  filter(FparLai_QC == 0) |> 
  rename(observation = Lai_500m) |> 
  mutate(variable = "lai") |> 
  select(datetime, variable, observation)
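
The cleaning steps above (apply the scale factor, then keep only dates whose quality flag is 0) can be tried on a toy table. The sketch below is in base R with made-up values; the layout, with one row per band and date and a character scale column, is an assumption about the downloaded data rather than a description of the MODISTools output.

```r
# Toy long-format table: one row per band and date (values are made up)
lai_raw <- data.frame(
  band = c("Lai_500m", "FparLai_QC", "Lai_500m", "FparLai_QC"),
  calendar_date = c("2020-01-01", "2020-01-01", "2020-01-09", "2020-01-09"),
  value = c(25, 0, 31, 8),
  scale = c("0.1", "NA", "0.1", "NA")  # assumed: the QC band has no scale factor
)

# QC flags are not scaled, so give them a scale of 1 before converting to numeric
lai_raw$scale <- as.numeric(ifelse(lai_raw$band == "FparLai_QC", "1", lai_raw$scale))
lai_raw$value <- lai_raw$value * lai_raw$scale

# Put LAI and its QC flag side by side, matched by date
lai_vals <- lai_raw[lai_raw$band == "Lai_500m", c("calendar_date", "value")]
qc_vals <- lai_raw[lai_raw$band == "FparLai_QC", c("calendar_date", "value")]
wide <- merge(lai_vals, qc_vals, by = "calendar_date", suffixes = c("_lai", "_qc"))

# Keep only dates whose QC flag is 0
lai_ok <- wide[wide$value_qc == 0, ]
lai_ok
```

Only the first date survives the filter, with a scaled LAI of 2.5.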

Figure 18.7 is the LAI for the focal NEON site.

lai_cleaned |> 
  ggplot(aes(x = datetime, y = observation)) +
  geom_point() +
  geom_smooth(span = 0.12) +
  theme_bw()
Figure 18.7: MODIS LAI for the 500m grid cell that includes the flux tower

18.10 Flux data

Net ecosystem exchange (NEE) flux data are used to help constrain the net balance of photosynthesis and respiration in the simple forest model. The data are already processed for use in the NEON Ecological Forecasting Challenge, so here we simply read them in.

url <- "https://sdsc.osn.xsede.org/bio230014-bucket01/challenges/targets/project_id=neon4cast/duration=P1D/terrestrial_daily-targets.csv.gz"

flux <- read_csv(url, show_col_types = FALSE) |> 
  filter(site_id %in% site, 
         variable == "nee") |> 
  mutate(datetime = as_date(datetime)) |> 
  select(datetime, variable, observation)

Figure 18.8 is the daily mean NEE for the focal NEON site.

Figure 18.8: Daily mean NEE from the flux tower at the focal NEON site

18.11 Combine to create data constraints

The units of the carbon stocks and NEE need to be converted to the units of the forest process model. The carbon stocks are converted from kgC/m2 to MgC/ha, and NEE is converted from gC/m2/day to MgC/ha/day.

obs <- total_carbon_simple |> 
  bind_rows(lai_cleaned, flux) |> 
  mutate(site_id = site) |> 
  #convert stocks from kgC/m2 to MgC/ha (x 10) and nee from gC/m2/day to MgC/ha/day (x 0.01)
  mutate(observation = ifelse(variable %in% c("wood", "som") , observation * 10, observation),
         observation = ifelse(variable %in% c("nee") , observation * 0.01, observation))
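
The factors 10 and 0.01 follow from 1 Mg = 1000 kg = 1,000,000 g and 1 ha = 10,000 m^2. The short check below derives them and applies them to made-up values; the variable names are hypothetical.

```r
# Conversion factors derived from 1 Mg = 1000 kg = 1e6 g and 1 ha = 10,000 m^2
kg_m2_to_Mg_ha <- 0.001 * 10000   # 10
g_m2_to_Mg_ha <- 1e-6 * 10000     # 0.01

wood_kgC_m2 <- 8.2      # hypothetical stock
nee_gC_m2_day <- -1.5   # hypothetical daily flux

wood_MgC_ha <- wood_kgC_m2 * kg_m2_to_Mg_ha
nee_MgC_ha_day <- nee_gC_m2_day * g_m2_to_Mg_ha
c(wood_MgC_ha = wood_MgC_ha, nee_MgC_ha_day = nee_MgC_ha_day)
```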

Figure 18.9 shows the combined data, with the variable names converted to the names used in the forest process model.

obs |> 
  ggplot(aes(x = datetime, y = observation)) + 
  geom_point() +
  facet_wrap(~variable, scales = "free_y") +
  theme_bw()
Figure 18.9: The data available to constrain the forest process model.

Save the observations to a CSV file.

write_csv(obs, "data/site_carbon_data.csv")

Now, we have a complete, up-to-date carbon budget file in a format compatible with our simple forest process model. This will allow us to calibrate parameters, assimilate data, and evaluate forecasts. We will use this file in the subsequent chapters.