background-image: url("ortho_title.png")
background-size: cover
class: center middle inverse

## Broad-scale Forest Biomass Mapping:
### Generating Predictions Using A Spatio-Temporal Patchwork of LiDAR Coverages

#### Lucas Johnson
#### Mike Mahoney
#### Eddie Bevilacqua
#### Colin Beier
#### 2021-12-07

???

Hi everyone - I'm Lucas Johnson - a PhD student at SUNY ESF in Syracuse, NY. I'm going to be sharing some methods and results for leveraging a patchwork of extant LiDAR data for high-resolution biomass mapping across a mixed-use landscape.

---

# Outline

### 1. A spatio-temporal patchwork of LiDAR coverages
### 2. Making the most of field inventory data
### 3. "Pooled" modeling and results
### 4. Landcover "updating"

<br />
<br />

.center[
Access these slides at https://lucas-johnson.github.io/agu-2021/slides.html
]

???

First, I'm going to share a bit about what this spatio-temporal patchwork looks like in New York State - my study area - and why it presents challenges.

Second, I'm going to describe one way to make the most of available field inventory data for model training and validation.

Third, I'm going to share some maps and results from our efforts.

And lastly, I'm going to share one method for temporally harmonizing the resulting map outputs using landcover data.

---

# Spatio-temporal Patchwork of LiDAR Coverages

.pull-left[
### Available Data
.large[
- NYS has near wall-to-wall LiDAR coverage
- 2002 - 2019
- Older coverages with sparse metadata and low pulse density
]
]

???

So, NYS has lots of publicly available airborne LiDAR data, and if you combine all of it you get an almost wall-to-wall coverage.

However, this data is of varying quality, some of it dating back to 2002. Some of the older coverages lack proper documentation, and older coverages used less powerful sensors, resulting in fewer pulses or returns per unit area than more recent sensors.

--

.pull-right[
### Selected Data
.large[
- Collected in one year/season
- Min density: 1 pulse m<sup>-2</sup>
- If there were multiple coverages for the same area, the newest coverage was selected
]
]

???

So we imposed some restrictions and selected the coverages that we deemed most suitable for modeling.

These coverages were collected in one year, so they had a single "year-of-acquisition", which helps create temporally coherent results and also helps to align with field inventory data (which you will hear more about soon).

Additionally, we required a minimum pulse density of 1 pulse per square meter. This single variable serves as a proxy for data quality and age - the denser the point cloud, the more information-rich it is, and older coverages are much sparser. We actually tried including a relatively recent coverage (2012) with pulse density below 1 pulse per square meter and discovered that this brought too much error into our model.

---

# Patchwork continued...

.center[
<img src="regions_map.png" width="80%" />
]

???

So, we were left with 17 discrete coverages ranging from 2014 to 2019, as you can see in the map. We are still missing a decent portion of the state, but New York is in the process of filling most of these gaps.

And why does this present a challenge? Well, as you can see, the data range across 5 years and were collected with different sensors and flight parameters, which can make producing coherent results difficult - what date or dates do your final map products represent? Can different coverages be combined?
Additionally, you have these irregular boundaries representing different conditions at different times, and it can be challenging to gather enough inventory data that match the component lidar coverages both temporally and spatially.

These circumstances are not uncommon, with many states holding similar patchworks of LiDAR, often initially acquired for the purposes of DEM generation and flood mapping. However, many are discouraged by the previously mentioned complexities from leveraging these rich data sources for broad-scale maps of forest structure. I will present a set of methods which seek to make this process simpler and more efficient.

---

# Making the Most of the Field Data

.large[
Matching inventory year to LiDAR acquisition: 345 plots
]

.center[
<img src="tier_1_map.png" width="70%" />
]

???

So that brings us to this problem of temporally aligning field inventory to the patchwork of LiDAR data. The larger the difference between inventory and LiDAR acquisition, the more uncertainty and error are brought into the model, as we know these forests are not static, but changing.

However, if we require field inventories to match the year of the LiDAR acquisitions, we are quite limited in the amount of data we have for model training and validation. In this particular context we are left with 345 plots for this entire area, with most coverages holding fewer than 30 plots.

---

# Field Inventory continued...

.large[
Including plots with "bracketing" inventory years: 1145 plots
]

.center[
<img src="tier_12_map.png" width="70%" />
]

???

Something we did to boost the number of available plots was to leverage what we call "bracketing" inventories, where inventories for a given plot were conducted both before and after the lidar acquisition. You have a set of bounds, in other words.

We compared the plot-level biomass at t1 and t2 to identify the presence of disturbances. If plot-level biomass decreased by more than 5% from t1 to t2, we excluded the plot. However, if we determined that the plot-level biomass was stable or increasing, we "temporally adjusted" the plot-level biomass by linearly interpolating between plot biomass at t1 and t2 to produce a biomass estimate that temporally matched the LiDAR acquisition.

This more than tripled the number of available plots we had for model training and validation while enforcing zero temporal lag between the inventory and LiDAR.

---

# Model Development

.pull-left[
### Options
.large[
- Coverage-specific models (17)
- Eco-region-specific models (~4)
]
]

???

So, we have generated more reference data, but we are still left with a patchwork from which we need to build models. The trend that we have identified in the literature is to use a divide-and-conquer approach, whereby modeling is done on a coverage-by-coverage basis or the larger area is broken up into some ecologically meaningful regions.

However, ecoregions might not line up well with the irregular lidar coverage boundaries. And some coverages might not have a sufficient amount of reference data to develop a robust model.

If we were to follow these general approaches we would produce anywhere from 4 to 17 separate models. And that is a lot! That means we would have 17 sets of accuracy results, 17 sets of metadata, and so on - not to mention the within-region/coverage model iterations.

--

.pull-right[
### "Pooled" Model
.large[
- Efficiency
- Coverages with lower plot density
]
]

???
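
(A concrete aside on the "temporal adjustment" described on the field inventory slides: the 5% disturbance screen and the interpolation amount to something like the minimal Python sketch below. The function name, arguments, and example values are illustrative only, not the exact implementation used in this work.)

```python
def adjust_plot_agb(agb_t1, agb_t2, year_t1, year_t2, lidar_year, max_decline=0.05):
    """Linearly interpolate plot AGB to the LiDAR acquisition year.

    Returns None when the plot looks disturbed, i.e. AGB dropped by more
    than `max_decline` (5%) between the bracketing inventories.
    """
    if agb_t2 < agb_t1 * (1 - max_decline):
        return None  # likely disturbance: exclude the plot
    # Stable or growing plot: interpolate between the two inventories.
    fraction = (lidar_year - year_t1) / (year_t2 - year_t1)
    return agb_t1 + fraction * (agb_t2 - agb_t1)

# e.g. inventories in 2013 and 2018 bracketing a 2015 LiDAR acquisition
print(adjust_plot_agb(110.0, 125.0, 2013, 2018, 2015))  # -> 116.0
```

Plots that come back as excluded are dropped; the rest enter training and validation with an AGB value aligned to the LiDAR acquisition year.
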
In our case, some of our individual coverages lacked what we deemed a sufficient number of plots - not to mention, we found it more practical to pool the data together across all coverages to create one model.

---

# "Pooled" Models

.pull-left[
### Predictors
.large[
- LiDAR-derived height and density metrics
- Environmental predictors (topography, climate, tax parcels)
]
]

???

Our pooled models used standard LiDAR height and density metrics as predictors, as well as topography and climate data to account for any regional trends or differences across the area.

Additionally, we used NYS tax classification codes at the parcel level in the form of boolean indicator variables to add land-use information to the models. For example, we might expect a pixel or observation with code 931, representing NYS forest preserve, to be managed differently than a pixel with code 920, representing private fishing and hunting clubs.

--

.pull-right[
### Models
.large[
- Random Forest (RF)
- Stochastic Gradient Boosting Machine (GBM)
- Support Vector Machine (SVM)
- RMSE Weighted Ensemble (RF + GBM + SVM)
- Linear Model Ensemble (RF + GBM + SVM)
]
]

???

We developed three machine learning models - a random forest, a stochastic gradient boosting machine, and a support vector machine - and two ensemble models: an RMSE-weighted ensemble and a linear model ensemble.

The ensemble models serve to reduce the generalization error of the component models, and model averaging has been shown to increase predictive accuracy when the component model error is dominated by variance and the covariance between component models is small, which is often the case in ecological modeling.

---

# Model Results

.center[
<img src="rmse_x_scale.png" width="75%" />

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [Riemann et al.](https://www.nrs.fs.fed.us/pubs/jrnl/2010/nrs_2010_riemann_001.pdf)
]

???

We assessed our models following the approach developed by Riemann et al., whereby mapped pixel predictions are compared to FIA-based estimates at various scales by aggregating FIA estimates and mapped pixels within hexagonal grid cells of various sizes.
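
In essence, the per-hexagon comparison boils down to something like this minimal pandas sketch (toy values and hypothetical column names, not the actual assessment code):

```python
import pandas as pd

# One row per co-located FIA estimate / mapped prediction, tagged with the
# hexagonal grid cell it falls in. Values and column names are illustrative.
df = pd.DataFrame({
    "hex_id":   [1, 1, 1, 2, 2, 2],
    "fia_agb":  [120.0, 95.0, 140.0, 30.0, 55.0, 10.0],
    "pred_agb": [110.0, 100.0, 125.0, 45.0, 50.0, 25.0],
})

df["err"] = df["pred_agb"] - df["fia_agb"]
per_hex = df.groupby("hex_id").agg(
    mean_fia=("fia_agb", "mean"),
    mean_pred=("pred_agb", "mean"),
    mbe=("err", "mean"),                             # mean bias error
    rmse=("err", lambda e: (e ** 2).mean() ** 0.5),  # root mean square error
)
print(per_hex)
```

Repeating this aggregation at successively coarser hexagon sizes yields the scale-wise RMSE and MBE summarized in the figure above.
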
The smallest scale is the plot-to-pixel scale, and the largest scale consists of 216,500-hectare hexagonal grid cells, which correspond to 50 km between hexagon centroids. Perhaps unsurprisingly, all model predictions become more accurate at larger scales of aggregation. The pixel-based accuracy can be considered the most rigorous, with the largest variance in both the pixel and plot AGB distributions, as well as the most potential for spatial misalignment between 30 m pixel predictions and FIA plot measurements.

All model predictions were relatively unbiased across all scales - the MBE, or mean bias error, computed as the average model error, is denoted in the black text at the top of each bar. While the differences between models were relatively small, the linear model ensemble performed at or near the top across all scales in terms of RMSE.

So we selected the linear model ensemble for further map production and assessment, based on these results and on the ability of a linear ensemble to extrapolate from the training data to extreme values.

---
class: middle

.center[
<img src="agb_combo.png" width="86%" />

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [LCMAP](https://www.sciencedirect.com/science/article/pii/S003442571930375X?casa_token=YsxYliagDdMAAAAA:P66pGr9_Ez4o6Bj2HDGuNpLn_mEjg-xb2NqM9bUYMcCzsWGgSNbeE6zg1q0_bIC8yPJvOsHr7QM)
]

???

Here is the set of initial maps built with the linear model ensemble at 30 m resolution. The Land Change Monitoring, Assessment, and Projection (LCMAP) primary classification was used to mask water, developed, and barren areas in these maps.

On the right you have detailed views of two of the component lidar coverage maps to give a sense of the range of conditions. The map in the center reflects AGB in 2015 in portions of Warren, Washington, and Essex counties, which are situated largely in the Adirondack region of the state - if you're not familiar, this is a heavily forested area. On the far right we have a map reflecting conditions in 2018 for portions of Cayuga and Oswego counties. This region is much more sparsely forested, with patchy forest cover intermixed with agriculture.
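
As an aside before the spatial error assessment: for intuition on the two ensembling strategies compared earlier, here is a minimal Python sketch with toy numbers - weights inversely proportional to component RMSE versus a simple linear stacking regression. This illustrates the general idea only, not the fitting code used in this study.

```python
import numpy as np

# Held-out observations and component model predictions (toy numbers).
obs = np.array([100.0, 150.0, 80.0, 200.0])
pred = {
    "rf":  np.array([105.0, 140.0, 90.0, 180.0]),
    "gbm": np.array([95.0, 155.0, 85.0, 190.0]),
    "svm": np.array([110.0, 135.0, 75.0, 170.0]),
}

# 1. RMSE-weighted ensemble: each model weighted by 1 / RMSE, normalized.
rmse = {m: np.sqrt(np.mean((p - obs) ** 2)) for m, p in pred.items()}
weights = {m: 1.0 / r for m, r in rmse.items()}
total = sum(weights.values())
rmse_weighted = sum((weights[m] / total) * pred[m] for m in pred)

# 2. Linear-model ensemble: regress the observations on the component
#    predictions and combine them with the fitted coefficients.
X = np.column_stack([np.ones(len(obs))] + [pred[m] for m in pred])
beta, *_ = np.linalg.lstsq(X, obs, rcond=None)
linear_ensemble = X @ beta

print("RMSE-weighted ensemble:", rmse_weighted.round(1))
print("Linear model ensemble: ", linear_ensemble.round(1))
```

With only four toy points the linear stack reproduces the observations almost exactly; in practice such an ensemble would be fit on predictions for many plots.
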

---

.center[
<img src="hex_panel.png" width="86%" />
]

???

To assess the regional/spatial trend in our model accuracy, we grouped and summarized individual plot-to-pixel residuals within the largest-sized hexagonal grid cells, which are spaced 50 km apart. Really this is trying to answer the question of whether a pooled model is regionally unbiased - the idea being that we would expect regional hot or cold spots if the pooled model were improperly calibrated for individual lidar coverages.

Subfigure a maps the mean bias error, and while there are some hexes with bias magnitudes up to 40 Mg/ha, there isn't any strong regional trend that can be visually deciphered. The same more or less goes for RMSE in subfigure c.

We also map the max reference value per hexagon and the standard deviation of the reference data within each hex. We related these characterizations of the reference data to the hex-level MBE and RMSE, respectively, as shown in scatter plots e and f. The scatter plots suggest that RMSE is a function of the within-hex variation in the reference data, and that MBE is a function of the range of reference data values, with overpredictions in hexes with a lower (near-zero) max and underpredictions in hexes with a higher, more extreme max.

These patterns suggest that error isn't spatially motivated or driven by differences in LiDAR coverages, but rather is a function of the biomass conditions. Additionally, the patterns described here, especially the MBE patterns, are corroborated by our modeling results, where our predictions are biased towards the mean. The model is simply more likely to be correct when making near-mean predictions rather than predictions at the extremes of the distribution.

---
class: middle

# Landcover Updating

.pull-left[
.large[
- Harmonize the temporal patchwork of AGB maps
- <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [LCMAP](https://www.sciencedirect.com/science/article/pii/S003442571930375X?casa_token=YsxYliagDdMAAAAA:P66pGr9_Ez4o6Bj2HDGuNpLn_mEjg-xb2NqM9bUYMcCzsWGgSNbeE6zg1q0_bIC8yPJvOsHr7QM) at annual resolution
- Tree cover loss & conversion to developed
]
]

.pull-right[
<img
src="regions_map.png" width="100%" />
]

???

So after mapping, we are still left with this temporal piecemeal: portions of the map are not representative of the same year. We leveraged the annual primary classifications from LCMAP to "update" each individual component coverage to the common year of 2019.

Stacks of annual land cover classifications were constructed for each coverage. These coverage-specific stacks had depths reflecting the time gap between lidar acquisition and target year, and were condensed to a binary raster such that any changes from tree cover to another class, OR from any class to developed, water, or barren, were set to 0. Of course, this only really accounts for AGB losses attributed to discrete landcover changes, so we are completely missing more subtle within-forest disturbances and any forest biomass accretion.

We assessed the "updated" maps as well as the initial patchwork maps using a set of FIA plots inventoried in 2019 and the Riemann method described above. Results were more or less unchanged between the two maps, suggesting that this approach doesn't hold much weight. However, the kinds and frequency of disturbances accounted for in this approach are unlikely to be identified in FIA plots.

---

.center[
<img src="landscape_agb_loss_champlain_15_19.png" width="85%" />
]

???

Here we have a visual spot-check with NAIP imagery serving as reference information. This example shows that the annual LCMAP predictions account well for changes in forest conditions.

On the far left you have conditions in 2015, and on the right you have conditions in 2019. The blue plot in the middle represents biomass loss as computed through this process. So - this figure represents an initial biomass surface made for a 2015 lidar coverage, and then the same surface after it has been updated to 2019 with landcover data.

Following the patterns of the AGB loss map (center), we can attribute biomass losses in the lower half of the scene to conversion from tree cover to croplands. NAIP corroborates this change.

---

# Conclusions

.center[
<img src="regions_map.png" width="80%" />
]

???

The heterogeneity of available LiDAR data often discourages their use for broad-scale biomass mapping efforts, but we have shown that pooling is an option within certain bounds on data quality and temporal differences. Our results indicate that the pooled model approach should be considered for all future LiDAR-AGB mapping efforts when multiple disparate coverages are involved.

We showed that this approach can produce mapped estimates that are unbiased at the state level, while offering a level of operational efficiency to practitioners. However, when plot-level predictions are grouped within smaller units, our maps exhibited patches of moderate bias, likely due to our model's tendency to predict near the mean of the FIA AGB distribution.

The methods outlined here add to the body of options available to practitioners seeking to produce accurate LiDAR-based biomass maps, which can serve as important baselines for further carbon monitoring or carbon backcasting approaches.

---

## Thank you!

.large[
This work was supported by the [Climate and Applied Forest Research Institute](http://cafri-ny.org/) and [SUNY ESF](https://www.esf.edu/), with funding from the New York State Department of Environmental Conservation, Office of Climate Change. A special thanks to the USFS FIA program for their data sharing and cooperation.
]

#### Links:

<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> [@lucas-johnson](https://github.com/lucas-johnson/)
<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M100.28 448H7.4V148.9h92.88zM53.79 108.1C24.09 108.1 0 83.5 0 53.8a53.79 53.79 0 0 1 107.58 0c0 29.7-24.1 54.3-53.79 54.3zM447.9 448h-92.68V302.4c0-34.7-.7-79.2-48.29-79.2-48.29 0-55.69 37.7-55.69 76.7V448h-92.78V148.9h89.08v40.8h1.3c12.4-23.5 42.69-48.3 87.88-48.3 94 0 111.28 61.9 111.28 142.3V448z"></path></svg> [@lucas-johnson](https://www.linkedin.com/in/lucas-johnson-2ab76293/)

Access these slides at https://lucas-johnson.github.io/agu-2021/slides.html

.pull-left[
<img src="cafri.jpeg" width="250" height="100" />
]

.pull-right[
<img src="esf.png" width="100%" />
]

???

Thanks to everyone for listening! Here are a few links to where you can find me or these slides later. I welcome any questions and am open to any perspectives anybody might want to share.