background-image: url("ortho_title.png")
background-size: cover
class: center middle inverse

## Broad-scale Forest Biomass Mapping:
### Generating Predictions Using A Spatio-Temporal Patchwork of LiDAR Coverages

#### Lucas Johnson
#### Mike Mahoney
#### Eddie Bevilacqua
#### Colin Beier
#### 2021-12-07

???

Hi everyone - I'm Lucas Johnson - a PhD student at SUNY ESF in Syracuse, NY. I'm going to be sharing some methods and results for leveraging a patchwork of extant LiDAR data for high-resolution biomass mapping across a mixed-use landscape.

---

# Outline

### 1. A spatio-temporal patchwork of LiDAR coverages
### 2. Making the most of field inventory data
### 3. "Pooled" modeling and results
### 4. Landcover "updating"

<br />
<br />

.center[
Access these slides at https://lucas-johnson.github.io/agu-2021/slides.html
]

???

First, I'm going to share a bit about what this spatio-temporal patchwork looks like in New York State - my study area - and why it presents challenges.

Second, I'm going to describe one way to make the most of available field inventory data for model training and validation.

Third, I'm going to share some maps and results from our efforts.

And lastly, I'm going to share one method for temporally harmonizing the resulting map outputs using landcover data.

---

# Spatio-temporal Patchwork of LiDAR Coverages

.pull-left[
### Available Data
.large[
- NYS has near wall-to-wall LiDAR coverage
- 2002 - 2019
- Older coverages with sparse metadata and low pulse density
]
]

???

So, NYS has lots of publicly available airborne LiDAR data, and if you combine all of it you get an almost wall-to-wall coverage.

However, this data is of varying quality, some of it dating back to 2002. Some of the older coverages lack proper documentation, and older coverages used less powerful sensors, resulting in fewer pulses or returns per unit area than more recent sensors.

--

.pull-right[
### Selected Data
.large[
- Collected in one year/season
- Min density: 1 pulse m<sup>-2</sup>
- If there were multiple coverages for the same area, the newest coverage was selected
]
]

???

So we imposed some restrictions and selected the coverages that we deemed most suitable for modeling.

These coverages were collected in one year, so they had a single "year-of-acquisition", which helps create temporally coherent results and also helps to align with field inventory data (which you will hear more about soon).

Additionally, we required a minimum pulse density of 1 pulse per square meter. This single variable serves as a proxy for data quality and age - the denser the point cloud, the more information-rich it is, and older coverages are much sparser. We actually tried including a relatively recent coverage (2012) with pulse density below 1 pulse per square meter and discovered that this brought too much error into our model.

---

# Patchwork continued...

.center[
<img src="regions_map.png" width="80%" />
]

???

So, we were left with 17 discrete coverages ranging from 2014 to 2019, as you can see in the map. We are still missing a decent portion of the state, but New York is in the process of filling most of these gaps.

And why does this present a challenge? Well, as you can see, the data range across 5 years and were collected with different sensors and flight parameters, which can make producing coherent results difficult - what date or dates do your final map products represent? Can different coverages be combined?
Additionally, you have these irregular boundaries representing different conditions at different times, and it can be challenging to gather enough inventory data that match the component lidar coverages both temporally and spatially.

These circumstances are not uncommon, with many states holding similar patchworks of LiDAR, often initially acquired for the purposes of DEM generation and flood mapping. However, many are discouraged by the previously mentioned complexities from leveraging these rich data sources for broad-scale maps of forest structure. I will present a set of methods which seek to make this process simpler and more efficient.

---

# Making the Most of the Field Data

.large[
Matching inventory year to LiDAR acquisition: 345 plots
]

.center[
<img src="tier_1_map.png" width="70%" />
]

???

So that brings us to this problem of temporally aligning field inventory to the patchwork of LiDAR data. The larger the difference between inventory and LiDAR acquisition, the more uncertainty and error are brought into the model, as we know these forests are not static, but changing.

However, if we require field inventories to match the year of the LiDAR acquisitions, we are quite limited in the amount of data we have for model training and validation. In this particular context we are left with 345 plots for this entire area, with most coverages holding fewer than 30 plots.

---

# Field Inventory continued...

.large[
Including plots with "bracketing" inventory years: 1145 plots
]

.center[
<img src="tier_12_map.png" width="70%" />
]

???

Something we did to boost the number of available plots was to leverage what we call "bracketing" inventories, where inventories for a given plot were conducted both before and after the lidar acquisition. You have a set of bounds, in other words.

We compared the plot-level biomass at t1 and t2 to identify the presence of disturbances. If plot-level biomass decreased by more than 5% from t1 to t2, we excluded the plot. However, if we determined that the plot-level biomass was stable or increasing, we "temporally adjusted" the plot-level biomass by linearly interpolating between plot biomass at t1 and t2 to produce a biomass estimate that temporally matched the LiDAR acquisition.

This more than tripled the number of available plots we had for model training and validation while enforcing zero temporal lag between the inventory and LiDAR.

---

# Model Development

.pull-left[
### Options
.large[
- Coverage-specific models (17)
- Eco-region-specific models (~4)
]
]

???

So, we have generated more reference data, but we are still left with a patchwork from which we need to build models. The trend that we have identified in the literature is to use a divide-and-conquer approach, whereby modeling is done on a coverage-by-coverage basis or the larger area is broken up into some ecologically meaningful regions.

However, ecoregions might not line up well with the irregular lidar coverage boundaries. And some coverages might not have a sufficient amount of reference data to develop a robust model.

If we were to follow these general approaches we would produce anywhere from 4 to 17 separate models. And that is a lot! That means we would have 17 sets of accuracy results, 17 sets of metadata, and so on - not to mention the within-region/coverage model iterations.

--

.pull-right[
### "Pooled" Model
.large[
- Efficiency
- Coverages with lower plot density
]
]

???
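
(A concrete aside on the "temporal adjustment" described on the field inventory slides: the 5% disturbance screen and the interpolation amount to something like the minimal Python sketch below. The function name, arguments, and example values are illustrative only, not the exact implementation used in this work.)

```python
def adjust_plot_agb(agb_t1, agb_t2, year_t1, year_t2, lidar_year, max_decline=0.05):
    """Linearly interpolate plot AGB to the LiDAR acquisition year.

    Returns None when the plot looks disturbed, i.e. AGB dropped by more
    than `max_decline` (5%) between the bracketing inventories.
    """
    if agb_t2 < agb_t1 * (1 - max_decline):
        return None  # likely disturbance: exclude the plot
    # Stable or growing plot: interpolate between the two inventories.
    fraction = (lidar_year - year_t1) / (year_t2 - year_t1)
    return agb_t1 + fraction * (agb_t2 - agb_t1)

# e.g. inventories in 2013 and 2018 bracketing a 2015 LiDAR acquisition
print(adjust_plot_agb(110.0, 125.0, 2013, 2018, 2015))  # -> 116.0
```

Plots that come back as excluded are dropped; the rest enter training and validation with an AGB value aligned to the LiDAR acquisition year.
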
In our case, some of our individual coverages lacked what we deemed a sufficient number of plots - not to mention, we found it more practical to pool the data together across all coverages to create one model.

---

# "Pooled" Models

.pull-left[
### Predictors
.large[
- LiDAR-derived height and density metrics
- Environmental predictors (topography, climate, tax parcels)
]
]

???

Our pooled models used standard LiDAR height and density metrics as predictors, as well as topography and climate data to account for any regional trends or differences across the area.

Additionally, we used NYS tax classification codes at the parcel level in the form of boolean indicator variables to add land-use information to the models. For example, we might expect a pixel or observation with code 931, representing NYS forest preserve, to be managed differently than a pixel with code 920, representing private fishing and hunting clubs.

--

.pull-right[
### Models
.large[
- Random Forest (RF)
- Stochastic Gradient Boosting Machine (GBM)
- Support Vector Machine (SVM)
- RMSE Weighted Ensemble (RF + GBM + SVM)
- Linear Model Ensemble (RF + GBM + SVM)
]
]

???

We developed three machine learning models - a random forest, a stochastic gradient boosting machine, and a support vector machine - and two ensemble models: an RMSE-weighted ensemble and a linear model ensemble.

The ensemble models serve to reduce the generalization error of the component models, and model averaging has been shown to increase predictive accuracy when the component model error is dominated by variance and the covariance between component models is small, which is often the case in ecological modeling.

---

# Model Results

.center[
<img src="rmse_x_scale.png" width="75%" />

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [Riemann et al.](https://www.nrs.fs.fed.us/pubs/jrnl/2010/nrs_2010_riemann_001.pdf)
]

???

We assessed our models following the approach developed by Riemann et al., whereby mapped pixel predictions are compared to FIA-based estimates at various scales by aggregating FIA estimates and mapped pixels within hexagonal grid cells of various sizes.
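
In essence, the per-hexagon comparison boils down to something like this minimal pandas sketch (toy values and hypothetical column names, not the actual assessment code):

```python
import pandas as pd

# One row per co-located FIA estimate / mapped prediction, tagged with the
# hexagonal grid cell it falls in. Values and column names are illustrative.
df = pd.DataFrame({
    "hex_id":   [1, 1, 1, 2, 2, 2],
    "fia_agb":  [120.0, 95.0, 140.0, 30.0, 55.0, 10.0],
    "pred_agb": [110.0, 100.0, 125.0, 45.0, 50.0, 25.0],
})

df["err"] = df["pred_agb"] - df["fia_agb"]
per_hex = df.groupby("hex_id").agg(
    mean_fia=("fia_agb", "mean"),
    mean_pred=("pred_agb", "mean"),
    mbe=("err", "mean"),                             # mean bias error
    rmse=("err", lambda e: (e ** 2).mean() ** 0.5),  # root mean square error
)
print(per_hex)
```

Repeating this aggregation at successively coarser hexagon sizes yields the scale-wise RMSE and MBE summarized in the figure above.
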
The smallest scale is the plot-to-pixel scale, and the largest scale consists of 216,500-hectare hexagonal grid cells, which correspond to 50 km between hexagon centroids. Perhaps unsurprisingly, all model predictions become more accurate at larger scales of aggregation. The pixel-based accuracy can be considered the most rigorous, with the largest variance in both the pixel and plot AGB distributions, as well as the most potential for spatial misalignment between 30 m pixel predictions and FIA plot measurements.

All model predictions were relatively unbiased across all scales - the MBE, or mean bias error, computed as the average model error, is denoted in the black text at the top of each bar. While the differences between models were relatively small, the linear model ensemble performed at or near the top across all scales in terms of RMSE.

So we selected the linear model ensemble for further map production and assessment, based on these results and on the ability of a linear ensemble to extrapolate from the training data to extreme values.

---
class: middle

.center[
<img src="agb_combo.png" width="86%" />

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [LCMAP](https://www.sciencedirect.com/science/article/pii/S003442571930375X?casa_token=YsxYliagDdMAAAAA:P66pGr9_Ez4o6Bj2HDGuNpLn_mEjg-xb2NqM9bUYMcCzsWGgSNbeE6zg1q0_bIC8yPJvOsHr7QM)
]

???

Here is the set of initial maps built with the linear model ensemble at 30 m resolution. The Land Change Monitoring, Assessment, and Projection (LCMAP) primary classification was used to mask water, developed, and barren areas in these maps.

On the right you have detailed views of two of the component lidar coverage maps to give a sense of the range of conditions. The map in the center reflects AGB in 2015 in portions of Warren, Washington, and Essex counties, which are situated largely in the Adirondack region of the state - if you're not familiar, this is a heavily forested area. On the far right we have a map reflecting conditions in 2018 for portions of Cayuga and Oswego counties. This region is much more sparsely forested, with patchy forest cover intermixed with agriculture.
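
As an aside before the spatial error assessment: for intuition on the two ensembling strategies compared earlier, here is a minimal Python sketch with toy numbers - weights inversely proportional to component RMSE versus a simple linear stacking regression. This illustrates the general idea only, not the fitting code used in this study.

```python
import numpy as np

# Held-out observations and component model predictions (toy numbers).
obs = np.array([100.0, 150.0, 80.0, 200.0])
pred = {
    "rf":  np.array([105.0, 140.0, 90.0, 180.0]),
    "gbm": np.array([95.0, 155.0, 85.0, 190.0]),
    "svm": np.array([110.0, 135.0, 75.0, 170.0]),
}

# 1. RMSE-weighted ensemble: each model weighted by 1 / RMSE, normalized.
rmse = {m: np.sqrt(np.mean((p - obs) ** 2)) for m, p in pred.items()}
weights = {m: 1.0 / r for m, r in rmse.items()}
total = sum(weights.values())
rmse_weighted = sum((weights[m] / total) * pred[m] for m in pred)

# 2. Linear-model ensemble: regress the observations on the component
#    predictions and combine them with the fitted coefficients.
X = np.column_stack([np.ones(len(obs))] + [pred[m] for m in pred])
beta, *_ = np.linalg.lstsq(X, obs, rcond=None)
linear_ensemble = X @ beta

print("RMSE-weighted ensemble:", rmse_weighted.round(1))
print("Linear model ensemble: ", linear_ensemble.round(1))
```

With only four toy points the linear stack reproduces the observations almost exactly; in practice such an ensemble would be fit on predictions for many plots.
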

---

.center[
<img src="hex_panel.png" width="86%" />
]

???

To assess the regional/spatial trend in our model accuracy, we grouped and summarized individual plot-to-pixel residuals within the largest-sized hexagonal grid cells, which are spaced 50 km apart. Really this is trying to answer the question of whether a pooled model is regionally unbiased - the idea being that we would expect regional hot or cold spots if the pooled model were improperly calibrated for individual lidar coverages.

Subfigure a maps the mean bias error, and while there are some hexes with bias magnitudes up to 40 Mg/ha, there isn't any strong regional trend that can be visually deciphered. The same more or less goes for RMSE in subfigure c.

We also map the max reference value per hexagon and the standard deviation of the reference data within each hex. We related these characterizations of the reference data to the hex-level MBE and RMSE, respectively, as shown in scatter plots e and f. The scatter plots suggest that RMSE is a function of the within-hex variation in the reference data, and that MBE is a function of the range of reference data values, with overpredictions in hexes with a lower (near-zero) max and underpredictions in hexes with a higher, more extreme max.

These patterns suggest that error isn't spatially motivated or driven by differences in LiDAR coverages, but rather is a function of the biomass conditions. Additionally, the patterns described here, especially the MBE patterns, are corroborated by our modeling results, where our predictions are biased towards the mean. The model is simply more likely to be correct when making near-mean predictions rather than predictions at the extremes of the distribution.

---
class: middle

# Landcover Updating

.pull-left[
.large[
- Harmonize the temporal patchwork of AGB maps
- <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [LCMAP](https://www.sciencedirect.com/science/article/pii/S003442571930375X?casa_token=YsxYliagDdMAAAAA:P66pGr9_Ez4o6Bj2HDGuNpLn_mEjg-xb2NqM9bUYMcCzsWGgSNbeE6zg1q0_bIC8yPJvOsHr7QM) at annual resolution
- Tree cover loss & conversion to developed
]
]

.pull-right[
<img
src="regions_map.png" width="100%" />
]

???

So after mapping, we are still left with this temporal piecemeal: portions of the map are not representative of the same year. We leveraged the annual primary classifications from LCMAP to "update" each individual component coverage to the common year of 2019.

Stacks of annual land cover classifications were constructed for each coverage. These coverage-specific stacks had depths reflecting the time gap between lidar acquisition and target year, and were condensed to a binary raster such that any changes from tree cover to another class, OR from any class to developed, water, or barren, were set to 0. Of course, this only really accounts for AGB losses attributed to discrete landcover changes, so we are completely missing more subtle within-forest disturbances and any forest biomass accretion.

We assessed the "updated" maps as well as the initial patchwork maps using a set of FIA plots inventoried in 2019 and the Riemann method described above. Results were more or less unchanged between the two maps, suggesting that this approach doesn't hold much weight. However, the kinds and frequency of disturbances accounted for in this approach are unlikely to be identified in FIA plots.

---

.center[
<img src="landscape_agb_loss_champlain_15_19.png" width="85%" />
]

???

Here we have a visual spot-check with NAIP imagery serving as reference information. This example shows that the annual LCMAP predictions account well for changes in forest conditions.

On the far left you have conditions in 2015, and on the right you have conditions in 2019. The blue plot in the middle represents biomass loss as computed through this process. So - this figure represents an initial biomass surface made for a 2015 lidar coverage, and then the same surface after it has been updated to 2019 with landcover data.

Following the patterns of the AGB loss map (center), we can attribute biomass losses in the lower half of the scene to conversion from tree cover to croplands. NAIP corroborates this change.

---

# Conclusions

.center[
<img src="regions_map.png" width="80%" />
]

???

The heterogeneity of available LiDAR data often discourages their use for broad-scale biomass mapping efforts, but we have shown that pooling is an option within certain bounds on data quality and temporal differences. Our results indicate that the pooled model approach should be considered for all future LiDAR-AGB mapping efforts when multiple disparate coverages are involved.

We showed that this approach can produce mapped estimates that are unbiased at the state level, while offering a level of operational efficiency to practitioners. However, when plot-level predictions are grouped within smaller units, our maps exhibited patches of moderate bias, likely due to our model's tendency to predict near the mean of the FIA AGB distribution.

The methods outlined here add to the body of options available to practitioners seeking to produce accurate LiDAR-based biomass maps, which can serve as important baselines for further carbon monitoring or carbon backcasting approaches.

---

## Thank you!

.large[
This work was supported by the [Climate and Applied Forest Research Institute](http://cafri-ny.org/) and [SUNY ESF](https://www.esf.edu/), with funding from the New York State Department of Environmental Conservation, Office of Climate Change. A special thanks to the USFS FIA program for their data sharing and cooperation.
]

#### Links:

<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> [@lucas-johnson](https://github.com/lucas-johnson/)
<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M100.28 448H7.4V148.9h92.88zM53.79 108.1C24.09 108.1 0 83.5 0 53.8a53.79 53.79 0 0 1 107.58 0c0 29.7-24.1 54.3-53.79 54.3zM447.9 448h-92.68V302.4c0-34.7-.7-79.2-48.29-79.2-48.29 0-55.69 37.7-55.69 76.7V448h-92.78V148.9h89.08v40.8h1.3c12.4-23.5 42.69-48.3 87.88-48.3 94 0 111.28 61.9 111.28 142.3V448z"></path></svg> [@lucas-johnson](https://www.linkedin.com/in/lucas-johnson-2ab76293/)

Access these slides at https://lucas-johnson.github.io/agu-2021/slides.html

.pull-left[
<img src="cafri.jpeg" width="250" height="100" />
]

.pull-right[
<img src="esf.png" width="100%" />
]

???

Thanks to everyone for listening! Here are a few links to where you can find me or these slides later. I welcome any questions and am open to any perspectives anybody might want to share.