First, the GLOBIOr package needs to be installed. Use the following code to install the current (local) version.
# install package
install.packages("GLOBIOr_0.2.5.100008.tar.gz", repos=NULL)
After the package is installed, we can load it into R.
# load packages
library(GLOBIOr)
## GLOBIOr version: 0.2.5.100015
Besides GLOBIOr we will also need two other packages, namely ggplot2 and terra, for plotting and spatial data handling respectively.
library(ggplot2)
library(terra)
## terra 1.8.9
Next, we can use the built-in example data. For this example we will use the MSA response to roads, the response to climate change and the response to land-use class. Note, however, that the other datasets are also available. The default R documentation can be consulted for more information regarding the contents of the example datasets (similar to requesting the help for R functions, for example by calling ?MSADataRoadLand). For this example we will only fit MSA responses for vertebrates, so after loading the data we subset it to include only vertebrates.
# load example data, included in package for now
data(MSADataRoadLand) # MSA response to roads
data(MSADataLandUseLand) # MSA response to land use
data(MSADataClimateLand) # MSA response to climate change
# subset data
MSADataRoadLand = subset(MSADataRoadLand, taxon=="vertebrates")
MSADataLandUseLand = subset(MSADataLandUseLand, taxon=="vertebrates")
MSADataClimateLand = subset(MSADataClimateLand, taxon=="vertebrates")
# explore one of the datasets
head(MSADataRoadLand)
## study dataset taxon RoadDist MSA MSA_t
## 1 Delgado et al. 2004 5 vertebrates 10 0.9285714 0.9264706
## 2 Meunier et al. 2000 14a vertebrates 50 1.0000000 0.9975490
## 3 Meunier et al. 2000 14b vertebrates 50 1.0000000 0.9975490
## 4 Xu et al. 2013 33 vertebrates 25 1.0000000 0.9975490
## 5 Xu et al. 2013 33 vertebrates 75 1.0000000 0.9975490
## 6 Whitaker et al. 1997 29 vertebrates 10 0.8227023 0.8211204
## n_species RoadDistLog10
## 1 8 1.000000
## 2 3 1.698970
## 3 3 1.698970
## 4 1 1.397940
## 5 1 1.875061
## 6 27 1.000000
We can now fit the MSA relationships using the fitMSAModels() function (see example below). This function fits the MSA relationship(s) for the given dataset and returns a list of fitted models. By default the fitMSAModels() function tries to fit six models:
- defaultBeta: a default beta model without any transformation to the response variable.
- svBeta: a beta model with a Smithson-Verkuilen transformation to the response variable (ref); see the sketch after this list.
- ordBeta: an ordered beta model that aims to model potential zeros and ones in the response (ref).
- zeroInfBeta: a zero-inflated beta model that aims to model potential zeros in the response (ref).
- oneInfBeta: a one-inflated beta model that aims to model potential ones in the response (ref).
- zeroOneInfBeta: a zero-one-inflated beta model that aims to model potential zeros and ones in the response (ref).
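For reference, the Smithson-Verkuilen transformation compresses a [0, 1] response slightly away from the boundaries as y' = (y * (N - 1) + 0.5) / N, where N is the number of observations. The short sketch below applies this formula by hand; with N = 204 it reproduces the MSA_t values shown in the head() output above, although whether fitMSAModels uses exactly this convention internally is an assumption here.
# sketch: Smithson-Verkuilen transformation applied by hand (assumed convention)
svTransform = function(y, N) (y * (N - 1) + 0.5) / N
svTransform(c(0.9285714, 1.0), N = 204) # compare with the MSA_t column above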
The tryModels input argument can be used to specify which models should be fitted (i.e. specific models can be excluded). By default the fitMSAModels() function uses the defined beta formula to fit all (sub)models. However, the zi argument can for example be used to specify the zero-inflation part of the zeroInfBeta model (when a different relationship is assumed for the continuous data compared to the binomial part that predicts the probability of a zero). For the use of custom formulas for the other models, see the documentation of the fitMSAModels() function (?fitMSAModels). The weights input argument can be used to specify the weights of the data points. The function also provides response plots of the fitted models (these can be turned off by setting plot = FALSE). An illustrative call using these arguments is sketched below.
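As an illustration of these arguments, the call below sketches how a subset of the candidate models could be requested and a separate zero-inflation formula supplied. The model names passed to tryModels, the zi formula and the assumption that zi accepts a one-sided formula are illustrative only and not part of the case study.
# illustrative sketch (not part of the case study): fit only selected models and
# use a different formula for the zero-inflation part of the zeroInfBeta model
modelsRoadsSubset = fitMSAModels(beta=MSA~RoadDistLog10+(1|study/dataset),
                                 zi=~RoadDistLog10,
                                 tryModels=c("svBeta", "zeroInfBeta"),
                                 data=MSADataRoadLand,
                                 weights=sqrt(n_species),
                                 plot=FALSE)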
Below we fit the MSA relationship for the road dataset. Here the MSA is modelled as a function of the log10 of the road distance, with dataset nested within study as a random effect. The weights are set to the square root of the number of species in the dataset. Because only one formula is specified (beta), all models will use this formulation.
# fit potential MSA models for the road dataset
modelsRoads = fitMSAModels(beta=MSA~RoadDistLog10+(1|study/dataset),
data=MSADataRoadLand,
weights=sqrt(n_species),
plot=TRUE)
## Zeros and ones detected in response.
##
## Fitting models...
## - defaultBeta model cannot be fitted, zeros and/or ones in response data.
## - fitting svBeta model.
## - fitting ordBeta model.
## - Fitting zeroInfBeta model. Ones detected, using sv transformation on ones.
## - Fitting oneInfBeta model. Zeros detected, using sv transformation on zeros.
## - Fitting zeroOneInfBeta model.
##
## Creating response plots for predictor: RoadDistLog10
As can be seen from the outputs and plot generated by the fitMSAModels() function, the relationship between the MSA and the log10 of the road distance is fitted using five approaches, depending on the nature of the data. In this case the dataset contains zeros, ones and values between zero and one, therefore the defaultBeta model cannot be fitted.
# fit potential MSA models for the land use dataset
modelsLU = fitMSAModels(MSA~land_use+(1|study/dataset),
data=MSADataLandUseLand,
weights=sqrt(n_species),
plot=TRUE)
## Ones detected in response, no zeros.
##
## Fitting models...
## - defaultBeta model cannot be fitted, zeros and/or ones in response data.
## - fitting svBeta model.
## - fitting ordBeta model.
## - zeroInfBeta not fitted: no zeros in the response data.
## - Fitting oneInfBeta model.
## - zeroOneInfBeta not fitted: needs both ones and zeros in the response data.
##
## Creating response plots for predictor: land_use
Similar to before, the outputs and plot generated by the fitMSAModels() function show which models have been fitted and which have not. In this case the relationship between the MSA and land use is fitted using three approaches, which again depend on the nature of the data. Here the dataset contains only ones and values between zero and one (no zeros), therefore the defaultBeta, zeroInfBeta and zeroOneInfBeta models cannot be fitted.
# fit potential MSA models for the climate change dataset
modelsClimate = fitMSAModels(MSA~GMTI+(1|study),
data=MSADataClimateLand,
dispformula = ~GMTI,
weights=sqrt(n_species),
plot=TRUE)
## No zeros and/or ones detected in response.
##
## Fitting models...
## - fitting defaultBeta model.
## - svBeta model not fitted: no zeros or ones in response data.
## - fitting ordBeta model.
## - zeroInfBeta not fitted: no zeros in the response data.
## - oneInfBeta not fitted: no ones in the response data.
## - zeroOneInfBeta not fitted: needs both ones and zeros in the response data.
##
## Creating response plots for predictor: GMTI
In the final example of this case study we fit models to investigate the relationship between the MSA and climate change. Here fitMSAModels() uses two approaches. In this case the dataset has no zeros and no ones (only values between zero and one), therefore only the defaultBeta and ordBeta models are used and all other models are omitted.
In order to select the best model from all fitted options (provided by the fitMSAModels() function), the individual models can be evaluated using the testMSAModels() function. This function aims to determine the most appropriate model based on DHARMa diagnostics (a dispersion test) and a measure of goodness of fit, the root mean square error (RMSE) determined via blocked cross-validation. The testMSAModels() function does the following:
- Using the DHARMa package (ref), the p-value returned by DHARMa::testDispersion() is used to evaluate the dispersion of the provided models. By default 500 simulated residuals are used; the nsim argument can be set to a positive integer to increase or decrease the number of simulations (a minimum of 250 is recommended).
- testMSAModels() calculates an RMSE using a blocked cross-validation approach. By default 5 cross-validation folds are used; the ncv argument can be set to a positive integer to increase or decrease the number of folds. In the blocked cross-validation, the data is grouped by the first level in the random effect structure provided in the corresponding model. In this example the random effect is (1|study/dataset), thus the study column will be used to group the data in the blocked cross-validation. The variable used for grouping the data can also be set manually via the groupcv input argument.
The function will return a data.frame that contains the diagnostics for each model. Finally, one model will be suggested as the best model based on the diagnostics (a model without dispersion issues and the lowest cross-validated RMSE). An illustrative call showing the nsim, ncv and groupcv arguments is sketched below, followed by the actual calls used in this case study.
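As an illustration of these arguments, the call below sketches how the number of residual simulations, the number of cross-validation folds and the grouping variable could be set explicitly. The specific values, and the assumption that groupcv accepts the column name as a string, are illustrative only.
# illustrative sketch (not part of the case study): explicitly setting the number of
# residual simulations, the number of CV folds and the CV grouping variable
ModelRoadsStatsCustom = testMSAModels(models=modelsRoads,
                                      nsim=1000,
                                      ncv=10,
                                      groupcv="study")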
# check for dispersion issues and cross-validated RMSE
ModelRoadsStats = testMSAModels(models=modelsRoads)
## Diagnostics
## Running DHARMa dispersion tests.
## Dispersion issues found for: svBeta, zeroInfBeta, oneInfBeta
## +-------------+----------------+------------------+
## | ModelName | DispersionPval | DispersionIssues |
## +-------------+----------------+------------------+
## | svBeta | 0.012 | TRUE |
## | zeroInfBeta | 0.032 | TRUE |
## | oneInfBeta | 0.044 | TRUE |
## +-------------+----------------+------------------+
##
## Model performance based on RMSE (full dataset).
## +----------------+------------------+-------+
## | ModelName | DispersionIssues | RMSE |
## +----------------+------------------+-------+
## | zeroOneInfBeta | FALSE | 0.142 |
## | oneInfBeta | TRUE | 0.146 |
## | svBeta | TRUE | 0.148 |
## | zeroInfBeta | TRUE | 0.154 |
## | ordBeta | FALSE | 0.166 |
## +----------------+------------------+-------+
##
## Calculating RMSE using blocked cross-validation.
## +----------------+------------------+---------+
## | ModelName | DispersionIssues | RMSE_CV |
## +----------------+------------------+---------+
## | oneInfBeta | TRUE | 0.314 |
## | zeroOneInfBeta | FALSE | 0.317 |
## | svBeta | TRUE | 0.319 |
## | zeroInfBeta | TRUE | 0.320 |
## | ordBeta | FALSE | 0.325 |
## +----------------+------------------+---------+
# check for dispersion issues and cross-validated RMSE
ModelLUStats = testMSAModels(models=modelsLU)
## Diagnostics
## Running DHARMa dispersion tests.
## No dispersion issues found.
##
## Model performance based on RMSE (full dataset).
## +------------+------------------+-------+
## | ModelName | DispersionIssues | RMSE |
## +------------+------------------+-------+
## | svBeta | FALSE | 0.106 |
## | ordBeta | FALSE | 0.112 |
## | oneInfBeta | FALSE | 0.116 |
## +------------+------------------+-------+
##
## Calculating RMSE using blocked cross-validation.
## +------------+------------------+---------+
## | ModelName | DispersionIssues | RMSE_CV |
## +------------+------------------+---------+
## | oneInfBeta | FALSE | 0.241 |
## | ordBeta | FALSE | 0.248 |
## | svBeta | FALSE | 0.253 |
## +------------+------------------+---------+
# check for dispersion issues and cross-validated RMSE
ModelClimateStats = testMSAModels(models=modelsClimate)
## Diagnostics
## Running DHARMa dispersion tests.
## Dispersion issues found for: defaultBeta, ordBeta
## +-------------+----------------+------------------+
## | ModelName | DispersionPval | DispersionIssues |
## +-------------+----------------+------------------+
## | defaultBeta | 0.048 | TRUE |
## | ordBeta | 0.012 | TRUE |
## +-------------+----------------+------------------+
## All models fitted have dispersion issues.
## Model with smallest dispersion issue: defaultBeta.
##
## Model performance based on RMSE (full dataset).
## +-------------+------------------+-------+
## | ModelName | DispersionIssues | RMSE |
## +-------------+------------------+-------+
## | defaultBeta | TRUE | 0.135 |
## | ordBeta | TRUE | 0.135 |
## +-------------+------------------+-------+
##
## Calculating RMSE using blocked cross-validation.
## +-------------+------------------+---------+
## | ModelName | DispersionIssues | RMSE_CV |
## +-------------+------------------+---------+
## | ordBeta | TRUE | 0.139 |
## | defaultBeta | TRUE | 0.144 |
## +-------------+------------------+---------+
From the climate model diagnostics we can see that all models exhibit dispersion issues. For this example we will continue with the defaultBeta model as its p-value (0.048) is very close to the significance threshold (0.05). Nevertheless, it might be worth investigating the model diagnostics in more detail (e.g. consider altering the random effect structure or the number of residual simulations), as sketched below. In this case the climate change dataset does not have any zeros or ones, so choosing the defaultBeta model makes sense from the perspective of the data.
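As an illustration of such further checks, the sketch below reruns the diagnostics with more simulated residuals and refits the climate models without the GMTI dispersion formula; these specific choices are illustrative only, not a recommendation.
# illustrative sketch (not part of the case study): rerun the diagnostics with more
# simulated residuals and try an alternative specification without the dispersion formula
ModelClimateStatsMoreSims = testMSAModels(models=modelsClimate, nsim=1000)
modelsClimateAlt = fitMSAModels(MSA~GMTI+(1|study),
                                data=MSADataClimateLand,
                                weights=sqrt(n_species),
                                plot=FALSE)
ModelClimateAltStats = testMSAModels(models=modelsClimateAlt)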
Now that we have reviewed the model diagnostics (the data.frame returned by testMSAModels()), we can extract the most appropriate models to continue our analysis. The selectMSAModel() function can be used for this purpose. There are three options to extract a model:
- By name: a string matching the ModelName column of the data.frame returned by the testMSAModels() function.
- By index: an integer identifying the model to select (the fitted models are also listed in the data.frame returned by the testMSAModels() function).
- By diagnostics: passing the data.frame returned by the previous testMSAModels() call directly to the selectMSAModel() function. This will use the SuggestedModel column to extract the model.
The code below shows an example of all three options.
#select model by name
selectedModelRoads = selectMSAModel(models=modelsRoads,
select="zeroOneInfBeta")
## Selecting model by name: zeroOneInfBeta.
#select model by index
selectedModelLU = selectMSAModel(models=modelsLU,
select=5)
## Selecting model by index, selected model: oneInfBeta.
#select model using the generated stats
selectedModelClimate = selectMSAModel(models=modelsClimate,
select=ModelClimateStats)
## Warning: no models passed diagnostics, returning model smallest dispersion issue.
## Returning defaultBeta model.
Next, the MSA response can be plotted by providing the returned model object to the plotMSAResponse() function. Two examples are provided below. For additional plotting options see the R help file: ?plotMSAResponse.
# plot final response of the selected model
plotMSAResponse(model = selectedModelRoads,
predCol = RoadDistLog10,
main="MSA response - Distance to Roads")
# plot final response of the selected model
plotMSAResponse(model = selectedModelLU,
predCol = land_use,
datacolor = "#333333",
fitcolor="#ff4010",
xlab="",
ylab="MSA",
main="MSA response - Land use")
Before being able to predict the MSA spatially, we require pressure maps. These can vary depending on the scenario of interest (e.g. current or future conditions). For this case study, multiple pressure maps were prepared for Europe at the default 300 m by 300 m resolution. This example includes maps for 2015 and for a future scenario (2050 SSP3). These example maps can be downloaded from Figshare using the GLOBIOr function downloadExampleMaps(), see the code below (total size: ~720 MB). The downloadExampleMaps() function requires a path (downloadDir) to store the maps (tif files). This path can be relative (inside the current working directory) or absolute. The function will not redownload the maps if they are found in the provided downloadDir. By default the downloadExampleMaps() function will load the maps into a single list object.
# download example pressure maps from Figshare
maps = downloadExampleMaps(downloadDir = "maps")
# see what is inside the loaded data
str(maps,1)
## List of 10
## $ tifpaths :'data.frame': 9 obs. of 6 variables:
## $ climatechange_2015 :S4 class 'SpatRaster' [package "terra"]
## $ climatechange_2050_SSP3:S4 class 'SpatRaster' [package "terra"]
## $ landuse_2015 :S4 class 'SpatRaster' [package "terra"]
## $ landuse_2050_SSP3 :S4 class 'SpatRaster' [package "terra"]
## $ landuse_conv_table :'data.frame': 8 obs. of 2 variables:
## $ roaddistkm_2015_log10 :S4 class 'SpatRaster' [package "terra"]
## $ roaddistkm_2050_log10 :S4 class 'SpatRaster' [package "terra"]
## $ country_borders :S4 class 'SpatRaster' [package "terra"]
## $ region_borders :S4 class 'SpatRaster' [package "terra"]
For making predictions using a categorical predictor (in our case for land use), we need to set the levels of the terra raster to match the categorical labels contained in the model dataset (in this example the MSADataLandUseLand data.frame that was used to fit the models).
# categorical values used in model training data
print(unique(MSADataLandUseLand$land_use))
## [1] "Cropland_H" "Cropland_L" "Pasture_H"
## [4] "Pasture_L" "Plantation" "SecondaryVegetation"
## [7] "Urban"
# conversion data.frame (provided in the example data)
print(maps$landuse_conv_table)
## value cover
## 1 1 Urban
## 2 2 Cropland_H
## 3 3 Pasture_H
## 4 4 Pasture_L
## 5 5 Plantation
## 6 6 SecondaryVegetation
## 7 7 Cropland_L
## 8 8 Natural
# set levels landuse 2015
maps$landuse_2015 = setRasterLevels(maps$landuse_2015,
NumericalValue=maps$landuse_conv_table$value,
CoverName=maps$landuse_conv_table$cover)
# set levels landuse 2050 SSP3
maps$landuse_2050_SSP3 = setRasterLevels(maps$landuse_2050_SSP3,
NumericalValue=maps$landuse_conv_table$value,
CoverName=maps$landuse_conv_table$cover)
# plot some of the example pressure maps (2015)
# using the default terra package plotting function (will be replaced later)
terra::plot(maps$landuse_2015,main="Land use 2015")
terra::plot(maps$climatechange_2015,main="Climate change 2015")
terra::plot(maps$roaddistkm_2015_log10,main="Road distance 2015 (log10 km)")
After we have selected the most appropriate model(s) and loaded the pressure maps into our R environment, we can predict the MSA spatially using the predictMSAPressure() function. This function will predict the MSA for a given terra raster layer. TODO: 1) [explanation about masking function] 2) [explanation params]
# predict distance to road MSA spatially
predictedRoadMSA = predictMSAPressure(model = selectedModelRoads,
variableName = "RoadDistLog10",
pressureMap = maps$roaddistkm_2015_log10,
predictionMask = list(maps$landuse_2015),
maskPredictionFor = list(c("Urban")))
## Masking prediction for Urban in mask 1
## Predicting MSA pressure for RoadDistLog10 using model zeroOneInfBeta
## 85.97 sec elapsed
# predict land use MSA spatially
predictedLandUseMSA = predictMSAPressure(model = selectedModelLU,
variableName = "land_use",
pressureMap = maps$landuse_2015)
## Warning: no prediction mask provided, predicting for all raster cells
## Predicting MSA pressure for land_use using model oneInfBeta
## Warning: map value Natural not included in model predictor, setting MSA to 1.0.
## 28.75 sec elapsed
# predict climate change MSA spatially
predictedClimateMSA = predictMSAPressure(model = selectedModelClimate,
variableName = "GMTI",
pressureMap = maps$climatechange_2015)
## Warning: no prediction mask provided, predicting for all raster cells
## Predicting MSA pressure for GMTI using model defaultBeta
## 39.61 sec elapsed
After the predictions are done, we can visualize the individual predictions using the default terra plotting functions. TODO: provide GLOBIOr plotting function (with the colors discussed)
# plot the individual MSA predictions
tmp = c(predictedRoadMSA,predictedLandUseMSA,predictedClimateMSA)
names(tmp) = c("MSA_roaddist","MSA_landuse","MSA_climate")
terra::plot(tmp)
Finally, we can combine the predictions into a single MSA map. TODO: 1) more explanation 2) contribution calculation
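Conceptually, the overall MSA in the GLOBIO approach is the product of the individual pressure MSA layers, and a pressure's contribution can then be expressed relative to the total MSA loss. The sketch below illustrates that idea with plain terra arithmetic; whether combinePredictedMSA implements exactly this scheme and this contribution definition is an assumption here.
# illustrative sketch (assumed scheme): overall MSA as the product of the pressure layers
combinedSketch = predictedRoadMSA * predictedLandUseMSA * predictedClimateMSA
# one possible contribution measure: a pressure's MSA loss relative to the total loss
# (cells with zero total loss will yield NaN)
lossRoad    = 1 - predictedRoadMSA
lossLandUse = 1 - predictedLandUseMSA
lossClimate = 1 - predictedClimateMSA
totalLoss   = lossRoad + lossLandUse + lossClimate
contributionRoad = lossRoad / totalLoss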
# combine predicted MSA maps
predictedMSAList = list(predictedRoadMSA,predictedLandUseMSA,predictedClimateMSA)
predictedMSA = combinePredictedMSA(predictedMSAList = predictedMSAList,
computeContribution = TRUE)
## Combining 3 MSA predictions
## Included maps: RoadDistLog10, land_use, GMTI
## 80.96 sec elapsed
# plot (new plotting function in development)
terra::plot(predictedMSA$MSA,type = 'continuous', range = c(0, 1), main="Combined MSA")
# plot contribution maps
terra::plot(predictedMSA$PressureContribution,type = 'continuous', range = c(0, 1))