Install the GLOBIOr package

First the GLOBIOr package needs to be installed, use the following code to install the current (local) version.

# install package
install.packages("GLOBIOr_0.2.5.100008.tar.gz", repos=NULL)

Load packages

After the package is installed, we can load it into R.

# load packages
library(GLOBIOr) 
## GLOBIOr version: 0.2.5.100015

Besides GLOBIOr we will also need two other packages, namely ggplot2 and terra, for plotting and spatial data handling respectively.

library(ggplot2)
library(terra)
## terra 1.8.9

Load the example MSA data

Next we can use the build-in example data, for this example we will use the MSA response to roads data, the response to climate change and land-use class. Note, however, that the other datasets are also available. The default R documentation can be consulted for more information regarding the contents of the example datasets (similar to requesting the help for R functions, for example by calling ?MSADataRoadLand). For this example we will only fit MSA responses for vertebrates, so after loading the data we can subset the data to only include vertebrates.

# load example data, included in package for now
data(MSADataRoadLand) # MSA response to roads
data(MSADataLandUseLand) #  MSA response to land use
data(MSADataClimateLand) # MSA response to climate change

# subset data
MSADataRoadLand = subset(MSADataRoadLand, taxon=="vertebrates")
MSADataLandUseLand = subset(MSADataLandUseLand, taxon=="vertebrates")
MSADataClimateLand = subset(MSADataClimateLand, taxon=="vertebrates")

# explore one of the datasets
head(MSADataRoadLand)
##                  study dataset       taxon RoadDist       MSA     MSA_t
## 1  Delgado et al. 2004       5 vertebrates       10 0.9285714 0.9264706
## 2  Meunier et al. 2000     14a vertebrates       50 1.0000000 0.9975490
## 3  Meunier et al. 2000     14b vertebrates       50 1.0000000 0.9975490
## 4       Xu et al. 2013      33 vertebrates       25 1.0000000 0.9975490
## 5       Xu et al. 2013      33 vertebrates       75 1.0000000 0.9975490
## 6 Whitaker et al. 1997      29 vertebrates       10 0.8227023 0.8211204
##   n_species RoadDistLog10
## 1         8      1.000000
## 2         3      1.698970
## 3         3      1.698970
## 4         1      1.397940
## 5         1      1.875061
## 6        27      1.000000

Fit MSA Relationship(s)

We can now fit the MSA relationships using the fitMSAModels() function (see example below). This function fits the MSA relationship(s) for the given dataset and returns a list of fitted models. By default the fitMSAModels() function tries to fits six models:

  1. defaultBeta: a default beta model without any transformation to the response variable.
  2. svBeta: a beta model with a Smithson-Verkuilen transformation to the response variable (ref).
  3. ordBeta: a ordered beta model that aiming to model potential zeros and ones in the response (ref).
  4. zeroInfBeta: a zero-inflated beta model that aims to model potential zeros in the response (ref).
  5. oneInfBeta: a one-inflated beta model that aims to model potential ones in the response (ref).
  6. zeroOneInfBeta: a zero-one inflated beta model that aims to model potential zeros and ones in the response (ref).

The tryModels input argument can be used to specify which models should be fitted (i.e. specific models can be excluded). By default the fitMSAModels() function uses the defined beta formula to fit all (sub)models. However, the zi argument can for example be used to specify the zero-inflation part model of the zeroInfBeta model (when a different relationship is assumed for the continuous data compared to the binomial that predicts the probability of a zero). For the use of custom formulas for the other models see the documentation of the fitMSAModels() function (?fitMSAModels). The weights input argument can be used to specify the weights of the data points. The function also provides response plots of the fitted models (can be turned off by setting plot = FALSE).

Below we fit the MSA relationship for the road dataset. Here the MSA is modelled as a function of the log10 of the road distance, dataset nested within study is used as a random effect. The weights are set to the square root of the number of species in the dataset. Because only one formula is specified (beta) all models will using this formulation.

# fit potential MSA models for the road dataset
modelsRoads = fitMSAModels(beta=MSA~RoadDistLog10+(1|study/dataset),
                           data=MSADataRoadLand,
                           weights=sqrt(n_species),
                           plot=TRUE)

## Zeros and ones detected in response.
## 
## Fitting models...
## - defaultBeta model cannot be fitted, zeros and/or ones in response data.
## - fitting svBeta model.
## - fitting ordBeta model.
## - Fitting zeroInfBeta model. Ones detected, using sv transformation on ones.
## - Fitting oneInfBeta model. Zeros detected, using sv transformation on zeros.
## - Fitting zeroOneInfBeta model.
## 
## Creating response plots for predictor: RoadDistLog10

As can be seen from the outputs and plot generated by the fitMSAModels() function, the relationship between the MSA and the log10 of the road distance is fitted using five approaches, which depend on the nature of the data. In this case the dataset has both zeros, ones as well as data between zero and one, therefore the defaultBeta cannot be fitted.

# fit potential MSA models for the land use dataset
modelsLU = fitMSAModels(MSA~land_use+(1|study/dataset),
                          data=MSADataLandUseLand,
                          weights=sqrt(n_species),
                          plot=TRUE)

## Ones detected in response, no zeros.
## 
## Fitting models...
## - defaultBeta model cannot be fitted, zeros and/or ones in response data.
## - fitting svBeta model.
## - fitting ordBeta model.
## - zeroInfBeta not fitted: no zeros in the response data.
## - Fitting oneInfBeta model.
## - zeroOneInfBeta not fitted: needs both ones and zeros in the response data.
## 
## Creating response plots for predictor: land_use

Similar to the output before we can be seen from the outputs and plot generated by the fitMSAModels() function which models have been fitted and which not. In this case the relationship between the MSA and the land use is fitted using three approaches, which again depend on the nature of the data. In this case the data set has only ones and values between zero and one (no zeros), therefore the defaultBeta, zeroInfBeta and zeroOneInfBeta cannot be fitted.

# fit potential MSA models for the climate change dataset
modelsClimate = fitMSAModels(MSA~GMTI+(1|study),
                               data=MSADataClimateLand,
                               dispformula = ~GMTI,
                               weights=sqrt(n_species),
                               plot=TRUE)

## No zeros and/or ones detected in response.
## 
## Fitting models...
## - fitting defaultBeta model.
## - svBeta model not fitted: no zeros or ones in response data.
## - fitting ordBeta model.
## - zeroInfBeta not fitted: no zeros in the response data.
## - oneInfBeta not fitted: no ones in the response data.
## - zeroOneInfBeta not fitted: needs both ones and zeros in the response data.
## 
## Creating response plots for predictor: GMTI

In the final example of this case study we fit models to investigate the relationship between the MSA and the climate change. Here fitMSAModels() uses two approaches. In this case the dataset has no zeros and no ones (only values between zero and one), therefore only the defaultBeta and ordBeta are used, all other models are omitted.

Select the most appropriate model

In order to select the best model from all fitted options (provided by the fitMSAModels() function), the individual models can be evaluated using the testMSAModel() function. This function aims to determine the most appropriate model based on the DHARMa diagnostics (dispersion test) and a measure for goodness of fit, the the root mean square error (RMSE) determined via blocked cross-validation. The testMSAModel() function does the following:

  1. First it checks for dispersion issues using the DHARMa package (ref), the p-value returned by the DHARMa::testDispersion() is used to evaluate the dispersion of provided models. By default 500 simulated residuals are used, the nsim argument can be set to a positive integer to increase or decrease the number of simulations (a minimum of 250 is recommended).
  2. After models have passed the diagnostics, testMSAModel() calculates a RMSE using a blocked cross-validation approach. By default 5 cross-validation folds used, the ncv argument can be set to a positive integer to increase or decrease the number of folds. In the blocked cross-validation, the data is grouped by the first level in the random effect structure provided in the corresponding model. In this example the random effect is (1|study/dataset), thus the study column will be used to group the data in the blocked cross-validation. The variable used for grouping the data in the blocked cross-validation approach can be set manually via the groupcv input argument

The function will return a data.frame that contains the diagnostics for each model. Finally, one model will be suggested as the best model based on the diagnostics (a model without dispersion issues and the lowest cross-validated RMSE).

# check for dispersion issues and cross-validated RSME
ModelRoadsStats = testMSAModels(models=modelsRoads)

## Diagnostics
## Running DHARMa dispersion tests.
## Dispersion issues found for: svBeta, zeroInfBeta, oneInfBeta
## +-------------+----------------+------------------+
## |   ModelName | DispersionPval | DispersionIssues |
## +-------------+----------------+------------------+
## |      svBeta |          0.012 |             TRUE |
## | zeroInfBeta |          0.032 |             TRUE |
## |  oneInfBeta |          0.044 |             TRUE |
## +-------------+----------------+------------------+
## 
## Model performance based on RMSE (full dataset).
## +----------------+------------------+-------+
## |      ModelName | DispersionIssues |  RMSE |
## +----------------+------------------+-------+
## | zeroOneInfBeta |            FALSE | 0.142 |
## |     oneInfBeta |             TRUE | 0.146 |
## |         svBeta |             TRUE | 0.148 |
## |    zeroInfBeta |             TRUE | 0.154 |
## |        ordBeta |            FALSE | 0.166 |
## +----------------+------------------+-------+
## 
## Calculating RMSE using blocked cross-validation.
## +----------------+------------------+---------+
## |      ModelName | DispersionIssues | RMSE_CV |
## +----------------+------------------+---------+
## |     oneInfBeta |             TRUE |   0.314 |
## | zeroOneInfBeta |            FALSE |   0.317 |
## |         svBeta |             TRUE |   0.319 |
## |    zeroInfBeta |             TRUE |   0.320 |
## |        ordBeta |            FALSE |   0.325 |
## +----------------+------------------+---------+
# check for dispersion issues and cross-validated RSME
ModelLUStats = testMSAModels(models=modelsLU)

## Diagnostics
## Running DHARMa dispersion tests.
## No dispersion issues found.
## 
## Model performance based on RMSE (full dataset).
## +------------+------------------+-------+
## |  ModelName | DispersionIssues |  RMSE |
## +------------+------------------+-------+
## |     svBeta |            FALSE | 0.106 |
## |    ordBeta |            FALSE | 0.112 |
## | oneInfBeta |            FALSE | 0.116 |
## +------------+------------------+-------+
## 
## Calculating RMSE using blocked cross-validation.
## +------------+------------------+---------+
## |  ModelName | DispersionIssues | RMSE_CV |
## +------------+------------------+---------+
## | oneInfBeta |            FALSE |   0.241 |
## |    ordBeta |            FALSE |   0.248 |
## |     svBeta |            FALSE |   0.253 |
## +------------+------------------+---------+
# check for dispersion issues and cross-validated RSME
ModelClimateStats = testMSAModels(models=modelsClimate)

## Diagnostics
## Running DHARMa dispersion tests.
## Dispersion issues found for: defaultBeta, ordBeta
## +-------------+----------------+------------------+
## |   ModelName | DispersionPval | DispersionIssues |
## +-------------+----------------+------------------+
## | defaultBeta |          0.048 |             TRUE |
## |     ordBeta |          0.012 |             TRUE |
## +-------------+----------------+------------------+
## All models fitted have dispersion issues.
## Model with smallest dispersion issue: defaultBeta.
## 
## Model performance based on RMSE (full dataset).
## +-------------+------------------+-------+
## |   ModelName | DispersionIssues |  RMSE |
## +-------------+------------------+-------+
## | defaultBeta |             TRUE | 0.135 |
## |     ordBeta |             TRUE | 0.135 |
## +-------------+------------------+-------+
## 
## Calculating RMSE using blocked cross-validation.
## +-------------+------------------+---------+
## |   ModelName | DispersionIssues | RMSE_CV |
## +-------------+------------------+---------+
## |     ordBeta |             TRUE |   0.139 |
## | defaultBeta |             TRUE |   0.144 |
## +-------------+------------------+---------+

From the climate model diagnostics we can see that all models exhibit dispersal issues. For this example we will continue with the defaultBeta model as its p-value (0.048) is very close to the significance threshold (>0.05). Nevertheless, it might be worth investigating the model diagnostics in more detail (e.g. consider altering the random effect structure, number of residual simulations). In this case the climate change dataset does not have any zeros or ones, so the choice of going for the defaultBeta makes sense from the perspective of the data.

Now that we have reviewed the model diagnostics (the data.frame returned by testMSAModels()), we can extract the most appropriate models to continue our analysis. The selectMSAModel() function can be used for this purpose. There are three options to extract the model:

  1. By model name, using the names as written in the ModelName column of the data.frame returned by the testMSAModels() function.
  2. By index, using the row index of the data.frame returned by the testMSAModels() function.
  3. By providing the data.frame returned by the previous testMSAModels() call directly to the selectMSAModel function. This will use the SuggestedModel column to extract the model.

The code below shows an example of all three options.

#select model by name
selectedModelRoads = selectMSAModel(models=modelsRoads,
                                    select="zeroOneInfBeta")
## Selecting model by name: zeroOneInfBeta.
#select model by index
selectedModelLU = selectMSAModel(models=modelsLU,
                                 select=5)
## Selecting model by index, selected model: oneInfBeta.
#select model using the generated stats
selectedModelClimate = selectMSAModel(models=modelsClimate,
                                      select=ModelClimateStats)
## Warning: no models passed diagnostics, returning model smallest dispersion issue.
## Returning defaultBeta model.

Next, the MSA response can be plotted by providing the returned model object to the plotMSAResponse() function. Below two examples are provided. For additional plotting options see the R help file: ?plotMSAResponse

# plot final response of the selected model
plotMSAResponse(model = selectedModelRoads,
                predCol = RoadDistLog10,
                main="MSA response - Distance to Roads")

# plot final response of the selected model
plotMSAResponse(model = selectedModelLU,
                predCol = land_use,
                datacolor = "#333333",
                fitcolor="#ff4010",
                xlab="",
                ylab="MSA",
                main="MSA response - Land use")

Spatial pressure maps

Before being able to predict the MSA spatially, we require pressure maps. The can vary depending on the scenario of interest (e.g. current or future conditions). For this case study, multiple pressure maps were prepared for Europe on the default 300m by 300m resolution. This example includes maps for 2015 and for a future scenario (2050 SSP3). These example maps can be downloaded from Figshare using the GLOBIOr function called downloadExampleMaps(), see code below (total size: ~720mb). The downloadExampleMaps() function requires a path (downloadDir) to store the maps (tif files). This path can be relative (inside the current working directory) or absolute. The function will not redownload the maps if they are found in the provided downloadDir. By default the downloadExampleMaps() function will load the maps into a single list object.

# download example pressure maps from Figshare
maps = downloadExampleMaps(downloadDir = "maps")
# see what is inside the loaded data
str(maps,1)
## List of 10
##  $ tifpaths               :'data.frame': 9 obs. of  6 variables:
##  $ climatechange_2015     :S4 class 'SpatRaster' [package "terra"]
##  $ climatechange_2050_SSP3:S4 class 'SpatRaster' [package "terra"]
##  $ landuse_2015           :S4 class 'SpatRaster' [package "terra"]
##  $ landuse_2050_SSP3      :S4 class 'SpatRaster' [package "terra"]
##  $ landuse_conv_table     :'data.frame': 8 obs. of  2 variables:
##  $ roaddistkm_2015_log10  :S4 class 'SpatRaster' [package "terra"]
##  $ roaddistkm_2050_log10  :S4 class 'SpatRaster' [package "terra"]
##  $ country_borders        :S4 class 'SpatRaster' [package "terra"]
##  $ region_borders         :S4 class 'SpatRaster' [package "terra"]

Set pressure map levels

For making predictions using a categorical predictor (in our case for land use), we need to set the levels of the terra rast to match the categorical labels contained in the model dataset (in this example the MSADataLandUseLand data.frame that was used to fit the models).

# categorical values used in model training data
print(unique(MSADataLandUseLand$land_use))
## [1] "Cropland_H"          "Cropland_L"          "Pasture_H"          
## [4] "Pasture_L"           "Plantation"          "SecondaryVegetation"
## [7] "Urban"

# conversion data.frame (provided in the example data)
print(maps$landuse_conv_table)
##   value               cover
## 1     1               Urban
## 2     2          Cropland_H
## 3     3           Pasture_H
## 4     4           Pasture_L
## 5     5          Plantation
## 6     6 SecondaryVegetation
## 7     7          Cropland_L
## 8     8             Natural
# set levels landuse 2015
maps$landuse_2015 = setRasterLevels(maps$landuse_2015,
                                    NumericalValue=maps$landuse_conv_table$value,
                                    CoverName=maps$landuse_conv_table$cover)

# set levels landuse 2050 SSP3
maps$landuse_2050_SSP3 = setRasterLevels(maps$landuse_2050_SSP3,
                                         NumericalValue=maps$landuse_conv_table$value,
                                         CoverName=maps$landuse_conv_table$cover)

Plot pressure maps

# plot some of the example pressure maps (2015)
# using the default terra package plotting function (will be replaced later)
terra::plot(maps$landuse_2015,main="Land use 2015")

terra::plot(maps$climatechange_2015,main="Climate change 2015")

terra::plot(maps$roaddistkm_2015_log10,main="Road distance 2015 (log10 km)")

Predicting MSA spatially

After we have selected the most appropriate model(s) and loaded the pressure maps into our R environment, we can predict the MSA spatially using the predictMSAPressure() function. This function will predict the MSA for a given terra rast layer. TODO: 2) [explanation params] 1) [explanation about masking function]

# predict distance to road MSA spatially
predictedRoadMSA = predictMSAPressure(model = selectedModelRoads,
                                      variableName = "RoadDistLog10",
                                      pressureMap = maps$roaddistkm_2015_log10,
                                      predictionMask = list(maps$landuse_2015),
                                      maskPredictionFor = list(c("Urban")))

## Masking prediction for Urban in mask 1
## Predicting MSA pressure for RoadDistLog10 using model zeroOneInfBeta
## 85.97 sec elapsed
# predict land use MSA spatially
predictedLandUseMSA = predictMSAPressure(model = selectedModelLU,
                                         variableName = "land_use",
                                         pressureMap = maps$landuse_2015)

## Warning: no prediction mask provided, predicting for all raster cells
## Predicting MSA pressure for land_use using model oneInfBeta
## Warning: map value Natural not included in model predictor, setting MSA to 1.0.
## 28.75 sec elapsed
# predict climate change MSA spatially
predictedClimateMSA = predictMSAPressure(model = selectedModelClimate,
                                         variableName = "GMTI",
                                         pressureMap = maps$climatechange_2015)
## Warning: no prediction mask provided, predicting for all raster cells
## Predicting MSA pressure for GMTI using model defaultBeta
## 39.61 sec elapsed

After the prediction are done, we can visualize the individual predictions using default terra plotting functions. TODO: provide GLOBIOr plotting function (with the colors discussed)

# plot the individual MSA predictions
tmp = c(predictedRoadMSA,predictedLandUseMSA,predictedClimateMSA)
names(tmp) = c("MSA_roaddist","MSA_landuse","MSA_climate")
terra::plot(tmp)

Combine predictions into a single MSA map

Finally we can combine the predictions into a single MSA map. TODO: 1) more explanation 2) contribution calculation

# combine predicted MSA maps
predictedMSAList = list(predictedRoadMSA,predictedLandUseMSA,predictedClimateMSA)
predictedMSA = combinePredictedMSA(predictedMSAList = predictedMSAList,
                                   computeContribution = TRUE)
## Combining 3 MSA predictions
## Included maps: RoadDistLog10, land_use, GMTI
## 80.96 sec elapsed

# plot (new plotting function in development)
terra::plot(predictedMSA$MSA,type = 'continuous', range = c(0, 1), main="Combined MSA")

# plot contribution maps
terra::plot(predictedMSA$PressureContribution,type = 'continuous', range = c(0, 1))