Chapter : statistical tests with dependent data

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Introduction

1. What is autocorrelation ?

Statistical analyses, including many regression methods, often assume that observations are independent of each other. Most of the time, it seems natural to analyse two different datas, X and Y, assuming their independence. But there may still be a dependency intrinsic to each of them: this is known as autocorrelation. Autocorrelation indicates that the independence assumption is not satisfied, which can lead to biased parameter estimates, incorrect confidence intervals and a loss of statistical power.

In statistics, autocorrelation is a measure of the linear dependence between the values of one variable itself, at different times or locations. Checking for autocorrelation in a variable means analyzing the relationship between an observation and other previous observations in the series. In other words, this means analyzing the possible link between what is measured at time 1 and what is measured at time 2, to see if the data at time 2 can be predicted with what we know about the data at time 1, if they are close in space or time.

Correlation between two observations is often expressed as a correlation coefficient, such as Pearson’s correlation coefficient. Values are correlated with an offset function in time or space, giving an autocorrelation coefficient which may be:

positive: indicating that the observations tend to be similar,
negative: indicating that the observations tend to be opposite to each other.

2. What types of autocorrelation ?

Autocorrelation is commonly used to analyze data for which the order or location of observations has an important significance, such as:

Time series: data taken at the same location at several points in time,
Spatial data: data taken at a given point in time at several locations in the same geographical area, and therefore close in space.

2.1. Time series

A typical example of autocorrelation in a time series is a seasonal autocorrelation, where values are observed at a same time each year (seasons), and are therefore correlated with each other.

Temporal autocorrelation shows spatial similarities that depend on the scale used.

Temporal autocorrelation in ecology is important for understanding the processes of reproduction, migration, population dispersal and species responses to seasonal environmental changes.

Example of temporal autocorrelation

An example of temporal autocorrelation in ecology concerns populations of animal or plant species, particularly when studying seasonal variations. Let’s imagine a study of annual fluctuations in the population of migratory birds in a given region. Temporal autocorrelation means that the number of birds observed at a given time of the year is correlated with the number of birds observed at the same time of the previous year.

Suppose the data show positive temporal autocorrelation for a migratory bird species. This would mean that if, for example, in April of the current year, a large number of these birds are observed, it is highly likely that in April of the previous year, a large number of these same birds were also observed. This temporal dependence may be due to seasonal bird migrations, the availability of food resources or other cyclical environmental factors.

2.2. Spatial dependence

Spatial autocorrelation refers to the correlation between observations in close spatial locations. It shows temporal similarities that depend on the scale used. Spatial autocorrelation is important for understanding patterns of species distribution, population dispersal, biotic interactions and ecological processes at different spatial scales.

Spatial autocorrelation can be due to various factors, such as :

Microclimate effects: local characteristics of terrain, vegetation or topography can influence temperature on a small scale,
Dispersion phenomena: the propagation of heat, wind, humidity or other environmental factors can cause spatial autocorrelation,
Biotic interactions: interactions between plant, animal and micro-organism species in an ecosystem can also influence the spatial distribution of variables such as temperature.

Example of spatial dependence

An example of spatial autocorrelation is the observation of temperature distribution in a large forest. Let’s imagine a large, dense forest with many weather stations measuring temperatures at different locations. Spatial autocorrelation would manifest itself in the fact that temperatures recorded at nearby locations in the forest are correlated, i.e. they tend to be similar.

Suppose the data show positive spatial autocorrelation. This would mean that, if you have two weather stations located close to each other (say, a few meters apart), the temperatures measured at these two stations at any given time are highly positively correlated. In other words, when one of the stations records a high temperature, the other station will also tend to record a high temperature, and vice versa.

3. Why is this important?

Autocorrelation is an essential concept in statistics, as it can have a significant impact on data analysis. In the presence of pseudoreplications, meaning that several different values are thought to be taken when in fact they are dependent on each other, it may appear that one variable has a significant effect on another, when in reality this effect is biased due to the temporal or spatial dependence of the data. It can therefore have a significant impact on statistical analysis, and often requires adjustments to ensure that results are reliable and unbiased.

Autocorrelation checking is an essential process in statistics to guarantee the accuracy, reliability and validity of analyses. It enables appropriate statistical methods to be applied, biases to be corrected and informed modeling and forecasting decisions to be made. It is also essential for understanding patterns of variation in biological data.

I. Time dependence

Temporal dependency means that values sampled at a given time can be related to the value sampled before. Therefore, the variable can be correlated to itself and create pseudoreplication.

Diverse packages on R are useful to manage time series, such as TSA (Time Series Analysis).


Attaching package: 'nlme'

The following object is masked from 'package:dplyr':

    collapse

Loading required package: carData


Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some


Attaching package: 'TSA'

The following object is masked from 'package:readr':

    spec

The following objects are masked from 'package:stats':

    acf, arima

The following object is masked from 'package:utils':

    tar

1. The dataset

To understand better how to handle time series, data are used to illustrate the process from an experiment on vultures in time.

## Load the data
dataT=read.table("vautour.txt",header=T,dec=",", stringsAsFactors = T)
head(dataT)       # What are the data

  col year   success    growth  denstot     ratio
1   C 1985 0.5000000 1.2500000  9.00000 0.1432580
2   C 1986 0.5000000 0.6666667  4.00000 0.4604144
3   C 1987 0.8333333 1.3750000 17.00000 0.7876576
4   C 1988 0.5238095 1.2000000 29.00000 0.6261380
5   C 1989 0.5000000 1.1428571 45.00000 0.3209719
6   C 1990 0.5694444 1.1296296 27.33333 0.2768327

summary(dataT)    # Summary of the spread of each variable

 col         year         success           growth          denstot      
 C:19   Min.   :1972   Min.   :0.3333   Min.   :0.6667   Min.   : 4.000  
 O:27   1st Qu.:1983   1st Qu.:0.5964   1st Qu.:1.0055   1st Qu.: 9.167  
        Median :1990   Median :0.7197   Median :1.1329   Median :26.833  
        Mean   :1989   Mean   :0.7268   Mean   :1.1172   Mean   :28.023  
        3rd Qu.:1995   3rd Qu.:0.8735   3rd Qu.:1.1982   3rd Qu.:41.643  
        Max.   :2003   Max.   :1.0000   Max.   :1.3889   Max.   :68.933  
     ratio         
 Min.   :0.007996  
 1st Qu.:0.207050  
 Median :0.420548  
 Mean   :0.464461  
 3rd Qu.:0.733536  
 Max.   :0.993844

Two colonies ‘C’ and ‘O’ are studied from 1972 to 2003. In particular, the reproductive success (success), the growth rate (growth), the density of population (denstot) and the resource availability (ratio) of the colonies.

## Arrange the dataset
dataT_ordered=dataT %>% arrange(col,year)
## Create 2 sub-datasets for the 2 colonies
C=dataT[dataT$col=="C",]
O=dataT[dataT$col=="O",]

2. Data exploration

## Data representation
# Window size
par(mfrow=c(2,3),mar=c(2,4,2.5,0),oma=c(2,2,2,4)) 

## Colony C
plot(C[,2],C[,3], xlab="year", ylab="mean reproductive success")
lines(C[,2],C[,3])
plot(C[,2],C[,4], pch=17, xlab="year", ylab="Colony growth rate")
lines(C[,2],C[,4])
plot(C[,2],C[,5],pch=16, xlab="year", ylab="Number of pairs")
lines(C[,2],C[,5])
mtext(text="Parameter for Colony C",side=3,line=-1,outer=TRUE)

## Colony O
plot(O[,2],O[,3],  xlab="year", ylab="mean reproductive success")
lines(O[,2],O[,3])
plot(O[,2],O[,4],pch=17, xlab="year", ylab="Colony growth rate")
lines(O[,2],O[,4])
plot(O[,2],O[,5], pch=16, xlab="year", ylab="Number of pairs")
lines(O[,2],O[,5])
mtext(text="Parameter for Colony O",side=1,line=-14.5,outer=TRUE)

Through time, all the variables do not have the same behaviour for both of the colonies. Some variables fluctuate in time like the O colony growth rate, or can evolve gradually, like the number of pairs for the O colony.

The irregularity in time of the variables can mean that the data are correlated to the precedent ones. A key parameter to know if there really is a correlation of the variable to itself is autocorrelation.

3. Autocorrelation

Autocorrelation is given by the expression: \[\rho(k) = \frac{E(X_tX_{t+1}) - E(X_t)E(X{t+1})}{\sqrt{E(X_t^2)E(X_{t+1}^2)}}\]

With the discrete sequence \(\{X_1, ... X_T\}\) with \(X = \{X_t\}_{t\in\mathbb{N}}\) as \(E[X_t^2] < \infty\) for \(t\in\mathbb{Z}\).

In other words, the values are compared to their neighbors to conclude if they are correlated or not.

The autocorrelation can be detected in a non-stationary process.

Stationnarity is the constistency of the statistical parameters of a series such as the mean and the variance (Figure).

In non-stationary processes, these parameters vary in time.
In stationary processes, the values gravitate around a mean value within the same range.

Given the precedent plots displayed, one would think there are both stationary and non-stationary processes within the different variables. Tests will allow us to know with more details if there is a dependency.

On R, autocorrelation can be known thanks to the function acf (Auto- and Cross- Covariance and -Correlation Function Estimation) of the TSA package.

Important note: In order for the acf() function to work, the data have to be sorted according to time with one value per time step. To do so, NA or mean values can be added if necessary to fill the dataset accordingly. The package lubridate can be useful to manage dates data.

par(mfrow=c(2,3),mar=c(2,4,4,2))
acf(C[,3], main="Reproductive success for C")
acf(C[,4], main="Growth rate for C")
acf(C[,5], main="Density for C")
acf(O[,3], main="Reproductive success for O")
acf(O[,4], main="Growth rate for O")
acf(O[,5], main="Density for O")

The plots display the correlation coefficient at the associated time step with the neighbors (lag). The blue dotted lines represent the significance threshold of correlation.

These plots allow us to know if the time dependency exists and what profile it follows.

Growth rate of O and reproductive success of C : White noise. There is no significant value and particular pattern. Therefore, no time dependency.
Density of C and reproductive success of O : Short positive dependency. Some values are significantly positively correlated and a pattern emerges (cyclic here).
Growth rate of C : Short negative dependency. Some values are significantly negatively correlated and a pattern emerges (cyclic here).
Density of O : Long term dependency. Several values are significantly correlated and a pattern appears (linear here).

For the following analyses, we will focus on the Density of the O colony. The time is considered an autocovariable.

4. Models

Now that the time dependency have been proved to be true thanks to autocorrelation test, it is possible to proceed with adapted statistical tests.

Different models can be used according to the type of dependency found in the data. The goal will be to apply the model and verify that there is no time dependency left on the residuals.

5. Least square ordinary regression

Least square ordinary regression methods are used during the process of selection of the model.

\[Y = \beta_0 + \beta_1X_1 + \beta_2X_2+...+\beta_nX_n + \epsilon\] - \(Y\) is the dependent variable - \(X_n\) are the explanatory variables - \(\beta_n\) are the coefficients of \(X\) - \(\epsilon\) is the random error term or residuals

To select a relevant model is the same as selecting a model that maximizes likelihood and minimize the number of parameters.

6. Weak time dependency

If the data display a weak time dependency, it is possible to simply add the time as a random variable to the model.

\[Y\sim\alpha.X + time\]

Important note: If the variance of \(Y\) increases beforehand, you will need to perform a logarithmic transformation. If the variance of \(Y\) decreases beforehand, one would use an exponential transformation. Here, the variance increases.

A useful package on R to manage glm and gls is be nlme.

par(mfrow=c(1,2))
hist(O$denstot, main = paste("Histogram of total density", "\n" ,"of O colony "), xlab = "Density")
hist(log(O$denstot), main = paste("Histogram of total density", "\n" ,"of colony O (log transformation)"), xlab = "Log(Density)")

The plots indicates that a log transformation could help better manage the data.

A simple lme model can be used (Linear Mixed-Effects Model) with X = log(total density) and Y the ressource availability.

We can check if the model is satisfactory if the residuals are not significant.

## Model construction
modlme=lme(log(denstot) ~ ratio, data=O, random=~1|year) 
## Residuals
par(mfrow=c(1,1))
acf(modlme$residuals[,1], main = "Residuals of the lme model")

## Residuals
summary(modlme)

Linear mixed-effects model fit by REML
  Data: O 
       AIC      BIC    logLik
  81.15956 86.03506 -36.57978

Random effects:
 Formula: ~1 | year
        (Intercept)  Residual
StdDev:    0.896558 0.3362093

Fixed effects:  log(denstot) ~ ratio 
               Value Std.Error DF  t-value p-value
(Intercept) 2.177414 0.3189390 25 6.827055  0.0000
ratio       1.345590 0.5560464 25 2.419925  0.0231
 Correlation: 
      (Intr)
ratio -0.816

Standardized Within-Group Residuals:
          Min            Q1           Med            Q3           Max 
-0.6631737487 -0.2783018429 -0.0002989742  0.2288493092  0.6403315530 

Number of Observations: 27
Number of Groups: 27

The plot shows there is still a time dependency in this kind of model. The correlation value is too strong, the model is not adapted to this case.

7. Medium to strong time dependency

If a medium or a strong time dependency are proven, adding time as a random effect will not be effective enough. It will be necessary to use Time Models.

7.1 Stationary process

To deal with stationary processes, it is possible to use GLS (Generalized Leat Square) models. Three major GLS models can be used.

AR or Autoregressive Model states that a value \(Y_{(t)}\) will depend on the values \(Y\) at \(p\) the precedent times. The model adds a an autoregressive coefficient \(\phi_p\) to each \(Y_{(t-1)}\) value. In summary, this model explains the current value as a linear combination of the precedent values with \(\epsilon_t\) the random component or white noise. \[Y_{(t)}=\mu+\phi_1 \times Y_{(t-1)}+\phi_2 \times Y_{(t-2)} + ...+\phi_q \times Y_{(t-p)}+\epsilon_{(t)}\]
MA or Moving Average model functions in the same way as the AR model. However, it will consider that the current value of a series is the weighted linear combination of the previous \(q\) errors.

\[Y_{(t)}= \mu+ \theta_1 \times \epsilon_{(t-1)}+\theta_2 \times \epsilon_{(t-2)}) + ...+\theta_q \times \epsilon_{(t-q)} \]

ARMA is a combination of AR and MA models and is a linear combination of the \(p\) past values and \(q\) past residual errors.

\[Y_{(t)}=\mu+\phi_1 \times Y_{(t-1)}+\phi_2 \times Y_{(t-2)} + ...+\phi_q \times Y_{(t-p)}+\epsilon_{(t)} + \theta_1 \times \epsilon_{(t-1)}+\theta_2 \times \epsilon_{(t-2)}) + ...+\theta_q \times \epsilon_{(t-q)} \] The gls() function covers the three types of model depending on the correlation structure of the data.

## Model construction
modgls=gls(log(denstot) ~  year + ratio, data=O, correlation = corAR1(value=0.9,form = ~ year))
acf(modgls$residuals, main = "Residuals of the gls model")

summary(modgls)

Generalized least squares fit by REML
  Model: log(denstot) ~ year + ratio 
  Data: O 
       AIC      BIC    logLik
  15.67389 21.56415 -2.836943

Correlation Structure: AR(1)
 Formula: ~year 
 Parameter estimate(s):
      Phi 
0.7161672 

Coefficients:
                 Value Std.Error   t-value p-value
(Intercept) -236.27349 30.847312 -7.659451  0.0000
year           0.12045  0.015544  7.748987  0.0000
ratio         -0.02149  0.106990 -0.200826  0.8425

 Correlation: 
      (Intr) year  
year  -1.000       
ratio  0.142 -0.143

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.1682662 -0.4112077  0.1065081  0.7340184  1.0718461 

Residual standard error: 0.3264013 
Degrees of freedom: 27 total; 24 residual

The autoregressive coefficient Psi is equal to 0.72. The autocorrelation is not completely removed. Still, the variable ‘ratio’ does not have a significant effect on \(Y\) anymore.

We can compare the two previous models thanks to an analysis of their variance (anova).

anova(modgls,modlme)

Warning in nlme::anova.lme(object = modgls, modlme): fitted objects with
different fixed effects. REML comparisons are not meaningful.

       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
modgls     1  5 15.67389 21.56415  -2.83694                        
modlme     2  4 81.15956 86.03506 -36.57978 1 vs 2 67.48567  <.0001

Anova(modgls)

Analysis of Deviance Table (Type II tests)

Response: log(denstot)
      Df   Chisq Pr(>Chisq)    
year   1 60.0468  9.263e-15 ***
ratio  1  0.0403     0.8408    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(modlme)

Analysis of Deviance Table (Type II tests)

Response: log(denstot)
      Chisq Df Pr(>Chisq)  
ratio 5.856  1    0.01552 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

modgls2=gls(log(denstot) ~   ratio, O, correlation = corAR1(value=0.9,form = ~ year))
anova(modgls2,modlme)

        Model df      AIC      BIC    logLik
modgls2     1  4 15.21587 20.09137  -3.60793
modlme      2  4 81.15956 86.03506 -36.57978

anova(modgls2,modgls)

Warning in nlme::anova.lme(object = modgls2, modgls): fitted objects with
different fixed effects. REML comparisons are not meaningful.

        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
modgls2     1  4 15.21587 20.09137 -3.607935                        
modgls      2  5 15.67389 21.56415 -2.836943 1 vs 2 1.541984  0.2143

anova(modgls2)

Denom. DF: 25 
            numDF     F-value p-value
(Intercept)     1 0.000000006  0.9999
ratio           1 0.014065502  0.9065

The second models fits the data better than the first. We can conclude that the autocorrelation made the ratio variable effect wrongly significant.

In order to chose the range of application of the parameters \(p\) and \(q\), it is possible to use a partial correlation function that will display the direct correlation link between the time steps.

par(mfrow=c(1,2))
acf(log(O$denstot), main="Density for O")
pacf(log(O$denstot), main="Density for O")

So, the order \(p\) (AR) is generally chosen based on significant lags in the PACF, and the order \(q\) (MA) is generally chosen based on significant lags in the ACF.

7.2 Non-stationary process

To manage non-stationary processes, another model called ARIMA is used. A new parameter \(d\) is introduced with \(p\) and \(q\).

ARIMA(p,d,q) models are a combination of AR models with \(p\), MA models with \(q\), and finally includes difference of order \(d\).

This difference helps make the series stationary by subtracting from the current value the values from the previous \(d\) time points.

In general, we will choose an order \(d=1\) when the series shows a constant trend, and an order \(d=2\) when the series shows a fluctuating trend.

On R, ARIMA models can be used with the function arima(). The argument ’order’contains the values of \(p\), \(d\) and \(q\), in that order.

arima1=stats::arima(log(O$denstot),order=c(1,1,1),xreg=O$ratio)
arima1


Call:
stats::arima(x = log(O$denstot), order = c(1, 1, 1), xreg = O$ratio)

Coefficients:
         ar1     ma1  O$ratio
      0.1244  0.0987  -0.0038
s.e.  0.5483  0.5314   0.0905

sigma^2 estimated as 0.06632:  log likelihood = -1.65,  aic = 11.29

Different values of these parameters can be tested to find the best model with the AIC criteria displayed in the results. \(d\) must always be equal to 1.

The ARIMA function can help make predictions.

prediction: now we will use this statistical model to make prediction under a specific scenario
scenario: constant and maximum availability in resource (ratio=1) during 20 years

newxreg=data.frame(O.ratio=rep(1,20))
predict(arima1, n.ahead = 20,newxreg)

$pred
Time Series:
Start = 28 
End = 47 
Frequency = 1 
 [1] 4.241140 4.242138 4.242262 4.242277 4.242279 4.242280 4.242280 4.242280
 [9] 4.242280 4.242280 4.242280 4.242280 4.242280 4.242280 4.242280 4.242280
[17] 4.242280 4.242280 4.242280 4.242280

$se
Time Series:
Start = 28 
End = 47 
Frequency = 1 
 [1] 0.2575288 0.4068621 0.5189478 0.6112697 0.6914231 0.7632103 0.8288035
 [8] 0.8895733 0.9464492 1.0000958 1.0510076 1.0995646 1.1460662 1.1907532
[15] 1.2338228 1.2754388 1.3157392 1.3548413 1.3928461 1.4298412

pred=predict(arima1, n.ahead = 20,newxreg)
par(mfrow=c(1,1))
obspred=rbind (data.frame(denstot=O$denstot), data.frame(denstot=exp(predict(arima1, n.ahead = 20,newxreg)$pred)))
plot(obspred$denstot, main = "Prediction on a 20 years scenario", ylab ="Predicted observations for the total density of O")
lines(obspred$denstot)

8. To go further

The ARIMAX model, which stands for AutoRegressive Integrated Moving Average with eXogenous inputs, is an extension of the ARIMA model that incorporates additional exogenous (external) variables. The ARIMAX model is particularly useful when there are external factors that can influence the time series to model.

The eXogenous (X) Inputs are additional external variables that are not part of the time series being modeled but may influence it. These variables are included in the model to capture their impact on the dependent variable.

The ARIMAX model allows to account for the influence of external factors when modeling time series data, making it more flexible and potentially improving the accuracy of predictions.

II. Spatial dependance

Spatial relationships are multidirectional and multilateral. They are distinct, in this sense, from temporal relationships, which allow only sequential relationships along the past-present-future axis.

These are the steps that will be followed here :

Define connection criteria: We establish criteria for connections, determining that point A is connected to point B, which in turn is connected to point C, and so forth. This linkage needs to be translated into a numerical value, providing a quantitative measure of connectivity.

Transform information into numerical value: The established connection criteria are converted into numerical values, creating a quantifiable representation of the relationships between points. This numerical transformation is crucial for subsequent analyses.

Calculate autocorrelation coefficient: Utilizing the numerical values derived from the connection criteria, we compute an autocorrelation coefficient. This statistical measure helps quantify the degree of spatial autocorrelation, providing insights into the patterns of similarity or dissimilarity among neighbouring points.

Integrating spatial autocorrelation into analyses: Upon detecting spatial autocorrelation, the next step involves seamlessly integrating this information into our analytical processes. This will be discussed later in this chapter.

1 - Codifying the neighbour structure

1.1 Defining neighbours

1.1.1 Characteristics of the relationships between spatial objects

Consider a surface ℜ. This surface can be divided into n mutually exclusive zones. Two adjacent zones are separated by a common boundary.

Mathematical definition of spatial relationships

Spatial relationships \(B\) are a subset of the Cartesian product \(ℝ²× ℝ² = {(i, j) : i ∈ ℝ ², j ∈ ℝ²}\) of couples \((i, j)\) of spatial objects, i.e. all couples \((i, j)\) such that \(i\) and \(j\) are both spatial objects identified by their geographical coordinates, and such that \((i, j)\) is different from \((j,i)\). A spatial object cannot be linked to itself: \((i,i) ⊈ B\). Moreover, if \((i, j) ⊆ B\) and \((j,i) ⊆ B\) for all couples of spatial objects, the spatial relationships are said to be symmetrical (Tiefelsdorf, 1998).

Figure 2.1 illustrates the codifying process of spatial relationships. This approach makes it possible to systematically transcribe the complexity of geographic space into a final set of data analyzable by a computer.

**Figure 2.1 :** Codifying spatial correlation (Tiefelsdorf, 1998)

First, the study zone is divided into mutually exclusive areas. Each area contains a reference point (often its centroid). Then, the spatial relationships can be specified by a neighbourhood graph connecting the areas considered to be neighbouring, or by a matrix containing the geographical coordinates of the reference points. The third step consists in coding the graph in a neighbourhood matrix, or transforming the geographic coordinates into a distance matrix.

The neighborhood matrix measures the similarity between observations. A value greater than zero indicates that the observations are considered neighbors. For example, in the case of the binary matrix shown in figure 1 :

Conversely, the distance matrix measures dissimilarity between zones. The higher \(d_{ij}\) , the more different the zones. With, if an Euclidian distance is used : \(d_{ij}=\sqrt{(x_i-x_j )^2+(y_i-y_j )²}\), \(α\) and \(β\) being the geographical coordinates of the observations. The spatial dependence structure may not be geographical. Any relevant dual relationship may be used to define a neighbourhood graph. For instance:

Genetic proximity between families
Cultural exchanges
Migratory flows

1.1.2 The “list of neighbours” object in R

Package spdep makes it possible to define the relationships between spatial objects. In R, the class of an object defines all its properties and how the statistician can use it. Neighbourhood relationships are recorded in an object of class nb. Assume n spatial observations and neighbours_nb the spatial object containing the associated neighbourhood relationships. neighbours_nb is a list of length n. Each element [i] of the list contains a vector with the index of the neighbours of the item indexed i. If [i] does not have neighbours, the list contains only 0. The list also contains a vector of characters corresponding to the attributes of each neighbourhood zone, as well as a logical value indicating whether the relationship is symmetrical. The main information about the object neighbours_nb can be derived using the function: summary(neighbours_nb)

Let’s load a few packages useful for analysis :

#  Spatial series analysis : packages

#Libraries for spatial series analyses
library(spdep) #to extracted the neighbors is a spatial data

Loading required package: spData

To access larger datasets in this package, install the spDataLarge
package with: `install.packages('spDataLarge',
repos='https://nowosad.github.io/drat/', type='source')`

Loading required package: sf

Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE


Attaching package: 'spdep'

The following object is masked from 'package:ade4':

    mstree

library(ade4)  #used for plotting spatial data
library(spatialreg) #used for spatial data modelling

Loading required package: Matrix


Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack


Attaching package: 'spatialreg'

The following objects are masked from 'package:spdep':

    get.ClusterOption, get.coresOption, get.mcOption,
    get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
    set.coresOption, set.mcOption, set.VerboseOption,
    set.ZeroPolicyOption

library(gwrr)  #to run geographically weighted regression

Loading required package: fields

Loading required package: spam

Spam version 2.10-0 (2023-10-23) is loaded.
Type 'help( Spam)' or 'demo( spam)' for a short introduction 
and overview of this package.
Help for individual functions is also obtained by adding the
suffix '.spam' to the function name, e.g. 'help( chol.spam)'.


Attaching package: 'spam'

The following object is masked from 'package:Matrix':

    det

The following objects are masked from 'package:base':

    backsolve, forwardsolve

Loading required package: viridisLite


Try help(fields) to get started.

Loading required package: lars

Loaded lars 1.3

#these two libraries (spdep and ade4 also provive function that make object compatibles between library)

Let’s then begin the analysis of a first dataset dealing with richness of birds from agricultural habitats and of plant species from permanent pastures at the scale of Ille-et-Vilaine, “div35”.

#Spatial series analysis : dataset div35
#For a REGULAR configuration of points 

#      0 - Exploring the dataset 

#Investigate the data set of plant and bird richness at the scale of Ille-et-Vilaine

#Import dataset
div35=read.table("div35.txt", header=T, dec=",")

#Summary
summary(div35)

    rich_pp         rich_ois           x                y          
 Min.   :14.00   Min.   :13.00   Min.   :555000   Min.   :5285000  
 1st Qu.:23.00   1st Qu.:32.00   1st Qu.:575000   1st Qu.:5315000  
 Median :25.00   Median :34.00   Median :595000   Median :5335000  
 Mean   :24.51   Mean   :33.74   Mean   :593971   Mean   :5336029  
 3rd Qu.:26.00   3rd Qu.:37.00   3rd Qu.:615000   3rd Qu.:5355000  
 Max.   :29.00   Max.   :41.00   Max.   :635000   Max.   :5385000

#Plot the spatial distribution of the points
par(mfrow=c(1,1))
plot(div35[,3:4])

#This is a map of Ille-et-Vilaine department.
#We can point St-Malo in the North and Rennes in the center for of the map, for example.

#The spatial configuration of the points is regular.

Let’s also start the analysis of second dataset dealing with chemical measures in a set of ponds, “mares”.

#Spatial series analysis : dataset mares
#For an IRREGULAR configuration of points

#      0 - Exploring the data set  

#The data set is issued from a study that analysed several physico-chemical properties of several ponds. each pond is geolocalised

#Import the dataset
mare=read.table("mares.txt",header=TRUE)

#Summary
summary(mare)

       pH         Conductivity     Carbonate        hardness    
 Min.   :7.800   Min.   : 59.0   Min.   : 40.0   Min.   : 65.0  
 1st Qu.:7.900   1st Qu.:196.5   1st Qu.:145.0   1st Qu.:167.5  
 Median :7.900   Median :341.0   Median :160.0   Median :185.0  
 Mean   :7.935   Mean   :310.1   Mean   :148.9   Mean   :179.4  
 3rd Qu.:8.000   3rd Qu.:405.0   3rd Qu.:175.0   3rd Qu.:202.5  
 Max.   :8.200   Max.   :560.0   Max.   :200.0   Max.   :275.0  
  Bicarbonate       Chloride        Suspens          Organic      
 Min.   : 90.0   Min.   : 5.00   Min.   : 0.200   Min.   :0.3700  
 1st Qu.:150.0   1st Qu.:10.00   1st Qu.: 1.700   1st Qu.:0.4700  
 Median :165.0   Median :12.50   Median : 2.400   Median :0.5500  
 Mean   :158.7   Mean   :15.06   Mean   : 3.968   Mean   :0.5632  
 3rd Qu.:167.5   3rd Qu.:17.50   3rd Qu.: 4.500   3rd Qu.:0.6200  
 Max.   :190.0   Max.   :32.50   Max.   :19.000   Max.   :1.0000  
    Nitrate        Ammonia              x                y        
 Min.   :0.22   Min.   :0.00000   Min.   : 55.00   Min.   : 17.0  
 1st Qu.:0.70   1st Qu.:0.00000   1st Qu.: 78.00   1st Qu.: 88.0  
 Median :1.48   Median :0.01000   Median : 92.00   Median :148.0  
 Mean   :1.48   Mean   :0.02935   Mean   : 96.94   Mean   :162.1  
 3rd Qu.:2.05   3rd Qu.:0.03000   3rd Qu.:112.50   3rd Qu.:232.0  
 Max.   :5.52   Max.   :0.38000   Max.   :163.00   Max.   :317.0

row.names(mare)<-seq(1,31,1) #we name the ponds from 1 to 31.

#We can plot the ponds on a map according to their coordinates
#We can see that the configuration of the ponds is not regular over space
par(mfrow=c(1,1))
plot(mare$x,mare$y)

1.2 Connection among neighbours

In order to identify points in close proximity, it is imperative to establish specific criteria beyond merely considering distance along the x-axis. This approach will enable us to clearly define the parameters that determine proximity between points in our analysis.

1.2.1 Neighbourhood criterion

a) Defining neighbours based on distance

Once we have a set of points spread across the territory, we can calculate the distance between them. These points may be specific locations where the information has been observed, or points representative of each zone, for example their centroid. In this case, the underlying assumption is that the distribution of the variable within each zone is sufficiently homogeneous to approximate it to a single point.

Neighbourhood graphs materialise the links between the various entities. They are defined in such a way that they represent the underlying spatial structure as closely as possible. There exists many different types of neighbourhood graphs.

a.1) Neighbourhood graphs based on geometric concepts :

• Delaunay’s Triangulation Delaunay’s Triangulation is a geometric method that connects points into triangles such that the minimum angle of all triangles is maximised.

**Figure 2.2 :** Delaunay triangle (source : Pascaline Le Gouar)

• Sphere-of-influence based graph

The sphere-of-influence based graph links two points if their “circles from the nearest neighbour” overlap. The “circle of the nearest neighbour” of point \(P\) is the largest circle centred in \(P\) and that contains no other points than \(P\). All points in the study set are not necessarily interconnected.

**Figure 2.3 :** The sphere of influence graph (edges) of a set of points, showing their nearest neighbour circles (source : Toussaint & Emirates, 2014)

• Gabriel’s graph

Gabriel’s graph links two points \(p_i\) and \(p_j\) if and only if all other points are outside the circle with diameter \([p_i, p_j]\).

**Figure 2.4 :** Gabriel graph (source : Pascaline Le Gouar)

• Graph of relative neighbours

The graph of relative neighbours considers that two points \(pi\) and \(pj\) are neighbours if \[d(p_i,p_j) ≤ max [d(p_i,p_k),d(p_j,p_k)] \,\,\,\,\,\,∀k = 1,..., n k\ne i,j\] with \(d(p_i, p_j)\) the distance between \(p_i\) and \(p_j\).

For example, here’s the application of these geometric concepts to Parisian districts :

**Figure 2.6 :** Four neighbourhood graphs of Parisian districts based on geometric concepts

Application in R :

Coming back to our “mares” dataset, we can explore these geometric concepts :

#Spatial series analysis : dataset "mares"
#Connexions among neighbours
#For an IRREGULAR configuration of points

#Several criteria exist to connect points that are irregularly distributed.
# It is necessary to tests several criteria to test different hypothesis and different scale of the phenomena.

#Criteria are determined when we have an a priori idea of the processes involved.
#For example, may be due to eutrophication (lots of nutrients added, lots of organisms making excrements), heterogeneity in the pond - pond animal and plant communities may have an influence
#Two ponds should look alike because they're on the same soil
#Animals and plants: depending on the dispersal abilities of these species, will influence a pattern in water quality.
#Ponds supposed to be in a specific landscape will look alike.
#Larger-scale phenomena (flooding): if ponds are distributed in a valley, all the ponds will be homogenized.

#      1 - Examine the connection between points ====================
# for first step extract the coordinate
marexy=mare[,11:12]

#           A - The gabriel graph method ---------------------------

#Note that for this function, the coordinates should be in a matrix:
mare.gab<-gabrielneigh(as.matrix(marexy))

# We can represent it spatially with this function:
s.label(marexy,clabel=.5, cpoint=.1,neig=nb2neig(graph2nb(mare.gab)))

# the part nb2neig(graph2nb(mare.gab)) is necessary to translate the object from one package to another 
# s.label is a function from ade4 and we need to translate our object from tripack->spdep->ade4

# if we want a synthetic view of the neighbor relationship we can use this function:
graph2nb(mare.gab)

Neighbour list object:
Number of regions: 31 
Number of nonzero links: 47 
Percentage nonzero weights: 4.890739 
Average number of links: 1.516129 
3 regions with no links:
13 28 31
Non-symmetric neighbours list

#31 pools, 47 links, each point will have an average of 1.5 neighbors, 3 regions have no links (31, 13, 28) = considered as non-neighbors

gab=graph2nb(mare.gab)
#note: the ponds 13, 28 and 31 seems to have no link. This is just because they do not appear
#in the $from object but only in the $to (this is a bug in the function - we would assume)

#         B - The Delaunay triangulation graph method -----------

#to compute a neighbor relationship using the delaunay triangulation method 
#We should use this function using directly the coordinates:

mare.tri<-tri2nb(marexy)

Warning in sn2listw(df1): style is M (missing); style should be set to a valid
value

# graphic representation:
s.label(marexy,clabel=0.6,cpoint=1,neig=nb2neig(mare.tri))

# synthetic view:
mare.tri

Neighbour list object:
Number of regions: 31 
Number of nonzero links: 164 
Percentage nonzero weights: 17.06556 
Average number of links: 5.290323

#With this connection criterion, ponds have an average of 5.2 neighbors: more connections are created

a.2) Neighbourhood graphs based on nearest neighbours

A second method consists in selecting the k closest points as neighbours. This method has the advantage that it leaves no point without a neighbour, which is not required when conducting a spatial analysis, but generally offers a better reflection of reality. The choice can also be made to keep only the points located at a certain distance.

For example :

Nearest neighbour
Two nearest neighbour
Three nearest neighbour
Neighbours at a minimum distance

etc.

Still using the example of Parisian districts, here’s the nearest neighbour application:

**Figure 2.7 :** Four graphs based on the nearest neighbours of Parisian districts

The nbdists function of R can be used to calculate the vector of distances between neighbours. It makes it possible to determine the minimum distance \(d_{min}\) above which all points have at least one neighbour, then the dnearneighb function allows to keep as neighbours only the points between distances \(0\) and \(d_{min}\).

Application in R

Let’s see what it looks like with the “mares” dataset with the criterion of neighbours at a minimum distance (30 km) :

#Spatial series analysis : dataset "mares"
#Connexions among neighbours
#For an IRREGULAR configuration of points

#      1 - Examine the connection between points ====================
# for first step extract the coordinate 
marexy=mare[,11:12]

#  C - The neighbours at a minimum distance criterion --------------------------
# to compute a neighbor relationship using a distance criteria we should use this function using directly the coordinates as a matrix and indicating the min and max of distance.
# the max of distance is defined depending on our knowledge of the system or for different hypothesis we want to test:

mare.dnear<-dnearneigh(as.matrix(marexy),0,30) #here we specify that ponds are connected if they are appart from less than 30m

# Graphic representation:
s.label(marexy,clabel=0.6, cpoint=1,neig=nb2neig(mare.dnear))

# Synthetic view:
mare.dnear #in that specific case: the ponds 4 and 31 are not connected to any other pond

Neighbour list object:
Number of regions: 31 
Number of nonzero links: 80 
Percentage nonzero weights: 8.324662 
Average number of links: 2.580645 
2 regions with no links:
4 31
4 disjoint connected subgraphs

Let’s then compare the current criterions in the mares dataset:

#Spatial series analysis : dataset "mares"
#Connexions among neighbours
#For an IRREGULAR configuration of points

par(mfrow=c(1,3))
s.label(marexy,clabel=.5, cpoint=.1,neig=nb2neig(graph2nb(mare.gab))) #Gabriel
s.label(marexy,clabel=0.6,cpoint=1,neig=nb2neig(mare.tri))  #Triangulation
s.label(marexy,clabel=0.6, cpoint=1,neig=nb2neig(mare.dnear)) #Neighbour at a min distance

#When comparing gabriel, triangulation and distance results, we observe different connexions.

#We can think of several biological processes according to the different critera :
#Gabriel: we can imagine gradients in relation to the ground - we can also imagine a small hydro network that connects the two.
#Triangular: Flooding in a valley would make a connection on a larger scale
#Neighbour at a min distance: composition of plant and animal communities, dispersion

b) Defining neighbours based on contiguity

When the areal data consist in a partition of the entire territory, the concept of “distance between observations” can become quite ambiguous. The example below illustrates the limits of using the distance between centroids to define the notion of neighbourhood.

Example - Ambiguity of the notion of distance between centroids

Let \(R_1\), \(R_2\), \(R_3\) be three distinct zones. It can be considered that since \(R_2\) and \(R_3\) are separated in space, but both are adjacent to \(R_1\), they are both closer to \(R_1\) than to one another. However, the centroids in these zones are equidistant from each other (see Figure 2.8). Summarising the proximity between zones by the distance between the centroids results in a partial loss of the richness of the spatial relationships.

**Figure** **2.8** - Left: three zones - Right: distance between centroids (Source: Smith, 2016)

Types of contiguity :

Rook contiguity

In the sense of Rook contiguity, neighbours have at least two common boundary points (a segment). This matches the movement of the Rook in chess.

**Figure 2.9 :** Rook configuration (source : Pascaline Le Gouar)

Queen contiguity

For two zones to be adjacent in the sense of Queen contiguity, they only need to share one common boundary point. This matches the movement of the Queen in chess.

**Figure 2.10 :** Queen configuration (source : Pascaline Le Gouar)

Bishop contiguity

In the sense of bishop contiguity, two regions are considered as spatial neighbors if they meet at one point ;it is based on the existence of common vertices between two spatial units. This matches the movement of the Bishop in chess.

Still using the example of Paris districts, here’s an example of how continuity can be applied:

**Figure 2.12 :** Queen and Rook contiguity in Paris districts

Application in R

We can return to the first dataset to investigate the configuration of the points :

#Spatial series analysis : dataset div35
#Connexion between points
#For a REGULAR configuration of points

#      1 - Examine the connection between points ====================

# To investigate the various criteria of connection between points we need to
# extract the coordinates in a separate object
xy=div35[,3:4]

#             A - If we choose that points are connected to their 4th nearest neighbors (the "rook connection") ----
div.knear4<-knearneigh(as.matrix(xy),4) #to extract the neighbour points
div.knear4

$nn
      [,1] [,2] [,3] [,4]
 [1,]    2    3   10    4
 [2,]    3    1   10   16
 [3,]    2    4   16   10
 [4,]    5   25    3   34
 [5,]   34    4   25   42
 [6,]    7   11   12   10
 [7,]    6   12    8   11
 [8,]   13    7    9   12
 [9,]   14    8   13   20
[10,]   11   16    2   17
[11,]   10   12    6   17
[12,]    7   11   13   18
[13,]    8   19   12   14
[14,]    9   20   13   21
[15,]   23   22   24   32
[16,]   25   10   17    3
[17,]   18   26   16   11
[18,]   19   17   27   12
[19,]   18   28   20   13
[20,]   29   21   19   14
[21,]   30   20   22   14
[22,]   23   31   21   15
[23,]   22   15   24   32
[24,]   33   23   15   32
[25,]   34   16    4   26
[26,]   27   35   17   25
[27,]   36   28   26   18
[28,]   29   27   37   19
[29,]   28   38   20   30
[30,]   21   31   29   39
[31,]   32   40   30   22
[32,]   31   41   33   23
[33,]   24   32   23   41
[34,]    5   25   35   42
[35,]   26   43   34   36
[36,]   37   27   44   35
[37,]   38   36   28   45
[38,]   37   29   46   39
[39,]   47   38   40   30
[40,]   48   31   39   41
[41,]   49   32   40   48
[42,]   50   43   34   35
[43,]   44   35   42   51
[44,]   45   43   52   36
[45,]   46   44   37   53
[46,]   45   54   47   38
[47,]   39   55   46   48
[48,]   56   40   49   47
[49,]   41   48   40   56
[50,]   51   42   57   58
[51,]   50   58   52   43
[52,]   53   51   44   59
[53,]   54   52   60   45
[54,]   55   53   46   61
[55,]   56   54   47   62
[56,]   55   48   63   62
[57,]   58   50   51   42
[58,]   51   57   59   50
[59,]   64   52   60   58
[60,]   65   53   59   61
[61,]   62   54   66   60
[62,]   61   63   67   55
[63,]   68   62   56   67
[64,]   59   65   58   60
[65,]   60   64   66   59
[66,]   67   61   65   62
[67,]   68   62   66   61
[68,]   67   63   62   66

$np
[1] 68

$k
[1] 4

$dimension
[1] 2

$x
           x       y
 [1,] 565000 5285000
 [2,] 565000 5295000
 [3,] 575000 5295000
 [4,] 585000 5295000
 [5,] 595000 5295000
 [6,] 555000 5315000
 [7,] 555000 5325000
 [8,] 555000 5335000
 [9,] 555000 5345000
[10,] 565000 5305000
[11,] 565000 5315000
[12,] 565000 5325000
[13,] 565000 5335000
[14,] 565000 5345000
[15,] 565000 5375000
[16,] 575000 5305000
[17,] 575000 5315000
[18,] 575000 5325000
[19,] 575000 5335000
[20,] 575000 5345000
[21,] 575000 5355000
[22,] 575000 5365000
[23,] 575000 5375000
[24,] 575000 5385000
[25,] 585000 5305000
[26,] 585000 5315000
[27,] 585000 5325000
[28,] 585000 5335000
[29,] 585000 5345000
[30,] 585000 5355000
[31,] 585000 5365000
[32,] 585000 5375000
[33,] 585000 5385000
[34,] 595000 5305000
[35,] 595000 5315000
[36,] 595000 5325000
[37,] 595000 5335000
[38,] 595000 5345000
[39,] 595000 5355000
[40,] 595000 5365000
[41,] 595000 5375000
[42,] 605000 5305000
[43,] 605000 5315000
[44,] 605000 5325000
[45,] 605000 5335000
[46,] 605000 5345000
[47,] 605000 5355000
[48,] 605000 5365000
[49,] 605000 5375000
[50,] 615000 5305000
[51,] 615000 5315000
[52,] 615000 5325000
[53,] 615000 5335000
[54,] 615000 5345000
[55,] 615000 5355000
[56,] 615000 5365000
[57,] 625000 5305000
[58,] 625000 5315000
[59,] 625000 5325000
[60,] 625000 5335000
[61,] 625000 5345000
[62,] 625000 5355000
[63,] 625000 5365000
[64,] 635000 5325000
[65,] 635000 5335000
[66,] 635000 5345000
[67,] 635000 5355000
[68,] 635000 5365000

attr(,"class")
[1] "knn"
attr(,"call")
knearneigh(x = as.matrix(xy), k = 4)

knn2nb(div.knear4) #knn2nb convert the object returned by the function knearneigh to a class nb (necessary for the subsequent function)

Neighbour list object:
Number of regions: 68 
Number of nonzero links: 272 
Percentage nonzero weights: 5.882353 
Average number of links: 4 
Non-symmetric neighbours list

# The information printed corresponds to all the links that have been created compared to all those that could have been created (272)

knn2nb(div.knear4)  # transformation to get as many lists than points/observation

Neighbour list object:
Number of regions: 68 
Number of nonzero links: 272 
Percentage nonzero weights: 5.882353 
Average number of links: 4 
Non-symmetric neighbours list

plot.new()
plot(knn2nb(div.knear4), xy, add=TRUE)

#             B - If we specify a queen type of connection with points connected to their 8th nearest neighbors (the "queen connection") ----
div.knear8<-knearneigh(as.matrix(xy),8)
div.knear8

$nn
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    2    3   10    4   16   25   11    5
 [2,]    3    1   10   16   11    4    6   25
 [3,]    2    4   16   10   25    1    5   17
 [4,]    5   25    3   34   16   26    2   35
 [5,]   34    4   25   42   35    3   26   16
 [6,]    7   11   12   10    8   17   13    2
 [7,]    6   12    8   11   13   18    9   10
 [8,]   13    7    9   12   14   19    6   11
 [9,]   14    8   13   20    7   21   12   19
[10,]   11   16    2   17    3    6   12   25
[11,]   10   12    6   17    7   16   18    2
[12,]    7   11   13   18    8   17    6   19
[13,]    8   19   12   14   20    9    7   18
[14,]    9   20   13   21    8   19   29   12
[15,]   23   22   24   32   33   31   21   30
[16,]   25   10   17    3   26   11    4    2
[17,]   18   26   16   11   27   12   10   25
[18,]   19   17   27   12   28   26   11   13
[19,]   18   28   20   13   12   27   29   14
[20,]   29   21   19   14   28   13   30   18
[21,]   30   20   22   14   29   31   23   39
[22,]   23   31   21   15   32   30   24   40
[23,]   22   15   24   32   31   33   21   41
[24,]   33   23   15   32   22   31   41   40
[25,]   34   16    4   26   35    5    3   17
[26,]   27   35   17   25   16   36   18   34
[27,]   36   28   26   18   17   37   19   35
[28,]   29   27   37   19   36   18   20   38
[29,]   28   38   20   30   37   21   19   39
[30,]   21   31   29   39   20   22   40   38
[31,]   32   40   30   22   23   21   41   39
[32,]   31   41   33   23   40   22   24   49
[33,]   24   32   23   41   31   15   22   40
[34,]    5   25   35   42    4   26   43   16
[35,]   26   43   34   36   25   42   27   44
[36,]   37   27   44   35   28   43   45   26
[37,]   38   36   28   45   29   44   46   27
[38,]   37   29   46   39   45   47   28   30
[39,]   47   38   40   30   48   46   31   29
[40,]   48   31   39   41   32   49   47   30
[41,]   49   32   40   48   33   31   23   39
[42,]   50   43   34   35    5   51   44   25
[43,]   44   35   42   51   50   36   52   34
[44,]   45   43   52   36   37   51   53   35
[45,]   46   44   37   53   52   38   36   54
[46,]   45   54   47   38   53   37   39   55
[47,]   39   55   46   48   54   56   38   40
[48,]   56   40   49   47   39   55   41   31
[49,]   41   48   40   56   32   47   31   63
[50,]   51   42   57   58   43   52   34   59
[51,]   50   58   52   43   59   44   57   42
[52,]   53   51   44   59   45   60   43   58
[53,]   54   52   60   45   46   59   44   61
[54,]   55   53   46   61   47   60   62   45
[55,]   56   54   47   62   61   48   63   46
[56,]   55   48   63   62   47   49   54   40
[57,]   58   50   51   42   59   64   52   43
[58,]   51   57   59   50   64   52   43   60
[59,]   64   52   60   58   51   65   53   44
[60,]   65   53   59   61   54   66   52   64
[61,]   62   54   66   60   55   67   53   65
[62,]   61   63   67   55   56   68   66   54
[63,]   68   62   56   67   55   61   48   66
[64,]   59   65   58   60   52   66   51   57
[65,]   60   64   66   59   61   53   67   52
[66,]   67   61   65   62   60   68   54   64
[67,]   68   62   66   61   63   55   65   56
[68,]   67   63   62   66   56   61   55   54

$np
[1] 68

$k
[1] 8

$dimension
[1] 2

$x
           x       y
 [1,] 565000 5285000
 [2,] 565000 5295000
 [3,] 575000 5295000
 [4,] 585000 5295000
 [5,] 595000 5295000
 [6,] 555000 5315000
 [7,] 555000 5325000
 [8,] 555000 5335000
 [9,] 555000 5345000
[10,] 565000 5305000
[11,] 565000 5315000
[12,] 565000 5325000
[13,] 565000 5335000
[14,] 565000 5345000
[15,] 565000 5375000
[16,] 575000 5305000
[17,] 575000 5315000
[18,] 575000 5325000
[19,] 575000 5335000
[20,] 575000 5345000
[21,] 575000 5355000
[22,] 575000 5365000
[23,] 575000 5375000
[24,] 575000 5385000
[25,] 585000 5305000
[26,] 585000 5315000
[27,] 585000 5325000
[28,] 585000 5335000
[29,] 585000 5345000
[30,] 585000 5355000
[31,] 585000 5365000
[32,] 585000 5375000
[33,] 585000 5385000
[34,] 595000 5305000
[35,] 595000 5315000
[36,] 595000 5325000
[37,] 595000 5335000
[38,] 595000 5345000
[39,] 595000 5355000
[40,] 595000 5365000
[41,] 595000 5375000
[42,] 605000 5305000
[43,] 605000 5315000
[44,] 605000 5325000
[45,] 605000 5335000
[46,] 605000 5345000
[47,] 605000 5355000
[48,] 605000 5365000
[49,] 605000 5375000
[50,] 615000 5305000
[51,] 615000 5315000
[52,] 615000 5325000
[53,] 615000 5335000
[54,] 615000 5345000
[55,] 615000 5355000
[56,] 615000 5365000
[57,] 625000 5305000
[58,] 625000 5315000
[59,] 625000 5325000
[60,] 625000 5335000
[61,] 625000 5345000
[62,] 625000 5355000
[63,] 625000 5365000
[64,] 635000 5325000
[65,] 635000 5335000
[66,] 635000 5345000
[67,] 635000 5355000
[68,] 635000 5365000

attr(,"class")
[1] "knn"
attr(,"call")
knearneigh(x = as.matrix(xy), k = 8)

knn2nb(div.knear8)

Neighbour list object:
Number of regions: 68 
Number of nonzero links: 544 
Percentage nonzero weights: 11.76471 
Average number of links: 8 
Non-symmetric neighbours list

# Plotting the connection between points for each criterion
## MPE this in an intercative command, can not be used with a markodon document.
par(mfrow=c(1,2))

ade4::s.label(xy, ylim=c(min(div35[,4])-10000,max(div35[,4])+10000), xlim=c(min(div35[,3])-10000,max(div35[,3])+10000),  clabel=0.6, cpoint=1,neig=nb2neig(knn2nb(div.knear4)))
ade4::s.label(xy, ylim=c(min(div35[,4])-10000,max(div35[,4])+10000), xlim=c(min(div35[,3])-10000,max(div35[,3])+10000),  clabel=0.4, cpoint=1,neig=nb2neig(knn2nb(div.knear8)))

#Be careful, the distance between points is not always 10km !
par(mfrow=c(1,1))

c) Defining neighbours based on the optimisation of a trajectory

Some methods such as spatial sampling require prior data sorting. When the latter are characterised by two variables (i.e. their geographical coordinates in the plan), how to choose a sorting method becomes a complex theoretical problem.

One solution consists in running a path along all the points, and sorting them by their order of appearance when the path is taken. The neighbours of a given point are then the points located just before or just after along the path. Out of the set of possible paths, some have characteristics that are better suited to the desired objectives, such as, for instance, reducing sampling variance.

Shortest path (Hamilton path)

This is the case of the shortest path. It minimises the sum of the distances between two consecutive points. This path, which does not set any particular constraints on the starting or arrival point, is known in the literature as the Hamilton path (Figure 2.13b).
Hamiltonian cycle

A particular and well-known case of shortest path is that of the travelling salesman. It represents the path which a travelling salesman must take to visit all his customers, minimising the distance travelled and managing to return home in the evenings. Such a path corresponds to a Hamiltonian cycle (Figure 2.13c).

The TSP package in R allows to compute this thanks to the function insert-dummy and the concorde program.
General randomized tessellation stratified method (GRTS)

The general randomized tessellation stratified method (GRTS, Stevens Jr et al. 2004) is popular in spatial sampling, as it makes it possible to get a spatially-balanced sample for a finite population of individuals (distinct and identifiable units of dimension 0 of a discrete population, e.g. trees in a forest), a linear population (continuous units of dimension 1, e.g. rivers) or a population of surfaces (continuous units of dimension 2, e.g. forests).

The idea of the method is to project the coordinates on a unit square, then cut this square into four cells, each of which is cut again into four sub-cells, etc. To each cell, a value is assigned, resulting from the order in which the division was carried out, ultimately making it possible for the units to be placed on the path going through the two-dimensional space.

This method can be implemented with package spsurvey in R (Kincaid et al. 2016). However, with the GRTS method, large jumps (Figures 2.13d) are created along the paths, which can affect the accuracy of the estimates.

Figure 2.13 : Looking for paths that cross through all the neighbourhoods of Paris

1.3 Neighborhood weights

1.3.1 From a list of neighbours to a weight matrix

Once the neighbourhood graph has been defined and codified into a list of neighbours, the link between points \(i\) and \(j\) is transformed into the element \(w_{ij}\) of the weight matrix \(W\). The weight matrix \(W\) is the “formal expression of spatial dependency between observations” (Anselin et al. 1988)
Formally, the weights express the neighbor structure between the observations as a \(n\times n\) matrix \(W\) in which the elements \(w_{ij}\) of the matrix are the spatial weights:

Defining the weight matrix

Most commonly, the weight matrix is a binary contiguity matrix (see Figure 2.13):

The weight matrices can also take into account the distance between the geographical zones, as relationships becoming smaller with distance: \(1\) if \(d < d_0 - 0\) otherwise, \(\frac{1}{d^α}\) , or \(e^{−αd}\) with \(α\) an estimated or predetermined parameter. Using a maximum distance beyond which \(w_{ij} = 0\) makes it possible to limit the number of components with a value different from zero. When the size of the zones is heterogeneous, this method increases the risk of a considerable variability in the number of neighbours.

Lastly, certain matrices take the strength of relations between the zones into account. For example, weight can be defined by \(\frac{b^α_{ij}}{d^β_{ij}}\) with \(b_{ij}\) a measure of the strength of relationships between zones \(i\) and \(j\) (which is not necessarily symmetrical), such as the percentage of common boundaries, the total population, the wealth and \(d_{ij}\) the distance between the zones.

Application in R

Let’s go back to our previous data set, div35, to see the transformation into a binary matrix :

::: {.cell}

```{.r .cell-code}
#Spatial series analysis : dataset div35
#Weight matrix
#For a REGULAR configuration of points

# about the code : 
# nb2neig(knn2nb()) transform the matrix to subsequently use it is ade4
# neig = define the connection matrix between points

#             Transform the list of the names of the neighbors to a 2 by 2 matrix of neighboors with "0" and "1" -----

w2<-knn2nb(div.knear4)
str(w2)                     # we have as much list as points
```

::: {.cell-output .cell-output-stdout}
```
List of 68
 $ : int [1:4] 2 3 4 10
 $ : int [1:4] 1 3 10 16
 $ : int [1:4] 2 4 10 16
 $ : int [1:4] 3 5 25 34
 $ : int [1:4] 4 25 34 42
 $ : int [1:4] 7 10 11 12
 $ : int [1:4] 6 8 11 12
 $ : int [1:4] 7 9 12 13
 $ : int [1:4] 8 13 14 20
 $ : int [1:4] 2 11 16 17
 $ : int [1:4] 6 10 12 17
 $ : int [1:4] 7 11 13 18
 $ : int [1:4] 8 12 14 19
 $ : int [1:4] 9 13 20 21
 $ : int [1:4] 22 23 24 32
 $ : int [1:4] 3 10 17 25
 $ : int [1:4] 11 16 18 26
 $ : int [1:4] 12 17 19 27
 $ : int [1:4] 13 18 20 28
 $ : int [1:4] 14 19 21 29
 $ : int [1:4] 14 20 22 30
 $ : int [1:4] 15 21 23 31
 $ : int [1:4] 15 22 24 32
 $ : int [1:4] 15 23 32 33
 $ : int [1:4] 4 16 26 34
 $ : int [1:4] 17 25 27 35
 $ : int [1:4] 18 26 28 36
 $ : int [1:4] 19 27 29 37
 $ : int [1:4] 20 28 30 38
 $ : int [1:4] 21 29 31 39
 $ : int [1:4] 22 30 32 40
 $ : int [1:4] 23 31 33 41
 $ : int [1:4] 23 24 32 41
 $ : int [1:4] 5 25 35 42
 $ : int [1:4] 26 34 36 43
 $ : int [1:4] 27 35 37 44
 $ : int [1:4] 28 36 38 45
 $ : int [1:4] 29 37 39 46
 $ : int [1:4] 30 38 40 47
 $ : int [1:4] 31 39 41 48
 $ : int [1:4] 32 40 48 49
 $ : int [1:4] 34 35 43 50
 $ : int [1:4] 35 42 44 51
 $ : int [1:4] 36 43 45 52
 $ : int [1:4] 37 44 46 53
 $ : int [1:4] 38 45 47 54
 $ : int [1:4] 39 46 48 55
 $ : int [1:4] 40 47 49 56
 $ : int [1:4] 40 41 48 56
 $ : int [1:4] 42 51 57 58
 $ : int [1:4] 43 50 52 58
 $ : int [1:4] 44 51 53 59
 $ : int [1:4] 45 52 54 60
 $ : int [1:4] 46 53 55 61
 $ : int [1:4] 47 54 56 62
 $ : int [1:4] 48 55 62 63
 $ : int [1:4] 42 50 51 58
 $ : int [1:4] 50 51 57 59
 $ : int [1:4] 52 58 60 64
 $ : int [1:4] 53 59 61 65
 $ : int [1:4] 54 60 62 66
 $ : int [1:4] 55 61 63 67
 $ : int [1:4] 56 62 67 68
 $ : int [1:4] 58 59 60 65
 $ : int [1:4] 59 60 64 66
 $ : int [1:4] 61 62 65 67
 $ : int [1:4] 61 62 66 68
 $ : int [1:4] 62 63 66 67
 - attr(*, "region.id")= chr [1:68] "1" "2" "3" "4" ...
 - attr(*, "call")= language knearneigh(x = as.matrix(xy), k = 4)
 - attr(*, "sym")= logi FALSE
 - attr(*, "type")= chr "knn"
 - attr(*, "knn-k")= num 4
 - attr(*, "class")= chr "nb"
```
:::

```{.r .cell-code}
nb2neig(knn2nb(div.knear4)) # these lists are transformed in a 2D matrix with 0 or 1.
```

::: {.cell-output .cell-output-stdout}
```
1 .
2 1.
3 11.
4 1.1.
5 ...1.
6 ......
7 .....1.
8 ......1.
9 .......1.
10 111..1....
11 .....11..1.
12 .....111..1.
13 .......11..1.
14 ........1...1.
15 ...............
16 .11......1......
17 .........11....1.
18 ...........1....1.
19 ............1....1.
20 ........1....1....1.
21 .............1.....1.
22 ..............1.....1.
23 ..............1......1.
24 ..............1.......1.
25 ...11..........1.........
26 ................1.......1.
27 .................1.......1.
28 ..................1.......1.
29 ...................1.......1.
30 ....................1.......1.
31 .....................1.......1.
32 ..............1.......11......1.
33 ......................11.......1.
34 ...11...................1.........
35 .........................1.......1.
36 ..........................1.......1.
37 ...........................1.......1.
38 ............................1.......1.
39 .............................1.......1.
40 ..............................1.......1.
41 ...............................11......1.
42 ....1............................11.......
43 ..................................1......1.
44 ...................................1......1.
45 ....................................1......1.
46 .....................................1......1.
47 ......................................1......1.
48 .......................................11.....1.
49 .......................................11......1.
50 .........................................1........
51 ..........................................1......1.
52 ...........................................1......1.
53 ............................................1......1.
54 .............................................1......1.
55 ..............................................1......1.
56 ...............................................11.....1.
57 .........................................1.......11......
58 .................................................11.....1.
59 ...................................................1.....1.
60 ....................................................1.....1.
61 .....................................................1.....1.
62 ......................................................11....1.
63 .......................................................1.....1.
64 .........................................................111....
65 ..........................................................11...1.
66 ............................................................11..1.
67 ............................................................111..1.
68 .............................................................11..11.
```
:::
:::

1.3.2 Weight matrix standardisation

The sum of the weights of the neighbours of a zone is called its degree of connection. If the weight matrix is not standardised (“\(B\)” coding scheme), the degree of connection will depend on the number of its neighbours, which creates heterogeneity between the zones. According to Tiefelsdorf (1998), four types of standardisation can be distinguished:

— Line standardisation (“W” coding scheme): for a given zone, the weight ascribed to each neighbour is divided by the sum of the weights of its neighbours : \(∑^n_{j=1}w_{ij} = 1\). This standardisation makes the interpretation of the weight matrix easier, because \(∑^n_{j=1}w_{ij}x_j\) represents the average of variable \(x\) on all neighbours of observation \(i\). Each weight \(w_{ij}\) can be interpreted as the fraction of spatial influence on observation \(i\) ascribable to \(j\). In contrast, such standardisation implies a certain degree of competition between neighbours: the fewer neighbours a zone has, the greater their weight. Moreover, when weights are inversely proportional to the distance between the zones, row standardisation makes them difficult to interpret.

— Global standardisation (“C” coding scheme): weights are standardised so that the sum of all weights is equal to the total number of entities. All weights are multiplied by \(\frac{n}{∑^n_{j=1}∑^n_{i=1}w_{ij}}\).

— Uniform standardisation (“U” coding scheme): weights are standardised so that the sum of all weights equals \(1\) : \(∑^n_{j=1}∑^n_{i=1}w_{ij} = 1\).

— Standardisation by variance stabilisation (“S” coding scheme): let q be the vector defined by : q = \((\sqrt{∑^n_{j=1}w^2_{1j}}, \sqrt{∑^n_{j=1}w^2_{2j}},...., \sqrt{∑^n_{j=1}w^2_{nj}})^T\). Let matrix \(S^∗\) = \([diag(q)]^{−1}W.^4\) From \(S^*\), we calculate \(Q = ∑^n_{j=1}∑^n_{i=1} s^∗_{ij}\) from which we deduce the standardised weight matrix: \(S = \frac{n}{Q} S^∗\).

1.3.3 Importance of the choice of weight matrix

When trying to test the importance of economic or social relationships between certain variables, the geographical location of the observations is a key parameter. First of all, observations in the same geographical zone are subject to the same external parameters (climate, pollution, etc.) Secondly, neighbouring observations mutually influence one another. Spatial models take these various interactions into account ; these models use neighbourhood specification via weight matrix \(W\).

The “weight list” object in R

The function nb2listw of package spdep makes it possible to convert a “list of neighbours” object into a “weight list” object. It is important to note that the “weight list” object, which corresponds to the weight matrix described above, is not a matrix n×n as represented in theory. It is a list containing the standardisation style and then for each observation: its attribute, the list of observation numbers of its neighbours, the list of the attributes of its neighbours and the list of the weights of its neighbours. Reference is often made to sparse matrices. When a zone has no neighbours, the option zero.policy=TRUE makes it possible to generate a list of weights which takes value ‘zero’ for observations without neighbours (if the option is FALSE, an error message is generated).

Application in R

In the dataset “div35” :

#Spatial series analysis : dataset div35
#Weight matrix
#For a REGULAR configuration of points

#      2 - Define the spatial weight matrix

#Note that spatial weight matrix is always associated to a neighborhood matrix (if two points are not considered as connected,
#their spatial weight is of "0")

#             A - In the case in which points are connected to their 4th nearest neighbours (the "rook connection") ----

#compare spatial weighted matrix with rook connection (that is with the 4th nearest neighbors)
#but different weight formulation (binary B vs standardized W)

pond4.bin<- nb2listw(knn2nb(div.knear4), style="B",zero.policy=TRUE)
# "B" for binary = all connections have the same weight

pond4.stan<- nb2listw(knn2nb(div.knear4), style="W", zero.policy=TRUE)
# "W" for standardized (line standardisation) = 1/nb of connections

pond4.bin

Characteristics of weights list object:
Neighbour list object:
Number of regions: 68 
Number of nonzero links: 272 
Percentage nonzero weights: 5.882353 
Average number of links: 4 
Non-symmetric neighbours list

Weights style: B 
Weights constants summary:
   n   nn  S0  S1   S2
B 68 4624 272 510 4438

pond4.stan

Characteristics of weights list object:
Neighbour list object:
Number of regions: 68 
Number of nonzero links: 272 
Percentage nonzero weights: 5.882353 
Average number of links: 4 
Non-symmetric neighbours list

Weights style: W 
Weights constants summary:
   n   nn S0     S1      S2
W 68 4624 68 31.875 277.375

pond4.bin$weights

[[1]]
[1] 1 1 1 1

[[2]]
[1] 1 1 1 1

[[3]]
[1] 1 1 1 1

[[4]]
[1] 1 1 1 1

[[5]]
[1] 1 1 1 1

[[6]]
[1] 1 1 1 1

[[7]]
[1] 1 1 1 1

[[8]]
[1] 1 1 1 1

[[9]]
[1] 1 1 1 1

[[10]]
[1] 1 1 1 1

[[11]]
[1] 1 1 1 1

[[12]]
[1] 1 1 1 1

[[13]]
[1] 1 1 1 1

[[14]]
[1] 1 1 1 1

[[15]]
[1] 1 1 1 1

[[16]]
[1] 1 1 1 1

[[17]]
[1] 1 1 1 1

[[18]]
[1] 1 1 1 1

[[19]]
[1] 1 1 1 1

[[20]]
[1] 1 1 1 1

[[21]]
[1] 1 1 1 1

[[22]]
[1] 1 1 1 1

[[23]]
[1] 1 1 1 1

[[24]]
[1] 1 1 1 1

[[25]]
[1] 1 1 1 1

[[26]]
[1] 1 1 1 1

[[27]]
[1] 1 1 1 1

[[28]]
[1] 1 1 1 1

[[29]]
[1] 1 1 1 1

[[30]]
[1] 1 1 1 1

[[31]]
[1] 1 1 1 1

[[32]]
[1] 1 1 1 1

[[33]]
[1] 1 1 1 1

[[34]]
[1] 1 1 1 1

[[35]]
[1] 1 1 1 1

[[36]]
[1] 1 1 1 1

[[37]]
[1] 1 1 1 1

[[38]]
[1] 1 1 1 1

[[39]]
[1] 1 1 1 1

[[40]]
[1] 1 1 1 1

[[41]]
[1] 1 1 1 1

[[42]]
[1] 1 1 1 1

[[43]]
[1] 1 1 1 1

[[44]]
[1] 1 1 1 1

[[45]]
[1] 1 1 1 1

[[46]]
[1] 1 1 1 1

[[47]]
[1] 1 1 1 1

[[48]]
[1] 1 1 1 1

[[49]]
[1] 1 1 1 1

[[50]]
[1] 1 1 1 1

[[51]]
[1] 1 1 1 1

[[52]]
[1] 1 1 1 1

[[53]]
[1] 1 1 1 1

[[54]]
[1] 1 1 1 1

[[55]]
[1] 1 1 1 1

[[56]]
[1] 1 1 1 1

[[57]]
[1] 1 1 1 1

[[58]]
[1] 1 1 1 1

[[59]]
[1] 1 1 1 1

[[60]]
[1] 1 1 1 1

[[61]]
[1] 1 1 1 1

[[62]]
[1] 1 1 1 1

[[63]]
[1] 1 1 1 1

[[64]]
[1] 1 1 1 1

[[65]]
[1] 1 1 1 1

[[66]]
[1] 1 1 1 1

[[67]]
[1] 1 1 1 1

[[68]]
[1] 1 1 1 1

attr(,"mode")
[1] "binary"
attr(,"B")
[1] TRUE

#List of created weights associated with the neighbor relationship
#when points are neighbors = 1, otherwise = 0. Translated into list, numerical value put into calculation
#Here : constant criterion for all points

pond4.stan$weights

[[1]]
[1] 0.25 0.25 0.25 0.25

[[2]]
[1] 0.25 0.25 0.25 0.25

[[3]]
[1] 0.25 0.25 0.25 0.25

[[4]]
[1] 0.25 0.25 0.25 0.25

[[5]]
[1] 0.25 0.25 0.25 0.25

[[6]]
[1] 0.25 0.25 0.25 0.25

[[7]]
[1] 0.25 0.25 0.25 0.25

[[8]]
[1] 0.25 0.25 0.25 0.25

[[9]]
[1] 0.25 0.25 0.25 0.25

[[10]]
[1] 0.25 0.25 0.25 0.25

[[11]]
[1] 0.25 0.25 0.25 0.25

[[12]]
[1] 0.25 0.25 0.25 0.25

[[13]]
[1] 0.25 0.25 0.25 0.25

[[14]]
[1] 0.25 0.25 0.25 0.25

[[15]]
[1] 0.25 0.25 0.25 0.25

[[16]]
[1] 0.25 0.25 0.25 0.25

[[17]]
[1] 0.25 0.25 0.25 0.25

[[18]]
[1] 0.25 0.25 0.25 0.25

[[19]]
[1] 0.25 0.25 0.25 0.25

[[20]]
[1] 0.25 0.25 0.25 0.25

[[21]]
[1] 0.25 0.25 0.25 0.25

[[22]]
[1] 0.25 0.25 0.25 0.25

[[23]]
[1] 0.25 0.25 0.25 0.25

[[24]]
[1] 0.25 0.25 0.25 0.25

[[25]]
[1] 0.25 0.25 0.25 0.25

[[26]]
[1] 0.25 0.25 0.25 0.25

[[27]]
[1] 0.25 0.25 0.25 0.25

[[28]]
[1] 0.25 0.25 0.25 0.25

[[29]]
[1] 0.25 0.25 0.25 0.25

[[30]]
[1] 0.25 0.25 0.25 0.25

[[31]]
[1] 0.25 0.25 0.25 0.25

[[32]]
[1] 0.25 0.25 0.25 0.25

[[33]]
[1] 0.25 0.25 0.25 0.25

[[34]]
[1] 0.25 0.25 0.25 0.25

[[35]]
[1] 0.25 0.25 0.25 0.25

[[36]]
[1] 0.25 0.25 0.25 0.25

[[37]]
[1] 0.25 0.25 0.25 0.25

[[38]]
[1] 0.25 0.25 0.25 0.25

[[39]]
[1] 0.25 0.25 0.25 0.25

[[40]]
[1] 0.25 0.25 0.25 0.25

[[41]]
[1] 0.25 0.25 0.25 0.25

[[42]]
[1] 0.25 0.25 0.25 0.25

[[43]]
[1] 0.25 0.25 0.25 0.25

[[44]]
[1] 0.25 0.25 0.25 0.25

[[45]]
[1] 0.25 0.25 0.25 0.25

[[46]]
[1] 0.25 0.25 0.25 0.25

[[47]]
[1] 0.25 0.25 0.25 0.25

[[48]]
[1] 0.25 0.25 0.25 0.25

[[49]]
[1] 0.25 0.25 0.25 0.25

[[50]]
[1] 0.25 0.25 0.25 0.25

[[51]]
[1] 0.25 0.25 0.25 0.25

[[52]]
[1] 0.25 0.25 0.25 0.25

[[53]]
[1] 0.25 0.25 0.25 0.25

[[54]]
[1] 0.25 0.25 0.25 0.25

[[55]]
[1] 0.25 0.25 0.25 0.25

[[56]]
[1] 0.25 0.25 0.25 0.25

[[57]]
[1] 0.25 0.25 0.25 0.25

[[58]]
[1] 0.25 0.25 0.25 0.25

[[59]]
[1] 0.25 0.25 0.25 0.25

[[60]]
[1] 0.25 0.25 0.25 0.25

[[61]]
[1] 0.25 0.25 0.25 0.25

[[62]]
[1] 0.25 0.25 0.25 0.25

[[63]]
[1] 0.25 0.25 0.25 0.25

[[64]]
[1] 0.25 0.25 0.25 0.25

[[65]]
[1] 0.25 0.25 0.25 0.25

[[66]]
[1] 0.25 0.25 0.25 0.25

[[67]]
[1] 0.25 0.25 0.25 0.25

[[68]]
[1] 0.25 0.25 0.25 0.25

attr(,"mode")
[1] "binary"
attr(,"W")
[1] TRUE
attr(,"comp")
attr(,"comp")$d
 [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[39] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

#             B - In the case in which points are connected to their 8th nearest neighbors (the "queen connection") ----

#compare spatial weighted matrix with  queen connection
#but different weight formulation (binary B vs standardized [line standardisation] W)

pond8.bin<- nb2listw(knn2nb(div.knear8), style="B",zero.policy=TRUE)
pond8.stan<- nb2listw(knn2nb(div.knear8), style="W", zero.policy=TRUE)
pond8.bin$weights

[[1]]
[1] 1 1 1 1 1 1 1 1

[[2]]
[1] 1 1 1 1 1 1 1 1

[[3]]
[1] 1 1 1 1 1 1 1 1

[[4]]
[1] 1 1 1 1 1 1 1 1

[[5]]
[1] 1 1 1 1 1 1 1 1

[[6]]
[1] 1 1 1 1 1 1 1 1

[[7]]
[1] 1 1 1 1 1 1 1 1

[[8]]
[1] 1 1 1 1 1 1 1 1

[[9]]
[1] 1 1 1 1 1 1 1 1

[[10]]
[1] 1 1 1 1 1 1 1 1

[[11]]
[1] 1 1 1 1 1 1 1 1

[[12]]
[1] 1 1 1 1 1 1 1 1

[[13]]
[1] 1 1 1 1 1 1 1 1

[[14]]
[1] 1 1 1 1 1 1 1 1

[[15]]
[1] 1 1 1 1 1 1 1 1

[[16]]
[1] 1 1 1 1 1 1 1 1

[[17]]
[1] 1 1 1 1 1 1 1 1

[[18]]
[1] 1 1 1 1 1 1 1 1

[[19]]
[1] 1 1 1 1 1 1 1 1

[[20]]
[1] 1 1 1 1 1 1 1 1

[[21]]
[1] 1 1 1 1 1 1 1 1

[[22]]
[1] 1 1 1 1 1 1 1 1

[[23]]
[1] 1 1 1 1 1 1 1 1

[[24]]
[1] 1 1 1 1 1 1 1 1

[[25]]
[1] 1 1 1 1 1 1 1 1

[[26]]
[1] 1 1 1 1 1 1 1 1

[[27]]
[1] 1 1 1 1 1 1 1 1

[[28]]
[1] 1 1 1 1 1 1 1 1

[[29]]
[1] 1 1 1 1 1 1 1 1

[[30]]
[1] 1 1 1 1 1 1 1 1

[[31]]
[1] 1 1 1 1 1 1 1 1

[[32]]
[1] 1 1 1 1 1 1 1 1

[[33]]
[1] 1 1 1 1 1 1 1 1

[[34]]
[1] 1 1 1 1 1 1 1 1

[[35]]
[1] 1 1 1 1 1 1 1 1

[[36]]
[1] 1 1 1 1 1 1 1 1

[[37]]
[1] 1 1 1 1 1 1 1 1

[[38]]
[1] 1 1 1 1 1 1 1 1

[[39]]
[1] 1 1 1 1 1 1 1 1

[[40]]
[1] 1 1 1 1 1 1 1 1

[[41]]
[1] 1 1 1 1 1 1 1 1

[[42]]
[1] 1 1 1 1 1 1 1 1

[[43]]
[1] 1 1 1 1 1 1 1 1

[[44]]
[1] 1 1 1 1 1 1 1 1

[[45]]
[1] 1 1 1 1 1 1 1 1

[[46]]
[1] 1 1 1 1 1 1 1 1

[[47]]
[1] 1 1 1 1 1 1 1 1

[[48]]
[1] 1 1 1 1 1 1 1 1

[[49]]
[1] 1 1 1 1 1 1 1 1

[[50]]
[1] 1 1 1 1 1 1 1 1

[[51]]
[1] 1 1 1 1 1 1 1 1

[[52]]
[1] 1 1 1 1 1 1 1 1

[[53]]
[1] 1 1 1 1 1 1 1 1

[[54]]
[1] 1 1 1 1 1 1 1 1

[[55]]
[1] 1 1 1 1 1 1 1 1

[[56]]
[1] 1 1 1 1 1 1 1 1

[[57]]
[1] 1 1 1 1 1 1 1 1

[[58]]
[1] 1 1 1 1 1 1 1 1

[[59]]
[1] 1 1 1 1 1 1 1 1

[[60]]
[1] 1 1 1 1 1 1 1 1

[[61]]
[1] 1 1 1 1 1 1 1 1

[[62]]
[1] 1 1 1 1 1 1 1 1

[[63]]
[1] 1 1 1 1 1 1 1 1

[[64]]
[1] 1 1 1 1 1 1 1 1

[[65]]
[1] 1 1 1 1 1 1 1 1

[[66]]
[1] 1 1 1 1 1 1 1 1

[[67]]
[1] 1 1 1 1 1 1 1 1

[[68]]
[1] 1 1 1 1 1 1 1 1

attr(,"mode")
[1] "binary"
attr(,"B")
[1] TRUE

pond8.stan$weights

[[1]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[2]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[3]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[4]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[5]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[6]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[7]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[8]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[9]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[10]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[11]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[12]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[13]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[14]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[15]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[16]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[17]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[18]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[19]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[20]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[21]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[22]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[23]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[24]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[25]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[26]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[27]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[28]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[29]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[30]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[31]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[32]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[33]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[34]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[35]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[36]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[37]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[38]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[39]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[40]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[41]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[42]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[43]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[44]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[45]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[46]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[47]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[48]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[49]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[50]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[51]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[52]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[53]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[54]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[55]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[56]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[57]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[58]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[59]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[60]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[61]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[62]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[63]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[64]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[65]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[66]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[67]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

[[68]]
[1] 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

attr(,"mode")
[1] "binary"
attr(,"W")
[1] TRUE
attr(,"comp")
attr(,"comp")$d
 [1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[39] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8

#             C - In the case in which points are connected to their 4th nearest neighbors but the weight is inversely proportional to the distance (weight = 1/distance)
#To get inverse distance, we need to calculate the distances between all of the neighbours. 
#for this we will use nbdists, which gives us the distances in a similar structure to our input neighbors list. To use this function we need to input the neighbours list and the coordinates.

coords <- cbind(div35$x,div35$y)
distances <- nbdists(knn2nb(div.knear4),coords)
distances[1]

[[1]]
[1] 10000.01 14142.14 22360.69 20000.03

distances[34]

[[1]]
[1]  9999.987  9999.993 10000.008 10000.018

#Calculating the inverse distances (the 1/distance gives very small values so we make an adjustement)
invd1 <- lapply(distances, function(x) (1/x))
length(invd1)

[1] 68

invd1[1]

[[1]]
[1] 9.999994e-05 7.071067e-05 4.472134e-05 4.999993e-05

invd.weights <- nb2listw(knn2nb(div.knear4),glist = invd1,style = "B")
summary(invd.weights)

Characteristics of weights list object:
Neighbour list object:
Number of regions: 68 
Number of nonzero links: 272 
Percentage nonzero weights: 5.882353 
Average number of links: 4 
Non-symmetric neighbours list
Link number distribution:

 4 
68 
68 least connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 with 4 links
68 most connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 with 4 links

Weights style: B 
Weights constants summary:
   n   nn         S0           S1           S2
B 68 4624 0.02584031 4.834501e-06 4.031344e-05

invd.weights$weights[1]

[[1]]
[1] 9.999994e-05 7.071067e-05 4.472134e-05 4.999993e-05

Let’s see what it looks like in the second dataset, “mares” :

#Spatial series analysis : dataset mares
#Weight matrix
#For a IRREGULAR configuration of points

#      2 - Define the spatial weight matrix =========================

# Example : BINARY criterion
pondmaretri.bin <- nb2listw(mare.tri, style="B",zero.policy=TRUE)
pondmaregab.bin <- nb2listw(graph2nb(mare.gab), style="B", zero.policy=TRUE)
pondmarednear.bin <- nb2listw(mare.dnear, style="B",zero.policy=TRUE)

#There are points without neighbours

2 - Spatial autocorrelation indices

Note

Spatial autocorrelation indices measure the spatial dependence between values of the same variable in different places in space. The more the observation values are influenced by observation values that are geographically close to them, the greater the spatial correlation.

2.1 Global measures of spatial autocorrelation

2.1.1 Spatial autocorrelation indices

Calculating the spatial autocorrelation indices is used to answer two questions:

— Could the values taken by the neighbouring observations have been comparable (or also dissimilar) by mere chance?

— If not, then we are dealing with a case of spatial autocorrelation. How is this denoted and what is the strength of the said autocorrelation?

To answer the first question, we must test the hypothesis of absence of spatial autocorrelation for a gross variable y.

— \(H_0\): no spatial autocorrelation

— \(H_1\): spatial autocorrelation

To carry out this test, it is necessary to specify the distribution of the variable of interest y, in the absence of spatial autocorrelation (under \(H_0\)). In this context, statistical inference is generally conducted considering either of the following assumptions:

Normality hypothesis: each of the values of the variable, or \(y_i\), is the result of an independent draw in the normal distribution specific to each geographical area \(i\) on which this variable is measured.

Randomisation hypothesis: The inference over Moran’s I is usually conducted under the randomisation hypothesis. The estimated statistic calculated from data is compared with the distribution of the data derived by randomly re-ordering the data – permutations. The idea is simply that if the null hypothesis is true, then all possible combinations of data are equiprobable. The data observed are then only one of the many outcomes possible. In the case of spatial autocorrelation, the null hypothesis is always that there is no spatial association and the values of the variable are randomly assigned to the spatial units in order to calculate the test statistic. If the null hypothesis is rejected, i.e. if spatial autocorrelation is found, we can then calculate the range of values that governs the spatial autocorrelation index and thus answer the question as to the signals and strength of the spatial autocorrelation: the closest this index is to 1 in absolute value, the greater is the correlation. This interval depends on the weight matrix and can sometimes vary outside the interval [−1; 1], hence the importance of calculating the limits of this interval.

For example, here is the difference between a simulated random distribution and the real distribution in the case of the median income earned by Parisians in the different Parisian districts :

**Figure 2.15 :** Illustration, on Parisian census districts (IRIS), of the gap between random distribution and spatially autocorrelated distribution - Source: INSEE, Localised Tax Revenues System (RFL) 2010

Very generally speaking, spatial autocorrelation indices are used to characterise the correlation between measures that are geographically similar to a measured phenomenon. If \(WY\) is the vector of means of variable \(Y\) (where \(W\) is the spatial weights matrix) in the neighbourhood of each spatial unit, spatial autocorrelation indices occur as:

\[Corr(Y,WY) = \frac{Cov(Y,WY)}{\sqrt{Var(Y).Var(WY)}}\]

Based on this very general formulation, for quantitative variables, a main indice is used to test for spatial autocorrelation — the Moran index. The Moran index considers the variances and covariances taking into account the difference between each observation and the average of all observations.

Moran index

\[I_W = \frac{n}{∑_i∑_jw_{ij}}\frac{∑_i∑_jw_{ij}(y_i −\bar{y})(y_j − \bar{y})}{∑_i(y_i −\bar{y})²} \,\,\,\,\,\,\,\,\,\,i\ne{j}\]

— \(H_0\): The neighbours do not co-vary in any particular way.

— \(I_W > 0\) => positive spatial autocorrelation

As spatial autocorrelation is measured based on a comparison of the value of an individual variable with that of its neighbours, the definition of the neighbourhood will have a significant impact on the measurement of spatial autocorrelation. The larger the planned neighbourhood, the greater the number of neighbours considered, and the greater the probability that their average will be closer to the population’s average, which may lead to a relatively low value for spatial autocorrelation.

Application in R

The spdep package is used to calculate spatial autocorrelation indices and their significance using function moran.test.

By default, the distribution of the variable of interest under the null hypothesis is derived by randomisation. The randomisation = FALSE argument makes it possible to assume that this is a normal distribution.

If certain entities do not have neighbours

order for the package functions spdep to accept spatial weight matrices in which certain units do not have neighbours, it is necessary to specify the option: zero.policy=TRUE. By default, the size of the matrix is reduced to exclude observations without neighbours. The opposite can be specified with the option: adjust.n=FALSE. In this case, the absolute value of the test statistic increases, and the absolute value of its expected maturity and variance decreases (Bivand et al. 2013a).

Two approaches are used to estimate the significance of these indices — an analytical solution based on the normality hypothesis and a Monte Carlo solution based on the randomisation hypothesis. The analytical solution, used by the moran.test function, is based on the assumption that the test statistic asymptotically follows a normal distribution with mean 0 and variance 1. This is not always the most accurate measure of significance as convergence towards this distribution may depend on the arrangement of the polygons. Instead, the moran.mc function can be used, allowing to choose the number of permutations to calculate the simulated distribution of Moran’s I. Comparing the significance levels calculated from functions moran.mc and moran.test makes it possible to ensure the robustness of the conclusions.

Monte Carlo approach to estimating significance

In a Monte Carlo test (a permutation bootstrap test), the attribute values are randomly assigned to polygons in the data set and, for each permutation of the attribute values, a Moran’s I value is computed. The output is a sampling distribution of Moran’s I values under the (null) hypothesis that attribute values are randomly distributed across the study area. We then compare our observed Moran’s I value to this sampling distribution.

For example, the Mantel test is using the Monte Carlo method.

Let’s explore the Moran Global Index in the mares dataset :

#Spatial series analysis : dataset mares
    #Moran Global Index
    #For a REGULAR configuration of points

#      3 - The Moran Global Index  ==================================

# Let's test the global spatial autocorrelation for carbonates and each criterion using the Moran test (possible because the carbonate is a quantitative variable)

moran.test(mare$Carbonate, pondmaregab.bin, zero.policy=TRUE)


    Moran I test under randomisation

data:  mare$Carbonate  
weights: pondmaregab.bin  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 1.4378, p-value = 0.07524
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.15443932       -0.03703704        0.01773410

moran.test(mare$Carbonate, pondmaretri.bin, zero.policy=TRUE)


    Moran I test under randomisation

data:  mare$Carbonate  
weights: pondmaretri.bin    

Moran I statistic standard deviate = 1.8498, p-value = 0.03217
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.145853163      -0.033333333       0.009383014

moran.test(mare$Carbonate, pondmarednear.bin, zero.policy=TRUE)


    Moran I test under randomisation

data:  mare$Carbonate  
weights: pondmarednear.bin  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 1.4158, p-value = 0.07842
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.16716343       -0.03571429        0.02053503

#No significant autocorrelation for all tests:
#Only for the 2nd one (triangulation). For carbonate, we have something significant - for Gabriel and distance, not significant.
#Rather, large-scale processes determine the distribution of carbonate concentration.

# We could also look at spatial autocorrelation with other variables to see if we reach the same conclusion
moran.test(mare$Nitrate, pondmaregab.bin, zero.policy=TRUE)


    Moran I test under randomisation

data:  mare$Nitrate  
weights: pondmaregab.bin  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 0.49064, p-value = 0.3118
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.02036859       -0.03703704        0.01368928

moran.test(mare$Nitrate, pondmaretri.bin, zero.policy=TRUE)


    Moran I test under randomisation

data:  mare$Nitrate  
weights: pondmaretri.bin    

Moran I statistic standard deviate = 1.2281, p-value = 0.1097
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.07342432       -0.03333333        0.00755716

moran.test(mare$Nitrate, pondmarednear.bin, zero.policy=TRUE)


    Moran I test under randomisation

data:  mare$Nitrate  
weights: pondmarednear.bin  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 1.0102, p-value = 0.1562
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.09328084       -0.03571429        0.01630642

#--> No spatial structure with the 3 criteria

Spatial autocorrelation in multivariate data

#Spatial series analysis : dataset mares
    #The Mantel test (using the Monte Carlo method)
    #For a REGULAR configuration of points

############SPATIAL AUTOCORRELATION IN MULTIVARIATE DATA##############

#The Mantel test (using the Monte Carlo method)
#In case of multivariate data it can be useful to have a test that sum up the spatial autocorrelation for  all characteristics/variables
#In that case, we need to test the correlation between a matrix of geographical distance and a matrix of distance of the characteristics between objects 
#To compute a geographical distance matrix you can use the function dist that 
#compute an euclidian distance
geo = dist(marexy)

#To compute the matrix of distance of the characteristics between objects, 
#we can have our own measure (genetic distance, similarity etc...) 
# or compute an euclidean distance between the object 
# it is usually advice to scale the data if they are not measure on the same scale (ie to divide by the standard
#deviation)
chim = dist(scale(mare[,1:10]),method = "euclidian")

# the test used to correlate those two matrix is a Mantel test, it uses randomization of the values 
#to have the expected distribution under H0
# H0: the correlation between teh two matrix is null : there is no spatial autocorrelation
r1 = mantel.rtest(geo,chim,nrepet=1000)
#nrepet = to specify the number of permutations
# or r1 = mantel.randtest(geo,chim,nrepet=1000)
r1

Monte-Carlo test
Call: mantel.rtest(m1 = geo, m2 = chim, nrepet = 1000)

Observation: 0.2437815 

Based on 1000 replicates
Simulated p-value: 0.006993007 
Alternative hypothesis: greater 

     Std.Obs  Expectation     Variance 
 2.908960278 -0.001557550  0.007113084

plot(r1, main = "Mantel's test")

#Significant
#Indicator far from what is expected if permutations are involved
#Ponds that are close will look alike
#Spatial dependence

2.1.2 Spatial autocorrelation of categorical variables

When the variable of interest is not continuous but categorical, the degree of local association is measured by analysing the statistics of the join count (Zhukov 2010). To illustrate the calculation of these statistics, we consider a binary variable representing two colours, White (B) and Black (N) so that a relation can be called White-White, Black-Black or White-Black. It can be seen that:

— positive spatial autocorrelation occurs if the number of White-Black relations is significantly lower than what would have occurred with random spatial distribution;

— negative spatial autocorrelation occurs if the number of White-Black relations is significantly greater than what would have occurred with random spatial distribution;

— no positive spatial autocorrelation occurs if the number of White-Black links is approximately identical to what would have occurred with random spatial distribution;

2.2 Local measures of spatial autocorrelation

Global statistics are based on the assumption of a spatial stationary process: spatial autocorrelation would be the same throughout space. However, this assumption is all the less realistic as the number of observations is high.

2.2.1 Local spatial autocorrelation indicators

Anselin (Anselin 1995) defines local spatial autocorrelation indicators. These must measure the intensity and significance of local autocorrelation between the value of a variable in a spatial unit and the value of the same variable in the surrounding spatial units. More specifically, these indicators make it possible to:

— detect significant groupings of identical values around a particular location (clusters);

— identify spatial non-stationarity zones, which do not follow the global process.

To be considered as local spatial association measures – (LISA; Local Indicators of Spatial Association) – as defined by Anselin, these indicators must verify the following two properties:

— for each observation, they indicate the intensity of the grouping of similar – or opposite in trend – values around this observation;

— the sum of local indices on all observations is proportional to the corresponding global index. One of the most used LISA is the local Moran’s I.

Local Moran’s I \[I_i = (y_i −\bar{y})\sum \limits_{\underset{}{j}}w_{ij}(y_j −\bar{y})\] \[I_W = constante ∗\sum \limits_{\underset{}i}I_i\]

\(Ii > 0\) indicates a grouping of similar values (higher or lower than average).

\(Ii < 0\) indicates a combination of dissimilar values (e.g. high values surrounded by low values).

Here is an example with the median income in the different Parisian districts :

**Figure 2.17 :** Values of local Moran’s I, on Parisian IRIS - Source: INSEE, Localised Tax Revenues System (RFL) 2010

2.2.2 Significance of the local Moran’s I

Significant LISAs are combinations of similar or dissimilar values more marked than what might have been observed based on random spatial distribution. The significance test of each local association indicator is based on a statistic assumed to asymptotically follow a normal distribution under the null hypothesis. If the assumption of normality holds, \(z(I_i) = \frac{I−E(I_i)}{\sqrt{Var(I_i)}}∼ N (0,1)\).

To test the validity of the normality assumption of the LISAs under the null hypothesis, several random distributions are simulated in the space of the variable of interest and the local indicators associated with these simulations are calculated.

Several adjustments methods can then be used : “bonferroni”, “holm”, etc.

2.2.3 Interpretation of local indices

In the absence of global spatial autocorrelation

The LISAs make it possible to identify areas where similar values are significantly grouped. These are areas where the local spatial structure is such that the relations between neighbours are particularly strong.

In the presence of global spatial autocorrelation

LISAs indicate areas that have a particular impact on the global process (local autocorrelation more pronounced than global autocorrelation), or, on the contrary, which stand out from it (lower autocorrelation).

Application with R

Let’s investigate this in the div35 dataset :

#Spatial series analysis : dataset div35
    #Local Moran Index
    #For a REGULAR configuration of points


#             2 - The Local Moran Index                                           ----
# Due to the large extend of the dataset (scale of Brittany), we might expect non stationary issue
#We will investigate this via the Local Moran Index. The autocorrelaton is computed for each point taking into account their neighbours.
# Let's start with plant diversity:

# For the PLANTS
localI.pp1=spdep::localmoran(div35[,1], pond4.bin)
localI.pp1

             Ii         E.Ii     Var.Ii         Z.Ii Pr(z != E(Ii))
1   3.888286246 -0.090487416  5.7405874  1.660623069    0.096789173
2   3.392860034 -0.046011342  2.9522005  2.001441977    0.045344781
3   2.772659813 -0.046011342  2.9522005  1.640482051    0.100904977
4   0.730402871 -0.016433658  1.0623114  0.724602181    0.468696120
5  -0.744866294 -0.017090940  1.1046174 -0.692454769    0.488651781
6   2.027685582 -0.016433658  1.0623114  1.983263007    0.047338059
7   2.148789768 -0.090487416  5.7405874  0.934608416    0.349990140
8   1.334088885 -0.017090940  1.1046174  1.285604009    0.198581297
9   1.389136242 -0.823575799 42.4509690  0.339610489    0.734149878
10  2.772659813 -0.046011342  2.9522005  1.640482051    0.100904977
11  2.583663887 -0.090487416  5.7405874  1.116112079    0.264374162
12  1.286381176 -0.016433658  1.0623114  1.264028249    0.206219888
13  0.389109258 -0.017090940  1.1046174  0.386486383    0.699136486
14 -0.730186999 -0.001754363  0.1138241 -2.159097257    0.030842622
15 -4.678917402 -0.674372813 36.3931084 -0.663809645    0.506812152
16  1.842359481 -0.046011342  2.9522005  1.099042162    0.271749676
17  0.480854853 -0.001754363  0.1138241  1.430468903    0.152582485
18 -0.010901535 -0.016433658  1.0623114  0.005367424    0.995717436
19 -0.922852748 -0.047106812  3.0216512 -0.503797452    0.614403718
20 -0.752205942 -0.016433658  1.0623114 -0.713867334    0.475309218
21 -0.366874443 -0.017090940  1.1046174 -0.332807718    0.739279432
22  0.581775007 -0.001973457  0.1280321  1.631422156    0.102801273
23 -2.499042068 -0.046011342  2.9522005 -1.427677319    0.153384757
24 -1.493510348 -0.016433658  1.0623114 -1.433102092    0.151828664
25  0.420302760 -0.001754363  0.1138241  1.250990595    0.210937909
26  0.299198575 -0.001754363  0.1138241  0.892033979    0.372374720
27  0.480854853 -0.001754363  0.1138241  1.430468903    0.152582485
28 -0.010901535 -0.016433658  1.0623114  0.005367424    0.995717436
29 -0.124666073 -0.001973457  0.1280321 -0.342893327    0.731678712
30 -0.124666073 -0.001754363  0.1138241 -0.364314177    0.715623414
31  0.067999676 -0.001973457  0.1280321  0.195556350    0.844957428
32 -0.003561888 -0.001754363  0.1138241 -0.005357561    0.995725305
33 -0.744866294 -0.017090940  1.1046174 -0.692454769    0.488651781
34  0.601959038 -0.046011342  2.9522005  0.377122311    0.706082712
35  3.018538007 -0.090487416  5.7405874  1.297615742    0.194419410
36  4.444264551 -0.149861878  9.3629567  1.501400079    0.133252113
37  2.152459591 -0.046011342  2.9522005  1.279522125    0.200713253
38 -1.122858145 -0.017090940  1.1046174 -1.052101820    0.292752837
39 -0.185218166 -0.001754363  0.1138241 -0.543792485    0.586584280
40  0.389109258 -0.017090940  1.1046174  0.386486383    0.699136486
41  0.067999676 -0.001973457  0.1280321  0.195556350    0.844957428
42 -0.445775655 -0.001973457  0.1280321 -1.240309455    0.214860957
43 -0.253109906 -0.001973457  0.1280321 -0.701859778    0.482766615
44  1.101055074 -0.016433658  1.0623114  1.084219560    0.278267461
45  2.462559702 -0.046011342  2.9522005  1.460002088    0.144289500
46  1.713915648 -0.090487416  5.7405874  0.753104752    0.451386969
47 -0.253109906 -0.001973457  0.1280321 -0.701859778    0.482766615
48  0.200113333 -0.017090940  1.1046174  0.206662857    0.836273167
49 -0.752205942 -0.016433658  1.0623114 -0.713867334    0.475309218
50 -1.122858145 -0.016433658  1.0623114 -1.073484713    0.283053694
51  0.260665425 -0.001973457  0.1280321  0.734006027    0.462945065
52 -0.060444157 -0.001973457  0.1280321 -0.163410101    0.870195550
53  0.238646483 -0.001754363  0.1138241  0.712555671    0.476120751
54  3.888286246 -0.090487416  5.7405874  1.660623069    0.096789173
55 -0.566879840 -0.016433658  1.0623114 -0.534058645    0.593300961
56  0.774440757 -0.001973457  0.1280321  2.169871833    0.030016557
57  0.260665425 -0.001973457  0.1280321  0.734006027    0.462945065
58  0.033136350 -0.151833724  9.4812939  0.060071388    0.952098778
59  0.645996924 -0.001973457  0.1280321  1.810905382    0.070155499
60 -0.744866294 -0.017090940  1.1046174 -0.692454769    0.488651781
61 -1.765077309 -0.090487416  5.7405874 -0.698924554    0.484599181
62  4.169027767 -0.226544763 13.8719900  1.180174068    0.237930990
63 10.615073262 -0.316154192 18.8993089  2.514467166    0.011921240
64  2.657060363 -0.092021073  5.8355936  1.138007394    0.255117395
65 -0.366874443 -0.001754363  0.1138241 -1.082227409    0.279151492
66  0.389109258 -0.001973457  0.1280321  1.092972479    0.274405883
67  2.849726112 -0.151833724  9.4812939  0.974794795    0.329662060
68  5.352545940 -0.047106812  3.0216512  3.106301938    0.001894432
attr(,"call")
spdep::localmoran(x = div35[, 1], listw = pond4.bin)
attr(,"class")
[1] "localmoran" "matrix"     "array"     
attr(,"quadr")
        mean    median     pysal
1  High-High High-High High-High
2  High-High High-High High-High
3  High-High High-High High-High
4  High-High High-High High-High
5   Low-High  Low-High  Low-High
6  High-High High-High High-High
7  High-High High-High High-High
8    Low-Low   Low-Low   Low-Low
9    Low-Low   Low-Low   Low-Low
10 High-High High-High High-High
11 High-High High-High High-High
12 High-High High-High High-High
13   Low-Low   Low-Low   Low-Low
14  High-Low   Low-Low  High-Low
15  Low-High  Low-High  Low-High
16 High-High High-High High-High
17 High-High  Low-High High-High
18  High-Low  High-Low  High-Low
19  Low-High  Low-High  Low-High
20  High-Low  High-Low  High-Low
21  Low-High  Low-High  Low-High
22   Low-Low   Low-Low   Low-Low
23  High-Low  High-Low  High-Low
24  High-Low  High-Low  High-Low
25 High-High  Low-High High-High
26 High-High  Low-High High-High
27 High-High  Low-High High-High
28  High-Low  High-Low  High-Low
29  Low-High  Low-High  Low-High
30  High-Low   Low-Low  High-Low
31   Low-Low   Low-Low   Low-Low
32  High-Low   Low-Low  High-Low
33  Low-High  Low-High  Low-High
34 High-High High-High High-High
35 High-High High-High High-High
36 High-High High-High High-High
37 High-High High-High High-High
38  Low-High  Low-High  Low-High
39  High-Low   Low-Low  High-Low
40   Low-Low   Low-Low   Low-Low
41   Low-Low   Low-Low   Low-Low
42  Low-High  Low-High  Low-High
43  Low-High  Low-High  Low-High
44 High-High High-High High-High
45 High-High High-High High-High
46 High-High High-High High-High
47  Low-High  Low-High  Low-High
48   Low-Low   Low-Low   Low-Low
49  High-Low  High-Low  High-Low
50  High-Low  High-Low  High-Low
51   Low-Low   Low-Low   Low-Low
52  Low-High   Low-Low  Low-High
53 High-High  Low-High High-High
54 High-High High-High High-High
55  High-Low  High-Low  High-Low
56   Low-Low   Low-Low   Low-Low
57   Low-Low   Low-Low   Low-Low
58   Low-Low   Low-Low   Low-Low
59   Low-Low   Low-Low   Low-Low
60  Low-High  Low-High  Low-High
61  High-Low  High-Low  High-Low
62   Low-Low   Low-Low   Low-Low
63   Low-Low   Low-Low   Low-Low
64   Low-Low   Low-Low   Low-Low
65  High-Low   Low-Low  High-Low
66   Low-Low   Low-Low   Low-Low
67   Low-Low   Low-Low   Low-Low
68   Low-Low   Low-Low   Low-Low

#Instead of 1 index, there are 68.
#For plants: 6 points where we have a significant index

# the five columns correspond respectively to:
# first column without header: the atlas square number (point i)
# Ii: the Moran index at point i
# E.Ii: the expected value of the Moran index at point i under H0
# Var.Ii: the variance of the index
# ZIi the variable of the test
# Pr(Z>0) p value of the test adjusted with bonferonni correction

# We can isolate the significant points like this:
localI.pp1[localI.pp1[,5]<0.05,]

           Ii         E.Ii     Var.Ii      Z.Ii Pr(z != E(Ii))
2   3.3928600 -0.046011342  2.9522005  2.001442    0.045344781
6   2.0276856 -0.016433658  1.0623114  1.983263    0.047338059
14 -0.7301870 -0.001754363  0.1138241 -2.159097    0.030842622
56  0.7744408 -0.001973457  0.1280321  2.169872    0.030016557
63 10.6150733 -0.316154192 18.8993089  2.514467    0.011921240
68  5.3525459 -0.047106812  3.0216512  3.106302    0.001894432

# for the BIRDS
localI.oi1=localmoran(div35[,2], pond8.bin)
localI.oi1

              Ii          E.Ii       Var.Ii         Z.Ii Pr(z != E(Ii))
1    2.103998978 -0.1629313507   9.70253707  0.727772293    0.466752988
2   -0.402453047 -0.0031781809   0.19311812 -0.908574316    0.363574869
3   -3.147821643 -0.0820174075   4.93455032 -1.380132010    0.167546003
4    1.304714450 -0.0626533870   3.77873904  0.703415461    0.481796874
5    2.812060815 -0.2307054227  13.61966397  0.824490000    0.409661196
6    0.661811678 -0.0626533870   3.77873904  0.372686788    0.709381571
7   -0.149056684 -0.0031781809   0.19311812 -0.331955433    0.739922912
8    0.014564967 -0.0004118922   0.02503677  0.094652402    0.924590926
9    0.844257059 -0.0177011963   1.07363733  0.831873774    0.405480197
10  -2.044461479 -0.0820174075   4.93455032 -0.883432755    0.377002500
11   1.074485754 -0.1069140054   6.41222033  0.466544055    0.640826121
12   0.567693028 -0.0094023304   0.57087598  0.763794796    0.444989539
13  -0.004258762 -0.0031781809   0.19311812 -0.002458929    0.998038060
14   0.417103190 -0.0177011963   1.07363733  0.419628634    0.674756767
15  -0.259103105 -0.0004118922   0.02503677 -1.634905124    0.102068938
16   0.131851284 -0.0004118922   0.02503677  0.835891342    0.403216034
17   2.544184660 -0.1069140054   6.41222033  1.046939710    0.295127385
18   0.816745454 -0.0094023304   0.57087598  1.093419603    0.274209610
19  -1.497125335 -0.0439809385   2.65881004 -0.891179354    0.372832958
20  -2.074869043 -0.1069140054   6.41222033 -0.777160920    0.437063836
21   2.340019590 -0.5571274838  31.50809639  0.516130424    0.605763320
22   0.755930327 -0.0031781809   0.19311812  1.727397718    0.084096260
23   3.081384950 -0.0439809385   2.65881004  1.916713549    0.055274333
24   2.770069418 -0.1318106033   7.88047063  1.033721086    0.301266573
25   1.574038584 -0.0301494953   1.82581690  1.187207614    0.235145738
26   3.174055619 -0.1069140054   6.41222033  1.295680705    0.195085511
27   2.577488182 -0.0301494953   1.82581690  1.929828145    0.053628135
28   0.236105788 -0.0004118922   0.02503677  1.494770399    0.134974374
29  -1.444998084 -0.1069140054   6.41222033 -0.528419925    0.597207911
30   0.357736042 -0.0031781809   0.19311812  0.821282331    0.411485469
31   5.613900600 -0.1933605258  11.46988183  1.714714747    0.086397538
32  -2.391976492 -0.0626533870   3.77873904 -1.198274408    0.230810206
33  -1.113410843 -0.0094023304   0.57087598 -1.461172653    0.143968066
34 -12.611813807 -0.3517305513  20.44091227 -2.711710247    0.006693708
35   7.195093906 -0.3102362215  18.12727415  1.762802055    0.077933881
36   8.324517695 -0.1629313507   9.70253707  2.724799336    0.006434058
37   3.715599847 -0.0626533870   3.77873904  1.943647984    0.051937915
38   0.157914910 -0.0004118922   0.02503677  1.000611106    0.317014859
39   0.246241642 -0.0177011963   1.07363733  0.254730579    0.798931197
40   0.321536561 -0.0031781809   0.19311812  0.738908205    0.459962739
41   1.287338699 -0.2666671752  15.66979316  0.392573540    0.694634486
42   3.876325540 -0.0626533870   3.77873904  2.026330152    0.042730963
43   5.587836975 -0.2307054227  13.61966397  1.576634432    0.114879665
44   6.743324390 -0.1069140054   6.41222033  2.705213010    0.006826060
45   4.223840552 -0.1069140054   6.41222033  1.710249030    0.087219833
46   1.128060985 -0.0301494953   1.82581690  0.857154045    0.391359756
47   0.040628593 -0.0004118922   0.02503677  0.259372166    0.795348103
48   0.926791874 -0.0439809385   2.65881004  0.595352183    0.551608081
49  -7.745155658 -0.1629313507   9.70253707 -2.434187191    0.014925274
50   0.443166816 -0.0094023304   0.57087598  0.598982392    0.549184626
51   3.384012606 -0.1069140054   6.41222033  1.378594370    0.168019853
52   0.197010349 -0.0004118922   0.02503677  1.247690752    0.212144329
53   0.505429922 -0.0094023304   0.57087598  0.681388594    0.495625638
54  -0.463268174 -0.0626533870   3.77873904 -0.206088389    0.836721868
55   0.719730846 -0.0031781809   0.19311812  1.645023592    0.099964946
56  -3.356330650 -0.0626533870   3.77873904 -1.694367417    0.090195487
57   0.816745454 -0.0094023304   0.57087598  1.093419603    0.274209610
58   0.079724032 -0.0004118922   0.02503677  0.506451813    0.612539509
59  -0.257655125 -0.0031781809   0.19311812 -0.579077812    0.562536671
60   0.438822878 -0.1318106033   7.88047063  0.203273691    0.838921118
61   0.574932924 -0.0031781809   0.19311812  1.315527088    0.188332820
62   7.898811805 -0.1318106033   7.88047063  2.860705354    0.004226997
63  24.379711256 -2.5274165775 105.09842847  2.624636904    0.008674141
64   0.246241642 -0.0177011963   1.07363733  0.254730579    0.798931197
65  -3.195604957 -0.0626533870   3.77873904 -1.611685249    0.107030449
66   5.597972829 -0.2666671752  15.66979316  1.481527532    0.138466054
67  10.571781440 -0.2666671752  15.66979316  2.738012907    0.006181165
68   7.898811805 -0.1318106033   7.88047063  2.860705354    0.004226997
attr(,"call")
localmoran(x = div35[, 2], listw = pond8.bin)
attr(,"class")
[1] "localmoran" "matrix"     "array"     
attr(,"quadr")
        mean    median     pysal
1  High-High High-High High-High
2   Low-High  Low-High  Low-High
3   Low-High  Low-High  Low-High
4  High-High High-High High-High
5  High-High High-High High-High
6  High-High  High-Low High-High
7   Low-High   Low-Low  Low-High
8  High-High   Low-Low High-High
9    Low-Low   Low-Low   Low-Low
10  Low-High  Low-High  Low-High
11 High-High High-High High-High
12 High-High High-High High-High
13   Low-Low   Low-Low  Low-High
14   Low-Low   Low-Low   Low-Low
15  High-Low   Low-Low  High-Low
16 High-High  Low-High High-High
17 High-High High-High High-High
18 High-High High-High High-High
19  Low-High  Low-High  Low-High
20  High-Low  High-Low  High-Low
21   Low-Low   Low-Low   Low-Low
22   Low-Low   Low-Low   Low-Low
23   Low-Low   Low-Low   Low-Low
24   Low-Low   Low-Low   Low-Low
25 High-High High-High High-High
26 High-High High-High High-High
27 High-High High-High High-High
28 High-High  Low-High High-High
29  High-Low  High-Low  High-Low
30   Low-Low   Low-Low   Low-Low
31   Low-Low   Low-Low   Low-Low
32  High-Low  High-Low  High-Low
33  High-Low  High-Low  High-Low
34  Low-High  Low-High  Low-High
35 High-High High-High High-High
36 High-High High-High High-High
37 High-High High-High High-High
38 High-High  Low-High High-High
39   Low-Low   Low-Low   Low-Low
40   Low-Low   Low-Low   Low-Low
41   Low-Low   Low-Low   Low-Low
42 High-High High-High High-High
43 High-High High-High High-High
44 High-High High-High High-High
45 High-High High-High High-High
46 High-High High-High High-High
47 High-High   Low-Low High-High
48   Low-Low   Low-Low   Low-Low
49  High-Low  High-Low  High-Low
50 High-High High-High High-High
51 High-High High-High High-High
52 High-High  Low-High High-High
53 High-High High-High High-High
54  High-Low  High-Low  High-Low
55   Low-Low   Low-Low   Low-Low
56  High-Low  High-Low  High-Low
57 High-High High-High High-High
58 High-High  Low-High High-High
59  Low-High  Low-High  Low-High
60   Low-Low   Low-Low   Low-Low
61   Low-Low   Low-Low   Low-Low
62   Low-Low   Low-Low   Low-Low
63   Low-Low   Low-Low   Low-Low
64   Low-Low   Low-Low   Low-Low
65  High-Low  High-Low  High-Low
66   Low-Low   Low-Low   Low-Low
67   Low-Low   Low-Low   Low-Low
68   Low-Low   Low-Low   Low-Low

localI.oi1[localI.oi1[,5]<0.05,]

           Ii        E.Ii     Var.Ii      Z.Ii Pr(z != E(Ii))
34 -12.611814 -0.35173055  20.440912 -2.711710    0.006693708
36   8.324518 -0.16293135   9.702537  2.724799    0.006434058
42   3.876326 -0.06265339   3.778739  2.026330    0.042730963
44   6.743324 -0.10691401   6.412220  2.705213    0.006826060
49  -7.745156 -0.16293135   9.702537 -2.434187    0.014925274
62   7.898812 -0.13181060   7.880471  2.860705    0.004226997
63  24.379711 -2.52741658 105.098428  2.624637    0.008674141
67  10.571781 -0.26666718  15.669793  2.738013    0.006181165
68   7.898812 -0.13181060   7.880471  2.860705    0.004226997

#There are more significant values for the birds

#With inverse distance
localI.oi1=localmoran(div35[,2], invd.weights)
?localmoran
localI.oi1

              Ii          E.Ii       Var.Ii         Z.Ii Pr(z != E(Ii))
1  -9.808124e-05 -5.405897e-06 2.540841e-08 -0.581400615   5.609705e-01
2   7.307599e-06 -1.472730e-07 9.032237e-10  0.248051867   8.040943e-01
3  -2.810989e-06 -3.800588e-06 2.307916e-08  0.006514018   9.948026e-01
4  -1.085793e-05 -2.903285e-06 1.767340e-08 -0.059835711   9.522865e-01
5  -1.729350e-05 -9.845974e-06 5.463545e-08 -0.031862110   9.745820e-01
6   2.864836e-05 -2.673903e-06 1.515846e-08  0.254404963   7.991827e-01
7  -2.827084e-05 -1.472734e-07 9.032279e-10 -0.935775786   3.493886e-01
8  -3.012427e-06 -1.908660e-08 1.170985e-10 -0.276618045   7.820734e-01
9  -1.211676e-06 -7.096197e-07 3.957084e-09 -0.007981142   9.936320e-01
10 -1.252264e-04 -3.800592e-06 2.307921e-08 -0.799283167   4.241262e-01
11  1.062133e-04 -5.345703e-06 3.423476e-08  0.602935626   5.465515e-01
12  2.527150e-05 -4.701166e-07 3.047898e-09  0.466268017   6.410237e-01
13  1.064689e-05 -1.589090e-07 1.031054e-09  0.336524029   7.364757e-01
14  4.348244e-05 -8.202525e-07 5.021464e-09  0.625194153   5.318436e-01
15 -6.478429e-06 -1.500430e-08 7.545923e-11 -0.744057476   4.568417e-01
16 -1.226518e-06 -2.059460e-08 1.336708e-10 -0.104304126   9.169280e-01
17  2.111920e-04 -5.345702e-06 3.423475e-08  1.170307388   2.418773e-01
18  3.149780e-05 -4.701169e-07 3.047902e-09  0.579046943   5.625575e-01
19 -6.812320e-05 -2.199048e-06 1.419535e-08 -0.553313771   5.800486e-01
20 -2.087218e-04 -5.345698e-06 3.423471e-08 -1.099174409   2.716920e-01
21 -7.510704e-05 -2.581663e-05 1.473653e-07 -0.128399955   8.978325e-01
22  6.522670e-05 -1.472732e-07 9.032257e-10  2.175237652   2.961231e-02
23  2.614028e-05 -2.199048e-06 1.419535e-08  0.237857250   8.119918e-01
24 -2.389723e-05 -5.625367e-06 3.161253e-08 -0.102766883   9.181480e-01
25  6.557757e-07 -1.507475e-06 9.748010e-09  0.021910329   9.825195e-01
26  3.371663e-04 -5.345701e-06 3.423474e-08  1.851152595   6.414760e-02
27  1.232998e-04 -1.507475e-06 9.748013e-09  1.264100900   2.061938e-01
28  9.198937e-06 -2.059462e-08 1.336710e-10  0.797425792   4.252038e-01
29  8.521788e-05 -5.345700e-06 3.423473e-08  0.489463306   6.245137e-01
30  4.684646e-05 -1.589089e-07 1.031053e-09  1.463884920   1.432254e-01
31 -2.989662e-05 -9.668027e-06 6.123752e-08 -0.081744204   9.348501e-01
32 -2.240706e-04 -3.132668e-06 2.017461e-08 -1.555491548   1.198290e-01
33 -5.085219e-05 -4.012694e-07 2.290072e-09 -1.054252242   2.917674e-01
34 -7.257953e-04 -1.758653e-05 1.091337e-07 -2.143788470   3.204984e-02
35  2.882244e-04 -1.551180e-05 9.678110e-08  0.976340580   3.288957e-01
36  4.421444e-04 -8.146567e-06 5.180167e-08  1.978433151   4.787986e-02
37  1.616712e-04 -3.132671e-06 2.017465e-08  1.160284543   2.459330e-01
38  1.050212e-05 -2.059461e-08 1.336709e-10  0.910142100   3.627476e-01
39  8.040522e-06 -8.850594e-07 5.732126e-09  0.117890452   9.061545e-01
40  6.132615e-05 -1.589090e-07 1.031053e-09  1.914824438   5.551490e-02
41 -1.943095e-04 -1.235704e-05 7.328858e-08 -0.672109186   5.015142e-01
42  7.925463e-05 -2.903283e-06 1.767337e-08  0.618001778   5.365742e-01
43  5.878111e-04 -1.153526e-05 7.271505e-08  2.222622489   2.624127e-02
44  3.371663e-04 -5.345702e-06 3.423476e-08  1.851152516   6.414761e-02
45  2.321879e-04 -5.345704e-06 3.423478e-08  1.283782237   1.992182e-01
46  8.985150e-05 -1.507475e-06 9.748015e-09  0.925322496   3.547982e-01
47 -3.832887e-06 -2.059461e-08 1.336709e-10 -0.329737073   7.415986e-01
48 -1.085218e-04 -2.199047e-06 1.419534e-08 -0.892386962   3.721856e-01
49 -1.991094e-04 -6.953530e-06 3.892181e-08 -0.973995793   3.300586e-01
50  5.592035e-05 -4.356932e-07 2.670024e-09  1.090643627   2.754297e-01
51  1.692004e-04 -5.345700e-06 3.423474e-08  0.943358185   3.454977e-01
52  1.180529e-05 -2.059462e-08 1.336709e-10  1.022857716   3.063751e-01
53  1.904517e-05 -4.701166e-07 3.047898e-09  0.353488020   7.237226e-01
54  3.309061e-05 -3.132671e-06 2.017465e-08  0.255026232   7.987028e-01
55 -7.452908e-06 -1.589092e-07 1.031057e-09 -0.227156114   8.203024e-01
56 -4.428675e-04 -2.903289e-06 1.767345e-08 -3.309457115   9.347708e-04
57  3.846216e-05 -3.769281e-07 2.104067e-09  0.846718914   3.971518e-01
58  7.413021e-06 -1.908661e-08 1.170985e-10  0.686809530   4.922028e-01
59  2.150674e-05 -1.589090e-07 1.031053e-09  0.674731652   4.998463e-01
60 -7.130883e-05 -6.590525e-06 4.207363e-08 -0.315516559   7.523695e-01
61  4.684639e-05 -1.589090e-07 1.031054e-09  1.463882049   1.432262e-01
62  6.746909e-04 -6.590539e-06 4.207381e-08  3.321395754   8.956843e-04
63  1.119689e-03 -1.171176e-04 4.915527e-07  1.764074499   7.771944e-02
64  5.397255e-06 -7.554456e-07 4.306902e-09  0.093752602   9.253057e-01
65 -2.206088e-04 -2.903283e-06 1.767337e-08 -1.637607130   1.015037e-01
66  2.504894e-04 -1.235703e-05 7.328845e-08  0.970921935   3.315872e-01
67  5.546073e-04 -1.235706e-05 7.328882e-08  2.094290519   3.623411e-02
68  7.969751e-04 -5.284144e-06 2.904507e-08  4.707372008   2.509307e-06
attr(,"call")
localmoran(x = div35[, 2], listw = invd.weights)
attr(,"class")
[1] "localmoran" "matrix"     "array"     
attr(,"quadr")
        mean    median     pysal
1   High-Low  High-Low  High-Low
2    Low-Low   Low-Low   Low-Low
3    Low-Low   Low-Low  Low-High
4   High-Low  High-Low  High-Low
5   High-Low  High-Low  High-Low
6   High-Low  High-Low High-High
7   Low-High  Low-High  Low-High
8   High-Low   Low-Low  High-Low
9    Low-Low   Low-Low  Low-High
10  Low-High   Low-Low  Low-High
11 High-High High-High High-High
12 High-High High-High High-High
13  Low-High  Low-High   Low-Low
14   Low-Low   Low-Low   Low-Low
15  High-Low   Low-Low  High-Low
16 High-High  Low-High  High-Low
17 High-High High-High High-High
18 High-High High-High High-High
19  Low-High  Low-High  Low-High
20  High-Low  High-Low  High-Low
21   Low-Low   Low-Low  Low-High
22   Low-Low   Low-Low   Low-Low
23  Low-High  Low-High   Low-Low
24   Low-Low   Low-Low  Low-High
25 High-High High-High High-High
26 High-High High-High High-High
27 High-High High-High High-High
28 High-High  Low-High High-High
29 High-High High-High High-High
30   Low-Low   Low-Low   Low-Low
31  Low-High  Low-High  Low-High
32  High-Low  High-Low  High-Low
33  High-Low  High-Low  High-Low
34  Low-High  Low-High  Low-High
35 High-High High-High High-High
36 High-High High-High High-High
37 High-High High-High High-High
38 High-High  Low-High High-High
39  Low-High  Low-High   Low-Low
40   Low-Low   Low-Low   Low-Low
41  Low-High   Low-Low  Low-High
42 High-High  High-Low High-High
43 High-High High-High High-High
44 High-High High-High High-High
45 High-High High-High High-High
46 High-High High-High High-High
47 High-High  Low-High  High-Low
48  Low-High  Low-High  Low-High
49  High-Low  High-Low  High-Low
50 High-High High-High High-High
51 High-High High-High High-High
52 High-High  Low-High High-High
53 High-High High-High High-High
54 High-High High-High High-High
55  Low-High  Low-High  Low-High
56  High-Low  High-Low  High-Low
57  High-Low  High-Low High-High
58 High-High   Low-Low High-High
59  Low-High   Low-Low   Low-Low
60  Low-High  Low-High  Low-High
61   Low-Low   Low-Low   Low-Low
62   Low-Low   Low-Low   Low-Low
63   Low-Low   Low-Low   Low-Low
64   Low-Low   Low-Low   Low-Low
65  High-Low  High-Low  High-Low
66   Low-Low   Low-Low   Low-Low
67   Low-Low   Low-Low   Low-Low
68   Low-Low   Low-Low   Low-Low

localI.oi1[localI.oi1[,5]<0.05,]

              Ii          E.Ii       Var.Ii      Z.Ii Pr(z != E(Ii))
22  0.0000652267 -1.472732e-07 9.032257e-10  2.175238   2.961231e-02
34 -0.0007257953 -1.758653e-05 1.091337e-07 -2.143788   3.204984e-02
36  0.0004421444 -8.146567e-06 5.180167e-08  1.978433   4.787986e-02
43  0.0005878111 -1.153526e-05 7.271505e-08  2.222622   2.624127e-02
56 -0.0004428675 -2.903289e-06 1.767345e-08 -3.309457   9.347708e-04
62  0.0006746909 -6.590539e-06 4.207381e-08  3.321396   8.956843e-04
67  0.0005546073 -1.235706e-05 7.328882e-08  2.094291   3.623411e-02
68  0.0007969751 -5.284144e-06 2.904507e-08  4.707372   2.509307e-06

#We can also represent the Local Index value using this code lines:
# par(mfrow=c(2,1))
# s.value(xy,localI.pp1[,1], ylim=c(min(div35[,4])-10000,max(div35[,4])+10000), xlim=c(min(div35[,3])-10000,max(div35[,3])+10000), sub="local I (plant)")
# s.value(xy,localI.oi1[,1], ylim=c(min(div35[,4])-10000,max(div35[,4])+10000), xlim=c(min(div35[,3])-10000,max(div35[,3])+10000), sub="local I (bird)")

#Clusters of values that look alike and outliers (values that don't look alike)
#Sometimes close values are similar, sometimes not

2.3 Spatio-temporal indices

It is not unusual for a geolocalised database to have observations raised at different points in time. It may be interesting to understand how a localised phenomenon has spread and evolved in space and time and how it can be linked to the conditions of the environment surrounding it. In this case, it is important to be able to assess how the underlying spatial structures change over different periods of time. On spatiotemporal data, prior graphical exploration of cross-section data (standard Moran’s I) can be used to study the existence and change in grouping or dispersion trends that are statistically significantly different from random models. Many recent developments show a growing interest in analysing spatio-temporal data in many areas of research such as physics, meteorology, economics and environmental studies. By extending the Moran index to include time attributes, it becomes possible to calculate global and localized indices that concurrently take into account spatial and temporal auto-correlations. This can also be done on the basis of spatial-temporal risk weighting matrices.

3 - Spatial modeling

Once the structure of the spatial correlation has be estimated, it can then be included in the model.

Depending on the structure of the autocorrelation and especially if the process is or is not stationary,and of the structure of the residuals, the analysis required will be different and more or less difficult.

For the analysis under R, it will be important to import the package spatialreg, spdep as well as gwrr

library(spatialreg)
library(spdep)
library(gwrr)

3.1.Stationary process

For the models described here, it is required for the processes to be stationary, meaning that the process is constant across regions. Another assumption is that the spatial autocorrelation is isotropic. This means that the cause of the autocorrelation act the same way in every direction. Anisotropy may be caused for example by environmental factors like the wind or water currents.

All the models showed here will be based on the general linear model modified to take the spatial autocorrelation into account :

Y = X\(\beta\) + \(\epsilon\)

3.1.1. Spatially lagged X model

Theory

This analysis best perform on local analysis. If the scale is to big, it may be wiser to use another analysis as this one will lack precision.

The idea behind this analysis is that the state of a neighbor region will affect our region.

This model estimate how well the response variable at one site reflect the response values at its surrounding sites.

An extra parameter : WX\(\theta\), called the autocovariate is added to the general linear model’s formula. It is a distance-weighted function of the neighboring response value.

Y = X\(\beta\) + WX\(\theta\) + \(\epsilon\)

The argument WX\(\theta\) is the average of the neighboring Xs.

Application in R

(For the purpose of the example, we will assume that the analysis is the right one even though other would be more fitted)

As this analysis is quite simple the average value of the neighboring regions can be calculated by hand and then added to the general linear formula.

However, there are commands in R in the package sdpep that allows to do the same analysis ans to go a little further, as well as giving the R-squared value and the p value.

Our exemple here will be the study of plants and bird richness in Ile & Vilaine.

(The dataset “div35” set can be found in the doc called “Spatial autocorrelation_ modelisation”)

(To see how to create weighted matrix, see the previous parts)

div35 <- read.table("div35.txt", header=T, dec=",")

xy=div35[,3:4] #Extract the coordinates x and y of each points
plot(div35[,3:4])
#For a rook connection
div.knear4<-knearneigh(as.matrix(xy),4) #extract the neighbour points
knn2nb(div.knear4) # convert the object of type knn into a type nb # transformation to get as many lists than points/observation

Neighbour list object:
Number of regions: 68 
Number of nonzero links: 272 
Percentage nonzero weights: 5.882353 
Average number of links: 4 
Non-symmetric neighbours list

plot(knn2nb(div.knear4), xy, add=TRUE)

#For a Queen conection
div.knear8<-knearneigh(as.matrix(xy),8)
knn2nb(div.knear8)

Neighbour list object:
Number of regions: 68 
Number of nonzero links: 544 
Percentage nonzero weights: 11.76471 
Average number of links: 8 
Non-symmetric neighbours list

plot(knn2nb(div.knear8), xy, add=TRUE)

#Spatial weighted matrix
pond4.stan<- nb2listw(knn2nb(div.knear4), style="W", zero.policy=TRUE)
pond8.stan<- nb2listw(knn2nb(div.knear8), style="W", zero.policy=TRUE)

#The argument "style = "w"" allows to determine the weight given to each link. 
# "W" for standardized = 1/nb of connections
#if you want to give the same weight to each link, then use the argument "B" instead of "w".

(to be able to use the weighted matrix in the analysis, the items must be a class “listw”, hence the argument “nb2listw”)

The function used for spatially weighted matrix is “lmslx”. In the argument :

First the equation tested in shape of : Y ~ X1+X2+X3 …
the data set (here div35)
The weighted matrix ( here pond8.stan)

lmsSLX=spatialreg::lmSLX(rich_ois~rich_pp,data=div35,pond8.stan)
summary(lmsSLX)


Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
    data = as.data.frame(x), weights = weights)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.8936  -1.6548   0.5529   2.1413   6.5077 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -7.4084    10.8417  -0.683   0.4968  
rich_pp       0.4686     0.1856   2.525   0.0140 *
lag.rich_pp   1.1986     0.4717   2.541   0.0135 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.02 on 65 degrees of freedom
Multiple R-squared:  0.2395,    Adjusted R-squared:  0.2161 
F-statistic: 10.23 on 2 and 65 DF,  p-value: 0.000137

Here, 2 coefficients are given as well as their p-value. The first one (rich_pp) correspond to the argument X\(\beta\) of our original function. The second one (lag.rich_pp) correspond to WX\(\theta\).

Here there is only one X, and therefore only 2 coefficients, but with a bigger model to test, the coefficients corresponding to the X will be named just like the original Xs, and the coefficients for the autocorrelation will be named “lag.X”.

As always, we only keep the coefficients that have a significant p-value. Here they all do. Our final model here would be :

\(Bird\:Richness = -7.40 + 0.47X + 1.19WY + \epsilon\)

An interpretation of those results here is that the more plants there is in a place, the more bird will be present as well. As for the lag value, we can expect that the more plants there will in the neighboring regions, the more birds there will in our region.

This model works at small scale only

The first estimate, that regards the response variable is called the direct effect, wherease the factor impacting the autocorrelation is called the indirect factor. However we can have an idea of the general effect on the whole region by combining the two effects int the total effect.

impacts(lmsSLX)

Impact measures (SlX, glht):
           Direct Indirect    Total
rich_pp 0.4685765 1.198607 1.667184

summary(impacts(lmsSLX, R=500),zstats=TRUE)

Impact measures (SlX, glht, n-k):
           Direct Indirect    Total
rich_pp 0.4685765 1.198607 1.667184
========================================================
Standard errors:
           Direct  Indirect     Total
rich_pp 0.1856093 0.4717312 0.4377709
========================================================
Z-values:
          Direct Indirect    Total
rich_pp 2.524532 2.540869 3.808348

p-values:
        Direct   Indirect Total    
rich_pp 0.011585 0.011058 0.0001399

The function impact only gives the total effect as the sum of the direct and indirect effect. Putting the function into a summary allows to show standard error, and the p-values. The total effect represent the overall impact of the variable when applied to the whole map.

Sometimes, non-significant direct and indirect effect may become significant when looking at the total effect

3.1.2. - Simultaneous autoregressive models (SAR)

To be able to study an autocorrelation at a larger scale, other models must be used.

The SAR model is an autoregressive model.These models take spatial autocorrelation into account by using a neighborhood matrix specifying the relationship between the residuals at each location.

The SAR model allows to study directional processes, which, in ecology, are quite common ( ex : wind dispersal or stream flows). The null hypothesis for the general SAR model is that the Ys are mutually independent.

The autoregressif process can be found in either the residuals of the data or in the response value. Depending on where this process is thought to be,three models are available :

The Error model
The Lag model
The Mix model

A specific diagnostic can be made to chose between either one.

3.1.2.1. Lagrange Multiplier diagnostic

Theory

The idea for this test is to statistically check which of the SAR model would best fit to our equation.

The Lagrange multiplier diagnostic tests the fit of the model under the null hypothesis and looks for evidence of departures in the direction of interest. In summary, it will examine the residuals of a model and look for specific types of non-randomness.

Application in R

The command lm.LMtests() allows to get the lagrange multiplier for a set model.

lmO=lm(div35$rich_ois~div35$rich_pp)
lm.LMtests(lmO,pond8.stan, test="all")


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = div35$rich_ois ~ div35$rich_pp)
weights: pond8.stan

LMerr = 1.7447, df = 1, p-value = 0.1865


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = div35$rich_ois ~ div35$rich_pp)
weights: pond8.stan

LMlag = 4.1598, df = 1, p-value = 0.04139


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = div35$rich_ois ~ div35$rich_pp)
weights: pond8.stan

RLMerr = 3.7287, df = 1, p-value = 0.05349


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = div35$rich_ois ~ div35$rich_pp)
weights: pond8.stan

RLMlag = 6.1438, df = 1, p-value = 0.01319


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = div35$rich_ois ~ div35$rich_pp)
weights: pond8.stan

SARMA = 7.8884, df = 2, p-value = 0.01937

The arguments are :

The dataset (here lm0)
The weight matrix (here pond8.stan)
test = the type of test that will be made (here, “all” means that the fives tests will be conducted : LMerr, LMlag, RLMerr, RLMlag, SARMA)

The firsts p-values to look at are for the normal tests. The significant one will be the analysis conducted later. If non of them are significant, we look at the robust tests.

3.1.2.2. Lag models

Theory

In this model, we assume that the autoregressif process is in the response variable. As this model is general, this means that anything affecting our Y, will also affect its neighbor in some way, affecting us in return. This creates a feedback effect.

It would for exemple be liked a stone thrown in a lake. This creates waves that propagate to the whole lake.

The lag model is used to take into account a autoregressif process only occuring in the response variable. Therefore, the responsable variable argument in the model will include a term for the spatial autocorrelation : ρW

The model will take the form of :

Y =\(\rho\)Wx +X\(\beta\) + \(\epsilon\)

ρ is the autoregression parameter W is the spatial weights matrix

Application in R

lagsarlm(rich_ois~rich_pp, div35, pond8.stan)


Call:
lagsarlm(formula = rich_ois ~ rich_pp, data = div35, listw = pond8.stan)
Type: lag 

Coefficients:
        rho (Intercept)     rich_pp 
  0.3733486   8.2046960   0.5261261 

Log likelihood: -191.0906

Using the values obtained here, the final model would be:

\(Bird\:Richness = 0.37\:WX +0.53\:X + \epsilon\)

3.1.2.3. Error models

Theory

When using the error model, we assume that the autoregressif process is only in the residuals.

When dealing with such model, an interpretation may be that some spatially correlated variable are missing in the dataset.

this model takes the form of :

Y = X\(\beta\) + \(\lambda\)WY + \(\epsilon\)

The term \(\lambda\)W represent the spatial structure with \(\lambda\) being the spatial autoregression coefiscient. \(\gamma\) is the spatially dependant error term.

Application in R

errorsarlm(rich_ois~rich_pp, data = div35, pond8.stan)


Call:
errorsarlm(formula = rich_ois ~ rich_pp, data = div35, listw = pond8.stan)
Type: error 

Coefficients:
     lambda (Intercept)     rich_pp 
  0.3331847  21.0332219   0.5184891 

Log likelihood: -191.8431

Using the values obtained here, the final model would be:

\(Bird\:Richness =0.52\:X + 0.33\:WY + \epsilon\)

3.1.2.4. Mixed models

Theory

If the spatial autocorrelation affect both the response variable and the residuals, the mix model is used. This model include all the parameter added above, and it is the most inclusive model.

It can be noted that this model can also be used in both situation described previously, as the coefficient will cancel themselves out if either one is not present.

Y = \(\rho\)WX +X\(\beta\) + \(\lambda\)WY + \(\epsilon\)

Application in R

lmsarD=spatialreg::sacsarlm(rich_ois~rich_pp,data=div35,pond8.stan)
summary(lmsarD)


Call:spatialreg::sacsarlm(formula = rich_ois ~ rich_pp, data = div35, 
    listw = pond8.stan)

Residuals:
      Min        1Q    Median        3Q       Max 
-15.48701  -1.58517   0.52216   2.42819   6.27633 

Type: sac 
Coefficients: (asymptotic standard errors) 
            Estimate Std. Error z value  Pr(>|z|)
(Intercept) -1.07853    6.34383 -0.1700 0.8650009
rich_pp      0.53751    0.16269  3.3038 0.0009539

Rho: 0.63823
Asymptotic standard error: 0.20901
    z-value: 3.0535, p-value: 0.0022618
Lambda: -0.65643
Asymptotic standard error: 0.50258
    z-value: -1.3061, p-value: 0.19151

LR test value: 4.7419, p-value: 0.093391

Log likelihood: -190.4117 for sac model
ML residual variance (sigma squared): 14.311, (sigma: 3.783)
Number of observations: 68 
Number of parameters estimated: 5 
AIC: NA (not available for weighted model), (AIC for lm: 391.57)

From here, we can replace in the formula :

\(Bird\:Richness = 0.64WX + 0.53X - 0.65\)

For exemple purpose, the lambda was implemented in the model. However it was not necessary here as the associated value was not significant

Even though three commands are available for each type of SAR model, it is best to only use the mixed model function sacsarlm, as the other ones don’t provide p-values, and this one provides more information than the others.

(If the image doesn’t load, it is available in the Spatial autocorrelation_ modelisation doc)

3.2. - Non stationary process

Graphically weighted models (GWM)

Theory

In ecology, a lot of processes are non stationary. This model is one of the few that allow to deal with non-stationarity. It is a local modeling.

The model is fitted to be locally linear :

\[Y_i = \beta_{0i}\:+\:\sum_k\beta_{ki}x_{ki}\:+\epsilon_i\]

The \(\beta\) here now correspond to the coefficient at the location i.

It applies a geographical weight to the data used in each model, meaning that the further away a point will be from the regression point, the less weight they will have. This model require the use of a function called the Kernel function. It consists in an equation giving the weights \(w_ij\) relating the focal point \(i\) to each observation \(j\). This means it places a “window” over the observation.

To define the “window” or bandwidth, a local regression is made with a set bandwidth and then used to predict the values of the dependant variables. The residuals obtained are compared to other residuals from other regressions made with different bandwidth and the smallest local residuals are kept.

Once the bandwidth is defined, the regression is made for each points and the result can be mapped.

Application in R

The function gwr.est will define the optimal bandwidth.

lmgwrD=gwr.est(rich_ois~rich_pp,data=div35,locs=xy)

[1] "Bandwidth:  79992  RMSPE : 4.216"
[1] "Bandwidth:  60260  RMSPE : 4.175"
[1] "Bandwidth:  45461  RMSPE : 4.126"
[1] "Bandwidth:  34361  RMSPE : 4.07"
[1] "Bandwidth:  26037  RMSPE : 4.012"
[1] "Bandwidth:  7306  RMSPE : 3.983"
[1] "Bandwidth:  21354  RMSPE : 3.972"
[1] "Bandwidth:  17842  RMSPE : 3.943"
[1] "Bandwidth:  9940  RMSPE : 3.927"
[1] "Bandwidth:  15867  RMSPE : 3.928"
[1] "Bandwidth:  14385  RMSPE : 3.919"
[1] "Bandwidth:  11052  RMSPE : 3.918"
[1] "Bandwidth:  13552  RMSPE : 3.916"
[1] "Bandwidth:  11677  RMSPE : 3.915"
[1] "Bandwidth:  13083  RMSPE : 3.915"
[1] "Bandwidth:  12028  RMSPE : 3.915"
[1] "Bandwidth:  12819  RMSPE : 3.915"
[1] "Bandwidth:  12226  RMSPE : 3.915"
[1] "Bandwidth:  12671  RMSPE : 3.915"
[1] "Bandwidth:  12337  RMSPE : 3.914"
[1] "Bandwidth:  12588  RMSPE : 3.914"
[1] "Bandwidth:  12525  RMSPE : 3.914"
[1] "Bandwidth:  12384  RMSPE : 3.914"
[1] "Bandwidth:  12490  RMSPE : 3.914"
[1] "Bandwidth:  12411  RMSPE : 3.914"
[1] "Bandwidth:  12470  RMSPE : 3.914"
[1] "Bandwidth:  12425  RMSPE : 3.914"
[1] "Bandwidth:  12437  RMSPE : 3.914"
[1] "Bandwidth:  12462  RMSPE : 3.914"
[1] "Bandwidth:  12443  RMSPE : 3.914"
[1] "Bandwidth:  12457  RMSPE : 3.914"
[1] "Bandwidth:  12453  RMSPE : 3.914"
[1] "Bandwidth:  12445  RMSPE : 3.914"
[1] "Bandwidth:  12447  RMSPE : 3.914"
[1] "Bandwidth:  12452  RMSPE : 3.914"
[1] "Bandwidth:  12449  RMSPE : 3.914"
[1] "Bandwidth:  12451  RMSPE : 3.914"
[1] "Bandwidth:  12450  RMSPE : 3.914"
[1] "Bandwidth:  12449  RMSPE : 3.914"
[1] "Bandwidth:  12450  RMSPE : 3.914"
[1] "Bandwidth:  12449  RMSPE : 3.914"
[1] "Bandwidth:  12450  RMSPE : 3.914"
[1] "Bandwidth:  12449  RMSPE : 3.914"
[1] "Bandwidth:  12450  RMSPE : 3.914"
[1] "Bandwidth:  12450  RMSPE : 3.914"
[1] "Bandwidth:  12450  RMSPE : 3.914"
[1] "Bandwidth:  12450  RMSPE : 3.914"

summary(lmgwrD)

        Length Class  Mode   
phi       1    -none- numeric
RMSPE     1    -none- numeric
beta    136    -none- numeric
yhat     68    -none- numeric
RMSE      1    -none- numeric
rsquare   1    -none- numeric

#The value for each parameters are obtained by directly fetching the parameters from the list :
lmgwrD$phi #gives the bandwidth

[1] 12449.71

lmgwrD$RMSPE #gives the root-square of the bandwidth

[1] 3.914431

Phi: kernel bandwidth

RMSPE: root square of error of the prediction based on bandwidth estimated

Beta: matrix with the coefficients for each point (intercept line [1,] and regression coefficient line [2,])

Yhat: model estimate

RMSE: root square of error estimation

Rsquare: model fitting

plot(xy)
s.value(xy,lmgwrD$beta[2,],add.plot=T)

The result can finally be plotted for each point.

References

F. Dormann, Carsten, Jana M. McPherson, Miguel B. Araújo, Roger Bivand, Janine Bolliger, Gudrun Carl, Richard G. Davies, et al. « Methods to Account for Spatial Autocorrelation in the Analysis of Species Distributional Data: A Review ». Ecography 30, nᵒ 5 (2007): 609‑28. https://doi.org/10.1111/j.2007.0906-7590.05171.x.

Smith, P. A. (1994). Autocorrelation in Logistic Regression Modelling of Species’ Distributions. Global Ecology and Biogeography Letters, 4(2), 47–. doi:10.2307/2997753

Getis, Arthur. «Spatial Autocorrelation». In Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications, édité par Manfred M. Fischer et Arthur Getis, 255‑78. Berlin, Heidelberg: Springer, 2010. https://doi.org/10.1007/978-3-642-03647-7_14.

https://www.youtube.com/watch?v =b3HtV2Mhmvk

Engle, Robert F. « A general approach to lagrange multiplier model diagnostics ». Journal of Econometrics 20, nᵒ 1 (1 octobre 1982): 83‑104. https://doi.org/10.1016/0304-4076(82)90104-X.

Class and powerpoint made by Mrs. Le Gouar Pascaline

GWR informations : https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/how-geographicallyweightedregression-works.htm

Páez, A., et D. C. Wheeler. « Geographically Weighted Regression ». In International Encyclopedia of Human Geography, édité par Rob Kitchin et Nigel Thrift, 407‑14. Oxford: Elsevier, 2009. https://doi.org/10.1016/B978-008044910-4.00447-8.

https://gdsl-ul.github.io/san/09-gwr.html

Anselin, Luc (2013). Spatial econometrics: methods and models. Vol. 4. Springer Science & Business Media.

Anselin, Luc and Daniel A Griffith (1988). « Do spatial effects really matter in regression analysis? » Papers in Regional Science 65.1, pp. 11–34.

Bivand, Roger S, Edzer Pebesma, and Virgilio Gomez-Rubio (2013b). « Spatial Neighbors ». Applied Spatial Data Analysis with R. Springer, pp. 83–125.

Toussaint (2014). « The sphere of influence graph: Theory and applications ». International Journal of Information Thechnology and Computer Science 14.2.

Smith, Tony E. (2016). Notebook on Spatial Data Analysis. http://www.seas.upenn.edu/ ese502/notebo

Tiefelsdorf, Michael (1998). « Modelling spatial processes: The identification and analysis of spatial relationships in regression residuals by means of Moran’s I (Germany) ». PhD thesis. Université Wilfrid Laurier.