Application to movement ecology
ENSAI - CREST
https://marieetienne.github.io/MAF/
December 9, 2024
Movement is informative about internal states and habitat preferences
A continuous process sampled at some discrete, potentially irregular times.
Time series with values in \(\mathbb{R}^2\) (on Earth…).
\[ \begin{array}{|c|c|c|c|} \hline \text{Time} & \text{Location} & \text{Turning Angle} & \text{Speed} \\ \hline t_{0} & (x_0, y_0) & \text{NA} & \text{NA} \\ t_{1} & (x_1, y_1) & \text{NA} & sp_1 \\ t_{2} & (x_2, y_2) & ang_2 & sp_2 \\ \vdots & \vdots & \vdots & \vdots \\ t_{n} & (x_n, y_n) & ang_n & sp_n \\ \hline \end{array} \]
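Speed and turning angle are derived from successive locations. A minimal Python sketch (numpy only, with made-up coordinates) of how these columns can be computed:

```python
import numpy as np

# Illustrative trajectory: times and planar coordinates (not real data)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x = np.array([0.0, 1.0, 2.5, 2.7, 2.0])
y = np.array([0.0, 0.2, 0.1, 1.5, 2.3])

dx, dy, dt = np.diff(x), np.diff(y), np.diff(t)
speed = np.hypot(dx, dy) / dt              # defined from t_1 onwards (NA at t_0)
heading = np.arctan2(dy, dx)               # absolute direction of each step
# turning angle = change of heading, wrapped to (-pi, pi]; defined from t_2 onwards
turning_angle = np.angle(np.exp(1j * np.diff(heading)))
```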
Rarely in a supervised context
Mostly unsupervised
Often analysed using discrete-time models, typically Hidden Markov Models
\[\left .\begin{array}{c} Y_{1k} \sim f(\theta_{1\ell})\\ Y_{2k} \sim f(\theta_{2\ell}) \end{array} \right\rbrace \mbox{ if } k \mbox{ in region }I_{\ell}=[\tau_{\ell-1}+1,\tau_{\ell}]\]
The best segmentation for a given number of change points \(L\): \[\arg\min_{0< \tau_1 < \ldots < \tau_L} C(\boldsymbol{Y}) \]
Brute-force complexity: \(O(N^L)\),
Dynamic programming algorithm complexity: \(O(N^2)\), or even less with pruning strategies
Limitations in the choice of \(f\).
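For illustration only, a plain \(O(L N^2)\) dynamic-programming sketch in Python, with a least-squares (Gaussian) segment cost; the function names are ours and none of the pruning strategies mentioned above are used.

```python
import numpy as np

def segment_cost(y):
    """cost[i, j]: sum of squared deviations of y[i:j] from its mean (Gaussian cost)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i, j] = (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)
    return cost

def dp_segmentation(y, L):
    """Best segmentation with L change points by dynamic programming."""
    n = len(y)
    cost = segment_cost(y)
    C = np.full((L + 1, n + 1), np.inf)      # C[l, j]: best cost of y[:j] with l change points
    C[0] = cost[0]
    back = np.zeros((L + 1, n + 1), dtype=int)
    for l in range(1, L + 1):
        for j in range(l + 1, n + 1):
            cand = C[l - 1, l:j] + cost[l:j, j]
            back[l, j] = l + np.argmin(cand)
            C[l, j] = np.min(cand)
    taus, j = [], n                          # recover the change points by backtracking
    for l in range(L, 0, -1):
        j = back[l, j]
        taus.append(j)
    return sorted(taus), C[L, n]

y = np.concatenate([np.random.normal(0, 1, 50), np.random.normal(3, 1, 50)])
print(dp_segmentation(y, L=1))               # should find a change point near 50
```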
A parametric model (\(M, \boldsymbol{\theta}\)) produces \({\boldsymbol{Y}}\) and \({\boldsymbol{Z}}\).
The only observed data are \({\boldsymbol{Y}}\) while \({\boldsymbol{Z}}\) are hidden variables.
Questions are: how to estimate the parameters \(\boldsymbol{\theta}\), and how to recover the hidden states \({\boldsymbol{Z}}\)?
Bayes formula is the key:
\[{\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}})={\mathbb{P}}({\boldsymbol{Y}}\vert {\boldsymbol{Z}}){\mathbb{P}}({\boldsymbol{Z}})={\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}) {\mathbb{P}}({\boldsymbol{Y}})\]
Assume that the number of states \(K\) is known
Model parameters \(\boldsymbol{\theta}=(\boldsymbol{\pi}, \boldsymbol{\gamma})\)
We could assume that
Question: If \(K=2\), what are \(\boldsymbol{\pi}\) and \(\boldsymbol{\gamma}\) ?
Couples \(\{(Y_i, Z_i)\}\) are i.i.d.
Label switching: the model is invariant for any permutation of the labels \(\{1, \dots, K\}\) \(\Rightarrow\) the mixture model has [\(K!\) equivalent definitions]{.rouge}.
Distribution of a \(Y_i\): \[{\mathbb{P}}(Y_i)=\sum_{k=1}^K {\mathbb{P}}(Y_i, Z_i=k)= \sum_{k=1}^K \class{bleu}{{\mathbb{P}}(Z_i=k)} \class{orange}{{\mathbb{P}}(Y_i | Z_i=k)}\]
Distribution of \({\boldsymbol{Y}}\):
\[\begin{align} {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) & = \prod_{i=1}^n \sum_{k=1}^K P(Y_i , Z_i=k;\boldsymbol{\theta}) \\ &= \prod_{i=1}^n \sum_{k=1}^K \class{bleu}{{\mathbb{P}}(Z_i=k; \boldsymbol{\pi})} \class{orange}{{\mathbb{P}}(Y_i | Z_i=k; \boldsymbol{\gamma})} \\ &= \prod_{i=1}^n \sum_{k=1}^K \class{bleu}{\pi_k} \class{orange}{f_{\gamma_k}(Y_i)} \end{align}\]
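To make the formula concrete, a short numpy/scipy sketch evaluating this independent-mixture log-likelihood with Gaussian emissions (all parameter values below are illustrative):

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(y, pi, mu, sigma):
    # dens[i, k] = pi_k * f_{gamma_k}(y_i); sum over k, then take logs and sum over i
    dens = pi * norm.pdf(np.asarray(y, dtype=float)[:, None], loc=mu, scale=sigma)
    return np.sum(np.log(dens.sum(axis=1)))

y = np.array([1.8, -0.3, 2.4, 0.1])
print(mixture_loglik(y, pi=np.array([0.5, 0.5]),
                     mu=np.array([2.0, 0.0]), sigma=np.array([1.0, 1.5])))
```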
Modelling the dependence in the state sequence: if an animal is feeding at time \(i\), it is more likely to be feeding at time \(i+1\) than if it was travelling at time \(i\). \[{\mathbb{P}}(Z_{i+1}=1 \vert Z_{i}=1) \ne {\mathbb{P}}(Z_{i+1}=1 \vert Z_{i}=2)\]
Markov Chain definition
\({\boldsymbol{Z}}\) is a Markov chain if \[P(Z_{i+1} \vert Z_{1:i}) = P(Z_{i+1} \vert Z_{i})\]
where \(Z_{1:i}\) stands for \(Z_{1}, \ldots, Z_i\).
\({\boldsymbol{Z}}\) is completely defined by the initial distribution \(\boldsymbol{\nu}\), \(\nu_k={\mathbb{P}}(Z_1=k)\) and the transition matrix \(\boldsymbol{\Pi}\)
Hidden States \({\boldsymbol{Z}}\) model: \({\boldsymbol{Z}}\) is assumed to follow a Markov Chain model with unknown initial distribution \(\boldsymbol{\nu}\) and transition matrix \(\boldsymbol{\Pi}\).
Observations \({\boldsymbol{Y}}\) model: The \(Y_i\)'s are assumed to be independent conditionally on \({\boldsymbol{Z}}\): \((Y_i\vert Z_i = k) \overset{ind.}{\sim} f_{\gamma_k}(\cdot).\)
Model parameters are \(\boldsymbol{\theta}=(\boldsymbol{\nu}, \boldsymbol{\Pi}, \boldsymbol{\gamma})\)
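A minimal simulation sketch of this generative model in Python (numpy only; the values of \(\boldsymbol{\nu}\), \(\boldsymbol{\Pi}\) and the Gaussian emission parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

nu = np.array([0.5, 0.5])                    # initial distribution
Pi = np.array([[0.9, 0.1],
               [0.2, 0.8]])                  # transition matrix
mu, sigma = np.array([2.0, 0.0]), np.array([1.0, 1.5])   # emission parameters gamma_k

N = 200
Z = np.zeros(N, dtype=int)
Z[0] = rng.choice(2, p=nu)
for i in range(1, N):
    Z[i] = rng.choice(2, p=Pi[Z[i - 1]])     # Markov dependence on the previous state
Y = rng.normal(mu[Z], sigma[Z])              # Y_i | Z_i = k ~ N(mu_k, sigma_k^2)
```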
Define all the key elements of a hidden Markov model with
Goal:
\[(\widehat{\boldsymbol{\gamma}},\widehat{\boldsymbol{\Pi}}, \widehat{\boldsymbol{\nu}}) = \arg\max_{\boldsymbol{\gamma}, \boldsymbol{\Pi}, {\boldsymbol{\nu}}} \log P({\boldsymbol{Y}}; \boldsymbol{\gamma}, \boldsymbol{\Pi}, \boldsymbol{\nu})\]
\[\log {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) = \log \sum_{(k_1, \ldots, k_N) \in \{1,\ldots,K\}^N }{\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}}= (k_1, \ldots,k_N); \boldsymbol{\theta})\]
No analytical estimators.
It is not always possible to compute, since this sum involves \(K^N\) terms: for \(K=2\) and \(N=100\), \(2^{100}\approx10^{30}\).
A brute-force algorithm is not the way to go.
A convenient notation: \(Z_{ik}=1\) if \(Z_i=k\), and \(Z_{ik}=0\) otherwise.
\[\begin{align} \log {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}}; \boldsymbol{\theta}) & = \class{bleu}{\sum_k Z_{1k} \log \nu_k }\\ &\class{bleu}{+ \sum_{i > 1} \sum_{k, \ell} Z_{i-1,k}Z_{i,\ell} \log \pi_{k\ell}} \\ & + \class{orange}{\sum_i \sum_k Z_{ik} \log f(y_i ; \gamma_k)} \end{align}\]
Idea: replace \(Z_{ik}\) by its best guess
\[\begin{align} {\mathbb{E}}\left( \log {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}}; \boldsymbol{\theta})\vert Y_{1:N}\right) & = \class{bleufonce}{\sum_k{\mathbb{E}}\left( Z_{1k} \vert Y_{1:N}\right) \log \nu_k }\\ & \class{bleufonce}{+ \sum_{i > 1} \sum_{k, \ell} {\mathbb{E}}\left( Z_{i-1,k}Z_{i,\ell}\vert Y_{1:N}\right) \log \pi_{k\ell}} \\ & + \class{rouge}{\sum_i \sum_k {\mathbb{E}}\left(Z_{ik}\vert Y_{1:N}\right) \log f(y_i ; \gamma_k)} \end{align}\]
Expectation Maximization (EM) algorithm
Bayes Formula
\[\begin{align} {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta}) & = {\mathbb{P}}({\boldsymbol{Y}}\vert {\boldsymbol{Z}}; \boldsymbol{\theta}) P({\boldsymbol{Z}}; \boldsymbol{\theta}),\\ & = {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta}) {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}). \end{align}\]
Therefore, \[\begin{align} \log {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) & = \log \left \lbrace {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta}) / {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta}) \right\rbrace\\ & = \log {\mathbb{P}}({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta}) - \log {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta}) \\ \end{align}\]
For a given \(\boldsymbol{\theta}_0\), we may compute \({\mathbb{P}}_{\boldsymbol{\theta}_0}=P({\boldsymbol{Z}}\vert \boldsymbol{\theta}_0, {\boldsymbol{Y}})\) and \[\begin{align} \log {\mathbb{P}}({\boldsymbol{Y}}; \boldsymbol{\theta}) &= \class{orange}{{\mathbb{E}}_{\boldsymbol{\theta}_0}(\log P({\boldsymbol{Y}}, {\boldsymbol{Z}};\boldsymbol{\theta})\vert {\boldsymbol{Y}})} - {\mathbb{E}}_{\boldsymbol{\theta}_0}(\log {\mathbb{P}}({\boldsymbol{Z}}\vert {\boldsymbol{Y}}; \boldsymbol{\theta})\vert {\boldsymbol{Y}})\\ & = \class{orange}{Q(\boldsymbol{\theta}, \boldsymbol{\theta}_0)} - H(\boldsymbol{\theta}, \boldsymbol{\theta}_0) \end{align}\]
Let’s compute it together
\(K=2\), \(\pi_{11}=0.9\), \(\pi_{22}=0.8\)
\[Y_i \vert Z_i = 1 \sim\mathcal{N}(2, 1)\]
\[Y_i \vert Z_i = 2 \sim\mathcal{N}(0, 1.5)\]
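To see what the estimation goal looks like in practice, one can simulate from these parameters (reading 1 and 1.5 as standard deviations) and fit a 2-state Gaussian HMM by EM. The sketch below relies on the third-party hmmlearn package, not on the course's own implementation.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM        # third-party package, assumed installed

rng = np.random.default_rng(1)
Pi = np.array([[0.9, 0.1],
               [0.2, 0.8]])
mu, sigma = np.array([2.0, 0.0]), np.array([1.0, 1.5])

N = 500
Z = np.zeros(N, dtype=int)                   # start in state 1 for simplicity
for i in range(1, N):
    Z[i] = rng.choice(2, p=Pi[Z[i - 1]])
Y = rng.normal(mu[Z], sigma[Z])

model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=200)
model.fit(Y.reshape(-1, 1))                  # EM estimation
print(model.transmat_)                       # estimated transition matrix (up to label switching)
print(model.means_.ravel())                  # estimated state means
Z_hat = model.predict(Y.reshape(-1, 1))      # most likely state sequence (Viterbi)
```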
Initialisation of \(\boldsymbol{\theta}^{(0)}=(\Pi, \gamma_1, ..., \gamma_K)^{(0)}\).
While convergence is not reached:
[E-step] Calculation of
\[\begin{align} \tau^{(\ell)}_{i}(k)&={\mathbb{E}}[Z_{ik}| {\boldsymbol{Y}}, \boldsymbol{\theta}^{(\ell-1)}] = {\mathbb{P}}(Z_i = k | {\boldsymbol{Y}}, \boldsymbol{\theta}^{(\ell-1)})\\ \xi^{(\ell)}_{i}(k,h) &= {\mathbb{E}}[Z_{i-1,k}Z_{ih}| {\boldsymbol{Y}}, \boldsymbol{\theta}^{(\ell-1)}]\\ \end{align}\]
Smart algorithm: the Forward-Backward algorithm
[M-step] Maximization in \(\boldsymbol{\theta}=(\boldsymbol{\nu}, \boldsymbol{\Pi}, \boldsymbol{\gamma})\) of
\[\sum_k \tau^{(\ell)}_{1}(k)\log \nu_k + \sum_{i > 1} \sum_{k, h} \xi^{(\ell)}_{i}(k,h) \log \pi_{kh}+ \sum_i \sum_k \tau^{(\ell)}_{i}(k) \log f(y_i; \gamma_k)\]
Transition estimation \[\hat{\Pi}_{kh} = \frac{\sum_{i=2}^{N}{{\mathbb{E}}[Z_{i-1,k}Z_{ih}\vert Y_{1:N}]}}{\sum_{i=2}^{N}{{\mathbb{E}}[Z_{i-1,k}\vert Y_{1:N}]}}= \frac{\sum_{i=2}^{N}\xi^{(\ell)}_{i}(k,h)}{\sum_{i=2}^{N}\tau^{(\ell)}_{i-1}(k)}\]
If Gaussian emissions: \[\begin{align} \hat{\mu}_k & = \frac{\sum_{i=1}^N{\mathbb{E}}[Z_{i,k}\vert Y_{1:N}] Y_i}{\sum_{i=1}^{N}{{\mathbb{E}}[Z_{i,k}\vert Y_{1:N}]}} = \frac{\sum_{i=1}^N \tau^{(\ell)}_{i}(k) Y_i}{\sum_{i=1}^{N}{\tau^{(\ell)}_{i}(k)}}\\ \hat{\sigma}_k^2 & = \frac{\sum_{i=1}^N \tau^{(\ell)}_{i}(k) (Y_i-\hat{\mu}_k)^2 }{\sum_{i=1}^{N}{\tau^{(\ell)}_{i}(k)}} \end{align}\]
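A compact numpy sketch of these M-step updates, assuming the E-step quantities are stored as arrays tau of shape \((N, K)\) and xi of shape \((N-1, K, K)\); the function name is ours.

```python
import numpy as np

def m_step(y, tau, xi):
    """M-step for Gaussian emissions, given tau[i, k] = E[Z_ik | Y]
    and xi[i, k, h] = E[Z_{i-1,k} Z_{ih} | Y]."""
    y = np.asarray(y, dtype=float)
    nu = tau[0]                                        # initial distribution
    Pi = xi.sum(axis=0)
    Pi /= Pi.sum(axis=1, keepdims=True)                # hat Pi_{kh}
    w = tau.sum(axis=0)
    mu = tau.T @ y / w                                 # hat mu_k (weighted means)
    sigma2 = (tau * (y[:, None] - mu) ** 2).sum(axis=0) / w   # hat sigma_k^2
    return nu, Pi, mu, sigma2
```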
Problem: Computing \(\tau^{(\ell)}_{i}(k)= {\mathbb{E}}[Z_{i,k}\vert Y_{1:N}]\) and \(\xi^{(\ell)}_{i}(k,h)={\mathbb{E}}[Z_{i-1,k}Z_{ih}\vert Y_{1:N}]\)
A recursion to compute \(\tau^{(\ell)}_{i}(k)\) and \(\xi^{(\ell)}_{i}(k,h)={\mathbb{E}}[Z_{i-1,k}Z_{ih}\vert Y_{1:N}]\)
\[{\mathbb{E}}(Z_{ik}\vert {\boldsymbol{Y}}) \quad \text{and} \quad {\mathbb{E}}(Z_{ik}Z_{j\ell}\vert {\boldsymbol{Y}}).\]
Forward \[\alpha_n(i)={\mathbb{P}}_{\boldsymbol{\theta}}\left( Y_{0:n}, Z_n=i\right)\]
Initialisation \(\alpha_0(i)=\nu(i)f_{\gamma_i}(y_0)\)
for \(n=0,\ldots, N-1\) \[\alpha_{n+1}(i)= \left(\sum_{k=1}^K \alpha_n(k) \Pi_{ki}\right) f_{\gamma_i}(y_{n+1})\]
Termination \[{\mathbb{P}}(Y_{0:N})=\sum_{k=1}^K\alpha_N(k)\]
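A direct numpy/scipy transcription of this forward recursion for Gaussian emissions (unscaled, so it underflows on long series; scaled or log-space versions are used in practice; the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def forward(y, nu, Pi, mu, sigma):
    y = np.asarray(y, dtype=float)
    N, K = len(y), len(nu)
    emis = norm.pdf(y[:, None], loc=mu, scale=sigma)   # emis[n, k] = f_{gamma_k}(y_n)
    alpha = np.zeros((N, K))
    alpha[0] = nu * emis[0]                            # initialisation
    for n in range(N - 1):
        alpha[n + 1] = (alpha[n] @ Pi) * emis[n + 1]   # recursion
    return alpha, alpha[-1].sum()                      # termination: P(Y_{0:N})
```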
Backward \[\beta_n(i)={\mathbb{P}}_{\boldsymbol{\theta}}\left( Y_{n+1:N}\vert Z_n=i\right)\]
Initialisation \(\beta_N(i)=1\) for all \(i\)
for \(n=N-1,\ldots, 0\) \[\beta_{n}(i)= \sum_{k=1}^K \Pi_{ik}\, f_{\gamma_k}(y_{n+1})\,\beta_{n+1}(k) \]
Termination \[{\mathbb{P}}(Y_{0:N})=\sum_{k=1}^K \nu(k)\, f_{\gamma_k}(y_0)\, \beta_0(k)\]
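The corresponding backward recursion, combined with the forward pass to obtain the E-step quantities \(\tau_i(k)\) and \(\xi_i(k,h)\); this sketch reuses the forward function from the previous block.

```python
import numpy as np
from scipy.stats import norm

def backward(y, nu, Pi, mu, sigma):
    y = np.asarray(y, dtype=float)
    N, K = len(y), len(nu)
    emis = norm.pdf(y[:, None], loc=mu, scale=sigma)
    beta = np.ones((N, K))                             # initialisation: beta_N(i) = 1
    for n in range(N - 2, -1, -1):
        beta[n] = Pi @ (emis[n + 1] * beta[n + 1])     # recursion
    return beta

def posteriors(y, nu, Pi, mu, sigma):
    y = np.asarray(y, dtype=float)
    alpha, lik = forward(y, nu, Pi, mu, sigma)         # forward pass (previous sketch)
    beta = backward(y, nu, Pi, mu, sigma)
    emis = norm.pdf(y[:, None], loc=mu, scale=sigma)
    tau = alpha * beta / lik                           # tau_i(k) = E[Z_ik | Y]
    xi = (alpha[:-1, :, None] * Pi[None, :, :]
          * (emis[1:] * beta[1:])[:, None, :] / lik)   # xi_i(k, h) = E[Z_{i-1,k} Z_ih | Y]
    return tau, xi
```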