class: center, middle, inverse, title-slide # The hidden part of Markovian stochastic processes for biology and ecology ### Marie-Pierre Etienne ### HDR Defense ### October 2021 --- name: intro <!-- F1D763 --> <!-- F7A913 --> <!-- C94326 --> <!-- 1F908E --> <!-- 33658A --> <!-- # Why do I enjoy research in statistics ? --> <!-- -- --> <!-- <div class= "addspace"> --> <!-- <li> Working with researchers from different background,</li> --> <!-- </div> --> <!-- <div class= "addspace"> --> <!-- <li> Never left the school system, </li> --> <!-- </div> --> <!-- <div class= "addspace"> --> <!-- <li> Continnuously learning new concepts, methods, tools </li> --> <!-- </div> --> <!-- <div class= "addspace"> --> <!-- <li> being confortable with my inside geek part. </li> --> <!-- </div> --> # How biology feeds statistics ? -- <figure> <img src="overview1.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- template: intro count: false <figure> <img src="overview2.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- template: intro count: false <figure> <img src="overview3.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- template: intro count: false <figure> <img src="overview3_app.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- template: intro count: false <figure> <img src="overview4.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- template: intro count: false <figure> <img src="overview4_app.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- template: intro count: false <figure> <img src="overview5.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- template: intro count: false <figure> <img src="overview5_app.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- # A hierarchical spatial model to map 
the fish density <img class="logopos_right" src="model.png" style="width:6%"> <img class="logopos_left" src="fish_model.png" style="height:7%"> <figure> <img src="overview3_app.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- # A hierarchical spatial model to map the fish density <img class="logopos_right" src="model.png" style="width:6%"> <img class="logopos_left" src="fish_model.png" style="height:7%"> ## Marine ecology * France commitment for 2022 : 30% of territorial waters classified as Marine Protected Area (MPA) * Localization also matters * Mapping the fish density to identify critical areas -- ### 2 data sources .pull-left[ **Scientific data**: a yearly survey according to a standardized sampling plan. <figure> <img src="map_sci.png" alt="Orhago campaign and commercial data" style="width:60%" class = "centerimg" /> .legend[Common sole - ORHAGO survey (kg; 2018)] </figure> ] -- .pull-right[ **Commercial catch**: Catch geolocalized thanks to Vessel Monitoring System (VMS). Preferential sampling. <figure> <img src="map_com.png" alt="Orhago campaign and commercial data" style="width:60%" class = "centerimg" /> .legend[Common sole - Otter trawls targeting demersal species(kg; 2018)] </figure> ] -- .question[How to integrate commercial data to improve the estimation of spatial fish density?] 
---

# A hierarchical spatial model to map the fish density
<img class="logopos_right" src="model.png" style="width:6%">
<img class="logopos_left" src="fish_model.png" style="height:7%">

## A model should account for the specificities of catch data

.left-column[
<figure>
<img src="model_baptiste_1.png" alt="DAG for Baptiste's model" style="width:100%" class = "centerimg" />
</figure>
]

---
count: false

# A hierarchical spatial model to map the fish density
<img class="logopos_right" src="model.png" style="width:6%">
<img class="logopos_left" src="fish_model.png" style="height:7%">

## A model should account for the specificities of catch data

.left-column[
<figure>
<img src="model_baptiste_2.png" alt="DAG for Baptiste's model" style="width:100%" class = "centerimg" />
</figure>
]

.right-column[
* Observation process: zero-inflated data (*Sophie Ancelet's PhD*) <a name=cite-aitchison1955distribution></a><a name=cite-ancelet2010modelling></a><a name=cite-foster2013poisson></a>([Aitchison, 1955](#bib-aitchison1955distribution); [Ancelet, Etienne, Benoît, et al., 2010](#bib-ancelet2010modelling); [Foster and Bravington, 2013](#bib-foster2013poisson)),
]

---
count: false

# A hierarchical spatial model to map the fish density
<img class="logopos_right" src="model.png" style="width:6%">
<img class="logopos_left" src="fish_model.png" style="height:7%">

## A model should account for the specificities of catch data

.left-column[
<figure>
<img src="model_baptiste_3.png" alt="DAG for Baptiste's model" style="width:100%" class = "centerimg" />
</figure>
]

.right-column[
* Observation process: zero-inflated data (*Sophie Ancelet's PhD*) ([Aitchison, 1955](#bib-aitchison1955distribution); [Ancelet, Etienne, Benoît, et al., 2010](#bib-ancelet2010modelling); [Foster and Bravington, 2013](#bib-foster2013poisson)),
* Hidden layer: spatially structured (*Jean-Baptiste Lecomte's PhD*) <a name=cite-lecomte2013compound></a>([Lecomte, Benoît, Ancelet, et al.,
2013](#bib-lecomte2013compound)),
]

---
count: false

# A hierarchical spatial model to map the fish density
<img class="logopos_right" src="model.png" style="width:6%">
<img class="logopos_left" src="fish_model.png" style="height:7%">

## A model should account for the specificities of catch data

.left-column[
<figure>
<img src="model_baptiste_4.png" alt="DAG for Baptiste's model" style="width:100%" class = "centerimg" />
</figure>
]

.right-column[
* Observation process: zero-inflated data (*Sophie Ancelet's PhD*) ([Aitchison, 1955](#bib-aitchison1955distribution); [Ancelet, Etienne, Benoît, et al., 2010](#bib-ancelet2010modelling); [Foster and Bravington, 2013](#bib-foster2013poisson)),
* Hidden layer: spatially structured (*Jean-Baptiste Lecomte's PhD*) ([Lecomte, Benoît, Ancelet, et al., 2013](#bib-lecomte2013compound)),
* Integrate commercial data
]

---
count: false

# A hierarchical spatial model to map the fish density
<img class="logopos_right" src="model.png" style="width:6%">
<img class="logopos_left" src="fish_model.png" style="height:7%">

## A model should account for the specificities of catch data

.left-column[
<figure>
<img src="model_baptiste_full.png" alt="DAG for Baptiste's model" style="width:100%" class = "centerimg" />
</figure>
]

.right-column[
* Observation process: zero-inflated data (*Sophie Ancelet's PhD*) ([Aitchison, 1955](#bib-aitchison1955distribution); [Ancelet, Etienne, Benoît, et al., 2010](#bib-ancelet2010modelling); [Foster and Bravington, 2013](#bib-foster2013poisson)),
* Hidden layer: spatially structured (*Jean-Baptiste Lecomte's PhD*) ([Lecomte, Benoît, Ancelet, et al., 2013](#bib-lecomte2013compound)),
* Integrate commercial data, for which the **sampling process** and the **observations** are not independent (*Baptiste Alglave's PhD*)
]

---
name:test

# A hierarchical spatial model to map the fish density
<img class="logopos_right" src="model.png" style="width:6%">
<img class="logopos_left" src="fish_model.png"
style="height:7%">

## Estimation of hierarchical models

* Integration over the hidden variables: a difficult task
* Use of Monte Carlo simulation
* Approximation: Integrated Nested Laplace Approximation <a name=cite-rue2009approximate></a>([Rue, Martino, and Chopin, 2009](#bib-rue2009approximate)) combined with a Gaussian Markov Random Field for a sparse spatial model
* Maximum likelihood estimation is also possible: the maximization relies on automatic differentiation, available through the TMB R package <a name=cite-kristensen2016automatic></a>([Kristensen, Nielsen, Berg, et al., 2016](#bib-kristensen2016automatic))

--

## Results

.font80[<a name=cite-alglave2021integrated></a>[Alglave, Vermard, Etienne, et al. (2021)](#bib-alglave2021integrated) in review]

* Ignoring preferential sampling (PS) yields biased estimates
* Commercial data contain valuable information

---
template:test

<figure>
<img src="sole_spatial_results.png" alt="Sole's results" style="width:75%" class = "centerimg" />
.legend[Reconstruction of the relative spatial density of sole in 2018 in the Bay of Biscay during the Orhago scientific campaign]
</figure>

---

# A hierarchical spatial model to map the fish density
<img class="logopos_right" src="model.png" style="width:6%">
<img class="logopos_left" src="fish_model.png" style="height:7%">

## Around this model (as part of Baptiste Alglave's PhD project)

* Adaptation to presence/absence data (Florian Quemper's Master's project): a bit disappointing
* Spatio-temporal modelling (in progress)
* Account for aggregated data (initiated)

.pull-left[
<figure>
<img src="aggregation.png" alt="Sole's results" style="width:100%" class = "centerimg" />
.legend[Illustration of the reallocation process.
Catch declaration = 50]
</figure>
]

.pull-right[
Most commercial vessels report one catch per day and administrative unit

* Catches are then "reallocated"
* Change the commercial observation process
`$$Y^{C}_s = \sum_{l \in \mathcal{F}_i} Y^{C}_{sl}$$`
]

---

# A statistical test to identify atypical genome region
<img class="logopos_right" src="pvalue.png" style="width:6%">
<img class="logopos_left" src="dna.png" style="height:7%">

<figure>
<img src="overview4_app.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
</figure>

---

# A statistical test to identify atypical genome region
<img class="logopos_right" src="pvalue.png" style="width:6%">
<img class="logopos_left" src="dna.png" style="height:7%">

## PhD thesis: a scoring scheme to measure atypicity within a sequence

see <a name=cite-Etienne03></a><a name=cite-Etienne04></a>[Daudin, Etienne, and Vallois (2003)](#bib-Etienne03); [Etienne and Vallois (2004)](#bib-Etienne04)

--

## Recurrent alteration in a cohort : CGH profiles

.pull-left[
<img src="recurrent_alt.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
]

--

.pull-right[
For a cohort of size `\(N\)`, a profile of length `\(L\)`,
`$$\mathbb{P} ( \mbox{At least } m \mbox{ patients have an alteration of length } l )$$`

* <a name=cite-robin2009simultaneous></a>[Robin and Stefanov (2009)](#bib-robin2009simultaneous) consider a profile `\((X^{i}_k)_{k=1,\ldots,n}\)` as a two-state discrete time Markov chain; the solution involves a Markov chain on a NL state space
]

---

# A statistical test to identify atypical genome region
<img class="logopos_right" src="pvalue.png" style="width:6%">
<img class="logopos_left" src="dna.png" style="height:7%">

## Recurrent alteration in a cohort : CGH profiles

.pull-lefts[
* For large `\(L\)` and small `\(N\)`, <a name=cite-robin2015detection></a>[Robin and Stefanov (2015)](#bib-robin2015detection) considers `\(X^{i}\)` a 2 states continuous time Markov process

<figure>
<img
src="ind_profile.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> and define `\(Y^{(N)}_{s}\)` as the number of patients with alteration at position (time) `\(s\)`. `$$\mathbb{P}(\exists t, \forall 0 \leq s < l, Y^{(N)}_{t+s} \geq m )$$` ] -- .pull-rightb[ <figure> <img src="cum_profile1.png" alt="at the beginning is" style="width:100%" class = "centerimg" /> </figure> ] --- count: false # A statistical test to identify atypical genome region <img class="logopos_right" src="pvalue.png" style="width:6%"> <img class="logopos_left" src="dna.png" style="height:7%"> ## Recurrent alteration in a cohort : CGH profiles .pull-lefts[ * For large `\(L\)` and small `\(N\)`, [Robin and Stefanov (2015)](#bib-robin2015detection) considers `\(X^{i}\)` a 2 states continuous time Markov process <figure> <img src="ind_profile.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> and define `\(Y^{(N)}_{s}\)` as the number of patients with alteration at position (time) `\(s\)`. `$$\mathbb{P}(\exists t, \forall 0 \leq s < l, Y^{(N)}_{t+s} \geq m )$$` ] .pull-rightb[ <figure> <img src="cum_profile2.png" alt="at the beginning is" style="width:100%" class = "centerimg" /> </figure> ] --- count: false # A statistical test to identify atypical genome region <img class="logopos_right" src="pvalue.png" style="width:6%"> <img class="logopos_left" src="dna.png" style="height:7%"> ## Recurrent alteration in a cohort : CGH profiles .pull-lefts[ * For large `\(L\)` and small `\(N\)`, [Robin and Stefanov (2015)](#bib-robin2015detection) considers `\(X^{i}\)` a 2 states continuous time Markov process <figure> <img src="ind_profile.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> and define `\(Y^{(N)}_{s}\)` as the number of patients with alteration at position (time) `\(s\)`. 
`$$\mathbb{P}(\exists t, \forall 0 \leq s < l, Y^{(N)}_{t+s} \geq m )$$` ] .pull-rightb[ <figure> <img src="cum_profile3.png" alt="at the beginning is" style="width:100%" class = "centerimg" /> </figure> ] <br> But the computational complexity drastically increases with `\(N\)`. .question[Could we identify some limit behavior when N increases ?] --- count:false # A statistical test to identify atypical genome region <img class="logopos_right" src="pvalue.png" style="width:6%"> <img class="logopos_left" src="dna.png" style="height:7%"> ## Recurrent alteration in a cohort : CGH profiles Let `\(S^{N}= \sum_{i=1}^N \frac{X^{i}-E(X^i)}{\sqrt{N/4}}\)` a normalized version of `\(Y^{(N)}\)`. <figure> <img src="cum_profile4.png" alt="at the beginning is" style="width:80%" class = "centerimg" /> </figure> `$$\mathbb{P}( \exists t, \forall 0 \leq s < l, S^{(N)}_{t+s} \geq m^\prime )= \mathbb{P}( E^{*,m^\prime}_{S^N} > l ),$$` with `\(E^{*,m^\prime}_{S^N}\)` the length of the longest excursion of the cumulative profile `\(S^{N}\)` above `\(m^\prime\)` -- .addspace[$$ $$] .question[ `$$\mathbb{P}( E^{*,m^\prime}_{S^N} > l )\underset{N\to\infty}{\longrightarrow} ??$$`] --- # A statistical test to identify atypical genome region <img class="logopos_right" src="pvalue.png" style="width:6%"> <img class="logopos_left" src="dna.png" style="height:7%"> ## Recurrent alteration in a cohort : CGH profiles <a name=cite-Decreusefond19a></a>[Decreusefond, Etienne, Lang, et al. 
(2021)](#bib-Decreusefond19a) prove

<div class= "theorem">
Let Z be a stationary standard Ornstein Uhlenbeck process,
`$$\left \vert \mathbb{P}( E^{*,m}_{S^N} > l) - \mathbb{P}(E^{*,m}_Z > l)\right\vert \leq \frac{C\ln{N}}{N^{1/8}}.$$`
</div>

Key points of the proof:

--

* Convergence of `\(S^{N}\)` to `\(Z\)`: a Gaussian process with covariance function `\(c(s) = e^{-\tau s}.\)`

--

* Rate of convergence: combines a Brownian representation of the Ornstein Uhlenbeck process with a result from <a name=cite-kubilius1994rate></a>[Kubilius (1994)](#bib-kubilius1994rate) on the rate of convergence of the martingale increments to the Brownian motion.

--

* The quantity of interest is expressed through a continuous functional, so that the convergence is preserved

---

# A statistical test to identify atypical genome region
<img class="logopos_right" src="pvalue.png" style="width:6%">
<img class="logopos_left" src="dna.png" style="height:7%">

## Perspective : Practical implementation

The distribution of the length of the OU excursions is unknown in general

* Conditionally on the first hitting time `\(\sigma_m\)`, express the change of measure from OU to a Wiener process with a quantity which only depends on the signs and the lengths of Brownian excursions.
* Simulation of the signs and lengths of Brownian excursions using the size-biased algorithm proposed by <a name=cite-devroye2010exact></a>[Devroye (2010)](#bib-devroye2010exact).
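
As a point of comparison for this perspective, a naive Monte Carlo estimate of `\(\mathbb{P}(E^{*,m}_Z > l)\)` can be sketched in Python. This is only a grid-based baseline, neither the change of measure nor the algorithm of Devroye: the stationary Ornstein Uhlenbeck process with covariance `\(e^{-\tau s}\)` is simulated exactly on a regular grid, and excursions are measured on that grid, which distorts their lengths. The function names, the grid step `dt` and the observation window `horizon` are assumptions introduced for illustration.

```python
import numpy as np


def simulate_stationary_ou(n_steps, dt, tau, rng):
    """Stationary standard OU path on a regular grid (covariance exp(-tau * s))."""
    rho = np.exp(-tau * dt)  # exact one-step autocorrelation on the grid
    z = np.empty(n_steps + 1)
    z[0] = rng.standard_normal()  # stationary law is N(0, 1)
    noise = rng.standard_normal(n_steps) * np.sqrt(1.0 - rho ** 2)
    for k in range(n_steps):
        z[k + 1] = rho * z[k] + noise[k]
    return z


def longest_excursion_above(path, level, dt):
    """Longest run of consecutive grid points above `level`, times dt."""
    longest = current = 0
    for above in path > level:
        current = current + 1 if above else 0
        longest = max(longest, current)
    return longest * dt


def excursion_tail_probability(level, length, horizon, tau,
                               dt=0.01, n_paths=2000, seed=0):
    """Monte Carlo estimate of P(longest excursion above `level` > `length`)
    for paths observed on [0, horizon] (hypothetical window)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon / dt))
    hits = 0
    for _ in range(n_paths):
        z = simulate_stationary_ou(n_steps, dt, tau, rng)
        hits += longest_excursion_above(z, level, dt) > length
    return hits / n_paths
```

A call such as `excursion_tail_probability(level=1.0, length=0.5, horizon=10.0, tau=1.0)` returns a frequency in `[0, 1]`; refining `dt` reduces the discretization error at the price of longer simulations.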
--- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> -- <figure> <img src="overview5_app.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> </figure> --- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## <a name=cite-nathan2008movement></a>([Nathan, Getz, Revilla, et al., 2008](#bib-nathan2008movement)) presents individual movement as the results of .pull-left[ <figure> <img src="nathan_fig.png" alt="at the beginning is" style="width:100%" class = "centerimg" /> </figure> .legend[ Movement drivers by [Nathan, Getz, Revilla, et al. (2008)](#bib-nathan2008movement) ] ] .pull-right[ * Motion capacities * Internal state * Environment ] -- .question[Movement informs on internal states and habitat preferences] --- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Movement data .pull-left[ <figure> <img src="path_1.png" alt="at the beginning is" style="width:100%" class = "centerimg" /> </figure> ] --- count: false # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Movement data .pull-left[ <figure> <img src="path_2.png" alt="at the beginning is" style="width:100%" class = "centerimg" /> </figure> ] --- count: false # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Movement data .pull-left[ <figure> <img src="path_3.png" alt="at the beginning is" style="width:100%" class = "centerimg" /> </figure> ] -- .pull-right[ A continuous 
process sampled at discrete, possibly irregular, times.

Time series with values in `\(\mathbb{R}^2\)`.

`$$\begin{array}{cccc} \mbox{Time} & \mbox{Location} & \mbox{Turning angle} & \mbox{Speed}\\ t_{0} & (x_0, y_0) & NA & NA\\ t_{1} & (x_1, y_1) & NA & sp_1\\ t_{2} & (x_2, y_2) & ang_2 & sp_2\\ \vdots & \vdots& \vdots& \vdots \\ t_{n} & (x_n, y_n) & ang_n & sp_n \end{array}$$`
]

---

# Efficient estimation methods in movement ecology
<img class="logopos_right" src="compute.png" style="height:6%">
<img class="logopos_left" src="paw.png" style="height:7%">

## Heterogeneity in movement pattern interpreted as different internal states

.pull-left[
<figure>
<img src="traj_seg_booby_black.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
.legend[Peruvian booby data courtesy of Sophie Bertrand]
</figure>
]

---

# Efficient estimation methods in movement ecology
<img class="logopos_right" src="compute.png" style="height:6%">
<img class="logopos_left" src="paw.png" style="height:7%">

## Heterogeneity in movement pattern interpreted as different internal states

.pull-left[
<figure>
<img src="traj_seg_booby.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
.legend[Peruvian booby data courtesy of Sophie Bertrand]
</figure>
]

.pull-right[
## Accounting for internal states

Classically addressed with Hidden Markov Models

### Exploring the change point detection approach.
<a name=cite-lavielle2005using></a><a name=cite-picard2007segmentation></a>([Lavielle, 2005](#bib-lavielle2005using); [Picard, Robin, Lebarbier, et al., 2007](#bib-picard2007segmentation))
]

---

# Efficient estimation methods in movement ecology
<img class="logopos_right" src="compute.png" style="height:6%">
<img class="logopos_left" src="paw.png" style="height:7%">

## Signal processing approach for movement ecology ([Picard, Robin, Lebarbier, et al., 2007](#bib-picard2007segmentation))

.pull-left[
<figure>
<img src="segmentation.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
.legend[Change point detection <a name=cite-patin2019identifying></a>([Patin, Etienne, Lebarbier, et al., 2019](#bib-patin2019identifying))]
</figure>

Let `\(\boldsymbol{\tau}=\{\tau_1,\ldots,\tau_{K-1}\}\)` (with `\(\tau_0=0\)` and `\(\tau_K=n\)`) define a partition of `\(\{1,\ldots, n\}\)` into K segments
`$$\boldsymbol{X}_{i}\overset{i.i.d}{\sim}\mathcal{L}(\theta_k),\quad \forall i \in \{ \tau_{k-1}+1:\tau_k \}$$`

The .care[Dynamic Programming algorithm] explores all possible segmentations efficiently and provides the estimate `\(\boldsymbol{\hat{\tau}}\)`
]

--

.pull-right[
<figure>
<img src="seg_classification.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
.legend[Change point detection and classification ([Patin, Etienne, Lebarbier, et al., 2019](#bib-patin2019identifying))]
</figure>

Let `\(Z_k\)` stand for the class of segment `\(k,\)` `\(\forall i \in \{ \tau_{k-1}+1:\tau_k \}\)`
`$$Z_k \overset{i.i.d}{\sim} \mathcal{M}(\pi), \quad\boldsymbol{X}_{i}\vert Z_k=l \overset{i.i.d}{\sim}\mathcal{L}(\theta_l)$$`

Dynamic Programming coupled with the EM algorithm explores all possible segmentations efficiently and provides the estimate `\(\boldsymbol{\hat{\tau}}\)`
]

--

In ([Patin, Etienne, Lebarbier, et al., 2019](#bib-patin2019identifying)): a direct extension to simultaneous segmentation for detecting home range shifts.
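
The dynamic programming step can be sketched in Python for the simplest case, a change in the mean of a univariate signal with a least-squares cost. This is a minimal illustration under that assumption, not the implementation used in the cited papers; the function names are hypothetical, and change points are returned as 0-based indices `t` such that a new segment starts at `x[t]`.

```python
import numpy as np


def make_segment_cost(x):
    """Return cost(i, j): residual sum of squares of x[i:j] around its mean."""
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return cost


def best_segmentation(x, n_segments):
    """Least-squares segmentation of x into n_segments contiguous segments.

    Exact dynamic programming, O(n_segments * n^2) cost evaluations.
    Returns (change_points, total_cost).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    cost = make_segment_cost(x)
    # best[k, j]: minimal cost of splitting x[:j] into k segments
    best = np.full((n_segments + 1, n + 1), np.inf)
    argmin = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1, i] + cost(i, j)
                if candidate < best[k, j]:
                    best[k, j] = candidate
                    argmin[k, j] = i
    # Backtrack from the last segment to the first one
    starts = []
    j = n
    for k in range(n_segments, 0, -1):
        j = int(argmin[k, j])
        starts.append(j)
    change_points = starts[::-1][1:]  # drop the start of the first segment (0)
    return change_points, float(best[n_segments, n])
```

For instance, `best_segmentation([0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 1, 1, 1, 1, 1], 3)` recovers the two change points at indices 5 and 10. Choosing the number of segments requires a penalized criterion, which this sketch leaves out.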
--- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Signal processing approach for movement ecology ([Lavielle, 2005](#bib-lavielle2005using); [Picard, Robin, Lebarbier, et al., 2007](#bib-picard2007segmentation)) .pull-left[ <figure> <img src="segmentation.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> .legend[Change point detection ([Patin, Etienne, Lebarbier, et al., 2019](#bib-patin2019identifying))] </figure> ] .pull-right[ <figure> <img src="seg_classification.png" alt="at the beginning is" style="width:90%" class = "centerimg" /> .legend[Change point detection and classification ([Patin, Etienne, Lebarbier, et al., 2019](#bib-patin2019identifying))] </figure> ] .question[Movement path is more than time series, importance of considering the space.] -- .center[.care[Proposing ecologically meaningful movement models]] Pros and cons of Discrete time versus continuous time movement models discussed in <a name=cite-mcclintock2014discrete></a>[McClintock, Johnson, Hooten, et al. (2014)](#bib-mcclintock2014discrete) --- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Diffusions as continuous time movement model Let `\((X_s)_{s\geq0}\in\mathbb{R}^2\)` denote the position at time `\(s\)`. 
.pull-left[
* Brownian motion: a pure diffusion model
`$$dX_s = dW_s, \quad X_0=x_0.$$`
<figure>
<img src="bm.png" alt="at the beginning is" style="width:60%" class = "centerimg" />
</figure>
]

--

.pull-left[
* Ornstein Uhlenbeck process: central place behavior
`$$dX_s = -B (X_s- \mu) ds + dW_s, \quad X_0=x_0.$$`
<figure>
<img src="ou.png" alt="at the beginning is" style="width:60%" class = "centerimg" />
</figure>
]

Popular models such as Brownian motion and the Ornstein Uhlenbeck process have known transition densities `\(q(x_t, x_{t+s})\)`, which is not the case in general.

---

# Efficient estimation methods in movement ecology
<img class="logopos_right" src="compute.png" style="height:6%">
<img class="logopos_left" src="paw.png" style="height:7%">

## Diffusions as continuous time movement model

<a name=cite-brillinger2002employing></a>[Brillinger, Preisler, Ager, et al. (2002)](#bib-brillinger2002employing) propose a flexible framework
`$$dX_s = -\nabla H(X_s) ds + \gamma dW_s, \quad X_0=x_0.$$`
but no explicit transitions `\(q(x_t, x_{t+s})\)`

--

<a name=cite-Gloaguen2018stochastic></a>[Gloaguen, Etienne, and Le Corff (2018a)](#bib-Gloaguen2018stochastic), as part of *P. Gloaguen's PhD*, explore `\(H(X_s) = \sum_{k=1}^K \pi_k \varphi_k(X_s),\)`

.pull-left[
<figure>
<img src="map2.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
</figure>
]

--

.pull-right[
* Euler approximation: biased estimates with low frequency data
* <a name=cite-ozaki1992bridge></a>([Ozaki, 1992](#bib-ozaki1992bridge)) and <a name=cite-kessler1997estimation></a>[Kessler (1997)](#bib-kessler1997estimation) same results than
* MCEM based on exact simulation <a name=cite-beskos2006exact></a>([Beskos, Papaspiliopoulos, Roberts, et al., 2006](#bib-beskos2006exact)) limits the flexibility of the SDE.
]

---

# Efficient estimation methods in movement ecology
<img class="logopos_right" src="compute.png" style="height:6%">
<img class="logopos_left" src="paw.png" style="height:7%">

## Partially observed SDE

.pull-lefts[
<img src="path_4.png" alt="at the beginning is" style="width:70%" class = "centerimg" />
]

--

.pull-rights[
Let `\(Y_k\)` be the position recorded at time `\(s_k\)`, a noisy observation of the true position `\(X_k\)`:
`$$dX_s = b(X_s) ds + \gamma dW_s, \quad X_0=x_0; \quad Y_k \overset{ind}{\sim} \mathcal{L} (X_k, \theta_{o}).$$`
]

--

### Additive smoothing distributions for the E Step

`$$\sum_{k=0}^{n-1}\mathbb{E}( h(X_k, X_{k+1}) \vert Y_{0:n})$$`

--

* The particle-based, rapid incremental smoother (PaRIS) algorithm <a name=cite-olsson2017efficient></a>([Olsson and Westerborn, 2017](#bib-olsson2017efficient)) provides an online smoother using a rewriting of the backward weights and an acceptance/rejection mechanism, but depends on `\(q(\xi_{k-1}, \xi_{k})\)`
* The generalized random PaRIS algorithm, in <a name=cite-gloaguen2018online></a>([Gloaguen, Etienne, and Le Corff, 2018b](#bib-gloaguen2018online)), uses a simple Euler approximation to propose the particles and a General Poisson Estimator to replace `\(q(\xi_{k-1}, \xi_{k})\)` with an unbiased estimator.

--

.care[Restrictive constraints on the drift and the diffusion term] are relaxed in <a name=cite-martin2021backward></a>([Martin, Etienne, Gloaguen, et al., 2021](#bib-martin2021backward)).
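
To fix ideas on the quantity targeted in the E step, here is a deliberately simple Python sketch. It is not PaRIS nor its generalized random version: it is a bootstrap particle filter with Euler proposals in which each particle carries the additive statistic along its ancestral line, an estimator known to degenerate for long series, which is precisely what the algorithms above improve on. Assumptions made for illustration: a one-dimensional state, a constant time step `dt`, Gaussian observation noise with standard deviation `obs_sd`, and hypothetical function names.

```python
import numpy as np


def euler_step(x, drift, gamma, dt, rng):
    """One Euler step of dX = b(X) ds + gamma dW for an array of particles."""
    return x + drift(x) * dt + gamma * np.sqrt(dt) * rng.standard_normal(x.shape)


def smoothed_additive_functional(y, x0, drift, gamma, dt, obs_sd, h,
                                 n_particles=500, seed=0):
    """Estimate sum_k E[h(X_k, X_{k+1}) | Y_{0:n}] by tracing particle ancestors."""
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, float(x0))  # X_0 = x_0 is known in the model
    stat = np.zeros(n_particles)  # additive statistic carried by each particle
    weights = np.full(n_particles, 1.0 / n_particles)
    for k in range(1, len(y)):
        # Multinomial resampling according to the current weights
        ancestors = rng.choice(n_particles, size=n_particles, p=weights)
        x_prev = x[ancestors]
        # Propagation with the Euler approximation of the dynamics
        x = euler_step(x_prev, drift, gamma, dt, rng)
        # The statistic is inherited from the ancestor, then incremented
        stat = stat[ancestors] + h(x_prev, x)
        # Gaussian observation density (assumed), computed on the log scale
        log_w = -0.5 * ((y[k] - x) / obs_sd) ** 2
        weights = np.exp(log_w - log_w.max())
        weights /= weights.sum()
    return float(np.sum(weights * stat))
```

With `h = lambda a, b: (b - a) ** 2`, the function returns an estimate of the smoothed sum of squared increments, the kind of sufficient statistic an EM algorithm would need for a model with unknown `\(\gamma\)`.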
--- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Flexible movement model <figure> <img src="DAG1.png" alt="at the beginning is" style="width:80%" class = "centerimg" /> </figure> --- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Flexible movement model <figure> <img src="DAG2.png" alt="at the beginning is" style="width:80%" class = "centerimg" /> .legend[Flexible movement model which accounts for environment] </figure> <!-- --- --> <!-- template: movement --> <!-- ## Flexible movement model --> <!-- <figure> --> <!-- <img src="DAG3.png" alt="at the beginning is" style="width:80%" class = "centerimg" /> --> <!-- <figcaption> Flexible non homogeneous movement model</figcaption> --> <!-- </figure> --> --- # Efficient estimation methods in movement ecology <img class="logopos_right" src="compute.png" style="height:6%"> <img class="logopos_left" src="paw.png" style="height:7%"> ## Linking covariates and stationary distribution A classical choice of **resource selection function**, (i.e stationary distribution including covariates) $$\pi{\left(x \vert \beta\right)} \varpropto \exp\left(\sum_{j=1}^J \beta_j c_j (x) \right). $$ Diffusion, under regularity condition admits a stationary distribution. -- Combining the ideas of <a name=cite-michelot2019linking></a>([Michelot, Blackwell, and Matthiopoulos, 2019](#bib-michelot2019linking)), and ([Brillinger, Preisler, Ager, et al., 2002](#bib-brillinger2002employing)) lead to the Langevin diffusion as movement model, $$ d X_t = \frac{\gamma^2}{2} \nabla \log \pi{\left(X_t\right)} \, d t + \gamma \,d W_t,\quad X_0 =x_0. 
$$

--

Using the Euler approximation
`$$ X_{i+1} \vert \lbrace X_i = x_i \rbrace = x_i + \frac{\gamma^2 \Delta_i}{2} \sum_{j=1}^J \beta_j \nabla c_j(x_i) + \sqrt{\Delta_i} \varepsilon_{i+1},\quad \varepsilon_{i+1} \overset{ind}{\sim} N \left( {0} , \gamma^2 \boldsymbol{I}_d \right),$$`
leads to a simple linear model published in <a name=cite-michelot2019langevin></a>[Michelot, Etienne, Blackwell, et al. (2019)](#bib-michelot2019langevin).

---

# Perspectives

### Some natural extensions of the Langevin model useful in movement ecology

* Coupling a Hidden Markov model or a change point detection model with a Langevin diffusion:

<figure>
<img src="DAG4.png" alt="at the beginning is" style="width:60%" class = "centerimg" />
</figure>

* Introduction of an individual random effect

### Handling categorical covariates

<a name=cite-lejay2018maximum></a>[Lejay and Pigato (2018)](#bib-lejay2018maximum) define a threshold diffusion and propose an ML estimation method. Explore the generalisation to `\(\mathbb{R}^2\)`.

### Longer term perspective

* Collective movement: cheaper GPS devices allow massive deployment. Invest in the area of collective movement analysis.
* Combined sound monitoring

---

# A few words to finish

.pull-left[
<figure>
<img src="overview_final.png" alt="at the beginning is" style="width:90%" class = "centerimg" />
</figure>
]

--

.pull-right[
* Close interaction with biologists
  - it's helpful,
  - provides exciting statistical problems,
  - leads to experimenting with a large diversity of approaches.
* Hidden variables approach
  - provides flexible models (spatial abundance, classification in time series, partially observed SDE)
  - popularized in ecology in a Bayesian setting, but a frequentist approach is also possible
* The Markovian property
  - not so realistic,
  - but a key component of both the theoretical approach and the computational aspects
]

---

# Many thanks to

<figure>
<img src="wordcloud_bis.png" alt="at the beginning is" style="width:65%" class = "centerimg" />
</figure>

---
class: biblio

# Bibliography
<img class="logopos_right" src="article.png" style="height:6%">

<a name=bib-aitchison1955distribution></a>[Aitchison, J.](#cite-aitchison1955distribution) (1955). "On the distribution of a positive random variable having a discrete probability mass at the origin". In: _Journal of the American Statistical Association_ 50.271, pp. 901-908.

<a name=bib-alglave2021integrated></a>[Alglave, B., Y. Vermard, M. Etienne, et al.](#cite-alglave2021integrated) (2021). "Integrated framework accounting for preferential sampling to infer fish spatial distribution".

<a name=bib-ancelet2010modelling></a>[Ancelet, S., M. Etienne, H. Benoît, et al.](#cite-ancelet2010modelling) (2010). "Modelling spatial zero-inflated continuous data with an exponentially compound Poisson process". In: _Environmental and Ecological Statistics_ 17.3, pp. 347-376.

<a name=bib-beskos2006exact></a>[Beskos, A., O. Papaspiliopoulos, G. O. Roberts, et al.](#cite-beskos2006exact) (2006). "Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion)". In: _Journal of the Royal Statistical Society: Series B (Statistical Methodology)_ 68.3, pp. 333-382.

<a name=bib-brillinger2002employing></a>[Brillinger, D. R., H. K. Preisler, A. A. Ager, et al.](#cite-brillinger2002employing) (2002). "Employing stochastic differential equations to model wildlife motion". In: _Bulletin of the Brazilian Mathematical Society_ 33.3, pp. 385-408.
<a name=bib-Etienne03></a>[Daudin, J., M. Etienne, and P. Vallois](#cite-Etienne03) (2003). "Asymptotic behavior of the local score of independent and identically distributed random sequences". In: _Stochastic Processes and their Applications_ 107.1. <a name=bib-Decreusefond19a></a>[Decreusefond, L., M. Etienne, G. Lang, et al.](#cite-Decreusefond19a) (2021). "Convergence of the sum of pure jump Markov unitary processes to an Ornstein Uhlenbeck process". --- class: biblio count: false # Bibliography <img class="logopos_right" src="article.png" style="height:6%"> <a name=bib-devroye2010exact></a>[Devroye, L.](#cite-devroye2010exact) (2010). "On exact simulation algorithms for some distributions related to Brownian motion and Brownian meanders". In: _Recent Developments in Applied Probability and Statistics_. Springer, pp. 1-35. <a name=bib-Etienne04></a>[Etienne, M. and P. Vallois](#cite-Etienne04) (2004). "Approximation of the distribution of the supremum of a centred random walk. Application to the local score". In: _Methodol. Comput. Appl. Probab._ 6.3, pp. 255-275. <a name=bib-foster2013poisson></a>[Foster, S. D. and M. V. Bravington](#cite-foster2013poisson) (2013). "A Poisson-Gamma model for analysis of ecological non-negative continuous data". In: _Environmental and ecological statistics_ 20.4, pp. 533-552. <a name=bib-gloaguen2018online></a>[Gloaguen, P., M. Etienne, and S. Le Corff](#cite-gloaguen2018online) (2018b). "Online sequential Monte Carlo smoother for partially observed diffusion processes". In: _EURASIP Journal on Advances in Signal Processing_ 2018.1, p. 9. <a name=bib-Gloaguen2018stochastic></a>[Gloaguen, P., M. Etienne, and S. Le Corff](#cite-Gloaguen2018stochastic) (2018a). "Stochastic differential equation based on a multimodal potential to model movement data in ecology". In: _Journal of the Royal Statistical Society: Series C (Applied Statistics)_ 67.3, pp. 599-619. DOI: [10.1111/rssc.12251](https://doi.org/10.1111%2Frssc.12251). 
<a name=bib-kessler1997estimation></a>[Kessler, M.](#cite-kessler1997estimation) (1997). "Estimation of an ergodic diffusion from discrete observations". In: _Scandinavian Journal of Statistics_ 24.2, pp. 211-229.

<a name=bib-kristensen2016automatic></a>[Kristensen, K., A. Nielsen, C. W. Berg, et al.](#cite-kristensen2016automatic) (2016). "TMB: Automatic Differentiation and Laplace Approximation". In: _Journal of Statistical Software_ 70.5, pp. 1-21. DOI: [10.18637/jss.v070.i05](https://doi.org/10.18637%2Fjss.v070.i05).

---
class: biblio
count: false

# Bibliography

<img class="logopos_right" src="article.png" style="height:6%">

<a name=bib-kubilius1994rate></a>[Kubilius, K.](#cite-kubilius1994rate) (1994). "Rate of convergence in the invariance principle for martingale difference arrays". In: _Lithuanian Mathematical Journal_ 34.4, pp. 383-392.

<a name=bib-lavielle2005using></a>[Lavielle, M.](#cite-lavielle2005using) (2005). "Using penalized contrasts for the change-point problem". In: _Signal Processing_ 85.8, pp. 1501-1510.

<a name=bib-lecomte2013compound></a>[Lecomte, J., H. P. Benoît, S. Ancelet, et al.](#cite-lecomte2013compound) (2013). "Compound Poisson-gamma vs. delta-gamma to handle zero-inflated continuous data under a variable sampling volume". In: _Methods in Ecology and Evolution_ 4.12, pp. 1159-1166.

<a name=bib-lejay2018maximum></a>[Lejay, A. and P. Pigato](#cite-lejay2018maximum) (2018). "Maximum likelihood drift estimation for a threshold diffusion". In: _Scandinavian Journal of Statistics_.

<a name=bib-martin2021backward></a>[Martin, A., M. Etienne, P. Gloaguen, et al.](#cite-martin2021backward) (2021). "Backward importance sampling for online estimation of state space models". Working paper or preprint. URL: [https://hal.archives-ouvertes.fr/hal-02476102](https://hal.archives-ouvertes.fr/hal-02476102).

<a name=bib-mcclintock2014discrete></a>[McClintock, B. T., D. S. Johnson, M. B. Hooten, et al.](#cite-mcclintock2014discrete) (2014).
"When to be discrete: the importance of time formulation in understanding animal movement". In: _Movement Ecology_ 2.1, p. 21. <a name=bib-michelot2019linking></a>[Michelot, T., P. G. Blackwell, and J. Matthiopoulos](#cite-michelot2019linking) (2019). "Linking resource selection and step selection models for habitat preferences in animals". In: _Ecology_ 100.1. --- class: biblio count: false # Bibliography <img class="logopos_right" src="article.png" style="height:6%"> <a name=bib-michelot2019langevin></a>[Michelot, T., M. Etienne, P. Blackwell, et al.](#cite-michelot2019langevin) (2019). "The Langevin diffusion as a continuous-time model of animal movement and habitat selection". In: _Methods in Ecology and Evolution_. <a name=bib-nathan2008movement></a>[Nathan, R., W. M. Getz, E. Revilla, et al.](#cite-nathan2008movement) (2008). "A movement ecology paradigm for unifying organismal movement research". In: _Proceedings of the National Academy of Sciences_ 105.49, pp. 19052-19059. <a name=bib-olsson2017efficient></a>[Olsson, J. and J. Westerborn](#cite-olsson2017efficient) (2017). "Efficient particle-based online smoothing in general hidden Markov models: the PaRIS algorithm". In: _Bernoulli_ 23.3, pp. 1951-1996. <a name=bib-ozaki1992bridge></a>[Ozaki, T.](#cite-ozaki1992bridge) (1992). "A bridge between nonlinear time series models and nonlinear stochastic dynamical systems: a local linearization approach". In: _Statistica Sinica_, pp. 113-135. <a name=bib-patin2019identifying></a>[Patin, R., M. Etienne, E. Lebarbier, et al.](#cite-patin2019identifying) (2019). "Identifying stationary phases in multivariate time series for highlighting behavioural modes and home range settlements". In: _Journal of Animal Ecology_. <a name=bib-picard2007segmentation></a>[Picard, F., S. Robin, E. Lebarbier, et al.](#cite-picard2007segmentation) (2007). "A segmentation/clustering model for the analysis of array CGH data". In: _Biometrics_ 63.3, pp. 758-766. 
<a name=bib-robin2009simultaneous></a>[Robin, S. and V. Stefanov](#cite-robin2009simultaneous) (2009). "Simultaneous occurrences of runs in independent Markov chains". In: _Methodology and Computing in Applied Probability_ 11.2, pp. 267-275.

---
class: biblio
count: false

# Bibliography

<img class="logopos_right" src="article.png" style="height:6%">

<a name=bib-robin2015detection></a>[Robin, S. and V. T. Stefanov](#cite-robin2015detection) (2015). "Detection of significant genomic alterations via simultaneous minimal sojourns at a state by independent continuous-time Markov chains". In: _Methodology and Computing in Applied Probability_ 17.2, pp. 479-487.

<a name=bib-rue2009approximate></a>[Rue, H., S. Martino, and N. Chopin](#cite-rue2009approximate) (2009). "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations". In: _Journal of the Royal Statistical Society: Series B (Statistical Methodology)_ 71.2, pp. 319-392.