Objectives

The aim of this session is to introduce the R package pomp, which contains functions for many of the tasks we have performed in this course. The benefit of using a package like pomp (or alternatives such as SSM or libbi) is that they have been optimised for computational efficiency and can take full advantage of any available hardware (including running on a high-performance cluster). The disadvantage of using readily available packages is that 1) to allow for good performance, models are usually not coded in R, 2) any specific method or functionality you want to use might not be implemented, and 3) they can be more difficult to debug when something is going wrong.

The aim of this session is to see how the different methods we have encountered in the course can be applied to fit a model to data using pomp. It is meant to serve both as a recap of what you have done in the last few days as well as an introduction to one of the available packages for model fitting and inference.

In the previous session, you coded a particle filter to estimate the likelihood, and the function mcmcMh to sample from the posterior distribution with a Metropolis-Hastings algorithm.

In this session you will:

learn how to code a model in pomp using C snippets or R code
explore how to use a particle filter and pMCMC using pomp

Code a model in pomp

To do model fitting with pomp, you need to create a pomp object. This works a bit like the fitmodel objects we created earlier. To create a pomp object, you need to specify a data set as well as functions that simulate the model, evaluate the prior density, etc. The components and functions that need to go into a pomp object are determined by the method that one wants to use. For more information on this, you can have a look at the recent article on pomp in the Journal of Statistical Software.

To load the pomp library, type

library("pomp")

If this does not work, you need to install the pomp library first. You can do this using

install.packages("pomp")

To specify a model that you want to fit to data in pomp, you can either write a function in R, or you can use so-called C snippets, that is model code written in C that is pre-compiled and can be called from an R function. The advantage of doing this is that you can benefit from the speed of compiled C code while not having to learn an awful lot about the details of C syntax.

We have coded up the models of the previous practical sessions in pomp for you. To look at, for example, the SEITL model, use

source("https://sbfnk.github.io/mfiidd/pomp/seitlPomp.r", echo = TRUE)

If this yields a compile error, have a look at the important information for windows and mac users or ask us for help. If you can’t get it to work at all, you work with the R versions of the code (see the note after the next paragraph).

Take 15 minutes to have a look at the code that is printed when you execute this command, and try to understand what it does. You can have a look at our more detailed explanation for further help. If your R client truncates the output, you can look at the full code in a browser by looking at https://github.com/sbfnk/mfiidd/blob/main/Rmd/pomp/seitlPomp.r or the files at https://github.com/sbfnk/mfiidd/tree/main/Rmd/pomp for the other files in the practical to which we refer below.

Note: For the examples presented here, we are providing an alternative version that is coded entirely in R. To look at this, simply add “” to the file names (before the .r extension). For example, to see the R version of the SEITL model in pomp, type

source("https://sbfnk.github.io/mfiidd/pomp/seitlPompR.r", echo = TRUE)

which creates an object called seitlPompR.

How would you change the (C or R code) to get the SEIT4L model? You can have a go at it yourself, or load our version using

source("https://sbfnk.github.io/mfiidd/pomp/seit4lPomp.r", echo = TRUE)

(again, simply add “R” to get the R version).

We now have a pomp object called seitlPomp (or seitlPompR, or seit4lPomp etc.), containing the data, the deterministic skeleton, the stochastic model and the observation process. All the functions we use below take this object as argument and pick whichever elements from within the seitlPomp are needed.

Now try the following operations (with SEITL or SEIT4L). For full documentation of the functions used, refer to the R help pages, which you can access using ?function (replacing ‘function’ with the name of the function you are interested in).

We suggest that you try the methods described below with both of the models.

Simulating the model and estimating the likelihood

Let’s, again, guess some parameters and initial values:

theta <- c(R0 = 2, D_lat = 2, D_inf = 2, alpha = 0.9, D_imm = 13, rho = 0.85)
seitlInitState <- c(S.0 = 250, E.0 = 0, I.0 = 4, T.0 = 0, L.0 = 30, Inc.0 = 0)
seit4lInitState <- c(S.0 = 250, E.0 = 0, I.0 = 4, T1.0 = 0, T2.0 = 0, T3.0 = 0, T4.0 = 0,
    L.0 = 30, Inc.0 = 0)

Note the pomp syntax, whereby initial states are named as the named variable with .0 appended. Note also that the SEIT4L model needs a slightly different initial state (because the variable names are different).

Deterministic trajectory

seitlTraj <- trajectory(seitlPomp, params = c(theta, seitlInitState), format = "data.frame")
plotTraj(seitlTraj)

The trajectory function simulates the deterministic skeleton of the pomp object it is given, ignoring all process or measurement stochasticity.

Stochastic trajectories

seitlSim <- simulate(seitlPomp, params = c(theta, seitlInitState), include.data = FALSE,
    format = "data.frame")

plotTraj(seitlSim, data = fluTdc1971, stateNames = "obs")

The simulate function simulates the (stochastic) model of the pomp object it is given. By specifying obs = TRUE and states = TRUE, we indicate that we want to have both the model states and observations in the data frame that is returned.

Particle filter

pf <- pfilter(seitlPomp, params = c(theta, seitlInitState), Np = 100)
logLik(pf)

## [1] -216.9046

This estimates the likelihood of the given parameters using a particle filter with Np = 100 particles.

Model fitting by maximising the likelihood

Several methods are available in pomp for fitting a deterministic or stochastic model to data using maximum likelihood. Remember that this implies a frequentist method, that is the prior distribution is ignored, and the parameter values that yield maximum likelihood are interpreted as the true parameters.

Trajectory matching

seitlTm <- traj_objfun(seitlPomp, params = c(theta, seitlInitState), est = names(theta))
res <- optim(par = log(theta), fn = seitlTm)

logLik(seitlTm)

## [1] -139.4111

This fits the deterministic skeleton to the data by maximising the likelihood using a standard optimisation method (see ?traj_objfun for details on the methods available). That is, it ignores all process noise (or demographic stochasticity) in the model.

We can use the best-fit parameters from trajectory matching to simulate multiple trajectories from the stochastic model by passing the new object seitlTm to simulate.

seitlTmSim <- simulate(seitlTm, nsim = 10, format = "data.frame")
plotTraj(seitlTmSim, data = fluTdc1971, stateNames = "obs")

Maximum likelihood by iterated filtering (MIF)

propSd <- rep(0.01, length(theta))
names(propSd) <- names(theta)
rwSd <- do.call(rw_sd, as.list(propSd))

seitlMf <- mif2(seitlPomp, params = coef(seitlTm), Nmif = 50, Np = 1000, cooling.fraction.50 = 0.01,
    rw.sd = rwSd)
seitlMfSim <- simulate(seitlMf, nsim = 10, include.data = TRUE, format = "data.frame")
plotTraj(seitlMfSim, data = fluTdc1971, stateNames = "obs")

logLik(seitlMf)

## [1] -124.7027

This fits the stochastic model to the data by maximising the likelihood using so-called iterated filtering. The parameters to be fitted are randomly perturbed (via a so-called random walk) in parameter space and the likelihood estimated with a particle filter. When iterating this procedure with smaller and smaller perturbations (i.e., smaller and smaller steps in the random walk), the parameters converge to the maximum likelihood estimate. For more information on maximum likelihood by iterated filtering, see the references given on the wikipedia page.

In the function call above, we have used Nmif = 50 iterations of MIF for finding the parameters that maximise the likelihood, Np = 1000 particles for the likelihood itself, and a proportion of cooling.fraction.50 = 0.01 of the random walk intensity remaining after 50 iterations.

Note that we have initialised the method with the maximum likelihood estimate obtained from trajectory matching the deterministic model (seitlTm). It is a common procedure to initialise more computationally intensive models with results from a simpler, more approximate method.

Model fitting by sampling from the posterior distribution

Several methods are available in pomp for fitting a deterministic or stochastic model to data by sampling from the posterior distribution. Remember that this implies a Bayesian method, that is the prior distribution is taken into account, and the parameter vectors sampled from the posterior distribution are interpreted as draws from a random distribution that encodes our uncertainty.

Particle Markov-Chain Monte Carlo (pMCMC)

The following code runs pMCMC with adaptation of the proposal distribution using the model given to the POMP object. Note: you can try first with a smaller value of the Nmcmc parameter; depending on the power of your computer, the code below can take a while to run.

# run pMCMC with adaptive MCMC
seitlPmcmc <- pmcmc(seitlPomp, params = coef(seitlMf), Nmcmc = 5000, Np = 128, proposal = mvn_rw_adaptive(rw.sd = propSd,
    scale.start = 100, shape.start = 200))
trace <- traces(seitlPmcmc)

library("coda")
# acceptance rate
1 - rejectionRate(trace)

##    loglik log.prior        R0     D_lat     D_inf     alpha     D_imm       rho 
##    0.1544    0.0000    0.1544    0.1544    0.1544    0.1544    0.1544    0.1544 
##       S.0       E.0       I.0       T.0       L.0     Inc.0 
##    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000

plot(trace)

This runs pMCMC on the model, here for Nmcmc = 5000 MCMC iterations with Np = 128 particles. If you’re interested in seeing the adaptation of the covariance matrix in action, you can run the pmcmc command above with the additional argument verbose = TRUE.

You will notice that the pMCMC runs much faster than the one we implemented earlier in the course. This is because both the model and the particle filter are now coded in C and compiled to fully use the processing power of the computer it runs on. Remember that the trace plots for the earlier practicals were obtained by running the code for several hours on a computing cluster. The code above, while perhaps not running for quite long enough for a reliable estimation of the posterior densities, should run in only a few minutes to run on your computer (on a Macbook Pro, it took a few minutes). You could try running the same sequence of commands with the seitlPompR object to see the enourmous difference in running times.

Look at the trace and density plots above. Are you satisfied with the performance of the MCMC? How would you check if you can trust the resulting posterior samples? What would you change if you weren’t satisfied? We hope that the knowledge and tools you have acquired in this course means you know what might be needed in order to obtain a reliable model fit. Do you get the same answers as yesterday?

Going further

There are other methods available in pomp, such as ABC (Approximate Bayesian Computation). They all operate by the same principle of solving the problems of 1) estimating the likelihood (in the case of a stochastic model) and 2) maximising the it (if frequentist) or sampling from the posterior distribution (if Bayesian). You can read up on the different methods on the pomp web site, and try to use them on the Tristan da Cunha dataset.
As another example of a library for model fitting and inference you could have a go at LibBi and the corresponding R package RBi. The deterministic and stochastic SEITL models can be found here. Can you fit them to the Tristan da Cunha data set?

This web site and the material contained in it were originally created in support of an annual short course on Model Fitting and Inference for Infectious Disease Dynamics at the London School of Hygiene & Tropical Medicine. All material is under a MIT license. Please report any issues or suggestions for improvement on the corresponding GitHub issue tracker. We are always keen to hear about any uses of the material here, so please do get in touch using the Discussion board if you have any questions or ideas, or if you find the material here useful or use it in your own teaching.

Fitting models to data using pomp