As you should have noticed in the previous session, several simulations are required to assess the dynamics of a stochastic model, whereas a single one suffices for a deterministic model. Similarly, in the next session you will see that fitting a stochastic model is also computationally much more intensive than fitting a deterministic model. Because of this, we want to avoid wasting simulation runs, which could happen for two reasons:

- if the chain is initialised with an initTheta far from the region of high posterior density, it might take a long time to reach this region of interest and you will have to discard a lot of iterations as burn-in;
- if the proposal distribution is poorly calibrated, the chain will mix slowly and many more iterations will be needed to obtain a decent effective sample size.

In this context, it can be useful to first run an MCMC on the deterministic model, which can be simulated faster, and use the output of this chain to initialise the chain for the fit of the stochastic model. The rationale for this approach is that the deterministic model is an approximation of the stochastic model and should, in most cases, capture the same dynamics.
So in this session you will:

- fit the deterministic SEITL and SEIT4L models to the Tristan da Cunha outbreak using adaptive MCMC,
- analyse and combine the resulting chains with the coda package,
- compare the two models using the deviance information criterion (DIC) and posterior predictive checks.
To save time, half of the group will fit the deterministic SEITL model and the other half the deterministic SEIT4L model. In the rest of the session, although most of the examples refer to the SEITL model, the same commands work for the SEIT4L model, i.e. using data(seit4lDeter) instead of data(seitlDeter).
Here you could use the function my_mcmcMh that you have already coded, but it might result in poor acceptance rates since the models have 6 parameters to estimate. Because of this, adaptive MCMC might be a better choice.
Take 10 minutes to have a look at the adaptive MCMC implemented in the function mcmcMh. As you can see, this function takes arguments similar to those of your function my_mcmcMh: target, initTheta, proposalSd and nIterations, plus several new ones that control the adaptive part of the algorithm and that you can learn about via the help page of mcmcMh.
Note also that this function returns a list that contains, in addition to the trace, the acceptance rate and the empirical covariance matrix of the posterior. The latter will be useful for improving the covariance matrix of the Gaussian proposal in the next session, when we will fit the stochastic model.
The first step is to code a wrapper function to evaluate the posterior at a given theta value. Here again, you could wrap the function my_dLogPosterior that you have already coded and pass it to mcmcMh so that it returns samples of theta from the posterior distribution.

However, as you will see at the end of this session, in order to compare different models we also need to track the log-likelihood of each sampled theta. Although this could be done by running dTrajObs on each returned theta, you might remember that the log-likelihood is actually computed inside my_dLogPosterior, so we could just reuse it. This is exactly what the function dLogPosterior does for you.
Take 5 minutes to look at the code of dLogPosterior. As you can see, this function takes arguments similar to those of your function my_dLogPosterior: fitmodel, theta, initState and data, plus one called margLogLike, which takes the function that computes the log-likelihood of theta. The last argument (...) is called dot-dot-dot and adds some flexibility to the definition of margLogLike.
For instance, we have seen that for a deterministic model the likelihood is returned by dTrajObs. However, margLogLike expects a function that returns the log-likelihood, which can be obtained by passing log = TRUE to dTrajObs. To do so, we can use the dot-dot-dot argument of dLogPosterior, which allows you to pass any extra argument on to the function assigned to margLogLike.
The function dLogPosterior returns a list of two elements:

- logDensity: the log of the posterior density;
- trace: a vector that contains, among other things, theta and its logLikelihood. All this information will be collected in the trace data frame returned by mcmcMh.

Now, take 15 minutes to prepare all the inputs needed to run mcmcMh and fit your model. You should proceed as follows:
# wrapper for posterior
my_posteriorTdc <- function(theta) {
    my_fitmodel <- # INSERT HERE
    my_initState <- # INSERT HERE
    return(dLogPosterior(fitmodel = my_fitmodel,
                         theta = theta,
                         initState = my_initState,
                         data = fluTdc1971,
                         margLogLike = dTrajObs,
                         log = TRUE))
}
# theta to initialise the MCMC
initTheta <- # INSERT HERE
# diagonal elements of the covariance matrix for the Gaussian proposal
# Must be in the same order as initTheta or named
proposalSd <- # INSERT HERE
# lower and upper limits of each parameter (must be named vectors)
lower <- # INSERT HERE
upper <- # INSERT HERE
# number of iterations for the MCMC
nIterations <- # INSERT HERE
# additional parameters for the adaptive MCMC, see ?mcmcMh for more details
adaptSizeStart <- # INSERT HERE
adaptSizeCooling <- # INSERT HERE
adaptShapeStart <- # INSERT HERE
If you have trouble filling some of the empty bits, have a look at our example.
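For instance, the blanks could be filled as in the following sketch. The initTheta reuses the theta1 given later in this session and the initial state is the one used in the posterior predictive check below; the proposal standard deviations, limits and adaptive settings are illustrative starting values, not the tuned choices of our example:
# load the deterministic SEITL model
data(seitlDeter)
# inside the wrapper: my_fitmodel <- seitlDeter and
# my_initState <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)

# theta to initialise the MCMC (the theta1 used later in this session)
initTheta <- c(R_0 = 2, D_lat = 2, D_inf = 2, alpha = 0.8, D_imm = 16, rho = 0.85)
# standard deviation of the Gaussian proposal for each parameter
# (illustrative values, roughly a tenth of the scale of each parameter)
proposalSd <- c(R_0 = 1, D_lat = 0.5, D_inf = 0.5, alpha = 0.1, D_imm = 2, rho = 0.1)
# all parameters are positive; alpha and rho are proportions
lower <- c(R_0 = 0, D_lat = 0, D_inf = 0, alpha = 0, D_imm = 0, rho = 0)
upper <- c(R_0 = Inf, D_lat = Inf, D_inf = Inf, alpha = 1, D_imm = Inf, rho = 1)
# number of iterations for this preliminary run
nIterations <- 5000
# adaptive settings (illustrative starting values, see ?mcmcMh)
adaptSizeStart <- 100
adaptSizeCooling <- 0.99
adaptShapeStart <- 200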
Then you should be able to run mcmcMh
:
my_mcmcTdc <- mcmcMh(
target = my_posteriorTdc,
initTheta = initTheta,
proposalSd = proposalSd,
limits = list(lower = lower, upper = upper),
nIterations = nIterations,
adaptSizeStart = adaptSizeStart,
adaptSizeCooling = adaptSizeCooling,
adaptShapeStart = adaptShapeStart
)
You should see some information printed as the chain runs: acceptance rate, state of the chain, log-likelihood etc. In particular, what can you say about the evolution of the acceptance rate?
Note that we have set the number of iterations to 5000. This is a benchmark: if your laptop is quite slow, you might want to perform fewer iterations. The objective here is a short, preliminary run to calibrate your adaptive parameters before running a longer chain.
Take 10 minutes to change the parameters controlling the adaptive part of the MCMC and look at their effect on the acceptance rate. Try to find a "good" combination of these parameters so that the acceptance rate ends up near the optimal value of 23% (in practice the algorithm remains efficient whenever the acceptance rate is between roughly 0.1 and 0.6, so any value in that range is fine). If you can't find one, look at our example.
Now it’s time to use what you’ve learned in the previous
session to analyse your MCMC outputs using the coda
package.
library("coda")
Didn’t manage to run mcmcMh
? Just use the results from
our example.
Note that because there are more than 6 parameters to look at, the coda functions used previously to plot the traces, densities and autocorrelations might not be optimal for laptop screens (the axis labels take too much space). Fortunately, coda has another set of functions to make the plots more compact:

- xyplot for the trace,
- densityplot for the density,
- acfplot for the autocorrelation.

You can find grouped documentation for these three functions by typing ?acfplot.
Take 15 minutes to analyse the trace returned by mcmcMh. Remember that, with the notation above, the trace can be accessed as my_mcmcTdc$trace and then converted to an mcmc object so that the coda functions recognize it:
# convert to a mcmc object for coda
my_trace <- mcmc(my_mcmcTdc$trace)
# plot the trace
library("lattice") ## for xyplot
xyplot(my_trace)
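The other two compact plots work in the same way:
# compact density and autocorrelation plots
densityplot(my_trace)
acfplot(my_trace)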
If you are not sure about how to analyse your trace, have a look at our example.
You should have noticed that the effective sample size (ESS) is quite small (<100) for your preliminary run. By contrast, the ESS was much higher for the SIR model of the previous session with a similar number of iterations. However, that model has only 2 parameters against 6 for the SEIT(4)L models. Intuitively, the more parameters you have, the bigger the parameter space and the longer it takes for the MCMC algorithm to explore it. This is why we need to run a much longer chain to achieve a good ESS (~1000).
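You can compute the ESS of your run with coda's effectiveSize function:
# effective sample size for each variable of the trace
effectiveSize(my_trace)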
To save time, we have run 2 chains of 100,000 iterations starting
from different initial theta
values:
theta1 <- c(R_0 = 2, D_lat = 2, D_inf = 2, alpha = 0.8, D_imm = 16, rho = 0.85)
theta2 <- c(R_0 = 20, D_lat = 2, D_inf = 2, alpha = 0.1, D_imm = 8, rho = 0.3)
Each chain took 4 hours to run on a single CPU and can be loaded as follows:
data(mcmcTdcDeterLongRun)
# this should load 2 objects in your environment: mcmcSeitlTheta1 and
# mcmcSeitlTheta2. Each one is a list of 3 elements returned by mcmcMh
names(mcmcSeitlTheta1)
## [1] "trace" "acceptanceRate" "covmatEmpirical"
# the trace contains 9 variables: the 6 parameters plus
# logPrior, logLikelihood and logDensity
dim(mcmcSeitlTheta1$trace)
## [1] 50000 9
# let's have a look at it
head(mcmcSeitlTheta1$trace)
## R_0 D_lat D_inf alpha D_imm rho logPrior
## [1,] 2.000000 2.000000 2.000000 0.8000000 16.00000 0.8500000 -12.81448
## [2,] 2.000000 2.000000 2.000000 0.8000000 16.00000 0.8500000 -12.81448
## [3,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## [4,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## [5,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## [6,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## logLikelihood logDensity
## [1,] -445.7795 -458.5939
## [2,] -445.7795 -458.5939
## [3,] -421.8376 -434.6520
## [4,] -421.8376 -434.6520
## [5,] -421.8376 -434.6520
## [6,] -421.8376 -434.6520
Take 15 minutes to analyse both chains together.
Hint: the function mcmc.list of coda can be used to combine several mcmc objects into an mcmc.list object. Diagnostic functions that act on mcmc objects may also be applied to mcmc.list objects. In general, the chains will be combined if this makes sense; otherwise, the diagnostic function will be applied separately to each chain in the list.
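For instance, a minimal sketch combining the two long runs:
# convert each trace and combine them into an mcmc.list
trace1 <- mcmc(mcmcSeitlTheta1$trace)
trace2 <- mcmc(mcmcSeitlTheta2$trace)
my_traces <- mcmc.list(trace1, trace2)
# overlaid trace plots, one colour per chain
xyplot(my_traces)
# summary statistics computed across both chains
summary(my_traces)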
Finally, you should also assess the fit of your model by using the function plotPosteriorFit. This function takes a sample of theta from the trace, simulates some observations and plots them against the data, thus allowing you to assess the fit. Take 10 minutes to look at the documentation of this function and assess your fit; in particular, try the different options for the argument posterior.summary.
Once again, if you are not sure how to complete these steps, have a look at our example.
You can check for correlations between parameters using the function levelplot from the coda package. Note, however, that this function doesn't accept mcmc.list objects.
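For instance, applied to the first chain only (a sketch, restricted to the model parameters):
# image plot of the pairwise correlations between sampled parameters
levelplot(trace1[, seitlDeter$thetaNames])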
Which parameters are strongly correlated? Can you explain why?
As previously stated, both the latent and infectious periods have been estimated to be around 2 days in empirical studies and are unlikely to last more than 5 days.
Are your estimates in agreement with these previous studies? If not, did your approach take into account this prior information when fitting your model?
Modify seitlDeter to account for this prior information and re-run a short MCMC for 5000 iterations. Can you notice a difference in the posterior?
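As a starting point, here is a minimal sketch of an informative prior, assuming the prior density is stored in the dPrior element of the fitmodel (check names(seitlDeter) for the exact element name); the normal priors and their standard deviations, as well as the flat ranges, are illustrative choices:
# informative priors on D_lat and D_inf, centred on the empirical
# estimate of 2 days; other priors left flat (illustrative ranges)
my_dPrior <- function(theta, log = FALSE) {
    logPriorR_0 <- dunif(theta[["R_0"]], min = 1, max = 50, log = TRUE)
    logPriorLatent <- dnorm(theta[["D_lat"]], mean = 2, sd = 1, log = TRUE)
    logPriorInf <- dnorm(theta[["D_inf"]], mean = 2, sd = 1, log = TRUE)
    logPriorAlpha <- dunif(theta[["alpha"]], min = 0, max = 1, log = TRUE)
    logPriorImmunity <- dunif(theta[["D_imm"]], min = 0, max = 50, log = TRUE)
    logPriorRho <- dunif(theta[["rho"]], min = 0, max = 1, log = TRUE)

    logSum <- logPriorR_0 + logPriorLatent + logPriorInf +
        logPriorAlpha + logPriorImmunity + logPriorRho

    return(ifelse(log, logSum, exp(logSum)))
}

# plug the new prior into the model
seitlDeter$dPrior <- my_dPrior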
A solution can be found here.
In order to take into account the results of previous empirical studies, we have re-run both chains for \(10^5\) iterations with informative priors on the latent and infectious periods:
names(mcmcSeitlInfoPriorTheta1)
## [1] "trace" "acceptanceRate" "covmatEmpirical"
names(mcmcSeitlInfoPriorTheta2)
## [1] "trace" "acceptanceRate" "covmatEmpirical"
Take 15 minutes to perform the same analysis as before and compare the posteriors with and without informative priors. Which parameters have significantly different posteriors? Does that make sense to you?
A solution can be found here.
We can compare the SEITL and SEIT4L models using the deviance information criterion (DIC).
The deviance of a parameter set \(\theta\) is defined as \[ D(\theta)=-2\log(p(y|\theta)) + C \] where \(p(y|\theta)\) is the likelihood of the data given \(\theta\) and \(C\) is a constant that will cancel out when comparing two models.
The DIC can be computed as \[ \mathrm{DIC}=D(\bar\theta) + 2p_D \] where \(\bar\theta\) is the mean of \(\theta\) with respect to the posterior distribution, and \(p_D\) is the effective number of parameters, which is approximately equal to: \[p_D=\frac{1}{2}\hat{\mathrm{var}}(D(\theta))\] that is half of the variance of the deviance with respect to the posterior distribution.
The idea is that models with smaller DIC should be preferred to models with larger DIC. Models are penalized both by the value of \(D(\bar\theta)\), which favours a good fit, and by the effective number of parameters \(p_D\). Since \(D(\bar\theta)\) decreases as the number of parameters in a model increases, the \(p_D\) term compensates for this effect by favouring models with a smaller number of effective parameters.
Compute the DIC for your model. A solution is provided here.
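As a sketch of how the DIC could be computed from an MCMC trace (taking the constant \(C = 0\) and assuming, as above, that the trace contains a logLikelihood column):
# deviance at each iteration of the chain (with C = 0)
trace <- mcmcSeitlTheta1$trace
deviance <- -2 * trace[, "logLikelihood"]

# theta_bar: posterior mean of the parameters
thetaBar <- colMeans(trace[, seitlDeter$thetaNames])

# deviance at theta_bar, using the log-likelihood of the deterministic model
initState <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
dThetaBar <- -2 * dTrajObs(seitlDeter, thetaBar, initState,
                           data = fluTdc1971, log = TRUE)

# effective number of parameters: half the posterior variance of the deviance
pD <- var(deviance) / 2

# DIC
dic <- dThetaBar + 2 * pD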
Now, it's time to compare the DIC of the SEITL and SEIT4L models. Which model should be preferred? Is the difference substantial?
You can have a look at the MRC FAQ on DIC to decide which model is the best.
The DIC can only tell you which one of two or more models fits the data best; it doesn't give you any information on whether the best model actually provides a good fit to the data overall (in the kingdom of the blind, the one-eyed man is king).
One way to assess the overall quality of a fit is to perform posterior predictive checks. This works by choosing a test statistic of the data (e.g., the peak, final size, etc.) and running many model replicates with parameter values sampled from the posterior.
One then tests how often in the simulations the test statistic takes a value as extreme or more extreme than the value it takes in the data. This allows us to compute a p-value (a frequentist quantity, strictly speaking).
If the data come out as an extreme case of model output (that is, the chosen test statistic or a more extreme value only comes up very rarely in the simulations), it would indicate that the model is not a good fit to the data.
Let's look at the maximum in the data:
max(fluTdc1971$obs)
## [1] 47
Now, write a function that takes as arguments a data frame of parameter samples, the number of samples to take, the model, the initial state and a data set, and which returns the Bayesian p-value of the test statistic in the data with respect to the posterior samples.
# This is a function that takes 5 arguments:
# - trace, a data frame containing samples from the posterior
# distribution, one column per parameter
# - nSamples, the number of samples to take
# - fitmodel, the model we use to generate replicates
# - initState, the initial state
# - data, the data set we have fit the model to
# It should return the two-sided p-value for the maximal observation
# in the data with respect to the model.
my_postPredCheck <- function(trace, nSamples, fitmodel, initState, data) {
# calculate maximum in obs column of data
# draw nSamples random row indices between 1
# and the number of rows in trace using the `sample` function
# initialise vector of model maxima
## start for() loop over sampled numbers
# get row from trace corresponding to sampled number
# use rTrajObs to generate observation trajectory using row
# from trace
# calculate maximum in model and add to vector of model maxima
## end for() loop
# calculate the 2-sided p-value: take the proportion of elements of
# maxModel that are greater or equal to the data maximum and the
# proportion that are less or equal, keep the smaller of the two,
# and multiply by 2 (because it is a 2-sided test)
# return two-sided p-value
}
If you have trouble filling any of the empty bits, have a look at our more guided example.
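Here is one possible implementation, as a sketch; it assumes that rTrajObs takes the model, a theta, an initial state and the observation times, and that the data has time and obs columns, as fluTdc1971 does:
my_postPredCheck <- function(trace, nSamples, fitmodel, initState, data) {
    # maximum in the obs column of the data
    maxData <- max(data$obs)
    # draw nSamples random row indices from the trace
    rows <- sample(seq_len(nrow(trace)), nSamples, replace = TRUE)
    # initialise vector of model maxima
    maxModel <- numeric(nSamples)
    for (i in seq_along(rows)) {
        # parameter set corresponding to the sampled row
        theta <- unlist(trace[rows[i], ])
        # generate an observation trajectory from this theta
        obsTraj <- rTrajObs(fitmodel, theta, initState, data$time)
        # maximum of the simulated observations
        maxModel[i] <- max(obsTraj$obs)
    }
    # two-sided p-value: twice the smaller tail proportion
    pValue <- 2 * min(mean(maxModel >= maxData), mean(maxModel <= maxData))
    return(pValue)
}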
Now, do a posterior predictive check using one of the mcmc runs analysed above. For example,
initState <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
my_postPredCheck(
trace = mcmcSeitlTheta1$trace[, seitlDeter$thetaNames],
nSamples = 100,
fitmodel = seitlDeter,
initState = initState,
data = fluTdc1971
)
## [1] 0.1
Of course, this is the outcome of a random draw (100 samples in this case), so it will give a different result every time. With more samples, you could improve the estimate.
Does the best model yield a "better" (higher) p-value?