As you should have noticed in the previous session, several simulations are required to assess the dynamics of a stochastic model, whereas a single one suffices for a deterministic model. Similarly, in the next session you will see that fitting a stochastic model is also computationally much more intensive than fitting a deterministic model. Because of this, we want to avoid wasting simulation runs, which could happen for two reasons:

- if the chain is initialised with an initTheta far from the region of high posterior density, it might take a long time to reach this region of interest and you will have to discard a lot of iterations as burn-in;
- if the proposal distribution is poorly calibrated, the chain will mix slowly and many more iterations will be needed to obtain a decent effective sample size.

In this context, it can be useful to first run an MCMC on the deterministic model, which can be simulated faster, and use the output of this chain to initialise the chain for the fit of the stochastic model. The rationale for this approach is that the deterministic model is an approximation of the stochastic model and should, in most cases, capture the same dynamics.
So in this session you will:

- fit the deterministic SEITL and SEIT4L models to the Tristan da Cunha outbreak using adaptive MCMC,
- analyse and combine the resulting chains with the coda package,
- compare the two models using the deviance information criterion (DIC) and posterior predictive checks.
To save time, half of the group will fit the deterministic SEITL model and the other half the deterministic SEIT4L model. In the rest of the session, although most of the examples refer to the SEITL model, the same commands work for the SEIT4L model, i.e. using data(seit4lDeter) instead of data(seitlDeter).
Here you could use the function my_mcmcMh that you have already coded, but it might result in poor acceptance rates since the models have 6 parameters to estimate. Because of this, adaptive MCMC might be a better choice.
Take 10 minutes to have a look at the adaptive MCMC implemented in the function mcmcMh. As you can see, this function takes arguments similar to those of your function my_mcmcMh: target, initTheta, proposalSd and nIterations, plus several new ones that control the adaptive part of the algorithm and that you can learn about via the help page of mcmcMh.
Note also that this function returns a list that contains, in addition to the trace, the acceptance rate and the empirical covariance matrix of the posterior. The latter will be useful for improving the covariance matrix of the Gaussian proposal in the next session, when we will fit the stochastic model.
The first step is to code a wrapper function to evaluate the posterior at a given theta value. Here again, you could wrap the function my_dLogPosterior that you have already coded and pass it to mcmcMh so that it returns samples of theta from the posterior distribution.

However, as you will see at the end of this session, in order to compare different models we also need to track the log-likelihood of each sampled theta. Although this could be done by running dTrajObs on each returned theta, you might remember that the log-likelihood is actually computed inside my_dLogPosterior, so we could just reuse it. This is exactly what the function dLogPosterior does for you.
Take 5 minutes to look at the code of dLogPosterior. As you can see, this function takes arguments similar to those of your function my_dLogPosterior: fitmodel, theta, initState and data, plus one called margLogLike, which takes the function that computes the log-likelihood of theta. The last argument (...) is called dot-dot-dot and adds some flexibility to the definition of margLogLike.
For instance, we have seen that for a deterministic model the likelihood is returned by dTrajObs. However, margLogLike expects a function that returns the log-likelihood, which can be obtained by passing log = TRUE to dTrajObs. To do so, we can use the dot-dot-dot argument of dLogPosterior, which allows you to pass any extra argument on to the function assigned to margLogLike.
The function dLogPosterior returns a list of two elements:

- logDensity: the log of the posterior density;
- trace: a vector that contains, among other things, theta and its logLikelihood. All this information will be collected in the trace data frame returned by mcmcMh.

Now, take 15 minutes to prepare all the inputs needed to run mcmcMh and fit your model. You should proceed as follows:
# wrapper for posterior
my_posteriorTdc <- function(theta) {
    my_fitmodel <- # INSERT HERE
    my_initState <- # INSERT HERE
    return(dLogPosterior(fitmodel = my_fitmodel,
                         theta = theta,
                         initState = my_initState,
                         data = fluTdc1971,
                         margLogLike = dTrajObs,
                         log = TRUE))
}
# theta to initialise the MCMC
initTheta <- # INSERT HERE
# diagonal elements of the covariance matrix for the Gaussian proposal
# Must be in the same order as initTheta or named
proposalSd <- # INSERT HERE
# lower and upper limits of each parameter (must be named vectors)
lower <- # INSERT HERE
upper <- # INSERT HERE
# number of iterations for the MCMC
nIterations <- # INSERT HERE
# additional parameters for the adaptive MCMC, see ?mcmcMh for more details
adaptSizeStart <- # INSERT HERE
adaptSizeCooling <- # INSERT HERE
adaptShapeStart <- # INSERT HERE
If you have trouble filling some of the empty bits, have a look at our example.
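For instance, the blanks could be filled as in the following sketch. The initTheta reuses the theta1 given later in this session and the initial state is the one used in the posterior predictive check below; the proposal standard deviations, limits and adaptive settings are illustrative starting values, not the tuned choices of our example:
# load the deterministic SEITL model
data(seitlDeter)
# inside the wrapper: my_fitmodel <- seitlDeter and
# my_initState <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)

# theta to initialise the MCMC (the theta1 used later in this session)
initTheta <- c(R_0 = 2, D_lat = 2, D_inf = 2, alpha = 0.8, D_imm = 16, rho = 0.85)
# standard deviation of the Gaussian proposal for each parameter
# (illustrative values, roughly a tenth of the scale of each parameter)
proposalSd <- c(R_0 = 1, D_lat = 0.5, D_inf = 0.5, alpha = 0.1, D_imm = 2, rho = 0.1)
# all parameters are positive; alpha and rho are proportions
lower <- c(R_0 = 0, D_lat = 0, D_inf = 0, alpha = 0, D_imm = 0, rho = 0)
upper <- c(R_0 = Inf, D_lat = Inf, D_inf = Inf, alpha = 1, D_imm = Inf, rho = 1)
# number of iterations for this preliminary run
nIterations <- 5000
# adaptive settings (illustrative starting values, see ?mcmcMh)
adaptSizeStart <- 100
adaptSizeCooling <- 0.99
adaptShapeStart <- 200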
Then you should be able to run mcmcMh
:
my_mcmcTdc <- mcmcMh(
target = my_posteriorTdc,
initTheta = initTheta,
proposalSd = proposalSd,
limits = list(lower = lower, upper = upper),
nIterations = nIterations,
adaptSizeStart = adaptSizeStart,
adaptSizeCooling = adaptSizeCooling,
adaptShapeStart = adaptShapeStart
)
You should see some information printed as the chain runs: acceptance rate, state of the chain, log-likelihood etc. In particular, what can you say about the evolution of the acceptance rate?
Note that we have set the number of iterations to 5000. This is a benchmark: if your laptop is quite slow, you might want to perform fewer iterations. The objective here is a short, preliminary run to calibrate your adaptive parameters before running a longer chain.
Take 10 minutes to change the parameters controlling the adaptive part of the MCMC and look at their effect on the acceptance rate. Try to find a "good" combination of these parameters so that the acceptance rate ends up near the optimal value of 23% (in practice the algorithm remains efficient whenever the acceptance rate is between roughly 0.1 and 0.6, so any value in that range is fine). If you can't find one, look at our example.
Now it’s time to use what you’ve learned in the previous
session to analyse your MCMC outputs using the coda
package.
library("coda")
Didn’t manage to run mcmcMh
? Just use the results from
our example.
Note that because there are more than 6 parameters to look at, the coda functions used previously to plot the traces, densities and autocorrelations might not be optimal for laptop screens (the axis labels take too much space). Fortunately, coda has another set of functions to make the plots more compact:

- xyplot for the trace,
- densityplot for the density,
- acfplot for the autocorrelation.

You can find grouped documentation for these three functions by typing ?acfplot.
Take 15 minutes to analyse the trace returned by mcmcMh. Remember that, with the notation above, the trace can be accessed as my_mcmcTdc$trace and then converted to an mcmc object so that the coda functions recognize it:
# convert to a mcmc object for coda
my_trace <- mcmc(my_mcmcTdc$trace)
# plot the trace
library("lattice") ## for xyplot
xyplot(my_trace)
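The other two compact plots work in the same way:
# compact density and autocorrelation plots
densityplot(my_trace)
acfplot(my_trace)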
If you are not sure about how to analyse your trace, have a look at our example.
You should have noticed that the effective sample size (ESS) is quite small (<100) for your preliminary run. By contrast, the ESS was much higher for the SIR model of the previous session with a similar number of iterations. However, that model has only 2 parameters against 6 for the SEIT(4)L models. Intuitively, the more parameters you have, the bigger the parameter space and the longer it takes for the MCMC algorithm to explore it. This is why we need to run a much longer chain to achieve a good ESS (~1000).
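You can compute the ESS of your run with coda's effectiveSize function:
# effective sample size for each variable of the trace
effectiveSize(my_trace)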
To save time, we have run 2 chains of 100,000 iterations starting
from different initial theta
values:
theta1 <- c(R_0 = 2, D_lat = 2, D_inf = 2, alpha = 0.8, D_imm = 16, rho = 0.85)
theta2 <- c(R_0 = 20, D_lat = 2, D_inf = 2, alpha = 0.1, D_imm = 8, rho = 0.3)
Each chain took 4 hours to run on a single CPU and can be loaded as follows:
data(mcmcTdcDeterLongRun)
# this should load 2 objects in your environment: mcmcSeitlTheta1 and
# mcmcSeitlTheta2. Each one is a list of 3 elements returned by mcmcMh
names(mcmcSeitlTheta1)
## [1] "trace" "acceptanceRate" "covmatEmpirical"
# the trace contains 9 variables: the 6 parameters plus
# logPrior, logLikelihood and logDensity
dim(mcmcSeitlTheta1$trace)
## [1] 50000 9
# let's have a look at it
head(mcmcSeitlTheta1$trace)
## R_0 D_lat D_inf alpha D_imm rho logPrior
## [1,] 2.000000 2.000000 2.000000 0.8000000 16.00000 0.8500000 -12.81448
## [2,] 2.000000 2.000000 2.000000 0.8000000 16.00000 0.8500000 -12.81448
## [3,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## [4,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## [5,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## [6,] 1.722749 1.481383 1.186734 0.8390849 11.34142 0.8183184 -12.81448
## logLikelihood logDensity
## [1,] -445.7795 -458.5939
## [2,] -445.7795 -458.5939
## [3,] -421.8376 -434.6520
## [4,] -421.8376 -434.6520
## [5,] -421.8376 -434.6520
## [6,] -421.8376 -434.6520
Take 15 minutes to analyse both chains together.
Hint: the function mcmc.list of coda can be used to combine several mcmc objects into an mcmc.list object. Diagnostic functions that act on mcmc objects may also be applied to mcmc.list objects. In general, the chains will be combined if this makes sense; otherwise, the diagnostic function will be applied separately to each chain in the list.
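For instance, a minimal sketch combining the two long runs:
# convert each trace and combine them into an mcmc.list
trace1 <- mcmc(mcmcSeitlTheta1$trace)
trace2 <- mcmc(mcmcSeitlTheta2$trace)
my_traces <- mcmc.list(trace1, trace2)
# overlaid trace plots, one colour per chain
xyplot(my_traces)
# summary statistics computed across both chains
summary(my_traces)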
Finally, you should also assess the fit of your model by using the function plotPosteriorFit. This function takes a sample of theta from the trace, simulates some observations and plots them against the data, thus allowing you to assess the fit. Take 10 minutes to look at the documentation of this function and assess your fit; in particular, try the different options for the argument posterior.summary.
Once again, if you are not sure how to complete these steps, have a look at our example.
You can check for correlations between parameters using the function levelplot from the coda package. Note, however, that this function doesn't accept mcmc.list objects.
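For instance, applied to the first chain only (a sketch, restricted to the model parameters):
# image plot of the pairwise correlations between sampled parameters
levelplot(trace1[, seitlDeter$thetaNames])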
Which parameters are strongly correlated? Can you explain why?
As previously stated, both the latent and infectious periods have been estimated to be around 2 days in empirical studies and are unlikely to last more than 5 days.
Are your estimates in agreement with these previous studies? If not, did your approach take into account this prior information when fitting your model?
Modify seitlDeter to account for this prior information and re-run a short MCMC for 5000 iterations. Can you notice a difference in the posterior?
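As a starting point, here is a minimal sketch of an informative prior, assuming the prior density is stored in the dPrior element of the fitmodel (check names(seitlDeter) for the exact element name); the normal priors and their standard deviations, as well as the flat ranges, are illustrative choices:
# informative priors on D_lat and D_inf, centred on the empirical
# estimate of 2 days; other priors left flat (illustrative ranges)
my_dPrior <- function(theta, log = FALSE) {
    logPriorR_0 <- dunif(theta[["R_0"]], min = 1, max = 50, log = TRUE)
    logPriorLatent <- dnorm(theta[["D_lat"]], mean = 2, sd = 1, log = TRUE)
    logPriorInf <- dnorm(theta[["D_inf"]], mean = 2, sd = 1, log = TRUE)
    logPriorAlpha <- dunif(theta[["alpha"]], min = 0, max = 1, log = TRUE)
    logPriorImmunity <- dunif(theta[["D_imm"]], min = 0, max = 50, log = TRUE)
    logPriorRho <- dunif(theta[["rho"]], min = 0, max = 1, log = TRUE)

    logSum <- logPriorR_0 + logPriorLatent + logPriorInf +
        logPriorAlpha + logPriorImmunity + logPriorRho

    return(ifelse(log, logSum, exp(logSum)))
}

# plug the new prior into the model
seitlDeter$dPrior <- my_dPrior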
A solution can be found here.
In order to take into account the results of previous empirical studies, we have re-run both chains for \(10^5\) iterations with informative priors on the latent and infectious periods:
names(mcmcSeitlInfoPriorTheta1)
## [1] "trace" "acceptanceRate" "covmatEmpirical"
names(mcmcSeitlInfoPriorTheta2)
## [1] "trace" "acceptanceRate" "covmatEmpirical"
Take 15 minutes to perform the same analysis as before and compare the posteriors with and without informative priors. Which parameters have significantly different posteriors? Does that make sense to you?
A solution can be found here.
We can compare the SEITL and SEIT4L models using the deviance information criterion (DIC).
The deviance of a parameter set \(\theta\) is defined as \[ D(\theta)=-2\log(p(y|\theta)) + C \] where \(p(y|\theta)\) is the likelihood of the data given \(\theta\) and \(C\) is a constant that will cancel out when comparing two models.
The DIC can be computed as \[ \mathrm{DIC}=D(\bar\theta) + 2p_D \] where \(\bar\theta\) is the mean of \(\theta\) with respect to the posterior distribution, and \(p_D\) is the effective number of parameters, which is approximately equal to: \[p_D=\frac{1}{2}\hat{\mathrm{var}}(D(\theta))\] that is half of the variance of the deviance with respect to the posterior distribution.
The idea is that models with smaller DIC should be preferred to models with larger DIC. Models are penalized both by the value of \(D(\bar\theta)\), which favours a good fit, and by the effective number of parameters \(p_D\). Since \(D(\bar\theta)\) decreases as the number of parameters in a model increases, the \(p_D\) term compensates for this effect by favouring models with a smaller number of effective parameters.
Compute the DIC for your model. A solution is provided here.
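As a sketch of how the DIC could be computed from an MCMC trace (taking the constant \(C = 0\) and assuming, as above, that the trace contains a logLikelihood column):
# deviance at each iteration of the chain (with C = 0)
trace <- mcmcSeitlTheta1$trace
deviance <- -2 * trace[, "logLikelihood"]

# theta_bar: posterior mean of the parameters
thetaBar <- colMeans(trace[, seitlDeter$thetaNames])

# deviance at theta_bar, using the log-likelihood of the deterministic model
initState <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
dThetaBar <- -2 * dTrajObs(seitlDeter, thetaBar, initState,
                           data = fluTdc1971, log = TRUE)

# effective number of parameters: half the posterior variance of the deviance
pD <- var(deviance) / 2

# DIC
dic <- dThetaBar + 2 * pD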
Now, it's time to compare the DIC of the SEITL and SEIT4L models. Which model should be preferred? Is the difference substantial?
You can have a look at the MRC FAQ on DIC to decide which model is the best.
The DIC can only tell you which one of two or more models fits the data best; it doesn't give you any information on whether the best model actually provides a good fit to the data overall (in the kingdom of the blind, the one-eyed man is king).
One way to assess the overall quality of a fit is to perform posterior predictive checks. This works by choosing a test statistic of the data (e.g., the peak, final size, etc.) and running many model replicates with parameter values sampled from the posterior.
One then tests how often in the simulations the test statistic takes a value as extreme or more extreme than the value it takes in the data. This allows us to compute a p-value (a frequentist quantity, strictly speaking).
If the data come out as an extreme case of model output (that is, the chosen test statistic or a more extreme value only comes up very rarely in the simulations), it would indicate that the model is not a good fit to the data.
Let's look at the maximum in the data:
max(fluTdc1971$obs)
## [1] 47
Now, write a function that takes as arguments a data frame of parameter samples, the number of samples to take, the model, the initial state and a data set, and which returns the Bayesian p-value of the test statistic in the data with respect to the posterior samples.
# This is a function that takes 5 arguments:
# - trace, a data frame containing samples from the posterior
# distribution, one column per parameter
# - nSamples, the number of samples to take
# - fitmodel, the model we use to generate replicates
# - initState, the initial state
# - data, the data set we have fit the model to
# It should return the two-sided p-value for the maximal observation
# in the data with respect to the model.
my_postPredCheck <- function(trace, nSamples, fitmodel, initState, data) {
# calculate maximum in obs column of data
# draw nSamples random row indices between 1
# and the number of rows in trace using the `sample` function
# initialise vector of model maxima
## start for() loop over sampled numbers
# get row from trace corresponding to sampled number
# use rTrajObs to generate observation trajectory using row
# from trace
# calculate maximum in model and add to vector of model maxima
## end for() loop
# calculate the 2-sided p-value: take the proportion of elements of
# maxModel that are greater or equal to the data maximum and the
# proportion that are less or equal, keep the smaller of the two,
# and multiply by 2 (because it is a 2-sided test)
# return two-sided p-value
}
If you have trouble filling any of the empty bits, have a look at our more guided example.
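Here is one possible implementation, as a sketch; it assumes that rTrajObs takes the model, a theta, an initial state and the observation times, and that the data has time and obs columns, as fluTdc1971 does:
my_postPredCheck <- function(trace, nSamples, fitmodel, initState, data) {
    # maximum in the obs column of the data
    maxData <- max(data$obs)
    # draw nSamples random row indices from the trace
    rows <- sample(seq_len(nrow(trace)), nSamples, replace = TRUE)
    # initialise vector of model maxima
    maxModel <- numeric(nSamples)
    for (i in seq_along(rows)) {
        # parameter set corresponding to the sampled row
        theta <- unlist(trace[rows[i], ])
        # generate an observation trajectory from this theta
        obsTraj <- rTrajObs(fitmodel, theta, initState, data$time)
        # maximum of the simulated observations
        maxModel[i] <- max(obsTraj$obs)
    }
    # two-sided p-value: twice the smaller tail proportion
    pValue <- 2 * min(mean(maxModel >= maxData), mean(maxModel <= maxData))
    return(pValue)
}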
Now, do a posterior predictive check using one of the mcmc runs analysed above. For example,
initState <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
my_postPredCheck(
trace = mcmcSeitlTheta1$trace[, seitlDeter$thetaNames],
nSamples = 100,
fitmodel = seitlDeter,
initState = initState,
data = fluTdc1971
)
## [1] 0.1
Of course, this is the outcome of a random draw (100 samples in this case), so it will give a different result every time. With more samples, you could improve the estimate.
Does the best model yield a "better" (higher) p-value?