using Random ## for random numbers
using Distributions ## for probability distributions
using DifferentialEquations ## for differential equations
using DataFrames ## for data frames
using Plots ## for plots
using StatsPlots ## for statistical plots
using Turing ## for probabilistic programming and MCMC
using MCMCChains ## for MCMC diagnostics
using AdvancedMH ## for custom MH proposals
using CSV ## for CSV file reading
using StatsBase ## for statistical functions
using JLD2 ## for loading pre-computed chains
using SSMProblems ## for state-space model interface
using GeneralisedFilters ## for particle filtering
using DrWatson ## for project path helpers (datadir)
Particle MCMC
Estimated time: 1.5-2 hours
Introduction
In the previous session, we learned how particle filters estimate likelihoods for stochastic models by simulating many trajectories and weighting them by how well they match observations. We also calibrated the number of particles needed for reliable inference.
Now we’ll put this into practice. This session shows how to combine particle filters with Metropolis-Hastings to perform full Bayesian inference on stochastic models — the Particle Marginal Metropolis-Hastings (PMMH) algorithm.
Objectives
In this session you will:
- Implement PMMH using our calibrated particle filter from the previous session
- Fit the stochastic SEIT4L model to the Tristan da Cunha outbreak
- Compare stochastic vs deterministic model fits
- Understand why PMMH requires long chains and adaptive proposals
PMMH uses particle filters internally, introducing two sources of randomness: the MCMC proposals and the particle filter likelihood estimates. Your chains will look different from others’ runs. Focus on whether the chains have converged (diagnostics), not exact values.
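To see the second source of randomness concretely, here is a toy Monte Carlo likelihood estimator standing in for the particle filter (the model and numbers are illustrative only, not the SEIT4L code). Repeated estimates at the same parameter value differ, and more particles shrink that spread:

```julia
using Random, Statistics

Random.seed!(42)

## Toy marginal likelihood p(y) = ∫ p(y|x) p(x) dx with y|x ~ Normal(x, 1)
## and x ~ Normal(0, 1), estimated by importance sampling from the prior,
## a crude stand-in for a bootstrap particle filter.
y_obs = 1.5

function estimate_loglik(n_particles)
    x = randn(n_particles)                            ## "particles" drawn from the prior
    w = exp.(-0.5 .* (y_obs .- x) .^ 2) ./ sqrt(2π)   ## Normal(x, 1) density at y_obs
    return log(mean(w))                               ## noisy log-likelihood estimate
end

## The estimate is random: its spread shrinks as the particle count grows
for n in (10, 100, 1000)
    reps = [estimate_loglik(n) for _ in 1:200]
    println("n = $n: SD of log-likelihood estimate ≈ ", round(std(reps), digits = 3))
end
```

The same calibration logic applies to the real filter: increase the particle count until the SD of repeated log-likelihood estimates falls into the 1-3 range.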
PMMH combines particle filters with MCMC, making it computationally demanding. For this session:
- Memory: ~4 GB RAM recommended
- Runtime: Each MCMC iteration takes 0.1-0.5 seconds (256 particles)
- Full inference: 10,000 iterations takes ~30-60 minutes; we use pre-computed chains for the main analysis
- Short test runs: The 500-iteration demonstrations in this session take ~2-5 minutes each
The pre-computed chains (loaded from data/) were generated with 500,000 iterations over several hours.
Source file
The source file of this session is located at sessions/pmcmc.qmd.
Load libraries
Initialisation
We set a random seed for reproducibility.
Random.seed!(1234)
TaskLocalRNG()
Recap: Data and particle filter
We load the Tristan da Cunha data and set up initial conditions:
# Load Tristan da Cunha data
flu_tdc = CSV.read(datadir("flu_tdc_1971.csv"), DataFrame)
# Initial state for SEIT4L
init_state = Dict(
:S => 279.0,
:E => 0.0,
:I => 2.0,
:T1 => 3.0,
:T2 => 0.0,
:T3 => 0.0,
:T4 => 0.0,
:L => 0.0
)
Dict{Symbol, Float64} with 8 entries:
:I => 2.0
:T2 => 0.0
:L => 0.0
:S => 279.0
:T4 => 0.0
:T1 => 3.0
:E => 0.0
:T3 => 0.0
In the Particle Filters session, we:
- Implemented the SEIT4L Gillespie simulator (gillespie_step)
- Defined the SSMProblems.jl interface (SEIT4LDynamics, PoissonObservation, SEIT4LInitial)
- Calibrated the particle count: 128-256 particles gave a log-likelihood SD between 1 and 3
All these components were loaded from src/seit4l_particle_filter.jl in the setup above. Now we’ll use them for full Bayesian inference.
Particle MCMC (PMMH) with Turing.jl
Now that we have a calibrated particle filter, we can use it for inference. We’ll implement Particle Marginal Metropolis-Hastings (PMMH) - one of the two main particle MCMC algorithms from Andrieu et al. (2010).
PMMH uses the particle filter to estimate the likelihood, then does standard Metropolis-Hastings on the parameters. In Turing.jl, we implement this by:
- Defining priors as usual with ~
- Building a state-space model and running GeneralisedFilters.filter() to get the log-likelihood
- Adding it to the model's log-probability with @addlogprob!
- Sampling with MH() using custom random-walk proposals
"""
PMMH model for SEIT4L inference.
Uses GeneralisedFilters for likelihood estimation.
"""
@model function pmmh_seit4l(obs, n_particles)
# Weakly informative priors
R_0 ~ truncated(Normal(3.0, 2.0), lower=1.0)
D_lat ~ truncated(Normal(2.0, 1.0), lower=0.5)
D_inf ~ truncated(Normal(3.0, 2.0), lower=0.5)
α ~ Beta(2, 2)
D_imm ~ truncated(Normal(15.0, 10.0), lower=1.0)
ρ ~ Beta(2, 2)
# Create parameter dictionary
θ = Dict(:R_0 => R_0, :D_lat => D_lat, :D_inf => D_inf,
:α => α, :D_imm => D_imm, :ρ => ρ)
# Build state-space model
# The SSMProblems interface expects a plain vector, not a Dict
# Order: [S, E, I, T1, T2, T3, T4, L]
init_state = [279.0, 0.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0]
ssm = StateSpaceModel(SEIT4LInitial(init_state),
SEIT4LDynamics(θ),
PoissonObservation(θ[:ρ]))
# Run particle filter to estimate log-likelihood
_, log_lik = GeneralisedFilters.filter(Random.default_rng(), ssm, BF(n_particles), obs)
# Add log-likelihood to model (priors are already included via ~)
Turing.@addlogprob! log_lik
end
Main.Notebook.pmmh_seit4l
- StateSpaceModel(...) - assembles the state-space model from initial distribution, dynamics, and observation components defined in src/seit4l_particle_filter.jl.
- GeneralisedFilters.filter(...) - runs the bootstrap particle filter (BF) and returns the estimated log-likelihood.
- Random.default_rng() provides the global random number generator — this is Julia’s way of passing randomness explicitly rather than relying on a hidden global state.
- @addlogprob! - manually adds the particle filter’s log-likelihood estimate to the model. The priors specified with ~ still contribute automatically. This is the recommended pattern for integrating external likelihood estimators with Turing.
- Why 256 particles? PMMH acceptance depends on the variance of the log-likelihood estimate. Using more particles reduces variance but increases computation.
Now let’s run PMMH. Unlike standard MCMC where NUTS auto-tunes proposals, PMMH requires Metropolis-Hastings because the particle filter likelihood is not differentiable.
Let’s start with a short chain to see the challenges:
# Define model with 256 particles
model_pmmh = pmmh_seit4l(flu_tdc.obs, 256)
# Run a short chain with RAM
# externalsampler() wraps an AdvancedMH.jl sampler for use with Turing models
# check_model=false is needed because our model uses @addlogprob! instead of standard ~ syntax
chain_short = sample(
model_pmmh,
externalsampler(AdvancedMH.RobustAdaptiveMetropolis()),
500;
check_model=false,
progress=false
)
Chains MCMC chain (500×12×1 Array{Float64, 3}):
Iterations = 1:1:500
Number of chains = 1
Samples per chain = 500
Wall duration = 74.25 seconds
Compute duration = 74.25 seconds
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
internals = logα, η, accepted, logprior, loglikelihood, logjoint
Use `describe(chains)` for summary statistics and quantiles.
# Plot the short chain
plot(chain_short)
Look carefully at the trace plots above. You should see:
- Flat stretches (plateaus): The chain gets stuck at the same value for many iterations
- Slow wandering: Instead of rapidly exploring, the chain drifts gradually
With only 500 samples, the chain hasn’t had time to properly explore the posterior.
Try running a short PMMH chain yourself:
- Run sample(model_pmmh, externalsampler(AdvancedMH.RobustAdaptiveMetropolis()), 200; check_model=false) and look at the trace plots
- What acceptance rate does the sampler achieve? (Look at the proportion of samples that change between iterations)
- Try reducing the particle count to 64: pmmh_seit4l(flu_tdc.obs, 64). How does this affect mixing? Is it faster but noisier?
- Compare to the pre-computed chain loaded below — how many more iterations were needed for convergence?
Why is PMMH mixing difficult?
The particle filter gives a noisy estimate of the likelihood. Even at the same parameters, different runs give different values. This noise means proposals that would be accepted with an exact likelihood may be rejected (and vice versa), slowing exploration.
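The sticking behaviour is easiest to see in a from-scratch PMMH loop. This is a hedged toy sketch (a Normal model with artificial estimator noise standing in for the particle filter), not the session's Turing code. Note that the noisy estimate at the current point is stored and reused; reusing the stored estimate is what makes the pseudo-marginal construction exact, but it also means a chain that lands on a lucky over-estimate can stay stuck there for many iterations:

```julia
using Random, Statistics

Random.seed!(1)

## Toy "model": data are Normal(θ, 1); the log-likelihood is made artificially
## noisy (SD 0.5) to mimic a particle filter's estimate.
data = randn(30) .+ 2.0
noisy_loglik(θ) = sum(-0.5 .* (data .- θ) .^ 2) + 0.5 * randn()
logprior(θ) = -0.5 * θ^2 / 100            ## Normal(0, 10) prior

function pmmh(n_iter; step = 0.3)
    θ = 0.0
    logpost = logprior(θ) + noisy_loglik(θ)   ## noisy estimate, stored and reused
    chain = Float64[]
    for _ in 1:n_iter
        θ′ = θ + step * randn()               ## random-walk proposal
        logpost′ = logprior(θ′) + noisy_loglik(θ′)
        if log(rand()) < logpost′ - logpost   ## accept/reject on the *estimates*
            θ, logpost = θ′, logpost′
        end
        push!(chain, θ)
    end
    return chain
end

chain = pmmh(5_000)
println("posterior mean ≈ ", round(mean(chain[1_000:end]), digits = 2))
```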
Robust Adaptive Metropolis (RAM) helps by automatically learning the posterior covariance structure during sampling (Vihola, 2012).
Running PMMH with RAM for good mixing takes several hours. We load pre-computed chains here (500,000 samples, 50,000 burnin, thinned by 50). See scripts/generate_pmcmc_seit4l.jl if you want to run it yourself.
# Load pre-computed SEIT4L chain with good mixing
chain_df = CSV.read(datadir("pmcmc_seit4l_chain.csv"), DataFrame)
chain_pmcmc = Chains(Matrix(chain_df[:, [:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ]]),
[:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ])
Chains MCMC chain (9000×6×1 Array{Float64, 3}):
Iterations = 1:1:9000
Number of chains = 1
Samples per chain = 9000
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
Use `describe(chains)` for summary statistics and quantiles.
# Display chain summary
display(chain_pmcmc)
Chains MCMC chain (9000×6×1 Array{Float64, 3}):
Iterations = 1:1:9000
Number of chains = 1
Samples per chain = 9000
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
Use `describe(chains)` for summary statistics and quantiles.
# Plot trace plots - much better mixing!
plot(chain_pmcmc)
Compare these traces to the short chain above - the well-mixed chain moves rapidly across the posterior, without the long plateaus and slow drift seen in the short run.
# Plot posterior densities (3 rows x 2 columns for 6 parameters)
density(chain_pmcmc, size = (800, 900), layout = (3, 2))
With standard MH and independent proposals, mixing is poor when parameters are correlated. The sampler proposes moves that go “against” the correlation structure and gets rejected.
RAM solves this by learning the posterior covariance during sampling (Vihola, 2012). It automatically:
- Adapts the proposal covariance to match the posterior shape
- Scales proposals to achieve ~23% acceptance (optimal for random-walk MH)
This is similar to how NUTS adapts its mass matrix, but works for non-differentiable likelihoods like particle filters.
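As a hedged sketch of the mechanics (not the AdvancedMH implementation), a single RAM update rescales the Cholesky factor of the proposal covariance in the direction of the last proposal, nudging the average acceptance rate towards the target:

```julia
using LinearAlgebra, Random

Random.seed!(7)

## One RAM covariance update (after Vihola, 2012): proposals are X + S*u with
## u ~ N(0, I); after each accept/reject the factor S is adjusted so that the
## acceptance rate is coerced towards the target (≈ 0.234).
function ram_update(S, u, accept_prob; target = 0.234, η = 0.05)
    M = S * (I + η * (accept_prob - target) * (u * u') / dot(u, u)) * S'
    return cholesky(Symmetric(M)).L   ## new lower-triangular proposal factor
end

S = Matrix(1.0I, 2, 2)      ## initial proposal factor (identity)
u = randn(2)
S = ram_update(S, u, 0.9)   ## acceptance too high → scale grows along u
println(diag(S))
```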
Comparing with deterministic model fits
We quickly fit the deterministic SEIT4L model for comparison:
"""
SEIT4L deterministic model.
"""
function seit4l_ode!(du, u, p, t)
R_0, D_lat, D_inf, α, D_imm = p
β = R_0 / D_inf
ϵ = 1.0 / D_lat
ν = 1.0 / D_inf
τ = 4.0 / D_imm
S, E, I, T1, T2, T3, T4, L, Inc = u
N = S + E + I + T1 + T2 + T3 + T4 + L
du[1] = -β * S * I / N + (1 - α) * τ * T4
du[2] = β * S * I / N - ϵ * E
du[3] = ϵ * E - ν * I
du[4] = ν * I - τ * T1
du[5] = τ * T1 - τ * T2
du[6] = τ * T2 - τ * T3
du[7] = τ * T3 - τ * T4
du[8] = α * τ * T4
du[9] = ϵ * E
end
@model function seit4l_deterministic_model(init_state, times, n_obs)
# Weakly informative priors
R_0 ~ truncated(Normal(3.0, 2.0), lower=1.0)
D_lat ~ truncated(Normal(2.0, 1.0), lower=0.5)
D_inf ~ truncated(Normal(3.0, 2.0), lower=0.5)
α ~ Beta(2, 2)
D_imm ~ truncated(Normal(15.0, 10.0), lower=1.0)
ρ ~ Beta(2, 2)
# Solve ODE
params = [R_0, D_lat, D_inf, α, D_imm]
u0 = [init_state[:S], init_state[:E], init_state[:I],
init_state[:T1], init_state[:T2], init_state[:T3], init_state[:T4],
init_state[:L], 0.0]
prob = ODEProblem(seit4l_ode!, u0, (times[1], times[end]), params)
sol = solve(prob, Tsit5(), saveat=times)
# Extract daily incidence
inc_cumulative = sol[9, :]
inc_daily = [0.0; diff(inc_cumulative)]
# Likelihood: Poisson observations
# obs[i] corresponds to inc_daily[i] (times 0-59, indices 1-60)
lambdas = [max(ρ * inc_daily[i], 1e-10) for i in 1:n_obs]
obs ~ arraydist(Poisson.(lambdas))
end
seit4l_deterministic_model (generic function with 2 methods)
Why | conditioning here but @addlogprob! above?
The deterministic model uses obs ~ arraydist(Poisson.(lambdas)) — the likelihood is expressed as a ~ statement, so Turing can condition on the data directly with |.
The stochastic pmmh_seit4l model uses an external likelihood estimator (the particle filter via GeneralisedFilters.filter), which returns a log-likelihood that we add manually with @addlogprob!. In that model, there is no obs ~ statement in the model body — obs enters as a function argument passed to the filter — so | has nothing to condition on.
In short: use | when the likelihood is a ~ statement; use @addlogprob! when the likelihood comes from an external computation.
# Fit deterministic model
times_obs = 0.0:1.0:59.0
model_det = seit4l_deterministic_model(init_state, times_obs, length(flu_tdc.obs)) | (; obs = flu_tdc.obs)
chain_det = sample(model_det, NUTS(), 2000, progress=false)
┌ Info: Found initial step size
└ ϵ = 0.05
Chains MCMC chain (2000×20×1 Array{Float64, 3}):
Iterations = 1001:1:3000
Number of chains = 1
Samples per chain = 2000
Wall duration = 24.76 seconds
Compute duration = 24.76 seconds
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
internals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint
Use `describe(chains)` for summary statistics and quantiles.
# Compare posteriors
p1 = density(chain_det, :R_0, title = "R_0 (Deterministic)", legend = false)
p2 = density(chain_pmcmc, :R_0, title = "R_0 (Stochastic)", legend = false)
p3 = density(chain_det, :α, title = "α (Deterministic)", legend = false)
p4 = density(chain_pmcmc, :α, title = "α (Stochastic)", legend = false)
plot(p1, p2, p3, p4, layout = (2, 2), size = (800, 600))
Look at the posterior density plots above and compare the two models:
- Which model gives wider posterior distributions? Why would accounting for stochastic variability change the uncertainty in parameter estimates?
- Focus on the \(\alpha\) parameter (probability of long-term immunity). Do the two models agree on its value?
- For a population of 284 people, do you think the deterministic approximation is reasonable, or does stochasticity matter?
For the Tristan da Cunha outbreak with only 284 individuals, demographic stochasticity plays a significant role. The stochastic model typically shows wider posteriors because it accounts for an additional source of variability — the randomness of individual infection events — that the deterministic model ignores. The deterministic model is much faster to fit and useful for initial exploration, but for small populations the stochastic model gives more honest uncertainty estimates.
Model comparison: SEITL vs SEIT4L
In the SEITL session, we introduced two model variants that both explain the two-wave pattern through temporary immunity:
- SEITL: Single T compartment with exponential duration (memoryless)
- SEIT4L: Four T sub-compartments giving Erlang-4 duration (more realistic timing)
Which model better explains the data? We fit both and compare.
Fitting SEITL for comparison
The SEITL model uses the same structure as SEIT4L, but with a single T compartment instead of four. The implementation follows the same pattern we used for SEIT4L: a Gillespie simulator for stochastic dynamics, a bootstrap particle filter to estimate the likelihood, and a Turing @model that adds the log-likelihood via @addlogprob!. Since the code structure is identical, we load pre-computed chains rather than repeating the implementation. See scripts/generate_pmcmc_seitl.jl if you want to examine the full code.
Both models were fitted using RAM with 500,000 iterations, 50,000 burn-in, and thinned by 50:
# Load pre-computed SEITL chain
chain_seitl_df = CSV.read(datadir("pmcmc_seitl_chain.csv"), DataFrame)
chain_seitl = Chains(Matrix(chain_seitl_df[:, [:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ]]),
[:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ])
Chains MCMC chain (9000×6×1 Array{Float64, 3}):
Iterations = 1:1:9000
Number of chains = 1
Samples per chain = 9000
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
Use `describe(chains)` for summary statistics and quantiles.
# Compare the fits visually
p1 = density(chain_seitl, :R_0, title="R_0", label="SEITL")
density!(p1, chain_pmcmc, :R_0, label="SEIT4L")
p2 = density(chain_seitl, :D_imm, title="D_imm", label="SEITL")
density!(p2, chain_pmcmc, :D_imm, label="SEIT4L")
plot(p1, p2, layout=(1, 2), size=(800, 400))
Model comparison
Looking at the posterior densities above, we can see differences between the two models. But which model better explains the data? This is the question of model comparison or model selection.
Approaches to model comparison
Information criteria estimate out-of-sample predictive accuracy by penalising model complexity:
- DIC (Deviance Information Criterion): Adds a penalty based on “effective number of parameters”. Easy to compute but can be unstable for complex models.
- WAIC (Widely Applicable Information Criterion): More robust than DIC, based on pointwise predictive density. Asymptotically equivalent to leave-one-out cross-validation.
- LOO-CV (Leave-One-Out Cross-Validation): Gold standard for predictive accuracy. ParetoSmooth.jl provides efficient PSIS-LOO approximation.
For particle MCMC, these criteria require care because the likelihood estimates are noisy. The noise doesn’t bias the posterior, but it does affect pointwise likelihood calculations needed for WAIC and LOO-CV. Solutions include using more particles when computing criteria, or averaging over multiple likelihood estimates per parameter sample.
Bayes factors compare marginal likelihoods directly:
\[\text{BF}_{12} = \frac{P(\text{data}|M_1)}{P(\text{data}|M_2)}\]
Values > 10 indicate strong evidence for \(M_1\); values < 0.1 indicate strong evidence for \(M_2\). The marginal likelihood can be estimated from particle filter output using techniques like path sampling or thermodynamic integration, but these are computationally intensive.
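In practice the comparison is done on the log scale to avoid overflow; the marginal likelihoods below are made-up numbers purely for illustration:

```julia
## Hypothetical log marginal likelihood estimates (not values from this session)
log_ml_seit4l = -123.4   ## log P(data | SEIT4L)
log_ml_seitl  = -126.9   ## log P(data | SEITL)

## Subtract on the log scale, exponentiate only at the end
log_bf = log_ml_seit4l - log_ml_seitl
println("BF(SEIT4L vs SEITL) ≈ ", round(exp(log_bf), digits = 1))  ## > 10 → strong evidence
```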
Posterior predictive checks provide a qualitative but intuitive comparison: which model generates data that looks more like the observations? This doesn’t give a single number, but can reveal how models differ in their predictions. A full posterior predictive comparison would simulate many datasets from each fitted model and compare them to the observed data - we leave this as an exercise.
Comparing parameter estimates
We can compare how the two models estimate the same parameters:
# Posterior means from each model
println("Posterior mean estimates:")
println("SEITL:")
println(" R_0 = $(round(mean(chain_seitl[:R_0]), digits=2))")
println(" D_imm = $(round(mean(chain_seitl[:D_imm]), digits=2))")
println("SEIT4L:")
println(" R_0 = $(round(mean(chain_pmcmc[:R_0]), digits=2))")
println(" D_imm = $(round(mean(chain_pmcmc[:D_imm]), digits=2))")
Posterior mean estimates:
SEITL:
R_0 = 7.05
D_imm = 17.56
SEIT4L:
R_0 = 10.8
D_imm = 12.28
The key difference is in the D_imm (duration of temporary immunity) estimates. SEIT4L, with its Erlang-distributed immunity duration, typically estimates a shorter mean duration but with less variance in timing.
What the original study found
Camacho et al. (2011) compared SEITL and SEIT4L using DIC and found that SEIT4L provided a better fit (lower DIC). This suggests that the timing of immunity loss matters - it’s not memoryless, as the exponential distribution in SEITL would imply.
The Erlang-4 distribution in SEIT4L has a coefficient of variation of 0.5 (compared to 1.0 for exponential), meaning individuals lose immunity at more consistent times. For the Tristan da Cunha data, this more realistic timing distribution better captures the second wave dynamics.
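You can check the coefficient of variation by simulation: the SEIT4L immunity duration is the sum of four exponential stages, each with mean D_imm / 4. This minimal sketch uses inverse-CDF sampling and is independent of the session's simulator:

```julia
using Random, Statistics

Random.seed!(3)

## Sum of four exponential stages with mean D_imm / 4 each: an Erlang(4)
## distribution with coefficient of variation 1/√4 = 0.5.
D_imm = 15.0
durations = [sum(-(D_imm / 4) .* log.(rand(4))) for _ in 1:100_000]
cv = std(durations) / mean(durations)
println("Erlang-4 CV ≈ ", round(cv, digits = 2))   ## ≈ 0.5; an exponential would give 1.0
```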
Practical recommendations
For particle MCMC model comparison:
- Start with posterior predictive checks - do the models make qualitatively different predictions?
- Use DIC for quick comparisons - it’s easy to compute and gives a rough ranking
- Use WAIC or LOO-CV for publication - more robust, with standard errors
- Be cautious with Bayes factors - they’re sensitive to prior choices and computationally expensive
When criteria disagree, consider what question you’re actually asking: best predictions (information criteria) or most probable model (Bayes factors)?
Generate posterior predictive simulations from both SEITL and SEIT4L and compare them visually:
# For each model chain, sample parameters and simulate trajectories
function posterior_predictive_sims(chain, simulate_fn, init_state, times, n_sims=50)
plots_data = []
for i in rand(1:length(chain), n_sims)
θ = Dict(:R_0 => chain[:R_0][i], :D_lat => chain[:D_lat][i],
:D_inf => chain[:D_inf][i], :α => chain[:α][i],
:D_imm => chain[:D_imm][i], :ρ => chain[:ρ][i])
traj = simulate_fn(θ, init_state, times)
obs_sim = rand.(Poisson.(max.(θ[:ρ] .* traj.Inc, 1e-10)))
push!(plots_data, obs_sim)
end
return plots_data
end
- Which model produces simulated epidemics that look more like the observed data?
- Does either model consistently miss the second wave timing?
- How does the spread of simulations differ between models?
Summary
Stochastic epidemic models have transitions that are random events rather than smooth flows. Standard MCMC fails because the likelihood depends on the unknown sequence of random events, which cannot be written analytically.
Particle MCMC solves this by embedding particle filters within MCMC. The particle filter estimates the likelihood by simulating many trajectories and weighting them by how well they match data. PMMH uses this noisy but unbiased estimate in a Metropolis-Hastings framework - the noise slows mixing but doesn’t bias the posterior.
We fitted both SEITL and SEIT4L to the Tristan da Cunha data. Calibration is critical: too few particles gives high variance (poor mixing), too many wastes computation. RAM adapts proposal covariances automatically, essential for the correlated parameters typical in epidemic models.
- Particle filters estimate likelihoods for stochastic models by simulating many trajectories (particles) and weighting them by how well they match the data
- Number of particles should be calibrated so the standard deviation of the log-likelihood estimate is between 1 and 3. The required number scales with problem complexity.
- Stochastic vs deterministic: stochastic models capture demographic stochasticity (important for small populations) but are much slower to fit
- Model comparison using information criteria helps choose between competing mechanistic explanations, though applying them to particle MCMC requires care
Going further
Multiple chains and diagnostics
For production PMMH, you should run multiple chains and check convergence:
# Run multiple chains in parallel
chain_multi = sample(
model_pmmh,
externalsampler(AdvancedMH.RobustAdaptiveMetropolis()),
MCMCThreads(),
10000,
4;
check_model=false
)
# Check R̂ and ESS statistics
summarystats(chain_multi)
Use the same diagnostics (R̂ < 1.1, ESS > 100) as for standard MCMC.
Alternative: Manual proposal tuning
If you need more control, you can use MH() with a covariance matrix instead of RAM. This is the same approach we used in the MCMC session:
# Manual covariance matrix (6 parameters, so 6x6 matrix)
# Diagonal entries are proposal variances, off-diagonal are covariances
Σ = [4.0 0.0 0.0 0.0 0.0 0.0; # R_0
0.0 0.04 0.0 0.0 0.0 0.0; # D_lat
0.0 0.0 0.25 0.0 0.0 0.0; # D_inf
0.0 0.0 0.0 0.01 0.0 0.0; # α
0.0 0.0 0.0 0.0 4.0 0.0; # D_imm
0.0 0.0 0.0 0.0 0.0 0.01] # ρ
chain = sample(model_pmmh, MH(Σ), 5000)
This requires choosing proposal variances for each parameter - too large gives low acceptance, too small gives slow exploration. In practice, RAM is easier because it learns the covariance structure automatically.
Particle Gibbs
Particle Gibbs (PG) is another PMCMC method, available in Turing via PG(n_particles).
The methodological difference: PMMH uses the particle filter to estimate the marginal likelihood, then does standard Metropolis-Hastings on the parameters. PG conditions on a “reference trajectory” - one particle path from the previous iteration is retained, and the other particles are sampled conditional on it.
PG can exhibit high autocorrelation in some settings - Lindsten et al. (2014) discuss this and propose the Particle Gibbs with Ancestor Sampling (PGAS) algorithm as an improvement.
Filtered trajectories
PMCMC methods can provide filtered trajectories - stochastic trajectories from the model conditioned on both the estimated parameters and the observed data. These are useful for visualising what the underlying epidemic dynamics might have looked like, given what we observed.
To extract a trajectory from our particle filter, we’d sample one particle at the final time point (weighted by the particle weights) and reconstruct its path back through time. This gives one draw from the posterior distribution over trajectories. Repeating across MCMC iterations gives a collection of plausible epidemic histories.
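A sketch of the backtracking step, assuming the filter recorded an ancestor-index matrix (this bookkeeping is hypothetical; the actual storage depends on the filter implementation):

```julia
using Random

Random.seed!(11)

## ancestors[t, i] is assumed to record which particle at time t-1 particle i
## at time t was resampled from. Here we fill it with dummy indices.
n_particles, T = 5, 4
ancestors = rand(1:n_particles, T, n_particles)
weights = fill(1 / n_particles, n_particles)   ## final-time weights (uniform here)

## Sample a final particle proportional to its weight, then follow its ancestry
## back through time to recover one trajectory's particle indices.
k = findfirst(cumsum(weights) .>= rand())
path = zeros(Int, T)
path[T] = k
for t in T-1:-1:1
    path[t] = ancestors[t + 1, path[t + 1]]
end
println("sampled trajectory indices: ", path)
```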
See Doucet & Johansen (2009) for details on trajectory sampling from particle filters.
SMC²
SMC² (SMC squared) uses Sequential Monte Carlo for both parameters and state trajectories, avoiding MCMC entirely. This can be more efficient for complex models but requires more careful tuning. Turing.jl supports this through the SMC sampler.
Next session
In Observation Models and Distance Metrics, we’ll step back and examine a fundamental question: why do we choose Poisson likelihoods, and what are the alternatives?
References
- Andrieu et al. (2010). Particle Markov chain Monte Carlo methods. JRSS B 72(3), 269-342. (The original PMCMC paper)
- Pitt et al. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Biometrika 99(2), 269-284. (Particle calibration: target log-likelihood SD of 1-3)
- Vihola (2012). Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing 22, 997-1008. (RAM algorithm used in this session)
- Roberts & Rosenthal (2009). Examples of Adaptive MCMC. J. Computational and Graphical Statistics 18(2), 349-367. (Adaptive MCMC theory)
- Doucet & Johansen (2009). A tutorial on particle filtering and smoothing. Handbook of Nonlinear Filtering 12, 656-704. (Trajectory sampling)
- Doucet et al. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102(2), 295-313.
- Doucet et al. (2001). Sequential Monte Carlo Methods in Practice. Springer. (Comprehensive textbook)
- Arulampalam et al. (2002). A tutorial on particle filters. IEEE Trans. Signal Processing 50(2), 174-188.
- Turing.jl documentation - Probabilistic programming in Julia