using Random ## for random numbers
using Distributions ## for probability distributions
using DifferentialEquations ## for differential equations
using DataFrames ## for data frames
using Plots ## for plots
using StatsPlots ## for statistical plots
using Turing ## for probabilistic programming and MCMC
using MCMCChains ## for MCMC diagnostics
using AdvancedMH ## for custom MH proposals
using CSV ## for CSV file reading
using StatsBase ## for statistical functions
using JLD2 ## for loading pre-computed chains
using SSMProblems ## for state-space model interface
using GeneralisedFilters ## for particle filtering
using DrWatson ## for project path helpers (datadir)
Particle MCMC
Estimated time: 1.5-2 hours
Introduction
In the previous session, we learned how particle filters estimate likelihoods for stochastic models by simulating many trajectories and weighting them by how well they match observations. We also calibrated the number of particles needed for reliable inference.
Now we’ll put this into practice. This session shows how to combine particle filters with Metropolis-Hastings to perform full Bayesian inference on stochastic models — the Particle Marginal Metropolis-Hastings (PMMH) algorithm.
Objectives
In this session you will:
- Implement PMMH using our calibrated particle filter from the previous session
- Fit the stochastic SEIT4L model to the Tristan da Cunha outbreak
- Compare stochastic vs deterministic model fits
- Understand why PMMH requires long chains and adaptive proposals
PMMH uses particle filters internally, introducing two sources of randomness: the MCMC proposals and the particle filter likelihood estimates. Your chains will look different from others’ runs. Focus on whether the chains have converged (diagnostics), not exact values.
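To see the second source of randomness concretely, here is a toy Monte Carlo likelihood estimator standing in for the particle filter (the model and numbers are illustrative only, not the SEIT4L code). Repeated estimates at the same parameter value differ, and more particles shrink that spread:

```julia
using Random, Statistics

Random.seed!(42)

## Toy marginal likelihood p(y) = ∫ p(y|x) p(x) dx with y|x ~ Normal(x, 1)
## and x ~ Normal(0, 1), estimated by importance sampling from the prior,
## a crude stand-in for a bootstrap particle filter.
y_obs = 1.5

function estimate_loglik(n_particles)
    x = randn(n_particles)                            ## "particles" drawn from the prior
    w = exp.(-0.5 .* (y_obs .- x) .^ 2) ./ sqrt(2π)   ## Normal(x, 1) density at y_obs
    return log(mean(w))                               ## noisy log-likelihood estimate
end

## The estimate is random: its spread shrinks as the particle count grows
for n in (10, 100, 1000)
    reps = [estimate_loglik(n) for _ in 1:200]
    println("n = $n: SD of log-likelihood estimate ≈ ", round(std(reps), digits = 3))
end
```

The same calibration logic applies to the real filter: increase the particle count until the SD of repeated log-likelihood estimates falls into the 1-3 range.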
PMMH combines particle filters with MCMC, making it computationally demanding. For this session:
- Memory: ~4 GB RAM recommended
- Runtime: Each MCMC iteration takes 0.1-0.5 seconds (256 particles)
- Full inference: 10,000 iterations takes ~30-60 minutes; we use pre-computed chains for the main analysis
- Short test runs: The 500-iteration demonstrations in this session take ~2-5 minutes each
The pre-computed chains (loaded from data/) were generated with 500,000 iterations over several hours.
Source file
The source file of this session is located at sessions/pmcmc.qmd.
Load libraries
Initialisation
We set a random seed for reproducibility.
Random.seed!(1234)
TaskLocalRNG()
Recap: Data and particle filter
We load the Tristan da Cunha data and set up initial conditions:
# Load Tristan da Cunha data
flu_tdc = CSV.read(datadir("flu_tdc_1971.csv"), DataFrame)
# Initial state for SEIT4L
init_state = Dict(
:S => 279.0,
:E => 0.0,
:I => 2.0,
:T1 => 3.0,
:T2 => 0.0,
:T3 => 0.0,
:T4 => 0.0,
:L => 0.0
)
Dict{Symbol, Float64} with 8 entries:
:I => 2.0
:T2 => 0.0
:L => 0.0
:S => 279.0
:T4 => 0.0
:T1 => 3.0
:E => 0.0
:T3 => 0.0
In the Particle Filters session, we:
- Implemented the SEIT4L Gillespie simulator (gillespie_step)
- Defined the SSMProblems.jl interface (SEIT4LDynamics, PoissonObservation, SEIT4LInitial)
- Calibrated the particle count: 128-256 particles gave a log-likelihood SD between 1 and 3
All these components were loaded from src/seit4l_particle_filter.jl in the setup above. Now we’ll use them for full Bayesian inference.
Particle MCMC (PMMH) with Turing.jl
Now that we have a calibrated particle filter, we can use it for inference. We’ll implement Particle Marginal Metropolis-Hastings (PMMH) - one of the two main particle MCMC algorithms from Andrieu et al. (2010).
PMMH uses the particle filter to estimate the likelihood, then does standard Metropolis-Hastings on the parameters. In Turing.jl, we implement this by:
- Defining priors as usual with ~
- Building a state-space model and running GeneralisedFilters.filter() to get the log-likelihood
- Adding it to the model's log-probability with @addlogprob!
- Sampling with MH() using custom random-walk proposals
"""
PMMH model for SEIT4L inference.
Uses GeneralisedFilters for likelihood estimation.
"""
@model function pmmh_seit4l(obs, n_particles)
# Weakly informative priors
R_0 ~ truncated(Normal(3.0, 2.0), lower=1.0)
D_lat ~ truncated(Normal(2.0, 1.0), lower=0.5)
D_inf ~ truncated(Normal(3.0, 2.0), lower=0.5)
α ~ Beta(2, 2)
D_imm ~ truncated(Normal(15.0, 10.0), lower=1.0)
ρ ~ Beta(2, 2)
# Create parameter dictionary
θ = Dict(:R_0 => R_0, :D_lat => D_lat, :D_inf => D_inf,
:α => α, :D_imm => D_imm, :ρ => ρ)
# Build state-space model
# The SSMProblems interface expects a plain vector, not a Dict
# Order: [S, E, I, T1, T2, T3, T4, L]
init_state = [279.0, 0.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0]
ssm = StateSpaceModel(SEIT4LInitial(init_state),
SEIT4LDynamics(θ),
PoissonObservation(θ[:ρ]))
# Run particle filter to estimate log-likelihood
_, log_lik = GeneralisedFilters.filter(Random.default_rng(), ssm, BF(n_particles), obs)
# Add log-likelihood to model (priors are already included via ~)
Turing.@addlogprob! log_lik
end
Main.Notebook.pmmh_seit4l
- StateSpaceModel(...) - assembles the state-space model from initial distribution, dynamics, and observation components defined in src/seit4l_particle_filter.jl.
- GeneralisedFilters.filter(...) - runs the bootstrap particle filter (BF) and returns the estimated log-likelihood.
- Random.default_rng() provides the global random number generator — this is Julia’s way of passing randomness explicitly rather than relying on a hidden global state.
- @addlogprob! - manually adds the particle filter’s log-likelihood estimate to the model. The priors specified with ~ still contribute automatically. This is the recommended pattern for integrating external likelihood estimators with Turing.
- Why 256 particles? PMMH acceptance depends on the variance of the log-likelihood estimate. Using more particles reduces variance but increases computation.
Now let’s run PMMH. Unlike standard MCMC where NUTS auto-tunes proposals, PMMH requires Metropolis-Hastings because the particle filter likelihood is not differentiable.
Let’s start with a short chain to see the challenges:
# Define model with 256 particles
model_pmmh = pmmh_seit4l(flu_tdc.obs, 256)
# Run a short chain with RAM
# externalsampler() wraps an AdvancedMH.jl sampler for use with Turing models
# check_model=false is needed because our model uses @addlogprob! instead of standard ~ syntax
chain_short = sample(
model_pmmh,
externalsampler(AdvancedMH.RobustAdaptiveMetropolis()),
500;
check_model=false,
progress=false
)
Chains MCMC chain (500×12×1 Array{Float64, 3}):
Iterations = 1:1:500
Number of chains = 1
Samples per chain = 500
Wall duration = 74.25 seconds
Compute duration = 74.25 seconds
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
internals = logα, η, accepted, logprior, loglikelihood, logjoint
Use `describe(chains)` for summary statistics and quantiles.
# Plot the short chain
plot(chain_short)
Look carefully at the trace plots above. You should see:
- Flat stretches (plateaus): The chain gets stuck at the same value for many iterations
- Slow wandering: Instead of rapidly exploring, the chain drifts gradually
With only 500 samples, the chain hasn’t had time to properly explore the posterior.
Try running a short PMMH chain yourself:
- Run sample(model_pmmh, externalsampler(AdvancedMH.RobustAdaptiveMetropolis()), 200; check_model=false) and look at the trace plots
- What acceptance rate does the sampler achieve? (Look at the proportion of samples that change between iterations)
- Try reducing the particle count to 64: pmmh_seit4l(flu_tdc.obs, 64). How does this affect mixing? Is it faster but noisier?
- Compare to the pre-computed chain loaded below — how many more iterations were needed for convergence?
Why is PMMH mixing difficult?
The particle filter gives a noisy estimate of the likelihood. Even at the same parameters, different runs give different values. This noise means proposals that would be accepted with an exact likelihood may be rejected (and vice versa), slowing exploration.
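The sticking behaviour is easiest to see in a from-scratch PMMH loop. This is a hedged toy sketch (a Normal model with artificial estimator noise standing in for the particle filter), not the session's Turing code. Note that the noisy estimate at the current point is stored and reused; reusing the stored estimate is what makes the pseudo-marginal construction exact, but it also means a chain that lands on a lucky over-estimate can stay stuck there for many iterations:

```julia
using Random, Statistics

Random.seed!(1)

## Toy "model": data are Normal(θ, 1); the log-likelihood is made artificially
## noisy (SD 0.5) to mimic a particle filter's estimate.
data = randn(30) .+ 2.0
noisy_loglik(θ) = sum(-0.5 .* (data .- θ) .^ 2) + 0.5 * randn()
logprior(θ) = -0.5 * θ^2 / 100            ## Normal(0, 10) prior

function pmmh(n_iter; step = 0.3)
    θ = 0.0
    logpost = logprior(θ) + noisy_loglik(θ)   ## noisy estimate, stored and reused
    chain = Float64[]
    for _ in 1:n_iter
        θ′ = θ + step * randn()               ## random-walk proposal
        logpost′ = logprior(θ′) + noisy_loglik(θ′)
        if log(rand()) < logpost′ - logpost   ## accept/reject on the *estimates*
            θ, logpost = θ′, logpost′
        end
        push!(chain, θ)
    end
    return chain
end

chain = pmmh(5_000)
println("posterior mean ≈ ", round(mean(chain[1_000:end]), digits = 2))
```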
Robust Adaptive Metropolis (RAM) helps by automatically learning the posterior covariance structure during sampling (Vihola, 2012).
Running PMMH with RAM for good mixing takes several hours. We load pre-computed chains here (500,000 samples, 50,000 burnin, thinned by 50). See scripts/generate_pmcmc_seit4l.jl if you want to run it yourself.
# Load pre-computed SEIT4L chain with good mixing
chain_df = CSV.read(datadir("pmcmc_seit4l_chain.csv"), DataFrame)
chain_pmcmc = Chains(Matrix(chain_df[:, [:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ]]),
[:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ])
Chains MCMC chain (9000×6×1 Array{Float64, 3}):
Iterations = 1:1:9000
Number of chains = 1
Samples per chain = 9000
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
Use `describe(chains)` for summary statistics and quantiles.
# Display chain summary
display(chain_pmcmc)
Chains MCMC chain (9000×6×1 Array{Float64, 3}):
Iterations = 1:1:9000
Number of chains = 1
Samples per chain = 9000
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
Use `describe(chains)` for summary statistics and quantiles.
# Plot trace plots - much better mixing!
plot(chain_pmcmc)
Compare these traces to the short chain above - the well-mixed chain moves rapidly across the posterior, without the long plateaus and slow drift seen in the short run.
# Plot posterior densities (3 rows x 2 columns for 6 parameters)
density(chain_pmcmc, size = (800, 900), layout = (3, 2))
With standard MH and independent proposals, mixing is poor when parameters are correlated. The sampler proposes moves that go “against” the correlation structure and gets rejected.
RAM solves this by learning the posterior covariance during sampling (Vihola, 2012). It automatically:
- Adapts the proposal covariance to match the posterior shape
- Scales proposals to achieve ~23% acceptance (optimal for random-walk MH)
This is similar to how NUTS adapts its mass matrix, but works for non-differentiable likelihoods like particle filters.
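As a hedged sketch of the mechanics (not the AdvancedMH implementation), a single RAM update rescales the Cholesky factor of the proposal covariance in the direction of the last proposal, nudging the average acceptance rate towards the target:

```julia
using LinearAlgebra, Random

Random.seed!(7)

## One RAM covariance update (after Vihola, 2012): proposals are X + S*u with
## u ~ N(0, I); after each accept/reject the factor S is adjusted so that the
## acceptance rate is coerced towards the target (≈ 0.234).
function ram_update(S, u, accept_prob; target = 0.234, η = 0.05)
    M = S * (I + η * (accept_prob - target) * (u * u') / dot(u, u)) * S'
    return cholesky(Symmetric(M)).L   ## new lower-triangular proposal factor
end

S = Matrix(1.0I, 2, 2)      ## initial proposal factor (identity)
u = randn(2)
S = ram_update(S, u, 0.9)   ## acceptance too high → scale grows along u
println(diag(S))
```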
Comparing with deterministic model fits
We quickly fit the deterministic SEIT4L model for comparison:
"""
SEIT4L deterministic model.
"""
function seit4l_ode!(du, u, p, t)
R_0, D_lat, D_inf, α, D_imm = p
β = R_0 / D_inf
ϵ = 1.0 / D_lat
ν = 1.0 / D_inf
τ = 4.0 / D_imm
S, E, I, T1, T2, T3, T4, L, Inc = u
N = S + E + I + T1 + T2 + T3 + T4 + L
du[1] = -β * S * I / N + (1 - α) * τ * T4
du[2] = β * S * I / N - ϵ * E
du[3] = ϵ * E - ν * I
du[4] = ν * I - τ * T1
du[5] = τ * T1 - τ * T2
du[6] = τ * T2 - τ * T3
du[7] = τ * T3 - τ * T4
du[8] = α * τ * T4
du[9] = ϵ * E
end
@model function seit4l_deterministic_model(init_state, times, n_obs)
# Weakly informative priors
R_0 ~ truncated(Normal(3.0, 2.0), lower=1.0)
D_lat ~ truncated(Normal(2.0, 1.0), lower=0.5)
D_inf ~ truncated(Normal(3.0, 2.0), lower=0.5)
α ~ Beta(2, 2)
D_imm ~ truncated(Normal(15.0, 10.0), lower=1.0)
ρ ~ Beta(2, 2)
# Solve ODE
params = [R_0, D_lat, D_inf, α, D_imm]
u0 = [init_state[:S], init_state[:E], init_state[:I],
init_state[:T1], init_state[:T2], init_state[:T3], init_state[:T4],
init_state[:L], 0.0]
prob = ODEProblem(seit4l_ode!, u0, (times[1], times[end]), params)
sol = solve(prob, Tsit5(), saveat=times)
# Extract daily incidence
inc_cumulative = sol[9, :]
inc_daily = [0.0; diff(inc_cumulative)]
# Likelihood: Poisson observations
# obs[i] corresponds to inc_daily[i] (times 0-59, indices 1-60)
lambdas = [max(ρ * inc_daily[i], 1e-10) for i in 1:n_obs]
obs ~ arraydist(Poisson.(lambdas))
end
seit4l_deterministic_model (generic function with 2 methods)
Why | conditioning here but @addlogprob! above?
The deterministic model uses obs ~ arraydist(Poisson.(lambdas)) — the likelihood is expressed as a ~ statement, so Turing can condition on the data directly with |.
The stochastic pmmh_seit4l model uses an external likelihood estimator (the particle filter via GeneralisedFilters.filter), which returns a log-likelihood that we add manually with @addlogprob!. In that model, there is no obs ~ statement in the model body — obs enters as a function argument passed to the filter — so | has nothing to condition on.
In short: use | when the likelihood is a ~ statement; use @addlogprob! when the likelihood comes from an external computation.
# Fit deterministic model
times_obs = 0.0:1.0:59.0
model_det = seit4l_deterministic_model(init_state, times_obs, length(flu_tdc.obs)) | (; obs = flu_tdc.obs)
chain_det = sample(model_det, NUTS(), 2000, progress=false)
┌ Info: Found initial step size
└ ϵ = 0.05
Chains MCMC chain (2000×20×1 Array{Float64, 3}):
Iterations = 1001:1:3000
Number of chains = 1
Samples per chain = 2000
Wall duration = 24.76 seconds
Compute duration = 24.76 seconds
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
internals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint
Use `describe(chains)` for summary statistics and quantiles.
# Compare posteriors
p1 = density(chain_det, :R_0, title = "R_0 (Deterministic)", legend = false)
p2 = density(chain_pmcmc, :R_0, title = "R_0 (Stochastic)", legend = false)
p3 = density(chain_det, :α, title = "α (Deterministic)", legend = false)
p4 = density(chain_pmcmc, :α, title = "α (Stochastic)", legend = false)
plot(p1, p2, p3, p4, layout = (2, 2), size = (800, 600))
Look at the posterior density plots above and compare the two models:
- Which model gives wider posterior distributions? Why would accounting for stochastic variability change the uncertainty in parameter estimates?
- Focus on the \(\alpha\) parameter (probability of long-term immunity). Do the two models agree on its value?
- For a population of 284 people, do you think the deterministic approximation is reasonable, or does stochasticity matter?
For the Tristan da Cunha outbreak with only 284 individuals, demographic stochasticity plays a significant role. The stochastic model typically shows wider posteriors because it accounts for an additional source of variability — the randomness of individual infection events — that the deterministic model ignores. The deterministic model is much faster to fit and useful for initial exploration, but for small populations the stochastic model gives more honest uncertainty estimates.
Model comparison: SEITL vs SEIT4L
In the SEITL session, we introduced two model variants that both explain the two-wave pattern through temporary immunity:
- SEITL: Single T compartment with exponential duration (memoryless)
- SEIT4L: Four T sub-compartments giving Erlang-4 duration (more realistic timing)
Which model better explains the data? We fit both and compare.
Fitting SEITL for comparison
The SEITL model uses the same structure as SEIT4L, but with a single T compartment instead of four. The implementation follows the same pattern we used for SEIT4L: a Gillespie simulator for stochastic dynamics, a bootstrap particle filter to estimate the likelihood, and a Turing @model that adds the log-likelihood via @addlogprob!. Since the code structure is identical, we load pre-computed chains rather than repeating the implementation. See scripts/generate_pmcmc_seitl.jl if you want to examine the full code.
Both models were fitted using RAM with 500,000 iterations, 50,000 burn-in, and thinned by 50:
# Load pre-computed SEITL chain
chain_seitl_df = CSV.read(datadir("pmcmc_seitl_chain.csv"), DataFrame)
chain_seitl = Chains(Matrix(chain_seitl_df[:, [:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ]]),
[:R_0, :D_lat, :D_inf, :α, :D_imm, :ρ])
Chains MCMC chain (9000×6×1 Array{Float64, 3}):
Iterations = 1:1:9000
Number of chains = 1
Samples per chain = 9000
parameters = R_0, D_lat, D_inf, α, D_imm, ρ
Use `describe(chains)` for summary statistics and quantiles.
# Compare the fits visually
p1 = density(chain_seitl, :R_0, title="R_0", label="SEITL")
density!(p1, chain_pmcmc, :R_0, label="SEIT4L")
p2 = density(chain_seitl, :D_imm, title="D_imm", label="SEITL")
density!(p2, chain_pmcmc, :D_imm, label="SEIT4L")
plot(p1, p2, layout=(1, 2), size=(800, 400))
Model comparison
Looking at the posterior densities above, we can see differences between the two models. But which model better explains the data? This is the question of model comparison or model selection.
Approaches to model comparison
Information criteria estimate out-of-sample predictive accuracy by penalising model complexity:
- DIC (Deviance Information Criterion): Adds a penalty based on “effective number of parameters”. Easy to compute but can be unstable for complex models.
- WAIC (Widely Applicable Information Criterion): More robust than DIC, based on pointwise predictive density. Asymptotically equivalent to leave-one-out cross-validation.
- LOO-CV (Leave-One-Out Cross-Validation): Gold standard for predictive accuracy. ParetoSmooth.jl provides efficient PSIS-LOO approximation.
For particle MCMC, these criteria require care because the likelihood estimates are noisy. The noise doesn’t bias the posterior, but it does affect pointwise likelihood calculations needed for WAIC and LOO-CV. Solutions include using more particles when computing criteria, or averaging over multiple likelihood estimates per parameter sample.
Bayes factors compare marginal likelihoods directly:
\[\text{BF}_{12} = \frac{P(\text{data}|M_1)}{P(\text{data}|M_2)}\]
Values > 10 indicate strong evidence for \(M_1\); values < 0.1 indicate strong evidence for \(M_2\). The marginal likelihood can be estimated from particle filter output using techniques like path sampling or thermodynamic integration, but these are computationally intensive.
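In practice the comparison is done on the log scale to avoid overflow; the marginal likelihoods below are made-up numbers purely for illustration:

```julia
## Hypothetical log marginal likelihood estimates (not values from this session)
log_ml_seit4l = -123.4   ## log P(data | SEIT4L)
log_ml_seitl  = -126.9   ## log P(data | SEITL)

## Subtract on the log scale, exponentiate only at the end
log_bf = log_ml_seit4l - log_ml_seitl
println("BF(SEIT4L vs SEITL) ≈ ", round(exp(log_bf), digits = 1))  ## > 10 → strong evidence
```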
Posterior predictive checks provide a qualitative but intuitive comparison: which model generates data that looks more like the observations? This doesn’t give a single number, but can reveal how models differ in their predictions. A full posterior predictive comparison would simulate many datasets from each fitted model and compare them to the observed data - we leave this as an exercise.
Comparing parameter estimates
We can compare how the two models estimate the same parameters:
# Posterior means from each model
println("Posterior mean estimates:")
println("SEITL:")
println(" R_0 = $(round(mean(chain_seitl[:R_0]), digits=2))")
println(" D_imm = $(round(mean(chain_seitl[:D_imm]), digits=2))")
println("SEIT4L:")
println(" R_0 = $(round(mean(chain_pmcmc[:R_0]), digits=2))")
println(" D_imm = $(round(mean(chain_pmcmc[:D_imm]), digits=2))")
Posterior mean estimates:
SEITL:
R_0 = 7.05
D_imm = 17.56
SEIT4L:
R_0 = 10.8
D_imm = 12.28
The key difference is in the D_imm (duration of temporary immunity) estimates. SEIT4L, with its Erlang-distributed immunity duration, typically estimates a shorter mean duration but with less variance in timing.
What the original study found
Camacho et al. (2011) compared SEITL and SEIT4L using DIC and found that SEIT4L provided a better fit (lower DIC). This suggests that the timing of immunity loss matters - it’s not memoryless, as the exponential distribution in SEITL would imply.
The Erlang-4 distribution in SEIT4L has a coefficient of variation of 0.5 (compared to 1.0 for exponential), meaning individuals lose immunity at more consistent times. For the Tristan da Cunha data, this more realistic timing distribution better captures the second wave dynamics.
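You can check the coefficient of variation by simulation: the SEIT4L immunity duration is the sum of four exponential stages, each with mean D_imm / 4. This minimal sketch uses inverse-CDF sampling and is independent of the session's simulator:

```julia
using Random, Statistics

Random.seed!(3)

## Sum of four exponential stages with mean D_imm / 4 each: an Erlang(4)
## distribution with coefficient of variation 1/√4 = 0.5.
D_imm = 15.0
durations = [sum(-(D_imm / 4) .* log.(rand(4))) for _ in 1:100_000]
cv = std(durations) / mean(durations)
println("Erlang-4 CV ≈ ", round(cv, digits = 2))   ## ≈ 0.5; an exponential would give 1.0
```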
Practical recommendations
For particle MCMC model comparison:
- Start with posterior predictive checks - do the models make qualitatively different predictions?
- Use DIC for quick comparisons - it’s easy to compute and gives a rough ranking
- Use WAIC or LOO-CV for publication - more robust, with standard errors
- Be cautious with Bayes factors - they’re sensitive to prior choices and computationally expensive
When criteria disagree, consider what question you’re actually asking: best predictions (information criteria) or most probable model (Bayes factors)?
Generate posterior predictive simulations from both SEITL and SEIT4L and compare them visually:
# For each model chain, sample parameters and simulate trajectories
function posterior_predictive_sims(chain, simulate_fn, init_state, times, n_sims=50)
plots_data = []
for i in rand(1:length(chain), n_sims)
θ = Dict(:R_0 => chain[:R_0][i], :D_lat => chain[:D_lat][i],
:D_inf => chain[:D_inf][i], :α => chain[:α][i],
:D_imm => chain[:D_imm][i], :ρ => chain[:ρ][i])
traj = simulate_fn(θ, init_state, times)
obs_sim = rand.(Poisson.(max.(θ[:ρ] .* traj.Inc, 1e-10)))
push!(plots_data, obs_sim)
end
return plots_data
end
- Which model produces simulated epidemics that look more like the observed data?
- Does either model consistently miss the second wave timing?
- How does the spread of simulations differ between models?
Summary
Stochastic epidemic models have transitions that are random events rather than smooth flows. Standard MCMC fails because the likelihood depends on the unknown sequence of random events, which cannot be written analytically.
Particle MCMC solves this by embedding particle filters within MCMC. The particle filter estimates the likelihood by simulating many trajectories and weighting them by how well they match data. PMMH uses this noisy but unbiased estimate in a Metropolis-Hastings framework - the noise slows mixing but doesn’t bias the posterior.
We fitted both SEITL and SEIT4L to the Tristan da Cunha data. Calibration is critical: too few particles gives high variance (poor mixing), too many wastes computation. RAM adapts proposal covariances automatically, essential for the correlated parameters typical in epidemic models.
- Particle filters estimate likelihoods for stochastic models by simulating many trajectories (particles) and weighting them by how well they match the data
- Number of particles should be calibrated so the standard deviation of the log-likelihood estimate is between 1 and 3. The required number scales with problem complexity.
- Stochastic vs deterministic: stochastic models capture demographic stochasticity (important for small populations) but are much slower to fit
- Model comparison using information criteria helps choose between competing mechanistic explanations, though applying them to particle MCMC requires care
Going further
Multiple chains and diagnostics
For production PMMH, you should run multiple chains and check convergence:
# Run multiple chains in parallel
chain_multi = sample(
model_pmmh,
externalsampler(AdvancedMH.RobustAdaptiveMetropolis()),
MCMCThreads(),
10000,
4;
check_model=false
)
# Check R̂ and ESS statistics
summarystats(chain_multi)
Use the same diagnostics (R̂ < 1.1, ESS > 100) as for standard MCMC.
Alternative: Manual proposal tuning
If you need more control, you can use MH() with a covariance matrix instead of RAM. This is the same approach we used in the MCMC session:
# Manual covariance matrix (6 parameters, so 6x6 matrix)
# Diagonal entries are proposal variances, off-diagonal are covariances
Σ = [4.0 0.0 0.0 0.0 0.0 0.0; # R_0
0.0 0.04 0.0 0.0 0.0 0.0; # D_lat
0.0 0.0 0.25 0.0 0.0 0.0; # D_inf
0.0 0.0 0.0 0.01 0.0 0.0; # α
0.0 0.0 0.0 0.0 4.0 0.0; # D_imm
0.0 0.0 0.0 0.0 0.0 0.01] # ρ
chain = sample(model_pmmh, MH(Σ), 5000)
This requires choosing proposal variances for each parameter - too large gives low acceptance, too small gives slow exploration. In practice, RAM is easier because it learns the covariance structure automatically.
Particle Gibbs
Particle Gibbs (PG) is another PMCMC method, available in Turing via PG(n_particles).
The methodological difference: PMMH uses the particle filter to estimate the marginal likelihood, then does standard Metropolis-Hastings on the parameters. PG conditions on a “reference trajectory” - one particle path from the previous iteration is retained, and the other particles are sampled conditional on it.
PG can exhibit high autocorrelation in some settings - Lindsten et al. (2014) discuss this and propose the Particle Gibbs with Ancestor Sampling (PGAS) algorithm as an improvement.
Filtered trajectories
PMCMC methods can provide filtered trajectories - stochastic trajectories from the model conditioned on both the estimated parameters and the observed data. These are useful for visualising what the underlying epidemic dynamics might have looked like, given what we observed.
To extract a trajectory from our particle filter, we’d sample one particle at the final time point (weighted by the particle weights) and reconstruct its path back through time. This gives one draw from the posterior distribution over trajectories. Repeating across MCMC iterations gives a collection of plausible epidemic histories.
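A sketch of the backtracking step, assuming the filter recorded an ancestor-index matrix (this bookkeeping is hypothetical; the actual storage depends on the filter implementation):

```julia
using Random

Random.seed!(11)

## ancestors[t, i] is assumed to record which particle at time t-1 particle i
## at time t was resampled from. Here we fill it with dummy indices.
n_particles, T = 5, 4
ancestors = rand(1:n_particles, T, n_particles)
weights = fill(1 / n_particles, n_particles)   ## final-time weights (uniform here)

## Sample a final particle proportional to its weight, then follow its ancestry
## back through time to recover one trajectory's particle indices.
k = findfirst(cumsum(weights) .>= rand())
path = zeros(Int, T)
path[T] = k
for t in T-1:-1:1
    path[t] = ancestors[t + 1, path[t + 1]]
end
println("sampled trajectory indices: ", path)
```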
See Doucet & Johansen (2009) for details on trajectory sampling from particle filters.
SMC²
SMC² (SMC squared) uses Sequential Monte Carlo for both parameters and state trajectories, avoiding MCMC entirely. This can be more efficient for complex models but requires more careful tuning. Turing.jl supports this through the SMC sampler.
Next session
In Observation Models and Distance Metrics, we’ll step back and examine a fundamental question: why do we choose Poisson likelihoods, and what are the alternatives?
References
- Andrieu et al. (2010). Particle Markov chain Monte Carlo methods. JRSS B 72(3), 269-342. (The original PMCMC paper)
- Pitt et al. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Biometrika 99(2), 269-284. (Particle calibration: target log-likelihood SD of 1-3)
- Vihola (2012). Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing 22, 997-1008. (RAM algorithm used in this session)
- Roberts & Rosenthal (2009). Examples of Adaptive MCMC. J. Computational and Graphical Statistics 18(2), 349-367. (Adaptive MCMC theory)
- Doucet & Johansen (2009). A tutorial on particle filtering and smoothing. Handbook of Nonlinear Filtering 12, 656-704. (Trajectory sampling)
- Doucet et al. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102(2), 295-313.
- Doucet et al. (2001). Sequential Monte Carlo Methods in Practice. Springer. (Comprehensive textbook)
- Arulampalam et al. (2002). A tutorial on particle filters. IEEE Trans. Signal Processing 50(2), 174-188.
- Turing.jl documentation - Probabilistic programming in Julia