using Random ## for random numbers
using Distributions ## for probability distributions
using DataFrames ## for data frames
using Plots ## for plots
using StatsPlots ## for statistical plots
using CSV ## for CSV file reading
using StatsBase ## for statistical functions
using SSMProblems ## for state-space model interface
using GeneralisedFilters ## for particle filtering
using DrWatson ## for project paths (datadir)
Particle Filters
Estimated time: 2-2.5 hours (includes computation time for calibration)
Introduction
Where we are: You’ve completed the core workflow (Introduction → MCMC → Diagnostics → Model Checking) and seen the SEITL model. Now we tackle stochastic models, where the same parameters produce different trajectories each run.
In the previous sessions, we fitted deterministic models using MCMC. For deterministic models, computing the likelihood is straightforward: given parameters, we solve the ODEs to get a trajectory, then evaluate the probability of the data given that trajectory.
But what about stochastic models? In the SEITL session, we saw that the same parameters can produce very different trajectories due to demographic stochasticity. There’s no single “expected” trajectory to compare against the data.
This session introduces particle filters — the computational technique that makes Bayesian inference possible for stochastic models.
Particle filters weren’t covered in the R-based course. If this is your first encounter with them, that’s expected — they’re a new topic rather than a Julia translation of something you’ve seen before. In R, the pomp package and the odin/dust/mcstate ecosystem provide particle filtering capabilities.
Objectives
In this session you will:
- Understand why stochastic models need different inference methods
- Learn the core particle filter algorithm: propagate, weight, resample
- Implement the SEIT4L model using the state-space model interface
- Calibrate the number of particles needed for reliable inference
Particle filters use stochastic simulation, so your results will differ slightly from others’ - and from repeated runs. The patterns (e.g., how ESS changes with particle count) should be similar, but exact numbers will vary. This is expected.
Particle filters are computationally intensive. For this session:
- Memory: ~2-4 GB RAM for 256 particles (scales linearly with particle count)
- Runtime: Single likelihood evaluation takes 0.1-0.5 seconds with 100 particles
- Calibration section: Running 100 replicates at multiple particle counts takes 5-10 minutes
If you’re on a slower machine, reduce particle counts in the calibration exercises (e.g., use 50, 100, 200 instead of 64, 128, 256, 512).
Source file
The source file of this session is located at sessions/particle_filters.qmd.
Load libraries
Initialisation
We set a random seed for reproducibility.
Random.seed!(1234)
TaskLocalRNG()
Why deterministic likelihood fails for stochastic models
First, consider the problem. With a deterministic model, the trajectory is fully determined by the parameters:
\[\text{Parameters} \rightarrow \text{Unique trajectory} \rightarrow P(\text{data}|\text{trajectory})\]
With a stochastic model, the same parameters can produce infinitely many different trajectories:
\[\text{Parameters} \rightarrow \text{Many possible trajectories} \rightarrow \text{???}\]
The likelihood we need is \(P(\text{data}|\theta)\), which marginalises over all possible trajectories:
\[P(\text{data}|\theta) = \int P(\text{data}|\text{trajectory}) \times P(\text{trajectory}|\theta) \, d(\text{trajectory})\]
This integral is intractable - we can’t enumerate all possible trajectories. Particle filters give us a Monte Carlo approximation.
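Before introducing particle filters, it helps to see the crude version of this Monte Carlo idea. The sketch below is a hypothetical one-observation toy, not the SEITL model (`simulate_trajectory` is an illustrative name): it estimates \(P(\text{data}|\theta)\) by simulating trajectories blindly from the model and averaging their likelihoods. This works for a single observation, but over a long time series almost no blindly simulated trajectory stays close to the data throughout - the failure mode particle filters fix.

```julia
using Random, Distributions, Statistics

Random.seed!(42)

# Illustrative toy: the "trajectory" is a single latent count with mean θ
simulate_trajectory(rng, θ) = rand(rng, Poisson(θ))

# Naive Monte Carlo: P(data|θ) ≈ average of P(data|trajectory)
# over trajectories simulated from P(trajectory|θ)
function naive_likelihood(rng, θ, observed; n_sims=10_000)
    sims = [simulate_trajectory(rng, θ) for _ in 1:n_sims]
    return mean(pdf(Poisson(max(s, 0.1)), observed) for s in sims)
end

lhat = naive_likelihood(Random.default_rng(), 5.0, 4)
println(lhat) # a Monte Carlo estimate of P(obs = 4 | θ = 5)
```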
Load data and define SEIT4L dynamics
# Load Tristan da Cunha data
flu_tdc = CSV.read(datadir("flu_tdc_1971.csv"), DataFrame)
# Parameters (reasonable values from literature)
θ_example = Dict(
:R_0 => 7.0,
:D_lat => 1.0,
:D_inf => 4.0,
:α => 0.5,
:D_imm => 10.0,
:ρ => 0.65
)
# Initial state for SEIT4L
init_state = Dict(
:S => 279.0,
:E => 0.0,
:I => 2.0,
:T1 => 3.0,
:T2 => 0.0,
:T3 => 0.0,
:T4 => 0.0,
:L => 0.0
)
Dict{Symbol, Float64} with 8 entries:
:I => 2.0
:T2 => 0.0
:L => 0.0
:S => 279.0
:T4 => 0.0
:T1 => 3.0
:E => 0.0
:T3 => 0.0
The Gillespie simulator
From here on, we use the SEIT4L model (Erlang-4 temporary immunity) rather than the basic SEITL (exponential temporary immunity). As we saw in the SEITL session, the Erlang distribution gives more realistic timing for immunity loss. The original Camacho et al. (2011) paper found that SEIT4L fits the Tristan da Cunha data better. Everything below works identically with SEITL — just with fewer T sub-compartments.
The particle filter needs to simulate many stochastic trajectories. We use the Gillespie algorithm introduced in the SEITL session, adapted here for the SEIT4L model with its four T compartments:
Gillespie step implementation (click to expand)
# Stoichiometry matrix: how each transition changes [S, E, I, T1, T2, T3, T4, L]
const STOICH = [
    [-1,  1,  0,  0,  0,  0,  0,  0], # S -> E (infection)
    [ 0, -1,  1,  0,  0,  0,  0,  0], # E -> I (become infectious)
    [ 0,  0, -1,  1,  0,  0,  0,  0], # I -> T1 (recovery, temporary immunity)
    [ 0,  0,  0, -1,  1,  0,  0,  0], # T1 -> T2
    [ 0,  0,  0,  0, -1,  1,  0,  0], # T2 -> T3
    [ 0,  0,  0,  0,  0, -1,  1,  0], # T3 -> T4
    [ 0,  0,  0,  0,  0,  0, -1,  1], # T4 -> L (long-term immunity)
    [ 1,  0,  0,  0,  0,  0, -1,  0], # T4 -> S (immunity wanes, susceptible again)
]
function seit4l_rates(state, θ)
    S, E, I, T1, T2, T3, T4, L = state
    N = sum(state)
    β = θ[:R_0] / θ[:D_inf]
    return [
        β * S * I / N,                    # S -> E
        E / θ[:D_lat],                    # E -> I
        I / θ[:D_inf],                    # I -> T1
        4 * T1 / θ[:D_imm],               # T1 -> T2 (Erlang stages)
        4 * T2 / θ[:D_imm],               # T2 -> T3
        4 * T3 / θ[:D_imm],               # T3 -> T4
        θ[:α] * 4 * T4 / θ[:D_imm],       # T4 -> L (fraction α: long-term immunity)
        (1 - θ[:α]) * 4 * T4 / θ[:D_imm], # T4 -> S (fraction 1 - α: susceptible again)
    ]
end
function gillespie_step(rng, state, θ)
    daily_incidence = 0
    t = 0.0
    while t < 1.0 # Simulate one day
        # Recalculate rates from the current state
        rates = seit4l_rates(state, θ)
        total_rate = sum(rates)
        if total_rate == 0
            break
        end
        # Time to next event
        dt = randexp(rng) / total_rate
        if t + dt > 1.0
            break
        end
        t += dt
        # Choose which event occurs, proportional to its rate
        event = sample(rng, 1:length(rates), Weights(rates))
        # Update state
        state = state .+ STOICH[event]
        # Track E -> I transitions as new cases
        if event == 2
            daily_incidence += 1
        end
    end
    return state, daily_incidence
end
Key aspects:
- Stoichiometry matrix: Defines how each transition changes the state vector
- Rates: Computed from current state and parameters
- Event selection: Exponential waiting time, event chosen proportional to rates
- Incidence tracking: Counts E→I transitions as new cases
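The waiting-time and event-selection mechanics are easiest to verify on a model small enough to solve analytically. The following self-contained sketch (a pure-death toy process, not part of the session's code) applies the same Gillespie recipe and checks the mean extinction time against the known formula \(\frac{1}{\mu}\sum_{k=1}^{n_0} 1/k\):

```julia
using Random, Statistics

Random.seed!(1)

# Minimal Gillespie: pure-death process, each of n individuals dies at rate μ
function extinction_time(rng, n0, μ)
    n, t = n0, 0.0
    while n > 0
        total_rate = μ * n              # only one event type: a death
        t += randexp(rng) / total_rate  # exponential waiting time
        n -= 1                          # apply the event
    end
    return t
end

# Monte Carlo mean vs analytic E[T] = (1/μ) * sum(1/k for k = 1..n0)
μ, n0 = 1.0, 10
sims = [extinction_time(Random.default_rng(), n0, μ) for _ in 1:10_000]
analytic = sum(1 / k for k in 1:n0) / μ
println((round(mean(sims), digits=2), round(analytic, digits=2))) # both ≈ 2.93
```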
How particle filters work
The particle filter approximates the likelihood integral by:
- Propagate: Simulate many independent trajectories (“particles”) forward through time
- Weight: At each observation time, assign weights based on how well each particle matches the data
- Resample: Replace low-weight particles with copies of high-weight particles
We can visualise this with our SEIT4L model:
# Propagate 20 particles for 5 days
n_particles = 20
particles = [Float64[init_state[:S], init_state[:E], init_state[:I],
init_state[:T1], init_state[:T2], init_state[:T3],
init_state[:T4], init_state[:L]] for _ in 1:n_particles]
# Store trajectories (days 0-5)
trajectories = [zeros(6) for _ in 1:n_particles]
# Simulate 5 days
for day in 1:5
for i in 1:n_particles
particles[i], inc = gillespie_step(Random.default_rng(), particles[i], θ_example)
trajectories[i][day+1] = inc
end
end
# For weighting step later
day_5_predictions = [trajectories[i][6] for i in 1:n_particles]
day_5_observed = flu_tdc.obs[6]
# Plot
p = plot(title="Particle Filter: Propagation Step",
xlabel="Time (days)", ylabel="Daily incidence")
# Particle trajectories
for i in 1:n_particles
plot!(p, 0:5, trajectories[i], alpha=0.4, color=:steelblue, linewidth=1, label="")
end
# All observations - earlier in gray, previous (day 4) in red, current target (day 5) as open circle
scatter!(p, flu_tdc.time[1:4], flu_tdc.obs[1:4],
label="Earlier observations", markersize=4, color=:gray)
scatter!(p, [flu_tdc.time[5]], [flu_tdc.obs[5]],
label="Previous observation", markersize=7, color=:red)
scatter!(p, [flu_tdc.time[6]], [flu_tdc.obs[6]],
label="Current target", markersize=7, markershape=:circle,
markerstrokecolor=:red, markerstrokewidth=2, color=:white)
plot!(p, legend=:topright)
Each blue line is one particle trajectory. The particles spread out because each follows different random events. The current target observation (open circle at day 5) will be used to weight the particles.
The weighting step
Particles that predict incidence close to the observed value get higher weights:
# Compute weights (Poisson likelihood)
log_w = [logpdf(Poisson(max(θ_example[:ρ] * pred, 0.01)), day_5_observed)
for pred in day_5_predictions]
# Normalise
particle_weights = exp.(log_w .- maximum(log_w))
particle_weights ./= sum(particle_weights)
# Visualise
p = bar(1:n_particles, particle_weights,
xlabel="Particle", ylabel="Weight",
title="Particle Weights at Day 5 (obs = $day_5_observed)",
legend=false, alpha=0.7)
# Add predictions as labels
for (i, pred) in enumerate(day_5_predictions)
annotate!(p, i, particle_weights[i] + 0.02, text("$(round(Int, pred))", 7))
end
plot!(p)
Particles predicting values closer to the observation get higher weights.
The resampling step
When weights become too uneven, we resample - drawing particles with replacement according to their weights:
# Resample particles
n_effective = 1.0 / sum(particle_weights.^2) # Effective sample size
println("Effective sample size: $(round(n_effective, digits=1)) out of $n_particles")
# Draw new particles
resampled_idx = sample(1:n_particles, Weights(particle_weights), n_particles)
# Count how many times each original particle was selected
counts = [count(==(i), resampled_idx) for i in 1:n_particles]
p = bar(1:n_particles, counts,
xlabel="Original particle", ylabel="Copies after resampling",
title="Resampling: High-weight particles duplicated",
legend=false, alpha=0.7, color=:green)
plot!(p)
Effective sample size: 2.0 out of 20
High-weight particles are duplicated, low-weight particles are discarded. This focuses computational effort on plausible trajectories.
Look at the three plots above (propagation, weighting, resampling).
- In the propagation plot, why do the particle trajectories spread apart over time?
- In the weighting plot, particles with predicted incidence closer to the observed value get higher weights. Why does Poisson likelihood give this behaviour?
- After resampling, some particles are duplicated many times while others disappear entirely. Why is this beneficial for estimating the likelihood?
Without resampling, particle filters suffer from degeneracy: after several time steps, almost all weight concentrates on a single particle. This happens because multiplicative weights compound - a particle that’s slightly better at time 1 gets a slightly higher weight, making it even more likely to dominate at time 2, and so on.
Imagine 100 particles after 10 time steps without resampling:
- 1 particle has weight 0.99
- 99 particles share weight 0.01
This means we’re effectively using just 1 trajectory to estimate the likelihood - a very poor approximation. Resampling fixes this by redistributing particles to maintain diversity, keeping the effective sample size high.
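The effective sample size, \(\mathrm{ESS} = 1 / \sum_i w_i^2\) for normalised weights (the same formula used in the resampling code above), puts a number on this thought experiment:

```julia
# ESS = 1 / sum of squared normalised weights
ess(w) = 1 / sum(abs2, w)

# Degenerate weights from the thought experiment:
# 1 particle at 0.99, the other 99 sharing 0.01
w_degenerate = vcat(0.99, fill(0.01 / 99, 99))

# Uniform weights for comparison
w_uniform = fill(1 / 100, 100)

println(round(ess(w_degenerate), digits=2)) # 1.02: effectively one particle
println(round(ess(w_uniform), digits=2))    # 100.0: full diversity
```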
Accumulating the likelihood
The log-likelihood is estimated by summing the log of mean weights at each time step:
\[\log \hat{P}(\text{data}|\theta) = \sum_{t=1}^{T} \log \left( \frac{1}{N} \sum_{i=1}^{N} w_t^{(i)} \right)\]
The estimate \(\hat{P}(\text{data}|\theta)\) is unbiased on the likelihood scale: averaged over many particle filter runs, it converges to the true likelihood. (Its logarithm is biased slightly downwards by Jensen's inequality, but unbiasedness of the likelihood itself is what matters for the MCMC methods in the next session.)
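To make the accumulation formula concrete, here is a hand-rolled bootstrap filter for a toy one-dimensional model (a random-walk log-intensity with Poisson counts). This is a sketch of the propagate/weight/resample/accumulate loop, not how GeneralisedFilters is actually implemented:

```julia
using Random, Distributions, StatsBase, Statistics

Random.seed!(7)

# Toy state-space model: latent log-intensity follows a random walk,
# observations are Poisson counts with mean exp(latent state)
function toy_bootstrap_filter(rng, obs, n_particles; σ=0.3)
    particles = fill(log(5.0), n_particles) # deterministic initial state
    log_lik = 0.0
    for y in obs
        # 1. Propagate: random-walk step for each particle
        particles = particles .+ σ .* randn(rng, n_particles)
        # 2. Weight: Poisson likelihood of the observation
        w = [pdf(Poisson(exp(x)), y) for x in particles]
        # Accumulate: log of the mean (unnormalised) weight at this step
        log_lik += log(mean(w))
        # 3. Resample: draw particle indices proportional to weight
        particles = particles[sample(rng, 1:n_particles, Weights(w), n_particles)]
    end
    return log_lik
end

obs = [4, 6, 5, 7, 3]
ll_toy = toy_bootstrap_filter(Random.default_rng(), obs, 500)
println(round(ll_toy, digits=1)) # a noisy estimate of the toy log-likelihood
```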
Implementing the state-space model interface
For production use, we define the model using the SSMProblems.jl interface. This separates the model from the filtering algorithm, so you can swap in different filtering methods (bootstrap, guided, auxiliary) without rewriting the model. It also integrates with Turing.jl for the Particle MCMC in the next session.
A state-space model has three components:
- Initial state distribution (SEIT4LInitial) - where does the system start?
- Transition dynamics (SEIT4LDynamics) - how does the state evolve? (uses gillespie_step)
- Observation process (PoissonObservation) - how do observations relate to the state?
Here is the full interface implementation (also available in src/seit4l_particle_filter.jl):
# Initial state: deterministic starting point
struct SEIT4LInitial <: SSMProblems.StatePrior
init_state::Vector{Float64}
end
function SSMProblems.simulate(rng::AbstractRNG, prior::SEIT4LInitial; kwargs...)
vcat(prior.init_state, 0.0) # Append 0 for initial daily incidence
end
# Dynamics: Gillespie simulation for one day
struct SEIT4LDynamics <: SSMProblems.LatentDynamics
θ::Dict{Symbol, Float64}
end
function SSMProblems.simulate(rng::AbstractRNG, dyn::SEIT4LDynamics,
step::Integer, prev_state; kwargs...)
state = collect(prev_state[1:8]) # Extract compartments
new_state, daily_inc = gillespie_step(rng, state, dyn.θ)
return vcat(new_state, daily_inc) # Return state with incidence appended
end
# Observation: Poisson likelihood on reported cases
struct PoissonObservation <: SSMProblems.ObservationProcess
ρ::Float64 # Reporting rate
end
function SSMProblems.distribution(obs::PoissonObservation, step::Integer,
state; kwargs...)
daily_inc = state[end]
Poisson(max(obs.ρ * daily_inc, 1e-10))
end
- SSMProblems.LatentDynamics - abstract type for state transition models. Implement simulate to define how the state evolves.
- SSMProblems.ObservationProcess - abstract type for observation models. Implement distribution to return a Distributions.jl distribution.
- The AbstractRNG parameter allows reproducible random number generation across particles.
How the pieces fit together:
- The state vector is [S, E, I, T1, T2, T3, T4, L, daily_inc] - the 9th element tracks daily incidence for the observation process
- SEIT4LDynamics wraps the Gillespie simulator, calling gillespie_step each day
- PoissonObservation assumes observed cases follow Poisson(ρ × true_incidence)
Running the bootstrap particle filter
The run_particle_filter function assembles the SSM components and runs the bootstrap filter:
function run_particle_filter(θ, obs, n_particles;
init_state=[279.0, 0.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0])
# Define SSM components
initial = SEIT4LInitial(init_state)
dynamics = SEIT4LDynamics(θ)
observation = PoissonObservation(θ[:ρ])
# Create state-space model
model = StateSpaceModel(initial, dynamics, observation)
# Run bootstrap particle filter
rng = Random.default_rng()
algo = BF(n_particles) # Bootstrap Filter
_, log_lik = GeneralisedFilters.filter(rng, model, algo, obs)
return log_lik
end
This ties everything together: the initial state, Gillespie dynamics, and Poisson observations form a StateSpaceModel, which the bootstrap filter (BF) processes to estimate the log-likelihood.
Testing it:
# Run with 100 particles
ll = run_particle_filter(θ_example, flu_tdc.obs, 100)
println("Log-likelihood estimate: $(round(ll, digits=1))")
Log-likelihood estimate: -118.2
Calibrating particle count
A key question: how many particles do we need?
- Too few particles produce noisy likelihood estimates, causing poor MCMC mixing
- Too many particles waste computation
The principled approach targets a specific variance of the log-likelihood estimate. Pitt et al. (2012) and Doucet et al. (2015) showed that MCMC efficiency is maximised when the standard deviation of the log-likelihood estimate is between 1 and 3.
Why is some noise acceptable? MCMC with a noisy likelihood (called “pseudo-marginal MCMC”) still targets the correct posterior — the noise slows down mixing but doesn’t introduce bias. An SD of 1-3 is a sweet spot: low enough for reasonable MCMC acceptance rates, high enough to avoid wasting computation on unnecessary precision.
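The SD is tunable at all because the likelihood estimate is an average over particles, so its noise shrinks as the particle count grows. A toy demonstration (i.i.d. positive random draws standing in for particle weights - not a real particle filter) shows the SD of the log estimate roughly halving each time the count quadruples:

```julia
using Random, Statistics

Random.seed!(3)
rng = Random.default_rng()

# Stand-in for a per-particle likelihood contribution: positive and noisy
noisy_weight(rng) = exp(randn(rng))

# SD of log(mean of n weights), estimated over many replicates
function sd_log_estimate(rng, n; reps=2_000)
    std([log(mean(noisy_weight(rng) for _ in 1:n)) for _ in 1:reps])
end

for n in (16, 64, 256)
    println((n, round(sd_log_estimate(rng, n), digits=3)))
end
# SD roughly halves each time n quadruples (Monte Carlo error ~ 1/sqrt(n))
```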
Calibrating (this takes 1-2 minutes to run all particle counts):
# Reference parameters (near what we expect the posterior mode to be)
θ_cal = Dict(:R_0 => 6.0, :D_lat => 1.3, :D_inf => 2.0,
:α => 0.5, :D_imm => 10.5, :ρ => 0.7)
# Test different particle counts
particle_counts = 2 .^ (2:9) # 4, 8, 16, ..., 512
n_reps = 30
calibration = DataFrame(n_particles=Int[], mean_ll=Float64[],
sd_ll=Float64[], time_per_run=Float64[])
for n in particle_counts
t0 = time()
lls = [run_particle_filter(θ_cal, flu_tdc.obs, n) for _ in 1:n_reps]
elapsed = (time() - t0) / n_reps
push!(calibration, (n, mean(lls), std(lls), elapsed))
end
calibration
| Row | n_particles | mean_ll | sd_ll | time_per_run |
|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 |
| 1 | 4 | -147.911 | 26.6157 | 0.00483963 |
| 2 | 8 | -134.305 | 17.1061 | 0.00285196 |
| 3 | 16 | -128.136 | 14.4657 | 0.003629 |
| 4 | 32 | -118.14 | 6.38355 | 0.00707473 |
| 5 | 64 | -117.235 | 3.43089 | 0.0142501 |
| 6 | 128 | -115.902 | 0.70281 | 0.0287174 |
| 7 | 256 | -115.981 | 0.427675 | 0.0580098 |
| 8 | 512 | -115.822 | 0.356587 | 0.12216 |
# Plot calibration results
p1 = plot(calibration.n_particles, calibration.mean_ll,
xlabel="Number of particles", ylabel="Mean log-likelihood",
title="Bias", legend=false, marker=:circle, linewidth=2, xscale=:log2)
p2 = plot(calibration.n_particles, calibration.sd_ll,
xlabel="Number of particles", ylabel="SD of log-likelihood",
title="Variance (target: 1-3)", legend=false, marker=:circle,
linewidth=2, xscale=:log2)
hline!(p2, [1, 3], color=:grey, linestyle=:dot, alpha=0.7)
p3 = plot(calibration.n_particles, calibration.time_per_run,
xlabel="Number of particles", ylabel="Time (seconds)",
title="Computation time", legend=false, marker=:circle,
linewidth=2, xscale=:log2)
plot(p1, p2, p3, layout=(1, 3), size=(900, 300))
Look at the variance plot (middle). We want the SD to be in the range 1-3. Read off which particle count achieves this.
In our runs, we typically see:
- 4-16 particles: SD > 10, far too noisy for reliable inference
- 32-64 particles: SD around 3-6, borderline
- 128 particles: SD around 1.5-3, in the target range
- 256-512 particles: SD < 1.5, diminishing returns for the extra computation
Your numbers will differ due to stochastic variability, but the pattern should be similar. For the PMCMC session, we’ll use around 128-256 particles based on this calibration.
Based on the calibration plots above:
- What particle count achieves SD of log-likelihood between 1 and 3?
- What’s the trade-off between 128 and 256 particles for this problem?
- Why might you want to use more particles than the minimum needed?
The particle filter likelihood estimate depends on the parameters. Explore how the log-likelihood changes as you vary R₀:
# Vary R_0 and compute log-likelihoods
R0_values = 3.0:1.0:12.0
n_particles = 128 # Use your calibrated value
ll_results = Float64[]
for R0 in R0_values
θ_test = Dict(:R_0 => R0, :D_lat => 1.3, :D_inf => 2.0,
:α => 0.5, :D_imm => 10.5, :ρ => 0.7)
ll = run_particle_filter(θ_test, flu_tdc.obs, n_particles)
push!(ll_results, ll)
end
plot(R0_values, ll_results, xlabel="R_0", ylabel="Log-likelihood",
title="Likelihood surface", marker=:circle, legend=false)
Questions:
- Which value of R₀ gives the highest log-likelihood?
- Run the same R₀ value multiple times. How much does the log-likelihood vary? (This is the noise from particle filtering)
- How does the shape of this likelihood surface compare to what you’d expect from deterministic model fitting?
Summary
Stochastic models cannot be fitted with standard MCMC because the likelihood requires integrating over all possible trajectories. Particle filters solve this by approximating the integral with Monte Carlo simulation.
The algorithm is elegant: propagate many trajectories (particles) forward, weight them by how well they match observations, resample to focus on plausible trajectories, and accumulate likelihood contributions. The resulting estimate is unbiased, though noisy.
The number of particles is a critical tuning parameter. Too few gives high variance in the likelihood estimate, causing poor MCMC mixing. Too many wastes computation. The optimal choice targets a log-likelihood standard deviation between 1 and 3.
- Stochastic models require particle filters because the likelihood marginalises over all possible trajectories
- Particle filters work by: propagate particles forward, weight by observation likelihood, resample when weights become uneven
- SSMProblems.jl interface separates model specification from filtering algorithms
- Calibrate particle count to achieve log-likelihood SD between 1 and 3 (too few = noisy, too many = slow)
- The likelihood estimate is unbiased but noisy - this noise slows MCMC but doesn’t bias posteriors
- Next step: The noisy likelihood estimate slots directly into the Metropolis-Hastings acceptance ratio — wherever we previously computed log_likelihood(θ), we now use run_particle_filter(θ, data, n_particles) instead
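As a preview, the pseudo-marginal idea can be sketched in a few lines. Here noisy_loglik is a hypothetical stand-in for run_particle_filter (a Gaussian log-density plus estimation noise, with a flat prior): the only change from standard Metropolis-Hastings is that the stored log-likelihood is itself a noisy estimate, and it is not re-estimated when a proposal is rejected.

```julia
using Random, Distributions, Statistics

Random.seed!(11)

# Hypothetical stand-in for a particle filter log-likelihood:
# the true log-likelihood of θ plus estimation noise (SD 0.5)
noisy_loglik(rng, θ) = logpdf(Normal(5.0, 1.0), θ) + 0.5 * randn(rng)

function pseudo_marginal_mh(rng, θ0; n_iter=5_000, step=0.5)
    θ, ll = θ0, noisy_loglik(rng, θ0)
    chain = Float64[]
    for _ in 1:n_iter
        θ_prop = θ + step * randn(rng)      # random-walk proposal
        ll_prop = noisy_loglik(rng, θ_prop) # fresh noisy estimate
        # Usual MH accept/reject with the stored noisy estimate;
        # the current ll is kept, not recomputed, on rejection
        if log(rand(rng)) < ll_prop - ll
            θ, ll = θ_prop, ll_prop
        end
        push!(chain, θ)
    end
    return chain
end

chain = pseudo_marginal_mh(Random.default_rng(), 0.0)
println(round(mean(chain[1_001:end]), digits=2)) # should be near 5.0
```

Because the multiplicative noise on the likelihood has the same distribution for every θ, this chain still targets the correct posterior; the noise only slows mixing, which is exactly why the 1-3 SD calibration above matters.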
References
- Doucet & Johansen (2009). A tutorial on particle filtering and smoothing. Handbook of Nonlinear Filtering 12, 656-704. (Excellent introduction)
- Pitt et al. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Biometrika 99(2), 269-284. (Calibration: target SD 1-3)
- Doucet et al. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102(2), 295-313.
- SSMProblems.jl documentation - State-space model interface
- GeneralisedFilters.jl documentation - Particle filtering algorithms
Next session: In Particle MCMC, we’ll use the particle filter within an MCMC framework to fit the stochastic SEIT4L model to data.