Particle Filters

Estimated time: 2-2.5 hours (includes computation time for calibration)

Introduction

Where we are: You’ve completed the core workflow (Introduction → MCMC → Diagnostics → Model Checking) and seen the SEITL model. Now we tackle stochastic models, where the same parameters produce different trajectories each run.

In the previous sessions, we fitted deterministic models using MCMC. For deterministic models, computing the likelihood is straightforward: given parameters, we solve the ODEs to get a trajectory, then evaluate the probability of the data given that trajectory.

But what about stochastic models? In the SEITL session, we saw that the same parameters can produce very different trajectories due to demographic stochasticity. There’s no single “expected” trajectory to compare against the data.

This session introduces particle filters — the computational technique that makes Bayesian inference possible for stochastic models.

Particle filters weren’t covered in the R-based course. If this is your first encounter with them, that’s expected — they’re a new topic rather than a Julia translation of something you’ve seen before. In R, the pomp package and the odin/dust/mcstate ecosystem provide particle filtering capabilities.

Objectives

In this session you will:

  1. Understand why stochastic models need different inference methods
  2. Learn the core particle filter algorithm: propagate, weight, resample
  3. Implement the SEIT4L model using the state-space model interface
  4. Calibrate the number of particles needed for reliable inference
Important reminder: Stochastic variability

Particle filters use stochastic simulation, so your results will differ slightly from others’ results and from your own repeated runs. The patterns (e.g., how ESS changes with particle count) should be similar, but exact numbers will vary. This is expected.

Warning: Computational requirements

Particle filters are computationally intensive. For this session:

  • Memory: ~2-4 GB RAM for 256 particles (scales linearly with particle count)
  • Runtime: Single likelihood evaluation takes 0.1-0.5 seconds with 100 particles
  • Calibration section: Running 30 replicates at each of eight particle counts takes a few minutes

If you’re on a slower machine, reduce particle counts in the calibration exercises (e.g., use 50, 100, 200 instead of 64, 128, 256, 512).

Source file

The source file of this session is located at sessions/particle_filters.qmd.

Load libraries

using Random ## for random numbers
using Distributions ## for probability distributions
using DataFrames ## for data frames
using Plots ## for plots
using StatsPlots ## for statistical plots
using CSV ## for CSV file reading
using StatsBase ## for statistical functions
using SSMProblems ## for state-space model interface
using GeneralisedFilters ## for particle filtering
using DrWatson

Load shared particle filter code

The SEIT4L Gillespie simulator and SSMProblems interface are defined in a shared file (code shown in main text below).

include(joinpath(project_root, "src", "seit4l_particle_filter.jl"))
Main.Notebook.run_particle_filter

Initialisation

We set a random seed for reproducibility.

Random.seed!(1234)
TaskLocalRNG()

Why deterministic likelihood fails for stochastic models

First, consider the problem. With a deterministic model, the trajectory is fully determined by the parameters:

\[\text{Parameters} \rightarrow \text{Unique trajectory} \rightarrow P(\text{data}|\text{trajectory})\]

With a stochastic model, the same parameters can produce infinitely many different trajectories:

\[\text{Parameters} \rightarrow \text{Many possible trajectories} \rightarrow \text{???}\]

The likelihood we need is \(P(\text{data}|\theta)\), which marginalises over all possible trajectories:

\[P(\text{data}|\theta) = \int P(\text{data}|\text{trajectory}) \times P(\text{trajectory}|\theta) \, d(\text{trajectory})\]

This integral is intractable - we can’t enumerate all possible trajectories. Particle filters give us a Monte Carlo approximation.
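To make the integral concrete, here is a naive Monte Carlo sketch of it (illustrative only: `simulate_trajectory` is a hypothetical stand-in for any stochastic simulator, and a Poisson observation model is assumed). It simulates whole trajectories from the model and averages their likelihoods:

```julia
using Distributions, Random, Statistics

function naive_mc_likelihood(rng, simulate_trajectory, obs, θ; n_sims=1000)
    liks = map(1:n_sims) do _
        # One draw from P(trajectory | θ)
        traj = simulate_trajectory(rng, θ, length(obs))
        # P(data | trajectory), assuming Poisson observations
        prod(pdf(Poisson(max(x, 1e-10)), y) for (x, y) in zip(traj, obs))
    end
    return mean(liks)  # Monte Carlo average ≈ ∫ P(data|traj) P(traj|θ) d(traj)
end
```

In practice this naive estimator has enormous variance: almost every unguided trajectory misses the data somewhere, so nearly all terms in the average are close to zero. Particle filters estimate the same quantity far more efficiently by correcting the trajectories at every observation time.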

Load data and define SEIT4L dynamics

# Load Tristan da Cunha data
flu_tdc = CSV.read(datadir("flu_tdc_1971.csv"), DataFrame)

# Parameters (reasonable values from literature)
θ_example = Dict(
    :R_0 => 7.0,
    :D_lat => 1.0,
    :D_inf => 4.0,
    :α => 0.5,
    :D_imm => 10.0,
    :ρ => 0.65
)

# Initial state for SEIT4L
init_state = Dict(
    :S => 279.0,
    :E => 0.0,
    :I => 2.0,
    :T1 => 3.0,
    :T2 => 0.0,
    :T3 => 0.0,
    :T4 => 0.0,
    :L => 0.0
)
Dict{Symbol, Float64} with 8 entries:
  :I  => 2.0
  :T2 => 0.0
  :L  => 0.0
  :S  => 279.0
  :T4 => 0.0
  :T1 => 3.0
  :E  => 0.0
  :T3 => 0.0

The Gillespie simulator

Note: Why SEIT4L instead of SEITL?

From here on, we use the SEIT4L model (Erlang-4 temporary immunity) rather than the basic SEITL (exponential temporary immunity). As we saw in the SEITL session, the Erlang distribution gives more realistic timing for immunity loss. The original Camacho et al. (2011) paper found that SEIT4L fits the Tristan da Cunha data better. Everything below works identically with SEITL — just with fewer T sub-compartments.

The particle filter needs to simulate many stochastic trajectories. We use the Gillespie algorithm introduced in the SEITL session, adapted here for the SEIT4L model with its four T compartments:

Gillespie step implementation (click to expand)
# Stoichiometry matrix: how each transition changes [S, E, I, T1, T2, T3, T4, L]
const STOICH = [
    [-1,  1,  0,  0,  0,  0,  0,  0],  # S -> E (infection)
    [ 0, -1,  1,  0,  0,  0,  0,  0],  # E -> I (become infectious)
    [ 0,  0, -1,  1,  0,  0,  0,  0],  # I -> T1 (recovery, temporary immunity)
    [ 0,  0,  0, -1,  1,  0,  0,  0],  # T1 -> T2 (Erlang stages)
    [ 0,  0,  0,  0, -1,  1,  0,  0],  # T2 -> T3
    [ 0,  0,  0,  0,  0, -1,  1,  0],  # T3 -> T4
    [ 0,  0,  0,  0,  0,  0, -1,  1],  # T4 -> L (long-term immunity, probability α)
    [ 1,  0,  0,  0,  0,  0, -1,  0],  # T4 -> S (immunity wanes, probability 1 - α)
]

# Transition rates for the current state; entry i matches row i of STOICH
function compute_rates(state, θ)
    S, E, I, T1, T2, T3, T4, L = state
    N = sum(state)
    β = θ[:R_0] / θ[:D_inf]
    return [
        β * S * I / N,                      # S -> E
        E / θ[:D_lat],                      # E -> I
        I / θ[:D_inf],                      # I -> T1
        4 * T1 / θ[:D_imm],                 # T1 -> T2 (Erlang stages)
        4 * T2 / θ[:D_imm],                 # T2 -> T3
        4 * T3 / θ[:D_imm],                 # T3 -> T4
        θ[:α] * 4 * T4 / θ[:D_imm],         # T4 -> L
        (1 - θ[:α]) * 4 * T4 / θ[:D_imm],   # T4 -> S
    ]
end

function gillespie_step(rng, state, θ)
    rates = compute_rates(state, θ)
    daily_incidence = 0
    t = 0.0

    while t < 1.0  # Simulate one day
        total_rate = sum(rates)
        if total_rate == 0
            break
        end

        # Time to next event
        dt = randexp(rng) / total_rate
        if t + dt > 1.0
            break
        end
        t += dt

        # Choose which event occurs, proportional to its rate
        event = sample(rng, 1:length(rates), Weights(rates))

        # Update state
        state = state .+ STOICH[event]

        # Track E -> I transitions as new cases
        if event == 2
            daily_incidence += 1
        end

        # Recalculate rates with the new state
        rates = compute_rates(state, θ)
    end

    return state, daily_incidence
end

Key aspects:

  • Stoichiometry matrix: Defines how each transition changes the state vector
  • Rates: Computed from current state and parameters
  • Event selection: Exponential waiting time, event chosen proportional to rates
  • Incidence tracking: Counts E→I transitions as new cases

How particle filters work

The particle filter approximates the likelihood integral by:

  1. Propagate: Simulate many independent trajectories (“particles”) forward through time
  2. Weight: At each observation time, assign weights based on how well each particle matches the data
  3. Resample: Replace low-weight particles with copies of high-weight particles

We can visualise this with our SEIT4L model:

# Propagate 20 particles for 5 days
n_particles = 20
particles = [Float64[init_state[:S], init_state[:E], init_state[:I],
                     init_state[:T1], init_state[:T2], init_state[:T3],
                     init_state[:T4], init_state[:L]] for _ in 1:n_particles]

# Store trajectories (days 0-5)
trajectories = [zeros(6) for _ in 1:n_particles]

# Simulate 5 days
for day in 1:5
    for i in 1:n_particles
        particles[i], inc = gillespie_step(Random.default_rng(), particles[i], θ_example)
        trajectories[i][day+1] = inc
    end
end

# For weighting step later
day_5_predictions = [trajectories[i][6] for i in 1:n_particles]
day_5_observed = flu_tdc.obs[6]

# Plot
p = plot(title="Particle Filter: Propagation Step",
         xlabel="Time (days)", ylabel="Daily incidence")

# Particle trajectories
for i in 1:n_particles
    plot!(p, 0:5, trajectories[i], alpha=0.4, color=:steelblue, linewidth=1, label="")
end

# All observations - earlier in gray, previous (day 4) in red, current target (day 5) as open circle
scatter!(p, flu_tdc.time[1:4], flu_tdc.obs[1:4],
         label="Earlier observations", markersize=4, color=:gray)
scatter!(p, [flu_tdc.time[5]], [flu_tdc.obs[5]],
         label="Previous observation", markersize=7, color=:red)
scatter!(p, [flu_tdc.time[6]], [flu_tdc.obs[6]],
         label="Current target", markersize=7, markershape=:circle,
         markerstrokecolor=:red, markerstrokewidth=2, color=:white)

plot!(p, legend=:topright)

Each blue line is one particle trajectory. The particles spread apart because each follows a different sequence of random events. The current target observation (the open circle at day 5) will be used to weight the particles.

The weighting step

Particles that predict incidence close to the observed value get higher weights:

# Compute weights (Poisson likelihood)
log_w = [logpdf(Poisson(max(θ_example[:ρ] * pred, 0.01)), day_5_observed)
         for pred in day_5_predictions]

# Normalise
particle_weights = exp.(log_w .- maximum(log_w))
particle_weights ./= sum(particle_weights)

# Visualise
p = bar(1:n_particles, particle_weights,
        xlabel="Particle", ylabel="Weight",
        title="Particle Weights at Day 5 (obs = $day_5_observed)",
        legend=false, alpha=0.7)

# Add predictions as labels
for (i, pred) in enumerate(day_5_predictions)
    annotate!(p, i, particle_weights[i] + 0.02, text("$(round(Int, pred))", 7))
end

plot!(p)

Particles predicting values closer to the observation get higher weights.

The resampling step

When weights become too uneven, we resample - drawing particles with replacement according to their weights:

# Resample particles
n_effective = 1.0 / sum(particle_weights.^2)  # Effective sample size
println("Effective sample size: $(round(n_effective, digits=1)) out of $n_particles")

# Draw new particles
resampled_idx = sample(1:n_particles, Weights(particle_weights), n_particles)

# Count how many times each original particle was selected
counts = [count(==(i), resampled_idx) for i in 1:n_particles]

p = bar(1:n_particles, counts,
        xlabel="Original particle", ylabel="Copies after resampling",
        title="Resampling: High-weight particles duplicated",
        legend=false, alpha=0.7, color=:green)
plot!(p)
Effective sample size: 2.0 out of 20

High-weight particles are duplicated, low-weight particles are discarded. This focuses computational effort on plausible trajectories.

Exercise: Understand the particle filter steps

Look at the three plots above (propagation, weighting, resampling).

  1. In the propagation plot, why do the particle trajectories spread apart over time?
  2. In the weighting plot, particles with predicted incidence closer to the observed value get higher weights. Why does Poisson likelihood give this behaviour?
  3. After resampling, some particles are duplicated many times while others disappear entirely. Why is this beneficial for estimating the likelihood?
Tip: Why resampling is essential (particle degeneracy)

Without resampling, particle filters suffer from degeneracy: after several time steps, almost all weight concentrates on a single particle. This happens because multiplicative weights compound - a particle that’s slightly better at time 1 gets a slightly higher weight, making it even more likely to dominate at time 2, and so on.

Imagine 100 particles after 10 time steps without resampling:

  • 1 particle has weight 0.99
  • 99 particles share the remaining 0.01

This means we’re effectively using just 1 trajectory to estimate the likelihood - a very poor approximation. Resampling fixes this by redistributing particles to maintain diversity, keeping the effective sample size high.
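The collapse is easy to simulate. This small sketch (not part of the session code) compounds random per-step weights for 100 particles over 10 steps without ever resampling, then reports the effective sample size; standard-normal log-weight increments stand in for the observation weights:

```julia
using Random, Statistics

Random.seed!(42)
n, steps = 100, 10
log_w = zeros(n)
for _ in 1:steps
    log_w .+= randn(n)  # stand-in for per-step log observation weights
end
w = exp.(log_w .- maximum(log_w))  # exponentiate stably
w ./= sum(w)                        # normalise
ess = 1 / sum(w .^ 2)               # effective sample size
println("ESS after $steps steps without resampling: $(round(ess, digits=1)) of $n")
```

Typical runs give an ESS in the low single digits: effectively one or two trajectories are doing all the work.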

Accumulating the likelihood

The log-likelihood is estimated by summing the log of mean weights at each time step:

\[\log \hat{P}(\text{data}|\theta) = \sum_{t=1}^{T} \log \left( \frac{1}{N} \sum_{i=1}^{N} w_t^{(i)} \right)\]

Strictly, it is the likelihood estimate \(\hat{P}(\text{data}|\theta)\), not its logarithm, that is unbiased: if we run the particle filter many times, the average of the likelihood estimates converges to the true likelihood. The log estimate is biased slightly downward (by Jensen’s inequality), which is why the mean log-likelihood in the calibration below rises as the particle count grows.
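In code, the accumulation formula is a one-liner over the per-step raw (unnormalised) particle weights:

```julia
using Statistics

# Sum of log mean raw weights, one term per observation time
loglik_from_weights(weights_per_step) =
    sum(log(mean(w)) for w in weights_per_step)

# Toy example: two time steps, three particles each
loglik_from_weights([[0.2, 0.1, 0.3], [0.05, 0.02, 0.02]])  # log(0.2) + log(0.03) ≈ -5.12
```

Note the weights must be the raw likelihood values \(w_t^{(i)}\), taken before the normalisation used for resampling.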

Implementing the state-space model interface

For production use, we define the model using the SSMProblems.jl interface. This separates the model from the filtering algorithm, so you can swap in different filtering methods (bootstrap, guided, auxiliary) without rewriting the model. It also integrates with Turing.jl for the Particle MCMC in the next session.

A state-space model has three components:

  1. Initial state distribution (SEIT4LInitial) - where does the system start?
  2. Transition dynamics (SEIT4LDynamics) - how does the state evolve? (uses gillespie_step)
  3. Observation process (PoissonObservation) - how do observations relate to the state?

Here is the full interface implementation (also available in src/seit4l_particle_filter.jl):

# Initial state: deterministic starting point
struct SEIT4LInitial <: SSMProblems.StatePrior
    init_state::Vector{Float64}
end

function SSMProblems.simulate(rng::AbstractRNG, prior::SEIT4LInitial; kwargs...)
    vcat(prior.init_state, 0.0)  # Append 0 for initial daily incidence
end

# Dynamics: Gillespie simulation for one day
struct SEIT4LDynamics <: SSMProblems.LatentDynamics
    θ::Dict{Symbol, Float64}
end

function SSMProblems.simulate(rng::AbstractRNG, dyn::SEIT4LDynamics,
                              step::Integer, prev_state; kwargs...)
    state = collect(prev_state[1:8])  # Extract compartments
    new_state, daily_inc = gillespie_step(rng, state, dyn.θ)
    return vcat(new_state, daily_inc)  # Return state with incidence appended
end

# Observation: Poisson likelihood on reported cases
struct PoissonObservation <: SSMProblems.ObservationProcess
    ρ::Float64  # Reporting rate
end

function SSMProblems.distribution(obs::PoissonObservation, step::Integer,
                                  state; kwargs...)
    daily_inc = state[end]
    Poisson(max(obs.ρ * daily_inc, 1e-10))
end
  • SSMProblems.LatentDynamics - abstract type for state transition models. Implement simulate to define how the state evolves.
  • SSMProblems.ObservationProcess - abstract type for observation models. Implement distribution to return a Distributions.jl distribution.
  • AbstractRNG parameter allows reproducible random number generation across particles.

How the pieces fit together:

  • The state vector is [S, E, I, T1, T2, T3, T4, L, daily_inc] - the 9th element tracks daily incidence for the observation process
  • SEIT4LDynamics wraps the Gillespie simulator, calling gillespie_step each day
  • PoissonObservation assumes observed cases follow Poisson(ρ × true_incidence)

Running the bootstrap particle filter

The run_particle_filter function assembles the SSM components and runs the bootstrap filter:

function run_particle_filter(θ, obs, n_particles;
                             init_state=[279.0, 0.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0])
    # Define SSM components
    initial = SEIT4LInitial(init_state)
    dynamics = SEIT4LDynamics(θ)
    observation = PoissonObservation(θ[:ρ])

    # Create state-space model
    model = StateSpaceModel(initial, dynamics, observation)

    # Run bootstrap particle filter
    rng = Random.default_rng()
    algo = BF(n_particles)  # Bootstrap Filter
    _, log_lik = GeneralisedFilters.filter(rng, model, algo, obs)

    return log_lik
end

This ties everything together: the initial state, Gillespie dynamics, and Poisson observations form a StateSpaceModel, which the bootstrap filter (BF) processes to estimate the log-likelihood.

Testing it:

# Run with 100 particles
ll = run_particle_filter(θ_example, flu_tdc.obs, 100)
println("Log-likelihood estimate: $(round(ll, digits=1))")
Log-likelihood estimate: -118.2

Calibrating particle count

A key question: how many particles do we need?

  • Too few particles produce noisy likelihood estimates, causing poor MCMC mixing
  • Too many particles waste computation

The principled approach targets a specific variance of the log-likelihood estimate. Pitt et al. (2012) and Doucet et al. (2015) showed that MCMC efficiency is maximised when the standard deviation of the log-likelihood estimate is between 1 and 3.

Why is some noise acceptable? MCMC with a noisy likelihood (called “pseudo-marginal MCMC”) still targets the correct posterior — the noise slows down mixing but doesn’t introduce bias. An SD of 1-3 is a sweet spot: low enough for reasonable MCMC acceptance rates, high enough to avoid wasting computation on unnecessary precision.
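A rough rule of thumb (assuming, as is approximately true for particle filters, that the variance of the log-likelihood estimate scales as \(1/N\)) lets you extrapolate from a pilot run:

```julia
# Estimate the particle count needed for a target SD, given a pilot run.
# Assumes Var(log-likelihood) ∝ 1/N, which holds only approximately.
particles_for_target_sd(n_pilot, sd_pilot, sd_target) =
    ceil(Int, n_pilot * (sd_pilot / sd_target)^2)

particles_for_target_sd(64, 3.4, 1.5)  # ≈ 329 particles
```

This is only a starting point; the calibration below measures the SD directly at each particle count.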

Calibrating (this takes 1-2 minutes to run all particle counts):

# Reference parameters (near what we expect the posterior mode to be)
θ_cal = Dict(:R_0 => 6.0, :D_lat => 1.3, :D_inf => 2.0,
             :α => 0.5, :D_imm => 10.5, :ρ => 0.7)

# Test different particle counts
particle_counts = 2 .^ (2:9)  # 4, 8, 16, ..., 512
n_reps = 30

calibration = DataFrame(n_particles=Int[], mean_ll=Float64[],
                        sd_ll=Float64[], time_per_run=Float64[])

for n in particle_counts
    t0 = time()
    lls = [run_particle_filter(θ_cal, flu_tdc.obs, n) for _ in 1:n_reps]
    elapsed = (time() - t0) / n_reps

    push!(calibration, (n, mean(lls), std(lls), elapsed))
end

calibration
8×4 DataFrame
 Row │ n_particles  mean_ll   sd_ll      time_per_run
     │ Int64        Float64   Float64    Float64
─────┼────────────────────────────────────────────────
   1 │           4  -147.911  26.6157      0.00483963
   2 │           8  -134.305  17.1061      0.00285196
   3 │          16  -128.136  14.4657      0.003629
   4 │          32  -118.14    6.38355     0.00707473
   5 │          64  -117.235   3.43089     0.0142501
   6 │         128  -115.902   0.70281     0.0287174
   7 │         256  -115.981   0.427675    0.0580098
   8 │         512  -115.822   0.356587    0.12216
# Plot calibration results
p1 = plot(calibration.n_particles, calibration.mean_ll,
    xlabel="Number of particles", ylabel="Mean log-likelihood",
    title="Bias", legend=false, marker=:circle, linewidth=2, xscale=:log2)

p2 = plot(calibration.n_particles, calibration.sd_ll,
    xlabel="Number of particles", ylabel="SD of log-likelihood",
    title="Variance (target: 1-3)", legend=false, marker=:circle,
    linewidth=2, xscale=:log2)
hline!(p2, [1, 3], color=:grey, linestyle=:dot, alpha=0.7)

p3 = plot(calibration.n_particles, calibration.time_per_run,
    xlabel="Number of particles", ylabel="Time (seconds)",
    title="Computation time", legend=false, marker=:circle,
    linewidth=2, xscale=:log2)

plot(p1, p2, p3, layout=(1, 3), size=(900, 300))

Look at the variance plot (middle). We want the SD to be in the range 1-3. Read off which particle count achieves this.

In our runs, we typically see:

  • 4-16 particles: SD > 10, far too noisy for reliable inference
  • 32-64 particles: SD around 3-6, borderline
  • 128 particles: SD around 1.5-3, in the target range
  • 256-512 particles: SD < 1.5, diminishing returns for the extra computation

Your numbers will differ due to stochastic variability, but the pattern should be similar. For the PMCMC session, we’ll use around 128-256 particles based on this calibration.

Exercise: Choose the number of particles

Based on the calibration plots above:

  1. What particle count achieves SD of log-likelihood between 1 and 3?
  2. What’s the trade-off between 128 and 256 particles for this problem?
  3. Why might you want to use more particles than the minimum needed?
Exercise: Sensitivity to parameters

The particle filter likelihood estimate depends on the parameters. Explore how the log-likelihood changes as you vary R₀:

# Vary R_0 and compute log-likelihoods
R0_values = 3.0:1.0:12.0
n_particles = 128  # Use your calibrated value

ll_results = Float64[]
for R0 in R0_values
    θ_test = Dict(:R_0 => R0, :D_lat => 1.3, :D_inf => 2.0,
                  :α => 0.5, :D_imm => 10.5, :ρ => 0.7)
    ll = run_particle_filter(θ_test, flu_tdc.obs, n_particles)
    push!(ll_results, ll)
end

plot(R0_values, ll_results, xlabel="R_0", ylabel="Log-likelihood",
     title="Likelihood surface", marker=:circle, legend=false)

Questions:

  1. Which value of R₀ gives the highest log-likelihood?
  2. Run the same R₀ value multiple times. How much does the log-likelihood vary? (This is the noise from particle filtering)
  3. How does the shape of this likelihood surface compare to what you’d expect from deterministic model fitting?

Summary

Stochastic models cannot be fitted with standard MCMC because the likelihood requires integrating over all possible trajectories. Particle filters solve this by approximating the integral with Monte Carlo simulation.

The algorithm is elegant: propagate many trajectories (particles) forward, weight them by how well they match observations, resample to focus on plausible trajectories, and accumulate likelihood contributions. The resulting estimate is unbiased, though noisy.

The number of particles is a critical tuning parameter. Too few gives high variance in the likelihood estimate, causing poor MCMC mixing. Too many wastes computation. The optimal choice targets a log-likelihood standard deviation between 1 and 3.

Tip: Learning points
  • Stochastic models require particle filters because the likelihood marginalises over all possible trajectories
  • Particle filters work by: propagate particles forward, weight by observation likelihood, resample when weights become uneven
  • SSMProblems.jl interface separates model specification from filtering algorithms
  • Calibrate particle count to achieve log-likelihood SD between 1 and 3 (too few = noisy, too many = slow)
  • The likelihood estimate is unbiased but noisy - this noise slows MCMC but doesn’t bias posteriors
  • Next step: The noisy likelihood estimate slots directly into the Metropolis-Hastings acceptance ratio — wherever we previously computed log_likelihood(θ), we now use run_particle_filter(θ, data, n_particles) instead
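To preview how that works, here is a minimal sketch of one pseudo-marginal Metropolis-Hastings step (a Gaussian random-walk proposal is assumed, and `loglik_hat` and `log_prior` are placeholder callables, not session code). The key detail is that the noisy estimate for the current parameters is stored and reused, never re-estimated:

```julia
using Random

function pmmh_step(rng, θ, ll_current, log_prior, loglik_hat; step_size=0.5)
    # Random-walk proposal
    θ_prop = θ .+ step_size .* randn(rng, length(θ))
    # One particle-filter run gives a fresh noisy log-likelihood estimate
    ll_prop = loglik_hat(θ_prop)
    # Standard MH acceptance ratio, using the stored estimate for the current θ
    log_α = (ll_prop + log_prior(θ_prop)) - (ll_current + log_prior(θ))
    if log(rand(rng)) < log_α
        return θ_prop, ll_prop   # accept: store the new noisy estimate
    else
        return θ, ll_current     # reject: keep the stored estimate
    end
end
```

Reusing `ll_current`, rather than re-running the filter for the current θ, is what makes pseudo-marginal MCMC target the exact posterior despite the noise.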

Next session: In Particle MCMC, we’ll use the particle filter within an MCMC framework to fit the stochastic SEIT4L model to data.

References