MCMC Diagnostics

Estimated time: 1.5-2 hours

Lecture slides

Introduction

In the previous session, you learned how MCMC algorithms explore posterior distributions by taking random walks through parameter space. You implemented Metropolis-Hastings yourself and saw how Turing.jl’s NUTS algorithm automates everything with gradient-based sampling. But there’s a critical question we haven’t yet fully addressed: How do you know if the inference procedure has worked?

MCMC algorithms don’t come with guarantees. A chain will always give you numbers, but those numbers might not be representative posterior samples. The chain could be stuck exploring only part of the parameter space, still working towards the target distribution, or mixing so slowly that your “samples” are nearly duplicates. MCMC only guarantees unbiased sampling in the limit of infinitely many samples. Using unreliable MCMC output leads to incorrect inference - wrong parameter estimates, overconfident intervals, and bad scientific conclusions.

This session teaches you how to diagnose MCMC chains systematically. You will learn to spot problems visually with trace plots, quantify convergence with R̂ and effective sample size, and understand when your posterior samples are trustworthy. These diagnostic skills are essential for any serious Bayesian analysis.

Objectives

In this session, you will learn how to assess whether your MCMC chains have converged and whether your posterior samples are reliable. This is an important component of trustworthy Bayesian analysis.

You will learn to:

  1. Visually diagnose chains using trace plots and other visualisations
  2. Calculate quantitative diagnostics like R̂ and effective sample size
  3. Identify common problems and how to fix them
  4. Use Turing.jl’s built-in diagnostics for automated checking
  5. Apply best practices for robust MCMC in practice

Source file

The source file of this session is located at sessions/mcmc_diagnostics.qmd.

Load libraries

using Random ## for random numbers
using Distributions ## for probability distributions
using DifferentialEquations ## for differential equations
using DataFrames ## for data frames
using Plots ## for plots
using StatsPlots ## for statistical plots
using Turing ## for probabilistic programming and MCMC
using MCMCChains ## for MCMC diagnostics
using StatsBase ## for autocorrelation

Initialisation

We set a random seed for reproducibility. This ensures that you get exactly the same results on your computer as we do.

Random.seed!(12345)
TaskLocalRNG()

Load data and model from previous session

We’ll use the same SIR model and epidemic data from previous sessions.

# Initial state and time range
init_state = [999, 1, 0]
t0, tend = (0, 30)

# Epidemic data
epi1 = DataFrame(
    time = 1:1:30,
    obs = [
        1, 3, 4, 11, 19, 38, 67, 112, 183, 210, 237, 182, 178, 154, 95, 68, 51,
        30, 32, 18, 19, 9, 2, 2, 2, 0, 0, 1, 0, 0
    ]
)

# Define our SIR model in Turing.jl (same as MCMC session)
@model function sir_model(init_state, times, n_obs)
    # Priors
    R_0 ~ Uniform(1.0, 20.0)
    D_inf ~ Uniform(1.0, 14.0)

    # Simulate ODE
    prob = ODEProblem(sir_ode!, init_state, (times[1], times[end]), [R_0, D_inf])
    sol = solve(prob, saveat = times)

    # Likelihood
    I = sol[2, :]
    # Use max to handle tiny numerical errors from automatic differentiation
    # Use max to guard against tiny negative values from numerical error in the ODE solve
    lambdas = [max(I[i], 0.0) for i in 1:n_obs]
    obs ~ arraydist(Poisson.(lambdas))
end

# Create model instance
model = sir_model(init_state, epi1.time, length(epi1.obs)) | (; obs = epi1.obs)

Visual diagnostics

1. Trace plots: The most important diagnostic

Trace plots show parameter values over iterations. A well-mixed chain looks like a “hairy caterpillar” - densely filled with rapid up-and-down fluctuations around a stable mean:

# Good chain: well-mixing NUTS
chain_good = sample(model, NUTS(), 2000)
# chain_good[:R_0] extracts the samples for R_0 from the chain object
# You can use this with any parameter name defined in your @model function
plot(
    chain_good[:R_0], title = "Good Chain: R_0", xlabel = "Iteration",
    ylabel = "R_0", linewidth = 1
)
# For comparison, let's look at the other parameter too
p1 = plot(
    chain_good[:R_0], title = "R_0 Trace", xlabel = "Iteration",
    ylabel = "R_0", linewidth = 1
)
p2 = plot(
    chain_good[:D_inf], title = "D_inf Trace", xlabel = "Iteration",
    ylabel = "D_inf", linewidth = 1
)

plot(p1, p2, layout = (2, 1), size = (800, 600))

The features that you’ll want to look out for in trace plots are:

  • Good mixing: Chain moves freely around parameter space
  • Stationarity: No obvious trends or drifts
  • Fat hairy caterpillar: Rapid exploration of the posterior

Features to avoid are:

  • Poor mixing: Chain gets stuck in regions
  • Trends: Chain still climbing/descending
  • Thin worm: Chain not exploring enough
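To see these failure modes concretely, here is an illustrative sketch (a toy Normal target, not the SIR model): a random-walk Metropolis sampler with deliberately mis-tuned proposal step sizes.

```julia
# Illustrative sketch (a toy target, not the SIR model): random-walk
# Metropolis on a standard Normal, with deliberately mis-tuned step sizes.
using Random, Statistics

logtarget(x) = -0.5 * x^2  # standard Normal log-density, up to a constant

function rw_metropolis(step; n = 2000, x0 = 0.0, rng = Random.default_rng())
    chain = Vector{Float64}(undef, n)
    x, lp = x0, logtarget(x0)
    accepted = 0
    for i in 1:n
        x_new = x + step * randn(rng)
        lp_new = logtarget(x_new)
        if log(rand(rng)) < lp_new - lp  # Metropolis acceptance step
            x, lp = x_new, lp_new
            accepted += 1
        end
        chain[i] = x
    end
    return chain, accepted / n
end

Random.seed!(1)
chain_tiny, acc_tiny = rw_metropolis(0.01)  # "thin worm": barely explores
chain_huge, acc_huge = rw_metropolis(50.0)  # "stuck": long flat stretches
println("acceptance rate, step = 0.01: ", round(acc_tiny, digits = 2))
println("acceptance rate, step = 50:   ", round(acc_huge, digits = 2))
```

Plotting `chain_tiny` and `chain_huge` as trace plots shows the two pathologies: near-100% acceptance with almost no movement, and near-0% acceptance with long flat segments.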

2. Running averages

Running averages should stabilise if the chain has converged:

function running_mean(x)
    return [mean(x[1:i]) for i in eachindex(x)]
end

R_0_samples = vec(chain_good[:R_0])
p1 = plot(
    running_mean(R_0_samples), title = "Running Mean: R_0",
    xlabel = "Iteration", ylabel = "Running Mean R_0", linewidth = 2
)
hline!(
    p1, [mean(R_0_samples)], color = :red, linestyle = :dash,
    label = "Final mean"
)

D_inf_samples = vec(chain_good[:D_inf])
p2 = plot(
    running_mean(D_inf_samples), title = "Running Mean: D_inf",
    xlabel = "Iteration", ylabel = "Running Mean D_inf", linewidth = 2
)
hline!(
    p2, [mean(D_inf_samples)], color = :red, linestyle = :dash,
    label = "Final mean"
)

plot(p1, p2, layout = (2, 1))

3. Autocorrelation

MCMC samples are correlated. Good chains have autocorrelation that decays quickly:

function autocorr_plot(chain, param_name; max_lag = 100)
    samples = Array(chain[param_name])  # Convert to plain array
    lags = 0:max_lag
    # autocor from StatsBase expects a vector and returns scalar for each lag
    autocorrs = [autocor(samples, [lag])[1] for lag in lags]

    plot(
        lags, autocorrs, title = "Autocorrelation: $param_name",
        xlabel = "Lag", ylabel = "Autocorrelation", linewidth = 2
    )
    hline!([0], color = :gray, linestyle = :dash, label = "")
    hline!(
        [-0.05, 0.05], color = :red, linestyle = :dash, label = "±5% threshold"
    )
end

p1 = autocorr_plot(chain_good, :R_0)
p2 = autocorr_plot(chain_good, :D_inf)

plot(p1, p2, layout = (2, 1), size = (800, 600))

We can see that the autocorrelation drops quickly to near zero and both parameters show good mixing: after ~20-30 MCMC steps, samples are essentially independent.

Exercise: Interpret the visual diagnostics

Look back at the trace plots, running mean plots, and autocorrelation plots above.

  1. What would a bad trace plot look like? Sketch or describe what you would see if the chain was stuck or not mixing well.
  2. If the running mean was still trending upward after 2000 iterations, what would that suggest?
  3. If autocorrelation remained high (say > 0.5) even at lag 100, what would that indicate about your effective sample size?

4. Burn-in and warmup

MCMC chains often start from arbitrary initial values and need time to “find” the high-probability regions of the posterior. Samples from this initial exploration phase (called “burn-in”) are discarded because they don’t represent the target distribution.

Modern samplers like NUTS have automatic warmup periods where they adapt the step size to achieve good acceptance rates, learn the geometry of the posterior and explore to find typical parameter values.

In our earlier example, when we ran sample(model, NUTS(), 2000), Turing actually ran ~3000 iterations total: 1000 warmup iterations (automatically discarded) plus 2000 sampling iterations. The returned chain contains only the 2000 post-warmup samples.

By default, NUTS uses 1000 warmup iterations regardless of how many sampling iterations you request. You can change this with the n_adapts keyword: NUTS(0.65; n_adapts=2000) uses 2000 warmup iterations instead. For most models, the default 1000 is sufficient.

With NUTS, you typically don’t need to manually discard burn-in samples. The warmup is automatic and adaptive. However, you should still check trace plots to ensure the chain has converged - if you see trends even after warmup, you may need longer adaptation.
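NUTS discards its warmup automatically, but for samplers without adaptive warmup (like the Metropolis-Hastings sampler you wrote in the previous session) burn-in removal is a manual step. A minimal sketch:

```julia
# Drop the first `frac` of a raw chain as burn-in. Only needed for samplers
# without automatic warmup; NUTS chains returned by Turing are already
# post-warmup.
drop_burnin(x::AbstractVector; frac = 0.25) = x[(floor(Int, frac * length(x)) + 1):end]

raw = collect(1:2000)                # stand-in for a raw chain
kept = drop_burnin(raw; frac = 0.25)
println(length(kept))                # 1500 samples remain
```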

Quantitative diagnostics

1. R̂ statistic (Gelman-Rubin diagnostic)

R̂ compares within-chain and between-chain variance. It should be < 1.1 (ideally < 1.01):
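To make the within/between comparison concrete, here is a hand-rolled sketch of the basic (non-split) Gelman-Rubin statistic; `rhat()` from MCMCChains computes the more robust split version.

```julia
# Hand-rolled sketch of the basic (non-split) Gelman-Rubin R̂: compare the
# mean within-chain variance W to the between-chain variance B.
using Random, Statistics

function basic_rhat(chains::Vector{<:AbstractVector})
    n = length(first(chains))
    W = mean(var.(chains))             # mean within-chain variance
    B = n * var(mean.(chains))         # between-chain variance of the means
    var_hat = (n - 1) / n * W + B / n  # pooled posterior variance estimate
    return sqrt(var_hat / W)
end

Random.seed!(42)
converged = [randn(1000) for _ in 1:4]                # chains agree: R̂ ≈ 1
two_modes = [randn(1000) .+ 5 * (i % 2) for i in 1:4] # chains disagree: R̂ >> 1
println("R̂, converged: ", round(basic_rhat(converged), digits = 3))
println("R̂, two modes: ", round(basic_rhat(two_modes), digits = 3))
```

When chains sample the same distribution, between-chain variance matches within-chain variance and R̂ ≈ 1; when chains sit in different regions, B dominates and R̂ grows well above 1.1.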

Note: Running Julia with multiple threads

To run multiple chains in parallel, Julia must be started with multiple threads. Check your current thread count with Threads.nthreads(). If it shows 1, restart Julia with:

julia -t auto          # Use all available CPU cores
julia -t 4             # Use exactly 4 threads

In VS Code, you can set this permanently in settings: search for “Julia: Num Threads” and set it to “auto” or a specific number. Without multiple threads, chains will run sequentially (slower but still correct).

# Run multiple chains for R̂ calculation
# Detect available threads to avoid overwhelming CI systems
n_chains = max(2, min(4, Threads.nthreads()))
println("Running $n_chains chains in parallel ($(Threads.nthreads()) threads available)")
chain_multi = sample(model, NUTS(), MCMCThreads(), 2000, n_chains)
  • MCMCThreads() - tells Turing to run multiple chains in parallel using Julia threads. Much faster than running sequentially on multi-core machines.
  • sample(model, sampler, MCMCThreads(), n_samples, n_chains) - the extended signature for parallel sampling. Here we request n_chains chains, each with n_samples samples.
  • Threads.nthreads() - returns how many threads Julia has available. Start Julia with julia -t auto to use all CPU cores.
# Check what each individual chain found
println("Individual chain summaries:")
for i in 1:n_chains
    chain_i = chain_multi[:, :, i]
    r0_mean = mean(chain_i[:R_0])
    dinf_mean = mean(chain_i[:D_inf])
    println(
        "  Chain $i: R_0 = $(round(r0_mean, digits = 2)), ",
        "D_inf = $(round(dinf_mean, digits = 2))"
    )
end

# Calculate R̂
println("\nR̂ values:")
rhat_values = DataFrame(rhat(chain_multi))
for row in eachrow(rhat_values)
    param = row.parameters
    value = row.rhat
    status = value < 1.1 ? "good" : "bad"
    println("  $param: $(round(value, digits = 3)) $status")
end
Individual chain summaries:
  Chain 1: R_0 = 2.45, D_inf = 1.94
  Chain 2: R_0 = 2.45, D_inf = 1.95

R̂ values:
  R_0: 1.004 good
  D_inf: 1.004 good
  • chain_multi[:, :, i] - extracts the i-th chain from a multi-chain object. The three dimensions are: samples × parameters × chains.
  • rhat(chain) - calculates the R̂ (Gelman-Rubin) diagnostic. Compares variance within chains to variance between chains. Values < 1.1 indicate convergence; < 1.01 is even better.
# Visualise multiple chains
plot(chain_multi)

2. Effective Sample Size (ESS)

ESS tells you how many independent samples your correlated MCMC samples are equivalent to:

ess_values = DataFrame(ess(chain_multi))
n_samples = length(chain_multi)
2000
  • ess(chain) - calculates Effective Sample Size. MCMC samples are correlated, so N samples don’t give you N independent draws. ESS estimates how many independent samples your chain is equivalent to.
println("Effective Sample Size:")
for row in eachrow(ess_values)
    param = row.parameters
    value = row.ess
    efficiency = value / n_samples * 100
    status = value > 100 ? "good" : "bad"  # Rule of thumb: ESS > 100
    println(
        "  $param: $(round(value, digits = 0)) / $n_samples ",
        "($(round(efficiency, digits = 1))%) $status"
    )
end
Effective Sample Size:
  R_0: 1040.0 / 2000 (52.0%) good
  D_inf: 1087.0 / 2000 (54.4%) good

ESS interpretation:

  • ESS ≈ N: Very efficient, little autocorrelation
  • ESS << N: High autocorrelation, need more samples
  • ESS < 100: Generally insufficient for reliable inference
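Conceptually, ESS divides the sample size by the autocorrelation "cost": ESS = N / (1 + 2 Σₖ ρₖ). A pedagogical sketch (the `ess()` estimator in MCMCChains is more sophisticated, but the idea is the same):

```julia
# Pedagogical sketch of ESS = N / (1 + 2 Σ ρ_k), truncating the sum at the
# first non-positive autocorrelation.
using Random, Statistics

function basic_ess(x::AbstractVector)
    n = length(x)
    μ, v = mean(x), var(x)
    ρsum = 0.0
    for k in 1:(n - 1)
        ρ = sum((x[i] - μ) * (x[i + k] - μ) for i in 1:(n - k)) / ((n - k) * v)
        ρ <= 0 && break  # simple initial-positive-sequence truncation
        ρsum += ρ
    end
    return n / (1 + 2ρsum)
end

Random.seed!(7)
iid = randn(2000)                 # independent draws: ESS ≈ N
ar = zeros(2000)                  # AR(1) with strong autocorrelation: ESS << N
for t in 2:2000
    ar[t] = 0.9 * ar[t - 1] + randn()
end
println("ESS, iid:     ", round(Int, basic_ess(iid)))
println("ESS, AR(0.9): ", round(Int, basic_ess(ar)))
```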

3. Monte Carlo Standard Error (MCSE)

MCSE quantifies uncertainty in your posterior estimates due to finite sampling:

mcse_values = DataFrame(mcse(chain_multi))
posterior_means = DataFrame(mean(chain_multi))

println("Posterior means ± MCSE:")
for row in eachrow(mcse_values)
    param = row.parameters
    mcse_val = row.mcse
    # Get mean for this parameter
    mean_val = posterior_means[posterior_means.parameters .== param, :mean][1]
    rel_error = mcse_val / abs(mean_val) * 100
    println(
        "  $param: $(round(mean_val, digits = 3)) ± ",
        "$(round(mcse_val, digits = 4)) ",
        "($(round(rel_error, digits = 2))% relative error)"
    )
end
Posterior means ± MCSE:
  R_0: 2.446 ± 0.0011 (0.05% relative error)
  D_inf: 1.945 ± 0.001 (0.05% relative error)
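MCSE connects the two previous diagnostics: the MCSE of the posterior mean is approximately the posterior standard deviation divided by √ESS. A sketch using the iid case, where ESS = N:

```julia
# MCSE of the posterior mean ≈ posterior sd / sqrt(ESS). For independent
# draws ESS = N, which recovers the familiar standard error of the mean.
using Random, Statistics

mcse_mean(x, ess) = std(x) / sqrt(ess)

Random.seed!(3)
draws = randn(10_000)                    # stand-in for posterior draws
println("MCSE ≈ ", round(mcse_mean(draws, length(draws)), digits = 4))
```

For a correlated chain you would plug in the (smaller) ESS instead of N, which inflates the MCSE accordingly.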

Common problems and solutions

Problem 1: Poor mixing (chain gets stuck)

Common causes:

  • Algorithm too slow: Basic samplers get stuck in regions
  • Poor step size: Proposals too small or too large
  • Difficult geometry: Highly correlated parameters, narrow ridges
  • Multimodality: Chain trapped in one mode

Solutions:

  1. Use better algorithms: NUTS (which we’re using!) is usually excellent
  2. Run longer: Increase number of samples if mixing is just slow
  3. Reparametrise: Transform parameters to improve geometry
  4. Multiple chains: Run several chains to detect stuck chains

For modern gradient-based samplers like NUTS, poor mixing is rare if:

  • Your model is specified correctly
  • The posterior is reasonably well-behaved
  • You’re using appropriate priors

Problem 2: Divergent transitions (for HMC/NUTS)

NUTS simulates a ball rolling over the posterior surface. A divergent transition occurs when the simulation becomes numerically unstable — the ball “flies off” to implausible values instead of following the contour. This typically happens when the posterior has sharp ridges or funnels: the step size that works in flat regions is too large for the narrow parts, causing numerical errors to explode.

A small number of divergences (say <1% of samples) may be acceptable, but many divergences indicate biased sampling — the sampler is systematically avoiding regions it can’t navigate, so the samples underrepresent those areas.
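The mechanism can be seen in a toy example (a sketch, not the session's model): leapfrog integration of Hamiltonian dynamics for a standard Normal target, i.e. potential U(q) = q²/2, conserves energy for small step sizes, but past the stability limit the trajectory "flies off" and the energy error explodes. That blow-up is exactly what NUTS flags as a divergence.

```julia
# Toy illustration of why divergences happen: leapfrog integration for a
# standard Normal target (potential U(q) = q²/2). Below the stability limit
# the energy is nearly conserved; above it the energy error explodes.
function leapfrog_energy_error(ε; steps = 50, q = 1.0, p = 1.0)
    H0 = 0.5 * (q^2 + p^2)              # initial energy
    for _ in 1:steps
        p -= 0.5 * ε * q                # half-step momentum (∇U(q) = q)
        q += ε * p                      # full-step position
        p -= 0.5 * ε * q                # half-step momentum
    end
    return abs(0.5 * (q^2 + p^2) - H0)  # energy error after the trajectory
end

println("energy error, ε = 0.1: ", leapfrog_energy_error(0.1))
println("energy error, ε = 2.5: ", leapfrog_energy_error(2.5))
```

A single step size must work everywhere; in a funnel-shaped posterior the ε that is stable in the wide regions exceeds the stability limit in the narrow ones, which is why divergences cluster there.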

# Check for divergent transitions (Turing's NUTS records them as
# :numerical_error in the chain internals)
if :numerical_error in names(chain_good, :internals)
    n_divergent = sum(chain_good[:numerical_error])
    println("Divergent transitions: $(Int(n_divergent)) / $(length(chain_good))")
    if n_divergent > 0
        println("Consider increasing adapt_delta or reparametrising")
    else
        println("No divergent transitions")
    end
end

Problem 3: Multi-modal posteriors

Sometimes posteriors have multiple peaks, and chains might get trapped in one mode. Running multiple chains with different starting points helps detect this - if chains converge to different modes, R̂ will be high.

Solutions for multimodal posteriors:

  1. Run multiple chains with different starting points
  2. Use more advanced samplers (parallel tempering, etc.)
  3. Check R̂: it will be high if chains are in different modes

Here’s an artificial example with two modes to show how multiple chains help:

# Create an artificial multimodal example
@model function multimodal_example()
    x ~ Normal(0, 5)  # Wide prior

    # Likelihood with two modes
    mode1_loglik = logpdf(Normal(-2, 0.5), x)
    mode2_loglik = logpdf(Normal(2, 0.5), x)
    Turing.@addlogprob! log(0.5 * exp(mode1_loglik) + 0.5 * exp(mode2_loglik))
end
multimodal_example (generic function with 2 methods)
  • Turing.@addlogprob! manually adds a value to the log-probability of the model. This is useful when you can’t express the likelihood using ~ syntax — here we need a mixture of two Normals, which requires computing the log-probability by hand.
  • The ~ syntax (e.g. x ~ Normal(0, 5)) automatically adds the log-density to the model’s total log-probability. @addlogprob! does the same thing, but you compute the value yourself.
# Single chain might miss one mode
chain_single = sample(multimodal_example(), NUTS(), 1000)

# Multiple chains more likely to find both modes
chains_multi = [sample(multimodal_example(), NUTS(), 1000) for _ in 1:4]
p = plot(title = "Multimodal Posterior")
for (i, chain) in enumerate(chains_multi)
    density!(p, chain[:x], alpha = 0.7, label = "Chain $i")
end
plot!(p)

Different chains find different modes, which would show up as high R̂ values.

Posterior predictive checks

Convergence diagnostics tell you if the algorithm worked. Posterior predictive checks tell you if the model makes sense.

Why check predictions? Even if MCMC converged perfectly, your model might still be wrong. The model could be mis-specified, missing important features, or based on incorrect assumptions. Posterior predictive checks reveal whether your model actually generates data that looks like what you observed.

# Generate predictions from the posterior
# decondition() removes the observed data, so Turing samples new observations
predictions = predict(decondition(model), chain_multi)
Chains MCMC chain (2000×30×2 Array{Float64, 3}):

Iterations        = 1:1:2000
Number of chains  = 2
Samples per chain = 2000
parameters        = obs[1], obs[2], obs[3], obs[4], obs[5], obs[6], obs[7], obs[8], obs[9], obs[10], obs[11], obs[12], obs[13], obs[14], obs[15], obs[16], obs[17], obs[18], obs[19], obs[20], obs[21], obs[22], obs[23], obs[24], obs[25], obs[26], obs[27], obs[28], obs[29], obs[30]
internals         = 

Use `describe(chains)` for summary statistics and quantiles.
  • predict(model, chain) - generates new data from the model using parameter values from the posterior chain. This is the basis for posterior predictive checks.
  • decondition(model) - removes the conditioning (observed data) from the model, so that obs ~ arraydist(Poisson.(lambdas)) samples new observations instead of computing likelihoods against the data. This is how you generate posterior predictive samples from a conditioned model.
# Plot posterior predictive samples
p = plot(title = "Posterior Predictive Check", xlabel = "Time", ylabel = "Cases")

# Plot multiple posterior predictive samples
for i in 1:50
    # Extract predictions for each time point
    pred_obs = [predictions[Symbol("obs[$j]")][i, 1] for j in 1:length(epi1.obs)]
    plot!(p, epi1.time, pred_obs, alpha = 0.3, color = :lightblue, label = "")
end

# Add real data
scatter!(p, epi1.time, epi1.obs, label = "Real data", markersize = 4, color = :red)
plot!(p)
  1. decondition(model) removes the observed data from the model
    • Our model was created with sir_model(...) | (; obs = epi1.obs), which conditions on the observed data. decondition() reverses this, telling Turing to sample new observations rather than computing likelihoods against the data.
  2. predict(decondition(model), chain_multi) uses parameters from the posterior chain to simulate new data
    • For each MCMC sample: takes R₀ and D_inf from the chain → solves ODE → samples new obs from Poisson
  3. The result is stored with keys like Symbol("obs[1]"), Symbol("obs[2]"), etc.
    • Because our model uses obs ~ arraydist(Poisson.(lambdas)), Turing names each element of the predicted vector obs[1], obs[2], etc. The brackets mean these names can't be written with the : literal syntax, so we construct them with Symbol("obs[$j]").
    • predictions[Symbol("obs[$j]")][i, 1] then extracts the i-th posterior sample for the j-th observation; the final 1 selects the first chain.

Do the simulated epidemics (blue lines) look similar to the real data (red points)? They should capture the timing, peak size, and overall shape. If they don’t, your model needs improvement - no amount of MCMC will fix a fundamentally wrong model.
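Beyond eyeballing, you can quantify the check with interval coverage. A sketch, assuming the predictive draws have been collected into a draws × observations matrix (e.g. via Array(predictions)):

```julia
# Numeric complement to the visual check (illustrative sketch): the fraction
# of observations inside the central 95% posterior predictive interval.
# `pred` is a draws × observations matrix.
using Statistics

function ppc_coverage(pred::AbstractMatrix, obs::AbstractVector; level = 0.95)
    α = (1 - level) / 2
    inside = count(
        j -> quantile(pred[:, j], α) <= obs[j] <= quantile(pred[:, j], 1 - α),
        eachindex(obs)
    )
    return inside / length(obs)
end

# Toy usage: predictive draws centred on the "data" should cover it
pred = randn(1000, 30) .* 2 .+ 10
obs = fill(10.0, 30)
println("coverage: ", ppc_coverage(pred, obs))
```

Coverage far below the nominal 95% suggests the predictive intervals are too narrow, or the model is mis-specified.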

The complete workflow

  1. Run multiple chains (at least 4)
  2. Check trace plots visually
  3. Calculate R̂ and ESS
  4. Check for divergences (NUTS)
  5. Do posterior predictive checks
  6. Re-run if needed with more samples or different settings

Advanced diagnostics

This section covers refinements to the basic diagnostics above. If you’re short on time, you can skip ahead to the troubleshooting guide — the core diagnostics (R̂, ESS, trace plots) from the previous section are sufficient for most analyses.

Split-R̂

Modern R̂ splits each chain in half for more robust diagnostics:

# This is what rhat() actually computes - split chains for better diagnostics
rhat_split = DataFrame(rhat(chain_multi))
println("Split-R̂ values (recommended):")
for row in eachrow(rhat_split)
    println("  $(row.parameters): $(round(row.rhat, digits = 4))")
end
Split-R̂ values (recommended):
  R_0: 1.0038
  D_inf: 1.004

Bulk and tail ESS

Turing.jl provides different ESS measures:

# Different types of ESS
ess_bulk = DataFrame(ess(chain_multi, kind = :bulk))    # Central part of distribution
ess_tail = DataFrame(ess(chain_multi, kind = :tail))    # Tails of distribution

println("Bulk ESS (central posterior):")
for row in eachrow(ess_bulk)
    println("  $(row.parameters): $(round(row.ess, digits = 0))")
end

println("\nTail ESS (posterior tails):")
for row in eachrow(ess_tail)
    println("  $(row.parameters): $(round(row.ess, digits = 0))")
end
Bulk ESS (central posterior):
  R_0: 1040.0
  D_inf: 1087.0

Tail ESS (posterior tails):
  R_0: 1180.0
  D_inf: 1241.0

Energy diagnostics (for HMC/NUTS)

NUTS uses Hamiltonian dynamics. Energy plots help diagnose geometric problems:

# Energy diagnostic plot (NUTS stores :hamiltonian_energy in the chain internals)
if :hamiltonian_energy in names(chain_good, :internals)
    energy = vec(chain_good[:hamiltonian_energy])
    histogram(
        energy, title = "Energy Distribution", xlabel = "Energy",
        ylabel = "Frequency", alpha = 0.7
    )
end

Troubleshooting guide

  • Poor mixing. Symptoms: trace plots show trends, low ESS. Solutions: use NUTS, tune proposals, longer runs
  • Multimodality. Symptoms: high R̂, different chain means. Solutions: multiple chains, better initialisation
  • Divergences. Symptoms: NUTS warnings, biased sampling. Solutions: increase adapt_delta, reparametrise
  • Slow convergence. Symptoms: high autocorrelation, low ESS. Solutions: better algorithm, longer runs, reparametrise
  • Non-stationarity. Symptoms: trending trace plots. Solutions: longer warmup, check model specification

Practical example: Debugging a problematic model

Let’s create and diagnose a problematic model:

# Poorly specified model with correlations and poor scaling
@model function problematic_model()
    # Poor scaling: one parameter much larger than others
    R_0 ~ Normal(0, 100)     # Wrong scale
    D_inf ~ Normal(0, 0.01)  # Very tight

    # Add some correlation structure to make it harder
    x1 ~ Normal(R_0, 1)
    x2 ~ Normal(R_0, 1)

    # Some fake observations
    for i in 1:10
        0.0 ~ Normal(x1 + x2 - 2*R_0, 0.1)
    end
end
problematic_model (generic function with 2 methods)
# This model will have problems
chain_problematic = sample(problematic_model(), NUTS(), 1000)
# Diagnose it
println("R̂ values:")
display(DataFrame(rhat(chain_problematic)))

println("\nESS values:")
display(DataFrame(ess(chain_problematic)))
R̂ values:

ESS values:
4×2 DataFrame
 Row │ parameters  rhat
     │ Symbol      Float64
─────┼──────────────────────
   1 │ R_0         1.21678
   2 │ D_inf       1.01
   3 │ x1          1.21652
   4 │ x2          1.2173
4×3 DataFrame
 Row │ parameters  ess        ess_per_sec
     │ Symbol      Float64    Float64
─────┼─────────────────────────────────────
   1 │ R_0           4.49209     0.767484
   2 │ D_inf       149.025      25.4612
   3 │ x1            4.49439     0.767878
   4 │ x2            4.48912     0.766977
# Look at the trace plots
plot(chain_problematic)
Exercise: Diagnose and fix the problematic model
  1. Look at the trace plots and quantitative diagnostics above. What specific problems can you identify?
  2. Which parameters show the worst R̂ and ESS values?
  3. Try modifying the problematic_model above to improve its behaviour. Hint: Change the prior scales to be more similar across parameters, or remove the correlation structure.
  4. Run the modified model and compare the diagnostics.

The problems are:

  • Mismatched scales: R_0 ~ Normal(0, 100) is very wide while D_inf ~ Normal(0, 0.01) is extremely tight. The sampler struggles to adapt its step size to work well for both simultaneously.
  • Redundant parameters: x1 and x2 both depend on R_0 and appear only as x1 + x2 - 2*R_0, making them poorly identified — many combinations of x1 and x2 give the same sum.

A fixed version:

@model function fixed_model()
    R_0 ~ Normal(0, 1)       # Moderate scale
    D_inf ~ Normal(0, 1)     # Same scale as R_0

    x1 ~ Normal(R_0, 1)
    x2 ~ Normal(R_0, 1)

    for i in 1:10
        0.0 ~ Normal(x1 + x2 - 2*R_0, 0.1)
    end
end

chain_fixed = sample(fixed_model(), NUTS(), 1000)
plot(chain_fixed)

After fixing the scales, R̂ should be much closer to 1.0 and ESS should increase substantially.

How to fix problematic models:

  1. Reparametrise: Use non-centred parameterisations for difficult geometries
  2. Better priors: Match scales of parameters
  3. Longer adaptation: Increase warmup period
  4. Check model: Maybe the model specification is wrong

How the pieces fit together

After three sessions, you now have a complete workflow for Bayesian inference with deterministic models. In the Introduction, you learned the core concepts: priors encode beliefs before seeing data, likelihoods measure how well parameters explain observations, and Bayes’ theorem combines them into a posterior. In MCMC, you saw how to sample from posteriors that are too complex for grid search - from basic Metropolis-Hastings through adaptive methods (RAM) to gradient-based samplers (NUTS). This session covered diagnostics: trace plots for visual assessment, R̂ for convergence, and ESS for effective sample size.

The workflow: Define model → Run MCMC → Check convergence → Check model fit → Iterate

But MCMC convergence doesn’t mean your model is correct - that’s next in Model Checking.

Summary

MCMC always produces numbers, but those numbers might not represent the posterior. Visual diagnostics (trace plots, autocorrelation) give quick intuition; quantitative diagnostics (R̂, ESS) make assessment rigorous.

R̂ compares within-chain and between-chain variance - values above 1.1 indicate non-convergence. ESS estimates how many independent samples you actually have after accounting for autocorrelation. Even 10,000 samples might give ESS of only 100 if the chain mixes slowly.

Common problems have standard solutions: poor mixing (reparametrise or use NUTS), divergences (increase adapt_delta), multimodality (multiple chains with different starts). Run at least four chains and check diagnostics before trusting any results.

Tip: Learning points
  • Always check convergence - never trust MCMC without diagnostics
  • Multiple chains are essential for robust inference
  • R̂ < 1.1 and ESS > 100 are minimum requirements
  • NUTS is usually better than basic Metropolis-Hastings
  • When in doubt, run longer and use more chains

MCMC only works if you use it correctly. Diagnostics aren’t optional.


Next session: Converged chains don’t guarantee a good model. In Model Checking and Validation, we’ll learn how to check whether your model actually describes the data well.