Introduction to model fitting in Julia

Estimated time: 2-2.5 hours

Lecture slides

Introduction

This session introduces the fundamental concepts of Bayesian inference through simulation with a simple epidemiological model. We will build intuition by manually calculating priors, likelihoods, and posteriors using an SIR epidemic model, before turning to algorithms for doing so more efficiently in the next session.

Objectives

The aim of this first session is to set you up with a framework for model fitting using simulation-based intuition. We will use Distributions.jl for elegant statistical operations and build understanding through hands-on simulation and manual calculations rather than jumping straight into automated inference.

In this session, you will

  1. understand priors, likelihood, and posterior through simulation
  2. calculate posterior distributions through manual grid search
  3. manually explore the posterior by trying different parameter values
  4. understand why we need algorithmic approaches (setting up the next session)

Source file

The source file of this session is located at sessions/introduction.qmd.

Load libraries

using Random ## for random numbers
using Distributions ## for probability distributions
using DifferentialEquations ## for differential equations
using DataFrames ## for data frames
using CSV ## for CSV file reading
using Plots ## for plots
using StatsPlots ## for statistical plots
using DrWatson

using X in Julia is equivalent to library(X) in R — it loads a package and makes its exports available.

# R
library(deSolve)
library(ggplot2)

Initialisation

We set a random seed for reproducibility. This ensures that you get exactly the same results on your computer as we show here.
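
The text above doesn't show the call itself; a minimal sketch (the seed value 42 is an arbitrary choice for illustration, not necessarily the one used to generate the outputs below):

```julia
using Random

Random.seed!(42)  # 42 is an arbitrary example seed; any fixed integer works
rand(3)           # returns the same three numbers every time for a given seed
```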

A deterministic SIR model in Julia

We’ll work with a simple deterministic SIR model. The model has two parameters: the basic reproductive number \(R_0\) and the infectious period \(D_\mathrm{inf}\). The model equations are:

\[ \begin{cases} \begin{aligned} \frac{\mathrm{d}S}{\mathrm{d}t} &= - \beta S \frac{I}{N}\\ \frac{\mathrm{d}I}{\mathrm{d}t} &= \beta S \frac{I}{N} - \nu I\\ \frac{\mathrm{d}R}{\mathrm{d}t} &= \nu I\\ \end{aligned} \end{cases} \]

where \(\beta=R_0/D_\mathrm{inf}\) and \(\nu = 1/D_\mathrm{inf}\).

To use this model in Julia we first define a function that represents the ODEs:

# Define ODE function
function sir_ode!(du, u, p, t)
    S, I, R = u
    R_0, D_inf = p
    N = S + I + R
        
    β = R_0 / D_inf
    ν = 1 / D_inf

    du[1] = -β * S * I / N  # dS/dt
    du[2] = β * S * I / N - ν * I  # dI/dt
    du[3] = ν * I  # dR/dt
end

In Julia, an exclamation mark at the end of a function name is a naming convention indicating that the function modifies (at least one of) its arguments. In this case, the function modifies du in the last three lines to indicate the current change in the system as given by the ODEs.
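
The same convention appears throughout Julia's standard library; a small illustration:

```julia
v = [3, 1, 2]
w = sort(v)   # returns a new sorted vector; v is unchanged
sort!(v)      # sorts v in place
v == w        # now true: both are [1, 2, 3]
```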

In R’s deSolve, ODE functions return list(c(dS, dI, dR)) and take arguments in the order (t, state, parameters). In Julia’s DifferentialEquations.jl, the function modifies du in place (note the ! convention) and the argument order is (du, u, p, t).

# R (deSolve)
sir_ode <- function(t, state, parameters) {
  with(as.list(c(state, parameters)), {
    N <- S + I + R
    beta <- R_0 / D_inf
    nu <- 1 / D_inf
    dS <- -beta * S * I / N
    dI <- beta * S * I / N - nu * I
    dR <- nu * I
    list(c(dS, dI, dR))
  })
}

In order to simulate this model, i.e. numerically integrate the differential equations, we define it as an ODEProblem (which tells Julia that the function we defined above represents the dynamics of the system), which we then solve:

## define initial state
init_state = [999, 1, 0]
## define start and end times
t0, tend = (0, 30)
## define parameters: R_0 = 2.5, D_inf = 2
p = [2.5, 2]
# Define ODE problem
prob = ODEProblem(sir_ode!, init_state, (t0, tend), p)
# Solve ODE
sol = solve(prob, saveat = 1)

Here is a closer look at what each of these commands does:

init_state = [999, 1, 0]

This creates a 1-dimensional Array (i.e., a vector) with three elements that we will later use to represent the initial state of our forward simulation (999 susceptible, 1 infectious and 0 recovered). The output tells us that the elements of the vector are of type Int64, i.e. integer-valued numbers.

t0, tend = (0, 30)

This assigns values to the variables t0 (0) and tend (30). It is just a shorter version of writing

t0 = 0
tend = 30
prob = ODEProblem(sir_ode!, init_state, (t0, tend), p)

This defines our ODE system. The first argument to ODEProblem() is a function that contains the equations of the ODE, in our case the sir_ode!() function we defined earlier. The second argument is the initial state. The third argument defines the start and end time of the simulation, here given as a tuple indicated by round parentheses, (t0, tend). A tuple is similar to an array (defined with brackets, as we did above in defining init_state) but more flexible in allowing different data types, like lists in R. Here we could also have used a vector [t0, tend]; either will be correctly interpreted by ODEProblem(). The fourth argument gives the parameters of the model p as a vector.

sol = solve(prob, saveat = 1)

This solves the ODE problem given by prob. The saveat argument tells the solver to save state information at time steps of size 1. If this is not given, the time steps at which information is saved are chosen by the ODE solver. This is often fine, but as we will later want to compare the solution to observations at integer time steps, here we save information at exactly those steps.

In R, deSolve::ode() defines and solves the ODE in one call. Julia splits this into two steps: ODEProblem() defines the problem, then solve() runs it.

# R (deSolve) — one call does everything
sol <- ode(
  y = c(S = 999, I = 1, R = 0),
  times = seq(0, 30, by = 1),
  func = sir_ode,
  parms = c(R_0 = 2.5, D_inf = 2)
)

The sol object now contains information on the system dynamics (contained in sol.u) at different times (contained in sol.t).

We can plot this using

plot(sol, labels=["S" "I" "R"])

Part 1: Understanding priors

What is a prior?

A prior represents what we believe about parameters before seeing any data. It encodes our background knowledge, expertise, or assumptions about plausible parameter values.

In our SIR model:

  • \(R_0 \sim \text{Uniform}(1, 20)\): We believe \(R_0\) could be anywhere from 1 to 20, with all values equally likely
  • \(D_\mathrm{inf} \sim \text{Uniform}(1, 14)\): We believe infectious periods from 1 to 14 days are equally plausible

We can explore what these priors mean through simulation:

# Define our priors as Distribution objects
R_0_prior = Uniform(1.0, 20.0)
D_inf_prior = Uniform(1.0, 14.0)

# Sample from priors many times
n_samples = 1000
R_0_samples = rand(R_0_prior, n_samples)
D_inf_samples = rand(D_inf_prior, n_samples)

println(
    "R_0 prior samples - mean: ", round(mean(R_0_samples), digits = 2), 
    ", range: ", round(minimum(R_0_samples), digits = 2), " to ",
    round(maximum(R_0_samples), digits = 2)
)
println(
    "D_inf prior samples - mean: ", round(mean(D_inf_samples), digits = 2),
    ", range: ", round(minimum(D_inf_samples), digits = 2), " to ",
    round(maximum(D_inf_samples), digits = 2)
)
R_0 prior samples - mean: 10.94, range: 1.0 to 20.0
D_inf prior samples - mean: 7.56, range: 1.0 to 13.98
  • Uniform(a, b) - creates a uniform distribution object from a to b. This is from Distributions.jl.
  • rand(dist, n) - draws n random samples from the distribution dist. You can also use rand(dist) for a single sample.

R uses function families for each distribution: runif/dunif/punif/qunif for random draws, density, CDF, and quantiles respectively. In Julia, you create a single distribution object and use generic functions:

Operation     R                           Julia
Create        (implicit)                  dist = Uniform(1, 20)
Random draw   runif(10, 1, 20)            rand(dist, 10)
Density       dunif(x, 1, 20)             pdf(dist, x)
Log density   dunif(x, 1, 20, log=TRUE)   logpdf(dist, x)
CDF           punif(x, 1, 20)             cdf(dist, x)
Quantile      qunif(p, 1, 20)             quantile(dist, p)

The same pattern works for all distributions: Normal, Poisson, Gamma, etc.
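
For example, the same generic functions apply unchanged to a Normal distribution (a small sketch):

```julia
using Distributions

d = Normal(0.0, 1.0)   # standard normal
pdf(d, 0.0)            # density at 0, ≈ 0.3989
cdf(d, 0.0)            # 0.5 by symmetry
quantile(d, 0.5)       # median, 0.0
mean(d), std(d)        # (0.0, 1.0)
```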

# Visualise our priors
p1 = histogram(
    R_0_samples, title = "R_0 Prior", xlabel = "R_0", ylabel = "Frequency",
    alpha = 0.7, label = "Uniform(1, 20)"
)
p2 = histogram(
    D_inf_samples, title = "D_inf Prior", xlabel = "D_inf (days)",
    ylabel = "Frequency", alpha = 0.7, label = "Uniform(1, 14)"
)
plot(p1, p2, layout = (1,2))

Prior predictive simulation

An important question is: what trajectories do our priors represent? We can find out by simulating epidemics using random draws from our priors:

# Prepare plot where we'll add the individual trajectories
p = plot(
    title = "Prior Predictive Trajectories",
    xlabel = "Time", ylabel = "Infected"
)

ntraj = 20

for i in 1:ntraj
    R_0 = R_0_samples[i]
    D_inf = D_inf_samples[i]

    prob = ODEProblem(
        sir_ode!, init_state, (t0, tend), [R_0, D_inf]
    )
    sol = solve(prob)
    plot!(p, sol, alpha = 0.6, label = "", color = :grey, idxs = (0, 2))
end

plot!(p)

The plot!() calls now contain an argument idxs, which allows us to choose what to plot. This uses the plot recipe for differential equation solutions. Index 0 corresponds to the time variable, and any indices > 0 to the variables of the differential equations. In our case, we specify (0, 2), that is, we plot time (on the x axis) against the second variable, I (on the y axis).

Note the use of plot! (with !) which adds to an existing plot, following the same convention as push! - the ! indicates the function modifies something in place rather than creating a new object.

plot!() in Julia adds to an existing plot, similar to lines() or points() in base R, or adding layers with + geom_*() in ggplot2.

# Base R
plot(x, y)       # new plot    → Julia: plot(x, y)
lines(x2, y2)    # add to it   → Julia: plot!(x2, y2)

# ggplot2
ggplot(df, aes(x, y)) + geom_line() + geom_point()
Exercise: Evaluate the prior predictive trajectories

Look at these trajectories. Do they seem reasonable? Are there any that look unrealistic?

  • Do any grow too fast or too slow?
  • Are the peak sizes reasonable for a population of 1000?
  • Do any last too long or too short?

This is prior predictive checking - validating that our priors produce scientifically plausible trajectories.
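
To make such checks quantitative, we can summarise the prior predictive draws numerically. A self-contained sketch (it redefines the model so it runs on its own; the seed, the number of draws and the 500-case threshold are arbitrary illustrative choices):

```julia
using Distributions, DifferentialEquations, Random

# Same SIR model as above, redefined so this block is self-contained
function sir_ode!(du, u, p, t)
    S, I, R = u
    R_0, D_inf = p
    N = S + I + R
    β = R_0 / D_inf
    ν = 1 / D_inf
    du[1] = -β * S * I / N
    du[2] = β * S * I / N - ν * I
    du[3] = ν * I
end

Random.seed!(1)  # arbitrary seed for this sketch
peaks = map(1:50) do _
    params = [rand(Uniform(1, 20)), rand(Uniform(1, 14))]
    prob = ODEProblem(sir_ode!, [999.0, 1.0, 0.0], (0.0, 30.0), params)
    sol = solve(prob, saveat = 1)
    maximum(sol[2, :])  # peak number of infected over the simulation
end

# e.g. the fraction of prior draws whose peak exceeds half the population
mean(peaks .> 500)
```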

Choosing different priors

What if we had different beliefs? Here are more informative (narrower) priors:

# More informative priors based on epidemiological knowledge
R_0_informed = Normal(2.5, 0.8)  # Most respiratory diseases have R_0 around 2-3
D_inf_informed = Gamma(2, 1)     # Infectious period likely 1-4 days for common infections

# Sample and visualise
R_0_informed_samples = rand(R_0_informed, 1000)
D_inf_informed_samples = rand(D_inf_informed, 1000)

p1 = histogram(
    R_0_samples, title = "R_0 Priors", xlabel = "R_0", alpha = 0.7,
    label = "Uniform(1, 20)"
)
histogram!(p1, R_0_informed_samples, alpha = 0.7, label = "Normal(2.5, 0.8)")

p2 = histogram(
    D_inf_samples, title = "D_inf Priors", xlabel = "D_inf", alpha = 0.7,
    label = "Uniform(1, 14)"
)
histogram!(p2, D_inf_informed_samples, alpha = 0.7, label = "Gamma(2, 1)")

plot(p1, p2, layout = (1,2))

Watch out for the Gamma parameterisation: Julia’s Gamma(shape, scale) matches R’s scale argument, but many R users default to the rate parameterisation where rate = 1/scale.

# R — these are equivalent:
dgamma(x, shape = 2, scale = 1)
dgamma(x, shape = 2, rate = 1)   # same here since scale = 1

# Julia equivalent:
# Gamma(2, 1)  — always uses scale, never rate

Be careful when scale ≠ 1: Gamma(2, 3) in Julia corresponds to dgamma(x, 2, scale = 3) or dgamma(x, 2, rate = 1/3) in R.
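
You can verify which parameterisation you have via the mean, which is shape × scale (a quick sketch):

```julia
using Distributions

g = Gamma(2, 3)           # shape = 2, scale = 3
mean(g)                   # shape × scale = 6.0
# to match R's dgamma(x, shape = 2, rate = 4), convert rate to scale:
g_from_rate = Gamma(2, 1 / 4)
mean(g_from_rate)         # 2 × 0.25 = 0.5
```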

# Compare epidemic predictions
p = plot(
    title = "Prior Predictive: Flat vs Informed",
    xlabel = "Time", ylabel = "Infected"
)

# Flat priors (gray)
for i in 1:10
    R_0 = R_0_samples[i]
    D_inf = D_inf_samples[i]

    prob = ODEProblem(
        sir_ode!, init_state, (t0, tend), [R_0, D_inf]
    )
    sol = solve(prob)
    plot!(p, sol, alpha = 0.6, label = "", color = :gray, idxs = (0, 2))
end

# Informed priors (blue)
for i in 1:10
    R_0 = R_0_informed_samples[i]
    D_inf = D_inf_informed_samples[i]

    prob = ODEProblem(
        sir_ode!, init_state, (t0, tend), [R_0, D_inf]
    )
    sol = solve(prob)
    plot!(p, sol, alpha = 0.6, label = "", color = :blue, idxs = (0, 2))
end

plot!(p)

Priors determine what kinds of epidemics our model considers plausible before seeing data.

Exercise: Experiment with priors

Try modifying the priors and re-running the prior predictive simulation above. For example:

  1. What happens if you use R_0_prior = Uniform(1.0, 5.0) instead of Uniform(1.0, 20.0)?
  2. What if you use LogNormal(log(2.5), 0.3) for \(R_0\)? (Hint: this centres the prior around 2.5 but on a log scale)
  3. Can you find a prior that produces only “realistic-looking” epidemic curves?

There is no single right answer — the point is to build intuition for how prior choices affect the range of predictions.

Tip: Principles for choosing priors
  1. Use domain knowledge: What do published studies say about plausible parameter ranges? For infectious diseases, systematic reviews often provide estimates.
  2. Avoid impossible values: If a parameter must be positive, use a distribution with positive support (Gamma, LogNormal) rather than truncating a Normal.
  3. Check with prior predictive simulation: Do samples from your priors generate plausible epidemic trajectories? If not, your priors may be too vague or misspecified.
  4. Weakly informative is often better than flat: Completely flat priors can cause computational problems and may place substantial probability on implausible values.
  5. Document your choices: Record why you chose each prior - this helps with reproducibility and peer review.

Evaluating prior densities

So far we’ve been sampling from our priors to understand what they represent. But for Bayesian inference, we also need to evaluate how probable specific parameter values are under our priors.

We define some specific parameter combinations to explore throughout our analysis:

# Parameter combinations we'll explore throughout our analysis
test_params = [
    (R_0 = 1.1, D_inf = 2.0),
    (R_0 = 1.5, D_inf = 2.0),
    (R_0 = 2.5, D_inf = 2.0),
    (R_0 = 4.0, D_inf = 2.0),
    (R_0 = 2.5, D_inf = 1.0),
    (R_0 = 2.5, D_inf = 4.0)
]

# Calculate prior densities for both flat and informed priors
log_priors_flat = []
log_priors_informed = []

println("Flat vs Informed prior log-densities:")
for theta in test_params
    # Flat priors
    flat_prior = logpdf(Uniform(1, 20), theta.R_0) + logpdf(Uniform(1, 14), theta.D_inf)
    push!(log_priors_flat, flat_prior)

    # Informed priors
    informed_prior = logpdf(Normal(2.5, 0.8), theta.R_0) + logpdf(Gamma(2, 1), theta.D_inf)
    push!(log_priors_informed, informed_prior)

    println(
        "R_0 = $(theta.R_0), D_inf = $(theta.D_inf): ",
        "Flat = $(round(flat_prior, digits = 1)), ",
        "Informed = $(round(informed_prior, digits = 1))"
    )
end
Flat vs Informed prior log-densities:
R_0 = 1.1, D_inf = 2.0: Flat = -5.5, Informed = -3.5
R_0 = 1.5, D_inf = 2.0: Flat = -5.5, Informed = -2.8
R_0 = 2.5, D_inf = 2.0: Flat = -5.5, Informed = -2.0
R_0 = 4.0, D_inf = 2.0: Flat = -5.5, Informed = -3.8
R_0 = 2.5, D_inf = 1.0: Flat = -5.5, Informed = -1.7
R_0 = 2.5, D_inf = 4.0: Flat = -5.5, Informed = -3.3
  • logpdf(dist, x) - returns the log probability density of x under distribution dist. We use log probabilities because they’re numerically more stable (multiplying tiny probabilities can underflow).
  • Normal(μ, σ) - creates a normal distribution with mean μ and standard deviation σ.
  • Gamma(α, θ) - creates a gamma distribution with shape α and scale θ. Useful for positive parameters like durations.
  • push!(array, value) - appends value to the end of array. The ! indicates it modifies the array in place rather than returning a new array (unlike R's c(array, value), which creates a copy).
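
The numerical-stability point is easy to demonstrate: multiplying many small densities underflows to zero, while summing log-densities stays finite (a small sketch):

```julia
using Distributions

d = Normal(0, 1)
x = fill(10.0, 50)    # 50 observations far in the tail
prod(pdf.(d, x))      # product of densities underflows to 0.0
sum(logpdf.(d, x))    # sum of log-densities is finite
```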

logpdf(dist, x) in Julia is equivalent to calling a density function with log = TRUE in R. For example:

Julia                           R
logpdf(Normal(2.5, 0.8), x)     dnorm(x, 2.5, 0.8, log = TRUE)
logpdf(Gamma(2, 1), x)          dgamma(x, 2, scale = 1, log = TRUE)
logpdf(Poisson(10), x)          dpois(x, 10, log = TRUE)

The key difference is that in Julia you first create the distribution object, then pass it to logpdf. In R, you pass the parameters directly to the density function each time.

Part 2: Understanding the likelihood

What is a likelihood?

The likelihood quantifies how probable our observed data is, given specific parameter values. It asks: “If the parameters were X, how likely would we be to see the data we actually observed?”

The prior predictive trajectories we looked at earlier were purely a function of the model and its specification. The likelihood, on the other hand, is about comparing the model to data. We start by loading some observed epidemic data (simulated from the model, but in principle this could be any data) representing the number of infectious individuals at different time points:

# Load epidemic data from CSV
epi1 = CSV.read(datadir("epi1_synthetic.csv"), DataFrame)

println("Data overview:")
first(epi1, 8)
Data overview:
8×2 DataFrame
 Row │ time   obs
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      3
   3 │     3      4
   4 │     4     11
   5 │     5     19
   6 │     6     38
   7 │     7     67
   8 │     8    112

datadir() comes from the DrWatson.jl package we loaded in the setup. It returns the path to the data/ folder in the project root, so datadir("epi1_synthetic.csv") expands to something like /path/to/mfiidd.julia/data/epi1_synthetic.csv. This keeps file paths portable across different machines.

We first visualise this data set.

# Visualise the observed data
scatter(
    epi1.time, epi1.obs, xlabel = "Time", ylabel = "Observed Cases", 
    title = "Epidemic Data", label = "Observed", markersize = 4, color = :red
)

This data was generated from an SIR model with \(R_0 \approx 2.5\) and \(D_\mathrm{inf} = 2\) days. Can we figure this out by exploring the likelihood?

Exploring likelihood through simulation

The likelihood asks: “How probable is our observed data, given specific parameter values?” We explore this by trying the parameter combinations from before:

# Calculate and store likelihood values for our test parameters
log_likelihoods = []

# Generate plot
p = plot(title = "Model Fit Comparison", xlabel = "Time", ylabel = "Cases")

# For each parameter set, calculate log-likelihood and plot
for theta in test_params

    R_0 = theta.R_0
    D_inf = theta.D_inf

    # Simulate from the model
    prob = ODEProblem(
        sir_ode!, init_state, (t0, tend), [R_0, D_inf]
    )
    sol = solve(prob, saveat = epi1.time)

    # I compartment is second compartment
    I = sol[2, :]
    # Poisson observation distribution centred on the model prediction at each time point
    loglik = [ Poisson(It) for It in I ]
    # calculate total loglikelihood as sum of log likelihoods at each time point
    total_loglik = sum(logpdf.(loglik, epi1.obs))
    # store in vector
    push!(log_likelihoods, total_loglik)

    println(
        "R_0 = $(R_0), D_inf = $(D_inf) → ",
        "log-likelihood = $(round(total_loglik, digits = 1))"
    )
    plot!(
        p, sol, idxs = (0, 2),
        label = "R_0 = $(theta.R_0), D_inf = $(theta.D_inf)", linewidth = 2
    )
end

scatter!(
    p, epi1.time, epi1.obs, label = "Observed", markersize = 4, color = :black,
    markershape = :circle
)
plot!(p)
R_0 = 1.1, D_inf = 2.0 → log-likelihood = -5763.6
R_0 = 1.5, D_inf = 2.0 → log-likelihood = -2853.9
R_0 = 2.5, D_inf = 2.0 → log-likelihood = -160.9
R_0 = 4.0, D_inf = 2.0 → log-likelihood = -2456.1
R_0 = 2.5, D_inf = 1.0 → log-likelihood = -5495.3
R_0 = 2.5, D_inf = 4.0 → log-likelihood = -2821.4
  • sol[2, :] - extracts the second row (the I compartment) from the ODE solution matrix. The : means “all columns” (all time points).
  • [ Poisson(It) for It in I ] - an array comprehension that creates a Poisson distribution for each expected value It in the vector I. If you know R, this is like lapply(I, function(It) ...) or purrr::map(I, ~ ...) — it applies an operation to each element and collects the results.
  • Poisson(λ) - creates a Poisson distribution with rate parameter λ. Used here as our observation model.
  • logpdf.(loglik, epi1.obs) - the . broadcasts the logpdf function element-wise, calculating log-likelihood for each observation against its corresponding Poisson distribution.

R vectorises most functions implicitly; Julia requires an explicit . for element-wise operations (broadcasting).

# R — automatically vectorises over obs and lambda
dpois(obs, lambda = expected, log = TRUE)
# Julia — need the dot for element-wise operation
logpdf.(Poisson.(expected), obs)

Forgetting the . on a vector operation is a common gotcha for R users.
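
The dot applies to any function, not just logpdf (a small sketch):

```julia
x = [1.0, 4.0, 9.0]
sqrt.(x)        # element-wise: [1.0, 2.0, 3.0]
# sqrt(x)       # MethodError: sqrt is not defined element-wise for vectors
x .+ 1          # operators broadcast with a leading dot: [2.0, 5.0, 10.0]
```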

Exercise: Predict the best fit

Before looking at the output above, consider the parameter combinations being tested. Based on your understanding of SIR dynamics and the data plot:

  • Which value of \(R_0\) do you expect to give the best fit? Why?
  • How would changing \(D_\mathrm{inf}\) affect the epidemic curve?

Now check the log-likelihood values. Did the results match your intuition? Is the visual best fit also the one with the highest log-likelihood?

The observation model

Our likelihood makes specific assumptions about how observations relate to the underlying epidemic:

  1. Poisson noise: Each observation is drawn from a Poisson distribution centred on the model’s predicted prevalence. This implies the variance equals the mean - when we expect 100 cases, we expect variation of about ±10.

  2. Conditional independence: Given the deterministic trajectory, observations at different time points are independent. The observation error at time \(t\) doesn’t affect the error at time \(t+1\).

The total log-likelihood is the sum across all time points: if the model predicts well at most times but poorly at one, that one bad fit drags down the total.
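
A small numeric sketch of this point, with hypothetical predictions and observations:

```julia
using Distributions

# three well-predicted time points and one badly predicted one
predicted = [10.0, 20.0, 30.0, 30.0]
observed  = [11,   19,   31,   90]

terms = logpdf.(Poisson.(predicted), observed)
# the first three terms are each around -2 to -3;
# the fourth is below -40 and dominates the total
sum(terms)
```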

Exercise: When do these assumptions break down?

Think about when these assumptions might be violated:

  • When might variance not equal the mean? (Hint: consider overdispersion, or very small/large counts)
  • When might observations not be independent? (Hint: consider reporting delays, weekend effects, or outbreak investigations)
  • What other aspects of the observation process might you want to model? (Hint: under-reporting, delays between infection and observation)

A helper function

We’ll be calculating likelihoods repeatedly. Rather than copy-pasting the ODE simulation and Poisson likelihood code from above each time, let’s extract it into a reusable function:

function simulate_and_loglik(R_0, D_inf, data; init_state = [999, 1, 0], tspan = (0, 30))
    # Simulate ODE
    prob = ODEProblem(sir_ode!, init_state, tspan, [R_0, D_inf])
    sol = solve(prob, saveat = data.time)
    I = sol[2, :]

    # Calculate Poisson log-likelihood
    loglik = sum(logpdf.(Poisson.(I), data.obs))

    return (I = I, loglik = loglik)
end

This function simulates the SIR model and returns both the trajectory and the log-likelihood. We’ll use it throughout the rest of this session.

The semicolon ; in the function signature separates positional arguments (R_0, D_inf, data) from keyword arguments (init_state, tspan). Keyword arguments have defaults and must be passed by name: simulate_and_loglik(2.5, 2.0, epi1, init_state = [499, 1, 0]).

The function returns a named tuple (I = I, loglik = loglik), which you access with dot syntax: result.I or result.loglik. Named tuples are lightweight and immutable — think of them as a simple way to return multiple values with meaningful names.
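
A quick sketch of named-tuple behaviour:

```julia
nt = (I = [1, 2, 3], loglik = -3.5)
nt.loglik           # access by name: -3.5
keys(nt)            # (:I, :loglik)
# nt.loglik = 0.0   # error: named tuples are immutable
```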

Keyword arguments with defaults work like R function arguments:

# R
simulate_and_loglik <- function(R_0, D_inf, data,
                                init_state = c(999, 1, 0),
                                tspan = c(0, 30)) {
  # ...
  list(I = I, loglik = loglik)  # named list
}
result$I  # access with $

Julia’s named tuples are similar to R’s named lists but lighter-weight. The main difference is the ; to separate keyword from positional arguments — R doesn’t make this distinction.

Take a break here

You’ve completed Parts 1 and 2: priors and likelihoods. Part 3 combines them into the posterior and covers grid search.

If you’re working through this in one sitting, stop here for 10-15 minutes. The concepts in Part 3 build directly on what you’ve learned, and a short break will help consolidate your understanding.

Part 3: Understanding the posterior

What is a posterior?

The posterior combines our prior beliefs with the data evidence using Bayes’ theorem:

\[p(\theta|\text{data}) \propto p(\text{data}|\theta) \times p(\theta)\]

  • \(p(\theta|\text{data})\): Posterior - what we believe about parameters after seeing data
  • \(p(\text{data}|\theta)\): Likelihood - how well parameters explain the data
  • \(p(\theta)\): Prior - what we believed before seeing data

In log space: \(\log p(\theta|\text{data}) = \log p(\text{data}|\theta) + \log p(\theta) + \text{constant}\)

We can combine our stored prior and likelihood values to calculate the posterior:

# Posterior = Prior + Likelihood (using our stored values)
println("Posterior breakdown using flat priors:")
for (i, theta) in enumerate(test_params)
    log_post_flat = log_priors_flat[i] + log_likelihoods[i]

    println("R_0=$(theta.R_0), D_inf=$(theta.D_inf)")
    println("  Prior: $(round(log_priors_flat[i], digits=1)), Likelihood: $(round(log_likelihoods[i], digits=1)), Posterior: $(round(log_post_flat, digits=1))")
end
Posterior breakdown using flat priors:
R_0=1.1, D_inf=2.0
  Prior: -5.5, Likelihood: -5763.6, Posterior: -5769.1
R_0=1.5, D_inf=2.0
  Prior: -5.5, Likelihood: -2853.9, Posterior: -2859.4
R_0=2.5, D_inf=2.0
  Prior: -5.5, Likelihood: -160.9, Posterior: -166.4
R_0=4.0, D_inf=2.0
  Prior: -5.5, Likelihood: -2456.1, Posterior: -2461.6
R_0=2.5, D_inf=1.0
  Prior: -5.5, Likelihood: -5495.3, Posterior: -5500.8
R_0=2.5, D_inf=4.0
  Prior: -5.5, Likelihood: -2821.4, Posterior: -2826.9
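
The unknown constant drops out when comparing parameter sets, and over a finite set of candidates it can be recovered by normalisation. A sketch using the log-posterior values printed above, with the usual max-subtraction trick to avoid underflow:

```julia
# log-posteriors for the six test parameter sets (from the output above)
log_post = [-5769.1, -2859.4, -166.4, -2461.6, -5500.8, -2826.9]

# exp-normalise: subtract the maximum before exponentiating
w = exp.(log_post .- maximum(log_post))
posterior = w ./ sum(w)   # sums to 1; essentially all mass on the third set
```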

How priors influence the posterior

The following compares flat vs informed priors for all our test parameters:

# Compare flat vs informative priors using our stored values
println("Flat vs Informed posterior comparison:")
for (i, theta) in enumerate(test_params)
    # Calculate posteriors using stored values
    log_post_flat = log_priors_flat[i] + log_likelihoods[i]
    log_post_informed = log_priors_informed[i] + log_likelihoods[i]

    println("R_0=$(theta.R_0), D_inf=$(theta.D_inf)")
    println("  Flat posterior: $(round(log_post_flat, digits=1)), Informed posterior: $(round(log_post_informed, digits=1))")
    println("  Difference: $(round(log_post_informed - log_post_flat, digits=1))")
    println()
end
Flat vs Informed posterior comparison:
R_0=1.1, D_inf=2.0
  Flat posterior: -5769.1, Informed posterior: -5767.1
  Difference: 2.0

R_0=1.5, D_inf=2.0
  Flat posterior: -2859.4, Informed posterior: -2856.7
  Difference: 2.7

R_0=2.5, D_inf=2.0
  Flat posterior: -166.4, Informed posterior: -162.9
  Difference: 3.5

R_0=4.0, D_inf=2.0
  Flat posterior: -2461.6, Informed posterior: -2459.8
  Difference: 1.7

R_0=2.5, D_inf=1.0
  Flat posterior: -5500.8, Informed posterior: -5497.0
  Difference: 3.8

R_0=2.5, D_inf=4.0
  Flat posterior: -2826.9, Informed posterior: -2824.7
  Difference: 2.2

When we have strong prior beliefs that align with the data, the posterior gets a boost. When priors conflict with data, they pull the posterior away from the likelihood.

Systematic posterior exploration

Now that we understand the components of Bayesian inference, we can systematically explore the posterior by trying many parameter combinations and calculating their log-posterior values manually.

Grid search to find the best parameters

We calculate log-posterior values for a grid of parameter combinations:

# Define grid of parameter values to try
R_0_values = 1.5:0.2:4.0
D_inf_values = 1.0:0.2:3.0

# Store results
best_log_posterior = -Inf
best_params = nothing
results = []

for R_0 in R_0_values
    for D_inf in D_inf_values
        # Prior
        log_prior = logpdf(Uniform(1, 20), R_0) + logpdf(Uniform(1, 14), D_inf)

        # Likelihood (using our helper)
        result = simulate_and_loglik(R_0, D_inf, epi1)

        # Posterior = prior × likelihood
        log_post = log_prior + result.loglik
        push!(results, (R_0 = R_0, D_inf = D_inf, log_posterior = log_post))

        if log_post > best_log_posterior
            best_log_posterior = log_post
            best_params = (R_0 = R_0, D_inf = D_inf)
        end
    end
end

println("Best parameters: R_0 = $(best_params.R_0), D_inf = $(best_params.D_inf)")
println("Best log-posterior: $(round(best_log_posterior, digits = 1))")
Best parameters: R_0 = 2.3, D_inf = 2.0
Best log-posterior: -98.5
  • 1.5:0.2:4.0 - creates a range from 1.5 to 4.0 in steps of 0.2. This gives us 13 values for R_0 (the last value is 3.9, as the next step would overshoot 4.0).
  • The nested for loops try every combination of R_0 and D_inf values - that’s 13 × 11 = 143 combinations.
  • For each combination, we calculate the log-posterior (prior + likelihood) and track the best one found so far.

Julia’s start:step:stop syntax is equivalent to seq(start, stop, by = step) in R. Note that R’s : operator (e.g. 1:10) only increments by 1, whereas Julia’s : supports any step size.

# R
seq(1.5, 4.0, by = 0.2)
# Julia
1.5:0.2:4.0
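
Ranges are lazy; collect materialises them, and length confirms the number of grid points:

```julia
r = 1.5:0.2:4.0
length(r)          # 13
first(r), last(r)  # (1.5, 3.9)
collect(r)[1:3]    # [1.5, 1.7, 1.9]
```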

Is this the result you expected?

Apart from finding the best parameters we can also use our results to visualise the whole posterior surface.

# Visualise the posterior surface
R_0_grid = [r.R_0 for r in results]
D_inf_grid = [r.D_inf for r in results]
log_post_grid = [r.log_posterior for r in results]

# Reshape for heatmap: the inner loop varied D_inf fastest, so D_inf is the
# first (fastest-varying) dimension; transpose to get an R_0 × D_inf matrix
R_0_unique = unique(R_0_grid)
D_inf_unique = unique(D_inf_grid)
post_matrix = permutedims(reshape(log_post_grid, length(D_inf_unique), length(R_0_unique)))

heatmap(
    D_inf_unique, R_0_unique, post_matrix,
    xlabel = "D_inf (days)", ylabel = "R_0",
    title = "Log-Posterior Surface", color = :viridis
)
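
Note that reshape fills arrays column-major (first dimension varying fastest), which is why the dimension order matters when reshaping grid results (a small sketch):

```julia
v = 1:6
reshape(v, 2, 3)   # 2×3: columns are [1,2], [3,4], [5,6]
reshape(v, 3, 2)   # 3×2: columns are [1,2,3], [4,5,6]
```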

What do you observe about the posterior surface?

Posterior predictive checking

Finding the best parameters is only half the job - we also need to check whether the model actually fits the data well. We do this by simulating data from the fitted model and comparing it to our observations.

# Get the trajectory for the best parameters
best_result = simulate_and_loglik(best_params.R_0, best_params.D_inf, epi1)

# Plot: deterministic trajectory + observation noise
p = plot(title = "Best Fit: Observation Noise Only", xlabel = "Time", ylabel = "Cases")

# Multiple realisations with Poisson observation noise (same parameters each time)
for i in 1:10
    pred_data = rand.(Poisson.(best_result.I))
    plot!(p, epi1.time, pred_data, alpha = 0.3, color = :blue, label = "")
end

# Deterministic trajectory
plot!(p, epi1.time, best_result.I, linewidth = 2, color = :red,
      label = "Model (R_0=$(best_params.R_0), D_inf=$(best_params.D_inf))")
scatter!(p, epi1.time, epi1.obs, label = "Observed", markersize = 4, color = :black)

The blue lines show variation from observation noise only - the underlying trajectory is identical each time.

Impact of parameter uncertainty

But we don’t know the true parameters - we have a distribution of plausible values. What happens when we also vary the parameters?

# Find parameter combinations in the top 10% of posterior density
threshold = quantile(log_post_grid, 0.9)
good_results = filter(r -> r.log_posterior > threshold, results)
println("$(length(good_results)) parameter combinations in top 10% of posterior")

# Plot trajectories from different "good" parameter values
p = plot(title = "Parameter + Observation Uncertainty", xlabel = "Time", ylabel = "Cases")

for r in good_results[1:min(15, end)]
    result = simulate_and_loglik(r.R_0, r.D_inf, epi1)
    pred_data = rand.(Poisson.(result.I))  # Add observation noise too
    plot!(p, epi1.time, pred_data, alpha = 0.3, color = :blue, label = "")
end

scatter!(p, epi1.time, epi1.obs, label = "Observed", markersize = 4, color = :black)
15 parameter combinations in top 10% of posterior

In the first plot, we fixed the parameters at their best values and only varied the observation noise. In the second plot, we vary both the parameters (sampling from the posterior) and the observation noise. The second plot shows more variation because it accounts for our uncertainty about the true parameter values.
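A common alternative to plotting individual noisy trajectories is to summarise many simulated datasets into pointwise quantile bands. A sketch with a toy `sims` matrix, one simulated dataset per row, standing in for the `pred_data` vectors generated in the loop above:

```julia
using Statistics

## One simulated dataset per row (toy numbers for illustration)
sims = [8 25 70; 12 30 90; 10 22 85; 9 28 75]
lower  = [quantile(sims[:, t], 0.025) for t in 1:size(sims, 2)]
middle = [quantile(sims[:, t], 0.5)   for t in 1:size(sims, 2)]
upper  = [quantile(sims[:, t], 0.975) for t in 1:size(sims, 2)]
```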

Exercise: Posterior predictive checking

Does the real data look consistent with our simulated datasets? This is posterior predictive checking - a key model validation technique that we’ll explore in depth in the model checking session.
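One simple way to start this exercise is to compare a summary statistic of the observed data (for example, the epidemic peak) with the same statistic computed on each simulated dataset. A sketch with toy stand-ins for the observed and simulated data:

```julia
using Statistics

obs  = [3, 12, 45, 30, 8]              ## toy stand-in for epi1.obs
sims = [rand(0:60, 5) for _ in 1:200]  ## toy stand-ins for simulated datasets
## Posterior predictive p-value: how often does the simulated peak
## reach or exceed the observed peak?
p_value = mean(maximum.(sims) .>= maximum(obs))
```

Values very close to 0 or 1 would suggest the model systematically under- or over-shoots the observed peak.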

Why we need algorithmic approaches

We have seen how the ability to simulate a model, combined with prior densities and observation probabilities (expressed in the likelihood), enables us to fit the model to data and develop an understanding of parametric uncertainty. But we have done so using grid search, which has serious limitations:

# The curse of dimensionality
n_params = [2, 3, 4, 5, 6, 10]
n_values_per_param = 15  # Reasonable resolution

println("Grid search scaling problems:")
for n in n_params
    total_evals = n_values_per_param^n
    println("$(n) parameters: $(total_evals) evaluations")
end
Grid search scaling problems:
2 parameters: 225 evaluations
3 parameters: 3375 evaluations
4 parameters: 50625 evaluations
5 parameters: 759375 evaluations
6 parameters: 11390625 evaluations
10 parameters: 576650390625 evaluations

This shows that the computational cost of grid search, i.e. the number of simulations we need to run, grows to very large numbers even at moderate model complexity.

In addition, this kind of sampling is wasteful, because most parameter combinations will be far from the ones that lead to a good fit to the data.
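This waste can be quantified. On a toy one-dimensional grid with a Gaussian-shaped posterior (illustrative numbers only), only a small fraction of grid points carries almost all of the posterior mass:

```julia
## Toy log-posterior over 1000 grid points (Gaussian-shaped)
logp = [-0.01 * (i - 500)^2 for i in 1:1000]
p = exp.(logp .- maximum(logp))
p ./= sum(p)                                ## normalise to probabilities
sorted = sort(p, rev = true)
n99 = findfirst(>=(0.99), cumsum(sorted))   ## points holding 99% of the mass
println("$(n99) of $(length(p)) grid points carry 99% of the posterior mass")
```

Every evaluation outside that small set is a simulation run that contributes almost nothing to our understanding of the posterior.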

Lastly, it is difficult to establish whether the grid is fine enough (important parameter regions might fall in the “gaps” between grid points), and whether the limits of the grid are wide enough.

Summary

Priors express our beliefs about parameters before seeing data. Through prior predictive simulation, different prior choices lead to different predicted epidemic trajectories - this helps validate whether assumptions are reasonable.
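Concretely, priors can be expressed as distributions and explored by sampling; the prior choices below are illustrative, not necessarily those used earlier in the session:

```julia
using Distributions

## Illustrative priors, restricted to positive values via truncation
R_0_prior   = truncated(Normal(2.0, 1.0), 0.0, Inf)
D_inf_prior = truncated(Normal(5.0, 2.0), 0.0, Inf)
## Each draw could be fed into the simulator for prior predictive checking
R_0_draws = rand(R_0_prior, 1000)
```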

The likelihood quantifies how well parameters explain observed data. Our Poisson observation model assumes each data point is an independent draw around the model’s predicted prevalence - assumptions worth questioning for real applications.
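Under this observation model, the log-likelihood is simply a sum of Poisson log-probabilities, one per data point. A sketch with toy values in place of the model output and observations:

```julia
using Distributions

model_I = [4.0, 15.0, 60.0, 35.0, 9.0]  ## toy model-predicted prevalence
obs     = [3, 12, 45, 30, 8]            ## toy observed case counts
## Conditional independence means the log-likelihood sums over data points
loglik = sum(logpdf.(Poisson.(model_I), obs))
```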

Bayes’ theorem combines prior and likelihood to give the posterior - updated beliefs after seeing data. Grid search can explore this manually, but becomes computationally infeasible as the number of parameters increases - this scalability limitation motivates the algorithmic approaches in the next session. Posterior predictive checking validates that fitted models generate realistic data.

Learning points
  • Priors express beliefs before seeing data - explore them through simulation (prior predictive checking)
  • Likelihood quantifies how well parameters explain observed data, given an observation model
  • Posterior combines prior and likelihood via Bayes’ theorem: \(p(\theta|\text{data}) \propto p(\text{data}|\theta) \times p(\theta)\)
  • Observation models encode assumptions (e.g., Poisson noise, conditional independence) - think critically about when they’re violated
  • Grid search can explore posteriors but doesn’t scale beyond a few parameters
  • Posterior predictive checking validates whether fitted models generate realistic data
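The learning points above can be condensed into a one-parameter toy example: estimating a Poisson rate λ on a grid, with an illustrative discretised prior (all numbers here are made up for illustration):

```julia
using Distributions

obs    = [4, 6, 5, 7]                         ## toy observed counts
λ_grid = 0.1:0.1:15.0                         ## grid over the Poisson rate λ
prior  = pdf.(Exponential(5.0), λ_grid)       ## illustrative prior, discretised
like   = [prod(pdf.(Poisson(λ), obs)) for λ in λ_grid]
post   = prior .* like ./ sum(prior .* like)  ## Bayes' theorem, normalised
λ_map  = λ_grid[argmax(post)]                 ## posterior mode on the grid
```

The posterior mode lands near the sample mean of the observations, pulled slightly towards smaller values by the decreasing prior.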


Next session: In MCMC sampling, we’ll learn how Markov Chain Monte Carlo algorithms solve these problems by automatically exploring the posterior distribution. We’ll start with simple algorithms you can code yourself, then see how Turing.jl automates everything with state-of-the-art methods.