This session introduces the fundamental concepts of Bayesian inference through simulation with a simple epidemiological model. We will build intuition by manually calculating priors, likelihoods, and posteriors using an SIR epidemic model, before turning to algorithms for doing so more efficiently in the next session.
Objectives
The aim of this first session is to set you up with a framework for model fitting using simulation-based intuition. We will use Distributions.jl for elegant statistical operations and build understanding through hands-on simulation and manual calculations rather than jumping straight into automated inference.
In this session, you will
understand priors, likelihood, and posterior through simulation
calculate posterior distributions through manual grid search
manually explore the posterior by trying different parameter values
understand why we need algorithmic approaches (setting up the next session)
Note: Setup
Source file
The source file of this session is located at sessions/introduction.qmd.
Load libraries
```julia
using Random                  ## for random numbers
using Distributions           ## for probability distributions
using DifferentialEquations   ## for differential equations
using DataFrames              ## for data frames
using CSV                     ## for CSV file reading
using Plots                   ## for plots
using StatsPlots              ## for statistical plots
using DrWatson                ## for project file paths
```
Note: Coming from R?
using X in Julia is equivalent to library(X) in R — it loads a package and makes its exports available.
```r
# R
library(deSolve)
library(ggplot2)
```
Initialisation
We set a random seed for reproducibility. Setting this ensures that you should get exactly the same results on your computer as we do.
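The seed-setting code itself is a one-liner; a minimal sketch (the seed value 123 is an arbitrary choice, any fixed integer works):

```julia
# Fix the random number generator seed so results are reproducible
# (123 is an arbitrary choice; any fixed integer works)
Random.seed!(123)
```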
A deterministic SIR model in Julia
We'll work with a simple deterministic SIR model. The model has two parameters: the basic reproductive number \(R_0\) and the infectious period \(D_\mathrm{inf}\). The model equations are:

\[
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \nu I \\
\frac{dR}{dt} &= \nu I
\end{aligned}
\]

where \(\beta=R_0/D_\mathrm{inf}\) and \(\nu = 1/D_\mathrm{inf}\).
To use this model in Julia we first define a function that represents the ODEs:
```julia
# Define ODE function
function sir_ode!(du, u, p, t)
    S, I, R = u
    R_0, D_inf = p
    N = S + I + R
    β = R_0 / D_inf
    ν = 1 / D_inf
    du[1] = -β * S * I / N          # dS/dt
    du[2] = β * S * I / N - ν * I   # dI/dt
    du[3] = ν * I                   # dR/dt
end
```
Tip: Why is there an exclamation mark in sir_ode!()?
In Julia, the exclamation mark indicates that the function can modify its arguments. In this case, the function modifies du in the last 3 lines to indicate the current change in the system as given by the ODEs.
Note: Coming from R?
In R’s deSolve, ODE functions return list(c(dS, dI, dR)) and take arguments in the order (t, state, parameters). In Julia’s DifferentialEquations.jl, the function modifies du in place (note the ! convention) and the argument order is (du, u, p, t).
```r
# R (deSolve)
sir_ode <- function(t, state, parameters) {
  with(as.list(c(state, parameters)), {
    N <- S + I + R
    beta <- R_0 / D_inf
    nu <- 1 / D_inf
    dS <- -beta * S * I / N
    dI <- beta * S * I / N - nu * I
    dR <- nu * I
    list(c(dS, dI, dR))
  })
}
```
In order to simulate this model, i.e. numerically integrate the differential equations, we define it as an ODEProblem (which tells Julia that the function we defined above represents the dynamics of the system), which we then solve:
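A sketch of this setup, assuming the parameter values R_0 = 2.5 and D_inf = 2.0 (illustrative choices):

```julia
# Initial state: 999 susceptible, 1 infectious, 0 recovered
init_state = [999, 1, 0]

# Start and end time of the simulation
t0, tend = 0, 30

# Parameters of the model: R_0 and D_inf (values here are illustrative)
p = [2.5, 2.0]

# Define the ODE problem and solve it, saving output at integer time steps
prob = ODEProblem(sir_ode!, init_state, (t0, tend), p)
sol = solve(prob, saveat = 1)
```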
Here is a closer look at what each of these commands does:
init_state = [999, 1, 0]
This creates an 1-dimensional Array (i.e., a vector) with three elements that we will later use to represent the initial state of our forward simulation (999 susceptibles, 1 infective and 0 recovered). The output tells us that the elements of the vector are of type Int64, i.e. integer-valued numbers.
t0, tend = 0, 30
This assigns values to the variables t0 (0) and tend (30). It is just a shorter version of writing t0 = 0 and tend = 30 on two separate lines.
prob = ODEProblem(sir_ode!, init_state, (t0, tend), p)

This defines our ODE system. The first argument to ODEProblem() is a function that contains the equations of the ODE, in our case the sir_ode!() function we defined earlier. The second argument is the initial state. The third argument defines the start and end time of the simulation, here given as a tuple indicated by round parentheses, (t0, tend). A tuple is similar to an array (defined with brackets, as we did above for init_state) but more flexible in allowing different data types, like lists in R. Here we could also have used a vector [t0, tend]; either will be correctly interpreted by ODEProblem(). The fourth argument gives the parameters of the model p as a vector.
sol = solve(prob, saveat = 1)
This solves the ODE problem given by prob. The saveat argument tells the solver to save state information at time steps of size 1. If this is not given, the time steps at which information is saved are chosen by the ODE solver. This is often fine, but since we will later compare the solution to observations at integer time steps, we save it at those steps here.
Note: Coming from R?
In R, deSolve::ode() defines and solves the ODE in one call. Julia splits this into two steps: ODEProblem() defines the problem, then solve() runs it.
```r
# R (deSolve): one call does everything
sol <- ode(
  y = c(S = 999, I = 1, R = 0),
  times = seq(0, 30, by = 1),
  func = sir_ode,
  parms = c(R_0 = 2.5, D_inf = 2)
)
```
The sol object now contains information on the system dynamics (contained in sol.u) at different times (contained in sol.t).
We can plot this using
```julia
plot(sol, labels = ["S" "I" "R"])
```
Part 1: Understanding priors
What is a prior?
A prior represents what we believe about parameters before seeing any data. It encodes our background knowledge, expertise, or assumptions about plausible parameter values.
In our SIR model:
\(R_0 \sim \text{Uniform}(1, 20)\): We believe \(R_0\) could be anywhere from 1 to 20, with all values equally likely
\(D_\mathrm{inf} \sim \text{Uniform}(1, 14)\): We believe infectious periods from 1 to 14 days are equally plausible
We can explore what these priors mean through simulation:
```julia
# Define our priors as Distribution objects
R_0_prior = Uniform(1.0, 20.0)
D_inf_prior = Uniform(1.0, 14.0)

# Sample from priors many times
n_samples = 1000
R_0_samples = rand(R_0_prior, n_samples)
D_inf_samples = rand(D_inf_prior, n_samples)

println(
    "R_0 prior samples - mean: ", round(mean(R_0_samples), digits = 2),
    ", range: ", round(minimum(R_0_samples), digits = 2),
    " to ", round(maximum(R_0_samples), digits = 2)
)
println(
    "D_inf prior samples - mean: ", round(mean(D_inf_samples), digits = 2),
    ", range: ", round(minimum(D_inf_samples), digits = 2),
    " to ", round(maximum(D_inf_samples), digits = 2)
)
```
R_0 prior samples - mean: 10.94, range: 1.0 to 20.0
D_inf prior samples - mean: 7.56, range: 1.0 to 13.98
rand(dist, n) - draws n random samples from the distribution dist. You can also use rand(dist) for a single sample.
Note: Coming from R?
R uses function families for each distribution: runif/dunif/punif/qunif for random draws, density, CDF, and quantiles respectively. In Julia, you create a single distribution object and use generic functions:
| Operation | R | Julia |
|---|---|---|
| Create | (implicit) | dist = Uniform(1, 20) |
| Random draw | runif(10, 1, 20) | rand(dist, 10) |
| Density | dunif(x, 1, 20) | pdf(dist, x) |
| Log density | dunif(x, 1, 20, log=TRUE) | logpdf(dist, x) |
| CDF | punif(x, 1, 20) | cdf(dist, x) |
| Quantile | qunif(p, 1, 20) | quantile(dist, p) |
The same pattern works for all distributions: Normal, Poisson, Gamma, etc.
An important question is: what trajectories do our priors represent? We can find out by simulating epidemics using random draws from our priors:
```julia
# Prepare plot where we'll add the individual trajectories
p = plot(
    title = "Prior Predictive Trajectories",
    xlabel = "Time",
    ylabel = "Infected"
)

ntraj = 20
for i in 1:ntraj
    R_0 = R_0_samples[i]
    D_inf = D_inf_samples[i]
    prob = ODEProblem(sir_ode!, init_state, (t0, tend), [R_0, D_inf])
    sol = solve(prob)
    plot!(p, sol, alpha = 0.6, label = "", color = :grey, idxs = (0, 2))
end
plot!(p)
```
Note: How did we tell Julia to plot just the number of infected?
The plot() commands now contain an argument idxs, which allows us to choose what to plot. This uses the plot recipe for differential equations. Index 0 corresponds to the time variable, and any indices >0 to the variables of the differential equations. In our case, we specify 0, 2, that is we plot time (on the x axis) against the second variable, I (on the y axis).
Note the use of plot! (with !) which adds to an existing plot, following the same convention as push! - the ! indicates the function modifies something in place rather than creating a new object.
Note: Coming from R?
plot!() in Julia adds to an existing plot, similar to lines() or points() in base R, or adding layers with + geom_*() in ggplot2.
```r
# Base R
plot(x, y)      # new plot → Julia: plot(x, y)
lines(x2, y2)   # add to it → Julia: plot!(x2, y2)

# ggplot2
ggplot(df, aes(x, y)) +
  geom_line() +
  geom_point()
```
Exercise: Evaluate the prior predictive trajectories
Look at these trajectories. Do they seem reasonable? Are there any that look unrealistic?
Do any grow too fast or too slow?
Are the peak sizes reasonable for a population of 1000?
Do any last too long or too short?
This is prior predictive checking - validating that our priors produce scientifically plausible trajectories.
Choosing different priors
What if we had different beliefs? Here are more informative (narrower) priors:
```julia
# More informative priors based on epidemiological knowledge
R_0_informed = Normal(2.5, 0.8)   # Most respiratory diseases have R_0 around 2-3
D_inf_informed = Gamma(2, 1)      # Infectious period likely 1-4 days for common infections

# Sample and visualise
R_0_informed_samples = rand(R_0_informed, 1000)
D_inf_informed_samples = rand(D_inf_informed, 1000)

p1 = histogram(
    R_0_samples, title = "Flat Prior", xlabel = "R_0",
    alpha = 0.7, label = "Uniform(1, 20)"
)
histogram!(p1, R_0_informed_samples, alpha = 0.7, label = "Normal(2.5, 0.8)")

p2 = histogram(
    D_inf_samples, title = "Flat Prior", xlabel = "D_inf",
    alpha = 0.7, label = "Uniform(1, 14)"
)
histogram!(p2, D_inf_informed_samples, alpha = 0.7, label = "Gamma(2, 1)")

plot(p1, p2, layout = (1, 2))
```
Note: Coming from R?
Watch out for the Gamma parameterisation: Julia’s Gamma(shape, scale) matches R’s scale argument, but many R users default to the rate parameterisation where rate = 1/scale.
```r
# R - these are equivalent:
dgamma(x, shape = 2, scale = 1)
dgamma(x, shape = 2, rate = 1)   # same here since scale = 1

# Julia equivalent:
# Gamma(2, 1) - always uses scale, never rate
```
Be careful when scale ≠ 1: Gamma(2, 3) in Julia corresponds to dgamma(x, 2, scale = 3) or dgamma(x, 2, rate = 1/3) in R.
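One quick way to confirm which parameterisation you have is to check the mean, since a Gamma with shape α and scale θ has mean αθ:

```julia
using Distributions

# Gamma(shape, scale): the mean is shape * scale = 2 * 3 = 6
# (under a rate parameterisation with rate = 3, the mean would be 2/3)
mean(Gamma(2, 3))   # 6.0, confirming that the second argument is the scale
```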
Priors determine what kinds of epidemics our model considers plausible before seeing data.
Exercise: Experiment with priors
Try modifying the priors and re-running the prior predictive simulation above. For example:
What happens if you use R_0_prior = Uniform(1.0, 5.0) instead of Uniform(1.0, 20.0)?
What if you use LogNormal(log(2.5), 0.3) for \(R_0\)? (Hint: this centres the prior around 2.5 but on a log scale)
Can you find a prior that produces only “realistic-looking” epidemic curves?
There is no single right answer — the point is to build intuition for how prior choices affect the range of predictions.
Tip: Principles for choosing priors
Use domain knowledge: What do published studies say about plausible parameter ranges? For infectious diseases, systematic reviews often provide estimates.
Avoid impossible values: If a parameter must be positive, use a distribution with positive support (Gamma, LogNormal) rather than truncating a Normal.
Check with prior predictive simulation: Do samples from your priors generate plausible epidemic trajectories? If not, your priors may be too vague or misspecified.
Weakly informative is often better than flat: Completely flat priors can cause computational problems and may place substantial probability on implausible values.
Document your choices: Record why you chose each prior - this helps with reproducibility and peer review.
Evaluating prior densities
So far we’ve been sampling from our priors to understand what they represent. But for Bayesian inference, we also need to evaluate how probable specific parameter values are under our priors.
We define some specific parameter combinations to explore throughout our analysis:
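The exact combinations are not critical; as an illustration (these particular values are assumptions, chosen to span low and high values of both parameters), a vector of named tuples like:

```julia
# Hand-picked parameter combinations to evaluate
# (illustrative values; each entry is a named tuple)
test_params = [
    (R_0 = 1.5, D_inf = 2.0),
    (R_0 = 2.5, D_inf = 2.0),
    (R_0 = 2.5, D_inf = 5.0),
    (R_0 = 4.0, D_inf = 2.0),
]
```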
logpdf(dist, x) - returns the log probability density of x under distribution dist. We use log probabilities because they’re numerically more stable (multiplying tiny probabilities can underflow).
Normal(μ, σ) - creates a normal distribution with mean μ and standard deviation σ.
Gamma(α, θ) - creates a gamma distribution with shape α and scale θ. Useful for positive parameters like durations.
push!(array, value) - appends value to the end of array. The ! indicates it modifies the array in place rather than returning a modified copy (as, for example, array = vcat(array, [value]) would).
Note: Coming from R?
logpdf(dist, x) in Julia is equivalent to calling a density function with log = TRUE in R. For example:
| Julia | R |
|---|---|
| logpdf(Normal(2.5, 0.8), x) | dnorm(x, 2.5, 0.8, log = TRUE) |
| logpdf(Gamma(2, 1), x) | dgamma(x, 2, scale = 1, log = TRUE) |
| logpdf(Poisson(10), x) | dpois(x, 10, log = TRUE) |
The key difference is that in Julia you first create the distribution object, then pass it to logpdf. In R, you pass the parameters directly to the density function each time.
Part 2: Understanding the likelihood
What is a likelihood?
The likelihood quantifies how probable our observed data is, given specific parameter values. It asks: “If the parameters were X, how likely would we be to see the data we actually observed?”
The prior predictive trajectories we looked at earlier were purely a function of the model and its specification. The likelihood, on the other hand, is about comparing the model to data. We start by loading some observed epidemic data (simulated from the model, but in principle this could be any data) representing the number of infectious individuals at different time points:
```julia
# Load epidemic data from CSV
epi1 = CSV.read(datadir("epi1_synthetic.csv"), DataFrame)
println("Data overview:")
first(epi1, 8)
```
Data overview:
```
8×2 DataFrame
 Row │ time   obs
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      3
   3 │     3      4
   4 │     4     11
   5 │     5     19
   6 │     6     38
   7 │     7     67
   8 │     8    112
```
Tip: What is datadir()?
datadir() comes from the DrWatson.jl package we loaded in the setup. It returns the path to the data/ folder in the project root, so datadir("epi1_synthetic.csv") expands to something like /path/to/mfiidd.julia/data/epi1_synthetic.csv. This keeps file paths portable across different machines.
We first visualise this data set.
```julia
# Visualise the observed data
scatter(
    epi1.time, epi1.obs,
    xlabel = "Time", ylabel = "Observed Cases",
    title = "Epidemic Data", label = "Observed",
    markersize = 4, color = :red
)
```
This data was generated from an SIR model with \(R_0 ≈ 2.5\) and \(D_\mathrm{inf} = 2\) days. Can we figure this out by exploring the likelihood?
Exploring likelihood through simulation
The likelihood asks: “How probable is our observed data, given specific parameter values?” We explore this by trying the parameter combinations from before:
```julia
# Calculate and store likelihood values for our test parameters
log_likelihoods = []

# Generate plot
p = plot(title = "Model Fit Comparison", xlabel = "Time", ylabel = "Cases")

# For each parameter set, calculate log-likelihood and plot
for theta in test_params
    R_0 = theta.R_0
    D_inf = theta.D_inf
    # Simulate from the model
    prob = ODEProblem(sir_ode!, init_state, (t0, tend), [R_0, D_inf])
    sol = solve(prob, saveat = epi1.time)
    # I compartment is second compartment
    I = sol[2, :]
    # assuming Poisson log-likelihood at each time point
    loglik = [Poisson(It) for It in I]
    # calculate total log-likelihood as sum of log-likelihoods at each time point
    total_loglik = sum(logpdf.(loglik, epi1.obs))
    # store in vector
    push!(log_likelihoods, total_loglik)
    println(
        "R_0 = $(R_0), D_inf = $(D_inf) → ",
        "log-likelihood = $(round(total_loglik, digits = 1))"
    )
    plot!(
        p, sol, idxs = (0, 2),
        label = "R_0 = $(theta.R_0), D_inf = $(theta.D_inf)",
        linewidth = 2
    )
end
scatter!(
    p, epi1.time, epi1.obs,
    label = "Observed", markersize = 4,
    color = :black, markershape = :circle
)
plot!(p)
```
sol[2, :] - extracts the second row (the I compartment) from the ODE solution matrix. The : means “all columns” (all time points).
[ Poisson(It) for It in I ] - an array comprehension that creates a Poisson distribution for each expected value It in the vector I. If you know R, this is like lapply(I, function(It) ...) or purrr::map(I, ~ ...) — it applies an operation to each element and collects the results.
Poisson(λ) - creates a Poisson distribution with rate parameter λ. Used here as our observation model.
logpdf.(loglik, epi1.obs) - the . broadcasts the logpdf function element-wise, calculating log-likelihood for each observation against its corresponding Poisson distribution.
Note: Coming from R?
R vectorises most functions implicitly; Julia requires an explicit . for element-wise operations (broadcasting).
```r
# R - automatically vectorises over obs and lambda
dpois(obs, lambda = expected, log = TRUE)
```

```julia
# Julia - need the dot for element-wise operation
logpdf.(Poisson.(expected), obs)
```
Forgetting the . on a vector operation is a common gotcha for R users.
Exercise: Predict the best fit
Before looking at the output above, consider the parameter combinations being tested. Based on your understanding of SIR dynamics and the data plot:
Which value of \(R_0\) do you expect to give the best fit? Why?
How would changing \(D_\mathrm{inf}\) affect the epidemic curve?
Now check the log-likelihood values. Did the results match your intuition? Is the visual best fit also the one with the highest log-likelihood?
The observation model
Our likelihood makes specific assumptions about how observations relate to the underlying epidemic:
Poisson noise: Each observation is drawn from a Poisson distribution centred on the model’s predicted prevalence. This implies the variance equals the mean - when we expect 100 cases, we expect variation of about ±10.
Conditional independence: Given the deterministic trajectory, observations at different time points are independent. The observation error at time \(t\) doesn’t affect the error at time \(t+1\).
The total log-likelihood is the sum across all time points: if the model predicts well at most times but poorly at one, that one bad fit drags down the total.
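The mean-variance property of the Poisson distribution quoted above can be checked directly:

```julia
using Distributions

# For a Poisson distribution the variance equals the mean,
# so at an expected count of 100 the standard deviation is sqrt(100) = 10
d = Poisson(100)
var(d)   # 100.0
std(d)   # 10.0
```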
Exercise: When do these assumptions break down?
Think about when these assumptions might be violated:
When might variance not equal the mean? (Hint: consider overdispersion, or very small/large counts)
When might observations not be independent? (Hint: consider reporting delays, weekend effects, or outbreak investigations)
What other aspects of the observation process might you want to model? (Hint: under-reporting, delays between infection and observation)
A helper function
We’ll be calculating likelihoods repeatedly. Rather than copy-pasting the ODE simulation and Poisson likelihood code from above each time, let’s extract it into a reusable function:
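A sketch of such a function, assuming the signature and return value described in the code explainer below (the body simply reuses the simulation and Poisson likelihood code from above):

```julia
# Simulate the SIR model for given parameters and return both the
# infectious trajectory and the Poisson log-likelihood of the data
# (a sketch; the original implementation may differ in details)
function simulate_and_loglik(R_0, D_inf, data;
                             init_state = [999, 1, 0], tspan = (0, 30))
    prob = ODEProblem(sir_ode!, init_state, tspan, [R_0, D_inf])
    sol = solve(prob, saveat = data.time)
    I = sol[2, :]   # infectious compartment
    loglik = sum(logpdf.(Poisson.(I), data.obs))
    return (I = I, loglik = loglik)
end
```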
This function simulates the SIR model and returns both the trajectory and the log-likelihood. We’ll use it throughout the rest of this session.
Tip: Code explainer (keyword arguments and named tuples)
The semicolon ; in the function signature separates positional arguments (R_0, D_inf, data) from keyword arguments (init_state, tspan). Keyword arguments have defaults and must be passed by name: simulate_and_loglik(2.5, 2.0, epi1, init_state = [499, 1, 0]).
The function returns a named tuple (I = I, loglik = loglik), which you access with dot syntax: result.I or result.loglik. Named tuples are lightweight and immutable - think of them as a simple way to return multiple values with meaningful names.
Note: Coming from R?
Keyword arguments with defaults work like R function arguments:
```r
# R
simulate_and_loglik <- function(R_0, D_inf, data,
                                init_state = c(999, 1, 0),
                                tspan = c(0, 30)) {
  # ...
  list(I = I, loglik = loglik)   # named list
}
result$I   # access with $
```
Julia’s named tuples are similar to R’s named lists but lighter-weight. The main difference is the ; to separate keyword from positional arguments — R doesn’t make this distinction.
Important: Take a break here
You’ve completed Parts 1 and 2: priors and likelihoods. Part 3 combines them into the posterior and covers grid search.
If you’re working through this in one sitting, stop here for 10-15 minutes. The concepts in Part 3 build directly on what you’ve learned, and a short break will help consolidate your understanding.
Part 3: Understanding the posterior
What is a posterior?
The posterior combines our prior beliefs with the evidence from the data using Bayes' theorem:

\[
p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta) \, p(\theta)}{p(\text{data})} \propto p(\text{data} \mid \theta) \times p(\theta)
\]
When we have strong prior beliefs that align with the data, the posterior gets a boost. When priors conflict with data, they pull the posterior away from the likelihood.
Systematic posterior exploration
Now that we understand the components of Bayesian inference, we can systematically explore the posterior by trying many parameter combinations and calculating their log-posterior values manually.
Grid search to find the best parameters
We calculate log-posterior values for a grid of parameter combinations:
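A sketch of the grid search, assuming the informative priors from earlier and the simulate_and_loglik helper (the exact grid ranges here are assumptions; the R_0 range follows the code explainer below):

```julia
# Grid search: evaluate the log-posterior over a grid of parameter values
# (grid ranges are illustrative assumptions)
results = []
best_log_post = -Inf
best_params = nothing

for R_0 in 1.5:0.2:4.0, D_inf in 1.0:0.2:3.0
    # log-prior under the informative priors defined earlier
    log_prior = logpdf(R_0_informed, R_0) + logpdf(D_inf_informed, D_inf)
    # log-likelihood from the helper function
    result = simulate_and_loglik(R_0, D_inf, epi1)
    log_posterior = log_prior + result.loglik
    push!(results, (R_0 = R_0, D_inf = D_inf, log_posterior = log_posterior))
    # track the best combination found so far
    if log_posterior > best_log_post
        best_log_post = log_posterior
        best_params = (R_0 = R_0, D_inf = D_inf)
    end
end

println("Best parameters: R_0 = $(best_params.R_0), D_inf = $(best_params.D_inf)")
println("Best log-posterior: $(round(best_log_post, digits = 1))")
```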
Best parameters: R_0 = 2.3, D_inf = 2.0
Best log-posterior: -98.5
Tip: Code explainer
1.5:0.2:4.0 - creates a range from 1.5 to 4.0 in steps of 0.2. This gives us 14 values for R_0.
The nested for loops try every combination of R_0 and D_inf values - that’s 14 × 11 = 154 combinations.
For each combination, we calculate the log-posterior (prior + likelihood) and track the best one found so far.
Note: Coming from R?
Julia’s start:step:stop syntax is equivalent to seq(start, stop, by = step) in R. Note that R’s : operator (e.g. 1:10) only increments by 1, whereas Julia’s : supports any step size.
```r
# R
seq(1.5, 4.0, by = 0.2)
```

```julia
# Julia
1.5:0.2:4.0
```
Is this the result you expected?
Apart from finding the best parameters we can also use our results to visualise the whole posterior surface.
```julia
# Visualise the posterior surface
R_0_grid = [r.R_0 for r in results]
D_inf_grid = [r.D_inf for r in results]
log_post_grid = [r.log_posterior for r in results]

# Reshape for heatmap
R_0_unique = unique(R_0_grid)
D_inf_unique = unique(D_inf_grid)
post_matrix = reshape(log_post_grid, length(R_0_unique), length(D_inf_unique))

heatmap(
    D_inf_unique, R_0_unique, post_matrix,
    xlabel = "D_inf (days)", ylabel = "R_0",
    title = "Log-Posterior Surface", color = :viridis
)
```
What do you observe about the posterior surface?
Posterior predictive checking
Finding the best parameters is only half the job - we also need to check whether the model actually fits the data well. We do this by simulating data from the fitted model and comparing it to our observations.
```julia
# Get the trajectory for the best parameters
best_result = simulate_and_loglik(best_params.R_0, best_params.D_inf, epi1)

# Plot: deterministic trajectory + observation noise
p = plot(title = "Best Fit: Observation Noise Only", xlabel = "Time", ylabel = "Cases")

# Multiple realisations with Poisson observation noise (same parameters each time)
for i in 1:10
    pred_data = rand.(Poisson.(best_result.I))
    plot!(p, epi1.time, pred_data, alpha = 0.3, color = :blue, label = "")
end

# Deterministic trajectory
plot!(
    p, epi1.time, best_result.I, linewidth = 2, color = :red,
    label = "Model (R_0=$(best_params.R_0), D_inf=$(best_params.D_inf))"
)
scatter!(p, epi1.time, epi1.obs, label = "Observed", markersize = 4, color = :black)
```
The blue lines show variation from observation noise only - the underlying trajectory is identical each time.
Impact of parameter uncertainty
But we don’t know the true parameters - we have a distribution of plausible values. What happens when we also vary the parameters?
```julia
# Find parameter combinations in the top 10% of posterior density
threshold = quantile(log_post_grid, 0.9)
good_results = filter(r -> r.log_posterior > threshold, results)
println("$(length(good_results)) parameter combinations in top 10% of posterior")

# Plot trajectories from different "good" parameter values
p = plot(title = "Parameter + Observation Uncertainty", xlabel = "Time", ylabel = "Cases")
for r in good_results[1:min(15, end)]
    result = simulate_and_loglik(r.R_0, r.D_inf, epi1)
    pred_data = rand.(Poisson.(result.I))   # Add observation noise too
    plot!(p, epi1.time, pred_data, alpha = 0.3, color = :blue, label = "")
end
scatter!(p, epi1.time, epi1.obs, label = "Observed", markersize = 4, color = :black)
```
15 parameter combinations in top 10% of posterior
Tip: What's the difference?
In the first plot, we fixed the parameters at their best values and only varied the observation noise. In the second plot, we vary both the parameters (sampling from the posterior) and the observation noise. The second plot shows more variation because it accounts for our uncertainty about the true parameter values.
Exercise: Posterior predictive checking
Does the real data look consistent with our simulated datasets? This is posterior predictive checking - a key model validation technique that we’ll explore in depth in the model checking session.
Why we need algorithmic approaches
We have seen how being able to simulate a model, in combination with prior densities and observation probabilities (expressed in the likelihood) enables us to fit this model to data and develop an understanding of parametric uncertainty. But we have done so using grid search, which has serious limitations:
```julia
# The curse of dimensionality
n_params = [2, 3, 4, 5, 6, 10]
n_values_per_param = 15   # Reasonable resolution

println("Grid search scaling problems:")
for n in n_params
    total_evals = n_values_per_param^n
    println("$(n) parameters: $(total_evals) evaluations")
end
```
This shows that the computational cost of grid search, i.e. the number of simulations we need to run, grows exponentially with the number of parameters and reaches infeasibly large numbers even at moderate levels of model complexity.
In addition, this kind of sampling is wasteful, because most parameter combinations will be far from the ones that lead to a good fit to the data.
Lastly, it is difficult to establish whether the grid is fine enough (or important parameter regions might be in the “gaps”), and whether the limits of the grid are wide enough.
Summary
Priors express our beliefs about parameters before seeing data. Through prior predictive simulation, different prior choices lead to different predicted epidemic trajectories - this helps validate whether assumptions are reasonable.
The likelihood quantifies how well parameters explain observed data. Our Poisson observation model assumes each data point is an independent draw around the model’s predicted prevalence - assumptions worth questioning for real applications.
Bayes’ theorem combines prior and likelihood to give the posterior - updated beliefs after seeing data. Grid search can explore this manually, but becomes computationally infeasible as parameters increase. Posterior predictive checking validates that fitted models generate realistic data. This scalability limitation motivates the algorithmic approaches in the next session.
Tip: Learning points
Priors express beliefs before seeing data - explore them through simulation (prior predictive checking)
Likelihood quantifies how well parameters explain observed data, given an observation model
Posterior combines prior and likelihood via Bayes’ theorem: \(p(\theta|\text{data}) \propto p(\text{data}|\theta) \times p(\theta)\)
Observation models encode assumptions (e.g., Poisson noise, conditional independence) - think critically about when they’re violated
Grid search can explore posteriors but doesn’t scale beyond a few parameters
Posterior predictive checking validates whether fitted models generate realistic data
Next session: In MCMC sampling, we’ll learn how Markov Chain Monte Carlo algorithms solve these problems by automatically exploring the posterior distribution. We’ll start with simple algorithms you can code yourself, then see how Turing.jl automates everything with state-of-the-art methods.