vignettes/mishra-case-study.Rmd
mishra-case-study.RmdThis vignette demonstrates how to replicate the analysis from Mishra et al. (2020) of COVID-19 transmission dynamics in South Korea using EpiAwareR. This case study showcases the compositional modelling approach by building a complete epidemiological model from three reusable components.
The analysis estimates the time-varying reproduction number () using:
We use the actual South Korea COVID-19 data from the preprint, which
is included in the package. The data comes from the
covidregionaldata R package and contains daily confirmed
cases from January to July 2020.
# Load South Korea COVID-19 data
data_path <- system.file("extdata", "south_korea_data.csv", package = "EpiAwareR")
south_korea <- read.csv(data_path)
south_korea$date <- as.Date(south_korea$date)
# Prepare full dataset - tspan will select the window
# Column must be named 'y_t' for EpiAware
full_data <- data.frame(
date = south_korea$date,
y_t = south_korea$cases_new
)
# Preview the training window (days 45-80)
# This corresponds to Feb 13 - Mar 19, 2020 (the main epidemic wave)
cat("Training window (tspan 45-80):\n")
#> Training window (tspan 45-80):
print(full_data[45:80, ])
#> date y_t
#> 45 2020-02-13 0
#> 46 2020-02-14 0
#> 47 2020-02-15 0
#> 48 2020-02-16 1
#> 49 2020-02-17 1
#> 50 2020-02-18 1
#> 51 2020-02-19 15
#> 52 2020-02-20 34
#> 53 2020-02-21 75
#> 54 2020-02-22 190
#> 55 2020-02-23 256
#> 56 2020-02-24 161
#> 57 2020-02-25 130
#> 58 2020-02-26 254
#> 59 2020-02-27 449
#> 60 2020-02-28 427
#> 61 2020-02-29 909
#> 62 2020-03-01 595
#> 63 2020-03-02 686
#> 64 2020-03-03 600
#> 65 2020-03-04 516
#> 66 2020-03-05 438
#> 67 2020-03-06 518
#> 68 2020-03-07 483
#> 69 2020-03-08 367
#> 70 2020-03-09 248
#> 71 2020-03-10 131
#> 72 2020-03-11 242
#> 73 2020-03-12 114
#> 74 2020-03-13 110
#> 75 2020-03-14 107
#> 76 2020-03-15 76
#> 77 2020-03-16 74
#> 78 2020-03-17 84
#> 79 2020-03-18 93
#> 80 2020-03-19 152
# Plot the training data
plot(full_data$date[45:80], full_data$y_t[45:80], type = "b",
xlab = "Date", ylab = "Daily Cases",
main = "South Korea COVID-19 Cases (Feb 13 - Mar 19, 2020)")
The AR(2) process models the evolution of log over time with temporal autocorrelation:
where .
ar2 <- AR(
order = 2,
damp_priors = list(
truncnorm(0.1, 0.05, 0, 1), # ρ₁: Second-order coefficient
truncnorm(0.8, 0.05, 0, 1) # ρ₂: First-order (strong autocorrelation)
),
init_priors = list(
norm(-1.0, 0.1), # Initial value for lag 1
norm(-1.0, 0.5) # Initial value for lag 2
),
std_prior = halfnorm(0.5) # σ: Innovation standard deviation
)
#> Julia version 1.11.9 at location /opt/hostedtoolcache/julia/1.11.9/x64/bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
#> EpiAware Julia backend loaded successfully
print(ar2)
#> <EpiAware AR(2) Latent Model>
#> Damping priors: 2
#> Init priors: 2
#> Innovation std prior: specifiedPrior choices (from Mishra et al. 2020):
The renewal equation models new infections based on past infections and generation time:
where is the discretized generation time distribution.
renewal <- Renewal(
gen_distribution = gamma_dist(6.5, 0.62), # Shape and scale parameters
initialisation_prior = norm(log(1.0), 1.0) # Prior on initial log infections (wide)
)
print(renewal)
#> <EpiAware Renewal Infection Model>
#> Generation distribution: Gamma
#> Initialisation prior: specifiedKey parameters:
Links latent infections to observed case counts with overdispersion:
where controls overdispersion (clustering).
# The preprint fixes cluster_factor to 0.25
# We approximate this with a tight truncated normal prior
negbin <- NegativeBinomialError(
cluster_factor_prior = truncnorm(0.25, 0.01, 0, 1)
)
print(negbin)
#> <EpiAware Negative Binomial Observation Model>
#> Cluster factor prior: truncated(Normal(mean, sd), lower, upper)Parameterization:
model <- EpiProblem(
epi_model = renewal,
latent_model = ar2,
observation_model = negbin,
tspan = c(45, 80) # Exact tspan from preprint (Feb 13 - Mar 19, 2020)
)
print(model)
#> <EpiAware Epidemiological Model>
#> Time span: 45 to 80
#> Components:
#> - Infection model: epiaware_renewal
#> - Latent model: epiaware_ar
#> - Observation model: epiaware_negbinThe EpiProblem combines components into a joint Bayesian
model with automatic validation. The tspan = c(45, 80)
selects days 45-80 from the full dataset, matching the preprint
exactly.
We use NUTS (No-U-Turn Sampler) with Pathfinder initialization for posterior inference:
results <- fit(
model = model,
data = full_data, # Pass full dataset; tspan selects the window
method = nuts_sampler(
warmup = 1000, # Adaptation iterations
draws = 2000, # Posterior samples per chain (matches preprint)
chains = 4 # Independent chains for convergence assessment
)
)
#> Generating Turing.jl model...
#> Running NUTS sampling...
#> Chains: 4
#> Warmup: 1000
#> Draws: 2000
#> Running Pathfinder initialization...
#> Pathfinder init failed, using default initialization...
#> Processing results...Note: Fitting typically takes 2-5 minutes on modern hardware.
print(results)
#> <EpiAware Model Fit>
#>
#> Model:
#> Time span: 45 to 80
#> Infection model: epiaware_renewal
#> Latent model: epiaware_ar
#> Observation model: epiaware_negbin
#>
#> Sampling:
#> Method: NUTS
#> Chains: 4
#> Draws: 2000 (per chain)
#>
#> Convergence:
#> Max Rhat: 1.069
#> Min ESS (bulk): 74
#> Warning: Some parameters have ESS < 100
#>
#> Use summary() for parameter estimates
#> Use plot() to visualize resultsCheck for:
# Detailed posterior summaries
summary(results)
#> # A tibble: 53 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 latent.… -0.992 -0.991 0.102 0.102 -1.16 -0.822 1.000 9314. 4989.
#> 2 latent.… -0.660 -0.658 0.476 0.469 -1.45 0.118 1.00 6998. 5282.
#> 3 latent.… 0.0842 0.0832 0.0401 0.0426 0.0208 0.152 1.00 4611. 2548.
#> 4 latent.… 0.808 0.808 0.0433 0.0435 0.738 0.880 1.00 5215. 3668.
#> 5 latent.… 0.556 0.548 0.0883 0.0864 0.425 0.710 1.00 3412. 4832.
#> 6 latent.… 0.970 0.978 0.889 0.888 -0.501 2.44 1.00 7866. 5508.
#> 7 latent.… 1.38 1.39 0.888 0.887 -0.100 2.82 1.000 8127. 5890.
#> 8 latent.… 1.23 1.23 0.900 0.886 -0.238 2.74 1.00 8660. 6117.
#> 9 latent.… 1.18 1.18 0.892 0.908 -0.294 2.65 1.00 9224. 5562.
#> 10 latent.… 2.17 2.17 0.824 0.816 0.810 3.53 1.00 8364. 5814.
#> # ℹ 43 more rowsKey parameters to examine:
epiaware_fit objects integrate directly with the posterior package. You can
convert the samples to any draws format for use with bayesplot, loo, or
other packages:
library(posterior)
#> This is posterior version 1.6.1
#>
#> Attaching package: 'posterior'
#> The following objects are masked from 'package:stats':
#>
#> mad, sd, var
#> The following objects are masked from 'package:base':
#>
#> %in%, match
# Convert to draws_df for tidy workflows
draws <- as_draws_df(results)
head(draws)
#> # A draws_df: 6 iterations, 1 chains, and 53 variables
#> latent.ar_init.1. latent.ar_init.2. latent.damp_AR.1. latent.damp_AR.2.
#> 1 -1.11 -0.17 0.114 0.74
#> 2 -1.00 -0.19 0.109 0.78
#> 3 -1.12 -0.63 0.077 0.74
#> 4 -1.03 -0.70 0.105 0.85
#> 5 -1.01 -0.62 0.070 0.83
#> 6 -0.81 -0.72 0.087 0.85
#> latent.std latent.ϵ_t.1. latent.ϵ_t.2. latent.ϵ_t.3.
#> 1 0.64 1.028 1.19 1.65
#> 2 0.53 1.311 2.02 0.35
#> 3 0.53 1.131 1.94 0.61
#> 4 0.64 -0.093 1.86 1.04
#> 5 0.48 0.248 0.90 1.44
#> 6 0.53 0.323 0.21 2.20
#> # ... with 45 more variables
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}
# Summarise specific parameters
subset_draws(draws,
variable = c("latent.damp_AR.1.", "latent.damp_AR.2.", "latent.std")
) |>
summarise_draws()
#> # A tibble: 3 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 latent.damp_… 0.0842 0.0832 0.0401 0.0426 0.0208 0.152 1.00 4611. 2548.
#> 2 latent.damp_… 0.808 0.808 0.0433 0.0435 0.738 0.880 1.00 5215. 3668.
#> 3 latent.std 0.556 0.548 0.0883 0.0864 0.425 0.710 1.00 3412. 4832.This also enables bayesplot diagnostics:
library(bayesplot)
# Trace plots for AR damping coefficients
mcmc_trace(as_draws_array(results),
pars = c("latent.damp_AR.1.", "latent.damp_AR.2."))
Test sensitivity to AR order:
# AR(1) alternative - strong autocorrelation like AR(2)
ar1 <- AR(
order = 1,
damp_priors = list(truncnorm(0.8, 0.05, 0, 1)), # High autocorrelation
init_priors = list(norm(-1.0, 0.5)),
std_prior = halfnorm(0.5)
)
model_ar1 <- EpiProblem(
epi_model = renewal,
latent_model = ar1,
observation_model = negbin,
tspan = c(45, 80)
)
results_ar1 <- fit(model_ar1, data = full_data)
#> Generating Turing.jl model...
#> Running NUTS sampling...
#> Chains: 4
#> Warmup: 1000
#> Draws: 1000
#> Running Pathfinder initialization...
#> Pathfinder init failed, using default initialization...
#> Processing results...Compare the estimates from both models:
library(ggplot2)
library(patchwork)
p_ar2 <- plot(results, type = "Rt") +
ggtitle("AR(2) Model") +
coord_cartesian(ylim = c(0, 6))
p_ar1 <- plot(results_ar1, type = "Rt") +
ggtitle("AR(1) Model") +
coord_cartesian(ylim = c(0, 6))
p_ar2 + p_ar1
The AR(2) model typically produces smoother trajectories due to the additional lag term, while AR(1) may show more rapid fluctuations.
Account for incubation and reporting delays:
# Incubation period (~5 days)
obs_incubation <- LatentDelay(
model = negbin,
delay_distribution = lognorm(1.6, 0.42)
)
# Reporting delay (~2 days)
obs_full <- LatentDelay(
model = obs_incubation,
delay_distribution = lognorm(0.58, 0.47)
)
model_delays <- EpiProblem(
epi_model = renewal,
latent_model = ar2,
observation_model = obs_full,
tspan = c(45, 80)
)
results_delays <- fit(model_delays, data = full_data)
#> Generating Turing.jl model...
#> Running NUTS sampling...
#> Chains: 4
#> Warmup: 1000
#> Draws: 1000
#> Running Pathfinder initialization...
#> Pathfinder init failed, using default initialization...
#> Processing results...Compare models with and without observation delays:
p_no_delay <- plot(results, type = "Rt") +
ggtitle("Without Delays") +
coord_cartesian(ylim = c(0, 6))
p_delay <- plot(results_delays, type = "Rt") +
ggtitle("With Incubation + Reporting Delays") +
coord_cartesian(ylim = c(0, 6))
p_no_delay + p_delay
Accounting for delays shifts the trajectory earlier in time, as infections precede observed cases.
The Mishra et al. (2020) analysis demonstrated:
Try modifying the analysis:
Mishra, S., Berah, T., Mellan, T. A., et al. (2020). On the derivation of the renewal equation from an age-dependent branching process: an epidemic modelling perspective. arXiv preprint arXiv:2006.16487.
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
#> [5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C.UTF-8
#> [9] LC_ADDRESS=C.UTF-8 LC_TELEPHONE=C.UTF-8
#> [11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C.UTF-8
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] patchwork_1.3.2 ggplot2_4.0.2 bayesplot_1.15.0 posterior_1.6.1
#> [5] EpiAwareR_0.1.1
#>
#> loaded via a namespace (and not attached):
#> [1] tensorA_0.36.2.1 sass_0.4.10 utf8_1.2.6
#> [4] generics_0.1.4 stringi_1.8.7 digest_0.6.39
#> [7] magrittr_2.0.4 evaluate_1.0.5 grid_4.5.3
#> [10] RColorBrewer_1.1-3 fastmap_1.2.0 plyr_1.8.9
#> [13] jsonlite_2.0.0 backports_1.5.0 scales_1.4.0
#> [16] textshaping_1.0.5 jquerylib_0.1.4 abind_1.4-8
#> [19] cli_3.6.5 rlang_1.1.7 withr_3.0.2
#> [22] cachem_1.1.0 yaml_2.3.12 tools_4.5.3
#> [25] reshape2_1.4.5 checkmate_2.3.4 dplyr_1.2.0
#> [28] JuliaCall_0.17.6 vctrs_0.7.2 R6_2.6.1
#> [31] ggridges_0.5.7 matrixStats_1.5.0 lifecycle_1.0.5
#> [34] stringr_1.6.0 fs_2.0.0 ragg_1.5.2
#> [37] pkgconfig_2.0.3 desc_1.4.3 pkgdown_2.2.0
#> [40] pillar_1.11.1 bslib_0.10.0 gtable_0.3.6
#> [43] glue_1.8.0 Rcpp_1.1.1 systemfonts_1.3.2
#> [46] xfun_0.57 tibble_3.3.1 tidyselect_1.2.1
#> [49] knitr_1.51 farver_2.1.2 htmltools_0.5.9
#> [52] labeling_0.4.3 rmarkdown_2.30 compiler_4.5.3
#> [55] S7_0.2.1 distributional_0.7.0