Introduction

This vignette demonstrates how to replicate the analysis from Mishra et al. (2020) of COVID-19 transmission dynamics in South Korea using EpiAwareR. This case study showcases the compositional modelling approach by building a complete epidemiological model from three reusable components.

The analysis estimates the time-varying reproduction number (RtR_t) using:

  • AR(2) latent process for smooth evolution of log RtR_t
  • Renewal equation for infection dynamics
  • Negative binomial observation model for overdispersed case counts

Data Preparation

We use the actual South Korea COVID-19 data from the preprint, which is included in the package. The data comes from the covidregionaldata R package and contains daily confirmed cases from January to July 2020.

# Load South Korea COVID-19 data
data_path <- system.file("extdata", "south_korea_data.csv", package = "EpiAwareR")
south_korea <- read.csv(data_path)
south_korea$date <- as.Date(south_korea$date)

# Prepare full dataset - tspan will select the window
# Column must be named 'y_t' for EpiAware
full_data <- data.frame(
  date = south_korea$date,
  y_t = south_korea$cases_new
)

# Preview the training window (days 45-80)
# This corresponds to Feb 13 - Mar 19, 2020 (the main epidemic wave)
cat("Training window (tspan 45-80):\n")
#> Training window (tspan 45-80):
print(full_data[45:80, ])
#>          date y_t
#> 45 2020-02-13   0
#> 46 2020-02-14   0
#> 47 2020-02-15   0
#> 48 2020-02-16   1
#> 49 2020-02-17   1
#> 50 2020-02-18   1
#> 51 2020-02-19  15
#> 52 2020-02-20  34
#> 53 2020-02-21  75
#> 54 2020-02-22 190
#> 55 2020-02-23 256
#> 56 2020-02-24 161
#> 57 2020-02-25 130
#> 58 2020-02-26 254
#> 59 2020-02-27 449
#> 60 2020-02-28 427
#> 61 2020-02-29 909
#> 62 2020-03-01 595
#> 63 2020-03-02 686
#> 64 2020-03-03 600
#> 65 2020-03-04 516
#> 66 2020-03-05 438
#> 67 2020-03-06 518
#> 68 2020-03-07 483
#> 69 2020-03-08 367
#> 70 2020-03-09 248
#> 71 2020-03-10 131
#> 72 2020-03-11 242
#> 73 2020-03-12 114
#> 74 2020-03-13 110
#> 75 2020-03-14 107
#> 76 2020-03-15  76
#> 77 2020-03-16  74
#> 78 2020-03-17  84
#> 79 2020-03-18  93
#> 80 2020-03-19 152

# Plot the training data
plot(full_data$date[45:80], full_data$y_t[45:80], type = "b",
     xlab = "Date", ylab = "Daily Cases",
     main = "South Korea COVID-19 Cases (Feb 13 - Mar 19, 2020)")

Model Components

1. Latent Process: AR(2) Model

The AR(2) process models the evolution of log RtR_t over time with temporal autocorrelation:

logRt=ϕ1logRt1+ϕ2logRt2+ϵt\log R_t = \phi_1 \log R_{t-1} + \phi_2 \log R_{t-2} + \epsilon_t

where ϵtNormal(0,σ2)\epsilon_t \sim \text{Normal}(0, \sigma^2).

ar2 <- AR(
  order = 2,
  damp_priors = list(
    truncnorm(0.1, 0.05, 0, 1), # ρ₁: Second-order coefficient
    truncnorm(0.8, 0.05, 0, 1)  # ρ₂: First-order (strong autocorrelation)
  ),
  init_priors = list(
    norm(-1.0, 0.1), # Initial value for lag 1
    norm(-1.0, 0.5)  # Initial value for lag 2
  ),
  std_prior = halfnorm(0.5) # σ: Innovation standard deviation
)
#> Julia version 1.11.9 at location /opt/hostedtoolcache/julia/1.11.9/x64/bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
#> EpiAware Julia backend loaded successfully

print(ar2)
#> <EpiAware AR(2) Latent Model>
#>   Damping priors: 2 
#>   Init priors: 2 
#>   Innovation std prior: specified

Prior choices (from Mishra et al. 2020):

  • First-order damping coefficient ρ₂ centered at 0.8 reflects strong autocorrelation in log Rt
  • Second-order damping coefficient ρ₁ centered at 0.1 for smooth dynamics
  • Initial values centered at -1 on log scale (Rt ≈ 0.37 initially)
  • Innovation std of 0.5 allows flexibility in Rt changes

2. Infection Model: Renewal Equation

The renewal equation models new infections based on past infections and generation time:

It=Rts=1tItsgsI_t = R_t \sum_{s=1}^{t} I_{t-s} \cdot g_s

where gsg_s is the discretized generation time distribution.

renewal <- Renewal(
  gen_distribution = gamma_dist(6.5, 0.62), # Shape and scale parameters
  initialisation_prior = norm(log(1.0), 1.0) # Prior on initial log infections (wide)
)

print(renewal)
#> <EpiAware Renewal Infection Model>
#>   Generation distribution: Gamma 
#>   Initialisation prior: specified

Key parameters:

  • Generation time: Mean ~6.5 days (typical for COVID-19 in early 2020)
  • Gamma distribution allows flexible shape
  • Initial infections seeded near 1 with wide prior (sd=1.0 on log scale)

3. Observation Model: Negative Binomial

Links latent infections to observed case counts with overdispersion:

YtNegBinom(μ=It,ϕ)Y_t \sim \text{NegBinom}(\mu = I_t, \phi)

where ϕ\phi controls overdispersion (clustering).

# The preprint fixes cluster_factor to 0.25
# We approximate this with a tight truncated normal prior
negbin <- NegativeBinomialError(
  cluster_factor_prior = truncnorm(0.25, 0.01, 0, 1)
)

print(negbin)
#> <EpiAware Negative Binomial Observation Model>
#>   Cluster factor prior: truncated(Normal(mean, sd), lower, upper)

Parameterization:

  • Cluster factor = 1/ϕ\sqrt{1/\phi} (coefficient of variation)
  • Fixed to 0.25 (via tight prior) as in the preprint
  • More interpretable than directly specifying ϕ\phi

Compose and Fit

Create Complete Model

model <- EpiProblem(
  epi_model = renewal,
  latent_model = ar2,
  observation_model = negbin,
  tspan = c(45, 80) # Exact tspan from preprint (Feb 13 - Mar 19, 2020)
)

print(model)
#> <EpiAware Epidemiological Model>
#>   Time span: 45 to 80 
#>   Components:
#>     - Infection model: epiaware_renewal 
#>     - Latent model: epiaware_ar 
#>     - Observation model: epiaware_negbin

The EpiProblem combines components into a joint Bayesian model with automatic validation. The tspan = c(45, 80) selects days 45-80 from the full dataset, matching the preprint exactly.

Bayesian Inference

We use NUTS (No-U-Turn Sampler) with Pathfinder initialization for posterior inference:

results <- fit(
  model = model,
  data = full_data, # Pass full dataset; tspan selects the window
  method = nuts_sampler(
    warmup = 1000, # Adaptation iterations
    draws = 2000, # Posterior samples per chain (matches preprint)
    chains = 4 # Independent chains for convergence assessment
  )
)
#> Generating Turing.jl model...
#> Running NUTS sampling...
#>   Chains: 4
#>   Warmup: 1000
#>   Draws: 2000
#>   Running Pathfinder initialization...
#>   Pathfinder init failed, using default initialization...
#> Processing results...

Note: Fitting typically takes 2-5 minutes on modern hardware.

Results

Convergence Diagnostics

print(results)
#> <EpiAware Model Fit>
#> 
#> Model:
#>   Time span: 45 to 80 
#>   Infection model: epiaware_renewal 
#>   Latent model: epiaware_ar 
#>   Observation model: epiaware_negbin 
#> 
#> Sampling:
#>   Method: NUTS
#>   Chains: 4 
#>   Draws: 2000 (per chain)
#> 
#> Convergence:
#>   Max Rhat: 1.069 
#>   Min ESS (bulk): 74 
#>   Warning: Some parameters have ESS < 100
#> 
#> Use summary() for parameter estimates
#> Use plot() to visualize results

Check for:

  • Rhat < 1.1: Chains have converged to the same distribution
  • ESS > 100: Sufficient effective sample size for inference
  • No divergent transitions: NUTS is exploring the posterior efficiently

Parameter Summaries

# Detailed posterior summaries
summary(results)
#> # A tibble: 53 × 10
#>    variable    mean  median     sd    mad      q5    q95  rhat ess_bulk ess_tail
#>    <chr>      <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#>  1 latent.… -0.992  -0.991  0.102  0.102  -1.16   -0.822 1.000    9314.    4989.
#>  2 latent.… -0.660  -0.658  0.476  0.469  -1.45    0.118 1.00     6998.    5282.
#>  3 latent.…  0.0842  0.0832 0.0401 0.0426  0.0208  0.152 1.00     4611.    2548.
#>  4 latent.…  0.808   0.808  0.0433 0.0435  0.738   0.880 1.00     5215.    3668.
#>  5 latent.…  0.556   0.548  0.0883 0.0864  0.425   0.710 1.00     3412.    4832.
#>  6 latent.…  0.970   0.978  0.889  0.888  -0.501   2.44  1.00     7866.    5508.
#>  7 latent.…  1.38    1.39   0.888  0.887  -0.100   2.82  1.000    8127.    5890.
#>  8 latent.…  1.23    1.23   0.900  0.886  -0.238   2.74  1.00     8660.    6117.
#>  9 latent.…  1.18    1.18   0.892  0.908  -0.294   2.65  1.00     9224.    5562.
#> 10 latent.…  2.17    2.17   0.824  0.816   0.810   3.53  1.00     8364.    5814.
#> # ℹ 43 more rows

Key parameters to examine:

  • AR damping coefficients (ϕ1\phi_1, ϕ2\phi_2): Autocorrelation strength
  • Innovation std (σ\sigma): Variability in RtR_t changes
  • Cluster factor: Degree of case count overdispersion
  • Initial infections: Epidemic seeding

Working with posterior draws

epiaware_fit objects integrate directly with the posterior package. You can convert the samples to any draws format for use with bayesplot, loo, or other packages:

library(posterior)
#> This is posterior version 1.6.1
#> 
#> Attaching package: 'posterior'
#> The following objects are masked from 'package:stats':
#> 
#>     mad, sd, var
#> The following objects are masked from 'package:base':
#> 
#>     %in%, match

# Convert to draws_df for tidy workflows
draws <- as_draws_df(results)
head(draws)
#> # A draws_df: 6 iterations, 1 chains, and 53 variables
#>   latent.ar_init.1. latent.ar_init.2. latent.damp_AR.1. latent.damp_AR.2.
#> 1             -1.11             -0.17             0.114              0.74
#> 2             -1.00             -0.19             0.109              0.78
#> 3             -1.12             -0.63             0.077              0.74
#> 4             -1.03             -0.70             0.105              0.85
#> 5             -1.01             -0.62             0.070              0.83
#> 6             -0.81             -0.72             0.087              0.85
#>   latent.std latent.ϵ_t.1. latent.ϵ_t.2. latent.ϵ_t.3.
#> 1       0.64         1.028          1.19          1.65
#> 2       0.53         1.311          2.02          0.35
#> 3       0.53         1.131          1.94          0.61
#> 4       0.64        -0.093          1.86          1.04
#> 5       0.48         0.248          0.90          1.44
#> 6       0.53         0.323          0.21          2.20
#> # ... with 45 more variables
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}

# Summarise specific parameters
subset_draws(draws,
  variable = c("latent.damp_AR.1.", "latent.damp_AR.2.", "latent.std")
) |>
  summarise_draws()
#> # A tibble: 3 × 10
#>   variable        mean median     sd    mad     q5   q95  rhat ess_bulk ess_tail
#>   <chr>          <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 latent.damp_… 0.0842 0.0832 0.0401 0.0426 0.0208 0.152  1.00    4611.    2548.
#> 2 latent.damp_… 0.808  0.808  0.0433 0.0435 0.738  0.880  1.00    5215.    3668.
#> 3 latent.std    0.556  0.548  0.0883 0.0864 0.425  0.710  1.00    3412.    4832.

This also enables bayesplot diagnostics:

library(bayesplot)

# Trace plots for AR damping coefficients
mcmc_trace(as_draws_array(results),
           pars = c("latent.damp_AR.1.", "latent.damp_AR.2."))

Visualization

# Time-varying reproduction number with credible intervals
plot(results, type = "Rt")


# Observed vs predicted case counts
plot(results, type = "cases")


# Posterior distributions for key parameters
plot(results, type = "posterior")

Model Comparisons

Comparing Latent Processes

Test sensitivity to AR order:

# AR(1) alternative - strong autocorrelation like AR(2)
ar1 <- AR(
  order = 1,
  damp_priors = list(truncnorm(0.8, 0.05, 0, 1)),  # High autocorrelation
  init_priors = list(norm(-1.0, 0.5)),
  std_prior = halfnorm(0.5)
)

model_ar1 <- EpiProblem(
  epi_model = renewal,
  latent_model = ar1,
  observation_model = negbin,
  tspan = c(45, 80)
)

results_ar1 <- fit(model_ar1, data = full_data)
#> Generating Turing.jl model...
#> Running NUTS sampling...
#>   Chains: 4
#>   Warmup: 1000
#>   Draws: 1000
#>   Running Pathfinder initialization...
#>   Pathfinder init failed, using default initialization...
#> Processing results...

Compare the RtR_t estimates from both models:

library(ggplot2)
library(patchwork)

p_ar2 <- plot(results, type = "Rt") +
  ggtitle("AR(2) Model") +
  coord_cartesian(ylim = c(0, 6))

p_ar1 <- plot(results_ar1, type = "Rt") +
  ggtitle("AR(1) Model") +
  coord_cartesian(ylim = c(0, 6))

p_ar2 + p_ar1

The AR(2) model typically produces smoother RtR_t trajectories due to the additional lag term, while AR(1) may show more rapid fluctuations.

Adding Observation Delays

Account for incubation and reporting delays:

# Incubation period (~5 days)
obs_incubation <- LatentDelay(
  model = negbin,
  delay_distribution = lognorm(1.6, 0.42)
)

# Reporting delay (~2 days)
obs_full <- LatentDelay(
  model = obs_incubation,
  delay_distribution = lognorm(0.58, 0.47)
)

model_delays <- EpiProblem(
  epi_model = renewal,
  latent_model = ar2,
  observation_model = obs_full,
  tspan = c(45, 80)
)

results_delays <- fit(model_delays, data = full_data)
#> Generating Turing.jl model...
#> Running NUTS sampling...
#>   Chains: 4
#>   Warmup: 1000
#>   Draws: 1000
#>   Running Pathfinder initialization...
#>   Pathfinder init failed, using default initialization...
#> Processing results...

Compare models with and without observation delays:

p_no_delay <- plot(results, type = "Rt") +
  ggtitle("Without Delays") +
  coord_cartesian(ylim = c(0, 6))

p_delay <- plot(results_delays, type = "Rt") +
  ggtitle("With Incubation + Reporting Delays") +
  coord_cartesian(ylim = c(0, 6))

p_no_delay + p_delay

Accounting for delays shifts the RtR_t trajectory earlier in time, as infections precede observed cases.

Interpretation

The Mishra et al. (2020) analysis demonstrated:

  1. Flexible RtR_t estimation: AR(2) captures both smooth trends and rapid changes
  2. Uncertainty quantification: Bayesian credible intervals reflect parameter uncertainty
  3. Intervention detection: Sharp decline in RtR_t coincides with public health measures
  4. Compositional flexibility: Easy to test alternative assumptions (AR order, delays)

Key Findings

  • Initial Rt2.5R_t \approx 2.5: Rapid exponential growth phase
  • Post-intervention Rt<1R_t < 1: Control achieved through distancing measures
  • Moderate overdispersion: Case counts show clustering but not extreme variance
  • Good model fit: Posterior predictive checks show data within credible intervals

Extensions

Try modifying the analysis:

  1. Different priors: Test sensitivity to prior choices
  2. Alternative latent models: MA, random walk, spline models
  3. Stratification: Separate models by age group or region
  4. Forecast evaluation: Hold-out validation of predictive performance

References

Mishra, S., Berah, T., Mellan, T. A., et al. (2020). On the derivation of the renewal equation from an age-dependent branching process: an epidemic modelling perspective. arXiv preprint arXiv:2006.16487.

Session Info

sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8          LC_NUMERIC=C             
#>  [3] LC_TIME=C.UTF-8           LC_COLLATE=C.UTF-8       
#>  [5] LC_MONETARY=C.UTF-8       LC_MESSAGES=C.UTF-8      
#>  [7] LC_PAPER=C.UTF-8          LC_NAME=C.UTF-8          
#>  [9] LC_ADDRESS=C.UTF-8        LC_TELEPHONE=C.UTF-8     
#> [11] LC_MEASUREMENT=C.UTF-8    LC_IDENTIFICATION=C.UTF-8
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] patchwork_1.3.2  ggplot2_4.0.2    bayesplot_1.15.0 posterior_1.6.1 
#> [5] EpiAwareR_0.1.1 
#> 
#> loaded via a namespace (and not attached):
#>  [1] tensorA_0.36.2.1     sass_0.4.10          utf8_1.2.6          
#>  [4] generics_0.1.4       stringi_1.8.7        digest_0.6.39       
#>  [7] magrittr_2.0.4       evaluate_1.0.5       grid_4.5.3          
#> [10] RColorBrewer_1.1-3   fastmap_1.2.0        plyr_1.8.9          
#> [13] jsonlite_2.0.0       backports_1.5.0      scales_1.4.0        
#> [16] textshaping_1.0.5    jquerylib_0.1.4      abind_1.4-8         
#> [19] cli_3.6.5            rlang_1.1.7          withr_3.0.2         
#> [22] cachem_1.1.0         yaml_2.3.12          tools_4.5.3         
#> [25] reshape2_1.4.5       checkmate_2.3.4      dplyr_1.2.0         
#> [28] JuliaCall_0.17.6     vctrs_0.7.2          R6_2.6.1            
#> [31] ggridges_0.5.7       matrixStats_1.5.0    lifecycle_1.0.5     
#> [34] stringr_1.6.0        fs_2.0.0             ragg_1.5.2          
#> [37] pkgconfig_2.0.3      desc_1.4.3           pkgdown_2.2.0       
#> [40] pillar_1.11.1        bslib_0.10.0         gtable_0.3.6        
#> [43] glue_1.8.0           Rcpp_1.1.1           systemfonts_1.3.2   
#> [46] xfun_0.57            tibble_3.3.1         tidyselect_1.2.1    
#> [49] knitr_1.51           farver_2.1.2         htmltools_0.5.9     
#> [52] labeling_0.4.3       rmarkdown_2.30       compiler_4.5.3      
#> [55] S7_0.2.1             distributional_0.7.0