**rbi.helpers** is a collection of helper functions to use with **rbi**, an R interface to **LibBi**, a library for Bayesian inference.

This vignette builds on the rbi
vignette, applying the higher-level functions contained in
**rbi.helpers** to the same model introduced there. For the
lower-level functions to run **LibBi** and read the
results, please refer to the documentation and vignette that comes with
**rbi**.

The easiest way to install the latest stable version of
**rbi.helpers** is via CRAN. The package name is
`rbi.helpers` (all lowercase):

`install.packages("rbi.helpers")`

Alternatively, the current development version can be installed using
the `remotes` package:

`remotes::install_github("sbfnk/rbi.helpers")`

Most functions in the **rbi.helpers** package require a
working installation of **LibBi**. Please see the rbi
vignette for how to get one via homebrew or linuxbrew.

These steps are reproduced from the rbi vignette, where there is more information on the individual steps:

```
model_file <- system.file(package = "rbi", "SIR.bi") # file included in package
sir_model <- bi_model(model_file) # load model
set.seed(1001912)
sir_data <- bi_generate_dataset(sir_model, end_time = 16 * 7, noutputs = 16)
```

In the rbi
vignette, a stochastic
SIR model is fitted to simulated data from the same model using
particle Markov-chain Monte Carlo with 32 particles. Given a model and
data, how do we know how many particles we need? This question does not
have a simple answer, as the “optimal” number of particles may depend on
the state of the Markov chain. A possible rule-of-thumb is to choose the
number of particles such that the variance of the log-likelihood near
the mode is approximately one. This suggests a strategy
by which first an approximate location of the mode or mean of the
posterior distribution is obtained in a trial run, before the number of
particles is adjusted by monitoring the variance of the log-likelihood
while keeping the parameters fixed. **rbi.helpers**
implements the second part of this strategy (adjusting the number of
particles at a given location in parameter space) with the
`adapt_particles` method. For the first part (finding the
mode), a crude method is to take a fixed number of samples from the
prior distribution and choose the one that maximises the posterior. In
**rbi**, this can be achieved with

```
bi_prior <- sample(
proposal = "prior", sir_model, nsamples = 1000, end_time = 16 * 7,
nparticles = 4, obs = sir_data, seed = 1234
)
```

This runs particle MCMC with the prior distribution as proposal
distribution (because `proposal = "prior"`), in this case
with 4 particles. In other words, when sampling from the posterior, the
proposals will be drawn independently from the prior distribution. Note
that we set a seed to make the results reproducible. It is worth trying
the commands with a different seed and comparing the results to those
obtained below. The location in parameter space of the sampler at the
end of the 1000 iterations will give an approximation of the mode of the
posterior distribution. This can then be used to adjust the number of
particles using

```
adapted <- adapt_particles(bi_prior)
#> Wed Jul 26 13:56:34 2023 Adapting the number of particles
#> Wed Jul 26 13:56:57 2023 4 particles, loglikelihod variance: 1.66872911536756
#> Wed Jul 26 13:57:02 2023 8 particles, loglikelihod variance: 1.19505440796246
#> Wed Jul 26 13:57:09 2023 16 particles, loglikelihod variance: 0.965995392125263
#> Wed Jul 26 13:57:09 2023 Choosing 16 particles.
```

This will take the last sample of the output file contained in the
`libbi` object `bi_prior`, and use it to adjust
the number of particles by starting with 1 particle (or a given
`min`) and doubling it until the variance of the
log-likelihood crosses 1. The number of particles is then saved in the
`adapted` object:

```
adapted$options$nparticles
#> [1] 16
```

Having adjusted the number of particles, the second important piece of
information to give the posterior sampler is the proposal distribution.
This can, again, be obtained using a sequence of trial runs, whereby the
proposal distribution is sequentially adjusted from previous samples to
be proportional to the empirical covariance of the posterior samples.
The way this is implemented in the `adapt_proposal` function
in **rbi.helpers** is that the proposal distribution is
adjusted to come from a multivariate normal taking into account the
covariance of samples obtained so far, until the acceptance rate lies
between the given minimum and maximum. For example, to adjust the
proposal distribution for an acceptance rate between 0.05 and 0.4, we
can run:

```
adapted <- adapt_proposal(adapted, min = 0.05, max = 0.4)
#> Wed Jul 26 13:57:10 2023 Adapting the proposal distribution
#> Wed Jul 26 13:57:10 2023 Initial trial run
#> Wed Jul 26 13:57:36 2023 Acceptance rate 0.468468468468468, adapting size with scale 1
#> Wed Jul 26 13:57:43 2023 Acceptance rate 0.414414414414414, adapting size with scale 2
#> Wed Jul 26 13:57:50 2023 Acceptance rate: 0.306306306306306
```
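To inspect the acceptance rate of the adapted chain directly, a short check could look like this (this assumes your installed version of **rbi.helpers** exports an `acceptance_rate` function operating on a sampled `libbi` object; check `ls("package:rbi.helpers")` if unsure):

```r
# Empirical acceptance rate of the samples in the adapted chain;
# this should now lie between the min and max passed to adapt_proposal
acceptance_rate(adapted)
```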

The covariance matrices for parameters and initial conditions are
stored in the input file contained in the `libbi` object
`adapted`:

```
bi_read(adapted, file = "input")
#> $`__proposal_parameter_cov`
#> __dim_parameter_cov.1 __dim_parameter_cov.2 value
#> 1 p_rep p_rep 0.009676395
#> 2 p_rep p_R0 0.000000000
#> 3 p_R0 p_rep 0.000000000
#> 4 p_R0 p_R0 0.461986363
```

To compute the Deviance Information Criterion (DIC), use `DIC`:
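The code for this step is not shown above; a minimal sketch, assuming we sample from the posterior with the adapted `libbi` object from the previous steps, would be:

```r
# Sample from the posterior using the adapted number of particles
# and proposal distribution, then compute the DIC from the chain
posterior <- sample(adapted, nsamples = 5000)
DIC(posterior)
```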

This samples from the posterior distribution using the adapted number of particles and proposal distribution and then calculates the DIC.

LibBi uses real-valued times. To convert these to time or date
objects for use in R, use the `numeric_to_time` function:

```
res <- numeric_to_time(posterior, unit = "day", origin = as.Date("2018-04-01"))
head(res$Z)
#> np time value
#> 1 0 2018-04-01 1.0000000
#> 2 0 2018-04-08 0.5700822
#> 3 0 2018-04-15 4.4669582
#> 4 0 2018-04-22 92.7540209
#> 5 0 2018-04-29 428.5571654
#> 6 0 2018-05-06 204.5738370
```

The function `time_to_numeric` does the converse,
converting R times or dates into numeric values for use by LibBi:

```
orig <- time_to_numeric(res, unit = "day", origin = as.Date("2018-04-01"))
head(orig$Z)
#> np time value
#> 1 0 0 1.0000000
#> 2 0 7 0.5700822
#> 3 0 14 4.4669582
#> 4 0 21 92.7540209
#> 5 0 28 428.5571654
#> 6 0 35 204.5738370
```

With the pipe operator available since R version 4.1, it is possible to construct inference chains more elegantly. For example, to get posterior samples from adapted Metropolis-Hastings, including sampled observations, we could have written:

```
posterior <- sample(
proposal = "prior", sir_model, nsamples = 1000,
end_time = 16 * 7, nparticles = 4, obs = sir_data, seed = 1234
) |>
adapt_particles() |>
adapt_proposal(min = 0.05, max = 0.4) |>
sample(nsamples = 5000) |>
sample_obs()
#> Wed Jul 26 13:58:17 2023 Adapting the proposal distribution
#> Wed Jul 26 13:59:03 2023 Adapting the number of particles
#> Wed Jul 26 13:59:27 2023 4 particles, loglikelihod variance: 1.90228595419509
#> Wed Jul 26 13:59:32 2023 8 particles, loglikelihod variance: 0.931236269625011
#> Wed Jul 26 13:59:32 2023 Choosing 8 particles.
#> Wed Jul 26 13:59:32 2023 Initial trial run
#> Wed Jul 26 13:59:58 2023 Acceptance rate: 0.267267267267267
```