Bayesians are always shrinking things, but they are growing.

- Guernsey McPearson, alter ego of Stephen Senn

BetaBinomial()

For now, email obrien.ralph@gmail.com for beta version.

The common beta-binomial model is a relatively simple, excellent, and flexible tool for analyzing a single proportion, P, within the classical Bayesian framework. It takes P to be distributed as a beta(a, b) distribution, denoted P ~ beta(a, b), where a and b are the shape1 and shape2 arguments in R's family of beta distribution functions. Beta(1, 1) equates to the standard uniform distribution in which all values of P are equally likely. Beta(a > 1, b > 1) is unimodal. The data are m1 "successes" and m2 "failures," and m1 is taken to be distributed as a binomial random variable with sample size m1+m2 and probability of "success" P, i.e., m1 ~ binomial(m1+m2, P). Given the prior P0 ~ beta(a0, b0), the posterior is P1 ~ beta(a1 = a0+m1, a2 = b0+m2).

Example 1 covers, in depth, how to use the functions in the downloaded BetaBinomial file to handle a one-sample clinical trial to test a treatment for a fictitious serious disease (like COVID), including creating effective graphics for the prior and posterior beta distributions and performing study planning analyses. Example 2 discusses a study that seeks to infer what P is, given that know it must be quite low.

Examples 3 and 4 use data from two real trials that tested vaccines for COVID-19. These show how Pfizer-BioNTech adapted the method to assess the incidence rate ratio (IRR, like a fixed hazards ratio) in two-group designs with a time-to-event outcome measure and right censoring. Here, vaccine efficacy is quantified as VE = 1 - IRR. However, the method is "lazy" in how it handles time under surveillance. The Rfunc IRR.Bayes() does this more precisely.

Setting P0 ~ beta(a0, b0). a0 and b0 may be set directly with the A0 and B0 arguments. However, it is often very helpful to compute values for a0 > 1 and b0 > 1 based on specifying

a prior median of P (P0.50) and a Q*100% quantile (P0.Q), i.e. Prob[P0 < P.Q] = Q, or
a prior mode of P and a Q*100% quantile (P.Q)
a mean and standard deviation for P0.

Transformations of P. The Trans and TransName arguments transform the results for P to those of t(P), where t() is monotonic. Examples 3 and 4 use t(P) = P/(1 - P)*TTUSR, where TTUSR, the total time under surveillance ratio, a constant defined below.

Arguments

A0, B0

Defines the prior distribution, P0 ~ beta(a0, b0) directly.

Instead of using A0 and B0, P0 ~ beta(a0 ≥ 1, b0 ≥ 1) can be fit by specifying P0.50 and P.Q, Mode and P.Q, or Mean and SD. Not all values for these arguments lead to a fitable beta distribution by this methodology. If the fit fails, BetaBinomial() returns $solved=FALSE and prints an appropriate message.

P0.50

Mode

Mean

Median of P0 ~ beta(a0 ≥ 1, b0 ≥ 1); requires P.Q and Q. The fit is not exact, but almost always sufficient for typical problems.

Mode of P0 ~ beta(a0 ≥ 1, b0 ≥ 1); requires P.Q and Q. Yields exact solution.

Mean of P0 ~ beta(a0 ≥ 1, b0 ≥ 1); requires SD. Yields exact solution.

P0.Q and Q are required when using P0.50 or Mode. SD is required when using Mean.

P0.Q

Q*100% quantile of P, i.e. Prob[P0 < P0.Q] = Q. Fitting is aided when P0.Q is well separated from Median or Mode:

For Median or Mode <= 0.50, best to use Q >= 0.80.
For Median or Mode > 0.50, best to use Q < 0.20.

Defines P.Q.

Standard deviation of P0 ~ beta(a0, b0)

To obtain posterior results, you must specify data. The default, M=c(0,0), limits the output to describing the prior distribution.

M = c(m1, m2), where m1 is the number of "successes" and m2 is the number "failures." Thus, the sample proportion of successes is m1/(m1+m2).

The default transformation is t(P) = P, giving a classical beta-binomial problem as per Example 1. Examples 3 and 4 use IRR(P) = P/((1-P)*TTUSR.

TransName

Trans

Name of transform, such as "IRR" in Examples 3 and 4.

Transformation of P, such as Trans="P/((1-P)*TTUSR".

Requests for quantiles and cumulative probabilities for P0 ~ beta(a0 ≥ 1, b0 ≥ 1), which may be transformed using TransName and Trans. If using a transformation of P, the quantiles reported are for t(P) and the cumulative probabilities are for Prob[T(P) < t(p)].

PI.level=0.95

Q.points

CP.points

Level for probability (credible) interval(s), PI = [LPL, UPL], and its associated spread(s),
UPL - LPL.

Sets points to compute prior and posterior quantiles of P. The default is Q.points=c(0.005, 0.025,0.05,0.25,0.50,0.75,0.95,0.975,0.995), which gives limits for 90%, 95%, and 99% probability (credible) intervals.

Sets points for computing Prob[P < p], the cumulative probabilities for both prior and posterior beta distributions. For example, CP.points = c(0.20, 0.50) computes Prob[P < 0.20] and Prob[P < 0.50].

Print=FALSE suppresses printing of results.

Object Returned

A list with elements:

$solved

$a0, $b0

$a1, $b1

P.50

PI.spread

$qntl

$cprob

TRUE/FALSE whether fitting of prior beta(a0, b0) succeeded.

Parameters for prior, P ~ beta(a0, b0).

Parameters for posterior, P ~ beta(a1, b1).

Matrix with medians of P0 or t(P0), where P0 ~ beta(a0,b0) and, if M supplied, P1 ~ beta(a1, b1)).q set by Q.points.

Matrix of lower and upper limits of 100*PI.level% probability intervals, LPL and UPL, for P0 ~ beta(a0, b0) and, if M supplied, P1 ~ beta(a1, b1)), or their like values transformed by t(P0) and t(P1).

UPL - LPL

100*q% quantiles of P0 ~ beta(a0,b0) and P1 ~ beta(a1, b1)), or their t(P) transforms; q set by Q.points.

Cumulative probabilities, Prob[P0 < p)] and Prob[P1 <p] or their t(P) transforms as set by CP.points. Names may be modified using CProbNames.

Methodology

The methodology is summarized in the code, which implements well-known theory about the standard beta(a, b) distribution and the beta-binomial model for analyzing a single proportion. The additional methodology underlying the COVID vaccine trials is summarized before Example 2.

Examples

Example 1, which is fiction, stems from my collaborating in 2021 with Bruce Spiess and Cynthia Garvin at the University of Florida to create a Bayesian design for (what I would term) a Phase I/II clinical trial to begin testing a proposed treatment for COVID-19 patients who were hospitalized and had just been intubated. Two prior distributions, each legitimate, are assessed and compared.

Example 2 is based on my recollections of an actual study that Jason Connor and I helped design at the Cleveland Clinic in the early 2000s. (The investigator died suddenly before the proposal was funded.) This focuses directly on a single proportion, P, which was known to be small.

Example 3 deals with the actual analysis and data from the 2020 Pfizer-BioNTech clinical trial that successfully tested their vaccine for COVID. This uses IRR(P) = P/((1- P)*TTUSR. where TTUSR is the ratio of the groups' total time under surveillance. Three prior distributions are assessed and compared, noting how they relate to the Scientific Method.

Example 4 uses the same method as Example 3 to analyze the data from the Janssen (Johnson & Johnson) clinical trial of their COVID vaccine. These analyses demonstrate how statements made widely in the mass media lacked statistical credibility and even common sense.