Bayesians are always shrinking things, but they are growing.

- Guernsey McPearson, alter ego of Stephen Senn

BetaBinomial(), PlotBetaDensities(), BB.WhatIf(), CI.P.score()

Sourcing the file BetaBinomial.RfuncYYMMDD.r (where "YYMMDD" is the version/release date) contains code, loads four functions that relate to analyzing a single proportion, P:

BetaBinomial(). Performs a Bayesian beta-binomial analysis of P.
PlotBetaDensities(). Plots two beta density functions, such as that shown here.
BB.WhatIf(). Performs statistical planning analyses for studies that will depend on beta-binomial analyses.
CI.P.score(). Computes the frequentist estimate and the confidence interval for P based on the score method, sometimes called Wilson's method.

The examples are not run when the file is sourced, but see

For now, email obrien.ralph@gmail.com for beta version.

The common beta-binomial model is a relatively simple, excellent, and flexible tool for analyzing a single proportion, P, within the classical Bayesian framework. It takes P to be distributed as a beta(a, b) distribution, denoted P ~ beta(a, b), where a and b are the shape1 and shape2 arguments in R's family of beta distribution functions. Beta(1, 1) equates to the standard uniform distribution in which all values of P are equally likely. Beta(a > 1, b > 1) is unimodal. The data are m1 "successes" and m2 "failures," and m1 is taken to be distributed as a binomial random variable with sample size m1+m2 and probability of "success" P, i.e., m1 ~ binomial(m1+m2, P). Given the prior P ~ beta(a0, b0), the posterior is P ~ beta(a0+m1, b0+m2).

Example 1 covers the basics using a fictional study based in reality. Examples 2, 3, and 4 use data from three trials that tested vaccines for COVID-19. These show how to adapt the method to assess the incidence rate ratio (IRR, like a fixed hazards ratio) in two-group designs with a time-to-event outcome measure and right censoring. Here, vaccine efficacy is quantified as VE = 1 - IRR.

Setting a0 and b0. a0 and b0 may be set directly with the A0 and B0 arguments. However, it is often very helpful to compute values for a0 > 1 and b0 > 1 based on specifying

a prior median of P (P_0.50) and a Q*100% quantile (P.Q), i.e. Prob[P < P.Q] = Q, or
a prior mode of P and a Q*100% quantile (P.Q)

In addition, specifying

a mean and standard deviation

will fit beta(a0 > 0, b0 > 0).

Transformations of P. The Trans and TransName arguments transform the results for P to those of t(P), where t() is monotonic. Examples 2, 3, and 4 use t(P) = P/(1 - P)*OSTR, where OSTR, the overall surveillance time ratio, a constant defined below.

Arguments

A0, B0

Defines the prior distribution, P ~ beta(a0, b0) directly.

Instead of using A0 and B0, P ~ beta(a0, b0) can be fit by specifying Median and P.Q, Mode and P.Q, or Mean and SD. Not all values for these arguments lead to a fitable beta distribution by this methodology. If the fit fails, BetaBinomial() returns $solved=FALSE and prints an appropriate message.

Median

Mode

Mean

Median of P ~ beta(a0>1, b0>1); requires P.Q and Q. The fit is not exact, but almost always sufficient for typical problems.

Mode of P ~ beta(a0>1, b0>1); requires P.Q and Q. Yields exact solution.

Mean of P ~ beta(a>0, b>0); requires SD. Yields exact solution.

P.Q and Q are required when using Median or Mode. SD is required when using Mean.

P.Q

Q*100% quantile of P, i.e. Prob[P < P.Q] = Q. Fitting is aided when P.Q is well separated from Median or Mode:

For Median or Mode <= 0.50, best to use Q >= 0.80.
For Median or Mode > 0.50, best to use Q < 0.20.

Defines P.Q.

Standard deviation of P ~ beta(a0, b0)

To obtain posterior results, you must specify data. The default, M=c(0,0), limits the output to the prior distribution.

M = c(m1, m2), where m1 is the number of "successes" and m2 is the number "failures." Thus, the sample proportion of successes is m1/(m1+m2).

The default transformation is t(P) = P, giving a classical beta-binomial problem as per Example 1. Examples 2, 3, and 4 use IRR(P) = P/((1-P)*OSTR.

TransName

Trans

Name of transform, such as "IRR" in Examples 2, 3, and 4.

Transformation of P, such as Trans="P/((1-P)*OSTR".

Requests for quantiles and cumulative probabilities for P ~ beta(a0, b0), which may be transformed using TransName and Trans. If using a transformationof P, the quantiles reported are for t(P) and the cumulative probabilities are for Prob[T(P) < t(p)].

PI.level=0.95

Q.points

CP.points

Level for probability (credible) interval(s), PI = [LPL, UPL], and its associated spread(s),
UPL - LPL.

Sets points to compute prior and posterior quantiles of P. The default is Q.points=c(0.005, 0.025,0.05,0.25,0.50,0.75,0.95,0.975,0.995), which gives limits for 90%, 95%, and 99% probability (credible) intervals.

Sets points for computing Prob[P < p], the cumulative probabilities for both prior and posterior beta distributions. For example, CP.points = c(0.20, 0.50) computes Prob[P < 0.20] and Prob[P < 0.50].

Print=FALSE suppresses printing of results.

Objects Returned

A list with elements:

$solved

$a0, $b0

$a1, $b1

$qntl

$cprob

TRUE/FALSE whether fitting of prior beta(a0, b0) succeeded.

Parameters for prior, P ~ beta(a0, b0).

Parameters for posterior, P ~ beta(a1, b1).

100*q% quantiles of P or t(P), where P ~ beta(a0,b0) and, if M supplied, P ~ beta(a1, b1)); q set by Q.points.

Cumulative probabilities, Prob[P < p)] or Prob[t(P) < t(p)] as set by CP.points. Names may be modified using CProbNames.

Methodology

The methodology is summarized in the code, which implements well-known theory about the standard beta(a, b) distribution and the beta-binomial model for analyzing a single proportion. The additional methodology underlying the COVID vaccine trials is summarized before Example 2.

Examples

Example 1 is based on my recollections of an actual study that Jason Connor and I helped design at the Cleveland Clinic in the early 2000s. (The investigator died suddenly before the proposal was funded.) This focuses directly on a single proportion, P, which was known to be small.

Example 2 deals with the actual analysis and data from the 2020 Pfizer-BioNTech clinical trial that successfully tested their vaccine for COVID. This uses IRR(P) = P/((1- P)*OSTR. where OSTR is the ratio of the group's total at-risk (surveillance) times. Three prior distributions are assessed and compared, noting how they relate to the Scientific Method.

Example 3 uses the same method as Example 2 to analyze the data from the Janssen (Johnson & Johnson) clinical trial of their COVID vaccine. These analyses demonstrate how statements made widely in the mass media lacked statistical credibility and even common sense.

Example 4 is much like Example 3, but is based on AstaZenaca's trial of their COVID vaccine. However, it randomized 2:1 vaccine to placebo. (To be completed whenever the necessary data and AstaZenaca's analyses are submitted to the US FDA and publicly released.)