Rfuncs Project
Bayesians are always shrinking things, but they are growing.
- Guernsey McPearson, alter ego of Stephen Senn
BetaBinomial()
For now, email obrien.ralph@gmail.com for beta version.
The common beta-binomial model is a relatively simple, excellent, and flexible tool for analyzing a single proportion, P, within the classical Bayesian framework. It takes P to be distributed as a beta(a, b) distribution, denoted P ~ beta(a, b), where a and b are the shape1 and shape2 arguments in R's family of beta distribution functions. Beta(1, 1) equates to the standard uniform distribution in which all values of P are equally likely. Beta(a > 1, b > 1) is unimodal. The data are m1 "successes" and m2 "failures," and m1 is taken to be distributed as a binomial random variable with sample size m1+m2 and probability of "success" P, i.e., m1 ~ binomial(m1+m2, P). Given the prior P0 ~ beta(a0, b0), the posterior is P1 ~ beta(a1 = a0+m1, a2 = b0+m2).
​
Example 1 covers, in depth, how to use the functions in the downloaded BetaBinomial file to handle a one-sample clinical trial to test a treatment for a fictitious serious disease (like COVID), including creating effective graphics for the prior and posterior beta distributions and performing study planning analyses. Example 2 discusses a study that seeks to infer what P is, given that know it must be quite low.
Examples 3 and 4 use data from two real trials that tested vaccines for COVID-19. These show how Pfizer-BioNTech adapted the method to assess the incidence rate ratio (IRR, like a fixed hazards ratio) in two-group designs with a time-to-event outcome measure and right censoring. Here, vaccine efficacy is quantified as VE = 1 - IRR. However, the method is "lazy" in how it handles time under surveillance. The Rfunc IRR.Bayes() does this more precisely.
​
Setting P0 ~ beta(a0, b0). a0 and b0 may be set directly with the A0 and B0 arguments. However, it is often very helpful to compute values for a0 > 1 and b0 > 1 based on specifying
-
a prior median of P (P0.50) and a Q*100% quantile (P0.Q), i.e. Prob[P0 < P.Q] = Q, or
-
a prior mode of P and a Q*100% quantile (P.Q)
-
a mean and standard deviation for P0.
​
Transformations of P. The Trans and TransName arguments transform the results for P to those of t(P), where t() is monotonic. Examples 3 and 4 use t(P) = P/(1 - P)*TTUSR, where TTUSR, the total time under surveillance ratio, a constant defined below.
​
Arguments
​
A0, B0
​
​
​
Defines the prior distribution, P0 ~ beta(a0, b0) directly.
Instead of using A0 and B0, P0 ~ beta(a0 ≥ 1, b0 ≥ 1) can be fit by specifying P0.50 and P.Q, Mode and P.Q, or Mean and SD. Not all values for these arguments lead to a fitable beta distribution by this methodology. If the fit fails, BetaBinomial() returns $solved=FALSE and prints an appropriate message.
P0.50
​
​
Mode
​
Mean
Median of P0 ~ beta(a0 ≥ 1, b0 ≥ 1); requires P.Q and Q. The fit is not exact, but almost always sufficient for typical problems.
Mode of P0 ~ beta(a0 ≥ 1, b0 ≥ 1); requires P.Q and Q. Yields exact solution.
Mean of P0 ~ beta(a0 ≥ 1, b0 ≥ 1); requires SD. Yields exact solution.
P0.Q and Q are required when using P0.50 or Mode. SD is required when using Mean.
P0.Q
​
​
​
​
Q
​
SD
Q*100% quantile of P, i.e. Prob[P0 < P0.Q] = Q. Fitting is aided when P0.Q is well separated from Median or Mode:
-
For Median or Mode <= 0.50, best to use Q >= 0.80.
-
For Median or Mode > 0.50, best to use Q < 0.20.
​
Defines P.Q.
​
Standard deviation of P0 ~ beta(a0, b0)
To obtain posterior results, you must specify data. The default, M=c(0,0), limits the output to describing the prior distribution.
M
M = c(m1, m2), where m1 is the number of "successes" and m2 is the number "failures." Thus, the sample proportion of successes is m1/(m1+m2).
The default transformation is t(P) = P, giving a classical beta-binomial problem as per Example 1. Examples 3 and 4 use IRR(P) = P/((1-P)*TTUSR.
TransName
​
Trans
Name of transform, such as "IRR" in Examples 3 and 4.
Transformation of P, such as Trans="P/((1-P)*TTUSR".
Requests for quantiles and cumulative probabilities for P0 ~ beta(a0 ≥ 1, b0 ≥ 1), which may be transformed using TransName and Trans. If using a transformation of P, the quantiles reported are for t(P) and the cumulative probabilities are for Prob[T(P) < t(p)].
PI.level=0.95
​
​
Q.points
​
​
​
CP.points
​
​
​
Level for probability (credible) interval(s), PI = [LPL, UPL], and its associated spread(s),
UPL - LPL.
​
Sets points to compute prior and posterior quantiles of P. The default is Q.points=c(0.005, 0.025,0.05,0.25,0.50,0.75,0.95,0.975,0.995), which gives limits for 90%, 95%, and 99% probability (credible) intervals.
Sets points for computing Prob[P < p], the cumulative probabilities for both prior and posterior beta distributions. For example, CP.points = c(0.20, 0.50) computes Prob[P < 0.20] and Prob[P < 0.50].
​
Print=FALSE suppresses printing of results.
Object Returned
A list with elements:
$solved
$a0, $b0
$a1, $b1
P.50
​
PI
​
​
PI.spread
$qntl
​
$cprob
TRUE/FALSE whether fitting of prior beta(a0, b0) succeeded.
Parameters for prior, P ~ beta(a0, b0).
Parameters for posterior, P ~ beta(a1, b1).
Matrix with medians of P0 or t(P0), where P0 ~ beta(a0,b0) and, if M supplied, P1 ~ beta(a1, b1)).q set by Q.points.
Matrix of lower and upper limits of 100*PI.level% probability intervals, LPL and UPL, for P0 ~ beta(a0, b0) and, if M supplied, P1 ~ beta(a1, b1)), or their like values transformed by t(P0) and t(P1).
UPL - LPL
100*q% quantiles of P0 ~ beta(a0,b0) and P1 ~ beta(a1, b1)), or their t(P) transforms; q set by Q.points.
Cumulative probabilities, Prob[P0 < p)] and Prob[P1 <p] or their t(P) transforms as set by CP.points. Names may be modified using CProbNames.
​
Methodology
​
The methodology is summarized in the code, which implements well-known theory about the standard beta(a, b) distribution and the beta-binomial model for analyzing a single proportion. The additional methodology underlying the COVID vaccine trials is summarized before Example 2.
Examples
​
-
Example 1, which is fiction, stems from my collaborating in 2021 with Bruce Spiess and Cynthia Garvin at the University of Florida to create a Bayesian design for (what I would term) a Phase I/II clinical trial to begin testing a proposed treatment for COVID-19 patients who were hospitalized and had just been intubated. Two prior distributions, each legitimate, are assessed and compared.
-
Example 2 is based on my recollections of an actual study that Jason Connor and I helped design at the Cleveland Clinic in the early 2000s. (The investigator died suddenly before the proposal was funded.) This focuses directly on a single proportion, P, which was known to be small.
​
-
Example 3 deals with the actual analysis and data from the 2020 Pfizer-BioNTech clinical trial that successfully tested their vaccine for COVID. This uses IRR(P) = P/((1- P)*TTUSR. where TTUSR is the ratio of the groups' total time under surveillance. Three prior distributions are assessed and compared, noting how they relate to the Scientific Method.
​
-
Example 4 uses the same method as Example 3 to analyze the data from the Janssen (Johnson & Johnson) clinical trial of their COVID vaccine. These analyses demonstrate how statements made widely in the mass media lacked statistical credibility and even common sense.
​