Count Data

rpact Training 2024, UCB

Gernot Wassmer and Friedrich Pahlke

August 29, 2024

Applications

  • Exacerbation rates in asthma and COPD
  • Counts of brain lesions by MRI in Multiple Sclerosis
  • Annual relapse rate in pediatric MS
  • Hospitalizations in heart failure trials

Literature selection:
Keene et al. (2007)
Zhu and Lakkis (2014)
Stucke and Kieser (2013)
Friede and Schmidli (2009, 2010)
Schneider, Schmidli, and Friede (2013)
Friede, Häring, and Schmidli (2018)
Mütze, Glimm, Schmidli, and Friede (2019a, b)

\(\hspace{1cm}\)

Fixed sample size

\(\hspace{1cm}\)

Blinded sample size reestimation


Group sequential designs

Notation

Let \[ H_0 : \frac{\lambda_1}{\lambda_2} = \delta_0\;,\;\; r = \frac{n_1}{n_2}\;, \] where \(\lambda_i\) denotes the rate and \(t_{ijk}\) the exposure time, so that \(t_{ijk}\,\lambda_i\) is the mean of the negative binomial distributed random variable \(Y_{ijk}\): \[Y_{ijk} \sim NB(t_{ijk}\,\lambda_i, \phi)\]

\[P(Y_{ijk} = y) = \frac{\Gamma(y + 1/\phi)}{\Gamma(1/\phi) \, y!}\left(\frac{1}{1 + \phi\,t_{ijk}\,\lambda_i}\right)^{1/\phi} \left(\frac{\phi\,t_{ijk}\,\lambda_i}{1 + \phi\,t_{ijk}\,\lambda_i}\right)^y\] with

\[E(Y_{ijk}) = t_{ijk}\,\lambda_i \quad \text{and} \quad Var(Y_{ijk}) = t_{ijk}\,\lambda_i(1 + \phi\,t_{ijk}\,\lambda_i)\]

Mütze et al (2019)
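As an illustration (own sketch, not from the slides), this parametrization corresponds to R's rnbinom() with mu = t * lambda and size = 1 / phi; the mean and variance formulas can be checked by simulation:

# Simulate counts under the above NB parametrization and check E(Y) and Var(Y)
set.seed(123)
t <- 1; lambda <- 1.4; phi <- 0.5
y <- rnbinom(1e5, mu = t * lambda, size = 1 / phi)
mean(y)    # approx. t * lambda
var(y)     # approx. t * lambda * (1 + phi * t * lambda)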

Sample size calculation

to meet power \(1-\beta\) for two-sample comparisons is performed for

  • assumed \(\lambda_1\) and \(\lambda_2\),
  • assumed exposure times \(t_{ijk}\) for treatment \(i\), subjects \(j = 1,\ldots,n_i\), at interim stage \(k\),
  • and assumed overdispersion (shape parameter) \(\phi \geq 0\).

Fisher information

The information level of the Wald statistic at stage \(k\) is given by

\[ {\cal I}_k = \left(\frac{1}{ \sum_{j = 1}^{n_1} \frac{t_{1jk}\lambda_1}{1 + \phi\,t_{1jk}\lambda_1}} + \frac{1}{ \sum_{j = 1}^{n_2}\frac{t_{2jk}\lambda_2}{1 + \phi\,t_{2jk}\lambda_2}}\right)^{-1}\;. \]

This can be derived from the expected Fisher information in a negative binomial regression model (cf. Lawless, 1987; see also Zhu and Lakkis, 2014).
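For illustration, the stage-wise information level can be computed directly from the exposure times (own helper function, not an rpact function; the values in the example call are arbitrary):

# Information level at stage k for given exposure-time vectors t1, t2
infoLevel <- function(t1, t2, lambda1, lambda2, phi) {
  i1 <- sum(t1 * lambda1 / (1 + phi * t1 * lambda1))
  i2 <- sum(t2 * lambda2 / (1 + phi * t2 * lambda2))
  1 / (1 / i1 + 1 / i2)
}
# e.g., 339 subjects per group, common exposure time 1, lambda1 = 1.05, lambda2 = 1.4, phi = 0.5
infoLevel(rep(1, 339), rep(1, 339), 1.05, 1.4, 0.5)   # approx. 127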

Fixed exposure time

In many applications it can be assumed that all subjects have the same exposure time during the stages of the trial, i.e., \(t_{ijk} = t\). In this case, the information at the end of the trial \({\cal I} = {\cal I}_K\) with \(n_1 = r\,n_2\) simplifies to

\[ \begin{split} {\cal I} = & \frac{\sum_{j = 1}^{n_1} \frac{t\,\lambda_1}{1 + \phi\,t\,\lambda_1} \sum_{j = 1}^{n_2} \frac{t\,\lambda_2}{1 + \phi\,t\,\lambda_2}} {\sum_{j = 1}^{n_1} \frac{t\,\lambda_1}{1 + \phi\,t\,\lambda_1} + \sum_{j = 1}^{n_2} \frac{t\,\lambda_2}{1 + \phi\,t\,\lambda_2}} = \frac{n_2^2 \frac{r\,t\,\lambda_1}{1 + \phi\,t\,\lambda_1} \frac{t\,\lambda_2}{1 + \phi\,t\,\lambda_2}} {n_2 (\frac{r\,t\,\lambda_1}{1 + \phi\,t\,\lambda_1} + \frac{t\,\lambda_2}{1 + \phi\,t\,\lambda_2})} = \\[2mm] &n_2\,\frac{r\,t^2\lambda_1 \lambda_2} {r\,t\,\lambda_1 + r\,t\,\lambda_1 \phi\,t\,\lambda_2 + t\,\lambda_2 + t\,\lambda_2 \phi\,t\,\lambda_1} = \\[2mm] &n_2\,\frac{1}{ \frac{r\,\lambda_1 + r\,\lambda_1 \phi\,t\,\lambda_2 + \lambda_2 + \lambda_2 \phi\,t\,\lambda_1} {r\,t\,\lambda_1 \lambda_2}} = n_2\;\left(\frac{1}{t} (\frac{1}{\lambda_2} + \frac{1}{r\,\lambda_1}) + \phi\,(1 + \frac{1}{r})\right)^{-1} \;. \end{split} \]
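A quick numerical check of this simplification (own sketch; the parameter values are arbitrary):

# General information formula with common exposure time t vs. the closed form
lambda1 <- 1.05; lambda2 <- 1.4; phi <- 0.5; t <- 1; r <- 1; n2 <- 339
general <- 1 / (1 / (r * n2 * t * lambda1 / (1 + phi * t * lambda1)) +
                1 / (n2 * t * lambda2 / (1 + phi * t * lambda2)))
closed <- n2 / (1 / t * (1 / lambda2 + 1 / (r * lambda1)) + phi * (1 + 1 / r))
all.equal(general, closed)   # TRUE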

Sample size calculation

Setting \[ {\cal I} = \frac{(\Phi^{-1}(1-\alpha) + \Phi^{-1}(1-\beta))^2}{(\log\frac{\lambda_1}{\lambda_2} - \log(\delta_0))^2} =: \frac{(z_{1 - \alpha} + z_{1 - \beta})^2}{(\log\frac{\lambda_1}{\lambda_2} - \log(\delta_0))^2} \] provides the sample size formula \[ n_2 = \frac{(z_{1 - \alpha} + z_{1 - \beta})^2}{(\log\frac{\lambda_1}{\lambda_2} - \log(\delta_0))^2} \;\left(\frac{1}{t} (\frac{1}{\lambda_2} + \frac{1}{r\,\lambda_1}) + \phi\,(1 + \frac{1}{r})\right) \] given in Friede and Schmidli (2010) which is the same as approach 2 of Zhu and Lakkis (2014).

In the two-sided case (only possible for \(\delta_0 = 1\)), \(\alpha\) is replaced by \(\alpha/2\).
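As a sketch, the formula can be evaluated directly in R. The values below correspond to the fixed exposure time example used later in the slides (one-sided alpha = 0.025, beta = 0.1, lambda1 = 1.05, lambda2 = 1.4, phi = 0.5, t = 1, r = 1, delta0 = 1):

alpha <- 0.025; beta <- 0.1
lambda1 <- 1.05; lambda2 <- 1.4; phi <- 0.5; t <- 1; r <- 1; delta0 <- 1
n2 <- (qnorm(1 - alpha) + qnorm(1 - beta))^2 / (log(lambda1 / lambda2) - log(delta0))^2 *
  (1 / t * (1 / lambda2 + 1 / (r * lambda1)) + phi * (1 + 1 / r))
ceiling(n2 * (1 + r))   # total sample size, 678; cf. example1 below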

Group sequential design

  • The test statistic is the Wald statistic based on the difference of the rates on the log scale.
  • As shown in Mütze et al. (2019), if maximum likelihood estimates are used to estimate the true parameters, the sequence of Wald statistics has the independent and normally distributed increments property (asymptotically).

Maximum sample size

The aim is to calculate the maximum sample size, \(N\), in a group sequential setting.

Fixed exposure time

If a group sequential design with information rates \(I = (\tau_1,\tau_2,\ldots,1)\) and inflation factor \(IF > 1\) is calculated, the maximum cumulative sample size is

\[ N = IF\!\cdot\!n_2(1 + r) \] and \[ N_1 = \frac{r}{1 + r}\!\cdot\! N \;\text{, } \;N_2 = \frac{1}{1 + r}\!\cdot\! N \;. \]
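A minimal sketch of this calculation, using the inflation factor provided by rpact's getDesignCharacteristics(); the design and the rates (lambda1 = 0.2, lambda2 = 0.3, phi = 1, t = 12, r = 1, alpha = 0.025, beta = 0.2) are those of example6 later in the slides:

library(rpact)
design <- getDesignGroupSequential(informationRates = c(0.4, 0.7, 1),
    beta = 0.2, typeOfDesign = "asOF", typeBetaSpending = "bsOF",
    bindingFutility = TRUE)
IF <- getDesignCharacteristics(design)$inflationFactor
# fixed-sample n2 from the formula above
n2 <- (qnorm(0.975) + qnorm(0.8))^2 / log(0.2 / 0.3)^2 *
  (1 / 12 * (1 / 0.3 + 1 / 0.2) + 1 * (1 + 1))
2 * ceiling(IF * n2)   # maximum cumulative sample size N; cf. example6 (276)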

Variable exposure time

If subjects entering the study have different exposure times, typically an accrual period is followed by an additional follow-up period. If subjects enter the study during an accrual period \([0;\, a]\) and the total study time is \(t_{max}\), the time under exposure at calendar time \(t_k\) for subject \(j\) in treatment \(i\) at stage \(k\) of the trial is

\[ t_{ijk} = t_{k} - a_{ij}\; \]

with \(t_K = t_{max}\).

The maximum and stage-wise sample sizes for the group sequential design are obtained as the smallest \(N_{2k}\) (and \(N_{1k} = r\cdot N_{2k}\)) for which

\[ \left(\frac{1}{ \sum_{j = 1}^{r\cdot N_{2k}} \frac{t_{1jk}\lambda_1}{1 + \phi\,t_{1jk}\lambda_1}} + \frac{1}{ \sum_{j = 1}^{N_{2k}}\frac{t_{2jk}\lambda_2}{1 + \phi\,t_{2jk}\lambda_2}}\right)^{-1} = IF\!\cdot\! \frac{(z_{1 - \alpha} + z_{1 - \beta})^2}{(\log\frac{\lambda_1}{\lambda_2} - \log(\delta_0))^2} \]

with \(t_{ijk} = 0\) for \(t_{k} < a_{ij}\).
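A possible implementation of this search as a simple loop (own sketch; equal allocation, uniform accrual over the accrual period, and the parameter values of example8 later in the slides are assumed):

library(rpact)
design <- getDesignGroupSequential(informationRates = c(0.4, 0.7, 1),
    beta = 0.2, typeOfDesign = "asOF", typeBetaSpending = "bsOF",
    bindingFutility = TRUE)
IF <- getDesignCharacteristics(design)$inflationFactor
lambda1 <- 0.0875; lambda2 <- 0.125; phi <- 5
a <- 1.25; tmax <- 4   # accrual period and total study time
target <- IF * (qnorm(0.975) + qnorm(0.8))^2 / log(lambda1 / lambda2)^2
infoFinal <- function(N2) {
  aij <- seq(0, a, length.out = N2)   # uniform accrual times
  tK <- tmax - aij                    # exposure at the final analysis
  i1 <- sum(tK * lambda1 / (1 + phi * tK * lambda1))
  i2 <- sum(tK * lambda2 / (1 + phi * tK * lambda2))
  1 / (1 / i1 + 1 / i2)
}
N2 <- 2
while (infoFinal(N2) < target) N2 <- N2 + 1
N2   # close to the per-group sample size of example8 (about 1041)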

Observation times

If an accrual time is specified, the calendar times at which the interim analyses have to be conducted in order to reach the information levels specified for the stages are determined numerically by finding the time points \(t_k\) with

\[ t_{ijk} = \max\{t_{k} - a_{ij},\,0\}\; \]

through

\[ \left(\frac{1}{ \sum_{j = 1}^{r\cdot N_{2k}} \frac{t_{1jk}\lambda_1}{1 + \phi\,t_{1jk}\lambda_1}} + \frac{1}{ \sum_{j = 1}^{N_{2k}}\frac{t_{2jk}\lambda_2}{1 + \phi\,t_{2jk}\lambda_2}}\right)^{-1} = I \!\cdot\! IF\!\cdot\! \frac{(z_{1 - \alpha} + z_{1 - \beta})^2}{(\log\frac{\lambda_1}{\lambda_2} - \log(\delta_0))^2}\;. \]

\(\hspace{1cm}\)

For fixed exposure time, \(t\),

\[ t_{ijk} = \max\{ \min\{t_{k} - a_{ij}, t\}, 0\}\; . \]
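A sketch of this root-finding step with uniroot() (own code; equal allocation, uniform accrual, and the numbers of example7 later in the slides, i.e., 138 subjects per group, lambda1 = 0.2, lambda2 = 0.3, phi = 1, t = 12, a = 6):

lambda1 <- 0.2; lambda2 <- 0.3; phi <- 1; t <- 12; a <- 6; N2 <- 138
infoAt <- function(tk) {
  aij <- seq(0, a, length.out = N2)       # uniform accrual times
  texp <- pmax(pmin(tk - aij, t), 0)      # t_ijk = max(min(t_k - a_ij, t), 0)
  i1 <- sum(texp * lambda1 / (1 + phi * texp * lambda1))
  i2 <- sum(texp * lambda2 / (1 + phi * texp * lambda2))
  1 / (1 / i1 + 1 / i2)
}
Imax <- infoAt(a + t)   # information at the end of the study
uniroot(function(tk) infoAt(tk) - 0.4 * Imax, c(0.1, a + t))$root
# approx. 5.1, close to example7$calendarTime[1]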

Optimum allocation ratio

For fixed exposure time and given \(\lambda_1\) and \(\lambda_2\), the optimum sample size allocation ratio \(r\) (minimizing the overall sample size at given power) is given by

\[ r = \sqrt{\frac{\frac{1}{t\,\lambda_1} + \phi}{\frac{1}{t\,\lambda_2} + \phi}} \] which is found by minimizing

\[ n_2(1 + r) = c \;\left(\frac{1}{t} \frac{(r\,\lambda_1 + \lambda_2)(1 + r)}{r\,\lambda_1\lambda_2} + \phi\,\frac{(1 + r)^2}{r}\right) \] where \(c\) is a constant.

\(\hspace{1cm}\)

For variable exposure time, the optimum allocation ratio is found by a numerical search algorithm.
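A quick check of the closed-form optimum against a numerical minimization (own sketch; the parameter values are arbitrary, and the constant \(c\) cancels):

lambda1 <- 1.05; lambda2 <- 1.4; phi <- 0.5; t <- 1
total <- function(r)
  (1 / t * (1 / lambda2 + 1 / (r * lambda1)) + phi * (1 + 1 / r)) * (1 + r)
optimize(total, c(0.1, 10))$minimum                           # numerical optimum
sqrt((1 / (t * lambda1) + phi) / (1 / (t * lambda2) + phi))   # closed form, approx. 1.09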

Power calculation

At given \(\lambda_1\) and \(\lambda_2\), the non-centrality parameter is

\[ \xi = (\log(\lambda_1/\lambda_2) - \log(\delta_0)) / \sqrt{V}\;, \]

where \(V\) is the variance term derived for the considered situation, that is,

\[ V = (1 + r)\left(\frac{1}{t} (\frac{1}{\lambda_2} + \frac{1}{r\,\lambda_1}) + \phi\,(1 + \frac{1}{r})\right) \]

when assuming a fixed exposure time.

\(\hspace{1cm}\)

When assuming variable exposure times, for a given maximum number of subjects
\(N\),

\[ V = N \left(\frac{1}{ \sum_{j = 1}^{N_1} \frac{t_{1jk}\lambda_1}{1 + \phi\,t_{1jk}\lambda_1}} + \frac{1}{ \sum_{j = 1}^{N_2}\frac{t_{2jk}\lambda_2}{1 + \phi\,t_{2jk}\lambda_2}}\right). \]

Power calculation

Given \(N\), the group sequential multiple integral is calculated by setting
\[ \zeta_k = \xi \sqrt{\tau_k\!\cdot\!N} \;,\;k=1,\ldots,K, \] where \(\tau_k\) denotes the information rate at stage \(k\).
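For the fixed design with fixed exposure time, this reduces to a one-line power formula (own sketch; the numbers match the getPowerCounts() example shown later, N = 678, lambda1 = 1.05, lambda2 = 1.4, phi = 0.5, t = 1, one-sided alpha = 0.025):

alpha <- 0.025; N <- 678; r <- 1
lambda1 <- 1.05; lambda2 <- 1.4; phi <- 0.5; t <- 1; delta0 <- 1
V <- (1 + r) * (1 / t * (1 / lambda2 + 1 / (r * lambda1)) + phi * (1 + 1 / r))
xi <- (log(lambda1 / lambda2) - log(delta0)) / sqrt(V)
pnorm(abs(xi) * sqrt(N) - qnorm(1 - alpha))   # approx. 0.9004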

R packages for count data:

gscounts: Design and analysis of group sequential designs for negative binomial outcomes, as described by T Mütze, E Glimm, H Schmidli, T Friede (2018)

Version 1.0.4, published Nov 2021

Reference manual: cran.r-project.org/web/packages/gscounts/gscounts.pdf

Vignette: cran.r-project.org/web/packages/gscounts/vignettes/Introduction_gscounts.html

\(\hspace{1cm}\)

NBDesign: Design and monitoring of clinical trials with negative binomial endpoint

Version 2.0.0, published Sep 2020 by Luo Xiaodong

Reference manual: cran.r-project.org/web/packages/NBDesign/NBDesign.pdf

rpact Functions for Count Data

getSampleSizeCounts()

performs sample size and power calculations for count data designs.

You can specify

  • either \(\lambda_1\) and \(\lambda_2\), or \(\lambda_2\) and \(\theta\), or \(\lambda\) and \(\theta\) (blinded sample size reassessment!)
  • \(\lambda_1\) and \(\theta\) can be vectors
  • different ways of calculation: fixed exposure time, accrual and study time, or accrual and fixed number of patients
  • staggered patient entry
  • group sequential or fixed sample size setting

Usage

getSampleSizeCounts(
    design = NULL,
    ...,
    lambda1 = NA_real_,
    lambda2 = NA_real_,
    lambda = NA_real_,
    theta = NA_real_,
    thetaH0 = 1,
    overdispersion = 0,
    fixedExposureTime = NA_real_,
    accrualTime = NA_real_,
    accrualIntensity = NA_real_,
    followUpTime = NA_real_,
    maxNumberOfSubjects = NA_real_,
    allocationRatioPlanned = NA_real_)

Examples

Fixed exposure time

  • Each subject is observed for the same length of time, or the endpoint is the event rate in a time interval.
  • A new combination therapy is assumed to decrease the exacerbation rate from 1.4 to 1.05 (a 25% decrease) within an observation period of 1 year, i.e., each subject has an equal follow-up of 1 year.
  • What is the sample size that yields 90% power for detecting such a difference if the overdispersion is assumed to be equal to 0.5?

Solution

example1 <- getSampleSizeCounts(
beta = 0.1,
lambda2 = 1.4,
theta = 0.75,
overdispersion = 0.5,
fixedExposureTime = 1)
names(example1)
 [1] "thetaH0"                        "groups"                         "allocationRatioPlanned"         "optimumAllocationRatio"         "directionUpper"                 "lambda1"                        "lambda2"                        "lambda"                         "theta"                          "nFixed"                         "nFixed1"                        "nFixed2"                        "maxNumberOfSubjects"            "maxNumberOfSubjects1"           "maxNumberOfSubjects2"          
[16] "overallReject"                  "rejectPerStage"                 "futilityStop"                   "futilityPerStage"               "earlyStop"                      "overdispersion"                 "fixedExposureTime"              "accrualTime"                    "accrualIntensity"               "followUpTime"                   "calendarTime"                   "expectedStudyDurationH1"        "studyTime"                      "numberOfSubjects"               "expectedNumberOfSubjectsH1"    
[31] "informationOverStages"          "expectedInformationH0"          "expectedInformationH01"         "expectedInformationH1"          "criticalValuesEffectScale"      "criticalValuesEffectScaleLower" "criticalValuesEffectScaleUpper" "criticalValuesPValueScale"      "futilityBoundsEffectScale"      "futilityBoundsEffectScaleLower" "futilityBoundsEffectScaleUpper" "futilityBoundsPValueScale"      "maxInformation"                
example1$nFixed
[1] 678
example1$maxInformation
[1] 126.9611

You can also use the function design_nb() from the gscounts package to perform this calculation:

library(gscounts)
out_fix <- design_nb(rate1 = 1.4, rate2 = 0.75*1.4, dispersion = 0.5, 
        power = 0.9, sig_level = 0.025, followup_max = 1)
names(out_fix)
 [1] "rate1"        "rate2"        "dispersion"   "power"        "ratio_H0"     "sig_level"    "n"            "n1"           "n2"           "max_info"     "followup_max"
out_fix$n
[1] 678

getPowerCounts()

performs the power calculation at a given sample size:

getPowerCounts(
lambda2 = 1.4,
theta = 0.75,
overdispersion = 0.5,
fixedExposureTime = 1,
directionUpper = FALSE,
maxNumberOfSubjects = example1$nFixed
) |> summary()

Power calculation for a count data endpoint

Fixed sample analysis, one-sided significance level 2.5%. The results were calculated for a two-sample test for count data, H0: lambda(1) / lambda(2) = 1, power directed towards smaller values, H1: effect = 0.75, lambda(2) = 1.4, number of subjects = 678, overdispersion = 0.5, fixed exposure time = 1.

Stage Fixed
Efficacy boundary (z-value scale) 1.960
Power 0.9004
Lambda(1) 1.050
One-sided local significance level 0.0250

Usage in blinded sample size reestimation

Superiority: Table 1 from Friede and Schmidli (2010)

example2 <- getSampleSizeCounts(
alpha = 0.025,
beta = 0.2,
lambda = 1,
theta = 0.7,
overdispersion = 0.4,
fixedExposureTime = 1)

example2$nFixed2
[1] 177

Noninferiority: Table 2 from Friede and Schmidli (2010)

example3 <- getSampleSizeCounts(
alpha = 0.025,
beta = 0.2,
lambda = 1,
theta = 1,
thetaH0 = 1.15,
overdispersion = 0.4,
fixedExposureTime = 1)
example3$nFixed2
[1] 1126

Example of Zhu and Lakkis (2014) Table 1, Method 2

example4 <- getSampleSizeCounts(
alpha = 0.025,
beta = 0.2,
lambda2 = 0.8,
theta = 0.85,
overdispersion = 0.4,
fixedExposureTime = 0.75)
example4$nFixed2
[1] 1316

Example of Zhu and Lakkis (2014) Table 2, Method 2

example5 <- getSampleSizeCounts(
alpha = 0.025,
beta = 0.2,
lambda2 = 2,
theta = 0.5,
overdispersion = 1,
allocationRatioPlanned = 2/3,
fixedExposureTime = 1)
example5$nFixed
[1] 124

Design with interim stages

design <- getDesignGroupSequential(
    informationRates = c(0.4, 0.7, 1),
    beta = 0.2,
    typeOfDesign = "asOF",
    typeBetaSpending = "bsOF",
    bindingFutility = TRUE
)

plot(design)

Fixed exposure time: Specify fixedExposureTime

example6 <- getSampleSizeCounts(
design = design,    
lambda1 = 0.2,
lambda2 = 0.3,
overdispersion = 1,
fixedExposureTime = 12)
example6$maxNumberOfSubjects
[1] 276
example6$informationOverStages
         [,1]
[1,] 20.38332
[2,] 35.67081
[3,] 50.95829
getDesignCharacteristics(design)$shift / log(0.2 / 0.3)^2
[1] 50.95829
c(0.4, 0.7, 1) * getDesignCharacteristics(design)$shift / log(0.2 / 0.3)^2
[1] 20.38332 35.67081 50.95829

Estimate observation time under uniform recruitment

Suppose patients are recruited uniformly over 6 months: accrualTime = 6

example7 <- getSampleSizeCounts(
design = design,    
lambda1 = 0.2,
lambda2 = 0.3,
overdispersion = 1,
allocationRatioPlanned = 1,
fixedExposureTime = 12,
accrualTime = 6)
example7$numberOfSubjects
     [,1]
[1,]  234
[2,]  276
[3,]  276
example7$calendarTime
          [,1]
[1,]  5.105111
[2,]  7.806379
[3,] 18.000000

Use gscounts: accrual_period = 6

library(gscounts)
out <- design_gsnb(rate1 = 0.2, rate2 = 0.3, 
        dispersion = 1, ratio_H0 = 1, 
        power = 0.8, 
        sig_level = 0.025, 
        timing = c(0.4, 0.7, 1), 
        esf = obrien, 
        followup_max = 12, 
        random_ratio = 1,
        accrual_period = 6,
        esf_futility = obrien,
        futility = "binding")
out$n
[1] 274
out$calendar
[1]  5.121567  7.851587 18.000000
out$max_info
[1] 50.84536

gscounts uses integer sample sizes and recalculates information and calendar times. This explains the slight differences.

Sample size for variable exposure time with uniform recruitment

Specify followUpTime

example8 <- getSampleSizeCounts(
design = design,    
lambda1 = 0.0875,
lambda2 = 0.125,
overdispersion = 5,
allocationRatioPlanned = 1,
followUpTime = 2.75,
accrualTime = 1.25)
example8$calendarTime
         [,1]
[1,] 1.332914
[2,] 2.206128
[3,] 4.000000
example8$numberOfSubjects
     [,1]
[1,] 2082
[2,] 2082
[3,] 2082

Use gscounts: specify study_period

out <- design_gsnb(rate1 = 0.0875, rate2 = 0.125, dispersion = 5, 
        ratio_H0 = 1, power = 0.8, sig_level = 0.025, 
        timing = c(0.4, 0.7, 1), esf = obrien, study_period = 4, 
        accrual_period = 1.25, random_ratio = 1, 
        futility = "binding", esf_futility = obrien)
out$n1
[1] 1040
out$n2
[1] 1040
out$calendar
[1] 1.333422 2.207740 4.000000

Given recruitment times and non-uniform recruitment

  • In getSampleSizeCounts(), you can specify a vector accrualTime together with accrualIntensity, or specify maxNumberOfSubjects and determine the study time.

  • In gscounts, this is handled through the specification of the study entry times (t_recruit1 and t_recruit2).

Example

What is the study duration if patient recruitment is performed over 1 year instead of 1.25 years, i.e., if 4/5 * 2080 = 1664 subjects are recruited?

example9 <- getSampleSizeCounts(
design = design,    
lambda1 = 0.0875,
lambda2 = 0.125,
overdispersion = 5,
accrualTime = 1,
maxNumberOfSubjects = 1664)
example9$calendarTime
         [,1]
[1,] 1.429221
[2,] 2.932854
[3,] 7.884515

Use gscounts:

recruit <- seq(0, 1, length.out = 1664 / 2)
out <- design_gsnb(rate1 = 0.0875, rate2 = 0.125, dispersion = 5, 
        power = 0.8, sig_level = 0.025, 
        timing = c(0.4, 0.7, 1), esf = obrien,
        futility = "binding", esf_futility = obrien,
        t_recruit1 = recruit, t_recruit2 = recruit)
out$calendar
[1] 1.429222 2.932858 7.884545

accrualIntensity: number of subjects per time unit

example10 <- getSampleSizeCounts(
design = design,    
lambda1 = 0.0875,
lambda2 = 0.125,
overdispersion = 5,
accrualTime = c(0, 0.5, 1),
accrualIntensity = c(1664, 1664) 
)
example10$calendarTime
         [,1]
[1,] 1.429352
[2,] 2.932992
[3,] 7.884660

Further topics

  • Data analysis: use the function getInformation() and proceed as described in the vignette by Tobias
  • Simulation function getSimulationCounts(): In progress
  • Effect size estimation: TBD
  • Adaptive design analysis using combination tests, e.g., unblinded SSR: TBD
  • Semiparametric models