
Group sequential design with the simulation trio
Purpose
This vignette demonstrates the three simulation functions working
together on a real phase 3 trial: simdata_fast() generates
the trial data, analysis_fast() performs the interim and
final analyses, and simsummary_fast() aggregates the
operating characteristics. We reproduce the group sequential design of
the innovaTV 301 trial and check that the simulated operating
characteristics match the closed-form values from gsDesign and rpact.
Simulation is not meant to replace the closed-form calculation under proportional hazards, where gsDesign and rpact are exact. The point is that once the simulation agrees with them on a case they can handle, the same machinery can be trusted for cases they cannot handle, such as non-proportional hazards or data-dependent analysis timing. The benchmark code is shown but not executed when the vignette is built, because the simulation uses many replicates; the printed results come from an interactive run.
The innovaTV 301 trial
innovaTV 301 (ENGOT-cx12/GOG-3057) was a phase 3, open-label trial of tisotumab vedotin versus the investigator’s choice of chemotherapy in patients with recurrent or metastatic cervical cancer (Vergote et al., 2024). The primary end point was overall survival, with patients randomly assigned in a 1:1 ratio.
The design enrolled approximately 482 patients and had 90% power at 336 total deaths, with one prespecified interim efficacy analysis at 75% of information (252 of 336 events). The overall two-sided type I error was controlled at 5% using a Lan-DeMets spending function of the O’Brien-Fleming type (Lan & DeMets, 1983; O’Brien & Fleming, 1979). For planning we assume exponentially distributed overall survival with a median of 12.9 months in the tisotumab vedotin group and 9.0 months in the chemotherapy group (a hazard ratio of about 0.70), accrual over 23 months, and a 5% annual dropout rate in each group.
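As a rough plausibility check on these planning numbers, the sketch below applies the Schoenfeld approximation for a fixed (single-look) design with 1:1 allocation. It uses base R only; the variable names are illustrative and not part of the package. The fixed-design count comes out somewhat below the planned 336 events, which is the direction one would expect once an interim look is added, although the source does not state how 336 was derived.

```r
# Planning assumptions quoted in the text above.
median_trt <- 12.9        # months, tisotumab vedotin
median_ctl <- 9.0         # months, chemotherapy
alpha_two_sided <- 0.05
power <- 0.90

# Under exponential survival the hazard ratio is the inverse ratio of medians.
hr <- median_ctl / median_trt

# Schoenfeld approximation: events for a fixed design with 1:1 allocation.
events_fixed <- 4 * (qnorm(1 - alpha_two_sided / 2) + qnorm(power))^2 / log(hr)^2

round(hr, 3)           # about 0.70
ceiling(events_fixed)  # fixed-design events, before any group sequential inflation
```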
Simulating the trial
simdata_fast() generates the survival, censoring, and
entry times for all replicates in one fused C++ pass. We specify the two
group sizes, the accrual window, the per-group event medians, and a
per-group dropout hazard corresponding to 5% per 12 months.
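The conversions behind these inputs can be sketched in base R. Under an exponential model a median m corresponds to a hazard of log(2) / m, and a 5% dropout probability over 12 months corresponds to a monthly hazard of -log(1 - 0.05) / 12. The variable names here are illustrative; this is not the simdata_fast() call itself.

```r
# Exponential event hazards (per month) implied by the planning medians.
median_trt <- 12.9
median_ctl <- 9.0
haz_trt <- log(2) / median_trt
haz_ctl <- log(2) / median_ctl

# Monthly dropout hazard that yields 5% dropout over 12 months.
drop_haz <- -log(1 - 0.05) / 12

# Sanity check: probability of dropping out within 12 months.
drop_12m <- 1 - exp(-drop_haz * 12)

round(c(haz_trt = haz_trt, haz_ctl = haz_ctl, drop_haz = drop_haz), 4)
round(drop_12m, 3)
```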
Interim and final analyses
analysis_fast() runs the event-driven looks. We analyze
at 252 and 336 events, take the chemotherapy group as the control, and
compute both the log-rank and the Cox statistics with a two-sided test,
matching the trial.
res <- analysis_fast(
  df, control = 2,
  event.looks = c(252, 336),
  stat = c("logrank", "coxph"), side = 2
)
Spending boundaries
The efficacy boundary is the Lan-DeMets O’Brien-Fleming spending
function at the planned information fractions. We obtain it from
gsDesign and convert the upper Z boundaries to two-sided nominal p-value
boundaries, which is the scale simsummary_fast() consumes
through its p.col argument.
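As an independent cross-check on those boundaries, the sketch below recomputes them in base R without gsDesign. It swaps in direct numerical integration for gsDesign's routines and treats each side of the two-sided test separately at a one-sided level of 0.025, ignoring the negligible chance of crossing the opposite boundary first. The names spend_of, cross_final, and spend_alpha_check are hypothetical. Under these assumptions the result should land close to the nominal p-values of 0.0193 and 0.0442 reported in the output of this vignette.

```r
alpha_one_sided <- 0.025
t1 <- 252 / 336  # information fraction at the interim

# Lan-DeMets O'Brien-Fleming-type spending function (one-sided cumulative alpha).
spend_of <- function(t, alpha) 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))

# Interim: all alpha spent so far is spent at this single look.
a1 <- spend_of(t1, alpha_one_sided)
z1 <- qnorm(1 - a1)

# Final: P(Z1 < z1, Z2 >= z2) must equal the remaining alpha,
# where Z2 = sqrt(t1) * Z1 + sqrt(1 - t1) * W with W independent standard normal.
cross_final <- function(z2) {
  integrand <- function(z) {
    dnorm(z) * pnorm((z2 - sqrt(t1) * z) / sqrt(1 - t1), lower.tail = FALSE)
  }
  integrate(integrand, lower = -Inf, upper = z1)$value
}
z2 <- uniroot(
  function(z) cross_final(z) - (alpha_one_sided - a1),
  interval = c(1, 4), tol = 1e-8
)$root

# Convert the upper Z boundaries to two-sided nominal p-values.
spend_alpha_check <- 2 * pnorm(-c(z1, z2))
round(spend_alpha_check, 4)
```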
Operating characteristics
simsummary_fast() applies the nominal p-value boundaries
to the simulated log-rank p-values and aggregates the crossing
probabilities, expected events, expected sample size, and expected
analysis time across the looks.
simsummary_fast(
  res,
  p.col = "logrank.p",
  alpha = spend_alpha,
  direction = "lower"
)
The interactive run produces the following output.
Group-Sequential Operating Characteristics (simsummary_fast)
  Simulations: 10000
  Boundaries:  nominal p-value on 'logrank.p'

Stopping Boundaries: Look by Look
  Look  Info. Frac.  Events (s)  Sample (n)  Nominal p  Cum. Cross. Eff.
     1         0.75       252.0       482.0     0.0193            0.6881
     2         1.00       336.0       482.0     0.0442            0.9000

Overall
  Rejection rate (efficacy):       0.9000
  Expected events at stop:         278.2
  Expected sample size at stop:    482.0
  Expected analysis time at stop:  27.31
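The expected number of events at stop can be reproduced by hand from the look-by-look crossing probabilities. The sketch below assumes, as in this design, that the only reason to stop early is crossing the efficacy boundary at the interim, so every trial that does not cross there continues to the final look. Variable names are illustrative.

```r
events_at_look <- c(252, 336)
cum_cross <- c(0.6881, 0.9000)  # cumulative efficacy crossing probabilities from the output

# Probability of stopping at each look: stop at the interim if the boundary
# is crossed there, otherwise continue to the final analysis.
p_stop <- c(cum_cross[1], 1 - cum_cross[1])

expected_events <- sum(events_at_look * p_stop)
round(expected_events, 1)  # 278.2, matching the printed value
```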
Comparison with the closed-form design
For an independent closed-form reference we recompute the same design with rpact and read off the operating characteristics. The full rpact output is long, so we extract only the quantities needed for the comparison.
library(rpact)
design <- getDesignGroupSequential(
  kMax = 2,
  alpha = 0.05,
  beta = 0.1,
  sided = 2,
  typeOfDesign = "asOF",
  informationRates = c(252, 336) / 336
)
results <- getPowerSurvival(
  design,
  maxNumberOfEvents = 336,
  median1 = 12.9,
  median2 = 9.0,
  maxNumberOfSubjects = 482,
  accrualTime = c(0, 23),
  # rpact takes the proportion dropping out by dropoutTime, not a hazard
  dropoutRate1 = 0.05,
  dropoutRate2 = 0.05,
  dropoutTime = 12,
  allocationRatioPlanned = 1
)
The simulated and closed-form operating characteristics agree closely. The efficacy boundaries are the spending boundaries fed into the simulation, so they match by construction; the crossing probabilities, expected events, and expected timing are estimated independently by simulation and line up with the analytic values.
| Quantity | FastSurvival (10,000 sims) | gsDesign / rpact |
|---|---|---|
| Interim efficacy boundary (two-sided nominal p) | 0.0193 | 0.0193 |
| Final efficacy boundary (two-sided nominal p) | 0.0442 | 0.0442 |
| Probability of crossing at the interim | 0.688 | 0.698 |
| Overall power | 0.900 | 0.905 |
| Expected number of events at stop | 278 | 277 |
| Expected analysis time at stop (months) | 27.3 | 27.3 |
The remaining differences are of the size expected from Monte Carlo error
with 10,000 replicates and shrink as nsim increases.
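To put a scale on that Monte Carlo error, the sketch below computes the binomial standard error of each simulated probability and expresses the gap to the closed-form value in standard-error units. It uses only the numbers printed above; the variable names are illustrative. Under these inputs the gaps are roughly two standard errors or less, which is compatible with sampling noise but worth rechecking with a larger nsim.

```r
nsim <- 10000
sim <- c(interim = 0.6881, power = 0.9000)  # simulated estimates
ref <- c(interim = 0.698,  power = 0.905)   # closed-form reference values

# Binomial Monte Carlo standard error of each simulated proportion.
mc_se <- sqrt(sim * (1 - sim) / nsim)

# Gap between simulation and reference, in standard-error units.
z_gap <- (sim - ref) / mc_se

round(mc_se, 4)
round(z_gap, 1)
```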
Beyond proportional hazards
The value of the simulation trio is that nothing in the workflow
assumes proportional hazards. To study a delayed treatment effect,
replace the constant event hazard with a piecewise specification through
e.hazard and e.time and keep everything else the same. The log-rank
statistic loses power under a delayed effect, and a weighted or max-combo
statistic (Lin et al., 2020) can be substituted at the analysis_fast()
step to recover some of it. Because gsDesign and rpact cannot evaluate
these cases in closed form, the validated simulation machinery becomes
the tool of choice.
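Before running such a scenario it helps to see what a piecewise specification implies. The base R sketch below assumes the hazards apply on the intervals [0, 3) and [3, Inf) months, with the treatment hazard equal to the control hazard of 0.077 for the first 3 months and 0.045 afterwards; this reading of the interval convention is an assumption, and cum_haz and median_pw are hypothetical helpers, not package functions.

```r
# Assumed piecewise-constant hazards (per month) on [0, 3) and [3, Inf).
cuts <- c(0, 3)
haz_trt <- c(0.077, 0.045)  # no effect for 3 months, then a reduced hazard
haz_ctl <- c(0.077, 0.077)  # constant hazard, median about 9 months

# Cumulative hazard at a single time t for a piecewise-constant hazard.
cum_haz <- function(t, cuts, haz) {
  upper <- c(cuts[-1], Inf)
  exposure <- pmax(0, pmin(t, upper) - cuts)  # time spent in each interval
  sum(haz * exposure)
}

# The median solves cum_haz(t) = log(2).
median_pw <- function(cuts, haz) {
  uniroot(function(t) cum_haz(t, cuts, haz) - log(2),
          interval = c(0, 100), tol = 1e-8)$root
}

med_trt <- median_pw(cuts, haz_trt)
med_ctl <- median_pw(cuts, haz_ctl)
hr_by_interval <- haz_trt / haz_ctl

round(c(med_trt = med_trt, med_ctl = med_ctl), 2)
round(hr_by_interval, 2)  # hazard ratio before and after month 3
```

Under these assumptions the hazard ratio is 1 during the delay and about 0.58 afterwards, so the early events carry no treatment signal, which is why an unweighted log-rank test loses power.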
df_delay <- simdata_fast(
  nsim = 10000,
  n = c(241, 241),
  a.time = c(0, 23),
  a.prop = 1,
  e.hazard = list(c(0.077, 0.045), c(0.077, 0.077)),
  e.time = c(0, 3, Inf),
  d.hazard = list(-log(1 - 0.05) / 12, -log(1 - 0.05) / 12),
  seed = 1
)
res_delay <- analysis_fast(
  df_delay, control = 2,
  event.looks = c(252, 336),
  stat = "maxcombo", side = 2
)
References
Vergote, I., González-Martín, A., Fujiwara, K., et al. (2024). Tisotumab vedotin as second- or third-line therapy for recurrent cervical cancer. New England Journal of Medicine, 391(1), 44-55.
O’Brien, P. C., & Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics, 35(3), 549-556.
Lan, K. K. G., & DeMets, D. L. (1983). Discrete sequential boundaries for clinical trials. Biometrika, 70(3), 659-663.
Lin, R. S., Lin, J., Roychoudhury, S., et al. (2020). Alternative analysis methods for time to event endpoints under nonproportional hazards: a comparative analysis. Statistics in Biopharmaceutical Research, 12(2), 187-198.