Fast Simulation of Two-Group Time-to-Event Trial Data

Simulates time-to-event trial data for one or two groups across many simulated trials, with piecewise accrual, piecewise-exponential survival and dropout, and optional subgroups defined by a prevalence specification. The entire generation pipeline (accrual, survival, dropout, derived columns, and two-group interleaving) runs in a single C++ kernel that materializes the output data frame once, avoiding intermediate R-level vector operations and copies. The random-number stream is consumed in the same order as a per-group reference implementation, so results are reproducible from a given seed.

Usage

simdata_fast(
  nsim = 1000,
  n,
  alloc = c(1, 1),
  a.time,
  a.rate = NULL,
  a.prop = NULL,
  e.hazard = NULL,
  e.median = NULL,
  e.time = NULL,
  d.hazard = NULL,
  d.median = NULL,
  d.time = NULL,
  seed = NULL,
  prevalence = NULL,
  fixed.alloc = FALSE,
  h01.hazard = NULL,
  h01.median = NULL,
  h01.time = NULL,
  h02.hazard = NULL,
  h02.median = NULL,
  h02.time = NULL,
  h12.hazard = NULL,
  h12.median = NULL,
  h12.time = NULL,
  switch.prop = NULL,
  h12.switch.hazard = NULL,
  h12.switch.median = NULL,
  h12.switch.time = NULL,
  switch.clock = "reset"
)

Arguments

nsim: Number of simulated trials.
n: Either a single total sample size (split by alloc), a length-two vector of per-group sample sizes, or, for a multi-arm trial, a vector of length greater than two giving the per-arm sample sizes (which requires a per-arm e.hazard or e.median list). When n is a per-arm vector, alloc is ignored.
alloc: A length-two allocation ratio, used when n is scalar.
a.time: A numeric vector of accrual-interval breakpoints.
a.rate: Absolute accrual rates (subjects per unit time), interpreted according to their length. With length(a.time) - 1 rates, the accrual period is fully specified and the rates must accrue exactly sum(n) subjects (an inconsistent total is an error). With length(a.time) rates, the final rate applies to an open last interval whose end time is computed so that the total is sum(n). Supply exactly one of a.rate and a.prop.
a.prop: Accrual proportions, one per accrual interval (length length(a.time) - 1), giving the fraction of subjects enrolled in each interval. Values are normalized to sum to one and distribute the fixed total sum(n). Unlike a.rate this carries no rate, so the accrual period must be fully specified by a.time. Supply exactly one of a.rate and a.prop.
e.hazard: Survival hazard(s). A scalar or vector for one group, or a two-element list for two groups; per-cell lists are used with subgroups.
e.median: Survival median(s); an alternative to e.hazard.
e.time: Survival breakpoints for piecewise hazards (last element Inf).
d.hazard: Dropout hazard(s), same structure as e.hazard.
d.median: Dropout median(s); an alternative to d.hazard.
d.time: Dropout breakpoints for piecewise hazards (last element Inf).
seed: Optional integer seed for the dqrng generator.
prevalence: Optional subgroup prevalence specification (numeric vector, list of vectors, array, or a named control/treatment list for group-specific prevalence).
fixed.alloc: Logical; when TRUE subgroup sizes are deterministic rather than drawn.
h01.hazard: Transition hazard(s) for the non-terminal (intermediate) event (state 0 to state 1) in the illness-death model. A scalar or vector for one group, or a two-element list for two groups. Supplying any of h01.* or h02.* activates the illness-death model with two correlated endpoints, and is mutually exclusive with e.hazard / e.median.
h01.median: Median(s) for the intermediate event; an alternative to h01.hazard.
h01.time: Breakpoints for a piecewise h01.hazard (last element Inf).
h02.hazard: Transition hazard(s) for the terminal event without a prior intermediate event (state 0 to state 2). Same group and piecewise conventions as h01.hazard.
h02.median: Median(s) for the direct terminal event; an alternative to h02.hazard.
h02.time: Breakpoints for a piecewise h02.hazard.
h12.hazard: Transition hazard(s) for the terminal event after an intermediate event (state 1 to state 2) for subjects who do not switch. Defaults to h02.hazard, which gives the Fleischer maximal-independence model (Fleischer Theorem 1 when there is no switching).
h12.median: Median(s) for the post-event terminal event; an alternative to h12.hazard.
h12.time: Breakpoints for a piecewise h12.hazard, measured from the intermediate-event time (clock-reset).
switch.prop: Probability that a subject with an intermediate event switches treatment, a scalar or a two-element list (control, treatment). Defaults to zero (no switching); the treatment group is typically left at zero.
h12.switch.hazard: Transition hazard(s) from state 1 to state 2 for subjects who switch. Required when any switch.prop is positive.
h12.switch.median: Median(s) for the post-switch terminal event; an alternative to h12.switch.hazard.
h12.switch.time: Breakpoints for a piecewise h12.switch.hazard, measured from the switch (intermediate-event) time (clock-reset).
switch.clock: Time origin for the post-event hazards. Currently only "reset" (measured from the intermediate event) is implemented.

Value

A data.frame with nsim * sum(n) rows, one per subject per simulated trial. The columns are sim, group, any subgroup columns, accrual_time, surv_time, dropout_time, tte, event, and calendar_time. In the illness-death model the columns are instead sim, group, accrual_time, e1_surv_time, e2_surv_time, dropout_time, e1_tte, e1_event, e2_tte, e2_event, e1_calendar_time, e2_calendar_time, intermediate, switched, and switch_time, where e1 is the first (state-0 exit) endpoint and e2 is the terminal endpoint. In an oncology trial, for example, e1 corresponds to progression-free survival, e2 to overall survival, and intermediate flags progression.
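
The relationship between the transition hazards and the e1 and e2 endpoints can be illustrated with a small base R sketch. It is not the package kernel: it assumes constant hazards, omits dropout, uses base R's generator rather than dqrng, and draws variates in an arbitrary order, so it will not reproduce simdata_fast output. The parameter values, the lower-case variable names, and the use of NA for switch_time in non-switchers are all assumptions made for illustration.

```r
set.seed(5)                      # base R RNG, for illustration only
n <- 10000

# Hypothetical constant transition hazards and switching probability
h01 <- 0.10                      # state 0 -> 1 (intermediate event)
h02 <- 0.02                      # state 0 -> 2 (terminal, no intermediate event)
h12 <- 0.05                      # state 1 -> 2 for non-switchers
h12_switch  <- 0.03              # state 1 -> 2 for switchers
switch_prop <- 0.40

t01 <- rexp(n, h01)              # latent time to the intermediate event
t02 <- rexp(n, h02)              # latent time to the direct terminal event
intermediate <- t01 < t02        # intermediate event happens first

# Only subjects with an intermediate event can switch
switched <- intermediate & (runif(n) < switch_prop)

# Clock-reset: the post-event time is measured from the intermediate event
t12 <- rexp(n, ifelse(switched, h12_switch, h12))

e1_surv_time <- pmin(t01, t02)                        # first exit from state 0
e2_surv_time <- ifelse(intermediate, t01 + t12, t02)  # terminal event
switch_time  <- ifelse(switched, t01, NA_real_)       # assumed NA when no switch
```

Under these assumptions the share of subjects with an intermediate event is close to h01 / (h01 + h02), and e2_surv_time is never earlier than e1_surv_time, which is the source of the correlation between the two endpoints.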

Details

For each subject the observed time-to-event is tte = pmin(surv_time, dropout_time), and event is 1 when the survival time occurs first and 0 when dropout censors it. The calendar time at which follow-up ends (event or dropout) is calendar_time = accrual_time + tte.
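
The derived columns can be reproduced by hand on a toy data set. The sketch below uses base R's generator and made-up parameters, so it only mirrors the definitions above and not the package's random stream. Treating an exact tie as an event is an assumption; ties have probability zero for continuous times.

```r
set.seed(1)                                   # base R RNG, for illustration only
n <- 8

accrual_time <- sort(runif(n, 0, 12))         # hypothetical enrollment times
surv_time    <- rexp(n, rate = log(2) / 10)   # latent event times, median 10
dropout_time <- rexp(n, rate = 0.01)          # latent dropout times

tte           <- pmin(surv_time, dropout_time)           # observed time
event         <- as.integer(surv_time <= dropout_time)   # 1 = event, 0 = censored
calendar_time <- accrual_time + tte                      # time since trial start

toy <- data.frame(accrual_time, surv_time, dropout_time,
                  tte, event, calendar_time)
```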

The total enrolled is fixed at sum(n). With a.rate the rates are absolute (subjects per unit time): when the accrual period is fully specified the rates must accrue exactly sum(n), and when one extra rate is given the end of the final interval is solved so the total is met. With a.prop the values are relative proportions that distribute sum(n) across the fully specified intervals. Each accrual interval receives a deterministic number of subjects (the rate or proportion times the group total, rounded to keep the per-group total exact), placed uniformly within the interval.
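
The two accrual specifications can be worked through in base R for a single group. The helper alloc_counts is hypothetical, and its largest-remainder rounding is only one way to keep the total exact; the rounding rule used by the package is not stated here. The open-interval calculation follows directly from requiring the rates to accrue the fixed total.

```r
# Hypothetical helper: split a fixed total across intervals by proportion,
# using largest-remainder rounding so the counts sum exactly to `total`.
alloc_counts <- function(total, prop) {
  prop  <- prop / sum(prop)            # normalize to sum to one
  raw   <- total * prop
  count <- floor(raw)
  short <- total - sum(count)          # subjects still to place
  if (short > 0) {
    bump <- order(raw - count, decreasing = TRUE)[seq_len(short)]
    count[bump] <- count[bump] + 1
  }
  count
}

# a.prop style: fully specified intervals, relative weights
a.time <- c(0, 6, 12, 18)
a.prop <- c(1, 2, 2)
counts <- alloc_counts(101, a.prop)

# Place each interval's subjects uniformly within that interval
set.seed(2)                            # base R RNG, for illustration only
accrual_time <- unlist(Map(function(k, lo, hi) runif(k, lo, hi),
                           counts, head(a.time, -1), tail(a.time, -1)))

# a.rate style with an open last interval: one rate per breakpoint
brk    <- c(0, 6, 12)
rate   <- c(2, 4, 8)
total  <- 100
closed <- sum(head(rate, -1) * diff(brk))               # 2*6 + 4*6 = 36
end    <- tail(brk, 1) + (total - closed) / tail(rate, 1)  # 12 + 64/8 = 20
```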

Survival and dropout are exponential when a single hazard (or median) is supplied and piecewise-exponential when a vector is supplied together with the corresponding e.time or d.time breakpoints, whose last element must be Inf. Group-specific parameters are supplied as a two-element list (control first, treatment second).
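
A piecewise-exponential draw can be sketched in base R by inverting the cumulative hazard. The function rpwexp_sketch and its breaks argument are hypothetical: breaks is assumed to start at 0 and end at Inf with one hazard per interval, which may differ from the exact layout that e.time and d.time expect. For a single exponential, a median m corresponds to the hazard log(2) / m.

```r
# Hypothetical helper: draw n times from a piecewise-exponential distribution.
# hazard[j] applies on [breaks[j], breaks[j + 1]).
rpwexp_sketch <- function(n, hazard, breaks) {
  stopifnot(length(hazard) == length(breaks) - 1,
            breaks[1] == 0,
            is.infinite(breaks[length(breaks)]),
            all(hazard > 0))
  lo     <- head(breaks, -1)                 # left edge of each interval
  width  <- diff(breaks)                     # last width is Inf
  # Cumulative hazard at each left edge (the final Inf entry is dropped)
  cumhaz <- c(0, cumsum(hazard * width))[seq_along(hazard)]
  target <- rexp(n)                          # H(T) is unit exponential
  j      <- findInterval(target, cumhaz)     # interval where H reaches target
  lo[j] + (target - cumhaz[j]) / hazard[j]
}

set.seed(3)                                  # base R RNG, for illustration only

# Hazard 0.10 up to time 5, then 0.02 afterwards
x <- rpwexp_sketch(20000, hazard = c(0.10, 0.02), breaks = c(0, 5, Inf))

# Single exponential specified through a median of 10
y <- rpwexp_sketch(20000, hazard = log(2) / 10, breaks = c(0, Inf))
```

With these inputs the proportion of x beyond time 5 should be near exp(-0.5), and the sample median of y should be near 10.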

When prevalence is supplied the trial has subgroups. A numeric vector defines a single factor; a list of numeric vectors defines several independent factors; a multi-dimensional array defines the joint distribution of correlated factors. Per-cell hazards may be supplied as a list with one element per cell. With fixed.alloc = TRUE the subgroup sizes are deterministic; otherwise subgroup membership is drawn from the prevalence distribution.
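
The difference between drawn and deterministic subgroup membership, and the use of an array for correlated factors, can be illustrated in base R. The level names, prevalences, and group size below are made up, and the prevalences are chosen so that the deterministic sizes are whole numbers; how the package rounds in other cases is not covered by this sketch.

```r
set.seed(4)                                   # base R RNG, for illustration only
n_group <- 200

# Single factor with three levels
prev <- c(low = 0.5, mid = 0.3, high = 0.2)

# Random membership: each subject's level is drawn from the prevalences
drawn <- sample(names(prev), n_group, replace = TRUE, prob = prev)

# Deterministic membership: subgroup sizes fixed in advance
fixed_n <- round(n_group * prev)
fixed   <- rep(names(prev), times = fixed_n)

# Two correlated factors described by a joint prevalence array
joint <- matrix(c(0.4, 0.1,
                  0.2, 0.3),
                nrow = 2, byrow = TRUE,
                dimnames = list(biomarker = c("neg", "pos"),
                                region    = c("A", "B")))

# Draw a cell of the array, then map it back to one level per factor
cell <- sample(length(joint), n_group, replace = TRUE, prob = as.vector(joint))
idx  <- arrayInd(cell, dim(joint))
subgroups <- data.frame(biomarker = rownames(joint)[idx[, 1]],
                        region    = colnames(joint)[idx[, 2]])
```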

When n is a vector of length greater than two together with a per-arm survival list, the simulation is a multi-arm trial. Each arm is generated in turn with the validated single-group kernel over a common accrual window, and the arms are stacked into one data frame with a group column labeled 1 to length(n) in the order of n. Per-arm survival is supplied as an e.hazard or e.median list with one element per arm, and optional dropout as a shared value or a per-arm list through d.hazard or d.median. The arms share the master seed, so the result is reproducible. A multi-arm design is analyzed as a set of pairwise contrasts by subsetting the output to the control arm and one other arm and calling analysis_fast once per contrast. Multi-arm mode does not support subgroups or the illness-death model, which remain two-group.
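
The pairwise subsetting step can be sketched on a toy data frame that stands in for a three-arm result. The data are made up, and treating arm 1 as the control is an assumption. The call to analysis_fast is left out because its arguments are not described here, and whether it needs the group labels recoded for each contrast is not stated.

```r
# Toy stand-in for a three-arm output: 2 simulated trials, 2 subjects per arm
toy <- data.frame(
  sim   = rep(1:2, each = 6),
  group = rep(rep(1:3, each = 2), times = 2),
  tte   = c(5.1, 7.3, 6.0, 9.4, 8.2, 11.5, 4.8, 6.9, 7.7, 8.8, 9.9, 12.1),
  event = c(1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0)
)

control <- 1                                   # assumed control arm
others  <- setdiff(sort(unique(toy$group)), control)

# One data frame per contrast: the control arm plus one other arm
contrasts <- lapply(others, function(arm) {
  toy[toy$group %in% c(control, arm), ]
})
names(contrasts) <- paste0("arm", others, "_vs_arm", control)

# Each element would then be passed to the analysis step, once per contrast
rows_per_contrast <- vapply(contrasts, nrow, integer(1))
```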

Examples