Validation of FastSurvival • FastSurvival

Purpose

FastSurvival is built for speed, but a fast estimator is only useful if it returns the same answer as the established implementation. This vignette checks the numerical agreement between each FastSurvival function and a reference, on a real clinical-trial dataset. We use the gbsg data from the survival package, the German Breast Cancer Study Group cohort of 686 patients, with recurrence-free survival time rfstime (in days), the event indicator status, and the treatment indicator hormon (0 = no hormonal therapy, 1 = hormonal therapy). Throughout we take the no-hormone arm as the control (control = 0) and report one-sided tests of treatment benefit (side = 1).

The references are: the survival package for the Kaplan-Meier estimate, the log-rank test, the Cox hazard ratio, milestone survival, and the median survival comparison; survRM2 for the restricted mean survival time; survAH for the average hazard with survival weight; nph for the weighted log-rank test and the max-combo test; and nphRCT for the robust modestly-weighted test. The window mean survival time and the weighted Kaplan-Meier statistic, which have no direct package counterpart on CRAN, are checked against a Kaplan-Meier integral computed from survfit() and against the restricted mean survival time identity, respectively. The same comparisons form the basis of the automated test suite shipped with the package.

library(FastSurvival)

Reference data

library(survival)

# German Breast Cancer Study Group cohort
str(gbsg[, c("rfstime", "status", "hormon")])
#> 'data.frame':    686 obs. of  3 variables:
#>  $ rfstime: int  1838 403 1603 177 1855 842 293 42 564 1093 ...
#>  $ status : int  0 1 0 0 0 1 1 0 1 1 ...
#>  $ hormon : int  0 0 0 0 1 0 0 1 1 0 ...
table(hormon = gbsg$hormon)
#> hormon
#>   0   1 
#> 440 246

Kaplan-Meier survival

survfit_fast() evaluates the Kaplan-Meier estimate at a single time point. We compare against summary(survfit(...)) at the same time (1000 days).

# survfit_fast assumes time-sorted input; the C++ core groups tied times in the
# survival::survfit risk-set convention, so the data only need ordering by time.
ord  <- order(gbsg$rfstime)
fast <- survfit_fast(gbsg$rfstime[ord], gbsg$status[ord],
                     t_eval = 1000, conf.type = "log-log")

fit  <- survfit(Surv(rfstime, status) ~ 1, data = gbsg)
ref  <- summary(fit, times = 1000)

data.frame(
  quantity = c("survival", "std.err"),
  fast     = c(unclass(fast)["surv"], unclass(fast)["std.err"]),
  survival = c(ref$surv, ref$std.err),
  row.names = NULL
)
#>   quantity       fast   survival
#> 1 survival 0.65775040 0.65775040
#> 2  std.err 0.01905146 0.01905146

Log-rank test

survdiff_fast() with side = 1 returns a signed Z-score. Its square is the chi-square statistic from survdiff().

fast_lr <- survdiff_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                         control = 0, side = 1)

ref_lr  <- survdiff(Surv(rfstime, status) ~ hormon, data = gbsg)

c(fast = as.numeric(fast_lr)^2, survival = ref_lr$chisq)
#>     fast survival 
#> 8.564781 8.564781

Weighted log-rank test

survdiff_fast() also computes Fleming-Harrington G(rho, gamma) weighted log-rank tests through the weight = "fh" argument. The nph package provides logrank.test() with the same family, so we compare the chi-square statistic (the squared one-sided Z) across several weight choices.

fh_grid <- data.frame(rho = c(0, 1, 0, 1), gamma = c(1, 0, 0, 1))

do.call(rbind, lapply(seq_len(nrow(fh_grid)), function(i) {
  r <- fh_grid$rho[i]
  g <- fh_grid$gamma[i]
  fast <- as.numeric(survdiff_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                                   control = 0, side = 1,
                                   weight = "fh", rho = r, gamma = g))
  nph_chisq <- nph::logrank.test(gbsg$rfstime, gbsg$status, gbsg$hormon,
                                 rho = r, gamma = g)$test$Chisq
  data.frame(rho = r, gamma = g, fast = fast^2, nph = nph_chisq)
}))
#>   rho gamma     fast      nph
#> 1   0     1 5.110660 5.110660
#> 2   1     0 8.713791 8.713791
#> 3   0     0 8.564781 8.564781
#> 4   1     1 5.881310 5.881310

The two implementations agree to numerical precision across the weight family, including the ordinary log-rank test recovered at rho = 0, gamma = 0.

Cox hazard ratio

coxph_fast() returns the Pike-Halley Estimator, a closed-form approximation to the Cox partial likelihood maximizer. We report the log hazard ratio from both; the sign convention is the same and side does not affect the point estimate.

fast_hr <- coxph_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                      control = 0, side = 1)

ref_cox <- coxph(Surv(rfstime, status) ~ hormon, data = gbsg)

c(fast = unclass(fast_hr)["coef"], cox = unname(coef(ref_cox)))
#>  fast.coef        cox 
#> -0.3638987 -0.3640099

Because the Pike-Halley Estimator is a closed-form approximation rather than the exact partial-likelihood maximizer, the two log hazard ratios are not expected to be identical. They differ slightly, here in the fourth decimal place, which reflects the approximation and is not a sign of error.

Restricted mean survival time

rmst_fast() integrates the Kaplan-Meier survival curve up to a horizon. We compare the two-group RMST difference against survRM2::rmst2() at 1000 days.

library(survRM2)

tau <- 1000

fast_rmst <- rmst_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                       control = 0, tau = tau, side = 1)

ref_rmst  <- rmst2(time = gbsg$rfstime, status = gbsg$status,
                   arm = gbsg$hormon, tau = tau)

c(fast = unclass(fast_rmst)["diff"],
  survRM2 = ref_rmst$unadjusted.result[1, 1])
#> fast.diff   survRM2 
#>  51.14447  51.14447

Window mean survival time

wmst_fast() integrates the Kaplan-Meier curve between a lower and an upper window limit, generalizing the restricted mean survival time. CRAN has no dedicated WMST package, so we validate the per-group windowed area directly against a Kaplan-Meier step-function integral computed from survfit() over the same window. A full comparison against the survWMST package, which is distributed on GitHub, is provided in tools/compare_wmst_survwmst.R.

tau1 <- 200
tau2 <- 1000

fast_wm <- wmst_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                     control = 0, side = 1, tau1 = tau1, tau2 = tau2)

# Integral of the Kaplan-Meier step function over [lo, hi] for one group.
km_window_area <- function(gi, lo, hi) {
  sf <- survfit(Surv(rfstime, status) ~ 1, data = gbsg[gbsg$hormon == gi, ])
  tk <- c(0, sf$time)
  sk <- c(1, sf$surv)
  brk  <- sort(unique(c(lo, hi, sf$time[sf$time > lo & sf$time < hi])))
  area <- 0
  for (b in seq_len(length(brk) - 1L)) {
    u  <- brk[b]
    su <- sk[max(which(tk <= u))]
    area <- area + su * (brk[b + 1L] - u)
  }
  area
}

data.frame(
  group    = c("control", "treatment"),
  fast     = unclass(fast_wm)[c("wmst.control", "wmst.treatment")],
  survfit  = c(km_window_area(0, tau1, tau2), km_window_area(1, tau1, tau2)),
  row.names = NULL
)
#>       group     fast  survfit
#> 1   control 628.0064 628.0064
#> 2 treatment 678.3393 678.3393

Weighted Kaplan-Meier test

wkm_fast() computes the weighted Kaplan-Meier (Pepe-Fleming) test, the weighted integral of the difference between the two Kaplan-Meier curves. With the constant weight the weighted difference reduces exactly to the difference in restricted mean survival time over the observed range, which gives a clean internal check against rmst_fast() evaluated at the largest observed time.

tmax <- max(gbsg$rfstime)

fast_wk <- wkm_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                    control = 0, side = 1, weight = "constant")

ref_rm  <- rmst_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                     control = 0, tau = tmax, side = 1)

c(wkm.constant = unclass(fast_wk)["wdiff"], rmst = unclass(ref_rm)["diff"])
#> wkm.constant.wdiff          rmst.diff 
#>           256.7207           256.7207

With the default Pepe-Fleming weight, wkm_fast() reproduces the weighted Kaplan-Meier statistic of the nphsim package. Because nphsim is distributed only on GitHub, that comparison is kept in tools/compare_wkm_nphsim.R (which also bundles a survival-only reproduction of the statistic) and in the test suite, rather than in this vignette.

Milestone survival

milestone_fast() compares Kaplan-Meier survival between two groups at a milestone timepoint. The per-group survival probabilities match those from survfit().

tstar <- 1000

fast_ms <- milestone_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                          control = 0, tau = tstar, method = "loglog",
                          side = 1)

fit_g <- survfit(Surv(rfstime, status) ~ hormon, data = gbsg)
ref_g <- summary(fit_g, times = tstar)

data.frame(
  group    = c("control", "treatment"),
  fast     = fast_ms$surv[c("control", "treatment")],
  survival = ref_g$surv,
  row.names = NULL
)
#>       group      fast  survival
#> 1   control 0.6208762 0.6208762
#> 2 treatment 0.7230082 0.7230082

Median survival time

medsurv_fast() estimates the Kaplan-Meier median survival time and, for two groups, their difference. The point estimate is the Kaplan-Meier median (the first time at which the product-limit curve reaches 0.5), the same convention as survfit(), so the per-group medians match those reported by survfit() exactly.

fast_med <- medsurv_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                         control = 0, side = 1, method = "nph")

fit_med <- survfit(Surv(rfstime, status) ~ hormon, data = gbsg)
med_ref <- summary(fit_med)$table[, "median"]

data.frame(
  group    = c("control", "treatment"),
  fast     = unclass(fast_med)[c("median.control", "median.treatment")],
  survival = c(med_ref[1], med_ref[2]),
  row.names = NULL
)
#>       group fast survival
#> 1   control 1528     1528
#> 2 treatment 2018     2018

We use survfit() as the reference here, rather than nph::nphparams(), because the two define the median on different survival curves: survfit() and medsurv_fast() use the Kaplan-Meier (product-limit) median, whereas nph::nphparams() reads the median off the Nelson-Aalen curve, S(t) = exp(-H(t)), which lies above the Kaplan-Meier curve and so can cross 0.5 at a later time on tied data. The method = "nph" standard error reproduces the nph::nphparams() standard error to numerical precision when the two medians coincide; that comparison is run on tie-free scenarios in tools/compare_medsurv_nphparams.R and in the package test suite.

Average hazard with survival weight

ahsw_fast() computes the average hazard with survival weight of Uno and Horiguchi. We compare the per-group average hazard against survAH::ah2().

library(survAH)

tau <- 1000

fast_ah <- ahsw_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                     control = 0, tau = tau, side = 1)

ref_ah  <- ah2(time = gbsg$rfstime, status = gbsg$status,
               arm = gbsg$hormon, tau = tau)

data.frame(
  quantity = c("AH (control)", "AH (treatment)"),
  fast     = unclass(fast_ah)[c("ah.ctrl", "ah.trt")],
  survAH   = c(ref_ah$ah["AH (arm0)", "Est."],
               ref_ah$ah["AH (arm1)", "Est."]),
  row.names = NULL
)
#>         quantity         fast       survAH
#> 1   AH (control) 0.0004585857 0.0004585857
#> 2 AH (treatment) 0.0003155277 0.0003155277

The average hazard is the ratio of the cumulative event probability to the restricted mean survival time, so for the gbsg data, where follow-up is measured in days, the values are on the order of 1e-04 per day in both groups.

Average hazard ratio

ahr_fast() computes the Kalbfleisch-Prentice average hazard ratio between two groups over a restricted interval. The reference implementation is ahrKM() from the AHR package, but that package has been archived on CRAN, so here we check the point estimates against a direct survival-based computation: each group’s Kaplan-Meier curve from survfit(), integrated to form the group shares of the total hazard.

tau <- 1000

fast_ahr <- ahr_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                     control = 0, tau = tau, side = 1)

g  <- gbsg$hormon
ev <- c(gbsg$rfstime[g == 0 & gbsg$status == 1],
        gbsg$rfstime[g == 1 & gbsg$status == 1])
grid <- sort(unique(c(0, ev[ev <= tau], tau)))

km_on_grid <- function(gi) {
  sf <- survfit(Surv(rfstime, status) ~ 1, data = gbsg[g == gi, ])
  approxfun(sf$time, sf$surv, method = "constant",
            yleft = 1, rule = 2, f = 0)(grid)
}

S0  <- km_on_grid(0)
S1  <- km_on_grid(1)
m   <- length(grid)
dS0 <- S0 - c(1, S0[-m])
GL  <- S0[m] * S1[m]
ref_theta_ctrl <- -sum(S1 * dS0) / (1 - GL)

data.frame(
  quantity  = c("theta (control)", "theta (treatment)", "AHR"),
  fast      = c(fast_ahr$theta[[1]], fast_ahr$theta[[2]], fast_ahr$ahr),
  reference = c(ref_theta_ctrl, 1 - ref_theta_ctrl,
                (1 - ref_theta_ctrl) / ref_theta_ctrl),
  row.names = NULL
)
#>            quantity      fast reference
#> 1   theta (control) 0.5974108 0.5974108
#> 2 theta (treatment) 0.4025892 0.4025892
#> 3               AHR 0.6738899 0.6738899

The point estimates match the survival-based reference to numerical precision. The average hazard ratio was additionally cross-checked against the archived AHR package, the reference implementation used by Dormuth et al. (2024): the two group shares, the average hazard ratio, the variances, and both test statistics agree to within the order of 1e-15. That external comparison is reproducible with the tools/compare_ahr_ahrKM.R script shipped in the package sources.

Max-combo test

maxcombo_fast() computes the max-combo test, the most extreme of a set of Fleming-Harrington weighted log-rank statistics. The nph package provides logrank.maxtest(), whose default weight set is FH(0, 0), FH(0, 1), and FH(1, 0). We request the same three weights from maxcombo_fast() and compare the component Z-scores. The individual Z-scores are kept in the z attribute, signed so that a negative value indicates benefit for the treatment group under the package convention. Because nph orients the contrast in the opposite direction, the signs are mirrored, so we compare absolute values.

rho   <- c(0, 0, 1)
gamma <- c(0, 1, 0)

fast_mc <- maxcombo_fast(gbsg$rfstime, gbsg$status, gbsg$hormon,
                         control = 0, side = 1, rho = rho, gamma = gamma)

nph_mc <- nph::logrank.maxtest(gbsg$rfstime, gbsg$status, gbsg$hormon)

data.frame(
  weight = c("FH(0,0)", "FH(0,1)", "FH(1,0)"),
  fast   = abs(attr(fast_mc, "z")),
  nph    = abs(nph_mc$tests$z),
  row.names = NULL
)
#>    weight     fast      nph
#> 1 FH(0,0) 2.926565 2.926565
#> 2 FH(0,1) 2.260677 2.260677
#> 3 FH(1,0) 2.951913 2.951913

The component statistics agree in absolute value. The final max-combo statistic is the largest of these, which for the one-sided test is the most extreme component Z. The p-values from the two packages are close but not identical, since they use different numerical methods for the joint distribution (a multivariate normal integral here, a Bonferroni-style combination in nph).

Robust modestly-weighted log-rank test

rmw_fast() computes the robust modestly-weighted (rMW) test of Magirr and Öhrn, the maximum of the standard log-rank statistic and a single modestly-weighted log-rank statistic. Each component is a weighted log-rank test, so we check the two component Z-scores against nphRCT, the implementation used by the method’s authors. The modestly-weighted component uses the survival-threshold parameterization, so s_star = 1 recovers the standard log-rank test and s_star = 0.5 gives the modestly-weighted component. As with the max-combo test the package orients a negative Z toward treatment benefit while nphRCT uses the opposite sign, so we compare absolute values.

gbsg_df <- data.frame(
  time  = gbsg$rfstime,
  event = gbsg$status,
  arm   = factor(ifelse(gbsg$hormon == 0, "control", "experimental"),
                 levels = c("control", "experimental"))
)

fit_rmw <- rmw_fast(gbsg_df$time, gbsg_df$event, gbsg_df$arm,
                    control = "control", side = 1, s_star = 0.5)

z_lr_nph <- nphRCT::wlrt(Surv(time, event) ~ arm, data = gbsg_df,
                         method = "mw", s_star = 1)$z
z_mw_nph <- nphRCT::wlrt(Surv(time, event) ~ arm, data = gbsg_df,
                         method = "mw", s_star = 0.5)$z

data.frame(
  component = c("log-rank (s_star = 1)", "modestly-weighted (s_star = 0.5)"),
  fast      = abs(attr(fit_rmw, "z")),
  nphRCT    = abs(c(z_lr_nph, z_mw_nph)),
  row.names = NULL
)
#>                          component     fast   nphRCT
#> 1            log-rank (s_star = 1) 2.926565 2.926565
#> 2 modestly-weighted (s_star = 0.5) 2.773749 2.773749

The component statistics agree in absolute value. The null correlation of the two components, reported in the corr attribute, matches the covariance computed by the authors’ find_cor routine to numerical precision. The combined statistic and one-sided p-value then follow from the bivariate normal distribution of the two components, evaluated with mvtnorm::pmvnorm. On the POPLAR overall-survival data analyzed in the original article, rmw_fast() reproduces the published one-sided p-values of 0.0028 for the log-rank test, 0.0009 for the modestly-weighted test, and 0.0012 for the rMW test, with a component correlation of 0.97.

Summary

Across all functions the FastSurvival results reproduce the reference values on the gbsg data. The point estimates and test statistics agree to numerical precision for the Kaplan-Meier, log-rank, weighted log-rank, RMST, milestone, median, and average-hazard quantities; the window mean survival time matches a Kaplan-Meier integral and the weighted Kaplan-Meier statistic matches the RMST identity; the average hazard ratio matches a survival-based reference; and the closed-form Cox hazard ratio agrees with the partial-likelihood maximizer to the order expected for the Pike-Halley approximation. This agreement is verified continuously by the package test suite.

References

Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457-481.

Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50(3), 163-170.

Fleming, T. R., & Harrington, D. P. (1991). Counting Processes and Survival Analysis. New York: John Wiley & Sons.

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2), 187-220.

Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. Manuscript under review.

Royston, P., & Parmar, M. K. B. (2013). Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Medical Research Methodology, 13, 152.

Paukner, M., & Chappell, R. (2021). Window mean survival time. Statistics in Medicine, 40(25), 5521-5533.

Pepe, M. S., & Fleming, T. R. (1989). Weighted Kaplan-Meier statistics: a class of distance tests for censored survival data. Biometrics, 45(2), 497-507.

Pepe, M. S., & Fleming, T. R. (1991). Weighted Kaplan-Meier statistics: large sample and optimality considerations. Journal of the Royal Statistical Society. Series B (Methodological), 53(2), 341-352.

Tang, Y. (2021). Some new confidence intervals for Kaplan-Meier based estimators from one and two sample survival data. Statistics in Medicine, 40(23), 4961-4976.

Uno, H., Claggett, B., Tian, L., et al. (2014). Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. Journal of Clinical Oncology, 32(22), 2380-2385.

Uno, H., & Horiguchi, M. (2023). Ratio and difference of average hazard with survival weight: new measures to quantify survival benefit of new therapy. Statistics in Medicine, 42(7), 936-952.

Kalbfleisch, J. D., & Prentice, R. L. (1981). Estimation of the average hazard ratio. Biometrika, 68(1), 105-112.

Dormuth, I., Pauly, M., Rauch, G., & Herrmann, C. (2024). Sample size calculation under nonproportional hazards using average hazard ratios. Biometrical Journal, 66(6), e202300271.

Karrison, T. G. (2016). Versatile tests for comparing survival curves based on weighted log-rank statistics. The Stata Journal, 16(3), 678-690.

Magirr, D., & Burman, C.-F. (2019). Modestly weighted logrank tests. Statistics in Medicine, 38(20), 3782-3790.

Magirr, D., & Öhrn, F. (2026). Robust modestly weighted log-rank tests. Pharmaceutical Statistics, 25(1), e70066.