Fast Closed-Form Hazard Ratio Estimation via the Pike-Halley Estimator

Estimates the hazard ratio for a two-group parallel trial using the Pike-Halley Estimator, a pure closed-form approximation to the Cox partial likelihood maximizer. The function returns the point estimate, its standard error on the log scale, and a Wald-type confidence interval, using output names consistent with summary(survival::coxph(...)). The C++ backend accepts pooled sorted vectors directly, performing group splitting and all accumulation in a single C++ pass without intermediate R-level vector copies.

Usage

coxph_fast(
  time,
  event,
  group,
  control,
  side = 2,
  conf.level = 0.95,
  presorted = FALSE
)

Arguments

time: A numeric vector of follow-up times for all subjects (pooled over both groups).
event: An integer or numeric vector of event indicators (1 = event, 0 = censored), aligned with time.
group: A vector of group labels aligned with time. Any type that supports equality comparison is accepted.
control: A scalar value indicating which level of group represents the control group. Subjects with group != control are treated as the treatment group.
side: 1 for a one-sided test in the direction of treatment benefit (hazard ratio below 1, i.e. a negative coefficient) or 2 for a two-sided test (default 2). The reported p-value follows this choice; the confidence interval is always two-sided at conf.level.
conf.level: A single numeric value in (0, 1) specifying the confidence level for the Wald interval. Defaults to 0.95.
presorted: A logical value. If TRUE, time, event, and group are assumed to be already sorted in ascending order of time, and the internal order() call is skipped. If FALSE (default), sorting is handled internally.

Value

An object of class "coxph_fast", which is a named numeric vector of length 5 with elements matching the column names of summary(coxph(...))$coefficients and summary(coxph(...))$conf.int:

coef: Log hazard ratio log(theta_hat).
exp(coef): Hazard ratio theta_hat (point estimate).
se(coef): Standard error of coef on the log scale, equal to 1 / sqrt(I_0).
lower .95: Lower bound of the Wald confidence interval for the hazard ratio. The label reflects conf.level (e.g., "lower .90" when conf.level = 0.90).
upper .95: Upper bound of the Wald confidence interval.

Returns a vector of NA_real_ values (still with class "coxph_fast") when the estimate cannot be computed (e.g., no events, all events in one group, or I_0 = 0).

Details

Let t_k (k = 1, ..., K) denote the distinct observed event times in the pooled sample. At each t_k, let n_T_k and n_C_k be the numbers at risk in the treatment and control groups just before t_k, and let O_T_k and O_C_k be the numbers of events in each group, with n_k = n_T_k + n_C_k and O_k = O_T_k + O_C_k. Define E_T = sum n_T_k O_k / n_k and E_C = sum n_C_k O_k / n_k as the log-rank expected event totals.

The Pike-Halley Estimator is obtained in three steps. First, the Pike anchor is computed as theta_0 = (O_T E_C) / (O_C E_T). Second, the score U_0, the observed information I_0, and the third-order curvature term J_0 of the Breslow partial likelihood are evaluated at eta_0 = log(theta_0):

p_k = n_T_k theta_0 / (n_C_k + n_T_k theta_0) U_0 = sum (O_T_k - O_k p_k) I_0 = sum O_k p_k (1 - p_k) J_0 = sum O_k p_k (1 - p_k) (1 - 2 p_k)

Third, the closed-form Halley correction is applied:

delta_hat = U_0 / I_0 - J_0 U_0^2 / (2 I_0^3) theta_hat = theta_0 exp(delta_hat)

The residual error satisfies |theta_hat - theta_Cox| = O_p(n^{-3/2}), three orders of magnitude faster than the O_p(n^{-1/2}) rate of Peto and Pike, and the per-call cost is approximately thirty times lower than that of the iterative Cox solver (Homma, 2025).

The Wald standard error on the log scale is SE = 1 / sqrt(I_0), where I_0 is the observed information evaluated at the Pike anchor. This is the same quantity used in the Wald confidence interval reported by summary(coxph(...)), which is based on the observed information at the maximum likelihood estimate. Because the Pike anchor lies within O_p(n^{-1/2}) of the Cox maximum likelihood estimate, the difference between I_0 and the information at the maximum likelihood estimate is negligible for the purpose of interval construction.

The C++ core (pihe_core) accepts the pooled sorted data together with an integer group indicator and performs group splitting, at-risk counting, and per-distinct-event-time accumulation in a single left-to-right scan. This eliminates the rev(cumsum(rev(...))), tapply(), which(), diff(), and group-split vector copies present in the pure-R version.

The returned object has class "coxph_fast" and is a named numeric vector of length 5. A print() method formats the result similarly to summary(coxph(...)).

References

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2), 187-220.

Berry, G., Kitchin, R. M., & Mock, P. A. (1991). A comparison of two simple hazard ratio estimators based on the logrank test. Statistics in Medicine, 10(5), 749-755.

Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. Manuscript under review.

Examples

library(survival)

# Compare coxph_fast with coxph on the ovarian dataset.
# coxph() treats rx as numeric with rx=1 as the reference (control),
# so set control = 1 for a consistent comparison.
fit_fast <- coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1)
fit_fast
#> Pike-Halley estimator for the hazard ratio (two-group)
#> 
#>   control = 1
#>   alternative = two.sided
#> 
#> Coefficients:
#>          coef exp(coef) se(coef)      z Pr(>|z|)
#> group -0.5964    0.5508   0.5868 -1.016     0.31
#> 
#> Hazard ratio and 95% Wald confidence interval:
#>       exp(coef) exp(-coef) lower .95 upper .95
#> group    0.5508     1.8155    0.1744    1.7399

fit_cox <- summary(coxph(Surv(futime, fustat) ~ rx, data = ovarian))
cat("coxph_fast HR :", fit_fast["exp(coef)"], "\n")
#> coxph_fast HR : 0.5508019 
cat("coxph      HR :", fit_cox$coefficients[, "exp(coef)"], "\n")
#> coxph      HR : 0.5508019 

# presorted = TRUE: sort once outside, reuse inside a loop
ord <- order(ovarian$futime)
coxph_fast(ovarian$futime[ord], ovarian$fustat[ord], ovarian$rx[ord],
           control = 1, presorted = TRUE)
#> Pike-Halley estimator for the hazard ratio (two-group)
#> 
#>   control = 1
#>   alternative = two.sided
#> 
#> Coefficients:
#>          coef exp(coef) se(coef)      z Pr(>|z|)
#> group -0.5964    0.5508   0.5868 -1.016     0.31
#> 
#> Hazard ratio and 95% Wald confidence interval:
#>       exp(coef) exp(-coef) lower .95 upper .95
#> group    0.5508     1.8155    0.1744    1.7399

# \donttest{
# Speed comparison against coxph()
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  microbenchmark::microbenchmark(
    coxph_fast = coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, 2),
    coxph      = coxph(Surv(futime, fustat) ~ rx, data = ovarian),
    times = 1000
  )
}
#> Unit: microseconds
#>        expr      min       lq       mean    median       uq      max neval cld
#>  coxph_fast   36.264   43.015   56.55532   62.0775   67.085  200.218  1000  a 
#>       coxph 1314.877 1364.456 1439.01877 1383.9650 1410.935 7421.366  1000   b
# }