
Fast Closed-Form Hazard Ratio Estimation via the Pike-Halley Estimator
Source:R/coxph_fast.R
coxph_fast.RdEstimates the hazard ratio for a two-group parallel trial using the
Pike-Halley Estimator, a pure closed-form approximation to the Cox partial
likelihood maximizer. The function returns the point estimate, its standard
error on the log scale, and a Wald-type confidence interval, using output
names consistent with summary(survival::coxph(...)). The C++ backend
accepts pooled sorted vectors directly, performing group splitting and all
accumulation in a single C++ pass without intermediate R-level vector copies.
Arguments
- time
A numeric vector of follow-up times for all subjects (pooled over both groups).
- event
An integer or numeric vector of event indicators (1 = event, 0 = censored), aligned with
time.- group
A vector of group labels aligned with
time. Any type that supports equality comparison is accepted.- control
A scalar value indicating which level of
grouprepresents the control group. Subjects withgroup != controlare treated as the treatment group.- side
1 for a one-sided test in the direction of treatment benefit (hazard ratio below 1, i.e. a negative coefficient) or 2 for a two-sided test (default 2). The reported p-value follows this choice; the confidence interval is always two-sided at
conf.level.- conf.level
A single numeric value in (0, 1) specifying the confidence level for the Wald interval. Defaults to 0.95.
- presorted
A logical value. If
TRUE,time,event, andgroupare assumed to be already sorted in ascending order oftime, and the internalorder()call is skipped. IfFALSE(default), sorting is handled internally.
Value
An object of class "coxph_fast", which is a named numeric
vector of length 5 with elements matching the column names of
summary(coxph(...))$coefficients and
summary(coxph(...))$conf.int:
coefLog hazard ratio log(theta_hat).
exp(coef)Hazard ratio theta_hat (point estimate).
se(coef)Standard error of
coefon the log scale, equal to 1 / sqrt(I_0).lower .95Lower bound of the Wald confidence interval for the hazard ratio. The label reflects
conf.level(e.g.,"lower .90"whenconf.level = 0.90).upper .95Upper bound of the Wald confidence interval.
Returns a vector of NA_real_ values (still with class
"coxph_fast") when the estimate cannot be computed (e.g., no
events, all events in one group, or I_0 = 0).
Details
Let t_k (k = 1, ..., K) denote the distinct observed event times in the pooled sample. At each t_k, let n_T_k and n_C_k be the numbers at risk in the treatment and control groups just before t_k, and let O_T_k and O_C_k be the numbers of events in each group, with n_k = n_T_k + n_C_k and O_k = O_T_k + O_C_k. Define E_T = sum n_T_k O_k / n_k and E_C = sum n_C_k O_k / n_k as the log-rank expected event totals.
The Pike-Halley Estimator is obtained in three steps. First, the Pike anchor is computed as theta_0 = (O_T E_C) / (O_C E_T). Second, the score U_0, the observed information I_0, and the third-order curvature term J_0 of the Breslow partial likelihood are evaluated at eta_0 = log(theta_0):
p_k = n_T_k theta_0 / (n_C_k + n_T_k theta_0) U_0 = sum (O_T_k - O_k p_k) I_0 = sum O_k p_k (1 - p_k) J_0 = sum O_k p_k (1 - p_k) (1 - 2 p_k)
Third, the closed-form Halley correction is applied:
delta_hat = U_0 / I_0 - J_0 U_0^2 / (2 I_0^3) theta_hat = theta_0 exp(delta_hat)
The residual error satisfies |theta_hat - theta_Cox| = O_p(n^{-3/2}), three orders of magnitude faster than the O_p(n^{-1/2}) rate of Peto and Pike, and the per-call cost is approximately thirty times lower than that of the iterative Cox solver (Homma, 2025).
The Wald standard error on the log scale is SE = 1 / sqrt(I_0), where I_0
is the observed information evaluated at the Pike anchor. This is the same
quantity used in the Wald confidence interval reported by
summary(coxph(...)), which is based on the observed information at
the maximum likelihood estimate. Because the Pike anchor lies within
O_p(n^{-1/2}) of the Cox maximum likelihood estimate, the difference
between I_0 and the information at the maximum likelihood estimate is
negligible for the purpose of interval construction.
The C++ core (pihe_core) accepts the pooled sorted data together
with an integer group indicator and performs group splitting, at-risk
counting, and per-distinct-event-time accumulation in a single left-to-right
scan. This eliminates the rev(cumsum(rev(...))), tapply(),
which(), diff(), and group-split vector copies present in the
pure-R version.
The returned object has class "coxph_fast" and is a named numeric
vector of length 5. A print() method formats the result similarly
to summary(coxph(...)).
References
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2), 187-220.
Berry, G., Kitchin, R. M., & Mock, P. A. (1991). A comparison of two simple hazard ratio estimators based on the logrank test. Statistics in Medicine, 10(5), 749-755.
Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. Manuscript under review.
See also
coxph for the standard iterative Cox estimator.
print.coxph_fast for the print method.
Examples
library(survival)
# Compare coxph_fast with coxph on the ovarian dataset.
# coxph() treats rx as numeric with rx=1 as the reference (control),
# so set control = 1 for a consistent comparison.
fit_fast <- coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1)
fit_fast
#> Pike-Halley estimator for the hazard ratio (two-group)
#>
#> control = 1
#> alternative = two.sided
#>
#> Coefficients:
#> coef exp(coef) se(coef) z Pr(>|z|)
#> group -0.5964 0.5508 0.5868 -1.016 0.31
#>
#> Hazard ratio and 95% Wald confidence interval:
#> exp(coef) exp(-coef) lower .95 upper .95
#> group 0.5508 1.8155 0.1744 1.7399
fit_cox <- summary(coxph(Surv(futime, fustat) ~ rx, data = ovarian))
cat("coxph_fast HR :", fit_fast["exp(coef)"], "\n")
#> coxph_fast HR : 0.5508019
cat("coxph HR :", fit_cox$coefficients[, "exp(coef)"], "\n")
#> coxph HR : 0.5508019
# presorted = TRUE: sort once outside, reuse inside a loop
ord <- order(ovarian$futime)
coxph_fast(ovarian$futime[ord], ovarian$fustat[ord], ovarian$rx[ord],
control = 1, presorted = TRUE)
#> Pike-Halley estimator for the hazard ratio (two-group)
#>
#> control = 1
#> alternative = two.sided
#>
#> Coefficients:
#> coef exp(coef) se(coef) z Pr(>|z|)
#> group -0.5964 0.5508 0.5868 -1.016 0.31
#>
#> Hazard ratio and 95% Wald confidence interval:
#> exp(coef) exp(-coef) lower .95 upper .95
#> group 0.5508 1.8155 0.1744 1.7399
# \donttest{
library(microbenchmark)
microbenchmark(
coxph_fast = coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, 2),
coxph = coxph(Surv(futime, fustat) ~ rx, data = ovarian),
times = 1000
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> coxph_fast 39.519 47.526 63.96425 66.9795 74.916 235.420 1000 a
#> coxph 1376.146 1455.349 1562.19797 1482.5190 1523.305 9273.055 1000 b
# }