This vignette provides the detailed mathematical framework underlying
the MultiCorrSurvBinary
package. It covers correlation
bounds, random number generation methods, and statistical test
formulations used for correlated time-to-event and binary outcomes.
Both overall survival (OS) and progression-free survival (PFS) follow exponential distributions:
\[X \sim \text{Exponential}(\lambda)\]
where the probability density function is:
\[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\]
The relationship between median survival time (\(m\)) and rate parameter (\(\lambda\)) is:
\[\lambda = \frac{\ln(2)}{m}\]
Properties: - Mean: \(E[X] = \frac{1}{\lambda}\) - Variance: \(\text{Var}(X) = \frac{1}{\lambda^2}\) - Median: \(\text{Median}(X) = \frac{\ln(2)}{\lambda}\)
Objective response follows a Bernoulli distribution:
\[Y \sim \text{Bernoulli}(p)\]
where: \[P(Y = 1) = p, \quad P(Y = 0) = 1-p\]
Properties: - Mean: \(E[Y] = p\) - Variance: \(\text{Var}(Y) = p(1-p)\)
The package uses a Gaussian (normal) copula to model the dependence structure between outcomes. For \(n\) outcomes, the normal copula is defined as:
\[C_R(u_1, \ldots, u_n) = \Phi_R(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n))\]
where: - \(\Phi_R\) is the \(n\)-dimensional standard normal CDF with correlation matrix \(R\) - \(\Phi^{-1}\) is the inverse standard normal CDF - \(u_i \in [0,1]\) are uniform marginals
For three outcomes (OS, PFS, OR), the correlation matrix is:
\[R = \begin{pmatrix} 1 & \rho_{OS,PFS} & \rho_{OS,OR} \\ \rho_{OS,PFS} & 1 & \rho_{PFS,OR} \\ \rho_{OS,OR} & \rho_{PFS,OR} & 1 \end{pmatrix}\]
Positive Definiteness Constraint: All eigenvalues of \(R\) must be positive, ensuring \(R\) is a valid correlation matrix.
For any two random variables \(X\) and \(Y\) with marginal distributions \(F_X\) and \(F_Y\), the Fréchet-Hoeffding bounds define the range of possible correlations:
\[\rho_{min} \leq \rho(X,Y) \leq \rho_{max}\]
These bounds are achieved by the Fréchet-Hoeffding copulas: - Lower bound (W-copula): \(C^-(u,v) = \max(u + v - 1, 0)\) - Upper bound (M-copula): \(C^+(u,v) = \min(u,v)\)
For two exponential distributions with parameters \(\lambda_1\) and \(\lambda_2\):
The upper bound correlation is achieved when both variables use the same uniform random variable: \[X_1 = F_1^{-1}(U), \quad X_2 = F_2^{-1}(U)\]
where \(F_i^{-1}(u) = -\frac{\ln(1-u)}{\lambda_i}\) is the exponential quantile function.
The correlation is calculated as: \[\rho_{max} = \frac{E[X_1 X_2] - E[X_1]E[X_2]}{\sqrt{\text{Var}(X_1)\text{Var}(X_2)}}\]
where: \[E[X_1 X_2] = \int_0^1 F_1^{-1}(u) F_2^{-1}(u) du\]
The lower bound uses anti-monotonic dependence: \[X_1 = F_1^{-1}(U), \quad X_2 = F_2^{-1}(1-U)\]
The calculation follows similarly with: \[E[X_1 X_2] = \int_0^1 F_1^{-1}(u) F_2^{-1}(1-u) du\]
For a continuous exponential variable \(X\) and binary variable \(Y\) with \(P(Y=1) = p\):
Binary outcome equals 1 for the largest continuous values: \[Y = \mathbf{1}_{X \geq F_X^{-1}(1-p)}\]
The threshold is \(t_{upper} = -\frac{\ln(p)}{\lambda}\).
Expected product: \[E[XY] = \int_{t_{upper}}^{\infty} x \lambda e^{-\lambda x} dx = \frac{1}{\lambda}e^{-\lambda t_{upper}}\]
Upper bound correlation: \[\rho_{max} = \frac{E[XY] - E[X]E[Y]}{\sqrt{\text{Var}(X)\text{Var}(Y)}} = \frac{\frac{1}{\lambda}e^{-\lambda t_{upper}} - \frac{1}{\lambda} \cdot p}{\frac{1}{\lambda}\sqrt{p(1-p)}}\]
Binary outcome equals 1 for the smallest continuous values: \[Y = \mathbf{1}_{X \leq F_X^{-1}(p)}\]
The threshold is \(t_{lower} = -\frac{\ln(1-p)}{\lambda}\).
Expected product: \[E[XY] = \int_0^{t_{lower}} x \lambda e^{-\lambda x} dx\]
This integral evaluates to: \[E[XY] = \frac{1}{\lambda}[1 - (1 + \lambda t_{lower})e^{-\lambda t_{lower}}]\]
For a single outcome, direct sampling is used: - OS/PFS: \(X = -\frac{\ln(U)}{\lambda}\) where \(U \sim \text{Uniform}(0,1)\) - OR: \(Y = \mathbf{1}_{U \leq p}\) where \(U \sim \text{Uniform}(0,1)\)
Patient accrual times follow uniform distribution: \[T_{accrual} \sim \text{Uniform}(0, \tau)\]
where \(\tau\) is the accrual period.
Total observation times are: - \(T_{total,OS} = T_{accrual} + X_{OS}\) - \(T_{total,PFS} = T_{accrual} + X_{PFS}\)
The log-rank test compares survival distributions between treatment arms.
Test Statistic: \[\chi^2 = \frac{(O_T - E_T)^2}{V}\]
where: - \(O_T\) = observed events in treatment group - \(E_T\) = expected events in treatment group - \(V\) = variance estimate
Expected Events Calculation: At each event time \(t_j\) with \(d_j\) total events and \(n_{Tj}\), \(n_{Cj}\) at risk: \[E_{Tj} = \frac{n_{Tj}}{n_{Tj} + n_{Cj}} \cdot d_j\]
Variance Calculation: \[V = \sum_j \frac{n_{Tj} n_{Cj} d_j (n_{Tj} + n_{Cj} - d_j)}{(n_{Tj} + n_{Cj})^2 (n_{Tj} + n_{Cj} - 1)}\]
For alternative hypothesis directions:
For comparing response rates between treatment arms:
Pooled Proportion: \[\hat{p}_{pooled} = \frac{r_T + r_C}{n_T + n_C}\]
where \(r_T\), \(r_C\) are responses and \(n_T\), \(n_C\) are sample sizes.
Standard Error: \[SE = \sqrt{\hat{p}_{pooled}(1-\hat{p}_{pooled})\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}\]
Z-Statistic: \[Z = \frac{\hat{p}_T - \hat{p}_C}{SE}\]
where \(\hat{p}_T = \frac{r_T}{n_T}\) and \(\hat{p}_C = \frac{r_C}{n_C}\).
Analysis timing is determined by the prioritized outcome (OS or PFS) in the timing subgroup:
At analysis time \(t_{analysis}\):
Event Indicators: - OS: \(\delta_{OS,i} = \mathbf{1}_{T_{total,OS,i} \leq t_{analysis}}\) - PFS: \(\delta_{PFS,i} = \mathbf{1}_{T_{total,PFS,i} \leq t_{analysis}}\)
Observed Times: - OS: \(Y_{OS,i} = \min(T_{total,OS,i}, t_{analysis}) - T_{accrual,i}\) - PFS: \(Y_{PFS,i} = \min(T_{total,PFS,i}, t_{analysis}) - T_{accrual,i}\)
Fréchet-Hoeffding bounds are calculated using R’s
integrate()
function with adaptive quadrature for high
precision.
The package uses the copula
package’s
normalCopula
with dispersion structure “un” (unstructured)
for maximum flexibility.
Before simulation, the package validates: 1. Individual correlation bounds for each pair 2. Positive definiteness of correlation matrix 3. Parameter feasibility (e.g., \(0 < p < 1\) for binary outcomes)
This mathematical framework ensures rigorous statistical foundations
for all simulation and analysis procedures in the
MultiCorrSurvBinary
package.