Statistical Methods and Mathematical Framework

Gosuke Homma

2025-06-16

1 Introduction

This vignette provides the detailed mathematical framework underlying the MultiCorrSurvBinary package. It covers correlation bounds, random number generation methods, and statistical test formulations used for correlated time-to-event and binary outcomes.

2 Mathematical Framework

2.1 Marginal Distributions

2.1.1 Time-to-Event Outcomes (OS, PFS)

Both overall survival (OS) and progression-free survival (PFS) follow exponential distributions:

\[X \sim \text{Exponential}(\lambda)\]

where the probability density function is:

\[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\]

The relationship between median survival time (\(m\)) and rate parameter (\(\lambda\)) is:

\[\lambda = \frac{\ln(2)}{m}\]

Properties: - Mean: \(E[X] = \frac{1}{\lambda}\) - Variance: \(\text{Var}(X) = \frac{1}{\lambda^2}\) - Median: \(\text{Median}(X) = \frac{\ln(2)}{\lambda}\)

2.1.2 Binary Outcome (OR)

Objective response follows a Bernoulli distribution:

\[Y \sim \text{Bernoulli}(p)\]

where: \[P(Y = 1) = p, \quad P(Y = 0) = 1-p\]

Properties: - Mean: \(E[Y] = p\) - Variance: \(\text{Var}(Y) = p(1-p)\)

2.2 Correlation Structure and Copulas

2.2.1 Normal Copula

The package uses a Gaussian (normal) copula to model the dependence structure between outcomes. For \(n\) outcomes, the normal copula is defined as:

\[C_R(u_1, \ldots, u_n) = \Phi_R(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n))\]

where: - \(\Phi_R\) is the \(n\)-dimensional standard normal CDF with correlation matrix \(R\) - \(\Phi^{-1}\) is the inverse standard normal CDF - \(u_i \in [0,1]\) are uniform marginals

2.2.2 Correlation Matrix Structure

For three outcomes (OS, PFS, OR), the correlation matrix is:

\[R = \begin{pmatrix} 1 & \rho_{OS,PFS} & \rho_{OS,OR} \\ \rho_{OS,PFS} & 1 & \rho_{PFS,OR} \\ \rho_{OS,OR} & \rho_{PFS,OR} & 1 \end{pmatrix}\]

Positive Definiteness Constraint: All eigenvalues of \(R\) must be positive, ensuring \(R\) is a valid correlation matrix.

3 Fréchet-Hoeffding Bounds

3.1 Theoretical Foundation

For any two random variables \(X\) and \(Y\) with marginal distributions \(F_X\) and \(F_Y\), the Fréchet-Hoeffding bounds define the range of possible correlations:

\[\rho_{min} \leq \rho(X,Y) \leq \rho_{max}\]

These bounds are achieved by the Fréchet-Hoeffding copulas: - Lower bound (W-copula): \(C^-(u,v) = \max(u + v - 1, 0)\) - Upper bound (M-copula): \(C^+(u,v) = \min(u,v)\)

3.2 Continuous-Continuous Bounds (OS-PFS)

For two exponential distributions with parameters \(\lambda_1\) and \(\lambda_2\):

3.2.1 Upper Bound Calculation

The upper bound correlation is achieved when both variables use the same uniform random variable: \[X_1 = F_1^{-1}(U), \quad X_2 = F_2^{-1}(U)\]

where \(F_i^{-1}(u) = -\frac{\ln(1-u)}{\lambda_i}\) is the exponential quantile function.

The correlation is calculated as: \[\rho_{max} = \frac{E[X_1 X_2] - E[X_1]E[X_2]}{\sqrt{\text{Var}(X_1)\text{Var}(X_2)}}\]

where: \[E[X_1 X_2] = \int_0^1 F_1^{-1}(u) F_2^{-1}(u) du\]

3.2.2 Lower Bound Calculation

The lower bound uses anti-monotonic dependence: \[X_1 = F_1^{-1}(U), \quad X_2 = F_2^{-1}(1-U)\]

The calculation follows similarly with: \[E[X_1 X_2] = \int_0^1 F_1^{-1}(u) F_2^{-1}(1-u) du\]

3.3 Continuous-Binary Bounds (OS-OR, PFS-OR)

For a continuous exponential variable \(X\) and binary variable \(Y\) with \(P(Y=1) = p\):

3.3.1 Upper Bound (Positive Dependence)

Binary outcome equals 1 for the largest continuous values: \[Y = \mathbf{1}_{X \geq F_X^{-1}(1-p)}\]

The threshold is \(t_{upper} = -\frac{\ln(p)}{\lambda}\).

Expected product: \[E[XY] = \int_{t_{upper}}^{\infty} x \lambda e^{-\lambda x} dx = \frac{1}{\lambda}e^{-\lambda t_{upper}}\]

Upper bound correlation: \[\rho_{max} = \frac{E[XY] - E[X]E[Y]}{\sqrt{\text{Var}(X)\text{Var}(Y)}} = \frac{\frac{1}{\lambda}e^{-\lambda t_{upper}} - \frac{1}{\lambda} \cdot p}{\frac{1}{\lambda}\sqrt{p(1-p)}}\]

3.3.2 Lower Bound (Negative Dependence)

Binary outcome equals 1 for the smallest continuous values: \[Y = \mathbf{1}_{X \leq F_X^{-1}(p)}\]

The threshold is \(t_{lower} = -\frac{\ln(1-p)}{\lambda}\).

Expected product: \[E[XY] = \int_0^{t_{lower}} x \lambda e^{-\lambda x} dx\]

This integral evaluates to: \[E[XY] = \frac{1}{\lambda}[1 - (1 + \lambda t_{lower})e^{-\lambda t_{lower}}]\]

4 Random Number Generation Algorithm

4.1 Step-by-Step Process

4.1.1 Single Outcome Generation

For a single outcome, direct sampling is used: - OS/PFS: \(X = -\frac{\ln(U)}{\lambda}\) where \(U \sim \text{Uniform}(0,1)\) - OR: \(Y = \mathbf{1}_{U \leq p}\) where \(U \sim \text{Uniform}(0,1)\)

4.1.2 Multiple Correlated Outcomes

  1. Generate Correlated Uniform Variables:
    • Sample from multivariate normal: \(\mathbf{Z} \sim N_n(\mathbf{0}, R)\)
    • Transform to uniform: \(\mathbf{U} = (\Phi(Z_1), \ldots, \Phi(Z_n))\)
  2. Transform to Target Distributions:
    • OS: \(X_{OS} = -\frac{\ln(1-U_1)}{\lambda_{OS}}\)
    • PFS: \(X_{PFS} = -\frac{\ln(1-U_2)}{\lambda_{PFS}}\)
    • OR: \(Y_{OR} = \mathbf{1}_{U_3 \leq p_{OR}}\)
  3. Apply OS ≥ PFS Constraint:
    • Prioritize OS: \(X_{PFS}^* = \min(X_{PFS}, X_{OS})\)
    • Prioritize PFS: \(X_{OS}^* = \max(X_{OS}, X_{PFS})\)

4.2 Accrual Time Generation

Patient accrual times follow uniform distribution: \[T_{accrual} \sim \text{Uniform}(0, \tau)\]

where \(\tau\) is the accrual period.

Total observation times are: - \(T_{total,OS} = T_{accrual} + X_{OS}\) - \(T_{total,PFS} = T_{accrual} + X_{PFS}\)

5 Statistical Testing Framework

5.1 Survival Outcomes (OS, PFS)

5.1.1 Log-Rank Test

The log-rank test compares survival distributions between treatment arms.

Test Statistic: \[\chi^2 = \frac{(O_T - E_T)^2}{V}\]

where: - \(O_T\) = observed events in treatment group - \(E_T\) = expected events in treatment group - \(V\) = variance estimate

Expected Events Calculation: At each event time \(t_j\) with \(d_j\) total events and \(n_{Tj}\), \(n_{Cj}\) at risk: \[E_{Tj} = \frac{n_{Tj}}{n_{Tj} + n_{Cj}} \cdot d_j\]

Variance Calculation: \[V = \sum_j \frac{n_{Tj} n_{Cj} d_j (n_{Tj} + n_{Cj} - d_j)}{(n_{Tj} + n_{Cj})^2 (n_{Tj} + n_{Cj} - 1)}\]

5.1.2 One-Sided P-Values

For alternative hypothesis directions:

  • Greater (Treatment Better): \(H_1: h_T < h_C\) (lower hazard)
    • If \(O_T < E_T\): \(p = \frac{1}{2}P(\chi_1^2 > \chi^2)\)
    • If \(O_T \geq E_T\): \(p = 1 - \frac{1}{2}P(\chi_1^2 > \chi^2)\)
  • Less (Treatment Worse): \(H_1: h_T > h_C\) (higher hazard)
    • If \(O_T > E_T\): \(p = \frac{1}{2}P(\chi_1^2 > \chi^2)\)
    • If \(O_T \leq E_T\): \(p = 1 - \frac{1}{2}P(\chi_1^2 > \chi^2)\)

5.2 Binary Outcome (OR)

5.2.1 Chi-Squared Test with Pooled Proportion

For comparing response rates between treatment arms:

Pooled Proportion: \[\hat{p}_{pooled} = \frac{r_T + r_C}{n_T + n_C}\]

where \(r_T\), \(r_C\) are responses and \(n_T\), \(n_C\) are sample sizes.

Standard Error: \[SE = \sqrt{\hat{p}_{pooled}(1-\hat{p}_{pooled})\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}\]

Z-Statistic: \[Z = \frac{\hat{p}_T - \hat{p}_C}{SE}\]

where \(\hat{p}_T = \frac{r_T}{n_T}\) and \(\hat{p}_C = \frac{r_C}{n_C}\).

5.2.2 One-Sided P-Values

  • Greater: \(p = P(Z_{standard} > Z) = 1 - \Phi(Z)\)
  • Less: \(p = P(Z_{standard} < Z) = \Phi(Z)\)
  • Two-sided: \(p = 2(1 - \Phi(|Z|))\)

6 Event-Driven Analysis

6.1 Timing Determination

Analysis timing is determined by the prioritized outcome (OS or PFS) in the timing subgroup:

  1. Sort patients by total time: \(T_{total} = T_{accrual} + X_{outcome}\)
  2. Find analysis time when \(E\)-th event occurs: \(t_{analysis} = T_{total}^{(E)}\)

6.2 Data Censoring

At analysis time \(t_{analysis}\):

Event Indicators: - OS: \(\delta_{OS,i} = \mathbf{1}_{T_{total,OS,i} \leq t_{analysis}}\) - PFS: \(\delta_{PFS,i} = \mathbf{1}_{T_{total,PFS,i} \leq t_{analysis}}\)

Observed Times: - OS: \(Y_{OS,i} = \min(T_{total,OS,i}, t_{analysis}) - T_{accrual,i}\) - PFS: \(Y_{PFS,i} = \min(T_{total,PFS,i}, t_{analysis}) - T_{accrual,i}\)

7 Implementation Notes

7.1 Numerical Integration

Fréchet-Hoeffding bounds are calculated using R’s integrate() function with adaptive quadrature for high precision.

7.2 Copula Implementation

The package uses the copula package’s normalCopula with dispersion structure “un” (unstructured) for maximum flexibility.

7.3 Boundary Validation

Before simulation, the package validates: 1. Individual correlation bounds for each pair 2. Positive definiteness of correlation matrix 3. Parameter feasibility (e.g., \(0 < p < 1\) for binary outcomes)

8 References

  1. Nelsen, R.B. (2006). An Introduction to Copulas. Springer.
  2. Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall.
  3. Prentice, R.L. (1978). Linear rank tests with right censored data. Biometrika, 65(1), 167-179.
  4. Mantel, N. (1966). Evaluation of survival data and two new rank order statistics. Cancer Chemotherapy Reports, 50(3), 163-170.
library(MultiCorrSurvBinary)
library(copula)
library(survival)

This mathematical framework ensures rigorous statistical foundations for all simulation and analysis procedures in the MultiCorrSurvBinary package.