Introduction to CorOncoEndpoints

library(CorOncoEndpoints)
set.seed(123)

Overview

The CorOncoEndpoints package provides tools for generating correlated oncology endpoints in clinical trial simulations. This vignette introduces the basic functionality and common use cases.

Why CorOncoEndpoints?

In oncology clinical trials, we often need to simulate multiple correlated endpoints:

Overall Survival (OS): Time from randomization to death
Progression-Free Survival (PFS): Time from randomization to progression or death
Objective Response (Response): Binary indicator of tumor response

These endpoints are not independent—they exhibit natural correlations. For example:

Patients who respond to treatment tend to have longer survival times
PFS is always ≤ OS (death counts as a PFS event, so PFS is the earlier of progression and death)
Response is associated with both OS and PFS

CorOncoEndpoints generates realistic simulated data that preserves these correlation structures.

Basic Usage

Example 1: Generate OS and Response

Let’s start with a simple example generating correlated OS and Response data:

# Generate data for two groups
data1 <- rOncoEndpoints(
  nsim = 100,                           # 100 simulations
  group = c("Treatment", "Control"),    # Two treatment groups
  n = c(150, 150),                      # Sample size per group
  p = c(0.4, 0.3),                      # Response rates
  hazard_OS = c(0.05, 0.07),            # Hazard rates for OS
  rho_tte_resp = c(0.3, 0.2),           # Correlation between OS and Response
  copula = "Clayton"                    # Copula family
)

# View first few rows
head(data1)
#>   simID     Group         OS Response
#> 1     1 Treatment  6.7816835        0
#> 2     1 Treatment 31.0521872        0
#> 3     1 Treatment 10.5180043        1
#> 4     1 Treatment 42.9146021        1
#> 5     1 Treatment 56.4245855        0
#> 6     1 Treatment  0.9325366        0

# Check dimensions
cat("Total observations:", nrow(data1), "\n")
#> Total observations: 30000
cat("Number of simulations:", length(unique(data1$simID)), "\n")
#> Number of simulations: 100
cat("Groups:", unique(data1$Group), "\n")
#> Groups: Treatment Control

Example 2: All Three Endpoints

Now let’s generate all three endpoints (OS, PFS, Response):

data2 <- rOncoEndpoints(
  nsim = 100,
  group = c("Experimental", "Standard"),
  n = c(200, 200),
  p = c(0.5, 0.35),
  hazard_OS = c(0.04, 0.06),
  hazard_PFS = c(0.08, 0.10),           # PFS hazard exceeds OS hazard, consistent with PFS <= OS
  rho_tte_resp = c(0.4, 0.25),          # Correlation between OS and Response
  copula = "Frank"
)

head(data2)
#>   simID        Group        OS        PFS Response
#> 1     1 Experimental 13.758314 13.7583143        1
#> 2     1 Experimental 38.484101  0.5779209        1
#> 3     1 Experimental 23.023374 23.0233739        1
#> 4     1 Experimental  2.137153  0.9313718        1
#> 5     1 Experimental  8.745343  6.6986000        0
#> 6     1 Experimental 10.339117 10.3391172        1

Important: When generating all three endpoints, you specify the correlation between OS and Response (rho_tte_resp). The correlation between PFS and Response is automatically determined by the model structure.
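
One construction that is consistent with the output above (several rows have PFS equal to OS) treats PFS as the earlier of death and a latent progression time. The base R sketch below assumes exponential times and an independent latent time to progression (ttp, a hypothetical variable). It illustrates why hazard_PFS has to exceed hazard_OS under that assumption; it is not taken from the package source.

```r
set.seed(123)
n <- 1e5
hazard_OS  <- 0.04
hazard_PFS <- 0.08

# Assumption: latent time to progression is exponential and independent of OS.
# Its rate is chosen so that min(os, ttp) has hazard hazard_PFS.
os  <- rexp(n, rate = hazard_OS)
ttp <- rexp(n, rate = hazard_PFS - hazard_OS)  # needs hazard_PFS > hazard_OS
pfs <- pmin(os, ttp)                           # PFS event = progression or death

all(pfs <= os)    # TRUE by construction
mean(pfs)         # close to 1 / hazard_PFS = 12.5
mean(pfs == os)   # deaths without prior progression, close to hazard_OS / hazard_PFS = 0.5
cor(os, pfs)      # close to hazard_OS / hazard_PFS = 0.5 under these assumptions
```

If hazard_PFS were not larger than hazard_OS, the latent progression rate in this sketch would be zero or negative, which is one way to see why the ordering of the two hazards matters.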

Example 3: Verify Correlations

Let’s check the correlations in our simulated data:

# For the Experimental group in simulation 1
sim1_exp <- subset(data2, simID == 1 & Group == "Experimental")

# Correlation between OS and Response
cor_os_resp <- cor(sim1_exp$OS, sim1_exp$Response)
cat("Correlation (OS, Response):", round(cor_os_resp, 3), "\n")
#> Correlation (OS, Response): 0.423

# Correlation between PFS and Response
cor_pfs_resp <- cor(sim1_exp$PFS, sim1_exp$Response)
cat("Correlation (PFS, Response):", round(cor_pfs_resp, 3), "\n")
#> Correlation (PFS, Response): 0.322

# Correlation between OS and PFS
cor_os_pfs <- cor(sim1_exp$OS, sim1_exp$PFS)
cat("Correlation (OS, PFS):", round(cor_os_pfs, 3), "\n")
#> Correlation (OS, PFS): 0.54

# Verify PFS <= OS constraint
cat("All PFS <= OS?", all(sim1_exp$PFS <= sim1_exp$OS), "\n")
#> All PFS <= OS? TRUE
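
A single simulated trial gives a noisy correlation estimate, so it can help to summarize the correlation over all simID values rather than simulation 1 alone. The sketch below uses base R on a stand-in data frame (toy, generated with an arbitrary dependence mechanism that is an assumption for illustration, not the package's copula model). The same split-and-apply pattern should carry over to one group of rOncoEndpoints() output, which has simID, OS, and Response columns.

```r
set.seed(123)
nsim <- 200
n <- 200

# Stand-in data with rOncoEndpoints()-like columns (illustrative only)
u <- runif(nsim * n)
toy <- data.frame(
  simID = rep(seq_len(nsim), each = n),
  OS = qexp(u, rate = 0.04),
  Response = as.integer(u + rnorm(nsim * n, sd = 0.5) > 0.5)
)

# Pearson (point-biserial) correlation within each simulated trial
cor_by_sim <- vapply(
  split(toy, toy$simID),
  function(d) cor(d$OS, d$Response),
  numeric(1)
)

# Average and spread of the per-trial correlations
c(mean = mean(cor_by_sim), sd = sd(cor_by_sim),
  quantile(cor_by_sim, c(0.025, 0.975)))
```

The spread of cor_by_sim shows how far a single trial's correlation can sit from the target value purely by chance.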

Validation of Simulation Results

Use CheckSimResults() to validate that your simulations match theoretical values:

# Generate more simulations for better validation
data_val <- rOncoEndpoints(
  nsim = 1000,
  group = c("Treatment", "Control"),
  n = c(100, 100),
  p = c(0.4, 0.3),
  hazard_OS = c(0.05, 0.07),
  rho_tte_resp = c(0.3, 0.2),
  copula = "Clayton"
)

# Validate results
validation <- CheckSimResults(
  dataset = data_val,
  p = c(Treatment = 0.4, Control = 0.3),
  hazard_OS = c(Treatment = 0.05, Control = 0.07),
  rho_tte_resp = c(Treatment = 0.3, Control = 0.2),
  copula = "Clayton"
)

# Show results
print(validation, n = 20)
#> # A tibble: 8 × 10
#>   Group     Endpoint Empirical Theoretical     Bias Relative_Bias     SE     MSE
#>   <chr>     <chr>        <dbl>       <dbl>    <dbl>         <dbl>  <dbl>   <dbl>
#> 1 Treatment OS_Mean     20.0         20     0.0374          0.187 2.00   3.99   
#> 2 Treatment OS_Medi…    14.0         13.9   0.0924          0.667 1.96   3.86   
#> 3 Treatment Response     0.401        0.4   0.00112         0.280 0.0484 0.00235
#> 4 Treatment Cor_OS_…     0.308        0.3   0.00811         2.70  0.0972 0.00952
#> 5 Control   OS_Mean     14.3         14.3   0.0229          0.161 1.45   2.11   
#> 6 Control   OS_Medi…     9.98         9.90  0.0734          0.741 1.43   2.05   
#> 7 Control   Response     0.296        0.3  -0.00355        -1.18  0.0455 0.00208
#> 8 Control   Cor_OS_…     0.203        0.2   0.00317         1.59  0.0994 0.00990
#> # ℹ 2 more variables: RMSE <dbl>, Assessment <chr>

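Interpretation:

Bias: Empirical minus Theoretical; should be close to 0
Relative_Bias: Bias as a percentage of the theoretical value; an absolute value below 5% is excellent, below 10% is acceptable
SE: Standard deviation of the estimates across simulations
RMSE: Square root of MSE; combines bias and variability into a single accuracy measure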
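
The columns in the table can be approximated by hand from a vector of per-simulation estimates. The helper below (sim_metrics, a hypothetical name) uses definitions inferred from the printed output, for example Relative_Bias as 100 × Bias / Theoretical; whether CheckSimResults() uses exactly these formulas is an assumption.

```r
# Hypothetical helper: summary metrics for one estimand across simulations.
# Definitions are inferred from the printed table, not from package source.
sim_metrics <- function(estimates, theoretical) {
  bias <- mean(estimates) - theoretical
  mse  <- mean((estimates - theoretical)^2)
  c(
    Empirical     = mean(estimates),
    Theoretical   = theoretical,
    Bias          = bias,
    Relative_Bias = 100 * bias / theoretical,  # in percent
    SE            = sd(estimates),             # spread across simulations
    MSE           = mse,                       # roughly Bias^2 + SE^2 for large nsim
    RMSE          = sqrt(mse)
  )
}

set.seed(123)
# Response-rate estimates from 1000 simulated arms of 100 patients, p = 0.4
p_hat <- rbinom(1000, size = 100, prob = 0.4) / 100
round(sim_metrics(p_hat, theoretical = 0.4), 4)
```

For a response rate with 100 patients per arm, the SE should land near sqrt(0.4 × 0.6 / 100) ≈ 0.049, which is in line with the Treatment row of the table.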

Understanding Correlation Bounds

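Not all correlations are feasible: because Response is binary and OS is continuous, their correlation cannot reach ±1, and the attainable range depends on the response probability. Use CorBoundResponseTTE() to check feasible ranges: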

# For response probability = 0.4
bounds <- CorBoundResponseTTE(p = 0.4)
cat("Feasible correlation range:", 
    round(bounds[1], 3), "to", round(bounds[2], 3), "\n")
#> Feasible correlation range: -0.626 to 0.748

# Try different response probabilities
p_values <- c(0.2, 0.4, 0.6, 0.8)
bounds_matrix <- sapply(p_values, CorBoundResponseTTE)
colnames(bounds_matrix) <- paste("p =", p_values)
rownames(bounds_matrix) <- c("Lower", "Upper")
print(round(bounds_matrix, 3))
#>       p = 0.2 p = 0.4 p = 0.6 p = 0.8
#> Lower  -0.446  -0.626  -0.748  -0.805
#> Upper   0.805   0.748   0.626   0.446

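Notice that the feasible range depends on the response probability and is asymmetric around zero: the bounds for p and 1 − p mirror each other (compare p = 0.2 with p = 0.8).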
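
The printed bounds can be reproduced in closed form under two assumptions that are not confirmed by the text above: the time-to-event margin is exponential, and the extremes are reached when responders are exactly the longest survivors (upper bound) or the shortest survivors (lower bound), i.e. the Fréchet-Hoeffding bounds. cor_bounds_exp() is a hypothetical helper written for this sketch, not a package function.

```r
# Sketch: bounds on cor(T, R) for T ~ Exponential and R ~ Bernoulli(p).
# The hazard rate drops out because correlation is scale-invariant.
cor_bounds_exp <- function(p) {
  stopifnot(p > 0, p < 1)
  sd_resp <- sqrt(p * (1 - p))
  c(
    Lower = (1 - p) * log(1 - p) / sd_resp,  # responders = shortest survivors
    Upper = -p * log(p) / sd_resp            # responders = longest survivors
  )
}

round(cor_bounds_exp(0.4), 3)   # Lower -0.626, Upper 0.748
round(sapply(c(0.2, 0.4, 0.6, 0.8), cor_bounds_exp), 3)
```

The formulas come from Cov(T, R) = E[T · 1{T > q}] − p · E[T] with q chosen so that P(T > q) = p (upper bound), and the analogous calculation for the lower tail.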

Copula Families

Clayton Copula

Exhibits lower tail dependence
Supports positive correlations only (rho_tte_resp > 0)
Good for survival data where poor outcomes cluster, e.g. short survival together with non-response

# Clayton copula example
data_clayton <- rOncoEndpoints(
  nsim = 100,
  n = 100,
  p = 0.4,
  hazard_OS = 0.05,
  rho_tte_resp = 0.3,
  copula = "Clayton"
)

head(data_clayton)
#>   simID  Group         OS Response
#> 1     1 Group1  0.9734742        0
#> 2     1 Group1  1.7577495        0
#> 3     1 Group1 16.7348480        1
#> 4     1 Group1 43.1064927        1
#> 5     1 Group1 21.5900391        0
#> 6     1 Group1 61.0736650        0
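
To make "lower tail dependence" concrete, the following base R sketch samples from a Clayton copula by conditional inversion and compares how often both uniforms are jointly small versus jointly large. The parameter theta = 2 is an arbitrary illustrative value; how the package maps rho_tte_resp to a copula parameter is not shown here, so no such mapping is assumed.

```r
set.seed(123)
n <- 20000
theta <- 2   # illustrative Clayton parameter (> 0), not derived from rho_tte_resp

# Conditional inversion: draw u, then v from the Clayton conditional law given u
u <- runif(n)
w <- runif(n)
v <- ((w^(-theta / (1 + theta)) - 1) * u^(-theta) + 1)^(-1 / theta)

# Tail concentration at the 5% level
q <- 0.05
lower_tail <- mean(v[u < q] < q)          # P(V < q | U < q)
upper_tail <- mean(v[u > 1 - q] > 1 - q)  # P(V > 1 - q | U > 1 - q)
c(lower = lower_tail, upper = upper_tail)

# Map to endpoints: small u = short OS, small v = non-response (p = 0.4)
os <- qexp(u, rate = 0.05)
response <- as.integer(v > 1 - 0.4)
cor(os, response)   # positive
```

The lower-tail proportion comes out much larger than the upper-tail one, which is the clustering of poor outcomes described in the bullets above.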

Frank Copula

Flexible for both positive and negative correlations
Symmetric dependence with no tail dependence
More general choice

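# Frank copula with negative correlation
bounds_neg <- CorBoundResponseTTE(p = 0.4)
rho_negative <- -0.2
# Confirm the target lies within the feasible range before simulating
stopifnot(rho_negative >= bounds_neg[1], rho_negative <= bounds_neg[2])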

data_frank <- rOncoEndpoints(
  nsim = 100,
  n = 100,
  p = 0.4,
  hazard_OS = 0.05,
  rho_tte_resp = rho_negative,
  copula = "Frank"
)

# Check negative correlation
cor(data_frank[data_frank$simID == 1, ]$OS, 
    data_frank[data_frank$simID == 1, ]$Response)
#> [1] -0.194624
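
For comparison, a Frank copula can also be sampled by conditional inversion in base R, and a negative parameter yields negative dependence. As before, theta = -3 is an arbitrary illustrative value and no mapping from rho_tte_resp is assumed.

```r
set.seed(123)
n <- 20000
theta <- -3   # illustrative Frank parameter; negative values give negative dependence

# Conditional inversion for the Frank copula
u <- runif(n)
w <- runif(n)
v <- -log(1 + w * (exp(-theta) - 1) / (w + (1 - w) * exp(-theta * u))) / theta

# Map to endpoints: exponential OS and a binary response with p = 0.4
os <- qexp(u, rate = 0.05)
response <- as.integer(v > 1 - 0.4)

mean(response)      # close to 0.4
cor(os, response)   # negative
```

The same code with a positive theta gives positive dependence, which is the flexibility the Frank bullets refer to.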

Common Use Cases

Use Case 1: Power Analysis

Generate data under different scenarios to estimate statistical power:

# Scenario: Treatment vs Control
scenarios <- expand.grid(
  hazard_ratio = c(0.7, 0.8, 0.9),
  response_diff = c(0.1, 0.15, 0.2),
  correlation = c(0.2, 0.3, 0.4)
)

# For each scenario, generate data and calculate power
# (Example code structure - not run)
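
One possible shape for the power loop is sketched below in base R. Both helpers are hypothetical: power_os() applies a Wald test for the log hazard ratio under the assumptions of exponential OS and no censoring, and sim_os() is a stand-in generator that only mimics the simID, Group, and OS columns of rOncoEndpoints() output so the sketch runs without the package. In practice a log-rank test and real package output would likely replace them.

```r
# Hypothetical helper: share of simulated trials rejecting HR = 1
power_os <- function(dat, alpha = 0.05, control = "Control") {
  reject <- vapply(split(dat, dat$simID), function(d) {
    is_ctrl <- d$Group == control
    # Exponential MLE of each hazard: number of events / total time
    log_hr <- log(sum(!is_ctrl) / sum(d$OS[!is_ctrl])) -
      log(sum(is_ctrl) / sum(d$OS[is_ctrl]))
    se <- sqrt(1 / sum(!is_ctrl) + 1 / sum(is_ctrl))
    abs(log_hr / se) > qnorm(1 - alpha / 2)
  }, logical(1))
  mean(reject)
}

# Stand-in generator with rOncoEndpoints()-like columns (illustrative only)
sim_os <- function(nsim, n, hazard_control, hazard_ratio) {
  rates <- rep(c(hazard_control * hazard_ratio, hazard_control), each = n)
  data.frame(
    simID = rep(seq_len(nsim), each = 2 * n),
    Group = rep(rep(c("Treatment", "Control"), each = n), times = nsim),
    OS = rexp(2 * n * nsim, rate = rep(rates, times = nsim))
  )
}

set.seed(123)
hr_grid <- c(0.7, 0.8, 0.9)
power_by_hr <- sapply(hr_grid, function(hr) {
  power_os(sim_os(nsim = 500, n = 150, hazard_control = 0.07, hazard_ratio = hr))
})
names(power_by_hr) <- paste("HR =", hr_grid)
power_by_hr
```

Power should fall as the hazard ratio approaches 1; extending the loop over response_diff and correlation would follow the same pattern with a test that uses the Response column.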

Use Case 2: Sample Size Calculation

Determine required sample size for detecting treatment effects:

# Generate data with different sample sizes
n_values <- c(50, 100, 150, 200)

# For each n, simulate trials and calculate detection rates
# (Example code structure - not run)
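
As an analytic cross-check on simulation-based sample sizes, Schoenfeld's approximation gives the number of OS events needed by a two-sided log-rank test with 1:1 allocation. schoenfeld_events() is a hypothetical helper written for this sketch; converting events into patients additionally needs accrual and follow-up assumptions that are not covered here.

```r
# Hypothetical helper: required number of events (Schoenfeld approximation,
# 1:1 allocation, two-sided test)
schoenfeld_events <- function(hazard_ratio, alpha = 0.05, power = 0.80) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling(4 * z^2 / log(hazard_ratio)^2)
}

hr_grid <- c(0.7, 0.8, 0.9)
events <- sapply(hr_grid, schoenfeld_events)
names(events) <- paste("HR =", hr_grid)
events   # HR = 0.7 needs 247 events at 80% power
```

If a simulated design reaches the target power with far fewer events than this benchmark suggests, that is usually a sign to re-check the simulation settings or the test being applied.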

Use Case 3: Trial Design Evaluation

Compare different trial designs (e.g., different endpoints, different copulas):

# Compare Clayton vs Frank copula
# Compare OS-only vs OS+Response endpoints
# (Example code structure - not run)

Summary

This vignette introduced the basic functionality of CorOncoEndpoints:

Generate correlated endpoints with rOncoEndpoints()
Validate simulations with CheckSimResults()
Check feasible correlations with CorBoundResponseTTE()
Choose an appropriate copula (Clayton for positive correlations only, Frank for positive or negative)

For more advanced usage, see the “Advanced Usage and Examples” vignette. For theoretical details, see the “Theoretical Background” vignette.

Next Steps

Explore the advanced-usage vignette for complex scenarios
Read the theoretical-background vignette to understand the mathematical framework
Check function documentation with ?rOncoEndpoints, ?CheckSimResults, etc.