
Reference-based beta correction
This function adjusts CpG beta values for tumor cells and inferred normal (microenvironment) cells using reference regressions and estimated purities. The correction can be carried out either by refitting the regressions to include the new data points (betas plus estimated purities) or by using the original reference regressions. Unlike beta_correction_for_cohorts(), this function does not require a full cohort of samples, as it can be applied to a single sample and a single CpG. This function supports multi-core execution.
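To illustrate the idea behind the correction, the sketch below assumes that each CpG follows a simple linear reference model, beta = intercept + slope * (1 - purity), and splits an observed beta into a tumour value and a microenvironment value by keeping the sample's residual from that line. Both the model and the helper name correct_cpg_sketch() are assumptions for illustration: this page does not state PureBeta's exact internal formula, and the FlexMix-based detection of methylation patterns is omitted.

```r
# Hypothetical helper, NOT part of PureBeta: residual-based split of one CpG.
# Assumed reference model: beta = intercept + slope * (1 - purity).
correct_cpg_sketch <- function(beta, purity, intercept, slope) {
  predicted <- intercept + slope * (1 - purity)  # expected beta at this purity
  residual  <- beta - predicted                  # sample-specific deviation
  tumor <- intercept + residual                  # extrapolated to purity = 1
  micro <- intercept + slope + residual          # extrapolated to purity = 0
  # Beta values are proportions, so clamp both estimates to [0, 1]
  list(tumor = pmin(pmax(tumor, 0), 1),
       microenvironment = pmin(pmax(micro, 0), 1))
}

# Two samples of one CpG with made-up betas, purities and regression parameters
res <- correct_cpg_sketch(beta = c(0.55, 0.40), purity = c(0.7, 0.5),
                          intercept = 0.8, slope = -0.6)
```

With these toy numbers the two samples get tumour betas of 0.73 and 0.70 and microenvironment betas of 0.13 and 0.10. Recombining them as purity * tumour + (1 - purity) * microenvironment gives back the observed betas, which is what keeps a residual-based split of this kind consistent with the input data.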
Usage
reference_based_beta_correction(
betas_to_correct,
purities_samples_to_correct,
purities_purebeta_format = TRUE,
only_certain_CpGs = FALSE,
CpGs_to_correct_vector,
refitting,
reference_regressions,
reference_betas,
reference_purities,
set_seed = TRUE,
seed_num = 2000,
cores = 1
)
Arguments
- betas_to_correct
A matrix of uncorrected beta values for the samples to be corrected, with CpGs as rows and samples (one or more) as columns. The values must be numeric, the rows must be named with the CpG IDs, and the columns with the sample IDs. An example of the required format is available in the example_betas_to_correct matrix.
- purities_samples_to_correct
The output of the purity_estimation function, in its original format. Alternatively, a data frame with the sample IDs in the first column followed by the sample purity values can be entered here after setting the purities_purebeta_format argument to FALSE (IMPORTANT! The user MUST enter sample purity values, not 1-Purity values).
- purities_purebeta_format
Default = TRUE. Set this argument to FALSE to supply sample purities in the alternative format described under purities_samples_to_correct.
- only_certain_CpGs
Default = FALSE. Set this argument to TRUE to apply the beta correction only to a subset of the CpGs included in the betas_to_correct matrix; the subset is given in CpGs_to_correct_vector.
- refitting
Logical. Set this argument to TRUE to refit the reference regressions so that the betas and the PureBeta-estimated sample purities of the samples to be corrected are included as additional data points. This may be advisable when the reference dataset is not fully representative and a substantial number of samples have had their purity estimated with PureBeta. Otherwise (FALSE), the reference regressions are used directly for the beta correction.
- reference_regressions
The output of the reference regression generator (either its short or its extended version) should be entered here when refitting = FALSE; this argument is ignored when refitting = TRUE. The list must include at least a named vector with the variance of the betas of the CpGs used to build the regressions (input$cpg.variance) and, as matrices, the per-CpG slopes, intercepts, residual standard errors and degrees of freedom of the regressions (input$reg.slopes, input$reg.intercepts, input$reg.RSE and input$df).
- reference_betas
A matrix of uncorrected beta values for the samples to be used as reference data, with CpGs as rows and samples (one or more) as columns. It should be entered here when refitting = TRUE. The values must be numeric, the rows must be named with the CpG IDs, and the columns with the sample IDs. An example of the required format is available in the example_betas_reference matrix.
- reference_purities
A named vector with the sample purity values of the samples whose DNA methylation beta values are used as reference data, to be entered here when refitting = TRUE. The vector must be named with the sample IDs, which must match the sample IDs (column names) of the reference_betas matrix. An example of the required format is available in the example_purities_reference vector.
- set_seed
Default = TRUE. When TRUE, a fixed seed is used by the FlexMix package to detect the different CpG methylation patterns. This argument is only used when refitting = TRUE.
- seed_num
Default = 2000. The seed to be used when set_seed = TRUE can be specified here.
- cores
Default = 1. Number of cores to be used to run the function in parallel.
- CpGs_to_correct_vector
A vector of the CpG IDs to be corrected when only_certain_CpGs = TRUE has been specified.
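The input formats described above can be mocked up in base R to check their shape before calling the function. The sketch below builds toy objects with made-up values; the sample IDs (S1, S2, R1, R2) and the helper has_required_components() are hypothetical, and the component names it checks are the ones listed for reference_regressions. It does not call PureBeta.

```r
# Hypothetical toy inputs in the documented formats (all values are made up).
cpgs    <- c("cg09248054", "cg08231710")
samples <- c("S1", "S2")  # hypothetical sample IDs

# betas_to_correct: numeric matrix, CpG IDs as row names, sample IDs as column names
betas_to_correct <- matrix(c(0.55, 0.40, 0.21, 0.33), nrow = 2,
                           dimnames = list(cpgs, samples))

# Alternative purity input (purities_purebeta_format = FALSE):
# sample IDs in the first column, purity (NOT 1 - purity) after it
purities_df <- data.frame(sample = samples, purity = c(0.72, 0.55),
                          stringsAsFactors = FALSE)

# reference_betas and reference_purities (refitting = TRUE)
reference_betas <- matrix(c(0.80, 0.62, 0.15, 0.30), nrow = 2,
                          dimnames = list(cpgs, c("R1", "R2")))
reference_purities <- c(R1 = 0.81, R2 = 0.64)

# CpGs_to_correct_vector (only_certain_CpGs = TRUE): a subset of the row names
CpGs_to_correct_vector <- c("cg09248054")

# Hypothetical check for the components listed for reference_regressions
has_required_components <- function(x) {
  all(c("cpg.variance", "reg.slopes", "reg.intercepts", "reg.RSE", "df") %in% names(x))
}

# Basic consistency checks on the toy inputs
stopifnot(
  is.numeric(betas_to_correct),
  identical(purities_df$sample, colnames(betas_to_correct)),
  setequal(names(reference_purities), colnames(reference_betas)),
  all(CpGs_to_correct_vector %in% rownames(betas_to_correct)),
  all(purities_df$purity >= 0 & purities_df$purity <= 1)
)
```

Checks like these only confirm naming and ranges; they say nothing about whether the purity values themselves are accurate.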
Value
When refitting = FALSE, a list with the corrected betas for the tumour (output$`Corrected_tumor`) and the microenvironment (output$`Corrected_microenvironment`). When refitting = TRUE, the list contains the corrected betas for the tumour and microenvironment (output$`Corrected_betas`) together with the parameters of the refitted regressions (output$`Regression_parameters`).
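When refitting = TRUE, the new samples are added as data points before the regressions are recomputed. The sketch below shows that idea for a single CpG with an ordinary least-squares fit via lm(), returning an intercept, slope, residual standard error and residual degrees of freedom, which loosely mirror the reg.intercepts, reg.slopes, reg.RSE and df components listed for reference_regressions. It is a hypothetical helper (refit_cpg_sketch()) under the assumed model beta = intercept + slope * (1 - purity); it fits a single population and omits the FlexMix step, so it should not be read as the package's implementation.

```r
# Hypothetical helper, NOT part of PureBeta: refit one CpG's regression
# after appending new samples. Assumes beta ~ (1 - purity); no FlexMix step.
refit_cpg_sketch <- function(ref_beta, ref_purity, new_beta, new_purity) {
  dat <- data.frame(beta = c(ref_beta, new_beta),
                    one_minus_purity = 1 - c(ref_purity, new_purity))
  fit <- lm(beta ~ one_minus_purity, data = dat)  # ordinary least squares
  s <- summary(fit)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    RSE       = s$sigma,          # residual standard error
    df        = fit$df.residual)  # residual degrees of freedom
}

# Made-up data: reference points on beta = 0.8 - 0.6 * (1 - purity),
# plus two new samples that deviate slightly from that line
params <- refit_cpg_sketch(ref_beta = c(0.80, 0.62, 0.50, 0.35),
                           ref_purity = c(1.0, 0.7, 0.5, 0.25),
                           new_beta = c(0.45, 0.55),
                           new_purity = c(0.4, 0.6))
```

Because the two added samples deviate from the reference line by only 0.01, the refitted slope and intercept stay close to -0.6 and 0.8, and the fit on six points has 4 residual degrees of freedom.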
Examples
# Using the non-refitting approach for all the CpGs
reference_based_beta_correction(betas_to_correct = example_betas_to_correct,
purities_samples_to_correct = purity_estimation_output,
only_certain_CpGs = FALSE,
refitting = FALSE,
reference_regressions = reference_regression_generator_output,
cores = 5)
# Using the non-refitting approach for certain CpGs
reference_based_beta_correction(betas_to_correct = example_betas_to_correct,
purities_samples_to_correct = purity_estimation_output,
only_certain_CpGs = TRUE,
CpGs_to_correct_vector = c("cg09248054", "cg08231710"),
refitting = FALSE,
reference_regressions = reference_regression_generator_output,
cores = 5)
# Using the refitting approach
reference_based_beta_correction(betas_to_correct = example_betas_to_correct,
purities_samples_to_correct = purity_estimation_output,
only_certain_CpGs = FALSE,
refitting = TRUE,
reference_betas = example_betas_reference,
reference_purities = example_purities_reference,
set_seed = TRUE,
seed_num = 1,
cores = 5)