Perform the inference of the maximum log-likelihood VERSO phylogenetic tree.

VERSO(
  D,
  alpha = NULL,
  beta = NULL,
  initialization = NULL,
  random_tree = FALSE,
  keep_equivalent = TRUE,
  check_indistinguishable = TRUE,
  marginalize = FALSE,
  num_rs = 10,
  num_iter = 10000,
  n_try_bs = 1000,
  num_processes = Inf,
  verbose = TRUE,
  log_file = ""
)

Arguments

D

Input data for the inference reporting presence (as 1), absense (as 0) or missing information (as NA) for a set of variants.

alpha

False positive error rate provided as a verctor; if a vector of alpha (and beta) is provided, the inference is performed for multiple values and the solution at maximum-likelihood is returned.

beta

False negative error rate provided as a verctor; if a vector of beta (and alpha) is provided, the inference is performed for multiple values and the solution at maximum-likelihood is returned.

initialization

Binary matrix representing a perfect philogeny tree; genotypes are rows and mutations are columns. This parameter overrides "random_tree".

random_tree

Boolean. Shall I start MCMC search from a random tree? If FALSE (default) and initialization is NULL, search is started from a TRaIT tree (BMC Bioinformatics . 2019 Apr 25;20(1):210. doi: 10.1186/s12859-019-2795-4).

keep_equivalent

Boolean. Shall I return results (B and C) at equivalent likelihood with the best returned solution?

check_indistinguishable

Boolean. Shall I remove any indistinguishable variant from input data prior inference?

marginalize

Boolean. Shall I marginalize C when computing likelihood?

num_rs

Number of restarts during MCMC inference.

num_iter

Maximum number of MCMC steps to be performed during the inference.

n_try_bs

Number of steps without changes in likelihood of best solution after which to stop the MCMC.

num_processes

Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either 1, NA or NULL.

verbose

Boolean. Shall I print to screen information messages during the execution?

log_file

log file where to print outputs when using parallel. If parallel execution is disabled, this parameter is ignored.

Value

A list of 9 elements: B, C, phylogenetic_tree, corrected_genotypes, genotypes_prevalence, genotypes_summary, log_likelihood and error_rates. Here, B returns the maximum likelihood variants tree (inner nodes of the phylogenetic tree), C the attachment of patients to genotypes and phylogenetic_tree VERSO phylogenetic tree, including both variants tree and patients attachments to variants; corrected_genotypes is the corrected genotypes, which corrects D given VERSO phylogenetic tree, genotypes_prevalence the number of patients and observed prevalence of each genotype and genotypes_summary provide a summary of association of mutations to genotypes. In equivalent_solutions, solutions (B and C) with likelihood equivalent to the best solution are returned. Finally log_likelihood and error_rates return the likelihood of the inferred phylogenetic moldel and best values of alpha and beta as estimated by VERSO.

Examples

data(variants)
set.seed(12345)
inference = VERSO(D = variants, 
                  alpha = c(0.01,0.05), 
                  beta = c(0.01,0.05), 
                  check_indistinguishable = TRUE, 
                  num_rs = 5, 
                  num_iter = 100, 
                  n_try_bs = 50, 
                  num_processes = 1, 
                  verbose = FALSE)