RDA function to do everything

Usage

rda_do_everything(
  gen,
  env,
  coords = NULL,
  impute = "structure",
  K_impute = 3,
  entropy = TRUE,
  repetitions = 10,
  project = "new",
  quiet_impute = TRUE,
  save_output = FALSE,
  output_filename = NULL,
  model = "full",
  correctGEO = FALSE,
  correctPC = FALSE,
  outlier_method = "p",
  sig = 0.05,
  z = 3,
  p_adj = "fdr",
  cortest = TRUE,
  nPC = 3,
  varpart = FALSE,
  naxes = "all",
  Pin = 0.05,
  R2permutations = 1000,
  R2scope = T,
  stdz = TRUE,
  quiet = FALSE
)

Arguments

gen

genotype dosage matrix (rows = individuals & columns = SNPs) or vcfR object

env

dataframe with environmental data or a Raster* type object from which environmental values for the coordinates can be extracted

coords

dataframe with coordinates (only needed if correctGEO = TRUE or if env is a Raster* object from which values should be extracted)

impute

if gen contains NAs, the missing values will be imputed; options are "structure" (default), which uses the str_impute() function to impute based on population structure inferred with LEA::snmf, or "simple", which uses simple_impute() to impute to the median
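As an illustration of what imputing to the median means for a dosage matrix, here is a minimal base R sketch. It is not the package's simple_impute() implementation: the helper impute_median() and the toy matrix are hypothetical, and it assumes the median is taken per SNP (column).

```r
# Toy dosage matrix: rows = individuals, columns = SNPs (values 0, 1, 2).
gen <- matrix(
  c(0,  1,  2, NA,
    1, NA,  2,  0,
    2,  1, NA,  0,
    0,  1,  2,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("snp", 1:4))
)

# Hypothetical helper: replace each missing value with the median of its
# SNP (column), computed from the non-missing individuals.
impute_median <- function(x) {
  apply(x, 2, function(snp) {
    snp[is.na(snp)] <- median(snp, na.rm = TRUE)
    snp
  })
}

gen_imputed <- impute_median(gen)
```

Median imputation ignores population structure, which is why the structure-based option is the default.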

K_impute

if impute = "structure", an integer vector (range or single value) corresponding to the number of ancestral populations for which the sNMF algorithm estimates have to be calculated (defaults to 3)

entropy

A boolean value. If true, the cross-entropy criterion is calculated (see create.dataset and cross.entropy.estimation).

repetitions

An integer corresponding to the number of repetitions for each value of K.

project

A character string among "continue", "new", and "force". If "continue", the results are stored in the current project. If "new", the current project is removed and a new one is created to store the result. If "force", the results are stored in the current project even if the input file has been modified since the creation of the project.

quiet_impute

if impute = "structure", whether to suppress the cross-entropy scores (defaults to TRUE; only applies if K_impute is a range of values); if TRUE, only the run with the minimum cross-entropy is displayed

save_output

if impute = "structure", if TRUE, saves SNP GDS and ped (plink) files with retained SNPs in new directory; if FALSE returns object (defaults to FALSE)

output_filename

if impute = "structure" and save_output = TRUE, name prefix for saved .geno file, SNMF project file, and SNMF output file results (defaults to NULL, in which case no files are saved)

model

whether to fit the model with all variables ("full") or to perform variable selection to determine the best set of variables ("best"); defaults to "full"

correctGEO

whether to condition on geographic coordinates

correctPC

whether to condition on PCs from PCA of genotypes

outlier_method

method to determine outliers; can be either "p" to use the p-value based method or "z" to use the z-score based method (defaults to "p")

sig

if outlier_method = "p", the significance level to use to identify SNPs (defaults to 0.05)

z

if outlier_method = "z", the number of standard deviations to use to identify SNPs (defaults to 3)

p_adj

if outlier_method = "p", method to use for p-value correction (defaults to "fdr"); other options can be found in p.adjust()
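The two thresholding styles controlled by z, sig, and p_adj can be sketched in base R. This is only an illustration of the thresholds, not the package's outlier detection: the loadings and p-values below are made up, and how the function derives p-values from the RDA is not shown.

```r
set.seed(1)

# Hypothetical SNP loadings on one RDA axis, with two planted extreme values.
loadings <- c(rnorm(200), 6, -6)
names(loadings) <- paste0("snp", seq_along(loadings))

# z-score style: flag loadings more than z standard deviations from the mean.
z <- 3
limits <- mean(loadings) + c(-1, 1) * z * sd(loadings)
z_outliers <- names(loadings)[loadings < limits[1] | loadings > limits[2]]

# p-value style: correct hypothetical per-SNP p-values for multiple testing,
# then flag SNPs whose adjusted p-value falls below sig.
sig <- 0.05
pvals <- c(snp1 = 0.0001, snp2 = 0.0004, snp3 = 0.02, snp4 = 0.03,
           snp5 = 0.2, snp6 = 0.5, snp7 = 0.8)
pvals_adj <- p.adjust(pvals, method = "fdr")
p_outliers <- names(pvals_adj)[pvals_adj < sig]
```

In this toy vector, snp4 passes the unadjusted 0.05 threshold but not the FDR-adjusted one, which shows why the choice of p_adj matters.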

cortest

whether to create a table of correlations between SNPs and environmental variables (defaults to TRUE)

nPC

number of PCs to use if correctPC = TRUE (defaults to 3); if set to "manual" a selection option with a terminal prompt will be provided

varpart

whether to perform variance partitioning (defaults to FALSE)

naxes

number of RDA axes to use (defaults to "all" to use all axes); if set to "manual", a selection option with a terminal prompt will be given; otherwise, can be any integer that is less than or equal to the total number of axes

Pin

if model = "best", the permutation P-value limit for adding a term to the model; a term is added if P <= Pin (see ordiR2step) (defaults to 0.05)

R2permutations

if model = "best", number of permutations used in the estimation of adjusted R2 for cca using RsquareAdj (see ordiR2step) (defaults to 1000)

R2scope

if model = "best" and set to TRUE (default), use adjusted R2 as the stopping criterion: only models with lower adjusted R2 than scope are accepted (see ordiR2step)
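For reference, Pin, R2permutations, and R2scope are arguments of vegan::ordiR2step(). The sketch below shows them in a standalone forward selection on vegan's bundled dune dataset; it assumes the vegan package is installed and is not necessarily how rda_do_everything() calls ordiR2step() internally.

```r
library(vegan)

# Example community and environmental data shipped with vegan.
data(dune)
data(dune.env)

# Null (intercept-only) and full models define the range of the search.
mod_null <- rda(dune ~ 1, data = dune.env)
mod_full <- rda(dune ~ ., data = dune.env)

# Forward selection: a term is added if its permutation P-value is <= Pin
# and, with R2scope = TRUE, the adjusted R2 stays below that of the scope.
set.seed(1)
mod_best <- ordiR2step(
  mod_null,
  scope = formula(mod_full),
  Pin = 0.05,
  R2permutations = 1000,
  R2scope = TRUE,
  trace = FALSE
)

# Terms retained by the selection procedure.
formula(mod_best)
```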

stdz

whether to center and scale environmental data (defaults to TRUE)
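Centering and scaling can be illustrated with base R's scale(). This is a sketch on a made-up data frame, assuming standardization is applied per column; the variable names are hypothetical.

```r
# Hypothetical environmental data on very different scales.
env <- data.frame(
  temp   = c(10, 15, 20, 25),
  precip = c(100, 300, 200, 400)
)

# Center each column to mean 0 and scale it to standard deviation 1,
# so that no variable dominates the RDA because of its units.
env_std <- as.data.frame(scale(env, center = TRUE, scale = TRUE))

colMeans(env_std)      # approximately 0 for every column
sapply(env_std, sd)    # 1 for every column
```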

quiet

whether to operate quietly and suppress the output of tables and figures (defaults to FALSE)

Value

list containing (1) outlier SNPs, (2) dataframe with correlation test results, if cortest = TRUE, (3) the RDA model, (4) results from outlier analysis (output from rda_getoutliers), (5) RDA R-Squared, (6) RDA ANOVA, (7) p-values if outlier_method = "p", and (8) results from variance partitioning analysis, if varpart = TRUE

Details

Much of algatr's code is adapted from Capblancq T., Forester B.R. 2021. Redundancy analysis: A Swiss Army Knife for landscape genomics. Methods Ecol. Evol. 12:2298-2309. doi: https://doi.org/10.1111/2041-210X.13722.

See also