Package 'CondIndTests' reference manual

Title:	Nonlinear Conditional Independence Tests
Description:	Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <arXiv:1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <arXiv:1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <arXiv:1706.08576>).
Authors:	Christina Heinze-Deml, Jonas Peters, Asbjoern Marco Sinius Munk
Maintainer:	Christina Heinze-Deml
License:	GPL
Version:	0.1.5
Built:	2025-03-29 03:15:37 UTC
Source:	https://github.com/christinaheinze/nonlinearicp-and-condindtests

Wrapper function for conditional independence tests.

Description

Tests the null hypothesis that Y and E are independent given X.

Usage

CondIndTest(Y, E, X, method = "KCI", alpha = 0.05,
  parsMethod = list(), verbose = FALSE)

Arguments

`Y`	An n-dimensional vector or a matrix or dataframe with n rows and p columns.
`E`	An n-dimensional vector or a matrix or dataframe with n rows and p columns.
`X`	An n-dimensional vector or a matrix or dataframe with n rows and p columns.
`method`	The conditional independence test to use; one of `"KCI"`, `"InvariantConditionalQuantilePrediction"`, `"InvariantEnvironmentPrediction"`, `"InvariantResidualDistributionTest"`, `"InvariantTargetPrediction"`, `"ResidualPredictionTest"`.
`alpha`	Significance level. Defaults to 0.05.
`parsMethod`	Named list to pass options to `method`.
`verbose`	If `TRUE`, intermediate output is provided. Defaults to `FALSE`.

Value

A list with the p-value of the test (`pvalue`) and possibly additional entries, depending on the output of the conditional independence test chosen in `method`.

References

Please cite C. Heinze-Deml, J. Peters and N. Meinshausen: "Invariant Causal Prediction for Nonlinear Models", arXiv:1706.08576 and the corresponding reference for the conditional independence test.

Examples


# Example 1
set.seed(1)
n <- 100
Z <- rnorm(n)
X <- 4 + 2 * Z + rnorm(n)
Y <- 3 * X^2 + Z + rnorm(n)
test1 <- CondIndTest(X, Y, Z, method = "KCI")
cat("These data come from a distribution for which X and Y are NOT",
    "cond. ind. given Z.\n")
cat("The p-value of the test is:", test1$pvalue, "\n")

# Example 2
set.seed(1)
Z <- rnorm(n)
X <- 4 + 2 * Z + rnorm(n)
Y <- 3 + Z + rnorm(n)
test2 <- CondIndTest(X, Y, Z, method = "KCI")
cat("These data come from a distribution for which X and Y are",
    "cond. ind. given Z.\n")
cat("The p-value of the test is:", test2$pvalue, "\n")

Fisher's test of whether the exceedance of the conditional quantiles is independent of the categorical variable E.

Description

Used as a subroutine in InvariantConditionalQuantilePrediction to test whether the exceedance of the conditional quantiles is independent of the categorical variable E.

Usage

fishersTestExceedance(Y, predicted, E, verbose)

Arguments

`Y`	An n-dimensional vector.
`predicted`	A matrix with n rows. The columns contain predictions for different conditional quantiles of Y|X.
`E`	An n-dimensional vector. `E` needs to be a factor.
`verbose`	Set to `TRUE` if output should be printed.

Value

A list with the p-value for the test.
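As an illustration of the idea, the following base-R sketch tests each quantile column separately with `fisher.test` on the table of the exceedance indicator (Y above the predicted quantile or not) against E, and combines the per-column p-values with a Bonferroni correction. The function name `fishersTestExceedanceSketch` and the Bonferroni combination are assumptions for illustration, not the package's implementation.

```r
# Hypothetical sketch for illustration; not the package's implementation.
# Assumptions: one Fisher exact test per quantile column, Bonferroni over columns.
fishersTestExceedanceSketch <- function(Y, predicted, E, verbose = FALSE) {
  predicted <- as.matrix(predicted)
  E <- as.factor(E)
  pvals <- apply(predicted, 2, function(q) {
    # Indicator of whether Y lies above the predicted conditional quantile
    exceeds <- factor(Y > q, levels = c(FALSE, TRUE))
    fisher.test(table(exceeds, E))$p.value
  })
  if (verbose) print(pvals)
  list(pvalue = min(1, length(pvals) * min(pvals)))
}

# Usage: Y is standard normal in every environment, so exceedance of the
# true 10%, 50% and 90% quantiles should not depend on E
set.seed(1)
n <- 300
E <- factor(rbinom(n, size = 1, prob = 0.3))
Y <- rnorm(n)
predicted <- matrix(qnorm(c(0.1, 0.5, 0.9)), nrow = n, ncol = 3, byrow = TRUE)
fishersTestExceedanceSketch(Y, predicted, E)$pvalue
```

Reducing each quantile to a binary exceedance indicator yields a contingency table per column, which is why E has to be a factor for this test.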

F-test for a nested model comparison.

Description

Used as a subroutine in InvariantTargetPrediction to test whether out-of-sample prediction performance is better when using X and E as predictors for Y, compared to using X only.

Usage

fTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)

Arguments

`Y`	An n-dimensional vector.
`predictedOnlyX`	Predictions for Y based on predictors in X only.
`predictedXE`	Predictions for Y based on predictors in X and E.
`verbose`	Set to `TRUE` if output should be printed.
`...`	The dimensions of X (`df`) and E (`dimE`) need to be passed via the `...` argument; this gives fTestTargetY and wilcoxTestTargetY a common interface.

Value

A list with the p-value for the test.
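A minimal sketch of a nested-model F-test computed from out-of-sample residual sums of squares is shown below. The function name `fTestTargetYSketch` is hypothetical, and the degrees of freedom (`dimE` and `n - df - dimE`) are an assumption about how `df` and `dimE` are used.

```r
# Hypothetical sketch; the degrees of freedom are an assumption.
fTestTargetYSketch <- function(Y, predictedOnlyX, predictedXE, verbose = FALSE, ...) {
  dots <- list(...)
  df <- dots$df      # number of predictors in X
  dimE <- dots$dimE  # number of predictors in E
  n <- length(Y)
  # Out-of-sample residual sums of squares of the two nested models
  rssOnlyX <- sum((Y - predictedOnlyX)^2)
  rssXE <- sum((Y - predictedXE)^2)
  df1 <- dimE
  df2 <- n - df - dimE
  # A negative statistic (X and E predicts worse) simply gives p-value 1
  fStat <- ((rssOnlyX - rssXE) / df1) / (rssXE / df2)
  pvalue <- pf(fStat, df1, df2, lower.tail = FALSE)
  if (verbose) cat("F statistic:", fStat, "\n")
  list(pvalue = pvalue)
}

# Usage: adding E explains almost all of Y, so a tiny p-value is expected
set.seed(2)
n <- 200
Y <- rnorm(n)
fTestTargetYSketch(Y, predictedOnlyX = rep(0, n),
                   predictedXE = Y + rnorm(n, sd = 0.1), df = 1, dimE = 1)$pvalue
```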

Invariant conditional quantile prediction.

Description

Tests the null hypothesis that Y and E are independent given X.

Usage

InvariantConditionalQuantilePrediction(Y, E, X, alpha = 0.05,
  verbose = FALSE, test = fishersTestExceedance,
  mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL,
  quantiles = c(0.1, 0.5, 0.9), returnModel = FALSE)

Arguments

`Y`	An n-dimensional vector.
`E`	An n-dimensional vector. If `test = fishersTestExceedance`, E needs to be a factor.
`X`	A matrix or dataframe with n rows and p columns.
`alpha`	Significance level. Defaults to 0.05.
`verbose`	If `TRUE`, intermediate output is provided. Defaults to `FALSE`.
`test`	Unconditional independence test that tests whether exceedance is independent of E. Defaults to `fishersTestExceedance`.
`mtry`	Random forest parameter: Number of variables randomly sampled as candidates at each split. Defaults to `sqrt(NCOL(X))`.
`ntree`	Random forest parameter: Number of trees to grow. Defaults to 100.
`nodesize`	Random forest parameter: Minimum size of terminal nodes. Defaults to 5.
`maxnodes`	Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL.
`quantiles`	Quantiles for which to test independence between exceedance and E. Defaults to `c(0.1, 0.5, 0.9)`.
`returnModel`	If `TRUE`, the fitted quantile regression forest model will be returned. Defaults to `FALSE`.

Value

A list with the following entries:

pvalue The p-value for the null hypothesis that Y and E are independent given X.
model The fitted quantile regression forest model if returnModel = TRUE.

Examples

# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantConditionalQuantilePrediction(Y, as.factor(E), X)

# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantConditionalQuantilePrediction(Y, as.factor(E), X)

Invariant environment prediction.

Description

Tests the null hypothesis that Y and E are independent given X.

Usage

InvariantEnvironmentPrediction(Y, E, X, alpha = 0.05, verbose = FALSE,
  trainTestSplitFunc = caTools::sample.split,
  argsTrainTestSplitFunc = list(Y = E, SplitRatio = 0.8),
  test = propTestTargetE, mtry = sqrt(NCOL(X)), ntree = 100,
  nodesize = 5, maxnodes = NULL, permute = TRUE,
  returnModel = FALSE)

Arguments

`Y`	An n-dimensional vector.
`E`	An n-dimensional vector. If `test = propTestTargetE`, E needs to be a factor.
`X`	A matrix or dataframe with n rows and p columns.
`alpha`	Significance level. Defaults to 0.05.
`verbose`	If `TRUE`, intermediate output is provided. Defaults to `FALSE`.
`trainTestSplitFunc`	Function to split sample. Defaults to stratified sampling using `caTools::sample.split`, assuming E is a factor.
`argsTrainTestSplitFunc`	Arguments for the sample splitting function.
`test`	Unconditional independence test that tests whether the out-of-sample prediction accuracy is the same when using X only vs. X and Y as predictors for E. Defaults to `propTestTargetE`.
`mtry`	Random forest parameter: Number of variables randomly sampled as candidates at each split. Defaults to `sqrt(NCOL(X))`.
`ntree`	Random forest parameter: Number of trees to grow. Defaults to 100.
`nodesize`	Random forest parameter: Minimum size of terminal nodes. Defaults to 5.
`maxnodes`	Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to `NULL`.
`permute`	Random forest parameter: If `TRUE`, the model that uses X only for predicting E also includes a random permutation of Y. Defaults to `TRUE`.
`returnModel`	If `TRUE`, the fitted models will be returned. Defaults to `FALSE`.

Value

A list with the following entries:

pvalue The p-value for the null hypothesis that Y and E are independent given X.
model The fitted models if returnModel = TRUE.
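The prediction-based idea can be sketched as follows: on a held-out set, predict E from X alone (plus a permuted copy of Y, in the spirit of the `permute` option; how the package implements this exactly is an assumption here) and from X and Y, then compare the two misclassification rates with a one-sided `prop.test`. Logistic regression via `glm()` stands in for the random forest, a plain random 80/20 split stands in for `caTools::sample.split`, E is assumed to have two levels, X is assumed to be a single vector, and `invariantEnvironmentPredictionSketch` is a hypothetical name.

```r
# Hypothetical sketch: glm() replaces the random forest, X is a vector,
# and E has exactly two levels.
invariantEnvironmentPredictionSketch <- function(Y, E, X, trainFraction = 0.8) {
  E <- as.factor(E)
  n <- length(Y)
  dat <- data.frame(E = E, X = X, Y = Y, Yperm = sample(Y))
  trainIdx <- sample(n, size = floor(trainFraction * n))
  train <- dat[trainIdx, ]
  testSet <- dat[-trainIdx, ]
  # Same number of predictors in both models: the X-only model gets permuted Y
  fitOnlyX <- glm(E ~ X + Yperm, family = binomial, data = train)
  fitXY <- glm(E ~ X + Y, family = binomial, data = train)
  classify <- function(fit) {
    prob <- predict(fit, newdata = testSet, type = "response")
    levels(E)[1 + (prob > 0.5)]  # prob is P(E = second level)
  }
  truth <- as.character(testSet$E)
  errOnlyX <- sum(classify(fitOnlyX) != truth)
  errXY <- sum(classify(fitXY) != truth)
  nTest <- nrow(testSet)
  # One-sided: H1 = adding Y lowers the misclassification rate
  res <- prop.test(c(errOnlyX, errXY), c(nTest, nTest), alternative = "greater")
  list(pvalue = res$p.value)
}

# Usage: Y depends on E beyond X, so the null hypothesis is false
set.seed(3)
n <- 2000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
invariantEnvironmentPredictionSketch(Y, E, X)$pvalue
```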

Examples

# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantEnvironmentPrediction(Y, as.factor(E), X)

# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantEnvironmentPrediction(Y, as.factor(E), X)

# Example 3
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantEnvironmentPrediction(Y, E, X, test = wilcoxTestTargetY)
InvariantEnvironmentPrediction(Y, X, E, test = wilcoxTestTargetY)

Invariant residual distribution test.

Description

Tests the null hypothesis that Y and E are independent given X.

Usage

InvariantResidualDistributionTest(Y, E, X, alpha = 0.05,
  verbose = FALSE, fitWithGam = TRUE,
  test = leveneAndWilcoxResidualDistributions, colNameNoSmooth = NULL,
  mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL,
  returnModel = FALSE)

Arguments

`Y`	An n-dimensional vector.
`E`	An n-dimensional vector. E needs to be a factor.
`X`	A matrix or dataframe with n rows and p columns.
`alpha`	Significance level. Defaults to 0.05.
`verbose`	If `TRUE`, intermediate output is provided. Defaults to `FALSE`.
`fitWithGam`	If `TRUE`, a GAM is used for the nonlinear regression, else a random forest is used. Defaults to `TRUE`.
`test`	Unconditional independence test that tests whether the residual distribution is invariant across different levels of E. Defaults to `leveneAndWilcoxResidualDistributions`.
`colNameNoSmooth`	GAM parameter: Names of variables that should enter the model linearly. Defaults to `NULL`.
`mtry`	Random forest parameter: Number of variables randomly sampled as candidates at each split. Defaults to `sqrt(NCOL(X))`.
`ntree`	Random forest parameter: Number of trees to grow. Defaults to 100.
`nodesize`	Random forest parameter: Minimum size of terminal nodes. Defaults to 5.
`maxnodes`	Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to `NULL`.
`returnModel`	If `TRUE`, the fitted model will be returned. Defaults to `FALSE`.

Value

A list with the following entries:

pvalue The p-value for the null hypothesis that Y and E are independent given X.
model The fitted model if returnModel = TRUE.

Examples


# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantResidualDistributionTest(Y, as.factor(E), X)
InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)

# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantResidualDistributionTest(Y, as.factor(E), X)
InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)

Invariant target prediction.

Description

Tests the null hypothesis that Y and E are independent given X.

Usage

InvariantTargetPrediction(Y, E, X, alpha = 0.05, verbose = FALSE,
  fitWithGam = TRUE, trainTestSplitFunc = caTools::sample.split,
  argsTrainTestSplitFunc = NULL, test = fTestTargetY,
  colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100,
  nodesize = 5, maxnodes = NULL, permute = TRUE,
  returnModel = FALSE)

Arguments

`Y`	An n-dimensional vector.
`E`	An n-dimensional vector or a matrix or dataframe with n rows and q columns.
`X`	A matrix or dataframe with n rows and p columns.
`alpha`	Significance level. Defaults to 0.05.
`verbose`	If `TRUE`, intermediate output is provided. Defaults to `FALSE`.
`fitWithGam`	If `TRUE`, a GAM is used for the nonlinear regression, else a random forest is used. Defaults to `TRUE`.
`trainTestSplitFunc`	Function to split sample. Defaults to stratified sampling using `caTools::sample.split`, assuming E is a factor.
`argsTrainTestSplitFunc`	Arguments for the sample splitting function.
`test`	Unconditional independence test that tests whether the out-of-sample prediction accuracy is the same when using X only vs. X and E as predictors for Y. Defaults to `fTestTargetY`.
`colNameNoSmooth`	GAM parameter: Names of variables that should enter the model linearly. Defaults to `NULL`.
`mtry`	Random forest parameter: Number of variables randomly sampled as candidates at each split. Defaults to `sqrt(NCOL(X))`.
`ntree`	Random forest parameter: Number of trees to grow. Defaults to 100.
`nodesize`	Random forest parameter: Minimum size of terminal nodes. Defaults to 5.
`maxnodes`	Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL.
`permute`	Random forest parameter: If `TRUE`, the model that uses X only for predicting Y also includes a random permutation of E. Defaults to `TRUE`.
`returnModel`	If `TRUE`, the fitted models will be returned. Defaults to `FALSE`.

Value

A list with the following entries:

pvalue The p-value for the null hypothesis that Y and E are independent given X.
model The fitted models if returnModel = TRUE.
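The procedure can be sketched in base R as follows: fit a model for Y using X only and one using X and E on a training split, then compare their out-of-sample residual sums of squares with a nested-model F-test. Here `lm()` stands in for the GAM or random forest, a plain random 80/20 split stands in for `caTools::sample.split`, X is assumed to be a single vector, the `permute` option is not mimicked, and `invariantTargetPredictionSketch` is a hypothetical name.

```r
# Hypothetical sketch: lm() replaces the GAM / random forest; X is a vector.
invariantTargetPredictionSketch <- function(Y, E, X, trainFraction = 0.8) {
  n <- length(Y)
  dat <- data.frame(Y = Y, X = X, E = E)
  trainIdx <- sample(n, size = floor(trainFraction * n))
  train <- dat[trainIdx, ]
  testSet <- dat[-trainIdx, ]
  fitOnlyX <- lm(Y ~ X, data = train)
  fitXE <- lm(Y ~ X + E, data = train)
  rssOnlyX <- sum((testSet$Y - predict(fitOnlyX, newdata = testSet))^2)
  rssXE <- sum((testSet$Y - predict(fitXE, newdata = testSet))^2)
  # Nested-model F-test applied to out-of-sample errors (an approximation)
  df1 <- length(coef(fitXE)) - length(coef(fitOnlyX))
  df2 <- nrow(testSet) - length(coef(fitXE))
  fStat <- ((rssOnlyX - rssXE) / df1) / (rssXE / df2)
  list(pvalue = pf(fStat, df1, df2, lower.tail = FALSE))
}

# Usage with the data of Example 2 (Y depends directly on E)
set.seed(4)
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
invariantTargetPredictionSketch(Y, as.factor(E), X)$pvalue
```

With a linear stand-in, a misspecified regression of Y on X (for instance when Y depends on X^2, as in Example 1) can by itself make E look predictive, which is one reason to use flexible regressions such as GAMs or random forests for this test.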

Examples

# Example 1
n <- 1000
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)

# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
InvariantTargetPrediction(Y, as.factor(E), X)
InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY)

# Example 3
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
InvariantTargetPrediction(Y, E, X)
InvariantTargetPrediction(Y, X, E)
InvariantTargetPrediction(Y, E, X, test = wilcoxTestTargetY)
InvariantTargetPrediction(Y, X, E, test = wilcoxTestTargetY)

Kernel conditional independence test.

Description

Tests the null hypothesis that Y and E are independent given X. The distribution of the test statistic under the null hypothesis equals an infinite weighted sum of chi-squared variables. This distribution can be approximated either by a Gamma distribution or by a Monte Carlo approach. This version also implements choosing the hyperparameters by Gaussian process regression.

Usage

KCI(Y, E, X, width = 0, alpha = 0.05, unbiased = FALSE,
  gammaApprox = TRUE, GP = TRUE, nRepBs = 5000, lambda = 0.001,
  thresh = 1e-05, numEig = NROW(Y), verbose = FALSE)

Arguments

`Y`	A vector of length n or a matrix or dataframe with n rows and p columns.
`E`	A vector of length n or a matrix or dataframe with n rows and p columns.
`X`	A matrix or dataframe with n rows and p columns.
`width`	Kernel width; if it is set to zero, the width is chosen automatically (default: 0).
`alpha`	Significance level (default: 0.05).
`unbiased`	A boolean variable that indicates whether a bias correction should be applied (default: FALSE).
`gammaApprox`	A boolean variable that indicates whether the null distribution is approximated by a Gamma distribution. If it is FALSE, a Monte Carlo approach is used (default: TRUE).
`GP`	A boolean variable that indicates whether Gaussian process regression is used to choose the hyperparameters (default: TRUE).
`nRepBs`	Number of draws for the Monte Carlo approach (default: 5000).
`lambda`	Regularization parameter (default: 1e-03).
`thresh`	Threshold for eigenvalues. Whenever eigenvalues are computed, they are set to zero if they are smaller than thresh times the maximum eigenvalue (default: 1e-05).
`numEig`	Number of eigenvalues computed (only relevant for computing the distribution under the hypothesis of conditional independence) (default: NROW(Y)).
`verbose`	If `TRUE`, intermediate output is provided. (default: `FALSE`).

Value

A list with the following entries:

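testStatistic the statistic $\mathrm{Tr}(K_{\ddot{Y} \mid X} K_{E \mid X})$, where $\ddot{Y} = (Y, X)$ and $K_{\cdot \mid X}$ denotes a centred kernel matrix with the effect of X regressed out (notation of Zhang et al., 2011).
criticalValue the critical value of the test at significance level alpha; obtained by a Monte Carlo approach if gammaApprox = FALSE, otherwise by the Gamma approximation.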
pvalue The p-value for the null hypothesis that Y and E are independent given X. It is obtained by a Monte Carlo approach if gammaApprox = FALSE, otherwise obtained by Gamma approximation.
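For intuition, the statistic in the `testStatistic` entry can be computed along the lines of Zhang et al. (2011): centre Gaussian kernel matrices, remove the effect of X with the kernel ridge operator R = lambda (K_X + lambda I)^(-1), and take the trace of the product. The sketch below uses a median-heuristic kernel width on standardised inputs instead of the package's width selection or GP hyperparameters, omits the 1/n factor used in the paper, and computes no p-value (that requires the Gamma or Monte Carlo approximation from the Description). The names `gaussKernel` and `kciStatisticSketch` are hypothetical.

```r
# Hypothetical sketch of the KCI statistic only (no null distribution).
# Assumption: Gaussian kernels with a median-heuristic width.
gaussKernel <- function(Z) {
  Z <- scale(as.matrix(Z))
  D2 <- as.matrix(dist(Z))^2
  exp(-D2 / median(D2[D2 > 0]))
}

kciStatisticSketch <- function(Y, E, X, lambda = 1e-3) {
  n <- NROW(Y)
  H <- diag(n) - matrix(1 / n, n, n)        # centring matrix
  center <- function(K) H %*% K %*% H
  Kx <- center(gaussKernel(X))
  Kyx <- center(gaussKernel(cbind(Y, X)))   # kernel on the pair (Y, X)
  Ke <- center(gaussKernel(E))
  # Kernel ridge residual operator: R = lambda * (Kx + lambda * I)^(-1)
  R <- lambda * solve(Kx + lambda * diag(n))
  KyxGivenX <- R %*% Kyx %*% R
  KeGivenX <- R %*% Ke %*% R
  sum(diag(KyxGivenX %*% KeGivenX))
}

# Usage with the data of Example 1
set.seed(5)
n <- 100
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
stat <- kciStatisticSketch(Y, E, X)
stat
```

Both conditional kernel matrices are positive semidefinite, so the trace is non-negative; large values indicate dependence between Y and E that X does not explain.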

Examples

# Example 1
n <- 100
E <- rnorm(n)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
KCI(Y, E, X)
KCI(Y, X, E)

Kolmogorov-Smirnov test to compare residual distributions

Description

Used as a subroutine in InvariantResidualDistributionTest to test whether the residual distribution remains invariant across different levels of E.

Usage

ksResidualDistributions(Y, predicted, E, verbose)

Arguments

`Y`	An n-dimensional vector.
`predicted`	An n-dimensional vector of predictions for Y.
`E`	An n-dimensional vector. `E` needs to be a factor.
`verbose`	Set to `TRUE` if output should be printed.

Value

A list with the p-value for the test.
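A minimal sketch of such a subroutine compares the residuals Y - predicted between every pair of levels of E with `ks.test` and applies a Bonferroni correction. The pairwise scheme, the correction, and the name `ksResidualDistributionsSketch` are assumptions.

```r
# Hypothetical sketch; pairwise KS tests with Bonferroni correction.
ksResidualDistributionsSketch <- function(Y, predicted, E, verbose = FALSE) {
  E <- as.factor(E)
  r <- Y - predicted
  levelPairs <- combn(levels(E), 2)   # all pairs of environments
  pvals <- apply(levelPairs, 2, function(lv) {
    ks.test(r[E == lv[1]], r[E == lv[2]])$p.value
  })
  if (verbose) print(pvals)
  list(pvalue = min(1, length(pvals) * min(pvals)))
}

# Usage: residual spread differs between the two environments
set.seed(6)
n <- 400
E <- factor(rbinom(n, size = 1, prob = 0.5))
predicted <- rnorm(n)
Y <- predicted + rnorm(n, sd = ifelse(E == "1", 3, 1))
ksResidualDistributionsSketch(Y, predicted, E)$pvalue
```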

Levene and Wilcoxon tests to compare the first and second moments of residual distributions

Description

Used as a subroutine in InvariantResidualDistributionTest to test whether the residual distribution remains invariant across different levels of E.

Usage

leveneAndWilcoxResidualDistributions(Y, predicted, E, verbose)

Arguments

`Y`	An n-dimensional vector.
`predicted`	An n-dimensional vector of predictions for Y.
`E`	An n-dimensional vector. `E` needs to be a factor.
`verbose`	Set to `TRUE` if output should be printed.

Value

A list with the p-value for the test.
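One plausible way to combine the two tests is sketched below: a Wilcoxon rank-sum test (Kruskal-Wallis when E has more than two levels) for a shift in location of the residuals, a median-centred Levene test (one-way ANOVA on absolute deviations from the group medians) for a change in spread, and a Bonferroni correction over the two p-values. These choices and the name `leveneAndWilcoxSketch` are assumptions, not the package's code.

```r
# Hypothetical sketch; test choices and the combination rule are assumptions.
leveneAndWilcoxSketch <- function(Y, predicted, E, verbose = FALSE) {
  E <- as.factor(E)
  r <- Y - predicted
  # First moment: Wilcoxon rank-sum test (Kruskal-Wallis for > 2 levels)
  pLocation <- if (nlevels(E) == 2) {
    wilcox.test(r ~ E)$p.value
  } else {
    kruskal.test(r, E)$p.value
  }
  # Second moment: median-centred Levene test, i.e. one-way ANOVA on
  # absolute deviations from the group medians
  absDev <- abs(r - ave(r, E, FUN = median))
  pScale <- anova(lm(absDev ~ E))[["Pr(>F)"]][1]
  if (verbose) cat("location p:", pLocation, " scale p:", pScale, "\n")
  list(pvalue = min(1, 2 * min(pLocation, pScale)))
}

# Usage: residual mean shifts by 2 in environment "1"
set.seed(7)
n <- 400
E <- factor(rbinom(n, size = 1, prob = 0.5))
predicted <- rnorm(n)
Y <- predicted + rnorm(n) + 2 * (E == "1")
leveneAndWilcoxSketch(Y, predicted, E)$pvalue
```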

Proportion test to compare two misclassification rates.

Description

Used as a subroutine in InvariantEnvironmentPrediction to test whether out-of-sample performance is better when using X and Y as predictors for E, compared to using X only.

Usage

propTestTargetE(E, predictedOnlyX, predictedXY, verbose)

Arguments

`E`	An n-dimensional vector.
`predictedOnlyX`	Predictions for E based on predictors in X only.
`predictedXY`	Predictions for E based on predictors in X and Y.
`verbose`	Set to `TRUE` if output should be printed.

Value

A list with the p-value for the test.
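The comparison can be sketched with base R's `prop.test`: count the misclassified points under each model and test one-sidedly whether the X-only error rate exceeds the X-and-Y error rate. The name `propTestTargetESketch` and the helper `flipFirst` are hypothetical.

```r
# Hypothetical sketch of a one-sided comparison of misclassification rates.
propTestTargetESketch <- function(E, predictedOnlyX, predictedXY, verbose = FALSE) {
  E <- as.character(E)
  n <- length(E)
  errOnlyX <- sum(as.character(predictedOnlyX) != E)
  errXY <- sum(as.character(predictedXY) != E)
  # H1: the X-only model misclassifies more often
  res <- prop.test(c(errOnlyX, errXY), c(n, n), alternative = "greater")
  if (verbose) print(res)
  list(pvalue = res$p.value)
}

# Hypothetical helper that corrupts the first k predictions
flipFirst <- function(f, k) {
  out <- as.character(f)
  idx <- seq_len(k)
  out[idx] <- ifelse(out[idx] == "a", "b", "a")
  out
}

# Usage: 60 vs. 10 errors out of 200
E <- factor(rep(c("a", "b"), 100))
propTestTargetESketch(E, flipFirst(E, 60), flipFirst(E, 10))$pvalue
```

When both models make the same number of errors, the one-sided statistic is zero and the p-value is 0.5.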

Residual prediction test.

Description

Tests the null hypothesis that Y and E are independent given X.

Usage

ResidualPredictionTest(Y, E, X, alpha = 0.05, verbose = FALSE,
  degree = 4, basis = c("nystrom", "nystrom_poly", "fourier",
  "polynomial", "provided")[1], resid_type = "OLS", XBasis = NULL,
  noiseMat = NULL, getnoiseFct = function(n, ...) { rnorm(n) },
  argsGetNoiseFct = NULL, nSim = 100,
  funcOfRes = function(x) { abs(x) }, useX = TRUE, returnXBasis = FALSE,
  nSub = ceiling(NROW(X)/4), ntree = 100, nodesize = 5,
  maxnodes = NULL)

Arguments

`Y`	An n-dimensional vector.
`E`	An n-dimensional vector or a matrix or dataframe with n rows and q columns.
`X`	A matrix or dataframe with n rows and p columns.
`alpha`	Significance level. Defaults to 0.05.
`verbose`	If `TRUE`, intermediate output is provided. Defaults to `FALSE`.
`degree`	Degree of polynomial to use if `basis="polynomial"` or `basis="nystrom_poly"`. Defaults to 4.
`basis`	Can be one of `"nystrom","nystrom_poly","fourier","polynomial","provided"`. Defaults to `"nystrom"`.
`resid_type`	Can be `"Lasso"` or `"OLS"`. Defaults to `"OLS"`.
`XBasis`	Basis if `basis="provided"`. Defaults to `NULL`.
`noiseMat`	Matrix with simulated noise. Defaults to `NULL`, in which case the simulation is performed inside the function.
`getnoiseFct`	Function to use to generate the noise matrix. Defaults to `function(n, ...){rnorm(n)}`.
`argsGetNoiseFct`	Arguments for `getnoiseFct`. Defaults to `NULL`.
`nSim`	Number of simulations to use. Defaults to 100.
`funcOfRes`	Function of residuals to use in addition to predicting the conditional mean. Defaults to `function(x){abs(x)}`.
`useX`	Set to `TRUE` if the predictors in X should also be used when predicting the scaled residuals with E. Defaults to `TRUE`.
`returnXBasis`	Set to `TRUE` if basis expansion should be returned. Defaults to `FALSE`.
`nSub`	Number of random features to use if `basis` is one of `"nystrom","nystrom_poly"` or `"fourier"`. Defaults to `ceiling(NROW(X)/4)`.
`ntree`	Random forest parameter: Number of trees to grow. Defaults to 100.
`nodesize`	Random forest parameter: Minimum size of terminal nodes. Defaults to 5.
`maxnodes`	Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL.

Value

A list with the following entries:

pvalue The p-value for the null hypothesis that Y and E are independent given X.
XBasis Basis expansion if returnXBasis was set to TRUE.
fctBasisExpansion Function used to create basis expansion if basis is not "provided".
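The core idea of the residual prediction test can be illustrated with a strongly simplified sketch: project Y onto the orthogonal complement of a basis expansion of X, scale the residuals to unit norm, measure how well E predicts them, and calibrate this against the same statistic computed from simulated standard normal noise (under a Gaussian linear model in the basis, the scaled residuals of Y have the same distribution as those of pure noise). Here an orthogonal polynomial basis from `poly()` stands in for the Nyström or Fourier bases, an R^2 from `lm.fit` stands in for the random forest, `funcOfRes` is not used, and `residualPredictionTestSketch` is a hypothetical name.

```r
# Hypothetical, strongly simplified sketch; not the package's implementation.
residualPredictionTestSketch <- function(Y, E, X, degree = 4, nSim = 100) {
  E <- as.factor(E)
  n <- length(Y)
  # Basis expansion of X: intercept plus orthogonal polynomials
  qrX <- qr(cbind(1, poly(X, degree = degree)))
  # Residuals from projecting onto the basis, scaled to unit norm
  scaledResid <- function(v) {
    r <- qr.resid(qrX, v)
    r / sqrt(sum(r^2))
  }
  # How well E predicts the scaled residuals (R^2 of a linear fit on E dummies)
  Edesign <- model.matrix(~ E)
  predictability <- function(r) {
    fit <- lm.fit(Edesign, r)
    1 - sum(fit$residuals^2) / sum((r - mean(r))^2)
  }
  observed <- predictability(scaledResid(Y))
  # Null distribution: the same statistic for scaled residuals of N(0, 1) noise
  simulated <- replicate(nSim, predictability(scaledResid(rnorm(n))))
  list(pvalue = (1 + sum(simulated >= observed)) / (nSim + 1))
}

# Usage: Y depends on E beyond X
set.seed(8)
n <- 200
E <- rbinom(n, size = 1, prob = 0.5)
X <- rnorm(n)
Y <- X + 2 * E + rnorm(n)
residualPredictionTestSketch(Y, E, X)$pvalue
```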

Examples

# Example 1
n <- 100
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * (X)^2 + rnorm(n)
ResidualPredictionTest(Y, as.factor(E), X)

# Example 2
E <- rbinom(n, size = 1, prob = 0.2)
X <- 4 + 2 * E + rnorm(n)
Y <- 3 * E + rnorm(n)
ResidualPredictionTest(Y, as.factor(E), X)

# not run:
# # Example 3
# E <- rnorm(n)
# X <- 4 + 2 * E + rnorm(n)
# Y <- 3 * (X)^2 + rnorm(n)
# ResidualPredictionTest(Y, E, X)
# ResidualPredictionTest(Y, X, E)

Wilcoxon test to compare two mean squared error rates.

Description

Used as a subroutine in InvariantTargetPrediction to test whether out-of-sample performance is better when using X and E as predictors for Y, compared to using X only.

Usage

wilcoxTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)

Arguments

`Y`	An n-dimensional vector.
`predictedOnlyX`	Predictions for Y based on predictors in X only.
`predictedXE`	Predictions for Y based on predictors in X and E.
`verbose`	Set to `TRUE` if output should be printed.
`...`	Additional arguments; accepted so that fTestTargetY and wilcoxTestTargetY share a common interface.

Value

A list with the p-value for the test.
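A sketch under the assumption that the two vectors of squared errors are compared with a paired, one-sided Wilcoxon signed-rank test; `wilcoxTestTargetYSketch` is a hypothetical name and `...` simply absorbs extra arguments such as `df` and `dimE`.

```r
# Hypothetical sketch; the paired signed-rank test is an assumption.
wilcoxTestTargetYSketch <- function(Y, predictedOnlyX, predictedXE, verbose = FALSE, ...) {
  sqErrOnlyX <- (Y - predictedOnlyX)^2
  sqErrXE <- (Y - predictedXE)^2
  # H1: squared errors are larger when E is left out
  res <- wilcox.test(sqErrOnlyX, sqErrXE, paired = TRUE, alternative = "greater")
  if (verbose) print(res)
  list(pvalue = res$p.value)
}

# Usage: predictions using E are nearly perfect
set.seed(9)
n <- 200
Y <- rnorm(n)
wilcoxTestTargetYSketch(Y, rep(0, n), Y + rnorm(n, sd = 0.1), df = 1, dimE = 1)$pvalue
```

Unlike the F-test, this rank-based comparison does not rely on Gaussian errors, at the price of some power when the errors are in fact Gaussian.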
