Title: | Nonlinear Conditional Independence Tests |
---|---|
Description: | Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <arXiv:1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <arXiv:1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <arXiv:1706.08576>). |
Authors: | Christina Heinze-Deml <[email protected]>, Jonas Peters <[email protected]>, Asbjoern Marco Sinius Munk <[email protected]> |
Maintainer: | Christina Heinze-Deml <[email protected]> |
License: | GPL |
Version: | 0.1.5 |
Built: | 2025-01-28 03:08:16 UTC |
Source: | https://github.com/christinaheinze/nonlinearicp-and-condindtests |
Tests the null hypothesis that Y and E are independent given X.
CondIndTest(Y, E, X, method = "KCI", alpha = 0.05, parsMethod = list(), verbose = FALSE)
CondIndTest(Y, E, X, method = "KCI", alpha = 0.05, parsMethod = list(), verbose = FALSE)
Y |
An n-dimensional vector or a matrix or dataframe with n rows and p columns. |
E |
An n-dimensional vector or a matrix or dataframe with n rows and p columns. |
X |
An n-dimensional vector or a matrix or dataframe with n rows and p columns. |
method |
The conditional indepdence test to use, can be one of
|
alpha |
Significance level. Defaults to 0.05. |
parsMethod |
Named list to pass options to |
verbose |
If |
A list with the p-value of the test (pvalue
) and possibly additional
entries, depending on the output of the chosen conditional independence test in method
.
Please cite C. Heinze-Deml, J. Peters and N. Meinshausen: "Invariant Causal Prediction for Nonlinear Models", arXiv:1706.08576 and the corresponding reference for the conditional independence test.
# Example 1 set.seed(1) n <- 100 Z <- rnorm(n) X <- 4 + 2 * Z + rnorm(n) Y <- 3 * X^2 + Z + rnorm(n) test1 <- CondIndTest(X,Y,Z, method = "KCI") cat("These data come from a distribution, for which X and Y are NOT cond. ind. given Z.") cat(paste("The p-value of the test is: ", test1$pvalue)) # Example 2 set.seed(1) Z <- rnorm(n) X <- 4 + 2 * Z + rnorm(n) Y <- 3 + Z + rnorm(n) test2 <- CondIndTest(X,Y,Z, method = "KCI") cat("The data come from a distribution, for which X and Y are cond. ind. given Z.") cat(paste("The p-value of the test is: ", test2$pvalue))
# Example 1 set.seed(1) n <- 100 Z <- rnorm(n) X <- 4 + 2 * Z + rnorm(n) Y <- 3 * X^2 + Z + rnorm(n) test1 <- CondIndTest(X,Y,Z, method = "KCI") cat("These data come from a distribution, for which X and Y are NOT cond. ind. given Z.") cat(paste("The p-value of the test is: ", test1$pvalue)) # Example 2 set.seed(1) Z <- rnorm(n) X <- 4 + 2 * Z + rnorm(n) Y <- 3 + Z + rnorm(n) test2 <- CondIndTest(X,Y,Z, method = "KCI") cat("The data come from a distribution, for which X and Y are cond. ind. given Z.") cat(paste("The p-value of the test is: ", test2$pvalue))
Used as a subroutine in InvariantConditionalQuantilePrediction
to test whether the exceedance of the conditional quantiles
is independent of the categorical variable E.
fishersTestExceedance(Y, predicted, E, verbose)
fishersTestExceedance(Y, predicted, E, verbose)
Y |
An n-dimensional vector. |
predicted |
A matrix with n rows. The columns contain predictions for different conditional quantiles of Y|X. |
E |
An n-dimensional vector. |
verbose |
Set to |
A list with the p-value for the test.
Used as a subroutine in InvariantTargetPrediction
to test
whether out-of-sample prediction performance is better when using X and E as predictors for Y,
compared to using X only.
fTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
fTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
Y |
An n-dimensional vector. |
predictedOnlyX |
Predictions for Y based on predictors in X only. |
predictedXE |
Predictions for Y based on predictors in X and E. |
verbose |
Set to |
... |
The dimensions of X (df) and E (dimE) need to be passed via the ... argument to allow for coherent interface of fTestTargetY and wilcoxTestTargetY. |
A list with the p-value for the test.
Tests the null hypothesis that Y and E are independent given X.
InvariantConditionalQuantilePrediction(Y, E, X, alpha = 0.05, verbose = FALSE, test = fishersTestExceedance, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, quantiles = c(0.1, 0.5, 0.9), returnModel = FALSE)
InvariantConditionalQuantilePrediction(Y, E, X, alpha = 0.05, verbose = FALSE, test = fishersTestExceedance, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, quantiles = c(0.1, 0.5, 0.9), returnModel = FALSE)
Y |
An n-dimensional vector. |
E |
An n-dimensional vector. If |
X |
A matrix or dataframe with n rows and p columns. |
alpha |
Significance level. Defaults to 0.05. |
verbose |
If |
test |
Unconditional independence test that tests whether exceedence is
independent of E. Defaults to |
mtry |
Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to |
ntree |
Random forest parameter: Number of trees to grow. Defaults to 100. |
nodesize |
Random forest parameter: Minimum size of terminal nodes. Defaults to 5. |
maxnodes |
Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL. |
quantiles |
Quantiles for which to test independence between exceedence and E.
Defaults to |
returnModel |
If |
A list with the following entries:
pvalue
The p-value for the null hypothesis that Y and E are independent given X.
model
The fitted quantile regression forest model if returnModel = TRUE
.
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantConditionalQuantilePrediction(Y, as.factor(E), X) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantConditionalQuantilePrediction(Y, as.factor(E), X)
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantConditionalQuantilePrediction(Y, as.factor(E), X) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantConditionalQuantilePrediction(Y, as.factor(E), X)
Tests the null hypothesis that Y and E are independent given X.
InvariantEnvironmentPrediction(Y, E, X, alpha = 0.05, verbose = FALSE, trainTestSplitFunc = caTools::sample.split, argsTrainTestSplitFunc = list(Y = E, SplitRatio = 0.8), test = propTestTargetE, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, permute = TRUE, returnModel = FALSE)
InvariantEnvironmentPrediction(Y, E, X, alpha = 0.05, verbose = FALSE, trainTestSplitFunc = caTools::sample.split, argsTrainTestSplitFunc = list(Y = E, SplitRatio = 0.8), test = propTestTargetE, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, permute = TRUE, returnModel = FALSE)
Y |
An n-dimensional vector. |
E |
An n-dimensional vector. If |
X |
A matrix or dataframe with n rows and p columns. |
alpha |
Significance level. Defaults to 0.05. |
verbose |
If |
trainTestSplitFunc |
Function to split sample. Defaults to stratified sampling
using |
argsTrainTestSplitFunc |
Arguments for sampling splitting function. |
test |
Unconditional independence test that tests whether the out-of-sample
prediction accuracy is the same when using X only vs. X and Y as predictors for E.
Defaults to |
mtry |
Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to |
ntree |
Random forest parameter: Number of trees to grow. Defaults to 100. |
nodesize |
Random forest parameter: Minimum size of terminal nodes. Defaults to 5. |
maxnodes |
Random forest parameter: Maximum number of terminal nodes trees in the forest can have.
Defaults to |
permute |
Random forest parameter: If |
returnModel |
If |
A list with the following entries:
pvalue
The p-value for the null hypothesis that Y and E are independent given X.
model
The fitted models if returnModel = TRUE
.
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantEnvironmentPrediction(Y, as.factor(E), X) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantEnvironmentPrediction(Y, as.factor(E), X) # Example 3 E <- rnorm(n) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantEnvironmentPrediction(Y, E, X, test = wilcoxTestTargetY) InvariantEnvironmentPrediction(Y, X, E, test = wilcoxTestTargetY)
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantEnvironmentPrediction(Y, as.factor(E), X) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantEnvironmentPrediction(Y, as.factor(E), X) # Example 3 E <- rnorm(n) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantEnvironmentPrediction(Y, E, X, test = wilcoxTestTargetY) InvariantEnvironmentPrediction(Y, X, E, test = wilcoxTestTargetY)
Tests the null hypothesis that Y and E are independent given X.
InvariantResidualDistributionTest(Y, E, X, alpha = 0.05, verbose = FALSE, fitWithGam = TRUE, test = leveneAndWilcoxResidualDistributions, colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, returnModel = FALSE)
InvariantResidualDistributionTest(Y, E, X, alpha = 0.05, verbose = FALSE, fitWithGam = TRUE, test = leveneAndWilcoxResidualDistributions, colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, returnModel = FALSE)
Y |
An n-dimensional vector. |
E |
An n-dimensional vector. E needs to be a factor. |
X |
A matrix or dataframe with n rows and p columns. |
alpha |
Significance level. Defaults to 0.05. |
verbose |
If |
fitWithGam |
If |
test |
Unconditional independence test that tests whether residual distribution is
invariant across different levels of E. Defaults to |
colNameNoSmooth |
Gam parameter: Name of variables that should enter linearly into the model.
Defaults to |
mtry |
Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to |
ntree |
Random forest parameter: Number of trees to grow. Defaults to 100. |
nodesize |
Random forest parameter: Minimum size of terminal nodes. Defaults to 5. |
maxnodes |
Random forest parameter: Maximum number of terminal nodes trees in the forest can have.
Defaults to |
returnModel |
If |
A list with the following entries:
pvalue
The p-value for the null hypothesis that Y and E are independent given X.
model
The fitted model if returnModel = TRUE
.
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantResidualDistributionTest(Y, as.factor(E), X) InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantResidualDistributionTest(Y, as.factor(E), X) InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantResidualDistributionTest(Y, as.factor(E), X) InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantResidualDistributionTest(Y, as.factor(E), X) InvariantResidualDistributionTest(Y, as.factor(E), X, test = ksResidualDistributions)
Tests the null hypothesis that Y and E are independent given X.
InvariantTargetPrediction(Y, E, X, alpha = 0.05, verbose = FALSE, fitWithGam = TRUE, trainTestSplitFunc = caTools::sample.split, argsTrainTestSplitFunc = NULL, test = fTestTargetY, colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, permute = TRUE, returnModel = FALSE)
InvariantTargetPrediction(Y, E, X, alpha = 0.05, verbose = FALSE, fitWithGam = TRUE, trainTestSplitFunc = caTools::sample.split, argsTrainTestSplitFunc = NULL, test = fTestTargetY, colNameNoSmooth = NULL, mtry = sqrt(NCOL(X)), ntree = 100, nodesize = 5, maxnodes = NULL, permute = TRUE, returnModel = FALSE)
Y |
An n-dimensional vector. |
E |
An n-dimensional vector or an nxq dimensional matrix or dataframe. |
X |
A matrix or dataframe with n rows and p columns. |
alpha |
Significance level. Defaults to 0.05. |
verbose |
If |
fitWithGam |
If |
trainTestSplitFunc |
Function to split sample. Defaults to stratified sampling
using |
argsTrainTestSplitFunc |
Arguments for sampling splitting function. |
test |
Unconditional independence test that tests whether the out-of-sample
prediction accuracy is the same when using X only vs. X and E as predictors for Y.
Defaults to |
colNameNoSmooth |
Gam parameter: Name of variables that should enter linearly into the model.
Defaults to |
mtry |
Random forest parameter: Number of variables randomly sampled as
candidates at each split. Defaults to |
ntree |
Random forest parameter: Number of trees to grow. Defaults to 100. |
nodesize |
Random forest parameter: Minimum size of terminal nodes. Defaults to 5. |
maxnodes |
Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL. |
permute |
Random forest parameter: If |
returnModel |
If |
A list with the following entries:
pvalue
The p-value for the null hypothesis that Y and E are independent given X.
model
The fitted models if returnModel = TRUE
.
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantTargetPrediction(Y, as.factor(E), X) InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantTargetPrediction(Y, as.factor(E), X) InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY) # Example 3 E <- rnorm(n) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantTargetPrediction(Y, E, X) InvariantTargetPrediction(Y, X, E) InvariantTargetPrediction(Y, E, X, test = wilcoxTestTargetY) InvariantTargetPrediction(Y, X, E, test = wilcoxTestTargetY)
# Example 1 n <- 1000 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantTargetPrediction(Y, as.factor(E), X) InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) InvariantTargetPrediction(Y, as.factor(E), X) InvariantTargetPrediction(Y, as.factor(E), X, test = wilcoxTestTargetY) # Example 3 E <- rnorm(n) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) InvariantTargetPrediction(Y, E, X) InvariantTargetPrediction(Y, X, E) InvariantTargetPrediction(Y, E, X, test = wilcoxTestTargetY) InvariantTargetPrediction(Y, X, E, test = wilcoxTestTargetY)
Tests the null hypothesis that Y and E are independent given X. The distribution of the test statistic under the null hypothesis equals an infinite weighted sum of chi squared variables. This distribution can either be approximated by a gamma distribution or by a Monte Carlo approach. This version includes an implementation of choosing the hyperparameters by Gaussian Process regression.
KCI(Y, E, X, width = 0, alpha = 0.05, unbiased = FALSE, gammaApprox = TRUE, GP = TRUE, nRepBs = 5000, lambda = 0.001, thresh = 1e-05, numEig = NROW(Y), verbose = FALSE)
KCI(Y, E, X, width = 0, alpha = 0.05, unbiased = FALSE, gammaApprox = TRUE, GP = TRUE, nRepBs = 5000, lambda = 0.001, thresh = 1e-05, numEig = NROW(Y), verbose = FALSE)
Y |
A vector of length n or a matrix or dataframe with n rows and p columns. |
E |
A vector of length n or a matrix or dataframe with n rows and p columns. |
X |
A matrix or dataframe with n rows and p columns. |
width |
Kernel width; if it is set to zero, the width is chosen automatically (default: 0). |
alpha |
Significance level (default: 0.05). |
unbiased |
A boolean variable that indicates whether a bias correction should be applied (default: FALSE). |
gammaApprox |
A boolean variable that indicates whether the null distribution is approximated by a Gamma distribution. If it is FALSE, a Monte Carlo approach is used (default: TRUE). |
GP |
Flag whether to use Gaussian Process regression to choose the hyperparameters |
nRepBs |
Number of draws for the Monte Carlo approach (default: 500). |
lambda |
Regularization parameter (default: 1e-03). |
thresh |
Threshold for eigenvalues. Whenever eigenvalues are computed, they are set to zero if they are smaller than thresh times the maximum eigenvalue (default: 1e-05). |
numEig |
Number of eigenvalues computed (only relevant for computing the distribution under the hypothesis of conditional independence) (default: length(Y)). |
verbose |
If |
A list with the following entries:
testStatistic
the statistic Tr(K_(ddot(Y)|X) * K_(E|X))
criticalValue
the critical point at the p-value equal to alpha;
obtained by a Monte Carlo approach if gammaApprox = FALSE
, otherwise obtained by Gamma approximation.
pvalue
The p-value for the null hypothesis that Y and E are independent given X.
It is obtained by a Monte Carlo approach if gammaApprox = FALSE
,
otherwise obtained by Gamma approximation.
# Example 1 n <- 100 E <- rnorm(n) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) KCI(Y, E, X) KCI(Y, X, E)
# Example 1 n <- 100 E <- rnorm(n) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) KCI(Y, E, X) KCI(Y, X, E)
Used as a subroutine in InvariantResidualDistributionTest
to test whether residual distribution remains invariant across different levels
of E.
ksResidualDistributions(Y, predicted, E, verbose)
ksResidualDistributions(Y, predicted, E, verbose)
Y |
An n-dimensional vector. |
predicted |
An n-dimensional vector of predictions for Y. |
E |
An n-dimensional vector. |
verbose |
Set to |
A list with the p-value for the test.
Used as a subroutine in InvariantResidualDistributionTest
to test whether residual distribution remains invariant across different levels
of E.
leveneAndWilcoxResidualDistributions(Y, predicted, E, verbose)
leveneAndWilcoxResidualDistributions(Y, predicted, E, verbose)
Y |
An n-dimensional vector. |
predicted |
An n-dimensional vector of predictions for Y. |
E |
An n-dimensional vector. |
verbose |
Set to |
A list with the p-value for the test.
Used as a subroutine in InvariantEnvironmentPrediction
to test
whether out-of-sample performance is better when using X and Y as predictors for E,
compared to using X only.
propTestTargetE(E, predictedOnlyX, predictedXY, verbose)
propTestTargetE(E, predictedOnlyX, predictedXY, verbose)
E |
An n-dimensional vector. |
predictedOnlyX |
Predictions for E based on predictors in X only. |
predictedXY |
Predictions for E based on predictors in X and Y. |
verbose |
Set to |
A list with the p-value for the test.
Tests the null hypothesis that Y and E are independent given X.
ResidualPredictionTest(Y, E, X, alpha = 0.05, verbose = FALSE, degree = 4, basis = c("nystrom", "nystrom_poly", "fourier", "polynomial", "provided")[1], resid_type = "OLS", XBasis = NULL, noiseMat = NULL, getnoiseFct = function(n, ...) { rnorm(n) }, argsGetNoiseFct = NULL, nSim = 100, funcOfRes = function(x) { abs(x) }, useX = TRUE, returnXBasis = FALSE, nSub = ceiling(NROW(X)/4), ntree = 100, nodesize = 5, maxnodes = NULL)
ResidualPredictionTest(Y, E, X, alpha = 0.05, verbose = FALSE, degree = 4, basis = c("nystrom", "nystrom_poly", "fourier", "polynomial", "provided")[1], resid_type = "OLS", XBasis = NULL, noiseMat = NULL, getnoiseFct = function(n, ...) { rnorm(n) }, argsGetNoiseFct = NULL, nSim = 100, funcOfRes = function(x) { abs(x) }, useX = TRUE, returnXBasis = FALSE, nSub = ceiling(NROW(X)/4), ntree = 100, nodesize = 5, maxnodes = NULL)
Y |
An n-dimensional vector. |
E |
An n-dimensional vector or an nxq dimensional matrix or dataframe. |
X |
A matrix or dataframe with n rows and p columns. |
alpha |
Significance level. Defaults to 0.05. |
verbose |
If |
degree |
Degree of polynomial to use if |
basis |
Can be one of |
resid_type |
Can be |
XBasis |
Basis if |
noiseMat |
Matrix with simulated noise. Defaults to NULL in which case the simulation is performed inside the function. |
getnoiseFct |
Function to use to generate the noise matrix. Defaults to |
argsGetNoiseFct |
Arguments for |
nSim |
Number of simulations to use. Defaults to 100. |
funcOfRes |
Function of residuals to use in addition to predicting the
conditional mean. Defaults to |
useX |
Set to |
returnXBasis |
Set to |
nSub |
Number of random features to use if |
ntree |
Random forest parameter: Number of trees to grow. Defaults to 500. |
nodesize |
Random forest parameter: Minimum size of terminal nodes. Defaults to 5. |
maxnodes |
Random forest parameter: Maximum number of terminal nodes trees in the forest can have. Defaults to NULL. |
A list with the following entries:
pvalue
The p-value for the null hypothesis that Y and E are independent given X.
XBasis
Basis expansion if returnXBasis
was set to TRUE
.
fctBasisExpansion
Function used to create basis expansion if basis is not "provided"
.
# Example 1 n <- 100 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) ResidualPredictionTest(Y, as.factor(E), X) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) ResidualPredictionTest(Y, as.factor(E), X) # not run: # # Example 3 # E <- rnorm(n) # X <- 4 + 2 * E + rnorm(n) # Y <- 3 * (X)^2 + rnorm(n) # ResidualPredictionTest(Y, E, X) # ResidualPredictionTest(Y, X, E)
# Example 1 n <- 100 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * (X)^2 + rnorm(n) ResidualPredictionTest(Y, as.factor(E), X) # Example 2 E <- rbinom(n, size = 1, prob = 0.2) X <- 4 + 2 * E + rnorm(n) Y <- 3 * E + rnorm(n) ResidualPredictionTest(Y, as.factor(E), X) # not run: # # Example 3 # E <- rnorm(n) # X <- 4 + 2 * E + rnorm(n) # Y <- 3 * (X)^2 + rnorm(n) # ResidualPredictionTest(Y, E, X) # ResidualPredictionTest(Y, X, E)
Used as a subroutine in InvariantTargetPrediction
to test
whether out-of-sample performance is better when using X and E as predictors for Y,
compared to using X only.
wilcoxTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
wilcoxTestTargetY(Y, predictedOnlyX, predictedXE, verbose, ...)
Y |
An n-dimensional vector. |
predictedOnlyX |
Predictions for Y based on predictors in X only. |
predictedXE |
Predictions for Y based on predictors in X and E. |
verbose |
Set to |
... |
Argument to allow for coherent interface of fTestTargetY and wilcoxTestTargetY. |
A list with the p-value for the test.