Title: | Invariant Causal Prediction for Nonlinear Models |
---|---|
Description: | Performs 'nonlinear Invariant Causal Prediction' to estimate the causal parents of a given target variable from data collected in different experimental or environmental conditions, extending 'Invariant Causal Prediction' from Peters, Buehlmann and Meinshausen (2016), <arXiv:1501.01332>, to nonlinear settings. For more details, see C. Heinze-Deml, J. Peters and N. Meinshausen: 'Invariant Causal Prediction for Nonlinear Models', <arXiv:1706.08576>. |
Authors: | Christina Heinze-Deml, Jonas Peters |
Maintainer: | Christina Heinze-Deml |
License: | GPL |
Version: | 0.1.2.1 |
Built: | 2025-02-02 03:25:05 UTC |
Source: | https://github.com/christinaheinze/nonlinearicp-and-condindtests |
Nonlinear Invariant Causal Prediction
nonlinearICP(X, Y, environment, condIndTest = InvariantResidualDistributionTest, argsCondIndTest = NULL, alpha = 0.05, varPreSelectionFunc = NULL, argsVarPreSelectionFunc = NULL, maxSizeSets = ncol(X), condIndTestNames = NULL, speedUp = FALSE, subsampleSize = c(0.1, 0.25, 0.5, 0.75, 1), retrieveDefiningsSets = TRUE, seed = 1, stopIfEmpty = TRUE, testAdditionalSet = NULL, verbose = FALSE)
X |
An (n x p)-dimensional matrix (or data frame) with n observations of p variables. |
Y |
An (n x 1)-dimensional response vector. |
environment |
Environment variable(s) in an (n x k)-dimensional matrix or data frame. Note that some nonlinear conditional independence tests may not support more than one environment variable. |
condIndTest |
Function implementing a conditional independence test (see below for the required interface). Defaults to InvariantResidualDistributionTest. |
argsCondIndTest |
Arguments of condIndTest. Defaults to NULL. |
alpha |
Significance level to be used. Defaults to 0.05. |
varPreSelectionFunc |
Variable selection function that is applied to pre-select a set of variables before running the ICP procedure on the resulting subset. Should be used with care, as causal parents might be excluded in this step. Defaults to NULL. |
argsVarPreSelectionFunc |
Arguments of varPreSelectionFunc. Defaults to NULL. |
maxSizeSets |
Maximal size of sets considered as causal parents. Defaults to ncol(X). |
condIndTestNames |
Name of the conditional independence test, used for printing. Defaults to NULL. |
speedUp |
Use subsamples of the sizes specified in subsampleSize to speed up the procedure. Defaults to FALSE. |
subsampleSize |
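Sizes of the subsamples used when speedUp = TRUE, given as fractions of the sample size. Defaults to c(0.1, 0.25, 0.5, 0.75, 1). |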
retrieveDefiningsSets |
Boolean variable to indicate whether defining sets should be retrieved. Defaults to TRUE. |
seed |
Random seed. |
stopIfEmpty |
Stop the ICP procedure if the retrieved set is empty. Defaults to TRUE. |
testAdditionalSet |
If a particular set should be tested, the corresponding indices can be provided via this argument. |
verbose |
Boolean variable to indicate whether messages should be printed. |
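Since maxSizeSets caps the size of the candidate sets, it directly controls how many sets have to be tested: for p variables and a cap m, there are choose(p, 0) + ... + choose(p, m) candidates. The helper below is a hypothetical illustration (nCandidateSets is not part of the package) and assumes that the empty set is among the tested candidates.

```r
# Hypothetical helper: number of candidate sets for p variables when set
# sizes are capped at maxSizeSets (assumes the empty set is also tested).
nCandidateSets <- function(p, maxSizeSets = p) {
  sum(choose(p, 0:min(maxSizeSets, p)))
}

nCandidateSets(10)     # 1024, i.e. 2^10 (all subsets)
nCandidateSets(10, 2)  # 1 + 10 + 45 = 56
```

With the default maxSizeSets = ncol(X), the count grows as 2^p, which is why lowering maxSizeSets or using varPreSelectionFunc can matter for larger p.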
The function provided as condIndTest needs to take the following arguments in the given order: Y, environment, X, alpha, verbose. Additional arguments can then be provided via argsCondIndTest.
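A minimal sketch of a user-supplied test with that argument order follows. The name linearResidualTest is hypothetical; the linear fit and the Kruskal-Wallis (location) plus Fligner-Killeen (scale) comparison of residuals across environments are simplifications chosen for illustration, not the internals of InvariantResidualDistributionTest. Returning a list with a pvalue element is also an assumption, since the return value is not documented here.

```r
# Hypothetical custom test using the argument order nonlinearICP expects:
# Y, environment, X, alpha, verbose.
linearResidualTest <- function(Y, environment, X, alpha = 0.05, verbose = FALSE) {
  X <- as.matrix(X)
  env <- as.factor(environment)
  # Regress Y on the candidate set (linear fit, for illustration only);
  # an empty candidate set corresponds to an intercept-only model
  fit <- if (ncol(X) == 0) lm(Y ~ 1) else lm(Y ~ X)
  res <- residuals(fit)
  # Compare residual location and scale across environments
  pLoc <- kruskal.test(res, env)$p.value
  pScale <- fligner.test(res, env)$p.value
  # Bonferroni correction over the two tests
  pvalue <- min(1, 2 * min(pLoc, pScale))
  if (verbose) cat("p-value:", pvalue, "\n")
  # Assumed return format: a list with a 'pvalue' element
  list(pvalue = pvalue)
}
```

Under those assumptions it could be passed as nonlinearICP(X, Y, E, condIndTest = linearResidualTest); a linear fit will of course miss nonlinear dependencies that the package's nonlinear tests are designed to capture.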
A list with the following elements:
retrievedCausalVars
Indices of the variables (columns of X) retrieved as causal parents of Y, i.e. the intersection of all accepted sets.
acceptedSets
List of accepted sets.
definingSets
List of defining sets.
acceptedModels
List of accepted models, if specified in argsCondIndTest.
pvalues.accepted
P-values of accepted sets.
rejectedSets
List of rejected sets.
pvalues.rejected
P-values of rejected sets.
settings
Settings provided to nonlinearICP.
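The relation between acceptedSets, rejectedSets and retrievedCausalVars follows the ICP idea of Peters, Buehlmann and Meinshausen: every candidate set is tested, and the estimate is the intersection of the sets that are not rejected. The sketch below (icpSketch is a hypothetical name, not the package's implementation) shows only this core step; it omits defining sets, speedUp, variable pre-selection and stopIfEmpty, and it assumes the test returns a list with a pvalue element.

```r
# Simplified sketch of the ICP set search: test every subset of columns of X
# up to size maxSizeSets and intersect the subsets that are not rejected.
icpSketch <- function(X, Y, environment, test, alpha = 0.05,
                      maxSizeSets = ncol(X)) {
  X <- as.matrix(X)
  p <- ncol(X)
  # Candidate sets: the empty set plus all subsets of size 1..maxSizeSets
  sets <- list(integer(0))
  for (k in seq_len(min(maxSizeSets, p))) {
    sets <- c(sets, combn(p, k, simplify = FALSE))
  }
  # Assumed test interface: (Y, environment, X, alpha, verbose) -> list(pvalue)
  pvals <- vapply(sets, function(S) {
    test(Y, environment, X[, S, drop = FALSE], alpha, FALSE)$pvalue
  }, numeric(1))
  accepted <- sets[pvals > alpha]
  # ICP estimate: intersection of all accepted sets (empty if none accepted)
  causal <- if (length(accepted) == 0) integer(0) else Reduce(intersect, accepted)
  list(retrievedCausalVars = causal,
       acceptedSets = accepted,
       rejectedSets = sets[pvals <= alpha])
}
```

For example, if exactly the sets containing variable 1 are accepted, the intersection, and hence the estimate, is variable 1 alone.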
Please cite C. Heinze-Deml, J. Peters and N. Meinshausen: "Invariant Causal Prediction for Nonlinear Models", arXiv:1706.08576.
The function CondIndTest from the package CondIndTests is a wrapper for a variety of nonlinear conditional independence tests that can be used in condIndTest.
# Example 1
library(nonlinearICP)
require(CondIndTests)
data("simData")
targetVar <- 2
# choose environments where we did not intervene on targetVar
useEnvs <- which(simData$interventionVar[, targetVar] == 0)
ind <- is.element(simData$environment, useEnvs)
X <- simData$X[ind, -targetVar]
Y <- simData$X[ind, targetVar]
E <- as.factor(simData$environment[ind])
result <- nonlinearICP(X = X, Y = Y, environment = E)
# the retrieved indices refer to columns of X, i.e. after dropping targetVar
cat(paste("Variable", result$retrievedCausalVars,
          "was retrieved as the causal parent of target variable", targetVar))

###################################################
# Example 2
E <- rep(c(1, 2), each = 500)
X1 <- E + 0.1 * rnorm(1000)
X1 <- rnorm(1000)  # overwrites the line above, so X1 does not depend on E
X2 <- X1 + E^2 + 0.1 * rnorm(1000)
Y <- X1 + X2 + 0.1 * rnorm(1000)
resultnonlinICP <- nonlinearICP(cbind(X1, X2), Y, as.factor(E))
summary(resultnonlinICP)
Example dataset for tests
data("simData")
A list with the following entries
X
Data frame with 500 observations of three variables.
environment
A vector of length 500, indicating which environment the observations belong to.
interventionVar
A matrix of dimension 6 (no. of environments) x 3 (no. of variables), where entry i,j indicates whether variable j was intervened on in environment i.
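The structure of simData can be mirrored with a small made-up list (the values below are hypothetical toy data, not the shipped dataset), which also shows how Example 1 keeps only observations from environments in which the target variable was not intervened on.

```r
# Toy stand-in with the documented shapes of simData (values are made up).
set.seed(42)
toy <- list(
  X = as.data.frame(matrix(rnorm(500 * 3), ncol = 3)),
  environment = rep(1:6, length.out = 500),
  interventionVar = rbind(c(0, 0, 0), c(1, 0, 0), c(0, 1, 0),
                          c(0, 0, 1), c(1, 1, 0), c(0, 1, 1))
)
targetVar <- 2
# Environments (rows of interventionVar) where targetVar was not intervened on
useEnvs <- which(toy$interventionVar[, targetVar] == 0)  # 1, 2, 4 here
ind <- toy$environment %in% useEnvs
Xsub <- toy$X[ind, -targetVar]  # remaining variables as predictors
Ysub <- toy$X[ind, targetVar]   # target variable as response
```

Selecting environments this way relies on the environment labels being the row indices of interventionVar, as in the toy list above.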
Summary functions for 'nonlinICP.class' objects.
## S3 method for class 'nonlinICP.class'
summary(object, ...)
object |
Object of class 'nonlinICP.class'. |
... |
Additional inputs to generic summary function (not used). |
Christina Heinze-Deml and Jonas Peters
Variable selection function that can be provided to nonlinearICP; it is then applied to pre-select a set of variables before running the ICP procedure on this subset. Here, the variable selection is based on random forest variable importance measures.
varSelectionRF(X, Y, env, verbose, nSelect = sqrt(ncol(X)), useMtry = sqrt(ncol(X)), ntree = 100)
X |
A (nxp)-dimensional matrix (or data frame) with n observations of p variables. |
Y |
Response vector (n x 1). |
env |
Indicator of the experiment or the intervention type an observation belongs to. A numeric vector of length n. Has to contain at least two different unique values. |
verbose |
Boolean variable to indicate whether messages should be printed. |
nSelect |
Number of variables to select. Defaults to sqrt(ncol(X)). |
useMtry |
Random forest parameter mtry, the number of variables randomly sampled as split candidates. Defaults to sqrt(ncol(X)). |
ntree |
Random forest parameter ntree, the number of trees to grow. Defaults to 100. |
A vector containing the indices of the selected variables.
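The following is a rough sketch of importance-based pre-selection in the spirit of varSelectionRF. It is an assumption, not the package's code: it uses the randomForest package's permutation importance (%IncMSE for a numeric response) and keeps the floor(nSelect) most important columns; the name rfSelectSketch and the rounding of nSelect and useMtry are hypothetical choices.

```r
# Hypothetical sketch of random-forest-based variable pre-selection
# (assumed approach; requires the randomForest package).
rfSelectSketch <- function(X, Y, nSelect = sqrt(ncol(X)),
                           useMtry = sqrt(ncol(X)), ntree = 100) {
  X <- as.data.frame(X)
  fit <- randomForest::randomForest(x = X, y = Y,
                                    mtry = max(1, floor(useMtry)),
                                    ntree = ntree, importance = TRUE)
  # Permutation importance (%IncMSE for a numeric response)
  imp <- randomForest::importance(fit, type = 1)[, 1]
  # Number of variables to keep (rounding down is an assumption)
  k <- min(ncol(X), max(1, floor(nSelect)))
  # Indices of the k most important columns, in increasing order
  sort(order(imp, decreasing = TRUE)[seq_len(k)])
}
```

As the description of varPreSelectionFunc warns, any such filter can drop a true causal parent, after which ICP can no longer retrieve it.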
# Example 1
library(nonlinearICP)
require(CondIndTests)
data("simData")
targetVar <- 2
# choose environments where we did not intervene on targetVar
useEnvs <- which(simData$interventionVar[, targetVar] == 0)
ind <- is.element(simData$environment, useEnvs)
X <- simData$X[ind, -targetVar]
Y <- simData$X[ind, targetVar]
E <- as.factor(simData$environment[ind])
chosenIdx <- varSelectionRF(X = X, Y = Y, env = E, verbose = TRUE)
cat(paste("Variable(s)", paste(chosenIdx, collapse = ", "), "was/were chosen."))