This appendix summarises the usage of the two main R libraries discussed in this book: ggm and lavaan.
The ggm (graphical Gaussian model) library
There is no R library that is dedicated to the d-sep test and its generalisations. However, the ggm library of R does have a number of useful functions that can help you. First, the basiSet() function can always be used to obtain the union basis set of d-separation claims that lies at the heart of the d-sep test. This function requires that you specify the DAG, and this is done using the DAG() function. The dSep() function is not required for the d-sep test, but it lets you query whether two nodes are d-separated given a conditioning set, which makes it a very useful little function for testing your understanding of d-separation. The shipley.test() function implements the d-sep test for the special case in which your DAG involves only normally distributed variables and linear relationships.
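As a rough sketch of how these four functions fit together, the following assumes that the ggm library is installed. The four-variable DAG, the variable names x1 to x4, the path coefficients and the simulated data are all hypothetical and serve only as an illustration.

```r
library(ggm)

# Hypothetical DAG: x1 -> x2, x1 -> x3, x2 -> x4, x3 -> x4
my.dag <- DAG(x2 ~ x1, x3 ~ x1, x4 ~ x2 + x3)

# Basis set of d-separation claims implied by the DAG.
# Each element names a pair of non-adjacent nodes followed by the conditioning set.
basis <- basiSet(my.dag)
print(basis)

# Query individual d-separation claims
sep.given.x1 <- dSep(my.dag, first = "x2", second = "x3", cond = "x1")
sep.given.x1.x4 <- dSep(my.dag, first = "x2", second = "x3", cond = c("x1", "x4"))

# Simulate normally distributed data with linear relationships
# that follow the hypothetical DAG (coefficients are arbitrary)
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- 0.5 * x1 + rnorm(n)
x4 <- 0.5 * x2 + 0.5 * x3 + rnorm(n)
sim.data <- data.frame(x1, x2, x3, x4)

# d-sep test: needs the DAG, the covariance matrix and the sample size
result <- shipley.test(my.dag, S = cov(sim.data), n = n)
print(result)
```

In this sketch x2 and x3 share the common cause x1, so they should be d-separated given x1 but not once the common effect x4 is added to the conditioning set. The row and column names of the covariance matrix must match the node names used in the DAG.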
DAG(…, order = FALSE)
The R function DAG() is used for inputting a directed acyclic graph. The output of this function is the adjacency matrix of the DAG, i.e. a square Boolean matrix of order equal to the number of nodes of the graph, with a 1 in position (i,j) if there is an arrow from i to j and a zero otherwise. The row names of the adjacency matrix are the nodes of the DAG.
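To make the layout of such an adjacency matrix concrete, here is a minimal base R sketch that builds one by hand for a hypothetical three-node chain X→Y→Z. It does not call DAG() itself; it only illustrates the convention that rows are the "from" nodes and columns are the "to" nodes.

```r
# Hand-built adjacency matrix for the hypothetical chain X -> Y -> Z
nodes <- c("X", "Y", "Z")
amat <- matrix(0, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))
amat["X", "Y"] <- 1  # arrow from X to Y
amat["Y", "Z"] <- 1  # arrow from Y to Z
print(amat)

# The parents of a node are the rows holding a 1 in that node's column
parents.of.Z <- rownames(amat)[amat[, "Z"] == 1]

# The children of a node are the columns holding a 1 in that node's row
children.of.X <- colnames(amat)[amat["X", ] == 1]
```

Reading down a column gives the direct causes of a node and reading along a row gives its direct effects.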
Arguments
…= a sequence of model formulae, using the regression (~) operator. For each formula, the left-hand response defines an effect node (a child in the DAG) and the right-hand explanatory variables define the parents of that node. If the regressions are not recursive (i.e. if there is a feedback loop in the DAG) then the function returns an error message.
order = logical, defaulting to FALSE. If TRUE then nodes of the DAG are permuted according to the topological order. If FALSE then nodes are in the order they first appear in the model formulae (from left to right). This argument is used for purely aesthetic reasons.
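The effect of the order argument can be sketched as follows, assuming the ggm library is installed. The nodes X, Y and Z are hypothetical, and the formulae are deliberately given with the final effect first so that the two orderings differ.

```r
library(ggm)

# Same hypothetical chain X -> Y -> Z, with the formulae given "backwards"
dag.as.typed <- DAG(Z ~ Y, Y ~ X)                # order = FALSE (default)
dag.sorted   <- DAG(Z ~ Y, Y ~ X, order = TRUE)  # topological order

# Node order in each adjacency matrix
print(rownames(dag.as.typed))
print(rownames(dag.sorted))

# Both matrices describe the same arrows; only the row/column order differs
same.edges <- all(
  dag.as.typed[rownames(dag.sorted), colnames(dag.sorted)] == dag.sorted
)
```

With order = TRUE every parent precedes its children, so all the 1s sit above the diagonal, which can make a larger matrix easier to read.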
Consider this simple causal chain model: X→Y→Z. To input this model as a DAG you would specify My.DAG <- DAG(Y ~ X, Z ~ Y). Note that this uses the same syntax as when specifying linear models in R. For each child node (dependent variable, endogenous variable) you specify dependent variable ~ parent variable1 + parent variable2, etc.
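The chain model above can be entered and inspected as in the following sketch, which assumes the ggm library is installed. The second model, in which Z has two parents, is a hypothetical addition to show the parent1 + parent2 form.

```r
library(ggm)

# The causal chain X -> Y -> Z
My.DAG <- DAG(Y ~ X, Z ~ Y)
print(My.DAG)

# Check individual arrows: rows are causes, columns are effects
x.causes.y <- My.DAG["X", "Y"] == 1   # direct arrow X -> Y
y.causes.z <- My.DAG["Y", "Z"] == 1   # direct arrow Y -> Z
x.causes.z <- My.DAG["X", "Z"] == 1   # no direct arrow X -> Z

# Hypothetical model in which Z has two parents: X -> Z <- Y
Two.Parents <- DAG(Z ~ X + Y)
print(Two.Parents)
```

Exogenous nodes such as X in the chain never appear on the left-hand side of a formula; they enter the DAG only as parents.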