\item{score}{a character string giving which network score should be used to sample the DAG landscape.}

\item{data.dists}{a named list giving the distribution for each node in the network, see details.}

\item{max.parents}{a constant giving the maximum number of parents allowed.}

\item{mcmc.scheme}{a sampling scheme, given as a vector with, in this order: the number of returned DAGs, the number of thinned steps, and the length of the burn-in phase.}

\item{seed}{a non-negative integer which sets the seed.}

\item{verbose}{extra output, see output for details.}

\item{start.dag}{a DAG given as a matrix (see details for format), which can be used to explicitly provide a starting point for the structural search. Alternatively, the character string "random" will select a random DAG as a starting point, and "hc" will call a hill-climber to select a starting DAG.}

\item{prior.dag}{a user-defined prior, given as a matrix with entries ranging from zero to one; 0.5 is non-informative for the given arc.}

\item{prior.lambda}{a hyperparameter representing the strength of belief in the user-defined prior.}

\item{prob.rev}{probability of selecting a new edge reversal.}

\item{prob.mbr}{probability of selecting a Markov blanket resampling move.}

\item{heating}{a positive real number that heats up the chain if between zero and one, and applies an exponential decrease scheme if larger than one. The default is one. See details.}

\item{prior.choice}{an integer, 1, 2 or 3, where 1 is a uniform structural prior, 2 uses a weighted prior, and 3 a user-defined prior; see details.}

}

...

...


This function is a structural Markov chain Monte Carlo Model Choice ((MC)^3) sampler, equipped with two large-scale MCMC moves designed to accelerate chain mixing.

}

\details{The procedure runs a structural Markov chain Monte Carlo Model Choice ((MC)^3) sampler to find the most probable posterior network (DAG). The default algorithm is based on three MCMC moves: edge addition, edge deletion, and edge reversal. This algorithm, known as (MC)^3, is known to mix slowly and to get stuck in low-probability regions; indeed, changing Markov equivalence region often requires multiple MCMC moves. Two large-scale MCMC moves are therefore implemented, whose relative frequencies can be set by the user: the new edge reversal move (REV) from Grzegorczyk and Husmeier (2008) and the Markov blanket resampling move (MBR) from Su and Borsuk (2016). The classical reversal move depends on the global configuration of the parents and children and fails to propose MCMC jumps that produce valid but very different DAGs in a single move. The REV move samples a new set of parents globally; the MBR move applies the same idea to the entire Markov blanket of a randomly chosen node.

The classical (MC)^3 is unbiased but mixes inefficiently. The two radical alternative MCMC moves are known to accelerate mixing considerably without introducing bias, but they are computationally expensive, so low frequencies are advised. The REV move is not necessarily ergodic, so it should not be used alone.

The parameter \code{start.dag} can be \code{"random"}, \code{"hc"}, or user-defined. If \code{"random"}, a random valid DAG is selected; the routine favours low-density structures. If \code{"hc"} (for hill-climber; see \link[abn:search_heuristic]{searchHeuristic}), a DAG is selected using 100 different searches with 500 optimization steps each. A user-defined DAG should be a named square matrix containing only zeros and ones, and it should be valid (i.e. acyclic).

The parameter \code{prior.choice} determines the prior used within each node for a given choice of parent combination. In Koivisto and Sood (2004), p.554, a form of prior is used which assumes that the prior probabilities for parent combinations comprising the same number of parents are all equal. Specifically, the prior probability for a parent set G with cardinality |G| is proportional to 1/[n-1 choose |G|], where there are n total nodes. Note that this favours parent combinations with either very low or very high cardinality, which may not be appropriate. This prior is used when \code{prior.choice=2}. When \code{prior.choice=1}, an uninformative prior is used where parent combinations of all cardinalities are equally likely. When \code{prior.choice=3}, a user-defined prior is used, defined by \code{prior.dag}: a square adjacency matrix (with dimension equal to the number of nodes) whose entries, ranging from zero to one, give the user's prior belief per arc. A hyperparameter defining the global strength of belief in this prior is given by \code{prior.lambda}.

MCMC samplers come with asymptotic statistical guarantees only. Therefore it is highly advised to run multiple, sufficiently long chains. The burn-in phase length (i.e. the number of initial MCMC iterations thrown away) should be adequately chosen.

The argument \code{data.dists} must be a list with named arguments, one for each of the variables in \code{data.df}, where each entry is either \code{"poisson"}, \code{"binomial"}, or \code{"gaussian"}.
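For illustration, for three hypothetical nodes A, B and C (the names are placeholders, not data shipped with the package), such a list could be written as:

```r
# A named list of distributions, one entry per node in the data.
# Node names A, B and C are illustrative only.
dists <- list(A = "binomial",
              B = "gaussian",
              C = "poisson")
```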

The parameter \code{heating} can improve convergence. It should be a positive real number. If smaller than one, it is a tuning parameter which transforms the score by raising it to this power; one is neutral, and the smaller the value, the more probable it is that any move is accepted. If larger than one, it indicates the number of returned steps over which an exponentially decreasing heating scheme is applied; after this number of steps, \code{heating} is set to one.

}

\value{A list with entries for: the list of sampled DAGs, the list of scores, the acceptance probability, the method used for each MCMC jump, the rejection status for each MCMC jump, the total number of iterations, the thinning, the length of the burn-in phase, the named list of distributions per node, and the heating parameter. The returned object is of class \code{mcmcabn}.}

\author{Gilles Kratzer}

...

...

Scutari, M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3), 1-22.

## Example from the asia dataset from Lauritzen and Spiegelhalter (1988)

## provided by Scutari (2010)

# The number of MCMC runs is deliberately chosen too small (computing time)

# no thinning (usually not recommended)

# no burn-in (usually not recommended,

# even if not supported by any theoretical arguments)
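A call matching the comments above could look as follows. This is a sketch only: \code{bsc.compute} stands for a pre-computed score cache (e.g. from \code{abn::buildScoreCache()}), \code{dist} for the named list of distributions, and the score name \code{"mlik"} is an assumption; none of these objects are created in this snippet.

```r
# Sketch: deliberately short run, no thinning, no burn-in.
# 'bsc.compute' and 'dist' are assumed to exist (score cache and
# named distribution list); they are not defined here.
mcmc.out <- mcmcabn(score.cache = bsc.compute,
                    score = "mlik",
                    data.dists = dist,
                    max.parents = 2,
                    mcmc.scheme = c(100, 0, 0),  # 100 DAGs, no thinning, no burn-in
                    seed = 42,
                    verbose = FALSE,
                    start.dag = "random",
                    prob.rev = 0.03,
                    prob.mbr = 0.03,
                    prior.choice = 2)
```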

\title{List of files to reproduce the examples of the \code{mcmcabn} library.}

\description{10^5 MCMC runs with 1000 burn-in runs from the asia synthetic dataset from Lauritzen and Spiegelhalter (1988) provided by Scutari (2010). A named list of distributions and pre-computed scores.}

\description{Generic function to plot \code{mcmcabn} objects.

}

\details{The plot function for \code{mcmcabn} objects is based on the \pkg{ggplot2}, \pkg{ggpubr} and \pkg{cowplot} packages. By default, it returns a trace plot with coloured points when the MBR and REV methods have been used. It displays, on the right, histograms of the densities of the (MC)^3, MBR and REV MCMC jumps respectively.}

\details{The query can be formulated using an adjacency matrix or a formula-wise expression.

The adjacency matrix should be square, with dimension equal to the number of nodes in the network. Its entries should be either 1, 0, or -1: 1 indicates a requested arc, -1 an excluded arc, and 0 an entry that is not part of the query. Each row indicates the set of parents of the corresponding node. The order of rows and columns should be the same as the one used in the \code{data.dists} argument of the \code{mcmcabn()} function.
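As a sketch, for three hypothetical nodes, a matrix requesting the arc parent1 -> node1 while excluding parent1 -> node2 could be built as follows (node names are placeholders):

```r
# Hypothetical adjacency-matrix query:
# 1 = requested arc, -1 = excluded arc, 0 = not part of the query.
# Each row holds the parent set of that row's node.
nodes <- c("node1", "node2", "parent1")
q <- matrix(0L, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))
q["node1", "parent1"] <- 1L    # request the arc parent1 -> node1
q["node2", "parent1"] <- -1L   # exclude the arc parent1 -> node2
```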

The formula statement has been designed to ease querying over the MCMC sample. It allows the user to make complex queries without explicitly writing an adjacency matrix (which can be painful when the number of variables is large). The formula argument is typically provided as, for example: \code{~ node1|parent1:parent2 + node2:node3|parent3}. The formula statement has to start with `~`. In this example, node1 has two parents (parent1 and parent2), while node2 and node3 share the same parent3. The parent names have to match the node names exactly. `:` is the separator between either children or parents, `|` separates children (left side) and parents (right side), `+` separates terms, and `.` replaces all the variables. Additionally, to exclude an arc, simply put `-` in front of that statement; a formula like \code{~ -node1|parent1} then excludes all DAGs that have an arc between parent1 and node1.

If the formula argument is not provided, the function returns the average support of all individual arcs as a named matrix.

}
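Assuming an \code{mcmcabn} object \code{out} and the node names used in the example above, the formula interface could be used along these lines (a sketch; the argument names of the package's \code{query()} function are assumptions here):

```r
# Sketch: formula-wise queries over the MCMC sample 'out'
# (an mcmcabn object assumed to exist; node names are illustrative).

# posterior support for node1 having parent1 and parent2 as parents
query(mcmcabn = out, formula = ~ node1 | parent1:parent2)

# support for structures WITHOUT the arc parent1 -> node1
query(mcmcabn = out, formula = ~ -node1 | parent1)

# no formula: named matrix of arc-wise average support
query(mcmcabn = out)
```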

\value{A frequency for the requested query. Alternatively, a matrix with arc-wise frequencies.}

\details{The summary function for \code{mcmcabn} objects returns multiple summary metrics for assessing the quality of the MCMC run. Thinning is the number of thinned MCMC steps for one MCMC returned.}

\value{This method prints: the number of burn-in steps, the number of MCMC steps, the thinning, the maximum achieved score, the empirical mean of the MCMC samples, the empirical standard deviation of the MCMC samples, the user-defined quantiles of the posterior network score, the global acceptance rate, a table of accepted and rejected moves as a function of the method used, the sample size adjusted for autocorrelation, and the autocorrelations by lag.}

title: "Advances with mcmcabn: a structural MCMC sampler for DAGs learned from observed systemic datasets"

author: "Gilles Kratzer, Reinhard Furrer"

date: "2019-10-14"

---

...

...


# Visualizing DAGs diversity landscape

The main incentive to perform structural MCMC is the belief that presenting a single best-scoring DAG is rudimentary, given the number of quasi-equally scoring DAGs at the top of the posterior distribution. The code below allows us to display this diversity.

```{r}

# renaming columns of the dataset

...

...

Let us plot the MCMC run:

plot(out)

```

Let us plot the diversity plot, where the x1-axis is the number of arcs of each structure, the x2-axis is the structural Hamming distance (a proxy for how different two DAGs are), the y1-axis is the occurrence of the structures, and the y2-axis is the scores. As one can see, half of the posterior distribution is represented by one DAG.

```{r}

plot(1:nb.dag, sort(tab, decreasing = TRUE)[1:nb.dag], type = "n", ylab = "", xlab = "Number of arcs", xaxt = "n", yaxt = "n", ylim = c(0, 2000))