
DAG with No Tears

NOTE: This post assumes you are acquainted with graphical models.

Link to the paper


Motivation

One of the challenging problems in graphical models is estimating the structure of a Bayesian network (a DAG) from data. This is called structure learning: the goal is to construct a Directed Acyclic Graph (DAG) that fits the data. Until now, this problem has been treated as a combinatorial optimisation (due to the acyclicity constraint), and thus most solutions involve some kind of search over the space of DAGs.

Novelty

The authors propose a method which essentially converts this combinatorial problem into a continuous one. The key contribution is characterising the acyclicity constraint as a smooth function, which turns the problem into a standard constrained optimisation problem.

Problem Formulation

Given: $n$ $d$-dimensional data points $\{x_{i}\}_{i=1}^{n}$, or equivalently a matrix $X_{n \times d}$.

The DAG can be represented by a weighted adjacency matrix $W$. So, our

Aim: To fill in a matrix $W_{d \times d}$ such that it represents a DAG.

Objective:

$$\min_{W \in R^{d \times d}} F(W)$$

$$\text{subject to } h(W) = 0$$

Here, $F(W)$ is a score function measuring how well $W$ fits the data, and $h(W)$ is a smooth function whose zeros are exactly the acyclic graphs.

Acyclicity Constraint

This is probably the most important part of the paper. They provide a nice closed-form expression for the function $h$:

$$h(W) = \operatorname{tr}\left(e^{W \circ W}\right) - d$$

And $h(W) = 0$ if and only if $W$ is the adjacency matrix of a DAG.

Where $\circ$ denotes the Hadamard (element-wise) product, $e^{M}$ is the matrix exponential, $\operatorname{tr}$ is the trace, and $d$ is the number of nodes.
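The constraint is easy to check numerically. Here is a minimal NumPy/SciPy sketch of $h$ (and its gradient, which the paper also derives as $\nabla h(W) = (e^{W \circ W})^{T} \circ 2W$), evaluated on a small acyclic graph and a small cyclic one; the example matrices are my own, not from the paper:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

def h(W):
    """Acyclicity measure from the paper: tr(e^{W∘W}) - d.
    Equals zero exactly when W has no directed cycles."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # W * W is the Hadamard product

def grad_h(W):
    """Gradient of h, as given in the paper: (e^{W∘W})^T ∘ 2W."""
    return expm(W * W).T * (2 * W)

# Acyclic example: edges 0→1 and 1→2 (strictly upper-triangular ⇒ DAG)
W_dag = np.array([[0.0, 1.5,  0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0,  0.0]])

# Cyclic example: 0→1 and 1→0 form a 2-cycle
W_cyc = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])

print(h(W_dag))  # ≈ 0 (acyclic)
print(h(W_cyc))  # > 0 (cyclic)
```

Note that squaring the entries ($W \circ W$) makes $h$ insensitive to edge signs: only the *presence* of a weighted walk matters, and the trace of the matrix exponential counts weighted closed walks of all lengths.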

Score Function and Optimisation

Work in progress
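For reference, the paper instantiates $F$ as a least-squares loss plus an $\ell_1$ penalty, $F(W) = \frac{1}{2n}\lVert X - XW \rVert_F^2 + \lambda \lVert W \rVert_1$. A minimal NumPy sketch (the value of $\lambda$ and the random data here are arbitrary choices of mine):

```python
import numpy as np

def F(W, X, lam=0.1):
    """Least-squares score with an L1 penalty:
    F(W) = (1 / 2n) * ||X - XW||_F^2 + lam * ||W||_1."""
    n = X.shape[0]
    loss = 0.5 / n * np.sum((X - X @ W) ** 2)
    return loss + lam * np.abs(W).sum()

# Tiny sanity check on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
W0 = np.zeros((3, 3))
print(F(W0, X))  # with W = 0 this reduces to the scaled squared norm of X
```

With this $F$ and the smooth $h$ above, the constrained problem can be attacked with standard machinery (the paper uses an augmented Lagrangian scheme), rather than a search over discrete DAG structures.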

To-Do

