
Edited by: Michael Wibral, Goethe University Frankfurt, Germany

Reviewed by: Robin A. A. Ince, University of Manchester, UK; Raul Vicente, Max Planck Society, Germany

Specialty section: This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The behavior of many real-world phenomena can be modeled by non-linear dynamical systems whereby a latent system state is observed through a filter. We are interested in interacting subsystems of this form, which we model by a set of coupled maps as a synchronous update graph dynamical system. Specifically, we study the structure learning problem for spatially distributed dynamical systems coupled via a directed acyclic graph. Unlike established structure learning procedures that find locally maximum posterior probabilities of a network structure containing latent variables, our work exploits the properties of dynamical systems to compute globally optimal approximations of these distributions. We arrive at this result by the use of time delay embedding theorems. Taking an information-theoretic perspective, we show that the log-likelihood has an intuitive interpretation in terms of information transfer.

Complex systems are broadly defined as systems that comprise interacting non-linear components (Boccaletti et al.).

The structure learning problem for distributed dynamical systems is a precursor to inference in systems that are not fully observable. This case encompasses many practical problems of known artificial, biological, and chemical systems, such as neural networks (Lizier et al.).

In this paper, we exploit the properties of discrete-time multivariate dynamical systems in inferring coupling between latent variables in a DAG. Specifically, the main focus of this paper is to analytically derive a measure (score) for evaluating the fitness of a candidate DAG, given data. We assume the data are generated by a certain family of multivariate dynamical systems and are thus able to overcome the issue of latent variables faced by established structure learning algorithms. That is, under certain assumptions on the dynamical system, we are able to employ time delay embedding theorems (Stark et al.) to reconstruct the latent states from observed data.

Our main result is a tractable form of the log-likelihood function for synchronous GDSs. Using this result, we are able to directly compute information-criterion scores for candidate structures.

We are interested in classes of systems whereby dynamical units are coupled via a graph structure. These types of systems have been studied under several names, including complex dynamical networks (Boccaletti et al.).

BN structure learning comprises two subproblems: scoring a candidate graph given data, and searching the space of graphs for the highest-scoring structure.

In non-linear time series analysis, the problem of inferring coupling strength and causality in complex systems has received significant attention recently (Schreiber).

A recent approach to inferring causality is convergent cross-mapping (CCM), which is based on Takens’ theorem (Takens).

This section summarizes relevant technical concepts used throughout the paper. First, a stochastic temporal process X is a sequence of random variables X_1, X_2, …, X_N with realizations x_1, x_2, …, x_N; the variable X^i_n denotes the i-th component of the process at time n.

DBNs are a general graphical representation of a temporal model, representing a probability distribution over infinite trajectories of random variables (X_1, X_2, …) compactly (Friedman et al.), under the assumption that the distribution of X_n is stationary and Markovian in n.

DBNs B = (B_1, B_→) extend the BN to model temporal processes and comprise two parts: the prior BN B_1 = (G_1, Θ_1), which defines the joint distribution over the initial variables X_1, and the transition BN B_→ = (G_→, Θ_→), which defines a first-order Markov process P(X_{n+1} | X_n), where G_→ forms a DAG. The 2TBN probability distribution factorizes according to G_→, with a local CPD for each variable. The dataset D comprises realizations x_1, x_2, …, x_N of the variables X_1, X_2, …, X_N.

Embedding theory refers to methods from differential topology for inferring the (hidden) state of a dynamical system from a reconstructed sequence of observations. The state of a discrete-time dynamical system is given by a point x_n on a manifold M; the state evolves according to a map f, x_{n+1} = f(x_n), and is observed through a measurement function ψ as y_n = ψ(x_n). From a sequence of such observations, the delay map Φ_{f,ψ} reconstructs the state.

In differential topology, an embedding is a smooth map between manifolds that is a diffeomorphism onto its image. Takens’ theorem states that, for generic f and ψ, the delay map Φ_{f,ψ}(x) = (ψ(x), ψ(f(x)), …, ψ(f^{2d}(x))) is an embedding of the d-dimensional manifold M; consequently, the sequence of 2d + 1 observations Φ_{f,ψ}(x_n) uniquely identifies the state x_n.

There are technical assumptions for Takens’ theorem (and the generalized versions employed herein) to hold. These assumptions require: (i) that the manifold M is compact; (ii) that the map f is a diffeomorphism; and (iii) that f and the observation function ψ are generic and sufficiently smooth.
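As an illustration of delay reconstruction (a minimal sketch, not code from the paper), the following assumes a logistic-map latent state observed through a non-linear filter ψ; the function name delay_embed and the choice of filter are ours:

```python
import numpy as np

def delay_embed(y, dim, tau=1):
    """Stack delayed copies of a scalar series into delay vectors.

    Returns an array of shape (len(y) - (dim - 1) * tau, dim) whose
    n-th row is (y[n], y[n + tau], ..., y[n + (dim - 1) * tau]).
    """
    n = len(y) - (dim - 1) * tau
    return np.column_stack([y[i * tau : i * tau + n] for i in range(dim)])

# Latent state: the logistic map x_{n+1} = 4 x_n (1 - x_n).
x = np.empty(1000)
x[0] = 0.3
for n in range(999):
    x[n + 1] = 4.0 * x[n] * (1.0 - x[n])

# Observation filter psi (an illustrative non-linear measurement).
y = np.tanh(x)

# For a d-dimensional state, Takens' theorem uses 2d + 1 delays; here d = 1.
Y = delay_embed(y, dim=3)
print(Y.shape)  # (998, 3)
```

Each row of Y is a delay vector that, under the theorem's assumptions, stands in for the latent state at the corresponding time.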

Multivariate dynamical systems comprise m subsystems x^1, x^2, …, x^m, where each subsystem state x^i is confined to its own manifold M^i.

We express multivariate dynamical systems as a synchronous update GDS to allow for generic maps. With this model, we can express the time evolution of the GDS as a stationary DBN, and perform inference and learning on the subsequent graph. We formally state the network of dynamical systems as a special case of the sequential GDS (Mortveit and Reidys).

Definition 1. Synchronous graph dynamical system (GDS). A synchronous GDS comprises:

a finite graph G_S = (V, E) over subsystem vertices, where each vertex v^i carries a state x^i confined to a d^i-dimensional manifold M^i;

a set of local update functions {f^i};

a synchronous update scheme, in which the global map applies f^1, f^2, …, f^M simultaneously at each time step.

Without loss of generality, we can use local functions to describe the time evolution of the subsystems:

Here, f^i denotes the local update function of subsystem i, whose arguments are the states of v^i and of its neighbors in G_S.
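The local functions above can be sketched as coupled logistic maps on a hypothetical three-vertex graph; the maps, the coupling constant eps, and the parent structure are illustrative assumptions, not the paper's system:

```python
import numpy as np

# Hypothetical 3-subsystem GDS: parents[i] lists the subsystems coupled into i.
parents = {0: [], 1: [], 2: [0, 1]}

def local_update(x_i, x_parents, r=3.9, eps=0.1):
    """Local logistic map, with linear coupling from parent states."""
    drive = eps * sum(x_parents) / max(len(x_parents), 1)
    return (1 - eps * (len(x_parents) > 0)) * r * x_i * (1 - x_i) + drive

def step(x):
    """Synchronous update: every f^i is evaluated on the *current* state."""
    return np.array([local_update(x[i], [x[j] for j in parents[i]])
                     for i in range(len(x))])

x = np.array([0.2, 0.5, 0.7])
for _ in range(100):
    x = step(x)
print(x.shape)  # (3,)
```

Note that all local functions read the state at time n before any subsystem is advanced to time n + 1, which is exactly the synchronous update scheme of Definition 1.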

The time evolution of a synchronous GDS can be modeled as a DBN. First, each subsystem vertex v^i is associated with a latent state variable X^i and an observation variable Y^i; the parents of X^i in the DAG G are denoted Π_G^i. Since B_→ is stationary and synchronous, the parents of X^i at time n + 1 are the time-n variables of v^i and of its neighbors in G_S. In the example structure, subsystem x^3 is coupled to both subsystem x^1 and x^2 through the edge set; the latent variables x^1 (and observations y^1) are associated with subsystem v^1, and similarly for v^2 and v^3. The distributions for the state then factorize over these local parent sets.

[Figure] (A) An example synchronous GDS with three subsystems (v^1, v^2, and v^3), and (B) the rolled-out DBN of the equivalent structure, in which subsystem x^3 is coupled to both subsystems x^1 and x^2 by means of the edges between latent variables.

In the rest of the paper, we use simplified notation, given this constrained graph structure. First, since our focus is on learning coupling between distributed systems, the superscripts refer to individual subsystems. Second, the transition network B_→ is constrained such that it can be learned independently of the prior network B_1 (Friedman et al.). Finally, because B_→ is stationary, learning B_→ is equivalent to learning the synchronous GDS.

In this section, we develop the theory for learning the synchronous update GDS from data. We will focus on techniques for learning graphical models using information criteria as scoring functions.

To derive the score, we use the DBN formulation of synchronous GDSs introduced in the previous section.

Ideally, we want to be able to compute the posterior probability of the network structure given the observed data.

A common approach is to approximate this posterior asymptotically, which yields information-criterion scores.

Akaike (1974) derived an information criterion, the AIC, which penalizes the maximized log-likelihood of a model by its number of free parameters.

When the number of samples N is large, a related approximation of the marginal likelihood yields the BIC, which penalizes parameters more strongly.

To calculate the information criterion, we therefore require the maximized log-likelihood of the model and a count of its free parameters.
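Under one common sign convention (lower is better), both criteria are simple functions of the maximized log-likelihood, the parameter count k, and the sample size N; a generic sketch with illustrative numbers:

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# Comparing candidate DAGs fitted to the same data (illustrative numbers):
print(aic(-1200.0, 10))        # 2420.0
print(bic(-1200.0, 10, 1000))  # ~2469.08
```

Because both scores share the log-likelihood term, candidate structures differ only through the fit–complexity trade-off each criterion imposes.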

Note that although we describe the states and observations as discrete, the derivations extend to continuous-valued variables with probability densities in place of mass functions.

The log-likelihood function involves the transition distribution P(X_{n+1} | X_n), which is defined over latent states; the following results express it in terms of observations.

Lemma 1. Consider a subsystem whose state x^i is forced by the states of its parent subsystems; that is, the update takes the skew-product form x_{n+1} = f(x_n, ω_n), where the forcing states evolve as ω_{n+1} = g(ω_n). Given this type of forced system, the bundle delay embedding theorem (Stark)^1 guarantees that, generically, a delay map of the observations of the forced subsystem (together with the forcing observations) is an embedding, so the latent state can be reconstructed from observed data.

Given a DAG G, the parent subsystems Π^i of each vertex v^i play the role of the forcing system in Lemma 1, so the lemma applies to every subsystem x^i.

Denote the RHS of the equation above by the corresponding delay-embedded distribution for each subsystem x^i.

Lemma 2. Given observed data y_1, y_2, …, y_N generated by the synchronous GDS, the transition distribution can equivalently be conditioned on delay vectors of the observations.^2

Rearranging this equation gives the following result.

Lemma 2 shows that the distributions can be reformulated by conditioning on delay vectors. The RHS of the resulting equation involves only observed quantities, and can therefore be estimated directly from data.
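One simple way to estimate such observation-conditioned distributions is a nearest-neighbor sketch: fit a local Gaussian to the one-step targets of the k nearest delay vectors. The local-Gaussian model, k, and the helper names here are our illustrative assumptions, not the paper's estimator:

```python
import numpy as np

def delay_embed(y, dim):
    """Delay vectors (y[n], ..., y[n + dim - 1]) as rows."""
    n = len(y) - dim
    return np.column_stack([y[i : i + n] for i in range(dim)])

def knn_log_likelihood(E, y_next, k=5):
    """Plug-in estimate of sum_n log p(y_{n+1} | delay vector), fitting a
    Gaussian to the one-step targets of the k nearest delay vectors."""
    ll = 0.0
    for n in range(len(E)):
        d = np.linalg.norm(E - E[n], axis=1)
        d[n] = np.inf                    # exclude the query point itself
        idx = np.argsort(d)[:k]
        mu = y_next[idx].mean()
        var = y_next[idx].var() + 1e-6   # regularize tight neighborhoods
        ll += -0.5 * (np.log(2 * np.pi * var) + (y_next[n] - mu) ** 2 / var)
    return ll

# Observations of a logistic-map subsystem.
y = np.empty(600)
y[0] = 0.3
for n in range(599):
    y[n + 1] = 3.9 * y[n] * (1.0 - y[n])

E = delay_embed(y, dim=3)  # delay vectors (y_n, y_{n+1}, y_{n+2})
targets = y[3:]            # the observation one step after each vector
ll = knn_log_likelihood(E, targets)
print(np.isfinite(ll))  # True
```

The brute-force neighbor search is O(N^2) and only meant to make the conditioning concrete; any density estimator over delay vectors would serve the same role.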

Theorem 1. Given observed data y_1, y_2, …, y_N, the log-likelihood of a candidate DAG decomposes into a sum of local terms, each conditioned on delay vectors of the relevant subsystems.

In this decomposition, each term can be estimated from the reconstructed delay vectors.

Theorem 2. The AIC and BIC for the synchronous GDS are obtained by combining the log-likelihood of Theorem 1 with the number of free parameters of the corresponding DBN.

We can now compute the number of parameters needed to specify the model, following the complete-data case (Friedman et al.).

Since we are searching for the graph that maximizes the score, we seek the structure arg max_G g(G, D) over candidate DAGs.

To conclude our study of the scores, we look at the log-likelihood in the context of information transfer. First, rearranging the terms of the collective transfer entropy gives the following proposition.

Proposition 1. The log-likelihood of a candidate DAG can be expressed as a sum of collective transfer entropies into each subsystem, plus entropy terms that do not depend on the coupling structure.

Again, the first two terms in this expression are independent of the candidate graph and can be ignored when comparing structures.
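For a single source and unit history length, the collective transfer entropy discussed above reduces to the pairwise transfer entropy T_{Y→X}; a plug-in estimator for discrete series (an illustrative sketch, not the paper's estimator):

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate (bits) of the transfer entropy T_{Y -> X} with
    history length 1: sum over (x1, x0, y0) of
    p(x1, x0, y0) * log2[ p(x1 | x0, y0) / p(x1 | x0) ]."""
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))
    pairs_xy = Counter(zip(x[:-1], y[:-1]))
    pairs_xx = Counter(zip(x[1:], x[:-1]))
    singles = Counter(x[:-1])
    n = len(x) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_cond_full = c / pairs_xy[(x0, y0)]
        p_cond_self = pairs_xx[(x1, x0)] / singles[x0]
        te += (c / n) * math.log2(p_cond_full / p_cond_self)
    return te

# y drives x with a one-step lag, so T_{Y -> X} should approach 1 bit,
# while T_{X -> Y} should be near zero.
random.seed(0)
y = [random.randint(0, 1) for _ in range(5000)]
x = [0] + y[:-1]  # x_{n+1} = y_n
print(transfer_entropy(x, y) > 0.9)  # True
```

In this toy case the driven series copies its source, so nearly the full bit of the source's entropy is transferred, while the reverse direction carries essentially none.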

Recall that the empty DAG corresponds to a set of uncoupled subsystems.

Proposition 2. The difference in log-likelihood between a candidate DAG and the empty DAG is given by the sum of collective transfer entropies into each subsystem.

We have presented a principled method to score the structure of non-linear dynamical networks, where dynamical units are coupled via a DAG. We approached the problem by modeling the time evolution of a synchronous GDS as a DBN. We then derived the AIC and BIC scoring functions for the DBN based on time delay embedding theorems. Finally, we have shown that the log-likelihood of the synchronous GDS can be interpreted in the context of information transfer.

The representation of synchronous GDSs as DBNs allows for inference of coupling in dynamical networks and facilitates techniques for synthesis in these systems. DBNs are an expressive framework that allows representation of generic systems, as well as numerous general-purpose inference techniques that can be used for filtering, prediction, and smoothing (Friedman et al.).

Theorem 2 captures an interesting parallel between learning from complete data and learning non-linear dynamical networks. If the embedding dimension is known, the delay vectors play the role of completely observed variables, and the problem reduces to structure learning from complete data.

The results presented here provoke new insights into the concepts of structure learning, non-linear time series analysis, and effective network analysis (Sporns et al.).

In the future, we aim to perform empirical studies to exemplify the properties of the presented scoring functions. Specifically, the empirical studies should yield insight into the effect of weak, moderate, and strong coupling between dynamical units. An important concept to consider in stochastic systems is the convergence of the shadow (reconstructed) manifold to the true manifold (Sugihara et al.).

Finally, the reconstruction theorems used in this paper typically make the assumption that the map (or flow) is a diffeomorphism (invertible in time). Thus, given any state, the past and future are uniquely determined, and the time delay vectors may equally be composed of past or future observations.

OC co-wrote the manuscript, derived and proved the theorems, lemmas, and propositions. MP co-wrote the manuscript, assisted with the proofs, and supervised. RF co-wrote the manuscript, assisted with the proofs, and supervised.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank Joseph Lizier, Jürgen Jost, and Wolfram Martens for many helpful discussions, particularly with regard to embedding theory. This work was supported in part by the Australian Centre for Field Robotics; the New South Wales Government; and the Faculty of Engineering & Information Technologies, The University of Sydney, under the Faculty Research Cluster Program.

^1 Stark (1999) gives the bundle delay embedding theorem for deterministically forced systems.

^2 The original proof is given by Deyle and Sugihara.