
Edited by: Elio Tuci, Middlesex University, United Kingdom

Reviewed by: Daniel Polani, University of Hertfordshire, United Kingdom; Hector Zenil, Karolinska Institutet (KI), Sweden; Joseph T. Lizier, University of Sydney, Australia

Specialty section: This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI

This is an open-access article distributed under the terms of the

The study of collective behavior has traditionally relied on a variety of methodological tools, ranging from more theoretical methods such as population or game-theoretic models to empirical ones like Monte Carlo or multi-agent simulations. An approach that is increasingly being explored is the use of information theory as a methodological framework to study the flow of information and the statistical properties of collectives of interacting agents. While a few general-purpose toolkits exist, most of the existing software for information-theoretic analysis of collective systems is limited in scope. We introduce Inform, an open-source framework for efficient information-theoretic analysis that exploits the computational power of a C library while simplifying its use through a variety of wrappers for common higher-level scripting languages. We focus on two such wrappers here: PyInform (Python) and rinform (R). Inform and its wrappers are cross-platform and general-purpose. They include classical information-theoretic measures, measures of information dynamics and information-based methods to study the statistical behavior of collective systems, and expose a lower-level API that allows users to construct measures of their own. We describe the architecture of the Inform framework, study its computational efficiency, and use it to analyze three case studies of collective behavior: biochemical information storage in regenerating planaria, nest-site selection in the ant Temnothorax, and consensus achievement in multi-agent simulations.

Collective behaviors, such as the coordinated motion of a flock of starlings (

A common approach is to develop software solutions to compute specific information-theoretic measures. For example, TRENTOOL (

In previous work (^{1}^{2}

In this work, we introduce two of Inform’s language wrappers: PyInform^{3}^{4}

We begin with a review of the design and implementation of the Inform framework in Section 2. In Section 2.1 we describe the architecture of Inform and its wrappers with a focus on each of the four major components of the framework—distributions, information measures, time series measures and utilities. In Section 2.2 we discuss the validation process and stability of Inform, PyInform and rinform. In Section 3 we showcase the capabilities of the framework by analyzing three different collective systems: cellular-level biochemical processes in regenerating planaria (see Section 3.1), house-hunting behavior in Temnothorax ants (see Section 3.2), and consensus achievement in multi-agent simulations (see Section 3.3). Section 4 is dedicated to the analysis of the computational performance of Inform taking the JIDT library of (

Inform (MIT license)^{5}

Optimized implementations of many common information-theoretic time series measures, including block entropy, mutual information, complete and apparent transfer entropy, active information storage and predictive information.

Optimized implementations of less common concepts such as effective information, information flow, evidence for integration and partial information decomposition.

All time series measures include local and average variants where applicable.

An empirical probability distribution structure over a discrete event space^{6}

A collection of utility functions, such as black boxing and binning algorithms, which may be used in conjunction with time series measures to facilitate analysis of complex systems.

No external library dependencies.

The Inform library is implemented in cross-platform C, and can be built on any system with a C11-compliant^{7} compiler.

Essentially all modern programming languages provide a C foreign-function interface.

Most of Inform’s functionality requires minimal memory management — typically only one allocation and deallocation per function.

C does not have exceptions. While useful within a single language, exceptions make interfacing between languages more difficult.

C requires no external dependencies for distribution — as such, the wrapper libraries do not depend on an external virtual machine, interpreter or JIT compiler.

All subsequent references to Inform will refer to the entire framework including its wrappers; any reference to the C library will be disambiguated as such.

Information theory largely focuses on quantifying information within probability distributions. To model this, Inform is designed around the concept of an empirical probability distribution. These distributions are used to define functions which compute information-theoretic quantities. From these basic building blocks, we implemented a host of time series measures. Intuitively, the time series measures construct empirical distributions and call the appropriate information-theoretic functions. These three components—distributions, information measures and time series measures—form Inform’s core functionality. Additionally, Inform provides a suite of utilities that can be used to augment and extend its core features. We now detail how these components are implemented and interact with each other to provide a cohesive toolkit.

Inform’s empirical probability distributions are implemented by a distribution class,

Inform uses the

which computes the (Shannon) entropy of the distribution
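The displayed equation itself does not survive extraction; for an empirical distribution with probabilities \(p(x)\) over events \(x\), the Shannon entropy in logarithmic base \(b\) is presumably the standard form:

```latex
H(X) = -\sum_{x} p(x) \log_b p(x)
```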

Each measure in the framework takes some number of distributions and the logarithmic base as arguments, ensures that they are all valid^{8}

Inform’s final core component is a suite of measures defined over time series. The version 1.0.0 release includes

The time series measures available in Inform v1.0.0.

Time Series Measure | Local/Pointwise Variant |
---|---|
Block Entropy ( | |
Cross Entropy ( | |
(Multivariate) Mutual Information ( | |
Conditional Entropy ( | |
Relative Entropy ( | |
Entropy Rate ( | |
Active Information ( | |
Transfer Entropy ( | |
Separable Information ( | |
Predictive Information ( | |
Excess Information ( | |
Effective Information ( | |
Information Flow ( | |
Partial Information Decomposition ( | |
Evidence of Integration ( | |

Local/Pointwise variants are implemented for all measures that reasonably admit them, signified by a


The final component of Inform is the utility suite. One of the greatest challenges of building a general-purpose framework is ensuring that it can be applied to problems that are outside of the authors’ initial use cases. Inform attempts to do this by first exposing the basic components of the library, distributions and information measures, and then providing utility functions that can be used to augment the core functionality. One particular example of this is the ^{9}

The Inform framework was developed using a test-driven approach: unit tests were written for each component before implementing the component itself. Consequently, all features in Inform have been thoroughly unit tested to ensure that they perform as expected. In fact, the bulk of the development effort went into testing, and test code accounts for roughly 60% of the entire C source code distribution.

To ensure cross-platform support, continuous integration services are employed to build and run all unit tests on multiple platforms. Travis CI^{10}^{11}^{12}

In this section, we illustrate the use of Inform by performing information-theoretic analyses of three collective behaviors: the dynamics of membrane potentials and ion concentrations in regenerating planaria, nest-site selection by colonies of the ant

In this first case study, we use partial information decomposition (

We use the BioElectric Tissue Simulation Engine (BETSE) (

The

From these binned data, we compute the partial information decomposition (PID) of the information about

As we conclude this example, it is worthwhile to acknowledge that Inform’s current implementation of PID is limited to Williams and Beer’s

In this case study, we use local active information to analyze collective decisions made by the ant

For this study, we look at a live colony of 78 T.

Distribution of local active information and colony-level commitment state for a live colony of 78 T.

In this final case study, we use transfer entropy to analyze the flow of information in a multi-agent system developed to study the best-of-

We consider a collective of 100 agents tasked with a binary decision-making problem where the best option has quality

Probability density functions of the average transfer entropy for agents in systems applying the majority rule (purple) and for agents in systems using the voter model (green).
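As a rough illustration of the quantity being estimated in this case study, the following plain-Python sketch computes average transfer entropy from empirical frequencies, with target history length k. The function name and implementation are ours, for exposition only; they are not Inform's API or its optimized implementation.

```python
from collections import Counter
from math import log2

def transfer_entropy(source, target, k=1):
    """Average transfer entropy (bits) from source to target with
    target history length k, estimated from empirical frequencies.
    Illustrative sketch only, not the Inform library's implementation."""
    states, joint, hist_next, hist_src = Counter(), Counter(), Counter(), Counter()
    n = 0
    for t in range(k, len(target)):
        hist = tuple(target[t - k:t])   # target history
        s, x = source[t - 1], target[t] # source state and next target state
        joint[(hist, s, x)] += 1
        hist_next[(hist, x)] += 1
        hist_src[(hist, s)] += 1
        states[hist] += 1
        n += 1
    te = 0.0
    for (hist, s, x), c in joint.items():
        # p(x | hist, s) / p(x | hist), expressed in raw counts
        te += (c / n) * log2(c * states[hist] / (hist_next[(hist, x)] * hist_src[(hist, s)]))
    return te
```

A source that drives the target yields positive values, while an uninformative (e.g., constant) source yields zero, which is the contrast exploited when comparing the majority rule and the voter model.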

In this section, we investigate the performance of PyInform by calculating two computationally demanding measures of information dynamics: active information (AI) and transfer entropy (TE). While we focus on PyInform here, rinform shows comparable performance characteristics. We compare the performance of PyInform with that of JIDT (

Using the four data sets described above, we computed the AI for each agent in the collective and the TE using PyInform and JIDT’s built-in time series-based functionality. We computed AI and TE for history lengths ^{13}

Performance ratio versus history length for average and local active information

In addition to comparing the runtime performance, we also compared the absolute results of the calculations for all values of

In this section we provide a few examples of how to directly use the Python and R wrappers, respectively, PyInform and rinform. Live documentation of these wrappers can be found at

We start with a simple example of how to use the

This is only a sample of the functionality provided around the ^{14}^{15}
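The original listing does not survive extraction. As a rough stand-in, the idea behind such a distribution class, a table of event counts that normalizes on demand, can be sketched in plain Python. The class and method names below are illustrative only, not the PyInform or rinform API.

```python
from collections import Counter

class EmpiricalDist:
    """Toy stand-in for an Inform-style empirical distribution:
    a table of event counts over a discrete event space."""
    def __init__(self, n):
        # support n events, labeled 0 .. n-1, initially unobserved
        self.counts = Counter({i: 0 for i in range(n)})

    def tick(self, event):
        # record a single observation of the given event
        self.counts[event] += 1

    def probability(self, event):
        # normalize on demand from the accumulated counts
        total = sum(self.counts.values())
        return self.counts[event] / total

d = EmpiricalDist(2)
for event in [0, 0, 1, 0]:
    d.tick(event)
print(d.probability(0))  # empirical probability of event 0
```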

As described in Section 2.1, the Shannon information measures are defined around the

A host of information measures are provided in the Inform framework. These can be found in the ^{16}^{17}
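To illustrate the kind of measures this component provides, here is a plain-Python sketch of Shannon entropy and mutual information over probability vectors. The function names and signatures are ours, for exposition; they do not reproduce the wrappers' actual API.

```python
from math import log2

def entropy(p, b=2.0):
    """Shannon entropy of a probability vector p in logarithmic base b."""
    return -sum(q * log2(q) / log2(b) for q in p if q > 0)

def mutual_info(joint, px, py, b=2.0):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), where joint
    is the flattened joint distribution and px, py are the marginals."""
    return entropy(px, b) + entropy(py, b) - entropy(joint, b)
```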

The time series measures are a primary focus for the Inform framework. ^{18}

Time series measures can fail for a variety of reasons ranging from invalid arguments to exhausted system memory. In these situations, an error is raised which describes the reason for the function’s failure. At the end of both

All of the time series measures follow the same basic calling conventions as ^{19}^{20}
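To illustrate the average/local calling convention, the following plain-Python sketch computes block entropy with a local flag in the spirit of the wrappers. The implementation is ours, for exposition only, not Inform's optimized code.

```python
from collections import Counter
from math import log2

def block_entropy(series, k, local=False):
    """Block entropy of a time series with block size k. Returns the
    average (a scalar) by default, or the per-block local values as a
    list when local=True, mirroring the wrappers' calling convention."""
    blocks = [tuple(series[t:t + k]) for t in range(len(series) - k + 1)]
    counts = Counter(blocks)
    n = len(blocks)
    # local value of each observed block: -log2 of its empirical probability
    locals_ = [-log2(counts[b] / n) for b in blocks]
    return locals_ if local else sum(locals_) / n
```

With k = 1 this reduces to the ordinary Shannon entropy of the series, and the local variant returns one value per observed block.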

Our next example,

The flexibility of the

Here, ^{21}

As such, these observations can be encoded as a base-

The ^{22}
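The encoding idea can be sketched as follows, assuming two observed variables with bases bx and by; the helper name is ours, for illustration only.

```python
def encode_states(xs, ys, bx, by):
    """Encode paired observations (x, y), with x in {0..bx-1} and
    y in {0..by-1}, as single base-(bx*by) states via x*by + y."""
    assert all(0 <= x < bx for x in xs) and all(0 <= y < by for y in ys)
    return [x * by + y for x, y in zip(xs, ys)]
```

The resulting one-dimensional series of joint states can then be fed to any of the time series measures as if it were a single variable.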

Inform’s collection of utilities allows the user to easily construct new information measures over time series data. Combining utility functions such as

We will now conclude this section with two demonstrative examples of how ^{23}

As such, one might compute the conditional entropy by first constructing the joint distribution
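A plain-Python version of this two-step computation, building the empirical joint distribution and then applying H(Y|X) = H(X,Y) − H(X), might read as follows; the data and helper names are illustrative only.

```python
from collections import Counter
from math import log2

def entropy_from_counts(counts):
    """Shannon entropy (bits) of an empirical distribution of counts."""
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

# hypothetical observations of two random variables X and Y
xs = [0, 0, 1, 1, 0, 1]
ys = [0, 1, 1, 1, 0, 0]

joint = Counter(zip(xs, ys))   # empirical joint distribution of (X, Y)
margx = Counter(xs)            # empirical marginal distribution of X

# conditional entropy via H(Y|X) = H(X,Y) - H(X)
cond_ent = entropy_from_counts(joint) - entropy_from_counts(margx)
```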

Finally, we will perform a similar process to estimate the active information of random variable

where
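The defining equation appears to have been lost in extraction; with history length \(k\), active information is presumably the standard mutual information between the length-\(k\) history \(x^{(k)}\) and the next state \(x'\):

```latex
A_X(k) = \sum_{x^{(k)},\, x'} p\!\left(x^{(k)}, x'\right) \log_2 \frac{p\!\left(x^{(k)}, x'\right)}{p\!\left(x^{(k)}\right)\, p\!\left(x'\right)}
```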

In this paper we introduced Inform v1.0.0, a flexible and computationally efficient framework to perform information-theoretic analysis of collective behaviors. Inform is a general-purpose, open-source, and cross-platform framework designed to be flexible and easy to use. It builds on a computationally efficient C library and an ecosystem of foreign language wrappers for Python, R, Julia, and the Wolfram Language. Inform gives the user access to a large set of functions to estimate information-theoretic measures from empirical discretely-valued time series. These include classic information-theoretic measures such as Shannon’s entropy and mutual information, information dynamics measures such as active information storage and transfer entropy, and information-based concepts conceived to investigate the causal architecture of collective systems. Inform’s low-level API is organized around the concepts of probability distributions, information measures, time series measures and utilities and its flexibility allows users to construct new measures and algorithms of their own. We showcased the Inform framework by applying it to the study of three collective behaviors: cellular-level biochemical processes in regenerating planaria, colony emigration by the ant

The Inform framework is still a relatively young project compared to more mature projects such as JIDT. While it has many features that make it unique, such as its computational efficiency, its large set of information-theoretic methods, and the availability of foreign language wrappers, it still lacks some important functionality. We are planning three subsequent releases to incrementally extend the Inform framework. In the version 1.1.0 release, we will modify Inform’s interface to give the user access to the probability distributions used in the computation of information dynamics measures, together with their accumulation functions. In Python, for example, the extended API for computing the active information may take the following form:
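The listing itself does not survive extraction. Purely as a hypothetical sketch of what separating distribution accumulation from the final computation could look like, consider the following; the names and shapes are illustrative only, not the planned Inform v1.1.0 API.

```python
from collections import Counter
from math import log2

def ai_distributions(series, k):
    """Accumulate the empirical distributions active information needs:
    joint (history, next), history marginal, and next-state marginal.
    Illustrative only; not the API proposed for Inform v1.1.0."""
    joint, hists, nexts = Counter(), Counter(), Counter()
    for t in range(k, len(series)):
        h, x = tuple(series[t - k:t]), series[t]
        joint[(h, x)] += 1
        hists[h] += 1
        nexts[x] += 1
    return joint, hists, nexts

def ai_from_distributions(joint, hists, nexts):
    """Compute average active information from pre-accumulated
    distributions, e.g., distributions merged from several batches."""
    n = sum(joint.values())
    return sum(c / n * log2(n * c / (hists[h] * nexts[x]))
               for (h, x), c in joint.items())
```

Because the accumulation step is exposed, distributions built from separate batches of data can be merged (here, by adding the counters) before the measure is computed, which is the use case motivating the change.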

The advantage of exposing probability distributions and their accumulation functions is that the user can modify the way probabilities are estimated. Whereas in version 1.0.0 Inform’s time series measures require that all time series be stored in memory prior to the estimation of distributions, this new release will allow users to write their own accumulation functions, which could incrementally update distributions from very large time series stored on disk or from data generated in real time. In the version 1.2.0 release, we will provide support for non-Shannon entropy functions. Shannon’s entropy of a discrete random variable is the unique functional form of entropy that satisfies all four of Shannon’s axioms (^{24}

DGM designed and implemented the Inform library as well as the Python, Julia, and Mathematica wrappers. GV designed and implemented the R wrapper. All authors contributed to the conceptualization of the framework and to the writing of the manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer, JL, declared a past collaboration with one of the authors, SW, to the handling Editor.

The authors would like to thank Jake Hanson and Harrison Smith for their contributions to PyInform and its documentation.

Support for continuous event spaces is planned for v2.0.0 (see Section 6).

ISO/IEC 9899:2011:

An empirical distribution is considered invalid if it has no recorded events.

The naming of this function is intended to bring to mind the process of “black boxing” nodes in a network. That is, this function models drawing an opaque box around a collection of nodes, treating them as one unit with no known internal structure.

See

In Python, we use `numpy`, a package that provides a wealth of useful array-based functionality:

Note that if we had considered W′ = (Y, X) ∈ Ω′ instead, the encoded time series would have been different, e.g., 2, 1, 1, 4, 3, 4, 2, 5. However, the mutual information between them, I(W, W′), tends to the theoretical maximum H(W) as the number of observations increases; this indicates that (X, Y) and (Y, X) are informationally equivalent representations of the underlying space.

The `block_entropy` function computes the Shannon block entropy of a time series. This reduces to the standard Shannon entropy when a block size of k = 1 is used, e.g., `block_entropy(series, k = 1)`.