Edited by: Zbigniew R. Struzik, University of Tokyo, Japan

Reviewed by: Hector Zenil, Karolinska Institutet, Sweden; Sebastian Wallot, Max Planck Institute for Empirical Aesthetics (MPG), Germany; Víctor M. Eguíluz, Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Spain

Specialty section: This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

We present a set of Matlab/Octave functions to compute measures of emergence, self-organization, and complexity applied to discrete and continuous data. These measures are based on Shannon’s information and differential entropy. Examples from different datasets and probability distributions are provided to show how to use our proposed code.

Complexity has generated interest in recent years (Bar-Yam,

The Java Information Dynamics Toolkit is a multi-platform library for calculating the complexity of dynamical systems using Shannon entropy measures (e.g., information transfer) for discrete and continuous data (Lizier,

The Online Algorithmic Complexity Calculator

The Algorithmic Complexity for Short Strings (ACSS) package, for the R language, computes the Kolmogorov complexity of short strings (Soler-Toscano et al.,

In this manuscript, we present a package to calculate statistical measures of emergence E, self-organization S, and complexity C, which are applicable to any dataset or probability distribution (Fernández et al.,

A previous effort of Fernández et al. (^{1}

This paper is organized as follows. Section

Function or filename | Functionality |
---|---|
DiscreteComplexityMeasures (pmfSample, noOfStates) | Calculates discrete entropy-based complexity measures for a univariate sample, according to the number of the sample's system states |
ContinuousComplexityMeasures (pdfSample, minVal, maxVal, distSampleSize, noOfStates) | Calculates continuous entropy-based complexity measures for a probability density distribution, according to the minimum and maximum values the distribution takes, the integration step, and the number of system states |
bar3DPlot (M, width, param1Labels, param2Labels) | Renders a 3D bar plot to graphically display the ESC measures |

In Appendix ^{2}

In this section, we describe the statistical measures of E, S, and C. The discrete measures were defined in a previous study by Fernández et al. (

Many notions of

where X^Δ corresponds to the discretized version of X, and Δ is the integration step. On the other hand, p_i denotes the probability of the i-th state, and log_2(·) is the base-2 logarithm.

It is also worth noting that the discrete entropy H_D is maximal, H_D = log_2(N), when all N states are equiprobable, i.e., p_i = 1/N.
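Gathering these definitions, the measures can be summarized compactly; the normalization follows Fernández et al., and the discretization relation is the standard one from information theory:

```latex
% Discrete measures over N states (normalization as in Fernández et al.)
H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i, \qquad
E = \frac{H(X)}{\log_2 N}, \qquad S = 1 - E, \qquad C = 4\,E\,S
% Continuous case: differential entropy and its discretization with step \Delta
h(X) = -\int f(x) \log_2 f(x)\, dx, \qquad
H\!\left(X^{\Delta}\right) \approx h(X) - \log_2 \Delta
```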

The complexity of different phenomena can be calculated using entropy-based measures. However, to obtain meaningful results, users must first determine the adequate function to be employed for their problem (e.g., should a raw sample or an estimated probability distribution function be used?). In this section, we describe two functions for complexity: ^{3}

DiscreteComplexityMeasures(pmfSample, noOfStates)

ContinuousComplexityMeasures(pdfSample, varargin)

Additional parameters are:

bar3DPlot(M, width, param1Labels, varargin)

0 <

When

Complexity measure functions return 4 elements: three mandatory outputs

E, S, and C measures provide an intuitive reading of a probability distribution. Emergence can be understood as the ratio between the information produced by a system (I_out) and the information given to it (I_in); it is maximal, E = 1, when all states x_j are equiprobable, since then p(x_j) = 1/N and H = H_max.

On the other hand, consider a five-state distribution where P(x_1) = 0.8 and the remaining 4 states have equal probability P(x_{2, …, 5}) = 0.05; hence E ≈ 0.48, S ≈ 0.52, and C ≈ 1.
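These numbers can be reproduced in a few lines. The following Python sketch applies the definitions E = H/log_2 N, S = 1 − E, C = 4·E·S (variable names are ours, not the package's):

```python
import math

# Worked example: five states, P(x_1) = 0.8 and P(x_2) = ... = P(x_5) = 0.05
p = [0.8, 0.05, 0.05, 0.05, 0.05]
N = len(p)

H = -sum(pi * math.log2(pi) for pi in p)  # Shannon entropy in bits
E = H / math.log2(N)                      # emergence: normalized entropy
S = 1 - E                                 # self-organization
C = 4 * E * S                             # complexity, maximal when E = S = 0.5

print(round(E, 2), round(S, 2), round(C, 2))  # prints: 0.48 0.52 1.0
```

A distribution that balances a dominant state against several rarer ones yields a complexity close to the maximum of 1.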

Some of the known issues, considerations, and limitations of this package are as follows:

The statistical measures proposed are mainly based on Shannon’s discrete and differential entropy (i.e., H(X)) per symbol.

Our proposed measures only consider I.I.D. random variables. Thus, conditional time relations or strings of size > 1 are not considered. The former is particularly important when analyzing a distribution. For instance, if a discrete sequence of repeating points, e.g., 0, 1, 2, 0, 1, 2, …, is analyzed in terms of each number, the distribution will resemble a uniform distribution; hence, E = 1. However, if the states of the system are strings of 3 elements, the distribution will be a Dirac delta; hence, S = 1.
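This granularity effect can be demonstrated directly (an illustrative Python sketch; the helper names are ours):

```python
import math
from collections import Counter

# The repeating sequence 0,1,2,0,1,2,... analyzed at two state granularities.
seq = "012" * 100

def emergence(counts):
    # E = H / H_max, with H_max = log2(N) for N observed states
    total = sum(counts.values())
    H = -sum((c / total) * math.log2(c / total) for c in counts.values())
    N = len(counts)
    return H / math.log2(N) if N > 1 else 0.0

# Per-symbol states: uniform over {0, 1, 2}, so E = 1.
E_symbols = emergence(Counter(seq))

# States as non-overlapping strings of 3 symbols: a single state "012",
# i.e., a Dirac delta, so E = 0 and S = 1 - E = 1.
E_strings = emergence(Counter(seq[i:i + 3] for i in range(0, len(seq), 3)))
```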

In order to obtain meaningful results when calculating continuous complexity, the size of the integration step Δ should be considered. In this context, if Δ ≈ 0, then the discretized entropy H_C^Δ ≈ h(X) − log_2(Δ) grows without bound, since −log_2(Δ) → ∞.
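The relation H(X^Δ) ≈ h(X) − log_2(Δ) can be checked numerically for a standard normal density, whose differential entropy has the closed form ½ log_2(2πe) ≈ 2.047 bits (a Python sketch; the grid limits are our choice):

```python
import math

def normal_pdf(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

delta = 0.01  # integration step
xs = [-10 + i * delta for i in range(int(20 / delta) + 1)]
probs = [normal_pdf(x) * delta for x in xs]  # p_i ≈ f(x_i) * delta

H_delta = -sum(p * math.log2(p) for p in probs if p > 0)
h_normal = 0.5 * math.log2(2 * math.pi * math.e)  # exact h(X), ~2.047 bits

# As delta shrinks, -log2(delta) diverges, and so does H_delta.
```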

Emergence value is understood as

These ESC measures are univariate.

In this section, we present an example that shows the functionality of our complexity measures (additional details are provided in Appendix

The example

First, you must choose between the discrete or continuous examples. Next, you need to specify the working directory and the dataset. Some datasets from the University of California, Irvine (UCI) repository are provided in advance (Lichman,

The working directory is specified (a) via Matlab/Octave user interface, or (b) by setting the path via code as is shown.

Next, you must choose the type of complexity measure:

Specify the dataset to be employed:

Also, you may specify the number of states (

Finally, calculate ESC measures as follows:
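The original Octave listing is not reproduced above. As a rough, hypothetical Python sketch of what DiscreteComplexityMeasures does with a raw sample (bin it into noOfStates equal-width states, estimate the pmf, and return E, S, and C; the function and variable names are ours, not the package's):

```python
import math

def discrete_complexity_measures(sample, no_of_states):
    # Bin the univariate sample into no_of_states equal-width states.
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / no_of_states or 1.0  # guard against a constant sample
    counts = [0] * no_of_states
    for x in sample:
        i = min(int((x - lo) / width), no_of_states - 1)
        counts[i] += 1
    # Estimate the pmf and compute the entropy-based measures.
    probs = [c / len(sample) for c in counts if c > 0]
    H = -sum(p * math.log2(p) for p in probs)
    E = H / math.log2(no_of_states)  # emergence
    S = 1 - E                        # self-organization
    C = 4 * E * S                    # complexity
    return E, S, C

# A uniform sample maximizes emergence (E = 1, so S = 0 and C = 0).
E, S, C = discrete_complexity_measures(list(range(100)), 10)
```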

For illustrative purposes, we chose as

Probability Density Functions are used to estimate Gaussian and Power-Law (PL) distributions. In the former, a pre-programmed language function is employed, whereas in the latter, we implemented our own probability function. In either case, some parameters are required: _{min}

Next, specify variable

Also, you must specify the

For illustrative purposes we chose as the _{min}

The parameter x_min specifies the minimum value of x from which the power-law behavior holds.
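As a sanity check on a power-law implementation, its differential entropy can be integrated numerically and compared against the closed form. The sketch below assumes the standard continuous power law f(x) = ((α − 1)/x_min)(x/x_min)^(−α) for x ≥ x_min; the grid choices are ours:

```python
import math

def powerlaw_pdf(x, alpha, xmin):
    # f(x) = ((alpha - 1) / xmin) * (x / xmin)^(-alpha), for x >= xmin
    return (alpha - 1) / xmin * (x / xmin) ** (-alpha)

alpha, xmin = 2.5, 1.0

# Differential entropy (bits) by trapezoidal integration on a log-spaced grid.
n = 200_000
step = math.log(1e8) / n  # integrate from xmin up to xmin * 1e8
h = 0.0
prev = None
for k in range(n + 1):
    x = xmin * math.exp(k * step)
    f = powerlaw_pdf(x, alpha, xmin)
    g = -f * math.log2(f)  # integrand of the differential entropy
    if prev is not None:
        h += 0.5 * (g + prev[1]) * (x - prev[0])
    prev = (x, g)

# Closed form (Pareto entropy, converted to bits):
h_exact = (math.log(xmin) - math.log(alpha - 1) + alpha / (alpha - 1)) / math.log(2)
```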

In this paper, we presented two functions to calculate entropy-based complexity measures:

Some additional notes are in order. First, for pedagogical purposes, these functions were developed in the GNU Octave language; however, they can easily be ported to R or Python. Note that, for fast computation, implementations of these measures in other languages should rely on vector and matrix operations; loop usage is discouraged. Second, these functions are designed to calculate only the discrete and continuous complexity of univariate systems; thus, a measure for multivariate systems is still required. A fast proxy for multivariate entropy could be the summation of the entropies of each feature. Consequently, Emergence could be calculated as the ratio of the summed feature entropies Σ_i H(X_i) to their summed maximum entropies.
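The sum-of-entropies proxy can be sketched as follows (a hypothetical illustration, assuming multivariate emergence is taken as summed feature entropies over summed maximum entropies; this ignores dependence between features):

```python
import math

def entropy(probs):
    # Shannon entropy in bits of one feature's pmf
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two features: one uniform (H = 2 bits), one deterministic (H = 0 bits).
features = [
    [0.25, 0.25, 0.25, 0.25],
    [1.0, 0.0, 0.0, 0.0],
]

H_sum = sum(entropy(f) for f in features)         # proxy for the joint entropy
H_max = sum(math.log2(len(f)) for f in features)  # summed maximum entropies
E_multi = H_sum / H_max                           # 2 / 4 = 0.5
```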

GS-B designed and coded ESC discrete and continuous Matlab/Octave functions and performed the experiments. GS-B, CG, and NF conceived and designed the experiments and wrote the paper.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer HZ declared a past co-authorship with one of the authors (CG) to the handling Editor, who ensured that the process met the standards of a fair and objective review.

The authors would like to thank Carlos Piña Ph.D. for the help in editing and proofreading this manuscript. GS-B was supported by the Consejo Nacional de Ciencia y Tecnología under the Cátedra-Conacyt contract 969.

The Supplementary Material for this article can be found online at

In the following, experimental results are briefly described. To demonstrate the functionality of


A solar flare occurs when magnetic energy that has built up in the solar atmosphere is suddenly released. UCI’s dataset contains three types of classes categorized by their magnitude and frequency. For each class, the number of solar flares of a certain class that occur in a 24-h period are counted.^{4}

Bike-sharing systems (BSS) are a new generation of urban mobility systems, composed of bicycles that are rented to subscribers to travel short to medium distances. These types of systems can be scrutinized from a large-scale statistical point of view.^{5}

The need for a more efficient lifestyle requires parametrizing several aspects of human activity. Household electric consumption provides information not only to casual/conscious consumers but also to providers and grid managers. In these experiments, measurements of electric power consumption in one household, with a 1-min sampling rate over a period of almost 4 years, were used.^{6}

In the previous section, numeric results of different phenomena and parameters of distributions were presented. In this section, we provide results for the analysis of multiple timescales. For such purposes, we employed the largest dataset available which is the household electric consumption. In the former example, only half of the dataset was employed. For this example, ~2 million points were used. The different timescales that were analyzed are

First, missing data points were removed. Then, the data were averaged according to the aforementioned timescales. Next, Emergence, Self-organization, and Complexity were calculated for the house's global, kitchen, and indoor-comfort active power. Results for the complexity measures are presented in Figure
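The timescale procedure amounts to block averaging followed by the discrete measures. A hypothetical Python sketch on a synthetic signal (not the household data; all names and the signal itself are ours):

```python
import math

def block_average(series, k):
    # Average consecutive, non-overlapping blocks of k samples.
    return [sum(series[i:i + k]) / k for i in range(0, len(series) - k + 1, k)]

def esc(sample, states=10):
    # Discrete E, S, C from an equal-width binning of the sample.
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / states or 1.0
    counts = [0] * states
    for x in sample:
        counts[min(int((x - lo) / width), states - 1)] += 1
    H = -sum(c / len(sample) * math.log2(c / len(sample)) for c in counts if c)
    E = H / math.log2(states)
    return E, 1 - E, 4 * E * (1 - E)

# Synthetic "1-min" signal: a daily cycle plus deterministic pseudo-noise.
minute = [math.sin(2 * math.pi * i / 1440)
          + 0.1 * ((i * 9301 + 49297) % 233280) / 233280
          for i in range(14400)]
hourly = block_average(minute, 60)  # the "60-min" timescale

E1, S1, C1 = esc(minute)
E60, S60, C60 = esc(hourly)
```

Averaging smooths the signal, so the measures at coarser timescales generally differ from those of the raw series.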
