
Edited by: Yilei Zhang, Nanyang Technological University, Singapore

Reviewed by: Jamie Sleigh, University of Auckland, New Zealand; Gopikrishna Deshpande, Auburn University, United States; Adam Ponzi, Okinawa Institute of Science and Technology Graduate University, Japan; Jan Lauwereyns, Kyushu University, Japan

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

To infer the causes of its sensations, the brain must call on a generative (predictive) model. This necessitates passing local messages between populations of neurons to update beliefs about hidden variables in the world beyond its sensory samples. It also entails inferences about how we will act. Active inference is a principled framework that frames perception and action as approximate Bayesian inference. This has been successful in accounting for a wide range of physiological and behavioral phenomena. Recently, a process theory has emerged that attempts to relate inferences to their neurobiological substrates. In this paper, we review and develop the anatomical aspects of this process theory. We argue that the form of the generative models required for inference constrains the way in which brain regions connect to one another. Specifically, neuronal populations representing beliefs about a variable must receive input from populations representing the Markov blanket of that variable. We illustrate this idea in four different domains: perception, planning, attention, and movement. In doing so, we attempt to show how appealing to generative models enables us to account for anatomical brain architectures. Ultimately, committing to an anatomical theory of inference ensures we can form empirical hypotheses that can be tested using neuroimaging, neuropsychological, and electrophysiological experiments.

This paper is based upon the notion that brain function can be framed as Bayesian inference (Kersten et al.,

The good regulator theorem (Conant and Ashby,

The key notion that underwrites this is the Markov blanket (Pearl,

Throughout, we will see that anatomy and generative models offer constraints upon one another that limit the space of plausible brain architectures. Ensuring mutual and internal consistency in these domains represents the kind of conceptual analysis necessary to form meaningful hypotheses in neuroscience (Nachev and Hacker,

This is not the first attempt to map inferential computations to the anatomy of the brain, and builds upon several existing accounts of neuroanatomy in terms of predictive coding architectures (Bastos et al.,

The organization of this review is as follows. First, we overview the notion of a generative model and introduce some of the general principles that will be necessary for understanding the rest of the paper. Following this, we try to ground these abstract ideas by illustrating their implications in the domains of perceptual inference, planning, neuromodulation, and movement. We conclude by considering some specific and testable hypotheses.

A generative model is a probabilistic description of how a given type of data might have been generated. It expresses prior beliefs about unobserved hidden states (or latent variables), the probabilistic dependencies between these states, and a likelihood that maps hidden states (i.e., causes) to sensory data (i.e., consequences). Such models can be used to predict new sensory data, and to infer the hidden states that could have caused observed data (Beal,
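To make this concrete, the following sketch builds a minimal discrete generative model of the kind described above: a prior over hidden states, a likelihood mapping states to data, and a Bayesian inversion that recovers beliefs about hidden causes. The states, observations, and probabilities are purely illustrative and not drawn from the paper.

```python
# A minimal discrete generative model: a prior over hidden states and a
# likelihood mapping states (causes) to observations (consequences).
# Inverting the model with Bayes' rule yields posterior beliefs about
# the hidden cause of a datum. All numbers are illustrative.

prior = {"rain": 0.2, "sun": 0.8}            # p(s): prior over hidden states
likelihood = {                                # p(o|s): states -> observations
    "rain": {"wet": 0.9, "dry": 0.1},
    "sun":  {"wet": 0.2, "dry": 0.8},
}

def predict(obs):
    """Predictive probability of an observation: p(o) = sum_s p(o|s) p(s)."""
    return sum(likelihood[s][obs] * prior[s] for s in prior)

def infer(obs):
    """Posterior over hidden states given an observation: p(s|o)."""
    evidence = predict(obs)
    return {s: likelihood[s][obs] * prior[s] / evidence for s in prior}

posterior = infer("wet")
```

The same two functions exhibit the two uses named in the text: `predict` generates new (expected) data, while `infer` recovers the hidden states that could have caused observed data.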

Forney factor graphs. The graphical model in this figure represents the (arbitrary) probability distribution shown below. Crucially, this distribution can be represented as the product of factors (ϕ) that represent prior and conditional distributions. By assigning each factor a square node, and connecting those factors that share random variables, we construct a graphical representation of the joint probability distribution. The “ = ” node enforces equality on all edges (lines) that connect to it. Small black squares represent observable data. This figure additionally illustrates a simple method for determining the Markov blanket of a variable (or set of variables). By drawing a line around all of the factor nodes connected to a variable, we find that the edges we intersect represent all of the constituents of the Markov blanket. For example, the green line shows that the blanket of

A common idiom for describing directed causal relationships is that “parent” variables cause “child” variables. In these terms, the Markov blanket of a given variable comprises its parents, its children, and the other parents of its children (Pearl,
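The parents-children-co-parents definition can be computed mechanically from any directed graph. The toy graph below is arbitrary (it does not model any brain circuit); it simply illustrates that, conditioned on its blanket, a variable is shielded from the rest of the graph.

```python
# Markov blanket of a node in a directed graphical model: its parents,
# its children, and the other parents of those children (co-parents).
# The graph below is an arbitrary illustration.

parents = {            # node -> set of its parents
    "A": set(), "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
    "E": {"C", "F"},
    "F": set(),
}

def markov_blanket(node):
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return parents[node] | children | co_parents

# Blanket of C: parents {A, B}, children {D, E}, co-parent {F}
blanket = markov_blanket("C")
```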

One further concept that will be useful in what follows is the idea of “closing the box” (Loeliger,

Partition functions and free energy. This schematic illustrates a useful operation known as “closing the box” or taking a partition function of part of a graph. By summing (or integrating) over all variables on edges within the dashed box, we can reduce this portion of the graph to a single factor that plays the part of a (marginal) likelihood. While it is not always feasible to perform the summation explicitly, we can approximate the marginal likelihood with a negative free energy. This affords an efficient method for evaluating subregions of the graph. Taking the partition function, or computing the free energy, for the whole graph allows us to evaluate the evidence sensory data affords the generative model.
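A small numerical sketch makes the relationship between the partition function and the free energy explicit. Summing a joint distribution over hidden states gives the marginal likelihood exactly; the negative variational free energy, evaluated under approximate beliefs, bounds the log marginal likelihood from below and becomes tight when the beliefs match the true posterior. The joint distribution here is illustrative.

```python
import math

# "Closing the box": summing a joint p(o, s) over hidden states s yields
# the marginal likelihood p(o). The negative variational free energy,
# computed under approximate beliefs q(s), bounds ln p(o) from below,
# with equality when q is the true posterior. Numbers are illustrative.

joint = {"s1": 0.3 * 0.7, "s2": 0.7 * 0.1}    # p(o, s) for one fixed datum o

log_evidence = math.log(sum(joint.values()))  # ln p(o): the exact partition

def neg_free_energy(q):
    """-F = E_q[ln p(o, s) - ln q(s)] <= ln p(o)."""
    return sum(q[s] * (math.log(joint[s]) - math.log(q[s])) for s in q)

# A sub-optimal belief under-estimates the evidence ...
bound = neg_free_energy({"s1": 0.5, "s2": 0.5})

# ... while the true posterior q(s) = p(s|o) makes the bound tight.
Z = sum(joint.values())
true_posterior = {s: p / Z for s, p in joint.items()}
tight = neg_free_energy(true_posterior)
```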

In the following sections, we will unpack the idea of a generative model and its constituent Markov blankets in several domains. Before doing so, it is worth emphasizing the domain generality of this sort of approach. The ideas here have been applied, explicitly or implicitly, across applications as diverse as agency (Friston et al.,

To start, we consider some of the simplest generative models that capture useful features of the environment. Broadly, there are two important categories (Friston et al.,

Perception as inference. This figure shows two generative models that describe hidden state trajectories, and the data they generate. On the left, we show evolution of discrete states (

Glossary of mathematical notation.

Generative model | A set of probability distributions that make up a generative model
Posterior beliefs | An approximation to the probability of a variable given observed data
Shannon entropy | Uncertainty (or dispersion) of a probability distribution
Expectation | Expected (or average) value of a variable
KL-divergence (D_{KL}) | Difference between two probability distributions

It is also possible to represent trajectories in continuous time using a sequence of numbers, but these no longer express states at each time step. Instead, we can represent the coefficients of a Taylor series expansion of the trajectory. These are the current position, velocity, acceleration, and subsequent temporal derivatives—sometimes referred to as “generalized coordinates of motion” (Friston et al.,
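The idea of generalized coordinates can be illustrated directly: a point on a trajectory is encoded by its position and successive temporal derivatives, which are just the coefficients of a local Taylor expansion. The example below encodes sin(t) at t = 0 (an arbitrary choice of trajectory) and extrapolates a short distance ahead.

```python
import math

# Generalized coordinates of motion: a trajectory is locally encoded by
# position, velocity, acceleration, and higher temporal derivatives,
# i.e. the coefficients of a Taylor series expansion. Here we encode
# x(t) = sin(t) around t = 0 and predict a short distance ahead.

coords = [0.0, 1.0, 0.0, -1.0]   # [x, x', x'', x'''] of sin(t) at t = 0

def extrapolate(gen_coords, dt):
    """Predict the trajectory dt ahead: sum_n x^(n) * dt^n / n!."""
    return sum(d * dt**n / math.factorial(n) for n, d in enumerate(gen_coords))

prediction = extrapolate(coords, 0.1)   # approximates sin(0.1)
```

Truncating the series at a finite order is what makes this a compact (approximate) representation of the trajectory, rather than a list of states at every time step.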

Generative models that evolve in continuous time or in discrete time likely coexist in the brain, mirroring the processes that generate sensory data. While, at the level of sensory receptors, data arrive in continuous time, they may be generated in a sequential, categorical manner at deeper levels of hierarchical structure. For example, a continuous model may be necessary for low-level auditory processing, but language processing depends upon being able to infer discrete sequences of words (which may themselves make up discrete phrases or sentences).

Before detailing the neuronal network that could perform these inferences, it is worth acknowledging the limitations of the generative model alone in trying to understand neuroanatomy at the microcircuit level. It may be that the brain makes use of auxiliary variables that, while in the service of inference, are not themselves sufficient statistics or messages. Probably the simplest example of this kind of variable is a prediction error, which quantifies the difference between the optimal solution and the current estimate of a continuous state (e.g., luminance contrast). In a biological setting, with inferences that play out in continuous time, gradient descents using prediction errors offer a plausible way to describe inferential dynamics (Rao and Ballard,
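A gradient descent of this kind can be sketched in a few lines, in the spirit of predictive coding: the estimate of a continuous state is nudged by precision-weighted prediction errors until sensory and prior errors balance. The precisions and values below are illustrative, not empirical.

```python
# A gradient-descent treatment of inference with prediction errors: the
# estimate mu of a continuous state is updated by precision-weighted
# errors between (i) the datum and its prediction and (ii) the estimate
# and its prior expectation. All values are illustrative.

obs, prior_mean = 1.0, 0.0        # datum o and prior expectation
pi_obs, pi_prior = 4.0, 1.0       # precisions (inverse variances)

mu = 0.0                          # initial estimate of the hidden state
for _ in range(1000):
    err_obs = obs - mu            # sensory prediction error
    err_prior = mu - prior_mean   # prior prediction error
    # Descend the free-energy gradient (quadratic, for a linear model):
    mu += 0.01 * (pi_obs * err_obs - pi_prior * err_prior)

# The fixed point is the precision-weighted average of datum and prior:
expected = (pi_obs * obs + pi_prior * prior_mean) / (pi_obs + pi_prior)
```

Note that the two error signals are exactly the sort of auxiliary variables discussed above: they are not sufficient statistics themselves, but they drive the dynamics that optimize them.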

The anatomy of perceptual inference. The neuronal network illustrated in this figure could be used to perform inferences about the model of Figure

The move from the factor graph representation in Figure

Sensory input may come directly from sensory thalamic nuclei, or may come via sensory cortical areas (Thomson and Bannister,

This neuronal network illustrates a very important point. The architectures suggested by theoretical considerations must be constrained by our knowledge of real neuroanatomy (Douglas and Martin,

A similar constraint comes from neuropsychological research (Heinke and Humphreys,

While the anatomy of Figure

One way to think about planning is that it represents the selection from several possible behavioral trajectories, or models of future action (Kaplan and Friston,

Planning as inference. This figure illustrates the use of partition functions to evaluate regions of the graph (see also Figure

Once we acknowledge the need for beliefs about the future, we run into a problem. By definition, sensory data from the future have not yet been collected, and we cannot compute their associated free energy. We can resolve this by using beliefs about the future to compute predictions about the sort of data expected under each policy. Averaging with respect to this “posterior predictive” density (probability distribution) allows us to compute an expected free energy (Figure
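A minimal sketch of this computation follows, using the standard decomposition of expected free energy into risk (divergence of predicted from preferred outcomes) and ambiguity (expected entropy of the likelihood), with policies then scored by a softmax. The two-state, two-outcome model, the policy names, and all probabilities are hypothetical.

```python
import math

# Expected free energy of a policy: risk (KL divergence of predicted
# outcomes from preferred outcomes) plus ambiguity (expected entropy of
# the likelihood). Policies are scored with a softmax of -G. All
# distributions and policy labels below are illustrative.

A = {"s1": {"o1": 0.9, "o2": 0.1},     # likelihood p(o|s)
     "s2": {"o1": 0.2, "o2": 0.8}}
C = {"o1": 0.75, "o2": 0.25}           # preferred outcomes p(o)

def expected_free_energy(qs):
    """G = KL[predicted outcomes || preferences] + expected ambiguity."""
    qo = {o: sum(A[s][o] * qs[s] for s in qs) for o in C}
    risk = sum(qo[o] * math.log(qo[o] / C[o]) for o in C)
    ambiguity = sum(qs[s] * -sum(A[s][o] * math.log(A[s][o]) for o in A[s])
                    for s in qs)
    return risk + ambiguity

# Each policy entails a different predictive distribution over states:
policies = {"go-left": {"s1": 0.9, "s2": 0.1},
            "go-right": {"s1": 0.1, "s2": 0.9}}
G = {pi: expected_free_energy(qs) for pi, qs in policies.items()}

# Posterior beliefs about policies: softmax of negative expected free energy
z = sum(math.exp(-g) for g in G.values())
q_pi = {pi: math.exp(-G[pi]) / z for pi in G}
```

The policy whose predicted outcomes best match preferences (while avoiding ambiguous states) attains the lowest expected free energy and therefore the greatest posterior probability.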

Free energy approximations to model evidence (or expected model evidence) depend upon how closely beliefs (

The basal ganglia are a complex network of subcortical structures (Lanciego et al.,

The basal ganglia. In the upper part of this figure, we show the same network as in Figure

The striatum, consisting of the caudate and putamen, is the main input nucleus of the basal ganglia (Shipp,

The differences between direct and indirect pathway medium spiny neurons are consistent with the form of the messages required to compute posterior beliefs about policies. While the direct pathway neurons have an inhibitory effect on the output nuclei of the basal ganglia, the indirect pathway has a net excitatory effect. The latter depends upon an additional GABAergic (inhibitory) synapse from the globus pallidus externus to the subthalamic nucleus (Jahanshahi et al.,

In contrast, indirect pathway striatal neurons have a smaller dendritic arbor (Gertler et al., ^{1}

Hierarchical models. This figure illustrates the extension of Figure

In addition to providing a computational hypothesis for basal ganglia function that formalizes the notion that they are engaged in planning (i.e., policy evaluation), we can now refine the cortical anatomy of Figure ^{2}

While the computational anatomy appears to be consistent with known basal ganglia circuitry, there are several outstanding questions that need resolution. The first of these is the number of synapses between the indirect pathway neurons and the output nuclei. The upper and lower parts of Figure

A second question concerns the role of the hyperdirect pathway (Nambu et al.,

Disorders of basal ganglia nuclei are well characterized. It is a useful test of the validity of the proposed anatomy to see whether deficits in these computational units are consistent with behaviors observed in neurological practice. Parkinson's disease is a common disorder in which degeneration of neurons in the substantia nigra pars compacta leads to dopamine deficits in the striatum (Albin et al.,

An intriguing feature of Parkinson's disease is that, in certain contexts, patients can perform complex fluent motor behavior; e.g., cycling (Snijders and Bloem,

While Parkinson's disease represents reduced direct pathway influences, there are other syndromes that occur if the indirect pathway is damaged. These provide support for the idea that the indirect pathway uses prior beliefs to prevent the performance of implausible behavioral policies. One such syndrome is hemiballismus, resulting from damage to the subthalamic nucleus (Hawley and Weiner,

The ideas in this section may be generalizable to other subcortical structures. Specifically, some nuclei of the amygdala resemble those of the basal ganglia, but appear to have a role in regulating autonomic, as opposed to skeletomotor, policies (Swanson and Petrovich,

In addition to knowing which variables are causally related to which others, our brains must be able to infer how reliable these relationships are. In the previous section, we discussed the role of dopamine in modulating the indirect and direct pathways, but without specifying its role in the generative model. We have suggested that dopamine increases the influence of message 2 relative to message 1 (Figure

We can generalize the idea that dopamine modulates the balance between prior and posterior influences over policies by considering the confidence ascribed to other factors in the generative model. Figure

Precision and uncertainty. This figure shows the graph of Figure

It is likely that there is a range of mechanisms that give rise to attentional gain-control in the brain, from neuromodulators acting via NMDA receptor pathways (Law-Tho et al.,
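Whatever the synaptic mechanism, the computational effect of such gain control can be sketched simply: precision acts as an inverse temperature on the likelihood, so the same datum moves beliefs more (attended) or less (unattended). The distributions below are illustrative, and treating precision as an exponent on the likelihood is one simple reading of attentional gain rather than the only one.

```python
# Precision as attentional gain: raising the likelihood to a power
# (inverse temperature) before normalizing modulates how strongly a
# given datum updates beliefs. Distributions are illustrative.

prior = {"s1": 0.5, "s2": 0.5}
likelihood = {"s1": 0.6, "s2": 0.4}    # p(o|s) for the datum received

def posterior(precision):
    """Posterior when the likelihood is modulated by a precision (gain)."""
    unnorm = {s: prior[s] * likelihood[s] ** precision for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

low = posterior(0.5)    # low precision: the datum barely moves beliefs
high = posterior(4.0)   # high precision: the same datum is far more persuasive
```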

Putative roles of neurotransmitters in active inference.

Transmitter | Precision of | Supporting observations
Acetylcholine | Likelihood | • Presence of presynaptic receptors on thalamocortical afferents (Sahin et al.,
Noradrenaline | Transitions | • Maintenance of persistent prefrontal (delay-period) activity (requiring precise transition probabilities) depends upon noradrenaline (Arnsten and Li,
Dopamine | Policies | • Expressed post-synaptically on striatal medium spiny neurons (Freund et al.,
Serotonin | Preferences or interoceptive likelihood | • Receptors expressed on layer V pyramidal cells (Aghajanian and Marek,

The anatomy of uncertainty. This schematic extends the network of Figure

The cholinergic system appears a good candidate for the encoding of likelihood precision, given its known role in regulating the gain of sensory evoked responses (Gil et al.,

Neurobiological theories based upon active inference frequently implicate dopamine in the encoding of the precision of beliefs about policies (Friston et al.,

While we have focused upon three modulatory transmitters, there are clearly many more to be accounted for in this computational framework (Iglesias et al.,

Drawing from the idea that some nuclei of the amygdala could be an autonomic analog of the basal ganglia (Swanson and Petrovich,

For simplicity, we have only included the unidirectional connections from neuromodulatory systems to the cortex and basal ganglia in Figure

There are many disorders thought to be due to abnormalities of precision estimation (Friston,

The graphs and neuronal networks of the preceding sections have all focused upon the discrete dynamics outlined on the left of Figure

In addition to accounting for anatomical findings, models based upon this form of active inference have reproduced a range of complex motor phenomena, including handwriting (Friston et al.,

The success of continuous state space generative models in accounting for motor behavior appears to imply a disconnect between movement and planning, with the latter more easily accounted for using discrete time models. This suggests there must be some interface between the two, where decisions, selected from a discrete repertoire, are translated into beliefs about continuous variables. Figure

Decisions and movement. This graph illustrates how beliefs about categorical variables may influence those about continuous variables (message 1) and vice versa (message 2). The upper part of the graph is the same as that from Figure

Translating from discrete to continuous variables implies that there must be an interface at which a discretized encoding of space is mapped to a continuous encoding. In the oculomotor system, the superior colliculus may represent an interface of this sort (Parr and Friston,
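The two messages across such an interface can be sketched numerically. Descending, beliefs about a discrete choice (e.g., which target to foveate) induce an empirical prior over a continuous variable (e.g., eye position) as a posterior-weighted mixture of set-points; ascending, the settled continuous state supplies evidence for each discrete outcome. The targets, weights, and Gaussian likelihood below are hypothetical.

```python
import math

# A sketch of the discrete-continuous interface. Targets and beliefs
# are illustrative; the Gaussian ascending likelihood is an assumption.

targets = {"left": -10.0, "centre": 0.0, "right": 10.0}   # degrees
q_choice = {"left": 0.1, "centre": 0.2, "right": 0.7}     # discrete beliefs

# Descending message (discrete -> continuous): a Bayesian model average
# of the continuous set-points entailed by each discrete outcome.
prior_position = sum(q_choice[m] * targets[m] for m in targets)

# Ascending message (continuous -> discrete): evidence for each outcome,
# given where the eye actually settles, via a Gaussian likelihood.
def ascending_evidence(position, sd=2.0):
    like = {m: math.exp(-0.5 * ((position - targets[m]) / sd) ** 2)
            for m in targets}
    z = sum(like.values())
    return {m: l / z for m, l in like.items()}

evidence = ascending_evidence(8.5)   # strongly favours the "right" target
```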

An anatomy of inference. This schematic summarizes the networks we have discussed so far, but adds in the messages of Figure

While the superior colliculus may play the role of discrete-continuous interface in the oculomotor system, other structures must play analogous roles for different motor outputs. These are likely to share features of anatomy and physiology with the colliculus. Following the pattern above, these structures should receive input from cortical layer V, and from the output nuclei of the basal ganglia. Furthermore, they should encode continuous variables in a discretized fashion, with different neurons representing discrete elements of a continuous scale. A network that includes the ventrolateral (motor) thalamus and the primary motor cortex represents a candidate that meets these criteria (Bosch-Bouju et al.,

Figure

The preceding sections have reviewed recent attempts to understand the anatomy of the brain in terms of the inferential computations it must perform. We have argued that the key determinant for anatomical connectivity is the structure of the generative model the brain uses to make these inferences. This allows us to express hypotheses about computational neuroanatomy in a graphical notation that can be built up from relatively few simple building blocks, as described above. This framework is sufficiently general that we can use it to understand perceptual inference, planning, attentional gain, and movement. These can all be combined within the same factor graph, enabling the expression of systems-level hypotheses about brain-wide networks.

This article has focused on some very specific but ubiquitous features of computational anatomy that emerge under a factor graph treatment—with special attention to known neuroanatomy, neurophysiology, and neuropsychology. There are clear and obvious architectural features that are predicted under a graphical treatment of neuronal message passing; for example, the very existence of sparse neuronal (axonal) connections and the hierarchical organization of cortical and subcortical structures. The very existence of the brain as a network or graph that possesses hierarchically nested Markov blankets—and engages in sparse message passing [unlike the liver or blood (Friston and Buzsaki,

Not only do these ideas have to be internally consistent (minimally complex in relation to one another), they must accurately account for a range of observed phenomena, including the consequences of anatomical lesions. We have outlined a few examples throughout that illustrate this, including abnormalities of perception resulting from disconnections (e.g., Charles Bonnet syndrome), disorders of policy evaluation (e.g., Parkinson's disease), and failures of attentional gain (e.g., Lewy body disease). It is also important to realize that, as messages are propagated across the graph, deficits in one part of the graph have implications for all other parts. A disorder that offers a clear example of this kind of diaschisis (Price et al.,

The heterogeneity of anatomical lesions giving rise to neglect illustrates that the same processes of policy (i.e., saccadic) selection can be disrupted by multiple distant lesions. We have previously shown through simulation (Parr and Friston,

The above represent criteria for the face validity of anatomical process theories. To go further, it is necessary to make empirical predictions based upon these theories. We have highlighted three novel ideas that have arisen from the form of the generative models used here, which could be interrogated in empirical studies. First, if we interpret the direct and indirect pathways of the basal ganglia in terms of partition functions and empirical priors, respectively, this has important consequences for learned behaviors. While it is possible to optimize the parameters of a conditional probability (

Second, we touched upon hypothetical computational roles for serotonin that would be consistent with its anatomical and laminar distribution in the cortex under the computational anatomy discussed above. This scheme offers the potential to frame these anatomically derived computational hypotheses in terms of simulated behavior. To test whether serotonergic modulations are best explained as manipulations of interoceptive sensory precision, or the precision of preferences, we would need to design a task in which simulating manipulations of each of these parameters would give rise to different behavioral outputs. Fitting these parameters to behavior under different levels of pharmacological manipulation would allow us to evaluate the relative evidence for each of these hypotheses. For a recent example of this sort of approach, inferring computational parameters (including precision of preferences) for visual exploration, see Mirza et al. (

Finally, we considered the role of the motor thalamocortical networks, and suggested that these might represent the sort of discrete-continuous interface that we have previously associated with the superior colliculus. This predicts that there should be a very different sort of deficit resulting from lesions to the pathway into the ventrolateral thalamus compared to that following lesions to motor cortical outputs. The former might involve deficits in choice of movement, or difficulty initiating movements. The latter are more likely to give rise to impairments in the motor trajectories themselves. Of course, it is important to emphasize again that lesions to any neuroanatomical structure, or equivalently, to any part of a generative model, will have wide-reaching consequences due to the propagation of inferential messages.

The above represent theoretically motivated hypotheses that may be evaluated in relation to empirical evidence. These are potentially falsifiable (in a frequentist statistical sense), or could be shown to be relatively poor hypotheses compared to an alternative explanation (in a Bayesian sense). It is worth emphasizing that the inferential framework described here is not subject to these same tests. This (active) inference formulation simply provides a formal language and notation in which hypotheses about neuronal processes can be articulated and evaluated. The formulation of the brain's inferential computations as graphs and Markov blankets is therefore not in competition with, or an endorsement of, other approaches to understanding brain function. It accommodates those approaches that appeal to chaotic dynamical systems (Korn and Faure,

Clearly the account given in this paper is far from complete. We have omitted important structures, including the cerebellum and second-order thalamic nuclei, like the pulvinar. These have not escaped treatment under the framework of active inference. The pulvinar has been associated with generative models that treat prior beliefs over precisions as empirical priors, conditioned upon hidden states (Kanai et al.,

In this paper, we have emphasized the idea that generative models, and their constituent Markov blankets, represent a useful way to express hypotheses about brain connectivity. We have reviewed recent attempts to apply this framework to a range of anatomical networks, illustrating their face validity and internal consistency. There may be other plausible mappings between the connectivity implied by the Markov blankets of a generative model and the anatomy of the brain, which could make use of different auxiliary variables to the free energy gradients (prediction errors) we have assumed. Similarly, there are other plausible generative models that the brain may use, and these may involve different Markov blankets. For this reason, we emphasize not only current anatomical theories, but also a theoretically rigorous graphical framework in which questions about computational anatomy can be clearly posed. Under this framework, there are two broad lines of enquiry. First, what are the generative models the brain employs to make inferences about the world? Second, what is the mapping between the network implied by a given generative model and the connections of the brain? These questions constrain one another, as a good hypothesis for a computational neuroanatomy will imply a plausible generative model that contains Markov blankets consistent with brain connectivity.

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Rosetrees Trust (Award Number 173346) to TP. KF is a Wellcome Principal Research Fellow (Ref: 088130/Z/09/Z).

The Supplementary Material for this article can be found online at:

^{1}An empirical prior arises in hierarchical models, when a higher level provides constraints on a lower-level. It is called an empirical prior because the reciprocal message passing means that the prior is informed by the (empirical) data at hand. In other words, hierarchical models endow (empirical) priors with context-sensitivity.

^{2}The (Shannon) entropy is the dispersion of a probability distribution, and may be thought of as quantifying uncertainty about the variable described by the distribution.