
Edited by: Walter Adriani, Istituto Superiore di Sanità (ISS), Italy

Reviewed by: Vieri Giuliano Santucci, Istituto di Scienze e Tecnologie della Cognizione (ISTC), Italy; Benoît Girard, Centre National de la Recherche Scientifique (CNRS), France

This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Expected utility models are often used as a normative baseline for human performance in motor tasks. However, this baseline ignores computational costs that are incurred when searching for the optimal strategy. In contrast, bounded rational decision-theory provides a normative baseline that takes computational effort into account, as it describes optimal behavior of an agent with limited information-processing capacity to change a prior motor strategy (before information-processing) into a posterior strategy (after information-processing). Here, we devised a pointing task where subjects had restricted reaction and movement time. In particular, we manipulated the permissible reaction time as a proxy for the amount of computation allowed for planning the movements. Moreover, we tested three different distributions over the target locations to induce different prior strategies that would influence the amount of required information-processing. We found that movement endpoint precision generally decreases with limited planning time and that non-uniform prior probabilities allow for more precise movements toward high-probability targets. Considering these constraints in a bounded rational decision model, we found that subjects were generally close to bounded optimal. We conclude that bounded rational decision theory may be a promising normative framework to analyze human sensorimotor performance.

According to ethological theory (Alcock,

In human motor control, for example, a wide range of behaviors has been previously explained by optimal actor models that depend on critical variables such as motor effort, trajectory smoothness, endpoint variability, task accuracy, or endstate comfort (Flash and Hogan,

While Bayesian models are conceptually appealing, we know in particular from research in artificial intelligence that such models can become computationally intractable when faced with real-world information-processing problems. A growing number of studies have therefore addressed the question of under what conditions we observe deviations from Bayes-optimal behavior. Acerbi et al. (

Here, we propose information-theoretic bounded rationality (Ortega and Braun,

In our experiment we consider a decision-maker that is confronted with a world state

where we have introduced _{p(x|a)}[_{0}(_{0}(_{0}(^{*}(

The Kullback-Leibler divergence _{KL}(_{0}(_{0}(

In the case of multiple world states _{0}(

Here, the average _{KL} measures the average distance between the prior _{0}(^{*}(

where

with the mutual information

By varying the parameter β from 0 to ∞, we obtain a family of bounded rational solutions ^{*}(

with performance criterion

Model predictions. _{1} the information processing cost is higher and higher utility is achieved compared to the low resource scenario _{2}. _{1}(_{i}) changes. A higher world state probability lowers the entropy whereas low world state probabilities are associated with higher action entropies.

The bounded rational decision-making model makes the following predictions:

Any decision-maker, irrespective of the underlying mechanism, can either be on or below the curve, i.e., given a certain amount of information resources it is not possible to achieve a higher expected utility than the bounded rational actor.

If we vary the resources available to the decision-making process (e.g., _{1} and _{2} in Figure

If we vary the world state distribution ρ(_{0}(_{1}(
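The efficiency frontier that these predictions refer to can be traced numerically. The following is a minimal sketch, not the authors' implementation: it assumes a Blahut-Arimoto-type fixed-point iteration (a standard scheme for this class of free energy problems), an idealized utility matrix in which selecting the correct target always yields a hit (no execution noise), and a hypothetical non-uniform distribution `rho_nu` whose values are chosen for illustration only. All function and variable names are illustrative.

```python
import numpy as np

def bounded_rational_solution(U, rho, beta, n_iter=200):
    """Blahut-Arimoto-style iteration for the bounded rational posterior.

    U    : (n_states, n_actions) utility matrix (assumed: hit probability)
    rho  : (n_states,) world state distribution
    beta : resource parameter (0: no processing, large: near-rational)
    """
    n_states, n_actions = U.shape
    prior = np.full(n_actions, 1.0 / n_actions)  # initial action prior
    for _ in range(n_iter):
        # posterior p(a|w) is proportional to prior(a) * exp(beta * U(w, a))
        logits = np.log(np.maximum(prior, 1e-300))[None, :] + beta * U
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(logits)
        post /= post.sum(axis=1, keepdims=True)
        # the optimal prior is the marginal of the posterior over world states
        prior = rho @ post
    return prior, post

def utility_and_information(U, rho, post):
    """Expected utility and mutual information (bits) of a posterior p(a|w)."""
    joint = rho[:, None] * post
    marginal = joint.sum(axis=0)
    independent = rho[:, None] * marginal[None, :]
    mask = joint > 0  # skip zero-probability cells (0 * log 0 = 0)
    info_bits = float(np.sum(joint[mask] * np.log2(joint[mask] / independent[mask])))
    expected_utility = float(np.sum(joint * U))
    return expected_utility, info_bits

U = np.eye(4)                             # assumption: noise-free execution
rho_u = np.full(4, 0.25)                  # uniform world state distribution
rho_nu = np.array([0.7, 0.1, 0.1, 0.1])   # hypothetical non-uniform distribution

betas = [0.0, 1.0, 2.0, 5.0, 50.0]
frontier_u = [
    utility_and_information(U, rho_u, bounded_rational_solution(U, rho_u, b)[1])
    for b in betas
]
_, info_u = utility_and_information(U, rho_u, bounded_rational_solution(U, rho_u, 5.0)[1])
_, info_nu = utility_and_information(U, rho_nu, bounded_rational_solution(U, rho_nu, 5.0)[1])

for b, (eu, info) in zip(betas, frontier_u):
    print(f"beta={b:5.1f}  E[U]={eu:.3f}  I={info:.3f} bits")
```

Each β yields one point (information, expected utility) on the frontier. Under these assumptions the uniform case runs from zero information at β = 0 to 2 bits as β grows, and the non-uniform case requires less information because the mutual information cannot exceed the entropy of the world state distribution.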

If we want to compare real-world decision-makers with the above theoretical description, we need to determine two quantities experimentally, namely the average utility ^{exp} and the average empirical information ^{exp}. The choice probabilities are the measured posteriors that may differ from the theoretical bounded optimal posteriors. In addition to the posteriors, we need to determine the prior choice probabilities in order to obtain _{0}(

The performance of the decision-maker can be represented by a data point

where ^{min} is the maximum theoretical utility achievable without any information processing and ^{max} denotes the maximal theoretical utility for a channel with an information processing rate of
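As an illustration of how the two experimental quantities could be estimated, the sketch below computes a plug-in estimate of expected utility and mutual information from a table of trial counts, and then a normalized efficiency. It rests on several assumptions that are not taken from the study: endpoints are discretized into one action bin per target, the prior is taken to be the empirical action marginal, the efficiency is assumed to be the position of the measured utility between the minimum and maximum utility, and the counts and the values `u_min` and `u_max` are invented for illustration.

```python
import numpy as np

def empirical_performance(counts, U):
    """Plug-in estimates from a (n_states, n_actions) table of trial counts.

    Returns the average utility and the mutual information (bits) between
    world state and action, using the empirical action marginal as prior.
    """
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()          # empirical p(w, a)
    rho = joint.sum(axis=1)                # empirical world state distribution
    p_a = joint.sum(axis=0)                # empirical action marginal (prior)
    independent = rho[:, None] * p_a[None, :]
    mask = joint > 0                       # skip empty cells (0 * log 0 = 0)
    info_bits = float(np.sum(joint[mask] * np.log2(joint[mask] / independent[mask])))
    avg_utility = float(np.sum(joint * U))
    return avg_utility, info_bits

def efficiency(u_exp, u_min, u_max):
    """Assumed efficiency: measured utility relative to the achievable range."""
    return (u_exp - u_min) / (u_max - u_min)

# Hypothetical counts: rows are targets, columns are the bins that were hit.
counts = np.array([
    [90, 10,  0,  0],
    [10, 80, 10,  0],
    [ 0, 10, 80, 10],
    [ 0,  0, 10, 90],
])
U = np.eye(4)  # assumption: utility 1 for hitting the correct bin, else 0

u_exp, i_exp = empirical_performance(counts, U)
eff = efficiency(u_exp, u_min=0.25, u_max=0.90)  # hypothetical frontier values
print(f"E[U]={u_exp:.3f}  I={i_exp:.3f} bits  efficiency={eff:.3f}")
```

Note that plug-in estimates of mutual information are biased upward for small trial numbers, so a sketch like this is only meaningful with enough trials per target.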

Ten naïve subjects (5 female, 5 male) participated in the study and provided informed written consent prior to participation. The study was approved by the ethics committee of Ulm University. Subjects carried out the experiment over several days and were compensated for their time with a basic hourly wage of 8 € plus a performance-dependent bonus of 0-5 € determined by their target hit ratio.

The experiment was conducted using a vBOT robotic manipulandum (Howard et al.,

In the experimental task, subjects had to perform reaching movements toward one of four concentrically located targets representing the world states

Experimental design. The probability distributions ρ_{u}, ρ_{l}, and ρ_{r} over the four targets correspond to different world state distributions.

Trials had a consistent flow of events (see Figure

Scheme of the decision process. The task requires planning and execution of a movement. Based on the expected utility

Two reaction time limits, RT_{1} and RT_{2}, were defined. RT_{1} was intended to represent a hard time constraint and was set specifically for each subject. The second reaction time limit, RT_{2}, was the same for all subjects and was fixed at 300 ms. There were three different probability distributions over targets: uniform (ρ_{u}), high probability for the leftmost target (ρ_{l}), and high probability for the rightmost target (ρ_{r}) (see Figure

To allow for learning, the experiment was preceded by a training phase that consisted of four stages. The first stage was a block of 500 trials, where in each trial the target was selected randomly from a uniform probability distribution. To get subjects used to the task, the reaction time limit was set to 1 s. In the second stage, subjects underwent four blocks of 100 single target trials. As these four blocks were identical to the four blocks at the end of the experiment, comparing them gives an indication of subjects' behavioral improvement. In the third stage of training, subjects' individual reaction time limit RT_{1} was determined by fixing RT_{1} to a value well below 300 ms at which approximately 20% of trials could not be completed in time. In the fourth and final stage of training, subjects underwent six blocks of 500 trials each, where each block was characterized by a different condition, identical to the test trials described above. Subjects' data across these blocks can be seen in Figure

The reaching task in our experiment consists of two processes, movement planning and movement execution (see Figure

We compared two reaction time conditions for each subject: a fast condition and a slow condition, RT_{1} and RT_{2} respectively. Averaged over all subjects we measured a mean reaction time τ_{RT1} = 154 ± 10 ms and τ_{RT2} = 202 ± 5 ms. By manipulating the permissible reaction time, we effectively manipulated subjects' decision-making resources. According to bounded rational decision theory one would expect higher decision noise with fewer resources, and therefore an increased total endpoint variance in case of RT_{1} compared to RT_{2}. In line with our prediction we found a general tendency of increased movement variability when decreasing subjects' reaction time limit. Figure RT_{1} and RT_{2} for uniform target distribution ρ_{u}. In particular, we find that the standard deviation σ = 〈σ_{w}〉_{ρ(w)} of subjects' endpoints on average increased when decreasing subjects' reaction time limit (_{RT}: = _{w} represents the difference between mean movement endpoint and desired target location 〈b_{w}〉_{ρ(w)} (see Figure

Effect of resource limitation on movement variability. _{1}. _{RT} in the two reaction time conditions is measured by deviation from the correct target location. Accuracy is generally higher in the slower reaction time condition _{2}.

To ensure that differences in movement planning were indeed the major explanans for the observed differences in endpoint variability, we checked that movement execution was similar for the two reaction time conditions. We therefore looked at trajectory paths, velocity profiles and trajectory variance averaged over all of subjects' movements. While we found no systematic differences between movement paths in the fast and slow reaction time conditions (see Figure ^{−5}, repeated measures ANOVA) as can be seen in Figure

To assess the effect of limited reaction time on subjects' performance, we investigate the change in utility and information. Subjects' overall performance is measured by their expected utility 𝔼[_{u} can be seen in Figure _{RT1} = 1.42±0.05 bits and _{RT2} = 1.56±0.03 bits. Forming the difference quotient between information and reaction time, we get a mean information rate of approximately 3 bits/s.
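The difference quotient for the uniform condition can be checked directly from the population means reported above. The short sketch below uses only those reported values; the function name is illustrative, and a quotient computed per subject and then averaged could differ slightly from this population-level figure.

```python
def information_rate(info_fast_bits, info_slow_bits, rt_fast_ms, rt_slow_ms):
    """Difference quotient of information (bits) over reaction time (seconds)."""
    delta_info = info_slow_bits - info_fast_bits
    delta_time_s = (rt_slow_ms - rt_fast_ms) / 1000.0
    return delta_info / delta_time_s

# Population means for the uniform target distribution, as reported in the text.
rate_uniform = information_rate(1.42, 1.56, 154.0, 202.0)
print(f"{rate_uniform:.2f} bits/s")  # 0.14 bits / 0.048 s
```

With an information gain of 0.14 bits over 48 ms of additional reaction time, the quotient is about 2.9 bits/s, consistent with the stated rate of approximately 3 bits/s.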

Bounded rational analysis of movement performance for different reaction times and uniform target distribution ρ_{u}. Movement performance is given by subjects' average target hitting probability (expected utility) and sensorimotor information in bits measured by the mutual information _{1} and _{2}. The 95 % confidence region is determined by bootstrapping. Subjects' experimental performances are each compared against two bounded optimal efficiency frontiers, arising as the theoretical predictions with and without execution noise. For planning in an ideal system without movement execution noise (

In Figure _{1} and _{2} and between targets _{3} and _{4}, such that the posterior simply selects one of the two possibilities. When considering execution noise in the planning phase, the optimal performance of a bounded rational decision-maker depends on the individual execution noise level

In line with the prediction of Figure _{1} (_{1} (_{2}, even though average efficiencies as defined in Equation 8 are well above 90% throughout for all subjects (see Figure

Performance with different reaction time resources (RT_{1} < RT_{2}).

A bounded rational decision-maker should consider the given world state distribution ρ(^{*}(_{u} the processing cost to achieve maximum expected utility is approx. 1.7 bit for an average level of motor execution noise. In comparison, the non-uniform distribution ρ_{nu} that represents the mirror symmetric distributions ρ_{l} and ρ_{r} from Figure _{nu} is depicted in Figure _{r} and ρ_{l}. Subjects' performance in the two reaction time conditions is compared against the bounded optimal efficiency frontier for ρ_{nu} considering the individual level of movement execution noise. On average over all subjects we measured mean reaction times of τ_{RT1} = 155 ± 9 ms and τ_{RT2} = 189 ± 6 ms and mean information values of _{RT1} = 1.01±0.03 bits and _{RT2} = 1.09±0.02 bits. Taking the difference quotient of information and reaction time we estimate an information rate for the population of approximately 2.1 bits/s, which is similar to the uniform condition.

Bounded rational analysis of movement performance for varying world state distributions. Subjects' experimental performances are each compared against the bounded optimal efficiency frontier for given world state distributions. For the non-uniform world state distribution ρ_{nu} the information processing cost is reduced compared to the uniform distribution ρ_{u}.

The efficiency frontier for the non-uniform distribution ρ_{nu} is shifted to the left compared to the efficiency frontier under the uniform world state distribution ρ_{u}. As predicted in Figure ^{−8}, repeated measures ANOVA)(see Figure

Performance under non-uniform world state distributions. _{nu} is generally lower than for ρ_{u}.

In order to make predictions about subjects' performance for non-uniform world state distributions from their behavior under a uniform condition, it is necessary to make additional assumptions about the constraints that determine the decision-making process. For the prediction of the bounded rational decision-making model in Figure _{u} and ρ_{nu} in both reaction time conditions (see Figure

Predicting performance across world state distributions.

In principle, there are alternative assumptions one could make to predict behavior in the non-uniform condition from uniform performance. We consider three possible alternatives: (1) one could try to achieve a higher utility by maintaining the same level of information, (2) one could reoptimize prior and posterior by maintaining the same rationality parameter β, and (3) one could simply maintain exactly the same behavior as before without adapting at all. The first hypothesis can be ruled out immediately when looking at Figure _{i}) for more probable world states _{i} is lower than entropy of action distributions for less probable world states _{i}. Figure _{u}, ρ_{l} and ρ_{r} and relate to the outer most target locations (_{1}, _{4}). The theoretical entropies are computed from the bounded optimal posteriors that have the same expected utility as the experimental posteriors. For frequent targets we average the theoretical entropies for target _{1} in the case of ρ_{l} and target _{4} in the case of ρ_{r} and compare against the experimentally determined action entropy for the same targets. For infrequent targets we average the theoretical entropies for target _{4} in the case of ρ_{l} and target _{1} in the case of ρ_{r} and compare against the experimentally determined action entropy for the same targets. The _{1} and target _{4} under the uniform target distribution ρ_{u}. Dependent on the world state frequency the optimal entropy of the action distribution is modulated, such that frequent world states should be associated with lower entropy and infrequent world states with higher entropy. This way the average entropy is lowered. The experimental results confirm the trend that the action entropy decreases with increasing world state frequency (

Comparison of predicted and experimental utility and information when predictions are generated from the uniform condition under the hypothesis that the rationality parameter β is constant.

Comparison of predicted and experimental utility and information when predictions are generated from the uniform condition under the hypothesis that there is no adaptation and the posterior

Conditional entropy ^{*}(

In this study we investigated how the abstract theory of information-theoretic bounded rationality can be applied to a sensorimotor reaching experiment by varying informational resources during motor planning. In particular, we varied the permissible reaction time for planning and the probability distribution over the different targets to manipulate subjects' action prior. We found that both information constraints had a significant impact on task performance as measured by endpoint variability and expected utility (i.e., the expected probability of a target hit under known motor execution noise). The two experimental manipulations of reaction time and prior can be mapped into abstract informational resources quantified by relative Shannon information within bounded rationality theory. Our results show that a decrease in permissible reaction time is accompanied by a decrease in both utility and information-processing resources, and that changing to a low-entropy world state distribution also decreases information-processing costs, as the motor system selectively adapts endpoint variability to the probability of the targets. Both the reduction in information-processing due to reaction time limits and the modulation of endpoint variability depending on target frequency can be understood within the normative framework of bounded rationality. Usually, optimal decision-making models under motor execution noise either assume perfect planning or add planning noise in an

Bounded rationality with information-theoretic concepts has been investigated in the context of economic decision-making (Sims,

Two of the earliest applications of information theory to human behavior were Hick's law (Hick,

Our experiment could be interpreted as a Hick's experiment in continuous space. For experimental design reasons we decided not to have continuous targets, so we could average responses for each target without making a global translational symmetry assumption. Instead, we chose four targets as an approximation to the continuous case. For a continuous uniform target distribution

where the first term is the differential entropy of the target distribution, the second term is the differential entropy introduced by Gaussian noise and the third term is a correction term due to truncation. For our approximation the first and second term would stay the same, the third term would vary depending on target distance and number of targets. In Figure

Information values for both world state distributions are very close to their respective maxima of

Another important difference between Hick's task and our experiment is that we had a hard controlled reaction time limit, whereas the original experiment was a free reaction time task with a given accuracy level. In follow-up studies, for example by Pachella and Fisher (

As we measured endpoint accuracy in reaching movements, Fitts' law is of course in principle applicable to our task. However, Fitts' law makes no statement about reaction time, which we manipulated, but only about movement time, which we did not manipulate. As we kept movement time, target distance and target width constant throughout the experiment, Fitts' law cannot explain our changes in endpoint accuracy due to reaction time manipulation. As suggested previously (Albertas et al.,

How does bounded rationality contribute to this line of study? While our experimental design can be interpreted as a Hick's experiment in continuous space, both Hick's law and Fitts' law are fundamentally concerned with the relationship between information and time. In contrast, bounded rationality studies the relationship between utility and information, where information is conceived as an abstract resource measure, ultimately counting how many distinctions one can make. In general, one would expect a monotonic relationship between information and any concrete resource measure like time, because the more resources are available the more distinctions one can make (for example, the longer the computation of the number pi, the more digits are known). A linear relationship between information and resource would only be expected in special cases, for example in case of logarithmic search. In this sense both Hick's law and Fitts' law are consistent with bounded rationality theory, even though these findings only concern the resource part and ignore utility. In contrast, within a bounded rationality framework one could study effects of different utilities on the choice task, for example in the form of monetary payoff or by changing features of the stimulus. Norwich et al. (

As far as the authors are aware, this study is the first to show the effect of limited reaction time on continuous endpoint variability in a human reaching task. There is a large body of studies that have investigated speed-accuracy trade-offs (Wickelgren,

While there have been a number of studies investigating the effect of neuromuscular noise on motor control (Harris and Wolpert,

Although there have been attempts in applying computational models of noisy decision-making to

From the large family of sampling-based decision-making models, in particular drift diffusion evidence accumulation models have been successfully linked to neurophysiological recordings, especially in the parietal cortex (Wang and Sandholm,

Decision-making models that take the cost of computation into account have also been explored in the reinforcement learning literature (Keramati et al.,

SS, SG, and DB designed the experiment. SS performed experiments and analyzed the data. SS and SG generated predictions from computer simulations. SG and DB supervised the project. SS and DB wrote the paper.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at: