Edited by: Eldad Yechiam, Technion-Israel Institute of Technology, Israel

Reviewed by: Floris P. De Lange, Radboud University Nijmegen, Netherlands; Davide Marchiori, National Chengchi University, Taiwan

*Correspondence: Hang Zhang, Department of Psychology, New York University, 6 Washington Place, New York, NY 10003, USA. e-mail:

This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.

This is an open-access article distributed under the terms of the

In decision from experience, the source of probability information affects how probability is distorted in the decision task. Understanding how and why probability is distorted is a key issue in understanding the peculiar character of experience-based decision. We consider how probability information is used not just in decision-making but also in a wide variety of cognitive, perceptual, and motor tasks. Very similar patterns of distortion of probability/frequency information have been found in visual frequency estimation, frequency estimation based on memory, signal detection theory, and in the use of probability information in decision-making under risk and uncertainty. We show that distortion of probability in all cases is well captured as a linear transformation of the log odds of frequency and/or probability, a model with a slope parameter and an intercept parameter. We then consider how

Estimates of the frequency of events by human observers are typically distorted. In Figure

R^2 denotes the proportion of variance accounted for by the fit.

Such S-shaped distortions^{1}

Figure

We use a two-parameter family of transformations to characterize the distortions of frequency/probability. This family of distortion functions is defined by the implicit equation,

Lo(π(p)) = γ Lo(p) + (1 − γ) Lo(p_0),

where

Lo(p) = ln[ p / (1 − p) ]

is the log odds (Barnard,

Figure caption: p_0 in the LLO function is the "fixed point" of the transformation, the value that is mapped to itself. Left: p_0 fixed at 0.4 and γ varied between 0.2 and 1.8. Note that the line at γ = 1 overlaps with the diagonal line, i.e., no distortion of probability. Right: γ fixed at 0.6 and p_0 varied between 0.1 and 0.9.

The two parameters of the family are readily interpretable. The parameter γ is the slope of the transformation in log odds coordinates; the parameter p_0 is the "fixed point" of the linear transformation, the value that is mapped to itself. To verify, substitute p = p_0 and simplify to get,

Lo(π(p_0)) = γ Lo(p_0) + (1 − γ) Lo(p_0) = Lo(p_0).

Since Lo(·) is invertible, it follows that π(p_0) = p_0. We refer to p_0 as the crossover point.

In Figure we plot examples of LLO transformations for a range of values of γ and p_0. At the point (p_0, p_0), the slope of the curve equals γ. When γ = 1, π(p) = p. When γ > 1 and 0 < p_0 < 1 we see an S-shaped curve. When 0 < γ < 1 and 0 < p_0 < 1 we see an inverted-S-shaped curve. When the crossover point p_0 is set to either 0 or 1, the curve is no longer S-shaped but simply concave or convex.
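As a concrete illustration, the LLO transformation and its two parameters can be sketched in a few lines of code. This is a minimal sketch assuming the parameterization Lo(π(p)) = γ·Lo(p) + (1 − γ)·Lo(p_0); the function names are ours, not from any published implementation.

```python
import math

def log_odds(p):
    """Log odds (logit) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_log_odds(x):
    """Inverse of log_odds (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))

def llo(p, gamma, p0):
    """Linear-in-log-odds (LLO) distortion of probability p.

    gamma: slope of the transformation in log odds coordinates
    p0:    crossover point, the probability mapped to itself
    """
    return inv_log_odds(gamma * log_odds(p) + (1.0 - gamma) * log_odds(p0))

# gamma = 1 leaves probability undistorted; the crossover point maps
# to itself; 0 < gamma < 1 with 0 < p0 < 1 gives the inverted-S pattern
# (small p overestimated, large p underestimated).
print(llo(0.25, 1.0, 0.4))  # ~0.25 (no distortion)
print(llo(0.40, 0.6, 0.4))  # ~0.40 (crossover point)
print(llo(0.01, 0.6, 0.4))  # > 0.01 (small p overestimated)
```

Note that the transformation is a straight line only in log odds coordinates; plotted on linear probability axes it produces the S-shaped and inverted-S-shaped curves described above.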

This family of functions, with a slightly different parameterization, has been previously used to model frequency distortion (Pitz,

The LLO function we use is just one family of functions that can capture the S-shaped transformations. Prelec (

The present paper is organized into four sections. In Section , we show that subjective frequency/probability in a wide variety of tasks is well fit by the LLO function. The estimated parameters (γ, p_0) and goodness-of-fit (R^2) of the LLO fit are shown on each plot. We see dramatic differences in γ and p_0 across tasks and individuals. We are concerned with two questions: how can we explain the LLO transformation? What determines the slope γ and crossover point p_0? We address these two questions in the following sections.

We conducted three experiments to investigate the factors that influence γ and p_0. We report them in Section , where we identify experimental manipulations that systematically affect γ and p_0. We discuss the results in light of recent findings in decision under risk, especially those that go by the name of "decision from experience" (Hertwig et al.,

Although no attempt has been made to explain the various S-shaped distortions of frequency/probability within a single theory, there are quite a few accounts of the distortion in one specific task or area. In Section

In Section

We now demonstrate that the subjective frequency/probability in a wide variety of tasks can be fitted by the LLO function with two parameters, γ and p_0. In the accompanying figures, we plot subjective frequency/probability versus true frequency/probability on log odds scales. On these scales the LLO function is a straight line with slope γ and crossover point p_0. Black dots denote data points. The blue line denotes the LLO fit. When you read the plots, note how different γ and p_0 can be for different tasks or individuals. These plots pose quantitative tests for any theory that aims to account for probability distortion.
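Because the LLO function is a straight line in log odds coordinates, its parameters can be estimated by ordinary least-squares regression of estimated log odds on true log odds. The sketch below illustrates the procedure on synthetic data; the numbers are ours, for illustration only, and this is not the authors' fitting code.

```python
import math

def log_odds(p):
    return math.log(p / (1.0 - p))

def fit_llo(true_p, est_p):
    """Least-squares fit of the LLO model in log odds coordinates.

    Regress Lo(est_p) on Lo(true_p): the slope is gamma, and the
    intercept equals (1 - gamma) * Lo(p0), from which p0 is recovered.
    (Undefined when gamma == 1, i.e., no distortion.)
    """
    x = [log_odds(p) for p in true_p]
    y = [log_odds(p) for p in est_p]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    gamma = sxy / sxx
    lo_p0 = (my - gamma * mx) / (1.0 - gamma)
    p0 = 1.0 / (1.0 + math.exp(-lo_p0))
    return gamma, p0

# Synthetic estimates generated from a known distortion (gamma=0.6, p0=0.4)
true_p = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
est_p = [1.0 / (1.0 + math.exp(-(0.6 * log_odds(p) + 0.4 * log_odds(0.4))))
         for p in true_p]
gamma, p0 = fit_llo(true_p, est_p)
print(round(gamma, 3), round(p0, 3))  # recovers 0.6 0.4
```

With real (noisy) data the same regression yields the best-fitting γ and p_0, and R^2 of the regression measures the quality of the fit.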

We introduced Attneave (

R^2 denotes the proportion of variance accounted for by the fit. The S-shaped distortions of frequency/probability on linear scales in Figures

Note that the relative frequency of even the most common letter (“e”) is less than 0.15. Intriguingly, the estimated crossover point

Another impressive example is Lichtenstein et al. (^{8}) to obtain the relative frequencies,

R^2 denotes the proportion of variance accounted for by the fit.

In the above two examples, participants' estimates of frequency were based on their memory of events (e.g., reading about a case of lethal events in the newspaper). To show that the LLO transformation is unique neither to memory nor to sequential presentation of events, our third example is Varey et al. (

Confidence rating refers to the task where participants estimate the probability of correctness or success of their own action. For example, in Gigerenzer et al. (

R^2 denotes the proportion of variance accounted for by the fit.

R^2 denotes the proportion of variance accounted for by the fit.

Gigerenzer et al. (

A classical task of decision under risk is to choose between two gambles or between one gamble and one sure payoff. Kahneman and Tversky (^{2}

Based on their choices between different gambles and different sure payoffs, participants' decision weights (a counterpart of π) for any specific stated probability could be estimated; the estimated crossover point was p_0 = 0.40.

The data presented in most decision-making studies are averaged across participants. As an exception, Gonzalez and Wu fitted individual participants' data; the resulting p_0 ranges from 0.26 to 0.98, with a median of 0.46. The only point in common across participants seems to be that all the slopes are less than one.

When the probabilities of possible consequences of a decision are known, it is decision under risk. When the probabilities are unknown, it is decision under uncertainty. Tversky and Fox (

R^2 denotes the proportion of variance accounted for by the fit. In Tanner et al. (1956), cf. Green and Swets (

Signal detection theory (Green and Swets,

Based on the relative frequencies of hits, misses, FAs, and CRs, the actual decision criterion used by the observer can be measured and compared with the optimal criterion. Systematic deviations from the optimal decision criterion have been found in many studies (Green and Swets,

In Figure

In a cognitive signal detection task where participants were asked to classify a number into two categories with different means (Healy and Kubovy,

At this point, you are probably intrigued by the same two questions as the authors: why does probability distortion in so many tasks conform to an LLO transformation? What determines the slope γ and crossover point p_0?

The plots we present here reflect only part of the empirical results we have reviewed. To provide a more complete picture, we clarify the following two points.

First, the slope γ of the LLO transformation is not determined by the type of task. The slope γ of the same task can be less than one under some conditions and greater than one under others, not to mention the quantitative differences. For example, the typical distortion in relative frequency estimation is overestimation of small relative frequencies and underestimation of large relative frequencies, corresponding to γ < 1. But in a visual task that resembles Varey et al. (

In decision-making under uncertainty, a reversal is reported in Wu et al. (

Second, the crossover point of the LLO transformation is not determined by the type of task, either. See the difference between Attneave (

Luce (

While the LLO family provides good fits to all of the data we have obtained, a two-parameter form of Prelec’s model of the probability weighting function (Prelec,

What controls the slope γ and crossover point p_0 of the LLO transformation in a specific task? In this section we report three new experiments on frequency/probability distortion.

Gonzalez and Wu (

The task we consider here is estimation of the relative frequency of one color of dot among a crowd of two or more colors of dots, a task used by Varey et al. For each condition, we fit the LLO function to estimate γ and p_0. We compared γ and p_0 across conditions.

In earlier studies on frequency estimation, some researchers found that small relative frequencies are overestimated and large relative frequencies underestimated (Stevens and Galanter,

In Experiment 1, participants estimated the relative frequency of either black or white dots among black and white dots. Each participant completed eight blocks. We examined the effects of two factors on γ and p_0: experience (block index) and sample numerosity (total number of dots).

Eleven participants, seven female and four male, participated. Six of them estimated the relative frequency of black dots; the remaining five, white. One additional participant was excluded from the analysis because of marked inaccuracy. All participants gave informed consent and were paid $12/h for their time. The University Committee on Activities Involving Human Subjects (UCAIHS) at New York University approved the experiment.

Stimuli were black and white dots displayed on a gray background. They were presented on a SONY GDM-FW900 Trinitron 24″ CRT monitor controlled by a Dell Pentium D Optiplex 745 computer using the Psychophysics Toolbox (Brainard,

On each trial the display of black and white dots was presented for 1.5 s. Participants were asked to estimate the relative frequency of black or white dots. Their estimates were numbers between 1 and 999, interpreted as relative frequencies out of 1000. Each participant made estimates for only one color of dots (black or white), and the color assigned to each participant was randomized. Participants were encouraged to be as accurate as possible. No feedback was given.

Trials were organized into blocks of 100 trials. In each block all of the relative frequencies 0.01, 0.02, …, 0.99 except 0.50 occurred once and 0.50 occurred twice. The total number of dots (

The experimental blocks were numbered from 1 to 8 in order. We refer to the block index as a measure of experience and examined its effect on γ and p_0 across the 11 participants.

Starting from slightly less than one, the slope γ became shallower with experience (Figure

The crossover point p_0 fluctuated around 0.5 in all the blocks, ranging from 0.42 to 0.55. According to a repeated-measures ANOVA, p_0 did not vary significantly across blocks; that is, experience had little effect on p_0.

We analyzed the effect of sample numerosity with a procedure similar to the one we used above for the effect of experience.

As sample numerosity increased, the slope γ declined (Figure

Moreover, the relationship of γ to

A least-squares fit of Eq.

The crossover point p_0 was 0.50, 0.54, 0.51, 0.68, and 0.68, respectively, for numerosities of 200, 300, 400, 500, and 600. As with experience, the effect of sample numerosity on p_0 failed to reach significance.

What determines the crossover point p_0? In Experiment 1, p_0 was around 0.5 and little affected by experience or sample numerosity. But recall that in the estimation of the relative frequency of the 26 English letters (Attneave), the crossover point was p_0 = 0.044, very different from 0.5 and coincidentally not far from 1/26. Fox and Rottenstreich's work on ignorance priors suggests the prediction p_0 = 1/m, where m is the number of alternative categories.

Experiment 2 was focused on testing the prediction p_0 = 1/m. The results of Experiment 1 were consistent with this prediction when there were two categories of dots (m = 2), black and white. In Experiment 2, we set m = 4.

Ten participants, nine female and one male, participated. None had participated in Experiment 1. All reported normal color vision and passed a color counting test. All subjects gave informed consent and were paid $12/h for their time. The UCAIHS at New York University approved the experiment.

The same as in Experiment 1, except that dots could be any of four colors: red, green, white, or black.

On each trial a display of black, white, red, and green dots was presented for 3 s. Afterward one of the four colors was randomly chosen and participants were asked to estimate the relative frequency of dots of that specific color. As in Experiment 1, participants input a number between 1 and 999, interpreted as a relative frequency out of 1000, and no feedback was given.

On each trial, the relative frequencies of the four colors were drawn from a multinomial-like random distribution centered at (0.1, 0.2, 0.3, 0.4), with each relative frequency constrained to be no less than 0.02. The assignment of relative frequencies to colors was randomized. The total number of dots in a display could be 400, 500, or 600, with each numerosity occurring in 32 trials of a block. Each participant completed one session of five blocks, that is, five blocks × 96 trials = 480 trials in total.

Fox and Rottenstreich,

In an attempt to further test the “guessing 1/

For each participant, we left out the trials whose estimated relative frequencies were within preferred response ± 0.04 and fit the remaining trials to Eq.

For the 10 participants, we computed the mean and 95% confidence interval separately for crossover point and for preferred response. The crossover point was 0.22 ± 0.07, indistinguishable from 1/4 (0.25). Note that it was much lower than 0.5. If this were the result of the “guessing 1/4” heuristic, we would expect a positive correlation between crossover point and preferred response. However, no significant correlation was detected, Pearson’s

We concluded that the prediction p_0 = 1/m was supported, but that it was unlikely to be the result of the heuristics discussed above.

Tversky and Kahneman (

Ten participants, seven female and three male, participated. None had participated in Experiment 1 or 2. One additional participant was excluded for failing to converge in the adaptive staircase procedures we used to measure JND. All subjects gave informed consent and were paid $12/h for their time. The UCAIHS at New York University approved the experiment.

Same as Experiment 1.

On each trial two displays of black and white dots were presented, each for 1.5 s, separated by a blank screen of 1 s. Half of the participants judged which display had a higher proportion of black dots, and the other half, white dots.

As in Experiment 1, the total number of dots (

The proportion of black or white dots in one display was fixed at 0.5. The proportion in the other was adjusted by adaptive staircase procedures. For each of the five numerosity conditions, there was one 1-up/2-down staircase of 100 trials, resulting in 500 trials in total. Each staircase had multiplicative step sizes of 0.175, 0.1125, 0.0625, and 0.05 log units, respectively, for the first, second, third, and remaining reversals. The five staircases were interleaved. Five practice trials preceded the formal experiment.

The 1-up/2-down staircase procedure converges to the 70.7% JND threshold. For each participant and numerosity condition, we averaged all the trials after the first two reversals to compute the threshold. The mean threshold across participants was 0.57, 0.57, 0.56, 0.56, 0.55, respectively for the numerosity of 200, 300, 400, 500, 600. According to a repeated-measures ANOVA, there was no significant difference in the JND threshold for different numerosities,
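For readers unfamiliar with adaptive staircases, the sketch below simulates a 1-up/2-down procedure; it converges to the level at which the probability of a correct response is sqrt(0.5) ≈ 0.707. The observer model, the single fixed step size, and the threshold value here are hypothetical illustrations, not the exact settings of Experiment 3.

```python
import math
import random

def simulate_staircase(p_correct, start, n_trials=100, step=0.05):
    """Simulate a 1-up/2-down staircase with multiplicative steps.

    p_correct(level) gives the probability of a correct response at a
    stimulus level. The level is divided by 10**step after two
    consecutive correct responses (harder) and multiplied by 10**step
    after each error (easier), so it converges toward the level where
    p_correct(level) == sqrt(0.5), i.e., about 70.7% correct.
    """
    level, streak, history = start, 0, []
    for _ in range(n_trials):
        history.append(level)
        if random.random() < p_correct(level):
            streak += 1
            if streak == 2:          # 2 down: make the task harder
                level /= 10 ** step
                streak = 0
        else:                        # 1 up: make the task easier
            level *= 10 ** step
            streak = 0
    return history

random.seed(1)
# Hypothetical observer: a logistic psychometric function whose
# 70.7% point lies near a proportion of 0.56.
pf = lambda x: 1.0 / (1.0 + math.exp(-(x - 0.545) / 0.017))
track = simulate_staircase(pf, start=0.70, n_trials=400)
print(sum(track[200:]) / 200)  # mean of later trials, near the 70.7% point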

As demonstrated in Section , probability distortion in a wide variety of tasks is well captured as a linear transformation of log odds with slope γ and crossover point p_0 (LLO, Eq.

In Experiment 1 we found that the slope γ decreased with increasing experience or larger sample numerosity. Intuitively, these trends are surprising, because an accumulation of experience or a larger sample size should reduce "noise" and thus lead to more accurate estimation. Interestingly, the slope γ was proportional to the reciprocal of log

In both Experiments 1 and 2 we found that the crossover point p_0 agrees with the prediction p_0 = 1/m. Our results are consistent with the category effect found in Fox and Rottenstreich,

Recently, research on decision-making has begun to focus on how the source of probability/frequency information affects probability distortion. This new research area contrasts “decision from experience” (Barron and Erev,

What are the implications of our results for decision from experience? A typical finding in decision from experience is an underweighting of small probabilities (e.g., Hertwig et al.,

In the language of LLO, the larger the numerosity (sample size), the shallower the slope of the probability distortion (underweighting of small probabilities corresponds to a slope greater than one). Note that this effect of sample size on probability distortion in decision from experience qualitatively parallels what we found in Experiment 1. And according to Eq.

There is another hint in the literature that the highly ordered changes in probability distortion that we observe in visual numerosity tasks would also show up in decision-making tasks where probability information is presented as visual numerosity. Denes-Raj and Epstein (

We have also shown that we can systematically manipulate the crossover point p_0 in a relative visual numerosity task. The crossover point is often assumed not to vary in decision-making under risk (Tversky and Kahneman,

Gigerenzer (Gigerenzer et al.,

Probability distortion in confidence rating typically has a slope of γ > 1 (see Figure

Why do humans distort frequency/probability in the ways that they do? The subjective probability may deviate from the true probability for many reasons, but no simple reason can explain the S-shaped patterns we have observed.

For example, people might overestimate the frequencies of the events that attract more media exposure (Lichtenstein et al.,

The S-shaped distortion has received much attention in quite a few areas. Theories and models have been developed to account for the S-shaped distortion within specific areas, although little effort has been made to build a unified theory covering all of them. In this section, we briefly describe the representative theories and models, organized by area. Their predictions, quantitative or qualitative, for the slope and crossover point of the distortion are compared with the empirical results we summarized in Sections

Spence’s (

The basic assumption is Stevens' power law: the perceived magnitude of a physical magnitude, such as the number of black dots in a visual array of dots of different colors, is a power function of the physical magnitude with a specific exponent. We apply the power assumption to the estimation of relative frequency as follows. Suppose a display contains N_1 black dots and N_2 dots of other colors. The perceived numerosities would be N_1^α and N_2^α, and the perceived relative frequency of black dots would be N_1^α / (N_1^α + N_2^α).

Dividing both the numerator and the denominator of the right side by (N_1 + N_2)^α, we get the perceived relative frequency as a function of the true relative frequency p = N_1 / (N_1 + N_2): π(p) = p^α / (p^α + (1 − p)^α).

It is easy to see that this is a variant of LLO: taking the log odds of both sides gives Lo(π(p)) = α Lo(p), which corresponds to γ = α and p_0 = 0.5. Thus an S-shaped distortion follows from the assumption of Stevens' power law.
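This equivalence is easy to check numerically: taking log odds of π(p) = p^α / (p^α + (1 − p)^α) gives α·Lo(p), an LLO transformation with γ = α and crossover point 0.5. A small sketch (α = 0.7 is an arbitrary example value):

```python
import math

def log_odds(p):
    return math.log(p / (1.0 - p))

def power_law_pi(p, alpha):
    """Perceived relative frequency under Stevens' power law."""
    return p ** alpha / (p ** alpha + (1.0 - p) ** alpha)

def llo(p, gamma, p0):
    x = gamma * log_odds(p) + (1.0 - gamma) * log_odds(p0)
    return 1.0 / (1.0 + math.exp(-x))

# The power-law form coincides with LLO at gamma = alpha, p0 = 0.5,
# because Lo(power_law_pi(p, alpha)) == alpha * Lo(p) and Lo(0.5) == 0.
alpha = 0.7
for p in [0.01, 0.1, 0.3, 0.5, 0.8, 0.99]:
    assert abs(power_law_pi(p, alpha) - llo(p, alpha, 0.5)) < 1e-9
print("power law matches LLO with gamma = alpha, p0 = 0.5")
```

The crossover point is fixed at 0.5 in this account because Lo(0.5) = 0 makes the intercept term vanish for any γ.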

Hollands and Dyre (

As to the crossover point, Hollands and Dyre's model implies p_0 = 1/

Tversky et al.’s support theory (Tversky and Koehler,

To explain the inverted-S-shaped distortion of relative frequency, Fox and Rottenstreich,

The value of the prior probability is the crossover point; on this account, p_0 = 1/

However, a weighted addition of the true log odds and a prior log odds would lead to a γ never greater than 1, unless the prior log odds has a negative weight. Therefore, it cannot explain the γ > 1 cases (Shuford,

The slope of the distortion equals the weight assigned to the true log odds in the combination. Fox and Rottenstreich,

The calibration model of Smith and Ferrell (

The calibration model borrows the framework of signal detection theory. Correctness and wrongness of an answer, or success and failure of an action, are treated as two alternative states, i.e., signal present and absent. The observer's confidence is assumed to have a fixed mapping to the perceived likelihood ratio of the two states. If the discriminability between the two states is perceived to be larger than its true value, small probabilities will be underestimated and large probabilities overestimated, amounting to γ > 1 (as in Figure

The calibration model does not necessarily lead to an LLO transformation and does not have any specific predictions for the selection of slope and crossover point.

Erev et al.,

With specific response rules, the S-shaped distortion can be produced. The predictions of the stochastic model are not intuitive and are illustrated in their computational simulations. One of the predictions states that the underestimation of small probabilities and overestimation of large probabilities (i.e., the γ > 1 pattern) widely identified in confidence rating tasks, a seemingly reversed pattern of regression-to-the-mean, is actually a kind of regression-to-the-mean phenomenon disguised by the way the true probability is defined. The true probability in the confidence rating task is usually defined as the actual success rate at a specific confidence level. That is, successful and unsuccessful actions are grouped by participants' confidence ratings. Wallsten et al. (

However, we doubt this effect of true probability definition can apply to the confidence rating data of McGraw et al. (

Martins (

The involvement of a prior could explain why the estimated probabilities shrink toward a center. However, for any specific

Another difficulty that adaptive probability theory encounters is the underweighting of small probability observed in studies of decision from experience (e.g., Hertwig et al.,

In this article we examined probability distortion in human judgment and the factors that affect it. An evident direction for future research is to develop process-based models of human use of probability and frequency information. The theories and models we reviewed above are among those that use specific cognitive processes to explain the emergence of the

We conjecture that log odds is a fundamental representation of frequency/probability used by the human brain. Here are a few pieces of evidence.

Phillips and Edwards (

Maloney and Dal Martello (

R^2 denotes the proportion of variance accounted for by the linear fit. See text for implications.

Gold and Shadlen considered the problem of deciding whether hypothesis h_1 or hypothesis h_0 is true. Assume there is a pair of sensory neurons: "neuron" and "antineuron." The firing rate of "neuron," x_1, follows a different random distribution depending on whether h_1 or h_0 is true. So does the firing rate of "antineuron," x_2, whose distribution under h_1 is the same as the distribution of x_1 under h_0, and vice versa. For many families of random distributions, such as Gaussian, Poisson, and exponential distributions, Gold and Shadlen prove that the log likelihood ratio of h_1 to h_0 is a linear function of the firing rate difference between "neuron" and "antineuron," x_1 − x_2.
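To illustrate with the Gaussian case (our own worked example, assuming conditionally independent rates with equal variance): suppose that under h_1 the "neuron" rate is x_1 ~ N(μ_1, σ²) and the "antineuron" rate is x_2 ~ N(μ_0, σ²), with the two means swapped under h_0. Then

```latex
\log \frac{p(x_1, x_2 \mid h_1)}{p(x_1, x_2 \mid h_0)}
  = \frac{(x_1 - \mu_0)^2 - (x_1 - \mu_1)^2}{2\sigma^2}
  + \frac{(x_2 - \mu_1)^2 - (x_2 - \mu_0)^2}{2\sigma^2}
  = \frac{\mu_1 - \mu_0}{\sigma^2}\,(x_1 - x_2),
```

a linear function of the difference in firing rates, with slope (μ_1 − μ_0)/σ². The quadratic terms in x_1 and x_2 cancel because the variances under the two hypotheses are equal.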

Log odds has been independently developed to fit psychophysical data in many areas of perception and cognition over the course of many years. As early as 1884, Peirce and Jastrow (

In the decision area, Karmarkar (

For signal detection theory, it is a common practice to plot the actual decision criterion against the optimal decision criterion in the log scale (Green and Swets,

We are seeking a general explanation for the linear transformation of log odds in these various areas. However different these tasks may look, they are connected by the same evolutionary aim: using possibly imperfect probabilistic information to make decisions that lead to the greatest chance of survival. It is therefore surprising, at first glance, that organisms systematically distort probability. It is doubly surprising that the same pattern of distortion (LLO) is found across a wide variety of tasks.

A full explanation of the phenomena just described would require not only that we account for the form of the distortion but also for the large differences in the values of the two parameters across tasks and individuals and the factors that affect parameter settings. The key question that remains is, then, what determines the slope and crossover point of the linear log odds transformation? We found that in one task we could identify experimental factors that controlled both the slope and crossover point of the LLO transformation of perceived relative numerosity. We conjecture that there are factors in each of the domains we considered that are responsible for the particular choice of probability distortion observed. We need only find out what they are.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This research was supported in part by grant EY019889 from the National Institutes of Health and by the Alexander v. Humboldt Foundation.

^{1}We use the term “distortion” to cover transformations in probability or relative frequency implicit in tasks involving probability or relative frequency. We use “S-shaped” to refer to both S-shaped and inverted-S-shaped. Precisely, Attneave’s (

^{2}We use the generic term “probability distortion” to refer to non-linear transformations of probability in different kinds of task. In decision under risk, the term “probability weight function” or “decision weight function” would coincide with what we refer to as probability distortion.