
Edited by: Dominic Standage, Queen's University, Canada

Reviewed by: NaYoung So, Columbia University, USA; Wael Asaad, Brown University, USA

*Correspondence: James V. Stone, Psychology Department, Sheffield University, Western Bank, Sheffield S10 2TP, UK e-mail:

This article was submitted to Decision Neuroscience, a section of the journal Frontiers in Neuroscience.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

As the strength of a stimulus increases, the proportion of correct binary responses increases, which defines the psychometric function. Simultaneously, mean reaction times (RT) decrease, which collectively define the chronometric function. However, RTs are traditionally ignored when estimating psychophysical parameters, even though they may provide additional Shannon information. Here, we extend Palmer et al's (

For over 100 years, it has been known that the ability to discriminate between two stimuli increases as a sigmoidal function of the difference between those stimuli, and this ability has traditionally been measured using binary observer responses. However, when an observer makes a response, there is a trade-off between speed, or reaction time (RT), and accuracy. This speed-accuracy trade-off has been the subject of numerous papers, notably (Ratcliff,

Here, we propose four extensions to the proportional-rate diffusion model (PRD) proposed in Palmer et al. (

Once the model has been fitted to these data, it can be used to estimate the mutual information (Shannon and Weaver,

We provide a brief summary of Palmer et al's PRD model (Palmer et al.,

The PRD model is based on a diffusion model of RT, where the _{PRD} varies as a sigmoidal function of _{res} is a fixed residual RT (e.g., time to respond after a decision is made). Notice that this model requires that the mean RT _{PRD} decreases monotonically as the motion signal increases above zero, a requirement which will be relaxed in the model proposed below.

Within the PRD model, the probability _{PRD} of making a correct response is defined by the logistic psychometric function
_{PRD} = 0.5, whereas if _{PRD} = 1.0.
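For reference, the PRD model of Palmer et al. is usually written in terms of a bound A and a sensitivity k, with P = 1/(1 + exp(−2Akx)) and mean RT = (A/(kx)) tanh(Akx) + t_res. The minimal Python sketch below assumes exactly those forms; the parameter names A, k, and t_res are illustrative assumptions. It also handles the x → 0 limit, where (A/(kx)) tanh(Akx) tends to A², so the mean RT at zero signal is A² + t_res.

```python
import math

def prd_probability(x: float, A: float, k: float) -> float:
    """PRD psychometric function, 1 / (1 + exp(-2 A k x)), computed stably."""
    z = 2.0 * A * k * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def prd_mean_rt(x: float, A: float, k: float, t_res: float) -> float:
    """PRD chronometric function, (A / (k x)) tanh(A k x) + t_res (even in x)."""
    z = A * k * x
    if abs(z) < 1e-8:
        decision_time = A * A  # limit of (A / (k x)) tanh(A k x) as x -> 0
    else:
        decision_time = (A / (k * x)) * math.tanh(z)
    return decision_time + t_res

print(prd_probability(0.0, A=1.0, k=28.0))         # 0.5
print(prd_mean_rt(0.0, A=1.0, k=28.0, t_res=0.4))  # 1.4
```

Because tanh is odd, the mean RT is symmetric about x = 0 and falls monotonically as |x| grows, which is the monotonicity requirement noted above.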

The model proposed here is based on the assumption that responses arise from a two-alternative forced choice (2AFC) procedure. On each trial, the observer is presented with two stimuli, and the task is to choose the stronger stimulus, where strength can be defined in terms of differences in any physical quantity, such as speed, luminance, or contrast. The two stimuli are a _{R} that remains constant within a specific subset of trials, and a _{C} that varies between trials. A _{R} and the comparison value _{C}, specifically _{C} − _{R}.

We measure performance in terms of the proportion _{PRD} to _{PSE}, which is the point-of-subjective-equality (PSE) between the comparison and reference stimuli. Specifically, _{PSE} is the value _{C} of the comparison stimulus which is perceived to be the same as the value _{R} of the reference stimulus.

Given that the stimulus strength is _{C} − _{R}, the _{C}. The probability of choosing the comparison stimulus is defined as

In order to take account of observer lapses in concentration, which result in a pure guess, we introduce a lapse rate parameter γ. Evidence presented in Wichmann and Hill (_{min} = γ/2 and _{max} = 1 − γ/2, respectively, such that^{1}
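A minimal sketch of the lapse-bounded psychometric function, assuming that the PRD logistic is applied to the perceived strength s_C − s_PSE and then rescaled to lie between P_min = γ/2 and P_max = 1 − γ/2. The function name and the single slope parameter `ak` (standing for the product of the PRD bound and sensitivity parameters) are hypothetical.

```python
import math

def logistic(z: float) -> float:
    """Numerically stable logistic 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def psychometric_with_lapse(s_c: float, s_pse: float, ak: float, gamma: float) -> float:
    """Probability of choosing the comparison stimulus with value s_c.

    Assumed form: PRD logistic of the perceived strength s_c - s_pse,
    squeezed into [gamma/2, 1 - gamma/2] to allow for lapses.
    """
    p_prd = logistic(2.0 * ak * (s_c - s_pse))
    p_min, p_max = gamma / 2.0, 1.0 - gamma / 2.0
    return p_min + (p_max - p_min) * p_prd

# At the PSE the observer is at chance.
print(round(psychometric_with_lapse(1.031, 1.031, 22.0, 0.01), 6))  # 0.5
```

With γ = 0.01 the curve saturates at 0.005 and 0.995, matching footnote 1.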

Finally, we can adapt results from Luce (_{dec} = _{i} − _{res}, so that Equations (5, 8) can be combined to provide a mapping between mean decision time _{dec} and the probability _{dec} =
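Assuming the PRD forms P = 1/(1 + exp(−2Akx)) and t_dec = (A/(kx)) tanh(Akx), the identity tanh(z) = 2/(1 + exp(−2z)) − 1 gives t_dec = (A/(kx))(2P − 1), which is the kind of mapping between mean decision time and response probability described here. The short check below, using hypothetical values of A and k, confirms the identity numerically (x must be non-zero).

```python
import math

def decision_time_from_probability(p: float, x: float, A: float, k: float) -> float:
    """Mean decision time implied by response probability p at signal x (x != 0),
    assuming PRD forms: t_dec = (A / (k x)) * (2p - 1)."""
    return (A / (k * x)) * (2.0 * p - 1.0)

# Hypothetical parameter values, used only for the numerical check.
A, k = 1.0, 28.0
for x in (-0.05, 0.01, 0.05, 0.1):
    p = 1.0 / (1.0 + math.exp(-2.0 * A * k * x))  # PRD response probability
    t_dec = (A / (k * x)) * math.tanh(A * k * x)   # PRD mean decision time
    # Agreement follows from tanh(z) = 2 / (1 + exp(-2z)) - 1.
    assert abs(decision_time_from_probability(p, x, A, k) - t_dec) < 1e-12
```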

For each trial, we obtain an RT and a binary response from the observer, which indicates whether the observer has chosen the comparison stimulus or the reference stimulus. At each stimulus strength _{i}, the comparison and reference stimuli are presented to the observer on _{i} trials, and the number of times the observer chooses the comparison and reference stimuli is recorded as _{i} and _{i} − _{i}, respectively. For a given putative value of _{i}, a standard binomial model gives the probability of the observed binary responses as
_{i} is a function of the parameters _{i} is the proportion of comparison stimulus responses _{i} = _{i}/_{i}.

When considered over all _{x} values of _{i} is determined by Equation (6), which is a function of the EPRD parameter values θ_{P} = (_{P} is obtained by finding EPRD parameter values θ_{P} that maximize _{P}.
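The binomial log likelihood summed over stimulus strengths can be sketched as follows. The function name is hypothetical; `p` would hold the EPRD psychometric function evaluated at each strength. The binomial coefficient does not depend on the model parameters, so it can be dropped when maximizing. A lapse rate γ > 0 also keeps every predicted probability inside [γ/2, 1 − γ/2], so the logarithms stay finite.

```python
import math
from typing import Sequence

def binomial_log_likelihood(m: Sequence[int], n: Sequence[int], p: Sequence[float]) -> float:
    """Log likelihood of the binary responses, summed over stimulus strengths.

    m[i]: number of comparison choices at strength i
    n[i]: number of trials at strength i
    p[i]: model probability of choosing the comparison stimulus at strength i
    """
    total = 0.0
    for m_i, n_i, p_i in zip(m, n, p):
        # log[C(n_i, m_i) * p_i**m_i * (1 - p_i)**(n_i - m_i)]
        total += (math.log(math.comb(n_i, m_i))
                  + m_i * math.log(p_i)
                  + (n_i - m_i) * math.log(1.0 - p_i))
    return total
```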

If the number of trials at each stimulus strength is large then Equation (13) can be approximated by a Gaussian function. At a given stimulus strength _{i}, the observed proportion of binary responses is _{i}, which is assumed to be the probability _{i} plus a noise term η_{P}, so that _{i} = _{i} + η_{P}. If the noise η_{P} has a Gaussian distribution with variance _{P,i} then
_{i} is defined as a function of _{P,i} can be estimated from the data as _{P,i} = _{i}_{i}(1 − _{i}). Results for the Gaussian approximation in Equation (14) were found to be very similar to those for Equation (13). Results reported here are based on Equation (13).

RTs tend to be short if the comparison stimulus value is very different from the reference stimulus, but as the comparison and reference stimuli become more similar, so the RT increases, as shown in Figure _{τ} model, which is used to estimate EPRD model parameters.

It is commonly assumed that the RT is the time required for the cumulative amount of perceptual evidence to reach some criterion value (Ratcliff, ^{2}

If RTs have an inverse Gaussian distribution with mean _{i} then the probability of a single observed RT τ_{ij} associated with the _{i} is
_{x} stimulus strengths is presented _{i} times. For one model RT mean, the probability of the observed _{i} RTs (one RT per trial) defines the log likelihood function
_{i} and λ_{i} yields a maximum likelihood estimate (MLE) of both parameters at one stimulus strength _{i}. Even though the algebraic mean and the MLE mean are identical (Tweedie, _{i}, which is vital for subsequent calculations.
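The inverse Gaussian log density and its closed-form maximum likelihood estimates can be sketched as follows, assuming the standard parameterization by mean μ and shape λ. The MLE of μ is the sample mean, consistent with the statement above, and 1/λ̂ is the sample mean of 1/τ − 1/μ̂, which is positive (by Jensen's inequality) unless all RTs are equal. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def inverse_gaussian_logpdf(t: float, mu: float, lam: float) -> float:
    """Log density of the inverse Gaussian (Wald) distribution with mean mu and shape lam."""
    return (0.5 * math.log(lam / (2.0 * math.pi * t ** 3))
            - lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t))

def inverse_gaussian_mle(rts: Sequence[float]) -> Tuple[float, float]:
    """Closed-form MLEs (mu_hat, lam_hat) from the RTs recorded at one stimulus strength."""
    n = len(rts)
    mu_hat = sum(rts) / n  # the MLE mean equals the algebraic mean
    lam_hat = n / sum(1.0 / t - 1.0 / mu_hat for t in rts)
    return mu_hat, lam_hat
```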

For a given stimulus strength _{i}, the predicted mean RT _{i} varies as a tanh function of _{i}, as defined in Equation (8). The central limit theorem allows us to assume that the distribution of mean RTs of the inverse Gaussian pdf at a given stimulus strength _{i} is Gaussian with mean _{i} and variance _{τ,i}. Therefore, the likelihood of the EPRD mean _{i} from Equation (8) is
_{i} is _{τi} (Equation 16), so the variance _{τi} of a distribution of means (where each mean is based on _{i} samples) is
_{i} to the EPRD mean RTs _{i} of Equation (8) as follows. The probability of the _{x} mean RTs _{i} (one mean RT per stimulus strength) defines the log likelihood function
_{i} is defined in Equation (8), so that the parameters to be estimated for model _{τ} are θ_{τ} = (_{res}) to fit the overall variation in mean RT with stimulus strength
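A sketch of the RT log likelihood, assuming that the variance of each mean RT is the inverse Gaussian variance μ³/λ divided by the number of trials n_i, evaluated at the MLE values obtained at that stimulus strength. Under this assumption the variances are fixed by the data, so maximizing the log likelihood amounts to weighted least squares between observed and EPRD-predicted mean RTs. Function names are hypothetical.

```python
import math
from typing import Sequence

def mean_rt_variance(mu: float, lam: float, n: int) -> float:
    """Variance of a mean of n iid inverse Gaussian RTs: (mu**3 / lam) / n."""
    return mu ** 3 / (lam * n)

def mean_rt_log_likelihood(obs_means: Sequence[float], lams: Sequence[float],
                           ns: Sequence[int], model_means: Sequence[float]) -> float:
    """Gaussian log likelihood of observed mean RTs given model-predicted mean RTs."""
    total = 0.0
    for mu_obs, lam, n, mu_model in zip(obs_means, lams, ns, model_means):
        var = mean_rt_variance(mu_obs, lam, n)
        total += -0.5 * math.log(2.0 * math.pi * var) - (mu_obs - mu_model) ** 2 / (2.0 * var)
    return total
```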

In summary, we have three estimates of the mean RT at each stimulus strength: the algebraic mean _{obsi}, the MLE mean of the inverse Gaussian or Wald pdf _{i} (from Equation 17), which collectively are used as data to estimate the means _{i} (one per stimulus strength) obtained from the fitted EPRD model (from Equation 21). The MLE means _{i} are shown as crosses in Figure _{i} are corresponding points on the fitted curve, respectively.

_{P} (i.e., using only binary responses), and a graph similar to _{τ} (i.e., using only mean RTs).

We also have two estimates of the probability of a comparison stimulus response at each stimulus strength: the observed proportion of comparison stimulus responses (which is the MLE _{i} = _{i}/_{i}), and the mean _{i} (one per stimulus strength) obtained from fitting the EPRD model (Equation 13) to the MLE means _{i}. These are shown as dots in Figure

In the absence of knowledge regarding the covariance between the noise in mean RT and the noise in binary response probability, we are forced to assume that this covariance is zero. In other words, we assume that _{P} and _{τ} provide independent estimates of the EPRD model parameters. In this case the likelihoods multiply, so estimates based on combined RT and binary response probability are obtained by maximizing the sum of log likelihoods
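The following sketch puts the pieces together under stated assumptions: a lapse-bounded logistic psychometric function and a PRD-style chronometric function of the perceived strength s − PSE, hypothetical parameter values and mean-RT variances, and noise-free synthetic data (expected counts and exact mean RTs) on 21 stimulus values from 0.9 to 1.1 with 20 trials each. It is not a fit to the experimental data; it only checks that maximizing the summed log likelihood recovers a known PSE. A real fit would optimize all parameters jointly with a numerical optimizer rather than a one-dimensional grid.

```python
import math

def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def p_comparison(s, pse, A, k, gamma):
    """Lapse-bounded probability of choosing the comparison stimulus (assumed form)."""
    return gamma / 2.0 + (1.0 - gamma) * logistic(2.0 * A * k * (s - pse))

def mean_rt(s, pse, A, k, t_res):
    """PRD-style chronometric function of the perceived strength s - pse (assumed form)."""
    x = s - pse
    z = A * k * x
    decision = A * A if abs(z) < 1e-8 else (A / (k * x)) * math.tanh(z)
    return decision + t_res

def log_lik_binary(params, s_values, m, n):
    pse, A, k, t_res, gamma = params
    total = 0.0
    for s, m_i, n_i in zip(s_values, m, n):
        p = p_comparison(s, pse, A, k, gamma)
        # Binomial log likelihood without the constant log C(n, m) term
        total += m_i * math.log(p) + (n_i - m_i) * math.log(1.0 - p)
    return total

def log_lik_rt(params, s_values, obs_means, variances):
    pse, A, k, t_res, gamma = params
    total = 0.0
    for s, mu, var in zip(s_values, obs_means, variances):
        resid = mu - mean_rt(s, pse, A, k, t_res)
        total += -0.5 * math.log(2.0 * math.pi * var) - resid ** 2 / (2.0 * var)
    return total

def log_lik_combined(params, s_values, m, n, obs_means, variances):
    # Zero covariance between the two noise sources: the likelihoods multiply,
    # so their logarithms add.
    return (log_lik_binary(params, s_values, m, n)
            + log_lik_rt(params, s_values, obs_means, variances))

# Hypothetical "true" parameters and noise-free synthetic data.
true = (1.03, 1.0, 28.0, 0.4, 0.01)  # (pse, A, k, t_res, gamma)
s_values = [0.9 + 0.01 * i for i in range(21)]
n = [20] * len(s_values)
m = [n_i * p_comparison(s, true[0], true[1], true[2], true[4]) for s, n_i in zip(s_values, n)]
obs_means = [mean_rt(s, true[0], true[1], true[2], true[3]) for s in s_values]
variances = [0.01] * len(s_values)  # hypothetical variance of each mean RT

# Crude 1-D grid search over the PSE, other parameters held at their true values.
candidates = [1.0 + 0.001 * i for i in range(61)]
best_pse = max(candidates, key=lambda pse: log_lik_combined(
    (pse,) + true[1:], s_values, m, n, obs_means, variances))
print(best_pse)
```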

The amount of Shannon information (Shannon and Weaver,

More importantly, the total amount of Shannon information that the observer has about the stimulus cannot be less than the amount of Shannon information implicit in the observer's combined binary and RT responses. In other words, the total mutual information, as measured by an experimenter, between observer responses and stimulus strength provides a lower bound for the amount of Shannon information that the observer has about the stimulus strength. Thus, each mutual information value reported in this paper constitutes a conservative estimate of the amount of information that the observer gains about the stimulus.

The mutual information

We can evaluate Equation (25) by summing over discrete versions of the variables _{i} is _{i} = _{i}/_{i}, so that
_{k}) = 1/_{k}. In order to evaluate Equation (27), we require expressions for _{i}|_{k}) and _{i}).

Using Equation (5) across a range of _{k} is _{k}. Assuming a binomial distribution, the probability of the observed proportion _{i} given a fitted value _{k} at _{k} is
_{i}|_{k}) = _{i}|_{k}), and _{i}|_{k}) values are normalized to ensure that ∑_{i} _{i}|_{k}) = 1.

The distribution of binary responses is binomial with a mean equal to the grand mean _{G} of all _{G} binary responses of an observer
_{i} = 1 if and only if a response corresponds to the observer choosing the comparison stimulus. The observer's prior probability of the binary responses for the _{i}) values are normalized to ensure that ∑_{i} _{i}) = 1.
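A minimal sketch of this calculation, assuming equiprobable stimulus strengths (p(x_k) = 1/n_x), discrete proportions m/n for m = 0, 1, ..., n, binomial conditionals at the fitted probabilities, and a binomial prior at the grand-mean probability P_G, each normalized to sum to one. Function names are hypothetical. For each stimulus strength the inner sum is a Kullback-Leibler divergence, so the result is never negative.

```python
import math
from typing import List, Sequence

def binomial_over_proportions(n: int, p: float) -> List[float]:
    """Probability of each possible proportion m/n (m = 0..n), normalized to sum to 1."""
    probs = [math.comb(n, m) * p ** m * (1.0 - p) ** (n - m) for m in range(n + 1)]
    total = sum(probs)
    return [q / total for q in probs]

def mutual_information_binary(p_model: Sequence[float], n: int, p_grand: float) -> float:
    """Mutual information (bits) between stimulus strength and the proportion of
    comparison responses; requires 0 < p_grand < 1 so that the prior has no zeros."""
    n_x = len(p_model)
    prior = binomial_over_proportions(n, p_grand)
    mi = 0.0
    for p_k in p_model:
        cond = binomial_over_proportions(n, p_k)
        for c, q in zip(cond, prior):
            if c > 0.0:  # 0 * log 0 is taken as 0
                mi += c * math.log2(c / q) / n_x
    return mi
```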

Following the same line of reasoning as above, the mutual information

We can evaluate Equation (31) by summing over discrete versions of the variables _{i}|_{k}) is defined by the EPRD model (Equation 8) with a fitted value _{k}, so that
_{k}) = 1/_{i}.

The posterior is defined in Equation (18), but is repeated here with changed subscripts for clarity
_{τk} is defined in Equation (19), and _{i}|_{k}) values are normalized to ensure that ∑_{i} _{i}|_{k}) = 1.

A parametric form for the observer's prior probability distribution _{G} RTs. These were fitted to an inverse Gaussian distribution to obtain a grand mean _{G} and a parameter λ_{G}. This pdf has a variance
_{i}, the RT mean is based on a sample of _{i} RTs, and the central limit theorem suggests that the distribution of means is approximately Gaussian with a variance
_{i} is
_{i}) values are normalized to ensure that ∑_{i} _{i}) = 1.
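The RT version follows the same pattern. The sketch below assumes a discrete grid of mean-RT values, Gaussian conditionals centred on the EPRD mean RTs with their variances, and a Gaussian prior with mean μ_G and variance μ_G³/(λ_G n); each distribution is evaluated on the grid and normalized to sum to one. The grid and function names are hypothetical, and the grid should be wide and fine enough that neither distribution underflows where the other is non-zero.

```python
import math
from typing import List, Sequence

def discretized_gaussian(grid: Sequence[float], mean: float, var: float) -> List[float]:
    """Gaussian weights on a grid, normalized to sum to 1 (the 1/sqrt(2 pi var) factor cancels)."""
    w = [math.exp(-(t - mean) ** 2 / (2.0 * var)) for t in grid]
    total = sum(w)
    return [v / total for v in w]

def mutual_information_mean_rt(model_means: Sequence[float], model_vars: Sequence[float],
                               grand_mean: float, grand_lam: float, n: int,
                               grid: Sequence[float]) -> float:
    """Mutual information (bits) between stimulus strength and mean RT (sketch)."""
    prior_var = grand_mean ** 3 / (grand_lam * n)  # inverse Gaussian variance / n
    prior = discretized_gaussian(grid, grand_mean, prior_var)
    n_x = len(model_means)
    mi = 0.0
    for t_k, var_k in zip(model_means, model_vars):
        cond = discretized_gaussian(grid, t_k, var_k)
        for c, q in zip(cond, prior):
            # Skip empty bins; choose the grid so that q does not underflow where c > 0.
            if c > 0.0 and q > 0.0:
                mi += c * math.log2(c / q) / n_x
    return mi
```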

So far we have derived expressions for the Shannon information implicit in the average RT _{i} and also in the average binary response, which is summarized as the proportion _{i} of comparison responses, for a stimulus strength _{i}. Here, we derive an expression for the Shannon information associated with a single trial; first for RTs, and then for binary responses.

As the number of trials at each stimulus strength is increased, so the variance in each mean RT decreases, and the central limit theorem ensures that the distribution of means becomes increasingly Gaussian. The mutual information between two variables (e.g., mean RT and stimulus strength) depends on the signal to noise ratio SNR
_{i} trials, the variance of the measurement noise has been reduced by a factor of _{i} relative to the noise in the RT of a single trial (provided this noise is iid). This implies that the value of SNR for a single trial is
_{τ} into Equation (40) then we obtain an estimate of the average Shannon information
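Assuming the Gaussian-channel relation I = ½ log₂(1 + SNR), converting the information carried by a mean of n_i trials into information per trial takes three steps: invert for the SNR, divide it by n_i (iid noise), and recompute I. A minimal sketch, with a hypothetical function name:

```python
import math

def per_trial_mutual_information(mi_mean_bits: float, n_trials: int) -> float:
    """Convert MI (bits) carried by a mean of n_trials responses into MI per trial,
    assuming I = 0.5 * log2(1 + SNR) and iid noise across trials."""
    snr_mean = 2.0 ** (2.0 * mi_mean_bits) - 1.0  # invert I = 0.5 log2(1 + SNR)
    snr_single = snr_mean / n_trials               # single-trial noise variance is n times larger
    return 0.5 * math.log2(1.0 + snr_single)
```

For example, a mean of 20 trials with SNR = 20 carries ½ log₂ 21 bits, while a single trial (SNR = 1) carries 0.5 bits.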

The mutual information between stimulus strength and (binary or RT) responses can be used to define the smallest average detectable difference in stimulus strength, which we call the _{range} as the range of stimulus strengths

A brief explanation for this definition is as follows. Consider a range of stimulus strengths _{range} which give rise to “noisy” observer responses _{range}/2^{I}, which we can recognize as being equal to the SI.
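The definition translates directly into code. Plugging in the range of stimulus strengths used in the experiment (0.2) and the conservative estimate of 2.68 bits/trial given in the Results yields an SI of roughly 0.031, on the assumption that the SI is exactly x_range/2^I as stated here. The function name is hypothetical.

```python
def shannon_increment(x_range: float, mi_bits: float) -> float:
    """SI = x_range / 2**I: the range of stimulus strengths divided by the
    effective number of discriminable levels implied by I bits."""
    return x_range / 2.0 ** mi_bits

print(round(shannon_increment(0.2, 2.68), 4))  # 0.0312
```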

We used the EPRD models described above to estimate the PSE and other key parameters for a simple demonstration experiment using a human observer. On each trial, the observer was presented with a colored picture of an upright face and an inverted face (see Figure _{i}, the observer was presented with the same stimulus pair for a total of _{i} = 20 trials. Stimuli were shown in random order, and the left/right position of reference/comparison stimuli was counterbalanced across trials.

Each of three models defined by _{P}, _{τ}, and _{C} was used to fit a psychometric and/or a chronometric function to the data from one subject, as shown in Figure

| Model | PSE ± se | | | | _{res} (s) | γ | LLik | MI (bits) |
|---|---|---|---|---|---|---|---|---|
| Binary, _{P} | 1.031 ± 0.003 | NA | NA | 22.32 | NA | 0.005 | −31.13 | 2.68 |
| RT, _{τ} | 1.034 ± 0.004 | 0.998 | 28.37 | 28.32 | 0.437 | NA | 18.7 | 0.87 |
| Combined, _{C} | 1.032 ± 0.003 | 1.016 | 23.12 | 23.50 | 0.354 | 0.011 | −13.10 | 3.18 |

_{res} is the fixed part of RT; γ, lapse rate; LLik, log likelihood; and MI, mutual information between stimulus strength and RT or binary responses or both (see text). The final number (3.18 bits) represents

Based on 420 binary responses, maximizing _{P} (Equation 12) yields a psychometric function similar to that in Figure _{PSE} = 1.031. This maximum likelihood estimate implies that an inverted face must be 3.1% wider than an upright face in order for the two faces to be perceived as the same width. Numerical estimation of the Hessian matrix of second derivatives of Equation (12) at _{PSE} yields a standard error (se) of 0.003, which implies that _{PSE} is significantly different from
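The standard error above comes from a numerically estimated Hessian of the log likelihood. The sketch below shows the one-parameter version, se = 1/√(−d²L/dθ²) using a central finite difference, checked against the binomial case where se = √(p(1 − p)/n) is known. With several parameters, the full Hessian matrix should be inverted and the square roots of its diagonal taken; differentiating with the other parameters held fixed, as here, can understate the standard error. The function name is hypothetical.

```python
import math

def numerical_standard_error(log_lik, theta_hat: float, h: float = 1e-4) -> float:
    """Standard error of one parameter from the curvature of the log likelihood at its
    maximum, se = 1 / sqrt(-d2L/dtheta2), via a central finite difference."""
    d2 = (log_lik(theta_hat + h) - 2.0 * log_lik(theta_hat) + log_lik(theta_hat - h)) / h ** 2
    return 1.0 / math.sqrt(-d2)

# Check on a case with a known answer: a binomial proportion.
n, m = 100, 30

def binomial_log_lik(p: float) -> float:
    return m * math.log(p) + (n - m) * math.log(1.0 - p)

print(numerical_standard_error(binomial_log_lik, m / n))  # close to sqrt(0.3 * 0.7 / 100) ≈ 0.0458
```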

Each of 21 mean RTs (one per stimulus strength) was first estimated by maximizing Equation (17), based on 20 RTs per stimulus strength. Using these 21 mean RTs, _{τ} (Equation 21), was maximized with respect to four parameters (PSE, _{res}) to yield a chronometric function similar to that in Figure _{PSE} = 1.034 (se = 0.004,

Based on 42 data points (the 21 estimated mean RTs used for _{τ} plus 21 corresponding binary response probabilities used for _{P}), maximizing _{C} (Equation 22) yields the psychometric function and the chronometric function in Figures _{res}, and γ.

The mutual information _{i} = 20, this implies that the mutual information

Similarly, Equation (27) can be used to estimate the mutual information between _{i} = 20, this implies that the mutual information

We can use _{tot} between _{tot} is
_{tot} cannot be less than
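The bounds quoted in the Discussion (2.68 to 3.55 bits/trial) correspond to the larger of the two separate estimates and to their sum. A sketch, assuming the upper bound rests on the zero-covariance assumption made above: if RTs and binary responses are conditionally independent given the stimulus, then I(x; RT, B) = I(x; RT) + I(x; B | RT) ≤ I(x; RT) + I(x; B). The function name is hypothetical.

```python
from typing import Tuple

def total_information_bounds(mi_binary: float, mi_rt: float) -> Tuple[float, float]:
    """Lower and upper bounds (bits) on the information carried jointly by binary responses and RTs."""
    lower = max(mi_binary, mi_rt)  # joint responses carry at least as much as either alone
    upper = mi_binary + mi_rt      # assumes conditional independence given the stimulus
    return lower, upper

print(total_information_bounds(2.68, 0.87))
```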

Using a conservative estimate of mutual information of ^{2.68}) of the effective range _{range} of stimulus strengths. Note that the range of scaling values used _{range} = 0.2 (i.e., 0.9 … 1.1) equals the range of stimulus strengths _{range} = 0.2 (i.e., −0.1 … 0.1). Therefore, the SI for the width scaling factor is

We have shown how the PRD model from Palmer et al. (

A key feature of diffusion-based models is that they treat each RT as the end-point of an accumulation of evidence. If we take this type of evidence-accumulation process seriously then it makes sense to model the distribution of RT values as an inverse Gaussian distribution (for reasons described in section 5).

A striking result is the difference between the log likelihoods associated with the binary response model and the RT model, despite the fact that the binary response model has fewer free parameters than the RT model, and that both models provide similar PSE estimates which (based on their standard errors) are not significantly different. These log likelihood values suggest that the EPRD model provides a better fit to the RT data than it does to the binary response data. This difference in likelihoods suggests that the parameter estimates obtained using the combined RT and response data are dominated by the binary data likelihood term.

Self-evidently, both the RT and binary responses of an observer depend on the stimulus strength

We can gain some insight into the nature of this problem by considering the proportion of the differential entropy in stimulus values accounted for by the corresponding differential entropy in observer responses. At one extreme, if an observer is told to respond as quickly as possible then the RTs should provide relatively large amounts of mutual information regarding stimulus strength, whereas the binary responses carry relatively little mutual information (because speeded responses tend to be inaccurate; Hanks et al.,

The scenario considered above can be represented geometrically, as in Figure

Unfortunately, we have been unable to derive an expression for the total mutual information between the joint variables (RT and binary responses) and stimulus strength

The Shannon increment (SI) is similar in spirit to the more conventional just noticeable difference (JND). However, the JND has an arbitrary value, and (despite its name) there is no reason to suppose that a JND is indeed just noticeable. The SI is monotonically related to the average amount of Shannon information an observer gains regarding a single presentation of a stimulus, and is a measure of the perceptual resolution with which a parameter is represented by the observer.

We have presented an extended proportional-rate diffusion model, which takes account of both individual RTs and binary responses for maximum likelihood estimation of key psychophysical parameters (e.g., PSE, slope) of the psychometric and chronometric functions. The fact that these psychophysical parameters have similar estimated values when computed independently for two models based on RTs alone or on binary responses alone provides support for the underlying physical basis of this class of diffusion models.

An information-theoretic analysis was used to estimate the average amount of Shannon information that each RT provided about the stimulus value, and also the average amount of Shannon information that each binary response provided about the stimulus value. This analysis provides bounds for the average amount of Shannon information that the observer gains about the stimulus value from one presentation, which was found to be between 2.68 and 3.55 bits/trial for the experiment used here.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Thanks to Steve Snow, Nathan Lepora, and Tom Stafford for reading an early draft of this paper, and to two referees for their detailed comments.

γ EPRD lapse rate parameter.

_{x}.

_{i}, with range _{i}.

_{x}.

_{i} number of trials at stimulus strength _{i}.

_{x} number of different stimulus strengths.

_{i} proportion of comparison stimulus responses at stimulus strength _{i}, predicted by EPRD model.

_{i} MLE mean, equal to observed proportion of comparison responses at stimulus strength _{i}.

_{C} variable stimulus value of the comparison stimulus.

_{R} fixed stimulus value of the reference stimulus.

_{PSE} value of the comparison stimulus which the observer perceives as being the same as the reference stimulus.

_{i} MLE mean of inverse Gaussian RT at stimulus strength _{i}.

_{i} mean RT at stimulus strength _{i}, as predicted by EPRD model.

_{dec,i} mean decision RT at stimulus strength _{i}, as predicted by EPRD model.

_{res} mean residual RT (assumed the same at all stimulus strengths), as predicted by EPRD model, where _{res} = _{i} − _{dec,i}.

θ_{τ} = (_{PSE}, _{res}), five parameters for the RT component of the EPRD model.

θ_{P} = (_{PSE},

_{τ, i} variance in mean RT.

_{i} stimulus strength.

_{i} perceived strength of stimulus with strength _{i}.

^{1}Notice that, if the lapse rate is γ = 0.01 then the upper and lower bounds are 0.995 and 0.005, respectively, because half of the observer's guesses will be correct, on average.

^{2}For reference, the Wald distribution is the distribution of first passage times of a biased Brownian process, and is qualitatively similar to the log-normal distribution, which is often used to model RT.