Edited by: Stefano Fusi, Columbia University, USA

Reviewed by: Bosco Tjan, University of Southern California, USA; Gabriel Kreiman, Harvard University, USA; Denis Pelli, New York University, USA

*Correspondence: Peter Neri, Institute of Medical Sciences, Aberdeen Medical School, Aberdeen, UK.

This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.

Signals in the environment are rarely specified exactly: our visual system may know what to look for (e.g., a specific face), but not its exact configuration (e.g., where in the room, or in what orientation). Uncertainty, and the ability to deal with it, is a fundamental aspect of visual processing. The MAX model is the current gold standard for describing how human vision handles uncertainty: of all possible configurations for the signal, the observer chooses the one corresponding to the template associated with the largest response. We propose an alternative model in which the MAX operation, which is a dynamic non-linearity (depends on multiple inputs from several stimulus locations) and happens after the input stimulus has been matched to the possible templates, is replaced by an early static non-linearity (depends only on one input corresponding to one stimulus location) which is applied before template matching. By exploiting an integrated set of analytical and experimental tools, we show that this model is able to account for a number of empirical observations otherwise unaccounted for by the MAX model, and is more robust with respect to the realistic limitations imposed by the available neural hardware. We then discuss how these results, currently restricted to a simple visual detection task, may extend to a wider range of problems in sensory processing.

There are virtually no situations, whether in the laboratory or in the natural environment, when the human visual system has exact knowledge of all aspects concerning the task at hand (Pelli,

Is it empirically feasible to distinguish this model from the ideal model or from other candidate models of how humans cope with uncertainty? This problem turns out to be surprisingly difficult. As mentioned above, it is known that under some conditions MAX performance is nearly identical to ideal performance (see Theoretical Properties of MAX Kernels in Appendix for an analytical demonstration). Under a variety of situations human performance is explained by a simple model which adopts a nearly ideal strategy, but is corrupted by a late internal noise source (Burgess et al.,

A possible route to resolving this empirical issue may be to employ experimental techniques that allow a more detailed characterization of the underlying process than detectability metrics alone (Abbey and Eckstein,

Can this technique be exploited nonetheless to yield some useful insights into the underlying process? Previous work has shown that the non-linearities associated with uncertainty (and other forms of non-linear processing) can often be characterized using psychophysical reverse correlation, at least to a limited extent. Of specific relevance here are two methods: signal-clamping (an approach that capitalizes on the distinction between target-present and target-absent noise samples; Tjan and Nandy,

The display had three regions (Figure ^{2} and standard deviation (SD) 3.5 cd/m^{2}; we denote it using the vector n^{[q,z]}, the noise sample associated with the non-target (_{k}) where _{k} indicates the spatial position of the bar with respect to fixation: bars to the left of fixation are indicated by a negative ^{[φ]}, where φ represents the shift (in units of number of bars) applied to the target within the extrinsic uncertainty window: each element of ^{[φ]}(_{k}) = ρδ_{kφ} (Kronecker δ) for −_{N}. Signals at different locations were therefore orthogonal _{1} ≠ φ_{2} where 〈,〉 is the inner product). Uncertainty markers consisted of red rectangles whose horizontal extent explicitly indicated the spatial extent within which the target bar could appear; they are denoted by the vector ^{[M]}, each element being ^{[M]}(_{k}) = ⊓(^{(j−1)} for j = 1 to 4 in different blocks: at the beginning of each block the uncertainty markers informed the observer of the specific extrinsic uncertainty window used for that block, and remained the same throughout the block. On the following block a different extrinsic uncertainty range was randomly selected out of the four detailed above. The bulk of our data was collected using a bright target bar (ρ > 0) on 10 naive observers; we collected an average of ∼8K ± 4K (±SD across observers) trials per observer. All subjects were paid by the hour for their participation; most were experienced psychophysical observers, but none was aware of the purpose or methodology used in the experiments. On a subset of these observers (6 out of 10) we performed additional measurements using a dark target bar (ρ < 0); for this condition we collected ∼1.1K ± 0.2K trials per observer.
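The target definition above (each element equal to ρδ_{kφ}) and the resulting orthogonality of targets at different locations can be illustrated with a minimal sketch. The function name `target_vector`, the value of `rho`, and the number of bars (`half_width`) are hypothetical choices made for illustration only; they are not taken from the experiments.

```python
import numpy as np

def target_vector(phi, rho, half_width):
    """Target shifted by phi bars: element k equals rho if k == phi, else 0."""
    k = np.arange(-half_width, half_width + 1)  # bar positions relative to fixation
    return rho * (k == phi).astype(float)

rho = 2.0        # hypothetical target intensity (rho > 0: bright bar)
half_width = 4   # hypothetical number of bars on each side of fixation

t_a = target_vector(-1, rho, half_width)  # target one bar to the left of fixation
t_b = target_vector(3, rho, half_width)   # target three bars to the right

inner_same = float(t_a @ t_a)  # inner product of a target with itself: rho squared
inner_diff = float(t_a @ t_b)  # inner product across locations: zero (orthogonal)
```

Because each target occupies a single bar, the inner product between targets at two different shifts is zero regardless of the value of ρ.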

We estimated internal noise (plotted on y axis in Figures ^{[1]})) and noise-only divided by the combined SD of both external and internal (σ_{I}) noise sources, and input _{I} = 0, i.e., before the addition of internal noise. The latter can be estimated, together with internal noise, from data obtained using the double-pass methodology described earlier (Burgess and Colborne,

We used variations of three main models (see Figure ^{[1]} = ^{[0]} = ^{x} or Ф(^{n} (which approximates ^{x} for ^{x} (for theoretical (but not simulation) purposes we also consider Ф(^{n}, see Theoretical Properties of MAX Kernels in Appendix). The ideal model in Gaussian noise, for example, is a specific case of the Korenberg model (see Theoretical Properties of MAX Kernels in Appendix). These models were challenged with the same stimuli used for human observers and generated a binary response by selecting the stimulus interval associated with largest response (decision-variable assumption; Pelli, ^{[0]}) and SD of ^{[1]})). Figure
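The difference between a late MAX operation and an early static non-linearity can be sketched as follows. This is a simplified illustration rather than the simulation code used for the article: the function names, the delta front-end filter, the flat uncertainty window, the exponential non-linearity, and the toy stimuli are all assumptions chosen to keep the example short.

```python
import numpy as np

def max_model(stimulus, front_end, window):
    """Match the stimulus at every location, then take the largest response (late MAX)."""
    r = np.convolve(stimulus, front_end, mode="same")  # one response per location
    return float(r[window > 0].max())                  # MAX within the uncertainty window

def hammerstein_model(stimulus, front_end, window, phi=np.exp):
    """Apply a static non-linearity to each input value, then a single linear stage."""
    r = np.convolve(phi(stimulus), front_end, mode="same")  # early non-linearity, then filter
    return float(np.sum(window * r))                        # weighted sum across locations

def choose_interval(model, stim_1, stim_2, **kwargs):
    """Binary response: select the interval associated with the larger model output."""
    return 1 if model(stim_1, **kwargs) >= model(stim_2, **kwargs) else 2

# Toy stimuli (hypothetical values): fixed 'noise' plus a bright bar at position 6.
noise_only = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0])
with_target = noise_only[::-1].copy()
with_target[6] += 3.0

front_end = np.array([1.0])  # assumed delta front-end filter
window = np.ones(9)          # assumed flat uncertainty window

resp_max = choose_interval(max_model, with_target, noise_only,
                           front_end=front_end, window=window)
resp_ham = choose_interval(hammerstein_model, noise_only, with_target,
                           front_end=front_end, window=window)
```

In this toy case both models select the interval containing the bright bar; the two are only expected to diverge on noisy trials, which is where trial-by-trial comparisons with human responses become informative.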

Estimated first-order kernels (Ahumada, _{qz} = 2δ_{qz}−1. Estimated second-order kernels (Neri,
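A first-order kernel estimate based on the weights 2δ_{qz}−1 can be sketched as a weighted sum of the mean noise fields in the four stimulus/response classes. The sketch below assumes, for illustration only, that q = 1 marks the target interval and z = 1 a correct response; with the opposite coding of q the estimate simply changes sign. The simulated observer (a linear template matcher with hypothetical parameters) is likewise an assumption, included only to generate data.

```python
import numpy as np

def first_order_kernel(noise, q, z):
    """Sum over the (q, z) classes of w_qz * mean noise field, with w_qz = 2*delta_qz - 1."""
    kernel = np.zeros(noise.shape[1])
    for qv in (0, 1):
        for zv in (0, 1):
            sel = (q == qv) & (z == zv)
            if sel.any():
                w = 2 * int(qv == zv) - 1  # +1 when q == z, -1 otherwise
                kernel += w * noise[sel].mean(axis=0)
    return kernel

# Simulated two-interval task with a linear observer (hypothetical parameters).
rng = np.random.default_rng(0)
n_trials, n_bars, centre, rho = 4000, 9, 4, 1.0
template = np.zeros(n_bars)
template[centre] = 1.0

noise_target = rng.normal(size=(n_trials, n_bars))  # noise in the target interval
noise_other = rng.normal(size=(n_trials, n_bars))   # noise in the non-target interval
correct = ((noise_target @ template + rho) > (noise_other @ template)).astype(int)

noise = np.vstack([noise_target, noise_other])
q = np.concatenate([np.ones(n_trials, dtype=int), np.zeros(n_trials, dtype=int)])
z = np.concatenate([correct, correct])  # both intervals share the trial outcome

kernel = first_order_kernel(noise, q, z)
```

For this simulated observer the estimate peaks at the template location, as expected for a purely linear system.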

Model-human consistency was computed as the percentage of trials on which the model response matched the human response; we converted it to _{k})^{[M]} = 𝒩(_{k}, σ^{[M]}) where 𝒩 is the Gaussian density function with mean 0 and SD σ^{[M]} from fit to aggregate estimate of intrinsic uncertainty windows (red line in inset to Figure ^{[M]} = ^{[M]}). When the model was parameterized on the data (e.g.,
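The percentage of trials on which model and human responses match can be computed as in the sketch below. The function name and the example responses are hypothetical, and the subsequent conversion applied to this percentage is not covered here.

```python
import numpy as np

def consistency(model_responses, human_responses):
    """Percentage of trials on which the model response equals the human response."""
    model = np.asarray(model_responses)
    human = np.asarray(human_responses)
    if model.shape != human.shape:
        raise ValueError("model and human responses must cover the same trials")
    return 100.0 * float(np.mean(model == human))

# Hypothetical responses (1 or 2) on eight trials.
human = [1, 2, 2, 1, 1, 2, 1, 2]
model = [1, 2, 1, 1, 2, 2, 1, 2]
pct = consistency(model, human)  # 6 of the 8 trials match
```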

Observers were required to detect a bright ‘target’ bar briefly flashed on the screen (Figure ^{−5} for open). This increasing trend applied across all four uncertainty levels (Cuzick test for trend (Cuzick,

Following the preliminary assessment of threshold levels detailed above, we proceeded to collect a large number of trials (>110K) at or near the determined threshold SNR on 10 naive observers. We targeted a threshold performance level of output

Overall average efficiency (across conditions and observers) was 33% (±18% SD), matching the range measured by previous investigators for similar tasks (Barlow,

It should be emphasized that, although the correlation between internal noise and signal detectability demonstrated in Figure

The above-detailed characterization represents a necessary preliminary step for placing the data analysis and model simulations that follow within a solid framework. First, Figures

The above-noted similarity between first-order and second-order kernels will be critical for selecting adequate computational models later in the article, making it necessary to confirm that these qualitative observations are quantitatively robust and borne out by individual observer analysis, not just by cursory evaluation of aggregate data. Because (as is normal; Meese et al.,

Similarly to first-order kernels, second-order diagonals present negative modulations alongside the central positive peaks. A result of this nature, if statistically robust, would provide direct evidence against the MAX uncertainty model: this model predicts that second-order diagonals must contain only positive modulations, as we demonstrate both analytically (Theoretical Properties of MAX Kernels in Appendix) and via Monte Carlo simulations later in the article (Figure

We know from well-established results in non-linear systems analysis that certain cascade models generate specific modulations within first-order and second-order kernels (Marmarelis, ^{−5}), cyan (^{−3}) and magenta (

As a preliminary step toward the design of a physiologically plausible model, we will obtain an estimate of the front-end filter that is applied to the input stimulus via convolution (Figure

Under signal-clamping, first-order kernels are derived from target-present noise fields contingent on target position (Tjan and Nandy,

^{[M]} = ^{0.6844log(M)−2.6081}.

To estimate the function for the front-end filter as effectively as possible, and to avoid committing to a specific set of assumptions at this stage, we combined all traces in Figure

In a complementary manner to kernels derived from target-present noise fields, kernels derived from target-absent noise fields can be exploited to estimate the intrinsic uncertainty window applied by the observer to the output of the front-end convolution (Tjan and Nandy,

^{r} (i.e., _{k}) = δ_{k0}, Ф(^{r}).

We can assess the applicability of different models via a completely different approach, in which we do not attempt to gauge the structure of the system, but rather focus exclusively on how well different models are able to predict whether the human observer will respond 1 or 2 on each specific trial (Neri and Levi, ^{−3} respectively), but the Hammerstein model is superior to the MAX model in the presence of substantial uncertainty (magenta and red symbols fall above unity line at ^{−3} and ^{−5}). When spatial uncertainty is close to zero (blue) the MAX model operates like a matched template. Consistent with our results, Manjeshwar and Wilson (

^{r}) on the y axis. ^{n}) on x axis. Shading shows ±1 SD across observers. Symbols show, for each observer, the

Clearly, model-human consistency depends on the exact parametrization used for the model. As an example, Figure

Despite the inability of the ideal observer to capture the kernel structure observed experimentally (Figures

The analysis presented in Figure

^{j−1} for

The early non-linearity we characterized in the previous sections is suspiciously reminiscent of the expansive non-linearity that is commonly observed for uncalibrated monitors: when pixel intensity is controlled linearly at the palette level, the actual output from the monitor is typically supralinear (Brainard et al.,

We tested these predictions by performing additional measurements for the two smaller uncertainty levels on a subset of the observers (see Section ^{−3}; we were not able to measure a statistically significant effect for the other uncertainty level tested). We also estimated the front-end filter for these experiments, which looked very similar to the filter for detecting a bright target (inset to Figure

Uncertainty has been a subject of controversy on a number of occasions in the vision literature (Cohn and Lasley,

Throughout this report we have drawn a distinction between the NL Hammerstein model and the LNL Korenberg model. However, it is evident that in general the former represents a subclass of the latter (Marmarelis and Marmarelis, _{1}NL_{2} implementation of a MAX uncertainty model, the linear stage L_{2} immediately preceding the psychophysical decision is necessarily a simple sum (this stage (indicated by σ in Figure _{1} (see Section _{1} (front-end convolution followed by weighting), not L_{2}; formally adding an early (ineffective) linear stage therefore does not reduce it to the LNL implementation of a MAX model. The prediction for the Hammerstein model is consistent with the data (Figure

The issue of formulating the front-end stage in the Hammerstein model draws attention to a further question: what is a plausible physiological substrate for this stage? As mentioned in the preceding paragraph, there is an implicit assumption in this model that the earliest stage involves a high-fidelity linear transducer (a delta function); a compatible physiological interpretation would presumably place this stage at retinal or geniculate level. The subsequent early non-linearity may then reflect the rectifying properties of ON and OFF channels (Shapley,

If we are not positioned to relate these models to specific physiological constructs, can we at least sketch an intuitive description in terms of the associated phenomenological experience? We attempt this in Figure

^{x}) whereby the ‘bright-bar’ content of each stimulus is emphasized (righthand pair of stimuli) before summing the evidence across the entire stimulus (Σ) to obtain a final figure of merit for comparison/decision (≶).

^{2} except for panel C where it only shows

If we accept the notion that human observers were striving to maximize efficiency within the constraints imposed by early filtering in the visual system and suboptimal encoding of the specified target uncertainty ranges (as indicated by the high model-human consistency achieved by the ideal observer in Figures

There are other features of the Hammerstein model that make it potentially more attractive than the MAX model. It is conceivable that it can be implemented more easily in neural hardware: static non-linearities are ubiquitous in neural structures and arise naturally from well-known properties of neuronal physiology (Priebe and Ferster,

The experiments described in this paper are restricted to a specific task, that of detecting a luminance bar embedded in noise. Although this task is pertinent to visual processing and perhaps representative of a larger class of problems in visual detection, it is clearly inadequate as a proxy for more complex tasks. Suppose, for example, that the task involves selecting which of two crowds contains a specific target face. If we adopt a template-matching strategy, all faces in the stimulus must be matched against a template (or set of templates) for the target face. The MAX model applies seamlessly to this scenario, whereas the Hammerstein model is possibly undefined in this case: the static non-linearity must be applied before template matching, but what does it mean to apply a point non-linearity to a whole face? Applying this kind of transformation to individual pixels in the image would make no sense for the task at hand.

This problem may be alleviated by recasting it in terms of feature space (a common strategy in kernel methods; Schölkopf and Smola,

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Threshold and slope of Weibull psychometric curve

0 for target, 1 for non-target

0 for incorrect, 1 for correct response

Target shape

Noise sample

Stimulus sample

^{q},

^{z}]

Associated with specific

System front-end filter

Extrinsic uncertainty window

Intrinsic uncertainty window

Inner product

Hadamard (element-by-element) product

Outer product

Convolution

Cross-correlation

System output

Generic (typically expansive) static non-linearity

Decisional transducer

MAX model

Hammerstein model

Korenberg model

As a preliminary step we show that the MAX model (with output max (

We can write

where we use _{k}) = _{−k}) and _{k} = 〈^{r}.

Using equation

where Ф is a highly expansive static non-linearity Ф(^{p} (large
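The way a highly expansive static non-linearity followed by a sum approaches the MAX operation can be checked numerically. The sketch below assumes that Ф is a power function with exponent p and that the responses are non-negative; the final 1/p root is monotonic, so it does not change which of two intervals yields the larger output. The function name and the response values are hypothetical.

```python
import numpy as np

def power_sum_readout(responses, p):
    """Sum of responses raised to the power p, rescaled by the 1/p root."""
    r = np.asarray(responses, dtype=float)
    return float(np.sum(r ** p) ** (1.0 / p))

r = [0.2, 1.0, 0.5, 0.8]  # hypothetical non-negative template responses
readouts = {p: power_sum_readout(r, p) for p in (1, 4, 16, 64)}
gap_at_64 = readouts[64] - max(r)  # distance from the true maximum at p = 64
```

As p grows, the largest response dominates the sum and the readout converges on max(r) from above.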

Using an adapted Volterra expansion (Neri,

where ^{T})), e.g., Θ_{1} = _{2} =

where Ψ is a non-linear decisional transducer function (Neri, _{2} (computed as detailed in Methods) is approximately correct (see also Neri, in press). By combining equations

where Ф^{(j)} is the ^{(2)} ≥ 0 for highly expansive Ф, we can state the following result (central to this article):

for the MAX model (rewrite equation

Using the same procedure adopted to derive equation

which (for a first-order expansion of Ψ) leads to

in line with well-established results (Westwick and Kearney,

In the signal-clamping methodology the target-present first-order estimated kernel _{k}) = ρδ_{k0}. Using a procedure analogous to Neri (_{d} = 0 for

where we index using : to take the entire corresponding vector dimension, e.g., _{2}(:, _{k}) is a 1-D vector consisting of the _{2}(_{j}, _{k}) for

By substituting this expression into equation

where the term

We can derive similar expressions for the target-absent kernel

which, for the MAX model, can be rewritten as

This result confirms the notion proposed by Tjan and Nandy (_{1}. It is trivially affected by odd-order non-linear kernels (Schetzen, _{k} = 0 for

where ^{(1)}/Ψ^{(2)}, making it practically prohibitive to correct for the second term (Ψ is in general not known).

If instead of assuming a MAX model we adopt a Hammerstein model, we have

where _{ν0}) image of

When φ = 0 by design (i.e., the target is presented at a fixed position), these results are directly applicable to the widely reported empirical observation that first-order kernels often present different characteristics when computed from target-present as opposed to target-absent noise fields (Ahumada et al.,

Supported by the Royal Society (University Research Fellowship) and the Medical Research Council (New Investigator Research Grant).