
Edited by: Mark Louis Taper, Montana State University System, United States

Reviewed by: Brian Dennis, University of Idaho, United States; Jeff E. Houlahan, University of New Brunswick Fredericton, Canada

This article was submitted to Environmental Informatics, a section of the journal Frontiers in Ecology and Evolution

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Evidential statistics is an important advance in model and theory testing, and in scientific reasoning in general, combining and extending key insights from other philosophies of statistics. A key desideratum in evidential statistics is the rigorous and objective comparison of alternative models against data. Scientific theories help to define the range of models brought to bear in any such assessment, including both tried-and-trusted models and risky novel ones; such theories emerge from a kind of evolutionary process of repeated model assessment, in which model selection is akin to natural selection acting both on the standing crop of genetic variation and on novel mutations. The careful use of evidential statistics could play an important and as-yet-unfulfilled role in the future development of scientific theories. We illustrate these ideas using examples from ecology and evolutionary biology.

Statistical inference aims at relating models to data and the empirical world, whether that model deals with an issue as simple as estimating the mean of a population or as complex as predicting millennial-scale changes in the global climate. There have been decades-long debates about the best way to make inferences (e.g., Neyman-Pearson error statistics vs. Bayesian approaches). This special feature highlights the approach called “evidential statistics” (Taper and Ponciano,

Scientists quest for knowledge about the empirical world, seeking to understand its causal structure and to use that causal structure for prediction as well as for control and management. The inferential procedures employed to gain such knowledge should be “truth-tropic” (Lipton,

Statistics is essential for testing models in the broad sense of examining their relationship with the empirical world, an effort that in turn contributes to the goal of crafting and testing more general theories. Building and testing theories relies on a variety of approaches, only some of which make explicit use of statistical inference. Evidential statistics aims to provide a systematic approach for assessing the relative informativeness of models. That assessment depends upon the available data and protocols, not upon the personal beliefs embedded within Bayesian statistics, and proceeds via objective metrics of evidence that ideally lead toward closer approximations of the “truth” as models continue to be refined and compared (Dennis et al.,
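To make the idea of an objective evidence metric concrete, a log-likelihood ratio between two fully specified models can serve as a simple evidence function. The sketch below is a minimal, hypothetical example of our own (invented data and models, not drawn from any of the cited papers): positive values favor the first model, negative values the second, and the magnitude quantifies the strength of evidence in the data.

```python
import math

def log_lik_normal(data, mu, sigma):
    """Log-likelihood of the data under a Normal(mu, sigma) model."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)

def evidence(data, model_a, model_b):
    """Log-likelihood ratio: positive values favor model A over model B."""
    return log_lik_normal(data, *model_a) - log_lik_normal(data, *model_b)

# Hypothetical measurements; two rival models of the mean (sigma fixed at 1).
data = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0]
lr = evidence(data, (5.0, 1.0), (3.0, 1.0))  # strongly favors mu = 5
```

Unlike a significance test against a single null, this comparison is symmetric in the two models: the same number measures support for either one, which is the evidential stance described above.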

Like any evolutionary process, theory development depends upon the availability of an array of alternative models for comparison, using both a standing crop of existing models that have proven useful in other contexts, and novel conceptual mutations. Evidential procedures are akin to natural selection culling genetic variants, favoring the fittest in the population at hand in a given environment. For example, in our saguaro cactus model, general climatic variables such as average rainfall or seasonal patterns in precipitation are doubtless important and would discriminate among many models, but a key idiosyncratic factor operating at the northern range limits appears to be the number of consecutive hours below freezing (MacArthur,

This evolutionary perspective on theory development traces back to Popper (

Our approach to models and theories can be considered part of the Pragmatic View of the structure of scientific theory (Winther,

Scheiner and Willig (

Constitutive theories are the workhorses in this framework and what most individuals would think of when asked to name or describe a theory. Their role is to organize models into larger entities. They consist of a set of propositions, which might arise inductively from a set of models (e.g., a constitutive theory of diversity gradients, Scheiner and Willig,

dV/dt = rV − aVP and dP/dt = baVP − mP,

where V and P are the densities of the prey (victim) and predator populations, r is the intrinsic growth rate of the prey, a is the predator's attack rate, b is the efficiency with which consumed prey are converted into new predators, and m is the predator's death rate.

The framework is multilayered, and both general and constitutive theories can be nested and overlapping. For example, a model of the evolution of plasticity of

Models can be both qualitative and quantitative in describing or predicting nature. In ecology and evolution we tend to think of dynamical mathematical models, systems of equations or computer rules linked by logical operators corresponding to assumptions about mechanisms at and across different levels of biological organization. A computer simulation, such as an individual-based model of population dynamics, might be an example. Models can also be qualitative; Charles Darwin's theory of evolution was almost entirely verbal and qualitative. There is a single, iconic tree-like figure in On the Origin of Species.

From models we deductively derive hypotheses that in turn make predictions. These predictions are often derived from a mathematical model and are based on some expected distribution of parameter values (see other articles in this special feature). Those distributions are then compared to data (broadly defined). Whereas the model is general in the sense that it applies across a domain of interest, a hypothesis becomes a prediction when applied to a specific, empirical instance. That application, the collision of models and data, is where evidential statistics steps in.

Statistical methods shed light on the possible relative verisimilitude or falsity of a hypothesis, compared to coherently-specified alternative hypotheses. That hypothesis might be that a model parameter has a very specific value (e.g., in plant populations the relationship between the average mass per individual and the density of survivors should have an exponent of −3/2, Yoda et al.,
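For the self-thinning example, such a parameter hypothesis can be confronted with data by regressing log mean mass on log survivor density and comparing the fitted exponent to the predicted −3/2. A minimal sketch, using invented, noise-free data constructed to obey the rule exactly (real data would scatter around the line, and the fitted slope would be assessed evidentially against rival exponents):

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical survivor densities and mean masses obeying mass ~ density^(-3/2).
density = [100, 200, 400, 800]
mass = [d ** -1.5 for d in density]

log_d = [math.log(d) for d in density]
log_m = [math.log(m) for m in mass]
slope = ols_slope(log_d, log_m)  # recovers -1.5 for this noise-free data
```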

Those theories are systems that organize models, data, concepts, and so forth [Box 3.2 in Pickett et al. (

Within constitutive theories are families of models, and decisions need to be made as to which models to include or exclude. Sometimes that decision rests on how well one model mirrors the empirical world relative to another. Evidence based on the relationship of a hypothesis with data and the empirical world leads to inferences about the relative truth or falsity of the hypotheses generated by each model, a decision-making process mediated by statistics. But these decisions are only part of what goes into conclusions about the utility of a constitutive or general theory. A principle in a general theory (e.g., “The ecological properties of species are the result of evolution” from the theory of ecology, Scheiner and Willig,

Evaluating models, such as the predator-prey model given above, involves more than just comparing predictions with data. That model famously predicts predator-prey cycles, looking in some respects like real-world cycles (such as the lynx-snowshoe cycle of Canada). May (

Models may be false, while still playing a vital role in the conceptual framework of ecological theory. We contrast structural stability (the robustness of model conclusions to small deviations in model assumptions) with the stability of model structure. For the predator-prey model, the essential structure of the model itself (a +/– interaction between two antagonists, a natural enemy and its victim) is applicable across many empirical systems (e.g., predator-prey, host-pathogen, and plant-herbivore). The Lotka-Volterra predator-prey model demonstrates that there is a tendency to oscillate inherent in such antagonistic interactions. This qualitative conclusion is robust across many variants of this basic model, although the details may differ (e.g., the oscillations may manifest as transients following a perturbation, rather than as permanent cycles). Because the Lotka-Volterra model makes such robust, qualitative predictions, it continues to play an important role in the conceptual framework of theoretical ecology, even though it is known to be literally false for all empirical predator-prey systems. The same can be said of the model of exponential growth, dN/dt = rN.
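The inherent tendency to oscillate can be seen directly by numerically integrating the Lotka-Volterra equations. The sketch below uses simple Euler steps and illustrative parameter values (our own assumptions, chosen for clarity rather than estimated for any real system); the count of equilibrium crossings makes the qualitative prediction, cycling, explicit.

```python
# Euler integration of the Lotka-Volterra predator-prey model:
#   dV/dt = r*V - a*V*P,  dP/dt = b*a*V*P - m*P
# Parameter values are illustrative assumptions, not fitted to any system.
r, a, b, m = 1.0, 0.1, 0.5, 0.5   # prey growth, attack rate, conversion, predator death
V, P = 5.0, 5.0                   # initial prey (victim) and predator densities
dt, steps = 0.001, 30000          # small step; total time = 30

prey_history = []
for _ in range(steps):
    dV = (r * V - a * V * P) * dt
    dP = (b * a * V * P - m * P) * dt
    V += dV
    P += dP
    prey_history.append(V)

# Count how often prey density crosses its equilibrium V* = m / (b*a) = 10:
# repeated crossings are the oscillation the model predicts.
eq = m / (b * a)
crossings = sum(
    1 for v0, v1 in zip(prey_history, prey_history[1:])
    if (v0 - eq) * (v1 - eq) < 0
)
```

Varying the parameters or the model's functional forms changes the period and amplitude but not the qualitative tendency to cycle, which is the sense in which the conclusion is structurally robust.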

If a theory is relatively narrow, encompassing just one or a few specific models, and all of those models fail, we would then discard the theory as not useful. For example, Arditi and Ginzburg (

The use of statistics to assess hypotheses and models involves both deductive and inductive reasoning. We deduce hypotheses/predictions from a model. If a prediction proves false, one or more aspects of the model may be concluded to be false, which is the basis of Popper's (

A third, less familiar, type of reasoning is abduction. The term was coined by Charles Peirce (Douven,

The development of constitutive and general theories cannot be entirely shoe-horned into formal statistical inference, including evidential statistics, vital though that is for sifting hypotheses and models. Statistical inference alone is insufficient when dealing with the sculpting over time of scientific understanding, involving the concerted efforts of many scientific minds who collectively craft complex models or theories (Longino,

What is the role of evidential statistics in determining the relationship between models and theories where the latter are qualitative, rather than quantitative? For example, our explanation about the range of saguaro cacti includes information about the geographic history of the North and South American continents. We have models of the movements of the continents over geological history, but those models are not mathematical equations. Rather, we have inferred that history from a range of observations, only some of which include quantitative models. In modern systematics, a phylogeny is a quantitative model of a set of relationships among species (or higher taxa) in a clade. When multiple phylogenies are overlain on a map, the subsequent qualitative biogeographic patterns can be used to make inferences about the geological history of that region. It is possible to devise a formal inference process for making decisions about that history, but a formal process is not always necessary. Wegener's (

Evidential statistics is still a relatively new approach to linking data, models, and constitutive theories, but it promises to provide a clearer and more coherent way to assess the relative match of models to data, compared to competitors such as Neyman-Pearson testing or Bayesian analysis. Does the use of evidential statistics change if the purpose of a model is for understanding (e.g., why saguaro are confined to the Sonoran Desert) vs. prediction (e.g., what is the most likely global mean temperature in the year 2100)? Does this use change if the model is mechanistic vs. phenomenological? Are different evidence functions better suited for prediction vs. explanation? If one carries out multiple studies, each of which uses evidence functions, how can these best be brought together to examine broad-scale patterns across many systems? Maybe there is a straightforward, evidential-statistics version of meta-analysis (for a start, see Goodman,

If the goal is understanding, a very simple model may be appropriate. For example, we might ask whether saguaro abundance within its occupied range is controlled by intraspecific competition only, or also by interspecific competition with Ferocactus. We could build a very simple model of logistic growth without and with competition and use inferential statistics to ask which model is more consistent with observed densities across space and/or time. The model is not likely to be useful for making an accurate prediction of densities, but may nonetheless help uncover the presence of a particular ecological mechanism (e.g., competition). Simple models can illuminate essential elements of a system, even if statistical inference indicates that the model is very far from an accurate depiction of the empirical system. Depending on our goal, the most useful model could either be very simple (to highlight a single, essential feature) or very complex (to be as accurate as possible). In this case, our goal is not theory testing. Rather, the goal is to use an established theory to build a model for a specific instance so as to enhance understanding.
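A minimal sketch of such a comparison, using invented plot densities and a difference in AIC as a simple evidence measure between a no-competition model (constant saguaro density) and a competition model (density declining with competitor density). The data, the linear approximation, and the Gaussian error assumption are all ours, purely for illustration:

```python
import math

# Hypothetical per-plot densities: saguaro density falls as the
# density of a competing cactus rises (invented numbers).
competitor = [0, 2, 4, 6, 8, 10]
saguaro    = [9.8, 8.1, 6.2, 3.9, 2.1, 0.2]
n = len(saguaro)

# Model A: no interspecific competition -> constant mean (1 parameter).
mean_s = sum(saguaro) / n
rss_a = sum((s - mean_s) ** 2 for s in saguaro)

# Model B: linear decline with competitor density (2 parameters, OLS fit).
mx = sum(competitor) / n
num = sum((x - mx) * (s - mean_s) for x, s in zip(competitor, saguaro))
den = sum((x - mx) ** 2 for x in competitor)
slope = num / den
intercept = mean_s - slope * mx
rss_b = sum((s - (intercept + slope * x)) ** 2
            for x, s in zip(competitor, saguaro))

# AIC from Gaussian residuals: n*ln(RSS/n) + 2k. Lower is better; the
# AIC difference serves as a simple evidence measure between the models.
aic_a = n * math.log(rss_a / n) + 2 * 1
aic_b = n * math.log(rss_b / n) + 2 * 2
delta = aic_a - aic_b  # positive values favor the competition model
```

A clear AIC difference here would be evidence for the competitive mechanism without any pretense that the linear model accurately predicts densities, which is exactly the understanding-over-prediction use of simple models described above.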

Prediction is important and indeed vital in the progress of science (Houlahan et al.,

Sometimes statistical inference is not necessary for testing a theory, for example when a model is being used to explore if something is possible or not. The data are simply that some object or phenomenon exists or does not exist. The model either matches the data or it does not; no statistical inference is needed. For example, contra the “central dogma” we might have a theory that acquired characteristics can be inherited. For over a century, all of the data said that this theory was false. Then retroviruses were discovered showing that information can flow from RNA acquired from the environment back to DNA. For at least this narrow domain, the theory of the inheritance of acquired characteristics has been shown to be true. One might be able to shoehorn such examples into evidential statistics, but it is not clear that is necessary to understand the logic of scientific discovery in cases of this sort.

Even with a question that is less clear cut than simply “Does it exist?” statistical inference may be unnecessary. Statistical inference is about finding the informative signal within noisy data. For highly controlled experiments, the noise might be so small that the signal is immediately obvious. We know physiologists who say that if you need to use statistics, you really should refine your experimental methodology. Statisticians sometimes refer to this as the interocular trauma test, as in “it hits you between the eyes.” Mark Taper (pers. comm.) ripostes “[Y]ou are still comparing the fit of data to models – it is just that the integration can be done by eye.” Our evolutionary history has presumably fit us to be pretty good seat-of-the-pants statisticians, in that our past inferences have helped our ancestors survive and reproduce. But this decision process is not the same as the formal mathematics of statistical inference represented by evidential statistics.

Evidential statistics is an important advance in model and theory testing, and scientific reasoning in general, combining and extending key insights from other philosophies of statistics. We applaud the editors and authors of this special issue for crystallizing many important exciting themes swirling around the topic of evidential statistics. A scientist should use whichever tool is apt for the particular question at hand. Statistical inference itself is just one class of tools used in scientific inquiry that depends on quantitative data and mathematical reasoning. Other types of data and reasoning are sometimes more appropriate for a given question, such as qualitative data, and narrative or logical reasoning. We urge scientists to use as wide a range of tools as possible in the service of our quest to understand, predict, and manage our ever-fascinating, complex world.

The authors equally conceived of the content and wrote the paper.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We thank Mark Taper and Jeff Houlahan for their extensive and perceptive comments on earlier drafts that greatly helped to strengthen our presentation.