
Edited by: Gregory R. Samanez-Larkin, Vanderbilt University, USA

Reviewed by: Samuel Joseph Gershman, Princeton University, USA; Carlos Diuk, Princeton University, USA

*Correspondence: Darrell A. Worthy, Department of Psychology, Texas A&M University, 4235 TAMU, College Station, TX 77843-4235, USA. e-mail:

This article was submitted to Frontiers in Decision Neuroscience, a specialty of Frontiers in Neuroscience.

This is an open-access article distributed under the terms of the

We combined behavioral and computational modeling techniques to examine age-based differences in strategy use in two four-choice decision-making tasks. Healthy older (aged 60–82 years) and younger adults (aged 18–23 years) performed one of two decision-making tasks that differed in the degree to which rewards for each option depended on the choices made on previous trials. In the

The US population is aging at a very high rate. By 2050 developed nations are projected to have substantially higher populations of older adults (26% of the population) than children under age 15 (16%; Cohen,

One important aspect of decision-making is that decisions can rarely be considered as isolated events. Rather, our decisions often affect what possibilities are available in the future. For example, the choices of whether to attend college, what college to attend, and what to major in will affect what job prospects are available to choose from in the future. Likewise, the choices regarding how to invest and save for retirement will eventually affect the class of retirement homes that are available to choose from. It is thus important to examine how people make decisions based not only on their immediate effects, but also based on how the present decisions will affect future possibilities.

A recent study from our lab suggests that older adults may actually be better than younger adults in situations where rewards are

One reason for this interaction between age and the reward structure of the task on decision-making performance may be an age-related shift in the neural areas recruited during decision-making. Numerous studies have shown that normal aging leads to structural and functional declines in brain regions including the striatum, cerebellum, hippocampus, and prefrontal cortices (Raz,

An additional distinction that has emerged in the decision-making literature concerns brain regions implicated in the evaluation of immediate versus future consequences of each action. The ventral striatum has often been linked to the evaluation of immediate rewards (Hariri et al.,

A recently proposed theory of cognitive aging, the scaffolding theory of aging and cognition (STAC; Park and Reuter-Lorenz,

In the current work we seek to fill this gap by examining older and younger adults’ behavior in choice-dependent and choice-independent decision-making tasks, and by fitting a series of computational models to each participant’s data that differ in their assumptions about how participants make decisions in the task. Increased frontal compensation in older adults may lead them to employ explicit, heuristic-based strategies to a greater extent than younger adults, who may show more use of less explicit, reinforcement-learning (RL) strategies. Indeed, some recent work suggests that older adults are more likely to make their decisions based on simple heuristics than younger adults (e.g., Mata et al.,

Win-stay–lose-shift models have been extensively used to model decision-making behavior (Frank and Kong,

In the tasks used in the present experiments participants select from among four options on each trial and receive between 1 and 10 points. We develop a WSLS model for these tasks by having the model assume that participants compare the reward received on the present trial to the reward received on the previous trial. The trial is a “win” trial if the reward on the present trial is equal to or greater than the reward received on the previous trial, and the trial is a “loss” trial if the reward on the present trial is less than the reward received on the previous trial.

The WSLS model has two free parameters. The first parameter represents the probability of staying with the same option on the next trial if the reward received on the current trial is equal to or greater than the reward received on the previous trial:

$$P(a_{t+1} = a_t \mid r_t \ge r_{t-1}) = P(\text{stay} \mid \text{win})$$

Here $a_t$ denotes the option selected on trial $t$, and $r_t$ denotes the reward received on trial $t$.

The second parameter represents the probability of shifting to a different option on the next trial if the reward received on the current trial is less than the reward received on the previous trial:

$$P(a_{t+1} \ne a_t \mid r_t < r_{t-1}) = P(\text{shift} \mid \text{lose})$$

This probability is divided by three and assigned to each of the other three options. The probability of staying with an option following a “loss” is $1 - P(\text{shift} \mid \text{lose})$.
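As a concrete illustration, the two-parameter WSLS rule can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' fitting code; the function name is hypothetical, and the assumption that the complementary "shift after a win" probability is split evenly among the other three options mirrors the stated loss rule but is not spelled out in the text.

```python
def wsls_probs(prev_choice, r_curr, r_prev, p_stay_win, p_shift_lose, n_options=4):
    """Next-trial choice probabilities under a win-stay-lose-shift model.

    A trial counts as a "win" if the current reward is greater than or
    equal to the previous reward, and as a "loss" otherwise.
    """
    if r_curr >= r_prev:
        # "Win": stay with probability p_stay_win; the remainder is
        # split evenly among the other options (an assumption here).
        probs = [(1.0 - p_stay_win) / (n_options - 1)] * n_options
        probs[prev_choice] = p_stay_win
    else:
        # "Loss": shift with probability p_shift_lose, divided evenly
        # among the other three options; stay with 1 - p_shift_lose.
        probs = [p_shift_lose / (n_options - 1)] * n_options
        probs[prev_choice] = 1.0 - p_shift_lose
    return probs
```

Fitting the model to a participant's choices would then amount to choosing the two parameters that maximize the likelihood of the observed choice sequence under these probabilities.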

Many common RL models used to account for decision-making behavior in choice tasks operate by developing and updating expected reward values (EVs) for each option $j$, denoted $EV_j$. On each trial, the EVs are transformed into choice probabilities with a Softmax decision rule:

$$P(a_t = j) = \frac{e^{\theta \cdot EV_j(t)}}{\sum_{k=1}^{4} e^{\theta \cdot EV_k(t)}}$$

Here θ is an exploitation parameter that determines the degree to which the option with the highest EV is chosen. As θ approaches infinity the highest-valued option is always chosen, and as θ approaches 0 all options are chosen with equal probability.

We fit two models that make slightly different assumptions about how EVs are updated on each trial. Both models use the Softmax rule given above to determine the probability of selecting each option. The Delta-Rule model assumes that the EV for the option chosen on each trial, denoted as option $i$, is updated according to:

$$EV_i(t+1) = EV_i(t) + \alpha \cdot [r(t) - EV_i(t)]$$

This model assumes that the expected values for each option are updated only when that option is selected, and are based only on the reward received immediately after making a choice. Learning is mediated by the prediction error, $r(t) - EV_i(t)$, scaled by a learning-rate parameter $\alpha$ ($0 \le \alpha \le 1$). The Delta-Rule model has been used in a number of studies, primarily when the rewards in the environment are choice-independent (e.g., Sutton and Barto,
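A minimal Python sketch of the Delta-Rule model with a Softmax choice rule is given below. The parameter names (theta, alpha) follow the text, but the function names are illustrative, not the authors' implementation:

```python
import math

def softmax(evs, theta):
    # Softmax rule: higher theta -> more deterministic choice of the
    # highest-valued option; theta = 0 -> all options equally likely.
    weights = [math.exp(theta * ev) for ev in evs]
    total = sum(weights)
    return [w / total for w in weights]

def delta_rule_update(evs, choice, reward, alpha):
    # Only the chosen option's EV moves toward the obtained reward,
    # in proportion to the prediction error (reward - EV).
    evs = list(evs)
    evs[choice] += alpha * (reward - evs[choice])
    return evs
```

Simulating a participant would alternate these two steps: sample a choice from `softmax(evs, theta)`, observe the reward, then call `delta_rule_update`.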

The learning rule for the Delta-Rule model can be modified to include eligibility traces (ET) which simply assert that participants remember which options they have chosen in the recent past, and that some of the credit from the reward received on each trial goes to options chosen on previous trials, rather than all of the credit going to only the option that was just chosen. The addition of ETs in the ET model has often resulted in an improved fit (Sutton and Barto,

The model assumes that participants keep a memory for recent actions, known as an ET. The ET for each option $j$ is denoted $\lambda_j$.

On each trial, the ET for each option, $\lambda_j$, decays according to:

$$\lambda_j(t) = \zeta \cdot \lambda_j(t-1)$$

where $\zeta$ ($0 \le \zeta \le 1$) is a decay parameter.

Additionally, each time an option is chosen the ET for that option is incremented according to:

$$\lambda_i(t) = \lambda_i(t) + 1$$

The EVs for all options are then updated in proportion to their traces:

$$EV_j(t+1) = EV_j(t) + \alpha \cdot \lambda_j(t) \cdot [r(t) - EV_i(t)]$$

where $i$ denotes the chosen option. Eligibility traces assert that participants remember which actions they have recently selected, so that recent actions can be credited if they lead to increases in reward. Thus, in the ET model traces for options that are not chosen continue to decay, and EVs are updated more strongly by recent rewards the more often those options are chosen. To summarize, there are two main differences between the Delta-Rule and ET models presented above. First, the ET model incorporates ETs for recent actions, and second, the ET model updates the EVs of all options on each trial based on each option’s ET value, whereas the Delta-Rule model only updates the EV for the chosen option. It should also be noted that the Delta-Rule model is nested within the ET model: the ET model is identical to the Delta-Rule model when ζ = 0.
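The ET model's trial-by-trial update can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes a single prediction error computed from the chosen option's EV and applied to every option in proportion to its trace, which preserves the stated reduction to the Delta-Rule model when ζ = 0.

```python
def et_update(evs, traces, choice, reward, alpha, zeta):
    """One trial of the eligibility-trace (ET) model.

    All traces decay by zeta, the chosen option's trace is incremented,
    and every option's EV is updated in proportion to its trace.
    With zeta = 0 this reduces exactly to the Delta-Rule model.
    """
    traces = [zeta * t for t in traces]   # decay all traces
    delta = reward - evs[choice]          # prediction error for the chosen option
    traces[choice] += 1.0                 # credit the just-chosen option
    evs = [ev + alpha * tr * delta for ev, tr in zip(evs, traces)]
    return evs, traces
```

With ζ > 0, an option chosen two trials ago still carries a trace of ζ² and therefore absorbs some credit from the current reward, which is what lets the model cope with choice-dependent reward structures.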

We propose that utilizing a heuristic-based WSLS strategy will engage frontal brain regions, while utilizing an RL strategy will engage striatal brain regions. Older adults who engage in compensatory scaffolding should therefore be more likely than younger adults to utilize a WSLS strategy over an RL strategy. Evidence for this distinction in the neural areas that mediate these two different types of strategies comes from many different sources. Reward prediction errors from RL models similar to the one presented above have been correlated with striatal activity in a number of studies (Pagnoni et al.,

In contrast, there is a large body of evidence that suggests that the use of heuristics, or rules, is explicit and more frontally mediated (e.g., Ashby et al.,

Based on the scaffolding theory outlined above, we predict that, relative to younger adults, older adults will employ more explicit strategies like WSLS due to frontal compensation. Thus, older adults’ data should show more evidence of being best fit by the WSLS model, while younger adults’ data should show more evidence of being best fit by one of the RL models.

In the following sections we present an experiment in which older and younger adults performed either a choice-dependent or choice-independent decision-making task. We then present behavioral results, followed by a modeling analysis in which we compare the fits of the WSLS, Delta-Rule, and ET models, as well as a Baseline model that assumes random, stochastic responding. The Baseline model has three free parameters representing the probabilities of selecting three of the four options on any given trial; the probability of selecting the fourth option is 1 minus the sum of the other three. To foreshadow, we find that the ET and WSLS models provide the best fits to the data. We directly compare the fits of these two models and find that younger adults show more evidence of being best fit by the ET model than older adults.

Fifty-six younger adults (18–23 years of age,

Older adults were given a series of standardized neuropsychological tests before being included in the study. The neuropsychological testing session was held separately and before the experimental session. The battery of tests was designed to assess general intellectual ability across three functional realms:

The standard, age-appropriate published norms were used to calculate normative scores for each subject. For all of the WAIS subtests, the percentile was calculated according to testing instructions, and this score was then converted to a standardized

Each participant completed one of two decision-making tasks where all options led to gains in points and the goal was to maximize points gained. The two tasks had the same basic surface features and differed only on how the rewards for each option were structured. Figure

The rewards given for each deck in the choice-independent task are shown in Figure

The reward structure for the choice-dependent task is shown in Figure ^{1}

Participants were given a goal of earning a criterion number of points by the end of the experiment: 550 points in the choice-independent task and 450 points in the choice-dependent task. The goal criterion was set so that participants had to draw a minimum of 25 cards from the increasing decks to meet the criterion in each task. The total points earned can be plotted as a function of the number of cards drawn from the increasing decks. This is shown in Figure

The specific instructions participants received before performing the choice-independent task are shown below. The instructions were the same for participants who performed the choice-dependent task except participants were told that their goal was to earn 450, not 550, points.

You will perform a gambling task where you will be asked to make selections from one of four options. After each selection you will gain a certain number of points. Your objective is to gain as many points as possible. You will have a specific goal to earn a certain number of points by the end of the task. When you begin the task your goal will be listed on the screen. Try your best to earn as many points as possible.

Four decks will appear on the screen. You will use the “W,” “Z,” “P,” and “?/” keys to pick from these decks.

Press the “W” key to pick from the deck on the top left.

Press the “Z” key to pick from the deck on the bottom left.

Press the “P” key to pick from the deck on the top right.

Press the “?/” key to pick from the deck on the bottom right.

You will receive between 1 and 10 points each time you draw a card. Your goal is to earn at least 550 points by the end of the task.

We first examined performance in each task by computing each participant’s payoff relative to the payoff obtained by an optimal performer. This proportion of the optimal cumulative payoff was computed by dividing the points earned by each participant by the maximum number of points that could be earned by an omniscient observer (600 in the choice-independent task and 515 in the choice-dependent task). The proportions of the optimal cumulative payoff are shown in Figure. There was a significant interaction between age and task, partial η^2 = 0.11. We conducted pair-wise comparisons within each task to investigate the locus of the interaction. Within the choice-independent task there was a significant effect of age, partial η^2 = 0.09, with younger adults earning a higher proportion of the optimal payoff than older adults. Within the choice-dependent task there was also a significant effect of age, partial η^2 = 0.13, with older adults outperforming younger adults.

We fit each participant’s data individually with the WSLS, Delta-Rule, ET, and Baseline models detailed above. The models were fit to the choice data from each trial by minimizing the negative log-likelihood. We used Akaike weights to compare the relative fit of each model (Akaike,

For each model $i$ the AIC is computed as:

$$\mathrm{AIC}_i = -2\ln L_i + 2V_i$$

where $\ln L_i$ is the maximized log-likelihood and $V_i$ is the number of free parameters of model $i$.

From the differences in AIC we then computed the relative likelihood of each model:

$$L(M_i \mid \text{data}) \propto \exp\!\left(-\tfrac{1}{2}\,\Delta_i(\mathrm{AIC})\right)$$

where $\Delta_i(\mathrm{AIC})$ is the difference between model $i$’s AIC and the lowest AIC in the candidate set.

Finally, the relative model likelihoods are normalized by dividing the likelihood for each model by the sum of the likelihoods for all candidate models. This yields Akaike weights:

$$w_i(\mathrm{AIC}) = \frac{\exp\!\left(-\tfrac{1}{2}\,\Delta_i(\mathrm{AIC})\right)}{\sum_{k=1}^{K}\exp\!\left(-\tfrac{1}{2}\,\Delta_k(\mathrm{AIC})\right)}$$

These weights can be interpreted as the probability that the model is the best model given the data set and the set of candidate models (Wagenmakers and Farrell,
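The Akaike-weight computation described above can be sketched in Python as follows (a generic implementation of the standard formula, not the authors' analysis code):

```python
import math

def akaike_weights(log_likelihoods, n_params):
    """Akaike weights for a set of candidate models.

    AIC_i = -2 * lnL_i + 2 * V_i. The weights normalize the relative
    likelihoods exp(-0.5 * delta_i) over the candidate set, where
    delta_i is each model's AIC difference from the lowest AIC.
    """
    aics = [-2.0 * ll + 2.0 * k for ll, k in zip(log_likelihoods, n_params)]
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]
```

A weight near 1 indicates that, within the candidate set, the corresponding model is almost certainly the best by AIC; roughly equal weights indicate that the data do not discriminate between the models.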

We computed the Akaike weights for each model for each participant. Table

| | WSLS | Delta-Rule | ET | Baseline |
| --- | --- | --- | --- | --- |
| Older adults | 0.44 (0.08) | 0.19 (0.03) | 0.37 (0.06) | 0 (0) |
| Younger adults | 0.34 (0.08) | 0.23 (0.04) | 0.43 (0.06) | 0 (0) |
| Older adults | 0.36 (0.08) | 0.21 (0.04) | 0.40 (0.06) | 0.02 (0.01) |
| Younger adults | 0.19 (0.07) | 0.31 (0.04) | 0.48 (0.05) | 0.02 (0.02) |

We can conclude from Table that the WSLS and ET models provided the best account of the data. To compare these two models directly, we computed a relative fit measure, Relative fit_{ET} = (−lnL_{WSLS}) − (−lnL_{ET}), from the two models’ negative log-likelihood fits to each participant’s data. Because lower negative log-likelihood values indicate a better fit, positive Relative fit_{ET} values indicate a better fit for the ET model, while negative Relative fit_{ET} values indicate a better fit for the WSLS model.

These Relative fit_{ET} values are plotted in Figure. There was a significant effect of age, partial η^2 = 0.04: younger adults (mean = 11.63) had higher Relative fit_{ET} values than older adults. Average Relative fit_{ET} values were near 0 for older adults, indicating roughly equal evidence for both models.

Relative fit_{ET} values for participants in each condition.

We next examined whether there was a relationship between the Relative fit_{ET} values and the proportions of the optimal cumulative payoff obtained in the choice-independent and choice-dependent tasks. For the choice-independent task there was a significant positive correlation between Relative fit_{ET} values and the proportions of the optimal cumulative payoff. This correlation was also significant within the older adult group, and the correlation between Relative fit_{ET} values and proportions of the optimal cumulative payoff was also positive, but only marginally significant, within the younger adult group.

Across all participants in the choice-dependent task there was a significant negative correlation between Relative fit_{ET} values and the proportions of the optimal cumulative payoff. The correlation between Relative fit_{ET} values and the proportions of the optimal cumulative payoff was negative and highly significant for younger adults.

We examined the older adult data from the neuropsychological testing session to determine whether there were any relationships between scores on those tests and strategy use in the decision-making tasks. We first examined correlations between older adults’ scores on each neuropsychological test and both the proportions of the optimal cumulative payoff they earned and their Relative fit_{ET} values. However, none of these correlations reached significance.

We next split up the data based on whether older adult participants’ data were best fit by the ET or WSLS model. Thirty-two older adults were fit better by the WSLS model and 26 were fit better by the ET model. We examined the average

The CVLT recognition for true positives test requires yes/no recognition of items presented earlier and has been linked to frontal lobe functioning. For example, patients with frontal lobe dysfunction have been found to underperform on this test relative to normal controls (Baldo et al.,

We observed an interaction between age and the nature of the optimal task strategy on performance. Older adults performed better when rewards were choice-dependent, while younger adults performed better when rewards were choice-independent. This replicates our previous finding in the same choice-independent task, and mirrors our previous findings for two different choice-dependent tasks (Worthy et al.,

A direct comparison of the ET and WSLS model fits showed more evidence of WSLS strategy use for older adults than younger adults. Participants who were better fit by the ET model, relative to the fit of the WSLS model, tended to perform better on the choice-independent task, but worse on the choice-dependent task. A WSLS strategy may lead to sub-optimal switches from the most-rewarding options in the choice-independent task due to variation around the mean value given by each deck. A participant may switch to a different deck after receiving less on the current trial than what they received on the previous trial, even though they may be switching to a deck with a lower overall mean reward value. The ET model assumes that participants update and maintain EVs for each option. The EVs are essentially recency-weighted averages of the rewards received on previous trials, and the model predicts which option should be chosen by comparing the EV of each option with the EVs of the other options. This model should not predict as much switching from decks that give high average rewards because the decks are valued based on the average rewards received over many trials, rather than a relative comparison between the current reward and the reward received on the preceding trial.

A WSLS strategy likely helps on the choice-dependent task because participants are less likely to stay with the Decreasing options, and will select the Increasing options more often due to the variation in rewards around each deck’s mean reward value. A WSLS strategy should also lead participants to switch away from the Decreasing options more quickly once the rewards given by the Decreasing options begin to decline. An RL strategy will consistently value the Decreasing option early in the task because selecting it leads to larger average rewards. Because the EVs are recency-weighted averages of the rewards received for each option, participants using this type of strategy will pick the Increasing option less often early in the task, leading to poorer overall performance.

Thus, the age-based differences in performance on the choice-independent and choice-dependent tasks were due to differences in the types of strategies older and younger adults used to make their decisions on each trial, with older adults using a heuristic-based WSLS more often than younger adults. Other work also suggests that older adults may be more likely to use simple heuristics during decision-making than younger adults (Thornton and Dumke,

The differences in strategy preferences that we observed could be due to a shift in the neural areas recruited during decision-making, as predicted by STAC (Park and Reuter-Lorenz,

This study applied a series of mathematical models to data from younger and older adults who performed either a choice-dependent or choice-independent decision-making task. Older adults showed more evidence of utilizing a WSLS heuristic to make decisions than younger adults, who were best fit by an RL model that tracked recency-weighted averages of the rewards for each option based on prediction errors. These results suggest that older and younger adults use qualitatively different strategies to make decisions, and that the shift in strategies may result from older adults engaging more frontal brain regions to compensate for age-based neural declines (Park and Reuter-Lorenz,

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

^{1}It should be noted that the choice-dependent task is formally a partially observable Markov decision process (POMDP). Some research in machine learning suggests that the inclusion of ETs can help RL models cope with partial observability (e.g., Loch and Singh,