
Edited by: Mikhail Lebedev, Duke University, United States

Reviewed by: Chadwick Boulay, Ottawa Hospital, Canada; Amy L. Orsborn, New York University, United States; Steven Chase, Carnegie Mellon University, United States

This article was submitted to Neuroprosthetics, a section of the journal Frontiers in Neuroscience

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Neural activity in the primary motor cortex (M1) is known to correlate with movement-related variables, including kinematics and dynamics. Our recent work, which we believe is part of a paradigm shift in sensorimotor research, has shown that in addition to these movement-related variables, activity in M1 and the primary somatosensory cortex (S1) is also modulated by context, such as reward value, during both active movement and movement observation. Here we expand on the investigation of reward modulation in M1, showing that reward level changes the neural tuning functions of M1 units to both kinematic and dynamic variables. In addition, we show that this reward-modulated activity is present during brain-machine interface (BMI) control. We suggest that by taking these context dependencies of M1 modulation into account, we can produce more robust BMIs. Toward this goal, we demonstrate that we can classify reward expectation from M1 on a movement-by-movement basis under BMI control and use this to gate multiple linear BMI decoders toward improved offline performance. These findings demonstrate that it is possible and meaningful to design a more accurate BMI decoder that takes reward and context into consideration. Our next step in this development will be to incorporate this gating system, or a continuous variant of it, into online BMI performance.

Primary motor cortical (M1) activity encodes movement related kinematics and dynamics (Georgopoulos et al.,

Recently, our lab and others have shown that context can modulate neural activity in M1 (Chhatbar and Francis,

The current work has two main goals: first, to show that significant differences exist in both directional and force tuning models of M1 units between rewarding and non-rewarding trials; and second, to show that the reward level (Tarigoppula et al.,

Two non-human primates (NHPs), one male rhesus macaque (monkey S) and one female bonnet macaque (monkey P), were implanted with chronic 96-channel platinum microelectrode arrays (Utah array, 10 × 10 array separated by 400 μm, 1.5 mm electrode length, ICS-96 connectors, Blackrock Microsystems). The hand and arm region of M1 contralateral to their dominant hand was implanted with the same technique as our previous work (Chhatbar et al.,

After a 2–3 week recovery period, spiking activity was recorded with a multichannel acquisition processor system (MAP, Plexon Inc.) while the subjects performed the experimental task. Neural signals were amplified, bandpass filtered between 170 Hz and 8 kHz to isolate single- and multi-unit activity, and sampled at 40 kHz; each channel was then manually thresholded to detect single units. Single- and multi-units were sorted based on their waveforms using principal component (PC)-based methods in Sort-Client software (Plexon Inc.).

Monkeys S and P were trained to perform a reach-grasp-transport-release task, depicted in Figure

Behavioral Task: The behavioral task was composed of 6 scenes. First, during the cue display scene the animal was cued via the number of green squares to the amount of reward it would receive if it completed a trial successfully. Each green square indicated 0.5 s worth of liquid reward. Zero green squares indicated a non-rewarding trial. Trials could either be under manual control or BMI control. In manual control the NHP squeezed a physical manipulandum, with the amount of force, represented by a red rectangle, having to be held within the blue target rectangles in order for the trial to be successful. In BMI control mode, the NHP controlled the reaching trajectory of the arm toward the object (see Methods section).

For the manual task, the virtual arm reached the target cylinder automatically. The animal then controlled the grasping motion of the hand by manually squeezing a force transducer with its dominant hand. The amount of force applied was represented in the virtual environment by a red rectangle that increased in width proportional to the force output. The subject had to maintain a level of force indicated by a pair of blue force target rectangles (Figure

For the BMI task, after reward cue presentation the subject controlled the virtual robotic arm's movement from the starting position to the target cylinder using M1 activity. During the reaching stage of the task, the cylinder was always located horizontally to the right from the starting position of the virtual hand. When within a threshold distance, the cylinder was grasped automatically. The animal then needed to move the cylinder to the target position using BMI control. The target location was determined pseudorandomly within the confines of the task plane. If the animal brought the cylinder to the target position, the trial was considered successful. The hand then automatically released the grasp on the cylinder, and the arm reset to the starting position.

This study was designed to investigate the effect of varying the level of reward on neural encoding, so there were two versions for both BMI and manual tasks, differing in the reward levels offered during the recording session. The amount of juice reward delivered was determined by the amount of time the juice reward straw (electronically controlled by a solenoid) was kept open. The solenoid was opened for 0.5 s for every successive level of reward (approximately 1 ml juice). For the first recording session, the reward levels were 0 (non-rewarding) or 1 (0.5 s of reward delivery). For the second session, the reward levels were 0 and 3 (1.5 s of reward delivery). Only successful trials were considered for further analysis for both BMI and the manual tasks.

During BMI trials, subjects controlled the virtual arm movement during the reaching and transporting stages using a ReFIT Kalman filter (Gilja et al.,

At time bin t, the state model of the Kalman filter is:

x_t = A·x_{t−1} + w_t

and the observation model is:

z_t = C·x_t + q_t

where x_t is the state vector representing positions and velocities at time bin t, z_t is the observation variable representing binned firing rates at time bin t, and w_t and q_t are Gaussian noise terms. Given these models, the Kalman filter estimates the current state x_t based on the previous state x_{t−1} and the current observation z_t. The ReFIT Kalman filter allows one to retrain and improve the model parameters.
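The predict/update cycle implied by the two models above can be sketched as follows. This is a minimal NumPy implementation of a standard Kalman step, not the authors' ReFIT code; all variable names are illustrative:

```python
import numpy as np

def kalman_step(x_prev, P_prev, z, A, C, W, Q):
    """One predict/update cycle of a discrete Kalman filter.

    x_prev : previous state estimate (positions and velocities)
    P_prev : previous state covariance estimate
    z      : current observation (binned firing rates)
    A, C   : state-transition and observation matrices
    W, Q   : process and observation noise covariances
    """
    # Predict step: propagate state and covariance through the state model
    x_pred = A @ x_prev
    P_pred = A @ P_prev @ A.T + W
    # Update step: correct the prediction with the current observation
    S = C @ P_pred @ C.T + Q
    K = P_pred @ C.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - C @ x_pred)
    P_new = (np.eye(len(x_prev)) - K @ C) @ P_pred
    return x_new, P_new
```

In ReFIT-style retraining, A, C, W, and Q would be refit from a first closed-loop session before this step is reused online.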

An assistive controller (Figure ) was used during BMI trials. The command velocities v_cx and v_cy that controlled the virtual arm's movement in the x and y dimensions were a linear combination (H in Figure ) of the decoded velocities v_x1 and v_y1 and "intended" velocities v_x and v_y, given by:

v_cx = P·v_x1 + (1 − P)·v_x
v_cy = P·v_y1 + (1 − P)·v_y

where P is the independence ratio, 0 ≤ P ≤ 1.

Assisted BMI control. The velocity command for the virtual hand is a linear combination (H) of the predicted velocities (output from the ReFIT Kalman decoder) and the intended velocities. The higher the value of the independence ratio P, the more the virtual hand's movement depended on the decoded output.

We define intended velocity as velocity in the direction of the target location, but with the speed of the decoded velocity. Thus, the difficulty of the task could be changed by adding more or less of the intended velocity through the independence ratio P: a higher value of P meant the virtual hand depended more heavily on the decoded output, making the task more difficult.
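The blending rule just described can be sketched as follows (an assumed form of the assistive controller H; the function name and arguments are illustrative):

```python
import numpy as np

def assisted_velocity(v_decoded, hand_pos, target_pos, P):
    """Blend decoded and intended velocities: v_c = P*v_decoded + (1-P)*v_intended.

    The intended velocity points at the target but keeps the decoded speed,
    as described in the text. P = 1 gives full BMI control; P = 0 gives
    fully assisted movement straight toward the target.
    """
    v_decoded = np.asarray(v_decoded, dtype=float)
    speed = np.linalg.norm(v_decoded)
    to_target = np.asarray(target_pos, dtype=float) - np.asarray(hand_pos, dtype=float)
    dist = np.linalg.norm(to_target)
    # Intended velocity: decoded speed, target direction
    v_intended = speed * to_target / dist if dist > 0 else np.zeros_like(to_target)
    return P * v_decoded + (1.0 - P) * v_intended
```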

Linear regression was performed to fit neural encoding models for both the manual task and the BMI task. For the BMI task, during the reaching and transporting scenes, the linear intended velocity-encoding model was given by:

f_it = b_0i + b_xi·v_xt + b_yi·v_yt     (1)

where f_it was unit i's firing rate at time bin t, and v_xt and v_yt were the BMI-controlled virtual hand's intended velocities at time bin t. Because the virtual hand moved at a constant speed s, this model can be rewritten as a cosine tuning model:

f_it = b_0i + A_i·cos(θ_t − θ_pi)     (2)

where θ_t indicates the intended movement direction, which was the direction toward the target at time bin t, A_i = s·√(b_xi² + b_yi²) is the tuning amplitude, and θ_pi was the preferred direction of the i-th unit, with

θ_pi = atan2(b_yi, b_xi)     (3)

therefore Equation (1) is equivalent to Equation (2).
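The equivalence of the two models can be checked numerically: fit the linear velocity model by least squares, then read off amplitude and preferred direction. This sketch uses synthetic data (assumed speed s = 1; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic unit: preferred direction 45 degrees, constant speed s = 1
theta = rng.uniform(0, 2 * np.pi, 500)        # intended movement directions
vx, vy = np.cos(theta), np.sin(theta)         # intended velocities (speed 1)
rates = 10 + 5 * np.cos(theta - np.pi / 4) + rng.normal(0, 0.5, 500)

# Fit the linear velocity model f = b0 + bx*vx + by*vy (Equation 1)
X = np.column_stack([np.ones_like(vx), vx, vy])
b0, bx, by = np.linalg.lstsq(X, rates, rcond=None)[0]

# With constant speed, this equals the cosine model f = b0 + A*cos(theta - theta_p)
amplitude = np.hypot(bx, by)                  # tuning amplitude A (here s = 1)
theta_p = np.arctan2(by, bx)                  # preferred direction
```

The recovered `amplitude` and `theta_p` should match the generating values (5 Hz and π/4) up to noise.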

Intended velocities and directions were used instead of real velocities and directions for fitting Equations (1) and (2) because it has been previously shown that a ReFIT Kalman decoder that makes use of intended kinematics information can correct model parameters and have better BMI performance (Gilja et al.,

Considering rewarding and non-rewarding trials separately, the following equations were written:

If a fitted amplitude was negative, its sign was flipped, and the corresponding θ_pi was adjusted by:

θ_pi → θ_pi + π

After this adjustment θ_{pi} would always represent the preferred direction of the unit, meaning that the unit had a maximum firing rate when the direction of movement was θ_{pi}.

For each unit, the shapes of the two tuning curves defined by Equations (2.1.2) and (2.2.2) could then be compared. The difference between the two preferred directions (Δθ_pi) and the normalized difference between the two amplitudes (ΔA_i) were calculated:

Δθ_pi was the angle between the two unit vectors pointing along the rewarding and non-rewarding preferred directions.

By this definition, Δθ_{pi} was in the range of [0, π].
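Computing the angle between two preferred directions via their unit vectors guarantees the [0, π] range automatically (a small sketch; the function name is illustrative):

```python
import numpy as np

def preferred_direction_difference(theta_r, theta_nr):
    """Angle between the unit vectors of two preferred directions.

    Using the dot product of unit vectors handles angle wrap-around and
    always returns a value in [0, pi], as in the text.
    """
    u = np.array([np.cos(theta_r), np.sin(theta_r)])
    v = np.array([np.cos(theta_nr), np.sin(theta_nr)])
    # Clip to guard against floating-point values slightly outside [-1, 1]
    return np.arccos(np.clip(u @ v, -1.0, 1.0))
```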

For the manual task, during the grasping, transporting, and releasing scenes the linear force encoding model was given by:

f_it = c_i + α_i·F_t     (4)

where f_it is the i-th unit's firing rate at time bin t and F_t is the grip force at time bin t. Only F_t > 0 data were used for further analysis. Similar to the previous analysis, Equation (4) for all units was fit with the Matlab function "regstats" to find all significant units (p < 0.05).

α_ir (rewarding) and α_inr (non-rewarding) for each unit were then compared using one-way analysis of covariance (ANCOVA, Matlab function "aoctool") to see if the two slopes had a significant difference (p < 0.05). For each unit, the slope difference Δα_i = α_ir − α_inr was calculated.
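A slope comparison of this kind can also be framed as an interaction regression, where the coefficient on the force-by-reward interaction term directly estimates α_ir − α_inr. This sketch is not the ANCOVA used in the paper but an equivalent least-squares view, on synthetic data with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
force = rng.uniform(0.1, 1.0, n)              # grip force (F_t > 0 only)
reward = rng.integers(0, 2, n)                # 0 = non-rewarding, 1 = rewarding

# Synthetic unit: slope 8 Hz per force unit on NR trials, 12 on R trials
rates = 5 + (8 + 4 * reward) * force + rng.normal(0, 0.5, n)

# Interaction regression: columns are intercept, force, reward, force*reward.
# The force*reward coefficient estimates the slope difference alpha_r - alpha_nr.
X = np.column_stack([np.ones(n), force, reward, force * reward])
coef = np.linalg.lstsq(X, rates, rcond=None)[0]
delta_alpha = coef[3]
```

The significance of `delta_alpha` would then be assessed with the standard error of that coefficient, which is what the ANCOVA F-test effectively does.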

From these neural encoding models of rewarding and non-rewarding trials, a combined linear decoding/prediction kinematics model was designed taking multiple reward levels into consideration (decoder 2). For the velocity decoder, the decoding accuracies between decoder 1, where all trials were considered together, and decoder 2, treating rewarding and non-rewarding trials separately, were compared using 5-fold cross validation. For velocity decoder 1, the linear decoding model was:

For velocity decoder 2, the linear decoding models were:

f_t was the n-dimensional firing rate vector at time bin t.

where v_xt and v_yt were intended velocities at time bin t.

where N_1 was the total number of time bins for velocity decoder 1, which was the total number of time bins for all trials, and e_vt1 was the velocity error at time bin t.

where N_2.1 was the total number of time bins for velocity decoder 2.1 (all rewarding trials), N_2.2 was the total number of time bins for velocity decoder 2.2 (all non-rewarding trials), e_vt2.1 was the velocity error at time bin t for decoder 2.1, and e_vt2.2 was the velocity error at time bin t for decoder 2.2. Since N_1 = N_2.1 + N_2.2, the percent error reduction P_ve was defined as:

P_ve = 100% × (Σ e_vt1 − Σ e_vt2.1 − Σ e_vt2.2) / Σ e_vt1

P_ve was used to compare the velocity decoding accuracy of decoder 1 and decoder 2.
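The percent error reduction can be computed as a one-line comparison of summed per-bin errors (an assumed form consistent with the definition above; the function name is illustrative):

```python
import numpy as np

def percent_error_reduction(err1, err2):
    """Percent error reduction of decoder 2 relative to decoder 1.

    err1, err2 : per-time-bin errors for decoder 1 and decoder 2
                 (for decoder 2, concatenate the rewarding and
                 non-rewarding sub-decoder errors).
    Positive values mean decoder 2 is more accurate.
    """
    e1, e2 = np.sum(err1), np.sum(err2)
    return 100.0 * (e1 - e2) / e1
```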

The decoding accuracies between force decoders 1 and 2 were also compared using 5-fold cross validation.

For force decoder 1, the linear decoding model was:

For force decoder 2, the linear decoding models were:

f_t was the n-dimensional firing rate vector at time bin t.


where N_1 represented all time bins for force decoder 1 over all trials, N_2.1 represented all time bins for force decoder 2.1 over all rewarding trials, N_2.2 represented all time bins for force decoder 2.2 over all non-rewarding trials, e_ft1 was the force error at time bin t for decoder 1, e_ft2.1 was the force error at time bin t for decoder 2.1, and e_ft2.2 was the force error at time bin t for decoder 2.2. Since N_1 = N_2.1 + N_2.2, the percent error reduction P_fe was defined as:

P_fe = 100% × (Σ e_ft1 − Σ e_ft2.1 − Σ e_ft2.2) / Σ e_ft1

Control groups were created to ensure that the error reduction of decoder 2 was not simply a consequence of decoder 2 having more parameters than decoder 1. We shuffled the reward label for each trial randomly and then ran decoder 2 again. This way there were still two separate linear decoders, but the separation was random. P_ve and P_fe for the randomly shuffled decoder 2 were calculated using the same method as above and denoted P_ves and P_fes. This shuffling process was performed 1,000 times for each data block. We then compared the P_ve (for the BMI task) or P_fe (for the manual task) with the corresponding P_ves or P_fes distribution for each block using a bootstrap hypothesis test. A p-value was computed as the fraction of P_ves samples that were greater than P_ve:

p = N_l / N_t

where the total sample number N_t = 1000, and N_l is the number of samples whose P_ves were greater than P_ve. Similarly, a p-value was computed as the fraction of P_fes that were greater than P_fe. This process was used to generate control groups for both the BMI and manual tasks.
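The shuffle control can be sketched as a short loop. Here `compute_p` is a hypothetical callback that refits decoder 2 with the given labels and returns its error reduction; only the counting logic is shown:

```python
import numpy as np

def shuffle_p_value(p_real, labels, compute_p, n_shuffles=1000, seed=0):
    """Shuffle-control p-value: fraction of shuffled error reductions
    exceeding the real one (p = N_l / N_t, as in the text).

    compute_p : callable mapping a label vector to an error-reduction value
                (hypothetical interface standing in for refitting decoder 2).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_larger = 0
    for _ in range(n_shuffles):
        shuffled = rng.permutation(labels)   # random reward-label assignment
        if compute_p(shuffled) > p_real:
            n_larger += 1
    return n_larger / n_shuffles
```

A small p-value indicates that the real reward labels explain the improvement better than a random split with the same number of parameters.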

Previous results showed that post-cue firing rates in M1 are separable between rewarding and non-rewarding trials (Marsh et al.,

For trial t, the input x_t was the 6n-dimensional feature vector of post-cue firing rates, where each element of x_t represented one unit's firing rate in one of the six bins in the above-mentioned time period. The output was the class label y_t, where y_t = 0 for non-rewarding and y_t = 1 for rewarding trials. For any testing data x_test, the 5 nearest neighbors (Euclidean distance) of x_test were found in the training data and named x_n1 ~ x_n5. We then obtained:

y_test = (1/5) Σ y_ni

where y_ni were the labels for x_ni. The trial was classified as rewarding if y_test > 0.5, and classified as non-rewarding if y_test < 0.5.
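The 5-nearest-neighbor averaging rule described above can be sketched directly (the function name is illustrative):

```python
import numpy as np

def knn_reward_classifier(X_train, y_train, x_test, k=5):
    """k-nearest-neighbor classification of reward level from post-cue rates.

    Returns the mean label of the k nearest training points (Euclidean
    distance); > 0.5 means rewarding, < 0.5 means non-rewarding.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Distances from the test vector to every training trial
    d = np.linalg.norm(X_train - np.asarray(x_test, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    return float(np.mean(y_train[nearest]))
```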

Not all M1 units had reward modulation, so we hypothesized that a better classifier could be built by using the subset of units which showed reward modulation. To choose this optimal subset, the best individual unit ensemble construction procedure (Leavitt et al.,
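One reading of the best-individual-unit ensemble construction is a greedy procedure: rank units by single-unit score, then grow the ensemble one unit at a time and keep the size that scores best. This sketch assumes a hypothetical evaluation callback `scores_fn` (e.g. cross-validated classifier accuracy on a unit subset):

```python
import numpy as np

def greedy_unit_selection(scores_fn, n_units, max_size):
    """Best-individual-unit ensemble construction (greedy sketch).

    scores_fn : callable mapping a list of unit indices to a score
                (hypothetical interface, e.g. cross-validated accuracy).
    Returns the best-scoring subset found and its score.
    """
    # Rank units by their individual scores, best first
    singles = [scores_fn([u]) for u in range(n_units)]
    order = np.argsort(singles)[::-1]
    best_subset = [order[0]]
    best_score = scores_fn(best_subset)
    # Grow the ensemble along the ranking; keep the best ensemble size
    for size in range(2, max_size + 1):
        subset = list(order[:size])
        s = scores_fn(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score
```

Note that this greedy construction does not evaluate all subsets; it trades optimality for a number of evaluations linear in the ensemble size.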

Combining the post-cue classifier (method 1.4.3) and the separate linear models (method 1.4.2), a two-stage decoder was designed to incorporate multiple reward levels. The first stage was the kNN classifier (method 1.4.3), which used post-cue firing rates at the beginning of each trial to determine the trial's reward level. The second stage then applied the reward-specific linear decoder selected by the first-stage classification. The second-stage decoder's output was velocity (using Equations 5.1 and 5.2) for the BMI task or force (using Equations 6.1 and 6.2) for the manual task. Using this two-stage decoder, the reward level could be determined directly from the population firing rates, and no additional information was needed for the linear decoders in the second stage.
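The gating logic of the two-stage decoder can be sketched as follows. The `classify`, `decoder_r`, and `decoder_nr` callables are hypothetical stand-ins for the kNN classifier and the two fitted linear decoders:

```python
import numpy as np

def two_stage_decode(rates_postcue, rates_trial, classify, decoder_r, decoder_nr):
    """Two-stage decoding sketch.

    Stage 1: classify the trial's reward level from post-cue firing rates.
    Stage 2: apply the matching reward-specific linear decoder to each
    time bin of the trial.

    classify   : callable, post-cue rates -> soft label (> 0.5 = rewarding)
    decoder_r  : callable, firing-rate vector -> output (rewarding model)
    decoder_nr : callable, firing-rate vector -> output (non-rewarding model)
    """
    rewarding = classify(rates_postcue) > 0.5
    decoder = decoder_r if rewarding else decoder_nr
    return np.array([decoder(z) for z in rates_trial])
```

A misclassification at stage 1 applies the wrong linear model for the entire trial, which is one reason the two-stage improvements reported below are smaller than the label-aware decoder 2 improvements.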

An offline test was run for the two-stage decoder, and its decoding accuracy was compared to that of a single-stage linear decoder. The errors e_vt and e_ft were calculated using the same equations as described previously. P_ve and P_fe for the two-stage decoder were defined as:

where e_vt3 was the velocity prediction error at time bin t for the two-stage decoder, e_vt1 was the velocity error at time bin t for decoder 1, e_ft3 was the force prediction error at time bin t for the two-stage decoder, and e_ft1 was the force error at time bin t for decoder 1. P_ve and P_fe were then used to test whether this two-stage decoder had improved accuracy over the single linear decoder. Table

Three proposed decoders.

The two NHP subjects in this work conducted two manual grip force sessions and two BMI sessions each. Only successfully completed trials were considered for analysis. For the manual grip force task, 64 M1 units were recorded in monkey P. There were 83 successful trials in session one with reward (R) [0,1] and 103 trials in session 2 with R [0,3]. In monkey S 77 M1 units were recorded. There were 152 successful trials for session one with R [0,1] and 130 trials in session two with R [0,3]. In addition, each NHP completed two BMI sessions. For monkey P, 102 M1 units were recorded over 64 successful trials in session one with R [0,1] and 72 trials in session 2 with R [0,3]. For monkey S, 87 M1 units were recorded over 66 successful trials for session one in BMI with R [0,1] and 145 trials in session two with R [0,3].

M1 units were used to decode movement information based on their tuning curves. Our hypothesis was that for a given unit, tuning curve parameters would change based on the presence or absence of cued reward. For the BMI task (Figure

Example tuning curves for rewarding and non-rewarding trials for the BMI task. The x-axis is intended direction (degrees), and the y-axis is firing rate (Hz). The left subplot shows an example unit's tuning curves and all data points used to fit them. The right subplots show six example units' tuning curves. All example units were recorded from monkey S where the reward levels were 0 and 3.

Statistical results for amplitude and preferred direction differences between rewarding (R) and non-rewarding (NR) trials in the BMI task. Monkey S _{p}. All units were recorded in blocks where reward levels were either zero or three.

In Figure _{p}) are shown in Figures

In the manual grip force task, M1 units encode force and value. Figure

Linear force tuning curves between rewarding and non-rewarding trials for significant sample units. Both tuning curve characteristics, slope and intercept, change between rewarding and non-rewarding trials. The x-axis is the force sensor output, and the y-axis is firing rate (Hz). The left subplot shows an example unit's tuning curves and all data points used to fit them. The right subplots show six example units' tuning curves. All example units were recorded in monkey S M1 from an experimental block with reward levels of 0 and 3.

Significant force tuning was noted in 77 units (100%) in monkey S and 33 units (52%) in monkey P. Of these units with significant force tuning, 30 units (39%) from monkey S and 12 units (19%) from monkey P were also significantly modulated by reward, having significantly different force tuning curve slopes between rewarding and non-rewarding trials (see Methods section). Figure

Statistical results for slope differences between rewarding and non-rewarding trials in the manual grip force task. The number of units with significant changes in force tuning curve slopes and Δα distribution are shown in

The differences in directional and force tuning curves between rewarding and non-rewarding trials (Figures ) led to improved decoding when reward was taken into account. The improvement in velocity decoding, P_ve, and force decoding, P_fe, was clear whether we used reward levels of zero and one, or zero and three (see Methods section). However, the percentage improvement was greater when the difference in trial value was greater; that is, there was a greater percentage improvement between zero and three levels of reward than between zero and one. The reward-modulated velocity decoder produced a 22–29% error reduction compared to the single linear decoder (Table , P_ve). The reward-modulated force decoder resulted in an error reduction of between 10 and 25% compared to the single force decoder (Table , P_fe). These results demonstrate improved decoding accuracy when rewarding and non-rewarding trials were treated separately, particularly when the value differences were greater. The distributions of decoding error differences between decoders 1 and 2 are plotted in Figure (e_vt1 − e_vt2) and Figure (e_ft1 − e_ft2). These figures demonstrate that most of the data show positive error reduction using decoder 2 for both velocity and force decoding (e_vt1 − e_vt2 > 0 or e_ft1 − e_ft2 > 0). Additionally, we compared the error reduction between decoder 2 and shuffled surrogate data (see Methods section): decoder 2 results are greater than those of the shuffled groups (P_ve and P_fe are larger than their corresponding P_ves and P_fes), and this difference is significant (p < 0.05, bootstrap hypothesis test). These results rule out the alternative explanation that the improved decoding performance was simply due to decoder 2 having more parameters than decoder 1.

Decoding accuracy was greater when multiple linear decoders corresponding to different reward levels were used (decoder 2) compared to a single linear decoder (decoder 1), for both velocity and force decoding.

| | Monkey S | Monkey P |
| P_ve, R: 0 vs. 1 | 22 | 23 |
| P_ves (shuffled control) | 1.3 | 2.3 |
| P_ve, R: 0 vs. 3 | 29 | 27 |
| P_ves (shuffled control) | −1.1 | 0.29 |
| P_fe, R: 0 vs. 1 | 12 | 10 |
| P_fes (shuffled control) | −0.35 | 1.3 |
| P_fe, R: 0 vs. 3 | 25 | 15 |
| P_fes (shuffled control) | −1.9 | −2.2 |

Values are percent error reduction relative to decoder 1.

Distributions of velocity error reductions between decoder 1 and 2 (_{vt1}−_{vt2}). The x-axis represents velocity error reductions and the y-axis represents probability. The first row represents the task where the reward levels were 0 or 1. The second row represents the task where the reward levels were 0 or 3. The first column represents data from monkey S, and the second column represents data from monkey P.

Distributions of force error reductions between decoder 1 and 2 (_{ft1}−_{ft2}). The x-axis represents force error reductions and the y-axis represents probability. The first row represents the task where the reward levels were 0 or 1. The second row represents the task where the reward levels were 0 or 3. The first column represents data from monkey S, and the second column represents data from monkey P.

Results from Table

Reward level classification mean accuracy and standard deviation across 10 Monte-Carlo repetitions using post-cue firing rates, classifying between rewarding and non-rewarding trials.

| | Monkey S | Monkey P |
| R: 0 vs. 1 | 70 ± 3.7% | 73 ± 1.9% |
| R: 0 vs. 3 | 72 ± 2.8% | 80 ± 2.4% |

Since the reward level could be classified from firing rates (Table ), the full two-stage decoder was tested offline. The velocity decoding improvement P_ve and force decoding improvement P_fe for the offline test of the two-stage decoder were computed using 5-fold cross-validation for both monkeys. Here, the reward level is decoded during the first stage from the neural data and is used to select the equations for the second stage. The percent improvements P_ve and P_fe are greater than 0 for all cases; therefore, the two-stage decoder is more accurate than the single linear decoder (see Table

Two-stage decoder improvement over decoder 1.

| | Monkey S | Monkey P |
| P_ve, R: 0 vs. 1 | 15 | 15 |
| P_ve, R: 0 vs. 3 | 7.9 | 7.2 |
| P_fe, R: 0 vs. 1 | 6.9 | 7.1 |
| P_fe, R: 0 vs. 3 | 10 | 7.3 |

Values are percent error reduction relative to decoder 1.

Distributions of velocity error reductions between decoder 1 and 3 (_{vt1}−_{vt3}). The x-axis represents velocity error reductions and the y-axis represents probability. The first row represents the task where the reward levels were 0 or 1. The second row represents the task where the reward levels were 0 or 3. The first column represents data from monkey S, and the second column represents data from monkey P.

Distributions of force error reductions between decoder 1 and 3 (_{ft1}−_{ft3}). The x-axis represents force error reductions and the y-axis represents probability. The first row represents the task where the reward levels were 0 or 1. The second row represents the task where the reward levels were 0 or 3. The first column represents data from monkey S, and the second column represents data from monkey P.

In the current work, NHP subjects controlled either grip force manually or reaching kinematics with a BMI. In both cases the subjects controlled a simulation of an anthropomorphic robotic arm or hand to reach, grasp, and transport target objects. Each trial was cued as to the reward value the subject would receive for making the correct movement. We found that M1 unit activity was modulated by cued reward level in both the manual grip force task and the BMI kinematics control task. In both tasks the neural tuning functions, force tuning and kinematic tuning, were significantly modulated by the level of expected reward in blocks of trials where cued reward was (0 vs. 1) or (0 vs. 3). Our results indicate that reward influences motor-related encoding in both manual and BMI tasks. When we explicitly took the influence of reward into consideration, our linear decoding models predicted with significantly higher accuracy. Having a more predictive linear decoding model is important because the most successful BMI systems use such linear models at their core, such as within a Kalman framework, or simply use the output of a linear model that decodes neural rates into movement parameters. A more accurate linear decoding model should lead to a more controllable BMI. In addition to the BMI control aspects of this work, the basic neuroscience is important, namely that expected reward modulates M1 motor-related tuning functions, which was previously shown for kinematics in a manual task (Ramakrishnan et al.,

It can be seen from Table that the improvements for velocity decoding (P_ve) and for grip force decoding (P_fe) from trials with reward levels (0 vs. 3) are larger than those from trials with reward levels (0 vs. 1). We have recently found that at least some M1 units code reward level in a linear manner, in agreement with the above results (Tarigoppula et al., ). P_ve and P_fe from Table

From the current and previously cited work, it is clear that neural firing rates in M1 are modulated by more than just movement-related information, and we have made use of reward information in a controlled environment to develop a more robust and accurate decoder. In this study, the virtual hand moved at a constant speed, and the animals controlled only the movement direction in the BMI task. The firing rate differences therefore cannot be due to differences in actual movement speed, although they could be related to intended movement speed. Previous work has already shown that there are reward-induced changes in directional tuning under manual control (Ramakrishnan et al.,

In a more realistic BMI scenario the number of reward levels may not be known, and there may be other unknown factors encoded in M1. If reward levels are discrete, or can be treated as such, then it is possible to use a strategy similar to the two-stage decoder, but with multiple levels for reward using clustering as the first stage. If rewards are continuous, one possible solution is to use a latent variable model to represent reward value (Wu et al.,

All animal work was approved by the SUNY Downstate IACUC.

YZ, JH, and JF conceived the research. YZ and JH conducted the experiments. YZ conducted the analysis. YZ, JH, JA, and JF wrote the paper.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank SUNY Downstate DMC for all their help with the NHPs.