Edited by: Desdemona Fricker, UMR 8119 Centre de Neurophysique, Physiologie, Pathologie, France

Reviewed by: Adrien Peyrache, McGill University, Canada; Federico Stella, Radboud University, Netherlands

^{†}These authors have contributed equally to this work

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Head direction (HD) cells, which fire action potentials whenever an animal points its head in a particular direction, are thought to subserve the animal’s sense of spatial orientation. HD cells are found prominently in several thalamo-cortical regions including anterior thalamic nuclei, postsubiculum, medial entorhinal cortex, parasubiculum, and the parietal cortex. While a number of methods in neural decoding have been developed to assess the dynamics of spatial signals within thalamo-cortical regions, studies conducting a quantitative comparison of machine learning and statistical model-based decoding methods on HD cell activity are currently lacking. Here, we compare statistical model-based and machine learning approaches by assessing decoding accuracy and evaluate variables that contribute to population coding across thalamo-cortical HD cells.

Animals can navigate by monitoring an online record of their spatial orientation in an environment and using this information to produce direct trajectories to hidden goals (

Several studies have reported that simultaneously recorded populations of HD cells tend to maintain coherence across their preferred firing directions (

Although a number of methods have been developed to assess the dynamics of thalamo-cortical HD signals (e.g.,

A central aim of the present study was to provide a comparison of the various methods used to assess the neural dynamics of spatial behavior. Specifically, we compare linear methods (Kalman Filter, Vector Reconstruction, Optimal Linear Estimator, and Wiener Filter) and non-linear methods (Generalized Linear Model and Wiener Cascade), and we contrast these statistical model-based methods with several machine learning methods. In addition, we present a quantitative assessment of population coding by HD cells within the ATN, PoS, PaS, MEC, and PC and explore variables contributing to decoding accuracy, such as the number of classified HD cells per dataset as well as the firing rate and tuning strength of HD cell populations.

Neuronal recordings analyzed in the present report were presented in previous work (

For data collected in PC, 4 male Fisher-Brown Norway hybrid rats were used. Rats were 5–10 months of age at initial surgery and were implanted with an 18-tetrode electrode array targeting the PC (for details see

For all datasets, electrical signals were pre-amplified on a headstage (HS18 or HS27) and recorded using a Digital Lynx Data Acquisition System (Neuralynx, Bozeman, MT); thresholded spike waveforms (filtered 0.6–6 kHz, digitized at 32 kHz; thresholds adjusted prior to each session) and timestamps were collected for each session. Rat position and HD were tracked using either red and green LEDs attached to the animal’s headstage (secured ∼8 cm apart) or colored domes created by covering halved Styrofoam balls in reflective tape. A video tracking system provided x-y coordinates of each LED or dome position at a sampling rate of 30–60 Hz as interleaved video. For one animal included in the PC datasets (rat 4), data were collected at 30 Hz and co-registered with spikes and stimuli.

For PoS, PaS, MEC, and ATN datasets, spike sorting was conducted using SpikeSort3D (Neuralynx, Bozeman, MT). First, waveform characteristics from each tetrode/stereotrode were plotted as scatterplots from one of the four tetrode wires and signal waveform characteristics (amplitude, peak and valley) were used for cell isolation. Individual units formed clusters of points in these plots and the boundaries were identified and manually “cut.” For PC datasets, spike data were automatically overclustered using KlustaKwik^{1} then manually adjusted using a modified version of MClust (A.D. Redish).

The HD of the animal was determined by the relative position of the red and green LEDs. The amount of time and the number of spikes in each HD were sorted into sixty 6° bins, and the firing rate for each 6° bin was determined by dividing the number of spikes by the amount of time. A firing rate by HD plot was constructed for each cell in the dataset and the directionality of each cell was quantified using a number of measures. First, we computed the mean vector length (Rayleigh r) for each cell. The mean vector length ranges between 0 and 1, with higher values indicating that spike occurrence is clustered around a particular direction. Second, we computed a stability score for each cell. Stability was calculated by dividing the recording session into four equal time bins, cross-correlating the 60 directional firing bins across each pair of time bins, and averaging these values (Directional Stability = (Q1:Q2 + Q1:Q3 + Q1:Q4 + Q2:Q3 + Q2:Q4 + Q3:Q4)/6). Because the mean vector length is susceptible to reporting high values when cells display low firing rates, we used a dual criterion for classifying neural activity as an HD cell: a cell was classified as an HD cell if its mean vector length and directional stability scores both exceeded the 95th percentile chance level generated by shuffling the neural data (see
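As an illustration, both classification measures can be computed in a few lines. The sketch below is a hypothetical numpy reimplementation (function names are ours, not from the original Matlab pipeline), assuming a 60-bin tuning curve as described above.

```python
import numpy as np

def mean_vector_length(rates_by_bin, bin_centers_deg):
    """Rayleigh mean vector length (r) of a 60-bin (6 deg) tuning curve.

    Returns a value in [0, 1]; higher values mean spiking clusters
    around a single preferred direction.
    """
    theta = np.deg2rad(bin_centers_deg)
    w = np.asarray(rates_by_bin, dtype=float)
    c = np.sum(w * np.cos(theta))
    s = np.sum(w * np.sin(theta))
    return np.hypot(c, s) / np.sum(w)

def directional_stability(quarter_tuning_curves):
    """Average pairwise correlation of the four quarter-session tuning
    curves: (Q1:Q2 + Q1:Q3 + Q1:Q4 + Q2:Q3 + Q2:Q4 + Q3:Q4) / 6."""
    q = np.asarray(quarter_tuning_curves, dtype=float)  # shape (4, 60)
    corrs = [np.corrcoef(q[i], q[j])[0, 1]
             for i in range(4) for j in range(i + 1, 4)]
    return float(np.mean(corrs))
```

A cell would then be classified as an HD cell only if both values exceed their shuffle-derived 95th percentile criteria.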

Cells not sufficiently active during maze sessions (< 250 spikes/session; session = ∼50 min) were excluded from all analyses (39 cells were excluded, leaving 339 putative pyramidal cells). Data from video frames in which HD tracking was lost, or from segments in which the rat was still for relatively long (60 s) periods (calculated from smoothed positioning data), were excluded. Occupancy data were binned per 6° of HD and converted to firing rate (spikes/s). Rayleigh statistics were calculated using a combination of custom Matlab scripts and the circular statistics toolbox (

Twelve decoding methods were applied. Six are statistical model-based methods: Kalman Filter, Generalized Linear Model, Vector Reconstruction, Optimal Linear Estimator, Wiener Filter, and Wiener Cascade. The remaining six are machine learning methods: Support Vector Regression, XGBoost, Feedforward Neural Network, Recurrent Neural Network, Gated Recurrent Unit, and Long Short-Term Memory. The Python code for the Wiener Filter, Wiener Cascade, and the machine learning methods is from the freely available Neural Decoding package^{2}. Head direction data were transformed using directional cosines, then fed into the decoding algorithm, then transformed back to polar coordinates (
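The directional-cosine round trip can be sketched as follows (a minimal numpy sketch; helper names are ours). Mapping angles to (cos, sin) pairs avoids the 0°/360° discontinuity during regression, and arctan2 maps predictions back to polar coordinates.

```python
import numpy as np

def hd_to_cosines(theta_deg):
    """Map HD angles (degrees) to 2-D points on the unit circle so that
    regression targets are continuous across the 0/360 boundary."""
    t = np.deg2rad(theta_deg)
    return np.column_stack([np.cos(t), np.sin(t)])

def cosines_to_hd(xy):
    """Map decoded (cos, sin) pairs back to angles in [0, 360)."""
    return np.rad2deg(np.arctan2(xy[:, 1], xy[:, 0])) % 360.0
```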

The Kalman Filter model (

The relationship between these variables is shown in

Graphic representation of the Kalman Filter and Generalized Linear Model: The main model is a hidden Markov chain structure. HDs follow a Markov chain and spike counts at the current time bin are independent from the counts from previous time bins.

The model assumes that the HD follows a first-order auto-regression structure with additive Gaussian noise. The model is given as:

x_t = A x_{t−1} + w_t, w_t ∼ N(0, Q)

y_t = H x_t + v_t, v_t ∼ N(0, R)

where x_t is the (cosine-transformed) HD state at time bin t, y_t is the vector of spike counts, A and H are the state-transition and observation matrices, and Q and R are the corresponding Gaussian noise covariances.

For parameter fitting, the classical approach, the maximum likelihood method (MLE), is used to obtain the values of the model parameters.
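A minimal sketch of such a filter on the cosine-transformed state is shown below. For simplicity it fits A and H by least squares and Q and R from residual covariances, a common stand-in for the MLE fitting described here; the function names and this fitting shortcut are ours.

```python
import numpy as np

def fit_kf(X, Y):
    """X: (T, 2) cosine-transformed HD states; Y: (T, C) spike counts.
    State model:  x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)
    Observation:  y_t = H x_t + v_t,     v_t ~ N(0, R)"""
    A = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
    Q = np.cov(X[1:].T - A @ X[:-1].T)          # state-noise covariance
    H = np.linalg.lstsq(X, Y, rcond=None)[0].T
    R = np.cov(Y.T - H @ X.T)                   # observation-noise covariance
    return A, Q, H, R

def kf_decode(Y, A, Q, H, R):
    """Standard Kalman predict/update recursion over spike counts Y."""
    T = Y.shape[0]
    x, P = np.zeros(2), np.eye(2)
    out = np.zeros((T, 2))
    for t in range(T):
        x, P = A @ x, A @ P @ A.T + Q                      # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)       # Kalman gain
        x = x + K @ (Y[t] - H @ x)                         # update
        P = (np.eye(2) - K @ H) @ P
        out[t] = x
    return out
```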

Similar to the Kalman Filter model, the generalized linear model is also a hidden Markov Chain model with HD (trigonometric) as the states and spike counts as the observations (

where y_{t,c} is the spike count for brain cell c at time bin t.

To fit the model parameters, the likelihood of the spike counts y_{t,c} given the HD states is maximized over the training data.

Since the training dataset contains the HD and spike counts at each observed time bin, we can make an estimation of the preferred direction for each cell (

where p_c is the preferred direction vector for cell c and θ_est(t) is the estimated HD, obtained as the direction of the firing-rate-weighted sum of the cells’ preferred direction vectors.

To achieve an accurate reconstruction with this method, there are several critical criteria for the training dataset. First, each cell’s tuning curve relating firing rate to HD should have a sufficiently strong unimodal peak, or else the estimation of preferred directions will be poor. Second, the preferred direction vectors must cover the full range of directions from 0° to 360°. Without input data covering some HDs, some predicted HDs may never be achieved (see

Illustration of the coverage of the full range of HDs: If all the preferred direction vectors cover only half of the possible HDs, then the vectors in the other half circle cannot be achieved by a non-negative weighted linear combination of these vectors, so the predicted angles will not cover all values between 0° and 360°.
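The vector reconstruction scheme can be sketched as follows (an illustrative numpy version; the original estimator may differ in detail): each cell’s preferred direction is estimated from the training data as the direction of its rate-weighted resultant vector, and HD is then decoded as the direction of the spike-count-weighted sum of those preferred direction vectors.

```python
import numpy as np

def fit_preferred_directions(theta_deg, counts):
    """Estimate each cell's preferred direction (radians) as the direction
    of its spike-count-weighted resultant vector over the training data.
    counts: (T, C) spike counts; theta_deg: (T,) head directions."""
    t = np.deg2rad(theta_deg)
    c = counts.T @ np.cos(t)
    s = counts.T @ np.sin(t)
    return np.arctan2(s, c)

def population_vector_decode(counts, pref_rad):
    """Decode HD as the direction of the count-weighted sum of the
    cells' preferred direction unit vectors."""
    x = counts @ np.cos(pref_rad)
    y = counts @ np.sin(pref_rad)
    return np.rad2deg(np.arctan2(y, x)) % 360.0
```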

The Optimal Linear Estimator (OLE) method (

where f_c(t) is the firing rate of cell c at time bin t and w_c is the optimal vector assigned to cell c, so that the estimated (cosine-transformed) HD is x̂(t) = Σ_c f_c(t) w_c.

The solution for the optimal vectors is the least-squares solution minimizing the squared error between x̂(t) and the true state x(t) over the training data:

w = Q^{−1} L

where Q_{cd} = ⟨f_c(t) f_d(t)⟩ is the correlation matrix of the firing rates and L_c = ⟨f_c(t) x(t)⟩ is the correlation between each cell’s firing rate and the state.

With fitted optimal vectors, the HD prediction for each time bin in the test data is obtained by converting x̂(t) back to an angle.

Like the Vector Reconstruction method, the OLE method has a prerequisite on the training dataset: the non-negative linear combination of
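Under the cosine transformation, the OLE reduces to an ordinary least-squares fit; the sketch below is an illustrative numpy version (names are ours), not the exact implementation used here.

```python
import numpy as np

def fit_ole(F, X):
    """Fit OLE weights W minimizing ||F W - X||^2.
    F: (T, C) firing rates; X: (T, 2) cosine-transformed HD.
    Returns W of shape (C, 2): one optimal 2-D vector per cell."""
    W, *_ = np.linalg.lstsq(F, X, rcond=None)
    return W

def ole_decode(F, W):
    """Apply the fitted weights and convert back to an angle in [0, 360)."""
    xy = F @ W
    return np.rad2deg(np.arctan2(xy[:, 1], xy[:, 0])) % 360.0
```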

The Wiener Filter model (

The Wiener Cascade model (

To conduct HD decoding, we also used the following 6 machine learning methods. The input-output configuration is the same for each method. In these methods, together with the Wiener Cascade, there exist some free parameters that are not tuned during training; instead, they are set before the optimization process. These values are called “hyper-parameters.” In this paper, hyper-parameter selection was based on Bayesian Optimization (^{3}), which searched over a range of values for the hyper-parameters and chose the optimal one. Further detail is provided in

The support vector regression (

XGBoost (

The feedforward neural network (

The structure of feedforward neural network and Recurrent Neural Network (RNN):

The Recurrent Neural Network is the basic neural network structure designed for time series data (

The gated recurrent unit (

The structure of Gated Recurrent Unit and Long Short-Term Memory units: In the GRU, z_t is the “update gate” and determines whether the candidate update replaces the hidden value h_t. r_t is the “reset gate” and determines whether the previous hidden value (also the output value) h_{t–1} will be kept in the memory. The effects of the two gates are achieved by sigmoid activation functions which can be learned during training. The LSTM unit has an internal cell state c_t and more gates compared to the GRU. Each gate can be seen in the plot: the first (forget) gate determines whether the previous cell state c_{t–1} will be used to calculate the current output and kept in the memory. The second (input) gate controls the update of c_t.

In this paper, Gated Recurrent Units (GRUs) were used in place of the hidden units in the Recurrent Neural Network (RNN) component. The GRU component was a chain structure of several gated units, and it was applied to predict HD in one time bin. The model also applied the dropout method to avoid overfitting and used RMSprop as the optimization algorithm. As with the RNN method, the dimension of the gated units, the dropout proportion, and the number of epochs were searched by Bayesian Optimization. An implementation difference was that the activation function between the output from the recurrent part and the input to the feedforward layer was the hyperbolic tangent (tanh) instead of ReLU, since the former is the standard choice for Gated Recurrent Units.
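The gate arithmetic of a single GRU step can be made concrete as follows (an illustrative numpy forward pass with our own parameter naming; the actual models were trained with standard deep learning tooling):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, P):
    """One GRU step. P holds input weights W*, recurrent weights U*,
    and biases b* for the update (z), reset (r), and candidate (h) paths."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h_prev + P["bz"])       # update gate
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h_prev + P["br"])       # reset gate
    h_cand = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h_prev) + P["bh"])
    return (1.0 - z) * h_prev + z * h_cand                      # gated mix
```

When the update gate z is near 0 the previous hidden state is carried through unchanged, which is how the unit retains information across time bins.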

The Gated Recurrent Unit and Long Short-Term Memory (

Compared to the Gated Recurrent Unit, the Long Short-Term Memory unit has a more complex structure with more parameters. In the present paper, the Long Short-Term Memory simply replaced the Gated Recurrent Unit with the same settings: optimization algorithm = RMSprop; activation non-linear function = tanh; the dimension of the LSTM components, the dropout proportion, and the number of epochs were searched by Bayesian Optimization.

Data were analyzed using two-way repeated measures ANOVAs (e.g., Decoding Method × Brain Region). In order to avoid large numbers of pairwise post-tests, we determined which factor levels were contributing to significant ANOVA results by removing levels one at a time: we started with the level whose mean was furthest from the grand mean, removed it, and reran the ANOVA, repeating this process until the ANOVA was no longer significant. We also explored factors that may contribute to variability in decoding accuracy, including the number of classified HD cells per dataset, cell firing rate, HD tuning strength, and angular head velocity (described in section Factors Influencing Variability Across Decoding Method, Brain Region, and Datasets). Linear regression was used to compare decoding accuracy to each of these factors. For all statistical analyses,
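The stepwise removal procedure can be sketched as follows. This is a simplified illustration: it uses a one-way ANOVA from scipy as a stand-in for the repeated measures ANOVA, and the function name and stopping details are ours.

```python
import numpy as np
from scipy import stats

def stepwise_factor_removal(groups, alpha=0.05):
    """Repeatedly run a one-way ANOVA over the groups; while significant,
    drop the group whose mean is furthest from the mean of group means.
    groups: dict mapping label -> 1-D array of observations.
    Returns (removed_labels, final_pvalue)."""
    groups = dict(groups)
    removed = []
    while len(groups) > 2:
        p = stats.f_oneway(*groups.values()).pvalue
        if p >= alpha:
            break
        means = {k: np.mean(v) for k, v in groups.items()}
        grand = np.mean(list(means.values()))
        worst = max(means, key=lambda k: abs(means[k] - grand))
        removed.append(worst)
        del groups[worst]
    return removed, stats.f_oneway(*groups.values()).pvalue
```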

As described in section Neural Decoding Methods, cross-validation was applied. There are two cross-validation approaches: UT and LT. After running the code for all datasets, the results of the two cases are consistent. For brevity, only the results for UT are displayed. The output for LT can be seen in

Some decoding approaches use a likelihood model (i.e., firing rate given HD) in a Bayesian framework to represent individual single units. Two of the twelve methods we used, the Kalman Filter and Generalized Linear Model, use likelihood representations. An examination of the likelihood representations is useful for understanding successful (and unsuccessful) decoding of HD. Thus, to compare the approaches, we first produced tuning curve plots (i.e., polar plots) showing the relationship between the cell’s firing rate and the animal’s HD (

The true-vs.-estimated tuning plots in 6-degree bins for one HD cell in each brain region: The polar plots show firing rates vs. HD. The black curves are the true tuning functions, smoothed by a Gaussian kernel function. The red curves are the estimated functions using the Kalman Filter (KF) method and the blue curves are the estimated functions using the Generalized Linear Model (GLM) method.

After training the model, we decoded the HD for the validating data and contrasted the decoding result with the true values. As a first-step, we visually compared the true and reconstructed HD as a function of time (

The true-vs.-predicted head angle plotted as a function of time for a representative ATN dataset for each of the 12 decoding methods: The black curves are the true curves and the red curves are the predicted curves. Test data are shown. Predicted curves are constructed using a model generated from a separate training segment of the data. The method name and decoding accuracy measured as median absolute error (MAE) are shown in the title of each plot (average absolute error, AAE, is also shown). KF, Kalman Filter; GLM, Generalized Linear Model; VR, Vector Reconstruction; OLE, Optimal Linear Estimator; WF, Wiener Filter; WC, Wiener Cascade. The remaining six are machine learning methods: SVR, Support Vector Regression; XGB, XGBoost; FFNN, Feedforward Neural Network; RNN, Recurrent Neural Network; GRU, Gated Recurrent Unit; LSTM, Long Short-Term Memory.

Next, we quantified decoding accuracy by calculating the median absolute error (

where ϕ(θ_est(t), θ(t)) denotes the absolute angular difference between the estimated and true HD at time bin t, wrapped to lie within [0°, 180°].

For comparison, we also computed the average absolute error (
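Concretely, the angular error must be wrapped so that, for example, 359° and 1° are 2° apart; a minimal numpy sketch (function names are ours):

```python
import numpy as np

def angular_error_deg(est_deg, true_deg):
    """Absolute angular difference, wrapped to [0, 180] degrees."""
    d = np.abs(np.asarray(est_deg) - np.asarray(true_deg)) % 360.0
    return np.minimum(d, 360.0 - d)

def median_absolute_error(est_deg, true_deg):
    """Median absolute angular error (MAE) across time bins."""
    return float(np.median(angular_error_deg(est_deg, true_deg)))
```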

The median absolute error is shown for each brain region, each dataset, and each decoding method. Datasets for each brain region are sorted from lowest to highest median absolute error (i.e., from best to worst decoding accuracy). Note that median absolute error varies considerably within regions and on average increases from ATN to parahippocampal and PC regions. KF, Kalman Filter; GLM, Generalized Linear Model; VR, Vector Reconstruction; OLE, Optimal Linear Estimator; WF, Wiener Filter; WC, Wiener Cascade; SVR, Support Vector Regression; XGB, XGBoost; FFNN, Feedforward Neural Network; RNN, Recurrent Neural Network; GRU, Gated Recurrent Unit; LSTM, Long Short-Term Memory.

Next, we aimed to quantify the variance observed across decoding methods. The Optimal Linear Estimator method and Vector Reconstruction method appear to have large error relative to the other 10 methods (across all 12 methods, F(11, 312) = 7.27; after removing these two methods, F(9, 260) = 1.00,

Mean ± 95% Confidence-Interval (CI) Median Absolute Error (MAE) for each decoding method. Data from different brain regions and datasets were pooled. KF, Kalman Filter; GLM, Generalized Linear Model; VR, Vector Reconstruction; OLE, Optimal Linear Estimator; WF, Wiener Filter; WC, Wiener Cascade; SVR, Support Vector Regression; XGB, XGBoost; FFNN, Feedforward Neural Network; RNN, Recurrent Neural Network; GRU, Gated Recurrent Unit; LSTM, Long Short-Term Memory.

In addition to variability across decoding methods, we observed variance in MAE across brain regions (Fs(4, 22) > 2.82; F(4, 22) = 1.27; Fs(3, 18) < 3.16; Fs(2, 12) < 3.89,

Decoding accuracy varies across brain regions. The average Median Absolute Error (MAE) is shown for each area and each decoding method. Shading and asterisks indicate statistical significance.

We also investigated whether our findings above could be influenced by variability in the animal’s movement characteristics. We first measured whether there were significant biases in the animal’s trajectory by determining the dwell time in each HD. Plotting the data in this way demonstrates that good coverage of the full range of HDs occurred for all datasets from each brain region (Fs(4, 22) > 6.814, Fs(3, 16) < 2.462,

It should be noted that there are at least three additional variables that could influence our findings above. First, the density of HD cells varies considerably across brain regions (reviewed in

Next, we set out to explore factors that could underlie the variability we observed across brain regions and datasets (

As noted above, the percentage of cells classified as HD cells varies among the different brain regions (t(22) = 4.77,

Scatterplots of median absolute error vs. number of cells for all 12 methods. The dashed line is the fitted linear regression. The correlation coefficient (r) and the corresponding p-value are shown for each method; asterisks denote significance levels (^{∗∗∗}, ^{∗∗}, ^{∗}).

Given that the number of cells influences decoding accuracy, we next investigated whether the regional differences reported in the previous section can be explained by the number of cells per datasets. To address this question, we repeated our decoding analyses on datasets composed of a random subsample of at least 3 cells. For datasets with 6 or more cells, we split the datasets in half, each composed of 3 randomly selected cells (without repeats). Due to the higher computational demands of machine learning approaches, and the similarity in results between model-based and machine learning methods (see Fs(4, 46) > 2.57, Fs(3, 37) < 2.86, Fs(2, 28) < 3.34, Fs(2, 27) < 3.34,
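The subsampling step can be sketched as follows (an illustrative helper with our own naming), assuming a (time bins × cells) spike-count matrix:

```python
import numpy as np

def subsample_cells(counts, rng, k=3):
    """counts: (T, C) spike-count matrix. For datasets with >= 2k cells,
    return two non-overlapping random subsets of k cells each (no cell
    repeats across subsets); for k..2k-1 cells, return one subset."""
    n = counts.shape[1]
    perm = rng.permutation(n)
    if n >= 2 * k:
        return [counts[:, perm[:k]], counts[:, perm[k:2 * k]]]
    if n >= k:
        return [counts[:, perm[:k]]]
    return []
```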

We additionally examined the contribution of the directional specificity of HD cell tuning to decoding accuracy. We first removed the influence of the cell’s firing rate by normalizing each cell’s tuning curve relative to the directional bin with the peak firing rate. We then calculated the standard deviation of the standardized firing rate by HD tuning function as a proxy for tuning strength (F(2, 33) = 36.83, F(1, 22) = 10.74,

Tuning influences decoding accuracy. Top Row: examples illustrating the relationship between scaled standard deviation (scaled STD) and tuning for single cells from ATN. Asterisks denote significance levels (^{∗∗∗}, ^{∗∗}).

HD cell firing rates can vary between different HD cells (

Thus, to evaluate the relationship between decoding accuracy and firing rate, we created a measure that we refer to as the cell’s response rate, which is the proportion of video frames in which there was HD cell activity (i.e., cell spikes). As noted above, the number of cells per dataset can influence measures of MAE (F(3, 44) = 30.93, F(1, 22) = 7.73,
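The response rate measure can be sketched as follows (an illustrative helper; naming is ours):

```python
import numpy as np

def response_rate(counts):
    """counts: (T, C) spikes per video frame per cell. Returns the per-cell
    proportion of frames containing at least one spike, plus the dataset
    average used to relate response rate to decoding accuracy."""
    active = np.asarray(counts) > 0
    per_cell = active.mean(axis=0)
    return per_cell, float(per_cell.mean())
```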

Example histograms of spike counts

Time cost is an important indicator of the decoding method’s performance.

The average training time and testing time is shown for each decoding method grouped by category, model-based methods (above) and machine learning methods (below).

Method | Training time | Testing time
VR | 0.00 | 2.65
OLE | 0.00 | 2.70
KF | 0.12 | 2.65
WF | 0.29 | 0.00
GLM | 2.88 | 2.63
WC | 31.58 | 0.00
SVR | 126.88 | 0.46
XGB | 339.90 | 0.02
FFNN | 3213.17 | 1.94
RNN | 5274.18 | 2.23
GRU | 5548.07 | 4.15
LSTM | 6337.79 | 4.50

The full table can be seen in the Supplementary Material (t(5) = 3.13,

The general aim of the present study was to compare statistical model-based and machine learning approaches for decoding an animal’s directional orientation from populations of HD cells. Overall, 12 computational models were evaluated using HD cell recordings from 27 datasets and from across 5 brain regions (PC, MEC, PaS, PoS, and ATN). Performance was similar for most methods (10 of the 12), but significantly poorer for the Vector Reconstruction and Optimal Linear Estimator methods. The generality of this result is supported by the fact that the findings were consistent across datasets from different laboratories (i.e., PC vs. other datasets), across different HD cell criteria (i.e., PC vs. other datasets), and across different behavioral testing procedures and recording environments (i.e., PC vs. ATN vs. all other datasets). For the Wiener Filter, Wiener Cascade, and the machine learning methods, the prediction performance was highly accurate. One interesting result is that the Recurrent Neural Network model has a much simpler structure (i.e., fewer parameters) than the Gated Recurrent Unit and Long Short-Term Memory models, yet the three methods show little difference in decoding performance. This result suggests that the more complex models may be overfitting the data, while the simpler Recurrent Neural Network model may capture the critical parameters.

Both the Kalman Filter and Generalized Linear Model are based on the hidden Markov chain framework. They make use of a Bayesian framework, modeling the distribution of firing rate given HD. These two approaches model the activity of single cells as a function of HD. As a result, we can obtain the function curve generated by the model for spike count with HD as the input, which can be used as an estimate of the count-angle curve and the tuning curve. As shown in

We found significantly poorer performance by the Vector Reconstruction and Optimal Linear Estimator methods. There are several possible reasons for this inferior performance. For these methods, there are two critical criteria for the training dataset. First, the tuning curve relating HD and firing rate should have a sufficiently strong unimodal peak, or else the estimation of preferred directions will be poor. This limitation may further explain poor decoding performance, particularly for cortical datasets, as classification of HD cells could include cells that are stable yet have low mean vector length. Second, the preferred direction vectors must cover the full range of directions from 0° to 360°. Without input data covering some HDs, some predicted HDs may never be achieved (see

In general, the machine-learning methods displayed similar decoding accuracy to 4 of the model-based methods (Kalman Filter, Generalized Linear Model, Wiener Filter, Wiener Cascade). This indicates that the relationship between neural firing and HD is well captured by these 4 methods, which do not differ from more complicated networks that may over-fit the data. While it is possible that machine-learning methods would provide a benefit when dealing with larger scale recordings and high dimensional inputs, a large advantage of the model-based methods is their efficiency and robustness. All parameters can be efficiently estimated, and the linear methods even have closed-form estimates. Related to these points, we also compared decoding accuracy with the elapsed time of training and testing decoding methods (time cost). All methods, with the exception of Vector Reconstruction and Optimal Linear Estimator, did not significantly differ with respect to MAE.

We also contrasted the accuracy of HD cell decoding between 5 brain regions, including ATN, PoS, PaS, MEC, and PC. From these comparisons, we found that decoding performance varied considerably across datasets and brain regions (see

Greater decoding accuracy by ATN populations supports the hypothesis that the ATN has a pivotal role in processing the HD cell signal (

We considered several variables that may have contributed to the observed regional differences in decoding accuracy. These included the population firing rate (response rate), tuning strength, and cell density. Our analyses found that measures of tuning strength and cell density were significantly related to MAE.

In summary, the present study suggests three general conclusions regarding the use of statistical model-based and machine learning approaches for neural decoding of HD: first, our comparison of different computational models suggests limitations in decoding accuracy by Vector Reconstruction and Optimal Linear Estimator methods. Second, we found that decoding accuracy is variable across the HD cell system, with superior decoding in ATN compared to parahippocampal and cortical regions. Last, we found that decoding accuracy can be influenced by variables such as tuning strength, the response rate, and the recording density of HD cells. Thus, the present study provides a framework for the use of these computational approaches for future investigation of the neural basis of spatial orientation.

The datasets generated for this study are available on request to the corresponding authors.

The animal study was reviewed and approved by Dartmouth College IACUC and University of Lethbridge Animal Welfare Committee.

All authors contributed to the preparation of the manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We thank Dr. Bruce L. McNaughton for assistance with the parietal cortex data and acknowledge funding to Dr. McNaughton that made this possible: NSERC Grant RGPIN-2017-03857.

The Supplementary Material for this article can be found online at: