Edited by: David Hansel, University of Paris, France

Reviewed by: Maurizio Mattia, Istituto Superiore di Sanità, Italy; Masami Tatsuno, University of Lethbridge, Canada

*Correspondence: Max Berniker, Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, 842 West Taylor Street, 2023 ERF, Chicago, IL, 60607, USA

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The motor system generates time-varying commands to move our limbs and body. Conventional descriptions of motor control and learning rely on dynamical representations of our body's state (forward and inverse models), and control policies that must be integrated forward to generate feedforward time-varying commands; thus these are representations across space, but not time. Here we examine a new approach that directly represents both time-varying commands and the resulting state trajectories with a function; a representation across space and time. Since the output of this function includes time, it necessarily requires more parameters than a typical dynamical model. To avoid the problems of local minima these extra parameters introduce, we exploit recent advances in machine learning to build our function using a stacked autoencoder, or deep network. With initial and target states as inputs, this deep network can be trained to output an accurate temporal profile of the optimal command and state trajectory for a point-to-point reach of a non-linear limb model, even when influenced by varying force fields. In a manner that mirrors motor babble, the network can also teach itself to learn through trial and error. Lastly, we demonstrate how this network can learn to optimize a cost objective. This functional approach to motor control is a sharp departure from the standard dynamical approach, and may offer new insights into the neural implementation of motor control.

The standard framework for describing the motor system is dynamical. Forward and inverse models along with a control policy represent the motor system at a specific instant in time; thus they are representations across space, but not time. To generate feedforward commands and estimated state trajectories, these representations must be integrated forward in time (Figure

The solution to an optimal control problem, an extremal trajectory, can be uniquely specified by the same parameters that define the control problem; e.g., the objective function, initial conditions, terminal penalties, etc… These optimal trajectories can therefore be represented with a function of time (Figure

Recent advances in deep neural networks have appealing characteristics for approximating this optimal trajectory function. Deep architectures excel at finding hidden, low-dimensional features by discovering statistical regularities in high-dimensional training data, and can do so in a relatively unsupervised fashion (Hinton and Salakhutdinov,

Here we demonstrate that a deep network can accurately approximate optimal trajectory functions for motor control. This network maps a behavior's defining variables, such as the initial and desired final state, to the optimal outputs, a complete vectorized profile of the state and command, and thus fills the roles of both a forward model and a controller. Using point-to-point reaches for data, the network is trained to simultaneously represent optimal trajectories for freely moving reaches, and those made in either a clockwise or counterclockwise curl field. We then show that this network architecture can be bootstrapped, teaching itself in a manner analogous to motor babble (Meltzoff and Moore,

Here we reframe the conventional optimal control problem from one of solving a set of dynamical constraints using rate equations, to one of approximating a function. Consider a dynamical system we wish to control defined by the following rate equations,

$\dot{x} = f(x, u)$ (1)

where $x \in \mathbb{R}^{n}$ and $u \in \mathbb{R}^{r}$ are the state and command. The cost function to be minimized is,

where _{d}

such that there is some corresponding state and command trajectory {^{o}^{o}_{o}

Usually these optimal trajectories are characterized by necessary and sufficient conditions for optimality, e.g., the Euler-Lagrange equations, the Hamilton-Jacobi-Bellman equations or the Pontryagin minimization principle (e.g., see Bryson and Ho, _{d}_{0}, are uniquely associated with the optimal trajectory. In principle therefore, some function exists that maps these parameters to the optimal solutions (Figures

Since we know that these solutions are parameterized by a small number of variables, they must reside in a low-dimensional manifold. Thus, we can employ standard methods to identify this low-dimensional space and approximate the optimal control functions.

Note that since time is continuous, the output of our optimal control function is infinite dimensional (Figures

where _{i}

Turning our attention to the function inputs, we note that for many behaviors, the objective function of Equation (2) is largely invariant. That is, though the goals for various behaviors may change in terms of what states the system is trying to visit (_{d}

or, using a short-hand notation,

where

Though the output of our optimal trajectory function approximator is high-dimensional, $Y \in \mathbb{R}^{(n+r)N}$, we know that it is exactly a function of the low-dimensional variable, $X \in \mathbb{R}^{k}$, where

With a training set of optimal trajectories, $\{Y^{i}\}_{i=1}^{m}$, a series of feedforward autoencoders is trained, then unrolled and stacked (see Figure ^{i}, ^{i}^{m}_{i = 1} via supervised learning. In the final step, the shallow network is coupled to the top-half of the stacked autoencoders (Figure


Reaches were simulated using a two-link arm model with four states: shoulder and elbow angles (θ_{s}, θ_{e}) and their velocities. Realistic human limb link lengths and inertial parameters were used (obtained from Berniker and Kording, _{s},_{e}]^{T} where _{e}, y_{e}_{e}, _{e}]^{T}. The shoulder was defined as the origin (_{e}_{e}
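The relation between joint angles and endpoint position for a planar two-link arm with the shoulder at the origin can be illustrated with a minimal sketch. The link lengths below are hypothetical placeholders, not the parameters used in the simulations, and the angle convention (elbow angle measured relative to the upper arm) is an assumption.

```python
import numpy as np

# Hypothetical link lengths in metres; not the values used in the paper.
UPPER_ARM_LENGTH = 0.30
FOREARM_LENGTH = 0.33

def endpoint_position(theta_s, theta_e, l1=UPPER_ARM_LENGTH, l2=FOREARM_LENGTH):
    """Planar two-link forward kinematics with the shoulder at the origin.

    theta_s is measured from the x-axis; theta_e is measured relative to
    the upper arm (assumed convention).
    """
    x_e = l1 * np.cos(theta_s) + l2 * np.cos(theta_s + theta_e)
    y_e = l1 * np.sin(theta_s) + l2 * np.sin(theta_s + theta_e)
    return np.array([x_e, y_e])

def endpoint_velocity(theta_s, theta_e, dtheta_s, dtheta_e,
                      l1=UPPER_ARM_LENGTH, l2=FOREARM_LENGTH):
    """Endpoint velocity obtained by applying the kinematic Jacobian to joint velocities."""
    s1, c1 = np.sin(theta_s), np.cos(theta_s)
    s12, c12 = np.sin(theta_s + theta_e), np.cos(theta_s + theta_e)
    jacobian = np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                         [l1 * c1 + l2 * c12, l2 * c12]])
    return jacobian @ np.array([dtheta_s, dtheta_e])
```

With both angles at zero the arm lies fully extended along the x-axis, which gives a quick sanity check on the convention.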

Training data were optimal under Equation (2). The terminal cost was the squared Euclidean error from the target, φ = 1/2(_{d}^{T}Φ (_{d}^{T}^{io}^{io}^{m}_{i = 1}.

The corresponding unique identifiers for each reach were the initial and final desired state, (_{d}, x_{o}^{i}, the initial and final desired positions as well as the field flag. All inputs and outputs were scaled to lie between zero and one to be consistent with the network's range of values. ξ encoded the force field condition with one of three values, [0.05, 0.5, 0.95].
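As an illustration of this input encoding, the sketch below scales positions onto [0, 1] and appends the field flag. The workspace bounds, the helper names, and the assignment of the three flag values to particular fields are assumptions made for the example; only the three values themselves come from the text.

```python
import numpy as np

# The three flag values are quoted in the text; which value labels which
# field is an assumption made for this example.
FIELD_FLAGS = {"counterclockwise": 0.05, "null": 0.5, "clockwise": 0.95}

def scale_to_unit(values, lower, upper):
    """Affinely map values from [lower, upper] onto [0, 1]."""
    return (np.asarray(values, dtype=float) - lower) / (upper - lower)

def unscale_from_unit(scaled, lower, upper):
    """Invert scale_to_unit."""
    return lower + np.asarray(scaled, dtype=float) * (upper - lower)

def build_input(start_xy, target_xy, field, lower, upper):
    """Concatenate the scaled start position, scaled target position and field flag."""
    start = scale_to_unit(start_xy, lower, upper)
    target = scale_to_unit(target_xy, lower, upper)
    return np.concatenate([start, target, [FIELD_FLAGS[field]]])

# Hypothetical workspace bounds (metres) for the x and y coordinates.
LOWER = np.array([-0.4, 0.1])
UPPER = np.array([0.4, 0.7])

X = build_input([0.0, 0.4], [0.2, 0.55], "null", LOWER, UPPER)
```

Keeping the flag values away from exactly zero and one leaves them inside the open range that a sigmoidal unit can represent.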

To train the network on optimal trajectories, 2000 random reaches under each of the three force field conditions were obtained, randomly assigning some for training (^{i} with (^{i}.

Initial tests of the depth and number of nodes of the network found that relatively good reconstruction could be obtained with an autoencoder that mapped the 606-dimensional optimal trajectories down to 100 and then five hidden features (and then back up to 100, then 606 dimensions). However, to account for inaccuracies in the shallow network's map from inputs to these low-dimensional features, the size of the inner feature set,

The network nodes used sigmoidal activations, $a^{l}_{i} = \sigma(z^{l}_{i})$, where $z^{l}_{i}$ is the input to the $i^{th}$ unit in layer $l$, and $z^{l} = W^{l-1} a^{l-1} + b^{l-1}$. $W^{l-1}$ is the matrix of weights that connects the previous layer's activations, $a^{l-1}$, to layer $l$, and $b^{l-1}$ is a vector of biases for the units in layer $l$. $a^{0} = X$, and the output of the network is Y. Using software written for this purpose along with third party optimizing software (fmin), the network was trained until convergence criteria were met: either 2000 iterations of optimization were completed or an error threshold was reached.
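The layer-by-layer computation described here can be sketched as a plain forward pass. The layer sizes, the random untrained weights, and the layout of the output vector (all state samples followed by all command samples) are assumptions for illustration; the 606-dimensional output and the four states come from the text, and splitting 606 into 101 samples assumes two command dimensions.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng, scale=0.1):
    """Create random, untrained (W, b) pairs for consecutive layer sizes."""
    return [
        (scale * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(X, layers):
    """Propagate the input through every layer with sigmoidal units."""
    a = np.asarray(X, dtype=float)
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

def unpack(Y, n_state=4, n_command=2):
    """Split the output vector into state and command samples (assumed layout)."""
    n_samples = Y.size // (n_state + n_command)
    states = Y[: n_state * n_samples].reshape(n_samples, n_state)
    commands = Y[n_state * n_samples:].reshape(n_samples, n_command)
    return states, commands

# Hypothetical sizes: 5 inputs, a 20-unit shallow layer, 5 features,
# then the decoding half of the autoencoder (100 and 606 units).
SIZES = [5, 20, 5, 100, 606]
layers = init_layers(SIZES, np.random.default_rng(0))

X = np.array([0.5, 0.5, 0.75, 0.75, 0.5])
Y = forward(X, layers)
states, commands = unpack(Y)
```

Because every unit is sigmoidal, each output lies strictly between zero and one, which is why the trajectories must be scaled into that range before training.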

The self-trained network used the same architecture described above, only now the input, X, was 4-dimensional since we did not examine reaches in a force field. The initial training set consisted of randomly generated reaches (

In a final exercise, the network learned to optimize reaches after being self-trained. To do this, we adapted the network through supervised training on the cost function (Equation 2), which requires the appropriate derivatives:

where θ is a vector of network weights. With our current architecture, we cannot compute ∂_{1})], to X. Note that this relationship is independent of whether or not the network is trained, or the trajectories are optimal; it is merely the input-output trajectories of the limb model. Using this forward dynamics network, the deep network was optimized using gradient descent in a series of iterative steps: train the deep network using the current forward dynamics network, then train the forward dynamics network on trajectories obtained using the deep network.

An accurate and precise forward model was needed for the gradient, ∂

Just as in the previous section, the deep network was trained on self-generated data. Once trained, the deep network produced reaches essentially identical to those displayed in the previous section. Then the network was optimized as described above. Training was halted after 300 rounds of optimizing. To overcome some of the additional computational complexities accompanying this training, we downsampled the trajectories, increasing

To serve as a point of comparison, a linear low-dimensional function was created using a PCA decomposition of the training data. The first five principal components were used to find a 5-dimensional feature vector,

We examined the ability of a deep network to represent a trajectory function; that is, a function that outputs the entire state and command trajectory for a movement. Using this network, we present results on how it can approximate reaches after training on optimal example data, train itself with self-generated data, and finally, learn to make reaches that optimize a cost function.

A deep network that approximates an optimal trajectory function was trained on point-to-point reaches moving freely through space, in a clockwise curl field, or counterclockwise curl field (defined through joint velocities, see Materials and Methods). We quantify performance with RMS errors between the approximate commands and states and their optimal counterparts. The state and commands are scaled to lie between zero and one (to be compatible with the deep network's range of outputs), so 1.0 is the maximum possible error. We complement these approximation errors with the subsequent errors that arise when the network is used as a controller. That is, small errors in the network's output,

Since the dynamics of our system are non-linear, linear methods for representing low-dimensional features such as PCA will contain unavoidable errors. To quantify these errors as well as obtain a measure for comparison with our deep network, we first approximated the optimal trajectory function using PCA. The training error for this linear approximation was 0.037, while the validation error was 0.039. A set of test reaches performed with this PCA function approximation put these errors in perspective. Errors in the counterclockwise, null and clockwise fields on these test reaches were 0.042, 0.009, and 0.044, respectively (Figures
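The linear compression and the RMS error measure can be sketched as follows. The data here are synthetic stand-ins, not the reach trajectories, so the resulting errors are unrelated to the values reported above; the function names are hypothetical. The example only illustrates that five principal components reconstruct data lying on a five-dimensional linear subspace exactly, but leave a residual once the data depend non-linearly on the same five variables.

```python
import numpy as np

def fit_pca(Y_train, k=5):
    """Return the data mean and the first k principal directions (rows)."""
    mean = Y_train.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Y_train - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_encode(Y, mean, components):
    """Project trajectories onto the k-dimensional feature space."""
    return (Y - mean) @ components.T

def pca_decode(Z, mean, components):
    """Map k-dimensional features back to full trajectories."""
    return Z @ components + mean

def rms_error(Y_hat, Y):
    """Root-mean-square difference between two sets of trajectories."""
    return float(np.sqrt(np.mean((np.asarray(Y_hat) - np.asarray(Y)) ** 2)))

def reconstruction_error(Y, k=5):
    mean, components = fit_pca(Y, k)
    return rms_error(pca_decode(pca_encode(Y, mean, components), mean, components), Y)

# Synthetic stand-in data generated from five latent variables.
rng = np.random.default_rng(0)
latent = rng.uniform(-1.0, 1.0, size=(200, 5))
mixing = rng.standard_normal((5, 606))

Y_linear = latent @ mixing          # lies on a 5-dimensional linear subspace
Y_nonlinear = np.tanh(Y_linear)     # same latents, non-linear dependence

err_linear = reconstruction_error(Y_linear)
err_nonlinear = reconstruction_error(Y_nonlinear)
```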

Using the same data, a deep network was trained to approximate the optimal trajectory function (see Materials and Methods for details). The training and validation errors for the deep network approximation were both 0.004, an order of magnitude lower than with PCA. The test reaches demonstrate obvious improvements; errors in the counterclockwise, null and clockwise fields were all 0.002. In contrast with PCA, the state estimates were close to optimal (Figures

An interesting feature of using a trajectory function for control is that the network can teach itself to estimate both state and command trajectories. In a manner analogous to the learning proposed in motor babble (Meltzoff and Moore,

Using this idea, a network was self-trained to make point-to-point reaches. Using the same architecture as above, this network bootstrapped itself. Although the initial round of reaches was chosen uniformly over the entire workspace, the resulting reaches were of relatively small displacements (see Figure

After training on the 3rd round of data, both the training and validation errors were 0.006. The network's commands were not optimal under Equation (2). However, they successfully brought the limb to the desired targets and the state estimates accurately predicted the trajectories. This discrepancy between optimality and these self-learned trajectories was apparent on test reaches. Here, the commands generated by the network, based on self-generated data, differed significantly from the optimal ones (see Figure
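One possible reading of this self-training loop is sketched below on a deliberately simple toy problem. A scalar saturating map stands in for the limb, and a least-squares slope stands in for the deep network; both are substitutions made for brevity and are not the models used in this study. The point illustrated is that every stored pair is a command together with the outcome it actually produced, so the data are always valid, and that the reaches cover more of the workspace after the first round of training.

```python
import numpy as np

def plant(command):
    """Toy stand-in for the limb: a saturating map from command to displacement."""
    return np.tanh(command)

def fit_inverse(endpoints, commands):
    """Least-squares slope through the origin mapping endpoint to command."""
    return float(endpoints @ commands / (endpoints @ endpoints))

# Desired targets spread uniformly over a hypothetical one-dimensional workspace.
targets = np.linspace(-0.8, 0.8, 41)

# Round 0: babble with small random commands and record where they lead.
rng = np.random.default_rng(0)
commands = rng.uniform(-0.3, 0.3, size=50)
endpoints = plant(commands)
coverage = [float(np.max(np.abs(endpoints)))]
errors = []

for round_index in range(3):
    # Train on everything experienced so far.
    slope = fit_inverse(endpoints, commands)
    # Attempt the targets with the current model.
    new_commands = slope * targets
    new_endpoints = plant(new_commands)
    errors.append(float(np.mean(np.abs(new_endpoints - targets))))
    # Store each command with the endpoint actually reached, not the intended target.
    commands = np.concatenate([commands, new_commands])
    endpoints = np.concatenate([endpoints, new_endpoints])
    coverage.append(float(np.max(np.abs(endpoints))))
```

The linear learner cannot invert the saturating plant exactly, so the residual errors stored in `errors` do not vanish; a more expressive function approximator would be needed for that.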

We have shown that the deep network architecture can easily learn to approximate the optimal trajectory function when provided with samples, and can also teach itself to represent point-to-point reaching trajectories. In a final exercise, we demonstrate how the network can also be optimized to produce trajectories that minimize a given cost without being given the appropriate training data.

Just as in the previous section, a deep network was trained on self-generated data. Once trained, the deep network produced reaches essentially identical to those displayed in the previous section. Then the network was optimized (see Materials and Methods). On the test reaches, the error of the function's trajectories from optimality was 0.018 (half that from above). As can be seen, the motor commands were far closer to optimal relative to their previous, self-trained values (compare Figure

In the standard approach to motor control, state estimates and motor commands are produced using dynamical models; representations across space. Here we propose a new approach using a function to output optimal trajectories, containing both state estimates and commands for an entire reach; a representation across both space and time. We have shown how recent advances in training deep networks, in largely unsupervised ways, allow for accurate approximations to this optimal function. The resulting network can be trained to accurately represent optimal trajectories, or teach itself with self-generated data. What's more, the network can be optimized for a new cost, outputting the entire command and estimated state of an optimal reach.

The trajectory function approach has some obvious weaknesses. Using a deep network requires a large number of parameters, and in turn a lot of training data for successful learning. Similarly, simultaneously representing states and commands across time also increases the number of parameters and required training data. For our example two-link limb this data was easily obtained, but in other contexts, e.g., high-dimensional systems where computing optimal trajectories is computationally intensive, this may be impractical. Yet, without a formal method for approximating the analytical solutions to these optimal control functions, using network approximations such as these may be the second best option.

With regard to the nervous system, simulating large data sets for training may be neither feasible, nor necessary. The nervous system has at its disposal a very good means of generating large data sets of motor information, its own body. A long-standing hypothesis regarding the early stages of motor learning is that seemingly random motor commands and their consequences, termed motor babble, may be used to train-up the motor system. While this idea has been useful in framing motor learning for both biological and robotic systems (Meltzoff and Moore,

Another potential weakness is the fact that our trajectory function produces command and state trajectories over a fixed length of time. Because the dynamics of the limb are non-linear, the network's outputs cannot be scaled in time to produce accurate movements of longer or shorter durations. Fortunately, it is easy to propose potential solutions to this difficulty. For example, movement duration could be included as an input to the trajectory function, and the same training procedures could be implemented without change. Alternatively, rather than representing commands and states across discretized time, basis functions could be used, whose width could be varied as a function of the movement time.
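The basis-function alternative can be made concrete with a small sketch. Gaussian bases, their number, and the rule that centres and width scale in proportion to movement duration are all assumptions chosen for illustration. The sketch shows only that a fixed number of weights can describe profiles of different durations; because the limb dynamics are non-linear, the weights themselves would still have to depend on the duration, for example by supplying it as an input.

```python
import numpy as np

def temporal_bases(t, duration, n_bases=10):
    """Gaussian bumps whose centres and width are proportional to movement duration."""
    centres = np.linspace(0.0, duration, n_bases)
    width = duration / n_bases
    return np.exp(-0.5 * ((t[:, None] - centres[None, :]) / width) ** 2)

def profile(weights, duration, n_samples=101):
    """Evaluate a weighted sum of temporal bases on a grid spanning the movement."""
    t = np.linspace(0.0, duration, n_samples)
    return t, temporal_bases(t, duration, len(weights)) @ weights

# Hypothetical weights, standing in for one output channel of the network.
weights = np.sin(np.linspace(0.0, np.pi, 10))

t_fast, command_fast = profile(weights, duration=0.5)
t_slow, command_slow = profile(weights, duration=1.0)
```

With identical weights the two profiles are time-scaled copies of one another, which is exactly the case the text notes is inaccurate for a non-linear limb; the representation becomes useful once the weights vary with duration.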

No doubt other possibilities exist to improve upon the discretized representation of time. However, this “bug” may in fact be a feature. For example, if the function's output modulated temporal bases, then the motor system would reflect these temporal regularities in its commands (d'Avella et al.,

The trajectory function approach offers multiple benefits too. As noted above, the architecture can generate its own training data. Since the network directly represents an entire trajectory, examining global features of reaches, such as curvature, velocity profiles, and motor effort, is relatively easy. Additionally, although we have implemented a feedforward network in this preliminary examination, future work could implement the same trajectory function with a probabilistic network, e.g., a deep Boltzmann machine (Hinton et al.,

Because our functional approach is a sharp departure from the conventional dynamical approach, we end by speculating on how it could be neurally implemented and what potential insight it may offer. In the conventional approach, neurons encode the dynamics of the motor system, in which case neural activity represents the instantaneous state or command of a reach. Thus, this is a centralized representation of commands and state, encoding their temporal changes through time-varying neural activity. In contrast, with the approach we present, many groups of neurons are used to encode the temporal profile of states and commands at distinct points in time. To drive the motor system, this information must be conveyed in a serial fashion. This could be achieved by chaining the network's outputs into an ordered sequence of activity. Thus, our functional approach is a spatially distributed representation, encoding time-varying signals through changes in the spatial location of neural activity.

Despite a long history of electrophysiological studies, it is not clear which of these two approaches best explains the evidence. While there is a lot of evidence to support the conventional approach, it is largely based on aggregate neural activity (Shidara et al.,

Finally, there is a recently renewed enthusiasm for interpreting neural activity in terms of a dynamical systems perspective (as distinct from the representation of a dynamical forward model) wherein the dynamics of a network of neurons are tuned to encode the time-varying motor information (Shenoy et al.,

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

MB is supported in part by the National Science Foundation Grant CMMI 1200830. KPK is supported by National Institute of Neurological Disorders and Stroke Grant 1R01-NS-063399.