
Edited by: Arindam Basu, Nanyang Technological University, Singapore

Reviewed by: Guillaume Garreau, IBM Research Almaden, United States; Subhrajit Roy, IBM Research, Australia

*Correspondence: Rohit Shukla

This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Emerging neural hardware substrates, such as IBM's TrueNorth Neurosynaptic System, can provide an appealing platform for deploying numerical algorithms. For example, a recurrent Hopfield neural network can be used to find the Moore-Penrose generalized inverse of a matrix, thus enabling a broad class of linear optimizations to be solved efficiently, at low energy cost. However, deploying numerical algorithms on hardware platforms that severely limit the range and precision of representation for numeric quantities can be quite challenging. This paper discusses these challenges and proposes a rigorous mathematical framework for reasoning about range and precision on such substrates. The paper derives techniques for normalizing inputs and properly quantizing synaptic weights originating from arbitrary systems of linear equations, so that solvers for those systems can be implemented in a provably correct manner on hardware-constrained neural substrates. The analytical model is empirically validated on the IBM TrueNorth platform, and results show that the guarantees provided by the framework for range and precision hold under experimental conditions. Experiments with optical flow demonstrate the energy benefits of deploying a reduced-precision and energy-efficient generalized matrix inverse engine on the IBM TrueNorth platform, reflecting 10× to 100× improvement over FPGA and ARM core baselines.

Recent advances in neuromorphic engineering (Schuman et al.,

In spite of the radically differing hardware implementations of these neural network substrates, many of them share an inherent design principle: converting input signal amplitude information into a rate-coded spike train and performing dot-product computations on these spike trains in parallel, based on synaptic weights stored in the memory array. These similarities also result in a set of common challenges during practical implementation, especially when using them as computing substrates for applications with a mathematical algorithmic basis. These challenges include a restricted range of input values and the limited precision of synaptic weights and inputs. Since a value is encoded in unary spikes over time (i.e., as a firing rate), each individual input and variable must take a value in the range [0, 1]. Furthermore, the precision of the encoded value is directly proportional to the size of the evaluation window, which, for reasons of efficiency, is typically limited to a few hundred spikes. Finally, because of hardware cost, synaptic weights can be implemented with only a limited number of memory bits, resulting in limited precision. For instance, IBM's TrueNorth supports 9-bit weight values, where the most significant bit indicates the sign of the weight.
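The rate-coding scheme described above can be sketched in a few lines: a value in [0, 1] becomes a Bernoulli spike train, and the decoding error shrinks as the evaluation window grows. This is a minimal illustration of the range/precision tradeoff, not TrueNorth's exact encoder.

```python
import random

def encode_rate(value, ticks, rng):
    """Rate-code a value in [0, 1] as a Bernoulli spike train of the given length."""
    return [1 if rng.random() < value else 0 for _ in range(ticks)]

def decode_rate(spikes):
    """Recover the encoded value as the empirical firing rate."""
    return sum(spikes) / len(spikes)

rng = random.Random(42)
value = 0.37
for ticks in (16, 256, 4096):
    trials = [abs(decode_rate(encode_rate(value, ticks, rng)) - value)
              for _ in range(200)]
    print(ticks, sum(trials) / len(trials))   # mean error shrinks as the window grows
```

The mean decoding error scales roughly as 1/√T for a window of T ticks, which is why a few hundred ticks buys only two to three decimal digits of precision.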

Mapping existing algorithms to these substrates requires the designer to choose a strategy for quantizing inputs and weights carefully, so that the range limitations are not violated (i.e., values represented by firing rates do not saturate), while maintaining sufficient precision. Prior work notes these challenges, but typically presents only

In contrast, this paper develops a rigorous mathematical model that enables a designer to map numerical algorithms to these substrates and to reason quantitatively about the range and precision of the computation taking place in the neural substrate. Our mathematical framework can be applied to a wide range of problems in linear optimization running on neural substrates with diverse constraints. The model is validated empirically by constructing input matrices with random values and computing matrix inverse using a recurrent Hopfield neural-network-based linear solver. Our results show that the scaling factor and error bounds derived by the mathematical model hold for this application under a broad range of input conditions. We report the computing resources and power numbers for real-time applications, and quantify how the errors and inefficiencies can be addressed to enable practical deployment of the Hopfield linear solver.

Prior work by authors in Shukla et al. (

This section describes how a system of linear equations can be solved using a recurrent Hopfield neural network, and shows how such a solver can be used in applications such as target tracking and optical flow. These example applications have been successfully deployed on TrueNorth (Shukla et al.,

A linear equation solver is used to solve matrix equations of the form

where ‖·‖_{F} is the Frobenius norm. Note that the problem decomposes by columns of

where [·]_{·j} denotes the j-th column of its argument.

This system can be solved by the following stationary iterative process:

and α is a positive steplength. (This process can also be thought of as a steepest descent method applied to the optimization problem Equation (2).) For convergence of this process, we require

0 < α < 2/σ_{max}^{2},

where σ_{max} is the largest singular value of A, so that σ_{max}^{2} = ‖A^{T}A‖_{2} is the largest eigenvalue of A^{T}A.

We can map this process Equation (4) to a recurrent Hopfield network cleanly by rewriting it as follows:

x_{k+1} = W_{hop}x_{k} + W_{ff}B, where W_{hop} := I − αA^{T}A and W_{ff} := αA^{T}.

The Hopfield neural network architecture for implementing Equation (8) is shown in Figure

Neural network architecture of Hopfield linear solver.

This elementary derivation shows that we can solve arbitrary systems of linear equations (which we refer to also as “matrix division”) directly in a recurrent neural network by loading synaptic weight coefficients W_{ff} and W_{hop} derived from A and updating the iterates x_{k+1} appropriately. The weight matrix W_{ff} serves as the feedforward weight for the input matrix B, while W_{hop} serves as the weight for the recurrent part, operating on the values x_{k}.
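The recurrence x_{k+1} = W_{hop}x_{k} + W_{ff}B, with W_{hop} = I − αA^{T}A and W_{ff} = αA^{T}, can be sketched in floating point to show that it converges to the Moore-Penrose solution. The matrices and iteration count are illustrative; the steplength α = 1.9/trace(A^{T}A) is the choice used later in the paper's experiments.

```python
import numpy as np

def hopfield_solve(A, B, iters=500):
    """Iterate x_{k+1} = W_hop x_k + W_ff B with W_hop = I - alpha A^T A and
    W_ff = alpha A^T, converging to the least-squares solution pinv(A) @ B."""
    alpha = 1.9 / np.trace(A.T @ A)   # steplength; satisfies alpha < 2/sigma_max^2
    W_hop = np.eye(A.shape[1]) - alpha * (A.T @ A)
    W_ff = alpha * A.T
    X = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(iters):
        X = W_hop @ X + W_ff @ B      # recurrent part + feedforward part
    return X

A = np.array([[2., 0., 1.], [0., 3., 1.], [1., 1., 4.], [0., 1., 2.]])
B = np.array([[1., 0.], [0., 1.], [2., -1.], [0.5, 0.]])
X = hopfield_solve(A, B)
print(np.allclose(X, np.linalg.pinv(A) @ B))   # True: matches the Moore-Penrose solution
```

Because trace(A^{T}A) ≥ σ_{max}^{2}, this α always satisfies the convergence condition, at the cost of a conservative step size.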

We have implemented prototypes for two different classes of applications. The first prototype is for applications in which Hopfield network weights are hard-coded on TrueNorth, while the second prototype is for applications in which Hopfield network weights are encoded as dynamic spike trains.

For the first class of applications, we consider a typical target tracking scenario, shown in Figure. Here the weight matrices W_{ff} and W_{hop} can be precomputed and hard-coded onto the TrueNorth board for use.

Illustration of

In target tracking, a real-time video input is preprocessed to extract features (e.g., edges of particular orientations) to form a feature set. This feature set is then compared against a set of templates to identify objects of interest, with the goal of tracking the objects in the image frame as they move in three dimensions. As a proof of concept, we have chosen a very simple image whose feature set consists of just three edges similar in appearance to the letter H. To determine size and placement of the bounding box for the tracked image, we utilize the theory of affine transforms which shows that a current image

We investigated optical flow as our second application, shown in Figure

Our demonstration is similar to the one reported in Esser et al. (

IBM TrueNorth is a biologically-inspired architecture that performs computations using spiking neurons. Input values are represented in a stochastic time-based coding, in which the probability of occurrence of a spike at a particular time tick is directly proportional to the input value. Since the computation values are represented as spike trains, designers are faced with two key issues in mapping algorithms to these spiking neural substrates.

Signed computations on spiking neural network substrates whose input values use a rate-based encoding must be performed by splitting all numbers (and intermediate results) into positive and negative parts.

Data representation is limited by the maximum frequency of spikes. To represent different values within a matrix, we need to scale all quantities so that no number exceeds this maximum frequency.

To repeat the argument presented in Shukla et al. (

Example illustrating the importance of proper scaling for spike-based computation.

In implementing algorithms on TrueNorth, therefore, we must choose a scale factor that ensures that the intermediate computations never saturate. On the other hand, the scale factor should not be much larger than necessary, as this will result in loss of precision for the spike-train representations.
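The saturation issue can be illustrated numerically. In this sketch the scale factor is measured from an offline floating-point run purely for illustration; the point of Claim 1 and Equation (19) below is that a safe factor η can instead be derived a priori from the inputs.

```python
import numpy as np

def iterate(A, B, iters=300):
    """Run the Hopfield recurrence and record the largest intermediate magnitude,
    which on a rate-coded substrate must never exceed 1 (saturation)."""
    alpha = 1.9 / np.trace(A.T @ A)
    W_hop = np.eye(A.shape[1]) - alpha * (A.T @ A)
    W_ff = alpha * A.T
    X = np.zeros((A.shape[1], B.shape[1]))
    peak = 0.0
    for _ in range(iters):
        X = W_hop @ X + W_ff @ B
        peak = max(peak, np.abs(X).max())
    return X, peak

A = np.array([[2., 0., 1.], [0., 3., 1.], [1., 1., 4.], [0., 1., 2.]])
B = np.array([[50.], [-20.], [35.], [10.]])

_, raw_peak = iterate(A, B)        # >> 1: firing rates would saturate
scale = raw_peak                   # illustrative scale factor measured offline
X_scaled, scaled_peak = iterate(A, B / scale)
X_recovered = X_scaled * scale     # undo the scaling on the converged output
print(round(raw_peak, 2), round(scaled_peak, 3))
```

Because the iteration is linear in B, scaling the input scales every intermediate value by the same factor, so the rescaled output still recovers the unscaled solution.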

A Hopfield linear solver (Lendaris et al.,) can be implemented on such a substrate provided that the iterates x_{k} that arise in the stationary iterative process Equation (4) [equivalently, Equation (8)] have no elements greater than 1 in absolute value, for all k.

For purposes of this section we define the max-norm of a matrix to be its largest element in absolute value, that is, ‖A‖_{max} := max_{i,j} |A_{ij}|.

(Note that when

where e_{j} is the vector whose elements are all zero except for a 1 in position j.

We write the singular value decomposition of A as A = UΣV^{T}, where U and V are orthogonal matrices and Σ is the diagonal matrix of singular values,

Σ = diag(σ_{1}, σ_{2}, …, σ_{N}),

where σ_{1} ≥ σ_{2} ≥ … ≥ σ_{N} ≥ 0. We further use notation σ_{max} = σ_{1} and σ_{min} = σ_{N}.

In this notation, we have that

Note too that

The following claim shows how we can scale the elements of the inputs so that ‖x_{k}‖_{max} ≤ 1 for all k.

CLAIM 1.

From Equation (12) we have that W_{hop} = I − αA^{T}A = V(I − αΣ^{2})V^{T}, so that, writing the columns of V as V_{·i}, we have that

where V_{·i} is the i-th column of V.

so that

Since 1 − ασ_{i}^{2} > 0, we have for such i

where the last inequality follows from ‖V_{·i}‖_{2} = 1, the standard inequality that relates ‖·‖_{2} to ‖·‖_{1}, and the definition (14). By considering W_{hop}^{T}e_{l} one column at a time, we have

and by applying Equation (14), we obtain the result. □

An immediate corollary of this result is that if we replace B by B/(η‖B‖_{max}) in Equation (1), where

then the matrices h_{l} produced by the iterative process Equation (8) have ‖h_{l}‖_{max} ≤ 1 for all l. From the definition of W_{ff} and Equation (19), we have by setting

By applying this scaling, and writing the solution

where B_{n} := B/(η‖B‖_{max}). We use b_{j} to denote the j-th column of B_{n}.

By setting

Note that x_{l} and h_{l} differ from each other only by the scaling factor η‖B‖_{max}, so we have from Claim 1 and Equation (19) that

and thus

We present two techniques to encode weights for the Hopfield neural network based linear solver of section 2.1. In the first subsection, we consider hardcoding the Hopfield neural network weights as TrueNorth neuron parameters. The extracted features of the image are used to compute the weight matrices W_{hop} and W_{ff}, and these are converted further into TrueNorth neuron weight and threshold parameters. This scheme is suitable when the initial features do not change, as in 2-D image tracking. In the second subsection, we see how the computations are performed when weights are introduced as spikes. This scheme is appropriate for scenarios in which the initial conditions may vary frequently, such as optical flow and inverse kinematics.

To perform matrix multiplication with weight matrices W_{ff} and W_{hop}, the floating-point values of these two weight matrices are encoded as a ratio of TrueNorth weights to thresholds; see Algorithm 1. Here, a single synapse is used for each term in the dot-product computation. In TrueNorth, each neuron can have up to four axon types as input, each of which can be assigned a unique synaptic weight in the range [−255, 255]. Figure shows the synapse connections between the first column of x_{k}, i.e., [x_{k}(1, 1); x_{k}(2, 1); x_{k}(3, 1)], and the columns of the 3 × 3 weight matrix W_{hop} (which can have either positive or negative values). Each of the three values in x_{k} has been assigned a different axon type, so that it is multiplied with a corresponding weight value to compute a dot product of the form w_{1}x_{k}(1, 1) + w_{2}x_{k}(2, 1) + w_{3}x_{k}(3, 1). Each neuron operates in linear reset mode, and the rest of the parameters of the LIF neuron have the default initial value.
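The weight-to-threshold ratio idea can be sketched with a rational approximation: a normalized weight w is represented by an integer synapse weight s and an integer threshold t with s/t ≈ w. This is only a sketch of the idea behind Algorithm 1; the exact search the TrueNorth tooling performs may differ.

```python
from fractions import Fraction

def encode_weight(w, max_thresh=255):
    """Approximate a normalized weight |w| <= 1 by an integer synapse weight s
    and threshold t with s/t close to w. Since |w| <= 1, the numerator is
    automatically within the [-255, 255] synaptic weight range."""
    frac = Fraction(w).limit_denominator(max_thresh)
    return frac.numerator, frac.denominator

for w in (0.3183, -0.75, 0.004):
    s, t = encode_weight(w)
    print(f"{w:+.4f} ~ {s}/{t}, error {abs(s / t - w):.2e}")
```

With a threshold capped at 255, the representable ratios are spaced at worst about 1/255 apart, which is one concrete way to see the limited weight precision discussed above.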

Computes the weights and threshold values for performing dot product on TrueNorth

Input: the i^{th} row of the weight matrices (W_{i,·})


Synapse connections showing the dot product between the first column of x_{k} and the weight matrix W_{hop}, and the corresponding threshold values for each neuron. In one case, W_{ff} and W_{hop} can be encoded using a single neuron. In the other, W_{ff} and W_{hop} cannot be encoded using a single neuron; multiple neurons are needed to compute partial sums, which are later added together.

Figure shows how a Hopfield neural network (W_{ff} and W_{hop}) can be encoded on a single TrueNorth neuron. Using all four axon types available in a single TrueNorth neuron, we can encode a Hopfield neural network that has up to four neurons in W_{ff} and W_{hop}. For scenarios where the Hopfield neural network has more than four neurons in either W_{ff} or W_{hop}, the matrix multiplication has to be divided into partial sums across multiple TrueNorth neurons. Figure shows the dot product of the first column of x_{k} with a single column of the matrix W_{hop}; multiple neurons are required to handle the partial sums in the matrix multiplication. Partial sums of the matrix dot product are computed in neurons n_{1} and n_{2}. Both of these neurons have linear reset mode, and their weight and threshold values are computed using Algorithm 1. Once the partial sums have been computed, the results go through a separate adder neuron where all of the intermediate sums are combined. The LIF neuron parameters for an adder neuron are shown in Figure
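The partial-sum scheme can be sketched as follows: a long dot product is split into chunks no larger than the available fan-in (four axon types per neuron), one chunk per neuron, with a final adder stage combining the partials.

```python
import numpy as np

def chunked_dot(w, x, fan_in=4):
    """Split an N-term dot product into partial sums of at most fan_in terms
    (one neuron per chunk, since TrueNorth offers four axon types), then
    combine the partials as a separate adder neuron would."""
    partials = [np.dot(w[i:i + fan_in], x[i:i + fan_in])
                for i in range(0, len(w), fan_in)]
    return sum(partials)

rng = np.random.default_rng(1)
w = rng.uniform(-1.0, 1.0, 10)   # 10 terms -> 3 partial-sum neurons + adder
x = rng.uniform(0.0, 1.0, 10)
print(np.isclose(chunked_dot(w, x), np.dot(w, x)))   # True
```

The decomposition is exact in arithmetic terms; on the substrate it costs extra neurons and one extra stage of latency, which is the tradeoff the text describes.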

Neuron parameters for

For applications such as optical flow and inverse kinematics, where the initial input conditions may change dynamically, the hard-coding of TrueNorth weights discussed in the previous subsection is not appropriate. We need an approach in which TrueNorth neurons are used as arithmetic computation units operating on spiking inputs. Cassidy et al. ( ) describe such arithmetic primitives; Algorithm 2 computes W_{hop} and W_{ff} using spikes. Figure

Computes weight matrices W_{ff} and W_{hop} using spiking inputs


Since the matrix computations for W_{hop} and W_{ff} will be done in the hardware itself, we need to reconstruct the iteration formula in such a way that every term that serves as an input to the hardware has magnitude less than 1; otherwise, the computations might saturate and give us the wrong result. We rewrite Equation (22a) as follows, to ensure that each bracketed term has all its elements in the range [−1, 1]:

We state the formal claim as follows.

CLAIM 2. _{hop},

It suffices to show that the bound ‖·‖_{2} ≤ 1 holds for all four of the matrices in question.

For the first matrix, note from Equation (15) that

For the third matrix we note that W_{hop} is a square symmetric matrix with eigenvalues in the range [−1, 1]. Thus ‖W_{hop}‖_{2} ≤ 1, as required.

For the fourth matrix, we have from Equation (15), the definition of η in Equation (19), and the fact that ‖A‖_{2} = σ_{max} that

■

Spiking neural substrates can operate only on values in the range [0, 1]. Thus, to perform computation on numbers that can be either positive or negative, the computations must be divided into two separate domains, one working with the positive parts of the matrices and one with the negative parts. Algorithms 3 and 4 implement the formula Equation (22b). Algorithm 3 performs the preprocessing step, or the feedforward path of the neural network architecture, while Algorithm 4 implements the recurrent part of the Hopfield architecture. The steps shown in Algorithms 3 and 4 ensure that the intermediate computation values never saturate. This is managed by performing subtraction of intermediate results followed by addition in the final step. Since the input values were normalized by the scaling factor, as shown in Equation (19), the addition of partial sums can never saturate. We use the following definition of the positive and negative parts of a matrix:

where the max-operation is applied component-wise. The proposed architecture ensures that nonzero elements in the positive-part matrices have zeros in the corresponding elements of the negative-part matrices, and vice versa. We do not have to scale the values in Algorithm 4 while computing the matrices _{1} and _{2}, because Claim 1 guarantees that no quantity will exceed 1, by choice of scale factor η. The max function used in Algorithms 2, 3, and 4 can be implemented with an LLIF (linear leaky integrate-and-fire) neuron.
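The positive/negative split described above can be sketched directly: a signed matrix product is computed from four products of non-negative matrices, with a single subtraction at the end, which is exactly the structure a rate-coded substrate needs.

```python
import numpy as np

def pos(M):
    return np.maximum(M, 0.0)    # component-wise positive part

def neg(M):
    return np.maximum(-M, 0.0)   # component-wise negative part

def signed_matmul(W, X):
    """Multiply signed matrices using only non-negative operands:
    W X = (W+ - W-)(X+ - X-) = (W+X+ + W-X-) - (W+X- + W-X+)."""
    plus = pos(W) @ pos(X) + neg(W) @ neg(X)     # terms with positive sign
    minus = pos(W) @ neg(X) + neg(W) @ pos(X)    # terms with negative sign
    return plus - minus

rng = np.random.default_rng(2)
W = rng.uniform(-1.0, 1.0, (3, 3))
X = rng.uniform(-1.0, 1.0, (3, 2))
print(np.allclose(signed_matmul(W, X), W @ X))   # True
```

Because each element of a matrix is nonzero in at most one of its two parts, the four partial products never double-count a term.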

Computes the scaled value of the input features using W_{ff} and the normalizing factor. These scaled values serve as the input for the recurrent network

3: A_{n} = Normalize(A, ‖A‖_{max});

5: B_{n} = Normalize(B, ‖B‖_{max});

Solves a system of linear equations defined as


27: x_{k} = Rescale(x_{k}, η‖B‖_{max});


To implement these arithmetic operations we set the TrueNorth neuron parameters appropriately. A detailed description of individual neuron parameters and their behavior with respect to TrueNorth's spiking neurons can be found in Cassidy et al. (_{1}, and _{2} in Algorithm 4, or, variable _{s} in Algorithm 3.

For applications in which matrix

Since the computation involves sending the values through a recurrent path, it is crucial to maintain independence of spike occurrence between the inputs from the feedforward path and those from the recurrent path. Therefore, the inputs that are fed back need to be passed through a decorrelator, as shown in Figure
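Functionally, a decorrelator regenerates a spike train with the same rate but fresh randomness, so that the recurrent input is statistically independent of the feedforward input. The sketch below illustrates the role it plays; it is not a model of TrueNorth's actual decorrelator circuit.

```python
import random

def decorrelate(spikes, rng):
    """Regenerate a spike train with the same firing rate but fresh randomness,
    breaking tick-by-tick correlation with the original train."""
    rate = sum(spikes) / len(spikes)
    return [1 if rng.random() < rate else 0 for _ in range(len(spikes))]

rng = random.Random(7)
train = [1 if rng.random() < 0.4 else 0 for _ in range(10000)]
fresh = decorrelate(train, rng)
rate_gap = abs(sum(fresh) / len(fresh) - sum(train) / len(train))
agreement = sum(a == b for a, b in zip(train, fresh)) / len(train)
print(rate_gap, agreement)   # same rate, but only chance-level tick agreement
```

Without this step, multiplying a spike train by itself (or by a correlated copy) estimates E[X²] rather than E[X]², biasing the stochastic dot products.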

Computations on the neural network substrate generally have limited precision, producing accumulated error in the output. The two main sources of computation error are (1) quantization of the weights and the input, and (2) stochastic computations, when computations are performed using spiking weights. In this section, we first find the upper bound for the output error for the case where weights are hard-coded into the neural network substrate, considering only quantization errors in the weights and the input. We then update the upper bound for the case in which weights are represented using spikes, so that further stochastic errors arise in the computations.

Given that the elements in the input and weight matrices contain quantization errors, we examine quantization errors in the output. We denote the errors in W_{hop}, W_{ff}, and B_{n} by ΔW_{hop}, ΔW_{ff}, and ΔB_{n}, respectively. If δ_{hop}, δ_{ff}, and δ_{bn} represent upper bounds on the individual elements of ΔW_{hop}, ΔW_{ff}, and ΔB_{n}, respectively, we have by the dimensions of

These errors produce an error ΔH in the output. By comparing with the case of exact input B_{n} and weight matrices W_{ff} and W_{hop} without quantization, we can find an upper bound on the norm of the output error ΔH^{Q}. This result requires a condition on the singular values of the modified iteration matrix W_{hop} + ΔW_{hop}, without which the output errors Δh_{j} at successive iterations j could grow without bound. We define

and require the following condition to hold:

By using the definition Equation (9a), together with Equations (12), (13), (28), and the Wielandt-Hoffmann inequality, we have

Thus a sufficient condition for Equation (30) is

Note that this condition can be satisfied only if σ_{N} > 0. If we assume that in addition to Equation (15), α also satisfies the (not very restrictive) condition

then

This condition bounds the allowable error in the elements of W_{hop} in terms of the steplength α and the spectrum of

Recall for the following result that the dimensions of the matrix B_{n} are

CLAIM 3.

where E_{Q} is defined by

By subtracting h_{k} from both sides of Equation (36), we obtain

By taking norms in Equation (37a) and applying standard norm inequalities, we obtain

where we used Equations (25) and (28) to bound the second term. By applying this formula recursively for

We now obtain a bound on

giving the result. ■

This claim can be used to find the amount of resources needed to generate an output H^{Q} in which the error satisfies a specified bound. The quantities δ_{hop}, δ_{ff}, and δ_{bn} can be manipulated in various ways to meet these goals. For instance, in a neural substrate like TrueNorth, δ_{bn} can be reduced by increasing the number of time ticks at the cost of increased execution time. On the other hand, reductions in δ_{hop} can be achieved by using multiple neurosynaptic cores at the cost of increased area and power. The validation of the model is discussed in section 3. We leave exploration of such optimizations to future work.

As mentioned earlier, stochastic computation is the second key source of error in the Hopfield network. Computation in the stochastic domain is performed not on the exact values of inputs and outputs, but on their expected values. However, the random errors present in the inputs to the computations performed in the network lead to random errors in the output, which accumulate during execution of the Hopfield network. In this section, we seek bounds on the stochastic error. Claim 4 bounds the output error for the entire computation in terms of error bounds for a single stochastic matrix multiplication. We complement this claim by estimating the bound on the stochastic error in a single stochastic matrix multiplication. It is important to note that there are no useful error bounds that hold with absolute certainty: it is possible (though highly unlikely under reasonable assumptions) for stochastic errors to overwhelm the computation. However, we can use information about the distribution of the errors to give some insight into how these errors propagate through the computation, showing conditions under which we can reasonably expect the results to be acceptably accurate.

Claim 3 defines the error bound due to quantization error in terms of a quantity E_{Q} defined in Equation (35). We can use this definition as part of the upper bound for stochastic error. For this analysis we assume that the only stochastic computation performed in the Hopfield network is stochastic multiplication; we assume that additions are exact. Each matrix multiplication yields some stochastic error that propagates through subsequent iterations.

CLAIM 4. Denote by E_{M} a bound on the stochastic multiplication error caused by multiplication of W_{ff} and B_{n}, and denote by E_{N} a bound on the stochastic multiplication error caused by multiplication of W_{hop} and h_{k}. Let E_{Q} be defined as in Claim 3. Then the error bound for ΔH can be estimated as follows:

where Δ_{Mk} and Δ_{Nk} represent the stochastic multiplication errors for W_{ff}B_{n} and W_{hop}h_{k−1}. Our assumptions yield the following bounds on these error quantities:

By comparison with the formula above, and denoting the accumulated stochastic error at iteration k by Δ_{k}:

This formula represents a closed-form expression for the stochastic error, similar to the closed form equation that was used to derive quantization error in Equations (37a) and (37b). By taking norms, and using Equations (29) and (30) together with Equation (40), we have

The result now follows from the bound on

We complete the error analysis by obtaining bounds E_{M} and E_{N} on the stochastic matrix multiplication error norms. Instead of finding a

Before describing the stochastic errors that arise from a matrix multiplication, we examine the error in multiplying two scalar values. Let p_{1} and p_{2} be two such scalars, and let Z = p_{1}p_{2} be their product. On the spiking substrate, p_{1} and p_{2} are represented by spike trains of length L; denoting by p̂_{1} and p̂_{2} (respectively) their empirical firing rates, the estimates of p_{1} and p_{2} are as follows:

Denoting by Ẑ the stochastic estimate of the product of p_{1} and p_{2}, we have that E[Ẑ] = p_{1}p_{2} and

The value of σ_{Z} is small when p_{1} and p_{2} are close to 0 or 1. A closed-form solution can be calculated for the maximum of σ_{Z} over all possible p_{1}, p_{2} ∈ [0, 1].

CLAIM 5.

By taking partial derivatives of the variance with respect to (p_{1}, p_{2}), we obtain

It is easy to check that when p_{1} = p_{2} = 2/3, the gradient is zero and the Hessian is negative definite. Thus p_{1} = p_{2} = 2/3 is an approximate maximizer, and the value of
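The closed-form expression itself did not survive extraction. One candidate form that is consistent with the stated maximizer p_{1} = p_{2} = 2/3 and with the constant 0.296 used in the next paragraph is L·σ_{Z}^{2} = p_{1}p_{2}(2 − p_{1} − p_{2}); treating that form as an assumption, a grid search confirms the maximizer and the maximum 8/27 ≈ 0.296:

```python
import itertools

def f(p1, p2):
    """Assumed form of L * sigma_Z^2; chosen to match the maximizer (2/3, 2/3)
    and the 0.296 constant quoted in the text, not taken from the source."""
    return p1 * p2 * (2.0 - p1 - p2)

grid = [i / 300 for i in range(301)]           # includes 2/3 = 200/300 exactly
best = max(((f(p1, p2), p1, p2)
            for p1, p2 in itertools.product(grid, repeat=2)),
           key=lambda t: t[0])
print(best)   # maximum 8/27 ~ 0.2963 at p1 = p2 = 2/3
```

Setting the gradient of this form to zero gives 2 − 2p_{1} − p_{2} = 0 and 2 − p_{1} − 2p_{2} = 0, whose unique solution is p_{1} = p_{2} = 2/3 with value 8/27.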

Note that each element of the matrix Δ_{Mk} in the proof of Claim 4 is the stochastic error that arises from taking the inner product of two vectors: one row of W_{ff} + ΔW_{ff} and one column of B_{n} + ΔB_{n}. These two vectors have length M, so each element of Δ_{Mk} is the sum of M scalar stochastic errors. Thus each element of Δ_{Mk} is a random variable with expectation 0 and variance bounded by 0.296M/L. An estimate of E_{M} in Equation (40) can be obtained by taking the Frobenius norm of a matrix of such elements, yielding the following estimate for E_{M}:

A similar calculation involving Δ_{Nk} and E_{N}, using the fact that W_{hop} is square, yields the following estimate for E_{N}:

This derivation of suitable values for the bounds _{M} and _{N} is informal, but it suffices to give insight into how these bounds vary with the dimensions of the matrices involved, and with the length of the spike train representation

This section presents the validation results and observations for the mathematical models for the scaling factor and the precision analysis. For all experiments, we set α = 1.9/trace(A^{T}A).

A TrueNorth-based Hopfield linear solver was applied in the context of real-time robotics applications in Shukla et al. (

Table reports results for cases where the dimension of b_{j} in Equation (22a) is 2 × 1.

Results for Hopfield linear solver with spike based weight representation.

Case | Matrix elements | Purpose | Scale factor η | Avg. error | Max. error |
1 | Each element chosen uniformly in [−1, 1] | Most basic test for Hopfield linear solver | 1.05 | 0.0004 | 0.0013 |
2 | Each element is an integer chosen uniformly in [−100, 100] | Observe the behavior of the linear solver as the input range is increased | 3.5 | 0.0025 | 0.008 |
3 | Each element chosen uniformly in [−100, 100] | Each element can have fractional values | 3.5 | 0.0014 | 0.0028 |
4 | Each element chosen uniformly in [1, 100] | All elements of | 4 | 0.0353 | 0.19 |
5 | Each element chosen uniformly in [0.001, 1] | Analyzing the convergence when the values are small | 4 | 0.0038 | 0.008 |
6 | Each element chosen uniformly in [0.0001, 1] | Analyzing the convergence when the possible values are smaller than in previous experiments | 4.25 | 0.0068 | 0.0234 |
7 | Each element chosen uniformly in [−1000, 1000] | Higher precision is required for calculation | 4 | 0.0186 | 0.0413 |
8 | Each element chosen uniformly in [−10,000, 10,000] | Testing for the cases when even more precision is required for calculation | 4 | 0.32 | 0.83 |
9 | Each element chosen uniformly in [1, 10,000] | Matrix | 4 | 1.16 | 2.97 |
10 | Each element chosen uniformly in [−1000, 1000] except that 50% of elements in | Effect of sparsity on final result and convergence | 4 | 0.024 | 0.0488 |
11 | Each element chosen uniformly in [1, 10,000] except for 50% zeros | Effect of sparsity on final result and convergence | 4 | 0.24 | 0.94 |
12 | Each element chosen uniformly in [0.0001, 1] except for 45% zeros in | Effect of sparsity on final result and convergence when elements of | 4.25 | 0.0038 | 0.0114 |
13 | Each element chosen uniformly in [0, 50]. For matrix | Both the eigenvalues of W_{hop} will have magnitude close to 1, but will have opposite signs | 4.25 | 0.37 | 1.01 |
14 | Each element chosen uniformly in [−5 × 10^{5}, 5 × 10^{5}] | Testing for the cases when up to 10^{−6} precision would be required for calculation | 4.25 | 5.11 | 9.54 |
15 | Each element chosen uniformly in [1, 5 × 10^{5}] | Precision of better than 10^{−6} would be required for calculation and all matrix input values have the same sign | 4.25 | 96.48 | 316.54 |

In Table

Inputs that require low precision for computations (Experiments 1, 2, and 3 in Table ) converge quickly, whereas inputs that require a precision of 10^{−6} take much longer to reach a solution. Since the proposed work uses stochastic computing, it would require at least 1 million ticks in the best-case scenario to represent a precision of 10^{−6} for a single value, as well as requiring more iterations to converge. While implementing the Hopfield solver on a spiking neural substrate such as TrueNorth, the developer has to consider this speed-accuracy tradeoff. For low-precision values, the Hopfield solver converges faster, but many more ticks may be required for high-precision values.

When the firing rates of TrueNorth neurons saturate, the actual outputs of the Hopfield linear solver algorithm may no longer match the expected output; in fact, the difference may be quite large. However, for a large enough input scaling factor, the firing rates of neurons will be low enough so that they will never saturate.

We refer to cases 1, 2, and 3 in Table . If the elements of a matrix are small, σ_{min} will be small as well; as a result, we get a larger scale factor. The scenarios where η is close to the desired bound are those in which all of the elements in a matrix are the same and each element has a value of high magnitude, similar to Case 2 in Table

Sample matrices for worked out examples.

η = 60 | η = 20 | η = 30 | δ_{hardcoded} = 0.025%, δ_{spiking} = 3.39% | δ_{hardcoded} = N/A, δ_{spiking} = 0.8% |

Comparison of scaling factor for different matrix structures. Table

Cases 4 and 5 of Table compare the behavior when the weight matrices W_{ff} and W_{hop} are either hard-coded on TrueNorth or supplied as spike train inputs. As per Case 4, the absolute error for hardcoded weights (δ_{hardcoded}) is less than for spiking weights (δ_{spiking}) for the same number of spike ticks. This is because hardcoding the weights gives us more control over precision compared with spiking weights. In Case 5, δ_{hardcoded} cannot be computed because the TrueNorth neuron's threshold parameter has a limited number of bits, so W_{ff} cannot be mapped onto the board using the technique of Algorithm 1. This problem does not occur with the spike train representation, as higher precision can be represented with a longer duration.

The goal of this section is to compare the quantization error bound and the stochastic error bound with the worst-case erroneous output of the Hopfield linear solver. We do so by injecting errors into the weight and input matrices and measuring the resulting error, evaluating whether the bounds are tight enough to lie close to the worst-case erroneous Hopfield linear solver output. First, quantization error is introduced in the weight and input matrices and its effect on the linear solver is analyzed. Next, stochastic error is added to the weight and input matrices, and the linear solver simulation results are compared with the stochastic bound that was derived in section 2.5.2. Despite being hard to predict, our simulations show that the stochastic error can reach an average of 70% of the bound, demonstrating sufficient agreement between the analysis and the simulated data.

In order to evaluate the effect of quantization error, the output of the Hopfield network is compared for two cases. In the first case, we provide the exact input and weights to the Hopfield network and calculate the output. In the second case, we introduce some error to the input and the weights equal to the maximum quantization error, and again evaluate the output of the network. Finally, we compare these two outputs and calculate the output error.
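This perturb-and-compare methodology can be sketched in floating point. The matrices, the 2^{−8} quantization step, and the uniform worst-case perturbation below are illustrative stand-ins for the actual TrueNorth quantization.

```python
import numpy as np

def solve(A, B, dW_hop=None, dW_ff=None, dB=None, iters=400):
    """Hopfield iteration with optional additive perturbations that stand in
    for worst-case quantization error in the weights and the input."""
    alpha = 1.9 / np.trace(A.T @ A)
    W_hop = np.eye(A.shape[1]) - alpha * (A.T @ A)
    W_ff = alpha * A.T
    if dW_hop is not None:
        W_hop = W_hop + dW_hop
    if dW_ff is not None:
        W_ff = W_ff + dW_ff
    Bq = B if dB is None else B + dB
    X = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(iters):
        X = W_hop @ X + W_ff @ Bq
    return X

A = np.array([[2., 0., 1.], [0., 3., 1.], [1., 1., 4.], [0., 1., 2.]])
B = np.array([[1.], [-0.5], [0.25], [0.75]])
delta = 2.0 ** -8                       # one illustrative quantization step
X_exact = solve(A, B)
X_quant = solve(A, B,
                dW_hop=np.full((3, 3), delta),
                dW_ff=np.full((3, 4), delta),
                dB=np.full((4, 1), delta))
print(np.abs(X_quant - X_exact).max())  # small but nonzero output error
```

The measured difference is the quantity that section 2.5.1 bounds from above; Claim 3 additionally requires the perturbed iteration matrix to remain contractive, which holds here since the perturbation is small.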

The error evaluation results are shown in Table

Quantization error simulation results.

Case 1 | 90.75 |

Case 2 | 5.13 |

Case 3 | 38.63 |

Case 4 | 13.89 |

Case 5 | 24.86 |

Next, we evaluate the stochastic error bound using a method similar to the one used for quantization error. However, due to the randomness of the stochastic error, we repeat each test 100 times and report both the average over all repetitions and their maximum. As mentioned in section 2.5.2, the calculated bound does not define an absolute upper limit for each repetition, but rather a bound for their average. In other words, we expect the average error over multiple runs to be smaller than the estimated bound, but the maximum value of the error could exceed the bound.

The results for three representative cases (Cases 1, 2, and 3, from Table

Stochastic error for different spike train lengths. Table

Simulations of the quantization and stochastic errors show that the proposed bounds provide reasonable upper limits for the error of the Hopfield network implemented on neural network hardware. Therefore, these bounds can be used to gain insight into the precision of this network for any set of inputs and weights, before running the algorithm. In addition, they can be used to allocate appropriate resources in order to achieve a specific output precision.

Prior work Shukla et al. (

The implementation of a Hopfield linear solver in such a setup is challenging, since the Hopfield neural network weights (W_{ff} and W_{hop}) change continuously. Also, in this setup there is no training or testing data involved: the goal is to compute the results online, just by looking at the streaming input values, without any prior knowledge of the experiment or scenario. We observe additional benefits from deploying multiple linear solvers in parallel, since we have to calculate the pseudoinverse for multiple different locations in the image at the same time. These experiments give us better insight into selecting TrueNorth as a potential substrate for deploying such algorithms, and provide a vehicle for energy analysis against more traditional approaches. In this experiment we measure the motion vector error against the baseline, but we also use an approximately-correct metric: as long as the solver detects flow in the correct one of eight possible ordinal and cardinal directions, we count it as correct. The velocity of the movement of the two bars is calculated by solving a linear system built from the spatial and temporal image derivatives I_{x}(x, y), I_{y}(x, y), and I_{t}(x, y).

In the proposed setup, we can have multiple input matrices A and B (see Equation 3) that are independent of each other, since the convolution operation can act on separate, independent patches of the image at the same time. The results of these independent convolutions can be streamed as different input matrices A and B. As a result, multiple independent linear solvers can run in parallel to compute the pseudoinverses of these different input matrices. For a frame of size 120-by-180 pixels, the linear solver implementation processed 9,800 pixels of a single frame to predict the motion vectors.
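The per-patch computation can be sketched as follows (a hypothetical Lucas-Kanade-style example with synthetic gradients; `patch_flow` and the patch size are assumptions for illustration, and NumPy's `pinv` stands in for the Hopfield solver):

```python
import numpy as np

def patch_flow(Ix, Iy, It):
    """Flow for one patch: stack the spatial gradients into A and
    solve A v = -It in the least-squares sense via the pseudoinverse
    (the role played by one Hopfield solver instance)."""
    A = np.column_stack([Ix.ravel(), Iy.ravel()])
    b = -It.ravel()
    return np.linalg.pinv(A) @ b   # (u, v) motion vector

rng = np.random.default_rng(1)
# Synthetic 5x5 patch whose gradients are consistent with a known
# motion (u, v) = (1.0, -0.5): brightness constancy gives
# It = -(Ix*u + Iy*v).
Ix = rng.standard_normal((5, 5))
Iy = rng.standard_normal((5, 5))
It = -(Ix * 1.0 + Iy * -0.5)

u, v = patch_flow(Ix, Iy, It)   # recovers (1.0, -0.5)

# Independent patches can be dispatched to parallel solver instances:
patches = [(Ix, Iy, It)] * 4
flows = [patch_flow(*p) for p in patches]
```

Because each patch yields its own independent system, the per-patch solves map naturally onto the parallel solver instances described above.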

Using the optical flow implementation described above, we compare the power and energy consumption of the TrueNorth-based linear solver with more traditional approaches: a QR inverse algorithm implemented on a Virtex-7 FPGA (xc7vx980t) and on an ARM Cortex-A15 mobile processor.

On TrueNorth we can implement 392 instances of the Hopfield linear solver that operate in parallel, independently of one another. These 392 instances require 4,092 of the available 4,096 cores and can process roughly 9,800 pixels for predicting the motion vectors. Therefore, in the specified scenario, the optical flow motion vectors for a single 120-by-180-pixel frame must be computed in two batches of streaming input pixels.

To maintain a throughput of 30 FPS for 9,800 pixels, we needed an 8-core ARM chip operating at 2.5 GHz. For the same FPS and pixel count, we instantiated 32 instances of the QR inverse algorithm on the Virtex-7. A detailed discussion of each implementation technique follows:

Figure

This figure compares three different implementation techniques for matrix inversion. The Y-axis shows the percentage accuracy in predicting the motion of the bars for optical flow, and the X-axis shows the energy consumed per frame (in Joules).

To the best of our knowledge, this paper is the first attempt to formalize a mathematical framework for determining scaling factors and error bounds when deploying a recurrent numerical solver on limited-precision neural hardware. The proposed research developed a mathematical and algorithmic framework for calculating generalized matrix inverses on this hardware platform. Apart from its use in real-time robotics applications, the proposed algorithm could also be used for on-chip training of multi-layered perceptrons and extreme learning machines (Tang et al.,

First, section 3.1 compares the results of the proposed linear solver against MATLAB's double-precision pseudo-inverse function. Results presented in Table

Second, section 3.2 presents the range analysis of the Hopfield linear solver. We can guarantee that the proposed scaling factor keeps the firing rates of the neurons low enough that they never saturate. Similarly, in section 3.3, we validate the bounds proposed in the precision analysis. For quantization error, the experimental error can come very close to the estimated bound (reaching 91% of it), indicating that the bound is tight enough to be useful. For stochastic error, the average experimental error always remains below the bound.

Finally, section 3.4 compares the TrueNorth-based Hopfield linear solver against standard QR inverse algorithms implemented on an ARM processor and an FPGA. Experiments with the optical-flow application showed the energy benefits of deploying a reduced-precision, energy-efficient generalized matrix inverse engine on the IBM TrueNorth platform. Since the TrueNorth architecture was designed to be low power, deploying multiple linear solvers in parallel can give a 10× to 100× improvement in energy consumed per frame over the FPGA and ARM core baselines.

Sections 2.3.2 and 2.4 present algorithms that can compute matrix inverses using concepts from stochastic computing (Gaines,

In future work, we will look into speeding up the computation by using a population coding scheme for encoding values as spikes. Our current implementation uses a rate coding technique that encodes each value with a single neuron. Given the resources available on the TrueNorth board, a population coding scheme could perform these computations in parallel, reducing the time to solution.
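The trade-off can be illustrated with a toy Bernoulli model (an assumption for this sketch; the paper's encoding details may differ): splitting a fixed spike budget of T ticks across N neurons preserves the statistical precision of the pooled estimate while cutting the wall-clock time from T ticks to T/N.

```python
import numpy as np

rng = np.random.default_rng(7)
x = 0.6        # value to encode, in [0, 1]
T = 1024       # total spike budget (ticks)

# Rate coding: a single neuron observed for T ticks.
rate_est = (rng.random(T) < x).mean()

# Population coding (one simple variant, not necessarily the scheme
# we will adopt): N neurons observed for T // N ticks each, with
# their spike counts pooled. Same budget, same precision, but the
# observation window shrinks from T to T // N ticks.
N = 8
pop_est = (rng.random((N, T // N)) < x).mean()
```

Both estimates have the same variance under this model; the population variant simply buys latency with neurons, which is the resource TrueNorth has in abundance.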

Large scaling factors can result in longer computation times, since more ticks are required to represent the scaled values accurately. Conversely, tight scaling factors that still avoid saturation require computing the matrix pseudoinverse on external hardware, which complicates deployment of the Hopfield solver in scenarios where the matrix changes over time. Developing tighter bounds, especially ones that are easy to compute online, would avoid both problems.

Finally, in the architecture proposed in this paper, the term α was precomputed. As part of our future work, we would like to create a TrueNorth-based framework in which α is computed dynamically via spikes.

RS prepared the manuscript, performed the scaling analysis of the linear solver, devised the algorithm for the hardware implementation of the linear solver, and carried out all hardware-related experiments (TrueNorth, FPGA, and ARM) as well as the hardware implementation. RS also performed the scaling-factor experiments. SK was responsible for the mathematical theory of the error analysis and the precision-based error analysis. EJ implemented the algorithm on TrueNorth hardware. JL and ML guided the project and contributed the initial ideas. SW suggested corrections for the mathematical theorems and proofs, and revised the manuscript.

ML has a financial interest in Thalchemy Corp. and is a co-founder of that corporation. Thalchemy Corp. was not involved in this research in any form. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.