
Edited by: Jonathan C. Tapson, Western Sydney University, Australia

Reviewed by: Fabio Stefanini, Columbia University Medical Center, USA; Sio Hoi Ieng, University of Pierre and Marie Curie, France; Nabil Imam, Cornell University, USA

*Correspondence: Tobi Delbruck

This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

In this study we compare nine optical flow algorithms that locally measure the flow normal to edges, evaluating them in terms of accuracy and computation cost. In contrast to conventional, frame-based motion flow algorithms, our open-source implementations compute optical flow based on address-events from a neuromorphic Dynamic Vision Sensor (DVS). For this benchmarking we created a dataset of two synthesized and three real samples recorded from a 240 × 180 pixel Dynamic and Active-pixel Vision Sensor (DAVIS). This dataset contains events from the DVS as well as conventional frames to support testing state-of-the-art frame-based methods. We introduce a new source for the ground truth: in the special case that the perceived motion stems solely from a rotation of the vision sensor around its three camera axes, the true optical flow can be estimated using gyro data from the inertial measurement unit integrated with the DAVIS camera. This provides a ground truth to which we can compare algorithms that measure optical flow by means of motion cues. An analysis of error sources led to the use of a refractory period, more accurate numerical derivatives and a Savitzky-Golay filter to achieve significant improvements in accuracy. Our pure Java implementations of two recently published algorithms reduce computational cost by up to 29% compared to the original implementations. Two of the algorithms introduced in this paper further speed up processing by a factor of 10 compared with the original implementations, at equal or better accuracy. On a desktop PC, they run in real-time on dense natural input recorded by a DAVIS camera.

Accurate and fast measurement of optical flow is a necessary requirement for using this flow in vision tasks such as detecting moving obstacles crossing the path of a vehicle, visually guiding aircraft or space vehicle landing, or acquiring structure from motion information about the environment. The progress of optical flow estimation techniques is marked by two major stepping stones: the quantitative evaluation of optical flow algorithms by Barron et al. and the later Middlebury benchmark of Baker et al.

In contrast to conventional image sensors, the dynamic vision sensor (DVS) camera produces not frames but asynchronous address-events (AE) as output (Lichtsteiner et al.).

Optical flow algorithms operating on the DVS output can benefit from these characteristics. They offer a solution to one challenge frame-based techniques face, namely large inter-frame displacements that occur in fast motion. For instance, Benosman et al. (

This work proposes a novel source of ground truth for optical flow evaluation, namely rate gyro data from an inertial measurement unit on the camera. In our dataset, camera motion is restricted to camera rotation, so all optical flow can be computed using the rate gyro information. A database is created and offered to compare several event-based optical flow algorithms. We introduce a simple smoothing filter to increase accuracy while reducing computation cost. The database as well as all the code is made public; the dataset link is provided together with a detailed description in Section 2.2 and each algorithm's open-source software implementation is provided in a footnote link. Section 3 presents the benchmarking results, which are discussed in Section 4 and lead us to the conclusion that economical real-time event-driven computation of motion flow is possible, but further development will be required for many practical applications.

We implemented and tested a direction selective filter (Delbruck)^{2}, four variants of an event-based Lucas-Kanade algorithm, and four variants of an event-based local plane fit, as well as the IMU-based computation of the motion field that serves as ground truth.

Except for the direction-selective (DS) method, this evaluation mainly focuses on gradient-based motion estimation methods that operate on the DVS AER events. Gradient-based approaches compute a first-order derivative on some image property to estimate motion. In frame-based methods, this property is typically luminance. While we refer to all methods in Sections 2.1.2 and 2.1.3 as gradient-based, note that they operate on a different function: The four Lucas-Kanade (LK) variants (Section 2.1.2) use a change in light intensity, while the four local-plane (LP) algorithms in Section 2.1.3 compute gradients on a surface containing the timestamps t(x,y) of the most recent DVS events as a function of pixel location.

We refer to this method by the acronym “DS.” The DS method was developed by T. Delbruck in 2007 as the jAER class DirectionSelectiveFlow.

In the DS algorithm, each incoming DVS event is preprocessed by an orientation filter called SimpleOrientationFilter^{3}, which labels events with the orientation of the local edge.

Next, for each orientation event, DirectionSelectiveFilter uses time of flight to compute the motion of this edge. It does this by computing the time differences between the current orientation event and the most recent past orientation events along two lines of pixels extending out from the current event's pixel in the two directions perpendicular to the edge; i.e., if the edge is vertical, then the two horizontal directions left and right are checked. The search distance is a parameter typically set to 5 pixels. The reciprocal of the average time difference per pixel is then the speed of the edge in pixels per second. In this computation, past orientation event timestamps that are too old (typically more than 100 ms) are not counted. In one of the two directions, the timestamp differences will likely indicate a reasonable speed; in the other direction the timestamp differences will typically be very large, because those orientation events resulted from previous edges. The output events are labeled with this scalar speed and a quantized angular direction with values 0–7, 0 being upward motion, increasing by 45° counter-clockwise to 7 being motion up and to the right.
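The time-of-flight scheme above can be sketched as follows; this is an illustrative reimplementation with assumed variable names and units (microsecond timestamps, 100 ms staleness cutoff), not the actual jAER DirectionSelectiveFilter code:

```java
// Sketch of the DS time-of-flight speed estimate: given the timestamps of
// the most recent orientation events along a line of pixels perpendicular
// to the edge, the speed is the reciprocal of the average per-pixel time
// difference. Stale timestamps (older than 100 ms) are ignored.
public class DsTimeOfFlight {
    static final long MAX_AGE_US = 100_000; // ignore events older than 100 ms

    /**
     * @param tsUs  timestamps (us) of the most recent orientation event at
     *              pixels 1..d away from the current pixel (nearest first)
     * @param nowUs timestamp (us) of the current orientation event
     * @return speed in pixels/second, or 0 if no valid neighbors
     */
    static double speedPxPerSec(long[] tsUs, long nowUs) {
        double sumDtPerPx = 0;
        int n = 0;
        for (int d = 1; d <= tsUs.length; d++) {
            long dt = nowUs - tsUs[d - 1];
            if (dt <= 0 || dt > MAX_AGE_US) continue; // stale or invalid
            sumDtPerPx += (double) dt / d;            // us per pixel
            n++;
        }
        if (n == 0) return 0;
        double avgDtUs = sumDtPerPx / n;              // average us per pixel
        return 1e6 / avgDtUs;                          // pixels per second
    }

    public static void main(String[] args) {
        // Edge moving at 100 px/s: neighbors fired 10 ms, 20 ms, ... ago.
        long[] ts = {990_000, 980_000, 970_000, 960_000, 950_000};
        System.out.println(speedPxPerSec(ts, 1_000_000)); // ~100 px/s
    }
}
```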

We refer to these methods by the acronym “LK.” They are implemented as the jAER class LucasKanadeFlow^{4}.

The method makes use of the assumption that light intensity I(x, y, t) is conserved under motion^{5}. A first-order Taylor expansion of this constraint yields the gradient constraint equation I_{x}v_{x} + I_{y}v_{y} + I_{t} = 0. Applying it to every pixel of an n × n neighborhood yields a system of n^{2} equations:

The Least-Squares solution to this matrix equation is v = (A^{T}A)^{−1}A^{T}b. The 2 × 2 matrix A^{T}A is invertible if its eigenvalues satisfy λ_{1} ≥ λ_{2} > 0. Thus, the eigenvalues serve as confidence measures, i.e., as means of determining the correctness of the computed velocities. No velocity is computed if both eigenvalues are smaller than a certain confidence threshold τ, i.e., λ_{1} < τ. If both are greater than τ, the matrix is considered invertible and the velocity can be computed as shown. If λ_{1} ≥ τ and λ_{2} < τ, we compute

Inserting Equation (3) back into the gradient constraint Equation (1) confirms the validity of this formula. Note that it is not advisable to use this equation in the first place, sidestepping the Least-Squares fit of the whole neighborhood: that would result in very noisy flow fields, whereas taking the whole neighborhood into account helps smooth local fluctuations.
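The least-squares step with the eigenvalue confidence test can be sketched as below; the array layout and the Cramer's-rule solve are illustrative assumptions, and the special case λ_{1} ≥ τ > λ_{2} is omitted for brevity:

```java
// Hedged sketch of the Lucas-Kanade least-squares step: stack the gradient
// constraints Ix*vx + Iy*vy = -It over the neighborhood and solve the 2x2
// normal equations, rejecting the solution when the smaller eigenvalue of
// A^T A falls below the confidence threshold tau. Not the jAER code.
public class LkSolve {
    /** Returns {vx, vy}, or null if A^T A is not confidently invertible. */
    static double[] solve(double[] ix, double[] iy, double[] it, double tau) {
        double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
        for (int i = 0; i < ix.length; i++) {
            sxx += ix[i] * ix[i];  sxy += ix[i] * iy[i];  syy += iy[i] * iy[i];
            sxt += ix[i] * it[i];  syt += iy[i] * it[i];
        }
        // Eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
        double tr = sxx + syy, det = sxx * syy - sxy * sxy;
        double disc = Math.sqrt(tr * tr / 4 - det);
        double l2 = tr / 2 - disc;            // smaller eigenvalue
        if (l2 < tau) return null;            // not confidently invertible
        // v = -(A^T A)^{-1} A^T It, via Cramer's rule for the 2x2 system.
        double vx = (-sxt * syy + syt * sxy) / det;
        double vy = (-syt * sxx + sxt * sxy) / det;
        return new double[]{vx, vy};
    }
}
```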

The appropriate value of τ^{6}

In the case of DVS output, Equation (2) has to be reformulated in terms of address-events. To estimate the spatial gradients without gray levels (events are ON/OFF type only), one can count the number of events that occurred at adjacent pixels in the neighborhood during a past time interval Δt.

This equation relates the differential flow brightness consistency constraint to AER events. Image intensities are then approximated by event summations, because the dynamic vision sensor (DVS) does not provide absolute intensities. The results of this approach are shown in Section 3 under the label LK_{BD}.

As we will discuss in Section 4, the main problem of event-based gradient methods is the potentially small number of events in a neighborhood, making the derivative-estimation unstable. Brosch et al. (

The effect on accuracy of this consistent use of second derivatives is discussed in Section 4. Note that one factor

The original method of Benosman et al. estimates the derivatives with asymmetric backward differences of the form f_{j} − f_{j−1}. We replaced them by symmetric central differences (f_{j+1} − f_{j−1})/2, which remove the directional bias of the backward difference. We label this variant LK_{CD1} in Section 3.

Because of the crucial role the derivative approximation plays in estimating flow vectors here, we tested the effect of applying a finite difference with a higher order of accuracy by using more points on the accumulated event histogram. Above, the central difference coefficients are (−1/2, 0, 1/2) at locations (x_{j−1}, x_{j}, x_{j+1}). We label this higher-order variant LK_{CD2} and compare it in Section 3.
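The difference between the two estimators is easy to demonstrate on a smooth test function; this is a generic numerical sketch, not tied to the event-rate function:

```java
// Backward difference (f(x) - f(x-h))/h is only first-order accurate and
// asymmetric; the central difference (f(x+h) - f(x-h))/(2h) is symmetric
// and second-order accurate (exact for quadratics).
import java.util.function.DoubleUnaryOperator;

public class FiniteDiff {
    static double backward(DoubleUnaryOperator f, double x, double h) {
        return (f.applyAsDouble(x) - f.applyAsDouble(x - h)) / h;
    }
    static double central(DoubleUnaryOperator f, double x, double h) {
        return (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2 * h);
    }
    public static void main(String[] args) {
        DoubleUnaryOperator f = x -> x * x; // f'(2) = 4 exactly
        System.out.println(backward(f, 2, 0.1)); // ~3.9, biased low
        System.out.println(central(f, 2, 0.1));  // ~4.0, exact for quadratics
    }
}
```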

The event-based Lucas-Kanade method treated here computes derivatives of a noisy event-rate function. To smooth this function, we apply a Savitzky-Golay (SG) filter, which fits a low-order polynomial f(x, y) = a_{00} + a_{10}x + a_{01}y to the pixel neighborhood in the least-squares sense. The fit parameters {a_{00}, a_{10}, a_{01}} are obtained from the pseudoinverse (X^{T}X)^{−1}X^{T} of the matrix X containing the polynomial terms evaluated at the neighborhood locations. Thus, the fit amounts to a set of fixed convolution kernels {k_{00}, k_{10}, k_{01}} that act on the pixel neighborhood, where each of the kernels is a row of the pseudoinverse.

For the specific example of a first-order filter that estimates the first derivatives in a 3 × 3 neighborhood, Equation (7) results in the kernels {k_{10}, k_{01}} (rearranged to show the spatial operation on the neighborhood)

The fit parameters a_{pq} are calculated once for the neighborhood of the event in question, using Equation (7). The gradient of this surface can then be read out directly from the first-order fit terms. It is not necessary to evaluate the fitted function over all the points, but these smoothed points could be computed by inserting each x, y coordinate of the neighborhood into Equation (6). The results of using the SG filter are presented in Section 3 under the label LK_{SG}.
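For the 3 × 3 first-order case, the pseudoinverse rows reduce to particularly simple kernels, because the basis {1, x, y} is orthogonal over the symmetric grid x, y ∈ {−1, 0, 1}; the following sketch (with assumed names, not the jAER implementation) applies them directly:

```java
// Savitzky-Golay kernels for a first-order fit f(x,y) = a00 + a10*x + a01*y
// on a 3x3 neighborhood. Over the symmetric grid the basis is orthogonal,
// so X^T X is diagonal and the pseudoinverse rows become simple kernels:
// a00 is the 1/9 average, a10 weights each pixel by x/6, a01 by y/6.
public class SgKernels3x3 {
    /** Returns {a00, a10, a01} for a row-major 3x3 patch of values. */
    static double[] fit(double[] patch) {
        double a00 = 0, a10 = 0, a01 = 0;
        int i = 0;
        for (int y = -1; y <= 1; y++) {
            for (int x = -1; x <= 1; x++, i++) {
                a00 += patch[i] / 9.0;     // k00: averaging kernel
                a10 += patch[i] * x / 6.0; // k10: d/dx kernel
                a01 += patch[i] * y / 6.0; // k01: d/dy kernel
            }
        }
        return new double[]{a00, a10, a01};
    }
}
```

Because the kernels are fixed, each event costs only one pass over its neighborhood, with no per-event linear solve.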

We refer to these methods by the acronym “LP.” They are implemented as the jAER class LocalPlanesFlow^{7}.

When each event at pixel location (

This formulation assumes time to be a strictly increasing function of space, such that the local derivatives

To correctly deal with vanishing gradients in one direction, we first use the fact that the true

This formulation is robust against vanishing derivatives

As a confidence measure for the rare case that both gradients vanish (flat plane, infinite velocity), we introduce a threshold th3. If both partial derivatives of the fitted plane are smaller than th3, the event is discarded; in this way, unrealistic speeds above 10^{3} pixels/second are not reported.
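The robust velocity readout with the flat-plane guard can be sketched as follows; the threshold value and variable names are illustrative assumptions, not the values used in the benchmark:

```java
// With a fitted timestamp surface t(x,y), the gradient g = (dt/dx, dt/dy)
// has units of seconds per pixel, and the normal flow is v = g / |g|^2
// (speed 1/|g| along the gradient). A vanishing component simply yields a
// zero velocity component, while the threshold guards the flat-plane case
// in which the speed would diverge.
public class LocalPlaneFlowReadout {
    static final double TH3 = 1e-5; // s/px; below this the plane is "flat"

    /** @param gx gradient dt/dx of the timestamp surface in s/px
     *  @param gy gradient dt/dy of the timestamp surface in s/px
     *  @return normal flow {vx, vy} in px/s, or null if unreliable */
    static double[] normalFlow(double gx, double gy) {
        if (Math.abs(gx) < TH3 && Math.abs(gy) < TH3) return null;
        double g2 = gx * gx + gy * gy;
        return new double[]{gx / g2, gy / g2};
    }
}
```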

Brosch et al. describe the plane fit in homogeneous coordinates: with the plane normal n = (a, b, c)^{T}, the contour orientation and the velocity are written as homogeneous vectors, and requiring orthogonality between n and these vectors in the components n_{x}, n_{y} results in Equation (9). However, using homogeneous coordinates in this way is misleading, as it implies checking orthogonality between vectors of different vector spaces: the normal n lives in ℝ^{3} (with scale c rather than 1 as in v_{h}), whereas v_{h} is a homogeneous representative in projective space. Instead of v_{h}, we use the vector containing infinitesimal increments along the x, y, and t axes: (dx, dy, dt)^{T}. This vector equals v_{h} up to a scale dt, which drops out in the orthogonality relations. Thus, we arrive at the same result without the need to change over to projective space.

We implemented both the original method^{8} and the robust reformulation^{9}, whose results appear in Section 3 under the labels LP_{orig} and LP_{robust}, respectively.

How critical for accuracy is the iterative improvement of the plane fit, as described in the second paragraph of this section? One could argue that the iterative rejection of distant events makes sense only when the fit is already good: If the initial plane happened to misrepresent local motion, a valid event would be considered an outlier and wrongly removed. Using simply the initial fit to estimate the gradient substantially reduces the amount of computations needed: The re-computation of the plane equation for the whole neighborhood as well as the eigenvalue-decomposition for the reduced data matrix drop out. But even for the initial plane fit we can dispense with the costly eigenvalue-decomposition. The vertical offset

We measured the difference in processing time for both methods, as well as the error obtained right after the initial fit (labeled LP_{SF})^{10}.

In an attempt to smooth the often noisy surface of recent events without distorting the signal too much, we applied a two-dimensional linear Savitzky-Golay filter as described above. As for the event-rate function, we could obtain the smoothed surface by calculating the parameters a_{pq} once for the neighborhood of the event using Equation (7) and inserting each point of the neighborhood into Equation (6). However, if one is interested only in the gradient of a linear fit, as in the context of the Local Planes method, this last step can be skipped. After computing the three parameters of the fit t(x, y) = a_{00} + a_{10}x + a_{01}y, the plane gradient is directly (a_{10}, a_{01}).

Computing the parameters

The results of this method are shown in Section 3 under the label LP_{SG}^{11}.

We refer to this method of computing optical flow as IMU. It is implemented as the jAER class IMUFlow^{12}^{13}.

The gyro data is given as rotational angular rates around the three camera axes (see Figure). The rotation angle x_{n} is obtained by integrating the IMU rate sample ẋ_{n} taken at time t_{n}, as in Equation (10):

The subscript n indicates the position in the sample series (we drop it in the following). The IMU updates the angular rates at a sample rate of 2.5 kHz (the IMU has been fully interfaced to the camera CPLD since the publication of Delbruck et al.).

These rates offer a way to obtain an estimate of the true optical flow produced by a pure rotation of the camera. As shown in Delbruck et al., the integrated rotation angles can be used to compute each event's motion-field vector (v_{x}, v_{y}) according to Equation (12):
Here p_{0}, the pixel address nearest the center of the IMU, is the center of the pixel array. Equation (12) assumes a pinhole lens with no distortion, so that all pixels have the same magnification, which was approximately the case for the lens used in this study, although on the periphery some slight lens distortion is evident from the data.

The z image rotation matrix

To scale from camera pan and tilt rotation angle to DVS pixels, we multiply with the conversion factor

This equation is derived from the trigonometric identity that the tangent of the angle that subtends one pixel is equal to the pixel pitch divided by the lens focal length.
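A minimal worked example of this conversion factor, with illustrative values (18.5 μm is the published DAVIS240 pixel pitch; the 4.5 mm focal length is an assumption, not necessarily the lens used in the study):

```java
// The angle subtended by one pixel satisfies tan(theta) = pitch / f, so in
// the small-angle regime one radian of camera rotation corresponds to
// approximately f / pitch pixels of image shift.
public class PixelsPerRadian {
    static double pixelsPerRadian(double focalLengthMm, double pixelPitchMm) {
        return focalLengthMm / pixelPitchMm; // small-angle approximation
    }
    public static void main(String[] args) {
        // e.g., a 4.5 mm lens and the 18.5 um DAVIS240 pixel pitch:
        System.out.println(pixelsPerRadian(4.5, 0.0185)); // ~243 px/rad
    }
}
```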

Once we know the transformed event's address

This computation of course assumes that the objects seen by the camera are stationary, and the camera is only rotating, not translating through space. In this study, we excluded camera translations by fixing the sensor on a rotor (see Section 2.2).
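The per-event motion field from pure rotation can be sketched as below; the sign conventions and parameter names are our own assumptions under a small-angle pinhole model, not the jAER IMUFlow code:

```java
// Ground-truth flow from gyro rates for a pure camera rotation: pan and
// tilt rates translate the whole image via the focal length expressed in
// pixels, while roll rotates the image about the optical center. Signs
// depend on the chosen axis conventions.
public class ImuFlowSketch {
    /**
     * @param x        event pixel x address
     * @param y        event pixel y address
     * @param x0       x of the optical center (pixel array center)
     * @param y0       y of the optical center
     * @param fPx      lens focal length expressed in pixels
     * @param panRate  rotation rate about the y axis, rad/s
     * @param tiltRate rotation rate about the x axis, rad/s
     * @param rollRate rotation rate about the z axis, rad/s
     * @return motion-field vector {vx, vy} in px/s at (x, y)
     */
    static double[] flow(double x, double y, double x0, double y0, double fPx,
                         double panRate, double tiltRate, double rollRate) {
        double vx = fPx * panRate  - rollRate * (y - y0); // pan + roll terms
        double vy = fPx * tiltRate + rollRate * (x - x0); // tilt + roll terms
        return new double[]{vx, vy};
    }
}
```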

The implemented computation of the motion field, based on integration and then discrete differentiation of the gyro rates for each event, is inherited from the dual use in jAER of the rate gyro information for stabilizing (derotating) the DVS output (Delbruck et al.).

Optical flow estimated from IMU gyro data is useful as ground truth where either the visual motion field (not just normal flow) is measured, or the orientation of the contour is known so that the ground truth vectors can be projected onto its normal (e.g., in direction selective filters).

To evaluate the methods presented above, we created a public dataset^{14}^{15}

Note that the gradient-based and direction-selective methods discussed here only compute normal flow, i.e., the velocity perpendicular to the orientation of a moving edge. We therefore focused on collecting a dataset that contained only normal flow.

Address-events with timestamp and polarity can be created in Matlab and imported into jAER. The script for creating this dataset is available in the dataset and the script for importing these synthetic events is located in the jAER repository in the matlab folder^{16}

The first is a square of width 40 pixels translating with

The second sample is a bar of width 1 pixel and length 50 pixels rotating with rotation rate 0.21 Hz. In this case, normal flow matches object flow. We expect the flow vector direction to vary continuously with the rotation angle, and the speed to be proportional to the distance from the center of rotation.
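The expected speed follows directly from the rotation rate; a minimal check of the relation speed = 2πf·r (our own worked example):

```java
// Expected normal-flow speed on the rotating bar: with rotation rate
// f = 0.21 Hz, a point at radius r pixels moves at 2*pi*f*r px/s,
// i.e., proportional to the distance from the center of rotation.
public class RotatingBarSpeed {
    static double speedPxPerSec(double rotationHz, double radiusPx) {
        return 2 * Math.PI * rotationHz * radiusPx;
    }
    public static void main(String[] args) {
        // Tip of the 50-pixel bar lies 25 px from the center of rotation.
        System.out.println(speedPxPerSec(0.21, 25)); // ~33 px/s
    }
}
```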

To record the samples from real input, the DVS was mounted on a rotor which restricted motion to either pan, tilt or roll, depending on how the camera was fastened. The pan, tilt and roll were arranged to create pure camera rotation; in the case of pan and tilt movements the scene was sufficiently far from the sensor that the slight camera translation created negligible flow. The rotational speed was controlled via a DC-voltage source. The visual stimulus for the rotating bar and translating sinusoid was printed on paper and fixed in front of the camera at a distance of approximately 30 cm. Images of the experimental setup and of the input stimuli are provided in the supplementary material.

The first real sample shows a contrast pattern that varies sinusoidally in the horizontal direction. The camera is panned clockwise around its y-axis to create the impression that the sinusoid is shifting to the left. The contours are slightly curved due to lens distortion. The consequence is that the ground truth from the IMU is not perfectly normal everywhere on the contours. This introduces a systematic error of at most 2.2 pixels or 8°, reached at the corners of the image, in the flow vectors on the periphery of the image. Such distortions can be corrected by the fully-integrated

The second sample contains a disk with eight compartments of varying gray-levels. The camera is rotated around its z-axis (roll) so the disk seems to turn clockwise.

The third sample shows a table with two cases and books. The scene contains some shadows, reflective components, specularities, and parts with little contrast between foreground and background, but no highly textured surfaces, no complex motion discontinuities, and no transparency. The motion is purely translational and caused only by the panning camera.

In their concluding remarks, Barranco et al. (

As a novel source of ground truth we propose using the flow field obtained from IMU gyro data as explained in Section 2.1.4. When using a camera that hosts an inertial measurement unit, the ground truth can be calculated in real time and in parallel with any other filters applied to the incoming frames or events. Optical flow measurements based on visual cues can be compared directly to the IMU flow, which can serve as feedback or initial guess and to discard outliers. It has some obvious limitations, however: the flow field can only be calculated from the IMU if the camera is rotating about its axes. Also, the scene has to be static; objects moving independently in the scene would be labeled with the same IMU-deduced flow as resting objects. Finally, there is no depth distinction. For these reasons the dataset created here does not contain moving objects or non-rigid motion.

We use the established absolute Average Endpoint Error
where v_{i} = (v_{x}, v_{y})_{i} stands for the i-th sample of measured flow and u_{i} = (u_{x}, u_{y})_{i} for the corresponding ground-truth flow vector.
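The AEE can be computed directly as the mean Euclidean distance between measured and ground-truth vectors; a minimal sketch with an assumed flat-array layout:

```java
// Average endpoint error (AEE): mean Euclidean distance between the
// measured flow v_i and the ground-truth flow u_i over all samples.
public class EndpointError {
    /** v and u are flat arrays {vx0, vy0, vx1, vy1, ...} of equal length. */
    static double aee(double[] v, double[] u) {
        double sum = 0;
        int n = v.length / 2;
        for (int i = 0; i < n; i++) {
            double dx = v[2 * i] - u[2 * i];
            double dy = v[2 * i + 1] - u[2 * i + 1];
            sum += Math.sqrt(dx * dx + dy * dy); // per-sample endpoint error
        }
        return sum / n;
    }
}
```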

The endpoint error does not distinguish between angular deviation and speed difference. Therefore, we measure the angular error as well. In the frame-based optical flow literature, the angular error is not purely angular in the image plane, because it considers the angle in space-time (pixel, pixel, frame), i.e., it is computed as in Barron et al.:

This angular error is between vectors in 3D with a constant third component as an arbitrary scaling constant that also prevents division by zero. To see how this measure combines angular and length error, compare the angular error between velocity vectors (1,0) and (2,0): the angular error in the image plane is zero, but the endpoint error is not, and neither is Equation (17). The additional third component in (v_{x}, v_{y}, 1), i.e., the combination of angular error in the 2D image plane and length error, causes the bias that has been pointed out in Barron et al.
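This space-time angular error can be sketched as below; the (1,0) vs. (2,0) example from the text illustrates how a pure speed difference produces a nonzero "angular" error:

```java
// Space-time angular error in the style of Barron et al.: both flow
// vectors are extended with a constant third component of 1 before
// measuring the angle between them, which mixes direction and length
// errors as discussed in the text.
public class AngularError {
    static double degrees(double vx, double vy, double ux, double uy) {
        double dot = vx * ux + vy * uy + 1.0;
        double nv = Math.sqrt(vx * vx + vy * vy + 1.0);
        double nu = Math.sqrt(ux * ux + uy * uy + 1.0);
        return Math.toDegrees(Math.acos(dot / (nv * nu)));
    }
    public static void main(String[] args) {
        // Same direction, double the speed: the image-plane angle is zero,
        // but the space-time measure is not.
        System.out.println(degrees(2, 0, 1, 0)); // ~18.4 degrees
    }
}
```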

Zero-velocity measurements are not counted in the computation of Equation (18).

We display the Standard Deviation (SD) instead of the Standard Error of the mean (SE) because we are interested in the spread of the data points around the mean rather than the closeness of the sample mean to the population mean.

In addition to the average and SD, we report similar robustness measures as in Baker et al. (

The computation time of an optical flow vector for a single event is usually smaller than the resolution of the time measurement function used (e.g., Java's System.nanoTime() has a resolution of microseconds on our Windows computer). The events therefore have to be buffered into event packets and the total computation time averaged. However, the algorithms discard events during computation of flow. Not all events can be used: for instance, the neighborhood of an event may be so sparsely populated that the data matrix used in the Least-Squares regression is singular. Then no motion vector is calculated. Other tests include a refractory period or a speed control. Dividing the total processing time of the packet by the initial input size would result in a mean processing time which could be several orders of magnitude too small, because not all events actually make it all the way through the filter. To circumvent this, all unnecessary filters (like refractory period and speed control) are turned off; in those tests that are inherent to the flow computation (like invertibility), the event is assigned zero velocity and continues to be processed normally, so that no regular part of the filter is skipped. Another option would be to filter out in a first stage all the events that would produce invalid results, and in a second stage measure the computation time for the remaining set of known size.

The real-time cost of processing an event was measured on a Core i5 @ 2.4 GHz PC running Windows 10 × 64 and Java 1.7. The overhead on this machine simply to process event packets is on the order of tens of ns per event and can be neglected.

Figure

LK_{BD} fails to detect the correct direction of motion in these synthesized samples due to the asymmetry of the backward finite difference approximation (discussed in Section 2.1.2).

In Figure

The event-density reported in the supplementary material signifies the fraction of events that passed the filter with valid flow vector attached. As mentioned in Section 2.5, the algorithms contain various tests to improve the results and discard outliers. These tests have a noticeable effect on computation time: For instance, the refractory period skips computation of affected events entirely. At the same time, the motion field is thinned out. The event-density is a direct indicator of these two effects.

Table

translSin | 31 px/s | 30.05 px/s | 3% |
rotDisk | 0.52 1/s | 0.57 1/s | 8.8% |
translBox | 26 px/s | 24.5 px/s | 6.6% |

Without IMU offset correction, the IMU constantly outputs some non-zero gyro data that results in flow vectors even when the sensor is at rest. These offsets are computed during stationary camera conditions by averaging several hundred samples and then subtracting these offsets from subsequent readings. This subtraction reduces but not completely removes the error. The remaining non-zero flow vector at rest is much less than a pixel per second and the resulting angular error is only a small fraction of a degree.
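The offset correction amounts to a simple bias calibration; a hedged sketch with an assumed data layout (not the jAER calibration code):

```java
// Gyro zero-offset correction as described above: average a few hundred
// rate samples while the camera is known to be at rest, then subtract
// that bias from all subsequent readings.
public class GyroBias {
    private final double[] bias = new double[3];

    /** Estimate the bias from samples[i] = {wx, wy, wz} taken at rest. */
    void calibrate(double[][] restSamples) {
        java.util.Arrays.fill(bias, 0);
        for (double[] s : restSamples)
            for (int k = 0; k < 3; k++) bias[k] += s[k] / restSamples.length;
    }

    /** Returns the sample with the estimated bias removed. */
    double[] correct(double[] sample) {
        return new double[]{sample[0] - bias[0],
                            sample[1] - bias[1],
                            sample[2] - bias[2]};
    }
}
```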

See Figure

Table

IMU | 0.04 | |
LK_{BD} | 5.32 | 0.21 |
LK_{CD1} | 5.06 | 0.30 |
LK_{CD2} | 8.99 | 0.29 |
LK_{SG} | 0.27 | |
LP_{orig} | 4.35 | 0.31 |
LP_{robust} | 4.51 | 0.28 |
LP_{SF} | 0.70 | 0.13 |
LP_{SG} | 0.03 | |
DS | 0.02 |

IMU | 100 |
LK_{BD} | 1857 |
LK_{CD1} | 1957 |
LK_{CD2} | 2557 |
LK_{SG} | 1372 |
LP_{orig} | 1827 |
LP_{robust} | 1840 |
LP_{SF} | 1359 |
LP_{SG} | 980 |
DS | 1096 |

In Table

LK_{BD} | 135.77 ± 31.45 | 73.24 ± 56.52 | 20.35 ± 16.46 | 51.71 ± 45.47 | 108.07 ± 28.67 |
LK_{CD1} | 29.68 ± 20.54 | 21.72 ± 35.31 | 19.93 ± 21.35 | 28.98 ± 31.27 | |
LK_{CD2} | 9.38 ± 19.31 | 32.85 ± 21.17 | 13.33 ± 15.18 | 18.72 ± 19.00 | 36.07 ± 36.69 |
LK_{SG} | 11.48 ± 8.80 | | | | |
LP_{orig} | 17.54 ± 21.56 | 38.93 ± 61.85 | 28.06 ± 33.39 | 9.78 ± 32.96 | |
LP_{robust} | 9.56 ± 29.97 | 37.72 ± 55.76 | 22.30 ± 32.70 | 9.46 ± 22.67 | |
LP_{SF} | 2.39 ± 8.98 | 43.99 ± 48.52 | 23.39 ± 32.12 | | |
LP_{SG} | 8.72 ± 17.92 | 13.96 ± 23.62 | | | |
DS | 32.82 ± 56.67 |

Values in bold face are smallest error for each algorithm.

AEE_{rel} [%] | | | | | |
---|---|---|---|---|---|
LK_{BD} | 123.43 ± 11.04 | 414.95 ± 767.50 | 57.85 ± 16.37 | 76.65 ± 31.87 | 116.95 ± 15.57 |
LK_{CD1} | 197.47 ± 386.69 | 57.16 ± 25.55 | 54.46 ± 34.89 | 58.13 ± 28.86 | |
LK_{CD2} | 32.87 ± 24.43 | 37.54 ± 21.52 | 64.53 ± 24.52 | 72.54 ± 27.70 | |
LK_{SG} | 65.08 ± 21.08 | 326.77 ± 253.31 | | | |
LP_{orig} | 175.08 ± 460.93 | 62.82 ± 48.67 | 60.71 ± 61.76 | 37.93 ± 35.15 | |
LP_{robust} | 91.61 ± 278.97 | 59.45 ± 37.49 | 36.13 ± 27.80 | | |
LP_{SF} | 6.41 ± 15.13 | 69.62 ± 33.17 | 58.99 ± 37.87 | | |
LP_{SG} | 114.56 ± 341.18 | 78.02 ± 281.03 | 66.74 ± 44.78 | | |
DS | 62.92 ± 60.50 |

This study compares accuracy and processing time of nine event-based optical flow algorithms. A direct comparison with frame-based methods is possible by applying them to the frames included in this dataset (which has not been done yet). In order to use existing frame-based benchmarks (e.g., on the Middlebury database), (Barranco et al.,

The fastest method was the direction selective filter with 0.36 μs per event, where no extensive linear algebra is performed. The next fastest method is the Savitzky-Golay variant of the Local Plane Fit method, which at 0.58 μs is about eight times faster than the original version, because computing the fitting parameters does not involve solving a system of equations. LP_{SF} is almost as fast because it does not repeatedly improve the initial plane fit; however, it has to solve a linear system of equations once per event with linear least squares. The standard Lucas-Kanade methods take between 5 and 9 μs per event, while the Savitzky-Golay variant takes 3.13 μs. Calculating the motion field with the IMU data is very fast (0.38 μs per event) because the transformation of the event coordinates (rotation and translation) is not costly, and the rotation matrix and translation vector have to be computed only about every millisecond, when an update from the IMU comes in.

All methods take less than 10 μs to calculate the motion vector of a single event; DS, IMU and LP_{SG} take less than one μs per event. Thus, they all run in real-time on contemporary PCs^{17}. Dense input can be processed by LP_{SG} or DS, or subsampled to reduce event density and make it suitable for slower algorithms. Subsampling is usually unproblematic in global motion estimation (e.g., for motion stabilization), but might impede local flow estimation (e.g., object tracking).

The algorithms are implemented in pure Java but became much faster by relying on just-in-time optimization and performing most linear algebra explicitly in pure Java rather than by using numerical libraries like jama or jblas. As pointed out by Barranco et al. (

Methods that performed well in terms of angular error were the Savitzky-Golay variants of the Lucas-Kanade and Local Plane Fit algorithms, and, surprisingly given its quantized-angle limitation, the DirectionSelective filter. Estimating derivatives with central finite differences instead of backward differences clearly improves finding the correct direction. Similarly, computing the plane fitting parameters with the robust Equation (9) improves performance over the original version, most distinctly in the rotating bar and disk samples. There, the original method suffers from angle quantization, as outlined in Section 2.1.3. Evaluation of the endpoint error reveals a similar picture, though the Local Plane Savitzky-Golay variant is not as good as for the AAE.

The standard deviation is of the same order as the mean, for all methods and samples. There are at least two possible explanations. First, the estimated flow vectors stray around the mean due to the noisy event structure at contours. Second, a substantial portion of vectors point nearly opposite to the true direction of motion (see Section 4.2.1 for a discussion of this phenomenon). Together with the knowledge that the standard deviation is a dispersion measure sensitive to outliers (because distances from the mean are squared), this accounts for the large spread. A more robust statistic would be the median absolute deviation around the median, or the mean absolute deviation around the mean, which are more resilient to outliers.
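The robust alternative mentioned here is straightforward to compute; a generic sketch of the median absolute deviation (our own illustration, not part of the benchmark code):

```java
// Median absolute deviation around the median (MAD): a dispersion measure
// that, unlike the standard deviation, is barely affected by a few
// outliers such as flow vectors pointing opposite to the true motion.
import java.util.Arrays;

public class RobustStats {
    static double median(double[] a) {
        double[] s = a.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2;
    }
    static double mad(double[] a) {
        double m = median(a);
        double[] dev = new double[a.length];
        for (int i = 0; i < a.length; i++) dev[i] = Math.abs(a[i] - m);
        return median(dev);
    }
}
```

For the sample {1, 2, 3, 4, 100} the single outlier dominates the standard deviation, while the MAD stays at 1.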

A vital part of the Lucas-Kanade variants is the estimation of intensity derivatives using finite differences. This is a relatively crude form of numerical differentiation given the highly discontinuous and noisy intensity (event frequency) function. Comparing the results shows a clear improvement when using central differences (LK_{CD1}). A second-order (8-point) central difference derivative (LK_{CD2}) does not pay off significantly. While for some samples the error is reduced by a few percent, using a higher order in fact increases the angular error for other samples by up to 10%. The derivative is more accurate, but also more susceptible to noise.


The consistent use of second temporal derivatives on the RHS of Equation (5) is problematic because of the small number of events in a neighborhood, as outlined by the authors proposing this modification (Brosch et al.). The effect is comparable to LK_{CD2}: depending on the image sequence, it increases the error by up to 10% because noise is amplified. Apart from its influence on accuracy, the second temporal derivative increases the processing time by about 50%, because in our implementation we now need to loop over the neighborhood twice to find the number of events at each pixel location in two different time windows. Note that we did not explicitly include a comparison between the use of first and second temporal derivatives in the tables in Section 3; all Lucas-Kanade methods there employ the second temporal derivative.

As visible in the figure, such outliers can be suppressed with a refractory period τ_{rf}, which allows skipping the calculation of an event's motion flow vector when the pixel has fired its last event within τ_{rf}.^{18} With τ_{rf} = 15 ms most of them are excluded. In Figure

Figure

All Lucas-Kanade variants display considerably higher AEE on the synthetic samples translSqu and rotBar. This is due to the fact that a moving edge, e.g., in translSqu, consists of a line of single events with equal timestamps, stepping to the adjacent pixel location every 50 ms. The event rate function on which the derivatives are computed thus contains only a constant one event per time bin, regardless of how fast the edge is moving. (In a real sample, a faster moving edge would accumulate fewer events at a given pixel than a slow one.) These artificial samples thus turn out to be inadequate for Lucas-Kanade methods.

Local area methods that are causally event-based know nothing about what lies ahead of a leading edge. This is not a problem in the Lucas-Kanade-like methods above, where the flow vectors basically point from regions of higher event rate to those with lower. Local plane fit methods, however, operate on the surface of most recent events, where a large portion of the timestamp data points may lie quite far in the past (stemming from previous motions of other objects), or may have never been set at all (e.g., when looking from the past to the future side of a moving edge). This makes estimation of fitting parameters susceptible to noise and, due to the sparse distribution of valid events, often impossible.

The local plane fit as well as the Lucas-Kanade methods perform badly in the presence of texture: a contour moving over a textured background produces a varying number of events along the edge and over time, because the contrast between edge and background changes with the texture. This can be dealt with by applying Gabor filtering and using a phase-based instead of gradient-based method (Barranco et al.,

Local plane fits become unreliable when the edge is broad. Consider an edge moving to the left as in the translating sinusoid sample. The pixels in the leftmost column of the edge can fire at almost the same time as pixels in the adjacent column, not because the edge moved so fast, but because its contrast changes continuously. This causes large variations in the estimated speed of the flow and a general overestimation of speed in the Local Planes methods. The AEE reflecting this can be reduced by a factor of 2–3 by prefiltering the events with a refractory period^{19} of τ_{rf} ≈ 100 ms.

A large neighborhood increases the chance that the surface of active events contains structure that is not well approximated by a local plane, while a small neighborhood risks not containing enough valid events for a robust fit. Other contributing factors are multiple motions or transparent motion stimuli (Brosch et al.,

Recall that this method computes the flow vectors by first determining the orientation of a contour and then searching for past orientation events perpendicular to it. The speed is then the pixel distance divided by the average time difference to these past events over a search distance (typically 5 pixels). Angular errors are introduced in the first stage, endpoint (speed) errors in the second. Thus, like the other methods, the direction-selective flow computation suffers from noisy and uneven contours, because the local orientations of small contour patches then vary, and with them the direction of the normal velocity. But even if the algorithm finds the correct orientation (low angular error), the magnitude of the computed velocity (endpoint error) may vary by a factor of 100. This error is due to the fact that the events
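The second (speed) stage can be sketched as follows; this is a hypothetical illustration rather than the published implementation, and it assumes a per-pixel map of the most recent orientation-event timestamps, initialized to −1 for "never set". Given the contour normal (dx, dy), it steps backwards up to the search distance, averages the per-pixel time differences to valid past events, and returns speed as distance over time:

```java
// Hypothetical sketch of the DS speed stage: pixel distance divided by the
// average time difference to past orientation events along the contour normal.
public class DsSpeed {

    public static double normalSpeed(double[][] lastTs, int x, int y,
                                     int dx, int dy, int searchDist) {
        double tNow = lastTs[x][y];
        double sumDtPerPixel = 0;
        int count = 0;
        for (int d = 1; d <= searchDist; d++) {
            int px = x - d * dx, py = y - d * dy;
            if (px < 0 || py < 0 || px >= lastTs.length || py >= lastTs[0].length)
                break;                                // left the sensor array
            double tPast = lastTs[px][py];
            if (tPast < 0 || tPast >= tNow) continue; // never set, or not in the past
            sumDtPerPixel += (tNow - tPast) / d;      // time difference per pixel
            count++;
        }
        if (count == 0) return 0;       // no valid past events along the search line
        return count / sumDtPerPixel;   // px per time unit (1 / average dt per px)
    }
}
```

The sketch makes the error source tangible: a single stale or noisy timestamp among the few sampled past events shifts the average time difference, and hence the reciprocal speed, by a large factor.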

Another source of error is the quantization of the vector direction into bins of 45°. In our dataset, the DS filter is favored by the use of pure pan and tilt movements, and this bias should be taken into account by users who require continuous-angle motion vectors.
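The binning can be made explicit with a small sketch (hypothetical helper, not from the paper's code): a continuous flow direction is collapsed to one of 8 bins of 45°, so the quantization alone introduces a worst-case angular error of 22.5°.

```java
// Hypothetical sketch of 45-degree direction binning as used by DS-style filters.
public class DirectionBins {

    // Returns the bin index 0..7, bin 0 centered on +x, counting counterclockwise.
    public static int bin(double vx, double vy) {
        double deg = Math.toDegrees(Math.atan2(vy, vx)); // angle in (-180, 180]
        return (int) ((Math.round(deg / 45.0) + 8) % 8);
    }
}
```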

A different event-based approach to motion estimation using direction-selective filters has been taken recently by Brosch et al. (

The rotation rates used to compute the ground truth are updated at 2.5 kHz. This quantized time is a potential error source for fast motion. Considering a pure rotation about the z-axis (roll), each image point describes an arc, but our method approximates this segment with a straight line, introducing an error in speed and direction. However, calculating the angle between the tangent at the endpoint of the circular arc and the secant line approximating this arc shows that this error is not significant in most situations. For instance, if the sensor turns at less than 1.39 revolutions per second, the error in direction due to the finite sample rate stays below 0.1°. Nevertheless, to make this source of ground truth applicable even to very fast rotations or slower IMU update rates, one could use the rotational velocities directly instead of integrating them to obtain the rotation angle. This would require a rotation about the focal point and a coordinate transform from the IMU pose to the focal point. The motion field then follows directly from the well-known relations between the scene and the sensor.
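The bound above follows from the tangent-chord relation of a circle: the angle between the tangent at an arc's endpoint and the secant (chord) equals half the arc angle swept during one IMU sample period. A short check of the numbers (hypothetical helper, not part of the toolchain):

```java
// Numerical check: direction error of the straight-line approximation of an arc.
public class ImuSecantError {

    // Direction error in degrees for a given roll rate and IMU update rate.
    public static double secantErrorDeg(double revPerSec, double imuRateHz) {
        double arcDeg = revPerSec * 360.0 / imuRateHz; // arc swept per IMU sample
        return arcDeg / 2.0;                           // tangent-to-secant angle
    }
}
```

At 1.39 rev/s and 2.5 kHz, the arc per sample is about 0.2°, giving a direction error of about 0.1°, in agreement with the figure quoted above.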

In this report we compare nine basic algorithms that compute optical flow based on address-events from a neuromorphic dynamic vision sensor. A tenth method is presented that allows estimation of the motion field in real time using camera rotation rates obtained from an inertial measurement unit mounted on the camera. Based on this ground truth, the nine methods are tested on three real sequences that seem simplistic but nevertheless reveal fundamental challenges of event-based motion flow, namely noisy and fissured contours.

Six of the methods presented above are variants of the Lucas-Kanade algorithm (Benosman et al., _{BD}, _{CD1}, and _{CD2} evaluate the performance of three finite difference methods and provide evidence of how critical the numeric derivative approximation is. _{robust} and the consistent use of a second temporal derivative in the Lucas-Kanade methods were suggested by Brosch et al. (

All methods run in real time if the event rate is below 1e5 events per second; DS, IMU, and _{SG} are able to handle 1e6 events per second, which is rarely exceeded by dense naturalistic input. Compared to the original Lucas-Kanade (Benosman et al., ^{20}_{BD} and _{orig} reduced the computational cost by 26 and 29%, respectively.

We also discuss the error statistics used so far and suggest a purely angular measure between flow vectors in the image plane. This serves to lessen the bias inherent in the conventional average angular error and to separate deviations in direction from errors in speed, to better expose the aspects requiring improvement.
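The separation of the two measures can be sketched as follows (a minimal illustration with hypothetical names, computed per flow vector against ground truth): the endpoint error is a vector difference and thus mixes speed and direction, while the purely 2D angular error in the image plane responds to direction only.

```java
// Minimal sketch of the two per-vector error measures discussed above.
public class FlowErrors {

    // Endpoint error: magnitude of the vector difference (mixes speed and direction).
    public static double endpointError(double[] v, double[] gt) {
        return Math.hypot(v[0] - gt[0], v[1] - gt[1]);
    }

    // Purely 2D angular error in the image plane (ignores speed entirely).
    public static double angularErrorDeg(double[] v, double[] gt) {
        double dot = v[0] * gt[0] + v[1] * gt[1];
        double norms = Math.hypot(v[0], v[1]) * Math.hypot(gt[0], gt[1]);
        double cos = Math.max(-1.0, Math.min(1.0, dot / norms)); // clamp rounding
        return Math.toDegrees(Math.acos(cos));
    }
}
```

A flow vector with the correct direction but wrong speed yields zero angular error and a nonzero endpoint error, which is exactly the separation the proposed measure provides.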

With the IMU integration we establish a new performance measure for comparing various motion flow algorithms in situations where the direction normal to edges is known.

The dataset, consisting of a mixture of DVS and DAVIS data collected for this study, has been shared to provide a baseline for future comparisons^{21}^{22}

It is easy to record natural data for which all of the described algorithms fail rather dramatically, but further pursuit of accurate event-based methods with DVSs is worthwhile considering that this neuromorphic hardware fits many of the challenging demands of real-time applications in terms of low power consumption, small response latency, and efficient use of hardware resources. One implication of this report is that accuracy can be improved with modifications that actually reduce computation time significantly. We expect that use of DAVIS and ATIS (Posch et al.,

BR developed the algorithms, collected and analyzed the data, and wrote the body of the paper. TD contributed guidance and writing.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This work is supported by the European Union funded projects SEEBETTER (FP7-ICT-2009-6) and VISUALISE (FP7-ICT-2011.9.11), the University of Zurich, and the Swiss Federal Institute of Technology. We greatly appreciate the helpful reviewers' comments, the suggestions from G. Orchard, E. Muggler, and M. Osswald regarding camera calibration, the insights of M. Milde and G. Orchard into the original Local Planes method, and L. Longinotti's complete integration of the IMU data acquisition in the DAVIS camera logic.

^{1}

^{2}

^{3}

^{4}

^{5}This assumption does not hold if the neighborhood covers motion boundaries, which are admittedly the most interesting regions for determining structure from motion. Because most information in event-based methods comes from contours, this may well be a significant error source that our dataset does not cover well because of its basis in the motion field caused by camera rotation. However, events at an edge typically span a width of several pixels when accumulated over the time window set for flow computation, so except for the outermost events the assumption will hold.

^{6}The parameter

^{7}

^{8}

^{9}

^{10}

^{11}

^{12}

^{13}

^{14}This dataset is hosted at

^{15}We write DVS to mean the DAVIS240c throughout this paper, because only the DVS events from the DAVIS240c were used. However, the dataset includes synchronous frames.

^{16}

^{17}Which however burn on the order of 100 W.

^{18}Besides the positive effect on accuracy discussed here, the refractory period can also be used to thin out the motion field for better visibility and computation speed. This does not affect the motion flow calculation of other events: if the refractory period test returns true, i.e., the filter is told to skip this event and move on to the next, the filter nevertheless memorizes the event. Thus, it remains part of the neighborhood of the next nearby event whose motion flow is calculated.
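The skip-but-memorize behavior described in this footnote can be sketched as follows; the class and its interface are hypothetical, not the jAER implementation:

```java
// Hypothetical sketch of a refractory test that skips an event for flow
// computation but memorizes its timestamp, so the event still belongs to the
// neighborhoods used when computing the flow of later events.
public class RefractoryFilter {

    final double[][] lastTs;  // per-pixel timestamp of the most recent event
    final double tRefractory; // refractory period, in the same time units as t

    public RefractoryFilter(int w, int h, double tRefractory) {
        lastTs = new double[w][h];
        for (double[] col : lastTs)
            java.util.Arrays.fill(col, Double.NEGATIVE_INFINITY);
        this.tRefractory = tRefractory;
    }

    // Returns true if this event should be skipped for flow computation.
    public boolean isRefractory(int x, int y, double t) {
        boolean skip = (t - lastTs[x][y]) < tRefractory;
        lastTs[x][y] = t; // memorize the event even when it is skipped
        return skip;
    }
}
```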

^{19}

^{20}

^{21}in pure Java rather than C++ as Benosman et al. (

^{22}