
More than half of the Top 10 supercomputing sites worldwide use GPU accelerators and they are becoming ubiquitous in workstations and edge computing devices. GeNN is a C++ library for generating efficient spiking neural network simulation code for GPUs. However, until now, the full flexibility of GeNN could only be harnessed by writing model descriptions and simulation code in C++. Here we present PyGeNN, a Python package which exposes all of GeNN's functionality to Python with minimal overhead. This provides an alternative, arguably more user-friendly, way of using GeNN and allows modelers to use GeNN within the growing Python-based machine learning and computational neuroscience ecosystems. In addition, we demonstrate that, in both Python and C++ GeNN simulations, the overheads of recording spiking data can strongly affect runtimes and show how a new spike recording system can reduce these overheads by up to 10×. Using the new recording system, we demonstrate that by using PyGeNN on a modern GPU, we can simulate a full-scale model of a cortical column faster even than real-time neuromorphic systems. Finally, we show that long simulations of a smaller model with complex stimuli and a custom three-factor learning rule defined in PyGeNN can be simulated almost two orders of magnitude faster than real-time.

A wide range of spiking neural network (SNN) simulators are available, each with their own application domains. NEST (Gewaltig and Diesmann, 2007), for example, specializes in large-scale simulations of point-neuron models on distributed computing systems.

Our GeNN simulator can already be used as a backend for the Python-based Brian 2 simulator (Stimberg et al., 2019) via the Brian2GeNN interface (Stimberg et al., 2020). However, this route only exposes the subset of GeNN's functionality which can be expressed through Brian 2's abstractions.

In a simulation of a large, highly-connected model of a cortical microcircuit (Potjans and Diesmann, 2014), we explore the performance of short simulations of relatively large models, where runtimes are dominated by neuron and synapse updates and by the recording of spikes.

In a simulation of a much smaller model of Pavlovian conditioning (Izhikevich, 2007), we explore the performance of long simulations of smaller models which require complex stimuli and a custom three-factor learning rule.

Using the facilities provided by PyGeNN, we show that both scenarios can be simulated from Python with only minimal overheads over a pure C++ implementation.

GeNN (Yavuz et al., 2016) is a C++ library which generates model-specific CUDA code for simulating spiking neural networks on NVIDIA GPUs. Models are described using GeNN's C++ API and the generated code is then compiled and driven by user-supplied simulation code.

In order to use GeNN from Python, both the model creation API and the model-specific code generated by GeNN need to be exposed to Python. We generate these bindings with SWIG, a tool which produces wrapper code directly from C++ headers.

The majority of GeNN's model creation API can be wrapped in this way without manual intervention. However, the C++ classes which implement individual neuron, synapse and initialization models require additional SWIG directives in order to be exposed correctly to Python.

Having to manually add these directives whenever a model is added to GeNN would be exactly the sort of maintenance overhead we were trying to avoid by using SWIG. Therefore, when building the Python wrapper, we instead search the GeNN header files for the macros used to declare models in C++ and automatically generate SWIG interface files for each model we find.
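The idea can be sketched as follows. This is a minimal illustration rather than GeNN's actual build code; the macro name and header layout are assumptions:

```python
import re
from pathlib import Path

# Hypothetical sketch: scan GeNN headers for model-declaration macros and
# collect the declared class names so SWIG directives can be emitted for them
MODEL_MACRO = re.compile(r"DECLARE_MODEL\(\s*(\w+)")

def find_declared_models(include_dir):
    """Map each header to the model classes it declares."""
    declared = {}
    for header in sorted(Path(include_dir).glob("**/*.h")):
        names = MODEL_MACRO.findall(header.read_text())
        if names:
            declared[header] = names
    return declared

# The wrapper build would then emit one SWIG interface entry per model found
```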

As previously discussed, a key feature of GeNN is the ease with which it allows users to define their own neuron and synapse models as well as “snippets” defining how variables and connectivity should be initialized. Beneath the syntactic sugar described in our previous work (Knight and Nowotny, 2018), these models and snippets are simply classes whose member functions return the C-like code strings and parameter names which define them.

While GeNN provides a library of standard built-in neuron, synapse and initialization models, user-defined models are treated in exactly the same way by the code generator.

Initialization of variables with homogeneous values—such as the neurons' membrane potential—is performed by initialization kernels generated by GeNN, and the initial values of variables with heterogeneous values—such as the normally distributed synaptic weights described in section 2.5—are sampled in parallel on the GPU using per-thread random number generators.
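For example, in PyGeNN a heterogeneous initialization is requested with a variable initialization snippet, while homogeneous values are simply passed as Python scalars (the values below are illustrative):

```python
from pygenn.genn_model import init_var

# Synaptic weights drawn on the GPU from a normal distribution using the
# built-in "Normal" snippet; membrane voltage set homogeneously
weight_init = {"g": init_var("Normal", {"mean": 0.0878, "sd": 0.00878})}
voltage_init = {"V": -65.0}
```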

As illustrated in the previously-defined model, for convenience, PyGeNN allows users to access GeNN's built-in models. However, one of PyGeNN's most powerful features is that it enables users to easily define their own neuron and synapse models from within Python. For example, an Izhikevich neuron model (Izhikevich, 2003) can be defined entirely from Python:
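The original listing is not reproduced here, but a sketch along the following lines (GeNN 4 syntax assumed) illustrates how such a model can be defined:

```python
from pygenn import genn_model

izhikevich = genn_model.create_custom_neuron_class(
    "izhikevich",
    param_names=["a", "b", "c", "d"],
    var_name_types=[("V", "scalar"), ("U", "scalar")],
    # Two half-timestep updates of V improve the numerical stability
    # of the quadratic voltage dynamics
    sim_code="""
        $(V) += 0.5 * DT * ((0.04 * $(V) * $(V)) + (5.0 * $(V)) + 140.0 - $(U) + $(Isyn));
        $(V) += 0.5 * DT * ((0.04 * $(V) * $(V)) + (5.0 * $(V)) + 140.0 - $(U) + $(Isyn));
        $(U) += DT * ($(a) * (($(b) * $(V)) - $(U)));
        """,
    threshold_condition_code="$(V) >= 30.0",
    reset_code="""
        $(V) = $(c);
        $(U) += $(d);
        """)
```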

The current state of a model's variables can be accessed from Python via memory views. For example, a population's membrane voltages can be recorded by pulling them from the GPU and copying the view every timestep:
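A sketch of this pattern, assuming a previously built and loaded model with a population `pop` exposing a variable "V":

```python
import numpy as np

voltages = []
while model.t < 1000.0:
    model.step_time()
    # Copy the membrane voltages from the GPU...
    pop.pull_var_from_device("V")
    # ...and store a snapshot of the memory view
    voltages.append(np.copy(pop.vars["V"].view))
```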

Internally, GeNN stores the spikes emitted by a neuron population during one simulation timestep in an array containing the indices of the neurons that spiked alongside a counter of how many spikes have been emitted overall. Previously, recording spikes in GeNN was very similar to the recording of voltages shown in the previous example code—the array of neuron indices was simply copied from the GPU to the CPU every timestep. However, especially when simulating models with a small simulation timestep, such frequent synchronization between the CPU and GPU is costly—especially if a slower, interpreted language such as Python is involved. Furthermore, biological neurons typically spike at a low rate (in the cortex, the average firing rate is only around 3 Hz; Buzsáki and Mizuseki, 2014) so, in any given timestep, only a small fraction of neurons spike and most of the transferred data is wasted.

When a model includes delays, the array of indices and the counter used to store spikes internally are duplicated for each delay “slot.” Additional delay slots could be artificially added to the neuron population so that this data structure could be re-used to also store spike data for subsequent recording. However, the array containing the indices has memory allocated for all neurons to handle the worst case where all neurons in the population fire in the same time step. Therefore, while this data structure is ideal for efficient spike propagation, using it to store many timesteps worth of spikes would be very wasteful of memory. At low firing rates, the most memory-efficient solution would be to simply store the indices of neurons which spiked each timestep, for example in a data structure similar to a Yale sparse matrix with each “row” representing a timestep (Eisenstat et al., 1982). However, such dynamically growing data structures cannot be updated efficiently in parallel so, instead, we allocate a bitfield in GPU memory with one bit for each neuron in each recorded timestep and set the bits corresponding to the neurons which spike. While this is less compact than a sparse structure at very low rates, the memory requirements remain modest: recording the spikes of a model with 77 × 10^{3} neurons running for 10 × 10^{3} simulation timesteps required <120 MB—a small fraction of the memory on a modern GPU. While efficiently handling spikes stored in a bitfield is a little trickier than working with a list of neuron indices, GeNN provides an efficient C++ helper function for saving the spikes stored in a bitfield to a text file and a numpy-based method for decoding them in PyGeNN.
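In PyGeNN, the recording system is used roughly as follows (a sketch assuming the GeNN 4.4+ API, where `pop` is a previously created neuron population):

```python
# Enable recording for this population before the model is built
pop.spike_recording_enabled = True

model.build()
# Size the GPU recording buffer (in timesteps) when loading the model
model.load(num_recording_timesteps=10000)

# Simulate, keeping all spikes on the GPU...
for _ in range(10000):
    model.step_time()

# ...then download the bitfield once and decode it with numpy
model.pull_recording_buffers_from_device()
spike_times, spike_ids = pop.spike_recording_data
```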

Potjans and Diesmann (2014) developed a model of 1 mm² of early sensory cortex, in which an excitatory and an inhibitory population of leaky integrate-and-fire (LIF) neurons represent each of four cortical layers (2/3, 4, 5, and 6). The membrane voltage V_i of each neuron evolves according to:

$$\tau_m \frac{dV_i}{dt} = (V_{rest} - V_i) + R_m \left( I_{syn_i} + I_{ext_i} \right) \qquad (1)$$

where τ_m = 10 ms and R_m = 40 MΩ represent the time constant and resistance of the neuron's cell membrane, V_rest = −65 mV defines the resting potential, I_syn_i represents the synaptic input current and I_ext_i represents an external input current. When the membrane voltage crosses a threshold V_th = −50 mV a spike is emitted, the membrane voltage is reset to V_rest and updating of V_i is suspended for a refractory period τ_ref = 2 ms. Neurons in each population are connected randomly with numbers of synapses derived from an extensive review of the anatomical literature. These synapses are current-based, i.e., presynaptic spikes lead to exponentially-decaying input currents I_syn_i:

$$\frac{dI_{syn_i}}{dt} = -\frac{I_{syn_i}}{\tau_{syn}} + \sum_j w_{ij} \sum_{t_j} \delta(t - t_j) \qquad (2)$$

where τ_syn = 0.5 ms represents the synaptic time constant, w_ij represents the synaptic weight and t_j are the arrival times of incoming spikes from presynaptic neuron j. In total, the model consists of 77 × 10^{3} neurons connected via approximately 0.3 × 10^{9} synapses. As well as receiving synaptic input, each neuron in the network also receives an independent Poisson input current, representing input from neighboring cortical regions which are not explicitly modeled. The Poisson input is delivered to each neuron via I_ext_i with:

$$\frac{dI_{ext_i}}{dt} = -\frac{I_{ext_i}}{\tau_{syn}} + w_{ext} \sum_{t_k \in \mathcal{P}_i(\nu_{ext})} \delta(t - t_k) \qquad (3)$$

where ν_ext represents the mean rate of the independent Poisson process \(\mathcal{P}_i\) and w_ext represents the input weight. The ordinary differential Equations (1), (2), and (3) are solved with an exponential Euler algorithm. For a full description of the model parameters, please refer to Potjans and Diesmann (2014).

Illustration of the microcircuit model. Blue triangles represent excitatory populations, red circles represent inhibitory populations, and the number beneath each symbol shows the number of neurons in each population. Connection probabilities are shown in small bold numbers at the appropriate point in the connection matrix. All excitatory synaptic weights are normally distributed with a mean of 0.0878 nA (unless otherwise indicated in green) and a standard deviation of 0.00878 nA. All inhibitory synaptic weights are normally distributed with a mean of −0.3512 nA and a standard deviation of 0.03512 nA.
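To give a flavor of how such a model is assembled in PyGeNN, the following sketch builds the layer 2/3 populations and one of their connections using GeNN's built-in LIF neuron, exponential current synapse and fixed-probability connectivity. The population sizes are taken from the original model; the connection probability and initial voltage distribution are illustrative, and synaptic delays are omitted:

```python
from pygenn import genn_model
from pygenn.genn_model import init_var, init_connectivity

model = genn_model.GeNNModel("float", "microcircuit")
model.dT = 0.1  # ms

# tau_m = 10 ms and R_m = 40 MOhm imply a membrane capacitance of 0.25 nF
lif_params = {"C": 0.25, "TauM": 10.0, "Vrest": -65.0, "Vreset": -65.0,
              "Vthresh": -50.0, "Ioffset": 0.0, "TauRefrac": 2.0}
lif_init = {"V": init_var("Normal", {"mean": -65.0, "sd": 5.0}),
            "RefracTime": 0.0}

l23e = model.add_neuron_population("L23E", 20683, "LIF", lif_params, lif_init)
l23i = model.add_neuron_population("L23I", 5834, "LIF", lif_params, lif_init)

model.add_synapse_population(
    "L23E_L23I", "SPARSE_INDIVIDUALG", 0, l23e, l23i,
    # Normally distributed static weights (the w_ij of Equation 2)
    "StaticPulse", {}, {"g": init_var("Normal", {"mean": 0.0878, "sd": 0.00878})}, {}, {},
    # Exponentially decaying input currents (Equation 2)
    "ExpCurr", {"tau": 0.5}, {},
    init_connectivity("FixedProbability", {"prob": 0.1}))
```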

The cortical microcircuit model described in the previous section is ideal for exploring the performance of short simulations of relatively large models. However, the performance of longer simulations of smaller models is equally vital. Such models can be particularly troublesome for GPU simulation as, not only might they not offer enough parallelism to fully occupy the device, but each timestep can be simulated so quickly that overheads such as kernel launches can dominate. Additional overheads can be incurred when models require injecting external stimuli throughout the simulation. Longer simulations are particularly useful when exploring synaptic plasticity so, to explore the performance of PyGeNN in this scenario, we simulate a model of Pavlovian conditioning using a three-factor Spike-Timing-Dependent Plasticity (STDP) learning rule (Izhikevich, 2007).

The model, illustrated below, consists of 800 excitatory and 200 inhibitory Izhikevich neurons whose membrane voltages V_i and adaptation variables U_i evolve such that:

$$\frac{dV_i}{dt} = 0.04 V_i^2 + 5 V_i + 140 - U_i + I_{syn_i} + I_{ext_i} \qquad (4)$$

$$\frac{dU_i}{dt} = a \left( b V_i - U_i \right) \qquad (5)$$

When the membrane voltage rises above 30 mV, a spike is emitted, V_i is reset to c and d is added to U_i. Excitatory neurons use the regular-spiking parameters (a = 0.02, b = 0.2, c = −65, d = 8) and inhibitory neurons use the fast-spiking parameters (a = 0.1, b = 0.2, c = −65, d = 2) (Izhikevich, 2003). I_syn_i represents the synaptic input current and I_ext_i represents an external input current. While there are numerous ways to solve Equations (4) and (5) (Humphries and Gurney, 2007), here we follow Izhikevich (2007): after updating V_i with two 0.5 ms Euler timesteps, Equation (5) is integrated for a single 1 ms timestep.

Illustration of the balanced random network model. The blue triangle represents the excitatory population, the red circle represents the inhibitory population, and the numbers beneath each symbol show the number of neurons in each population. Connection probabilities are shown in small bold numbers at the appropriate point in the connection matrix. All excitatory synaptic weights are plastic and initialized to 1 and all inhibitory synaptic weights are initialized to −1.

The excitatory and inhibitory neural populations are connected recurrently, as shown above, and presynaptic spikes are delivered to postsynaptic neurons as instantaneous pulses of current:

$$I_{syn_i} = \sum_j w_{ij} \sum_{t_j} \delta(t - t_j) \qquad (6)$$

where t_j are the arrival times of incoming spikes from presynaptic neuron j. Inhibitory synaptic weights are fixed at w_ij = −1.0 and excitatory synapses are plastic. Each plastic synapse has an eligibility trace c_ij as well as a synaptic weight w_ij and these evolve according to a three-factor STDP learning rule (Izhikevich, 2007):

$$\frac{dc_{ij}}{dt} = -\frac{c_{ij}}{\tau_c} + \mathrm{STDP}(\Delta t)\,\delta\left(t - t_{pre/post}\right) \qquad (7)$$

$$\frac{dw_{ij}}{dt} = c_{ij} D_j \qquad (8)$$

where τ_c = 1,000 ms represents the decay time constant of the eligibility trace and Δt = t_post − t_pre. These changes are only applied to the trace at the times of pre and postsynaptic spikes as indicated by the Dirac delta function δ(t − t_pre/post). Here, a double exponential STDP kernel is employed such that:

$$\mathrm{STDP}(\Delta t) = \begin{cases} A_+ e^{-\frac{\Delta t}{\tau_+}} & \text{if } \Delta t \geq 0\\ -A_- e^{\frac{\Delta t}{\tau_-}} & \text{if } \Delta t < 0 \end{cases} \qquad (9)$$

where the time constants of the STDP window τ_+ = τ_− = 20 ms and the strength of potentiation and depression are A_+ = 0.1 and A_− = 0.15, respectively. Finally, each excitatory neuron has an additional variable D_j which describes extracellular dopamine concentration:

$$\frac{dD_j}{dt} = -\frac{D_j}{\tau_d} + \mathrm{DA}(t)$$

where τ_d = 200 ms represents the time constant of dopamine uptake and DA(t) represents the dopamine injected into the network when rewards are delivered.

The first step in implementing this learning rule in PyGeNN is to implement the STDP updates and decay of c_ij using GeNN's event-driven plasticity system, the implementation of which was described in our previous work (Knight and Nowotny, 2018). We begin by defining a weight update model with w_ij and c_ij state variables:
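For example (a sketch; the names "g" and "c" for w_ij and c_ij are assumptions):

```python
# Per-synapse state: the weight w_ij and the eligibility trace c_ij
wu_vars = [("g", "scalar"), ("c", "scalar")]
```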

We then instruct GeNN to record the times of current and previous pre and postsynaptic spikes. The current spike time will equal the current time if a spike of this sort is being processed in the current timestep whereas the previous spike time only tracks spikes which have occurred in earlier timesteps:
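In PyGeNN these requests are made with boolean flags passed to `create_custom_weight_update_class` (flag names assume GeNN 4.4+):

```python
# Track current and previous spike times on both sides of the synapse
spike_time_flags = dict(
    is_pre_spike_time_required=True,
    is_post_spike_time_required=True,
    is_prev_pre_spike_time_required=True,
    is_prev_post_spike_time_required=True)
```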

Next we define the “sim code” which is called whenever presynaptic spikes arrive at the synapse. This code first implements Equation (6)—adding the synaptic weight (w_ij) to the postsynaptic neuron's input (I_syn_i) using the addToInSyn function (the assembled sim code is sketched below).

Within the sim code we also need to calculate the time that has elapsed since the last update of c_ij using the spike times we previously requested that GeNN record. Within a timestep, GeNN processes presynaptic spikes before postsynaptic spikes so the time of the last update to c_ij will be the latest time either type of spike was processed in previous timesteps.

Using this time, we can then calculate how much to decay c_ij using the closed-form solution to Equation (7).

To complete the sim code we calculate the depression case of Equation (9) (here, because a presynaptic spike is being processed at time t, Δt = t_post − t is always negative or zero).
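A sketch of the assembled sim code (GeNN 4 syntax assumed; tauC, tauMinus and aMinus are parameters of the weight update model, and the choice of current vs. previous spike-time variables follows the ordering described above):

```python
stdp_sim_code = """
    // Equation (6): deliver the weight to the postsynaptic neuron
    $(addToInSyn, $(g));
    // Latest time c was updated: the last pre or post spike in earlier timesteps
    const scalar tLast = fmax($(prev_sT_pre), $(sT_post));
    // Closed-form decay of the eligibility trace (Equation 7)
    $(c) *= exp(-($(t) - tLast) / $(tauC));
    // Depression case of Equation (9)
    $(c) -= $(aMinus) * exp(-($(t) - $(sT_post)) / $(tauMinus));
    """
```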

Finally, we define the “learn post code” which is called whenever a postsynaptic spike arrives at the synapse. Other than implementing the potentiation case of Equation (9) and using the previous postsynaptic spike time to calculate the time of the last update of c_ij—in order to correctly handle presynaptic updates made in the same timestep—this code is very similar to the sim code:
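A corresponding sketch of the learn post code:

```python
stdp_learn_post_code = """
    // prev_sT_post is used so presynaptic updates made earlier in this
    // timestep are handled correctly
    const scalar tLast = fmax($(sT_pre), $(prev_sT_post));
    $(c) *= exp(-($(t) - tLast) / $(tauC));
    // Potentiation case of Equation (9)
    $(c) += $(aPlus) * exp(-($(t) - $(sT_pre)) / $(tauPlus));
    """
```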

Adding the synaptic weight w_ij update described by Equation (8) requires two further additions to the model. As well as the pre and postsynaptic spikes, the weight update model needs to receive events whenever dopamine is injected via DA. GeNN supports such events via the “spike-like event” system which allows events to be triggered based on an expression evaluated on the presynaptic neuron. In this case, this expression simply tests a flag which is set on the presynaptic neuron whenever dopamine is injected:
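For example (a sketch; "injectDopamine" is an assumed name for such a flag variable maintained by the presynaptic neuron):

```python
# Trigger a spike-like event whenever the presynaptic flag is set
dopamine_event_threshold = "$(injectDopamine_pre)"
```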

In order to extend our event-driven update of c_ij to include spike-like events we need to instruct GeNN to record the times at which they occur:
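Again, this is requested with flags (names assume GeNN 4.4+):

```python
# Track the times of current and previous spike-like events
event_time_flags = dict(
    is_pre_spike_event_time_required=True,
    is_prev_pre_spike_event_time_required=True)
```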

The spike-like events can now be handled using a final “event code” string:
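A sketch of this event code (GeNN 4 syntax; the weight update of Equation (8) will be inserted later):

```python
stdp_event_code = """
    // Bring c up-to-date from the last pre/post spike or previous dopamine event
    const scalar tLast = fmax($(prev_seT_pre), fmax($(sT_pre), $(sT_post)));
    $(c) *= exp(-($(t) - tLast) / $(tauC));
    """
```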

After updating the previously defined calculations of elapsed time to also take spike-like events into account, all that remains is to add the event-driven update of the synaptic weight w_ij. Mikaitis et al. (2018) showed that, because Equations (7) and (8) form a linear system with the exponentially-decaying dopamine concentration D_j, it can be solved in closed form, allowing w_ij to be updated in an event-driven manner with:

$$w_{ij}(t) = w_{ij}(t_w) + \frac{c_{ij}(t_c)\, D_j(t_d)}{\frac{1}{\tau_c} + \frac{1}{\tau_d}} \left[ e^{-\frac{t_w - t_c}{\tau_c}} e^{-\frac{t_w - t_d}{\tau_d}} - e^{-\frac{t - t_c}{\tau_c}} e^{-\frac{t - t_d}{\tau_d}} \right]$$

where t_w, t_c, and t_d represent the times at which w_ij, c_ij, and D_j, respectively, were last updated. Because we will always update c_ij and w_ij together when presynaptic, postsynaptic and spike-like events occur, t_c = t_w and the update simplifies to:

$$w_{ij}(t) = w_{ij}(t_c) + \frac{c_{ij}(t_c)\, D_j(t_d)}{\frac{1}{\tau_c} + \frac{1}{\tau_d}} \left[ e^{-\frac{t_c - t_d}{\tau_d}} - e^{-\frac{t - t_c}{\tau_c}} e^{-\frac{t - t_d}{\tau_d}} \right]$$

and this update can now be added to each of our three event handling code strings to complete the implementation of the learning rule, as sketched below.
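A sketch of the fully assembled model, combining the code strings above with the simplified weight update. Here we additionally assume that the presynaptic neuron keeps its dopamine variable up-to-date every timestep, so the concentration can be read as `$(D_pre)` with t_d equal to the current time:

```python
from pygenn import genn_model

# Simplified event-driven update of Equation (8), inserted after tLast is
# computed but before c is decayed in each of the three code strings
w_update = """
    const scalar tauPrime = ($(tauC) * $(tauD)) / ($(tauC) + $(tauD));
    $(g) += $(c) * $(D_pre) * tauPrime *
            (exp(($(t) - tLast) / $(tauD)) - exp(-($(t) - tLast) / $(tauC)));
    """

da_stdp = genn_model.create_custom_weight_update_class(
    "da_stdp",
    param_names=["tauPlus", "tauMinus", "aPlus", "aMinus", "tauC", "tauD"],
    var_name_types=[("g", "scalar"), ("c", "scalar")],
    sim_code="""
        $(addToInSyn, $(g));
        const scalar tLast = fmax($(prev_sT_pre), $(sT_post));
        """ + w_update + """
        $(c) *= exp(-($(t) - tLast) / $(tauC));
        $(c) -= $(aMinus) * exp(-($(t) - $(sT_post)) / $(tauMinus));
        """,
    learn_post_code="""
        const scalar tLast = fmax($(sT_pre), $(prev_sT_post));
        """ + w_update + """
        $(c) *= exp(-($(t) - tLast) / $(tauC));
        $(c) += $(aPlus) * exp(-($(t) - $(sT_pre)) / $(tauPlus));
        """,
    event_threshold_condition_code="$(injectDopamine_pre)",
    event_code="""
        const scalar tLast = fmax($(prev_seT_pre), fmax($(sT_pre), $(sT_post)));
        """ + w_update + """
        $(c) *= exp(-($(t) - tLast) / $(tauC));
        """,
    is_pre_spike_time_required=True,
    is_post_spike_time_required=True,
    is_prev_pre_spike_time_required=True,
    is_prev_post_spike_time_required=True,
    is_pre_spike_event_time_required=True,
    is_prev_pre_spike_event_time_required=True)
```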

To perform the Pavlovian conditioning experiment described by Izhikevich (2007), we randomly select 100 stimuli sets (S_1…S_100) from amongst the two neural populations. Stimuli are presented to the network in a random order, separated by randomly drawn intervals, by injecting currents into the chosen neurons via I_ext_i throughout the simulation. S_1 is arbitrarily chosen as the Conditioned Stimulus (CS) and, whenever this stimulus is presented, a reward in the form of an increase in dopamine is delivered a short, random time later by setting DA(t). From Python, the stimulus current can be injected by writing directly to the memory view of the corresponding state variable:
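A sketch of this mechanism (the variable name "Iext", the population handle `exc_pop`, the set `stimulus_set` and the current magnitude are illustrative):

```python
# Obtain a numpy view of the neurons' external input current variable
i_ext_view = exc_pop.vars["Iext"].view

def apply_stimulus(neuron_ids, magnitude):
    # Write the stimulus directly into the simulator's state...
    i_ext_view[neuron_ids] = magnitude
    # ...and upload the variable to the GPU
    exc_pop.push_var_to_device("Iext")

# Stimulate a set of neurons and, afterwards, zero the current again
apply_stimulus(stimulus_set, 40.0)
apply_stimulus(stimulus_set, 0.0)
```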

where the view exposes the variable's underlying memory as a numpy array, so that new values written from Python can be transferred to the GPU with a single push.

The same approach can then be used to zero the current afterwards.

In the following subsections we will analyse the performance of the models introduced in sections 2.5 and 2.6 on a representative selection of NVIDIA GPU hardware:

Jetson Xavier NX—a low-power embedded system with a GPU based on the Volta architecture with 8 GB of shared memory.

GeForce GTX 1050 Ti—a low-end desktop GPU based on the Pascal architecture with 4 GB of dedicated memory.

GeForce GTX 1650—a low-end desktop GPU based on the Turing architecture with 4 GB of dedicated memory.

Titan RTX—a high-end workstation GPU based on the Turing architecture with 24 GB of dedicated memory.

All of these systems run Ubuntu 18 apart from the system with the GeForce GTX 1050 Ti which runs Windows 10.

Simulation times of the microcircuit model running on various GPU hardware for 1 s of biological time. “Overhead” refers to time spent in the simulation loop but not within CUDA kernels. The dashed horizontal line indicates real-time performance.

Without the recording system described in section 2.4, the CPU and GPU need to be synchronized after every timestep to allow spike data to be copied off the GPU and stored in a suitable data structure. The “overheads” shown in the figure above are dominated by this per-timestep synchronization and data transfer.

However, when the spike recording system described in section 2.4 is used, spike data is kept in GPU memory until the end of the simulation and overheads are reduced by up to 10×. Because synchronization with the CPU is no longer required every timestep, simulations run approximately twice as fast on the Windows machine. Furthermore, on the high-end desktop GPU, the simulation now runs faster than real-time in both PyGeNN and GeNN versions—significantly faster than other recently published GPU simulators (Golosio et al., 2021).

Results of the Pavlovian conditioning experiment: raster plot and spike density function (SDF) (Szücs, 1998) of network activity.

Simulation times of the Pavlovian Conditioning model running on various GPU hardware for 1 h of biological time. “GPU recording” indicates simulations where the new recording system is employed. Times are taken from averages calculated over 5 runs of each model.

Interestingly, unlike in the simulations of the microcircuit model, here the GTX 1050 Ti performs rather differently. Although the clock speed of this device is approximately the same as the other GPUs (1,290–1,392 MHz) and it has a similar number of CUDA cores to the GTX 1650, its performance is significantly worse. The difference in performance across all configurations is likely to be due to architectural differences between the older Pascal and the newer Volta and Turing architectures. Specifically, Pascal GPUs have one type of Arithmetic Logic Unit (ALU) which handles both integer and floating point arithmetic, whereas the newer Volta and Turing architectures have equal numbers of dedicated integer and floating point ALUs as well as significantly larger L1 caches. As discussed in our previous work (Knight and Nowotny, 2018), SNN simulations involve a large amount of integer arithmetic, for example when traversing connectivity data structures, so they are likely to benefit significantly from the dedicated integer ALUs and larger caches of the newer architectures.

The difference between the speeds of the PyGeNN and GeNN simulations of the Pavlovian conditioning model (see below) is more pronounced than for the microcircuit model. Because each timestep of this small model is simulated so quickly, the fixed cost of the Python simulation loop and of generating stimuli in Python represents a larger proportion of the total runtime.

Comparison of the duration of individual timesteps in PyGeNN and GeNN simulations of the microcircuit and Pavlovian conditioning experiments. Times are taken from averages calculated over 5 runs using the GPU recording system.

In this paper we have introduced PyGeNN, a Python interface to the C++-based GeNN library for GPU-accelerated spiking neural network simulations.

Uniquely, the new interface provides access to all the features of GeNN, without leaving the comparative simplicity of Python and with, as we have shown, typically negligible overheads from the Python bindings. PyGeNN also allows bespoke neuron and synapse models to be defined from within Python, making PyGeNN much more flexible and broadly applicable than, for instance, the Python interface to NEST (Eppler et al., 2009), where new models must still be implemented outside of Python.

In many ways, the new interface resembles elements of the Python-based Brian 2 simulator (Stimberg et al., 2019), in which model descriptions are also translated into generated code. However, whereas Brian 2 users describe models as mathematical equations, PyGeNN users define them using C-like code strings.

As we have demonstrated, the PyGeNN wrapper, exactly like native GeNN, can be used on a variety of hardware from data center scale down to mobile devices such as the NVIDIA Jetson. This allows the same code to be used in large-scale brain simulations as well as in embedded and embodied spiking neural network research. Supporting the popular Python language makes this ecosystem available to a wider audience of researchers in Computational Neuroscience, bio-mimetic machine learning, and autonomous robotics.

The new interface also opens up opportunities to support researchers who work with other Python-based systems. In the Computational Neuroscience and Neuromorphic computing communities, we can now build a PyNN (Davison et al., 2009) compliant interface on top of PyGeNN, allowing existing PyNN models to be simulated on GPUs.

In this work we have introduced a new spike recording system for GeNN and have shown that, using this system, we can now simulate the Potjans microcircuit model (Potjans and Diesmann, 2014) faster than real-time on a modern GPU—faster even than real-time neuromorphic systems.

All models, data and analysis scripts used for this study can be found in

JK and TN wrote the paper. TN was the original developer of GeNN. AK is the original developer of PyGeNN. JK is currently the primary developer of both GeNN and PyGeNN, responsible for implementing the spike recording system, and performed the experiments and the analysis of the results that are presented in this work. All authors contributed to the article and approved the submitted version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank Malin Sandström and everyone else at the International Neuroinformatics Coordinating Facility (INCF) for their hard work running the Google Summer of Code mentoring organization every year. Without them, this and many other exciting Neuroinformatics projects would not be possible.