With this latest release of Numba, we're excited to be able to deliver several frequently requested features:
- JIT classes
- np.dot support in nopython mode
- a multithreaded CPU target for @guvectorize
Experimental Support for JIT-friendly Classes
Numba derives its performance from the combination of two important ingredients:
- Data structures that can be accessed directly, bypassing the Python interpreter (such as the NumPy ndarray).
- Just-in-time generation of machine code by the LLVM compiler library.
While much of our focus is on the compiler-side of things, we also want to promote machine-friendly data structures in Python. We look forward to someday being able to pass Pandas DataFrames, xray DataArrays, and DyND arrays to Numba-compiled functions. (More on that below...)
However, sometimes you just want to use some simple objects as your data structure, along with compiled methods that operate on the attributes. For those cases, we've created a new decorator that can be applied to a Python class: @jitclass.
Here's an example of how this works:
# conda create -n numba_023_test python=3.4 numba scipy bokeh jupyter
import numpy as np
from numba import jit, jitclass, int64, float64, guvectorize
from bokeh.plotting import figure, output_notebook, show

output_notebook()
@jitclass([
    ('xmin', float64),
    ('xmax', float64),
    ('nbins', int64),
    ('xstep', float64),
    ('xcenter', float64[:]),
    ('bins', int64[:]),
    ('moments', float64[:]),
])
class Hist1D(object):
    '''A 1D histogram that can be updated, and computes mean and stddev incrementally'''

    def __init__(self, xmin, xmax, nbins):
        self.xmin = xmin
        self.xmax = xmax
        self.nbins = nbins
        self.xstep = (xmax - xmin) / nbins
        # Center of bin i is xmin + (i + 0.5) * xstep
        self.xcenter = (np.arange(nbins) + 0.5) * self.xstep + self.xmin
        self.bins = np.zeros(self.nbins, dtype=np.int64)
        # moments[0]: sample count, moments[1]: sum of x, moments[2]: sum of x**2
        self.moments = np.zeros(3, dtype=np.float64)

    def fill_many(self, values):
        for value in values:
            bin_index = np.int64((value - self.xmin) / self.xstep)
            # Only values that land in a bin contribute to the statistics
            if 0 <= bin_index < len(self.bins):
                self.bins[bin_index] += 1
                self.moments[0] += 1
                self.moments[1] += value
                self.moments[2] += value**2

    @property
    def count(self):
        return np.int64(self.moments[0])

    @property
    def mean(self):
        return self.moments[1] / self.moments[0]

    @property
    def stddev(self):
        return np.sqrt(self.moments[2] / self.moments[0] - self.mean**2)
This example shows all the basic features that @jitclass currently supports. The attributes and Numba types are described in a specification that is passed to the @jitclass decorator.
h = Hist1D(-4, 4, 25)
h.fill_many(np.random.normal(size=5000))

fig = figure(plot_width=600, plot_height=300)
fig.line(h.xcenter, h.bins)
show(fig)

print('Count: %f, Mean: %f, StdDev: %f' % (h.count, h.mean, h.stddev))
Count: 4999.000000, Mean: -0.013222, StdDev: 0.997465
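The count, mean, and stddev properties work because three running sums (the number of samples, the sum of the values, and the sum of the squared values) are enough to recover the mean and the population standard deviation. Here is a minimal pure-Python sketch of that bookkeeping, kept separate from Hist1D; the helper name running_stats is hypothetical and exists only for this illustration.

```python
import math
import statistics


def running_stats(values):
    """Accumulate the same three moments as Hist1D and derive mean/stddev."""
    m0 = 0.0  # number of samples seen
    m1 = 0.0  # running sum of values
    m2 = 0.0  # running sum of squared values
    for value in values:
        m0 += 1
        m1 += value
        m2 += value ** 2
    mean = m1 / m0
    # Population variance: E[x^2] - (E[x])^2
    stddev = math.sqrt(m2 / m0 - mean ** 2)
    return m0, mean, stddev


data = [0.5, -1.25, 2.0, 0.75, -0.5]
count, mean, stddev = running_stats(data)

# The single-pass result agrees with the two-pass library functions.
print(math.isclose(mean, statistics.mean(data)))
print(math.isclose(stddev, statistics.pstdev(data)))
```

One general caveat of this one-pass formula: it can lose precision when the mean is large compared to the spread of the data, because two nearly equal numbers are subtracted.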
The great thing about JIT classes is that they can also be passed to any nopython mode functions:
@jit(nopython=True)
def add_uniform_noise(hist, noise_fraction):
    '''Add uniformly distributed noise to a histogram.

    The final histogram will have the specified fraction of noise samples.
    '''
    # Solve n / (count + n) = noise_fraction for the number of noise samples n
    n = np.int64(hist.count * noise_fraction / (1 - noise_fraction))
    samples = np.empty(n, dtype=np.float64)
    for i in range(n):
        samples[i] = np.random.uniform(hist.xmin, hist.xmax)
    hist.fill_many(samples)
Let's try it out:
h2 = Hist1D(-4, 4, 25)
h2.fill_many(np.random.normal(size=5000))
add_uniform_noise(h2, noise_fraction=0.3)

fig = figure(plot_width=600, plot_height=300, y_range=(0, h2.bins.max() * 1.1))
fig.line(h2.xcenter, h2.bins)
_ = show(fig)
One important caveat about JIT classes is that accessing their attributes from Python is significantly slower than accessing the attributes of a normal Python object:
class PythonObject(object):
    def __init__(self):
        self.count = 1

python_obj = PythonObject()
%timeit python_obj.count + 2
%timeit h2.nbins + 2
10000000 loops, best of 3: 103 ns per loop
The slowest run took 5270.13 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 8.28 µs per loop
This performance difference is because JIT classes store their attribute data in a non-Python object form, so when the attribute value is requested from the Python interpreter, Numba has to wrap the value in a new Python object to return it. This is similar to the tradeoffs associated with accessing NumPy array elements from Python. In moderation, attribute access is fine, but if you need to do it frequently, then you should consider compiling that code with Numba as well.
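The NumPy analogy can be made concrete. The sketch below uses only NumPy (no JIT class involved) to show that every element read from Python produces a fresh wrapper object around the raw value, which is the kind of per-access cost described above.

```python
import numpy as np

arr = np.arange(5, dtype=np.float64)

# Each element read from Python builds a new NumPy scalar object
# around the raw 8-byte value stored in the array buffer.
first = arr[0]
second = arr[0]
print(type(first).__name__)  # float64
print(first is second)       # False: two separate wrapper objects

# Element-by-element access pays that wrapping cost on every iteration...
total_loop = 0.0
for i in range(len(arr)):
    total_loop += arr[i]

# ...while a single call keeps the whole loop in compiled code.
total_vectorized = arr.sum()
print(total_loop == total_vectorized)
```

The same reasoning suggests reading a JIT class attribute once into a local variable, or moving the loop into a compiled function, rather than reading it repeatedly from interpreted code.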
Our support for JIT classes is very limited today. Only nopython mode is supported, and our interface for handling recursive types (types that contain instances of themselves) is still in flux. We will be working to remove the rough spots, and expand the functionality to cover more use cases over the next few releases.
Initial support for np.dot in nopython mode (requires SciPy 0.16 or later)
This simple-sounding request turned out to be much more complicated because we wanted to make sure that Numba would use the same high-performance BLAS library (MKL, OpenBLAS, etc.) likely being used by NumPy and SciPy. We discovered that SciPy exports C-callable BLAS functions for Cython, and Numba now takes advantage of them as well. Those SciPy functions in turn call whatever BLAS implementation SciPy was configured with.
The feature itself is fairly straightforward. We support the np.dot() function in nopython mode for contiguous arrays when doing:
- 1D vector × 1D vector dot product
- 2D matrix × 1D vector multiplication
- 2D matrix × 2D matrix multiplication
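For reference, these are the argument shapes involved in the three cases, written as ordinary NumPy calls outside nopython mode. The variable names are made up for this sketch; all of the arrays are freshly created and therefore contiguous.

```python
import numpy as np

vec_a = np.array([1.0, 2.0, 3.0])
vec_b = np.array([4.0, 5.0, 6.0])
mat_a = np.arange(6, dtype=np.float64).reshape(2, 3)
mat_b = np.arange(12, dtype=np.float64).reshape(3, 4)

# 1D x 1D: scalar dot product
vv = np.dot(vec_a, vec_b)
# 2D x 1D: matrix-vector product, result has shape (2,)
mv = np.dot(mat_a, vec_a)
# 2D x 2D: matrix-matrix product, result has shape (2, 4)
mm = np.dot(mat_a, mat_b)

print(vv, mv.shape, mm.shape)
```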
Future releases will implement the broadcast rules for higher dimensional dot products, and also support non-contiguous arrays by copying to temporary storage before calling the BLAS library. (Note that np.dot inside nopython mode will be no faster than outside nopython mode, since both will use the same optimized BLAS library to do the heavy lifting.)
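Until then, one way to stay within the contiguous-array restriction is to make the copy yourself before handing the array to compiled code. The sketch below does this on the Python side with plain NumPy; it makes no assumption about whether np.ascontiguousarray can be called inside nopython mode in this release.

```python
import numpy as np

matrix = np.arange(12, dtype=np.float64).reshape(3, 4)

# Taking every other column yields a strided, non-contiguous view.
view = matrix[:, ::2]
print(view.flags.c_contiguous, view.flags.f_contiguous)

# Copy the view into freshly allocated contiguous storage.
packed = np.ascontiguousarray(view)
print(packed.flags.c_contiguous)

# The packed copy is now a valid "2D matrix x 1D vector" input.
vec = np.array([1.0, 1.0])
result = np.dot(packed, vec)
```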
Multithreaded CPU @guvectorize
We love Universal Functions ("ufuncs") and Generalized Universal Functions ("gufuncs")! They are an underappreciated abstraction for expressing array-oriented functions that are intuitive to write and easy for a compiler to parallelize. In fact, many people may not realize that Numba supports four different targets for both ufuncs and gufuncs:
- target=cpu: Single-threaded, CPU execution
- target=parallel: Multi-threaded, CPU execution
- target=cuda: Execution on NVIDIA GPUs that support CUDA (most of them)
- target=hsa: Execution on AMD APUs that support HSA (Kaveri and Carrizo)
With the Numba 0.22.1 release, we open sourced the parallel, cuda, and hsa targets that had been in our numbapro package (see Deprecating NumbaPro: The New State of Accelerate in Anaconda for more details).
However, there was one missing implementation: @guvectorize(target=parallel). During the Numba 0.23 release cycle we filled in that gap, so now you can take advantage of gufuncs on your multicore processors:
# Both kernels store the sum of squares (the squared L2 norm) of one vector
@guvectorize([(float64[:], float64[:])], '(n)->()')
def l2norm_cpu(vec, result):
    result[0] = (vec**2).sum()

@guvectorize([(float64[:], float64[:])], '(n)->()', target='parallel')
def l2norm_parallel(vec, result):
    result[0] = (vec**2).sum()
On my quad-core MacBook Pro laptop:
n = 100000
dims = 10
random_vectors = np.random.uniform(size=n*dims).reshape((n, dims))

%timeit l2norm_cpu(random_vectors)
%timeit l2norm_parallel(random_vectors)
10 loops, best of 3: 48 ms per loop
10 loops, best of 3: 19.5 ms per loop
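The layout string '(n)->()' says that the kernel consumes one 1D vector of length n and produces a scalar; any leading dimensions of the input are looped over, one kernel call per vector. Those calls are independent of each other, which is what makes the loop a natural candidate for splitting across threads. The sketch below models only that looping behaviour with NumPy's np.vectorize, which accepts the same kind of signature but runs the loop in Python. It is an illustration of the semantics, not of how Numba implements any of its targets, and the function names are hypothetical.

```python
import numpy as np


def sum_of_squares(vec):
    """Kernel for one 1D vector: the core dimension (n) collapses to a scalar."""
    return (vec ** 2).sum()


# np.vectorize loops in Python, so this models the broadcasting
# behaviour of a gufunc, not its performance.
sum_of_squares_rows = np.vectorize(sum_of_squares, signature='(n)->()')

vectors = np.arange(12, dtype=np.float64).reshape(4, 3)
per_row = sum_of_squares_rows(vectors)

print(per_row.shape)  # one result per row: (4,)
print(np.allclose(per_row, (vectors ** 2).sum(axis=1)))
```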
For more details on gufuncs, check out our @guvectorize documentation.
What's Next?
In the next release cycle, we'll be refining and improving the features described above, upgrading Numba to LLVM 3.7, and documenting an official API for 3rd parties to extend Numba. Our hope is that this will allow other libraries to add new types and function implementations to the Numba compiler without needing to modify the Numba codebase itself. This opens up a lot of possibilities (nopython support for accessing Pandas DataFrames, anyone?) and we look forward to seeing what you can build with Numba.
As always, you can install the latest Numba with conda:
conda install numba # Don't forget to install scipy for np.dot support!
or find our release at PyPI.
If you have questions or suggestions, we welcome feedback in our GitHub repository and on our mailing list.
(You can download this blog post in notebook form from https://notebooks.anaconda.org/seibert/numba-0-23-release)