Continuum Analytics News: Numba 0.23 Has Been Released!

Developer Blog

Posted Friday, January 15, 2016

With this latest release of Numba, we're excited to be able to deliver several frequently requested features:

JIT classes
np.dot support in nopython mode
a multithreaded CPU target for @guvectorize.

Experimental Support for JIT-friendly Classes

Numba derives its performance from the combination of two important ingredients:

Data structures that can be accessed directly, bypassing the Python interpreter (such as the NumPy ndarray).
Just-in-time generation of machine code by the LLVM compiler library.

While much of our focus is on the compiler side of things, we also want to promote machine-friendly data structures in Python. We look forward to someday being able to pass Pandas DataFrames, xray DataArrays, and DyND arrays to Numba-compiled functions. (More on that below...)

However, sometimes you just want to use some simple objects as your data structure, along with compiled methods that operate on the attributes. For those cases, we've created a new decorator that can be applied to a Python class: @jitclass.

Here's an example of how this works:

# conda create -n numba_023_test python=3.4 numba scipy bokeh jupyter
import numpy as np
from numba import jit, jitclass, int64, float64, guvectorize
from bokeh.plotting import figure, output_notebook, show
output_notebook()

BokehJS successfully loaded.

@jitclass([    
    ('xmin', float64),
    ('xmax', float64),
    ('nbins', int64),
    ('xstep', float64),
    ('xcenter', float64[:]),
    ('bins', int64[:]),
    ('moments', float64[:])
])
class Hist1D(object):
    '''A 1D histogram that can be updated, and computes mean and stddev incrementally'''
    def __init__(self, xmin, xmax, nbins):
        self.xmin = xmin
        self.xmax = xmax
        self.nbins = nbins
        self.xstep = (xmax - xmin) / nbins
        self.xcenter = (np.arange(nbins) + 0.5) * self.xstep + self.xmin
        self.bins = np.zeros(self.nbins, dtype=np.int64)
        self.moments = np.zeros(3, dtype=np.float64)
    
    def fill_many(self, values):
        for value in values:
            bin_index = np.int64((value - self.xmin) / self.xstep)
            if 0 <= bin_index < len(self.bins):
                self.bins[bin_index] += 1
                self.moments[0] += 1
                self.moments[1] += value
                self.moments[2] += value**2
    
    @property
    def count(self):
        return np.int64(self.moments[0])
    
    @property
    def mean(self):
        return self.moments[1] / self.moments[0]
    
    @property
    def stddev(self):
        return np.sqrt(self.moments[2] / self.moments[0] - self.mean**2)

This example shows all the basic features that @jitclass currently supports. The attributes and Numba types are described in a specification that is passed to the @jitclass decorator.

h = Hist1D(-4, 4, 25)
h.fill_many(np.random.normal(size=5000))

fig = figure(plot_width=600, plot_height=300)
fig.line(h.xcenter, h.bins)
show(fig)

print('Count: %f, Mean: %f, StdDev: %f' % (h.count, h.mean, h.stddev))

Count: 4999.000000, Mean: -0.013222, StdDev: 0.997465
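The mean and stddev properties work because the moments array only holds three running sums: the number of entries, the sum of the values, and the sum of their squares. As a rough sketch of that arithmetic outside of Numba (moments_summary is a hypothetical helper written for this illustration, not part of the Hist1D class), the same bookkeeping in plain Python looks like this:

```python
import math

def moments_summary(values):
    """Hypothetical helper: mean and stddev from three running sums."""
    # Mirrors the layout of Hist1D.moments: [count, sum(x), sum(x**2)]
    moments = [0.0, 0.0, 0.0]
    for value in values:
        moments[0] += 1
        moments[1] += value
        moments[2] += value ** 2
    mean = moments[1] / moments[0]
    # Population variance: E[x**2] - E[x]**2
    stddev = math.sqrt(moments[2] / moments[0] - mean ** 2)
    return moments[0], mean, stddev

count, mean, stddev = moments_summary([1.0, 2.0, 3.0, 4.0])
print('Count: %f, Mean: %f, StdDev: %f' % (count, mean, stddev))
```

Note that this formula gives the population standard deviation (dividing by N rather than N - 1), and because it subtracts two similar numbers it can lose precision when the mean is large compared to the spread of the data.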

The great thing about JIT classes is that they can also be passed to any nopython mode functions:

@jit(nopython=True)
def add_uniform_noise(hist, noise_fraction):
    '''Add uniformly distributed noise to a histogram.
    The final histogram will have the specified fraction of noise samples.
    '''
    # Solve n / (count + n) = noise_fraction for the number of noise samples n
    n = np.int64(hist.count * noise_fraction / (1 - noise_fraction))
    samples = np.empty(n, dtype=np.float64)
    for i in range(n):
        samples[i] = np.random.uniform(hist.xmin, hist.xmax)
    hist.fill_many(samples)

Let's try it out:

h2 = Hist1D(-4, 4, 25)
h2.fill_many(np.random.normal(size=5000))

add_uniform_noise(h2, noise_fraction=0.3)

fig = figure(plot_width=600, plot_height=300, y_range=(0, h2.bins.max() * 1.1))
fig.line(h2.xcenter, h2.bins)
_ = show(fig)

One important caveat about JIT classes is that accessing their attributes from Python is significantly slower than for a normal Python object:

class PythonObject(object):
  def __init__(self):
      self.count = 1

python_obj = PythonObject()

%timeit python_obj.count + 2
%timeit h2.nbins + 2

10000000 loops, best of 3: 103 ns per loop
The slowest run took 5270.13 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 8.28 µs per loop

This performance difference is because JIT classes store their attribute data in a non-Python object form, so when the attribute value is requested from the Python interpreter, Numba has to wrap the value in a new Python object to return it. This is similar to the tradeoffs associated with accessing NumPy array elements from Python. In moderation, attribute access is fine, but if you need to do it frequently, then you should consider compiling that code with Numba as well.

Our support for JIT classes is very limited today. Only nopython mode is supported, and our interface for handling recursive types (types that contain instances of themselves) is still in flux. We will be working to remove the rough spots, and expand the functionality to cover more use cases over the next few releases.

Initial support for `np.dot` in nopython mode (requires SciPy 0.16 or later)

This simple-sounding request turned out to be much more complicated because we wanted to make sure that Numba would use the same high performance BLAS library (MKL, OpenBLAS, etc.) likely being used by NumPy and SciPy. We discovered that SciPy exports C-callable BLAS functions for Cython which Numba will also take advantage of. Those SciPy functions in turn will call whatever BLAS implementation SciPy was configured with.

The feature itself is fairly straightforward. We support the np.dot() function in nopython mode for contiguous arrays when doing:

1D vector × 1D vector dot product
2D matrix × 1D vector multiplication
2D matrix × 2D matrix multiplication

Future releases will implement the broadcast rules for higher dimensional dot products, and also support non-contiguous arrays by copying to temporary storage before calling the BLAS library. (Note that np.dot inside nopython mode will be no faster than outside nopython mode, since both will use the same optimized BLAS library to do the heavy lifting.)

Multithreaded CPU `@guvectorize`

We love Universal Functions ("ufuncs") and Generalized Universal Functions ("gufuncs")! They are an underappreciated abstraction for expressing array-oriented functions that are intuitive to write and easy for a compiler to parallelize. In fact, many people may not realize that Numba supports four different targets for both ufuncs and gufuncs:

target=cpu: Single-threaded, CPU execution
target=parallel: Multi-threaded, CPU execution
target=cuda: Execution on NVIDIA GPUs that support CUDA (most of them)
target=hsa: Execution on AMD APUs that support HSA (Kaveri and Carrizo)

With the Numba 0.22.1 release, we open sourced the parallel, cuda, and hsa targets that had been in our numbapro package (see Deprecating NumbaPro: The New State of Accelerate in Anaconda for more details).

However, there was one missing implementation: @guvectorize(target=parallel). During the Numba 0.23 release cycle we filled in that gap, so now you can take advantage of gufuncs on your multicore processors:

@guvectorize([(float64[:], float64[:])], '(n)->()')
def l2norm_cpu(vec, result):
    result[0] = (vec**2).sum()
    
@guvectorize([(float64[:], float64[:])], '(n)->()', target='parallel')
def l2norm_parallel(vec, result):
    result[0] = (vec**2).sum()

On my quad-core MacBook Pro laptop:

n = 100000
dims = 10
random_vectors = np.random.uniform(size=n*dims).reshape((n, dims))
%timeit l2norm_cpu(random_vectors)
%timeit l2norm_parallel(random_vectors)

10 loops, best of 3: 48 ms per loop
10 loops, best of 3: 19.5 ms per loop
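If the '(n)->()' layout string looks cryptic, it may help to see what it means in plain NumPy. It says that the kernel consumes one axis of length n and produces a scalar, and the gufunc machinery loops over all remaining leading axes. The function below (l2norm_reference, a name made up for this sketch) is a slow, pure NumPy reference for the sum of squares that the l2norm kernels above compute. It does not describe how Numba implements the loop internally.

```python
import numpy as np

def l2norm_reference(vectors):
    '''Reference loop for a gufunc with layout '(n)->()'.'''
    vectors = np.asarray(vectors, dtype=np.float64)
    # The last axis is the core dimension n; all leading axes are looped over.
    result = np.empty(vectors.shape[:-1], dtype=np.float64)
    for index in np.ndindex(*vectors.shape[:-1]):
        # Same body as the kernel: one 1D slice in, one scalar out
        result[index] = (vectors[index] ** 2).sum()
    return result

random_vectors = np.random.uniform(size=(1000, 10))
norms = l2norm_reference(random_vectors)  # shape (1000,)
```

Because each 1D slice is processed independently, the outer loop is exactly the part that the parallel target can divide among threads.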

For more details on gufuncs, check out our @guvectorize documentation.

What's Next?

In the next release cycle, we'll be refining and improving the features described above, upgrading Numba to LLVM 3.7, and documenting an official API for 3rd parties to extend Numba. Our hope is that this will allow other libraries to add new types and function implementations to the Numba compiler without needing to modify the Numba codebase itself. This opens up a lot of possibilities (nopython support for accessing Pandas DataFrames, anyone?) and we look forward to seeing what you can build with Numba.

As always, you can install the latest Numba with conda:

conda install numba # Don't forget to install scipy for np.dot support!

or find our release at PyPI.

If you have questions or suggestions, we welcome feedback in our GitHub repository and on our mailing list.

(You can download this blog post in notebook form from https://notebooks.anaconda.org/seibert/numba-0-23-release)
