This weekend, I visited PyCon Italy in the picturesque town of Firenze. It was a great conference with excellent talks and encounters (many thanks to all the volunteers who made it happen) and amazing coffee.
I gave a talk titled "Python Data Science Going Functional" in the Science Track, where I mostly presented Dask, one of the libraries to watch in the Python data science ecosystem. Slides are available on Speaker Deck.
Dask
The talk introduces Dask as a functional abstraction in the Python data science stack. While creating the slides, I had stumbled upon an exciting tweet about Dask by Travis Oliphant:
> Cool to see dask.array achieving similar performance to Cython + OpenMP: https://t.co/3tsWCAgWWQ Much simpler code with #dask. @PyData
>
> — Travis Oliphant (@teoliphant) April 4, 2016
And after verifying the results on my machine (with some modifications, as I do not trust timeit), I included a very similar benchmark in my slides. While reproducing and adapting the benchmarks, I stumbled upon some weirdly long execution times for Dask's from_array function. So I included this finding in my talk's slides without really being able to attribute the delay to a specific cause.
After delivering my talk, I felt a bit unsatisfied about this. Why did from_array perform so badly? So I decided to ask. The answer: Dask hashes the whole array in from_array to generate a key for it, which is what makes it so slow. The solution is surprisingly simple: by passing name='identifier' to from_array, one can provide a custom key, and from_array suddenly becomes a cheap operation.
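To see the effect in isolation, here is a minimal timing sketch (the array size and chunking are my own choices, not taken from the original benchmark):

```python
import numpy as np
import dask.array as da
from timeit import default_timer as timer

x_np = np.random.random(50_000_000)

# Without a name, from_array hashes the full array to derive its key.
start = timer()
da.from_array(x_np, chunks=len(x_np) // 4)
print('hashed key:   %.3fs' % (timer() - start))

# With an explicit name, the hashing step is skipped entirely.
start = timer()
da.from_array(x_np, chunks=len(x_np) // 4, name='x')
print('explicit key: %.3fs' % (timer() - start))
```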

With this fix in place, the current state of my benchmark shows that Dask improves upon pure numpy and numexpr performance, but does not quite reach the performance of a Cython implementation:

A corrected benchmark showing execution times for numexpr (NX), numpy (np), Cython (with OpenMP parallelization), and Dask (including from_array).
The expression evaluated in that benchmark was:

```python
# x_np is the input numpy array, CPU_COUNT the number of available cores.
x = da.from_array(x_np, chunks=x_np.shape[0] // CPU_COUNT, name='x')
mx = x.max()
x = (x / mx).sum() * mx
x.compute()
```

I plan to upload a revised edition of the slides to Speaker Deck (once I have a decent internet connection again) to include the improved benchmark, so that they are not misleading for people who stumble upon them without context.
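For reference, the pure numpy baseline of the same expression (my own reconstruction, not the exact benchmark script) is simply:

```python
# Evaluated eagerly and single-threaded on the same numpy array x_np.
mx = x_np.max()
result = (x_np / mx).sum() * mx
```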
Learnings
What can we conclude from this?
- The overhead of converting a dask array back to a numpy array is not as bad as I feared.
- There are two aspects to a benchmark: performance and usability.
- Dask should be watched not only for out-of-core computations, but also for parallelizing simple, blocking numpy expressions (see the sketch below).
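To make that last point concrete, here is a minimal sketch of the pattern (array shape, chunking, and the computed expression are my own choices):

```python
import numpy as np
import dask.array as da

x_np = np.random.random((10_000, 10_000))

# Wrap the in-memory numpy array; the chunks define the unit of
# parallelism, and an explicit name skips the hashing discussed above.
x = da.from_array(x_np, chunks=(1_000, 10_000), name='x')

# This reads like plain numpy, but each chunk is processed in
# parallel once .compute() is called.
result = ((x - x.mean()) ** 2).mean().compute()
```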