Channel: Planet Python

Robin Wilson: Calculating percentiles in Python – use numpy not scipy!


This is just a brief public service announcement reporting something that I’ve just found: np.percentile is a lot faster than scipy.stats.scoreatpercentile, almost an order of magnitude faster in some cases.

Someone recently asked me why on earth I was using scoreatpercentile anyway. It turns out that np.percentile was only added in numpy 1.7, which was released part-way through my PhD in Feb 2013, which is why the scipy function is used in some of my code.

In my code I frequently calculate percentiles from satellite images represented as large 2D numpy arrays – and the speed differences can be quite astounding:

Image size (N x N) | scoreatpercentile | np.percentile | Speedup
100                | 595 µs            | 169 µs        | 3.5x
1000               | 84 ms             | 13 ms         | 6.5x
3000               | 927 ms            | 104 ms        | 9x
8000               | 8 s               | 1 s           | 8x

As you can see, we get a 3-4 times speedup even for small arrays (100 x 100, so 10,000 elements), and up to an 8-9 times speedup for large arrays (3000 x 3000 and 8000 x 8000, so roughly 9 to 64 million elements).
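If you want to check this on your own machine, here is a minimal sketch of that kind of comparison, assuming NumPy (1.17 or later, for default_rng) and SciPy are installed. The random array is only a stand-in for a satellite image, the helper name compare_percentile_speed is hypothetical, and the sizes are kept smaller than the biggest ones in the table so it runs quickly. Your timings will depend on your hardware and library versions, so they may not match the numbers above.

```python
import timeit

import numpy as np
from scipy.stats import scoreatpercentile


def compare_percentile_speed(size, per=95, repeats=5):
    """Hypothetical helper: time both functions on a random size x size array.

    Returns (scipy_seconds, numpy_seconds, speedup), where each time is the
    best of `repeats` single calls.
    """
    rng = np.random.default_rng(42)
    image = rng.random((size, size))  # stand-in for a 2D satellite image band

    # number=1 times a single call; min() over the repeats reduces noise.
    scipy_time = min(timeit.repeat(lambda: scoreatpercentile(image, per),
                                   number=1, repeat=repeats))
    numpy_time = min(timeit.repeat(lambda: np.percentile(image, per),
                                   number=1, repeat=repeats))
    return scipy_time, numpy_time, scipy_time / numpy_time


if __name__ == "__main__":
    for size in (100, 1000):
        s, n, speedup = compare_percentile_speed(size)
        print(f"{size:>5}  scoreatpercentile {s * 1e3:8.3f} ms  "
              f"np.percentile {n * 1e3:8.3f} ms  {speedup:4.1f}x")
```

With their default settings both functions interpolate linearly between the two nearest ranks, so on the same array they should return the same value; only the speed differs.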

Anyway, the two functions have very similar signatures and options. The only thing missing from np.percentile is the ability to set hard upper or lower limits (the limit argument of scoreatpercentile), so it should be fairly easy to switch over, and it’s worth it for the speed boost!
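If your code does rely on the limit argument, one way to get the same behaviour from np.percentile is to mask the array first. This is a minimal sketch: percentile_with_limits is a hypothetical helper, not part of NumPy, and it assumes you want scoreatpercentile’s semantics, where values outside the inclusive range [lower, upper] are ignored and the array is flattened.

```python
import numpy as np
from scipy.stats import scoreatpercentile


def percentile_with_limits(a, per, limit=None):
    """Hypothetical helper: np.percentile with an optional (lower, upper) range.

    Mimics scoreatpercentile's `limit`: values outside the inclusive range
    [lower, upper] are dropped before the percentile is computed.
    """
    a = np.asarray(a)
    if limit is not None:
        lower, upper = limit
        # Boolean masking returns a flattened 1D array of the kept values.
        a = a[(a >= lower) & (a <= upper)]
    return np.percentile(a, per)


if __name__ == "__main__":
    data = np.arange(100.0)
    old = scoreatpercentile(data, 50, limit=(10, 60))
    new = percentile_with_limits(data, 50, limit=(10, 60))
    print(old, new)  # both should be 35.0 (median of 10..60)
```

Masking before the call keeps the speed advantage of np.percentile; the only extra cost is one pass over the array to build the boolean mask.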

