
Ian Ozsvald: “Higher Performance Python” at PyDataCambridge 2019


I had the pleasure of speaking at the first PyDataCambridge conference (2019), the second PyData conference in the UK after PyDataLondon (which colleagues and I co-founded six years ago). I'm super proud to see PyData spread to six regional meetups and now two UK conferences.

I spoke on Higher Performance Python, focusing on making Pandas operations faster and with an eye on the upcoming Second Edition of our High Performance Python (O'Reilly) book. The talk covers:

  • Using line_profiler to evaluate sklearn's LinearRegression vs NumPy's lstsq (spoiler: lstsq is much faster, but only because sklearn is much safer; the slow-down comes entirely from safety code in sklearn that helps keep your overall productivity higher)
  • Using Pandas for row-by-row iteration (slow) vs apply (faster) vs apply with raw=True, which passes NumPy arrays instead of Series (fastest)
  • Using Numba to JIT-compile the lstsq function called via apply with raw=True, for a huge speed-up
  • Using Dask to parallelise the Numba solution for further speed-ups
  • Advice on being a “highly performant data scientist”
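The Pandas progression in the outline can be sketched roughly as follows. This is a minimal illustration, not code from the talk: it assumes a hypothetical DataFrame of random rows, fits a straight line to each row against its column position with NumPy's `lstsq`, and keeps the slope. The helper `ols_slope` and the three wrapper functions are hypothetical names, and `iterrows` stands in for "row-by-row iteration". Numba and Dask are left out so the sketch needs only NumPy and pandas.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 1,000 rows of 14 noisy values each.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1_000, 14)))


def ols_slope(values):
    """Least-squares slope of `values` against positions 0..n-1.

    Accepts either a pandas Series or a NumPy array.
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(y.shape[0], dtype=float)
    # Design matrix with a column of ones so lstsq also fits an intercept.
    A = np.vstack([x, np.ones_like(x)]).T
    (slope, _intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope


def slopes_iterrows(frame):
    # Slow: iterrows builds a new Series object for every row.
    return pd.Series([ols_slope(row) for _, row in frame.iterrows()],
                     index=frame.index)


def slopes_apply(frame):
    # Faster: apply still hands each row to the function as a Series.
    return frame.apply(ols_slope, axis=1)


def slopes_apply_raw(frame):
    # Fastest of the three: raw=True passes each row as a plain ndarray.
    return frame.apply(ols_slope, axis=1, raw=True)


if __name__ == "__main__":
    for fn in (slopes_iterrows, slopes_apply, slopes_apply_raw):
        seconds = timeit.timeit(lambda: fn(df), number=3)
        print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```

All three functions return the same slopes; only the per-row overhead differs. With `raw=True` pandas skips constructing a Series for each row, and a function that works on plain arrays is also the natural shape for Numba to compile, although exact timings will depend on your data and machine.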

The advice on being a "highly performant data scientist" matters: going "compiler happy" and writing highly efficient code may well slow down your team and its overall velocity. Among other items, I recommended profiling first, perhaps introducing Dask and Numba only with your team's consent, and looking at tools like Bulwark to add tests to DataFrames so you aren't derailed by strange data bugs.
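"Profile first" can be as simple as wrapping the suspect call in a profiler before reaching for Numba or Dask. The talk uses line_profiler for line-by-line timings; the sketch below swaps in the standard library's function-level `cProfile` so it runs without extra installs. The functions `slow_sum_of_squares` and `profile_call` are hypothetical stand-ins for your own code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately naive loop standing in for "the code you suspect is slow".
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum_of_squares, 1_000_000)
    print(report)
```

Once the report shows where time actually goes, you can decide with your team whether a compiler or parallelism is worth its maintenance cost.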

Right now Micha and I are busily working to complete the second edition of our book. If all goes well, it'll be handed in by Christmas, with a publication date around April 2020.

Ian is a Chief Interim Data Scientist through his company Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.

The post “Higher Performance Python” at PyDataCambridge 2019 appeared first on Entrepreneurial Geekiness.


Viewing all articles
Browse latest Browse all 24330

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>