Channel: Planet Python

Continuum Analytics News: Dask and scikit-learn: a 3-Part Tutorial

Posted Thursday, July 28, 2016

Dask core contributor Jim Crist has put together a series of posts on his blog, Marginally Stable, discussing some recent experiments combining Dask and scikit-learn. A small library has grown out of these experiments and can be found here.

The tutorial spans three posts, which cover model parallelism, data parallelism, and combining the two on a real-life dataset.

Part I: Dask & scikit-learn: Model Parallelism

In this post we'll look instead at model-parallelism (using the same data across different models), and dive into a daskified implementation of GridSearchCV.
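
To make the idea of model-parallelism concrete, here is a minimal sketch that uses only the Python standard library. It is not the daskified GridSearchCV from the post and does not use the Dask or scikit-learn APIs: the toy data, the one-dimensional ridge model, and the helper names (fit_ridge, evaluate, grid_search) are all assumptions made for illustration, and a thread pool stands in for a real task scheduler. Every candidate model is fit on the same training data, and the fits run independently of one another.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data shared by every candidate model (hypothetical; y is roughly 2 * x).
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
X_train, Y_train = X[:4], Y[:4]
X_valid, Y_valid = X[4:], Y[4:]


def fit_ridge(xs, ys, alpha):
    """Closed-form slope of a 1-D ridge regression with no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def evaluate(alpha):
    """Fit one candidate and return (validation MSE, alpha, slope)."""
    slope = fit_ridge(X_train, Y_train, alpha)
    squared_errors = [(y - slope * x) ** 2 for x, y in zip(X_valid, Y_valid)]
    return sum(squared_errors) / len(squared_errors), alpha, slope


def grid_search(alphas):
    """Evaluate all candidates concurrently and keep the lowest-error one."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(evaluate, alphas))
    return min(results)


alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
best_mse, best_alpha, best_slope = grid_search(alphas)
print(best_alpha, best_slope, best_mse)
```

The point of the pattern is that the parameter grid, not the data, is what gets split up: each worker sees the full training set and a different hyperparameter value.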

Part II: Dask & scikit-learn: Data Parallelism

In the last post we discussed model-parallelism — fitting several models across the same data. In this post we'll look into simple patterns for data-parallelism, which will allow fitting a single model on larger datasets.
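
As a rough illustration of data-parallelism, the sketch below fits a single least-squares line to data that has been split into chunks. It uses only the Python standard library rather than Dask or scikit-learn, and it is not one of the patterns from the post: the synthetic data and the helper names (chunk_stats, combine, fit_line) are assumptions chosen for illustration. Each chunk is reduced to a few summary statistics independently, and the statistics are then combined to fit one model, so no worker ever needs the whole dataset.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(chunk):
    """Sufficient statistics for simple linear regression on one chunk."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy


def combine(stats):
    """Add the per-chunk statistics together, column by column."""
    return tuple(sum(column) for column in zip(*stats))


def fit_line(chunks):
    """Fit y = slope * x + intercept from independently processed chunks."""
    with ThreadPoolExecutor() as pool:
        stats = list(pool.map(chunk_stats, chunks))
    n, sx, sy, sxx, sxy = combine(stats)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical dataset lying exactly on y = 3x + 1, split into 3 chunks of 4 rows.
data = [(float(x), 3.0 * x + 1.0) for x in range(12)]
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
slope, intercept = fit_line(chunks)
print(slope, intercept)
```

This works because the least-squares solution depends on the data only through sums, which can be computed per chunk and added. Models without that property need other strategies, such as incremental updates over successive chunks.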

Part III: Dask & scikit-learn: Putting it All Together

In this post we'll combine the above concepts to do distributed learning and grid search on a real dataset: the airline dataset, which contains information on every flight in the USA between 1987 and 2008.

Keep up with Jim and his blog by following him on Twitter, @jiminy_crist

 

