Developer Blog
Dask core contributor Jim Crist has written a series of posts on his blog, Marginally Stable, about recent experiments combining Dask and scikit-learn. These experiments have grown into a small library, which can be found here.
The tutorial spans three posts, which cover model parallelism, data parallelism, and combining the two on a real-life dataset.
Part I: Dask & scikit-learn: Model Parallelism
In this post we'll look at model-parallelism (using the same data across different models), and dive into a daskified implementation of GridSearchCV.
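To make the idea of model parallelism concrete, here is a minimal sketch using only the Python standard library. It does not use Dask or scikit-learn's GridSearchCV and is not the implementation from the post. The toy data, the one-feature ridge model, and the helper names `fit_and_score` and `grid_search` are all assumptions made for illustration. Each candidate hyperparameter is fit against the same data, and the fits run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data (assumed for illustration): y is roughly 2 * x.
# Every candidate model sees the same training and validation data.
X_TRAIN = [1.0, 2.0, 3.0, 4.0]
Y_TRAIN = [2.1, 3.9, 6.2, 7.8]
X_VALID = [5.0, 6.0]
Y_VALID = [10.1, 11.9]


def fit_and_score(alpha):
    """Fit a one-feature ridge regression (no intercept) for one alpha.

    Returns (alpha, weight, validation mean squared error).
    """
    sxy = sum(x * y for x, y in zip(X_TRAIN, Y_TRAIN))
    sxx = sum(x * x for x in X_TRAIN)
    weight = sxy / (sxx + alpha)  # closed-form ridge solution
    mse = sum((weight * x - y) ** 2 for x, y in zip(X_VALID, Y_VALID)) / len(X_VALID)
    return alpha, weight, mse


def grid_search(alphas):
    """Fit one model per alpha concurrently and return the best result."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(fit_and_score, alphas))
    # The best candidate is the one with the lowest validation error.
    return min(results, key=lambda result: result[2])


best_alpha, best_weight, best_mse = grid_search([0.0, 1.0, 10.0, 100.0])
print(best_alpha, best_weight, best_mse)
```

The fits are independent of one another, which is what makes a grid search a natural fit for this kind of parallelism: the only coordination needed is collecting the scores at the end.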
Part II: Dask & scikit-learn: Data Parallelism
In the last post we discussed model-parallelism — fitting several models across the same data. In this post we'll look into simple patterns for data-parallelism, which will allow fitting a single model on larger datasets.
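As a rough illustration of data parallelism, the sketch below fits a single model on data split into chunks, again using only the Python standard library. It shows one simple pattern (computing per-chunk statistics in parallel, then combining them) and is not necessarily one of the patterns the post covers. The chunked toy data and the helper names `chunk_stats` and `fit_from_chunks` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(chunk):
    """Compute the partial sums needed for a least-squares fit on one chunk."""
    sxy = sum(x * y for x, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    return sxy, sxx


def fit_from_chunks(chunks):
    """Fit y = weight * x on chunked data without loading it all at once.

    Each chunk is reduced to two numbers concurrently; the partial sums are
    then added together to give the same weight a single-pass fit would.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(chunk_stats, chunks))
    total_sxy = sum(partial[0] for partial in partials)
    total_sxx = sum(partial[1] for partial in partials)
    return total_sxy / total_sxx


# Toy dataset (assumed for illustration), split into three chunks of (x, y) pairs.
chunks = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
    [(5.0, 10.0)],
]
weight = fit_from_chunks(chunks)
print(weight)
```

Here there is only one model, and the parallelism comes from splitting the data rather than the set of candidate models.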
Part III: Dask & scikit-learn: Putting it All Together
In this post we'll combine the above concepts to do distributed learning and grid search on a real dataset, namely the airline dataset, which contains information on every flight in the USA between 1987 and 2008.
Keep up with Jim and his blog by following him on Twitter, @jiminy_crist.