I am thrilled to announce that the Gordon and Betty Moore Foundation has provided a significant grant to help move Numba and Dask to version 1.0 and graduate them into robust, community-supported projects.
Numba and Dask are two projects that have grown out of our intense foundational desire at Continuum to improve the state of large-scale data analytics, quantitative computing, advanced analytics and machine learning. Our fundamental purpose at Continuum is to empower people to solve the world’s greatest challenges. We are on a mission to help people discover, analyze and collaborate by connecting their curiosity and experience with any data.
One part of helping great people do even more with their computing power is to ensure that modern hardware is fully accessible and usable by those whose deep knowledge lies in areas other than programming. For many years, Python has been simplifying the connection between computers and the minds of those with deep knowledge in areas such as statistics, science, business, medicine, mathematics and engineering. Numba and Dask strengthen this connection even further, so that modern hardware with multiple parallel computing units can be fully utilized with Python code.
Numba enables scaling up on modern hardware, including computers with GPUs and many-core CPUs, by compiling a subset of Python syntax to machine code that can run in parallel. Dask enables Python code to take full advantage of both multi-core CPUs and data that does not fit in memory by defining a directed graph of tasks that operate on blocks of data, drawing on the wealth of libraries in the PyData stack. Dask also now works well on a cluster of machines with data stored in a distributed file system, such as Hadoop’s HDFS. Together, Numba and Dask make it easier to build solutions that take full advantage of modern hardware, such as machine learning algorithms, image processing on clusters of GPUs, or automatic visualization of billions of data points with datashader.
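To make these two ideas concrete, here is a minimal sketch using Numba’s @jit decorator and Dask’s dask.array interface; the function name, array sizes and chunk sizes are illustrative choices, not taken from this post:

```python
import numpy as np
from numba import jit
import dask.array as da

# Numba: the decorated function is compiled to machine code
# the first time it is called.
@jit(nopython=True)
def sum_of_squares(values):
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total

print(sum_of_squares(np.arange(1e6)))  # runs the compiled loop at native speed

# Dask: operations on a blocked array only build a task graph; compute()
# executes the graph chunk by chunk, so the whole array never needs to
# fit in memory at once and independent chunks can run in parallel.
x = da.ones((20000, 20000), chunks=(2000, 2000))
result = (x + x.T).mean()
print(result.compute())
```

The two compose naturally as well: a Numba-compiled function can be applied block by block to a Dask array (for example with dask.array.map_blocks), which is one way to spread compiled kernels across all the cores of a machine or a cluster.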
Peter Wang and I started Continuum with a desire to bring next-generation array computing to PyData. We have since broadened that initial goal to empowering entire data science teams with the Anaconda platform, while providing full application solutions to data-centric companies and institutions. It is extremely rewarding to see Numba and Dask now delivering on our initial dream to bring next-generation array computing to the Python ecosystem in a way that takes full advantage of modern hardware.
This award from the Moore Foundation will make it even easier to use Python, through Numba and Dask, for large-scale computing, letting users build high-performance applications on large data sets. The grant will also enable our Community Innovation team at Continuum to ensure that these technologies can be used by other open source projects in the PyData ecosystem. This will help scientists, engineers and others interested in improving the world achieve their goals even faster.
Continuum has been an active contributor to the Python data science ecosystem since Peter and I founded the company in early 2012. Anaconda, the leading Open Data Science platform, is now the most popular Python distribution available. Continuum has also conceived and developed several new additions to this ecosystem, making them freely available to the open source community, while continuing to support the foundational projects that have made the ecosystem possible.
The Gordon and Betty Moore Foundation fosters pathbreaking scientific discovery, environmental conservation, patient care improvements and preservation of the special character of the Bay Area. The Numba and Dask projects are funded by the Gordon and Betty Moore Foundation through Grant GBMF5423 to Continuum Analytics (Grant Agreement #5423).
We are honored to receive this grant and look forward to working with the Moore Foundation.
To hear more about Numba and Dask, check out our related SciPy sessions in Austin, TX this week:
- Thursday, July 14th at 10:30am: “Dask: Parallel and Distributed Computing” by Matthew Rocklin & Jim Crist of Continuum Analytics
- Friday, July 15th at 11:00am: “Scaling Up and Out: Programming GPU Clusters with Numba and Dask” by Stan Seibert & Siu Kwan Lam of Continuum Analytics
- Friday, July 15th at 2:30pm: “Datashader: Revealing the Structure of Genuinely Big Data” by James Bednar & Jim Crist of Continuum Analytics