In "Why is everyone trying to kill Airflow?", dataengineeringdude argues that pre-cloud-era tools such as the popular open-source orchestrator Apache Airflow are here to stay, even if there are numerous things to be unhappy about (read the article for a long list of those).
Trouble is, Airflow is cut from the same cloth as other pre-cloud-era tools: it's complex to operate, and the codebase shows its age, being built on a language that wasn't designed to support large programs. Ironically, that's still where Airflow gets most of its power – it's programmable in its implementation language, Python.
It's the end-of-year season, which is when you make predictions. My prediction is that Airflow is not here to stay, and the reason – in addition to its pre-cloud-era design – is that DAG execution is in the process of being unbundled.
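Unbundling is easier to picture once you reduce DAG execution to its essence: a topological sort plus a task runner, either of which a different component could own. A minimal sketch with made-up task names (this is not Airflow's internals, just the core idea):

```python
# Declare the DAG as plain data: task -> set of upstream dependencies.
from graphlib import TopologicalSorter

dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load", "transform"},
}

def run_dag(dag, run_task):
    """Run each task after its dependencies; any engine could own this loop."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        run_task(task)
    return order

executed = run_dag(dag, run_task=lambda name: None)
```

Once the graph is just data like this, the scheduler, the executor, and the UI that displays it no longer have to live in the same process – which is exactly the unbundling in question.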
I speak this truth in reverence for Python, a '90s programming language in which I have written thousands upon thousands of lines of code, and for the original idea of Airflow, a gloriously pragmatic tool that does pretty much what you want – or wanted.
Airflow is a collection of components that are not best-in-class on their own. That's often the case with integrated systems; the upside is that the components are typically well-integrated. For example, GitHub includes a code editor, and it's "good enough" to make small edits – but it's no Visual Studio Code.
dataengineeringdude already mentions that Airflow's UI is not the best, although it has seen some minor improvements over the past few years. The task execution log is plain at best, and integration with better logging tools amounts to a hyperlink that takes you out of the experience. The scheduler is not the best either: it provides the same old scheduling capabilities as cron and has no concept of a calendar. Christmas, anyone?
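"No concept of a calendar" means the scheduler can fire on a cron expression but can't answer questions like "is today a holiday?". The missing piece is a small calendar-aware gate like the following sketch – the holiday set and helper name are illustrative, not any real scheduler's API:

```python
from datetime import date

# Hypothetical holiday calendar for illustration only.
HOLIDAYS = {date(2022, 12, 25), date(2023, 1, 1)}

def should_run(day: date, holidays=HOLIDAYS) -> bool:
    """Run on weekdays that are not in the holiday calendar."""
    return day.weekday() < 5 and day not in holidays

print(should_run(date(2022, 12, 25)))  # Christmas 2022: False
print(should_run(date(2022, 12, 27)))  # a regular Tuesday: True
```

Cron can express "every weekday at 6 AM", but the holiday check has to live somewhere else – in Airflow's case, typically inside your own DAG code.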
But the reason I wanted to write this post is that I have been following the progress of Dagger, a new DevOps platform co-founded by Solomon Hykes, who brought us Docker (expect an "Is WASM trying to kill Docker?" article real soon).
It's in the name! The ironic thing is that, as of this writing, Dagger is being marketed as "CI/CD Pipelines as Code" – but if it looks like a duck and quacks like a duck, it's probably more than just a CI/CD tool, because the requirements are the same as those of data engineering pipelines.
Where Dagger is fundamentally different from Airflow DAGs is that the code is declarative. You can code up a DAG in any of the supported languages, including "non-languages" such as CUE. The resulting logic is then executed by an engine which ... could be distributed. You get the drift. It's the same programming model that tools such as Apache Spark have popularized for a decade.
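The declarative split can be sketched in a few lines: the pipeline is pure data, and the engine that interprets it is swappable – local today, distributed tomorrow. The names below are illustrative, not Dagger's or Spark's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Step:
    name: str
    fn: Callable[[object], object]

# The pipeline is a value: it describes work without performing it.
pipeline = [
    Step("parse", lambda text: text.split(",")),
    Step("count", lambda items: len(items)),
]

def local_engine(pipeline, value):
    """One possible interpreter; a distributed engine could take its place."""
    for step in pipeline:
        value = step.fn(value)
    return value

print(local_engine(pipeline, "a,b,c"))  # → 3
```

Because the pipeline never says *where* each step runs, the engine is free to decide – which is exactly what Spark does with its lazily evaluated transformations.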
But Airflow can run my DAG every Sunday at 4 PM! Lots of tools can kick off a process (Dagger included) at a cron-defined time. Even Azure Data Factory can do that – for free!
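"Every Sunday at 4 PM" really is just a small datetime calculation, which is why nearly any trigger service can provide it. A stdlib-only sketch (purely illustrative, not any scheduler's implementation):

```python
from datetime import datetime, timedelta

def next_sunday_4pm(now: datetime) -> datetime:
    """Return the next Sunday 16:00 strictly after `now`."""
    target = now.replace(hour=16, minute=0, second=0, microsecond=0)
    days_ahead = (6 - now.weekday()) % 7  # Monday=0 ... Sunday=6
    target += timedelta(days=days_ahead)
    if target <= now:
        target += timedelta(days=7)  # already past this week's slot
    return target

print(next_sunday_4pm(datetime(2022, 12, 27, 9, 0)))  # → 2023-01-01 16:00:00
```

The hard part of orchestration was never the clock; it's the DAG execution behind it – and that's the part being unbundled.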
We'll see if Dagger reaches critical awareness, but the future is cloud-native, distributed, declarative, local-first, and speaks your language.