Channel: Planet Python

Python Sweetness: A fork in the road for Mitogen


The original plan for Mitogen for Ansible described several facets of a scheme centered on features made possible by a rigorous, single, cohesive distributed program model. Of those facets, it quickly became clear that most users are really only interested in the big one: a much faster Ansible.

While I'd prefer feature work, this priority is fine: better performance usually entails enhancements that benefit the overall scheme, and improving people's lives in this manner is highly rewarding, so incentives remain aligned. It is impossible not to find renewed energy when faced with comments like this:

Enabling the mitogen plugin in ansible feels like switching from floppy to SSD
https://t.co/nCshkioX9h

Although feedback on the project has been very positive, the existing solution is sometimes not enough. Limitations in both the extension and Ansible really bite, most often when running against many targets. In these scenarios it is heartbreaking to see the work fail to help those who could benefit from it most, and that is what I'd like to talk about.

Controller-side Performance

Some time ago I began refactoring Ansible's linear strategy, aiming to reach a point where controller-side enhancements could be added without more spaghetti, while also becoming familiar with the requirements of later features. To recap, the strategy plugin is responsible for almost every post-parsing task, including worker management. It is in many ways the beating heart at the core of every Ansible run.
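For readers new to strategy plugins, Ansible's default linear strategy runs each task on every host before any host moves on to the next task. The toy model below sketches only that scheduling loop under stated assumptions: `run_task()` is a hypothetical placeholder, and a thread pool stands in for the forked worker processes Ansible actually manages, so this is an illustration rather than Ansible's code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(host, task):
    """Hypothetical stand-in for executing one task on one target."""
    return f"{task}@{host}"


def linear_strategy(hosts, tasks, workers=8):
    """Toy model of lockstep scheduling: every host finishes task N
    before any host starts task N+1. Returns one result list per task."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for task in tasks:
            # Fan the current task out to all hosts...
            futures = [pool.submit(run_task, host, task) for host in hosts]
            # ...then block until every host has reported back (the barrier).
            results.append([f.result() for f in futures])
    return results


if __name__ == "__main__":
    for row in linear_strategy(["web1", "web2"], ["ping", "copy"]):
        print(row)
```

Running it prints one list per task, with every host's `ping` result appearing before any `copy` result. With this design a single slow host holds up every other host at each barrier, which hints at why controller-side scheduling matters so much when running against many targets.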

After some months, and one particularly enlightening conversation, that work resumed, eventually subsuming all of the remaining strategy support and result processing code to form one huge refactor of a big chunk of upstream that I have been sitting on for nearly a month.

The result exists today and is truly wonderful. It integrates Mitogen into the heart of Ansible without baking it in, introduces a carefully designed process model with strong persistence properties, eliminating most bottlenecks endured by the extension and vanilla Ansible, and provides an architectural basis for the next planned iteration of scalability work, Windows compatibility, some features I've already mentioned, and quite a few I've been keeping quiet.

With the new strategy it is possible to almost perfectly saturate an 8 vCPU machine given 100 targets, with minimal loss of speedup compared to a single target. For a single target, simple loops against localhost are up to 4x faster than with the current stable extension.

There are at least two obvious further enhancements made possible by the new work, but I stopped myself in order to stabilize one piece of the puzzle at a time. Once this is done, it is clear exactly where to pick things up next.

Deep Cuts

There's just a small hitch: this work goes deep, entailing changes that, while so far possible as monkey-patches, are highly version-specific and unlikely to remain monkey-patchable as the branch receives real-world usage. There must be a mechanism to ship unknown future patches to upstream code.
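The brittleness of version-specific monkey-patches can be illustrated with a minimal sketch of version-gated patching. The helper `patch_if_version()` and the `upstream` module are hypothetical and not part of Mitogen or Ansible. A patch written against known versions can only be applied safely to those versions; on anything else it must be skipped, as here, or it risks breaking code it was never written against.

```python
import functools
import types


def patch_if_version(target, attr, replacement, supported, current):
    """Hypothetical helper: replace target.attr with a wrapper that calls
    replacement(original, *args, **kwargs), but only when `current` is one
    of the `supported` versions. Returns True if the patch was applied."""
    if current not in supported:
        # Unknown version: leave upstream code untouched.
        return False
    original = getattr(target, attr)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Delegate to the replacement, handing it the original callable.
        return replacement(original, *args, **kwargs)

    setattr(target, attr, wrapper)
    return True


if __name__ == "__main__":
    # Hypothetical stand-in for an upstream module.
    upstream = types.ModuleType("upstream")
    upstream.greet = lambda name: f"hello {name}"

    def shout(original, name):
        return original(name).upper()

    # Written against "2.6" and "2.7" only, so on "2.8" it is skipped.
    print(patch_if_version(upstream, "greet", shout, {"2.6", "2.7"}, "2.8"))
    print(upstream.greet("ansible"))
    print(patch_if_version(upstream, "greet", shout, {"2.6", "2.7"}, "2.7"))
    print(upstream.greet("ansible"))
```

Every new upstream release means revisiting each such patch and its list of supported versions, and the deeper the changes go, the more of these there are.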

I hoped it could land after Ansible 2.7, benefiting from related changes planned upstream, but those appear to have been delayed or abandoned. That leaves a situation where I cannot ship improvements for at least another 4-6 months, assuming the related changes finally arrive in Ansible 2.8.

To the right is a rough approximation of the components involved in executing a playbook. Components modified or replaced by the stable extension are shown in green, and those replaced by the branch-in-waiting in yellow. Finally, components affected by planned features and optimizations are shown in orange.

Although there are tens of thousands of lines of surrounding code, as should hopefully be clear, the number of untouched major components involved in a run has been dwindling fast. In short, the existing mechanism for delivering improvements is reaching its limit.

The F Word

I hope any seasoned developer, especially one familiar with the size of the Ansible code base, will understand the predicament. There is no problem delivering improvements today, assuming an unsupported one-off code dump was all anyone wanted, but that is never the case.

The problem lies in entering an unsustainable permanent marriage with a large project, an outcome that was an explicit non-goal from the start. At the same time, over the months I have earned significant trust to deliver these kinds of improvements, and abandoning one of the best yet would seem foolish.

Something of a many-variable optimization process has recently come to an end, and a solution has been found that I am comfortable with. While an announcement requires more time and may still not be definite, I wanted to document at least some of my reasoning before it comes.

Even though I wanted to avoid this outcome, and while the solution in mind is not without constraints, it is still a cloud with many silver linings. For instance, configuration steps for new users can be reduced to almost zero, core features can be added with minimal friction, and creative limits are largely lifted.

The key question was how to sustain continued work on a solution that has clear value for a real problem that has plagued upstream since its conception. The answer, it turns out, is obvious: the scalability fixes I wish to release primarily benefit one type of user.

What about upstream?

Beyond debating strawmen and lines of code, no actionable outcome has ever materialized: not after carefully worded chain rattling, and not even in the form of a bug report. Had one materialized, it would at best have been a compromise with an organization that has delivered consistently worsening performance with every major release for the past two and a half years, and that is the principal reason crowdfunding the extension was the only way to deliver real improvements.

The cold reality is that the upstream trend is not a good one: this problem has existed forever, and it is slowly getting worse over time. My best interpretation is that some veterans hate the extension's solution, perhaps including some who have been around since 2012, when Michael DeHaan, the project founder, first attempted a connection method uncannily similar to today's design.

In any case they have my e-mail address, an existing thread to hit Reply to, and at least two invitations to a telephone call. A conversation requires interest and initiative, and above all else it requires two parties.

What About The Extension?

The planned structure keeps the extension front-and-centre, so regardless of outcome it will continue to receive significant feature work and maintenance. It is definitely not going away.

With a third stable release looming, it's probably high time for a quick update. Many bugs have been squashed since July, with stable work recently centered on problems with Ansible 2.6. This involved some changes to temporary file handling and, in the process, the discovery of a huge missed optimization.

v0.2.3 will need only 2 roundtrips for each copy and template. Over a 250 ms transcontinental link, that means 10 seconds to copy 20 files, versus 30 seconds previously, or 2 minutes with vanilla Ansible's best configuration. This work is somewhat delayed while a new RPC chaining mechanism is added to better support all similar future changes, and identical situations likely to appear in similar tools.
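The figures quoted above follow from a simple latency-only model: if every roundtrip costs one full 250 ms, the total time is roundtrips per file × files × RTT. Working backwards from the quoted totals gives the implied roundtrip counts for the previous release and for vanilla Ansible. Those counts are derived here rather than stated in the post, and the model deliberately ignores bandwidth, CPU time, and any overlap between files.

```python
RTT_S = 0.250  # one transcontinental roundtrip, per the text
FILES = 20     # number of files in the text's example


def transfer_time(roundtrips_per_file, files=FILES, rtt=RTT_S):
    """Latency-only estimate: every roundtrip costs one full RTT.
    Ignores bandwidth, CPU time, and any overlap between files."""
    return roundtrips_per_file * files * rtt


def implied_roundtrips(total_seconds, files=FILES, rtt=RTT_S):
    """Invert transfer_time(): roundtrips per file implied by a total."""
    return total_seconds / (files * rtt)


if __name__ == "__main__":
    print(transfer_time(2))         # v0.2.3: 10.0 seconds
    print(implied_roundtrips(30))   # previous release: 6.0 per file
    print(implied_roundtrips(120))  # vanilla best config: 24.0 per file
```

Under this model, each roundtrip removed per file saves 5 seconds across 20 files on such a link, which is why cutting roundtrips matters far more than local CPU work on high-latency connections.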

Just tuning in?

Until next time!

