Eventlet Removal

Migration Testimony: Jay Faulkner's Experience

Jay Faulkner

https://jay.jvf.cc/

Open Source Developer at G-Research, OpenStack Contributor & Former Technical Committee Member

Working on OpenStack Ironic

Technical Context

Which project or component did you migrate away from Eventlet?

My primary technical work focused on Ironic sub-projects, such as Ironic Python Agent (IPA) and Networking Generic Switch (NGS), but I was involved in the migration at the OpenStack level very early. While serving as chair of the Technical Committee in November 2023, I emailed the mailing list about the dire state of eventlet in Python 3.12. This was the kickoff to the years-long project of eventlet removal.

How deeply was Eventlet integrated into your codebase?

Ironic Python Agent (IPA) has traditionally had an interesting relationship with eventlet. One of my earliest encounters with eventlet was troubleshooting deadlocks for weeks when adding TLS support to IPA. To me, this added a real sense of the unknown: I wasn’t entirely sure how integrated it was with our codebase, because so much of it was implicit via monkey patching and use of the eventlet WSGI server. It was clear that getting rid of the use of the eventlet WSGI server would be the first step to determining how tangled the removal would be.

We had an interesting problem though: we could not use a traditional WSGI server (e.g. gunicorn), which expects to be run directly rather than started via a Python method. Because IPA runs from a read-only image, it configures itself on the fly at startup. Using a token and URL from the kernel command line, it authenticates to the Ironic API, which populates the remaining IPA configuration. Only at this point can the IPA API server start up, leaving us with a choice: turn this into a two-step, two-process flow, or find a WSGI server that can be started from within the already-running process.

Which framework or alternative did you choose to replace Eventlet, and why?

While we were trying to figure out how to solve the WSGI server problem, Dmitry Tantsur from Red Hat, a longtime Ironic and Metal3 contributor, found cheroot. As the WSGI server library that powers CherryPy, it was already designed to be imported and started directly from Python. This is exactly what we needed!
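The "configure first, then start the API server inside the same process" flow can be sketched roughly as follows. This is not IPA's actual code: it uses the standard library's wsgiref server as a stand-in for cheroot so that it runs without extra dependencies, and `fetch_remote_config`, `build_app`, `start_api`, `query_api`, and `stop_api` are hypothetical names invented for illustration.

```python
import http.client
import json
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def fetch_remote_config():
    """Hypothetical stand-in for the lookup against the Ironic API."""
    # Port 0 asks the OS for any free port; a real agent would use a fixed one.
    return {"host": "127.0.0.1", "port": 0, "agent_name": "demo-agent"}


def build_app(config):
    """Build a WSGI app that can only exist once the config is known."""
    body = json.dumps({"agent": config["agent_name"]}).encode("utf-8")

    def app(environ, start_response):
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses per-request logging."""

    def log_message(self, format, *args):
        pass


def start_api(config):
    """Start the server from inside the running process, on a worker thread."""
    server = make_server(config["host"], config["port"], build_app(config),
                         handler_class=QuietHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def query_api(server):
    """Issue one GET request against the running server and decode the JSON."""
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request("GET", "/")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def stop_api(server, thread):
    """Stop the serve loop, release the socket, and wait for the thread."""
    server.shutdown()
    server.server_close()
    thread.join(timeout=5)


if __name__ == "__main__":
    server, thread = start_api(fetch_remote_config())
    print(query_api(server))
    stop_api(server, thread)
```

The point of the sketch is only the ordering: the server object is created after the configuration arrives, by ordinary Python code, rather than by an external launcher that owns the process.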

Once the eventlet WSGI dependency was removed from IPA, it became clear how integrated eventlet was with IPA: not very much at all. The only remaining task was replacing a few eventlet.sleep() invocations with time.sleep(). With these changes, we were able to complete the removal of eventlet from IPA.
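A swap of that kind might look like the sketch below. `wait_for` is a hypothetical polling helper, not a function from IPA. One thing to keep in mind with this change: eventlet.sleep() yields to other green threads, while time.sleep() blocks only the calling OS thread, so the replacement is a drop-in only where nothing else depends on that cooperative yield.

```python
import time


def wait_for(predicate, interval=0.01, timeout=1.0):
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        # Previously: eventlet.sleep(interval)
        time.sleep(interval)
    # One last check so a condition met right at the deadline is not missed.
    return bool(predicate())
```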

Motivation and Decision

What motivated you to start this migration?

I've been in the OpenStack community for a while, and we often struggle to find enough resources for the important but less visible work in the community, such as security, release management, or QA. The eventlet migration, a large undertaking to remove technical debt, is an example of this type of less-visible work. I was highly motivated to kick the community into gear and help Ironic be an example of migration, in the hope that it would help others understand the importance of this work.

Additionally, we promise our operators eighteen months of support. As someone who started out in operations, I know people rely on those promises. I was extremely concerned we'd have a major bug (or worse, a security issue) involving eventlet that we might be unable to fix because that project was unmaintained at the time.

I am extremely grateful to the community for stepping up when the initial call for action went out, and to those, like Hervé Beraud and others building this website, who are helping to keep a spotlight on this important work as we drive it to completion.

Did you have any concerns or doubts before starting?

Moving a large community like OpenStack, with literally hundreds of contributors spread across the world, in the same direction is incredibly difficult. Getting them to a consensus on something technical, like an eventlet migration, is even tougher. When this started, I didn't see how it would end, or if it would end. Looking back to November 2023 when I originally wrote that email, if you had told me we'd have Ironic migrated and many other projects well on the way by the 2025.2 release, I would've been thrilled.

For me personally, I found the project very intimidating at first. I learned how to write Python working on OpenStack and have written more Python code running under monkey-patched eventlet than without. In general, the questions around threading models, asyncio, and how to safely extricate eventlet at a technical level were difficult to approach; other members of the Ironic community breaking the work down into digestible pieces helped me significantly.

Migration Process

How did the migration process go? Where did you start?

As I said above, this started for me before the technical migration did: in my role as Technical Committee chair, I documented the alarming state of eventlet on the mailing list in November 2023. While I'm sure the issues were not news to anyone involved with the project, it was valuable to collate them and state the business case for migrating away from eventlet. OpenStack really had no choice: as has been well documented already, changes to core Python functionality meant that eventlet's model was not going to last much longer, and the project was in peril.

It takes more than just words though. As a member of the G-Research Open Source Software (GR-OSS) team, I am lucky to have collaborators whose expertise in this area exceeds my own; in this case, we asked Itamar Turner-Trauring to take a look at the eventlet codebase. He worked together with contributors like Hervé Beraud at Red Hat to bring the library to a functional state, buying OpenStack (and other eventlet-using projects) the badly needed time to migrate.

After working to shore up the status quo and spur the community into action, the hard technical work of migration began. I, along with CID and Adam MacArthur from GR-OSS, started digging into Ironic Python Agent, the small daemon in the Bare Metal project which runs on machines being actively provisioned or cleaned with the Ironic direct driver. It was our hope that migrating this early would shine a light on potential problem points and help supply information for the migration of Ironic; it did this successfully, as the cheroot library piloted as IPA's replacement WSGI server is also in use in Ironic.

What tools or strategies helped you the most?

Trusting CI. There were a lot of times, such as during the migration for NGS, where we removed explicit eventlet references and the monkey patch and it just worked. It required having faith that the CI would find issues and knowing our tests well enough to identify if a failure was environmental or some new issue.
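One cheap way to let CI keep watch after the monkey patch is gone is a guard test that fails if anything pulls eventlet back in. The sketch below is an assumption about how such a check could be written, not something taken from the Ironic or NGS test suites; `loaded_submodules` and `test_eventlet_not_imported` are hypothetical names.

```python
import sys


def loaded_submodules(package, modules=None):
    """Return sorted names of already-imported modules belonging to `package`."""
    modules = sys.modules if modules is None else modules
    prefix = package + "."
    return sorted(
        name for name in modules
        if name == package or name.startswith(prefix)
    )


def test_eventlet_not_imported():
    # Hypothetical guard: run after importing the service's own modules so that
    # a stray "import eventlet" anywhere in the import graph is caught.
    assert loaded_submodules("eventlet") == []
```

A check like this only covers imports that happen before it runs, so it complements the existing functional tests rather than replacing them.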

Trusting the community and finding your role. My background is in operations and Linux systems, and I'm less experienced than many other Ironic contributors in the realm that eventlet code lives in. I tried to find spots to help where I could, and asked the community for help where I couldn't. In fact, during the most technically intensive portion of the Ironic migration I was on a sabbatical. It was very refreshing to return and see how much progress had been made.

Were there any particularly tricky or painful parts?

With Ironic — and even NGS — we often write and maintain code that interacts with hardware we don't possess or have a method to test. Even at this writing, a part of me is a little nervous about when Ironic's 2025.2 release gets used in the real world.

Another tricky part is still ongoing even though the migration is code complete: fully understanding the change in performance. Ironic has multiple different use cases and operating modes; while we have a lot of confidence we've sped things up, the shape of the performance has changed. I anticipate we'll continue working through bottlenecks and improving on our post-eventlet architecture in the coming months and years.

Roughly how long did the migration take?

Experiments around removing eventlet from IPA started as early as November 2024; however, the bulk of the Ironic-specific eventlet removal work occurred during the OpenStack 2025.2 release cycle. It's important to note that the long tail of prework (early research identifying potential issues, the oslo libraries updating to support being used without eventlet, and the updates to eventlet itself) was all part of enabling Ironic to get it wrapped up so quickly.

Were you able to migrate incrementally? If so, how?

Incrementally can mean many things. Ironic migrated in multiple commits, with CI passing between the commits, but realistically, I wouldn't want to run on any of those intermediate commits. Ironic decided that the best thing for our operators was to migrate entirely on a release boundary: users of 2025.1 will use eventlet; users of 2025.2 will not. This allows us to have one performance model to worry about: if an operator gets a performance regression, we want to fix it and roll forward.

Outcomes and Benefits

What concrete benefits have you seen after migrating?

Julia covered this well in her blog post on ironicbaremetal.org: the shape of the performance, particularly around memory usage and memory usage reporting, changed significantly. Our post-migration testing showed an approximately 10% speed boost in some internal benchmarks, and I expect that some operators may be able to tune for their environment and get even more of a boost. Over time, we may see this improve further as we learn more about the performance of a post-eventlet Ironic.

For me, the primary concrete benefit is that we've disarmed the ticking time bomb. OpenStack Ironic will continue to work even if the next version of Python completely breaks eventlet. This was an existential threat to OpenStack as we know it, and it's nice to know we're going to overcome it as a community.

How did your team react to the change?

I don't know how everyone felt, but I know I observed something that I think speaks volumes: we had a large amount of community participation in this project. Multiple Ironic contributors across multiple companies all were able to contribute something to get it done. Usually you only get reviews or engagement from a portion of the community interested in whatever the feature is; it was very nice to see all corners of the Ironic community rally around the work needed and accomplish it.

Lessons Learned

What advice would you give to a team that's hesitant to migrate?

If you're reading this and still need advice on how to start your migration, you're already starting too late. This is not the kind of thing that can be done at the last minute. Make a list of what you need to do and start on the first item right away. The window for migrating off eventlet while it's still working is closing rapidly.

Is there anything you would do differently next time?

Start a year earlier, maybe more; we started too late. The state of eventlet had been dubious for a while, and I think that, as a TC member before that, I could have done more to raise awareness and possibly get us started earlier. As it is now, some OpenStack projects will have to continue supporting code running on eventlet through 2028. Even Ironic's last eventlet-dependent release, 2025.1, will be supported until mid-2026, so Ironic is not yet out of the woods either.

Have you faced blockers? If so, which?

There were some technical problems during the Ironic migration, which we worked out. Once the oslo libraries supported a threading-based backend, we were mostly free to migrate Ironic.

Would you like to share a link to a patch, repo, or documentation?

I have a few videos about OpenStack, as well as episodes of the GR-OSS OUT podcast at https://youtube.com/@oss-gr. Keep an eye there for content about a post-eventlet Ironic.

Final Thoughts

Is there anything else you'd like to share with the community about your experience?

In my podcast, we talk about "gross" moments in open source, where something happens that's unpleasant and you have to work through it. This really was one for the entire OpenStack ecosystem. Rallying around taking working code using one model and refactoring it to another model and hoping nothing breaks is extremely intimidating — and that's before thinking about the sheer scope of OpenStack: hundreds of individual services that all needed this done. In November 2023, I was afraid this might be impossible. Now we can see the finish line — and the community and software are better for it.