Migration Testimony: Dmitry Tantsur's Experience
    Dmitry Tantsur
https://www.linkedin.com/in/dtantsur/
Senior Principal Software Engineer at Red Hat, OpenStack Contributor.
Working on OpenStack Ironic
Technical Context
Which project or component did you migrate away from Eventlet?
I helped migrate Ironic with a focus on standalone applications.
How deeply was Eventlet integrated into your codebase?
Probably deeper than average. Not only did Ironic rely on green threads and monkey patching, but its Conductor also baked in certain performance and scaling assumptions. We also relied on oslo.service's WSGI support for standalone applications (something that many OpenStack projects got rid of).
What were the main technical pain points you identified early or during your migration?
Since we wanted to keep standalone API services, we needed to find a proper replacement for the WSGI server of oslo.service+eventlet. On top of that, we had to revisit the design of worker threads in the conductor. We also lost an easy path towards very high parallelism - something that we still need to recover for some of our background operations.
Which framework or alternative did you choose to replace Eventlet, and why?
The Conductor side was relatively simple: we already relied on the Futurist library, so we kept doing so, just with the threaded backend. I had to write a new thread pool implementation that was able to scale down the number of threads as the load falls.
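To make the scale-down idea concrete, here is a minimal sketch using only the Python standard library. It is not the Futurist implementation or the pool that ended up in Ironic; the class name `ShrinkingThreadPool` and its `max_workers` and `idle_timeout` parameters are hypothetical. Workers are started on demand and exit after sitting idle, so the thread count falls as the load falls.

```python
import queue
import threading
from concurrent.futures import Future


class ShrinkingThreadPool:
    """Toy pool (hypothetical): grows on demand, shrinks when workers idle."""

    def __init__(self, max_workers=4, idle_timeout=0.2):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._max_workers = max_workers
        self._idle_timeout = idle_timeout
        self._workers = 0  # live worker threads
        self._idle = 0     # workers currently waiting for a task

    def submit(self, fn, *args):
        future = Future()
        with self._lock:
            self._tasks.put((future, fn, args))
            # Grow only if queued work exceeds the idle workers available.
            if (self._tasks.qsize() > self._idle
                    and self._workers < self._max_workers):
                self._workers += 1
                threading.Thread(target=self._run, daemon=True).start()
        return future

    def worker_count(self):
        with self._lock:
            return self._workers

    def _run(self):
        while True:
            with self._lock:
                self._idle += 1
            try:
                item = self._tasks.get(timeout=self._idle_timeout)
            except queue.Empty:
                item = None
            with self._lock:
                self._idle -= 1
                # Exit only if the queue is still empty while holding the
                # lock, so a task submitted concurrently is never stranded.
                if item is None and self._tasks.empty():
                    self._workers -= 1
                    return
            if item is None:
                continue
            future, fn, args = item
            try:
                future.set_result(fn(*args))
            except BaseException as exc:
                future.set_exception(exc)
```

Calling `pool.submit(fn, arg)` returns a standard `concurrent.futures.Future`. After the queue has been empty for `idle_timeout` seconds, `pool.worker_count()` drops back to zero, and the next `submit` starts a fresh worker.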
The API was a different story. I did a survey of established HTTP frameworks in Python and stumbled upon CheRoot - the HTTP server behind CherryPy. It was not without its own rough edges, but at least it ticked the main boxes and was an active project.
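For readers unfamiliar with it, serving a WSGI application from the library's `cheroot.wsgi.Server` class looks roughly like the sketch below. The `app` and `serve` functions, the bind address, and the thread count are illustrative assumptions, not Ironic's actual configuration.

```python
def app(environ, start_response):
    """Minimal WSGI application standing in for a real API service."""
    body = b'{"status": "ok"}'
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8080, threads=10):
    """Serve `app` on native threads (host, port, threads are assumptions)."""
    # Imported lazily so the app above can be exercised without the
    # third-party server library installed.
    from cheroot import wsgi

    server = wsgi.Server((host, port), app, numthreads=threads)
    try:
        server.start()  # blocks until the server is stopped
    except KeyboardInterrupt:
        server.stop()
```

Calling `serve()` starts the server in the foreground. Because the application is plain WSGI, it does not depend on which server hosts it, and that is what makes a swap away from the oslo.service+eventlet server feasible.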
Motivation and Decision
What motivated you to start this migration?
We have definitely faced eventlet-related bugs in the past, especially around its TLS handling. The biggest push came from the Eventlet community itself, or rather from the part of the OpenStack community that took over the project. It was clear that the clock was ticking.
Did you have any concerns or doubts before starting?
Our biggest argument was about the choice between native threads and async/await.
What justified your choices?
I personally advocated for native threads for a very pragmatic reason: I did not believe we had the capacity to pull off a complete migration to asyncio within any reasonable timeframe. To be clear, asyncio is not off the table. In fact, we may end up migrating some parts of the conductor to it in the future because of the parallelism issues mentioned above.
Migration Process
How did the migration process go? Where did you start?
With a very large etherpad :) I created a rough plan based on my knowledge of the code. It proved very incomplete but it allowed us to start and, more importantly, allowed less experienced community members to help. We practiced on Ironic Python Agent first before moving on to Ironic.
What tools or strategies helped you the most? Were there any particularly tricky or painful parts?
The trickiest part was the RPC bus. In addition to oslo.messaging, Ironic also supports JSON RPC and a single all-in-one process without any RPC. It is the latter architecture that caused us the biggest headache. Unfortunately, I'm guilty of making a pretty bad decision at some point, which we managed to revert in time for the coordinated release.
Julia and I also spent quite some time polishing the Conductor thread pool.
Roughly how long did the migration take?
I think the core of it started after the previous PTG and finished a couple of weeks before the final 2025.2 release.
Were you able to migrate incrementally? If so, how?
Sort of. We definitely did not want to migrate all of Ironic in one patchset. We started by migrating the API and JSON RPC to CheRoot separately. Then we migrated the RPC-less process (partly breaking it along the way), which unblocked migrating the Conductor and its threads.
Outcomes and Benefits
What concrete benefits have you seen after migrating?
I think it's too early to judge. I'm happy not to depend on the fate of eventlet though.
How did your team react to the change?
Everyone was on board. Eventlet bugs had been pretty well known before.
Lessons Learned
What advice would you give to a team that's hesitant to migrate?
You need to collect people who really know the project and let them brainstorm on the plan. Without that, it's way too easy to make invalid assumptions or underestimate the scope. Create some sort of a scale or performance test in advance, even if a very simple one. The Metal3 project had a test that quickly creates and deletes 100 node resources, and this test saved us from a big embarrassment.
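A very simple scale test in this spirit could look like the sketch below. It is not the Metal3 test: `FakeNodeAPI` is a hypothetical in-memory stand-in for the service under test, and the `count` and `parallelism` defaults are assumptions. In practice the `create` and `delete` calls would hit the real API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeNodeAPI:
    """Hypothetical in-memory stand-in for the service under test."""

    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = set()

    def create(self, name):
        with self._lock:
            self.nodes.add(name)

    def delete(self, name):
        with self._lock:
            self.nodes.remove(name)


def churn_nodes(api, count=100, parallelism=20):
    """Create, then delete, `count` nodes concurrently.

    Returns the number of nodes seen after creation and the elapsed seconds.
    """
    names = [f"node-{i}" for i in range(count)]
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # list() drains the iterator so any worker exception is re-raised.
        list(pool.map(api.create, names))
        created = len(api.nodes)
        list(pool.map(api.delete, names))
    return created, time.monotonic() - started
```

Comparing the elapsed time before and after each migration step, and failing the run if any node is left behind, is usually enough to surface gross regressions in parallelism.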
Is there anything you would do differently next time?
I would rather work on a separate branch and merge everything at once. We ended up releasing an intermediary version of Ironic in a half-migrated state, and its bugs have caused headaches in my downstream.
Have you faced blockers? If so, which?
Only briefly, when we discovered that Futurist's existing thread pool did not match our requirements.
Would you like to share a link to a patch, repo, or documentation?
https://etherpad.opendev.org/p/ironic-eventlet-removal was our working document (probably outdated at this point), and patches can be found in project:openstack/ironic topic:eventlet-removal