Eventlet Removal

Migration Testimony: Arnaud Morin's Experience

Arnaud Morin

https://www.arnaudmorin.fr/

Cloud DevOps & Virtualization Evangelist at OVH, OpenStack Contributor

Working on OpenStack Mistral

Technical Context

Which project or component did you migrate away from Eventlet?

I migrated Mistral and related Mistral projects (such as mistral-lib) away from Eventlet. The work is not entirely finished: the code still relies on eventlet in the Epoxy release, but I plan to remove the remaining usage during the F cycle.

How deeply was Eventlet integrated into your codebase?

Mistral relied heavily on eventlet green threads for several features:

  • Thread pools: Mistral uses threads to run workflows, tasks, queues and other concurrent operations
  • Monkey patching: Mistral monkey patches all the resources eventlet covers (threads, queues, etc.), even for unit tests
  • Oslo dependencies: Mistral also relies on oslo projects that use eventlet, such as oslo.messaging and oslo.service

Which framework or alternative did you choose to replace Eventlet, and why?

From a code perspective, I wanted to change as little as possible, so I ruled out the async alternative and looked at Python's native threading and multiprocessing. Most of the time, the classes and objects are similar enough to be swapped with minimal code changes, for example replacing eventlet.sleep with time.sleep. The same holds for other classes such as thread pool executors.
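As a minimal sketch of this kind of near drop-in replacement (not Mistral's actual code: run_task, run_all and the pool size are hypothetical names chosen for illustration), the standard library equivalents can look like the following. The eventlet calls they would stand in for appear only in comments, so the example runs without eventlet installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id):
    """Hypothetical unit of work standing in for a workflow task."""
    # Before: eventlet.sleep(0.01)
    time.sleep(0.01)
    return task_id * 2


def run_all(task_ids, max_workers=4):
    """Run the tasks on native threads and return results in input order."""
    # Before (roughly):
    #   pool = eventlet.GreenPool(size=4)
    #   results = list(pool.imap(run_task, task_ids))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_task, task_ids))


if __name__ == "__main__":
    print(run_all([1, 2, 3]))
```

The executor's map returns results in the order of the inputs, whatever order the threads finish in, which keeps calling code close to what it was before.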

Motivation and Decision

What motivated you to start this migration?

My team runs Mistral in production, so my first motivation was to have a clean Mistral without eventlet. I also had very bad experiences with eventlet in the past in neutron/oslo.messaging and had always wanted to get rid of it, so this was my way of doing so! Finally, I wanted to learn more about how eventlet and concurrency are handled in Mistral, and this was a perfect occasion to dig into the code.

Did you have any concerns or doubts before starting?

My main concern was: is it even possible to get rid of eventlet without completely rewriting the framework around it? I also wasn't sure where to start, but I was very happy to watch what other projects were doing. From the beginning, I wasn't sure I would be able to get rid of it completely, but I was willing to start with simple tasks, such as collecting where eventlet is used.

Migration Process

How did the migration process go? Where did you start?

I started by taking a deep breath and reading the wiki about eventlet removal. It was very useful for identifying which solutions could replace eventlet code with other Python code.

I then did very small, simple tasks. The first one was to replace all eventlet.sleep references with time.sleep.

The second thing I did was to list where eventlet was used, and to identify recurring patterns that could be replaced with similar solutions.
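One possible way to build such an inventory is sketched below. It is an illustration rather than the tooling actually used for Mistral, and every name in it (EventletFinder, eventlet_usages, scan_tree) is hypothetical. It walks Python source with the standard ast module and counts imports of eventlet and attribute accesses such as eventlet.sleep, so that recurring patterns stand out.

```python
import ast
from collections import Counter
from pathlib import Path


def _dotted_name(node):
    """Return 'a.b.c' for a pure Name/Attribute chain, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class EventletFinder(ast.NodeVisitor):
    """Count eventlet imports and accesses such as eventlet.sleep."""

    def __init__(self):
        self.usages = Counter()

    def visit_Import(self, node):
        for alias in node.names:
            if alias.name.split(".")[0] == "eventlet":
                self.usages["import " + alias.name] += 1

    def visit_ImportFrom(self, node):
        if node.module and node.module.split(".")[0] == "eventlet":
            for alias in node.names:
                self.usages[f"from {node.module} import {alias.name}"] += 1

    def visit_Attribute(self, node):
        name = _dotted_name(node)
        if name and name.startswith("eventlet."):
            # Count the outermost chain once and do not descend further.
            self.usages[name] += 1
        else:
            self.generic_visit(node)


def eventlet_usages(source):
    """Return a Counter of eventlet usages found in one source string."""
    finder = EventletFinder()
    finder.visit(ast.parse(source))
    return finder.usages


def scan_tree(root):
    """Aggregate eventlet usages over every .py file under root."""
    total = Counter()
    for path in Path(root).rglob("*.py"):
        try:
            total += eventlet_usages(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
    return total


if __name__ == "__main__":
    sample = "import eventlet\neventlet.sleep(1)\npool = eventlet.GreenPool()\n"
    for name, count in eventlet_usages(sample).most_common():
        print(count, name)
```

This sketch only sees names spelled through the eventlet module. A function brought in with "from eventlet import ..." is counted at the import line but not at each call site, so the output is a starting point for the inventory rather than a complete one.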

What tools or strategies helped you the most?

My strategy was the following:

  • Make sure the CI is green before my changes (unit/devstack tests)
  • Make sure I have a mistral running somewhere with "master" branch code
  • Try to replace one occurrence of eventlet, then run unit tests (they are pretty quick, so it can be done easily from my dev env)
  • If that works, push the change to gerrit and wait for devstack results
  • In parallel, deploy my change to my Mistral running "master" and execute basic tests to make sure everything is still ok

Repeat this process as many times as possible!

Were there any particularly tricky or painful parts?

I mostly had two kinds of issues:

  1. Unit tests failing because of the way the unit test is started/written - It's not always easy to tell that the problem is not your code change but the way the unit test calls it. When that occurred, the fix was usually pretty easy: rewrite/refactor the test to fit the new code.
  2. Eventlet green threads execution order is not the same as native threads - This one is a little trickier, because eventlet green threads and Python threads are not executed in the same order. When a test asserts on something that may only happen later once a native thread is involved, the failure is pretty hard to identify. If the test passes by luck from time to time, it's even harder to debug.
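The second kind of issue can be illustrated with a small sketch. The Recorder class and the test below are hypothetical and do not come from Mistral. The underlying point is that green threads only switch where code yields cooperatively, whereas native threads can be scheduled at any moment, so a test that asserts right after starting background work may pass or fail depending on timing. One common remedy, shown here, is to wait on an explicit signal before asserting.

```python
import threading


class Recorder:
    """Hypothetical stand-in for code that does work on a background thread."""

    def __init__(self):
        self.events = []
        self.done = threading.Event()

    def _work(self):
        self.events.append("task finished")
        self.done.set()  # signal completion to anyone waiting

    def start(self):
        thread = threading.Thread(target=self._work, daemon=True)
        thread.start()
        return thread


def test_task_finishes():
    recorder = Recorder()
    recorder.start()
    # Asserting on recorder.events at this point would be racy with native
    # threads: the worker may or may not have run yet.
    assert recorder.done.wait(timeout=5), "background task did not finish in time"
    assert recorder.events == ["task finished"]


if __name__ == "__main__":
    test_task_finishes()
    print("ok")
```

Waiting on the event with a timeout makes the test independent of scheduling order and turns a hang into a clear failure. Joining the thread returned by start would work as well when the test has access to it.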

Another thing that comes to mind is that changing a very small piece of code is sometimes not enough: more work is needed to make sure the code behaves as expected, and then you end up refactoring too much code. Luckily, I did not encounter this situation very often in Mistral.

Roughly how long did the migration take?

That's hard to say, but I would say at least one full cycle. And it's not yet finished, but most of the big work has been done.

I spent multiple hours/days on this, but I was not focused on it 100% of my time. I tried to focus on removing one eventlet call per week and spent something like 2-3 hours each time to make sure everything works fine.

Were you able to migrate incrementally? If so, how?

Yes. I love the gerrit workflow for this: I can split my work into commits/patchsets, which are then tested incrementally in the CI. Mistral's CI is pretty consistent and reliable, which helped identify where my changes were failing.

Each patchset usually covered a small subset of the eventlet removal. Where the same pattern appeared in multiple places, I made the change everywhere in a single patchset (e.g. eventlet.sleep removal was done in one patchset per repo).

Outcomes and Benefits

What concrete benefits have you seen after migrating?

The main benefits are maintainability and testability. Without eventlet, it's much easier to understand what is going on behind the scenes. Debugging sessions with eventlet have always been painful; this is not the case anymore!

How did your team react to the change?

The Mistral team is pretty small, and so far everyone is happy to see this happening.

Lessons Learned

What advice would you give to a team that's hesitant to migrate?

Eventlet is deprecated, complex to maintain and leads to very weird behavior. While migrating out of it may sound like a hard thing to do, the benefits in maintainability and testability are huge!

Would you like to share a link to a patch, repo, or documentation?

Mistral Eventlet Removal Patches

Final Thoughts

Is there anything else you'd like to share with the community about your experience?

One thing I did before starting was to clean up the CI so that it is reliable and produces reproducible results. You also want to have more than a devstack environment to make sure the system won't break under real conditions.