The Python project’s recent switch to a tail-calling interpreter may not provide as large a speed advantage as initially thought. A blog post from Nelson Elhage gives the details. In short, switching to a tail-call-based interpreter accidentally works around an unfixed regression in LLVM 19. On other compilers, the performance benefit (while still present) is more moderate.
When the tail-call interpreter was announced, I was surprised and impressed by the performance improvements, but also confused: I’m not an expert, but I’m passingly-familiar with modern CPU hardware, compilers, and interpreter design, and I couldn’t explain why this change would be so effective. I became curious – and perhaps slightly obsessed – and the reports in this post are the result of a few weeks of off-and-on compiling and benchmarking and disassembly of dozens of different Python binaries, in an attempt to understand what I was seeing.
