Python tail-call speedup based on LLVM regression

The Python project’s recent switch to a tail-calling interpreter may not provide as large a speed advantage as initially thought. A blog post from Nelson Elhage gives the details. In short, switching to a tail-call-based interpreter accidentally works around an unfixed regression in LLVM 19. On other compilers, the performance benefit (while still present) is more moderate.

When the tail-call interpreter was announced, I was surprised and impressed by the performance improvements, but also confused: I’m not an expert, but I’m passingly familiar with modern CPU hardware, compilers, and interpreter design, and I couldn’t explain why this change would be so effective. I became curious – and perhaps slightly obsessed – and the reports in this post are the result of a few weeks of off-and-on compiling and benchmarking and disassembly of dozens of different Python binaries, in an attempt to understand what I was seeing.
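The two interpreter designs differ in how they dispatch. A classic interpreter runs one large loop with a `switch` over opcodes; a tail-calling interpreter gives each opcode its own small function, which ends by calling the next instruction's handler in tail position so that the compiler can emit a jump instead of a call. The following is a minimal, hypothetical C sketch of the second style for a toy stack machine. It is not CPython's code, and every name in it (`Opcode`, `interp_run`, the `MUSTTAIL` macro) is invented for illustration. `MUSTTAIL` expands to the `musttail` statement attribute when the compiler reports support for it, and to nothing otherwise; in that case the tail calls are merely likely, not guaranteed, to be optimized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guarantee real tail calls where the compiler supports it (recent Clang,
   GCC 15); otherwise fall back to ordinary calls. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } Opcode;

typedef struct {
    Opcode op;
    int64_t arg; /* operand, used only by OP_PUSH */
} Instr;

/* Every handler has the same signature, which musttail requires. */
typedef int64_t (*Handler)(const Instr *ip, int64_t *sp);

static int64_t op_push(const Instr *ip, int64_t *sp);
static int64_t op_add(const Instr *ip, int64_t *sp);
static int64_t op_mul(const Instr *ip, int64_t *sp);
static int64_t op_halt(const Instr *ip, int64_t *sp);

static const Handler dispatch_table[] = {
    [OP_PUSH] = op_push,
    [OP_ADD] = op_add,
    [OP_MUL] = op_mul,
    [OP_HALT] = op_halt,
};

/* Jump straight to the next instruction's handler instead of returning
   to a central dispatch loop. */
#define DISPATCH(ip, sp) MUSTTAIL return dispatch_table[(ip)->op]((ip), (sp))

/* sp points one past the top of the operand stack. */
static int64_t op_push(const Instr *ip, int64_t *sp) {
    *sp++ = ip->arg;
    ip++;
    DISPATCH(ip, sp);
}

static int64_t op_add(const Instr *ip, int64_t *sp) {
    sp--;
    sp[-1] += sp[0];
    ip++;
    DISPATCH(ip, sp);
}

static int64_t op_mul(const Instr *ip, int64_t *sp) {
    sp--;
    sp[-1] *= sp[0];
    ip++;
    DISPATCH(ip, sp);
}

static int64_t op_halt(const Instr *ip, int64_t *sp) {
    (void)ip;
    return sp[-1]; /* the result is the value on top of the stack */
}

/* Run a program that must end with OP_HALT; no bounds checking. */
int64_t interp_run(const Instr *program) {
    int64_t stack[64];
    assert(program != NULL);
    return dispatch_table[program->op](program, stack);
}
```

For example, running the bytecode for `(2 + 3) * 7` (PUSH 2, PUSH 3, ADD, PUSH 7, MUL, HALT) through `interp_run` yields 35. One commonly cited argument for this layout is that each handler is a separate function, so the compiler optimizes it in isolation rather than across one giant loop body; how much that actually buys depends heavily on the compiler, as the blog post shows.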

[$] Capability analysis for the kernel

One of the advantages of the Rust type system is its ability to encapsulate
requirements about the state of the program in the type system;
often, this state includes which locks must be held to be able to carry out
specific operations. C lacks the ability to express these
requirements, but there would be obvious benefits if that kind of feature
could be grafted onto the language. The Clang compiler has made some
strides in that direction with its thread-safety
analysis
feature; two developers have been independently working to
take advantage of that work for the kernel.
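Clang's analysis lets code declare which lock (a "capability") protects which data, and which locks a function expects to hold on entry; with `-Wthread-safety` enabled, Clang then warns at compile time about accesses that break those declarations. The following is a minimal user-space C sketch using the attribute names from Clang's thread-safety analysis documentation. The macro names, the `Mutex` wrapper, and the counter functions are hypothetical and are not the annotations used by either of the kernel patch sets.

```c
#include <assert.h>
#include <pthread.h>

/* Clang's thread-safety attributes, checked only with -Wthread-safety.
   Other compilers see empty macros. */
#if defined(__clang__)
#  define CAPABILITY(x) __attribute__((capability(x)))
#  define GUARDED_BY(x) __attribute__((guarded_by(x)))
#  define REQUIRES(...) __attribute__((requires_capability(__VA_ARGS__)))
#  define ACQUIRE(...)  __attribute__((acquire_capability(__VA_ARGS__)))
#  define RELEASE(...)  __attribute__((release_capability(__VA_ARGS__)))
#  define NO_THREAD_SAFETY_ANALYSIS __attribute__((no_thread_safety_analysis))
#else
#  define CAPABILITY(x)
#  define GUARDED_BY(x)
#  define REQUIRES(...)
#  define ACQUIRE(...)
#  define RELEASE(...)
#  define NO_THREAD_SAFETY_ANALYSIS
#endif

/* A mutex type that the analysis treats as a capability. */
struct CAPABILITY("mutex") Mutex {
    pthread_mutex_t m;
};
typedef struct Mutex Mutex;

/* The lock wrappers are trusted rather than analyzed. */
void mutex_lock(Mutex *mu) ACQUIRE(mu) NO_THREAD_SAFETY_ANALYSIS;
void mutex_unlock(Mutex *mu) RELEASE(mu) NO_THREAD_SAFETY_ANALYSIS;

void mutex_lock(Mutex *mu) { pthread_mutex_lock(&mu->m); }
void mutex_unlock(Mutex *mu) { pthread_mutex_unlock(&mu->m); }

static Mutex counter_lock = { PTHREAD_MUTEX_INITIALIZER };
static long counter GUARDED_BY(counter_lock) = 0;

/* Callers must already hold counter_lock; Clang checks every call site. */
static void counter_add_locked(long n) REQUIRES(counter_lock);
static void counter_add_locked(long n) { counter += n; }

long counter_increment(void) {
    mutex_lock(&counter_lock);
    counter_add_locked(1);
    long value = counter;
    mutex_unlock(&counter_lock);
    /* Calling counter_add_locked(1) here would draw a warning that
       counter_lock is not held. */
    return value;
}
```

The design point is that the rule "`counter_lock` protects `counter`" lives next to the data instead of in a comment, so the compiler can enforce it; the price is that the locking primitives have to be annotated (or excluded from analysis) once.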

[$] Hash-based module integrity checking

On January 20, Thomas Weißschuh shared a new
patch set
implementing an alternate method for checking the integrity of
loadable kernel modules. This mechanism, which checks module integrity based
on hashes computed at build time instead of using cryptographic signatures,
could enable reproducible kernel builds in more contexts. Several distributions
have already expressed interest in the patch set if Weißschuh can get it
into the kernel.

[$] Timer IDs, CRIU, and ABI challenges

The kernel project has usually been willing to make fundamental internal
changes if they lead to a better kernel in the end. The project also,
though, goes out of its way to avoid breaking interfaces that have been
exposed to user space, even if programs come to rely on behavior that was
never documented. Sometimes, those two principles come into conflict,
leading to a situation where fixing problems within the kernel is either
difficult or impossible. This sort of situation has been impeding
performance improvements in the kernel’s POSIX timers implementation for
some time, but it appears that a solution has been found.

Zen and the Art of Microcode Hacking (Google Bug Hunters)

The Google Bug Hunters blog has a
detailed description
of how a vulnerability in AMD’s microcode-patching
functionality was discovered and exploited; the authors have also released
a set of tools to assist with this kind of research in the future.

Secure hash functions are designed in such a way that there is no
secret key, and there is no way to use knowledge of the
intermediate state in order to generate a collision. However, CMAC
was not designed as a hash function, and therefore it is a weak
hash function against an adversary who has the key. Remember that
every AMD Zen CPU has to have the same AES-CMAC key in order to
successfully calculate the hash of the AMD public key and the
microcode patch contents. Therefore, the key only needs to be
revealed from a single CPU in order to compromise all other CPUs
using the same key. This opens up the potential for hardware
attacks (e.g., reading the key from ROM with a scanning electron
microscope), side-channel attacks (e.g., using Correlation Power
Analysis to leak the key during validation), or other software or
hardware attacks that can somehow reveal the key. In summary, it is
a safe assumption that such a key will not remain secret forever.
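The argument in the excerpt can be demonstrated concretely. With the key in hand, the CMAC chain can be run backward: for any message prefix, an attacker can compute the final block that makes the whole message produce a chosen tag. The C sketch below shows this under loudly stated assumptions: a toy 64-bit Feistel cipher (hypothetical and insecure) stands in for AES-128, messages consist of whole blocks, and the CMAC subkey `k1` is passed in as a parameter instead of being derived from the key as real CMAC does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit Feistel cipher standing in for AES-128.
   Hypothetical and NOT secure; it only needs to be invertible. */
#define TOY_ROUNDS 8

static uint32_t toy_round(uint32_t half, uint32_t round_key) {
    uint32_t x = (half ^ round_key) * 0x9E3779B1u;
    return (x << 7) | (x >> 25);
}

static uint32_t toy_round_key(uint64_t key, int round) {
    return (uint32_t)(key >> ((round & 1) * 32)) ^ (uint32_t)round;
}

uint64_t toy_encrypt(uint64_t key, uint64_t block) {
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 0; i < TOY_ROUNDS; i++) {
        uint32_t t = l ^ toy_round(r, toy_round_key(key, i));
        l = r;
        r = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* Undo the rounds in reverse order; this is what knowing the key allows. */
uint64_t toy_decrypt(uint64_t key, uint64_t block) {
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = TOY_ROUNDS - 1; i >= 0; i--) {
        uint32_t t = r ^ toy_round(l, toy_round_key(key, i));
        r = l;
        l = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* Chaining value after CBC-encrypting every block except the last. */
static uint64_t chain_prefix(uint64_t key, const uint64_t *msg, size_t n) {
    uint64_t state = 0;
    for (size_t i = 0; i + 1 < n; i++)
        state = toy_encrypt(key, state ^ msg[i]);
    return state;
}

/* CMAC-style tag over n >= 1 complete blocks: CBC-MAC whose last block is
   also XORed with subkey k1 (real CMAC derives k1 from the key). */
uint64_t toy_cmac(uint64_t key, uint64_t k1, const uint64_t *msg, size_t n) {
    assert(n >= 1);
    return toy_encrypt(key, chain_prefix(key, msg, n) ^ msg[n - 1] ^ k1);
}

/* Knowing the key, rewrite the last block so that the message, with its
   prefix unchanged, gets the chosen target tag. */
void forge_last_block(uint64_t key, uint64_t k1, uint64_t *msg, size_t n,
                      uint64_t target) {
    assert(n >= 1);
    msg[n - 1] = toy_decrypt(key, target) ^ chain_prefix(key, msg, n) ^ k1;
}
```

After `forge_last_block`, an arbitrary attacker-chosen message and the original message have the same tag, which is exactly a collision. The only ingredient needed was the key, because it makes the decryption step possible; a hash function designed as such offers no comparable way to run the computation backward.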

[$] Two new graph-based functional programming languages

Functional programming languages have a long association with graphs. In the
1990s, it was even thought that parallel graph-reduction
architectures could make functional programming languages much faster than their
imperative counterparts. Alas, that prediction mostly failed to materialize.
Even though graphs are still used as a theoretical formalism in order to define
and optimize functional languages (such as Haskell’s
spineless tagless graph-machine), those languages are mostly compiled down to
the same old non-parallel assembly code that every other language uses. Now, two
projects, Bend and Vine, have sprung up in an attempt to change that and to
prove that parallel graph reduction can be a useful technique for real programs.

Linux From Scratch version 12.3 released

Version 12.3 of Linux From Scratch (LFS) has been released, along with
Beyond Linux From Scratch (BLFS) 12.3. LFS provides step-by-step instructions
on building a customized Linux system entirely from source, and BLFS
helps to extend an LFS installation into a more usable system. Notable
changes in this release include toolchain updates to GNU Binutils
2.44, GNU C Library (glibc) 2.41, and Linux 6.13.2. The Changelog
has a full list of changes since the previous stable release.

[$] A look at Firefox forks

Mozilla’s actions have been rubbing many Firefox fans the
wrong way as of late, and inspiring them to look for alternatives.
There are many choices for users who are looking for a browser that
isn’t part of the Chrome monoculture but is full-featured and suitable
for day-to-day use. For those who are willing to stay in the Firefox
“family” there are a number of good options that have taken vastly
different approaches. These include GNU IceCat, Floorp, LibreWolf, and Zen.

Mozilla reverses course on its terms of use

Mozilla has issued
an update
to its terms of use (TOU) that were announced
on February 26. It has removed a reference in the TOU to
Mozilla’s Acceptable Use Policy “because it seems to be causing
more confusion than clarity”, and has revised the TOU “to more
clearly reflect the limited scope of how Mozilla interacts with user
data”. The new language says:

You give Mozilla the rights necessary to operate Firefox. This
includes processing your data as we describe in the Firefox Privacy
Notice. It also includes a nonexclusive, royalty-free, worldwide
license for the purpose of doing as you request with the content you
input in Firefox. This does not give Mozilla any ownership in that
content.

Mozilla has also updated its Privacy FAQ to provide
more detail about its reasons for the changes.

[$] Guard pages for file-backed memory

One of the many new features packed into the 6.13 kernel release was guard
pages, a hardening mechanism that makes it possible to inject zero-access
pages into a process’s address space in an efficient way. That feature
only supports anonymous (user-space data) pages, though. To make guard
pages more widely useful, Lorenzo Stoakes has put together a patch
set
enabling the feature for file-backed pages as well; in the process,
he examined and resolved a long list of potential problems that extending
the feature could encounter. One potential problem was not on his list,
though.
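For the anonymous-memory case that 6.13 supports, the interface is `madvise()` with `MADV_GUARD_INSTALL`, which turns a range into guard pages, and `MADV_GUARD_REMOVE`, which undoes that. The C sketch below exercises that interface: it maps three pages, installs a guard on the middle one, checks that writing to it raises `SIGSEGV` while its neighbors remain usable, and then removes the guard. It covers user space and anonymous memory only; the file-backed case is what Stoakes's patch set adds and is not shown. The fallback values 102 and 103 are those in the kernel's uapi `mman-common.h`, provided for C libraries whose headers predate the feature. The helper names `write_faults()` and `guard_page_demo()` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

/* Values from the kernel's uapi mman-common.h (added in 6.13), for C
   libraries whose headers do not define them yet. */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103
#endif

static sigjmp_buf fault_env;

static void on_segv(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Returns 1 if writing to *p raised SIGSEGV, 0 otherwise. */
static int write_faults(volatile char *p) {
    struct sigaction sa = {0}, old;
    int faulted = 0;

    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);
    if (sigsetjmp(fault_env, 1) == 0)  /* 1: restore the signal mask too */
        *p = 'x';
    else
        faulted = 1;
    sigaction(SIGSEGV, &old, NULL);
    return faulted;
}

/* Returns 1 if the guard page behaved as expected, 0 if the kernel lacks
   guard-page support (madvise fails with EINVAL), -1 on other failures. */
int guard_page_demo(void) {
    long page = sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Turn the middle page into a guard page. */
    if (madvise(buf + page, page, MADV_GUARD_INSTALL) != 0) {
        int err = errno;
        munmap(buf, 3 * page);
        return err == EINVAL ? 0 : -1;
    }

    int ok = !write_faults(buf)                /* first page still usable */
          && write_faults(buf + page)          /* guard page traps */
          && !write_faults(buf + 2 * page);    /* last page still usable */

    /* After removal the page is ordinary, zero-filled memory again. */
    ok = ok && madvise(buf + page, page, MADV_GUARD_REMOVE) == 0
            && buf[page] == 0;

    munmap(buf, 3 * page);
    return ok ? 1 : -1;
}
```

On kernels without guard-page support, the `madvise()` call fails with `EINVAL`, which the sketch reports as 0 rather than treating as an error.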

[$] Fedora discusses Flatpak priorities

Differences of opinion, as well as outright disputes, between
upstream open-source projects and Linux distribution packagers over
packaging practices are nothing new. It is rarer, though, for those
disputes to boil over into threats of legal action, but a
disagreement between the Open
Broadcaster Software (OBS) Studio
project and Fedora packagers
reached that point in mid-February. After escalation to a higher
authority, things have been worked out to the satisfaction of the OBS
project, but some lingering questions remain. How Fedora should
prioritize Flatpak repositories,
how to handle conflicts between upstreams and Fedora packagers, and
the mechanics of removing or retiring Flatpaks all remain open
questions.