In the first part of this series, I showed you the theory behind
concurrent memory models and how that theory can be applied to
simple loads and stores. However, loads and stores alone are not
a practical tool for building higher-level synchronization primitives
such as spinlocks, mutexes, and condition variables.
Even though it is possible to synchronize two threads using the
full memory-barrier pattern that was introduced last week (Dekker’s
algorithm), modern processors provide a way that is
easier, more generic, and faster (yes, all three of them): the
compare-and-swap operation.
Source: LWN.net – Lockless patterns: an introduction to compare-and-swap