
Let’s talk about transformers, dear readers. Not robots in disguise, but the neural network architecture that underpins basically every modern AI model. Transformers are smart, but they trade training efficiency for inference complexity. To help reduce the amount of compute needed for complex transformers, we use a thing called a Key Value


















