Master key concepts with interactive flashcards
What is KV-cache and why is it important for transformer inference?
KV-cache stores the key and value tensors computed for previous tokens' attention. Without it, each generation step would recompute the keys and values of every earlier token, making the work per step O(n²) in sequence length. With the cache, each step only projects the single new token and appends it, so per-step work is O(n). The trade-off is memory: the cache grows linearly with sequence length (and with layer and head count), which dominates memory use for long sequences.
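The idea behind the answer above can be sketched in a few lines of NumPy: a single-head attention generation loop run two ways, once recomputing all keys and values each step and once appending to a cache. This is a toy illustration (random weights, one head, no masking needed since we only attend to the prefix); both loops produce identical outputs.

```python
import numpy as np

def attention(q, K, V):
    # q: (d,), K and V: (t, d) -> weighted sum over the t cached values
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
xs = rng.normal(size=(6, d))  # embeddings for 6 generation steps

# Uncached: re-project K, V for ALL previous tokens at every step -> O(n^2)/step
out_uncached = []
for t in range(len(xs)):
    K = xs[: t + 1] @ Wk          # recomputed from scratch each step
    V = xs[: t + 1] @ Wv
    out_uncached.append(attention(xs[t] @ Wq, K, V))

# Cached: project only the NEW token and append it -> O(n)/step, O(n) memory
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
out_cached = []
for t in range(len(xs)):
    K_cache = np.vstack([K_cache, (xs[t] @ Wk)[None, :]])
    V_cache = np.vstack([V_cache, (xs[t] @ Wv)[None, :]])
    out_cached.append(attention(xs[t] @ Wq, K_cache, V_cache))

# Same outputs either way; the cache only changes the cost, not the math
assert np.allclose(out_uncached, out_cached)
```

The final assertion is the point of the card: caching changes the asymptotics, not the result, which is why the memory cost is the only price paid.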