Master key concepts with interactive flashcards
What is KV-cache and why is it important for transformer inference?
KV-cache stores the key and value tensors computed for previous tokens' attention. Without it, each generation step would recompute the keys and values of every earlier token, making the work per step O(n²) in sequence length. With the cache, each step only projects the single new token and appends it, so per-step work is O(n). The trade-off is memory: the cache grows linearly with sequence length (and with layer and head count), which dominates memory use for long sequences.
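The idea behind the answer above can be sketched in a few lines of NumPy: a single-head attention generation loop run two ways, once recomputing all keys and values each step and once appending to a cache. This is a toy illustration (random weights, one head, no masking needed since we only attend to the prefix); both loops produce identical outputs.

```python
import numpy as np

def attention(q, K, V):
    # q: (d,), K and V: (t, d) -> weighted sum over the t cached values
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
xs = rng.normal(size=(6, d))  # embeddings for 6 generation steps

# Uncached: re-project K, V for ALL previous tokens at every step -> O(n^2)/step
out_uncached = []
for t in range(len(xs)):
    K = xs[: t + 1] @ Wk          # recomputed from scratch each step
    V = xs[: t + 1] @ Wv
    out_uncached.append(attention(xs[t] @ Wq, K, V))

# Cached: project only the NEW token and append it -> O(n)/step, O(n) memory
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
out_cached = []
for t in range(len(xs)):
    K_cache = np.vstack([K_cache, (xs[t] @ Wk)[None, :]])
    V_cache = np.vstack([V_cache, (xs[t] @ Wv)[None, :]])
    out_cached.append(attention(xs[t] @ Wq, K_cache, V_cache))

# Same outputs either way; the cache only changes the cost, not the math
assert np.allclose(out_uncached, out_cached)
```

The final assertion is the point of the card: caching changes the asymptotics, not the result, which is why the memory cost is the only price paid.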