
Commit bfffc90

LucasWilkinson, lk-chen
authored andcommitted
[BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) (vllm-project#17283)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
1 parent eaa4367 commit bfffc90

1 file changed (+1, -1 lines)

vllm/v1/attention/backends/flash_attn.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -372,7 +372,7 @@ def schedule(batch_size, cu_query_lens, max_query_len, seqlens,
         suffix_kv_lens = torch.from_numpy(suffix_kv_lens).to(
             self.runner.device)
         prefix_scheduler_metadata = schedule(
-            batch_size=num_reqs,
+            batch_size=1,
             cu_query_lens=cu_prefix_query_lens,
             max_query_len=num_actual_tokens,
             seqlens=prefix_kv_lens,
```
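Why `batch_size=1` is the right value: in the cascade path the shared-prefix pass appears to treat every query token in the batch as one sequence. The call passes `max_query_len=num_actual_tokens`, which suggests `cu_prefix_query_lens` holds a single interval `[0, num_actual_tokens]` and `prefix_kv_lens` a single length (the common prefix). Metadata sized for `num_reqs` entries then no longer matches the one-entry batch the kernel sees, producing the `scheduler_metadata must have shape (metadata_size)` error. The toy model below is a sketch under those assumptions. `schedule_metadata`, `prefix_attention` and `try_batch_size` are hypothetical stand-ins, not the real vLLM or FlashAttention functions, and plain lists replace tensors.

```python
# Toy model of the batch-size mismatch addressed by this commit.
# schedule_metadata() and prefix_attention() are HYPOTHETICAL stand-ins,
# not the real vLLM / FlashAttention API; lists stand in for tensors.

def schedule_metadata(batch_size, cu_query_lens, seqlens):
    """Size the metadata purely from batch_size (assumed behavior)."""
    return [0] * batch_size  # one placeholder slot per batch entry

def prefix_attention(cu_query_lens, seqlens, scheduler_metadata):
    """Infer the batch size from the inputs, then validate the metadata."""
    batch_size = len(cu_query_lens) - 1
    if len(seqlens) != batch_size:
        raise ValueError("seqlens must have one entry per sequence")
    if len(scheduler_metadata) != batch_size:
        raise RuntimeError("scheduler_metadata must have shape (metadata_size)")
    return batch_size

# Three requests sharing a 16-token prefix; 2 + 3 + 1 = 6 new query tokens.
num_reqs, num_actual_tokens, common_prefix_len = 3, 6, 16

# Assumed layout: the shared-prefix pass sees all query tokens as ONE sequence.
cu_prefix_query_lens = [0, num_actual_tokens]
prefix_kv_lens = [common_prefix_len]

def try_batch_size(batch_size):
    """Schedule with the given batch size and run the toy prefix pass."""
    metadata = schedule_metadata(batch_size, cu_prefix_query_lens, prefix_kv_lens)
    try:
        return prefix_attention(cu_prefix_query_lens, prefix_kv_lens, metadata)
    except RuntimeError as err:
        return str(err)

print(try_batch_size(num_reqs))  # old call: returns the error message
print(try_batch_size(1))         # fixed call: returns 1
```

In this model the scheduler trusts `batch_size` while the kernel derives the batch size from `cu_query_lens`, so the two only agree when the caller passes the number of sequences the prefix pass actually contains, which here is one.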
