This is a living document! If you're interested in contributing to any item, please join the #sig-ci channel in vLLM Slack!
If any item you want is not on the roadmap, your suggestions and contributions are strongly welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.
Release
This quarter, we want to support and publish pre-built artifacts for various platforms and hardware on a more regular basis.
✅ Finished
🛣️ In roadmap
| Platform | Per commit wheel | Nightly wheel | Versioned wheel | Per commit image | Nightly image | Versioned image |
| --- | --- | --- | --- | --- | --- | --- |
| CUDA (Default) | ✅ | 🛣️ | ✅ | ✅ | 🛣️ | ✅ |
| CUDA 11.8 | ✅ | ✅ | 🛣️ | 🛣️ | | |
| CUDA 12.1 | ✅ | ✅ | 🛣️ | 🛣️ | | |
| aarch (GH200) | 🛣️ | 🛣️ | 🛣️ | 🛣️ | 🛣️ | |
| ROCm | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| TPU | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| Neuron | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| CPU | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| HPU | 🛣️ | | | | | |
| XPU | 🛣️ | | | | | |
| IBM Power | 🛣️ | 🛣️ | | | | |
| IBM Z (s390x) | 🛣️ | | | | | |
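For context on how per-commit artifacts are typically consumed, here is a minimal sketch that checks whether a per-commit wheel has been published. It assumes the wheel index lives at https://wheels.vllm.ai/<commit>/ and that wheels keep an abi3 manylinux filename; both are assumptions about the current setup, not commitments of this roadmap.

```python
# Minimal sketch (assumptions, not roadmap commitments): per-commit wheels are
# assumed to be published under https://wheels.vllm.ai/<commit>/ with an abi3
# manylinux filename. Adjust both if the index layout changes.
import urllib.error
import urllib.request

WHEEL_INDEX = "https://wheels.vllm.ai"  # assumed wheel index root


def per_commit_wheel_url(
    commit: str,
    filename: str = "vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl",  # assumed name
) -> str:
    """Build the URL a per-commit wheel is expected to be published under."""
    return f"{WHEEL_INDEX}/{commit}/{filename}"


def wheel_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if the wheel answers a HEAD request with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        return False


if __name__ == "__main__":
    commit = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit hash
    url = per_commit_wheel_url(commit)
    state = "available" if wheel_exists(url) else "not published (or layout changed)"
    print(f"{url} -> {state}")
```

Installing such a wheel is then just a matter of pointing pip at that URL (or at the nightly index for nightly wheels).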
CI
Our CI has been growing, and that growth comes with a lot of issues: higher cost, CI taking longer or timing out, flakiness, and difficulty tracking when tests started failing. We need to improve the stability of our CI pipeline, optimize CI time, and clean up tech debt to make it easier for others to contribute!
Latency & cost
Split & shorten long CI jobs (e.g. Entrypoints, Spec decoding, Kernels, etc.); see the sharding sketch below for one way to split a long suite.
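As a concrete illustration of the job-splitting item above, the sketch below shows one generic way to shard a long test suite deterministically across parallel CI agents. It is not vLLM's actual CI configuration, just a sketch of the technique: each agent selects a stable subset of tests by hashing test identifiers.

```python
# Generic sketch of deterministic test sharding (not vLLM's actual CI config):
# hash every test id into one of `num_shards` buckets so parallel agents run
# disjoint, stable subsets of a long suite.
import hashlib


def shard_tests(test_ids: list[str], num_shards: int, shard_index: int) -> list[str]:
    """Return the subset of tests assigned to shard `shard_index` out of `num_shards`."""
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    selected = []
    for test_id in test_ids:
        bucket = int(hashlib.sha256(test_id.encode()).hexdigest(), 16) % num_shards
        if bucket == shard_index:
            selected.append(test_id)
    return selected


if __name__ == "__main__":
    # Hypothetical test ids, purely for illustration.
    tests = [f"tests/entrypoints/test_api.py::test_case_{i}" for i in range(12)]
    for shard in range(3):
        subset = shard_tests(tests, num_shards=3, shard_index=shard)
        print(f"shard {shard}: {len(subset)} tests")
```

Hash-based sharding keeps assignments stable as tests are added or removed, which also helps when tracking down which shard a new failure came from.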
Performance benchmark
In Q2, we plan to revamp the way performance benchmarks are presented. In particular, we are splitting the purpose of the benchmarks into two:
- User facing: versioned benchmarks on a variety of workloads with reproducible commands (one possible record format is sketched below).
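A minimal sketch of what a versioned, reproducible benchmark record could look like is shown below. The field names and the example command are illustrative assumptions, not an existing vLLM schema.

```python
# Illustrative sketch of a versioned benchmark record (not an existing vLLM
# schema): pin the version, commit, workload, and exact command so a result
# can be reproduced and compared across releases.
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BenchmarkRecord:
    vllm_version: str                 # released version or nightly tag
    commit: str                       # exact git commit that was benchmarked
    workload: str                     # e.g. "serving, ShareGPT, 1xH100"
    command: str                      # reproducible invocation
    metrics: dict = field(default_factory=dict)  # e.g. throughput, TTFT, TPOT


if __name__ == "__main__":
    record = BenchmarkRecord(
        vllm_version="0.0.0.dev0",            # placeholder values for illustration
        commit="0123456789abcdef",
        workload="serving, ShareGPT, 1xH100",
        command="python benchmarks/benchmark_serving.py --model <model> ...",
        metrics={"requests_per_s": 0.0, "mean_ttft_ms": 0.0},
    )
    print(json.dumps(asdict(record), indent=2))
```

Keeping one such record per release (and per nightly) is what makes user-facing benchmarks trackable and comparable over time.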
Cutting across both efforts is funding for compute resources. We need the tests to run on multiple nodes of 8xH100 as the main testing target, but we also need H200, GH200, B200, and A100, along with AMD MI300x, Trainium2, and TPU v6e by the end of Q2. On the infrastructure side, we can connect all the machines under one VPC using Tailscale, minimizing the distribution of IPs and keys.
cc @simon-mo @hmellor @DarkLight1337 @ywang96 @yangw-dev @houseroad @Alexei-V-Ivanov-AMD @xuechendi @russellb @youkaichao