
[Roadmap] vLLM Release/CI/Performance Benchmark Q2 2025 #16284


Open
2 of 21 tasks
khluu opened this issue Apr 8, 2025 · 3 comments

khluu commented Apr 8, 2025

This is a living document! If you're interested in contributing to any item, please join the #sig-ci channel in the vLLM Slack!

If any item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.


Release

This quarter, we want to support and publish pre-built artifacts for various platforms and hardware on a more regular basis.
✅ Finished · 🛣️ In roadmap

Artifact types: per-commit wheel, nightly wheel, versioned wheel, per-commit image, nightly image, versioned image.

| Platform | Artifacts in roadmap |
| --- | --- |
| CUDA (default) | 🛣️ 🛣️ |
| CUDA 11.8 | 🛣️ 🛣️ |
| CUDA 12.1 | 🛣️ 🛣️ |
| aarch64 (GH200) | 🛣️ 🛣️ 🛣️ 🛣️ 🛣️ |
| ROCm | 🛣️ 🛣️ 🛣️ 🛣️ |
| TPU | 🛣️ 🛣️ 🛣️ 🛣️ |
| Neuron | 🛣️ 🛣️ 🛣️ 🛣️ |
| CPU | 🛣️ 🛣️ 🛣️ 🛣️ |
| HPU | 🛣️ |
| XPU | 🛣️ |
| IBM Power | 🛣️ 🛣️ |
| IBM Z (s390x) | 🛣️ |

CI

Our CI has been growing, and that comes with a lot of issues (higher cost, CI taking longer or timing out, flakiness, difficulty tracking when tests started failing, etc.). We need to improve the stability of our CI pipeline, optimize CI time, and clean up tech debt to make it easier for others to contribute!

  • Latency & cost
    • Split & shorten long CI jobs (e.g. Entrypoints, Spec decoding, Kernels, etc.)
    • Optimize image build time
    • Reduce number of models used in unit tests
    • Implement better conditional testing strategy
  • Stability
  • Onboard new runner types
    • L40S on AWS
    • A100 from Red Hat MOC cluster
    • TPU v6e
    • IBM S390x
  • Refactoring & clean up
    • CI infra repository
      • (WIP) Replace the jinja2 template with a cleaner-looking pipeline generator for Buildkite (a rough sketch follows this list)
      • Split hardware CI (AMD, Intel, IBM, etc.) into separate modules
    • Split unit test list into multiple modules (multimodal, language, V1, distributed, etc.)
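
To make the pipeline-generator item above concrete, here is a minimal sketch of the idea, assuming hypothetical step names, test paths, and queue labels rather than the actual vLLM CI configuration: plain Python emits Buildkite steps as YAML, and a simple changed-path filter doubles as a conditional-testing strategy.

```python
# Sketch of a Buildkite pipeline generator with path-based conditional testing.
# Step labels, paths, and queue names below are hypothetical examples.
import subprocess
import yaml  # pip install pyyaml

# Hypothetical mapping: a test step runs only when files under its paths changed.
STEPS = [
    {"label": "multimodal tests", "command": "pytest tests/multimodal",
     "paths": ["vllm/multimodal/", "tests/multimodal/"]},
    {"label": "distributed tests", "command": "pytest tests/distributed",
     "paths": ["vllm/distributed/", "tests/distributed/"]},
]

def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def build_pipeline() -> dict:
    changed = changed_files()
    steps = []
    for step in STEPS:
        # Conditional testing: skip the step if none of its paths were touched.
        if not any(f.startswith(p) for f in changed for p in step["paths"]):
            continue
        steps.append({
            "label": step["label"],
            "command": step["command"],
            "agents": {"queue": "gpu"},  # hypothetical queue name
        })
    return {"steps": steps}

if __name__ == "__main__":
    # Emit the pipeline as YAML on stdout.
    print(yaml.safe_dump(build_pipeline(), sort_keys=False))
```

Buildkite supports dynamically generated pipelines, so the generator's output can be piped straight into `buildkite-agent pipeline upload`.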

Performance benchmark

In Q2, we plan to revamp the way performance benchmarks are presented. In particular, we are splitting the benchmarks into two categories by purpose:

  • Developer facing: performance regression, accuracy, stress test, release gating performance tests.
  • User facing: versioned benchmarks on a variety of workloads with reproducible commands.
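
As an illustration of what a user-facing, reproducible benchmark run could look like, here is a minimal sketch; the pinned version, model, and flag values are illustrative assumptions, and the flags mirror benchmarks/benchmark_serving.py rather than a finalized benchmark suite.

```python
# Sketch of a reproducible, versioned benchmark run (values are examples only).
import subprocess

VLLM_VERSION = "0.8.4"                        # example pinned release
MODEL = "meta-llama/Llama-3.1-8B-Instruct"    # example model

# Pin the exact vLLM release so results are tied to a specific version.
subprocess.run(["pip", "install", f"vllm=={VLLM_VERSION}"], check=True)

# Run the serving benchmark against an already-running vLLM server
# (an OpenAI-compatible endpoint on localhost:8000 is assumed).
subprocess.run(
    [
        "python", "benchmarks/benchmark_serving.py",
        "--backend", "vllm",
        "--model", MODEL,
        "--dataset-name", "random",
        "--num-prompts", "500",
        "--request-rate", "8",
    ],
    check=True,
)
```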

Cross-cutting both efforts will be funding for compute resources. We need the tests to run on multiple nodes of 8xH100 as the main testing target, but also on H200, GH200, B200, and A100, along with AMD MI300x, Trainium2, and TPU v6e by the end of Q2. On the infrastructure side, we can connect all the machines under one VPC using Tailscale, minimizing the need to distribute IPs and keys.

cc @simon-mo @hmellor @DarkLight1337 @ywang96 @yangw-dev @houseroad @Alexei-V-Ivanov-AMD @xuechendi @russellb @youkaichao

tlrmchlsmth commented:

We should plan for a CUDA 12.8 build for B200

xuechendi commented:

@khluu I think last row should be "XPU"?

khluu commented Apr 9, 2025

> We should plan for a CUDA 12.8 build for B200

I think we will probably upgrade the default CUDA version to 12.8 at some point.
