This is a living document! If you're interested in contributing to any item, please join the #sig-ci channel in vLLM Slack!
If any item you want is not on the roadmap, your suggestions and contributions are strongly welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.
Release
This quarter, we want to support and publish pre-built artifacts for various platforms and hardware on a more regular basis.
✅ Finished
🛣️ In roadmap
| Platform | Per commit wheel | Nightly wheel | Versioned wheel | Per commit image | Nightly image | Versioned image |
| --- | --- | --- | --- | --- | --- | --- |
| CUDA (Default) | ✅ | 🛣️ | ✅ | ✅ | 🛣️ | ✅ |
| CUDA 11.8 | ✅ | ✅ | 🛣️ | 🛣️ | | |
| CUDA 12.1 | ✅ | ✅ | 🛣️ | 🛣️ | | |
| aarch (GH200) | 🛣️ | 🛣️ | 🛣️ | 🛣️ | 🛣️ | |
| ROCm | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| TPU | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| Neuron | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| CPU | 🛣️ | 🛣️ | 🛣️ | 🛣️ | | |
| HPU | 🛣️ | | | | | |
| XPU | 🛣️ | | | | | |
| IBM Power | 🛣️ | 🛣️ | | | | |
| IBM Z (s390x) | 🛣️ | | | | | |
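For context on how per-commit artifacts are typically consumed, here is a minimal sketch that checks whether a per-commit wheel has been published. It assumes the wheel index lives at https://wheels.vllm.ai/<commit>/ and that wheels keep an abi3 manylinux filename; both are assumptions about the current setup, not commitments of this roadmap.

```python
# Minimal sketch (assumptions, not roadmap commitments): per-commit wheels are
# assumed to be published under https://wheels.vllm.ai/<commit>/ with an abi3
# manylinux filename. Adjust both if the index layout changes.
import urllib.error
import urllib.request

WHEEL_INDEX = "https://wheels.vllm.ai"  # assumed wheel index root


def per_commit_wheel_url(
    commit: str,
    filename: str = "vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl",  # assumed name
) -> str:
    """Build the URL a per-commit wheel is expected to be published under."""
    return f"{WHEEL_INDEX}/{commit}/{filename}"


def wheel_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if the wheel answers a HEAD request with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        return False


if __name__ == "__main__":
    commit = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit hash
    url = per_commit_wheel_url(commit)
    state = "available" if wheel_exists(url) else "not published (or layout changed)"
    print(f"{url} -> {state}")
```

Installing such a wheel is then just a matter of pointing pip at that URL (or at the nightly index for nightly wheels).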
CI
Our CI has been growing, and that growth comes with a lot of issues: higher cost, CI taking longer or timing out, flakiness, and difficulty tracking when tests started failing. We need to improve the stability of our CI pipeline, optimize CI time, and clean up tech debt to make it easier for others to contribute!
Latency & cost
Split & shorten long CI jobs (e.g. Entrypoints, Spec decoding, Kernels, etc.); see the sharding sketch below for one way to split a long suite.
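As a concrete illustration of the job-splitting item above, the sketch below shows one generic way to shard a long test suite deterministically across parallel CI agents. It is not vLLM's actual CI configuration, just a sketch of the technique: each agent selects a stable subset of tests by hashing test identifiers.

```python
# Generic sketch of deterministic test sharding (not vLLM's actual CI config):
# hash every test id into one of `num_shards` buckets so parallel agents run
# disjoint, stable subsets of a long suite.
import hashlib


def shard_tests(test_ids: list[str], num_shards: int, shard_index: int) -> list[str]:
    """Return the subset of tests assigned to shard `shard_index` out of `num_shards`."""
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    selected = []
    for test_id in test_ids:
        bucket = int(hashlib.sha256(test_id.encode()).hexdigest(), 16) % num_shards
        if bucket == shard_index:
            selected.append(test_id)
    return selected


if __name__ == "__main__":
    # Hypothetical test ids, purely for illustration.
    tests = [f"tests/entrypoints/test_api.py::test_case_{i}" for i in range(12)]
    for shard in range(3):
        subset = shard_tests(tests, num_shards=3, shard_index=shard)
        print(f"shard {shard}: {len(subset)} tests")
```

Hash-based sharding keeps assignments stable as tests are added or removed, which also helps when tracking down which shard a new failure came from.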
Performance benchmark
In Q2, we plan to revamp the way performance benchmarks are presented. In particular, we are splitting the purpose of the benchmarks into two:
- User facing: versioned benchmarks on a variety of workloads with reproducible commands (one possible record format is sketched below).
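A minimal sketch of what a versioned, reproducible benchmark record could look like is shown below. The field names and the example command are illustrative assumptions, not an existing vLLM schema.

```python
# Illustrative sketch of a versioned benchmark record (not an existing vLLM
# schema): pin the version, commit, workload, and exact command so a result
# can be reproduced and compared across releases.
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BenchmarkRecord:
    vllm_version: str                 # released version or nightly tag
    commit: str                       # exact git commit that was benchmarked
    workload: str                     # e.g. "serving, ShareGPT, 1xH100"
    command: str                      # reproducible invocation
    metrics: dict = field(default_factory=dict)  # e.g. throughput, TTFT, TPOT


if __name__ == "__main__":
    record = BenchmarkRecord(
        vllm_version="0.0.0.dev0",            # placeholder values for illustration
        commit="0123456789abcdef",
        workload="serving, ShareGPT, 1xH100",
        command="python benchmarks/benchmark_serving.py --model <model> ...",
        metrics={"requests_per_s": 0.0, "mean_ttft_ms": 0.0},
    )
    print(json.dumps(asdict(record), indent=2))
```

Keeping one such record per release (and per nightly) is what makes user-facing benchmarks trackable and comparable over time.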
Cutting across both efforts is funding for compute resources. We need the tests to run on multiple nodes of 8xH100 as the main testing target, but we also need H200, GH200, B200, and A100, along with AMD MI300x, Trainium2, and TPU v6e by the end of Q2. On the infrastructure side, we can connect all the machines under one VPC using Tailscale, minimizing the distribution of IPs and keys.
cc @simon-mo @hmellor @DarkLight1337 @ywang96 @yangw-dev @houseroad @Alexei-V-Ivanov-AMD @xuechendi @russellb @youkaichao