# Supported Features
The feature support principle of vLLM Ascend is to stay aligned with vLLM. We are also actively collaborating with the community to accelerate feature support.
Function calling (tool calling): https://docs.vllm.ai/en/latest/features/tool_calling/
Below is the feature support status of vLLM Ascend on the vLLM V1 Engine:
| Feature | Status | Next Step |
|---|---|---|
| Chunked Prefill | 🟢 Functional | Functional, see detailed note: Chunked Prefill |
| Automatic Prefix Caching | 🟢 Functional | Functional, see detailed note: vllm-ascend#732 |
| LoRA | 🔵 Experimental | Functional, see detailed note: LoRA |
| Speculative decoding | 🟢 Functional | Basic support |
| Pooling | 🔵 Experimental | CI needs to cover more models; V1 support relies on vLLM support. |
| Enc-dec | 🟡 Planned | vLLM should support this feature first. |
| Multi Modality | 🟢 Functional | Functional; optimizing and adapting more models |
| LogProbs | 🟢 Functional | CI needed |
| Prompt logProbs | 🟢 Functional | CI needed |
| Async output | 🟢 Functional | CI needed |
| Beam search | 🔵 Experimental | CI needed |
| Guided Decoding | 🟢 Functional | |
| Tensor Parallel | 🟢 Functional | Make TP > 4 work with graph mode. |
| Pipeline Parallel | 🟢 Functional | Write official guide and tutorial. |
| Expert Parallel | 🟢 Functional | Support dynamic EPLB. |
| Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
| Prefill Decode Disaggregation | 🟢 Functional | Functional, xPyD is supported. |
| Quantization | 🟢 Functional | W8A8 available; working on support for more quantization methods (W4A8, etc.) |
| Graph Mode | 🟢 Functional | Functional, see detailed note: Graph Mode |
| Sleep Mode | 🟢 Functional | Functional, see detailed note: Sleep Mode |
| Context Parallel | 🟢 Functional | Functional, see detailed note: Context Parallel |
- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🔵 Experimental: Experimental support; interfaces and functions may change.
- 🚧 WIP: Under active development; will be supported soon.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 No plan/Deprecated: No plan, or deprecated by vLLM.