Supported Features

Supported Features#

The feature support principle of vLLM Ascend is: aligned with the vLLM. We are also actively collaborating with the community to accelerate support.

Functional call: https://docs.vllm.ai/en/latest/features/tool_calling/

You can check the support status of vLLM V1 Engine. Below is the feature support status of vLLM Ascend:

Feature

Status

Next Step

Chunked Prefill

🟢 Functional

Functional, see detailed note: Chunked Prefill

Automatic Prefix Caching

🟢 Functional

Functional, see detailed note: vllm-ascend#732

LoRA

🔵 Experimental

Functional, see detailed note: LoRA

Speculative decoding

🟢 Functional

Basic support

Pooling

🔵 Experimental

CI needed to adapt to more models; V1 support relies on vLLM support.

Enc-dec

🟡 Planned

vLLM should support this feature first.

Multi Modality

🟢 Functional

Multi Modality, optimizing and adapting more models

LogProbs

🟢 Functional

CI needed

Prompt logProbs

🟢 Functional

CI needed

Async output

🟢 Functional

CI needed

Beam search

🔵 Experimental

CI needed

Guided Decoding

🟢 Functional

vllm-ascend#177

Tensor Parallel

🟢 Functional

Make TP >4 work with graph mode.

Pipeline Parallel

🟢 Functional

Write official guide and tutorial.

Expert Parallel

🟢 Functional

Support dynamic EPLB.

Data Parallel

🟢 Functional

Data Parallel support for Qwen3 MoE.

Prefill Decode Disaggregation

🟢 Functional

Functional, xPyD is supported.

Quantization

🟢 Functional

W8A8 available; working on more quantization method support (W4A8, etc)

Graph Mode

🟢 Functional

Functional, see detailed note: Graph Mode

Sleep Mode

🟢 Functional

Functional, see detailed note: Sleep Mode

Context Parallel

🟢 Functional

Functional, see detailed note: Context Parallel

  • 🟢 Functional: Fully operational, with ongoing optimizations.

  • 🔵 Experimental: Experimental support, interfaces and functions may change.

  • 🚧 WIP: Under active development, will be supported soon.

  • 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).

  • 🔴 NO plan/Deprecated: No plan or deprecated by vLLM.