Feature Guide
This section provides detailed usage guides for vLLM Ascend features.
- Graph Mode Guide
- Quantization Guide
- Sleep Mode Guide
- Structured Output Guide
- LoRA Adapters Guide
- Expert Load Balance (EPLB)
- Netloader Guide
- Multi Token Prediction (MTP)
- Dynamic Batch
- Ascend Store Deployment Guide
- External DP
- Distributed DP Server With Large-Scale Expert Parallelism
- UCM-Enhanced Prefix Caching Deployment Guide
- Fine-Grained Tensor Parallelism (Finegrained TP)
- Layer Sharding Linear Guide
- Speculative Decoding Guide
- Context Parallel Guide
- Npugraph_ex
- Weight Prefetch Guide
- Sequence Parallelism
- Batch Invariance