Npugraph_ex
Introduction
As introduced in the RFC, npugraph_ex is a simple ACLGraph graph mode acceleration solution based on FX graphs.
Using npugraph_ex
Npugraph_ex will be enabled by default in the future. The examples below use a Qwen series model to show how to configure it.
Offline example:
from vllm import LLM

model = LLM(
    model="path/to/Qwen2-7B-Instruct",
    additional_config={
        "npugraph_ex_config": {
            "enable": True,
            "enable_static_kernel": False,
        }
    },
)
outputs = model.generate("Hello, how are you?")
Online example:
vllm serve Qwen/Qwen2-7B-Instruct \
    --additional-config '{"npugraph_ex_config":{"enable":true, "enable_static_kernel":false}}'
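The offline and online examples pass the same settings: the Python dict given to additional_config corresponds to the JSON string given to --additional-config. As a minimal sketch using only the Python standard library, the command line can be generated from the dict so the two stay in sync. The helper name build_serve_command is hypothetical and is not part of vLLM.

```python
import json
import shlex

# Same settings as the offline example.
additional_config = {
    "npugraph_ex_config": {
        "enable": True,
        "enable_static_kernel": False,
    }
}


def build_serve_command(model: str, config: dict) -> str:
    """Hypothetical helper: render a `vllm serve` command line for `config`."""
    # json.dumps converts Python True/False into JSON true/false.
    config_json = json.dumps(config, separators=(",", ":"))
    # shlex.quote single-quotes the JSON so the shell passes it through intact.
    return (
        f"vllm serve {shlex.quote(model)} "
        f"--additional-config {shlex.quote(config_json)}"
    )


command = build_serve_command("Qwen/Qwen2-7B-Instruct", additional_config)
print(command)
```

Generating the string this way avoids hand-written JSON mistakes such as Python-style True or single-quoted keys, which a JSON parser would reject.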
You can find more details about npugraph_ex here