PaddleOCR-VL

Introduction

PaddleOCR-VL is a state-of-the-art, resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition.

This document walks through the complete deployment and verification workflow for the model, covering supported features, environment preparation, single-node deployment, and functional verification, so that users can get the model running quickly.

Supported Features

Refer to supported features for the model's feature matrix.

Refer to the feature guide for each feature's configuration.

Environment Preparation

Model Weights

It is recommended to download the model weights to a local directory (e.g., ./PaddleOCR-VL) so they can be loaded quickly during deployment.
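For example, assuming network access, the weights can be fetched with the Hugging Face CLI (the ModelScope CLI's `modelscope download` works similarly):

```shell
# Download the PaddleOCR-VL weights into ./PaddleOCR-VL
pip install -U huggingface_hub
huggingface-cli download PaddlePaddle/PaddleOCR-VL --local-dir ./PaddleOCR-VL
```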

Installation

You can run PaddleOCR-VL directly with our official Docker image.

Select an image based on your machine type and start a container on your node; refer to using docker for details.

export IMAGE=quay.io/ascend/vllm-ascend:v0.13.0rc1
docker run --rm \
    --name vllm-ascend \
    --shm-size=1g \
    --net=host \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -it $IMAGE bash

Note

The 310P device is supported from version 0.15.0rc1. You need to select the corresponding image for installation.

Deployment

Single-node Deployment

Single NPU (PaddleOCR-VL)

PaddleOCR-VL supports single-node, single-card deployment on the 910B4 and 310P platforms. Follow these steps to start the inference service:

  1. Prepare model weights: Ensure the downloaded model weights are stored in the PaddleOCR-VL directory.

  2. Create and execute the deployment script (save as deploy.sh):

Run the following script to start the vLLM server on single 910B4:

#!/bin/sh
export VLLM_USE_MODELSCOPE=true
export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"
export TASK_QUEUE_ENABLE=2                                # Ascend task-queue optimization level
export CPU_AFFINITY_CONF=1                                # enable CPU core binding
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"  # reduce NPU memory fragmentation

vllm serve ${MODEL_PATH} \
          --max-num-batched-tokens 16384 \
          --served-model-name PaddleOCR-VL-0.9B \
          --trust-remote-code \
          --no-enable-prefix-caching \
          --mm-processor-cache-gb 0 \
          --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
          --additional_config '{"enable_cpu_binding":true}' \
          --port 8000

Run the following script to start the vLLM server on single 310P:

#!/bin/sh
export VLLM_USE_MODELSCOPE=true
export MODEL_PATH="PaddlePaddle/PaddleOCR-VL"

vllm serve ${MODEL_PATH} \
          --max_model_len 16384 \
          --served-model-name PaddleOCR-VL-0.9B \
          --trust-remote-code \
          --no-enable-prefix-caching \
          --mm-processor-cache-gb 0 \
          --enforce-eager \
          --dtype float16 \
          --port 8000

Note

The --max_model_len option is added to prevent errors when generating the attention operator mask on the 310P device.

Multiple NPU (PaddleOCR-VL)

Single-node deployment is recommended.

Prefill-Decode Disaggregation

Not supported yet.

Functional Verification

If your service starts successfully, you should see output similar to the following:

INFO:     Started server process [87471]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
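Before running a full client, you can sanity-check that the endpoint is reachable (assuming the default port 8000 configured above):

```shell
# List the models served by the running vLLM instance
curl -s http://localhost:8000/v1/models
```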

Once your server is started, you can use the OpenAI API client to make queries.

from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    timeout=3600
)

# Task-specific base prompts
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                }
            },
            {
                "type": "text",
                "text": TASKS["ocr"]
            }
        ]
    }
]

response = client.chat.completions.create(
    model="PaddleOCR-VL-0.9B",
    messages=messages,
    temperature=0.0,
)
print(f"Generated text: {response.choices[0].message.content}")

If the query succeeds, the client prints output similar to the following:

Generated text: CINNAMON SUGAR
1 x 17,000
17,000
SUB TOTAL
17,000
GRAND TOTAL
17,000
CASH IDR
20,000
CHANGE DUE
3,000
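If the image is a local file rather than a public URL, it can be embedded as a base64 data URL in the same `image_url` field. A minimal sketch (the helper name `to_data_url` is ours):

```python
import base64
from pathlib import Path

def to_data_url(path: str, mime: str = "image/png") -> str:
    """Encode a local image file as a data URL usable in the image_url field."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Example: pass the result instead of an http(s) URL
# {"type": "image_url", "image_url": {"url": to_data_url("receipt.png")}}
```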

Offline Inference with vLLM and PP-DocLayoutV2

The example above showed how to serve the PaddleOCR-VL-0.9B model with vLLM. In practice, it is usually combined with the PP-DocLayoutV2 layout-detection model to fully exploit PaddleOCR-VL's capabilities, matching the pipeline described in the official PaddlePaddle documentation.

Note

Use separate virtual environments for vLLM and PP-DocLayoutV2 to prevent dependency conflicts.

The 910B4 device supports inference using the PaddlePaddle framework.

  1. Pull the PaddlePaddle-compatible CANN image

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84

Start the container using the following command:

docker run -it --name paddle-npu-dev -v $(pwd):/work \
    --privileged --network=host --shm-size=128G -w=/work \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-$(uname -m)-gcc84 /bin/bash
  2. Install PaddlePaddle and PaddleOCR

python -m pip install paddlepaddle==3.2.0
wget https://paddle-whl.bj.bcebos.com/stable/npu/paddle-custom-npu/paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
pip install paddle_custom_npu-3.2.0-cp310-cp310-linux_aarch64.whl
python -m pip install -U "paddleocr[doc-parser]"
pip install safetensors
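After installation, you can confirm that PaddlePaddle detects the NPU using its built-in self-test, `paddle.utils.run_check()` (requires an Ascend device to be present):

```shell
python -c "import paddle; paddle.utils.run_check()"
```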

Note

OpenCV's system libraries may be missing in the container; install them with:

apt-get update
apt-get install -y libgl1 libglib2.0-0

CANN-8.0.0 does not support some versions of NumPy and OpenCV. It is recommended to install the specified versions.

python -m pip install numpy==1.26.4
python -m pip install opencv-python==3.4.18.65

The 310P device supports only OM model inference. For details on the process, see the guide provided in ModelZoo.

Using vLLM as the backend with PP-DocLayoutV2 for offline inference

from paddleocr import PaddleOCRVL

doclayout_model_path = "/path/to/your/PP-DocLayoutV2/"

pipeline = PaddleOCRVL(
    vl_rec_backend="vllm-server",
    vl_rec_server_url="http://localhost:8000/v1",
    layout_detection_model_name="PP-DocLayoutV2",
    layout_detection_model_dir=doclayout_model_path,
    device="npu",
)

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")

for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")
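`predict` yields one result per page, so multi-page inputs produce several markdown files from the loop above. A minimal sketch, assuming the `output_{i}.md` naming used there, to stitch them into a single document (the helper `merge_markdown` is ours):

```python
from pathlib import Path

def merge_markdown(num_pages: int, out_path: str = "document.md") -> str:
    """Concatenate per-page markdown files (output_0.md, output_1.md, ...) in order."""
    parts = [Path(f"output_{i}.md").read_text(encoding="utf-8") for i in range(num_pages)]
    merged = "\n\n".join(parts)
    Path(out_path).write_text(merged, encoding="utf-8")
    return merged
```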