Containerized vLLM DeepSeek Deployment: From Hardware Build to Delivery

1. Hardware List

1. CPU: Intel Core i7-14700KF

2. GPU: RTX 4090 × 2

3. RAM: DDR5 32 GB × 4

4. Motherboard: ASUS Z790

5. Storage: 2 TB SSD (system) + 4 TB enterprise drive (data)

2. BIOS Configuration Adjustments

2.1 Verify PCIe Link Status

lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation AD102 [GeForce RTX 4090] (rev a1)
07:00.0 VGA compatible controller: NVIDIA Corporation AD102 [GeForce RTX 4090] (rev a1)

lspci -vvv -s 01:00.0 | grep LnkSta
lspci -vvv -s 07:00.0 | grep LnkSta

LnkSta: Speed 2.5GT/s (downgraded), Width x16
LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)

This confirms the lanes are currently split x16 / x4; we will change the slot split to x8 / x8 in the BIOS shortly. (The 2.5 GT/s speed shown at idle is a normal power-saving downgrade; the link clocks back up under load. The width is the value the BIOS lane split controls.)
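If you only need the negotiated values, the LnkSta line can be parsed directly. A minimal sketch: the sample line is copied from the output above, and the `sed` pattern is this author's own parsing assumption, not any tool's output contract. On a live system, feed it `lspci -vvv -s 01:00.0 | grep LnkSta` instead.

```shell
# Pull the negotiated PCIe speed and width out of an lspci LnkSta line.
lnksta='LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)'
echo "$lnksta" | sed -E 's/.*Speed ([^ ,]+).*Width (x[0-9]+).*/speed=\1 width=\2/'
# -> speed=2.5GT/s width=x4
```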

2.2 Verify Above 4G Decoding Is Enabled

cat /proc/cmdline | grep -iE "pci=assign-busses|enable_4g_decoding"
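As a cross-check, the BAR addresses reported by `lspci -v -s 01:00.0 | grep -i 'Memory at'` tell you whether the GPU's large prefetchable BAR was actually mapped above the 4 GiB boundary, which is what Above 4G Decoding enables. A small sketch: the helper name and sample addresses are illustrative, not taken from this machine.

```shell
# above_4g: report whether a hex physical address (no 0x prefix) sits at
# or above the 4 GiB boundary, i.e. whether Above 4G Decoding took effect.
above_4g() {
  dec=$(printf '%d' "0x$1")
  [ "$dec" -ge 4294967296 ] && echo yes || echo no
}
above_4g 6000000000   # -> yes (a typical 64-bit BAR base)
above_4g e0000000     # -> no  (a legacy below-4G BAR base)
```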

2.3 Power settings: disable ASPM (PCIe link power saving) and CEP (Current Excursion Protection), and lift the power limits.
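Whether the ASPM change stuck can be checked from Linux: `/sys/module/pcie_aspm/parameters/policy` lists all policies with the active one in square brackets. A parsing sketch, using a stand-in sample line rather than a live read of that file:

```shell
# The kernel marks the active ASPM policy in square brackets, e.g.:
#   default [performance] powersave powersupersave
# On a live system read /sys/module/pcie_aspm/parameters/policy instead.
policy_line='default [performance] powersave powersupersave'
echo "$policy_line" | sed -E 's/.*\[([a-z]+)\].*/\1/'
# -> performance
```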

3. Driver Installation

apt update

ubuntu-drivers autoinstall

reboot

nvidia-smi
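The compose file in the next section hands both GPUs to the container, which requires Docker plus the NVIDIA Container Toolkit on the host. A sketch of the toolkit setup on Ubuntu, assuming Docker itself is already installed; check NVIDIA's current install guide, as the repository setup may change:

```shell
# Add NVIDIA's apt repository for the container toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit and register the NVIDIA runtime with Docker
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

A quick verification is running `nvidia-smi` inside a CUDA base image, e.g. `docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi`.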

4. Software Installation

cat docker-compose.yml

services:
  vllm:
    image: vllm/vllm-openai:v0.8.1
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ports:
      - "172.17.0.1:8001:8000"
    volumes:
      - /home/kairui/models:/models
      - /dev/shm:/dev/shm
    logging:
      driver: "json-file"
      options:
        max-size: "1g"
        max-file: "10"
    environment:
      - HF_HOME=/models
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
      - CUDA_VISIBLE_DEVICES=0,1
      - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
    command: [
      "--model", "/models/DeepSeek-R1-Distill-Qwen-14B",
      "--served-model-name", "deepseek-r1",
      "--tensor-parallel-size", "2",
      "--gpu-memory-utilization", "0.85",
      "--dtype", "float16",
      "--max-model-len", "8192",
      "--max-num-seqs", "64",
      "--api-key", "5XxBnYwkSAnlmhUVXzuYlBtG8XOfBF9K"
    ]
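Once `docker compose up -d` has brought the container up (the first start takes a while as the model loads into both GPUs), the OpenAI-compatible endpoint can be smoke-tested with curl. This reuses the bind address, port, served model name, and API key from the compose file above; the prompt text is arbitrary.

```shell
curl -s http://172.17.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 5XxBnYwkSAnlmhUVXzuYlBtG8XOfBF9K" \
  -d '{
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "hello"}],
        "max_tokens": 32
      }'
```

A JSON chat-completion response confirms the server, tensor parallelism across the two 4090s, and the API key are all working end to end.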
