CVE-2026-44223

vLLM is an inference and serving engine for large language models (LLMs). In affected versions before 0.20.0, the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, which raises a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty); a single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
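Where an immediate upgrade to 0.20.0 is not possible, one stopgap is to detect or drop the penalty parameters before a request reaches the engine. The following is a minimal sketch of such a pre-filter, not taken from the vendor advisory. The helper names (active_penalties, strip_penalties) are hypothetical, the payload is assumed to be a JSON-style dict that carries the penalty parameters at the top level, and the neutral values (1.0 for repetition_penalty, 0.0 for the other two) are assumed to be the settings at which a penalty has no effect.

```python
# Hypothetical gateway-side pre-filter; not part of vLLM itself.
# Assumed neutral values at which each penalty has no effect.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def active_penalties(payload: dict) -> list:
    """Return the names of penalty parameters set to a non-neutral value."""
    found = []
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = payload.get(name)
        if value is not None and value != neutral:
            found.append(name)
    return found


def strip_penalties(payload: dict) -> dict:
    """Return a copy of the payload with all penalty parameters removed."""
    return {k: v for k, v in payload.items() if k not in NEUTRAL_PENALTIES}


request = {"prompt": "Hello", "max_tokens": 16, "repetition_penalty": 1.1}
if active_penalties(request):
    # Either reject the request here or forward the sanitized copy.
    request = strip_penalties(request)
```

Stripping the parameters changes sampling behavior for the affected requests, so rejecting them with a clear error may be preferable in some deployments. Either way, this only reduces exposure; the fix itself is in 0.20.0.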
Configurations

Configuration 1

cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*

History

15 May 2026, 15:16

Type | Values Removed | Values Added
References | () https://github.com/vllm-project/vllm/pull/38610 - Issue Tracking, Patch | () https://github.com/vllm-project/vllm/pull/38610 - Issue Tracking, Patch
References | () https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw - Mitigation, Vendor Advisory | () https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw - Mitigation, Vendor Advisory

14 May 2026, 15:37

Type | Values Removed | Values Added
References | () https://github.com/vllm-project/vllm/pull/38610 - | () https://github.com/vllm-project/vllm/pull/38610 - Issue Tracking, Patch
References | () https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw - | () https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw - Mitigation, Vendor Advisory
First Time | | Vllm vllm; Vllm
CPE | | cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*

12 May 2026, 20:16

Type | Values Removed | Values Added
New CVE | |

Information

Published : 2026-05-12 20:16

Updated : 2026-05-15 15:16


NVD link : CVE-2026-44223

Mitre link : CVE-2026-44223

CVE.ORG link : CVE-2026-44223


Products Affected

vllm

  • vllm

CWE

CWE-131

Incorrect Calculation of Buffer Size

CWE-704

Incorrect Type Conversion or Cast