CVE-2026-44223

vLLM is an inference and serving engine for large language models (LLMs). In versions 0.18.0 up to but not including 0.20.0, the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, and the resulting RuntimeError crashes the EngineCore process. The crash is triggered when any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty); a single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
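The affected range and trigger condition above can be screened for mechanically. The sketch below is a hypothetical helper, not part of vLLM: `is_affected` compares a version string against the stated range (0.18.0 <= version < 0.20.0), and `requests_penalty` flags a request body that sets one of the three named penalty parameters. It assumes the neutral values 1.0 / 0.0 / 0.0 leave sampling unpenalized; whether an explicitly neutral value can still trigger the crash is not stated in the advisory, so a stricter screen would flag any presence of the keys.

```python
import re

# Affected range stated in the advisory: 0.18.0 <= version < 0.20.0.
AFFECTED_MIN = (0, 18, 0)
FIXED_IN = (0, 20, 0)

# Penalty parameters named in the advisory, mapped to the values assumed
# (not confirmed by the advisory) to mean "no penalty applied".
PENALTY_NEUTRAL = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def release_tuple(version: str) -> tuple[int, int, int]:
    """Parse the leading MAJOR.MINOR[.PATCH] of a version string.

    Simplification: pre-release/dev suffixes such as 'rc1' are ignored,
    so '0.20.0rc1' is treated as 0.20.0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """True if the version falls in the advisory's affected range."""
    return AFFECTED_MIN <= release_tuple(version) < FIXED_IN


def requests_penalty(body: dict) -> bool:
    """True if the request body sets any penalty parameter to a non-neutral value."""
    return any(
        key in body and body[key] != neutral
        for key, neutral in PENALTY_NEUTRAL.items()
    )


print(is_affected("0.19.1"), is_affected("0.20.0"))
print(requests_penalty({"prompt": "hi", "repetition_penalty": 1.1}))
```

Upgrading to 0.20.0 is the fix named in the advisory; a request screen like `requests_penalty` is at most a stopgap for servers that cannot upgrade immediately.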

CVSS v3 6.5 MEDIUM


Vector : AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Exploitability : 2.8 / Impact : 3.6

Attack Vector NETWORK

Attack Complexity LOW

Privileges Required LOW

User Interaction NONE

Confidentiality Impact NONE

Integrity Impact NONE

Availability Impact HIGH

Scope UNCHANGED
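As a cross-check of the figures above, the sketch below recomputes the base score from the listed metrics using the CVSS v3.1 base-score equations published by FIRST. The page says only "CVSS v3", so using the v3.1 equations (in particular its Roundup function) is an assumption; for this vector, v3.0's simpler round-up gives the same 6.5. Function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (FIRST specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """Split 'AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    return dict(part.split(":") for part in vector.split("/"))


def base_score(vector: str) -> tuple[float, float, float]:
    """Return (base score, exploitability, impact) for a v3.1 base vector."""
    m = parse_vector(vector)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    # Impact sub-score (ISS) combines the three C/I/A impacts.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0, exploitability, impact
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10)), exploitability, impact


def severity(score: float) -> str:
    """Map a base score to its qualitative CVSS v3 severity rating."""
    if score == 0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"


score, exploitability, impact = base_score("AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")
print(score, severity(score), round(exploitability, 1), round(impact, 1))
# prints: 6.5 MEDIUM 2.8 3.6
```

With Availability as the only non-None impact metric, ISS = 0.56, so Impact = 6.42 × 0.56 ≈ 3.6 and Exploitability = 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.8; their sum (≈ 6.43) rounds up to 6.5, which falls in the MEDIUM band (4.0 to 6.9).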

References

Link	Resource
https://github.com/vllm-project/vllm/pull/38610	Issue Tracking, Patch
https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw	Mitigation, Vendor Advisory

Configurations

Configuration 1

cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*
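The configuration entry is a CPE 2.3 formatted string. The sketch below splits it into its eleven named attributes in the order defined by the CPE 2.3 Naming Specification (NISTIR 7695). It is a simplification that does not handle backslash-escaped colons inside attribute values, which this string does not contain. Note that the version attribute is the wildcard `*`; the page shows no version bounds, so the affected range (0.18.0 to before 0.20.0) appears here only in the description.

```python
# CPE 2.3 formatted-string attribute order (NISTIR 7695).
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict[str, str]:
    """Split a CPE 2.3 formatted string into named attributes.

    Simplification: escaped colons ('\\:') inside values are not handled.
    """
    parts = cpe.split(":")
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 2 + len(CPE23_FIELDS):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, parts[2:]))


attrs = parse_cpe23("cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*")
print(attrs["part"], attrs["vendor"], attrs["product"], attrs["version"])
# prints: a vllm vllm *
```

Here `part` = `a` marks an application, and vendor and product are both `vllm`.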

History

22 Jun 2026, 22:16

Type	Values Removed	Values Added
Summary	(en) vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.	(en) vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

15 May 2026, 15:16

Type	Values Removed	Values Added
References	~~() https://github.com/vllm-project/vllm/pull/38610 - Issue Tracking, Patch~~	() https://github.com/vllm-project/vllm/pull/38610 - Issue Tracking, Patch
References	~~() https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw - Mitigation, Vendor Advisory~~	() https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw - Mitigation, Vendor Advisory

14 May 2026, 15:37

Type	Values Removed	Values Added
References	~~() https://github.com/vllm-project/vllm/pull/38610 -~~	() https://github.com/vllm-project/vllm/pull/38610 - Issue Tracking, Patch
References	~~() https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw -~~	() https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw - Mitigation, Vendor Advisory
First Time		Vllm vllm, Vllm
CPE		cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*

12 May 2026, 20:16

Type	Values Removed	Values Added
New CVE

Information

Published : 2026-05-12 20:16

Updated : 2026-06-22 22:16

NVD link : CVE-2026-44223

Mitre link : CVE-2026-44223

CVE.ORG link : CVE-2026-44223


Products Affected

vllm (vendor)

vllm (product)

CWE

CWE-131

Incorrect Calculation of Buffer Size

CWE-704

Incorrect Type Conversion or Cast
