CVE-2026-33626: Critical SSRF Vulnerability in LMDeploy Rapidly Exploited in the Wild — Technical Analysis and Mitigation Guide

Executive Summary

The CVE-2026-33626 vulnerability represents a critical Server-Side Request Forgery (SSRF) flaw in LMDeploy, an open-source toolkit widely used for compressing, deploying, and serving large language models (LLMs). This vulnerability, present in all versions of LMDeploy up to and including 0.12.0 with vision-language support, was exploited in the wild within 13 hours of public disclosure. Attackers leveraged the flaw to access sensitive cloud metadata, probe internal services, and conduct internal network reconnaissance, posing a significant risk to organizations deploying LLM inference infrastructure. This advisory provides a comprehensive technical analysis of the vulnerability, details on exploitation tactics, victimology, and actionable mitigation strategies to safeguard your environments.

Threat Actor Profile

The exploitation of CVE-2026-33626 was opportunistic and rapid, with threat actors acting within hours of the public advisory. Analysis of observed attacks indicates the actors were technically adept, capable of weaponizing the vulnerability directly from advisory details without relying on a public proof-of-concept. The primary source IP associated with initial exploitation was 103.116.72.119, registered to Prime Security Corp. in Kowloon Bay, Hong Kong. The attackers utilized callback infrastructure such as cw2mhnbd.requestrepo.com for out-of-band (OOB) exfiltration and SSRF confirmation. No attribution to a specific advanced persistent threat (APT) group has been established as of this report; the tactics align with financially motivated or cloud-focused opportunistic actors.

Technical Analysis of Malware/TTPs

The core of CVE-2026-33626 lies in the load_image() function within lmdeploy/vl/utils.py. This function processes the image_url field in chat completion requests, fetching arbitrary URLs without validating whether the destination is an internal or private IP address. As a result, attackers can craft requests that force the LMDeploy server to initiate HTTP requests to sensitive internal resources.

A typical exploit payload involves submitting a chat completion request with an image_url pointing to a cloud metadata endpoint, such as http://169.254.169.254/latest/meta-data/iam/security-credentials/ (the AWS Instance Metadata Service). This enables attackers to retrieve cloud credentials if the inference server is running in a cloud environment. The SSRF can also be used to access internal services like Redis (127.0.0.1:6379), MySQL (127.0.0.1:3306), or administrative HTTP interfaces (127.0.0.1:8080), and to perform internal port scans.

Attackers further leveraged OOB DNS/HTTP callbacks to domains like cw2mhnbd.requestrepo.com to confirm SSRF and exfiltrate data. The exploitation chain included enumeration of API endpoints (such as /openapi.json), probing of distributed inference cluster endpoints (/distserve/p2p_drop_connect), and attempts to disrupt cluster operations.

The MITRE ATT&CK techniques observed include T1190 (Exploit Public-Facing Application), T1213 (Data from Information Repositories), T1046 (Network Service Scanning), T1589 (Gather Victim Identity Information), and T1071.004 (Application Layer Protocol: DNS).

Exploitation in the Wild

The exploitation timeline was exceptionally rapid. The vendor advisory was published on GitHub on April 21, 2026, at approximately 15:00 UTC. The first exploitation attempt was detected by Sysdig at 03:35 UTC on April 22, just 12 hours and 31 minutes after disclosure. Despite the absence of a public proof-of-concept, attackers were able to construct working exploits from the advisory and source code diff.

Observed exploitation phases included:

Attackers first targeted the AWS metadata endpoint to extract IAM credentials, then probed local services such as Redis and MySQL on loopback addresses. They used OOB DNS callbacks to confirm SSRF and enumerate the API surface, including /openapi.json. Further, they attempted to disrupt distributed inference clusters by POSTing to /distserve/p2p_drop_connect and performed port sweeps on localhost.

The primary source IP for these attacks was 103.116.72.119, and the callback domain used for OOB exfiltration was cw2mhnbd.requestrepo.com. The attacks were automated and systematic, indicating a high level of technical proficiency and a clear understanding of cloud infrastructure attack surfaces.

Victimology and Targeting

The primary targets of this exploitation campaign were organizations deploying LMDeploy inference servers with vision-language support, particularly those exposed to the internet or accessible from untrusted networks. Cloud-hosted inference nodes were at heightened risk due to the potential for cloud metadata exfiltration, which could lead to full cloud account compromise if IAM credentials were obtained.

Victims included research institutions, AI startups, and enterprises experimenting with or deploying LLM inference infrastructure. The attack surface was amplified in environments where inference servers had broad outbound network access or where internal services (such as Redis or MySQL) were exposed without authentication.

The exploitation did not appear to be targeted at specific industries or geographies but rather opportunistically focused on any accessible LMDeploy instance. The rapid weaponization and scanning for vulnerable endpoints suggest the use of automated reconnaissance tools and scripts.

Mitigation and Countermeasures

Immediate mitigation requires upgrading LMDeploy to version 0.12.3 or later, which introduces a _is_safe_url() check to prevent SSRF by validating destination URLs. Organizations should audit all inference servers for the presence of vulnerable versions and apply the patch without delay.

Enforce the use of IMDSv2 on all cloud inference nodes by setting httpTokens=required, which mitigates the risk of metadata service exploitation via SSRF. Restrict outbound egress from inference servers at the VPC or security group level to prevent unauthorized access to internal or cloud metadata endpoints.

Rotate all IAM credentials that may have been exposed via SSRF, and audit cloud logs for anomalous access patterns. Internal services such as Redis, MySQL, and administrative interfaces should be bound to private interfaces only and require strong authentication.

Implement runtime detection by logging and alerting on outbound requests to link-local (169.254.0.0/16), loopback (127.0.0.0/8), RFC1918, and well-known service ports. Monitor for outbound connections to cloud metadata endpoints from inference processes. Utilize Falco or similar runtime security tools with rules for detecting metadata service access from containers or hosts.

Finally, monitor for abnormal outbound connections from inference processes, and review all API endpoints for potential abuse vectors.

References

About Rescana

Rescana is a leader in third-party risk management (TPRM), providing organizations with a comprehensive platform to continuously monitor, assess, and mitigate cyber risks across their digital supply chain. Our advanced analytics and threat intelligence empower security teams to proactively identify vulnerabilities, respond to emerging threats, and ensure compliance with industry standards. For questions or further assistance, we are happy to help at ops@rescana.com.