Critical Apache Tika CVE-2025-66516: XXE Vulnerability Exposes Over 500 Instances After Incomplete Patch

  • Rescana
  • 2 days ago
  • 5 min read

Executive Summary

A critical XML External Entity (XXE) injection vulnerability, CVE-2025-66516, has been identified in Apache Tika, carrying the maximum CVSS score of 10.0. It stems from a patch miss: the initial remediation did not address the root cause in all relevant modules, leaving many deployments exposed even after partial upgrades. Attackers can exploit the flaw by submitting specially crafted PDF files containing malicious XML Forms Architecture (XFA) content, which lets them read sensitive files, perform server-side request forgery (SSRF), and, under certain configurations, achieve remote code execution (RCE). There is clear evidence of active scanning and exploitation attempts in the wild, with over 500 vulnerable Apache Tika instances found exposed on the internet. Public proof-of-concept (PoC) exploits are available, and similar vulnerabilities have been weaponized quickly in the past, which makes the issue urgent. Organizations using Apache Tika should immediately upgrade all affected modules, verify dependency integrity, and put monitoring and network restrictions in place to reduce risk.

Technical Information

CVE-2025-66516 is a critical XXE vulnerability affecting multiple modules of Apache Tika. The vulnerability arises from improper restriction of XML external entity references (CWE-611) during the parsing of PDF files containing embedded XFA forms. When a vulnerable Apache Tika instance processes such a PDF, it inadvertently allows the inclusion and resolution of external entities, which can be leveraged by attackers to access arbitrary files on the server, initiate SSRF attacks, and, in certain misconfigured environments, execute arbitrary code.

The vulnerability affects the following modules and versions: org.apache.tika:tika-core from version 1.13 up to and including 3.2.1 (fixed in 3.2.2); org.apache.tika:tika-parser-pdf-module from version 2.0.0 up to and including 3.2.1 (fixed in 3.2.2); and org.apache.tika:tika-parsers from version 1.13 up to and including 1.28.5 (fixed in 2.0.0). The original fix, tracked as CVE-2025-54988, addressed only tika-parser-pdf-module. It neglected the underlying issue in tika-core and did not cover the 1.x release line, so a significant number of deployments remained vulnerable even after partial upgrades.

The attack vector is remote and can be triggered by submitting a malicious PDF to any service or endpoint that uses Apache Tika for document parsing. The attacker embeds an XFA form within the PDF, referencing external entities such as local files (e.g., /etc/passwd on Unix systems or C:\Windows\win.ini on Windows) or internal network resources. When Apache Tika parses the file, it processes the external entity references, resulting in the disclosure of sensitive information or enabling further lateral movement within the network.
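To make the idea of an "external entity reference" concrete, the following is a minimal Python sketch that flags XML text declaring a SYSTEM or PUBLIC entity. The function name, the regular expression, and the sample documents are illustrative assumptions rather than part of Apache Tika or any published detection rule, and the element names in the sample are placeholders, not real XFA schema names.

```python
import re

# Matches <!ENTITY name SYSTEM "..."> and <!ENTITY name PUBLIC "..." "...">,
# including parameter entities (<!ENTITY % name SYSTEM "...">).
EXTERNAL_ENTITY = re.compile(
    r"<!ENTITY\s+(?:%\s+)?[^\s>]+\s+(?:SYSTEM|PUBLIC)\b"
)


def declares_external_entity(xml_text: str) -> bool:
    """Return True if the XML text declares an external (SYSTEM/PUBLIC) entity."""
    return EXTERNAL_ENTITY.search(xml_text) is not None


# Illustrative documents only; the element names are placeholders.
SUSPICIOUS = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE form [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>\n'
    "<form><field>&xxe;</field></form>"
)
BENIGN = '<?xml version="1.0"?><form><field>hello</field></form>'

if __name__ == "__main__":
    print(declares_external_entity(SUSPICIOUS))
    print(declares_external_entity(BENIGN))
```

A pattern match like this is only a triage aid. It does not replace configuring the XML parser to refuse DOCTYPE declarations and external entities, which is the kind of restriction that CWE-611 describes as missing.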

The technical impact of this vulnerability is severe. Successful exploitation can lead to the exfiltration of sensitive files, exposure of internal network resources via SSRF, and, in rare cases where the environment is misconfigured to allow it, remote code execution. The presence of public PoCs and the simplicity of the attack methodology significantly lower the barrier to exploitation.

Exploitation in the Wild

Since the public disclosure of CVE-2025-66516, there has been a marked increase in scanning activity targeting Apache Tika endpoints. Security researchers and threat intelligence platforms, including SOC Prime and CyberPress, have reported mass scanning campaigns, with over 500 vulnerable Apache Tika instances identified as exposed to the internet. These scans are designed to locate endpoints running susceptible versions of Apache Tika and to test for the presence of the XXE vulnerability by submitting crafted PDF files.

Publicly available PoCs have been released on platforms such as GitHub and various security forums. These PoCs demonstrate the feasibility of exploiting the vulnerability to read arbitrary files and perform SSRF. The attack typically involves uploading a PDF with a malicious XFA form that references a sensitive file or an internal URL. Upon parsing, the vulnerable Apache Tika instance processes the external entity, resulting in the exfiltration of the targeted resource.

While there have not yet been reports of large-scale breaches directly attributed to this vulnerability, the rapid weaponization of previous Apache vulnerabilities, such as those affecting Tomcat, suggests that both opportunistic attackers and advanced persistent threat (APT) groups are likely to add this exploit to their toolkits soon. The vulnerability maps to several MITRE ATT&CK techniques, including T1190 (Exploit Public-Facing Application), T1048 (Exfiltration Over Alternative Protocol), and T1213 (Data from Information Repositories), which underscores its potential utility in a wide range of attack scenarios.

APT Groups using this vulnerability

As of the time of this advisory, there is no direct attribution of CVE-2025-66516 exploitation to specific APT groups. However, the characteristics of the vulnerability (remote exploitability, high impact, and the presence of public PoCs) make it highly attractive to both state-sponsored and financially motivated threat actors. Historical analysis of similar vulnerabilities in Apache products indicates that APT groups are quick to adopt newly disclosed exploits, particularly those that enable initial access or facilitate lateral movement within target environments.

The exposure of over 500 vulnerable Apache Tika instances across diverse sectors and geographies, as reported by CyberPress and Censys, suggests a broad attack surface that is likely to attract the attention of sophisticated adversaries. Organizations in critical infrastructure, finance, healthcare, and government should be especially vigilant, as these sectors have historically been targeted by APT groups leveraging similar vulnerabilities.

Affected Product Versions

The following Apache Tika modules and versions are confirmed to be affected by CVE-2025-66516:

  • org.apache.tika:tika-core: vulnerable from version 1.13 through 3.2.1; resolved in version 3.2.2.
  • org.apache.tika:tika-parser-pdf-module: affected from version 2.0.0 through 3.2.1; fixed in version 3.2.2.
  • org.apache.tika:tika-parsers: impacted from version 1.13 through 1.28.5; remediation available in version 2.0.0.

Note that upgrading only the parser module without updating the core module does not fully mitigate the vulnerability, because of the patch miss in the original remediation effort.
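The ranges above can be expressed as a small lookup, which is useful when checking a dependency report by script. The following is a minimal sketch in Python; the function names and the simplified version parsing (qualifiers such as -SNAPSHOT are dropped) are assumptions made for illustration, and the ranges are copied from this advisory.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]

# Inclusive vulnerable ranges as stated in this advisory: (first, last).
VULNERABLE_RANGES: Dict[str, Tuple[Version, Version]] = {
    "tika-core": ((1, 13), (3, 2, 1)),
    "tika-parser-pdf-module": ((2, 0, 0), (3, 2, 1)),
    "tika-parsers": ((1, 13), (1, 28, 5)),
}


def parse_version(text: str) -> Version:
    """Turn '3.2.1' into (3, 2, 1); qualifiers such as '-SNAPSHOT' are dropped."""
    numeric = text.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def is_vulnerable(artifact: str, version: str) -> bool:
    """Return True if the artifact/version pair falls inside a listed range."""
    if artifact not in VULNERABLE_RANGES:
        return False
    first, last = VULNERABLE_RANGES[artifact]
    return first <= parse_version(version) <= last


if __name__ == "__main__":
    # A partial upgrade: the PDF module was bumped, tika-core was not.
    deployment = {"tika-core": "3.2.1", "tika-parser-pdf-module": "3.2.2"}
    for name, ver in deployment.items():
        print(name, ver, "VULNERABLE" if is_vulnerable(name, ver) else "ok")
```

In the sample deployment, the PDF module passes the check while tika-core is still flagged, which is exactly the partial-upgrade case described above.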

Organizations should conduct a comprehensive inventory of their Apache Tika deployments, including all direct and transitive dependencies, to ensure that no vulnerable versions remain in use. This includes checking for embedded or bundled versions of Apache Tika within larger applications or third-party products.
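As a starting point for such an inventory, the sketch below walks a directory tree and lists jar files whose names look like Apache Tika artifacts. It assumes Maven-style file names (for example tika-core-3.2.1.jar); the function name and the pattern are illustrative. It will not see copies of Tika that are shaded into a fat jar or nested inside a WAR or container image, so those still need a separate check, for instance through the build tool's dependency report.

```python
import os
import re
from typing import Iterator, Tuple

# Assumes Maven-style file names such as tika-core-3.2.1.jar or
# tika-parser-pdf-module-3.2.2.jar (artifact, dash, version, ".jar").
TIKA_JAR = re.compile(
    r"^(tika-[a-z0-9-]+?)-(\d+(?:\.\d+)*(?:-[A-Za-z0-9.]+)?)\.jar$"
)


def find_tika_jars(root: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (artifact, version, path) for every Tika-looking jar under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = TIKA_JAR.match(name)
            if match:
                yield match.group(1), match.group(2), os.path.join(dirpath, name)


if __name__ == "__main__":
    # "." is a placeholder; point this at an application or library directory.
    for artifact, version, path in find_tika_jars("."):
        print(f"{artifact}\t{version}\t{path}")
```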

Workaround and Mitigation

The primary mitigation for CVE-2025-66516 is to upgrade all affected Apache Tika modules to the latest secure versions. Specifically, organizations must update tika-core to version 3.2.2 or later, tika-parser-pdf-module to version 3.2.2 or later, and tika-parsers to version 2.0.0 or later. It is essential to verify that both the core and parser modules are updated, as partial upgrades will not fully address the vulnerability.

In addition to upgrading, organizations should restrict network access to Apache Tika endpoints, limiting exposure to only trusted internal networks and preventing direct internet access wherever possible. Outbound connections from Apache Tika servers should be tightly controlled to prevent SSRF exploitation and data exfiltration.

Monitoring and detection are also critical components of a robust mitigation strategy. Security teams should implement SIEM rules and leverage detection content provided by vendors such as SOC Prime to identify suspicious activity indicative of exploitation attempts. This includes monitoring for unusual PDF uploads containing XFA forms with external entity references, unexpected outbound requests from Apache Tika servers, and access logs showing the parsing of attacker-supplied PDFs.

Where immediate patching is not feasible, organizations may consider disabling PDF parsing functionality within Apache Tika or implementing strict input validation to reject files containing XFA forms. However, these workarounds should be viewed as temporary measures, and full remediation via module upgrades remains the recommended course of action.
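One way to approximate the "reject files containing XFA forms" workaround is a pre-filter that runs before a file is handed to the parser. The sketch below is a deliberately simple heuristic in Python: it looks for the /XFA key, which is the PDF dictionary entry used for XFA form data, in the raw bytes of the upload. The function name is hypothetical, and the check is an assumption-laden stopgap rather than a complete PDF inspection.

```python
def pdf_mentions_xfa(data: bytes) -> bool:
    """Heuristic: True if the data looks like a PDF and contains the /XFA key.

    Limitations: the key can be hidden inside compressed object streams or
    written with name escapes (for example /X#46A), so a False result means
    "not detected", not "safe".
    """
    looks_like_pdf = b"%PDF-" in data[:1024]
    return looks_like_pdf and b"/XFA" in data


if __name__ == "__main__":
    # Tiny hand-written byte strings, not complete PDF files.
    with_xfa = b"%PDF-1.7\n1 0 obj\n<< /AcroForm << /XFA 2 0 R >> >>\nendobj\n"
    without_xfa = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    print(pdf_mentions_xfa(with_xfa))
    print(pdf_mentions_xfa(without_xfa))
```

Because the heuristic can be evaded, it is best treated as one layer of filtering alongside the network restrictions described above, not as a substitute for upgrading.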

Rescana is here for you

Rescana is committed to helping organizations navigate the evolving threat landscape with confidence. Our Third-Party Risk Management (TPRM) platform empowers security teams to identify, assess, and mitigate risks across their digital supply chain, providing continuous visibility and actionable intelligence. We encourage all customers to leverage our expertise and solutions to strengthen their security posture and respond proactively to emerging threats. If you have any questions or require further assistance regarding this advisory or your broader cybersecurity strategy, our team is ready to help at ops@rescana.com.
