Docling JATS XML Backend XXE Flaw CVE-2026-31247 Enables DoS

Executive Summary

Docling, an open-source document conversion library, carries a high-severity XML entity-handling vulnerability in its JATS XML backend, tracked as CVE-2026-31247 with a CVSS score of 7.5 and classed as an XML External Entity (XXE) issue. The flaw affects all versions through 2.61.0 and stems from the backend calling etree.parse() without disabling entity expansion. An attacker can supply a crafted XML file containing nested entity definitions, commonly known as an XML bomb, whose expansion grows exponentially and consumes enough memory and CPU to cause denial of service. The vulnerability was disclosed via the NVD on May 11, 2026, and no official patch has been released as of this writing.

Technical Analysis

The JATS XML backend in Docling processes XML files formatted according to the Journal Article Tag Suite (JATS) standard, a common schema in academic and publishing workflows. According to the NVD entry for CVE-2026-31247, the backend invokes etree.parse(), an entry point exposed by both lxml and Python's standard xml.etree.ElementTree module, without explicitly disabling entity expansion. A parser left in this state expands internal entity references, including nested definitions, as it builds the document tree; whether external entities are also fetched depends on the parser and how it is configured.

An XML bomb, also known as a billion laughs attack, exploits this behavior by defining a short chain of entities in which each entity references the previous one several times. Because every level multiplies the size of the one before it, a file of a few kilobytes can expand to gigabytes of in-memory data when parsed, exhausting CPU and memory on the target system. The attack does not require authentication or special privileges: any user or service that can submit a JATS XML document to Docling can trigger the condition.
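The growth described above can be checked with simple arithmetic. The following is a minimal sketch, not taken from the NVD entry or from Docling: the function names, the entity names (e0, e1, ...) and the choice of nine levels with ten references each are illustrative assumptions. It builds the text of a nested-entity document and computes how large the final entity would become if a parser expanded it, without ever parsing the document.

```python
def build_nested_entity_document(levels: int = 9, fanout: int = 10,
                                 seed: str = "lol") -> str:
    """Return the text of an XML document whose last entity nests `levels` deep.

    Hypothetical helper for illustration: entity e0 holds `seed`, and each
    later entity references the previous one `fanout` times.
    """
    declarations = [f'<!ENTITY e0 "{seed}">']
    for level in range(1, levels + 1):
        references = f"&e{level - 1};" * fanout
        declarations.append(f'<!ENTITY e{level} "{references}">')
    internal_subset = "\n  ".join(declarations)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE article [\n  {internal_subset}\n]>\n"
        f"<article>&e{levels};</article>"
    )


def expanded_size(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Characters produced if the deepest entity were fully expanded."""
    return len(seed) * fanout ** levels


document = build_nested_entity_document()
print(f"document size on disk: {len(document)} characters")
print(f"size after expansion:  {expanded_size():,} characters")
```

With these assumed parameters the document itself stays under one kilobyte, while full expansion of the last entity would yield three billion characters. Adding one more level multiplies that figure by ten again, which is why size limits on the uploaded file alone do not help.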

Docling is used in environments where conversion of documents such as PDF, Word, HTML, and JATS XML is automated, such as research repositories, publishing pipelines, and academic archives. The vulnerability is particularly concerning in server-side deployments where Docling processes untrusted XML files from external contributors or ingested documents. A single malicious submission can degrade or crash the processing service, affecting availability for all users.

The CVSS vector for CVE-2026-31247 reflects a network-exploitable, low-complexity attack with no privileges required and no user interaction needed. The scope is unchanged; confidentiality and integrity are not directly impacted, while the availability impact is rated high (C:N/I:N/A:H).
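The metrics described above are consistent with the reported score of 7.5. As a rough check, the sketch below applies the CVSS v3.1 base score formula for the unchanged-scope case. The use of v3.1 is an assumption, since the CVSS version is not stated here, and the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (assumed version; the article does not state it).
ATTACK_VECTOR_NETWORK = 0.85
ATTACK_COMPLEXITY_LOW = 0.77
PRIVILEGES_REQUIRED_NONE = 0.85
USER_INTERACTION_NONE = 0.85
IMPACT_WEIGHT = {"N": 0.0, "L": 0.22, "H": 0.56}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(conf: str, integ: str, avail: str) -> float:
    """Base score for AV:N/AC:L/PR:N/UI:N/S:U with the given C/I/A ratings."""
    impact_subscore = 1 - (
        (1 - IMPACT_WEIGHT[conf])
        * (1 - IMPACT_WEIGHT[integ])
        * (1 - IMPACT_WEIGHT[avail])
    )
    impact = 6.42 * impact_subscore
    exploitability = (
        8.22
        * ATTACK_VECTOR_NETWORK
        * ATTACK_COMPLEXITY_LOW
        * PRIVILEGES_REQUIRED_NONE
        * USER_INTERACTION_NONE
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score_scope_unchanged("N", "N", "H"))  # 7.5
```

The impact term contributes about 3.6 and the exploitability term about 3.9, so the score comes almost equally from how easy the attack is to launch and from the availability loss it causes.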

Mitigations & Recommendations

As of May 11, 2026, the Docling project has not released a patched version addressing CVE-2026-31247. Defenders who rely on Docling for JATS XML conversion should implement the following mitigations:

  • Disable JATS XML processing in environments where it is not essential, or restrict its use to trusted XML sources only.
  • Apply input validation by scanning XML files for entity declarations before passing them to Docling. Tools such as xmllint, run without the --noent option (which substitutes entity references with their values), can be used to pre-validate documents.
  • Sandbox the Docling process using containerization or seccomp profiles to limit resource exhaustion impact on the host system. Setting memory and CPU limits at the OS or container level (e.g., via Docker --memory and --cpus flags) can contain a denial-of-service event to a single container.
  • Monitor for abnormal resource consumption in Docling processes. Spikes in memory usage or CPU time during XML parsing may indicate exploitation attempts.
  • Consider alternative XML parsing libraries that reject entity declarations by default, such as defusedxml for Python, and patch the Docling source locally if feasible.
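The input-validation mitigation above could be sketched in Python as a pre-filter that runs before a file reaches Docling. This is an illustrative approach rather than an official fix: the function and exception names are hypothetical, and it relies only on the standard library's expat bindings. It rejects any document that declares an entity, so the check fails on the first declaration and nothing is ever expanded.

```python
import xml.parsers.expat


class EntityDeclarationFound(ValueError):
    """Raised when a document declares any XML entity (hypothetical name)."""


def reject_entity_declarations(data: bytes) -> None:
    """Raise EntityDeclarationFound if `data` declares an entity.

    Returns None for well-formed documents without entity declarations.
    Malformed XML raises xml.parsers.expat.ExpatError instead.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_declaration(name, is_parameter_entity, value, base,
                              system_id, public_id, notation_name):
        # Called once per <!ENTITY ...> declaration, before any expansion.
        raise EntityDeclarationFound(f"entity declaration not allowed: {name!r}")

    parser.EntityDeclHandler = on_entity_declaration
    parser.Parse(data, True)


suspicious = b'<?xml version="1.0"?><!DOCTYPE a [<!ENTITY x "y">]><a>&x;</a>'
try:
    reject_entity_declarations(suspicious)
except EntityDeclarationFound as error:
    print(f"rejected: {error}")
```

A blanket rejection is stricter than necessary, since it also blocks harmless entity declarations, but it keeps the filter simple and errs on the side of refusing input. Legitimate submissions that depend on custom entities would need a separate, trusted path.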
