XML external entities (XXE) — parser defaults
XML parsers in most languages historically defaulted to resolving external entities. A single XML payload can read local files, exfiltrate data over DNS, hang the parser on a billion-laughs payload, or pivot through SSRF. Most parsers have safer modes; few default to them. The fix is to set the right flags everywhere — every parser, every library, every vendored XML toolkit.
Pattern
- Python. xml.etree.ElementTree.parse, xml.dom.minidom.parse, xml.sax.parse, lxml.etree.parse with default options.
- Java. DocumentBuilderFactory, SAXParserFactory, XMLInputFactory, TransformerFactory, SchemaFactory: all entity-resolving by default.
- PHP. simplexml_load_string, DOMDocument::loadXML with default options; the libxml_disable_entity_loader global flag was the historical mitigation but its semantics changed in PHP 8.
- .NET. XmlDocument, XmlReader defaults pre-4.5.2 resolved entities; current defaults are safer but application code often re-enables them.
- Ruby. Nokogiri::XML(input) with default options; noent: true is the dangerous flag.
- Go. encoding/xml does not resolve entities (good); but third-party XML libraries vary.
Why it matters
The classic XXE payload reads a local file:
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY x SYSTEM "file:///etc/passwd">]>
<foo>&x;</foo>
…and your parser returns the contents in the response body, or in an error message, or via a side-channel DNS lookup. There is no patch coming for “the XML spec allows this”; the fix is to configure the parser to refuse.
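To make the payload concrete, the sketch below lists the external entities a document declares without resolving any of them. It uses the standard library's expat binding, which reports declarations through a callback and does not fetch external resources on its own. The helper name declared_external_entities is hypothetical, chosen for illustration.

```python
from xml.parsers import expat

# The classic payload from above, as bytes.
CLASSIC_XXE = b"""<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY x SYSTEM "file:///etc/passwd">]>
<foo>&x;</foo>"""


def declared_external_entities(data: bytes):
    """Return (name, system_id) pairs for every external entity declared.

    Hypothetical inspection helper: nothing is resolved or fetched, the
    declarations are only recorded as expat reports them.
    """
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External entities carry a system identifier; internal ones
        # carry a literal value instead.
        if system_id is not None:
            found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)
    return found


if __name__ == "__main__":
    print(declared_external_entities(CLASSIC_XXE))
```

A helper like this is only useful for triage or test fixtures; it does not make any other parser safe.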
Mitigation — disable entity resolution and DTD loading
Per language, the right flags:
Python (uplift to defusedxml).
# Replace
#   import xml.etree.ElementTree as ET
# with
import defusedxml.ElementTree as ET
# All ET.* calls now refuse external entities and entity-expansion
# attacks. Pass forbid_dtd=True to a parse call to also reject any DTD.
defusedxml has wrappers for ElementTree, minidom, sax,
lxml, pulldom, and xmlrpc (recent releases mark the lxml
wrapper as deprecated). Replacing the import is the fix.
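Where defusedxml cannot be added as a dependency, the "refuse any DTD" idea can be approximated with the standard library. The following is a minimal sketch under that assumption, not a replacement for defusedxml: the names parse_untrusted and DoctypeForbidden are hypothetical, and the input is parsed twice (once by the guard, once by ElementTree).

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this when it meets a DOCTYPE declaration; raising
    # here makes the surrounding Parse() call fail with this exception.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted(data: bytes) -> ET.Element:
    """Reject any document with a DOCTYPE, then parse it normally.

    Hypothetical helper: a guard pass with expat runs first, and
    ElementTree only sees input that declared no DTD at all.
    """
    guard = expat.ParserCreate()
    guard.StartDoctypeDeclHandler = _forbid_doctype
    guard.Parse(data, True)  # DoctypeForbidden, or ExpatError if malformed
    return ET.fromstring(data)


if __name__ == "__main__":
    print(parse_untrusted(b"<a><b>1</b></a>").find("b").text)
```

Because entities can only be declared inside a DTD, refusing the DOCTYPE also refuses both external entities and entity-expansion payloads, at the cost of breaking inputs that legitimately ship a DTD.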
Java.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
Equivalent flag-sets exist for SAXParserFactory,
XMLInputFactory (IS_SUPPORTING_EXTERNAL_ENTITIES,
SUPPORT_DTD), and TransformerFactory
(XMLConstants.ACCESS_EXTERNAL_DTD,
ACCESS_EXTERNAL_STYLESHEET).
PHP.
$dom = new DOMDocument();
// LIBXML_NONET blocks network access during parsing.
// Deliberately omit LIBXML_NOENT (substitutes entities) and
// LIBXML_DTDLOAD (loads the external DTD subset).
// External entity loading is off by default in PHP 8+; verify with phpunit.
$dom->loadXML($input, LIBXML_NONET);
PHP 8 deprecated libxml_disable_entity_loader; the
correct shape is now flag-based at the load call.
.NET.
var settings = new XmlReaderSettings {
DtdProcessing = DtdProcessing.Prohibit,
XmlResolver = null,
};
using var reader = XmlReader.Create(input, settings);
Never use XmlDocument.Load(stream) directly on untrusted input; set
XmlResolver = null on the document and load it through a reader
configured with DtdProcessing = Prohibit.
Ruby.
doc = Nokogiri::XML(input) do |config|
config.strict.nonet # no network, no DTD loading
end
Uplift — replace XML where possible
For new APIs and config files, prefer JSON, YAML (with
safe_load), or protobuf. XML’s parser-level vulnerabilities
are not a fixable category; the only real fix is to stop
parsing XML where the format isn’t required.
Inputs
- Call sites — every XML-parser instantiation in the repo.
- Languages and parsers in scope.
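A first pass at the call-site inventory can be a plain text scan for the parser names listed under Pattern. The sketch below is a deliberately crude illustration: the NEEDLES tuple, the function names, and the file suffixes are assumptions to adapt per repo, and a literal search misses aliased imports (for example ET.parse after import ... as ET), so its output is a starting list, not a complete one.

```python
from pathlib import Path

# Parser names taken from the Pattern list; extend for each repo.
NEEDLES = (
    "ElementTree.parse", "minidom.parse", "xml.sax.parse", "etree.parse",
    "DocumentBuilderFactory", "SAXParserFactory", "XMLInputFactory",
    "TransformerFactory", "SchemaFactory",
    "simplexml_load_string", "loadXML",
    "XmlDocument", "XmlReader",
    "Nokogiri::XML",
)


def find_call_sites(source: str, path: str = "<memory>"):
    """Return (path, line_number, needle) for every literal match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for needle in NEEDLES:
            if needle in line:
                hits.append((path, lineno, needle))
    return hits


def scan_repo(root: str, suffixes=(".py", ".java", ".php", ".cs", ".rb")):
    """Walk a checkout and collect matches from files with known suffixes."""
    hits = []
    for file in Path(root).rglob("*"):
        if file.is_file() and file.suffix in suffixes:
            text = file.read_text(encoding="utf-8", errors="replace")
            hits.extend(find_call_sites(text, str(file)))
    return hits


if __name__ == "__main__":
    for hit in scan_repo("."):
        print(*hit, sep=":")
```

Each hit still needs the manual judgement the recipe asks for: whether the input at that call site is untrusted, and whether parsed content reaches a response or a log.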
The prompt
You are remediating XML-parser defaults across this repo.
Output a PR or a TRIAGE.md.
## Step 0 — Inventory
1. List every XML-parser instantiation: `ElementTree.parse`,
`DocumentBuilderFactory.newInstance`, `XmlDocument`,
`Nokogiri::XML`, `simplexml_load_string`, etc.
2. For each, identify whether the input is untrusted (request
body, uploaded file, partner data feed) or trusted (a
bundled config file).
3. Flag any code path that returns parsed XML content in an
HTTP response (XXE exfiltration surface) and any path that
logs parsed content.
## Step 1 — Mitigate per language
Apply the language-specific configuration from the recipe
body:
- **Python:** swap imports to `defusedxml`.
- **Java:** set the `disallow-doctype-decl`,
`external-general-entities`, `external-parameter-entities`,
`load-external-dtd` features to safe values on every parser
factory, plus `setXIncludeAware(false)` and
`setExpandEntityReferences(false)`.
- **PHP:** pass safe libxml flags to every parser; remove
`LIBXML_NOENT` and `LIBXML_DTDLOAD` everywhere.
- **.NET:** set `DtdProcessing = Prohibit` and
`XmlResolver = null` on every reader.
- **Ruby:** use `nonet`, `noent: false`.
## Step 2 — Uplift the API surface (when applicable)
If a public API takes XML and the agent has authority to
introduce a JSON endpoint side-by-side, do it. Mark XML
endpoints deprecated; keep them serving (with the mitigated
parser) until clients migrate.
## Step 3 — Tests
For every language touched, add a test:
1. The classic XXE payload (referencing
`file:///etc/passwd` or a local DTD) is rejected without
resolving the entity.
2. A billion-laughs / quadratic-blowup payload is rejected
without exhausting memory.
3. A normal XML payload still parses correctly (behaviour
preservation).
## Step 4 — Open the PR
- Branch: `remediate/xxe-<module-slug>`.
- Title: `[Security][XXE] harden XML parsers in <module>`.
- Body: per-language summary, call-site list, test additions,
any deprecation notices.
- Label: `sec-auto-remediation`.
## Stop conditions
- A parser configuration option changes the schema-validation
semantics in a way that breaks legitimate inputs.
- Tests fail in unrelated code that depends on XInclude or
DTD resolution for a real reason. Triage.
- A vendored XML library has no exposed safe-mode flag. Flag
and triage; consider replacing the library.
## Scope
- Do not change XML schemas (XSDs). The fix is parser
configuration, not schema content.
- Do not bundle unrelated refactors.
- Do not silently re-enable any flag the recipe disables.
Watch for
- DTD-using legitimate inputs. Some partner integrations ship inline DTDs. The mitigation breaks them. Identify before deploying; allowlist the partner-DTD path explicitly rather than globally re-enabling DTDs.
- Logging the parsed payload. Even with entities disabled, logging raw XML can leak sensitive data into logs and opens the door to log injection. Log the payload’s hash, not its content.
- Dependency-injected parsers. A framework that injects a parser bean (Spring, Symfony, Rails) needs the parser bean reconfigured at construction. Find and fix the bean definition, not just the call sites.
- Schema validation feature-flags re-enabling DTDs. Some XSD validators re-resolve external schemas; double-check SchemaFactory settings.
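The advice above to log the payload's hash rather than its content can be sketched as follows. The logger name xml_ingest and both function names are hypothetical; the sketch assumes the reason argument is a fixed string chosen by the application, never text taken from the payload.

```python
import hashlib
import logging

logger = logging.getLogger("xml_ingest")  # hypothetical logger name


def payload_fingerprint(payload: bytes) -> str:
    """Return the SHA-256 hex digest used to identify a payload in logs."""
    return hashlib.sha256(payload).hexdigest()


def log_rejected_payload(payload: bytes, reason: str) -> None:
    """Record that a payload was rejected without writing its content.

    Only the digest, the size, and an application-chosen reason are
    logged, so attacker-controlled bytes never reach the log stream.
    """
    logger.warning(
        "rejected XML payload sha256=%s size=%d reason=%s",
        payload_fingerprint(payload),
        len(payload),
        reason,
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    log_rejected_payload(b"<foo/>", "doctype-present")
```

The digest still lets an operator correlate a log entry with a payload captured elsewhere (for example in a quarantined upload store) without the log itself holding the data.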
Related
- Classic Vulnerable Defaults — workflow context.
- Java ObjectInputStream — sibling Java deserialization pattern.