What Synthesis Found in 31 Seconds: An XXE Vulnerability in a Production Java SSO System

I pointed Synthesis's static security scanner at a Norwegian open-source Java portfolio -- a production SSO system spanning 60+ repositories, used across enterprise deployments. One command. Thirty-one seconds. Ninety-five HIGH-severity findings.

Most of those findings were false positives. But buried in the noise was a genuine XXE vulnerability in a live authentication endpoint. This post is about the finding, the fix, and what the workflow actually looks like when you combine automated scanning with AI-assisted triage.

The scan

synthesis code-graph security -d /src/my-portfolio --severity HIGH

Synthesis crawled the entire portfolio, built a code graph, and ran 21 static security signal types across every Java source file. Thirty-one seconds for sixty-plus repositories. The output: 95 findings flagged as HIGH severity, each with a source file, line number, and CWE classification.

Ninety-five is a large number. In my experience, that many high-severity findings from a static scanner means one of two things: either the codebase has serious systemic problems, or the scanner has a high false positive rate. Usually it is the second.

Triage took considerably longer than the scan.

The finding

After I had worked through the results, filtered out patterns that were clearly benign, and investigated the ones that looked plausible, one finding stood out.

A @POST endpoint in the authentication service backend accepted XML-formatted user credentials. The XML parsing code looked like this:

// Before: vulnerable (CWE-611)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document dDoc = builder.parse(input);

Three lines. No entity protection. This is CWE-611: XML External Entity (XXE) injection.

A similar pattern appeared in a second authentication component in the same system.

What XXE actually is

XML External Entity injection exists because of a design decision made decades ago. The XML specification allows documents to define external entities that reference URIs. When a parser processes a document containing an external entity, it resolves that URI. This was a feature. It became a vulnerability.

If you do not explicitly disable external entity resolution, an attacker can submit XML like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<credentials>
  <username>&xxe;</username>
  <password>anything</password>
</credentials>

The parser resolves &xxe; by reading /etc/passwd from the server's filesystem. The file contents end up in the parsed document. Depending on how the application handles the parsed data, the attacker may receive the file contents in a response, or use the same mechanism for server-side request forgery (SSRF) or denial of service.
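To see this default behavior without touching /etc/passwd, the sketch below aims the same payload at a temporary file instead. It is a self-contained demo, not code from the SSO system: the class name XxeDemo, its helper methods, and the temp-file target are illustrative assumptions. It parses exactly the way the "before" snippet does, with an unconfigured DocumentBuilderFactory, and needs Java 15 or later for text blocks.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XxeDemo {
    // Same configuration as the vulnerable snippet: default factory, no hardening.
    static String parseUsernameWithDefaults(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("username").item(0).getTextContent();
    }

    // The attack payload from the post, with the entity aimed at `target`.
    static String attackXml(Path target) {
        return """
            <?xml version="1.0" encoding="UTF-8"?>
            <!DOCTYPE credentials [
              <!ENTITY xxe SYSTEM "%s">
            ]>
            <credentials>
              <username>&xxe;</username>
              <password>anything</password>
            </credentials>
            """.formatted(target.toUri());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for /etc/passwd: a file the caller should never be able to read.
        Path secret = Files.createTempFile("xxe-demo", ".txt");
        try {
            Files.writeString(secret, "top-secret");
            // The default parser resolves &xxe; and splices the file into <username>.
            System.out.println("username = " + parseUsernameWithDefaults(attackXml(secret)));
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

On a stock JDK the printed username is the contents of the file. An application that echoes the username back, for example in an error message, hands that content to the attacker.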

DocumentBuilderFactory.newInstance() in Java is vulnerable by default, and has been for as long as it has existed. XXE had its own entry (A4) in the OWASP Top 10 2017, and the 2021 edition folds it into Security Misconfiguration (A05). It is particularly dangerous in authentication endpoints, because those endpoints are by definition exposed to unauthenticated users.

The fix

The fix is well-established. Disable DOCTYPE declarations entirely, enable secure processing, and disable entity reference expansion:

// After: secure
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
dbf.setExpandEntityReferences(false);
DocumentBuilder builder = dbf.newDocumentBuilder();
Document dDoc = builder.parse(input);

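Three additional lines. The disallow-doctype-decl feature is the most important one. It makes the parser reject any document that contains a DOCTYPE declaration, and because entities can only be declared through a DOCTYPE, that closes off external entity injection along with entity-expansion attacks such as "billion laughs". FEATURE_SECURE_PROCESSING makes the JDK's built-in parser enforce processing limits, such as a cap on entity expansions. setExpandEntityReferences(false) is defense in depth: a secondary layer, not a substitute for disallowing DOCTYPE.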
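The fix is easier to apply consistently when the configuration lives in one place. The sketch below wraps the same three settings in a helper and checks both sides of the change: a plain credentials document still parses, and any document with a DOCTYPE is rejected. HardenedXml, newHardenedFactory, and parse are illustrative names, not taken from the actual patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class HardenedXml {
    // The three settings from the fix, so every call site gets the same configuration.
    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // A legitimate credential payload has no DOCTYPE and parses normally.
        Document ok = parse("<credentials><username>alice</username>"
                + "<password>pw</password></credentials>");
        System.out.println(ok.getElementsByTagName("username").item(0).getTextContent());

        // Any DOCTYPE is rejected before an entity can even be declared.
        String attack = "<!DOCTYPE credentials [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<credentials><username>&xxe;</username></credentials>";
        try {
            parse(attack);
            System.out.println("parsed: the factory is not hardened");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A test of this shape also answers the review question of whether the restriction breaks legitimate input, since the first parse exercises the normal path.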

I filed an issue, submitted a pull request with the fix, and it was reviewed and merged. Standard responsible disclosure workflow.

The workflow

This is where the story gets interesting. Not because the vulnerability was exotic -- it was not -- but because of what the end-to-end workflow looked like.

Step 1: Synthesis scans the codebase. One command, 31 seconds, 95 findings with source locations and CWE classifications. No manual code review. No grepping for DocumentBuilderFactory.

Step 2: Human triages. This is the slow part, and it cannot be automated away. Many findings are patterns that look dangerous in isolation but are safe in context. A Runtime.exec() call that only accepts hardcoded commands. A SQL query that uses parameterized statements despite containing string concatenation in a comment. Each finding requires reading the surrounding code, understanding the data flow, and deciding whether it is exploitable. This took hours, not seconds.
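The Runtime.exec() case from that list shows why context matters. The sketch below uses ProcessBuilder, which carries the same command-injection risk, and only constructs commands without starting them. A pattern matcher sees a process being built from a method argument; a reviewer sees that the argument can only select from a hardcoded allow-list. The class, the map, and the commands are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class HardcodedCommands {
    // Hypothetical allow-list: user input picks a key, it never supplies argv.
    private static final Map<String, List<String>> ALLOWED = Map.of(
            "uptime", List.of("uptime"),
            "disk", List.of("df", "-h"));

    // Flagged by a pattern matcher, but argv is always one of the fixed lists above.
    static ProcessBuilder commandFor(String action) {
        List<String> argv = ALLOWED.get(action);
        if (argv == null) {
            throw new IllegalArgumentException("unknown action: " + action);
        }
        return new ProcessBuilder(argv);
    }

    public static void main(String[] args) {
        System.out.println(commandFor("disk").command()); // [df, -h]
        try {
            commandFor("disk; rm -rf /");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Clearing a finding like this means confirming that no other path builds a process from raw input, which is exactly the reading that takes hours rather than seconds.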

Step 3: Claude Code explains the genuine findings. For each finding that survived triage, I asked Claude Code to explain the vulnerability in the context of this specific code. Not because I could not look up CWE-611 myself, but because the AI can trace the data flow from the HTTP endpoint to the parser and produce an explanation grounded in this codebase rather than generic documentation.

Step 4: Claude Code writes the fix. Based on the explanation, Claude Code produced the patched code. The fix for XXE is well-known, so this was straightforward. For more complex vulnerabilities, the fix might require more judgment.

Step 5: Human reviews and merges. Is this the right fix? Does it break the endpoint's legitimate functionality? Does the DOCTYPE restriction affect any valid use case? The answers were clear: no legitimate credential submission uses DOCTYPE declarations, and the fix is a strict improvement.

The pattern: automated tool finds candidates, AI explains and patches, human decides. Each component does what it is good at.

What the scanner does not do

I want to be direct about the limitations, because overstating what automated scanning can do is worse than understating it.

Synthesis's security scanner covers 21 signal types. It is a static pattern matcher. It does not do taint analysis, symbolic execution, or dynamic testing. It will catch DocumentBuilderFactory.newInstance() without entity protection. It will not catch a custom XML parser that has the same vulnerability expressed differently.
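To make "static pattern matcher" concrete, here is a deliberately naive check of that kind. It is not how Synthesis is implemented; it is a sketch assuming a line-based regex over source text that flags DocumentBuilderFactory.newInstance() unless the same file mentions disallow-doctype-decl. NaiveXxeCheck and flag are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NaiveXxeCheck {
    private static final Pattern FACTORY =
            Pattern.compile("DocumentBuilderFactory\\.newInstance\\(\\)");
    private static final String GUARD = "disallow-doctype-decl";

    // Returns 1-based line numbers of factory creations in a file that never
    // mentions the disallow-doctype-decl feature anywhere.
    static List<Integer> flag(String source) {
        List<Integer> hits = new ArrayList<>();
        if (source.contains(GUARD)) {
            return hits; // crude file-level "fix is present" test
        }
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (FACTORY.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String vulnerable = String.join("\n",
                "DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();",
                "DocumentBuilder builder = dbf.newDocumentBuilder();");
        String fixed = String.join("\n",
                "DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();",
                "dbf.setFeature(\"http://apache.org/xml/features/disallow-doctype-decl\", true);");
        System.out.println("vulnerable: " + flag(vulnerable)); // [1]
        System.out.println("fixed:      " + flag(fixed));      // []
    }
}
```

Its blind spots mirror the ones described here: a mention of the feature in a comment silences it, a factory hardened by a helper in another file is still flagged, and a custom parser with the same flaw is never seen at all.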

The false positive rate on this scan was high. Ninety-five findings, and the majority were not exploitable in context. A security team that treated every finding as confirmed would waste significant time. A security team that used it as triage input would save significant time.

Static analysis is a net, not a guarantee. The mesh catches certain shapes and misses others.

What this illustrates

The traditional approach to security review is manual audit. An experienced security engineer reads code, knows what to look for, and finds vulnerabilities through expertise and pattern recognition. This works, but it scales poorly. Sixty repositories is a lot of code to read.

The scan took 31 seconds. The triage took hours. The fix took minutes. The review and merge took a day. The bottleneck was human judgment, and it should be. The speed advantage is not that the whole process is fast. It is that the finding step -- which used to require reading code line by line -- is now a command. Expert time shifts from finding to judging. That is a better use of expertise.

If you maintain Java code that parses XML, check your DocumentBuilderFactory instances. If they do not explicitly disallow DOCTYPE declarations or otherwise disable external entity resolution, assume they are vulnerable. This has been true for decades, and it is still one of the most common findings in Java security audits.
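Besides reading source, you can assert the configuration at runtime, for example in a unit test against the factory your code actually builds. A minimal sketch, assuming the JDK's built-in parser, which reports the disallow-doctype-decl feature through DocumentBuilderFactory.getFeature. FactoryAudit and rejectsDoctype are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class FactoryAudit {
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    // True if the factory rejects DOCTYPE declarations, the primary XXE defense.
    static boolean rejectsDoctype(DocumentBuilderFactory dbf) throws ParserConfigurationException {
        return dbf.getFeature(DISALLOW_DOCTYPE);
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory defaults = DocumentBuilderFactory.newInstance();
        DocumentBuilderFactory hardened = DocumentBuilderFactory.newInstance();
        hardened.setFeature(DISALLOW_DOCTYPE, true);
        System.out.println("default rejects DOCTYPE:  " + rejectsDoctype(defaults));
        System.out.println("hardened rejects DOCTYPE: " + rejectsDoctype(hardened));
    }
}
```

This only covers the factory you hand it. Factories created with a bare newInstance() elsewhere in the codebase still have to be found first, which is the job the scanner does.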

Synthesis is a CLI tool for codebase knowledge management. The security scanner, with its 21 signal types, is part of the code-graph module.