
Learn Java Data Mapper Json Xml Validation Part 018 Xml Processing Mental Model


title: Learn Java Data Mapper, JSON/XML Processing & Validation - Part 018
description: XML processing mental model for Java: DOM, SAX, StAX, JAXB/Jakarta XML Binding, XSD, namespaces, schema validity, XML tree/event/binding choices, and XML vs JSON contract design.
series: learn-java-data-mapper-json-xml-validation
seriesTitle: Learn Java Data Mapper, JSON/XML Processing & Validation
order: 18
partTitle: XML Processing Mental Model: DOM, SAX, StAX, JAXB, XSD, Namespace, Schema Validity
tags:

  • java
  • xml
  • jaxp
  • dom
  • sax
  • stax
  • jaxb
  • jakarta-xml-binding
  • xsd
  • namespace
  • data-mapper

date: 2026-06-29

Part 018 — XML Processing Mental Model: DOM, SAX, StAX, JAXB, XSD, Namespace, Schema Validity

Target skill: being able to choose and use the right XML processing model: DOM, SAX, StAX, Jakarta XML Binding/JAXB, XSD validation, namespace-aware parsing, and schema-driven contract design.

Many modern engineers work with JSON far more often than with XML. As a result, XML is often treated as:

“JSON whose syntax happens to use tags.”

That is wrong.

XML has concepts that are absent or not dominant in JSON:

  • element
  • attribute
  • namespace
  • prefix
  • qualified name
  • mixed content
  • processing instruction
  • entity
  • CDATA
  • schema
  • XSD type
  • order-sensitive content
  • text node
  • whitespace semantics
  • ID/IDREF
  • canonicalization
  • XPath/XSLT
  • external entity risk

Mental model:

XML is not just a data shape. XML is a document model with names, namespaces, order, text, attributes, and schema.

This part is not yet a deep dive into JAXB. That is Part 019. This part lays the mental foundation so that when we get to JAXB, Jackson XML, and security, we do not pick the wrong abstraction.


1. Kaufman Deconstruction

XML processing subskills:

| Subskill | Capability |
| --- | --- |
| Understand XML document model | Element, attribute, text, namespace, order |
| Choose parser model | DOM vs SAX vs StAX vs binding |
| Reason about schema | XSD validity vs business validity |
| Handle namespaces | QName, prefix, URI, qualified elements |
| Map XML to Java | When JAXB/Jakarta XML Binding is a good fit |
| Avoid JSON thinking | Not forcing a JSON mental model onto XML |
| Process large XML | Streaming with SAX/StAX |
| Validate safely | XSD validation and parser hardening |
| Preserve document semantics | Mixed content/order/attribute vs element |
| Test XML contracts | Golden XML, namespace, schema, invalid fixture |

Main exercise:

  1. Take one XML invoice.
  2. Draw the element/attribute/text structure.
  3. Mark the namespaces.
  4. Choose DOM/SAX/StAX/JAXB.
  5. Create valid and invalid XML fixtures.
  6. Validate against the XSD.
  7. Map to a DTO.
  8. Test the round trip if needed.

2. XML Is a Document Tree

Example:

<invoice xmlns="https://example.com/invoice"
         xmlns:tax="https://example.com/tax"
         id="INV-001">
    <customer id="CUS-001">Ana</customer>
    <amount currency="IDR">100000.00</amount>
    <tax:withholding rate="2.5">2500.00</tax:withholding>
</invoice>

This contains:

| XML construct | Example |
| --- | --- |
| element | <invoice>, <customer>, <amount> |
| attribute | id="INV-001", currency="IDR" |
| default namespace | xmlns="https://example.com/invoice" |
| prefixed namespace | xmlns:tax="https://example.com/tax" |
| qualified element | tax:withholding |
| text node | Ana, 100000.00 |
| order | customer before amount before withholding |

A JSON rendering of the same invoice is only approximate:

{
  "id": "INV-001",
  "customer": {
    "id": "CUS-001",
    "name": "Ana"
  },
  "amount": {
    "currency": "IDR",
    "value": "100000.00"
  }
}

XML distinguishes attributes from elements and has namespace semantics. The JSON above flattens the former into ordinary properties and has no place for the latter.


3. XML Processing Choices

High-level decision:

| Need | Best fit |
| --- | --- |
| small document + random access | DOM |
| huge document + event processing | SAX or StAX |
| pull-based streaming | StAX |
| object mapping against schema-like shape | JAXB/Jakarta XML Binding |
| query parts of document | XPath |
| transform XML to XML/HTML/text | XSLT |
| simple API JSON-like XML | Jackson XML may be okay |
| strict enterprise XML schema | JAXB + XSD validation |

4. DOM

DOM builds the whole XML document as an in-memory tree.

Use DOM when:

  • the XML is small or medium-sized
  • you need random access
  • you need to modify the document tree
  • you need XPath over the whole document
  • you need to inspect mixed content
  • developer simplicity matters more than memory

Avoid DOM when:

  • XML can be very large
  • you only need sequential processing
  • memory pressure is important
  • you process many documents concurrently

Basic parsing:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);

DocumentBuilder builder = factory.newDocumentBuilder();

Document document;
try (InputStream input = Files.newInputStream(path)) {
    document = builder.parse(input);
}

Element root = document.getDocumentElement();
String invoiceId = root.getAttribute("id");

Important: DocumentBuilderFactory is not namespace-aware by default, so call setNamespaceAware(true) explicitly.
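To make the namespace point concrete, here is a minimal sketch that reads the invoice from section 2 by namespace URI and local name instead of by prefixed tag name. The class name `DomInvoiceReader` and its helper methods are hypothetical, and parser hardening is left out for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library.
final class DomInvoiceReader {
    static final String INVOICE_NS = "https://example.com/invoice";
    static final String TAX_NS = "https://example.com/tax";

    /** Parses the XML string into a namespace-aware DOM tree. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the text of the first element matching URI + local name, or null if none. */
    static String firstText(Document document, String namespaceUri, String localName) {
        NodeList matches = document.getElementsByTagNameNS(namespaceUri, localName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }
}
```

Looking up `withholding` in the invoice namespace finds nothing, because that element lives in the tax namespace. The unprefixed `id` attribute is in no namespace, which is why `root.getAttribute("id")` works unchanged.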


5. SAX

SAX is event-based push parsing.

The parser calls your handler:

startElement(invoice)
startElement(customer)
characters(Ana)
endElement(customer)
endElement(invoice)

Use SAX when:

  • XML is large
  • sequential processing is enough
  • you want low memory usage
  • callback style is acceptable
  • you do not need to modify the tree

SAX handler example:

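// Note: localName is only populated when the parser is namespace-aware
// (SAXParserFactory.setNamespaceAware(true)); otherwise match on qName.
public final class InvoiceSaxHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String currentInvoiceId;
    private BigDecimal amount;

    @Override
    public void startElement(
        String uri,
        String localName,
        String qName,
        Attributes attributes
    ) {
        // Reset the buffer so it only holds the text of the current element.
        text.setLength(0);

        if ("invoice".equals(localName)) {
            currentInvoiceId = attributes.getValue("id");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("amount".equals(localName)) {
            amount = new BigDecimal(text.toString().trim());
        }
    }

    public String getCurrentInvoiceId() {
        return currentInvoiceId;
    }

    public BigDecimal getAmount() {
        return amount;
    }
}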

SAX is powerful but can become state-machine-heavy.
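A handler only does something once a parser drives it. The sketch below wires a compact handler to a namespace-aware `SAXParser` and matches elements on namespace URI plus local name. The class name `SaxInvoiceReader` and its nested `Handler` are hypothetical, and the invoice namespace is the one from section 2.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical wiring for illustration.
final class SaxInvoiceReader {
    static final String INVOICE_NS = "https://example.com/invoice";

    /** Collects the invoice id and amount while the parser pushes events. */
    static final class Handler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        String invoiceId;
        BigDecimal amount;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // start collecting text for the new element
            if (INVOICE_NS.equals(uri) && "invoice".equals(localName)) {
                // Unprefixed attributes are in no namespace, hence the empty URI.
                invoiceId = attributes.getValue("", "id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (INVOICE_NS.equals(uri) && "amount".equals(localName)) {
                amount = new BigDecimal(text.toString().trim());
            }
        }
    }

    /** Runs a namespace-aware SAX parse and returns the populated handler. */
    static Handler read(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for uri/localName to be populated
        SAXParser parser = factory.newSAXParser();
        Handler handler = new Handler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

Every additional element that needs context (for example an `amount` nested inside a specific parent) adds more fields and conditions to the handler, which is where the state-machine weight comes from.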


6. StAX

StAX is streaming pull parsing. Your code asks the parser for the next event.

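XMLInputFactory factory = XMLInputFactory.newFactory();

try (InputStream input = Files.newInputStream(path)) {
    XMLStreamReader reader = factory.createXMLStreamReader(input);
    try {
        while (reader.hasNext()) {
            int event = reader.next();

            if (event == XMLStreamConstants.START_ELEMENT) {
                String localName = reader.getLocalName();
                // handle
            }
        }
    } finally {
        // XMLStreamReader is not AutoCloseable; close() frees parser resources
        // but leaves the underlying InputStream to try-with-resources.
        reader.close();
    }
}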

Use StAX when:

  • the XML is large
  • you prefer pull control
  • you want to combine streaming with object binding
  • you need to stop early
  • you want simpler control flow than SAX callbacks
  • you also need to write XML (XMLStreamWriter)

StAX is often excellent for enterprise import/export.
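The "stop early" case is where pull parsing is most natural: the loop just returns once it has what it needs. A minimal sketch, assuming the invoice namespace from section 2. The class name `StaxFirstAmount` is hypothetical, and the two factory properties are standard `XMLInputFactory` constants used here as basic hardening.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration.
final class StaxFirstAmount {
    static final String INVOICE_NS = "https://example.com/invoice";

    /** Returns the text of the first invoice-namespace amount element, or null if absent. */
    static String firstAmount(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && INVOICE_NS.equals(reader.getNamespaceURI())
                        && "amount".equals(reader.getLocalName())) {
                    // Stop pulling as soon as the value is found.
                    return reader.getElementText().trim();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

With SAX the same early exit usually means throwing a dedicated exception out of the handler, which is workable but less direct.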


7. DOM vs SAX vs StAX

| Feature | DOM | SAX | StAX |
| --- | --- | --- | --- |
| Memory | high | low | low |
| Access | random | sequential | sequential |
| Control | tree traversal | parser pushes callbacks | app pulls events |
| Modification | easy | no | no tree modification |
| Simplicity | easy for small docs | callback state | explicit loop |
| Large files | risky | good | good |
| Stop early | possible after parse only | possible with exception/control | natural |
| Write XML | separate APIs | no | yes |
| Best use | small docs, XPath, modifications | fast event parsing | controlled streaming |

Rule:

DOM is for document-in-memory. SAX/StAX are for document-as-stream. JAXB is for document-as-object.


8. JAXB / Jakarta XML Binding

Jakarta XML Binding maps XML to Java objects and Java objects to XML.

Core operations:

  • unmarshal XML to Java object tree
  • access/update Java representation
  • marshal Java object tree to XML
  • optionally validate during unmarshal/marshal, depending on setup

DTO:

@XmlRootElement(name = "invoice")
@XmlAccessorType(XmlAccessType.FIELD)
public class InvoiceXml {
    @XmlAttribute(name = "id")
    private String id;

    @XmlElement(name = "customer")
    private CustomerXml customer;

    @XmlElement(name = "amount")
    private AmountXml amount;

    public String getId() {
        return id;
    }

    public CustomerXml getCustomer() {
        return customer;
    }

    public AmountXml getAmount() {
        return amount;
    }
}

Unmarshal:

JAXBContext context = JAXBContext.newInstance(InvoiceXml.class);
Unmarshaller unmarshaller = context.createUnmarshaller();

InvoiceXml invoice;
try (InputStream input = Files.newInputStream(path)) {
    invoice = (InvoiceXml) unmarshaller.unmarshal(input);
}

Marshal:

Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
marshaller.marshal(invoice, outputStream);

Use JAXB when the XML shape maps cleanly to an object model and a schema-like contract matters.


9. XSD and Schema Validity

XSD can define:

  • element names
  • element order
  • attributes
  • required/optional fields
  • simple types
  • complex types
  • namespaces
  • occurrence constraints
  • enumerations
  • patterns
  • numeric restrictions
  • date/time types

Example concept:

<xs:element name="invoice" type="InvoiceType"/>

Schema validation answers:

“Is this XML structurally valid according to schema?”

It does not necessarily answer:

“Is this invoice allowed by business workflow?”

Distinguish:

| Validation kind | Example |
| --- | --- |
| well-formedness | XML syntax is valid |
| schema validity | required element exists, type matches XSD |
| semantic/business validity | invoice date is not in closed period |
| referential validity | customer id exists |
| authorization validity | caller can submit invoice |
| regulatory validity | record satisfies reporting rule |
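The first two kinds in the table can be told apart in code with JDK APIs alone. The sketch below first parses for well-formedness, then validates against an XSD. The class name `ValidationLayers`, the `Result` enum, and the small invoice schema are all hypothetical, written only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical classifier for illustration.
final class ValidationLayers {
    enum Result { MALFORMED, SCHEMA_INVALID, SCHEMA_VALID }

    // Assumed schema: invoice with a required id and a decimal amount carrying a currency.
    static final String INVOICE_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="invoice">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="amount">
                  <xs:complexType>
                    <xs:simpleContent>
                      <xs:extension base="xs:decimal">
                        <xs:attribute name="currency" type="xs:string" use="required"/>
                      </xs:extension>
                    </xs:simpleContent>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
              <xs:attribute name="id" type="xs:string" use="required"/>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    static Result classify(String xml)
            throws IOException, ParserConfigurationException, SAXException {
        // Layer 1: well-formedness. DefaultHandler rethrows fatal parse errors.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        try {
            parserFactory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        } catch (SAXException e) {
            return Result.MALFORMED;
        }

        // Layer 2: schema validity. The default validator throws on the first error.
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(INVOICE_XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            return Result.SCHEMA_INVALID;
        }
        return Result.SCHEMA_VALID;
    }
}
```

A negative amount such as `-100.00` comes back as `SCHEMA_VALID` under this schema. Rejecting it is a business rule that belongs to a later layer.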

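Layering: treat these as separate layers, typically checked from well-formedness outward, so that each layer only sees input the previous one accepted.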


10. Namespace Mental Model

Namespace is not the prefix. Namespace is the URI.

<tax:withholding xmlns:tax="https://example.com/tax">

Here:

| Part | Meaning |
| --- | --- |
| tax | prefix |
| https://example.com/tax | namespace URI |
| withholding | local name |
| {https://example.com/tax}withholding | expanded/QName-like identity |

This matters because prefixes can change:

<t:withholding xmlns:t="https://example.com/tax">

Same namespace, same local name, different prefix. The two must be treated as the same element.

Always use namespace-aware parsing for serious XML.

DOM:

factory.setNamespaceAware(true);

StAX:

String namespaceUri = reader.getNamespaceURI();
String localName = reader.getLocalName();

JAXB:

@XmlElement(
    name = "withholding",
    namespace = "https://example.com/tax"
)
private TaxWithholdingXml withholding;
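The prefix-independence rule can be checked directly with `javax.xml.namespace.QName`, whose `equals` compares only the namespace URI and local part. A minimal sketch, with a hypothetical class name `RootName`, that reads the root element name through StAX:

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration.
final class RootName {
    /** Returns the qualified name of the document's root element. */
    static QName rootName(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }
}
```

Parsing the `tax:withholding` and `t:withholding` variants above yields two `QName` values that are equal even though `getPrefix()` differs. Code that compares `tax:withholding` as a raw string would treat them as different elements.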

11. Attribute vs Element

XML can represent data as attribute or element.

Attribute:

<amount currency="IDR">100000.00</amount>

Element:

<amount>
    <value>100000.00</value>
    <currency>IDR</currency>
</amount>

Guidelines:

| Use attribute for | Use element for |
| --- | --- |
| metadata about element | structured data |
| identifiers | repeated values |
| simple codes | long text |
| flags | mixed/complex content |
| compact reference | data needing child structure |

But the schema or integration contract has the final say.

JAXB mapping:

@XmlAttribute(name = "currency")
private String currency;

@XmlValue
private BigDecimal value;

for:

<amount currency="IDR">100000.00</amount>

12. Order Matters

In a JSON object, property order is generally not semantically important. In an XML schema, element order can be significant.

XSD sequence:

<xs:sequence>
    <xs:element name="customer"/>
    <xs:element name="amount"/>
    <xs:element name="dueDate"/>
</xs:sequence>

Valid:

<invoice>
    <customer>...</customer>
    <amount>...</amount>
    <dueDate>2026-06-29</dueDate>
</invoice>

Invalid if order differs:

<invoice>
    <amount>...</amount>
    <customer>...</customer>
    <dueDate>2026-06-29</dueDate>
</invoice>

JAXB ordering:

@XmlType(propOrder = {"customer", "amount", "dueDate"})
public class InvoiceXml {
}

Order is contract.


13. Text, Whitespace, and Mixed Content

XML:

<message>Hello <b>Ana</b>, welcome.</message>

This is mixed content: text and element children interleaved.

Object mapping becomes harder.

Use cases:

  • documents
  • rich text
  • legal/regulatory text
  • templates
  • XHTML-like payloads

Do not assume every XML document is a simple data object.

If XML is document-like, DOM/XPath/XSLT may be better than JAXB.
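To see why mixed content resists object mapping, it helps to look at the child nodes DOM actually produces for the `<message>` example. A minimal sketch, with a hypothetical class name `MixedContent`:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration.
final class MixedContent {
    /** Lists the root element's direct children in document order. */
    static List<String> describeChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        List<String> parts = new ArrayList<>();
        NodeList children = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                parts.add("text:" + child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                parts.add("element:" + child.getLocalName() + "=" + child.getTextContent());
            }
        }
        return parts;
    }
}
```

For the message above, the children are a text node, a `b` element, and another text node, in that order. A plain DTO with a `String` field and a `b` field would lose both the interleaving and the position of each piece of text.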


14. CDATA and Escaping

CDATA:

<description><![CDATA[Use < and > literally]]></description>

CDATA is a syntax feature. After parsing, content is text.

Escaped equivalent:

<description>Use &lt; and &gt; literally</description>

Most business logic should not care whether input used CDATA or escaped text, unless preserving exact lexical form is required.
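A short check makes this concrete: both forms parse to the same text, and only the DOM node type reveals which syntax the author used. The class name `CdataText` is hypothetical, and the sketch assumes the default, non-coalescing `DocumentBuilderFactory` configuration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration.
final class CdataText {
    private static Element root(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    /** The logical text content, regardless of CDATA or escaping. */
    static String text(String xml) throws Exception {
        return root(xml).getTextContent();
    }

    /** True when the first child was written as a CDATA section. */
    static boolean usedCdata(String xml) throws Exception {
        return root(xml).getFirstChild() instanceof CDATASection;
    }
}
```

Business code would normally call only something like `text(...)`. A check like `usedCdata(...)` matters only when the exact lexical form has to be preserved.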


15. Entity and Security Awareness

XML entities can be dangerous if the parser is not hardened.

Risks:

  • XXE
  • external entity fetch
  • local file disclosure
  • SSRF
  • entity expansion attacks
  • schema import fetching untrusted URLs
  • DTD processing surprises

Security deep dive is Part 021, but mental model starts here:

Never parse untrusted XML with default assumptions. Harden parser configuration.

For now, remember:

  • disable external entity resolution unless required
  • enable secure processing where applicable
  • restrict external schema/DTD access
  • set limits
  • avoid network fetching during parse
  • validate safely
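As a starting point for the checklist above, here is one possible hardened `DocumentBuilderFactory` setup. It assumes the JDK's built-in parser: the `disallow-doctype-decl` feature URI is an Apache Xerces feature that this implementation recognizes, and other parser implementations may reject it. The class name `HardenedParsers` is hypothetical, and this sketch does not replace the fuller treatment promised for Part 021.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; assumes the JDK default parser.
final class HardenedParsers {
    static DocumentBuilderFactory hardenedDomFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Enables the implementation's processing limits (e.g. entity expansion).
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any DOCTYPE outright, which rules out DTD-declared entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        // Empty string = no protocol is allowed for external DTD or schema access.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory;
    }

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return hardenedDomFactory().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }
}
```

With this configuration a document that declares an external entity fails at the DOCTYPE, before any entity is resolved. Ordinary documents without a DOCTYPE parse as before.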

16. XML vs JSON Mapping Differences

| Concern | JSON | XML |
| --- | --- | --- |
| object fields | properties | elements/attributes |
| array/list | array | repeated elements |
| namespace | no native equivalent | central concept |
| order | usually irrelevant | often relevant |
| schema | JSON Schema optional | XSD common in enterprise |
| mixed content | uncommon | native |
| comments/PI | not in JSON data model | possible |
| entity | no | yes |
| streaming | token parser | SAX/StAX |
| object binding | Jackson databind | JAXB/Jakarta XML Binding/Jackson XML |
| canonicalization | less common | important for signatures |

Do not design XML by mechanically translating JSON.


17. XML Contract Shape Examples

17.1 Element-Centric

<customer>
    <id>CUS-001</id>
    <fullName>Ana Maria</fullName>
    <email>ana@example.com</email>
</customer>

Java:

@XmlRootElement(name = "customer")
@XmlAccessorType(XmlAccessType.FIELD)
public class CustomerXml {
    @XmlElement(name = "id")
    private String id;

    @XmlElement(name = "fullName")
    private String fullName;

    @XmlElement(name = "email")
    private String email;
}

17.2 Attribute-Centric

<customer id="CUS-001" status="ACTIVE">
    <fullName>Ana Maria</fullName>
</customer>

Java:

@XmlAttribute(name = "id")
private String id;

@XmlAttribute(name = "status")
private String status;

@XmlElement(name = "fullName")
private String fullName;

17.3 Namespaced

<cust:customer xmlns:cust="https://example.com/customer">
    <cust:id>CUS-001</cust:id>
</cust:customer>

Java:

@XmlRootElement(
    name = "customer",
    namespace = "https://example.com/customer"
)
public class CustomerXml {
    @XmlElement(
        name = "id",
        namespace = "https://example.com/customer"
    )
    private String id;
}

18. Choosing JAXB vs Jackson XML

This series covers both. High-level decision:

| Situation | Prefer |
| --- | --- |
| schema-first enterprise XML | JAXB/Jakarta XML Binding |
| strict XSD integration | JAXB + XSD validation |
| Java object ↔ XML document with annotations | JAXB |
| JSON-like XML for simple API | Jackson XML possible |
| same DTO serialized to JSON and XML | Jackson XML possible with caution |
| namespace-heavy XML | JAXB often clearer |
| mixed content/document XML | DOM/SAX/StAX/XPath; JAXB if model fits |
| large XML import | StAX/SAX + partial binding |
| need exact schema compliance | JAXB/XSD-oriented design |

Jackson XML is convenient, but XML is not JSON. For schema-heavy integrations, JAXB often maps XML concepts more directly.


19. XML Validation Layers with Jakarta Validation

You may have both:

  1. XSD validation
  2. Jakarta Validation

Example:

XSD says:

amount element must be decimal
currency attribute required

Jakarta Validation says:

public class AmountXml {
    @XmlAttribute(name = "currency")
    @NotBlank
    @Pattern(regexp = "[A-Z]{3}")
    private String currency;

    @XmlValue
    @NotNull
    @DecimalMin("0.01")
    private BigDecimal value;
}

Domain says:

currency must be supported for product
amount must not exceed daily limit

Do not collapse all validation into one layer.


20. Large XML Import Pattern

For very large XML:

<cases>
    <case>...</case>
    <case>...</case>
    <case>...</case>
</cases>

Use StAX to stream to each <case>, then JAXB/Jackson XML bind each subtree if needed.

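Conceptual flow: open the stream, advance to the next <case> start element, bind or read only that subtree, process it, discard it, and repeat until the end of <cases>.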

This avoids loading all cases into memory.
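A minimal sketch of this pattern using only the JDK follows. To stay free of extra dependencies it reads each `<case>` subtree by hand instead of handing it to JAXB or Jackson XML, so the extraction step is a stand-in for the binding step described above. The names `CaseStreamImport` and `CaseRecord` are hypothetical, and the namespace is the one from the case study in section 25.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical import loop for illustration.
final class CaseStreamImport {
    static final String NS = "https://example.com/regulatory/case";

    record CaseRecord(String id, String title, String priority) {}

    /** Streams the document and hands each case to the sink; returns the case count. */
    static int importCases(Reader xml, Consumer<CaseRecord> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT && isElement(reader, "case")) {
                    sink.accept(readCase(reader)); // only one case is held at a time
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /** Reads one case subtree; on return the reader sits on the closing case tag. */
    private static CaseRecord readCase(XMLStreamReader reader) throws XMLStreamException {
        String id = reader.getAttributeValue(null, "id");
        String title = null;
        String priority = null;
        int depth = 1; // 1 = directly inside <case>
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (depth == 1 && isElement(reader, "title")) {
                    title = reader.getElementText(); // consumes through the end tag
                } else if (depth == 1 && isElement(reader, "priority")) {
                    priority = reader.getElementText();
                } else {
                    depth++; // some other child; skip its subtree
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        return new CaseRecord(id, title, priority);
    }

    private static boolean isElement(XMLStreamReader reader, String localName) {
        return NS.equals(reader.getNamespaceURI()) && localName.equals(reader.getLocalName());
    }
}
```

In a real import the sink would validate and persist each record, and might collect per-record errors instead of failing the whole file.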


21. Error Design for XML

An error report should include:

  • line/column, if the parser provides them
  • an XPath-like path, if possible
  • the schema validation message
  • a stable error code
  • the rejected element/attribute
  • a sanitized, safe-to-log form of the rejected value
  • a correlation/import id

Example:

{
  "code": "INVALID_XML_SCHEMA",
  "path": "/invoice/amount",
  "line": 12,
  "column": 18,
  "message": "amount must be decimal"
}

For batch imports:

{
  "code": "INVALID_CASE_RECORD",
  "recordIndex": 27,
  "path": "/cases/case[27]/priority",
  "message": "priority is required"
}
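Line and column numbers come almost for free from the JDK: `SAXParseException` carries them, and a custom `ErrorHandler` on the validator can collect every schema error instead of stopping at the first. In the sketch below the class `XmlErrorCollector`, the record `XmlError`, and the code `MALFORMED_XML` are hypothetical. `INVALID_XML_SCHEMA` is the code from the example above. Path computation is not covered here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical collector for illustration.
final class XmlErrorCollector {
    record XmlError(String code, int line, int column, String message) {}

    /** Validates xml against xsd and returns all collected errors (empty when valid). */
    static List<XmlError> validate(String xsd, String xml) throws SAXException, IOException {
        Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)))
            .newValidator();

        List<XmlError> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings are ignored in this sketch.
            }

            @Override
            public void error(SAXParseException e) {
                // Recoverable schema error: record it and keep validating.
                errors.add(toError("INVALID_XML_SCHEMA", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed; the parser cannot continue
            }
        });

        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            errors.add(toError("MALFORMED_XML", e));
        }
        return errors;
    }

    private static XmlError toError(String code, SAXParseException e) {
        return new XmlError(code, e.getLineNumber(), e.getColumnNumber(), e.getMessage());
    }
}
```

The parser message is kept verbatim here. A production mapping would likely translate it into a stable, user-facing message and log the raw text separately.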

22. Testing XML Contracts

22.1 Not Well-Formed

<invoice>
    <amount>100.00</invoice>

Expected: parser error.

22.2 Schema Invalid

<invoice id="INV-001">
    <amount currency="IDR">abc</amount>
</invoice>

Expected: XSD validation error.

22.3 Business Invalid

<invoice id="INV-001">
    <amount currency="IDR">-100.00</amount>
</invoice>

Expected: bean/domain validation error.

22.4 Namespace Fixture

<inv:invoice xmlns:inv="https://example.com/invoice" id="INV-001">
    <inv:amount currency="IDR">100.00</inv:amount>
</inv:invoice>

Expected: parsed correctly by namespace URI, not prefix string.

22.5 Golden Marshal Fixture

If you produce XML, compare output structurally. Raw string comparison can be brittle due to formatting/prefix differences.

For strict integrations, canonicalization or schema validation may be needed.


23. XML Processing Anti-Patterns

23.1 Namespace-Unaware Parsing

Parsing without namespace awareness breaks real XML integrations.

23.2 DOM for Huge Files

Works in test, fails in production.

23.3 Treating Attributes and Elements as Interchangeable

They are contract choices.

23.4 Ignoring Element Order

XSD may require order.

23.5 Disabling Schema Validation Because It Is Annoying

Schema validation catches integration errors early.

23.6 Exposing JAXB Object as Domain Model

XML binding model is boundary model. Map to domain command/value objects.

23.7 Trusting Default Parser Security

Untrusted XML needs hardening.


24. Decision Matrix

| Problem | Best Default |
| --- | --- |
| small config XML | DOM or JAXB |
| large import XML | StAX/SAX |
| strict schema-first integration | JAXB + XSD |
| generate XML for partner | JAXB with schema/golden tests |
| query a few nodes from small doc | DOM + XPath |
| transform XML to another XML | XSLT or streaming transform |
| document-like mixed content | DOM/SAX/StAX depending need |
| XML with many namespaces | JAXB or namespace-aware StAX |
| JSON-like simple XML | Jackson XML possible |
| untrusted XML | hardened parser + limits |

25. Mini Case Study: Regulatory Case XML Import

Input:

<cases xmlns="https://example.com/regulatory/case">
    <case id="CASE-001">
        <title>Suspicious Activity</title>
        <priority>HIGH</priority>
        <reportedAt>2026-06-29T03:00:00Z</reportedAt>
        <party id="PTY-001" role="SUBJECT">
            <name>Ana</name>
        </party>
    </case>
</cases>

Boundary model:

@XmlRootElement(name = "case", namespace = "https://example.com/regulatory/case")
@XmlAccessorType(XmlAccessType.FIELD)
public class CaseXml {
    @XmlAttribute(name = "id")
    private String id;

    @XmlElement(name = "title", namespace = "https://example.com/regulatory/case")
    private String title;

    @XmlElement(name = "priority", namespace = "https://example.com/regulatory/case")
    private String priority;

    @XmlElement(name = "reportedAt", namespace = "https://example.com/regulatory/case")
    private String reportedAt;

    @XmlElement(name = "party", namespace = "https://example.com/regulatory/case")
    private List<PartyXml> parties;
}

Semantic mapping:

public record CreateCaseCommand(
    CaseId caseId,
    String title,
    Priority priority,
    Instant reportedAt,
    List<PartyCommand> parties
) {}

Do not let CaseXml become the domain aggregate. It is a boundary representation.


26. Practice Drill

Given XML:

<payment xmlns="https://example.com/payment" id="PAY-001">
    <amount currency="IDR">100000.00</amount>
    <payer id="CUS-001">
        <name>Ana</name>
    </payer>
    <method type="BANK_TRANSFER">
        <bankCode>014</bankCode>
        <accountNumber>001234567890</accountNumber>
    </method>
</payment>

Tasks:

  1. Identify elements, attributes, text nodes.
  2. Identify namespace URI and local names.
  3. Decide DOM/SAX/StAX/JAXB.
  4. Create JAXB boundary classes.
  5. Define XSD-level validation candidates.
  6. Define Jakarta Validation candidates.
  7. Define domain validation candidates.
  8. Map to domain command.
  9. Create invalid fixtures:
    • missing amount
    • invalid currency
    • wrong namespace
    • method missing bankCode
    • malformed XML
  10. Define production parser hardening checklist.

27. Summary

XML processing requires a different mental model from JSON.

Mental model:

XML is a document model; choose tree, event, stream, or binding based on document shape and operational constraints.

Rules:

  1. XML has elements, attributes, namespaces, text, order, and schema.
  2. DOM loads the whole document tree.
  3. SAX is push event streaming.
  4. StAX is pull event streaming.
  5. JAXB/Jakarta XML Binding maps XML and Java object trees.
  6. XSD validation is not the same as business validation.
  7. Namespace URI matters more than prefix.
  8. Attribute vs element is a contract decision.
  9. Order can matter.
  10. Large XML should use streaming.
  11. Untrusted XML must be hardened.
  12. XML boundary objects should map to domain objects, not become domain objects.

The next part dives deep into Jakarta XML Binding/JAXB: annotations, marshal/unmarshal lifecycle, schema validation, adapters, namespaces, and production patterns.

