
SAX Event-Driven Processing

Learn Java XML In Action - Part 006

SAX event-driven XML processing: ContentHandler, DefaultHandler, secure parser configuration, state-machine extraction, streaming validation, error handling, large-file strategy, and production-grade parser patterns.


Part 006 — SAX Event-Driven Processing

Goal of This Part

This part covers SAX as an event-driven approach to processing XML in Java.

Where DOM asks:

“What if we turn the entire XML document into a tree and read it whenever we like?”

SAX asks:

“What if the parser makes a single pass from start to finish and calls a handler each time it encounters an event?”

SAX is an old API, but it remains relevant in production because:

  • its memory footprint can be low;
  • it suits large files;
  • it suits extracting a subset of fields;
  • it suits batch pipelines and regulatory feeds;
  • it suits streaming validation;
  • it does not force a full object graph;
  • it can be very fast when the state machine is tidy.

However, SAX also comes with significant trade-offs:

  • no random access;
  • logic easily degrades into a spaghetti handler;
  • state must be managed manually;
  • the text callback can fire several times for a single logical text value;
  • error handling and the context stack must be designed explicitly.

Target for this part: you can build a SAX parser that is secure, deterministic, observable, and maintainable.


Where This Part Sits in the Kaufman Framework

This part is strong practice for the learn enough to self-correct stage.

SAX forces you to understand XML as a stream of events, not just an object tree. That strengthens several important skills:

  • reading XML structure incrementally;
  • maintaining state with discipline;
  • detecting invalid structure early;
  • reporting errors with line/column;
  • distinguishing parser events, domain events, and business commands;
  • keeping memory behavior predictable.

In Kaufman-style deliberate practice, SAX is a good exercise because feedback is fast: if the state machine is wrong, the parser immediately produces wrong data or a structural error.


Mental Model: SAX Is a Push Event Stream

The SAX parser reads the document and pushes events to the handler.

DOM gives you a tree. SAX gives you events:

startDocument
startPrefixMapping
startElement
characters
endElement
endPrefixMapping
endDocument

You don't ask “fetch element X from the document”. You say “when the parser passes element X in context Y, collect data”.
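To make the event order concrete, here is a minimal sketch that records which callbacks fire for a tiny document. EventTrace is a hypothetical demo class, not part of this series' codebase, and it uses an unhardened factory because the input is a trusted string literal; real input should go through a hardened factory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the SAX callbacks fired for a small document.
final class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    static List<String> trace(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required to receive uri/localName
            EventTrace handler = new EventTrace();
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.events;
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot trace events", ex);
        }
    }

    @Override public void startDocument() { events.add("startDocument"); }

    @Override public void endDocument() { events.add("endDocument"); }

    @Override public void startPrefixMapping(String prefix, String uri) {
        events.add("startPrefixMapping " + uri);
    }

    @Override public void endPrefixMapping(String prefix) { events.add("endPrefixMapping"); }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + localName);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + localName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        // Consecutive character callbacks are recorded once, because the parser may split text.
        if (events.isEmpty() || !events.get(events.size() - 1).equals("characters")) {
            events.add("characters");
        }
    }
}
```

Calling EventTrace.trace("<a xmlns=\"urn:x\"><b>hi</b></a>") shows the prefix mapping events wrapping the element that declares the namespace, and the text arriving between the start and end of b.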


Java APIs Involved

SAX in Java typically uses:

javax.xml.parsers.SAXParserFactory
javax.xml.parsers.SAXParser
org.xml.sax.XMLReader
org.xml.sax.helpers.DefaultHandler
org.xml.sax.ContentHandler
org.xml.sax.Attributes
org.xml.sax.ErrorHandler
org.xml.sax.SAXParseException
org.xml.sax.InputSource

SAXParserFactory configures and creates SAX parsers. SAXParser wraps an XMLReader. During parsing, the parser calls handler methods such as startElement, characters, and endElement.


SAX vs DOM: Operational Differences

| Dimension | DOM | SAX |
|---|---|---|
| Model | Tree in memory | Event stream |
| Access | Random access | Forward-only |
| Memory | High, depends on document size | Low, depends on handler state |
| Mutation | Easy | Not suitable |
| Query | Easy with traversal/XPath | Manual state machine |
| Large payloads | Poor | Suitable |
| Debuggability | Easy to inspect the tree | Needs event/context tracing |
| Code style | Object navigation | Callback/stateful |
| Early errors | After nodes are built or during parse | Very early |

Practical rule:

Use SAX when the payload is large or the required data can be processed forward-only.


Secure SAX Parser Factory

Like DOM, SAX must be hardened.

import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public final class SaxFactories {

    private SaxFactories() {
    }

    public static SAXParserFactory secureNamespaceAwareFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setXIncludeAware(false);
        factory.setValidating(false);

        setFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
        setFeature(factory, "http://apache.org/xml/features/disallow-doctype-decl", true);
        setFeature(factory, "http://xml.org/sax/features/external-general-entities", false);
        setFeature(factory, "http://xml.org/sax/features/external-parameter-entities", false);
        setFeature(factory, "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        return factory;
    }

    public static SAXParser newSecureSaxParser() {
        try {
            return secureNamespaceAwareFactory().newSAXParser();
        } catch (ParserConfigurationException | org.xml.sax.SAXException ex) {
            throw new IllegalStateException("Cannot create secure SAX parser", ex);
        }
    }

    private static void setFeature(SAXParserFactory factory, String feature, boolean value) {
        try {
            factory.setFeature(feature, value);
        } catch (ParserConfigurationException | org.xml.sax.SAXNotRecognizedException |
                 org.xml.sax.SAXNotSupportedException ex) {
            throw new IllegalStateException("Required SAX feature not supported: " + feature, ex);
        }
    }
}

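Once the parser is created, you can also harden the XMLReader:

SAXParser parser = SaxFactories.newSecureSaxParser();
XMLReader reader = parser.getXMLReader();
reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
reader.setErrorHandler(new ThrowingSaxErrorHandler());
// Parse through the reader: SAXParser.parse(input, handler) installs the DefaultHandler
// as entity resolver and error handler, which would replace the two set above.
reader.setContentHandler(handler);
reader.parse(new InputSource(input));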

Error handler:

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public final class ThrowingSaxErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException exception) throws SAXException {
        throw enrich("XML warning", exception);
    }

    @Override
    public void error(SAXParseException exception) throws SAXException {
        throw enrich("XML error", exception);
    }

    @Override
    public void fatalError(SAXParseException exception) throws SAXException {
        throw enrich("XML fatal error", exception);
    }

    private SAXException enrich(String category, SAXParseException ex) {
        return new SAXException("%s at line=%d column=%d: %s".formatted(
                category,
                ex.getLineNumber(),
                ex.getColumnNumber(),
                ex.getMessage()
        ), ex);
    }
}

Minimal SAX Handler

Example XML:

<Order xmlns="urn:example:order:v1">
    <Id>O-100</Id>
</Order>

A simple handler:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class MinimalOrderHandler extends DefaultHandler {
    private static final String NS = "urn:example:order:v1";

    private boolean insideId;
    private final StringBuilder text = new StringBuilder();
    private String orderId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (NS.equals(uri) && "Id".equals(localName)) {
            insideId = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideId) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (NS.equals(uri) && "Id".equals(localName)) {
            orderId = text.toString().trim();
            insideId = false;
        }
    }

    public String orderId() {
        return orderId;
    }
}

Parser usage:

public String parseOrderId(InputStream input) {
    try {
        SAXParser parser = SaxFactories.newSecureSaxParser();
        MinimalOrderHandler handler = new MinimalOrderHandler();
        parser.parse(input, handler);
        return handler.orderId();
    } catch (Exception ex) {
        throw new XmlParsingException("Cannot parse order id", ex);
    }
}

This is enough for a demo, but not for production, because:

  • insideId does not distinguish Order/Id from Line/Id;
  • a duplicate Id is not prevented;
  • a missing Id is not checked;
  • text callback fragmentation is handled, but context tracking is still weak;
  • errors carry no path/context.

Important: characters() Can Be Called Multiple Times

Do not assume that one element's text produces a single callback.

Wrong:

@Override
public void characters(char[] ch, int start, int length) {
    currentText = new String(ch, start, length);
}

Right:

@Override
public void characters(char[] ch, int start, int length) {
    if (capturingText) {
        textBuffer.append(ch, start, length);
    }
}

The parser may split character data into several events because of internal buffers, entity boundaries, CDATA sections, or parser implementation details.

Invariant:

A text value is complete only when the endElement of the target element is reached.
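The invariant can be checked with a small experiment. The sketch below (FragmentationDemo is a hypothetical demo class) counts characters() callbacks for a text value that contains an entity reference and a CDATA section. How many callbacks occur depends on the parser implementation, so the sketch only relies on the assembled value being complete at endElement.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: accumulates text and counts how often characters() fires.
final class FragmentationDemo extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int callbacks;
    private String value;

    static FragmentationDemo parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            FragmentationDemo handler = new FragmentationDemo();
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot parse demo XML", ex);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        callbacks++;                     // may be more than 1 for a single logical value
        text.append(ch, start, length);  // append, never assign
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        value = text.toString();         // only now is the value complete
    }

    String value() { return value; }

    int callbacks() { return callbacks; }
}
```

For the input <Name>Tom &amp; Jerry <![CDATA[<3]]></Name>, value() returns the full text regardless of how the parser chose to split it, while callbacks() reveals the splitting your parser actually performed.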


Context Stack: The Foundation of Maintainable SAX

Boolean flags break down quickly as the structure grows more complex. Use a stack.

import java.util.ArrayDeque;
import java.util.Deque;

public final class ElementPath {
    private final Deque<QNameKey> stack = new ArrayDeque<>();

    public void push(String namespaceUri, String localName) {
        stack.addLast(new QNameKey(namespaceUri, localName));
    }

    public QNameKey pop() {
        return stack.removeLast();
    }

    public QNameKey current() {
        return stack.peekLast();
    }

    public boolean endsWith(QNameKey... suffix) {
        if (suffix.length > stack.size()) {
            return false;
        }
        QNameKey[] values = stack.toArray(QNameKey[]::new);
        int offset = values.length - suffix.length;
        for (int i = 0; i < suffix.length; i++) {
            if (!values[offset + i].equals(suffix[i])) {
                return false;
            }
        }
        return true;
    }

    public String asDebugPath() {
        return stack.stream()
                .map(q -> "{" + q.namespaceUri() + "}" + q.localName())
                .reduce((a, b) -> a + "/" + b)
                .orElse("/");
    }

    public int depth() {
        return stack.size();
    }

    public void clear() {
        stack.clear();
    }

    public void popExpecting(QNameKey expected) throws org.xml.sax.SAXException {
        QNameKey actual = stack.removeLast();
        if (!actual.equals(expected)) {
            throw new org.xml.sax.SAXException("Expected to close " + expected + " but closed " + actual);
        }
    }
}

public record QNameKey(String namespaceUri, String localName) {
}

Usage in a handler:

private final ElementPath path = new ElementPath();

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    path.push(uri, localName);
}

@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    QNameKey ended = path.pop();
    if (!ended.equals(new QNameKey(uri, localName))) {
        throw new SAXException("Parser path mismatch at " + path.asDebugPath());
    }
}

Even though a parser handling well-formed XML should already guarantee matching start/end events, the path stack is still useful for business extraction.


Production SAX Extractor: Order Parser

Target XML:

<Order xmlns="urn:example:order:v1">
    <Id>O-100</Id>
    <CustomerId>C-9</CustomerId>
    <Lines>
        <Line>
            <Sku>SKU-1</Sku>
            <Quantity>2</Quantity>
            <UnitPrice currency="USD">12.50</UnitPrice>
        </Line>
    </Lines>
</Order>

Domain:

public record OrderDocument(
        String orderId,
        String customerId,
        List<OrderLine> lines
) {
}

public record OrderLine(
        String sku,
        int quantity,
        BigDecimal unitPrice,
        String currency
) {
}

Handler:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public final class OrderSaxHandler extends DefaultHandler {
    private static final String NS = "urn:example:order:v1";

    private static final QNameKey ORDER = new QNameKey(NS, "Order");
    private static final QNameKey ORDER_ID = new QNameKey(NS, "Id");
    private static final QNameKey CUSTOMER_ID = new QNameKey(NS, "CustomerId");
    private static final QNameKey LINES = new QNameKey(NS, "Lines");
    private static final QNameKey LINE = new QNameKey(NS, "Line");
    private static final QNameKey SKU = new QNameKey(NS, "Sku");
    private static final QNameKey QUANTITY = new QNameKey(NS, "Quantity");
    private static final QNameKey UNIT_PRICE = new QNameKey(NS, "UnitPrice");

    private final ElementPath path = new ElementPath();
    private final StringBuilder text = new StringBuilder();

    private String orderId;
    private String customerId;
    private final List<OrderLine> lines = new ArrayList<>();

    private String currentSku;
    private Integer currentQuantity;
    private BigDecimal currentUnitPrice;
    private String currentCurrency;

    private boolean capturingText;

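    @Override
    public void startDocument() {
        // Reset every piece of parse state so a reused handler cannot leak data between documents.
        path.clear();
        text.setLength(0);
        capturingText = false;
        orderId = null;
        customerId = null;
        lines.clear();
        startLine();
    }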

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        QNameKey current = new QNameKey(uri, localName);
        path.push(uri, localName);

        if (path.endsWith(ORDER, LINES, LINE)) {
            startLine();
            return;
        }

        if (isTextField(current)) {
            capturingText = true;
            text.setLength(0);
        }

        if (path.endsWith(ORDER, LINES, LINE, UNIT_PRICE)) {
            currentCurrency = requiredAttribute(attributes, "currency");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturingText) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        QNameKey current = new QNameKey(uri, localName);

        if (capturingText && isTextField(current)) {
            applyText(current, text.toString().trim());
            text.setLength(0);
            capturingText = false;
        }

        if (path.endsWith(ORDER, LINES, LINE)) {
            endLine();
        }

        path.popExpecting(current);
    }

    @Override
    public void endDocument() throws SAXException {
        if (orderId == null || orderId.isBlank()) {
            throw new SAXException("Missing required Order/Id");
        }
        if (customerId == null || customerId.isBlank()) {
            throw new SAXException("Missing required Order/CustomerId");
        }
        if (lines.isEmpty()) {
            throw new SAXException("Order must contain at least one Line");
        }
    }

    public OrderDocument result() {
        return new OrderDocument(orderId, customerId, List.copyOf(lines));
    }

    private void startLine() {
        currentSku = null;
        currentQuantity = null;
        currentUnitPrice = null;
        currentCurrency = null;
    }

    private void endLine() throws SAXException {
        if (currentSku == null || currentSku.isBlank()) {
            throw new SAXException("Missing Line/Sku at " + path.asDebugPath());
        }
        if (currentQuantity == null) {
            throw new SAXException("Missing Line/Quantity at " + path.asDebugPath());
        }
        if (currentUnitPrice == null) {
            throw new SAXException("Missing Line/UnitPrice at " + path.asDebugPath());
        }
        if (currentCurrency == null || currentCurrency.isBlank()) {
            throw new SAXException("Missing UnitPrice/@currency at " + path.asDebugPath());
        }

        lines.add(new OrderLine(currentSku, currentQuantity, currentUnitPrice, currentCurrency));
    }

    private boolean isTextField(QNameKey current) {
        return current.equals(ORDER_ID)
                || current.equals(CUSTOMER_ID)
                || current.equals(SKU)
                || current.equals(QUANTITY)
                || current.equals(UNIT_PRICE);
    }

    private void applyText(QNameKey current, String value) throws SAXException {
        if (path.endsWith(ORDER, ORDER_ID)) {
            if (orderId != null) {
                throw new SAXException("Duplicate Order/Id");
            }
            orderId = requireNonBlank(value, "Order/Id");
            return;
        }

        if (path.endsWith(ORDER, CUSTOMER_ID)) {
            if (customerId != null) {
                throw new SAXException("Duplicate Order/CustomerId");
            }
            customerId = requireNonBlank(value, "Order/CustomerId");
            return;
        }

        if (path.endsWith(ORDER, LINES, LINE, SKU)) {
            currentSku = requireNonBlank(value, "Line/Sku");
            return;
        }

        if (path.endsWith(ORDER, LINES, LINE, QUANTITY)) {
            currentQuantity = parseInt(value, "Line/Quantity");
            return;
        }

        if (path.endsWith(ORDER, LINES, LINE, UNIT_PRICE)) {
            currentUnitPrice = parseDecimal(value, "Line/UnitPrice");
        }
    }

    private String requiredAttribute(Attributes attributes, String localName) throws SAXException {
        String value = attributes.getValue("", localName);
        if (value == null || value.isBlank()) {
            throw new SAXException("Missing required attribute '" + localName + "' at " + path.asDebugPath());
        }
        return value;
    }

    private String requireNonBlank(String value, String field) throws SAXException {
        if (value == null || value.isBlank()) {
            throw new SAXException("Missing required text for " + field + " at " + path.asDebugPath());
        }
        return value;
    }

    private int parseInt(String value, String field) throws SAXException {
        try {
            return Integer.parseInt(requireNonBlank(value, field));
        } catch (NumberFormatException ex) {
            throw new SAXException("Invalid integer for " + field + ": " + value, ex);
        }
    }

    private BigDecimal parseDecimal(String value, String field) throws SAXException {
        try {
            return new BigDecimal(requireNonBlank(value, field));
        } catch (NumberFormatException ex) {
            throw new SAXException("Invalid decimal for " + field + ": " + value, ex);
        }
    }
}

Parser service:

public final class OrderSaxParser {

    public OrderDocument parse(InputStream input) {
        try {
            SAXParser parser = SaxFactories.newSecureSaxParser();
            OrderSaxHandler handler = new OrderSaxHandler();
            parser.parse(input, handler);
            return handler.result();
        } catch (Exception ex) {
            throw new XmlParsingException("Cannot parse order XML with SAX", ex);
        }
    }
}

State Machine Design

A SAX handler is really a state machine.

There are two common approaches:

  1. Path-based state: the current meaning is determined from the path suffix.
  2. Enum-based state: the current state is explicit as an enum.

Path-based is more flexible for hierarchical XML. Enum-based is stricter for simple grammars.

Example enum:

public enum OrderParseState {
    START,
    IN_ORDER,
    IN_ORDER_ID,
    IN_CUSTOMER_ID,
    IN_LINES,
    IN_LINE,
    IN_SKU,
    IN_QUANTITY,
    IN_UNIT_PRICE,
    DONE
}

Use an enum if:

  • the structure is very strict;
  • error transitions must be explicit;
  • you want the parser to behave like a finite-state automaton.

Use a path if:

  • the structure is deeply nested;
  • namespace/localName is the source of truth;
  • you need the path for error messages;
  • the XML has extension elements.
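As an illustration of the enum-based approach, the sketch below models explicit transitions for a reduced grammar (Order containing Id and CustomerId). DemoState and OrderStateMachine are hypothetical names, and the sketch compares local names only; a real handler would also compare the namespace URI.

```java
// Hypothetical reduced state set for Order / Id / CustomerId.
enum DemoState { START, IN_ORDER, IN_ORDER_ID, IN_CUSTOMER_ID, DONE }

// Hypothetical demo class: every legal transition is listed, everything else is rejected.
final class OrderStateMachine {
    private DemoState state = DemoState.START;

    DemoState state() { return state; }

    // Call from startElement.
    void onStart(String localName) {
        state = switch (state) {
            case START -> expect(localName, "Order", DemoState.IN_ORDER);
            case IN_ORDER -> switch (localName) {
                case "Id" -> DemoState.IN_ORDER_ID;
                case "CustomerId" -> DemoState.IN_CUSTOMER_ID;
                default -> throw illegal("start", localName);
            };
            default -> throw illegal("start", localName);
        };
    }

    // Call from endElement.
    void onEnd(String localName) {
        state = switch (state) {
            case IN_ORDER_ID -> expect(localName, "Id", DemoState.IN_ORDER);
            case IN_CUSTOMER_ID -> expect(localName, "CustomerId", DemoState.IN_ORDER);
            case IN_ORDER -> expect(localName, "Order", DemoState.DONE);
            default -> throw illegal("end", localName);
        };
    }

    private DemoState expect(String actual, String expected, DemoState next) {
        if (!expected.equals(actual)) {
            throw illegal("element", actual);
        }
        return next;
    }

    private IllegalStateException illegal(String kind, String name) {
        return new IllegalStateException(
                "Unexpected " + kind + " of '" + name + "' in state " + state);
    }
}
```

The benefit of this style is that an element in the wrong place fails at the transition itself, with the offending state in the message, instead of silently producing wrong data later.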

Handling Unknown Elements

In enterprise XML, an unknown element can mean:

  • an allowed extension;
  • a newer schema version;
  • a partner sending an unrecognized field;
  • a typo;
  • a malicious payload;
  • a backward compatibility scenario.

The policy must be explicit.

Strict Mode

Reject unknown element:

private void validateAllowedElement(QNameKey current) throws SAXException {
    if (!ALLOWED_ELEMENTS.contains(current)) {
        throw new SAXException("Unexpected element " + current + " at " + path.asDebugPath());
    }
}

Extension Mode

Allow unknown elements at an extension point:

private boolean insideExtensionPoint() {
    return path.endsWith(ORDER, new QNameKey(NS, "Extensions"));
}

A good policy:

  • strict for commands/messages that change business state;
  • extension-tolerant for metadata/annotation areas that are designed for extension;
  • log/metric unknown extensions so that evolution is visible;
  • never silently ignore unknown elements in core business areas.
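One way to combine both modes is a depth counter: strict everywhere, except inside a designated extension subtree, where unknown elements are recorded instead of rejected. The sketch below assumes an Extensions element as the extension point and a three-element core vocabulary; UnknownElementPolicyHandler and its CORE set are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: strict in the core area, tolerant inside Extensions.
final class UnknownElementPolicyHandler extends DefaultHandler {
    private static final String NS = "urn:example:order:v1";
    private static final Set<String> CORE = Set.of("Order", "Id", "Extensions");

    private final List<String> skippedExtensions = new ArrayList<>();
    private int extensionDepth; // > 0 while inside the Extensions subtree

    static List<String> parse(String xml) {
        UnknownElementPolicyHandler handler = new UnknownElementPolicyHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Rejected XML: " + ex.getMessage(), ex);
        }
        return handler.skippedExtensions;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (extensionDepth > 0) {
            // Extension mode: tolerate, but record so that contract evolution stays visible.
            extensionDepth++;
            skippedExtensions.add("{" + uri + "}" + localName);
            return;
        }
        // Strict mode: anything outside the core vocabulary is a contract violation.
        if (!NS.equals(uri) || !CORE.contains(localName)) {
            throw new SAXException("Unexpected element {" + uri + "}" + localName);
        }
        if ("Extensions".equals(localName)) {
            extensionDepth = 1;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (extensionDepth > 0) {
            extensionDepth--; // reaches 0 again when Extensions itself closes
        }
    }
}
```

In a real pipeline the recorded names would typically feed a metric or a structured log entry rather than a list.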

Streaming Large Files: Process per Record, Don't Collect Everything

For example, a large file:

<Orders xmlns="urn:example:order:v1">
    <Order>...</Order>
    <Order>...</Order>
    <Order>...</Order>
</Orders>

If the handler collects every OrderDocument into a list, memory can still grow. For large files, process records one at a time.

public interface OrderConsumer {
    void accept(OrderDocument order) throws Exception;
}

Handler:

public final class StreamingOrdersSaxHandler extends DefaultHandler {
    private final OrderConsumer consumer;
    private OrderSaxHandler currentOrderHandler;
    private int orderDepth;

    public StreamingOrdersSaxHandler(OrderConsumer consumer) {
        this.consumer = consumer;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (isOrderStart(uri, localName) && currentOrderHandler == null) {
            currentOrderHandler = new OrderSaxHandler();
            currentOrderHandler.startDocument();
            orderDepth = 0;
        }

        if (currentOrderHandler != null) {
            orderDepth++;
            currentOrderHandler.startElement(uri, localName, qName, attributes);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (currentOrderHandler != null) {
            currentOrderHandler.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (currentOrderHandler != null) {
            currentOrderHandler.endElement(uri, localName, qName);
            orderDepth--;

            if (orderDepth == 0) {
                currentOrderHandler.endDocument();
                OrderDocument order = currentOrderHandler.result();
                consume(order);
                currentOrderHandler = null;
            }
        }
    }

    private boolean isOrderStart(String uri, String localName) {
        return "urn:example:order:v1".equals(uri) && "Order".equals(localName);
    }

    private void consume(OrderDocument order) throws SAXException {
        try {
            consumer.accept(order);
        } catch (Exception ex) {
            throw new SAXException("Order consumer failed for orderId=" + order.orderId(), ex);
        }
    }
}

This pattern lets you:

  • parse huge files;
  • validate per record;
  • write per record;
  • batch every N records;
  • stop early on a fatal error;
  • keep memory bounded.

But be careful: side effects during parsing require idempotency and checkpointing.
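The "batch every N records" option can be sketched as a small buffering wrapper that sits between the handler and the downstream sink. BatchingConsumer is a hypothetical helper; it is generic so that it stays self-contained, and in this part's context T would be OrderDocument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers records and hands them to the sink in batches of batchSize.
final class BatchingConsumer<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingConsumer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per parsed record; memory is bounded by batchSize.
    void accept(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; safe to call when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(List.copyOf(buffer));
        buffer.clear();
    }

    // Flushes the final partial batch after parsing finishes.
    @Override
    public void close() {
        flush();
    }
}
```

Wrapping the parse in try-with-resources ensures the last partial batch is flushed. Note that close() also flushes when parsing failed midway, so whether a partial batch should be written after an error is a transaction-boundary decision this sketch leaves open.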


Backpressure and Transaction Boundaries

A SAX parser is push-based. Once parsing starts, the parser calls the handler. If the consumer is slow, the parser blocks inside the handler.

Design options for batch ingestion:

| Pattern | When to Use | Risk |
|---|---|---|
| Consume per record | Low latency, simple | Many small transactions. |
| Batch N records | Better throughput | Partial failure is more complex. |
| Write to staging table | Strong audit and replay | Needs additional storage. |
| Publish to queue | Decouples downstream | Ordering/idempotency needs design. |
| Stop on first error | Strict regulatory input | One bad record halts the file. |
| Collect record errors | Partner batch correction | Requires a complete error report. |

For regulatory or financial batches, it is usually more defensible if ingestion produces:

  • a file-level audit record;
  • a record-level status;
  • a raw payload reference;
  • a validation error report;
  • a replay mechanism.

Locator: Line and Column for Business Errors

SAX provides a Locator via setDocumentLocator.

import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public abstract class LocatingHandler extends DefaultHandler {
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    protected String location() {
        if (locator == null) {
            return "line=?, column=?";
        }
        return "line=" + locator.getLineNumber() + ", column=" + locator.getColumnNumber();
    }
}

Use it for business errors:

throw new SAXException("Missing Line/Sku at " + path.asDebugPath() + " (" + location() + ")");

This matters a great deal when a partner sends a 500 MB file. Without line/column, debugging becomes expensive.
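A runnable sketch of the same idea: the handler below attaches the locator position to a business-level error (a non-numeric Quantity). LocatedQuantityHandler is a hypothetical demo class. The reported position is wherever the parser is when the callback fires, which for endElement is typically just after the end tag.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: reports a business error together with line and column.
final class LocatedQuantityHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private Locator locator;

    // Returns the message of the first error, or null if the document parsed cleanly.
    static String firstError(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new LocatedQuantityHandler());
            return null;
        } catch (SAXException ex) {
            return ex.getMessage();
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot run parser", ex);
        }
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser before startDocument
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (!"Quantity".equals(localName)) {
            return;
        }
        String value = text.toString().trim();
        try {
            Integer.parseInt(value);
        } catch (NumberFormatException ex) {
            throw new SAXException("Invalid Quantity '" + value + "' at " + location(), ex);
        }
    }

    private String location() {
        if (locator == null) {
            return "line=?, column=?"; // a parser is not required to supply a locator
        }
        return "line=" + locator.getLineNumber() + ", column=" + locator.getColumnNumber();
    }
}
```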


Attributes Handling

Attributes provides access by index, qName, or namespace/localName.

For an un-namespaced attribute:

String value = attributes.getValue("", "currency");

For a namespaced attribute:

String href = attributes.getValue("http://www.w3.org/1999/xlink", "href");

Do not rely on the prefix:

// Fragile
attributes.getValue("xlink:href");

Attribute order must not be assumed to be stable.
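The sketch below shows why the namespace-based lookup is the robust one: the sample document binds the XLink namespace to the prefix xl rather than xlink, so the qName lookup misses while the namespace lookup still works. AttributeLookupDemo is a hypothetical demo class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: compares namespace-based and prefix-based attribute lookup.
final class AttributeLookupDemo extends DefaultHandler {
    private static final String XLINK = "http://www.w3.org/1999/xlink";

    private final List<String> names = new ArrayList<>();
    private String href;
    private String currency;
    private String fragileHref;

    static AttributeLookupDemo parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            AttributeLookupDemo handler = new AttributeLookupDemo();
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot parse demo XML", ex);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        href = atts.getValue(XLINK, "href");        // robust: namespace URI + local name
        currency = atts.getValue("", "currency");   // un-namespaced attribute
        fragileHref = atts.getValue("xlink:href");  // fragile: depends on the sender's prefix
        // Index-based access; the iteration order must not be relied upon.
        for (int i = 0; i < atts.getLength(); i++) {
            names.add("{" + atts.getURI(i) + "}" + atts.getLocalName(i));
        }
    }

    String href() { return href; }

    String currency() { return currency; }

    String fragileHref() { return fragileHref; }

    List<String> names() { return names; }
}
```

With the input <Price xmlns:xl="http://www.w3.org/1999/xlink" xl:href="urn:doc:1" currency="USD"/>, href() and currency() return the attribute values, while fragileHref() is null because no attribute has the qName xlink:href.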


Prefix Mapping Events

SAX provides:

@Override
public void startPrefixMapping(String prefix, String uri) {
}

@Override
public void endPrefixMapping(String prefix) {
}

Usually you don't need these for extraction, because startElement already supplies uri and localName when the parser is namespace-aware. However, prefix mappings are useful for:

  • preserving prefixes;
  • custom serializer;
  • canonicalization/debugging;
  • detecting namespace policy violation;
  • logging namespace context.

Example policy:

@Override
public void startPrefixMapping(String prefix, String uri) throws SAXException {
    if ("".equals(prefix) && !ALLOWED_DEFAULT_NAMESPACES.contains(uri)) {
        throw new SAXException("Unexpected default namespace: " + uri);
    }
}

But be careful: a prefix is a syntactic binding, not an element's identity. Do not use the prefix as business identity.


Validation with SAX

SAX input can be validated against a schema during parsing.

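SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
schemaFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
schemaFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
Schema schema = schemaFactory.newSchema(schemaFile);

SAXParserFactory factory = SaxFactories.secureNamespaceAwareFactory();
factory.setSchema(schema);
SAXParser parser = factory.newSAXParser();

// DefaultHandler.error() is a no-op, so validation errors are silently ignored unless
// a throwing ErrorHandler is installed (or the handler overrides error()).
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(handler);
reader.setErrorHandler(new ThrowingSaxErrorHandler());
reader.parse(new InputSource(input));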

Advantages:

  • no DOM is needed;
  • an invalid document fails while streaming;
  • suitable for large batches;
  • error line/column is available.

Trade-offs:

  • the schema must be known before parsing, or selected via a pre-scan/root sniff;
  • error aggregation is more complex;
  • the coupling between validation and extraction must be kept under control.

Production pattern:

root sniff -> choose schema -> streaming validate + extract -> record-level processing

The root sniff can be done with a small SAX/StAX pass that reads only the root element.
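A root sniff with StAX can be sketched as follows. RootSniffer is a hypothetical helper; it reads events only until the first start element and returns its qualified name, which the caller can map to a schema. DTD support and external entities are switched off through the standard XMLInputFactory properties.

```java
import java.io.InputStream;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reads just far enough to identify the root element.
final class RootSniffer {
    private RootSniffer() {
    }

    static QName sniffRoot(InputStream input) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(input);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName(); // namespace URI + local name of the root
                    }
                }
                throw new IllegalArgumentException("No root element found");
            } finally {
                reader.close(); // closes the reader, not the underlying stream
            }
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Cannot sniff root element", ex);
        }
    }
}
```

Because the sniff consumes part of the stream, the full parse needs a fresh or rewound input, for example by reopening the file. The sniff also proves nothing about the rest of the document; it only selects the schema.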


SAX for Early Termination

Sometimes you only need to read the root, a version, or a single field. SAX can stop early by throwing a control exception.

public final class StopParsingException extends SAXException {
}

Handler:

@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    if (path.endsWith(ORDER, ORDER_ID)) {
        orderId = text.toString().trim();
        throw new StopParsingException();
    }
    path.popExpecting(new QNameKey(uri, localName));
}

Parser wrapper:

try {
    parser.parse(input, handler);
} catch (StopParsingException done) {
    return handler.orderId();
}

Use this with care:

  • do not swallow a fatal parse error as a normal stop;
  • distinguish internal control exceptions from XML errors;
  • document that the input has not been fully parsed.

For security validation, early termination does not prove that the whole document is safe/valid. It is suitable only for metadata sniffing.


SAX and Error Aggregation

By default SAX is usually fail-fast. For partner batches, however, you may want to collect errors per record and continue.

Strategy:

  1. The SAX parser reads the large file.
  2. The handler recognizes record boundaries.
  3. For each record, the handler builds a minimal record object or an error object.
  4. If a record is semantically invalid, store the error, reset the record state, and continue with the next record.
  5. If the XML is not well-formed, parsing must stop because the stream cannot be trusted.

An important distinction:

| Error Type | Can Continue? | Reason |
|---|---|---|
| XML not well-formed | No | The parser cannot guarantee the structure that follows. |
| XSD invalid (global) | Usually not | Depends on the validator and recovery. |
| Record semantically invalid | Yes | If the record boundary is still known. |
| Downstream DB transient | Maybe | Needs retry/checkpoint. |
| Unknown required namespace | No | Contract mismatch. |

A SAX handler for error aggregation must be very disciplined so that it does not produce ambiguous partial state.
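The strategy can be sketched with a deliberately small handler that treats each Order under the root as a record, collects a semantic error when Id is missing, and carries on with the next record. AggregatingOrdersHandler is a hypothetical demo class; it matches local names only and keeps results in lists for brevity, whereas a real pipeline would compare namespaces and hand each record to a consumer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: semantic errors are collected per record, parsing continues.
final class AggregatingOrdersHandler extends DefaultHandler {
    private final List<String> orderIds = new ArrayList<>();
    private final List<String> errors = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private int depth;        // element depth; Order records sit at depth 2
    private int recordIndex;  // 1-based index of the current record
    private String currentId;

    static AggregatingOrdersHandler parse(String xml) {
        AggregatingOrdersHandler handler = new AggregatingOrdersHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception ex) {
            // Not well-formed input is not aggregated: the whole file is rejected.
            throw new IllegalArgumentException("XML stream cannot be trusted", ex);
        }
        return handler;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2 && "Order".equals(localName)) {
            recordIndex++;
            currentId = null; // explicit reset at the record boundary
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 3 && "Id".equals(localName)) {
            currentId = text.toString().trim();
        }
        if (depth == 2 && "Order".equals(localName)) {
            if (currentId == null || currentId.isBlank()) {
                errors.add("record " + recordIndex + ": missing Order/Id");
            } else {
                orderIds.add(currentId);
            }
        }
        depth--;
    }

    List<String> orderIds() { return orderIds; }

    List<String> errors() { return errors; }
}
```

The record boundary (depth 2) is what makes continuing safe: state is reset when a record starts and judged only when it ends, so an invalid record cannot leak into the next one.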


Observability for SAX Pipelines

Minimum metrics:

xml.sax.parse.duration
xml.sax.records.total
xml.sax.records.success
xml.sax.records.failed
xml.sax.bytes.read
xml.sax.validation.errors
xml.sax.first_error.line
xml.sax.partner
xml.sax.document_type
xml.sax.schema_version

Structured log example:

{
  "event": "xml_record_failed",
  "stage": "sax_extract",
  "documentId": "file-2026-07-02-001",
  "partnerId": "partner-a",
  "recordIndex": 2819,
  "path": "{urn:example:order:v1}Orders/{urn:example:order:v1}Order/{urn:example:order:v1}Lines/{urn:example:order:v1}Line",
  "line": 91822,
  "column": 17,
  "errorCode": "MISSING_REQUIRED_FIELD",
  "field": "Sku"
}

For large files, logging every successful record can be too expensive. Use metrics and sampling. Log details for errors and summaries for successes.


Performance Notes

SAX is usually fast and memory-efficient, but the handler can make it slow.

Things that commonly kill performance:

  • creating a String for every event when it isn't needed;
  • building the path string on every callback;
  • heavy regex on every element;
  • logging in the hot path;
  • storing all records in a list;
  • parsing BigDecimal too early when it isn't needed;
  • synchronous downstream calls per tiny field;
  • using exceptions for control flow too often, other than a deliberate early stop.

Safe optimizations:

  • capture text only for target fields;
  • reuse the StringBuilder;
  • use QNameKey constants;
  • process per record;
  • batch downstream writes;
  • build the debug path only on error;
  • benchmark with realistic payloads.

SAX Handler Design Rules

Rule 1: A Handler Must Have a Clear Scope

A handler that parses every kind of document becomes hard to test. Better:

OrderFileHandler
  -> OrderRecordHandler
     -> LineHandler? optional decomposition

Rule 2: Don't Mix Parsing and Business Side Effects Too Deeply

Bad:

@Override
public void endElement(...) {
    if (endLine) {
        database.insert(line);
        kafka.send(event);
        billing.charge(...);
    }
}

Better:

handler -> emits domain record -> application service handles transaction/outbox

Rule 3 — State Reset Harus Eksplisit

Setiap start/end record harus membersihkan state.

Rule 4 — Missing dan Duplicate Harus Dicek

Jangan hanya overwrite field jika element muncul dua kali.

Rule 5: The Namespace URI Is the Identity

A prefix must never be the basis of logic.

Rule 6: Stop on Well-Formedness Errors

Do not attempt to recover from an XML stream that is not well-formed, unless you are building a specialized parser rather than using plain SAX.

Rule 7: Handlers Must Not Be Reused Carelessly

Create a new handler per parse, or make sure startDocument resets all state.
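
The second option could look like the sketch below. ResettableOrderCounter and its parse helper are hypothetical names, and the handler tracks only a single counter to keep the reset visible.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** A handler that tolerates reuse because startDocument clears its state (hypothetical example). */
final class ResettableOrderCounter extends DefaultHandler {
    private static final String NS = "urn:example:order:v1";

    private int orderCount;

    @Override
    public void startDocument() {
        // Full reset: nothing from a previous parse may survive.
        orderCount = 0;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (NS.equals(uri) && "Order".equals(localName)) {
            orderCount++;
        }
    }

    /** Parses with this same instance and returns the count for that document only. */
    int parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), this);
            return orderCount;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```

Without the reset in startDocument, the second parse would report the sum of both documents. Note that a reset does not make the handler thread-safe, so concurrent parses still need separate instances.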


SAX Anti-Patterns

Anti-Pattern 1: Boolean Flag Explosion

boolean inOrder;
boolean inLine;
boolean inPrice;
boolean inDiscount;
boolean inTax;

If the flags keep multiplying, switch to a path stack or decompose the handler.
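
As a rough sketch of the path-stack alternative, the handler below keeps one stack of element names instead of one boolean per element. OrderIdPathHandler and the "#foreign" marker are assumptions made for this example; it checks only the direct parent, which is enough to tell Order/Id from Line/Id.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Replaces a set of boolean flags with one stack of element names (hypothetical example). */
final class OrderIdPathHandler extends DefaultHandler {
    private static final String NS = "urn:example:order:v1";
    private static final String FOREIGN = "#foreign"; // marker for elements from other namespaces

    private final Deque<String> path = new ArrayDeque<>(); // innermost element on top
    private final StringBuilder text = new StringBuilder();
    private String orderId;
    private String lineId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.push(NS.equals(uri) ? localName : FOREIGN);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if ("Id".equals(path.peek())) {
            text.append(ch, start, length); // capture only while inside an Id element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String self = path.pop();
        String parent = path.peek(); // null when self is the root element
        if ("Id".equals(self)) {
            if ("Order".equals(parent)) {
                orderId = text.toString().trim();
            } else if ("Line".equals(parent)) {
                lineId = text.toString().trim();
            }
        }
    }

    String orderId() {
        return orderId;
    }

    String lineId() {
        return lineId;
    }

    static OrderIdPathHandler parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            OrderIdPathHandler handler = new OrderIdPathHandler();
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```

Adding a new nested element then means adding one comparison against the stack rather than another flag that must be set and cleared in two callbacks.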

Anti-Pattern 2: Ignoring uri and localName

if ("Id".equals(qName)) { ... }

This breaks as soon as the sender changes the prefix, and it ignores the namespace entirely.
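
A prefix-independent comparison might look like this. NamespaceMatchDemo and its count helper are hypothetical, and the sketch assumes a namespace-aware factory, since uri and localName are only reliably populated in that mode.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Counts Id elements by namespace URI and local name, never by qName (hypothetical example). */
final class NamespaceMatchDemo extends DefaultHandler {
    static final String ORDER_NS = "urn:example:order:v1";

    int matches;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // qName could be "Id", "x:Id", or "ord:Id" depending on the sender's prefix choice.
        if (ORDER_NS.equals(uri) && "Id".equals(localName)) {
            matches++;
        }
    }

    static int count(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for uri and localName to be filled in
            NamespaceMatchDemo handler = new NamespaceMatchDemo();
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.matches;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```

An Id element in a different namespace, or in no namespace at all, is not counted, which is the behavior a qName comparison cannot guarantee.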

Anti-Pattern 3: One Handler for Entire Enterprise Schema

A single handler with thousands of lines is hard to test and prone to state leaks. Split it by record or aggregate.

Anti-Pattern 4: Accumulate Entire File

If you store every record in a list, you lose the benefit of streaming.

Anti-Pattern 5: Characters Assignment Instead of Append

characters() may be called several times for one text value. Always append while capturing.

Anti-Pattern 6: Silent Unknown Element

Ignoring unknown elements without an explicit policy can hide contract drift.
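
A minimal way to make that policy explicit is sketched below. UnknownElementPolicy, PolicyHandler, the KNOWN set, and the UNKNOWN_ELEMENT message are all assumptions for illustration; a real contract would list its own element names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical policy switch: fail fast, or tolerate and count. */
enum UnknownElementPolicy { REJECT, COUNT }

final class PolicyHandler extends DefaultHandler {
    private static final String NS = "urn:example:order:v1";
    private static final Set<String> KNOWN = Set.of("Order", "Id", "CustomerId"); // assumed contract

    private final UnknownElementPolicy policy;
    private int unknownCount;

    PolicyHandler(UnknownElementPolicy policy) {
        this.policy = policy;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (NS.equals(uri) && KNOWN.contains(localName)) {
            return;
        }
        if (policy == UnknownElementPolicy.REJECT) {
            throw new SAXException("UNKNOWN_ELEMENT: {" + uri + "}" + localName);
        }
        unknownCount++; // expose this as a metric so contract drift stays visible
    }

    /** Returns the number of unknown elements, or throws IllegalArgumentException under REJECT. */
    static int countUnknown(String xml, UnknownElementPolicy policy) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            PolicyHandler handler = new PolicyHandler(policy);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.unknownCount;
        } catch (SAXException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```

Either mode is defensible; the point is that the choice is written down in code and shows up in metrics instead of happening by accident.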

Anti-Pattern 7: No Locator

Without line and column numbers, production support for large files becomes very expensive.
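
A handler can keep the Locator it receives and attach it to its own errors, roughly as sketched here. LocatorAwareHandler, the blank-Sku rule, and the firstErrorLine helper are hypothetical; the exact column reported depends on the parser implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Attaches line and column information to a semantic error (hypothetical example). */
final class LocatorAwareHandler extends DefaultHandler {
    private static final String NS = "urn:example:order:v1";

    private final StringBuilder text = new StringBuilder();
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        // Called by parsers that support location information, before startDocument.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (NS.equals(uri) && "Sku".equals(localName) && text.toString().isBlank()) {
            // SAXParseException copies the current line and column from the locator.
            throw new SAXParseException("MISSING_REQUIRED_FIELD: Sku", locator);
        }
    }

    /** Returns the line of the first semantic error, or -1 if the document passes. */
    static int firstErrorLine(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new LocatorAwareHandler());
            return -1;
        } catch (SAXParseException e) {
            return e.getLineNumber();
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```

In endElement the locator points just past the closing tag, so the reported position marks where the problem was detected rather than where the element started.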


Testing SAX Parser

Test Text Fragmentation

You can call the handler directly to verify that the append logic is correct.

@Test
void appendsFragmentedCharacters() throws Exception {
    MinimalOrderHandler handler = new MinimalOrderHandler();

    handler.startDocument();
    handler.startElement("urn:example:order:v1", "Order", "Order", emptyAttributes());
    handler.startElement("urn:example:order:v1", "Id", "Id", emptyAttributes());
    handler.characters("O-".toCharArray(), 0, 2);
    handler.characters("100".toCharArray(), 0, 3);
    handler.endElement("urn:example:order:v1", "Id", "Id");
    handler.endElement("urn:example:order:v1", "Order", "Order");
    handler.endDocument();

    assertEquals("O-100", handler.orderId());
}

Test Prefix Independence

@Test
void prefixDoesNotMatter() {
    String xml = """
            <x:Order xmlns:x="urn:example:order:v1">
                <x:Id>O-100</x:Id>
                <x:CustomerId>C-1</x:CustomerId>
                <x:Lines>
                    <x:Line>
                        <x:Sku>SKU-1</x:Sku>
                        <x:Quantity>1</x:Quantity>
                        <x:UnitPrice currency="USD">10.00</x:UnitPrice>
                    </x:Line>
                </x:Lines>
            </x:Order>
            """;

    OrderDocument order = new OrderSaxParser().parse(toInputStream(xml));
    assertEquals("O-100", order.orderId());
}

Test Nested Same Name

<Order xmlns="urn:example:order:v1">
    <Id>ORDER-1</Id>
    <Lines>
        <Line>
            <Id>LINE-1</Id>
            ...
        </Line>
    </Lines>
</Order>

Make sure the parser does not pick up Line/Id as Order/Id. This exercises the path-based logic.

Test XXE Rejection

@Test
void rejectsDoctype() {
    String xml = """
            <!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
            <foo>&xxe;</foo>
            """;

    assertThrows(XmlParsingException.class, () -> new OrderSaxParser().parse(toInputStream(xml)));
}

Test Large File Does Not Accumulate

Build a synthetic file with 100,000 orders. The consumer only counts records. Verify that memory use stays stable and does not grow with the number of records.

AtomicInteger count = new AtomicInteger();
StreamingOrdersSaxHandler handler = new StreamingOrdersSaxHandler(order -> count.incrementAndGet());
parser.parse(input, handler);
assertEquals(100_000, count.get());

Production Readiness Checklist for SAX

  • Parser factory is centralized.
  • Namespace awareness is enabled.
  • External entities are disabled.
  • External DTDs are disabled.
  • XInclude is disabled.
  • Secure processing is enabled.
  • EntityResolver is default-deny.
  • ErrorHandler throws an enriched error.
  • Locator is used for diagnostics.
  • A new handler is created per parse, or startDocument performs a full reset.
  • characters() appends rather than overwrites.
  • Context path or enum state is explicit.
  • Unknown element policy is explicit.
  • Missing fields are checked.
  • Duplicate fields are checked.
  • Namespace URI and localName are used, not the prefix.
  • Record-level processing does not accumulate the whole file when the file is large.
  • Side effects are separated from the parser where transaction control is needed.
  • Metrics for record count, failure count, and parse duration are available.
  • Tests cover prefix variation, nested same-name elements, fragmented text, XXE, and large files.

Latihan Deliberate Practice

Drill 1 — Event Trace

Build a handler that only prints events:

startDocument
startElement {ns}Order
characters "..."
endElement {ns}Order
endDocument

Parse a pretty-printed XML document and observe the whitespace character events.
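
One possible starting point for this drill is sketched below. EventTraceHandler and its trace helper are hypothetical; the handler collects events in a list instead of printing them directly, and it escapes newlines so that whitespace-only events are easy to spot.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Records every SAX event as one line of text (hypothetical drill scaffold). */
final class EventTraceHandler extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement {" + uri + "}" + localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement {" + uri + "}" + localName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String raw = new String(ch, start, length);
        events.add("characters \"" + raw.replace("\n", "\\n") + "\""); // make newlines visible
    }

    static List<String> trace(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            EventTraceHandler handler = new EventTraceHandler();
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```

Printing the result with `EventTraceHandler.trace(xml).forEach(System.out::println)` shows how indentation between elements arrives as characters events. How the parser splits that text into chunks is implementation-specific, so the trace should be observed rather than assumed.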

Drill 2 — Build Path Stack

Implement your own ElementPath. Add these methods:

  • push
  • popExpecting
  • endsWith
  • asDebugPath
  • depth

Use it to distinguish Order/Id from Line/Id.

Drill 3 — Fragmented Text Simulation

Call characters() manually two or three times for a single field and prove that the result is still correct.

Drill 4 — Large File Streaming

Generate a file with 100,000 <Order> elements. The parser must:

  • count records;
  • not store all records;
  • produce a summary;
  • fail with line and column if a record is invalid.

Drill 5 — Strict vs Extension Mode

Add <Extensions> to your mental schema. An unknown element outside <Extensions> must fail. An unknown element inside <Extensions> may pass, but it must be counted in a metric.

Drill 6 — Record-Level Error Report

Build a batch parser that produces:

public record RecordError(
        int recordIndex,
        int line,
        int column,
        String path,
        String errorCode,
        String message
) {
}

Distinguish fatal structural errors from per-record semantic errors.


Summary

SAX is an important tool for production XML because it provides forward-only processing with low memory use. It suits large files, batch feeds, extraction of a subset of fields, and streaming validation. In exchange, SAX trades the convenience of DOM for the responsibility of state management.

The mental model to internalize:

  • The SAX parser pushes events to the handler.
  • The handler is a state machine.
  • characters() may be called several times for one logical text value.
  • Namespace URI plus local name is the identity, not the prefix.
  • A context stack keeps parsing maintainable.
  • The Locator makes incident debugging realistic.
  • For large files, process record by record; do not accumulate the whole file.
  • The unknown element policy must be explicit.
  • Side effects must be designed with transactions, idempotency, and checkpointing.
  • Parser hardening is just as mandatory as in DOM.

If DOM makes you comfortable reading XML, SAX makes you disciplined about processing XML as a stream. This skill matters for enterprise systems that receive large partner files, regulatory reports, or historical feeds that cannot be loaded fully into memory.


Official References and Further Reading

  • Oracle Java SE 25 API — SAXParserFactory
  • Oracle Java SE 25 API — SAXParser
  • Oracle Java SE 25 API — org.xml.sax
  • SAX Project Quickstart
  • Oracle JAXP Security Guide
  • OWASP XML External Entity Prevention Cheat Sheet
  • W3C XML and Namespaces specifications