XSD Types, Datatypes, and Value Constraints

Learn Java XML In Action - Part 011

XSD types, datatypes, lexical space, value space, facets, constraints, precision, time, identity, nil/default semantics, and modelling rules for production-grade XML contracts.



Goals of This Part

In the previous part, we looked at XSD as a contract design language: element, type, compositor, namespace, occurrence, and the boundary between structure rules and business rules.

This part moves to the next layer: values.

Targets after this part:

  • understand the difference between lexical space, value space, and canonical representation;
  • choose built-in XSD datatypes appropriately;
  • design simpleTypes with facets that are neither too loose nor too brittle;
  • understand restriction, list, and union;
  • understand the effects of nillable, default, fixed, and whitespace normalization;
  • build constraints strong enough for integration without putting business rules in the wrong place;
  • avoid the classic bugs: decimal precision, date/time timezone, XSD regex, enum drift, and optionality ambiguity;
  • prepare a schema that is ready to be validated in Java using javax.xml.validation.

Core mental model:

XSD datatype = a way to turn an XML string into a controlled value

string in XML document
    -> whitespace normalization
    -> lexical validation
    -> datatype parsing
    -> value-space validation
    -> facet validation
    -> typed value for downstream processing

Whitespace normalization runs first, so the lexical check and the pattern facet see the normalized string.

XML is physically text. XSD provides a grammar to answer: "what value does this text represent, under what constraints, and is that value fit to enter the pipeline?"


1. Why Datatype Design Matters

Many production XML bugs do not come from documents that fail to parse. They occur because the document is valid XML, sometimes even XSD-valid, yet the values inside it are ambiguous or wrongly modelled.

Example:

<Order>
  <Amount>100.00</Amount>
  <Currency>IDR</Currency>
  <SubmittedAt>2026-07-02T10:30:00</SubmittedAt>
</Order>

The engineering questions:

| Field | Contract Question |
| --- | --- |
| Amount | Decimal or float? May it be negative? What is the maximum number of digits? How many fractional digits? |
| Currency | Free text or an ISO-like enum? Who manages changes to the currency list? |
| SubmittedAt | Local datetime or instant? Is a timezone required? How is DST interpreted? |
| Missing element | Does absent mean unknown, not applicable, default, or error? |
| Empty string | Is "" a valid value? Is it the same as null? |

XSD can answer some of these questions. But XSD can also mislead when it is applied too aggressively.

The principle:

XSD should reject structurally impossible data.
XSD should not pretend to own volatile business policy.

For example:

  • amount >= 0 can fit in XSD if no domain ever has a negative amount.
  • customerAge >= 18 is probably a business rule, not a schema rule.
  • a status enum fits when status is a protocol-level vocabulary.
  • a countryCode enum can be dangerous when the country list changes and partners are slow to upgrade.

2. The Three Spaces: Lexical, Value, Canonical

XSD datatypes rest on three important concepts.

2.1 Lexical Space

Lexical space is the set of string forms that may appear in XML.

Example for boolean:

<Active>true</Active>
<Active>false</Active>
<Active>1</Active>
<Active>0</Active>

All of these are valid as xs:boolean.

2.2 Value Space

Value space is the set of conceptual values after parsing.

"true" -> true
"1"    -> true
"0"    -> false

Two lexical forms can represent the same value.

2.3 Canonical Representation

Canonical representation is the single lexical form designated as the stable representation of a value.

For boolean, the canonical forms are true and false, not 1 or 0.

In production, canonical representation matters for:

  • golden file testing;
  • digital signature/canonicalization contexts;
  • deterministic audit output;
  • comparison across payloads;
  • cache key;
  • idempotency fingerprint.
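
The three spaces can be sketched in plain Java for xs:boolean. This is an illustrative sketch, not a validator: the class and method names (XsBooleanSpaces, parse, canonical) are made up for this example, and it hand-codes the four lexical forms listed above.

```java
import java.util.Optional;

/** Illustrative only: lexical space -> value space -> canonical form for xs:boolean. */
final class XsBooleanSpaces {

    /** Maps a lexical form to a value; an empty result means "outside the lexical space". */
    static Optional<Boolean> parse(String lexical) {
        // xs:boolean collapses whitespace before the lexical check.
        String collapsed = lexical.replaceAll("[ \\t\\n\\r]+", " ").replaceAll("^ | $", "");
        switch (collapsed) {
            case "true":
            case "1":
                return Optional.of(Boolean.TRUE);
            case "false":
            case "0":
                return Optional.of(Boolean.FALSE);
            default:
                return Optional.empty(); // "TRUE" or "yes" are not xs:boolean lexical forms
        }
    }

    /** Maps a value back to its canonical lexical form. */
    static String canonical(boolean value) {
        return value ? "true" : "false";
    }

    public static void main(String[] args) {
        for (String lexical : new String[] {"true", "1", " 0 ", "TRUE"}) {
            String result = parse(lexical).map(XsBooleanSpaces::canonical).orElse("invalid");
            System.out.println("'" + lexical + "' -> " + result);
        }
    }
}
```

Writing the canonical form, rather than echoing whatever lexical form arrived, is what keeps golden files, fingerprints, and cache keys stable.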

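Mental model: many lexical forms map to one value, and each value maps back to exactly one canonical lexical form.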


3. Built-in Datatype Families

XSD provides many built-in types. Do not memorize them as a list. Understand the families.

xs:anySimpleType
  ├── string-like
  ├── boolean
  ├── decimal/numeric
  ├── floating-point
  ├── date/time
  ├── binary
  ├── URI/QName/NOTATION
  └── identity-related/tokenized types

3.1 String-like Types

| Type | Use Case | Notes |
| --- | --- | --- |
| xs:string | free text | whitespace preserved by default |
| xs:normalizedString | text without raw tab/newline | whitespace replacement |
| xs:token | code/name/value that must be collapsed | whitespace collapsed |
| xs:language | language tag | use with care; validation is not always enough for policy |
| xs:Name, xs:NCName | XML name-like value | fits QName-ish internal metadata, not business names |

Practical rule:

Human text    -> xs:string
Protocol code -> xs:token + pattern/enumeration
Identifier    -> xs:token + length/pattern

Do not use xs:string for every field. That leaves the schema validating only structure, not the value contract.

3.2 Numeric Types

| Type | Use Case | Avoid For |
| --- | --- | --- |
| xs:decimal | money, quantity, precise measurement | floating scientific values |
| xs:integer | count, version, sequence number | values with a fractional part |
| xs:long / xs:int | numeric bounded by implementation/API | long-lived public contracts if the range may grow |
| xs:float / xs:double | scientific approximate value | money, legal amount, financial ledger |

For enterprise payloads:

Money -> xs:decimal + totalDigits + fractionDigits + minInclusive if needed
Count -> xs:nonNegativeInteger or restricted xs:integer
Rate  -> xs:decimal, not double, unless approximation is intended

3.3 Date and Time Types

| Type | Meaning | Common Risk |
| --- | --- | --- |
| xs:date | calendar date | timezone ambiguity |
| xs:time | time of day | useless without date/timezone in many domains |
| xs:dateTime | date + time, optional timezone in XSD lexical form | local-vs-instant ambiguity |
| xs:gYear, xs:gYearMonth | partial calendar values | often overused for reporting period |
| xs:duration | duration | month/year durations are context-dependent |

Important design rule:

If the value represents an event instant, require timezone in lexical policy.
If the value represents a local business date, use xs:date and document timezone/calendar semantics externally.

XSD 1.0 xs:dateTime allows values with and without timezone. That means this may validate:

<SubmittedAt>2026-07-02T10:30:00</SubmittedAt>

But the business meaning is ambiguous. Is it Jakarta time? UTC? Partner local time?

In Java, convert deliberately:

OffsetDateTime instantLike = OffsetDateTime.parse("2026-07-02T10:30:00+07:00");
LocalDateTime localOnly = LocalDateTime.parse("2026-07-02T10:30:00");

Do not silently parse local datetime as system default timezone in server code.
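
One way to make that rule executable is to accept only lexical forms that carry an explicit offset. A minimal sketch, assuming a hypothetical helper named SubmittedAtParser; note that java.time does not accept every xs:dateTime lexical form, so this is stricter than the schema type.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical helper: accepts only timestamps with an explicit offset or Z. */
final class SubmittedAtParser {

    static Optional<Instant> parseInstant(String lexical) {
        try {
            // OffsetDateTime.parse uses ISO_OFFSET_DATE_TIME, which requires an offset,
            // so a local datetime is rejected instead of being read in the server zone.
            return Optional.of(OffsetDateTime.parse(lexical).toInstant());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseInstant("2026-07-02T10:30:00+07:00")); // offset present
        System.out.println(parseInstant("2026-07-02T10:30:00"));       // no offset: rejected
    }
}
```

The caller then decides what an empty result means (reject, or apply a documented partner timezone), rather than letting the JVM default decide.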

3.4 Binary Types

| Type | Use Case |
| --- | --- |
| xs:base64Binary | embedded binary data |
| xs:hexBinary | digest/hash-like values |

Production rule:

Prefer external object storage for large binary.
Allow base64 in XML only for bounded, contractually small payloads.

For regulatory or partner payloads, embedded base64 might be required. If so, add:

  • maximum size at transport boundary;
  • checksum field;
  • content type field;
  • malware scanning pipeline;
  • audit redaction rule.

3.5 URI and QName Types

| Type | Use Case | Risk |
| --- | --- | --- |
| xs:anyURI | URI-like value | validation is permissive; do not treat as security-safe URL |
| xs:QName | namespace-qualified name | prefix binding depends on in-scope namespace context |

Never treat xs:anyURI validation as SSRF protection. It only says the lexical form is URI-like enough for schema semantics. Network access policy is separate.


4. Simple Type Design

A simpleType defines value constraints for text-only values.

Example:

<xs:simpleType name="CurrencyCodeType">
  <xs:restriction base="xs:token">
    <xs:pattern value="[A-Z]{3}"/>
  </xs:restriction>
</xs:simpleType>

This says:

The value must be a whitespace-collapsed token matching exactly three uppercase letters.

It does not say the code is officially active, accepted for settlement, or enabled for a tenant. That belongs to business policy.
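
The behaviour of this type can be checked with javax.xml.validation. The sketch below wraps CurrencyCodeType in a hypothetical no-namespace schema with a single Currency element; the class name CurrencyCodeCheck and the wrapper schema are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class CurrencyCodeCheck {

    // Hypothetical wrapper schema: one no-namespace element of CurrencyCodeType.
    private static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:simpleType name='CurrencyCodeType'>"
        + "    <xs:restriction base='xs:token'>"
        + "      <xs:pattern value='[A-Z]{3}'/>"
        + "    </xs:restriction>"
        + "  </xs:simpleType>"
        + "  <xs:element name='Currency' type='CurrencyCodeType'/>"
        + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Demo only: the value is concatenated without XML escaping. */
    static boolean isValid(String value) {
        String xml = "<Currency>" + value + "</Currency>";
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"IDR", "idr", "USDT", " IDR "}) {
            System.out.println("'" + value + "' valid=" + isValid(value));
        }
    }
}
```

The padded value passes because xs:token collapses whitespace before the pattern is applied, which is exactly the "whitespace-collapsed token" wording above.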

4.1 Named vs Anonymous Simple Types

Anonymous type:

<xs:element name="Currency">
  <xs:simpleType>
    <xs:restriction base="xs:token">
      <xs:pattern value="[A-Z]{3}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Named type:

<xs:simpleType name="CurrencyCodeType">
  <xs:restriction base="xs:token">
    <xs:pattern value="[A-Z]{3}"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="Currency" type="tns:CurrencyCodeType"/>

Production rule:

Use named simple types for reusable domain vocabulary.
Use anonymous simple types only for local, one-off constraints.

Named types improve reuse, documentation, diff review, generated model naming, contract governance, and test fixture organization.


5. Facets: Constraint Building Blocks

Facets refine datatype values.

| Facet | Meaning | Typical Use |
| --- | --- | --- |
| length | exact length | fixed-width code |
| minLength | minimum string/list length | required non-empty text |
| maxLength | maximum string/list length | payload limit, DB column alignment |
| pattern | regex constraint | code format |
| enumeration | allowed vocabulary | status, type, category |
| whiteSpace | preserve/replace/collapse | token normalization |
| minInclusive | value >= bound | amount/count/date lower bound |
| maxInclusive | value <= bound | bounded range |
| minExclusive | value > bound | positive value |
| maxExclusive | value < bound | threshold |
| totalDigits | max total decimal digits | money precision |
| fractionDigits | max fractional digits | cents/scale |

5.1 Non-empty String

Bad:

<xs:element name="CustomerName" type="xs:string"/>

This accepts empty string.

Better for machine-consumed fields:

<xs:simpleType name="NonBlankToken100Type">
  <xs:restriction base="xs:token">
    <xs:minLength value="1"/>
    <xs:maxLength value="100"/>
  </xs:restriction>
</xs:simpleType>

xs:token collapses whitespace, so whitespace-only input becomes empty and fails minLength=1.

Practical rule:

Human prose -> xs:string with maxLength; handle blank semantics in app if needed.
Protocol value -> xs:token + minLength/maxLength/pattern.

5.2 Money Amount

Bad:

<xs:element name="Amount" type="xs:double"/>

Problems:

  • floating-point approximation;
  • possible scientific notation surprises;
  • downstream rounding issues;
  • mismatch with financial storage.

Better:

<xs:simpleType name="MoneyAmountType">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="18"/>
    <xs:fractionDigits value="2"/>
    <xs:minInclusive value="0.00"/>
  </xs:restriction>
</xs:simpleType>

But be careful: not all money has 2 decimals. Some currencies have 0, 3, or special minor-unit rules. XSD 1.0 cannot express "fraction digits depend on the currency field".

So split constraints:

| Rule | Where |
| --- | --- |
| amount is decimal | XSD |
| max precision is bounded | XSD |
| non-negative if domain invariant | XSD |
| currency-specific minor unit | business validation |
| tenant/product-specific limit | business validation |
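
The currency-specific row could look like this in the business layer. This is a sketch under assumptions: the class MinorUnitRule is hypothetical, and it takes minor units from java.util.Currency, which follows the JDK's ISO 4217 data; a real contract may need its own table if its policy differs.

```java
import java.math.BigDecimal;
import java.util.Currency;

/** Hypothetical business-layer check for the cross-field rule XSD 1.0 cannot express. */
final class MinorUnitRule {

    static boolean fitsMinorUnit(BigDecimal amount, String currencyCode) {
        // Throws IllegalArgumentException for codes the JDK does not know.
        int allowed = Currency.getInstance(currencyCode).getDefaultFractionDigits();
        // Trailing zeros carry no value, so 100.00 and 100 are treated alike.
        int used = Math.max(0, amount.stripTrailingZeros().scale());
        // getDefaultFractionDigits() is -1 for pseudo-currencies; treat those as failing.
        return allowed >= 0 && used <= allowed;
    }

    public static void main(String[] args) {
        System.out.println(fitsMinorUnit(new BigDecimal("100.50"), "USD"));
        System.out.println(fitsMinorUnit(new BigDecimal("100.50"), "JPY"));
        System.out.println(fitsMinorUnit(new BigDecimal("100.00"), "JPY"));
    }
}
```

Whether 100.00 is acceptable for a zero-decimal currency is itself a policy decision; this sketch compares values, not lexical forms.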

5.3 Percentage / Rate

<xs:simpleType name="PercentageType">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="100"/>
    <xs:fractionDigits value="4"/>
  </xs:restriction>
</xs:simpleType>

This is good only if the contract represents percentage as 0..100. Some APIs represent rate as 0..1.

Name type precisely:

Percentage0To100Type
Rate0To1Type
BasisPointType

Ambiguous names cause integration defects.


6. Pattern Facet: XSD Regex Is Not Java Regex

XSD regex is similar to familiar regex but not identical to Java regex.

Common trap:

<xs:pattern value="[A-Z]{3}"/>

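XSD patterns are implicitly anchored: the whole value must match, as with Java's matches(), and ^ and $ are not anchors. XSD regex also has no lookarounds, backreferences, or lazy quantifiers, and \d matches any Unicode decimal digit, while Java's \d matches only ASCII digits by default. Do not copy a Java regex into a schema without checking it.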

For production schema:

  • keep patterns simple;
  • write examples in xs:documentation;
  • test allowed and rejected values;
  • do not rely on lookarounds or backreferences, which XSD regex does not support;
  • remember escaping through XML attributes.
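
Two of these differences can be seen from the Java side alone. The sketch below uses only java.util.regex; it does not run an XSD validator, so the XSD behaviour is stated in comments from the specification's definitions (whole-value matching, \d defined as Unicode category Nd). The class name RegexTraps is made up.

```java
import java.util.regex.Pattern;

/** Illustrative only: where Java regex habits diverge from XSD pattern semantics. */
final class RegexTraps {

    /** Whole-value match: the closest Java analogue of an XSD pattern facet. */
    static boolean javaMatches(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    /** Substring search: common in Java code, but not how XSD patterns behave. */
    static boolean javaFind(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    /** Java \d with Unicode semantics, closer to the XSD definition of \d. */
    static boolean javaUnicodeMatches(String regex, String value) {
        return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(javaMatches("[A-Z]{3}", "USDT")); // whole value must match
        System.out.println(javaFind("[A-Z]{3}", "USDT"));    // finds "USD" inside "USDT"

        String arabicIndicThree = "\u0663"; // a decimal digit outside ASCII
        System.out.println(javaMatches("\\d", arabicIndicThree));        // default Java \d is [0-9]
        System.out.println(javaUnicodeMatches("\\d", arabicIndicThree)); // Unicode-aware \d
    }
}
```

If a code must be ASCII-only, write [0-9] in the schema rather than \d, so the schema and the Java side agree.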

Example:

<xs:simpleType name="PartnerReferenceType">
  <xs:annotation>
    <xs:documentation>
      Partner-assigned reference. Allowed characters: uppercase letters,
      digits, dot, underscore, hyphen. Must start with letter or digit.
      Examples: ORD-2026-0001, CLAIM_9912, ABC.123
    </xs:documentation>
  </xs:annotation>
  <xs:restriction base="xs:token">
    <xs:minLength value="1"/>
    <xs:maxLength value="40"/>
    <xs:pattern value="[A-Z0-9][A-Z0-9._-]*"/>
  </xs:restriction>
</xs:simpleType>

7. Enumeration: Strong Contract or Change Trap?

Enumeration is tempting:

<xs:simpleType name="OrderStatusType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="DRAFT"/>
    <xs:enumeration value="SUBMITTED"/>
    <xs:enumeration value="APPROVED"/>
    <xs:enumeration value="REJECTED"/>
  </xs:restriction>
</xs:simpleType>

Good when:

  • vocabulary is protocol-level;
  • changes require coordinated release;
  • unknown values should be rejected;
  • generated code enum is desirable;
  • audit/reporting relies on exact finite set.

Dangerous when:

  • partner can introduce new values;
  • values come from configurable catalog;
  • values are tenant-specific;
  • unknown values should be tolerated and routed;
  • schema rollout is slower than business change.

Decision matrix:

| Vocabulary Type | XSD Enumeration? | Better Alternative |
| --- | --- | --- |
| protocol status | yes | enum |
| currency code | maybe | pattern + business code list |
| country code | maybe | pattern + reference data validation |
| product SKU | no | token + reference lookup |
| tenant-specific category | no | token + governance outside schema |
| error code owned by service | yes | enum or pattern with registry |

7.1 Forward-Compatible Enumeration Pattern

Sometimes you need known values but must tolerate extension.

Option 1: Use pattern instead of enum.

<xs:simpleType name="ReasonCodeType">
  <xs:restriction base="xs:token">
    <xs:minLength value="1"/>
    <xs:maxLength value="40"/>
    <xs:pattern value="[A-Z][A-Z0-9_]*"/>
  </xs:restriction>
</xs:simpleType>

Then validate known/allowed codes in business layer.
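
A business-layer check for Option 1 might be as small as the sketch below. Everything here is hypothetical: the class ReasonCodePolicy, the two sample codes, and the choice to route unknown codes for review rather than reject them.

```java
import java.util.Set;

/** Hypothetical policy: the schema guarantees the shape, this layer decides what is known. */
final class ReasonCodePolicy {

    enum Decision { ACCEPT, ROUTE_FOR_REVIEW }

    // Sample catalog for illustration; in practice this would come from reference data.
    private static final Set<String> KNOWN_CODES = Set.of("DUPLICATE", "LIMIT_EXCEEDED");

    static Decision decide(String reasonCode) {
        // Unknown but well-formed codes are tolerated and routed, not rejected.
        return KNOWN_CODES.contains(reasonCode) ? Decision.ACCEPT : Decision.ROUTE_FOR_REVIEW;
    }

    public static void main(String[] args) {
        System.out.println(decide("DUPLICATE"));
        System.out.println(decide("PARTNER_CUSTOM_001"));
    }
}
```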

Option 2: Use an extension field:

<Status>REJECTED</Status>
<StatusReasonCode>PARTNER_CUSTOM_001</StatusReasonCode>

Keep state-machine status stable, allow reason code to evolve.


8. Whitespace Semantics

Whitespace matters more than most engineers expect.

XSD whiteSpace facet can be:

| Value | Meaning |
| --- | --- |
| preserve | keep whitespace |
| replace | replace tab/newline/carriage return with spaces |
| collapse | replace, then collapse runs of spaces and trim ends |

String-like types behave differently:

| Type | Whitespace Behavior |
| --- | --- |
| xs:string | preserve |
| xs:normalizedString | replace |
| xs:token | collapse |

Example:

<Code>  ABC   123  </Code>

If Code is xs:token, the value becomes conceptually:

ABC 123

For codes and identifiers, collapsed whitespace is usually desired. For legal text or description, preserving whitespace may matter.
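
The replace and collapse rules are simple enough to mirror in Java, which helps when application code must compare values the same way the validator saw them. A minimal sketch with a made-up class name, XsdWhitespace:

```java
/** Illustrative re-implementation of the XSD whiteSpace facet values replace and collapse. */
final class XsdWhitespace {

    /** replace: each tab, line feed, and carriage return becomes a single space. */
    static String replace(String s) {
        return s.replaceAll("[\\t\\n\\r]", " ");
    }

    /** collapse: replace, then squeeze runs of spaces and drop leading/trailing space. */
    static String collapse(String s) {
        return replace(s).replaceAll(" +", " ").replaceAll("^ | $", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  ABC   123  ") + "]");
        System.out.println("[" + replace("A\tB\nC") + "]");
    }
}
```

Note that String.trim() is not equivalent: it strips other control characters as well, not just the four XML whitespace characters.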

Production rule:

Use xs:token for machine-consumed values.
Use xs:string for human-authored prose.

9. Optional, Empty, Nil, Default, Fixed

This is one of the most important modelling areas.

9.1 Absent Element

<Order>
  <!-- Discount absent -->
</Order>

If schema says:

<xs:element name="Discount" type="xs:decimal" minOccurs="0"/>

Absent means the element is not present. It does not automatically mean zero, null, unknown, or not applicable.

9.2 Empty Element

<Discount/>

For xs:decimal, this is invalid because empty string is not decimal. For xs:string, it is valid as empty string.

9.3 Nillable Element

Schema:

<xs:element name="Discount" type="xs:decimal" nillable="true" minOccurs="0"/>

Instance:

<Discount xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>

This explicitly states nil.

Design question:

| Representation | Meaning Candidate |
| --- | --- |
| absent | not provided / not applicable / unknown |
| empty string | provided as empty text |
| xsi:nil="true" | explicitly nil |
| 0.00 | actual zero value |

Never let these four collapse accidentally.
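
In application code, the four cases stay distinct only if the reader checks for them explicitly. The DOM-based sketch below is one way to do that; the class DiscountPresence and its State enum are hypothetical, and it assumes an unprefixed Discount element as in the examples above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader that keeps absent, nil, empty, and value apart. */
final class DiscountPresence {

    enum State { ABSENT, NIL, EMPTY, VALUE }

    static State classify(String orderXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to read xsi:nil by namespace
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(orderXml)));

            NodeList found = doc.getElementsByTagName("Discount");
            if (found.getLength() == 0) {
                return State.ABSENT;
            }
            Element discount = (Element) found.item(0);
            // getAttributeNS returns "" when the attribute is not present.
            String nil = discount.getAttributeNS(
                    XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "nil");
            if (nil.equals("true") || nil.equals("1")) {
                return State.NIL;
            }
            return discount.getTextContent().isEmpty() ? State.EMPTY : State.VALUE;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse order XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("<Order/>"));
        System.out.println(classify("<Order><Discount/></Order>"));
        System.out.println(classify("<Order><Discount>0.00</Discount></Order>"));
    }
}
```

A mapper that turns all of ABSENT, NIL, and EMPTY into a Java null has already collapsed three of the four meanings.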

9.4 Default Values

<xs:element name="Priority" type="xs:token" default="NORMAL" minOccurs="0"/>

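Default semantics can surprise application code. An element default applies only when the element is present but empty; an absent element gets no value, unlike an attribute default, which applies when the attribute is absent. Some validators expose defaulted values through the post-schema-validation infoset (PSVI); many pipelines do not consume that consistently.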

Production advice:

Avoid XSD default for business defaults unless every processing component is PSVI-aware and tested.
Prefer explicit defaults in application/service layer.

9.5 Fixed Values

<xs:element name="SchemaVersion" type="xs:token" fixed="1.0"/>

Useful for protocol discriminators, but do not overuse. Namespace/versioning often communicates version more cleanly.


10. Lists and Unions

10.1 List Type

A list type is whitespace-separated values in one element.

<xs:simpleType name="TagListType">
  <xs:list itemType="xs:token"/>
</xs:simpleType>

Instance:

<Tags>URGENT MANUAL_REVIEW VIP</Tags>

This is compact, but often less maintainable than repeated elements:

<Tags>
  <Tag>URGENT</Tag>
  <Tag>MANUAL_REVIEW</Tag>
  <Tag>VIP</Tag>
</Tags>

Use list when values are simple, order is not semantically rich, no per-item metadata is needed, compactness matters, or partner standard already uses it.

Use repeated elements when each item may later gain attributes, item-level error reporting matters, item ordering matters, or schema evolution is likely.

10.2 Union Type

Union allows one of several datatypes.

<xs:simpleType name="CustomerIdentifierType">
  <xs:union memberTypes="tns:NationalIdType tns:PassportNumberType tns:InternalCustomerIdType"/>
</xs:simpleType>

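Unions can be convenient but ambiguous: the validator assigns the first member type, in memberTypes order, that accepts the value, so a lexical form valid for several members silently takes the earliest one. Prefer an explicit discriminator when possible: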

<CustomerIdentifier>
  <Type>PASSPORT</Type>
  <Value>A1234567</Value>
</CustomerIdentifier>

Why? Because downstream systems need to know which interpretation was chosen.


11. Identity Constraints: unique, key, keyref

XSD can express some document-local identity constraints.

Example: order line IDs unique within an order.

<xs:element name="Order" type="tns:OrderType">
  <xs:unique name="uniqueLineId">
    <xs:selector xpath="tns:Lines/tns:Line"/>
    <xs:field xpath="tns:LineId"/>
  </xs:unique>
</xs:element>

Identity constraints are useful for duplicate line detection, local references, and preventing inconsistent document-internal links.

They are not a replacement for database constraints or business validation.

Key pitfalls:

  • namespace prefixes inside selector/field matter;
  • XPath subset is limited;
  • error messages may be cryptic;
  • streaming validation plus identity constraints may require buffering by validator;
  • large documents with many keys can have memory implications.

Use identity constraints when document-local consistency is fundamental to the XML contract.
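
The effect of xs:unique can be observed with a small validation run. The sketch below uses a simplified, hypothetical no-namespace variant of the order schema (Line directly under Order, so the selector needs no prefixes); the class name UniqueLineIdCheck is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class UniqueLineIdCheck {

    // Simplified no-namespace schema for illustration only.
    private static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='Order'>"
        + "    <xs:complexType><xs:sequence>"
        + "      <xs:element name='Line' maxOccurs='unbounded'>"
        + "        <xs:complexType><xs:sequence>"
        + "          <xs:element name='LineId' type='xs:token'/>"
        + "        </xs:sequence></xs:complexType>"
        + "      </xs:element>"
        + "    </xs:sequence></xs:complexType>"
        + "    <xs:unique name='uniqueLineId'>"
        + "      <xs:selector xpath='Line'/>"
        + "      <xs:field xpath='LineId'/>"
        + "    </xs:unique>"
        + "  </xs:element>"
        + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String orderXml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(orderXml)));
            return true;
        } catch (SAXException e) {
            return false; // includes identity-constraint violations
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String distinct = "<Order><Line><LineId>1</LineId></Line><Line><LineId>2</LineId></Line></Order>";
        String duplicate = "<Order><Line><LineId>1</LineId></Line><Line><LineId>1</LineId></Line></Order>";
        System.out.println("distinct valid=" + isValid(distinct));
        System.out.println("duplicate valid=" + isValid(duplicate));
    }
}
```

With a target namespace, the selector and field XPaths need the prefix, as in the tns: example above; forgetting it makes the constraint match nothing and pass silently.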


12. XSD 1.0 vs XSD 1.1 Constraints

XSD 1.1 adds stronger constraint capabilities such as assertions (xs:assert). But not every Java validator supports XSD 1.1.

The validator built into Java SE, reached through JAXP with XMLConstants.W3C_XML_SCHEMA_NS_URI, implements XSD 1.0. XSD 1.1 support typically requires a separate processor or library.

Example of rule that XSD 1.0 cannot naturally express:

If Currency = IDR, Amount must have 0 fraction digits.
If Currency = USD, Amount may have 2 fraction digits.

In XSD 1.0, model it as:

  • structural constraints in XSD;
  • cross-field constraints in business validation;
  • contract documentation;
  • executable validation tests.

In XSD 1.1, assertions can express more, but introducing XSD 1.1 changes your toolchain and interoperability assumptions.

Decision rule:

Use XSD 1.0-compatible contracts unless both producer and consumer ecosystems explicitly support XSD 1.1.

13. Java Validation Implications

A typical Java XSD validation flow:

SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(schemaSource);
Validator validator = schema.newValidator();
validator.validate(xmlSource);

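Conceptual lifecycle: SchemaFactory compiles schema sources into a Schema, the Schema creates a Validator, and the Validator checks one source at a time.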

Production rules:

| Object | Reuse? | Notes |
| --- | --- | --- |
| SchemaFactory | configure centrally | not thread-safe; not your validation contract itself |
| Schema | yes | thread-safe, immutable representation of the compiled grammar |
| Validator | no / share with care | not thread-safe; create per validation, or pool only with strict reset() discipline |
| ErrorHandler | per validation | collect contextual errors |
| LSResourceResolver / catalog resolver | central policy | prevent network fetch and pin schema dependencies |
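
Put together, the rules in this table might look like the sketch below: compile once, create a Validator per call, and collect errors instead of stopping at the first. The class name OrderValidation is hypothetical, and the two ACCESS_EXTERNAL_* properties are assumed to be supported, which holds for the JDK's built-in implementation but may not for other SchemaFactory providers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class OrderValidation {

    /** Compile once and share: a Schema is thread-safe. */
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Block external DTD and schema fetches during compilation.
            factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Returns all validation errors; an empty list means the document is valid. */
    static List<String> validate(Schema schema, String xml) {
        List<String> errors = new ArrayList<>();
        Validator validator = schema.newValidator(); // not thread-safe: one per validation
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) {
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed: no point continuing
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add("fatal: " + e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        Schema schema = compile(
              "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='Amount' type='xs:decimal'/></xs:schema>");
        System.out.println(validate(schema, "<Amount>1.5</Amount>"));
        System.out.println(validate(schema, "<Amount>abc</Amount>"));
    }
}
```

Because the handler does not rethrow in error(), the validator keeps going and a partner sees every datatype problem in one response.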

A schema datatype design affects Java code generation and parsing:

| XSD Type | Java Mapping Concern |
| --- | --- |
| xs:decimal | BigDecimal, scale handling |
| xs:dateTime | XMLGregorianCalendar, OffsetDateTime, timezone policy |
| xs:integer | BigInteger or bounded numeric |
| enum simple type | generated enum drift across schema versions |
| list type | collection parsing and whitespace normalization |
| nillable element | null semantics vs absent semantics |
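
Two of these concerns are easy to demonstrate with JDK classes. The helper names below (MappingPitfalls, sameAmount, hasTimezone) are made up for the example.

```java
import java.math.BigDecimal;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

/** Illustrative helpers for two mapping concerns: decimal scale and optional timezone. */
final class MappingPitfalls {

    /** Compares amounts by value; BigDecimal.equals would also compare scale. */
    static boolean sameAmount(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    /** True when an xs:dateTime lexical form carries a timezone (offset or Z). */
    static boolean hasTimezone(String xsDateTime) {
        try {
            XMLGregorianCalendar calendar =
                    DatatypeFactory.newInstance().newXMLGregorianCalendar(xsDateTime);
            return calendar.getTimezone() != DatatypeConstants.FIELD_UNDEFINED;
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal("100.00").equals(new BigDecimal("100.0"))); // scale differs
        System.out.println(sameAmount("100.00", "100.0"));
        System.out.println(hasTimezone("2026-07-02T10:30:00"));
        System.out.println(hasTimezone("2026-07-02T10:30:00+07:00"));
    }
}
```

The first pitfall matters for idempotency keys and hash-based collections; the second is the Java-side view of the local-vs-instant ambiguity discussed for xs:dateTime.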

14. Constraint Boundary: Schema vs Business Rule

Assign each rule to the layer that owns it. Examples:

| Rule | Best Layer |
| --- | --- |
| XML has valid namespace and structure | XML parser + XSD |
| Amount is decimal with <= 18 digits | XSD |
| Currency is three uppercase letters | XSD |
| Currency is enabled for this tenant | business validation |
| Order status is one of protocol statuses | XSD enum or business enum |
| Cannot approve order before quote is accepted | workflow/state machine |
| Customer ID exists | business/persistence validation |
| Line IDs are unique inside document | XSD identity or business validation |
| Total amount equals sum of lines | business validation or XSD 1.1 assertion if toolchain supports it |

The mistake is not "putting too much in XSD" or "putting too little in XSD". The mistake is putting a rule in a layer that cannot evolve, test, or explain it properly.


15. Production Type Library Pattern

For enterprise XML systems, define a shared type library.

Example file structure:

schemas/
  common/
    common-types.xsd
    identifiers.xsd
    money.xsd
    temporal.xsd
    audit.xsd
  order/
    order-v1.xsd
  claim/
    claim-v1.xsd

Common simple types:

<xs:simpleType name="CorrelationIdType">
  <xs:restriction base="xs:token">
    <xs:minLength value="1"/>
    <xs:maxLength value="128"/>
    <xs:pattern value="[A-Za-z0-9._:-]+"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="IsoDateType">
  <xs:restriction base="xs:date"/>
</xs:simpleType>

<xs:simpleType name="UtcInstantTextType">
  <xs:restriction base="xs:token">
    <xs:pattern value="[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?Z"/>
  </xs:restriction>
</xs:simpleType>

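Notice: UtcInstantTextType uses token + pattern instead of xs:dateTime because the contract wants only the lexical UTC Z form. This is a design choice. It increases lexical strictness, but the validator no longer treats the value as a dateTime: the pattern checks digit positions only, so an impossible value such as 2026-13-45T25:61:61Z still passes. A middle path is to restrict xs:dateTime itself with a pattern facet, which keeps the typed check and constrains the lexical form.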

Alternative:

<xs:element name="SubmittedAt" type="xs:dateTime"/>

Then enforce timezone requirement in application validation. Choose deliberately.
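
The gap between a lexical pattern and a typed check shows up quickly in Java. The sketch below rewrites the UtcInstantTextType pattern as a Java regex (the expression is simple enough to behave the same way) and compares it with java.time parsing; the class name UtcInstantText is made up.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

/** Illustrative comparison: pattern-only check versus pattern plus typed parsing. */
final class UtcInstantText {

    // Same expression as the UtcInstantTextType pattern, written as a Java regex.
    private static final Pattern LEXICAL = Pattern.compile(
            "[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\\.[0-9]+)?Z");

    /** What the token + pattern type can guarantee on its own. */
    static boolean matchesPattern(String text) {
        return LEXICAL.matcher(text).matches();
    }

    /** Pattern plus a typed parse, as an application-layer check might do. */
    static boolean isRealInstant(String text) {
        if (!matchesPattern(text)) {
            return false;
        }
        try {
            OffsetDateTime.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false; // right shape, impossible date or time
        }
    }

    public static void main(String[] args) {
        String impossible = "2026-13-45T25:61:61Z";
        System.out.println(matchesPattern(impossible));
        System.out.println(isRealInstant(impossible));
        System.out.println(isRealInstant("2026-07-02T03:30:00Z"));
    }
}
```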


16. Naming Rules for XSD Types

Type names are API names. Treat them seriously.

Bad names:

CodeType
AmountType
DateType
StringType
IdType

Better:

CurrencyCodeType
MoneyAmount18_2Type
LocalBusinessDateType
PartnerReferenceType
OrderLineIdType
UtcTimestampType

Naming heuristic:

<TypeDomain><SemanticMeaning><ConstraintHint>Type

Examples:

| Type Name | Why Good |
| --- | --- |
| OrderStatusType | domain vocabulary |
| NonBlankToken100Type | reusable technical constraint |
| CorrelationIdType | operational semantics |
| MoneyAmountType | value category |
| Percentage0To100Type | explicit range semantics |

17. Example: Production-Grade Order Type Constraints

<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="https://example.com/order/v1"
    targetNamespace="https://example.com/order/v1"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">

  <xs:simpleType name="OrderIdType">
    <xs:restriction base="xs:token">
      <xs:minLength value="1"/>
      <xs:maxLength value="64"/>
      <xs:pattern value="ORD-[0-9]{4}-[0-9]{8}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="CurrencyCodeType">
    <xs:restriction base="xs:token">
      <xs:pattern value="[A-Z]{3}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="MoneyAmountType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="18"/>
      <xs:fractionDigits value="2"/>
      <xs:minInclusive value="0.00"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="OrderStatusType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="SUBMITTED"/>
      <xs:enumeration value="ACCEPTED"/>
      <xs:enumeration value="REJECTED"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="OrderId" type="tns:OrderIdType"/>
      <xs:element name="SubmittedAt" type="xs:dateTime"/>
      <xs:element name="Status" type="tns:OrderStatusType"/>
      <xs:element name="Currency" type="tns:CurrencyCodeType"/>
      <xs:element name="TotalAmount" type="tns:MoneyAmountType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Order" type="tns:OrderType"/>
</xs:schema>

Review it critically:

| Field | Good | Remaining Risk |
| --- | --- | --- |
| OrderId | bounded, formatted | format policy may change |
| SubmittedAt | typed dateTime | timezone not required by plain xs:dateTime |
| Status | closed protocol vocabulary | adding status is breaking change |
| Currency | constrained format | not validated against actual currency registry |
| TotalAmount | decimal precision | currency-specific fraction digits not enforced |

This is realistic. Good schemas still leave explicit business validation.


18. Failure Modes

18.1 xs:string Everywhere

Symptom:

<xs:element name="Amount" type="xs:string"/>
<xs:element name="SubmittedAt" type="xs:string"/>
<xs:element name="Status" type="xs:string"/>

Impact:

  • invalid values pass early;
  • downstream parser errors happen later;
  • audit trail says payload passed schema even though semantically impossible;
  • partner receives vague rejection from business layer.

Fix:

Use datatype facets for stable structural/value invariants.

18.2 Enum Overfitting

Symptom:

<xs:enumeration value="PROMO_JULY_2026"/>

Impact:

  • schema changes for campaign config;
  • partner releases blocked;
  • generated code redeployment needed;
  • stale enum rejects valid business values.

Fix:

Use enum only for protocol vocabulary, not operational catalog.

18.3 DateTime Without Timezone Policy

Symptom:

<SubmittedAt>2026-07-02T10:30:00</SubmittedAt>

Impact:

  • environment-dependent interpretation;
  • replay differs by region;
  • audit chronology becomes disputed;
  • SLA calculations drift.

Fix:

Require offset/UTC lexically or enforce it in semantic validator.

18.4 Decimal Scale Assumption

Symptom:

<Amount>100.123456</Amount>

The schema declares no fractionDigits facet.

Impact:

  • DB write fails;
  • rounding happens silently;
  • reconciliation mismatch.

Fix:

Bound precision explicitly and document rounding rules outside schema.

18.5 nillable Everywhere

Symptom:

<xs:element name="CustomerId" type="tns:CustomerIdType" nillable="true" minOccurs="0"/>

Impact:

  • absent, nil, empty, invalid, and unknown become semantically confused;
  • generated code nullable everywhere;
  • validation loses useful signal.

Fix:

Use nillable only when explicit nil has contract meaning.

19. Validation Test Matrix for Datatypes

Every important simple type should have tests.

Example for MoneyAmountType:

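| Case | Value | Expected |
| --- | --- | --- |
| zero | 0.00 | valid |
| normal | 12345.67 | valid |
| too many fractional digits | 12.345 | invalid |
| negative | -1.00 | invalid if minInclusive=0 |
| too many total digits | 1234567890123456789.00 | invalid |
| non-numeric | ABC | invalid |
| empty | (empty string) | invalid |
| whitespace | " 12.00 " | valid: xs:decimal collapses surrounding whitespace; verify in your validator |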

For CurrencyCodeType:

| Case | Value | Expected |
| --- | --- | --- |
| uppercase 3 chars | IDR | valid |
| lowercase | idr | invalid |
| too long | USDT | invalid |
| digit | US1 | invalid |
| whitespace padded | " IDR " | valid: xs:token collapses whitespace before the pattern is checked |

Testing should assert both validator result and diagnostic quality.
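
The MoneyAmountType matrix translates directly into a table-driven check. The sketch below is one possible shape, not a test framework: the class MoneyAmountMatrix and the no-namespace wrapper element Amount are assumptions made for the example, and it asserts only the valid/invalid result, not diagnostic quality.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class MoneyAmountMatrix {

    // MoneyAmountType as defined earlier, wrapped in a hypothetical Amount element.
    private static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:simpleType name='MoneyAmountType'>"
        + "    <xs:restriction base='xs:decimal'>"
        + "      <xs:totalDigits value='18'/>"
        + "      <xs:fractionDigits value='2'/>"
        + "      <xs:minInclusive value='0.00'/>"
        + "    </xs:restriction>"
        + "  </xs:simpleType>"
        + "  <xs:element name='Amount' type='MoneyAmountType'/>"
        + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Demo only: the value is concatenated without XML escaping. */
    static boolean isValid(String amountText) {
        String xml = "<Amount>" + amountText + "</Amount>";
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> cases = new LinkedHashMap<>();
        cases.put("0.00", true);
        cases.put("12345.67", true);
        cases.put("12.345", false);
        cases.put("-1.00", false);
        cases.put("1234567890123456789.00", false);
        cases.put("ABC", false);
        cases.put("", false);
        cases.put(" 12.00 ", true);

        cases.forEach((value, expected) -> {
            boolean actual = isValid(value);
            String verdict = actual == expected ? "ok  " : "FAIL";
            System.out.println(verdict + " '" + value + "' valid=" + actual);
        });
    }
}
```

To cover diagnostic quality as well, the same loop can capture the SAXException message and assert that it names the element and the violated facet.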


20. Kaufman Practice Drill

Spend 60–90 minutes building type judgement.

Drill 1 — Classify Fields

Given fields:

OrderId, CustomerName, Currency, Amount, SubmittedAt, Status,
ReasonCode, AttachmentDigest, CountryCode, ProductSku, Quantity,
DiscountPercentage, BillingPeriod, TenantId

For each field decide:

  1. built-in base type;
  2. facets;
  3. enum or pattern;
  4. XSD rule vs business rule;
  5. absent/empty/nil policy.

Drill 2 — Break Your Own Schema

Create invalid payloads:

  • wrong namespace;
  • empty required value;
  • whitespace-only token;
  • decimal overflow;
  • invalid date/time;
  • enum typo;
  • unknown status;
  • nil element;
  • missing optional field.

Validate them and inspect error messages.

Drill 3 — Versioning Thought Experiment

Ask:

If this type changes next year, will it break generated code?
If a partner sends a new value, should we reject, route, or tolerate?
If this value is used for audit/legal proof, is its lexical form deterministic?

21. Production Checklist

Before approving an XSD datatype design, check:

  • machine values use xs:token, not raw xs:string;
  • human text has max length;
  • money uses xs:decimal, not float/double;
  • precision is bounded with totalDigits/fractionDigits where needed;
  • date/time fields have explicit timezone policy;
  • enums are reserved for stable protocol vocabulary;
  • volatile catalogs are not hardcoded as schema enums;
  • optional, nil, empty, default, and zero are semantically distinct;
  • nillable is not used casually;
  • patterns are simple, tested, and documented;
  • identity constraints are used only where document-local consistency belongs in schema;
  • Java mapping impact is reviewed;
  • validation fixtures include positive and negative examples;
  • error messages are useful enough for support/partner debugging.

22. Key Takeaways

  • XML text becomes reliable data only after datatype and facet validation.
  • xs:string is often too weak for machine-consumed protocol values.
  • xs:token is a good default base for codes, identifiers, and status-like values.
  • Money should normally be xs:decimal with explicit precision constraints.
  • Date/time modelling requires explicit timezone semantics beyond just choosing xs:dateTime.
  • Enumeration is powerful but can become a compatibility trap.
  • nillable, absent, empty, default, and fixed values have different meanings.
  • XSD constraints should capture stable invariants, not volatile business policy.
  • Good datatype design reduces downstream parsing errors, improves diagnostics, and strengthens audit defensibility.
