Learn Java Core Types Part 010 Bytes Binary Data Buffering


title: Learn Java Core Types, Data Model & Data APIs - Part 010
description: Deep engineering treatment of Java bytes, binary data, charset encoding/decoding, ByteBuffer, heap vs direct buffers, endianness, Base64, hex, and production I/O failure modes.
series: learn-java-core-types
seriesTitle: Learn Java Core Types, Data Model & Data APIs
order: 10
partTitle: Bytes, Binary Data, and Buffering
tags: java, byte, binary, charset, utf-8, bytebuffer, nio, base64, hex, encoding, advanced
date: 2026-06-27

Part 010 — Bytes, Binary Data, and Buffering

Text is not bytes.

This sounds obvious, but many production bugs come from code that acts as if it were not true.

byte[] bytes = input.getBytes();
String again = new String(bytes);

This code depends on the platform default charset. It might work on your machine and corrupt data elsewhere.

Or:

ByteBuffer buffer = ByteBuffer.allocate(8);
buffer.putLong(42L);
socket.write(buffer); // bug: position is at end unless flipped

Or:

int b = bytes[i]; // sign extension surprise when treating byte as unsigned

This part explains how Java models binary data:

byte and byte[];
signed byte vs unsigned byte interpretation;
text encoding and decoding;
charset correctness;
ByteBuffer position/limit/capacity;
heap vs direct buffer;
endianness;
Base64 and hex;
binary protocol failure modes.

1. Kaufman Deconstruction

The core skill for this part:

Design and read Java binary/text boundaries explicitly, without encoding bugs, buffer-state bugs, or signed-byte surprises.

Sub-skills:

Sub-skill	What to master
Byte model	`byte` signed 8-bit, but often interpreted as unsigned octet
Binary container	`byte[]`, `ByteBuffer`, streams, channels
Encoding	`String` <-> `byte[]` via `Charset`
Charset safety	avoid default charset, use `StandardCharsets`
Buffer state	capacity, position, limit, mark, flip, clear, compact
Endianness	byte order for multi-byte numeric values
Base64/hex	binary-to-text representation
Protocol thinking	framing, partial reads, length prefix, validation

20-hour target:

Hours	Practice focus
1-2	signed byte experiments and unsigned conversion
3-5	encode/decode UTF-8, UTF-16, ISO-8859-1 examples
6-8	`CharsetEncoder`/`CharsetDecoder` error handling
9-11	`ByteBuffer` position/limit/flip/compact drills
12-14	endianness and binary integer serialization
15-17	Base64/hex encode/decode utilities
18-20	build a small binary framed message parser

2. Mental Model: Bytes Are Raw, Meaning Comes From Interpretation

A byte sequence has no inherent meaning.

48 65 6C 6C 6F

Possible interpretations:

ASCII/UTF-8 text: Hello
hex-encoded bytes;
part of compressed data;
part of encrypted data;
binary protocol frame;
image data;
integer fields;
serialized object payload.

Meaning comes from a contract between producer and consumer: which format, charset, byte order, and framing apply.

When the contract is implicit, bugs appear.
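To make the point concrete, here is a minimal sketch that reads the same five bytes under three different contracts. The class and method names are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one byte sequence, three interpretations.
final class ByteInterpretations {
    // The five bytes from the example above: 48 65 6C 6C 6F
    static final byte[] RAW = {0x48, 0x65, 0x6C, 0x6C, 0x6F};

    // Contract 1: the bytes are UTF-8 text.
    static String asUtf8Text(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Contract 2: the first four bytes are one big-endian 32-bit integer.
    static int asBigEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    // Contract 3: every byte is an unsigned octet in the range 0..255.
    static int[] asUnsignedOctets(byte[] bytes) {
        int[] octets = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            octets[i] = Byte.toUnsignedInt(bytes[i]);
        }
        return octets;
    }

    public static void main(String[] args) {
        System.out.println(asUtf8Text(RAW));                          // Hello
        System.out.println(Integer.toHexString(asBigEndianInt(RAW))); // 48656c6c
        System.out.println(asUnsignedOctets(RAW)[0]);                 // 72
    }
}
```

None of the three readings is more correct than the others; only the contract decides which one applies.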

3. Java `byte`: Signed Storage, Unsigned Use Cases

Java byte is signed and ranges from -128 to 127.

But many binary protocols define bytes as unsigned octets from 0 to 255.

byte b = (byte) 0xFF;
System.out.println(b); // -1

To interpret as unsigned:

int unsigned = Byte.toUnsignedInt(b);
System.out.println(unsigned); // 255

Or:

int unsigned = b & 0xFF;

Prefer the named API for clarity:

int value = Byte.toUnsignedInt(buffer[index]);

3.1 Sign Extension Pitfall

byte b = (byte) 0xFE;
int x = b;
System.out.println(x); // -2

Widening from byte to int preserves signed value.

If you need the unsigned octet:

int x = b & 0xFF;
System.out.println(x); // 254

3.2 Byte Literals and Casting

byte a = 127;       // ok, constant fits
byte b = (byte)128; // -128 after narrowing
byte c = (byte)255; // -1 after narrowing

Rule:

byte is a signed Java primitive. Treating it as unsigned is an interpretation step, not its native type behavior.
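Sign extension bites hardest when several bytes are combined into one number. The sketch below assembles an unsigned 16-bit value from two bytes, once without masking and once with it. The class and method names are hypothetical.

```java
// Hypothetical demo class: combining two bytes into an unsigned 16-bit value.
final class UnsignedBytes {
    // Buggy: the low byte is sign-extended to int before the OR,
    // so a negative low byte sets all the upper bits of the result.
    static int readUint16Buggy(byte hi, byte lo) {
        return (hi << 8) | lo;
    }

    // Correct: convert each byte to 0..255 before shifting and combining.
    static int readUint16(byte hi, byte lo) {
        return (Byte.toUnsignedInt(hi) << 8) | Byte.toUnsignedInt(lo);
    }

    public static void main(String[] args) {
        byte hi = 0x01;
        byte lo = (byte) 0xFE;
        System.out.println(readUint16Buggy(hi, lo)); // -2
        System.out.println(readUint16(hi, lo));      // 510
    }
}
```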

4. `byte[]`: The Basic Binary Container

byte[] is the simplest binary data container.

byte[] payload = new byte[] { 0x48, 0x65, 0x6C, 0x6C, 0x6F };

It is mutable.

record DocumentHash(byte[] bytes) { }

This record is dangerous because callers can mutate the array after construction.

byte[] raw = {1, 2, 3};
DocumentHash h = new DocumentHash(raw);
raw[0] = 99; // mutates h.bytes() content

Defensive copy:

import java.util.Arrays;

public final class DocumentHash {
    private final byte[] bytes;

    public DocumentHash(byte[] bytes) {
        this.bytes = Arrays.copyOf(bytes, bytes.length);
    }

    public byte[] bytes() {
        return Arrays.copyOf(bytes, bytes.length);
    }
}

For records:

public record BinaryPayload(byte[] bytes) {
    public BinaryPayload {
        bytes = bytes.clone();
    }

    @Override
    public byte[] bytes() {
        return bytes.clone();
    }
}

But remember: the generated record `equals` compares array components by reference, not by content. For binary value objects, a class may be a better fit than a record unless you override `equals`, `hashCode`, and `toString` carefully.

5. Text Encoding: `String` to `byte[]`

A String is text. A byte[] is bytes. Encoding converts text to bytes.

byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

Decoding converts bytes to text.

String text = new String(bytes, StandardCharsets.UTF_8);

Avoid:

text.getBytes();
new String(bytes);

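Because these use the default charset. Before JDK 18 that default came from the operating system locale; from JDK 18 onward it is UTF-8 unless overridden with the `file.encoding` system property, so the result still depends on how the JVM is launched.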

5.1 StandardCharsets

Use StandardCharsets:

import java.nio.charset.StandardCharsets;

byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
String decoded = new String(utf8, StandardCharsets.UTF_8);

Common standard charsets:

Charset	Use case
`UTF_8`	default modern text interchange
`UTF_16`	Java/Unicode interop when explicitly required
`US_ASCII`	strict ASCII protocols
`ISO_8859_1`	legacy single-byte Latin-1 systems

Rule:

For new protocols and storage, prefer UTF-8 unless a contract says otherwise.

6. Encoding Is Not Always Lossless

Some characters cannot be represented in some charsets.

String text = "Ayu 😊";
byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
String decoded = new String(ascii, StandardCharsets.US_ASCII);
System.out.println(decoded); // likely Ayu ?

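`String.getBytes(Charset)` and `new String(bytes, Charset)` never fail on bad input: they silently replace malformed or unmappable sequences with the charset's default replacement (`?` for US-ASCII).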

If you need strict failure, use CharsetEncoder.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.*;

CharsetEncoder encoder = StandardCharsets.US_ASCII
    .newEncoder()
    .onMalformedInput(CodingErrorAction.REPORT)
    .onUnmappableCharacter(CodingErrorAction.REPORT);

try {
    ByteBuffer encoded = encoder.encode(CharBuffer.wrap("Ayu 😊"));
} catch (CharacterCodingException ex) {
    // input cannot be represented as US-ASCII
}

Rule:

Boundary	Error strategy
user-facing import	report invalid encoding clearly
logs/debug output	replacement may be acceptable
legal/audit records	strict preservation and explicit errors
security tokens	bytes, not text; no charset conversion unless specified
protocol payload	strict contract

7. Charset Boundary Diagram

The sender and receiver must agree on charset.

If the sender uses UTF-8 and the receiver assumes ISO-8859-1, every non-ASCII character is decoded into the wrong characters.

This corruption is often called mojibake.
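The mismatch is easy to reproduce. In this sketch (hypothetical class name, with the string written using Unicode escapes so the source file encoding does not matter) the sender encodes with UTF-8 and the receiver decodes with ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: what a charset mismatch does to one accented letter.
final class MojibakeDemo {
    // Sender encodes with UTF-8; receiver wrongly decodes with ISO-8859-1.
    static String sendUtf8ReceiveLatin1(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Both sides agree on UTF-8, so the text survives.
    static String sendUtf8ReceiveUtf8(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9"; // "café"
        // U+00E9 becomes the two bytes C3 A9, which Latin-1 reads as two characters.
        System.out.println(sendUtf8ReceiveLatin1(original).length()); // 5
        System.out.println(sendUtf8ReceiveUtf8(original).length());   // 4
    }
}
```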

8. `ByteBuffer`: State Machine, Not Just a Byte Array

ByteBuffer is central in Java NIO.

It has:

capacity;
position;
limit;
mark;
byte order;
backing storage, optionally.

ByteBuffer buffer = ByteBuffer.allocate(8);

Initial state:

capacity = 8
position = 0
limit = 8

After writing a long:

buffer.putLong(42L);

State:

position = 8
limit = 8

To read what you wrote, call flip():

buffer.flip();
long value = buffer.getLong();

flip() prepares for reading:

limit = old position
position = 0

8.1 Buffer State Diagram
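The state transitions can be traced in code. This is a small sketch with a hypothetical `state` helper that prints position, limit, and capacity after each operation:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class: trace ByteBuffer state through write, flip, read, clear.
final class BufferStateTrace {
    static String state(ByteBuffer b) {
        return "pos=" + b.position() + " lim=" + b.limit() + " cap=" + b.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        System.out.println(state(buffer));    // pos=0 lim=8 cap=8

        buffer.putInt(7);                     // write 4 bytes
        System.out.println(state(buffer));    // pos=4 lim=8 cap=8

        buffer.flip();                        // switch to reading
        System.out.println(state(buffer));    // pos=0 lim=4 cap=8

        buffer.getInt();                      // read 4 bytes
        System.out.println(state(buffer));    // pos=4 lim=4 cap=8

        buffer.clear();                       // ready for writing again
        System.out.println(state(buffer));    // pos=0 lim=8 cap=8
        System.out.println(buffer.getInt(0)); // 7, the old bytes are still present
    }
}
```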

8.2 `clear` Does Not Erase Data

buffer.clear();

This resets position/limit for writing. It does not zero the memory.

If buffer contains secrets, clear() is not secure erasure.

8.3 `compact` Preserves Unread Bytes

compact() is useful after partial reads:

unread bytes move to beginning;
position set after moved bytes;
limit set to capacity;
ready for more writing.

Typical socket pattern:

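ByteBuffer buffer = ByteBuffer.allocate(8192);

// read() returns -1 at end of stream; each pass may add only part of a frame
while (channel.read(buffer) != -1) {
    buffer.flip();                 // switch to reading what was just received

    while (canReadFrame(buffer)) {
        Frame frame = readFrame(buffer);
        process(frame);
    }

    buffer.compact();              // preserve incomplete frame, switch back to writing
}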
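What `compact()` does to the indices is easiest to see with a tiny buffer. The following sketch uses a hypothetical helper that pretends a parser consumed the first few bytes:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class: observe position, limit, and content after compact().
final class CompactDemo {
    // Writes `received` into an 8-byte buffer, consumes `bytesToConsume` bytes,
    // then compacts. The returned buffer is in write mode.
    static ByteBuffer consumeAndCompact(byte[] received, int bytesToConsume) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put(received);            // write mode: position = received.length
        buffer.flip();                   // read mode: position = 0, limit = received.length
        buffer.position(bytesToConsume); // pretend a parser consumed this many bytes
        buffer.compact();                // unread bytes move to index 0
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer b = consumeAndCompact(new byte[] {1, 2, 3, 4, 5, 6}, 4);
        System.out.println(b.position());              // 2, right after the two kept bytes
        System.out.println(b.limit());                 // 8, the capacity
        System.out.println(b.get(0) + " " + b.get(1)); // 5 6
    }
}
```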

9. Relative vs Absolute Buffer Operations

Relative operations use and update position:

buffer.put((byte) 1);
byte b = buffer.get();

Absolute operations use index and do not update position:

buffer.put(0, (byte) 1);
byte b = buffer.get(0);

Rule:

Operation style	Use when
relative	sequential protocol read/write
absolute	inspect/update known offset
duplicate/slice	pass sub-view without copying

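Be careful: slices and duplicates share content with the original buffer. A write through one view is visible through the others; only position, limit, and mark are independent.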
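A short sketch of that sharing, using a hypothetical helper that writes through a duplicate:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class: a duplicate shares bytes but not position/limit.
final class SharedViews {
    // Creates a duplicate of `original` and writes one byte through it.
    static ByteBuffer writeThroughDuplicate(ByteBuffer original, int index, byte value) {
        ByteBuffer view = original.duplicate();
        view.put(index, value); // absolute write into the shared content
        return view;
    }

    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer.allocate(4);
        ByteBuffer view = writeThroughDuplicate(original, 0, (byte) 42);

        System.out.println(original.get(0));     // 42, visible through the original
        view.position(3);
        System.out.println(original.position()); // 0, positions are independent
    }
}
```

When a callee must not modify the data, `asReadOnlyBuffer()` gives a view that rejects writes while still sharing the content.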

10. Heap vs Direct ByteBuffer

ByteBuffer heap = ByteBuffer.allocate(1024);
ByteBuffer direct = ByteBuffer.allocateDirect(1024);

Heap buffer:

backed by Java heap array;
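memory is allocated on the Java heap and reclaimed by the garbage collector like any other object;
exposes its backing array through `array()` when `hasArray()` is true (read-only buffers do not);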
good general default.

Direct buffer:

memory outside normal Java heap;
useful for native I/O interactions;
allocation/deallocation more expensive;
can reduce copying in some I/O scenarios;
not always faster by default.

Decision:

Need	Prefer
ordinary small data processing	heap `byte[]` or heap `ByteBuffer`
NIO channel high-throughput I/O	consider direct buffer
simple encode/decode	`byte[]` often enough
native interop	direct buffer may help
many short-lived buffers	avoid direct allocation churn

Rule:

Do not use direct buffers as a cargo-cult performance optimization. Measure and understand allocation lifetime.

11. Endianness: Byte Order for Multi-Byte Values

A single byte has no endianness. Multi-byte values do.

Example integer 0x01020304:

Big-endian:

01 02 03 04

Little-endian:

04 03 02 01

Java ByteBuffer defaults to big-endian.

ByteBuffer buffer = ByteBuffer.allocate(4);
buffer.putInt(0x01020304);

Explicit little-endian:

ByteBuffer buffer = ByteBuffer.allocate(4)
    .order(ByteOrder.LITTLE_ENDIAN);
buffer.putInt(0x01020304);

Always specify byte order in binary protocols.

record FrameHeader(int version, int length) {
    static FrameHeader read(ByteBuffer buffer) {
        buffer.order(ByteOrder.BIG_ENDIAN);
        int version = Byte.toUnsignedInt(buffer.get());
        int length = buffer.getInt();
        return new FrameHeader(version, length);
    }
}

Better: set order once at buffer creation/boundary and document protocol order.
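To see the effect of byte order on the wire, this sketch encodes the same integer both ways and then decodes with the wrong order. The `EndianBytes` helper is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: encode/decode a 32-bit int with an explicit byte order.
final class EndianBytes {
    static byte[] encodeInt(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    static int decodeInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] big = encodeInt(0x01020304, ByteOrder.BIG_ENDIAN);       // 01 02 03 04
        byte[] little = encodeInt(0x01020304, ByteOrder.LITTLE_ENDIAN); // 04 03 02 01
        System.out.println(big[0] + " " + little[0]);                   // 1 4

        // Decoding with the wrong order does not fail; it yields a different number.
        int wrong = decodeInt(big, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Integer.toHexString(wrong));                 // 4030201
    }
}
```

The silent nature of the failure is the reason the order belongs in the protocol documentation and not only in the code.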

12. Base64: Binary as Text

Base64 encodes binary data into text-safe representation.

Use cases:

JSON payload containing bytes;
email/MIME;
tokens;
HTTP basic credentials format;
embedding binary in text protocols.

Java API:

String encoded = Base64.getEncoder().encodeToString(bytes);
byte[] decoded = Base64.getDecoder().decode(encoded);

URL-safe variant:

String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
byte[] raw = Base64.getUrlDecoder().decode(token);

Important:

Base64 is encoding, not encryption.
It increases size by roughly 33%.
Padding policy must match receiver expectations.
URL-safe Base64 differs from basic Base64.
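The alphabet and padding differences show up with as little as two bytes. A minimal sketch (hypothetical class name) comparing the basic and URL-safe encoders:

```java
import java.util.Base64;

// Hypothetical demo class: basic vs URL-safe Base64 for the same input.
final class Base64Variants {
    static String basic(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static String urlSafeNoPadding(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(basic(bytes));            // +/8=
        System.out.println(urlSafeNoPadding(bytes)); // -_8

        try {
            // The basic decoder does not accept the URL-safe alphabet.
            Base64.getDecoder().decode("-_8");
        } catch (IllegalArgumentException ex) {
            System.out.println("basic decoder rejected URL-safe input");
        }
    }
}
```

Sender and receiver therefore have to agree on the variant as well as on the padding policy.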

12.1 Base64 Is Not a Charset

Do not do this conceptually:

String s = new String(binaryBytes, StandardCharsets.UTF_8); // wrong for arbitrary binary

Do this:

String s = Base64.getEncoder().encodeToString(binaryBytes);

Base64 converts arbitrary bytes to plain ASCII text and back without loss.
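The difference can be checked with a round trip. In this sketch (hypothetical class name) the same bytes travel once through a UTF-8 `String` and once through Base64:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: round-tripping arbitrary bytes through text.
final class BinaryAsTextRoundTrip {
    // Lossy: byte sequences that are not valid UTF-8 are replaced while decoding.
    static byte[] viaUtf8String(byte[] binary) {
        String s = new String(binary, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: Base64 is defined for every possible byte sequence.
    static byte[] viaBase64(byte[] binary) {
        String s = Base64.getEncoder().encodeToString(binary);
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        System.out.println(Arrays.equals(binary, viaUtf8String(binary))); // false
        System.out.println(Arrays.equals(binary, viaBase64(binary)));     // true
    }
}
```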

13. Hex Encoding

Hex is often used for diagnostics, hashes, signatures, binary IDs.

Java 17 introduced HexFormat.

import java.util.HexFormat;

String hex = HexFormat.of().formatHex(bytes);
byte[] parsed = HexFormat.of().parseHex(hex);

Uppercase:

String hex = HexFormat.of().withUpperCase().formatHex(bytes);

With delimiter:

String hex = HexFormat.ofDelimiter(":").formatHex(bytes);

Hex trade-off:

Encoding	Pros	Cons
Hex	readable, stable, easy debug	2x size
Base64	compact text encoding	less readable, padding variants

For hashes in logs, hex is often friendlier.
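The size trade-off is simple arithmetic: hex uses 2 characters per byte, and padded Base64 uses 4 characters per 3-byte group, rounded up. A quick check for a 32-byte value (hypothetical class name; requires Java 17 for `HexFormat`):

```java
import java.util.Base64;
import java.util.HexFormat;

// Hypothetical demo class: text length of hex vs Base64 for the same bytes.
final class TextSizeComparison {
    static int hexLength(byte[] bytes) {
        return HexFormat.of().formatHex(bytes).length();
    }

    static int base64Length(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes).length();
    }

    public static void main(String[] args) {
        byte[] hash = new byte[32];             // same size as a SHA-256 digest
        System.out.println(hexLength(hash));    // 64 = 2 * 32
        System.out.println(base64Length(hash)); // 44 = 4 * ceil(32 / 3)
    }
}
```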

14. Binary Protocol Framing

Network/file reads may be partial.

Never assume one read equals one message.

Bad mental model:

read() -> whole message

Correct mental model:

read() -> some bytes
parser -> zero or more complete frames + maybe incomplete remainder

Length-prefixed frame example:

[4-byte length][payload bytes]

Parser sketch:

static boolean canReadFrame(ByteBuffer buffer) {
    if (buffer.remaining() < Integer.BYTES) {
        return false;
    }

    buffer.mark();
    int length = buffer.getInt();
    buffer.reset();

    if (length < 0 || length > 1_000_000) {
        throw new IllegalArgumentException("Invalid frame length: " + length);
    }

    return buffer.remaining() >= Integer.BYTES + length;
}

static byte[] readFrame(ByteBuffer buffer) {
    int length = buffer.getInt();
    byte[] payload = new byte[length];
    buffer.get(payload);
    return payload;
}

Production concerns:

maximum frame size;
negative length;
integer overflow in length calculations;
partial reads;
buffer compaction;
timeout;
backpressure;
malformed payload metrics;
audit/logging without dumping secrets.
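As a rough illustration of how the parser sketch fits with chunked input, here is a small accumulator that accepts arbitrary chunks and returns whatever frames are complete. `FrameAccumulator`, its constructor parameter, and the buffer sizing are assumptions made for this example; it covers only a few of the production concerns listed above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects chunks and emits complete length-prefixed frames.
final class FrameAccumulator {
    private final int maxPayload;
    private final ByteBuffer buffer; // kept in write mode between feed() calls

    // Assumption: the buffer holds at most one maximum-size frame.
    FrameAccumulator(int maxPayload) {
        this.maxPayload = maxPayload;
        this.buffer = ByteBuffer.allocate(Integer.BYTES + maxPayload);
    }

    // A chunk larger than the free space throws BufferOverflowException.
    // After an invalid length the instance should be discarded.
    List<byte[]> feed(byte[] chunk) {
        buffer.put(chunk);
        buffer.flip();                 // switch to reading
        List<byte[]> frames = new ArrayList<>();
        while (canReadFrame()) {
            frames.add(readFrame());
        }
        buffer.compact();              // keep the incomplete remainder
        return frames;
    }

    private boolean canReadFrame() {
        if (buffer.remaining() < Integer.BYTES) {
            return false;
        }
        int length = buffer.getInt(buffer.position()); // absolute read, position unchanged
        if (length < 0 || length > maxPayload) {
            throw new IllegalArgumentException("Invalid frame length: " + length);
        }
        return buffer.remaining() >= Integer.BYTES + length;
    }

    private byte[] readFrame() {
        byte[] payload = new byte[buffer.getInt()];
        buffer.get(payload);
        return payload;
    }
}
```

Feeding the first three bytes of a six-byte frame returns an empty list; feeding the remaining three returns the payload. Two frames delivered in one chunk come back together.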

15. Byte Array Equality and Hashing

Arrays do not use content equality.

byte[] a = {1, 2, 3};
byte[] b = {1, 2, 3};

System.out.println(a.equals(b)); // false

Use:

Arrays.equals(a, b);
Arrays.hashCode(a);

For nested arrays:

Arrays.deepEquals(...);
Arrays.deepHashCode(...);

For cryptographic comparisons (MACs, digests, tokens), use a constant-time comparison such as `MessageDigest.isEqual` instead of `Arrays.equals`, which may return at the first mismatch.

Do not log secrets or raw tokens.
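A compact sketch of the three comparisons, with hypothetical method names. `MessageDigest.isEqual` is used for the secret case because its documentation states that the running time does not depend on the array contents.

```java
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical demo class: reference, content, and secret-safe comparison.
final class ByteArrayEquality {
    // Content equality for ordinary data.
    static boolean sameContent(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // Comparison for secrets such as MACs or digests.
    static boolean sameSecret(byte[] expected, byte[] actual) {
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(a.equals(b));       // false, reference comparison
        System.out.println(sameContent(a, b)); // true
        System.out.println(sameSecret(a, b));  // true
    }
}
```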

16. Binary Value Object Example

import java.util.Arrays;
import java.util.HexFormat;
import java.util.Objects;

public final class Sha256Hash {
    private static final int LENGTH = 32;
    private static final HexFormat HEX = HexFormat.of();

    private final byte[] bytes;

    public Sha256Hash(byte[] bytes) {
        Objects.requireNonNull(bytes, "bytes");
        if (bytes.length != LENGTH) {
            throw new IllegalArgumentException("SHA-256 hash must be 32 bytes");
        }
        this.bytes = bytes.clone();
    }

    public static Sha256Hash fromHex(String hex) {
        Objects.requireNonNull(hex, "hex");
        return new Sha256Hash(HEX.parseHex(hex));
    }

    public byte[] bytes() {
        return bytes.clone();
    }

    public String toHex() {
        return HEX.formatHex(bytes);
    }

    @Override
    public boolean equals(Object o) {
        return this == o || (o instanceof Sha256Hash other && Arrays.equals(bytes, other.bytes));
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        return toHex();
    }
}

Why class, not record?

Because record-generated equality for byte[] would compare array references. For binary value semantics, explicit implementation is clearer.

17. Text vs Binary Decision Framework

Rules:

Text crossing byte boundary needs a charset.
Arbitrary binary must not be forced into String.
Binary-as-text needs Base64 or hex.
Multi-byte binary numbers need byte order.
Buffer state must be managed explicitly.
Mutable byte arrays need defensive copies.

18. I/O Streams and Bytes

Classic byte streams:

InputStream in;
OutputStream out;

Read loop:

byte[] buffer = new byte[8192];
int n;
while ((n = in.read(buffer)) != -1) {
    out.write(buffer, 0, n);
}

Do not ignore n.

Wrong:

out.write(buffer); // writes entire buffer, including stale bytes

Correct:

out.write(buffer, 0, n);
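The stale-byte bug is reproducible with in-memory streams. In this sketch (hypothetical class and parameter names) five bytes are copied through a four-byte buffer, once ignoring the read count and once honoring it:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: effect of ignoring the count returned by read().
final class CopyLoopDemo {
    static byte[] copy(byte[] source, int bufferSize, boolean honorReadCount) {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[bufferSize];
            int n;
            while ((n = in.read(buffer)) != -1) {
                if (honorReadCount) {
                    out.write(buffer, 0, n); // only the bytes just read
                } else {
                    out.write(buffer);       // bug: also writes leftovers from earlier reads
                }
            }
            return out.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] source = {1, 2, 3, 4, 5};
        System.out.println(copy(source, 4, false).length); // 8, three stale bytes appended
        System.out.println(copy(source, 4, true).length);  // 5
    }
}
```

On Java 9 and later, `InputStream.transferTo(OutputStream)` performs the correct loop for you.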

For text:

try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
    // character stream
}

Boundary distinction:

API	Data level
`InputStream` / `OutputStream`	bytes
`Reader` / `Writer`	characters/text
`InputStreamReader`	byte -> char bridge via charset
`OutputStreamWriter`	char -> byte bridge via charset

19. File Reading/Writing: Explicit Charset

Text file:

String content = Files.readString(path, StandardCharsets.UTF_8);
Files.writeString(path, content, StandardCharsets.UTF_8);

Binary file:

byte[] bytes = Files.readAllBytes(path);
Files.write(path, bytes);

Do not read binary data as string just to pass it around.

Bad:

String image = Files.readString(imagePath); // wrong for binary

Good:

byte[] image = Files.readAllBytes(imagePath);

If binary must go to JSON:

String base64 = Base64.getEncoder().encodeToString(image);

20. Memory and Allocation Concerns

Binary-heavy code often stresses memory.

Common issues:

copying large byte[] repeatedly;
converting bytes to Base64 strings unnecessarily;
keeping full file in memory;
direct buffer allocation churn;
unbounded frame length;
logging giant payloads;
retaining buffer slices longer than expected.

Engineering mitigations:

Issue	Mitigation
large payload	stream instead of load-all
repeated concat	use buffers/streams
unbounded input	enforce max size
debug logs huge	log length/hash/sample, not full payload
mutable ownership	copy at boundary or document ownership
direct buffer churn	pool carefully or allocate long-lived buffers
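For the "stream instead of load-all" mitigation, Base64 can be produced incrementally with `Base64.Encoder.wrap`. A minimal sketch, where the class name, buffer size, and the `encodeToAscii` convenience method are assumptions for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: Base64-encode a stream with a fixed-size buffer.
final class StreamingBase64 {
    static void encode(InputStream in, OutputStream out) throws IOException {
        // Closing the wrapper writes the final padded group and also closes `out`.
        try (OutputStream b64 = Base64.getEncoder().wrap(out)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                b64.write(buffer, 0, n);
            }
        }
    }

    // In-memory convenience wrapper, used here only to demonstrate the result.
    static String encodeToAscii(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            encode(new ByteArrayInputStream(data), out);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

In real use the input would be a file or network stream and the output a response or file stream, so memory use stays bounded by the buffer size.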

21. Binary Data in APIs

Do not expose raw byte[] casually.

Bad:

class Attachment {
    byte[] content;
}

Better:

final class AttachmentContent {
    private final byte[] bytes;

    AttachmentContent(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    int size() {
        return bytes.length;
    }

    InputStream openStream() {
        return new ByteArrayInputStream(bytes);
    }

    byte[] copyBytes() {
        return bytes.clone();
    }
}

For very large content, avoid storing in memory:

interface BlobRef {
    long size();
    InputStream openStream() throws IOException;
}

Data representation decision depends on size and ownership.

22. Common Failure Modes

Failure	Cause	Prevention
mojibake	mismatched/default charset	explicit `StandardCharsets.UTF_8`
data loss on encode	unmappable characters replaced	strict `CharsetEncoder` with `REPORT`
signed byte bug	treating `byte` as 0..255	`Byte.toUnsignedInt`
buffer writes nothing	forgot `flip()` before read/write	manage buffer state
stale bytes written	ignored read count	write `0..n` only
wrong integer value	endianness mismatch	explicit `ByteOrder`
binary corrupted as text	arbitrary bytes converted to `String`	Base64/hex or keep bytes
mutable binary value	exposed `byte[]`	defensive copy
array equality bug	`byte[].equals` reference equality	`Arrays.equals`
memory pressure	load huge payloads	streaming, max size
partial read bug	assumes one read = one message	frame parser

23. Practice Drill: Framed UTF-8 Message Parser

Build a parser for this protocol:

[4-byte big-endian length][UTF-8 JSON-like text payload]

Rules:

length is signed Java int but valid range is 0..1_000_000;
parser receives arbitrary chunks;
one chunk may contain half a frame;
one chunk may contain multiple frames;
payload must decode as valid UTF-8;
malformed length fails fast;
malformed UTF-8 fails clearly;
parser preserves incomplete bytes for next read.

Suggested public API:

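final class FramedUtf8Parser {
    // Returns every message completed by this chunk; incomplete bytes are kept for the next call.
    List<String> feed(byte[] chunk) {
        throw new UnsupportedOperationException("implement as the drill");
    }
}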

Test cases:

one complete frame;
two frames in one chunk;
frame split across chunks;
negative length;
length above max;
incomplete length prefix;
invalid UTF-8;
zero-length payload.

24. Review Checklist

Before approving Java binary/text boundary code, ask:

Is this text or binary?
Where is the charset specified?
Are default charset APIs avoided?
Are arbitrary bytes ever converted to String?
If binary is transported as text, is Base64/hex used intentionally?
Are byte values interpreted as signed or unsigned intentionally?
Are mutable byte[] values defensively copied?
Does equality use content equality?
Is byte order specified for multi-byte numeric fields?
Does ByteBuffer code handle flip, clear, and compact correctly?
Are partial reads handled?
Are frame sizes bounded?
Is strict decoding required?
Are secrets/tokens excluded from logs?
Are large payloads streamed instead of fully loaded?

25. Summary

Binary data in Java is simple only when the boundary contract is explicit.

Key takeaways:

byte is signed; unsigned interpretation requires conversion.
byte[] is mutable; defensive copy is often required.
String to bytes requires a Charset.
Avoid default charset APIs at production boundaries.
Use CharsetEncoder/CharsetDecoder for strict error handling.
ByteBuffer is a state machine with position, limit, capacity, and byte order.
Always flip() before reading data you wrote into a buffer.
Specify endianness for binary protocols.
Use Base64 or hex for binary-as-text.
Do not assume one I/O read equals one full message.
Treat binary value objects carefully because arrays use reference equality.

Next part: Object, Class, runtime type, identity-sensitive operations, and how Java's root object model affects debugging, frameworks, and domain modeling.
