
Production NIO Server Patterns

Learn Java Networking - Part 012

Production NIO server architecture patterns: reactor loops, boss/worker split, connection state machines, write queues, admission control, backpressure, and graceful shutdown.


Part 012 — Production NIO Server Patterns

The main goal of this part is to turn the understanding of Selector from Part 011 into a defensible server design: one with an ownership model, state machine, write queue, admission control, overload behavior, timeouts, graceful close, and a failure matrix.

Part 011 answered "how does a selector work." This part answers "how do you build an NIO server that is not fragile once it reaches production."

We will not cover REST frameworks, servlet containers, Netty internals in full, or general observability. The focus is on raw networking architecture patterns that let you read, evaluate, or build an event-loop server yourself.


1. Why Toy NIO Servers Fail in Production

A toy NIO server usually only does:

select -> accept -> read -> write -> close

A production server needs more than that:

| Concern | Toy server | Production server |
| --- | --- | --- |
| Connection count | Unbounded | Admission control and max connections |
| Request size | Assumed small | Max frame/body/header size |
| Write behavior | Writes directly | Outbound queue + partial write |
| Slow client | Not considered | High/low watermark and idle timeout |
| Expensive work | Done on the selector thread | Worker pool with safe handoff |
| Shutdown | Process dies | Stop accept, drain, close deadline |
| Error | Print stack trace | Classified close reason + metric |
| Fairness | No budget | Read/write/frame/task budget |
| Protocol state | One buffer | Explicit state machine |
| Overload | Slows down until it falls over | Reject, shed, backpressure, degrade |

The production problem is not merely I/O. It is state management under partial progress and failure.


2. Architectural Mental Model

A production NIO server should be decomposed into these roles:

| Component | Responsibility |
| --- | --- |
| Boss loop | Owns listening socket, accepts connections, assigns to worker loops |
| Worker event loop | Owns connected channels, reads/writes bytes, manages connection state |
| Connection state | Parser, outbound queue, deadlines, counters, protocol phase |
| Application worker pool | Performs CPU/blocking business work outside event loop |
| Timer/timeout manager | Enforces idle/read/write/request deadlines |
| Admission controller | Decides whether to accept, reject, pause, or close |
| Metrics/logging hooks | Classify behavior and failures |

This split is common in high-performance networking systems: accept cheaply, then distribute connection ownership.


3. Reactor Pattern, Not Magic

The selector-based server is a Reactor. In practice, Reactor means:

  • wait for readiness,
  • dispatch to handlers,
  • handlers must not block the reactor,
  • state is explicit,
  • backpressure must be explicit.

Do not confuse Reactor with Proactor:

| Pattern | Meaning |
| --- | --- |
| Reactor | App is notified when operation can be attempted |
| Proactor | App is notified when operation has completed |

Java Selector is Reactor-style readiness. Java AsynchronousSocketChannel is closer to completion-style programming and is covered later.
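
The readiness-then-dispatch cycle can be shown without any network setup. The following is a minimal sketch, assuming a hypothetical `ReadinessHandler` interface and `MiniReactor` class (neither comes from the lesson), that uses a `java.nio.channels.Pipe` as a stand-in for a socket: the selector reports that the source end can be read, and only then does the handler attempt the read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical handler type for this sketch: called when its channel is readable.
interface ReadinessHandler {
    void onReadable(SelectionKey key) throws IOException;
}

final class MiniReactor {
    /** Runs one select pass and dispatches ready keys; returns how many handlers ran. */
    static int dispatchOnce(Selector selector, long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        int dispatched = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid() && key.isReadable()) {
                // Readiness only says a read can be attempted; the handler performs it.
                ((ReadinessHandler) key.attachment()).onReadable(key);
                dispatched++;
            }
        }
        return dispatched;
    }

    /** Demo: writes a message into a Pipe and lets the reactor deliver it to a handler. */
    static String demo(String message) {
        StringBuilder received = new StringBuilder();
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                ReadinessHandler handler = key -> {
                    ByteBuffer buf = ByteBuffer.allocate(256);
                    int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                    if (n > 0) {
                        buf.flip();
                        received.append(StandardCharsets.UTF_8.decode(buf));
                    }
                };
                source.register(selector, SelectionKey.OP_READ, handler);
                sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                dispatchOnce(selector, 1000);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received.toString();
    }
}
```

Calling `MiniReactor.demo("ping")` should hand the same bytes back through the handler. A completion-style API would instead invoke a callback after the read had already been performed on the application's behalf.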


4. Single Reactor vs Multi-Reactor

Single reactor

one selector thread handles accept + read + write for all connections

Good for:

  • learning,
  • simple servers,
  • many idle connections,
  • low business complexity.

Risk:

  • one loop can become bottleneck,
  • accept can starve read/write,
  • all connections share one failure domain.

Boss + worker reactors

boss selector accepts only
worker selectors handle connected sockets

Good for:

  • higher connection count,
  • multi-core usage,
  • isolating accept path,
  • distributing connection load.

Risk:

  • more cross-thread handoff complexity,
  • more lifecycle complexity,
  • harder metrics/debugging.

Multiple listening sockets/processes

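Often used with OS/container orchestration, a load balancer, or SO_REUSEPORT-style designs. The JDK exposes StandardSocketOptions.SO_REUSEPORT (since Java 9), but whether the option is supported and how the kernel distributes incoming connections across listeners is OS-dependent, so treat this design carefully.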


5. Event Loop Ownership Model

Production NIO collapses if ownership is unclear.

Recommended rule:

A connection is owned by exactly one event-loop thread from registration until close.

That loop owns:

  • SelectionKey,
  • SocketChannel,
  • inbound buffer,
  • protocol parser,
  • outbound queue,
  • connection deadlines,
  • connection-local counters,
  • interestOps transitions.

Worker threads may compute responses, but they should not mutate channel state directly. They should submit tasks back to the owning loop.

record LoopTask(ConnectionState state, ByteBuffer response) implements Runnable {
    @Override
    public void run() {
        state.outbound.add(response);
        state.enableWrite();
    }
}

This reduces data races and avoids expensive locking in the hot path.


6. Boss Acceptor Loop

The boss loop owns only the listening socket.

Responsibilities:

  1. accept pending connections,
  2. apply admission control,
  3. configure socket options,
  4. choose worker loop,
  5. transfer registration to worker loop.

Sketch:

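final class BossLoop implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private final WorkerLoop[] workers;
    private final AdmissionController admissionController;
    private volatile boolean running = true;
    private int nextWorker;

    // Constructor omitted: it opens the selector, binds the non-blocking
    // server channel, and registers it for OP_ACCEPT.

    @Override
    public void run() {
        while (running) {
            try {
                selector.select(1000);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid() && key.isAcceptable()) {
                        acceptReadyConnections();
                    }
                }
            } catch (IOException e) {
                // An accept-path failure must not silently kill the boss loop.
                reportLoopFailure(e);
            }
        }
    }

    private void acceptReadyConnections() throws IOException {
        int budget = 256; // bound the work done per wakeup
        while (budget-- > 0) {
            SocketChannel channel = server.accept();
            if (channel == null) break; // nothing left in the accept queue

            if (!admissionController.allow(channel)) {
                channel.close();
                continue;
            }

            try {
                channel.configureBlocking(false);
                channel.setOption(StandardSocketOptions.TCP_NODELAY, true);
                chooseWorker().register(channel);
            } catch (IOException e) {
                // Release the admission slot if the channel never reaches a worker.
                admissionController.onClose();
                closeQuietly(channel);
            }
        }
    }

    // Round-robin assignment; only the boss thread touches nextWorker.
    private WorkerLoop chooseWorker() {
        WorkerLoop worker = workers[nextWorker];
        nextWorker = (nextWorker + 1) % workers.length;
        return worker;
    }
}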

Registration with a worker loop must be done in that worker's event loop or with proper wakeup coordination.

final class WorkerLoop {
    private final Selector selector;
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();

    void register(SocketChannel channel) {
        tasks.add(() -> doRegister(channel));
        selector.wakeup();
    }

    private void doRegister(SocketChannel channel) {
        try {
            ConnectionState state = new ConnectionState(channel, this);
            SelectionKey key = channel.register(selector, SelectionKey.OP_READ, state);
            state.key = key;
        } catch (IOException e) {
            closeQuietly(channel);
        }
    }
}

7. Worker Event Loop

A worker loop should have a stable shape:

while (running) {
    selector.select(nextDeadlineMillis());
    drainTasks(TASK_BUDGET);
    processSelectedKeys(IO_BUDGETS);
    expireTimeouts(now);
    flushCloseQueue();
}

The ordering is intentional:

  1. select: wait for I/O or wakeup.
  2. drainTasks: register new channels and enqueue worker responses.
  3. processSelectedKeys: advance network I/O.
  4. expireTimeouts: enforce deadlines.
  5. flushCloseQueue: clean resources deterministically.

Important: do not let task draining starve I/O. A huge worker callback storm can be as dangerous as an I/O storm.
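
One way to keep task draining from starving I/O is to cap how many tasks run per loop iteration. The sketch below assumes a hypothetical `TaskDrainer` class; the lesson's `drainTasks(TASK_BUDGET)` is not shown in the source, so this is only one plausible shape for it.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class TaskDrainer {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();

    /** May be called from any thread. */
    void submit(Runnable task) {
        tasks.add(task);
    }

    /** Runs at most `budget` queued tasks on the calling (event-loop) thread. */
    int drain(int budget) {
        int ran = 0;
        while (ran < budget) {
            Runnable task = tasks.poll();
            if (task == null) break;
            try {
                task.run();
            } catch (RuntimeException e) {
                // A failing task must not kill the loop; a real server would log and count it.
            }
            ran++;
        }
        return ran;
    }

    /** True when the budget ran out before the queue was empty. */
    boolean hasPending() {
        return !tasks.isEmpty();
    }
}
```

When `hasPending()` is still true after a drain, the loop can use `selector.selectNow()` instead of a blocking select for the next iteration, so leftover tasks are not delayed by a full timeout while I/O still gets its turn.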


8. Connection State Machine

A connection should not be modeled as “has a buffer.” It should be modeled as a state machine.

Minimal enum:

enum ConnectionPhase {
    READING_HEADER,
    READING_BODY,
    DISPATCHING,
    WAITING_FOR_RESPONSE,
    WRITING,
    CLOSING,
    CLOSED
}

State transition rules should be explicit. Hidden booleans eventually become inconsistent.

Bad state model:

boolean reading;
boolean writing;
boolean done;
boolean closed;
boolean processing;

Better:

ConnectionPhase phase;
CloseReason closeReason;
long deadlineNanos;
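
Explicit transition rules can be encoded as data. The sketch below assumes a hypothetical `PhaseMachine` class and an illustrative transition table for a simple request/response protocol; the actual allowed transitions depend on your protocol.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum ConnectionPhase {
    READING_HEADER, READING_BODY, DISPATCHING, WAITING_FOR_RESPONSE, WRITING, CLOSING, CLOSED
}

final class PhaseMachine {
    // Assumed transition table; adjust it to the protocol being implemented.
    private static final Map<ConnectionPhase, Set<ConnectionPhase>> ALLOWED =
            new EnumMap<>(ConnectionPhase.class);
    static {
        ALLOWED.put(ConnectionPhase.READING_HEADER, EnumSet.of(
                ConnectionPhase.READING_BODY, ConnectionPhase.DISPATCHING, ConnectionPhase.CLOSING));
        ALLOWED.put(ConnectionPhase.READING_BODY, EnumSet.of(
                ConnectionPhase.DISPATCHING, ConnectionPhase.CLOSING));
        ALLOWED.put(ConnectionPhase.DISPATCHING, EnumSet.of(
                ConnectionPhase.WAITING_FOR_RESPONSE, ConnectionPhase.CLOSING));
        ALLOWED.put(ConnectionPhase.WAITING_FOR_RESPONSE, EnumSet.of(
                ConnectionPhase.WRITING, ConnectionPhase.CLOSING));
        ALLOWED.put(ConnectionPhase.WRITING, EnumSet.of(
                ConnectionPhase.READING_HEADER, ConnectionPhase.CLOSING));
        ALLOWED.put(ConnectionPhase.CLOSING, EnumSet.of(ConnectionPhase.CLOSED));
        ALLOWED.put(ConnectionPhase.CLOSED, EnumSet.noneOf(ConnectionPhase.class));
    }

    private ConnectionPhase phase = ConnectionPhase.READING_HEADER;

    ConnectionPhase phase() {
        return phase;
    }

    /** Applies the transition if the table allows it; otherwise leaves the phase unchanged. */
    boolean tryTransition(ConnectionPhase next) {
        if (!ALLOWED.get(phase).contains(next)) return false;
        phase = next;
        return true;
    }
}
```

A rejected transition is a bug or an invariant violation, so a caller would typically close the connection and record the event rather than continue in an unknown state.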

9. Protocol Parser Pattern

A production parser should be incremental.

interface FrameParser {
    ParseResult parse(ByteBuffer input, List<Frame> out);
}

enum ParseResult {
    NEED_MORE_DATA,
    FRAME_AVAILABLE,
    PROTOCOL_ERROR,
    FRAME_TOO_LARGE
}

Read handler shape:

private void onRead(ConnectionState c) throws IOException {
    int bytes = readWithBudget(c);
    if (bytes < 0) {
        close(c, CloseReason.PEER_CLOSED);
        return;
    }

    c.inbound.flip();
    try {
        int frameBudget = 32;
        while (frameBudget-- > 0) {
            ParseResult result = c.parser.parse(c.inbound, c.frames);
            if (result == ParseResult.NEED_MORE_DATA) break;
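            if (result == ParseResult.PROTOCOL_ERROR) {
                close(c, CloseReason.PROTOCOL_ERROR);
                return;
            }
            if (result == ParseResult.FRAME_TOO_LARGE) {
                // Keep oversized frames distinguishable from malformed ones in metrics.
                close(c, CloseReason.FRAME_TOO_LARGE);
                return;
            }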
            dispatchReadyFrames(c);
        }
    } finally {
        c.inbound.compact();
    }
}

Parser invariants:

  • never consume bytes unless a state transition is valid,
  • never trust length fields without max limit,
  • support partial header,
  • support partial body,
  • support multiple frames per read,
  • report malformed input deterministically,
  • keep parser memory bounded.
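
The invariants above can be illustrated with a small incremental parser. This is a sketch under stated assumptions: the wire format (a 4-byte big-endian length followed by the payload) and the `LengthPrefixedParser` class are hypothetical, chosen only to keep the example short.

```java
import java.nio.ByteBuffer;
import java.util.List;

final class LengthPrefixedParser {
    enum Result { NEED_MORE_DATA, FRAME_AVAILABLE, PROTOCOL_ERROR, FRAME_TOO_LARGE }

    private static final int LENGTH_BYTES = 4;
    private final int maxFrameSize;

    LengthPrefixedParser(int maxFrameSize) {
        this.maxFrameSize = maxFrameSize;
    }

    /**
     * Expects `input` in read mode (flipped). Bytes are consumed only when a
     * complete frame is present, so a partial header or body leaves the buffer
     * untouched for the next read.
     */
    Result parse(ByteBuffer input, List<byte[]> out) {
        if (input.remaining() < LENGTH_BYTES) return Result.NEED_MORE_DATA;

        // Absolute get: peek at the length without moving the position.
        int length = input.getInt(input.position());
        if (length < 0) return Result.PROTOCOL_ERROR;
        // Validate against the limit before allocating anything.
        if (length > maxFrameSize) return Result.FRAME_TOO_LARGE;
        if (input.remaining() - LENGTH_BYTES < length) return Result.NEED_MORE_DATA;

        input.position(input.position() + LENGTH_BYTES);
        byte[] payload = new byte[length];
        input.get(payload);
        out.add(payload);
        return Result.FRAME_AVAILABLE;
    }
}
```

Because this version keeps no state between calls, the inbound buffer must be able to hold one full frame (4 + maxFrameSize bytes). A parser that tracks header and body phases separately can work with a smaller buffer at the cost of more state.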

10. Read Path Design

The production read path needs four kinds of limits:

| Limit | Why |
| --- | --- |
| Read byte budget | Prevent hot connection starvation |
| Frame count budget | Prevent one read from decoding unbounded frames |
| Max frame size | Prevent memory abuse |
| Request deadline | Prevent slowloris-style incomplete request |

Slowloris defense is not only HTTP-specific. Any protocol with incremental request body can be abused by sending bytes too slowly.


11. Write Path Design

Outbound path should be queue-based.

final class OutboundQueue {
    private final ArrayDeque<ByteBuffer> buffers = new ArrayDeque<>();
    private long queuedBytes;

    void add(ByteBuffer buffer) {
        if (buffer.remaining() == 0) return;
        buffers.add(buffer);
        queuedBytes += buffer.remaining();
    }

    ByteBuffer peek() {
        return buffers.peek();
    }

    // Call after every channel.write(head) with the byte count it returned.
    void onBytesWritten(ByteBuffer head, int written) {
        queuedBytes -= written;
        if (!head.hasRemaining()) {
            buffers.poll();
        }
    }

    long queuedBytes() {
        return queuedBytes;
    }
}

queuedBytes is decremented by the bytes actually written on every write call, not only when a buffer is fully drained, so watermark checks see partial progress.

Write path invariants:

  • never block waiting for a socket to accept all bytes,
  • never allocate unbounded response buffers,
  • never keep OP_WRITE enabled without pending bytes,
  • support partial writes,
  • support close-after-drain,
  • classify slow consumer separately from application errors.
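
A flush routine that respects these invariants might look like the following. It is a minimal sketch, assuming a hypothetical `WriteFlusher` class and a soft per-call byte budget; it writes to any `WritableByteChannel`, so the same code works against a non-blocking `SocketChannel` or a test double.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;

final class WriteFlusher {
    private final ArrayDeque<ByteBuffer> queue = new ArrayDeque<>();
    private long queuedBytes;

    void enqueue(ByteBuffer buffer) {
        if (!buffer.hasRemaining()) return;
        queue.add(buffer);
        queuedBytes += buffer.remaining();
    }

    long queuedBytes() {
        return queuedBytes;
    }

    /**
     * Writes until the queue is empty, the channel stops accepting bytes, or
     * the (soft) byte budget is spent. Returns true when the queue is fully
     * drained, which is the caller's cue to clear OP_WRITE.
     */
    boolean flush(WritableByteChannel channel, long byteBudget) throws IOException {
        long written = 0;
        while (written < byteBudget) {
            ByteBuffer head = queue.peek();
            if (head == null) break;
            int n = channel.write(head);
            queuedBytes -= n;   // account for partial progress immediately
            written += n;
            if (head.hasRemaining()) {
                // Partial write: the channel cannot take more right now.
                // Keep OP_WRITE enabled and return to the loop instead of blocking.
                break;
            }
            queue.poll();
        }
        return queue.isEmpty();
    }
}
```

The budget is soft because a single `write` call may exceed what is left of it; the point is to bound how long one connection can hold the loop, not to enforce an exact byte count.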

12. Backpressure with High/Low Watermarks

Backpressure should be designed as a state transition, not an afterthought.

static final long HIGH_WATERMARK = 8L * 1024 * 1024;
static final long LOW_WATERMARK = 2L * 1024 * 1024;

void afterEnqueue(ConnectionState c) {
    if (c.outboundBytes() >= HIGH_WATERMARK) {
        c.readPaused = true;
        c.key.interestOps(c.key.interestOps() & ~SelectionKey.OP_READ);
    }
    c.key.interestOps(c.key.interestOps() | SelectionKey.OP_WRITE);
}

void afterWrite(ConnectionState c) {
    if (c.readPaused && c.outboundBytes() <= LOW_WATERMARK) {
        c.readPaused = false;
        c.key.interestOps(c.key.interestOps() | SelectionKey.OP_READ);
    }

    if (c.outboundBytes() == 0) {
        c.key.interestOps(c.key.interestOps() & ~SelectionKey.OP_WRITE);
    }
}

Why high/low rather than a single threshold?

A single threshold causes flapping: pause/resume/pause/resume around the boundary. High/low creates hysteresis.
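
The hysteresis can be isolated from the selector code and tested on its own. The sketch below assumes a hypothetical `WatermarkGate` class that only tracks the paused/resumed decision; applying that decision to `interestOps` stays with the event loop.

```java
final class WatermarkGate {
    private final long lowWatermark;
    private final long highWatermark;
    private boolean readPaused;
    private int transitions;

    WatermarkGate(long lowWatermark, long highWatermark) {
        if (lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("low watermark must be below high watermark");
        }
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    /** Feeds the current outbound queue size; returns true while reads should stay paused. */
    boolean update(long queuedBytes) {
        if (!readPaused && queuedBytes >= highWatermark) {
            readPaused = true;
            transitions++;
        } else if (readPaused && queuedBytes <= lowWatermark) {
            readPaused = false;
            transitions++;
        }
        return readPaused;
    }

    /** Number of pause/resume flips so far; useful as a flapping metric. */
    int transitions() {
        return transitions;
    }
}
```

With watermarks of 2 and 8, a queue size that oscillates through 8, 7, 9, 7 pauses reads once and keeps them paused until the size falls to 2. A single threshold at 8 would flip state on every one of those samples.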


13. Admission Control

Admission control answers: “Should this server accept more work?”

Admission can happen at several layers:

| Layer | Control |
| --- | --- |
| Kernel listen queue | backlog parameter and OS tuning |
| Boss accept loop | max accepts per loop |
| Connection count | global/per-IP/per-tenant limits |
| Worker assignment | avoid overloaded worker |
| Protocol parser | reject oversized frames early |
| Application queue | reject if worker queue is saturated |
| Outbound queue | close/pause slow clients |

Example:

final class AdmissionController {
    private final AtomicInteger activeConnections = new AtomicInteger();
    private final int maxConnections;

    boolean allow(SocketChannel channel) {
        int current = activeConnections.incrementAndGet();
        if (current > maxConnections) {
            activeConnections.decrementAndGet();
            return false;
        }
        return true;
    }

    void onClose() {
        activeConnections.decrementAndGet();
    }
}

In a regulated or high-integrity system, rejection behavior should be explicit:

  • close immediately,
  • send protocol-level “server busy,”
  • return retry-after equivalent if protocol supports it,
  • sample logs to avoid log amplification,
  • metric close reason as ADMISSION_REJECTED.
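
The global counter shown earlier extends naturally to per-IP or per-tenant limits. The following is a sketch, assuming a hypothetical `KeyedLimiter` class whose key type could be an `InetAddress`, a tenant ID, or anything else with stable `equals`/`hashCode`.

```java
import java.util.concurrent.ConcurrentHashMap;

final class KeyedLimiter<K> {
    private final ConcurrentHashMap<K, Integer> active = new ConcurrentHashMap<>();
    private final int maxPerKey;

    KeyedLimiter(int maxPerKey) {
        this.maxPerKey = maxPerKey;
    }

    /** Atomically takes a slot for the key; returns false when the key is at its limit. */
    boolean tryAcquire(K key) {
        boolean[] admitted = {false};
        // compute() runs atomically per key, so check-and-increment cannot race.
        active.compute(key, (k, count) -> {
            int current = (count == null) ? 0 : count;
            if (current >= maxPerKey) return count;
            admitted[0] = true;
            return current + 1;
        });
        return admitted[0];
    }

    /** Must be called exactly once per successful tryAcquire, typically from the close path. */
    void release(K key) {
        // Removing the entry at zero keeps the map from growing with every client ever seen.
        active.computeIfPresent(key, (k, count) -> count <= 1 ? null : count - 1);
    }

    int activeFor(K key) {
        return active.getOrDefault(key, 0);
    }
}
```

The release call belongs in the same place that records the close reason, so that every close path, including rejected registrations, returns its slot.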

14. Worker Pool Handoff

Event loop should not perform expensive business logic.

Bad:

Frame request = decode(buffer);
Response response = handler.handle(request); // may block or be expensive
state.outbound.add(encode(response));

Better:

Frame request = decode(buffer);
state.phase = ConnectionPhase.DISPATCHING;

workerPool.execute(() -> {
    Response response;
    try {
        response = handler.handle(request);
    } catch (Throwable t) {
        response = errorResponse(t);
    }

    ByteBuffer encoded = encode(response);
    state.ownerLoop.execute(() -> {
        if (state.phase == ConnectionPhase.CLOSED) return;
        state.outbound.add(encoded);
        state.phase = ConnectionPhase.WRITING;
        state.enableWrite();
    });
});

Ordering problem:

If a connection can pipeline multiple requests, responses may complete out of order. You need a policy:

| Policy | Use case |
| --- | --- |
| No pipelining | Simpler request/response protocols |
| Ordered response queue | HTTP/1.1-like semantics |
| Stream/request IDs | Multiplexed protocol like HTTP/2 conceptually |
| Per-connection serial executor | Preserve order at cost of concurrency |

Do not accidentally introduce response reordering if protocol forbids it.
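
The ordered response queue policy can be sketched as a small reorder buffer. The `OrderedResponses` class below is hypothetical and assumes both methods are called only on the owning event loop, so it needs no locking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderedResponses<T> {
    private final Map<Long, T> parked = new HashMap<>();
    private long nextToAssign;
    private long nextToRelease;

    /** Called when a request is dispatched; returns its sequence number. */
    long assign() {
        return nextToAssign++;
    }

    /**
     * Called when a response completes (in any order). Returns the responses
     * that are now writable, in request order. Responses must be non-null.
     */
    List<T> complete(long sequence, T response) {
        parked.put(sequence, response);
        List<T> ready = new ArrayList<>();
        T next;
        while ((next = parked.remove(nextToRelease)) != null) {
            ready.add(next);
            nextToRelease++;
        }
        return ready;
    }

    /** Completed responses still waiting for an earlier one. */
    int parkedCount() {
        return parked.size();
    }
}
```

One slow request holds back every response behind it, so a real server would also cap the number of in-flight requests per connection and stop reading when that cap is reached.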


15. Timeout and Deadline Model

Timeouts are part of server correctness.

Types:

| Timeout | Meaning |
| --- | --- |
| Idle timeout | No read/write activity for too long |
| Read/header timeout | Request header/frame header not completed in time |
| Body timeout | Body transfer too slow |
| Application timeout | Handler took too long |
| Write timeout | Response cannot be drained to client |
| Graceful shutdown deadline | Drain period exceeded |

Connection state should carry deadline fields:

long idleDeadlineNanos;
long readDeadlineNanos;
long appDeadlineNanos;
long writeDeadlineNanos;

Simple timeout scan:

void expireTimeouts(long now) {
    for (SelectionKey key : selector.keys()) {
        if (!key.isValid()) continue;
        if (!(key.attachment() instanceof ConnectionState c)) continue;

        if (now >= c.currentDeadlineNanos()) {
            close(c, CloseReason.TIMEOUT);
        }
    }
}

For very high connection counts, scanning all keys every loop can be expensive. Use timing wheel, heap, or segmented scans. But do not skip timeout design.
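
The heap option can be sketched with a `PriorityQueue` and lazy invalidation. The `DeadlineQueue` class below is hypothetical and assumes single-threaded use on the event loop; rescheduling adds a new heap entry and leaves the old one to be discarded when it surfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class DeadlineQueue<C> {
    private static final class Entry<T> {
        final T connection;
        final long deadlineNanos;

        Entry(T connection, long deadlineNanos) {
            this.connection = connection;
            this.deadlineNanos = deadlineNanos;
        }
    }

    private final PriorityQueue<Entry<C>> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.deadlineNanos, b.deadlineNanos));
    // The authoritative deadline per connection; heap entries that disagree are stale.
    private final Map<C, Long> current = new HashMap<>();

    /** Sets or replaces the deadline for a connection. */
    void schedule(C connection, long deadlineNanos) {
        current.put(connection, deadlineNanos);
        heap.add(new Entry<>(connection, deadlineNanos));
    }

    /** Call on close so the connection is never reported as expired. */
    void cancel(C connection) {
        current.remove(connection);
    }

    /** Returns connections whose current deadline has passed, earliest first. */
    List<C> expire(long nowNanos) {
        List<C> expired = new ArrayList<>();
        while (!heap.isEmpty() && heap.peek().deadlineNanos <= nowNanos) {
            Entry<C> entry = heap.poll();
            Long live = current.get(entry.connection);
            if (live != null && live == entry.deadlineNanos) {
                current.remove(entry.connection);
                expired.add(entry.connection);
            }
            // Otherwise the entry was rescheduled or cancelled: drop it silently.
        }
        return expired;
    }

    /** Earliest queued deadline (possibly stale), or Long.MAX_VALUE when empty. */
    long nextDeadlineNanos() {
        return heap.isEmpty() ? Long.MAX_VALUE : heap.peek().deadlineNanos;
    }
}
```

Each loop iteration then costs time proportional to the number of expired or stale entries rather than to the number of open connections. The trade-off is that frequently rescheduled deadlines leave stale entries in the heap until their original time passes.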


16. Graceful Close and Half-Close

TCP close is not one thing. At application level, define close policy.

| Scenario | Policy |
| --- | --- |
| Protocol error | Close immediately, optionally send error frame first |
| Normal response complete | Drain outbound queue, then close if protocol says so |
| Peer EOF before full request | Close as incomplete request |
| Peer EOF after full request | Maybe write response then close |
| Server shutdown | Stop accept, drain existing, close after deadline |
| Slow consumer | Close after write timeout/high watermark breach |

Close-after-drain pattern:

void closeAfterDrain(ConnectionState c, CloseReason reason) {
    c.closeReason = reason;
    c.closeWhenDrained = true;
    if (c.outboundBytes() == 0) {
        closeNow(c);
    } else {
        c.enableWrite();
    }
}

Immediate close pattern:

void closeNow(ConnectionState c) {
    try {
        c.key.cancel();
        c.channel.close();
    } catch (IOException ignored) {
        // best effort
    } finally {
        c.phase = ConnectionPhase.CLOSED;
        metrics.connectionClosed(c.closeReason);
    }
}

Be careful with half-close semantics. SocketChannel.read() returning -1 means the peer has closed its output side. Whether you still write depends on your protocol. Many servers simply close unless they are intentionally supporting half-close.


17. Graceful Shutdown Sequence

A robust shutdown does not just kill the process.

Server shutdown phases:

  1. Running: accept and process normally.
  2. Draining: stop accepting new connections; existing requests may finish.
  3. Closing: close idle connections; close after response for active ones.
  4. Forced: close everything after deadline.
  5. Stopped: selectors and channels closed.

Shutdown state:

enum ServerPhase {
    RUNNING,
    DRAINING,
    CLOSING,
    FORCED,
    STOPPED
}

During draining:

  • remove OP_ACCEPT or close listening channel,
  • stop assigning new work,
  • reject new frames on persistent connections if protocol allows,
  • flush existing responses,
  • enforce deadline.

18. Worker Assignment Strategies

When boss accepts a connection, it must choose a worker.

| Strategy | Pros | Cons |
| --- | --- | --- |
| Round-robin | Simple, low overhead | Ignores load variance |
| Least connections | Better balance for long-lived idle conns | Does not measure throughput/load |
| Least queued tasks | Better under app callbacks | More shared state |
| Hash by remote/client ID | Affinity | Can skew badly |
| Dedicated tenant shard | Isolation | Operational complexity |

Default recommendation:

  • Start with round-robin.
  • Track per-worker active connections, selected events, loop lag, outbound bytes.
  • Move to load-aware assignment only when metrics justify it.

19. Loop Lag as a Health Signal

Event loop lag means the loop is not waking/processing on time.

Simple measurement:

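long timeoutMillis = 1000;
long expectedWakeNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
selector.select(timeoutMillis);
// Meaningful only when select() returns because of the timeout;
// an early wakeup for I/O or tasks reports zero lag.
long lagNanos = Math.max(0, System.nanoTime() - expectedWakeNanos);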

Better: schedule periodic tick tasks and measure delay.
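
A tick-based measurement can be kept free of clocks and threads by passing the current time in. The sketch below assumes a hypothetical `LoopLagMeter` class that the loop calls once per iteration with `System.nanoTime()`.

```java
final class LoopLagMeter {
    private final long intervalNanos;
    private long nextTickNanos;
    private long maxLagNanos;

    LoopLagMeter(long startNanos, long intervalNanos) {
        this.intervalNanos = intervalNanos;
        this.nextTickNanos = startNanos + intervalNanos;
    }

    /**
     * Call once per loop iteration. Returns the lag of the tick that fired,
     * or -1 when no tick was due yet.
     */
    long onLoopIteration(long nowNanos) {
        // Subtraction keeps the comparison valid even if nanoTime values wrap.
        long lag = nowNanos - nextTickNanos;
        if (lag < 0) return -1;
        maxLagNanos = Math.max(maxLagNanos, lag);
        nextTickNanos = nowNanos + intervalNanos;
        return lag;
    }

    /** The next time the loop must be awake for the measurement to be accurate. */
    long nextTickNanos() {
        return nextTickNanos;
    }

    long maxLagNanos() {
        return maxLagNanos;
    }
}
```

For the numbers to mean anything, the select timeout has to be capped at `nextTickNanos()` so that an idle loop still wakes for each tick; any remaining delay is then time the loop spent busy with other work. In production the returned lag would feed a histogram rather than only a maximum.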

Track:

| Metric | Meaning |
| --- | --- |
| active connections per loop | Load distribution |
| selected keys per second | I/O activity |
| bytes read/written per second | Throughput |
| OP_WRITE-enabled connections | Slow consumer/backpressure pressure |
| outbound queued bytes | Memory risk |
| task queue depth | Worker callback pressure |
| loop lag p50/p95/p99 | Event-loop saturation |
| close reason counts | Failure classification |
| accept rejects | Admission pressure |

General observability is covered in another series; for an NIO server specifically, these metrics act as correctness sensors.


20. Production Close Reasons

Define close reasons as an enum. This improves logs and metrics.

enum CloseReason {
    NORMAL,
    PEER_CLOSED,
    PROTOCOL_ERROR,
    FRAME_TOO_LARGE,
    IDLE_TIMEOUT,
    READ_TIMEOUT,
    WRITE_TIMEOUT,
    APP_TIMEOUT,
    ADMISSION_REJECTED,
    BACKPRESSURE_LIMIT,
    SERVER_SHUTDOWN,
    IO_EXCEPTION,
    INTERNAL_ERROR
}

A mature system should answer:

  • Are clients disconnecting normally?
  • Are we closing because clients are too slow?
  • Are we rejecting due to overload?
  • Are protocol errors increasing after a deploy?
  • Are write timeouts correlated with one tenant/network?

Without close reason, networking failures become noise.


21. Memory Model and Buffer Ownership

In NIO servers, buffer ownership is architecture.

Rules:

  1. Inbound buffer belongs to connection/event loop.
  2. Outbound buffers must not be mutated after enqueue.
  3. If a worker produces a buffer, it transfers ownership to event loop.
  4. Direct buffers should be bounded and reused carefully.
  5. Never retain slices of a huge buffer indefinitely unless intentional.

Common bug:

ByteBuffer response = sharedBuffer;
state.outbound.add(response); // another request mutates sharedBuffer before write completes

Correct:

ByteBuffer response = ByteBuffer.wrap(encodedBytes).asReadOnlyBuffer();
state.outbound.add(response);

For high-performance systems, you may use buffer pooling. But pooling introduces ownership complexity:

  • when is buffer returned?
  • what if partial write remains?
  • who owns reference after enqueue?
  • what if connection closes before write completes?

Never introduce pooling before ownership is clear.
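
Before pooling, the simplest way to make ownership unambiguous is to copy at the handoff point. The sketch below assumes a hypothetical `BufferOwnership.snapshot` helper; it trades one allocation and copy for the guarantee that nothing upstream can change bytes that are already queued.

```java
import java.nio.ByteBuffer;

final class BufferOwnership {
    private BufferOwnership() { }

    /**
     * Copies the remaining bytes of `source` into a fresh read-only buffer.
     * The source's position and limit are left untouched, and later writes to
     * the source do not affect the returned buffer.
     */
    static ByteBuffer snapshot(ByteBuffer source) {
        ByteBuffer copy = ByteBuffer.allocate(source.remaining());
        // duplicate() shares content but has its own position, so the caller's
        // view of `source` does not move while we read from it.
        copy.put(source.duplicate());
        copy.flip();
        return copy.asReadOnlyBuffer();
    }
}
```

The read-only view is a guard against accidental writes through the queued reference. It does not protect a buffer made with `ByteBuffer.wrap` from changes to the underlying array, which is why this helper copies the bytes instead of wrapping them.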


22. Designing the Server API Boundary

A clean internal API separates network protocol from application logic.

interface ProtocolHandler {
    void onFrame(ConnectionContext ctx, Frame frame);
    void onConnected(ConnectionContext ctx);
    void onClosed(ConnectionContext ctx, CloseReason reason);
}

interface ConnectionContext {
    void write(Frame frame);
    void close();
    SocketAddress remoteAddress();
    long connectionId();
}

But ConnectionContext.write() should not write directly to the socket. It should enqueue onto owning event loop.

public void write(Frame frame) {
    ByteBuffer encoded = encoder.encode(frame);
    ownerLoop.execute(() -> {
        if (!state.isClosed()) {
            state.outbound.add(encoded);
            state.enableWrite();
        }
    });
}

This makes the application API simple while preserving event-loop ownership internally.


23. Error Handling Policy

Do not let arbitrary exceptions determine protocol behavior.

| Error source | Example | Policy |
| --- | --- | --- |
| I/O error | connection reset | close, metric as IO_EXCEPTION |
| Protocol error | invalid frame length | close as PROTOCOL_ERROR or send error frame then close |
| Application error | handler throws | protocol error response if possible, maybe keep connection |
| Overload | worker queue full | reject request or close as BACKPRESSURE_LIMIT |
| Timeout | incomplete frame | close as timeout |
| Bug/invariant violation | impossible phase transition | close connection, alert if systemic |

A strong invariant:

No exception should leave a connection in an unknown state.

Either recover to a known phase or close.


24. Testing a Production NIO Server

Unit tests alone are insufficient. You need behavioral tests.

| Test | What it catches |
| --- | --- |
| Partial frame byte-by-byte | Parser state bugs |
| Multiple frames in one packet/read | Incorrect message boundary assumption |
| Slow reader client | Outbound queue leak |
| Slow writer client | Inbound timeout/backpressure bug |
| Abrupt reset | Close/error handling bug |
| 10k idle connections | Selector spin/resource leak |
| Worker pool saturation | Missing admission/backpressure |
| Graceful shutdown under load | Drain/close race |
| Random fuzz frames | Protocol validation weakness |
| Large response partial write | Write queue correctness |

Example byte-by-byte client behavior:

connect
send first byte of length
sleep
send second byte
sleep
...
observe timeout or correct eventual parse

A server that passes only “happy path full request in one write” is not network-correct.


25. Failure Matrix

| Failure | Bad server behavior | Production behavior |
| --- | --- | --- |
| Client sends huge frame length | Allocates huge buffer/OOM | Reject before allocation |
| Client reads slowly | Heap grows with outbound queue | Pause reads, enforce watermark, close if needed |
| Worker pool saturated | Event loop keeps reading and queuing | Apply admission/backpressure |
| Selector thread blocked | All connections stall | Move expensive work off-loop |
| Shutdown requested | Drops in-flight responses | Stop accept, drain, deadline, force close |
| Application handler throws | Connection leaks or loop dies | Catch, classify, respond/close |
| Connection reset | Stack traces flood logs | Sample/log classified close reason |
| OP_WRITE always enabled | CPU spin | Demand-driven write interest |
| Frame split across reads | Protocol fails | Incremental parser |
| Multiple frames per read | Drops/merges frames | Loop parser with frame budget |

26. Reference Implementation Skeleton

This skeleton shows structure, not every implementation detail.

public final class ProductionNioServer implements AutoCloseable {
    private final BossLoop boss;
    private final WorkerLoop[] workers;
    private final ExecutorService appExecutor;

    public ProductionNioServer(
            InetSocketAddress bind,
            int workerCount,
            ExecutorService appExecutor
    ) throws IOException {
        this.appExecutor = appExecutor;
        this.workers = new WorkerLoop[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new WorkerLoop("nio-worker-" + i, appExecutor);
        }
        this.boss = new BossLoop(bind, workers);
    }

    public void start() {
        for (WorkerLoop worker : workers) {
            new Thread(worker, worker.name()).start();
        }
        new Thread(boss, "nio-boss").start();
    }

    public void shutdownGracefully(Duration drainDeadline) {
        boss.stopAccepting();
        for (WorkerLoop worker : workers) {
            worker.beginDrain(drainDeadline);
        }
    }

    @Override
    public void close() {
        boss.close();
        for (WorkerLoop worker : workers) {
            worker.close();
        }
        appExecutor.shutdown();
    }
}

Worker loop concept:

final class WorkerLoop implements Runnable, AutoCloseable {
    private final String name;
    private final Selector selector;
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final ExecutorService appExecutor;
    private volatile boolean running = true;
    private volatile boolean draining;

    WorkerLoop(String name, ExecutorService appExecutor) throws IOException {
        this.name = name;
        this.selector = Selector.open();
        this.appExecutor = appExecutor;
    }

    String name() { return name; }

    void execute(Runnable task) {
        tasks.add(task);
        selector.wakeup();
    }

    void register(SocketChannel channel) {
        execute(() -> doRegister(channel));
    }

    @Override
    public void run() {
        while (running) {
            try {
                selector.select(1000);
                drainTasks(1024);
                processKeys();
                expireTimeouts();
                enforceDrainPolicy();
            } catch (IOException e) {
                // loop-level exception should be rare and visible
                reportLoopFailure(e);
            }
        }
        closeAllKeys();
    }

    private void doRegister(SocketChannel channel) {
        try {
            ConnectionState state = new ConnectionState(channel, this);
            SelectionKey key = channel.register(selector, SelectionKey.OP_READ, state);
            state.key = key;
        } catch (IOException e) {
            closeQuietly(channel);
        }
    }

    private void processKeys() throws IOException {
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid()) continue;
            try {
                if (key.isReadable()) onRead((ConnectionState) key.attachment());
                if (key.isValid() && key.isWritable()) onWrite((ConnectionState) key.attachment());
            } catch (Throwable t) {
                close((ConnectionState) key.attachment(), CloseReason.INTERNAL_ERROR);
            }
        }
    }
}

A production implementation would complete the omitted methods with the policies described above.


27. Decision Matrix: Build Raw NIO or Use Framework?

| Situation | Recommendation |
| --- | --- |
| You need custom binary protocol and full control | Raw NIO or Netty-like framework |
| You need HTTP server | Use mature HTTP server/framework unless learning/building infrastructure |
| You need TLS-heavy protocol | Prefer framework unless SSLEngine expertise is required |
| You need maximum learning | Build raw NIO once |
| You need production delivery fast | Use proven framework |
| You need to debug framework internals | Understand raw NIO patterns deeply |
| You need many simple blocking operations | Consider virtual threads before raw selector |

Good engineering judgment is recognizing that raw NIO is powerful but expensive. You learn it to understand the machine, not to rewrite every server by hand.


28. Production Readiness Checklist

  • Listener has explicit bind address, backlog, and socket options.
  • Boss loop has accept budget and admission controller.
  • Worker loops own connected channels exclusively.
  • Cross-thread work uses event-loop task queue and wakeup.
  • Connection state machine is explicit.
  • Parser is incremental and bounded.
  • Write queue supports partial writes.
  • OP_WRITE is demand-driven.
  • High/low watermarks protect memory.
  • Slow readers and slow writers have policies.
  • Timeouts exist for idle, read, app, write, and shutdown.
  • Graceful shutdown stops accept before draining.
  • Close reasons are classified and measured.
  • Loop lag is measured.
  • Worker saturation causes backpressure/rejection.
  • Tests cover partial frames, slow clients, reset, overload, and shutdown.

29. Deliberate Practice Project

Build a small binary request/response server with this protocol:

Request:
  magic: 2 bytes
  version: 1 byte
  requestId: 8 bytes
  operation: 1 byte
  payloadLength: 4 bytes
  payload: N bytes

Response:
  magic: 2 bytes
  version: 1 byte
  requestId: 8 bytes
  status: 1 byte
  payloadLength: 4 bytes
  payload: N bytes

Requirements:

  1. Boss + 2 worker event loops.
  2. Incremental parser.
  3. Max payload size.
  4. Worker-pool dispatch.
  5. Ordered response per connection.
  6. High/low outbound watermark.
  7. Idle/read/write timeout.
  8. Graceful shutdown.
  9. Close reason metrics.
  10. Slow client test.

Stretch goals:

  • add per-IP connection limit,
  • add request deadline propagation,
  • add protocol error response before close,
  • add fuzz test for frame parser,
  • add loop-lag metric.
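
As a starting point for the project, the request layout above can be encoded and validated as follows. This is a sketch under stated assumptions: the `RequestCodec` class, the magic value 0xCAFE, and version 1 are placeholders, and the fields are written in big-endian order (the `ByteBuffer` default).

```java
import java.nio.ByteBuffer;

final class RequestCodec {
    // magic(2) + version(1) + requestId(8) + operation(1) + payloadLength(4)
    static final int HEADER_BYTES = 2 + 1 + 8 + 1 + 4;
    static final short MAGIC = (short) 0xCAFE; // hypothetical magic value
    static final byte VERSION = 1;             // hypothetical protocol version

    private RequestCodec() { }

    /** Encodes one request; the returned buffer is flipped and ready to write. */
    static ByteBuffer encode(long requestId, byte operation, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putShort(MAGIC)
           .put(VERSION)
           .putLong(requestId)
           .put(operation)
           .putInt(payload.length)
           .put(payload);
        buf.flip();
        return buf;
    }

    /**
     * Inspects the header at the buffer's position without consuming anything.
     * Returns the payload length, or -1 when the header is still incomplete.
     * Throws IllegalArgumentException for a header that can never be valid.
     */
    static int peekPayloadLength(ByteBuffer in, int maxPayload) {
        if (in.remaining() < HEADER_BYTES) return -1;
        int start = in.position();
        if (in.getShort(start) != MAGIC) {
            throw new IllegalArgumentException("bad magic");
        }
        if (in.get(start + 2) != VERSION) {
            throw new IllegalArgumentException("unsupported version");
        }
        int length = in.getInt(start + 12);
        if (length < 0 || length > maxPayload) {
            throw new IllegalArgumentException("payload length out of range: " + length);
        }
        return length;
    }
}
```

The response layout differs only in carrying a status byte where the request carries the operation, so the same offsets apply. In the server, the exception would be mapped to a close with PROTOCOL_ERROR or FRAME_TOO_LARGE rather than propagated.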

30. Mental Compression

A production NIO server is not a loop. It is a system of coordinated state machines:

Boss loop: accepts and assigns.
Worker loop: owns connection I/O.
Connection state: remembers partial progress.
Parser: converts bytes into frames.
Application worker: performs expensive work.
Outbound queue: handles partial writes.
Watermarks: prevent memory death.
Timeouts: prevent infinite partial progress.
Shutdown: stops intake before draining.
Metrics: prove behavior under failure.

If Part 011 taught the selector mechanics, Part 012 teaches the architectural discipline needed to survive production. The next part moves to Java asynchronous socket channels and completion-oriented APIs, so you can compare readiness-based and completion-based models with the right mental model.
