
State in Microservices

Learn Java Microservices File Handling, State, Configuration and Secret Management - Part 027

A mental model of state in Java microservices: durable, ephemeral, derived, workflow, session, cache, and operational state, plus ownership, placement, consistency, and failure modeling.



A microservice is never simply stateless or stateful.

It is a computation boundary around state that lives somewhere.

After the block on files and object storage, we move to a broader concept: state.

Many engineers say “this service is stateless” as if that meant no state exists. That is not true. What they usually mean is:

The service process does not keep correctness-critical state locally
between requests.

But the product still has state:

  • whether the user is logged in;
  • whether a case is in status UNDER_REVIEW or ESCALATED;
  • whether an evidence file is ACCEPTED or still QUARANTINED;
  • whether an upload session is complete or still pending;
  • whether an idempotency key has already been used;
  • which attempt a retry job has reached;
  • which config version is active;
  • which credential version is in use;
  • whether the permission cache is stale;
  • which events have already been processed;
  • which step a workflow is on.

So the right architecture question is not:

Is this microservice stateless?

The right questions are:

Where is the state?
Who owns it?
How is it mutated?
How is it recovered?
What happens when it is stale, duplicated, missing, or corrupted?

This part builds the mental model of state that later parts use for session, cache, workflow, configuration, secret rotation, and failure recovery.


1. State: A Practical Definition

In the context of microservices, state is data or a condition that makes the output of the current operation depend on something that happened earlier.

A simple example:

POST /cases/123/submit

This operation depends on state:

  • whether case 123 exists;
  • whether the actor has permission;
  • whether the case is still DRAFT;
  • whether the required evidence is attached;
  • whether the current validation config allows the submit;
  • whether a request with the same idempotency key has already been processed;
  • whether the database secret is still valid;
  • whether the service is read-only because of maintenance.

State is not just database rows. State is every condition that helps determine a decision.
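
To make the point concrete, here is a minimal sketch (assuming Java 17+) that models a few of the conditions above as explicit inputs to a pure decision function. The names `SubmitInputs`, `SubmitDecision`, and `SubmitPolicy` are hypothetical and only illustrate that the same request can produce different outcomes depending on state that lives elsewhere.

```java
// Hypothetical snapshot of the state that the submit operation reads.
record SubmitInputs(
        boolean caseExists,
        boolean actorMaySubmit,
        String caseStatus,
        boolean requiredEvidenceAttached,
        boolean idempotencyKeyAlreadyProcessed,
        boolean serviceReadOnly) {}

enum SubmitDecision { ACCEPT, REPLAY_PREVIOUS_RESULT, NOT_FOUND, FORBIDDEN, CONFLICT, UNAVAILABLE }

final class SubmitPolicy {
    // Pure function: the outcome depends only on the state passed in.
    static SubmitDecision decide(SubmitInputs s) {
        if (s.serviceReadOnly()) return SubmitDecision.UNAVAILABLE;
        if (!s.caseExists()) return SubmitDecision.NOT_FOUND;
        if (!s.actorMaySubmit()) return SubmitDecision.FORBIDDEN;
        if (s.idempotencyKeyAlreadyProcessed()) return SubmitDecision.REPLAY_PREVIOUS_RESULT;
        if (!"DRAFT".equals(s.caseStatus()) || !s.requiredEvidenceAttached()) {
            return SubmitDecision.CONFLICT;
        }
        return SubmitDecision.ACCEPT;
    }
}
```

The order of the checks is an illustrative choice, not a rule from this lesson.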


2. State Taxonomy

Use the following taxonomy as a shared language.

| State Type | Example | Usually Stored In | Correctness Impact |
|---|---|---|---|
| Durable domain state | case status, file lifecycle, payment status | database, event store | Very high |
| Durable artifact state | object metadata, checksum, version ID | DB + object storage | High |
| Workflow state | saga step, BPMN token, escalation step | workflow engine, DB | Very high |
| Session state | web session, wizard progress, CSRF token | Redis, JDBC, signed token | Medium to high |
| Ephemeral processing state | temp file, chunk buffer, in-flight batch | memory, /tmp, emptyDir | Low if recoverable |
| Derived state | search index, read model, projection | Elasticsearch/OpenSearch, DB view | Medium, rebuildable |
| Cache state | permission cache, config cache, lookup cache | Redis, Caffeine, CDN | Depends on the domain |
| Operational state | leader lock, job checkpoint, rate limiter | DB, Redis, coordination store | High for operational correctness |
| Configuration state | active profile, feature flag, threshold | ConfigMap, config server, DB | High if it changes behavior |
| Secret state | credential version, token lease, cert | Vault, cloud secret manager, Kubernetes Secret | Very high |

A single service can use many kinds of state at once.

What makes the design hard is that each kind of state has a different lifecycle, owner, consistency requirement, and failure mode.


3. The State Placement Problem

Every time we create new state, we are making a placement decision.

The question:

Should this state live in memory, local disk, database, object storage,
event log, cache, workflow engine, config system, secret manager,
or an external coordination system?

There is no universal answer, only trade-offs.

| Placement | Good For | Do Not Use For |
|---|---|---|
| JVM heap | request-local computation, cheap cache | source of truth, long-running workflow |
| Static field / singleton memory | immutable lookup, metrics accumulator | mutable business state |
| ThreadLocal | limited request context | async boundaries without cleanup |
| Local filesystem | temp staging, transient buffer | committed domain state |
| Kubernetes emptyDir | scratch space while the pod lives | durable state that must outlive the pod |
| PersistentVolume | specific stateful workloads | arbitrary shared mutable storage between services |
| Relational DB | transactional domain state | primary storage for large binary payloads |
| Object storage | file/blob payload | high-frequency small mutations |
| Redis | cache, session, lock (with caution) | authoritative ledger without a clear durability model |
| Event log | ordered facts, replay | primary mutable-object queries without a projection |
| Workflow engine | long-running process state | low-level domain CRUD without process semantics |
| Config system | behavior knobs | user/domain data |
| Secret manager | credential/key material | non-sensitive config or business data |

The basic rule:

Correctness-critical state must live in a recoverable, observable,
and owned storage boundary.

4. Durable State

Durable state is state that must survive:

  • JVM restart;
  • pod restart;
  • node eviction;
  • rolling deployment;
  • traffic shift;
  • transient network partition;
  • operator restart;
  • scale down / scale up.

Examples:

Case.status = ESCALATED
EvidenceFile.lifecycleStatus = ACCEPTED
UploadSession.status = COMPLETED
IdempotencyKey.result = FILE-123
SecretRotation.status = NEW_VERSION_ACTIVE

Durable state usually requires:

  • transaction boundary;
  • schema;
  • concurrency control;
  • audit;
  • backup/restore;
  • migration strategy;
  • ownership.

4.1 Durable State Is Not Always SQL

A SQL database fits a lot of domain state very well, but it is not the only storage option.

| Durable State | Storage Candidate |
|---|---|
| case lifecycle | PostgreSQL |
| file payload | object storage |
| file metadata | PostgreSQL |
| immutable audit event | append-only log / audit DB |
| workflow token | workflow engine DB |
| event stream | Kafka / event store |
| search projection | search index, rebuildable |
| report artifact | object storage + metadata DB |

What matters is not whether the storage is popular. What matters is whether it fits the invariant.


5. Ephemeral State

Ephemeral state is state that may be lost without breaking correctness, or that can be reconstructed.

Examples:

  • a byte buffer during streaming;
  • a temporary upload chunk;
  • a local decompression directory;
  • an in-memory parsed config cache;
  • a short-lived HTTP request context;
  • a worker's current batch list;
  • a local file before it is sent to object storage.

A Kubernetes emptyDir volume is created when a Pod is assigned to a node, and its data is deleted permanently when the Pod is removed from that node. That makes it suitable for scratch space, not for a source of truth.

Invariant:

No business-critical committed state may exist only in ephemeral storage.

5.1 Ephemeral State Design Pattern

Use this pattern:

Durable intent -> ephemeral work -> durable result -> cleanup

Example for upload processing:

1. Create UploadSession row: INITIATED
2. Stream bytes to local temp file or object multipart upload
3. Compute checksum
4. Store payload in object storage
5. Update metadata: UPLOADED
6. Delete temp file

If the pod dies at step 3:

  • the UploadSession still exists;
  • its status can still be evaluated;
  • the temp file may be lost;
  • a reconciliation job can expire the session or ask the client to resume.
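
The six steps can be sketched as follows, assuming Java 17+. The two in-memory maps are stand-ins for a database table and an object store, and the class name `UploadFlowSketch` is hypothetical; a real service would use its own repository and storage client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class UploadFlowSketch {
    // Stand-ins for durable stores (assumption: DB table and object storage in production).
    final Map<String, String> sessionStatusById = new ConcurrentHashMap<>();
    final Map<String, byte[]> objectStore = new ConcurrentHashMap<>();

    String upload(String sessionId, byte[] payload) {
        sessionStatusById.put(sessionId, "INITIATED");            // 1. durable intent
        Path temp = null;
        try {
            temp = Files.createTempFile("upload-", ".part");      // 2. ephemeral work
            Files.write(temp, payload);
            byte[] staged = Files.readAllBytes(temp);
            String sha256 = HexFormat.of().formatHex(
                    MessageDigest.getInstance("SHA-256").digest(staged)); // 3. checksum
            objectStore.put(sessionId, staged);                   // 4. durable payload
            sessionStatusById.put(sessionId, "UPLOADED");         // 5. durable result
            return sha256;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        } finally {
            if (temp != null) {
                try {
                    Files.deleteIfExists(temp);                   // 6. cleanup, also on failure
                } catch (IOException ignored) {
                    // Leftover temp files are harmless scratch data.
                }
            }
        }
    }
}
```

If the method fails before step 5, the session stays `INITIATED`, which is exactly what a reconciliation job would look for.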

5.2 Do Not Keep Important Progress Only in Memory

Bad:

private final Map<String, UploadProgress> progressByUploadId = new ConcurrentHashMap<>();

Problems:

  • lost on restart;
  • invisible to other pods;
  • does not scale horizontally;
  • cannot be audited;
  • the client may be routed to a different pod;
  • memory leak if cleanup fails.

Better:

UploadSession persisted in DB
Part metadata persisted or derived from object store multipart state
Local progress only optimization

6. Derived State

Derived state is state that can be rebuilt from the source of truth.

Examples:

  • search index;
  • read model;
  • dashboard aggregate;
  • materialized view;
  • denormalized permission projection;
  • file inventory projection;
  • object metadata cache.

Derived state may be stale if the domain allows it. But it must have:

  • a clear source of truth;
  • rebuild process;
  • lag metric;
  • conflict handling;
  • versioning;
  • backfill strategy.

Invariant:

Derived state must either be correct enough for its use case
or explicitly marked as stale/incomplete.

6.1 Derived State Failure Mode

Suppose the Evidence Search Index is lagging.

A user searches for an evidence file, but the file does not show up.

Design questions:

  • may search results be stale?
  • does the detail view still read from the source of truth?
  • do important actions such as delete/approve use the index or the DB?
  • is index lag measured?
  • is there a rebuild job?

Rule:

Do not execute irreversible domain action based solely on rebuildable projection
unless the projection is explicitly authoritative for that action.
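
A minimal sketch of this rule, with in-memory collections standing in for the database and the search index. The class `EvidenceDeletionSketch` and its method are hypothetical; the point is only that the index nominates a candidate while the source of truth decides.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class EvidenceDeletionSketch {
    final Map<String, String> statusByFileId = new HashMap<>(); // stand-in for the DB (source of truth)
    final Set<String> quarantinedInIndex = new HashSet<>();     // stand-in for the rebuildable projection

    // Deletes a file only if the source of truth still says QUARANTINED.
    boolean deleteIfStillQuarantined(String fileId) {
        if (!quarantinedInIndex.contains(fileId)) {
            return false; // the projection did not nominate this file
        }
        String current = statusByFileId.get(fileId);
        if (!"QUARANTINED".equals(current)) {
            return false; // the projection was stale; do nothing irreversible
        }
        statusByFileId.remove(fileId);
        quarantinedInIndex.remove(fileId);
        return true;
    }
}
```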

7. Workflow State

Workflow state is state that describes the position of a long-running process.

Take a regulatory workflow as an example. Its workflow state often involves:

  • human task;
  • timer;
  • escalation;
  • external system response;
  • file attachment;
  • approval;
  • compensation;
  • audit.

A common mistake:

Workflow state is hidden in a combination of boolean columns.

A bad example:

is_submitted boolean
is_reviewed boolean
is_escalated boolean
is_closed boolean

Problems:

  • invalid combinations appear easily;
  • transitions are not explicit;
  • audit is weak;
  • the UI and workers interpret the flags differently;
  • adding a new state is hard.

Better:

status varchar not null
status_changed_at timestamp not null
status_reason varchar
version bigint not null

Combine this with a transition guard in the domain layer.
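
A transition guard could look like the sketch below. The status names beyond those used in this lesson and the transition table itself are assumptions for illustration; the real table comes from the domain rules.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum CaseStatus { DRAFT, SUBMITTED, UNDER_REVIEW, ESCALATED, CLOSED }

final class CaseStateMachine {
    // Hypothetical transition table: each status lists the statuses it may move to.
    private static final Map<CaseStatus, Set<CaseStatus>> ALLOWED = Map.of(
            CaseStatus.DRAFT, EnumSet.of(CaseStatus.SUBMITTED),
            CaseStatus.SUBMITTED, EnumSet.of(CaseStatus.UNDER_REVIEW),
            CaseStatus.UNDER_REVIEW, EnumSet.of(CaseStatus.ESCALATED, CaseStatus.CLOSED),
            CaseStatus.ESCALATED, EnumSet.of(CaseStatus.CLOSED),
            CaseStatus.CLOSED, EnumSet.noneOf(CaseStatus.class));

    // Returns the new status, or throws if the transition is not in the table.
    static CaseStatus transition(CaseStatus from, CaseStatus to) {
        if (!ALLOWED.get(from).contains(to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

With a single status column and one guard, an invalid combination such as "closed but not submitted" cannot be represented at all.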


8. Session State

Session state is state that links several requests from the same actor.

Examples:

  • login session;
  • CSRF token;
  • multi-step wizard;
  • temporary draft;
  • upload session;
  • OAuth authorization flow;
  • device trust.

Session state can be stored in:

  • signed cookie/token;
  • Redis;
  • JDBC;
  • server memory;
  • external session system.

Spring Session provides a model for moving the HTTP session into an external store such as JDBC or Redis. This helps when a Spring Boot application runs as multiple instances and the session must not be tied to a single JVM.

8.1 Session Placement Decision

| Pattern | Strengths | Risks |
|---|---|---|
| Server memory session | simple | does not scale horizontally, lost on restart |
| Sticky session | fewer cross-node reads | poor failover, load imbalance |
| Redis session | shared, fast | a Redis outage affects login/session |
| JDBC session | durable, familiar | higher latency |
| Signed token | stateless server | revocation, size, token leakage |
| Hybrid | flexible | governance complexity |

Rule:

If user journey correctness depends on session continuity,
then session state needs an explicit durability and failover model.

9. Cache as State

A cache is often treated as if it were not state. In practice, a cache is state with an expiry and a consistency contract.

Redis provides TTL/expire and eviction policies. That means a cache entry can disappear because of time or memory pressure, so every cache consumer needs a fallback and a correctness boundary.

Mandatory questions:

What happens if the cache returns stale value?
What happens if the cache misses?
What happens if the cache evicts hot key?
What happens if Redis is unavailable?

9.1 Cache Correctness Classes

| Class | Example | Stale Impact | Design |
|---|---|---|---|
| Performance cache | country list, static reference | low | long TTL acceptable |
| UX cache | dashboard aggregate | medium | show stale marker |
| Decision cache | pricing/risk threshold | high | short TTL + validation |
| Security cache | permission, revoked token | very high | bounded TTL + forced recheck |
| Coordination cache | distributed lock | very high | lease/fencing required |

For a security-sensitive cache, do not just say “a 5-minute TTL is enough”. Spell out what can go wrong within those 5 minutes.
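
As an illustration of “bounded TTL + forced recheck”, here is a minimal sketch assuming Java 17+. `PermissionCacheSketch`, its two methods, and the injected clock are hypothetical names; the source of truth is passed in as a plain `Predicate` rather than a real policy store.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PermissionCacheSketch {
    private record Entry(boolean allowed, Instant loadedAt) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Predicate<String> sourceOfTruth; // e.g. a lookup against the policy DB
    private final Supplier<Instant> clock;         // injected so tests can control time
    private final Duration ttl;

    PermissionCacheSketch(Predicate<String> sourceOfTruth, Supplier<Instant> clock, Duration ttl) {
        this.sourceOfTruth = sourceOfTruth;
        this.clock = clock;
        this.ttl = ttl;
    }

    // Cheap path for low-risk reads: the answer may be stale for at most ttl.
    boolean isAllowedCached(String key) {
        Instant now = clock.get();
        Entry entry = entries.get(key);
        if (entry == null || entry.loadedAt().plus(ttl).isBefore(now)) {
            entry = new Entry(sourceOfTruth.test(key), now);
            entries.put(key, entry);
        }
        return entry.allowed();
    }

    // Forced recheck for sensitive actions: bypass the cache, then refresh it.
    boolean isAllowedFresh(String key) {
        boolean allowed = sourceOfTruth.test(key);
        entries.put(key, new Entry(allowed, clock.get()));
        return allowed;
    }
}
```

The window between a revocation and the next reload in `isAllowedCached` is precisely the “what can go wrong in 5 minutes” that has to be written down.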


10. Operational State

Operational state is state that does not look like domain data but still determines how the system operates.

Examples:

  • worker checkpoint;
  • scheduler last run time;
  • distributed lock;
  • leader election state;
  • retry attempt count;
  • DLQ cursor;
  • rate limiter bucket;
  • circuit breaker state;
  • idempotency record;
  • migration marker.

Operational state often causes incidents because it is treated as “infrastructure” rather than as part of the domain.

10.1 Scheduled Job State

If a service has 5 pods and every pod runs the same scheduler:

@Scheduled(fixedDelay = 60000)
public void processExpiredUploads() {
    // process stale upload sessions
}

Questions:

  • may every pod run the job?
  • is the job idempotent?
  • is row locking safe?
  • is there leader election?
  • is duplicate processing acceptable?
  • is there a checkpoint?

A safe pattern:

Use DB row locking / advisory lock / queue partition / leader lease.
Make job idempotent anyway.

A distributed lock without idempotency is not a mature design. Locks can expire, networks can partition, and processes can pause.
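
The combination of a lease and an idempotent step can be sketched like this. Everything here is an in-memory stand-in: `ExpiredUploadJobSketch` is a hypothetical name, the lease would live in a database row or coordination store in practice, and fencing tokens are left out for brevity.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ExpiredUploadJobSketch {
    private String leaseHolder;
    private Instant leaseExpiresAt = Instant.MIN;
    // Stand-in for durable job results, e.g. session rows marked EXPIRED.
    final Set<String> expiredSessionIds = ConcurrentHashMap.newKeySet();

    // One holder at a time until the lease runs out; the holder may renew.
    synchronized boolean tryAcquireLease(String podName, Instant now, Duration leaseTime) {
        boolean heldByOther = leaseHolder != null
                && !leaseHolder.equals(podName)
                && now.isBefore(leaseExpiresAt);
        if (heldByOther) {
            return false;
        }
        leaseHolder = podName;
        leaseExpiresAt = now.plus(leaseTime);
        return true;
    }

    // Idempotent step: returns true only the first time a session is expired,
    // so a pod that wrongly believes it still holds the lease does no harm.
    boolean expireSession(String sessionId) {
        return expiredSessionIds.add(sessionId);
    }
}
```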


11. State Ownership

Every piece of state must have an owner.

Use an ownership matrix:

| State | Owner | Source of Truth | Mutation Authority | Rebuildable? |
|---|---|---|---|---|
| Case status | Case Service | PostgreSQL | Case domain service | No |
| Evidence file lifecycle | Evidence Service | PostgreSQL + object version | Evidence domain service | Partial |
| Upload progress | Evidence Service | DB/object multipart state | Upload service | Yes/expire |
| Search projection | Search Service | Index from events | Projection worker | Yes |
| Permission cache | Access Service | Redis from DB/policy | Access service | Yes |
| HTTP session | Auth/session service | Redis/JDBC/token | Auth layer | Depends |
| Worker checkpoint | Worker owner | DB/queue offset | Worker | Yes with replay |

Rule:

The service that reads state is not necessarily the service that owns state.

A consumer may cache or project state, but mutation authority must stay clear.


12. Consistency Model

State design must answer the consistency expectation.

| Model | Meaning | Example |
|---|---|---|
| Strong consistency | a read after a write sees the latest result | the case submit response reads the updated status |
| Read-your-writes | an actor sees their own changes | a user uploads a file and then sees it listed |
| Monotonic reads | a user never sees state go backwards | status does not return from accepted to uploading |
| Eventual consistency | the projection catches up later | search index after upload |
| Causal consistency | related changes appear in cause-and-effect order | file accepted after scan completed |
| Best-effort | may be lost or stale | non-critical metrics cache |

A mature microservice does not say “eventual consistency” about everything. It states which parts are eventual and which must be strong.

12.1 Example: File Upload Consistency

After POST /files/upload-complete succeeds:

  • the metadata detail endpoint must be able to read a file status of at least UPLOADED;
  • the search endpoint may not show the file until the indexer has run;
  • the download endpoint may not be available until the scan is accepted;
  • the audit endpoint must have an upload completion event;
  • the object storage payload must exist and its checksum must match.

This is a mix of consistency models.


13. State Mutation Patterns

13.1 Command-Based Mutation

Do not expose raw setters.

public record AcceptEvidenceFileCommand(
    String fileId,
    String scanId,
    String verifiedSha256,
    String actorId,
    String idempotencyKey
) {}

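Application service:

public final class EvidenceFileApplicationService {
    private final EvidenceFileRepository repository;
    private final AuditPublisher auditPublisher;
    private final IdempotencyStore idempotencyStore;

    public EvidenceFileApplicationService(
            EvidenceFileRepository repository,
            AuditPublisher auditPublisher,
            IdempotencyStore idempotencyStore) {
        this.repository = repository;
        this.auditPublisher = auditPublisher;
        this.idempotencyStore = idempotencyStore;
    }

    public EvidenceFile accept(AcceptEvidenceFileCommand command) {
        // A repeated command with the same key gets the stored result instead of a second run.
        return idempotencyStore.getOrCompute(command.idempotencyKey(), () -> {
            EvidenceFile file = repository.getForUpdate(command.fileId());
            // The entity itself checks whether this transition is allowed.
            file.accept(command.scanId(), command.verifiedSha256());
            repository.save(file);
            auditPublisher.fileAccepted(file.id(), command.actorId());
            return file;
        });
    }
}

State mutation is controlled by the use case, not by a repository setter.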

13.2 Optimistic Locking

To prevent lost updates:

update evidence_file
set status = ?, version = version + 1
where file_id = ? and version = ?;

If the affected row count is 0:

State changed concurrently. Reload and retry or return conflict.
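
The same compare-on-version idea can be shown without a database. This is a minimal in-memory sketch (Java 17+); `VersionedStoreSketch` and its methods are hypothetical, and `updateStatus` mimics the affected-row count that the SQL statement above would return.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class VersionedStoreSketch {
    record Row(String status, long version) {}

    private final Map<String, Row> rows = new ConcurrentHashMap<>();

    void insert(String fileId, String status) {
        rows.put(fileId, new Row(status, 0));
    }

    Row read(String fileId) {
        return rows.get(fileId);
    }

    // Mirrors "update ... where file_id = ? and version = ?":
    // returns 1 on success and 0 when the expected version no longer matches.
    int updateStatus(String fileId, String newStatus, long expectedVersion) {
        Row current = rows.get(fileId);
        if (current == null || current.version() != expectedVersion) {
            return 0;
        }
        Row next = new Row(newStatus, expectedVersion + 1);
        // Atomic compare-and-set: fails if another writer replaced the row meanwhile.
        return rows.replace(fileId, current, next) ? 1 : 0;
    }
}
```

Two writers that both read version 0 cannot both succeed: the second one gets 0 and must reload, then retry or report a conflict.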

13.3 Append-Only Event + Projection

For state that needs strong audit:

Command -> validate invariant -> append event -> update projection

Events do not replace every query model. Events are a log of facts. Projections make reads easy.
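
A minimal sketch of the append-then-project flow, assuming Java 17+. `EvidenceEventLogSketch` and `StatusChanged` are hypothetical names, the log is an in-memory list rather than Kafka or an event store, and invariant validation is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EvidenceEventLogSketch {
    record StatusChanged(String fileId, String newStatus) {}

    private final List<StatusChanged> log = new ArrayList<>();      // append-only facts
    private final Map<String, String> projection = new HashMap<>(); // read model

    // Appends the fact first, then updates the read model from it.
    void append(StatusChanged event) {
        log.add(event);
        apply(projection, event);
    }

    String currentStatus(String fileId) {
        return projection.get(fileId);
    }

    // Replaying the log from the start rebuilds an equivalent projection.
    Map<String, String> rebuild() {
        Map<String, String> rebuilt = new HashMap<>();
        for (StatusChanged event : log) {
            apply(rebuilt, event);
        }
        return rebuilt;
    }

    private static void apply(Map<String, String> target, StatusChanged event) {
        target.put(event.fileId(), event.newStatus());
    }
}
```

Because both the live projection and the rebuild go through the same `apply` method, a replay cannot drift from what was projected originally.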


14. Java-Specific State Pitfalls

14.1 Static Mutable State

Bad:

public final class CurrentTenantHolder {
    public static String currentTenant;
}

Problems:

  • shared between requests;
  • race conditions;
  • leaks between tenants;
  • not safe under concurrency;
  • unsuitable for async/reactive code.

Better:

  • pass an explicit TenantContext;
  • use a request-scoped context with disciplined cleanup;
  • for reactive code, use the appropriate context propagation.

14.2 Singleton Bean with Mutable Business State

Spring beans are singletons by default. If a bean keeps mutable per-request state in a field, concurrent requests overwrite each other's values and bugs follow.

Bad:

@Service
public class UploadService {
    private String currentUploadId;

    public void process(String uploadId) {
        this.currentUploadId = uploadId;
    }
}

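Better:

@Service
public class UploadService {
    public void process(String uploadId) {
        // Per-request state lives in a local object, never in a bean field.
        UploadContext context = new UploadContext(uploadId);
        doProcess(context);
    }

    private void doProcess(UploadContext context) {
        // work with context.uploadId() here
    }

    private record UploadContext(String uploadId) {}
}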

14.3 ThreadLocal Leak

A ThreadLocal leaks easily in a thread pool if it is not cleared.

try {
    TenantContextHolder.set(tenant);
    chain.doFilter(request, response);
} finally {
    TenantContextHolder.clear();
}

In async/reactive code, ThreadLocal is often not enough because execution can move between threads.
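
The leak is easy to reproduce with a single pooled thread. The sketch below is a small demonstration, not production code; `ThreadLocalLeakDemo` and `tenantSeenByNextTask` are hypothetical names. It runs two tasks on the same worker thread and reports what the second one observes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakDemo {
    static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    // Runs two "requests" on one pooled thread and returns what the second one sees.
    static String tenantSeenByNextTask(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                try {
                    TENANT.set("tenant-a");
                } finally {
                    if (cleanUp) {
                        TENANT.remove();
                    }
                }
            };
            Callable<String> secondRequest = TENANT::get;
            pool.submit(firstRequest).get();
            return pool.submit(secondRequest).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without cleanup the second task sees `tenant-a` even though it never set a tenant; with cleanup it sees `null`.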

14.4 In-Memory Cache Without Bounds

Bad:

Map<String, FileMetadata> cache = new ConcurrentHashMap<>();

Problems:

  • unbounded memory;
  • stale forever;
  • no eviction;
  • no metrics;
  • no invalidation;
  • OOM risk.

Use a cache library with:

  • maximum size/weight;
  • TTL;
  • metrics;
  • explicit invalidation;
  • fallback path;
  • correctness classification.
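
To show what a size bound alone looks like, here is a JDK-only sketch built on `LinkedHashMap` in access order. `BoundedLruCache` is a hypothetical name; it covers maximum size and explicit invalidation only, so TTL and metrics would still need a cache library such as Caffeine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedLruCache<K, V> {
    private final Map<K, V> entries;

    BoundedLruCache(int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    // LinkedHashMap is not thread-safe, and get() reorders entries, so every method locks.
    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized void invalidate(K key) {
        entries.remove(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```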

15. Kubernetes State Boundary

In Kubernetes, compute is easy to replace. State has to be placed deliberately.

15.1 Pod Local State

A pod can die because of:

  • deployment rollout;
  • node drain;
  • eviction;
  • OOM kill;
  • crash;
  • autoscaling scale down.

That is why pod-local state must be treated as disposable.

15.2 PersistentVolume

A PersistentVolume provides a durable storage abstraction in Kubernetes. It suits specific workloads that really need attached storage, especially stateful workloads. For stateless compute microservices, though, it is often better to use a managed database or object storage than to put domain state on a PVC local to the application.

15.3 StatefulSet

A StatefulSet is useful for applications that need a stable identity and persistent storage per pod. This is usually relevant for infrastructure workloads or specific stateful services, not a default for every microservice.

Decision rule:

Use StatefulSet when stable identity and pod-specific persistent state are part of the application model.
Do not use StatefulSet just because the service accidentally writes local state.

16. State Failure Modeling

For every piece of state, model the following failures.

| Failure | Question |
|---|---|
| Missing | What happens if the state is lost? |
| Stale | What happens if the state is out of date? |
| Duplicate | What happens if the state or event is duplicated? |
| Corrupt | How is corruption detected? |
| Divergent | What if the DB and object storage disagree? |
| Unauthorized mutation | Who could change it without the right to? |
| Partial commit | What happens if storage succeeds but the DB write fails? |
| Replay drift | Does a replay produce a different result? |
| Clock skew | Can TTL/expiry be wrong because of clocks? |
| Region split | Can the state differ between regions? |

Example for UploadSession:

| Failure | Mitigation |
|---|---|
| Session row missing | client must start over; clean up temp objects |
| Stale session | expire via reconciliation |
| Duplicate complete request | idempotency key |
| Corrupt part metadata | verify against object storage listing/checksum |
| Payload exists, metadata missing | orphan object scanner |
| Metadata exists, payload missing | invariant metric + recovery/reject |
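
The two divergence cases, payload without metadata and metadata without payload, reduce to set differences between the keys each store knows about. The sketch below assumes both key sets have already been loaded; `UploadReconcilerSketch` and `Report` are hypothetical names, and a real scanner would page through the database and the object storage listing.

```java
import java.util.Set;
import java.util.TreeSet;

final class UploadReconcilerSketch {
    record Report(Set<String> orphanObjects, Set<String> missingPayloads) {}

    // Compares keys known to the metadata DB with keys listed by object storage.
    static Report reconcile(Set<String> metadataKeys, Set<String> objectKeys) {
        Set<String> orphanObjects = new TreeSet<>(objectKeys);
        orphanObjects.removeAll(metadataKeys);   // payload exists, metadata missing

        Set<String> missingPayloads = new TreeSet<>(metadataKeys);
        missingPayloads.removeAll(objectKeys);   // metadata exists, payload missing

        return new Report(orphanObjects, missingPayloads);
    }
}
```

Orphan objects are candidates for cleanup, while missing payloads should raise an invariant metric, since they point to lost committed data.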

17. State Design Checklist

Before adding new state, answer:

  1. What is this state called?
  2. What does it mean semantically?
  3. Who owns it?
  4. What is its source of truth?
  5. Is it durable or ephemeral?
  6. Can it be reconstructed?
  7. Who has mutation authority?
  8. What transitions are allowed?
  9. What is its consistency requirement?
  10. What is its concurrency control?
  11. What is its idempotency boundary?
  12. What are its missing/stale/duplicate/corrupt failure modes?
  13. How is it observed?
  14. What is its retention/cleanup policy?
  15. What is its security boundary?

If you cannot answer these, the state is not ready for production.


18. Key Takeaways

  1. Stateless service does not mean state-free system.
  2. State is any condition that affects future behavior.
  3. Every state needs owner, source of truth, mutation boundary, recovery model, and observability.
  4. Ephemeral state is acceptable only if loss is safe or recoverable.
  5. Derived state must be rebuildable and have lag visibility.
  6. Cache is state with expiry and consistency risk.
  7. Operational state is still state; scheduler checkpoints, locks, retries, and idempotency records matter.
  8. Java singleton beans, static fields, ThreadLocal, and unbounded maps are common hidden state traps.
  9. Kubernetes makes compute replaceable; it does not magically make state safe.
  10. A mature service explicitly documents state placement and failure behavior.

In the next part, we take apart the myth that most often misleads microservice design: the stateless service myth.

