
Case Study - Saga and Workflow for Enforcement Lifecycle

Learn Java Microservices Design and Architect - Part 095

A production-grade case study of sagas and workflows for the enforcement lifecycle in a regulatory case-management domain, covering orchestration boundaries, human tasks, SLA timers, compensation, idempotency, workflow versioning, and a Java implementation model.


Part 095 — Case Study: Saga and Workflow for Enforcement Lifecycle

A correct microservices design for the enforcement domain does not start from a list of endpoints. It starts from a question: which business processes must remain correct even though each service can only guarantee its own local transaction?

In the previous part we established the capability map, service boundaries, service contracts, and data ownership model. Now we move up one level: how the enforcement process runs across services without a global distributed transaction.

In regulatory case management, a workflow design failure is rarely a technical error that shows up immediately. It takes more dangerous forms:

  • a case has been escalated but the reviewer task was never created,
  • evidence was received but never entered the eligibility assessment,
  • a party has been notified but the audit trail cannot explain the reason for the decision,
  • an SLA was missed but no timer triggered the escalation,
  • the final decision was issued twice because of a retry,
  • the process was cancelled but downstream services kept working on stale tasks,
  • a human task was completed after the policy changed and the system does not know which rule version was applied.

This is not a question of “Kafka or REST”. It is a question of business process correctness under partial failure.


1. Target Mental Model

For a domain like the enforcement lifecycle, do not picture the workflow as a pretty diagram. Picture it as durable process memory.

Domain services hold local truth:

  • Case Service stores case state and assignment,
  • Evidence Service stores evidence metadata and admissibility state,
  • Assessment Service stores the allegation assessment,
  • Decision Service stores the decision record,
  • Notification Service stores notification attempts and outcomes,
  • Audit Service stores the evidence chain and decision trail.

The workflow coordinator holds process progress:

  • which steps have completed,
  • which commands have been sent,
  • which events have been received,
  • which timeouts are running,
  • which compensations need to run,
  • which human tasks are pending,
  • which process/rule version is in use.

This separation matters.

Workflow state is not domain state.

The workflow may know that IssueDecisionActivity has completed, but the truth that the decision has been legally issued still belongs to Decision Service.


2. Enforcement Lifecycle as a Long-Running Business Transaction

A simplified enforcement lifecycle runs from case opening through jurisdiction classification, investigator assignment, evidence collection and validation, allegation assessment, legal review, decision issuance, party notification, and compliance monitoring.

This lifecycle is long and involves people, timers, policy, external parties, and state that cannot be changed arbitrarily. No single database transaction can protect all of it.

A healthy design therefore usually combines:

  1. a local transaction in each service,
  2. outbox/inbox for reliable messaging,
  3. a workflow/saga coordinator for process progress,
  4. idempotency for retry-safe commands,
  5. compensation/correction for side effects that cannot be rolled back,
  6. audit events for reconstructability.

3. Saga vs Workflow: Do Not Blur the Terms

In practice, “saga” and “workflow” are often used interchangeably. For enterprise design, distinguish them as follows:

| Concept | Focus | Best Suited For | Risk When Misapplied |
| --- | --- | --- | --- |
| Saga | Coordinating a cross-service business transaction through local transactions + compensation | cross-service processes that need eventual consistency | compensation is treated as a technical rollback when it must be a business action |
| Workflow | Durable orchestration of automated steps, human tasks, timers, SLAs, branching | long processes with visibility, tasks, timeouts, audit | the engine becomes a god service that takes over domain ownership |
| Process Manager | Application component that remembers process progress and sends the next command | lightweight orchestration without an external engine | a state machine hidden inside event handlers |
| Choreography | Services react to events without a central coordinator | simple, loosely coupled processes that need no global visibility | event soup, hard to audit, hard to tell when a process is stuck |
| BPMN/Durable Engine | Explicit process model that can be operated | human workflow, compliance, SLAs, strong retry/timer support | engine dependency is too heavy for a simple process |

For the enforcement lifecycle, a workflow engine or process manager makes more sense than pure choreography because the process has:

  • human tasks,
  • timers/SLAs,
  • escalation,
  • audit requirements,
  • policy-based branching,
  • process status that operators need to see,
  • recovery after a worker/service restart.

4. Reference Architecture

Points to note:

  • The workflow coordinator does not read another service's private database.
  • The workflow coordinator sends commands through an API or a command message.
  • Domain services publish events after their local transaction commits.
  • The workflow coordinator reacts to events or activity completions.
  • The audit service receives audit-worthy events from all services.
  • A human task is not a “manual side note”; the task is part of process state.

5. Ownership Boundary

5.1 Case Service

Responsible for:

  • case identity,
  • the main lifecycle state,
  • investigator assignment,
  • jurisdiction classification,
  • case closure reason,
  • case-level invariants.

Not responsible for:

  • evidence file details,
  • the legal reasoning of a decision,
  • notification delivery mechanics,
  • workflow timers.

5.2 Workflow Coordinator

Responsible for:

  • process instance,
  • step sequencing,
  • timer/SLA,
  • branching,
  • command correlation,
  • waiting for human tasks,
  • compensation orchestration,
  • process visibility.

Not responsible for:

  • domain truth owned by a service,
  • ownership of another service's database,
  • policy decisions that belong to the decision/policy service,
  • the permanent audit evidence store.

5.3 Audit Service

Responsible for:

  • append-only audit events,
  • evidence chain,
  • actor attribution,
  • decision trace,
  • reconstruction queries,
  • retention and immutability.

Not responsible for:

  • running the workflow,
  • repairing domain state,
  • deciding business policy.

6. Process Instance Identity

In a workflow domain, identity must be explicit.

| Identity | Example | Used For |
| --- | --- | --- |
| caseId | CASE-2026-000123 | primary domain identity |
| workflowInstanceId | ENF-WF-CASE-2026-000123-v1 | durable process instance |
| correlationId | request/command correlation | tracing across commands/events |
| causationId | the event that caused the next event | reconstructing the causal chain |
| idempotencyKey | command repeat protection | retry-safe action |
| taskId | human task identity | work queue and audit |
| decisionId | final decision identity | legal/audit record |

A good rule:

caseId is not a substitute for workflowInstanceId. One case can have several process instances: initial enforcement, appeal, remediation monitoring, reopening, or a correction workflow.


7. Saga Step Matrix

Before writing workflow code, build a matrix of the business steps.

| Step | Owner Service | Command | Success Event | Timeout | Compensation/Correction |
| --- | --- | --- | --- | --- | --- |
| Open case | Case Service | OpenCase | CaseOpened | 5s | close as intake error if not yet visible |
| Classify jurisdiction | Case Service / Policy | ClassifyCase | CaseClassified | 10s | mark classification failed, manual review |
| Assign investigator | Case Service | AssignInvestigator | InvestigatorAssigned | 10s | unassign / reassign |
| Create evidence request | Evidence Service | RequestEvidence | EvidenceRequestCreated | 10s | cancel evidence request |
| Wait for evidence | External/Human | n/a | EvidenceSubmitted | 14 days | escalate missing evidence |
| Validate evidence | Evidence Service | ValidateEvidence | EvidenceValidated | 30s | mark evidence review required |
| Assess allegations | Assessment Service | AssessAllegation | AllegationAssessed | 60s | mark assessment requires review |
| Create recommendation | Assessment Service | CreateRecommendation | RecommendationCreated | 30s | withdraw recommendation |
| Legal review | Human Task | CreateReviewTask | ReviewApproved / ReviewRejected | 7 days | escalate reviewer |
| Draft decision | Decision Service | DraftDecision | DecisionDrafted | 30s | void draft if not issued |
| Issue decision | Decision Service | IssueDecision | DecisionIssued | 30s | correction notice, not rollback |
| Notify parties | Notification Service | NotifyParties | PartiesNotified | 1 day | retry / alternate delivery / manual service |
| Start monitoring | Case Service | StartComplianceMonitoring | MonitoringStarted | 10s | manual monitoring setup |

This matrix is the core of the design. Without it, the workflow code turns into a hidden narrative.
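One way to keep the matrix from drifting away from the code is to encode it as data. The sketch below is only an illustration: `SagaStep`, `EnforcementSagaMatrix`, and the one-hour threshold for "long-running" are assumptions made for this example, and only four rows of the matrix are included.

```java
import java.time.Duration;
import java.util.List;

/** One row of the saga step matrix. Type and field names are illustrative. */
record SagaStep(
        String name,
        String ownerService,
        String command,        // null when the step only waits for an external event
        String successEvent,
        Duration timeout,
        String compensation) {

    /** Assumption: anything longer than an hour is a business wait, not a technical call. */
    boolean isLongRunning() {
        return timeout.compareTo(Duration.ofHours(1)) > 0;
    }
}

final class EnforcementSagaMatrix {
    private EnforcementSagaMatrix() {}

    /** A subset of the matrix, kept as data instead of being hidden in handler code. */
    static final List<SagaStep> STEPS = List.of(
        new SagaStep("Open case", "Case Service", "OpenCase", "CaseOpened",
            Duration.ofSeconds(5), "close as intake error if not yet visible"),
        new SagaStep("Wait for evidence", "External/Human", null, "EvidenceSubmitted",
            Duration.ofDays(14), "escalate missing evidence"),
        new SagaStep("Legal review", "Human Task", "CreateReviewTask", "ReviewApproved",
            Duration.ofDays(7), "escalate reviewer"),
        new SagaStep("Issue decision", "Decision Service", "IssueDecision", "DecisionIssued",
            Duration.ofSeconds(30), "correction notice, not rollback"));

    /** Steps that need a durable timer rather than an in-request timeout. */
    static List<String> longRunningStepNames() {
        return STEPS.stream()
            .filter(SagaStep::isLongRunning)
            .map(SagaStep::name)
            .toList();
    }
}
```

With the matrix as data, a review can check mechanically that every step names an owner, a timeout, and a compensation before any orchestration code is written.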


8. State Machine for the Enforcement Saga

In the enforcement domain, many compensations are not rollbacks.

Examples:

  • a decision that has been issued must not be “deleted”; a correction/amendment must be created,
  • a notification that has been sent cannot be recalled; a clarification notice must be sent,
  • evidence that has been received must not be deleted if the retention policy requires it to be kept,
  • an audit event must not be deleted; only a correction event can be added.

This is what separates enterprise/regulatory systems from a simple e-commerce demo.
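The state machine for this saga can be sketched as an enum with explicit forward transitions. The state names below are hypothetical, loosely derived from the success events in the step matrix. What the sketch demonstrates is that an issued decision has no transition back to a draft, only forward to a correction.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical saga states; a real process would derive them from its own step matrix. */
enum EnforcementState {
    OPENED, CLASSIFIED, EVIDENCE_REQUESTED, EVIDENCE_VALIDATED, ASSESSED,
    IN_LEGAL_REVIEW, DECISION_DRAFTED, DRAFT_VOIDED, DECISION_ISSUED,
    DECISION_CORRECTED, NOTIFIED, MONITORING, CLOSED_NO_JURISDICTION;

    /** States reachable from this one. Terminal states return an empty set. */
    Set<EnforcementState> next() {
        return switch (this) {
            case OPENED -> EnumSet.of(CLASSIFIED);
            case CLASSIFIED -> EnumSet.of(EVIDENCE_REQUESTED, CLOSED_NO_JURISDICTION);
            case EVIDENCE_REQUESTED -> EnumSet.of(EVIDENCE_VALIDATED);
            case EVIDENCE_VALIDATED -> EnumSet.of(ASSESSED);
            case ASSESSED -> EnumSet.of(IN_LEGAL_REVIEW);
            // A rejected review sends the case back to assessment for rework.
            case IN_LEGAL_REVIEW -> EnumSet.of(DECISION_DRAFTED, ASSESSED);
            // A draft can still be voided because it is not yet a legal fact.
            case DECISION_DRAFTED -> EnumSet.of(DECISION_ISSUED, DRAFT_VOIDED);
            // An issued decision is never rolled back; it can only be corrected.
            case DECISION_ISSUED -> EnumSet.of(NOTIFIED, DECISION_CORRECTED);
            case DECISION_CORRECTED -> EnumSet.of(NOTIFIED);
            case NOTIFIED -> EnumSet.of(MONITORING);
            case DRAFT_VOIDED, MONITORING, CLOSED_NO_JURISDICTION ->
                EnumSet.noneOf(EnforcementState.class);
        };
    }

    boolean canMoveTo(EnforcementState target) {
        return next().contains(target);
    }
}
```

A coordinator would call `canMoveTo` before applying a transition and route anything else to exception handling instead of silently overwriting state.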


9. Command Contract for Workflow Activities

Every activity command must carry an envelope.

public record WorkflowCommandEnvelope<T>(
    String commandId,
    String workflowInstanceId,
    String caseId,
    String correlationId,
    String causationId,
    String idempotencyKey,
    String actorId,
    String actorType,
    String policyVersion,
    Instant requestedAt,
    T payload
) {}

Example command:

public record IssueDecisionCommand(
    String decisionId,
    String caseId,
    String recommendationId,
    String approvedBy,
    String legalBasisCode,
    String decisionTemplateVersion,
    String reasonSummary
) {}

Activity interface:

public interface DecisionActivities {
    ActivityResult issueDecision(
        WorkflowCommandEnvelope<IssueDecisionCommand> command
    );

    ActivityResult voidDraft(
        WorkflowCommandEnvelope<VoidDecisionDraftCommand> command
    );
}

ActivityResult must distinguish business outcomes from technical outcomes.

public sealed interface ActivityResult {
    record Accepted(String resourceId, String eventId) implements ActivityResult {}
    record AlreadyApplied(String resourceId, String originalEventId) implements ActivityResult {}
    record Rejected(String reasonCode, String message) implements ActivityResult {}
    record NeedsManualIntervention(String reasonCode, String taskType) implements ActivityResult {}
    record TransientFailure(String reasonCode) implements ActivityResult {}
}

Why not a boolean?

Because the workflow needs to decide what happens next:

  • retry on a transient failure,
  • continue if already applied,
  • stop on a final rejection,
  • create a human task when manual intervention is needed.
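The mapping from result to next step can be sketched as a small policy function. `NextAction` and `ActivityOutcomePolicy` are hypothetical names introduced for this example, and `ActivityResult` is repeated only so the sketch compiles on its own. Sending an exhausted retry budget to a human task is likewise an assumption, not a rule from the text.

```java
/** Same shape as the ActivityResult above, repeated so this sketch is self-contained. */
sealed interface ActivityResult {
    record Accepted(String resourceId, String eventId) implements ActivityResult {}
    record AlreadyApplied(String resourceId, String originalEventId) implements ActivityResult {}
    record Rejected(String reasonCode, String message) implements ActivityResult {}
    record NeedsManualIntervention(String reasonCode, String taskType) implements ActivityResult {}
    record TransientFailure(String reasonCode) implements ActivityResult {}
}

/** Hypothetical vocabulary for what the coordinator does next. */
enum NextAction { CONTINUE, RETRY, STOP, CREATE_HUMAN_TASK }

final class ActivityOutcomePolicy {
    private ActivityOutcomePolicy() {}

    /**
     * @param attempt     number of attempts already made, starting at 1
     * @param maxAttempts retry budget for transient failures
     */
    static NextAction decide(ActivityResult result, int attempt, int maxAttempts) {
        if (result instanceof ActivityResult.Accepted
                || result instanceof ActivityResult.AlreadyApplied) {
            // A replayed command is as good as a fresh success.
            return NextAction.CONTINUE;
        }
        if (result instanceof ActivityResult.TransientFailure) {
            // Assumption: once the budget is spent, a person takes over.
            return attempt < maxAttempts ? NextAction.RETRY : NextAction.CREATE_HUMAN_TASK;
        }
        if (result instanceof ActivityResult.NeedsManualIntervention) {
            return NextAction.CREATE_HUMAN_TASK;
        }
        // Only Rejected remains: a final business rejection is never retried.
        return NextAction.STOP;
    }
}
```

A boolean would collapse `AlreadyApplied`, `Rejected`, and `TransientFailure` into the same `false`, which is exactly the information the coordinator needs to tell apart.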

10. Idempotency Rule per Activity

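The workflow engine may retry an activity. The network may time out. A worker may crash after the downstream call succeeded but before the response came back. Activities must therefore be idempotent.

public final class IssueDecisionActivity implements DecisionActivities {
    private final DecisionClient decisionClient;

    public IssueDecisionActivity(DecisionClient decisionClient) {
        this.decisionClient = decisionClient;
    }

    // voidDraft(...) is omitted here for brevity; it follows the same pattern.
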
    @Override
    public ActivityResult issueDecision(
            WorkflowCommandEnvelope<IssueDecisionCommand> command) {

        try {
            DecisionResponse response = decisionClient.issueDecision(
                command.idempotencyKey(),
                command.payload()
            );

            if (response.status() == DecisionResponse.Status.CREATED) {
                return new ActivityResult.Accepted(
                    response.decisionId(),
                    response.eventId()
                );
            }

            if (response.status() == DecisionResponse.Status.ALREADY_EXISTS) {
                return new ActivityResult.AlreadyApplied(
                    response.decisionId(),
                    response.eventId()
                );
            }

            return new ActivityResult.Rejected(
                response.reasonCode(),
                response.message()
            );
        } catch (TimeoutException ex) {
            return new ActivityResult.TransientFailure("DECISION_TIMEOUT");
        }
    }
}

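The downstream service must also persist idempotency state.

// The lookup below is not atomic on its own. This sketch assumes a unique constraint on
// the idempotency key, so a concurrent duplicate fails on insert and its transaction rolls back.
@Transactional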
public DecisionIssuedResult issueDecision(
        String idempotencyKey,
        IssueDecisionCommand command) {

    return idempotencyRepository.find(idempotencyKey)
        .map(existing -> DecisionIssuedResult.alreadyApplied(existing.resourceId(), existing.eventId()))
        .orElseGet(() -> {
            Decision decision = Decision.issue(
                DecisionId.of(command.decisionId()),
                CaseId.of(command.caseId()),
                command.legalBasisCode(),
                command.reasonSummary(),
                command.approvedBy()
            );

            decisionRepository.save(decision);

            OutboxEvent event = OutboxEvent.decisionIssued(decision);
            outboxRepository.save(event);

            idempotencyRepository.save(IdempotencyRecord.applied(
                idempotencyKey,
                decision.id().value(),
                event.eventId()
            ));

            return DecisionIssuedResult.created(decision.id().value(), event.eventId());
        });
}

Rule:

Idempotency must not live only in the workflow layer. The domain service that produces the side effect must enforce idempotency as well.
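The service-side half of that rule can be illustrated with an in-memory stand-in. `InMemoryDecisionService` and `IssueOutcome` are hypothetical names, and a `ConcurrentHashMap` replaces the database table and unique constraint a real service would use. The sketch shows that a replayed key returns the original outcome and never creates a second decision.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Outcome of an idempotent command; replayed is true when the key was seen before. */
record IssueOutcome(String decisionId, String eventId, boolean replayed) {}

/** In-memory stand-in for a domain service with a dedupe store. Not production code. */
final class InMemoryDecisionService {
    private final Map<String, IssueOutcome> appliedByKey = new ConcurrentHashMap<>();
    private final AtomicInteger issuedCount = new AtomicInteger();

    IssueOutcome issueDecision(String idempotencyKey, String decisionId) {
        boolean[] firstTime = {false};
        // computeIfAbsent runs the side effect at most once per key, even under contention.
        IssueOutcome stored = appliedByKey.computeIfAbsent(idempotencyKey, key -> {
            firstTime[0] = true;
            int sequence = issuedCount.incrementAndGet();
            return new IssueOutcome(decisionId, "evt-" + sequence, false);
        });
        if (firstTime[0]) {
            return stored;
        }
        // Replay: hand back the original identifiers so the caller can link the retry.
        return new IssueOutcome(stored.decisionId(), stored.eventId(), true);
    }

    int issuedCount() {
        return issuedCount.get();
    }
}
```

The caller sees the same `decisionId` and `eventId` on both calls, which lets a workflow continue safely after a lost response.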


11. Workflow Pseudocode: Temporal-Style Durable Execution

This is not framework-specific guidance. The goal is to show the shape of durable orchestration.

public final class EnforcementLifecycleWorkflowImpl implements EnforcementLifecycleWorkflow {

    private final CaseActivities caseActivities;
    private final EvidenceActivities evidenceActivities;
    private final AssessmentActivities assessmentActivities;
    private final DecisionActivities decisionActivities;
    private final NotificationActivities notificationActivities;
    private final HumanTaskActivities humanTaskActivities;
    private final AuditActivities auditActivities;

    @Override
    public void run(EnforcementWorkflowInput input) {
        String workflowId = input.workflowInstanceId();
        String caseId = input.caseId();

        auditActivities.recordMilestone(milestone(workflowId, caseId, "WORKFLOW_STARTED"));

        ActivityResult opened = caseActivities.openCase(envelope(input, new OpenCaseCommand(caseId)));
        requireSuccessOrAlreadyApplied(opened, "OPEN_CASE");

        ActivityResult classified = caseActivities.classifyCase(envelope(input, new ClassifyCaseCommand(caseId)));
        if (isNoJurisdiction(classified)) {
            caseActivities.closeCase(envelope(input, CloseCaseCommand.noJurisdiction(caseId)));
            auditActivities.recordMilestone(milestone(workflowId, caseId, "CLOSED_NO_JURISDICTION"));
            return;
        }

        caseActivities.assignInvestigator(envelope(input, new AssignInvestigatorCommand(caseId)));

        evidenceActivities.requestEvidence(envelope(input, new RequestEvidenceCommand(caseId)));

        EvidenceSubmitted submitted = waitForEvidenceOrEscalate(caseId, Duration.ofDays(14));

        ActivityResult validated = evidenceActivities.validateEvidence(
            envelope(input, new ValidateEvidenceCommand(caseId, submitted.evidenceBatchId()))
        );
        requireSuccessOrManualIntervention(validated, "VALIDATE_EVIDENCE");

        ActivityResult assessed = assessmentActivities.assessAllegations(
            envelope(input, new AssessAllegationsCommand(caseId, submitted.evidenceBatchId()))
        );
        requireSuccessOrManualIntervention(assessed, "ASSESS_ALLEGATIONS");

        RecommendationCreated recommendation = assessmentActivities.createRecommendation(
            envelope(input, new CreateRecommendationCommand(caseId))
        );

        ReviewOutcome review = humanTaskActivities.createAndWaitForReviewTask(
            new LegalReviewTaskRequest(
                workflowId,
                caseId,
                recommendation.recommendationId(),
                Duration.ofDays(7)
            )
        );

        if (review.rejected()) {
            auditActivities.recordMilestone(milestone(workflowId, caseId, "LEGAL_REVIEW_REWORK"));
            assessmentActivities.requestRework(envelope(input, new RequestAssessmentReworkCommand(caseId, review.reason())));
            // Real production design would loop with bounded attempt / versioned state.
            return;
        }

        DecisionDrafted draft = decisionActivities.draftDecision(
            envelope(input, new DraftDecisionCommand(caseId, recommendation.recommendationId()))
        );

        ActivityResult issued = decisionActivities.issueDecision(
            envelope(input, new IssueDecisionCommand(
                draft.decisionId(),
                caseId,
                recommendation.recommendationId(),
                review.approvedBy(),
                review.legalBasisCode(),
                review.templateVersion(),
                review.reasonSummary()
            ))
        );
        requireSuccessOrAlreadyApplied(issued, "ISSUE_DECISION");

        notificationActivities.notifyParties(
            envelope(input, new NotifyPartiesCommand(caseId, draft.decisionId()))
        );

        caseActivities.startComplianceMonitoring(
            envelope(input, new StartMonitoringCommand(caseId, draft.decisionId()))
        );

        auditActivities.recordMilestone(milestone(workflowId, caseId, "WORKFLOW_COMPLETED"));
    }
}

Deliberate choices:

  • the workflow never updates another service's database directly,
  • the workflow waits for events, human tasks, and timers explicitly,
  • activity results are read as business outcomes,
  • the idempotency key travels from the workflow to the service,
  • audit milestones are recorded at important transitions,
  • timeouts lead to escalation, not silent failure.
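The pseudocode calls a `waitForEvidenceOrEscalate` helper without defining it. The sketch below is one possible shape, with loud caveats: it uses a plain `CompletableFuture` as an in-memory stand-in for a durable signal, and it returns a hypothetical `EvidenceWait` result instead of the `EvidenceSubmitted` type used above. A real durable engine would persist both the timer and the signal so the wait survives a worker restart.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Result of the wait: either the evidence arrived or the SLA timer won. */
sealed interface EvidenceWait {
    record Submitted(String evidenceBatchId) implements EvidenceWait {}
    record Escalated(String caseId, Duration waited) implements EvidenceWait {}
}

final class EvidenceWaiter {
    private EvidenceWaiter() {}

    /**
     * Waits for the evidence signal up to the SLA. A timeout is returned as an explicit
     * business outcome (escalation) instead of being thrown as a technical error.
     */
    static EvidenceWait waitForEvidenceOrEscalate(
            String caseId, CompletableFuture<String> evidenceSignal, Duration sla) {
        try {
            String batchId = evidenceSignal.get(sla.toMillis(), TimeUnit.MILLISECONDS);
            return new EvidenceWait.Submitted(batchId);
        } catch (TimeoutException slaBreached) {
            return new EvidenceWait.Escalated(caseId, sla);
        } catch (ExecutionException failed) {
            throw new IllegalStateException(
                "Evidence signal failed for " + caseId, failed.getCause());
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(
                "Interrupted while waiting for " + caseId, interrupted);
        }
    }
}
```

Because both branches are values of one sealed type, the caller has to handle the escalation path explicitly, which is the opposite of a silent timeout.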

12. BPMN-Style View for Human Workflow

With BPMN-style orchestration, the same lifecycle is modeled as explicit service tasks, user tasks, gateways, and timer events.

BPMN helps communication with business and regulatory stakeholders. But do not use BPMN as a reason to move every domain rule into the workflow diagram. Domain rules stay in the domain service or policy service.


13. Timer and SLA Design

A timer is not just a scheduler. A timer is a business promise.

| Timer | Trigger | Action | Audit Need |
| --- | --- | --- | --- |
| Evidence submission SLA | evidence request created + 14 days | create escalation task | who/what was delayed |
| Legal review SLA | review task created + 7 days | escalate to supervisor | reviewer, queue, due date |
| Notification delivery SLA | decision issued + 1 day | alternate delivery/manual service | attempted channels |
| Compliance monitoring start | immediately after decision issued | start monitoring process | monitoring responsibility |
| Appeal window | decision notified + statutory period | allow/close appeal intake | notification proof |

Java representation:

public record ProcessTimer(
    String timerId,
    String workflowInstanceId,
    String caseId,
    String timerType,
    Instant dueAt,
    String businessReason,
    String escalationPolicyVersion
) {}

Timer rule:

A timer must produce a domain-visible action or an explicit no-op reason. Silent timer expiry is a defect.
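A minimal sketch of that rule follows, with hypothetical names (`TimerOutcome`, `TimerExpiryHandler`) and hypothetical action and reason codes. Every call yields a value that can be audited: the timer is not yet due, it fired and produced an action, or it fired and was deliberately a no-op with a stated reason.

```java
import java.time.Instant;

/** What a timer check produced. Each variant is something an audit trail can record. */
sealed interface TimerOutcome {
    record NotDue(String timerId) implements TimerOutcome {}
    record ActionTaken(String timerId, String action) implements TimerOutcome {}
    record NoOp(String timerId, String reason) implements TimerOutcome {}
}

final class TimerExpiryHandler {
    private TimerExpiryHandler() {}

    /**
     * @param awaitedEventArrived true when the event the timer was guarding
     *                            (for example EvidenceSubmitted) has already been received
     */
    static TimerOutcome onTick(
            String timerId, Instant dueAt, Instant now, boolean awaitedEventArrived) {
        if (now.isBefore(dueAt)) {
            return new TimerOutcome.NotDue(timerId);
        }
        if (awaitedEventArrived) {
            // Expiry without action is allowed only with an explicit reason.
            return new TimerOutcome.NoOp(timerId, "AWAITED_EVENT_ALREADY_RECEIVED");
        }
        return new TimerOutcome.ActionTaken(timerId, "CREATE_ESCALATION_TASK");
    }
}
```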


14. Human Task as a First-Class State

Human tasks are often treated as just a row in a work-queue table. For a compliance workflow, a task must have a contract.

public record HumanTaskCreated(
    String taskId,
    String workflowInstanceId,
    String caseId,
    String taskType,
    String assignedRole,
    String assignedUserId,
    Instant createdAt,
    Instant dueAt,
    String decisionRequired,
    Map<String, String> requiredEvidenceReferences,
    String policyVersion
) {}

Human task completion:

public record LegalReviewCompleted(
    String taskId,
    String caseId,
    String reviewerId,
    boolean approved,
    String reasonCode,
    String reasonText,
    String legalBasisCode,
    String policyVersion,
    Instant completedAt
) {}

Important invariants:

  • task completion must be idempotent,
  • task must reject stale completion if case/process already moved on,
  • task must capture actor and reason,
  • task must include policy/rule version,
  • task completion must emit audit-worthy event,
  • task assignment and reassignment must be logged.
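The first three invariants can be sketched with an in-memory task. `LegalReviewTask` and `CompletionResult` are hypothetical names. A real implementation would persist the task, compare policy versions, and emit an audit event on every call, all of which are left out here.

```java
/** Possible answers to a completion attempt. Names are illustrative. */
enum CompletionResult { COMPLETED, ALREADY_COMPLETED, REJECTED_STALE, REJECTED_CONFLICT }

/** In-memory sketch of a review task that enforces completion invariants. */
final class LegalReviewTask {
    private final String taskId;
    private boolean processMovedOn;   // set when the case/process no longer waits on this task
    private String completedBy;       // null until the task is completed
    private boolean approved;
    private String reasonCode;

    LegalReviewTask(String taskId) {
        this.taskId = taskId;
    }

    String taskId() {
        return taskId;
    }

    synchronized void markProcessMovedOn() {
        processMovedOn = true;
    }

    synchronized CompletionResult complete(String reviewerId, boolean approve, String reason) {
        // Actor and reason are mandatory: a completion without them cannot be audited.
        if (reviewerId == null || reviewerId.isBlank() || reason == null || reason.isBlank()) {
            throw new IllegalArgumentException("reviewerId and reason are required");
        }
        if (completedBy != null) {
            // Idempotent replay of the same submission is fine; a different one is a conflict.
            boolean sameSubmission = completedBy.equals(reviewerId) && approved == approve;
            return sameSubmission
                ? CompletionResult.ALREADY_COMPLETED
                : CompletionResult.REJECTED_CONFLICT;
        }
        if (processMovedOn) {
            return CompletionResult.REJECTED_STALE;
        }
        completedBy = reviewerId;
        approved = approve;
        reasonCode = reason;
        return CompletionResult.COMPLETED;
    }
}
```

A double click in the review UI then yields `ALREADY_COMPLETED`, while a second reviewer submitting the opposite outcome is rejected instead of silently overwriting the first.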

15. Compensation in Regulatory Domain

Compensation is not “undo”. Compensation is a business action that brings the system back to a valid state.

| Failed Step | Possible Compensation | Why Not Simple Rollback |
| --- | --- | --- |
| Evidence request created wrongly | cancel evidence request and notify requester | external party may have seen request |
| Recommendation created with wrong evidence | withdraw recommendation and create corrected recommendation | recommendation may be referenced by review |
| Decision draft wrong | void draft | draft may have audit trail |
| Decision issued wrongly | issue correction/amendment/revocation process | issued decision is legal fact |
| Notification sent to wrong party | notify correction, trigger privacy incident | cannot unsend |
| Monitoring started wrong | close monitoring with reason | monitoring state may have generated tasks |

Compensation should have its own event.

public record DecisionCorrectedEvent(
    String eventId,
    String decisionId,
    String caseId,
    String correctionId,
    String correctedBy,
    String correctionReason,
    String originalDecisionEventId,
    Instant correctedAt
) {}

Do not mutate history. Add correction history.
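As a rough illustration of correction history, the sketch below keeps an append-only list of decision events and derives the current view by replaying them. `DecisionEvent` and `DecisionHistory` are hypothetical and much smaller than the `DecisionCorrectedEvent` above. The in-memory list stands in for an append-only store.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Minimal event vocabulary for one decision. Names are illustrative. */
sealed interface DecisionEvent {
    String eventId();

    record Issued(String eventId, String decisionId, String reasonSummary, Instant at)
        implements DecisionEvent {}

    record Corrected(String eventId, String decisionId, String originalEventId,
                     String correctedReasonSummary, String correctionReason, Instant at)
        implements DecisionEvent {}
}

/** Append-only history: there is deliberately no update or delete operation. */
final class DecisionHistory {
    private final List<DecisionEvent> events = new ArrayList<>();

    void append(DecisionEvent event) {
        if (event instanceof DecisionEvent.Corrected correction) {
            // A correction must point at an event that actually exists in the history.
            String original = correction.originalEventId();
            boolean known = events.stream().anyMatch(e -> e.eventId().equals(original));
            if (!known) {
                throw new IllegalArgumentException("Unknown original event: " + original);
            }
        }
        events.add(event);
    }

    /** Read-only copy of the full history, including superseded facts. */
    List<DecisionEvent> events() {
        return List.copyOf(events);
    }

    /** Current reason summary: the latest correction wins, the original stays on record. */
    String currentReasonSummary() {
        String current = null;
        for (DecisionEvent event : events) {
            if (event instanceof DecisionEvent.Issued issued) {
                current = issued.reasonSummary();
            } else if (event instanceof DecisionEvent.Corrected corrected) {
                current = corrected.correctedReasonSummary();
            }
        }
        return current;
    }
}
```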


16. Process Versioning

Long-running workflows survive deployments. That means workflow versioning is mandatory.

Possible changes:

  • new review step added,
  • evidence SLA changed from 14 days to 10 days,
  • decision policy changed,
  • notification channel changed,
  • additional approval required for high-risk case,
  • old state no longer maps to new process model.

Versioning rule:

| Change Type | Strategy |
| --- | --- |
| additive step for new cases only | new workflow version |
| SLA duration changed | store due date at timer creation; do not recalculate blindly |
| rule changed | capture policy version in command/task/audit |
| payload changed | support backward-compatible event/command evolution |
| compensation changed | preserve old compensation behavior for old process instances |
| old process bug | explicit migration workflow or manual intervention |

Do not assume all active workflow instances can use latest code safely.
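The "store due date at timer creation" strategy can be sketched as follows. The version labels `v1` and `v2` and the types `EvidenceTimer` and `EvidenceSlaPolicy` are assumptions for this example. The durations mirror the 14-day to 10-day change mentioned above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

/** A timer whose due date is fixed when it is created, together with the version used. */
record EvidenceTimer(String workflowInstanceId, String processVersion, Instant dueAt) {}

final class EvidenceSlaPolicy {
    private EvidenceSlaPolicy() {}

    // Hypothetical versions: v1 used 14 days, v2 shortens the SLA to 10 days.
    private static final Map<String, Duration> SLA_BY_VERSION =
        Map.of("v1", Duration.ofDays(14), "v2", Duration.ofDays(10));

    /** Computes the due date once, from the version the process instance started with. */
    static EvidenceTimer createTimer(
            String workflowInstanceId, String processVersion, Instant requestedAt) {
        Duration sla = SLA_BY_VERSION.get(processVersion);
        if (sla == null) {
            throw new IllegalArgumentException("Unknown process version: " + processVersion);
        }
        return new EvidenceTimer(workflowInstanceId, processVersion, requestedAt.plus(sla));
    }

    /** Breach is judged against the stored due date, never against the latest SLA table. */
    static boolean isBreached(EvidenceTimer timer, Instant now) {
        return !now.isBefore(timer.dueAt());
    }
}
```

An instance started under `v1` keeps its 14-day promise after `v2` is deployed, so nobody is retroactively declared late by a policy they were never subject to.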


17. Workflow Event Handling

A workflow should correlate events explicitly.

public record WorkflowSignal(
    String workflowInstanceId,
    String caseId,
    String eventId,
    String eventType,
    String causationId,
    Instant occurredAt,
    JsonNode payload
) {}

Event correlation rules:

  • event must include caseId,
  • workflow subscription must map event to process instance,
  • duplicate event must be ignored safely,
  • stale event must be recorded and ignored or routed to exception handling,
  • unexpected event must not crash the whole worker,
  • event must be auditable if it changes process progress.

Example stale event:

EvidenceSubmitted arrives after case has been closed for no jurisdiction.

Correct behavior is not “throw exception and retry forever”. It should become a controlled business exception:

  • mark event as late,
  • notify evidence team,
  • create audit event,
  • apply retention/privacy policy,
  • do not reopen case automatically unless rule says so.
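The correlation rules can be sketched as a router that classifies each incoming event before anything touches process state. `SignalRouter` and `SignalDisposition` are hypothetical names. The maps and sets stand in for the subscription table and inbox a real coordinator would persist.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** How an incoming event was handled. Every value is worth an audit entry. */
enum SignalDisposition { APPLIED, DUPLICATE_IGNORED, LATE_RECORDED, UNCORRELATED }

/** In-memory sketch of event-to-workflow correlation with dedupe and late-event handling. */
final class SignalRouter {
    private final Map<String, String> workflowByCase;   // caseId -> workflowInstanceId
    private final Set<String> closedWorkflows;          // instances that already ended
    private final Set<String> seenEventIds = new HashSet<>();

    SignalRouter(Map<String, String> workflowByCase, Set<String> closedWorkflows) {
        this.workflowByCase = Map.copyOf(workflowByCase);
        this.closedWorkflows = Set.copyOf(closedWorkflows);
    }

    synchronized SignalDisposition route(String eventId, String caseId) {
        // Set.add returns false when the event id was already recorded.
        if (!seenEventIds.add(eventId)) {
            return SignalDisposition.DUPLICATE_IGNORED;
        }
        String workflowInstanceId = workflowByCase.get(caseId);
        if (workflowInstanceId == null) {
            // Unexpected events are classified, not thrown, so the worker keeps running.
            return SignalDisposition.UNCORRELATED;
        }
        if (closedWorkflows.contains(workflowInstanceId)) {
            // Late event: record it and hand it to exception handling, do not reopen the case.
            return SignalDisposition.LATE_RECORDED;
        }
        return SignalDisposition.APPLIED;
    }
}
```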

18. Workflow Persistence and Data Ownership

If using an engine, engine persistence stores process state. If using a custom process manager, you store your own state.

But either way:

  • workflow persistence is not a reporting database,
  • workflow state is not source of truth for domain facts,
  • workflow should not become a replacement for service catalog,
  • workflow history can be sensitive and needs retention policy,
  • workflow state needs backup/DR if it drives legal process.

For custom process manager:

CREATE TABLE enforcement_process_instance (
    workflow_instance_id VARCHAR(80) PRIMARY KEY,
    case_id VARCHAR(60) NOT NULL,
    process_version VARCHAR(40) NOT NULL,
    state VARCHAR(80) NOT NULL,
    status VARCHAR(40) NOT NULL,
    current_step VARCHAR(80),
    correlation_id VARCHAR(80),
    started_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    completed_at TIMESTAMP NULL,
    last_event_id VARCHAR(80),
    version BIGINT NOT NULL
);

CREATE TABLE enforcement_process_event_inbox (
    event_id VARCHAR(80) PRIMARY KEY,
    workflow_instance_id VARCHAR(80) NOT NULL,
    event_type VARCHAR(120) NOT NULL,
    received_at TIMESTAMP NOT NULL,
    processed_at TIMESTAMP NULL,
    processing_status VARCHAR(40) NOT NULL
);

This is not enough for a full workflow engine, but enough to show the architectural shape.


19. Error Handling Policy

Workflow error handling must classify failures.

| Failure | Example | Workflow Response |
| --- | --- | --- |
| transient technical | HTTP timeout to Evidence Service | retry with backoff/budget |
| duplicate | same task completion submitted twice | accept already-applied |
| business rejection | no jurisdiction | branch/close case |
| policy conflict | rule version mismatch | create manual policy review task |
| stale event | evidence submitted after closure | late event handling |
| unknown outcome | timeout after command may have succeeded | query by idempotency key before retry |
| non-compensatable side effect | notification sent wrong | correction/incident workflow |
| systemic failure | downstream unavailable for hours | pause workflow, alert, runbook |

Never retry business rejection as if it were transient failure.
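A retry budget with capped exponential backoff is one way to make that classification executable. The sketch assumes three failure kinds and hypothetical names (`FailureKind`, `RetryDecision`, `RetryBudget`). It only computes the decision and delay; it does not sleep or schedule anything.

```java
import java.time.Duration;

/** A deliberately small failure taxonomy for this sketch. */
enum FailureKind { TRANSIENT, BUSINESS_REJECTION, UNKNOWN_OUTCOME }

/** Whether to retry, how long to wait first, and a note for the operator or audit trail. */
record RetryDecision(boolean retry, Duration delay, String note) {}

/** Retry budget with exponential backoff capped at maxDelay. */
record RetryBudget(int maxAttempts, Duration baseDelay, Duration maxDelay) {

    /** @param attemptsSoFar attempts already made, starting at 1 after the first failure */
    RetryDecision decide(FailureKind kind, int attemptsSoFar) {
        if (kind == FailureKind.BUSINESS_REJECTION) {
            // A business answer does not change by asking again.
            return new RetryDecision(false, Duration.ZERO, "branch on the rejection");
        }
        if (attemptsSoFar >= maxAttempts) {
            return new RetryDecision(false, Duration.ZERO, "retry budget exhausted, escalate");
        }
        // Delay doubles per attempt: base, 2x base, 4x base, ... (exponent clamped for safety).
        int exponent = Math.max(0, Math.min(attemptsSoFar - 1, 20));
        Duration delay = baseDelay.multipliedBy(1L << exponent);
        if (delay.compareTo(maxDelay) > 0) {
            delay = maxDelay;
        }
        String note = kind == FailureKind.UNKNOWN_OUTCOME
            ? "retry with the same idempotency key"
            : "retry with backoff";
        return new RetryDecision(true, delay, note);
    }
}
```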


20. Unknown Outcome Handling

The hardest failure is not success or failure. It is “I don’t know whether it succeeded.”

Example:

  1. Workflow sends IssueDecision.
  2. Decision Service commits decision and outbox event.
  3. Network timeout occurs before workflow receives response.
  4. Workflow retries.

Correct behavior:

  • retry uses same idempotency key,
  • Decision Service returns AlreadyApplied,
  • workflow continues safely,
  • audit trail links retry to original command.

Incorrect behavior:

  • create second decision,
  • fail workflow permanently,
  • manual operator edits database,
  • drop event as duplicate without preserving trace.

21. Java Package Structure

Example service package for workflow coordinator:

com.acme.enforcement.workflow
├── api
│   ├── EnforcementWorkflowController.java
│   └── dto
├── application
│   ├── StartEnforcementWorkflowHandler.java
│   ├── SignalWorkflowEventHandler.java
│   └── QueryWorkflowStatusHandler.java
├── domain
│   ├── EnforcementProcess.java
│   ├── EnforcementProcessState.java
│   ├── ProcessTimer.java
│   ├── ProcessTransition.java
│   └── WorkflowPolicy.java
├── orchestration
│   ├── EnforcementLifecycleWorkflow.java
│   ├── activities
│   │   ├── CaseActivities.java
│   │   ├── EvidenceActivities.java
│   │   ├── AssessmentActivities.java
│   │   ├── DecisionActivities.java
│   │   └── NotificationActivities.java
│   └── signals
│       ├── EvidenceSubmittedSignal.java
│       └── LegalReviewCompletedSignal.java
├── infrastructure
│   ├── clients
│   ├── persistence
│   ├── messaging
│   └── telemetry
└── config

If using external engine SDK, keep SDK-specific code under infrastructure or orchestration adapter, not scattered across domain/application code.


22. Observability Contract for Workflow

Each workflow instance should expose:

  • workflow status,
  • current state,
  • waiting reason,
  • current task/timer,
  • last successful command,
  • last failed command,
  • retry count,
  • compensation state,
  • policy version,
  • assigned actor/role for current human task,
  • due date/SLA risk,
  • trace/correlation ID.

Metrics:

| Metric | Meaning |
| --- | --- |
| workflow_started_total | new enforcement workflows |
| workflow_completed_total | successful completion |
| workflow_failed_total | failed workflows |
| workflow_stuck_total | workflows without progress beyond threshold |
| workflow_step_duration_seconds | duration by step |
| human_task_age_seconds | age of open human tasks |
| workflow_compensation_total | compensation frequency |
| workflow_late_event_total | late/stale events |
| workflow_retry_total | activity retry count |

This will be expanded in Part 096.
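As a small preview, the source of `workflow_stuck_total` can be sketched as a filter over process snapshots. `WorkflowSnapshot`, `StuckWorkflowDetector`, and the `"RUNNING"` status value are assumptions for this example. A single global threshold is also a simplification, since a workflow waiting on a 7-day human task is not stuck after one hour.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Minimal view of a process instance, loosely following the persistence table columns. */
record WorkflowSnapshot(String workflowInstanceId, String status, Instant updatedAt) {}

final class StuckWorkflowDetector {
    private StuckWorkflowDetector() {}

    /** Returns ids of running workflows whose last progress is older than the threshold. */
    static List<String> findStuck(
            List<WorkflowSnapshot> snapshots, Instant now, Duration threshold) {
        return snapshots.stream()
            .filter(snapshot -> snapshot.status().equals("RUNNING"))
            .filter(snapshot ->
                Duration.between(snapshot.updatedAt(), now).compareTo(threshold) > 0)
            .map(WorkflowSnapshot::workflowInstanceId)
            .toList();
    }
}
```

In practice the threshold would be chosen per waiting state, so that each state in the process model has its own definition of "no progress".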


23. Testing Strategy

Test the workflow as a state machine, not just controller/unit methods.

23.1 Golden Path

  • case opened,
  • classified jurisdiction valid,
  • evidence submitted before SLA,
  • legal review approved,
  • decision issued,
  • parties notified,
  • monitoring started.

23.2 Business Branches

  • no jurisdiction,
  • insufficient evidence,
  • legal review rejected,
  • evidence SLA breached,
  • appeal received,
  • high-risk case requires second approval.

23.3 Failure Scenarios

  • duplicate event,
  • late event,
  • activity timeout with unknown outcome,
  • downstream service unavailable,
  • worker crash after activity success,
  • human task completed twice,
  • policy version mismatch,
  • notification succeeds but response lost,
  • compensation fails,
  • workflow version upgrade during active process.

23.4 Property-Like Invariants

  • decision cannot be issued without approved legal review,
  • issued decision cannot be deleted,
  • closed case cannot accept new assessment unless reopened,
  • every workflow branch ends in completed, closed, compensated, or manual intervention,
  • every legal decision has audit trail,
  • every external notification has delivery attempt record,
  • every timer either fires, is cancelled, or is superseded with reason.
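Invariants like these can be checked against recorded event traces, whichever test framework drives the workflow. The sketch below covers only the first invariant and assumes a trace is an ordered list of event type names taken from the step matrix. `TraceInvariants` is a hypothetical helper.

```java
import java.util.List;

/** Checks that hold for every valid event trace, not just for one scenario. */
final class TraceInvariants {
    private TraceInvariants() {}

    /**
     * A decision may only be issued while the most recent legal review outcome is an approval.
     * A later rejection revokes an earlier approval for the purpose of this check.
     */
    static boolean decisionIssuedOnlyAfterApproval(List<String> eventTypes) {
        boolean approved = false;
        for (String type : eventTypes) {
            if (type.equals("ReviewApproved")) {
                approved = true;
            } else if (type.equals("ReviewRejected")) {
                approved = false;
            } else if (type.equals("DecisionIssued") && !approved) {
                return false;
            }
        }
        return true;
    }

    /** Applies one invariant to many traces, for example traces produced by random scenarios. */
    static boolean holdsForAll(List<List<String>> traces) {
        return traces.stream().allMatch(TraceInvariants::decisionIssuedOnlyAfterApproval);
    }
}
```

Feeding generated or replayed traces through `holdsForAll` turns the invariant into a regression check that does not depend on any single hand-written scenario.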

24. Architecture Review Checklist

Use this checklist before approving workflow architecture.

Boundary

  • Is workflow state separated from domain truth?
  • Does each domain service own its state and invariant?
  • Does workflow avoid direct database access to other services?
  • Are human tasks first-class process state?

Correctness

  • Are local transactions clear?
  • Are all side-effect commands idempotent?
  • Is unknown outcome handling explicit?
  • Are stale/late events handled?
  • Are compensation actions business-valid?

Operational

  • Can operators see stuck workflows?
  • Are timers observable?
  • Are retries bounded?
  • Is there a manual intervention path?
  • Are runbooks linked to workflow failure states?

Audit

  • Are actor, reason, policy version, and causation captured?
  • Can a decision be reconstructed end-to-end?
  • Are correction events append-only?
  • Are human task actions audited?

Evolution

  • Is workflow versioning strategy defined?
  • Are old instances safe during deployment?
  • Are command/event contracts backward-compatible?
  • Is process migration explicit?

25. Common Failure Modes

| Smell | Meaning | Fix |
| --- | --- | --- |
| Workflow updates all databases | coordinator became distributed transaction script | restore service ownership |
| All events trigger all services | event soup | define event taxonomy and ownership |
| Compensation deletes history | audit violation | append correction events |
| Human task outside workflow | invisible process state | model task as durable step |
| No idempotency key | retry unsafe | introduce command identity and dedupe store |
| No timer ownership | SLA missed silently | make timer part of process model |
| Workflow diagram owns domain rules | domain logic leaked to process layer | move invariant to domain/policy service |
| Pure choreography for complex process | no global progress visibility | add process manager/workflow coordinator |
| Workflow engine used for everything | god orchestrator | keep simple flows local/event-driven |
| No versioning | deployment breaks active cases | version process and policy |

26. A Minimal Workflow Decision Record

# ADR: Use Orchestrated Workflow for Enforcement Lifecycle

## Context
The enforcement lifecycle spans case intake, evidence, assessment, legal review,
decision issuance, notification, and monitoring. The process includes human tasks,
SLA timers, audit requirements, and non-rollbackable side effects.

## Decision
Use an orchestrated workflow/process manager for enforcement lifecycle coordination.
Domain truth remains owned by individual services. Workflow state stores process
progress only.

## Alternatives Considered
1. Pure choreography with events
2. Single case service coordinating all steps internally
3. Distributed transaction/2PC
4. External workflow engine
5. Custom lightweight process manager

## Consequences
- More explicit process visibility
- More operational surface
- Requires idempotent activities
- Requires workflow versioning
- Requires audit correlation

## Fitness Functions
- Every command has idempotency key
- Every workflow state has owner and runbook
- Every final decision has audit chain
- Workflow never accesses private DB tables of other services
- Stuck workflow detection exists for every waiting state

27. Exercises

  1. Take one enforcement process from your system and build a saga step matrix.
  2. Identify which steps are compensatable, correctable, or non-reversible.
  3. Define the idempotency key for every command.
  4. Draw the process state machine and mark every timer.
  5. List which service owns each domain fact.
  6. Define what happens if every activity times out after downstream success.
  7. Create a workflow versioning rule for a new mandatory legal review step.
  8. Write a reconstructability query: “Why was decision X issued?”

28. Final Takeaway

For regulatory microservices, workflow is not decoration. It is the part of the architecture that remembers what the business process is waiting for.

A strong design keeps three truths separate:

  1. Domain truth belongs to domain services.
  2. Process truth belongs to workflow/process manager.
  3. Evidence truth belongs to audit/evidence chain.

The top-level skill is knowing which truth you are changing at each line of code.

