
Deployment Topology and Runtime Environments

Learn Enterprise CPQ OMS Camunda 7 - Part 048

Deployment topology and runtime environment design for a production-grade Java microservices CPQ and OMS platform using GlassFish, Jersey/JAX-RS, PostgreSQL, Kafka, Redis, and Camunda 7.


Part 048 — Deployment Topology and Runtime Environments

A CPQ/OMS platform can look clean in code and still fail in production because the runtime topology is wrong.

Common examples:

  • every service shares one overloaded PostgreSQL instance without pool control
  • Camunda job executor runs inside nodes that should only serve REST traffic
  • external task workers scale without respecting remote dependency limits
  • BFF calls too many services synchronously
  • Redis is deployed as if it were the source of truth
  • Kafka topics exist, but no one owns replay, retention, or DLQ operation
  • database migrations run after new code starts
  • process definitions are deployed without compatibility with running process instances
  • secrets and tenant config drift across environments
  • staging has no meaningful similarity to production

Deployment topology is architecture made physical.

It decides where failure spreads, where capacity is consumed, where state lives, and how safely releases can happen.


1. Runtime Topology Is a Correctness Concern

Deployment is not only DevOps packaging.

For CPQ/OMS, topology affects correctness:

| Topology mistake | Business consequence |
| --- | --- |
| Quote service and reporting share DB pool | quote acceptance slows during export |
| Camunda engine shares schema with domain tables casually | migration and lock risk become coupled |
| External workers over-scale | external inventory/billing is overloaded |
| No idempotent startup behavior | duplicate process instances/orders |
| No environment parity | production-only workflow bugs |
| No graceful shutdown | half-processed external tasks |
| Bad network segmentation | internal admin APIs exposed |
| No deployment order | new consumers fail on old schema/events |

A top-level engineer sees runtime topology as part of the domain safety model.


2. Baseline Deployment Units

This series uses these logical deployment units.

This is a logical map, not a mandate that every box must be its own repository or cluster.

But each box has a separate runtime responsibility.


3. Stateless vs Stateful Boundary

Services should be stateless at runtime.

Truth lives in stateful platforms.

| Component | Runtime state? | Authority? |
| --- | --- | --- |
| JAX-RS service node | no durable state | no |
| BFF | no durable state | no |
| External task worker | no durable state | no |
| Camunda engine | process runtime state in DB | workflow authority |
| PostgreSQL | durable domain state | yes |
| Kafka | durable event log | event distribution, not domain DB |
| Redis | ephemeral/cache state | no for core domain truth |
| Object storage | document/artifact binary | artifact binary authority with DB metadata |

A service instance must be replaceable.

If killing a service instance destroys quote/order truth, the topology is wrong.


4. GlassFish/Jersey Deployment Shape

Because this stack uses JAX-RS/Jersey and GlassFish, a common unit is a WAR application per service.

Example:

quote-service.war
order-service.war
pricing-service.war
catalog-service.war
configuration-service.war
policy-service.war
notification-service.war
document-service.war
search-service.war

Each WAR should own:

  • JAX-RS resources
  • application services
  • domain model
  • repository implementation
  • outbound adapters
  • service-specific configuration
  • Flyway/Liquibase migration module or migration artifact
  • OpenAPI contract
  • schema contract
  • test fixtures

Each WAR should not contain:

  • entity classes from another service
  • direct repository access to another service database
  • Camunda internal entities unless it is the workflow application
  • generated clients mixed with domain objects
  • shared mutable global state

GlassFish gives a Jakarta EE runtime.

But the runtime does not automatically create service boundaries.

The boundary is created by deployment, database ownership, API contracts, event ownership, and authorization.


5. Environment Stages

A serious CPQ/OMS platform needs environment stages with explicit purpose.

| Environment | Purpose | Data style |
| --- | --- | --- |
| Local | developer feedback | synthetic fixtures |
| Component CI | service tests | disposable DB/Kafka/Redis |
| Integration | multi-service contract verification | synthetic but realistic |
| Workflow Lab | BPMN/DMN/process migration test | scenario catalog |
| Performance | load/capacity model | generated production-shaped data |
| UAT | business scenario acceptance | curated business data |
| Pre-prod | release rehearsal | production-like config, anonymized data if allowed |
| Production | real business operation | real data |

Do not use one shared “dev” environment as the only integration proof.

Shared dev environments become polluted, unstable, and misleading.


6. Environment Parity That Actually Matters

Perfect parity is expensive.

Useful parity is specific.

For CPQ/OMS, prioritize parity for:

  • database schema and constraints
  • transaction isolation expectations
  • connection pool limits
  • Camunda process engine configuration
  • job executor/external worker behavior
  • Kafka partition count and retention class
  • Redis eviction/TTL behavior
  • auth/tenant claims
  • external dependency timeout behavior
  • process definitions and DMN versions
  • observability pipelines
  • deployment order

Do not obsess over identical CPU count in every environment while ignoring that staging has no process incidents, no Kafka replay, no Redis eviction, and no tenant isolation.
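Parity on a short list of critical settings can be checked mechanically. The sketch below compares two environments on a set of parity-critical keys and reports only the differences. The class name and the keys in the usage note are hypothetical; how the two maps are populated (deployment manifests, admin APIs) is left open.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: reports drift between two environments on parity-critical keys only. */
final class ParityCheck {

    /** Returns key -> "reference vs candidate" for every critical key whose values differ. */
    static Map<String, String> drift(Map<String, String> reference,
                                     Map<String, String> candidate,
                                     Set<String> criticalKeys) {
        Map<String, String> drift = new TreeMap<>();
        for (String key : criticalKeys) {
            String expected = reference.get(key);
            String actual = candidate.get(key);
            // A key missing on one side counts as drift (null vs value).
            if (!Objects.equals(expected, actual)) {
                drift.put(key, expected + " vs " + actual);
            }
        }
        return drift;
    }
}
```

For example, comparing production and staging on assumed keys such as `redis.maxmemory-policy` and `kafka.quote-events.partitions` would flag a staging Redis that never evicts, while ignoring keys that are not on the critical list.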


7. Camunda 7 Topology Options

Camunda 7 can be run in different ways.

The topology decision affects coupling, scaling, and migration flexibility.

Option A — Embedded Engine Per Workflow Application

Pros:

  • simple deployment for one workflow app
  • code and BPMN deployed together
  • local control of delegates

Cons:

  • process engine lifecycle tied to app deployment
  • scaling REST/API traffic and job execution can be coupled
  • harder to keep domain services clean if delegates call repositories directly
  • migration fence weaker if business logic lives inside delegates

Option B — Shared Camunda Engine

Pros:

  • centralized process runtime
  • external task pattern can decouple workers
  • engine operations more visible
  • domain services can remain workflow-aware but engine-independent

Cons:

  • shared runtime becomes critical platform
  • process deployment governance required
  • tenant/process authorization must be strong
  • cluster/job executor tuning becomes platform-level work

Option C — Remote Engine + External Task Workers

This is often the cleanest enterprise boundary.

Domain services own business truth.

Camunda owns workflow state.

Workers perform integration actions through service APIs.

This pattern is strong because business mutation remains behind service APIs.


8. Camunda Runtime Segmentation

Do not assume all Camunda nodes must do all things.

You may separate:

| Node type | Responsibility |
| --- | --- |
| Webapp/REST nodes | Cockpit/Tasklist/REST access |
| Engine/job executor nodes | async continuations, timers, internal jobs |
| Deployment node/job | BPMN/DMN deployment |
| External workers | external task execution |
| Admin/ops access | controlled operator functions |

For high control, you may disable job execution in some nodes and dedicate execution capacity to specific nodes.

The key is not the exact shape.

The key is to avoid accidental coupling between UI/admin traffic, REST traffic, process deployment, and job execution.


9. Camunda Database Boundary

Camunda engine tables should be treated as Camunda-owned.

Domain services should not query or mutate Camunda tables directly.

Allowed access patterns:

  • Camunda REST/Java API
  • exported/projection events
  • operational reports through supported APIs or controlled read replicas if formally accepted
  • separate workflow correlation table in domain DB if needed

Not allowed:

  • joining order tables directly with Camunda runtime tables in business code
  • using Camunda history tables as the only business audit trail
  • manually changing process state through SQL in ordinary operations
  • coupling domain migration to Camunda internal schema

Camunda workflow state is useful evidence.

It is not a replacement for domain state.


10. PostgreSQL Topology

Each service should own its schema or database boundary.

Common options:

| Option | Description | Trade-off |
| --- | --- | --- |
| Database per service | strongest isolation | more operational overhead |
| Schema per service | pragmatic isolation | shared instance blast radius |
| Shared schema | weak boundary | avoid for microservices |

For this learning series, a pragmatic enterprise topology is:

PostgreSQL cluster / instance
  schema catalog
  schema configuration
  schema pricing
  schema quote
  schema order
  schema policy
  schema notification
  schema document
  schema audit
  schema projection
  schema camunda

With strict rules:

  • service user can access only its schema
  • cross-service reads go through API/event/projection
  • no foreign keys across service schemas
  • migrations are owned per service
  • reporting uses projections or replicas, not direct OLTP joins
  • Camunda schema is isolated

This is not as pure as database-per-service.

But it is far safer than shared tables.


11. PostgreSQL Connection Pool Topology

Connection pools are capacity contracts.

If every service sets maxPoolSize=100, the combined demand across all services and replicas can exceed what the database can serve, and it will fail under load.

Design pools from DB capacity backward.

Example:

| Service | Pool purpose | Max connections |
| --- | --- | --- |
| quote-service | OLTP commands/queries | 20 |
| order-service | OLTP commands/queries | 20 |
| pricing-service | price calculation reads/writes | 15 |
| catalog-service | catalog reads/publish | 15 |
| camunda-engine | workflow runtime | sized separately |
| search-service | projection writes | 10 |
| reporting/export | separate pool/replica | isolated |

Rules:

  • separate long-running export from OLTP
  • use statement timeout
  • avoid idle transaction leakage
  • monitor pool wait time
  • reject early when pool is saturated
  • never scale service pods without checking DB connection budget
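Designing pools from DB capacity backward comes down to simple arithmetic: the sum of (max pool size × replica count) over all services must fit inside the database connection limit, minus a reserve for admin and migration sessions. A minimal sketch, assuming a hypothetical `PoolBudget` helper; the replica counts, the limit of 200, and the reserve of 20 are illustrative assumptions, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: checks a fleet-wide connection budget against database capacity. */
final class PoolBudget {

    /** Total connections the fleet can open: sum of maxPoolSize x replica count. */
    static int totalDemand(Map<String, int[]> services) {
        int total = 0;
        for (int[] poolAndReplicas : services.values()) {
            total += poolAndReplicas[0] * poolAndReplicas[1];
        }
        return total;
    }

    /** True when demand fits inside maxConnections minus a reserve for admin/migration sessions. */
    static boolean fits(Map<String, int[]> services, int maxConnections, int reserved) {
        return totalDemand(services) <= maxConnections - reserved;
    }

    public static void main(String[] args) {
        Map<String, int[]> services = new LinkedHashMap<>();
        // {maxPoolSize, replicas}; pool sizes follow the example table, replica counts are assumed.
        services.put("quote-service", new int[] {20, 3});
        services.put("order-service", new int[] {20, 3});
        services.put("pricing-service", new int[] {15, 2});
        System.out.println("demand=" + totalDemand(services));
        System.out.println("fits=" + fits(services, 200, 20));
    }
}
```

Running the same check with one more replica per service shows why scaling pods is also a database decision: the pool size per instance stays the same, but the fleet-wide demand grows.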

12. Kafka Topology

Kafka is the event distribution backbone.

Deployment decisions include:

  • topic ownership
  • partition count
  • replication factor
  • retention
  • compaction or delete policy
  • consumer group ownership
  • DLQ topic strategy
  • schema registry integration
  • ACLs
  • replay tooling

Example topic classes:

| Topic | Owner | Key | Retention intent |
| --- | --- | --- | --- |
| quote.events.v1 | Quote Service | quoteId | domain event stream |
| order.events.v1 | Order Service | orderId | domain event stream |
| catalog.events.v1 | Catalog Service | catalogVersion/productOfferingId | publication/change stream |
| notification.commands.v1 | Notification Service | communicationId | command/event boundary |
| audit.events.v1 | Audit Service | subjectId | evidence ingestion |
| projection.dlq.v1 | Projection owner | original key | operational recovery |

Do not let every service publish to every topic.

Topic ownership is part of the architecture.
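Topic ownership can be made explicit as data rather than convention. The sketch below models a small ownership registry and a publish check; in a real deployment the enforcement point would be broker ACLs, and a registry like this would only mirror them. The class name and the service identifiers are hypothetical.

```java
import java.util.Map;

/** Hypothetical registry: one owning service per topic, all other publishers are rejected. */
final class TopicOwnership {

    // Topic names come from the example table; service ids are assumed naming.
    private static final Map<String, String> OWNER_BY_TOPIC = Map.of(
            "quote.events.v1", "quote-service",
            "order.events.v1", "order-service",
            "catalog.events.v1", "catalog-service");

    /** True only when the service is the registered owner of the topic. Unknown topics are denied. */
    static boolean mayPublish(String service, String topic) {
        return service.equals(OWNER_BY_TOPIC.get(topic));
    }
}
```

Denying unknown topics by default means a new topic cannot appear without someone registering an owner for it.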


13. Redis Topology

Redis should be deployed according to usage class.

| Usage | Topology concern |
| --- | --- |
| cache | memory sizing, eviction policy, TTL discipline |
| idempotency fast-path | persistence/fallback to PostgreSQL |
| rate limit | availability vs fail-open/fail-closed policy |
| ephemeral workspace state | acceptable loss behavior |
| lock coordination | clock/TTL/fencing caveats |

Avoid mixing unrelated criticality in one Redis deployment without conscious policy.

Example:

redis-cache-catalog-pricing
redis-rate-limit
redis-ui-ephemeral

or one cluster with strict key namespace, memory budgets, and monitoring.

The key question:

if this Redis data disappears, what business truth is lost?

The acceptable answer for core CPQ/OMS should be:

none.
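One way to test that answer is to write reads so that the cache is always disposable. The following cache-aside sketch uses an in-memory map as a stand-in for Redis and a function as a stand-in for the PostgreSQL read; both are assumptions for illustration, not a real Redis client.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside read where the cache is disposable and the loader reads the source of truth. */
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();  // stand-in for Redis
    private final Function<K, Optional<V>> sourceOfTruth;       // stand-in for a PostgreSQL read

    CacheAside(Function<K, Optional<V>> sourceOfTruth) {
        this.sourceOfTruth = sourceOfTruth;
    }

    /** Serves from cache when possible; otherwise loads from the source of truth and caches the value. */
    Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<V> loaded = sourceOfTruth.apply(key);
        loaded.ifPresent(value -> cache.put(key, value));
        return loaded;
    }

    /** Simulates total cache loss (eviction, restart, failover). */
    void flush() {
        cache.clear();
    }
}
```

After `flush()`, every read still returns the same answer; only latency and load on the source of truth change. If a flush would change an answer, that data was being treated as truth.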


14. Object Storage for Artifacts

Quote documents, signed agreements, rendered proposals, and generated files should not live as large blobs inside core OLTP rows unless there is a deliberate reason.

Recommended split:

| Data | Location |
| --- | --- |
| artifact metadata | PostgreSQL document schema |
| artifact binary | object storage |
| artifact hash | PostgreSQL metadata |
| template version | PostgreSQL/config repository |
| render input snapshot | PostgreSQL or object storage with hash |
| access policy | document service + authorization |

Deployment implications:

  • object storage credentials are service-specific
  • access should go through signed URL or document service authorization
  • artifact immutability must be enforced
  • backup/retention policy must align with legal/commercial requirement

15. Network Zones

A useful topology separates traffic classes.

Principles:

  • public traffic should not reach internal service ports directly
  • Camunda Cockpit/Tasklist/Admin endpoints must be protected
  • service-to-service traffic should authenticate
  • database ports are not exposed to application users
  • external provider callbacks enter through controlled ingress
  • admin/control-plane APIs are separated from public APIs

Network topology is part of authorization.


16. Configuration and Secrets

Configuration categories:

| Category | Example | Change frequency |
| --- | --- | --- |
| build-time | Java version, dependencies | release |
| deploy-time | service URL, pool size, feature flag default | deployment |
| runtime dynamic | approval thresholds, catalog availability | governed config |
| secret | DB password, OAuth client secret | rotation policy |
| tenant config | tenant catalog segment, policy set | controlled admin change |

Rules:

  • secrets are not stored in Git
  • tenant config changes are audited
  • policy/config changes have effective date
  • service startup validates required config
  • config drift is detectable
  • dangerous config has guardrails
  • local config cannot silently become production config

CPQ/OMS has many business policies.

Do not hide policy changes inside environment variables.
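The rule that service startup validates required config can be sketched as a fail-fast check that runs before the service reports ready. The class name and the key names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical startup check for required deploy-time configuration. */
final class StartupConfig {

    /** Returns the required keys that are missing or blank; an empty list means the config is complete. */
    static List<String> missing(Map<String, String> config, List<String> requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            String value = config.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Fails fast so that a misconfigured node stops at startup instead of serving traffic. */
    static void requireComplete(Map<String, String> config, List<String> requiredKeys) {
        List<String> missing = missing(config, requiredKeys);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing required config: " + missing);
        }
    }
}
```

A service could call `requireComplete` with assumed keys such as `db.url` and `db.pool.max` during initialization. Reporting all missing keys at once, rather than the first one, shortens the fix cycle for a broken deployment.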


17. Deployment Order

Release order matters because services, schemas, events, and processes evolve together.

Safe general order:

For breaking changes, use expand-migrate-contract:

  1. expand schema/contract to support old and new
  2. deploy code that writes/reads both if needed
  3. backfill data
  4. move traffic
  5. verify
  6. remove old paths in later release

Never deploy code that requires a new column before the migration exists.

Never deploy a process definition that calls a worker topic no worker understands.

Never remove an event field while existing consumers still require it.
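The three "never" rules above are precedence constraints, so a release plan can be checked against them before anything is deployed. This is a minimal sketch with hypothetical step names. It also flags a plan that omits a prerequisite entirely, which is stricter than necessary when the prerequisite shipped in an earlier release; a real tool would consult what is already deployed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical release-plan check for the three ordering rules. */
final class ReleaseOrder {

    /** Each pair {before, after} says step "before" must appear earlier in the plan than "after". */
    static final String[][] MUST_PRECEDE = {
        {"db-migration", "service-code"},            // code must not need a column that does not exist yet
        {"worker-deploy", "process-definition"},     // a worker must understand the topic before BPMN calls it
        {"consumer-upgrade", "event-field-removal"}  // consumers stop needing a field before it is removed
    };

    /** Returns one message per violated rule; an empty list means the plan respects all rules. */
    static List<String> violations(List<String> plan) {
        List<String> problems = new ArrayList<>();
        for (String[] rule : MUST_PRECEDE) {
            int before = plan.indexOf(rule[0]);
            int after = plan.indexOf(rule[1]);
            // Only relevant when the dependent step is in the plan at all.
            if (after >= 0 && (before < 0 || before > after)) {
                problems.add(rule[0] + " must run before " + rule[1]);
            }
        }
        return problems;
    }
}
```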


18. Process Definition Deployment

Camunda process deployment has its own compatibility concerns.

Questions before deploying new BPMN/DMN:

  • Are existing process instances allowed to continue on old definition?
  • Do we need process instance migration?
  • Are external task topic names compatible?
  • Are process variables schema-compatible?
  • Are DMN decision input names stable?
  • Are timer semantics changed?
  • Are incidents expected after deployment?
  • Is rollback possible?
  • Are task forms/worklist projections compatible?

In many cases, do not migrate running instances automatically.

Let existing orders finish on old process definition unless there is a clear operational reason.

Workflow versioning is not the same as code versioning.
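The question about external task topic compatibility can be answered with a set difference before deployment. The sketch below assumes both topic sets have already been collected, for example by parsing the BPMN file and reading worker configuration; that extraction step is out of scope here, and the class name is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-deployment check for external task topic compatibility. */
final class TopicCompatibility {

    /** Topics referenced by the new process definition that no deployed worker subscribes to. */
    static Set<String> unsupportedTopics(Set<String> topicsInBpmn, Set<String> topicsWorkersSubscribe) {
        Set<String> unsupported = new TreeSet<>(topicsInBpmn);
        unsupported.removeAll(topicsWorkersSubscribe);
        return unsupported;
    }
}
```

A non-empty result is a reason to block the deployment, because instances of the new definition would create external tasks that nothing fetches.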


19. Feature Flags

Feature flags are useful, but dangerous if they become hidden business policy.

Good uses:

  • progressive rollout of a new quote workspace
  • enable new pricing rule evaluator for one tenant
  • route small percentage of preview traffic to new service
  • disable non-critical notification channel
  • activate new projection read model

Bad uses:

  • secretly bypass approval
  • silently change price calculation without versioned evidence
  • hide schema incompatibility
  • create permanent forked behavior nobody owns

Every flag needs:

  • owner
  • purpose
  • default
  • expiry/removal plan
  • audit for changes
  • tenant scope if applicable
  • test coverage for both states
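Part of this checklist can be enforced in code by refusing to register a flag without its governance metadata. A minimal sketch, assuming a hypothetical `FeatureFlag` descriptor; audit, tenant scope, and test coverage are left out because they cannot be verified from the descriptor alone.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical flag descriptor carrying owner, purpose, default, and expiry. */
final class FeatureFlag {
    final String name;
    final String owner;
    final String purpose;
    final boolean defaultValue;
    final LocalDate expiry;

    FeatureFlag(String name, String owner, String purpose, boolean defaultValue, LocalDate expiry) {
        this.name = name;
        this.owner = owner;
        this.purpose = purpose;
        this.defaultValue = defaultValue;
        this.expiry = expiry;
    }

    /** Lists governance gaps; an empty list means the flag may be registered. */
    List<String> problems(LocalDate today) {
        List<String> problems = new ArrayList<>();
        if (isBlank(owner)) problems.add("owner missing");
        if (isBlank(purpose)) problems.add("purpose missing");
        if (expiry == null) {
            problems.add("expiry missing");
        } else if (expiry.isBefore(today)) {
            problems.add("expired on " + expiry);
        }
        return problems;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Running `problems` in CI against every registered flag turns the removal plan into a failing build instead of a forgotten ticket.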

20. Blue/Green and Rolling Deployment

Stateless services can often roll gradually.

But CPQ/OMS deployment must account for:

  • DB schema compatibility
  • API contract compatibility
  • event schema compatibility
  • idempotency behavior
  • workflow topic compatibility
  • cache invalidation
  • projection rebuild
  • user session/workspace continuity

A service is rolling-deployable only if old and new versions can run together.

Checklist:

| Compatibility | Required question |
| --- | --- |
| API | Can old BFF call new service and new BFF call old service? |
| DB | Can old and new code read/write schema? |
| Event | Can old/new consumers process old/new events? |
| BPMN | Can workers support old and new process definitions? |
| Redis | Are key names/versioning compatible? |
| Search projection | Can projection handle replay from both versions? |

If not, use a controlled cutover with traffic freeze, migration, verification, and rollback plan.


21. Graceful Shutdown

Graceful shutdown is critical for workers and services.

On shutdown:

  • stop accepting new HTTP requests
  • stop fetching new external tasks
  • finish or safely abandon in-flight tasks
  • stop Kafka polling and commit offsets only after successful processing
  • release resources
  • preserve idempotency state
  • flush telemetry if safe
  • avoid starting new Camunda process instances after shutdown begins

External task workers must be especially careful.

If the worker performed the external side effect but dies before completing the Camunda task, retry may happen.

Therefore external calls need idempotency and process recovery needs reconciliation.
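The first shutdown steps (stop taking new work, then drain what is in flight) can be sketched with a plain `ExecutorService`. This is not the Camunda external task client; the class is a hypothetical wrapper that a worker's fetch loop could consult before starting a task, and the pool size of 2 is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical worker wrapper: stops intake first, then drains in-flight tasks. */
final class DrainingWorker {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final AtomicBoolean accepting = new AtomicBoolean(true);

    /** Returns false instead of starting work once shutdown has begun. */
    boolean submit(Runnable task) {
        if (!accepting.get()) {
            return false;
        }
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // lost the race with a concurrent shutdown
        }
    }

    /** Stops intake, then waits for in-flight tasks. Returns true if everything drained in time. */
    boolean shutdown(long timeout, TimeUnit unit) {
        accepting.set(false);
        executor.shutdown(); // tasks already submitted still run to completion
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow(); // interrupt stragglers; abandoned work must be safe to retry
        return false;
    }
}
```

A `false` return from `shutdown` is the case the paragraph above warns about: work was abandoned mid-flight, so the side effect may or may not have happened, and only idempotent external calls plus reconciliation make the retry safe.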


22. Health Checks

Use different checks for different purposes.

| Check | Purpose | Should include |
| --- | --- | --- |
| liveness | should container be restarted? | local process health only |
| readiness | should traffic be sent? | DB connectivity, critical config, dependency readiness policy |
| startup | is app initialized? | migration/dependency initialization status |
| deep health | operator diagnosis | dependency details, non-public |

Do not make liveness depend on every remote dependency.

That can cause cascading restarts during dependency outage.

Readiness should be strict enough to avoid sending traffic to broken nodes.

Deep health should not expose secrets or internal topology publicly.
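The separation between liveness and readiness can be sketched as two methods with deliberately different inputs. The class and the two supplier checks are hypothetical; wiring them to JAX-RS endpoints or to an orchestrator's probes is left out.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical probe holder: liveness looks only at local state, readiness also checks dependencies. */
final class HealthProbes {
    private final BooleanSupplier databaseReachable; // assumed check, e.g. a cheap validation query
    private final BooleanSupplier configValid;       // assumed check of required configuration
    private volatile boolean fatalLocalError = false;
    private volatile boolean shuttingDown = false;

    HealthProbes(BooleanSupplier databaseReachable, BooleanSupplier configValid) {
        this.databaseReachable = databaseReachable;
        this.configValid = configValid;
    }

    void markFatalLocalError() { fatalLocalError = true; }

    void beginShutdown() { shuttingDown = true; }

    /** Liveness: local process health only. A database outage must not trigger a restart. */
    boolean live() {
        return !fatalLocalError;
    }

    /** Readiness: stop routing traffic when shutting down, misconfigured, or cut off from the database. */
    boolean ready() {
        return live() && !shuttingDown && configValid.getAsBoolean() && databaseReachable.getAsBoolean();
    }
}
```

During a database outage this node reports live but not ready, so traffic is withheld without a restart loop.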


23. Scaling Model

Scale based on the measured bottleneck, not intuition.

| Component | Scale signal |
| --- | --- |
| BFF | request latency, CPU, backend fan-out latency |
| Quote Service | command latency, DB pool wait, optimistic conflict rate |
| Pricing Service | calculation latency, CPU, cache hit ratio |
| Catalog Service | publish workload, cache invalidation, read latency |
| Order Service | command backlog, fulfillment state transitions |
| External Task Workers | task backlog, dependency capacity, failure rate |
| Camunda Engine | job acquisition latency, incidents, DB load |
| Kafka Consumers | lag, processing time, DLQ growth |
| Search Projection | projection lag, write throughput |
| PostgreSQL | CPU, I/O, locks, pool wait, query latency |
| Redis | memory, evictions, command latency, hot keys |

Do not scale workers beyond external dependency contracts.

More workers can mean a faster outage.
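One way to keep a worker inside a dependency contract is to cap concurrent outbound calls with a semaphore. Note the limit of this sketch: the cap applies per worker instance, so the fleet-wide ceiling is instances × permits, which is exactly the number to check before scaling out. The class name is hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical per-instance cap on concurrent calls to one external dependency. */
final class DependencyBudget {
    private final Semaphore permits;

    DependencyBudget(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /**
     * Runs the call only if a permit is free; otherwise returns empty so the caller can back off.
     * The call must not return null.
     */
    <T> Optional<T> tryCall(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        try {
            return Optional.of(call.get());
        } finally {
            permits.release(); // always return the permit, even when the call throws
        }
    }
}
```

Returning empty instead of blocking lets the worker leave the task unfetched or report a retry, rather than queueing pressure onto the dependency.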


24. Multi-Tenant Deployment Concerns

Tenant isolation can be implemented at different layers.

| Layer | Concern |
| --- | --- |
| API gateway | tenant routing and authentication |
| service | tenant context validation |
| PostgreSQL | tenant_id constraints/indexing or schema/database isolation |
| Kafka | tenant in event envelope, ACL/topic strategy |
| Redis | tenant-scoped keys |
| Camunda | tenant id/process definition segregation if used |
| object storage | tenant-safe path/bucket policy |
| logs/metrics | tenant-aware but privacy-safe dimensions |

Do not rely on frontend filtering for tenant isolation.

Tenant boundary must exist in backend authorization and data access.
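At the Redis layer, tenant-scoped keys are easiest to enforce when keys can only be built through one function. A minimal sketch follows; the key layout `tenant:{tenantId}:{namespace}:{id}` and the reserved characters are assumptions for illustration.

```java
/** Hypothetical key builder: every Redis key carries a validated tenant prefix. */
final class TenantKeys {

    /** Builds "tenant:{tenantId}:{namespace}:{id}", rejecting parts that could escape the tenant prefix. */
    static String key(String tenantId, String namespace, String id) {
        requireSafe("tenantId", tenantId);
        requireSafe("namespace", namespace);
        requireSafe("id", id);
        return "tenant:" + tenantId + ":" + namespace + ":" + id;
    }

    // ':' is the separator and '*' is a glob wildcard in key patterns, so neither may appear in a part.
    private static void requireSafe(String name, String value) {
        if (value == null || value.isEmpty() || value.contains(":") || value.contains("*")) {
            throw new IllegalArgumentException(name + " is missing or contains a reserved character");
        }
    }
}
```

The tenant id passed in should come from the validated backend tenant context, never from a request parameter the client controls.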


25. Operational Access

Production topology must include controlled operational access.

Operators need to:

  • inspect order state
  • inspect workflow state
  • inspect failed jobs/incidents
  • retry safe operations
  • quarantine poison events
  • replay DLQ events
  • regenerate read models
  • view audit trail
  • view artifact metadata
  • manage feature flags
  • deploy/rollback BPMN/DMN safely

But operators should not have unrestricted database write access as the normal recovery mechanism.

Operational control plane should expose safe commands with authorization and audit.


26. Local Development Topology

Local development should optimize feedback, not pretend to be production.

Minimum local stack:

PostgreSQL
Kafka-compatible broker or Kafka
Redis
Camunda 7 engine/webapp
selected services under development
mock external systems

Local development should support:

  • deterministic seed data
  • one-command environment reset
  • BPMN/DMN deployment
  • contract validation
  • event inspection
  • outbox inspection
  • external mock behavior selection
  • test tenant setup

Do not require every developer to run the entire enterprise platform for every change.

Support slices:

catalog + configuration + pricing
quote + policy + document
order + camunda + workers
notification + external provider mock
projection + kafka + postgres

27. Runtime Verification After Deployment

After deployment, verify behavior, not only process health.

Smoke checks:

  • create quote draft
  • configure product
  • price quote
  • submit approval
  • complete approval in test tenant
  • accept quote
  • create order
  • start order workflow
  • process one external task through mock/sandbox dependency
  • emit and consume event
  • update projection
  • generate document
  • send test notification or enqueue safely
  • verify audit trail

Technical checks:

  • DB migrations applied
  • no connection pool saturation
  • Kafka consumer lag normal
  • Redis no unexpected evictions
  • Camunda no new incident spike
  • error rate within threshold
  • traces flow across services
  • DLQ empty or expected
  • feature flags correct

28. Deployment Anti-Patterns

Anti-Pattern 1: Shared Everything

One shared database user, one schema, one Redis namespace, one Kafka topic, one thread pool.

This removes isolation and creates ambiguous ownership.

Anti-Pattern 2: Camunda as the Domain Database

Workflow state is not quote/order truth.

Anti-Pattern 3: Scale Everything Horizontally

Scaling services without checking DB capacity, Kafka partitioning, external dependency limits, and Camunda job executor behavior can increase failures instead of throughput.

Anti-Pattern 4: Migrations as Startup Side Effect Without Control

Automatic migrations on every service startup can create race conditions and unpredictable production startup.

Anti-Pattern 5: No Worker Version Strategy

New BPMN calls a topic or variable shape that old workers do not support.

Anti-Pattern 6: Staging That Proves Nothing

A staging environment without realistic process definitions, Kafka replay, DB constraints, auth claims, and dependency failure behavior gives false confidence.

Anti-Pattern 7: Operational Recovery Through SQL Patches

SQL patches may be necessary in emergencies, but they should not be the normal operational interface.


29. Production Topology Checklist

Before go-live:

  • Are service deployment units clear?
  • Does each service own its schema/API/event boundary?
  • Is Camunda schema isolated?
  • Are Camunda job execution and REST/webapp access intentionally placed?
  • Are external workers separately scalable and bounded?
  • Are DB connection budgets calculated?
  • Are Kafka topics owned and ACL-protected?
  • Are Redis namespaces and TTL policies defined?
  • Are object artifacts immutable and access-controlled?
  • Are secrets managed outside Git?
  • Are tenant boundaries enforced backend-side?
  • Are migrations ordered and backward-compatible?
  • Are BPMN/DMN deployments versioned and tested?
  • Can old and new service versions run together during rolling deploy?
  • Is graceful shutdown implemented?
  • Are liveness/readiness/deep health checks separated?
  • Are operational actions auditable?
  • Is environment parity meaningful for workflow, DB, Kafka, Redis, auth, and observability?

30. Mental Model

Deployment topology is not just where code runs.

It is where responsibility, state, capacity, and failure boundaries become real.

For CPQ/OMS, the production rule is:

the runtime must make the safe architecture easier than the unsafe shortcut.

If topology makes it easy to bypass service boundaries, query another service database, mutate Camunda tables, overrun external dependencies, or patch data without audit, the system will eventually do those things under pressure.

Design the runtime so correctness is the path of least resistance.
