API Gateway, Edge, and BFF Design

Learn Java Microservices Design and Architect - Part 064

API gateway, edge, and Backend-for-Frontend design for Java microservices: responsibilities, topology, routing, policy enforcement, aggregation, BFF boundaries, security, observability, and anti-patterns.

2026-07-05 · 16 min read · 3075 words

Part 064 — API Gateway, Edge, and BFF Design

1. Core idea

An API gateway is not a place to hide bad service boundaries.

A Backend-for-Frontend is not a dumping ground for UI-specific hacks.

The edge layer exists because external traffic has different concerns from internal service collaboration.

External clients need:

stable public entry point
authentication entry
rate limit and quota
TLS termination or forwarding policy
request validation
routing
protocol adaptation
response shaping
client-specific aggregation
observability
abuse protection
API lifecycle management

Internal services need:

business capability boundaries
data ownership
service-to-service identity
reliability contracts
domain-specific APIs
independent deployability

The edge should protect and adapt.

It should not become the business brain.

The main rule:

Put edge policy at the edge, client experience adaptation in the BFF, and domain decisions in domain services.

When this rule is violated, the gateway becomes a distributed monolith in disguise.

2. Terms that must not be confused

Term	Main responsibility	Should contain business domain logic?
Load balancer	Distribute traffic to runtime targets	No
Ingress	Route external HTTP/HTTPS traffic into cluster services	No
API gateway	Public/internal API entry, routing, cross-cutting edge policy	Minimal, policy only
Edge service	Boundary service between outside world and internal platform	Minimal, interface adaptation
BFF	Client-specific backend that adapts APIs for one frontend experience	Limited experience orchestration, not domain authority
Aggregator	Compose multiple backend responses into one response	No authoritative state mutation
Service mesh	Service-to-service traffic management and identity	No business logic
Domain service	Own business capability, data, invariant, commands	Yes

A gateway may implement some aggregator behavior.

A BFF may call an API gateway.

A gateway may run inside Kubernetes behind Ingress.

A service mesh may sit behind the gateway.

But responsibilities must remain clear.

3. Edge topology

A typical topology places several layers between external clients and the domain services.

But not every system needs every layer.

Architecture should be driven by constraints:

number of client types
security exposure
partner integration requirements
rate limiting needs
API lifecycle complexity
aggregation complexity
team ownership
latency budget
regulatory/audit requirements
deployment topology

Small internal systems may only need:

Ingress -> Service

Large enterprise systems may need:

CDN/WAF -> API management -> Gateway -> BFF -> Domain Services

The wrong architecture is not "too simple" or "too complex".

The wrong architecture puts responsibility in the wrong layer.

4. What belongs at the gateway

Good gateway responsibilities:

TLS policy / certificate boundary where appropriate
host/path routing
API version routing
coarse authentication entry
token validation / identity propagation
coarse authorization gate where policy is edge-level
rate limiting
quota enforcement
request size limits
header normalization
CORS policy
WAF integration
request/response compression
observability enrichment
correlation ID creation/propagation
protocol translation when justified
API key management for partners
traffic splitting/canary routing
coarse caching for safe public reads

Bad gateway responsibilities:

deciding whether a case should be approved
calculating regulatory risk score
owning domain entity lifecycle
writing to multiple service databases
compensating failed business transactions
enforcing object-level authorization without domain context
becoming the only place where validation exists
hiding breaking service APIs behind ad hoc transformation forever
joining internal service data for arbitrary reporting
acting as a shared service SDK over HTTP

The gateway can reject invalid traffic.

It should not become the source of business truth.

5. Gateway policy vs domain policy

A useful split:

Gateway policy:
  Can this request enter the system?

Domain policy:
  Is this actor allowed to perform this business action on this object right now?

Example:

Gateway can check:

token exists
token is valid
token audience is correct
client is allowed to call partner API
request size is within limit
rate limit not exceeded
required headers exist

Domain service must check:

user can access this specific case
case is in a state where action is allowed
actor has required role for this workflow step
tenant owns the requested object
decision can be made under current policy version
command does not violate invariant

If object-level authorization is only in the gateway, it will eventually fail.

Why?

Because object-level rules usually need domain state.

The gateway should not own domain state.
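
The split can be sketched in Java. This is a minimal illustration, not a reference implementation: the class, record, and method names, the audience value `case-api`, the 1 MB size limit, and the `SUPERVISOR` role are all assumptions made for the example.

```java
import java.util.Set;

/** Hypothetical types for illustration; none of these names come from a real library. */
final class AuthorizationSplitSketch {

    enum CaseStatus { DRAFT, UNDER_REVIEW, CLOSED }

    record Actor(String subject, String tenantId, Set<String> roles) {}

    record CaseRecord(String caseId, String tenantId, CaseStatus status) {}

    /** Gateway-level question: may this request enter the system at all? */
    static boolean gatewayAdmits(boolean tokenValid, String audience, long bodyBytes) {
        return tokenValid
                && "case-api".equals(audience)   // assumed audience value
                && bodyBytes <= 1_000_000;       // assumed request size limit
    }

    /** Domain-level question: may this actor approve this case right now? */
    static boolean domainAllowsApproval(Actor actor, CaseRecord caseRecord) {
        return actor.tenantId().equals(caseRecord.tenantId())        // tenant owns the object
                && actor.roles().contains("SUPERVISOR")              // role for this workflow step
                && caseRecord.status() == CaseStatus.UNDER_REVIEW;   // state allows the action
    }
}
```

The gateway check needs only the request and the token. The domain check needs the stored case, which is why it cannot live at the edge.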

6. BFF mental model

A Backend-for-Frontend is a backend designed for one frontend experience.

It exists because different client experiences often need different API shapes:

web app wants rich screen composition
mobile app wants small payloads and fewer round trips
partner API wants stable public contract
admin console wants operational commands
public portal wants stricter exposure and caching

The BFF owns experience composition.

It does not own domain truth.

Good BFF responsibilities:

screen-shaped query composition
client-specific DTO shaping
frontend session support where needed
feature flag evaluation for UI behavior
reducing client round trips
client-specific caching
anti-corruption for external/partner contract
protocol adaptation
response degradation for optional fragments
user journey telemetry

Bad BFF responsibilities:

direct database access to domain data
duplicated business invariants
write orchestration that should be a workflow/saga
hidden shared business logic copied across BFFs
authorization decisions requiring deep domain state
becoming a "frontend monolith" with all backend behavior

7. API gateway vs BFF

The API gateway is usually client-neutral or policy-oriented.

The BFF is client-specific.

Concern	API Gateway	BFF
Routing	Yes	Sometimes
Authentication/token validation	Yes	Sometimes
Rate limiting	Yes	Sometimes per journey
Client-specific response shape	Rarely	Yes
Domain command invariant	No	No
Screen composition	No or limited	Yes
Partner contract translation	Sometimes	Yes, if partner-specific facade
Coarse cache	Yes	Yes
Business data ownership	No	No
Service aggregation	Limited	Common
UX-specific fallback	Rare	Common

A common topology:

API Gateway -> BFF -> Domain Services

Another valid topology:

Client -> BFF -> Internal Gateway -> Domain Services

Another:

Partner -> API Management/Gateway -> Partner Facade -> Internal Services

Do not argue from labels.

Argue from responsibility.

8. Edge routing design

Routing should be explicit and boring.

Example route table:

routes:
  - id: web-case-bff
    host: app.example.gov
    path: /api/web/cases/**
    destination: web-case-bff
    auth: user-oidc
    rateLimit: user-standard

  - id: mobile-case-bff
    host: mobile-api.example.gov
    path: /api/mobile/cases/**
    destination: mobile-case-bff
    auth: user-oidc
    rateLimit: mobile-standard

  - id: partner-intake-api
    host: partner-api.example.gov
    path: /v1/intake/**
    destination: partner-intake-facade
    auth: partner-mtls-jwt
    rateLimit: partner-contractual

Routing smell:

path: /**
destination: core-service

This says the gateway is hiding an unclear internal architecture.

A route should expose:

consumer group
API surface
authentication mode
rate limit policy
destination owner
version lifecycle
observability tags
emergency disable option

9. API versioning at the edge

Gateway routing is often involved in versioning.

But versioning should not become permanent confusion.

Good edge versioning:

/v1/partner/case-intake -> partner-intake-facade-v1
/v2/partner/case-intake -> partner-intake-facade-v2

Bad edge versioning:

Gateway silently maps every old field to every new internal service forever.

The gateway can help with:

version route selection
deprecation headers
traffic migration
compatibility windows
partner-specific contract maintenance
kill-switch for unsafe old versions

But the lifecycle still needs ownership:

who owns the public contract?
how long is compatibility supported?
how are consumers notified?
what metrics show remaining v1 usage?
what is the retirement date?

Example response headers:

Deprecation: true
Sunset: Thu, 31 Dec 2026 23:59:59 GMT
Link: <https://developer.example.gov/apis/case-intake/v2>; rel="successor-version"
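
As a rough sketch, a gateway filter or facade could derive the Sunset and Link headers from a retirement timestamp instead of hard-coding strings. The class and method names are hypothetical, and the sketch covers only those two headers.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper; not part of any gateway product. */
final class DeprecationHeadersSketch {

    // Fixed-width HTTP date layout, always rendered in GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
            .withZone(ZoneOffset.UTC);

    static Map<String, String> headersFor(Instant retirement, String successorUrl) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Sunset", HTTP_DATE.format(retirement));
        headers.put("Link", "<" + successorUrl + ">; rel=\"successor-version\"");
        return headers;
    }
}
```

Generating the date from an `Instant` keeps the weekday and the date consistent, which is easy to get wrong by hand.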

10. Request identity propagation

The edge creates or normalizes request identity.

Common headers:

X-Request-Id
X-Correlation-Id
traceparent
tracestate
X-Forwarded-For
X-Forwarded-Proto
X-Forwarded-Host

Rules:

accept external correlation ID only if trusted or sanitized
generate one if missing
preserve W3C trace context where possible
do not trust user-controlled forwarding headers unless added by trusted proxy
log normalized client identity safely
propagate authenticated subject and workload identity separately
never encode authorization truth only in arbitrary headers from the outside

Bad:

Gateway passes X-User-Role from browser to services.

Better:

Gateway validates token, forwards signed/verified identity context, and services enforce domain authorization.

Even better in high-assurance systems:

services validate tokens or mTLS identity themselves
gateway is not the only trust anchor
service-to-service authorization is enforced internally
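
A minimal sketch of the first rules, assuming a plain map of headers rather than a real gateway filter API. The accepted ID pattern and the list of stripped identity headers are assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.UUID;
import java.util.regex.Pattern;

/** Hypothetical edge helper; header names other than X-Correlation-Id are assumed. */
final class EdgeHeaderNormalizerSketch {

    // Assumed format for an acceptable external correlation ID.
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9._-]{8,64}");

    // Identity-like headers that must never be accepted from an outside caller.
    private static final Set<String> UNTRUSTED_IDENTITY_HEADERS =
            Set.of("X-User-Role", "X-User-Id", "X-Tenant-Id");

    /** Keeps a well-formed external ID, otherwise generates a fresh one. */
    static String normalizeCorrelationId(String external) {
        if (external != null && SAFE_ID.matcher(external).matches()) {
            return external;
        }
        return UUID.randomUUID().toString();
    }

    /** Returns a case-insensitive copy with untrusted headers removed and the ID normalized. */
    static Map<String, String> sanitize(Map<String, String> incoming) {
        Map<String, String> out = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        out.putAll(incoming);
        for (String name : UNTRUSTED_IDENTITY_HEADERS) {
            out.remove(name);
        }
        out.put("X-Correlation-Id", normalizeCorrelationId(out.get("X-Correlation-Id")));
        return out;
    }
}
```

Identity that services rely on should then be added by the gateway from the validated token, not copied from whatever the client sent.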

11. Rate limiting and quota

Rate limiting protects system capacity and business contracts.

It can be applied by:

IP
user
tenant
client application
partner
endpoint
operation group
token subject
API key
contractual quota bucket

Rate limit types:

Type	Use
Fixed window	Simple, can burst at boundaries
Sliding window	Smoother fairness
Token bucket	Allows controlled bursts
Leaky bucket	Smooth output rate
Concurrency limit	Protects in-flight capacity
Cost-based limit	Different operations consume different cost
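
To make one row of the table concrete, here is a small single-process token bucket. It is a sketch under simplifying assumptions: state lives in memory, the clock is injected so the behavior can be tested, and a real edge limiter would usually need shared state across gateway instances.

```java
import java.util.function.LongSupplier;

/** Illustrative in-memory token bucket; names are hypothetical. */
final class TokenBucketSketch {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    TokenBucketSketch(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                       // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; refills lazily based on elapsed time. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill); // never exceed the burst capacity
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would typically be `System::nanoTime`. The capacity sets the burst size and the refill rate sets the sustained rate.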

For microservices, rate limiting should not exist only at the edge.

Use multiple layers:

Edge rate limit:
  protect public surface and partner contracts

BFF rate limit:
  protect user journey and expensive aggregation

Service concurrency limit:
  protect service instance resources

Dependency limit:
  protect downstream services

A common mistake:

Edge allows 1000 RPS, but one endpoint fans out to 8 services with 2 retries each.

Effective downstream load could be far larger than edge traffic.

Rate limit policy must consider fan-out.
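
A back-of-the-envelope calculation makes the amplification visible. The sketch assumes "2 retries each" means up to two extra attempts per downstream call, and it models the worst case where every call uses all of its retries.

```java
/** Hypothetical helper for estimating worst-case downstream load. */
final class FanOutLoadSketch {

    /** Worst case: every downstream call fails and consumes all of its retries. */
    static long worstCaseDownstreamRps(long edgeRps, int fanOut, int retriesPerCall) {
        return edgeRps * fanOut * (1L + retriesPerCall);
    }
}
```

For the numbers in the example, 1000 × 8 × (1 + 2) gives 24,000 downstream requests per second from 1000 at the edge.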

12. Gateway aggregation

Gateway aggregation combines multiple backend calls into one client response.

It can reduce client complexity and round trips.

But it creates fan-out risk.

Example: a case screen composed from several backend fragments.

The composition contract should classify each fragment:

Fragment	Required?	Timeout	Failure behavior
Case summary	Yes	500ms	fail response
Party profile	Usually	400ms	show partial profile unavailable
Evidence count	No	300ms	show unknown count
Pending decision	No	300ms	hide widget / show stale

The BFF should make partial failure explicit.

Example response:

{
  "case": {
    "id": "CASE-2026-00019",
    "status": "UNDER_REVIEW"
  },
  "partyProfile": null,
  "evidenceSummary": {
    "count": 12
  },
  "diagnostics": {
    "partial": true,
    "missingFragments": ["partyProfile"],
    "correlationId": "c-019283"
  }
}

Do not pretend partial data is complete.
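
One way to implement such a composition contract is sketched below with `CompletableFuture`. The `Fragment` and `Screen` records and the `compose` method are hypothetical. The sketch runs loaders on the common pool and treats any failure or timeout of an optional fragment as "missing".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative composer; all names are assumptions made for this example. */
final class CaseScreenComposerSketch {

    record Fragment(String name, boolean required, long timeoutMs, Supplier<Object> loader) {}

    record Screen(Map<String, Object> fragments, List<String> missingFragments) {
        boolean partial() { return !missingFragments.isEmpty(); }
    }

    static Screen compose(List<Fragment> plan) {
        // Start every fragment call concurrently, each with its own timeout.
        List<CompletableFuture<Object>> inFlight = new ArrayList<>();
        for (Fragment f : plan) {
            inFlight.add(CompletableFuture.supplyAsync(f.loader())
                    .orTimeout(f.timeoutMs(), TimeUnit.MILLISECONDS));
        }
        Map<String, Object> fragments = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < plan.size(); i++) {
            Fragment f = plan.get(i);
            try {
                fragments.put(f.name(), inFlight.get(i).join());
            } catch (RuntimeException e) {   // failure or timeout of this fragment
                if (f.required()) {
                    throw new IllegalStateException("required fragment failed: " + f.name(), e);
                }
                fragments.put(f.name(), null);   // explicit null instead of silently dropping it
                missing.add(f.name());
            }
        }
        return new Screen(fragments, missing);
    }
}
```

The `missingFragments` list maps directly onto the `diagnostics` block of the response, so the client can tell a partial screen from a complete one.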

13. BFF write operations

BFFs often handle commands from the UI.

But write orchestration must be controlled.

Acceptable BFF write responsibility:

Validate request shape -> call one authoritative command endpoint -> map response.

Risky BFF write responsibility:

Call Service A, then Service B, then Service C, then update hidden local state.

If a user action requires a multi-step business process, consider:

domain service command that owns the process
workflow service
saga orchestrator
async command with status polling
process manager

Bad BFF:

public SubmitCaseResponse submit(SubmitCaseRequest request) {
    // Four independent remote calls: no shared transaction, no compensation.
    caseClient.createCase(request);
    evidenceClient.attachInitialEvidence(request.evidence());
    decisionClient.requestInitialScreening(request.caseId());
    notificationClient.notifySupervisor(request.caseId());
    return new SubmitCaseResponse("OK");
}

What happens if decisionClient succeeds and notificationClient times out?

The BFF now owns a business saga accidentally.

Better:

public SubmitCaseResponse submit(SubmitCaseRequest request) {
    CaseSubmissionResult result = caseCommandClient.submitCase(new SubmitCaseCommand(
            request.idempotencyKey(),
            request.partyId(),
            request.summary(),
            request.evidenceReferences()
    ));

    return SubmitCaseResponse.from(result);
}

The authoritative service owns the workflow or emits events.
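
The idempotency key passed in the command is what lets the authoritative service absorb client or BFF retries. A minimal sketch of that idea follows. It uses an in-memory map as a stand-in for durable storage, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Illustrative only: a real service would persist results durably, not in memory. */
final class IdempotentCommandSketch {

    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();

    /** Runs the command once per idempotency key and replays the stored result afterwards. */
    String submit(String idempotencyKey, Supplier<String> createCase) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> createCase.get());
    }
}
```

A repeated submission with the same key returns the original result without creating a second case.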

14. BFF internal architecture

A BFF still needs good architecture.

Do not let it become a ball of controller code.

Suggested structure:

com.example.casewebbff
  api/
    CaseScreenController.java
    SubmitCaseController.java
    dto/
  application/
    GetCaseScreenQueryHandler.java
    SubmitCaseUseCase.java
    composition/
      CaseScreenComposer.java
      FragmentPolicy.java
  client/
    casecommand/
    casequery/
    partyprofile/
    evidence/
    decision/
  security/
    IdentityContextMapper.java
  telemetry/
    BffObservationNames.java
  config/
    DependencyPolicies.java

The BFF has no domain repository for service-owned data.

It may have:

short-lived cache
session store
user preference cache
feature exposure config
response shape mapping
UI-specific authorization hints

But it should not own authoritative business records.

15. Java/Spring gateway example

A simple Spring Cloud Gateway-style route intent:

spring:
  cloud:
    gateway:
      server:
        webflux:
          routes:
            - id: web-case-bff
              uri: http://web-case-bff
              predicates:
                - Host=app.example.gov
                - Path=/api/web/cases/**
              filters:
                - RemoveRequestHeader=Cookie
                - AddRequestHeader=X-Edge-Route, web-case-bff

Real production gateways need more than this:

authentication
authorization gate
CORS
body size limits
rate limiting
timeout
retry discipline
response header policy
trace propagation
structured access logs
mTLS or identity propagation
admin controls
route ownership metadata

A gateway route should be reviewed like code.

It is production behavior.

16. Ingress vs API gateway

Kubernetes Ingress commonly handles HTTP/HTTPS routing from outside the cluster to Services.

But Ingress alone is not necessarily a full API management layer.

Ingress concerns:

external HTTP/HTTPS entry
host/path routing
TLS termination depending on controller
load balancer integration
controller-specific annotations/features

API gateway concerns:

API lifecycle
authentication policy
rate limit/quota
request transformation
tenant/client policy
developer/partner contract
API analytics
cross-cutting resiliency
edge observability

Sometimes the same product can do both.

Sometimes they are separate.

Do not decide from product names.

Decide from required responsibilities.

17. Gateway timeout and retry policy

Gateway retry can be dangerous.

A gateway often does not know whether a backend command is safe to retry.

Safe gateway retry candidates:

idempotent reads
connection failure before request was sent
explicitly idempotent command with idempotency key
retry-aware backend contract

Dangerous gateway retry:

payment/approval/decision command without idempotency
POST with side effects and no operation ID
retry after backend might have processed request
retry across layered client and mesh retries

A safe policy:

edgePolicy:
  timeout: 2s
  retry:
    enabled: true
    maxAttempts: 2
    onlyForMethods:
      - GET
      - HEAD
    onlyForStatus:
      - 502
      - 503
      - 504
    backoff: jitter

For command APIs:

commandPolicy:
  retryAtGateway: false
  requireIdempotencyKey: true
  backendOwnsRetrySemantics: true

Gateway timeout must also fit the end-to-end deadline budget.

If the browser times out after 10s, a 30s gateway timeout is meaningless: the gateway keeps waiting on a response the client has already abandoned.
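
The retry policy above can be read as a single decision function. The sketch below mirrors its values (two attempts, GET and HEAD only, 502/503/504) and adds a deadline check. The class name and method signature are assumptions for illustration.

```java
import java.util.Set;

/** Hypothetical retry decision for an edge proxy; not a real gateway API. */
final class GatewayRetryPolicySketch {

    private static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD");
    private static final Set<Integer> RETRYABLE_STATUS = Set.of(502, 503, 504);
    private static final int MAX_ATTEMPTS = 2;

    static boolean shouldRetry(String method, int status, int attemptsSoFar,
                               long elapsedMs, long deadlineMs) {
        return SAFE_METHODS.contains(method)             // commands are never retried here
                && RETRYABLE_STATUS.contains(status)     // only gateway-style upstream failures
                && attemptsSoFar < MAX_ATTEMPTS          // total attempts, including the first
                && elapsedMs < deadlineMs;               // never retry past the end-to-end budget
    }
}
```

Passing the elapsed time and the deadline explicitly keeps the retry decision tied to the end-to-end budget rather than to a per-attempt timeout.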

18. Security at the edge

The edge is security-critical because it is exposed.

Controls:

TLS configuration
mTLS for partner/integration traffic where required
JWT/OIDC validation
API key validation for machine clients where appropriate
request body size limit
upload scanning pipeline where relevant
schema validation for public APIs
WAF rules
bot/abuse mitigation
IP allow/deny lists when justified
CORS policy
header sanitization
rate limiting
audit/security event logging
admin endpoint isolation

But remember:

Edge security is not enough. Internal services must still enforce domain authorization.

As an example of defense in depth, the gateway can say:

This authenticated client may call this API family.

The domain service says:

This actor may approve this specific case in this specific state under this policy.

19. Observability at the edge

Edge telemetry should answer:

who is calling?
which API route?
what is the request rate?
what is rejected before backend?
what is backend latency?
what is gateway overhead?
which client/tenant is consuming quota?
which version is still used?
which routes cause fan-out?
which responses are degraded?

Useful metrics:

edge_requests_total{route="web-case-bff",status_family="2xx"}
edge_request_duration_seconds_bucket{route="web-case-bff"}
edge_rejections_total{route="partner-intake",reason="rate_limit"}
edge_auth_failures_total{route="partner-intake",reason="invalid_token"}
edge_backend_duration_seconds_bucket{route="web-case-bff",backend="web-case-bff"}
edge_deprecated_api_requests_total{api_version="v1",consumer="partner-x"}

Avoid high cardinality:

raw path with IDs
full user ID as metric label
case ID
request body field values
raw error message
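
Where the gateway cannot supply the matched route template directly, raw paths can be collapsed before they become a label. The heuristic below is an assumption for illustration: any path segment containing a digit is treated as an identifier, except version segments such as `v1`.

```java
import java.util.regex.Pattern;

/** Hypothetical label normalizer; prefer the gateway's own route template when available. */
final class MetricLabelSketch {

    private static final Pattern VERSION = Pattern.compile("v\\d+");
    private static final Pattern HAS_DIGIT = Pattern.compile("\\d");

    static String pathTemplate(String rawPath) {
        StringBuilder out = new StringBuilder();
        for (String segment : rawPath.split("/")) {
            if (segment.isEmpty()) {
                continue;   // skip the empty piece before the leading slash
            }
            boolean version = VERSION.matcher(segment).matches();
            boolean looksLikeId = !version && HAS_DIGIT.matcher(segment).find();
            out.append('/').append(looksLikeId ? "{id}" : segment);
        }
        return out.length() == 0 ? "/" : out.toString();
    }
}
```

With this heuristic, `/v1/intake/cases/CASE-2026-00019` and every other case path share one label value instead of one per case.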

Structured access log example:

{
  "event": "edge.request.completed",
  "route": "partner-intake-v1",
  "method": "POST",
  "pathTemplate": "/v1/intake/cases",
  "status": 202,
  "durationMs": 183,
  "consumerId": "partner-acme",
  "correlationId": "c-019283",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
}

Access logs are part of audit-adjacent evidence.

Do not log secrets or sensitive payloads.

20. Partner API facade

Partner APIs need special treatment.

External partners care about:

stable contract
predictable versioning
clear error semantics
idempotency
replay safety
rate limit/quota
onboarding/offboarding
auditability
supportability
schema compatibility

A partner facade can isolate internal services from partner-specific concerns.

The facade can own:

public DTO mapping
partner-specific validation
idempotency key handling
partner error code mapping
callback/webhook protocol
asynchronous status resource
public documentation examples

It must not own:

final regulatory decision
case lifecycle truth
evidence truth
internal workflow state authority

21. Common anti-patterns

21.1 God gateway

Everything flows through one gateway with business logic, data joins, and special cases.

Symptom:

Changing business process requires gateway deployment.

21.2 Backend-for-everything

One BFF serves web, mobile, partner, admin, and batch clients.

It becomes a generic backend with client-specific branches everywhere.

21.3 Gateway hides service contract chaos

Internal APIs break constantly, and gateway transformations patch over them.

This hides the pain instead of fixing compatibility discipline.

21.4 BFF owns distributed transaction

BFF calls several services to complete a command without durable workflow or compensation.

21.5 Edge-only authorization

Gateway checks permissions once, then internal services blindly trust all calls.

Object-level authorization eventually fails.

21.6 No route ownership

Nobody owns route lifecycle, version retirement, or incident response.

21.7 Public API exposes internal service shape

External contract mirrors internal microservice structure.

Now internal refactoring breaks external consumers.

22. Architecture decision matrix

Use this when deciding edge shape.

Situation	Recommended shape
Single internal web app, low security complexity	Ingress + one BFF may be enough
Multiple clients with different UX needs	API gateway + separate BFF per client family
Public partner API	API management/gateway + partner facade
Heavy screen composition	BFF or query composition service
High-volume static/read API	Gateway/CDN caching plus source service contract
Complex business workflow command	Domain service/workflow orchestrator, not BFF orchestration
Need service-to-service mTLS/retry/telemetry	Service mesh/internal proxy may help
Need developer portal/subscription/quota	API management product may be justified
Internal event-driven process	Gateway is irrelevant; use messaging/workflow boundary

23. Case study: Regulatory case portal

Clients:

public complainant portal
investigator web UI
supervisor dashboard
partner agency integration
mobile inspection app

Bad design:

All clients call Case Service directly.
Case Service exposes endpoints for every screen.
Partner DTOs leak into internal model.
Mobile app receives massive payloads.
Supervisor dashboard causes fan-out from browser.

Better design:

Public Portal -> Public Case BFF -> Case Intake / Evidence Upload / Notification
Investigator Web -> Investigator Case BFF -> Case Query / Party / Evidence / Workflow
Supervisor Dashboard -> Supervisor BFF -> Case Query Read Model / SLA Projection
Partner Agency -> API Gateway -> Partner Intake Facade -> Case Command
Mobile App -> Mobile Inspection BFF -> Inspection Task / Evidence Metadata

Boundary logic:

BFFs own client experience shape
domain services own commands and invariants
read models support high-fan-out dashboards
partner facade protects internal model
gateway handles public security/rate/version policy

This design has more components than direct calls.

But complexity is placed where it belongs.

24. Edge/BFF review checklist

Before approving gateway/BFF design, ask:

Which clients are served?
Is this route public, partner, internal, or admin?
Who owns this route?
Who owns the destination service?
What authentication mode is used?
What is checked at the edge?
What must still be checked by domain services?
Is this a read composition or a write command?
Does the BFF call multiple services for one command?
Is the operation idempotent?
What is the timeout budget?
Does gateway retry anything?
Are retries safe?
What is the rate limit policy?
Is fan-out bounded?
Is partial response allowed?
Are API versions tracked?
Is deprecation policy defined?
Are sensitive headers sanitized?
Are trace/correlation IDs propagated?
Are edge rejections observable?
Does the public API leak internal model shape?
Can this route be disabled during incident?
Is route metadata in service catalog?

25. Fitness functions

Useful automated checks:

Every public route must declare owner, auth policy, rate limit, and destination.

No route may forward arbitrary external identity headers without normalization.

Every command route exposed externally must require idempotency or documented non-retry policy.

Every deprecated API route must emit version usage metrics.

BFF packages may not depend on persistence adapters for domain-owned databases.

Gateway retry must not be enabled for unsafe methods unless idempotency is enforced.

Every BFF aggregation must classify fragments as required or optional.

Every public route must have request body size limit.

The gateway is code.

Treat it like code.
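
A few of these checks can be sketched as a plain Java validation that a build step might run over route declarations. The `Route` record and its fields are hypothetical and do not correspond to any gateway's configuration model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative fitness function over hypothetical route metadata. */
final class RouteFitnessSketch {

    record Route(String id, String path, String owner, String auth, String rateLimit,
                 String destination, Set<String> retryMethods, boolean idempotencyRequired) {}

    private static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD");

    /** Returns one message per violated rule; an empty list means the route passes. */
    static List<String> violations(Route r) {
        List<String> found = new ArrayList<>();
        if (isBlank(r.owner())) found.add(r.id() + ": missing owner");
        if (isBlank(r.auth())) found.add(r.id() + ": missing auth policy");
        if (isBlank(r.rateLimit())) found.add(r.id() + ": missing rate limit");
        if (isBlank(r.destination())) found.add(r.id() + ": missing destination");
        if ("/**".equals(r.path())) found.add(r.id() + ": catch-all path");
        // Retrying anything beyond safe methods is only acceptable with enforced idempotency.
        boolean retriesUnsafe = !SAFE_METHODS.containsAll(r.retryMethods());
        if (retriesUnsafe && !r.idempotencyRequired()) {
            found.add(r.id() + ": retry on unsafe method without idempotency");
        }
        return found;
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

Run against every route in a pull request, a check like this fails the build before a route without an owner or with unsafe retries reaches production.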

26. Practice exercise

Design edge architecture for this regulatory platform:

Clients:
- investigator web app
- supervisor dashboard
- public complainant portal
- mobile inspection app
- partner agency system

Internal services:
- Case Command Service
- Case Query Service
- Party Profile Service
- Evidence Metadata Service
- Workflow Service
- Audit Event Service
- Notification Dispatch Service

Create:

edge topology diagram
route table
BFF list
partner facade contract
timeout policy
rate limit policy
partial response policy
edge observability metrics
authorization split between gateway/BFF/domain service
anti-patterns you intentionally avoided

Then answer:

Which clients should not share a BFF?
Which APIs should be public stable contracts?
Which screen should use read model instead of fan-out?
Which commands must avoid BFF orchestration?
Which failure should degrade UI instead of failing the whole response?

27. Summary

The edge layer is a boundary between external complexity and internal service architecture.

A good gateway:

routes clearly
enforces edge policy
protects capacity
propagates identity and telemetry
supports API lifecycle
avoids business ownership

A good BFF:

serves one client experience
composes reads carefully
adapts response shape
handles partial failure honestly
avoids becoming workflow engine or domain service

The final rule:

Gateway is for edge policy. BFF is for client experience. Domain service is for business truth.

When those three are separated, microservices remain evolvable.

When they are mixed, the edge becomes the new monolith.
