DNS, TLS, Certificates, and Edge Security

Learn Kubernetes with Cloud Services AWS & Azure - Part 019

Production patterns for DNS, TLS, certificate lifecycle, and edge security on Kubernetes with AWS Route 53, ACM, ExternalDNS, cert-manager, Azure DNS, Azure Key Vault, Application Gateway, and Gateway API.

2026-07-03 · 21 min read · 4060 words

Part 019 — DNS, TLS, Certificates, and Edge Security

A production endpoint is not just an Ingress.

A production endpoint is a chain of contracts:

name -> resolution -> routing -> TLS -> policy -> load balancer -> gateway/ingress -> service -> ready pod

When an incident happens, users do not say:

the Gateway resource is unhealthy.

They say:

api.company.com is down.

So this part starts from the user-facing name and works inward.

The goal is to make DNS, TLS, and edge security boring. Boring means predictable. Predictable means each layer has a known owner, a known failure mode, a known rollback, and a known observability signal.

1. The Endpoint Contract

For every endpoint, define the contract explicitly.

endpointContract:
  hostname: api.prod.example.com
  visibility: public
  dnsProvider: route53
  dnsZone: example.com
  tlsTermination: aws-alb
  certificateSource: acm
  ingressApi: kubernetes-ingress
  ingressController: aws-load-balancer-controller
  backendService: order-api.default.svc.cluster.local
  backendProtocol: http
  authLayer: application
  waf: enabled
  ownerTeam: platform-edge
  appOwner: order-platform

That contract is more important than the YAML.

The YAML is an implementation. The contract is the invariant.

A strong endpoint contract answers:

Who owns the DNS record?
Who owns the certificate?
Who rotates the certificate?
Where does TLS terminate?
Is traffic public, private, partner-only, or internal?
Which controller is allowed to mutate DNS/load balancer resources?
Which health check protects the user path?
Which logs prove that traffic reached each layer?
How do we revoke the endpoint fast?
How do we recover if the certificate or DNS automation fails?

If those answers are implicit, the platform is fragile.
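
One way to keep the contract explicit is to lint it before anything is deployed. The following is a minimal sketch, not a standard: the field names mirror the example contract above, while the choice of which fields are required and the function name `contract_problems` are assumptions made for illustration.

```python
# Hypothetical contract linter. Field names follow the example contract above;
# the "required" set and the visibility values are assumptions for this sketch.
REQUIRED_FIELDS = {
    "hostname", "visibility", "dnsProvider", "dnsZone",
    "tlsTermination", "certificateSource", "ownerTeam", "appOwner",
}
ALLOWED_VISIBILITY = {"public", "private", "partner-only", "internal"}


def contract_problems(contract: dict) -> list:
    """Return human-readable problems; an empty list means nothing is implicit."""
    problems = []
    for field in sorted(REQUIRED_FIELDS - set(contract)):
        problems.append(f"missing field: {field}")

    visibility = contract.get("visibility")
    if visibility is not None and visibility not in ALLOWED_VISIBILITY:
        problems.append(f"unknown visibility: {visibility}")

    # The hostname must live inside the zone that the contract names.
    hostname = contract.get("hostname", "")
    zone = contract.get("dnsZone", "")
    if hostname and zone and not (hostname == zone or hostname.endswith("." + zone)):
        problems.append(f"hostname {hostname} is outside zone {zone}")
    return problems
```

Running a check like this in the pipeline that accepts new endpoints turns "who owns this?" into a failed build instead of an incident question.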

2. Mental Model: DNS Is a Naming Control Plane

DNS is not just a mapping from name to IP.

DNS is a distributed naming control plane with caching, delegation, propagation delay, and stale state.

Kubernetes engineers often underestimate DNS because Kubernetes internal DNS feels simple:

service.namespace.svc.cluster.local

External DNS is not that simple.

External DNS has:

public zones
private zones
split-horizon names
delegation boundaries
TTL behavior
negative caching
wildcard records
apex record constraints
cloud-specific alias records
ownership conflicts
stale records after controller failure
propagation delay outside your cluster

A production Kubernetes platform must treat DNS as part of the application delivery path.

3. Internal DNS vs External DNS

Do not mix these two models.

Scope	Example	Owner	Failure Impact
Internal Kubernetes DNS	`orders.default.svc.cluster.local`	CoreDNS + Kubernetes API	Service-to-service discovery fails
Internal cloud DNS	`orders.prod.internal.example.com`	Route 53 private hosted zone / Azure Private DNS	VPC/VNet workloads cannot resolve private endpoints
Public DNS	`api.example.com`	Route 53 public hosted zone / Azure DNS public zone	Internet users cannot reach the service
Partner DNS	`partner-api.example.com`	DNS + firewall + private link/VPN policy	Partner traffic fails or leaks

Internal Kubernetes DNS is derived from Kubernetes objects.

External DNS is usually derived from load balancer, ingress, gateway, or explicit DNS automation.

The mental boundary:

Kubernetes DNS answers: where is the Service inside the cluster?
External DNS answers: where should a client enter the platform?

4. Kubernetes Internal DNS

Kubernetes DNS creates stable names for Services and, in some cases, Pods.

Common Service names:

orders
orders.default
orders.default.svc
orders.default.svc.cluster.local

Example:

kubectl run dns-debug --rm -it --image=busybox:1.36 --restart=Never -- nslookup orders.default.svc.cluster.local

4.1 DNS Search Path

Inside a Pod, /etc/resolv.conf typically includes search domains such as:

default.svc.cluster.local
svc.cluster.local
cluster.local

That is why an application in the same namespace can often call:

http://orders:8080

But a short name only resolves through the caller's own namespace search domain, so production services should not rely on short names when cross-namespace calls are involved.

Prefer explicit names:

orders.payment.svc.cluster.local

This makes dependency intent clear.
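
The search path also explains the "high ndots behavior" that shows up later as a cause of slow resolution. The sketch below models a glibc-style resolver under stated assumptions: the default `ndots` value of 5 is what Kubernetes sets for the ClusterFirst DNS policy, other resolver libraries can order the attempts differently, and `candidate_queries` is a hypothetical helper name.

```python
def candidate_queries(name: str, search: list, ndots: int = 5) -> list:
    """Names a glibc-style resolver would try, in order, for one lookup."""
    # A trailing dot marks the name as fully qualified: no search expansion.
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    # With at least `ndots` dots, the literal name is tried first.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    # Otherwise every search domain is tried before the literal name.
    return expanded + [absolute]


search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_queries("api.example.com", search):
    print(query)
```

Under this model an external name such as api.example.com costs three failed in-cluster lookups before the real query goes out, and even orders.payment.svc.cluster.local (four dots) is expanded first unless it is written with a trailing dot.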

4.2 Headless Service DNS

A normal Service gives a virtual stable endpoint.

A headless Service gives direct endpoint records.

apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: data
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432

A headless Service is useful when clients need stable per-Pod identity, such as StatefulSet members:

postgres-0.postgres.data.svc.cluster.local
postgres-1.postgres.data.svc.cluster.local

Use it carefully. It pushes more topology awareness to clients.

4.3 DNS Failure Modes Inside the Cluster

Symptom	Likely Cause	First Check
Service name cannot resolve	CoreDNS down, Service missing, namespace typo	`kubectl -n kube-system get pods -l k8s-app=kube-dns`
Resolution slow	CoreDNS overloaded, upstream DNS slow, high ndots behavior	CoreDNS metrics/logs
External names fail	Egress DNS blocked, upstream resolver unavailable	NetworkPolicy / node DNS config
Only some Pods fail	Pod DNS policy, custom `dnsConfig`, node issue	compare `/etc/resolv.conf`
Headless records stale	EndpointSlice delay, readiness not aligned	`kubectl get endpointslice`

DNS debugging is not optional. Every platform runbook needs it.

5. External DNS Pattern

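The external DNS pattern is simple in concept: a controller watches exposed resources in the cluster and writes matching records to the DNS provider.

ExternalDNS is a common Kubernetes controller for this. It watches exposed Services and Ingresses and synchronizes records with DNS providers.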

Example Ingress with ExternalDNS annotation:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders
  namespace: prod-orders
  annotations:
    external-dns.alpha.kubernetes.io/hostname: orders.prod.example.com
spec:
  ingressClassName: nginx
  rules:
    - host: orders.prod.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orders
                port:
                  number: 8080

The controller resolves the target from the Ingress status or Service status and updates the DNS provider.

This is powerful, but it creates a new privilege boundary.

A DNS controller can publish names. Publishing names is production power.

6. DNS Ownership Model

External DNS automation needs strict ownership.

Bad model:

Every cluster can mutate every record in example.com.

Good model:

Cluster A can mutate only *.dev.example.com.
Cluster B can mutate only *.staging.example.com.
Production platform controller can mutate only approved production zones.
Shared security-owned records require manual or pipeline approval.

Typical controls:

domain filters
zone ID filters
TXT registry ownership
cloud IAM least privilege
namespace allow-list
admission policy for hostname suffix
separate controller instances for public and private zones
separate cloud identities per environment
separate change review for root/apex/critical hostnames

Example ExternalDNS flags (conceptual):

args:
  - --source=ingress
  - --source=service
  - --domain-filter=prod.example.com
  - --registry=txt
  - --txt-owner-id=eks-prod-us-east-1
  - --policy=sync

--policy=sync can delete records that no longer match desired state. That is correct for strongly-owned zones and dangerous for shared zones.

When in doubt, use upsert-only until ownership is mature.
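
TXT registry ownership is what makes deletion decisions defensible. The sketch below illustrates the idea rather than ExternalDNS's actual code: it assumes the ownership TXT value has the comma-separated `heritage=external-dns,external-dns/owner=<id>,...` shape, and `may_delete` is a hypothetical helper.

```python
def parse_txt_registry(value: str) -> dict:
    """Split an ownership TXT value into key/value labels (assumed format)."""
    labels = {}
    for part in value.strip('"').split(","):
        key, sep, val = part.partition("=")
        if sep:
            labels[key.strip()] = val.strip()
    return labels


def may_delete(txt_value: str, my_owner_id: str) -> bool:
    """A controller should only remove records it can prove it created."""
    labels = parse_txt_registry(txt_value)
    return (
        labels.get("heritage") == "external-dns"
        and labels.get("external-dns/owner") == my_owner_id
    )


record = '"heritage=external-dns,external-dns/owner=eks-prod-us-east-1,external-dns/resource=ingress/prod-orders/orders"'
print(may_delete(record, "eks-prod-us-east-1"))
print(may_delete(record, "eks-staging-us-east-1"))
```

The owner ID therefore has to be unique per controller instance. Two clusters sharing one ID in one zone can delete each other's records under the sync policy.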

7. TTL Is a Reliability Lever

TTL is not just performance tuning.

TTL controls how long stale answers survive.

TTL	Good For	Risk
30s	active cutovers, incident recovery	higher resolver/provider load
60s-300s	normal service endpoints	moderate stale-cache window
1h+	stable infrastructure records	slow recovery from wrong target

Rules:

Use short TTL during migrations.
Raise TTL only when the target is stable.
Do not assume all clients obey TTL perfectly.
Monitor both DNS record state and actual client behavior.
Document expected propagation time in runbooks.

DNS cutovers fail when teams confuse:

authoritative zone updated

with:

all clients are using the new answer

Those are not the same event.
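
The gap between the two events can be estimated from the TTL that was in effect before the change. This is a small arithmetic sketch that assumes resolvers respect TTL; the function names and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_until(changed_at: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment a TTL-respecting resolver may still serve the old answer.

    A resolver that cached the record just before the change keeps it for
    the TTL that was published at that time, not the new one.
    """
    return changed_at + timedelta(seconds=old_ttl_seconds)


def ttl_reduction_has_propagated(lowered_at: datetime,
                                 previous_ttl_seconds: int,
                                 now: datetime) -> bool:
    """True once every compliant cache has seen the lowered TTL."""
    return now >= lowered_at + timedelta(seconds=previous_ttl_seconds)


cutover = datetime(2026, 7, 3, 12, 0, tzinfo=timezone.utc)
print(stale_until(cutover, 3600))  # window with a 1h TTL
print(stale_until(cutover, 30))    # window with a 30s TTL
```

This is why the TTL should be lowered at least one old-TTL period before a migration, not at the moment of the cutover.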

8. Split-Horizon DNS

Split-horizon DNS means the same name can resolve differently depending on resolver context.

Example:

api.prod.example.com
  public internet -> public load balancer
  corporate network -> private load balancer
  VPC/VNet -> private load balancer

This is useful, but dangerous.

Failure modes:

public clients receive private IPs
internal clients receive public IPs and hairpin through the internet
cert validation works but routing path is wrong
incident tests from engineer laptop do not match production clients
private hosted zone overrides public name unexpectedly

Use split-horizon only when the operating model is mature.

For regulated systems, split-horizon DNS must be documented because it affects auditability of traffic paths.
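
The first two failure modes can be caught by comparing the answers seen from each resolver context. The sketch below assumes the answers were already collected (for example with dig from each network); the view names come from the example above, and the expectation table and function names are hypothetical.

```python
import ipaddress

# Assumed expectation per resolver context, matching the example above.
EXPECTED_SCOPE = {
    "public internet": "public",
    "corporate network": "private",
    "VPC/VNet": "private",
}


def answer_scope(addresses: list) -> str:
    """Classify a set of A/AAAA answers as public, private, mixed, or empty."""
    if not addresses:
        return "empty"
    private = [ipaddress.ip_address(a).is_private for a in addresses]
    if all(private):
        return "private"
    if not any(private):
        return "public"
    return "mixed"


def view_problems(answers_by_view: dict) -> list:
    """Report every view whose answers do not match the expected scope."""
    problems = []
    for view, expected in EXPECTED_SCOPE.items():
        actual = answer_scope(answers_by_view.get(view, []))
        if actual != expected:
            problems.append(f"{view}: expected {expected} answers, got {actual}")
    return problems
```

Note that `is_private` follows the IANA special-purpose registries, so documentation and other reserved ranges also count as private here. A check like this only proves which address each context receives, not that the routing path behind it is correct.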

9. AWS DNS Pattern: Route 53 + EKS

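A typical EKS public endpoint pattern is a Route 53 record that targets an AWS load balancer in front of the cluster. The subsections below cover the zone types and the automation that maintains the record.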

9.1 Public Zone

Use a Route 53 public hosted zone for internet-facing names.

Example:

orders.prod.example.com -> alias to ALB DNS name

Prefer alias records for AWS load balancer targets when supported.

9.2 Private Zone

Use Route 53 private hosted zones for VPC-only names.

Example:

orders.prod.internal.example.com -> internal ALB/NLB

Private zone design needs VPC association planning:

same account vs cross-account
same region vs multi-region
shared services VPC
EKS VPCs per environment
resolver forwarding between on-prem and AWS

9.3 ExternalDNS with Route 53

ExternalDNS needs permission to change records.

A safe EKS pattern uses workload identity, not static credentials.

Conceptual permission boundary:

ExternalDNS service account -> IAM role -> Route 53 hosted zone permissions

The IAM policy should limit:

hosted zone ARN
change actions
list actions needed for reconciliation
hostname suffix, ideally enforced through admission policy as well, because IAM cannot express every DNS ownership invariant cleanly

Example Helm-style values sketch:

provider:
  name: aws
policy: upsert-only
registry: txt
txtOwnerId: eks-prod-us-east-1
domainFilters:
  - prod.example.com
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/external-dns-prod

Use separate ExternalDNS deployments for public and private zones.

Do not let one controller manage both unless you have a strong reason.

10. Azure DNS Pattern: Azure DNS + AKS

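A typical AKS public endpoint pattern uses an Azure DNS public zone whose record points at the public ingress or Application Gateway frontend.

For private endpoints, the equivalent is an Azure Private DNS zone whose record points at an internal load balancer or gateway.

AKS gives multiple integration paths: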

ExternalDNS with Azure DNS.
Application Routing add-on for managed DNS/TLS workflows.
Application Gateway Ingress Controller.
Application Gateway for Containers with Gateway API.
Manual DNS records through IaC.

Choose based on ownership.

Pattern	Best When	Risk
Manual IaC DNS	few stable endpoints	slow app-team iteration
ExternalDNS	many dynamic endpoints	controller has DNS mutation power
Application Routing add-on	platform wants managed DNS/TLS integration	less custom control
App Gateway + Key Vault	centralized edge/security ownership	coordination with network/security team

11. TLS Mental Model

TLS answers three questions:

Is the server the entity the client intended to reach?
Is traffic protected from passive reading and active tampering?
Which party terminates or re-encrypts the connection?

In Kubernetes, TLS can terminate at several layers.

Common models:

Model	Description	Use When
Edge termination	TLS ends at CDN/WAF/Front Door/CloudFront	centralized internet security
Load balancer termination	TLS ends at ALB/App Gateway	common web workloads
Ingress termination	TLS ends at NGINX/Envoy/Traefik	app/platform owns cert in cluster
TLS passthrough	TLS reaches app Pod	app requires end-to-end TLS ownership
Re-encryption	TLS at edge, then TLS again to backend	regulated/internal trust boundary
mTLS	both sides authenticate	service mesh or high-security internal traffic

Do not say “we use HTTPS” until you can say exactly where TLS terminates.

12. Kubernetes TLS Secrets

A Kubernetes TLS Secret stores certificate and private key material.

Shape:

apiVersion: v1
kind: Secret
metadata:
  name: orders-tls
  namespace: prod-orders
type: kubernetes.io/tls
data:
  tls.crt: <base64 PEM certificate chain>
  tls.key: <base64 PEM private key>

A typical Ingress reference:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders
  namespace: prod-orders
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - orders.prod.example.com
      secretName: orders-tls
  rules:
    - host: orders.prod.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orders
                port:
                  number: 8080

This is simple. The production questions are not simple:

Who can read the Secret?
How is the Secret encrypted at rest?
How is the key generated?
How is renewal triggered?
Does the ingress controller reload without dropping connections?
What happens if renewal fails?
How many days before expiry do we alert?
Is the private key ever stored in Git?
Is the certificate chain complete?
Does the certificate cover all SANs?

Never treat a TLS Secret as ordinary config.

It is private key material.
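
A cheap guardrail is to check the shape of a TLS Secret before it reaches the cluster. The sketch below only verifies the type, the two data keys, and that each value decodes to something that starts with a PEM header. It does not parse or validate the certificate, and the helper name and the accepted header list are assumptions.

```python
import base64

# PEM headers this sketch accepts; a real check might allow more key formats.
PEM_MARKERS = {
    "tls.crt": ("-----BEGIN CERTIFICATE-----",),
    "tls.key": (
        "-----BEGIN PRIVATE KEY-----",
        "-----BEGIN RSA PRIVATE KEY-----",
        "-----BEGIN EC PRIVATE KEY-----",
    ),
}


def tls_secret_problems(secret: dict) -> list:
    """Shape check for a kubernetes.io/tls Secret manifest loaded as a dict."""
    problems = []
    if secret.get("type") != "kubernetes.io/tls":
        problems.append("type is not kubernetes.io/tls")
    data = secret.get("data") or {}
    for key, markers in PEM_MARKERS.items():
        if key not in data:
            problems.append(f"missing {key}")
            continue
        try:
            pem = base64.b64decode(data[key], validate=True).decode("ascii")
        except ValueError:  # covers bad base64 and non-ASCII payloads
            problems.append(f"{key} is not valid base64-encoded text")
            continue
        if not pem.lstrip().startswith(markers):
            problems.append(f"{key} does not start with an expected PEM header")
    return problems
```

A check like this belongs in a pipeline that already handles the Secret securely. It must never log the decoded key.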

13. Certificate Sources

Production platforms usually choose one or more certificate sources.

Source	Example	Strength	Weakness
Public CA through ACME	Let's Encrypt with cert-manager	automated, fast	rate limits, DNS/HTTP challenge dependency
Cloud certificate manager	AWS ACM, Azure Key Vault certs	cloud-integrated, centralized	controller-specific integration
Enterprise CA	internal PKI	compliance and private trust	slower process, integration complexity
Service mesh CA	Istio/Linkerd SPIFFE-like identity	workload mTLS	usually not for public browser TLS
Manually imported cert	uploaded PEM/PFX	simple emergency path	rotation risk

The correct source depends on where TLS terminates.

If TLS terminates at AWS ALB, ACM is often the cleanest source.

If TLS terminates at Azure Application Gateway, Key Vault integration is often the cleanest source.

If TLS terminates in an in-cluster ingress controller, cert-manager is often the cleanest source.

14. cert-manager Pattern

cert-manager extends Kubernetes with certificate resources.

Core objects:

Object	Scope	Purpose
`Issuer`	namespace	certificate issuer for one namespace
`ClusterIssuer`	cluster	shared issuer across namespaces
`Certificate`	namespace	desired certificate and Secret target
`CertificateRequest`	namespace	concrete request generated by cert-manager
`Order` / `Challenge`	namespace	ACME protocol state

Example ClusterIssuer using ACME HTTP-01:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: platform@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

Example Certificate:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: orders-cert
  namespace: prod-orders
spec:
  secretName: orders-tls
  dnsNames:
    - orders.prod.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

cert-manager then reconciles the desired certificate and writes the target Secret.

14.1 HTTP-01 vs DNS-01

Challenge	How It Works	Good For	Risk
HTTP-01	CA calls a temporary HTTP path on the hostname	simple public websites	requires public route during issuance
DNS-01	CA checks a TXT record	wildcard certs, private ingress, no HTTP exposure	DNS API permission needed

DNS-01 often fits platform engineering better because it separates certificate proof from HTTP path routing.

But DNS-01 gives the cert controller DNS mutation power.

That power must be constrained.
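
It helps to see exactly which record DNS-01 needs, because that is the record the permission should be scoped to. The sketch below follows the ACME DNS-01 rules (the TXT record lives at `_acme-challenge.` plus the domain, with any wildcard label removed, and its value is the unpadded base64url SHA-256 digest of the key authorization). The function names and the sample key authorization string are placeholders.

```python
import base64
import hashlib


def dns01_record_name(domain: str) -> str:
    """TXT record name the CA will query for this identifier."""
    # A wildcard certificate is validated against the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base.rstrip('.')}."


def dns01_record_value(key_authorization: str) -> str:
    """TXT record value: base64url(SHA-256(key authorization)), no padding."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


print(dns01_record_name("*.prod.example.com"))
print(dns01_record_name("orders.prod.example.com"))
print(dns01_record_value("placeholder-token.placeholder-account-thumbprint"))
```

Because only `_acme-challenge` names are ever written, the certificate controller's DNS permission can be much narrower than the permission ExternalDNS needs for the same zone.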

15. AWS TLS Pattern: ACM + ALB

For EKS with AWS Load Balancer Controller, a common pattern is:

Route 53 -> ALB -> Service -> Pod
TLS terminates at ALB
Certificate is stored in ACM

Example Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders
  namespace: prod-orders
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:111122223333:certificate/abc-def
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation.
  # Assumes an IngressClass named "alb" exists in the cluster.
  ingressClassName: alb
  rules:
    - host: orders.prod.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orders
                port:
                  number: 8080

In this model:

Kubernetes references the certificate ARN.
The private key stays in ACM, not in a Kubernetes Secret.
The ALB performs TLS termination.
The app receives HTTP unless backend protocol is configured for HTTPS.

This is often a strong security posture because private keys are not stored in etcd.

15.1 ACM Certificate Ownership

ACM certificates should be provisioned by IaC or a controlled certificate workflow, not copied manually during incidents.

Track:

domain names/SANs
validation method
owning hosted zone
renewal eligibility
attached load balancers
expiry alarms
environment
certificate owner

When ACM cannot renew, the incident is usually caused by broken validation DNS records or ownership drift.

16. Azure TLS Pattern: Key Vault + Application Gateway

For AKS with Application Gateway, a common pattern is:

Azure DNS -> Application Gateway -> AKS backend
TLS terminates at Application Gateway
Certificate is stored in Azure Key Vault

Application Gateway can reference certificates from Key Vault. This centralizes certificate handling and keeps private key material outside ordinary Kubernetes Secrets.

Important ownership model:

Application Gateway managed identity -> Key Vault certificate/secret permission -> HTTPS listener

Benefits:

separate security team can own certificate lifecycle
Application Gateway can pull updated cert versions
app teams do not handle private keys
Key Vault access can be audited centrally

Production caveat:

Use versionless Key Vault secret identifiers when automatic rotation is expected. Versioned references pin the listener to one version and can defeat automatic rotation.
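
A pinned reference is easy to detect mechanically. The sketch below assumes the usual Key Vault secret identifier layout, `https://<vault>.vault.azure.net/secrets/<name>/<version>`; the vault name, secret name, and helper names are made up for illustration.

```python
from urllib.parse import urlsplit


def _secret_path_parts(secret_id: str) -> list:
    """Non-empty path segments, e.g. ["secrets", "orders-cert", "<version>"]."""
    return [p for p in urlsplit(secret_id).path.split("/") if p]


def is_versionless(secret_id: str) -> bool:
    """True when the identifier names a secret without pinning a version."""
    parts = _secret_path_parts(secret_id)
    return len(parts) == 2 and parts[0] == "secrets"


def to_versionless(secret_id: str) -> str:
    """Drop the version segment so the listener can follow rotations."""
    url = urlsplit(secret_id)
    parts = _secret_path_parts(secret_id)
    if len(parts) < 2 or parts[0] != "secrets":
        raise ValueError(f"not a Key Vault secret identifier: {secret_id}")
    return f"{url.scheme}://{url.netloc}/secrets/{parts[1]}"


pinned = "https://edge-kv.vault.azure.net/secrets/orders-cert/0123456789abcdef0123456789abcdef"
print(is_versionless(pinned))
print(to_versionless(pinned))
```

A check like this fits well in IaC review, where a versioned identifier on a listener can be flagged before it silently blocks rotation.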

17. AKS Application Routing Add-on

AKS Application Routing can simplify DNS and TLS for common ingress scenarios.

This is useful when the platform wants managed integration between:

AKS routing resources
Azure DNS
Azure Key Vault certificates
external-dns-like automation
ingress/gateway exposure

Use it when the default path fits.

Avoid it when you need deep custom control over every edge component.

18. Gateway API TLS Model

Gateway API separates infrastructure owner concerns from route owner concerns.

Example:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
  namespace: platform-edge
spec:
  gatewayClassName: external
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.prod.example.com"
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: wildcard-prod-example-com
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              edge-access: public

Application route:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders
  namespace: prod-orders
spec:
  parentRefs:
    - name: public-gateway
      namespace: platform-edge
  hostnames:
    - orders.prod.example.com
  rules:
    - backendRefs:
        - name: orders
          port: 8080

This model is excellent for platform engineering because it separates:

platform-owned gateway listeners
app-owned route intent
policy-controlled namespace attachment
certificate ownership
hostnames allowed by environment
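
The namespace attachment rule in the Gateway above is a plain label match. As a rough sketch of the `matchLabels` semantics only (set-based `matchExpressions` are not modeled, and the function name is hypothetical):

```python
def namespace_may_attach(namespace_labels: dict, match_labels: dict) -> bool:
    """matchLabels semantics: every listed key/value must be on the namespace."""
    return all(
        namespace_labels.get(key) == value
        for key, value in match_labels.items()
    )


listener_selector = {"edge-access": "public"}
print(namespace_may_attach({"edge-access": "public", "team": "orders"}, listener_selector))
print(namespace_may_attach({"team": "orders"}, listener_selector))
```

The practical consequence is that whoever can label namespaces controls who can attach routes, so namespace labels need the same protection as the Gateway itself.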

19. Edge Security Controls

TLS is not the whole edge security story.

A production endpoint should define:

Control	Purpose
HTTPS redirect	prevent plaintext client traffic
TLS policy	minimum version and cipher posture
HSTS	instruct browsers to use HTTPS only for the hostname
WAF	block common attack patterns
rate limiting	protect backend and cost surface
request size limits	prevent resource exhaustion
header normalization	reduce spoofing and routing bugs
source restrictions	allow only expected clients/networks
bot protection	reduce automated abuse
DDoS posture	absorb volumetric attacks upstream
auth boundary	decide whether edge, gateway, or app owns auth

Do not push all edge security into application code.

Application code should enforce business authorization, but edge infrastructure should reduce obvious bad traffic before it consumes app capacity.

20. Where to Put WAF

Common WAF locations:

Platform	WAF Location	Typical Pairing
AWS	AWS WAF on ALB	EKS ALB ingress
AWS	AWS WAF on CloudFront	global edge before regional ALB
Azure	WAF_v2 Application Gateway	regional ingress to AKS
Azure	Azure Front Door WAF	global edge before regional AKS

Selection logic:

Need global acceleration and centralized edge? Use CDN/front door layer.
Need regional L7 protection near AKS/EKS? Use ALB/App Gateway WAF.
Need private-only traffic? WAF may sit in internal gateway or be replaced by network/auth controls.

Never enable WAF without an exception workflow.

False positives become production incidents.

21. Certificate Renewal Failure Model

Certificate expiry incidents are embarrassing because they are predictable.

A serious platform treats certificates as lifecycle objects.

Minimum monitoring:

certificate expiry days remaining
renewal controller health
issuer/account health
failed ACME challenges
DNS validation record state
Key Vault/ACM certificate renewal state
load balancer listener certificate binding
ingress/gateway reload state
synthetic HTTPS probe from outside the cluster

Alert thresholds:

Days Before Expiry	Severity	Action
30	warning	investigate renewal path
14	high	assign owner, confirm new cert available
7	urgent	manual renewal fallback ready
3	incident	executive-visible service risk
0	outage	user-facing TLS failure

Do not wait until three days before expiry to discover that nobody owns DNS validation.
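
The threshold table translates directly into an alert rule. A minimal sketch, assuming the certificate's expiry time has already been read from a probe or an inventory; the function name is hypothetical and the severity labels are the ones from the table.

```python
from datetime import datetime, timedelta, timezone

# (days remaining at or below this value, severity), checked most urgent first.
THRESHOLDS = [
    (0, "outage"),
    (3, "incident"),
    (7, "urgent"),
    (14, "high"),
    (30, "warning"),
]


def expiry_severity(not_after: datetime, now: datetime) -> str:
    """Map time until certificate expiry onto the alert thresholds above."""
    days_left = (not_after - now).total_seconds() / 86400
    for limit, label in THRESHOLDS:
        if days_left <= limit:
            return label
    return "ok"


now = datetime(2026, 7, 3, tzinfo=timezone.utc)
print(expiry_severity(now + timedelta(days=10), now))
print(expiry_severity(now + timedelta(days=45), now))
```

Evaluating this against the certificate that the public hostname actually serves, rather than the one stored in a Secret or in ACM, also catches listeners that never picked up a renewed certificate.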

22. End-to-End TLS vs TLS Termination

A common debate:

Should we terminate TLS at the load balancer or run TLS all the way to the Pod?

The answer depends on the trust boundary.

22.1 Terminate at Load Balancer

Pros:

simpler app configuration
centralized certificate management
easier WAF/header policy
fewer private keys in cluster
simpler debugging

Cons:

traffic may be plaintext inside VPC/VNet unless re-encrypted
app cannot directly inspect client TLS certs
compliance teams may object for sensitive data paths

22.2 Re-encrypt to Backend

Pros:

encrypted traffic beyond edge
compatible with stricter internal security posture
can use internal CA between gateway and app

Cons:

more certificate lifecycle work
backend health checks are harder
app/framework TLS config becomes operational dependency
ingress/gateway must trust backend CA

22.3 mTLS Between Services

Usually handled with service mesh or dedicated sidecar/proxy model.

Pros:

workload identity at transport layer
encrypted east-west traffic
strong service-to-service authentication

Cons:

operational complexity
cert rotation complexity
debugging complexity
policy design burden

Use mTLS for clear risk-driven reasons, not because it sounds advanced.

23. Hostname Admission Policy

One of the best platform guardrails is hostname validation.

Examples of invalid app-owned requests:

host: admin.prod.example.com       # reserved
host: login.example.com            # security-owned
host: api.other-team.example.com   # wrong ownership
host: example.com                  # apex not allowed
host: '*.prod.example.com'         # wildcard not app-owned

Policy requirements:

namespace/team can only claim approved suffixes
prod hostnames require production namespace labels
reserved hostnames require platform/security approval
wildcard hostnames limited to platform namespaces
internal-only hostnames cannot use public ingress class
public hostnames cannot use private DNS zone without explicit design

Kyverno-style conceptual policy:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-ingress-hostnames
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-prod-host-suffix
      match:
        any:
          - resources:
              kinds:
                - Ingress
      validate:
        message: "Production ingress hosts must end with .prod.example.com"
        pattern:
          spec:
            rules:
              - host: "*.prod.example.com"

Real policies need more nuance, but the invariant is simple:

App teams should not be able to claim arbitrary DNS names just by committing YAML.
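
The same invariant can be written as ordinary code, which is useful for testing the rules before encoding them in a policy engine. This is a sketch under stated assumptions: the reserved list, the approved suffix, and the `hostname_decision` helper are hypothetical, and the sample hosts are the invalid requests listed above.

```python
# Hypothetical reserved names; a real list would come from platform/security.
RESERVED = {"admin.prod.example.com", "login.example.com"}


def hostname_decision(host: str, allowed_suffixes: list,
                      is_platform_namespace: bool = False) -> str:
    """Return "allow" or a "deny: ..." reason for a requested hostname."""
    host = host.lower().rstrip(".")
    if host.startswith("*.") and not is_platform_namespace:
        return "deny: wildcard hostnames are limited to platform namespaces"
    if host in RESERVED:
        return "deny: reserved hostname needs platform/security approval"
    # Requiring a leading dot on the suffix also rejects the bare zone apex.
    if not any(host.endswith("." + suffix) for suffix in allowed_suffixes):
        return "deny: hostname is outside the approved suffixes"
    return "allow"


for requested in ["orders.prod.example.com", "admin.prod.example.com",
                  "api.other-team.example.com", "example.com",
                  "*.prod.example.com"]:
    print(requested, "->", hostname_decision(requested, ["prod.example.com"]))
```

Keeping the decision logic in a testable function makes it easier to review than pattern syntax alone, even if the enforced version lives in an admission policy.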

24. Edge Path Observability

A good request trace crosses layers.

DNS answer -> client connect -> edge request log -> load balancer access log -> ingress log -> app log -> trace

Minimum signals:

Layer	Signal
DNS	authoritative record value, query logs if enabled
TLS	certificate expiry, handshake failures, protocol/cipher stats
WAF	allowed/blocked/count rules
Load balancer	request count, target response code, target latency, healthy targets
Ingress/Gateway	route match, upstream status, config reload errors
Service	EndpointSlice count, ready endpoints
Pod	readiness state, app status, request logs

Synthetic probes should test the user-facing hostname:

curl -Iv https://orders.prod.example.com/health

Do not only test the Service inside the cluster.

That proves the app works, not that the endpoint works.

25. Debugging Cookbook

25.1 DNS Record Check

dig orders.prod.example.com

dig +short orders.prod.example.com

dig @8.8.8.8 orders.prod.example.com

dig @1.1.1.1 orders.prod.example.com

Check authoritative nameserver:

dig NS prod.example.com

dig @ns-123.awsdns-45.com orders.prod.example.com

25.2 TLS Certificate Check

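openssl s_client -connect orders.prod.example.com:443 -servername orders.prod.example.com </dev/null

To print the fields below from the leaf certificate (the -ext option needs OpenSSL 1.1.1 or newer):

openssl s_client -connect orders.prod.example.com:443 -servername orders.prod.example.com </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates -ext subjectAltName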

Important fields:

subject
subject alternative names
issuer
expiry
full chain
verification result

25.3 Kubernetes Route Check

kubectl get ingress -A
kubectl describe ingress -n prod-orders orders
kubectl get gateway -A
kubectl get httproute -A
kubectl get svc -n prod-orders orders
kubectl get endpointslice -n prod-orders -l kubernetes.io/service-name=orders

25.4 AWS Check

aws elbv2 describe-load-balancers
aws elbv2 describe-listeners --load-balancer-arn <alb-arn>
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
aws route53 list-resource-record-sets --hosted-zone-id <zone-id>
aws acm describe-certificate --certificate-arn <cert-arn>

25.5 Azure Check

az network dns record-set a list -g <rg> -z prod.example.com
az network application-gateway show -g <rg> -n <appgw>
az keyvault certificate show --vault-name <kv> --name <cert>
az aks show -g <rg> -n <cluster>

The fastest debugging path is outside-in:

DNS -> TLS -> load balancer listener -> backend health -> ingress/gateway route -> service endpoints -> pod readiness -> app

26. Common Failure Modes

26.1 DNS Points to Old Load Balancer

Cause:

Ingress was recreated.
Load balancer DNS name changed.
ExternalDNS failed.
Manual record drifted from desired state.

Fix:

confirm current ingress/load balancer status
update DNS record
reduce TTL before migration next time
move record ownership to IaC/controller with guardrails

26.2 Certificate Valid but Wrong Host

Cause:

host rule changed without cert update
wildcard does not cover required depth
wrong certificate ARN/Secret referenced
SNI mismatch

Fix:

inspect SANs
inspect listener config
inspect ingress/gateway TLS config
add synthetic probe for all hostnames
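
The "wildcard does not cover required depth" cause is easy to reproduce offline. In certificate hostname matching, a wildcard stands for exactly one leftmost label. The sketch below models only that rule (it ignores IP SANs and internationalized names), and `san_matches` is a hypothetical helper.

```python
def san_matches(san: str, hostname: str) -> bool:
    """Does one DNS SAN entry cover this hostname?"""
    san = san.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not san.startswith("*."):
        return san == hostname
    suffix = san[1:]  # "*.prod.example.com" -> ".prod.example.com"
    if not hostname.endswith(suffix):
        return False
    # The wildcard must stand for exactly one non-empty label.
    leftmost = hostname[: -len(suffix)]
    return leftmost != "" and "." not in leftmost


for name in ["orders.prod.example.com", "api.orders.prod.example.com",
             "prod.example.com"]:
    print(name, san_matches("*.prod.example.com", name))
```

Running every routed hostname through a check like this against the SAN list of the bound certificate catches the mismatch before clients see a certificate error.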

26.3 Renewal Failed

Cause:

DNS challenge cannot mutate TXT record
HTTP challenge path not reachable
CA rate limit
Key Vault/ACM validation record missing
cert-manager issuer degraded

Fix:

inspect Certificate, CertificateRequest, Order, Challenge
inspect DNS provider permissions
renew manually if inside emergency window
create permanent expiry alert

26.4 Public Endpoint Accidentally Internal

Cause:

wrong load balancer scheme
wrong subnet tags
private DNS zone used
internal ingress class selected

Fix:

inspect load balancer scheme
inspect route table/subnet tags
inspect DNS answer from public resolver
enforce ingress class/hostname policy

26.5 Internal Endpoint Accidentally Public

Cause:

public ingress class used
public DNS zone annotation used
service type LoadBalancer without internal annotation
default controller behavior misunderstood

Fix:

delete or patch public exposure immediately
rotate any exposed credentials if needed
add admission control
review namespace default templates

26.6 WAF Blocks Real Traffic

Cause:

managed rule false positive
request body size limit
path normalization difference
bot rule too aggressive

Fix:

switch rule to count/detect mode if safe
inspect WAF logs
add narrow exception
maintain rule change review process

27. Production Design Patterns

Pattern A: Public Web App on EKS

Route 53 public zone
ExternalDNS manages app hostname
AWS Load Balancer Controller creates ALB
ACM cert terminates TLS on ALB
AWS WAF attached to ALB
ALB target type ip
Service routes to ready Pods

Good for:

public APIs
web frontends
standard regional workloads

Main risks:

ALB annotation sprawl
cert ARN drift
WAF false positives
DNS ownership too broad

Pattern B: Public Web App on AKS with Application Gateway

Azure DNS public zone
Application Gateway WAF_v2
Key Vault certificate
AGIC or Application Gateway for Containers
AKS Service backend

Good for:

Azure-native edge ownership
centralized Key Vault certificate lifecycle
WAF at regional edge

Main risks:

identity permission between App Gateway and Key Vault
subnet and routing ownership
listener/cert reference drift

Pattern C: In-Cluster Ingress with cert-manager

DNS provider
ExternalDNS
NGINX/Envoy ingress
cert-manager ACME DNS-01
Kubernetes TLS Secret

Good for:

cloud-portable platform
teams comfortable operating ingress controllers
custom L7 behavior

Main risks:

private keys in Kubernetes Secrets
ingress controller reload behavior
DNS challenge permission
multi-tenant secret access control

Pattern D: Gateway API Platform Edge

Platform owns GatewayClass/Gateway/listeners
App teams own HTTPRoute
Certificates owned by platform/security
Routes attach through allowedRoutes policy

Good for:

internal developer platform
separation of infrastructure and route ownership
multi-team governance

Main risks:

controller maturity differences
policy complexity
migration from Ingress mental model

28. Edge Security Review Checklist

For each production endpoint:

[ ] hostname owner documented
[ ] DNS zone owner documented
[ ] public/private visibility correct
[ ] TTL appropriate for service maturity
[ ] DNS automation ownership bounded
[ ] TLS termination point documented
[ ] certificate source documented
[ ] certificate renewal monitored
[ ] private key not stored in Git
[ ] Kubernetes Secret access minimized if used
[ ] WAF/rate-limit decision documented
[ ] HTTP -> HTTPS redirect enabled where appropriate
[ ] backend health check matches readiness semantics
[ ] synthetic probe checks real hostname
[ ] ingress/gateway controller logs available
[ ] load balancer access logs available
[ ] rollback path documented
[ ] emergency certificate replacement path tested
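
Two of the checklist items, the synthetic probe and renewal monitoring, can be expressed as alert rules. The sketch below assumes a Prometheus setup that scrapes the blackbox exporter and relabels the probed URL into the instance label. The probe URL, thresholds, and severity labels are illustrative assumptions.

```yaml
groups:
  - name: edge-endpoint-probes
    rules:
      # Fires when the probe against the real hostname keeps failing.
      - alert: EndpointProbeFailing
        expr: probe_success{instance="https://api.prod.example.com/healthz"} == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Synthetic probe is failing for {{ $labels.instance }}"
      # Fires when the served certificate expires in under 14 days.
      - alert: EndpointCertificateExpiringSoon
        expr: probe_ssl_earliest_cert_expiry{instance="https://api.prod.example.com/healthz"} - time() < 14 * 24 * 3600
        for: 1h
        labels:
          severity: ticket
        annotations:
          summary: "Certificate for {{ $labels.instance }} expires in under 14 days"
```

Probing the public hostname exercises DNS, the load balancer, TLS, and routing in one request, which a check against the Service's cluster-internal address would not.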

29. Decision Matrix

Requirement	Preferred Pattern
Keep private keys out of Kubernetes on AWS	ACM + ALB/NLB termination
Keep private keys out of Kubernetes on Azure	Key Vault + Application Gateway/App Gateway for Containers
Need wildcard certs for many app routes	DNS-01 with cert-manager or cloud cert manager
Need app teams to self-serve routes safely	Gateway API with allowedRoutes and hostname policy
Need cloud-portable ingress	NGINX/Envoy + cert-manager + ExternalDNS
Need enterprise central edge	CloudFront/Front Door + regional ingress
Need private-only APIs	private DNS + internal load balancer/gateway
Need browser-facing public TLS	public CA / ACM / Key Vault integrated certificate
Need service-to-service identity	mTLS/service mesh, not public TLS alone

30. Practice Lab

Build the same endpoint contract twice:

EKS implementation.
AKS implementation.

Requirements:

hostname: orders.lab.example.com
visibility: public
http redirect: enabled
TLS: enabled
certificate renewal: automated
DNS: automated
health check: readiness-compatible
WAF: design only, not necessarily implemented

Deliverables:

endpoint contract YAML
DNS ownership statement
TLS termination diagram
Kubernetes manifest
cloud-specific manifest/annotation
certificate renewal runbook
failure-mode table
rollback procedure
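
As a starting point for the first deliverable, the EKS variant of the contract might begin like this. It reuses the field names from the endpoint contract in section 1. The zone, backend Service, team names, and the httpRedirect field are assumptions added for the lab and should be replaced with your own values.

```yaml
endpointContract:
  hostname: orders.lab.example.com
  visibility: public
  dnsProvider: route53
  dnsZone: lab.example.com              # assumed delegated lab zone
  tlsTermination: aws-alb
  certificateSource: acm
  ingressApi: kubernetes-ingress
  ingressController: aws-load-balancer-controller
  backendService: orders.lab.svc.cluster.local   # hypothetical Service and namespace
  backendProtocol: http
  httpRedirect: enabled                 # field added for this lab, not in the section 1 contract
  authLayer: application
  waf: design-only
  ownerTeam: platform-edge              # placeholder
  appOwner: orders-lab                  # placeholder
```

The AKS variant should keep the same keys and change only the provider-specific values, which makes the two implementations easy to compare line by line.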

You understand this part when you can explain why the hostname is available or unavailable without opening the application code.

31. Part Summary

DNS, TLS, and edge security are not decorations around Kubernetes.

They are the user-facing contract of the platform.

The invariants:

DNS names must have owners.
TLS termination must be explicit.
Certificates must be lifecycle-managed.
Private keys must have a security boundary.
Edge exposure must be policy-controlled.
External automation must be least-privileged.
Synthetic probes must test the real hostname.

If those invariants hold, endpoints are operable.

If they do not, Kubernetes YAML only gives the illusion of control.

