Service Discovery and Kubernetes Networking

Learn Kubernetes with Cloud Services AWS & Azure - Part 009

Service discovery and Kubernetes networking from a production engineering perspective: Service, EndpointSlice, DNS, kube-proxy, traffic flow, cloud implications, and debugging.

[2026-07-03] 18 min read, 3404 words

Part 009 — Service Discovery and Kubernetes Networking

The useful way to learn Kubernetes networking is not to memorize every object but to understand the contract:

A Pod is replaceable and unstable. A Service gives that replaceable group of Pods a stable network identity.

Everything else is implementation detail: labels, EndpointSlices, DNS records, kube-proxy or eBPF dataplane, cloud load balancer integration, CNI behavior, and network policy enforcement.

In production, broken networking rarely looks like “Kubernetes networking is broken”. It looks like:

rollout succeeds but 20% of traffic still hits old behavior;
service name resolves but connection times out;
one Availability Zone works and another fails;
CoreDNS CPU spikes and every service starts failing randomly;
a Service has zero endpoints because labels drifted;
pods are healthy, but cloud load balancer health checks fail;
a migration from ClusterIP to LoadBalancer accidentally exposes an internal API;
kube-proxy rules, CNI IP exhaustion, or NodePort security group gaps create a partial outage.

This part builds the mental model needed to debug those situations.

1. The Core Model

Kubernetes networking is based on a few invariants.

Invariant	Meaning	Production consequence
Every Pod gets an IP	A Pod is directly addressable within the cluster network	Pod IP is not stable, so do not depend on it as identity
Containers in the same Pod share network namespace	They share IP and port space	Sidecars can communicate through `localhost`; port collision matters
Services select Pods by labels	Service membership is indirect	Label hygiene is production-critical
Service IP is stable during Service lifetime	Clients call the Service, not individual Pods	Service becomes the logical dependency boundary
EndpointSlice represents backends	Kubernetes materializes selected Pods into endpoints	Debug endpoints, not only Service YAML
DNS maps names to Services	Most application code uses DNS names	DNS is part of application availability
Network dataplane is implementation-specific	iptables, IPVS, nftables, eBPF, cloud CNI, etc.	Same manifest can have different operational behavior per cluster

The important distinction:

Service is the stable abstraction.
EndpointSlice is the current resolved backend set.
CNI/dataplane is how packets actually move.
DNS is how clients usually discover the Service.

When debugging, never stop at `kubectl get svc`. A Service with a ClusterIP can still have no usable backend.

2. Why Service Discovery Exists

Without Service discovery, a client needs to know Pod IPs:

order-api -> 10.244.2.17:8080
order-api -> 10.244.5.31:8080
order-api -> 10.244.7.11:8080

That does not survive:

rolling deployment;
node replacement;
autoscaling;
rescheduling after failure;
blue/green migration;
pod disruption;
zone rebalancing.

With a Service:

order-api -> inventory-service.default.svc.cluster.local:8080

The client depends on a logical service name. Kubernetes continuously updates the backend set behind that logical name.

This is not only convenience. It is a correctness mechanism.

3. Minimal Service Example

A Service selects Pods through labels.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: inventory-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: inventory-api
    spec:
      containers:
        - name: app
          image: registry.example.com/inventory-api:1.4.2
          ports:
            - name: http
              containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: inventory-api
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: inventory-api
  ports:
    - name: http
      port: 80
      targetPort: http

The Service port is 80. The container port is 8080. The name `http` ties the Service to the container port by name.

Prefer named ports in production because they survive container port changes better than hard-coded numeric `targetPort` values.
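As a minimal sketch of why the named port helps, the function below mimics how a `targetPort` is resolved against the ports one Pod declares. The function name and the "new version listens on 9000" scenario are assumptions made for illustration, not part of any Kubernetes API.

```python
from typing import Dict, Union


def resolve_target_port(target_port: Union[int, str], container_ports: Dict[str, int]) -> int:
    """Resolve a Service targetPort against the ports declared by one Pod."""
    if isinstance(target_port, int):
        # A numeric targetPort is used as-is, whether or not the Pod listens there.
        return target_port
    if target_port not in container_ports:
        raise LookupError(f"Pod declares no container port named {target_port!r}")
    # A named targetPort is looked up per Pod, so Pods may map it to different numbers.
    return container_ports[target_port]


# Hypothetical rollout: the current version listens on 8080, a later one moves to 9000.
old_pod_ports = {"http": 8080}
new_pod_ports = {"http": 9000}

print(resolve_target_port("http", old_pod_ports))  # 8080
print(resolve_target_port("http", new_pod_ports))  # 9000
print(resolve_target_port(8080, new_pod_ports))    # 8080: the new Pod no longer listens here
```

With the named port, both Pod generations stay reachable during the rollout without touching the Service. With the numeric value, the Service keeps pointing at a port the new Pods have stopped serving.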

4. The Service-to-Pod Resolution Chain

When a client calls:

http://inventory-api.default.svc.cluster.local

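The chain is:

client Pod
  -> DNS query to CoreDNS returns the Service ClusterIP
  -> client connects to ClusterIP:80
  -> node dataplane (kube-proxy or an eBPF equivalent) picks a ready endpoint from the Service's EndpointSlices
  -> packet is forwarded to Pod IP:8080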

Conceptually, this is simple. Operationally, there are many places it can break:

Layer	Example failure	Symptom
DNS	CoreDNS unavailable	`no such host`, timeout resolving Service name
Service	Wrong selector	Service exists but has no endpoints
EndpointSlice	Pods not ready	Service resolves but traffic has no backends
Dataplane	kube-proxy/eBPF issue	ClusterIP connection fails despite endpoints
CNI	routing/IP exhaustion	Pod-to-Pod fails across nodes or new Pods cannot get IPs
App	wrong `targetPort` or readiness behavior	traffic reaches wrong port or unready app

5. Service Types

Kubernetes gives several Service types, plus the headless variant of `ClusterIP`. Use them intentionally.

Type	Scope	Typical use	Production warning
`ClusterIP`	Inside cluster	Internal service-to-service calls	Default and safest for internal APIs
Headless (`clusterIP: None`)	DNS returns backend Pod records	Stateful discovery, direct backend identity	Client must handle backend selection/failover
`NodePort`	Exposes port on every node	Building block for external LBs, sometimes debugging	Avoid direct public NodePort exposure
`LoadBalancer`	Cloud provider creates external/internal LB	L4 public/private exposure	Cloud-specific behavior and cost
`ExternalName`	DNS CNAME to external service	Alias external dependency	No proxying, no port mapping, DNS-only abstraction

Most internal application Services should be ClusterIP.

Use LoadBalancer when the application needs L4 exposure and a cloud load balancer is the right edge primitive.

Use Ingress/Gateway API when the application needs L7 HTTP routing, host/path routing, TLS termination, request policy, or shared edge infrastructure.

6. ClusterIP Service

ClusterIP is the default Service type.

apiVersion: v1
kind: Service
metadata:
  name: billing-api
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: billing-api
  ports:
    - name: http
      port: 80
      targetPort: http

Client code should call:

http://billing-api.payments.svc.cluster.local

Inside the same namespace, this often works:

http://billing-api

But production configuration should prefer fully qualified service names when cross-namespace dependencies are involved:

billing-api.payments.svc.cluster.local

This avoids ambiguity when a service with the same name exists in another namespace.

7. Service Selector Correctness

A Service selector is not validated against a Deployment selector. Kubernetes will happily accept a Service that selects nothing.

Bad example:

apiVersion: v1
kind: Service
metadata:
  name: checkout-api
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: 8080

Deployment labels:

metadata:
  labels:
    app.kubernetes.io/name: checkout-api

This Service has zero endpoints.

Debug command:

kubectl get endpointslice -l kubernetes.io/service-name=checkout-api
kubectl describe svc checkout-api
kubectl get pod -l app=checkout
kubectl get pod -l app.kubernetes.io/name=checkout-api

Production invariant:

Every Service selector must be tested against live Pods before release.

In CI, use static policy. In cluster, use admission or conformance checks.
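A static check of this kind can be very small. The sketch below applies equality-based selector semantics (every selector key must be present on the Pod with the same value) to the checkout example above. The helper names and the Pod names are made up for illustration.

```python
from typing import Dict, List


def selector_matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based selector: every selector key must be present with the same value."""
    if not selector:
        # A Service without a selector does not select Pods at all.
        return False
    return all(labels.get(key) == value for key, value in selector.items())


def selected_pods(selector: Dict[str, str], pods: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of Pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if selector_matches(selector, labels))


# Pod labels as rendered from the Deployment template (Pod names are hypothetical).
pods = {
    "checkout-api-7d9f-abcde": {"app.kubernetes.io/name": "checkout-api"},
    "checkout-api-7d9f-fghij": {"app.kubernetes.io/name": "checkout-api"},
}

print(selected_pods({"app": "checkout"}, pods))  # [] means zero endpoints
print(selected_pods({"app.kubernetes.io/name": "checkout-api"}, pods))
```

In a pipeline, the same comparison could run against rendered manifests and fail the build when a Service selects no Pod template.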

8. EndpointSlice Mental Model

Older Kubernetes versions used Endpoints heavily. Modern Kubernetes uses EndpointSlice to scale better for large backend sets.

An EndpointSlice contains a subset of endpoints for a Service.

Conceptually:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: inventory-api
addressType: IPv4
ports:
  - name: http
    port: 8080
    protocol: TCP
endpoints:
  - addresses:
      - 10.42.1.18
    conditions:
      ready: true
  - addresses:
      - 10.42.3.24
    conditions:
      ready: true

You normally do not create EndpointSlices manually for selector-based Services. The controller creates them.

But you should inspect them during incident response.

Useful commands:

kubectl get endpointslice -A
kubectl get endpointslice -n payments -l kubernetes.io/service-name=inventory-api -o wide
kubectl describe endpointslice -n payments -l kubernetes.io/service-name=inventory-api

Interpretation:

Observation	Likely meaning
No EndpointSlice, or a slice with no endpoints	Service selector selects no Pods, or controller problem
Endpoints exist but `ready=false`	Pods exist but failing readiness excludes them from traffic
Endpoints have unexpected IPs	Label selector captures wrong Pods
Endpoints only in one zone	Scheduling/topology issue
Endpoint port mismatch	Service `targetPort` / container port mismatch

9. DNS Contract

Kubernetes DNS gives Services stable names.

Common forms:

service-name
service-name.namespace
service-name.namespace.svc
service-name.namespace.svc.cluster.local

Inside namespace payments, a Pod can resolve:

inventory-api

From another namespace, use:

inventory-api.payments.svc.cluster.local
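The short and long forms behave differently because of the resolver search list. The sketch below lists the names a glibc-style resolver would try, assuming the usual Pod DNS defaults: a search list of `<namespace>.svc.cluster.local`, `svc.cluster.local`, `cluster.local` and `ndots:5`. It ignores search domains inherited from the node, and other resolvers (musl, language runtimes) may behave differently. The function name is hypothetical.

```python
from typing import List


def candidate_queries(name: str, namespace: str,
                      cluster_domain: str = "cluster.local", ndots: int = 5) -> List[str]:
    """List the fully qualified names a glibc-style resolver tries, in order."""
    if name.endswith("."):
        return [name]  # a trailing dot marks the name as absolute, so no search list applies
    search = [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as written first
    return expanded + [absolute]      # too few dots: walk the search list first


# A Pod in namespace payments resolving the short name: the first candidate matches.
for query in candidate_queries("inventory-api", "payments"):
    print(query)

# A Pod in namespace orders using the long name without a trailing dot.
print(len(candidate_queries("inventory-api.payments.svc.cluster.local", "orders")))  # 4
```

Under these assumptions the long name has only four dots, so a caller in another namespace walks three search-list candidates before the name is tried as written. That extra query volume is one reason DNS caching and connection reuse matter in high-throughput paths.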

DNS is not a side concern. DNS is in the critical path of almost every microservice call.

Production implications:

configure application DNS caching intentionally;
avoid doing DNS lookup per request in high-throughput paths;
monitor CoreDNS latency, errors, cache hit ratio, CPU, memory;
avoid excessive short-lived connections that trigger repeated lookups;
treat CoreDNS as a tier-0 cluster dependency.

10. CoreDNS Failure Patterns

CoreDNS failures usually appear as application failures.

Failure pattern	App symptom	Root cause examples
DNS timeout	intermittent dependency failures	overloaded CoreDNS, network policy blocking DNS, node issue
`NXDOMAIN`	service not found	wrong namespace, wrong name, missing Service
slow resolution	high p95/p99 latency	CoreDNS CPU throttling, upstream DNS latency
only some Pods fail	node-local issue	node routing, iptables/eBPF, DNS config drift
external DNS lookup fails	cannot call external APIs	upstream resolver, NAT, firewall, DNS forwarding config

Debug from inside cluster:

kubectl run dns-debug \
  --rm -it \
  --image=registry.k8s.io/e2e-test-images/agnhost:2.45 \
  --restart=Never \
  --command \
  -- nslookup inventory-api.payments.svc.cluster.local

Also inspect:

kubectl -n kube-system get deploy coredns
kubectl -n kube-system top pod -l k8s-app=kube-dns
kubectl -n kube-system logs deploy/coredns

On managed clusters, CoreDNS is often delivered as a managed or cluster add-on. Treat its version, resources, autoscaling, and upgrade lifecycle as platform responsibilities.

11. Headless Service

A headless Service has no ClusterIP:

apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: postgres

Instead of hiding all backends behind one virtual IP, DNS can return records for individual endpoints.

This is useful when:

clients must know individual backend identities;
a StatefulSet needs stable per-pod DNS;
the protocol has its own leader/follower or shard routing semantics;
a service mesh or client-side load balancer expects endpoint discovery.

With StatefulSet:

postgres-0.postgres.data.svc.cluster.local
postgres-1.postgres.data.svc.cluster.local
postgres-2.postgres.data.svc.cluster.local
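These names follow a fixed pattern, so a client or config generator can derive them. The sketch below assumes a StatefulSet named `postgres` whose `serviceName` is the headless Service `postgres` in namespace `data`, with ordinals starting at 0. The function name is hypothetical.

```python
from typing import List


def statefulset_pod_hosts(statefulset: str, service: str, namespace: str,
                          replicas: int, cluster_domain: str = "cluster.local") -> List[str]:
    """Per-Pod DNS names: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


for host in statefulset_pod_hosts("postgres", "postgres", "data", replicas=3):
    print(host)
```

A derived name is only a candidate: it resolves only while the corresponding Pod exists, so the client still needs its own failure handling.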

Production warning:

Headless Service moves more responsibility to the client.

The client must handle connection selection, retry, stale records, and endpoint failure correctly.

12. NodePort Service

A NodePort Service exposes the Service on the same port on every node.

apiVersion: v1
kind: Service
metadata:
  name: legacy-api
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: legacy-api
  ports:
    - name: http
      port: 80
      targetPort: http
      nodePort: 30080

Traffic can arrive at:

<NodeIP>:30080

In production, direct NodePort access is usually not the desired external interface. It is commonly used as a building block behind a cloud load balancer or for controlled internal scenarios.

Risks:

every node exposes the port;
security groups / NSGs must be correct;
client may hit a node with no local backend unless traffic policy is configured;
operational ownership becomes unclear;
port ranges are limited and can collide.

Prefer Ingress/Gateway API or cloud LoadBalancer Services for intentional exposure.

13. LoadBalancer Service

A LoadBalancer Service asks the cloud provider to provision a cloud load balancer.

apiVersion: v1
kind: Service
metadata:
  name: public-metrics-ingest
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: metrics-ingest
  ports:
    - name: tcp
      port: 443
      targetPort: 8443

The exact annotations and behavior are cloud-specific.

On AWS/EKS, L4 exposure often maps to Network Load Balancer behavior through the relevant service controller or AWS Load Balancer Controller.

On Azure/AKS, LoadBalancer Services integrate with Azure Load Balancer and require subnet, public/private, annotation, and health probe awareness.

General production decision:

Requirement	Better primitive
TCP/UDP pass-through	`Service type=LoadBalancer`
HTTP host/path routing	Ingress or Gateway API
Shared edge with many apps	Ingress/Gateway API
Cloud-native WAF/TLS policy	Cloud L7 gateway/controller
Internal service-to-service only	`ClusterIP`

14. ExternalName Service

ExternalName maps a Service name to an external DNS name.

apiVersion: v1
kind: Service
metadata:
  name: external-tax-api
spec:
  type: ExternalName
  externalName: tax-api.vendor.example.com

This is DNS aliasing, not proxying.

Consequences:

no Kubernetes load balancing;
no EndpointSlice backend health;
no port translation;
no traffic policy enforcement at Service level;
external DNS behavior still matters.

Use this sparingly. It can be useful for migration, but it can also hide external dependency ownership.

15. Service Without Selector

A Service can exist without a selector. This is useful when you want Kubernetes DNS and Service abstraction for something whose endpoints are managed manually or by another controller.

apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432

Then create EndpointSlice manually:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: external-db-1
  labels:
    kubernetes.io/service-name: external-db
addressType: IPv4
ports:
  - name: postgres
    protocol: TCP
    port: 5432
endpoints:
  - addresses:
      - 10.10.20.15

Use case:

exposing a database that runs outside the cluster through Kubernetes DNS;
gradual migration from VM service to Kubernetes;
abstracting private endpoints during platform transition.

Production warning:

Manual endpoints create manual correctness burden.

Health, failover, IP lifecycle, and drift must be handled somewhere.

16. Traffic Policy

Two fields often matter for production:

spec:
  internalTrafficPolicy: Cluster
  externalTrafficPolicy: Cluster

`internalTrafficPolicy`

For internal cluster traffic:

Cluster: route to any ready endpoint in cluster.
Local: route only to node-local endpoints.

Local can reduce cross-node traffic and preserve locality, but it can create blackholes if a node has no local endpoint.

`externalTrafficPolicy`

For external traffic entering through NodePort/LoadBalancer style paths:

Cluster: can route to any endpoint in cluster, may hide original client IP depending on implementation.
Local: sends only to local endpoints and can preserve client source IP, but nodes without local endpoints should not receive traffic.

Production trade-off:

Goal	Likely setting	Risk
Simpler load distribution	`Cluster`	possible extra hop/source IP behavior
Preserve client IP	`Local`	uneven load or traffic loss if node health is wrong
Reduce cross-zone data transfer	locality-aware patterns	capacity imbalance
Strict zone-local routing	topology-aware routing	needs careful scheduling and readiness

Do not change traffic policy without validating cloud load balancer health checks and node/backend registration behavior.
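The difference between the two policies can be modeled as a filter over ready endpoints. This is a simplified sketch, not the dataplane's actual code: the `Endpoint` type, the function name, and the node names are assumptions for illustration.

```python
from typing import List, NamedTuple


class Endpoint(NamedTuple):
    ip: str
    node: str
    ready: bool


def eligible_endpoints(endpoints: List[Endpoint], policy: str, receiving_node: str) -> List[str]:
    """Return the backend IPs a node may forward to under Cluster or Local traffic policy."""
    if policy not in ("Cluster", "Local"):
        raise ValueError(f"unknown traffic policy: {policy}")
    candidates = [e for e in endpoints if e.ready]
    if policy == "Local":
        # Local keeps only endpoints that run on the node that received the traffic.
        candidates = [e for e in candidates if e.node == receiving_node]
    return [e.ip for e in candidates]  # an empty list means traffic is dropped on this node


endpoints = [
    Endpoint("10.42.1.18", "node-a", True),
    Endpoint("10.42.3.24", "node-b", True),
    Endpoint("10.42.3.25", "node-b", False),
]

print(eligible_endpoints(endpoints, "Cluster", "node-c"))  # ['10.42.1.18', '10.42.3.24']
print(eligible_endpoints(endpoints, "Local", "node-b"))    # ['10.42.3.24']
print(eligible_endpoints(endpoints, "Local", "node-c"))    # []
```

The last call is the blackhole case: `node-c` has no local endpoint, so it must not receive traffic for this Service when the policy is `Local`.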

17. Named Ports and Protocol Discipline

Bad Service:

ports:
  - port: 80
    targetPort: 8080

Better:

ports:
  - name: http
    port: 80
    targetPort: http

Deployment:

ports:
  - name: http
    containerPort: 8080

Why this matters:

container ports can change without changing Service target logic;
probes can reference the same named port;
manifests are easier to review;
service mesh and observability tools often infer behavior from names;
protocol naming prevents accidental ambiguity.

Use consistent names:

http
https
grpc
metrics
postgres
redis

Avoid vague names like:

port1
main
service
api

18. Readiness and Service Membership

A Pod can exist but not receive Service traffic if it is not ready.

This is a critical contract:

readinessProbe:
  httpGet:
    path: /ready
    port: http
  periodSeconds: 5
  failureThreshold: 2

Service membership should reflect ability to serve traffic, not process existence.

Bad readiness:

/ready returns 200 as soon as process starts

Good readiness:

/ready returns 200 only after:
- app boot complete;
- required config loaded;
- database connection pool ready enough;
- local cache initialized if required;
- schema compatibility checked if necessary;
- dependency degradation policy decided.

EndpointSlice readiness is the bridge between application health and Service routing.
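One way to keep `/ready` honest is to derive the response from explicit checks instead of from process start. The sketch below shows only the decision logic. The check names are hypothetical, and a real service would compute them from its own state and expose the result through its HTTP framework.

```python
from typing import Dict, List, Tuple


def readiness(checks: Dict[str, bool]) -> Tuple[int, List[str]]:
    """Return the HTTP status for /ready and the names of the checks that still fail."""
    failing = sorted(name for name, passed in checks.items() if not passed)
    return (200 if not failing else 503), failing


# Hypothetical check results at two points in the Pod's life.
starting = {"boot_complete": True, "config_loaded": True, "db_pool_ready": False}
serving = {"boot_complete": True, "config_loaded": True, "db_pool_ready": True}

print(readiness(starting))  # (503, ['db_pool_ready'])
print(readiness(serving))   # (200, [])
```

Returning the failing check names in the response body makes a `ready=false` endpoint much faster to diagnose.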

19. Network Path: Same Node vs Cross Node

Packet path differs depending on where client and backend are scheduled.

Cross-node behavior depends heavily on CNI and cloud networking.

In EKS, Pod networking is commonly tied to VPC networking behavior through the AWS VPC CNI unless another CNI architecture is selected.

In AKS, Pod networking depends on the selected Azure CNI mode and cluster networking model.

The manifest can look the same while operational constraints differ:

pod IP source range;
subnet capacity;
route table behavior;
NAT behavior;
security group/NSG behavior;
cross-zone cost;
max pods per node;
observability tooling.

This is why cloud networking gets dedicated parts later in the series.

20. kube-proxy and Dataplane Reality

Historically, Kubernetes used kube-proxy to implement Service virtual IP routing on nodes. Depending on cluster configuration and platform, dataplane behavior may use iptables, IPVS, nftables, or eBPF-based alternatives.

You do not need to memorize every implementation to operate Kubernetes well.

You do need to understand the invariant:

A Service IP is virtual. Something on the node dataplane must translate traffic to actual backend endpoints.

Operational consequences:

a Service ClusterIP is not a normal interface IP on a pod;
packet capture may not show what you expect at first glance;
node-local rules matter;
CNI and service proxy implementation affect latency, scale, and debuggability;
upgrades of dataplane components are production changes.

Debugging often requires both Kubernetes and node-level visibility.
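As one concrete illustration of that translation, kube-proxy in iptables mode is commonly described as writing one rule per endpoint and letting each rule match with a probability chosen so that every backend ends up with an equal share. The sketch below reproduces only that arithmetic. It is not kube-proxy code, and other dataplanes (IPVS, nftables, eBPF) select backends differently.

```python
from fractions import Fraction
from typing import List


def rule_probabilities(endpoint_count: int) -> List[Fraction]:
    """Match probability for each rule in a chain that is evaluated top to bottom."""
    if endpoint_count < 1:
        raise ValueError("a Service needs at least one endpoint to route to")
    return [Fraction(1, endpoint_count - i) for i in range(endpoint_count)]


def effective_shares(probabilities: List[Fraction]) -> List[Fraction]:
    """Share of connections each rule receives after earlier rules had their chance."""
    shares, remaining = [], Fraction(1)
    for p in probabilities:
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


probs = rule_probabilities(3)
print([str(p) for p in probs])                    # ['1/3', '1/2', '1']
print([str(s) for s in effective_shares(probs)])  # ['1/3', '1/3', '1/3']
```

The rule probabilities differ from the resulting shares, which is why raw node rules are easy to misread during an incident.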

21. Namespace Boundary

Namespace changes DNS and policy scope. It does not create complete network isolation by itself.

A Service named billing-api in namespace payments gets:

billing-api.payments.svc.cluster.local

A different namespace can also have:

billing-api.reporting.svc.cluster.local

Naming convention should avoid ambiguity:

<bounded-context>-<capability>-api

Examples:

payments-ledger-api
orders-checkout-api
risk-scoring-api

Do not rely on short names across namespaces.

22. Label Strategy for Services

A Service selector should select exactly one logical workload group.

Recommended labels:

metadata:
  labels:
    app.kubernetes.io/name: inventory-api
    app.kubernetes.io/part-of: commerce-platform
    app.kubernetes.io/component: api
    app.kubernetes.io/version: "1.4.2"

Service selector:

selector:
  app.kubernetes.io/name: inventory-api
  app.kubernetes.io/component: api

Avoid selecting by version unless you intentionally want version-specific traffic routing.

Bad selector:

selector:
  app: api

That can accidentally capture unrelated Pods.

Production invariant:

Service selectors should be stable across rollout and specific enough to avoid accidental backend capture.

23. Multi-Port Service

A Service can expose multiple ports.

apiVersion: v1
kind: Service
metadata:
  name: user-api
spec:
  selector:
    app.kubernetes.io/name: user-api
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: metrics
      port: 9090
      targetPort: metrics

This is acceptable, but be deliberate.

If different consumers should access different surfaces, separate Services may be clearer:

user-api             -> application traffic
user-api-metrics     -> metrics scraping
user-api-admin       -> admin-only surface

This supports separate RBAC, NetworkPolicy, monitoring, and exposure decisions.

24. Production Service Manifest Baseline

A reasonable internal Service baseline:

apiVersion: v1
kind: Service
metadata:
  name: order-api
  namespace: commerce
  labels:
    app.kubernetes.io/name: order-api
    app.kubernetes.io/part-of: commerce-platform
    app.kubernetes.io/component: api
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: order-api
    app.kubernetes.io/component: api
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: http

Deployment excerpt:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-api
  namespace: commerce
spec:
  replicas: 4
  selector:
    matchLabels:
      app.kubernetes.io/name: order-api
      app.kubernetes.io/component: api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: order-api
        app.kubernetes.io/part-of: commerce-platform
        app.kubernetes.io/component: api
    spec:
      containers:
        - name: app
          image: registry.example.com/order-api:2.8.1
          ports:
            - name: http
              containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: http

This keeps selector, port, and readiness contracts aligned.

25. Debugging Cookbook

25.1 Service exists, but client gets connection refused

Check:

kubectl get svc -n commerce order-api -o wide
kubectl describe svc -n commerce order-api
kubectl get endpointslice -n commerce -l kubernetes.io/service-name=order-api -o wide
kubectl get pod -n commerce -l app.kubernetes.io/name=order-api -o wide

Likely causes:

wrong selector;
pods not ready;
wrong targetPort;
application not listening on expected interface/port;
network policy blocking traffic;
dataplane issue.

25.2 DNS fails

Check:

kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs deploy/coredns
kubectl run net-debug --rm -it --image=nicolaka/netshoot --restart=Never -- bash

Inside debug pod:

nslookup order-api.commerce.svc.cluster.local
curl -v http://order-api.commerce.svc.cluster.local

Likely causes:

wrong namespace;
Service missing;
CoreDNS unavailable;
DNS network policy blocked;
node-local DNS cache issue;
application DNS cache stale.

25.3 Works from one namespace, fails from another

Check:

kubectl get networkpolicy -A
kubectl auth can-i get svc -n commerce
kubectl run debug -n failing-namespace --rm -it --image=nicolaka/netshoot --restart=Never -- bash

Likely causes:

NetworkPolicy;
namespace-specific DNS search behavior;
wrong short Service name;
service mesh sidecar policy;
mTLS authorization;
egress restrictions.

25.4 LoadBalancer created, but traffic fails

Check:

kubectl describe svc -n edge public-api
kubectl get events -n edge --sort-by=.lastTimestamp
kubectl get endpointslice -n edge -l kubernetes.io/service-name=public-api

Cloud-side checks:

load balancer listeners;
backend health;
security groups / NSGs;
subnet tags or subnet assignment;
public/private scheme;
target type;
health check path/port;
route table and NAT behavior.

26. EKS-Specific Service Concerns

EKS production networking often fails because cluster and VPC assumptions diverge.

Key concerns:

Concern	Why it matters
Pod IP allocation	AWS VPC CNI consumes VPC/subnet IP capacity depending on configuration
Subnet tagging	Load balancer discovery may depend on subnet tags or explicit annotations
Security groups	Node/load balancer/backend communication must be allowed
Target type	ALB/NLB can target nodes or pod IPs depending on controller/configuration
Cross-zone behavior	Availability and data transfer cost can be affected
Fargate	Pods on Fargate need IP targets, because instance (node) targets are not available for them
Private cluster	Control plane, nodes, and load balancer paths must be intentionally designed

Do not copy an internet-facing annotation into an internal service. Make exposure explicit and reviewed.

Annotation patterns for an intentionally internal load balancer vary by controller and version. Keep cloud exposure rules in platform templates, not scattered application YAML.

27. AKS-Specific Service Concerns

AKS production networking has its own constraints.

Key concerns:

Concern	Why it matters
Azure CNI mode	Determines pod IP allocation and subnet planning
Azure Load Balancer behavior	`type=LoadBalancer` integrates with Azure LB and health probes
NSG and route tables	Traffic may fail outside Kubernetes visibility
Private cluster	API, nodes, DNS, and private endpoint behavior matter
Application Gateway integration	L7 exposure may bypass some in-cluster load-balancer assumptions
Managed identity	Controllers need Azure permissions to configure resources
Outbound type	Egress path can affect external calls and DNS/upstream access

As with EKS, never treat Service YAML as the whole truth. Cloud infrastructure is part of the network path.

28. Service Discovery and Microservice Contracts

A Service name is effectively part of the internal API contract.

Bad dependency config:

INVENTORY_URL: http://10.42.3.17:8080

Better:

INVENTORY_URL: http://inventory-api.commerce.svc.cluster.local

Even better in a larger platform:

INVENTORY_BASE_URL: http://inventory-api.commerce.svc.cluster.local
INVENTORY_TIMEOUT_MS: "500"
INVENTORY_RETRY_MAX_ATTEMPTS: "2"
INVENTORY_CIRCUIT_BREAKER_ENABLED: "true"

Service discovery solves naming. It does not solve distributed systems.

You still need:

timeouts;
retries with backoff;
circuit breakers;
idempotency;
bulkheads;
observability;
graceful degradation;
version compatibility.
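As a minimal sketch of the "retries with backoff" item, the helper below retries a call on connection or timeout errors, with exponential backoff and full jitter. The function name and defaults are assumptions. The default of two attempts mirrors `INVENTORY_RETRY_MAX_ATTEMPTS` above, and the `sleep` parameter exists only so the behavior can be exercised without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], max_attempts: int = 2,
                      base_delay_s: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on ConnectionError or TimeoutError with backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time up to base_delay_s * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")


# Hypothetical dependency call that fails once, then succeeds.
calls = []

def flaky_inventory_call() -> str:
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("connection reset")
    return "ok"

print(call_with_retries(flaky_inventory_call, sleep=lambda seconds: None))  # ok
```

Retrying like this is only safe for idempotent requests, which is why idempotency sits in the same list.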

29. Common Anti-Patterns

Anti-pattern 1: Service selector too broad

selector:
  app: api

This can capture unrelated Pods.

Anti-pattern 2: Using NodePort as public API

This bypasses intended edge controls and creates a large attack surface.

Anti-pattern 3: Relying only on `kubectl get svc`

A Service can exist with no endpoints.

Anti-pattern 4: No named ports

Numeric ports make drift harder to catch.

Anti-pattern 5: Readiness not connected to serviceability

If readiness returns 200 before the app can serve real traffic, Service routing becomes unsafe.

Anti-pattern 6: DNS lookup per request

This can overload DNS and create latency spikes.

Anti-pattern 7: Copying cloud annotations blindly

Annotations are infrastructure policy. They affect exposure, cost, security, and ownership.

Anti-pattern 8: Assuming namespace equals isolation

Namespace organizes objects. NetworkPolicy and identity enforce access.

30. Review Checklist

Before shipping a Service:

31. Practice Lab

Create a namespace:

kubectl create namespace svc-lab

Deploy an app:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-api
  namespace: svc-lab
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: echo-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: echo-api
    spec:
      containers:
        - name: app
          image: hashicorp/http-echo:1.0
          args:
            - "-text=hello-from-echo"
            - "-listen=:8080"
          ports:
            - name: http
              containerPort: 8080
          readinessProbe:
            tcpSocket:
              port: http
            periodSeconds: 5

Create Service:

apiVersion: v1
kind: Service
metadata:
  name: echo-api
  namespace: svc-lab
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: echo-api
  ports:
    - name: http
      port: 80
      targetPort: http

Test:

kubectl run client \
  -n svc-lab \
  --rm -it \
  --image=curlimages/curl:8.10.1 \
  --restart=Never \
  -- curl -v http://echo-api.svc-lab.svc.cluster.local

Break selector intentionally:

kubectl patch svc echo-api -n svc-lab -p '{"spec":{"selector":{"app":"wrong"}}}'

Observe:

kubectl get endpointslice -n svc-lab -l kubernetes.io/service-name=echo-api

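Repair selector. The default patch merges selector keys rather than replacing the map, so the wrong key has to be removed explicitly with null:

kubectl patch svc echo-api -n svc-lab -p '{"spec":{"selector":{"app":null,"app.kubernetes.io/name":"echo-api"}}}'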

The lesson: the Service object can remain healthy-looking while backend membership is broken.

32. Mental Model Summary

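Service discovery in Kubernetes is a chain of contracts:

Pod labels -> Service selector -> EndpointSlice (ready endpoints) -> DNS name -> node dataplane -> CNI and cloud network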

The highest-leverage debugging question is:

At which link in the chain does desired state stop matching runtime reality?

That question scales from local minikube to large EKS/AKS production clusters.
