Kubernetes Network Policies That Actually Work: From Default Deny to Microsegmentation
Problem
By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. There are no network boundaries. A compromised pod in a development namespace can reach the production database. A compromised web frontend can directly access the secrets store. A crypto miner deployed in any namespace can exfiltrate data to any external IP.
Network policies exist to fix this, but they have a reputation for being difficult:
- CNI-dependent behaviour. The Kubernetes NetworkPolicy API is a specification, not an implementation. Calico, Cilium, and cloud-native CNIs each implement it differently. Policies that work on Cilium may behave differently on Calico. Some CNIs support egress policies; some do not support egress to specific CIDR ranges reliably.
- DNS is the most common pitfall. Apply a default-deny egress policy and every pod in the namespace immediately fails DNS resolution, because DNS egress to CoreDNS in kube-system is now blocked. This is the #1 reason teams abandon network policies after the first attempt.
- Testing is manual and error-prone. There is no built-in tool to validate that policies enforce the expected connectivity. Teams apply policies and hope for the best, discovering gaps only when something breaks or during a security audit.
- Policy count grows linearly with services. A namespace with 20 microservices needs 20+ policies. Writing and maintaining these takes 30-60 minutes per service, and every new service dependency requires a policy update.
This article provides a complete, tested approach: start with default-deny, solve DNS immediately, build per-service policies, test systematically, and monitor for dropped traffic.
Target systems: Kubernetes 1.29+ with Calico, Cilium, or any CNI that supports the NetworkPolicy API. Specific notes are included where behaviour is CNI-dependent.
Threat Model
- Adversary: Attacker with code execution in a pod (RCE, supply chain compromise, compromised container image).
- Access level: Unprivileged process inside a container with network access to the cluster network.
- Objective: Lateral movement to other services (access database from compromised frontend), data exfiltration (send data to external attacker-controlled server), service disruption (DoS other pods), and internal reconnaissance (scan the cluster network for open ports and services).
- Blast radius: Without network policies, entire cluster. Every pod can reach every other pod and any external IP. With default-deny + per-service policies, the compromised pod can only reach its explicitly allowed dependencies. Exfiltration is blocked unless the policy explicitly allows external egress.
Configuration
Step 1: Default-Deny for Every Namespace
Apply default-deny ingress AND egress to every non-system namespace. This is the foundation: everything is blocked, then you allowlist only what is needed.
```yaml
# default-deny.yaml
# Apply to each namespace individually.
# Do NOT apply to kube-system (breaks cluster components).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}  # Matches all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```
```shell
# Apply to all non-system namespaces:
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' \
  | tr ' ' '\n' \
  | grep -v -E '^kube-'); do
  kubectl apply -n "$ns" -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
  echo "Applied default-deny to namespace: $ns"
done
```
After this step: Every pod in every non-system namespace has zero network access: no ingress, no egress, no DNS. This is intentional; the next step fixes DNS.
Step 2: Allow DNS (Critical - Do This Immediately)
```yaml
# allow-dns.yaml
# Apply to every namespace that has default-deny.
# Without this, no pod can resolve hostnames.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Allow DNS to CoreDNS in kube-system.
    # namespaceSelector and podSelector share ONE list entry here, so both
    # must match (AND); a second "-" would turn this into OR.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```
CNI-specific note: The namespaceSelector + podSelector combination is the most error-prone part of this policy:
- When both selectors sit in the same to entry, the NetworkPolicy API defines AND semantics: the peer must be a kube-dns pod in kube-system. Calico and Cilium implement this correctly.
- If podSelector is written as a separate to entry (its own dash), the entries are ORed, and the podSelector then matches pods in the policy's own namespace, not kube-system.
- Some cloud-native CNIs have mishandled the combined form. Test with your specific CNI.
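The AND/OR difference comes down to a single dash. A minimal side-by-side sketch, using the same labels as the allow-dns policy above:

```yaml
# AND: one "to" entry - the peer must be a kube-dns pod in kube-system.
- to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns

# OR: two "to" entries - the peer may be ANY pod in kube-system, or any
# pod labelled k8s-app=kube-dns in this policy's OWN namespace.
- to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    - podSelector:
        matchLabels:
          k8s-app: kube-dns
```

Diffing a policy against both forms before applying it is the cheapest way to catch this class of mistake.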
The safest approach (works on all CNIs):
```yaml
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
```
This allows egress to any pod in kube-system on port 53, slightly broader than targeting CoreDNS specifically, but guaranteed to work on every CNI.
```shell
# Apply DNS allow to all non-system namespaces:
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' \
  | tr ' ' '\n' \
  | grep -v -E '^kube-'); do
  kubectl apply -n "$ns" -f allow-dns.yaml
  echo "Applied allow-dns to namespace: $ns"
done
```
Verify DNS works:
```shell
kubectl run dns-test --image=busybox --restart=Never -n production \
  --command -- nslookup kubernetes.default
kubectl logs dns-test -n production   # Expected: name resolves successfully
kubectl delete pod dns-test -n production
```
Step 3: Per-Service Ingress Policies
For each service, create a policy that allows traffic only from its known callers.
Example: Frontend → API → Database
```yaml
# api-ingress.yaml
# Allow the API to receive traffic from the frontend on port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
          protocol: TCP
```
```yaml
# database-ingress.yaml
# Allow the database to receive traffic from the API on port 5432.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-allow-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - port: 5432
          protocol: TCP
```
```yaml
# frontend-ingress.yaml
# Allow the frontend to receive traffic from the ingress controller.
# The ingress controller is typically in a different namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-allow-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - port: 80
          protocol: TCP
```
Step 4: Egress Controls
For services that need external access (e.g., calling a payment API):
```yaml
# api-egress.yaml
# Allow the API to reach the payment processor and nothing else externally.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Egress
  egress:
    # DNS (already covered by allow-dns, but explicit here for clarity)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
    # Allow egress to the database
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - port: 5432
          protocol: TCP
    # Allow egress to payment processor (external IP)
    - to:
        - ipBlock:
            cidr: 198.51.100.0/24  # Payment processor IP range
      ports:
        - port: 443
          protocol: TCP
```
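For workloads that genuinely need broad internet access, a common variant (an assumption here, not part of the frontend/api/database example) is an ipBlock with except clauses: allow everything outbound on a port while carving out the private ranges, which blocks lateral movement to internal addresses while keeping the internet reachable:

```yaml
# Sketch: allow internet egress on 443 while excluding internal ranges.
# The except list covers the RFC 1918 blocks plus link-local; add your
# cluster's pod and service CIDRs if they fall outside these ranges.
egress:
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
            - 10.0.0.0/8
            - 172.16.0.0/12
            - 192.168.0.0/16
            - 169.254.0.0/16
    ports:
      - port: 443
        protocol: TCP
```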
Step 5: Testing Strategy
Manual testing with ephemeral debug pods:
```shell
# Test: frontend → api (should succeed)
kubectl run test-conn --image=busybox --restart=Never -n production \
  -l app=frontend --command -- wget -qO- --timeout=3 http://api:8080/health
# Expected: 200 response or health check output

# Test: api → frontend (should fail - not allowed)
kubectl run test-conn2 --image=busybox --restart=Never -n production \
  -l app=api --command -- wget -qO- --timeout=3 http://frontend:80/
# Expected: timeout (connection blocked by policy)

# Test: api → external internet (should fail - not in egress allowlist)
kubectl run test-egress --image=busybox --restart=Never -n production \
  -l app=api --command -- wget -qO- --timeout=3 http://example.com
# Expected: timeout (egress blocked)

# Clean up:
kubectl delete pod test-conn test-conn2 test-egress -n production --ignore-not-found
```
Automated policy testing script:
```shell
#!/bin/bash
# network-policy-test.sh
# Tests expected connectivity between services.
NAMESPACE="production"
PASS=0
FAIL=0

test_connection() {
  local from_label=$1
  local to_host=$2
  local to_port=$3
  local expected=$4  # "pass" or "fail"
  local desc=$5

  # Unique pod name per test; --rm -i removes the pod after the command exits.
  result=$(kubectl run "test-${RANDOM}-$(date +%s)" --image=busybox \
    --restart=Never -n "$NAMESPACE" -l "app=$from_label" --rm -i \
    --pod-running-timeout=30s \
    --command -- wget -qO- --timeout=3 "http://${to_host}:${to_port}/" 2>&1)

  if [ "$expected" = "pass" ] && ! echo "$result" | grep -q "timed out"; then
    echo "PASS: $desc"
    ((PASS++))
  elif [ "$expected" = "fail" ] && echo "$result" | grep -q "timed out"; then
    echo "PASS: $desc (correctly blocked)"
    ((PASS++))
  else
    echo "FAIL: $desc (expected=$expected)"
    ((FAIL++))
  fi
}

echo "=== Network Policy Tests ==="
test_connection "frontend" "api" "8080" "pass" "frontend → api:8080"
test_connection "api" "database" "5432" "pass" "api → database:5432"
test_connection "api" "frontend" "80" "fail" "api → frontend:80 (should be blocked)"
test_connection "database" "api" "8080" "fail" "database → api:8080 (should be blocked)"
test_connection "frontend" "database" "5432" "fail" "frontend → database:5432 (should be blocked)"

echo ""
echo "Results: $PASS passed, $FAIL failed"
exit $FAIL
```
Step 6: Monitoring Dropped Traffic
Cilium with Hubble:
```shell
# Install Hubble CLI
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --remote-name-all "https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/hubble-linux-amd64.tar.gz"
tar xzf hubble-linux-amd64.tar.gz
sudo mv hubble /usr/local/bin/

# View dropped traffic in real time:
hubble observe --verdict DROPPED --namespace production

# View dropped traffic for a specific pod:
hubble observe --verdict DROPPED --to-pod production/api-7d8f9b6c4d-x2k4l
```
Prometheus metrics for policy drops:
```yaml
# Cilium provides these metrics out of the box:
#   cilium_drop_count_total     - total dropped packets by reason
#   cilium_policy_verdict_total - policy decisions (forwarded, dropped, denied)

# Alert on new drop sources (pods trying to reach blocked destinations).
# Note: the exact "reason" label value varies by Cilium version; check your
# /metrics output (recent versions use "Policy denied").
groups:
  - name: network-policy-monitoring
    rules:
      - alert: NetworkPolicyDrop
        expr: rate(cilium_drop_count_total{reason="Policy denied"}[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Network policy dropping traffic (reason: {{ $labels.reason }})"
          description: "Packets being dropped by network policy. Check if a new service needs a policy update."
```
Expected Behaviour
After applying all network policies:
- Pods can only communicate with explicitly allowed destinations
- DNS resolution works in every namespace (allow-dns policy)
- Ingress controller can reach frontend pods
- Frontend can reach API pods; API can reach database pods; no other paths exist
- External egress is blocked except for explicitly allowlisted IPs
- hubble observe --verdict DROPPED shows blocked traffic
- network-policy-test.sh returns all-pass
- New services deployed without a network policy have zero connectivity (default-deny catches them)
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Default-deny per namespace | Every new service needs a policy before it works | Developer friction: “it works locally but not in staging” | Make network policy a required part of the service deployment template. Provide a policy generator tool. |
| Per-service policies | 30-60 minutes per service to write and test | Policy maintenance grows linearly with service count | Use Kyverno to generate baseline policies automatically from labels. |
| Egress restrictions | Blocks unexpected outbound connections | Legitimate external API calls blocked until allowlisted | Maintain an egress allowlist per namespace. Alert on new blocked egress (may indicate a missing policy, not an attack). |
| CIDR-based egress to external IPs | IP ranges can change for external services | Policy breaks if external service changes IP | Use DNS-based egress policies (Cilium CiliumNetworkPolicy supports FQDN rules; standard NetworkPolicy does not). |
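The FQDN mitigation in the last row can be sketched with Cilium's CRD. This is a sketch under the assumption that Cilium is the CNI; the hostname api.payments.example is a placeholder, and toFQDNs requires a companion DNS rule so the proxy can learn which IPs the name resolves to:

```yaml
# Sketch: CiliumNetworkPolicy with FQDN-based egress (Cilium only).
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-egress-fqdn
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api
  egress:
    # DNS rule: lets Cilium's DNS proxy observe lookups to resolve FQDNs.
    - toEndpoints:
        - matchLabels:
            k8s-app: kube-dns
            io.kubernetes.pod.namespace: kube-system
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    # FQDN rule: survives IP changes at the external provider.
    - toFQDNs:
        - matchName: api.payments.example
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```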
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| DNS egress not allowed | Every pod in namespace fails DNS resolution | Immediate: all services return DNS errors; application logs show "name resolution failed" | Apply the allow-dns policy to the namespace. This is the #1 issue; always apply DNS allow immediately after default-deny. |
| Policy selector doesn't match any pods | Policy has no effect; traffic that should be blocked is allowed | kubectl get netpol -o yaml shows the policy but hubble observe shows traffic flowing; connectivity test shows unexpected success | Check label selectors against actual pod labels: kubectl get pods -l app=api -n production. Fix the selector to match. |
| Health check probes blocked | Pods show as unhealthy; Kubernetes restarts them continuously | Pod restart count increases; readiness probe failures in kubectl describe pod | Add ingress allow for kubelet health checks. Source IP depends on CNI; check your CNI documentation for the health check source CIDR. |
| Egress to external API blocked | Application feature fails (payment processing, email sending, webhook) | Application logs show connection timeout to external service; hubble observe --verdict DROPPED shows the blocked connection | Add the external IP/CIDR to the service's egress policy. Use FQDN-based policies on Cilium for services with dynamic IPs. |
| Policy applied to wrong namespace | Wrong namespace loses connectivity | Connectivity tests in the wrong namespace fail; kubectl get netpol -n <namespace> shows unexpected policies | Delete the policy from the wrong namespace. Re-apply to the correct one. |
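For the health-probe failure mode, the fix takes the shape of an ingress allow from the node network. A sketch under the assumption that probes arrive from the node CIDR; 10.0.0.0/8 and port 8080 are placeholders, and on many CNIs kubelet probes bypass NetworkPolicy entirely, so verify against your CNI before relying on this:

```yaml
# Sketch: allow kubelet readiness/liveness probes from the node network.
# 10.0.0.0/8 is a placeholder; substitute your CNI's documented probe source CIDR.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-health-checks
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/8
      ports:
        - port: 8080  # probe port; adjust per workload
          protocol: TCP
```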
When to Consider a Managed Alternative
Transition point: Writing per-service policies for 50+ microservices takes 30-60 minutes each and must be updated with every new service dependency. Maintaining 50+ policies across 2+ clusters with different CNIs creates drift. At this scale, network policy lifecycle management becomes a dedicated task.
Recommended providers:
- Isovalent (#54) Cilium Enterprise: Policy lifecycle management, policy editor UI, network flow visualization, policy recommendation engine that suggests policies based on observed traffic patterns. Multi-cluster policy distribution.
- Sysdig (#122): Network policy visualization and gap analysis. Shows which pods have no network policy and which policies have no matching pods. Identifies over-permissive policies.
What you still control: The policies themselves (what each service can reach) remain your decision. Managed tools help you create, visualize, and verify policies, but the security intent is yours.
Premium content pack: Kyverno policy pack that enforces “every namespace must have a default-deny policy” and “every deployment must have a corresponding network policy.” Includes policy templates for common service architectures (frontend-api-db, worker-queue-db, ingress-service).
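As an illustration of the Kyverno approach, a generate rule of roughly this shape can stamp a default-deny policy into every new non-system namespace. This is a sketch: the policy name is a placeholder, and the rule syntax should be checked against your Kyverno version:

```yaml
# Sketch: Kyverno ClusterPolicy that generates default-deny in new namespaces.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-deny  # placeholder name
spec:
  rules:
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - "kube-*"  # skip system namespaces
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        synchronize: true  # re-create the policy if someone deletes it
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```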