DNS Security for Production Infrastructure: DNSSEC, CAA Records, and Internal Resolution
Problem
DNS is the most critical single point of failure in any infrastructure, and the least hardened layer for most teams. Every service depends on DNS resolution. A DNS compromise is silent and total: traffic is redirected to attacker-controlled infrastructure, and no firewall, WAF, or TLS configuration protects against it, because the client believes it is connecting to the correct destination.
The specific gaps in most production DNS setups:
- No DNSSEC. DNS responses are unsigned. Any attacker who can intercept or poison DNS traffic can forge responses, redirecting your users or services to arbitrary IP addresses. This is not theoretical: BGP hijacking combined with DNS poisoning is a documented attack pattern used against cryptocurrency exchanges and financial institutions.
- No CAA records. Without Certificate Authority Authorization records, any CA in the world can issue a valid TLS certificate for your domain. An attacker who compromises or social-engineers a CA can obtain a legitimate certificate for your domain and use it for man-in-the-middle attacks that pass TLS validation.
- Plaintext internal resolution. Hosts resolve DNS over plaintext UDP. An attacker on the network can observe every domain your hosts query (leaking your internal service topology, vendor relationships, and activity patterns) and poison responses.
- Default CoreDNS in Kubernetes. CoreDNS runs with its default configuration: no query logging for security analysis, no rate limiting to prevent DNS amplification from compromised pods, and no network policy restricting which pods can access DNS.
Target systems: Any Linux host or Kubernetes cluster. Specific configurations for Unbound (host-level resolver), CoreDNS (Kubernetes), and integration with managed DNS providers.
Threat Model
- Adversary: Network-adjacent attacker (DNS cache poisoning), compromised registrar account (domain hijacking), rogue or compromised Certificate Authority (unauthorized certificate issuance), or compromised pod in Kubernetes (DNS-based reconnaissance or data exfiltration via DNS tunnelling).
- Access level: Network access for poisoning (same subnet or upstream path). Credential access for registrar compromise. No access needed for CA abuse without CAA records.
- Objective: Redirect traffic to attacker-controlled infrastructure (phishing, credential theft), issue fraudulent TLS certificates (man-in-the-middle), exfiltrate data through DNS queries (DNS tunnelling), or map internal infrastructure through DNS enumeration.
- Blast radius: Total. A DNS compromise affects every service on the domain. Combined with a fraudulent certificate, the attacker has a valid TLS-encrypted impersonation of your infrastructure that passes all client-side validation.
Configuration
DNSSEC: Signing Your Zone
DNSSEC adds cryptographic signatures to DNS responses, allowing resolvers to verify that the response came from the authoritative server and was not modified in transit.
If you use a managed DNS provider (recommended for most teams), DNSSEC is typically a single toggle. Once it is enabled, verify it from the command line:
# Verify DNSSEC is active for your domain
dig +dnssec example.com A
# Look for the 'ad' (authenticated data) flag in the response:
# ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
# ^^
# 'ad' means the resolver validated the DNSSEC signatures.
# If 'ad' is missing, DNSSEC is not configured or not validating.
# Use delv for detailed DNSSEC validation:
delv @1.1.1.1 example.com A
# Expected output for a properly signed zone:
# ; fully validated
# example.com. 300 IN A 93.184.216.34
# example.com. 300 IN RRSIG A 13 2 300 ...
# If validation fails:
# ;; resolution failed: SERVFAIL
# This means the DNSSEC chain is broken.
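The 'ad' flag check above can also be scripted without dig. The following is a minimal sketch, assuming a validating resolver is reachable over UDP port 53 (the default of 127.0.0.1 is an assumption, as are the function names). It sets the AD bit in the query, which per RFC 6840 asks the resolver to report its validation result, and then inspects the AD bit in the response header.

```python
import socket
import struct

AD_BIT = 0x0020  # "authenticated data" bit in the DNS header flags
RD_BIT = 0x0100  # "recursion desired" bit


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (default type A, class IN) with RD and AD set."""
    header = struct.pack("!HHHHHH", query_id, RD_BIT | AD_BIT, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def response_is_authenticated(response: bytes) -> bool:
    """Return True if the AD bit is set in a raw DNS response."""
    if len(response) < 12:
        raise ValueError("truncated DNS message")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & AD_BIT)


def check_ad(name: str, resolver: str = "127.0.0.1", timeout: float = 3.0) -> bool:
    """Query the resolver over UDP and report whether the answer was validated."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver, 53))
        response, _ = sock.recvfrom(4096)
    return response_is_authenticated(response)
```

Calling check_ad("example.com") against a validating resolver should return True for a correctly signed zone. A False result means the zone is unsigned or the resolver does not validate.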
DNSSEC at the registrar level:
After enabling DNSSEC at your DNS provider, you must add the DS (Delegation Signer) record at your registrar. The DS record links your zone’s signing key to the parent zone (e.g., .com), completing the chain of trust.
# Example DS record (your DNS provider will give you these values):
example.com. IN DS 12345 13 2 AABBCCDD...
# Fields: key-tag, algorithm, digest-type, digest
# Algorithm 13 = ECDSA P-256 (recommended)
# Digest type 2 = SHA-256 (recommended)
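Before pasting a DS record into the registrar form, it can help to sanity-check its fields. The sketch below parses a DS record in the presentation format shown above and flags weak or malformed values. The lookup tables cover only a subset of the IANA-registered numbers, and the helper names are illustrative.

```python
from dataclasses import dataclass

# Subset of IANA-registered values; extend as needed.
ALGORITHMS = {8: "RSASHA256", 13: "ECDSAP256SHA256", 14: "ECDSAP384SHA384", 15: "ED25519"}
DIGEST_HEX_LENGTH = {1: 40, 2: 64, 4: 96}  # SHA-1, SHA-256, SHA-384


@dataclass
class DSRecord:
    owner: str
    key_tag: int
    algorithm: int
    digest_type: int
    digest: str


def parse_ds(line: str) -> DSRecord:
    """Parse 'owner [ttl] IN DS key-tag algorithm digest-type digest'."""
    fields = line.split()
    idx = fields.index("DS")
    key_tag, algorithm, digest_type = (int(f) for f in fields[idx + 1:idx + 4])
    digest = "".join(fields[idx + 4:]).upper()  # digest may be split by spaces
    return DSRecord(fields[0], key_tag, algorithm, digest_type, digest)


def ds_warnings(ds: DSRecord) -> list:
    """Return human-readable warnings for a parsed DS record."""
    warnings = []
    if ds.algorithm not in ALGORITHMS:
        warnings.append(f"unrecognised algorithm {ds.algorithm}")
    if ds.digest_type == 1:
        warnings.append("SHA-1 digest; prefer digest type 2 (SHA-256)")
    expected = DIGEST_HEX_LENGTH.get(ds.digest_type)
    if expected is None:
        warnings.append(f"unrecognised digest type {ds.digest_type}")
    elif len(ds.digest) != expected:
        warnings.append(f"digest has {len(ds.digest)} hex chars, expected {expected}")
    return warnings
```

An empty list from ds_warnings means the record is at least well-formed. It does not prove the digest matches your zone's key, which only a validating lookup can confirm.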
Monitoring DNSSEC health:
# Check signature expiry (RRSIG records have expiry timestamps)
dig +dnssec +short example.com RRSIG | awk '{print $5}'
# Output: 20260501120000 (format: YYYYMMDDHHmmss)
# Signatures must be refreshed before expiry.
# Managed DNS providers handle this automatically.
# Monitor with Prometheus blackbox exporter:
# Add a DNS probe that queries a validating resolver.
# A broken DNSSEC chain returns SERVFAIL, which fails the probe.
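The expiry timestamp printed by the awk command is easier to alert on once converted to "days remaining". A minimal sketch, assuming the YYYYMMDDHHmmss value is in UTC as RRSIG timestamps are (the function names and the `now` parameter are illustrative):

```python
from datetime import datetime, timezone
from typing import Optional


def rrsig_expiry(timestamp: str) -> datetime:
    """Convert an RRSIG expiration field (YYYYMMDDHHmmss, UTC) to a datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def days_until_expiry(timestamp: str, now: Optional[datetime] = None) -> float:
    """Days between `now` (default: current UTC time) and the signature expiry."""
    now = now or datetime.now(timezone.utc)
    return (rrsig_expiry(timestamp) - now).total_seconds() / 86400
```

A cron job could feed each timestamp into days_until_expiry and alert when the result drops below a threshold of your choosing. A negative value means the signature has already expired.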
CAA Records: Restricting Certificate Issuance
CAA (Certificate Authority Authorization) records specify which Certificate Authorities are permitted to issue certificates for your domain. Without CAA records, any of the ~150 publicly trusted CAs can issue a certificate.
# Add CAA records to your DNS zone.
# These restrict certificate issuance to Let's Encrypt only.
# Allow Let's Encrypt to issue standard certificates:
example.com. IN CAA 0 issue "letsencrypt.org"
# Allow Let's Encrypt to issue wildcard certificates:
example.com. IN CAA 0 issuewild "letsencrypt.org"
# Send violation reports to your security team:
example.com. IN CAA 0 iodef "mailto:security@example.com"
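Multiple CAs: If you use more than one CA (e.g., Let’s Encrypt for automation and ZeroSSL as a backup), add one issue record per CA. ZeroSSL certificates are issued by Sectigo, so its CAA identifier is sectigo.com: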
example.com. IN CAA 0 issue "letsencrypt.org"
example.com. IN CAA 0 issue "sectigo.com"
example.com. IN CAA 0 issuewild "letsencrypt.org"
example.com. IN CAA 0 iodef "mailto:security@example.com"
Verify CAA records:
dig CAA example.com
# Expected output:
# example.com. 300 IN CAA 0 issue "letsencrypt.org"
# example.com. 300 IN CAA 0 issuewild "letsencrypt.org"
# example.com. 300 IN CAA 0 iodef "mailto:security@example.com"
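Common mistake: Leaving wildcard policy implicit. If a zone has issue records but no issuewild record, the issue records also govern wildcard certificates (RFC 8659), so every CA listed under issue can issue wildcards. Set issuewild explicitly to make the wildcard policy visible and to narrow it where needed; use issuewild ";" to forbid wildcard issuance entirely.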
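A small lint over the `dig CAA` output makes a missing tag visible. The following is a sketch under the assumption that records arrive one per line in the presentation format shown above; the function names are illustrative.

```python
import shlex
from collections import defaultdict


def parse_caa(dig_output: str) -> dict:
    """Group CAA values by tag from lines like:
    example.com. 300 IN CAA 0 issue "letsencrypt.org"
    """
    tags = defaultdict(list)
    for line in dig_output.splitlines():
        line = line.strip()
        if not line or line.startswith((";", "#")):
            continue  # skip dig comments and blank lines
        fields = shlex.split(line)  # strips the quotes around the value
        if "CAA" not in fields:
            continue
        rdata = fields[fields.index("CAA") + 1:]
        if len(rdata) < 3:
            continue  # malformed record
        _flags, tag, value = rdata[:3]
        tags[tag.lower()].append(value)
    return dict(tags)


def lint_caa(tags: dict) -> list:
    """Report which of the three tags used in this guide are absent."""
    return [f"no {tag} record" for tag in ("issue", "issuewild", "iodef") if tag not in tags]
```

Piping `dig +noall +answer CAA example.com` into parse_caa and then lint_caa should return an empty list for the record sets shown in this section.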
Certificate Transparency Monitoring
CAA records only stop CAs that check and honor them. Monitor Certificate Transparency logs as well, so that any certificate issued for your domain that should not exist is detected:
# Monitor Certificate Transparency logs for your domain.
# crt.sh provides a free web interface and API:
curl -s "https://crt.sh/?q=%25.example.com&output=json" | \
python3 -c 'import sys, json
for c in json.load(sys.stdin):  # %25 above is the URL-encoded % wildcard
    print(c["common_name"], c["issuer_name"], c["not_before"])'
# For automated monitoring, use a CT log monitor:
# - certspotter (free for 1 domain): https://sslmate.com/certspotter/
# - Prometheus ct_exporter for metric-based alerting
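For a first pass at automated review, the crt.sh JSON can be filtered against the issuers you expect. This is a sketch, assuming each entry carries an issuer_name field as in the query above; the allowlist substrings are placeholders to replace with the issuer names your CAs actually use.

```python
def unexpected_certificates(entries: list, allowed_issuer_substrings: list) -> list:
    """Return CT entries whose issuer_name matches none of the allowed substrings."""
    allowed = [s.lower() for s in allowed_issuer_substrings]
    flagged = []
    for entry in entries:
        issuer = entry.get("issuer_name", "").lower()
        if not any(s in issuer for s in allowed):
            flagged.append(entry)
    return flagged


def summarize(entries: list) -> list:
    """One line per flagged certificate, suitable for an alert message."""
    return [
        f'{e.get("common_name", "?")} issued by {e.get("issuer_name", "?")} '
        f'on {e.get("not_before", "?")}'
        for e in entries
    ]
```

Substring matching on the issuer name is deliberately loose. A stricter variant could compare against the issuer_ca_id values you have already reviewed.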
Internal DNS Resolver Hardening with Unbound
For Linux hosts that need a local caching resolver with DNSSEC validation and DNS-over-TLS to upstream, Unbound is the recommended choice.
Install and configure:
# Install Unbound
# Debian/Ubuntu:
sudo apt install unbound
# RHEL/Rocky:
sudo dnf install unbound
Create /etc/unbound/unbound.conf.d/hardening.conf:
# /etc/unbound/unbound.conf.d/hardening.conf
# Hardened Unbound configuration for local caching resolver
# with DNSSEC validation and DNS-over-TLS upstream.
server:
# Listen on localhost only. This is a local resolver, not authoritative.
interface: 127.0.0.1
interface: ::1
port: 53
# Access control: only localhost can query.
access-control: 127.0.0.0/8 allow
access-control: ::1/128 allow
access-control: 0.0.0.0/0 refuse
access-control: ::/0 refuse
# DNSSEC validation
auto-trust-anchor-file: "/var/lib/unbound/root.key"
val-clean-additional: yes
# Harden against known DNS attacks
harden-glue: yes
harden-dnssec-stripped: yes
harden-below-nxdomain: yes
harden-referral-path: yes
# Hide identity and version
hide-identity: yes
hide-version: yes
# Rate limiting to prevent abuse
ratelimit: 1000
# Logging for security analysis
verbosity: 1
log-queries: yes
log-replies: yes
log-tag-queryreply: yes
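# Unbound logs to syslog by default, which overrides logfile; turn it off.
use-syslog: no
logfile: /var/log/unbound/unbound.log
# Never send queries to localhost addresses
do-not-query-localhost: yes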
# Prefetch popular domains before TTL expires
prefetch: yes
prefetch-key: yes
# Use DNS-over-TLS for upstream queries
tls-cert-bundle: /etc/ssl/certs/ca-certificates.crt
forward-zone:
name: "."
# Cloudflare DNS over TLS
forward-addr: 1.1.1.1@853#cloudflare-dns.com
forward-addr: 1.0.0.1@853#cloudflare-dns.com
# Quad9 DNS over TLS (malware blocking)
forward-addr: 9.9.9.9@853#dns.quad9.net
forward-addr: 149.112.112.112@853#dns.quad9.net
forward-tls-upstream: yes
# Enable and start Unbound
sudo systemctl enable --now unbound
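# Configure the system to use Unbound as its resolver
# On systemd-resolved systems, /etc/resolv.conf is usually a symlink into /run.
# Remove the symlink first so the new file persists across reboots:
sudo systemctl disable --now systemd-resolved
sudo rm -f /etc/resolv.conf
echo "nameserver 127.0.0.1" | sudo tee /etc/resolv.conf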
# Verify DNS-over-TLS is working (no plaintext DNS should leave the host)
# In another terminal, capture DNS traffic:
sudo tcpdump -i eth0 port 53 -c 5
# Expected: no packets captured (all DNS goes over TLS on port 853)
sudo tcpdump -i eth0 port 853 -c 5
# Expected: TLS-encrypted packets to 1.1.1.1 and 9.9.9.9
# Verify DNSSEC validation
dig @127.0.0.1 +dnssec example.com
# Look for 'ad' flag in response
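A forwarder that is missing the @853#name suffix is still sent over TLS when forward-tls-upstream is on, but to the wrong port and without server name verification. The sketch below lints a config file for that mistake. It assumes the forward-addr syntax used above and only inspects text; it does not replace `unbound-checkconf`. The function names are illustrative.

```python
import re

FORWARD_ADDR = re.compile(r"^forward-addr:\s*(\S+)")
# address@853#auth-name, e.g. 1.1.1.1@853#cloudflare-dns.com
PINNED = re.compile(r"^[^@#]+@853#[^@#]+$")
TLS_UPSTREAM = re.compile(r"^\s*forward-tls-upstream:\s*yes\b", re.MULTILINE)


def unpinned_forwarders(config_text: str) -> list:
    """Return forward-addr values lacking port 853 or a TLS authentication name."""
    bad = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # whole-line comment
        match = FORWARD_ADDR.match(line)
        if match and not PINNED.match(match.group(1)):
            bad.append(match.group(1))
    return bad


def tls_upstream_enabled(config_text: str) -> bool:
    """True if an uncommented 'forward-tls-upstream: yes' line is present."""
    return bool(TLS_UPSTREAM.search(config_text))
```

Running both checks against /etc/unbound/unbound.conf.d/hardening.conf in CI catches the case where someone adds a plain fallback forwarder and forgets to remove it.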
Kubernetes CoreDNS Hardening
CoreDNS in Kubernetes runs with minimal security configuration by default. Harden it with query logging, a cap on concurrent upstream queries, and a network policy.
CoreDNS ConfigMap with security additions:
kubectl edit configmap coredns -n kube-system
# Add logging and a cap on concurrent upstream queries to the Corefile:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    # Query logging for security analysis.
    # The class filter below logs only denials (NXDOMAIN, NODATA) and errors,
    # including client IP, query name, and response code.
    # Use "class all" to log every query.
    log . {
        class denial error
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
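Once query logs are flowing, a simple heuristic can surface possible DNS tunnelling from a compromised pod: unusually long labels or names. The sketch below is illustrative only. It assumes the log plugin's default line layout (client:port, query ID, quoted query, response code), and the length thresholds are arbitrary starting points rather than recommended values.

```python
import re

# Assumed default layout of a CoreDNS log line, for example:
# [INFO] 10.244.1.7:53211 - 4242 "A IN api.example.com. udp 45 false 512" NXDOMAIN qr,rd,ra 120 0.012s
LOG_LINE = re.compile(
    r'(?P<client>\S+):(?P<port>\d+) - (?P<id>\d+) '
    r'"(?P<qtype>\S+) (?P<qclass>\S+) (?P<name>\S+) (?P<proto>\S+) [^"]*" '
    r'(?P<rcode>\S+)'
)


def suspicious_queries(lines, max_label: int = 40, max_name: int = 100) -> list:
    """Return (client, name, rcode) for queries with very long labels or names."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a query log line
        name = match.group("name").rstrip(".")
        longest_label = max(len(label) for label in name.split("."))
        if longest_label > max_label or len(name) > max_name:
            hits.append((match.group("client"), name, match.group("rcode")))
    return hits
```

Feeding it the output of `kubectl logs -n kube-system -l k8s-app=kube-dns` gives a quick list of pods worth a closer look. Expect false positives from CDN and telemetry hostnames.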
Network policy for CoreDNS:
# coredns-network-policy.yaml
# Restrict which pods can access CoreDNS.
# By default, all pods can query DNS. This policy restricts to
# pods in namespaces with the label 'dns-access: allowed'.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: coredns-allow
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-dns
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              dns-access: "allowed"
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
# Label namespaces that should have DNS access:
kubectl label namespace default dns-access=allowed
kubectl label namespace production dns-access=allowed
kubectl label namespace kube-system dns-access=allowed
# Apply the network policy:
kubectl apply -f coredns-network-policy.yaml
Warning: Restricting DNS access can break pods in unlabelled namespaces. Apply the namespace labels before the network policy, and test in a non-production cluster first.
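To find the namespaces that would lose DNS before the policy goes live, the namespace list can be checked for the label. A minimal sketch, assuming the JSON comes from `kubectl get namespaces -o json` and the label key and value match the policy above (the function name is illustrative):

```python
def namespaces_missing_label(namespace_list: dict,
                             key: str = "dns-access",
                             value: str = "allowed") -> list:
    """Return the names of namespaces that do not carry key=value."""
    missing = []
    for item in namespace_list.get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels") or {}  # labels may be absent or null
        if labels.get(key) != value:
            missing.append(metadata.get("name", "<unnamed>"))
    return sorted(missing)
```

Load the kubectl output with json.load and pass it in. An empty result means every namespace is labelled and the policy can be applied without cutting off DNS.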
DNS Monitoring and Anomaly Detection
# Prometheus blackbox exporter DNS probe configuration.
# Add to blackbox-exporter ConfigMap:
modules:
  dns_external:
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR
      validate_answer_rrs:
        fail_if_not_matches_regexp:
          - ".*93\\.184\\.216\\.34.*"  # Expected IP address
  dns_dnssec:
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "A"
      # The DNS prober has no AD-flag check. Point this module at a
      # validating resolver: a broken DNSSEC chain returns SERVFAIL,
      # which fails the valid_rcodes check below.
      valid_rcodes:
        - NOERROR
# Prometheus alert rules for DNS security
groups:
  - name: dns-security
    rules:
      - alert: DNSSECValidationFailed
        # Assumes the scrape config attaches a "module" label to probe metrics.
        expr: probe_success{module="dns_dnssec"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "DNSSEC validation failed for {{ $labels.instance }}"
          runbook: "Check DNSSEC DS record at registrar. Verify zone signatures have not expired."
      - alert: DNSResolutionChanged
        # probe_dns_answer_rrs counts answer records, so this fires when the
        # number of records changes, not when an address is swapped one-for-one.
        expr: changes(probe_dns_answer_rrs{module="dns_external"}[1h]) > 0
        labels:
          severity: warning
        annotations:
          summary: "DNS resolution for {{ $labels.instance }} has changed"
          runbook: "Verify the change is expected. Check for DNS hijacking."
Expected Behaviour
After applying all DNS hardening:
- dig +dnssec example.com returns responses with the ad (authenticated data) flag
- dig CAA example.com returns your configured CAA records
- Attempting to issue a certificate from a non-allowlisted CA fails with a CAA denial
- tcpdump port 53 on a hardened host shows no plaintext DNS traffic (all queries go over TLS on port 853)
- CoreDNS query logs capture client IP and query details for security analysis
- Certificate Transparency monitoring alerts within 1 hour of any new certificate issuance for your domain
- Prometheus DNS probes verify resolution returns expected IPs and DNSSEC validates
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| DNSSEC | 1-5ms added latency for first resolution (cached after) | DNSSEC misconfiguration causes total DNS failure for the zone. SERVFAIL for all queries | Monitor DNSSEC health. Use a managed DNS provider that handles signing and key rotation automatically. |
| CAA records | None | Forgetting to add a new CA before switching certificate providers causes issuance failure (cert-manager fails to renew) | Add the new CA to CAA records before migrating. Include in certificate provider change checklist. |
| DNS-over-TLS (Unbound) | 10-30ms added latency for cache misses | Dependency on upstream DoT provider availability. If both Cloudflare and Quad9 DoT are unreachable, all DNS fails. | Configure multiple upstream providers. Consider adding a non-DoT fallback for availability (trades privacy for availability during upstream outage). |
| CoreDNS query logging | Disk I/O for high-volume clusters (>100K queries/day) | Log volume can fill node storage on busy clusters | Log only errors and denials (class denial error). Ship to external storage. |
| CoreDNS network policy | Pods in unlabelled namespaces lose DNS resolution | Immediate breakage for any namespace without the DNS label | Label all namespaces before applying the policy. Test in staging first. |
| Certificate Transparency monitoring | None (passive monitoring) | Alert fatigue if you issue certificates frequently (cert-manager renewals trigger alerts) | Filter known cert-manager issuances from CT alerts. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| DNSSEC signature expired | All DNS resolution for the zone fails (SERVFAIL) | delv shows expired signatures; Prometheus probe fires; every service reports DNS errors | Emergency: disable DNSSEC at registrar (propagation: 15-60 minutes). Long-term: fix signing configuration at DNS provider. |
| DS record mismatch after key rotation | Zone becomes unresolvable (broken chain of trust) | dig +dnssec shows no ad flag; delv shows validation failure | Update DS record at registrar to match the new key. Wait for propagation (up to 48 hours for TTL expiry of old DS). |
| CAA blocks legitimate cert issuance | cert-manager fails to issue or renew certificates | cert-manager logs: CAA record for example.com prevents issuance; Let’s Encrypt error in ACME logs | Add the CA’s domain to CAA records. Wait for DNS propagation (TTL). Retry issuance. |
| DNS-over-TLS upstream unreachable | All DNS resolution fails on the host | Unbound logs show upstream timeout; every service fails to resolve | Restart Unbound (flushes connections). If upstream is down, temporarily add a non-DoT fallback in forward-zone. |
| CoreDNS network policy blocks DNS | Pods in unlabelled namespace fail all DNS resolution | Application logs show DNS timeout; nslookup from pod fails | Label the namespace: kubectl label namespace <name> dns-access=allowed. The policy takes effect immediately. |
| CT monitor detects unauthorized cert | Certificate issued by unexpected CA | CT monitoring alert fires with unknown issuer | Investigate: is this a legitimate cert (new team, new provider) or unauthorized? If unauthorized: revoke the certificate, rotate affected credentials, investigate CA compromise vector. |
When to Consider a Managed Alternative
DNS is the easiest first managed service for most teams. Consider switching when:
- You need DNSSEC but do not have the expertise for key rotation (ZSK every 90 days, KSK annually). Managed providers handle this automatically.
- Self-managed DNS availability does not meet your uptime requirements. A 5-minute DNS outage affects 100% of your services.
- You need anycast distribution for global resolution performance. Self-hosted DNS serves from one or two locations.
Recommended providers:
- Cloudflare: Free DNS hosting with automatic DNSSEC (one-click enable), global anycast network, DDoS-resilient. The easiest starting point. Free tier covers everything most teams need.
- deSEC: Free, non-profit, DNSSEC-by-default, API-driven, privacy-focused. No account required for basic usage. Good for teams that want DNSSEC without vendor dependency.
- DNSimple: Automatic DNSSEC key rotation, Let’s Encrypt integration, developer-friendly API. From $5/month per zone. For teams wanting automated key management.
- NS1: Advanced traffic steering, real-time analytics, Filter Chain for complex routing. From $100/month. For teams needing DNS-level traffic management.
What you still control: Internal DNS resolution (Unbound on hosts, CoreDNS in Kubernetes) stays self-managed. CAA records are configured at whichever provider hosts your zone. Split-horizon DNS for internal zones remains on internal infrastructure.
What changes: Authoritative DNS serving, DNSSEC key management, anycast distribution, and DDoS resilience for DNS are handled by the provider. You no longer need to manage BIND/PowerDNS/NSD infrastructure, monitor signing key expiry, or maintain multiple authoritative DNS servers for redundancy.