gRPC Security in Production: TLS, Authentication, and Interceptor-Based Access Control

Problem

gRPC services in production frequently run with security configurations that would never be acceptable for HTTP APIs:

  • No TLS because “it’s internal.” Service-to-service gRPC calls traverse the pod network in plaintext. Any compromised pod can sniff every protobuf message, including tokens, user data, and internal state.
  • No per-method authorization. The service exposes 30 RPC methods, and every authenticated caller can invoke all of them. The billing service can call the user deletion RPC because there is no method-level access control.
  • No request size limits. The default grpc.max_receive_message_length varies by framework — 4MB in some, unlimited in others — and teams frequently raise it without bound. A malicious or buggy client can then send a multi-gigabyte protobuf message that exhausts server memory.
  • No deadline enforcement. Clients call RPCs without setting deadlines. A slow downstream causes cascading timeouts that consume all available connections and threads.
  • Unauthenticated health check endpoints. The gRPC health checking protocol is exposed without any access control, leaking service availability information to anyone who can reach the port.

These gaps exist because gRPC services are often developed and deployed behind network boundaries that teams assume are sufficient. They are not.

Target systems: gRPC services in Go, Java, Python, or Node.js, running in Kubernetes or on VMs, with or without a service mesh.

Threat Model

  • Adversary: Compromised pod or container in the same network segment. Internal attacker with access to the Kubernetes cluster. External attacker who gains access through a different vulnerability.
  • Access level: Network-level access to the gRPC port. May have valid credentials for one service but not the target.
  • Objective: Eavesdrop on plaintext gRPC traffic to extract credentials and data. Invoke privileged RPCs by calling methods the attacker’s service should not access. Denial of service through oversized messages or deadline-less requests. Enumerate internal services through health check endpoints.
  • Blast radius: All services in the same network segment for eavesdropping. The specific target service for unauthorized RPC invocation. Cascading failure across dependent services for deadline abuse.

Configuration

TLS for gRPC: Server and Client

Server-side TLS in Go:

// server.go - gRPC server with TLS
package main

import (
    "crypto/tls"
    "crypto/x509"
    "log"
    "net"
    "os"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
    "google.golang.org/grpc/health"
    healthpb "google.golang.org/grpc/health/grpc_health_v1"
    "google.golang.org/grpc/keepalive"
)

func main() {
    // Load server certificate and key
    cert, err := tls.LoadX509KeyPair(
        "/etc/certs/server.crt",
        "/etc/certs/server.key",
    )
    if err != nil {
        log.Fatalf("failed to load server cert: %v", err)
    }

    // Load CA certificate for client verification (mTLS)
    caCert, err := os.ReadFile("/etc/certs/ca.crt")
    if err != nil {
        log.Fatalf("failed to read CA cert: %v", err)
    }
    caPool := x509.NewCertPool()
    if !caPool.AppendCertsFromPEM(caCert) {
        log.Fatal("failed to parse CA cert")
    }

    tlsConfig := &tls.Config{
        Certificates: []tls.Certificate{cert},
        ClientAuth:   tls.RequireAndVerifyClientCert,
        ClientCAs:    caPool,
        MinVersion:   tls.VersionTLS13,
    }

    // Server options with security controls
    opts := []grpc.ServerOption{
        grpc.Creds(credentials.NewTLS(tlsConfig)),

        // Maximum message sizes
        grpc.MaxRecvMsgSize(4 * 1024 * 1024),  // 4 MB inbound
        grpc.MaxSendMsgSize(4 * 1024 * 1024),  // 4 MB outbound

        // Keepalive enforcement: reject clients that ping too aggressively
        grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
            MinTime:             30 * time.Second,
            PermitWithoutStream: false,
        }),
        // Server keepalive: recycle idle and long-lived connections
        grpc.KeepaliveParams(keepalive.ServerParameters{
            MaxConnectionIdle:     5 * time.Minute,
            MaxConnectionAge:      30 * time.Minute,
            MaxConnectionAgeGrace: 10 * time.Second,
            Time:                  1 * time.Minute,
            Timeout:               20 * time.Second,
        }),

        // Limit concurrent streams per HTTP/2 connection
        grpc.MaxConcurrentStreams(100),
    }

    server := grpc.NewServer(opts...)

    // Register your services
    // pb.RegisterMyServiceServer(server, &myServiceImpl{})

    // Register health service
    healthServer := health.NewServer()
    healthpb.RegisterHealthServer(server, healthServer)
    healthServer.SetServingStatus("myservice", healthpb.HealthCheckResponse_SERVING)

    lis, err := net.Listen("tcp", ":8443")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    log.Println("gRPC server listening on :8443 with mTLS")
    if err := server.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}

Client-side mTLS in Go:

// client.go - gRPC client with mTLS
package main

import (
    "crypto/tls"
    "crypto/x509"
    "errors"
    "os"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

func newGRPCConnection(target string) (*grpc.ClientConn, error) {
    // Load client certificate for mTLS
    cert, err := tls.LoadX509KeyPair(
        "/etc/certs/client.crt",
        "/etc/certs/client.key",
    )
    if err != nil {
        return nil, err
    }

    // Load CA to verify server certificate
    caCert, err := os.ReadFile("/etc/certs/ca.crt")
    if err != nil {
        return nil, err
    }
    caPool := x509.NewCertPool()
    if !caPool.AppendCertsFromPEM(caCert) {
        return nil, errors.New("failed to parse CA cert")
    }

    tlsConfig := &tls.Config{
        Certificates: []tls.Certificate{cert},
        RootCAs:      caPool,
        MinVersion:   tls.VersionTLS13,
        ServerName:   "myservice.internal",
    }

    conn, err := grpc.Dial(
        target,
        grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)),
        // Enforce message size limits on the client side as well.
        // Deadlines are per-RPC: set them with context.WithTimeout at each call.
        grpc.WithDefaultCallOptions(
            grpc.MaxCallRecvMsgSize(4*1024*1024),
            grpc.MaxCallSendMsgSize(4*1024*1024),
        ),
    )
    if err != nil {
        return nil, err
    }
    return conn, nil
}

Token-Based Authentication with Interceptors

For environments where mTLS is not practical (multi-language teams, third-party callers), use token-based authentication with a unary interceptor:

// auth_interceptor.go - Server-side authentication interceptor
package auth

import (
    "context"
    "strings"

    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/metadata"
    "google.golang.org/grpc/status"
)

// TokenValidator validates a bearer token and returns the caller's identity.
// The concrete implementation (JWT verification, token introspection, etc.)
// is supplied by the application.
type TokenValidator interface {
    Validate(ctx context.Context, token string) (*CallerIdentity, error)
}

// CallerIdentity is the authenticated service identity extracted from a token.
type CallerIdentity struct {
    ServiceName string
}

// callerKey is an unexported context key for the authenticated caller identity.
type callerKeyType struct{}

var callerKey = callerKeyType{}

// Methods that do not require authentication
var publicMethods = map[string]bool{
    "/grpc.health.v1.Health/Check": true,
    "/grpc.health.v1.Health/Watch": true,
}

// Per-method authorization: which service identities can call which methods
var methodACL = map[string][]string{
    "/mypackage.MyService/GetUser":    {"api-gateway", "admin-service"},
    "/mypackage.MyService/DeleteUser": {"admin-service"},
    "/mypackage.MyService/ListUsers":  {"api-gateway", "admin-service", "reporting-service"},
    "/mypackage.MyService/UpdateUser": {"api-gateway", "admin-service"},
}

func UnaryAuthInterceptor(tokenValidator TokenValidator) grpc.UnaryServerInterceptor {
    return func(
        ctx context.Context,
        req interface{},
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (interface{}, error) {
        // Skip auth for public methods
        if publicMethods[info.FullMethod] {
            return handler(ctx, req)
        }

        // Extract token from metadata
        md, ok := metadata.FromIncomingContext(ctx)
        if !ok {
            return nil, status.Error(codes.Unauthenticated, "missing metadata")
        }

        authHeader := md.Get("authorization")
        if len(authHeader) == 0 {
            return nil, status.Error(codes.Unauthenticated, "missing authorization header")
        }

        token := strings.TrimPrefix(authHeader[0], "Bearer ")

        // Validate token and extract caller identity
        callerIdentity, err := tokenValidator.Validate(ctx, token)
        if err != nil {
            return nil, status.Error(codes.Unauthenticated, "invalid token")
        }

        // Per-method authorization check
        allowedCallers, exists := methodACL[info.FullMethod]
        if !exists {
            // Method not in ACL: deny by default
            return nil, status.Error(codes.PermissionDenied, "method not authorized")
        }

        authorized := false
        for _, allowed := range allowedCallers {
            if callerIdentity.ServiceName == allowed {
                authorized = true
                break
            }
        }
        if !authorized {
            return nil, status.Errorf(
                codes.PermissionDenied,
                "service %s not authorized for %s",
                callerIdentity.ServiceName,
                info.FullMethod,
            )
        }

        // Add caller identity to context for downstream use
        ctx = context.WithValue(ctx, callerKey, callerIdentity)
        return handler(ctx, req)
    }
}

// Streaming interceptor for streaming RPCs
func StreamAuthInterceptor(tokenValidator TokenValidator) grpc.StreamServerInterceptor {
    return func(
        srv interface{},
        ss grpc.ServerStream,
        info *grpc.StreamServerInfo,
        handler grpc.StreamHandler,
    ) error {
        if publicMethods[info.FullMethod] {
            return handler(srv, ss)
        }

        md, ok := metadata.FromIncomingContext(ss.Context())
        if !ok {
            return status.Error(codes.Unauthenticated, "missing metadata")
        }

        authHeader := md.Get("authorization")
        if len(authHeader) == 0 {
            return status.Error(codes.Unauthenticated, "missing authorization header")
        }

        token := strings.TrimPrefix(authHeader[0], "Bearer ")
        callerIdentity, err := tokenValidator.Validate(ss.Context(), token)
        if err != nil {
            return status.Error(codes.Unauthenticated, "invalid token")
        }

        // Apply the same default-deny per-method ACL as the unary path;
        // streaming RPCs must not bypass authorization.
        allowedCallers, exists := methodACL[info.FullMethod]
        if !exists {
            return status.Error(codes.PermissionDenied, "method not authorized")
        }
        authorized := false
        for _, allowed := range allowedCallers {
            if callerIdentity.ServiceName == allowed {
                authorized = true
                break
            }
        }
        if !authorized {
            return status.Errorf(
                codes.PermissionDenied,
                "service %s not authorized for %s",
                callerIdentity.ServiceName,
                info.FullMethod,
            )
        }

        return handler(srv, ss)
    }
}

Register interceptors on the server:

server := grpc.NewServer(
    grpc.Creds(credentials.NewTLS(tlsConfig)),
    grpc.ChainUnaryInterceptor(
        UnaryAuthInterceptor(tokenValidator),
        UnaryLoggingInterceptor(), // logging interceptor not shown
    ),
    grpc.ChainStreamInterceptor(
        StreamAuthInterceptor(tokenValidator),
        StreamLoggingInterceptor(), // logging interceptor not shown
    ),
    grpc.MaxRecvMsgSize(4 * 1024 * 1024),
    grpc.MaxConcurrentStreams(100),
)

Deadline Enforcement

Every gRPC call must have a deadline. Without one, a slow backend causes the caller to wait indefinitely, consuming a connection and a goroutine. Enforce deadlines on both client and server:

// Client-side: always set a deadline
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

resp, err := client.GetUser(ctx, &pb.GetUserRequest{UserId: "123"})
if err != nil {
    // gRPC surfaces an expired context as codes.DeadlineExceeded
    if status.Code(err) == codes.DeadlineExceeded {
        log.Println("GetUser timed out after 5 seconds")
    }
}

// Server-side interceptor: enforce a maximum deadline
func DeadlineInterceptor(maxDeadline time.Duration) grpc.UnaryServerInterceptor {
    return func(
        ctx context.Context,
        req interface{},
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (interface{}, error) {
        // If client did not set a deadline, impose one
        if _, hasDeadline := ctx.Deadline(); !hasDeadline {
            var cancel context.CancelFunc
            ctx, cancel = context.WithTimeout(ctx, maxDeadline)
            defer cancel()
        }
        return handler(ctx, req)
    }
}

Envoy gRPC Proxy Hardening

When Envoy fronts your gRPC services, apply these security controls:

# Envoy configuration for gRPC proxy
static_resources:
  listeners:
    - name: grpc_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8443
      filter_chains:
        - transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              require_client_certificate: true
              common_tls_context:
                tls_params:
                  tls_minimum_protocol_version: TLSv1_3
                tls_certificates:
                  - certificate_chain:
                      filename: /etc/envoy/certs/server.crt
                    private_key:
                      filename: /etc/envoy/certs/server.key
                validation_context:
                  trusted_ca:
                    filename: /etc/envoy/certs/ca.crt
          filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: grpc_ingress
                codec_type: HTTP2

                # gRPC-specific timeout
                stream_idle_timeout: 300s

                http2_protocol_options:
                  max_concurrent_streams: 100
                  initial_stream_window_size: 1048576
                  initial_connection_window_size: 1048576

                route_config:
                  name: grpc_route
                  virtual_hosts:
                    - name: grpc_services
                      domains: ["*"]
                      routes:
                        # Per-method routing with timeouts
                        - match:
                            prefix: "/mypackage.MyService/GetUser"
                            grpc: {}
                          route:
                            cluster: myservice
                            timeout: 5s
                            max_stream_duration:
                              max_stream_duration: 5s
                        - match:
                            prefix: "/mypackage.MyService/StreamUpdates"
                            grpc: {}
                          route:
                            cluster: myservice
                            timeout: 0s
                            max_stream_duration:
                              max_stream_duration: 3600s
                        # Default: deny unmatched methods
                        - match:
                            prefix: "/"
                            grpc: {}
                          direct_response:
                            status: 403
                            body:
                              inline_string: "Method not allowed"

                http_filters:
                  # Rate limiting per method
                  - name: envoy.filters.http.local_ratelimit
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
                      stat_prefix: grpc_rate_limit
                      token_bucket:
                        max_tokens: 100
                        tokens_per_fill: 50
                        fill_interval: 1s
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: myservice
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options:
              max_concurrent_streams: 100
      load_assignment:
        cluster_name: myservice
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.1.10
                      port_value: 8443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            tls_params:
              tls_minimum_protocol_version: TLSv1_3
            tls_certificates:
              - certificate_chain:
                  filename: /etc/envoy/certs/client.crt
                private_key:
                  filename: /etc/envoy/certs/client.key
            validation_context:
              trusted_ca:
                filename: /etc/envoy/certs/ca.crt

Health Check Endpoint Security

Restrict health check access to internal monitoring systems only:

// Option 1: Serve health checks on a separate port.
// Note: kubelet gRPC probes dial the pod IP, so the health port cannot be
// loopback-only in Kubernetes. Bind it separately and restrict access with
// a NetworkPolicy instead.
healthLis, err := net.Listen("tcp", ":8444")
if err != nil {
    log.Fatalf("failed to listen on health port: %v", err)
}
healthServer := grpc.NewServer() // no TLS on the health port
healthpb.RegisterHealthServer(healthServer, healthSvc)
go healthServer.Serve(healthLis)

// Main service on external port with full security
mainLis, err := net.Listen("tcp", ":8443")
if err != nil {
    log.Fatalf("failed to listen: %v", err)
}
mainServer := grpc.NewServer(opts...)
pb.RegisterMyServiceServer(mainServer, &impl{})
log.Fatal(mainServer.Serve(mainLis))

# Kubernetes: health check on a separate port
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: myservice
      ports:
        - containerPort: 8443
          name: grpc
        - containerPort: 8444
          name: health
      livenessProbe:
        grpc:
          port: 8444
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        grpc:
          port: 8444
        initialDelaySeconds: 5
        periodSeconds: 5

Expected Behaviour

After applying the gRPC security configuration:

# Verify TLS is required (plaintext connection should fail)
grpcurl -plaintext localhost:8443 list
# Expected: "Failed to dial target" or "connection refused"

# Verify mTLS works with valid client cert
grpcurl \
  -cacert /etc/certs/ca.crt \
  -cert /etc/certs/client.crt \
  -key /etc/certs/client.key \
  localhost:8443 grpc.health.v1.Health/Check
# Expected: {"status":"SERVING"}

# Verify unauthorized method is rejected
grpcurl \
  -cacert /etc/certs/ca.crt \
  -cert /etc/certs/client.crt \
  -key /etc/certs/client.key \
  -H "authorization: Bearer <reporting-service-token>" \
  -d '{"user_id": "123"}' \
  localhost:8443 mypackage.MyService/DeleteUser
# Expected: "PermissionDenied: service reporting-service not authorized for /mypackage.MyService/DeleteUser"

# Verify message size limit
# Generate a 5MB payload (exceeds 4MB limit)
grpcurl \
  -cacert /etc/certs/ca.crt \
  -cert /etc/certs/client.crt \
  -key /etc/certs/client.key \
  -d "$(python3 -c 'print("{\"data\":\"" + "A"*5000000 + "\"}")')" \
  localhost:8443 mypackage.MyService/ProcessData
# Expected: "ResourceExhausted: grpc: received message larger than max"

Trade-offs

  • mTLS for all gRPC services. Impact: certificate management overhead; every service needs a cert. Risk: certificate rotation failures cause outages. Mitigation: use cert-manager with short-lived certificates (24h) and automate rotation.
  • Per-method ACL in interceptors. Impact: the ACL map must be updated when adding new RPCs. Risk: new RPCs are inaccessible until the ACL is updated (default deny). Mitigation: store the ACL in a ConfigMap or external config; fail open in development environments only.
  • MaxRecvMsgSize(4MB). Impact: caps protobuf message size. Risk: legitimate large payloads (file transfer, batch operations) are rejected. Mitigation: use streaming RPCs for large data; raise the limit where necessary.
  • Deadline enforcement (5s default). Impact: slow operations time out. Risk: complex queries or batch operations exceed the default deadline. Mitigation: set per-RPC deadlines appropriate to each method's expected latency.
  • MaxConcurrentStreams(100). Impact: limits parallel RPCs per connection. Risk: high-throughput clients may exhaust stream capacity. Mitigation: have clients use connection pooling; raise the limit for known high-throughput callers.
  • Health check on separate port. Impact: an additional port to manage. Risk: port misconfiguration exposes health checks on the main port. Mitigation: validate with a network scan after deployment; restrict the health port with a Kubernetes NetworkPolicy.

Failure Modes

  • Certificate expired. Symptom: all gRPC connections fail with TLS handshake errors. Detection: certificate monitoring alerts; connection error rate spikes to 100%. Recovery: renew the certificate; use cert-manager with automatic renewal.
  • ACL missing new service identity. Symptom: the new service cannot call any RPCs and receives PermissionDenied. Detection: service deployment fails health checks; logs show PermissionDenied for the new service. Recovery: add the service identity to the methodACL map and redeploy.
  • MaxRecvMsgSize too small. Symptom: legitimate large requests fail with ResourceExhausted. Detection: application logs show ResourceExhausted for specific RPCs; client error reports. Recovery: raise grpc.MaxRecvMsgSize on the affected server, or switch the RPC to streaming.
  • No deadline set by client. Symptom: server goroutines accumulate, memory grows, and the process eventually OOMs. Detection: goroutine count rises steadily in metrics; memory climbs without releasing. Recovery: deploy the DeadlineInterceptor to enforce server-side maximums; fix clients to set deadlines.
  • Keepalive too aggressive. Symptom: clients on high-latency networks get disconnected. Detection: connection resets from clients in remote regions; increased reconnection rate. Recovery: increase MinTime in the KeepaliveEnforcementPolicy and tune for client network conditions.

When to Consider a Managed Alternative

Transition point: When managing certificates, interceptors, and per-method ACLs across 20+ gRPC services becomes a full-time maintenance task, or when you need consistent security policy enforcement that cannot rely on every service team correctly implementing interceptors.

What managed alternatives handle:

  • Service mesh (Istio, Linkerd): Automatic mTLS between all services without application code changes. Sidecar proxies handle TLS termination, certificate rotation, and mutual authentication. Istio AuthorizationPolicy provides per-method access control declaratively, without modifying application interceptors.

  • Sysdig (#122): Runtime monitoring for gRPC traffic patterns, detecting anomalous RPC calls, unexpected callers, and unusual message sizes. Provides visibility into service-to-service communication that application-level logging may miss.

What you still control: Business logic authorization (which user can delete which resource), request validation (is this protobuf message semantically valid), and application-level rate limiting based on business rules remain in your application code regardless of mesh or monitoring provider.