Centralized Logging Architecture for Security: Fluentd, Vector, and Loki Compared

Problem

Self-managed log infrastructure is one of the highest operational costs for small-to-medium teams. The choice of collector (Fluentd vs Vector vs Promtail) and backend (Loki vs Elasticsearch vs OpenObserve) determines your query capability, operational burden, and cost for years. Choosing wrong is expensive to reverse: log pipelines are deeply integrated into every service.

For security use cases specifically, the requirements are: full-text search across log content (not just labels), sub-10-second query latency for 30-day windows, structured log support (JSON, key-value), retention management (30 days hot, 12 months archival), and integration with alerting (fire alerts from log queries).

Threat Model

  • Adversary: Any attacker. Centralized logs are the foundation of all security investigation and detection. Without them, you are blind.

Configuration

Collector Comparison

| Feature | Vector | Fluentd | Promtail |
|---|---|---|---|
| Language | Rust | Ruby + C | Go |
| Memory usage (idle) | 15-30MB | 50-100MB | 20-40MB |
| Throughput | 10-50K events/sec/core | 5-20K events/sec/core | 5-15K events/sec/core |
| Configuration | YAML/TOML | Ruby DSL | YAML |
| Transform capability | VRL (powerful) | Filters (plugin-based) | Pipeline stages (limited) |
| Buffer/retry | Built-in disk buffer | Plugin-based (buffered output) | WAL-based |
| Kubernetes support | Native | Via fluent-bit/fluentd | Native (Loki only) |
| Best for | New deployments, performance | Existing ecosystems, plugin breadth | Loki-only deployments |

Recommendation: Vector for new deployments (fastest, lowest memory, most flexible transforms). Fluentd only if you have an existing Fluentd ecosystem with custom plugins. Promtail only if you are committed to Loki as the only backend.
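Vector's built-in disk buffering (the buffer/retry row above) is configured per sink. A minimal sketch, assuming a `loki` sink like the one configured later in this article (the size and overflow behavior are illustrative choices, not defaults you must use):

```yaml
sinks:
  loki:
    # ...type/inputs/endpoint as configured elsewhere...
    buffer:
      type: disk            # survive restarts; default is an in-memory buffer
      max_size: 1073741824  # ~1 GiB of on-disk buffer for this sink
      when_full: block      # apply backpressure rather than drop events
```

`when_full: block` trades ingestion latency for zero loss, which is usually the right call for security logs.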

Backend Comparison

| Feature | Loki | Elasticsearch | OpenObserve (#120) | Quickwit (#121) |
|---|---|---|---|---|
| Query language | LogQL (label-based) | KQL / Lucene (full-text) | SQL + full-text | Tantivy (full-text) |
| Full-text search | Limited (filter expressions) | Yes (inverted index) | Yes | Yes |
| Storage cost (per GB/month) | $0.01-0.03 (S3) | $0.10-0.50 (local SSD) | $0.01-0.03 (S3) | $0.01-0.03 (S3) |
| Operational complexity | Low (stateless queriers) | High (cluster management) | Medium | Low |
| Retention management | Built-in compactor | ILM policies (complex) | Built-in | Built-in |
| Security query suitability | Good for label-based (namespace, pod, severity) | Best (full-text across all fields) | Good | Good |

Recommendation for security:

  • If you need full-text search across log content: Elasticsearch or OpenObserve
  • If label-based queries (namespace, pod, app, severity) are sufficient: Loki (5-10x cheaper)
  • For most teams: start with Loki (cost-effective), supplement with Elasticsearch only for security investigation queries
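To make the Loki trade-off concrete, here is roughly what the two query styles look like in LogQL (the stream selectors and filter strings are illustrative):

```logql
# Label-based: fast, answered from the index
{namespace="production", app="api", severity="error"}

# "Full-text": a line filter that scans every chunk in the selected
# streams -- workable, but slow over broad selectors and long windows
{namespace="production"} |= "Invalid JWT signature"
```

This is why Loki stays cheap: it only indexes labels, and pays for that at query time when you need content search.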

Loki Deployment with Vector

# vector-daemonset.yaml - collect logs from all pods
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: vector
  template:
    metadata:
      labels:
        app: vector
    spec:
      serviceAccountName: vector
      containers:
        - name: vector
          image: timberio/vector:0.40.0-debian
          volumeMounts:
            - name: config
              mountPath: /etc/vector
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlogpods
              mountPath: /var/log/pods
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
      volumes:
        - name: config
          configMap:
            name: vector-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlogpods
          hostPath:
            path: /var/log/pods
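The `serviceAccountName: vector` above implies RBAC: the `kubernetes_logs` source queries the API server to enrich events with pod metadata. A sketch of the supporting objects, matching the names used in the DaemonSet (verify the required resources against your Vector version):

```yaml
# vector-rbac.yaml - ServiceAccount and read-only cluster access
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vector
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector
subjects:
  - kind: ServiceAccount
    name: vector
    namespace: monitoring
```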
# vector-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
  namespace: monitoring
data:
  vector.yaml: |
    sources:
      kubernetes_logs:
        type: kubernetes_logs
        auto_partial_merge: true

    transforms:
      parse_json:
        type: remap
        inputs: [kubernetes_logs]
        source: |
          # Parse JSON log lines (most applications log structured JSON);
          # non-JSON lines pass through unchanged
          parsed, err = parse_json(.message)
          if err == null {
            . = merge(., parsed)
          }
          # Add a security-relevant source label; coerce to string so the
          # concatenation is infallible, falling back if metadata is missing
          ns = to_string(.kubernetes.pod_namespace) ?? "unknown"
          pod = to_string(.kubernetes.pod_name) ?? "unknown"
          .security_source = ns + "/" + pod

      filter_security:
        type: filter
        inputs: [parse_json]
        condition:
          type: vrl
          source: |
            # Keep all logs from security-relevant namespaces,
            # plus any log line containing security-relevant keywords.
            # `?? false` keeps the condition infallible if .message is not a string.
            includes(["production", "kube-system", "falco", "monitoring"], .kubernetes.pod_namespace) ||
            (match(.message, r'(?i)(error|fail|denied|unauthorized|forbidden|CVE|exploit)') ?? false)

    sinks:
      loki:
        type: loki
        inputs: [filter_security]
        endpoint: "http://loki.monitoring:3100"
        labels:
          namespace: "{{ kubernetes.pod_namespace }}"
          pod: "{{ kubernetes.pod_name }}"
          container: "{{ kubernetes.container_name }}"
          app: "{{ kubernetes.pod_labels.app }}"
          severity: "{{ level }}"
        encoding:
          codec: json
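If you follow the earlier recommendation to supplement Loki with Elasticsearch for investigation queries, the same pipeline can fan out to a second sink. A sketch (the endpoint and index pattern are assumptions for your environment):

```yaml
    sinks:
      elasticsearch_security:
        type: elasticsearch
        inputs: [filter_security]
        endpoints: ["http://elasticsearch.monitoring:9200"]
        bulk:
          index: "security-logs-%Y.%m.%d"  # daily indices for ILM-style cleanup
```

Both sinks read from `filter_security`, so events are duplicated to both backends without extra transforms.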

Log Parsing and Enrichment

# Vector transform for structured security log enrichment
transforms:
  enrich_security:
    type: remap
    inputs: [parse_json]
    source: |
      # Classify log severity for security triage; `?? false` keeps the
      # regex checks infallible when .message is not a string
      if match(.message, r'(?i)(critical|emergency|fatal)') ?? false {
        .security_severity = "critical"
      } else if match(.message, r'(?i)(error|fail|denied|unauthorized)') ?? false {
        .security_severity = "high"
      } else if match(.message, r'(?i)(warn|deprecat)') ?? false {
        .security_severity = "medium"
      } else {
        .security_severity = "low"
      }

      # Extract common security fields; leave them unset when absent
      ip_match, err = parse_regex(.message, r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})')
      if err == null { .source_ip = ip_match.ip }
      user_match, err = parse_regex(.message, r'user[= ]+(?P<user>\S+)')
      if err == null { .user = user_match.user }

Retention Policies

# Loki retention configuration (loki-config.yaml)
limits_config:
  retention_period: 720h  # 30 days for queryable hot storage

compactor:
  working_directory: /data/loki/compactor
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: s3

# For 12-month archival: ship a copy to immutable S3 (see Article #65)
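One way to produce that archival copy is a second Vector sink writing compressed NDJSON to S3. A sketch (the bucket name, region, and key prefix are placeholders; pair this with an S3 lifecycle rule and Object Lock for immutability):

```yaml
sinks:
  s3_archive:
    type: aws_s3
    inputs: [filter_security]
    bucket: "example-log-archive"   # placeholder bucket name
    region: "us-east-1"
    key_prefix: "logs/%Y/%m/%d/"    # date-partitioned keys
    compression: gzip
    encoding:
      codec: json
    framing:
      method: newline_delimited     # one JSON event per line
```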

Security Alerting from Logs

# Loki alerting rules (via Grafana or Loki ruler)
groups:
  - name: security-log-alerts
    rules:
      - alert: AuthenticationFailureSpike
        expr: >
          sum(rate({namespace="production"} |= "authentication failed" [5m])) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Authentication failure spike in production"

      - alert: UnauthorizedAPIAccess
        expr: >
          sum(rate({namespace="production"} |~ "403|Forbidden|Unauthorized" [5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Elevated 403/Unauthorized responses in production"

      - alert: SuspiciousCommandExecution
        expr: >
          count_over_time({source="auditd"} |= "exec" |~ "curl|wget|nc|ncat|python.*-c|perl.*-e" [5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Suspicious command execution detected in audit logs"
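For these rules to evaluate, the Loki ruler must be enabled and pointed at a rule store and an Alertmanager. A minimal sketch (directories and the Alertmanager URL are assumptions):

```yaml
# loki-config.yaml (ruler section)
ruler:
  storage:
    type: local
    local:
      directory: /data/loki/rules      # rule group YAML files live here
  rule_path: /data/loki/rules-temp     # scratch space for rule evaluation
  alertmanager_url: http://alertmanager.monitoring:9093
  enable_api: true                     # allow rule management via the API
```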

Expected Behaviour

  • All cluster logs centralized within 30 seconds of generation
  • Security queries return results within 5 seconds for 30-day window
  • No log loss under sustained load (verified with canary log entries)
  • Retention policies manage 30-day hot and 12-month archival automatically
  • Security alerts fire from log queries within 2 minutes of the event
  • Vector DaemonSet consumes <256MB memory per node

Trade-offs

| Backend | Monthly Cost (20-node cluster) | Query Capability | Ops Effort |
|---|---|---|---|
| Loki (self-managed) | $50-100 (S3 storage) | Label-based; limited full-text | Low (stateless, S3-backed) |
| Elasticsearch (self-managed) | $200-500 (SSD storage + compute) | Full-text search, aggregations | High (cluster management, ILM) |
| OpenObserve (#120, self-managed) | $50-100 (S3 storage) | Full-text + SQL | Medium (simpler than ES) |
| Grafana Cloud Loki (#108) | $0-200 (usage-based) | Same as self-managed Loki | Zero (fully managed) |
| Axiom (#112) | $0-100 (500GB free) | Full-text, serverless | Zero (fully managed) |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Vector DaemonSet not running on node | Logs from that node not collected | DaemonSet pod count < node count; log gap detection alert | Check tolerations. Fix resource limits. Ensure the Vector pod can schedule on all nodes. |
| Loki ingestion rate exceeded | Logs rejected with 429; Vector retries | Loki metrics show 429 rate_limited; Vector shows delivery retries | Scale Loki ingesters, or increase rate limits in limits_config. |
| Elasticsearch cluster red | Log ingestion stops; queries fail | ES cluster health API shows red; Prometheus ES exporter alerts | Fix shard allocation. Add nodes. Or migrate to a managed backend. |
| Log parsing fails | Structured fields missing; queries return no results | Dashboard panels show "no data" for structured fields; raw .message still present | Fix VRL parsing in the Vector transform. Test with vector tap for live debugging. |
| Retention not applied | Storage grows unbounded; disk fills | Disk usage alerts; Loki compactor metrics show no deletions | Check compactor configuration. Verify retention_enabled: true. Check that the compactor pod is running. |

When to Consider a Managed Alternative

Self-managed log infrastructure is where managed services pay off fastest. Self-managed Elasticsearch becomes a full-time job past roughly 20 hosts. Even Loki, while simpler, requires capacity planning, storage management, and version upgrades.

  • Grafana Cloud (#108): Managed Loki. Start free (50GB logs/month). Native Grafana integration. The most natural migration from self-hosted Loki.
  • Axiom (#112): 500GB/month free. Serverless query. Zero cluster management. Full-text search (unlike Loki). Best for teams that want to ingest everything without worrying about backend operations.
  • Better Stack (#113): Logging + uptime monitoring + incident management in one. Managed. For teams wanting a single vendor for log-related concerns.
  • SigNoz (#117): OpenTelemetry-native. Unified logs + metrics + traces. For teams migrating to OTel.

Premium content pack: Logging pipeline configurations. Vector DaemonSet manifests for Kubernetes, Vector configs for Loki/Elasticsearch/Axiom backends, log parsing transforms for common application frameworks, Loki alerting rules for security events, and retention policy templates.