Centralized Logging Architecture for Security: Fluentd, Vector, and Loki Compared
Problem
Self-managed log infrastructure is one of the highest operational costs for small-to-medium teams. The choice of collector (Fluentd vs Vector vs Promtail) and backend (Loki vs Elasticsearch vs OpenObserve) determines your query capability, operational burden, and cost for years. Choosing wrong is expensive to reverse, because log pipelines become deeply integrated into every service.
For security use cases specifically, the requirements are: full-text search across log content (not just labels), sub-10-second query latency for 30-day windows, structured log support (JSON, key-value), retention management (30 days hot, 12 months archival), and integration with alerting (fire alerts from log queries).
Threat Model
- Adversary: Any attacker. Centralized logs are the foundation of all security investigation and detection. Without them, you are blind.
Configuration
Collector Comparison
| Feature | Vector | Fluentd | Promtail |
|---|---|---|---|
| Language | Rust | Ruby + C | Go |
| Memory usage (idle) | 15-30MB | 50-100MB | 20-40MB |
| Throughput | 10-50K events/sec/core | 5-20K events/sec/core | 5-15K events/sec/core |
| Configuration | YAML/TOML | Ruby DSL | YAML |
| Transform capability | VRL (powerful) | Filters (plugin-based) | Pipeline stages (limited) |
| Buffer/retry | Built-in disk buffer | Plugin-based (buffered output) | WAL-based |
| Kubernetes support | Native | Via fluent-bit/fluentd | Native (Loki only) |
| Best for | New deployments, performance | Existing ecosystems, plugin breadth | Loki-only deployments |
Recommendation: Vector for new deployments (fastest, lowest memory, most flexible transforms). Fluentd only if you have an existing Fluentd ecosystem with custom plugins. Promtail only if you are committed to Loki as the only backend.
Backend Comparison
| Feature | Loki | Elasticsearch | OpenObserve (#120) | Quickwit (#121) |
|---|---|---|---|---|
| Query language | LogQL (label-based) | KQL / Lucene (full-text) | SQL + full-text | Tantivy-based (full-text) |
| Full-text search | Limited (filter expressions) | Yes (inverted index) | Yes | Yes |
| Storage cost (per GB/month) | $0.01-0.03 (S3) | $0.10-0.50 (local SSD) | $0.01-0.03 (S3) | $0.01-0.03 (S3) |
| Operational complexity | Low (stateless queriers) | High (cluster management) | Medium | Low |
| Retention management | Built-in compactor | ILM policies (complex) | Built-in | Built-in |
| Security query suitability | Good for label-based (namespace, pod, severity) | Best (full-text across all fields) | Good | Good |
Recommendation for security:
- If you need full-text search across log content: Elasticsearch or OpenObserve
- If label-based queries (namespace, pod, app, severity) are sufficient: Loki (5-10x cheaper)
- For most teams: start with Loki (cost-effective), supplement with Elasticsearch only for security investigation queries
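To make the label-vs-full-text distinction concrete, here are two hedged LogQL sketches (label names assume a collector that attaches `namespace` and `severity` labels, as the Vector configuration in this article does):

```logql
# Label-based: served from the index; fast even over a 30-day window
{namespace="production", severity="error"}

# Content filter: greps chunk data after label selection; acceptable
# for narrow windows, slow over long ranges (Loki has no full-text index)
{namespace="production"} |= "authentication failed" |~ "(?i)admin"
```

If most of your investigations look like the second query over long windows, that is the signal to add a full-text backend.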
Loki Deployment with Vector
```yaml
# vector-daemonset.yaml - collect logs from all pods
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: vector
  template:
    metadata:
      labels:
        app: vector
    spec:
      serviceAccountName: vector
      containers:
        - name: vector
          image: timberio/vector:0.40.0-debian
          env:
            # Required by the kubernetes_logs source to identify this node
            - name: VECTOR_SELF_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config
              mountPath: /etc/vector
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlogpods
              mountPath: /var/log/pods
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
      volumes:
        - name: config
          configMap:
            name: vector-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlogpods
          hostPath:
            path: /var/log/pods
```
```yaml
# vector-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
  namespace: monitoring
data:
  vector.yaml: |
    sources:
      kubernetes_logs:
        type: kubernetes_logs
        auto_partial_merge: true
    transforms:
      parse_json:
        type: remap
        inputs: [kubernetes_logs]
        source: |
          # Parse JSON log lines (most applications log structured JSON)
          parsed, err = parse_json(.message)
          if err == null && is_object(parsed) {
            . = merge!(., parsed)
          }
          # Add a security-relevant source label
          .security_source = string!(.kubernetes.pod_namespace) + "/" + string!(.kubernetes.pod_name)
      filter_security:
        type: filter
        inputs: [parse_json]
        condition:
          type: vrl
          source: |
            # Keep all logs from security-relevant namespaces
            # and any log containing security-relevant keywords;
            # the infallible `?? false` form drops only true non-matches
            # rather than aborting on non-string messages
            includes(["production", "kube-system", "falco", "monitoring"], .kubernetes.pod_namespace) ||
            (match(.message, r'(?i)(error|fail|denied|unauthorized|forbidden|CVE|exploit)') ?? false)
    sinks:
      loki:
        type: loki
        inputs: [filter_security]
        endpoint: "http://loki.monitoring:3100"
        labels:
          namespace: "{{ kubernetes.pod_namespace }}"
          pod: "{{ kubernetes.pod_name }}"
          container: "{{ kubernetes.container_name }}"
          app: "{{ kubernetes.pod_labels.app }}"
          severity: "{{ level }}"
        encoding:
          codec: json
```
Log Parsing and Enrichment
```yaml
# Vector transform for structured security log enrichment
transforms:
  enrich_security:
    type: remap
    inputs: [parse_json]
    source: |
      # Classify log severity for security triage
      if match(.message, r'(?i)(critical|emergency|fatal)') ?? false {
        .security_severity = "critical"
      } else if match(.message, r'(?i)(error|fail|denied|unauthorized)') ?? false {
        .security_severity = "high"
      } else if match(.message, r'(?i)(warn|deprecat)') ?? false {
        .security_severity = "medium"
      } else {
        .security_severity = "low"
      }
      # Extract common security fields; the infallible form leaves the
      # fields unset on non-matching lines instead of aborting the event
      ip, ip_err = parse_regex(.message, r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})')
      if ip_err == null { .source_ip = ip.ip }
      user, user_err = parse_regex(.message, r'user[= ]+(?P<user>\S+)')
      if user_err == null { .user = user.user }
```
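The extraction regexes above can be sanity-checked outside Vector before shipping them. A Python sketch mirroring the VRL logic (the sample log line is invented for illustration):

```python
import re

IP_RE = re.compile(r"(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")
USER_RE = re.compile(r"user[= ]+(?P<user>\S+)")

def extract_security_fields(message: str) -> dict:
    """Mirror of the VRL extraction: set fields only when a pattern matches."""
    fields = {}
    if (m := IP_RE.search(message)):
        fields["source_ip"] = m.group("ip")
    if (m := USER_RE.search(message)):
        fields["user"] = m.group("user")
    return fields

line = "sshd: authentication failed for user=admin from 203.0.113.7"
print(extract_security_fields(line))
# {'source_ip': '203.0.113.7', 'user': 'admin'}
```

Running a corpus of real log lines through a test like this catches regex mistakes before they silently produce empty fields in production.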
Retention Policies
```yaml
# Loki retention configuration (loki-config.yaml)
limits_config:
  retention_period: 720h   # 30 days of queryable hot storage

compactor:
  working_directory: /data/loki/compactor
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: s3

# For 12-month archival: ship a copy to immutable S3 (see Article #65)
```
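One way to implement the 12-month archival tier is a second Vector sink writing batched objects to a write-once S3 bucket, expired by a lifecycle rule. A sketch of that rule (the `log-archive/` prefix is a placeholder for your bucket layout):

```json
{
  "Rules": [
    {
      "ID": "expire-archived-logs-after-12-months",
      "Filter": { "Prefix": "log-archive/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

The Glacier transition at day 30 mirrors the hot/archival split: anything older than the Loki retention window is rarely queried and only needs to exist for compliance and incident forensics.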
Security Alerting from Logs
```yaml
# Loki alerting rules (via Grafana or the Loki ruler)
groups:
  - name: security-log-alerts
    rules:
      - alert: AuthenticationFailureSpike
        expr: >
          sum(rate({namespace="production"} |= "authentication failed" [5m])) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Authentication failure spike in production"
      - alert: UnauthorizedAPIAccess
        expr: >
          sum(rate({namespace="production"} |~ "403|Forbidden|Unauthorized" [5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Elevated 403/Unauthorized responses in production"
      - alert: SuspiciousCommandExecution
        expr: >
          count_over_time({source="auditd"} |= "exec" |~ "curl|wget|nc|ncat|python.*-c|perl.*-e" [5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Suspicious command execution detected in audit logs"
```
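Note that `rate()` thresholds are per-second averages over the window, which is easy to misread. A small Python helper (hypothetical, for choosing thresholds) converts a human-friendly "N matching lines per window" into the per-second value the expressions above compare against:

```python
def rate_threshold(events_per_window: float, window_seconds: float) -> float:
    """Per-second rate corresponding to N matching lines per window."""
    return events_per_window / window_seconds

# The 0.5/sec threshold in AuthenticationFailureSpike corresponds to
# 150 "authentication failed" lines in the 5-minute window:
print(rate_threshold(150, 300))  # 0.5
```

Working backwards like this makes thresholds reviewable: "150 failures in 5 minutes" is a number a security engineer can argue about; "0.5" is not.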
Expected Behaviour
- All cluster logs centralized within 30 seconds of generation
- Security queries return results within 5 seconds for 30-day window
- No log loss under sustained load (verified with canary log entries)
- Retention policies manage 30-day hot and 12-month archival automatically
- Security alerts fire from log queries within 2 minutes of the event
- Vector DaemonSet consumes <256MB memory per node
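The "no log loss" check can be implemented with sequence-numbered canary entries: emit 0..N-1 at a fixed interval, query them back from the backend, and alert on gaps. A minimal Python sketch of the gap check (the query side is stubbed; in practice the received sequence numbers would come from a LogQL query against the canary stream):

```python
def find_gaps(received_seqs, expected_count):
    """Return missing sequence numbers among 0..expected_count-1."""
    return sorted(set(range(expected_count)) - set(received_seqs))

# Canary emitted 0..9; entries 3 and 7 never arrived at the backend
print(find_gaps([0, 1, 2, 4, 5, 6, 8, 9], 10))  # [3, 7]
```

Any non-empty result under sustained load indicates loss somewhere between the collector's disk buffer and the backend's ingesters.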
Trade-offs
| Backend | Monthly Cost (20-node cluster) | Query Capability | Ops Effort |
|---|---|---|---|
| Loki (self-managed) | $50-100 (S3 storage) | Label-based; limited full-text | Low (stateless, S3-backed) |
| Elasticsearch (self-managed) | $200-500 (SSD storage + compute) | Full-text search, aggregations | High (cluster management, ILM) |
| OpenObserve (#120) (self-managed) | $50-100 (S3 storage) | Full-text + SQL | Medium (simpler than ES) |
| Grafana Cloud (#108) Loki | $0-200 (usage-based) | Same as self-managed Loki | Zero (fully managed) |
| Axiom (#112) | $0-100 (500GB free) | Full-text, serverless | Zero (fully managed) |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Vector DaemonSet not running on node | Logs from that node not collected | DaemonSet pod count < node count; log gap detection alert | Check tolerations. Fix resource limits. Ensure Vector pod can schedule on all nodes. |
| Loki ingestion rate exceeded | Logs rejected with 429; Vector retries | Loki metrics show 429 rate_limited; Vector shows delivery retries | Scale Loki ingesters, or increase rate limits in limits_config. |
| Elasticsearch cluster red | Log ingestion stops; queries fail | ES cluster health API shows red; Prometheus ES exporter alerts | Fix shard allocation. Add nodes. Or: migrate to managed backend. |
| Log parsing fails | Structured fields missing; queries return no results | Dashboard panels show "no data" for structured fields; raw .message still present | Fix VRL parsing in Vector transform. Test with vector tap for live debugging. |
| Retention not applied | Storage grows unbounded; disk fills | Disk usage alerts; Loki compactor metrics show no deletions | Check compactor configuration. Verify retention_enabled: true. Check compactor pod is running. |
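The detection column above can be wired to concrete Prometheus alert expressions. Hedged PromQL sketches, assuming kube-state-metrics and default Loki metric names (verify names against your deployed versions):

```promql
# Vector DaemonSet coverage gap: fewer ready pods than desired
kube_daemonset_status_number_ready{daemonset="vector", namespace="monitoring"}
  < kube_daemonset_status_desired_number_scheduled{daemonset="vector", namespace="monitoring"}

# Loki dropping samples due to rate limiting
sum(rate(loki_discarded_samples_total{reason="rate_limited"}[5m])) > 0
```

Both conditions are silent failures from the application's point of view, which is why they need explicit alerts rather than relying on someone noticing missing logs during an incident.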
When to Consider a Managed Alternative
This is where the case for a managed alternative is strongest. Self-managed Elasticsearch is a full-time job past roughly 20 hosts. Even Loki, while simpler, requires capacity planning, storage management, and version upgrades.
- Grafana Cloud (#108): Managed Loki. Start free (50GB logs/month). Native Grafana integration. The most natural migration from self-hosted Loki.
- Axiom (#112): 500GB/month free. Serverless query. Zero cluster management. Full-text search (unlike Loki). Best for teams that want to ingest everything without worrying about backend operations.
- Better Stack (#113): Logging + uptime monitoring + incident management in one. Managed. For teams wanting a single vendor for log-related concerns.
- SigNoz (#117): OpenTelemetry-native. Unified logs + metrics + traces. For teams migrating to OTel.
Premium content pack: Logging pipeline configurations. Vector DaemonSet manifests for Kubernetes, Vector configs for Loki/Elasticsearch/Axiom backends, log parsing transforms for common application frameworks, Loki alerting rules for security events, and retention policy templates.