Vector Database Security: Access Control, Embedding Protection, and Query Isolation
Problem
Vector databases are the backbone of RAG (Retrieval-Augmented Generation) systems. They store document embeddings that encode the semantic content of proprietary data: internal documentation, customer records, legal documents, and codebases. Unlike traditional databases, vector stores are often deployed with minimal security because they are treated as “just a cache” or “an index.” In their default configurations, self-hosted Qdrant, Weaviate, and Milvus run with no authentication, no TLS, and no namespace isolation, and many deployments never change those defaults.
An attacker who gains access to a vector database can reconstruct sensitive information from embeddings, query across tenant boundaries in multi-tenant RAG systems, poison the retrieval pipeline by injecting malicious documents, or exhaust resources through expensive nearest-neighbor queries. Because embeddings are dense numerical representations, traditional DLP tools do not flag them as sensitive data, even though they encode proprietary content.
Target systems: Self-hosted Qdrant, Weaviate, or Milvus on Kubernetes. Also applies to managed services (Pinecone, Zilliz) where client-side controls are needed.
Threat Model
- Adversary: Internal user with network access to the vector database, or external attacker who has compromised an application with database connectivity.
- Objective: Embedding extraction (download embeddings to reconstruct source documents). Cross-tenant data access (query one tenant’s data from another tenant’s context). Retrieval poisoning (inject embeddings that cause the RAG pipeline to retrieve misleading content). Denial of service (run expensive similarity searches that exhaust CPU/memory).
- Blast radius: Proprietary data reconstructed from embeddings (confidentiality). RAG pipeline returns poisoned results (integrity). Vector database unavailable due to resource exhaustion (availability).
Configuration
Qdrant Authentication and TLS
Qdrant supports static API key authentication, and from version 1.7 a separate read-only key. Enable both alongside TLS.
# qdrant-config.yaml - hardened Qdrant configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: qdrant-config
namespace: vector-db
data:
config.yaml: |
service:
host: 0.0.0.0
http_port: 6333
grpc_port: 6334
# Enable API key authentication
api_key: "${QDRANT_API_KEY}"
# Enable read-only API key for query-only clients
read_only_api_key: "${QDRANT_READ_ONLY_KEY}"
# TLS configuration
enable_tls: true
tls:
cert: /certs/tls.crt
key: /certs/tls.key
ca_cert: /certs/ca.crt
# Request size limits
max_request_size_mb: 32
storage:
# Storage path on the encrypted volume
storage_path: /qdrant/storage
# Snapshot security
snapshots_path: /qdrant/snapshots
# Performance tuning that also limits resource abuse
optimizers:
max_optimization_threads: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: qdrant
namespace: vector-db
spec:
serviceName: qdrant
replicas: 1
selector:
matchLabels:
app: qdrant
template:
metadata:
labels:
app: qdrant
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: qdrant
image: qdrant/qdrant:v1.12.0
ports:
- containerPort: 6333
name: http
- containerPort: 6334
name: grpc
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
env:
- name: QDRANT_API_KEY
valueFrom:
secretKeyRef:
name: qdrant-credentials
key: api-key
- name: QDRANT_READ_ONLY_KEY
valueFrom:
secretKeyRef:
name: qdrant-credentials
key: read-only-key
resources:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "4"
memory: "8Gi"
volumeMounts:
- name: data
mountPath: /qdrant/storage
- name: snapshots
mountPath: /qdrant/snapshots
- name: config
mountPath: /qdrant/config/config.yaml
subPath: config.yaml
readOnly: true
- name: certs
mountPath: /certs
readOnly: true
volumes:
- name: config
configMap:
name: qdrant-config
- name: certs
secret:
secretName: qdrant-tls
defaultMode: 0440
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 100Gi
storageClassName: encrypted-gp3 # Encryption at rest via storage class
- metadata:
name: snapshots
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
storageClassName: encrypted-gp3
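With this configuration in place, every request must present the API key over TLS or Qdrant rejects it. A minimal stdlib sketch of building such a request (the URL and key value are placeholders; in application code, prefer `qdrant_client` with `api_key=` and `https=True` as shown later in this section):

```python
import urllib.request

# Placeholder in-cluster service URL; substitute your own endpoint
QDRANT_URL = "https://qdrant.vector-db.svc.cluster.local:6333"

def authed_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a request carrying the Qdrant `api-key` header.

    Once authentication is enabled, requests without this header
    (or with a wrong key) are rejected with 401/403.
    """
    return urllib.request.Request(
        f"{QDRANT_URL}{path}",
        headers={"api-key": api_key},
    )

req = authed_request("/collections", "example-key")
# urllib.request.urlopen(req)  # performs the call in a real deployment
```

Query-only clients should be given the read-only key, so a leaked key from a dashboard or notebook cannot be used to upsert or delete points.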
Weaviate Authentication and Multi-Tenancy
# weaviate-config.yaml - hardened Weaviate with OIDC and multi-tenancy
apiVersion: v1
kind: ConfigMap
metadata:
name: weaviate-config
namespace: vector-db
data:
conf.yaml: |
authentication:
# OIDC authentication for production
oidc:
enabled: true
issuer: https://auth.example.com/realms/ml
client_id: weaviate
username_claim: email
groups_claim: groups
# API key authentication as fallback
apikey:
enabled: true
allowed_keys:
- "${WEAVIATE_ADMIN_KEY}"
- "${WEAVIATE_READONLY_KEY}"
users:
- admin@example.com
- readonly@example.com
authorization:
# Role-based access control
rbac:
enabled: true
admins:
- admin@example.com
viewers:
- readonly@example.com
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: weaviate
namespace: vector-db
spec:
serviceName: weaviate
replicas: 1
selector:
matchLabels:
app: weaviate
template:
metadata:
labels:
app: weaviate
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: weaviate
image: cr.weaviate.io/semitechnologies/weaviate:1.27.0
ports:
- containerPort: 8080
name: http
- containerPort: 50051
name: grpc
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
env:
- name: AUTHENTICATION_OIDC_ENABLED
value: "true"
- name: AUTHENTICATION_OIDC_ISSUER
value: "https://auth.example.com/realms/ml"
- name: AUTHENTICATION_OIDC_CLIENT_ID
value: "weaviate"
- name: AUTHORIZATION_RBAC_ENABLED
value: "true"
- name: QUERY_DEFAULTS_LIMIT
value: "100" # Limit default query results
- name: QUERY_MAXIMUM_RESULTS
value: "1000" # Hard cap on returned results
- name: LIMIT_RESOURCES
value: "true"
resources:
requests:
cpu: "2"
memory: "4Gi"
limits:
cpu: "4"
memory: "8Gi"
volumeMounts:
- name: data
mountPath: /var/lib/weaviate
volumes: []
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 100Gi
storageClassName: encrypted-gp3
Namespace Isolation for Multi-Tenant RAG
Use Qdrant collections or Weaviate tenants to isolate data per customer, combined with application-level enforcement.
# tenant_isolation.py - application-level tenant isolation for Qdrant
from functools import wraps
from typing import Optional
from flask import Flask, g, jsonify, request
from qdrant_client import QdrantClient
from qdrant_client.http.models import (
Distance,
Filter,
FieldCondition,
MatchValue,
PointStruct,
VectorParams,
)
app = Flask(__name__)
qdrant = QdrantClient(
url="https://qdrant.vector-db.svc.cluster.local:6333",
api_key="<from-secret>",
https=True,
)
def require_tenant(f):
"""Extract and validate tenant ID from the authenticated request."""
@wraps(f)
def decorated(*args, **kwargs):
# Tenant ID comes from the verified JWT (set by gateway)
tenant_id = request.headers.get("X-Tenant-Id")
if not tenant_id:
return jsonify({"error": "tenant ID required"}), 400
# Validate tenant ID format (prevent injection)
if not tenant_id.isalnum() or len(tenant_id) > 64:
return jsonify({"error": "invalid tenant ID"}), 400
g.tenant_id = tenant_id
return f(*args, **kwargs)
return decorated
def get_tenant_collection(tenant_id: str) -> str:
"""Map tenant ID to a dedicated Qdrant collection."""
return f"tenant_{tenant_id}_embeddings"
def ensure_tenant_collection(tenant_id: str):
"""Create a collection for a tenant if it does not exist."""
collection_name = get_tenant_collection(tenant_id)
collections = [c.name for c in qdrant.get_collections().collections]
if collection_name not in collections:
qdrant.create_collection(
collection_name=collection_name,
vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
@app.route("/api/v1/search", methods=["POST"])
@require_tenant
def search():
"""Search within the tenant's isolated collection only."""
data = request.get_json()
query_vector = data.get("vector")
top_k = min(data.get("top_k", 10), 100) # Cap results
collection_name = get_tenant_collection(g.tenant_id)
results = qdrant.search(
collection_name=collection_name, # Tenant-scoped collection
query_vector=query_vector,
limit=top_k,
with_payload=True,
with_vectors=False, # Never return raw embeddings to clients
)
return jsonify({
"results": [
{
"id": str(r.id),
"score": r.score,
"metadata": r.payload,
# Embeddings are NOT included in the response
}
for r in results
]
})
@app.route("/api/v1/ingest", methods=["POST"])
@require_tenant
def ingest():
"""Ingest embeddings into the tenant's isolated collection."""
data = request.get_json()
points = data.get("points", [])
if len(points) > 1000: # Batch size limit
return jsonify({"error": "max 1000 points per request"}), 400
collection_name = get_tenant_collection(g.tenant_id)
ensure_tenant_collection(g.tenant_id)
qdrant_points = [
PointStruct(
id=p["id"],
vector=p["vector"],
payload={
**p.get("metadata", {}),
"tenant_id": g.tenant_id, # Always stamp tenant ID
},
)
for p in points
]
qdrant.upsert(collection_name=collection_name, points=qdrant_points)
return jsonify({"status": "ok", "count": len(qdrant_points)})
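The middleware above rests on two invariants that are worth checking in isolation: the tenant validator must reject anything that could escape the naming scheme, and the tenant-to-collection mapping must be injective. A self-contained sketch (re-declaring the helpers so it runs standalone):

```python
def valid_tenant_id(tenant_id: str) -> bool:
    """Mirror the require_tenant check: alphanumeric only, max 64 chars."""
    return tenant_id.isalnum() and len(tenant_id) <= 64

def get_tenant_collection(tenant_id: str) -> str:
    """Mirror the tenant-to-collection mapping."""
    return f"tenant_{tenant_id}_embeddings"

# Injection attempts are rejected before any collection name is built
for bad in ("../other", "a b", "x" * 65, "", "tenant_*"):
    assert not valid_tenant_id(bad)

# Distinct valid tenants always map to distinct collections
tenants = ["acme", "globex", "initech"]
assert len({get_tenant_collection(t) for t in tenants}) == len(tenants)
```

Because the mapping is a pure function of the validated tenant ID, there is no code path through which a request for one tenant can name another tenant's collection.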
Network Policy for Vector Database
# vector-db-network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: qdrant-access
namespace: vector-db
spec:
podSelector:
matchLabels:
app: qdrant
policyTypes:
- Ingress
ingress:
# Only allow access from the RAG application namespace
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: rag-application
podSelector:
matchLabels:
component: retrieval-service
ports:
- port: 6333
protocol: TCP
- port: 6334
protocol: TCP
# Allow access from the ingestion pipeline namespace
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: data-ingestion
podSelector:
matchLabels:
component: embedding-pipeline
ports:
- port: 6333
protocol: TCP
Query Rate Limiting
Prevent resource exhaustion from expensive similarity searches.
# rate-limit-policy.yaml - Istio rate limiting for vector DB queries
apiVersion: networking.istio.io/v1
kind: EnvoyFilter
metadata:
name: qdrant-rate-limit
namespace: vector-db
spec:
workloadSelector:
labels:
app: qdrant
configPatches:
- applyTo: HTTP_FILTER
match:
context: SIDECAR_INBOUND
listener:
filterChain:
filter:
name: envoy.filters.network.http_connection_manager
patch:
operation: INSERT_BEFORE
value:
name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/udpa.type.v1.TypedStruct
type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
value:
stat_prefix: qdrant_rate_limit
token_bucket:
max_tokens: 100
tokens_per_fill: 50
fill_interval: 60s
filter_enabled:
runtime_key: local_rate_limit_enabled
default_value:
numerator: 100
denominator: HUNDRED
filter_enforced:
runtime_key: local_rate_limit_enforced
default_value:
numerator: 100
denominator: HUNDRED
response_headers_to_add:
- append_action: OVERWRITE_IF_EXISTS_OR_ADD
header:
key: x-ratelimit-limit
value: "100"
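The same token-bucket semantics can also be enforced inside the retrieval service as defense in depth, covering clients that reach the database without passing through the mesh. A minimal sketch using the Envoy parameters above (100-token bucket, 50 tokens refilled per 60 s interval); the clock is injected so the limiter is testable without sleeping:

```python
import time

class TokenBucket:
    """Local rate limiter mirroring the Envoy local_ratelimit config."""

    def __init__(self, max_tokens: int = 100, tokens_per_fill: int = 50,
                 fill_interval: float = 60.0, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval
        self.clock = clock
        self.tokens = max_tokens
        self.last_fill = clock()

    def allow(self) -> bool:
        """Take one token; False means the caller should return 429."""
        now = self.clock()
        fills = int((now - self.last_fill) // self.fill_interval)
        if fills:
            self.tokens = min(self.max_tokens,
                              self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per tenant (keyed by the validated tenant ID) so a single noisy tenant cannot consume the shared budget.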
Audit Logging for Vector Database Operations
# vector_audit.py - audit logging middleware
import json
import logging
import time
from datetime import datetime, timezone
from functools import wraps
from flask import g, request
audit_logger = logging.getLogger("vector.audit")
handler = logging.FileHandler("/var/log/vector-db/audit.jsonl")
audit_logger.addHandler(handler)
audit_logger.setLevel(logging.INFO)
def audit_log(operation: str):
"""Decorator to log vector database operations."""
def decorator(f):
@wraps(f)
def wrapper(*args, **kwargs):
start = time.time()
try:
result = f(*args, **kwargs)
duration = time.time() - start
entry = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"operation": operation,
"tenant_id": getattr(g, "tenant_id", "unknown"),
"source_ip": request.remote_addr,
"user_agent": request.headers.get("User-Agent", ""),
"duration_ms": round(duration * 1000, 2),
"status": "success",
"collection": getattr(g, "collection", "unknown"),
"result_count": _extract_count(result),
}
audit_logger.info(json.dumps(entry))
return result
except Exception as e:
duration = time.time() - start
entry = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"operation": operation,
"tenant_id": getattr(g, "tenant_id", "unknown"),
"source_ip": request.remote_addr,
"duration_ms": round(duration * 1000, 2),
"status": "error",
"error": str(e),
}
audit_logger.error(json.dumps(entry))
raise
return wrapper
return decorator
def _extract_count(response) -> int:
"""Extract result count from a Flask response."""
try:
data = json.loads(response.get_data(as_text=True))
return len(data.get("results", []))
except (json.JSONDecodeError, AttributeError):
return 0
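These JSONL entries make bulk-extraction attempts detectable offline: a tenant repeatedly pulling result sets at the cap is a strong signal. A sketch of such a detector (the thresholds are illustrative, not prescriptive):

```python
import json

def flag_bulk_extraction(lines, max_results: int = 100, min_hits: int = 5) -> set:
    """Flag tenants whose searches repeatedly return maximal result sets.

    `lines` is an iterable of JSONL audit entries as produced by the
    audit_log decorator above.
    """
    hits: dict[str, int] = {}
    for line in lines:
        entry = json.loads(line)
        if (entry.get("operation") == "search"
                and entry.get("result_count", 0) >= max_results):
            tenant = entry.get("tenant_id", "unknown")
            hits[tenant] = hits.get(tenant, 0) + 1
    return {tenant for tenant, count in hits.items() if count >= min_hits}
```

Run this over the audit file on a schedule (or stream it through your SIEM) and page on any non-empty result, alongside the rate-limit alerts described under Failure Modes.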
Preventing Embedding Extraction
Configure the retrieval API to never return raw embedding vectors. Clients should only receive metadata and relevance scores.
# safe_search.py - search endpoint that strips embeddings
from qdrant_client import QdrantClient
def safe_search(
client: QdrantClient,
collection: str,
query_vector: list[float],
top_k: int = 10,
) -> list[dict]:
"""
Search that never returns raw embeddings.
Even if the client requests with_vectors=True, we override it.
Embeddings are internal representations and should not leave
the retrieval service.
"""
results = client.search(
collection_name=collection,
query_vector=query_vector,
limit=min(top_k, 100),
with_payload=True,
with_vectors=False, # Never expose embeddings
)
return [
{
"id": str(r.id),
"score": r.score,
"metadata": {
k: v
for k, v in r.payload.items()
if k not in ("_tenant_id", "_internal_tags")
},
}
for r in results
]
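As a second line of defense, the returned payload itself can be scrubbed for anything that looks like an embedding, in case a raw vector was ever stored in metadata by mistake during ingestion. A sketch (the 16-element threshold is an assumption; tune it below your smallest real embedding dimension):

```python
def scrub_vector_like(payload: dict, min_len: int = 16) -> dict:
    """Drop payload fields that look like raw embedding vectors.

    A field is treated as vector-like if it is a list of at least
    `min_len` numbers. Short numeric lists (e.g. page ranges) survive.
    """
    def looks_like_vector(value) -> bool:
        return (
            isinstance(value, list)
            and len(value) >= min_len
            and all(isinstance(x, (int, float)) for x in value)
        )
    return {k: v for k, v in payload.items() if not looks_like_vector(v)}
```

Applying this to `r.payload` before building the response means a misconfigured ingestion pipeline cannot silently turn the metadata channel into an embedding exfiltration channel.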
Expected Behaviour
- Vector database requires API key or OIDC authentication for all operations
- All connections use TLS; data at rest is encrypted via the storage class
- Each tenant’s embeddings are stored in isolated collections; the application layer refuses any query outside the caller’s own collection
- Raw embedding vectors are never returned to clients; only metadata and relevance scores
- Query rate limiting prevents resource exhaustion from expensive similarity searches
- All search, ingest, and delete operations are audit-logged with tenant ID, source IP, and duration
- Network policies restrict access to the vector database to only the retrieval service and ingestion pipeline
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Per-tenant collections | More collections to manage; higher memory usage | Collection count grows linearly with tenants | Use Weaviate multi-tenancy (single collection, tenant isolation) for high tenant counts. Qdrant collection-per-tenant works up to hundreds of tenants. |
| Never returning embeddings | Clients cannot cache or reuse embeddings locally | Increases query load since clients must re-query for similar searches | Implement server-side caching. Provide a “re-rank” API that operates on IDs rather than vectors. |
| API key authentication | Simpler than OIDC but less granular | Single key compromise exposes all data | Rotate keys regularly. Use OIDC for user-facing access. Use separate read-only keys for query workloads. |
| Query rate limiting | Prevents abuse but may throttle legitimate batch operations | Data ingestion pipelines hit rate limits | Use separate rate limit tiers per client type. Exempt the ingestion pipeline service account from search rate limits. |
| Encrypted storage class | Slight I/O overhead (typically under 5%) | Performance impact on high-throughput vector search | Modern storage encryption (AES-NI) has negligible overhead. Profile before and after to confirm. |
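The key-rotation mitigation above is easiest to operate with an overlap window: both the outgoing and the incoming key are accepted while clients roll over, then the old key is retired. A sketch of the accept check, using `hmac.compare_digest` for constant-time comparison (function and parameter names are illustrative):

```python
import hmac
from typing import Optional

def key_accepted(presented: str, current: str,
                 previous: Optional[str] = None) -> bool:
    """Accept the current key, or the previous one during a rotation window.

    Constant-time comparison avoids leaking key prefixes via timing.
    Set `previous` to None once all clients have rolled over.
    """
    if hmac.compare_digest(presented, current):
        return True
    return previous is not None and hmac.compare_digest(presented, previous)
```

Log which key matched on every request; once the previous key stops appearing in the logs, drop it and the rotation is complete.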
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Embedding extraction via API | Attacker downloads large batches of embeddings | Audit logs show bulk queries with unusually high result counts; rate limit alerts fire | Revoke the compromised API key. Verify with_vectors=False is enforced server-side. Review audit logs for data scope. |
| Cross-tenant data leakage | Tenant A’s search returns results from Tenant B | Application logs show collection name mismatch; integration tests detect cross-tenant results | Fix the tenant-to-collection mapping. Audit all queries from the affected time window. Notify affected tenants. |
| Retrieval poisoning | RAG pipeline returns misleading or harmful content | Output quality metrics degrade; users report incorrect answers | Identify poisoned embeddings via ingestion audit log. Delete the malicious points. Re-ingest from clean source. |
| Rate limit too aggressive | Legitimate queries are rejected with 429 errors | Application error rates spike; client-side retry storms | Increase rate limits. Implement client-side backoff. Use separate limits for different API paths (search vs. ingest). |
| Vector database OOM | Qdrant/Weaviate crashes under memory pressure | Pod restarts; OOMKilled events in Kubernetes | Increase memory limits. Enable disk-based indexing (Qdrant: on_disk=true). Reduce the number of loaded collections. |
When to Consider a Managed Alternative
Managed vector databases handle authentication, encryption, scaling, and multi-tenancy.
- Pinecone: Fully managed vector database with built-in RBAC, encryption, and namespace isolation.
- Zilliz Cloud: Managed Milvus with authentication, TLS, and role-based access.
- Weaviate Cloud: Managed Weaviate with OIDC, RBAC, and automatic backups.
- Cloudflare (#29): Vectorize for edge-deployed vector search with built-in security.
- Snyk (#48): Scan vector database container images for vulnerabilities.
Premium content pack: Hardened Qdrant and Weaviate Kubernetes manifests, tenant isolation middleware, audit logging configurations, network policies, and rate limiting templates.